Efficient, concurrent Bayesian analysis of full waveform LaDAR data
Bayesian analysis of full waveform laser detection and ranging (LaDAR) signals using reversible jump Markov chain Monte Carlo (RJMCMC) algorithms have shown higher estimation accuracy, resolution and sensitivity to detect weak signatures for 3D surface profiling, and construct multiple layer images with varying number of surface returns. However, it is computational expensive. Although parallel computing has the potential to reduce both the processing time and the requirement for persistent memory storage, parallelizing the serial sampling procedure in RJMCMC is a significant challenge in both statistical and computing domains. While several strategies have been developed for Markov chain Monte Carlo (MCMC) parallelization, these are usually restricted to fixed dimensional parameter estimates, and not obviously applicable to RJMCMC for varying dimensional signal analysis. In the statistical domain, we propose an effective, concurrent RJMCMC algorithm, state space decomposition RJMCMC (SSD-RJMCMC), which divides the entire state space into groups and assign to each an independent RJMCMC chain with restricted variation of model dimensions. It intrinsically has a parallel structure, a form of model-level parallelization. Applying the convergence diagnostic, we can adaptively assess the convergence of the Markov chain on-the-fly and so dynamically terminate the chain generation. Evaluations on both synthetic and real data demonstrate that the concurrent chains have shorter convergence length and hence improved sampling efficiency. Parallel exploration of the candidate models, in conjunction with an error detection and correction scheme, improves the reliability of surface detection. By adaptively generating a complimentary MCMC sequence for the determined model, it enhances the accuracy for surface profiling. In the computing domain, we develop a data parallel SSD-RJMCMC (DP SSD-RJMCMCU) to achieve efficient parallel implementation on a distributed computer cluster. Adding data-level parallelization on top of the model-level parallelization, it formalizes a task queue and introduces an automatic scheduler for dynamic task allocation. These two strategies successfully diminish the load imbalance that occurred in SSD-RJMCMC. Thanks to the coarse granularity, the processors communicate at a very low frequency. The MPIbased implementation on a Beowulf cluster demonstrates that compared with RJMCMC, DP SSD-RJMCMCU has further reduced problem size and computation complexity. Therefore, it can achieve a super linear speedup if the number of data segments and processors are chosen wisely.