Efficient, concurrent Bayesian analysis of full waveform LaDAR data
Abstract
Bayesian analysis of full waveform laser detection and ranging (LaDAR)
signals using reversible jump Markov chain Monte Carlo (RJMCMC) algorithms
has shown higher estimation accuracy, higher resolution and greater sensitivity to
weak signatures for 3D surface profiling, and can construct multilayer
images with a varying number of surface returns. However, it is computationally
expensive. Although parallel computing has the potential to reduce both the
processing time and the requirement for persistent memory storage, parallelizing
the serial sampling procedure in RJMCMC is a significant challenge
in both the statistical and computing domains. While several strategies have been
developed for Markov chain Monte Carlo (MCMC) parallelization, these are
usually restricted to fixed-dimensional parameter estimation and are not obviously
applicable to RJMCMC for variable-dimensional signal analysis.
In the statistical domain, we propose an effective, concurrent RJMCMC algorithm,
state space decomposition RJMCMC (SSD-RJMCMC), which divides
the entire state space into groups and assigns to each group an independent
RJMCMC chain with restricted variation of model dimensions. It intrinsically
has a parallel structure, a form of model-level parallelization. By applying
a convergence diagnostic, we can adaptively assess the convergence of each
Markov chain on-the-fly and so dynamically terminate chain generation.
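The decomposition can be pictured with a small sketch. It assumes, purely for illustration, that the model dimension is the number of surface returns k in {0, ..., k_max}, that groups are contiguous blocks of model orders, and that each chain's birth/death moves may only change k within its own group. The function names `decompose_state_space` and `propose_dimension` are hypothetical and are not taken from the SSD-RJMCMC implementation.

```python
import random

def decompose_state_space(k_max, group_size):
    """Split model orders 0..k_max into contiguous groups (hypothetical scheme)."""
    orders = list(range(k_max + 1))
    return [orders[i:i + group_size] for i in range(0, len(orders), group_size)]

def propose_dimension(k, group, rng):
    """Propose a birth (k+1) or death (k-1) move restricted to `group`.

    A proposal that would leave the group has zero target probability for
    this chain, so it is rejected and the chain keeps its current order.
    """
    k_new = k + rng.choice((-1, 1))
    return k_new if group[0] <= k_new <= group[-1] else k

# Usage: one independent chain per group, e.g. one process per group.
groups = decompose_state_space(5, 2)   # [[0, 1], [2, 3], [4, 5]]
rng = random.Random(0)
k = groups[1][0]
for _ in range(10):
    k = propose_dimension(k, groups[1], rng)   # k always stays in {2, 3}
```

Because every group gets its own chain and the chains never exchange states, they can run on separate processors, which is the model-level parallelism the text refers to.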
Evaluations on both synthetic and real data demonstrate that the concurrent
chains have shorter convergence lengths and hence higher sampling efficiency.
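The abstract does not name the convergence diagnostic. As one common choice, the sketch below assumes a Gelman-Rubin potential scale reduction factor (PSRF) applied in "split" form: a single chain's trace of some scalar quantity (for instance a surface range parameter) is cut into two halves that are compared as if they were two chains, which suits a setting with one chain per group. The 1.1 threshold, the `min_len` guard and the function names are conventional or hypothetical choices, not values from this work.

```python
import statistics

def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(means)
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = statistics.fmean([statistics.variance(c) for c in chains])
    if w == 0:
        # No within-chain variation (a stuck chain): treat as not converged.
        return float("inf")
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

def split_psrf(chain):
    """Split one chain into two halves and compare them (split-R-hat style)."""
    half = len(chain) // 2
    return psrf([chain[:half], chain[half:2 * half]])

def should_stop(chain, threshold=1.1, min_len=100):
    """Decide on-the-fly whether chain generation can be terminated."""
    return len(chain) >= min_len and split_psrf(chain) < threshold
```

In use, `should_stop` would be called periodically while a chain is being generated, so a chain that mixes quickly stops early instead of running for a fixed, conservatively long length.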
Parallel exploration of the candidate models, in conjunction with an
error detection and correction scheme, improves the reliability of surface detection.
By adaptively generating a complementary MCMC sequence for the
selected model, the algorithm further improves the accuracy of surface profiling.
In the computing domain, we develop a data-parallel SSD-RJMCMC (DP
SSD-RJMCMCU) to achieve an efficient parallel implementation on a distributed
computer cluster. Adding data-level parallelization on top of the model-level
parallelization, it organizes the work as a task queue and introduces an automatic scheduler
for dynamic task allocation. These two strategies substantially reduce
the load imbalance observed in SSD-RJMCMC. Owing to the coarse
granularity, the processors communicate infrequently. The MPI-based
implementation on a Beowulf cluster demonstrates that, compared with
RJMCMC, DP SSD-RJMCMCU further reduces the problem size and computational
complexity. Therefore, it can achieve a superlinear speedup if the
numbers of data segments and processors are chosen appropriately.
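The task queue with dynamic allocation can be illustrated in miniature. The implementation described above is MPI-based on a Beowulf cluster; the sketch below instead uses Python threads and `queue.Queue` as a shared-memory stand-in, and assumes, hypothetically, that each task is a (data segment, model group) pair handled by a placeholder `work_fn`. Idle workers pull the next task as soon as they finish, which is the self-scheduling idea that counters load imbalance when some tasks take much longer than others.

```python
import queue
import threading

def run_dynamic_schedule(tasks, n_workers, work_fn):
    """Hypothetical self-scheduling sketch: idle workers pull the next task."""
    q = queue.Queue()
    for t in tasks:          # fill the task queue before any worker starts
        q.put(t)
    results = {}
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task = q.get_nowait()   # grab the next pending task, if any
            except queue.Empty:
                return                  # queue drained: this worker is done
            out = work_fn(task)
            with lock:
                results[task] = (worker_id, out)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

# Usage: 4 data segments x 3 model groups, processed by 3 workers.
tasks = [(seg, grp) for seg in range(4) for grp in range(3)]
results = run_dynamic_schedule(tasks, 3, lambda t: t[0] * 10 + t[1])
```

A static split would assign each worker a fixed share up front; pulling from a shared queue instead lets fast workers absorb extra tasks, at the cost of one queue access per task, which stays cheap when tasks are coarse-grained.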
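For reference, the standard definitions behind the speedup claim are given below, assuming (as the comparison with RJMCMC suggests) that T_1 is the run time of serial RJMCMC on the full problem and T_p is the run time of DP SSD-RJMCMCU on p processors.

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p}, \qquad
\text{superlinear speedup} \iff S_p > p \iff E_p > 1 .
```

Superlinear speedup is possible here because the parallel version does not merely divide the serial work: each processor handles a smaller problem (a data segment and a restricted set of model dimensions), so the total computation can fall below that of the serial algorithm.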