Parallel evaluation strategies for lazy data structures in Haskell
Abstract
Conventional parallel programming is complex and error prone. To improve programmer
productivity, we need to raise the level of abstraction with a higher-level
programming model that hides many parallel coordination aspects. Evaluation
strategies use non-strictness to separate the coordination and computation aspects
of a Glasgow parallel Haskell (GpH) program. This allows the specification of high
level parallel programs, eliminating the low-level complexity of synchronisation and
communication associated with parallel programming.
This thesis employs a data-structure-driven approach for parallelism derived through
generic parallel traversal and evaluation of sub-components of data structures. We
focus on evaluation strategies over list, tree and graph data structures, allowing
re-use across applications with minimal changes to the sequential algorithm.
In particular, we develop novel evaluation strategies for tree data structures, using
core functional programming techniques for coordination control, achieving more
flexible parallelism. We use non-strictness to control parallelism more flexibly. We
apply the notion of fuel as a resource that dictates parallelism generation, in particular,
the bi-directional flow of fuel, implemented using a circular program definition,
in a tree structure as a novel way of controlling parallel evaluation. This is the first
use of circular programming in evaluation strategies and is complemented by a lazy
function for bounding the size of sub-trees.
We extend these control mechanisms to graph structures and demonstrate performance
improvements on several parallel graph traversals. We combine circularity
for control for improved performance of strategies with circularity for computation
using circular data structures. In particular, we develop a hybrid traversal strategy
for graphs, exploiting breadth-first order for exposing parallelism initially, and
then proceeding with a depth-first order to minimise overhead associated with a full
parallel breadth-first traversal.
The efficiency of the tree strategies is evaluated on a benchmark program, and
two non-trivial case studies: a Barnes-Hut algorithm for the n-body problem and
sparse matrix multiplication, both using quad-trees. We also evaluate a graph search
algorithm implemented using the various traversal strategies.
We demonstrate improved performance on a server-class multicore machine with
up to 48 cores, with the advanced fuel splitting mechanisms proving to be more
flexible in throttling parallelism. To guide the behaviour of the strategies, we develop
heuristics-based parameter selection to select their specific control parameters.