2014 Rice Oil & Gas HPC has ended
Thursday, March 6 • 11:35am - 11:40am
Lightning Talk: Accelerating Reverse Time Migration: A Dataflow Approach, Hicham Lahlou, Xcelerit

As the age of harvesting easily accessible oil and gas resources comes to an end, more complex geologies must be explored to find new reservoirs. These geologies often violate the assumptions underlying the Kirchhoff Time Migration (KTM) algorithm, calling for more sophisticated methods to reconstruct the Earth's subsurface from seismic wave measurements. Reverse Time Migration (RTM) is the current state-of-the-art algorithm for seismic imaging, producing more accurate 2D and 3D images of the subsurface than KTM. Until recently, its enormous computational cost hindered widespread adoption in the industry. With advances in multi-core CPUs and the increasing use of high-performance accelerators such as GPUs and the Xeon Phi, subsurface images can now be reconstructed within reasonable time frames. However, most programming approaches for these processors do not provide enough hardware abstraction for end users, i.e., geophysicists. This poses a significant barrier to adopting advanced HPC hardware and using it efficiently.

We briefly explain the RTM algorithm and how it is typically implemented. The algorithm is analyzed to identify the key performance bottlenecks, both for computation and for data access. The main implementation challenges are detailed, such as managing the data, parallelizing and distributing the computation, and exploiting the hardware capabilities of multi-core CPUs, GPUs, and the Xeon Phi.

To cope with these challenges, we propose to model RTM as a dataflow graph and to automate the performance optimizations and execution management. Dataflow graphs are directed graphs of processing stages (actors), where data is streamed along the edges and processed by the actors. This model exposes several types of parallelism and optimization opportunities, such as pipeline parallelism, data parallelism, and memory locality.
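To make the dataflow model concrete, here is a minimal illustrative sketch of actors connected into a pipeline through which data is streamed. The names (`Actor`, `Pipeline`) and the toy stages are assumptions for illustration only; they are not the Xcelerit SDK API or the RTM stages discussed in the talk.

```python
class Actor:
    """A processing stage: consumes an input item, emits an output item."""
    def __init__(self, fn):
        self.fn = fn

    def process(self, item):
        return self.fn(item)


class Pipeline:
    """A linear dataflow graph: data streams through the actors in order.

    A runtime could run different actors on different items concurrently
    (pipeline parallelism) or split a stream across replicas of one actor
    (data parallelism); this sketch only models the graph, not a scheduler.
    """
    def __init__(self, *actors):
        self.actors = actors

    def run(self, stream):
        for item in stream:
            for actor in self.actors:
                item = actor.process(item)
            yield item


# Usage: a toy two-stage pipeline
double = Actor(lambda x: 2 * x)
inc = Actor(lambda x: x + 1)
pipe = Pipeline(double, inc)
print(list(pipe.run([1, 2, 3])))  # [3, 5, 7]
```

Because each actor is just a function of its inputs, data and task dependencies are explicit in the graph structure rather than buried in parallel code, which is what lets a tool schedule the stages automatically.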
Using this model, programmers can focus on the algorithm itself, while performance optimization and execution management are left to an automated tool. Further, the actors themselves can be implemented independently of the execution device, enabling code portability across hardware. We give a mapping of RTM algorithms to a dataflow graph and show that it is independent of the target execution hardware. The full algorithm is captured in the model, and data and task dependencies are fully exposed, without explicitly using parallel programming concepts. We explain in detail the benefits of this approach and how it overcomes the implementation challenges mentioned earlier.

Using an example implementation, we detail important aspects of the execution management, such as memory access patterns, data transfers, cache efficiency, and asynchronous execution. We give mappings of these aspects to multi-core CPUs, GPUs, and the Xeon Phi, explaining the similarities and differences. As typical systems have more than one accelerator, we also cover scheduling dataflow graphs onto multiple execution devices.

As a practical example, we use the Xcelerit SDK as an implementation framework based on a dataflow programming model. It exploits the optimization opportunities mentioned above and abstracts the hardware specifics from the user. Performance has been measured on both multi-core CPUs and GPUs for a range of algorithm parameters: it is within 5% of equivalent hand-tuned implementations, but achieved with significantly lower implementation effort. This shows the potential of a dataflow approach for RTM.
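The stage structure of RTM (forward-propagate a source wavefield, propagate the recorded receiver data in reverse time, then cross-correlate the two wavefields as the imaging condition) can be illustrated in miniature. Everything below is a simplified assumption for illustration — a 1D second-order finite-difference stencil with toy parameters and a synthetic "recorded" trace — not the production RTM implementation discussed in the talk.

```python
import numpy as np


def step(prev, curr, c2):
    """One explicit time step of the 1D acoustic wave equation
    (second-order in time and space), with fixed zero boundaries."""
    nxt = np.zeros_like(curr)
    lap = curr[:-2] - 2 * curr[1:-1] + curr[2:]        # spatial Laplacian
    nxt[1:-1] = 2 * curr[1:-1] - prev[1:-1] + c2 * lap
    return nxt


def propagate(nx, nt, c2, inject):
    """Time-step the wavefield, injecting a source term each step;
    return one snapshot per time step (the wavefield history)."""
    prev, curr = np.zeros(nx), np.zeros(nx)
    history = []
    for it in range(nt):
        nxt = step(prev, curr, c2)
        nxt += inject(it)                              # source/receiver term
        prev, curr = curr, nxt
        history.append(curr.copy())
    return history


nx, nt, c2 = 101, 200, 0.25                            # toy grid, stable CFL
src = np.zeros(nx)
src[nx // 2] = 1.0                                     # point source at centre

# Stage 1: forward source wavefield (source fires at t = 0)
fwd = propagate(nx, nt, c2, lambda it: src if it == 0 else 0.0)

# Stage 2: receiver wavefield, run in reverse time
# (seeded here with a synthetic injection purely for illustration)
rec = propagate(nx, nt, c2, lambda it: src if it == 0 else 0.0)

# Stage 3: imaging condition - zero-lag cross-correlation over time
image = sum(f * b for f, b in zip(fwd, reversed(rec)))
```

Each of the three stages maps naturally onto a dataflow actor, with the wavefield snapshots streamed along the edges; the stencil in `step` is the compute kernel whose memory access pattern dominates performance on CPUs, GPUs, and the Xeon Phi alike.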


Thursday March 6, 2014 11:35am - 11:40am PST
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030
