2014 Rice Oil & Gas HPC


Applications Session II
Thursday, March 6
 

1:00pm PST

Applications Session II: A High Performance Computational Platform for Simulation of Giant Reservoir Models, Majdi Baddourah, Saudi Aramco

DOWNLOAD PRESENTATION

WATCH VIDEO

A High Performance Computational Platform for Simulation of Giant Reservoir Models
Majdi Baddourah, M. Ehtesham Hayder, Badr Harbi, Ahmed Zawawi and Fouad Abouheit, Saudi Aramco

Abstract: Simulation of high-resolution reservoir models yields insight into oil and gas reservoirs. Massive, comprehensive reservoir simulation models can now be built from detailed geological and well log data, but they require a very large high performance computing (HPC) platform. Saudi Aramco has developed a state-of-the-art simulator, GigaPOWERS, capable of simulating multibillion-cell reservoir models. This presentation gives an overview of the challenges in building HPC platforms and visualizing the simulation output of giant reservoir models, and of how the computational platform at Saudi Aramco is designed to overcome them. A large HPC platform for reservoir simulation can be built by connecting multiple Linux clusters into a simulation grid; such an environment provides the capacity and computational power needed to solve multibillion-cell reservoir models. A simulation grid of this kind has been deployed in Saudi Aramco's Exploration and Petroleum Engineering Center (EXPEC) Computer Center. In this study, we present benchmark results for multiple giant fields to evaluate the performance of the Saudi Aramco simulation grid. Communication and input/output (I/O) routines in the simulator can add considerable computational overhead on such a platform; connectivity between the clusters on our simulation grid is tuned to maintain a high level of scalability, and excellent scalability has been obtained for computations of giant simulation models. Models on the order of one billion cells also pose a challenge to pre- and post-processing applications, which must load and process data in a reasonable time. Remote visualization, level-of-detail and load-on-demand algorithms were implemented in these applications, and data formats were revised to process and visualize massive simulation models efficiently.
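
The "load-on-demand" idea above can be made concrete with a small sketch. The following is a minimal illustration, not Aramco's actual tooling or data format: it assumes a single reservoir property stored as a flat binary file (the file name, shape, and dtype are hypothetical) and uses a memory map so a post-processor reads only the chunks a viewer actually requests.

```python
import numpy as np

# Hypothetical flat binary file holding one property (e.g., pressure)
# for a 1000 x 1000 x 1000 cell model, stored as float32.
NX, NY, NZ = 1000, 1000, 1000

# np.memmap maps the file into virtual memory; pages are fetched from
# disk only when a slice is actually touched (load on demand).
pressure = np.memmap("pressure.f32", dtype=np.float32,
                     mode="r", shape=(NX, NY, NZ))

# A viewer requesting one i-slab pulls in a contiguous ~4 MB,
# not the full ~4 GB volume.
slab = np.array(pressure[500])

# Level of detail: a coarsened overview for distant rendering,
# sampling every 8th cell along each axis.
coarse = np.array(pressure[::8, ::8, ::8])
print(slab.shape, coarse.shape)
```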

Moderators

Simanti Das

Manager, High Performance Computing Software Development & Support, ExxonMobil Technical Computing Company
Simanti Das is currently the manager of the High Performance Computing software development and support group in ExxonMobil's Upstream IT organization. She is responsible for providing software development, optimization and support for massively parallel seismic imaging technologies for...

Thursday March 6, 2014 1:00pm - 1:20pm PST
BRC 282, Rice University, 6500 Main Street at University, Houston, TX 77030

1:20pm PST

Applications Session II: The impact of discontinuous coefficients and partitioning on parallel reservoir simulation performance, Jonathan Graham, ExxonMobil Upstream Research Company

PRESENTATION NOT AVAILABLE

VIDEO NOT AVAILABLE

Slow convergence of iterative linear solvers for physical problems with jump or discontinuous coefficients is a well-known challenge for parallel computation. While many papers focus on the finite element (FE) method with an overlapping Schwarz or similar preconditioner for elliptic problems (e.g., Graham and Scheichl, 2008), the literature on reservoir simulation is limited to steady-state porous media flow with FE (Vuik et al., 2001; Cliffe et al., 2000), or considers two-phase flow but applies multi-level domain decomposition and finite differences (Kees et al., 2003). There are, however, reservoir simulations using either two-stage preconditioners (Aksoylu et al., 2007; Klie et al., 2009) or overlapping Schwarz (Usadi et al., 2007). We consider, instead, a non-overlapping block Jacobi preconditioner for two-point-flux-approximation finite-volume reservoir simulation. With increased industrial use of HPC for reservoir simulation, significant anomalies in performance are seen when different HPC configurations are selected; these have been theorized to be partition-induced solver degradation. We construct simple test cases to clearly illustrate the effect and verify a 20% difference in the average iteration count for linear solutions of the Jacobian, depending on whether a subdomain contains both high and low permeability regions. We also consider realistic cases. For a 3D unstructured-grid reservoir model with one hundred thousand grid cells, we create two partitions identical except for one cell: a nearly vertical stack of cells with high-transmissibility connections to each other is split between partitions either above or below this pivotal cell. This small change resulted in a 1.7X increase in computational time, due to a similar increase in total linear iterations. Experimentation on a 3-million-cell unstructured grid with as many as 1024 partitions showed a range of 2X in total linear iterations over the course of the simulation. These results show that robust industrial application of HPC to reservoir simulation requires partitioning that is load balanced, minimizes communication, and encapsulates local flow patterns. A transmissibility-weighted graph partitioner has been developed that mitigates the performance variability seen in these test cases.
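
The interaction between partitioning and a non-overlapping block Jacobi preconditioner can be sketched in a few lines. The toy below is our illustration, not the authors' code: a 1D two-point-flux matrix with a low-permeability streak is preconditioned with exact solves on each of two blocks, and the GMRES iteration count is reported for two cut locations. The grid size, contrast, and cut positions are invented for the example; the point is only that the count can move with where the cut falls relative to the permeability contrast.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def tpfa_1d(perm):
    """1D two-point-flux (TPFA) matrix with harmonic-mean face transmissibilities."""
    t = 2.0 * perm[:-1] * perm[1:] / (perm[:-1] + perm[1:])
    main = np.concatenate(([0.0], t)) + np.concatenate((t, [0.0])) + 1e-3
    return sp.diags([-t, main, -t], [-1, 0, 1], format="csr")

def block_jacobi(A, cuts):
    """Non-overlapping block Jacobi: exact LU solves on each partition's diagonal block."""
    blocks = np.split(np.arange(A.shape[0]), cuts)
    lus = [spla.splu(sp.csc_matrix(A[np.ix_(blk, blk)])) for blk in blocks]
    def apply(r):
        z = np.empty_like(r)
        for blk, lu in zip(blocks, lus):
            z[blk] = lu.solve(r[blk])
        return z
    return spla.LinearOperator(A.shape, matvec=apply)

perm = np.ones(200)
perm[90:110] = 1e-4                    # low-permeability streak
A = tpfa_1d(perm)
rhs = np.random.default_rng(0).standard_normal(200)

for cuts in ([100], [110]):            # streak split across blocks vs. kept in one block
    iters = []
    x, info = spla.gmres(A, rhs, M=block_jacobi(A, cuts), rtol=1e-10,
                         callback=lambda _: iters.append(1))
    print(f"cut at cell {cuts[0]}: {len(iters)} GMRES iterations (info={info})")
```

A production partitioner works on the transmissibility-weighted connectivity graph of an unstructured grid rather than a 1D chain, but the mechanism being probed is the same.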

Moderators

Simanti Das

Manager, High Performance Computing Software Development & Support, ExxonMobil Technical Computing Company

Thursday March 6, 2014 1:20pm - 1:40pm PST
BRC 282, Rice University, 6500 Main Street at University, Houston, TX 77030

1:40pm PST

Applications Session II: Strong Scalability of Reservoir Simulation on Massively Parallel Computers: Issues and Results, Vadim Dyadechko, ExxonMobil Upstream Research Company

PRESENTATION NOT AVAILABLE

VIDEO NOT AVAILABLE

Numerical simulation of reservoirs is an integral part of commercial development studies to optimize petroleum recovery. Modern petroleum reservoir simulation requires detailed, computationally expensive geological and physical models. Parallel reservoir simulators have the potential to solve larger, more realistic problems than previously possible. To make the solution of these large problems feasible, an efficient parallel implementation of the algorithm is necessary; this requires proper data structures and data layout, parallel direct and iterative solvers, and parallel preconditioners. Load balancing and minimization of communication between processors also play a very important role in achieving that goal. In this talk, we investigate the parallel performance of black oil reservoir simulation on multiple massively parallel computing architectures. A deliberate strategy of performance-based development of the major types of computations encountered in reservoir simulation programs is employed. Even though most operations are memory-bandwidth bound, it is possible, with careful implementation, to achieve excellent parallel efficiency out to several thousand cores. We discuss numerical issues, scalability and parallel efficiency of the reservoir simulator on several very large and geologically challenging examples.
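
The memory-bandwidth-bound observation can be checked with roofline-style arithmetic. The numbers below are illustrative assumptions, not figures from the talk; the point is that a sparse matrix-vector product moves several bytes per flop, so its attainable rate is set by bandwidth rather than by peak FLOP rate.

```python
# Roofline-style estimate for a bandwidth-bound sparse mat-vec (SpMV).
# All hardware and sparsity numbers below are illustrative assumptions.
bytes_per_nnz = 12       # 8-byte double value + 4-byte column index (CSR)
flops_per_nnz = 2        # one multiply + one add

stream_bw = 80e9         # assumed sustained memory bandwidth per node, bytes/s
peak = 500e9             # assumed peak double-precision FLOP rate per node

ai = flops_per_nnz / bytes_per_nnz           # arithmetic intensity, flop/byte
attainable = min(peak, ai * stream_bw)       # roofline bound
print(f"arithmetic intensity: {ai:.2f} flop/byte")
print(f"attainable SpMV rate: {attainable/1e9:.1f} GFLOP/s "
      f"= {100*attainable/peak:.1f}% of peak")
```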

Moderators

Simanti Das

Manager, High Performance Computing Software Development & Support, ExxonMobil Technical Computing Company

Thursday March 6, 2014 1:40pm - 2:00pm PST
BRC 282, Rice University, 6500 Main Street at University, Houston, TX 77030

2:00pm PST

Applications Session II: Automatic Performance Tuning of Reverse Time Migration Using The Abstract Data and Communication Library, Saber Feki, KAUST

DOWNLOAD PRESENTATION

WATCH VIDEO

With the increased complexity and diversity of mainstream HPC systems, significant effort is required to tune applications to achieve the best possible performance on each particular platform. This task is becoming more and more challenging and requires an ever larger set of skills. Automatic performance tuning is becoming a must for optimizing applications such as Reverse Time Migration (RTM), widely used in seismic imaging for oil and gas exploration. In the RTM application, the time-dependent partial differential acoustic wave equation is discretized in space and time, and the resulting system of linear equations is solved for each time step using an explicit scheme. The 3-D version of RTM is computationally intensive, and its execution time becomes reasonable for field data only with a parallel implementation using domain decomposition: the simulation grid is split, for each shot, into smaller 3-D blocks across multiple MPI processes. At each time step, the computation of the boundary grid points requires neighboring processes to exchange the values of the needed stencil points belonging to neighboring subdomains. Typical implementations use Message Passing Interface (MPI) routines for this data exchange and therefore incur extra execution time for the communication operations. The communication overhead that stems from the parallelization of the RTM algorithm can be considerably reduced using an auto-tuning tool such as the Abstract Data and Communication Library (ADCL) [1, 2]. ADCL is an MPI-based communication library that aims to provide the lowest possible execution time for communication operations and to ease the software development process through high data abstraction and predefined routines. ADCL allows the parallel code to adapt itself to the current architecture and software environment at runtime. The idea behind ADCL is to select the fastest of the available implementations for a given communication pattern during the (regular) execution of the application. For example, ADCL provides 20 different implementations of multi-dimensional (e.g., 2-D, 3-D) neighborhood communication, using different combinations of (i) the number of simultaneous communication partners, (ii) the handling of non-contiguous messages, and (iii) the MPI data transfer primitive. ADCL uses the first iterations of the application to determine the fastest neighborhood communication routine for the current execution conditions. Once performance data for a sufficient number of iterations is available, ADCL decides at runtime which alternative to use throughout the rest of the simulation. Using ADCL involves three main steps: preparation, communication and finalization. In this work, we showcase the performance benefit of auto-tuning the parallel RTM application. For that purpose, we implement two versions of the RTM code for each of (i) isotropic (ISO) and (ii) tilted transversely isotropic (TTI) media. The first version is the classic scenario, in which the commonly used MPI implementation of neighborhood communication is utilized. The second is the automatic performance-tuning version, in which ADCL is used to transparently select the best MPI implementation of neighborhood communication according to the runtime environment. The numerical scheme is finite difference, with a discretization of 2nd order in time and 8th order in space. We run the simulations for a total of 720 time steps.
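
To make the tuned communication pattern concrete, the skeleton below shows a plain hand-written 3-D halo exchange with mpi4py, i.e., the kind of fixed implementation that ADCL replaces with a runtime-selected one. It is a sketch, not ADCL's API or TOTAL's code: the local block size is invented, and the halo width of 4 matches an 8th-order spatial stencil.

```python
# Run with, e.g.: mpiexec -n 8 python halo3d.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 3)       # factor ranks into a 3-D grid
cart = comm.Create_cart(dims, periods=[False]*3)

H, n = 4, 64                                      # halo width (8th-order stencil), local cells
u = np.zeros((n + 2*H,)*3, dtype=np.float32)      # local block with ghost layers
u[H:-H, H:-H, H:-H] = cart.Get_rank()             # dummy interior data

def exchange(u):
    """One halo exchange: along each axis, swap H-thick slabs with both neighbors."""
    for axis in range(3):
        lo, hi = cart.Shift(axis, 1)              # ranks of the -axis and +axis neighbors
        def slab(a, b):                           # slice helper along `axis` only
            idx = [slice(None)]*3; idx[axis] = slice(a, b); return tuple(idx)
        # Send our high interior to the +neighbor; fill our low ghosts from the -neighbor.
        recv = np.ascontiguousarray(u[slab(0, H)])
        cart.Sendrecv(np.ascontiguousarray(u[slab(-2*H, -H)]), dest=hi,
                      recvbuf=recv, source=lo)
        if lo != MPI.PROC_NULL: u[slab(0, H)] = recv
        # Send our low interior to the -neighbor; fill our high ghosts from the +neighbor.
        recv = np.ascontiguousarray(u[slab(-H, None)])
        cart.Sendrecv(np.ascontiguousarray(u[slab(H, 2*H)]), dest=lo,
                      recvbuf=recv, source=hi)
        if hi != MPI.PROC_NULL: u[slab(-H, None)] = recv

exchange(u)   # in RTM this runs every time step, before the stencil update
```

ADCL's contribution is to time several such variants (different numbers of partners, packing strategies and MPI primitives) during the first iterations and keep the fastest.
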
We carry out our tests on two different parallel platforms at TOTAL E&P Research and Technology USA, LLC. The first cluster (Appro) is based on AMD CPUs, with 2 GB of memory per core and an InfiniBand DDR interconnect. The second (IBM) is an Intel-based cluster, with 3 GB of memory per core and an InfiniBand QDR interconnect. The InfiniBand network in both clusters has a fat-tree topology. We report the MPI communication times of both the ISO and TTI kernels, for both platforms and for each version of the code (with and without ADCL). The main advantage of using ADCL is performance, which here means decreasing the execution time of the communication operations. First, ADCL selects a different implementation of 3-D neighborhood communication for each execution environment and for each of the ISO and TTI kernels. Second, the auto-tuned versions using ADCL provide up to a 40% improvement in the communication time of RTM, as detailed in Figure 2. Another advantage of using ADCL is productivity: ADCL allows developers to implement the neighborhood-communication functions of the RTM algorithm very easily. The developer does not need to worry about the choice of MPI communication routines or the memory management required for the halo cells (handling non-contiguous data). By keeping track of the memory addresses of the data structures that are passed to the main RTM function, one can easily integrate ADCL into both the isotropic and tilted transversely isotropic RTM algorithms with minor changes to the original code. We are currently working on optimizing the MPI runtime parameters with the Open Tool for Parameters Optimization (OTPO) [3], which is based on ADCL, to further improve MPI communication performance. We are also looking into automatic tuning of the OpenACC-accelerated kernels on the latest NVIDIA GPUs; encouraging preliminary results will be presented [4].

References:
[1] E. Gabriel, S. Feki, K. Benkert, M. Chaarawi. The Abstract Data and Communication Library. Journal of Algorithms and Computational Technology, Vol. 2, No. 4, pp. 581-600, December 2008.
[2] E. Gabriel, S. Feki, K. Benkert, M. Resch. Towards Performance and Portability through Runtime Adaption for High Performance Computing Applications. Concurrency and Computation: Practice and Experience, Vol. 22, No. 16, pp. 2230-2246, 2010.
[3] M. Chaarawi, J. Squyres, E. Gabriel, S. Feki. A Tool for Optimizing Runtime Parameters of Open MPI. Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, Vol. 5205, pp. 210-217, 2008.
[4] S. Feki, S. Siddiqui. "Towards Automatic Performance Tuning of OpenACC Accelerated Scientific Applications." NVIDIA GPU Technology Conference, San Jose, California, USA, March 18-21, 2013.

Acknowledgement: This work was done while Hakan Haberdar was an intern and Saber Feki was an employee at TOTAL E&P USA Research & Technology. The authors thank Total for supporting this work, and senior HPC advisor Terrence Liao for his help and guidance.

Moderators

Simanti Das

Manager, High Performance Computing Software Development & Support, ExxonMobil Technical Computing Company

Speakers

Saber Feki

Computational Scientist, KAUST Supercomputing Laboratory
Saber Feki received his M.S. and Ph.D. in computer science from the University of Houston in 2008 and 2010, respectively. In 2011, he joined the oil and gas industry with TOTAL as an HPC Research Scientist working on seismic imaging applications using different programming models including...


Thursday March 6, 2014 2:00pm - 2:20pm PST
BRC 282, Rice University, 6500 Main Street at University, Houston, TX 77030

2:20pm PST

Applications Session II: Hybrid CPU-GPU Finite Difference Time Domain Kernels, Thor Johnsen, Chevron

DOWNLOAD PRESENTATION

WATCH VIDEO

If you are developing finite difference kernels, you’ve probably considered whether to use CPUs or GPUs. This poses an interesting dilemma: CPUs have larger memory capacity, but GPUs are more cost effective. Which do you value more? This talk explores a design pattern for FDTD kernels that lets you have both: the cost efficiency of GPUs combined with the memory capacity of CPUs. We show that this approach can be used to implement any FDTD kernel, including elastic tilted orthorhombic. We further show that the elastic tilted orthorhombic kernel we implemented can propagate volumes with tens of billions of cells running on a single K10 GPU, and that it scales up to 64 GPUs with almost linear scaling in throughput.
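
The host-streaming pattern behind such hybrid kernels can be sketched as follows. This is our reconstruction of the general out-of-core technique, not Chevron's implementation: the full volume stays in host memory and halo-padded slabs are staged through a device-sized buffer, with a toy averaging kernel standing in for the FDTD update and all sizes invented for the example.

```python
import numpy as np

# The full (padded) volume lives in host memory; the device working set
# only ever holds one halo-padded slab. All sizes here are illustrative.
NZ, NY, NX = 512, 256, 256   # interior volume (tens of billions of cells in practice)
SLAB, H = 64, 1              # slab depth that fits on the GPU; stencil halo in z

host_in = np.zeros((NZ + 2*H, NY, NX), dtype=np.float32)
host_in[H:-H] = np.random.rand(NZ, NY, NX).astype(np.float32)
host_out = np.zeros_like(host_in)

def device_kernel(slab):
    """Stand-in for the GPU FDTD update: a toy 3-point average along z (H = 1)."""
    return (slab[:-2] + slab[1:-1] + slab[2:]) / 3.0

for z0 in range(H, NZ + H, SLAB):
    z1 = min(z0 + SLAB, NZ + H)
    # 1) Stage a halo-padded slab host->device (a real kernel double-buffers
    #    this copy so it overlaps with compute on the previous slab).
    slab = host_in[z0 - H : z1 + H]
    # 2) Run the stencil on the device-resident slab.
    out = device_kernel(slab)
    # 3) Copy back only the interior cells this slab owns.
    host_out[z0:z1] = out

print(host_out[H:-H].mean())
```

When the staging copies are overlapped with compute, the GPU stays busy while capacity is bounded by host memory, which is the trade-off the talk describes.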

Moderators

Simanti Das

Manager, High Performance Computing Software Development & Support, ExxonMobil Technical Computing Company

Thursday March 6, 2014 2:20pm - 2:40pm PST
BRC 282, Rice University, 6500 Main Street at University, Houston, TX 77030