Applications Session I: HPC for Engineering Simulation of Expandable Liner Hanger Systems, Ganesh Nanaware, Baker Hughes

**DOWNLOAD PRESENTATION**

**WATCH VIDEO**

**Abstract**

Expandable liner hangers used for wellbore construction in the oil and gas industry are complex mechanical systems. Finite Element Analysis (FEA) based engineering simulation for the design and development of expandable liner hanger systems is an important activity for reducing the time and cost of introducing a reliable and robust product to a competitive market. The complex nonlinear plastic material behavior and physical interaction during setting of the expandable liner hanger require powerful High-Performance Computing (HPC) infrastructure to solve the large, complex FEA simulation models. This presentation summarizes the HPC infrastructure at Baker Hughes and how it is being used to perform engineering simulations that drive the product design for expandable liner hanger systems. The HPC resource at Baker Hughes adds value to the design process by enabling greater simulation throughput: engineering teams can analyze not just a single design idea but many design variations faster. By simulating multiple design ideas concurrently, design teams can identify dramatic engineering improvements early in the design process, earlier and more effectively than physical prototyping alone. HPC specifically enables parallel processing to solve the toughest, higher-fidelity FEA models, including more geometric detail, larger systems, and more complex physics. In summary, HPC helped us understand detailed product behavior, build confidence in the design, and achieve a significant reduction in product development time and cost.

**Moderators**
## Henri Calandra

**Speakers**
## Ganesh Nanaware



Henri Calandra obtained his M.Sc. in mathematics in 1984 and a Ph.D. in mathematics in 1987 from the Université des Pays de l'Adour in Pau, France. He joined Cray Research France in 1987 and worked on seismic applications. In 1989 he joined the applied mathematics department of...

Thursday March 6, 2014 1:00pm - 1:20pm PST

BRC 284, Rice University, 6500 Main Street at University, Houston, TX 77030


Applications Session I: Developing High Compliance, High Strength Well Cement for Extreme Conditions: A Multiscale Computational Approach, Rouzbeh Shahsavari, Rice University

**PRESENTATION NOT AVAILABLE**

**VIDEO NOT AVAILABLE**

It is challenging to develop bulk materials that exhibit high compliance, high strength, and high recoverable strain because of the intrinsic trade-offs among these properties. High compliance in a single-phase material usually means weak interatomic bonding and thus low strength. One of the urgent applications of high-compliance, high-strength materials is in well cementing, used in hydraulic fracturing and generally in all oil and gas wells, where the cement is placed in the annular gap between the drilled formation and the steel casing. Despite the critical role of wellbore cement in providing zonal isolation and securing the casing, cement failure is still a serious problem with enormous socioeconomic impacts (e.g., the 2010 oil spill disaster in the Gulf of Mexico, and groundwater contamination via cement failure in hydraulic fracturing). Wellbore cement frequently fails in a brittle mode due to downhole pressure or formation loading (creep). Given the extreme downhole conditions, to date there is neither a unified understanding nor a reliable methodology to divert this brittle fracture mechanism, a lack of knowledge that can cost billions of dollars and carry huge environmental impacts. In this talk, I will describe a novel multiscale computational method to develop a high-compliance, high-strength wellbore cement, where the high compliance assures ductility to accommodate pressure buildup and the high strength prevents premature failure. First, I will describe how state-of-the-art computational atomistic modeling techniques can be used to decode the basic molecular structure of a series of cement hydrate compositions. Second, combinatorial techniques will be utilized to tune the molecular features, self-assembly, and aggregation of cementitious materials, thereby providing a more coherent microstructure.
Third, modern optimization methods such as level-set and phase-field methods (based on solving partial differential equations) will be used to simultaneously maximize the ductility and strength of the microstructure by modulating the topology and multi-phase heterogeneity of the material. Together, these multiscale, multi-paradigm methods make it possible to rapidly screen several fundamental physical properties at extreme conditions (e.g., HTHP, corrosive environments) to find the best-in-class microstructure candidates for accelerated well-cementing material discovery. Finally, I will discuss various benefits of such modern computational techniques (e.g., guiding the synthesis of the proof-of-concept high-compliance, high-strength prototype) with an outlook toward substantially minimizing conventional trial-and-error experiments.
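As a minimal, hypothetical illustration of the phase-field machinery mentioned above (not the speaker's coupled ductility/strength optimization), the sketch below relaxes a two-phase profile under a 1D Allen-Cahn equation, the canonical PDE behind phase-field microstructure evolution; all parameters are textbook choices:

```python
# 1D Allen-Cahn relaxation: phi_t = eps^2 * phi_xx + phi - phi^3.
# phi = -1 and phi = +1 mark the two material phases; the sharp initial
# interface relaxes toward a smooth tanh-like profile.  Grid size, time
# step, and interface parameter are illustrative, not from the talk.

N, DX, DT, EPS2 = 64, 1.0, 0.1, 1.0

phi = [(-1.0 if i < N // 2 else 1.0) for i in range(N)]  # two phases

def step(phi):
    """One explicit Euler step with periodic boundaries."""
    new = phi[:]
    for i in range(N):
        lap = (phi[(i + 1) % N] - 2.0 * phi[i] + phi[(i - 1) % N]) / DX ** 2
        new[i] = phi[i] + DT * (EPS2 * lap + phi[i] - phi[i] ** 3)
    return new

for _ in range(200):
    phi = step(phi)
# Away from the interfaces the phases stay at -1 and +1; the jumps smooth out.
```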

**Moderators**
## Henri Calandra

**Speakers**
## Rouzbeh Shahsavari




Professor, Rice University

My interest is in developing a multi-scale, multi-paradigm materials modeling approach followed by experimental characterizations to study key functional behavior of complex materials, which are critical to the infrastructure underlying the science and technology enterprises of our...

Thursday March 6, 2014 1:20pm - 1:40pm PST

BRC 284, Rice University, 6500 Main Street at University, Houston, TX 77030


Applications Session I: High order methods for reservoir flows, Beatrice Riviere, Rice University

**DOWNLOAD PRESENTATION**

**WATCH VIDEO**

We propose a numerical method for solving the miscible displacement problem with a discontinuous Galerkin method in space and an implicit Runge-Kutta method in time. The method approximates the fluid pressure and the resident fluid concentration by polynomials of arbitrary order. Our algorithm preserves the high-order approximation in both space and time while reducing the computational cost through a decoupling strategy for the pressure and concentration equations. The parallelization of the algorithm has been developed in the Dune framework. Convergence and robustness of the method are demonstrated on several numerical examples.
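The decoupling strategy can be illustrated on a hypothetical two-variable toy problem (the talk applies it to DG discretizations of the full pressure and concentration equations): each implicit step first solves the pressure-like unknown with the concentration lagged from the previous step, then updates the concentration-like unknown with the fresh pressure.

```python
# Decoupled backward-Euler step for the toy coupled system
#   dp/dt = -p + c   ("pressure" equation)
#   dc/dt = -c + p   ("concentration" equation).
# Solving p with c lagged, then c with the new p, avoids a coupled solve
# while each sub-solve stays implicit.  The system itself is hypothetical.

def decoupled_step(p, c, dt):
    # Pressure solve with lagged concentration: p_new = p + dt*(-p_new + c)
    p_new = (p + dt * c) / (1.0 + dt)
    # Concentration solve using the freshly computed pressure
    c_new = (c + dt * p_new) / (1.0 + dt)
    return p_new, c_new

p, c = 1.0, 0.0
for _ in range(100):
    p, c = decoupled_step(p, c, 0.1)
# Both unknowns relax toward the shared equilibrium p == c.
```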

**Moderators**
## Henri Calandra

**Speakers**
## Beatrice Riviere




Thursday March 6, 2014 1:40pm - 2:00pm PST

BRC 284, Rice University, 6500 Main Street at University, Houston, TX 77030


Applications Session I: A Hybrid Algorithm for Global Optimization Problems, Leticia Velazquez, Rice University

**PRESENTATION NOT AVAILABLE**

**VIDEO NOT AVAILABLE**

We propose a hybrid algorithm for solving global optimization problems that is based on the coupling of the Simultaneous Perturbation Stochastic Approximation (SPSA) and Newton-Krylov Interior-Point (NKIP) methods via a surrogate model. There exist verified algorithms for finding approximate global solutions, but our technique will further guarantee that such solutions satisfy the physical bounds of the problem. First, the SPSA algorithm conjectures regions where a global solution may exist. Next, some data points from the regions are selected to generate a continuously differentiable surrogate model that approximates the original function. Finally, the NKIP algorithm is applied to the surrogate model subject to bound constraints to obtain a feasible approximate global solution. We present numerical results on a set of five small problems and two medium- to large-scale applications from reservoir simulation.
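On a toy one-dimensional problem, the three stages look roughly like the sketch below. The objective, bounds, gains, and the closed-form Newton step are all illustrative stand-ins; in particular, the final stage substitutes a plain bound-projected Newton update for the actual NKIP solver, and the "surrogate" here is simply the exact quadratic.

```python
import random

def f(x):
    return (x - 2.0) ** 2 + 1.0       # hypothetical objective, minimum at x = 2

LO, HI = 0.0, 5.0                     # physical bounds on the solution

def spsa(x, iters=200, a=0.1, c=0.01, seed=7):
    """Stage 1: SPSA search -- one simultaneous +/- perturbation per step."""
    rng = random.Random(seed)
    for _ in range(iters):
        delta = rng.choice((-1.0, 1.0))
        g = (f(x + c * delta) - f(x - c * delta)) / (2.0 * c * delta)
        x = min(max(x - a * g, LO), HI)    # keep iterates inside the bounds
    return x

x0 = spsa(4.5)                        # stage 1 lands near the global minimum
# Stages 2-3: on a quadratic surrogate g(x) = (x - 2)^2 + 1, take one Newton
# step (gradient 2*(x - 2), Hessian 2) and project onto [LO, HI].
x_star = min(max(x0 - 2.0 * (x0 - 2.0) / 2.0, LO), HI)
```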

**Moderators**
## Henri Calandra


**Speakers**
## Leticia Velazquez




Thursday March 6, 2014 2:00pm - 2:20pm PST

BRC 284, Rice University, 6500 Main Street at University, Houston, TX 77030


Applications Session I: Approaches to Lattice Boltzmann Method (LBM) Implementation: A Case Study, Deepak Majeti, Rice University

**DOWNLOAD PRESENTATION**

**WATCH VIDEO**

In this work, we present different implementations of the Lattice Boltzmann Method (LBM) algorithm for fluid dynamics simulation, parallelized from a single sequential implementation. We discuss versions of the algorithm that adapt to diverse CPU-GPU hardware as well as versions fine-tuned for a GPU cluster. We compare different programming approaches, compiler optimizations, and scheduling techniques for the LBM implementation, and we successfully reduce the execution time of the simulation from 5 days on a single CPU node to just 200 seconds on a single GPU node. LBM simulation is a widely used technique for observing the free flow of fluid through porous media. The oil and gas industry uses this technique to measure the porosity of a rock, a determining factor in choosing a well for oil and natural gas extraction. The process of choosing a well involves making a preliminary analysis of the rock in the field and then processing it further in the research lab for greater accuracy. The field version of LBM is usually run on a single available desktop system and should be portable with reasonable accuracy, while the implementation for the research laboratory must be fine-tuned to a particular research cluster and must achieve maximum performance and precision. LBM simulation is computationally intensive: for certain rocks, computing a 300x300x300 grid is equivalent to simulating a 3000x3000x3000 cube of atoms and takes around 5 days to measure the porosity of the rock with reasonable accuracy. Recently, high-performance heterogeneous architectures have become ubiquitous and are being widely adopted by many industries, including the oil and gas industry. However, extracting maximum performance from these heterogeneous architectures is non-trivial and requires rigorous training. In our work, we show simple yet powerful approaches to take advantage of these modern heterogeneous architectures and achieve good performance.
LBM has two main kernels: collision and propagation. The sequential version of the LBM code takes 20 seconds per iteration for a 300x300x300 grid of single-precision values on a single CPU node. One of our optimizations merges the collision and propagation kernels into a single loop in order to avoid loading and storing the entire grid from memory between invocations of the kernels; merging the kernels reduces the time per iteration to 12 seconds. Next, we employ array linearization and loop normalization optimizations, reducing the time per step to 6.1 seconds. Finally, we fine-tune the application by building a sparse version of the merged collision and propagation kernels. An OpenMP version of this optimized kernel takes 1 second per time step on a 12-core CPU node, while an OpenCL version takes 0.02 seconds per time step on a single GPU node. We implemented the cluster version of LBM using MPI + OpenCL; the cluster implementation enables LBM simulation over very large grid sizes. Further, we experiment with different optimizations using Habanero-C, a portable language for heterogeneous (CPU-GPU) architectures whose "forasync" construct is compiled down to OpenCL. Merging the collision and propagation kernels requires a shadow copy of the grid to store the new values computed in each time step. This constrains the grid size of the simulation, since GPU memory is often limited on a single device. When the entire grid does not fit in the available GPU memory, we partition the grid and perform the computation on one partition at a time. This requires data to be copied to and from the device, so we overlap the communication of data with kernel computation to hide the data-copying overhead. The propagation kernel has conditional statements to handle the boundary cells. We strip-mine the kernel into computation on the inner grid and computation on the boundaries.
The inner grid, which is free of conditionals, is efficiently executed on the GPU, while the boundary cells, which have conditionals, are executed on the CPU. We also experiment with changing the data layout of the grid and observe the performance on various heterogeneous hardware. We conclude with a discussion of the performance of our implementations on recent heterogeneous (CPU-GPU) devices from AMD, Intel, and NVIDIA. We are in the process of evaluating the performance of these versions on various heterogeneous hardware, including AMD APU and discrete devices, Intel IVB devices, and NVIDIA Fermi and Kepler devices.
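The central fusion optimization can be sketched on a minimal 1D two-velocity lattice; the real code operates on a 3D grid with OpenMP/OpenCL/MPI parallelism, so the lattice size, relaxation rate, and boundary treatment below are illustrative choices only:

```python
# Merged collision + propagation for a 1D two-velocity lattice: each cell is
# read once, relaxed toward its local equilibrium (BGK-style), and the
# post-collision values are streamed directly into shadow copies of the grid,
# as the abstract describes.  Periodic boundaries keep the sketch short.

N, OMEGA = 16, 1.0                       # lattice sites, relaxation rate

def fused_step(f_right, f_left):
    new_r = [0.0] * N                    # shadow copies receiving the
    new_l = [0.0] * N                    # streamed post-collision values
    for i in range(N):
        rho = f_right[i] + f_left[i]     # local density
        eq = 0.5 * rho                   # zero-velocity equilibrium, split evenly
        post_r = f_right[i] + OMEGA * (eq - f_right[i])   # collide in place...
        post_l = f_left[i] + OMEGA * (eq - f_left[i])
        new_r[(i + 1) % N] = post_r      # ...and stream right immediately
        new_l[(i - 1) % N] = post_l      # ...and stream left immediately
    return new_r, new_l

f_r = [1.0] + [0.5] * (N - 1)            # density bump at site 0
f_l = [0.5] * N
for _ in range(50):
    f_r, f_l = fused_step(f_r, f_l)
# Total mass is conserved while the bump diffuses through the lattice.
```

Fusing the two kernels this way halves the grid traffic per time step, which is exactly why the merged version runs faster in the measurements quoted above.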

**Moderators**
## Henri Calandra


**Speakers**
## Deepak Majeti



Thursday March 6, 2014 2:20pm - 2:40pm PST

BRC 284, Rice University, 6500 Main Street at University, Houston, TX 77030
