
Wednesday, March 5
 

2:00pm

Pre-Workshop Tutorial
Wednesday March 5, 2014 2:00pm - 5:30pm
BRC 280 Rice University 6500 Main Street at University, Houston, TX 77030

6:00pm

Pre-Workshop Networking Reception
Wednesday March 5, 2014 6:00pm - 8:00pm
BRC
 
Thursday, March 6
 

7:00am

8:15am

Opening and Welcome: Jan E. Odegard, Executive Director, Ken Kennedy Institute for Information Technology and Edwin 'Ned' Thomas, William & Stephanie Sick Dean of Engineering, George R. Brown School of Engineering

Moderators

Jan Odegard

Executive Director, Ken Kennedy Institute for Information Technology, Rice University
Jan E. Odegard joined the Ken Kennedy Institute for Information Technology (formerly the Computer and Information Technology Institute) at Rice University as Executive Director in 2002. In this role he led the development and deployment of large-scale computing resources in support of research. Today, the computational resources deployed at Rice support the research of over 100 faculty members and close to 500 users. The majority of users are... Read More →

Speakers

Thursday March 6, 2014 8:15am - 8:30am
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

8:30am

Keynote: The never-ending story of IT/HPC evolution and its effect on our business, Peter Breunig, GM Technology Management and Architecture ITC, Chevron

Moderators

Chap Wong

Chevron
Chap Wong is a Chevron Fellow and is recognized as Chevron’s thought leader in high-performance computing. He is currently a member of the Strategy, Architecture and Emerging Technology Team in Chevron Energy Technology Company’s Technical Computing Department. Chap is engaged in market evaluation, proof of concept and deployment of the emerging technologies required to maximize the performance of Chevron’s HPC environment.
He has over... Read More →

Speakers

Peter Breunig

Peter Breunig is the General Manager of Technology Management and Architecture at Chevron IT. In his role, he is responsible for Chevron’s IT technical direction and strategy, emerging IT, and enterprise architecture. He spent his first 20 years at Chevron in upstream technical support, applications and research. Peter earned a Bachelor of Science degree in Geology from Boston College in 1977 and a... Read More →


Thursday March 6, 2014 8:30am - 9:30am
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

9:30am

Plenary: Some Assembly Required? Thoughts on Programming at Extreme Scale, David Bernholdt, Senior R&D Staff Member, Computer Science and Mathematics Division, Oak Ridge National Laboratory

DOWNLOAD PRESENTATION

WATCH VIDEO

Historically, getting the utmost performance from a computer has often required programming at very low levels, such as assembly language. Lately, with supercomputer architectures relatively stable for a long period, we seem to have gotten away from this. But now we are entering a new period of diversity and rapid change in extreme-scale systems. What are the prospects for programming such systems? I will present an overview of current practice and research directions in programming environments at ORNL and the broader DOE community targeting computational science and engineering applications on the largest available computer systems. I will touch upon work in programming languages and related tools, communication middleware, operating systems, fault tolerance, and the engineering of scientific software. Much of this work focuses on providing higher levels of abstraction to software developers, while encapsulating and hiding the most performance-sensitive details.

Moderators

Chap Wong

Chevron

Speakers

David Bernholdt

Distinguished R&D Staff Member, Oak Ridge National Laboratory
David Bernholdt is a Distinguished R&D Staff Member at Oak Ridge National Laboratory. He is Group Leader for the Computer Science Research Group in the Computer Science and Mathematics Division as well as the lead for Programming Environment and Tools for the Oak Ridge Leadership Computing Facility (OLCF). His research interests are in programming environments for high-performance scientific computing, broadly... Read More →


Thursday March 6, 2014 9:30am - 10:10am
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

10:10am

Networking and Break
Thursday March 6, 2014 10:10am - 10:50am
BRC

10:50am

Plenary: 'Letting the Data and the Water Flow,' Local and National DOE Preparations for Exascale, Randal Rheinheimer, Deputy Division Leader, High Performance Computing at LANL

Moderators

Chap Wong

Chevron

Speakers

Randal Rheinheimer

Deputy Division Leader, High Performance Computing at LANL


Thursday March 6, 2014 10:50am - 11:30am
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

11:30am

Lightning Talk: Reverse Time Migration with Manycore Coprocessors, Leonardo Borges, Intel

Moderators
Speakers

Leo Borges

Sr. Staff Engineer, Intel


Thursday March 6, 2014 11:30am - 11:35am
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

11:35am

Lightning Talk: Accelerating Reverse Time Migration: A Dataflow Approach, Hicham Lahlou, Xcelerit

PRESENTATION NOT AVAILABLE

VIDEO NOT AVAILABLE

As the age of harvesting easily accessible Oil and Gas resources is coming to an end, more complex geologies have to be explored to find new reservoirs. These geologies often violate the assumptions underlying the Kirchhoff Time Migration (KTM) algorithm, calling for more complex algorithms to reconstruct the Earth's subsurface from seismic wave measurement data. Hence, Reverse Time Migration (RTM) is the current state of the art algorithm for seismic imaging, giving more accurate 2D and 3D images of the subsurface than KTM. Until recently, the enormous computational complexity involved hindered the widespread application of the RTM algorithm in the industry. With hardware advances of multi-core CPUs as well as increased use of high performance accelerator processors such as GPUs or the Xeon Phi, it is now possible to reconstruct subsurface images within reasonable time frames. However, most programming approaches available for these processors do not provide enough hardware abstraction for end-users, i.e., geophysicists. This poses a significant barrier to adopting advanced HPC hardware and using it efficiently. We briefly explain the RTM algorithm and how it is typically implemented. The algorithm is analyzed to identify the key performance bottlenecks, both for computation and data access. The main implementation challenges are detailed, such as managing the data, parallelizing and distributing the computation, and exploiting hardware capabilities of multi-core CPUs, GPUs, and Xeon Phi. To cope with these challenges, we propose to model the RTM as a dataflow graph and automate the performance optimizations and execution management. Dataflow graphs are directed graphs of processing stages (actors), where data is streamed along the edges and processed by the actors. This model exposes several types of parallelism and optimization opportunities, such as pipeline parallelism, data parallelism, and memory locality. Using this model, programmers can focus on the algorithm itself and the performance optimizations and execution management can be left to an automated tool. Further, the actors themselves can be implemented independently of the execution device, enabling code portability between different hardware. We give a mapping of RTM algorithms to a dataflow graph and show that this is independent of the target execution hardware. The full algorithm is captured in the model, and data and task dependencies are fully exposed - without explicitly using parallel programming concepts. The benefits of this approach and how it can overcome the implementation challenges mentioned earlier are explained in detail. Using an example implementation, important aspects of the execution management, such as memory access patterns, data transfers, cache efficiency, and asynchronous execution are detailed. We give mappings of these aspects to multi-core CPUs, GPUs, and Xeon Phi, explaining the similarities and differences. As typical systems have more than one accelerator processor, we also cover scheduling dataflow graphs to multiple execution devices. As a practical example, we use the Xcelerit SDK as an implementation framework that is based on a dataflow programming model. It exploits the mentioned optimization opportunities and abstracts the hardware specifics from the user. The performance has been measured for both multi-core CPUs and GPUs for a range of algorithm parameters. 
It is within 5% of equivalent hand-tuned implementations of the algorithm, but achieved with a significantly lower implementation effort. This shows the potential of employing a dataflow approach for RTM.
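To make the compute stages concrete, here is a minimal C sketch of the two core RTM kernels that a dataflow runtime of this kind would wrap as actors and stream wavefield tiles between: one explicit wave-propagation time step and the zero-lag cross-correlation imaging condition. This is an illustration only, not the Xcelerit SDK API; the function names, the 2nd-order stencil, and the 2-D layout are simplifying assumptions.

```c
/* Illustrative sketch only: two RTM kernels a dataflow runtime could wrap as
 * actors.  Not the Xcelerit SDK API; names are hypothetical, the stencil is a
 * minimal 2nd-order example, and the grid is 2-D (nx * nz) for brevity. */
#include <stddef.h>

/* Actor 1: one explicit time step of the acoustic wave equation
 * (2nd order in time and space, grid spacing h, time step dt). */
static void propagate_step(const float *p_prev, const float *p_cur,
                           float *p_next, const float *vel,
                           int nx, int nz, float dt, float h)
{
    for (int ix = 1; ix < nx - 1; ++ix) {
        for (int iz = 1; iz < nz - 1; ++iz) {
            size_t i = (size_t)ix * nz + iz;
            float lap = (p_cur[i - nz] + p_cur[i + nz] +
                         p_cur[i - 1]  + p_cur[i + 1] -
                         4.0f * p_cur[i]) / (h * h);
            float c = vel[i] * dt;
            p_next[i] = 2.0f * p_cur[i] - p_prev[i] + c * c * lap;
        }
    }
}

/* Actor 2: zero-lag cross-correlation imaging condition, accumulating the
 * product of source and receiver wavefields into the image. */
static void imaging_condition(const float *src_wf, const float *rcv_wf,
                              float *image, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        image[i] += src_wf[i] * rcv_wf[i];
}
```

In the dataflow view, these functions become actors connected by edges along which wavefield tiles are streamed, so the runtime can overlap propagation, imaging and data movement without the application code using explicit parallel constructs.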

Moderators
Speakers

Thursday March 6, 2014 11:35am - 11:40am
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

11:40am

Lightning Talk: Accelerating Compute Intense Applications, Geoff Clark, Acceleware Ltd.

Moderators
Speakers

Thursday March 6, 2014 11:40am - 11:45am
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

11:45am

Lightning Talk: Automatic Generation of 3-D FFTs, Brian Duff, SpiralGen, Inc.

DOWNLOAD PRESENTATION

WATCH VIDEO

Automatic Generation of 3-D FFTs
Brian Duff[a], Jason Larkin[a], Mike Franusich[a], Franz Franchetti[a][b]
[a] SpiralGen, Inc. [b] Dept. of Electrical and Computer Engineering, Carnegie Mellon University

BACKGROUND
Parallel software development is notoriously difficult. The quest for exascale computing has led to fast-changing, increasingly complex, and diverse supercomputing architectures, which poses a central problem in parallel scientific computing: how can portable optimal performance be achieved with reasonable effort? One possible solution is to generate highly optimized code from a high-level specification. Spiral [1, 2] is such a tool for the performance-critical domain of linear transforms, such as the ubiquitous Fourier transform. For a specified transform, Spiral automatically generates high performance code that is tuned to a given architecture. Spiral formulates the tuning as an optimization problem, and exploits the domain-specific mathematical structure of transform algorithms to implement a feedback-driven optimizer. Similar to a human expert, for a specified transform, Spiral “intelligently” generates and explores algorithmic and implementation choices to find the best match to the computer’s micro-architecture. The “intelligence” is provided by a search and learning technique that exploits the structure of the algorithm and implementation space to guide the exploration and optimization. Spiral generates high performance code for a broad set of transforms including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms. Experimental results show that the code generated by Spiral competes with, and consistently outperforms, the best available human-tuned library code. In this work we extend Spiral to the computer generation of 3-D FFT code crucial in oil exploration and other domains.

RESULTS
We present results obtained by SpiralGen, Inc., the corporate face of Spiral, on a Blue Gene/Q system for a three-dimensional fast Fourier transform (FFT). While Spiral can generate code for a range of different transform algorithms, the FFT is chosen as an example because of its ubiquitous application in diverse scientific fields, including oil exploration. The code was generated for one node of an IBM Blue Gene/Q with up to 64 threads in a batch filling half of the node’s memory. The FFT was performed on n x n x n data cubes of varying size n and the performance is reported in giga-floating point operations per second (Gflop/s). Figure 1 shows a comparison of Spiral-generated code results against FFTW, a well-known C library for calculating FFTs. The Spiral-generated code is consistently more than two times faster. Reasons include FFTW’s possibly suboptimal support for Blue Gene’s vector extensions, and Spiral’s ability to do detailed tuning for vector extensions and multicore. Figure 2 compares Spiral-generated code with IBM’s own Engineering and Scientific Subroutine Library (ESSL). The ESSL implementation has overhead which causes it to be non-competitive at small FFT sizes. While ESSL is competitive at some larger data sizes, the results are again consistently below those from the Spiral-generated code.

CONCLUSIONS
We show results with highly optimized 3-D FFT code generated by Spiral for a Blue Gene/Q platform. The focus was on a single node and we demonstrated significant speed-up compared to alternative libraries due to full support of all 64 threads and Blue Gene’s vector extension. An extension to more nodes is possible, as we already did for 1-D FFTs as part of winning the HPC Challenge supporting 128k cores [3]. We note that all Spiral code is fully generated. This implies that customization (e.g., when parts of the input are known to be zero) or porting (to a future Blue Gene platform) can be done quickly by retargeting the generator.

REFERENCES
1. Markus Püschel, José M. F. Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo. SPIRAL: Code Generation for DSP Transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation," Vol. 93, No. 2, 2005, pp. 232-275.
2. Markus Püschel, Franz Franchetti, and Yevgen Voronenko. Spiral. In Encyclopedia of Parallel Computing, Ed. David Padua, Springer, 2011.
3. Franz Franchetti, Yevgen Voronenko, and Gheorghe Almasi. Automatic Generation of the HPC Challenge's Global FFT Benchmark for BlueGene/P. High Performance Computing for Computational Science – VECPAR 2012, Eds. Michel Daydé, Osni Marques, Kengo Nakajima, Springer, 2013, pp. 187-200.
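For reference, the sketch below shows the kind of FFTW baseline the Spiral-generated code is compared against, using FFTW's documented fftw_plan_dft_3d interface for an n x n x n complex transform. It is a minimal, hedged example (the cube size and planner flags are arbitrary choices); the Spiral-generated kernels themselves are produced by the generator and are not reproduced here.

```c
/* Minimal FFTW baseline for an in-place n x n x n complex-to-complex FFT.
 * Compile with -lfftw3 -lm.  Cube size and flags are illustrative only. */
#include <fftw3.h>
#include <stdlib.h>

int main(void)
{
    const int n = 128;                         /* cube edge length */
    const size_t total = (size_t)n * n * n;

    fftw_complex *data = fftw_malloc(total * sizeof(fftw_complex));
    if (!data) return 1;

    /* Plan once (FFTW searches among implementation variants, much as Spiral
     * searches its algorithm/implementation space), then fill and execute.
     * FFTW_MEASURE may overwrite the array during planning, so initialize
     * the data only after the plan is created. */
    fftw_plan plan = fftw_plan_dft_3d(n, n, n, data, data,
                                      FFTW_FORWARD, FFTW_MEASURE);
    for (size_t i = 0; i < total; ++i) { data[i][0] = 1.0; data[i][1] = 0.0; }
    fftw_execute(plan);

    fftw_destroy_plan(plan);
    fftw_free(data);
    return 0;
}
```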

Moderators
Speakers

Brian Duff

Software Engineer, SpiralGen Inc.


Thursday March 6, 2014 11:45am - 11:50am
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

11:50am

Lightning Talk: HueSpace: The next generation software development platform for E&P Visual Computing, Michele Isernia, HUE AS

DOWNLOAD PRESENTATION

WATCH VIDEO

Building modern, high performance and interactive visual computing software is quite difficult, and this is proven by the major technology gap between the E&P software currently in use and the latest computing technologies available. HueSpace comes from the 3D interactive gaming industry and is being adopted by major commercial ISVs and oil majors to develop the next generation of E&P software. HUE has been developing HueSpace since 2001; the platform is solid and validated in production environments, and it is also broad, spanning from Seismic to Reservoir to Drilling and Production. In the last 10 years the 3D gaming industry adopted the "engine" model, where a single engine controls visualization, computation and large data streaming. This model has enabled the gaming industry to expand and grow tremendously. HueSpace is the only commercial solution bringing this model to the E&P industry. HueSpace enables practically unlimited data size, utilizing intelligent streaming and advanced wavelet compression to stream data on demand and apply advanced computing algorithms to the data "in flight", driven by the interactive user experience and workflow. This approach is so powerful and so different that it is literally changing many of the traditional workflows in E&P. HueSpace takes care of all the data management around computing and visualization, automatically takes advantage of multiple accelerators, and handles the data decomposition required. During the presentation we will cover the core architecture and programming model. We will then demonstrate an application that will handle massive TB datasets, apply interactive computing and visualization that normally requires cluster computing and multiple hours or days to solve, and show some of the most advanced 3D visualization available to date. We will also demo the same application working in the cloud and collaboratively across multiple users, from laptops, browsers, tablets, etc. HueSpace supports Linux and Windows, C, C++, .Net/C#, Java and Python, and can be used to develop brand new interactive visual applications as well as extend existing software. HueSpace supports hybrid architectures, enabling GPU computing and other accelerators.

Moderators
Speakers

Thursday March 6, 2014 11:50am - 11:55am
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

11:55am

Lightning Talk: Kalray MPPA-256 scalable compute cartridge: an efficient architecture applied to Oil & Gas HPC, Benoît Ganne, Kalray SA

DOWNLOAD PRESENTATION

WATCH VIDEO

Kalray MPPA-256 scalable compute cartridge: an efficient architecture applied to Oil & Gas HPC
Benoît Ganne, Christian Chabrerie, Thierry Strudel
benoit.ganne@kalray.eu, christian.chabrerie@kalray.eu, thierry.strudel@kalray.eu

Introduction
Kalray MPPA-256 is a manycore, low-power, distributed memory supercomputer-on-a-chip (SCoC). It is composed of 16 clusters of 17 cores (16 dedicated computational cores and 1 control core) sharing 2MB of SRAM, and of several Input/Output (I/O) capabilities controlled by 4 SMP quad-cores, such as 2 PCIe Gen3 controllers, 2 DDR3 ECC 1600 controllers and 2 40Gbps Ethernet (GbE) controllers, among others. Each core implements the Kalray-1 VLIW architecture with a dedicated IEEE-754 single precision (SP) and double precision (DP) floating point unit (FPU). The 16 clusters and the 4 SMP quad-cores are interconnected through a high bandwidth, low latency network-on-a-chip (NoC) using a 2D-torus topology. In addition to the standard I/O capabilities, a single MPPA-256 is able to interconnect its NoC to 4 MPPA-256 neighbors using the Kalray NoCX interconnect. This capability makes it possible to present a single virtual manycore, composed of multiple MPPA-256 chips, to the programmer. Multiple MPPA-256 chips can be traversed in each direction transparently. The MPPA-256 topology is depicted in Figure 1 (Kalray MPPA-256 topology). This architecture can be used as a building block for a highly energy-efficient supercomputer: a cartridge with 4 MPPA-256 chips, as depicted in Figure 2. The 4 MPPA-256 chips are interconnected on-board using NoCX, presenting to the programmer a single manycore of 64 clusters (1024 computational cores) and 16 SMP quad-cores. The boards can be further interconnected through NoCX with external connectors or using a chassis interconnect to build an even bigger virtual manycore.

Programming model
The Kalray MPPA-256 supports C, C++ and Fortran with different programming models and Application Programming Interfaces (APIs), and can be programmed with MPI and OpenMP. Each MPPA-256 cluster is an MPI process, and within this MPI process OpenMP can be used to easily exploit the 16 computational cores. Due to the distributed and asymmetric nature of the MPPA-256, the best programming model for Oil & Gas algorithms such as Reverse Time Migration (RTM) or Full Waveform Inversion (FWI) is a double-buffering model (an application pipeline of depth 2), as depicted in Figure 3: each cluster divides its 2MB SRAM space in two so that while the 16 computational cores are working on one SRAM half, the next data can be pushed by DMA to the other half.

System architecture
The seismic data are stored on storage servers; they are sent through multiple 10GbE links to the DDR of the Kalray MPPA-256 scalable compute cartridges during the initialization phase. The Kalray MPPA-256 scalable compute cartridges can then be partitioned or paired as needed depending on the workload memory size and required computational power. All the computation is then done locally, with frontier exchanges happening between the DDR of the Kalray MPPA-256 scalable compute cartridges involved. For example, using a single Kalray MPPA-256 scalable compute cartridge with 32GB of DDR (4GB per DDR interface, 8GB per MPPA), a typical RTM shot might be computed. If the shot exceeds this amount of memory, multiple cartridges can be paired together; conversely, multiple shots can be computed on a single cartridge if they fit in memory. During the computation phase, snapshots can be sent back to the storage server through the 10GbE links.

First experiments
We experimented with typical HPC workloads on a single Kalray MPPA-256 scalable compute cartridge prototype based on 4 MPPA-256 chips interconnected together, with each MPPA-256 using 4GB of DDR (2GB per DDR interface). The achieved GFLOPS/W in single precision and the scalability are measured for each experiment. The GFLOPS are measured using hardware performance counters and the power consumption is measured using an on-board power consumption measurement circuit. The first experiment is a general matrix multiply algorithm (GEMM [5]) on a 4096x4096 matrix, scaling from a single cluster on a single MPPA-256 to the 64 clusters available on the 4 MPPA-256 chips. The results are presented in Figure 4. The following table compares the GFLOPS/W between different architectures [1][2]:

Platform             | GFLOPS | Power (W) | GFLOPS/W
nVidia M2090 Fermi   |    780 |       225 |      3.5
Intel i7-3820        |    209 |        95 |      2.2
DSP: TI C6678        |     93 |        10 |      9.3
MPPA-256             |    123 |        10 |     11.9
4x MPPA-256          |    433 |        41 |     10.5

The Intel results are measured using OpenBLAS [6] on the MPPA developer workstation host CPU. The scalability is nearly linear, demonstrating the architecture's scalability, while the GFLOPS/W figures are among the best available today. The second experiment is a complex fast Fourier transform (FFT) of 1K points to 256K points, scaling from a single cluster to the 16 clusters available on a single MPPA. The results are presented in Figure 5. The scalability is nearly linear, once again demonstrating the architecture's scalability. More experiments will be done, to scale up to 4 MPPA-256 chips and to compare to other architectures. Results using benchmarks more relevant for Oil & Gas HPC, such as 3-dimensional finite difference (3DFD) algorithms, will be shown.

Conclusion
We showed that the Kalray MPPA-256 scalable compute cartridge exposes two key characteristics to support future Oil & Gas Exascale HPC:
Scalability: allowing a system to be built as a stacking of well-known, simpler systems.
Power efficiency: an Exascale system will need more than 50 GFLOPS/W [4].
Still, the Kalray MPPA-256 scalable compute cartridge is only a first step in the direction of Oil & Gas Exascale HPC. More power efficiency will be needed in coming years, and the authors think that the model of having simple, power-efficient building blocks such as scalable interconnections of multiple manycores [3] will remain. The distributed memory nature of this architecture guarantees its scalability, and as such the system can be precisely sized and expanded as needed. This paves the way to a new paradigm for scalable software-defined systems.

References
[1] NVIDIA, NVIDIA CUBLAS performance, available at https://developer.nvidia.com/cublas.
[2] Francisco D. Igual, Murtaza Ali, Arnon Friedmann, Eric Stotzer, Timothy Wentz and Robert van de Geijn, Unleashing DSPs for General-Purpose HPC, available at http://www.cs.utexas.edu/users/flame/pubs/FLAWN61.pdf.
[3] US National Academy of Science, The New Global Ecosystem in Advanced Computing: Implications for U.S. Competitiveness and National Security (2012).
[4] DARPA, Power Efficiency Revolution For Embedded Computing Technologies (PERFECT) program.
[5] National Science Foundation, BLAS, available at http://www.netlib.org/blas/.
[6] OpenBLAS, available at http://www.openblas.net/.
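The double-buffering (depth-2 pipeline) model described above can be sketched as follows. This is a hypothetical illustration, not the Kalray runtime API: dma_pull_async(), dma_wait() and compute_block() stand in for whatever DMA and compute primitives the platform actually provides.

```c
/* Sketch of the depth-2 pipeline: the cluster's 2MB SRAM is split into two
 * halves; the 16 compute cores work on one half while DMA fills the other
 * with the next block.  dma_pull_async(), dma_wait() and compute_block()
 * are hypothetical placeholders, not the Kalray runtime API. */
#include <stddef.h>

#define SRAM_BYTES  (2u * 1024u * 1024u)
#define HALF_FLOATS (SRAM_BYTES / 2u / sizeof(float))

extern void dma_pull_async(float *dst, size_t nfloats, int block_id); /* hypothetical */
extern void dma_wait(int block_id);                                   /* hypothetical */
extern void compute_block(float *data, size_t nfloats);  /* e.g. OpenMP over 16 cores */

void process_blocks(int nblocks)
{
    static float sram[2][HALF_FLOATS];      /* stands in for the cluster's SRAM */

    dma_pull_async(sram[0], HALF_FLOATS, 0);            /* prefetch first block */
    for (int b = 0; b < nblocks; ++b) {
        int cur = b & 1;
        if (b + 1 < nblocks)                             /* overlap next transfer */
            dma_pull_async(sram[(b + 1) & 1], HALF_FLOATS, b + 1);
        dma_wait(b);                                     /* current block has arrived */
        compute_block(sram[cur], HALF_FLOATS);           /* 16 cores work on this half */
    }
}
```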

Moderators
Speakers

Thursday March 6, 2014 11:55am - 12:00pm
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

12:00pm

Networking and Lunch Break
Thursday March 6, 2014 12:00pm - 1:00pm
BRC

1:00pm

Applications Session I: HPC for Engineering Simulation of Expandable Liner Hanger Systems, Ganesh Nanaware, Baker Hughes

DOWNLOAD PRESENTATION

WATCH VIDEO

Abstract Expandable liner hangers used for wellbore construction within the oil and gas industry are complex mechanical systems. Finite Element Analysis (FEA) based engineering simulation for design and development of expandable liner hanger systems is an important activity to reduce the time and cost to introduce a reliable and robust product to the competitive market. The complex nonlinear plastic material behavior and physical interaction during setting of the expandable liner hanger requires powerful High-Performance Computing (HPC) infrastructure to solve the complex and large FEA simulation models. This presentation summarizes HPC infrastructure at Baker Hughes and how it is being used to perform engineering simulations to drive the product design for Expandable Liner Hanger Systems. The HPC resource at Baker Hughes adds value to the design process by enabling greater simulation throughput. Using HPC resources, engineering teams can analyze not just a single design idea, but many design variations faster. By simulating multiple design ideas concurrently, design teams are able to identify dramatic engineering improvements early in the design process, prior to and more effectively than physical prototyping alone. HPC specifically enables parallel processing to obtain the solution of the toughest, higher-fidelity FEA models - including more geometric detail, larger systems and more complex physics. In summary, HPC helped us to understand detailed product behavior with confidence in the design and to achieve significant reduction in product development time and cost.

Moderators

Henri Calandra

Total
Henri Calandra obtained his M.Sc. in mathematics in 1984 and a Ph.D. in mathematics in 1987 from the Universite des Pays de l’Adour in Pau, France. He joined Cray Research France in 1987 and worked on seismic applications. In 1989 he joined the applied mathematics department of the French Atomic Agency. In 1990 he started working for Total SA. After 12 years of work in high performance computing and as project leader for Pre-stack Depth... Read More →

Speakers

Thursday March 6, 2014 1:00pm - 1:20pm
BRC 284 Rice University 6500 Main Street at University, Houston, TX 77030

1:00pm

Applications Session II: A High Performance Computational Platform for Simulation of Giant Reservoir Models, Majdi Baddourah, Saudi Aramco

DOWNLOAD PRESENTATION

WATCH VIDEO

A High Performance Computational Platform for Simulation of Giant Reservoir Models
Majdi Baddourah, M. Ehtesham Hayder, Badr Harbi, Ahmed Zawawi and Fouad Abouheit, Saudi Aramco

Simulation of high resolution reservoir models is useful to gain insight into oil and gas reservoirs. Nowadays, massive, comprehensive reservoir simulation models can be built with detailed geological and well log data. These models require a very large high performance computing (HPC) platform for conducting reservoir simulation. Saudi Aramco has developed a state-of-the-art simulator, GigaPOWERS, which is capable of simulating multibillion-cell reservoir models. The presentation will provide an overview of challenges related to constructing HPC platforms and visualizing the simulation output of giant reservoir models, and how the computational platform at Saudi Aramco is designed to overcome these challenges. A large HPC platform can be designed for reservoir simulation by connecting multiple Linux clusters in a simulation grid. Such an environment can provide the necessary capacity and computational power to solve multibillion-cell reservoir models. Such a simulation grid for reservoir simulation has been designed in Saudi Aramco's Exploration and Petroleum Engineering Center (EXPEC) Computer Center. In this study, we provide benchmark results for multiple giant fields to evaluate the performance of the Saudi Aramco simulation grid for reservoir simulation. Communication and input/output (I/O) routines in the simulator can add considerable overhead in computation on such a computing platform. Connectivity between clusters on our simulation grid is tuned to maintain a high level of scalability in simulation. Excellent scalability results have been obtained for computations of giant simulation models on the simulation grid. Simulation models on the order of one billion cells pose a challenge to pre- and post-processing applications, which must load and process data in a reasonable time. Remote visualization, level-of-detail and load-on-demand algorithms were implemented in these applications, and data formats were revised to efficiently process and visualize massive simulation models.

Moderators

Simanti Das

Manager, High Performance Computing Software Development & Support, ExxonMobil Technical Computing Company
Simanti Das is currently the manager of the High Performance Computing software development and support group in the ExxonMobil Upstream IT organization. She is responsible for providing software development, optimization and support for massively parallel seismic imaging technologies for Upstream business use. Upon receiving her M.S. in Computer Science from the University of Houston in 1988, Simanti started her career at Exxon Production Research... Read More →

Speakers

Thursday March 6, 2014 1:00pm - 1:20pm
BRC 282 Rice University 6500 Main Street at University, Houston, TX 77030

1:00pm

Programming Models, Libraries and Tools: Research and Development into Extreme Scale I/O using the Adaptable I/O System (ADIOS), Scott Klasky, Oak Ridge National Laboratory

DOWNLOAD PRESENTATION

WATCH VIDEO

One of the most pressing problems facing the High Performance Computing (HPC) community is the ability to compose next generation simulations and understand the performance tradeoffs involved. One of the realizations of exascale computing is that we will not be able to simply save all of the data from simulations, and we must move to more of an "in-situ" processing paradigm. In our research we have developed techniques to deal with the increasing disparity between I/O throughput and compute capability. As part of this effort we developed the Adaptable IO System (ADIOS), which is focused on a Service Oriented Architecture and combines cutting edge research into new I/O techniques. As a result, ADIOS provides the highest level of synchronous I/O performance for a number of mission-critical applications at various DOE Leadership Computing Facilities, and has won an R&D 100 award. Our research has focused on many of the pressing needs for exascale computing, and in this presentation we focus on two critical areas: 1) creating new abstractions and middleware for location-flexible scientific data analytics, and 2) creating new techniques to facilitate spatial and temporal queries of scientific data analytics.
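As a rough illustration of the write path ADIOS provides, the sketch below follows the ADIOS 1.x XML-driven C API as described in its user manual (adios_init, adios_open, adios_group_size, adios_write, adios_close). Exact signatures vary between releases, and the group name, variable name, file name and configuration file here are hypothetical, so treat this strictly as an approximation rather than working code for any specific ADIOS version.

```c
/* Approximate sketch of an ADIOS 1.x checkpoint write (XML-driven API).
 * Signatures differ between releases; "restart", "field", "ckpt.bp" and
 * "checkpoint.xml" are hypothetical names used only for illustration. */
#include <mpi.h>
#include <stdint.h>
#include <adios.h>

void write_checkpoint(double *field, uint64_t nbytes, int rank, MPI_Comm comm)
{
    int64_t  fd;
    uint64_t total;

    adios_init("checkpoint.xml", comm);                /* parse the I/O configuration  */
    adios_open(&fd, "restart", "ckpt.bp", "w", comm);  /* open the output group/file   */
    adios_group_size(fd, nbytes, &total);              /* declare bytes to be written  */
    adios_write(fd, "field", field);                   /* variable declared in the XML */
    adios_close(fd);                                   /* transport flushes the data   */
    adios_finalize(rank);
}
```

The point of the abstraction is that the transport behind these calls (POSIX, MPI-IO, staging, in-situ analysis) is chosen in the configuration, not in the application code.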

Moderators

Amik St-Cyr

Senior researcher, Shell
Amik St-Cyr recently joined the Royal Dutch Shell company as a senior researcher in computation & modeling. Amik came to the industry from the NSF funded National Center for Atmospheric Research (NCAR). His work consisted in the discovery of novel numerical methods for geophysical fluid flows with particular attention to their implementation on supercomputers. He has experience in a plethora of numerical methods for solving time-dependent... Read More →

Speakers

Thursday March 6, 2014 1:00pm - 1:20pm
BRC 280 Rice University 6500 Main Street at University, Houston, TX 77030

1:00pm

Systems Infrastructure, Facilities and Visualization: OpenSFS and Lustre* file system development: past, present and future, Richard Henwood, Intel

DOWNLOAD PRESENTATION

WATCH VIDEO

The community around the open source Lustre* file system is steadily moving forward. Today seven of the top ten fastest machines in the world have a Lustre file system, and a new release of Lustre software is produced every six months. Central to this effort is OpenSFS, which over the last three years has supported the ongoing Lustre community releases and feature development through its funding efforts. Intel's High Performance Data Division has a number of ongoing feature development projects with OpenSFS to enhance and extend the Lustre file system. This presentation provides context and motivation for some completed projects, such as multi-core support, multiple metadata servers and FSCK, along with performance measurements and use cases. A number of new development projects are currently underway, including replication and data on metadata server. These projects will be discussed along with use cases. [From the OpenSFS website] OpenSFS is a nonprofit organization founded in 2010 to advance Lustre development, ensuring it remains vendor-neutral, open, and freely downloadable. OpenSFS participants include vendors and customers who employ the world's best Lustre file system experts, implementing and supporting Lustre solutions across HPC and commercial enterprises. OpenSFS actively promotes the growth, stability and vendor neutrality of the Lustre file system. * Other names and brands may be claimed as the property of others.

Moderators

Keith Gray

Manager, High Performance Computing, BP
Keith Gray is Manager of High Performance Computing for BP. The HPC Team supports the computing requirements for BP’s Advanced Seismic Imaging Research efforts. This team supports one of the largest Linux Clusters dedicated to research in Oil and Gas. Mr. Gray graduated from Virginia Tech with a degree in geophysics, and has worked for BP and Amoco since 1985. He was listed in HPCWire’s People to Watch 2006.

Speakers

Thursday March 6, 2014 1:00pm - 1:20pm
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

1:20pm

Applications Session I: Developing High Compliance, High Strength Well Cement for Extreme Conditions: A Multiscale Computational Approach, Rouzbeh Shahsavari, Rice University

PRESENTATION NOT AVAILABLE

VIDEO NOT AVAILABLE

It is challenging to develop bulk materials that exhibit high compliance, high strength and high recoverable strain because of the intrinsic trade-offs among these properties [1]. A high compliance in a single phase material usually means weak interatomic bonding and thus low strength. One of the urgent applications of high compliance, high strength materials is in well cementing, used in hydraulic fracturing and generally in all oil and gas wells where the cement is placed in the annular gap between the drilled formation and the steel casing. Despite the critical role of wellbore cement in providing zonal isolation and securing the casing, cement failure is still a serious problem with enormous socioeconomic impacts (e.g., the 2010 oil spill disaster in the Gulf of Mexico, groundwater contamination via cement failure in hydraulic fracturing). Wellbore cement frequently fails in a brittle mode due to the downhole pressure, or the formation loading (creep). Given the extreme downhole conditions, to date there is neither a unified understanding nor a reliable methodology to divert this brittle fracture mechanism, a lack of knowledge that can cost billions of dollars with huge environmental impacts. In this talk, I will describe a novel multiscale computational method to develop a high compliance, high strength wellbore cement where the high compliance assures ductility to accommodate the pressure buildup, and the high strength prevents premature failure. First, I will describe how state-of-the-art computational atomistic modeling techniques can be used to decode the basic molecular structure of a series of cement hydrate compositions. Second, combinatorial techniques will be utilized to tune the molecular features, self-assembly and aggregation of cementitious materials, thereby providing a more coherent microstructure. Third, modern optimization methods such as level set and phase field methods (based on solving partial differential equations) will be used to simultaneously maximize the ductility and strength of the microstructure by modulating the topology and multi-phase heterogeneity of the materials. Together, these multi-scale, multi-paradigm methods enable rapid screening of several fundamental physical properties at extreme conditions (e.g., HTHP, corrosive environments, etc.) to find the best-in-class microstructure candidates for accelerated well cementing material discovery. Finally, I will discuss various benefits of such modern computational techniques (e.g., guiding the synthesis of the proof-of-concept high compliance, high strength prototype) with an outlook towards substantially minimizing conventional trial-and-error experiments.

Moderators

Henri Calandra

Total

Speakers

Rouzbeh Shahsavari

Professor, Rice University
My interest is on developing a multi-scale, multi-paradigm materials modeling approach followed by experimental characterizations to study key functional behavior of complex materials, which are critical to the infrastructure underlying the science and technology enterprises of our society. Depending on the problem of interest, we employ a variety of computational and experimental techniques including (but not limited to) first-principles... Read More →


Thursday March 6, 2014 1:20pm - 1:40pm
BRC 284 Rice University 6500 Main Street at University, Houston, TX 77030

1:20pm

Applications Session II: The impact of discontinuous coefficients and partitioning on parallel reservoir simulation performance, Jonathan Graham, ExxonMobil Upstream Research Company

PRESENTATION NOT AVAILABLE

VIDEO NOT AVAILABLE

Slow convergence of iterative linear solvers for physical problems with jump or discontinuous coefficients is a well-known challenge for parallel computation. While many papers focus on the finite element (FE) method with an overlapping Schwarz or similar preconditioner for elliptic problems (e.g., Graham and Scheichl 2008), literature with respect to reservoir simulations is limited to steady-state porous media flow with FE (Vuik et al., 2001; Cliffe et al., 2000), or considers two-phase flow but applies multi-level domain decomposition and finite differences (Kees et al., 2003). However, there are reservoir simulations using either two-stage preconditioners (Aksoylu et al., 2007; Klie et al., 2009) or overlapping Schwarz (Usadi et al., 2007). We consider, instead, the case of non-overlapping block Jacobi for two-point-flux-approximation finite-volume reservoir simulation. With increased industrial use of HPC for reservoir simulations, significant abnormalities in performance behavior are seen when different HPC configurations are selected. This has been theorized to be partition-induced solver degradation. We construct simple test cases to clearly illustrate the effect and, dependent on the inclusion of both high and low permeability regions inside a subdomain, verify a 20% difference in the average iteration count for linear solutions of the Jacobian. We also consider realistic cases. For a 3D unstructured grid reservoir model with one hundred thousand grid cells, we create two partitions that are identical except for one cell: a nearly vertical stack of cells with high transmissibility connections to each other is split between partitions either above or below this pivotal cell. This small change resulted in a 1.7X increase in computational time due to a similar increase in total linear iterations. Experimentation on a 3-million-cell unstructured grid with as many as 1024 partitions showed a range of 2X in total linear iterations over the course of the simulation. These results show that robust industrial application of HPC for reservoir simulation requires partitioning that is load balanced, minimizes communication, and encapsulates local flow patterns. A transmissibility-weighted graph partitioner has been developed that mitigates the performance variability seen in these test cases.
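The idea of a transmissibility-weighted partitioner can be sketched as follows: scale the edge weights of the cell-connectivity graph by the transmissibility of each connection so that a standard k-way partitioner avoids cutting high-flow connections. METIS is used here purely as an example partitioner, and the log-scaled weighting is an assumption; the abstract does not say which partitioner or weighting scheme the authors actually implemented.

```c
/* Sketch: build the cell-connectivity graph in CSR form, derive integer edge
 * weights from transmissibility, and call a standard k-way partitioner
 * (METIS 5 used here only as an example). */
#include <metis.h>
#include <stdlib.h>
#include <math.h>

/* ncells cells; xadj/adjncy describe cell connections in CSR form and trans[]
 * holds the transmissibility of each connection (same layout as adjncy).
 * On success, part[i] receives the partition index of cell i. */
int partition_by_transmissibility(idx_t ncells, idx_t *xadj, idx_t *adjncy,
                                  const double *trans, idx_t nparts, idx_t *part)
{
    idx_t nedges = xadj[ncells];
    idx_t ncon = 1, objval;
    idx_t *adjwgt = malloc(nedges * sizeof(idx_t));
    if (!adjwgt) return METIS_ERROR_MEMORY;

    /* High-transmissibility connections get large weights so the partitioner
     * avoids cutting them; log-scaling tames the orders-of-magnitude spread
     * in permeability (weighting choice is an assumption). */
    for (idx_t e = 0; e < nedges; ++e)
        adjwgt[e] = (idx_t)(1 + 100.0 * log1p(trans[e]));

    int rc = METIS_PartGraphKway(&ncells, &ncon, xadj, adjncy,
                                 NULL, NULL, adjwgt, &nparts,
                                 NULL, NULL, NULL, &objval, part);
    free(adjwgt);
    return rc;
}
```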

Moderators

Simanti Das

Manager, High Performance Computing Software Development & Support, ExxonMobil Technical Computing Company

Speakers

Thursday March 6, 2014 1:20pm - 1:40pm
BRC 282 Rice University 6500 Main Street at University, Houston, TX 77030

1:20pm

Programming Models, Libraries and Tools: Compiler Independent Strategy for Data Locality Optimization, Jinxin Yang, University of Houston

DOWNLOAD PRESENTATION

WATCH VIDEO

Data locality is an important optimization for loop-oriented kernels. Auto-tuning techniques are used to find the best strategy for data locality optimization. However, auto-tuning techniques are expensive and are not independent of the computing framework. Porting an application from one framework to another requires the whole auto-tuning process to be repeated in order to get an optimal solution for the new one. A global strategy would help expedite the porting process for an application. In this work, we present a framework, consisting of OpenUH transformation directives and the CHiLL framework, which provides an optimal strategy for the data locality problem that is independent of compilers. Our results show that the strategies given by our framework clearly outclass the default optimization levels of the OpenUH, GCC, Intel and PGI compilers.
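As an illustration of the kind of data-locality transformation such a framework targets, the sketch below shows a hand-written tiled matrix multiply; the tile size BLK is exactly the sort of parameter an auto-tuning search (or a directive-driven tool like CHiLL) would explore. This example is written by hand for illustration and is not output generated by OpenUH or CHiLL.

```c
/* Hand-written illustration of loop tiling for data locality: each BLK x BLK
 * block of A, B and C stays resident in cache while it is reused.  BLK is
 * the tunable parameter an auto-tuning search would sweep. */
#define N   1024
#define BLK 64

void matmul_tiled(const double A[N][N], const double B[N][N], double C[N][N])
{
    for (int ii = 0; ii < N; ii += BLK)
        for (int kk = 0; kk < N; kk += BLK)
            for (int jj = 0; jj < N; jj += BLK)
                /* work on one cache-resident tile */
                for (int i = ii; i < ii + BLK; ++i)
                    for (int k = kk; k < kk + BLK; ++k) {
                        double a = A[i][k];
                        for (int j = jj; j < jj + BLK; ++j)
                            C[i][j] += a * B[k][j];
                    }
}
```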

Moderators

Amik St-Cyr

Senior researcher, Shell

Speakers

Jinxin Yang

HPCTools Group, University of Houston


Thursday March 6, 2014 1:20pm - 1:40pm
BRC 280 Rice University 6500 Main Street at University, Houston, TX 77030

1:20pm

Systems Infrastructure, Facilities and Visualization: Tuning and Measuring Performance on Lustre, John Fragalla, Xyratex

Moderators

Keith Gray

Manager, High Performance Computing, BP

Speakers

John Fragalla

HPC Principal Architect, Xyratex
As a high performance computing (HPC) Principal Architect, John brings global expertise to Xyratex. He is a leader on many strategic and complex customer engagements worldwide, and provides technical advisement on future product development and direction within Xyratex and the industry to design storage solutions that meets customer requirements. John is considered one of the primary HPC resources, working with the global HPC community... Read More →


Thursday March 6, 2014 1:20pm - 1:40pm
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

1:40pm

Applications Session I: High order methods for reservoir flows, Beatrice Riviere, Rice University

DOWNLOAD PRESENTATION

WATCH VIDEO

We propose a numerical method for solving the miscible displacement problem with a discontinuous Galerkin method in space and an implicit Runge-Kutta method in time. The method approximates the fluid pressure and the resident fluid concentration by polynomials of arbitrary order. Our algorithm allows us to preserve the high order approximation in both space and time while reducing the computational cost by a decoupling strategy for the pressure and concentration equations. The parallelization of the algorithm has been developed in the Dune framework. Convergence and robustness of the method are shown by several numerical examples.
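For context, the standard textbook form of the miscible displacement model couples an elliptic pressure (Darcy flow) equation with a transport equation for the resident concentration; the decoupling strategy solves the first and then advances the second with the implicit Runge-Kutta scheme. The notation below is generic and may differ from the authors' exact formulation.

```latex
% Generic incompressible miscible displacement model (textbook notation);
% Darcy velocity u = -(kappa / mu(c)) grad p.
\begin{align*}
  -\nabla\cdot\Big(\frac{\kappa(x)}{\mu(c)}\,\nabla p\Big) &= q
      && \text{(pressure equation)}\\
  \phi\,\frac{\partial c}{\partial t}
    + \nabla\cdot\big(\mathbf{u}\,c - D(\mathbf{u})\,\nabla c\big) &= \hat{c}\,q
      && \text{(resident concentration)}
\end{align*}
```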

Moderators

Henri Calandra

Total

Speakers

Beatrice Riviere

Noah Harding Chair and Professor, Rice University


Thursday March 6, 2014 1:40pm - 2:00pm
BRC 284 Rice University 6500 Main Street at University, Houston, TX 77030

1:40pm

Applications Session II: Strong Scalability of Reservoir Simulation on Massively Parallel Computers: Issues and Results, Vadim Dyadechko, ExxonMobil Upstream Research Company

PRESENTATION NOT AVAILABLE

VIDEO NOT AVAILABLE

Numerical simulation of reservoirs is an integral part of commercial development studies to optimize petroleum recovery. Modern petroleum reservoir simulation requires simulating detailed and computationally expensive geological and physical models. Parallel reservoir simulators have the potential to solve larger, more realistic problems than previously possible. To make the solution of these large problems feasible, an efficient parallel implementation of the algorithm is necessary. Such a parallelization of the algorithm requires proper data structures and data layout, parallel direct and iterative solvers, and parallel preconditioners. Load balancing and minimization of communication between processors also play a very important role in achieving that goal. In this talk, we investigate parallel performance for black oil reservoir simulation on multiple massively parallel computing architectures. A deliberate strategy of performance-based development of the major types of computations encountered in reservoir simulation programs is employed. Even though most operations are memory-bandwidth bound, it is possible, with careful implementation, to get excellent parallel efficiency out to several thousands of cores. We discuss numerical issues, scalability and parallel efficiency of the reservoir simulator on several very large and geologically challenging examples.

Moderators

Simanti Das

Manager, High Performance Computing Software Development & Support, ExxonMobil Technical Computing Company

Speakers

Thursday March 6, 2014 1:40pm - 2:00pm
BRC 282 Rice University 6500 Main Street at University, Houston, TX 77030

1:40pm

Programming Models, Libraries and Tools: Portable, MPI-Interoperable Coarray Fortran 2.0, Chaoran Yang, Rice University


DOWNLOAD PRESENTATION



The past decade has seen the advent of a number of parallel programming models such as Coarray Fortran (CAF), Unified Parallel C, X10, and Chapel. Despite the productivity gains promised by these models, most parallel scientific applications still rely on MPI as their data movement model. One reason for this trend is that it is hard for users to incrementally adopt these new programming models in existing MPI applications. Because each model uses its own runtime system, the runtimes duplicate resources and are potentially error-prone. Such independent runtime systems were deemed necessary because MPI was considered insufficient in the past to play this role for these languages.

The recently released MPI-3, however, adds several new capabilities that now provide all of the functionality needed to act as a runtime, including a much more comprehensive one-sided communication framework. In this paper, we investigate how MPI-3 can form a runtime system for one example programming model, CAF, with a broader goal of enabling a single application to use both MPI and CAF with the highest level of interoperability.
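A minimal sketch of the mapping the paper argues MPI-3 now makes possible: a coarray-style one-sided assignment such as x(:)[target] = buf(:) expressed with MPI_Win_allocate, passive-target locking and MPI_Put. This illustrates the idea only and is not the CAF 2.0 runtime implementation.

```c
/* Sketch: coarray-style one-sided put expressed with MPI-3 RMA, the kind of
 * mapping that lets MPI serve as a runtime for CAF.  Illustration only. */
#include <mpi.h>

#define N 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The "coarray": symmetric memory exposed on every image/rank. */
    double *x;
    MPI_Win win;
    MPI_Win_allocate(N * sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &x, &win);
    MPI_Win_lock_all(0, win);                 /* passive-target access epoch */

    double buf[N];
    for (int i = 0; i < N; ++i) buf[i] = rank;

    int target = (rank + 1) % size;           /* x(:)[target] = buf(:) */
    MPI_Put(buf, N, MPI_DOUBLE, target, 0, N, MPI_DOUBLE, win);
    MPI_Win_flush(target, win);               /* wait for remote completion */

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```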

Speakers

Thursday March 6, 2014 1:40pm - 2:00pm
BRC 280 Rice University 6500 Main Street at University, Houston, TX 77030

1:40pm

Systems Infrastructure, Facilities and Visualization: Moving the BP High Performance Computing Center into a new facility, Kent Blancett, BP

DOWNLOAD PRESENTATION

WATCH VIDEO

In 2013, BP completed the construction of a new facility to support High Performance Computing. This talk will describe the planning and steps that enabled the move from the old facility, and how we moved the systems, seismic data and operations with minimal downtime.

Moderators

Keith Gray

Manager, High Performance Computing, BP

Speakers

Thursday March 6, 2014 1:40pm - 2:00pm
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

2:00pm

Applications Session I: A Hybrid Algorithm for Global Optimization Problems, Leticia Velazquez, Rice University


PRESENTATION NOT AVAILABLE


VIDEO NOT AVAILABLE

We propose a hybrid algorithm for solving global optimization problems that is based on the coupling of the Simultaneous Perturbation Stochastic Approximation (SPSA) and Newton-Krylov Interior-Point (NKIP) methods via a surrogate model. There exist verified algorithms for finding approximate global solutions, but our technique will further guarantee that such solutions satisfy physical bounds of the problem. First, the SPSA algorithm conjectures regions where a global solution may exist. Next, some data points from the regions are selected to generate a continuously differentiable surrogate model that approximates the original function. Finally, the NKIP algorithm is applied to the surrogate model subject to bound constraints for obtaining a feasible approximate global solution. We present some numerical results on a set of five small problems and two medium- to large-scale applications from reservoir simulations.
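For readers unfamiliar with SPSA, the sketch below shows one iteration in its standard textbook form: the gradient is estimated from only two objective evaluations at simultaneously perturbed points. It covers only the SPSA half of the hybrid method; the surrogate model and the NKIP solve are not shown, and the objective f() and the gain sequences a_k, c_k are user-supplied.

```c
/* One iteration of Simultaneous Perturbation Stochastic Approximation in its
 * standard textbook form (not the authors' coupled SPSA/NKIP code).  The
 * gradient of f is estimated from just two evaluations at perturbed points. */
#include <stdlib.h>

double f(const double *x, int n);      /* objective supplied by the user */

void spsa_step(double *x, int n, double a_k, double c_k)
{
    double delta[64], xp[64], xm[64];  /* assumes n <= 64 for brevity */

    for (int i = 0; i < n; ++i) {
        delta[i] = (rand() & 1) ? 1.0 : -1.0;      /* Bernoulli +/-1 perturbation */
        xp[i] = x[i] + c_k * delta[i];
        xm[i] = x[i] - c_k * delta[i];
    }
    double df = f(xp, n) - f(xm, n);               /* two evaluations only */
    for (int i = 0; i < n; ++i) {
        double g_i = df / (2.0 * c_k * delta[i]);  /* gradient estimate */
        x[i] -= a_k * g_i;                         /* descent update */
    }
}
```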

Moderators

Henri Calandra

Total

Speakers

Thursday March 6, 2014 2:00pm - 2:20pm
BRC 284 Rice University 6500 Main Street at University, Houston, TX 77030

2:00pm

Applications Session II: Automatic Performance Tuning of Reverse Time Migration Using The Abstract Data and Communication Library, Saber Feki, KAUST

DOWNLOAD PRESENTATION

WATCH VIDEO

With the increased complexity and diversity of mainstream HPC systems, significant effort is required to tune applications in order to achieve the best possible performance for each particular platform. This task becomes more and more challenging and requiring a larger set of skills. Automatic performance tuning is becoming a must for optimizing applications such as Reverse Time Migration (RTM) widely used in seismic imaging for oil and gas exploration. In the RTM application, the time-dependent partial differential acoustic wave equation is discretized in space and time, and the resulting system of linear equations is solved for each time step using an explicit scheme. The 3–D version of RTM is computationally intensive and its execution time becomes reasonable for field data only with a parallel implementation using domain decomposition: the simulation grid is split for each shot into smaller 3–D blocks across multiple MPI processes. At each time step, the computation of the boundary grid points requires neighboring processes to exchange the values of the needed stencil points belonging to neighboring subdomains. Typical implementations make use of the Message Passing Interface (MPI) routines for data exchange and therefore implying an extra execution time for the communication operations. The communication overhead that stem from the parallelization of the RTM algorithm would be considerably reduced using an auto-tuning tool, for instance, the Abstract Data and Communication Library (ADCL) [1, 2]. ADCL is an MPI-based communication library that aims at providing the lowest possible execution time for the communication operations and to ease the software development process with high data abstraction and predefined routines. ADCL allows the parallel code to adapt itself to the current architecture and software environment at runtime. The idea behind ADCL is to select the fastest of the available implementations for a given communication pattern during the (regular) execution of the application. For example, ADCL provides 20 different implementations for multi-dimensional (e.g., 2-D, 3-D) neighborhood communication using different combinations of (i) number of simultaneous communication partners, (ii) handling of non-contiguous messages, and (iii) MPI data transfer primitive. ADCL uses the first iterations of the application to determine the fastest neighborhood communication routine for the current execution conditions. Once performance data on a sufficient number of iterations is available, ADCL can make at runtime a decision on which alternative to use throughout the rest of the simulation. There are three main steps to carry in order to use ADCL: preparation, communication and finalization steps. Through this work, we showcase the performance benefit that come out of auto-tuning the parallel RTM application. For that purpose, we implement two versions of the RTM code for each of (i) isotropic (ISO) and (ii) tilted transversely isotropic media (TTI). The first version is the classic scenario where the commonly used MPI implementation of neighborhood communications is utilized. The second is the automatic performance-tuning version where ADCL is used to transparently select the best MPI implementation of neighborhood communications according to the runtime environment. The numerical scheme used is finite difference with a discretization at the 2nd order in time and 8th order in space. We run the simulations for a total of 720 time steps. 
We carry out our tests on two different parallel platforms at TOTAL E&P Research and Technology USA, LLC. The first cluster (Appro) is based on AMD CPUs, with 2GB of memory per core and an InfiniBand DDR interconnect. The second (IBM) is an Intel-based cluster, with 3GB of memory per core and an InfiniBand QDR interconnect. The InfiniBand network in both clusters has a fat tree topology. We report the MPI communication times of both ISO and TTI kernels, for both platforms and for each version of the code (with and without ADCL). The main advantage of using ADCL is performance, which here means decreasing the execution time of the communication operations. First, we would like to point out that ADCL is able to select a different implementation of 3-dimensional neighborhood communication for each of the different execution environments and each of the ISO and TTI kernels. Second, the auto-tuned versions using ADCL provide up to 40% improvement in the communication time of RTM, as detailed in Figure 2. Another advantage of using ADCL is productivity; namely, ADCL allows developers to implement the neighborhood communication functions of the RTM algorithm very easily. The developer does not need to worry about the choice of MPI communication routines and the memory management required for the halo cells (handling non-contiguous data). By keeping track of the memory addresses of the data structures that are passed to the main RTM function, one can easily integrate ADCL into both isotropic and tilted transversely isotropic RTM algorithms with minor changes to the original code. We are currently working on the optimization of the MPI runtime parameters using the Open Tool for Parameters Optimization (OTPO) [3] based on ADCL, for further improvement of the MPI communication performance. We are also looking into automatic tuning of the OpenACC accelerated kernels on the latest NVIDIA GPUs. Encouraging preliminary results will be presented [4]. References: [1] E. Gabriel, S. Feki, K. Benkert, M. Chaarawi. The Abstract Data and Communication Library, Journal of Algorithms and Computational Technology, Vol. 2, No. 4, pp. 581-600, December 2008. [2] E. Gabriel, S. Feki, K. Benkert, M. Resch. Towards Performance and Portability through Runtime Adaption for High Performance Computing Applications, Concurrency and Computation: Practice and Experience, Vol. 22, No. 16, pp. 2230-2246, 2010. [3] M. Chaarawi, J. Squyres, E. Gabriel, S. Feki. A Tool for Optimizing Runtime Parameters of Open MPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, Vol. 5205, pp. 210-217, 2008. [4] S. Feki, S. Siddiqui. "Towards Automatic Performance Tuning of OpenACC Accelerated Scientific Applications," NVIDIA GPU Technology Conference, San Jose, California, USA, March 18-21, 2013. Acknowledgement: This work was done while Hakan Haberdar was an intern and Saber Feki was an employee at TOTAL E&P USA Research & Technology. The authors would like to thank Total for supporting this work and senior HPC advisor Terrence Liao for his help and advice.
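
The runtime-selection idea described above can be pictured without ADCL's actual API. The sketch below is a minimal illustration in plain C/MPI, assuming a 1-D decomposition with left/right neighbors; the two exchange variants and the helper tuned_exchange are hypothetical stand-ins for the roughly 20 neighborhood-communication implementations ADCL chooses among, and the code does not reproduce ADCL's preparation/communication/finalization interface.

    /* Time each candidate halo-exchange variant during the first iterations,
     * then keep the fastest one for the remainder of the simulation. */
    #include <mpi.h>

    #define NVARIANTS 2

    static void exchange_sendrecv(double *out, double *in, int n,
                                  int left, int right, MPI_Comm comm) {
        MPI_Sendrecv(out, n, MPI_DOUBLE, right, 0,
                     in,  n, MPI_DOUBLE, left,  0, comm, MPI_STATUS_IGNORE);
    }

    static void exchange_nonblocking(double *out, double *in, int n,
                                     int left, int right, MPI_Comm comm) {
        MPI_Request req[2];
        MPI_Irecv(in,  n, MPI_DOUBLE, left,  0, comm, &req[0]);
        MPI_Isend(out, n, MPI_DOUBLE, right, 0, comm, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }

    typedef void (*exchange_fn)(double *, double *, int, int, int, MPI_Comm);

    /* Called once per time step. */
    void tuned_exchange(int step, double *out, double *in, int n,
                        int left, int right, MPI_Comm comm) {
        static exchange_fn variants[NVARIANTS] = { exchange_sendrecv,
                                                   exchange_nonblocking };
        static double elapsed[NVARIANTS];
        static int winner = -1;
        const int trials = 10;                 /* time steps spent on each variant */

        if (winner >= 0) { variants[winner](out, in, n, left, right, comm); return; }

        int v = (step / trials) % NVARIANTS;   /* variant being benchmarked */
        double t0 = MPI_Wtime();
        variants[v](out, in, n, left, right, comm);
        elapsed[v] += MPI_Wtime() - t0;

        if (step + 1 == NVARIANTS * trials) {  /* enough data: lock in the fastest */
            winner = 0;
            for (int i = 1; i < NVARIANTS; i++)
                if (elapsed[i] < elapsed[winner]) winner = i;
        }
    }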

Moderators
avatar for Simanti Das

Simanti Das

Manager, High Performance Computing Software Development & Support, ExxonMobil Technical Computing Company
Simanti Das is currently the manager of the High Performance Computing software development and support group in the ExxonMobil Upstream IT organization. She is responsible for providing software development, optimization and support for massively parallel seismic imaging technologies for Upstream business use. Upon receiving her M.S. in Computer Science from the University of Houston in 1988, Simanti started her career at Exxon Production Research... Read More →

Speakers
avatar for Saber Feki

Saber Feki

Computational Scientist, KAUST Supercomputing Laboratory
Saber Feki received his M.S. and Ph.D. in computer science from the University of Houston in 2008 and 2010, respectively. In 2011, he joined the oil and gas industry with TOTAL as an HPC Research Scientist working on seismic imaging applications using different programming models including CAF, OpenACC and HMPP. Saber currently holds the position of a computational scientist at the KAUST Supercomputing Laboratory where he was part of the technical... Read More →


Thursday March 6, 2014 2:00pm - 2:20pm
BRC 282 Rice University 6500 Main Street at University, Houston, TX 77030

2:00pm

Programming Models, Libraries and Tools: A parallel multi-physics library for rigid-body, flexible-body, and fluid dynamics, Dan Negrut, University of Wisconsin - Madison

DOWNLOAD PRESENTATION

WATCH VIDEO

We present a multi-physics, multi-discipline computational framework for the modeling, simulation, and visualization of multibody dynamics, granular flow, and fluid-solid interaction applications. The Chrono simulation tool has a modular structure, built on top of five foundation elements that provide support for (1) modeling; (2) numerical solution; (3) proximity computation and contact detection; (4) domain decomposition and inter-domain communication; and (5) pre- and post-processing. The modeling component provides support for the automatic generation of the very large and complex sets of equations for different classes of applications. This is achieved in a fashion transparent to the user, who needs only to provide high-level model and solution parameters. Examples include the equations of motion for granular flow simulations, using either a Differential Variational Inequality (DVI) or a Discrete Element Method (DEM) approach, the dynamic equations in an Absolute Nodal Coordinate Formulation (ANCF) for flexible multibody dynamics, the Smooth Particle Hydrodynamics (SPH) discretization of the Navier-Stokes equations for fluid-solid interaction problems, etc. The numerical solution component provides the parallel algorithmic support required to solve the set of equations governing the dynamics of interest. Depending on the underlying physics, various parallel solvers are employed for: optimization problems arising in the DVI approach for handling frictional contact; solving nonlinear problems arising in the context of implicit numerical integration; SPH-based methods for fluid-solid interaction problems, etc. For discrete problems, the proximity computation and contact detection component handles contact detection tasks; for continuum problems handled in a meshless framework it produces the list of neighboring nodes that overlap the compact support associated with each node of the discretization. The domain decomposition and inter-domain communication component manages the splitting of large problems into subdomains and provides support for the required inter-process communication. This enables the MPI simulation of granular flow problems with millions of particles interacting through frictional contact, conducted on hundreds of distributed nodes. The pre/post-processing component supports the process of setting up a model using the Chrono API and provides support for efficient visualization of simulation results from problems involving millions of states resolved at frequencies of hundreds of Hertz. Chrono leverages heterogeneous parallel computing architectures, including GPU and multi-core CPU processors, as well as MPI distributed architectures, to accelerate the simulation of very large systems. Examples of such systems include those encountered in granular dynamics, where the number of interacting elements can be in the millions, and fluid-solid interaction simulations involving millions of fluid markers and tens of thousands of solid (rigid or flexible) bodies. Chrono seamlessly handles systems that include both complex mechanisms composed of rigid bodies connected through mechanical joints and collections of millions of discrete elements interacting through contact, impact, and friction. Chrono is available open source, under a BSD license. Completely platform-independent, the Chrono::Engine libraries are available for Windows, Linux and Mac OSX, in both 32-bit and 64-bit versions.

Moderators
avatar for Amik St-Cyr

Amik St-Cyr

Senior researcher, Shell
Amik St-Cyr recently joined the Royal Dutch Shell company as a senior researcher in computation & modeling. Amik came to the industry from the NSF funded National Center for Atmospheric Research (NCAR). His work consisted in the discovery of novel numerical methods for geophysical fluid flows with particular attention to their implementation on supercomputers. He has experience in a plethora of numerical methods for solving time-dependent... Read More →

Speakers

Thursday March 6, 2014 2:00pm - 2:20pm
BRC 280 Rice University 6500 Main Street at University, Houston, TX 77030

2:00pm

Systems Infrastructure, Facilities and Visualization: Take Good Care of the Engine If You Want to Drive, Fritz Ferstl, Univa Corporation
Workload and distributed resource management systems (often also referred to, less descriptively, as batch schedulers) have been used as part of the typical HPC computing infrastructure in the Oil and Gas industry for more than a decade. So, certainly, all problems are solved and these schedulers are legacy, right? Nothing could be farther from the truth! Get the workload and resource management part wrong and all your finesse applied to application tuning, server/storage/network architecture or energy saving is, to a large degree, a waste of effort and money. Why is workload and resource management so crucial, and why now more than ever? In this presentation we will discuss why workload and resource management is the conveyor belt which drives production in an HPC data center, and what cost and productivity impacts are incurred when the workload and resource management system slows operation down or even brings it to a halt. We will provide concrete examples from practice of how choices concerning the workload and resource management system and the infrastructure around it have an immediate and considerable impact on time-to-results and throughput. Aspects discussed in this context will be the selection of tools, dependencies on networking and storage, configuration and policy set-up best practices, considerations on application integration, plus the utilization of accounting, reporting and analytics tools for optimizing the HPC infrastructure. Furthermore, we will investigate how recent and emerging trends in the evolution of HPC infrastructure components present new challenges to workload and resource management systems, and how such trends need to be taken into account when planning infrastructure upgrades. We will take a look at the growing complexity of increasingly heterogeneous server hardware (multi-socket, multi-core, many-core, GPUs, NUMA - to list just a few keywords) and what is required to exploit such technologies to meet the scalability and throughput goals anticipated for the next generation of Oil and Gas data centers. We will also showcase how energy saving efforts revolve around the workload and resource management system. [Remark: it is our intention that this presentation will be held jointly with a representative of one of our key Oil & Gas customers, namely from either Saudi Aramco, BP or PGS. This speaking opportunity at the 2014 Rice Oil & Gas Workshop was brought to our attention with too short notice to be able to get any of our customer liaisons through their respective company-internal approval process in time to commit to participate. We therefore had to submit the paper as it stands, with a speaker from Univa only.]

Moderators
avatar for Keith Gray

Keith Gray

Manager, High Performance Computing, BP
Keith Gray is Manager of High Performance Computing for BP. The HPC Team supports the computing requirements for BP’s Advanced Seismic Imaging Research efforts. This team supports one of the largest Linux Clusters dedicated to research in Oil and Gas. Mr. Gray graduated from Virginia Tech with a degree in geophysics, and has worked for BP and Amoco since 1985. He was listed in HPCWire’s People to Watch 2006.

Thursday March 6, 2014 2:00pm - 2:20pm
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

2:20pm

Applications Session I: Approaches to Lattice Boltzmann Method (LBM) Implementation: A Case Study, Deepak Majeti, Rice University

DOWNLOAD PRESENTATION

WATCH VIDEO

In this work, we present different implementations of the Lattice Boltzmann Method algorithm for fluid dynamics simulation, parallelized from a single sequential implementation. We discuss versions of the algorithm that adapt to diverse CPU-GPU hardware as well as versions fine-tuned for a GPU cluster. We compare different programming approaches, compiler optimizations and scheduling techniques for the LBM implementation. We successfully reduce the execution time of the simulation from 5 days on a single CPU node to just 200 seconds on a single GPU node. Lattice Boltzmann Method (LBM) simulation is a widely used technique to observe the free flow of fluid through porous media. The oil and gas industry uses this technique to measure the porosity of a rock. Porosity of the rock is a determining factor in choosing a well for oil and natural gas extraction. The process of choosing a well involves making a preliminary analysis of the rock in the field and then further processing it in the research lab for greater accuracy. The field version of LBM is usually run on whatever single desktop system is available, so it should be portable while retaining reasonable accuracy. The LBM implementation for the research laboratory must be fine-tuned to a particular research cluster and must achieve maximum performance and precision. LBM simulation is computationally intensive. For certain rocks, computing a 300x300x300 grid is equivalent to simulating a 3000x3000x3000 cube of atoms and takes around 5 days to measure the porosity of the rock with reasonable accuracy. Recently, high performance heterogeneous architectures have become ubiquitous and are being widely adopted by many industries, including the oil and gas industry. However, extracting the maximum performance on these heterogeneous architectures is non-trivial and requires rigorous training. In our work, we show simple yet powerful approaches and methods to take advantage of these modern heterogeneous architectures and achieve good performance. LBM has two main kernels: collision and propagation. The sequential version of the LBM code takes 20 seconds per iteration for a grid size of 300x300x300 of single precision data values on a single CPU node. One of our optimizations merges the collision and propagation kernels into a single loop in order to avoid loading and storing the entire grid from memory between the invocations of the kernels. Merging the kernels reduces the time per iteration to 12 seconds. Next, we employ array linearization and loop normalization optimizations, reducing the time per step to 6.1 seconds. Finally, we fine-tune the application by building a sparse version of the merged collision and propagation kernels. An OpenMP version of this optimized kernel takes 1 second per time step on a 12-core CPU node while an OpenCL version takes 0.02 seconds per time step on a single GPU node. We implemented the cluster version of LBM using MPI + OpenCL. The cluster implementation enables LBM simulation over very large grid sizes. Further, we experiment with different optimizations using Habanero-C, a portable language for heterogeneous (CPU-GPU) architectures. The "forasync" construct in Habanero-C is compiled down to OpenCL. Merging the collision and propagation kernels requires a shadow copy of the grid to store the new values computed in each time step. This poses a constraint on the grid size of the simulation since the GPU memory is often limited on a single device. 
When the entire grid does not fit in the available GPU memory, we partition the grid and perform the computation on one partition at a time. This requires data to be copied to and from the device. We overlap the communication of data with computation of the kernel to hide the data-copying overhead. The propagation kernel has conditional statements to handle the boundary cells. We strip-mine the kernel into computation on the inner grid and computation on the boundaries. The inner grid, which is free of conditionals, is efficiently executed on the GPU while the boundary cells, which have conditionals, are executed on the CPU. We also experiment by changing the data layout of the grid and observe the performance on various heterogeneous hardware. We conclude with a discussion of the performance of our implementations on recent heterogeneous (CPU-GPU) devices from AMD, Intel and NVIDIA. We are in the process of evaluating the performance of these versions on various heterogeneous hardware including AMD APU/ discrete devices, Intel IVB device and Nvidia Fermi and Kepler devices.
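
To make the kernel-merging optimization above concrete, here is a minimal sketch (not the authors' Habanero-C, OpenCL or MPI code) of a fused collision-propagation sweep with OpenMP that writes into the shadow copy of the grid the abstract mentions. A D3Q19 BGK collision model and periodic boundaries are assumed; the lattice velocities cx/cy/cz, the weights w and the index macro are illustrative, and bounce-back at solid walls is omitted.

    #include <omp.h>
    #include <stddef.h>

    #define Q 19
    #define IDX(x, y, z, q, nx, ny, nz) \
        ((((size_t)(z) * (ny) + (y)) * (nx) + (x)) * Q + (q))

    void lbm_step(const double *f, double *f_new,      /* current grid and shadow copy */
                  const int *solid,                    /* 1 if the cell is rock, 0 if pore */
                  int nx, int ny, int nz,
                  const int *cx, const int *cy, const int *cz,  /* lattice velocities */
                  const double *w, double omega)       /* weights, relaxation rate */
    {
        #pragma omp parallel for collapse(2) schedule(static)
        for (int z = 0; z < nz; z++)
          for (int y = 0; y < ny; y++)
            for (int x = 0; x < nx; x++) {
                if (solid[((size_t)z * ny + y) * nx + x]) continue;

                /* macroscopic density and velocity from the current distributions */
                double rho = 0.0, ux = 0.0, uy = 0.0, uz = 0.0;
                for (int q = 0; q < Q; q++) {
                    double fq = f[IDX(x, y, z, q, nx, ny, nz)];
                    rho += fq; ux += fq * cx[q]; uy += fq * cy[q]; uz += fq * cz[q];
                }
                ux /= rho; uy /= rho; uz /= rho;
                double usq = ux * ux + uy * uy + uz * uz;

                /* collide, then propagate directly into the shadow copy */
                for (int q = 0; q < Q; q++) {
                    double cu  = cx[q] * ux + cy[q] * uy + cz[q] * uz;
                    double feq = w[q] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq);
                    double fout = f[IDX(x, y, z, q, nx, ny, nz)] * (1.0 - omega) + omega * feq;

                    int xn = (x + cx[q] + nx) % nx;   /* periodic wrap assumed */
                    int yn = (y + cy[q] + ny) % ny;
                    int zn = (z + cz[q] + nz) % nz;
                    f_new[IDX(xn, yn, zn, q, nx, ny, nz)] = fout;
                }
            }
    }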

Moderators
HC

Henri Calandra

Total
Henri Calandra obtained his M.Sc. in mathematics in 1984 and a Ph.D. in mathematics in 1987 from the Universite des Pays de l’Adour in Pau, France. He joined Cray Research France in 1987 and worked on seismic applications. In 1989 he joined the applied mathematics department of the French Atomic Agency. In 1990 he started working for Total SA. After 12 years of work in high performance computing and as project leader for Pre-stack Depth... Read More →

Speakers
avatar for Deepak Majeti

Deepak Majeti

Doctoral Student, Rice University


Thursday March 6, 2014 2:20pm - 2:40pm
BRC 284 Rice University 6500 Main Street at University, Houston, TX 77030

2:20pm

Applications Session II: Hybrid CPU-GPU Finite Difference Time Domain Kernels, Thor Johnsen, Chevron

DOWNLOAD PRESENTATION

WATCH VIDEO

If you are developing finite difference kernels, you’ve probably considered whether to use CPUs or GPUs. This poses an interesting dilemma: CPUs have larger memory capacity, but GPUs are more cost effective. Which one do you value more? This talk explores a design pattern for FDTD kernels that lets you have both: the cost efficiency of GPUs combined with the memory capacity of CPUs. We show that this approach can be used to implement any FDTD kernel, including elastic tilted orthorhombic. We further show that the elastic tilted orthorhombic kernel we implemented can propagate volumes with tens of billions of cells running on a single K10 GPU. We show that this kernel can be scaled up to 64 GPUs with almost linear scaling in throughput.
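
The design pattern described above, keeping the full volume in CPU memory while streaming slabs through the GPU, can be outlined roughly as below. This is a conceptual sketch only: the helper functions are hypothetical placeholders (a real implementation would sit on cudaMemcpyAsync and device kernel launches or OpenCL queues), halo layers between slabs are omitted, and it is not Chevron's implementation.

    #include <stddef.h>

    typedef int gpu_stream_t;   /* stand-in for a CUDA/OpenCL stream or queue */

    /* Placeholder bodies: a real version would perform asynchronous host-device
     * copies and launch the stencil kernel on the device. */
    static void upload_slab_async(const float *host, int slab, gpu_stream_t s) { (void)host; (void)slab; (void)s; }
    static void launch_stencil_slab(int slab, int step, gpu_stream_t s)        { (void)slab; (void)step; (void)s; }
    static void download_slab_async(float *host, int slab, gpu_stream_t s)     { (void)host; (void)slab; (void)s; }
    static void sync_stream(gpu_stream_t s)                                    { (void)s; }

    /* One time step: the volume lives in host memory and is processed slab by
     * slab, overlapping the next upload with the current compute. */
    void fdtd_timestep_streamed(float *volume, int nslabs, int step)
    {
        gpu_stream_t copy_stream = 0, compute_stream = 1;

        upload_slab_async(volume, 0, copy_stream);   /* prime the pipeline */
        sync_stream(copy_stream);

        for (int s = 0; s < nslabs; s++) {
            if (s + 1 < nslabs)                      /* overlap next upload with compute */
                upload_slab_async(volume, s + 1, copy_stream);

            launch_stencil_slab(s, step, compute_stream);
            sync_stream(compute_stream);

            download_slab_async(volume, s, copy_stream);  /* results back to host memory */
            sync_stream(copy_stream);
        }
    }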

Moderators
avatar for Simanti Das

Simanti Das

Manager, High Performance Computing Software Development & Support, ExxonMobil Technical Computing Company
Simanti Das is currently the manager of the High Performance Computing software development and support group in the ExxonMobil Upstream IT organization. She is responsible for providing software development, optimization and support for massively parallel seismic imaging technologies for Upstream business use. Upon receiving her M.S. in Computer Science from the University of Houston in 1988, Simanti started her career at Exxon Production Research... Read More →

Speakers

Thursday March 6, 2014 2:20pm - 2:40pm
BRC 282 Rice University 6500 Main Street at University, Houston, TX 77030

2:20pm

Programming Models, Libraries and Tools: Fine Grain MPI, Earl Dodd, University of British Columbia

DOWNLOAD PRESENTATION

WATCH VIDEO

A major challenge in today’s high performance systems is how to seamlessly bridge between the fine-grain multicore processing inside one processing node and the parallelism available across the nodes in a cluster. In many cases this has led to a hybrid programming approach that combines the Message Passing Interface (MPI) with a finer grain programming model like OpenMP. However, the hybrid approach requires supporting both programming models, creates an inflexible boundary between the parts of the program using one model versus the other, and can create conflicts between the runtime system models. We present a system called Fine-grain MPI (FG-MPI) that bridges the gap between multicore and cluster nodes by extending the MPI middleware to support a finer-grain process model that can support a large number of concurrent threads in addition to multiple processes across the nodes. This provides a single unified process model that can both scale up and scale out without programming changes or rebuilds. FG-MPI extends the MPICH2 runtime to support execution of multiple MPI processes inside each single OS-process, essentially decoupling an MPI process from an OS-level process. These processes are full-fledged MPI processes. It is possible in FG-MPI to have hundreds and even thousands of MPI processes inside a single OS-process. As a result, one can develop and execute MPI programs that scale to thousands or even millions of MPI processes without requiring the corresponding number of processor cores. FG-MPI supports function-level parallelism, where an MPI process is bound to a function rather than a program, which brings MPI closer to task-oriented languages. Expressing function-level concurrency makes it easier to match the parallelism to the problem rather than to the hardware architecture. Overheads associated with the extra message-passing and scheduling of these smaller units of parallelism have been minimized. Context switching among co-located MPI processes in user space is an order of magnitude faster than that of OS-level processes. There is also support for zero-copy communication among co-located MPI processes inside the single address space. The FG-MPI runtime is integrated into the MPICH2 middleware, and the co-located MPI processes share key structures inside the middleware and cooperatively progress messages for each other. FG-MPI implements an MPI-aware user-level scheduler that works in concert with MPICH2’s progress engine and is responsive to events occurring inside the middleware. For communication efficiency, we exploit the locality of MPI processes in the system and implement optimized communication between co-located processes in the same OS-process. FG-MPI can be viewed as a type of over-subscription (in the case of SPMD); however, it is the runtime scheduler that manages this over-subscription and not the OS scheduler. Scheduling of heavy-weight MPI processes by the OS introduces a number of overheads due to costly context switches and because the OS scheduler is not aware of the cooperative nature of the communicating processes. In FG-MPI, not only are context switches an order of magnitude cheaper, but it is possible to reduce OS jitter by matching the number of OS-processes to the processor cores and having the remaining processes scheduled inside the OS-process. Cooperative execution of multiple MPI processes within an OS-process adds slackness that is important for latency hiding. 
This helps to reduce the idle time that can result from busy polling of the network inside the middleware. FG-MPI can improve the performance of existing MPI programs. Added concurrency makes it possible to adjust the unit of computation to better match the cache size of the hardware architecture. It can also aid in pipelining several smaller messages and avoiding the rendezvous protocol commonly used for large messages. The ability to specify finer-grain, smaller, task-oriented units of computation makes it possible to assign them in many different ways to achieve better load balancing. The support for finer-grain tasking makes it possible to view MPI as a library for supporting concurrent programming rather than simply a communication library for moving data among clusters of nodes. Function-level parallelism is closer to the notion of parallelism defined by process-oriented programming or languages based around Actor-like systems. In conclusion, FG-MPI provides a better match for today’s multicore processors and can be used for task-oriented programming with the ability to execute on a single machine (i.e., node) or a cluster of nodes. FG-MPI provides a single programming model that can execute within a single multicore node and across multiple multicore nodes in a cluster.
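
The function-level decomposition described above can be illustrated with an ordinary MPI program in which each process is bound to a task function chosen by its rank; under FG-MPI the same source can be launched with many lightweight co-located MPI processes per OS process. The FG-MPI launch options are not shown here, and the producer/consumer functions are illustrative, not part of FG-MPI's API.

    #include <mpi.h>
    #include <stdio.h>

    static void producer(MPI_Comm comm, int rank) {
        int value = 42;                               /* some task-local work */
        MPI_Send(&value, 1, MPI_INT, rank + 1, 0, comm);
    }

    static void consumer(MPI_Comm comm, int rank) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, rank - 1, 0, comm, MPI_STATUS_IGNORE);
        printf("rank %d received %d\n", rank, value);
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* bind each (possibly very lightweight) MPI process to a function */
        if (rank % 2 == 0 && rank + 1 < size) producer(MPI_COMM_WORLD, rank);
        else if (rank % 2 == 1)               consumer(MPI_COMM_WORLD, rank);

        MPI_Finalize();
        return 0;
    }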

Moderators
avatar for Amik St-Cyr

Amik St-Cyr

Senior researcher, Shell
Amik St-Cyr recently joined the Royal Dutch Shell company as a senior researcher in computation & modeling. Amik came to the industry from the NSF funded National Center for Atmospheric Research (NCAR). His work consisted in the discovery of novel numerical methods for geophysical fluid flows with particular attention to their implementation on supercomputers. He has experience in a plethora of numerical methods for solving time-dependent... Read More →

Speakers
avatar for Earl J. Dodd

Earl J. Dodd

Chief Strategy Officer, Scalable Analytics Inc.


Thursday March 6, 2014 2:20pm - 2:40pm
BRC 280 Rice University 6500 Main Street at University, Houston, TX 77030

2:20pm

Systems Infrastructure, Facilities and Visualization: Visualization of Very Large Scientific Data, David Pugmire, Oak Ridge National Laboratory

DOWNLOAD PRESENTATION

WATCH VIDEO

Analysis and visualization of the data generated by scientific simulation codes is a key step in enabling science from computation. However, a number of challenges lie along the current hardware and software paths to scientific discovery. These challenges occur along several different axes, including: data size, data complexity, type of visualization, number of nodes in an HPC system, and the increasing amount of parallelism within a node. Further, as computational improvements outpace those of I/O, more data will be discarded and I/O-heavy analysis will suffer. In addition, the limited memory environment, particularly in the context of in situ analysis that can sidestep some I/O limitations, will require efficiency of both algorithms and infrastructure. We present work that characterizes the performance of visualization techniques in a variety of HPC settings, as well as different visualization algorithms. We also present work on a new library, the Extreme Scale Analysis and Visualization Library (EAVL), which has been developed to efficiently utilize the massive amounts of parallelism available on current and future compute nodes, as well as provide a more descriptive data model for large scientific applications.

Moderators
avatar for Keith Gray

Keith Gray

Manager, High Performance Computing, BP
Keith Gray is Manager of High Performance Computing for BP. The HPC Team supports the computing requirements for BP’s Advanced Seismic Imaging Research efforts. This team supports one of the largest Linux Clusters dedicated to research in Oil and Gas. Mr. Gray graduated from Virginia Tech with a degree in geophysics, and has worked for BP and Amoco since 1985. He was listed in HPCWire’s People to Watch 2006.

Speakers

Thursday March 6, 2014 2:20pm - 2:40pm
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

2:40pm

Networking and Break
Thursday March 6, 2014 2:40pm - 3:15pm
BRC

3:15pm

Keynote: Efficiency and Parallelism: The Challenges of Future Computing, Bill Dally, Chief Scientist and SVP of Research, Nvidia

DOWNLOAD PRESENTATION

WATCH VIDEO

Abstract: The computing demands of mobile devices, data centers, and HPC are increasing exponentially. At the same time, the end of Dennard scaling has slowed the rate of improvement and made all computing power-limited, so that performance is determined by energy efficiency. With improvements in semiconductor process technology offering little increase in efficiency, innovations in architecture and circuits are required to maintain the expected performance scaling. The large-scale parallelism and deep storage hierarchy of future machines pose programming challenges. This talk will discuss these challenges of efficiency and parallelism in more detail and introduce some of the technologies being developed to address them.

Moderators
avatar for Jan Odegard

Jan Odegard

Executive Director, Ken Kennedy Institute for Information Technology, Rice University
Jan E. Odegard joined the Ken Kennedy Institute for Information Technology (formerly Computer and Information Technology Institute) at Rice University as Executive Director in 2002. In this role he led the development and deployment of large scale competing resources in support of research. Today, the computational resources deployed at Rice supports the research of over 100 faculty members and close to 500 users. The majority of users are... Read More →

Speakers
BD

Bill Dally

Bill is Chief Scientist and Senior Vice President of Research at NVIDIA Corporation and a Professor (Research) and former chair of Computer Science at Stanford University. Bill and his group have developed system architecture, network architecture, signaling, routing, and synchronization technology that can be found in most large parallel computers today. While at Bell Labs Bill contributed to the BELLMAC32 microprocessor and designed the MARS... Read More →


Thursday March 6, 2014 3:15pm - 4:00pm
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

4:00pm

Plenary: HPC Market Update, HPC Trends in the Oil/Gas Sector and IDC's Top 10 Predictions for 2014, Earl Joseph II, VP HPC and Exec Director, HPC User Forum, IDC

Moderators
avatar for Jan Odegard

Jan Odegard

Executive Director, Ken Kennedy Institute for Information Technology, Rice University
Jan E. Odegard joined the Ken Kennedy Institute for Information Technology (formerly Computer and Information Technology Institute) at Rice University as Executive Director in 2002. In this role he led the development and deployment of large scale competing resources in support of research. Today, the computational resources deployed at Rice supports the research of over 100 faculty members and close to 500 users. The majority of users are... Read More →

Speakers
avatar for Earl Joseph II

Earl Joseph II

Program Vice President and Executive Director HPC User Forum, IDC
Earl Joseph, Research Vice President of IDC's High-Performance Systems, drives research and consulting efforts associated with the United States, Europe and Asia-Pacific markets for technical servers and supercomputers. Dr. Joseph advises IDC clients on the competitive, managerial, technological, integration and implementation issues for technical servers. Dr. Joseph is also heading up IDC's high-end HPC user forum activities.  Dr... Read More →


Thursday March 6, 2014 4:00pm - 4:30pm
BRC 103 Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster Session and networking
Moderators
avatar for Jan Odegard

Jan Odegard

Executive Director, Ken Kennedy Institute for Information Technology, Rice University
Jan E. Odegard joined the Ken Kennedy Institute for Information Technology (formerly Computer and Information Technology Institute) at Rice University as Executive Director in 2002. In this role he led the development and deployment of large scale competing resources in support of research. Today, the computational resources deployed at Rice supports the research of over 100 faculty members and close to 500 users. The majority of users are... Read More →

Thursday March 6, 2014 4:30pm - 6:30pm
BRC

4:30pm

Poster: A Data-centric Profiler for Parallel Programs, Xu Liu, Rice University

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: A Stability Monitoring Scheme of Drill-String Vibration based on Numerical Simulations of Wave Propagations, Yu Liu, Rice University

DOWNLOAD POSTER PDF


The goal of this research is to develop a real-time stability monitoring scheme for bottom-hole-assembly (BHA) vibration in the drill-string using highly efficient numerical analysis of the lateral wave information. Lateral vibrations are considered severely destructive to drill-string operations, but lateral vibrations or waves cannot be detected at the surface due to the strong damping environment and their highly dispersive nature. On the other hand, axial acoustic waves have been constructively utilized to transmit information through the drill-string. In this study, the drill-string is modeled as a linear beam structure under gravitational field effects. An iterative wavelet-based spectral finite element method is developed to obtain a high fidelity response. Its high computational efficiency and suitability for parallel computing outperform other existing methods. Numerical simulations of the lateral wave propagation at the BHA are first conducted, and a time-frequency analysis technique is applied to the response in order to identify the relationship between the position of the neutral point and the dispersive properties of the lateral wave. Next, axial acoustic wave propagation through the upper drill pipe is simulated to explore the banded transmission properties of the drill-string introduced by periodic joints. Based on the results, a new scheme is proposed to monitor the stability condition of drill-string vibration based on a combination of lateral wave analysis at the BHA and the axial acoustic telemetry technique.

Speakers
avatar for Yu Liu

Yu Liu

PhD Student, Rice University
I am currently a PhD student in mechanical engineering and expected to graduate in November this year. I am currently working on a project of stability monitoring of drill-string vibration and wave propagation.


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Accelerating an iterative Helmholtz solver with FPGAs, Art Petrenko, University of British Columbia

DOWNLOAD POSTER PDF


We implement the Kaczmarz row-projection algorithm (Kaczmarz (1937)) on a CPU host + FPGA accelerator platform using techniques of dataflow programming. This algorithm is then used as the preconditioning step in CGMN, a modified version of the conjugate gradients method (Björck and Elfving (1979)) that we use to solve the time-harmonic acoustic isotropic constant-density wave equation. Using one accelerator we speed up the solution of the wave equation for one source by 2× compared with one Intel core.
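
For context, the row projection at the heart of the poster is simple to state; the sketch below shows one forward Kaczmarz sweep on a dense system in plain C. It is background only: the FPGA implementation pipelines these projections in a dataflow engine and works on the sparse Helmholtz matrix, neither of which is shown here, and the relaxation parameter is illustrative.

    #include <stddef.h>

    /* One forward Kaczmarz sweep on Ax = b (row-major A, m rows, n columns).
     * CGMN builds its preconditioner from a forward plus a backward sweep of
     * exactly this projection. */
    void kaczmarz_sweep(const double *A, const double *b, double *x,
                        int m, int n, double relax)
    {
        for (int i = 0; i < m; i++) {
            const double *row = A + (size_t)i * n;

            double dot = 0.0, norm2 = 0.0;
            for (int j = 0; j < n; j++) {
                dot   += row[j] * x[j];
                norm2 += row[j] * row[j];
            }
            if (norm2 == 0.0) continue;

            /* project x onto the hyperplane defined by row i */
            double scale = relax * (b[i] - dot) / norm2;
            for (int j = 0; j < n; j++)
                x[j] += scale * row[j];
        }
    }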

Speakers
avatar for Art Petrenko

Art Petrenko

Graduate Student, University of British Columbia
I am currently developing an implementation of an algorithm for iteratively solving large systems of linear equations using a reconfigurable computing platform. The purpose is to model propagation of seismic waves in the frequency domain as part of full-waveform inversion. The platform consists of a CPU connected to an FPGA accelerator and uses the paradigm of dataflow programming. | | My poster at the 2014 Rice O&G HPC Workshop is titled... Read More →


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: An aggregation algebraic multigrid method on many and multi core architectures, Rajesh Gandham, Rice University

DOWNLOAD POSTER PDF


We present an efficient, robust aggregation-based algebraic multigrid preconditioning technique for the solution of large sparse linear systems. These linear systems arise from the discretization of elliptic PDEs in various applications. Some of the applications include reservoir simulations, heat equations and incompressible Navier-Stokes equations. Algebraic multigrid methods provide grid-independent convergence for these problems, making them among the best methods for the solution of elliptic PDEs in practical applications. The method involves two stages, setup and solve. In the setup stage, hierarchical coarse grids are constructed through aggregation of the fine grid nodes. These aggregations are obtained using a maximal independent set of the fine grid nodes. The aggregations are combined with a piecewise constant (unsmoothed) interpolation from the coarse grid solution to the fine grid solution, ensuring low setup and interpolation cost. Grid-independent convergence is achieved by using recursive Krylov iterations (K-cycles) in the solve stage. An efficient combination of K-cycles and standard multigrid V-cycles is used as the preconditioner for the Conjugate Gradient method. We perform the setup on the CPU using C++ and STL libraries and solve using kernels written in a unified threading language, OCCA, for performance portability of the implementations on traditional CPU and modern many-core GPU architectures. We present a comparison of the performance of OCCA kernels when cross-compiled with OpenCL, CUDA, and OpenMP at runtime on GPUs and CPUs.
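
As a rough illustration of the aggregation step described above (not the authors' OCCA or C++ implementation), the sketch below builds aggregates greedily from a maximal independent set over a graph stored in CSR form; the resulting agg[] array is what defines the piecewise constant interpolation between grids.

    /* Returns the number of aggregates (coarse grid nodes); agg[i] receives the
     * aggregate number of fine node i. */
    int aggregate(const int *row_ptr, const int *col_idx, int n, int *agg)
    {
        for (int i = 0; i < n; i++) agg[i] = -1;
        int naggs = 0;

        /* greedy maximal independent set: a node becomes an aggregate root only
         * if neither it nor any of its neighbors has been assigned yet */
        for (int i = 0; i < n; i++) {
            int free_neighborhood = (agg[i] == -1);
            for (int k = row_ptr[i]; free_neighborhood && k < row_ptr[i + 1]; k++)
                if (agg[col_idx[k]] != -1) free_neighborhood = 0;

            if (free_neighborhood) {
                agg[i] = naggs;                       /* new aggregate root */
                for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                    agg[col_idx[k]] = naggs;          /* claim its neighbors */
                naggs++;
            }
        }

        /* attach any leftover nodes to the aggregate of a neighbor */
        for (int i = 0; i < n; i++) {
            if (agg[i] != -1) continue;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                if (agg[col_idx[k]] != -1) { agg[i] = agg[col_idx[k]]; break; }
            if (agg[i] == -1) agg[i] = naggs++;       /* isolated node */
        }
        return naggs;
    }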

Speakers
avatar for Rajesh Gandham

Rajesh Gandham

Graduate Student, Rice University
I am a PhD student working with Dr. Tim Warburton in the department of Computational and Applied Mathematics at Rice University. I am passionate about developing fast and scalable algorithms and implementations for large scale scientific computing applications. I am particularly interested in high performance via multiple GPUs.


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: An Approximate Inverse to Extended Born Modeling Operator, Jie Hou, Rice University

DOWNLOAD POSTER PDF


In seismic imaging, one tries to recover subsurface reflector information from seismic reflection data. This usually depends on the linearized model of the Born approximation, and the process essentially amounts to computing the inverse of the Born modeling operator. However, the common imaging technique, Reverse Time Migration (RTM), is only the adjoint of the modeling operator. Though it can position the reflectors correctly, the migration operator will not produce the correct amplitudes or wavelet. An inversion would be a true-amplitude Reverse Time Migration. "True amplitude" here is meant in the ray-theoretic (asymptotic) sense. True-amplitude migration was first developed for Kirchhoff migration by compensating the amplitudes. Ten Kroode (2012) gave a wave-equation-based Kirchhoff operator which is an approximate inverse of the extended modeling operator in 3D. Inspired by this work, I derive the approximate inverse mathematically in 2D. In this project, I apply asymptotic ray theory to the depth-oriented modeling/migration operator using a progressing wave expansion. The normal operator is then analyzed using the principle of stationary phase. I determine that the adjoint operator differs from an asymptotic inverse only by the application of several velocity-independent filters, which I identify explicitly. In addition to the theoretical derivation, I provide a numerical implementation and illustrate the effectiveness of the asymptotic inverse via computational examples. This is very rewarding: on one hand, the amplitude information itself is very useful for detecting the reservoir; on the other hand, the new operator can be used as a preconditioner for Full Waveform Inversion, a process which iteratively improves an initial model by matching the measured and modeled data. The new preconditioner will speed up the convergence of the iterations dramatically.
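
In standard operator notation (which may differ from the poster's own), the statement above can be summarized as follows, with F the extended Born modeling operator, d the reflection data and m the reflectivity:

    % RTM image versus true-amplitude (approximate inverse) image
    \begin{align*}
      d &\approx F\,m && \text{linearized (Born) modeling}\\
      m_{\mathrm{RTM}} &= F^{*}d && \text{RTM: adjoint, correct kinematics only}\\
      m_{\mathrm{inv}} &\approx (F^{*}F)^{-1}F^{*}d && \text{true-amplitude image}
    \end{align*}

The poster's result is that, asymptotically, the adjoint and the approximate inverse differ only by a few velocity-independent filters.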

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: An Efficient Numerical Algorithm for Flash Calculations with Graphic Processor Units (GPU) in Compositional Reservoir Simulations, Guan Qin, University of Houston
Light oil and gas condensate reservoirs, as well as some enhanced oil recovery processes, usually exhibit complicated phase behavior that involves significant changes in fluid compositions through their production life cycle. Numerical modeling of such reservoirs requires compositional simulation that solves the coupled problem of multi-phase, multi-component flow and mass exchange among different phases. Equation of State (EOS) based flash calculation is usually employed for the calculation of the phase partition of the fluid composition in compositional simulation and can consume up to 40% of the total simulation time. Such a significant computational cost should be considered and mitigated in the implementation of compositional simulation on parallel computing architectures. The recent breakthrough in the utilization of the graphics processing unit (GPU) as a highly parallel programmable processor provides a low-cost, high-performance parallel computing platform. In this paper, we propose and develop a GPU-based algorithm for the EOS-based flash calculation to improve the numerical efficiency of compositional simulations. EOS-based flash calculations involve various types of data and operations. By exploiting the dataflow nature of the flash calculation algorithm, the number of external references can be reduced due to better caching behavior and data reusability, and the memory bandwidth bottleneck can be alleviated. We first optimized the simulation code to reduce the overall operation counts. In addition, a new data structure was designed and implemented to achieve coalesced access to the global memory. Further optimization work was done to better utilize the constant memory, the shared memory and the registers on GPUs, based on data characteristics. Three compositional simulation cases, including refined SPE3 and SPE5 cases, were tested. We achieved speedup factors from 15.4 to 24.9 for the flash calculation and successfully reduced the cost of the flash calculations to a trivial level, 1%-2% of the total computational time.
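
For background (this is textbook material, not the paper's specific GPU kernel), the phase-partition step inside an EOS flash repeatedly solves the two-phase Rachford-Rice equation for the vapor fraction V, given overall mole fractions z_i and K-values K_i, and then recovers the liquid and vapor compositions:

    % Rachford-Rice equation and the resulting phase compositions
    \sum_{i=1}^{N_c} \frac{z_i\,(K_i - 1)}{1 + V\,(K_i - 1)} = 0,
    \qquad
    x_i = \frac{z_i}{1 + V\,(K_i - 1)}, \qquad y_i = K_i\,x_i .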

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Approximating Traveling Salesman and p-Median Solutions using Linear Relaxations, Caleb Fast, Rice University
This poster presents a method for developing improved approximation algorithms for two common problems from operations research, the traveling salesman problem (TSP) and the p-median problem (PMP). The PMP and the TSP are of interest because of their exceptionally wide applicability. The TSP is used for many problems, such as planning routes for geological survey or collection vehicles, as well as other problems, such as DNA sequencing. The PMP, meanwhile, is used for facility location problems, such as determining locations for fuel stations. These problems are commonly attacked using the linear relaxation of an integer programming formulation. However, for neither problem is the error bound of the relaxation well known. This poster presents a method both for understanding the errors of the relaxations and for finding approximate integral solutions to the problems when the linear solution is half-integral. The approximate solutions found are better than those of current state-of-the-art algorithms. For each problem, the method is based on solving a different, ideally tractable problem, and then using the solution of the simpler problem to compute a solution of the original. In the case of the TSP, I solve a matching problem on the support graph of the linear relaxation, while in the case of the PMP, I solve a dominating set problem. The solutions of these problems show which fractional edges of the support graph should be included in the optimal solution and which should be removed.

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Collective Transport of a Large Object in a Distributed Configuration Space, Golnaz Habibi, Rice University
Object transport has many applications in industry and agriculture, as well as in disaster relief and warehouse operations. This poster presents a novel distributed algorithm for multi-robot systems to collectively transport a large object while avoiding obstacles in an unknown environment. Given the size of the object, path planner robots generate the minimum cost path from the start to the goal position by using a distributed Bellman-Ford algorithm. Then, transporter robots carry the object along the path. A transport is safe if it is obstacle free. We define transport cost as the cost of translation and rotation of the object. This study trades off between the cost and the safety of the transport by using a distributed configuration space and tree-based path planning. We have implemented our algorithm both in simulation and in real environments. As the results show, our approach is robust to the size and shape of the object and provides a safe and efficient transport in unknown environments.

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Convergence of Discontinuous Galerkin Methods for Poroelasticity Equations, Jun Tan, Rice University
In reservoir engineering and environmental engineering, people are concerned with poroelasticity, the modeling of coupled fluid-solid processes. The Biot model is one important mathematical model of coupled fluid-solid processes. This model involves the coupling between a transport law and a balance law, and thus it can model fluid transport in porous media and predict the deformation of the solid. This work provides a theoretical analysis of a new numerical method for solving the poroelasticity equations. We approximate the pressure, displacement and dilatation by the discontinuous Galerkin method, which includes the symmetric, nonsymmetric and incomplete interior penalty Galerkin cases. We show convergence of the method by deriving error estimates. Numerical examples are given.
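
For reference, a standard quasi-static form of the Biot system referred to above is written below; the notation is generic (displacement u, pressure p, Biot coefficient alpha, storage coefficient c_0, permeability kappa) and not necessarily the authors':

    -\nabla\cdot\bigl(2\mu\,\varepsilon(u) + \lambda(\nabla\cdot u)\,I\bigr) + \alpha\,\nabla p = f,
    \qquad
    \partial_t\bigl(c_0\,p + \alpha\,\nabla\cdot u\bigr) - \nabla\cdot\bigl(\kappa\,\nabla p\bigr) = q .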

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Deformable Complex Network for Refining Low Resolution Structures, Chong Zhang, Rice University
In macromolecular X-ray crystallography, it is often desirable to build more accurate atomic models based on lower resolution experimental diffraction data. In this study, we report a refinement algorithm called the deformable complex network (DCN), which is developed by including a novel angular-network-based restraint in the target function in addition to what is used in the deformable elastic network (DEN) model (Nature 464:1218 (2010)). Our results demonstrate that, across a wide range of low-resolution structures, significant improvements were achieved in terms of multiple refinement criteria, such as the Rfree value, overfitting and Ramachandran statistics.

Speakers
CZ

Chong Zhang

Student in Computational Applied Physics, Rice Quantum Institute/Applied Physics Program
Graduated in Dec. 2013 with a Ph.D. in Computational Applied Physics. | | Passionate about integrating high performance computing, physics and mathematics to tackle industrial challenges in science and engineering.


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Discontinuous Galerkin method for miscible displacement simulations, Jizhou Li, Rice University

DOWNLOAD POSTER PDF


During the miscible displacement process, a solvent fluid is injected into a porous medium; it mixes with a resident fluid. The fluid mixture moves in the porous medium as a single phase flow, with a velocity that follows Darcy's law. Furthermore, the solvent concentration satisfies a convection-dominated parabolic problem, with a diffusion-dispersion tensor that depends on the fluid velocity in a nonlinear fashion. The fluid pressure equation is coupled with the concentration equation. These essential aspects constitute the miscible displacement problem. This problem arises in many applications, such as the production of trapped oil in reservoirs by enhanced oil recovery. The poster will present numerical simulations of miscible displacement using the discontinuous Galerkin method. The high order numerical discretization maintains mass conservation and demonstrates low sensitivity to grid distortions. The numerical method is implemented on a parallel architecture using overlapping domain decomposition. Simulation results show the robustness of the method, as well as its efficiency on a parallel cluster.

Speakers
JL

Jizhou Li

Graduate Student, Rice University
I am PhD student in Computational and Applied Mathematics at Rice University. I am interested in developing efficient and accurate solutions to porous media flow and transport problems, while maintaining a solid theoretical base.


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Fragility assessment of above ground petroleum storage tanks under storm surge, Sabarethinam Kameshwar, Rice University

DOWNLOAD POSTER


Design guidelines for above ground storage tanks (ASTs) such as American Petroleum Institute (API) 620 and API 650 provide design details to prevent failure from internal liquid pressure, internal suction (vacuum), winds and earthquake loads. However, these design codes lack descriptive guidelines to prevent failure of tanks due to loads from hurricane storm surge. Due to the lack of such code provisions, failure of ASTs due to surge loads was observed during hurricanes Katrina, Rita, Ike and Gustav. During hurricane Katrina alone, tank failure led to the release of 8 million gallons of crude petroleum products into the surrounding environment. Spillage of petrochemical and other hazardous material leads to substantial economic losses due to loss of products, cleanup activities, lawsuits and repair/reconstruction of damaged tanks. Therefore, this study aims to assess the fragility of ASTs subjected to storm surge loads in order to provide a basis for future design codes. Flotation and buckling have been identified as the major failure modes for tanks during storm surges. A probabilistic analysis is performed for flotation and buckling for anchored and un-anchored tanks. Random variables are identified for the flotation analysis, and random fields are used to generate geometric imperfections of ASTs to facilitate the buckling analysis. A probabilistic analysis calls for a large number of time consuming simulations. For this purpose, the supercomputing facilities managed by the Research Computing Support Group at Rice University are used to perform the simulations. Using logistic regression, the fragility of an AST typical of the Houston ship channel area is assessed and measures to prevent future failures are suggested.
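
The logistic-regression fragility mentioned above takes the generic form below; the covariates x (for example surge height and tank fill level) are illustrative, not necessarily the study's actual regressors:

    % probability of failure given hazard/structural covariates x
    P(\text{failure}\mid x) = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta^{\mathsf T} x)\bigr)} .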

Speakers
SK

Sabarethinam Kameshwar

Graduate Research Assistat, Rice University


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: GPU accelerated Lattice Boltzmann Method in Core Sample Analysis, Zheng Wang, Rice University

DOWNLOAD POSTER PDF


The Lattice Boltzmann Method (LBM) is a relatively new computational method for fluid simulation. From a microscopic and mesoscopic perspective, LBM discretizes the velocity into finite directions and simulates particles in collision and propagation processes. Since the LBM is particularly successful in handling complex boundary conditions, it is now used in the oil industry. In my project, LBM is applied in core sample analysis. The core sample first goes through a scanner and is reconstructed into digital data. By applying LBM to the digital data, we can find out some physical properties of rocks, such as the Reynolds number and permeability, which are very useful in the oil industry. However, the LBM is usually computationally intensive. An ordinary serial code is not adequate for implementing a high resolution LBM. To simulate the fluid in a short time, we employ the technology of GPU programming, which is the trend in high performance computing. By using GPUs, the parallel code can be hundreds of times faster than a serial code, and this allows us to apply LBM in more general cases. In addition, a combination of GPU and MPI enables the code to run on a cluster and solves the memory issue.

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Kalman filtering for large-scale problems, Timur Takhtaganov, Rice University

DOWNLOAD POSTER PDF


The Kalman Filter uses a sequence of noisy observations of a system over time to produce a sequence of approximations of the state of the system. Many variations of the original Kalman Filter have been proposed and applied in a wide variety of science and engineering fields, such as aerospace, meteorology, geophysics, oceanology, and reservoir simulation. However, often no connections between newly proposed variants and existing ones are made and no comparisons are given. I will evaluate and compare the efficiency of recently proposed Krylov space approximate Kalman filters and of the Ensemble Kalman filter on large-scale time dependent partial differential equation models. In addition, my work establishes theoretical connections between different variations of the Kalman filter, identifies their relative advantages and weaknesses, and exposes opportunities for algorithmic improvements. Future work will include analysis and implementation of these algorithmic improvements to increase the efficiency of the Kalman Filter for large-scale nonlinear problems.
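
For reference, the standard discrete-time Kalman filter recursion that these variants build on is shown below, for a linear model x_k = A x_{k-1} + w_k with observation y_k = H x_k + v_k and noise covariances Q and R:

    \begin{align*}
      \hat{x}_{k|k-1} &= A\,\hat{x}_{k-1|k-1}, &
      P_{k|k-1} &= A\,P_{k-1|k-1}A^{\mathsf T} + Q,\\
      K_k &= P_{k|k-1}H^{\mathsf T}\bigl(H P_{k|k-1}H^{\mathsf T} + R\bigr)^{-1}, & &\\
      \hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\bigl(y_k - H\,\hat{x}_{k|k-1}\bigr), &
      P_{k|k} &= (I - K_k H)\,P_{k|k-1}.
    \end{align*}

The Krylov and ensemble variants compared in the poster approximate this recursion without forming the full covariance P explicitly.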

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Modeling and computational challenges for transient finite element computations at dynamic gas-liquid-solid contact lines, Alex Lee, Rice University
Fluid flows with complex rheology and dynamic interfaces are important in many industrial applications, such as optimizing liquid printing and coating processes for manufacturing nanomaterials, or determining the flow of oil/brine systems in porous rockbed for oil recovery. Such transient flows featuring the evolution of solid-liquid-gas interfaces remain difficult to compute, especially when the liquid has a complex rheology. The difficulties are due to both high computing demands and the complexity of modeling various aspects of the system. For example, OpenMP-parallelized calculations for liquid transfer efficiency of a gravure printing process took roughly 8 cpu core-years on the Rice HPC clusters to obtain data published in [1]; this was in the simplified case of a static contact line model and non-dynamic gas phase. The physics of three-phase interfaces are still poorly understood so that there are currently several strategies tailored to specific purposes. We adopt an implementation of Navier's slip law that fits naturally into our transient Petrov-Galerkin Finite Element Method, and allows more realistic physics to be applied at the contact line [2]. Still, there are several computational challenges to be addressed, including mesh resolution local to the contact line, dynamic contact angle modeling, incorporation of the conformation tensor based model for viscoelastic liquids, and stability of time integration for such tightly coupled systems. In the contexts of our gravure printing problem and some toy problems---including one of potential interest in oil recovery---we will discuss our current advances and demonstrate our successes in addressing these challenges. [1] Lee, J. A., Rothstein, J. P., & Pasquali, M. (2013). J. Non-Newtonian Fluid Mech., 199, 1–11. [2] Sprittles, J. E., & Shikhmurzaev, Y. D. (2011). Int. J. Numer. Methods Fluids, 68(10), 1257–1298.

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Non-blocking Data Structures for High-performance Computing in Oil and Gas Industry, Zhipeng Wang, Rice University
Concurrent data structures are becoming more and more popular in parallel computing as they are widely used in operating systems and concurrent programming. There are two types of algorithms to implement concurrent data structures: blocking and non-blocking. Blocking algorithms are essentially lock-based algorithms which enforce a sequential order in which processes complete their operations on the shared data structures. However, on asynchronous multiprocessor systems, when there are more threads than cores, they suffer huge performance degradation as a result of scheduling preemption, cache misses, page faults, etc. Non-blocking algorithms can tolerate those problems and potentially achieve high concurrency while maintaining low overhead, so they are more robust in multithreaded programming models. Reservoir simulations play a crucial role in the oil and gas industry, and high concurrency and parallelization of numerical calculations without significant performance degradation on multiprocessor systems are becoming urgent for large-scale reservoir simulations. Here we implement our new non-blocking algorithm for concurrent data structures (FIFO queues etc.) used for solving the discrete linear equation system (the discrete energy and mass balance equations) on the IBM Power7 with up to 128 threads. The results show that our newly designed non-blocking algorithm can enhance the performance of the parallel program on systems with dedicated processors as well as on systems with multiprogrammed processors. Our research shows that there are significant potential applications of non-blocking concurrent data structures for high performance computing in the oil and gas industry.
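
As a minimal illustration of the compare-and-swap retry loop that non-blocking structures are built from (the poster's FIFO queues, e.g. Michael-Scott style, use the same primitive with additional head/tail bookkeeping not shown here), here is a Treiber-style lock-free stack push in C11:

    #include <stdatomic.h>
    #include <stdlib.h>

    typedef struct node {
        struct node *next;
        int value;
    } node_t;

    typedef struct {
        _Atomic(node_t *) top;
    } lf_stack_t;

    int lf_push(lf_stack_t *s, int value)
    {
        node_t *n = malloc(sizeof *n);
        if (!n) return -1;
        n->value = value;

        node_t *old_top = atomic_load_explicit(&s->top, memory_order_relaxed);
        do {
            n->next = old_top;   /* link to the snapshot we read */
            /* publish n only if nobody changed top since we read it;
             * on failure old_top is refreshed and we retry - no locks involved */
        } while (!atomic_compare_exchange_weak_explicit(&s->top, &old_top, n,
                                                        memory_order_release,
                                                        memory_order_relaxed));
        return 0;
    }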

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC - Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: OCCA: A unified approach to multi-threading languages, David Medina, Rice University

Speakers

David Medina

Graduate Student, Rice University
My name is David Medina; I'm in my fourth year of the PhD program in the Computational and Applied Mathematics department at Rice University. I'm working under the advisement of Dr. Tim Warburton on high-order numerical method applications using graphical processing units (GPUs). Together with my advisor, I've been working on OCCA to facilitate programming heterogeneous systems. As for my interests, I really enjoy programming and... Read More →


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: On the approximation of the DtN map for high contrast media and its application to domain decomposition methods, Yingpei Wang, Rice University
The Kalman Filter uses a sequence of noisy observations of a system over time to produce a sequence of approximations of the state of the system. Many variations of the original Kalman Filter have been proposed and applied in a wide variety of sciences and engineering fields, such as aerospace, meteorology, geophysics, oceanology, and reservoir simulation. However, often no connections between newly proposed variants and existing ones are made and no comparisons are given. I will evaluate and compare the efficiency of recently proposed Krylov space approximate Kalman filters and of the Ensemble Kalman filter on large-scale time dependent partial differential equation models. In addition, my work establishes theoretical connections between different variations of the Kalman filter, identifies their relative advantages and weaknesses, and exposes opportunities for algorithmic improvements. Future work will include analysis and implementation of these algorithmic improvements to increase the efficiency of Kalman Filter for large-scale nonlinear problems.
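
For reference, the standard discrete-time Kalman filter recursion that these variants approximate (notation: state estimate \(\hat{x}\), covariance \(P\), model matrix \(A\), observation matrix \(H\), process and observation noise covariances \(Q\) and \(R\)):

\[
\begin{aligned}
\text{Predict: } & \hat{x}_{k|k-1} = A\,\hat{x}_{k-1|k-1}, \qquad P_{k|k-1} = A\,P_{k-1|k-1}\,A^{T} + Q,\\
\text{Update: } & K_k = P_{k|k-1}H^{T}\left(H\,P_{k|k-1}\,H^{T} + R\right)^{-1},\\
& \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left(y_k - H\,\hat{x}_{k|k-1}\right), \qquad P_{k|k} = \left(I - K_k H\right)P_{k|k-1}.
\end{aligned}
\]

Ensemble and Krylov-subspace variants replace the explicit covariance \(P\) with low-rank approximations, which is what makes large-scale PDE models tractable.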

Speakers

Yingpei Wang

6100 Main St, Houston, TX 77005, Rice University
I am a fifth-year graduate student in the Department of Computational and Applied Mathematics at Rice University. My research focuses on computing, numerical analysis, and partial differential equations. I will graduate in May 2014 and am looking for a job in a related area.


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Optical properties of the Split Ring's Nanostructure, Yang Cao, Rice University
Pairs of split-ring resonators (SRRs) have been created in a scalable manner over large areas using a new and facile patterning technique that combines conventional colloidal lithography with stretchable poly(dimethylsiloxane) (PDMS) stamps. The polarization-dependent plasmonic resonances of SRRs can be tuned over a wide wavelength region, from the visible to the infrared, by controlling the gap sizes. Theoretical calculations based on the finite element method show excellent agreement with the experiments. This novel device has potential applications in optical sensing.

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Performance Challenges for Emerging HPC Systems, Milind Chabbi, Rice University

DOWNLOAD POSTER PDF

Today’s supercomputers are complex and enormous in scale. As a result, harnessing their full power is exceedingly difficult. Application performance problems of interest on HPC systems include both node-level and system-wide performance issues. Node-level issues include utilization of memory hierarchies; instruction-, vector-, and/or thread-level parallelism; and power consumption. System-wide problems include load imbalance and serialization across nodes, as well as communication and I/O bottlenecks. Failure to avoid or tolerate these issues can lead to major system-wide performance bottlenecks at scale. Even worse, one application may experience performance problems that result from heavy resource consumption by other jobs. In the face of all of this complexity, tools are essential for identifying code regions that consume excessive resources (e.g., time or power) relative to the work they accomplish, quantifying their impact, and diagnosing the root causes of their inefficiency. Different processors, memory hierarchies, accelerators, network topologies, and software stacks each require different tool support for measurement and analysis. Effective performance tools for today’s supercomputers require support ranging from the hardware to the application domain. To date, performance tools have focused on post-mortem analysis of application performance to pinpoint and resolve causes of performance losses. For exascale systems faced with scarce resources (especially power), efficient resource management will require programs, libraries, runtime systems, and the operating system to analyze their own performance on the fly and initiate changes, e.g., migration of work or frequency scaling, to reduce resource consumption or improve utilization. As a result, exascale systems will need new software support to analyze performance measurements on the fly and policies to determine how to react. Designing the necessary performance tool interfaces for measurement, analysis, and control, as well as the mechanisms to support them, is a key ingredient for the success of exascale systems.

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Predicting Solubility Parameters of Asphaltene Molecular Models Using Molecular Simulations, Mohan Boggara, Rice University
Production of crude oil routinely suffers from asphaltene precipitation and deposition in the reservoir and in the wellbore. Such depositions lead to significant operational and economic losses of millions of dollars to the energy industry. To predict and prevent asphaltene deposition, thorough thermo-physical characterization of crude oils is an important and significant endeavor. Current state-of-the-art, industry-standard thermo-physical characterization of crude oils still suffers from the lack of good predictive models and experiments. Our group focuses on the development and implementation of advanced thermodynamic models and experiments to address the thermo-physical characterization and phase behavior of crude oils. To improve the predictive capabilities of the thermodynamic models, a molecular-level understanding of crude oil mixtures is key. The specific focus of the work presented here is on predicting the solubility parameters (SP) of asphaltene models. SP is one of the key molecular parameters that can be directly related to derived thermodynamic properties as well as to properties such as density and refractive index (RI). Two molecules with close SP are expected to mix at the molecular level. Previous work in the group has shown a simple relationship between RI and density (the one-third rule) that allows calculation of either property at any temperature and pressure by measuring their values at standard T&P conditions. Using MD simulations, we will predict SP for various solvents and validate against experimental data in the literature. More importantly, we will predict SP for putative asphaltene models and use them as starting points for our in-house experiments measuring solubility parameters (via density and RI measurements) of various crude oils containing asphaltenes. Overall, this work will serve as a starting point for validating the accuracy of available asphaltene models and using such validated molecular models to study the dynamics/kinetics of asphaltene precipitation and aggregation.
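
For context, the Hildebrand solubility parameter is the square root of the cohesive energy density, which an MD simulation of the pure liquid estimates directly (the exact estimator used for the asphaltene models may differ):

\[
  \delta = \sqrt{\frac{\Delta E_{\mathrm{coh}}}{V_m}} \approx \sqrt{\frac{\Delta H_{\mathrm{vap}} - RT}{V_m}},
\]

where \(\Delta E_{\mathrm{coh}}\) is the cohesive (intermolecular) energy per mole, obtained in MD as the energy difference between the bulk liquid and the isolated molecules, and \(V_m\) is the molar volume.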

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Predictive theory of nanocarbon growth: doping, defects, chirality, Vasilii Artyukhov, Rice University
We present our “nanoreactor” model for the kinetics of CVD graphene growth that bridges first-principles atomistic calculations and crystal growth theory. The model explains numerous experimental features such as hexagonal graphene island shapes and absence of lattice defects. It elucidates the roles of metal catalyst in shaping the lattice of graphene [1]. We demonstrate how the original model can be extended to include other chemical species by studying the effect of B-, N-, S-doping on the growth and formation of defects [2]. Our theory is validated by atomistic Monte Carlo simulations of growth of graphene islands. These simulations are further used to study the formation of grain boundaries during coalescence of misoriented islands, for which we uncover and explain the transition between straight-line and wiggling grain boundary shapes [3]. Finally, to elucidate how the energetics of carbon nanotubes may determine the chirality at nucleation, we undertake large-scale calculations of all possible nanotube caps across the whole chiral-angle range, obeying the isolated-pentagon rule. We confirm that the intrinsic energies of the caps are almost chirality-independent, leaving open possibilities for different chirality control strategies [4]. 1. V. I. Artyukhov, Y. Liu, and B. I. Yakobson, PNAS 109, 15136 (2012). 2. V. I. Artyukhov, T.R. Galeev, and B. I. Yakobson, in preparation. 3. K. V. Bets, V. I. Artyukhov, and B. I. Yakobson, in preparation. 4. E. S. Penev, V. I. Artyukhov, and B. I. Yakobson, ACS Nano (in press).

Speakers

Vasilii Artyukhov

Postdoctoral Research Associate, Rice University


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Sampling Techniques for Boolean Satisfiability, Kuldeep Meel, Rice University

DOWNLOAD POSTER


Boolean satisfiability (SAT) has played a key role in diverse areas spanning testing, formal verification, planning, optimization, inferencing, and the like. Apart from the classical problem of checking Boolean satisfiability, the problems of generating satisfying assignments uniformly at random and of counting the total number of satisfying assignments have also attracted significant theoretical and practical interest over the years. Prior work offered heuristic approaches with very weak or no guarantees of performance, and theoretical approaches with proven guarantees but poor performance in practice. We propose a novel approach based on limited-independence hashing that allows us to design algorithms for both problems, with strong theoretical guarantees and scalability extending to thousands of variables. Based on this approach, we present two practical algorithms, UniWit, a near-uniform generator, and ApproxMC, the first scalable approximate model counter, along with reference implementations. Our algorithms work by issuing polynomially many calls to a SAT solver. We demonstrate the scalability of our algorithms over a large set of benchmarks arising from different application domains.
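
A sketch of the hashing idea (the exact hash family and parameter choices in UniWit/ApproxMC may differ): random XOR constraints partition the solution space into roughly equal cells, and counting one small cell with a SAT solver scales up to an estimate of the total. With

\[
  h_i(x_1,\dots,x_n) = a_{i,0} \oplus \bigoplus_{j=1}^{n} a_{i,j}\,x_j, \qquad a_{i,j} \in \{0,1\} \text{ chosen uniformly at random},\quad i = 1,\dots,m,
\]

each assignment lands in a given cell with probability \(2^{-m}\); if a randomly chosen cell contains \(c\) solutions (few enough to enumerate with a SAT solver), then the total count is estimated as \(|R_F| \approx c \cdot 2^{m}\), and sampling within a cell yields nearly uniform satisfying assignments.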

Speakers

Kuldeep Meel

Graduate Student, Rice University
Kuldeep is a PhD student at Rice working with Prof. Moshe Vardi and Supratik Chakraborty; he obtained his B.Tech. from IIT Bombay in 2012. His research broadly falls at the intersection of program synthesis, computer-aided verification, and formal methods. He is the recipient of the 2013 Andrew Ladd fellowship.


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: SimSQL: A software for large scale Bayesian Machine Learning, Zhuhua Cai, Rice University

DOWNLOAD POSTER PDF

This paper describes the SimSQL system, which allows for SQL-based specification, simulation, and querying of database-valued Markov chains, i.e., chains whose value at any time step comprises the contents of an entire database. SimSQL extends the earlier Monte Carlo database system (MCDB), which permitted Monte Carlo simulation of static database-valued random variables. Like MCDB, SimSQL uses user-specified “VG functions” to generate the simulated data values that are the building blocks of a simulated database. The enhanced functionality of SimSQL is enabled by the ability to parametrize VG functions using stochastic tables, so that one stochastic database can be used to parametrize the generation of another stochastic database, which can parametrize another, and so on. Other key extensions include the ability to explicitly define recursive versions of a stochastic table and the ability to execute the simulation in a MapReduce environment. We focus on applying SimSQL to Bayesian machine learning.

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Solution of the black-oil problem by discontinuous Galerkin methods, Richard Rankin, Rice University
Black-oil is a commonly used model for simulating compressible flow of a water-gas-oil system in reservoirs. It is an example of a three-component three-phase flow. The phases are liquid, vapor and aqueous and the components are oil, gas and water. In this model, the gas component can exist in both the liquid and vapor phases. The water component only exists in the aqueous phase and the oil component only exists in the liquid phase. Consequently, the aqueous phase does not exchange mass with the liquid or vapor phases but the liquid and vapor phases can exchange mass. For the black-oil problem, we choose for primary unknowns the pressure of the liquid phase, the saturation of the aqueous phase and the saturation of the vapor phase. The saturation of the liquid phase can be obtained from the saturations of the other two phases by using the fact that the sum of the saturations of the three phases must equal one. The spatial discretization is based on the interior penalty discontinuous Galerkin method. At each time step, the equations for the primary unknowns are solved sequentially but each equation remains nonlinear with respect to its primary unknown. In several numerical examples we test the robustness of the method by varying the physical input data.
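
For reference, a common statement of the black-oil component mass balances, written with the usual water/oil/gas subscripts corresponding to the aqueous, liquid, and vapor phases of the abstract (the poster's exact formulation may differ):

\[
\begin{aligned}
\partial_t\!\left(\phi\,\frac{S_w}{B_w}\right) + \nabla\cdot\frac{\mathbf{u}_w}{B_w} &= q_w,\\
\partial_t\!\left(\phi\,\frac{S_o}{B_o}\right) + \nabla\cdot\frac{\mathbf{u}_o}{B_o} &= q_o,\\
\partial_t\!\left(\phi\left(\frac{S_g}{B_g} + R_s\,\frac{S_o}{B_o}\right)\right) + \nabla\cdot\left(\frac{\mathbf{u}_g}{B_g} + R_s\,\frac{\mathbf{u}_o}{B_o}\right) &= q_g,
\end{aligned}
\]

with Darcy velocities \(\mathbf{u}_\alpha = -\frac{k\,k_{r\alpha}}{\mu_\alpha}\left(\nabla p_\alpha - \rho_\alpha g \nabla z\right)\), the closure \(S_w + S_o + S_g = 1\) mentioned in the abstract, formation volume factors \(B_\alpha\), and solution gas-oil ratio \(R_s\) accounting for the gas dissolved in the liquid phase.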

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Stochastic Approaches for Nonlinear Drillstring Dynamic Analyses, Eleazar Marquez, Rice University
Drilling performance is governed by critical factors such as hydraulic power, weight-on-bit (WOB), drill-bit rotary velocity, and circulating fluids, and conventional drilling operations need reliable alternative techniques that can reduce both catastrophic events and operational time. An extensive academic and industrial literature on conventional rotary drilling addresses assembly vibration irregularities, bit wear, fatigue, buckling, whirling, well-bore damage, and equipment failure, with the aim of characterizing these phenomena in order to increase rate of penetration (ROP). Recent studies report appreciable ROP gains from vibration-assisted drilling (VAD), in which high-frequency, low-amplitude excitation is converted into a low-frequency, high-amplitude response by superimposing an axial vibratory source on the drill-string. The proposed rig-suspended dynamical model, subjected to monochromatic deterministic and stochastic excitations and to a variety of material and geometric nonlinearities, captures the response (ROP) for a given downhole condition and formation type once percussion VAD technology is incorporated. This requires two steps: a proper mathematical representation of the drill-string, VAD source, and rock formation that captures the dominant physical attributes; and the selection of an appropriate position for the vibratory source within the drill-string that maximizes penetration rate and avoids tuning a mass near the natural frequency. Formulating the physical parameters of the equation of motion relies on finite element techniques, in which the flexibility of the drill-string and the elastic characteristics of the well-bore/formation are accounted for in both the axial and lateral directions. Modeling the drill-string dynamics then requires advanced numerical simulation, particularly for time integration of the roughly two-thousand-degree-of-freedom oscillator with mass, damping, and stiffness nonlinearities. Compatible excitation time histories are synthesized with an auto-regressive moving-average (ARMA) filter adapted to the Kanai-Tajimi power spectrum, and the stochastic vibration analyses employ the method of statistical linearization and Monte Carlo simulation.
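
For reference, the Kanai-Tajimi power spectrum in its common form (the parameters are here re-interpreted for the drilling excitation rather than for ground motion):

\[
  S(\omega) = S_0\,\frac{1 + 4\zeta_g^2\,(\omega/\omega_g)^2}{\left[1 - (\omega/\omega_g)^2\right]^2 + 4\zeta_g^2\,(\omega/\omega_g)^2},
\]

where \(\omega_g\) and \(\zeta_g\) are the filter natural frequency and damping ratio and \(S_0\) is the intensity of the underlying white noise; the ARMA filter is fitted so that its output matches this target spectrum when synthesizing time histories.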

Speakers

Eleazar Marquez

Graduate Student, Rice University


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Subsurface extended full waveform inversion, Lei Fu, Rice University

DOWNLOAD POSTER PDF

Least-squares waveform inversion has proved capable of reconstructing remarkably detailed models of subsurface structure. Unlike conventional tomography or migration techniques, which make use of only a specific portion of the seismic data, seismic waveform inversion takes into account essentially any physics of seismic wave propagation that can be modelled. However, without extremely low-frequency data or a good initial model that contains the long-scale structure information, seismic waveform inversion is very likely to be trapped by numerous spurious local minima. The extended modelling concept combines the global convergence of migration velocity analysis with the physical fidelity of waveform inversion by adding an additional dimension of freedom. Using Claerbout's survey-sinking concept, this study implements depth-oriented extended inversion by introducing a subsurface shift in the imaging condition. Synthetic experiments demonstrate that depth-oriented extended waveform inversion can overcome the local-minima problem and successfully estimate the true velocity model using conventional seismic field data.
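
One common form of the subsurface-shift (subsurface-offset) extended imaging condition referred to above is, schematically,

\[
  I(\mathbf{x}, \mathbf{h}) = \sum_{s} \int u_s(\mathbf{x} - \mathbf{h}, t)\, u_r(\mathbf{x} + \mathbf{h}, t)\, dt,
\]

where \(u_s\) and \(u_r\) are the source and receiver wavefields; for the correct velocity the image energy focuses at zero subsurface offset \(\mathbf{h} = 0\), and defocusing over \(\mathbf{h}\) supplies the long-wavelength velocity information that conventional waveform inversion tends to miss.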

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: The Argonne Leadership Computing Facility and High Frequency Physics-Based Earthquake System Simulations, David Martin, Argonne National Laboratory
The Argonne Leadership Computing Facility (ALCF) is a supercomputing user facility supported by the U.S. Department of Energy (DOE). The ALCF provides the computational science community with a world-class computing capability dedicated to breakthrough science and engineering. Over 5 billion core hours on Mira, ALCF’s 10-petaflops Blue Gene/Q supercomputer, are made available to peer-reviewed projects, including explorations into renewable energy, studies of the effects of global climate change, and efforts to unravel the origins of the universe. Collaborators have access to a full range of services and support. ALCF offers expertise in novel computational methods and algorithms, application porting, performance tuning and scaling, petascale system management, and high-performance analysis and visualization.

Speakers

David Martin

Manager, Industry Partnerships and Outreach, Argonne National Laboratory


Thursday March 6, 2014 4:30pm - 6:30pm
BRC

4:30pm

Poster: Vapor formation around heated nanoparticles: a molecular dynamic study, Vikram Kulkarni, Rice University

DOWNLOAD POSTER PDF


Strongly heated nanoparticles have many applications, ranging from cancer therapy to efficient steam generation. Gold nanoparticles may be resonantly heated when exposed to light due to the excitation of surface plasmons. Here we simulate the thermodynamics of heat transfer from a gold nanoparticle into water. This is accomplished using molecular dynamics, a large-scale brute-force computational technique capable of simulating the motions of millions of atoms. We study nanoparticles of experimentally realistic size, ranging from 17 to 26 nm and containing over a hundred thousand gold atoms, immersed in millions of water molecules. We show the conditions required for the formation of a vapor bubble around the nanoparticle, including the threshold laser power and the critical size of the particle. We show explicitly that small nanoparticles may be heated to the melting point without the formation of a surrounding bubble. However, for larger nanoparticles, a pronounced bubble is seen around the particle. Our results are compared to the well-known heat transfer equation, which is known to break down at the nanoscale due to the formation of interfacial thermal barriers. Our work is of vital importance for scientists and engineers who wish to utilize hot nanoparticles for changing the surrounding environment, whether for irradiating a tumor cell or producing vapor bubbles for enhanced steam generation.
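
The continuum description against which such MD results are typically compared is Fourier heat conduction with an interfacial (Kapitza) resistance at the particle surface (the exact boundary treatment in the poster may differ):

\[
  \rho\,c_p\,\frac{\partial T}{\partial t} = \nabla\cdot\left(k\,\nabla T\right), \qquad q'' = G\,\bigl(T_{\mathrm{particle}} - T_{\mathrm{water}}\bigr)\ \text{at the interface},
\]

where \(G\) is the interfacial thermal conductance; it is this finite \(G\), the "interfacial thermal barrier," that causes the simple continuum solution to break down for nanometer-scale particles.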

Speakers

Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Waveform inversion with source coordinate model extension, Yin Huang, Rice University
Extended full waveform inversion combines ideas from seismic inversion and migration velocity analysis using the extended model concept. In linearized extended full waveform inversion (LEFWI), the extended model is separated into a smooth background velocity (physical) and a short-scale reflectivity (extended). Minimization over the reflectivity gives a reduced objective function over the background velocity, which can then be minimized over the velocity. We will review this method and then show some numerical results for the source-coordinate model extension. The adjoint state method is used to obtain the derivatives required by this method. If the adjoint relation is not satisfied, the optimization may converge slowly or not converge at all; thus, testing the adjoint relation is crucial to a successful implementation. I will present a method to compute the derivative and adjoint of a generalized time-step function with the automatic differentiation tool TAPENADE, and then show test results for the constant-density acoustic wave equation.
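
The adjoint relation being tested is the usual dot-product (adjoint) test: for the linear map \(F\) taking a model perturbation to a data perturbation, and randomly chosen \(\delta m\) and \(\delta d\),

\[
  \langle F\,\delta m,\ \delta d \rangle_{\mathrm{data}} = \langle \delta m,\ F^{*}\,\delta d \rangle_{\mathrm{model}}
\]

should hold to within floating-point round-off; a systematic discrepancy indicates an inconsistent derivative/adjoint pair and typically shows up as stalled or erratic convergence in the optimization.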

Speakers

Yin Huang

6100 Main St, MS-134, Houston, TX, CAAM-Rice University
I am a fourth-year PhD student in the Computational and Applied Mathematics Department at Rice University, doing research on extended waveform inversion, imaging, and high-performance computing under the supervision of Dr. William Symes. Relevant coursework includes Numerical Analysis, Numerical Differential Equations, Optimization, High Performance Computing, Geophysical Data Analysis, and Exploration Geophysics.


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030

4:30pm

Poster: Write Aside Persistence (WrAP) for Storage Class Memory, Ellis Giles, Rice University
Emerging memory technologies like Phase Change Memory or Memristors (generically called SCM or Storage Class Memory) combine the ability to access data at byte granularity with the persistence of storage devices like hard disks or SSDs. With SCM, application developers can focus on a single storage abstraction rather than having to deal with both byte/word-grained accesses to DRAM locations and block-based accesses to file/disk ranges. By accessing data directly from SCM addresses instead of through slow block I/O operations, developers can gain 1-2 orders of magnitude in performance. However, this unification of storage into a single directly accessed persistent memory tier is a mixed blessing, as it pushes upon developers the burden of ensuring that SCM stores are ordered correctly, flushed from processor caches, and, if interrupted by sudden machine stoppage, do not leave objects in SCM in inconsistent states. The complexity of ensuring properly ordered and all-or-nothing updates raises significant reliability and programmability challenges. We propose a solution called Write Aside Persistence, or WrAP, that provides durability and consistency for SCM writes while ensuring fast paths to data in processor caches, DRAM, and persistent memory tiers. WrAP is presented both as a combined software/hardware architecture and as a software-only approach. Simulations of transactional data structures, such as the Graph 500 Benchmark and Standard Template Library tests, indicate the potential for significant performance gains using Write Aside Persistence for atomic and durable writes to Storage Class Memory.
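
To illustrate the ordering and flushing burden described above, the sketch below shows a generic redo-log update of a single value in persistent memory using x86 cache-line-flush and fence intrinsics. It is not the WrAP design itself, and names such as PersistentSlot are purely illustrative.

#include <immintrin.h>   // _mm_clflush, _mm_mfence
#include <cstdint>

// Illustrative only: a log-then-update pattern for one persistent value.
// NOT the WrAP architecture; it only shows why SCM stores must be flushed
// and ordered by software. The fields are assumed to reside in SCM.
struct PersistentSlot {
    uint64_t log_value;   // redo-log copy of the new value
    uint64_t log_valid;   // commit flag for the log entry
    uint64_t data;        // the "real" persistent location
};

static void flush_and_fence(const void* p) {
    _mm_clflush(p);       // evict the cache line holding *p toward memory/SCM
    _mm_mfence();         // order the flush before any subsequent stores
}

// Update slot->data to v so that a crash leaves either the old value
// or a committed log entry that recovery can replay (all-or-nothing).
void persistent_update(PersistentSlot* slot, uint64_t v) {
    slot->log_value = v;
    flush_and_fence(&slot->log_value);   // 1. make the log entry durable
    slot->log_valid = 1;
    flush_and_fence(&slot->log_valid);   // 2. commit the log entry
    slot->data = v;
    flush_and_fence(&slot->data);        // 3. update the real location
    slot->log_valid = 0;
    flush_and_fence(&slot->log_valid);   // 4. retire the log entry
}

// Recovery after a crash: replay a committed but unretired log entry.
void recover(PersistentSlot* slot) {
    if (slot->log_valid) {
        slot->data = slot->log_value;
        flush_and_fence(&slot->data);
        slot->log_valid = 0;
        flush_and_fence(&slot->log_valid);
    }
}

Every update of a single word thus costs several flushes and fences, which is the programmability and performance burden that a scheme like WrAP aims to remove.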

Speakers

Ellis Giles

Doctoral Student, Rice University
I am interested in high performance and distributed computing. I am also interested in emerging and exciting technologies. For fun I enjoy scuba diving, model rocketry, and working on classic automobiles.


Thursday March 6, 2014 4:30pm - 6:30pm
BRC Exhibit Hall Rice University 6500 Main Street at University, Houston, TX 77030