2014 Rice Oil & Gas HPC


Programming Models, Libraries and Tools
Thursday, March 6
 

1:00pm PST

Programming Models, Libraries and Tools: Research and Development into Extreme Scale I/O using the Adaptable I/O System (ADIOS), Scott Klasky, Oak Ridge National Laboratory


One of the most pressing problems facing the High Performance Computing (HPC) community is the ability to compose next-generation simulations and understand the associated performance tradeoffs. One realization of exascale computing is that we will not be able to simply save all of the data from simulations, so we must move toward an "in situ" processing paradigm. In our research we have developed techniques to deal with the increasing disparity between I/O throughput and compute capability. As part of this effort we developed the Adaptable I/O System (ADIOS), which is built around a service-oriented architecture and combines cutting-edge research into new I/O techniques. ADIOS provides the highest level of synchronous I/O performance for a number of mission-critical applications at various DOE Leadership Computing Facilities, and as a result won an R&D 100 Award. Our research has addressed many of the pressing needs of exascale computing; in this presentation we focus on two critical areas: 1) creating new abstractions and middleware for location-flexible scientific data analytics, and 2) creating new techniques to facilitate spatial and temporal queries of scientific data.
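
To make the service-oriented decoupling concrete, the sketch below follows the write path shown in the public ADIOS 1.x examples: the choice of I/O method lives in an external XML file, while the application only opens a group, declares its size, and writes named variables. The group name "temperature", the file names, and the exact signatures (which shifted slightly across 1.x releases) are assumptions for illustration, not the authors' code.

    #include <stdint.h>
    #include "mpi.h"
    #include "adios.h"

    int main (int argc, char ** argv)
    {
        int        rank, NX = 10;
        double     t[10];
        int64_t    handle;
        uint64_t   groupsize, totalsize;
        MPI_Comm   comm = MPI_COMM_WORLD;

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (comm, &rank);
        for (int i = 0; i < NX; i++)
            t[i] = rank * NX + i;

        /* I/O methods, buffering, etc. are selected in the XML file, not in the code. */
        adios_init ("config.xml", comm);    /* 1.x-style init; older releases omit the communicator */
        adios_open (&handle, "temperature", "output.bp", "w", comm);  /* "temperature" group assumed in config.xml */
        groupsize = 4 + 4 + 8 * (uint64_t) NX;    /* NX + rank + the double array */
        adios_group_size (handle, groupsize, &totalsize);
        adios_write (handle, "NX", &NX);
        adios_write (handle, "rank", &rank);
        adios_write (handle, "temperature", t);
        adios_close (handle);

        adios_finalize (rank);
        MPI_Finalize ();
        return 0;
    }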

Moderators

Amik St-Cyr

Senior researcher, Shell
Amik St-Cyr recently joined Royal Dutch Shell as a senior researcher in computation & modeling. He came to industry from the NSF-funded National Center for Atmospheric Research (NCAR), where his work focused on the discovery of novel numerical methods for geophysical...

Speakers

Scott Klasky

Oak Ridge National Laboratory


Thursday March 6, 2014 1:00pm - 1:20pm PST
BRC 280, Rice University, 6500 Main Street at University, Houston, TX 77030

1:20pm PST

Programming Models, Libraries and Tools: Compiler Independent Strategy for Data Locality Optimization, Jinxin Yang, University of Houston


Data locality is an important optimization for loop-oriented kernels. Auto-tuning techniques are used to find the best strategy for data locality optimization. However, auto-tuning is expensive and not independent of the computing framework: porting an application from one framework to another requires the whole auto-tuning process to be repeated in order to obtain an optimal solution for the new one. A global strategy would help expedite the porting process for an application. In this work, we present a framework, consisting of OpenUH transformation directives and the CHiLL framework, which provides an optimal strategy for the data locality problem that is independent of the compiler. Our results show that the strategies produced by our framework clearly outperform the default optimization levels of the OpenUH, GCC, Intel, and PGI compilers.
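
As a generic illustration of the kind of strategy such a framework encodes (in C, not the authors' actual OpenUH/CHiLL recipe), the sketch below contrasts a plain matrix-multiply loop nest with a cache-tiled version; the tile size B is exactly the sort of parameter an auto-tuner would otherwise have to search for each compiler.

    #define N 1024
    #define B 64    /* tile size: the parameter an auto-tuner searches over; assumes B divides N */

    /* Baseline loop nest: for large N, b is streamed with a large stride and is
     * evicted from cache before it can be reused. */
    void matmul (double a[N][N], double b[N][N], double c[N][N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    c[i][j] += a[i][k] * b[k][j];
    }

    /* Tiled loop nest: each B x B block of a, b, and c is reused while it is still
     * cache-resident, which is the data-locality effect the strategy encodes. */
    void matmul_tiled (double a[N][N], double b[N][N], double c[N][N])
    {
        for (int ii = 0; ii < N; ii += B)
            for (int kk = 0; kk < N; kk += B)
                for (int jj = 0; jj < N; jj += B)
                    for (int i = ii; i < ii + B; i++)
                        for (int k = kk; k < kk + B; k++)
                            for (int j = jj; j < jj + B; j++)
                                c[i][j] += a[i][k] * b[k][j];
    }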

Moderators

Amik St-Cyr

Senior researcher, Shell

Speakers

Jinxin Yang

HPCTools Group, University of Houston


Thursday March 6, 2014 1:20pm - 1:40pm PST
BRC 280, Rice University, 6500 Main Street at University, Houston, TX 77030

1:40pm PST

Programming Models, Libraries and Tools: Portable, MPI-Interoperable Coarray Fortran 2.0, Chaoran Yang, Rice University


The past decade has seen the advent of a number of parallel programming models such as Coarray Fortran (CAF), Unified Parallel C, X10, and Chapel. Despite the productivity gains promised by these models, most parallel scientific applications still rely on MPI as their data movement model. One reason for this trend is that it is hard for users to incrementally adopt these new programming models in existing MPI applications: because each model uses its own runtime system, combining them with MPI duplicates resources and is potentially error-prone. Such independent runtime systems were deemed necessary because, in the past, MPI was considered insufficient to play this role for these languages.

The recently released MPI-3, however, adds several new capabilities that now provide all of the functionality needed to act as a runtime, including a much more comprehensive one-sided communication framework. In this paper, we investigate how MPI-3 can form a runtime system for one example programming model, CAF, with a broader goal of enabling a single application to use both MPI and CAF with the highest level of interoperability.
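
To make the "MPI-3 as a runtime" idea concrete, the sketch below is a hedged illustration in C (not the paper's actual CAF 2.0 lowering) of how a coarray-style remote assignment such as x(:)[p] = buf(:) could map onto MPI-3 RMA: the coarray becomes an MPI window, and the assignment becomes a passive-target MPI_Put followed by a flush for remote completion.

    #include <mpi.h>

    #define N 1024

    int main (int argc, char **argv)
    {
        int      rank, nprocs;
        double  *x;                      /* this image's slice of the "coarray" */
        double   buf[N];
        MPI_Win  win;

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);
        MPI_Comm_size (MPI_COMM_WORLD, &nprocs);

        /* The runtime would allocate one window per coarray. */
        MPI_Win_allocate (N * sizeof (double), sizeof (double), MPI_INFO_NULL,
                          MPI_COMM_WORLD, &x, &win);
        MPI_Win_lock_all (0, win);       /* passive-target epoch held for the program's lifetime */

        for (int i = 0; i < N; i++)
            buf[i] = rank;

        /* Roughly what a coarray assignment x(:)[p] = buf(:) could lower to: */
        int p = (rank + 1) % nprocs;
        MPI_Put (buf, N, MPI_DOUBLE, p, 0, N, MPI_DOUBLE, win);
        MPI_Win_flush (p, win);          /* remote completion, e.g. at an image synchronization point */

        MPI_Win_unlock_all (win);
        MPI_Win_free (&win);
        MPI_Finalize ();
        return 0;
    }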

Speakers
Chaoran Yang

Rice University

Thursday March 6, 2014 1:40pm - 2:00pm PST
BRC 280, Rice University, 6500 Main Street at University, Houston, TX 77030

2:00pm PST

Programming Models, Libraries and Tools: A parallel multi-physics library for rigid-body, flexible-body, and fluid dynamics, Dan Negrut, University of Wisconsin - Madison


We present a multi-physics, multi-discipline computational framework for the modeling, simulation, and visualization of multibody dynamics, granular flow, and fluid-solid interaction applications. The Chrono simulation tool has a modular structure, built on top of five foundation elements that provide support for (1) modeling; (2) numerical solution; (3) proximity computation and contact detection; (4) domain decomposition and inter-domain communication; and (5) pre- and post-processing.

The modeling component provides support for the automatic generation of the very large and complex sets of equations for different classes of applications. This is achieved in a fashion transparent to the user, who need only provide high-level model and solution parameters. Examples include the equations of motion for granular flow simulations, using either a Differential Variational Inequality (DVI) or a Discrete Element Method (DEM) approach; the dynamic equations in an Absolute Nodal Coordinate Formulation (ANCF) for flexible multibody dynamics; and the Smoothed Particle Hydrodynamics (SPH) discretization of the Navier-Stokes equations for fluid-solid interaction problems.

The numerical solution component provides the parallel algorithmic support required to solve the set of equations governing the dynamics of interest. Depending on the underlying physics, various parallel solvers are employed: for the optimization problems arising in the DVI approach to handling frictional contact, for the nonlinear problems arising in the context of implicit numerical integration, and for SPH-based methods in fluid-solid interaction problems. For discrete problems, the proximity computation and contact detection component handles contact detection tasks; for continuum problems handled in a meshless framework, it produces the list of neighboring nodes that overlap the compact support associated with each node of the discretization.

The domain decomposition and inter-domain communication component manages the splitting of large problems into subdomains and provides support for the required inter-process communication. This enables the MPI simulation of granular flow problems with millions of particles interacting through frictional contact, conducted on hundreds of distributed nodes. The pre/post-processing component supports the process of setting up a model using the Chrono API and provides efficient visualization of simulation results from problems involving millions of states resolved at frequencies of hundreds of Hertz.

Chrono leverages heterogeneous parallel computing architectures, including GPUs and multi-core CPUs, as well as MPI distributed architectures, to accelerate the simulation of very large systems. Examples include granular dynamics problems in which the number of interacting elements runs into the millions, and fluid-solid interaction simulations involving millions of fluid markers and tens of thousands of solid (rigid or flexible) bodies. Chrono seamlessly handles systems that combine complex mechanisms composed of rigid bodies connected through mechanical joints with collections of millions of discrete elements interacting through contact, impact, and friction. Chrono is available open source under a BSD license. Completely platform-independent, the Chrono::Engine libraries are available for Windows, Linux, and Mac OS X, in both 32-bit and 64-bit versions.
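
As a simplified taste of one of the models named above, the sketch below implements a generic linear spring-dashpot DEM normal-contact force between two spheres in C. It is a textbook formulation with user-supplied stiffness and damping parameters, offered for orientation only; it is not Chrono's implementation.

    #include <math.h>

    /* Two spheres with position x, velocity v, and radius r. */
    typedef struct { double x[3], v[3], r; } Sphere;

    /* Linear spring-dashpot normal contact force on sphere a due to sphere b.
     * kn is a contact stiffness and gn a damping coefficient (hypothetical,
     * user-chosen values); f receives the force, or zero if the spheres do not touch. */
    void dem_normal_force (const Sphere *a, const Sphere *b,
                           double kn, double gn, double f[3])
    {
        double d[3], dist2 = 0.0;
        for (int i = 0; i < 3; i++) {
            d[i] = a->x[i] - b->x[i];
            dist2 += d[i] * d[i];
        }
        double dist  = sqrt (dist2);
        double delta = (a->r + b->r) - dist;          /* penetration depth */

        f[0] = f[1] = f[2] = 0.0;
        if (delta <= 0.0 || dist == 0.0)
            return;                                   /* no contact */

        double n[3], vn = 0.0;
        for (int i = 0; i < 3; i++)
            n[i] = d[i] / dist;                       /* contact normal, pointing from b to a */
        for (int i = 0; i < 3; i++)
            vn += (a->v[i] - b->v[i]) * n[i];         /* normal relative velocity */

        double fn = kn * delta - gn * vn;             /* elastic repulsion plus viscous damping */
        if (fn < 0.0)
            fn = 0.0;                                 /* do not allow adhesive pull-off */
        for (int i = 0; i < 3; i++)
            f[i] = fn * n[i];
    }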

Moderators

Amik St-Cyr

Senior researcher, Shell

Speakers
Dan Negrut

University of Wisconsin - Madison

Thursday March 6, 2014 2:00pm - 2:20pm PST
BRC 280, Rice University, 6500 Main Street at University, Houston, TX 77030

2:20pm PST

Programming Models, Libraries and Tools: Fine Grain MPI, Earl Dodd, University of British Columbia


A major challenge in today's high performance systems is how to seamlessly bridge between the fine-grain multicore processing inside one node and the parallelism available across the nodes of a cluster. In many cases this has led to a hybrid programming approach that combines the Message Passing Interface (MPI) with a finer-grain programming model such as OpenMP. However, the hybrid approach requires supporting both programming models, creates an inflexible boundary between the parts of the program using one model versus the other, and can create conflicts between the runtime systems.

We present a system called Fine-grain MPI (FG-MPI) that bridges the gap between multicore and cluster nodes by extending the MPI middleware to support a finer-grain process model: a large number of concurrent threads of execution within each node in addition to multiple processes across the nodes. This provides a single unified process model that can both scale up and scale out without programming changes or rebuilds. FG-MPI extends the MPICH2 runtime to support the execution of multiple MPI processes inside a single OS process, essentially decoupling an MPI process from an OS-level process. These are full-fledged MPI processes; it is possible in FG-MPI to have hundreds or even thousands of MPI processes inside a single OS process. As a result, one can develop and execute MPI programs that scale to thousands or even millions of MPI processes without requiring the corresponding number of processor cores.

FG-MPI supports function-level parallelism, in which an MPI process is bound to a function rather than a program, bringing MPI closer to task-oriented languages. Expressing function-level concurrency makes it easier to match the parallelism to the problem rather than to the hardware architecture. The overheads associated with the extra message passing and scheduling of these smaller units of parallelism have been minimized: context switching among co-located MPI processes in user space is an order of magnitude faster than switching between OS-level processes, and there is support for zero-copy communication among co-located MPI processes inside the single address space. The FG-MPI runtime is integrated into the MPICH2 middleware, and the co-located MPI processes share key structures inside the middleware and cooperatively progress messages for each other. FG-MPI implements an MPI-aware user-level scheduler that works in concert with MPICH2's progress engine and is responsive to events occurring inside the middleware. For communication efficiency, we exploit the locality of MPI processes in the system and implement optimized communication between co-located processes in the same OS process.

FG-MPI can be viewed as a type of over-subscription (in the SPMD case); however, it is the runtime scheduler that manages this over-subscription, not the OS scheduler. Scheduling of heavyweight MPI processes by the OS introduces a number of overheads, both because of costly context switches and because the OS scheduler is not aware of the cooperative nature of the communicating processes. In FG-MPI, not only are context switches an order of magnitude cheaper, but it is also possible to reduce OS jitter by matching the number of OS processes to the processor cores and scheduling the remaining MPI processes inside those OS processes. Cooperative execution of multiple MPI processes within an OS process adds slackness that is important for latency hiding, and it helps to reduce the idle time that can result from busy polling of the network inside the middleware.

FG-MPI can improve the performance of existing MPI programs. The added concurrency makes it possible to adjust the unit of computation to better match the hardware's cache sizes. It can also aid in pipelining several smaller messages and in avoiding the rendezvous protocol commonly used for large messages. The ability to specify finer-grain, task-oriented units of computation makes it possible to assign them in many different ways to achieve better load balancing. This support for finer-grain tasking makes it possible to view MPI as a library for concurrent programming rather than simply a communication library for moving data among the nodes of a cluster. Function-level parallelism is closer to the notion of parallelism found in process-oriented programming or in languages based on Actor-like systems.

In conclusion, FG-MPI provides a better match for today's multicore processors and can be used for task-oriented programming with the ability to execute on a single machine (i.e., node) or on a cluster of nodes. FG-MPI provides a single programming model that can execute within a single multicore node and across multiple multicore nodes in a cluster.
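
FG-MPI's own launch and process-mapping interface is not reproduced here; as a neutral illustration of the execution model described above, the sketch below is an ordinary MPI program in C whose ranks carry almost no state, which is the situation FG-MPI is designed to exploit.

    #include <mpi.h>
    #include <stdio.h>

    /* A token ring: each rank receives from its left neighbour and sends to its right.
     * The per-rank state is a single integer, exactly the kind of fine-grain MPI
     * process that FG-MPI co-schedules by the thousand inside one OS process; the
     * number of co-located processes is chosen at launch time, so the same binary
     * runs unchanged with 8 ranks or 8,000. Run with at least two ranks. */
    int main (int argc, char **argv)
    {
        int rank, size, token;

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);
        MPI_Comm_size (MPI_COMM_WORLD, &size);

        int right = (rank + 1) % size;
        int left  = (rank + size - 1) % size;

        if (rank == 0) {
            token = 42;
            MPI_Send (&token, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
            MPI_Recv (&token, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf ("token returned to rank 0 after visiting %d ranks\n", size);
        } else {
            MPI_Recv (&token, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send (&token, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize ();
        return 0;
    }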

Moderators

Amik St-Cyr

Senior researcher, Shell

Speakers

Earl J. Dodd

Chief Strategy Officer, Scalable Analytics Inc.


Thursday March 6, 2014 2:20pm - 2:40pm PST
BRC 280, Rice University, 6500 Main Street at University, Houston, TX 77030