PAQMSG: A Parallelization Library for Air Quality Models on Structured Grids

P. Miehe and A. Sandu


PAQMSG is an MPI-based communication library for the parallelization of air quality models on structured grids. It consists of distribution, gathering and repartitioning routines for XY and HV domain decomposition implementing a master-worker strategy. The library is architecture and application independent and includes optimization strategies for different types of architectures.

PAQMSG was developed with support from NSF through the CAREER award ACI-0093139.


Source Code for Communication Library: tar.gz (59 KB). 
Source Code for Optimized HV-partitioning: tar.gz (59 KB). 
Slim version of Source Code: tar.gz (62 KB). 
Reference Manual for Communication Routines: ps.gz (280 KB) and pdf (250 KB). 
P. Miehe's M.S. Project Report: ps.gz (1.5 MB) and pdf (1.7 MB). 


Since the start of industrialization in the 19th century, humankind has reached a high level of technology. This development over the past two centuries has had a great impact on our environment. Life depends on the air that surrounds us. In recent years, scientists have studied the impact of air pollution on our world. To preserve the world for our children, governments throughout the world now regulate the admissible levels of pollutants in the atmosphere. Air pollution harms human health and can damage the environment and property. Air quality models (AQMs) are used to simulate the reactions and effects of chemical species in the atmosphere.

AQMs include various submodels that cooperate to tackle the complex simulation problem. Tracking the reactions of chemical species over time is only part of the task. Physical factors such as sunlight and wind affect the reaction rates and species concentrations, and they have to be modeled as well in order to simulate air quality.

Typically, a regional air quality model simulates about 100 gas-phase chemical species on a grid of 100 × 100 points in the horizontal directions and 20 vertical layers. To make predictions, a simulation-to-real-time ratio of 100:1 is desired. This requires a sustained performance of a few hundred gigaflops. Today's standalone workstations do not offer that kind of power, but parallel machines can be applied to the problem. Parallelization can provide a faster time-to-result for all simulation scenarios.
Chemistry calculations take most of the simulation time in air quality modeling, as shown in the accompanying figure. Chemical reactions are independent in each grid cell and are therefore embarrassingly parallel. Wind and diffusion, however, introduce strong dependencies between grid cells at each time step, so the parallelization cannot avoid communication overhead.
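To make these numbers concrete, the short Python sketch below works out the problem size they imply. The variable names are illustrative, and the 100:1 ratio is read here as 100 simulated seconds per second of wall-clock time, which is an assumption about the intended meaning.

```python
# Figures quoted in the text above.
n_species = 100             # gas-phase chemical species
nx, ny, nz = 100, 100, 20   # horizontal grid points and vertical layers
speed_ratio = 100           # assumed: simulated seconds per wall-clock second

n_cells = nx * ny * nz
n_unknowns = n_cells * n_species

# Wall-clock budget for one simulated day under the assumed reading.
seconds_per_day = 24 * 60 * 60
wall_clock_budget = seconds_per_day / speed_ratio

print(n_cells)            # 200000 grid cells
print(n_unknowns)         # 20000000 concentration values per time level
print(wall_clock_budget)  # 864.0 seconds per simulated day
```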

In the following we present two domain decomposition schemes that are recommended for use with implicit transport schemes. Both are HV domain decomposition approaches.

Figures: Regular HV domain decomposition; Irregular HV domain decomposition.

HV-partitioning stands for horizontal-vertical partitioning. Figure (a) on the right shows H-slices, and figure (b) shows V-columns. All grid points with the same z index belong to the same H-slice. A V-column consists of a single grid point in the X and Y dimensions and the full column in the Z dimension.
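The two definitions can be written down as index sets. The following Python sketch is only an illustration of the definitions; the function names `h_slice` and `v_column` are hypothetical and are not part of PAQMSG.

```python
def h_slice(nx, ny, k):
    """All grid points (i, j, k) that share the vertical index k."""
    return [(i, j, k) for j in range(ny) for i in range(nx)]


def v_column(nz, i, j):
    """All grid points (i, j, k) above the horizontal location (i, j)."""
    return [(i, j, k) for k in range(nz)]


nx, ny, nz = 4, 3, 2
print(len(h_slice(nx, ny, 0)))   # nx * ny points in one H-slice
print(len(v_column(nz, 1, 2)))   # nz points in one V-column

# An H-slice and a V-column always share exactly one grid point.
print(set(h_slice(nx, ny, 1)) & set(v_column(nz, 1, 2)))
```

Because every H-slice meets every V-column in exactly one point, switching between the two layouts touches data owned by every slice.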

Only a combination of H- and V-partitioning can be used to parallelize the AQM. The data has to be shuffled from H-slices to V-columns and back during each time step. In the H-partitioned phase the degree of parallelism depends on the number of grid points in the Z dimension: at most Nz processors can be active. Only the X and Y transport can be done on H-slices, so the time-consuming chemistry calculations are done on V-columns. There are almost no limitations in this phase, because there are Nx Ny V-columns that can be distributed among the processors.
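The limits described above, and the cost of the shuffle, can be explored with a small counting sketch in Python. It performs no communication and is not PAQMSG code. The ownership rules in `h_owner` and `v_owner` (round robin over z, and round robin over columns with x varying fastest) are assumptions chosen for illustration, as are all function names.

```python
from collections import Counter


def h_owner(k, nprocs):
    # Assumed rule: H-slices are dealt out round robin by z index.
    return k % nprocs


def v_owner(i, j, nx, nprocs):
    # Assumed rule: V-columns are dealt out round robin, x fastest.
    return (j * nx + i) % nprocs


def active_processors(nx, ny, nz, nprocs):
    """Processors that have work in the H phase and in the V phase."""
    return min(nz, nprocs), min(nx * ny, nprocs)


def shuffle_volume(nx, ny, nz, nprocs):
    """Count grid points that change owner when going from H to V."""
    traffic = Counter()
    for k in range(nz):
        src = h_owner(k, nprocs)
        for j in range(ny):
            for i in range(nx):
                dst = v_owner(i, j, nx, nprocs)
                if src != dst:
                    traffic[(src, dst)] += 1
    return traffic


h_active, v_active = active_processors(100, 100, 20, 64)
print(h_active, v_active)  # 20 64

traffic = shuffle_volume(10, 10, 4, 8)
print(sum(traffic.values()), "of", 10 * 10 * 4, "grid points change owner")
```

With 64 processors on the 100 × 100 × 20 grid, only 20 can work during the H phase, while all 64 can work on V-columns.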

V-columns can be distributed in a regular or an irregular fashion. Regular distribution assigns V-columns to processors by traversing the x dimension and then the y dimension, while irregular distribution assigns them by traversing the diagonals. In both cases the assignments are made in a circular (round-robin) fashion.
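The two traversal orders can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the library's routine: the function names are hypothetical, and the diagonal order used here (anti-diagonals i + j = d, visited with increasing i) is one plausible reading of "going through the diagonals".

```python
def regular_assignment(nx, ny, nprocs):
    """Round robin over V-columns, x varying fastest, then y."""
    owner = {}
    count = 0
    for j in range(ny):
        for i in range(nx):
            owner[(i, j)] = count % nprocs
            count += 1
    return owner


def diagonal_assignment(nx, ny, nprocs):
    """Round robin over V-columns, one anti-diagonal after another."""
    owner = {}
    count = 0
    for d in range(nx + ny - 1):
        # Keep both i and j = d - i inside the grid.
        for i in range(max(0, d - ny + 1), min(nx, d + 1)):
            owner[(i, d - i)] = count % nprocs
            count += 1
    return owner


def show(owner, nx, ny):
    """Print the owning processor of each V-column, one row per y."""
    for j in range(ny):
        print(" ".join(str(owner[(i, j)]) for i in range(nx)))


show(regular_assignment(3, 3, 3), 3, 3)
print()
show(diagonal_assignment(3, 3, 3), 3, 3)
```

In this small case the regular scheme gives each processor all columns with the same x index, while the diagonal scheme spreads each processor's columns across the grid. Both hand out the same number of columns per processor.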


Presented are the speedups obtained for the parallel version of the state-of-the-science air quality model STEM-III on the Michigan Tech Beowulf cluster.
Speedups for STEM-III on a Beowulf Cluster