
FOAMDSMC - A DSMC SOLVER FOR RAREFIED FLOW APPLICATIONS BASED ON OPENFOAM

    THOMAS HAUSER AND JEFFREY ALLEN

ABSTRACT. The prevalence of applications involving the simulation of rarefied gas flows continues to increase, and new and innovative means for their solution have become necessary. Presented herein is a new, parallel, steady/unsteady direct simulation Monte Carlo solver, foamDSMC, based on object-oriented programming practice. Its development and validation are presented, along with various single- and multiple-processor performance characteristics. The validation results, for both the hypersonic corner flow and the sphere flow, showed the accuracy of the solver to be comparable to commercial solvers. The foamDSMC solver was additionally applied to a sounding rocket flight, demonstrating its applicability to practical simulations. The single- and multiple-processor performance results demonstrated good scalability with increased problem sizes and clear avenues for future improvement.

    1. INTRODUCTION

A rarefied gas flow may be divided into several different flow regimes according to its level of rarefaction, as quantified by the Knudsen number (Kn). A significantly large number of flows fall within the transition regime (0.1 < Kn < 10) and lie well outside the limits of conventional, continuum-based solvers. Relevant applications within this regime include upper-atmospheric simulations such as the Space Shuttle Orbiter [1], the Magellan spacecraft [2], the Stardust Sample Return Capsule [3], and the Mars Pathfinder [4]. Additional applications outside of upper-atmospheric studies include chemical vapor deposition [5], micro filters [6], and micro-electro-mechanical systems (MEMS) [7]. Traditionally, the Boltzmann equation, based on kinetic theory, remained the only appropriate option for the solution of these high-Kn flows. The inherent difficulties associated with solutions of the Boltzmann equation, however, including the large number of independent variables required (up to seven), the modeling of the collision term (including inverse collisions), and the modeling of chemical and thermal non-equilibrium effects, have motivated other, more direct and simplified methods of solving these flows. The direct simulation Monte Carlo (DSMC) method of G. A. Bird is one such method, and may be regarded as a numerical solution to the Boltzmann equation in the limit of very large numbers of simulated molecules [8, 9]. The method, unlike the Boltzmann or Navier-Stokes equations, does not rely upon the discretization and solution of a set of partial differential equations combined with appropriate initial, boundary, and closure conditions. Rather, the method, as its name implies, directly models the interactions of a small subset of molecules, each representing a statistically large number of actual molecules. The method thus avoids many of the problems associated with the Boltzmann equation. DSMC, for example, facilitates the modeling of chemical reactions by treating the species on a particle-by-particle basis and completely eliminates the need for inverse collisions, the latter being particularly problematic (for the Boltzmann equation) with respect to modeling recombination reactions involving ternary interactions.
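For reference, the Knudsen number used above is the standard ratio of the molecular mean free path \(\lambda\) to a characteristic length \(L\) of the flow,

\[ Kn = \frac{\lambda}{L}, \]

so the transition regime corresponds to mean free paths comparable to the geometric scale of the problem.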

Until 1975, due to the large computational costs, the method was limited to big-budget aerospace industries [10]. Since 1975, in parallel with increased computational efficiencies, the DSMC method has been utilized both by large-scale agencies and by individuals running their own personal computers. Bird's open source DS2V and DS3V programs [11] are examples applicable to the latter category. The first parallel implementations of the DSMC method, created in the late 1980s and early 1990s, employed structured grids and static domain decomposition; among others, these included the works of Ota [12], Nance [13], and Matsumoto [14]. In the mid 1990s the parallel, unstructured-grid DSMC solver MONACO was developed by Dietrich and Boyd [15]. Additional parallel implementations include the works of Wu [16] and LeBeau [17].

Although the DSMC method has progressed substantially from its early development to include such features as unstructured grids, dynamic load balancing, and adaptive grid refinement, few parallel implementations accommodate unsteady simulations with the capabilities needed for rapidly changing flow and species properties, or allow the flexibility and code management that the object-oriented programming style facilitates. The authors' main objective in conducting this research is therefore to outline the development, validation, and performance of a new parallel, steady/unsteady DSMC solver, foamDSMC. The solver incorporates an object-oriented approach along with the capability to solve unsteady flow applications involving rapidly changing flow and species properties.



1.1. The Direct Simulation Monte Carlo Method. The DSMC method, as its name implies, may be categorized as a Monte Carlo method in that it makes extensive use of random number generation. The primary objective is to model a rarefied gas flow by using a large number of simulated molecules to represent real gas behavior. The idea is to track the motion and interactions of these simulated molecules such that their positions, velocities, internal energies, and chemical compositions may be correctly modified over small time steps. Since the tracking and molecular interactions are conducted on a particle-by-particle basis, the conservation of mass, momentum, and energy may be enforced to machine accuracy [18].

The primary assumption upon which the DSMC method relies is that the deterministic molecular motions and the probabilistic intermolecular collisions are independent. This independence assumption is satisfied only for a sufficiently small time step, the determination of which is based on the mean collision time. The relatively large computational costs associated with the method, particularly for three-dimensional applications, have given rise to applications for which symmetry simplifications become appropriate. Although these symmetry simplifications in physical space enable the reduction of grid dimensions, the collision modeling is always three-dimensional and not susceptible to such simplifications. Recently, parallel implementations of the DSMC method have rendered many of these larger, three-dimensional applications tenable.

Because the DSMC method is statistically based and depends on the simulation of several thousand to several million simulated molecules, it may be subject to significant statistical errors or fluctuations. Although it is customary to utilize several million simulated molecules for most standard three-dimensional applications, this is still a relatively small fraction of the number of real molecules that would physically occupy the domain. Since the continuum, macroscopic quantities are computed from either time or ensemble averages associated with the particles, a certain amount of statistical error is produced. Several numerical studies have shown this error to be inversely proportional to the square root of the number of particles. An accurate DSMC simulation must therefore contain a sufficient number of particles to faithfully represent a statistically significant sample size in order to reduce this potentially significant error.
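Stated compactly, for \(N\) contributing samples the cited scaling is

\[ \varepsilon_{\text{stat}} \propto \frac{1}{\sqrt{N}}, \]

so halving the statistical error requires roughly four times the sample size.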

The primary steps used in the DSMC method include: 1) particle initialization; 2) movement and boundary interaction computations; 3) intermolecular collisions; 4) sampling; and 5) macroscopic variable output. The main algorithm loop, executed at each time step, includes steps two and three, while steps four and five are conducted at user-defined intervals. Steady-state results are obtained as the macroscopic quantities are time averaged over sufficiently long time periods. Unsteady results are usually obtained from ensemble averages over specific user-defined intervals. A more detailed treatment of the DSMC method may be found in [19].
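In outline, the per-step control flow is as follows. The sketch below is illustrative only: the names, the stub bodies, and the reuse of the NIS/NSP sampling intervals from Table 1 are assumptions rather than foamDSMC's actual interface.

    #include <cstdio>
    #include <vector>

    // Illustrative only: structure of the five DSMC steps.
    struct Particle { double x[3], v[3]; int species, cell; };

    int main() {
        const double dt = 1.0e-6;          // must stay below the mean collision time
        const int nSteps = 1000, NIS = 2, NSP = 150;
        std::vector<Particle> particles;   // step 1: initialize from free-stream state

        for (int step = 1; step <= nSteps; ++step) {
            for (auto& p : particles)      // step 2: deterministic convection
                for (int k = 0; k < 3; ++k)
                    p.x[k] += p.v[k] * dt;
            //   ... diffuse wall reflection and re-binning into cells ...
            //   step 3: per-cell probabilistic collisions (e.g. VHS pairs via NTC)
            if (step % NIS == 0) { /* step 4: accumulate per-cell samples */ }
            if (step % (NIS * NSP) == 0)   // step 5: macroscopic variable output
                std::printf("writing fields at step %d\n", step);
        }
        return 0;
    }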

    2. foamDSMC ALGORITHM DEVELOPMENT

A generalized flow chart of the parallel, steady/unsteady method is shown in Figure 1. As indicated, the primary DSMC routines, including particle initialization, movement, collision, and sampling, are maintained as central elements of the algorithm and are implemented in accordance with [19]. In an effort to focus this study on the aspects of object orientation, parallel development, and unsteady implementation, not all of the steps of the method will be covered. Furthermore, typical assumptions concerning boundary interactions and molecular collision models are maintained throughout this study. These include diffuse reflections with complete thermal accommodation, and the Variable Hard Sphere (VHS) collision model used in conjunction with Bird's no-time-counter (NTC) technique [19]. Additionally, both monatomic and polyatomic molecules may be modeled, the latter using the phenomenological inelastic collision model of Larsen and Borgnakke [20].
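For context, the NTC technique of [19] selects, in each cell of volume \(V_c\) and per time step \(\Delta t\), a candidate collision pair count

\[ N_{\text{pairs}} = \tfrac{1}{2}\, N \bar{N} F_N \,\frac{(\sigma_T c_r)_{\max}\,\Delta t}{V_c}, \]

where \(N\) and \(\bar{N}\) are the instantaneous and average numbers of simulated molecules in the cell, \(F_N\) is the real-to-simulated molecule ratio (FNUM in Table 1), \(\sigma_T\) is the total collision cross section, and \(c_r\) is the pair relative speed; each candidate pair is then accepted with probability \(\sigma_T c_r / (\sigma_T c_r)_{\max}\).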

2.1. Object Oriented Baseline Development. The foamDSMC algorithm utilizes the Open Source Field Operation and Manipulation (OpenFOAM) package [21] for its baseline set of input-output (I/O) and particle tracking routines. The OpenFOAM package consists of a vast collection of open source, object oriented (C++) routines applicable to the set-up and solution of a large number of serial and parallelizable CFD-related applications. Of particular interest to the present authors were the excellent pre- and post-processing functionality and the established, although limited, particle tracking capabilities.

The object oriented programming (OOP) approach to DSMC allows for certain advantages, in terms of code maintenance, expandability, and management, over traditional, procedural-level DSMC developments. The use of class objects, such as those defined in foamDSMC as particle or PCloud objects, allows for simplified manipulation of underlying quantities such as a particle's position, velocity, and internal energy. The OOP approach additionally allows for inheritance relationships among classes. The foamDSMC PCloud class, for example, is derived from the OpenFOAM Cloud base class (class PCloud : public Cloud), and thus inherits all of the public functionality of the Cloud class. Future extensions to foamDSMC will also greatly benefit from this inheritance feature. The expansion of the collision routine to accommodate not just the current Variable Hard Sphere (VHS) model [19] but also the Generalized Hard Sphere (GHS) and Generalized Soft Sphere (GSS) models [19] is one such example.
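A minimal sketch of this inheritance relationship, with simplified stand-ins for the OpenFOAM interfaces (the real Cloud and particle classes differ in detail):

    #include <list>

    // Illustrative shape of the relationship described above.
    template<class ParticleType>
    class Cloud {
    public:
        std::list<ParticleType> particles;  // base container and tracking interface
        void move(double dt) { /* base particle-tracking functionality */ }
    };

    struct PParticle { /* position, velocity, internal energy, species, cell */ };

    // PCloud derives from the OpenFOAM base class and inherits its public
    // functionality, adding the DSMC-specific routines on top.
    class PCloud : public Cloud<PParticle> {
    public:
        void collide(double dt) { /* VHS/NTC today; GHS or GSS could slot in here */ }
        void sample()           { /* per-cell macroscopic sampling */ }
    };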


    FIGURE 1. A generalized foamDSMC steady/unsteady flowchart.


TABLE 1. Input conditions required for foamDSMC

Input Property - Description
MNSP - Maximum number of species (limit = 3)
MNC - Maximum number of cells
MNM - Maximum number of molecules
IIS - If IIS = 0, no stream; if IIS = 1, uniform stream
FTMP - Free-stream temperature (K)
FND - Free-stream number density (m^-3)
FNUM - Ratio of the number of real molecules to the number of simulated molecules
NIS - Number of time steps before taking a sample
NSP - Number of samples before an output file is created
vel1 - Free-stream velocity vector, 3 Cartesian components (m/s)
vIndex - Viscosity index for each species
diameter - Species diameter at a given reference temperature (m)
mass - Species mass (kg)
Fsp - Species fraction of free-stream number density
ISPR - Number of internal degrees of freedom for each species

foamDSMC, in conjunction with OpenFOAM, uses a dynamically linked list to keep track of all of the particles within the domain. The list, managed by the PCloud class, is composed of particle objects holding the position, velocity, species type, internal energy, and cell number of each simulated particle. The list is particularly advantageous with respect to running parallel applications (using domain decomposition), since the migration of particles from one processor domain to another is conducted on a particle-by-particle rather than a cell-by-cell basis. The migration is often random, and its frequency depends on the application and the number of processors. However, since the DSMC method relies on collisions of particles taking place within a given cell, the list becomes extremely inefficient for the selection of cell collision partners. The primary goal was thus to maintain the linked list for parallel applications, but also to enable the sorting of particles according to cell. The solution implemented in foamDSMC was to create pointers to the particles within the list and store these pointers, according to cell, within a Standard Template Library (STL) vector (a one-dimensional, dynamically allocatable array) whose size was based on the number of cells. The result, as expected, was a dramatic increase in efficiency in the selection of collision partners within a given cell. The vector container thus aided all of the functions within foamDSMC that rely on the particles within a given cell, including the sampling routine.
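A minimal sketch of this indexing scheme, with illustrative names rather than the foamDSMC source:

    #include <list>
    #include <vector>

    struct PParticle { int cell; /* position, velocity, species, energy, ... */ };

    // The linked list owns the particles; a per-cell vector of raw pointers
    // then gives direct access to each cell's potential collision partners.
    void buildCellIndex(std::list<PParticle>& cloud,
                        std::vector<std::vector<PParticle*>>& byCell,
                        std::size_t nCells)
    {
        byCell.assign(nCells, {});
        for (PParticle& p : cloud)
            byCell[p.cell].push_back(&p);   // pointers only: no particle is copied
    }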

2.2. Pre-processing. As stated previously, one particular advantage of using OpenFOAM as the baseline code for foamDSMC was its well-established pre-processing capabilities. OpenFOAM is strictly an unstructured-mesh solver, and this, combined with the option of using hexahedral or tetrahedral cell elements (or their combination), greatly facilitates the modeling and solution of applications involving relatively complex geometries. The program is compatible with several formats of grid-creation software, including GAMBIT [22], CFX [23], and STAR-CD [24]. The present study exclusively uses Fluent .msh files created with GAMBIT [22], using either hexahedral cells (for the simplest test cases) or tetrahedral cells for more complex cases.

The size of the computational domain is generally determined by the physics of the problem and should be large enough to ensure unperturbed mean flow at inlet boundaries [19]. This is important because particles injected at the boundaries provide an input flux appropriate to equilibrium flow conditions. The cell size, as is customary, is governed by the local mean free path and is chosen (a particularly important consideration for non-adaptive grid solvers) from a conservative estimate of the local flow gradients.

The DSMC method, unlike continuum solvers, does not rely upon boundary conditions for the explicit solution of the macroscopic variables. The method does, however, require that the user distinguish between surface, symmetry, and stream boundary conditions. These specifications are made at the time of mesh creation and are read into an appropriately formatted OpenFOAM file.

Several of the user-input conditions are shown in Table 1. These include the number of species, various free-stream values, the ratio of the number of real to simulated molecules, and various species-dependent properties.

2.3. Parallel Implementation. At the pre-processing stage, the mesh and associated fields are decomposed. The primary goal is to partition the domain with minimal effort while still providing an economic solution [21]. OpenFOAM is equipped with several geometric decomposition options, including directional simple, directional hierarchical, METIS, and manual [21]. All subsequent validation and user-level applications within this study were conducted using the directional simple method, wherein the domain is partitioned in accordance with a user-specified direction.
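As an illustration, decomposition in OpenFOAM is driven by a decomposeParDict dictionary; a minimal sketch for the simple method might read as follows (the subdomain counts shown are placeholders, and exact keywords vary between OpenFOAM versions):

    numberOfSubdomains  8;

    method              simple;

    simpleCoeffs
    {
        n       (8 1 1);    // subdomain counts along x, y, z
        delta   0.001;      // cell-skew tolerance
    }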

Once the domain is decomposed, each processor executes the core DSMC routines in serial for all particles and cells within its domain. Parallel communication occurs only when particles cross interprocessor boundaries. As stated previously, each processor maintains a dynamically linked list of the local molecules within its domain; upon entry or exit of molecules from a given processor domain, this list is adjusted accordingly. Each particle is moved according to its velocity and the time step to its new position. Parallel communication of particles (along with their respective properties) is conducted via standard send and receive functions of the Message Passing Interface (MPI).
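A sketch of such an exchange, assuming a flat buffer of particle data and plain MPI send/receive semantics (the names and struct layout are illustrative, not foamDSMC's actual code):

    #include <mpi.h>
    #include <vector>

    // Particles that crossed an interprocessor boundary are packed into a
    // plain buffer and exchanged with a neighbouring rank.
    struct Migrant { double x[3], v[3], eInt; int species; };

    void exchangeParticles(std::vector<Migrant>& outgoing,
                           std::vector<Migrant>& incoming,
                           int neighbour, MPI_Comm comm)
    {
        int nOut = static_cast<int>(outgoing.size()), nIn = 0;
        // First agree on counts, then swap the particle payloads themselves.
        MPI_Sendrecv(&nOut, 1, MPI_INT, neighbour, 0,
                     &nIn,  1, MPI_INT, neighbour, 0, comm, MPI_STATUS_IGNORE);
        incoming.resize(nIn);
        MPI_Sendrecv(outgoing.data(), nOut * (int)sizeof(Migrant), MPI_BYTE, neighbour, 1,
                     incoming.data(), nIn  * (int)sizeof(Migrant), MPI_BYTE, neighbour, 1,
                     comm, MPI_STATUS_IGNORE);
        outgoing.clear();   // migrated particles leave the local list
    }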

In addition to parallelism through domain decomposition, unsteady applications using ensemble averaging admit an additional level of parallelism. Since ensemble averaging inherently combines the solutions of independent realizations, these may be performed on different parallel machines and later combined to render appropriately averaged unsteady results. This combination of parallelism renders unsteady applications particularly attractive, since it makes use of both embarrassingly parallel and decomposition methods.

Upon completion of a steady or unsteady parallel run, the results are recombined onto a single processor and made available to the several post-processing utilities with which OpenFOAM is compatible. These include VTK [25], ParaView [26], Fluent [22], FieldView [27], and EnSight [28].

2.4. Unsteady Implementation. As stated previously, unsteady DSMC applications are typically carried out using ensemble rather than time averaging techniques. Although both techniques require increased computer time and resources relative to their steady-state counterparts, the primary disadvantage of time averaging is often the excessive number of simulated molecules per cell that is required. The DS3V algorithm [11], for example, was implemented with time averaging functionality and requires upwards of 500 molecules per cell. This necessarily limits the domain size of most unsteady simulations, particularly three-dimensional applications. Ensemble averaging, in contrast, may be carried out with cell samples comparable to those of steady-state simulations (10-30 molecules/cell). The drawback, however, is still the increased time associated with carrying out numerous ensembles for each time interval. Implementing dual levels of parallelism, as suggested earlier, can significantly reduce the runtimes associated with unsteady applications.

Ensemble averaging is conducted over a user-specified number of independent realizations. These ensembles may be repeated for each time step (in the case of highly transient flows) or, for more relaxed unsteady conditions, after several time steps have elapsed. Depending on the duration and complexity of the flow simulation, the computational processing and storage requirements for unsteady flows are clearly much greater than those for steady flows.
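In code, the averaging step itself is simply a mean over realizations; the following sketch (illustrative names, one scalar field per cell) shows the idea:

    #include <cstddef>
    #include <vector>

    // The unsteady estimate at one output time is the mean over independent
    // realizations of the same instantaneous per-cell field.
    std::vector<double> ensembleAverage(
        const std::vector<std::vector<double>>& realizations)  // [run][cell]
    {
        std::vector<double> mean(realizations.front().size(), 0.0);
        for (const auto& run : realizations)
            for (std::size_t c = 0; c < mean.size(); ++c)
                mean[c] += run[c] / static_cast<double>(realizations.size());
        return mean;
    }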

Unsteady flows often require the input of changing species and flow properties, and the foamDSMC solver is unique among unsteady parallel DSMC solvers in this regard. Changing flow and species properties, such as flow concentrations, particle velocities, species fractions, and temperatures, influence the overall flow field through the incoming molecules. The foamDSMC solver incorporates these changing properties within the particle injection function and allows the user to specify these conditions, prior to running a simulation, within an input file.
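One plausible realization of such an input mechanism, shown purely as a sketch (the tabulated format and the linear interpolation are assumptions, not foamDSMC's documented behavior):

    #include <algorithm>
    #include <vector>

    // Free-stream conditions tabulated against time in an input file are
    // interpolated at injection time so that injected molecules carry the
    // instantaneous stream properties.
    struct StreamState { double t, numberDensity, temperature, velocity[3]; };

    static double lerp(double a, double b, double w) { return a + w * (b - a); }

    StreamState streamAt(const std::vector<StreamState>& table, double t)
    {
        auto hi = std::lower_bound(table.begin(), table.end(), t,
            [](const StreamState& s, double time) { return s.t < time; });
        if (hi == table.begin()) return table.front();
        if (hi == table.end())   return table.back();
        auto lo = hi - 1;
        const double w = (t - lo->t) / (hi->t - lo->t);
        StreamState s = *lo;
        s.numberDensity = lerp(lo->numberDensity, hi->numberDensity, w);
        s.temperature   = lerp(lo->temperature,   hi->temperature,   w);
        for (int k = 0; k < 3; ++k)
            s.velocity[k] = lerp(lo->velocity[k], hi->velocity[k], w);
        return s;
    }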

    3. VALIDATION CASES

3.1. Steady Hypersonic Corner Flow. The hypersonic corner flow case has become a standard against which several parallel implementations of DSMC have been validated [17, 19]. The case involves two flat plates oriented perpendicular to each other and running parallel to the free stream. The computational domain, shown in Figure 2, consists of a parallelepiped that extends 0.1 m along the x direction and 0.06 m in both the y and z directions. The domain is composed of 10x6x6 uniform hexahedral cells, each with side length 0.01 m. The free-stream conditions consist of a number density of 1.0E20 m^-3 and a velocity of (1936, 0, 0) m/s. The plate wall temperatures were set to 1000 K and were modeled as diffusely reflecting with complete thermal accommodation. A fixed time step of 1.3E-6 seconds was used, and steady-state conditions were obtained after approximately 0.1 seconds. The ratio of real to simulated molecules was 1.2E13, resulting in an average of approximately 3600 molecules, or 10 molecules per cell. A multi-species, polyatomic gas composed of N2, O2, and O was used, and the phenomenological inelastic collision model of Larsen and Borgnakke [20] was implemented. The specific gas properties, taken at standard conditions (101.3 kPa, 0 °C), may be seen in Table 2 and were obtained from Appendix A of [19]. The approximate value of the mean free path was calculated as 0.0068 m, resulting in a Knudsen number of 0.113.
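For reference, taking the 0.06 m transverse extent as the characteristic length gives Kn = λ/L = 0.0068/0.06 ≈ 0.113, matching the quoted value (and consistent with 360 cells of 0.01 m side filling the 0.1 m x 0.06 m x 0.06 m box).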

The foamDSMC algorithm was validated both in serial and in parallel (using up to eight processors) against Bird's three-dimensional, structured algorithm [19]. Parallel decompositions were performed in accordance with the simple method of OpenFOAM [21], wherein the domain was uniformly partitioned according to the number of processors specified by the user.


FIGURE 2. Hypersonic corner flow geometry composed of 360 hexahedral elements and an 8-processor parallel decomposition.

TABLE 2. foamDSMC input properties of N2, O2, and O at standard conditions.

Gas | DOF | Mol. mass, m (10^-27 kg) | Viscosity index | Dia., d (10^-10 m) | Species fraction
N2 | 2 | 46.5 | 0.74 | 4.17 | 0.777
O2 | 2 | 53.12 | 0.77 | 4.07 | 0.184
O | 0 | 26.58 | 0.75 | 3.0 | 0.0391

FIGURE 3. Hypersonic sphere flow tetrahedral mesh along the z = 0.04 m plane, and a 32-processor parallel domain decomposition.

The results of Figure ?? show contours of number density, temperature, and velocity magnitude at the x = 0.05 m and z = 0.03 m planes. Also shown are respective line plots comparing the commercial DS3V solver [11] with the foamDSMC solver. As indicated, the agreement between the DS3V and foamDSMC solvers in serial and in parallel (using up to 8 processors) is very good. Note that at present the foamDSMC solver does not implement surface sampling, and thus results corresponding to surface pressure and shear stress were not available.

3.2. Steady Hypersonic Sphere Flow. A second steady-state validation test of the foamDSMC solver was conducted on a hypersonic sphere flow application, using Bird's three-dimensional DSMC solver, DS3V [11]. The case consisted of a 0.02 m diameter sphere centered in the y-z plane and located just aft of center in the x direction (see Figure 3). The computational domain consisted of a parallelepiped extending 0.1 m in the x direction and 0.08 m in both the y and z directions. A multi-species, polyatomic gas composed of N2, O2, and O was used, with free-stream flow conditions and species properties identical to those of the corner flow application. A mean free path (λ) of 0.0068 m resulted in a Knudsen number (based on the sphere diameter) of 0.34. The domain was composed of 9,200 unstructured tetrahedral cells (see Figure 3), with average cell widths ranging upward from approximately 0.33 mean free paths adjacent to the sphere surface. The sphere wall temperatures were set to 1000 K and were modeled as diffuse with complete thermal accommodation. A time step of 6.5E-7 seconds was used, and steady-state conditions were obtained after approximately 0.01 seconds. The ratio of real to simulated molecules (FNUM) used was 1.0E12, resulting in approximately 64,000 molecules.

The results are shown in Figure 4. Contour comparisons of DS3V and foamDSMC are shown for number density, temperature, and velocity magnitude at the z = 0.04 m plane. Also shown are the respective line plots along the stagnation streamline located at y = 0.04 m, z = 0.04 m, 0 m ≤ x ≤ 0.05 m. As indicated, the agreement, using up to 32 processors, is very good.


FIGURE 4. Hypersonic sphere flow simulation of a multi-species, polyatomic gas composed of N2, O2, and O at z = 0.04 m. Shown are contours of number density, temperature, and velocity magnitude comparing the commercial DS3V and foamDSMC algorithms. Also shown are comparison plots (using up to 32 processors) of these respective quantities along the stagnation streamline corresponding to y = 0.04 m, z = 0.04 m, 0 m ≤ x ≤ 0.05 m.

3.3. Unsteady Sphere Flow. The foamDSMC solver was also validated with respect to unsteady applications. The following unsteady validation case was performed on the hypersonic sphere flow application (initially at rest, but with the remaining flow conditions as prescribed above) and was compared with the unsteady DS3V solver. DS3V was initialized with a uniform distribution of approximately 3.0E6 argon molecules, which resulted in approximately 500 molecules per cell. An initial time step (Δt) of 1.0E-7 seconds was used, with sampling conducted every 2Δt. The foamDSMC algorithm was initialized with approximately 3.18E5 argon molecules, or an average of 34 molecules per cell. For each 2Δt sampling interval, 35 independent ensembles were conducted.

Figure 5 shows number density contours using DS3V and foamDSMC at t = 5.0E-7, t = 1.1E-6, and t = 2.1E-6 seconds. As indicated, the wake quickly becomes more elongated behind the trailing edge, until it completely fills the entire aft shadow of the sphere. This unsteady wake phenomenon is a result of entrainment by the surrounding high-speed flow. The leading-edge number density reveals the evolving shock layer; the thickness of this layer continues to increase until the near steady-state thickness is achieved at t = 2.1E-6 seconds. Figure 5 also shows comparison plots of number density using foamDSMC and DS3V along the stagnation line at y = 0.04 m, z = 0.04 m, 0 m ≤ x ≤ 0.05 m, at the three noted time intervals. The results of foamDSMC were computed using four processors and show good agreement with the DS3V algorithm.

    4. APPLICATION OF foamDSMC TO THE CODA II SOUNDING ROCKET FLIGHT

Subsequent to its successful validation, the foamDSMC solver was applied to the CODA II sounding rocket flight. The mission details of the flight are excluded here for brevity, but may be found in [29, 30]. In brief, the launch was conducted in order to investigate the atomic oxygen (AO) concentration within the mesosphere and lower thermosphere (MALT). Past investigations [31, 32] have revealed that substantial external influences, primarily aerodynamic, serve to inhibit the accurate measurement of AO with in-situ measurement sensors. Numerical simulation via DSMC of the compressible flow surrounding the rocket along various up-leg and down-leg points in its trajectory has served to significantly reduce these effects. The following is a brief summary of the setup and results obtained using the foamDSMC solver.


FIGURE 5. Unsteady contour plots of number density comparing foamDSMC and Bird's DS3V algorithm using argon gas at t = 5.0E-7 s, t = 1.1E-6 s, and t = 2.1E-6 s. Also shown are comparison plots of the commercial DS3V and foamDSMC algorithms along the stagnation streamline corresponding to y = 0.04 m, z = 0.04 m, 0 m ≤ x ≤ 0.05 m.


The foamDSMC steady-state solver was applied to 25 different altitudes separated by 2 km intervals. Each altitude was solved for both the up-leg and the down-leg trajectory, resulting in a total of 49 simulations. Several different grids were required, each with varying cell concentrations appropriate to the mean free path; as a general rule, a different grid was required for each 10 km interval. The number of cells ranged from over 1.1E6 for the lower, 90 km altitudes to approximately 3.0E4 for the apogee and near-apogee cases. The number of simulated molecules also varied in accordance with the number of cells. In general, in order to maintain a statistically significant sample size and reduce potential statistical fluctuations, the average number of molecules per cell was no less than ten. The number of simulated molecules from 90 km to apogee thus ranged from approximately 11.0E6 to 3.0E5. The macroscopic variables, consisting of number density, temperature, and velocity, were sampled in all cases every two time steps, while output files were written every 300 time steps. The simulations were all conducted in parallel utilizing Utah State University's Uinta cluster supercomputer. The number of processors varied from 4 to 16, resulting in average steady-state run times of between 6 and 12 hours, respectively. Additional input conditions may be found in [jeffs diss].

Figure 6 shows contours of number density and velocity magnitude for various up-leg and down-leg locations of the CODA II trajectory. Also shown are the representative grids used for the collision domains and macroscopic variable sampling. As indicated, the grid refinement decreases dramatically with increased altitude, due to the increase of the mean free path (ranging from nearly 2 cm to more than 10 m).

FIGURE 6. foamDSMC results of number density and velocity contours at specified locations along the up-leg and down-leg CODA II trajectory. Also shown are the changing angle of attack and the coarsening of grid/cell concentration with increasing altitude.

5. BENCHMARKING RESULTS

The following benchmarking results for foamDSMC were obtained on the Uinta supercomputer. This Linux Networx cluster was installed in September 2005 at the Center for High Performance Computing at Utah State University (HPC@USU) and consists of one server node, two interactive login nodes, and 62 compute nodes. Each compute node has two dual-core AMD Opteron 265 processors and 4 GBytes of main memory. The cluster has three networks: a GBit switched Ethernet, a Flat Neighborhood Network built from GBit Ethernet, and a Myrinet interconnect. Myrinet's one-way and two-way (summed bidirectional) data rates are 1.98 Gb/s (248 MBytes/s) and 3.92 Gb/s (490 MBytes/s), respectively. The Scientific Linux distribution was used as the operating system.

5.1. Single Processor Performance. To evaluate the single-processor performance of foamDSMC, the PerfSuite command-line utility psrun [33] was utilized. The specific results presented herein are for the single 110 km, up-leg CODA II case, composed of 5.03E4 cells and initialized with 2.71E5 molecules. All profiles were computed over 50 time steps, each of duration 1.0E-5 seconds, resulting in an approximate wall time of 415.9 seconds.

Table 3 shows the functions within foamDSMC that consume the largest percentage of the runtime. Clearly the majority of the time, 55.01% and 30.97%, is spent finding the nearest cell locations and identifying cell faces, respectively. A significantly smaller proportion of the total time, 3.90% and 1.58%, is spent in the respective movement and tracking of particles. The remaining functions (runtime < 1.5%) are not shown.

Table 4 shows the foamDSMC functions with the largest percentage of L2 data cache misses. As indicated, findFaces suffers the majority of misses, with 17.4% of the total. This is followed by the 10.65% and 10.23% miss percentages of the sample and inject functions, respectively. The move function is also included, with a 7.87% L2 miss percentage.

Combining Tables 3 and 4, we see that findFaces both takes up a considerable amount of the runtime and suffers a large number of L2 data cache misses. This function thus clearly represents an ideal candidate for possible performance improvement. The move function also appears in both tables and is therefore also worth considering for possible improvement.

Table 5 provides an estimate of the total L1 and L2 data cache misses as a function of increased problem size, as quantified by the increase in simulated molecules. As shown, the number of L1 misses is fairly constant, changing by a mere 3.7% when the number of molecules is doubled. In contrast, the number of L2 misses shows more than a 51% increase.

5.2. Parallel Performance. Figure 7 shows the parallel speedup and efficiency results for the two validation cases and three CODA II applications. Specifically, the CODA II results correspond to the steady-state 110 km up-leg and apogee cases, as well as the unsteady apogee case. As indicated, the best results are attributable to the 110 km up-leg case, with its relatively large problem size of 5.03E4 cells and 4.54E5 molecules. The 110 km results further show super-linear scaling for numbers of processors fewer than 16. The steady corner results exhibit the worst parallel performance, attributable to the use of only 360 cells and 3.6E3 molecules.


    TABLE 3. Profile of foamDSMC functions sorted by time

Time (sec) | Time % | Function | Description
220.93 | 55.01 | findNearestCell | Find nearest cell location
124.38 | 30.97 | findFaces | Identify cell faces
15.66 | 3.90 | move | Move particles
6.34 | 1.58 | trackToFace | Track particle to nearest face location

    TABLE 4. L2 data cache misses

L2 misses | % Total | Function | Description
1,131,874 | 17.40 | findFaces | Identify cell faces
692,439 | 10.65 | sample | Sample the cell for macroscopic quantities
665,049 | 10.23 | inject | Inject particles into domain
511,749 | 7.87 | move | Move particles

    TABLE 5. Total number of L1 and L2 data cache misses with increased problem size

L1 misses | L2 misses | Number of molecules
5.3E9 | 0.29E9 | 2.72E5
5.5E9 | 0.49E9 | 5.44E5

FIGURE 7. Parallel speedup (left) and parallel efficiency (right) of foamDSMC versus number of processors (2-32), applied to the corner and sphere flow validation cases as well as selected CODA II applications: Steady Corner (360 cells, 3.6E3 molecules), Steady Sphere (9.2E3 cells, 6.4E4 molecules), Steady CODA 110 km Up (5.03E4 cells, 4.54E5 molecules), Steady CODA Apogee (1.75E4 cells, 2.44E5 molecules), and Unsteady CODA Apogee (1.75E4 cells, 3.0E5 molecules), each compared against the ideal.

The parallel efficiency plot, with efficiency defined as the ratio of speedup to the number of processors, further illustrates these findings, showing super-linear scaling with efficiencies greater than unity. Clearly, from these results, the scalability of the application depends on the problem size.
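In symbols, with \(T_1\) the single-processor runtime and \(T_p\) the runtime on \(p\) processors, the plotted quantities are

\[ S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}, \]

so super-linear scaling corresponds to \(E(p) > 1\).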

Finally, Figure 8 shows the scaled speedup for the 110 km up-leg CODA II case. The scaling was based on a linear, 1:1 ratio of problem size (as quantified by the number of molecules) to number of processors. As indicated, the results showed a maximum scaled speedup of approximately 1.9 using 4 processors, with a downward slope thereafter, reaching approximately 1.1 using 16 processors.

    6. CONCLUSIONS

The development, validation, and initial performance evaluation of foamDSMC, an object-oriented, parallel, steady/unsteady DSMC solver, were shown to be successful. The validation results for the hypersonic corner and sphere flows showed the solver to be comparable in accuracy to existing commercial codes. Furthermore, the solver also demonstrated credible results when applied to practical applications, including the CODA II flight profile. Serial benchmarking revealed that possible performance-related improvements may be confined to certain specific functions of the solver. Parallel benchmarking revealed that the solver's scalability was dependent on the problem size and showed super-linear scaling effects for certain sizable applications at lower numbers of processors.

Although already serviceable, the solver still requires several future modifications for increased capability. These include: 1) the development of surface sampling routines; 2) an adaptive grid capability; 3) accommodation of incompressible flows, with applications at the micro and nano scales; and 4) hybridization of the solver to accommodate rarefied as well as continuum-based flows.



FIGURE 8. Scaled speedup of foamDSMC applied to the 110 km up-leg CODA II case.

ACKNOWLEDGMENTS

The authors would like to acknowledge the Space Dynamics Laboratory's enabling technologies program and the developers of OpenFOAM. Computer time from the Center for High Performance Computing at Utah State University is gratefully acknowledged. The computational resource, the Uinta cluster supercomputer, was provided through the National Science Foundation under Grant No. CTS-0321170 with matching funds provided by Utah State University.

    REFERENCES

[1] Rault, D., "Aerodynamics of the Shuttle Orbiter at High Altitudes," Journal of Spacecraft and Rockets, Vol. 31, No. 6, 1994, pp. 944-952.
[2] Haas, B. L. and Schmitt, D. A., "Simulated Rarefied Aerodynamics of the Magellan Spacecraft during Aerobraking," Journal of Spacecraft and Rockets, Vol. 31, No. 6, 1994, pp. 980-985.
[3] Wilmoth, R., Mitcheltree, R., and Moss, J., "Low-Density Aerodynamics of the Stardust Sample Return Capsule," AIAA Paper 97-2510, 1997.
[4] Moss, J., Blanchard, R., Wilmoth, R., and Braun, R., "Mars Pathfinder Rarefied Aerodynamics: Computations and Measurements," Journal of Spacecraft and Rockets, Vol. 36, No. 3, 1999, pp. 330-339.
[5] Plimpton, S. and Bartel, T., "Parallel particle simulation of low-density fluid flows," US Department of Energy Report No. DE94-007858, 1993.
[6] Yang, X. and Yang, J., "Micromachined membrane particle filters," Sensors and Actuators, 1999, pp. 184-191.
[7] Piekos, E. and Breuer, K., "Numerical modeling of micromechanical devices using the direct simulation Monte Carlo method," Journal of Fluids Engineering, Vol. 118, 1996, pp. 464-469.
[8] Nanbu, K., "Theoretical basis of the direct simulation Monte Carlo method," Journal of the Physical Society of Japan, 1982.
[9] Wagner, W., "A convergence proof for Bird's direct simulation Monte Carlo method for the Boltzmann equation," Journal of Statistical Physics, Vol. 66, No. 3/4, 1992, pp. 1011-1044.
[10] Bird, G., "Recent Advances and Current Challenges for DSMC," Computers and Mathematics with Applications, Vol. 35, No. 1, 1998, pp. 1-14.
[11] Bird, G., The DS3V Program User's Guide, Version 1.1, Killara, New South Wales, Australia, 2003.
[12] Ota, M., Taniguchi, H., and Aritomi, M., "Parallel processing for the direct simulation Monte Carlo method," Japan Society of Mechanical Engineers, Vol. 61, pp. 496-502.
[13] Nance, R., Wilmoth, R., Moon, B., Hassan, H., and Saltz, J., "Parallel solution of three-dimensional flow over a finite flat plate," AIAA Paper 94-0219, 1994.
[14] Matsumoto, Y. and Tokumasu, T., "Parallel computing of diatomic molecular rarefied gas flows," Parallel Computing, Vol. 23, pp. 1249-1260.
[15] Dietrich, S. and Boyd, I., "Scalar and parallel optimized implementation of the direct simulation Monte Carlo method," Journal of Computational Physics, Vol. 126, 1996, pp. 328-342.
[16] Wu, J. and Lian, Y., "Parallel three-dimensional direct simulation Monte Carlo method and its applications," Computers and Fluids, Vol. 32, 2003, pp. 1133-1160.
[17] LeBeau, G., "A parallel implementation of the direct simulation Monte Carlo method," Computer Methods in Applied Mechanics and Engineering, Vol. 174, 1999, pp. 319-337.
[18] Oran, E., Oh, C., and Cybyk, B., "Direct Simulation Monte Carlo: Recent Advances and Applications," Annual Review of Fluid Mechanics, Vol. 30, 1998, pp. 403-441.
[19] Bird, G., Molecular Gas Dynamics and the Direct Simulation of Gas Flows, Oxford University Press, 1994.
[20] Borgnakke, C. and Larsen, P., "Statistical collision model for Monte Carlo simulation of polyatomic gas mixture," Journal of Computational Physics, Vol. 18, No. 4, 1975, pp. 405-420.
[21] OpenFOAM, The Mews, Picketts Lodge, Surrey RH1 5RG, UK, 2006.
[22] Fluent 6.1 User's Guide, Lebanon, NH, 1998.
[23] Ansys CFX V5.7 User's Manual, Canonsburg, PA, 2004.
[24] STAR-CD V3.20, Tustin, CA, 2006.
[25] VTK 4.4 User's Guide, Clifton Park, NY, 2004.
[26] ParaView Guide, Clifton Park, NY, 2004.
[27] FIELDVIEW 8.0 User's Guide, Lyndhurst, NJ, 2001.
[28] EnSight 7.6 User Manual, Apex, NC, 2003.


[29] Allen, J. and Hauser, T., "Aerodynamic Influences on Atomic Oxygen Sensors from Sounding Rockets," 35th AIAA Fluid Dynamics Conference and Exhibit, AIAA, Toronto, Canada, June 2005.

[30] Allen, J. and Hauser, T., "Unsteady DSMC Simulations of the Aerodynamics of Sounding Rockets," 44th AIAA Aerospace Sciences Meeting and Exhibit, AIAA, Reno, Nevada, Jan. 2006.

[31] Patterson, P., In Situ Measurements of Upper Atmospheric Atomic Oxygen: The ATOX Resonant Fluorescence/Absorption Sensor, Ph.D. thesis, Utah State University, 2005.

[32] Patterson, P., Swenson, C., Clemmons, J., Christensen, A., and Gregory, J., "Atomic oxygen erosion observations in a diffuse aurora," EOS Trans. AGU, Fall Meet. Suppl., Abstract SA21A-02, AGU, San Francisco, CA, Dec. 2003.

[33] Kufrin, R., "PerfSuite: An Accessible Open Source Performance Analysis Environment for Linux," 6th Annual International Conference on Linux Clusters: The HPC Revolution, Chapel Hill, NC, April 2005.