Parallel Scaling

Nanoelectronic Modeling may require significant computational resources. Benchmarking of two different codes is described here:

NEMO 1-D computes current flow in 1-D heterostructures, where typically a double integral over transverse momentum and energy is evaluated for many different bias points. The theory is based on the non-equilibrium Green function (NEGF) formalism. NEMO 1-D has been demonstrated to scale to 23,000 cores in July 2007 on ORNL's Cray XT3/4. Scaling experiments on various other TOP500 machines have also been performed.
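For orientation, the double integral in question has the familiar Landauer/Tsu-Esaki form (a schematic expression for the coherent limit; the full NEGF treatment also covers incoherent scattering):

$$
J(V) = \frac{2e}{h} \int \frac{d^2 k_\perp}{(2\pi)^2} \int dE \; T(E, k_\perp; V) \left[ f_L(E) - f_R(E) \right],
$$

where $T$ is the transmission through the heterostructure and $f_{L,R}$ are the Fermi functions of the left and right contacts. The bias $V$ enters through the contact chemical potentials and the self-consistent potential profile, so the three enclosing loops, over bias points, transverse momenta $k_\perp$, and energies $E$, are exactly the levels exploited by the tri-level parallelization described below.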

NEMO 3-D computes electronic structure in multi-million-atom systems. The basic operation is the solution of an interior eigenvalue problem, which requires a large number of sparse matrix-vector multiplies. NEMO 3-D has been demonstrated to scale to 8,192 cores in July 2007 on various TOP500 machines. Details of the various scalings on these machines are available.
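The inner kernel of such an eigensolver is the repeated application of the sparse Hamiltonian to a vector. Below is a minimal sketch of that kernel in compressed sparse row (CSR) format; the type and field names are illustrative assumptions, not the NEMO 3-D data structures, and real-valued entries stand in for the complex Hermitian tight-binding Hamiltonian:

```c
#include <stddef.h>

/* CSR storage for the block of rows owned by one MPI rank
 * (illustrative layout, not NEMO 3-D's actual structures). */
typedef struct {
    size_t        nrows;   /* number of locally owned rows      */
    const size_t *rowptr;  /* nrows+1 offsets into colind/val   */
    const size_t *colind;  /* column index of each stored entry */
    const double *val;     /* value of each stored entry        */
} csr_matrix;

/* y = H * x over the local rows; in a distributed run the halo
 * entries of x are exchanged with neighboring ranks beforehand. */
void csr_spmv(const csr_matrix *H, const double *x, double *y)
{
    for (size_t i = 0; i < H->nrows; ++i) {
        double sum = 0.0;
        for (size_t k = H->rowptr[i]; k < H->rowptr[i + 1]; ++k)
            sum += H->val[k] * x[H->colind[k]];
        y[i] = sum;
    }
}
```

The stored versus recomputed Hamiltonian distinction in the figures below maps onto this kernel: either the entries in val are kept in memory, or they are regenerated on the fly during every multiply, trading extra floating-point work for a smaller memory footprint.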

Parallelization work on NEMO 1-D began in 1998 at the Jet Propulsion Laboratory with compiler-based parallelization and continued with tri-level MPI-based parallelization [40].
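The three levels map naturally onto nested MPI communicators: one split over bias points, one over transverse momenta within each bias group, and the energy nodes shared among the ranks of each momentum group. A minimal sketch of such a hierarchy, with assumed group sizes and none of the actual NEMO 1-D source, is:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Assumed group sizes for illustration: each bias group holds
     * NK * NE ranks, each momentum group within it holds NE ranks. */
    const int NK = 4;  /* momentum groups per bias group */
    const int NE = 8;  /* ranks per momentum group       */

    /* Level 1: one communicator per bias point. */
    MPI_Comm bias_comm;
    MPI_Comm_split(MPI_COMM_WORLD, world_rank / (NK * NE),
                   world_rank, &bias_comm);

    /* Level 2: split each bias group over transverse momenta. */
    int bias_rank;
    MPI_Comm_rank(bias_comm, &bias_rank);
    MPI_Comm momentum_comm;
    MPI_Comm_split(bias_comm, bias_rank / NE, bias_rank, &momentum_comm);

    /* Level 3: each rank of momentum_comm takes a share of the
     * energy nodes; partial currents would then be combined with
     * MPI_Reduce over momentum_comm and bias_comm in turn. */

    MPI_Comm_free(&momentum_comm);
    MPI_Comm_free(&bias_comm);
    MPI_Finalize();
    return 0;
}
```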

NEMO 3-D, in contrast, was designed as a parallel code from the outset.

Several other benchmarks stemming from the early JPL work are also available.

  • Compiler-based parallelization and benchmarking of the original NEMO 1-D code.
  • MPI-based parallelization of the original NEMO 1-D code in the loops over bias points and the integrations over transverse momentum and energy.
  • MPI-based parallelization of the NEMO 3-D code:
    • Benchmarks on a 32-CPU Beowulf cluster (Pentium III, 450 MHz)
    • Benchmarks comparing the 32-CPU Beowulf cluster (450 MHz) to a 20-node dual-CPU Beowulf cluster (800 MHz).
    • Benchmarks on an SGI Origin 2000 compared to our Beowulf cluster
  • MPI-based parallel genetic algorithm package usage and code development (GENES); a generic sketch of this parallelization pattern follows this list.
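As context for the last item, a typical MPI parallelization of a genetic algorithm farms the expensive fitness evaluations out across ranks while one rank performs selection and mutation. The sketch below is a generic, self-contained illustration of that pattern; it does not reproduce the GENES API, and the toy one-parameter fitness function stands in for a full device simulation compared against measured data:

```c
#include <mpi.h>
#include <stdlib.h>

#define POP 64  /* population size (illustrative)       */
#define GEN 10  /* number of generations (illustrative) */

/* Toy fitness, maximized at gene = 1.0; in a GENES-style fit each
 * evaluation would be a complete, expensive device simulation. */
static double fitness(double gene) { return -(gene - 1.0) * (gene - 1.0); }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double pop[POP], fit[POP];
    if (rank == 0)  /* rank 0 seeds the initial population */
        for (int i = 0; i < POP; ++i)
            pop[i] = 2.0 * rand() / RAND_MAX;

    for (int g = 0; g < GEN; ++g) {
        /* Share the population, evaluate a strided slice of it on
         * each rank, and combine the partial fitness arrays. */
        MPI_Bcast(pop, POP, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        double part[POP] = {0};
        for (int i = rank; i < POP; i += size)
            part[i] = fitness(pop[i]);
        MPI_Reduce(part, fit, POP, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {  /* trivial selection + mutation on rank 0 */
            int best = 0;
            for (int i = 1; i < POP; ++i)
                if (fit[i] > fit[best]) best = i;
            for (int i = 0; i < POP; ++i)
                pop[i] = pop[best] + 0.1 * (2.0 * rand() / RAND_MAX - 1.0);
        }
    }
    MPI_Finalize();
    return 0;
}
```

Because every fitness evaluation is independent, this pattern scales almost perfectly as long as the population is large enough to keep all ranks busy.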
Figures (scaling benchmarks):
  • NEMO 1-D scaling to 23,000 cores on the ORNL Cray XT3/4: Benchmark 6.
  • NEMO 1-D scaling to 23,000 cores on the ORNL Cray XT3/4: Benchmarks 5 and 6.
  • NEMO 1-D scaling to 23,000 cores: Benchmark 5 on six different HPC resources.
  • NEMO 3-D scaling to 8,192 cores: strong scaling of a constant problem size (8 million atoms) on 2 HPC platforms. Solid / dashed lines correspond to a stored / recomputed Hamiltonian matrix. The largest number of cores available was 8,192, on the Cray XT3/4 and the IBM Blue Gene.
  • NEMO 3-D scaling to 8,192 cores: strong scaling of a constant problem size (8 million atoms) on 6 HPC platforms. Solid / dashed lines correspond to a stored / recomputed Hamiltonian matrix. The largest number of cores available was 8,192, on the Cray XT3/4 and the IBM Blue Gene.
  • NEMO 3-D scaling to 8,192 cores on the RPI BG/L in recomputing mode: strong scaling (constant total problem size, diagonal dashed lines) and weak scaling (constant number of atoms per core, horizontal solid lines). Excellent performance is evident for large enough problem sizes.
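For reference, the strong and weak scaling shown in these figures are conventionally quantified through the parallel efficiency (standard definitions, not specific to NEMO):

$$
E_{\mathrm{strong}}(p) = \frac{T(1)}{p \, T(p)}, \qquad
E_{\mathrm{weak}}(p) = \frac{T(1)}{T(p)},
$$

where $T(p)$ is the runtime on $p$ cores with the total problem size held fixed (strong) or the work per core held fixed (weak). Ideal scaling corresponds to $E(p) = 1$ in both cases.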