## Parallel Scaling

Nanoelectronic modeling may require significant computational resources. Benchmarking of two different codes is described here:

NEMO 1-D computes current flow in 1-D heterostructures, where a double integral over momentum and energy is typically solved for many different bias points. The theory is based on the non-equilibrium Green function formalism. NEMO 1-D was demonstrated to scale up to 23,000 processors in July 2007 on ORNL's Cray XT3/4. Scaling experiments on various other TOP500 machines have also been performed.
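The structure of this workload (a double integral repeated for every bias point) can be illustrated with a minimal serial sketch. The integrand below is a hypothetical placeholder, not the NEMO 1-D physics, and the function names, grid sizes, and integration limits are assumptions chosen only for illustration. The sketch also treats each bias point as independent of the others, which is what makes the outer loop a natural candidate for parallelization.

```python
import math

def toy_integrand(k, energy, bias):
    """Hypothetical placeholder integrand; NOT the NEMO 1-D physics."""
    return math.exp(-(energy - bias) ** 2) * math.exp(-k * k)

def trapezoid_rule(n, lo, hi):
    """Nodes and weights of the composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    nodes = [lo + i * h for i in range(n)]
    weights = [h] * n
    weights[0] = weights[-1] = h / 2
    return nodes, weights

def current_at_bias(bias, n_k=41, n_e=81):
    """Double integral over momentum k and energy E for one bias point."""
    k_nodes, k_weights = trapezoid_rule(n_k, 0.0, 3.0)   # assumed limits
    e_nodes, e_weights = trapezoid_rule(n_e, -5.0, 5.0)  # assumed limits
    total = 0.0
    for k, wk in zip(k_nodes, k_weights):        # outer integral: momentum
        for e, we in zip(e_nodes, e_weights):    # inner integral: energy
            total += wk * we * toy_integrand(k, e, bias)
    return total

def sweep(bias_points):
    """Evaluate every bias point; in this sketch the points do not interact."""
    return [current_at_bias(v) for v in bias_points]

if __name__ == "__main__":
    for bias, value in zip([0.0, 0.5, 1.0], sweep([0.0, 0.5, 1.0])):
        print(f"bias={bias:.1f}  integral={value:.6f}")
```

The three nested loops (bias, momentum, energy) are the natural places to distribute work across processors.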

NEMO 3-D computes electronic structure in multi-million atom systems. The basic operation is the solution of an interior eigenvalue problem, which requires a large number of sparse matrix-vector multiplies. NEMO 3-D was demonstrated to scale to 8,192 cores in July 2007 on various TOP500 machines. Details of the various scalings on these machines are available.
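To make the cost of the basic operation concrete, here is a minimal sketch of a sparse matrix-vector multiply. It assumes the compressed sparse row (CSR) layout; the storage format and eigensolver actually used by NEMO 3-D are not stated here, so `csr_matvec` and the small test matrix are purely illustrative. Iterative eigensolvers typically apply the matrix to a vector many times, which is why this kernel dominates the run time.

```python
def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form.

    indptr[r]:indptr[r+1] delimits the nonzeros of row r,
    indices holds their column numbers, data holds their values.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for j in range(indptr[row], indptr[row + 1]):
            acc += data[j] * x[indices[j]]
        y[row] = acc
    return y

# Illustrative 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]].
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]

if __name__ == "__main__":
    print(csr_matvec(indptr, indices, data, [1.0, 1.0, 1.0]))  # [1.0, 0.0, 1.0]
```

Because each output row depends only on its own nonzeros and the input vector, rows can be distributed across processes, with communication needed only for the vector entries owned by other processes.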

Parallelization work on NEMO 1-D began in 1998 at the Jet Propulsion Laboratory with compiler-based parallelization and continued with tri-level MPI-based parallelization [40].
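One way to picture a tri-level decomposition over bias points, transverse momentum, and energy is to arrange the processes in a three-dimensional grid and give each process a contiguous block of work at every level. The sketch below is plain Python with no MPI calls; `split_rank`, `block_range`, and `my_work` are hypothetical helper names, and the layout is an assumption rather than a description of how NEMO 1-D actually assigns work. In an MPI code, groups like these are commonly formed with `MPI_Comm_split`.

```python
def split_rank(rank, n_bias_groups, n_k_groups, n_e_groups):
    """Map a flat process rank to (bias_group, k_group, e_group)."""
    total = n_bias_groups * n_k_groups * n_e_groups
    if not 0 <= rank < total:
        raise ValueError(f"rank {rank} outside process grid of size {total}")
    bias_group, rest = divmod(rank, n_k_groups * n_e_groups)
    k_group, e_group = divmod(rest, n_e_groups)
    return bias_group, k_group, e_group

def block_range(n_items, n_groups, group):
    """Contiguous index block for one group; block sizes differ by at most one."""
    base, extra = divmod(n_items, n_groups)
    start = group * base + min(group, extra)
    stop = start + base + (1 if group < extra else 0)
    return range(start, stop)

def my_work(rank, grid, n_bias, n_k, n_e):
    """Index ranges (bias, momentum, energy) handled by one process."""
    b, k, e = split_rank(rank, *grid)
    return (block_range(n_bias, grid[0], b),
            block_range(n_k, grid[1], k),
            block_range(n_e, grid[2], e))

if __name__ == "__main__":
    grid = (2, 2, 3)  # assumed process grid: 12 processes in total
    for rank in range(12):
        print(rank, my_work(rank, grid, n_bias=5, n_k=4, n_e=7))
```

Choosing how many groups to place at each level is a tuning decision: more bias groups reduce communication, while more momentum or energy groups help when there are fewer bias points than processes.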

NEMO 3-D was designed as a parallel code from the outset.

Several other benchmarks stemming from the early JPL work are also available.

- Compiler-based parallelization and benchmarking of the original NEMO 1-D code.
- MPI-based parallelization of the original NEMO 1-D code in loops corresponding to bias points, and integrations of the transverse momentum and energy.
- MPI-based parallelization of the NEMO 3-D code:
  - Benchmarks on a 32-CPU Beowulf cluster (Pentium III, 450 MHz).
  - Benchmarks comparing a 32-CPU Beowulf (450 MHz) to a 20-node dual-CPU Beowulf (800 MHz).
  - Benchmarks on an SGI Origin 2000 compared to our Beowulf.
- MPI-based parallel genetic algorithm package usage and code development (GENES).