Memory and number of cores¶
This page describes special features of the Gaussian installation on Stallo, as well as general Gaussian issues that are only vaguely documented elsewhere.
Choosing the right version¶
To see which versions of Gaussian are available, use:
$ module avail Gaussian/
To load a specific version of Gaussian, use for instance:
$ module load Gaussian/g16_B.01
Gaussian over Infiniband¶
First, note that the Gaussian installation on Stallo is the Linda parallel version, so it already scales to some degree. On top of this, Gaussian is installed with a small trick: the loading of the executable is intercepted before it is launched, and an alternative socket library is loaded. This enables Gaussian to run natively on the Infiniband network, which gives us two advantages:
- The parallel fraction of the code scales to more cores.
- The shared memory performance is significantly enhanced (small scale performance).
Because of this trick, we depend heavily on writing the specific node addresses into the input file: running Gaussian in parallel requires the %NProcshared keyword in the Link 0 part of the input file. This is taken care of by a wrapper script around the original binary in each individual version folder, and it is also commented in the job script example. Please use our example when submitting jobs.
We have also taken care of the rsh/ssh setup in our installation procedure, to avoid .tsnet.config dependency for users.
Gaussian is a rather large program system with a range of different binaries, and users need to verify whether the functionality they use is parallelized and how it scales.
Because of the Infiniband preload trick, we have a somewhat more generous policy when it comes to allocating cores/nodes to Gaussian jobs. However, before computing a whole table of different functionals and molecules, we strongly advise users to first study the scaling of the code for a representative system.
Please do not reuse scripts inherited from others without studying the performance and scaling of your job. If you need assistance with this, please contact the user support.
We advise people to use up to 256 cores (--tasks). We have observed acceptable scaling of the current Gaussian install beyond 16 nodes for the jobs that do scale outside of one node (i.e. the binaries in the $gXXroot/linda-exe folder). Linda networking overhead seems to hit hard around this number of cores, which makes us somewhat reluctant to advise going beyond 256 cores.
Since we have two different architectures with two different core counts on Stallo, the --exclusive flag is important to ensure that the distribution of jobs across the whole system is done in a rather flexible and painless way.
Gaussian takes care of memory allocation internally. This means that if the submitted job needs more memory per core than is available on average on the node, it will automatically scale down the number of cores to match the need. This also means that you should always ask for full nodes when submitting Gaussian jobs on Stallo! This is taken care of by the --exclusive flag and commented in the job script example.
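The full-node request described above might look like the following in a Slurm job script. This is only a minimal sketch and not the official Stallo job script example: the file name gaussian_job.sh, the node count, the wall time, the input name example.com and the use of g16 as the launch command are all assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: write a minimal Slurm job script that requests full nodes.
# File name, node count, wall time and input name are hypothetical.
cat > gaussian_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=g16_example
#SBATCH --nodes=2
#SBATCH --exclusive
#SBATCH --time=01:00:00

module load Gaussian/g16_B.01

# Assumption: g16 is the launch command. The wrapper script on Stallo
# may expose a different one, so check the official job script example.
g16 < example.com > example.out
EOF

echo "Wrote gaussian_job.sh; submit it with: sbatch gaussian_job.sh"
```

The --exclusive line is the part that matters here: it asks for whole nodes, so Gaussian can scale the number of cores down internally without sharing the node with other jobs.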
The %mem allocation of memory in the Gaussian input file means two things:
- In general, it is the memory per node, shared between the nprocshared processes and additional to the memory allocated per process. This is also documented by Gaussian.
- For the main process/node, it also represents the network buffer allocated by Linda: the main Gaussian process takes one part and the Linda communication process takes another, equally sized part. Thus you should never ask for more than half of the physical memory on the nodes, unless they have swap space available, which you should never assume.
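The half-of-physical-memory rule from the list above can be sketched as a small shell calculation. The helper name half_mem_mb and the 32 GB node size are assumptions for illustration, and the sketch counts 1 GB as 1000 MB, which is also an assumption.

```shell
#!/bin/bash
# Sketch: upper bound for %mem as half of a node's physical memory.
# Argument: physical memory of the node in GB (integer).
# Assumption: 1 GB is counted as 1000 MB.
half_mem_mb() {
    local phys_gb=$1
    echo $(( phys_gb * 1000 / 2 ))
}

# Assumed node size, for illustration only.
phys_gb=32
echo "Upper bound for %mem on a ${phys_gb} GB node: $(half_mem_mb "$phys_gb")MB"
```

Staying somewhat below this bound leaves room for the operating system on the node.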
The %memlimit will always be half of the physical memory pool, given in MB instead of GB: 16000MB instead of 16GB, since this leaves a small part for the system. That is why we would actually advise using 15GB as the maximum