7. Parallelized version
7.1. General remarks
The parallelized version of Spex uses the MPI standard 3.1.
It can be run on several CPUs on the same node or on several nodes, launched with
whatever MPI launcher your computer system uses. In principle, there are no restrictions on
the number of processes. Better performance is expected, though, if the number of processes is not a
prime number but has a long prime factorization, because this gives the code more freedom to distribute
the work among the processes.
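To illustrate the point about prime factorizations (a plain-Python sketch, not part of Spex): a process count like 60 decomposes into many small factors, giving many ways to split nested loops over the processes, whereas a prime count like 61 gives none.

```python
def prime_factors(n):
    """Return the prime factorization of n as a list of factors."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# 60 processes factorize into many small primes -> flexible work distribution
print(prime_factors(60))  # -> [2, 2, 3, 5]
# 61 processes form a prime number -> no freedom to split the work
print(prime_factors(61))  # -> [61]
```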
The default parallelization strategy of Spex is conservative in the sense that memory demand and
load imbalances are minimized. Often, the parallelized run can be sped up substantially by using
MPIBLK, see below.
7.2. Special MPI keywords
7.2.1. MPIKPT
In many calculation types (GW, Hubbard U calculations, COHSEX, …), there is an outer loop over the k-point set.
By default, Spex does not parallelize over this loop, because different k points need different computation times depending on their symmetry,
making the work distribution non-trivial. However, if there are many k points, it is recommendable to additionally
parallelize over this loop. This can be enabled with the keyword
MPIKPT. The k-loop parallelization is over nodes, not over processes:
the computation for each individual k point runs in parallel over the processes on the respective node, in the same way as all processes would without MPIKPT.
If you only have a single node (or very few nodes) available, you can still use
MPIKPT in conjunction with MPISPLIT (see below), which
allows processes to be grouped into virtual nodes.
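As a minimal illustration (only the keyword itself comes from the text above; no other input-file content is implied):

```
MPIKPT   # additionally parallelize the outer k-point loop over (virtual) nodes
```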
7.2.2. MPIBLK (SENERGY)
Another special parallelization layer is the parallelization over blocks of the self-energy matrix (or over the diagonal elements).
This may speed up the calculation if there are many blocks but may also result in work imbalance. (Different blocks need different
computation times.) Parallelization over blocks is enabled with the keyword
MPIBLK. (An optional argument, e.g.,
MPIBLK 5, can
be used to fine-tune the work distribution. It gives the “relative computational overhead” of each block that does not
scale with the number of bands. The default value is 10.)
MPIBLK is enabled automatically except for
GW FULL calculations, because the different sizes of the self-energy blocks in
GW FULL may lead to work imbalance.
We note that
MPIBLK increases the memory demand.
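The effect of the optional argument can be pictured with a toy model (an illustration only, not Spex's actual scheduler): each block is assigned an estimated cost of the overhead plus its size, and blocks are handed, largest first, to the currently least-loaded process.

```python
def distribute_blocks(block_sizes, nprocs, overhead=10):
    """Toy model of MPIBLK-style work distribution (not Spex's actual
    scheduler): each block costs `overhead` (per-block work that does not
    scale with the number of bands) plus its size; blocks are handed,
    largest first, to the currently least-loaded process."""
    loads = [0] * nprocs
    assignment = [[] for _ in range(nprocs)]
    for block, size in sorted(enumerate(block_sizes), key=lambda b: -b[1]):
        p = loads.index(min(loads))          # least-loaded process so far
        loads[p] += overhead + size
        assignment[p].append(block)
    return assignment, loads

# A larger overhead penalizes processes that receive many small blocks.
assignment, loads = distribute_blocks([30, 20, 10, 5], nprocs=2)
print(assignment, loads)  # -> [[0, 3], [1, 2]] [55, 50]
```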
|Keyword|Description|
|---|---|
| |Enable parallelization over self-energy blocks (or diagonal elements).|
| |Enable parallelization with assumed large “computational overhead”.|
| |Disable parallelization over blocks.|
7.2.3. MPISPLIT
(*) The shared-memory functionality of MPI 3.1 is used for several big arrays,
which allows the same memory region to be accessed by several MPI processes.
By default, all processes running on one node share the memory.
It can be reasonable to change this behavior, e.g., to have the processes on the same socket or in the same NUMA domain share memory.
This is possible with
MPISPLIT NODE (the default) and
MPISPLIT SOCKET (only works with some MPI implementations).
Groups of a fixed number of processes can also be made to share memory; with groups of 16 processes, for example, the ranks 0-15, 16-31, etc. share memory.
Using this option increases the memory consumption but might be advantageous
in terms of memory bandwidth and computation time.
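Under MPI 3.1, such splitting is typically realized with MPI_Comm_split_type (e.g., with MPI_COMM_TYPE_SHARED). The grouping in the fixed-size example above corresponds to simple integer division of the rank, as this sketch shows (plain Python, not Spex code):

```python
def memory_groups(nranks, group_size):
    """Assign each MPI rank to a shared-memory group: the first
    group_size ranks form group 0, the next group_size ranks form
    group 1, and so on. (Illustration of the grouping only; Spex
    performs the actual split internally via MPI.)"""
    return {rank: rank // group_size for rank in range(nranks)}

groups = memory_groups(32, 16)
# ranks 0-15 end up in group 0, ranks 16-31 in group 1
```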
7.2.4. MPISYM (SENERGY)
(*) Using Padé approximants in the evaluation of the GW self-energy (e.g.,
CONTOUR with a Padé approximant for W)
might lead to a slight symmetry breaking in the quasiparticle energies, leading to an unphysical lifting of degeneracies.
(This is caused by the fact that Thiele’s continued-fraction Padé formula is numerically unstable, especially for a large number
of imaginary frequencies.)
Since these errors are usually very small, this is not a big problem. Furthermore, when the full self-energy matrix is calculated
(GW FULL), Spex performs a symmetrization of the self-energy matrix, which enforces the correct degeneracies again.
However, for testing purposes, it is possible to enforce the correct symmetries already in the evaluation of the self-energy by
using the keyword
MPISYM. This requires additional communication among the processes, potentially slowing down
the calculation due to the necessary blocking synchronization.
7.3. Restart
In the parallelized version, the
RESTART option works in exactly the same way as in the serial version
(see Section 4.1.7, Section 5.1.13, and Section 5.5.3). However, the restart
data might be written to separate files when
MPIKPT is used. (The underlying reason for this is that binary or HDF5 files can be written
in parallel, i.e., by all processes at the same time, only if the dataset sizes are known in advance. This is not the case for the restart data.)
Instead of a single file “spex.cor”, Spex writes the files “spex.cor.1”, “spex.cor.2”,
et cetera, and a directory “spex.cor.map”, which contains, for each k point, links to the respective cor file that contains the data.
Furthermore, in addition to “spex.sigc” (“spex.sigx”, “spex.wcou”, “spex.ccou”, “spex.core”),
the files “spex.sigc.2”, “spex.sigc.3”, et cetera, might be
written (and analogously for the other file names). These multiple files should be taken into account when restart files are transferred.
Switching between different numbers of processes, different numbers of nodes,
or between serial and parallel runs should not lead to problems; Spex should always be able to read the correct data.
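To make the transfer of restart files concrete, here is a small shell demonstration (the directories run1 and run2 are placeholders; only the file names come from the text above):

```shell
# The restart data of an MPIKPT run may be spread over numbered files
# plus the spex.cor.map directory; all of them belong together and must
# be transferred as a set.
mkdir -p run1/spex.cor.map run2
touch run1/spex.cor.1 run1/spex.cor.2 run1/spex.sigc run1/spex.sigc.2
cp -r run1/spex.cor* run1/spex.sigc* run2/
ls run2
```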
Paragraphs discussing advanced options are preceded with (*), and the ones about obsolete, unmaintained, or experimental options are marked with (**). You can safely skip the paragraphs marked with (*) and (**) at first reading.