mpirun: Unrecognized argument mca
[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
You are using MPICH in the last case. MPICH is not Open MPI, and its process launcher does not recognize the --mca
parameter, which is specific to Open MPI (MCA stands for Modular Component Architecture, the basic framework that Open MPI is built upon). This is a typical case of mixing up multiple MPI implementations.
Job fails while using srun or mpirun in Slurm
The root cause is a mix of several MPI implementations that do not interoperate:
- mpirun is from Open MPI
- mpiexec is likely the built-in MPICH from ParaView
- your app is built with Intel MPI
Try using /nfs/apps/Compilers/Intel/ParallelStudio/2016.3.067/impi/5.1.3.210/bin/mpirun (or /nfs/apps/Compilers/Intel/ParallelStudio/2016.3.067/impi/5.1.3.210/bin64/mpirun) instead, so the launcher matches your MPI library.
If you want to use srun with Intel MPI, an extra step is required. You first need to
export I_MPI_PMI_LIBRARY=/path/to/slurm/pmi/library/libpmi.so
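Putting it together, a minimal Slurm batch script might look like the sketch below. The libpmi.so path is the placeholder from above, and ./my_app is a hypothetical application name; adjust both to your site.

```shell
#!/bin/bash
#SBATCH --ntasks=4
# Point Intel MPI at Slurm's PMI library so srun can bootstrap the ranks
export I_MPI_PMI_LIBRARY=/path/to/slurm/pmi/library/libpmi.so
# Launch the Intel MPI application through srun (./my_app is a placeholder)
srun ./my_app
```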
Open MPI ignored error: MCA interface is not recognized
I've figured out the problem thanks to Gilles Gouaillardet's help on the Open MPI forums.
Problem:
I installed the newer version 2.0.1 without uninstalling 1.10. Since I installed it in the same location, some MCA files were overwritten, while others had been removed or renamed in the newer version and were therefore still present in the directory. In the end, these module files were not recognized by version 2.0.1, resulting in the warnings above.
Solution:
- Remove all the plugin files:
rm -rf /usr/local/lib/openmpi
- Reinstall Open MPI:
make install
How to enable CUDA-aware Open MPI?
This was an issue in the 20.7 release when adding UCX support. You can lower the optimization level to -O1 to work around the problem, or update your NV HPC compiler to version 20.9, where we've resolved the issue.
https://developer.nvidia.com/nvidia-hpc-sdk-version-209-downloads
How can I increase OpenFabrics memory limit for Torque jobs?
Your mlx4_core parameters allow for the registration of only 2^20 * 2^4 * 4 KiB = 64 GiB. With 192 GiB of physical memory per node, and given that it is recommended to have at least twice as much registerable memory, you should set log_num_mtt to 23, which would increase the limit to 512 GiB, the closest power of two greater than or equal to twice the amount of RAM. Be sure to reboot the node(s) or unload and then reload the kernel module.
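The arithmetic behind those numbers can be checked with a quick shell computation. log_num_mtt and log_mtts_per_seg are the mlx4_core module parameters; the values below are the ones discussed in the answer.

```shell
# registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * 4 KiB
log_num_mtt=20
log_mtts_per_seg=4
echo "current:  $(( (1 << (log_num_mtt + log_mtts_per_seg)) * 4 / 1024 / 1024 )) GiB"
log_num_mtt=23
echo "proposed: $(( (1 << (log_num_mtt + log_mtts_per_seg)) * 4 / 1024 / 1024 )) GiB"
```

This prints 64 GiB for the current setting and 512 GiB for log_num_mtt=23, matching the figures above.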
You should also submit a simple Torque job script that executes ulimit -l in order to verify the limit on locked memory and make sure there is no such limit. Note that ulimit -c unlimited does not remove the limit on the amount of locked memory but rather the limit on the size of core dump files.
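A minimal Torque job script for that check might look like the sketch below; the PBS directives are illustrative and the script body runs the same way outside Torque.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=1
# Report the locked-memory limit seen inside the job;
# it should print "unlimited" for InfiniBand memory registration to work well
ulimit -l
# For contrast, the core-dump size limit is a separate knob:
ulimit -c
```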
Spawn issue with mpi4py in the Anaconda Python distribution
I ran into the same problem, and one solution was to compile mpi4py against Open MPI instead of MPICH (see the 'Compute Pi' example in the mpi4py documentation).
See this unresolved issue.
Tested on:
Ubuntu 16.04
Anaconda 4.0.0
python 3.5.0
mpich 3.2.0
openmpi 1.10.2
mpi4py 2.0.0
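Rebuilding mpi4py against a specific MPI can be sketched as follows. The mpicc path is a placeholder for wherever Open MPI's compiler wrapper lives on your system; MPICC is the environment variable mpi4py's build honors, and --no-binary forces pip to build from source rather than reuse a wheel.

```shell
# Force mpi4py to build against Open MPI's compiler wrapper instead of MPICH's
# (/usr/local/openmpi/bin/mpicc is a placeholder path)
env MPICC=/usr/local/openmpi/bin/mpicc pip install --no-binary=mpi4py mpi4py
```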