How can I increase OpenFabrics memory limit for Torque jobs?
Your mlx4_core parameters allow only 2^20 * 2^4 * 4 KiB = 64 GiB of memory to be registered. The limit is 2^log_num_mtt * 2^log_mtts_per_seg * page size, here with log_num_mtt = 20, log_mtts_per_seg = 4 and 4 KiB pages. With 192 GiB of physical memory per node, and given that it is recommended to have at least twice as much registerable memory as RAM, you should set log_num_mtt to 23. This raises the limit to 2^23 * 2^4 * 4 KiB = 512 GiB, the smallest power of two greater than or equal to twice the amount of RAM (384 GiB). Be sure to reboot the node(s), or unload and then reload the kernel module, for the change to take effect.
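The arithmetic above can be checked with a small POSIX sh sketch. It assumes the formula stated above, that the loaded module exposes its parameters under /sys/module/mlx4_core/parameters/ (falling back to the values from the question, 20 and 4, when that directory is absent), and that the page size can be read with getconf (falling back to 4096). The function names are hypothetical, and on some kernels these parameters may not reflect the real limit, so treat the output as an estimate.

```sh
#!/bin/sh
# Sketch: mlx4_core registerable-memory limit, assuming
#   max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * page_size

# Print a loaded-module parameter from sysfs, or the given default.
mlx4_param() {
  local file="/sys/module/mlx4_core/parameters/$1"
  if [ -r "$file" ]; then cat "$file"; else echo "$2"; fi
}

# Print the registerable-memory limit in bytes.
# Arguments: log_num_mtt log_mtts_per_seg page_size
max_reg_mem_bytes() {
  echo $(( (1 << $1) * (1 << $2) * $3 ))
}

# Print the smallest log_num_mtt whose limit is at least twice the RAM.
# Arguments: ram_gib log_mtts_per_seg page_size
required_log_num_mtt() {
  local target=$(( 2 * $1 * (1 << 30) ))
  local n=0
  while [ "$(max_reg_mem_bytes "$n" "$2" "$3")" -lt "$target" ]; do
    n=$(( n + 1 ))
  done
  echo "$n"
}

log_num_mtt=$(mlx4_param log_num_mtt 20)
log_mtts_per_seg=$(mlx4_param log_mtts_per_seg 4)
page_size=$(getconf PAGE_SIZE 2>/dev/null || echo 4096)
ram_gib=192  # physical memory per node, taken from the question

limit=$(max_reg_mem_bytes "$log_num_mtt" "$log_mtts_per_seg" "$page_size")
echo "current limit: $(( limit >> 30 )) GiB"
needed=$(required_log_num_mtt "$ram_gib" "$log_mtts_per_seg" "$page_size")
echo "options mlx4_core log_num_mtt=$needed log_mtts_per_seg=$log_mtts_per_seg"
```

With the values from the question this reports a 64 GiB limit and suggests log_num_mtt=23. The printed options line is the kind of line that goes into a .conf file under /etc/modprobe.d/ (the file name is up to you) so that the setting survives the reboot or module reload.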
You should also submit a simple Torque job script that executes ulimit -l in order to verify the limit on locked memory inside the job and make sure there is no such limit. Note that ulimit -c unlimited does not remove the limit on the amount of locked memory; it removes the limit on the size of core dump files.
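A minimal job script for that check could look like the sketch below, submitted with qsub. The job name, the one-core, one-minute resource request and the function names are illustrative assumptions; it only reports the limits on the node that runs the script.

```sh
#!/bin/sh
#PBS -N memlock_check
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:01:00
#PBS -j oe

# Print the limits as seen inside the job.
# "ulimit -l" is the locked-memory limit in KiB (or "unlimited");
# "ulimit -c" is the unrelated core-file-size limit.
report_limits() {
  echo "host: $(uname -n)"
  echo "max locked memory: $(ulimit -l)"
  echo "core file size: $(ulimit -c)"
}

# Succeed (exit status 0) only if locked memory is unlimited.
memlock_unlimited() {
  [ "$(ulimit -l)" = "unlimited" ]
}

report_limits
if memlock_unlimited; then
  echo "OK: locked memory is unlimited"
else
  echo "WARNING: locked memory is limited"
fi
```

Because of #PBS -j oe, the result ends up in a single output file in the submission directory once the job finishes.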
mpirun: Unrecognized argument mca
[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
In the last case you are using MPICH. MPICH is not Open MPI, and its process launcher (Hydra) does not recognize the --mca parameter, which is specific to Open MPI (MCA stands for Modular Component Architecture, the basic framework that Open MPI is built upon). This is a typical case of mixing up multiple MPI implementations.
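One way to catch this kind of mix-up is to check which launcher comes first in PATH before passing Open MPI options such as --mca. The POSIX sh sketch below classifies the output of mpirun --version. The function names are hypothetical, and the matched strings ("Open MPI", "OpenRTE", "HYDRA", "MPICH") are assumptions based on what these launchers typically print, so they may need adjusting for a particular installation.

```sh
#!/bin/sh
# Guess the MPI implementation from a launcher's "--version" text.
# The matched substrings are assumptions and may vary between versions.
classify_mpi() {
  case "$1" in
    *"Open MPI"*|*OpenRTE*) echo "openmpi" ;;
    *HYDRA*|*MPICH*) echo "mpich" ;;
    *) echo "unknown" ;;
  esac
}

# Report which mpirun is first in PATH and what it appears to be.
check_mpirun() {
  launcher=$(command -v mpirun) || { echo "mpirun not found in PATH"; return 1; }
  echo "mpirun: $launcher"
  echo "implementation: $(classify_mpi "$("$launcher" --version 2>&1)")"
}

check_mpirun
```

If this reports mpich while you expect Open MPI, calling Open MPI's mpirun by its full path or fixing the order of PATH in the job script avoids the error.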