Change the RLIMIT_NPROC in linux
Take a look at /etc/security/limits.conf, or at the files in /etc/security/limits.d/ if that directory exists in your installation. Don't forget to log out and back in afterward, since the new limits are applied when a session starts.
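To confirm that an edited limit actually took effect after logging in again, the current values can be read back with getrlimit(2). The following is a minimal sketch; the function name print_nproc_limit is made up for illustration and is not part of any library.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: print the soft and hard RLIMIT_NPROC values of the
 * calling process. Returns 0 on success, -1 if getrlimit() fails. */
int print_nproc_limit(void)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_NPROC, &rlim) == -1) {
        perror("getrlimit");
        return -1;
    }

    /* RLIM_INFINITY means "no limit", so print it as text. */
    if (rlim.rlim_cur == RLIM_INFINITY)
        printf("soft limit: unlimited\n");
    else
        printf("soft limit: %llu\n", (unsigned long long)rlim.rlim_cur);

    if (rlim.rlim_max == RLIM_INFINITY)
        printf("hard limit: unlimited\n");
    else
        printf("hard limit: %llu\n", (unsigned long long)rlim.rlim_max);

    return 0;
}
```

Calling print_nproc_limit() from main() in a fresh login session should show the values configured for your user.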
Which is the better way to edit the RLIMIT_NPROC value?
First, I believe you are wrong in having nearly a thousand threads. Threads are quite costly, and it is usually not reasonable to have so many of them. I would suggest having a few dozen threads at most (unless you run on a very costly super-computer).
You could have some event loop around a multiplexing syscall like poll(2). Then a single thread can deal with many thousands of connections. Read about the C10K problem and epoll. Consider using some event libraries like libevent or libev etc...
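As a rough illustration of the multiplexing idea, here is a sketch of one iteration of a poll(2)-based loop that services every readable descriptor from a single thread. The function name service_ready_fds, the 64-descriptor cap and the 4096-byte buffer are assumptions chosen for the example. A real server would also handle POLLHUP, POLLERR and partial reads.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for any of the nfds descriptors
 * to become readable, then read one chunk from each ready descriptor.
 * Returns the number of descriptors serviced, or -1 on error. */
int service_ready_fds(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[64];   /* arbitrary cap for this sketch */
    char buf[4096];
    int serviced = 0;

    if (nfds < 0 || nfds > 64)
        return -1;

    for (int i = 0; i < nfds; ++i) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;   /* interested in "data to read" */
        pfds[i].revents = 0;
    }

    /* One syscall watches all descriptors at once. */
    if (poll(pfds, (nfds_t)nfds, timeout_ms) == -1)
        return -1;

    for (int i = 0; i < nfds; ++i) {
        if (pfds[i].revents & POLLIN) {
            if (read(pfds[i].fd, buf, sizeof buf) >= 0)
                ++serviced;
        }
    }
    return serviced;
}
```

Calling this in a loop lets one thread handle many connections. epoll or a library such as libevent follows the same pattern and scales better to very large descriptor sets.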
You could start your application as root (perhaps by using setuid techniques), set up the required resources (in particular, opening privileged TCP/IP ports), and then change the user with setreuid(2).
Read Advanced Linux Programming...
You could also wrap your application in a tiny setuid C program which increases the limits using setrlimit(2), changes the user with setreuid(2), and finally calls execve(2) on your real program.
Why setrlimit(RLIMIT_NPROC) doesn't work when run as root but works fine when run as a normal user?
The following proposed code:
- compiles cleanly
- fails to perform the desired functionality (why?)
- includes all the needed header files
- has only the parent try to create child processes
- Note: both the OP's program and the proposed program exit without waiting for the child processes to finish, i.e. the main program should be calling wait() or waitpid() for each child process started.
- Note: the call to sleep(1) keeps the output nice and organized. However, during that sleep the child completes and exits, so there is actually only one child process running at any one time. Even if the call to setrlimit() had been successful, that fork() loop could have run forever.
And now, the proposed code:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

int main( void )
{
    struct rlimit rlim;
    rlim.rlim_cur = rlim.rlim_max = 4;

    if( getrlimit(RLIMIT_NPROC, &rlim) == -1 )
    {
        perror( "getrlimit failed" );
        exit( EXIT_FAILURE );
    }

    if( setrlimit(RLIMIT_NPROC, &rlim) == -1 )
    {
        perror( "setrlimit failed" );
        exit( EXIT_FAILURE );
    }

    for (int i = 0; i < 4; ++i)
    {
        pid_t pid = fork();
        switch( pid )
        {
            case -1:
                perror( "fork failed" );
                exit( EXIT_FAILURE );
                break;

            case 0:
                printf( "child pid: %d\n", getpid() );
                exit( EXIT_SUCCESS );
                break;

            default:
                printf( "parent pid: %d\n", getpid() );
                break;
        }
        sleep(1);
    }
    return 0;
}
A run of the program results in:
fork failed: Resource temporarily unavailable
which indicates a problem with the call to setrlimit().
From the man page:
RLIMIT_NPROC
This is a limit on the number of extant process (or, more precisely on Linux, threads) for the real user ID of the calling process. So long as the current number of processes belonging to this process's real user ID is greater than or equal to this limit, fork(2) fails with the error EAGAIN.
The RLIMIT_NPROC limit is not enforced for processes that have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.
So the call to setrlimit() limits the total number of processes (on Linux, threads) owned by the real user ID, not the number of child processes of this one program.
However, we can add a couple of print statements immediately after the call to getrlimit() and again after the call to setrlimit():
    if( getrlimit(RLIMIT_NPROC, &rlim) == -1 )
    {
        perror( "getrlimit failed" );
        exit( EXIT_FAILURE );
    }

    printf( "soft limit: %d\n", (int)rlim.rlim_cur );
    printf( "hard limit: %d\n\n", (int)rlim.rlim_max );

    if( setrlimit(RLIMIT_NPROC, &rlim) == -1 )
    {
        perror( "setrlimit failed" );
        exit( EXIT_FAILURE );
    }

    if( getrlimit(RLIMIT_NPROC, &rlim) == -1 )
    {
        perror( "getrlimit failed" );
        exit( EXIT_FAILURE );
    }

    printf( "soft limit: %d\n", (int)rlim.rlim_cur );
    printf( "hard limit: %d\n\n", (int)rlim.rlim_max );
then the result is:
soft limit: 27393
hard limit: 27393
soft limit: 27393
hard limit: 27393
parent pid: 5516
child pid: 5517
parent pid: 5516
child pid: 5518
parent pid: 5516
child pid: 5519
parent pid: 5516
child pid: 5520
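which indicates that the call to setrlimit() did not actually change the limits. In the code as listed this is expected: getrlimit() overwrites the rlim_cur = rlim_max = 4 assignment before setrlimit() runs, so setrlimit() simply re-applies the existing values.
Note: I'm running Ubuntu Linux 18.04.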
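For comparison, here is a sketch of a variant that assigns the new soft limit after getrlimit() has filled in the structure, keeps each child alive so that it continues to count against the limit, and reaps the children before returning. The function name count_forks_under_limit and the cap of 64 attempts are assumptions for illustration. How many forks succeed depends on how many processes the real user already owns. For a process with CAP_SYS_ADMIN or CAP_SYS_RESOURCE (typically root) the limit is not enforced at all.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: lower the soft RLIMIT_NPROC to `limit`, try to fork
 * `attempts` children that stay alive, then reap them and restore the old
 * soft limit. Returns how many fork() calls succeeded, or -1 on error. */
int count_forks_under_limit(rlim_t limit, int attempts)
{
    struct rlimit old, lowered;
    pid_t pids[64];
    int ok = 0;

    if (attempts < 0 || attempts > 64)
        return -1;
    if (getrlimit(RLIMIT_NPROC, &old) == -1)
        return -1;

    /* Assign the new value AFTER getrlimit(), so it is not overwritten. */
    lowered = old;
    if (limit > old.rlim_max)
        limit = old.rlim_max;      /* soft limit cannot exceed hard limit */
    lowered.rlim_cur = limit;
    if (setrlimit(RLIMIT_NPROC, &lowered) == -1)
        return -1;

    for (int i = 0; i < attempts; ++i) {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");        /* EAGAIN once the per-user limit is hit */
            break;
        }
        if (pid == 0) {
            pause();               /* stay alive until the parent signals */
            _exit(0);
        }
        pids[ok++] = pid;
    }

    /* Terminate and reap every child that was started. */
    for (int i = 0; i < ok; ++i) {
        kill(pids[i], SIGTERM);
        waitpid(pids[i], NULL, 0);
    }

    setrlimit(RLIMIT_NPROC, &old); /* restore the original soft limit */
    return ok;
}
```

Only the soft limit is lowered here, which is why it can be restored afterwards without privileges.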
Apache 2.4 hits rlimit_nproc: hidden processes?
Found the problem thanks to the suggestion from @sarnold. My application depends on mpm_prefork, and up to Ubuntu 13.04 this module was automatically enabled when the apache2-mpm-prefork package was installed. I assumed this was still the case, but it turned out that Apache was running mpm_event.
It seems that in Apache 2.4 the packaging of MPMs has changed and mpm_prefork needs to be enabled manually after installation:
sudo a2dismod mpm_event
sudo a2enmod mpm_prefork
sudo service apache2 restart
Now the problems seem to have disappeared.
Multiple instances of Python running simultaneously limited to 35
Decomposing the Error Message
Your error message includes the following hint:
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
The RLIMIT_NPROC variable controls the total number of processes that a user can have. More specifically, as it is a per-process setting, when fork(), clone(), vfork(), &c are called by a process, the RLIMIT_NPROC value for that process is compared to the total process count of the user that owns the process. If that count has reached the limit, the call fails with EAGAIN ("Resource temporarily unavailable"), as you've experienced.
The error message indicates that OpenBLAS was unable to create additional threads because your user had used all the threads RLIMIT_NPROC had given it.
Since you're running on a cluster, it's unlikely that your user is running many threads (unlike, say, if you were on your personal machine and browsing the web, playing music, &c), so it's reasonable to conclude that OpenBLAS is trying to start multiple threads.
How OpenBLAS Uses Threads
OpenBLAS can use multiple threads to accelerate linear algebra. You may want many threads for solving a single, larger problem quickly. You may want fewer threads for solving many smaller problems simultaneously.
OpenBLAS has several ways to limit the number of threads it uses. These are controlled via:
export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4
The priorities are OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS. (I think this means that OPENBLAS_NUM_THREADS overrides OMP_NUM_THREADS; however, OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS when compiled with USE_OPENMP=1.)
If none of the foregoing variables are set, OpenBLAS will run using a number of threads equal to the number of cores on your machine (32 on your machine).
Your Situation
Your cluster has 32-core CPUs. You're trying to run 36 instances of Python. Each instance requires 1 thread for Python + 32 threads for OpenBLAS. You'll also need 1 thread for your SSH connection and 1 thread for your shell. That means that you need 36*(32+1)+2=1190 threads.
The nuclear option for fixing the problem is to use:
export OPENBLAS_NUM_THREADS=1
which should bring you down to 36*(1+1)+2=74 threads.
Since you have spare capacity, you could adjust OPENBLAS_NUM_THREADS to a higher value, but then the OpenBLAS instances owned by your separate Python processes will interfere with each other. So there's a trade-off between how fast you get one solution versus how fast you can get many solutions. Ideally, you can solve this trade-off by running fewer Pythons per node and using more nodes.
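The thread arithmetic above can be written as a small worked example. The function names below are hypothetical, and the counting model is the one used in this answer: one thread per Python instance, plus its OpenBLAS threads, plus two extra threads for SSH and the shell. It assumes the user runs nothing else that counts against RLIMIT_NPROC.

```c
/* Threads needed: each Python instance uses 1 thread plus `blas_threads`
 * OpenBLAS threads; `extra` covers the SSH connection and the shell. */
long threads_needed(long instances, long blas_threads, long extra)
{
    return instances * (blas_threads + 1) + extra;
}

/* Largest OPENBLAS_NUM_THREADS value that keeps the total within `limit`,
 * or 0 if even one BLAS thread per instance does not fit. */
long max_blas_threads(long limit, long instances, long extra)
{
    if (instances <= 0 || limit <= extra)
        return 0;

    /* Thread budget per Python instance, including Python's own thread. */
    long per_instance = (limit - extra) / instances;

    return per_instance > 1 ? per_instance - 1 : 0;
}
```

With the numbers from this answer, threads_needed(36, 32, 2) gives 1190 and threads_needed(36, 1, 2) gives 74. Under the same model, max_blas_threads(1024, 36, 2) gives 27, the largest setting that stays within the soft limit of 1024.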
Is there a programmatic way in C to determine the number of processes ever used in a group of processes under Linux?
To enforce the RLIMIT_NPROC limit, the Linux kernel reads the &p->real_cred->user->processes field in the copy_process function (on fork(), for example):
http://lxr.free-electrons.com/source/kernel/fork.c?v=4.8#L1371
1371 if (atomic_read(&p->real_cred->user->processes) >=
1372 task_rlimit(p, RLIMIT_NPROC)) {
or in sys_execve (do_execveat_common in fs/exec.c):
1504 if ((current->flags & PF_NPROC_EXCEEDED) &&
1505 atomic_read(&current_user()->processes) > rlimit(RLIMIT_NPROC)) {
1506 retval = -EAGAIN;
1507 goto out_ret;
So if the processes count has reached RLIMIT_NPROC, the call fails. This field is defined as part of struct user_struct (reached through the task's real_cred, a struct cred) in sched.h as
atomic_t processes; /* How many processes does this user have? */
So the process count accounting is per-user.
The field is decremented in copy_process if the fork fails:
1655 bad_fork_cleanup_count:
1656 atomic_dec(&p->cred->user->processes);
And the field is incremented in copy_creds: http://code.metager.de/source/xref/linux/stable/kernel/cred.c#313
313 /*
314 * Copy credentials for the new process created by fork()
315 *
316 * We share if we can, but under some circumstances we have to generate a new
317 * set.
318 *
319 * The new process gets the current process's subjective credentials as its
320 * objective and subjective credentials
321 */
322 int copy_creds(struct task_struct *p, unsigned long clone_flags)
339 atomic_inc(&p->cred->user->processes);
372 atomic_inc(&new->user->processes);
The man page says that it is a per-user limit: http://man7.org/linux/man-pages/man2/setrlimit.2.html
RLIMIT_NPROC
The maximum number of processes (or, more precisely on Linux,
threads) that can be created for the real user ID of the
calling process. Upon encountering this limit, fork(2) fails
with the error EAGAIN.
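The kernel counter itself is not exported directly, but a user-space approximation can be obtained by walking /proc and summing the Threads: field of every process whose real UID matches. The sketch below assumes /proc is mounted and readable. The function name count_user_threads is made up for illustration, and the result may differ from the kernel's own accounting, for example while processes are being created or are exiting during the scan.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: approximate the number of threads owned by `uid` by
 * reading /proc/<pid>/status for every process.
 * Returns -1 if /proc cannot be opened. */
long count_user_threads(uid_t uid)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    long total = 0;
    struct dirent *entry;

    while ((entry = readdir(proc)) != NULL) {
        /* Only entries whose names start with a digit are PIDs. */
        if (!isdigit((unsigned char)entry->d_name[0]))
            continue;

        char path[288];
        snprintf(path, sizeof path, "/proc/%s/status", entry->d_name);

        FILE *fp = fopen(path, "r");
        if (fp == NULL)
            continue;              /* the process may already have exited */

        char line[256];
        long threads = 0;
        int owned = 0;

        while (fgets(line, sizeof line, fp) != NULL) {
            unsigned int real_uid;

            /* The first number on the Uid: line is the real UID. */
            if (sscanf(line, "Uid: %u", &real_uid) == 1)
                owned = (real_uid == (unsigned int)uid);
            else if (sscanf(line, "Threads: %ld", &threads) == 1)
                break;             /* Threads: comes after Uid: */
        }
        fclose(fp);

        if (owned)
            total += threads;
    }

    closedir(proc);
    return total;
}
```

Calling count_user_threads(getuid()) and comparing the result with the soft RLIMIT_NPROC value gives a rough idea of how close the user is to the limit.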