What Does "ulimit -s unlimited" Do?

What does ulimit -s unlimited do?

When you call a function, a new stack frame is allocated on the stack to hold that call's local variables, saved registers, and return address. That's how functions can have local variables. As functions call functions, which in turn call functions, we keep allocating more and more stack space to maintain this deep hierarchy of frames.
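
As a small, hypothetical illustration of this (function and variable names are just an example), each nested call gets a fresh frame holding its own copy of the locals, so the address of a local keeps moving as the recursion deepens:

#include <stdio.h>

/* Each nested call gets its own copy of `local` in a fresh stack frame. */
static void frame(int depth)
{
    int local = depth;   /* lives in this call's stack frame */
    printf("depth %d: &local = %p\n", depth, (void *)&local);
    if (depth < 3)
        frame(depth + 1);
}

int main(void)
{
    frame(0);
    return 0;
}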

To curb programs using massive amounts of stack space, a limit is usually put in place via ulimit -s. If we remove that limit via ulimit -s unlimited, our programs can keep gobbling up RAM for their ever-growing stack until eventually the system runs out of memory entirely.

int eat_stack_space(void) { return eat_stack_space(); }
// If we compile this with no optimization and run it, our computer could crash.
// (With optimization enabled, the compiler may turn the tail call into a loop,
// in which case the stack never grows.)

Usually, using a ton of stack space is accidental or a symptom of very deep recursion that probably should not be relying so much on the stack. Thus the stack limit.

Impact on performance is minor but does exist. Using the time command, I found that eliminating the stack limit improved runtime by a few fractions of a second (at least on 64-bit Ubuntu).

Segfault with ulimit -s unlimited

As Mark Plotnick explained, the problem comes from the default stack size used for threads. With ulimit -s unlimited, the RLIMIT_STACK value cannot be used as the per-thread default, so glibc falls back to an architecture-specific default: 2 MB on x86_64, a small size by today's standards.

Looking more closely into the pthread code, in __pthread_initialize_minimal_internal():

  /* Determine the default allowed stack size.  This is the size used
     in case the user does not specify one.  */
  struct rlimit limit;
  if (__getrlimit (RLIMIT_STACK, &limit) != 0
      || limit.rlim_cur == RLIM_INFINITY)
    /* The system limit is not usable.  Use an architecture-specific
       default.  */
    limit.rlim_cur = ARCH_STACK_DEFAULT_SIZE;
  else if (limit.rlim_cur < PTHREAD_STACK_MIN)
    /* The system limit is unusably small.
       Use the minimal size acceptable.  */
    limit.rlim_cur = PTHREAD_STACK_MIN;

  /* Make sure it meets the minimum size that allocate_stack
     (allocatestack.c) will demand, which depends on the page size.  */
  const uintptr_t pagesz = GLRO(dl_pagesize);
  const size_t minstack = pagesz + __static_tls_size + MINIMAL_REST_STACK;
  if (limit.rlim_cur < minstack)
    limit.rlim_cur = minstack;

  /* Round the resource limit up to page size.  */
  limit.rlim_cur = ALIGN_UP (limit.rlim_cur, pagesz);
  lll_lock (__default_pthread_attr_lock, LLL_PRIVATE);
  __default_pthread_attr.stacksize = limit.rlim_cur;
  __default_pthread_attr.guardsize = GLRO (dl_pagesize);
  lll_unlock (__default_pthread_attr_lock, LLL_PRIVATE);

The man-page of pthread_create(3) only mentions the default stack size for Linux/x86-32, but here are the values for other architectures in glibc 2.3.3 and up (ARCH_STACK_DEFAULT_SIZE):

  • Sparc-32 : 2MB
  • Sparc-64 : 4MB
  • PowerPC : 4MB
  • S/390 : 2MB
  • IA-64 : 32MB
  • i386 : 2MB
  • x86_64 : 2MB

I have submitted a patch to the man-page to include this information. Thank you again for your help in investigating this issue.
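
If your threads genuinely need more than the 2 MB default, a more portable fix than fiddling with ulimit is to request the stack size explicitly per thread. A minimal sketch (the 64 MB and 16 MB figures are purely illustrative), compiled with -pthread:

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    char big[16 * 1024 * 1024];   /* would overflow the 2 MB default thread stack */
    big[0] = 0;                   /* touch it so the space is really used */
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* illustrative value: give the thread a 64 MB stack instead of the default */
    if (pthread_attr_setstacksize(&attr, 64 * 1024 * 1024) != 0) {
        fprintf(stderr, "pthread_attr_setstacksize failed\n");
        return 1;
    }
    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}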

What is the difference between ulimit -s unlimited and export KMP_STACKSIZE=xx?

ulimit sets the OS resource limits for the program (here, the maximum size of the main thread's stack).

KMP_STACKSIZE tells the OpenMP runtime how much stack to actually allocate for each thread's stack. So, depending on your OS defaults, you might need both. By the way, you should rather use OMP_STACKSIZE instead, as KMP_STACKSIZE is the environment variable specific to the Intel and clang compilers. OMP_STACKSIZE is the standard way of setting the stack size of the OpenMP threads.

Note that this problem is usually more exposed with Fortran, as Fortran tends to keep more data on the stack, especially arrays. Some compilers can move such arrays to the heap automatically; see for instance -heap-arrays for the Intel compiler.
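
A quick way to see why you may need both (a sketch only; the array size is purely illustrative): each OpenMP thread puts scratch on its own stack, so the worker threads are governed by OMP_STACKSIZE while the initial thread still lives on the stack that ulimit -s controls. Compiled with gcc -fopenmp, this typically crashes with default settings but runs when both ulimit -s and OMP_STACKSIZE are raised to, say, 64M:

#include <omp.h>
#include <stdio.h>

#define N (4 * 1024 * 1024)   /* 4 M doubles = 32 MB per thread, purely illustrative */

int main(void)
{
    #pragma omp parallel
    {
        double scratch[N];     /* automatic array on *this thread's* stack */
        scratch[0] = omp_get_thread_num();
        scratch[N - 1] = scratch[0];
        #pragma omp critical
        printf("thread %d ok\n", (int)scratch[N - 1]);
    }
    return 0;
}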

ulimit stack size through slurm script

ulimit is a shell built-in command. Resource limits set with it are not system-wide and only apply to processes started in the same shell session and their descendants. When you SSH into a node and execute ulimit, it shows you the limits in that particular shell session, not the limits applied to the processes in the job, even if some of them are running on the same node.

Also, --propagate=STACK propagates the resource limits of the shell session where you execute the sbatch command and not the limits set in the job script:

PropagateResourceLimits

A list of comma separated resource limit names. The slurmd daemon uses these names to obtain the associated (soft) limit values from the users process environment on the submit node. These limits are then propagated and applied to the jobs that will run on the compute nodes.

Thus, the ulimit -s unlimited inside the job script only applies to the shell process started by SLURM when the job is executed. Unless mpirun propagates the limits further onto the processes it spawns, those processes will inherit the system default stack size limit instead. But if you do:

$ ulimit -s unlimited
$ sbatch --propagate=STACK foo.sh

(or have #SBATCH --propagate=STACK inside foo.sh as you do), then all processes spawned by SLURM for that job will already have their stack size limit set to unlimited.
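
A simple way to check what the launched processes actually get is to have them report RLIMIT_STACK themselves, rather than trusting ulimit in an interactive shell on the node. A minimal sketch (the file name check_stack.c is just an example) that you could run via srun or mpirun inside the job script:

#include <stdio.h>
#include <sys/resource.h>

/* Print the soft and hard stack limits as seen by this process. */
int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("soft stack limit: unlimited\n");
    else
        printf("soft stack limit: %llu KB\n", (unsigned long long)(rl.rlim_cur / 1024));
    if (rl.rlim_max == RLIM_INFINITY)
        printf("hard stack limit: unlimited\n");
    else
        printf("hard stack limit: %llu KB\n", (unsigned long long)(rl.rlim_max / 1024));
    return 0;
}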

How to change the stack size using ulimit or per process on Mac OS X for a C or Ruby program?

Apparently there is a hard limit on the stack size for Mac OS X. According to http://lists.apple.com/archives/scitech/2004/Oct/msg00124.html (granted, this is quite old, and I'm not sure whether it still holds), the hard limit is 65532 KB, just under 64 MB. To set it, simply call ulimit -s hard.

I did some tests on Snow Leopard (10.6.8), and it does seem to be true.

$ ulimit -a
...
stack size (kbytes, -s) 8192
...
$ ulimit -s 65533
-bash: ulimit: stack size: cannot modify limit: Operation not permitted
$ ulimit -s 65532
$

I also found this http://linuxtoosx.blogspot.com/2010/10/stack-overflow-increasing-stack-limit.html, though I haven't tested it, so I can't really say much about it.

When applications consume gigabytes of memory, that memory is usually taken from the heap. The stack is reserved for local automatic variables, which exist only for a relatively short time, roughly the lifespan of the function call; the heap is where most of the persistent data lives.

Here is a quick tutorial:

#include <stdlib.h>

#define NUMBER_OF_BYTES 10000000 // about 10 megs
void test(void)
{
    char stack_data[NUMBER_OF_BYTES];          // allocated on the stack.
    char *heap_data = malloc(NUMBER_OF_BYTES); // the pointer (heap_data) lives on the stack,
                                               // the actual data lives on the heap.
    stack_data[0] = 0;                         // touch the array so it really occupies stack space.
}

int main(void)
{
    test();
    // at this point stack_data and heap_data are gone, but the block obtained
    // from malloc(NUMBER_OF_BYTES) persists because it was never freed.
    // depending on the calling convention, either main or test is responsible
    // for resetting the stack; on most compilers, including gcc, the caller (main) is.

    return 0;
}

$ ulimit -a
...
stack size (kbytes, -s) 8192
...
$ gcc m.c
$ ./a.out
Segmentation fault
$ ulimit -s hard
$ ./a.out
$

ulimit is only temporary; you would have to update it every time, or update your corresponding bash startup script to set it automatically.

Note also that an unprivileged process can lower its hard limit but never raise it again; the soft limit (what ulimit -s sets by default) can be raised freely, but only up to the hard limit.
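
For example, a process can always raise its own soft limit up to, but not above, the hard limit. A sketch of the programmatic equivalent of ulimit -s hard:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    rl.rlim_cur = rl.rlim_max;        /* raise the soft limit to the hard limit */
    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        perror("setrlimit");          /* asking for more than rlim_max would fail here */
    return 0;
}

Whether an already-running main thread can actually grow into the newly allowed space is platform-dependent, so in practice the limit is usually raised in the shell (or a wrapper) before the program starts.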

UNIX: What should be Stack Size (ulimit -s) in UNIX?

Most people rely on the stack being “large” and their programs not using all of it, simply because the size has been set so large that programs rarely fail because they run out of stack space unless they use very large arrays with automatic storage duration.

This is an engineering failure, in the sense that it is not engineering: A known and largely preventable source of complete failure is uncontrolled.

In general, it can be difficult to compute the actual stack needs of a program. Especially when there is recursion, a compiler cannot generally predict how many times a routine will be called recursively, so it cannot know how many times that routine will need stack space. Another complication is calls to addresses prepared at run-time, such as calls to virtual functions or through other pointers-to-functions.

However, compilers and linkers could provide some assistance. For any routine that uses a fixed amount of stack space, a compiler, in theory, could provide that information. A routine may include blocks that are or are not executed, and each block might have different stack space requirements. This would interfere with a compiler providing a fixed number for the routine, but a compiler might provide information about each block individually and/or a maximum for the routine.

Linkers could, in theory, examine the call tree and, if it is static and not recursive, provide a maximum stack use for the linked program. They could also provide the stack use along a particular call subchain (e.g., from one routine through the chain of calls that leads to the same routine being called recursively), so that a human could then apply knowledge of the algorithm to multiply the stack use of the subchain by the maximum number of times it might be called recursively.

I have not seen compilers or linkers with these features. This suggests there is little economic incentive for developing these features.

There are times when stack use information is important. Operating system kernels may have a stack that is much more limited than user processes, so the maximum stack use of the kernel code ought (as a good engineering practice) to be calculated so that the stack size can be set appropriately (or the code redesigned to use less stack).

If you have a critical need for calculating stack space requirements, you can examine the assembly code generated by the compiler. In many routines on many computing platforms, a fixed number is subtracted from the stack pointer at the beginning of the routine. In the absence of additional subtractions or “push” instructions, this is the stack use of the routine, excluding further stack used by subroutines it calls. However, routines may contain blocks of code that contain additional stack allocations, so you must be careful about examining the generated assembly code to ensure you have found all stack adjustments.
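
For instance (a sketch only; the exact numbers depend on the compiler, flags, and ABI), compiling a small routine like the following with gcc -S -O0 and reading the output is usually enough to spot its fixed stack adjustment:

/* compile with e.g. `gcc -S -O0 stack_use.c` and inspect the prologue of accumulate */
double accumulate(const double *samples, int n)
{
    double window[128];   /* 1024 bytes of automatic storage */
    double sum = 0.0;
    for (int i = 0; i < n && i < 128; i++) {
        window[i] = samples[i];
        sum += window[i];
    }
    return sum;
}
/* On x86-64 the prologue typically contains a single `subq $..., %rsp`
   (plus pushes of saved registers); that constant, together with the saved
   registers and the return address, is this routine's fixed stack footprint. */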

Routines may also contain stack allocations computed at run-time. In a situation where calculating stack space is critical, you might avoid writing code that causes such allocations (e.g., avoid using C’s variable-length array feature).

Once you have determined the stack use of each routine, you can determine the total stack use of the program by adding the stack use of each routine along various routine-call paths (including the stack use of the start routine that runs before main is called).
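
As a purely illustrative piece of arithmetic (all routine names and figures below are hypothetical), the per-path total is just the sum of the per-routine figures obtained as above, with recursive routines multiplied by their maximum depth:

/* Hypothetical worst-case path through a program:
 *
 *   startup code before main ............   ~500 bytes
 *   main .................................  1 072 bytes (locals + saved registers)
 *   process_file .........................    208 bytes
 *   parse_line, recursive, max depth 20 ..  20 * 144 = 2 880 bytes
 *
 *   worst-case stack use along this path ≈ 4 660 bytes
 */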

This sort of calculation of the stack use of a complete program is generally difficult and is rarely performed.

You can generally estimate the stack use of a program by knowing how much data it “needs” to do its work. Each routine generally needs stack space for the objects it uses with automatic storage duration plus some overhead for saving processor registers, passing parameters to subroutines, some scratch work, and so on. Many things can alter stack use, so only an estimate can be obtained this way. For example, your sample program does not need any space for number. Since no result of declaring or using number is ever printed, the optimizer in your compiler can eliminate it. Your program only needs stack space for the start routine; the main routine does not need to do anything except return zero.


