Linux PID Recycling

Linux PID recycling

As new processes are created, PIDs increase up to a system-dependent limit and then wrap around. The kernel does not reuse a PID before this wrap-around happens.

The limit (the value at which PIDs wrap around) is stored in /proc/sys/kernel/pid_max. The manual says:

/proc/sys/kernel/pid_max (since Linux 2.5.34)

This file specifies the value at which PIDs wrap around (i.e., the
value in this file is one greater than the maximum PID). The default
value for this file, 32768, results in the same range of PIDs as on
earlier kernels.
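
On a Linux system the current value can be read straight from that file. The following is a minimal Python sketch, assuming /proc is mounted; the helper name read_pid_max is made up for illustration, and it returns None where the file is not available.

```python
from pathlib import Path

PID_MAX_PATH = Path("/proc/sys/kernel/pid_max")


def read_pid_max():
    """Return the PID wrap-around value, or None if the file is unavailable."""
    try:
        return int(PID_MAX_PATH.read_text().strip())
    except (OSError, ValueError):
        # Not Linux, /proc not mounted, or unexpected file contents.
        return None


if __name__ == "__main__":
    print("pid_max:", read_pid_max())
```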

How are PIDs generated?

As Wikipedia says:

Under Unix, process IDs are usually
allocated on a sequential basis,
beginning at 0 and rising to a maximum
value which varies from system to
system. Once this limit is reached,
allocation restarts at zero and again
increases. However, for this and
subsequent passes any PIDs still
assigned to processes are skipped.

So it's really a very simple policy. "Generation" just increments a counter. "Recycling" just wraps the counter around at the maximum value and keeps incrementing until it finds a number that is not currently assigned, that is, one whose process has finished and has been removed from the process table.
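
The policy described above can be sketched as a toy model. This is only an illustration in Python, not kernel code: the class name ToyPidAllocator, its parameters, and the choice of 1 as the first PID are all assumptions made for the example.

```python
class ToyPidAllocator:
    """Toy model of sequential PID allocation with wrap-around."""

    def __init__(self, pid_max, first_pid=1):
        self.pid_max = pid_max        # valid PIDs are first_pid .. pid_max - 1
        self.first_pid = first_pid
        self.next_pid = first_pid     # the counter that keeps incrementing
        self.in_use = set()           # PIDs still present in the "process table"

    def allocate(self):
        # Try every candidate at most once, starting at the counter.
        for _ in range(self.pid_max - self.first_pid):
            candidate = self.next_pid
            self.next_pid += 1
            if self.next_pid >= self.pid_max:
                self.next_pid = self.first_pid   # wrap around
            if candidate not in self.in_use:
                self.in_use.add(candidate)
                return candidate
        raise RuntimeError("no free PID")

    def release(self, pid):
        # Called when a process has been removed from the process table.
        self.in_use.discard(pid)


if __name__ == "__main__":
    alloc = ToyPidAllocator(pid_max=6)
    print([alloc.allocate() for _ in range(3)])
    alloc.release(2)
    # PID 2 is free again, but it is not handed out until the counter wraps.
    print([alloc.allocate() for _ in range(3)])
```

In this model a released PID is skipped over until the counter has passed the maximum and come back around to it.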

Some Unix implementations such as AIX use a policy that's less simple, see e.g. this FAQ.

Does the PID of a child process become available for reuse if the parent process is still running?

The PID of a child process becomes available for reuse when the parent process calls wait or waitpid (or any other function of that family such as wait3, wait4, etc.).

When the child dies, it stays behind as a zombie — an entry in the process table with no process behind it, which remains behind just to reserve the process ID and store the exit status. Calling waitpid blocks until the designated child process dies (or returns immediately if it's already dead), retrieves the child's status code, and reaps the zombie (i.e. removes the process table entry, freeing the process ID for reuse). Calling wait is similar, but returns as soon as one child process has died.
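
As a rough illustration of reaping, the Python sketch below forks a child that exits immediately, collects its status with waitpid, and then shows that a second waitpid on the same PID finds nothing left to collect. The function name spawn_and_reap is hypothetical, and the sketch assumes SIGCHLD has its default disposition in the parent.

```python
import os


def spawn_and_reap(exit_code=7):
    """Fork a child that exits at once, then reap it with waitpid."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)           # child: exit immediately

    # Parent: blocks until the child is dead, then removes the zombie.
    reaped_pid, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

    # The process table entry is gone, so a second wait has nothing to collect.
    try:
        os.waitpid(pid, 0)
        second_wait_failed = False
    except ChildProcessError:         # errno ECHILD
        second_wait_failed = True

    return reaped_pid == pid, code, second_wait_failed


if __name__ == "__main__":
    print(spawn_and_reap())
```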

If the parent process is ignoring SIGCHLD (its disposition is explicitly set to SIG_IGN) at the time the child dies, then the child is not turned into a zombie and its PID becomes available for reuse immediately. The parent's status vis-à-vis SIGCHLD matters in other ways; see e.g. POSIX for the gritty details.
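
The SIGCHLD case can be observed with a small Python sketch, assuming Linux-like behaviour where waitpid fails with ECHILD once an automatically reaped child is gone. The function name child_is_auto_reaped is made up for the example, and it must run in the main thread because it changes a signal disposition.

```python
import os
import signal


def child_is_auto_reaped():
    """Return True if waitpid finds no child to collect while SIGCHLD is ignored."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)               # child: exit immediately
        try:
            # Blocks until the child is gone, then fails: no zombie was left.
            os.waitpid(pid, 0)
            return False
        except ChildProcessError:     # errno ECHILD
            return True
    finally:
        # Restore the earlier disposition (fall back to the default if unknown).
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print(child_is_auto_reaped())
```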

If the parent process dies before the child, the child is said to be an orphan adopted by init, the process with PID 1. It is part of init's job to reap orphans.

In a shell script, the wait builtin is a wrapper around the wait system call. If the script has multiple children, wait with no argument blocks until all of them have died, and wait with some arguments blocks until all the specified processes have died (POSIX sh has no way to wait until one process has died without specifying which; bash 4.3 and later add wait -n for that). If wait $pid1 returns, it's possible that $pid2 has already died and its PID has been reused for another process; however, the shell keeps track of $pid2's status code even so, and a subsequent wait $pid2 will return its status code. You should not fork a new background job until then, however, to avoid confusion in case $pid2 was reused for a background job.

In Unix-ish environments, is PID wraparound guaranteed to change process start time?

Since the assignment of PIDs, and process table management in general, is not defined by any standard, it's impossible to do what you want in a portable way.

You will need to do as you say and develop multiple platform-specific implementations to gather enough information about a process to determine a unique identity for every process.
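
As one example of such a platform-specific implementation, here is a Linux-only Python sketch that pairs a PID with the start time recorded in /proc/<pid>/stat (field 22, in clock ticks since boot). The function name process_identity is hypothetical, and the pair is a heuristic rather than a guarantee, since the start time has limited resolution.

```python
import os


def process_identity(pid):
    """Return (pid, start_time_in_clock_ticks), or None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat", "rb") as f:
            data = f.read().decode("ascii", "replace")
    except OSError:
        return None                   # not Linux, or no such process

    # Field 2 (the command name) is parenthesised and may itself contain
    # spaces or ')', so split after the last ')'.
    _, sep, rest = data.rpartition(")")
    if not sep:
        return None
    fields = rest.split()             # fields[0] is field 3 (state)
    if len(fields) < 20:
        return None
    return pid, int(fields[19])       # field 22: starttime


if __name__ == "__main__":
    print(process_identity(os.getpid()))
```

Comparing the stored pair with a fresh reading later is a way to notice that a PID now refers to a different process.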

On the other hand, if you don't need this information in real time, as the processes are started and while they are still running, you can, on most Unix-like systems, simply turn on process accounting and have a guaranteed unique and complete record of every process that has been run by the system. Process accounting files are not standardized either, but there will be header files defining their record format, and there should be tools on each type of system which can process and summarize accounting files in various ways.

Linux thread id recycle strategy

A threaded Linux process has:

an OS pid shared by all threads within the process - use getpid
each thread within the process has its own OS thread id - use gettid
a pthreads thread id used internally by pthreads to identify threads when making various pthread related calls - use pthread_self and similar.
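
The three kinds of ID listed above can be compared from Python as a rough sketch. It assumes CPython 3.8 or later, where threading.get_native_id() reports the OS thread ID (gettid on Linux) and threading.get_ident() is derived from the pthreads ID as an implementation detail; the function name collect_thread_ids is made up for the example.

```python
import os
import threading


def collect_thread_ids(n_threads=3):
    """Return one (pid, os_thread_id, ident) tuple per worker thread."""
    barrier = threading.Barrier(n_threads)
    lock = threading.Lock()
    results = []

    def worker():
        ids = (os.getpid(), threading.get_native_id(), threading.get_ident())
        with lock:
            results.append(ids)
        # Keep every thread alive until all have recorded their IDs, so that
        # no ID can be reused by a later thread during the comparison.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for pid, tid, ident in collect_thread_ids():
        print(f"pid={pid} os_thread_id={tid} ident={ident}")
```

All threads report the same PID, while the OS thread IDs and the library-level identifiers differ per thread for as long as the threads are alive together.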

It can't be determined from your question whether you are trying to implement a "recycle strategy", or why you think you need to do so.

Edit

As an idle curiosity you can look through the Linux pthread code, but technically you have no reason to care. The POSIX spec basically just says that the thread ID is guaranteed to be unique within a process and is free to be reused after a thread dies.

Although implementations may have thread IDs that are unique in a system, applications should only assume that thread IDs are usable and unique within a single process. The effect of calling any of the functions defined in this volume of IEEE Std 1003.1-2001 and passing as an argument the thread ID of a thread from another process is unspecified. A conforming implementation is free to reuse a thread ID after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread.

Is it possible to limit the pid of child processes in Linux bash?

The PIDs don't "explode". They are recycled by the kernel. You can see the maximum PID number in /proc/sys/kernel/pid_max. Of course, you can modify this value, if you wish.
