What does waitpid() do?

Example of waitpid() in use?

Syntax of waitpid():

pid_t waitpid(pid_t pid, int *status, int options);

The value of pid can be:

< -1: Wait for any child process whose process group ID is equal to the absolute value of pid.
-1: Wait for any child process.
0: Wait for any child process whose process group ID is equal to that of the calling process.
> 0: Wait for the child whose process ID is equal to the value of pid.

The value of options is an OR of zero or more of the following constants:

WNOHANG: Return immediately if no child has exited.
WUNTRACED: Also return if a child has stopped. Status for traced children which have stopped is provided even if this option is not specified.
WCONTINUED: Also return if a stopped child has been resumed by delivery of SIGCONT.

For more help, use man waitpid.
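
As a minimal sketch of the call described above: the helper below forks a child, waits for exactly that child (pid > 0, options 0), and decodes the status with the WIFEXITED() and WEXITSTATUS() macros. The function name run_child_and_wait is made up for this illustration and is not part of any API.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, wait for exactly
 * that child, and return the exit code the parent observed (-1 on failure). */
int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        _exit(code);                        /* child: terminate immediately */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {   /* pid > 0: wait for this child only */
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);         /* low 8 bits of the exit code */
    }
    return -1;                              /* e.g. the child was killed by a signal */
}
```

Calling run_child_and_wait(7) from main() should hand back 7, because the parent reads the child's exit code out of status.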

Developing a correct understanding of waitpid() and getpid()

Your understanding of getpid is correct: it returns the PID of the calling process.

waitpid is used (as you said) to block the execution of a process (unless
WNOHANG is passed) and resume execution when one of its children changes
state, by default when it terminates. waitpid returns the PID of the child
whose state has changed, or -1 on failure. It can also return 0 if WNOHANG was
specified but no matching child has changed state yet. See:

man waitpid

RETURN VALUE

waitpid(): on success, returns the process ID of the child whose state has changed; if WNOHANG
was specified and one or more child(ren) specified by pid exist, but have not yet changed state,
then 0 is returned. On error, -1 is returned.

Depending on the arguments passed to waitpid, it will behave differently. Here
I'll quote the man page again:

man waitpid
pid_t waitpid(pid_t pid, int *wstatus, int options);
...

The waitpid() system call suspends execution of the calling process until a child specified by pid argument
has changed state. By default, waitpid() waits only for terminated children, but this behavior is modifiable
via the options argument, as described below:

The value of pid can be:

< -1: meaning wait for any child process whose process group ID is equal to the absolute value of pid.

-1: meaning wait for any child process.

0: meaning wait for any child process whose process group ID is equal to that of the calling process.

> 0: meaning wait for the child whose process ID is equal to the value of pid.

The value of options is an OR of zero or more of the following constants:

WNOHANG: return immediately if no child has exited.

WUNTRACED: also return if a child has stopped (but not traced via ptrace(2)).
Status for traced children which have stopped is provided even if this option is not specified.

WCONTINUED (since Linux 2.6.10): also return if a stopped child has been resumed by delivery of SIGCONT.

I'm a little unsure as to how waitpid() manages all this

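waitpid is a system call, so the kernel handles this: it tracks the state of each child and wakes the waiting parent when a matching child changes state.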

How does this change depending on whether or not a parent or child calls it, and whether or not a child process is still running, or has terminated?

wait() should only be called by a process that has children, that is, one that
has executed fork(). So the parent process should call wait()/waitpid(). If
the child process hasn't called fork(), then it doesn't need to call either of
these functions. If, however, the child process has itself called fork(), then
it should also call wait()/waitpid().
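
To illustrate why a process that never called fork() has no use for these functions, here is a small sketch: the child has no children of its own, so its waitpid() call should fail with ECHILD. The helper name childless_wait_fails is invented for this example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a freshly forked child, which has no
 * children of its own, gets -1/ECHILD from waitpid(); 0 if it does not;
 * -1 if fork() or the parent's waitpid() fails. */
int childless_wait_fails(void)
{
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* The child never called fork(), so it has nothing to wait for. */
        pid_t r = waitpid(-1, NULL, 0);
        _exit(r == -1 && errno == ECHILD ? 0 : 1);
    }

    /* The parent did call fork(), so it waits for the child it created. */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```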

The behaviour of these functions is very well explained in the man page; I have quoted the important parts of it. You should read the whole man page
to get a better understanding.

How do I interpret "waitpid function doesn't wait for the child that terminates first"?

If you complete the quotation (as it is now completed in the question), you see that waitpid() is more flexible (than wait()).

The POSIX specification says:

The pid argument specifies a set of child processes for which status is requested. The waitpid() function shall only return the status of a child process from this set:
If pid is equal to (pid_t)-1, status is requested for any child process. In this respect, waitpid() is then equivalent to wait().
If pid is greater than 0, it specifies the process ID of a single child process for which status is requested.
If pid is 0, status is requested for any child process whose process group ID is equal to that of the calling process.
If pid is less than (pid_t)-1, status is requested for any child process whose process group ID is equal to the absolute value of pid.

So, as the quote says, waitpid() doesn't always (only) wait for the first child to die; you can select much more precisely which child or children you are interested in. Additionally, options like WNOHANG make the function return immediately with 0 if no child that meets the criterion has died yet (but at least one child exists that would meet the criterion). If no children meet the criteria at all, you get a different result: -1, with errno set to ECHILD.
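
The selection by pid can be sketched as follows, assuming two children where one normally terminates well before the other. The parent asks for the slower child by PID and gets it, even though the other child has usually finished first. The names spawn_child and wait_for_specific_child are hypothetical, and the 100 ms delay is an arbitrary value chosen for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that sleeps for delay_ms milliseconds
 * and then exits with `code`. Returns the child's PID, or -1 on failure. */
static pid_t spawn_child(int delay_ms, int code)
{
    pid_t pid = fork();
    if (pid == 0) {
        struct timespec ts = { .tv_sec = delay_ms / 1000,
                               .tv_nsec = (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        _exit(code);
    }
    return pid;
}

/* Returns 1 if waitpid(slow, ...) reported the slow child and a following
 * waitpid(-1, ...) reported the remaining fast child; -1 if fork() failed. */
int wait_for_specific_child(void)
{
    pid_t fast = spawn_child(0, 1);      /* normally terminates first */
    if (fast == -1) {
        return -1;
    }
    pid_t slow = spawn_child(100, 2);    /* terminates about 100 ms later */
    if (slow == -1) {
        waitpid(fast, NULL, 0);          /* do not leave a zombie behind */
        return -1;
    }

    int status = 0;
    /* pid > 0: only `slow` can satisfy this call. */
    pid_t first = waitpid(slow, &status, 0);
    int slow_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;

    /* pid == -1: any child will do; only `fast` is left. */
    pid_t second = waitpid(-1, &status, 0);

    return first == slow && slow_code == 2 && second == fast;
}
```

With wait() or waitpid(-1, ...) as the first call, the parent would instead get whichever child happened to terminate first.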

waitpid, WNOHANG, WUNTRACED: how do I use these?

If you pass -1 and WNOHANG, waitpid() checks whether any zombie children exist. If so, one of them is reaped: its PID is returned and its exit status is stored through the status pointer. If not, waitpid() returns 0 if unterminated children exist, or -1 with errno set to ECHILD (No child processes) if there are no children at all. This is useful if you want to find out whether any of your children recently died without having to block until one of them does.
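
A non-blocking reap loop built on this behaviour might look like the sketch below. The names reap_zombies and wnohang_demo are invented for the example, and the pipe is only a device to keep the child alive until the parent decides otherwise; the demo assumes the calling process has no other children.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: reap every child that has already terminated, without
 * blocking. Returns the number reaped. *still_running is set to 1 if
 * unterminated children remain (waitpid returned 0) and to 0 otherwise
 * (normally -1 with errno == ECHILD: no children left). */
int reap_zombies(int *still_running)
{
    int reaped = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {                 /* a zombie was reaped; status holds its result */
            reaped++;
            continue;
        }
        if (pid == -1 && errno == EINTR) {
            continue;                  /* interrupted by a signal handler: retry */
        }
        *still_running = (pid == 0);
        return reaped;
    }
}

/* Returns 1 if polling behaved as described: nothing to reap while the child
 * is alive, then exactly one child reaped and none left. */
int wnohang_demo(void)
{
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        char c;
        close(fds[1]);
        if (read(fds[0], &c, 1) < 0) _exit(1);  /* blocks until the parent closes its end */
        _exit(0);
    }
    close(fds[0]);

    int running = 0;
    int before = reap_zombies(&running);        /* child alive: 0 reaped, running == 1 */
    int ok = (before == 0 && running == 1);

    close(fds[1]);                              /* let the child exit */
    int after = 0;
    while (after == 0 && running) {             /* poll instead of blocking */
        struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000L };
        nanosleep(&ts, NULL);
        after = reap_zombies(&running);
    }
    return ok && after == 1 && running == 0;
}
```

A loop like reap_zombies() is commonly run from the main loop or after a SIGCHLD notification, since several children may have died by the time the parent gets around to checking.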

The option WUNTRACED is documented as follows; I have nothing to add to this description:

WUNTRACED The status of any child processes specified by pid that are stopped, and whose status has not yet been reported since they stopped, shall also be reported to the requesting process.

Read the waitpid page from POSIX for more details.

What does "wait and waitpid are always interrupted when a signal is caught" mean?

The original behaviour of signal() (System V semantics) was to interrupt any system call in which the process was sleeping, execute the signal handler, and have the system call return -EINTR (seen from user space as a return value of -1 with errno set to EINTR). Then BSD4.3 invented the restart mechanism, which restarts a system call automatically after it has been interrupted. This avoids having to write a retry loop around each syscall if signal handlers are involved.

Linux did not change the semantics of the signal() syscall. However, the glibc signal() wrapper function nowadays calls the sigaction() syscall with the SA_RESTART flag by default. So, if you do not want the restart behaviour, you have to call sigaction() yourself and omit that flag.

So your code does indeed make use of the restart mechanism on both BSD and Linux.
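
For the case where the handler is installed with sigaction() and without SA_RESTART, the following sketch shows both the interruption and the usual retry loop. The names on_alarm, waitpid_retry and eintr_demo are hypothetical, and the repeating 20 ms timer is just an assumed way to deliver a signal while the parent is blocked in waitpid().

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; }  /* empty handler: it only interrupts */

/* Hypothetical helper: the retry loop needed when SA_RESTART is not in effect. */
pid_t waitpid_retry(pid_t pid, int *status, int options)
{
    pid_t r;
    do {
        r = waitpid(pid, status, options);
    } while (r == -1 && errno == EINTR);
    return r;
}

/* Returns 1 if a plain waitpid() failed with EINTR and the retry loop then
 * collected the child normally. */
int eintr_demo(void)
{
    int fds[2];
    if (pipe(fds) == -1) return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                          /* SA_RESTART deliberately omitted */
    if (sigaction(SIGALRM, &sa, NULL) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        char c;
        close(fds[1]);
        if (read(fds[0], &c, 1) < 0) _exit(1);  /* stay alive until the parent closes the pipe */
        _exit(0);
    }
    close(fds[0]);

    /* Repeating timer, so a tick lands while the parent is blocked in waitpid(). */
    struct itimerval tick = { .it_interval = { 0, 20000 }, .it_value = { 0, 20000 } };
    setitimer(ITIMER_REAL, &tick, NULL);

    int status = 0;
    pid_t r = waitpid(pid, &status, 0);       /* the child cannot exit yet */
    int interrupted = (r == -1 && errno == EINTR);

    struct itimerval off = { { 0, 0 }, { 0, 0 } };
    setitimer(ITIMER_REAL, &off, NULL);       /* stop the timer */

    close(fds[1]);                            /* now let the child exit */
    r = waitpid_retry(pid, &status, 0);
    return interrupted && r == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Setting sa.sa_flags = SA_RESTART instead would let the kernel restart the first waitpid() transparently, in which case it would simply keep blocking here until the child exits.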

What does the second parameter of waitpid() mean?

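The second parameter, status, is a pointer to an int in which waitpid() stores the child's status information (pass NULL if you do not need it); you inspect it with macros such as WIFEXITED() and WEXITSTATUS(). The bit-field for options is the third parameter, and it accepts WNOHANG, WUNTRACED and WCONTINUED as listed above. WNOWAIT, which means to leave the child in a waitable state so that a later wait call can be used to again retrieve the child status information, is documented on the page below as an option for waitid().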

See: http://linux.die.net/man/2/waitpid