Why Do I Have to 'Wait()' for Child Processes

Why do I have to `wait()` for child processes?

I'll probably have to add a process table that stores the child PIDs
and call waitpid, not immediately but after some time has passed.
That is a problem, because the running time of the children varies
from a few microseconds to several minutes. If I use waitpid too
early, my parent process will get blocked.

Check out the documentation for waitpid. You can tell waitpid to NOT block (i.e., return immediately if there are no children to reap) using the WNOHANG option. Moreover, you don't need to give waitpid a PID. You can specify -1, and it will wait for any child. So calling waitpid as below fits your no-blocking constraint and no-saving-pids constraint:

waitpid( -1, &status, WNOHANG );

If you really don't want to properly handle process creation, then you can give the reaping responsibility to init by forking twice, reaping the child, and giving the exec to the grandchild:

pid_t temp_pid, child_pid;
temp_pid = fork();
if( temp_pid == 0 ){
    child_pid = fork();
    if( child_pid == 0 ){
        // exec()
        error( EXIT_FAILURE, errno, "failed to exec :(" );
    } else if( child_pid < 0 ){
        error( EXIT_FAILURE, errno, "failed to fork :(" );
    }
    exit( EXIT_SUCCESS );
} else if( temp_pid < 0 ){
    error( EXIT_FAILURE, errno, "failed to fork :(" );
} else {
    waitpid( temp_pid, NULL, 0 );
}

In the above code snippet, the child process forks its own child, immediately exits, and is then immediately reaped by the parent. The grandchild is orphaned, adopted by init, and will be reaped automatically.

Why does Linux keep zombies at all? Why do I have to wait for my
children? Is this to enforce discipline on parent processes? In
decades of using Linux I have never gotten anything useful out of
zombie processes; I don't quite get the usefulness of zombies as a
"feature". If the answer is that parent processes need a way to find
out what happened to their children, then for god's sake there is no
reason to count zombies as normal processes and forbid the creation of
non-zombie processes just because there are too many zombies.

How else do you propose one may efficiently retrieve the exit code of a process? The problem is that the mapping of PID <=> exit code (et al.) must be one to one. If the kernel released the PID of a process as soon as it exits, reaped or not, and then a new process inherits that same PID and exits, how would you handle storing two codes for one PID? How would an interested process retrieve the exit code for the first process? Don't assume that no one cares about exit codes simply because you don't. What you consider to be a nuisance/bug is widely considered useful and clean.

On the system I'm currently developing for I can only spawn 400 to 500
processes before everything grinds to a halt (it's a badly maintained
CentOS system running on the cheapest VServer I could find - but still,
400 zombies are less than a few kB of information)

Something about making a widely accepted kernel behavior a scapegoat for what are clearly frustrations over a badly-maintained/cheap system doesn't seem right.

Typically, your maximum number of processes is limited only by your memory. You can see your limit with:

cat /proc/sys/kernel/threads-max

What if the child exits before the parent calls wait()?

But what if the kernel decides to schedule the child first and the
child process terminates before parent can call the wait()?

It is an entirely possible case. If the parent uses one of the wait family of functions, or calls signal(SIGCHLD, SIG_IGN); explicitly before forking, the child does not remain a zombie even if the parent process is preempted (i.e., not permitted to use the CPU at that time).

Moreover, the point of wait (or of ignoring SIGCHLD) is to clean up the process's leftover data. With either method, the kernel is told that the child process (or children) is no longer needed, so the unused system resources can be released.

Fork() and Wait() with execvp() in C

As Jonathan Leffler said, the problem was with the args: execvp() requires the args array to be NULL-terminated.

That fixed the problem.

The corrected code:

char str1[LINELEN + 1];
char str2[LINELEN + 1];
int childReturns = 1;
if (argc != 2)
    return -1;

char *prog = progName(argv[1]);
if (prog == NULL)
    return -1;
char *args[4];
args[0] = prog;
args[3] = NULL;            /* execvp() needs a NULL-terminated array */
while (1)
{
    printf("Enter string:");
    if (mygets(str1, LINELEN) == NULL)
        break;
    printf("Enter string:");
    if (mygets(str2, LINELEN) == NULL)
        break;
    args[1] = str1;
    args[2] = str2;
    pid_t processId = fork();
    if (processId == 0)
    {
        execvp(prog, args);
        perror("execvp");  /* only reached if execvp() fails */
        _exit(EXIT_FAILURE);
    }
    else
    {
        wait(&childReturns); // Wait for the child
        printf("Child code is %d\n", WEXITSTATUS(childReturns));
    }
}
free(prog);
return 0;

Why child process not getting exited before parent process calls wait() function?

Upon exit, the child leaves an exit status that should be returned to the parent. So, when the child finishes, it becomes a zombie.

Whenever the child exits or stops, the parent is sent a SIGCHLD signal.
The parent can use the system calls wait() or waitpid(), along with the macros WIFEXITED and WEXITSTATUS, to learn about the status of its exited or stopped child.

If the parent has not waited for them, you can still see the exited children as zombie processes (unwaited-for children).

wait() just tells you which child exited so you can get the exit code. If you have more children running, then of course, others could have terminated in the meantime as well.

If you don't care about the exit status, then wait() is just fine, but you still have to wait on all children you started.

python multiprocessing - child process blocking parent process

But from my understanding, p.join() is to tell the program to wait for
this thread/process to finish before ending the program.

Nope, it blocks the main thread right then and there until the thread / process finishes. By doing that right after you start each process, you don't let the loop continue until that process completes.

It would be better to collect all the Process objects you create into a list, so they can be accessed after the loop creating them. Then in a new loop, wait for them to finish only after they are all created and started.

# for example
processes = []
for i in whatever:
    p = Process(target=foo)
    p.start()
    processes.append(p)
for p in processes:
    p.join()

If you want to be able to do things in the meantime (while waiting for a join), it is most common to use yet another thread or process. You can also choose to wait only a short time on join by giving it a timeout value; if the process doesn't complete in that amount of time, join simply returns, and you can check p.is_alive() to decide whether to go do something else before trying to join again.

How can I suspend a child process and reset it

You need to use waitpid() rather than wait(). Using wait() will wait for a child process to be terminated, which means the process must have exited, for example from SIGKILL or SIGSEGV. You are trying to wait for the process to be stopped, which is not the same thing. Stopping a process just pauses it and allows it to be continued later. It doesn't exit.

You need to use the WUNTRACED flag with waitpid() to wait for a child process to be terminated or stopped. Such as waitpid(child_pid, &status, WUNTRACED) or waitpid(-1, &status, WUNTRACED).

There is another flaw related to this. WIFSIGNALED() and WTERMSIG() also apply only to a process which was terminated by a signal. You should instead use WIFSTOPPED() and WSTOPSIG() to detect whether the child was stopped.

Also note:

  • SIGSTOP can also stop a process.
  • You may receive a SIGTSTP anytime after the signal handler is set, which could be before child_pid is assigned a value.
  • wait() and friends can return -1 with errno set to EINTR if your process is interrupted by a signal.

Can't run parent process after all child processes have terminated

Here you are creating a child process in each iteration of the loop and then waiting for it in the same iteration. So by the end of one iteration, one child process has been created, it prints and exits, the parent wakes from wait and it prints, and thus you get the first two lines.

Similar output follows for the next iterations, so you get two lines for each iteration of the loop, and it looks like the parent is executing before the child, but it's not.

If you want to call the parent process only after all the child processes have finished, then do the following.

Introduce a global variable isParent which is true if the current process is the parent. Initialize it to zero

int isParent = 0;

Then in the loop, instead of calling parentProcess() set isParent to 1

for (int file = 0; file < files_count; file++) {
    pid_t pid = fork();
    int file_loc = file + 2;

    if (pid == 0) {
        // child process
        occurrences_in_file(argv[file_loc], argv[1]);
        break;
    } else if (pid > 0) {
        // parent process
        isParent = 1;
    } else {
        // fork failed
        printf("fork() failed!\n");
        return 1;
    }
}

Then after the for loop call parentProcess if isParent is set

if (isParent) {
    parentProcess(files_count);
}

Then, in parentProcess(int numChildren), call wait for all the child processes.

void parentProcess(int numChildren) {
    int status;
    for (int i = 0; i < numChildren; i++) {
        pid_t done = wait(&status);
        if (done == -1) {
            if (errno == ECHILD) {
                cout << "parent process done" << endl;
                break; // no more child processes
            }
        } else {
            if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
                cerr << "pid " << done << " failed" << endl;
                _exit(1);
            }
        }
    }
}

waitpid() and fork() to limit number of child processes

At the risk of being obvious, you basically want to just drop the else clause. The logic you're looking for is something like:

int max_active = 3; // or whatever
int number_active = 0;
bool done = false;

for (; !done; ++number_active) {
    // wait for something to do
    GetSomeWork();
    // wait for something to finish, if necessary
    for (; number_active >= max_active; --number_active)
        wait(&status);
    pid = fork();
    if (pid < 0)
        ReportErrorAndDie();
    if (pid == 0)
        DoTheWorkAndExit();
}

This actually lets you change the value of max_active without restarting, which is the only justification for the for loop around the wait() call.

The obvious complaint is that number_active in my version doesn't actually tell you how many processes are active, which is true. It tells you how many processes you haven't wait()'ed for, which means that you might keep some zombies (but the number is limited). If you're constantly running at or close to the maximum number of tasks, this doesn't matter, and unless your maximum is huge, it doesn't matter anyway, since the only Design Requirement was that you don't use more than the maximum number of tasks, and consequently you only have to know that the number active is not more than the maximum.

If this really bothers you and you want to clean the tasks up, you can put:

for (; waitpid(-1, &status, WNOHANG) > 0; --number_active) {}

before the other for loop, which will reap the zombies before checking if you need to block. (waitpid(-1, &status, WNOHANG) returns -1 with errno set to ECHILD if there are no child processes at all, and in any event there's no point continuing the loop on an error.)


