What Are the Differences and Relationships Between "Processes", "Threads", "Tasks" and "Jobs" in Linux

What are the differences and relationships between processes, threads, tasks and jobs in Linux?

The distinction between processes and threads is common to virtually all operating systems. A process usually represents an independent execution unit with its own memory area, system resources and scheduling slot.

A thread is typically a "division" within the process: threads usually share the same memory and operating system resources, and share the time allocated to that process. For example, when you open your browser and Microsoft Word, each is a different process, but things that happen in the background of each (like animations, refreshes or backups) can be threads.

A job is usually a long-running unit of work executed by a user. The job may be "handled" by one or more processes, and it might not be interactive. For instance, instructing the machine to zip a large file or to run a processing script on a large input file would typically be a job. The name is largely historical: mainframes used to process jobs. In UNIX systems, many jobs are started automatically at prescheduled times using cron, hence the notion of "cron jobs".

What is the difference between a thread/process/task?

Process:

A process is an instance of a computer program that is being executed.
It contains the program code and its current activity.
Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.
Process-based multitasking enables you to run the Java compiler at the same time that you are using a text editor.
When multiple processes run on a single CPU, the operating system context-switches between their separate memory contexts.
Each process has a complete set of its own variables.

Thread:

A thread is a basic unit of CPU utilization, consisting of a program counter, a stack, and a set of registers.
A thread of execution results when a computer program splits into two or more concurrently running tasks (this is not the same as the fork() system call, which creates a new process).
The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process. Multiple threads can exist within the same process and share resources such as memory, while different processes do not share these resources.
Examples of threads in the same process are automatic spell checking and automatic saving of a file while you type.
Threads are basically processes that run in the same memory context.
Threads may share the same data during execution.
[Figure: single-threaded vs. multithreaded process]

Task:

A task is a set of program instructions that are loaded in memory.

What is the difference between a process and a thread?

Both processes and threads are independent sequences of execution. The typical difference is that threads (of the same process) run in a shared memory space, while processes run in separate memory spaces.

I'm not sure what "hardware" vs "software" threads you might be referring to. Threads are an operating environment feature, rather than a CPU feature (though the CPU typically has operations that make threads efficient).

Erlang uses the term "process" because it does not expose a shared-memory multiprogramming model. Calling them "threads" would imply that they have shared memory.

Linux Kernel: Threading vs Process - task_struct vs thread_info

Threads in Linux are treated as processes that just happen to share some resources. Each thread has its own thread_info (stored at the bottom of its kernel stack) and its own task_struct. I can think of two reasons why they are maintained as separate structures.

  1. thread_info is architecture dependent. task_struct is generic.
  2. thread_info cuts into the size of the kernel stack for that process, so it should be kept small. It is placed at the bottom of the stack as a micro-optimization: its address can then be computed from the current stack pointer by rounding down to a multiple of the stack size, which saves a CPU register.

Task vs. process, is there really any difference?

Processes and threads are the mechanics; a task is more conceptual. You can queue a chunk of work to run asynchronously: on Windows with .NET, for example, it gets run on a thread from the thread pool. With OpenMP, a task might be one part of your for loop running on one core.

Minor related notes: on Windows there are also jobs, thread pools, and fibers as mechanisms. Also, a process is nothing without at least one running thread.

How are a process and a thread the same thing in Linux?

Linux originally had no special support for (POSIX) threads; it simply treated them as processes that shared their address space and a few other resources (file descriptors, signal actions, ...) with other "processes".

That implementation, while elegant, made certain things that POSIX requires of threads difficult (for example, all threads of a process must share a single process ID), so Linux eventually gained special support for threads, and the premise of the question is no longer true.

Nevertheless, processes and threads are both still represented as tasks within the kernel, but the kernel now also supports grouping those tasks into thread groups, along with APIs for working with them (tgkill, tkill, exit_group, ...).

You can search for LinuxThreads and NPTL (Native POSIX Thread Library) to learn more about the topic.

What is the difference between task and thread?

A task is something you want done.

A thread is one of the many possible workers which performs that task.

In .NET 4.0 terms, a Task represents an asynchronous operation. Threads are used to complete that operation by breaking the work up into chunks and assigning them to separate threads.

How to choose between process and threads

If your application will consist of separate, individually usable components that communicate through well-defined protocols, each of which performs jobs that can individually succeed or fail without complicating the logic of the other components, then it's perfectly reasonable to write an application that uses multiple processes. A good example of an application that could be constructed this way is an MTA (mail transfer agent).

If on the other hand the concurrency will involve lots of shared data/state where continuance of one flow of execution depends on the result of another, you really should be using threads. The biggest advantages of threads over processes are:

  1. Access to efficient synchronization objects without having to set up your own shared memory for them to live in.
  2. Atomicity of the process: it's impossible for some threads to be terminated by factors beyond your control (like the user issuing a kill command) and others to live on. This is very important because in applications with complex synchronization requirements, unexpected asynchronous termination of one flow of execution (especially, for example, with a lock still held) could make it impossible for others to continue safely.
  3. The existence of threads is completely transparent to other parts of the program that don't want/need to be aware of them (whereas creating child processes has global state issues with respect to waiting for exit status, SIGCHLD, etc.).

In addition, threads have some other minor practical advantages:

  1. Faster creation/exit times (usually 2-3x faster than fork and at least 20x faster than fork+exec).
  2. Improved ability for the kernel to make scheduling fair between applications.
  3. (And probably some others I'm not thinking of...)

And a few practical disadvantages:

  1. Unnecessary contention for locks in standard library functions like malloc.
  2. Ability of buggy code in one flow of execution to corrupt the state of others.
  3. Inability to have different privilege levels for each thread.

The only time I would consider using separate processes instead of threads, when threads are the conceptual best fit for the problem, is when separate processes provide a huge advantage to your security model (e.g. privilege separation, as in vsftpd and OpenSSH).


