
Process Management

Processes, Threads, and Coroutines#

What are Processes, Threads, and Coroutines#

|  | Process | Thread | Coroutine |
| --- | --- | --- | --- |
| Definition | The basic unit of resource allocation and ownership | The basic unit of program execution | A lightweight thread in user space; the basic unit of scheduling within a thread |
| Switching Situation | Saving the CPU environment of the process (stack, registers, page table, file handles, etc.) and setting up the CPU environment of the newly scheduled process | Saving and setting the program counter, a small number of registers, and stack contents | First saving the register context and stack, then restoring them when switching back |
| Switching Path | User mode -> Kernel mode -> User mode | User mode -> Kernel mode -> User mode | User mode (no kernel trap) |
| Owned Resources | CPU resources, memory resources, file resources, and handles | Program counter, registers, stack, and status word | Its own register context and stack |
| Concurrency | Switching between different processes achieves concurrency, each occupying the CPU for parallel execution | Multiple threads within a process execute concurrently | Only one coroutine can execute at a time, while other coroutines are in a sleeping state; suitable for time-sharing processing of tasks |
| System Overhead | Switching the virtual address space, switching the kernel stack and hardware context, CPU cache invalidation, page table switching; overhead is high | Only a small number of register contents need to be saved and set during switching, so overhead is low | Operates on the stack directly with almost no kernel switching overhead, allows lock-free access to global variables, so context switching is very fast |
| Communication | Inter-process communication requires the operating system | Threads can directly read and write the process's data segment (such as global variables) to communicate |  |

Switching Issues of Processes, Threads, and Coroutines#

  1. Process Switching: Process switching is managed and scheduled by the operating system. When the operating system decides to switch to another process, it will save the context of the current process (including register values, program counter, etc.) and load the context of the next process. This process involves switching memory mappings, registers, and other resources, making the cost of switching processes relatively high. Process switching requires intervention from the operating system, resulting in relatively high overhead.
  2. Thread Switching: Thread switching occurs within the same process and is scheduled by the operating system. Thread switching also requires saving and restoring register values, program counters, and other context information, but since threads share the process's address space and resources, the switching cost is relatively low. Thread switching still requires intervention from the operating system, but the overhead is much smaller than that of process switching.
    Performance Overhead: Switching kernel stacks, switching hardware contexts, saving contents in registers, preserving the state of the previously executed flow, CPU cache invalidation. Page table lookup is a slow process, so a cache is typically used to cache frequently used address mappings, which speeds up page table lookups; this cache is the TLB. After a process switch, the page table must also be switched, and after the page table switch, the TLB becomes invalid, leading to a decrease in hit rate, which slows down the conversion of virtual addresses to physical addresses, resulting in slower program execution.
  3. Coroutine Switching: Coroutine switching occurs at the application level and is explicitly managed by the coroutine library or the programmer. Coroutine switching does not require intervention from the operating system, making the switching cost very low. Coroutine switching typically only involves saving and restoring a small amount of context information, such as stack pointers and local variables. Since coroutine switching is cooperative, it requires explicitly yielding control to other coroutines, allowing better control over the overhead of coroutine switching.
    Performance Overhead: save the CPU register state of the current coroutine, then load the CPU register state of the coroutine being switched in. This is done entirely in user mode; a coroutine context switch generally takes at most tens of nanoseconds.

Coroutine switching is faster than thread switching for two main reasons:

  1. Coroutine switching occurs entirely in user space; thread switching involves switching to privileged mode and must be completed in kernel space;
  2. Coroutine switching does less work than thread switching, which additionally has to go through the kernel/user mode transitions and the system-call path.

Overall, process switching incurs the highest overhead, followed by thread switching, while coroutine switching has the lowest overhead. Coroutines, managed at the application level, can flexibly control the timing of switches to improve efficiency and concurrency performance. However, it is important to manage switch points reasonably in coroutines to avoid excessive switching that could lead to additional overhead.
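
To make the user-mode nature of coroutine switching concrete, here is a minimal sketch (an illustration, not taken from the original text) using the POSIX ucontext API: a switch is nothing more than saving one register context and loading another, entirely in user space.

```c
/* A minimal user-space context switch using the POSIX ucontext API
   (getcontext/makecontext/swapcontext). Both switches happen entirely
   in user mode: save one register context, load another. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;

static void coroutine_body(void) {
    printf("coroutine: step 1\n");
    swapcontext(&co_ctx, &main_ctx);     /* yield: save our context, resume main */
    printf("coroutine: step 2\n");
}                                        /* returning follows uc_link back to main */

int main(void) {
    static char stack[64 * 1024];        /* user-allocated stack for the coroutine */

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp   = stack;
    co_ctx.uc_stack.ss_size = sizeof stack;
    co_ctx.uc_link          = &main_ctx; /* where to go when the coroutine returns */
    makecontext(&co_ctx, coroutine_body, 0);

    swapcontext(&main_ctx, &co_ctx);     /* run the coroutine until it yields */
    printf("main: coroutine yielded\n");
    swapcontext(&main_ctx, &co_ctx);     /* resume it where it left off */
    printf("main: coroutine finished\n");
    return 0;
}
```

Production coroutine libraries typically replace ucontext with hand-written assembly for speed, but the principle is the same: no system call, no mode switch.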

How many threads can a process create at most?

In a 32-bit system, the user-mode virtual address space is only 3 GB. If each thread is created with a 10 MB stack, a process can create at most around 300 threads. In a 64-bit system, the user-mode virtual address space is as large as 128 TB, so in practice the number of threads is not limited by virtual memory but by system parameters and performance.
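
As a rough illustration of where the limit comes from, the sketch below (an informal experiment, not a benchmark) keeps creating threads with a 10 MB stack until pthread_create fails. On a 32-bit system the count should stall around 3 GB / 10 MB ≈ 300; on a 64-bit system it is normally stopped by system limits long before the address space runs out.

```c
/* Rough experiment: create threads with a 10 MB stack each until
   pthread_create fails, then report how many the process could hold.
   Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *idle_thread(void *arg) {
    (void)arg;
    pause();                                  /* keep the thread and its stack alive */
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 10 * 1024 * 1024);   /* 10 MB per-thread stack */

    int count = 0;
    pthread_t tid;
    while (pthread_create(&tid, &attr, idle_thread, NULL) == 0)
        count++;                              /* stops at the first EAGAIN/ENOMEM */

    printf("created %d threads before failure\n", count);
    return 0;
}
```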

Thread Crash, Process Crash

Generally, if a thread crashes due to illegal memory access, then the process will definitely crash. Why does the system allow the process to crash? This is mainly because, within a process, the address space of each thread is shared. Since it is shared, illegal access to an address by one thread can lead to memory uncertainty, which may affect other threads. Such operations are dangerous, and the operating system considers this likely to lead to a series of serious consequences, so it simply lets the entire process crash.

How does a process crash? What is the underlying mechanism? The answer is signals: the kernel sends the offending process a SIGSEGV signal, whose default action is to terminate the whole process.

Why does a thread crash not lead to a JVM process crash?

Because the JVM registers its own signal handler that intercepts SIGSEGV. For faults it knows how to handle, such as a null-pointer access or a thread stack overflow, it turns the signal into a Java exception (NullPointerException or StackOverflowError) instead of letting the process crash.
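
The underlying mechanism is ordinary signal handling. The sketch below is not the JVM's actual handler, only an illustration of the idea: a process can install its own SIGSEGV handler with sigaction() instead of accepting the default action of terminating.

```c
/* A process can install its own SIGSEGV handler instead of accepting the
   default action (terminate). Illustration of the mechanism only. */
#include <signal.h>
#include <unistd.h>

static void on_segv(int sig) {
    (void)sig;
    /* A real runtime would inspect the faulting address and either recover
       (e.g. turn it into an exception) or shut down cleanly. */
    write(STDOUT_FILENO, "caught SIGSEGV\n", 15);  /* async-signal-safe output */
    _exit(1);                                      /* returning would just re-fault */
}

int main(void) {
    struct sigaction sa = { 0 };
    sa.sa_handler = on_segv;
    sigaction(SIGSEGV, &sa, NULL);                 /* intercept the signal */

    volatile int *p = NULL;
    *p = 42;                                       /* illegal access -> SIGSEGV -> handler */
    return 0;
}
```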

What are the communication methods?#

What are the methods of inter-process communication?#

  • Pipes/Anonymous Pipes: Used for communication between related parent and child processes or sibling processes (a minimal pipe sketch appears after this list).
  • Named Pipes: Anonymous pipes have no name and can only be used for communication between related processes. To overcome this limitation, named pipes were introduced. Named pipes strictly follow First In First Out (FIFO). Named pipes exist as disk files and can facilitate communication between any two processes on the same machine.
  • Signals: Signals are a more complex form of communication used to notify the receiving process that an event has occurred.
  • Message Queues: A message queue is a linked list of messages with a specific format, kept in the kernel and identified by a message queue identifier. Both pipes and message queues deliver data first in, first out. Unlike pipes (anonymous pipes exist only in memory; named pipes exist as files on disk), a message queue lives in the kernel and is only truly deleted when the kernel restarts (i.e., the operating system reboots) or the queue is explicitly removed. Messages can also be queried selectively: they do not have to be read in FIFO order and can be read by message type, which is an advantage over named pipes. Message queues overcome the limitations of signals carrying little information and of pipes having a limited buffer size.
  • Semaphores: A semaphore is a counter used for access to shared data by multiple processes, aimed at synchronizing processes. This communication method is mainly used to solve synchronization-related issues and avoid race conditions.
  • Shared Memory: Allows multiple processes to access the same memory space, enabling different processes to see updates to shared memory data in real-time. This method requires some form of synchronization, such as mutexes and semaphores. It can be considered the most useful method of inter-process communication.
  • Sockets: This method is primarily used for communication between clients and servers over a network. A socket is the basic unit of network communication supporting TCP/IP, serving as an endpoint for bidirectional communication between processes on different hosts, essentially an agreement between the two parties to complete the communication process using relevant functions in the socket.
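
As a minimal sketch of the first item above (anonymous pipes), a parent and a child created by fork() communicate through a pipe set up beforehand; the message text is just an illustrative example.

```c
/* Anonymous pipe between a parent and its child: the pipe is created before
   fork(), the parent writes into one end and the child reads from the other. */
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];
    pipe(fd);                                  /* fd[0] = read end, fd[1] = write end */

    if (fork() == 0) {                         /* child: reads from the pipe */
        char buf[64];
        close(fd[1]);
        ssize_t n = read(fd[0], buf, sizeof buf - 1);
        buf[n > 0 ? n : 0] = '\0';
        printf("child received: %s\n", buf);
        _exit(0);
    }

    close(fd[0]);                              /* parent: writes to the pipe */
    const char *msg = "hello from parent";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);
    wait(NULL);
    return 0;
}
```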

Communication between multiple threads?#

Since multiple threads share the resources of a process, communication between threads focuses more on how to handle conflicts in resource usage.

Mutual Exclusion and Synchronization#

In a single-core CPU system, to create the illusion of multiple programs running simultaneously, the operating system typically uses time-slicing scheduling, allowing each process to execute for one time slice. When the time slice is exhausted, the next process runs. Due to the short duration of this time slice, a phenomenon of "concurrency" occurs. If multiple threads compete for shared resources without effective measures, it can lead to chaos in shared data.

Mutual Exclusion

The situation described above is called a race condition. When multiple threads compete to operate on a shared variable, a context switch can happen at just the wrong moment during execution and produce an incorrect result. In fact, each run may yield a different result, so the output is nondeterministic.
Since the code segment where multiple threads execute operations on shared variables may lead to a race condition, we refer to this segment as the critical section, which is a code snippet that accesses shared resources and must not be executed simultaneously by multiple threads.
We want this code to be mutually exclusive, meaning that when one thread is executing in the critical section, other threads should be prevented from entering the critical section. In simple terms, at most one thread can be executing during the execution of this code.

Synchronization

Mutual exclusion solves the problem of concurrent processes/threads accessing the critical section. This interaction based on critical sections is relatively simple; as soon as one process/thread enters the critical section, other processes/threads attempting to enter the critical section will be blocked until the first process/thread exits the critical section.
We know that in multithreading, each thread does not necessarily execute in order; they progress independently and unpredictably. However, sometimes we want multiple threads to closely cooperate to achieve a common task.
Synchronization refers to the situation where concurrent processes/threads may need to wait for each other and communicate at certain critical points. This mutual waiting and information exchange is called process/thread synchronization.

Implementation and Use of Mutual Exclusion and Synchronization#
Locks

Using lock and unlock operations can solve the mutual exclusion problem of concurrent threads/processes.
Any thread that wants to enter the critical section must first perform the lock operation. If the lock operation is successful, the thread can enter the critical section; after completing access to the critical resource, it performs the unlock operation to release that critical resource.
Depending on the implementation of the lock, it can be divided into "busy waiting locks" and "non-busy waiting locks."

Busy Waiting Locks

"Busy waiting locks" are based on atomic operation instructions—Test-and-Set instructions. When a thread cannot acquire the lock, it will continuously loop without doing anything, hence the term "busy waiting lock," also known as a spin lock.

Non-Busy Waiting Locks

When a thread cannot acquire the lock, it is placed in the lock's waiting queue, and the scheduler is executed to give the CPU to other threads.

Semaphores

A semaphore is a method provided by the operating system to coordinate access to shared resources.

Typically, a semaphore represents the quantity of resources, corresponding to an integer (sem) variable.

Additionally, there are two atomic operation system call functions to control the semaphore, namely:

  • P operation: Decreases sem by 1; if sem < 0 after subtraction, the process/thread enters a blocked waiting state; otherwise, it continues, indicating that the P operation may block;
  • V operation: Increases sem by 1; if sem <= 0 after addition, it wakes up a waiting process/thread, indicating that the V operation will not block;

The P operation is used before entering the critical section, and the V operation is used after leaving the critical section; these two operations must appear in pairs.
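
A user-level sketch of the two operations is shown below, built on a pthread mutex and condition variable (a teaching illustration, not how a kernel implements semaphores). The count may go negative, matching the description above; the wakeups counter ensures each V releases exactly one waiter.

```c
/* User-level sketch of P/V: sem may go negative (its absolute value is then
   the number of waiters); V hands out one "wakeup" per release. */
#include <pthread.h>

typedef struct {
    int sem;                 /* resource count; negative = number of waiters */
    int wakeups;             /* pending wakeups handed out by V              */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} semaphore_t;

void sema_init(semaphore_t *s, int value) {
    s->sem = value;
    s->wakeups = 0;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
}

void P(semaphore_t *s) {
    pthread_mutex_lock(&s->lock);
    s->sem--;
    if (s->sem < 0) {                        /* no resource left: block and wait */
        do {
            pthread_cond_wait(&s->cond, &s->lock);
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->lock);
}

void V(semaphore_t *s) {
    pthread_mutex_lock(&s->lock);
    s->sem++;
    if (s->sem <= 0) {                       /* someone is waiting: wake one up */
        s->wakeups++;
        pthread_cond_signal(&s->cond);
    }
    pthread_mutex_unlock(&s->lock);          /* V itself never blocks */
}
```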

How to use semaphores to achieve mutual exclusion access to the critical section

Set a semaphore s for each type of shared resource, with an initial value of 1, indicating that the critical resource is unoccupied. By placing the critical-section code between P(s) and V(s), mutual exclusion between processes/threads can be achieved:

At this point, any thread wanting to enter the critical section must first perform the P operation on the mutual exclusion semaphore, and perform the V operation after it finishes accessing the critical resource. Since the initial value of the mutual exclusion semaphore is 1, after the first thread performs the P operation, the value of s becomes 0, which means the critical resource was free; it is allocated to that thread, which enters the critical section.

If a second thread wants to enter the critical section at this time, it must also perform the P operation, resulting in s becoming negative, indicating that the critical resource is occupied, and thus the second thread is blocked.

Furthermore, only after the first thread performs the V operation to release the critical resource and restore the value of s to 0 will the second thread be awakened, allowing it to enter the critical section. After it completes access to the critical resource, it will perform the V operation, restoring s to its initial value of 1.

For two concurrent threads, the value of the mutual exclusion semaphore can only be 1, 0, or -1, representing:

  • If the mutual exclusion semaphore is 1, it indicates that no thread has entered the critical section;
  • If the mutual exclusion semaphore is 0, it indicates that one thread has entered the critical section;
  • If the mutual exclusion semaphore is -1, it indicates that one thread has entered the critical section while another thread is waiting to enter.

By using mutual exclusion semaphores, it can be ensured that only one thread is executing in the critical section at any given time, achieving mutual exclusion.

How to use semaphores to achieve event synchronization

The synchronization method is to set a semaphore with an initial value of 0. Taking the example of "eating-cooking" synchronization:
When the mother initially asks her son whether he wants to eat, she performs P(s1), which is equivalent to asking her son whether he needs to eat. Since s1's initial value is 0, it changes to -1, indicating that the son does not need to eat, so the mother thread enters a waiting state.

When the son gets hungry, he performs V(s1), changing the s1 semaphore from -1 to 0, indicating that the son needs to eat, thus waking up the blocked mother thread, who then starts cooking.

Next, the son thread performs P(s2), which is equivalent to asking the mother whether the food is ready. Since s2's initial value is 0, it changes to -1, indicating that the mother has not finished cooking, and the son thread enters a waiting state.

Finally, when the mother finishes cooking, she performs V(s2), changing the s2 semaphore from -1 back to 0, waking up the waiting son thread, who can then proceed to eat.
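
A sketch of this "eating-cooking" synchronization with POSIX semaphores is shown below (note that sem_wait simply blocks at 0 rather than letting the count go negative, but the behaviour matches the description above; the printed messages are illustrative):

```c
/* Two semaphores, both starting at 0: s1 = "the son is hungry",
   s2 = "the food is ready". */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

static sem_t s1;   /* signalled by the son when he is hungry */
static sem_t s2;   /* signalled by the mother when the food is ready */

static void *mother(void *arg) {
    (void)arg;
    sem_wait(&s1);                    /* P(s1): wait until the son is hungry */
    printf("mother: cooking\n");
    sem_post(&s2);                    /* V(s2): the food is ready */
    return NULL;
}

static void *son(void *arg) {
    (void)arg;
    sleep(1);
    printf("son: hungry\n");
    sem_post(&s1);                    /* V(s1): tell mother he is hungry */
    sem_wait(&s2);                    /* P(s2): wait for the food */
    printf("son: eating\n");
    return NULL;
}

int main(void) {
    pthread_t a, b;
    sem_init(&s1, 0, 0);
    sem_init(&s2, 0, 0);
    pthread_create(&a, NULL, mother, NULL);
    pthread_create(&b, NULL, son, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```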

Classic Synchronization Problems#

Producer-Consumer Problem#

The producer-consumer problem is described as follows:

  • The producer generates data and places it in a buffer;
  • The consumer retrieves data from the buffer for processing;
  • At any given time, only one producer or consumer can access the buffer;

Analyzing the problem, we can conclude:

  • At any given time, only one thread can operate on the buffer, indicating that operations on the buffer are critical code and require mutual exclusion;
  • When the buffer is empty, the consumer must wait for the producer to generate data; when the buffer is full, the producer must wait for the consumer to retrieve data, indicating that producers and consumers need synchronization.

Thus, we need three semaphores:

  • Mutual exclusion semaphore mutex: used for mutually exclusive access to the buffer, initialized to 1;
  • Resource semaphore fullBuffers: used for the consumer to inquire whether there is data in the buffer; initialized to 0 (indicating that the buffer is initially empty);
  • Resource semaphore emptyBuffers: used for the producer to inquire whether there is space in the buffer; initialized to n (the size of the buffer).
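
Putting the three semaphores together, here is a minimal sketch with POSIX semaphores; the ring-buffer size N and the item counts are arbitrary choices for illustration.

```c
/* Producer-consumer with a fixed-size ring buffer protected by three
   semaphores: mutex (1), emptyBuffers (N), fullBuffers (0). */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 8

static int buffer[N];
static int in = 0, out = 0;

static sem_t mutex;          /* mutual exclusion on the buffer, initial value 1 */
static sem_t emptyBuffers;   /* free slots, initial value N                     */
static sem_t fullBuffers;    /* filled slots, initial value 0                   */

static void *producer(void *arg) {
    (void)arg;
    for (int item = 0; item < 32; item++) {
        sem_wait(&emptyBuffers);         /* P(emptyBuffers): wait for a free slot */
        sem_wait(&mutex);                /* P(mutex): enter the critical section  */
        buffer[in] = item;
        in = (in + 1) % N;
        sem_post(&mutex);                /* V(mutex) */
        sem_post(&fullBuffers);          /* V(fullBuffers): one more item ready   */
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < 32; i++) {
        sem_wait(&fullBuffers);          /* P(fullBuffers): wait for an item */
        sem_wait(&mutex);
        int item = buffer[out];
        out = (out + 1) % N;
        sem_post(&mutex);
        sem_post(&emptyBuffers);         /* V(emptyBuffers): one more free slot */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&mutex, 0, 1);
    sem_init(&emptyBuffers, 0, N);
    sem_init(&fullBuffers, 0, 0);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```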

Dining Philosophers Problem#

Let's first look at the description of the dining philosophers problem:

  • 5 philosophers are idly sitting around a round table eating noodles;
  • Conveniently, there are only 5 forks on the table, with one fork placed between each pair of philosophers;
  • The philosophers sit and think, and when they get hungry in the middle of thinking, they want to eat;
  • Oddly, these philosophers require two forks to eat, meaning they need to pick up the forks on their left and right to eat;
  • After eating, they will put the two forks back in place and continue thinking;

An array state is used to record each philosopher's state, which is one of three: eating, thinking, or hungry (trying to pick up forks).

Thus, a philosopher can only enter the eating state when both neighbors are not eating.

The left and right neighbors of philosopher i are defined by macros LEFT and RIGHT:

  • LEFT : ( i + 5 - 1 ) % 5
  • RIGHT : ( i + 1 ) % 5

For example, if i is 2, then LEFT is 1, and RIGHT is 3.

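A sketch of this solution follows the state-array scheme above: one mutex protects state[], and a per-philosopher semaphore blocks a hungry philosopher until both neighbours have stopped eating (function names such as take_forks and put_forks are illustrative).

```c
/* Dining philosophers with a state array, LEFT/RIGHT macros, a mutex
   protecting state[], and one semaphore per philosopher. */
#include <pthread.h>
#include <semaphore.h>

#define N        5
#define LEFT     ((i + N - 1) % N)   /* left neighbour of philosopher i  */
#define RIGHT    ((i + 1) % N)       /* right neighbour of philosopher i */
#define THINKING 0
#define HUNGRY   1
#define EATING   2

static int   state[N];
static sem_t mutex;                  /* protects state[], initial value 1    */
static sem_t s[N];                   /* one per philosopher, initial value 0 */

static void test(int i) {
    /* philosopher i may eat only if hungry and neither neighbour is eating */
    if (state[i] == HUNGRY && state[LEFT] != EATING && state[RIGHT] != EATING) {
        state[i] = EATING;
        sem_post(&s[i]);             /* grant the forks (wake i if blocked) */
    }
}

static void take_forks(int i) {
    sem_wait(&mutex);
    state[i] = HUNGRY;
    test(i);                         /* try to take both forks */
    sem_post(&mutex);
    sem_wait(&s[i]);                 /* block here if the forks were not free */
}

static void put_forks(int i) {
    sem_wait(&mutex);
    state[i] = THINKING;
    test(LEFT);                      /* a neighbour may be able to eat now */
    test(RIGHT);
    sem_post(&mutex);
}

static void *philosopher(void *arg) {
    int i = *(int *)arg;
    for (int round = 0; round < 3; round++) {
        /* think ... */
        take_forks(i);
        /* eat ... */
        put_forks(i);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[N];
    int id[N];
    sem_init(&mutex, 0, 1);
    for (int i = 0; i < N; i++)
        sem_init(&s[i], 0, 0);
    for (int i = 0; i < N; i++) {
        id[i] = i;
        pthread_create(&tid[i], NULL, philosopher, &id[i]);
    }
    for (int i = 0; i < N; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```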

Readers-Writers Problem#

The readers-writers problem is described as follows:

  • "Read-Read" allowed: Multiple readers can read simultaneously at the same time.
  • "Read-Write" mutual exclusion: Readers can only read when there are no writers, and writers can only write when there are no readers.
  • "Write-Write" mutual exclusion: Writers can only write when there are no other writers.

Fairness strategy:

  • Equal priority;
  • Mutual exclusion access for writers and readers;
  • Only one writer can access the critical section;
  • Multiple readers can access the critical resource simultaneously.
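
One common way to implement this fair strategy is sketched below: a service semaphore (called flag here, an illustrative name) that both readers and writers must pass through, a counter of active readers, and a data semaphore for read-write and write-write exclusion. Initialize flag, rCountMutex, and wDataMutex to 1 with sem_init before use.

```c
/* Fair ("equal priority") readers-writers: both sides queue on flag first,
   so readers cannot starve writers and vice versa. */
#include <semaphore.h>

static sem_t flag;          /* fairness gate                          */
static sem_t rCountMutex;   /* protects rCount                        */
static sem_t wDataMutex;    /* read-write and write-write exclusion   */
static int   rCount = 0;    /* number of readers currently reading    */

void reader(void) {
    sem_wait(&flag);                 /* queue up fairly behind any waiting writer */
    sem_wait(&rCountMutex);
    if (rCount == 0)
        sem_wait(&wDataMutex);       /* first reader locks writers out */
    rCount++;
    sem_post(&rCountMutex);
    sem_post(&flag);

    /* ... read the shared data ... */

    sem_wait(&rCountMutex);
    rCount--;
    if (rCount == 0)
        sem_post(&wDataMutex);       /* last reader lets writers in again */
    sem_post(&rCountMutex);
}

void writer(void) {
    sem_wait(&flag);
    sem_wait(&wDataMutex);           /* exclusive access to the data */
    /* ... write the shared data ... */
    sem_post(&wDataMutex);
    sem_post(&flag);
}
```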

Deadlock#

Conditions for Deadlock#

Deadlock can only occur when all of the following four conditions are met:

  • Mutual exclusion condition;
  • Hold and wait condition;
  • No preemption condition;
  • Circular wait condition;

How to Diagnose Deadlock#

  • Java - jstack
  • C++ - Valgrind
  • Python - third-party libraries deadlock-detector and deadlocks
  • Go - go tool trace

How to Avoid Deadlock#

Use the resource ordering allocation method to break the circular wait condition.

So what is the resource ordering allocation method?

Threads A and B must acquire resources in the same order. When thread A first attempts to acquire resource A and then attempts to acquire resource B, thread B must also first attempt to acquire resource A and then attempt to acquire resource B. In other words, threads A and B always request their desired resources in the same order.

We can modify the previously deadlocked code using the resource ordering allocation method without changing thread A's code.

We first need to clarify the order in which thread A acquires resources; it first acquires mutex A and then mutex B.

Thus, we only need to change thread B to acquire resources in the same order to break the deadlock.
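
A minimal sketch of the ordered acquisition (mutexA and mutexB are illustrative names): both threads take mutexA first and mutexB second, so neither can hold one lock while waiting for the other in the opposite order.

```c
/* Resource ordering: every thread acquires mutexA before mutexB, which
   breaks the circular-wait condition for deadlock. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mutexA = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mutexB = PTHREAD_MUTEX_INITIALIZER;

static void *thread_a(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutexA);       /* 1st: A */
    pthread_mutex_lock(&mutexB);       /* 2nd: B */
    printf("thread A holds A and B\n");
    pthread_mutex_unlock(&mutexB);
    pthread_mutex_unlock(&mutexA);
    return NULL;
}

static void *thread_b(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutexA);       /* same order as thread A: A first... */
    pthread_mutex_lock(&mutexB);       /* ...then B, instead of B then A     */
    printf("thread B holds A and B\n");
    pthread_mutex_unlock(&mutexB);
    pthread_mutex_unlock(&mutexA);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```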

Optimistic Locking and Pessimistic Locking#

Pessimistic locking assumes that the probability of multiple threads modifying shared resources simultaneously is high, leading to conflicts. Therefore, it requires locking before accessing shared resources.

Optimistic locking assumes that the probability of conflicts is low. Its working method is: first modify the shared resource, then verify whether any conflicts have occurred during that time. If no other threads have modified the resource, the operation is completed. If it is found that other threads have modified the resource, the current operation is abandoned.
Example scenario: online documents. To implement simultaneous editing by multiple users, optimistic locking is used: multiple users may open the same document and edit it, and only when an edit is submitted is the modified content checked for conflicts.

What constitutes a conflict? For example, user A opens the document in the browser and starts editing; user B then opens the same document and also edits it, but submits before user A does, and A is unaware of this. When A later submits the modified content, the areas that A and B modified in parallel will conflict.

How does the server verify whether a conflict has occurred? The usual solution is as follows:

  • Since the probability of conflicts is low, users are allowed to edit the document first, but the browser records the document version number returned by the server when the document is downloaded;
  • When the user submits modifications, the request sent to the server includes the original document version number. The server compares it with the current version number: if they do not match, the submission fails; if they match, the modification succeeds and the server updates the version number to the latest one.
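
Server-side, the check can be as simple as the sketch below (a hypothetical document service; in a real system the compare-and-update would have to be atomic, for example inside one database transaction):

```c
/* Version-number check: a save is accepted only if the version the client
   started from is still the current one. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    long version;                    /* incremented on every accepted save */
    char content[4096];
} document_t;

static bool save_document(document_t *doc, long base_version, const char *new_content) {
    if (doc->version != base_version)
        return false;                              /* conflict: someone committed first */
    strncpy(doc->content, new_content, sizeof doc->content - 1);
    doc->content[sizeof doc->content - 1] = '\0';
    doc->version++;                                /* publish a new version */
    return true;
}

int main(void) {
    document_t doc = { .version = 7, .content = "original text" };
    bool first  = save_document(&doc, 7, "edit by user B");  /* accepted: 7 == 7       */
    bool second = save_document(&doc, 7, "edit by user A");  /* rejected: doc is now 8 */
    printf("B: %s, A: %s\n", first ? "ok" : "conflict", second ? "ok" : "conflict");
    return 0;
}
```
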
In practice, we commonly see that SVN and Git also use the idea of optimistic locking, allowing users to edit code first and then determining whether conflicts have occurred based on version numbers when submitting. In the event of a conflict, users need to modify the conflicting areas themselves and then resubmit.

Although optimistic locking eliminates the need for locking and unlocking operations, once a conflict occurs, the cost of retrying is very high. Therefore, it should only be considered in scenarios where the probability of conflicts is very low and the cost of locking is very high.
