1. Basic Concepts of HTTP#
1. What is HTTP?#
HTTP stands for HyperText Transfer Protocol.
The name "HyperText Transfer Protocol" can be broken down into three parts:
- HyperText: Text that transcends ordinary text; it is a mixture of text, images, videos, etc., and most importantly, it contains hyperlinks that can jump from one hypertext to another.
- Transfer: Bidirectional data transfer between two points.
- Protocol: Establishes a standard for communication between computers (involving two or more participants).
2. What are the common HTTP status codes?#
- 1xx status codes are informational and represent an intermediate state in protocol processing, and are rarely used in practice.
- 2xx status codes indicate that the server successfully processed the client's request, which is the status we most want to see.
"200 OK" is the most common success status code, indicating everything is normal. If it is not a HEAD request, the response headers returned by the server will include body data.
"204 No Content" is also a common success status code, which is essentially the same as 200 OK, but the response headers do not contain body data.
"206 Partial Content" is used for HTTP chunked downloads or resuming interrupted downloads, indicating that the body data returned in the response is not the entirety of the resource, but rather a part of it, and it is also a successful status from the server's perspective. - 3xx status codes indicate that the resource requested by the client has changed, requiring the client to resend the request with a new URL to obtain the resource, which is known as redirection.
"301 Moved Permanently" indicates a permanent redirection, meaning the requested resource no longer exists and must be accessed using a new URL.
"302 Found" indicates a temporary redirection, meaning the requested resource still exists but must be accessed using another URL temporarily.
Both 301 and 302 will use the Location field in the response headers to specify the URL to redirect to, and the browser will automatically redirect to the new URL.
"304 Not Modified" does not imply redirection; it indicates that the resource has not been modified, redirecting to an existing cached file, also known as cache redirection, which tells the client it can continue using the cached resource for cache control. - 4xx status codes indicate that there is an error in the message sent by the client, and the server cannot process it, which is the meaning of error codes.
"400 Bad Request" indicates that there is an error in the message sent by the client, but it is a general error.
"403 Forbidden" indicates that the server forbids access to the resource, and it is not an error in the client's request.
"404 Not Found" indicates that the requested resource does not exist or cannot be found on the server, so it cannot be provided to the client. - 5xx status codes indicate that the client's request message is correct, but an internal error occurred on the server during processing, which belongs to server-side error codes.
"500 Internal Server Error" is a general error code, similar to 400, indicating that we do not know what error occurred on the server.
"501 Not Implemented" indicates that the functionality requested by the client is not supported, similar to "coming soon, please stay tuned."
"502 Bad Gateway" is usually returned as an error code when the server acts as a gateway or proxy, indicating that the server itself is functioning normally, but there was an error accessing the backend server.
"503 Service Unavailable" indicates that the server is currently busy and cannot respond to the client temporarily, similar to "the network service is busy, please try again later."
3. What are the common HTTP fields?#
- Host: Used by the client to specify the server's domain name when sending a request.
- Content-Length: When the server returns data, it includes the Content-Length field, indicating the length of the body in this response. HTTP uses carriage return plus line feed (CRLF) as the boundary between header fields, and Content-Length as the boundary of the HTTP body; both conventions solve the "sticky packet" (message framing) problem.
- Connection: Most commonly used by the client to ask the server to keep the TCP connection open for reuse by later requests ("HTTP persistent connection"). HTTP/1.1 uses persistent connections by default, but for compatibility with older HTTP versions the client may still explicitly send Connection: Keep-Alive.
- Content-Type: Used by the server to tell the client the format of the data in the response.
- Content-Encoding: Indicates the compression method applied to the data, i.e., which compression format the server's returned data uses, such as gzip.
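To make these fields concrete, here is a toy parser for a raw HTTP/1.1 response (a sketch for illustration only; real clients should use an HTTP library):

```python
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 5\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
    b"hello"
)

# Headers end at the blank line (CRLF CRLF); the body follows.
head, _, body = raw.partition(b"\r\n\r\n")
status_line, *header_lines = head.decode("ascii").split("\r\n")
headers = dict(line.split(": ", 1) for line in header_lines)

# Content-Length tells the receiver exactly where the body ends,
# which is how HTTP frames messages on a TCP byte stream.
assert len(body) == int(headers["Content-Length"])
print(status_line, headers, body)
```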
4. GET vs POST#
- The GET method is safe and idempotent because it is a "read-only" operation; no matter how many times the operation is performed, the data on the server remains safe, and the result is the same each time. Therefore, GET request data can be cached, which can be done on the browser itself (completely avoiding browser requests) or on a proxy (like nginx), and GET requests can be saved as bookmarks in the browser.
- POST, being an operation that "adds or submits data," modifies resources on the server, so it is not safe, and multiple submissions of data will create multiple resources, making it not idempotent. Therefore, browsers generally do not cache POST requests and cannot save POST requests as bookmarks.
5. HTTP Caching#
1. Strong Caching#
Strong caching is implemented using the following two HTTP response header fields, which indicate the validity period of the resource in the client's cache:
- Cache-Control, which specifies a relative time (e.g., max-age=600);
- Expires, which specifies an absolute time.
Cache-Control takes precedence over Expires.
The process is as follows:
- When the browser first requests a server resource, the server will return this resource while adding Cache-Control in the Response headers, setting the expiration time.
- When the browser requests the same resource again, it will first compare the request time with the expiration time set in Cache-Control to determine if the resource has expired; if not, it will use the cached version; otherwise, it will request the server again.
- When the server receives the request again, it will update the Cache-Control in the Response headers.
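The browser-side freshness check in the second step can be sketched like this (a toy parser handling only the max-age directive; the timestamps are hypothetical):

```python
def is_fresh(response_time: float, cache_control: str, now: float) -> bool:
    """Return True if the cached copy is still within its max-age."""
    for directive in cache_control.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            max_age = int(directive.split("=", 1)[1])
            return (now - response_time) < max_age
    return False  # no max-age directive: fall back to re-requesting

t0 = 1_000_000.0  # time the response was cached (hypothetical)
print(is_fresh(t0, "public, max-age=600", t0 + 300))  # within 600 s: use cache
print(is_fresh(t0, "public, max-age=600", t0 + 900))  # expired: request again
```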
2. Negotiated Caching#
When the response code is 304, it tells the browser that it can use the locally cached resource. This method of informing the client whether it can use the cache is known as negotiated caching.
The first method: The If-Modified-Since field in the request header and the Last-Modified field in the response header work together, meaning:
- The Last-Modified in the response header indicates the last modification time of the resource;
- The If-Modified-Since in the request header: when the cached resource has expired and the earlier response carried a Last-Modified header, the next request includes that time in If-Modified-Since. On receiving it, the server compares it with the resource's current last modification time. If the resource was modified after the If-Modified-Since time, it has changed, so the server returns the latest resource with HTTP 200 OK; if it was not modified after that time, the resource is unchanged, and the server responds with HTTP 304 so the client uses its cache.
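The server-side comparison can be sketched as follows (the resource's modification time is hypothetical; `email.utils` handles the HTTP date format):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical last modification time of the requested resource.
LAST_MODIFIED = datetime(2024, 1, 1, tzinfo=timezone.utc)

def respond(if_modified_since):
    """Return 304 if the client's copy is up to date, else 200."""
    if if_modified_since is not None:
        client_time = parsedate_to_datetime(if_modified_since)
        if LAST_MODIFIED <= client_time:
            return 304  # not modified since then: reuse the cache
    return 200          # modified (or no validator): send the full resource

print(respond(None))                            # first request: 200
print(respond(format_datetime(LAST_MODIFIED)))  # revalidation: 304
```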
The second method: The If-None-Match field in the request header and the ETag field in the response header work together, meaning:
- The ETag in the response header uniquely identifies the response resource;
- The If-None-Match in the request header: When the resource has expired, and the browser finds an ETag in the response header, it will include the If-None-Match value as the ETag value when making the request to the server again. When the server receives the request, it will compare; if the resource has not changed, it will return 304, and if the resource has changed, it will return 200.
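A sketch of the ETag exchange; how the ETag is computed is up to the server, and a content hash, as used here, is one common choice (our own convention, not mandated by the spec):

```python
import hashlib

def etag_for(body: bytes) -> str:
    # One possible ETag scheme: a truncated hash of the content.
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]

body = b"<html>hello</html>"
current_etag = etag_for(body)

def respond(if_none_match):
    """304 when the client's ETag still matches, else 200 with the body."""
    return 304 if if_none_match == current_etag else 200

print(respond(current_etag))  # unchanged: client reuses its cache
print(respond('"stale"'))     # content changed: full response
```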
If both the ETag (via If-None-Match) and the Last-Modified time (via If-Modified-Since) are sent to the server, the ETag takes precedence.
Why does ETag take precedence? This is because ETag can solve several difficult problems that Last-Modified cannot:
- The last modification time of a file may change without modifying the file content, which can lead the client to believe the file has been changed, prompting a new request;
- Some files are modified more than once within a single second; If-Modified-Since has only one-second granularity and cannot detect such changes, whereas ETag changes whenever the content changes, so even sub-second updates are detected;
- Some servers cannot accurately obtain the last modification time of a file.
2. Differences Between HTTP and HTTPS#
1. What are the differences between HTTP and HTTPS?#
- HTTP is the HyperText Transfer Protocol, where information is transmitted in plaintext, posing security risks. HTTPS addresses the security flaws of HTTP by adding the SSL/TLS security protocol between TCP and HTTP (i.e., between the transport layer and the application layer), allowing messages to be transmitted securely.
- Establishing an HTTP connection is relatively simple; after the TCP three-way handshake, HTTP message transmission can occur. In contrast, HTTPS requires an additional SSL/TLS handshake process after the TCP three-way handshake before entering encrypted message transmission.
- The default ports for the two are different; the default port for HTTP is 80, while for HTTPS it is 443.
- The HTTPS protocol requires a digital certificate to be applied for from a CA (Certificate Authority) to ensure the server's identity is trustworthy.
2. What problems does HTTPS solve for HTTP?#
Due to plaintext transmission, HTTP has the following three security risks:
- Eavesdropping risk, where communication content can be intercepted on the communication link, making user information vulnerable.
- Tampering risk, where unwanted advertisements can be forcibly injected, causing visual pollution and making it difficult for users to see.
- Impersonation risk, where a fake website can impersonate a legitimate one, leading to potential financial loss for users.
How does HTTPS address the above three risks?
- It achieves confidentiality through a mixed encryption method, addressing the eavesdropping risk.
- It ensures integrity through a hashing algorithm, generating a unique "fingerprint" for the data, which is used to verify data integrity, addressing the tampering risk.
- It places the server's public key into a digital certificate, addressing the impersonation risk.
Mixed Encryption#
HTTPS employs a "mixed encryption" method that combines symmetric and asymmetric encryption:
- Asymmetric encryption is used to exchange the "session key" before communication begins, after which asymmetric encryption is no longer used.
- During communication, symmetric encryption using the "session key" is used to encrypt plaintext data.
The reason for using "mixed encryption":
- Symmetric encryption uses only one key, which is fast but must be kept secret, making secure key exchange difficult.
- Asymmetric encryption uses two keys: a public key that can be distributed freely and a private key that must be kept secret, solving the key exchange problem but being slower.
Hashing Algorithm + Digital Signature#
To ensure that the transmitted content is not tampered with, we need to calculate a "fingerprint" for the content and then transmit it along with the content.
Upon receiving it, the recipient calculates a "fingerprint" for the content and compares it with the "fingerprint" sent by the sender; if they match, it indicates the content has not been tampered with; otherwise, it indicates tampering has occurred.
In computing, a hashing algorithm (hash function) is used to compute the hash value of the content, which serves as the content's "fingerprint." This hash value is unique and cannot be used to derive the original content.
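The fingerprint check can be demonstrated in a few lines (using SHA-256 as the hash; any cryptographic hash works the same way):

```python
import hashlib

content = b"important message"
fingerprint = hashlib.sha256(content).hexdigest()  # the content's "fingerprint"

# The receiver recomputes the hash over what it received and compares.
received = b"important message"
assert hashlib.sha256(received).hexdigest() == fingerprint  # intact

tampered = b"important massage"
assert hashlib.sha256(tampered).hexdigest() != fingerprint  # tampering detected
```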
While hashing ensures that the content has not been tampered with, it does not guarantee that the "content + hash value" has not been replaced by a man-in-the-middle, as there is no proof of whether the message received by the client originated from the server.
To avoid this situation, asymmetric encryption algorithms are used, which involve two keys:
- One is the public key, which can be shared with everyone;
- The other is the private key, which must be managed by the owner and kept secret.
- Public key encryption and private key decryption ensure the security of content transmission, as content encrypted with the public key cannot be decrypted by others; only the holder of the private key can decrypt it to obtain the actual content.
- Private key encryption and public key decryption ensure that messages cannot be impersonated, as the private key is not disclosed. If the public key can successfully decrypt content encrypted with the private key, it proves that the message originated from the holder of the private key.
Thus, the primary purpose of asymmetric encryption here is not confidentiality but confirming the identity of the message's sender, through "private key encryption, public key decryption." Common digital signature algorithms work this way, except that what is signed is not the content itself but the hash value of the content.
Digital Certificate#
What if the identity verification step is missing, and the public key is forged?
Through an authoritative CA (Certificate Authority), the server's public key is placed in a digital certificate (issued by the CA). As long as the certificate is trusted, the public key is also trusted.
3. How is a connection established in HTTPS? What interactions occur?#
The "handshake phase" of TLS involves four communications, using different key exchange algorithms, and the TLS handshake process may vary. Currently, two commonly used key exchange algorithms are RSA and ECDHE.
The detailed process of establishing a TLS protocol is as follows:
- ClientHello
First, the client initiates an encrypted communication request to the server, known as the ClientHello request.
At this step, the client primarily sends the following information to the server:
(1) The TLS protocol version supported by the client, such as TLS 1.2.
(2) A random number generated by the client (Client Random), which will be used as one of the conditions for generating the "session key."
(3) A list of cipher suites supported by the client, such as the RSA encryption algorithm.
- ServerHello
After receiving the client's request, the server responds with a ServerHello message. The server's response includes:
(1) Confirmation of the TLS protocol version; if the browser does not support it, encrypted communication will be terminated.
(2) A random number generated by the server (Server Random), which will also be used as one of the conditions for generating the "session key."
(3) A confirmed list of cipher suites, such as the RSA encryption algorithm.
(4) The server's digital certificate.
- Client Response
After receiving the server's response, the client first verifies the authenticity of the server's digital certificate using the CA public key from the browser or operating system.
If the certificate is valid, the client extracts the server's public key from the digital certificate, encrypts a message using it, and sends the following information to the server:
(1) A random number (pre-master key). This random number will be encrypted with the server's public key.
(2) A notification that the encryption communication algorithm has changed, indicating that subsequent information will be encrypted using the "session key."
(3) A notification that the client handshake has ended, indicating that the client's handshake phase is complete. This also includes a summary of all previously sent data for the server to verify.
The first item, the random number, is the third random number in the entire handshake phase, which will be sent to the server, so both the client and server will have the same random number.
With these three random numbers (Client Random, Server Random, pre-master key), the server and client use the agreed encryption algorithm to generate the "session key" for this communication.
- Final Server Response
After receiving the client's third random number (pre-master key), the server calculates the "session key" for this communication using the agreed encryption algorithm.
Then, the server sends the final information to the client:
(1) A notification that the encryption communication algorithm has changed, indicating that subsequent information will be encrypted using the "session key."
(2) A notification that the server handshake has ended, indicating that the server's handshake phase is complete. This also includes a summary of all previously sent data for the client to verify.
At this point, the entire TLS handshake phase is complete. The client and server will then enter encrypted communication, which will use the ordinary HTTP protocol, but the content will be encrypted using the "session key."
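The key derivation at the end of the handshake can be sketched with an HMAC standing in for the TLS PRF (the real PRF is specified in RFC 5246; the byte strings here are placeholders):

```python
import hashlib
import hmac

def derive_session_key(client_random, server_random, pre_master):
    # Toy stand-in for the TLS PRF: both sides feed in the same three
    # values, so both sides independently derive the same session key.
    return hmac.new(pre_master, client_random + server_random,
                    hashlib.sha256).digest()

client_random = b"client-random"   # sent in ClientHello
server_random = b"server-random"   # sent in ServerHello
pre_master = b"pre-master-secret"  # sent encrypted with the server's public key

client_key = derive_session_key(client_random, server_random, pre_master)
server_key = derive_session_key(client_random, server_random, pre_master)
assert client_key == server_key  # both ends now share one symmetric key
```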
RSA Algorithm#
The connection process described above is the process of establishing a connection using the RSA algorithm, which ensures secure communication between the client and server but still has flaws:
The main issue with using the RSA key negotiation algorithm is that it does not support forward secrecy.
When the client transmits a random number (one of the conditions for generating the symmetric encryption key) to the server, it is encrypted with the public key. The server, upon receiving it, decrypts it with the private key. Therefore, if the server's private key is compromised, all previously intercepted TLS communication ciphertext can be decrypted.
To address this issue, the ECDHE key negotiation algorithm was introduced, which is now used by most websites.
ECDHE Algorithm#
The ECDHE key negotiation algorithm is an evolution of the DH algorithm, so we will start with the DH algorithm.
DH Algorithm#
The DH algorithm is an asymmetric (public-key) algorithm used for key exchange. The core mathematical concept behind it is the discrete logarithm.
Discrete logarithms are logarithms under "modular arithmetic," i.e., taking the remainder, which corresponds to the "%" operator in programming languages and can also be written as "mod." Concretely, given a public base a and modulus p, if b = a^i mod p, then i is the discrete logarithm of b.
Here the base a and modulus p are public parameters, b is the "true number" (the result), and i is the logarithm. Knowing the logarithm i, it is easy to compute b from the formula above; but knowing b, it is very difficult to deduce i.
Especially when the modulus p is a large prime, even knowing the base a and the true number b, it is practically impossible to compute the discrete logarithm with current computing power. This is the mathematical basis of the DH algorithm.
In the DH exchange, a and b are each party's private keys, A = G^a mod P and B = G^b mod P are their respective public keys, G and P are public parameters, and K = B^a mod P = A^b mod P is the symmetric key shared by Xiao Hong and Xiao Ming, which can serve as the session key.
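A toy DH run in Python (the tiny P and G are for illustration only; real deployments use 2048-bit or larger groups):

```python
P, G = 23, 5       # public modulus and base (illustratively small)

a, b = 6, 15       # each side's private key (randomly generated in practice)
A = pow(G, a, P)   # Xiao Hong's public key: G^a mod P
B = pow(G, b, P)   # Xiao Ming's public key: G^b mod P

# Each side raises the peer's public key to its own private key.
K_hong = pow(B, a, P)    # (G^b)^a mod P
K_ming = pow(A, b, P)    # (G^a)^b mod P
assert K_hong == K_ming  # same shared secret: usable as the session key
```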
DHE Algorithm#
E stands for ephemeral (temporary), meaning that both parties' private keys are randomly generated and temporary for each key exchange communication.
Thus, even if a powerful hacker decrypts the private key of a particular communication process, the private keys of other communication processes remain secure, as each communication process's private key is independent, ensuring "forward secrecy."
ECDHE Algorithm#
Because the DHE algorithm is computationally expensive — it requires many large-number modular exponentiations — the ECDHE algorithm emerged and is now widely used for key exchange.
The ECDHE algorithm utilizes the properties of elliptic curves to compute public keys and the final session key with less computational effort.
The process of Xiao Hong and Xiao Ming using the ECDHE key exchange algorithm is as follows:
- Both parties agree on which elliptic curve to use and the base point G on the curve; these two parameters are public.
- Each party randomly generates a random number as their private key d and multiplies it with the base point G to obtain their public key Q (Q = dG); at this point, Xiao Hong's public and private keys are Q1 and d1, while Xiao Ming's are Q2 and d2.
- They exchange their public keys, and finally, Xiao Hong calculates the point (x1, y1) = d1Q2, while Xiao Ming calculates (x2, y2) = d2Q1. Since multiplication on the elliptic curve satisfies the commutative and associative laws, d1Q2 = d1d2G = d2d1G = d2Q1, thus both parties have the same x-coordinate, which serves as the shared key, or session key.
In this process, both parties' private keys are randomly and temporarily generated and are not public. Even with public information (elliptic curve, public key, base point G), it is still very difficult to compute the discrete logarithm on the elliptic curve (private key).
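The three steps above can be sketched with a toy curve (y² = x³ + 2x + 2 over GF(17) with base point G = (5, 1) — textbook parameters, far too small for real use; TLS uses curves such as x25519):

```python
# Toy elliptic curve: y^2 = x^3 + 2x + 2 (mod 17), base point G = (5, 1).
P_MOD, A_COEF = 17, 2
G = (5, 1)

def point_add(p, q):
    """Add two points on the curve (None is the point at infinity)."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) = point at infinity
    if p == q:  # point doubling uses the tangent slope
        m = (3 * x1 * x1 + A_COEF) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (m * m - x1 - x2) % P_MOD
    return (x3, (m * (x1 - x3) - y1) % P_MOD)

def scalar_mult(d, point):
    """Compute d * point by double-and-add."""
    result = None
    while d:
        if d & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        d >>= 1
    return result

d1, d2 = 3, 7                                    # private keys (random in practice)
Q1, Q2 = scalar_mult(d1, G), scalar_mult(d2, G)  # exchanged public keys

# d1*Q2 = d1*d2*G = d2*d1*G = d2*Q1: both sides land on the same point,
# whose x-coordinate serves as the shared key.
assert scalar_mult(d1, Q2) == scalar_mult(d2, Q1)
```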
Handshake Process#
First TLS Handshake
The client first sends a "Client Hello" message, which includes the TLS version number used by the client, the list of supported cipher suites, and the generated random number (Client Random).
Second TLS Handshake
The server receives the client's "hello" and responds with a "Server Hello" message, which includes the TLS version number confirmed by the server and a random number (Server Random), then selects an appropriate cipher suite from the client's list.
Next, to prove its identity, the server sends a "Certificate" message, which includes the certificate to the client.
This step differs significantly from the RSA handshake process because the server has chosen the ECDHE key negotiation algorithm, so after sending the certificate, it sends a "Server Key Exchange" message.
In this process, the server does three things:
- It selects an elliptic curve named x25519, and the base point for the elliptic curve is also determined, which will be public to the client.
- It generates a random number as the server's elliptic curve private key, which is kept locally.
- It calculates the server's elliptic curve public key based on the base point G and the private key, which will be made public to the client.
To ensure that this elliptic curve public key is not tampered with by a third party, the server will sign the server's elliptic curve public key using the RSA signature algorithm.
Subsequently, a "Server Hello Done" message is sent, indicating to the client, "This is the information I provide, and the greeting is complete."
Third TLS Handshake
After the client receives the server's certificate, it must verify its legality. If the certificate is valid, the server's identity is confirmed. The verification process will follow the certificate chain step by step to confirm the authenticity of the certificate, and then use the public key of the certificate to verify the signature, thus confirming the server's identity. Once confirmed, the process can continue.
The client generates a random number as the client's elliptic curve private key, then generates the client's elliptic curve public key based on the information provided by the server, and sends it to the server using a "Client Key Exchange" message.
The final session key is generated using the "client random number + server random number + x (shared key calculated by the ECDHE algorithm)."
After calculating the session key, the client sends a "Change Cipher Spec" message to inform the server that subsequent communications will use symmetric encryption.
Next, the client sends an "Encrypted Handshake Message," which summarizes the previously sent data and encrypts it with the symmetric key for the server to verify whether the generated symmetric key can be used normally.
Fourth TLS Handshake
Finally, the server will perform the same operation, sending "Change Cipher Spec" and "Encrypted Handshake Message." If both parties verify that encryption and decryption are functioning correctly, the handshake is officially complete. Thus, encrypted HTTP requests and responses can be sent and received normally.
4. How is the integrity of application data in HTTPS ensured?#
TLS is implemented in two layers: the handshake protocol and the record protocol:
- The TLS handshake protocol is the four handshake processes we discussed earlier, responsible for negotiating encryption algorithms and generating symmetric keys, which are then used to protect application data (i.e., HTTP data);
- The TLS record protocol is responsible for protecting application data and verifying its integrity and origin, so the encryption of HTTP data is handled using the record protocol.
The TLS record protocol primarily handles the compression, encryption, and authentication of messages (HTTP data), as illustrated in the following diagram:
The specific process is as follows:
- First, the message is divided into multiple shorter segments, and each segment is compressed separately.
- Next, a message authentication code (a MAC value, generated with a hashing algorithm) is appended to each compressed segment, ensuring integrity and authenticating the data: any tampering can be detected by checking the MAC. In addition, to prevent replay attacks, the segment's sequence number is included when calculating the MAC.
- Subsequently, the compressed segments along with the MAC value are encrypted using symmetric encryption.
- Finally, the encrypted data is combined with a header composed of the data type, version number, and compressed length to form the final message data.
After the record protocol completes, the final message data is handed to the Transmission Control Protocol (TCP) layer for transmission.
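The MAC step can be sketched as follows (compression and the final symmetric encryption are omitted; the key and message are placeholders):

```python
import hashlib
import hmac

MAC_KEY = b"negotiated-mac-key"  # placeholder: derived during the handshake

def protect(segment: bytes, seq: int) -> bytes:
    # The MAC covers the sequence number too, so a replayed old record
    # fails verification under the current sequence number.
    mac = hmac.new(MAC_KEY, seq.to_bytes(8, "big") + segment,
                   hashlib.sha256).digest()
    return segment + mac  # in real TLS, (segment + MAC) is then encrypted

def verify(record: bytes, seq: int) -> bytes:
    segment, mac = record[:-32], record[-32:]
    expected = hmac.new(MAC_KEY, seq.to_bytes(8, "big") + segment,
                        hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("tampered or replayed record")
    return segment

record = protect(b"GET / HTTP/1.1", seq=0)
assert verify(record, seq=0) == b"GET / HTTP/1.1"
```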
5. Is HTTPS secure against man-in-the-middle attacks?#
Is HTTPS always secure?
When a client initiates an HTTPS request through a browser, it may be redirected to a "man-in-the-middle server" by a "fake base station," so the client completes the TLS handshake with the "man-in-the-middle server," which then completes the TLS handshake with the actual server.
Conclusion: During the TLS handshake process, the man-in-the-middle server sends a forged certificate to the browser, which the browser (client) can recognize as invalid, prompting a warning about the certificate issue.
If the user accepts the man-in-the-middle server's certificate, the man-in-the-middle can decrypt the data in the HTTPS request initiated by the browser and the HTTPS response data sent from the server to the browser. Essentially, the man-in-the-middle can "eavesdrop" on the data exchanged between the browser and the server.
Thus, a successful man-in-the-middle attack fundamentally exploits the client side (the user clicking through the certificate warning, or a forged root certificate having been maliciously installed) rather than any known vulnerability in the HTTPS protocol itself; it does not mean HTTPS is insecure.
3. Evolution of HTTP/1.1, HTTP/2, and HTTP/3#
1. What performance improvements does HTTP/1.1 offer over HTTP/1.0?#
- Connection Method: HTTP/1.0 uses short connections, while HTTP/1.1 supports persistent connections.
- Status Response Codes: HTTP/1.1 introduced a large number of new status codes, including 24 new error response status codes. For example: 100 (Continue) — a preliminary request before sending a large request body; 206 (Partial Content) — the marker of a range request; 409 (Conflict) — the request conflicts with the current state of the resource; 410 (Gone) — the resource is permanently gone, with no forwarding address known.
- Caching Mechanism: In HTTP/1.0, caching was primarily determined using If-Modified-Since and Expires in the headers, while HTTP/1.1 introduced more caching control strategies such as Entity tag, If-Unmodified-Since, If-Match, If-None-Match, and more options for controlling caching strategies.
- Bandwidth: HTTP/1.0 exhibited some bandwidth-wasting phenomena, such as when the client only needed part of an object, but the server sent the entire object and did not support resuming downloads. HTTP/1.1 introduced the range header field in the request header, allowing for requests for only a portion of the resource, with the return code being 206 (Partial Content), facilitating developers' choices to fully utilize bandwidth and connections.
- Host Header Processing: HTTP/1.1 introduced the Host header field, allowing multiple domain names to be hosted on the same IP address, thus supporting virtual hosting. HTTP/1.0 did not have the Host header field and could not implement virtual hosting.
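Virtual hosting is easy to picture: the server inspects the Host header to decide which site's content to serve (the domain names below are hypothetical):

```python
# One IP address, several sites: the Host header picks the site.
SITES = {
    "a.example": b"site A homepage",
    "b.example": b"site B homepage",
}

def route(request_headers):
    return SITES[request_headers["Host"]]

print(route({"Host": "a.example"}))
print(route({"Host": "b.example"}))
```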
2. What performance improvements does HTTP/2.0 offer over HTTP/1.1?#
- IO Multiplexing: HTTP/2.0 can transmit multiple requests and responses concurrently over the same connection (an upgrade of HTTP/1.1's persistent connection). In contrast, HTTP/1.1 handles requests serially on each connection, so browsers open several TCP connections to gain parallelism. This makes HTTP/2.0 more efficient in handling multiple requests, reducing network latency and improving performance.
- Binary Frames: HTTP/2.0 uses binary frames for data transmission, while HTTP/1.1 uses text format messages. Binary frames are more compact and efficient, reducing the amount of data transmitted and bandwidth consumption.
- Header Compression: HTTP/1.1 supports body compression but does not support header compression. HTTP/2.0 supports header compression, reducing network overhead.
- Server Push: HTTP/2.0 supports server push, allowing the server to push related resources to the client when a client requests a resource, thereby reducing the number of requests and latency. In contrast, HTTP/1.1 requires the client to send requests to retrieve related resources.
3. What optimizations does HTTP/3 implement?#
- Transport Protocol: HTTP/2.0 is based on the TCP protocol, while HTTP/3.0 introduces the QUIC (Quick UDP Internet Connections) protocol for reliable transmission, providing security comparable to TLS/SSL, with lower connection and transmission latency. You can think of QUIC as an upgraded version of UDP, adding many features such as encryption and retransmission. HTTP/3.0 was previously known as HTTP-over-QUIC, indicating that the biggest change in HTTP/3 is the use of QUIC.
- Connection Establishment: HTTP/2.0 needs the TCP three-way handshake plus the TLS handshake before data can flow (generally about 3 RTTs in total). Thanks to the QUIC protocol's design, HTTP/3.0 combines the transport and cryptographic handshakes and, when resuming a previous connection, can even send data in the first flight (0-RTT, zero round-trip time).
- Head-of-Line Blocking: HTTP/2.0 multiplexes multiple requests over a single TCP connection; if a packet is lost, it blocks all HTTP requests. The QUIC protocol's characteristics allow HTTP/3.0 to mitigate head-of-line blocking issues to some extent (Head-of-Line blocking, abbreviated as HOL blocking). A connection can establish multiple independent data streams, meaning that if one data stream experiences packet loss, it does not affect the others (essentially multiplexing + polling).
- Error Recovery: HTTP/3.0 has a better error recovery mechanism, allowing for faster recovery and retransmission when network issues such as packet loss or latency occur. In contrast, HTTP/2.0 relies on TCP's error recovery mechanism.
- Security: Both HTTP/2.0 and HTTP/3.0 have high security requirements and support encrypted communication, but they differ in implementation. HTTP/2.0 uses the TLS protocol for encryption, while HTTP/3.0 is based on the QUIC protocol, which includes built-in encryption and authentication mechanisms, providing stronger security.