Mechanics of HTTP

When a client requests a web page from a web server, it must speak its request in the agreed-upon language: HTTP. As a consumer of the web, you don't normally see HTTP messages. Your browser composes the requests and parses the server's responses behind the scenes. However, some knowledge of HTTP is helpful as you navigate the terrain of web development. Let's have a look at what your browser is doing when you type an address in the location bar.

HTTP messages are written in plain text. Each request starts with an HTTP command, also called a method. The command GET, for example, is used to retrieve a web page. A GET request looks like this:

GET /index.html HTTP/1.1\r\n
Host: twodee.org\r\n
\r\n

The first line of the message has this grammatical structure: <command> <path> <version>. The lines below it are called headers. They are key-value pairs used to qualify the request. In this case, we add a Host header to identify which site has the web page we wish to download. HTTP/1.1 requires this header because a single server may host several sites.

The message has a blank line at the end. This blank line is required. It signals the end of the headers and, since a GET request has no body, the end of the request. Each line ends with a linebreak sequence, which is a carriage return followed by a linefeed. Note that this sequence is not the linebreak sequence native to every operating system. Linux and macOS, for example, end lines with only \n.
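The structure described above can be sketched in a few lines of Ruby. This is only an illustration: the helper name build_request is hypothetical and not part of any library. It joins the request line and the headers with \r\n and appends the blank line that ends the request.

```ruby
# Hypothetical helper: assembles a plain-text HTTP request.
CRLF = "\r\n"

def build_request(command, path, headers, version = 'HTTP/1.1')
  # The first line follows the grammar <command> <path> <version>.
  lines = ["#{command} #{path} #{version}"]
  # Each header is a key-value pair on its own line.
  headers.each { |key, value| lines << "#{key}: #{value}" }
  # Every line ends in CRLF; the extra CRLF is the blank line that ends the request.
  lines.join(CRLF) + CRLF + CRLF
end

request = build_request('GET', '/index.html', { 'Host' => 'twodee.org' })
p request
# prints "GET /index.html HTTP/1.1\r\nHost: twodee.org\r\n\r\n"
```

Printing with p rather than puts makes the otherwise invisible \r\n sequences show up in the output.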

When we type http://twodee.org/index.html in the browser, it opens a socket that connects to port 80 of the server twodee.org and sends this message through it. You can recreate this behavior using the telnet utility. Run this command in your shell:

telnet twodee.org 80

In the prompt that appears, type in the lines from the GET request above. Leave out the \r\n sequences; these will be inserted automatically when you press Enter or Return. Press it once more on an empty line to send the blank line that ends the request.

If you don't have access to telnet, you can use a programming language that supports sockets. Here's a Ruby script that opens the socket, sends the message, and then prints out the response line by line:

require 'socket'

socket = TCPSocket.open('twodee.org', 80)
socket.write("GET /index.html HTTP/1.1\r\n")
socket.write("Host: twodee.org\r\n")
socket.write("\r\n")

while line = socket.gets
  puts line
end

However you send this message, you see this response from the server:

HTTP/1.1 200 OK
Date: Tue, 22 Jun 2021 20:18:28 GMT
Server: Apache/2.4.41 (Ubuntu)
Last-Modified: Tue, 04 Feb 2020 14:25:57 GMT
ETag: "72-59dc0d17a95ca"
Accept-Ranges: bytes
Content-Length: 114
Vary: Accept-Encoding
Content-Type: text/html

<!DOCTYPE html>
<html>
<head>
<title>Ack</title>
</head>
<body>
Hello, client! It's me, server.
</body>
</html>

The first line reports the HTTP status. Value 200 means that the request was valid and the server was able to satisfy it. Several headers follow. From these you learn the server software is Apache, index.html was last modified in February 2020, and the body of the response is formatted as HTML.
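To pull these pieces apart in code, a client splits the response at the first blank line, then splits the top portion into the status line and the headers. The sketch below is a minimal illustration, not how a real browser or HTTP library does it. The helper name parse_response is hypothetical, the sample text is a shortened copy of the response above, and every header line is assumed to contain a colon.

```ruby
# Hypothetical helper: splits a raw HTTP response into its parts.
def parse_response(raw)
  # A blank line separates the status line and headers from the body.
  head, body = raw.split(/\r?\n\r?\n/, 2)
  status_line, *header_lines = head.split(/\r?\n/)
  # The status line has the form <version> <code> <reason>.
  version, code, reason = status_line.split(' ', 3)
  headers = header_lines.to_h do |line|
    key, value = line.split(':', 2)
    [key.strip, value.strip]
  end
  { version: version, code: code.to_i, reason: reason, headers: headers, body: body.to_s }
end

# A shortened copy of the response shown above.
raw = "HTTP/1.1 200 OK\r\n" \
      "Server: Apache/2.4.41 (Ubuntu)\r\n" \
      "Content-Type: text/html\r\n" \
      "\r\n" \
      "<!DOCTYPE html>\n"

response = parse_response(raw)
puts response[:code]                     # prints 200
puts response[:headers]['Content-Type']  # prints text/html
```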

After receiving this response, the browser would proceed to parse the HTML of the body and render it. We won't try to recreate that step; building a browser's rendering engine is a career in itself. Instead, let's experiment a bit with the request.

What happens when you connect to port 79 instead of port 80?

You probably found that the client is never able to connect to the server. That's because the server is not listening on port 79. Unencrypted HTTP traffic is expected on port 80. In fact, a firewall running on the server blocks traffic to any port that hasn't been deliberately opened.
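A script can run the same experiment without hanging indefinitely on a blocked port. The following is a rough sketch: the helper name port_open? and the three-second timeout are arbitrary choices made for illustration. It relies on Socket.tcp from Ruby's standard socket library, which accepts a connect_timeout option. The rescue list is kept broad so that refusals, timeouts, and name-lookup failures are all treated as "not reachable."

```ruby
require 'socket'

# Hypothetical helper: reports whether a TCP connection can be opened.
def port_open?(host, port, timeout = 3)
  # Socket.tcp raises an error if the connection is refused or times out.
  # With a block, it closes the socket afterward and returns the block's value.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Example calls (these require network access):
# puts port_open?('twodee.org', 80)
# puts port_open?('twodee.org', 79)
```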

What happens when you request the non-existent index5.html instead of index.html?

The status code is 302 instead of 200. You may have expected the status 404, which reports that a page can't be found. The 302 is sent instead because a rule on the server redirects most traffic from HTTP to HTTPS, the secure form of HTTP. If you visit https://twodee.org/index5.html, you will indeed see the 404 error.
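A client can tell a redirect from an ordinary response by checking for a status code in the 300s and reading the Location header, which names the new address. Here's a minimal sketch of that check. The helper name redirect_target is hypothetical, and the sample response text is made up for illustration rather than captured from twodee.org.

```ruby
# Hypothetical helper: returns the redirect address, or nil if there is none.
def redirect_target(raw)
  head = raw.split(/\r?\n\r?\n/, 2).first
  status_line, *header_lines = head.split(/\r?\n/)
  code = status_line.split(' ')[1].to_i
  return nil unless (300..399).cover?(code)

  # Header names are case-insensitive, so compare them in lowercase.
  header_lines.each do |line|
    key, value = line.split(':', 2)
    return value.strip if key.strip.downcase == 'location'
  end
  nil
end

# Illustrative response text, not real server output.
sample = "HTTP/1.1 302 Found\r\n" \
         "Location: https://twodee.org/index5.html\r\n" \
         "\r\n"

puts redirect_target(sample)  # prints https://twodee.org/index5.html
```

A browser performs this step automatically and issues a second request to the new address, which is why you rarely notice redirects.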

What happens in the telnet session or the script right after the server's response is printed?

You see a pause, after which the socket is closed. With HTTP/1.0, the socket is closed immediately after the response is sent. HTTP/1.1 keeps the socket open for a while by default because the client is likely to follow up with other requests, perhaps for images or other files that are linked in the HTML. The client reuses the existing socket for these subsequent requests, which is faster than opening a new one. The Connection header is used to control this behavior. The value close asks the server to close the socket as soon as the response is sent:

GET /index.html HTTP/1.1\r\n
Host: twodee.org\r\n
Connection: close\r\n
\r\n

If the header is omitted, HTTP/1.1 behaves as though its value were keep-alive.
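The earlier Ruby script can be adapted to send this header. The version below is a sketch under some assumptions: the helper name fetch and its parameters are hypothetical, and it reads the whole response in one call, which only returns promptly because the server is asked to close the socket.

```ruby
require 'socket'

# Hypothetical helper: sends a GET request that asks the server to close
# the socket after responding, then reads until the socket closes.
def fetch(host, path, port = 80)
  socket = TCPSocket.open(host, port)
  socket.write("GET #{path} HTTP/1.1\r\n")
  socket.write("Host: #{host}\r\n")
  socket.write("Connection: close\r\n")
  socket.write("\r\n")
  socket.read # returns once the server closes its end
ensure
  socket&.close
end

# Example call (requires network access):
# puts fetch('twodee.org', '/index.html')
```

Without the Connection: close line, the call to read would wait out the keep-alive pause before returning.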

The meaning of the Connection header is not essential knowledge. However, this noticeable pause demonstrates an important principle. In a technology's early stages, it is simple. As its developers add features and make it faster, it becomes increasingly hard to understand. You are entering the web development scene at a time when most of the simplicity is gone. But you'll manage.