HTTP and HTML in the Application Layer

HTTP

HTTP stands for Hypertext Transfer Protocol. It is the protocol used to transfer requested web pages from servers to clients, as well as the protocol that allows clients to send data to servers. It defines the structure of messages sent between clients and servers.

Hypertext and HTML

Hypertext is text that supports embedded links — called hyperlinks — to other pages, and is the foundation on which the World Wide Web is based. Its original focus was to allow sharing of documents between researchers, with a behavior much like the present wikipedia.org site.

Hypertext is formatted using the Hypertext Markup Language (HTML), which specifies text, links and so on, as well as the placement of audiovisual media such as images and video recordings. HTML works together with Cascading Style Sheets (CSS), programming languages such as JavaScript, PHP, Python and Ruby, and database management systems such as MySQL and PostgreSQL, to allow the development of full-featured software applications that interact with users via web pages.

HTTP, then, is the application-layer protocol used to send client requests for hypertext (i.e. web pages) to a server, and to send those pages back from the server to the client.

How HTTP Processes Requests and Responses

HTTP uses methods, or instructions, to process traffic. The most common methods are GET, POST, PUT and DELETE. GET retrieves data from the server. POST sends new data to the server, creating a new resource there. PUT overwrites an existing resource with new data, or uses the new data to create a new resource if the referenced resource isn’t found. DELETE deletes an existing resource.

These “resources” can be pretty much any sort of data, but it may be helpful to think of these four methods in terms of CRUD operations, where the Create, Read, Update and Delete operations correspond to the HTTP POST, GET, PUT and DELETE methods, respectively.
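The CRUD correspondence can be sketched with a toy in-memory store. This is only an illustration of the semantics described above, not how any real server is implemented; the class name `ResourceStore` and its methods are hypothetical.

```python
class ResourceStore:
    """Toy in-memory stand-in for a server's resources (hypothetical)."""

    def __init__(self):
        self.resources = {}
        self.next_id = 1

    def post(self, data):
        # Create: the server chooses the identifier for the new resource.
        resource_id = str(self.next_id)
        self.next_id += 1
        self.resources[resource_id] = data
        return resource_id

    def get(self, resource_id):
        # Read: return the resource, or None if it does not exist.
        return self.resources.get(resource_id)

    def put(self, resource_id, data):
        # Update: overwrite the resource, creating it if it is missing.
        self.resources[resource_id] = data

    def delete(self, resource_id):
        # Delete: remove the resource if it is present.
        self.resources.pop(resource_id, None)


store = ResourceStore()
new_id = store.post("first draft")
store.put(new_id, "second draft")
print(store.get(new_id))   # second draft
store.delete(new_id)
print(store.get(new_id))   # None
```

Note that with POST the store picks the identifier, while with PUT the caller names the resource to be written.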

HTML forms directly support only the GET and POST methods, so these are the HTTP methods most often used. We’ll go into them in more detail.

GET

The GET method requests transfer of a document. As such, GET is the mechanism for the retrieval of web pages.

The basic syntax for the HTTP GET method is GET [URL] [HTTP-version]. The version is required in HTTP/1.0 and later; a request line with no version is treated as an HTTP/0.9 request, the original form of the protocol.

This line is usually followed by another set of lines that convey information about the message. Each of these lines is a name/value pair, in the form name:[ ]value[ ]CRLF. (The [ ] means an optional space; CRLF means a new line.) These lines are called fields. Taken together, these fields are called the header.
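As a rough sketch of the name:[ ]value[ ]CRLF form, the following splits one field line into its name and value. The function name `parse_field` and the sample Accept field are illustrative assumptions, not part of any library.

```python
def parse_field(line):
    """Split one header line of the form name:[ ]value[ ]CRLF into (name, value)."""
    name, _, value = line.rstrip("\r\n").partition(":")
    # The spaces around the value are optional, so strip them.
    # Field names are case-insensitive; lower-casing makes lookups predictable.
    return name.lower(), value.strip()


print(parse_field("Accept: text/html\r\n"))   # ('accept', 'text/html')
print(parse_field("Accept:text/html \r\n"))   # ('accept', 'text/html')
```

Both spellings of the line yield the same pair, which is the point of the optional spaces.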

For example, HTTP/1.1 requires a Host field, which identifies the name of the host. If we establish a connection to w3.org, a GET request over that connection might look like this:


GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.w3.org
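A request like this can be assembled in a few lines of Python. The sketch below only builds the bytes and does not send them; the helper name `build_get_request` is hypothetical. It also shows one detail the listing cannot: the header is terminated by an empty line, so the request ends with two CRLF sequences.

```python
def build_get_request(host, path):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, URL, version
        f"Host: {host}",         # the one field HTTP/1.1 requires
        "",                      # empty line marks the end of the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.w3.org", "/pub/WWW/TheProject.html")
print(request)
# b'GET /pub/WWW/TheProject.html HTTP/1.1\r\nHost: www.w3.org\r\n\r\n'
```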

When a server receives a request like this, it sends back a response. The response takes this form:


protocol/version status-code status-desc
*[header lines]
[message body]*

(The header lines and message body are optional in a response.) Some common status codes and descriptions:


200 OK
301 Moved Permanently
400 Bad Request
401 Unauthorized
403 Forbidden
404 Not Found
500 Internal Server Error
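The first digit of a status code indicates its general class. As a small sketch, Python's standard `http.HTTPStatus` enum can supply the standard description for a code; the helper name `describe` and the wording of the class labels are illustrative choices.

```python
from http import HTTPStatus


def describe(code):
    """Return the status class and standard phrase for a status code."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 isolates the first digit of the code.
    return classes[code // 100], HTTPStatus(code).phrase


for code in (200, 301, 400, 404, 500):
    print(code, *describe(code))
```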

For example, sending the above GET request to w3.org on port 80 gets this response (the date fields reflect when the request was made):


HTTP/1.1 301 Moved Permanently
date: Sun, 03 May 2020 00:05:25 GMT
location: http://www.w3.org/TheProject.html
cache-control: max-age=21600
expires: Sun, 03 May 2020 06:05:25 GMT
content-length: 241
content-type: text/html; charset=iso-8859-1
vary: upgrade-insecure-requests

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
The document has moved <a href="http://www.w3.org/TheProject.html">here</a>.
</body></html>

(The line beginning with <!DOCTYPE is the beginning of the message body.)
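To make the boundary between header and body concrete, here is a minimal sketch that splits a raw response at the empty line. It assumes a well-formed response whose status line has all three parts; the function name `parse_response` is hypothetical and the sample bytes are a shortened version of the response above.

```python
def parse_response(raw):
    """Split raw response bytes into (version, code, description, field_lines, body)."""
    # The first empty line (two CRLFs in a row) ends the header.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *field_lines = head.decode("iso-8859-1").split("\r\n")
    # Split on the first two spaces only: the description may contain spaces.
    version, code, description = status_line.split(" ", 2)
    return version, int(code), description, field_lines, body


raw = (b"HTTP/1.1 301 Moved Permanently\r\n"
       b"location: http://www.w3.org/TheProject.html\r\n"
       b"content-type: text/html; charset=iso-8859-1\r\n"
       b"\r\n"
       b"<h1>Moved Permanently</h1>")
version, code, description, field_lines, body = parse_response(raw)
print(code, description)   # 301 Moved Permanently
print(field_lines[0])      # location: http://www.w3.org/TheProject.html
print(body)                # b'<h1>Moved Permanently</h1>'
```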

Note: HTTP/1.x messages specifically require CRLF (carriage return plus line feed) as the line terminator. Operating systems differ in how they denote a new line: Windows uses CRLF, while Unix, Linux and macOS use just LF. This can create problems when entering HTTP requests manually, so if you’re getting bad requests where you don’t expect them, the new line sequence sent by the application you’re using is the first place to look.
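One way to sidestep the problem is to normalize line endings before sending. The helper below is a sketch; the name `to_crlf` is hypothetical, and the sample text stands in for a request typed on a system that uses LF.

```python
def to_crlf(text):
    """Convert any mix of LF, CR and CRLF line endings to CRLF."""
    # Collapse existing CRLF pairs first so they are not doubled, then expand.
    return text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")


typed = "GET /pub/WWW/TheProject.html HTTP/1.1\nHost: www.w3.org\n\n"
print(to_crlf(typed).encode("ascii"))
# b'GET /pub/WWW/TheProject.html HTTP/1.1\r\nHost: www.w3.org\r\n\r\n'
```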

POST

The POST method requests that the server process the data enclosed in the request in its own way. For example, POST is used for:

Providing the fields entered into an HTML form to a data-handling process, such as adding a new record to a database
Posting a message to a bulletin board or blog
Creating a new user account on a website

In terms of the syntax of requests and responses, POST works similarly to GET, except that the data being sent travels in the message body of the request. The response to a successful POST, rather than returning a requested document as a GET response would, typically carries a message describing the result of the POST action.
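As a sketch of what a form submission might look like on the wire, the following builds a POST request with a URL-encoded body. The host www.example.com, the path /accounts, the form fields and the helper name `build_post_request` are all made up for illustration; `urlencode` is from Python's standard library.

```python
from urllib.parse import urlencode


def build_post_request(host, path, form_fields):
    """Return the bytes of a minimal HTTP/1.1 POST request carrying form data."""
    body = urlencode(form_fields).encode("ascii")
    header = "\r\n".join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",  # tells the server how long the body is
    ])
    # An empty line separates the header from the message body.
    return header.encode("ascii") + b"\r\n\r\n" + body


request = build_post_request(
    "www.example.com", "/accounts",
    {"username": "ada", "email": "ada@example.com"},
)
print(request.decode("ascii"))
```

The Content-Length field matters here because, unlike the simple GET request shown earlier, the request now has a body whose end the server must be able to find.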

Statefulness With HTTP

HTTP itself is stateless: once a request is fulfilled, the exchange ends and the server keeps no memory of it. This isn’t always desirable, since we often want to keep track of some data over multiple page requests. The next article explains some ways to do this.

Robert Rodes

Software Developer
