Simulating Statefulness in HTTP Traffic
HTTP is inherently stateless: it has no built-in mechanism for remembering anything from one message to the next, and therefore no built-in way to relate different messages to one another. This can be an issue. For example, if you go to your bank's site and log into your account, HTTP by itself has no way of holding on to the login information, and therefore no way of knowing whether a user requesting access to a bank account is authorized to do so.
Web clients and servers address this problem with sessions and cookies.
Sessions
A session is a larger context in which a series of HTTP requests and responses operate. Essentially, it is a place to store state between requests. While you are logged into your bank site, your user ID is part of the state of your session. You might request a statement, view it, request another statement, and so on, all after logging into the site only once.
There are various ways of implementing sessions, but the important thing to understand here is that they are all outside the scope of what HTTP provides. HTTP simply takes requests and sends responses. It has no idea of how these may be related.
One thing that most session implementations have in common is that they use HTTP headers to send some form of session identification back and forth between a client and server. When a client requests a web page requiring a session, the server sends a session ID, or SID, back to the client in the response header, and the client includes that SID in subsequent related requests. That way, the server application knows which session it is dealing with, and can respond with the correct information.
For example, a banking application will use the session ID to determine which bank statements belong to a user making the request for statements.
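The SID round trip described above can be sketched in a few lines of Python. This is only an illustration, not how any particular framework does it: the in-memory SESSIONS table and the create_session and lookup_session helpers are hypothetical names invented for this example.

```python
import secrets

# Hypothetical in-memory session table: SID -> state for that session.
SESSIONS = {}

def create_session(user_id):
    """Start a session for a logged-in user and return its SID."""
    sid = secrets.token_urlsafe(32)  # random, hard-to-guess identifier
    SESSIONS[sid] = {"user_id": user_id}
    return sid

def lookup_session(sid):
    """Return the state stored for a SID, or None if the SID is unknown."""
    return SESSIONS.get(sid)

# First response: the server creates the session and sends the SID to the client.
sid = create_session("alice")

# Later request: the client sends the SID back and the server finds the state.
state = lookup_session(sid)
print(state["user_id"])  # alice
```

With the user ID recovered from the session, the application can decide which statements to return without asking the user to log in again.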
Cookies
Cookies are a way of persisting (saving) data. For example, you may log into amazon.com, do some shopping, and then quit your browser. Then, if you bring up your browser again and reopen the Amazon site, you may find that you are still logged in, and that your shopping cart is as you left it. This is because your SID and other state are saved on your disk in the form of cookies, allowing the session to be reinstated. When you visit the Amazon site, your browser looks up any cookies that the site has saved on your disk, and passes them to the server in the header of your HTTP request. The server can then use that information to retrieve the “persistent session data” associated with your account.
Servers tell clients to save cookie data on the local machine by including a Set-Cookie header in the response. Clients pass cookie data to servers by including a Cookie header in the request.
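The two headers can be sketched with the http.cookies module from Python's standard library. The cookie name sid, the value abc123, and the one-hour lifetime are assumptions chosen for illustration; a real server would pick its own.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for a response.
outgoing = SimpleCookie()
outgoing["sid"] = "abc123"            # hypothetical session ID
outgoing["sid"]["path"] = "/"         # send the cookie for every path on the site
outgoing["sid"]["max-age"] = 3600     # ask the client to keep it for one hour
outgoing["sid"]["httponly"] = True    # keep the cookie away from page scripts
set_cookie_header = outgoing.output() # a "Set-Cookie: sid=abc123; ..." line

# Client side: later requests carry only the name=value pair.
cookie_header = "Cookie: sid=abc123"

# Server side: parse the Cookie header value from the incoming request.
incoming = SimpleCookie()
incoming.load(cookie_header.split(": ", 1)[1])
received_sid = incoming["sid"].value

print(set_cookie_header)
print(received_sid)
```

Note that attributes such as the path and lifetime travel only from server to client; the client returns just the cookie's name and value.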
Cookies are most typically used to manage sessions as in the above example, to keep track of user preferences such as themes and profile information, and to track user behavior for things like “targeted advertising.”
Other Ways to Persist Session Data
Cookies are not the only way to keep track of state, and they aren’t entirely reliable. A user may work on several different machines, each with its own set of stored cookies. Cookies also expire, and a user can delete them from the disk manually.
A more reliable way to store session data is to keep it on the server in some sort of session storage database. Frameworks such as Ruby on Rails and ASP.NET provide this ability. Of course, server-side storage takes up space on the server, whereas cookies do not, so when designing a website it’s important to work through the tradeoffs between reliability and scalability that these two ways of persisting state present.
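As a rough sketch of server-side session storage, the following uses an in-memory SQLite database in place of a real session store. The table layout, the save_session and load_session helpers, and the 30-minute default lifetime are all assumptions made for this example; they are not the mechanism that Rails or ASP.NET actually uses.

```python
import json
import secrets
import sqlite3
import time

# An in-memory SQLite database stands in for a real session store (assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (sid TEXT PRIMARY KEY, data TEXT, expires REAL)")

def save_session(data, ttl_seconds=1800):
    """Store session data on the server and return the SID that refers to it."""
    sid = secrets.token_urlsafe(32)
    db.execute(
        "INSERT INTO sessions VALUES (?, ?, ?)",
        (sid, json.dumps(data), time.time() + ttl_seconds),
    )
    db.commit()
    return sid

def load_session(sid):
    """Return the stored data for a SID, or None if it is unknown or expired."""
    row = db.execute(
        "SELECT data, expires FROM sessions WHERE sid = ?", (sid,)
    ).fetchone()
    if row is None or row[1] < time.time():
        return None
    return json.loads(row[0])

sid = save_session({"user_id": "alice", "cart": ["book"]})
print(load_session(sid))
```

Here the state lives on the server, and only the SID has to reach the client. The cost is one stored row per session, which is the server space the paragraph above refers to.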
Since session IDs are often used when sending sensitive information over the internet, they are a valuable commodity. Many identity thefts begin with the theft of a session ID, so session security is an important part of internet use. The next article begins a discussion of session security.