Chapter 7. Session Tracking

HTTP is a stateless protocol: it provides no way for a server to recognize that a sequence of requests all come from the same client. Privacy advocates may consider this a feature, but it causes problems because many web applications aren't stateless. The shopping cart application is a classic example: a client puts items in a virtual cart, accumulating them until checking out several page requests later. Other examples include sites that offer stock brokerage services or interactive data mining.

The HTTP state problem can best be understood if you imagine an online chat forum where you are the guest of honor. Picture dozens of chat users, all conversing with you at the same time. They are asking you questions, responding to your questions, and generally making you wish you had taken that typing course back in high school. Now imagine that when each participant writes to you, the chat forum doesn't tell you who's speaking! All you see is a bunch of questions and statements mixed in with each other. In this kind of forum, the best you can do is hold simple conversations, perhaps answering direct questions. If you try to do anything more, such as ask someone a question in return, you won't necessarily know when the answer comes back. This is exactly the HTTP state problem. The HTTP server sees only a series of requests--it needs extra help to know exactly who's making a request.[1]

[1] If you're wondering why the HTTP server can't identify the client by the connecting machine's IP address, the answer is that the reported IP address may be the address of a proxy server or of a server machine that hosts multiple users.

The solution, as you may have already guessed, is for a client to introduce itself as it makes each request. Each client needs to provide a unique identifier that lets the server identify it, or it needs to give some information that the server can use to properly handle the request. To use the chat example, a participant has to begin each of his sentences with something like "Hi, I'm Jason, and ..." or "Hi, I just asked about your age, and ...". As you'll see in this chapter, there are several ways for HTTP clients to send this introductory information with each request.

The first half of the chapter explores the traditional session-tracking techniques used by CGI developers: user authorization, hidden form fields, URL rewriting, and persistent cookies. The second half of the chapter demonstrates the built-in support for session tracking in Version 2.0 of the Servlet API. This support is built on top of the traditional techniques and it greatly simplifies the task of session tracking in your servlets.

7.1. User Authorization

One way to perform session tracking is to leverage the information that comes with user authorization. We discussed user authorization back in Chapter 4, "Retrieving Information", but, in case you've forgotten, it occurs when a web server restricts access to some of its resources to only those clients that log in using a recognized username and password. After the client logs in, the username is available to a servlet through getRemoteUser().

We can use the username to track a client session. Once a user has logged in, the browser remembers her username and resends the name and password as the user views new pages on the site. A servlet can identify the user through her username and thereby track her session. For example, if the user adds an item to her virtual shopping cart, that fact can be remembered (in a shared class or external database, perhaps) and used later by another servlet when the user goes to the check-out page.
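To make the resending step concrete, here is a minimal sketch. It assumes the server uses HTTP Basic authentication, in which the browser sends an Authorization header of the form "Basic " followed by the base64 encoding of username:password. The class BasicAuthSketch and its methods are hypothetical names invented for this illustration. In a real servlet you would normally just call getRemoteUser() and let the server do this decoding for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only (assumes Basic authentication)
final class BasicAuthSketch {

  // Builds the header value a browser would resend with each request
  static String encode(String user, String password) {
    byte[] pair = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
    return "Basic " + Base64.getEncoder().encodeToString(pair);
  }

  // Extracts the username from an Authorization header value,
  // or returns null if the header is missing or not well-formed Basic
  static String userFrom(String header) {
    if (header == null || !header.regionMatches(true, 0, "Basic ", 0, 6)) {
      return null;
    }
    String decoded;
    try {
      byte[] raw = Base64.getDecoder().decode(header.substring(6).trim());
      decoded = new String(raw, StandardCharsets.UTF_8);
    }
    catch (IllegalArgumentException e) {
      return null;  // not valid base64
    }
    int colon = decoded.indexOf(':');
    return (colon < 0) ? null : decoded.substring(0, colon);
  }
}
```

Because the same header accompanies every request to the protected pages, the server can recover the same username each time, which is what makes the username usable as a session key.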

A servlet that uses user authorization might add an item to a user's shopping cart with code like the following:

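// The server sets the remote user only for pages it protects
String name = req.getRemoteUser();
if (name == null) {
  // Explain that the server administrator should protect this page
}
else {
  // Each "item" parameter names one item to add
  String[] items = req.getParameterValues("item");
  if (items != null) {
    for (int i = 0; i < items.length; i++) {
      // addItemToCart() is not shown; it might store the item in a
      // shared class or external database, keyed by username
      addItemToCart(name, items[i]);
    }
  }
}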

Another servlet can then retrieve the items from a user's cart with code like this:

String name = req.getRemoteUser();
if (name == null) {
  // Explain that the server administrator should protect this page
}
else {
  // getItemsFromCart() is not shown; it looks up the items
  // previously stored under this username
  String[] items = getItemsFromCart(name);
}
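The two fragments above leave addItemToCart() and getItemsFromCart() undefined. As a minimal sketch of the "shared class" option, the helpers could be backed by an in-memory map keyed by username. The class name CartStore is hypothetical, and the sketch assumes that all the servlets involved run in the same Java virtual machine and that losing the carts when the server restarts is acceptable. An external database would remove both assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared class; one possible backing store for the helpers
final class CartStore {

  // Maps a username (as returned by getRemoteUser()) to that user's items
  private static final Map<String, List<String>> carts =
      new ConcurrentHashMap<>();

  // Adds one item to the named user's cart, creating the cart if needed
  static void addItemToCart(String name, String item) {
    List<String> cart = carts.computeIfAbsent(name, k -> new ArrayList<>());
    synchronized (cart) {
      cart.add(item);
    }
  }

  // Returns a snapshot of the named user's items, in the order added;
  // a user with no cart gets an empty array
  static String[] getItemsFromCart(String name) {
    List<String> cart = carts.get(name);
    if (cart == null) {
      return new String[0];
    }
    synchronized (cart) {
      return cart.toArray(new String[0]);
    }
  }
}
```

Because the map is keyed only by username, every request made under the same login sees the same cart, no matter which machine or browser window it comes from.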

The biggest advantage of using user authorization to perform session tracking is that it's easy to implement. Simply tell the server to protect a set of pages, and use getRemoteUser() to identify each client. Another advantage is that the technique works even when the user accesses your site from different machines. It also works even if the user strays from your site or exits her browser before coming back.

The biggest disadvantage of user authorization is that it requires each user to register for an account and then log in each time she starts visiting your site. Most users will tolerate registering and logging in as a necessary evil when they are accessing sensitive information, but it's overkill for simple session tracking. Another, smaller problem is that a user cannot simultaneously maintain more than one session at the same site, because every request carries the same username. We clearly need a better approach to support anonymous session tracking.
