2009/03/29

Conditional GET and ETag implementation for ASP.NET

This post continues the series on Internet Information Services (IIS) optimization. See the link if you want to follow the whole series.

You can download the VB project code for this article.

Another way of optimizing your web site is setting it up to support conditional GET, that is, implementing the logic for handling requests whose headers specify If-None-Match (ETag) and/or If-Modified-Since values. This is not easy, since ASP.NET neither supports it directly nor offers primitives, methods or functions for it and, by default, always returns 200 OK regardless of the request headers (apart from errors such as 404).

The idea behind this is quite simple; let’s suppose a dialog between a browser (B) and a web server (WS):

B: Hi, can you give me a copy of ~/document.aspx?
WS: Of course. Here you are: 200Kb of code. Thanks for coming, 200 OK.
B: Hi again, can you give me a copy of ~/another-document.aspx?
WS: Yes, we’re here to serve. Here you are: 160Kb. Thanks for coming, 200 OK.
(Now the user clicks on a link that points to ~/document.aspx or goes back in his browsing history)
B: Sorry for disturbing you again, can I have another copy of ~/document.aspx?
WS: No problem at all. Here you are: 200Kb of code (the same as before). Thanks for coming, 200 OK.

Stupid, isn’t it? The way to enhance the dialogue and avoid unnecessary traffic is to use a richer vocabulary (If-None-Match & If-Modified-Since). Here is the same dialogue with these improvements:

B: Hi can you give me a copy of ~/document.aspx?
WS: Of course. Here you are: 200Kb of code. ISBN is 555111222 (ETag) and this is the 2009 edition (Last-Modified). Thanks for coming, 200 OK.
B: Hi again, can you give me a copy of ~/another-document.aspx?
WS: Yes, we are here to serve. Here you are: 160Kb. ISBN is 555111333 (ETag) and it is the 2007 edition (Last-Modified). Thanks for coming, 200 OK.
(Now time passes and the user goes back to ~/document.aspx; maybe it was in his favorites, or he arrived at the same file after browsing for a while)
B: Hi again, I already have a copy of ~/document.aspx, ISBN is 555111222 (If-None-Match), dated 2009 (If-Modified-Since). Is there any update for it?
WS: Let me check… No, you are up to date, 0Kb transferred, 304 Not Modified.

It sounds more logical. It takes a little more dialogue (negotiation) before the transaction, but if the conditions are met, these extra words save time and money (bandwidth) for both parties.

Most browsers nowadays support such a negotiation, but the web server must support it too in order to get any benefit. Unfortunately, IIS only supports conditional GET natively for static files. If you want to use it for dynamic content (ASP.NET files) as well, you need to add support for it programmatically. That’s what we are going to show here.

 

Calculating Last-Modified response header.

To begin with, the server needs to know when a page was last modified. This is very easy for static content: a simple mapping between the web page being requested and the file in the underlying file system, and you are done. The calculation of this date for .aspx files is a little more complicated. You need to consider all the dependencies of the content being served and take the most recent date among them. For instance, let’s suppose the browser requests a page at ~/default.aspx and this file is based on a master page called ~/MasterPage.master, which contains a menu that grabs its contents from the file ~/web.sitemap. In the simplest scenario (no content being retrieved from a database, no user controls), ~/default.aspx will contain only static content. In this case, the Last-Modified value will be the most recent last modification time of these files:
  • ~/default.aspx
  • ~/default.aspx.vb (Optionally, depending on your pages having code behind which modifies the output or not)
  • ~/MasterPage.master
  • ~/MasterPage.master.vb (Optionally)
  • ~/web.sitemap

The last-mod time is retrieved using System.IO.File.GetLastWriteTime. If the content is retrieved from a database, you must have a column storing the last-mod time (when the content was last written) in order to use this functionality.
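
The calculation above can be sketched as follows. This is only a minimal illustration, not the code from the downloadable project: the module and function names are hypothetical, and it assumes the caller has already mapped every dependency to a physical path. It uses the UTC variant of GetLastWriteTime and drops the sub-second part, since HTTP dates only carry whole seconds.

```vb
Imports System
Imports System.IO

' Hypothetical helper, not part of the article's project.
Module LastModifiedSketch
    ' Returns the most recent last-write time (UTC, truncated to whole seconds)
    ' among the given physical file paths. Missing files are skipped.
    ' If no file exists, DateTime.MinValue is returned.
    Function GetMostRecentWriteTimeUtc(ByVal physicalPaths As String()) As DateTime
        Dim newest As DateTime = DateTime.MinValue
        For Each filePath As String In physicalPaths
            If File.Exists(filePath) Then
                Dim written As DateTime = File.GetLastWriteTimeUtc(filePath)
                If written > newest Then newest = written
            End If
        Next
        ' HTTP dates have one-second resolution, so drop the fraction.
        Dim wholeSeconds As Long = newest.Ticks - (newest.Ticks Mod TimeSpan.TicksPerSecond)
        Return New DateTime(wholeSeconds, DateTimeKind.Utc)
    End Function
End Module
```

Inside a page, the physical paths would typically come from Server.MapPath("~/default.aspx"), Server.MapPath("~/MasterPage.master") and so on, one per dependency in the list above.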

 

Calculating ETag response header.

The second key of the dialogue is the ETag value. It is simply a hash of the final contents being served. If you have any way (with a low CPU footprint) of calculating a hash from textual input, it can be used. In our implementation we used CRC32, but any other will work the same way. We calculate the ETag value by making a CRC32 checksum of all dependent content plus the last-mod dates of these dependencies. In our simplest case, that is the concatenation of all these strings:
  • ~/default.aspx last write time
  • ~/default.aspx.vb last write time (Optionally; rarely needed)
  • ~/MasterPage.master last write time
  • ~/MasterPage.master.vb last write time (Optionally)
  • ~/web.sitemap last write time
  • ~/default.aspx contents
  • ~/default.aspx.vb contents (Optionally; usually left out to speed up calculations)
  • ~/MasterPage.master contents
  • ~/MasterPage.master.vb contents (Optionally)
  • ~/web.sitemap contents

And then a CRC32 of the whole. If your content is really dynamically generated (from a database, or by code), you will need to treat it like any other dependency and include it in the list above.
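
As an illustration of this step, here is a minimal sketch of a table-driven CRC32 and an ETag built from the concatenated strings. It is not the CRC32.vb file from the project; the module and function names are hypothetical, and wrapping the value in double quotes is an assumption based on HTTP entity tags being quoted strings.

```vb
Imports System
Imports System.Text

' Hypothetical helper, not the CRC32.vb shipped with the article's project.
Module ETagSketch
    ' Lookup table for the standard reflected CRC-32 polynomial (&HEDB88320).
    Private ReadOnly Table As UInteger() = BuildTable()

    Private Function BuildTable() As UInteger()
        Dim t(255) As UInteger
        For i As Integer = 0 To 255
            Dim c As UInteger = CUInt(i)
            For k As Integer = 0 To 7
                If (c And 1UI) <> 0UI Then
                    c = &HEDB88320UI Xor (c >> 1)
                Else
                    c = c >> 1
                End If
            Next
            t(i) = c
        Next
        Return t
    End Function

    ' Computes the CRC-32 checksum of a byte array.
    Function Crc32(ByVal data As Byte()) As UInteger
        Dim crc As UInteger = &HFFFFFFFFUI
        For Each b As Byte In data
            crc = Table(CInt((crc Xor b) And &HFFUI)) Xor (crc >> 8)
        Next
        Return crc Xor &HFFFFFFFFUI
    End Function

    ' Concatenates the dependency strings (write times and contents) in order
    ' and returns their CRC-32 as a quoted 8-digit hex string.
    Function ComputeETag(ByVal parts As String()) As String
        Dim sb As New StringBuilder()
        For Each part As String In parts
            sb.Append(part)
        Next
        Dim crc As UInteger = Crc32(Encoding.UTF8.GetBytes(sb.ToString()))
        Return """" & crc.ToString("x8") & """"
    End Function
End Module
```

The order of the strings matters: the same dependencies concatenated in a different order give a different checksum, so the list should always be built the same way.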
It might seem like too much of a burden, too much CPU usage but, like everything, it really depends on the website:

  • High volume, high CPU usage: This scenario might not cope with the extra CPU needed. See Note*.
  • High volume, low CPU usage: You can safely spend CPU cycles in order to save some bandwidth. Implementing conditional GETs is a must.
  • Low volume, high CPU usage: What kind of web server is it? Definitely not a public web server as we know them.
  • Low volume, low CPU usage: Implementing conditional GETs will give your website the impression of being served faster.

Note*: Consider this question: Is your CPU usage so high partly because the same contents are requested again and again by the same users? If your answer is yes (or maybe), spending some extra CPU to allow client-side caching and conditional GETs will, viewed globally, lower your overall CPU usage and also the bandwidth being used. Give this idea a try and decide for yourself afterwards.

 

Returning them in the response.

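Once we have calculated both the Last-Modified & ETag values, we need to return them with the response of the page. This is done using the following lines of code:
' With the default (Private) cacheability SetETag does not emit the header (see the comments at the end of this post)
Response.Cache.SetCacheability(HttpCacheability.ServerAndPrivate)
Response.Cache.SetLastModified(LastModifiedValue.ToUniversalTime)
Response.Cache.SetETag(ETagValue)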

 

Looking for the values in request’s headers.

Now that our pages’ responses are generated with Last-Modified and ETag headers, we need to check for those values in the requests too. When they come back in a request, the header names differ from the ones used in the response:

Response header name    Request header name
Last-Modified           If-Modified-Since
ETag                    If-None-Match
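
Reading these two request headers could look like the following sketch. The module and function names are hypothetical and the parsing is deliberately simple: it assumes If-Modified-Since arrives in the usual "Sun, 29 Mar 2009 10:00:00 GMT" form and that entity tags contain no commas.

```vb
Imports System
Imports System.Globalization

' Hypothetical helper, not part of the article's project.
Module RequestHeaderSketch
    ' Parses an If-Modified-Since value such as "Sun, 29 Mar 2009 10:00:00 GMT".
    ' Returns Nothing when the header is absent or cannot be parsed.
    Function ParseIfModifiedSince(ByVal headerValue As String) As DateTime?
        If String.IsNullOrEmpty(headerValue) Then Return Nothing
        Dim parsed As DateTime
        Dim styles As DateTimeStyles = DateTimeStyles.AssumeUniversal Or DateTimeStyles.AdjustToUniversal
        If DateTime.TryParseExact(headerValue.Trim(), "r", CultureInfo.InvariantCulture, styles, parsed) Then
            Return parsed
        End If
        Return Nothing
    End Function

    ' Returns True when the If-None-Match value lists the current ETag.
    ' Simplification: splits on commas and ignores the weak prefix (W/).
    Function IfNoneMatchContains(ByVal headerValue As String, ByVal currentETag As String) As Boolean
        If String.IsNullOrEmpty(headerValue) Then Return False
        For Each candidate As String In headerValue.Split(","c)
            Dim tag As String = candidate.Trim()
            If tag = "*" Then Return True
            If tag.StartsWith("W/", StringComparison.Ordinal) Then tag = tag.Substring(2)
            If tag = currentETag Then Return True
        Next
        Return False
    End Function
End Module
```

In a page, the raw values would come from Request.Headers("If-Modified-Since") and Request.Headers("If-None-Match"), which return Nothing when the browser did not send them.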

The logic for deciding whether to return 200 OK or 304 Not Modified is as follows:

  • If both values (If-Modified-Since & If-None-Match) were provided in the request and both match, return 304 and no content (0 bytes).
  • If either of them does NOT match, return 200 and the complete page.
  • If only one of them was specified (If-Modified-Since or If-None-Match), that one decides.
  • If neither was provided, always return 200 and the complete page.

To return 304 and no content for the page, use this code:
Response.Clear()    
Response.StatusCode = System.Net.HttpStatusCode.NotModified     
Response.SuppressContent = True
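
The four rules listed above can be written as a small pure function. This is a sketch only, not the ConditionalGET method from the project: the names are hypothetical, the ETag comparison is a plain string equality, and a date is assumed to "match" when the page has not been modified after the date sent by the browser (compared at one-second resolution).

```vb
Imports System

' Hypothetical helper, not the ConditionalGET method from HttpSnippets.vb.
Module ConditionalGetDecisionSketch
    ' Drops the sub-second part, since HTTP dates carry whole seconds only.
    Private Function TruncateToSecond(ByVal value As DateTime) As DateTime
        Return New DateTime(value.Ticks - (value.Ticks Mod TimeSpan.TicksPerSecond), value.Kind)
    End Function

    ' Returns True when the response should be 304 Not Modified.
    ' ifModifiedSince: parsed request date in UTC, or Nothing if not sent.
    ' ifNoneMatch: ETag sent by the browser, or Nothing/empty if not sent.
    Function IsNotModified(ByVal ifModifiedSince As DateTime?, _
                           ByVal ifNoneMatch As String, _
                           ByVal lastModifiedUtc As DateTime, _
                           ByVal currentETag As String) As Boolean
        Dim hasDate As Boolean = ifModifiedSince.HasValue
        Dim hasTag As Boolean = Not String.IsNullOrEmpty(ifNoneMatch)

        ' Neither header provided: always send the complete page.
        If Not hasDate AndAlso Not hasTag Then Return False

        ' A header that was not sent does not veto the 304,
        ' so when only one is present, that one decides.
        Dim dateMatches As Boolean = True
        If hasDate Then dateMatches = (TruncateToSecond(lastModifiedUtc) <= ifModifiedSince.Value)

        Dim tagMatches As Boolean = True
        If hasTag Then tagMatches = (ifNoneMatch = currentETag)

        Return dateMatches AndAlso tagMatches
    End Function
End Module
```

When the function returns True, the three Response lines shown above apply; otherwise the page is rendered normally and returned with 200 OK.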

 

Test 1: Requests to ~/default.aspx

With these ideas in place, we have reused the VB project from the previous post, ASP.NET menu control optimization, and added support for conditional GETs to it. In the sample VB project for this post there are 2 new files under App_Code: CRC32.vb, which implements a CRC32 checksum algorithm, and HttpSnippets.vb, which implements a method called ConditionalGET that does most of the work explained in this post. We have used Fiddler2 to debug two requests made to ~/default.aspx.

The first one, shown in the left column (red arrow), is made without the browser having any cached information about it. As you can see the browser makes the request without providing any If-Modified-Since or If-None-Match headers. The response given by the server sets the ETag and Last-Modified values for the browser to use in the future in case it supports them.

The second request, shown in the right column (green arrow), is made by the same browser some seconds later. The browser already has information about the page being requested and provides it along with the request, in the If-Modified-Since and If-None-Match headers. The result from the server in this case is different. Instead of returning 200 OK and the whole page, it returns 304 Not Modified, and the size of the body is 0. You are saving bandwidth at the cost of some CPU cycles and a few more bytes in the negotiation (headers).

 

Test 2: Requests to ~/default-optimized.aspx

Continuing with the ASP.NET menu control optimization project, we also added support for conditional GET to our ~/default-optimized.aspx page, which saves the menu in an external client-side cacheable page in order to reduce (even more) the size of the pages being transferred.

In this case the first column (red arrow) belongs to the request of ~/default-optimized.aspx. As you can see the size of the page being transferred completely is 3785 bytes (in the previous example it was 18358 bytes). This reduction is solely due to the ASP.NET menu control optimization. For more info about this check the previous article. Regarding the conditional GET, the first request does not know anything about the page and no data is provided in the request. The response includes ETag and Last-Modified values.

The second request of interest is in the right column (green arrow) and belongs to the same browser requesting the same file some seconds later. This time, information about the page is provided by the browser in the headers (If-Modified-Since and If-None-Match values). The server then checks them and decides that the content has not changed, returning 304 Not Modified and a body length of 0 bytes.

It seems that the ASP.NET Development Server (“Cassini”), the web server used for debugging with VS2008, does not handle static files very well. As you can see, menu.css and some other static files under ~/resources/ are transferred completely with every request. Neither ETag nor Last-Modified values are returned for them automatically. This does not happen in real production environments with IIS, which handles static files correctly (calculating ETag and Last-Modified values) to avoid transferring static files unnecessarily.

 

Resources and links.

Internet Information Services IIS optimization
For live websites (in the public internet) you can easily test if they support Conditional GETs using HTTP compression and HTTP conditional GET test tool
Another valuable resource is Fiddler2.
The VB website project source sample is available for you to download.

7 comments:

Ben Amada said...

If HttpCachePolicy is set to Private (the default setting), it appears Response.Cache.SetETag() doesn't send the ETag header in the response. Manually appending the ETag header via Response.AppendHeader("ETag", ETagValue) does work, however.

I read about this at the link below and verified via Fiddler that the ETag header isn't sent to the browser when Response.Cache.SetETag() is used.

http://edn.embarcadero.com/article/38123

J.A. said...

That is the reason for me to change the default cache policy before using Cache.SetETag():

Look at the code that sets the ETag in my project file HttpSnippets.vb. I use Response.Cache.SetCacheability(HttpCacheability.ServerAndPrivate) and then Response.Cache.SetETag(myETagValue).

Look at the Fiddler captures I have prepared also. As you can see, after doing so, ETag value is sent and everything works.

Unknown said...

Thanks a lot for the detailed explanation and excellent code for Conditional GET and ETag.
I have successfully implemented the solution on my own website.
And thanks a lot for introducing me to Fiddler - it is an excellent tool.

Unknown said...

Great article on ETag and Last-Modified. I feel I need to point out that you may also want to set your cache policy to allow the browser to cache the file. If, in your scenario, document.aspx returned back an Expires header, then the second conversation wouldn't even happen (unless the user refreshed the page, or cleared their cache.) This means no added server overhead (bandwidth or processing). Of course, you lose some control as well, and ETag and Last-Modified allow you to change the document more frequently and irregularly, while keeping your clients up-to-date. Does your ETag have to be a CRC32, or will any unique value work? (for instance, perhaps if default.aspx returned the most recently inserted record in a database, I might use the record's primary key as the ETag). Also, have you thought about or experimented with output caching in IIS 7?

J.A. said...

Thanks Toby:

You are right in the cache policy & expires header: if used, the request would not even be done and you can save at least one round-trip to the server, at the cost of losing control (for counting hits on your page for instance). Using them or not depends on the scenario.

ETag does not need to be a CRC32. It can be whatever string you want, as long as you ensure that it changes whenever the final page changes. Note that I said the whole page, the whole contents, thus I mean: the menu, structure (masterpage), footers, banners and whatever module/block you use on the page. That's why I calculate a CRC32 on the concatenation of the contents plus modtimes and so on.

Using your table's PK might not work, particularly with updates, since the PK is not usually changed when an update is done. You must complement that PK with more fields, a LastModTime field for instance.

Regarding IIS7, no I have not tested it, but it should work just the same way.

Good luck with your implementation!

ramo271285 said...

Hi,

Very nice your post, I want to test it in my web sites.

Do you have a C#.

Thank you very much.

Omar

J.A. said...

No, Omar, I did not make a C# version for this.

It shouldn't be, however, very difficult to make a translation from VB to C# since most of the code uses the .NET framework. With some "search and replace" rounds and some restyling of the VB code, you should be ready to go.

Regards.