While analyzing the CDN access logs for a site, I realized that the number of css/js files transferred didn't reflect the ratio of pageviews per visit at all. Taking the browser cache into account, I expected roughly one css/js request per visit, perhaps slightly fewer because of returning visitors.
Instead, css/js requests were a factor of 10 to 100 lower than expected. I also found that a small number of IPs were causing huge amounts of traffic. The top IPs fell into two categories: crawlers (Googlebot, Bingbot and similar) and ISPs.
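The comparison described above can be approximated with a short script that counts, per client IP, how many pageviews and how many css/js requests appear in the log. This is a minimal sketch, not the exact analysis used here. It assumes the logs are exported in the Apache/NGINX "combined" format and that a pageview is any URL ending in `/` or `.html`. Both rules, and the `classify`/`summarize` names, are hypothetical and will need adjusting to your CDN's actual log format.

```python
import re
from collections import Counter, defaultdict

# Assumption: the CDN exports Apache/NGINX "combined" format lines; other
# formats need a different pattern. Only the client IP and path are used.
LOG_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)')


def classify(path):
    """Hypothetical rule: .css/.js are assets, URLs ending in / or .html are pageviews."""
    path = path.split("?", 1)[0]  # drop query strings such as ?v=3
    if path.endswith((".css", ".js")):
        return "asset"
    if path.endswith(("/", ".html")):
        return "page"
    return None  # images, fonts etc. are ignored


def summarize(lines, top=10):
    """Count pageviews and css/js requests per IP; return overall totals and top IPs."""
    per_ip = defaultdict(Counter)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        kind = classify(m.group("path"))
        if kind:
            per_ip[m.group("ip")][kind] += 1

    totals = Counter()
    for counts in per_ip.values():
        totals.update(counts)  # Counter.update adds counts rather than replacing them

    # Heaviest IPs first, ranked by the number of classified requests.
    ranked = sorted(per_ip.items(), key=lambda kv: sum(kv[1].values()), reverse=True)
    return totals, ranked[:top]


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
        '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /news/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0"',
        '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /css/site.css HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    ]
    totals, top_ips = summarize(sample)
    ratio = totals["asset"] / max(totals["page"], 1)
    print(f"css/js requests per pageview: {ratio:.2f}")
    for ip, counts in top_ips:
        print(ip, dict(counts))
```

An IP near the top of the ranking with many pageviews but almost no css/js requests is the pattern that stands out: either a crawler that skips assets, or many clients sitting behind a shared cache.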
Crawlers obviously don't need to fetch the css/js files over and over again, but why would ISPs not fetch them when the traffic clearly comes from multiple clients behind a large NAT setup or similar? After thinking about it for a while, my best guess is that they simply do what the CDN does: pass the traffic through a caching proxy and cache everything according to the HTTP headers, with some extra intelligence to decide what is important enough to stay in the cache.
This has two important implications:
1. If you think you can control caching through your CDN configuration and the CDN purge function, you are wrong. Just as content stuck in a browser cache is out of your control, so is the ISP cache.
2. If you don't cache bust your resources properly, you'll end up with a lot of weird behaviour for users whose ISPs run a cache.
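One common way to cache bust is to put a hash of each file's content into its filename, so a changed file gets a URL that no intermediate cache (browser, CDN or ISP proxy) has seen before, and no purge is needed. The sketch below assumes a build step that renames assets this way and produces a manifest for rewriting references in the HTML. The helper names and the `Cache-Control` value are illustrative assumptions, not a prescribed setup.

```python
import hashlib
from pathlib import PurePosixPath

# Illustrative long-lived header; it is only safe because the URL changes
# whenever the file content changes ("immutable" is defined in RFC 8246).
ASSET_CACHE_CONTROL = "public, max-age=31536000, immutable"


def fingerprint_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: css/site.css -> css/site.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)  # POSIX paths, since these end up in URLs
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def build_manifest(assets: dict[str, bytes]) -> dict[str, str]:
    """Hypothetical manifest mapping each original asset path to its fingerprinted path."""
    return {path: fingerprint_name(path, data) for path, data in assets.items()}


if __name__ == "__main__":
    manifest = build_manifest({
        "css/site.css": b"body { color: #333 }",
        "js/app.js": b"console.log('hello');",
    })
    for original, hashed in manifest.items():
        print(f"{original} -> {hashed}  (Cache-Control: {ASSET_CACHE_CONTROL})")
```

The HTML pages that reference these files still need a short cache lifetime; otherwise a cached page keeps pointing at the old asset names.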
If enough ISPs start doing this, and do it well, I even see it as an important improvement to the overall performance of the Internet. The ISP cache sits close to the user, so in terms of both traffic and end-user performance it is a win-win for the ISP and the site owner. It's essentially a half-decent content delivery network, and it's free!
If anyone has insight into how ISPs handle this, it would be interesting to hear what technology they use and how they reason about it. My findings might be specific to a site targeting mobile users; maybe mobile operators are more aggressive in this area? Either way, this kind of caching would be really beneficial for all ISPs.
Surprised? I've always known that there is potential for all kinds of proxies around the Internet. I'd just never seen it in effect, and I certainly didn't expect it to be this effective!