What Are “Cache” and “Proxy”, and How Do They Speed Up Today’s Internet
Cache refers to one of the fastest and costliest types of volatile memory in a computer system. Literally, cache means “a collection of items of the same type stored in a hidden or inaccessible place” [1]. In computer architecture, cache is the memory closest to the CPU after the registers. Important and frequently used data is automatically stored in the cache and retrieved from it by the CPU very quickly, which significantly speeds up the system [2]. This article is about how the caching idea is applied to speed up today’s internet.
The cache concept on the internet is the same as in a CPU cache, although it is implemented differently. Access speed drops as the distance from the server grows, and link cost rises. So when a group of users (clients) in geographically close locations browse the same content repeatedly (for example an online forum or a news site), it is better to cache and store the content on a local server than to access the remote server over the internet every single time. The cache systems installed by big ISPs and organizations do exactly that. Nowadays a cache is also built into every browser, from PC to Android.
Internet Service Providers (ISPs) install such cache servers. When an object is requested more than once by their clients, it is delivered from the cache server rather than the remote server, provided the content on the remote server is still unchanged since it was stored in the local cache server.
The big question is how to be sure that the content is still unchanged in today’s dynamic world. Internet protocols such as HTTP (for web browsing) and DNS (for resolving domain names to IP addresses) ensure this in different ways.
Web Cache: For HTTP, each client request passes first through the browser cache and then through the ISP cache (if the ISP has one) before reaching the remote server. (HTTPS traffic is encrypted between browser and server, so a shared ISP cache normally cannot read or store it; the browser cache still applies.) If the requested object (web page, image) is not in the browser cache, it is requested from the ISP cache. Only if it is also absent there does the ISP retrieve it from the remote server, keep a copy in its cache and pass it to the browser, which keeps its own local copy while presenting the object to the user. If the required object is already in the browser cache, the browser instead sends a small validation query to the ISP, and the ISP in turn queries the remote server for the current status of the object. If the object is unchanged at the remote end, the remote server only acknowledges this instead of sending the object back; the ISP cache passes that answer to the browser and the object is served from the browser cache. Otherwise the remote server sends the modified object. Often the user's cache is out of date but the ISP cache has already been refreshed by a request for the same object from another user. In that case the ISP queries the remote host to confirm that it holds the latest version and then supplies the object to the browser from its own cache instead of fetching it again from the remote server.
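The lookup order described above (browser cache, then ISP cache, then remote server) can be sketched in a few lines of Python. This is only an illustrative model: the class names `Origin` and `Cache`, the URL and the page content are made up for the example, and the validation query for objects that are already cached is left out here.

```python
class Origin:
    """Stand-in for the remote server; counts how often it is contacted."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.hits = 0

    def fetch(self, url):
        self.hits += 1
        return self.objects[url]


class Cache:
    """A cache that asks its upstream (another cache or the origin) on a miss."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def fetch(self, url):
        if url not in self.store:
            # Miss: get the object from upstream and keep a copy.
            self.store[url] = self.upstream.fetch(url)
        return self.store[url]


origin = Origin({"/news.html": "<html>headline</html>"})
isp_cache = Cache(origin)
alice_browser = Cache(isp_cache)  # two users behind the same ISP cache
bob_browser = Cache(isp_cache)

alice_browser.fetch("/news.html")  # miss in browser and ISP cache: origin contacted
alice_browser.fetch("/news.html")  # served from Alice's browser cache
bob_browser.fetch("/news.html")    # miss in Bob's browser, served from the ISP cache
print(origin.hits)                 # prints 1
```

Three requests from two users reach the remote server only once, which is the saving the ISP cache is installed for.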
In short, each time a user requests an object, only a small query is sent to the remote server, and if the object is unchanged it is supplied from cache rather than from the remote server. Because the browser cache resides on the user's PC and the ISP cache is also close to the user, both are very quickly accessible. Only query messages of negligible size (compared to the web object) travel to the remote server over the internet, which significantly increases speed and user satisfaction in a cost-effective way.
HTTP [3] uses the conditional GET to implement this. An If-Modified-Since header carrying the date and time of the cached copy is added to the HTTP request. If the object has not changed since that time, the remote server replies with a 304 Not Modified status and an empty body; if it has changed, the object is sent back with a 200 OK reply. On receiving the 304 code, the cache simply passes on its cached copy of the object.
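The conditional GET exchange can be simulated without any network, as a rough sketch. The `Origin` and `RevalidatingCache` classes below are hypothetical stand-ins for the remote server and a cache, and the sketch assumes the cache echoes back the server's Last-Modified value in If-Modified-Since, which is the usual practice.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


class Origin:
    """Stand-in for the remote server holding a single object."""

    def __init__(self, body, last_modified):
        self.body = body
        self.last_modified = last_modified
        self.statuses = []  # status codes sent so far

    def get(self, request_headers):
        since = request_headers.get("If-Modified-Since")
        if since is not None and self.last_modified <= parsedate_to_datetime(since):
            self.statuses.append(304)
            return 304, {}, b""  # unchanged: status only, empty body
        self.statuses.append(200)
        headers = {"Last-Modified": format_datetime(self.last_modified, usegmt=True)}
        return 200, headers, self.body


class RevalidatingCache:
    """Keeps one copy and revalidates it with a conditional GET."""

    def __init__(self, origin):
        self.origin = origin
        self.entry = None  # (Last-Modified header value, body)

    def get(self):
        headers = {}
        if self.entry is not None:
            headers["If-Modified-Since"] = self.entry[0]
        status, response_headers, body = self.origin.get(headers)
        if status == 304:
            return self.entry[1]  # serve the cached copy
        self.entry = (response_headers["Last-Modified"], body)
        return body


origin = Origin(b"old page", datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc))
cache = RevalidatingCache(origin)
first = cache.get()   # nothing cached yet: plain GET, 200 OK
second = cache.get()  # conditional GET, object unchanged: 304 Not Modified
origin.body = b"new page"
origin.last_modified = datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc)
third = cache.get()   # conditional GET, object changed: 200 OK with new body
print(origin.statuses)  # prints [200, 304, 200]
```

Only the second request avoids transferring the body, and that is the common case when many users keep reading the same unchanged page.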
DNS: Querying in this way is useful on the web because the query is insignificant in size compared to the actual object. In DNS name resolution the number of requests is very high (say an average of 30 per page for a WordPress site) and the DNS answer itself is small, so a validation query would cost about as much as fetching the answer again. The query method therefore does not suit DNS; instead, DNS caches use the TTL (Time To Live) method. Since DNS records do not change as quickly as web content, the authoritative DNS server sends a TTL value (in seconds) with each answer. The TTL is the maximum time a record may be kept; once it has elapsed, the cache server automatically discards the cached record. When a request arrives within the TTL period, the cache server simply returns the cached value regardless of the current state on the authoritative DNS server. This is a primary reason why a DNS update may take up to 48 hours to propagate.
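A TTL-based cache is simple enough to sketch directly. The `TtlCache` class, the host name and the address below are assumptions made for illustration, with the TTL expressed in seconds as DNS does; a real resolver would ask the authoritative server again after expiry instead of returning `None`.

```python
import time


class TtlCache:
    """Keeps each record until its TTL runs out, then discards it."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.records = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        self.records[name] = (address, self.clock() + ttl_seconds)

    def get(self, name):
        record = self.records.get(name)
        if record is None:
            return None
        address, expires_at = record
        if self.clock() >= expires_at:
            del self.records[name]  # TTL elapsed: drop the stale record
            return None
        return address  # within TTL: answer without asking the authoritative server


# A fake clock makes the example deterministic.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("example.org", "192.0.2.10", ttl_seconds=300)

now[0] = 120.0
within_ttl = cache.get("example.org")  # "192.0.2.10"

now[0] = 301.0
after_ttl = cache.get("example.org")   # None, the record has expired
```

If the record changed on the authoritative server at second 10, clients of this cache would still receive the old address until second 300, which is the propagation delay described above in miniature.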
Proxy: A proxy server is an application-level gateway that often implements a web cache, a DNS cache and so on. Whereas a cache server only caches, a proxy server implements and coordinates several other things as well, such as NAT and user login authentication for internet access. Nowadays HTTP, HTTPS, FTP and SOCKS based proxy servers are cheap and are often deployed by VPN providers, offices and educational organizations.
The implementing organization tells its users the protocol of the proxy server and its IP address, or a domain name pointing to that server. When a user configures traffic to pass through the proxy server (for example in the Firefox proxy settings), the service provider can serve content from its cache or provide an item otherwise unavailable to the user. NAT combined with a proxy server has become a common choice for setting up a local organizational or academic network that shares one external IP address among all its users (that will be a topic for another day).
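The same configuration a browser exposes in its proxy settings can also be done in a program. The sketch below uses Python's standard `urllib.request`; the proxy address `proxy.example.org:3128` is a made-up placeholder for whatever address the organization hands out.

```python
import urllib.request

# Hypothetical proxy address; replace with the one supplied by the organization.
PROXY_URL = "http://proxy.example.org:3128"

# Route both plain HTTP and HTTPS requests through the proxy.
proxy_handler = urllib.request.ProxyHandler({"http": PROXY_URL, "https": PROXY_URL})
opener = urllib.request.build_opener(proxy_handler)

# Every request made with opener.open(...) would now go to the proxy first,
# which may answer from its cache or forward the request to the remote server.
```

Nothing is sent over the network until `opener.open(...)` is called, so the snippet is safe to run as it stands.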
References: