Improving Internet archive service through proxy cache

Hsiang‐Fu Yu, Yi‐Ming Chen, Shih‐Yong Wang, Li‐Ming Tseng
2003 Internet Research  
Downloading archives is a useful service. Many users download archive files containing shareware or freeware through the Internet. Traditionally, FTP servers have been the major archive providers, and users apply the Archie server to locate FTP archives. With the extreme popularity of the WWW, Web servers have become important new archive providers, and users have started downloading archives via HTTP. Meanwhile, to alleviate massive HTTP traffic, proxy cache servers are widely deployed on the Internet. However, we find the
hit rate of archives in cache servers is quite low, although archives are responsible for a large percentage of network traffic. This study proposes a combination of caching and better searching mechanisms to alleviate the problem. We enable a particular proxy server to automatically collect WWW and FTP archives from its cache, to organize them in the form of an FTP directory, and then to offer the directory list to the Archie. Accordingly, users can find archives on WWW and FTP servers through the Archie, and they can download archives directly from the proxy server. Thus, the reuse of cached archives is improved. A system was implemented and operated in a real environment to evaluate the approach. Empirical results indicate that the reuse rate of cached objects increased by 18% to 37%.

... University (NCU) in Taiwan. Its host name was proxyftp.csie.ncu.edu.tw. This server served roughly one thousand clients daily. It ran on a personal computer with two Intel Pentium II 450 MHz processors. Its proxy software was SQUID (Squid, 2002), version 2.2.STABLE4, with a 108-Gbyte cache. The access logs of consecutive days were concatenated to obtain longer data sets. A 47-day trace spanning May 14 to June 29, 1999 was created. Table I summarizes the reduced access logs for the proxy. Many users downloaded archives from WWW servers, with a total of 205,001 requests being observed. Because downloading archives was not as frequent as browsing pages
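The collection step described in the abstract (picking archives out of the proxy's traffic and arranging them as an FTP-style directory) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes Squid's native access.log layout, in which the fourth whitespace-separated field is the result code/status and the seventh is the URL. The extension list, the `/<scheme>/<host>/<path>` directory layout, and all function names are hypothetical choices made for illustration, and counting every result code that contains `HIT` as a cache hit is a simplification.

```python
from urllib.parse import urlsplit
import posixpath

# Hypothetical extension list; the record does not state how the paper defines an archive.
ARCHIVE_EXTENSIONS = (".zip", ".tar", ".gz", ".tgz", ".exe", ".rpm")


def parse_squid_line(line):
    """Return (result_code, url) from a native-format access.log line, or None."""
    fields = line.split()
    if len(fields) < 7:
        return None  # malformed or truncated line
    result_code = fields[3].split("/")[0]  # e.g. "TCP_HIT" from "TCP_HIT/200"
    return result_code, fields[6]


def is_archive(url):
    """True if the URL path ends with one of the assumed archive extensions."""
    return urlsplit(url).path.lower().endswith(ARCHIVE_EXTENSIONS)


def directory_entry(url):
    """Map a URL to a path in an assumed FTP-style tree: /<scheme>/<host>/<path>."""
    parts = urlsplit(url)
    return posixpath.join("/", parts.scheme, parts.hostname or "",
                          parts.path.lstrip("/"))


def summarize(lines):
    """Return (sorted directory entries, archive requests, archive cache hits)."""
    entries = set()
    requests = hits = 0
    for line in lines:
        parsed = parse_squid_line(line)
        if parsed is None:
            continue
        code, url = parsed
        if not is_archive(url):
            continue
        requests += 1
        if "HIT" in code:  # simplification: any *_HIT code counts as a hit
            hits += 1
        entries.add(directory_entry(url))
    return sorted(entries), requests, hits
```

Calling `summarize` on an open log file yields a listing that could, in principle, be exported as the directory index offered to a search service, together with the archive request and hit counts needed to estimate a reuse rate.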
doi:10.1108/10662240310458387 fatcat:zx57enzoofao7dprpe344ack64