[H-GEN] Squid addon

Thu Apr 29 07:55:08 EDT 1999

(Note reply-to: being general at humbug.org.au vs Craig Eldershaw <ce at comlab.ox.ac.uk>)

>> of course you'd have to base it on both the name and file size, and make a
>> lower limit, there is a remote chance that index.html might be 3,212 bytes
>> at server X+Y
>
>Again, as mentioned above, this approach would be unsound, since there may
>well be a fair probability of two completely different files on the same
>server having the exact same size and filename. 

Agreed, think of the odds:  take the number of web servers in the world,
then multiply by the number of directories on each to give an estimate
(order of magnitude only) of the number of files called index.html.
That's an awful lot.
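
To put a rough shape on that estimate, here is a back-of-the-envelope
sketch.  Every figure in it is a made-up placeholder, not a measurement,
and it assumes file sizes are spread evenly over a fixed number of
possible values, which real pages certainly are not.  The function name
is hypothetical too.

```python
def expected_colliding_pairs(n_files: int, n_sizes: int) -> float:
    """Expected number of pairs of files that share the same size,
    assuming each file's size is uniform over n_sizes possible values."""
    return n_files * (n_files - 1) / (2 * n_sizes)


# Hypothetical figures, for illustration only (NOT real statistics):
servers = 1_000_000          # assumed number of web servers
dirs_per_server = 10         # assumed directories holding an index.html
n_files = servers * dirs_per_server
n_sizes = 100_000            # assumed number of distinct plausible sizes

pairs = expected_colliding_pairs(n_files, n_sizes)
print(f"{n_files:.0e} files -> roughly {pairs:.0e} same-name, same-size pairs")
```

Whatever numbers one plugs in, the pair count grows with the square of
the number of files, which is why name plus size alone is not enough.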

>If such a Squid addon were to be written, I'd imagine it'd cache the file,
>and generate a unique checksum for comparisons, therefore making the
>possibility of "mistakes" much lower.  However generating checksums
>on-the-fly would have to be kept relatively quick, so as to not bog
>down the caching software overly -- I doubt this'd be a problem since
>squid is probably I/O bound anyhow. 

In my limited knowledge (any number theorists on the list?), using
name, size and an MD5-like checksum would be almost certain to be
safe.  The problem is, how does squid's sister get the checksum of
file.x at Y without actually downloading it?  Of course, if the HTTP
protocol were modified, and all servers upgraded to provide such
checksums on demand, then squid's sister could simply ask for the
checksum (in the way that one can request the last-modified date).
But that's a very long-term kind of thing.
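
As a minimal sketch of the name + size + checksum comparison (this is
not Squid code; the Fingerprint type and fingerprint() helper are
hypothetical names invented for the illustration), something like the
following would tell apart two files that agree on name and size.  Note
that it needs the full body in hand to compute the digest, which is
exactly the catch described above.

```python
import hashlib
from typing import NamedTuple


class Fingerprint(NamedTuple):
    """Hypothetical identity record for a cached object."""
    name: str
    size: int
    md5: str


def fingerprint(name: str, body: bytes) -> Fingerprint:
    # The digest is computed over the whole body, so the object must
    # already have been downloaded before it can be fingerprinted.
    return Fingerprint(name, len(body), hashlib.md5(body).hexdigest())


# Two made-up pages with the same name and the same length.
page_x = b"<html><body>Server X</body></html>"
page_y = b"<html><body>Server Y</body></html>"

fx = fingerprint("index.html", page_x)
fy = fingerprint("index.html", page_y)

same_name_and_size = (fx.name, fx.size) == (fy.name, fy.size)
same_object = fx == fy   # also compares the MD5 digests

print("name and size match:", same_name_and_size)
print("treated as the same object:", same_object)
```

Here the name-and-size test alone would wrongly merge the two pages,
while the added digest keeps them apart.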

C.

--
This is list (humbug) general handled by majordomo at lists.humbug.org.au .
Postings only from subscribed addresses of lists general or general-post.