[H-GEN] Squid addon
Martin Pool
mbp at uq.net.au
Thu Apr 29 08:21:35 EDT 1999
(Note reply-to: being general at humbug.org.au vs Martin Pool <mbp at uq.net.au>)
On Thu, 29 Apr 1999 21:55:08 Craig Eldershaw wrote:
> (Note reply-to: being general at humbug.org.au vs Craig Eldershaw <ce at comlab.ox.ac.uk>)
>
> >> of course you'd have to base it on both the name and file size, and make a
> >> lower limit, there is a remote chance that index.html might be 3,212bytes
> >> at server X+Y
> >
> >Again, as mentioned above, this approach would be unsound, since there may
> >well be a fair probability of two completely different files on the same
> >server having the exact same size and filename.
>
> Agreed, think of the odds: take the number of web servers in the world
> then multiply by number of directories to give an estimate (order of
> magnitude only) of files called index.html. That's an awful lot.
>
> >If such a Squid addon were to be written, I'd imagine it'd cache the file,
> >and generate a unique checksum for comparisons, therefore making the
> >possibility of "mistakes" much lower. However generating checksums
> >on-the-fly would have to be kept relatively quick, so as to not bog
> >down the caching software overly -- I doubt this'd be a problem since
> >squid is probably I/O bound anyhow.
>
> In my limited knowledge (any number theorists on the list ?), using
> name, size and an MD5-like checksum would be almost certain to be
> safe. The problem is, how does squid's sister get the checksum of
> file.x at Y without actually downloading it ? Of course if the HTTP
> protocol was modified, and all servers upgraded to provide such
> checksums on demand, then squid's sister could simply ask for the
> checksum (in the way that one can request the last modified by date).
> But that's a very long-term kind of thing.
There's some kind of support for this in the HTTP/1.1 ETag (entity tag) header. I can imagine, for example, collaborating web mirrors using this header as a way of assuring proxies and clients that they are supplying exactly the same file. However, I don't know of anyone actually doing this. HTTP/1.1 also defines a Content-MD5 header, which carries an MD5 digest of the body and can be used to check integrity and equivalence.
So, there's at least some support in HTTP, but it's not as widely used as it might be.
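
As a rough illustration of the idea (a sketch only, not anything Squid does today), here is how a cache might compare two responses using HTTP/1.1's Content-MD5 header, which is the base64 encoding of the raw MD5 digest of the body, falling back to ETag. The helper names and the plain-dict representation of headers are made up for the example, and comparing ETags across servers assumes the mirrors have agreed on how they generate them.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return a Content-MD5 style value: base64 of the 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def same_entity(headers_a: dict, headers_b: dict) -> bool:
    """Hypothetical helper: guess whether two responses carry the same bytes.

    headers_a and headers_b are plain dicts of response headers
    (an assumption made for this sketch).
    """
    md5_a = headers_a.get("Content-MD5")
    md5_b = headers_b.get("Content-MD5")
    if md5_a and md5_b:
        # Both servers supplied a digest of the body, so compare those.
        return md5_a == md5_b

    etag_a = headers_a.get("ETag")
    etag_b = headers_b.get("ETag")
    # Weak tags (prefixed W/) do not promise byte-for-byte equality,
    # so only strong tags are treated as evidence of identical content.
    if etag_a and etag_b and not etag_a.startswith("W/"):
        return etag_a == etag_b

    # Without a digest or a strong tag on both sides, assume "different".
    return False
```

With something like this, a proxy could issue a HEAD request to the second server and compare headers before deciding whether its cached copy from the first server will do.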
--
Martin Pool
(My first message through balsa... cross your fingers!)
--
This is list (humbug) general handled by majordomo at lists.humbug.org.au .
Postings only from subscribed addresses of lists general or general-post.