[H-GEN] Want squid to automatically update certain sites every 10 mins

Shane Ravenn sravenn at optusnet.com.au
Sat Mar 22 20:35:45 EST 2003


On Sun, 23 Mar 2003 09:49:34 +1000
"t" <s4565 at lycos.co.uk> wrote:

> Thanks for the wget suggestion.  I have just been trying it out but I
> cannot seem to download the images.  If I do
> 
> wget  -nd --delete-after   "www.smh.com.au"
> 
> then it downloads ok, but no images.  If I do
> 
> wget  -r -nd --delete-after  --level=1  "www.smh.com.au"
> 
> then it downloads both text and images but it doesn't seem to respect
> the level command and it downloads heaps of text and images which are
> not from that page.
> 
> Any ideas?

Hi

One of the best options for this is the following, taken directly from
the man page:
 -p
 --page-requisites
 This option causes Wget to download all the files that are necessary to
 properly display a given HTML page.  This includes such things as 
 inlined images, sounds, and referenced stylesheets.

 Ordinarily, when downloading a single HTML page, any requisite
 documents that may be needed to display it properly are not downloaded.
 Using -r together with -l can help, but since Wget does not ordinarily
 distinguish between external and inlined documents, one is generally
 left with ``leaf documents'' that are missing their requisites.

Basically, it grabs all items needed to show the page.
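
As a rough sketch of how that could look in practice: the function name
prefetch_page below is made up for illustration, and the flags are just
-p added to the ones you were already using. I have not tried it against
that particular site, so treat it as a starting point.

```shell
#!/bin/sh
# prefetch_page is a hypothetical helper name, not part of wget.
# It fetches one page plus the files needed to display it, then
# throws the local copies away again.
prefetch_page() {
    # -q              keep wget quiet (handy when run from cron)
    # -p              also fetch page requisites (images, stylesheets)
    # -nd             do not create a directory hierarchy
    # --delete-after  remove each file once it has been downloaded
    wget -q -p -nd --delete-after "$1"
}
```

You would then call it as: prefetch_page "http://www.smh.com.au/"
To run it every 10 minutes, put the call in a small script and run that
from cron. If the fetch has to go through squid, wget honours the
http_proxy environment variable; the host and port to put there depend
on your setup.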

Some sites have odd layouts that confuse wget. That can give the
impression that the --level option is being ignored, or that wget is
fetching from multiple hosts when you only want one. This makes it hard
to suggest a workaround or give a definite answer, but the fun is in
trying things out to see what works for you.

I hope this helps.

Shane Ravenn

-- 
A wise person makes his own decisions, a weak one obeys public opinion. 
 - Chinese proverb