Wget: Don't Follow Redirects

wget: don't follow redirects if it points to location X, otherwise do

As @Thor84no said, one solution is to parse the response. This is mine:

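# Ask wget not to follow any redirect, then pull the target URL out of the "Location" line.
REDIRECTED_TO=$(wget --max-redirect=0 "$ADDRESS" 2>&1 | grep "Location" | sed 's|.*\(https\{0,1\}://[^ ]*\).*|\1|')

# Fetch the target only if it is not the unwanted one.
if [ "$REDIRECTED_TO" != "$BAD_REDIRECTION" ]; then wget "$REDIRECTED_TO"; fi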
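
The same idea can be split into small functions, which also covers the case where there is no redirect at all (the one-liner above would then call wget with an empty URL). This is only a sketch: the function names are made up for illustration, and it assumes wget logs the redirect as a line of the form "Location: URL [following]".

```shell
#!/bin/sh

# extract_location: read wget's log output on stdin and print the first
# redirect target. Assumes the "Location: URL [following]" line format.
extract_location() {
    sed -n 's/^Location: \([^ ]*\).*/\1/p' | head -n 1
}

# redirect_allowed TARGET BAD: succeed only if TARGET is non-empty
# and differs from the unwanted location BAD.
redirect_allowed() {
    [ -n "$1" ] && [ "$1" != "$2" ]
}

# fetch_unless_bad ADDRESS BAD (hypothetical helper): look up where ADDRESS
# redirects to and download the target unless it equals BAD.
fetch_unless_bad() {
    target=$(wget --max-redirect=0 "$1" 2>&1 | extract_location)
    if redirect_allowed "$target" "$2"; then
        wget "$target"
    fi
}
```

It would be called as: fetch_unless_bad "$ADDRESS" "$BAD_REDIRECTION"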

Wget redirects even though robots are off

Interestingly, the example website you provided returns different results depending on the user-agent string. With wget's default user-agent, the server responds with a 301 redirect, and wget ends up downloading only the first page.

You can simply change the user-agent string to make it work, for example:
--user-agent=mozilla
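
To check whether a server really answers differently per user-agent, one could compare the first status code it returns. A rough sketch follows; the helper names are hypothetical, and it assumes the header layout that wget prints with --server-response (status lines such as "HTTP/1.1 301 Moved Permanently").

```shell
#!/bin/sh

# first_status: read `wget --server-response` output on stdin and print
# the status code of the first HTTP response (for example 301 or 200).
first_status() {
    awk '$1 ~ /^HTTP\// { print $2; exit }'
}

# probe URL [AGENT] (hypothetical helper): request URL without following
# redirects or saving anything, and print the first status code.
# Without AGENT, wget's default user-agent is used.
probe() {
    if [ -n "$2" ]; then
        wget --spider --server-response --max-redirect=0 \
             --user-agent="$2" "$1" 2>&1 | first_status
    else
        wget --spider --server-response --max-redirect=0 "$1" 2>&1 | first_status
    fi
}
```

Running probe "$ADDRESS" and then probe "$ADDRESS" mozilla should show whether the 301 goes away with the changed user-agent.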

Wget doesn't download recursively after following a redirect

You need to use --span-hosts (-H) with --domains:

wget --recursive --level=10 --convert-links -H \
--domains=www.btlregion.ru btlregion.ru

--span-hosts allows wget to follow links that point to other hosts, and --domains restricts it to the listed domains, so that it does not end up downloading the whole internet.

Somewhat counterintuitively, the --domains option only takes effect together with -H, because it does not turn on host spanning by itself. This is mentioned in the docs, but in a way that is easy to miss.
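
Since --domains takes a comma-separated list, the bare and the www host can both be listed explicitly, kept apart from the start URL. A minimal sketch under those assumptions: mirror_site and the WGET override variable are hypothetical names, and the override is only there so the command can be previewed without touching the network.

```shell
#!/bin/sh

# mirror_site START_URL DOMAIN_LIST (hypothetical helper): recursive download
# that may cross hosts (--span-hosts) but only within DOMAIN_LIST.
# Set WGET=echo to print the arguments instead of running wget.
mirror_site() {
    "${WGET:-wget}" --recursive --level=10 --convert-links \
        --span-hosts --domains="$2" "$1"
}
```

For example: mirror_site btlregion.ru btlregion.ru,www.btlregion.ru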


