Does Wget Time Out?

Does wget time out?

According to the wget man page, there are a couple of options related to timeouts -- and there is a default read timeout of 900s -- so I'd say that, yes, it could time out.


Here are the options in question:

-T seconds
--timeout=seconds

Set the network timeout to seconds seconds. This is equivalent to specifying --dns-timeout, --connect-timeout, and --read-timeout, all at the same time.


And for those three options:

--dns-timeout=seconds

Set the DNS lookup timeout to seconds seconds. DNS lookups that don't complete within the specified time will fail. By default, there is no timeout on DNS lookups, other than that implemented by system libraries.

--connect-timeout=seconds

Set the connect timeout to seconds seconds. TCP connections that take longer to establish will be aborted. By default, there is no connect timeout, other than that implemented by system libraries.

--read-timeout=seconds

Set the read (and write) timeout to seconds seconds. The "time" of this timeout refers to idle time: if, at any point in the download, no data is received for more than the specified number of seconds, reading fails and the download is restarted. This option does not directly affect the duration of the entire download.
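For example, given the equivalence stated under -T/--timeout above, these two invocations should behave identically (www.example.com stands in for a real URL):

wget --timeout=10 http://www.example.com/
wget --dns-timeout=10 --connect-timeout=10 --read-timeout=10 http://www.example.com/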


I suppose using something like

wget -O - -q -t 1 --timeout=600 http://www.example.com/cron/run

should make sure there is no timeout shorter than the duration of your script.

(Yeah, that's probably the most brutal solution possible ^^ )
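If you only want to relax the idle timeout while keeping DNS lookup and connection failures fast, a slightly gentler variant (a sketch along the same lines) sets just --read-timeout:

wget -O - -q -t 1 --read-timeout=600 http://www.example.com/cron/run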

wget using --timeout and --tries together

Wrong Number of Retries

Your wget seems to resolve the URL to multiple IP addresses, as seen in the second line of your wget's output. Each IP is then tested with the specified timeout. Unfortunately I haven't found any option to limit the DNS lookup to one address or to set a total timeout covering all IPs together. But you could try to use "<Google's IP address>:81/not-there" instead of the domain name.

To automatically resolve the domain to a single IP address you can use

wget "http://$(getent hosts www.google.com | sed 's/ .*//;q'):81/not-there"

Seemingly Too Long Timeout

As you already found out, setting --retry-connrefused lets wget retry even after a »connection refused« error. The specified timeout is used for each retry, but between the retries there will be a pause which gets longer after each retry.

Example

wget --timeout=1 --tries=5 --retry-connrefused URL

does something like

try to connect for 1 second
failed -> wait 1 second
try to connect for 1 second
failed -> wait 2 seconds
try to connect for 1 second
failed -> wait 3 seconds
try to connect for 1 second
failed -> wait 4 seconds
try to connect for 1 second

Therefore the command takes tries * timeout + 1 + 2 + ... + (tries - 1) seconds; with timeout=1 and tries=5 that is 5 * 1 + (1 + 2 + 3 + 4) = 15 seconds. This behavior is specified in man wget under the --waitretry option, which also lets you change it :)

--waitretry=seconds

If you don't want Wget to wait between every retrieval, but only between retries of failed downloads, you can use this option. Wget will use linear backoff, waiting 1 second after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify.

By default, Wget will assume a value of 10 seconds.

I think you wanted to use something like

wget --timeout=1 --waitretry=0 --tries=5 --retry-connrefused URL

which eliminates the pauses between retries, resulting in a total time of roughly timeout * tries.
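To see the difference, you can time both variants against a port that refuses connections (a sketch; it assumes nothing listens on port 81 of localhost, so every attempt fails immediately with »connection refused« and only the pauses between retries contribute):

time wget -q --tries=5 --retry-connrefused http://localhost:81/not-there
time wget -q --waitretry=0 --tries=5 --retry-connrefused http://localhost:81/not-there

The first run should take roughly 1 + 2 + 3 + 4 = 10 seconds of backoff; the second should return almost immediately.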

Limit wget wait time on non-responsive URLs

From the wget man page:

-t number
--tries=number
Set number of tries to number. Specify 0 or inf for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like "connection refused" or "not found" (404), which are not retried.

-T seconds
--timeout=seconds
Set the network timeout to seconds seconds. This is equivalent to specifying --dns-timeout, --connect-timeout, and --read-timeout, all at the same time.

When interacting with the network, Wget can check for timeout and abort the operation if it takes too long. This prevents anomalies like hanging reads and infinite connects. The only timeout enabled by default is a 900-second read timeout. Setting a timeout to 0 disables it altogether. Unless you know what you are doing, it is best not to change the default timeout settings.

All timeout-related options accept decimal values, as well as subsecond values. For example, 0.1 seconds is a legal (though unwise) choice of timeout. Subsecond timeouts are useful for checking server response times or for testing network latency.

So you can do e.g.

wget -H -k -e robots=on -P ~/data/quails/train/cail_quail/ -i "http://image-net.org/api/text/imagenet.synset.geturls?wnid=n01804478" -T 5 -t 1

It'll time out after 5 seconds and not retry.
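And since the man page excerpt above says subsecond values are accepted, a crude response-time probe could look like this (a sketch; 0.2 seconds is an arbitrary illustrative threshold):

wget -q -T 0.2 -t 1 -O /dev/null http://www.example.com/ && echo "responded within 200 ms"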

handling timeout in wget

You probably want to use a log file like this:

#!/bin/bash

TARGET="$1"

if wget -S -t 1 --timeout=600 --spider https://"${TARGET}" --no-check-certificate > log.txt 2>&1; then
    echo "VALID URL"
else
    error="$(awk 'BEGIN { IGNORECASE=1 } /( failed| error| bad|unable | invalid| unrecognized)/ { err=$0 } END { if(match(err,/http:\/\/: Invalid host name/)) err=""; printf("%s",err) }' log.txt)"
    # Connection reset by peer, Connection timed out, Network is unreachable.
    if printf '%s' "$error" | grep -qiE '( peer| timed| unreachable)'; then
        echo "VALID BUT UNREACHABLE"
    # ERROR 403: Access denied/Forbidden.
    elif printf '%s' "$error" | grep -q ' 403'; then
        echo "FORBIDDEN"
    # Not Found.
    elif printf '%s' "$error" | grep -q ' 404'; then
        echo "NOT FOUND"
    # Fatal error.
    else
        echo "FATAL ERROR"
        echo "$error"
    fi
    exit 1 # Exit the script with an error status.
fi

# Delete the temporary log file.
# rm -f log.txt
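A usage sketch, assuming you saved the script above as check_url.sh (a name chosen here for illustration) and made it executable:

chmod +x check_url.sh
./check_url.sh www.example.com

It prints one of VALID URL, VALID BUT UNREACHABLE, FORBIDDEN, NOT FOUND, or FATAL ERROR (with the matching log line), and exits non-zero for anything other than a valid URL.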

