How to Check If There Is a Wget Instance Running

How to check results of wget/urllib2 in Python?

To check a URL:

import urllib2

def check(url):
    try:
        urllib2.urlopen(url).read()
    except EnvironmentError:
        return False
    else:
        return True

To find out what kind of error occurred, inspect the exception instance.
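On Python 3, where urllib2's contents moved into urllib.request and urllib.error, the same check can report which failure mode occurred; a sketch (HTTPError must be caught before URLError, since it is a subclass of it):

```python
import urllib.request
import urllib.error

def check_verbose(url):
    # Distinguish the failure modes instead of returning a bare False.
    try:
        urllib.request.urlopen(url).read()
    except urllib.error.HTTPError as e:
        # The server replied, but with an error status (404, 500, ...).
        print("HTTP error: %d" % e.code)
        return False
    except urllib.error.URLError as e:
        # The server could not be reached at all (DNS failure, refused
        # connection, ...); e.reason holds the underlying cause.
        print("Failed to reach server: %s" % e.reason)
        return False
    return True
```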

Python Wget: Check for duplicate files and skip if it exists?

wget.download() doesn't have any such option. The following workaround should do the trick for you:

import subprocess

url = "https://url/to/index.html"
path = "/path/to/save/your/files"
subprocess.run(["wget", "-r", "-nc", "-P", path, url])

If the file is already there, you will get the following message:

File ‘index.html’ already there; not retrieving.

EDIT:
If you are running this on Windows, you may also need to pass shell=True:

subprocess.run(["wget", "-r", "-nc", "-P", path, url], shell=True)
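If you only need single files (no -r recursion), the skip can also be done in pure Python before calling wget.download(). A sketch, assuming the target filename is simply the last path segment of the URL (wget may pick a different name for some URLs):

```python
import os

def download_if_missing(url, directory):
    # Derive the local filename from the URL's last path segment
    # (an assumption; adjust if the server names files differently).
    filename = os.path.join(directory, url.rsplit("/", 1)[-1])
    if os.path.exists(filename):
        print("File '%s' already there; not retrieving." % filename)
        return filename
    import wget  # third-party package: pip install wget
    return wget.download(url, out=filename)
```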

How to check which file wget is downloading while script is working in background?

The ps command will tell you whether wget is still running or has crashed.

ls -alt in the download directory lists files in order of modification time, so the top entry is the most recently written file. Run the command a few more times: if that file's size keeps changing, the download is still in progress.
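The same inspection can be scripted; a sketch in Python that picks the most recently modified file (like the top entry of ls -alt) and samples its size to see whether it is still growing:

```python
import os
import time

def newest_file(directory):
    # Most recently modified regular file, i.e. the top entry of `ls -alt`.
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [p for p in paths if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None

def still_growing(path, interval=2.0):
    # Sample the file size twice; if it changed, a download is in progress.
    before = os.path.getsize(path)
    time.sleep(interval)
    return os.path.getsize(path) != before
```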

Checking if process still running?

I'd use pgrep to do this (caution, untested code):



exec("pgrep lighttpd", $pids);
if (empty($pids)) {
    // lighttpd is not running!
}

I have a bash script that does something similar (but with SSH tunnels):



#!/bin/sh

MYSQL_TUNNEL="ssh -f -N -L 33060:127.0.0.1:3306 tunnel@db"
RSYNC_TUNNEL="ssh -f -N -L 8730:127.0.0.1:873 tunnel@db"

# MYSQL
if [ -z "`pgrep -f -x "$MYSQL_TUNNEL"`" ]
then
    echo Creating tunnel for MySQL.
    $MYSQL_TUNNEL
fi

# RSYNC
if [ -z "`pgrep -f -x "$RSYNC_TUNNEL"`" ]
then
    echo Creating tunnel for rsync.
    $RSYNC_TUNNEL
fi


You could alter this script with the commands that you want to monitor.
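Since the question that opened this page was about wget, the same pgrep check can also be done from Python; a sketch (pgrep exits 0 when at least one process matches, 1 when none do):

```python
import subprocess

def is_running(name):
    # pgrep -x matches the process name exactly; exit code 0 means a match.
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0
```

is_running("wget") then tells you whether a wget instance is currently running.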

wget force retry until there is a connection

This loop should do this:

while true; do
    wget -T 15 -c http://example.com && break
done


How it works:

  1. While there is no network connection, wget fails immediately, so the loop never breaks: it keeps re-running the command and printing the error message.
  2. As soon as the machine gets connected to the Internet, wget resolves the host and starts fetching the files.
  3. If the connection is then lost or some other error occurs, wget's own retry logic (leave it at the default limited value; don't set --tries to 0 or inf, i.e. unlimited) retries until the 15-second timeout (-T 15) is reached. wget then exits with an error, && break is skipped, and the loop runs it again, putting you back at step 1.
  4. Once the connection returns or the error clears, wget again resolves the host and fetches the files. Steps 1-4 repeat until the files are downloaded completely.
  5. The -c (continue) option makes each new wget instance resume from where the previous one left off.
  6. When the download completes and wget succeeds, && break ends the loop.
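The same retry-until-success pattern can be mirrored in Python if you are driving wget from a script; a sketch with a generic helper (the helper name is hypothetical; wget's own -T and -c flags still do the heavy lifting):

```python
import subprocess
import time

def run_until_success(cmd, pause=1.0):
    # Re-run the command until it exits 0, like the shell `while` loop above.
    while True:
        if subprocess.run(cmd).returncode == 0:
            return
        time.sleep(pause)  # brief pause so failures don't busy-loop

# run_until_success(["wget", "-T", "15", "-c", "http://example.com"])
```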

WGET seems not to work with user data on AWS EC2 launch

You have no way of knowing the working directory when the user-data script executes the cd command, so specify the full path:

cd /home/centos/testing

Try this:

#!/bin/bash
mkdir /home/centos/testing
cd /home/centos/testing
wget https://validlink

