How to Wget the Most Recent File of a Directory

How to wget the most recent file of a directory

The files appear to be sorted by release date, and each new release is a new entry whose name reflects the version number change, so checking the timestamps of a particular file seems unnecessary.

Also, you have provided a link to a "directory", which is essentially a web page. AFAIK, there is no such thing as a directory in HTTP (which is a communication protocol that serves you data at a given address). What you see is a listing generated by the server that resembles Windows folders for ease of use, but it is still a web page.

That said, you can scrape that web page. The following code downloads the file in the first position of the listing (assuming the first one is the most recent):

#!/bin/bash

wget -q -O tmp.html http://www.rstudio.org/download/daily/desktop/ubuntu64/
RELEASE_URL=`grep -m 1 -o -E "https[^<>]*amd64\.deb" tmp.html | head -1`
rm tmp.html

# TODO Check if the old package name is the same as in RELEASE_URL.

# If not, then get the new version.
wget -q "$RELEASE_URL"

Now you can check it against your local most-recent version, and install if necessary.

EDIT: Updated version, which does simple version checking and installs the package.

#!/bin/bash

MY_PATH=`dirname "$0"`
RES_DIR="$MY_PATH/res"

# Piping from stdout suggested by Chirlo.
RELEASE_URL=`wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64/ | grep -m 1 -o "https[^']*"`

if [ "$RELEASE_URL" == "" ]; then
    echo "Package index not found. Maybe the server is down?"
    exit 1
fi

mkdir -p "$RES_DIR"
NEW_PACKAGE=${RELEASE_URL##https*/}
OLD_PACKAGE=`ls "$RES_DIR"`

if [ "$OLD_PACKAGE" == "" ] || [ "$OLD_PACKAGE" != "$NEW_PACKAGE" ]; then

    cd "$RES_DIR" || exit 1
    rm -f $OLD_PACKAGE

    echo "New version found. Downloading..."
    wget -q "$RELEASE_URL"

    if [ ! -e "$NEW_PACKAGE" ]; then
        echo "Package not found."
        exit 1
    fi

    echo "Installing..."
    sudo dpkg -i "$NEW_PACKAGE"

else
    echo "rstudio up to date."
fi

And a couple of comments:

  • The script keeps a local res/ dir with the latest version (exactly
    one file) and compares its name with the newly scraped package name.
    This is dirty (having the file around doesn't mean it was ever
    successfully installed). It would be better to parse the output of
    dpkg -l, but the name of the package might differ slightly from the
    scraped one; see the sketch after this list.
  • You will still need to enter the
    password for sudo, so it won't be 100% automatic. There are a few
    ways around this, though without supervision you might encounter the
    previously stated problem.
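
If you would rather ask dpkg directly instead of keeping the res/ marker file, a minimal sketch along these lines could replace the name comparison. The package name rstudio and the version-in-filename pattern are assumptions, and, as noted above, the version dpkg reports may not match the filename exactly:

#!/bin/bash

# Sketch: compare the installed version against the scraped one.
# Assumes the package registers itself as "rstudio" in dpkg and that the
# version appears in the filename (e.g. rstudio-0.98.123-amd64.deb).

RELEASE_URL=`wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64/ | grep -m 1 -o "https[^']*"`
NEW_PACKAGE=${RELEASE_URL##https*/}

# Pull a version string out of the filename (pattern is an assumption).
NEW_VERSION=`echo "$NEW_PACKAGE" | grep -o "[0-9][0-9.]*[0-9]" | head -1`

# Ask dpkg for the installed version; empty if rstudio is not installed.
OLD_VERSION=`dpkg-query -W -f='${Version}' rstudio 2>/dev/null`

if [ -z "$OLD_VERSION" ] || dpkg --compare-versions "$OLD_VERSION" lt "$NEW_VERSION"; then
    echo "Newer version $NEW_VERSION available (installed: ${OLD_VERSION:-none})."
else
    echo "rstudio up to date."
fi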

Tell wget to only download the most recent file in a directory?

You can try to download the latest file by listing the directory, sorting the matching filenames, and taking the last one:

wget ftp://ftp-cdc.dwd.de/pub/CDC/grids_germany/hourly/radolan/recent/$(wget -O- ftp://ftp-cdc.dwd.de/pub/CDC/grids_germany/hourly/radolan/recent/ | egrep -o 'raa01-rw_10000-[0-9\.]+\-dwd---bin.gz' | sort | tail -1)

Download latest file using wget

This should do the trick:

wget https://myserver@my.domain/ssl/$(grep -oE "^grafana[^[:space:]]+" downloaded_file.txt | sort | tail -n 1)

Downloading Most Recent File On Server Via wget

If there is an index of all the files, you could first download that and then parse it to find the most recent file.

If that is not possible, you could count backwards from the current time (using date +%M in addition to date +%H) and stop once wget manages to get the file (i.e. once wget exits with 0).
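
A rough sketch of that backwards search, purely as an illustration (the base URL and the file_HHMM.grib2 naming scheme are made up, and GNU date is assumed for the -d option):

#!/bin/bash

# Step back one minute at a time until wget manages to fetch a file.
# BASE_URL and the file_HHMM.grib2 pattern are placeholders.
BASE_URL="http://example.com/data"

for i in $(seq 0 120); do
    STAMP=$(date -d "-$i minutes" +%H%M)   # GNU date: i minutes ago
    if wget -q "${BASE_URL}/file_${STAMP}.grib2"; then
        echo "Got file_${STAMP}.grib2"
        break
    fi
done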

Hope it helps!


Example to parse the index:

filename=`wget -q -O - http://thredds.ucar.edu/thredds/catalog/grib/nexrad/composite/unidata/NEXRAD_Unidata_Reflectivity-20140501/files/catalog.html | grep '<a href=' | head -1 | sed -e 's/.*\(Level3_Composite_N0R_[0-9]*_[0-9]*.grib2\).*/\1/'`

This fetches the page and runs the first line containing an <a href= through a quick sed to extract the filename.

Downloading Only Newest File Using Wget / Curl

You can just run the following command periodically:

wget -r -nc --level=1 http://mrms.ncep.noaa.gov/data/2D/RotationTrackML1440min/

It will recursively download whatever is new in the directory since the last run (-nc tells wget not to re-download files that already exist locally).
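
To run it periodically you could, for instance, drop it into a crontab; the schedule and target directory here are only an illustration:

# Run hourly and keep the files under /data/mrms (path is an assumption).
0 * * * * wget -q -r -nc --level=1 -P /data/mrms http://mrms.ncep.noaa.gov/data/2D/RotationTrackML1440min/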

Using wget to recursively fetch a directory with arbitrary files in it

You have to pass the -np/--no-parent option to wget (in addition to -r/--recursive, of course), otherwise it will follow the link in the directory index on my site to the parent directory. So the command would look like this:

wget --recursive --no-parent http://example.com/configs/.vim/

To avoid downloading the auto-generated index.html files, use the -R/--reject option:

wget -r -np -R "index.html*" http://example.com/configs/.vim/

How to specify the download location with wget?

From the manual page:

-P prefix
--directory-prefix=prefix
Set directory prefix to prefix. The directory prefix is the
directory where all other files and sub-directories will be
saved to, i.e. the top of the retrieval tree. The default
is . (the current directory).

So you need to add -P /tmp/cron_test/ (short form) or --directory-prefix=/tmp/cron_test/ (long form) to your command. Also note that if the directory does not exist it will get created.
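
For example (the URL is only a placeholder):

# Save the download under /tmp/cron_test/ instead of the current directory.
wget -P /tmp/cron_test/ http://example.com/file.tar.gz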

How to download latest file from remote repository using shell script?

You could perhaps do something like this:

#!/bin/bash

# base path to server
base="http://path/to/file/"
# get files contained in base dir (lynx lists full URLs, one per line)
lynx -dump -listonly "${base}" | grep http | grep -E '1\.0\.[0-9]+\.zip' | awk '{print $2}' > .tmpfiles
# get latest version number (x), sorting numerically
version=$(awk -F'.' '{ print $3 }' .tmpfiles | sort -n | tail -n1)
# get URL of latest build
latest=$(grep "1\.0\.${version}\.zip" .tmpfiles)
# download latest file (the list already contains full URLs)
wget -qO - "${latest}" > .tmp.zip && unzip .tmp.zip

You could easily modify this so that you can pass the version number as a parameter to the script.
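
A minimal way to do that, falling back to the newest version found in the listing when no argument is given (this assumes the same .tmpfiles listing built above):

# Use the first script argument as the version if given, otherwise
# fall back to the highest version found in .tmpfiles.
version="${1:-$(awk -F'.' '{ print $3 }' .tmpfiles | sort -n | tail -n1)}"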

But it would be a lot easier if the builds were stored on an FTP server, since then you could use a simple FTP script.
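
For instance, wget can expand globs in FTP URLs, so something like the following would fetch every matching build (the host and path are hypothetical); combined with the sort trick shown earlier, it could be narrowed to just the newest one:

# Quoting keeps the shell from expanding the * before wget sees it;
# globbing like this only works for FTP URLs.
wget "ftp://ftp.example.com/builds/1.0.*.zip"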


