How to Resume Interrupted Download Automatically in Curl

How can I resume an interrupted download of a specific part of a file automatically in curl?

It's clear that the server doesn't support byte ranges.

Check if curl incremental (--continue-at) download is successful

An HTTP 416 ("Requested Range Not Satisfiable") with -C - is a reasonable response when your file is already complete: the range starting at the current size of the file is an empty set, so while a server could return a 0-byte successful response, it is equally entitled to state that no satisfiable range exists, which is what you're seeing here.
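If the goal is simply to keep retrying until the file is complete, a small wrapper loop around -C - is a common pattern. Here is a minimal sketch, assuming the server honors Range requests; the URL and destination are hypothetical placeholders. The attempt cap matters because, once the file is complete, a resumed request can come back as 416, which --fail reports as a non-zero exit:

url="http://example.com/big.iso"      # hypothetical URL
dest="$HOME/Downloads/big.iso"        # hypothetical destination
tries=0
# keep resuming with -C - until curl exits successfully
until curl -fL -C - -o "$dest" "$url"; do
    rc=$?
    tries=$((tries + 1))
    if (( tries >= 20 )); then
        echo "giving up after $tries attempts (last curl exit code: $rc)" >&2
        break
    fi
    echo "transfer interrupted (exit $rc); retrying in 5s..." >&2
    sleep 5
done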

One approach you can take, if the server provides a Content-Length header, is extracting the intended file size with a HEAD request and comparing it to the current size on disk:

dest="$HOME/Downloads/$savePath/$filename"
if [[ -e $dest ]]; then
remote_size=$(curl -I "$url" | awk -F: '/^Content-Length:/ { print $2 }')
local_size=$(stat --format=%s "$dest")
if ! [[ $remote_size ]]; then
echo "Unable to retrieve remote size: Server does not provide Content-Length" >&2
elif ! [[ $local_size ]]; then
echo "Unable to check local size: Validate that GNU stat is installed" >&2
elif (( remote_size == local_size )); then
echo "File is complete" >&2
elif (( remote_size > local_size )); then
echo "Download is incomplete -- can probably resume" >&2
elif (( remote_size < local_size )); then
echo "Remote file shrunk -- probably should delete local and start over" >&2
fi
else
echo "File does not exist locally at all" >&2
fi
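In the "Download is incomplete" branch above, the resume itself is just a plain -C - transfer to the same destination; a minimal sketch reusing the same $url and $dest:

# hypothetical resume step for the "can probably resume" branch:
# -C - makes curl check the current size of $dest and request only
# the remaining bytes via a Range header
curl -L -C - -o "$dest" "$url"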

Note that stat --format is a GNU extension. If you're running on macOS, you can install GNU stat as gstat via MacPorts; see BashFAQ #87 for a detailed discussion of extracting file metadata when you don't have GNU tools.
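If GNU stat isn't an option, one portable fallback for the size check is wc -c, which prints a byte count that bash arithmetic accepts despite the leading whitespace some implementations emit:

# portable byte-count fallback (POSIX wc); the < redirection keeps
# the filename out of the output
local_size=$(wc -c < "$dest")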

Resuming curl download that goes through some bash script

The curl command you're running there doesn't download the VM images. It downloads a bash script called ievms.sh and then pipes the script to bash, which executes it.

Looking at the script, the file it downloads for IE10 appears to be:

http://virtualization.modern.ie/vhd/IEKitV1_Final/VirtualBox/OSX/IE10_Win8.zip

I think if you download that file (you could use your browser or curl) and put it in ~/.ievms, and then run the command again, it should see that the file has already been downloaded and finish the installation.
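For example, a minimal sketch of that manual step (assuming ~/.ievms doesn't exist yet):

# create the directory ievms expects, then download the image into it
mkdir -p ~/.ievms
curl -L -o ~/.ievms/IE10_Win8.zip \
    "http://virtualization.modern.ie/vhd/IEKitV1_Final/VirtualBox/OSX/IE10_Win8.zip"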

If the partially-downloaded file is already there, then you could resume that download with this command:

curl -L "http://virtualization.modern.ie/vhd/IEKitV1_Final/VirtualBox/OSX/IE10_Win8.zip" \
    -C - -o ~/.ievms/IE10_Win8.zip

(Then run the original IEVMs curl command to finish installation.)

Does curl -C, --continue-at work when piping to standard output?

From testing, curl will not retry using a range request.

To test this, I wrote a deliberately broken HTTP server that forces the client to retry with a Range request in order to receive the full response. Using wget

wget -O - http://127.0.0.1:8888/ | less

results in the full response

abcdefghijklmnopqrstuvwxyz

and on the server side I can see a request arriving with 'Range': 'bytes=24-' in its headers.

However, using curl

curl --retry 9999 --continue-at - http://127.0.0.1:8888/ | less

results in only the incomplete response, and no range request in the server log.

abcdefghijklmnopqrstuvwx

The Python server used:

import asyncio
import re
from aiohttp import web

async def main():
    data = b'abcdefghijklmnopqrstuvwxyz'

    async def handle(request):
        print(request.headers)

        # A too-short response followed by an exception that closes the
        # connection, so the client should retry
        if 'Range' not in request.headers:
            start = 0
            end = len(data) - 2
            data_to_send = data[start:end]
            headers = {
                'Content-Length': str(len(data)),
                'Accept-Ranges': 'bytes',
            }
            print('Sending headers', headers)
            print('Sending data', data_to_send)
            response = web.StreamResponse(
                headers=headers,
                status=200,
            )
            await response.prepare(request)
            await response.write(data_to_send)
            raise Exception()

        # Any range request; end is computed as an exclusive bound
        match = re.match(r'^bytes=(?P<start>\d+)-(?P<end>\d+)?$', request.headers['Range'])
        start = int(match['start'])
        end = \
            int(match['end']) + 1 if match['end'] else \
            len(data)
        data_to_send = data[start:end]
        headers = {
            'Content-Range': 'bytes {}-{}/{}'.format(start, end - 1, len(data)),
            'Content-Length': str(len(data_to_send)),
        }
        print('Sending headers', headers)
        print('Sending data', data_to_send)
        response = web.StreamResponse(
            headers=headers,
            status=206,
        )
        await response.prepare(request)
        await response.write(data_to_send)
        await response.write_eof()
        return response

    app = web.Application()
    app.add_routes([web.get(r'/', handle)])

    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, '0.0.0.0', 8888)
    await site.start()
    await asyncio.Future()

asyncio.run(main())
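If you do need resume-on-retry behavior while ultimately feeding a pipe, one workaround is to let curl resume into a regular file in a loop and only hand the data to the pipe once the transfer completes. A minimal sketch against the same test server:

# resume into a temp file until curl reports success, then pipe it on;
# -f makes an aborted transfer exit non-zero so the loop retries,
# and -C - picks up from the bytes already written to $tmp
tmp=$(mktemp)
until curl -f -C - -o "$tmp" "http://127.0.0.1:8888/"; do
    sleep 1
done
less < "$tmp"
rm -f "$tmp"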

Parallel download using Curl command line utility

Well, curl is just an ordinary UNIX process, so you can run as many curl processes in parallel as you like, each writing its output to a different file.

curl can derive the local filename from the last path segment of the URL: just use the -O option (see man curl for details).

You could use something like the following:

urls=(
    "http://example.com/page1.html"
    "http://example.com/page2.html"
    # add more URLs here
)

for url in "${urls[@]}"; do
    # run each curl job in the background so the next can start,
    # and silence the progress meter (-s)
    echo "fetching $url"
    curl -s -O "$url" &
done
wait # wait for all background jobs to terminate
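As a side note, curl 7.66.0 and later can manage the concurrency itself via -Z / --parallel, so the shell backgrounding above becomes optional; a minimal sketch with placeholder URLs:

# curl 7.66.0+: let curl run up to 4 transfers at once
curl --parallel --parallel-max 4 \
    -O "http://example.com/page1.html" \
    -O "http://example.com/page2.html"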

