How to resume interrupted download of specific part of file automatically in curl?
It's clear that server doesn't support byte ranges
Check if curl incremental (--continue-at) download is successful
A HTTP 416 ("requested range not satisfiable") with -C -
is a reasonable response when your file is already complete: The range of content after the current size of the file is an empty set, so while a server could return a 0-byte successful response, it can also state that no response is possible, which is what you're seeing here.
One approach you can take, if your service supports the Content-Length header, is extracting the intended file size from a HEAD request, and comparing that to the current size on-disk:
dest="$HOME/Downloads/$savePath/$filename"
if [[ -e $dest ]]; then
remote_size=$(curl -I "$url" | awk -F: '/^Content-Length:/ { print $2 }')
local_size=$(stat --format=%s "$dest")
if ! [[ $remote_size ]]; then
echo "Unable to retrieve remote size: Server does not provide Content-Length" >&2
elif ! [[ $local_size ]]; then
echo "Unable to check local size: Validate that GNU stat is installed" >&2
elif (( remote_size == local_size )); then
echo "File is complete" >&2
elif (( remote_size > local_size )); then
echo "Download is incomplete -- can probably resume" >&2
elif (( remote_size < local_size )); then
echo "Remote file shrunk -- probably should delete local and start over" >&2
fi
else
echo "File does not exist locally at all" >&2
fi
Note that stat --format
is a GNU extension. If you're running on MacOS, you can install GNU stat as gstat
via MacPorts; see BashFAQ #87 for a detailed discussion on extracting metadata if you don't have GNU tools.
Resuming curl download that goes through some bash script
The curl command you're running there doesn't download the VM images. It downloads a bash script called ievms.sh
and then pipes the script to bash
, which executes it.
Looking at the script, it looks like the file it downloads for IE10 is here:
http://virtualization.modern.ie/vhd/IEKitV1_Final/VirtualBox/OSX/IE10_Win8.zip
I think if you download that file (you could use your browser or curl) and put it in ~/.ievms
, and then run the command again, it should see that the file has already been downloaded and finish the installation.
If the partially-downloaded file is already there, then you could resume that download with this command:
curl -L "http://virtualization.modern.ie/vhd/IEKitV1_Final/VirtualBox/OSX/IE10_Win8.zip" \
-C - -o ~/.ievms/IE10_Win8.zip
(Then run the original IEVMs curl command to finish installation.)
Does curl -C, --continue-at work when piping standard out?
From testing, curl will not retry using a range request.
I wrote a broken HTTP server, requiring the client to retry using a range-request to get a full response. Using wget
wget -O - http://127.0.0.1:8888/ | less
results in the full response
abcdefghijklmnopqrstuvwxyz
and I can see on the server side there way a request with 'Range': 'bytes=24-'
in the request headers.
However, using curl
curl --retry 9999 --continue-at - http://127.0.0.1:8888/ | less
results in only the incomplete response, and no range request in the server log.
abcdefghijklmnopqrstuvwx
The Python server used
import asyncio
import re
from aiohttp import web
async def main():
data = b'abcdefghijklmnopqrstuvwxyz'
async def handle(request):
print(request.headers)
# A too-short response with an exception that will close the
# connection, so the client should retry
if 'Range' not in request.headers:
start = 0
end = len(data) - 2
data_to_send = data[start:end]
headers = {
'Content-Length': str(len(data)),
'Accept-Ranges': 'bytes',
}
print('Sending headers', headers)
print('Sending data', data_to_send)
response = web.StreamResponse(
headers=headers,
status=200,
)
await response.prepare(request)
await response.write(data_to_send)
raise Exception()
# Any range request
match = re.match(r'^bytes=(?P<start>\d+)-(?P<end>\d+)?$', request.headers['Range'])
start = int(match['start'])
end = \
int(match['end']) + 1 if match['end'] else \
len(data)
data_to_send = data[start:end + 1]
headers = {
'Content-Range': 'bytes {}-{}/{}'.format(start, end - 1, len(data)),
'Content-Length': str(len(data_to_send)),
}
print('Sending headers', headers)
print('Sending data', data_to_send)
response = web.StreamResponse(
headers=headers,
status=206
)
await response.prepare(request)
await response.write(data_to_send)
await response.write_eof()
return response
app = web.Application()
app.add_routes([web.get(r'/', handle)])
runner = web.AppRunner(app)
await runner.setup()
site = web.TCPSite(runner, '0.0.0.0', 8888)
await site.start()
await asyncio.Future()
asyncio.run(main())
Parallel download using Curl command line utility
Well, curl
is just a simple UNIX process. You can have as many of these curl
processes running in parallel and sending their outputs to different files.
curl
can use the filename part of the URL to generate the local file. Just use the -O
option (man curl
for details).
You could use something like the following
urls="http://example.com/?page1.html http://example.com?page2.html" # add more URLs here
for url in $urls; do
# run the curl job in the background so we can start another job
# and disable the progress bar (-s)
echo "fetching $url"
curl $url -O -s &
done
wait #wait for all background jobs to terminate
Related Topics
Different File Owner Inside Docker Container and in Host MAChine
Dbd::Oracle Installation Causing Error
How to Delete Multiple Files at Once in Bash on Linux
How to Have Tcpdump Write to File and Standard Output the Appropriate Data
How to Create a Script to Save and Restore Permissions
Sanitize Environment with Command or Bash Script
How to Add My Own Software to a Buildroot Linux Package
Why Do I Get Permission Denied When I Try Use "Make" to Install Something
How to Test for If Two Files Exist
How to Append to a File Using X86-64 Linux System Calls
How to Replace Just One Newline Between > and < in Unix
Are Message Queues Obsolete in Linux
Find Files in Created Between a Date Range
How to Insert a New Line in Linux Shell Script