wget breaking with content-disposition
The & symbols are interpreted by the shell as the special character that runs a command in the background (forks it). So you should escape or quote them:
curl -O -J -L 'http://waterwatch.usgs.gov/index.php?m=real&w=kml&r=us&regions=ia'
In the command above we used full quoting.
The following lines from your output mean that three commands are being forked to background:
[1] 32260
[2] 32261
[3] 32262
The numbers on the left (in brackets) are job numbers. You can bring a job to the foreground by typing fg N, where N is the number of the job. The numbers on the right are process IDs.
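As a minimal sketch of the two fixes mentioned above (quoting and escaping), the snippet below passes a query-string fragment to a small function and prints what the shell actually delivered. The helper name show_arg is made up for illustration, and no network access is involved.

```shell
# show_arg (hypothetical helper) prints its first argument exactly as received.
show_arg() {
  printf '%s\n' "$1"
}

# Single quotes: nothing inside is special to the shell, so & stays literal.
show_arg 'm=real&w=kml'   # prints m=real&w=kml

# Backslash: escapes only the single character that follows it.
show_arg m=real\&w=kml    # prints m=real&w=kml
```

Without the quotes or the backslash, the shell would split the line at the & and start the part before it as a background job, which is what produced the [1], [2], [3] lines.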
PHP: Save the file returned by wget with file_put_contents
It's because exec doesn't return the whole output. Take a look at the documentation (https://www.php.net/manual/en/function.exec.php):
Return Values: The last line from the result of the command.
But shell_exec (or just backticks) returns the whole output: https://www.php.net/manual/en/function.shell-exec.php
Example:
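<?php
$url = 'https://file-examples-com.github.io/uploads/2017/02/zip_5MB.zip';
// Quote the URL so the shell cannot misinterpret characters such as & or ?.
$cmd = 'wget -qO- ' . escapeshellarg($url);

// exec() returns only the last line of the command's output.
$content = exec($cmd);
var_dump(strlen($content));

// shell_exec() returns the complete output as one string.
$content2 = shell_exec($cmd);
var_dump(strlen($content2));

file_put_contents('1.zip', $content);
file_put_contents('2.zip', $content2);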
Output:
int(208)
int(5452018)
2.zip works (all 5MB data), 1.zip obviously not (just some bytes from the end).
So don't treat exec's return value as the whole output of the command.
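The same difference can be seen from the shell itself. This is only a rough analogy, not PHP's implementation: taking the last line with tail stands in for what exec returns, and capturing everything stands in for shell_exec. The function three_lines is a made-up stand-in for the real command.

```shell
# three_lines (hypothetical) stands in for any command with multi-line output.
three_lines() {
  printf 'first\nsecond\nthird\n'
}

# Roughly what exec() hands back: only the last line.
last_line=$(three_lines | tail -n 1)

# Roughly what shell_exec() hands back: the whole output.
all_output=$(three_lines)

echo "${#last_line}"    # prints 5
echo "${#all_output}"   # prints 18
```

For binary data such as a zip file, "the last line" is just whatever bytes follow the final newline byte, which is why 1.zip ends up with a few hundred bytes.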
How to make wget save with the proper file name
At the HTTP level, the server sends the filename to the client in the Content-Disposition header field of the HTTP response:
HTTP/1.1 200 OK
[…]
Content-Disposition: attachment; filename="bind-9.9.4-P2.tar.gz";
See RFC 2183 for details on the Content-Disposition header field (RFC 6266 covers its use in HTTP).
wget has experimental support for this feature according to its manpage:
--content-disposition
If this is set to on, experimental (not fully-functional) support for
"Content-Disposition" headers is enabled. This can currently result in
extra round-trips to the server for a "HEAD" request, and is known to
suffer from a few bugs, which is why it is not currently enabled by
default.
So if you choose to enable it, just specify the --content-disposition option. (You could also use curl to do the job instead of wget, but the question was about wget.)
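To make the header handling concrete, here is a minimal sketch of what a client has to extract from that response line. The function filename_from_header is a hypothetical helper, not something wget or curl provides, and it only handles the plain filename= form shown above (not the encoded filename*= variant).

```shell
# filename_from_header (hypothetical helper): pull the filename out of a
# Content-Disposition header line. Simplified: handles filename=name and
# filename="name" only.
filename_from_header() {
  printf '%s\n' "$1" |
    sed -n 's/^[Cc]ontent-[Dd]isposition:.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*/\1/p'
}

filename_from_header 'Content-Disposition: attachment; filename="bind-9.9.4-P2.tar.gz";'
# prints bind-9.9.4-P2.tar.gz
```

In practice you would let the tool do this for you: wget --content-disposition "$url", or curl -O -J -L "$url" as the curl equivalent, where $url is whatever download link you have.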
Efficient parallel downloading and decompressing with matching pattern for list of files on server
Can you list the URLs to download?
listurls() {
  # Do something that lists the URLs without downloading them.
  # Possibly something like:
  #   lynx -listonly -image_links -dump "$starturl"
  # or
  #   wget --spider -r -nH -np -nv -nd -A "${filename}.bz2" "url/${run}/${1,,}/"
  # or
  #   seq 100 | parallel echo ${url}${year}${month}${day}${run}_{}_${id}.grib2
  :  # placeholder: bash rejects an empty function body; replace with one of the above
}

get_and_extract_one() {
  local url="$1"
  local file="$2"
  # Stream the download straight into bzip2, so decompression starts
  # while the data is still arriving.
  wget -O - "$url" | bzip2 -dc > "$file"
}
export -f get_and_extract_one

# {=s:/:_:g; =} will generate a file name from the URL with / replaced by _
# You probably want something nicer.
# Possibly just {/.}
listurls | parallel get_and_extract_one {} '{=s:/:_:g; =}'
This way you decompress while downloading, and everything runs in parallel.
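To see what the two file-naming options would produce before running real downloads, the sketch below imitates them in plain shell. The function names name_flat and name_base and the example URL are made up for illustration; the functions approximate GNU parallel's {=s:/:_:g; =} and {/.} replacement strings rather than calling parallel itself.

```shell
# name_flat (hypothetical): like {=s:/:_:g; =}, every "/" becomes "_".
name_flat() {
  printf '%s\n' "$1" | tr '/' '_'
}

# name_base (hypothetical): like {/.}, the basename with its last extension removed.
name_base() {
  base=${1##*/}
  printf '%s\n' "${base%.*}"
}

name_flat 'http://example.com/run/file_1.grib2.bz2'
# prints http:__example.com_run_file_1.grib2.bz2

name_base 'http://example.com/run/file_1.grib2.bz2'
# prints file_1.grib2
```

The second form is usually the nicer choice here, since dropping the .bz2 extension leaves exactly the name of the decompressed file.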