Download images from Google with the command line
First attempt
First, set the user agent so that Google will return search results to wget. Then we can look for images and select the desired one. Because wget returns the Google results page on one single line, we insert the missing newlines and then filter out the image link. The index of the desired image is stored in the variable count.
$ count=10
$ imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - "www.google.be/search?q=something&tbm=isch" | sed 's/</\n</g' | grep '<img' | head -n"$count" | tail -n1 | sed 's/.*src="\([^"]*\)".*/\1/')
$ wget "$imagelink"
The image will now be in your working directory. You can tweak the last command to specify a desired output file name.
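The filtering steps above can be tried offline. The following is a minimal sketch, not the exact pipeline from this answer: the helper name nth_img_src and the HTML snippet are made up for illustration, and it uses grep -o to put each <img> tag on its own line instead of the sed newline trick.

```shell
# nth_img_src N: read HTML on stdin and print the src value of the N-th <img> tag.
# (Helper name and sample HTML are illustrative, not from the original answer.)
nth_img_src() {
    grep -o '<img[^>]*>' |                 # put every <img ...> tag on its own line
        sed -n "${1}p" |                   # keep only the N-th tag
        sed 's/.*src="\([^"]*\)".*/\1/'    # keep only the value of src="..."
}

# Stand-in for the single-line page that wget would return.
html='<html><body><img src="a.png"><p>text</p><img alt="x" src="b.jpg"></body></html>'
printf '%s\n' "$html" | nth_img_src 2
```

With this sample input the last command prints b.jpg. On a real results page the markup may differ, so the patterns would likely need adjusting.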
You can summarize it in a shell script:
#! /bin/bash
count=${1}
shift
query="$*"
[ -z "$query" ] && exit 1 # insufficient arguments
imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - "www.google.be/search?q=${query}&tbm=isch" | sed 's/</\n</g' | grep '<img' | head -n"$count" | tail -n1 | sed 's/.*src="\([^"]*\)".*/\1/')
wget -qO google_image "$imagelink"
Example usage:
$ ls
Documents
Downloads
Music
script.sh
$ chmod +x script.sh
$ bash script.sh 5 awesome
$ ls
Documents
Downloads
google_image
Music
script.sh
Now the google_image file should contain the fifth Google image found when searching for 'awesome'. If you experience any bugs, let me know and I'll take care of them.
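The script places the query into the URL as typed, so a query with spaces or an ampersand may not survive intact. One possible way to handle that is to percent-encode the query first. The sketch below is an assumption-laden illustration: the helper name urlencode is hypothetical, it only claims to handle ASCII input, and it is written in plain POSIX shell so it runs under bash as well.

```shell
# urlencode STRING: percent-encode everything except unreserved characters.
# Assumes ASCII input; the helper name is illustrative.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}            # the string without its first character
        c=${s%"$rest"}         # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;                  # safe, copy as is
            *) out=$out$(printf '%%%02X' "'$c") ;;          # everything else as %XX
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

query=$(urlencode 'red panda & friends')
printf 'www.google.be/search?q=%s&tbm=isch\n' "$query"
```

In the script, query="$(urlencode "$*")" could then replace the plain assignment.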
Better code
The problem with this code is that it returns pictures in low resolution. A better solution is as follows:
#! /bin/bash
# function to create all dirs til file can be made
function mkdirs {
file="$1"
dir="/"
# convert to full path
if [ "${file##/*}" ]; then
file="${PWD}/${file}"
fi
# dir name of following dir
next="${file#/}"
# while not filename
while [ "${next//[^\/]/}" ]; do
# create dir if doesn't exist
[ -d "${dir}" ] || mkdir "${dir}"
dir="${dir}/${next%%/*}"
next="${next#*/}"
done
# last directory to make
[ -d "${dir}" ] || mkdir "${dir}"
}
# get optional 'o' flag, this will open the image after download
getopts 'o' option
[[ $option = 'o' ]] && shift
# parse arguments
count=${1}
shift
query="$@"
[ -z "$query" ] && exit 1 # insufficient arguments
# set user agent, customize this by visiting http://whatsmyuseragent.com/
useragent='Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:31.0) Gecko/20100101 Firefox/31.0'
# construct google link
link="www.google.cz/search?q=${query}&tbm=isch"
# fetch link for download
imagelink=$(wget -e robots=off --user-agent "$useragent" -qO - "$link" | sed 's/</\n</g' | grep '<a href.*\(png\|jpg\|jpeg\)' | sed 's/.*imgurl=\([^&]*\)\&.*/\1/' | head -n $count | tail -n1)
imagelink="${imagelink%\%*}"
# get file extension (.png, .jpg, .jpeg)
ext=$(echo "$imagelink" | sed "s/.*\(\.[^\.]*\)$/\1/")
# set default save location and file name; change this!
dir="$PWD"
file="google image"
# get optional second argument, which defines the file name or dir
if [[ $# -eq 2 ]]; then
if [ -d "$2" ]; then
dir="$2"
else
file="${2}"
mkdirs "${dir}"
dir=""
fi
fi
# construct image link: add 'echo "${google_image}"'
# after this line for debug output
google_image="${dir}/${file}"
# construct name, append number if file exists
if [[ -e "${google_image}${ext}" ]] ; then
i=0
while [[ -e "${google_image}(${i})${ext}" ]] ; do
((i++))
done
google_image="${google_image}(${i})${ext}"
else
google_image="${google_image}${ext}"
fi
# get actual picture and store in google_image.$ext
wget --max-redirect 0 -qO "${google_image}" "${imagelink}"
# if 'o' flag supplied: open image
[[ $option = "o" ]] && gnome-open "${google_image}"
# successful execution, exit code 0
exit 0
The comments should be self-explanatory. If you have any questions about the code (such as the long pipeline), I'll be happy to clarify the mechanics. Note that I had to set a more detailed user agent for wget. You may need to set a different user agent, but I don't think it will be a problem. If you do have a problem, visit http://whatsmyuseragent.com/ and put the output in the useragent variable.
When you wish to open the image instead of only downloading it, use the -o flag, as in the example below. If you wish to extend the script and also include a custom output file name, just let me know and I'll add it for you.
Example usage:
$ chmod +x getimg.sh
$ ./getimg.sh 1 dog
$ gnome-open google_image.jpg
$ ./getimg.sh -o 10 donkey
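Two small pieces of the script, taking the extension from the image URL and appending a number when the target file already exists, can be tried in isolation. This is a minimal sketch of the same idea, not the script's own code: the function names url_ext and unique_name are hypothetical, and the files are created in a throwaway directory.

```shell
# url_ext URL: print the extension of URL, including the dot.
url_ext() {
    printf '.%s\n' "${1##*.}"      # strip everything up to the last dot
}

# unique_name BASE EXT: print BASE+EXT if that file does not exist yet,
# otherwise BASE(N)+EXT for the first N (starting at 0) that is free.
unique_name() {
    if [ ! -e "${1}${2}" ]; then
        printf '%s\n' "${1}${2}"
        return
    fi
    i=0
    while [ -e "${1}(${i})${2}" ]; do
        i=$((i + 1))
    done
    printf '%s\n' "${1}(${i})${2}"
}

tmp=$(mktemp -d)
touch "$tmp/google_image.jpg" "$tmp/google_image(0).jpg"   # pretend two downloads exist
ext=$(url_ext 'http://example.com/pics/dog.jpg')           # made-up URL
unique_name "$tmp/google_image" "$ext"                     # ends in google_image(1).jpg
rm -r "$tmp"
```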
How to download images from Google with terminal? (Mac)
The command is fine, but you need to use the URL of the image itself:
curl -O https://i.imgur.com/PmPGYHR.png
If you want to get the image URL from the page URL you referred to, you can do:
curl https://imgur.com/gallery/yu5An |grep "link rel=\"image_src" |cut -d'"' -f4
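To see why field 4 is the URL, the grep and cut steps can be run on a single sample line without touching the network. The line below is a made-up example of such a tag, and the function name image_src is hypothetical. The sketch assumes the rel attribute comes before href, as the grep pattern in the command above does.

```shell
# image_src: read HTML on stdin and print the URL from <link rel="image_src" ...>.
image_src() {
    grep 'link rel="image_src"' | cut -d'"' -f4
}

# Splitting the sample on double quotes gives:
#   field 1: <link rel=    field 2: image_src    field 3:  href=    field 4: the URL
sample='<link rel="image_src" href="https://i.imgur.com/PmPGYHR.png"/>'
printf '%s\n' "$sample" | image_src
```

The printed URL could then be passed to curl -O.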
wget/curl large file from google drive
WARNING: This functionality is deprecated. See warning below in comments.
Have a look at this question: Direct download from Google Drive using Google Drive API
Basically you have to create a public directory and access your files by relative reference with something like
wget https://googledrive.com/host/LARGEPUBLICFOLDERID/index4phlat.tar.gz
Alternatively, you can use this script: https://github.com/circulosmeos/gdown.pl
Ubuntu: Using curl to download an image
curl without any options will perform a GET request. It simply returns the data from the specified URI on standard output; it does not save the file to your local machine.
When you run
$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
You will receive binary data:
|�>�$! <R�HP@T*�Pm�Z��jU֖��ZP+UAUQ@�
��{X\� K���>0c�yF[i�}4�!�V̧�H_�)nO#�;I��vg^_ ��-Hm$$N0.
���%Y[�L�U3�_^9��P�T�0'u8�l�4 ...
In order to save this, you can use:
$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png > image.png
to store that raw image data inside of a file.
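After redirecting the output to a file, it can be worth checking that the file really is an image and not, say, an HTML error page saved under a .png name. A rough sketch of such a check follows. The function name is_png is hypothetical, and the test files are generated locally rather than downloaded.

```shell
# is_png FILE: succeed if FILE starts with the 8-byte PNG signature
# (89 50 4E 47 0D 0A 1A 0A).
is_png() {
    sig=$(od -An -tx1 -N8 "$1" | tr -d ' \t\n')   # first 8 bytes as hex digits
    [ "$sig" = "89504e470d0a1a0a" ]
}

tmp=$(mktemp -d)
printf '\211PNG\r\n\032\n' > "$tmp/image.png"              # only the signature, not a full image
printf '<html>error page</html>\n' > "$tmp/not_image.png"  # what a failed download might contain
is_png "$tmp/image.png" && echo "image.png looks like a PNG"
is_png "$tmp/not_image.png" || echo "not_image.png is not a PNG"
rm -r "$tmp"
```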
An easier way, though, is just to use wget:
$ wget https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
$ ls
.
..
apple-touch-icon-144x144-precomposed.png
Download first 1000 images from google search
update 4: PhantomJS is now obsolete, I made a new script google-images.py in Python using Selenium and Chrome headless. See here for more details: https://stackoverflow.com/a/61982397/218294
update 3: I fixed the script to work with phantomjs 2.x.
update 2: I modified the script to use phantomjs. It's harder to install, but at least it works again. http://sam.nipl.net/b/google-images http://sam.nipl.net/b/google-images.js
update 1: Unfortunately this no longer works. It seems Javascript and other magic is now required to find where the images are located. Here is a version of the script for yahoo image search: http://sam.nipl.net/code/nipl-tools/bin/yimg
original answer: I hacked something together for this. I normally write smaller tools and use them together, but you asked for one shell script, not three dozen. This is deliberately dense code.
http://sam.nipl.net/code/nipl-tools/bin/google-images
It seems to work very well so far. Please let me know if you can improve it, or suggest any better coding techniques (given that it's a shell script).
#!/bin/bash
[ $# = 0 ] && { prog=`basename "$0"`;
echo >&2 "usage: $prog query count parallel safe opts timeout tries agent1 agent2
e.g. : $prog ostrich
$prog nipl 100 20 on isz:l,itp:clipart 5 10"; exit 2; }
query=$1 count=${2:-20} parallel=${3:-10} safe=$4 opts=$5 timeout=${6:-10} tries=${7:-2}
agent1=${8:-Mozilla/5.0} agent2=${9:-Googlebot-Image/1.0}
query_esc=`perl -e 'use URI::Escape; print uri_escape($ARGV[0]);' "$query"`
dir=`echo "$query_esc" | sed 's/%20/-/g'`; mkdir "$dir" || exit 2; cd "$dir"
url="http://www.google.com/search?tbm=isch&safe=$safe&tbs=$opts&q=$query_esc" procs=0
echo >.URL "$url" ; for A; do echo >>.args "$A"; done
htmlsplit() { tr '\n\r \t' ' ' | sed 's/</\n</g; s/>/>\n/g; s/\n *\n/\n/g; s/^ *\n//; s/ $//;'; }
for start in `seq 0 20 $[$count-1]`; do
wget -U"$agent1" -T"$timeout" --tries="$tries" -O- "$url&start=$start" | htmlsplit
done | perl -ne 'use HTML::Entities; /^<a .*?href="(.*?)"/ and print decode_entities($1), "\n";' | grep '/imgres?' |
perl -ne 'use URI::Escape; ($img, $ref) = map { uri_unescape($_) } /imgurl=(.*?)&imgrefurl=(.*?)&/;
$ext = $img; for ($ext) { s,.*[/.],,; s/[^a-z0-9].*//i; $_ ||= "img"; }
$save = sprintf("%04d.$ext", ++$i); print join("\t", $save, $img, $ref), "\n";' |
tee -a .images.tsv |
while IFS=$'\t' read -r save img ref; do
wget -U"$agent2" -T"$timeout" --tries="$tries" --referer="$ref" -O "$save" "$img" || rm "$save" &
procs=$[$procs + 1]; [ $procs = $parallel ] && { wait; procs=0; }
done ; wait
Features:
- under 1500 bytes
- explains usage, if run with no args
- downloads full images in parallel
- safe search option
- image size, type, etc. opts string
- timeout / retries options
- impersonates googlebot to fetch all images
- numbers image files
- saves metadata
I'll post a modular version some time, to show that it can be done quite nicely with a set of shell scripts and simple tools.
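The parallel download loop at the end of the script (start jobs in the background, wait after every batch) can be isolated into a small helper. This is a simplified sketch of that pattern, not the script itself: the name run_batched is hypothetical, and touch stands in for wget so that it runs without network access.

```shell
# run_batched N: read one command per line from stdin, run each in the
# background, and wait whenever N jobs have been started.
run_batched() {
    parallel=$1
    procs=0
    while IFS= read -r cmd; do
        sh -c "$cmd" &
        procs=$((procs + 1))
        if [ "$procs" -eq "$parallel" ]; then
            wait        # let the current batch finish before starting more
            procs=0
        fi
    done
    wait                # collect the last, possibly incomplete, batch
}

tmp=$(mktemp -d)
# touch stands in for the wget download of the original script.
for n in 0001 0002 0003 0004 0005; do
    printf 'touch "%s/%s.img"\n' "$tmp" "$n"
done | run_batched 2
ls "$tmp" | wc -l       # should report five files
rm -r "$tmp"
```

Waiting per batch is simple but not optimal: one slow download holds up the whole batch, which is a trade-off the original script accepts as well.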
Download Images from a site via a command line
You get the same behavior when running wget https://www.bunlongheng.com/ without all that stuff with images. Running wget -d https://www.bunlongheng.com/ 2>&1 | less provides some information: there is an index error in a PHP file:
ErrorException: Undefined offset: 1 (View: /home/forge/bheng/resources/views/layouts/fe/meta.blade.php) (View: /home/forge/bheng/resources/views/layouts/fe/mSkipping 512 bytes of body: [eta.blade.php) in file /home/forge/bheng/storage/framework/views/0b4178e309ed0339363606e08a7e6d3f33347b7f.php on line 76
Stack trace:
1. ErrorException->() /home/forge/bheng/storage/framework/views/0b4178e309ed0339363606e08a7e6d3f33347b7f.php:76
...
etc
As proposed by @mhdINbY, if you set the user agent of an existing browser (I tried mine: -U "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"), everything goes OK.
I suspect that your framework analyses the User-Agent HTTP header in order to format the output accordingly, and has a bug when it doesn't recognize the user agent you are using, here User-Agent: Wget/1.17.1 (linux-gnu).
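If the browser-like user agent is needed repeatedly, it can be wrapped in a small function. The sketch below is only an illustration: the name wget_as_browser and the DRY_RUN switch are hypothetical, and DRY_RUN exists so that the command can be inspected without contacting the site.

```shell
# Browser-like User-Agent string, copied from the answer above.
UA='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0'

# wget_as_browser URL: fetch URL with the browser-like User-Agent.
# With DRY_RUN=1 the command is only printed, so nothing touches the network.
wget_as_browser() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'wget -U "%s" %s\n' "$UA" "$1"
    else
        wget -U "$UA" "$1"
    fi
}

DRY_RUN=1
wget_as_browser https://www.bunlongheng.com/
```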
batch download images from url with for
Numbered-Files Downloader 1.0
Here is a complete batch script that does exactly what you asked for. You don't need to download any executable files; this is a 100% batch script and it should work on any (recent) Windows installation.
All you need to do is edit the _FILE_URL variable (in the :VARIABLES section) and replace "example.com/folder..." with the actual URL of the files you want to download. After that, you can run the script and get your files.
- Note that in your URL, the string _NUMBERS_ is a keyword that will be replaced by the incremented numbers in the final download function.
All your downloaded files will be saved in the directory where this script is located. You can choose another directory by uncommenting the commented-out SET line below _SAVE_DIR and editing its path.
Finally the following variables can be changed to configure the series of numbers:
_START : The file numbers start with this value.
_STEP : Step between consecutive file numbers.
_END : The file numbers end with this value.
Leading Zeros
Currently, the counter doesn't support leading zeros.
EX. From Picture_001.jpg to Picture_999.jpg
But otherwise it should work fine for something like this:
EX. From Picture_1.jpg to Picture_999.jpg
I will try to find some time to add this option, it shouldn't be too difficult.
Feel free to modify & enhance this script if you need!
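For comparison, the missing leading-zero option is easy to express with printf in a POSIX shell (for example under WSL or Git Bash). This is a sketch in shell rather than batch, and it is not part of the script below: the function name numbered_urls and its fifth argument (the zero-padded width) are assumptions made for illustration, while _NUMBERS_ is the same keyword the batch script uses.

```shell
# numbered_urls TEMPLATE START STEP END [WIDTH]
# Print TEMPLATE once per number, with _NUMBERS_ replaced by the number,
# zero-padded to WIDTH digits (default 1, i.e. no padding).
numbered_urls() {
    prefix=${1%%_NUMBERS_*}     # text before the keyword
    suffix=${1#*_NUMBERS_}      # text after the keyword
    width=${5:-1}
    n=$2
    while [ "$n" -le "$4" ]; do
        printf "%s%0${width}d%s\n" "$prefix" "$n" "$suffix"
        n=$((n + $3))
    done
}

numbered_urls 'https://example.com/folder/Picture__NUMBERS_.jpg' 1 1 3 3
```

This prints the URLs for Picture_001.jpg, Picture_002.jpg and Picture_003.jpg, one per line; the list could be saved to a file and handed to a downloader.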
Numbered-DL.cmd
@echo off
setlocal EnableDelayedExpansion
rem STACKOVERFLOW - QUESTION FROM:
rem https://stackoverflow.com/questions/45796990/batch-download-images-from-url-with-for
:VARIABLES
rem WHERE YOU WANT TO SAVE FILES
rem "%~dp0" is a variable for the same folder as this script, so files should be saved in the same folder.
rem If you want to save the downloaded files somewhere else, uncomment the next line and edit the path.
SET "_SAVE_DIR=%~dp0"
rem SET "_SAVE_DIR=C:\Folder\"
rem DOWNLOAD THIS FILE URL
rem
rem "_NUMBERS_" WILL BE REPLACED BY THE COUNTER
rem CURRENTLY IT DOESN'T SUPPORT CHOOSING A NUMBER OF ZEROS FOR THE COUNTER EX: 001,002,003...
rem BUT IT SHOULDN'T BE TOO HARD TO IMPLEMENT, MAYBE I'LL ADD THIS IN THE FUTURE.
rem
rem SET _FILE_URL=https://example.com/folder/_NUMBERS_.png
SET "_FILE_URL=https://cweb.canon.jp/eos/lineup/r5/image/downloads/sample0_NUMBERS_.jpg"
rem FOR THIS EXAMPLE THE SCRIPT WILL DOWNLOAD FILES FROM "sample01.jpg" TO "sample05.jpg"
SET _START=1
SET _STEP=1
SET _END=5
:CMD_PARAMS
IF NOT [%1]==[] SET "_FILE_URL=%1"
IF NOT [%2]==[] SET "_SAVE_DIR=%2"
:PATH_FIX
rem REMOVE THE LAST CHAR IF IT IS "\"
IF [%_SAVE_DIR:~-1%] == [\] SET "_SAVE_DIR=%_SAVE_DIR:~0,-1%"
:DETAILS_DISPLAY
ECHO.
ECHO SCRIPT: Numbered-Files Downloader 1.0
ECHO AUTHOR: Frank Einstein
ECHO.
ECHO.
ECHO INPUTS
ECHO _URL: %_FILE_URL%
ECHO _SAVE_DIR: %_SAVE_DIR%
ECHO.
ECHO _START: %_START%
ECHO _STEP: %_STEP%
ECHO _END: %_END%
ECHO.
ECHO.
CALL :DOWNLOAD_LOOP
ECHO.
ECHO EXECUTION COMPLETED
ECHO.
PAUSE
EXIT /B
:DOWNLOAD_LOOP
SET FINAL_URL=%_FILE_URL%
FOR /L %%G IN (%_START%,%_STEP%,%_END%) DO (
rem REPLACE URL'S KEYWORD WITH NUMBERS
SET NUM=%%G
SET FINAL_URL=%FINAL_URL:_NUMBERS_=!NUM!%
rem CUSTOM BATCH FUNCTION FOR DOWNLOADING FILES
rem
rem SYNTAX:
rem echo CALL :DOWNLOAD !FINAL_URL!
CALL :DOWNLOAD !FINAL_URL! !_SAVE_DIR!
)
Goto :EOF
rem PAUSE
rem EXIT /B
rem FUNCTIONS
:DOWNLOAD
setlocal
SET "DL_FILE_URL=%1"
SET "DL_SAVE_DIR=%2"
rem EXTRACT THE FILENAME FROM URL (NEED TO FIX THIS PART?)
FOR %%F IN ("%DL_FILE_URL%") DO SET DL_FILE_NAME=%%~nxF
IF "%DL_SAVE_DIR:~-1%" == "\" SET "DL_SAVE_DIR=%DL_SAVE_DIR:~0,-1%"
IF NOT [%2]==[] SET "DL_SAVE_FILE=%DL_SAVE_DIR%\%DL_FILE_NAME%"
IF [%2]==[] SET "DL_SAVE_FILE=%~dp0%DL_FILE_NAME%"
rem :BITSADMIN
ECHO.
ECHO DOWNLOADING: "%DL_FILE_URL%"
ECHO SAVING TO: "%DL_SAVE_FILE%"
ECHO.
bitsadmin /transfer mydownloadjob /download /priority foreground "%DL_FILE_URL%" "%DL_SAVE_FILE%"
rem BITSADMIN DOWNLOAD EXAMPLE
rem bitsadmin /transfer mydownloadjob /download /priority foreground http://example.com/filename.zip C:\Users\username\Downloads\filename.zip
endlocal
GOTO :EOF
Download Images from list of urls
Create a folder on your machine.
Place your text file of image URLs in the folder.
cd to that folder.
Use wget -i images.txt
You will find all your downloaded files in the folder.
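If the list was exported from a spreadsheet or a Windows editor, it may contain blank lines, carriage returns or repeated URLs. A possible clean-up step before handing it to wget is sketched below. The function name clean_url_list and the sample URLs are made up, and the download itself is left commented out.

```shell
# clean_url_list: read a URL list on stdin; drop carriage returns, blank lines
# and repeated URLs (the first occurrence is kept, order is preserved).
clean_url_list() {
    tr -d '\r' | grep -v '^[[:space:]]*$' | awk '!seen[$0]++'
}

tmp=$(mktemp -d)
printf 'https://example.com/a.png\r\n\nhttps://example.com/b.png\nhttps://example.com/a.png\n' > "$tmp/images.txt"
clean_url_list < "$tmp/images.txt" > "$tmp/images.clean.txt"
cat "$tmp/images.clean.txt"
# wget -i "$tmp/images.clean.txt"    # the actual download, left commented out here
rm -r "$tmp"
```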