Download Images from Google with Command Line


First attempt

First you need to set the user agent so Google will authorize output from searches. Then we can search for images and select the desired one. To accomplish that we insert missing newlines (wget returns the Google search results on a single line), filter out the image links, and pick the one we want. The index of the desired image is stored in the variable count.

$ count=10
$ imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - "www.google.be/search?q=something&tbm=isch" | sed 's/</\n</g' | grep '<img' | head -n"$count" | tail -n1 | sed 's/.*src="\([^"]*\)".*/\1/')
$ wget "$imagelink"

The image will now be in your working directory; you can tweak the last command to specify a desired output file name.
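
To make the first command easier to follow, here is the same pipeline split into its stages (a sketch of the parsing only; it reuses the count variable from above, and the markup Google serves may change at any time):

imagelink=$(
  wget --user-agent 'Mozilla/5.0' -qO - "www.google.be/search?q=something&tbm=isch" |
  sed 's/</\n</g' |     # insert a newline before every tag, so there is one tag per line
  grep '<img' |         # keep only the <img> tags
  head -n"$count" |     # take the first $count of them
  tail -n1 |            # ...and keep only number $count itself
  sed 's/.*src="\([^"]*\)".*/\1/'   # extract the src attribute, i.e. the image URL
)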

You can summarize it in a shell script:

#!/bin/bash
count=${1}
shift
query="$*"
[ -z "$query" ] && exit 1 # insufficient arguments
imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - "www.google.be/search?q=${query}&tbm=isch" | sed 's/</\n</g' | grep '<img' | head -n"$count" | tail -n1 | sed 's/.*src="\([^"]*\)".*/\1/')
wget -qO google_image "$imagelink"

Example usage:

$ ls
Documents
Downloads
Music
script.sh
$ chmod +x script.sh
$ bash script.sh 5 awesome
$ ls
Documents
Downloads
google_image
Music
script.sh

Now google_image should contain the fifth Google image result for 'awesome'. If you experience any bugs, let me know and I'll take care of them.

Better code

The problem with this code is that it returns pictures in low resolution, since it only grabs the thumbnails shown on the results page. A better solution is as follows:

#!/bin/bash

# function to create all dirs needed before the file can be made
function mkdirs {
    file="$1"
    dir="/"

    # convert to full path
    if [ "${file##/*}" ]; then
        file="${PWD}/${file}"
    fi

    # dir name of following dir
    next="${file#/}"

    # while not filename
    while [ "${next//[^\/]/}" ]; do
        # create dir if it doesn't exist
        [ -d "${dir}" ] || mkdir "${dir}"
        dir="${dir}/${next%%/*}"
        next="${next#*/}"
    done

    # last directory to make
    [ -d "${dir}" ] || mkdir "${dir}"
}

# get optional 'o' flag, this will open the image after download
getopts 'o' option
[[ $option = 'o' ]] && shift

# parse arguments
count=${1}
shift
query="$*"
[ -z "$query" ] && exit 1 # insufficient arguments

# set user agent, customize this by visiting http://whatsmyuseragent.com/
useragent='Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:31.0) Gecko/20100101 Firefox/31.0'

# construct google link
link="www.google.cz/search?q=${query}&tbm=isch"

# fetch link for download
imagelink=$(wget -e robots=off --user-agent "$useragent" -qO - "$link" | sed 's/</\n</g' | grep '<a href.*\(png\|jpg\|jpeg\)' | sed 's/.*imgurl=\([^&]*\)\&.*/\1/' | head -n "$count" | tail -n1)
# strip everything from the last '%' onward (leftover encoded junk)
imagelink="${imagelink%\%*}"

# get file extension (.png, .jpg, .jpeg)
ext=$(echo "$imagelink" | sed "s/.*\(\.[^\.]*\)$/\1/")

# set default save location and file name (change this!)
dir="$PWD"
file="google_image"

# get optional second argument, which defines the file name or dir
if [[ $# -eq 2 ]]; then
    if [ -d "$2" ]; then
        dir="$2"
    else
        file="${2}"
        mkdirs "${file}"
        dir=""
    fi
fi

# construct image path: add 'echo "${google_image}"'
# after this line for debug output
google_image="${dir:+${dir}/}${file}"

# construct name, append number if file exists
if [[ -e "${google_image}${ext}" ]] ; then
    i=0
    while [[ -e "${google_image}(${i})${ext}" ]] ; do
        ((i++))
    done
    google_image="${google_image}(${i})${ext}"
else
    google_image="${google_image}${ext}"
fi

# get actual picture and store it in ${google_image}
wget --max-redirect 0 -qO "${google_image}" "${imagelink}"

# if 'o' flag supplied: open image
[[ $option = "o" ]] && gnome-open "${google_image}"

# successful execution, exit code 0
exit 0

The comments should be self-explanatory, and the long pipeline is broken down step by step below; if you have any other questions about the code, I'll be happy to clarify the mechanics. Note that I had to set a more detailed user agent for wget; you may need to set a different one, but I don't think it'll be a problem. If you do have a problem, visit http://whatsmyuseragent.com/ and supply the output in the useragent variable.
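
For reference, here is that long pipeline split into its stages (a sketch of what each step does; it depends on the result markup Google served at the time of writing):

imagelink=$(
  wget -e robots=off --user-agent "$useragent" -qO - "$link" |
  sed 's/</\n</g' |                      # put every tag on its own line
  grep '<a href.*\(png\|jpg\|jpeg\)' |   # keep anchors that point at full-size images
  sed 's/.*imgurl=\([^&]*\)\&.*/\1/' |   # extract the imgurl= parameter from the href
  head -n "$count" |                     # take the first $count links
  tail -n1                               # ...and keep only link number $count
)

The -e robots=off option tells wget to ignore robots.txt for this request.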

When you wish to open the image instead of only downloading it, use the -o flag; see the example below. If you wish to extend the script to also include a custom output file name, just let me know and I'll add it for you.

Example usage:

$ chmod +x getimg.sh
$ ./getimg.sh 1 dog
$ gnome-open google_image.jpg
$ ./getimg.sh -o 10 donkey

How to download images from Google with terminal? (Mac)

The command is fine, but you need to use the URL of the image itself:

curl -O https://i.imgur.com/PmPGYHR.png

If you want to extract the image URL from the page URL you referred to, you can do:

curl https://imgur.com/gallery/yu5An | grep "link rel=\"image_src" | cut -d'"' -f4
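
To combine the two steps, a small sketch that extracts the image URL and then downloads it (-s just silences curl's progress meter):

imgurl=$(curl -s https://imgur.com/gallery/yu5An | grep "link rel=\"image_src" | cut -d'"' -f4)
curl -O "$imgurl"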

wget/curl large file from google drive

WARNING: This functionality is deprecated. Google has discontinued the googledrive.com/host web-hosting feature, so the approach below no longer works; see the alternative script linked at the end of this section.


Have a look at this question: Direct download from Google Drive using Google Drive API

Basically you have to create a public directory and access your files by relative reference with something like

wget https://googledrive.com/host/LARGEPUBLICFOLDERID/index4phlat.tar.gz

Alternatively, you can use this script: https://github.com/circulosmeos/gdown.pl
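
A rough usage sketch for gdown.pl, based on that project's README (FILEID is a placeholder for your file's ID, and since Google changes these endpoints periodically, check the project page for the current interface):

$ ./gdown.pl 'https://drive.google.com/file/d/FILEID/view?usp=sharing' output.tar.gz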

Ubuntu: Using curl to download an image

curl without any options will perform a GET request. It simply returns the data from the specified URI to standard output; it does not save the file to your local machine.

When you do,

$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png

You will receive binary data:

                   |�>�$! <R�HP@T*�Pm�Z��jU֖��ZP+UAUQ@�
��{X\� K���>0c�yF[i�}4�!�V̧�H_�)nO#�;I��vg^_ ��-Hm$$N0.
���%Y[�L�U3�_^9��P�T�0'u8�l�4 ...

In order to save this, you can use:

$ curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png > image.png

to store that raw image data inside of a file.
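
Alternatively, curl can save the file itself without shell redirection: -o writes to a file name you choose, and -O keeps the remote file name.

$ curl -o image.png https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
$ curl -O https://www.python.org/static/apple-touch-icon-144x144-precomposed.png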

An easier way, though, is just to use wget.

$ wget https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
$ ls
apple-touch-icon-144x144-precomposed.png

Download first 1000 images from google search

update 4: PhantomJS is now obsolete, so I made a new script, google-images.py, in Python using Selenium and headless Chrome. See here for more details: https://stackoverflow.com/a/61982397/218294

update 3: I fixed the script to work with phantomjs 2.x.

update 2: I modified the script to use phantomjs. It's harder to install, but at least it works again. http://sam.nipl.net/b/google-images http://sam.nipl.net/b/google-images.js

update 1: Unfortunately this no longer works. It seems JavaScript and other magic are now required to find where the images are located. Here is a version of the script for Yahoo image search: http://sam.nipl.net/code/nipl-tools/bin/yimg

original answer: I hacked something together for this. I normally write smaller tools and use them together, but you asked for one shell script, not three dozen. This is deliberately dense code.

http://sam.nipl.net/code/nipl-tools/bin/google-images

It seems to work very well so far. Please let me know if you can improve it, or suggest any better coding techniques (given that it's a shell script).

#!/bin/bash
[ $# = 0 ] && { prog=`basename "$0"`;
echo >&2 "usage: $prog query count parallel safe opts timeout tries agent1 agent2
e.g. : $prog ostrich
$prog nipl 100 20 on isz:l,itp:clipart 5 10"; exit 2; }
query=$1 count=${2:-20} parallel=${3:-10} safe=$4 opts=$5 timeout=${6:-10} tries=${7:-2}
agent1=${8:-Mozilla/5.0} agent2=${9:-Googlebot-Image/1.0}
query_esc=`perl -e 'use URI::Escape; print uri_escape($ARGV[0]);' "$query"`
dir=`echo "$query_esc" | sed 's/%20/-/g'`; mkdir "$dir" || exit 2; cd "$dir"
url="http://www.google.com/search?tbm=isch&safe=$safe&tbs=$opts&q=$query_esc" procs=0
echo >.URL "$url" ; for A; do echo >>.args "$A"; done
htmlsplit() { tr '\n\r \t' ' ' | sed 's/</\n</g; s/>/>\n/g; s/\n *\n/\n/g; s/^ *\n//; s/ $//;'; }
for start in `seq 0 20 $[$count-1]`; do
wget -U"$agent1" -T"$timeout" --tries="$tries" -O- "$url&start=$start" | htmlsplit
done | perl -ne 'use HTML::Entities; /^<a .*?href="(.*?)"/ and print decode_entities($1), "\n";' | grep '/imgres?' |
perl -ne 'use URI::Escape; ($img, $ref) = map { uri_unescape($_) } /imgurl=(.*?)&imgrefurl=(.*?)&/;
$ext = $img; for ($ext) { s,.*[/.],,; s/[^a-z0-9].*//i; $_ ||= "img"; }
$save = sprintf("%04d.$ext", ++$i); print join("\t", $save, $img, $ref), "\n";' |
tee -a .images.tsv |
while IFS=$'\t' read -r save img ref; do
wget -U"$agent2" -T"$timeout" --tries="$tries" --referer="$ref" -O "$save" "$img" || rm "$save" &
procs=$[$procs + 1]; [ $procs = $parallel ] && { wait; procs=0; }
done ; wait

Features:

  • under 1500 bytes
  • explains usage, if run with no args
  • downloads full images in parallel
  • safe search option
  • image size, type, etc. opts string
  • timeout / retries options
  • impersonates googlebot to fetch all images
  • numbers image files
  • saves metadata
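
Going by the script's built-in usage message, a typical invocation looks like this (every argument after the query is optional and positional: count, parallel, safe, opts, timeout, tries, agents):

$ google-images ostrich
$ google-images nipl 100 20 on isz:l,itp:clipart 5 10

The script creates a new directory named after the query and downloads the numbered images and a .images.tsv metadata file into it.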

I'll post a modular version some time, to show that it can be done quite nicely with a set of shell scripts and simple tools.

Download Images from a site via a command line

You get the same behavior when running wget https://www.bunlongheng.com/ without any of the image-related options. Running wget -d https://www.bunlongheng.com/ 2>&1 | less provides some information: there is an index error in a PHP file:

ErrorException: Undefined offset: 1 (View: /home/forge/bheng/resources/views/layouts/fe/meta.blade.php) (View: /home/forge/bheng/resources/views/layouts/fe/mSkipping 512 bytes of body: [eta.blade.php) in file /home/forge/bheng/storage/framework/views/0b4178e309ed0339363606e08a7e6d3f33347b7f.php on line 76
Stack trace:
1. ErrorException->() /home/forge/bheng/storage/framework/views/0b4178e309ed0339363606e08a7e6d3f33347b7f.php:76
...
etc

As proposed by @mhdINbY, if you set the user agent to that of an existing browser (I tried mine: -U "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"), everything goes OK.

I would suspect that your framework analyses the User-Agent HTTP header in order to format the output accordingly, and has a bug when it doesn't recognize the user agent you are using, here User-Agent: Wget/1.17.1 (linux-gnu).
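
Putting the two findings together, a sketch that fetches the page along with its images using a browser user agent (the question's exact wget options aren't shown here; -p, i.e. --page-requisites, is wget's standard flag for downloading a page's images and other embedded resources):

$ wget -p -U "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0" https://www.bunlongheng.com/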

batch download images from url with for

Numbered-Files Downloader 1.0

Here is a complete batch script that does exactly what you asked for. You don't need to download any executable files; this is 100% batch script, and it should work on any (recent) Windows installation.

All you need to do is edit the _FILE_URL variable and replace the example URL with the actual URL of the files you want to download. After that, you can run the script and get your files.

  • Note that in your URL, the string _NUMBERS_ is a keyword that will be replaced by the incremented numbers in the final download function.

All your downloaded files will be saved in the directory where this script is located. You can choose another directory by uncommenting and editing the _SAVE_DIR line near the top of the script.

Finally, the following variables can be changed to configure the series of numbers:

_START : The file numbers start with this value.

_STEP  : Step between each file number.

_END   : The file numbers end with this value.



Leading Zeros

Currently, the counter doesn't support leading zeros.
EX. From Picture_001.jpg to Picture_999.jpg
But otherwise it should work fine for something like this:
EX. From Picture_1.jpg to Picture_999.jpg

I will try to find some time to add this option; it shouldn't be too difficult.
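
In the meantime, if curl is available (recent Windows 10 and 11 installations ship a native curl.exe), its built-in URL globbing already supports zero-padded numeric ranges, so the leading-zeros case can be handled directly:

curl -O "https://example.com/folder/Picture_[001-999].jpg"

Here -O saves each file under its remote name, and example.com/folder stands in for the real location.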


Feel free to modify & enhance this script if you need!



Numbered-DL.cmd

@echo off
setlocal EnableDelayedExpansion

rem STACKOVERFLOW - QUESTION FROM:
rem https://stackoverflow.com/questions/45796990/batch-download-images-from-url-with-for

:VARIABLES

rem WHERE YOU WANT TO SAVE FILES
rem "%~dp0" is a variable for the same folder as this script, so files should be saved in the same folder.
rem If you want to save the downloaded files somewhere else, uncomment the next line and edit the path.
SET "_SAVE_DIR=%~dp0"
rem SET "_SAVE_DIR=C:\Folder\"

rem DOWNLOAD THIS FILE URL
rem
rem "_NUMBERS_" WILL BE REPLACED BY THE COUNTER
rem CURRENTLY IT DOESN'T SUPPORT CHOOSING A NUMBER OF ZEROS FOR THE COUNTER EX: 001,002,003...
rem BUT IT SHOULDN'T BE TOO HARD TO IMPLEMENT, MAYBE I'LL ADD THIS IN THE FUTURE.
rem
rem SET _FILE_URL=https://example.com/folder/_NUMBERS_.png
SET "_FILE_URL=https://cweb.canon.jp/eos/lineup/r5/image/downloads/sample0_NUMBERS_.jpg"

rem FOR THIS EXAMPLE THE SCRIPT WILL DOWNLOAD FILES FROM "sample01.jpg" TO "sample05.jpg"
SET _START=1
SET _STEP=1
SET _END=5

:CMD_PARAMS
IF NOT [%1]==[] SET "_FILE_URL=%1"
IF NOT [%2]==[] SET "_SAVE_DIR=%2"

:PATH_FIX
rem REMOVE THE LAST CHAR IF IT IS "\"
IF [%_SAVE_DIR:~-1%] == [\] SET "_SAVE_DIR=%_SAVE_DIR:~0,-1%"

:DETAILS_DISPLAY

ECHO.
ECHO SCRIPT: Numbered-Files Downloader 1.0
ECHO AUTHOR: Frank Einstein
ECHO.
ECHO.
ECHO INPUTS
ECHO _FILE_URL: %_FILE_URL%
ECHO _SAVE_DIR: %_SAVE_DIR%
ECHO.
ECHO _START: %_START%
ECHO _STEP: %_STEP%
ECHO _END: %_END%
ECHO.
ECHO.

CALL :DOWNLOAD_LOOP

ECHO.
ECHO EXECUTION COMPLETED
ECHO.
PAUSE
EXIT /B

:DOWNLOAD_LOOP

SET FINAL_URL=%_FILE_URL%

FOR /L %%G IN (%_START%,%_STEP%,%_END%) DO (

    rem REPLACE URL'S KEYWORD WITH NUMBERS
    SET NUM=%%G
    SET FINAL_URL=%FINAL_URL:_NUMBERS_=!NUM!%

    rem CUSTOM BATCH FUNCTION FOR DOWNLOADING FILES
    rem
    rem SYNTAX:
    rem CALL :DOWNLOAD <FILE_URL> <SAVE_DIR>
    CALL :DOWNLOAD !FINAL_URL! !_SAVE_DIR!

)

Goto :EOF
rem PAUSE
rem EXIT /B

rem FUNCTIONS

:DOWNLOAD

setlocal

SET "DL_FILE_URL=%1"
SET "DL_SAVE_DIR=%2"

rem EXTRACT THE FILENAME FROM URL (NEED TO FIX THIS PART?)

FOR %%F IN ("%DL_FILE_URL%") DO SET DL_FILE_NAME=%%~nxF

IF "%DL_SAVE_DIR:~-1%" == "\" SET "DL_SAVE_DIR=%DL_SAVE_DIR:~0,-1%"
IF NOT [%2]==[] SET "DL_SAVE_FILE=%DL_SAVE_DIR%\%DL_FILE_NAME%"
IF [%2]==[] SET "DL_SAVE_FILE=%~dp0%DL_FILE_NAME%"

rem :BITSADMIN

ECHO.
ECHO DOWNLOADING: "%DL_FILE_URL%"
ECHO SAVING TO: "%DL_SAVE_FILE%"
ECHO.

bitsadmin /transfer mydownloadjob /download /priority foreground "%DL_FILE_URL%" "%DL_SAVE_FILE%"

rem BITSADMIN DOWNLOAD EXAMPLE
rem bitsadmin /transfer mydownloadjob /download /priority foreground http://example.com/filename.zip C:\Users\username\Downloads\filename.zip

endlocal

GOTO :EOF
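
Going by the :CMD_PARAMS block, the URL and save directory can also be passed on the command line instead of editing the script. A hedged usage sketch (arguments without spaces, since the script does not re-quote them; example.com is a placeholder):

Numbered-DL.cmd https://example.com/folder/Picture_NUMBERS_.jpg C:\Downloads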

Download Images from a list of URLs

  • Create a folder on your machine.

  • Place your text file of image URLs in the folder.

  • cd to that folder.

  • Run wget -i images.txt (see the example below).

  • You will find all your downloaded files in the folder.
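
For example, with a hypothetical images.txt listing one URL per line (reusing URLs that appear earlier on this page):

$ cat images.txt
https://www.python.org/static/apple-touch-icon-144x144-precomposed.png
https://i.imgur.com/PmPGYHR.png
$ wget -i images.txt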


