How to Check the Status of URLs from a Text File Using a Bash Shell Script

Script to get the HTTP status code of a list of URLs?

Curl has a specific option, --write-out, for this:

$ curl -o /dev/null --silent --head --write-out '%{http_code}\n' <url>
200
  • -o /dev/null throws away the usual output
  • --silent throws away the progress meter
  • --head makes a HEAD HTTP request, instead of GET
  • --write-out '%{http_code}\n' prints the required status code

To wrap this up in a complete Bash script:

#!/bin/bash
while IFS= read -r LINE; do
    curl -o /dev/null --silent --head --write-out "%{http_code} $LINE\n" "$LINE"
done < url-list.txt

(Eagle-eyed readers will notice that this runs one curl process per URL, which imposes a fork and TCP connection penalty for each one. It would be faster to combine multiple URLs in a single curl invocation, but there isn't space to write out the monstrous repetition of options that curl requires to do this; a sketch of the shape it takes follows below.)
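
For illustration only, here is roughly what that repetition looks like with two placeholder URLs (example.com and example.org are stand-ins): -o applies per URL, so it has to be repeated for each one, while --write-out and --head apply to the whole run.

curl --silent --head --write-out '%{http_code} %{url_effective}\n' \
     -o /dev/null 'https://example.com/' \
     -o /dev/null 'https://example.org/'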

Bash script using curl to hit URLs provided by a txt file

You can tell curl not to output the content with -o /dev/null and redirect the progress information to a file with 2>>/home/warm_script/output.txt:

#!/bin/bash

url="https://example.com"
while IFS= read -r i; do
    curl -m 20 -A 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36 (m__warmer)' "$url/$i" -o /dev/null 2>>/home/warm_script/output.txt
done < /home/warm_script/urls.txt

You can also use the -I option to just print headers:

content="$(curl -m '20' -A 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36 (m__warmer)' "$url/$i" -I)"

If you just want the status code, you can use -o /dev/null -w '%{http_code}\n' -s:

content="$(curl -m '20' -A 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36 (m__warmer)' "$url/$i" -o /dev/null -w '%{http_code}\n' -s)"

How to check the validity of a list of URLs in a file?

You can use the && and || operators to write to an output file based on the exit status of the curl command.

$ curl -k -s -o /dev/null "${url}" && echo "${url} : Exists" > 1.txt || echo "${url} : Does not exist" > 2.txt

For a list,

#!/bin/bash
> 1.txt  # Create empty text files
> 2.txt  #
while IFS= read -r url
do
    curl -k -s -o /dev/null "${url}" && echo "${url} : Exists" >> 1.txt || echo "${url} : Does not exist" >> 2.txt
done < /path/to/list.txt
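
One caveat: curl exits 0 even when the server answers with 404, so the test above really checks whether a connection and response happened at all. If "exists" should mean an HTTP success status, adding --fail (-f), and optionally --head to skip the body, makes curl return a non-zero exit code for 4xx/5xx responses. A sketch of the same loop with that change:

while IFS= read -r url
do
    if curl -k -s -f --head -o /dev/null "${url}"; then
        echo "${url} : Exists" >> 1.txt
    else
        echo "${url} : Does not exist" >> 2.txt
    fi
done < /path/to/list.txt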

Fastest way to check if a URL is valid in a bash script

With a touch of caution, it might be worth trying to parallelise these to reduce the overall time. But you might hit some other bottleneck.

This is not tested, but it should hopefully be enough to help you give it a go:


## This function does the checking. It will be called
## as an async background process.

function checkurl
{
    myline=$1
    myurl=$2
    if ! wget -q --method=HEAD "$myurl"; then
        echo "$myline" >> invalid_instance_types
    else
        echo "$myline" >> valid_instance_types
    fi
}

echo "Separating valid and invalid URLs"

maxno=10
cno=0

while read -r line
do
    url=$(echo "$line" | cut -d' ' -f1 | cut -d'<' -f2 | cut -d'>' -f1)
    checkurl "$line" "$url" &

    ## Optional - have a limit for the number of submissions.
    ## Not foolproof, but can be tuned to taste and developed
    ## to be more accurate and robust, e.g. check the number of
    ## jobs running and limit that to maxno.

    ((cno=cno+1))
    if [ $cno -gt $maxno ]
    then
        sleep 5
        cno=0
    fi

done < test

## Wait for all submitted processes to complete. (If the input were
## piped into the 'while' loop instead of redirected, the background
## jobs would start in a subshell and 'wait' here would not see them.)
## As a fallback, you could sleep in a loop and periodically compare
## line counts until
## (wc -l < test) == (wc -l < invalid_instance_types) + (wc -l < valid_instance_types)

echo "Waiting for all to complete"
wait
echo "Done"

Check if a URL goes to a page containing the text 404

For fun, here is a pure Bash solution:

dosomething() {
    code="$1"; url="$2"
    case "$code" in
        200) echo "OK for $url";;
        302) echo "redir for $url";;
        404) echo "notfound for $url";;
        *)   echo "other $code for $url";;
    esac
}

#MAIN program
while IFS= read -r url
do
    uri=($(echo "$url" | sed 's~http://\([^/][^/]*\)\(.*\)~\1 \2~'))
    HOST=${uri[0]:=localhost}
    FILE=${uri[1]:=/}
    exec {SOCKET}<>/dev/tcp/$HOST/80
    # CRLF line endings and "Connection: close" keep the server from
    # holding the socket open, which would stall the read below.
    echo -ne "GET $FILE HTTP/1.1\r\nHost: $HOST\r\nConnection: close\r\n\r\n" >&${SOCKET}
    res=($(<&${SOCKET} sed '/^.$/,$d' | grep '^HTTP'))
    exec {SOCKET}>&-
    dosomething "${res[1]}" "$url"
done << EOF
http://stackoverflow.com
http://stackoverflow.com/some/bad/url
EOF

Using URLs in a bash script

Too many quotes. Try a single quote instead:

test="curl -s -o /dev/null -I -w %{http_code} '$name'"

Following up on the comments, it should be sufficient to use a script like:

#!/bin/bash
while IFS= read -r line
do
    /usr/bin/curl -s -o /dev/null -I -w '%{http_code}\n' -- "$line"
done < "$1"
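
Saved as, say, check.sh (a hypothetical name) and given a file with one URL per line, it would be invoked as:

bash check.sh url-list.txt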

Copy line of text from input to output file while using xargs

Use the -I option, e.g.:

xargs -n1 -P 10 -I '{}' curl -u user:pass -L -o /dev/null --silent --head --write-out '{},%{url_effective},%{http_code},%{num_redirects}\n' '{}' < url-list.txt | tee status-codes.csv

man xargs:

-I replace-str
Replace occurrences of replace-str in the initial-arguments with names read from standard input.
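
With the --write-out format above, the third comma-separated field is the HTTP status code, so as a follow-up (and assuming none of the URLs themselves contain commas) the failures can be pulled out of status-codes.csv afterwards with awk:

awk -F, '$3 != 200' status-codes.csv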

Issue with Unix Bash Script that Reads In Mp3 Urls from .TXT File, then Downloads and Renames Files

No, there is no inherent shell script limit that you are hitting.

Is it possible that the web server you are downloading the MP3s from has a rate limiter which kicks in at 50 downloads in too short a time? If so, you will need to slow down your script.

Try this modification and see what happens if you start at the 50th MP3:

#!/bin/bash
mkdir -p ~/Desktop/URLs
n=1
while IFS= read -r mp3; do
    ((n >= 50)) && curl "$mp3" > ~/Desktop/URLs/"$n".mp3
    ((n++))
done < ~/Desktop/URLs.txt

If you want to slow it down add a sleep call to the loop.
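
For instance, the download line could become the following (the 2-second pause is an arbitrary choice; tune it to whatever the server tolerates):

((n >= 50)) && { curl "$mp3" > ~/Desktop/URLs/"$n".mp3; sleep 2; }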


