Extract parent domain name from a list of URLs with Bash shell scripting
Using awk
awk -F \/ '{l=split($3,a,"."); print (a[l-1]=="com"?a[l-2] OFS:X) a[l-1] OFS a[l]}' OFS="." file|sort -u
contatoruy.in
dicadodia.com.br
doomyjupe.com
forterins.com
gaelsyaray.com
livrariacultura.com.br
maxxivrimoveis.com.br
meguiatramandai.com.br
prategama.com
quetxviii.com
smilecire.com
suleacatan.com
theirpoem.com
toneyvaws.com
visionwebmkt.com
woadsbevy.com
yournjuju.com
zonalrems.com
zrobimystrone.pl
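To see what the one-liner does on a small input, here is a minimal sketch that wraps the same awk logic in a function reading URLs from standard input. The function name `parent_domains` and the sample URLs are made up for illustration. Note that the rule only keeps a third label when the second-to-last label is `com` (as in `com.br`), so suffixes such as `co.uk` are not covered by it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk one-liner; reads URLs from stdin.
parent_domains() {
  awk -F/ -v OFS=. '{
    n = split($3, a, ".")   # $3 is the host part of scheme://host/path
    # Keep three labels only when the second-to-last label is "com" (e.g. com.br).
    print (a[n-1] == "com" ? a[n-2] OFS : "") a[n-1] OFS a[n]
  }' | sort -u
}

# Sample input (hypothetical URLs built from the host names listed above).
printf '%s\n' \
  'http://www.dicadodia.com.br/page' \
  'https://mail.prategama.com/inbox' \
  'http://zrobimystrone.pl/' | parent_domains
```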
Extract parent domain/subdomain name from a list of URLs with Bash shell scripting
You can use awk:
awk -F/ '{sub(/^www\./,"",$3); print $3}' yourfile
Test:
$ awk -F/ '{sub(/^www\./,"",$3); print $3}' yourfile
example.com
example2.com
example3.com
subdomain.example4.com
subdomain.example5.com
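If awk is not wanted, the same host extraction can be done with bash parameter expansion alone. This is a different technique from the awk answer, shown as a sketch: `url_host` is a hypothetical name, and ports or `user@` prefixes in the URL are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host of a URL with a leading "www." removed.
url_host() {
  local url=$1 host
  host=${url#*://}    # drop the scheme, if there is one
  host=${host%%/*}    # drop everything from the first "/" (the path)
  host=${host#www.}   # drop a leading "www."
  printf '%s\n' "$host"
}

url_host 'http://www.example.com/index.html'    # example.com
url_host 'https://subdomain.example4.com/a/b'   # subdomain.example4.com
```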
How to retrieve main domain from random subdomain in bash
I'm not an expert on domain names. Based on https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains, with minor exceptions, every domain with a 2-letter suffix has a main domain of the form something.bb.cc, and for every other suffix (usually 3 letters) the main domain is something.ccc
Using bash
domain=...
md=
p2='^(.*\.)?([^.]+\.[a-z]+\.[a-z][a-z])$'
p3='^(.*\.)?([^.]+\.(com|org|net|int|edu|gov|mil))$'
# Fallback: capture the last two labels, not just the suffix
px='^(.*\.)?([^.]+\.[a-z]+)$'
# 2 letter country codes
if [[ "$domain" =~ $p2 ]] ; then
    md=${BASH_REMATCH[2]}
# 3 letters legacy domain
elif [[ "$domain" =~ $p3 ]] ; then
    md=${BASH_REMATCH[2]}
# All Other
elif [[ "$domain" =~ $px ]] ; then
    md=${BASH_REMATCH[2]}
fi
echo "$domain -> $md"
This could be extended to handle the few 4-letter domains.
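To try the three-pattern rule on several names, it can be wrapped in a function. The following is a sketch only: `main_domain` is a hypothetical name, and the fallback pattern is written here to capture the last two labels, which is my reading of the intent. By the rule as stated, a plain country-code name with a subdomain such as www.example.de is treated as something.bb.cc and returned whole, so it is one of the exceptions.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the three patterns; prints the main domain.
main_domain() {
  local domain=$1
  local p2='^(.*\.)?([^.]+\.[a-z]+\.[a-z][a-z])$'              # name.bb.cc
  local p3='^(.*\.)?([^.]+\.(com|org|net|int|edu|gov|mil))$'   # name.com and similar
  local px='^(.*\.)?([^.]+\.[a-z]+)$'                          # fallback: last two labels
  if [[ $domain =~ $p2 ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  elif [[ $domain =~ $p3 ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  elif [[ $domain =~ $px ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  else
    return 1   # nothing matched (for example a single label)
  fi
}

main_domain www.e-learning.go4progress.co.uk   # go4progress.co.uk
main_domain mail.example.com                   # example.com
main_domain a.b.example.info                   # example.info
```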
How to distinguish the domain from a subdomain
My solution to find the domain as registered with the registrar:
wget https://raw.githubusercontent.com/gavingmiller/second-level-domains/master/SLDs.csv
DOMAIN="www.e-learning.go4progress.co.uk";
KEEPPARTS=2;
TWOLEVELS=$( /bin/echo "${DOMAIN}" | /usr/bin/rev | /usr/bin/cut -d "." --output-delimiter=".\\" -f 1-2 | /usr/bin/rev );
if /bin/grep -P ",\.${TWOLEVELS}" SLDs.csv >/dev/null; then
KEEPPARTS=3;
fi
DOMAIN=$( /bin/echo "${DOMAIN}" | /usr/bin/rev | /usr/bin/cut -d "." -f "1-${KEEPPARTS}" | /usr/bin/rev );
echo "${DOMAIN}"
Thanks to https://github.com/gavingmiller/second-level-domains and https://github.com/medialize/URI.js/issues/17#issuecomment-3976617
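The same lookup can be sketched as a reusable function without `rev`. Everything named here is hypothetical: `registered_domain` is an illustrative helper, and the two-line sample list only imitates the layout I assume SLDs.csv has (suffix in the second comma-separated column, for example `.uk,.co.uk`).

```shell
#!/usr/bin/env bash
# registered_domain HOST SLD_FILE  (hypothetical helper)
# Keeps the last 3 labels when the last 2 appear in the SLD list, else the last 2.
registered_domain() {
  local host=$1 sld_file=$2 keep=2 n i out two
  local -a labels
  IFS=. read -ra labels <<< "$host"
  n=${#labels[@]}
  (( n >= 2 )) || { printf '%s\n' "$host"; return 0; }
  two="${labels[n-2]}.${labels[n-1]}"
  # Assumption: the second CSV column holds entries such as ".co.uk".
  if (( n >= 3 )) && cut -d, -f2 "$sld_file" | grep -qxF ".${two}"; then
    keep=3
  fi
  out=${labels[n-keep]}
  for (( i = n - keep + 1; i < n; i++ )); do
    out+=".${labels[i]}"
  done
  printf '%s\n' "$out"
}

sld_list=$(mktemp)
printf '%s\n' '.uk,.co.uk' '.br,.com.br' > "$sld_list"   # stand-in for SLDs.csv
registered_domain www.e-learning.go4progress.co.uk "$sld_list"   # go4progress.co.uk
registered_domain www.example.com "$sld_list"                    # example.com
rm -f "$sld_list"
```

Matching with `grep -xF` compares whole fixed strings, so the dots in the suffix do not need escaping.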
Foreach loop in bash
Using grep:
grep -F -f domains.csv url.csv
Test Results:
$ cat wordlist
github.com
youtube.com
facebook.com
$ cat urllist
| URL |
| ------------------------------|
| http://github.com/name |
| http://stackoverflow.com/name2|
| http://stackoverflow.com/name3|
| http://www.linkedin.com/name3 |
$ grep -F -f wordlist urllist
| http://github.com/name |
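For comparison, this is roughly the explicit loop that `grep -F -f` replaces. It is a sketch with a hypothetical function name (`match_urls`), and it differs from grep in one respect: empty lines in the word list are skipped here, whereas an empty pattern makes grep match every line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each line of URL_FILE that contains any line of WORD_FILE.
match_urls() {
  local word_file=$1 url_file=$2 url word
  while IFS= read -r url; do
    while IFS= read -r word; do
      # Fixed-string containment test; stop at the first word that matches.
      if [[ -n $word && $url == *"$word"* ]]; then
        printf '%s\n' "$url"
        break
      fi
    done < "$word_file"
  done < "$url_file"
  return 0
}

# Usage with the files from the test above:
#   match_urls wordlist urllist
```

The loop rereads the word list for every URL, so for large inputs the single grep call is the better choice.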
Extract filename and extension in Bash
First, get the file name without the path:
filename=$(basename -- "$fullfile")
extension="${filename##*.}"
filename="${filename%.*}"
Alternatively, you can focus on the last '/' of the path instead of the '.', which should work even if you have unpredictable file extensions:
filename="${fullfile##*/}"
You may want to check the documentation:
- On the web, in section "3.5.3 Shell Parameter Expansion" of the Bash Reference Manual
- In the bash manpage, in the section called "Parameter Expansion"
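The expansions above give surprising results for names without a dot (the "extension" becomes the whole name) and for dotfiles such as .bashrc. A minimal sketch that guards against both follows; `split_filename` and the sample paths are hypothetical, and treating only the last dot as the separator is a design choice, so archive.tar.gz yields the extension gz.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<name without extension><TAB><extension>" for a path.
split_filename() {
  local name=${1##*/} stem ext
  if [[ $name == ?*.* ]]; then   # has a dot that is not the first character
    stem=${name%.*}              # strip the shortest ".suffix" from the end
    ext=${name##*.}              # keep what follows the last dot
  else
    stem=$name                   # no extension (README) or a dotfile (.bashrc)
    ext=''
  fi
  printf '%s\t%s\n' "$stem" "$ext"
}

split_filename /home/user/archive.tar.gz   # archive.tar<TAB>gz
split_filename /home/user/.bashrc          # .bashrc<TAB>(empty)
split_filename README                      # README<TAB>(empty)
```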