How to Find a List of IP Addresses in Another File

A better script:

#!/bin/sh
# searchip.sh: read IPs on stdin and look each one up in the file named by $ips
while read -r ip
do
    # -F matches the address as a fixed string (a dot would otherwise match any character)
    grep -qF "$ip" "$ips" && echo "$ip" >> ip.found || echo "$ip" >> ip.notfound
done

Name the script "searchip.sh"

Assuming your input file is "iplist" and the lookup file is "ips", set the variable so the script can see it and call it like this:

export ips=ips
cat iplist | sh searchip.sh

or

ips=ips sh searchip.sh < iplist

You then get two files: ip.found for the addresses that were found and ip.notfound for those that were not.

What you need is shell I/O redirection.
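
For comparison, the same found/not-found split can be written as a short Python sketch (assuming the same file names iplist and ips as above, and doing exact line matches rather than grep's substring match):

# Load the lookup file once into a set for fast membership tests
with open("ips") as f:
    known = {line.strip() for line in f}

with open("iplist") as src, \
     open("ip.found", "w") as found, \
     open("ip.notfound", "w") as notfound:
    for line in src:
        ip = line.strip()
        # Write each IP to ip.found or ip.notfound, mirroring the shell script
        (found if ip in known else notfound).write(ip + "\n")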

How to extract all IP addresses present in a line from a file?

If you match globally (//g) in a while loop, you can get all IP addresses:

use warnings;
use strict;
use Regexp::Common qw/net/;

while (<DATA>) {
    my ($grp) = /^(\w+)/;            # leading group name
    my @ips;
    while (/($RE{net}{IPv4})/g) {    # /g collects every IPv4 address on the line
        push @ips, $1;
    }
    print join(',', $grp, @ips), "\n";
}
__DATA__
ip_group1,1.2.3.4,otherstring1,otherstring2,4.5.6.7
ip_group2,3.4.5.6,otherstring1
ip_group3,11.21.31.41,otherstring1,otherstring2,4.5.6.7,otherstring4,1.2.3.4,otherstring5,otherstring2,41.51.16.71

Prints:

ip_group1,1.2.3.4,4.5.6.7
ip_group2,3.4.5.6
ip_group3,11.21.31.41,4.5.6.7,1.2.3.4,41.51.16.71

Fastest way to find lines of a file from another larger file in Bash

A small piece of Perl code solved the problem. This is the approach taken:

  • store the lines of file1.txt in a hash
  • read file2.txt line by line, parse and extract the second field
  • check if the extracted field is in the hash; if so, print the line

Here is the code:

#!/usr/bin/perl -w

use strict;
if (scalar(@ARGV) != 2) {
    printf STDERR "Usage: fgrep.pl smallfile bigfile\n";
    exit(2);
}

my ($small_file, $big_file) = ($ARGV[0], $ARGV[1]);
my ($small_fp, $big_fp, %small_hash, $field);

open($small_fp, "<", $small_file) || die "Can't open $small_file: " . $!;
open($big_fp, "<", $big_file) || die "Can't open $big_file: " . $!;

# store contents of small file in a hash
while (<$small_fp>) {
    chomp;
    $small_hash{$_} = undef;
}
close($small_fp);

# loop through big file and find matches
while (<$big_fp>) {
    # no need for chomp
    $field = (split(/\|/, $_))[1];
    if (defined($field) && exists($small_hash{$field})) {
        printf("%s", $_);
    }
}

close($big_fp);
exit(0);

I ran the above script with 14K lines in file1.txt and 1.3M lines in file2.txt. It finished in about 13 seconds, producing 126K matches. Here is the time output:

real 0m11.694s
user 0m11.507s
sys 0m0.174s

I ran @Inian's awk code:

awk 'FNR==NR{hash[$1]; next}{for (i in hash) if (match($0,i)) {print; break}}' file1.txt FS='|' file2.txt

It was far slower than the Perl solution because it loops over all 14K patterns for each line of file2.txt, which is really expensive. It aborted after processing 592K records of file2.txt, having produced 40K matched lines. Here is the error it stopped with and how long it took:

awk: illegal primary in regular expression 24/Nov/2016||592989 at 592989
input record number 675280, file file2.txt
source line number 1

real 55m5.539s
user 54m53.080s
sys 0m5.095s

Using @Inian's other awk solution, which eliminates the looping issue:

time awk -F '|' 'FNR==NR{hash[$1]; next}$2 in hash' file1.txt FS='|' file2.txt > awk1.out

real 0m39.966s
user 0m37.916s
sys 0m0.743s

time LC_ALL=C awk -F '|' 'FNR==NR{hash[$1]; next}$2 in hash' file1.txt FS='|' file2.txt > awk.out

real 0m41.057s
user 0m38.475s
sys 0m0.904s

awk is very impressive here, given that we didn't have to write an entire program to do it.

I ran @oliv's Python code as well. It took about 15 hours to complete the job and looked like it produced the right results. Building a huge regex isn't as efficient as using a hash lookup. Here is the time output:

real 895m14.862s
user 806m59.219s
sys 1m12.147s
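
For reference, the same hash-lookup idea is only a few lines in Python as well; this is a minimal sketch, assuming the same pipe-delimited input with the key in the second field, mirroring the Perl script above:

import sys

# Build a set of the lines from the small file (the equivalent of the Perl hash)
with open(sys.argv[1]) as small:
    keys = {line.rstrip("\n") for line in small}

# Stream the big file and print every line whose second '|'-separated field is in the set
with open(sys.argv[2]) as big:
    for line in big:
        fields = line.split("|")
        if len(fields) > 1 and fields[1] in keys:
            sys.stdout.write(line)

It takes the small file and the big file as its two arguments, in the same order as the Perl script.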

I tried to follow the suggestion to use parallel. However, it failed with an "fgrep: memory exhausted" error, even with very small block sizes.


What surprised me was that fgrep was totally unsuitable for this. I aborted it after 22 hours, by which point it had produced about 100K matches. I wish fgrep had an option to keep the contents of the -f file in a hash, just as the Perl code does.

I didn't check the join approach, as I didn't want the additional overhead of sorting the files. Also, given fgrep's poor performance, I don't believe join would have done better than the Perl code.

Thanks everyone for your attention and responses.

COMPARE IP From One File to Another

1.
Open files using with open(...) as ...; in your current code you only open them and never close them. A with block closes the file automatically when it finishes.

with open('a.txt', 'r') as f:
    output_a = [line.strip() for line in f]   # strip the trailing newline from each line
with open('b.txt', 'r') as f:
    output_b = [line.strip() for line in f]

2.
To check whether the first three octets of an IP appear in a list of other IPs, you could do something like this:

# Test data
output_a = ['1.1.2.9', '1.5.65.32']
output_b = ['1.2.57.1', '1.5.65.39']

# This checks whether each 3-octet prefix appears in the other list of IPs
# and, if not, appends the full IP to result.
# The rest I leave up to you ;)

# First solution
result = []
subIP = lambda x: ['.'.join(i.split('.')[:3]) for i in x]
for sub, full in zip(subIP(output_a), output_a):
    if not len([ip for ip in subIP(output_b) if sub == ip]):
        result.append(full)
print(result)
# Will print: ['1.1.2.9']

# Second solution
# Some set operations will also work, but leave you with a set of 3-octet prefixes;
# afterwards you need to convert back to the full addresses if you want
# (see the sketch after this block).
print(set(subIP(output_a)) - set(subIP(output_b)))
# Will print: {'1.1.2'}
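
If you do want the full addresses back after the set difference, one way is to filter output_a by the leftover prefixes, as in this small sketch reusing the subIP helper and test data above:

# Prefixes (first three octets) present in output_a but not in output_b
missing_prefixes = set(subIP(output_a)) - set(subIP(output_b))

# Map those prefixes back to the full addresses they came from
full_missing = [ip for ip in output_a if '.'.join(ip.split('.')[:3]) in missing_prefixes]
print(full_missing)
# Will print: ['1.1.2.9']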

Find IP address in multiple text files and replace it with another string in Python with regex

If the IP is always your last field, you can simply do this:

txt = txt.replace("noresult", txt.split("ip[")[-1])
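
For example, assuming a hypothetical line where the IP follows an "ip[" marker (the original file format isn't shown here), the split picks up everything after the last "ip[":

txt = "host1,noresult,ip[10.0.0.5"   # hypothetical line format
print(txt.split("ip[")[-1])           # 10.0.0.5
print(txt.replace("noresult", txt.split("ip[")[-1]))
# host1,10.0.0.5,ip[10.0.0.5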

In detail, if you want to read, modify, and write:

with open("filename.txt", "r") as f:
    txt = f.read()
txt = txt.replace("noresult", txt.split("ip[")[-1])
with open("filename.txt", "w") as f:
    f.write(txt)

If you have more files, you can collect their paths in a list file_list = ["file1.txt", "file2.txt", ... ] and then repeat the above procedure in a loop over that list:

for filepath in file_list:
    with open(filepath, "r") as f:
        txt = f.read()
    txt = txt.replace("noresult", txt.split("ip[")[-1])
    with open(filepath, "w") as f:
        f.write(txt)

Read an IP address from one file, then look up that IP in another file and print out the corresponding interface

First build an IP list from ips.txt, then check each line of l2circuitconfig.txt, using a regex to extract the IP from the line:

import re

ipList = []
with open('ips.txt', "r") as f:
    for line in f:
        ipList.append(line.strip())

with open('l2circuitconfig.txt', "r") as f:
    for line in f:
        # skip lines that contain no IP address at all
        found = re.findall(r'[0-9]+(?:\.[0-9]+){3}', line)
        if found and found[0] in ipList:
            d = line.split()
            # print the 7th and 9th whitespace-separated fields of the matching line
            print(d[6], '-', d[8])

