Check if string exists in multiple csv files and write row to file
This could be done in Python 2.x as follows:
from itertools import dropwhile
from collections import defaultdict
import glob
import csv

fieldnames = ['E-MAIL ADDRESS', 'FIRST TIME LOGGED IN', 'LAST TIME LOGGED IN', 'USERNAME']
emails = defaultdict(list)

for csv_filename in glob.glob('*.csv'):
    with open(csv_filename, 'rb') as f_input:
        csv_reader = csv.DictReader(f_input, fieldnames=fieldnames, skipinitialspace=True)
        next(dropwhile(lambda x: x['E-MAIL ADDRESS'] != 'E-MAIL ADDRESS', csv_reader))
        for row in csv_reader:
            emails[row['E-MAIL ADDRESS']].append(row)

with open('output.csv', 'wb') as f_output:
    csv_writer = csv.DictWriter(f_output, fieldnames=fieldnames, extrasaction='ignore')
    csv_writer.writeheader()
    for email, rows in sorted(emails.items()):
        if len(rows) > 1:
            csv_writer.writerows(rows)
This uses the glob.glob() function to find all the .csv files in the current directory. In each file it skips every line until the header line starting with E-MAIL ADDRESS is found, then groups the remaining rows by email address. Finally it writes to output.csv only those rows whose email address is seen more than once across all the CSV files found.
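On Python 3 the same approach needs text-mode files opened with newline='' instead of binary mode. A minimal sketch, wrapped in a function for reuse, assuming the same field names as above:

```python
import csv
from collections import defaultdict
from itertools import dropwhile

FIELDNAMES = ['E-MAIL ADDRESS', 'FIRST TIME LOGGED IN', 'LAST TIME LOGGED IN', 'USERNAME']

def collect_duplicates(csv_filenames, output_filename):
    """Gather rows per email address; write out addresses seen more than once."""
    emails = defaultdict(list)
    for csv_filename in csv_filenames:
        # Python 3: text mode with newline='' is what the csv module expects
        with open(csv_filename, newline='') as f_input:
            reader = csv.DictReader(f_input, fieldnames=FIELDNAMES, skipinitialspace=True)
            # skip any preamble up to and including the header row itself
            next(dropwhile(lambda r: r['E-MAIL ADDRESS'] != 'E-MAIL ADDRESS', reader))
            for row in reader:
                emails[row['E-MAIL ADDRESS']].append(row)
    with open(output_filename, 'w', newline='') as f_output:
        writer = csv.DictWriter(f_output, fieldnames=FIELDNAMES, extrasaction='ignore')
        writer.writeheader()
        for email, rows in sorted(emails.items()):
            if len(rows) > 1:  # keep only addresses seen more than once
                writer.writerows(rows)
```

Called as collect_duplicates(glob.glob('*.csv'), 'output.csv'), it behaves like the Python 2 version above.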
How to skip repeated entries in a .csv file
Would you please try the following:
declare -A seen                         # memorize the appearance of IPs
echo "Subdomain,IP" > subdomainIP.csv   # overwrite, don't append
while IFS= read -r line; do
    ipValue=                            # initialize the value
    while IFS= read -r ip; do
        if [[ $ip =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
            ipValue+="${ip}-"           # join multiple results with "-"
        fi
    done < <(dig +short "$line")        # dig may return multiple lines
    ipValue=${ipValue%-}                # remove the trailing "-" if any
    if [[ -n $ipValue ]] && (( seen[$ipValue]++ == 0 )); then
        # the IP list is non-empty and has not been seen before
        echo "$line,$ipValue" >> subdomainIP.csv
    fi
done < URLs.txt
- The associative array seen is the key to the purpose. It is indexed by an arbitrary string (the IP address in this case) and memorizes the value associated with that string, which makes it well suited to checking whether an IP address has already appeared on an earlier input line.
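To see the deduplication idea in isolation, here is a minimal sketch of a bash function that prints each input line only the first time it appears, using an associative array as the "seen" memo (the dedup function name is my own):

```shell
#!/usr/bin/env bash
declare -A seen

dedup() {
    local value
    while IFS= read -r value; do
        # seen[$value]++ yields the OLD count, so it is 0 only the
        # first time this particular value is encountered
        if (( seen[$value]++ == 0 )); then
            printf '%s\n' "$value"
        fi
    done
}
```

For example, printf '1.2.3.4\n1.2.3.4\n5.6.7.8\n' | dedup prints each address once.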
Reading a CSV file and finding the value in the nth column using Robot Framework
First I tried the built-in library I had suggested to you, github.com/s4int/robotframework-CSVLibrary, but I ran into some errors, probably because of the format of my data.csv, and did not have enough time to debug them. Instead I created a custom Python library for your problem, which you can use for your work.
data.csv
Name,Age,region,country,Marks
pankaj,22,delhi,india,45
neeraj,32,noida,india,75
Python code to parse this data with the csv module and return the value at the nth row and nth column:
import csv

# Function to return the value at the nth row and nth column
def go_to_nth_row_nth_column(File, row_no, col_no):
    row_no = int(row_no)
    col_no = int(col_no)
    with open(File) as ip:
        reader = csv.reader(ip)
        for i, row in enumerate(reader):
            if i == row_no:          # here's the row
                return row[col_no]   # here's the column

# Function to find a string value, handling duplicate occurrences as well
def search_cell(File, search_string):
    search_position = []  # will collect the (row, column) occurrences
    with open(File) as ip:
        reader = csv.reader(ip)
        for i, row in enumerate(reader):
            for j, column in enumerate(row):
                if search_string in column:
                    # a list of (row, column) tuples, in case of multiple occurrences
                    search_position.append((i, j))
    return search_position
You can use this as a library in your robot file, like below:
*** Settings ***
Library    csv2.py

*** Test Cases ***
Test
    Check row column
    Search String

*** Keywords ***
Check row column
    ${result} =    go_to_nth_row_nth_column    data.csv    2    1
    log    ${result}

Search String
    ${result1} =    search_cell    data.csv    india
    log    ${result1}
read, compare, and save 2 files with shell script (little more than what it sounds like)
The general purpose standard UNIX tool for manipulating text is awk:
$ awk '
BEGIN { FS=OFS=" ," }
NR==FNR { a[$1]=$2; next }
{ print $0 ($2 in a ? OFS a[$2] : "") }
' file1 file2
outlook.office365.com ,174.203.0.118 ,UserLoginFailed
outlook.office365.com ,107.147.166.60 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,107.147.167.26 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,174.205.17.24 ,UserLoginFailed
outlook.office365.com ,108.48.185.186 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,174.226.15.21 ,UserLoginFailed
outlook.office365.com ,108.51.114.130 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,67.180.23.93 ,UserLoginFailed
outlook.office365.com ,142.255.102.68 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,164.106.75.235 ,UserLoginFailed
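In the awk above, NR==FNR is true only while the first file is being read, so the first block memorizes file1's first field (the IP) mapped to its second field (the tag); the second block then appends that tag to any line of file2 whose second field matches. For comparison, the same two-pass join can be sketched in Python, assuming (as the awk implies) that file1 holds lines like "107.147.166.60 ,SUSPICIOUS IP" and that the field separator is " ,":

```python
def tag_suspicious(file1_lines, file2_lines, sep=' ,'):
    """Append file1's tag to every file2 line whose 2nd field is a known IP."""
    # first pass: map file1's first field (the IP) to its second field (the tag)
    tags = {}
    for line in file1_lines:
        fields = line.rstrip('\n').split(sep)
        if len(fields) >= 2:
            tags[fields[0]] = fields[1]
    # second pass: append the tag when file2's second field is a known IP
    out = []
    for line in file2_lines:
        line = line.rstrip('\n')
        fields = line.split(sep)
        if len(fields) >= 2 and fields[1] in tags:
            out.append(line + sep + tags[fields[1]])
        else:
            out.append(line)
    return out
```

This is only a sketch of the idiom; the awk one-liner remains the more natural tool here.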
How do you extract IP addresses from files using a regex in a linux shell?
You could use grep to pull them out.
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt
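Note that this pattern happily matches dotted numbers that are not valid IPs (e.g. 999.999.999.999) and substrings of longer dotted sequences. A slightly stricter sketch using GNU grep's extended regex with \b word boundaries (sample.txt is a hypothetical input file created here for illustration; out-of-range octets still slip through):

```shell
printf 'host 10.0.0.1 up\nno ip here\nend 192.168.1.255\n' > sample.txt
grep -oE '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' sample.txt
```

For full validation of each octet you would need a longer alternation or a post-filter.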
Python text file into .csv
If you're trying to open your .csv in Excel, it's probably not going to be very readable.
CSV stands for comma-separated values, and your output has no commas but plenty of whitespace and newlines, which Excel can't really be expected to handle. Here's what the output of your program is (plus newlines):
hostname cisco1841
interface Loopback0
ip address 10.10.0.1 255.255.255.255
You can fix the output by writing commas in place of the spaces that are already there. The ip address line you want to grab is harder, since its name also contains a space; I solved this by grabbing only the first IP address listed on that line (grabbing the other one, or both, is an easy change). I also removed the buffer and keepCurrentResultSet you were using, because as far as I can see there's no need for them.
inFile = open("text.txt")
outFile = open("result.csv", "w")

for line in inFile:
    if line.startswith("hostname"):
        outFile.write(line.replace(' ', ','))
    elif line.startswith("interface Loopback"):
        outFile.write(line.replace(' ', ','))
        ipAddrLine = next(inFile)
        ipAddress = ipAddrLine.split(' ')[2:3]
        outFile.write('ip address,' + ','.join(ipAddress))

inFile.close()
outFile.close()
That gives this output, which should be considered valid .csv format by Excel:
hostname,cisco1841
interface,Loopback0
ip address,10.10.0.1
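A variant of the same idea that uses csv.writer instead of string replacement sidesteps quoting issues and keeps the parsing separate from the writing. This is only a sketch under the same assumed input format; the config_to_rows name and the generator structure are my own:

```python
import csv

def config_to_rows(lines):
    """Yield (key, value) rows from a hypothetical Cisco-style config."""
    it = iter(lines)
    for line in it:
        line = line.strip()
        if line.startswith('hostname'):
            yield line.split(None, 1)             # ['hostname', 'cisco1841']
        elif line.startswith('interface Loopback'):
            yield ['interface', line.split()[1]]  # ['interface', 'Loopback0']
            ip_fields = next(it, '').split()      # consume the following ip line
            if len(ip_fields) >= 3:
                yield ['ip address', ip_fields[2]]  # first address only, as above
```

Writing the file then becomes: with open('result.csv', 'w', newline='') as f: csv.writer(f).writerows(config_to_rows(open('text.txt'))).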
Script to compare a string in two different files
Here's the approach I'd take:
- Iterate over each csv file (Python has a handy csv module for accomplishing this), capturing the mac-address and placing it in a set (one per file). Once again, Python has a great built-in set type; see the csv module examples and, of course, the docs.
- Next, take the intersection of set1 (file1) and set2 (file2). This will show you the mac-addresses that exist in both files.
Example (in python):
s1 = set([1,2,3]) # You can add things incrementally with "s1.add(value)"
s2 = set([2,3,4])
shared_items = s1.intersection(s2)
print shared_items
Which outputs:
set([2, 3])
Logging these shared items could be done with anything from printing (and redirecting the output to a file), to using the logging module, to saving directly to a file.
I'm not sure how in-depth of an answer you were looking for, but this should get you started.
Update: CSV/Set usage example
Assuming you have a file "foo.csv", that looks something like this:
bob,123,127.0.0.1,mac-address-1
fred,124,127.0.0.1,mac-address-2
The simplest way to build the set, would be something like this:
import csv

set1 = set()
for record in csv.reader(open('foo.csv', 'rb')):
    user, machine_id, ip_address, mac_address = record
    set1.add(mac_address)
    # or simply "set1.add(record[3])", if you don't need the other fields
Obviously, you'd need something like this for each file, so you may want to put this in a function to make life easier.
Finally, if you want to go the less-verbose-but-cooler-python-way, you could also build the set like this:
csvfile = csv.reader(open('foo.csv', 'rb'))
set1 = set(rec[3] for rec in csvfile) # Assuming mac-address is the 4th column.
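Putting the pieces together for Python 3 (open the files in text mode with newline='' rather than 'rb'), a small helper might look like this; shared_macs, the filenames, and the default column index are placeholders:

```python
import csv

def shared_macs(file_a, file_b, column=3):
    """Return the set of values in `column` that appear in both CSV files."""
    def column_set(path):
        # Python 3: text mode with newline='' for the csv module
        with open(path, newline='') as f:
            return {row[column] for row in csv.reader(f) if len(row) > column}
    return column_set(file_a) & column_set(file_b)  # set intersection
```

Logging the result to a file is then one line, e.g. '\n'.join(sorted(shared_macs('f1.csv', 'f2.csv'))) written wherever you like.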