Check if string exists in multiple csv files and write row to file
This could be done in Python 2.x as follows:
from itertools import dropwhile
from collections import defaultdict
import glob
import csv

fieldnames = ['E-MAIL ADDRESS', 'FIRST TIME LOGGED IN', 'LAST TIME LOGGED IN', 'USERNAME']
emails = defaultdict(list)

for csv_filename in glob.glob('*.csv'):
    with open(csv_filename, 'rb') as f_input:
        csv_reader = csv.DictReader(f_input, fieldnames=fieldnames, skipinitialspace=True)
        next(dropwhile(lambda x: x['E-MAIL ADDRESS'] != 'E-MAIL ADDRESS', csv_reader))
        for row in csv_reader:
            emails[row['E-MAIL ADDRESS']].append(row)

with open('output.csv', 'wb') as f_output:
    csv_writer = csv.DictWriter(f_output, fieldnames=fieldnames, extrasaction='ignore')
    csv_writer.writeheader()
    for email, rows in sorted(emails.items()):
        if len(rows) > 1:
            csv_writer.writerows(rows)
This uses the glob.glob() function to find all the .csv files in the current directory. In each file it skips every line until the header line starting with E-MAIL ADDRESS is found, then groups the remaining rows by email address. Finally it writes to output.csv only those rows whose email address is seen more than once across all the CSV files found.
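On Python 3 the same approach needs text-mode files opened with newline='' instead of binary mode. A minimal sketch, wrapped in a function for reuse, assuming the same field names as above:

```python
import csv
from collections import defaultdict
from itertools import dropwhile

FIELDNAMES = ['E-MAIL ADDRESS', 'FIRST TIME LOGGED IN', 'LAST TIME LOGGED IN', 'USERNAME']

def collect_duplicates(csv_filenames, output_filename):
    """Gather rows per email address; write out addresses seen more than once."""
    emails = defaultdict(list)
    for csv_filename in csv_filenames:
        # Python 3: text mode with newline='' is what the csv module expects
        with open(csv_filename, newline='') as f_input:
            reader = csv.DictReader(f_input, fieldnames=FIELDNAMES, skipinitialspace=True)
            # skip any preamble up to and including the header row itself
            next(dropwhile(lambda r: r['E-MAIL ADDRESS'] != 'E-MAIL ADDRESS', reader))
            for row in reader:
                emails[row['E-MAIL ADDRESS']].append(row)
    with open(output_filename, 'w', newline='') as f_output:
        writer = csv.DictWriter(f_output, fieldnames=FIELDNAMES, extrasaction='ignore')
        writer.writeheader()
        for email, rows in sorted(emails.items()):
            if len(rows) > 1:  # keep only addresses seen more than once
                writer.writerows(rows)
```

Called as collect_duplicates(glob.glob('*.csv'), 'output.csv'), it behaves like the Python 2 version above.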
How to skip repeated entries in a .csv file
Would you please try the following:
declare -A seen                         # memorize the appearance of IPs
echo "Subdomain,IP" > subdomainIP.csv   # overwrite, don't append
while IFS= read -r line; do
    ipValue=                            # initialize the value
    while IFS= read -r ip; do
        if [[ $ip =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
            ipValue+="${ip}-"           # join multiple results with "-"
        fi
    done < <(dig +short "$line")        # dig may return multiple lines
    ipValue=${ipValue%-}                # remove the trailing "-" if any
    if [[ -n $ipValue ]] && (( seen[$ipValue]++ == 0 )); then
        # the IP list is non-empty and has not been seen before
        echo "$line,$ipValue" >> subdomainIP.csv
    fi
done < URLs.txt
- The associative array seen is the key to the purpose. It is indexed by an arbitrary string (the IP address in this case) and memorizes the value associated with that string, which makes it well suited to checking whether an IP address has already appeared on an earlier input line.
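To see the deduplication idea in isolation, here is a minimal sketch of a bash function that prints each input line only the first time it appears, using an associative array as the "seen" memo (the dedup function name is my own):

```shell
#!/usr/bin/env bash
declare -A seen

dedup() {
    local value
    while IFS= read -r value; do
        # seen[$value]++ yields the OLD count, so it is 0 only the
        # first time this particular value is encountered
        if (( seen[$value]++ == 0 )); then
            printf '%s\n' "$value"
        fi
    done
}
```

For example, printf '1.2.3.4\n1.2.3.4\n5.6.7.8\n' | dedup prints each address once.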
Reading a CSV file and finding the value in the nth column using Robot Framework
First I tried the built-in library I had suggested to you, github.com/s4int/robotframework-CSVLibrary, but I ran into some errors, probably because of the format of my data.csv, and did not have enough time to debug them. Instead I created a custom Python library for your problem, which you can use for your work.
data.csv
Name,Age,region,country,Marks
pankaj,22,delhi,india,45
neeraj,32,noida,india,75
Python code to parse this data with the csv module and return the value at the nth row and nth column:
import csv

# Function to return the value at the nth row and nth column
def go_to_nth_row_nth_column(File, row_no, col_no):
    row_no = int(row_no)
    col_no = int(col_no)
    with open(File) as ip:
        reader = csv.reader(ip)
        for i, row in enumerate(reader):
            if i == row_no:          # here's the row
                return row[col_no]   # here's the column

# Function to find a string value, handling duplicate occurrences as well
def search_cell(File, search_string):
    search_position = []  # will collect the (row, column) occurrences
    with open(File) as ip:
        reader = csv.reader(ip)
        for i, row in enumerate(reader):
            for j, column in enumerate(row):
                if search_string in column:
                    # a list of (row, column) tuples, in case of multiple occurrences
                    search_position.append((i, j))
    return search_position
You can use this as a library in your robot file, like below:
*** Settings ***
Library    csv2.py

*** Test Cases ***
Test
    Check row column
    Search String

*** Keywords ***
Check row column
    ${result} =    go_to_nth_row_nth_column    data.csv    2    1
    log    ${result}

Search String
    ${result1} =    search_cell    data.csv    india
    log    ${result1}
read, compare, and save 2 files with shell script (little more than what it sounds like)
The general purpose standard UNIX tool for manipulating text is awk:
$ awk '
BEGIN { FS=OFS=" ," }
NR==FNR { a[$1]=$2; next }
{ print $0 ($2 in a ? OFS a[$2] : "") }
' file1 file2
outlook.office365.com ,174.203.0.118 ,UserLoginFailed
outlook.office365.com ,107.147.166.60 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,107.147.167.26 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,174.205.17.24 ,UserLoginFailed
outlook.office365.com ,108.48.185.186 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,174.226.15.21 ,UserLoginFailed
outlook.office365.com ,108.51.114.130 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,67.180.23.93 ,UserLoginFailed
outlook.office365.com ,142.255.102.68 ,UserLoginFailed ,SUSPICIOUS IP
outlook.office365.com ,164.106.75.235 ,UserLoginFailed
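In the awk above, NR==FNR is true only while the first file is being read, so the first block memorizes file1's first field (the IP) mapped to its second field (the tag); the second block then appends that tag to any line of file2 whose second field matches. For comparison, the same two-pass join can be sketched in Python, assuming (as the awk implies) that file1 holds lines like "107.147.166.60 ,SUSPICIOUS IP" and that the field separator is " ,":

```python
def tag_suspicious(file1_lines, file2_lines, sep=' ,'):
    """Append file1's tag to every file2 line whose 2nd field is a known IP."""
    # first pass: map file1's first field (the IP) to its second field (the tag)
    tags = {}
    for line in file1_lines:
        fields = line.rstrip('\n').split(sep)
        if len(fields) >= 2:
            tags[fields[0]] = fields[1]
    # second pass: append the tag when file2's second field is a known IP
    out = []
    for line in file2_lines:
        line = line.rstrip('\n')
        fields = line.split(sep)
        if len(fields) >= 2 and fields[1] in tags:
            out.append(line + sep + tags[fields[1]])
        else:
            out.append(line)
    return out
```

This is only a sketch of the idiom; the awk one-liner remains the more natural tool here.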
How do you extract IP addresses from files using a regex in a linux shell?
You could use grep to pull them out.
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt
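Note that this pattern happily matches dotted numbers that are not valid IPs (e.g. 999.999.999.999) and substrings of longer dotted sequences. A slightly stricter sketch using GNU grep's extended regex with \b word boundaries (sample.txt is a hypothetical input file created here for illustration; out-of-range octets still slip through):

```shell
printf 'host 10.0.0.1 up\nno ip here\nend 192.168.1.255\n' > sample.txt
grep -oE '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' sample.txt
```

For full validation of each octet you would need a longer alternation or a post-filter.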
Python text file into .csv
If you're trying to open your .csv in Excel, it's probably not going to be very readable.
CSV stands for comma-separated values, and your output has no commas but plenty of whitespace and newlines, which Excel can't really be expected to handle. Here's what the output of your program is (plus newlines):
hostname cisco1841
interface Loopback0
ip address 10.10.0.1 255.255.255.255
You can fix the output by writing commas in place of the spaces that are already there. The ip address line you want to grab is harder, since its name also contains a space; I solved this by grabbing only the first IP address listed on that line (grabbing the other one, or both, is an easy change). I also removed the buffer and keepCurrentResultSet you were using, because as far as I can see there's no need for them.
inFile = open("text.txt")
outFile = open("result.csv", "w")

for line in inFile:
    if line.startswith("hostname"):
        outFile.write(line.replace(' ', ','))
    elif line.startswith("interface Loopback"):
        outFile.write(line.replace(' ', ','))
        ipAddrLine = next(inFile)
        ipAddress = ipAddrLine.split(' ')[2:3]
        outFile.write('ip address,' + ','.join(ipAddress))

inFile.close()
outFile.close()
That gives this output, which should be considered valid .csv format by Excel:
hostname,cisco1841
interface,Loopback0
ip address,10.10.0.1
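A variant of the same idea that uses csv.writer instead of string replacement sidesteps quoting issues and keeps the parsing separate from the writing. This is only a sketch under the same assumed input format; the config_to_rows name and the generator structure are my own:

```python
import csv

def config_to_rows(lines):
    """Yield (key, value) rows from a hypothetical Cisco-style config."""
    it = iter(lines)
    for line in it:
        line = line.strip()
        if line.startswith('hostname'):
            yield line.split(None, 1)             # ['hostname', 'cisco1841']
        elif line.startswith('interface Loopback'):
            yield ['interface', line.split()[1]]  # ['interface', 'Loopback0']
            ip_fields = next(it, '').split()      # consume the following ip line
            if len(ip_fields) >= 3:
                yield ['ip address', ip_fields[2]]  # first address only, as above
```

Writing the file then becomes: with open('result.csv', 'w', newline='') as f: csv.writer(f).writerows(config_to_rows(open('text.txt'))).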
Script to compare a string in two different files
Here's the approach I'd take:
- Iterate over each csv file (Python has a handy csv module for accomplishing this), capturing the mac-address and placing it in a set (one per file). Once again, Python has a great built-in set type; see the csv module examples and, of course, the docs.
- Next, take the intersection of set1 (file1) and set2 (file2). This will show you the mac-addresses that exist in both files.
Example (in python):
s1 = set([1,2,3]) # You can add things incrementally with "s1.add(value)"
s2 = set([2,3,4])
shared_items = s1.intersection(s2)
print shared_items
Which outputs:
set([2, 3])
Logging these shared items could be done with anything from printing (and redirecting the output to a file), to using the logging module, to saving directly to a file.
I'm not sure how in-depth of an answer you were looking for, but this should get you started.
Update: CSV/Set usage example
Assuming you have a file "foo.csv", that looks something like this:
bob,123,127.0.0.1,mac-address-1
fred,124,127.0.0.1,mac-address-2
The simplest way to build the set, would be something like this:
import csv

set1 = set()
for record in csv.reader(open('foo.csv', 'rb')):
    user, machine_id, ip_address, mac_address = record
    set1.add(mac_address)
    # or simply "set1.add(record[3])", if you don't need the other fields
Obviously, you'd need something like this for each file, so you may want to put this in a function to make life easier.
Finally, if you want to go the less-verbose-but-cooler-python-way, you could also build the set like this:
csvfile = csv.reader(open('foo.csv', 'rb'))
set1 = set(rec[3] for rec in csvfile) # Assuming mac-address is the 4th column.
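Putting the pieces together for Python 3 (open the files in text mode with newline='' rather than 'rb'), a small helper might look like this; shared_macs, the filenames, and the default column index are placeholders:

```python
import csv

def shared_macs(file_a, file_b, column=3):
    """Return the set of values in `column` that appear in both CSV files."""
    def column_set(path):
        # Python 3: text mode with newline='' for the csv module
        with open(path, newline='') as f:
            return {row[column] for row in csv.reader(f) if len(row) > column}
    return column_set(file_a) & column_set(file_b)  # set intersection
```

Logging the result to a file is then one line, e.g. '\n'.join(sorted(shared_macs('f1.csv', 'f2.csv'))) written wherever you like.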