Compare two CSV files and search for similar items
Edit: While my solution works correctly, check out Martijn's answer below for a more efficient solution.
The csv module used here is described in the official Python documentation.
What you're looking for is something like this:
import csv

# open() replaces the Python 2-only file() builtin; newline='' is what the
# csv module recommends for files passed to csv.reader / csv.writer.
f1 = open('hosts.csv', 'r', newline='')
f2 = open('masterlist.csv', 'r', newline='')
f3 = open('results.csv', 'w', newline='')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

# Read the master list once so it can be scanned for every hosts row.
masterlist = list(c2)

for hosts_row in c1:
    row = 1
    found = False
    results_row = hosts_row
    for master_row in masterlist:
        if hosts_row[3] == master_row[1]:
            results_row.append('FOUND in master list (row ' + str(row) + ')')
            found = True
            break
        row = row + 1
    if not found:
        results_row.append('NOT FOUND in master list')
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()
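The nested loop above rescans the whole master list for every hosts row. A minimal sketch of a faster lookup, assuming the same column positions (`hosts_row[3]` matched against `master_row[1]`), builds a dictionary from each master value to its first row number once. The function name `annotate_hosts` and the in-memory sample data are illustrative only, and this is not necessarily the approach taken in the answer mentioned above.

```python
import csv
import io


def annotate_hosts(hosts_rows, master_rows):
    """Return each hosts row with a FOUND / NOT FOUND note appended."""
    # Map each master value (column 1) to the first 1-based row where it appears.
    first_seen = {}
    for row_number, master_row in enumerate(master_rows, start=1):
        first_seen.setdefault(master_row[1], row_number)

    results = []
    for hosts_row in hosts_rows:
        row_number = first_seen.get(hosts_row[3])
        if row_number is None:
            note = 'NOT FOUND in master list'
        else:
            note = 'FOUND in master list (row ' + str(row_number) + ')'
        results.append(hosts_row + [note])
    return results


# Hypothetical sample data standing in for hosts.csv and masterlist.csv.
hosts_csv = io.StringIO("a,b,c,10.0.0.1\nd,e,f,10.0.0.9\n")
master_csv = io.StringIO("x,10.0.0.5\ny,10.0.0.1\n")

annotated = annotate_hosts(csv.reader(hosts_csv), list(csv.reader(master_csv)))
for row in annotated:
    print(row)
```

Each lookup is then a single dictionary access instead of a scan, which matters mostly when the master list is large.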
Compare two csv files and write the matching entries in third file python
You are overwriting the output file each time you open it.
Change "w" to "a+":
with open('file3.csv', "a+", encoding....
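To illustrate why the mode matters, here is a small sketch, assuming the file is being reopened inside a loop. It uses a throwaway temporary file and made-up rows. It also shows an alternative to "a+": opening the file once, outside the loop. Unlike append mode, that does not keep accumulating rows from earlier runs of the script.

```python
import csv
import os
import tempfile

# Hypothetical matches to write; the path is a throwaway temp file.
matches = [['alice', '1'], ['bob', '2'], ['carol', '3']]
path = os.path.join(tempfile.mkdtemp(), 'file3.csv')

# Reopening with "w" inside the loop truncates the file on every pass,
# so only the last row survives.
for match in matches:
    with open(path, 'w', newline='') as out:
        csv.writer(out).writerow(match)
with open(path, newline='') as f:
    rows_with_w = list(csv.reader(f))

# Opening once, outside the loop, keeps every row.
with open(path, 'w', newline='') as out:
    writer = csv.writer(out)
    for match in matches:
        writer.writerow(match)
with open(path, newline='') as f:
    rows_opened_once = list(csv.reader(f))

print(rows_with_w)       # only the last row
print(rows_opened_once)  # all three rows
```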
Compare two CSV files and look for matches Python
Try this:
import csv

alist, blist = [], []

# Python 3: open in text mode with newline="" instead of "rb".
with open("csv1.csv", newline="") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        for row_str in row:
            # A cell may hold several words, so split it on whitespace.
            alist += row_str.strip().split()

with open("organs.csv", newline="") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist += row

first_set = set(alist)
second_set = set(blist)

print(first_set.intersection(second_set))
Basically, iterating over the csv reader yields each row as a list of strings, like ['arm', 'biopsy', 'forearm'], so the lists are concatenated to collect all of the items.
Converting each list with set() removes duplicates, and the intersection method returns a new set containing only the elements present in both.
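As a self-contained illustration of the same idea, the sketch below uses in-memory strings in place of the two files (the contents are made up) and fills the sets directly rather than building lists first.

```python
import csv
import io

# Hypothetical contents standing in for csv1.csv and organs.csv.
file_a = io.StringIO("arm biopsy,forearm\nliver scan,arm\n")
file_b = io.StringIO("arm,liver\nkidney,heart\n")

# Each cell in the first file may hold several words, so split on whitespace.
words_a = set()
for row in csv.reader(file_a):
    for cell in row:
        words_a.update(cell.split())

# Cells in the second file are taken as-is.
words_b = set()
for row in csv.reader(file_b):
    words_b.update(row)

# The & operator is equivalent to words_a.intersection(words_b).
common = words_a & words_b
print(sorted(common))
```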
How to compare two csv files in python and flag the differences?
The idea here is to flatten your dataframes with melt to compare each value:
import numpy as np
import pandas as pd

# Load your csv files
df1 = pd.read_csv('file1.csv', ...)
df2 = pd.read_csv('file2.csv', ...)

# Select columns (not mandatory, it depends on your 'Sn' column)
cols = ['Name', 'Subject', 'Marks']

# Flatten your dataframes
out1 = df1[cols].melt('Name', var_name='Item', value_name='Old')
out2 = df2[cols].melt('Name', var_name='Item', value_name='New')
out = pd.merge(out1, out2, on=['Name', 'Item'], how='outer')

# Flag the state of each item. np.select uses the first matching condition,
# so the NaN checks come before the inequality test (NaN != x is always True).
condlist = [out['Old'].isna(),
            out['New'].isna(),
            out['Old'] != out['New']]
out['State'] = np.select(condlist, choicelist=['added', 'deleted', 'changed'],
                         default='unchanged')
Output:
>>> out
     Name     Item      Old       New      State
0     Ram  Subject    Maths  computer    changed
1    sita  Subject  Engilsh   Engilsh  unchanged
2  vishnu  Subject  science   science  unchanged
3  balaji  Subject   social    social  unchanged
4     Ram    Marks       85        85  unchanged
5    sita    Marks       66        66  unchanged
6  vishnu    Marks       50        90    changed
7  balaji    Marks       60        60  unchanged
8  kishor  Subject      NaN      chem      added
9  kishor    Marks      NaN        99      added
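For comparison, the same cell-by-cell flagging can be sketched with only the standard library. This is an alternative to the pandas approach, not a reproduction of it: it assumes 'Name' identifies a row uniquely, uses made-up in-memory data, and treats a name missing from the old file as 'added' and one missing from the new file as 'deleted'. The helper `load` is hypothetical.

```python
import csv
import io

# Hypothetical file contents; 'Name' is assumed to identify a row uniquely.
old_csv = io.StringIO("Name,Subject,Marks\nRam,Maths,85\nvishnu,science,50\n")
new_csv = io.StringIO(
    "Name,Subject,Marks\nRam,computer,85\nvishnu,science,90\nkishor,chem,99\n"
)


def load(handle):
    """Index rows by Name: {name: {column: value}}."""
    return {row['Name']: row for row in csv.DictReader(handle)}


old, new = load(old_csv), load(new_csv)
states = {}
for name in sorted(old.keys() | new.keys()):
    for item in ('Subject', 'Marks'):
        old_value = old.get(name, {}).get(item)
        new_value = new.get(name, {}).get(item)
        if old_value is None:
            state = 'added'
        elif new_value is None:
            state = 'deleted'
        elif old_value != new_value:
            state = 'changed'
        else:
            state = 'unchanged'
        states[(name, item)] = state
        print(name, item, old_value, new_value, state)
```

Values are compared as strings here, so '85' and '85.0' would count as changed. Convert them first if numeric equality is what you need.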
Compare two csv files and output changes
My solution is to turn each csv into a dictionary with the first column as the keys and the second column as the values. After that, I can loop through the keys and determine if the corresponding values were changed, removed, or added.
import csv
import re


def csv2dict(filename):
    with open(filename) as file_handle:
        reader = csv.reader(file_handle)
        dict_object = dict(reader)
    return dict_object


def separate_text_and_number(value):
    text, number = re.match(r'(\D+)(\d+)', value).groups()
    number = int(number)
    return (text, number)


def main():
    """ Entry """
    csv1 = csv2dict('file1.csv')
    csv2 = csv2dict('file2.csv')
    all_keys = csv1.keys() | csv2.keys()

    for key in sorted(all_keys, key=separate_text_and_number):
        if key not in csv2:
            print(f'{key} value removed')
        elif key not in csv1:
            print(f'{key} value added')
        elif csv1[key] != csv2[key]:
            print(f'{key} value changed from {csv1[key]} to {csv2[key]}')


if __name__ == '__main__':
    main()
Output
name1 value changed from 2.0001 to 3.0000
name3 value added
name4 value removed
name5 value changed from 1.0000 to 1.0901
name7 value added
name8 value removed
name10 value removed
name11 value added
name12 value added
Notes
- The function csv2dict opens a file and converts the contents into a dictionary
- The function separate_text_and_number splits name14 into ('name', 14) to help with sorting the keys
- In Python 3, the dict.keys() method returns a set-like object which contains all the keys. I use the | operator to find a union of two sets of keys.
- For a more readable output, I sort the keys with the help of separate_text_and_number
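The last two notes can be seen in isolation in the short sketch below. The two dictionaries are made-up stand-ins for the parsed files. It compares a plain sort of the key union with the sort that uses separate_text_and_number.

```python
import re


def separate_text_and_number(value):
    """Split a key such as 'name14' into ('name', 14)."""
    text, number = re.match(r'(\D+)(\d+)', value).groups()
    return (text, int(number))


# Hypothetical dictionaries standing in for the two parsed CSV files.
csv1 = {'name1': '2.0001', 'name4': '1.5', 'name10': '7'}
csv2 = {'name1': '3.0000', 'name3': '0.5'}

all_keys = csv1.keys() | csv2.keys()  # union of both key views, as a set

plain_order = sorted(all_keys)
numeric_order = sorted(all_keys, key=separate_text_and_number)

print(plain_order)    # 'name10' sorts before 'name3' as plain text
print(numeric_order)  # numeric suffixes are compared as integers
```

Note that the regular expression expects every key to be non-digits followed by digits. A key without a numeric suffix, such as a header cell, would make re.match return None and raise an AttributeError.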