Compare Two CSV Files and Search for Similar Items

Compare two CSV files and search for similar items

Edit: While my solution works correctly, check out Martijn's answer below for a more efficient solution.

The Python csv module is documented in the standard library reference.

What you're looking for is something like this:

import csv

# newline='' lets the csv module handle line endings itself
with open('hosts.csv', newline='') as f1, \
     open('masterlist.csv', newline='') as f2, \
     open('results.csv', 'w', newline='') as f3:
    c1 = csv.reader(f1)
    c2 = csv.reader(f2)
    c3 = csv.writer(f3)

    # Read the master list once so it can be scanned for every host row
    masterlist = list(c2)

    for hosts_row in c1:
        found = False
        # Row numbers are reported 1-based
        for row, master_row in enumerate(masterlist, start=1):
            if hosts_row[3] == master_row[1]:
                hosts_row.append('FOUND in master list (row ' + str(row) + ')')
                found = True
                break
        if not found:
            hosts_row.append('NOT FOUND in master list')
        c3.writerow(hosts_row)
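
The nested loop rescans the whole master list for every host row. A minimal sketch of a faster variant, assuming the same column positions as above (column 3 in the hosts file, column 1 in the master list), indexes the master list in a dict first so each lookup is a single dictionary access. The function name annotate_hosts and the sample rows are illustrative only:

```python
def annotate_hosts(hosts_rows, master_rows):
    """Return copies of hosts_rows with a FOUND / NOT FOUND note appended."""
    # Map each master value to the 1-based row number of its first occurrence.
    first_seen = {}
    for row_number, master_row in enumerate(master_rows, start=1):
        first_seen.setdefault(master_row[1], row_number)

    results = []
    for hosts_row in hosts_rows:
        row_number = first_seen.get(hosts_row[3])
        if row_number is None:
            note = 'NOT FOUND in master list'
        else:
            note = 'FOUND in master list (row ' + str(row_number) + ')'
        # Build a new list so the input rows are left untouched.
        results.append(hosts_row + [note])
    return results


# Made-up sample rows standing in for the parsed CSV files.
hosts = [['a', 'b', 'c', 'web01'], ['a', 'b', 'c', 'db09']]
master = [['x', 'mail01'], ['y', 'web01']]
annotated = annotate_hosts(hosts, master)
```

With rows read through csv.reader, the returned list can be passed to csv.writer's writerows method.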

Compare two csv files and write the matching entries in third file python

You are overwriting the output file each time you open it, because "w" mode truncates the file.
Change "w" to "a+" so that new rows are appended:

with open('file3.csv', "a+", encoding....
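
Append mode keeps earlier rows, but it also keeps rows left over from previous runs. An alternative sketch, assuming the write currently happens inside a loop, opens the output once in "w" mode before the loop starts. The function write_matches and its arguments are hypothetical names, not taken from the question:

```python
import csv


def write_matches(rows_a, rows_b, out_path):
    """Write every row of rows_a that also appears in rows_b to out_path."""
    wanted = {tuple(row) for row in rows_b}
    # Open the output once, outside the loop, so "w" truncates only once.
    with open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for row in rows_a:
            if tuple(row) in wanted:
                writer.writerow(row)
```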

Compare two CSV files and look for matches Python

Try this:

import csv

alist, blist = [], []

# In Python 3, open CSV files in text mode with newline=''
with open("csv1.csv", newline="") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        for row_str in row:
            # Split each cell on whitespace into separate words
            alist += row_str.strip().split()

with open("organs.csv", newline="") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist += row

first_set = set(alist)
second_set = set(blist)

print(first_set.intersection(second_set))

Basically, iterating over a csv reader yields one row at a time, and each row is a list of strings such as ['arm', 'biopsy', 'forearm'], so you have to concatenate the rows to collect all of the items.

Converting each list with set() removes duplicates, and the intersection method returns a new set containing only the elements present in both.
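
To make the set step concrete, here is a small sketch that uses in-memory rows instead of files. The sample words are made up for illustration:

```python
# Made-up rows standing in for what csv.reader would yield.
rows_a = [['arm biopsy', 'forearm'], ['liver sample']]
rows_b = [['arm', 'liver', 'kidney']]

# Cells in the first file may hold several words, so split each one.
words_a = []
for row in rows_a:
    for cell in row:
        words_a += cell.strip().split()

# Cells in the second file are single words, so concatenate the rows.
words_b = []
for row in rows_b:
    words_b += row

common = set(words_a) & set(words_b)
print(sorted(common))  # ['arm', 'liver']
```

The & operator is equivalent to calling intersection on the first set.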

how to compare two csv file in python and flag the difference?

The idea here is to flatten your dataframe with melt to compare each value:

import numpy as np
import pandas as pd

# Load your csv files
df1 = pd.read_csv('file1.csv', ...)
df2 = pd.read_csv('file2.csv', ...)

# Select columns (not mandatory, it depends on your 'Sn' column)
cols = ['Name', 'Subject', 'Marks']

# Flatten your dataframes
out1 = df1[cols].melt('Name', var_name='Item', value_name='Old')
out2 = df2[cols].melt('Name', var_name='Item', value_name='New')
out = pd.merge(out1, out2, on=['Name', 'Item'], how='outer')

# Flag the state of each item. np.select uses the first condition that
# matches, so the NaN checks must come before the inequality test
# (a comparison with NaN is always "not equal").
condlist = [out['Old'].isna(),
            out['New'].isna(),
            out['Old'] != out['New']]

out['State'] = np.select(condlist, choicelist=['added', 'deleted', 'changed'],
                         default='unchanged')

Output:

>>> out
     Name     Item      Old       New      State
0     Ram  Subject    Maths  computer    changed
1    sita  Subject  Engilsh   Engilsh  unchanged
2  vishnu  Subject  science   science  unchanged
3  balaji  Subject   social    social  unchanged
4     Ram    Marks       85        85  unchanged
5    sita    Marks       66        66  unchanged
6  vishnu    Marks       50        90    changed
7  balaji    Marks       60        60  unchanged
8  kishor  Subject      NaN      chem      added
9  kishor    Marks      NaN        99      added

Compare two csv files and output changes

My solution is to turn each csv into a dictionary with the first column as the keys and the second column as the values. After that, I can loop through the keys and determine if the corresponding values were changed, removed, or added.

import csv
import re

def csv2dict(filename):
    with open(filename) as file_handle:
        reader = csv.reader(file_handle)
        dict_object = dict(reader)
        return dict_object

def separate_text_and_number(value):
    text, number = re.match(r'(\D+)(\d+)', value).groups()
    number = int(number)
    return (text, number)

def main():
    """ Entry """
    csv1 = csv2dict('file1.csv')
    csv2 = csv2dict('file2.csv')
    all_keys = csv1.keys() | csv2.keys()

    for key in sorted(all_keys, key=separate_text_and_number):
        if key not in csv2:
            print(f'{key} value removed')
        elif key not in csv1:
            print(f'{key} value added')
        elif csv1[key] != csv2[key]:
            print(f'{key} value changed from {csv1[key]} to {csv2[key]}')

if __name__ == '__main__':
    main()

Output

name1 value changed from 2.0001 to 3.0000
name3 value added
name4 value removed
name5 value changed from 1.0000 to 1.0901
name7 value added
name8 value removed
name10 value removed
name11 value added
name12 value added

Notes

The function csv2dict opens a file and converts the contents into a dictionary.
The function separate_text_and_number splits name14 into ('name', 14) to help with sorting the keys.
In Python 3, the dict.keys() method returns a set-like object which contains all the keys. I use the | operator to find a union of two sets of keys.
For a more readable output, I sort the keys with the help of separate_text_and_number.
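
The key union and the sort helper can be tried without any files. In this sketch the two dicts are made-up stand-ins for what csv2dict would return, and natural_key is a hypothetical name for a helper that mirrors separate_text_and_number:

```python
import re


def natural_key(value):
    """Split 'name14' into ('name', 14) so the numeric part sorts as a number."""
    text, number = re.match(r'(\D+)(\d+)', value).groups()
    return (text, int(number))


# Made-up data standing in for the two parsed CSV files.
old = {'name1': '2.0001', 'name2': '5.0000', 'name10': '7.0000'}
new = {'name1': '3.0000', 'name2': '5.0000', 'name3': '4.0000'}

# dict.keys() views support set operations, so | gives the union of keys.
all_keys = old.keys() | new.keys()
ordered = sorted(all_keys, key=natural_key)
print(ordered)  # ['name1', 'name2', 'name3', 'name10']
```

Without the key function, plain string sorting would place name10 before name2.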
