Compare a Column Between 2 CSV Files and Write Differences Using Python

Compare a column between 2 csv files and write differences using Python

import csv

data = {} # creating dictionary to store the data

with open('CSV1.csv', 'r') as lookuplist:
reader1 = csv.reader(lookuplist)
for col in reader1:
data[col[1]] = col[1] # stores the data from column 0 to column 1 in the data list

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
reader2 = csv.reader(csvinput)
csv_output = csv.writer(f_output)
fieldnames = (['SERVER', 'FQDN', 'AUTOMATION_ADMINISTRATOR', 'IP_ADDRESS', 'PRIMARY_1', 'MHT_1', 'MHT_2',
'MHT_3'])
csv_output.writerow(fieldnames) # prints header to the output file

for col in reader2:
if col[0] not in data: # if the column 1 in CSV1 does not match with column 0 in CSV2 Extract
col = [col[0]]


csv_output.writerow(col) # writes all the data that is matched in CMDB WLC Extract

So basically, I only had to change 'not in' under 'for loop' and change the columns in the data list that will be reading from the CSV1 file that I am creating.

How do I read two CSV files and compare 1 column from both and then write to a new file where columns match

Try with merge

out = df1.merge(df2[['Names']],how='inner',on='Names')
Out[44]:
ID Names
0 1 Bill

How to compare 2 different csv files and output the differences

Have you looked at csv-diff? Their website has an example that might be suitable:

from csv_diff import load_csv, compare
diff = compare(
load_csv(open("one.csv"), key="id"),
load_csv(open("two.csv"), key="id")
)

This should return a dict object, which you can parse into a CSV file. To parse that dict into rows, this is an example. Note: getting the changes to write correctly is difficult, but this is more of a proof-of-concept - modify as you wish

from csv_diff import load_csv, compare
fro csv import DictWriter

# Get all the row headers across all the changes
headers = set({'change type'})
for key, vals in diff.items():
for val in vals: # Multiple of the same difference 'type'
headers = headers.union(set(val.keys()))

# Write changes to file
with open('changes.csv', 'w', encoding='utf-8') as fh:
w = DictWriter(fh, headers)
w.writeheader()
for key, changes in diff.items():
for val in changes: # Add each instance of this type of change
val.update({'change type': key}) # Add 'change type' data
w.writerow(val)

For the file one.csv:

id,     name, age
1, Cleo, 4
2, Pancakes, 2

and two.csv:

id,   name, age
1, Cleo, 5
3, Bailey, 1
4, Elliot, 10

Running this produces:

change type,     name, id,               changes, age, key
added, Bailey, 3, , 1,
added, Elliot, 4, , 10,
removed, Pancakes, 2, , 2,
changed, , , "{'age': ['4', '5']}", , 1

So not great for all changes, but works really well for the added/removed rows.

Compare two columns in csv files

My answer will work with all records in files. It will find match in all records in file1 and file2.

  1. Reverse list reader1 = [i[::-1] for i in reader1] for ordering it.
  2. Making a list of theese two reader = reader1 + reader2
  3. Making a dictionary, which will find all matches by number.
  4. Just printing result of our searching
import csv

interesting_cols = [0, 2, 3, 4, 5]
with open("file1.csv", 'r') as file1,\
open("file2.csv", 'r') as file2:
reader1, reader2 = csv.reader(file1), csv.reader(file2)

reader1 = [i[::-1] for i in reader1]
reader2 = [i for i in reader2]

dictionary_of_records = {item[0]: [] for item in reader1}

for i, item in enumerate(reader2):
key = item[0]
if key in dictionary_of_records:
dictionary_of_records[key].append(i)

for key, value in dictionary_of_records.items():
if len(value) >= 1:
print(f"Match for {key}")
for index in value:
print(' '.join(reader2[index]))
else:
print(f"No match for {key}")
print("-----------------------------")

P.S. It's quite hardcode, I think. You can also watch pandas library or itertools to find more beatiful way to do it.

How to compare two different CSV files with different number of columns and rows by keyword?

You seperately need to check each column you want to compare. The following code is one option how to do this.

import pandas as pd

def getLinesOnlyInA(dfa, dfb, result):
# iterate over all lines in file a
for index1, row1 in dfa.iterrows():
aLineIsEqual = False
# iterate over all lines in file b
for index2, row2 in dfb.iterrows():
thisLineIsDifferent = False
# for each column, check if they are different
for column in columns:
if row1[column] != row2[column]:
thisLineIsDifferent = True
# ionly continue when the fields are the same
break
if not thisLineIsDifferent:
aLineIsEqual = True
# only continue when no equal line was found
break
# when no equal line was found, add that line to the result
if not aLineIsEqual:
result.append(row1)


df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
columns = ['round', 'first', 'fifth'] # columns to be compared
results = []

getLinesOnlyInA(df1, df2, results) # find all lines only existing in file 1
getLinesOnlyInA(df2, df1, results) # find all lines only existing in file 2
dfResult = pd.DataFrame(results) # cast all lines into a dataframe

print(dfResult.to_string())
dfResult.to_csv('result.csv', sep=',')

Result:

   round     date  first second third fourth  fifth sixth
2 5 2021.03 4 43!d9 23 26 29 33
0 1 2021.04 1 14 15 24 40 41
3 4 2021.03 10 11 20 21 24325 41
4 6 2021.03 4321 9 2#@6 28 34350 41

In case one file contains less columns (eg. file1 does not contain the date column), this still works and results in:

   round  first second third fourth  fifth sixth     date
2 5 4 43!d9 23 26 29 33 NaN
0 1 1 14 15 24 40 41 2021.04
3 4 10 11 20 21 24325 41 2021.03
4 6 4321 9 2#@6 28 34350 41 2021.03

Python - Better way to compare 2 csv files and add 2 new columns if condition met?

You needed merge, moreover, you need to make sure the 'user_id' columns from both dataframes have the same data type for merge to be able to identify matches. For example, 123 as integer won't match with '123' as string.

outcome = optin_infile.merge(
extravars_infile[['user_id', 'country', 'year_of_birth']],
on='user_id', how='left'
)

outcome.to_csv(export_file_name, index=False)

If you are wondering why your attempts on using merge didn't work, after making sure about the datatype, then, share your code that involves the failed merge, and share minimum number of rows of each of your dataframes so that we can reproduce your failure attempt in our computer and investigate for the reason.



Related Topics



Leave a reply



Submit