Compare a column between 2 csv files and write differences using Python
import csv
data = {} # creating dictionary to store the data
with open('CSV1.csv', 'r') as lookuplist:
reader1 = csv.reader(lookuplist)
for col in reader1:
data[col[1]] = col[1] # stores the data from column 0 to column 1 in the data list
with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
reader2 = csv.reader(csvinput)
csv_output = csv.writer(f_output)
fieldnames = (['SERVER', 'FQDN', 'AUTOMATION_ADMINISTRATOR', 'IP_ADDRESS', 'PRIMARY_1', 'MHT_1', 'MHT_2',
'MHT_3'])
csv_output.writerow(fieldnames) # prints header to the output file
for col in reader2:
if col[0] not in data: # if the column 1 in CSV1 does not match with column 0 in CSV2 Extract
col = [col[0]]
csv_output.writerow(col) # writes all the data that is matched in CMDB WLC Extract
So basically, I only had to change 'not in' under 'for loop' and change the columns in the data list that will be reading from the CSV1 file that I am creating.
How do I read two CSV files and compare 1 column from both and then write to a new file where columns match
Try with merge
out = df1.merge(df2[['Names']],how='inner',on='Names')
Out[44]:
ID Names
0 1 Bill
How to compare 2 different csv files and output the differences
Have you looked at csv-diff
? Their website has an example that might be suitable:
from csv_diff import load_csv, compare
diff = compare(
load_csv(open("one.csv"), key="id"),
load_csv(open("two.csv"), key="id")
)
This should return a dict
object, which you can parse into a CSV file. To parse that dict into rows, this is an example. Note: getting the changes to write correctly is difficult, but this is more of a proof-of-concept - modify as you wish
from csv_diff import load_csv, compare
fro csv import DictWriter
# Get all the row headers across all the changes
headers = set({'change type'})
for key, vals in diff.items():
for val in vals: # Multiple of the same difference 'type'
headers = headers.union(set(val.keys()))
# Write changes to file
with open('changes.csv', 'w', encoding='utf-8') as fh:
w = DictWriter(fh, headers)
w.writeheader()
for key, changes in diff.items():
for val in changes: # Add each instance of this type of change
val.update({'change type': key}) # Add 'change type' data
w.writerow(val)
For the file one.csv
:
id, name, age
1, Cleo, 4
2, Pancakes, 2
and two.csv
:
id, name, age
1, Cleo, 5
3, Bailey, 1
4, Elliot, 10
Running this produces:
change type, name, id, changes, age, key
added, Bailey, 3, , 1,
added, Elliot, 4, , 10,
removed, Pancakes, 2, , 2,
changed, , , "{'age': ['4', '5']}", , 1
So not great for all changes, but works really well for the added/removed rows.
Compare two columns in csv files
My answer will work with all records in files. It will find match in all records in file1 and file2.
- Reverse list
reader1 = [i[::-1] for i in reader1]
for ordering it. - Making a list of theese two
reader = reader1 + reader2
- Making a dictionary, which will find all matches by number.
- Just printing result of our searching
import csv
interesting_cols = [0, 2, 3, 4, 5]
with open("file1.csv", 'r') as file1,\
open("file2.csv", 'r') as file2:
reader1, reader2 = csv.reader(file1), csv.reader(file2)
reader1 = [i[::-1] for i in reader1]
reader2 = [i for i in reader2]
dictionary_of_records = {item[0]: [] for item in reader1}
for i, item in enumerate(reader2):
key = item[0]
if key in dictionary_of_records:
dictionary_of_records[key].append(i)
for key, value in dictionary_of_records.items():
if len(value) >= 1:
print(f"Match for {key}")
for index in value:
print(' '.join(reader2[index]))
else:
print(f"No match for {key}")
print("-----------------------------")
P.S. It's quite hardcode, I think. You can also watch pandas library or itertools to find more beatiful way to do it.
How to compare two different CSV files with different number of columns and rows by keyword?
You seperately need to check each column you want to compare. The following code is one option how to do this.
import pandas as pd
def getLinesOnlyInA(dfa, dfb, result):
# iterate over all lines in file a
for index1, row1 in dfa.iterrows():
aLineIsEqual = False
# iterate over all lines in file b
for index2, row2 in dfb.iterrows():
thisLineIsDifferent = False
# for each column, check if they are different
for column in columns:
if row1[column] != row2[column]:
thisLineIsDifferent = True
# ionly continue when the fields are the same
break
if not thisLineIsDifferent:
aLineIsEqual = True
# only continue when no equal line was found
break
# when no equal line was found, add that line to the result
if not aLineIsEqual:
result.append(row1)
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
columns = ['round', 'first', 'fifth'] # columns to be compared
results = []
getLinesOnlyInA(df1, df2, results) # find all lines only existing in file 1
getLinesOnlyInA(df2, df1, results) # find all lines only existing in file 2
dfResult = pd.DataFrame(results) # cast all lines into a dataframe
print(dfResult.to_string())
dfResult.to_csv('result.csv', sep=',')
Result:
round date first second third fourth fifth sixth
2 5 2021.03 4 43!d9 23 26 29 33
0 1 2021.04 1 14 15 24 40 41
3 4 2021.03 10 11 20 21 24325 41
4 6 2021.03 4321 9 2#@6 28 34350 41
In case one file contains less columns (eg. file1
does not contain the date
column), this still works and results in:
round first second third fourth fifth sixth date
2 5 4 43!d9 23 26 29 33 NaN
0 1 1 14 15 24 40 41 2021.04
3 4 10 11 20 21 24325 41 2021.03
4 6 4321 9 2#@6 28 34350 41 2021.03
Python - Better way to compare 2 csv files and add 2 new columns if condition met?
You needed merge
, moreover, you need to make sure the 'user_id'
columns from both dataframes have the same data type for merge
to be able to identify matches. For example, 123
as integer won't match with '123'
as string.
outcome = optin_infile.merge(
extravars_infile[['user_id', 'country', 'year_of_birth']],
on='user_id', how='left'
)
outcome.to_csv(export_file_name, index=False)
If you are wondering why your attempts on using merge
didn't work, after making sure about the datatype, then, share your code that involves the failed merge
, and share minimum number of rows of each of your dataframes so that we can reproduce your failure attempt in our computer and investigate for the reason.
Related Topics
Convert HTML String to an Image in Python
How to Export a Table Dataframe in Pyspark to Csv
How to Convert List into String With Quotes in Python
Changing Presence Discord Status
Beautifulsoup Findall() Given Multiple Classes
Type Conversion in Python Attributeerror: 'Str' Object Has No Attribute 'Astype'
Cannot Convert the Series to <Class 'Int''>
Pandas To_Csv: Suppress Scientific Notation in CSV File When Writing Pandas to Csv
Printing Even Characters With Strings in Python
Read CSV from Google Cloud Storage to Pandas Dataframe
How to Get a Fields Particular Value of Json in Python
Pip Install Pandas: Installing Dependencies Error
Python | Make the Percentage of a List
How to Test If a Column Exists and Is Not Null in a Dataframe
Running a Powershell Cmdlet in a Python Script