Python : Compare two csv files and print out differences
The problem is that you are comparing each line in fileone to the same line in filetwo. As soon as there is an extra line in one file you will find that the lines are never equal again. Try this:
with open('old.csv', 'r') as t1, open('new.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

with open('update.csv', 'w') as outFile:
    for line in filetwo:
        if line not in fileone:
            outFile.write(line)
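For large files, the `line not in fileone` test rescans the whole list for every line. Here is a minimal sketch of the same idea with a set used for the lookup, assuming only the content of the old file matters and not its order. The function name `lines_only_in_new` and the sample data are made up for illustration.

```python
def lines_only_in_new(old_lines, new_lines):
    """Return the lines of new_lines that never occur in old_lines, in order."""
    seen = set(old_lines)  # set membership is O(1) on average
    return [line for line in new_lines if line not in seen]


# Hypothetical sample data standing in for old.csv and new.csv.
old = ["a,1\n", "b,2\n", "c,3\n"]
new = ["a,1\n", "x,9\n", "b,2\n", "c,3\n"]

print(lines_only_in_new(old, new))  # ['x,9\n']
```

The extra line in the middle of the new file is reported once, and the lines after it still count as unchanged.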
How to compare two CSV files in Python?
Method 1: pandas
This task can be done with relative ease using pandas; see the pandas DataFrame documentation.
Example:
In the example below, the two CSV files are read into two DataFrames. The DataFrames are merged using an inner join on the matching columns.
The output shows the merged result.
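import pandas as pd

# names= supplies the column labels; neither file is assumed to have a header row.
df1 = pd.read_csv('file1.csv', names=['col1', 'col2', 'col3'], quotechar="'", skipinitialspace=True)
df2 = pd.read_csv('file2.csv', names=['match'])
# Inner join: keep only the rows of df1 whose col3 value appears in df2's match column.
df = pd.merge(df1, df2, left_on='col3', right_on='match', how='inner')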
The quotechar and skipinitialspace parameters are needed because the first column in file1 is quoted and contains a comma, and there is leading whitespace after the comma before the last field.
Output:
          col1            col2           col3
0  A J1, Jhon1  jhon1@jhon.com  A/B-201 Test1
1  A J3, Jhon3  jhon3@jhon.com  A/B-203 Test3
If you choose, the output can easily be written back to a CSV file as:
df.to_csv('path/to/output.csv')
For other DataFrame operations, refer to the pandas DataFrame documentation.
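The inner join above keeps only the rows that match. If you want the rows that differ instead, one option is an outer merge with `indicator=True`, which labels each row by where it came from. The column names and the in-memory sample data below are assumptions for illustration, not taken from the files in the question.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for two CSV files with a header row.
old_csv = io.StringIO("id,name\n1,alice\n2,bob\n3,carol\n")
new_csv = io.StringIO("id,name\n1,alice\n3,carol\n4,dave\n")

old_df = pd.read_csv(old_csv)
new_df = pd.read_csv(new_csv)

# With no "on" argument the merge uses all shared columns (id and name here).
# indicator=True adds a "_merge" column: "left_only", "right_only" or "both".
merged = old_df.merge(new_df, how="outer", indicator=True)

removed = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
added = merged[merged["_merge"] == "right_only"].drop(columns="_merge")

print(removed)  # the row for bob, present only in the old data
print(added)    # the row for dave, present only in the new data
```

Because the merge compares whole rows, a row whose values changed shows up in both `removed` and `added`.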
Method 2: Core Python
The method below does not use any libraries, only core Python.
- Read the matches from file2 into a list.
- Iterate over file1 and search each line to determine if the last value is a match for an item in file2.
- Report the output.
Any subsequent data cleaning (if required) will be up to your personal requirements or use-case.
Example:
output = []

# Read the matching values into a list.
with open('file2.csv') as f:
    matches = [i.strip() for i in f]

# Iterate over file1 and place any matches into the output.
with open('file1.csv') as f:
    for i in f:
        match = i.split(',')[-1].strip()
        if match in matches:
            output.append(i)
Output:
["'A J1, Jhon1',jhon1@jhon.com, A/B-201 Test1\n",
"'A J3, Jhon3',jhon3@jhon.com, A/B-203 Test3\n"]
How to compare two csv files and print all the differences
You could consider using difflib for this, but it will have the same limitations as command-line diff: it can report a line as "new" when it has merely moved.
Assuming order isn't important, a set-based comparison (the lines of one file minus the lines of the other) is probably what you need.
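To make the difference between the two approaches concrete, here is a small sketch that runs both on the same data. The sample lines and the `old.csv` / `new.csv` labels are made up for illustration.

```python
import difflib

# Hypothetical file contents: two lines swapped places, one removed, one added.
old = ["id,name\n", "1,alice\n", "2,bob\n", "3,carol\n"]
new = ["id,name\n", "3,carol\n", "1,alice\n", "4,dave\n"]

# Order-sensitive: a line that merely moved shows up as removed and added again.
diff = list(difflib.unified_diff(old, new, fromfile="old.csv", tofile="new.csv"))
print("".join(diff))

# Order-insensitive: set difference reports only lines missing from the other file.
removed = sorted(set(old) - set(new))
added = sorted(set(new) - set(old))
print(removed)  # ['2,bob\n']
print(added)    # ['4,dave\n']
```

Note that sets also discard duplicate lines, so if repeated rows matter, `collections.Counter` may be a better fit than `set`.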
Python : How to compare two csv files and print out the matching strings in a new file
Try the code below:
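import pandas as pd

df1 = pd.read_csv('raw_data.csv')
df2 = pd.read_csv('new_data.csv')
# Inner join (the default) on the shared compound_name column.
df_final = pd.merge(df1, df2, on=['compound_name'])
# Write only the name_id and reference_id columns to the output file.
df_final.to_csv('final.csv', columns=['name_id', 'reference_id'])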
Hope this helps!