Compare two files and report the differences in Python
import difflib
lines1 = '''
dog
cat
bird
buffalo
gophers
hound
horse
'''.strip().splitlines()
lines2 = '''
cat
dog
bird
buffalo
gopher
horse
mouse
'''.strip().splitlines()
# Changes:
# swapped positions of cat and dog
# changed gophers to gopher
# removed hound
# added mouse
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm=''):
    print(line)
Outputs the following:
--- file1
+++ file2
@@ -1,7 +1,7 @@
+cat
 dog
-cat
 bird
 buffalo
-gophers
-hound
+gopher
 horse
+mouse
This diff gives you context -- surrounding lines to help make it clear how the file is different. You can see "cat" here twice, because it was removed from below "dog" and added above it.
You can use n=0 to remove the context.
for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
    print(line)
Outputting this:
--- file1
+++ file2
@@ -0,0 +1 @@
+cat
@@ -2 +2,0 @@
-cat
@@ -5,2 +5 @@
-gophers
-hound
+gopher
@@ -7,0 +7 @@
+mouse
But now it's full of the "@@" lines telling you the position in the file that has changed. Let's remove the extra lines to make it more readable.
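for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
    # Skip the file headers and the "@@" position markers
    for prefix in ('---', '+++', '@@'):
        if line.startswith(prefix):
            break
    else:
        # No prefix matched, so this is an added or removed line
        print(line)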
Giving us this output:
+cat
-cat
-gophers
-hound
+gopher
+mouse
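The same filtering can be written more compactly, because str.startswith accepts a tuple of prefixes. Here is a minimal sketch of that idea; the two lists repeat the example data so the block runs on its own, and the helper name changed_lines is made up for illustration:

```python
import difflib

lines1 = ['dog', 'cat', 'bird', 'buffalo', 'gophers', 'hound', 'horse']
lines2 = ['cat', 'dog', 'bird', 'buffalo', 'gopher', 'horse', 'mouse']


def changed_lines(a, b):
    """Return only the '+' and '-' lines of a zero-context unified diff."""
    diff = difflib.unified_diff(a, b, fromfile='file1', tofile='file2',
                                lineterm='', n=0)
    # startswith() accepts a tuple, so one test covers every header prefix.
    return [line for line in diff
            if not line.startswith(('---', '+++', '@@'))]


for line in changed_lines(lines1, lines2):
    print(line)
```

One caveat that applies to any prefix-based filter like this: a removed line whose own text begins with "--" shows up in the diff as "---..." and would be dropped along with the headers.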
Now what do you want it to do?
If you ignore all removed lines, then you won't see that "hound" was removed.
If you're happy just showing the additions to the file, then you could do this:
diff = difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0)
lines = list(diff)[2:]  # skip the '---' and '+++' header lines
added = [line[1:] for line in lines if line[0] == '+']
removed = [line[1:] for line in lines if line[0] == '-']
print('additions:')
for line in added:
    print(line)
print()
print('additions, ignoring position:')
for line in added:
    if line not in removed:
        print(line)
Outputting:
additions:
cat
gopher
mouse
additions, ignoring position:
gopher
mouse
You can probably tell by now that there are various ways to "print the differences" of two files, so you will need to be very specific if you want more help.
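The examples above diff two in-memory lists. For files on disk, one possible sketch looks like this; the function name diff_files and its context parameter are hypothetical, not part of difflib:

```python
import difflib


def diff_files(path1, path2, context=3):
    """Return the unified diff of two text files as a list of lines."""
    with open(path1) as f1, open(path2) as f2:
        a = f1.readlines()
        b = f2.readlines()
    # readlines() keeps the trailing newlines, so lineterm stays at its
    # default and the result can be written out unchanged.
    return list(difflib.unified_diff(a, b, fromfile=path1, tofile=path2,
                                     n=context))
```

Calling sys.stdout.writelines(diff_files('file1.txt', 'file2.txt')) would then print the diff, assuming those two files exist. Identical files produce an empty list.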
Compare two different files line by line in Python
This solution reads each file once, ignores blank lines, and writes the lines common to both files to an output file, regardless of their position in either file:
with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)
same.discard('\n')
with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
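Because the common lines are collected in a set, they come out in arbitrary order. If the order of the first file matters, a variation along these lines may help. It is only a sketch, and the function name common_lines is hypothetical:

```python
def common_lines(path1, path2):
    """Return the non-blank lines of path1 that also occur in path2,
    in the order they first appear in path1."""
    with open(path2) as f2:
        in_second = set(f2)
    common = []
    seen = set()
    with open(path1) as f1:
        for line in f1:
            # Skip blank lines and lines that were already reported.
            if line != '\n' and line in in_second and line not in seen:
                seen.add(line)
                common.append(line)
    return common
```

As with the set-based version, lines are compared including their newline, so a final line without a trailing newline will not match the same text followed by one.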
Python - Compare 2 files and output differences
This is working for me:
def compare(File1, File2):
    with open(File1, 'r') as f:
        d = set(f.readlines())
    with open(File2, 'r') as f:
        e = set(f.readlines())
    # Opening with 'w' creates file3.txt, or empties it if it already exists
    with open('file3.txt', 'w') as f:
        for line in d - e:
            f.write(line)
Read each file's lines into a set and take the set difference to find the lines that are present in the first file but not in the second. Those lines are then written to the new file.
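A set difference discards the original line order, collapses duplicates, and only reports one direction. If those matter, something like the following sketch could be used instead; the function name unique_lines is an assumption made for illustration:

```python
def unique_lines(path1, path2):
    """Return (lines only in path1, lines only in path2), each in file order."""
    with open(path1) as f:
        lines1 = f.read().splitlines()
    with open(path2) as f:
        lines2 = f.read().splitlines()
    # Sets are used only for fast membership tests; the lists keep the order.
    set1, set2 = set(lines1), set(lines2)
    only_in_first = [line for line in lines1 if line not in set2]
    only_in_second = [line for line in lines2 if line not in set1]
    return only_in_first, only_in_second
```

Using splitlines() drops the line endings, so a missing newline at the end of one file does not cause a false difference.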
Compare two files and remove the common lines
Read the lines of both files into separate variables. Iterate over the lines of the first file and, for each one, check whether it exists in the second file; if it does not, write it back to the first file.
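# file1_path and file2_path are placeholders for the two file names
with open(file1_path, "r") as file1:
    lines_file1 = file1.readlines()
with open(file2_path, "r") as file2:
    lines_file2 = file2.readlines()
# Rewrite the first file, keeping only the lines absent from the second
with open(file1_path, "w") as f_w:
    for line in lines_file1:
        if line not in lines_file2:
            f_w.write(line)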
The downside of this approach is that both files are loaded into memory in full.
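One way to reduce the memory cost is to hold only the second file's lines in memory and stream the first file through a temporary file. This is a rough sketch under that assumption, not the original answer's method; the function name remove_common_lines is hypothetical:

```python
import os
import tempfile


def remove_common_lines(path1, path2):
    """Rewrite path1 in place, dropping every line that also appears in path2."""
    with open(path2) as f2:
        lines_in_second = set(f2)  # only the second file is held in memory

    directory = os.path.dirname(os.path.abspath(path1))
    # Stream the surviving lines into a temporary file in the same directory,
    # then swap it in for the original once both files are closed.
    with open(path1) as src, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as tmp:
        for line in src:
            if line not in lines_in_second:
                tmp.write(line)
    os.replace(tmp.name, path1)
```

The set also makes each membership test fast, whereas "line not in lines_file2" on a list scans the whole list for every line.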