Compare two different files line by line in python
This solution reads each file in a single pass, excludes blank lines, and writes the lines common to both files to an output file, regardless of their position in either file:
with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
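
Because a set has no order, the common lines end up in the output file in arbitrary order. If the order of the first file matters, one possible variation is sketched below. The helper name `common_lines_in_order` is made up for this illustration and is not part of the answer above; it also strips the trailing newline so that a last line without one still compares equal.

```python
def common_lines_in_order(lines1, lines2):
    """Yield lines of lines1 that also occur in lines2, in lines1 order.

    Hypothetical helper: blank lines are skipped and each common line
    is reported only once.
    """
    in_second = {line.rstrip('\n') for line in lines2}
    seen = set()
    for line in lines1:
        text = line.rstrip('\n')
        if text and text in in_second and text not in seen:
            seen.add(text)
            yield text


# Usage with in-memory lines; open file objects can be passed the same way.
first = ['b\n', 'a\n', '\n', 'c\n', 'a\n']
second = ['a\n', 'x\n', 'b']
print(list(common_lines_in_order(first, second)))  # ['b', 'a']
```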
Python - Compare 2 files and output differences
This is working for me:
def compare(File1, File2):
    with open(File1, 'r') as f:
        d = set(f.readlines())
    with open(File2, 'r') as f:
        e = set(f.readlines())
    open('file3.txt', 'w').close()  # Create the file
    with open('file3.txt', 'a') as f:
        for line in list(d - e):
            f.write(line)
You need to compare the sets built from readlines() and find the lines of File1 that are not present in File2. You can then append these lines to the new file.
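
The set difference reports which lines are missing from the second file, but it discards their order and any duplicates. If the position of each difference matters, a different technique is the standard library's `difflib` module. The following is only a sketch of that alternative, not the method used in the answer; the function name `diff_lines` and the sample data are made up for illustration.

```python
import difflib


def diff_lines(lines1, lines2, name1='file1', name2='file2'):
    """Return a unified diff of two lists of lines as a single string."""
    return ''.join(difflib.unified_diff(lines1, lines2,
                                        fromfile=name1, tofile=name2))


# Sample data standing in for f.readlines() of two files.
old = ['alpha\n', 'beta\n', 'gamma\n']
new = ['alpha\n', 'gamma\n', 'delta\n']
print(diff_lines(old, new))
```

Lines prefixed with `-` appear only in the first input and lines prefixed with `+` only in the second; identical inputs give an empty string.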
comparing two text files - line by line comparison (involves masking) - python
This is the answer - finally cracked it myself -:)
import os
import sys
import re
import webbrowser
Comparison function - does it line by line:
def CompareFiles(str_file1, str_file2):
    '''
    This function compares two long string texts and returns their
    differences as two sequences of unique lines, one list for each.
    '''
    #reading from text file and splitting str_file into lines - delimited by "\n"
    file1_lines = str_file1.split("\n")
    file2_lines = str_file2.split("\n")

    #unique lines to each one, store it in their respective lists
    unique_file1 = []
    unique_file2 = []

    #unique lines in str1
    for line1 in file1_lines:
        if line1 != '':
            if line1 not in file2_lines:
                unique_file1.append(line1)

    #unique lines in str2
    for line2 in file2_lines:
        if line2 != '':
            if line2 not in file1_lines:
                unique_file2.append(line2)

    return unique_file1, unique_file2
Use this function to mask:
def Masker(pattern_lines, file2mask):
    '''
    This function masks some fields (based on the pattern_lines) with
    dummy text to simplify the comparison
    '''
    #start from the unmasked text so that every replacement accumulates
    #(and so that a result exists even when no pattern matches)
    masked_file = file2mask
    #mask the values of all matches from the pattern_lines by a dummy data - 'xxxxxxxxxx'
    for pattern in pattern_lines:
        temp = pattern.findall(file2mask)
        if len(temp) != 0:
            for value in temp:
                if isinstance(value, str):
                    masked_file = masked_file.replace(str(value), 'x'*10)
                elif isinstance(value, tuple):
                    for tup in value:
                        masked_file = masked_file.replace(str(tup), 'x'*10)
    return masked_file
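
Masker relies on str.replace, so a captured value is masked wherever it occurs in the text, not only on the line that matched. If only the captured positions should be masked, one alternative is to let `pattern.sub` rebuild each match. This is a sketch under assumptions: the name `mask_groups` is hypothetical, and it assumes the capture groups are not nested, as in the patterns used in this answer.

```python
import re

MASK = 'x' * 10


def mask_groups(pattern, text):
    """Replace the text of each capture group matched by pattern with MASK."""
    def replace(match):
        whole = match.group(0)
        offset = match.start()
        pieces = []
        last = 0
        for i in range(1, match.re.groups + 1):
            start, end = match.span(i)
            if start == -1:  # this group did not take part in the match
                continue
            pieces.append(whole[last:start - offset])
            pieces.append(MASK)
            last = end - offset
        pieces.append(whole[last:])
        return ''.join(pieces)

    return pattern.sub(replace, text)


country = re.compile(r'- Country of file is (.*)', re.M)
sample = '- Country of file is France\n- Other line\n'
print(mask_groups(country, sample))
```

To apply several patterns, call `mask_groups` once per pattern and feed each result into the next call.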
Open the files:
f1 = open("file1.txt","r")
data1 = f1.read()
f1.close()
f3 = open("file2.txt","r")
data3 = f3.read()
f3.close()
Create a folder to store the output file (optional):
save_path = os.path.join(os.path.dirname(__file__), 'outputs')
os.makedirs(save_path, exist_ok=True)  # create the folder if it does not exist yet (Python 3)
filename = os.path.normpath(os.path.join(save_path,"interim.txt"))
Pattern lines for masking:
pattern_lines = [
    re.compile(r'\- This file is located in 3000.3422.(.*) description \"(.*)\"', re.M),
    re.compile(r'\- City address of file is \"(.*)\"',re.M),
    re.compile(r'\- Country of file is (.*)',re.M)
]
Mask the two files:
data1_masked = Masker(pattern_lines,data1)
data3_masked = Masker(pattern_lines,data3)
Compare the two files and return the unique lines for both:
unique1, unique2 = CompareFiles(data1_masked, data3_masked)
Reporting - you can write it into a function:
file = open(filename,'w')
file.write("-------------------------\n")
file.write("\nONLY in FILE ONE\n")
file.write("\n-------------------------\n")
file.write(str('\n'.join(unique1)))
file.write("\n-------------------------\n")
file.write("\nONLY in FILE TWO\n")
file.write("\n-------------------------\n")
file.write(str('\n'.join(unique2)))
file.close()
And finally open the comparison output file:
webbrowser.open(filename)
How to compare 2 files line by line with terminal
What you are after is an awk script of the following form:
$ awk '(NR==FNR){a[FNR]=$0;next}
       !(FNR in a) { print "file2 has more lines than file1"; exit 1 }
       { print (($0 == a[FNR]) ? "matching" : "not matching") }
       END { if (NR-FNR > FNR) { print "file1 has more lines than file2"; exit 1 } }' file1 file2
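
For comparison, the same position-by-position check can be sketched in Python. This is an illustrative equivalent, not part of the awk answer; the function name `compare_by_position` is made up, and trailing newlines are stripped before comparing so that the result resembles awk's record handling.

```python
from itertools import zip_longest


def compare_by_position(lines1, lines2):
    """Yield one status string per line position, stopping at a length mismatch."""
    missing = object()  # sentinel marking the end of the shorter input
    for a, b in zip_longest(lines1, lines2, fillvalue=missing):
        if a is missing:
            yield 'file2 has more lines than file1'
            return
        if b is missing:
            yield 'file1 has more lines than file2'
            return
        same = a.rstrip('\n') == b.rstrip('\n')
        yield 'matching' if same else 'not matching'


# Sample data; two open file objects can be passed instead of lists.
for status in compare_by_position(['a\n', 'b\n'], ['a\n', 'c\n']):
    print(status)
```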
Compare two files line by line and generate the difference in another file
diff(1) is not the answer, but comm(1) is.
NAME
comm - compare two sorted files line by line
SYNOPSIS
comm [OPTION]... FILE1 FILE2
...
-1 suppress lines unique to FILE1
-2 suppress lines unique to FILE2
-3 suppress lines that appear in both files
So
comm -2 -3 file1 file2 > file3
The input files must be sorted. If they are not, sort them first. This can be done with a temporary file, or...
comm -2 -3 <(sort file1) <(sort file2) > file3
provided that your shell supports process substitution (bash does).
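
Where comm is not available, the effect of `comm -2 -3` on sorted input can be approximated in Python. The sketch below is an assumption-laden illustration, not a replacement for comm: the function name `only_in_first` is made up, and Python's `sorted` orders by code point, which may differ from the locale-dependent order used by sort and comm.

```python
from collections import Counter


def only_in_first(lines1, lines2):
    """Return the lines left in lines1 after cancelling those lines2 also has.

    Duplicates are cancelled one for one, so a line that occurs twice in
    lines1 and once in lines2 is reported once.
    """
    remaining = Counter(lines1) - Counter(lines2)
    return sorted(remaining.elements())


print(only_in_first(['b', 'a', 'a', 'c'], ['a', 'c', 'd']))  # ['a', 'b']
```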
Using Python to Compare Two Text Files Line by Line
From https://docs.python.org/2.7/reference/expressions.html#in:
For the Unicode and string types, x in y is true if and only if x is a substring of y.
Instead of
if line1 in line2:
I think you meant to write
if line1 == line2:
Or maybe replace the whole
for line2 in f2data:
    if line1 in line2:
        linecount+=1
block by
if line1 in f2data:
    linecount += 1
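
The difference between the three tests can be seen in a small example. The sample values below are made up for illustration; `line1` and `f2data` mirror the names used in the snippets above.

```python
line1 = 'cat'
f2data = ['concatenate', 'dog']

# "in" between two strings is a substring test, so 'cat' is found in 'concatenate'.
substring_hits = sum(1 for line2 in f2data if line1 in line2)

# "==" only counts lines that are exactly equal.
exact_hits = sum(1 for line2 in f2data if line1 == line2)

print(substring_hits)    # 1
print(exact_hits)        # 0
# "in" between a string and a list tests for an equal element.
print(line1 in f2data)   # False
```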
Compare two files and remove the common lines
Read the lines of both files into separate variables. Iterate over the lines of the first file and, for each one, check whether it exists in the second file; if it does not, write it back to the first file.
# file1 and file2 hold the paths of the two files
with open(file1, "r") as f1:
    lines_file1 = f1.readlines()
with open(file2, "r") as f2:
    lines_file2 = f2.readlines()
with open(file1, "w") as f_w:
    for line in lines_file1:
        if line not in lines_file2:
            f_w.write(line)
The downside of this approach is that you are loading the entire files into memory.
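
One way to reduce that memory use is to hold only the second file in memory, as a set, and stream the first file into a temporary file that then replaces it. This is a minimal sketch under assumptions: the function name `remove_common_lines` is made up, and the second file is assumed to be small enough to fit in memory.

```python
import os
import tempfile


def remove_common_lines(path1, path2):
    """Rewrite path1 so that it keeps only lines that do not occur in path2."""
    with open(path2, 'r') as f2:
        lines_file2 = set(f2)  # only the second file is held in memory

    # Write the filtered copy next to path1 so the final rename stays on one disk.
    directory = os.path.dirname(os.path.abspath(path1))
    with open(path1, 'r') as f1, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as tmp:
        for line in f1:  # the first file is streamed line by line
            if line not in lines_file2:
                tmp.write(line)

    os.replace(tmp.name, path1)  # swap the filtered copy into place
```

Using a set also makes each membership test a constant-time lookup instead of a scan through a list.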