Compare Two Different Files Line by Line in Python

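This solution reads each file in a single pass, excludes blank lines, and writes the lines common to both files to an output file, regardless of where they appear in either file. Because it uses a set, duplicate lines are collapsed and the output order is arbitrary: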

with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
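If the order of the first file matters, a variation along these lines might help. It is only a sketch: the function name `common_lines_in_order` and its parameters are made up for illustration, and it strips the trailing newline so that a last line without one still compares equal.

```python
def common_lines_in_order(path1, path2):
    """Return lines of path1 that also occur in path2, in path1's order."""
    with open(path2, 'r') as file2:
        # Strip the newline so a last line without one still compares equal.
        lines2 = {line.rstrip('\n') for line in file2}

    common = []
    seen = set()
    with open(path1, 'r') as file1:
        for line in file1:
            text = line.rstrip('\n')
            # Skip blank lines and lines that were already reported.
            if text and text in lines2 and text not in seen:
                seen.add(text)
                common.append(text)
    return common
```

Only the second file is held in memory as a set; the first file is streamed line by line.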

Python - Compare 2 files and output differences

This is working for me:

def compare(File1,File2):
    with open(File1,'r') as f:
        d=set(f.readlines())

    with open(File2,'r') as f:
        e=set(f.readlines())

    open('file3.txt','w').close() #Create the file

    with open('file3.txt','a') as f:
        for line in list(d-e):
            f.write(line)

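You need to build a set from each file's readlines() and take the set difference to find the lines that are in file1 but not in file2. You can then append these lines to the new file. Note that a set difference drops duplicate lines and does not preserve the original order.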
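If the position of each difference matters, the standard library's difflib module may be a better fit than a set difference. The following is a minimal sketch; the helper name `diff_lines` and the sample data are invented for illustration.

```python
import difflib

def diff_lines(lines1, lines2, name1='file1', name2='file2'):
    """Return a unified diff of two lists of lines (each ending in a newline)."""
    return list(difflib.unified_diff(lines1, lines2,
                                     fromfile=name1, tofile=name2))

# Sample data standing in for f.readlines() on two real files.
diff = diff_lines(['a\n', 'b\n', 'c\n'], ['a\n', 'c\n', 'd\n'])
print(''.join(diff), end='')
```

Lines prefixed with `-` exist only in the first input and lines prefixed with `+` only in the second.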

comparing two text files - line by line comparison (involves masking) - python

This is the answer - finally cracked it myself -:)

import os
import sys
import re
import webbrowser

Comparison function - does it line by line:

def CompareFiles(str_file1,str_file2):
    '''
    This function compares two long string texts and returns their
    differences as two sequences of unique lines, one list for each.
    '''
    #reading from text file and splitting str_file into lines - delimited by "\n"
    file1_lines = str_file1.split("\n")
    file2_lines = str_file2.split("\n")

    #unique lines to each one, store it in their respective lists
    unique_file1 = []
    unique_file2 = []

    #unique lines in str1
    for line1 in file1_lines:
        if line1 !='':
            if line1 not in file2_lines:
                unique_file1.append(line1)

    #unique lines in str2
    for line2 in file2_lines:
        if line2 != '':
            if line2 not in file1_lines:
                unique_file2.append(line2)

    return unique_file1, unique_file2
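For long texts, testing membership against a list rescans the list for every line. A possible variation, sketched here under the hypothetical name `compare_texts`, keeps the same inputs and outputs but looks lines up in sets instead:

```python
def compare_texts(text1, text2):
    """Return (lines only in text1, lines only in text2), skipping blank lines."""
    lines1 = text1.split("\n")
    lines2 = text2.split("\n")

    # Sets give fast membership tests; the lists keep the original order.
    set1, set2 = set(lines1), set(lines2)

    unique1 = [line for line in lines1 if line != '' and line not in set2]
    unique2 = [line for line in lines2 if line != '' and line not in set1]
    return unique1, unique2


only1, only2 = compare_texts("a\nb\n\nc", "b\nd")
print(only1)  # ['a', 'c']
print(only2)  # ['d']
```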

Use this function to mask:

def Masker(pattern_lines, file2mask):
    '''
    This function masks some fields (based on the pattern_lines) with
    dummy text to simplify the comparison
    '''
    #start from the original text so that every replacement accumulates
    masked_file = file2mask
    #mask the values of all matches from the pattern_lines by a dummy data - 'xxxxxxxxxx'
    for pattern in pattern_lines:
        for value in pattern.findall(file2mask):
            #findall gives a str for one group and a tuple for several groups
            groups = (value,) if isinstance(value, str) else value
            for group in groups:
                if group: #skip empty captures; replacing '' would corrupt the text
                    masked_file = masked_file.replace(group,'x'*10)
    return masked_file
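One thing to be aware of: replacing by value also masks the same text wherever else it occurs in the file. If only the captured positions should be masked, something like the following sketch could be used instead. The name `mask_groups` is hypothetical, and the code assumes the capture groups in each pattern are not nested and appear left to right, as in the patterns used in this answer.

```python
import re

def mask_groups(pattern, text, dummy='x' * 10):
    """Replace the captured groups of every match with dummy text."""
    def _mask(match):
        whole = match.group(0)
        offset = match.start()
        # Work from the last group to the first so earlier spans stay valid.
        for i in range(pattern.groups, 0, -1):
            start, end = match.span(i)
            if start == -1:  # this group did not take part in the match
                continue
            whole = whole[:start - offset] + dummy + whole[end - offset:]
        return whole

    return pattern.sub(_mask, text)


country = re.compile(r'- Country of file is (.*)', re.M)
sample = "- Country of file is France\nFrance is nice"
print(mask_groups(country, sample))
```

In this sample only the value on the "Country of file" line is masked; the word on the second line is left alone.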

Open the files:

with open("file1.txt","r") as f1:
    data1 = f1.read()

with open("file2.txt","r") as f3:
    data3 = f3.read()

Create a folder to store the output file (optional):

save_path = os.path.join(os.path.dirname(__file__), 'outputs')
os.makedirs(save_path, exist_ok=True)  # create the folder if it does not exist yet (Python 3.2+)
filename = os.path.normpath(os.path.join(save_path,"interim.txt"))

Pattern lines for masking:

pattern_lines = [
    re.compile(r'\- This file is located in 3000.3422.(.*) description \"(.*)\"', re.M),
    re.compile(r'\- City address of file is \"(.*)\"',re.M),
    re.compile(r'\- Country of file is (.*)',re.M)
]

Mask the two files:

data1_masked = Masker(pattern_lines,data1)
data3_masked = Masker(pattern_lines,data3)

Compare the two files and return the unique lines for both:

unique1, unique2 = CompareFiles(data1_masked, data3_masked)

Reporting - you can write it into a function:

with open(filename,'w') as report:  # 'report' avoids shadowing the built-in name 'file'
    report.write("-------------------------\n")
    report.write("\nONLY in FILE ONE\n")
    report.write("\n-------------------------\n")
    report.write('\n'.join(unique1))
    report.write("\n-------------------------\n")
    report.write("\nONLY in FILE TWO\n")
    report.write("\n-------------------------\n")
    report.write('\n'.join(unique2))
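As suggested, the reporting step can be wrapped in a function. This is one possible shape, not a definitive one: the name `write_report` is made up, and the layout is roughly, not exactly, the same as the statements above.

```python
def write_report(filename, unique1, unique2):
    """Write the lines unique to each file under a labelled heading."""
    rule = "-------------------------\n"
    sections = (("ONLY in FILE ONE", unique1),
                ("ONLY in FILE TWO", unique2))
    with open(filename, 'w') as report:
        for title, lines in sections:
            report.write(rule)
            report.write("\n" + title + "\n\n")
            report.write(rule)
            report.write('\n'.join(lines) + "\n")
```

It would then be called as `write_report(filename, unique1, unique2)` before opening the output file.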

And finally open the comparison output file:

webbrowser.open(filename)

How to compare 2 files line by line with terminal

What you are after is an awk script of the following form:

$ awk '(NR==FNR){a[FNR]=$0;next}
       !(FNR in a) { print "file2 has more lines than file1"; exit 1 }
       { print (($0 == a[FNR]) ? "matching" : "not matching") }
       END { if (NR-FNR > FNR) { print "file1 has more lines than file2"; exit 1 } }' file1 file2
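For comparison, the same position-by-position check could be sketched in Python roughly as follows. The function name `compare_by_position` is invented here, and the messages simply mirror the ones printed by the awk script.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking "this file has run out of lines"

def compare_by_position(lines1, lines2):
    """Compare line N of one input with line N of the other."""
    report = []
    for a, b in zip_longest(lines1, lines2, fillvalue=_MISSING):
        if a is _MISSING:
            report.append("file2 has more lines than file1")
            break
        if b is _MISSING:
            report.append("file1 has more lines than file2")
            break
        report.append("matching" if a == b else "not matching")
    return report


print(compare_by_position(['a', 'b'], ['a', 'x', 'y']))
```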

Compare two files line by line and generate the difference in another file

diff(1) is not the answer, but comm(1) is.

NAME
       comm - compare two sorted files line by line

SYNOPSIS
       comm [OPTION]... FILE1 FILE2

       ...

       -1     suppress lines unique to FILE1

       -2     suppress lines unique to FILE2

       -3     suppress lines that appear in both files

So

comm -2 -3 file1 file2 > file3

The input files must be sorted. If they are not, sort them first. This can be done with a temporary file, or...

comm -2 -3 <(sort file1) <(sort file2) > file3

provided that your shell supports process substitution (bash does).
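For readers who would rather stay in Python, the effect of `comm -2 -3` on sorted input can be approximated with a multiset difference. This is a sketch under assumptions: the name `only_in_first` is hypothetical, and Python sorts by code point, which may differ from the locale-aware order that sort and comm use.

```python
from collections import Counter

def only_in_first(lines1, lines2):
    """Return, sorted, the lines of lines1 left after removing lines2's lines.

    Duplicates are counted: a line that appears twice in lines1 and once
    in lines2 is reported once.
    """
    remaining = Counter(lines1) - Counter(lines2)
    return sorted(remaining.elements())


print(only_in_first(['b', 'a', 'a', 'c'], ['a', 'c', 'd']))  # ['a', 'b']
```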

Using Python to Compare Two Text Files Line by Line

https://docs.python.org/2.7/reference/expressions.html#in:

For the Unicode and string types, x in y is true if and only if x is a substring of y.

Instead of

    if line1 in line2:

I think you meant to write

    if line1 == line2:

Or maybe replace the whole

    for line2 in f2data:
        if line1 in line2:
            linecount+=1

block by

    if line1 in f2data:
        linecount += 1
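A small illustration of the difference, using made-up values:

```python
line1 = "cat"
line2 = "concatenate"
f2data = ["concatenate", "dog"]

print(line1 in line2)    # True: "cat" is a substring of "concatenate"
print(line1 == line2)    # False: the two strings are not equal
print(line1 in f2data)   # False: no item of the list equals "cat"
```

On a string, `in` is a substring test; on a list, it compares against whole items.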

Compare two files and remove the common lines

Read the lines of both files into separate variables. Iterate over the lines of the first file and, for each one, check whether it exists in the second file; if it does not, write it back to the first file.

# file1 and file2 hold the paths of the two files
with open(file1, "r") as f1:
    lines_file1 = f1.readlines()
with open(file2, "r") as f2:
    lines_file2 = f2.readlines()
with open(file1, "w") as f_w:
    for line in lines_file1:
        if line not in lines_file2:
            f_w.write(line)

The downside of this approach is that you are loading both files entirely into memory.
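One way to reduce the memory use, sketched below, is to keep only the second file in memory (as a set) and stream the first file into a temporary file that then replaces it. The function name `remove_common_lines` is hypothetical, and the sketch assumes the second file fits in memory.

```python
import os
import tempfile

def remove_common_lines(path1, path2):
    """Rewrite path1 without the lines that also occur in path2."""
    with open(path2, "r") as f2:
        lines_file2 = set(f2)  # only the second file is held in memory

    # Write the temporary file next to path1 so the final rename stays
    # on the same filesystem.
    dir_name = os.path.dirname(os.path.abspath(path1))
    with open(path1, "r") as f1, tempfile.NamedTemporaryFile(
            "w", dir=dir_name, delete=False) as tmp:
        for line in f1:
            if line not in lines_file2:
                tmp.write(line)

    # Swap the filtered copy into place once both files are closed.
    os.replace(tmp.name, path1)
```

Because lines are compared with their newline characters, a final line without a trailing newline will not match the same text followed by a newline in the other file.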


