Trying to Compare Two Text Files, and Create a Third Based on Information

Compare two text files (order does not matter) and output the words the two files have in common to a third file.

You don't need all those loops. If the files are small enough to read into memory (say, less than several hundred MB), you can work with them more directly:

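# file names are placeholders; f1 and f2 are the two open files
with open('file1.txt') as f1, open('file2.txt') as f2:
    words1 = f1.read().split()
    words2 = f2.read().split()
words = set(words1) & set(words2)  # set intersection: words found in both files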

The variable words will then be a set containing every word the two files have in common. To ignore case, call lower() on the text before splitting it, e.g. f1.read().lower().split().
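A minimal sketch of the case-insensitive variant, wrapped in a function so it works on any two strings. The name common_words and the ignore_case parameter are illustrative, not part of the original answer.

```python
def common_words(text1, text2, ignore_case=True):
    """Return the set of whitespace-separated words that occur in both texts."""
    if ignore_case:
        # Lower-case before splitting so "The" and "the" count as the same word
        text1, text2 = text1.lower(), text2.lower()
    return set(text1.split()) & set(text2.split())
```

Called as common_words(f1.read(), f2.read()). Note that split() only breaks on whitespace, so punctuation stays attached ("end." and "end" are different words).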

To get a count of each word, as you mention in a comment, use the list count() method:

with open('outfile.txt', 'w') as output:
    for word in words:
        # count() scans the whole list, so this is fine for small files
        output.write('{} appears {} times in f1 and {} times in f2.\n'.format(word, words1.count(word), words2.count(word)))
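Calling words1.count(word) rescans the whole list for every word, so the loop does roughly len(words) × (len(words1) + len(words2)) work. For larger files, a sketch using collections.Counter counts each list only once; the helper names count_common and write_report are hypothetical.

```python
from collections import Counter


def count_common(words1, words2):
    """Map each word found in both lists to (count in words1, count in words2)."""
    counts1 = Counter(words1)  # one pass over words1
    counts2 = Counter(words2)  # one pass over words2
    # dict key views support set operations; & gives the shared words
    shared = counts1.keys() & counts2.keys()
    return {word: (counts1[word], counts2[word]) for word in shared}


def write_report(counts, path):
    """Write one line per shared word, sorted alphabetically."""
    with open(path, 'w') as output:
        for word, (n1, n2) in sorted(counts.items()):
            output.write('{} appears {} times in f1 and {} times in f2.\n'.format(word, n1, n2))
```

For example, write_report(count_common(words1, words2), 'outfile.txt') produces the same kind of report as the loop above, with the words in sorted rather than arbitrary order.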

Trying to compare two text files using Python

The open() function returns a file object, not the contents of the file, so you are comparing two references to different file objects, which are never equal. Read the contents of each file first, then compare them:

read = open("text1.txt", "r").read()
read2 = open("text2.txt", "r").read()
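If all you need is a yes/no answer to "are these two files identical?", the standard library's filecmp.cmp can do the comparison without reading the files yourself. A sketch; same_contents is a hypothetical helper name.

```python
import filecmp


def same_contents(path1, path2):
    """Return True if the two files have byte-for-byte identical contents."""
    # shallow=False forces a content comparison; with the default (shallow=True),
    # files whose os.stat() signatures (type, size, mtime) match are reported
    # equal without being read.
    return filecmp.cmp(path1, path2, shallow=False)
```

Usage: same_contents("text1.txt", "text2.txt"). Unlike reading both files in text mode, this compares raw bytes, so files that differ only in line endings (\n vs \r\n) are reported as different.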

How to compare the content of two text files and output in another text file? text1 - text2

Using Guava:

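import com.google.common.base.Charsets;
import com.google.common.collect.Sets;
import com.google.common.io.Files;
import java.io.File;
import java.util.HashSet;
import java.util.Set;

Set<String> lines1 =
    new HashSet<>(Files.readLines(new File("1.txt"), Charsets.UTF_8));
Set<String> lines2 =
    new HashSet<>(Files.readLines(new File("2.txt"), Charsets.UTF_8));
Set<String> minus1 = Sets.difference(lines1, lines2); // lines in 1.txt but not in 2.txt
Set<String> minus2 = Sets.difference(lines2, lines1); // lines in 2.txt but not in 1.txt
Files.asCharSink(new File("3.txt"), Charsets.UTF_8).writeLines(minus1);
Files.asCharSink(new File("4.txt"), Charsets.UTF_8).writeLines(minus2);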
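The same set-difference idea in Python, as a sketch for readers not using Java. Like Sets.difference, it reports each differing line once; unlike a HashSet, it keeps the lines in the order they appear in the first file. The name lines_only_in is hypothetical.

```python
def lines_only_in(text_a, text_b):
    """Return the lines of text_a that never appear in text_b, in text_a's order.

    Duplicate lines in text_a are reported once, mirroring a set difference.
    """
    lines_b = set(text_b.splitlines())  # set gives fast membership tests
    seen = set()
    result = []
    for line in text_a.splitlines():
        if line not in lines_b and line not in seen:
            seen.add(line)
            result.append(line)
    return result
```

lines_only_in(t1, t2) gives the contents for 3.txt and lines_only_in(t2, t1) those for 4.txt, where t1 and t2 are the full texts of 1.txt and 2.txt.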

Compare two text files with two columns, find matches in first column, output match to third file

Find every line of file1.txt whose first column also appears in the first column of file2.txt, print that whole line (both columns) of file1.txt, and append it to file3.txt:

awk 'NR==FNR{a[$1];next}$1 in a' file2.txt file1.txt >> file3.txt

Explanation:

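NR and FNR are built-in awk variables: NR is the total number of input records read so far, and FNR is the number of records read from the current file. NR==FNR is therefore true only while awk is reading the first file (assuming that file is not empty).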

NR==FNR # true only while reading the first file (file2.txt)
{
a[$1] # add file2's first column as a key of the associative array a
next # skip the rest of the script and read the next line
}
($1 in a) # in the second file (file1.txt): if its first column is a key of a, print the whole line (awk's default action)

>> file3.txt appends everything awk prints to file3.txt, creating the file if needed; use > instead to overwrite it.
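For comparison, a Python sketch of the same two-pass logic: first collect the first-column values of file2, then keep each line of file1 whose first column is one of them. str.split() with no arguments splits on runs of whitespace, similar to awk's default field splitting. The function name is hypothetical, and blank lines are simply skipped.

```python
def match_first_column(file1_lines, file2_lines):
    """Return the lines of file1 whose first column appears in file2's first column."""
    # Pass 1 (awk: NR==FNR{a[$1];next}): collect file2's first-column values
    keys = set()
    for line in file2_lines:
        fields = line.split()
        if fields:
            keys.add(fields[0])
    # Pass 2 (awk: $1 in a): keep whole lines of file1 whose first field is a key
    matches = []
    for line in file1_lines:
        fields = line.split()
        if fields and fields[0] in keys:
            matches.append(line)  # whole line, like awk's default print
    return matches
```

To mimic >> file3.txt, open the output in append mode: with open('file1.txt') as f1, open('file2.txt') as f2, open('file3.txt', 'a') as out: out.writelines(match_first_column(f1, f2)).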

Comparing two text files and the matches go to a new file

Try something like this.

with open('file1.txt') as file1:
    with open('file2.txt') as file2:
        with open('newfile.txt', 'w') as newfile:
            # zip() pairs line 1 with line 1, line 2 with line 2, and so on,
            # stopping at the end of the shorter file
            for s1, s2 in zip(file1, file2):
                if s1 == s2:
                    newfile.write(s1)

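Or, as @SUTerliakov pointed out, something simpler: replace the line-by-line check with a set intersection,
common_lines = set(file1.readlines()) & set(file2.readlines())
which finds the lines the two files have in common regardless of their position (the output order is arbitrary):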

with open('file1.txt') as file1:
    with open('file2.txt') as file2:
        with open('newfile.txt', 'w') as newfile:
            # lines present in both files, in arbitrary order
            common_lines = set(file1.readlines()) & set(file2.readlines())
            for line in common_lines:
                newfile.write(line)
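Because common_lines is a set, the lines come out in arbitrary order, and a last line without a trailing newline will not match the same text followed by "\n" in the other file. A sketch that handles both, keeping file1's order and comparing lines without their line endings; common_lines_in_order is a hypothetical name.

```python
def common_lines_in_order(path1, path2, out_path):
    """Write the lines that appear in both files to out_path, in file1's order."""
    with open(path2) as file2:
        # splitlines() drops line endings, so a missing final newline doesn't matter
        lines2 = set(file2.read().splitlines())
    written = set()
    with open(path1) as file1, open(out_path, 'w') as newfile:
        for line in file1.read().splitlines():
            if line in lines2 and line not in written:
                written.add(line)  # write each common line once, like the set version
                newfile.write(line + '\n')
```

Usage: common_lines_in_order('file1.txt', 'file2.txt', 'newfile.txt').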

