Compare two text files (order does not matter) and output the words the two files have in common to a third file
You don't need all those loops - if the files are small (i.e., less than several hundred MB), you can work with them more directly:
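# f1 and f2 are the two already-open input files
words1 = f1.read().split()
words2 = f2.read().split()
# Set intersection keeps only the words present in both files
words = set(words1) & set(words2)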
words will then be a set containing all the words those files have in common. You can ignore case by calling lower() on the text before splitting it.
To get a count of each word, as you mention in a comment, simply use the list's count() method:
with open('outfile.txt', 'w') as output:
    for word in words:
        output.write('{} appears {} times in f1 and {} times in f2.\n'.format(
            word, words1.count(word), words2.count(word)))
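Calling count() once per word rescans the whole list each time, which can get slow as the files grow. A minimal sketch of the same idea using collections.Counter, which tallies every word in a single pass. The function name common_word_counts and its ignore_case parameter are hypothetical, chosen only for illustration.

```python
from collections import Counter


def common_word_counts(text1, text2, ignore_case=True):
    """Return {word: (count_in_text1, count_in_text2)} for words in both texts."""
    if ignore_case:
        text1, text2 = text1.lower(), text2.lower()
    counts1 = Counter(text1.split())
    counts2 = Counter(text2.split())
    # Keys present in both counters are the shared words.
    shared = counts1.keys() & counts2.keys()
    return {word: (counts1[word], counts2[word]) for word in shared}


if __name__ == "__main__":
    sample = common_word_counts("the cat and the dog", "The dog barked")
    for word, (n1, n2) in sorted(sample.items()):
        print('{} appears {} times in f1 and {} times in f2.'.format(word, n1, n2))
```

With real files, you would pass f1.read() and f2.read() as the two arguments.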
Trying to compare two text files using Python
The open() function returns a file object, not the content of the file, so you are comparing two references to different file objects. You should read the contents of each file and then compare them:
read = open("text1.txt", "r").read()
read2 = open("text2.txt", "r").read()
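To make the comparison step concrete, here is a minimal sketch that reads both files inside a with block (so they are closed afterwards) and compares the resulting strings. The helper name same_text is an assumption for illustration, not something from the question.

```python
def same_text(path1, path2):
    """Return True if both files contain exactly the same text."""
    with open(path1, "r") as f1, open(path2, "r") as f2:
        # read() returns the contents as a str; the two file objects
        # themselves would never compare equal.
        return f1.read() == f2.read()
```

For example, same_text("text1.txt", "text2.txt") returns True only when the two files hold identical text. If a byte-for-byte check is all you need, the standard library's filecmp.cmp(path1, path2, shallow=False) is an alternative.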
How to compare the content of two text files and output in another text file? text1 - text2
Using Guava:
Set<String> lines1 =
new HashSet<>(Files.readLines(new File("1.txt"), Charsets.UTF_8));
Set<String> lines2 =
new HashSet<>(Files.readLines(new File("2.txt"), Charsets.UTF_8));
Set<String> minus1 = Sets.difference(lines1, lines2);
Set<String> minus2 = Sets.difference(lines2, lines1);
Files.asCharSink(new File("3.txt"), Charsets.UTF_8).writeLines(minus1);
Files.asCharSink(new File("4.txt"), Charsets.UTF_8).writeLines(minus2);
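For readers working in Python, like the other answers here, a rough equivalent of the Guava snippet might look like the sketch below. The helper names read_lines and write_difference are hypothetical, and the output is sorted only to make the result reproducible, since sets have no order.

```python
from pathlib import Path


def read_lines(path):
    """Read a UTF-8 text file into a set of lines (line endings stripped)."""
    return set(Path(path).read_text(encoding="utf-8").splitlines())


def write_difference(path_a, path_b, out_path):
    """Write the lines found in path_a but not in path_b to out_path."""
    only_in_a = read_lines(path_a) - read_lines(path_b)
    # Sets are unordered, so sort for a reproducible output file.
    text = "".join(line + "\n" for line in sorted(only_in_a))
    Path(out_path).write_text(text, encoding="utf-8")
    return only_in_a
```

Calling write_difference("1.txt", "2.txt", "3.txt") and then write_difference("2.txt", "1.txt", "4.txt") would mirror the two differences computed above.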
Compare two text files with two columns, find matches in first column, output match to third file
Find all matches of the second file's first column in the first file's first column, print both columns of the first file where there is a match, and append the result to file3.txt:
awk 'NR==FNR{a[$1];next}$1 in a' file2.txt file1.txt >> file3.txt
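Explanation:
NR and FNR are built-in awk variables: NR is the total number of input records read so far, and FNR is the number of records read so far from the current file. The two are equal only while awk is reading the first file named on the command line.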
NR==FNR {    # true only while reading the first file (file2.txt)
    a[$1]    # add the first column of file2.txt as a key of the associative array
    next     # skip the remaining rules and read the next line
}
$1 in a      # while reading file1.txt: print the whole line if its first column is a key in a
>> file3.txt appends what awk prints to file3.txt; it is a shell redirection, not a pipe.
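The same two-pass logic can be sketched in Python, which may help if awk's NR==FNR idiom is unfamiliar. This is an illustration under assumptions, not a drop-in replacement: the function name matching_rows is hypothetical, and unlike awk it simply skips blank lines.

```python
def matching_rows(key_lines, data_lines):
    """Yield lines of data_lines whose first field is a first field in key_lines."""
    # Counterpart of the NR==FNR block: collect the first column of file2.
    keys = {line.split()[0] for line in key_lines if line.strip()}
    # Counterpart of `$1 in a`: keep file1 lines whose first column is a key.
    for line in data_lines:
        fields = line.split()
        if fields and fields[0] in keys:
            yield line


# Usage sketch (file names taken from the awk command above):
# with open("file2.txt") as f2, open("file1.txt") as f1, open("file3.txt", "a") as out:
#     out.writelines(matching_rows(f2, f1))
```

Opening file3.txt in "a" mode corresponds to the >> redirection.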
Comparing two text files and the matches go to a new file
Try something like this.
with open('file1.txt') as file1, open('file2.txt') as file2:
    with open('newfile.txt', 'w') as newfile:
        # Walk both files in step and keep the lines that match at the same position
        for s1, s2 in zip(file1, file2):
            if s1 == s2:
                newfile.write(s1)
Or, as @SUTerliakov pointed out, something simpler: use common_lines = set(file1.readlines()) & set(file2.readlines()) instead of the checking block. Note that this matches lines anywhere in the two files, not only at the same position:
with open('file1.txt') as file1:
    with open('file2.txt') as file2:
        newfile = open('newfile.txt', 'w')
        common_lines = set(file1.readlines()) & set(file2.readlines())
        for line in common_lines:
            newfile.write(line)
        newfile.close()
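One caveat with the set version is that sets are unordered, so the lines may be written to newfile.txt in any order. A minimal sketch that keeps the order in which lines appear in file1, assuming that is what you want; the helper name common_lines_in_order is hypothetical.

```python
def common_lines_in_order(lines1, lines2):
    """Return lines of lines1 that also occur in lines2, in lines1 order, no duplicates."""
    wanted = set(lines2)
    seen = set()
    result = []
    for line in lines1:
        # Keep a line the first time it shows up, provided lines2 has it too.
        if line in wanted and line not in seen:
            seen.add(line)
            result.append(line)
    return result
```

It accepts any iterables of lines, so the open file objects file1 and file2 can be passed directly and the result written out with newfile.writelines().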