How to Find Words from One File in Another File

How to find words from one file in another file?

You can use grep -f:

grep -Ff "first-file" "second-file"

Or, to match only whole words:

grep -w -Ff "first-file" "second-file"
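To see the difference, here is a small demonstration with made-up file names and contents; `-w` stops a pattern such as `cat` from matching inside `catalog`:

```shell
# Hypothetical sample files, just to illustrate the difference
printf 'cat\ndog\n' > patterns.txt
printf 'the catalog is new\nthe dog barks\nthe cat sleeps\n' > text.txt

# Without -w, "cat" also matches inside "catalog" (prints 3 lines)
grep -Ff patterns.txt text.txt

# With -w, only whole-word matches survive (prints 2 lines)
grep -w -Ff patterns.txt text.txt
```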

UPDATE: As per the comments:

awk 'FNR==NR{a[$1]; next} ($1 in a){delete a[$1]; print $1}' file1 file2
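This awk version compares against the first field of each line in the second file and, because of the `delete`, prints each common word only once. A quick demonstration with hypothetical file contents:

```shell
# Hypothetical input files: file1 holds the words, file2's first field is checked
printf 'apple\nbanana\ncherry\n' > file1
printf 'banana 1\napple 2\napple 3\n' > file2

# Prints each word from file1 found as the first field of file2, once, in file2 order
awk 'FNR==NR{a[$1]; next} ($1 in a){delete a[$1]; print $1}' file1 file2
```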

Find words in one file and grab the line they are in in another file: Bash script

Try this:

grep -F -f file1 file2 >newfile.txt

How to perform a search for words in one file against another file and display the first matching word in a line

With GNU awk for sorted_in:

$ cat tst.awk
BEGIN { PROCINFO["sorted_in"] = "@val_num_asc" }
NR==FNR { res[$0]; next }
{
    delete found
    for ( re in res ) {
        if ( !(re in found) ) {
            if ( match($0,re) ) {
                found[re] = RSTART
            }
        }
    }
    for ( re in found ) {
        printf "%s (line #%d match)\n", re, FNR
    }
}

$ awk -f tst.awk file1 file2
Sam (line #1 match)
Tom (line #2 match)
Tom (line #3 match)
Tom (line #4 match)
Sam (line #4 match)
Sam (line #5 match)
Tom (line #5 match)
Sam (line #6 match)

Find entries of one text file in another file in python

Strip the entries of newlines

Python includes newlines when you read lines - your first entry is read as 1223232\n. Strip the newline and it will work.

def readA():
    with open('A.txt') as bondNumberFile:
        for line in bondNumberFile:
            readB(line.rstrip())

read words from one file and remove the containing lines from another file

First, read the search words into a list.

with open("file1.txt") as f1:
    wordslist = [line.strip() for line in f1]

Now, wordslist is ['good', 'bad']

Then, read file2 line-by-line, and check if the line contains any words you read into wordslist:

with open("file2.txt") as f2, open("file3.txt", "w") as f3:
    for line in f2:
        if not any(word in line for word in wordslist):
            f3.write(line)

file3 now contains:

Hi,
Bye.

Search for strings listed in one file from another text file?

When you read a file with readlines(), the resulting list elements have trailing newline characters. This is likely why you get fewer matches than you expected.

Instead of writing

for x in list:

write

for x in (s.strip() for s in list):

This removes leading and trailing whitespace from the strings in list. Hence, it removes trailing newline characters from the strings.

In order to consolidate your program, you could do something like this:

import sys

with open('c:/tmp/textfile.TXT') as f:
    haystack = f.read()

if not haystack:
    sys.exit("Could not read haystack data :-(")

with open('c:/tmp/list.txt') as f:
    for needle in (line.strip() for line in f):
        if needle in haystack:
            print(needle, ',one_sentence')
        else:
            print(needle, ',another_sentence')

I did not want to make too many drastic changes. The most important difference is the use of the context manager via the with statement, which ensures proper file handling (mainly closing) for you. The 'needle' lines are also stripped on the fly using a generator expression. This approach reads and processes the needle file line by line instead of loading the whole file into memory at once; of course, that only makes a difference for large files.

Use a file to search lines in another file in Python

Load the keyword list into a set:

keywords = set()
with open(list_file_path) as list_file:
    for line in list_file:
        if line.strip():
            keywords.add(line.strip())

Then iterate over each line in the master file, pulling out the lines that contain at least one keyword:

with open(master_file_path) as master_file:
    with open(search_results_path, 'w') as search_results:
        for line in master_file:
            if set(line.split()[:-1]) & keywords:
                search_results.write(line)

Find if lines on one file appear as words in the lines of another file in Python

If you just need to find out how many words in file2 occur in file1, you just need to read in both files and find the size of the intersection of the sets containing the words in both files.

with open("file1.txt") as f:
    file1_words = [line.strip() for line in f]  # One word per line; strip the newline readlines() would keep

with open("file2.txt") as f:
    file2_words = f.read().split()  # Read everything and split by whitespace

file1_words = set(file1_words)
file2_words = set(file2_words)

common_words = file1_words.intersection(file2_words)
print(f"File1 and File2 have {len(common_words)} words in common")

If you want to count the occurrences of each word from file1 in file2, you'll need to write some more code.

First, read the second file and count the occurrences of each word. You could use collections.Counter for this, but it's pretty easy to write your own code if you're learning:

with open("file2.txt") as f:
    file2_words = f.read().split()  # Read everything, then split by whitespace

file2_wordcount = dict()  # Empty dictionary

for word in file2_words:
    old_count = file2_wordcount.get(word, 0)  # Get the count from the dict, or 0 if it doesn't exist
    file2_wordcount[word] = old_count + 1  # Set the new count

At the end of this block, we have a dictionary file2_wordcount which maps each word to its count in the second file. Next, we need to read the words from the first file and find out how many times they occur in the other file.

# Now, read the lines from file 1
with open("file1.txt") as f:
    file1_words = [line.strip() for line in f]  # One word per line; strip the newline so the keys match

# Convert it into a set to remove duplicates
file1_words = set(file1_words)

for word in file1_words:
    count = file2_wordcount.get(word, 0)  # Get the count from the dict, or 0 if it doesn't exist
    print(word, count)  # Print them both

Or, to get the total count, use the sum() function:

total_common_count = sum(file2_wordcount.get(word, 0) for word in file1_words)

How to highlight words from one file in another file? (Linux)

You probably want

grep --color=always -f iplistfile dhcpd.hosts

and you might want to add the -F and -w options too.
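For example, with made-up stand-ins for iplistfile and dhcpd.hosts (GNU grep; `--color=always` keeps the highlighting even when the output is piped):

```shell
# Hypothetical files standing in for iplistfile and dhcpd.hosts
printf '10.0.0.1\n' > iplistfile
printf 'host1 10.0.0.1\nhost2 10.0.0.11\n' > dhcpd.hosts

# -F treats the IPs as literal strings; -w stops 10.0.0.1 matching inside 10.0.0.11
grep --color=always -wF -f iplistfile dhcpd.hosts
```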


