How to find words from one file in another file?
You can use grep -f:
grep -Ff "first-file" "second-file"
Or, to match whole words only:
grep -w -Ff "first-file" "second-file"
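To make the difference between the two commands concrete, here is a rough Python sketch of fixed-string matching with and without a whole-word restriction. The function name matching_lines is hypothetical, the inputs are lists of strings instead of files, and empty patterns are skipped (an assumption; grep treats an empty pattern as matching every line).

```python
import re

def matching_lines(words, lines, whole_word=False):
    """Return the lines that contain at least one of the literal words."""
    words = [w for w in words if w]  # assumption: ignore empty patterns
    if not words:
        return []
    # re.escape makes every word a literal string, similar in spirit to -F.
    alternation = "|".join(re.escape(w) for w in words)
    if whole_word:
        # Approximates -w: no word character directly before or after the match.
        pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")
    else:
        pattern = re.compile(alternation)
    return [line for line in lines if pattern.search(line)]

lines = ["the cat sat", "concatenate", "dog"]
print(matching_lines(["cat"], lines))                   # substring matches
print(matching_lines(["cat"], lines, whole_word=True))  # whole words only
```

Without the whole-word restriction, "cat" also matches inside "concatenate"; with it, only the first line is returned.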
UPDATE: As per the comments:
awk 'FNR==NR{a[$1]; next} ($1 in a){delete a[$1]; print $1}' file1 file2
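For readers less familiar with awk, the one-liner above can be approximated in Python as follows. This is only a sketch: the name first_column_matches is hypothetical, it works on lists of lines instead of file names, and it ignores empty lines, which awk would not.

```python
def first_column_matches(words_lines, target_lines):
    """Yield the first field of each target line that is listed in
    words_lines, reporting every word at most once."""
    # Like a[$1]: remember the first whitespace-separated field of each line.
    pending = {line.split()[0] for line in words_lines if line.split()}
    for line in target_lines:
        fields = line.split()
        if fields and fields[0] in pending:
            pending.discard(fields[0])  # like `delete a[$1]`
            yield fields[0]

file1 = ["Sam", "Tom"]
file2 = ["Tom 1", "Bob 2", "Tom 3", "Sam 4"]
print(list(first_column_matches(file1, file2)))
```

Removing a word once it has been seen is what keeps each match from being printed more than once.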
Find words in one file and grab the line they are in in another file: Bash script
Try this:
grep -F -f file1 file2 >newfile.txt
How to perform a search for words in one file against another file and display the first matching word in a line
With GNU awk for sorted_in:
$ cat tst.awk
BEGIN { PROCINFO["sorted_in"] = "@val_num_asc" }
NR==FNR { res[$0]; next }
{
    delete found
    for ( re in res ) {
        if ( !(re in found) ) {
            if ( match($0,re) ) {
                found[re] = RSTART
            }
        }
    }
    for ( re in found ) {
        printf "%s (line #%d match)\n", re, FNR
    }
}
$ awk -f tst.awk file1 file2
Sam (line #1 match)
Tom (line #2 match)
Tom (line #3 match)
Tom (line #4 match)
Sam (line #4 match)
Sam (line #5 match)
Tom (line #5 match)
Sam (line #6 match)
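The idea behind the script (record where each word first appears in a line, then report the words in that order) can also be sketched in Python. Note the assumptions: words_by_position is a hypothetical name, and the sketch uses plain substring search with str.find instead of the regular-expression matching that awk's match() performs.

```python
def words_by_position(words, line):
    """Return the words found in line, ordered by where each first appears."""
    positions = {}
    for word in words:
        index = line.find(word)  # -1 means the word is not in the line
        if word and index != -1:
            positions[word] = index
    # Sort by start position, mirroring the "@val_num_asc" traversal order.
    return sorted(positions, key=positions.get)

words = ["Sam", "Tom"]
for number, line in enumerate(["Tom met Sam", "Sam met Tom", "nobody"], start=1):
    for word in words_by_position(words, line):
        print(f"{word} (line #{number} match)")
```

The sample lines here are made up for illustration; they are not the file2 used in the answer above.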
Find entries of one text file in another file in python
Strip the entries of newlines
Python keeps the trailing newline when you read lines, so your first entry is read as 1223232\n. Strip the newline and it will work.
def readA():
    with open('A.txt') as bondNumberFile:
        for line in bondNumberFile:
            readB(line.rstrip())
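A small self-contained demonstration of the effect, using io.StringIO as a stand-in for the opened file so that nothing needs to exist on disk. The helper name read_entries and the second sample value are made up for illustration.

```python
import io

def read_entries(handle):
    """Return the lines of an open text file without their trailing newlines."""
    return [line.rstrip("\n") for line in handle]

# io.StringIO stands in for an open file such as A.txt.
sample = io.StringIO("1223232\n4455667\n")
entries = read_entries(sample)
print(entries)               # ['1223232', '4455667']
print("1223232" in entries)  # True once the newlines are stripped
```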
Read words from one file and remove the containing lines from another file
First, read the search words into a list.
with open("file1.txt") as f1:
    wordslist = [line.strip() for line in f1]
Now, wordslist is ['good', 'bad'].
Then, read file2 line by line, and check whether the line contains any of the words you read into wordslist:
with open("file2.txt") as f2, open("file3.txt", "w") as f3:
    for line in f2:
        if not any(word in line for word in wordslist):
            f3.write(line)
file3 now contains:
Hi,
Bye.
Search for strings listed in one file from another text file?
When you read a file with readlines(), each element of the resulting list keeps its trailing newline character. These newlines are likely the reason why you get fewer matches than you expected.
Instead of writing
for x in list:
write
for x in (s.strip() for s in list):
This removes leading and trailing whitespace, including the trailing newline character, from each string in list.
In order to consolidate your program, you could do something like this:
import sys

with open('c:/tmp/textfile.TXT') as f:
    haystack = f.read()
if not haystack:
    sys.exit("Could not read haystack data :-(")
with open('c:/tmp/list.txt') as f:
    for needle in (line.strip() for line in f):
        if needle in haystack:
            print(needle, ',one_sentence')
        else:
            print(needle, ',another_sentence')
I did not want to change too much. The most important difference is that I am using a context manager here via the with statement. It ensures proper file handling (mainly closing) for you. Also, the 'needle' lines are stripped on the fly using a generator expression. The above approach reads and processes the needle file line by line instead of loading the whole file into memory at once. Of course, this only makes a difference for large files.
Use a file to search lines in another file in Python
Load the keyword list into a set:
keywords = set()
with open(list_file_path) as list_file:
    for line in list_file:
        if line.strip():
            keywords.add(line.strip())
Then iterate over each line in the master file, pulling out the lines that contain at least one keyword:
with open(master_file_path) as master_file:
    with open(search_results_path, 'w') as search_results:
        for line in master_file:
            if set(line.split()[:-1]) & keywords:
                search_results.write(line)
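The membership test in the loop above can be tried on its own with a small sketch like the one below. The function name line_has_keyword and the skip_last_field parameter are hypothetical; the parameter only makes visible that the [:-1] slice leaves the last whitespace-separated field of each line out of the comparison.

```python
def line_has_keyword(line, keywords, skip_last_field=True):
    """Return True if any whole field of line is one of the keywords."""
    fields = line.split()
    if skip_last_field:
        fields = fields[:-1]  # same slice as in the answer's loop
    # A non-empty set intersection means at least one keyword is present.
    return bool(set(fields) & keywords)

print(line_has_keyword("alpha beta gamma", {"alpha"}))
print(line_has_keyword("alpha beta gamma", {"gamma"}))
print(line_has_keyword("alpha beta gamma", {"gamma"}, skip_last_field=False))
```

Because whole fields are compared, a keyword such as alpha does not match a longer field such as alphabet, unlike a substring test with the in operator on the raw line.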
Find if lines on one file appear as words in the lines of another file in Python
If you just need to find out how many words in file2 occur in file1, you just need to read in both files and find the size of the intersection of the sets containing the words in both files.
with open("file1.txt") as f:
    file1_words = [line.strip() for line in f]  # One word per line; strip the trailing newline
with open("file2.txt") as f:
    file2_words = f.read().split()  # Read everything and split by whitespace
file1_words = set(file1_words)
file2_words = set(file2_words)
common_words = file1_words.intersection(file2_words)
print(f"File1 and File2 have {len(common_words)} words in common")
If you want to count the occurrences of each word from file1 in file2, you'll need to write some more code.
First, read the second file and count the occurrences of each word. You could use collections.Counter for this, but it's pretty easy to write your own code if you're learning:
with open("file2.txt") as f:
    file2_words = f.read().split()  # Read everything, then split by whitespace
file2_wordcount = dict()  # Empty dictionary
for word in file2_words:
    old_count = file2_wordcount.get(word, 0)  # Get the count from the dict, or 0 if it doesn't exist
    file2_wordcount[word] = old_count + 1  # Set the new count
At the end of this block, we have a dictionary file2_wordcount which maps each word to its count in the second file. Next, we need to read the words from the first file and find out how many times they occur in the other file.
# Now, read the lines from file 1
with open("file1.txt") as f:
    # One word per line; strip the trailing newline so the lookups below can match
    file1_words = [line.strip() for line in f]
# Convert it into a set to remove duplicates
file1_words = set(file1_words)
for word in file1_words:
    count = file2_wordcount.get(word, 0)  # Get the count from the dict, or 0 if it doesn't exist
    print(word, count)  # Print them both
Or, to get the total count, use the sum() function:
total_common_count = sum(file2_wordcount.get(word, 0) for word in file1_words)
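For comparison, the collections.Counter variant mentioned earlier might look like the sketch below. The function name count_listed_words is hypothetical, and the inputs are passed as a list and a string so the example runs without any files.

```python
from collections import Counter

def count_listed_words(listed_words, text):
    """Map each listed word to the number of times it occurs in text."""
    counts = Counter(text.split())  # word -> occurrences, split on whitespace
    # Counter returns 0 for missing keys, so no .get(word, 0) is needed.
    return {word: counts[word] for word in set(listed_words)}

result = count_listed_words(["good", "bad", "good"], "good bad good ugly")
print(result)
print(sum(result.values()))  # total occurrences of all listed words
```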
How to highlight words from one file in another file? (Linux)
You probably want
grep --color=always -f iplistfile dhcpd.hosts
and you might want to add the -F and -w options too.
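If grep is not available, the highlighting can be imitated in Python with ANSI escape sequences. This is only a sketch under a few assumptions: highlight, RED and RESET are hypothetical names, the words are treated as fixed strings (like -F), and the terminal is assumed to understand ANSI colour codes.

```python
import re

RED = "\033[1;31m"  # bold red
RESET = "\033[0m"   # back to the default style

def highlight(words, line):
    """Wrap every literal occurrence of the given words in colour codes."""
    words = [w for w in words if w]  # assumption: ignore empty patterns
    if not words:
        return line
    # Longest words first, so a longer word wins over one of its prefixes.
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    return pattern.sub(lambda m: RED + m.group(0) + RESET, line)

print(highlight(["10.0.0.1"], "host 10.0.0.1 up"))
```

Escaping the words matters for IP addresses in particular: as an unescaped regular expression, each dot in 10.0.0.1 would match any character.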