Grep files containing two or more occurrences of a specific string
What about this:
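grep -o -c Hello * | awk -F: '{if ($2 > 1){print $1}}'
Note that -c counts matching lines rather than individual matches, even when combined with -o, so this prints the files in which Hello appears on two or more lines.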
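To count every occurrence, including several on one line, one option is to drop -c and let awk count the lines that grep -o emits. This is only a sketch: it assumes a grep that supports -o and -H (GNU and BSD grep both do) and file names without colons, and the sample files are made up for the demo.

```shell
# Made-up sample files in a scratch directory.
dir=$(mktemp -d)
printf 'Hello Hello\n' > "$dir/a.txt"    # two occurrences on one line
printf 'Hello\nbye\n'  > "$dir/b.txt"    # a single occurrence
printf 'Hello\nHello\n' > "$dir/c.txt"   # two occurrences on two lines

# grep -o prints each match on its own line; -H forces the "file:" prefix.
# awk counts the lines per file name and keeps files seen more than once.
result=$(cd "$dir" && grep -oH 'Hello' -- * |
    awk -F: '{ n[$1]++ } END { for (f in n) if (n[f] > 1) print f }' | sort)

echo "$result"
rm -rf "$dir"
```

Here a.txt and c.txt are listed, while the -c variant would miss a.txt because both of its matches sit on the same line.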
Bash: How can I grep a line for multiple instances of the same string?
You can use
cat lines | grep 'xz.*xz'
Or just
grep 'xz.*xz' lines
The .* will match zero or more characters (any but a newline) between the two xz.
In case you need to use look-arounds, you will need the -P switch to enable Perl-like regexps.
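As a quick check of which lines 'xz.*xz' accepts, here is a small sketch on made-up input (the temporary file stands in for the lines file):

```shell
# Made-up input: only the second and third lines contain xz twice.
lines=$(mktemp)
printf '%s\n' 'axzb' 'xzxz' 'xz and xz' 'nothing' > "$lines"

# .* may match nothing at all, so two adjacent xz also qualify.
matched=$(grep 'xz.*xz' "$lines")

echo "$matched"
rm -f "$lines"
```

The lines xzxz and "xz and xz" are printed; axzb has only one xz and is skipped.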
How to grep for two words existing on the same line?
Why do you pass -c? That will just show the number of matching lines. Similarly, there is no reason to use -r. I suggest you read man grep.
To grep for 2 words existing on the same line, simply do:
grep "word1" FILE | grep "word2"
grep "word1" FILE will print all lines that have word1 in them from FILE, and then grep "word2" will print the lines that have word2 in them. Hence, if you combine these using a pipe, it will show lines containing both word1 and word2.
If you just want a count of how many lines had the 2 words on the same line, do:
grep "word1" FILE | grep -c "word2"
Also, to address your question of why it gets stuck: in grep -c "word1", you did not specify a file. Therefore, grep expects input from stdin, which is why it seems to hang. You can press Ctrl+D to send an EOF (end-of-file) so that it quits.
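For illustration, the pipeline and the counting variant can be tried on a made-up file like this (a sketch; the temporary file plays the role of FILE):

```shell
# Made-up input standing in for FILE.
file=$(mktemp)
printf '%s\n' 'word1 and word2' 'only word1' 'word2 then word1' 'neither' > "$file"

# The first grep keeps lines with word1; the second filters those for word2.
both=$(grep "word1" "$file" | grep "word2")

# Same pipeline, but the last grep reports how many lines survived.
count=$(grep "word1" "$file" | grep -c "word2")

echo "$both"
echo "$count"
rm -f "$file"
```

Because the second grep only sees lines that already passed the first, the order of the two words on the line does not matter.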
Regex match a pattern occurring multiple times in a string
You can use
^[0-9]+:[0-9]+, 80:[0-9]+, 443:[0-9]+(, [0-9]+:[0-9]+)+,$
See the regex demo.
Also, consider an awk solution like
awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ && /80/ && /443/' file
See the online demo:
#!/bin/bash
s='0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:3, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 8883:0, 9000:0, 9001:0,
0:0, 8883:0, 9000:0, 9001:0,'
awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ && /80/ && /443/' <<< "$s"
Output:
0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:3, 443:1, 8883:0, 9000:0, 9001:0,
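One thing to watch in the awk version: /80/ and /443/ match those digits anywhere in the line, so a port such as 8080 or 4430 would also satisfy them. A possible tightening, sketched here with a made-up input line, ties each port number to the start of a field:

```shell
# Made-up line: it has ports 8080 and 4430 but neither 80 nor 443.
line='0:0, 8080:1, 4430:2, 9000:0,'

# Loose test: "80" and "443" may match inside longer numbers.
loose=$(printf '%s\n' "$line" |
    awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ && /80/ && /443/')

# Stricter test: the port must start a field and be followed by a colon.
strict=$(printf '%s\n' "$line" |
    awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ && /(^|, )80:/ && /(^|, )443:/')

echo "loose: ${loose:-<no match>}"
echo "strict: ${strict:-<no match>}"
```

The loose version prints the line, the stricter one prints nothing.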
Match two strings in one line with grep
You can use
grep 'string1' filename | grep 'string2'
Or
grep 'string1.*string2\|string2.*string1' filename
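Note that \| in a basic regex is a GNU extension. If portability matters, the same alternation can be written with grep -E, or the order-independent test can be left to awk. A small sketch on made-up input comparing the two:

```shell
# Made-up input standing in for filename.
f=$(mktemp)
printf '%s\n' 'string1 string2' 'string2 string1' 'string1 only' > "$f"

# Extended regex: plain | is the alternation operator.
with_ere=$(grep -E 'string1.*string2|string2.*string1' "$f")

# awk: both patterns must match the line, in either order.
with_awk=$(awk '/string1/ && /string2/' "$f")

echo "$with_ere"
rm -f "$f"
```

Both commands select the first two lines and skip the third.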
Check if all of multiple strings or regexes exist in a file
Awk is the tool that the guys who invented grep, shell, etc. invented to do general text manipulation jobs like this, so I'm not sure why you'd want to try to avoid it.
In case brevity is what you're looking for, here's the GNU awk one-liner to do just what you asked for:
awk 'NR==FNR{a[$0];next} {for(s in a) if(!index($0,s)) exit 1}' strings RS='^$' file
And here's a bunch of other information and options:
Assuming you're really looking for strings, it'd be:
awk -v strings='string1 string2 string3' '
    BEGIN {
        numStrings = split(strings,tmp)
        for (i in tmp) strs[tmp[i]]
    }
    numStrings == 0 { exit }
    {
        for (str in strs) {
            if ( index($0,str) ) {
                delete strs[str]
                numStrings--
            }
        }
    }
    END { exit (numStrings ? 1 : 0) }
' file
The above will stop reading the file as soon as all strings have matched.
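Since the script reports its result only through the exit status, it is typically used in a shell condition. A minimal sketch, wrapping the script in a hypothetical function has_all_strings (the name and the sample file are made up, and the search words are assumed to be distinct):

```shell
# Hypothetical wrapper: $1 = space-separated strings, $2 = file to search.
# Succeeds (exit status 0) only if every string occurs somewhere in the file.
has_all_strings() {
    awk -v strings="$1" '
    BEGIN {
        numStrings = split(strings, tmp)
        for (i in tmp) strs[tmp[i]]
    }
    numStrings == 0 { exit }            # everything found: stop reading
    {
        for (str in strs) {
            if (index($0, str)) {
                delete strs[str]
                numStrings--
            }
        }
    }
    END { exit (numStrings ? 1 : 0) }
    ' "$2"
}

# Made-up sample file.
sample=$(mktemp)
printf '%s\n' 'alpha beta' 'gamma' > "$sample"

if has_all_strings 'alpha gamma' "$sample"; then r1=found; else r1=missing; fi
if has_all_strings 'alpha delta' "$sample"; then r2=found; else r2=missing; fi

echo "$r1 $r2"
rm -f "$sample"
```

The first call succeeds because both words appear somewhere in the file; the second fails because delta never does.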
If you were looking for regexps instead of strings then with GNU awk for multi-char RS and retention of $0 in the END section you could do:
awk -v RS='^$' 'END{exit !(/regexp1/ && /regexp2/ && /regexp3/)}' file
Actually, even if it were strings you could do:
awk -v RS='^$' 'END{exit !(index($0,"string1") && index($0,"string2") && index($0,"string3"))}' file
The main issue with the above 2 GNU awk solutions is that, like @anubhava's GNU grep -P solution, the whole file has to be read into memory at one time, whereas the first awk script above will work in any awk in any shell on any UNIX box and only stores one line of input at a time.
I see you've added a comment under your question to say you could have several thousand "patterns". Assuming you mean "strings" then instead of passing them as arguments to the script you could read them from a file, e.g. with GNU awk for multi-char RS and a file with one search string per line:
awk '
    NR==FNR { strings[$0]; next }
    {
        for (string in strings)
            if ( !index($0,string) )
                exit 1
    }
' file_of_strings RS='^$' file_to_be_searched
and for regexps it'd be:
awk '
    NR==FNR { regexps[$0]; next }
    {
        for (regexp in regexps)
            if ( $0 !~ regexp )
                exit 1
    }
' file_of_regexps RS='^$' file_to_be_searched
If you don't have GNU awk and your input file does not contain NUL characters then you can get the same effect as above by using RS='\0' instead of RS='^$', or by appending to a variable one line at a time as it's read and then processing that variable in the END section.
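The append-to-a-variable approach could look roughly like the sketch below. The function name check_all, the variable buf, and the sample files are made up for illustration, and the strings file is assumed to be non-empty (otherwise NR==FNR would also hold for the second file).

```shell
# Hypothetical helper: $1 = file with one search string per line,
# $2 = file to be searched. Succeeds only if every string is present.
check_all() {
    awk '
    NR == FNR { strings[$0]; next }     # first file: remember each string
    { buf = buf $0 "\n" }               # second file: append every line
    END {
        for (string in strings)
            if ( !index(buf, string) )
                exit 1                  # a string is missing
    }
    ' "$1" "$2"
}

# Made-up sample data.
file_of_strings=$(mktemp)
file_to_be_searched=$(mktemp)
printf '%s\n' 'foo' 'bar baz' > "$file_of_strings"
printf '%s\n' 'x foo y' 'z bar baz' > "$file_to_be_searched"

if check_all "$file_of_strings" "$file_to_be_searched"; then first=yes; else first=no; fi

# Add a string that does not occur in the searched file.
printf 'qux\n' >> "$file_of_strings"
if check_all "$file_of_strings" "$file_to_be_searched"; then second=yes; else second=no; fi

echo "$first $second"
rm -f "$file_of_strings" "$file_to_be_searched"
```

Like the RS='^$' versions, this still holds the whole searched file in memory, only in a variable instead of $0.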
If your file_to_be_searched is too large to fit in memory then it'd be this for strings:
awk '
    NR==FNR { strings[$0]; numStrings=NR; next }
    numStrings == 0 { exit }
    {
        for (string in strings) {
            if ( index($0,string) ) {
                delete strings[string]
                numStrings--
            }
        }
    }
    END { exit (numStrings ? 1 : 0) }
' file_of_strings file_to_be_searched
and the equivalent for regexps:
awk '
    NR==FNR { regexps[$0]; numRegexps=NR; next }
    numRegexps == 0 { exit }
    {
        for (regexp in regexps) {
            if ( $0 ~ regexp ) {
                delete regexps[regexp]
                numRegexps--
            }
        }
    }
    END { exit (numRegexps ? 1 : 0) }
' file_of_regexps file_to_be_searched
Regex character repeats n or more times in line with grep
You should change your grep command to:
grep -E 'g{4,}' input_file # --> this will extract only the lines containing chains of 4 or more g
If you want all the lines that contain chains of 4 or more identical characters, your regex becomes:
grep -E '(.)\1{3,}' input_file
If you do not need the chains but only the lines where g appears 4 or more times:
grep -E '([^g]*g){4}' input_file
You can generalize to any character repeating 4 times or more by using:
grep -E '(.)(.*\1){3}' input_file
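To see how the four patterns differ, here is a small sketch that counts the matching lines for each on made-up input. It assumes a grep that accepts back-references with -E, as GNU grep does; POSIX does not require that.

```shell
# Made-up input standing in for input_file.
sample=$(mktemp)
printf '%s\n' 'gggg' 'agbgcgdg' 'aaaa' 'abab' 'xaxbxcx' > "$sample"

chain_of_g=$(grep -cE 'g{4,}' "$sample")           # gggg
chain_of_any=$(grep -cE '(.)\1{3,}' "$sample")     # gggg, aaaa
four_g=$(grep -cE '([^g]*g){4}' "$sample")         # gggg, agbgcgdg
four_of_any=$(grep -cE '(.)(.*\1){3}' "$sample")   # every line except abab

echo "$chain_of_g $chain_of_any $four_g $four_of_any"
rm -f "$sample"
```

The line abab matches none of the patterns, since no character occurs four times in it.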