Is \d not supported by grep's basic expressions?
As specified in POSIX, grep uses basic regular expressions, but \d is part of a Perl-compatible regular expression (PCRE). If you are using GNU grep, you can use the -P option to allow use of PCRE regular expressions. Otherwise you can use the POSIX-specified [[:digit:]] character class in place of \d.
echo 1 | grep -P '\d'
# output: 1
echo 1 | grep '[[:digit:]]'
# output: 1
Why doesn't this regex work for grep?
By default, grep uses basic regular expressions, and \d is PCRE syntax. It is not supported, so you'll need to use [0-9] or [[:digit:]] instead, or use grep with the -P option.
Why doesn't [0-9]+ work?
- In BRE, meta-characters like + lose their meaning and need to be escaped.
You can fix this by using one of the following:
grep -v "Packet number [0-9]\+ doesn't match"
OR
grep -v "Packet number [[:digit:]]\+ doesn't match"
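As a rough comparison of the two spellings, the sketch below runs the same filter once as a basic and once as an extended regular expression. The sample lines and the function names are made up for illustration; only the message format comes from the commands above.

```shell
#!/bin/sh
# Hypothetical sample input for the demo.
sample_lines() {
  printf '%s\n' \
    "Packet number 42 doesn't match" \
    "Packet number 7 received" \
    "Packet number  doesn't match"
}

# BRE: the quantifier has to be written as \+ (a GNU extension to BRE).
filter_bre() { grep -v "Packet number [0-9]\+ doesn't match"; }

# ERE: + is a quantifier without the backslash.
filter_ere() { grep -E -v "Packet number [0-9]+ doesn't match"; }

sample_lines | filter_bre
sample_lines | filter_ere
```

Both filters should drop only the line with a number in it and keep the other two.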
regular expressions using grep not working (\d+ in particular)
If the number suffixes of interest are of fixed length and all you care about is filtering out the files that have an additional extension, the following glob (NOT a regex, but a wildcard expression) will do:
ifrontThermal64.[0-9][0-9][0-9][0-9][0-9]
E.g.:
printf "%s\n" ifrontThermal64.[0-9][0-9][0-9][0-9][0-9]
Note that globs always match against the entire filename, whereas grep performs substring matching by default.
As for why your approach didn't work:
- Your regex isn't quoted, so the shell's parsing 'eats' the \, thereby altering it.
- Also, whether grep recognizes \d is platform-dependent; to rule out such issues, use [0-9] instead.
- If you use grep without -E, it uses so-called basic regular expressions, which require that the quantifier + be escaped as \+; while you could do that, the generally better option is to instead use grep -E or to simply invoke grep as egrep in order to use extended regexes, which mostly behave like regular expressions in other languages.
- ., when intended to be a literal, should be \-escaped in a regex (which you did in one of your attempts).
- The -1 option of ls is implied when ls is not outputting to a terminal.
- grep uses substring matching by default, so use -x to match against the entire input line (alternatively, use the anchors ^ and $) so as to rule out filenames that match the expression but have an additional extension.
Thus, a corrected version of the original command is:
ls | egrep -x 'ifrontThermal64\.[0-9]+'
As an aside: there's no point in enclosing your commands in parentheses; you'll needlessly create subshells (unless they're optimized away, but the point is that they're not needed).
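To try both approaches without touching real data, here is a small sketch that builds a throwaway directory and compares the glob with the grep pipeline. The file names (including the .bak one) and the helper function names are invented for the demo; grep -E is used in place of egrep, which should behave the same.

```shell
#!/bin/sh
# Creates a temporary directory with made-up file names and prints its path.
make_demo_dir() {
  dir=$(mktemp -d) || return 1
  touch "$dir/ifrontThermal64.00010" \
        "$dir/ifrontThermal64.00020" \
        "$dir/ifrontThermal64.00010.bak"
  printf '%s\n' "$dir"
}

# Glob: matched against the whole name, so the .bak file is skipped.
glob_matches() {
  (cd "$1" && printf '%s\n' ifrontThermal64.[0-9][0-9][0-9][0-9][0-9])
}

# grep: -E enables the unescaped +, -x anchors the match to the whole line.
grep_matches() {
  (cd "$1" && ls | grep -E -x 'ifrontThermal64\.[0-9]+')
}

demo=$(make_demo_dir)
glob_matches "$demo"
grep_matches "$demo"
rm -rf "$demo"
```

Both functions should list the two five-digit files and leave out the one with the extra extension.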
Issue using RegEx with Linux find
Note: The following assumes you're using GNU find, which, since you mention Linux, is a safe bet.
The default regular expression syntax does not understand \d (instead you'd use [0-9] or [[:digit:]]). Alternation is \|. I don't think it supports repetition ranges; they're not documented. POSIX Basic Regular Expression syntax also doesn't understand \d or alternation (though some GNU implementations support it as an extension using \|), and requires many other things like groups and repetition ranges to be escaped. And none of the supported flavors supports non-capturing grouping ((?:...)).
Since your alternating group tests for either two or three digits, it can be turned into a single range when using one of the RE flavors that supports them.
So, something like:
find /path/to/files -regextype posix-extended -type f ! -regex ".*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv"
is probably the cleanest approach.
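A quick way to check the expression is to run it against a scratch directory. The sketch below assumes GNU find (for -regextype); the file names and function names are hypothetical and only follow the "- SxxEyy -" naming the pattern expects.

```shell
#!/bin/sh
# Creates a temporary directory with made-up episode files and prints its path.
make_media_dir() {
  dir=$(mktemp -d) || return 1
  touch "$dir/Show - S01E02 - Pilot.mkv" \
        "$dir/Show - S01E123 - Long.mkv" \
        "$dir/Show - Extras.mkv"
  printf '%s\n' "$dir"
}

# Lists regular files whose full path does NOT match the episode pattern.
# Note that -regex is matched against the whole path, hence the leading .*
list_unmatched() {
  find "$1" -regextype posix-extended -type f \
    ! -regex ".*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv"
}

media=$(make_media_dir)
list_unmatched "$media"
rm -rf "$media"
```

With these sample files, only the "Extras" file should be printed, since it has no season/episode marker.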
Modify a Python regular expression to work in grep
grep can use PCRE as well:
grep -P '(?:_|\.)S\d{1,}(?:\.|_)'
or, more portably:
perl -lne 'print $& if /(?:_|\.)S\d{1,}(?:\.|_)/'
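For illustration, the perl variant can be wrapped in a function and fed a few names. The input file names and the function name are made up; the pattern is the one shown above.

```shell
#!/bin/sh
# Prints the first "_S<digits>." style tag found on each input line, if any.
extract_season_tag() {
  perl -lne 'print $& if /(?:_|\.)S\d{1,}(?:\.|_)/'
}

printf '%s\n' 'show_S01.mkv' 'show.S12_extra.mkv' 'no_tag_here.mkv' |
  extract_season_tag
```

Because $& holds only the matched text, the output is the tag together with its surrounding separators, not the whole line.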
Pattern matching digits does not work in egrep?
egrep doesn't recognize the \d shorthand for the digit character class, so you need to use e.g. [0-9].
Moreover, while it's not absolutely necessary in this case, it's a good habit to quote the regex to prevent misinterpretation by the shell. Thus, something like this should work:
egrep '[0-9]{7}-[0-9]{10}' file
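A small check of that pattern, reading from standard input instead of a file. The sample lines and function names are invented; grep -E is used here as the equivalent of egrep.

```shell
#!/bin/sh
# Made-up sample lines: only the first has 7 digits, a dash, then 10 digits.
sample_ids() {
  printf '%s\n' \
    'id 1234567-0123456789 ok' \
    'id 123456-0123456789 short' \
    'id abcdefg-0123456789'
}

# Quoted so the shell passes the braces and brackets through untouched.
match_ids() { grep -E '[0-9]{7}-[0-9]{10}' "$@"; }

sample_ids | match_ids
```

Keep in mind that this is substring matching, so a longer run of digits on either side would also be accepted unless the pattern is anchored or bounded.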
What are the differences between GNU grep's basic/extended and PCRE (`-P`) regular expressions?
My research of the major syntax and functionality differences from http://www.greenend.org.uk/rjk/tech/regexp.html:
- . in GNU grep does not match null bytes and newlines (but does match newlines when used with --null-data), while in Perl everything except \n is matched.
- [...] in GNU grep defines POSIX bracket expressions, while Perl uses "character" classes. I'm not sure on the details. See http://www.greenend.org.uk/rjk/tech/regexp.html#bracketexpression
- "In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \)." From https://www.gnu.org/software/grep/manual/html_node/Basic-vs-Extended.html. In ERE these characters are special without the backslash, as in PCRE.
- GNU grep's \w and \W are synonyms for [_[:alnum:]] and [^_[:alnum:]], so the underscore counts as a word character, as it does in Perl.
- GNU grep has \< and \> for start and end of word.
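Some of these points can be checked directly. The sketch below assumes GNU grep (\|, \+, \w, \< and \> are GNU extensions in basic regular expressions); the function names and sample strings are made up.

```shell
#!/bin/sh
# The same pattern in BRE (backslashed operators) and ERE (bare operators).
# Each function prints the number of matching input lines.
bre_count() { grep -c 'a\(b\|c\)\+d'; }
ere_count() { grep -E -c 'a(b|c)+d'; }

# Counts input lines that contain at least one \w match.
word_char_count() { grep -c '\w'; }

# Prints each occurrence of "cat" that stands as a whole word.
whole_word_cat() { grep -o '\<cat\>'; }

printf '%s\n' 'abcbd' 'ad' | bre_count
printf '%s\n' 'abcbd' 'ad' | ere_count
printf '%s\n' '_' | word_char_count
printf '%s\n' 'cat concat' | whole_word_cat
```

The first two counts should agree, the third shows whether a lone underscore is accepted by \w, and the last should print "cat" once because the "cat" inside "concat" does not start at a word boundary.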
Perl supports much more additional functionality:
- "nongreedy {}" with syntax re{...}?
- additional anchors and character types \A, \C, \d, \D, \G, \p, \P, \s, \S, \X, \Z, \z
- comments (?#comment)
- shy grouping (?:re), shy grouping + modifiers (?modifiers:re)
- lookahead and negative lookahead (?=re) and (?!re), lookbehind and negative lookbehind (?<=p) and (?<!p)
- Atomic groups (?>re)
- Conditional expression (?(cond)re)
- ... and more, see man pcresyntax
Why doesn't `\d` work in regular expressions in sed?
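\d is not a digit shorthand in sed; GNU sed reads \dNNN as the character with decimal code NNN. If you want a predefined class instead of [0-9], use [[:digit:]]. Note that + is a quantifier only in extended regular expressions, so enable them with -E:
sed -E 's/[[:digit:]]+//g'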
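A minimal way to try the substitution, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do). The function name and input string are made up.

```shell
#!/bin/sh
# Removes every run of digits from each input line.
strip_digits() { sed -E 's/[[:digit:]]+//g'; }

echo 'abc123def456' | strip_digits
# prints: abcdef
```

Without -E, the + would be taken literally in a basic regular expression, so nothing would be removed from this input.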
Grep with complex regular expressions
Use Extended Regular Expressions with Grep
Standard grep uses basic regular expressions, in which intervals such as {1,3} and groups must be backslash-escaped to act as operators. Extended regular expressions treat these atoms and operators as special without escaping, so use egrep, grep -E, or pcregrep depending on what's available on your particular system.
$ echo 'server 172.31.21.45 max_fails=3 fail_timeout=30s;' |
egrep 'server (?:[0-9]{1,3}\.){3}[0-9]{1,3}'
server 172.31.21.45 max_fails=3 fail_timeout=30s;
Use the PCRE Library
Note that GNU grep (at least through v2.20) doesn't support some of the atoms you are using. In particular, non-capturing groups with ?: are not supported without the Perl-compatible regular expression (PCRE) library, which many Linux distributions do not compile into GNU grep by default.
To check whether you have PCRE support, run ldd $(which grep) | fgrep -i pcre to see if the PCRE library is linked in. If it is, you may just need to add the -P or --perl-regexp flag to enable it for your expressions.
If you do not have PCRE compiled in, then either capture the group:
$ echo 'server 172.31.21.45 max_fails=3 fail_timeout=30s;' |
egrep 'server ([0-9]{1,3}\.){3}[0-9]{1,3}'
server 172.31.21.45 max_fails=3 fail_timeout=30s;
or install and use pcregrep instead:
$ echo 'server 172.31.21.45 max_fails=3 fail_timeout=30s;' |
pcregrep 'server (?:[0-9]{1,3}\.){3}[0-9]{1,3}'
server 172.31.21.45 max_fails=3 fail_timeout=30s;
which certainly does support non-capturing groups.
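If only the address is needed rather than the whole line, the capturing-group form can be combined with -o. This is a sketch under the assumption that your grep supports -o (GNU and BSD grep do); the function name is made up, and the pattern does not check that each octet is in the 0-255 range.

```shell
#!/bin/sh
# Prints every dotted-quad looking substring, one per line.
extract_ipv4() { grep -E -o '([0-9]{1,3}\.){3}[0-9]{1,3}'; }

echo 'server 172.31.21.45 max_fails=3 fail_timeout=30s;' | extract_ipv4
# prints: 172.31.21.45
```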
Extract version from string cross platform
grep -ow '[0-9][0-9.]\+[0-9]'
That uses only a basic regular expression, and options that BSD grep and GNU grep share.
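As a usage sketch, the command can be wrapped in a function and given a version banner. The banner text and function name are hypothetical.

```shell
#!/bin/sh
# -o prints only the matched text; -w requires the match to be a whole word.
extract_version() { grep -ow '[0-9][0-9.]\+[0-9]'; }

echo 'mytool version 1.2.3 (build 7)' | extract_version
# prints: 1.2.3
```

The pattern needs at least three characters (a digit, one or more digits or dots, a digit), which is why the lone build number 7 is not reported.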