Is \d Not Supported by Grep's Basic Expressions?

Is \d not supported by grep's basic expressions?

As specified by POSIX, grep uses basic regular expressions (BRE) by default, but \d belongs to Perl-compatible regular expression (PCRE) syntax.

If you are using GNU grep, you can use the -P option to enable PCRE syntax. Otherwise, use the POSIX character class [[:digit:]] in place of \d.

echo 1 | grep -P '\d'
# output: 1
echo 1 | grep '[[:digit:]]'
# output: 1
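Both approaches select the same lines. A minimal sketch, assuming a POSIX shell and GNU grep; has_digit is a hypothetical helper that uses -P when this grep build supports it and falls back to [[:digit:]] otherwise:

```shell
# Hypothetical helper: print the input lines that contain a digit.
# A grep built without PCRE rejects -P with an error; that message is
# discarded and the POSIX bracket expression is used instead.
has_digit() {
  if echo 1 | grep -qP '\d' 2>/dev/null; then
    grep -P '\d'
  else
    grep '[[:digit:]]'
  fi
}

printf '%s\n' abc a1c 42 | has_digit
# output:
# a1c
# 42
```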

Why doesn't this regex work for grep?

By default, grep uses basic regular expressions, and \d is PCRE syntax, which they do not support. Use [0-9] or [[:digit:]] instead, or run grep with the -P option.

Why doesn't [0-9]+ work?

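  • In BRE, meta-characters such as + lose their special meaning and must be escaped to regain it (\+ is a GNU extension; the portable POSIX BRE spelling of "one or more" is \{1,\}).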

You can fix this by using one of the following:

grep -v "Packet number [0-9]\+ doesn't match"

OR

grep -v "Packet number [[:digit:]]\+ doesn't match"
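To see the difference, the sketch below (assuming GNU grep; sample and matches are hypothetical names) tests one line against the unescaped BRE pattern, the GNU \+ form, the portable POSIX interval \{1,\}, and the ERE form:

```shell
sample="Packet number 17 doesn't match"

# Hypothetical helper: succeed if the sample matches; any grep options
# (such as -E) are passed through before the pattern.
matches() { printf '%s\n' "$sample" | grep -q "$@"; }

# BRE: + is an ordinary character, so this wants a digit followed by "+".
matches 'number [0-9]+ ' || echo 'BRE, bare plus: no match'
# GNU BRE extension: the escaped plus means "one or more".
matches 'number [0-9]\+ ' && echo 'BRE, escaped plus: match'
# POSIX BRE: the interval expression is the portable spelling.
matches 'number [0-9]\{1,\} ' && echo 'BRE, interval: match'
# ERE: + is special without a backslash.
matches -E 'number [0-9]+ ' && echo 'ERE, bare plus: match'
```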

regular expressions using grep not working (\d+ in particular)

If the number suffixes of interest are of fixed length and all you care about is filtering out the files that have an additional extension, the following glob (NOT a regex, but a wildcard expression) will do:

ifrontThermal64.[0-9][0-9][0-9][0-9][0-9]

E.g.:

printf "%s\n" ifrontThermal64.[0-9][0-9][0-9][0-9][0-9]

Note that globs always match against the entire filename, whereas grep performs substring matching by default.
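A quick way to check the glob, assuming a POSIX shell and hypothetical file names created in a scratch directory: the name with the extra .bak extension is not listed, because the glob has to match the whole name.

```shell
# Create a scratch directory with sample (hypothetical) file names.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch ifrontThermal64.00001 ifrontThermal64.00002 ifrontThermal64.00002.bak

# The glob must match the entire name, so the .bak file is excluded.
printf "%s\n" ifrontThermal64.[0-9][0-9][0-9][0-9][0-9]
# output:
# ifrontThermal64.00001
# ifrontThermal64.00002
```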

As for why your approach didn't work:

  • Your regex isn't quoted, so the shell's parsing 'eats' the \, thereby altering it.
  • Also, whether grep recognizes \d is platform-dependent; to rule out such issues, use [0-9] instead.
  • If you use grep without -E, it uses so-called basic regular expressions, which require that the quantifier + be escaped as \+; while you could do that, the generally better option is to instead use grep -E or to simply invoke grep as egrep in order to use extended regexes, which mostly behave like regular expressions in other languages.
  • A . meant as a literal period should be escaped as \. in a regex (which you did in one of your attempts).
  • The -1 option of ls is implied when ls is not outputting to a terminal.
  • grep uses substring matching by default, so use -x to match against the entire input line (alternatively, use the anchors ^ and $) so as to rule out filenames that match the expression but have an additional extension.
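The quoting point is easy to verify: printf shows exactly what the shell passes on to a command, so it reveals what happens to an unquoted backslash (a sketch for any POSIX shell):

```shell
# Unquoted: the shell removes the backslash before the command runs.
printf '%s\n' \d+
# output: d+

# Single-quoted: the backslash reaches the command (e.g. grep) intact.
printf '%s\n' '\d+'
# output: \d+
```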

Thus, a corrected version of the original command is:

 ls | egrep -x 'ifrontThermal64\.[0-9]+'
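The effect of -x can be checked without touching the file system by feeding hypothetical sample names to grep directly. This sketch uses grep -E, which is equivalent to egrep (recent GNU grep releases warn that egrep is obsolescent):

```shell
# Sample names standing in for ls output (hypothetical).
names='ifrontThermal64.123
ifrontThermal64.45678
ifrontThermal64.45678.bak
ifrontThermal64.x1'

# -x: the whole line must match, so the .bak name is rejected.
printf '%s\n' "$names" | grep -Ex 'ifrontThermal64\.[0-9]+'
# output:
# ifrontThermal64.123
# ifrontThermal64.45678

# Without -x, substring matching lets the .bak name through as well.
printf '%s\n' "$names" | grep -E 'ifrontThermal64\.[0-9]+'
```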

As an aside: there's no point in enclosing your commands in parentheses; you'll needlessly create subshells (unless they're optimized away, but the point is that they're not needed).

Issue using RegEx with Linux find

Note: The following assumes you're using GNU find, which since you mention Linux, is a safe bet.

The default regular expression syntax does not understand \d (use [0-9] or [[:digit:]] instead). Alternation is written \|, and repetition ranges appear not to be supported (they're not documented). POSIX basic regular expression syntax also doesn't understand \d or alternation (though some GNU implementations support \| as an extension), and it requires groups, repetition ranges, and several other constructs to be escaped. None of the supported flavors supports non-capturing groups ((?:...)).

Since your alternating group tests for either two or three digits, it can be turned into a single range when using one of the RE flavors that supports them.

So, something like:

find /path/to/files -regextype posix-extended -type f ! -regex ".*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv"

is probably the cleanest approach.
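A sketch to try the command, assuming GNU find (for -regextype) and hypothetical episode file names in a scratch directory. Because -regex is matched against the whole path, the pattern starts with .*; only the file without an SxxEyy tag is printed.

```shell
dir=$(mktemp -d)
# Hypothetical file names: two tagged episodes and one untagged special.
touch "$dir/Show - S01E02 - Pilot.mkv" \
      "$dir/Show - S01E123 - Finale.mkv" \
      "$dir/Show - Special.mkv"

# ! inverts the test: list files that do NOT match the episode pattern.
find "$dir" -regextype posix-extended -type f ! -regex ".*- S[0-9]{2}E[0-9]{2,3} -.*\.mkv"
# output: <scratch dir>/Show - Special.mkv
```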

Modify a Python regular expression to work in grep

GNU grep can use PCRE as well, with -P:

grep -P '(?:_|\.)S\d{1,}(?:\.|_)'

or, more portably, with Perl itself:

perl -lne 'print $& if /(?:_|\.)S\d{1,}(?:\.|_)/' 
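If neither -P nor Perl is available, the pattern can also be rewritten for POSIX ERE: the non-capturing alternation (?:_|\.) becomes the bracket expression [_.], and \d{1,} becomes [0-9]{1,}. A sketch with a made-up input string; -o is supported by GNU and BSD grep but is not required by POSIX.

```shell
# [_.] replaces (?:_|\.) and [._] replaces (?:\.|_); no PCRE needed.
echo 'show.S03.720p_S1_x' | grep -oE '[_.]S[0-9]{1,}[._]'
# output:
# .S03.
# _S1_
```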

Pattern matching digits does not work in egrep?

egrep doesn't recognize the \d shorthand for the digit character class, so you need to use [0-9] (or [[:digit:]]) instead.

Moreover, while it's not absolutely necessary in this case, it's good habit to quote the regex to prevent misinterpretation by the shell. Thus, something like this should work:

egrep '[0-9]{7}-[0-9]{10}' file
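A quick check with made-up input lines; grep -E behaves the same as egrep here:

```shell
# Only the first line has 7 digits, a hyphen, then 10 digits.
printf '%s\n' 'ref 1234567-0123456789' 'ref 123-456' |
  grep -E '[0-9]{7}-[0-9]{10}'
# output: ref 1234567-0123456789
```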

See also

  • egrep mini tutorial

References

  • regular-expressions.info/Flavor comparison

    • Flavor note for GNU grep, ed, sed, egrep, awk, emacs
      • Lists the differences between grep vs egrep vs other regex flavors

What are the differences between GNU grep's basic/extended and PCRE (`-P`) regular expressions?

My research into the major syntax and functionality differences, based on http://www.greenend.org.uk/rjk/tech/regexp.html:

  • In GNU grep, . does not match null bytes or newlines (but does match newlines when used with --null-data), while in Perl, . matches everything except \n.
  • [...] in GNU grep denotes a POSIX bracket expression, while Perl uses character classes. One practical difference: inside a POSIX bracket expression a backslash is an ordinary character, whereas inside a Perl character class it starts an escape. See http://www.greenend.org.uk/rjk/tech/regexp.html#bracketexpression
  • "In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \)." From https://www.gnu.org/software/grep/manual/html_node/Basic-vs-Extended.html. In extended regular expressions (grep -E), these meta-characters are special without a backslash, as in PCRE.
  • In GNU grep, \w and \W are synonyms for [_[:alnum:]] and [^_[:alnum:]], so, as in Perl, the underscore counts as a word character.
  • GNU grep has \< and \> for start and end of word.
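The escaping rule and the word-related escapes can be compared side by side. A sketch assuming GNU grep (\w, \< and \> are GNU extensions, and -o is not POSIX) and a made-up input line:

```shell
s='abc7 cat concat'

# BRE (GNU): the + quantifier must be escaped.
echo "$s" | grep -o '\w\+[0-9]'
# output: abc7

# ERE: the same pattern with a bare +.
echo "$s" | grep -oE '\w+[0-9]'
# output: abc7

# \< and \> anchor at word boundaries, so the "cat" inside "concat" is skipped.
echo "$s" | grep -o '\<cat\>'
# output: cat
```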

Perl supports much more additional functionality:

  • "nongreedy {}" with syntax re{...}?
  • additional anchors and character types \A, \C, \d, \D, \G, \p, \P, \s, \S, \X, \Z, \z (although GNU grep also accepts \s and \S, as synonyms for [[:space:]] and [^[:space:]])
  • (?#comment)
  • shy grouping (?:re), shy grouping + modifiers (?modifiers:re)
  • lookahead and negative lookahead (?=re) and (?!re), lookbehind and negative lookbehind (?<=p) and (?<!p)
  • Atomic groups (?>re)
  • Conditional expression (?(cond)re)
  • ... and more, see man pcresyntax
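Two of these Perl-only features, a lookahead and a non-greedy quantifier, can be tried through grep -P. A sketch assuming GNU grep built with PCRE; the input strings are made up, and the script reports instead of failing when -P is unavailable.

```shell
if echo | grep -qP '' 2>/dev/null; then
  # Lookahead (?=...): digits followed by USD; the USD itself is not printed.
  echo 'price: 100USD 200EUR' | grep -oP '\d+(?=USD)'
  # output: 100

  # Non-greedy *?: each match stops at the first closing quote.
  echo 'a "x" b "y"' | grep -oP '".*?"'
  # output:
  # "x"
  # "y"
else
  echo 'this grep was built without -P support' >&2
fi
```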

Why doesn't `\d` work in regular expressions in sed?

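In GNU sed, \d is not a digit shorthand: \dNNN produces (or matches) the character whose decimal value is NNN. If you want a predefined class instead of the [0-9] expression, use [[:digit:]], keeping in mind that + must be escaped in a basic regular expression (\+ is a GNU extension):

s/[[:digit:]]\+//g

With extended regular expressions (sed -E), the plain + works:

s/[[:digit:]]+//g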
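Whether + acts as a quantifier depends on the regex flavor sed is using. A sketch with a made-up input, assuming GNU sed (the \+ and \dNNN forms are GNU extensions; sed -E is also accepted by BSD sed):

```shell
s='abc123def45'

# BRE: + is literal, so nothing matches and the line is unchanged.
echo "$s" | sed 's/[[:digit:]]+//g'
# output: abc123def45

# GNU BRE: the escaped + means "one or more".
echo "$s" | sed 's/[[:digit:]]\+//g'
# output: abcdef

# ERE: the bare + works.
echo "$s" | sed -E 's/[[:digit:]]+//g'
# output: abcdef

# GNU sed: \d65 is the character with decimal code 65, not a digit class.
echo x | sed 's/x/\d65/'
# output: A
```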

Grep with complex regular expressions

Use Extended Regular Expressions with Grep

Standard grep uses basic regular expressions, which don't understand unescaped repetition ranges such as {1,3} and require other special characters to be escaped as well. Extended regular expressions handle these atoms and operators directly, so use egrep, grep -E, or pcregrep, depending on what's available on your particular system.

$ echo 'server 172.31.21.45 max_fails=3 fail_timeout=30s;' |
egrep 'server ([0-9]{1,3}\.){3}[0-9]{1,3}'
server 172.31.21.45 max_fails=3 fail_timeout=30s;

Use the PCRE Library

Note that GNU grep (at least through v2.20) doesn't support some of the atoms you are using. In particular, non-capturing groups with ?: are not supported without the Perl-compatible regular expression (PCRE) library, which many Linux distributions do not compile into GNU grep by default.

To see if you have PCRE support, try ldd $(which grep) | fgrep -i pcre to see if the PCRE library is linked in. If it is, you may just need to add the -P or --perl-regexp flags to enable it for your expressions.
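An alternative check, sketched for any POSIX shell, is simply to try -P: a grep without PCRE support rejects the option with an error and a non-zero exit status. The messages printed here are illustrative.

```shell
# Try a trivial -P search; discard the error a non-PCRE build prints.
if echo x | grep -qP 'x' 2>/dev/null; then
  echo 'grep -P is available'
else
  echo 'grep -P is not available'
fi
```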

If you do not have PCRE compiled in, then either capture the group:

$ echo 'server 172.31.21.45 max_fails=3 fail_timeout=30s;' |
egrep 'server ([0-9]{1,3}\.){3}[0-9]{1,3}'
server 172.31.21.45 max_fails=3 fail_timeout=30s;

or install and use pcregrep instead:

$ echo 'server 172.31.21.45 max_fails=3 fail_timeout=30s;' |
pcregrep 'server (?:[0-9]{1,3}\.){3}[0-9]{1,3}'
server 172.31.21.45 max_fails=3 fail_timeout=30s;

which certainly does support non-capturing groups.

Extract version from string cross platform

grep -ow '[0-9][0-9.]\+[0-9]'

That uses only a basic regular expression and options that BSD grep and GNU grep share.
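For example, with a made-up version string (the pattern needs at least three characters, so the lone build number is not matched):

```shell
# -o prints only the match; -w requires it to stand as a whole word.
echo 'myapp version 2.10.3 (build 7)' | grep -ow '[0-9][0-9.]\+[0-9]'
# output: 2.10.3
```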


