Why isn't the regular expression's non-capturing group working?
group() and group(0) will return the entire match. Subsequent groups are actual capture groups.
>>> import re
>>> string1 = "aaa_bbb"  # assumed input
>>> print(re.match(r"(?:aaa)(_bbb)", string1).group(0))
aaa_bbb
>>> print(re.match(r"(?:aaa)(_bbb)", string1).group(1))
_bbb
>>> print(re.match(r"(?:aaa)(_bbb)", string1).group(2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group
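To see how the numbering works with more than one capture group, here is a small sketch. The two-group pattern and the input string are assumptions for illustration, not taken from the question.

```python
import re

# Hypothetical input with two parts we want to capture.
text = "aaa_bbb_ccc"
m = re.match(r"(?:aaa)(_bbb)(_ccc)", text)

print(m.group(0))  # aaa_bbb_ccc  (the whole match still includes "aaa")
print(m.group(1))  # _bbb         (the non-capturing group gets no number)
print(m.group(2))  # _ccc
print(m.groups())  # ('_bbb', '_ccc')  (capture groups only)
```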
If you want what you may have expected group() to return, that is, only the captured parts (joined with spaces):
" ".join(re.match(r"(?:aaa)(_bbb)", string1).groups())
non-capture group still showing in match
The entire match is always group 0. You need to access the specific capture group instead (group 1 in this case, since the first group is non-capturing). You can do it like this:
var str = "<p model='cat'></p>";
var regex = /(?:model=')(.*)(?:')/g;
var match = regex.exec(str);
alert(match[1]); // cat
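The same idea carries over to Python's re module, sketched here on the same input: group 0 is the whole match and group 1 is the capture. The second pattern, with the (?:...) wrappers around plain literals dropped, is an illustrative simplification that yields the same group 1.

```python
import re

s = "<p model='cat'></p>"

# Group 0 is the whole match; group 1 is the only capture group.
m = re.search(r"(?:model=')(.*)(?:')", s)
print(m.group(0))  # model='cat'
print(m.group(1))  # cat

# Non-capturing groups around plain literals add nothing; this gives the same group 1.
m2 = re.search(r"model='(.*)'", s)
print(m2.group(1))  # cat
```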
When do we need non-capturing groups?
Non-capturing groups help you avoid storing unwanted data in capturing groups.
For instance, suppose your string looks like this:
abc and bcd
def or cef
Here you want to capture the data in the first and third columns, which are separated by "and" or "or", so you write the regex as follows:
(\w+)\s+(and|or)\s+(\w+)
Here $1 contains the first column (abc, def) and $3 contains the third column (bcd, cef), but the unnecessary data ("and", "or") is stored in $2. Since you don't want to store that data, use a non-capturing group:
(\w+)\s+(?:and|or)\s+(\w+)
Here $1 contains abc and def, and $2 contains bcd and cef.
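A quick Python check of both patterns on the two-line example string, using re.findall, which returns one tuple of capture groups per match (tuple position 0 corresponds to $1, and so on):

```python
import re

text = "abc and bcd\ndef or cef"

# With (and|or) capturing, the separator ends up in the results.
capturing = re.findall(r"(\w+)\s+(and|or)\s+(\w+)", text)
# With (?:and|or), only the two columns are captured.
non_capturing = re.findall(r"(\w+)\s+(?:and|or)\s+(\w+)", text)

print(capturing)      # [('abc', 'and', 'bcd'), ('def', 'or', 'cef')]
print(non_capturing)  # [('abc', 'bcd'), ('def', 'cef')]
```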
You can also still capture exactly the data you need from inside a non-capturing group. For example, with
(?:don't (want))
$1 contains want.
Non-capturing groups also let you group an alternation (|) without capturing it. For example:
(?:don't(want)|some(what))
Here $1 contains want and $2 contains what, depending on which alternative matched.
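In Python, the group belonging to the alternative that did not take part in the match is reported as None. A short sketch with two hypothetical inputs:

```python
import re

pattern = r"(?:don't(want)|some(what))"

m1 = re.search(pattern, "don'twant")  # first alternative matches
m2 = re.search(pattern, "somewhat")   # second alternative matches

print(m1.groups())  # ('want', None)
print(m2.groups())  # (None, 'what')
```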
Regex/Python - why is non capturing group captured in this case?
You need to use a look-behind instead of a non-capturing group if you want to check a substring for presence/absence, but exclude it from the match:
import re
s = "Monday, Tuesday, Wednesday, Thursday, Friday, Saturday:"
print(re.sub(r"[\r\n\t]|(?<!\d):",'',s))
# (?<!\d) is the negative lookbehind
# Result: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday
Here, (?<!\d) only checks that the character before a colon is not a digit.
Also, alternation involves additional overhead, so the character class [\r\n\t] is preferable, and you do not need any capturing groups (round brackets) since you are not using them at all.
Also, please note that the regex is initialized with a raw string literal to avoid over-escaping.
Some more details from Python Regular Expression Syntax regarding non-capturing groups and negative look-behinds:
(?<!...) - Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length and shouldn’t contain group references. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
(?:...) - A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
As look-behinds are zero-width assertions (expressions that return true or false without advancing the position in the string), they are exactly what you need in this case, where you want to check but not match. A non-capturing group will consume part of the string and thus will be part of the match.
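The difference between consuming and only checking is easy to see on a hypothetical input that contains a time: a non-capturing group (?:\D) eats the character before the colon, while the lookbehind leaves it in place.

```python
import re

s = "Monday 10:30, Friday:"  # hypothetical input with a time and a trailing colon

# Non-capturing group: (?:\D) consumes the preceding char, so "y" is removed too.
consumed = re.sub(r"(?:\D):", "", s)
# Negative lookbehind: (?<!\d) only checks the preceding char; nothing extra is removed.
checked = re.sub(r"(?<!\d):", "", s)

print(consumed)  # Monday 10:30, Frida
print(checked)   # Monday 10:30, Friday
```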
What does a non-capturing group inside a lookahead do?
It does the same thing it does outside of a lookahead.
Consider the following regex:
(\d+)(?=(b|c))
applied to the string '123c'.
For example, in Python:
import re
m = re.search(r'(\d+)(?=(b|c))', '123c')
print(m.group(1), m.group(2))
Prints:
123 c
But with ...
(\d+)(?=(?:b|c))
... there is only capture group 1.
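Comparing the two patterns side by side in Python shows both effects: the lookahead content never enters the overall match, and with (?:b|c) the second group simply disappears from groups().

```python
import re

m = re.search(r"(\d+)(?=(b|c))", "123c")    # capturing group inside the lookahead
n = re.search(r"(\d+)(?=(?:b|c))", "123c")  # non-capturing group inside the lookahead

print(m.group(0), m.groups())  # 123 ('123', 'c')
print(n.group(0), n.groups())  # 123 ('123',)
```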
Regex including what is supposed to be non-capturing group in result
A (?:...) is a non-capturing group that matches and still consumes the text. This means the text matched by this group is still added to the overall match value.
In general, if you want to match something but not consume it, you need to use lookarounds. So, if you need to match something that is followed by a specific string, use a positive lookahead, the (?=...) construct:
some_pattern(?=specific string) // if specific string comes immediately after pattern
some_pattern(?=.*specific string) // if specific string comes anywhere after pattern
If you need to require some specific text before the pattern but exclude it from the match, use a positive lookbehind:
(?<=specific string)some_pattern // if specific string comes immediately before pattern
(?<=specific string.*?)some_pattern // if specific string comes anywhere before pattern
Note that .*? or .* (that is, patterns with *, +, ?, {2,} or even {1,3} quantifiers) in lookbehind patterns are not always supported by regex engines; however, the C# .NET regex engine supports them. They are also supported by the Python PyPI regex module, Vim, JGSoft software and, now, by ECMAScript 2018 compliant JavaScript environments.
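A small Python sketch of both lookarounds on a hypothetical input. It also shows the limitation mentioned above: Python's built-in re module accepts only fixed-width lookbehinds, so a variable-width one such as (?<=price.*?) fails to compile there.

```python
import re

text = "price: 42 USD"  # hypothetical input

# Positive lookahead: " USD" must follow, but it is not part of the match.
ahead = re.search(r"\d+(?= USD)", text).group()
# Positive lookbehind: "price: " must precede, but it is not part of the match.
behind = re.search(r"(?<=price: )\d+", text).group()
print(ahead, behind)  # 42 42

# Python's built-in re rejects variable-width lookbehinds at compile time.
try:
    re.compile(r"(?<=price.*?)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```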
In this case, you can capture the part you need and just match the surrounding context without capturing it:
using System;
using System.Text.RegularExpressions;

var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = string.Empty;
var m = Regex.Match(testEcl, @"([^\\]+)\.exe", RegexOptions.IgnoreCase);
if (m.Success)
{
    asmName = m.Groups[1].Value; // capturing group 1: the name without ".exe"
}
Console.WriteLine(asmName);
Details
([^\\]+) - Capturing group 1: one or more chars other than \
\. - a literal dot
exe - a literal exe substring.
Since we are only interested in the contents of capturing group #1, we grab m.Groups[1].Value, and not the whole m.Value (that contains .exe).
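For comparison, the same "capture what you need, match the context" approach in Python, sketched on the same command line (written as a raw string so the backslashes stay literal):

```python
import re

test_ecl = r'"D:\src\repos\myprj\bin\Debug\MyApp.exe" /?'

# ([^\\]+) captures the file name; \.exe is matched as context but not captured.
m = re.search(r"([^\\]+)\.exe", test_ecl, re.IGNORECASE)
asm_name = m.group(1) if m else ""

print(asm_name)    # MyApp
print(m.group(0))  # MyApp.exe  (the overall match still includes ".exe")
```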
Non-capturing group gets displayed in C#
Your code is ignoring the capturing groups. Iterate over m.Groups to see each of them:
using System;
using System.Text.RegularExpressions;

string line = @"DCS120170517220207-FIC-023.FLW 07-FIC-023 00060Y000000011.266525G";
string patDate = @"(?:^.{4})([2-9][0-9]{3}[0-1][0-9][0-3][0-9])";
Match m = Regex.Match(line, patDate);
foreach (Group g in m.Groups)
{
    Console.WriteLine($"{g.Index}: {g.Value}");
}
m.Value is group zero: the entire match, irrespective of groupings. Since you wisely marked the first group as non-capturing, group 1 is the date.
I suggest naming your capturing groups, for ease of maintenance:
string line = @"DCS120170517220207-FIC-023.FLW 07-FIC-023 00060Y000000011.266525G";
string patDate = @"(?:^.{4})(?<date>[2-9][0-9]{3}[0-1][0-9][0-3][0-9])";
Match m = Regex.Match(line, patDate);
var date = m.Groups["date"].Value;
Update
Wiktor Stribiżew observes that the non-capturing group is otiose. The following pattern will behave identically to your original pattern. The first capturing group is still m.Groups[1], however, because m.Groups[0] is always the entire match, irrespective of groups.
string patDate = @"^.{4}(?<date>[2-9][0-9]{3}[0-1][0-9][0-3][0-9])";
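The same pattern works in Python, where the re module spells a named group (?P<name>...). This sketch shows that group 0 is still the whole match and that a named group is also reachable by its number.

```python
import re

line = "DCS120170517220207-FIC-023.FLW 07-FIC-023 00060Y000000011.266525G"

# ^.{4} is matched but not grouped; the date is a named capturing group.
m = re.match(r"^.{4}(?P<date>[2-9][0-9]{3}[0-1][0-9][0-3][0-9])", line)

print(m.group(0))       # DCS120170517  (group 0: the entire match)
print(m.group("date"))  # 20170517
print(m.group(1))       # 20170517      (named groups are numbered too)
```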
How to use regex non-capturing groups format in Python
It isn't included in the inner group, but it's still included as part of the outer group. A non-capturing group doesn't mean its text isn't captured at all, just that the group itself is not saved as a separate entry in the output. Its text is still captured as part of any enclosing groups.
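A minimal Python illustration with a hypothetical CSS-like value: the unit alternation (?:px|em) gets no group of its own, yet its text is part of the enclosing capturing group.

```python
import re

# (?:px|em) is non-capturing, but it sits inside capturing group 1.
m = re.match(r"(\d+(?:px|em))", "12px")

print(m.groups())  # ('12px',)  (only one group, and it includes "px")
print(m.group(1))  # 12px
```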
Just do not put the parts you want to exclude into the () that define the capture:
import pandas as pd
df = pd.DataFrame(
{'a' : [1,2,3,4],
'b' : ['41u -428u', '31u - 68u', '11u - 58u', '21u - 318u']
})
df['b'].str.extract(r'- ?(\d+)u', expand=True)
0
0 428
1 68
2 58
3 318
That way you match any digits that have a '-' in front (maybe followed by a space) and a 'u' behind, but capture only the digits.
Where:
-      # literal hyphen
\s?    # optional whitespace (the pattern above uses a literal space, " ?"); use \s* if you expect more than one
(\d+)  # capture one or more digits
u      # literal "u"
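The commented breakdown above can be written as a real pattern with re.VERBOSE, which ignores whitespace and # comments inside the pattern. This sketch uses plain re on the sample column values rather than pandas, and \s? in place of the literal space.

```python
import re

pattern = re.compile(r"""
    -       # literal hyphen
    \s?     # optional whitespace
    (\d+)   # capture one or more digits
    u       # literal "u"
""", re.VERBOSE)

values = ['41u -428u', '31u - 68u', '11u - 58u', '21u - 318u']
numbers = [pattern.search(v).group(1) for v in values]
print(numbers)  # ['428', '68', '58', '318']
```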
Why aren't these non-capturing regex groups working right?
A repeated capture group overwrites its previous match on each repetition. Capture group #1 first matches "1px", then matches "solid", overwriting "1px", then matches "rgb(255, 255, 255)", overwriting "solid", etc.
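A small Python sketch of that behavior on a simplified, hypothetical input (without the rgb(...) part): the quantified group keeps only its last repetition, while re.findall collects every piece.

```python
import re

css = "1px solid white"  # simplified, hypothetical input

# A quantified group keeps only the text from its last repetition.
m = re.match(r"(?:(\w+)\s?)+", css)
print(m.group(1))  # white

# To collect every piece, match repeatedly with findall instead of repeating one group.
parts = re.findall(r"\w+", css)
print(parts)  # ['1px', 'solid', 'white']
```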
Regex - Non capturing group not working
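Try using a lookbehind assertion (the match itself is then just the value, in $Matches[0]):
$regex = "(?<=\[').*?(?=')"
or keep the non-capturing group and read the value from capture group 1:
$regex = "(?:\[\[')(.*?)(?=')"
$yourstring -match $regex
$Matches[1]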
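The two PowerShell patterns behave the same way in Python's re module. The input "[['foo']]" is a hypothetical string chosen to fit both patterns.

```python
import re

s = "[['foo']]"  # hypothetical input

# Lookbehind version: "['" is checked but not consumed, so the match is the value.
m1 = re.search(r"(?<=\[').*?(?=')", s)
print(m1.group(0))  # foo

# Non-capturing-group version: "[['" is consumed, so read capture group 1.
m2 = re.search(r"(?:\[\[')(.*?)(?=')", s)
print(m2.group(0))  # [['foo
print(m2.group(1))  # foo
```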