Regex Named Groups in Java
(Update: August 2011)
As geofflane mentions in his answer, Java 7 now supports named groups.
tchrist points out in the comment that the support is limited.
He details the limitations in his great answer "Java Regex Helper"
Java 7 regex named group support was presented back in September 2010 in Oracle's blog.
In the official release of Java 7, the constructs to support the named capturing group are:
(?<name>capturing text)
to define a named group "name"
\k<name>
to backreference a named group "name"
${name}
to reference the captured group in Matcher's replacement string
Matcher.group(String name)
to return the captured input subsequence by the given "named group".
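The four constructs listed above can be exercised together in a minimal sketch. The class name NamedGroupConstructs, its method names and the sample inputs are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupConstructs {
    // (?<name>...) defines the group; Matcher.group(String) reads it back.
    static String extract(String input) {
        Matcher m = Pattern.compile("(?<login>\\w+) (?<id>\\d+)").matcher(input);
        return m.find() ? m.group("login") + "/" + m.group("id") : null;
    }

    // \k<name> must match the same text that the named group captured earlier.
    static boolean hasRepeatedWord(String input) {
        return Pattern.compile("\\b(?<word>\\w+) \\k<word>\\b").matcher(input).find();
    }

    // ${name} refers to the captured text inside a replacement string.
    static String swap(String input) {
        return input.replaceAll("(?<login>\\w+) (?<id>\\d+)", "${id} ${login}");
    }

    public static void main(String[] args) {
        System.out.println(extract("TEST 123"));
        System.out.println(hasRepeatedWord("it is is broken"));
        System.out.println(swap("TEST 123"));
    }
}
```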
Other alternatives for pre-Java 7 were:
- Google named-regex (see John Hardy's answer)
Gábor Lipták mentions (November 2012) that this project might not be active (with several outstanding bugs), and its GitHub fork could be considered instead.
- jregex (See Brian Clozel's answer)
(Original answer: Jan 2009, with the next two links now broken)
You cannot refer to a named group unless you code your own version of Regex...
That is precisely what Gorbush2 did in this thread.
Regex2
(limited implementation, as pointed out again by tchrist, as it looks only for ASCII identifiers. tchrist details the limitation as:
only being able to have one named group per same name (which you don’t always have control over!) and not being able to use them for in-regex recursion.
Note: You can find true regex recursion examples in Perl and PCRE regexes, as mentioned in Regexp Power, PCRE specs and Matching Strings with Balanced Parentheses slide)
Example:
String:
"TEST 123"
RegExp:
"(?<login>\\w+) (?<id>\\d+)"
Access
matcher.group(1) ==> TEST
matcher.group("login") ==> TEST
matcher.name(1) ==> login
Replace
matcher.replaceAll("aaaaa_$1_sssss_$2____") ==> aaaaa_TEST_sssss_123____
matcher.replaceAll("aaaaa_${login}_sssss_${id}____") ==> aaaaa_TEST_sssss_123____
(extract from the implementation)
public final class Pattern
    implements java.io.Serializable
{
[...]
    /**
     * Parses a group and returns the head node of a set of nodes that process
     * the group. Sometimes a double return system is used where the tail is
     * returned in root.
     */
    private Node group0() {
        boolean capturingGroup = false;
        Node head = null;
        Node tail = null;
        int save = flags;
        root = null;
        int ch = next();
        if (ch == '?') {
            ch = skip();
            switch (ch) {
            case '<': // (?<xxx) look behind or group name
                ch = read();
                int start = cursor;
[...]
                // test forGroupName
                int startChar = ch;
                while(ASCII.isWord(ch) && ch != '>') ch=read();
                if(ch == '>'){
                    // valid group name
                    int len = cursor-start;
                    int[] newtemp = new int[2*(len) + 2];
                    //System.arraycopy(temp, start, newtemp, 0, len);
                    StringBuilder name = new StringBuilder();
                    for(int i = start; i< cursor; i++){
                        name.append((char)temp[i-1]);
                    }
                    // create Named group
                    head = createGroup(false);
                    ((GroupTail)root).name = name.toString();
                    capturingGroup = true;
                    tail = root;
                    head.next = expr(tail);
                    break;
                }
Java support for "(?<name>pattern)" in patterns
This is supported starting in Java 7. Your C# code can be translated to something like this:
String pattern = ";(?<foo>\\d{6});(?<bar>\\d{6});";
Pattern regex = Pattern.compile(pattern);
Matcher matcher = regex.matcher(";123456;123456;");
boolean success = matcher.find();
String foo = success ? matcher.group("foo") : null;
String bar = success ? matcher.group("bar") : null;
You have to create a Matcher object, which doesn't actually perform the regex test until you call find().
(I used find() because it can find a match anywhere in the input string, like the Regex.Match() method. The .matches() method only returns true if the regex matches the entire input string.)
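The difference between find() and matches() can be seen in a small sketch that reuses the pattern from the answer. The class name FindVersusMatches, the helper names and the sample inputs are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    static final Pattern FIELDS = Pattern.compile(";(?<foo>\\d{6});(?<bar>\\d{6});");

    // find() succeeds if the pattern occurs anywhere in the input.
    static String fooByFind(String input) {
        Matcher m = FIELDS.matcher(input);
        return m.find() ? m.group("foo") : null;
    }

    // matches() succeeds only if the pattern covers the entire input.
    static String fooByMatches(String input) {
        Matcher m = FIELDS.matcher(input);
        return m.matches() ? m.group("foo") : null;
    }

    public static void main(String[] args) {
        System.out.println(fooByFind("x;123456;654321;y"));
        System.out.println(fooByMatches("x;123456;654321;y"));
        System.out.println(fooByMatches(";123456;654321;"));
    }
}
```

With surrounding text in the input, only the find() variant returns a value; the matches() variant returns null until the input is exactly the delimited pair.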
Regular Expression named capturing groups support in Java 7
Specifying named capturing group
Take the following regex with a single capturing group as an example: ([Pp]attern).
Below are 4 examples of how to specify a named capturing group for the regex above:
(?<Name>[Pp]attern)
(?<group1>[Pp]attern)
(?<name>[Pp]attern)
(?<NAME>[Pp]attern)
Note that the name of the capturing group must strictly match the following pattern:
[A-Za-z][A-Za-z0-9]*
The group name is case-sensitive, so you must specify the exact group name when you are referring to them (see below).
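A short sketch of both rules, assuming a hypothetical helper class GroupNameRules: a name that breaks the naming pattern is rejected at compile time with a PatternSyntaxException, and a lookup with the wrong letter case fails with an IllegalArgumentException (mapped to null here for brevity).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class GroupNameRules {
    // Returns true if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Looks up a group by name in the first match; returns null if there is
    // no match or no group with exactly this name.
    static String lookup(String regex, String input, String groupName) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        try {
            return m.group(groupName);
        } catch (IllegalArgumentException e) {
            return null; // the name does not exist (names are case-sensitive)
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("(?<group1>[Pp]attern)"));
        System.out.println(compiles("(?<my_group>[Pp]attern)"));
        System.out.println(lookup("(?<Name>[Pp]attern)", "a Pattern", "Name"));
        System.out.println(lookup("(?<Name>[Pp]attern)", "a Pattern", "name"));
    }
}
```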
Backreference the named capturing group in regex
To back-reference the content matched by a named capturing group in the regex (corresponding to the 4 examples above):
\k<Name>
\k<group1>
\k<name>
\k<NAME>
The named capturing group is still numbered, so in all 4 examples, it can be back-referenced with \1 as per normal.
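As a minimal sketch of the equivalence, the two methods below back-reference the same group once by name and once by number. The class name NamedBackreference and the sample inputs are assumptions for illustration.

```java
import java.util.regex.Pattern;

class NamedBackreference {
    // Back-reference by name: the second word must repeat the captured text exactly.
    static boolean repeatsByName(String input) {
        return Pattern.compile("(?<name>[Pp]attern) \\k<name>").matcher(input).matches();
    }

    // The same group is also group number 1, so \1 behaves identically.
    static boolean repeatsByNumber(String input) {
        return Pattern.compile("(?<name>[Pp]attern) \\1").matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(repeatsByName("Pattern Pattern"));
        System.out.println(repeatsByName("Pattern pattern"));
        System.out.println(repeatsByNumber("pattern pattern"));
    }
}
```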
Refer to named capturing group in replacement string
To refer to the capturing group in the replacement string (corresponding to the 4 examples above):
${Name}
${group1}
${name}
${NAME}
Same as above, in all 4 examples, the content of the capturing group can be referred to with $1 in the replacement string.
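A small sketch comparing the two replacement forms on the example regex. The class name NamedReplacement and the angle-bracket wrapping are illustrative assumptions.

```java
import java.util.regex.Pattern;

class NamedReplacement {
    static final Pattern P = Pattern.compile("(?<name>[Pp]attern)");

    // ${name} inserts the text captured by the named group.
    static String wrapByName(String input) {
        return P.matcher(input).replaceAll("<${name}>");
    }

    // $1 inserts the same text, because the named group is also group 1.
    static String wrapByNumber(String input) {
        return P.matcher(input).replaceAll("<$1>");
    }

    public static void main(String[] args) {
        System.out.println(wrapByName("a pattern and a Pattern"));
        System.out.println(wrapByNumber("a pattern and a Pattern"));
    }
}
```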
Named capturing group in COMMENTS mode
Using (?<name>[Pp]attern) as an example for this section.
Oracle's implementation of the COMMENTS mode (embedded flag (?x)) parses the following examples to be identical to the regex above:
(?x) ( ?<name> [Pp] attern )
(?x) ( ?< name > [Pp] attern )
(?x) ( ?< n a m e > [Pp] attern )
Except for ?<, which must not be separated, it allows arbitrary spacing, even within the name of the capturing group.
Same name for different capturing groups?
While it is possible in .NET, Perl and PCRE to define the same name for different capturing groups, it is currently not supported in Java (Java 8). You can't use the same name for different capturing groups.
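A sketch of the restriction and one possible workaround: give each alternative its own group name and pick whichever one participated in the match. The class name DuplicateGroupNames, the group names yearA and yearB, and the date-like inputs are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DuplicateGroupNames {
    // Returns true if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Workaround: distinct names per alternative; a group that did not
    // participate in the match yields null.
    static String year(String input) {
        Matcher m = Pattern
                .compile("(?<yearA>\\d{4})-\\d{2}|\\d{2}/(?<yearB>\\d{4})")
                .matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.group("yearA") != null ? m.group("yearA") : m.group("yearB");
    }

    public static void main(String[] args) {
        // The same name used twice is rejected when the pattern is compiled.
        System.out.println(compiles("(?<year>\\d{4})-\\d{2}|\\d{2}/(?<year>\\d{4})"));
        System.out.println(year("2011-08"));
        System.out.println(year("08/2011"));
    }
}
```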
Named capturing group related APIs
New methods in Matcher class to support retrieving captured text by group name:
group(String name) (from Java 7)
start(String name) (from Java 8)
end(String name) (from Java 8)
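A minimal sketch of the name-based offset methods, which requires Java 8 or later. The class name NamedGroupOffsets and the sample pattern are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupOffsets {
    // Returns {start, end} of the text captured by group "id" in the first
    // match, or null if the input does not match.
    static int[] idSpan(String input) {
        Matcher m = Pattern.compile("(?<login>\\w+) (?<id>\\d+)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start("id"), m.end("id") };
    }

    public static void main(String[] args) {
        String input = "TEST 123";
        int[] span = idSpan(input);
        // substring(start, end) recovers the same text as group("id")
        System.out.println(input.substring(span[0], span[1]));
    }
}
```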
The corresponding methods are missing from the MatchResult class as of Java 8. There is an ongoing enhancement request, JDK-8065554, for this issue.
As of Java 8, there is no API to get the list of named capturing groups in the regex, so we have to jump through extra hoops to get it. That said, such a list is of little use for most purposes other than writing a regex tester.
How to get the names of the regex named capturing group in a match in Java?
Here is my attempt in Scala:
import java.util.regex.{MatchResult, Pattern}

class GroupNamedRegex(pattern: Pattern, namedGroups: Set[String]) {
  def this(regex: String) = this(Pattern.compile(regex),
    "\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>".r.findAllMatchIn(regex).map(_.group(1)).toSet)

  def findNamedMatches(s: String): Iterator[GroupNamedRegex.Match] = new Iterator[GroupNamedRegex.Match] {
    private[this] val m = pattern.matcher(s)
    private[this] var _hasNext = m.find()

    override def hasNext = _hasNext

    override def next() = {
      val ans = GroupNamedRegex.Match(m.toMatchResult, namedGroups.find(group => m.group(group) != null))
      _hasNext = m.find()
      ans
    }
  }
}

object GroupNamedRegex extends App {
  case class Match(result: MatchResult, groupName: Option[String])

  val r = new GroupNamedRegex("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))")
  println(r.findNamedMatches("FACEBOOK is buying GOOGLE and FACE BOOK FB").map(s => s.groupName -> s.result.group()).toList)
}
Java String.replaceAll backreference with named groups
Based on https://blogs.oracle.com/xuemingshen/entry/named_capturing_group_in_jdk7 you should use ${nameOfCapturedGroup}, which in your case would be ${render}.
DEMO:
String test = "{0000:Billy} bites {0001:Jake}";
test = test.replaceAll("\\{(?<id>\\d\\d\\d\\d):(?<render>.*?)\\}", "${render}");
System.out.println(test);
Output: Billy bites Jake
Get group names in java regex
There is no API in Java to obtain the names of the named capturing groups. I think this is a missing feature.
The easy way out is to pick out candidate named capturing groups from the pattern, then try to access the named group from the match. In other words, you don't know the exact names of the named capturing groups, until you plug in a string that matches the whole pattern.
The Pattern to capture the names of the named capturing group is \(\?<([a-zA-Z][a-zA-Z0-9]*)> (derived based on Pattern class documentation).
(The hard way is to implement a parser for regex and get the names of the capturing groups).
A sample implementation:
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;
import java.util.Iterator;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class RegexTester {
    public static void main(String[] args) {
        // The first line of standard input is the regex; the rest is the text to search.
        Scanner scanner = new Scanner(System.in);
        String regex = scanner.nextLine();
        StringBuilder input = new StringBuilder();
        while (scanner.hasNextLine()) {
            input.append(scanner.nextLine()).append('\n');
        }

        Set<String> namedGroups = getNamedGroupCandidates(regex);
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        int matchCount = 0;

        if (m.find()) {
            // Remove invalid groups: candidates that are not real group names
            // make group(String) throw IllegalArgumentException.
            Iterator<String> i = namedGroups.iterator();
            while (i.hasNext()) {
                try {
                    m.group(i.next());
                } catch (IllegalArgumentException e) {
                    i.remove();
                }
            }

            matchCount += 1;
            System.out.println("Match " + matchCount + ":");
            System.out.println("=" + m.group() + "=");
            System.out.println();
            printMatches(m, namedGroups);

            while (m.find()) {
                matchCount += 1;
                System.out.println("Match " + matchCount + ":");
                System.out.println("=" + m.group() + "=");
                System.out.println();
                printMatches(m, namedGroups);
            }
        }
    }

    private static void printMatches(Matcher matcher, Set<String> namedGroups) {
        for (String name: namedGroups) {
            String matchedString = matcher.group(name);
            if (matchedString != null) {
                System.out.println(name + "=" + matchedString + "=");
            } else {
                System.out.println(name + "_");
            }
        }
        System.out.println();

        // Group numbers run from 1 to groupCount() inclusive.
        for (int i = 1; i <= matcher.groupCount(); i++) {
            String matchedString = matcher.group(i);
            if (matchedString != null) {
                System.out.println(i + "=" + matchedString + "=");
            } else {
                System.out.println(i + "_");
            }
        }
        System.out.println();
    }

    private static Set<String> getNamedGroupCandidates(String regex) {
        Set<String> namedGroups = new TreeSet<String>();
        Matcher m = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>").matcher(regex);
        while (m.find()) {
            namedGroups.add(m.group(1));
        }
        return namedGroups;
    }
}
There is a caveat to this implementation, though. It currently doesn't work with regex in Pattern.COMMENTS mode.
How do I take a string with a named group and replace only that named capture group with a value in Java 7
I got it:
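String string = "/this/(?<capture1>.*)/a/string/(?<capture2>.*)";
Pattern pattern = Pattern.compile(string);
// Match the pattern against its own source text, so that each named group
// captures the text of its own "(?<...>.*)" construct.
Matcher matcher = pattern.matcher(string);
if (matcher.matches()) {
    // String.replace returns a new string, so the result must be assigned.
    string = string.replace(matcher.group("capture1"), "value 1");
    string = string.replace(matcher.group("capture2"), "value 2");
}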
Crazy, but works.
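An alternative that avoids matching the pattern against itself is to splice the new value into the span reported for the group. This is only a sketch: it relies on Matcher.start(String) and Matcher.end(String), which need Java 8 rather than Java 7, and the class name ReplaceNamedGroup, the helper replaceGroup and the sample path are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceNamedGroup {
    // Replaces only the text captured by the given named group in the first
    // match, leaving the rest of the input untouched.
    static String replaceGroup(Pattern pattern, String input, String groupName, String value) {
        Matcher m = pattern.matcher(input);
        if (!m.find() || m.start(groupName) < 0) {
            return input; // no match, or the group did not participate in it
        }
        return new StringBuilder(input)
                .replace(m.start(groupName), m.end(groupName), value)
                .toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("/this/(?<capture1>[^/]*)/a/string/(?<capture2>[^/]*)");
        System.out.println(replaceGroup(p, "/this/foo/a/string/bar", "capture1", "value 1"));
    }
}
```

Working with offsets means the replacement cannot accidentally hit another occurrence of the same text elsewhere in the input, which String.replace would do.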
android java regex named groups
The Android Pattern class implementation is provided by ICU; to be precise, ICU4C.
The regular expression implementation used in Android is provided by ICU. The notation for the regular expressions is mostly a superset of those used in other Java language implementations. This means that existing applications will normally work as expected, but in rare cases Android may accept a regular expression that is not accepted by other implementations.
And ICU4C currently doesn't support named capturing groups. You have to fall back on numbered capturing groups.
ICU does not support named capture groups. http://bugs.icu-project.org/trac/ticket/5312
You need to write a wrapper and parse the expression yourself to provide named capturing group capability, if you really need the feature.
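A rough sketch of what such a wrapper could look like: it scans the expression once, records the index of every (?<name>...) group, and compiles a copy in which named groups are rewritten as plain numbered groups. The class NamedGroupWrapper and its group method are hypothetical, and the sketch makes simplifying assumptions: it does not rewrite \k<name> back-references or ${name} replacements, does not understand \Q...\E quoting, and tracks character classes only naively. It has not been verified on an Android device.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupWrapper {
    private final Pattern pattern;
    private final Map<String, Integer> indexByName = new LinkedHashMap<String, Integer>();

    NamedGroupWrapper(String regex) {
        StringBuilder plain = new StringBuilder();
        int groupIndex = 0;
        boolean inClass = false;
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                // Escaped character: copy both characters untouched.
                plain.append(c).append(regex.charAt(++i));
                continue;
            }
            if (c == '[') {
                inClass = true;
            } else if (c == ']') {
                inClass = false;
            }
            if (c == '(' && !inClass) {
                if (i + 1 < regex.length() && regex.charAt(i + 1) == '?') {
                    int close = regex.indexOf('>', i);
                    // "(?<" followed by a letter is a named group;
                    // "(?<=" and "(?<!" are look-behinds and stay as they are.
                    if (regex.startsWith("(?<", i) && close > i + 3
                            && Character.isLetter(regex.charAt(i + 3))) {
                        indexByName.put(regex.substring(i + 3, close), ++groupIndex);
                        plain.append('(');
                        i = close;
                        continue;
                    }
                    // Any other (?...) construct is treated as non-capturing.
                } else {
                    groupIndex++; // plain capturing group
                }
            }
            plain.append(c);
        }
        pattern = Pattern.compile(plain.toString());
    }

    // Returns the text captured by the named group in the first match, or null
    // if the input does not match.
    String group(String input, String name) {
        Integer index = indexByName.get(name);
        if (index == null) {
            throw new IllegalArgumentException("No group with name <" + name + ">");
        }
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        NamedGroupWrapper w = new NamedGroupWrapper("(?<login>\\w+) (\\d+) (?<id>\\d+)");
        System.out.println(w.group("TEST 7 123", "id"));
    }
}
```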