Java Regex - Overlapping Matches

Java regex - overlapping matches

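Make the matcher start its next scan at the second \d+ (capture group 1), so the trailing digits of one match become the leading digits of the next.

// allMatches was undeclared in the original snippet; a List<String> is assumed.
List<String> allMatches = new ArrayList<>();
Matcher m = Pattern.compile("\\d+\\D+(\\d+)").matcher("2abc3abc4abc5");
if (m.find()) {
    do {
        allMatches.add(m.group());
    } while (m.find(m.start(1))); // resume at the start of group 1
}
// allMatches now holds: 2abc3, 3abc4, 4abc5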

java get regex overlapping matches

Is this what you are trying to do?

String regex = "(?=((Bob|Mary)\\b[^\\.\\?!]*?\\b(Paris|London)\\b.*?[\\.\\?!]))";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern
        .matcher("Bob and Mary are planning to go to Paris. They want to leave before July.");
while (matcher.find()) {
    System.out.println(matcher.group(1));
}

Output:

Bob and Mary are planning to go to Paris.
Mary are planning to go to Paris.

Normally a regex consumes the text it matches, so the same part of the string cannot take part in the next match. To work around this, wrap the pattern in a lookahead (?=...), which matches without consuming any characters, and read the result from a capturing group inside it.
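
To make the difference concrete, here is a minimal sketch that counts how often find() succeeds for the same sub-pattern with and without the lookahead wrapper. The class and method names (LookaheadCount, countMatches) are invented for this illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadCount {
    // Counts how many times find() succeeds for the given regex on the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Consuming match: "aa" at index 0, then "aa" at index 2.
        System.out.println(countMatches("aa", "aaaa"));       // 2
        // Zero-width lookahead: succeeds at indexes 0, 1 and 2.
        System.out.println(countMatches("(?=(aa))", "aaaa")); // 3
    }
}
```

Because the lookahead match itself is empty, the matcher advances by one character after each hit, which is what lets the captured groups overlap.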

Efficiently finding all overlapping matches for a regular expression

In the other question you mentioned Matcher's region() method, but you weren't making full use of it. What makes it so valuable is that the anchors will match at the region's bounds as if they were the bounds of a standalone string. That's assuming you've got the useAnchoringBounds() option set, but that's the default setting.
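
As a small illustration of that point, the sketch below checks whether ^\d+$ matches a region with anchoring bounds switched on and off. The class and method names (AnchoringBoundsDemo, digitsOnly) and the sample input are assumptions made for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoringBoundsDemo {
    // Returns true if "^\d+$" matches the region [start, end) of the input.
    static boolean digitsOnly(String input, int start, int end, boolean anchoring) {
        Matcher m = Pattern.compile("^\\d+$").matcher(input);
        m.useAnchoringBounds(anchoring); // true is the default
        m.region(start, end);            // resets the matcher, keeps the bounds setting
        return m.find();
    }

    public static void main(String[] args) {
        String input = "ab12cd";
        // With anchoring bounds, ^ and $ match at the region edges (2 and 4).
        System.out.println(digitsOnly(input, 2, 4, true));  // true
        // Without them, ^ and $ only match at the edges of the whole input.
        System.out.println(digitsOnly(input, 2, 4, false)); // false
    }
}
```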

public static void allMatches(String text, String regex)
{
    Matcher m = Pattern.compile(regex).matcher(text);
    int end = text.length();
    for (int i = 0; i < end; ++i)
    {
        for (int j = i + 1; j <= end; ++j)
        {
            m.region(i, j);

            if (m.find())
            {
                System.out.printf("Match found: \"%s\" at position [%d, %d)%n",
                        m.group(), i, j);
            }
        }
    }
}

Given your sample string and regex:

allMatches("String t = 04/31 412-555-1235;", "^\\d\\d+$");

...I get this output:

Match found: "04" at position [11, 13)
Match found: "31" at position [14, 16)
Match found: "41" at position [17, 19)
Match found: "412" at position [17, 20)
Match found: "12" at position [18, 20)
Match found: "55" at position [21, 23)
Match found: "555" at position [21, 24)
Match found: "55" at position [22, 24)
Match found: "12" at position [25, 27)
Match found: "123" at position [25, 28)
Match found: "1235" at position [25, 29)
Match found: "23" at position [26, 28)
Match found: "235" at position [26, 29)
Match found: "35" at position [27, 29)

Getting overlapping matches with multiple patterns in Java regex

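You could try the regex below, which uses a positive lookahead assertion. Compile it case-insensitively, as the (?i) flag in the Java code does, so that suite also matches Suite.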

(?=(\b\w+ Road \d+\b)|(\b\d+ suite\b))

String s = "XYZ Road 123 Suite";
Matcher m = Pattern.compile("(?i)(?=(\\b\\w+ Road \\d+\\b)|(\\b\\d+ suite))").matcher(s);
while (m.find())
{
    if (m.group(1) != null) System.out.println(m.group(1));
    if (m.group(2) != null) System.out.println(m.group(2));
}

Output:

XYZ Road 123
123 Suite

Find ALL matches of a regex pattern in Java - even overlapping ones

By default, successive calls to Matcher.find() start at the end of the previous match.

To search from a specific position instead, pass a start index to find(int). For overlapping matches, use one character past the start of the previous match.

In your case probably something like:

while (matcher.find(matcher.start()+1))

This works fine:

Pattern p = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");

public void test(String[] args) throws Exception {
    String test = "0,1,2,3,4,5,6,7,8,9";
    Matcher m = p.matcher(test);
    if (m.find()) {
        do {
            System.out.println(m.group());
        } while (m.find(m.start() + 1));
    }
}

This prints:

0,1,2,3
1,2,3,4
...
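
The same loop can be wrapped in a reusable helper. The following is only a sketch: the class and method names (OverlappingFinder, findOverlapping) are hypothetical, and it returns at most one match per starting position, namely the one the regex engine prefers there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingFinder {
    // Collects every match, restarting each search one character after
    // the start of the previous match.
    static List<String> findOverlapping(Pattern pattern, CharSequence input) {
        List<String> results = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int from = 0;
        // find(int) throws IndexOutOfBoundsException when from > input.length(),
        // so the bound is checked before each call.
        while (from <= input.length() && m.find(from)) {
            results.add(m.group());
            from = m.start() + 1;
        }
        return results;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
        for (String match : findOverlapping(p, "0,1,2,3,4,5,6,7,8,9")) {
            System.out.println(match);
        }
    }
}
```

The explicit bound check matters for patterns that can match the empty string at the very end of the input, where start() + 1 would exceed the input length.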

How can I get overlapping RegEx matches in Java?

You can get this with a lookahead that involves a capture group:

(?=([a-z]{2})).

You'll need a loop involving Matcher.find and query the matcher each time with group(1) to get your match. The main regex match itself is irrelevant and should be ignored.
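
A minimal sketch of that loop, assuming lowercase ASCII input; the class and method names (PairScanner, letterPairs) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairScanner {
    // Returns every overlapping pair of adjacent lowercase letters.
    static List<String> letterPairs(String input) {
        Matcher m = Pattern.compile("(?=([a-z]{2})).").matcher(input);
        List<String> pairs = new ArrayList<>();
        while (m.find()) {
            // group() is only the single character consumed by the trailing dot;
            // the pair captured inside the lookahead is group(1).
            pairs.add(m.group(1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(letterPairs("abcd")); // [ab, bc, cd]
    }
}
```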

Returning overlapping regular expressions

Sure, match an empty string and place a look-ahead after it that captures /.* in a capturing group:

Matcher m = Pattern.compile("(?=(/.*))").matcher("/abc/def/ghi");
while (m.find()) {
    System.out.println(m.group(1));
}

would print:

/abc/def/ghi
/def/ghi
/ghi

