SCJP6 Regex Issue

SCJP6 regex issue

\d* matches zero or more digits, so it also matches the empty string wherever no digit is present: before each non-digit character and after the last character. The matcher tries index 0 first, then index 1, and so on.

So, for the string ab34ef, it finds the following matches:

Index    Group
0        ""   (before a)
1        ""   (before b)
2        "34" (matches more than 0 digits this time)
4        ""   (before e)
5        ""   (before f)
6        ""   (at the end, after f)

If you use \\d+ instead, you get just a single match: 34 at index 2.
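The original program is not shown here, so the following is only a minimal sketch of what such a find() loop might look like. The class name FindAllSketch and the helper findAll are made up for illustration; only Pattern, Matcher, find(), start() and group() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllSketch {
    // Hypothetical helper: returns one "start: group" entry per successful find().
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            // Quote the group so that empty matches stay visible.
            hits.add(m.start() + ": \"" + m.group() + "\"");
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [0: "", 1: "", 2: "34", 4: "", 5: "", 6: ""]
        System.out.println(findAll("\\d*", "ab34ef"));
        // prints [2: "34"]
        System.out.println(findAll("\\d+", "ab34ef"));
    }
}
```

Quoting the group is just a readability choice: without the quotes, the empty matches print as nothing and the indices run together.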

Trying to understand this Regex code

\d* matches zero (!) or more digits. That is why it returns an empty string as a match at positions 0 and 1, then matches 34 at position 2, and returns an empty string again at positions 4 and 5. At that point, what is left to match against is an empty string. This empty string also matches \d* (because an empty string contains zero digits), so there is another match at position 6.

For contrast, try using \d+ (which matches one or more digits) as the pattern and see what happens.
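The difference between the two quantifiers can be checked directly against an empty input. This is a small sketch, not part of the original program; the class name EmptyMatchSketch and the helper matchesWhole are hypothetical, while Pattern.matches is the standard whole-input match.

```java
import java.util.regex.Pattern;

class EmptyMatchSketch {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\\d*", ""));   // prints true: zero digits are enough
        System.out.println(matchesWhole("\\d+", ""));   // prints false: at least one digit is required
        System.out.println(matchesWhole("\\d+", "34")); // prints true
    }
}
```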

Java regex pattern query

In your string, "ab34ef", there are 7 zero-width positions: one before each character and one after the last. The matcher attempts a match at these positions rather than at the characters themselves, i.e. at the location of each | in the following: "|a|b|3|4|e|f|". The position between 3 and 4 does not produce its own match, because the match 34 that starts at index 2 already consumes it.
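One way to see these positions is to print where each match starts and ends. The sketch below is an illustration only; the class name MatchSpanSketch and the helper spans are assumed names. It reports every match as a half-open range [start,end), so zero-width matches show up as ranges whose two ends are equal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpanSketch {
    // Hypothetical helper: one "[start,end)" entry per match found by find().
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            // end() is the index just past the last matched character.
            result.add("[" + m.start() + "," + m.end() + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [[0,0), [1,1), [2,4), [4,4), [5,5), [6,6)]
        System.out.println(spans("\\d*", "ab34ef"));
    }
}
```

The range [2,4) covers both digits, which is why no match starts at index 3, and the last range [6,6) is the empty match at the end of the input.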

Regular expression - Greedy quantifier

It is helpful to change

System.out.print(m.start() + m.group());

to

System.out.println(m.start() + ": " + m.group());

This way the output is much clearer:

0: 
1:
2: 34
4:
5:
6:

You can see that it matched at 6 different positions: at position 2 it matched the string "34", and at every other position it matched an empty string. The empty string matches at the end as well, which is why you see "6" at the end of your output.

Note that if you run your program like this:

java Regex2 "\d+" ab34ef

it will only output

2: 34

How Matcher.find() works

Your regular expression can match zero characters. The final match is a zero-width string occurring at the end of the input, after the character at index 5. The index of this zero-width match is therefore 6.


As an aside, you might also find it easier to understand what is going on if you use separators to make the output more readable:

System.out.println(matcher.start() + ": " + matcher.group());

Results:

0: 
1:
2: 34
4:
5:
6:
