SCJP6 regex issue
\d* matches 0 or more digits. So, it will even match the empty string at every position where no run of digits starts, including after the last character. First before index 0, then before index 1, and so on.
So, for the string ab34ef, it finds the following matches:
Index Group
0 "" (Before a)
1 "" (Before b)
2 "34" (Matches more than 0 digits this time)
4 "" (Before e)
5 "" (Before f)
6 "" (At the end, after f)
If you use \d+ (written "\\d+" in a Java string literal), then you will get just a single match, 34, at index 2.
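The contrast between \d* and \d+ can be checked with a small sketch like the following. The class name DigitMatches and the helper findAll are made up for illustration; only Pattern, Matcher, find(), start() and group() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitMatches {
    // Hypothetical helper: records each match as "start:text",
    // so that empty matches remain visible in the result.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // \d* also matches the empty string
        System.out.println(findAll("\\d*", "ab34ef")); // [0:, 1:, 2:34, 4:, 5:, 6:]
        // \d+ needs at least one digit
        System.out.println(findAll("\\d+", "ab34ef")); // [2:34]
    }
}
```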
Trying to understand this Regex code
The \d* matches zero(!) or more digits, that's why it returns an empty string as a match at 0 and 1. It then matches 34 at position 2 and an empty string again at positions 4 and 5. At that point what is left to match against is an empty string. And this empty string also matches \d* (because an empty string contains zero digits), that's why there is another match at position 6.
To contrast this try using \d+ (which matches one or more digits) as the pattern and see what happens then.
Java regex pattern query
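In your string, "ab34ef", there are 7 "empty characters" with a value of "". They are located before, between, and after the normal characters. The matcher attempts to find a match starting on an empty character, not on a normal character; i.e. at the location of each | in the following: "|a|b|3|4|e|f|". The position between 3 and 4 never starts a match of its own, because the match 34 starting at index 2 has already consumed it.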
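To see which of those boundary positions actually start a match, one option is to print each match as a half-open range [start,end). This is only a sketch; MatchSpans and spans are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpans {
    // Hypothetical helper: returns each match as "[start,end)".
    // An empty match has start == end.
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add("[" + m.start() + "," + m.end() + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        // Position 3 never appears as a start: it lies inside [2,4).
        System.out.println(spans("\\d*", "ab34ef"));
        // [[0,0), [1,1), [2,4), [4,4), [5,5), [6,6)]
    }
}
```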
Regular expression - Greedy quantifier
It is helpful to change
System.out.print(m.start() + m.group());
to
System.out.println(m.start() + ": " + m.group());
This way the output is much clearer:
0:
1:
2: 34
4:
5:
6:
You can see that it matched at 6 different positions: at position 2 it matched the string "34" and at every other position it matched an empty string. The empty string matches at the end as well, which is why you see "6" at the end of your output.
Note that if you run your program like this:
java Regex2 "\d+" ab34ef
it will only output
2: 34
How Matcher.find() works
Your regular expression can match zero characters. The final match is a zero-width string occurring at the end of the input, after the character at index 5. The index of this zero-width match is therefore 6, which equals the length of the input.
As an aside, you might also find it easier to understand what is going on if you use separators to make the output more readable:
System.out.println(matcher.start() + ": " + matcher.group());
Results:
0:
1:
2: 34
4:
5:
6:
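The final zero-width match can also be checked directly. The sketch below keeps only the last match and compares its position with the input length; FinalEmptyMatch and lastMatch are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FinalEmptyMatch {
    // Hypothetical helper: returns {start, end} of the last match,
    // or null if the pattern never matches.
    static int[] lastMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int[] last = null;
        while (m.find()) {
            last = new int[] { m.start(), m.end() };
        }
        return last;
    }

    public static void main(String[] args) {
        String input = "ab34ef";
        int[] last = lastMatch("\\d*", input);
        // start == end == input.length(): an empty match after the last character
        System.out.println(last[0] + ".." + last[1] + " (length " + input.length() + ")");
        // prints: 6..6 (length 6)
    }
}
```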