Split regex to extract Strings of contiguous characters
It is totally possible to write the regex for splitting in one step:
"(?<=(.))(?!\\1)"
Since you want to split between every group of identical characters, you just need to look for the boundary between two groups. I achieve this by using a positive look-behind to capture the previous character, and a negative look-ahead with a back-reference to check that the next character is not the same character.
As you can see, the regex is zero-width (it consists of only two look-around assertions), so no character is consumed by it.
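To check the zero-width claim, here is a minimal sketch that runs the same pattern with Matcher.find() instead of split() and records the index of every match. The class and method names (BoundaryDemo, boundaries) are hypothetical. Every match is empty, so only positions are reported: one per boundary between two runs, plus one at the very end of the input, which split() later discards as a trailing empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Look-behind captures the previous char as group 1; the negative
    // look-ahead then fails if the next char is that same char.
    private static final Pattern BOUNDARY = Pattern.compile("(?<=(.))(?!\\1)");

    // Returns the index of every match. Each match is empty (start() == end()),
    // i.e. a position between two runs, or the end of the input.
    static List<Integer> boundaries(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = BOUNDARY.matcher(s);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("aabccc")); // [2, 3, 6]
    }
}
```

Splitting "aabccc" at those positions gives [aa, b, ccc].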
Javascript Regex to split a string into array of grouped/contiguous characters
Your regex is fine, you're just using the wrong function. Use String.match, not String.split:
var matches = 'aaaabbbbczzxxxhhnnppp'.match(/((.)\2*)/g);
// matches: ["aaaa", "bbbb", "c", "zz", "xxx", "hh", "nn", "ppp"]
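Java's String class has no method that returns all matches the way String.match with the g flag does, but the same "collect every match" idea can be sketched with Matcher.results(), which requires Java 9 or later. This is an illustrative port, not part of the original answer, and the MatchAllRuns name is hypothetical. It uses (.)\1*, dropping the JavaScript version's outer group because group 0 already holds the whole match.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MatchAllRuns {
    // (.)\1* : one character followed by zero or more copies of itself,
    // i.e. a maximal run of identical characters.
    private static final Pattern RUN = Pattern.compile("(.)\\1*");

    static List<String> runs(String s) {
        // Matcher.results() (Java 9+) streams every match, similar to JS match() with /g
        return RUN.matcher(s).results()
                  .map(MatchResult::group)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(runs("aaaabbbbczzxxxhhnnppp"));
        // [aaaa, bbbb, c, zz, xxx, hh, nn, ppp]
    }
}
```

(.)\1* matches the same strings as the (.)\\1+|. pattern used in the next answer: a run of one or more identical characters.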
How to split string using regex in java
The idea behind this is that (.)\\1+ first tries to match a run of two or more repeated characters, and the alternative |. matches any other single character. Finally, all the matched strings are put into a list, which is then printed.
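import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitRuns {
    public static void main(String[] args) {
        String s = "AABBABA";
        List<String> fields = new ArrayList<>();
        // (.)\1+ : a run of 2+ identical chars; . : any other single char
        Pattern regex = Pattern.compile("(.)\\1+|.");
        Matcher m = regex.matcher(s);
        while (m.find()) {
            fields.add(m.group(0));
        }
        System.out.println(fields);
    }
}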
Output:
[AA, BB, A, B, A]
The same approach works when all the inputs above are defined in an array:
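import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitRunsArray {
    public static void main(String[] args) {
        String[] inputs = {"AA", "ABA", "AABBABA"};
        // Compile once and reuse the pattern for every input
        Pattern regex = Pattern.compile("(.)\\1+|.");
        for (String input : inputs) {
            List<String> fields = new ArrayList<>();
            Matcher m = regex.matcher(input);
            while (m.find()) {
                fields.add(m.group(0));
            }
            System.out.println(fields);
        }
    }
}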
Output:
[AA]
[A, B, A]
[AA, BB, A, B, A]
Split string into repeated characters
Try this:
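import java.util.Arrays;

String str = "aaaabbbccccaaddddcfggghhhh";
// Split at every zero-width boundary where the next char differs from the previous one
String[] out = str.split("(?<=(.))(?!\\1)");
System.out.println(Arrays.toString(out));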
=> [aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh]
Explanation: we want to split the string between groups of identical chars, so we need to find the "boundary" between each group. I'm using Java's syntax for positive look-behind to capture the previous char and then a negative look-ahead with a back-reference to verify that the next char is not the same as the previous one. No characters are actually consumed, because only two look-around assertions are used (that is, the regular expression is zero-width).
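If the split is needed in several places, it can be wrapped in a small helper. This is a sketch with a hypothetical RunSplitter.splitRuns method: it compiles the pattern once and, as an assumption about the desired behavior, returns an empty array for empty input, because with this pattern "".split(...) finds no boundary and therefore returns an array holding one empty string.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class RunSplitter {
    // Same zero-width boundary pattern as above, compiled once and reused.
    private static final Pattern BOUNDARY = Pattern.compile("(?<=(.))(?!\\1)");

    static String[] splitRuns(String s) {
        // Assumption: callers want no elements for empty input.
        // Without this check, the pattern finds no match in "" and split returns [""].
        if (s.isEmpty()) {
            return new String[0];
        }
        // Pattern.split drops the trailing empty string produced by the match at the end.
        return BOUNDARY.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRuns("aaaabbbccccaaddddcfggghhhh")));
        // [aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh]
        System.out.println(splitRuns("").length); // 0
    }
}
```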
Extracting numbers from a String in Java by splitting on a regex
You could use a regex like this:
([-.]?\d+(?:\.\d+)?)
Working demo
Match information (capture group 1 of each match):
MATCH 1: [1-6] `0.286`
MATCH 2: [6-12] `-3.099`
MATCH 3: [12-17] `-0.44`
MATCH 4: [18-24] `-2.901`
MATCH 5: [25-31] `-0.436`
MATCH 6: [34-37] `123`
MATCH 7: [38-43] `0.123`
MATCH 8: [44-47] `.34`
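A minimal sketch of applying the first regex from Java with Matcher.find(), collecting capture group 1 of each match. The NumberExtractor name and the input string are hypothetical; the input is a made-up example, not the one used in the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    // The first regex from this answer: an optional '-' or '.', digits,
    // then an optional fractional part.
    private static final Pattern NUMBER = Pattern.compile("([-.]?\\d+(?:\\.\\d+)?)");

    static List<String> extract(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            numbers.add(m.group(1)); // capture group 1 holds the whole number
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the one from the demo
        System.out.println(extract("0.286-3.099 123 .34")); // [0.286, -3.099, 123, .34]
    }
}
```

Each extracted string can then be converted with Double.parseDouble if numeric values are needed.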
Update
Jawee's approach
As Jawee pointed out in his comment, there is a problem with input like .34.34 (the regex above matches it as a single number), so you can use his regex, which fixes this problem. Thanks to Jawee for pointing that out.
(-?(?:\d+)?\.?\d+)
To get a graphical idea of what happens behind this regex, you can check it on Debuggex.
Engine explanation:
1st Capturing group (-?(?:\d+)?\.?\d+)
-? -> matches the character - literally zero or one time
(?:\d+)? -> optionally matches one or more digits [0-9] (using a non-capturing group)
\.? -> matches the character . literally zero or one time
\d+ -> matches one or more digits [0-9]
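The difference between the two patterns can be checked directly on the problematic input .34.34. In this sketch (the RegexComparison class and its findAll helper are hypothetical), the first regex returns the whole string as one match, because [-.]? consumes the leading dot and (?:\.\d+)? then consumes the second .34, while Jawee's regex stops after the first .34 and finds the second one separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexComparison {
    static final String ORIGINAL = "([-.]?\\d+(?:\\.\\d+)?)";
    static final String JAWEE = "(-?(?:\\d+)?\\.?\\d+)";

    // Collects capture group 1 of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll(ORIGINAL, ".34.34")); // [.34.34]
        System.out.println(findAll(JAWEE, ".34.34"));    // [.34, .34]
    }
}
```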
REGEX in java for extracting consecutive duplicate characters in a string
There is no single plain regex solution to this problem, because you would need a lookbehind with a backreference inside, which is not supported by the Java regex engine.
What you can do is either get all (\w)\1+ matches and then check their length using common string methods:
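import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "aaabbaa";
// A word char followed by one or more copies of itself
Pattern pattern = Pattern.compile("(\\w)\\1+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    // Keep only runs that are exactly two characters long
    if (matcher.group().length() == 2) System.out.println(matcher.group(1));
}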
(see the Java demo), or you can match either 3 or more repetitions or exactly 2 repetitions, and only keep the match if Group 2 matched:
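import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "aaabbaa";
// Alternative 1 consumes runs of 3+ first, so alternative 2 (group 2) only sees runs of exactly 2
Pattern pattern = Pattern.compile("(\\w)\\1{2,}|(\\w)\\2");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    if (matcher.group(2) != null) {
        System.out.println(matcher.group(2));
    }
}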
See this Java demo. Regex details:
(\w)\1{2,} - a word char followed by two or more occurrences of the same char
| - or
(\w)\2 - a word char followed by the same char once more
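For comparison, the same "exactly two in a row" check can be done without a regex at all, by walking the string one run at a time. This is a swapped-in technique, not the answer's method, and the ExactPairs name is hypothetical. Note that, unlike \w, it treats every character (including spaces and punctuation) as part of a run.

```java
import java.util.ArrayList;
import java.util.List;

class ExactPairs {
    // Returns the character of every run that is exactly two characters long.
    // Not regex-based: any character (not just \w) can form a run.
    static List<String> exactPairs(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int j = i;
            // Advance j past the current run of identical characters.
            while (j < s.length() && s.charAt(j) == s.charAt(i)) {
                j++;
            }
            if (j - i == 2) {
                out.add(String.valueOf(s.charAt(i)));
            }
            i = j; // continue at the start of the next run
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(exactPairs("aaabbaa")); // [b, a]
    }
}
```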
How to split a string on regex in Python
You need to use re.split if you want to split a string according to a regex pattern.
import re
tokens = re.split(r'[.:]', ip)
Inside a character class, | matches a literal | symbol, and [.:] alone already matches either a dot or a colon (| does not do the OR-ing here). So you need to remove | from the character class; otherwise the string would also be split on the pipe character.
Alternatively, use str.split along with a list comprehension:
>>> ip = '192.168.0.1:8080'
>>> [j for i in ip.split(':') for j in i.split('.')]
['192', '168', '0', '1', '8080']
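For comparison with the Java answers above, Java's String.split also takes a regex, so the same character class works there. A minimal sketch with a hypothetical IpSplit class; it also shows that a | inside the class is a literal pipe rather than alternation.

```java
import java.util.Arrays;

class IpSplit {
    // [.:] is a character class: split on either a dot or a colon
    static String[] tokens(String s) {
        return s.split("[.:]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("192.168.0.1:8080")));
        // [192, 168, 0, 1, 8080]

        // Inside a character class, | is a literal pipe, not alternation
        System.out.println(Arrays.toString("a|b".split("[.|:]")));
        // [a, b]
    }
}
```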