Split regex to extract Strings of contiguous characters

It is entirely possible to write the splitting regex in one step:

(?<=(.))(?!\1)

Since you want to split between every group of same characters, we just need to look for the boundary between 2 groups. I achieve this by using a positive look-behind just to grab the previous character, and use a negative look-ahead and back-reference to check that the next character is not the same character.

As you can see, the regex is zero-width (only 2 look around assertions). No character is consumed by the regex.
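
A minimal Java sketch of the boundary split described above, assuming the pattern is exactly one look-behind plus one negative look-ahead. The class name RunSplitter and the helper splitRuns are hypothetical names used only for illustration. Joining the pieces back together reproduces the input, which is one way to see that the pattern consumes no characters.

```java
import java.util.Arrays;
import java.util.List;

class RunSplitter {
    // Zero-width boundary: look behind to capture the previous character,
    // then look ahead to require that the next character is not the same one.
    private static final String BOUNDARY = "(?<=(.))(?!\\1)";

    // Hypothetical helper: splits the input between runs of identical characters.
    public static List<String> splitRuns(String input) {
        return Arrays.asList(input.split(BOUNDARY));
    }

    public static void main(String[] args) {
        String input = "aaaabbbccccaaddddcfggghhhh";
        List<String> runs = splitRuns(input);
        System.out.println(runs); // [aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh]
        // Nothing is consumed by the split, so joining the pieces restores the input.
        System.out.println(String.join("", runs).equals(input)); // true
    }
}
```

Note that . does not match line terminators by default, so input containing newlines would need the DOTALL flag.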

Javascript Regex to split a string into array of grouped/contiguous characters

Your regex is fine, you're just using the wrong function. Use String.match, not String.split:

var matches = 'aaaabbbbczzxxxhhnnppp'.match(/((.)\2*)/g);

How to split string using regex in java

The idea is that (.)\\1+ is tried first and matches any run of two or more repeated characters, while the alternative |. matches each remaining single character. Every match is added to a list, which is then printed.

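import java.util.*;
import java.util.regex.*;

String s = "AABBABA";
ArrayList<String> fields = new ArrayList<String>();

// Runs of two or more identical characters first, then any single character
Pattern regex = Pattern.compile("(.)\\1+|.");
Matcher m = regex.matcher(s);
while (m.find()) {
    fields.add(m.group());
}
System.out.println(fields);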

[AA, BB, A, B, A]

The same approach, with all of the inputs above defined in an array:

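import java.util.*;
import java.util.regex.*;

String s[] = {"AA", "ABA", "AABBABA"};
Pattern regex = Pattern.compile("(.)\\1+|.");
for (String i : s) {
    // Collect the matches for each input separately
    ArrayList<String> fields = new ArrayList<String>();
    Matcher m = regex.matcher(i);
    while (m.find()) {
        fields.add(m.group());
    }
    System.out.println(fields);
}

[AA]
[A, B, A]
[AA, BB, A, B, A]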

Split string into repeated characters

Try this:

String   str = "aaaabbbccccaaddddcfggghhhh";
String[] out = str.split("(?<=(.))(?!\\1)");

=> [aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh]

Explanation: we want to split the string between groups of identical chars, so we need to find the "boundary" between each group. I'm using Java's syntax for a positive look-behind to pick up the previous char, and then a negative look-ahead with a back-reference to verify that the next char is not the same as the previous one. No characters are actually consumed, because the pattern consists of only two look-around assertions (that is, the regular expression is zero-width).

Extracting numbers from a String in Java by splitting on a regex

You could use a regex like this:



Match Information:

1. [1-6] `0.286`
2. [6-12] `-3.099`
3. [12-17] `-0.44`
4. [18-24] `-2.901`
5. [25-31] `-0.436`
6. [34-37] `123`
7. [38-43] `0.123`
8. [44-47] `.34`


Jawee's approach

As Jawee pointed out in his comment, there is a problem with input such as .34.34, so you can use his regex, which fixes it. Thanks to Jawee for pointing that out.

(-?(?:\d+)?\.?\d+)

To get a graphical idea of what happens behind this regex, you can check it in Debuggex.


Engine explanation:

1st Capturing group (-?(?:\d+)?\.?\d+)
-? -> matches the character - literally, zero or one time
(?:\d+)? -> \d+ matches a digit [0-9] one or more times; the non-capturing group as a whole is optional
\.? -> matches the character . literally, zero or one time
\d+ -> matches a digit [0-9] one or more times
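
The breakdown above can be tried out with a short Java sketch. The class name NumberExtractor, the helper extractNumbers, and the sample text are made up for illustration and are not the input from the original question. The outer capturing group is dropped here because the whole match is read with group().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    // -?        optional minus sign
    // (?:\d+)?  optional integer part (non-capturing group)
    // \.?       optional decimal point
    // \d+       one or more digits
    private static final Pattern NUMBER = Pattern.compile("-?(?:\\d+)?\\.?\\d+");

    // Hypothetical helper: returns every number-like token found in the text.
    public static List<String> extractNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Made-up sample text; .34.34 is matched as two separate numbers.
        System.out.println(extractNumbers("x=0.286 y=-3.099 n=123 z=.34.34"));
        // [0.286, -3.099, 123, .34, .34]
    }
}
```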

REGEX in java for extracting consecutive duplicate characters in a string

There is no single plain regex solution to this problem, because you would need a lookbehind with a backreference inside it, which is not supported by the Java regex engine.

What you can do is either get all (\w)\1+ matches and then check their length using common string methods:

import java.util.regex.*;

String s = "aaabbaa";
Pattern pattern = Pattern.compile("(\\w)\\1+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    // Keep only runs that are exactly two characters long
    if (matcher.group().length() == 2) System.out.println(matcher.group(1));
}

(see the Java demo) or you can match 3 or more repetitions or just 2 repetitions and only grab the match if the Group 2 matched:

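import java.util.regex.*;

String s = "aaabbaa";
Pattern pattern = Pattern.compile("(\\w)\\1{2,}|(\\w)\\2");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    // Group 2 is set only when the second alternative (exactly two) matched
    if (matcher.group(2) != null)
        System.out.println(matcher.group(2));
}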

See this Java demo. Regex details:

  • (\w)\1{2,} - a word char and two or more occurrences of the same char right after
  • | - or
  • (\w)\2 - a word char and the same char right after.
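
As a self-contained sketch of the alternation approach, the following wraps the loop in a method that returns the doubled characters instead of printing them. The class name DoubledChars and the method exactlyDoubled are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubledChars {
    // First alternative swallows runs of three or more, so they are skipped;
    // second alternative captures a character that is repeated exactly once.
    private static final Pattern RUNS = Pattern.compile("(\\w)\\1{2,}|(\\w)\\2");

    // Hypothetical helper: returns each word character that occurs exactly twice in a row.
    public static List<String> exactlyDoubled(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = RUNS.matcher(s);
        while (m.find()) {
            // Group 2 is non-null only when the "exactly two" alternative matched.
            if (m.group(2) != null) {
                result.add(m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(exactlyDoubled("aaabbaa")); // [b, a]
    }
}
```

The order of the alternatives matters in this sketch: with (\w)\2 listed first, a run such as aaa would be reported as a doubled a followed by a leftover single a.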

How to split a string on regex in Python

You need to use re.split if you want to split a string according to a regex pattern.

import re
tokens = re.split(r'[.:]', ip)

Inside a character class, | matches a literal | symbol; it does not act as alternation there. [.:] on its own already matches a dot or a colon.

So you need to remove | from the character class, otherwise the string would also be split on the pipe character.


Alternatively, use str.split along with a list comprehension.

>>> ip = '192.168.0.1:8080'
>>> [j for i in ip.split(':') for j in i.split('.')]
['192', '168', '0', '1', '8080']

