Java: Split String When an Uppercase Letter Is Found

You can use a regex with a zero-width positive lookahead, which finds uppercase letters without including them in the delimiter:

String s = "thisIsMyString";
String[] r = s.split("(?=\\p{Upper})");

Y(?=X) matches Y followed by X, but doesn't include X in the match. So (?=\\p{Upper}) matches an empty sequence followed by an uppercase letter, and split uses that empty match as the delimiter.

See the java.util.regex.Pattern javadoc for more on Java regex syntax.
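A minimal runnable sketch of the lookahead split (the class and method names are illustrative, not from the original answer). It contrasts the lookahead with a consuming [A-Z] delimiter, which swallows the capitals, and shows that on Java 8 and later a zero-width match at index 0 does not produce an empty first element:

```java
import java.util.Arrays;

public class LookaheadSplitDemo {
    // Splits before every ASCII uppercase letter; the lookahead is zero-width,
    // so the uppercase letter stays at the start of the next piece.
    static String[] splitBeforeUpper(String s) {
        return s.split("(?=\\p{Upper})");
    }

    public static void main(String[] args) {
        String s = "thisIsMyString";

        // For contrast: a consuming delimiter removes the uppercase letters.
        System.out.println(Arrays.toString(s.split("[A-Z]")));    // [this, s, y, tring]

        System.out.println(Arrays.toString(splitBeforeUpper(s))); // [this, Is, My, String]

        // On Java 8+, a zero-width match at index 0 yields no empty first element.
        System.out.println(Arrays.toString(splitBeforeUpper("ThisIsMyString")));
        // [This, Is, My, String]
    }
}
```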

EDIT: By the way, this doesn't work with thisIsMyÜberString, because \p{Upper} is a POSIX class that matches only ASCII letters. For non-ASCII uppercase letters you need the Unicode uppercase category instead:

String[] r = s.split("(?=\\p{Lu})");
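A small sketch comparing the two classes on the input from the EDIT (class and method names are hypothetical). The string is written with the \u00DC escape so the source compiles the same way regardless of file encoding. Compiling the original pattern with Pattern.UNICODE_CHARACTER_CLASS would be another way to make \p{Upper} match Ü.

```java
import java.util.Arrays;

public class UnicodeUpperSplit {
    // \p{Upper} is the POSIX class: ASCII A-Z only (unless the
    // UNICODE_CHARACTER_CLASS flag is set).
    static String[] splitAscii(String s) {
        return s.split("(?=\\p{Upper})");
    }

    // \p{Lu} is the Unicode "uppercase letter" category, so it also matches U+00DC.
    static String[] splitUnicode(String s) {
        return s.split("(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        String s = "thisIsMy\u00DCberString";
        System.out.println(Arrays.toString(splitAscii(s)));   // [this, Is, My\u00DCber, String]
        System.out.println(Arrays.toString(splitUnicode(s))); // [this, Is, My, \u00DCber, String]
    }
}
```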

Split by capital letters in Java

It looks like you want to convert camelCase into readable text. Is that the case?

If so, this solution should work for you: How do I convert CamelCase into human-readable names in Java?

If you want the subsequent words lowercased, you'll have to split the string and handle that yourself.
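If the goal is readable text, a rough sketch of the "split, then lowercase the later words" step might look like this. The name toReadable is hypothetical, and the split only breaks between a lowercase and an uppercase letter, so acronyms such as "XML" get no special treatment:

```java
import java.util.Locale;

public class CamelCaseToWords {
    // Splits between a lowercase letter and an uppercase letter, then joins the
    // pieces with spaces, lowercasing every word after the first one.
    static String toReadable(String camel) {
        String[] words = camel.split("(?<=\\p{Ll})(?=\\p{Lu})");
        StringBuilder sb = new StringBuilder(words[0]);
        for (int i = 1; i < words.length; i++) {
            sb.append(' ').append(words[i].toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toReadable("thisIsMyString")); // this is my string
        System.out.println(toReadable("HelloWorld"));     // Hello world
    }
}
```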

Java regular expression: conditionally split string by capital letters

Since the input can contain several consecutive uppercase letters, you need a lookbehind for lowercase as well as a lookahead for uppercase:

(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])

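If you want support for letters outside ASCII, use the Unicode categories instead (in Java, the POSIX classes \p{Lower} and \p{Upper} match only ASCII letters unless the UNICODE_CHARACTER_CLASS flag is set):

(?<=\\p{Ll})(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}\\p{Ll})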

The first alternative matches the position between a lowercase and an uppercase letter. The second matches the position between two uppercase letters when the second one is followed by a lowercase letter, so "XMLParser" splits into "XML" and "Parser".
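A quick sketch applying the Unicode variant of the pattern with String.split (the class name and sample inputs are illustrative). It shows the second alternative keeping an acronym together while still separating it from the following word:

```java
import java.util.Arrays;

public class AcronymAwareSplit {
    // Alternative 1: lowercase behind, uppercase ahead ("my|String").
    // Alternative 2: uppercase behind, uppercase + lowercase ahead ("XML|Parser").
    private static final String SPLIT_REGEX =
            "(?<=\\p{Ll})(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}\\p{Ll})";

    static String[] split(String s) {
        return s.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("thisIsXMLParser")));     // [this, Is, XML, Parser]
        System.out.println(Arrays.toString(split("getHTTPResponseCode"))); // [get, HTTP, Response, Code]
    }
}
```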

Split String based on Uppercase and Numbers

public static void main(String[] args) {
    String s = "HeFNeO2H3Be1H";
    // split before every uppercase letter or digit; the lookahead keeps them
    String[] r = s.split("(?=[A-Z0-9])");
    for (String part : r) {
        System.out.println(part);
    }
}
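One thing to watch with (?=[A-Z0-9]): it splits before every digit, so a multi-digit count such as "12" is broken into "1" and "2". The sketch below compares it with a variant that splits before uppercase letters and only between a letter and a digit. Whether multi-digit numbers should stay together is an assumption about the intended output, and the input "HeFNeO12H3" is made up for illustration:

```java
import java.util.Arrays;

public class ElementSplit {
    // The answer's pattern: split before every uppercase letter or digit.
    static String[] splitEachDigit(String s) {
        return s.split("(?=[A-Z0-9])");
    }

    // Variant (assumed requirement): split before every uppercase letter, and
    // between a letter and a digit, so consecutive digits stay together.
    static String[] keepNumbersTogether(String s) {
        return s.split("(?=[A-Z])|(?<=[A-Za-z])(?=[0-9])");
    }

    public static void main(String[] args) {
        String s = "HeFNeO12H3";
        System.out.println(Arrays.toString(splitEachDigit(s)));      // [He, F, Ne, O, 1, 2, H, 3]
        System.out.println(Arrays.toString(keepNumbersTogether(s))); // [He, F, Ne, O, 12, H, 3]
    }
}
```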

Split the string from the first uppercase letter

// text is the input string
boolean hadDot = false; // makes sure we don't split before finding the file extension
String file = "", date = "";
for (int i = 0; i < text.length(); i++) {
    if (text.charAt(i) == '.') {
        hadDot = true;
        continue;
    }
    if (hadDot && Character.isUpperCase(text.charAt(i))) {
        file = text.substring(0, i); // everything up to and including the extension
        date = text.substring(i);    // from the first uppercase letter after a dot
        break;
    }
}
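For comparison, the same split can be sketched with a single regex and Matcher (names and the sample input are hypothetical). Group 1 takes everything up to the first dot plus any following non-uppercase characters; group 2 starts at the next uppercase letter. Two differences from the loop are deliberate: when no uppercase letter follows a dot, this version returns the whole text as the file part instead of two empty strings, and \p{Lu} is close to, but not identical with, Character.isUpperCase.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitAfterExtension {
    // Group 1: text up to the first '.', the dot itself, then any run of
    // non-uppercase characters (further dots included).
    // Group 2: from the next uppercase letter to the end of the string.
    private static final Pattern FILE_THEN_DATE =
            Pattern.compile("([^.]*\\.\\P{Lu}*)(\\p{Lu}.*)", Pattern.DOTALL);

    // Returns {file, date}; falls back to {text, ""} when no uppercase letter
    // follows a dot.
    static String[] splitFileAndDate(String text) {
        Matcher m = FILE_THEN_DATE.matcher(text);
        if (m.matches()) {
            return new String[] {m.group(1), m.group(2)};
        }
        return new String[] {text, ""};
    }

    public static void main(String[] args) {
        // Hypothetical input: a file name with its extension, then a date.
        System.out.println(Arrays.toString(splitFileAndDate("report.final.pdfMonday 5 June")));
        // [report.final.pdf, Monday 5 June]
    }
}
```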

How to split a string based on capital letters?

You need to match these chunks with /[A-Z]+[^A-Z]*|[^A-Z]+/g instead of splitting with a zero-width assertion pattern. A lookahead-only split pattern is tested at every position in the string, and there is no way to tell it to skip the following positions once the lookaround has matched, so a run of capitals such as "RQ" gets cut into single letters.

const s = 'and some text hereOzievRQ7O37SB5qG3eLB';
console.log(s.match(/[A-Z]+[^A-Z]*|[^A-Z]+/g));
// ["and some text here", "Oziev", "RQ7", "O37", "SB5q", "G3e", "LB"]
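Since the rest of this page is Java, here is a sketch of the same matching approach using Matcher.find() (class and method names are illustrative), next to a lookahead split that breaks the capital runs apart:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchCapitalChunks {
    // Same alternation as the JavaScript regex: a run of capitals followed by
    // any non-capitals, or a run of non-capitals on its own.
    private static final Pattern CHUNK = Pattern.compile("[A-Z]+[^A-Z]*|[^A-Z]+");

    static List<String> chunks(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(s);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "and some text hereOzievRQ7O37SB5qG3eLB";
        System.out.println(chunks(s));
        // [and some text here, Oziev, RQ7, O37, SB5q, G3e, LB]

        // A lookahead-only split cuts inside the runs of capitals instead.
        System.out.println(Arrays.toString(s.split("(?=[A-Z])")));
        // [and some text here, Oziev, R, Q7, O37, S, B5q, G3e, L, B]
    }
}
```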

Split a string based on a pattern in Java: capital letters and numbers

You can do this with a regex alone, using lookahead and lookbehind
(see "Special constructs" on this page: http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html).

/**
 * We'll use this pattern as a divider to split the string into an array.
 * Usage: myString.split(DIVIDER_PATTERN);
 */
private static final String DIVIDER_PATTERN =
        // either there is anything that is not an uppercase character
        // followed by an uppercase character
        "(?<=[^\\p{Lu}])(?=\\p{Lu})"
        // or there is a lowercase character followed by a digit
        + "|(?<=[\\p{Ll}])(?=\\d)";

// assumes JUnit 4: import org.junit.Test; import static org.junit.Assert.assertEquals;
@Test
public void testStringSplitting() {
    assertEquals(2, "3/4Word".split(DIVIDER_PATTERN).length);
    assertEquals(7, "ManyManyWordsInThisBigThing".split(DIVIDER_PATTERN).length);
    assertEquals(7, "This123/4Mixed567ThingIsDifficult"
            .split(DIVIDER_PATTERN).length);
}

So what you can do is something like this:

for(String word: myString.split(DIVIDER_PATTERN)){
System.out.println(word);
}

Sean
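A self-contained version of the divider pattern that runs without JUnit and prints the pieces rather than only counting them (the class and method names are illustrative):

```java
import java.util.Arrays;

public class DividerPatternDemo {
    private static final String DIVIDER_PATTERN =
            "(?<=[^\\p{Lu}])(?=\\p{Lu})"   // non-uppercase char, then uppercase char
            + "|(?<=[\\p{Ll}])(?=\\d)";    // lowercase char, then digit

    static String[] words(String s) {
        return s.split(DIVIDER_PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("3/4Word")));
        // [3/4, Word]
        System.out.println(Arrays.toString(words("This123/4Mixed567ThingIsDifficult")));
        // [This, 123/4, Mixed, 567, Thing, Is, Difficult]
    }
}
```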

Split a string at uppercase letters, but only if a lowercase letter follows in Python

We can try using re.sub here for a regex approach:

import re

inp = "2018Annual ReportInvesting for Growth and Market LeadershipOur CEO will provide you with all further details below."
inp = re.sub(r'(?<![A-Z\W])(?=[A-Z])', ' ', inp)
print(inp)

This prints:

2018 Annual Report Investing for Growth and Market Leadership Our CEO will provide you with all further details below.

The regex used here says to insert a space at any point for which:

(?<![A-Z\W])   what precedes is a word character EXCEPT a capital letter
(?=[A-Z])      and what follows is a capital letter
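A Java sketch of the same insertion using String.replaceAll (class and method names are hypothetical). Java's \w is ASCII-only by default, so [^A-Z\W] reduces to [a-z0-9_]; for ASCII input this behaves like the Python pattern. One deliberate change: the Python negative lookbehind also succeeds at the very start of the string, so an input that begins with a capital, such as "AnnualReport", would get a leading space. A positive lookbehind requires an actual preceding character and avoids that:

```java
public class InsertSpacesBeforeCapitals {
    // Lookbehind: a lowercase ASCII letter, digit or underscore must precede.
    // Lookahead: an uppercase ASCII letter must follow.
    private static final String BOUNDARY = "(?<=[a-z0-9_])(?=[A-Z])";

    static String addSpaces(String s) {
        return s.replaceAll(BOUNDARY, " ");
    }

    public static void main(String[] args) {
        String inp = "2018Annual ReportInvesting for Growth and Market LeadershipOur CEO "
                + "will provide you with all further details below.";
        System.out.println(addSpaces(inp));
        // 2018 Annual Report Investing for Growth and Market Leadership Our CEO will provide you with all further details below.
        System.out.println(addSpaces("AnnualReport")); // Annual Report
    }
}
```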

