PHP Put a Space in Front of Capitals in a String (Regex)


Problem

  1. Your regex '~^[A-Z]~' will only match a capital letter at the very start of the string, because ^ anchors the match to the beginning of the subject. Check out Meta Characters in the Pattern Syntax for more information.

  2. Your replacement is a newline character '\n' and not a space.

Solution

Use this code:

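$String = 'ThisWasCool';
$Words = preg_replace('/(?<!^)(?<!\ )[A-Z]/', ' $0', $String);
// $Words is now 'This Was Cool'

The (?<!\ ) is an assertion that makes sure we don't add a space before a capital letter that already has a space before it, and (?<!^) does the same for a capital at the very start of the string, so 'ThisWasCool' doesn't come out with a leading space.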
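For comparison, here is a sketch of the same lookbehind approach in JavaScript. It assumes an ES2018+ engine, since older engines lack lookbehind, and spaceBeforeCapitals is just an illustrative name. In a JavaScript replacement string, $& plays the role of PHP's $0, and the (?<!^) assertion keeps the first letter from getting a leading space.

```javascript
// Illustrative helper (hypothetical name); needs ES2018+ for lookbehind.
function spaceBeforeCapitals(str) {
  // (?<!^)  not at the very start of the string
  // (?<! )  not already preceded by a space
  // [A-Z]   an ASCII capital letter
  // " $&"   a space followed by the whole match (the capital itself)
  return str.replace(/(?<!^)(?<! )[A-Z]/g, " $&");
}

console.log(spaceBeforeCapitals("ThisWasCool"));  // This Was Cool
console.log(spaceBeforeCapitals("This WasCool")); // This Was Cool
```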

How can I add a space in string at capital letters, but keep continuous capitals together using PHP and a Regex?

Find:

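(?<!^)((?<![[:upper:]])[[:upper:]]|[[:upper:]](?=[[:lower:]]))

Replace:

 $1

Note the space before $1. (?<!^) skips the first character. The first branch matches a capital that does not follow another capital (the start of an ordinary word). The second branch matches a capital followed by a lowercase letter (the last capital of an acronym that actually begins the next word, as in 'IMACSuper'). Because the second branch requires a lowercase letter to follow, a trailing acronym stays together: 'CoolTXT' becomes 'Cool TXT'.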

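Here is a JavaScript sketch of this acronym-aware find/replace. It assumes ASCII text, since [A-Z] and [a-z] stand in for the POSIX classes [[:upper:]] and [[:lower:]], which JavaScript regexes don't support. It also assumes an ES2018+ engine for lookbehind, and the helper name is hypothetical. The second branch only fires when a lowercase letter follows, so a trailing acronym stays in one piece.

```javascript
// Assumes ASCII letters and ES2018+ lookbehind support.
// Group 1 is the capital that gets a space in front of it.
const ACRONYM_AWARE = /(?<!^)((?<![A-Z])[A-Z]|[A-Z](?=[a-z]))/g;

// Illustrative helper (hypothetical name).
function spaceOutKeepingAcronyms(str) {
  // Branch 1: a capital that does not follow another capital (start of a word).
  // Branch 2: a capital followed by a lowercase letter (last capital of an
  //           acronym that really begins the next word, e.g. "IMACSuper").
  return str.replace(ACRONYM_AWARE, " $1");
}

console.log(spaceOutKeepingAcronyms("IMACSuperSeriousLabelDCN")); // IMAC Super Serious Label DCN
console.log(spaceOutKeepingAcronyms("CoolTXT"));                  // Cool TXT
```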

Insert space before capital letters

You can just add a space before every uppercase character and then trim off the leading and trailing whitespace (JavaScript):

s = s.replace(/([A-Z])/g, ' $1').trim()
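One thing to watch with the one-liner is that a capital that already has a space in front of it gets a second one: "This WasCool" becomes "This  Was Cool". Below is a sketch that collapses such doubled spaces afterwards. The function name is illustrative, and the collapsing step also squeezes any runs of spaces that were already in the input.

```javascript
// Illustrative helper (hypothetical name) wrapping the one-liner above.
function spaceBeforeCapitalsTrimmed(s) {
  return s
    .replace(/([A-Z])/g, " $1") // a space before every capital letter
    .replace(/ {2,}/g, " ")     // collapse runs of spaces into one
    .trim();                    // drop the space added before a leading capital
}

console.log(spaceBeforeCapitalsTrimmed("This WasCool")); // This Was Cool
```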

How to add space in PHP string only between full CamelCase words skipping single uppercase letters intact?

One option is to use two capture groups in an alternation wrapped in a branch reset group, so that both alternatives share the same group numbers:

(?|([A-Z])([A-Z][a-z])|([a-z])([A-Z]))
  • (?| Branch reset group

    • ([A-Z]) Capture group 1, match A-Z
    • ([A-Z][a-z]) Capture group 2, match an uppercase letter followed by a lowercase letter
    • | Or
    • ([a-z]) Capture group 1, match a-z
    • ([A-Z]) Capture group 2, match A-Z
  • ) Close branch reset group


In the replacement, use:

$1 $2

Output

IMAC Super Serious Label DCN
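JavaScript has no branch reset group, so this sketch of the same two-branch idea numbers the groups 1 to 4 and uses a replacer callback to join whichever pair matched. Groups from the branch that did not take part arrive in the callback as undefined. The helper name is hypothetical, and IMACSuperSeriousLabelDCN is only an example input.

```javascript
// Branch 1: capital + (capital followed by lowercase) -> groups 1, 2
// Branch 2: lowercase + capital                       -> groups 3, 4
const CAMEL_BOUNDARY = /([A-Z])([A-Z][a-z])|([a-z])([A-Z])/g;

// Illustrative helper (hypothetical name).
function splitCamelCase(str) {
  return str.replace(CAMEL_BOUNDARY, (match, a1, a2, b1, b2) =>
    // Only one branch matched; the other branch's groups are undefined.
    a1 !== undefined ? `${a1} ${a2}` : `${b1} ${b2}`
  );
}

console.log(splitCamelCase("IMACSuperSeriousLabelDCN")); // IMAC Super Serious Label DCN
```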

Add spaces before Capital Letters

The regexes will work fine (I even voted up Martin Brown's answer), but they are expensive (and personally I find any pattern longer than a couple of characters prohibitively obtuse).

This function

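// Requires: using System.Text;
string AddSpacesToSentence(string text, bool preserveAcronyms)
{
    if (string.IsNullOrWhiteSpace(text))
        return string.Empty;
    // At most one space is added per character, so reserve twice the length
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]))
        {
            // Space before a capital that follows a non-capital (other than an existing space),
            // or, when preserving acronyms, before the last capital of a run that is followed
            // by a non-capital (e.g. the 'S' in "IMACSuper")
            if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) ||
                (preserveAcronyms && char.IsUpper(text[i - 1]) &&
                 i < text.Length - 1 && !char.IsUpper(text[i + 1])))
                newText.Append(' ');
        }
        newText.Append(text[i]);
    }
    return newText.ToString();
}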
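For readers not on .NET, here is a rough JavaScript port of the same single-pass idea. It is a sketch, not the author's code. isUpper is a hypothetical helper that approximates char.IsUpper by comparing a character with its lower- and upper-case forms, and an array of pieces joined at the end stands in for StringBuilder.

```javascript
// Hypothetical stand-in for C#'s char.IsUpper: true for characters that
// have a distinct lowercase form and are already in uppercase.
function isUpper(ch) {
  return ch !== ch.toLowerCase() && ch === ch.toUpperCase();
}

// Sketch of AddSpacesToSentence(text, preserveAcronyms) in JavaScript.
function addSpacesToSentence(text, preserveAcronyms) {
  // Mirrors string.IsNullOrWhiteSpace
  if (!text || text.trim() === "") return "";
  const out = [text[0]];
  for (let i = 1; i < text.length; i++) {
    const ch = text[i];
    if (isUpper(ch)) {
      const prev = text[i - 1];
      // A capital after a non-capital that isn't already a space
      const startsWord = prev !== " " && !isUpper(prev);
      // Last capital of an acronym run, when the next char is not a capital
      const startsWordAfterAcronym =
        preserveAcronyms && isUpper(prev) &&
        i < text.length - 1 && !isUpper(text[i + 1]);
      if (startsWord || startsWordAfterAcronym) out.push(" ");
    }
    out.push(ch);
  }
  return out.join("");
}

console.log(addSpacesToSentence("IMACSuper", true));    // IMAC Super
console.log(addSpacesToSentence("ThisWasCool", false)); // This Was Cool
```

As in the C# version, with preserveAcronyms set to false an acronym and the word after it stay joined ("IMACSuper" is returned unchanged).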

It runs 100,000 times in 2,968,750 ticks, whereas the regex takes 25,000,000 ticks (and that's with the regex compiled).

It's better, for a given value of better (i.e. faster); however, it's more code to maintain. "Better" is often a compromise between competing requirements.

Update

It's been a good long while since I looked at this, and I just realised the timings haven't been updated since the code changed (it only changed a little).

On a string with 'Abbbbbbbbb' repeated 100 times (i.e. 1,000 bytes), a run of 100,000 conversions takes the hand-coded function 4,517,177 ticks and the regex 59,435,719 ticks, so the hand-coded function runs in 7.6% of the time the regex takes.

Update 2
Will it take Acronyms into account? It will now!
The logic of the if statement is fairly obscure, as you can see by expanding it to this ...

if (char.IsUpper(text[i]))
    if (char.IsUpper(text[i - 1]))
        if (preserveAcronyms && i < text.Length - 1 && !char.IsUpper(text[i + 1]))
            newText.Append(' ');
        else ;
    else if (text[i - 1] != ' ')
        newText.Append(' ');

... doesn't help at all!

Here's the original, simpler method that doesn't worry about acronyms:

// Requires: using System.Text;
string AddSpacesToSentence(string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        // Insert a space before every capital that isn't already preceded by one
        if (char.IsUpper(text[i]) && text[i - 1] != ' ')
            newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}

Putting space in camel case string using regular expression

var rex = /([A-Z])([A-Z])([a-z])|([a-z])([A-Z])/g;

"CSVFilesAreCoolButTXT".replace( rex, '$1$4 $2$3$5' );
// "CSV Files Are Cool But TXT"

And also

"CSVFilesAreCoolButTXTRules".replace( rex, '$1$4 $2$3$5' );    
// "CSV Files Are Cool But TXT Rules"

The text of the subject string that matches the regex pattern will be replaced by the replacement string '$1$4 $2$3$5', where the $1, $2 etc. refer to the substrings matched by the pattern's capture groups ().

$1 refers to the substring matched by the first ([A-Z]) sub-pattern, and $3 refers to the substring matched by the first ([a-z]) sub-pattern etc.

Because of the alternation character |, to make a match the regex has to match either the ([A-Z])([A-Z])([a-z]) sub-pattern or the ([a-z])([A-Z]) sub-pattern, so for any given match some of the capture groups remain unmatched. These groups can still be referenced in the replacement string, but they have no effect on it: an unmatched group is replaced by an empty string.

The space in the replacement string ensures a space is inserted in the subject string every time a match is made (the trailing g flag means the regular expression engine will look for more than one match).
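To see the unmatched groups for yourself, here is a small sketch. describeFirstMatch is a hypothetical helper that runs a non-global copy of the pattern with exec() and reports the captured groups for the first match, plus what the replacement string turns that match into.

```javascript
// A non-global copy of the pattern, so exec() returns the first match with its groups.
const pattern = /([A-Z])([A-Z])([a-z])|([a-z])([A-Z])/;

// Illustrative helper (hypothetical name).
function describeFirstMatch(str) {
  const m = pattern.exec(str);
  if (m === null) return null;
  // Groups from the branch that did not match are undefined in the match array...
  const groups = m.slice(1);
  // ...and re-applying the replacement to the matched text shows that $n
  // references to those groups expand to "".
  const replaced = m[0].replace(pattern, "$1$4 $2$3$5");
  return { match: m[0], groups, replaced };
}

// "CSVFiles": match "VFi", groups V, F, i, undefined, undefined, replaced "V Fi"
console.log(describeFirstMatch("CSVFiles"));
// "coolBut": match "lB", groups undefined x3, l, B, replaced "l B"
console.log(describeFirstMatch("coolBut"));
```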

PHP regex strip comma and space from beginning and end of string

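Use the regex \b[\w\s]+\b to extract the words (runs of word characters and spaces that start and end on a word boundary, so 'Twenty Five' stays together) and then reformat them like this: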

<?php

$strings = [", One ",
            ", One , Two",
            "One, Two ",
            " One,Two, ",
            " ,Two ,Three ",
            ", Two ,Three, Twenty Five, Six"];

foreach ($strings as &$str)
{
    // Grab each run of words/spaces that starts and ends on a word boundary
    preg_match_all('/\b[\w\s]+\b/', $str, $matches);
    // Re-join the pieces with a single comma and space
    $str = implode(', ', $matches[0]);
}
unset($str); // break the reference left over from foreach-by-reference

print_r($strings);

?>
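Here is the same extraction in JavaScript, as a sketch (tidyList is a hypothetical name). String.prototype.match with the g flag returns every match, or null when there are none, and join(', ') does the re-assembly.

```javascript
// Illustrative helper (hypothetical name).
function tidyList(str) {
  // \b[\w\s]+\b : words and inner spaces, bounded by word boundaries,
  // so leading/trailing spaces and commas are left out of each match.
  const items = str.match(/\b[\w\s]+\b/g) || [];
  return items.join(", ");
}

const inputs = [", One ", ", One , Two", "One, Two ", " One,Two, ",
                " ,Two ,Three ", ", Two ,Three, Twenty Five, Six"];
console.log(inputs.map(tidyList));
```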

Output:

Array
(
[0] => One
[1] => One, Two
[2] => One, Two
[3] => One, Two
[4] => Two, Three
[5] => Two, Three, Twenty Five, Six
)

