PHP Explode the String, But Treat Words in Quotes as a Single Word

PHP explode the string, but treat words in quotes as a single word

You could use preg_match_all(...):

$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing \\"elit" dolor';
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);
print_r($matches);

which will produce:

Array
(
    [0] => Array
        (
            [0] => Lorem
            [1] => ipsum
            [2] => "dolor sit amet"
            [3] => consectetur
            [4] => "adipiscing \"elit"
            [5] => dolor
        )

)

And as you can see, it also accounts for escaped quotes inside quoted strings.
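If you also want the tokens without their surrounding quotes and with the escapes resolved, a small post-processing step can do that. This is only a sketch: tokenize_unquoted() is a hypothetical helper name, not something from the answer above, and it assumes backslash is the only escape character.

```php
<?php
// Hypothetical helper (not part of the answer above): tokenize with the same
// pattern, then strip the surrounding quotes and unescape quoted tokens.
function tokenize_unquoted(string $text): array
{
    preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);

    return array_map(function (string $token): string {
        $isQuoted = strlen($token) >= 2
            && $token[0] === '"'
            && substr($token, -1) === '"';

        // stripslashes() turns \" into " and \\ into \ (it also drops the
        // backslash from other sequences such as \n, so adjust if that matters).
        return $isQuoted ? stripslashes(substr($token, 1, -1)) : $token;
    }, $matches[0]);
}

$tokens = tokenize_unquoted('Lorem "dolor sit amet" "adipiscing \\"elit" dolor');
print_r($tokens);
// => Array ( [0] => Lorem [1] => dolor sit amet [2] => adipiscing "elit [3] => dolor )
```

Unquoted tokens are returned unchanged, so a backslash outside quotes is left as it is.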

EDIT

A short explanation:

"           # match the character '"'
(?:         # start non-capturing group
  \\        #   match the character '\'
  .         #   match any character except line breaks
  |         #   OR
  [^\\"]    #   match any character except '\' and '"'
)*          # end the non-capturing group and repeat it zero or more times
"           # match the character '"'
|           # OR
\S+         # match one or more non-whitespace characters ([^\s])

And if you need to match %22 (a URL-encoded double quote) instead of a literal double quote, you'd do:

preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $text, $matches);
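A quick way to try the %22 variant is shown below. The sample input is made up for illustration, and the rawurldecode() step is an optional extra that the answer does not mention.

```php
<?php
// Sample input is an assumption: the quotes arrive URL-encoded as %22.
$text = 'Lorem %22dolor sit amet%22 ipsum';
preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $text, $matches);

// Optional: decode each token so %22 becomes a literal double quote again.
$decoded = array_map('rawurldecode', $matches[0]);

print_r($matches[0]);
// => Array ( [0] => Lorem [1] => %22dolor sit amet%22 [2] => ipsum )
print_r($decoded);
// => Array ( [0] => Lorem [1] => "dolor sit amet" [2] => ipsum )
```

Decoding after tokenizing keeps the pattern unchanged; decoding the whole string first would let you use the plain double-quote pattern instead.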

PHP explode strings, but treat words in quotes as a single word

You may use

if (preg_match_all('~(?|"([^\\\\"]*(?:\\\\.[^"\\\\]*)*)"|([^\s"]+))~s', $s, $matches))
{
    print_r($matches[1]);
}

Details

  • (?| - starts a branch reset group:
    • " - a " char
    • ([^\\\\"]*(?:\\\\.[^"\\\\]*)*) - Group 1: zero or more chars other than \ and ", followed by zero or more repetitions of an escaped char and then zero or more chars other than \ and "
    • " - a " char
  • | - or
    • ([^\s"]+) - Group 1: one or more chars other than whitespace and "
  • ) - end of the branch reset group.

See the PHP demo:

$s = '"foo bar"ANDbar"foo"AND"foofoo" lorem "impsum"';
if (preg_match_all('~(?|"([^\\\\"]*(?:\\\\.[^"\\\\]*)*)"|([^\s"]+))~s', $s, $matches))
{
    print_r($matches[1]);
}
// => Array ( [0] => foo bar [1] => ANDbar [2] => foo [3] => AND [4] => foofoo [5] => lorem [6] => impsum )

PHP explode the string, but treat words in quotes as a single word and ignore brackets

The reason is that the regex you are using is designed to keep standalone " characters in the matches: its \S+ branch matches a double quote as well.

If you are sure the unescaped double quotes are always paired in your input, use

'/"(?:\\\\.|[^\\\\"])*"|[^\s"]+/'
                        ^^^^^^

That is, exclude " from what \S matches by rewriting \S as the negated character class [^\s] and adding the double quote to it, which gives [^\s"].
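To see the difference between the two patterns, you can run both on the same input. The sample string below is just an illustration, and the variable names $loose and $strict are mine.

```php
<?php
$str = 'ipsum ("dolor sit amet") consectetur';

// With \S+, the run starting at "(" swallows the opening quote, so the
// quoted branch never gets a chance to match the whole phrase.
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $str, $loose);

// With [^\s"]+, the run stops before the quote and the quoted branch takes over.
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|[^\s"]+/', $str, $strict);

print_r($loose[0]);
// => Array ( [0] => ipsum [1] => ("dolor [2] => sit [3] => amet") [4] => consectetur )
print_r($strict[0]);
// => Array ( [0] => ipsum [1] => ( [2] => "dolor sit amet" [3] => ) [4] => consectetur )
```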

To include single quoted substrings, you may use

'~"(?:\\\\.|[^\\\\"])*"|\'(?:\\\\.|[^\\\\\'])*\'|[^\s"\']+~'

See the PHP demo:

$re = '~"(?:\\\\.|[^\\\\"])*"|\'(?:\\\\.|[^\\\\\'])*\'|[^\s"\']+~';
$str = 'Lorem ipsum ("dolor sit amet") consectetur "adipiscing \\"elit" dolor \'something \\\'here\'';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
// => Array ( [0] => Lorem [1] => ipsum [2] => ( [3] => "dolor sit amet" [4] => )
// [5] => consectetur [6] => "adipiscing \"elit" [7] => dolor [8] => 'something \'here' )

R: Explode string but keep quoted text as a single word

A simple option would be to use scan, which by default treats quoted text as a single item (mystr holds the input string):

> x <- scan(what = "", text = mystr)
Read 11 items
> x
[1] "preceded by itself in quotation marks forms a complete sentence"
[2] "preceded"
[3] "by"
[4] "itself"
[5] "in"
[6] "quotation"
[7] "marks"
[8] "forms"
[9] "a"
[10] "complete"
[11] "sentence"

Split string on spaces except words in quotes

You can use:

$string = 'Some of "this string is" in quotes';
$arr = preg_split('/("[^"]*")|\h+/', $string, -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($arr);

Output:

Array
(
    [0] => Some
    [1] => of
    [2] => "this string is"
    [3] => in
    [4] => quotes
)

Regex breakdown

("[^"]*")    # match quoted text and capture it so that it is kept in the output
             # via the PREG_SPLIT_DELIM_CAPTURE option
|            # regex alternation
\h+          # match 1 or more horizontal whitespace characters
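To see why the capture group and PREG_SPLIT_DELIM_CAPTURE matter here, compare the result with and without the flag. This is a small sketch reusing the string from above; the variable names are mine.

```php
<?php
$string  = 'Some of "this string is" in quotes';
$pattern = '/("[^"]*")|\h+/';

// Without DELIM_CAPTURE the quoted text is treated as a delimiter and dropped.
$without = preg_split($pattern, $string, -1, PREG_SPLIT_NO_EMPTY);

// With DELIM_CAPTURE the captured quoted text is put back into the result.
$with = preg_split($pattern, $string, -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

print_r($without);
// => Array ( [0] => Some [1] => of [2] => in [3] => quotes )
print_r($with);
// => Array ( [0] => Some [1] => of [2] => "this string is" [3] => in [4] => quotes )
```

PREG_SPLIT_NO_EMPTY is what removes the empty pieces that would otherwise appear between a space and an adjacent quoted delimiter.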

An explode() function that ignores characters inside quotes?

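str_getcsv($str, '/')

str_getcsv() (available since PHP 5.3) splits on the given delimiter but ignores delimiters inside double-quoted fields. There's a recipe for PHP < 5.3 on the linked page.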
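For example, with a made-up input string (the sample value is an assumption, not taken from the question):

```php
<?php
// Sample input is an assumption: "/" separates fields, quotes protect one of them.
$str = 'usr/"local/bin"/php';

// Enclosure and escape are passed explicitly here; they are the documented
// defaults, and newer PHP versions prefer that $escape is not left implicit.
$parts = str_getcsv($str, '/', '"', '\\');

print_r($parts);
// => Array ( [0] => usr [1] => local/bin [2] => php )
```

Note that, unlike the regex approaches, str_getcsv() removes the enclosing quotes from the quoted field.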

Exploding string on space but not spaces in quotation marks

You could use a regex:

$string = 'test1 test2 "test3 test4"';
// Match either a complete quoted string or a run of non-whitespace characters.
preg_match_all('/"[^"]*"|\S+/', $string, $matches);

print_r($matches);

Alternatively, you could try str_getcsv() with a space as the delimiter.

PHP string explode on space, except when in quotes

You may use

preg_split('~(?<!\\\\)(?:\\\\{2})*"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"(*SKIP)(*F)|\s+~s', $s)

Details

  • (?<!\\) - no \ allowed immediately to the left of the current location
  • (?:\\{2})* - zero or more double backslashes
  • " - a quote
  • [^"\\]* - 0+ chars other than " and \
  • (?:\\.[^"\\]*)* - 0+ sequences of
    • \\. - any escape sequence
    • [^"\\]* - 0+ chars other than " and \
  • " - a quote
  • (*SKIP)(*F) - discard the text matched so far and resume searching from the position where that match ended
  • | - or
  • \s+ - 1+ whitespaces in any other contexts.

See the PHP demo:

$s = 'title:"tab system" color:="blue" price:>10';
print_r(preg_split('~(?<!\\\\)(?:\\\\{2})*"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"(*SKIP)(*F)|\s+~s', $s));

Output:

Array
(
    [0] => title:"tab system"
    [1] => color:="blue"
    [2] => price:>10
)

