How to Keep the Delimiters When Splitting a Ruby String

How do I keep the delimiters when splitting a Ruby string?

Answer

Use a positive lookbehind assertion (i.e. ?<=) in the split pattern to keep the delimiter at the end of each string. A lookbehind is zero-width rather than a capture group, so split cuts right after the delimiter without consuming it:

content = "Do you like to code? How I love to code! I'm always coding."
content.split(/(?<=[?.!])/)

# Returns an array with:
# ["Do you like to code?", " How I love to code!", " I'm always coding."]

That leaves a whitespace character at the start of the second and third strings. Add a match for zero or more whitespace characters (\s*) after the lookbehind to exclude it:

content.split(/(?<=[?.!])\s*/)

# Returns an array with:
# ["Do you like to code?", "How I love to code!", "I'm always coding."]
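A minimal, self-contained sketch of the two patterns above. The sample sentence is reconstructed from the output shown, and the method name split_sentences is hypothetical, used only for illustration:

```ruby
# Split after sentence-ending punctuation, keeping the punctuation.
# (?<=[?.!]) is a zero-width lookbehind: it matches the position right
# after ?, . or ! without consuming the character itself.
def split_sentences(text, trim: true)
  pattern = trim ? /(?<=[?.!])\s*/ : /(?<=[?.!])/
  text.split(pattern)
end

content = "Do you like to code? How I love to code! I'm always coding."

p split_sentences(content, trim: false)
# => ["Do you like to code?", " How I love to code!", " I'm always coding."]
p split_sentences(content)
# => ["Do you like to code?", "How I love to code!", "I'm always coding."]
```

Keep in mind that this pattern also splits after periods that do not end a sentence, such as the ones in "e.g.".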

Additional Notes

While it doesn't make sense with your example, the delimiter can instead be moved to the front of each string after the first. This is done with a positive lookahead assertion (i.e. ?=). For the sake of anyone looking for that technique, here's how to do it:

content.split(/(?=[?.!])/)

# Returns an array with:
# ["Do you like to code", "? How I love to code", "! I'm always coding", "."]

A better example to illustrate the behavior is:

content = "- the - quick brown - fox jumps"
content.split(/(?=-)/)

# Returns an array with:
# ["- the ", "- quick brown ", "- fox jumps"]

Notice that the square-bracket character class wasn't necessary here, since there is only one delimiter. Also, because the first match happens at the very first character, no empty string is produced in front of it and "- the " ends up as the first item in the array.
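To compare the two placements side by side, here is a small sketch. The helper name keep_delimiter and the sample string "a-b-c" are assumptions made for illustration, not part of the original answer:

```ruby
# Hypothetical helper: split on a literal delimiter and choose where the
# delimiter ends up in the pieces.
def keep_delimiter(text, delimiter, position)
  quoted = Regexp.escape(delimiter)
  case position
  when :end   then text.split(/(?<=#{quoted})/) # lookbehind: delimiter ends each piece
  when :start then text.split(/(?=#{quoted})/)  # lookahead: delimiter starts each piece
  else text.split(/#{quoted}/)                  # plain split: delimiter is dropped
  end
end

p keep_delimiter("a-b-c", "-", :end)   # => ["a-", "b-", "c"]
p keep_delimiter("a-b-c", "-", :start) # => ["a", "-b", "-c"]
p keep_delimiter("a-b-c", "-", :none)  # => ["a", "b", "c"]
```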

Split string without removing delimiter

This works:

@sql_stmts_array = File.read(@sql_file).lines(';')
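As a rough sketch of what String#lines does with a ';' separator, the example below uses an in-memory string instead of a file. The SQL text and the method name sql_statements are made up for illustration:

```ruby
# String#lines accepts a separator and keeps it at the end of each piece.
# Stripping whitespace and rejecting empty pieces tidies up the newlines
# that sit between statements.
def sql_statements(sql)
  sql.lines(";").map(&:strip).reject(&:empty?)
end

# Hypothetical SQL text standing in for File.read(@sql_file).
sql = "CREATE TABLE t (id INT);\nINSERT INTO t VALUES (1);\n"

p sql.lines(";")
# => ["CREATE TABLE t (id INT);", "\nINSERT INTO t VALUES (1);", "\n"]
p sql_statements(sql)
# => ["CREATE TABLE t (id INT);", "INSERT INTO t VALUES (1);"]
```

This is a plain text split, so a semicolon inside a quoted SQL string would also end a piece.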

Ruby split keep the delimiter before the string

(?<=\\n)\s*(?=%%)

You can split on the whitespace using lookarounds. See the demo:

https://regex101.com/r/fM9lY3/7
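The original input is not shown here, so the following is only a sketch under assumptions: records are taken to start with "%%" and to be separated by real newline characters, which is why the lookbehind is written as (?<=\n) in a Ruby regex literal. The method name split_records and the sample text are hypothetical:

```ruby
# Assumption: each record starts with "%%" and follows a real newline.
# The lookbehind keeps the newline with the previous record, \s* eats any
# indentation, and the lookahead keeps "%%" with the next record.
def split_records(text)
  text.split(/(?<=\n)\s*(?=%%)/)
end

p split_records("%%a\n  %%b\n%%c")
# => ["%%a\n", "%%b\n", "%%c"]
```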

Ruby: how to split a string on a regex while keeping the delimiters?

If split's pattern contains a capture group, the text matched by the group is included in the resulting array.

str = "oruh43451rohcs56oweuex59869rsr"
str.split(/(\d+)/)
# => ["oruh", "43451", "rohcs", "56", "oweuex", "59869", "rsr"]

If you want each string paired with the delimiter that follows it:

str.split(/(\d+)/).each_slice(2).to_a
# => [["oruh", "43451"], ["rohcs", "56"], ["oweuex", "59869"], ["rsr"]]
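For contrast, a short sketch showing that only capturing groups have this effect: a non-capturing group (?:...) behaves like a plain pattern and drops the delimiters. The method name split_on_digits is hypothetical:

```ruby
# A capturing group (\d+) keeps the matched digits in the result;
# a non-capturing group (?:\d+) discards them.
def split_on_digits(str, keep_digits:)
  keep_digits ? str.split(/(\d+)/) : str.split(/(?:\d+)/)
end

str = "oruh43451rohcs56oweuex59869rsr"

p split_on_digits(str, keep_digits: true)
# => ["oruh", "43451", "rohcs", "56", "oweuex", "59869", "rsr"]
p split_on_digits(str, keep_digits: false)
# => ["oruh", "rohcs", "oweuex", "rsr"]
```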

Regex / Ruby - split keeping delimiter

You may use String#split with a pattern like

/(%[^%]*%)/

According to the documentation:

If pattern contains groups, the respective matches will be returned in the array as well.

See the regex demo: the pattern matches and captures into Group 1 a % char, then zero or more chars other than %, and then a %.

See a Ruby demo:

s = "Hello %Customer Name% your order number is %Order Number% and will be delivered soon"
p s.split(/(%[^%]*%)/)
# => ["Hello ", "%Customer Name%", " your order number is ", "%Order Number%", " and will be delivered soon"]
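One possible use of the kept delimiters, sketched under assumptions: the placeholders are looked up in a hash and replaced, while the surrounding text is passed through unchanged. The method name fill_template and the values hash are hypothetical and not part of the original answer:

```ruby
# Split so that %...% placeholders survive as their own array elements,
# then swap each placeholder for a value when one is known.
def fill_template(template, values)
  template.split(/(%[^%]*%)/).map do |token|
    match = token.match(/\A%([^%]*)%\z/)
    # Unknown placeholders are left as they are.
    match ? values.fetch(match[1], token) : token
  end.join
end

s = "Hello %Customer Name% your order number is %Order Number% and will be delivered soon"
values = { "Customer Name" => "Ada", "Order Number" => "42" } # made-up data

puts fill_template(s, values)
# Hello Ada your order number is 42 and will be delivered soon
```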

Split string into a list, but keeping the split pattern

Thanks to Mark Wilkins for the inspiration, but here's a shorter bit of code for doing it:

irb(main):015:0> s = "split on the word on okay?"
=> "split on the word on okay?"
irb(main):016:0> b=[]; s.split(/(on)/).each_slice(2) { |s| b << s.join }; b
=> ["split on", " the word on", " okay?"]

or:

s.split(/(on)/).each_slice(2).map(&:join)

See below the fold for an explanation.


Here's how this works. First, we split on "on", but wrap it in parentheses to make it into a match group. When there's a match group in the regular expression passed to split, Ruby will include that group in the output:

s.split(/(on)/)
# => ["split ", "on", " the word ", "on", " okay?"]

Now we want to join each instance of "on" with the preceding string. each_slice(2) helps by passing two elements at a time to its block. Let's invoke each_slice(2) on its own to see the result. Since each_slice returns an Enumerator when invoked without a block, we'll call to_a on it to see what it will enumerate over:

s.split(/(on)/).each_slice(2).to_a
# => [["split ", "on"], [" the word ", "on"], [" okay?"]]

We're getting close. Now all we have to do is join the words together. And that gets us to the full solution above. I'll unwrap it into individual lines to make it easier to follow:

b = []
s.split(/(on)/).each_slice(2) do |pair|
  b << pair.join
end
b
# => ["split on", " the word on", " okay?"]

But there's a nifty way to eliminate the temporary b and shorten the code considerably:

s.split(/(on)/).each_slice(2).map do |a|
  a.join
end

map passes each element of its input array to the block; the result of the block becomes the new element at that position in the output array. In MRI >= 1.8.7, you can shorten it even more, to the equivalent:

s.split(/(on)/).each_slice(2).map(&:join)
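As a cross-check, the sketch below compares this approach with a lookbehind split on the same sample string. Both give the same array for this input; the method names join_pairs and split_after_on are hypothetical:

```ruby
# Approach from this answer: capture the delimiter, then glue each
# delimiter back onto the piece that precedes it.
def join_pairs(text)
  text.split(/(on)/).each_slice(2).map(&:join)
end

# Alternative: a zero-width lookbehind cuts right after each "on".
def split_after_on(text)
  text.split(/(?<=on)/)
end

s = "split on the word on okay?"

p join_pairs(s)     # => ["split on", " the word on", " okay?"]
p split_after_on(s) # => ["split on", " the word on", " okay?"]
```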

Split string by multiple delimiters

word = "Now is the,time for'all good people"
word.split(/[\s,']/)
# => ["Now", "is", "the", "time", "for", "all", "good", "people"]
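One thing to watch with a character class like this: two delimiters in a row (for example a comma followed by a space) produce an empty string between them. A small sketch, with the hypothetical method name split_words, that adds + so a run of delimiters counts as one:

```ruby
# [\s,']+ treats any run of whitespace, commas and apostrophes as a
# single delimiter, so no empty strings appear between them.
def split_words(text)
  text.split(/[\s,']+/)
end

p "the, time".split(/[\s,']/) # => ["the", "", "time"]
p split_words("the, time")    # => ["the", "time"]
p split_words("Now is the,time for'all good people")
# => ["Now", "is", "the", "time", "for", "all", "good", "people"]
```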

split string into array without deleting delimiter ruby

You may use the following regex with .split:

/(?!\A)(?=^@\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3})/

It is a combination of two lookaheads:

  • (?!\A) - not at the start of the string
  • (?=^@\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3}) - a start-of-line position that is followed by the @\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3} pattern, which matches @, then a date-like pattern, then a space, and then a time-like pattern.

See the regex demo.
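A short sketch of the pattern in use. The log text, the constant name TIMESTAMP_BOUNDARY and the method name split_log_entries are invented for illustration; only the regex comes from the answer:

```ruby
# Zero-width boundary in front of every "@YYYY-MM-DD HH:MM:SS,mmm" line
# start, except at the very beginning of the string.
TIMESTAMP_BOUNDARY = /(?!\A)(?=^@\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3})/

def split_log_entries(log)
  log.split(TIMESTAMP_BOUNDARY)
end

# Hypothetical log with one entry that spans two lines.
log = "@2020-01-02 03:04:05,678 first entry\n  continued line\n" \
      "@2020-01-02 03:04:06,000 second entry\n"

p split_log_entries(log)
# => ["@2020-01-02 03:04:05,678 first entry\n  continued line\n",
#     "@2020-01-02 03:04:06,000 second entry\n"]
```

Because the match is zero-width, each entry keeps its own timestamp and any continuation lines that follow it.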

Split a string with multiple delimiters in Ruby

What about the following:

options.gsub(/ or /i, ",").split(",").map(&:strip).reject(&:empty?)
  • replaces every delimiter other than , (here " or ", matched case-insensitively) with a ,
  • splits the string at each ,
  • trims each resulting string, since an item like ice cream might be left with a leading space
  • removes all blank strings
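The steps above can be traced on a sample input. The string below and the method name parse_options are made up for illustration:

```ruby
# Normalise " or " to a comma, split on commas, trim each item and
# drop the blanks left behind by doubled delimiters.
def parse_options(options)
  options.gsub(/ or /i, ",").split(",").map(&:strip).reject(&:empty?)
end

options = "pizza, ice cream or nachos,, salad OR soup" # hypothetical input

p options.gsub(/ or /i, ",")
# => "pizza, ice cream,nachos,, salad,soup"
p parse_options(options)
# => ["pizza", "ice cream", "nachos", "salad", "soup"]
```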

