Ruby String Split with Regex

Ruby String split with regex

I think this would do it:

a.split(/\.(?=[\w])/)

I don't know how much you know about regex, but the (?=[\w]) is a lookahead that says "only match the dot if the next character is a word character" (a letter, digit, or underscore). A lookahead doesn't consume the text it matches; it only "looks". So the result is exactly what you're looking for:

> a.split(/\.(?=[\w])/)
=> ["foo", "bar", "size", "split('.')", "last"]
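To see what the lookahead buys you, here is a small comparison against a plain split on every dot. The value of a is an assumption, reconstructed from the output shown above.

```ruby
# Assumed input, inferred from the result shown above.
a = "foo.bar.size.split('.').last"

# Splitting on every dot also cuts the quoted '.' argument in half.
every_dot = a.split(/\./)

# Splitting only on dots followed by a word character leaves it intact.
before_word = a.split(/\.(?=\w)/)

p every_dot   # => ["foo", "bar", "size", "split('", "')", "last"]
p before_word # => ["foo", "bar", "size", "split('.')", "last"]
```

Note that \w on its own behaves the same as [\w] here; the brackets are optional.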

Ruby: how to split a string on a regex while keeping the delimiters?

If split's pattern contains a capture group, the group will be included in the resulting array.

str = "oruh43451rohcs56oweuex59869rsr"
str.split(/(\d+)/)
# => ["oruh", "43451", "rohcs", "56", "oweuex", "59869", "rsr"]

If you want each piece paired with the delimiter that follows it,

str.split(/(\d+)/).each_slice(2).to_a
# => [["oruh", "43451"], ["rohcs", "56"], ["oweuex", "59869"], ["rsr"]]
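As a rough illustration of the difference the capture group makes, the sketch below runs the same split with no group, with a non-capturing group, and with a capturing group. The variable names are only for illustration.

```ruby
str = "oruh43451rohcs56oweuex59869rsr"

# No group, or a non-capturing group: the digits are discarded.
without_group = str.split(/\d+/)
non_capturing = str.split(/(?:\d+)/)

# Capturing group: the digits are kept as their own elements.
with_group = str.split(/(\d+)/)

# Glue each piece back onto the delimiter that follows it.
rejoined = with_group.each_slice(2).map(&:join)

p without_group # => ["oruh", "rohcs", "oweuex", "rsr"]
p with_group    # => ["oruh", "43451", "rohcs", "56", "oweuex", "59869", "rsr"]
p rejoined      # => ["oruh43451", "rohcs56", "oweuex59869", "rsr"]
```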

How can I use regex in Ruby to split a string into an array of the words it contains?

You may use a matching approach to extract chunks of 2 or more uppercase letters, or any single letter followed by zero or more lowercase letters:

s.scan(/\p{Lu}{2,}|\p{L}\p{Ll}*/).map(&:downcase)


The regex matches:

  • \p{Lu}{2,} - 2 or more uppercase letters
  • | - or
  • \p{L} - any letter
  • \p{Ll}* - 0 or more lowercase letters.

With map(&:downcase), the items you get with .scan() are turned to lower case.
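A quick run on a made-up input shows how the two alternatives divide the work. The sample string is an assumption, since the original input isn't shown.

```ruby
# Hypothetical sample input mixing camelCase, an acronym and a space.
s = "fooBarBAZ qux"

# Runs of 2+ uppercase letters, or one letter plus trailing lowercase letters.
words = s.scan(/\p{Lu}{2,}|\p{L}\p{Ll}*/).map(&:downcase)

p words # => ["foo", "bar", "baz", "qux"]
```

One thing to watch for: because the uppercase run is greedy, an acronym directly followed by a capitalized word (e.g. "XMLHttp") would swallow the first letter of that word.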

Ruby Split string at character difference using regex

str = "111333224456777"

str.scan /0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/
#=> ["111", "333", "22", "44", "5", "6", "777"]

or

str.gsub(/(\d)\1*/).to_a
#=> ["111", "333", "22", "44", "5", "6", "777"]

The latter uses the (underused) form of String#gsub that takes one argument and no block, returning an enumerator. It merely generates matches and has nothing to do with character replacement.
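A small sketch of that enumerator form, assuming the same str as above. Since it is an ordinary Enumerator, other Enumerable methods work on it too.

```ruby
str = "111333224456777"

# With a pattern and no block, gsub returns an Enumerator over the matches.
runs = str.gsub(/(\d)\1*/)

run_list = runs.to_a
lengths  = runs.map(&:length)

p runs.class # => Enumerator
p run_list   # => ["111", "333", "22", "44", "5", "6", "777"]
p lengths    # => [3, 3, 2, 2, 1, 1, 3]
p str        # => "111333224456777" (the original string is unchanged)
```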


For fun, here are several other ways to do that.

str.scan(/((\d)\2*)/).map(&:first)
str.split(/(?<=(.))(?!\1)/).each_slice(2).map(&:first)
str.each_char.slice_when(&:!=).map(&:join)
str.each_char.chunk(&:itself).map { |_,a| a.join }
str.each_char.chunk_while(&:==).map(&:join)
str.gsub(/(?<=(.))(?!\1)/, ' ').split
str.gsub(/(.)\1*/).reduce([], &:<<)
str[1..-1].each_char.with_object([str[0]]) {|c,a| a.last[-1]==c ? (a.last<<c) : a << c}

Splitting string in Ruby on list of words using regex

I'd do it this way:

str = "The force be with you."     
stop_array = %w[the with]
stopwords_regex = /(?:#{ Regexp.union(stop_array).source })/i
str.split(stopwords_regex).map(&:strip) # => ["", "force be", "you."]

When using Regexp.union, it's important to watch out for the actual pattern that is generated:

/(?:#{ Regexp.union(stop_array) })/i
=> /(?:(?-mix:the|with))/i

The embedded (?-mix: turns off the case-insensitive flag inside the pattern, which can break the pattern, causing it to grab the wrong things. Instead, you have to tell the engine to return just the pattern, without the flags:

/(?:#{ Regexp.union(stop_array).source })/i
=> /(?:the|with)/i
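To make the difference concrete, here is a short check of both forms against a capitalized word. The variable names are just for illustration.

```ruby
stop_array = %w[the with]
union = Regexp.union(stop_array)

p union.to_s   # => "(?-mix:the|with)"  (what plain interpolation embeds)
p union.source # => "the|with"          (pattern text only, no flags)

embedded    = /(?:#{union})/i
source_only = /(?:#{union.source})/i

# The embedded flags switch case-insensitivity back off inside the group.
embedded_hit = embedded.match?("The")
source_hit   = source_only.match?("The")

p embedded_hit # => false
p source_hit   # => true

# An equivalent way to build the case-insensitive pattern.
rebuilt = Regexp.new(union.source, Regexp::IGNORECASE)
```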

Here's why pattern = "(?:\bthe\b|\bwith\b)" doesn't work:

/#{pattern}/i # => /(?:\x08the\x08|\x08with\x08)/i

Ruby sees "\b" as a backspace character. Instead use:

pattern = "(?:\\bthe\\b|\\bwith\\b)"
/#{pattern}/i # => /(?:\bthe\b|\bwith\b)/i
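If the goal is word boundaries around the stop words, one way to avoid the escaping problem altogether is to put \b in the regex literal and interpolate only the word list. This is a sketch with a made-up sentence, not necessarily what the asker needed.

```ruby
# In a double-quoted string "\b" is one backspace byte; single quotes keep both characters.
p "\b".bytes # => [8]
p '\b'.bytes # => [92, 98]

words = %w[the with]
str = "The other path lies with them" # hypothetical input

# \b inside a regex literal is a word boundary.
bounded   = /\b(?:#{Regexp.union(words).source})\b/i
unbounded = /(?:#{Regexp.union(words).source})/i

bounded_parts   = str.split(bounded).map(&:strip)
unbounded_parts = str.split(unbounded).map(&:strip)

p bounded_parts   # => ["", "other path lies", "them"]
p unbounded_parts # => ["", "o", "r path lies", "", "m"]
```

Without the boundaries, the "the" inside "other" and "them" is treated as a delimiter too.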

How to split string using regex to split between +,-,*,/ symbols?

I think this could be useful:

"1.2+3.453".split('+').flat_map{|elem| [elem, "+"]}[0...-1]
# => ["1.2", "+", "3.453"]
"1.2+3.453".split('+').flat_map{|elem| [elem.to_f, "+"]}[0...-1]
# => [1.2, "+", 3.453]

Obviously this works only for +, but you can change the split character.

EDIT:

This version works for every operator:

"1.2+3.453".split(%r{([-+*/])}).map do |x|
  # Operators stay as strings; everything else is converted to a float.
  x =~ %r{\A[-+*/]\z} ? x : x.to_f
end
# => [1.2, "+", 3.453]
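For a longer expression, the same idea might look like the sketch below. The expression and variable names are made up for illustration, and it assumes plain unsigned numbers (a leading minus sign would produce an empty first piece).

```ruby
operators = %r{([-+*/])}
expr = "1.2+3.453*2-4/5" # hypothetical input

# The capture group keeps each operator; other pieces become floats.
tokens = expr.split(operators).map do |tok|
  tok.match?(operators) ? tok : tok.to_f
end

p tokens # => [1.2, "+", 3.453, "*", 2.0, "-", 4.0, "/", 5.0]
```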

Split string with regex group

As for the initial empty string, it is because the original purpose of split is to delimit a string into fields with a delimiter. It always assumes that there is a field before a delimiter, even if it is empty. As for the other empty strings, it is because the delimiters are adjacent.
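A tiny example of both effects, using a made-up comma-separated string. It also shows that trailing empty fields are dropped unless you pass a negative limit.

```ruby
# A leading delimiter and adjacent delimiters both produce empty fields.
leading = ",a,,b".split(",")

# Trailing empty fields are removed by default...
trailing = "a,b,,".split(",")

# ...unless a negative limit is given.
kept = "a,b,,".split(",", -1)

# Dropping the empty fields afterwards.
cleaned = leading.reject(&:empty?)

p leading  # => ["", "a", "", "b"]
p trailing # => ["a", "b"]
p kept     # => ["a", "b", "", ""]
p cleaned  # => ["a", "b"]
```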

Best way to split an array of strings in ruby?

You need map{...}, not map(...) for correct syntax in Ruby here:

array = ["string_apple", "string_banana", "string_orange"]

# Assign to a different array:
split_array = array.map{ |s| s.split(/_/) }

# Or modify the original array in place:
array.map!{ |s| s.split(/_/) }

# [["string", "apple"], ["string", "banana"], ["string", "orange"]]
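If you only need one part of each string, or the strings may contain more than one underscore, something along these lines may help. The second array is a made-up example.

```ruby
array = ["string_apple", "string_banana", "string_orange"]

# Keep only the part after the underscore.
names = array.map { |s| s.split("_").last }

# A limit of 2 splits on the first underscore only.
limited = ["string_green_apple"].map { |s| s.split("_", 2) }

p names   # => ["apple", "banana", "orange"]
p limited # => [["string", "green_apple"]]
```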

split string into array without deleting delimiter ruby

You may use the following regex with .split:

/(?!\A)(?=^@\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3})/

It is a combination of two lookaheads:

  • (?!\A) - not at the start of the string
  • (?=^@\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3}) - a position at the start of a line that is followed by @, then a date-like pattern, a space, and then a time-like pattern.
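Here is a minimal sketch of that regex in use. The log text is invented for illustration, since the original input isn't shown.

```ruby
# Hypothetical log text; timestamps and messages are made up.
log = "@2018-01-01 10:00:00,123 first entry\n" \
      "  continuation line\n" \
      "@2018-01-01 10:00:01,456 second entry"

entry_start = /(?!\A)(?=^@\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3})/

# Both lookarounds are zero-width, so nothing is removed from the entries.
entries = log.split(entry_start)

p entries.length # => 2
p entries.first  # => "@2018-01-01 10:00:00,123 first entry\n  continuation line\n"
p entries.last   # => "@2018-01-01 10:00:01,456 second entry"
```

Because the split points are zero-width, joining the pieces back together gives the original text.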


How to split a string without getting an empty string inserted in the array

The empty element will always be there if you get a match, because the captured part appears at the beginning of the string and the string between the start of the string and the match is added to the resulting array, be it an empty or non-empty string. Either shift/drop it once you get a match, or just remove all empty array elements with .reject { |c| c.empty? } (see How do I remove blank elements from an array?).

Then, 14- is eaten up (consumed) by the \d+[[:space:]]... pattern part - put it into a (?=...) lookahead that will just check for the pattern match, but won't consume the characters.

Use something like

MY_SEPARATOR_TOKENS = ["-", " to "]
s = "M14-19"
p s.split(/^(m|f)(?=\d+[[:space:]]*#{Regexp.union(MY_SEPARATOR_TOKENS)})/i).drop(1)
#=> ["M", "14-19"]
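To illustrate both points separately, the sketch below uses simplified patterns (not the exact one above) on the same input: one that only captures the prefix, and one that also consumes the digits and the dash.

```ruby
s = "M14-19"

# The match sits at the very start, so the first element is an empty string.
raw = s.split(/^(m|f)/i)

# Two ways to get rid of it.
dropped  = raw.drop(1)
rejected = raw.reject(&:empty?)

# Without a lookahead, "14-" is consumed as part of the delimiter and lost.
consumed = s.split(/^(m|f)\d+-/i)

p raw      # => ["", "M", "14-19"]
p dropped  # => ["M", "14-19"]
p rejected # => ["M", "14-19"]
p consumed # => ["", "M", "19"]
```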



