Ruby String Split with Terminal Strings Empty

You need to say:

string.split(',',-1)

to avoid omitting the trailing blanks.

See also: Why does Ruby String#split not treat consecutive trailing delimiters as separate entities?

The second parameter is the "limit" parameter, documented at http://ruby-doc.org/core-2.0.0/String.html#method-i-split as follows:

If the "limit" parameter is omitted, trailing null fields are
suppressed. If limit is a positive number, at most that number of
fields will be returned (if limit is 1, the entire string is returned
as the only entry in an array). If negative, there is no limit to the
number of fields returned, and trailing null fields are not
suppressed.
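To make the three cases concrete, here is a small sketch. The sample string and variable names are made up for illustration:

```ruby
line = "a,b,,"

# Limit omitted: trailing empty fields are dropped.
default_fields = line.split(",")      # => ["a", "b"]

# Positive limit: at most that many fields; the remainder stays unsplit.
limited_fields = line.split(",", 2)   # => ["a", "b,,"]

# Negative limit: no cap on the field count, trailing empties are kept.
all_fields = line.split(",", -1)      # => ["a", "b", "", ""]
```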

Ignore empty captures when splitting string

When splitting with a regex containing capturing groups, consecutive matches always produce empty array items.

Rather than switch to a matching approach, use

arr = arr.reject { |c| c.empty? }

Or any other method, see How do I remove blank elements from an array?
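As a rough illustration (the input string here is an assumption, not taken from the question), two adjacent delimiters leave an empty string between them, and reject drops it:

```ruby
# The capture group makes split keep the delimiters in the result.
parts = "a**b".split(/(\*)/)
# => ["a", "*", "", "*", "b"]

# Drop the empty string that sits between the two adjacent matches.
cleaned = parts.reject(&:empty?)
# => ["a", "*", "*", "b"]
```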

Otherwise, you will have to match the substrings with a regex that matches the delimiters first and then any text that does not start a delimiter (that is, you will need to build a tempered greedy token):

arr = s.scan(/(?x)\*{2}|[*\n.]|(?:(?!\*{2})[^*\n.])+/)

See the regex demo.

Here,

  • (?x) - a freespacing/comment modifier
  • \*{2} - ** substring
  • | - or
  • [*\n.] - a char that is either *, newline LF or a .
  • | - or
  • (?:(?!\*{2})[^*\n.])+ - 1 or more (+) chars that are not *, LF or . ([^*\n.]) that do not start a ** substring.
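Applied to a made-up input (the sample string below is only an assumption for illustration), the pattern above returns delimiters and text runs as separate items, with no empty strings:

```ruby
sample = "foo**bar.baz*qux\nend"

# Delimiters (**, *, newline, .) and the text between them each
# come back as their own element.
tokens = sample.scan(/(?x)\*{2}|[*\n.]|(?:(?!\*{2})[^*\n.])+/)
# => ["foo", "**", "bar", ".", "baz", "*", "qux", "\n", "end"]
```

Because the pattern contains no capturing groups, scan returns a flat array of strings.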

Splitting into empty substrings

Logic is described in the documentation:

If the limit parameter is omitted, trailing null fields are suppressed.

Trailing empty fields are removed, but not leading ones.


If, by any chance, what you were asking is "yeah, but where's the logic in that?", then imagine we're parsing some CSV.

fname,sname,id,email,status
,,1,sergio@example.com,

We want the first two positions to remain empty (rather than be removed, which would make fname become 1 and sname become sergio@example.com).

We care less about trailing empty fields. Removed or kept, they don't shift data.
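A quick sketch with the CSV row above shows the difference. The variable names are just for illustration, and a real CSV would be better handled by a proper parser:

```ruby
header = "fname,sname,id,email,status".split(",")
row = ",,1,sergio@example.com,"

# Leading empty fields are kept, so id and email stay in position.
default_fields = row.split(",")
# => ["", "", "1", "sergio@example.com"]

# A negative limit also keeps the trailing empty status field.
all_fields = row.split(",", -1)
# => ["", "", "1", "sergio@example.com", ""]

# Pair each column name with its value.
record = header.zip(all_fields).to_h
```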

Why does Ruby String#split not treat consecutive trailing delimiters as separate entities?

You need to pass a negative value as the second parameter to split. This prevents it from suppressing trailing null fields:

"w$x$$\r\n".chomp.split('$', -1)
# => ["w", "x", "", ""]

See the docs on split.

The letter disappeared after splitting a string in my Ruby program

It looks like you were expecting String#tr to behave like String#gsub.

Calling string.tr("GPS:", '') does not replace the complete string "GPS:" with the empty string. Instead, it replaces any character from within the string "GPS:" with an empty string. Commonly you will find .tr() called with an equal number of input and replacement characters, and in that case each input character is replaced by the output character in the corresponding position. But calling it with only the empty string '' as its translation argument deletes any of G, P, S, : from anywhere within the string.

>> "String with S and G and a: P".tr("GPS:", '')
=> "tring with  and  and a "

Instead, use .gsub('GPS:', '') to replace the complete match as a group.

string = "GPS:3;S23.164865;E113.428970;88"
info = string.gsub('GPS:', '')
info_array = info.split(";")
puts "GPS: #{info_array[0]},#{info_array[1]},#{info_array[2]}"

# prints
GPS: 3,S23.164865,E113.428970

Here .gsub() is called with a string argument, although it is probably more often called with a regexp pattern.
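To see the difference side by side, here is a short sketch. Using sub with an anchored regexp is my own suggestion rather than something from the question; it removes the prefix only when it appears at the very start:

```ruby
raw = "GPS:3;S23.164865;E113.428970;88"

# tr works per character: every G, P, S and : is deleted,
# including the S in front of the latitude.
deleted_chars = raw.tr("GPS:", "")
# => "3;23.164865;E113.428970;88"

# With equal-length arguments, tr maps characters by position.
translated = "hello".tr("el", "ip")
# => "hippo"

# sub with \A only removes "GPS:" at the start of the string.
prefix_removed = raw.sub(/\AGPS:/, "")
# => "3;S23.164865;E113.428970;88"
```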

Split a string into a string and an integer

Use a regex built from lookbehind and lookahead assertions in string.split.

> "10480ABCD".split(/(?<=\d)(?=[A-Za-z])/)
=> ["10480", "ABCD"]
  • (?<=\d) - a positive lookbehind, which asserts that the match must be preceded by a digit.
  • (?=[A-Za-z]) - a positive lookahead, which asserts that the match must be followed by an ASCII letter.

Together, these match the zero-width boundary between a digit and a letter. Splitting the input at that boundary gives the desired output.

OR

Use string.scan

> "10480ABCD".scan(/\d+|[A-Za-z]+/)
=> ["10480", "ABCD"]
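Both approaches return strings. If you actually need the numeric part as an integer, something along these lines should work (the variable names are only for illustration):

```ruby
number_text, letters = "10480ABCD".split(/(?<=\d)(?=[A-Za-z])/)

# Integer() raises ArgumentError on malformed input,
# whereas to_i would silently return 0.
number = Integer(number_text)
# number  => 10480
# letters => "ABCD"
```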

Split a string with multiple delimiters in Ruby

What about the following:

options.gsub(/ or /i, ",").split(",").map(&:strip).reject(&:empty?)
  • replaces every " or " delimiter (case-insensitively) with a comma, so that , is the only delimiter left
  • splits the string at ,
  • strips each element, since something like " ice cream" may be left with a leading space
  • removes all empty strings
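Here is how the chain behaves step by step on a sample input. The options string below is invented for illustration:

```ruby
options = "cake, ice cream or pie, ,Cookies OR tea"

# Step 1: turn every " or " (any case) into a comma.
normalized = options.gsub(/ or /i, ",")
# => "cake, ice cream,pie, ,Cookies,tea"

# Steps 2-4: split on commas, strip whitespace, drop empty entries.
choices = normalized.split(",").map(&:strip).reject(&:empty?)
# => ["cake", "ice cream", "pie", "Cookies", "tea"]
```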

Can't split/strip by space in a string in Ruby because it's an NBSP character

You should split on all whitespaces, including the non-ASCII ones:

a, b = str.split(/[[:space:]]/)

I'm assuming you are using Ruby 1.9+ and that your str has the right encoding (e.g. utf-8). As explained in the regex reference, \s matches only ASCII spaces, while [[:space:]] will match all unicode spaces (same for \d vs [[:digit:]], etc...)
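A minimal check, assuming a UTF-8 string with a non-breaking space (U+00A0) between the two words:

```ruby
str = "foo\u00A0bar"

# \s only knows ASCII whitespace, so nothing is split.
ascii_only = str.split(/\s/)
# => one element, the whole string

# [[:space:]] is Unicode-aware and matches the NBSP.
unicode_aware = str.split(/[[:space:]]/)
# => ["foo", "bar"]
```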

What is the best way to split a string to get all the substrings by Ruby?

def split_word s
  (0..s.length).inject([]){|ai,i|
    (1..s.length - i).inject(ai){|aj,j|
      aj << s[i,j]
    }
  }.uniq
end

And you can also consider using Set instead of Array for the result.

PS: Here's another idea, based on array product:

def split_word s
  indices = (0...s.length).to_a
  indices.product(indices).reject{|i,j| i > j}.map{|i,j| s[i..j]}.uniq
end
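For comparison, the same idea can be sketched with flat_map, which some may find easier to read. The method name all_substrings is hypothetical, and this is just one possible variant:

```ruby
# Collect every substring by start index and length, then drop duplicates.
def all_substrings(text)
  (0...text.length).flat_map do |start|
    (1..text.length - start).map { |len| text[start, len] }
  end.uniq
end

substrings = all_substrings("abc")
# => ["a", "ab", "abc", "b", "bc", "c"]
```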

Using string .split (and a regular expression) to check for inner quotes

You want to use this regular expression (see on rubular.com):

/"[^"]*"|'[^']*'|[^"'\s]+/

This regex matches the tokens instead of the delimiters, so you'd want to use scan instead of split.

The […] construct is called a character class. [^"] is "anything but the double quote".

There are essentially 3 alternates:

  • "[^"]*" - double quoted token (may include spaces and single quotes)
  • '[^']*' - single quoted token (may include spaces and double quotes)
  • [^"'\s]+ - a token consisting of one or more of anything but quotes and whitespaces

References

  • regular-expressions.info/Character Class

Snippet

Here's a Ruby implementation:

s = %_foobar "your mom"bar'test course''test lesson'asdf_
puts s

puts s.scan(/"[^"]*"|'[^']*'|[^"'\s]+/)

The above prints (as seen on ideone.com):

foobar "your mom"bar'test course''test lesson'asdf
foobar
"your mom"
bar
'test course'
'test lesson'
asdf

See also

  • Which style of Ruby string quoting do you favour?

