Ruby Split by Whitespace

Ruby split by whitespace

The following should work for the example you gave:

str.gsub(/\s+/m, ' ').strip.split(" ")

It returns:

["aa", "bbb", "cc", "dd", "ee"]

Meaning of code:

/\s+/m is the more complicated part. \s matches a whitespace character (space, tab, newline and so on), so \s+ means one or more whitespace characters. The m after the closing slash is a modifier. In Ruby it turns on "multiline" mode, which makes . match newlines too. Because \s already matches newlines, the modifier has no effect on this particular pattern.
So /\s+/m matches every run of one or more whitespace characters.

gsub replaces every match, so each run of whitespace becomes a single space.

strip is the equivalent of trim in other languages: it removes leading and trailing whitespace (including newlines) from the string.
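To see each step, here is a small sketch that prints the intermediate results for a made-up input. It also compares them with String#split called with no argument, which (per the Ruby documentation) splits on runs of whitespace and ignores leading and trailing whitespace, so it reaches the same result in one call.

```ruby
# Made-up input mixing spaces, tabs and newlines.
str = "\n  aa bbb\n\tcc    dd ee\n\n"

collapsed = str.gsub(/\s+/m, ' ') # every whitespace run becomes one space
trimmed   = collapsed.strip       # drop the leading/trailing space
words     = trimmed.split(" ")    # split on the remaining single spaces

p collapsed # => " aa bbb cc dd ee "
p trimmed   # => "aa bbb cc dd ee"
p words     # => ["aa", "bbb", "cc", "dd", "ee"]

# String#split with no argument gives the same array directly.
p str.split # => ["aa", "bbb", "cc", "dd", "ee"]
```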

While writing this explanation, I realized you could still end up with an end-of-line character at the beginning or end of the string.

To be safe, the code could be written as:

str.gsub(/\s+/m, ' ').gsub(/^\s+|\s+$/m, '').split(" ")

So if you had:

str = "\n     aa bbb\n    cc    dd ee\n\n"

Then you'd get:

["aa", "bbb", "cc", "dd", "ee"]

Meaning of new code:

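^\s+ matches a run of whitespace at the start of a line, and \s+$ matches a run of whitespace at the end of a line.

In Ruby, ^ and $ always anchor to line boundaries (the /m modifier does not change that; \A and \z anchor to the whole string). Because the first gsub has already turned every newline into a space, here they can only match at the beginning and end of the string.

So gsub(/^\s+|\s+$/m, '') removes any run of whitespace at the beginning and at the end of the string.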
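The difference between line anchors and string anchors shows up if you apply that second gsub to text that still contains newlines. A short sketch with a made-up two-line string: ^ and $ strip whitespace at every line boundary, while \A and \z strip it only at the two ends of the string.

```ruby
s = "  a\n  b  "

line_anchored   = s.gsub(/^\s+|\s+$/, '')   # ^ and $ match at each line
string_anchored = s.gsub(/\A\s+|\s+\z/, '') # \A and \z match only at the ends

p line_anchored   # => "a\nb"
p string_anchored # => "a\n  b"
```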

Split string by whitespaces, ignoring escaped whitespaces

If your strings contain no escape sequences other than escaped whitespace, you can use a splitting approach with

.split(/(?<!\\)\s+/)

Here, (?<!\\)\s+ matches one or more whitespace characters (\s+) that are not preceded by a backslash.
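A quick check with a made-up input, written with single quotes so that the backslash stays a literal character:

```ruby
# In a single-quoted Ruby string, '\ ' is a backslash followed by a space.
line = 'foo\ bar baz'

parts = line.split(/(?<!\\)\s+/)
p parts # => ["foo\\ bar", "baz"]
```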

If your strings may contain other escape sequences (such as an escaped backslash, \\), a matching approach is more reliable:

.scan(/(?:[^\\\s]|\\.)+/)


It matches a run of one or more items, where each item is either a character other than a backslash or whitespace ([^\\\s]) or an escape sequence (\\., a backslash followed by any character except a line break).
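A sketch of where the two approaches differ, using a made-up string whose text is one\ two three\\ four: an escaped space, then an escaped backslash followed by a real space. The lookbehind only sees the backslash right before that real space, so split wrongly treats it as escaped; scan consumes \\ as one escape sequence and splits where it should.

```ruby
# Made-up input. In single quotes '\ ' stays backslash + space, and '\\\\'
# becomes the two characters \\ (an escaped backslash).
text = 'one\ two three\\\\ four'

split_result = text.split(/(?<!\\)\s+/)
scan_result  = text.scan(/(?:[^\\\s]|\\.)+/)

p split_result # => ["one\\ two", "three\\\\ four"]
p scan_result  # => ["one\\ two", "three\\\\", "four"]
```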

To remove the backslashes afterwards, you will need an extra gsub on each token.
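One way to do that, as a sketch: replace each backslash-plus-character pair with the character alone. The /\\(.)/ pattern below is an assumption about how you want escapes resolved (each backslash followed by x becomes x), not part of the original answer.

```ruby
text   = 'one\ two three\\\\ four' # made-up input, as text: one\ two three\\ four
tokens = text.scan(/(?:[^\\\s]|\\.)+/)

# Assumed unescaping rule: backslash + any character -> that character,
# so an escaped space becomes a space and an escaped backslash becomes one backslash.
unescaped = tokens.map { |t| t.gsub(/\\(.)/, '\1') }
p unescaped # => ["one two", "three\\", "four"]
```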

Preserving whitespace with .split()?

The Ruby doc for pattern-based splitting says:

If pattern is a String, then its contents are used as the delimiter
when splitting str. If pattern is a single space, str is split on
whitespace, with leading and trailing whitespace and runs of
contiguous whitespace characters ignored.

In other words, split(" ") treats any run of whitespace as a single separator and ignores leading and trailing whitespace:

"hello   world".split(" ")  # => ["hello", "world"]

For a Regexp pattern, the documentation says:

If pattern is a Regexp, str is divided where the pattern matches.
Whenever the pattern matches a zero-length string, str is split into
individual characters. If pattern contains groups, the respective
matches will be returned in the array as well.

Consequently, split(/ /) treats every single space as a separate split point, and split(/(\s+)/) (as proposed by darclander) also includes each run of whitespace as its own element in the result. Illustrating this with underscores instead of spaces:

"hello___world".split(/_/)  # =>  ["hello", "", "", "world"]
"hello___world".split(/(_+)/) # => ["hello", "___", "world"]

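The same comparison with real whitespace, on a made-up string with leading and trailing spaces and a tab. Note that split drops trailing empty strings but keeps leading ones.

```ruby
s = "  hello \t world  "

p s.split(" ")     # => ["hello", "world"]
p s.split(/ /)     # => ["", "", "hello", "\t", "world"]
p s.split(/(\s+)/) # => ["", "  ", "hello", " \t ", "world", "  "]
```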
Since reversing a run of spaces yields the same run, a quick way to reverse each word while keeping the original spacing is:

"hello   world".split(/(\s+)/).map(&:reverse).join  # => "olleh   dlrow"

As Cary Swoveland pointed out, map(&:reverse) also reverses the whitespace runs, which changes the output when a run mixes different whitespace characters (a tab followed by a space becomes a space followed by a tab). To avoid that, replace map(&:reverse) with a block that leaves whitespace alone and reverses only the words, such as map { |s| s.strip.empty? ? s : s.reverse }.
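A small sketch of the difference on a made-up input with a tab followed by a space between the words:

```ruby
s = "ab\t cd" # made-up input: tab + space between the words

naive   = s.split(/(\s+)/).map(&:reverse).join
careful = s.split(/(\s+)/).map { |t| t.strip.empty? ? t : t.reverse }.join

p naive   # => "ba \tdc"  (the whitespace run is reversed too)
p careful # => "ba\t dc"  (the whitespace run is kept as-is)
```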

Split string in ruby based on whitespace

In Ruby, this is simply String#split:

# irb, Ruby 2.2.3
str = 'I am a student of xyz University'
str.split
# => ["I", "am", "a", "student", "of", "xyz", "University"]

Split a string by whitespace but retain \n - Ruby

Michael Berkowski's comment on your question is correct.

To split on spaces only, so that newline characters stay inside the tokens, use a regular expression:

"Lorem ipsum\ndolor sit amet".split(/ /)
#=> ["Lorem", "ipsum\ndolor", "sit", "amet"]
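For comparison, a sketch on the same text with a doubled space added (made up): split(" ") also breaks at the newline, split(/ /) yields an empty string for the doubled space, and / +/ (one or more spaces; an assumption about what you want) treats a run of spaces as one separator while still keeping the newline inside a token.

```ruby
text = "Lorem  ipsum\ndolor sit amet"

p text.split(" ")  # => ["Lorem", "ipsum", "dolor", "sit", "amet"]
p text.split(/ /)  # => ["Lorem", "", "ipsum\ndolor", "sit", "amet"]
p text.split(/ +/) # => ["Lorem", "ipsum\ndolor", "sit", "amet"]
```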

Ruby: Split, then remove leading/trailing whitespace in place?


s = "one thing, two things, three things, four things"
s.split(",").map(&:strip)
# => ["one thing", "two things", "three things", "four things"]

Benchmark on Ubuntu 13.04 with Ruby 2.0.0p0:

require 'benchmark'

s = "one thing, two things, three things, four things"
result = ""

Benchmark.bmbm do |b|
  b.report("strip/split: ") { 1_000_000.times { result = s.split(",").map(&:strip) } }
  b.report("regex: ") { 1_000_000.times { result = s.split(/\s*,\s*/) } }
end

Rehearsal -------------------------------------------------
strip/split: 6.260000 0.000000 6.260000 ( 6.276583)
regex: 7.310000 0.000000 7.310000 ( 7.320001)
--------------------------------------- total: 13.570000sec

user system total real
strip/split: 6.350000 0.000000 6.350000 ( 6.363127)
regex: 7.290000 0.000000 7.290000 ( 7.302163)

Split string by every first space in ruby

Using scan may give you the expected output more easily:

p str.scan(/[A-Z]+|\s{3}/)
# ["AB", "C", "   ", "D", "E", "   ", "F"]

Since your input contains only uppercase letters, [A-Z] works; to match both cases, use [a-z] with the i flag (/[a-z]+|\s{3}/i).

If you really want the nested output, map split over the result:

p str.scan(/[A-Z]+|\s{3}/).map(&:split)
# [["AB"], ["C"], [], ["D"], ["E"], [], ["F"]]
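The original string isn't shown here. As an assumption, a hypothetical input like "AB C   D E   F" (single spaces between letters, three-space gaps) produces output of that shape, and the case-insensitive variant handles mixed-case letters:

```ruby
str = "AB C   D E   F" # hypothetical input

tokens = str.scan(/[A-Z]+|\s{3}/)
p tokens              # => ["AB", "C", "   ", "D", "E", "   ", "F"]
p tokens.map(&:split) # => [["AB"], ["C"], [], ["D"], ["E"], [], ["F"]]

# Case-insensitive variant for mixed-case input (also made up).
p "ab C   d".scan(/[a-z]+|\s{3}/i) # => ["ab", "C", "   ", "d"]
```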

Ruby - split string made up of emails at space or comma

What you need is a character set, denoted by [].

@emails.split(/[,\s]+/)

The brackets match any single character in the set (a comma or whitespace), and + treats a run of such characters, such as ", " or several spaces, as a single separator.
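A sketch with a hypothetical list (a local variable instead of @emails) mixing ", ", "," and a run of spaces. One edge case worth knowing: a leading separator produces an empty first element, which you can drop with reject(&:empty?).

```ruby
# Hypothetical input mixing comma-space, bare comma and several spaces.
emails = "a@example.com, b@example.com  c@example.com,d@example.com"

p emails.split(/[,\s]+/)
# => ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]

# A leading separator yields an empty first element.
p ", a@example.com".split(/[,\s]+/)                  # => ["", "a@example.com"]
p ", a@example.com".split(/[,\s]+/).reject(&:empty?) # => ["a@example.com"]
```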

Split string by spaces into an array

String#split doesn't need an argument if you want to split on whitespace:

"15 17 21 46".split
#=> ["15", "17", "21", "46"]

If you do pass an argument, it must be a space, not an empty string:

"15 17 21 46".split(' ')
#=> ["15", "17", "21", "46"]

And to convert those strings to integers:

"15 17 21 46".split(' ').map(&:to_i)
#=> [15, 17, 21, 46]
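To see why an empty string doesn't work: with an empty-string pattern, split breaks the string into individual characters, spaces included.

```ruby
s = "15 17 21 46"

p s.split("")              # => ["1", "5", " ", "1", "7", " ", "2", "1", " ", "4", "6"]
p s.split(" ").map(&:to_i) # => [15, 17, 21, 46]
```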

