Ruby split by whitespace
The following should work for the example you gave:
str.gsub(/\s+/m, ' ').strip.split(" ")
it returns:
["aa", "bbb", "cc", "dd", "ee"]
Meaning of the code:
/\s+/m is the more complicated part. \s means a whitespace character, so \s+ means one or more whitespace characters. The m after the closing slash is a modifier. In Ruby it makes . match newline characters as well; this pattern contains no ., so the modifier has no effect here and /\s+/ behaves identically. \s already matches newlines on its own.
So /\s+/m means: find sequences of one or more whitespace characters.
gsub means replace all.
strip is the equivalent of trim in other languages, and removes whitespace from the front and end of the string.
As I was writing the explanation, I realized you could end up with a newline character at the beginning or the end of the string.
To be safe, the code could be written as:
str.gsub(/\s+/m, ' ').gsub(/^\s+|\s+$/m, '').split(" ")
So if you had:
str = "\n aa bbb\n cc dd ee\n\n"
Then you'd get:
["aa", "bbb", "cc", "dd", "ee"]
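Meaning of the new code:
^\s+ is a sequence of whitespace at the beginning of a line.
\s+$ is a sequence of whitespace at the end of a line.
In Ruby, ^ and $ anchor at line boundaries, not only at the ends of the string; \A and \z are the anchors for the start and end of the whole string. Because the first gsub has already turned every newline into a space, the string is a single line at this point, so gsub(/^\s+|\s+$/m, '') removes any whitespace at the beginning and at the end of the string, which is the same thing strip does.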
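A minimal sketch comparing the two variants, plus the difference between line anchors (^ and $) and string anchors (\A and \z). The `sample` string is a made-up input chosen only to show that difference; it is not part of the original question.

```ruby
str = "\n aa bbb\n cc   dd ee\n\n"

# Both variants produce the same array for this input, because the first
# gsub has already collapsed every newline into a single space.
with_strip = str.gsub(/\s+/, ' ').strip.split(' ')
with_gsub  = str.gsub(/\s+/, ' ').gsub(/^\s+|\s+$/, '').split(' ')

# Hypothetical input with whitespace around an inner newline.
sample = " a\n b "
line_anchored   = sample.gsub(/^\s+|\s+$/, '')    # trims at every line boundary
string_anchored = sample.gsub(/\A\s+|\s+\z/, '')  # trims only the two ends

p with_strip       # => ["aa", "bbb", "cc", "dd", "ee"]
p with_gsub        # => ["aa", "bbb", "cc", "dd", "ee"]
p line_anchored    # => "a\nb"
p string_anchored  # => "a\n b"
```

For plain whitespace splitting, `str.split` with no argument gives the same array without any preprocessing.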
Split string by whitespaces, ignoring escaped whitespaces
If your strings have no escape sequences, you may use a splitting approach with
.split(/(?<!\\)\s+/)
Here, (?<!\\)\s+ matches 1+ whitespaces (\s+) that are not preceded with \.
If your strings may contain escape sequences, a matching approach is preferable as it is more reliable:
.scan(/(?:[^\\\s]|\\.)+/)
It will match 1 or more characters other than \ and whitespace (with [^\\\s]) and any escape sequence (matched with \\., a backslash + any char other than line break chars).
To get rid of the \ symbols, you will have to use a gsub later.
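A short sketch of the matching approach followed by the gsub cleanup. The input string and the variable names are assumptions made for illustration.

```ruby
# Single quotes keep the backslashes literal: foo\ bar baz qux\ quux
str = 'foo\ bar baz qux\ quux'

# Tokens are runs of non-backslash, non-whitespace characters or
# backslash-escaped characters.
tokens = str.scan(/(?:[^\\\s]|\\.)+/)

# Drop each escaping backslash while keeping the character it escaped.
unescaped = tokens.map { |t| t.gsub(/\\(.)/, '\1') }

p tokens     # => ["foo\\ bar", "baz", "qux\\ quux"]
p unescaped  # => ["foo bar", "baz", "qux quux"]
```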
Preserving whitespace with .split()?
The Ruby doc for pattern-based splitting says:
If pattern is a String, then its contents are used as the delimiter
when splitting str. If pattern is a single space, str is split on
whitespace, with leading and trailing whitespace and runs of
contiguous whitespace characters ignored.
In other words, split(" ") treats any run of whitespace as a single delimiter:
"hello world".split(" ") # => ["hello", "world"]
Alternatively:
If pattern is a Regexp, str is divided where the pattern matches.
Whenever the pattern matches a zero-length string, str is split into
individual characters. If pattern contains groups, the respective
matches will be returned in the array as well.
Consequently, split(/ /) will treat every space as a separate split point, and split(/(\s+)/) (as proposed by darclander) will include the whitespace runs themselves in the result. Illustrating this with underscores instead of spaces:
"hello___world".split(/_/) # => ["hello", "", "", "world"]
"hello___world".split(/(_+)/) # => ["hello", "___", "world"]
Given that reversing spaces gives the same number of spaces, a quick solution looks like:
"hello world".split(/(\s+)/).map(&:reverse).join # => "olleh dlrow"
As pointed out by Cary Swoveland, you may want to preserve output spacing in the case of mixed types of whitespace. Consider replacing map(&:reverse) with a block that preserves whitespace but reverses non-whitespace, such as map { |s| s.strip.empty? ? s : s.reverse }.
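A small sketch of why the block form matters when the whitespace is mixed. The method name `reverse_words` and the sample input are assumptions chosen for illustration.

```ruby
# Reverse each word but leave every whitespace run exactly as it was.
def reverse_words(str)
  str.split(/(\s+)/).map { |s| s.strip.empty? ? s : s.reverse }.join
end

mixed = "hello \tworld"   # a space followed by a tab between the words

naive     = mixed.split(/(\s+)/).map(&:reverse).join
preserved = reverse_words(mixed)

p naive      # => "olleh\t dlrow"  (the whitespace run was reversed too)
p preserved  # => "olleh \tdlrow"
```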
Split string in ruby based on whitespace
In Ruby this would simply be String#split:
2.2.3 :001 > str = 'I am a student of xyz University'
=> "I am a student of xyz University"
2.2.3 :002 > str.split
=> ["I", "am", "a", "student", "of", "xyz", "University"]
Split a string by whitespace but retain \n - Ruby
Michael Berkowski's comment on your question is correct.
If you want to work around this case, use a regular expression that matches only the space character, so that \n is not treated as a delimiter:
"Lorem ipsum\ndolor sit amet".split(/ /)
#=> ["Lorem", "ipsum\ndolor", "sit", "amet"]
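A brief sketch contrasting the default behaviour with the regular-expression form. The double space in the sample text is an assumption added to show why / +/ can be a safer pattern than / / when several spaces may appear in a row.

```ruby
text = "Lorem  ipsum\ndolor sit amet"   # note the two spaces after "Lorem"

default_split = text.split        # any whitespace, newline included, is a delimiter
single_space  = text.split(/ /)   # every single space is a delimiter
space_runs    = text.split(/ +/)  # runs of spaces are one delimiter, \n is kept

p default_split  # => ["Lorem", "ipsum", "dolor", "sit", "amet"]
p single_space   # => ["Lorem", "", "ipsum\ndolor", "sit", "amet"]
p space_runs     # => ["Lorem", "ipsum\ndolor", "sit", "amet"]
```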
Ruby: Split, then remove leading/trailing whitespace in place?
s = "one thing, two things, three things, four things"
s.split(",").map(&:strip)
# => ["one thing", "two things", "three things", "four things"]
A benchmark on Ubuntu 13.04, using Ruby 2.0.0p0:
require 'benchmark'
s = "one thing, two things, three things, four things"
result = ""
Benchmark.bmbm do |b|
  b.report("strip/split: ") { 1_000_000.times { result = s.split(",").map(&:strip) } }
  b.report("regex: ") { 1_000_000.times { result = s.split(/\s*,\s*/) } }
end
Rehearsal -------------------------------------------------
strip/split:    6.260000   0.000000   6.260000 (  6.276583)
regex:          7.310000   0.000000   7.310000 (  7.320001)
--------------------------------------- total: 13.570000sec

                    user     system      total        real
strip/split:    6.350000   0.000000   6.350000 (  6.363127)
regex:          7.290000   0.000000   7.290000 (  7.302163)
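Apart from speed, the two forms are not always interchangeable. The sketch below uses a hypothetical `padded` string to show that the regex only consumes whitespace around the commas, whereas map(&:strip) also trims the outer ends.

```ruby
s = "one thing, two things, three things, four things"

by_strip = s.split(",").map(&:strip)
by_regex = s.split(/\s*,\s*/)
# For this input both give the same four elements.

# Hypothetical input with leading and trailing spaces.
padded = "  one thing , two things  "

padded_strip = padded.split(",").map(&:strip)
padded_regex = padded.split(/\s*,\s*/)

p padded_strip  # => ["one thing", "two things"]
p padded_regex  # => ["  one thing", "two things  "]
```

If the regex form is preferred, calling strip on the string before splitting closes that gap.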
Split string by every first space in ruby
Maybe using scan would give you your expected output more easily:
p str.scan(/[A-Z]+|\s{3}/)
# ["AB", "C", "   ", "D", "E", "   ", "F"]
As your input contains only uppercase letters, [A-Z] works; use /[a-z]/i to match both cases.
If you are wondering where empty arrays come from: calling split on a whitespace-only string returns an empty array:
p str.scan(/[A-Z]+|\s{3}/).map(&:split)
# [["AB"], ["C"], [], ["D"], ["E"], [], ["F"]]
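A runnable version of the above. The original input is not shown, so the value of `str` here is an assumption reconstructed to be consistent with the printed result: single spaces separate letters within a group, and runs of three spaces separate groups.

```ruby
# Assumed input, not taken from the original question.
str = "AB C   D E   F"

# Uppercase runs are tokens; a run of exactly three whitespace characters
# is kept as a token, while single spaces are skipped.
tokens = str.scan(/[A-Z]+|\s{3}/)

# split with no argument discards whitespace, so a whitespace-only
# string yields an empty array.
split_each = tokens.map(&:split)

p tokens      # => ["AB", "C", "   ", "D", "E", "   ", "F"]
p split_each  # => [["AB"], ["C"], [], ["D"], ["E"], [], ["F"]]
```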
Ruby - split string made up of emails at space or comma
What you need is a character set, denoted by [].
@emails.split(/[,\s]+/)
The [] matches any single character in that set. The + is there because you want to treat multiple spaces between emails as a single separator.
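A quick sketch with made-up addresses (the `emails` string is an assumption; the original uses an instance variable, @emails). It also shows one edge case: a separator at the very start of the string produces an empty first element, which reject(&:empty?) removes.

```ruby
emails = "a@example.com, b@example.com   c@example.com,d@example.com"

list = emails.split(/[,\s]+/)

# A leading separator yields an empty first element.
with_leading = ", a@example.com b@example.com".split(/[,\s]+/)
cleaned      = with_leading.reject(&:empty?)

p list          # => ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]
p with_leading  # => ["", "a@example.com", "b@example.com"]
p cleaned       # => ["a@example.com", "b@example.com"]
```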
Split string by spaces into an array
String#split doesn't actually need an argument if you want to split on whitespace:
"15 17 21 46".split
#=> ["15", "17", "21", "46"]
If you want to specify an argument, you need to use a space, not an empty string:
"15 17 21 46".split(' ')
#=> ["15", "17", "21", "46"]
And if you want to convert those strings to integers:
"15 17 21 46".split(' ').map(&:to_i)
#=> [15, 17, 21, 46]
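A short sketch of what goes wrong with an empty-string argument, and of how to_i treats tokens that are not numbers. The `dirty` input is a made-up example, and Integer(..., exception: false) assumes Ruby 2.6 or later.

```ruby
line = "15 17 21 46"

chars = line.split('')   # an empty string splits into single characters
words = line.split(' ')  # a single space splits on whitespace

# Hypothetical input containing a non-numeric token.
dirty = "15 x 21"

lenient = dirty.split.map(&:to_i)                               # "x" becomes 0
strict  = dirty.split.map { |t| Integer(t, exception: false) }  # "x" becomes nil

p chars.first(3)  # => ["1", "5", " "]
p words           # => ["15", "17", "21", "46"]
p lenient         # => [15, 0, 21]
p strict          # => [15, nil, 21]
```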