Splitting a string into words and punctuation with Ruby
You can try the following:
s.scan(/[\w'-]+|[[:punct:]]+/)
#=> ["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"]
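For a self-contained check, here is a minimal sketch. The sample string s is an assumption (the original input is not shown in the answer); it was picked so that the result matches the output above.

```ruby
# Assumed sample input; the original string is not shown in the answer.
s = "here...is a happy-go-lucky string that I'm writing"

# First alternative: runs of word characters, apostrophes and hyphens.
# Second alternative: runs of punctuation characters.
tokens = s.scan(/[\w'-]+|[[:punct:]]+/)

p tokens
```

Because the first alternative also accepts ' and -, a hyphen or apostrophe that touches a word ends up inside that word's token instead of being reported as separate punctuation.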
String splitting with unknown punctuation in Ruby
split is useful when you can describe the delimiters more easily than the parts to be extracted. In your case, the parts to be extracted are easier to describe than the delimiters, so scan is the better fit. Use scan here rather than split.
text.scan(/[\w']+/)
# => ["some", "string", "with", "punctuation", "for", "example", "things", "I", "don't", "know", "about", "that", "may", "or", "may", "not", "have", "whitespaces", "and", "random", "characters"]
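To make the difference concrete, here is a small side-by-side sketch. The input string is made up for illustration, and the split pattern (the complement of the scan character class) is my own assumption about what a split-based attempt would look like.

```ruby
# Made-up input that starts and ends with punctuation.
text = "(some) string, with punctuation... I don't know!"

# split: describe the delimiters (anything that is not a word character or ').
by_split = text.split(/[^\w']+/)

# scan: describe the words themselves.
by_scan = text.scan(/[\w']+/)

p by_split
p by_scan
```

With split, the leading delimiter produces an empty string as the first element, which then has to be filtered out. scan returns only the matches.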
If you want to replace the matches, there is even more reason not to use split. In that case, use gsub.
text.gsub(/[\w']+/) do |word|
  if word.is_of_certain_part_of_speech?
    "___" # Replace it with `"___"`.
  else
    word # Put back the original word.
  end
end
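Since is_of_certain_part_of_speech? is only a placeholder, here is a runnable sketch of the same gsub-with-block idea. The BLANKED_WORDS list is a hypothetical stand-in for a real part-of-speech check, and the input string is made up.

```ruby
# Hypothetical stand-in for a real part-of-speech check.
BLANKED_WORDS = %w[string punctuation].freeze

# Made-up input.
text = "Some string, with punctuation!"

# The block's return value replaces each matched word;
# everything the regex does not match is left untouched.
result = text.gsub(/[\w']+/) do |word|
  BLANKED_WORDS.include?(word.downcase) ? "___" : word
end

puts result
```

The punctuation and spacing survive because gsub only touches the matched words, which is exactly what a split-and-rejoin approach would lose.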
How can I use regex in Ruby to split a string into an array of the words it contains?
You can use a matching approach to extract chunks of two or more uppercase letters, or any single letter followed by zero or more lowercase letters:
s.scan(/\p{Lu}{2,}|\p{L}\p{Ll}*/).map(&:downcase)
The regex matches:
\p{Lu}{2,} - 2 or more uppercase letters
| - or
\p{L} - any letter
\p{Ll}* - 0 or more lowercase letters.
With map(&:downcase), the items you get from .scan() are converted to lower case.
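A quick sketch of how the pattern behaves. Both input strings are assumptions of mine, since the original question's input is not shown.

```ruby
pattern = /\p{Lu}{2,}|\p{L}\p{Ll}*/

# Made-up camelCase input with an all-caps chunk.
words = "helloWorld FOO barBaz".scan(pattern).map(&:downcase)
p words

# Caveat: a run of capitals also swallows the capital that starts the next word.
acronym = "HTMLParser".scan(pattern)
p acronym
```

The second call returns "HTMLP" and "arser", so if your input mixes acronyms with capitalized words directly after them, the pattern would need a lookahead or similar refinement.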
Ruby: Extracting Words From String
Use the split method:
words = @string1.split(/\W+/)
This splits the string into an array, using a regular expression as the delimiter. \W matches any "non-word" character, and the + makes a run of consecutive non-word characters count as a single delimiter.
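Two things to watch for with this approach, shown in a small sketch with a made-up input (a local variable string1 is used here instead of the instance variable):

```ruby
# Made-up input with a leading space and an apostrophe.
string1 = " Hello, world! I don't know."

words = string1.split(/\W+/)
p words

# Drop the empty string produced by the leading delimiter.
clean = words.reject(&:empty?)
p clean
```

A delimiter at the very start of the string yields an empty first element, and since the apostrophe is a non-word character, "don't" is split into "don" and "t".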
Is there a way to split a string by spaces and commas but preserving the comma in the resulting array?
This is my take on it.
text = "my bike, is very big"
text_array = text.split(/(\W+)/)
parsed_text_array = text_array.map { |item|
  next if item.eql?(" ")
  item.strip
}.compact
print parsed_text_array
# ~> ["my", "bike", ",", "is", "very", "big"]
Hope this helps :)
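As an alternative sketch (not the approach above), scan can pick out words and punctuation directly, which also copes with repeated spaces. The input is a slightly modified version of the example string.

```ruby
# Same sentence as above, with extra spaces added on purpose.
text = "my bike, is   very big"

# \w+ matches a word; [^\w\s]+ matches a run of punctuation (neither word nor space).
tokens = text.scan(/\w+|[^\w\s]+/)

p tokens
```

Whitespace is never matched by either alternative, so there is nothing to strip or compact afterwards.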
Split body of text into sentences but keep punctuation?
I think that should be \0, which refers to the entire match:
>> string = "I am a lion. Hear me roar! Where is my cub? Never mind, found him."
>> string.gsub(/[.?!]/, '\0|')
# "I am a lion.| Hear me roar!| Where is my cub?| Never mind, found him.|"
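To go from the marked-up string to an array of sentences, something like the following should work. It is a sketch that assumes the text itself never contains the | character; the lookbehind variant is my own addition and avoids the marker altogether.

```ruby
string = "I am a lion. Hear me roar! Where is my cub? Never mind, found him."

# Mark the end of each sentence, then split on the marker.
marked = string.gsub(/[.?!]/, '\0|')
sentences = marked.split('|').map(&:strip)
p sentences

# Alternative: split on whitespace that follows sentence-ending punctuation.
alt = string.split(/(?<=[.?!])\s+/)
p alt
```

Both variants keep the punctuation attached to its sentence.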
Split a string into an array of words, punctuation and spaces in JavaScript
Use the String#match method with the regex /\w+|\s+|[^\s\w]+/g.
\w+ - for any word match
\s+ - for whitespace
[^\s\w]+ - for matching any combination of characters other than whitespace and word characters.
var text = "I like grumpy cats. Do you?";
console.log( text.match(/\w+|\s+|[^\s\w]+/g))
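For comparison with the Ruby answers on this page, the same pattern can be used with String#scan. This is a sketch of a Ruby counterpart, not part of the original JavaScript answer.

```ruby
text = "I like grumpy cats. Do you?"

# Words, whitespace runs and punctuation runs each become their own element.
parts = text.scan(/\w+|\s+|[^\s\w]+/)

p parts
```

Every character falls into one of the three alternatives, so joining the parts gives back the original string.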
Split string into an array by comma, unless comma is inside quotes
The Ruby standard CSV library's .parse_csv method does exactly this.
require 'csv'
"\"hey, you\", 21".parse_csv
# => ["hey, you", " 21"]
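Note that the second field keeps its leading space. If that is not wanted, the fields can be stripped afterwards, as in this minimal sketch (it assumes no field is empty, since stripping would need a nil check otherwise):

```ruby
require 'csv'

line = '"hey, you", 21'

# parse_csv respects the quotes, so the comma inside them is not a separator.
fields = line.parse_csv
p fields

# Remove surrounding whitespace from each field.
trimmed = fields.map(&:strip)
p trimmed
```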
Ruby - split string made up of emails at space or comma
What you need is a character set, denoted by [].
@emails.split(/[,\s]+/)
The [] says to match any single character in that set. The + is there because you want to treat multiple spaces (or a comma followed by spaces) between emails as a single separator.
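A small sketch with made-up addresses (a local variable is used here instead of @emails). The strip call is my addition: without it, leading whitespace would produce an empty first element.

```ruby
# Hypothetical input mixing commas, single spaces and double spaces.
emails = " alice@example.com, bob@example.com  carol@example.com,dave@example.com "

# Trim the ends first, then split on any run of commas and whitespace.
list = emails.strip.split(/[,\s]+/)

p list
```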