Splitting a String into Words and Punctuation with Ruby

Splitting a string into words and punctuation with Ruby

You can try the following:

s.scan(/[\w'-]+|[[:punct:]]+/)
#=> ["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"]
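For a self-contained run, here is a minimal sketch. The input string is an assumption, reconstructed from the output shown above, since the original value of s is not given.

```ruby
# Assumed input: reconstructed from the output above, not taken from the question.
s = "here...is a happy-go-lucky string that I'm writing"

# [\w'-]+ keeps apostrophes and hyphens inside words;
# [[:punct:]]+ groups a run of punctuation such as "..." into one token.
tokens = s.scan(/[\w'-]+|[[:punct:]]+/)
p tokens
```

Note that a stand-alone apostrophe or hyphen is also matched by the first alternative, so it would come back as a "word" token.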

String splitting with unknown punctuation in Ruby

split is useful when the delimiters are easier to describe than the parts you want to extract. In your case, the parts to extract are easier to describe than the delimiters, so scan is the better fit and split is the wrong tool. You should use scan.

text.scan(/[\w']+/)
# => ["some", "string", "with", "punctuation", "for", "example", "things", "I", "don't", "know", "about", "that", "may", "or", "may", "not", "have", "whitespaces", "and", "random", "characters"]
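To make the difference concrete, the sketch below runs both methods on a made-up string (the input is hypothetical and only chosen to show the behaviour).

```ruby
# Hypothetical input, chosen only to illustrate the difference.
text = "-- some string, with punctuation: things I don't know!"

# scan describes the parts to keep: runs of word characters and apostrophes.
words = text.scan(/[\w']+/)
p words

# split has to describe the delimiters instead, and it leaves an empty
# leading element when the string starts with a delimiter.
pieces = text.split(/[^\w']+/)
p pieces
```

With scan there is nothing to clean up afterwards; with split you would still need to reject the empty first element.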

If you want to replace the matches, there is even more reason to not use split. In that case, you should use gsub.

text.gsub(/[\w']+/) do |word|
  if word.is_of_certain_part_of_speech?
    "___" # Replace it with `"___"`.
  else
    word # Put back the original word.
  end
end
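The predicate above is a placeholder, so that snippet will not run as written. A runnable sketch, assuming a hard-coded list of articles stands in for the part-of-speech check (both the list and the input are hypothetical):

```ruby
# Hypothetical stand-in for a part-of-speech check: a hard-coded word list.
ARTICLES = %w[a an the].freeze

text = "The cat sat on a mat, and the dog didn't." # illustrative input

# The block's return value replaces each match; punctuation and
# whitespace are never matched, so they pass through untouched.
blanked = text.gsub(/[\w']+/) do |word|
  ARTICLES.include?(word.downcase) ? "___" : word
end
puts blanked
```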

How can I use regex in Ruby to split a string into an array of the words it contains?

You may use a matching approach to extract chunks of 2 or more uppercase letters, or a single letter followed by 0 or more lowercase letters:

s.scan(/\p{Lu}{2,}|\p{L}\p{Ll}*/).map(&:downcase)

The regex matches:

  • \p{Lu}{2,} - 2 or more uppercase letters
  • | - or
  • \p{L} - any letter
  • \p{Ll}* - 0 or more lowercase letters.

With map(&:downcase), the items you get with .scan() are turned to lower case.
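As a quick check of the pattern, here is a sketch with made-up inputs (the question's original string is not shown, so these are assumptions). It also shows one case the pattern does not handle cleanly.

```ruby
# Hypothetical inputs; the question's original string is not shown here.
pattern = /\p{Lu}{2,}|\p{L}\p{Ll}*/

words = "fooBarBAZ quxQuux".scan(pattern).map(&:downcase)
p words

# Caveat: an acronym directly followed by a capitalised word is not
# separated cleanly, because \p{Lu}{2,} also consumes the next capital.
acronym_case = "HTTPResponse".scan(pattern)
p acronym_case
```

The first call yields the five lower-cased words; the second splits into "HTTPR" and "esponse", so inputs of that shape would need a different pattern.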

Ruby: Extracting Words From String

Use the split method:

   words = @string1.split(/\W+/)

This splits the string into an array, using a regular expression as the delimiter. \W matches any "non-word" character, and the + treats a run of consecutive delimiters as a single separator.
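A small sketch of this approach on an illustrative string (the input and the local variable name are assumptions; the answer uses an instance variable):

```ruby
string1 = "Hello, world! It's a test." # illustrative input

# Every run of non-word characters acts as one delimiter.
words = string1.split(/\W+/)
p words
```

Because the apostrophe is a non-word character, "It's" comes back as "It" and "s". If contractions matter, a scan with a pattern such as /[\w']+/ avoids that.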

Is there a way to split a string by spaces and commas while preserving the comma in the resulting array?

This is my take on it.

text = "my bike, is very big"
text_array = text.split(/(\W+)/)
parsed_text_array = text_array.map { |item|
  next if item.eql?(" ")
  item.strip
}.compact

print parsed_text_array

# ~> ["my", "bike", ",", "is", "very", "big"]
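For comparison, the same result can be reached in one step by matching the tokens directly. This is an alternative sketch, not the method used in the answer above:

```ruby
text = "my bike, is very big"

# \w+ matches words; [^\w\s]+ matches runs of punctuation.
# Whitespace matches neither alternative, so it is dropped.
tokens = text.scan(/\w+|[^\w\s]+/)
p tokens
```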

Hope this helps :)

Split body of text into sentences but keep punctuation?

I think the backreference should be \0, which stands for the entire match:

>> string = "I am a lion. Hear me roar! Where is my cub? Never mind, found him."
>> string.gsub(/[.?!]/, '\0|')
# "I am a lion.| Hear me roar!| Where is my cub?| Never mind, found him.|"
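To get from the marked-up string to an array of sentences, one possible continuation is to split on the inserted marker. This sketch assumes the text never contains a literal "|":

```ruby
string = "I am a lion. Hear me roar! Where is my cub? Never mind, found him."

# Append "|" after each terminator (\0 is the whole match), split on it,
# then trim the space left at the start of each sentence.
sentences = string.gsub(/[.?!]/, '\0|').split('|').map(&:strip)
p sentences
```

Each sentence keeps its closing punctuation, and split drops the empty piece after the final marker.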

Split a string into an array of words, punctuation and spaces in JavaScript

Use String#match method with regex /\w+|\s+|[^\s\w]+/g.

  1. \w+ - for any word match
  2. \s+ - for whitespace
  3. [^\s\w]+ - for matching combination of anything other than whitespace and word character.

var text = "I like grumpy cats. Do you?";
console.log( text.match(/\w+|\s+|[^\s\w]+/g))
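For readers working in Ruby, the same pattern can be used with String#scan, which plays the role of match with the g flag. A minimal sketch using the same input:

```ruby
text = "I like grumpy cats. Do you?"

# Words, whitespace runs, and punctuation runs each become one element.
parts = text.scan(/\w+|\s+|[^\s\w]+/)
p parts
```

Since every character belongs to one of the three alternatives, joining the parts reproduces the original string.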

Split string into an array by comma, unless comma is inside quotes

String#parse_csv, from Ruby's standard CSV library, does exactly this.

require 'csv'
"\"hey, you\", 21".parse_csv
# => ["hey, you", " 21"]
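As the output shows, whitespace after a comma is kept as part of the next field. If that is unwanted, one option is to strip each field afterwards, as in this sketch:

```ruby
require 'csv'

line = "\"hey, you\", 21"

# parse_csv keeps the space before 21; strip removes it.
# &. guards against nil, which CSV returns for empty unquoted fields.
fields = line.parse_csv.map { |field| field&.strip }
p fields
```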

Ruby - split string made up of emails at space or comma

What you need is a character class, denoted by [].

@emails.split(/[,\s]+/)

The [] matches any single character in that set. The + is there because you want to treat multiple spaces or commas between emails as a single separator.
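A short sketch with made-up addresses (the input and the local variable name are assumptions; the answer uses an instance variable):

```ruby
# Illustrative input mixing commas, single spaces, and double spaces.
emails = "a@example.com, b@example.com  c@example.com,d@example.com"

# Any run of commas and/or whitespace counts as one separator.
list = emails.split(/[,\s]+/)
p list
```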


