Splitting a String into Pairs of Characters in Ruby

Try the String object's scan method:

>> foo = "AABBCCDDEEFF"
=> "AABBCCDDEEFF"
>> foo.scan(/../)
=> ["AA", "BB", "CC", "DD", "EE", "FF"]
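Note that /../ only matches complete pairs, so with an odd-length string the last character is silently dropped. One way around that, sketched below on a made-up string, is a quantifier range that lets the final match be a single character. The m flag is an addition here so that . also matches newlines.

```ruby
foo = "AABBC"

# /../ needs exactly two characters, so the trailing "C" is dropped.
foo.scan(/../)      # => ["AA", "BB"]

# /.{1,2}/ takes two characters when it can and one at the very end.
# The m flag makes . match "\n" as well.
foo.scan(/.{1,2}/m) # => ["AA", "BB", "C"]
```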

How do I slice a string into pairs?

input = "2209222717080109"
input.chars.each_slice(2).map(&:join)
# => ["22", "09", "22", "27", "17", "08", "01", "09"]
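A small variation, not part of the original answer: each_char called without a block returns an Enumerator, so the slicing can run on it directly without first building the full array of characters. each_slice also copes with odd-length input by yielding a shorter final slice.

```ruby
input = "2209222717080109"

# each_char without a block returns an Enumerator, so the full array
# of single characters is never materialized before slicing.
pairs = input.each_char.each_slice(2).map(&:join)
# => ["22", "09", "22", "27", "17", "08", "01", "09"]

# With an odd number of characters the final slice has just one character.
odd = "12345".each_char.each_slice(2).map(&:join)
# => ["12", "34", "5"]
```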

How to split string into only two parts with a given character in Ruby?

String#split takes a second argument, the limit.

str.split(' ', 2)

should do the trick.
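With a limit of 2, split stops after the first separator and leaves the rest of the string untouched. A short illustration; the sample strings are made up for this example:

```ruby
line = "key=value=more"

line.split('=')     # => ["key", "value", "more"]
line.split('=', 2)  # => ["key", "value=more"]

# Multiple assignment picks up the two parts directly.
name, rest = "hello world foo".split(' ', 2)
# name => "hello", rest => "world foo"
```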

How to split a string by a certain number of characters in Ruby

Using Enumerable#each_slice

'some_string'.chars.each_slice(3).map(&:join)
# => ["som", "e_s", "tri", "ng"]

Using a regular expression:

'some_string'.scan(/.{1,3}/)
# => ["som", "e_s", "tri", "ng"]

How to split every two words of a string into an array in Ruby?

Input

str = "how are you to day"

Code

p str.split(/\s/)
     .each_cons(2)
     .map { |pair| pair.join(" ") }

Output

["how are", "are you", "you to", "to day"]
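each_cons produces overlapping pairs, so every inner word appears twice. If non-overlapping pairs are what you want instead, each_slice(2) is the counterpart; with an odd word count the last word ends up on its own:

```ruby
str = "how are you to day"

# split with no argument splits on runs of whitespace.
pairs = str.split.each_slice(2).map { |pair| pair.join(" ") }
p pairs
# prints ["how are", "you to", "day"]
```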

Ruby #split() vs #chars on a string

What is the difference between split and chars [...]?

string.chars parses the underlying bytes to return the string's characters, whereas string.split('') uses a regular expression to achieve the same.

As a result, chars is faster and more robust. It even works if the string contains invalid byte sequences:

"foo\x80bar".chars
#=> ["f", "o", "o", "\x80", "b", "a", "r"]

Whereas split fails if the string is malformed (because the regex engine can't handle it):

"foo\x80bar".split('')
#=> ArgumentError: invalid byte sequence in UTF-8
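If you have to deal with input that may contain invalid bytes, one option (a suggestion, not something from the original answer) is to check valid_encoding? and replace the bad bytes with String#scrub before splitting. The sketch assumes the default UTF-8 source encoding:

```ruby
malformed = "foo\x80bar"

malformed.valid_encoding?  # => false
malformed.chars            # => ["f", "o", "o", "\x80", "b", "a", "r"]

# scrub replaces each invalid byte sequence, here with "?".
cleaned = malformed.scrub("?")
cleaned.split('')          # => ["f", "o", "o", "?", "b", "a", "r"]

# For well-formed strings, including multibyte ones, both agree.
"héllo".chars == "héllo".split('')  # => true
```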

If I'm not mistaken, split('') is equivalent to split(//).

Is there a scenario where one is preferable?

split('') can be found in many tutorials. I assume this is because prior to Ruby 2.0, chars returned an enumerator, so to get an array you had to use two method calls:

string.chars.to_a

or a single, slightly shorter call:

string.split('')

Nowadays, you'd use chars (or each_char for the pre-2.0 behavior).
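To see the difference in current Ruby: chars returns an Array, while each_char called without a block returns an Enumerator, which is what chars used to return before 2.0. The sample string is arbitrary:

```ruby
s = "abc"

s.chars.class      # => Array
s.each_char.class  # => Enumerator

# An Enumerator can be chained further or converted on demand.
s.each_char.with_index.map { |c, i| "#{i}:#{c}" }  # => ["0:a", "1:b", "2:c"]
s.each_char.to_a == s.chars                          # => true
```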

How to split a string into key-value pairs?

Hash[s.scan(/@\w+/).zip(s.split(/\s?@\w+\s/).drop(1))]
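The one-liner appears to assume input of the form "@key value @key value ...", although the snippet does not say so; the sample string below is hypothetical. scan collects the @-prefixed keys, split on those same keys collects the values (drop(1) discards the empty string that precedes the first key), and zip pairs them up:

```ruby
# Hypothetical input: @-prefixed keys, each followed by its value.
s = "@name John Smith @age 42 @city New York"

keys   = s.scan(/@\w+/)                # => ["@name", "@age", "@city"]
values = s.split(/\s?@\w+\s/).drop(1)  # => ["John Smith", "42", "New York"]

pairs = Hash[keys.zip(values)]
# => {"@name" => "John Smith", "@age" => "42", "@city" => "New York"}

# Array#to_h (Ruby 2.1+) is an equivalent, more modern spelling.
keys.zip(values).to_h == pairs  # => true
```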

Ruby: Split a string into substrings of at most 40 characters

Your first attempt:

sentence[0..40].gsub(/\s\w+$/,'')

almost works, but it has one fatal flaw: you cut the string to a fixed number of characters before removing the last word, so you have no way of knowing whether the part being trimmed off was a whole word or only part of one. (Note also that 0..40 is an inclusive range, so it keeps 41 characters.)

Because of this, your code will always cut off the last word.

I would solve the problem as follows:

sentence[/\A.{0,39}[a-z]\b/mi]
  • \A is an anchor to fix the regex to the start of the string.
  • .{0,39}[a-z] matches 1 to 40 characters, where the last character must be a letter. This prevents the last selected character from being punctuation or a space. (Is that desired behaviour? Your question didn't really specify. Feel free to tweak or remove the [a-z] part, e.g. use [a-z.] to also match a full stop.)
  • \b is a word-boundary anchor: a zero-width assertion that matches at the beginning or end of a word.
  • The i modifier makes the match case-insensitive (so [a-z] also matches A-Z), and in Ruby the m modifier lets . match newlines as well.

One minor note: because this regex matches 1 to 40 characters (rather than allowing zero), it can return nil. (This seems very unlikely, since you'd need a string that starts with a single word of 41 or more letters.) To account for this edge case, call .to_s on the result if needed.
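A quick check of the pattern on a made-up sentence (the sample text is an assumption, not taken from the question). The greedy .{0,39} backs off until the last character is a letter followed by a word boundary, and a 45-letter word shows the nil case:

```ruby
sentence = "The quick brown fox jumps over the lazy dog. Pack my box."

# Up to 40 characters, ending on a letter at a word boundary.
head = sentence[/\A.{0,39}[a-z]\b/mi]
# => "The quick brown fox jumps over the lazy"

# A single word longer than 40 letters cannot match, so the result is nil;
# .to_s turns that into an empty string.
long_word = "a" * 45
long_word[/\A.{0,39}[a-z]\b/mi].to_s  # => ""
```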


Update: Thank you for the improved edit to your question, providing a concrete example of an input/result. This makes it much clearer what you are asking for, as the original post was somewhat ambiguous.

You could solve this with something like the following:

sentence.scan(/.{0,39}[a-z.!?,;](?:\b|$)/mi)
  • String#scan returns an array of all the substrings that match the pattern, so you can re-join these strings to reconstruct the original.
  • Again, I have added a few more characters (!?,;) to the list of "final characters in the substring". Feel free to tweak this as desired.
  • (?:\b|$) means "either a word boundary or the end of the line". This fixes the issue of the result not including the final . in the substrings, since a full stop at the very end of the string is not followed by a word boundary. I used a non-capturing group (?:) because, if the pattern contained a capturing group, scan would return the captured groups instead of the whole matches.
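Running the scan version on a made-up two-sentence string (the text is only for illustration) shows that each piece is at most 40 characters, that pieces after the first keep their leading space, and that joining them gives back the original:

```ruby
sentence = "The quick brown fox jumps over the lazy dog. " \
           "Pack my box with five dozen liquor jugs."

chunks = sentence.scan(/.{0,39}[a-z.!?,;](?:\b|$)/mi)
# => ["The quick brown fox jumps over the lazy",
#     " dog. Pack my box with five dozen liquor",
#     " jugs."]

chunks.all? { |c| c.length <= 40 }  # => true
chunks.join == sentence             # => true

# Strip the leading spaces when displaying the pieces.
display = chunks.map(&:strip)
```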

