What is the best way to chop a string into chunks of a given length in Ruby?
Use String#scan:
>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{4}/)
=> ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx"]
>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{1,4}/)
=> ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz"]
>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{1,3}/)
=> ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"]
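If the chunk length is only known at runtime, the pattern can be built by interpolation. The following is a minimal sketch, assuming a helper named chunk_string (a hypothetical name, not from the answer above) that rejects non-positive sizes before building the regex:

```ruby
# Hypothetical helper: split str into pieces of at most `size` characters.
def chunk_string(str, size)
  unless size.is_a?(Integer) && size > 0
    raise ArgumentError, "size must be a positive integer"
  end
  # {1,size} keeps the shorter final piece instead of dropping it.
  str.scan(/.{1,#{size}}/)
end

alphabet = ('a'..'z').to_a.join
p chunk_string(alphabet, 4)
p chunk_string(alphabet, 10).last
```

Like the examples above, this sketch does not match across newlines, because . skips them unless the /m modifier is added.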
How can I split a string into chunks?
The problem is that you're trying to call an Enumerable method on a non-enumerable object (a String). You can try using scan on the string to find groups of 5:
arr = str.scan(/.{1,5}/)
If you wanted to go the enumerable route, you could first break up the string into a character array, get groups of 5, then join them back into 5-character strings:
arr = str.chars.each_slice(5).map(&:join)
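As a quick check of both points, the sketch below (with a made-up sample string) shows that a String does not respond to each_slice in Ruby 1.9 and later, while its character array does, and that the two approaches agree on a string without newlines:

```ruby
str = "abcdefghijkl"  # sample input invented for illustration

# String does not include Enumerable, so each_slice is not available on it.
p str.respond_to?(:each_slice)
# String#chars returns an Array, which is Enumerable.
p str.chars.respond_to?(:each_slice)

by_scan  = str.scan(/.{1,5}/)
by_slice = str.chars.each_slice(5).map(&:join)

p by_scan
p by_scan == by_slice
```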
Split string into chunks (of different size) without breaking words
Doesn't this do the trick?
def get_chunks(str, n = 3)
  str.scan(/^.{1,25}\b|.{1,35}\b/).first(n).map(&:strip)
end
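To see what the two alternatives do, here is a small usage sketch. The sample sentence is invented for illustration (single spaces, no punctuation); the first chunk is limited to 25 characters by the ^ branch and later chunks to 35 by the second branch:

```ruby
# get_chunks as given above: first chunk up to 25 characters, later ones up to 35.
def get_chunks(str, n = 3)
  str.scan(/^.{1,25}\b|.{1,35}\b/).first(n).map(&:strip)
end

# Sample input invented for illustration.
sample = "the quick brown fox jumps over the lazy dog and keeps running " \
         "through the quiet forest until nightfall comes"

chunks = get_chunks(sample)
# Print each chunk with its length to check the limits.
chunks.each { |chunk| puts "#{chunk.length}: #{chunk}" }
```

Note that first(n) discards everything after the third chunk, so the end of a long input is dropped.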
Split string into equal slices/chunks
What about this? (L is a constant holding the chunk length.)
string.scan(/.{1,#{L}}/)
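The lower bound of the quantifier matters here. A quantifier written as {,n} also accepts an empty match, so scan appends an empty string once it reaches the end of the input, while {1,n} does not. A minimal sketch, assuming a hypothetical constant CHUNK_SIZE in place of L:

```ruby
# CHUNK_SIZE is a hypothetical constant standing in for L.
CHUNK_SIZE = 4
string = "abcdefgh"

# {,n} means "0 to n", so an empty match at the end of the string is kept.
with_zero_min = string.scan(/.{,#{CHUNK_SIZE}}/)
# {1,n} requires at least one character per match.
with_one_min = string.scan(/.{1,#{CHUNK_SIZE}}/)

p with_zero_min
p with_one_min
```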
Split a string into chunks of specified size without breaking words
How about:
str = "split a string into chunks according to a specific size. Seems easy enough, but here is the catch: I cannot be breaking words between chunks, so I need to catch when adding the next word will go over chunk size and start the next one (its ok if a chunk is less than specified size)."
str.scan(/.{1,25}\W/)
=> ["split a string into ", "chunks according to a ", "specific size. Seems easy ", "enough, but here is the ", "catch: I cannot be ", "breaking words between ", "chunks, so I need to ", "catch when adding the ", "next word will go over ", "chunk size and start the ", "next one (its ok if a ", "chunk is less than ", "specified size)."]
Update after @sawa's comment:
str.scan(/.{1,25}\b|.{1,25}/).map(&:strip)
This is better, as it doesn't require the string to end with \W.
It also handles words longer than the specified length. Such words get split, but I assume this is the desired behaviour.
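The handling of over-long words can be checked with a short sketch. The input and the limit of 10 are made up for illustration; the second alternative only takes over when no word boundary is reachable within the limit:

```ruby
# Sample input invented for illustration: one word is far longer than the limit.
str = "a supercalifragilisticexpialidocious word"

# First alternative stops at a word boundary; the fallback cuts at 10 characters.
chunks = str.scan(/.{1,10}\b|.{1,10}/).map(&:strip)

p chunks
p chunks.map(&:length)
```

The long word is cut into 10-character pieces, and its last fragment shares a chunk with the following word.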
Split string into chunks of maximum character count without breaking words
This is what worked for me (thanks to @StefanPochmann's comments):
text = "Some really long string\nwith some line breaks"
The following first collapses every run of whitespace (including line breaks) into a single space, then breaks the string up.
text.gsub(/\s+/, ' ').scan(/.{1,2000}(?: |$)/).map(&:strip)
The resulting chunks will lose all the line breaks (\n) from the original string. If you need to keep the line breaks, replace them all (before applying the regex) with a placeholder that does not otherwise occur in the text, for example (br), which you can use to restore the line breaks later. Like this:
text = "Some really long string\nwith some line breaks".gsub("\n", "(br)")
After we run the regex, we can restore the line breaks in the new chunks by replacing all occurrences of (br) with \n like this:
chunks = text.gsub(/\s+/, ' ').scan(/.{1,2000}(?: |$)/).map(&:strip)
chunks.each{|chunk| chunk.gsub!('(br)', "\n")}
Looks like a long process but it worked for me.
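The whole process can be wrapped in one method. This is a sketch under stated assumptions: the method name chunk_keeping_newlines and the constant PLACEHOLDER are hypothetical, the placeholder must not occur in the input, and a limit of 30 is used instead of 2000 so that the short sample actually gets split:

```ruby
# Hypothetical placeholder; it must not appear in the text being chunked.
PLACEHOLDER = "(br)"

# Hypothetical helper combining the steps above.
def chunk_keeping_newlines(text, max)
  marked = text.gsub("\n", PLACEHOLDER)            # hide the line breaks
  chunks = marked.gsub(/\s+/, ' ')                 # collapse remaining whitespace
                 .scan(/.{1,#{max}}(?: |$)/)       # break only at a space or at the end
                 .map(&:strip)
  chunks.map { |chunk| chunk.gsub(PLACEHOLDER, "\n") }  # restore the line breaks
end

text = "Some really long string\nwith some line breaks"
chunks = chunk_keeping_newlines(text, 30)
p chunks
```

Two side effects are worth knowing: the placeholder counts towards the length limit, and because it contains no space, the words on either side of a line break always stay in the same chunk.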
Chop a string in Ruby into fixed length string ignoring (not considering/regardless) new line or space characters
"This is some\nText\nThis is some text".scan(/.{1,17}/m)
# => ["This is some\nText", "\nThis is some tex", "t"]
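The /m modifier is what makes this work: in Ruby it lets . match newline characters. A short comparison on the same input shows that without it the newlines fall between matches and are lost, while with it the chunks can be joined back into the original string:

```ruby
text = "This is some\nText\nThis is some text"

without_m = text.scan(/.{1,17}/)   # . stops at every newline
with_m    = text.scan(/.{1,17}/m)  # . also matches "\n"

p without_m
p with_m
p with_m.join == text     # nothing is lost with /m
p without_m.join == text  # the newlines are missing without /m
```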
Ruby: Split a string into substring of maximum 40 characters
Your first attempt:
sentence[0..40].gsub(/\s\w+$/,'')
almost works, but it has one fatal flaw. You are splitting on the number of characters before cutting off the last word. This means you have no way of knowing whether the bit being trimmed off was a whole word, or a partial word.
Because of this, your code will always cut off the last word.
I would solve the problem as follows:
sentence[/\A.{0,39}[a-z]\b/mi]
- \A is an anchor to fix the regex to the start of the string.
- .{0,39}[a-z] matches on 1 to 40 characters, where the last character must be a letter. This is to prevent the last selected character from being punctuation or space. (Is that desired behaviour? Your question didn't really specify. Feel free to tweak/remove that [a-z] part, e.g. [a-z.] to match a full stop, if desired.)
- \b is a word boundary look-around. It is a zero-width matcher, on beginning/end of words.
- /mi modifiers make the match case insensitive (i.e. A-Z is included) and multi-line (in Ruby, /m lets . match newline characters).
One very minor note is that because this regex matches 1 to 40 characters (rather than zero or more), it is possible to get a nil result. (Although this is seemingly very unlikely, since you'd need a 1-word, 41+ letter string!) To account for this edge case, call .to_s on the result if needed.
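Both the normal case and the nil edge case can be tried out with a short sketch. The sample sentence and the constant name PATTERN are made up for illustration:

```ruby
PATTERN = /\A.{0,39}[a-z]\b/mi  # the regex from above, stored in a hypothetical constant

# Sample input invented for illustration.
sentence = "The quick brown fox jumps over the lazy dog near the river"
first_part = sentence[PATTERN]
p first_part
p first_part.length

# A single 41-letter "word" has no word boundary within the first 40 characters.
long_word = "a" * 41
p long_word[PATTERN]       # nil
p long_word[PATTERN].to_s  # empty string after .to_s
```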
Update: Thank you for the improved edit to your question, providing a concrete example of an input/result. This makes it much clearer what you are asking for, as the original post was somewhat ambiguous.
You could solve this with something like the following:
sentence.scan(/.{0,39}[a-z.!?,;](?:\b|$)/mi)
- String#scan returns an array of strings that match the pattern, so you can then re-join these strings to reconstruct the original.
- Again, I have added a few more characters (!?,;) to the list of "final characters in the substring". Feel free to tweak this as desired.
- (?:\b|$) means "either a word boundary, or the end of the line". This fixes the issue of the result not including the final . in the substrings. Note that I have used a non-capture group (?:) to prevent the result of scan from changing.
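As a sanity check, the sketch below runs the scan on an invented two-sentence input and confirms that every piece is at most 40 characters and that joining the pieces gives back the original:

```ruby
# Sample input invented for illustration.
sentence = "The quick brown fox jumps over the lazy dog. " \
           "Then it rests by the river bank until dusk."

parts = sentence.scan(/.{0,39}[a-z.!?,;](?:\b|$)/mi)

parts.each { |part| puts "#{part.length}: #{part.inspect}" }
p parts.join == sentence
```

Later pieces start with the space that separated them from the previous piece. Call strip on each piece if that is unwanted, at the cost of no longer being able to re-join them exactly.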