Rails gem to break a paragraph into a series of sentences
There are two non-trivial tasks here:
- splitting a string into sentences, and
- word-wrapping each sentence with extra care for punctuation.
The first is not easy to implement from scratch, so your best bet is probably a natural language processing library, provided that your "third-party language processing service" doesn't already offer such a feature. I don't know of any Rails gem that meets your requirement.
Here is a toy example of splitting a string into sentences using stanford-core-nlp:
require 'stanford-core-nlp'
text = "Lorem ipsum, consectetur elit. Donec ut ligula. Sed acumsan posuere tristique. Sed et tristique sem. Aenean sollicitudin, sapien sodales elementum blandit. Fusce urna libero blandit eu aliquet ac rutrum vel tortor."
pipeline = StanfordCoreNLP.load(:tokenize, :ssplit)
a = StanfordCoreNLP::Annotation.new(text)
pipeline.annotate(a)
sentences = a.get(:sentences).to_a.map(&:to_s) # Map with to_s to get an array of sentence strings.
# => ["Lorem ipsum, consectetur elit.", "Donec ut ligula.", "Sed acumsan posuere tristique.", "Sed et tristique sem.", "Aenean sollicitudin, sapien sodales elementum blandit.", "Fusce urna libero blandit eu aliquet ac rutrum vel tortor."]
The second problem is similar to word-wrapping. If it were exactly a word-wrapping problem, it could be solved easily with an existing implementation such as ActionView::Helpers::TextHelper.word_wrap.
However, there is an extra requirement concerning punctuation. I don't know of any existing implementation that does exactly what you want, so you may have to come up with your own solution.
My only idea is to first word-wrap each sentence, then split each line at punctuation, and finally join the pieces again subject to the length limit. I'm not sure whether this would work, though.
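As a rough illustration, here is one possible simplification of that idea in plain Ruby. It skips the initial word-wrap step: it splits a sentence after commas, semicolons and colons, then greedily packs the resulting clauses into lines no longer than a limit. The method name wrap_at_punctuation and the choice of punctuation marks are assumptions made for this sketch, and a clause that is longer than the limit is left on a line of its own rather than wrapped further.

```ruby
# Hypothetical helper (not from any gem): break a sentence into lines of at
# most `max` characters, breaking only after punctuation.
def wrap_at_punctuation(sentence, max)
  # The lookbehind keeps the punctuation attached to the preceding clause.
  clauses = sentence.split(/(?<=[,;:])\s+/)
  clauses.each_with_object([]) do |clause, lines|
    if lines.any? && lines.last.length + 1 + clause.length <= max
      lines.last << " " << clause # clause still fits on the current line
    else
      lines << clause.dup         # start a new line
    end
  end
end

wrap_at_punctuation("Lorem ipsum, consectetur elit.", 20)
# => ["Lorem ipsum,", "consectetur elit."]
```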
How to split text per paragraph based on length?
First you should split your text into single sentences.
Here's a simple, far-from-perfect way of doing this (I'm sure you could find plenty of more complete patterns elsewhere):
'Gsda asd. Gasd sasd. Tfed fdd.'.scan(/(.+?\.) ?/).map(&:first)
#=> ["Gsda asd.", "Gasd sasd.", "Tfed fdd."]
Then, you should join these sentences, keeping an eye on the paragraph length. You can use something like this:
# using words as units, but sentences work just the same:
s = ['foo', 'bar', 'beef', 'baz', 'hello', 'chunky', 'bacon']
LEN = 7 # minimum length of a paragraph
s.inject([]) { |a, i|
  if !a.last || a.last.length > LEN
    a << i.dup # dup so that appending below does not modify the strings in s
  else
    a.last << " #{i}"
  end
  a
}
#=> ["foo bar beef", "baz hello", "chunky bacon"]
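The two steps can be put together as a small sketch. The method name group_sentences is made up for this illustration; it reuses the simple pattern above (so it only recognises sentences ending in a period) and the same rule of starting a new paragraph once the current one is longer than the minimum.

```ruby
# Hypothetical helper: split text into sentences, then group them into
# paragraphs that are each longer than min_len (except possibly the last).
def group_sentences(text, min_len)
  sentences = text.scan(/(.+?\.) ?/).map(&:first)
  sentences.each_with_object([]) do |sentence, paragraphs|
    if paragraphs.empty? || paragraphs.last.length > min_len
      paragraphs << sentence.dup          # start a new paragraph
    else
      paragraphs.last << " " << sentence  # keep filling the current one
    end
  end
end

group_sentences("Gsda asd. Gasd sasd. Tfed fdd. Hjkl qwe.", 15)
# => ["Gsda asd. Gasd sasd.", "Tfed fdd. Hjkl qwe."]
```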
how to break long text to smaller lines by words in ruby/rails?
Rails comes with the word_wrap helper, which can split long lines based on a given line width. It always splits at whitespace, so long words won't get split or cut.
In rails console:
lines = helper.word_wrap("a b c d e text longword", line_width: 5)
#=> "a b c\nd e\ntext\nlongword"
puts lines
Output:
a b c
d e
text
longword
Note that it returns a string, not an array.
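If an array of lines is needed, calling split("\n") on the returned string is enough. Outside of Rails, similar behaviour can be approximated in plain Ruby; the sketch below uses a made-up method name, wrap_lines, and is only an approximation of the helper, not its actual implementation (for example, it drops blank lines).

```ruby
# Hypothetical plain-Ruby approximation of word wrapping that returns an
# array of lines instead of a single string.
def wrap_lines(text, width)
  text.split("\n").flat_map do |line|
    # Take up to `width` characters followed by whitespace or end of line,
    # and replace that whitespace with a line break.
    line.gsub(/(.{1,#{width}})(\s+|$)/, "\\1\n").rstrip.split("\n")
  end
end

wrap_lines("a b c d e text longword", 5)
# => ["a b c", "d e", "text", "longword"]
```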
Can a string be broken into multiple paragraph elements while iterating in Rails 4?
A simple place to start would be with Rails's simple_format helper.
<%= simple_format post.entry %>
It formats a single line break as a <br /> and 2 consecutive line breaks as a new paragraph.
So this input:
Hi
I'm on a

new paragraph
Would be formatted as such:
<p>
Hi<br />
I'm on a
</p>
<p>
new paragraph
</p>
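To make the rule concrete, here is a rough plain-Ruby stand-in for it. The method name paragraphs_to_html is hypothetical, and this is not how simple_format is implemented: in particular, the sketch does no escaping or sanitizing of the input.

```ruby
# Hypothetical stand-in for the rule above: blank lines separate paragraphs,
# single line breaks become <br />. No HTML escaping is performed here.
def paragraphs_to_html(text)
  text.split(/\n{2,}/).map do |para|
    "<p>" + para.strip.gsub("\n", "<br />\n") + "</p>"
  end.join("\n\n")
end

puts paragraphs_to_html("Hi\nI'm on a\n\nnew paragraph")
```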
You could also consider integrating a Markdown parser later if you want to get more advanced.
How to capitalize first character of each sentence in rails model
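class Question < ActiveRecord::Base
  before_save :capitalize_attributes

  def capitalize_attributes
    self.question = capitalize_sentences(question)
    self.description = capitalize_sentences(description)
  end

  # Upcase the first letter of the string and of each sentence that follows
  # ".", "!" or "?". Punctuation and the rest of each sentence are kept as-is
  # (split('.') with join(' ') would drop the periods).
  def capitalize_sentences(string)
    return string if string.blank?
    string.gsub(/(\A\s*|[.!?]\s+)([a-z])/) { "#{$1}#{$2.upcase}" }
  end
end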
Ruby on Rails Truncate text - can I use it for a combination of title and content?
You can just concatenate the two parts and truncate the result. Maybe factor it out into a helper method as well:
def truncate_topic(topic)
  full_text = link_to(topic.title, topic) + ' - ' + topic.description_without_embed
  truncate(full_text, :length => 50, :omission => "...")
end
And then in your view:
<%= truncate_topic(topic) %>
How do I keep the delimiters when splitting a Ruby string?
Answer
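Use a positive lookbehind assertion (i.e. (?<=...)) in the regular expression to keep the delimiter at the end of each string. A lookbehind is written in parentheses but is not a capture group:
content = "Do you like to code? How I love to code! I'm always coding."
content.split(/(?<=[?.!])/)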
# Returns an array with:
# ["Do you like to code?", " How I love to code!", " I'm always coding."]
That leaves a white space at the start of the second and third strings. Add a match for zero or more white spaces (\s*) after the lookbehind to exclude it:
content.split(/(?<=[?.!])\s*/)
# Returns an array with:
# ["Do you like to code?", "How I love to code!", "I'm always coding."]
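One caveat worth illustrating: because \s* also matches zero characters, the pattern splits after every period, including one inside a number. A small sketch, using a made-up sample string, shows the difference when at least one white space is required with \s+ instead:

```ruby
sample = "Version 3.5 is out! Do you like it? I do."

# \s* allows a split directly after the "." in "3.5"
loose = sample.split(/(?<=[?.!])\s*/)
# => ["Version 3.", "5 is out!", "Do you like it?", "I do."]

# \s+ only splits where punctuation is followed by white space
strict = sample.split(/(?<=[?.!])\s+/)
# => ["Version 3.5 is out!", "Do you like it?", "I do."]
```

Abbreviations such as "e.g. " are still split by both patterns, so this remains a heuristic.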
Additional Notes
While it doesn't make sense with your example, the delimiter can instead be moved to the front of each string, starting with the second one. This is done with a positive lookahead assertion (i.e. (?=...)). For the sake of anyone looking for that technique, here's how to do that:
content.split(/(?=[?.!])/)
# Returns an array with:
# ["Do you like to code", "? How I love to code", "! I'm always coding", "."]
A better example to illustrate the behavior is:
content = "- the - quick brown - fox jumps"
content.split(/(?=-)/)
# Returns an array with:
# ["- the ", "- quick brown ", "- fox jumps"]
Notice that the square-bracket character class wasn't necessary since there is only one delimiter. Also, since the first match happens at the first character, no empty string is produced and "- the " ends up as the first item in the array.
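For comparison, a short sketch of what happens with a real capture group instead of a lookaround: String#split then returns the captured delimiters as separate array elements rather than attaching them to their neighbours. The variable names here are only for illustration.

```ruby
sentence_text = "Do you like to code? How I love to code! I'm always coding."
with_capture = sentence_text.split(/([?.!])/)
# => ["Do you like to code", "?", " How I love to code", "!", " I'm always coding", "."]

dashed = "- the - quick brown - fox jumps"
dashed_parts = dashed.split(/(-)/)
# The match at the very first character yields a leading empty string here:
# => ["", "-", " the ", "-", " quick brown ", "-", " fox jumps"]
```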
Don't break paragraph when new page (Prawn)
You could just do:
pdf.group do
  # Your code
end
Is that what you were looking for?