Rails Gem to Break a Paragraph into Series of Sentences

There are two non-trivial tasks involved in what you are after:

  1. splitting a string into sentences
  2. word-wrapping each sentence with extra care for punctuation.

The first one is hard to implement from scratch, so unless your "third-party language processing service" already offers such a feature, your best bet is probably a natural language processing library. I don't know of any Rails gem that meets your requirement.

Here is a toy example of splitting a string into sentences using stanford-core-nlp.

require 'stanford-core-nlp'
text = "Lorem ipsum, consectetur elit. Donec ut ligula. Sed acumsan posuere tristique. Sed et tristique sem. Aenean sollicitudin, sapien sodales elementum blandit. Fusce urna libero blandit eu aliquet ac rutrum vel tortor."
pipeline = StanfordCoreNLP.load(:tokenize, :ssplit)
a = StanfordCoreNLP::Annotation.new(text)
pipeline.annotate(a)
sentences = a.get(:sentences).to_a.map(&:to_s) # Map with to_s if you want an array of sentence strings.
# => ["Lorem ipsum, consectetur elit.", "Donec ut ligula.", "Sed acumsan posuere tristique.", "Sed et tristique sem.", "Aenean sollicitudin, sapien sodales elementum blandit.", "Fusce urna libero blandit eu aliquet ac rutrum vel tortor."]

The second problem is close to word-wrapping. If it were exactly a word-wrapping problem, an existing implementation such as ActionView::Helpers::TextHelper.word_wrap would solve it easily.
However, there is an extra requirement concerning punctuation. I don't know of any existing implementation that does exactly what you want, so you may have to come up with your own solution.

My only idea is to first word-wrap each sentence, then split each line at punctuation, and finally join the pieces again subject to the length limit. I'm not sure this would work, though.
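As a rough plain-Ruby sketch of a variation on that idea: split at punctuation first, word-wrap only the clauses that are still too long, and then greedily re-join neighbouring pieces that fit. The method names wrap_sentence and pack, and the choice of `,`, `;` and `:` as break points, are assumptions made for illustration, not part of any gem.

```ruby
# Greedily joins pieces with single spaces so that each line stays within
# max_len characters. A single piece longer than max_len is kept as is.
def pack(pieces, max_len)
  pieces.each_with_object([]) do |piece, lines|
    if lines.any? && lines.last.length + 1 + piece.length <= max_len
      lines.last << " " << piece
    else
      lines << piece.dup
    end
  end
end

# Breaks one sentence into lines, preferring breaks right after punctuation.
def wrap_sentence(sentence, max_len)
  # The lookbehind keeps the punctuation attached to the preceding clause.
  clauses = sentence.split(/(?<=[,;:])\s+/)

  # Clauses that are still too long are cut further at word boundaries.
  pieces = clauses.flat_map do |clause|
    clause.length <= max_len ? [clause] : pack(clause.split, max_len)
  end

  pack(pieces, max_len)
end

wrap_sentence("Aenean sollicitudin, sapien sodales elementum blandit.", 30)
# => ["Aenean sollicitudin,", "sapien sodales elementum", "blandit."]
```

Splitting at punctuation before wrapping matters here: wrapping first and re-joining greedily afterwards tends to reproduce the plain word-wrap result.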

How to split text per paragraph based on length?

First you should split your text into single sentences.

Here's a simple, far-from-perfect way to do this (I'm sure you could find plenty of more complete patterns elsewhere):

'Gsda asd. Gasd sasd. Tfed fdd.'.scan(/(.+?\.) ?/).map(&:first)
#=> ["Gsda asd.", "Gasd sasd.", "Tfed fdd."]

Then, you should join these sentences, keeping an eye on the paragraph length. You can use something like this:

# Using words as units, but sentences work just the same:
s = ['foo', 'bar', 'beef', 'baz', 'hello', 'chunky', 'bacon']
LEN = 7 # minimum length: a paragraph is closed once it is longer than this
s.inject([]) { |a, i|
  if !a.last || a.last.length > LEN
    a << i.dup # start a new paragraph; dup so the strings in s are not mutated
  else
    a.last << " #{i}" # extend the current paragraph
  end
  a
}
#=> ["foo bar beef", "baz hello", "chunky bacon"]
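Putting the two steps together might look like the sketch below. The method name group_sentences is hypothetical, and the sentence splitter is a simple lookbehind on `.`, `!` and `?`, which will misfire on abbreviations such as "e.g.".

```ruby
# Splits text into sentences and groups them into paragraphs. A paragraph
# is closed as soon as it is longer than min_len characters.
def group_sentences(text, min_len)
  # The lookbehind keeps the terminating punctuation with each sentence.
  sentences = text.split(/(?<=[.!?])\s+/)

  sentences.each_with_object([]) do |sentence, paragraphs|
    if paragraphs.empty? || paragraphs.last.length > min_len
      paragraphs << sentence.dup          # start a new paragraph
    else
      paragraphs.last << " " << sentence  # extend the current paragraph
    end
  end
end

group_sentences("Gsda asd. Gasd sasd. Tfed fdd.", 12)
# => ["Gsda asd. Gasd sasd.", "Tfed fdd."]
```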

How to break long text to smaller lines by words in ruby/rails?

Rails comes with the word_wrap helper, which splits long lines based on a given line width. It only breaks at whitespace, so long words are never cut.

In the Rails console:

lines = helper.word_wrap("a b c d e text longword", line_width: 5)
#=> "a b c\nd e\ntext\nlongword"

puts lines

Output:

a b c
d e
text
longword

Note that it returns a string, not an array.
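If an array of lines is more convenient, or Rails is not available, a plain-Ruby approximation could look like this. The method name wrap_lines is made up for illustration, and the sketch assumes single-line input; it is not the helper's actual implementation.

```ruby
# Wraps a single-line string at whitespace and returns the lines as an array.
# Words longer than width are left intact on their own line.
def wrap_lines(text, width)
  # Match up to width characters followed by whitespace or the end of the
  # string, and replace that trailing whitespace with a newline.
  text.gsub(/(.{1,#{width}})(\s+|$)/, "\\1\n").split("\n")
end

wrap_lines("a b c d e text longword", 5)
# => ["a b c", "d e", "text", "longword"]
```

With the Rails helper itself, calling split("\n") on the returned string gives the same kind of array.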

Can a string be broken into multiple paragraph elements while iterating in Rails 4?

A simple place to start would be with Rails's simple_format helper.

<%= simple_format post.entry %>

It formats a single line break as a <br /> and two consecutive line breaks as a new paragraph.

So this input:

Hi
I'm on a

new paragraph

Would be formatted as such:

<p>
Hi<br />
I'm on a
</p>
<p>
new paragraph
</p>
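To make the rule concrete, here is a simplified plain-Ruby sketch of the same idea. It is not the helper's implementation: the method name paragraphs_html is hypothetical, the whitespace in the output differs slightly from what Rails produces, and it only escapes HTML rather than sanitizing it.

```ruby
require "erb"

# Turns blank-line-separated blocks into <p> elements and single line
# breaks into <br /> tags, escaping any HTML in the input.
def paragraphs_html(text)
  blocks = text.gsub(/\r\n?/, "\n").split(/\n{2,}/)

  blocks.map do |block|
    escaped = ERB::Util.html_escape(block.strip)
    "<p>#{escaped.gsub("\n", "<br />\n")}</p>"
  end.join("\n")
end

puts paragraphs_html("Hi\nI'm on a\n\nnew paragraph")
# <p>Hi<br />
# I&#39;m on a</p>
# <p>new paragraph</p>
```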

You could also consider integrating a Markdown parser later if you want to get more advanced.

How to capitalize first character of each sentence in rails model

class Question < ActiveRecord::Base
  before_save :capitalize_attributes

  def capitalize_attributes
    self.question = capitalize_sentences(question)
    self.description = capitalize_sentences(description)
  end

  # Capitalizes the first letter of each sentence. Splitting with a lookbehind
  # keeps the closing punctuation, which split('.') followed by join(' ') drops.
  def capitalize_sentences(string)
    unless string.blank?
      string.split(/(?<=[.?!])\s+/).map do |sentence|
        sentence.strip.capitalize
      end.join(' ')
    end
  end
end
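One thing to watch for: String#capitalize also downcases everything after the first character, so a proper noun in the middle of a sentence loses its capital. A minimal plain-Ruby sketch that only upcases the first letter of each sentence is shown below. The method name upcase_sentence_starts is made up for illustration, and a sentence end is assumed to be `.`, `?` or `!` followed by whitespace.

```ruby
# Upcases the first letter of the text and of every sentence that follows
# a ., ? or ! plus whitespace. All other characters are left untouched.
def upcase_sentence_starts(text)
  text.gsub(/(\A|[.?!]\s+)([a-z])/) { "#{$1}#{$2.upcase}" }
end

upcase_sentence_starts("hello world. how is Rails? fine.")
# => "Hello world. How is Rails? Fine."
```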

Ruby on Rails Truncate text - can I use it for a combination of title and content?

You can just concatenate the two parts and truncate the result. Maybe factor it out into a helper method as well:

def truncate_topic(topic)
  full_text = link_to(topic.title, topic) + ' - ' + topic.description_without_embed
  truncate(full_text, :length => 50, :omission => "...")
end

And then in your view:

<%= truncate_topic(topic) %>

How do I keep the delimiters when splitting a Ruby string?

Answer

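Use a positive lookbehind (i.e. (?<=...)) in the regular expression to keep the delimiter at the end of each string. A lookbehind is a zero-width assertion, not a capture group, so the delimiter itself is never consumed by the split:

content = "Do you like to code? How I love to code! I'm always coding."
content.split(/(?<=[?.!])/)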

# Returns an array with:
# ["Do you like to code?", " How I love to code!", " I'm always coding."]

That leaves a white space at the start of the second and third strings. Add a match for zero or more white spaces (\s*) after the lookbehind to exclude it:

content.split(/(?<=[?.!])\s*/)

# Returns an array with:
# ["Do you like to code?", "How I love to code!", "I'm always coding."]

Additional Notes

While it doesn't make sense with your example, the delimiter can be shifted to the front of the strings starting with the second one. This is done with a positive lookahead regular expression (i.e. ?=). For the sake of anyone looking for that technique, here's how to do that:

content.split(/(?=[?.!])/)

# Returns an array with:
# ["Do you like to code", "? How I love to code", "! I'm always coding", "."]

A better example to illustrate the behavior is:

content = "- the - quick brown - fox jumps"
content.split(/(?=-)/)

# Returns an array with:
# ["- the ", "- quick brown ", "- fox jumps"]

Notice that the square-bracket character class wasn't necessary since there is only one delimiter. Also, since the first match happens at the first character, no empty string is produced and "- the " ends up as the first item in the array.
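For comparison, wrapping the delimiter in an actual capture group (plain parentheses) makes split return the delimiters as separate array elements instead. A short sketch, reusing the sentence from the first example:

```ruby
content = "Do you like to code? How I love to code! I'm always coding."

# With a capture group, each matched delimiter becomes its own element.
parts = content.split(/([?.!])/)
# => ["Do you like to code", "?", " How I love to code", "!", " I'm always coding", "."]

# Pair each piece of text with the delimiter that follows it.
sentences = parts.each_slice(2).map { |pair| pair.join.strip }
# => ["Do you like to code?", "How I love to code!", "I'm always coding."]
```

This form is mostly useful when the delimiters need separate handling; for simply keeping them attached, the lookbehind version is shorter.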

Don't break paragraph when new page (Prawn)

You could wrap the content in a group block:

pdf.group do
  # Your code
end

Is that what you were looking for?


