HTML to Plain Text with Ruby

HTML to Plain Text with Ruby?

Actually, this is much simpler:

require 'rubygems' # only needed on Ruby 1.8
require 'nokogiri'

# my_html is a String holding the HTML markup
puts Nokogiri::HTML(my_html).text

This drops the line breaks implied by tags such as <br> and <p>, though, so you're going to have to figure out how you want to handle those yourself.

Convert HTML to a proper plain text?

Found the solution here: https://github.com/alexdunae/premailer/blob/master/lib/premailer/html_to_plain_text.rb

Works like a charm!

Ruby: Convert HTML/Redcloth to plain text

You need to make a new formatter class.

module RedCloth::Formatters
  module PlainText
    include RedCloth::Formatters::Base
    # ...
  end
end

I won't write your code for you today but this is very easy to do. Read the RedCloth source if you doubt me: it's only 346 lines for the HTML formatter.

So, once you have your PlainText formatter you patch the class and use it:

module RedCloth
  class TextileDoc
    def to_txt( *rules )
      apply_rules(rules)
      to(RedCloth::Formatters::PlainText)
    end
  end
end

print RedCloth.new(str).to_txt

Convert HTML to plain text (with inclusion of <br>s)

Instead of writing a complex regexp, I used Nokogiri.

Working solution (K.I.S.S!):

require 'nokogiri'

def strip_html(str)
  document = Nokogiri::HTML.parse(str)
  # Replace each <br> with a newline before extracting the text
  document.css("br").each { |node| node.replace("\n") }
  document.text
end

Converting plain text to HTML

Wrap your text in a <pre> tag:

<%= content_tag('pre', "Hello\n\nHello") %>
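If you are outside Rails, or you would rather have paragraphs than preformatted text, something along these lines might work. It is only a sketch: text_to_html is a made-up helper name, and the rule it applies (blank lines separate paragraphs, single newlines become <br>) is an assumption about what you want. It uses only ERB::Util from the standard library to escape the text.

```ruby
require 'erb'

# Illustrative helper, not a Rails or ERB API.
# Blank lines separate paragraphs; single newlines become <br>.
def text_to_html(text)
  paragraphs = text.to_s.strip.split(/\n{2,}/)
  paragraphs.map do |para|
    # Escape first so any markup in the input is shown literally.
    escaped = ERB::Util.html_escape(para)
    "<p>#{escaped.gsub("\n", "<br>\n")}</p>"
  end.join("\n")
end

puts text_to_html("Hello\n\nHello")
```

Escaping before adding the tags matters here: doing it the other way round would escape the <p> and <br> you just inserted.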

Converting back to plain text when using ActionText

So apparently ActionText rich text instances respond to to_plain_text, which returns the content with the markup stripped. All together it looks like this:

@post.body => <div>This is my markup</div>
@post.body.to_plain_text => This is my markup

