Convert HTML Character Back to Text Using Java Standard Library

I think the StringEscapeUtils.unescapeHtml3() and unescapeHtml4() methods from Apache Commons Text are what you are looking for. The same methods exist in Apache Commons Lang 3, but they are deprecated there in favor of Commons Text. See https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringEscapeUtils.html.
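
If adding a third-party dependency is not an option, a small decoder can be sketched with only java.util.regex. The class below is a hypothetical helper, not part of the JDK or of Apache Commons. It assumes that handling numeric references plus a handful of named entities is enough; unknown entities are left untouched. The full HTML 4 entity table is much larger, which is the main reason to prefer the library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the names are assumptions, not a real API.
class HtmlUnescapeSketch {
    // Matches &name; or decimal &#233; or hexadecimal &#xE9; references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);");

    // Deliberately tiny table; a real implementation needs the full entity list.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("nbsp", "\u00A0");
    }

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = decode(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // unknown or invalid entity: keep as-is
            }
            // quoteReplacement stops '$' and '\' from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or null if the entity cannot be decoded.
    private static String decode(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1));
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // number too large for an int
        }
    }
}
```

For example, HtmlUnescapeSketch.unescape("&lt;b&gt;caf&#233;&lt;/b&gt;") returns <b>café</b>. The input is decoded in a single pass, so "&amp;lt;" becomes "&lt;" rather than "<".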

How to unescape HTML character entities in Java?

I have used the Apache Commons StringEscapeUtils.unescapeHtml4() for this:

Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.

Convert HTML character code to char in Java

You can use StringEscapeUtils.unescapeHtml() from Apache Commons Lang 2.6:
http://commons.apache.org/lang/api-2.6/org/apache/commons/lang/StringEscapeUtils.html

This has been asked before; see: How to convert from HTML to UTF-8 in Java

Convert plain text to HTML text in Java

I found a solution using pattern matching. Here is my code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// URL-matching regex; capture group 1 is the whole URL
String str = "(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\'\".,<>?«»“”‘’]))";
Pattern patt = Pattern.compile(str);
Matcher matcher = patt.matcher(plain); // plain holds the input text
plain = matcher.replaceAll("<a href=\"$1\">$1</a>"); // wrap each URL in an anchor tag

Here are the input and output.

Input (the variable plain):

some text and then the URL http://www.google.com and then some other text.

Output:

some text and then the URL <a href="http://www.google.com">http://www.google.com</a> and then some other text.
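
One thing the replacement above does not do is escape the rest of the text: a literal < or & in the input would end up in the HTML unchanged. A minimal sketch of that escaping step, using only the standard library, might look like the following. The class and method names are hypothetical, and how best to combine this with the link regex (before or after it runs) depends on the input and is left open here.

```java
// Hypothetical helper for illustration; not part of the JDK or Apache Commons.
class PlainTextToHtmlSketch {
    // Replaces the four characters that are special in HTML text and
    // double-quoted attribute values with their entity equivalents.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, PlainTextToHtmlSketch.escapeHtml("a < b && c") returns a &lt; b &amp;&amp; c.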

HTML to TXT library that mimics the output of lynx -dump?

After a year, I gave up. The answer is: as far as I could find, there is no Java library that handles this, at least for now.

I'm closing this. Thank you for your attention.


