Convert HTML Character Back to Text Using Java Standard Library
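I think the StringEscapeUtils.unescapeHtml3() and unescapeHtml4() methods are what you are looking for. They were introduced in Apache Commons Lang 3 and now live in Apache Commons Text (the Commons Lang copy is deprecated). Note that this is a third-party library, not part of the Java standard library. See https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringEscapeUtils.html.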
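If adding a dependency is not an option, as the question title suggests, a small decoder can be written with the JDK alone. The following is only a minimal sketch under stated assumptions: the class name HtmlUnescapeSketch and its unescape method are hypothetical, it knows only five named entities plus decimal and hexadecimal numeric references, and it leaves anything it does not recognize untouched. It is not a substitute for the full HTML 4 entity table that unescapeHtml4() covers.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the JDK or Apache Commons.
class HtmlUnescapeSketch {

    // Assumption: only these five named entities are handled.
    // The real HTML 4 table contains a few hundred names.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    // Matches &name;  &#123;  and  &#x1F;
    private static final Pattern ENTITY = Pattern.compile(
            "&(#[xX][0-9a-fA-F]{1,6}|#[0-9]{1,7}|[A-Za-z][A-Za-z0-9]*);");

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String decoded = decode(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in the result literal.
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or the original escape if it is not recognized.
    private static String decode(String body, String original) {
        if (body.charAt(0) != '#') {
            return NAMED.getOrDefault(body, original);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        int codePoint = hex
                ? Integer.parseInt(body.substring(2), 16)
                : Integer.parseInt(body.substring(1), 10);
        if (!Character.isValidCodePoint(codePoint)) {
            return original;
        }
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(unescape("&lt;p&gt;Tom &amp; Jerry&lt;/p&gt;"));
    }
}
```

The input is decoded in a single pass, so a double-escaped string such as "&amp;lt;" comes back as "&lt;" rather than "<".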
How to unescape HTML character entities in Java?
I have used the Apache Commons StringEscapeUtils.unescapeHtml4() for this. From its Javadoc:
"Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities."
Convert HTML character code to char in Java
You can use StringEscapeUtils from Apache Commons Lang (in version 2.6, linked here, the method is unescapeHtml()):
http://commons.apache.org/lang/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
This has been asked before; see: How to convert from HTML to UTF-8 in Java
Convert plain text to HTML text in Java
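I found a solution using pattern matching. Here is my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// URL-matching pattern; "plain" is a String that holds the input text.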
String str = "(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\'\".,<>?«»“”‘’]))";
Pattern patt = Pattern.compile(str);
Matcher matcher = patt.matcher(plain);
plain = matcher.replaceAll("<a href=\"$1\">$1</a>");
Here are the input and output.
Input (the value of the variable plain):
some text and then the URL http://www.google.com and then some other text.
Output:
some text and then the URL <a href="http://www.google.com">http://www.google.com</a> and then some other text.
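Two things the snippet above does not cover may matter when the result is rendered as HTML: characters such as < and & in the surrounding text are not escaped, and a match that starts with www. ends up as a relative href. The sketch below illustrates one way to handle both. It is an assumption-laden example, not the answer's method: the class PlainTextToHtmlSketch, its method names, the much simpler URL pattern, and the choice to prefix bare www. links with http:// are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class PlainTextToHtmlSketch {

    // Deliberately simpler than the pattern in the answer: a run of
    // non-space characters that starts with http://, https:// or www.
    private static final Pattern URL =
            Pattern.compile("(?i)\\b(?:https?://|www\\.)[^\\s<>\"]+");

    // Escapes the four characters that are unsafe in HTML text and
    // in double-quoted attribute values.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    static String toHtml(String plain) {
        Matcher m = URL.matcher(plain);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the text already copied to out
        while (m.find()) {
            String url = m.group();
            // Treat trailing punctuation as part of the sentence, not the URL.
            int end = url.length();
            while (end > 0 && ".,;:!?".indexOf(url.charAt(end - 1)) >= 0) {
                end--;
            }
            url = url.substring(0, end);
            // Assumption: a bare "www." link should be opened over http.
            String href = url.regionMatches(true, 0, "www.", 0, 4)
                    ? "http://" + url : url;
            out.append(escapeHtml(plain.substring(last, m.start())));
            out.append("<a href=\"").append(escapeHtml(href)).append("\">")
               .append(escapeHtml(url)).append("</a>");
            last = m.start() + end;
        }
        out.append(escapeHtml(plain.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("a < b, see www.example.com."));
    }
}
```

Escaping is done piece by piece while the output is assembled, so the markup of the generated anchor tags is never escaped itself.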
HTML to TXT library that mimics the output of lynx -dump?
After a year, I give up. The answer is that there is no Java library that handles this, at least for now.
I'm closing this. Thank you for your attention.