How to parse XML with jsoup
It seems the latest version of Jsoup (1.6.2 - released March 28, 2012) includes some basic support for XML.
String html = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><tests><test><id>xxx</id><status>xxx</status></test><test><id>xxx</id><status>xxx</status></test></tests>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
for (Element e : doc.select("test")) {
    System.out.println(e);
}
Give that a shot.
Reading XML with jsoup
getElementsMatchingOwnText tries to find an element based on its own text, like when you want to find <name>Foo Bar</name> based on Foo or Bar. Instead, use select, which supports the CSS query format, or document.getElementsByTag("name").
Also, to actually get the text an element represents, call e.text().
BTW, you shouldn't build strings in a loop via concatenation. Each iteration has to create a new string by copying the old result (which can be long) and adding a small part to it. Instead, use StringBuilder and append new content to it. This class wraps a resizable character buffer, so append just writes into it; when the buffer is too small, it is replaced by a larger one (roughly double the size). When you are done, call the toString method to get the result as a String.
So what you want is more like:
Elements elements = document.getElementsByTag("name");
StringBuilder sb = new StringBuilder();
for (Element e : elements) {
    sb.append(e.text()).append(", ");
}
desc = sb.toString();
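Note that the loop above leaves a trailing ", " at the end of the result. Here is a minimal sketch of two ways to avoid that, using only the standard library. The class and method names are made up for illustration, and the list of strings stands in for the text of each element.

```java
import java.util.Arrays;
import java.util.List;

public class JoinNames {
    // Puts the separator in front of every item except the first,
    // so nothing trails at the end of the result.
    static String joinWithBuilder(List<String> names) {
        StringBuilder sb = new StringBuilder();
        String separator = "";
        for (String name : names) {
            sb.append(separator).append(name);
            separator = ", ";
        }
        return sb.toString();
    }

    // String.join (Java 8+) does the same thing in one call.
    static String joinWithStringJoin(List<String> names) {
        return String.join(", ", names);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Foo", "Bar", "Baz");
        System.out.println(joinWithBuilder(names));    // Foo, Bar, Baz
        System.out.println(joinWithStringJoin(names)); // Foo, Bar, Baz
    }
}
```

With jsoup you could probably feed it elements.eachText(), if your jsoup version has that method, instead of a hand-built list.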
Use jsoup to parse XML - prevent jsoup from cleaning link tags
In jsoup 1.6.2 I added an XML parser mode, which parses the input as-is, without applying the HTML5 parse rules (contents of element, document structure, etc.). This mode will keep text in a <link> tag, allow multiple <link> tags, and so on.
Here's an example:
String xml = "<link>One</link><link>Two</link>";
Document xmlDoc = Jsoup.parse(xml, "", Parser.xmlParser());
Elements links = xmlDoc.select("link");
System.out.println("Link text 1: " + links.get(0).text());
System.out.println("Link text 2: " + links.get(1).text());
Returns:
Link text 1: One
Link text 2: Two
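To see why the parser mode matters, here is a rough sketch that runs the same input through both parsers. It assumes jsoup is on the classpath; the class name and the firstLinkText helper are made up for this example. In HTML, link is an empty element, so the default parser should not keep "One" inside it, while the XML parser does.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class LinkParserComparison {
    // Hypothetical helper: text of the first <link> element in the document.
    static String firstLinkText(Document doc) {
        return doc.select("link").first().text();
    }

    public static void main(String[] args) {
        String xml = "<link>One</link><link>Two</link>";

        // Default HTML parser: applies the HTML5 parse rules.
        Document asHtml = Jsoup.parse(xml);
        // XML parser: keeps the input structure as-is.
        Document asXml = Jsoup.parse(xml, "", Parser.xmlParser());

        System.out.println("HTML parser: [" + firstLinkText(asHtml) + "]");
        System.out.println("XML parser:  [" + firstLinkText(asXml) + "]");
    }
}
```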
Can't parse XML (from web) using JSoup
There are 2 issues in your code:
- You provide a String representation of a URL while XML content is expected. You should rather use the method parse(InputStream in, String charsetName, String baseUri, Parser parser) to parse your XML as an input stream.
- There is no element genre in your XML; genre is an attribute of the element movie.
Here is how your code should look:
String url = "http://www.omdbapi.com/?t=Private%20Ryan&y=&plot=short&r=xml";
// Parse the doc using an XML parser
Document doc = Jsoup.parse(new URL(url).openStream(), "UTF-8", "", Parser.xmlParser());
// Select the first element "movie"
Element movieFromXml = doc.select("movie").first();
// Get its attribute "genre"
String genre = movieFromXml.attr("genre");
// Print the result
System.out.println(genre);
Output:
Drama, War
How to iterate through XML tags using Jsoup?
Here is a snippet that prints all children of the item elements:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class Test {
    public static void main(String[] args) {
        String xml =
            "<item>\r\n" +
            " <title> this is title 1 </title>\r\n" +
            " <description> description 1 </description>\r\n" +
            " <pubDate> date 1 </pubDate>\r\n" +
            "</item>\r\n" +
            "\r\n" +
            "<item>\r\n" +
            " <title> this is title 2 </title>\r\n" +
            " <description> description 2 </description>\r\n" +
            " <pubDate> date 2 </pubDate>\r\n" +
            "</item>";
        // Parse as XML so the structure is kept as-is
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        for (Element item : doc.select("item")) {
            // children() returns only the child elements of each item
            Elements children = item.children();
            for (Element child : children) {
                System.out.println(child.text());
            }
        }
    }
}
This is the output:
this is title 1
description 1
date 1
this is title 2
description 2
date 2
Parsing XML with Jsoup
The mistake I made was going through the XML by Elements, which do not include TextNodes. When I go through it Node by Node, I can check whether each Node is an Element or a TextNode and treat it accordingly.
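A minimal sketch of that Node-by-Node approach, assuming jsoup is on the classpath. The class name, the walk and describe helpers, and the sample XML are invented for illustration and are not from the original question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class NodeWalk {
    // Visits every child node in document order and records what it finds.
    static void walk(Node node, List<String> out) {
        for (Node child : node.childNodes()) {
            if (child instanceof TextNode) {
                // Skip whitespace-only text between tags.
                String text = ((TextNode) child).text().trim();
                if (!text.isEmpty()) {
                    out.add("text: " + text);
                }
            } else if (child instanceof Element) {
                out.add("element: " + ((Element) child).tagName());
                walk(child, out); // descend into the element's own children
            }
        }
    }

    static List<String> describe(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        List<String> out = new ArrayList<>();
        walk(doc, out);
        return out;
    }

    public static void main(String[] args) {
        // Mixed content: text sits next to a child element.
        String xml = "<note>Call <name>Foo Bar</name> today</note>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

Iterating with children() instead of childNodes() would only yield the name element here and skip "Call" and "today".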
Parsing xml data with Jsoup in android studio
First, select all forecast elements:
val listItems: Elements = doc.select("forecast")
Next, loop through your list and print the desired children:
for (item in listItems) {
    System.out.println(item.select("ftime"));
    System.out.println(item.select("f"));
    System.out.println(item.select("d"));
    System.out.println(item.select("t"));
    System.out.println(item.select("w"));
}
If you only want to print the text contained inside the child nodes, replace the above statements:
System.out.println(item.select(/* ... */));
with:
System.out.println(item.select(/* ... */).text());