How to Parse XML with Jsoup

How to parse XML with jsoup

It seems the latest version of Jsoup (1.6.2 - released March 28, 2012) includes some basic support for XML.

String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><tests><test><id>xxx</id><status>xxx</status></test><test><id>xxx</id><status>xxx</status></test></tests>";
// Parse with the XML parser so the HTML parse rules are not applied
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
for (Element e : doc.select("test")) {
    System.out.println(e);
}

Give that a shot.

Reading XML with jsoup

getElementsMatchingOwnText finds elements by their own text, for example finding <name>Foo Bar</name> by searching for Foo or Bar. It does not look elements up by tag name. To find elements by tag, use instead:

  • select, which supports the CSS query format,
  • or document.getElementsByTag("name").

To get the text an element contains, call e.text().
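
To make the difference concrete, here is a minimal sketch. The sample XML and the class name OwnTextVersusTag are assumptions made up for illustration, and it assumes a jsoup version recent enough to offer Elements.eachText().

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

class OwnTextVersusTag {
    // Hypothetical sample input; not taken from the original question.
    static final String XML =
            "<people><name>Foo Bar</name><name>Baz</name><city>Foo</city></people>";

    static Document parse() {
        return Jsoup.parse(XML, "", Parser.xmlParser());
    }

    // Text of every element whose own text matches the regex, whatever its tag.
    static List<String> byOwnText(String regex) {
        return parse().getElementsMatchingOwnText(regex).eachText();
    }

    // Text of every element with the given tag name.
    static List<String> byTag(String tag) {
        return parse().getElementsByTag(tag).eachText();
    }

    public static void main(String[] args) {
        System.out.println(byOwnText("Foo"));                  // one <name> and the <city>
        System.out.println(byTag("name"));                     // both <name> elements
        System.out.println(parse().select("name").eachText()); // same result via a CSS query
    }
}
```

Searching by own text picks up the <city> element as well, which is why a tag-based lookup is the better fit when you want every <name>.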

By the way, you shouldn't build strings in a loop via concatenation. Each iteration has to create a new string by copying the old result (which can be long) and adding a small part to it. Instead, use a StringBuilder and append new content to it. This class wraps an internal array with spare capacity, so append just fills it with text; when the array is too small, it is replaced by a larger one (roughly double the size). When you are done, call toString() to get the result as a String.

So what you want is more like

Elements elements = document.getElementsByTag("name");
StringBuilder sb = new StringBuilder();
for (Element e : elements) {
    sb.append(e.text()).append(", ");
}
desc = sb.toString();
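
Note that the loop above leaves a trailing ", " after the last name. If that is unwanted, one option is java.util.StringJoiner, which inserts the separator only between items. A small sketch follows; the class name JoinNames and the plain list of strings standing in for the element texts are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class JoinNames {
    // Joins the names with ", " between items and nothing after the last one.
    static String join(List<String> names) {
        StringJoiner joiner = new StringJoiner(", ");
        for (String name : names) {
            joiner.add(name);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("Foo", "Bar", "Baz"))); // Foo, Bar, Baz
    }
}
```

In the jsoup loop you would call joiner.add(e.text()) for each element instead of appending to the StringBuilder.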

Use jsoup to parse XML - prevent jsoup from cleaning link tags

In jsoup 1.6.2 I have added an XML parser mode, which parses the input as-is, without applying the HTML5 parse rules (contents of element, document structure, etc). This mode will keep text in a <link> tag, and allow multiples of it, etc.

Here's an example:

String xml = "<link>One</link><link>Two</link>";
Document xmlDoc = Jsoup.parse(xml, "", Parser.xmlParser());

Elements links = xmlDoc.select("link");
System.out.println("Link text 1: " + links.get(0).text());
System.out.println("Link text 2: " + links.get(1).text());

Returns:

Link text 1: One
Link text 2: Two
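
For contrast, the sketch below parses the same input once with the default HTML parser and once with the XML parser. The class and method names are made up for illustration, and the HTML-mode result is printed rather than asserted here, since it depends on how the HTML rules treat <link>.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

class LinkParserModes {
    static final String XML = "<link>One</link><link>Two</link>";

    // Text of the first <link> when the input goes through the default HTML parser.
    static String firstLinkTextAsHtml() {
        Document doc = Jsoup.parse(XML);
        return doc.select("link").first().text();
    }

    // Text of the first <link> when the input goes through the XML parser.
    static String firstLinkTextAsXml() {
        Document doc = Jsoup.parse(XML, "", Parser.xmlParser());
        return doc.select("link").first().text();
    }

    public static void main(String[] args) {
        System.out.println("HTML mode: [" + firstLinkTextAsHtml() + "]");
        System.out.println("XML mode:  [" + firstLinkTextAsXml() + "]");
    }
}
```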

Can't parse XML (from web) using JSoup

There are 2 issues in your code:

  1. You pass a String containing a URL where XML content is expected. Use the method parse(InputStream in, String charsetName, String baseUri, Parser parser) instead to parse the XML as an input stream.
  2. There is no element genre in your XML; genre is an attribute of the element movie.

Here is how your code should look:

String url = "http://www.omdbapi.com/?t=Private%20Ryan&y=&plot=short&r=xml";
// Parse the doc using an XML parser
Document doc = Jsoup.parse(new URL(url).openStream(), "UTF-8", "", Parser.xmlParser());
// Select the first element "movie"
Element movieFromXml = doc.select("movie").first();
// Get its attribute "genre"
String genre = movieFromXml.attr("genre");
// Print the result
System.out.println(genre);

Output:

Drama, War
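
The element-versus-attribute distinction can also be checked offline. The sketch below uses a small hand-written XML string shaped like the response discussed above; the string, its values, and the class name are assumptions for illustration, not the real service output.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

class AttributeVersusElement {
    // Hypothetical XML; the attribute values are made up.
    static final String XML =
            "<root response=\"True\"><movie title=\"Example\" genre=\"Drama, War\"/></root>";

    static Document parse() {
        return Jsoup.parse(XML, "", Parser.xmlParser());
    }

    // True only if an element named <genre> exists.
    static boolean hasGenreElement() {
        return !parse().select("genre").isEmpty();
    }

    // Reads genre as an attribute of the first <movie> element.
    static String genreAttribute() {
        return parse().select("movie").first().attr("genre");
    }

    // Counts <movie> elements that carry a genre attribute, using a CSS attribute selector.
    static int moviesWithGenre() {
        return parse().select("movie[genre]").size();
    }

    public static void main(String[] args) {
        System.out.println(hasGenreElement()); // false: genre is not an element
        System.out.println(genreAttribute());  // Drama, War
        System.out.println(moviesWithGenre()); // 1
    }
}
```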

How to iterate through XML tags using Jsoup?

Here is a snippet that prints all children of the item elements:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class Test {

    public static void main(String[] args) {
        String xml =
                "<item>\r\n" +
                " <title> this is title 1 </title>\r\n" +
                " <description> description 1 </description>\r\n" +
                " <pubDate> date 1 </pubDate>\r\n" +
                "</item>\r\n" +
                "\r\n" +
                "<item>\r\n" +
                " <title> this is title 2 </title>\r\n" +
                " <description> description 2 </description>\r\n" +
                " <pubDate> date 2 </pubDate>\r\n" +
                "</item>";

        // Parse as XML so the HTML parse rules are not applied
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        for (Element item : doc.select("item")) {
            // children() returns only the child elements, not text nodes
            Elements children = item.children();
            for (Element child : children) {
                System.out.println(child.text());
            }
        }
    }
}

This is the output:

this is title 1
description 1
date 1
this is title 2
description 2
date 2

Parsing XML with Jsoup

The mistake I made was going through the XML by Elements, which do not include TextNodes. When I go through it Node by Node, I can check whether each Node is an Element or a TextNode and treat it accordingly.
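
A minimal sketch of that node-by-node approach, assuming a small made-up XML fragment with mixed content; the class name NodeWalk and the output labels are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Parser;

class NodeWalk {
    // Describes the direct children of the first element in the XML,
    // distinguishing text nodes from child elements.
    static List<String> describe(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        Element root = doc.children().first();
        List<String> out = new ArrayList<>();
        for (Node node : root.childNodes()) {
            if (node instanceof TextNode) {
                String text = ((TextNode) node).text().trim();
                if (!text.isEmpty()) {          // skip whitespace-only text nodes
                    out.add("text: " + text);
                }
            } else if (node instanceof Element) {
                Element el = (Element) node;
                out.add("element " + el.tagName() + ": " + el.text());
            }
            // other node types (comments, declarations) are ignored in this sketch
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: text mixed with a child element.
        for (String line : describe("<note>Hello <b>big</b> world</note>")) {
            System.out.println(line);
        }
    }
}
```

Iterating with children() instead of childNodes() would return only the <b> element here and skip "Hello" and "world".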

Parsing XML data with Jsoup in Android Studio

First, select all forecast elements:

val listItems: Elements = doc.select("forecast")

Next, loop through your list and print the desired children:

for (item in listItems) {
    System.out.println(item.select("ftime"));
    System.out.println(item.select("f"));
    System.out.println(item.select("d"));
    System.out.println(item.select("t"));
    System.out.println(item.select("w"));
}

If you only want to print the text contained inside the child nodes, replace the above statements:

System.out.println(item.select(/* ... */));

with:

System.out.println(item.select(/* ... */).text());

