Printing Content of an XML File Using XML DOM

Printing content of an XML file using XML DOM

Explanation for weird #text strings

The weird #text strings don't come out of the blue; they are actual text nodes. When you load a formatted XML document with DOM, any whitespace (indentation, line breaks) and the node values become part of the DOM as DOMText instances by default, e.g.

<cellphones>\n\t<telefon>\n\t\t<model>Easy DB…
E T E T E T

where E is a DOMElement and T is a DOMText.

To get around that, load the document like this:

$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->load('file.xml');

Then your document will be structured as follows

<cellphones><telefon><model>Easy DB…
E E E T

Note that individual nodes representing the value of a DOMElement will still be DOMText instances, but the nodes that control the formatting are gone. More on that later.

Proof

You can test this easily with this code:

$dom = new DOMDocument;
$dom->preserveWhiteSpace = TRUE; // change to FALSE to see the difference
$dom->load('file.xml');
foreach ($dom->getElementsByTagName('telefon') as $telefon) {
foreach($telefon->childNodes as $node) {
printf(
"Name: %s - Type: %s - Value: %s\n",
$node->nodeName,
$node->nodeType,
urlencode($node->nodeValue)
);
}
}

This code runs through all the telefon elements in your XML and prints the node name, the node type and the urlencoded node value of their child nodes. When you preserve the whitespace, you will get something like

Name: #text - Type: 3 - Value: %0A++++
Name: model - Type: 1 - Value: Easy+DB
Name: #text - Type: 3 - Value: %0A++++
Name: proizvodjac - Type: 1 - Value: Alcatel
Name: #text - Type: 3 - Value: %0A++++
Name: cena - Type: 1 - Value: 25
Name: #text - Type: 3 - Value: %0A++

The reason I urlencoded the value is to show that there are in fact DOMText nodes containing the indentation and the line breaks in your DOMDocument. %0A is a line break, while each + is a space.

When you compare this with your XML, you will see there is a line break after each opening <telefon> tag, followed by four spaces until the <model> element starts. Likewise, there is only a newline and two spaces between the closing </cena> tag and the closing </telefon> tag.

The given type for these nodes is 3, which, according to the list of predefined constants, is XML_TEXT_NODE, i.e. a DOMText node. Lacking a proper element name, these nodes have the name #text.
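
The same text nodes show up in other DOM implementations. As a minimal sketch, assuming a small made-up sample shaped like the cellphones document (it is not the original file), Python's xml.dom.minidom reports the same node names and types:

```python
from xml.dom import minidom

# Hypothetical sample, indented with spaces like the document discussed above.
SAMPLE = (
    "<cellphones>\n"
    "  <telefon>\n"
    "    <model>Easy DB</model>\n"
    "    <cena>25</cena>\n"
    "  </telefon>\n"
    "</cellphones>"
)


def child_summary(xml_text, tag):
    """Return (nodeName, nodeType, nodeValue) for each child of the first <tag>."""
    dom = minidom.parseString(xml_text)
    parent = dom.getElementsByTagName(tag)[0]
    return [(n.nodeName, n.nodeType, n.nodeValue) for n in parent.childNodes]


summary = child_summary(SAMPLE, "telefon")
for name, node_type, value in summary:
    # repr() makes the newline and the indenting spaces visible
    print(f"Name: {name} - Type: {node_type} - Value: {value!r}")
```

Every second child is a #text node of type 3 holding only a newline and spaces, just as in the PHP output.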

Disregarding Whitespace

Now, when you disable preservation of whitespace, the above will output:

Name: model - Type: 1 - Value: Easy+DB
Name: proizvodjac - Type: 1 - Value: Alcatel
Name: cena - Type: 1 - Value: 25
Name: model - Type: 1 - Value: 3310

As you can see, there are no more #text nodes, only type 1 nodes, which means XML_ELEMENT_NODE, i.e. DOMElement.
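
Not every DOM implementation has a switch like preserveWhiteSpace. Python's xml.dom.minidom, for instance, has no such option, so the whitespace-only nodes would have to be removed by hand. A minimal sketch; the helper name strip_whitespace_nodes and the sample string are made up for illustration:

```python
from xml.dom import minidom


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    for child in list(node.childNodes):  # copy, because we mutate while looping
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)
    return node


# Hypothetical sample, not the original file.
SAMPLE = "<cellphones>\n  <telefon>\n    <model>Easy DB</model>\n  </telefon>\n</cellphones>"

dom = strip_whitespace_nodes(minidom.parseString(SAMPLE))
telefon = dom.getElementsByTagName("telefon")[0]
names = [n.nodeName for n in telefon.childNodes]
print(names)
```

Text nodes that carry real content, such as the value of model, are left alone because they are not whitespace-only.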

DOMElements contain DOMText nodes

In the beginning I said, the values of DOMElements are DOMText instances too. But in the output above, they are nowhere to be seen. That's because we are accessing the nodeValue property, which returns the value of the DOMText as string. We can prove that the value is a DOMText easily though:

$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML($xml);
foreach ($dom->getElementsByTagName('telefon') as $telefon) {
$node = $telefon->firstChild->firstChild; // 1st child of model
printf(
"Name: %s - Type: %s - Value: %s\n",
$node->nodeName,
$node->nodeType,
urlencode($node->nodeValue)
);
}

will output

Name: #text - Type: 3 - Value: Easy+DB
Name: #text - Type: 3 - Value: 3310
Name: #text - Type: 3 - Value: GF768
Name: #text - Type: 3 - Value: Skeleton
Name: #text - Type: 3 - Value: Earl

And this proves that a DOMElement holds its value as a DOMText node, and that nodeValue just returns the content of that DOMText directly.

More on nodeValue

In fact, nodeValue is smart enough to concatenate the contents of all DOMText descendants, not just the direct children:

$dom = new DOMDocument;
$dom->loadXML('<root><p>Hello <em>World</em>!!!</p></root>');
$node = $dom->documentElement->firstChild; // p
printf(
"Name: %s - Type: %s - Value: %s\n",
$node->nodeName,
$node->nodeType,
$node->nodeValue
);

will output

Name: p - Type: 1 - Value: Hello World!!!

although these are really the combined values of

DOMText "Hello " (including the trailing space)
DOMElement em with DOMText "World"
DOMText "!!!"
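
To make the concatenation explicit, here is a rough sketch in Python's xml.dom.minidom, where an element has no combined value of its own and the text has to be collected by walking the tree. The helper name text_content is an assumption, not part of minidom:

```python
from xml.dom import minidom


def text_content(node):
    """Concatenate the data of all text nodes below node, in document order."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_content(child) for child in node.childNodes)


dom = minidom.parseString("<root><p>Hello <em>World</em>!!!</p></root>")
p = dom.documentElement.firstChild  # the <p> element

# One entry per direct child of <p>: text, element, text
pieces = [(child.nodeName, text_content(child)) for child in p.childNodes]
print(pieces)
print(text_content(p))
```

The three pieces line up with the list above, and joining them gives the same string that nodeValue returned in the PHP example.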

Printing content of an XML file using XML DOM

To finally answer your question, look at the first test code. Everything you need is in there. And of course by now you have been given other fine answers too.

Pretty printing XML in Python

import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
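
A small self-contained variant of the same call, as a sketch: the input string is made up, and indent is passed explicitly only to make the result easier to read.

```python
import xml.dom.minidom

# Hypothetical compact input without any formatting whitespace
compact = "<root><a>1</a><b/></root>"

dom = xml.dom.minidom.parseString(compact)
pretty = dom.toprettyxml(indent="  ")  # two spaces per nesting level
print(pretty)
```

Input that is already indented tends to come out with extra blank lines, because the existing whitespace text nodes are written out in addition to the new indentation.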

Read and print out XML file to console

I have edited your code by adding two lines. Check below. Basically, you have to iterate through the next level of nodes.

code:

public void readXMLFile() {
    try {
        File fXmlFile = new File("highscore.xml");
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(fXmlFile);
        doc.getDocumentElement().normalize();
        System.out.println("Root element : " + doc.getDocumentElement().getNodeName());
        NodeList nList = doc.getElementsByTagName("Highscore");
        // The two added lines: step down to the children of <Highscore>
        Node child = nList.item(0);
        NodeList nL = child.getChildNodes();
        System.out.println("----------------------------");
        int i = 1;
        for (int temp = 0; temp < nL.getLength(); temp++) {
            Node nNode = nL.item(temp);
            // Skip whitespace text nodes; only elements carry the attributes
            if (nNode.getNodeType() == Node.ELEMENT_NODE) {
                Element eElement = (Element) nNode;
                System.out.println(i + "," + eElement.getAttribute("name") + "," + eElement.getAttribute("score"));
                i++;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Output:

Root element : Highscore
----------------------------
1,Rasmus,10000
2,Søren,6000
3,Niclas,5000

Parse XML using DOM and print in java

You're mixing your models here. getElementById is for elements that have an attribute identified by the document's DTD as an ID, and since your document doesn't have a DTD it will never give you anything useful.

Because your document uses namespaces, you should use the "NS" methods to extract elements, and for the text contained in an element you can use getTextContent:

File infile = new File("D:\\Cust.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
// namespaces - DocumentBuilderFactory is *not* namespace aware by default
dbFactory.setNamespaceAware(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(infile);
doc.getDocumentElement().normalize();

System.out.println("root of xml file " +
        doc.getDocumentElement().getNodeName());
System.out.println("==========================");

NodeList list = doc.getElementsByTagNameNS(
        "http://abc.com/elements", "CustomerDetails");
System.out.println(list.getLength());
for (int i = 0; i < list.getLength(); i++) {
    Element custDetails = (Element) list.item(i);
    // item() returns a Node, so cast before assigning to an Element
    Element id = (Element) custDetails.getElementsByTagNameNS(
            "http://abc.com/elements", "CustomerId").item(0);
    System.out.println("Customer ID: " + id.getTextContent());
    Element name = (Element) custDetails.getElementsByTagNameNS(
            "http://abc.com/elements", "CustomerName").item(0);
    System.out.println("Customer Name: " + name.getTextContent());
}
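
The namespace-aware lookup works the same way in other DOM implementations. A rough Python sketch with xml.dom.minidom follows; the sample document, including the c prefix, the Customers root and the values, is made up, and only the namespace URI and the three element names are taken from the code above.

```python
from xml.dom import minidom

NS = "http://abc.com/elements"

# Hypothetical document; the prefix, root element and values are assumptions.
XML = f"""<c:Customers xmlns:c="{NS}">
  <c:CustomerDetails>
    <c:CustomerId>1</c:CustomerId>
    <c:CustomerName>Alice</c:CustomerName>
  </c:CustomerDetails>
</c:Customers>"""

dom = minidom.parseString(XML)
customers = []
for details in dom.getElementsByTagNameNS(NS, "CustomerDetails"):
    # Look up by namespace URI + local name, independent of the prefix used
    cust_id = details.getElementsByTagNameNS(NS, "CustomerId")[0]
    name = details.getElementsByTagNameNS(NS, "CustomerName")[0]
    # The element's value is its text node child
    customers.append((cust_id.firstChild.data, name.firstChild.data))

print(customers)
```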

Reading XML file using xml.dom.minidom or elementtree - Python

I don't have your code so can't see what you are doing, but tag.text should get you the text of the tag. Example:

import xml.etree.ElementTree as ET

xml = '''<user name="John Doe" title="Manager">
<group title="USA">
<column name="Inventory">
Inventory of the products
</column>

<column name="Sells">
Sells of the products
</column>
</group>
</user>'''

root = ET.fromstring(xml)

inventory = root.findall('.//column[@name="Inventory"]')
print(inventory[0].text.strip())

sells = root.findall('.//column[@name="Sells"]')
print(sells[0].text.strip())

printing content into Browser after reading xml file

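You can do it like this; change Path to suit your setup. The ticket class is not shown here: the code assumes it has a (team, imp) constructor, getteam() and getimp() getters, and equals()/hashCode() implementations, since its instances are used as HashMap keys.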

package com.asn.model;

import java.awt.Desktop;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ticketcount {

public static void main(String[] args) throws IOException {
List<ticket> ticketList = new ArrayList<ticket>();
String content = "";
String content1 ="<HTML><HEAD></HEAD><TABLE border=3>";
FileWriter fw =null;
BufferedWriter bw=null;
String Path = "src";

try {

File fXmlFile = new File(Path + "\\file.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);

doc.getDocumentElement().normalize();

NodeList ticketNodeList = doc.getElementsByTagName("ticket");

for (int temp = 0; temp < ticketNodeList.getLength(); temp++) {
Node varNode = ticketNodeList.item(temp);

if (varNode.getNodeType() == Node.ELEMENT_NODE) {

Element eElement = (Element) varNode;

NodeList teamList = eElement.getElementsByTagName("team");
NodeList varsionList = eElement.getElementsByTagName("imp");

Node teamNode = teamList.item(0);
Node impNode = varsionList.item(0);

if (teamNode.getNodeType() == Node.ELEMENT_NODE
&& impNode.getNodeType() == Node.ELEMENT_NODE) {
Element teamElement = (Element) teamNode;
Element impElement = (Element) impNode;
ticket ticket = new ticket(
teamElement.getTextContent(),
impElement.getTextContent());
ticketList.add(ticket);
}
}

}

// Open the output file once, after all tickets have been read
File file = new File(Path + "\\result1.html");
if (!file.exists()) {
file.createNewFile();
}

fw = new FileWriter(file.getAbsoluteFile());
bw = new BufferedWriter(fw);

} catch (Exception e) {
e.printStackTrace();
}

Map<ticket, Integer> count = new HashMap<ticket, Integer>();
for (ticket c : ticketList)
if (!count.containsKey(c))
count.put(c, Collections.frequency(ticketList, c));

List<String> imps = getimps(count);

List<String> teams = getteams(count);

content=content+"<tr><th>ticket</th> ";
for (String s : imps) {

content=content+"<th>"+s+"</th>";
}

content=content+"</tr>";

System.out.println("---------------------------------");

for (String m : teams) {
System.out.println(m + "\t| " + getNumOfteams(m, imps, count));
content = content + "<tr><td>" + m + "</td>"
+ getNumOfteams(m, imps, count) + "</tr>";
}
bw.write(content1 + content + "</TABLE></HTML>");
bw.close();

// Open the generated page in the default browser
File htmlFile = new File(Path, "result1.html");
System.out.println(htmlFile.getPath());
Desktop.getDesktop().browse(htmlFile.toURI());
}

private static List<String> getteams(Map<ticket, Integer> count) {
List<String> teams = new ArrayList<String>();
for (Map.Entry<ticket, Integer> ent : count.entrySet())
if (!teams.contains(ent.getKey().getteam()))
teams.add(ent.getKey().getteam());
return teams;
}

private static String getNumOfteams(String team, List<String> imps,
Map<ticket, Integer> count) {
StringBuilder builder = new StringBuilder();

for (String v : imps) {
Integer cnt = count.get(new ticket(team, v));
if (cnt == null) {
cnt = 0;
}
builder.append("<td>"+cnt + "</td>");

}

return builder.toString();
}

private static List<String> getimps(Map<ticket, Integer> count) {
List<String> imps = new ArrayList<String>();
for (Map.Entry<ticket, Integer> ent : count.entrySet())
if (!imps.contains(ent.getKey().getimp()))
imps.add(ent.getKey().getimp());
return imps;
}
}

Now the output:

[Sample image of the generated table]

I think this is the output you want. Let me know if you face any issues.
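
The counting step in the Java code above (how many tickets exist per team and imp combination) can be sketched compactly in Python. This is an illustration under assumptions, not a port: the sample XML and the tickets root element are made up, every ticket is assumed to have both a team and an imp child, and rows and columns are sorted here, whereas the Java version uses HashMap order.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from html import escape

# Hypothetical input; only the ticket/team/imp element names come from the code above.
XML = """<tickets>
  <ticket><team>A</team><imp>high</imp></ticket>
  <ticket><team>A</team><imp>high</imp></ticket>
  <ticket><team>B</team><imp>low</imp></ticket>
</tickets>"""


def ticket_table(xml_text):
    """Build an HTML table of ticket counts per (team, imp) pair."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        (t.findtext("team"), t.findtext("imp")) for t in root.iter("ticket")
    )
    teams = sorted({team for team, _ in counts})
    imps = sorted({imp for _, imp in counts})

    header = "".join(f"<th>{escape(imp)}</th>" for imp in imps)
    rows = [f"<tr><th>ticket</th>{header}</tr>"]
    for team in teams:
        # A Counter returns 0 for combinations that never occurred
        cells = "".join(f"<td>{counts[(team, imp)]}</td>" for imp in imps)
        rows.append(f"<tr><td>{escape(team)}</td>{cells}</tr>")
    return '<table border="3">' + "".join(rows) + "</table>"


table_html = ticket_table(XML)
print(table_html)
```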

print value from a parent tag xml with minidom python

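With lxml this is very simple: use count() in an XPath expression to find the calendar parent tags that have more than two contactitem//fields tags. Note that lxml.html parses the input as HTML, which lowercases tag and attribute names, hence the lowercase names in the expression: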

from lxml.html import fromstring

tree = fromstring(the_xml)

print(tree.xpath("//calendar[count(./contactitem//fields) > 2]/@id"))

Sample run:

In [8]: from lxml.html import fromstring

In [9]: tree = fromstring(h)

In [10]: tree.xpath("//calendar[count(./contactitem//fields) > 2]/@id"
....: )
Out[10]: ['text1']

Or using lxml.etree, which keeps the original case of tag and attribute names:

from lxml.etree import fromstring

tree = fromstring(h)

print(tree.xpath("//Calendar[count(./ContactItem//FIELDS) > 2]/@ID"))

To read from the file use parse:

from lxml.html import parse
tree = parse("your.xml")

You should generally read from the file and let lxml handle the encoding.

count is not supported in xml.etree so to do the same you would use findall:

from xml.etree import ElementTree as et

tree = et.parse("Your.xml")
cals = tree.findall(".//Calendar")
print([c.get("ID") for c in cals if len(c.findall("./ContactItem//FIELDS")) > 2])
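
A self-contained version of the findall approach, as a sketch with made-up data. The Root element, the second calendar and the number of FIELDS elements are assumptions; the element and attribute names follow the snippets above. It uses ContactItem//FIELDS so that nested FIELDS elements are counted, like the XPath count() expression does.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; only the element and attribute names follow the snippets above.
XML = """<Root>
  <Calendar ID="text1">
    <ContactItem><FIELDS/><FIELDS/><FIELDS/></ContactItem>
  </Calendar>
  <Calendar ID="text2">
    <ContactItem><FIELDS/></ContactItem>
  </Calendar>
</Root>"""

root = ET.fromstring(XML)
ids = [
    cal.get("ID")
    for cal in root.iter("Calendar")
    # len(findall(...)) stands in for XPath count(), which xml.etree lacks
    if len(cal.findall("./ContactItem//FIELDS")) > 2
]
print(ids)
```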

Amend node value in existing XML document and print result with DOM PHP

Try:

$xpath = new DOMXPath($xmlDoc);
$from = $xpath->query('/note//from')->item(0);
$from->nodeValue="John";
echo $xmlDoc->saveXML();

Output:

<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tove</to>
<from>John</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
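
The same amend-and-print step, sketched in Python with xml.etree.ElementTree for comparison. The input document is an assumption reconstructed from the output above; in particular, the original value of from (Jani here) is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the original <from> value is an assumption.
XML = """<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(XML)
sender = root.find(".//from")  # first <from> element anywhere below <note>
sender.text = "John"           # replace the element's text content

result = ET.tostring(root, encoding="unicode")
print(result)
```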

