Android Sax Parser Not Getting Full Text from Between Tags

As you can see, it's cutting off everything in the URL from the ampersand escape code onward.

From the documentation of the characters() method:

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

When I write SAX parsers, I use a StringBuilder to append everything passed to characters():

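@Override
public void characters(char[] ch, int start, int length) {
    // buf is a StringBuilder created when the element of interest starts
    if (buf != null) {
        // Append this chunk; the parser may deliver the text over several calls
        buf.append(ch, start, length);
    }
}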

Then in endElement(), I take the contents of the StringBuilder and do something with it. That way, if the parser calls characters() several times, I don't miss anything.
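A minimal, self-contained sketch of that approach using the JDK's built-in SAXParserFactory. The class names (LinkTextDemo, LinkHandler) and the sample <link> element are made up for illustration; the point is that the escaped &amp; in the URL survives because every chunk is appended to the buffer before the text is read in endElement().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LinkTextDemo {

    /** Collects the full text of every <link> element, however many chunks it arrives in. */
    static class LinkHandler extends DefaultHandler {
        final List<String> links = new ArrayList<>();
        private StringBuilder buf; // non-null only while inside <link>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (qName.equals("link")) {
                buf = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (buf != null) {
                buf.append(ch, start, length); // one chunk; more may follow
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("link") && buf != null) {
                links.add(buf.toString()); // the text is complete only here
                buf = null;
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        LinkHandler handler = new LinkHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.links;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item><link>http://example.com/page?a=1&amp;b=2</link></item>";
        System.out.println(parse(xml)); // [http://example.com/page?a=1&b=2]
    }
}
```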

SaxParser doesn't get full string between the tags

The characters method can be called more than once for the text within a single pair of open and close tags.

Your code assumes it's only called once, which will frequently be true for small data, but not always.

You need to initialize a buffer in the startElement method for that tag, collect into the buffer in the characters method, and convert the buffer to a string in the endElement method.
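A sketch of those three steps that also copes with nested elements, assuming you want the direct text of every element: it keeps one StringBuilder per open element on a stack (java.util.ArrayDeque), so a child element cannot overwrite its parent's buffer. ElementTextCollector and the sample XML are hypothetical, chosen only for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementTextCollector extends DefaultHandler {

    // One buffer per open element, so nested elements keep separate text.
    private final Deque<StringBuilder> buffers = new ArrayDeque<>();
    // Results in end-tag order, formatted as "name=text".
    final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        buffers.push(new StringBuilder()); // step 1: fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!buffers.isEmpty()) {
            buffers.peek().append(ch, start, length); // step 2: may run several times
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // step 3: the element's direct text is complete, so convert it now
        results.add(qName + "=" + buffers.pop().toString().trim());
    }

    static List<String> collect(String xml) throws Exception {
        ElementTextCollector handler = new ElementTextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item><title>A &amp; B</title><link>http://x/?p=1&amp;q=2</link></item>";
        System.out.println(collect(xml)); // [title=A & B, link=http://x/?p=1&q=2, item=]
    }
}
```

The stack costs one small object per open element; if you only care about a single tag, the simpler one-buffer version is enough.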

Android SAX parser problem while reading HTML tags

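An HTML file is not necessarily well-formed XML (for example, tags such as <br> are often left unclosed), so an XML SAX parser can fail on it.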
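To see this in practice, here is a sketch that feeds two snippets to the JDK SAX parser: ordinary HTML with an unclosed <br> should be rejected with a parse exception, while the self-closing <br/> form is accepted. HtmlIsNotXml and isWellFormed() are illustrative names. HTML-only entities such as &nbsp; fail the same way, since XML predefines only &lt;, &gt;, &amp;, &quot; and &apos;.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HtmlIsNotXml {

    /** Returns true if the input parses as well-formed XML, false on a parse error. */
    static boolean isWellFormed(String markup) throws Exception {
        try {
            // DefaultHandler.fatalError() rethrows, so a well-formedness error ends the parse
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Valid HTML, but <br> is never closed, so it is not well-formed XML.
        System.out.println(isWellFormed("<p>line one<br>line two</p>"));  // false
        // The XHTML-style self-closing form is fine.
        System.out.println(isWellFormed("<p>line one<br/>line two</p>")); // true
    }
}
```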

RSS Reader using Sax Parser losing characters from title

Use a StringBuilder to accumulate the tag's text rather than creating a new String on each call. As the documentation says:

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

And @CommonsWare says exactly this in his post here.

Build the tag's text as it arrives using a StringBuilder, since the parser delivers it in chunks rather than as one complete string (this explains the incomplete tags!). You may or may not need the isBuilding flag, but I don't know your entire implementation, so I added it just in case.

StringBuilder mSb;
boolean isBuilding;
boolean parsingTitle;

@Override
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {

    // Start a fresh buffer for every element
    mSb = new StringBuilder();
    isBuilding = true;

    if (qName.equals("title")) {
        parsingTitle = true;
    }
    // ...
    // ...
}

@Override
public void characters(char[] ch, int start, int length) {
    // Append each chunk; the title text may arrive over several calls
    if (mSb != null && isBuilding) {
        mSb.append(ch, start, length);
    }
}

@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {

    if (parsingTitle) {
        // The title text is complete only once its end tag is reached
        currentItem.setTitle(mSb.toString().trim());
        parsingTitle = false;
        isBuilding = false;
    }
}
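One thing to watch for: in a typical RSS feed the <channel> element has its own <title> as well, and unless the elided parts of the code above guard against it, parsingTitle is set for that one too. A hedged sketch of one way to buffer only titles inside <item>, using a hypothetical inItem flag; ItemTitleHandler and the sample feed are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ItemTitleHandler extends DefaultHandler {

    final List<String> itemTitles = new ArrayList<>();
    private boolean inItem;    // true between <item> and </item>
    private StringBuilder mSb; // non-null only while inside <item><title>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("item")) {
            inItem = true;
        } else if (qName.equals("title") && inItem) {
            mSb = new StringBuilder(); // channel-level titles never get a buffer
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (mSb != null) {
            mSb.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title") && mSb != null) {
            itemTitles.add(mSb.toString().trim());
            mSb = null;
        } else if (qName.equals("item")) {
            inItem = false;
        }
    }

    static List<String> parse(String xml) throws Exception {
        ItemTitleHandler handler = new ItemTitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.itemTitles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<channel><title>Site Name</title>"
                + "<item><title>Deals &amp; Dealmakers</title></item>"
                + "<item><title>News Title!</title></item></channel>";
        System.out.println(parse(xml)); // [Deals & Dealmakers, News Title!]
    }
}
```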

SAXParser - Handle tags with same text at different level in XML structure

You can use XPath rather than parsing your XML using SAX.

The XPath expression for your case is:

/channel/item/title

Example code:

import org.xml.sax.InputSource;

import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;

public class XPathTest {

    public static void main(String[] args) throws XPathExpressionException {

        String xml = "<channel>\n" +
                "\n" +
                " <title>Site Name</title>\n" +
                "\n" +
                " <item> \n" +
                " <title>News Title!</title> \n" +
                " </item>\n" +
                "\n" +
                "</channel>";

        Object result = XPathFactory.newInstance().newXPath()
                .compile("/channel/item/title")
                .evaluate(new InputSource(new StringReader(xml)));
        System.out.print(result);
    }
}
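Note that evaluate(InputSource) returns a String, which is the text of the first matching node only. If a feed has several <item> elements, a sketch that asks for XPathConstants.NODESET instead and reads every match; XPathAllTitles and itemTitles() are illustrative names, and the sample feed is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathAllTitles {

    static List<String> itemTitles(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // NODESET returns every match; the String result type gives only the first one's text.
        NodeList nodes = (NodeList) xpath.evaluate(
                "/channel/item/title", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<channel><title>Site Name</title>"
                + "<item><title>First</title></item>"
                + "<item><title>Second</title></item></channel>";
        System.out.println(itemTitles(xml)); // [First, Second]
    }
}
```

XPath builds an in-memory DOM of the whole input, so for very large feeds the SAX route uses less memory.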

Special characters in Text node not getting parsed by SAX's characters() method

Here, the parameter char[] ch is supposed to receive the entire line "Deals & Dealmakers: Technology, media and communications M&A", but it only gets "Deals ".

You seem to be assuming that you'll get the whole text in one call. There's no guarantee of that. I strongly suspect that your characters method will be called multiple times for the same text node, which is valid for the parser to do. You need to make sure your code handles that.

From the documentation:

SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

There may be a feature you can set to ensure you get all the data in one go; I'm not sure.
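A quick way to check the suspicion above is a handler that records each chunk it receives. How the text is split (for example around the &amp; entity references) depends on the parser implementation, so this sketch prints the chunks rather than assuming a count; ChunkDemo and ChunkRecorder are hypothetical names. Joining the chunks gives back the full title, which is what a StringBuilder-based handler does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDemo {

    /** Records every chunk passed to characters(), so you can see how the text is split. */
    static class ChunkRecorder extends DefaultHandler {
        final List<String> chunks = new ArrayList<>();

        @Override
        public void characters(char[] ch, int start, int length) {
            chunks.add(new String(ch, start, length));
        }
    }

    static List<String> chunksOf(String xml) throws Exception {
        ChunkRecorder recorder = new ChunkRecorder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        return recorder.chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<title>Deals &amp; Dealmakers: Technology, media and communications M&amp;A</title>";
        List<String> chunks = chunksOf(xml);
        // The exact split is up to the parser; print it rather than assume it.
        System.out.println(chunks.size() + " chunk(s): " + chunks);
        // Joining the chunks always gives back the full text.
        System.out.println(String.join("", chunks));
    }
}
```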
