Java SAX parser splits calls to characters()
The parser calls the characters() method more than once for the same text because the SAX specification allows it to. This keeps parsers fast and their memory footprint low. If you want a single string, create a new StringBuilder object in startElement, append to it in characters(), and process it in the endElement method.
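Here is a minimal sketch of that pattern. It assumes the JDK's built-in JAXP parser (SAXParserFactory) and elements whose text you want as one string. The class name ElementTextCollector and its collect() helper are hypothetical. The buffer is cleared in startElement, fed by every characters() call, and read only in endElement:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects "name=text" for each element with direct text
public class ElementTextCollector extends DefaultHandler {
    // One buffer, reused for every element
    private final StringBuilder text = new StringBuilder();
    final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // a new element begins: start with an empty buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may run several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Every chunk for this element has arrived; skip elements with no direct text
        if (text.length() > 0) {
            results.add(qName + "=" + text);
        }
        text.setLength(0);
    }

    public static List<String> collect(String xml) {
        try {
            ElementTextCollector handler = new ElementTextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.results;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // prints [cat=Tom, cat=Felix]
        System.out.println(collect("<cats><cat>Tom</cat><cat>Felix</cat></cats>"));
    }
}
```

Because the default factory is not namespace-aware, qName carries the element name here.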
SAX parsing and special characters
My guess is that you are treating each call to characters as delivering the complete text for a cat element. You should code your handler so that successive calls to characters accumulate the text, and you only capture it on the endElement event:
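import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CatHandler extends DefaultHandler {
    // Accumulates text across successive characters() calls
    private final StringBuilder chars = new StringBuilder();

    @Override
    public void startElement(String uri, String lName, String qName, Attributes a) {
        // qName can be empty (not null) when the parser is namespace-aware
        final String name = (qName == null || qName.isEmpty()) ? lName : qName;
        if ("cat".equals(name)) {
            chars.setLength(0); // start collecting a fresh cat name
        }
        // ... handle other elements
    }

    @Override
    public void endElement(String uri, String lName, String qName) {
        final String name = (qName == null || qName.isEmpty()) ? lName : qName;
        if ("cat".equals(name)) {
            String catName = chars.toString(); // all chunks have arrived by now
            // do something with cat name
        }
        // ... handle other elements
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        chars.append(ch, start, length);
    }
}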
Java SaxParser trim the string after &
This is a lesson everyone has to learn when using SAX: the parser can break up text nodes and report the content in multiple calls to characters(), and it's the application's job to reassemble it (e.g. by using a StringBuilder). It's very common for parsers to break the text at any point where it would otherwise have to shunt characters around in memory, e.g. where entity references occur or where it hits an I/O buffer boundary.
It was designed this way to make SAX parsers super-efficient by minimizing text copying, but I suspect there's no real benefit, because the text copying just has to be done by the application instead.
Don't try and second-guess the parser as @DavidWallace suggests. The parser is allowed to break the text up any way it likes, and your application should cater for that.
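To see the splitting for yourself, the sketch below records every raw chunk passed to characters() for text containing an entity reference. It assumes the JDK's built-in parser; the class name ChunkRecorder and its chunksOf() helper are hypothetical. How many chunks appear is parser-dependent, so the only thing to rely on is their concatenation:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler: keeps each characters() chunk separately
public class ChunkRecorder extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        chunks.add(new String(ch, start, length));
    }

    public static List<String> chunksOf(String xml) {
        try {
            ChunkRecorder handler = new ChunkRecorder();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.chunks;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> chunks = chunksOf("<a>Tom &amp; Jerry</a>");
        // The chunk count depends on the parser; entity references are a common split point
        System.out.println(chunks.size() + " chunk(s): " + chunks);
        // Only the concatenation is guaranteed to be the full text
        System.out.println(String.join("", chunks));
    }
}
```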
Sax characters breaking element apart
The parser is allowed to call the ContentHandler's characters method multiple times for each run of element text; a split does not necessarily mean it found a line terminator. The Java tutorial on SAX has a short explanation of the characters method:
Parsers are not required to return any particular number of characters at one time. A parser can return anything from a single character at a time up to several thousand and still be a standard-conforming implementation. So if your application needs to process the characters it sees, it is wise to have the characters() method accumulate the characters in a java.lang.StringBuffer and operate on them only when you are sure that all of them have been found.
Also this Javaworld article has good explanations and examples.
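For mixed content such as a paragraph with inline markup, one way to know that "all of them have been found" is to treat every element boundary as the end of a text run. The sketch below buffers text and flushes it on each startElement and endElement. It uses StringBuilder, the unsynchronized counterpart of the StringBuffer the tutorial mentions, which is fine because a handler is driven by a single thread. The class name MixedContentHandler and its segmentsOf() helper are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: splits mixed content into complete text segments
public class MixedContentHandler extends DefaultHandler {
    private final StringBuilder pending = new StringBuilder();
    final List<String> segments = new ArrayList<>();

    // Called at every element boundary: the buffered text is now complete
    private void flush() {
        if (pending.length() > 0) {
            segments.add(pending.toString());
            pending.setLength(0);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length); // never act on a single chunk
    }

    public static List<String> segmentsOf(String xml) {
        try {
            MixedContentHandler handler = new MixedContentHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.segments;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Segments: "Hello ", "big", " world"
        System.out.println(segmentsOf("<p>Hello <b>big</b> world</p>"));
    }
}
```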
Parse value containing special character / gives wrong output using SAX parser
Just change the characters() method:
@Override
public void characters(char[] buffer, int start, int length) {
    // Append each chunk: the parser may deliver one text node in several calls
    tmpValue += new String(buffer, start, length);
}
Then reset tmpValue as the last line of the endElement method:
public void endElement(String s, String s1, String element) throws SAXException {
    if (OrgDataPartitonObj != null && "fs:FinancialStatementLineItemDataItem".equals(OrgDataPartitonObj.getType())) {
        FinancialStatementLineItemParser.getEndElementFinancialStatementLineItemParser(financialStatementLineItemObj, element, tmpValue);
    }
    // Clear the accumulated text so the next element starts empty
    tmpValue = "";
}
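A caveat with resetting only at the end of endElement: any text between an opening tag and a child's opening tag (typically indentation) is still in the buffer when the child ends. The sketch below makes that visible by switching an extra reset in startElement on or off. It also uses a StringBuilder in place of String +=, which copies the whole accumulated value on every call. The class name TmpValueHandler, its resetOnStart flag and the valuesOf() helper are hypothetical; only the tmpValue name comes from the answer:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler comparing "reset on end only" vs "reset on start and end"
public class TmpValueHandler extends DefaultHandler {
    // StringBuilder avoids re-copying the whole value on every "+="
    private final StringBuilder tmpValue = new StringBuilder();
    private final boolean resetOnStart;
    final Map<String, String> values = new LinkedHashMap<>();

    TmpValueHandler(boolean resetOnStart) {
        this.resetOnStart = resetOnStart;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (resetOnStart) {
            tmpValue.setLength(0); // drop text (e.g. indentation) seen before this tag
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        tmpValue.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        values.put(qName, tmpValue.toString());
        tmpValue.setLength(0); // mirrors tmpValue = "" in the answer
    }

    public static Map<String, String> valuesOf(String xml, boolean resetOnStart) {
        try {
            TmpValueHandler handler = new TmpValueHandler(resetOnStart);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a>\n  <b>x</b>\n</a>";
        System.out.println(valuesOf(xml, false).get("b").equals("x")); // false: value is "\n  x"
        System.out.println(valuesOf(xml, true).get("b").equals("x"));  // true
    }
}
```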
Sax Parser - Unable to split XML file to specified size
You should call setContentHandler() before calling parse().
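The XMLReader documentation says content events are silently ignored when no ContentHandler is registered, so the handler has to be in place when parse() runs. A minimal sketch, assuming the JDK's built-in parser; the class ReaderSetupDemo, its CountingHandler and countElements() are hypothetical stand-ins for a real handler such as a file splitter:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ReaderSetupDemo {
    // Hypothetical handler: counts start tags
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    public static int countElements(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            CountingHandler handler = new CountingHandler();
            // Register first: with no ContentHandler, content events are dropped
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.elements;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<root><item/><item/></root>")); // prints 3
    }
}
```

Calling setContentHandler() after parse() has returned has no effect on that parse, which is why the order matters.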