How to fix Invalid byte 1 of 1-byte UTF-8 sequence
How to fix this issue?
Read the data using the correct character encoding. The error message means that you are trying to read the data as UTF-8 (either deliberately or because that is the default encoding for an XML file that does not specify <?xml version="1.0" encoding="somethingelse"?>) but it is actually in a different encoding such as ISO-8859-1 or Windows-1252.
To be able to advise on how you should do this I'd have to see the code you're currently using to read the XML.
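Without the original code, the general idea can only be sketched. The example below assumes the data is actually ISO-8859-1 and carries no encoding declaration; the class and method names are made up for illustration. Wrapping the byte stream in an InputStreamReader with the real charset hands the parser characters rather than bytes, so it never attempts to decode them as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parse XML whose real encoding is known to the caller.
final class ReadXmlWithKnownEncoding {

    static Document parse(InputStream in, Charset actualEncoding) throws Exception {
        // Decode the bytes ourselves; the parser then works on a character stream.
        Reader reader = new InputStreamReader(in, actualEncoding);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // Sample data (an assumption): ISO-8859-1 bytes without an XML declaration.
        byte[] latin1 = "<name>J\u00E4ger</name>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If the file does declare its encoding correctly, passing the raw InputStream to the parser is usually enough, since the parser reads the declaration itself.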
I have UTF-8 - but still get Invalid byte 1 of 1-byte UTF-8 sequence
If your database contains only a single byte (with value 0xC4) then you aren't using UTF-8 encoding.
The character "LATIN CAPITAL LETTER A WITH DIAERESIS" has a code-point value U+00C4, but UTF-8 can't encode that in a single byte. If you check the third column "UTF-8 (hex.)" on UTF8-zeichentabelle.de you'll see that UTF-8 encodes that as 0xC3 84 (two bytes).
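The byte values mentioned above can be checked with a few lines of Java. This is only an illustrative sketch; the class name and the hex helper are not from the original answer.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how U+00C4 is encoded in two different charsets.
final class UmlautBytes {

    // Formats a byte array as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00C4"; // LATIN CAPITAL LETTER A WITH DIAERESIS
        System.out.println(hex(aUmlaut.getBytes(StandardCharsets.UTF_8)));      // C3 84
        System.out.println(hex(aUmlaut.getBytes(StandardCharsets.ISO_8859_1))); // C4
    }
}
```

A lone 0xC4 byte is therefore what ISO-8859-1 (or Windows-1252) would produce, not UTF-8.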
Please read Joel's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" for more info.
EDIT: Christian found the answer himself; it turned out to be a problem in the Cocoon 3 SAX component (I guess it's the alpha 3 version). If you pass XML as a String into the XMLGenerator class, something goes wrong during SAX parsing, causing this mess.
I looked up the code to find the actual problem in Cocoon-stax:
if (XMLGenerator.this.logger.isDebugEnabled()) {
    XMLGenerator.this.logger.debug("Using a string to produce SAX events.");
}
XMLUtils.toSax(new ByteArrayInputStream(this.xmlString.getBytes()), XMLGenerator.this.getSAXConsumer());
As you can see, the call to getBytes() creates a byte array using the JRE's default encoding, which then fails to parse. The XML declares itself to be UTF-8, but the bytes were produced with the platform default, which is likely your Windows code page.
As a workaround, one can use the following:
new org.apache.cocoon.sax.component.XMLGenerator(xmlInput.getBytes("UTF-8"), "UTF-8");
This will trigger the right internal actions (as Christian found out by experimenting with the API).
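The underlying mismatch can be reproduced without Cocoon. The sketch below uses only the JDK's DOM parser and encodes the string as ISO-8859-1 to stand in for a non-UTF-8 platform default; that choice, and the class and method names, are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Illustrative only: bytes that do not match the declared encoding fail to parse.
final class DefaultCharsetPitfall {

    // Returns true if the parser accepts the bytes, false if parsing fails.
    static boolean parses(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><v>\u00C4</v>";
        // ISO-8859-1 simulates what getBytes() may return on a Windows machine.
        System.out.println(parses(xml.getBytes(StandardCharsets.ISO_8859_1)));
        // Explicit UTF-8 matches the declaration.
        System.out.println(parses(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

With the JDK's built-in parser, the first call is expected to fail and the second to succeed, which mirrors why passing explicit UTF-8 bytes to XMLGenerator works around the bug.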
I've opened an issue in Apache's bug tracker.
EDIT 2: The issue is fixed and will be included in an upcoming release.
Message: Invalid byte 1 of 1-byte UTF-8 sequence in Hadoop
I suspect this is the problem - it's at least a problem:
XMLStreamReader reader = XMLInputFactory.newInstance()
    .createXMLStreamReader(new ByteArrayInputStream(document.getBytes()));
That call to getBytes will use the platform default encoding, rather than UTF-8. You could specify "utf-8" as the encoding name - but it would be simpler to create a StringReader:
XMLStreamReader reader = XMLInputFactory.newInstance()
.createXMLStreamReader(new StringReader(document));
Of course that may not be the only error, but it's at least something to look at.
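Both options can be tried in isolation. The following is a minimal sketch using the JDK's StAX implementation; the class name, helper names, and sample document are assumptions for illustration, not the original Hadoop code.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative only: two encoding-safe ways to feed a String to StAX.
final class StaxFromString {

    // Concatenates all character data in the document.
    static String allText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.CHARACTERS) {
                text.append(reader.getText());
            }
        }
        return text.toString();
    }

    // Option 1: no bytes involved, so no charset can be mismatched.
    static String viaStringReader(String document) throws XMLStreamException {
        return allText(XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(document)));
    }

    // Option 2: encode explicitly and tell the reader which encoding was used.
    static String viaUtf8Bytes(String document) throws XMLStreamException {
        byte[] utf8 = document.getBytes(StandardCharsets.UTF_8);
        return allText(XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(utf8), "UTF-8"));
    }

    public static void main(String[] args) throws XMLStreamException {
        String document = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><v>\u00C4</v>";
        System.out.println(viaStringReader(document).equals(viaUtf8Bytes(document)));
    }
}
```

The StringReader variant is the simpler of the two because the String is already decoded and never passes through a byte encoding.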
Invalid byte 1 of 1-byte UTF-8 sequence occurs when posting XML in .jar but not in Eclipse
You need to choose the encoding used by your PrintWriter. Outside of Eclipse, your platform is presumably defaulting to something other than UTF-8.
Try this code:
PrintWriter pw = new PrintWriter(new OutputStreamWriter(
conn.getOutputStream(), "UTF-8"));
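To see what the explicit encoding changes, the writer can be pointed at an in-memory stream. In this sketch a ByteArrayOutputStream stands in for conn.getOutputStream(), and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Illustrative only: a PrintWriter that always emits UTF-8, whatever the platform default.
final class Utf8PrintWriterDemo {

    static byte[] write(String body) {
        // Stand-in for the connection's output stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PrintWriter pw = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            pw.print(body);
        } // closing the writer flushes the encoded bytes into 'out'
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = write("\u00C4");
        // U+00C4 is two bytes in UTF-8 (0xC3 0x84).
        System.out.println(bytes.length); // 2
    }
}
```

Using StandardCharsets.UTF_8 instead of the string "UTF-8" avoids the checked UnsupportedEncodingException; either form selects the same encoding.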