Error: "Input is not proper UTF-8, indicate encoding !" using PHP's simplexml_load_string


Your 0xED 0x6E 0x2C 0x20 bytes correspond to "ín, " in ISO-8859-1, so it looks like your content is in ISO-8859-1, not UTF-8. (In UTF-8, 0xED would have to be followed by two continuation bytes in the range 0x80-0xBF, and 0x6E is not one.) Tell your data provider about it and ask them to fix it, because if it doesn't work for you it probably doesn't work for other people either.
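If you want to check this for yourself, a minimal sketch along these lines tests whether the reported bytes are valid UTF-8 and shows how they read as ISO-8859-1. The variable names are only illustrative, and it assumes the mbstring and iconv extensions are available.

```php
<?php
// The four bytes reported in the error message.
$bytes = "\xED\x6E\x2C\x20";

// Not valid UTF-8: 0xED announces a three-byte sequence,
// but 0x6E ("n") is not a continuation byte.
$isUtf8 = mb_check_encoding($bytes, 'UTF-8');

// Read as ISO-8859-1 and converted to UTF-8, the same bytes are readable text.
$asText = iconv('ISO-8859-1', 'UTF-8', $bytes);

var_dump($isUtf8);   // bool(false)
var_dump($asText);   // string(5) "ín, "
```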

Now there are a few ways to work around it, which you should only use if you cannot load the XML normally. One of them would be to use utf8_encode() (deprecated as of PHP 8.2; mb_convert_encoding() or iconv() from ISO-8859-1 to UTF-8 does the same job). The downside is that if the XML contains both valid UTF-8 and some ISO-8859-1, the result will contain mojibake. Or you can try to convert the string from UTF-8 to UTF-8 using iconv() or mbstring and hope they'll fix it for you. (They won't, but you can at least ignore the invalid characters so you can load your XML.)
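As a rough sketch of both workarounds, assuming the mbstring and SimpleXML extensions are loaded: the sample document and variable names below are made up for illustration. The iconv() equivalent of the second workaround would use the //IGNORE suffix, but its behaviour differs between platforms, so mbstring is used here.

```php
<?php
// Hypothetical feed whose text is ISO-8859-1 although no encoding is declared.
$xml = "<root><city>Berl\xEDn, DE</city></root>";

// Workaround 1: treat every byte as ISO-8859-1, as utf8_encode() does.
// Any text that was already valid UTF-8 would be double-encoded by this.
$converted = mb_convert_encoding($xml, 'UTF-8', 'ISO-8859-1');

// Workaround 2: convert UTF-8 to UTF-8 and drop whatever is invalid.
$previous = mb_substitute_character();
mb_substitute_character('none');      // discard invalid bytes instead of writing "?"
$stripped = mb_convert_encoding($xml, 'UTF-8', 'UTF-8');
mb_substitute_character($previous);   // restore the earlier setting

$doc = simplexml_load_string($converted);
echo $doc->city, "\n";                // Berlín, DE

// The stripped variant should load as well, but the offending character is lost.
var_dump(simplexml_load_string($stripped) !== false);
```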

Or you can take the long, long road and validate/fix the sequences yourself. That will take you a while depending on how familiar you are with UTF-8. Perhaps there are libraries out there that would do that, although I don't know of any.
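For what the long road might look like, here is one possible sketch, not a complete validator: it keeps byte sequences that are already well-formed UTF-8 and converts every other high byte as if it were ISO-8859-1. The function name repair_mixed_utf8 is hypothetical, and the approach cannot tell apart ISO-8859-1 text that happens to look like valid UTF-8 (for example the two characters "Ã©"), which it will leave untouched.

```php
<?php
/**
 * Keep sequences that are valid UTF-8, convert remaining high bytes
 * from ISO-8859-1. Hypothetical helper for illustration only.
 */
function repair_mixed_utf8(string $str): string
{
    $pattern = '~
          [\xC2-\xDF][\x80-\xBF]                 # valid 2-byte sequence
        | \xE0[\xA0-\xBF][\x80-\xBF]             # 3-byte, excluding overlong forms
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}      # ordinary 3-byte sequence
        | \xED[\x80-\x9F][\x80-\xBF]             # 3-byte, excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}          # 4-byte, excluding overlong forms
        | [\xF1-\xF3][\x80-\xBF]{3}              # ordinary 4-byte sequence
        | \xF4[\x80-\x8F][\x80-\xBF]{2}          # 4-byte, up to U+10FFFF
        | ([\x80-\xFF])                          # anything else: a stray byte
    ~x';

    $fixed = preg_replace_callback($pattern, function (array $m): string {
        // Group 1 is only set when the stray-byte alternative matched.
        return isset($m[1]) ? iconv('ISO-8859-1', 'UTF-8', $m[1]) : $m[0];
    }, $str);

    // Fall back to the input if the regex engine reported an error.
    return $fixed ?? $str;
}

// Valid "é" and "€" in UTF-8 next to an ISO-8859-1 "í".
$mixed = "caf\xC3\xA9 Berl\xEDn \xE2\x82\xAC";
echo repair_mixed_utf8($mixed), "\n";   // café Berlín €
```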

Either way, notify your data provider that they're sending invalid data so that they can fix it.


Here's a partial fix. It will definitely not fix everything, but will fix some of it. Hopefully enough for you to get by until your provider fixes their stuff.

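function fix_latin1_mangled_with_utf8_maybe_hopefully_most_of_the_time($str)
{
    // Re-encode each byte in 0xA1-0xFF unless at least two continuation bytes (0x80-0xBF) follow it.
    // Caveat: valid two-byte UTF-8 sequences also match and get double-encoded.
    return preg_replace_callback('#[\\xA1-\\xFF](?![\\x80-\\xBF]{2,})#', 'utf8_encode_callback', $str);
}

function utf8_encode_callback($m)
{
    // $m[0] is the single matched byte, treated as ISO-8859-1.
    // utf8_encode() is deprecated as of PHP 8.2.
    return utf8_encode($m[0]);
}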

XML error: Input is not proper UTF-8, indicate encoding ! after change from PHP5 to PHP7

I believe you encountered a streamed XML parsing bug in Chrome. The error points to the beginning of the XML tag, but the "error" is actually somewhere further in the content. It happens because the server responds in chunks, and one of those chunks was split in the middle of a multibyte UTF-8 character.
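To illustrate the effect, the following sketch splits a small, valid document at a made-up chunk boundary that falls inside a two-byte character. The sample XML and the offset are assumptions chosen for the demonstration, and mbstring is assumed to be available.

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9); the document as a whole is valid.
$xml = "<name>Jos\xC3\xA9</name>";

// Hypothetical chunk boundary that falls between the two bytes of "é".
$first  = substr($xml, 0, 10);   // ends with the lead byte 0xC3
$second = substr($xml, 10);      // starts with the continuation byte 0xA9

var_dump(mb_check_encoding($xml, 'UTF-8'));               // bool(true)
var_dump(mb_check_encoding($first, 'UTF-8'));             // bool(false)
var_dump(mb_check_encoding($second, 'UTF-8'));            // bool(false)
var_dump(mb_check_encoding($first . $second, 'UTF-8'));   // bool(true)
```

Each chunk on its own looks like broken UTF-8, while the reassembled document is fine, which is why a parser that validates chunk by chunk can report an error for correct input.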

Another PHP XML parsing error: Input is not proper UTF-8, indicate encoding!

When "ç" is stored as the single byte 0xE7, your encoding is Windows-1252 (or maybe ISO-8859-1), but not UTF-8, where it would be the two bytes 0xC3 0xA7.
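A small sketch of the difference, plus one way to "indicate encoding" as the error message asks: if the whole document really is ISO-8859-1, declaring that in the XML declaration lets libxml convert it. The element name and sample word are invented for the example, and it assumes the iconv and SimpleXML extensions.

```php
<?php
$utf8   = "\xC3\xA7";                            // "ç" encoded as UTF-8
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);   // the same character as ISO-8859-1

echo bin2hex($utf8), "\n";     // c3a7
echo bin2hex($latin1), "\n";   // e7

// Hypothetical ISO-8859-1 document with the encoding declared up front.
$xml = '<?xml version="1.0" encoding="ISO-8859-1"?>' . "<mot>fran\xE7ais</mot>";
$doc = simplexml_load_string($xml);

// SimpleXML hands the text back as UTF-8.
var_dump((string) $doc === "fran\xC3\xA7ais");   // bool(true)
```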

simplexml_load_string(): Input is not proper UTF-8, indicate encoding ! Bytes: 0xC2 0x20 0x5D 0x5D

Try this:

$feed = file_get_contents($feed);
// 0xC2 followed by a space (0x20) is never valid UTF-8, so drop the stray 0xC2.
$feed = str_replace("\xC2\x20", ' ', $feed);
$feed = simplexml_load_string($feed);

Each hex (0x) entry in the error is one byte of the input. 0x20 is a space and 0x5D is "]", which are harmless on their own; the problem is 0xC2, which opens a two-byte UTF-8 sequence but is followed by a space instead of a continuation byte. Use str_replace to replace the offending sequence and then try simplexml_load_string again. I am surprised not to see this easy method elsewhere in this forum. Hope this helps you.
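To look up what the bytes in such a message are without a conversion table, something like the following sketch can help. It only assumes the "Bytes: 0x.. 0x.." format shown in the error above; the variable names are illustrative.

```php
<?php
// Error text as reported for this feed.
$message = 'Input is not proper UTF-8, indicate encoding ! Bytes: 0xC2 0x20 0x5D 0x5D';

// Collect the two hex digits that follow each "0x".
preg_match_all('/0x([0-9A-Fa-f]{2})/', $message, $matches);

foreach ($matches[1] as $hex) {
    $byte = hex2bin($hex);
    // Show printable ASCII as-is and flag everything else.
    $printable = ord($byte) >= 0x20 && ord($byte) < 0x7F;
    $label = $printable ? "'" . $byte . "'" : 'not ASCII';
    echo '0x', strtoupper($hex), ' => ', $label, "\n";
}
// 0xC2 => not ASCII
// 0x20 => ' '
// 0x5D => ']'
// 0x5D => ']'
```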

readOuterXml(), Input is not proper UTF-8, indicate encoding

I figured out that vim is pretty good at converting from one encoding to another.

My trick is to parse the file normally, and when the encoding error is encountered just re-encode the file with vim and start parsing again.

Here's the rough idea:

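$xmlFile = '/path/to/file.xml';

// Parse the file in a loop
while (...)
{
    try
    {
        // Normal parsing logic...

        $reader->readOuterXml();

        //...
    }
    catch (Exception $ex)
    {
        // Note: XMLReader reports bad input as a PHP warning, so this catch only
        // fires if warnings are turned into exceptions (e.g. via set_error_handler).

        // getXMLEncoding() is a helper that presumably returns the declared encoding
        $encoding = getXMLEncoding($xmlFile) ?: 'utf-8';

        exec(sprintf('%s -c "set fileencoding=%s" -c "wq" %s', VIM_PATH, $encoding, escapeshellarg($xmlFile)));

        // File has been re-encoded
        // The real encoding should now match the declared encoding

        // -> Go back to the beginning and parse the file again
    }
}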

Using this method might garble 1 or 2 chars, but it's way better than completely failed parsing. Ideally the 3rd party would mark their files correctly.

My system is Windows, so the vim arguments might be different on Linux (don't know).
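The getXMLEncoding() helper is not shown in this answer. One possible version, offered only as a guess at what it might do, reads the start of the file and returns the encoding named in the XML declaration, or null when there is none. The 256-byte read length and the regular expression are assumptions of this sketch.

```php
<?php
/**
 * Hypothetical implementation: return the encoding named in the XML
 * declaration of $path, or null if none is found.
 */
function getXMLEncoding(string $path): ?string
{
    if (!is_readable($path)) {
        return null;
    }

    // The declaration has to sit at the very start, so a short read is enough.
    $head = file_get_contents($path, false, null, 0, 256);
    if ($head === false) {
        return null;
    }

    // Optional UTF-8 byte order mark, then the declaration with an encoding attribute.
    $pattern = '/^(?:\xEF\xBB\xBF)?<\?xml\b[^>]*?\bencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1/';

    return preg_match($pattern, $head, $m) === 1 ? $m[2] : null;
}
```

Because it returns null when no encoding is declared, it fits the `getXMLEncoding($xmlFile) ?: 'utf-8'` fallback used in the loop.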


