HTML encoding issues - "Â" character showing up instead of "&nbsp;"
Somewhere in that mess, the non-breaking spaces from the HTML template (the &nbsp;s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character
That'd be encoding to UTF-8 then, not ISO-8859-1. The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2,0xA0, which, if you (incorrectly) view it as ISO-8859-1, comes out as "Â ". That includes a trailing nbsp which you might not be noticing; if that byte isn't there, then something else has mauled your document and we need to see further up to find out what.
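To see those bytes for yourself, here is a minimal Python sketch. Python is used purely for illustration, and the variable names are mine rather than anything from the question:

```python
# U+00A0 NO-BREAK SPACE, the character that &nbsp; stands for
nbsp = "\u00a0"

latin1_bytes = nbsp.encode("iso-8859-1")  # one byte: 0xA0
utf8_bytes = nbsp.encode("utf-8")         # two bytes: 0xC2 0xA0

# Viewing the UTF-8 bytes as ISO-8859-1 gives two characters:
# U+00C2 ("Â") followed by U+00A0 (the trailing nbsp).
misread = utf8_bytes.decode("iso-8859-1")

print(latin1_bytes.hex(), utf8_bytes.hex(), [hex(ord(c)) for c in misread])
```

If the second character is missing from your output, the damage happened somewhere other than a plain UTF-8/ISO-8859-1 mix-up.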
What's the regexp, how does the templating work? There would seem to be a proper HTML parser involved somewhere if your &nbsp; strings are (correctly) being turned into U+00A0 NON-BREAKING SPACE characters. If so, you could just process your template natively in the DOM, and ask it to serialise using the ASCII encoding to keep non-ASCII characters as character references. That would also stop you having to do regex post-processing on the HTML itself, which is always a highly dodgy business.
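As a rough illustration of serialising a parsed tree to ASCII, here is a Python sketch using the standard library's xml.etree.ElementTree. That is a stand-in, since the question doesn't show which parser the templating uses, and the element and its text are made up:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a paragraph containing a real U+00A0 character.
p = ET.Element("p")
p.text = "10\u00a0km"

# Serialising with an ASCII encoding forces every non-ASCII character
# to be written as a numeric character reference.
out = ET.tostring(p, encoding="us-ascii", method="html")

print(out)
```

Because the output contains only ASCII bytes, it reads the same whether a consumer assumes UTF-8, ISO-8859-1 or Windows-1252.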
Well anyway, for now you can add one of the following to your document's <head> and see if that makes it look right in the browser:
- for HTML4:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
- for HTML5:
<meta charset="utf-8">
If you've done that, then any remaining problem is ActivePDF's fault.
Â character showing up instead of &nbsp;
Found it!
@$doc->loadHTML(mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8'));
This answer explains the issue and gives the workaround above:
DOMDocument::loadHTML will treat your string as being in ISO-8859-1 unless you tell it otherwise. This results in UTF-8 strings being interpreted incorrectly.
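The idea behind the workaround can be sketched in Python. This is not how PHP implements it; `to_entities` is a hypothetical helper and the sample string is made up. It only shows why ASCII-only input survives a wrong charset assumption:

```python
import html


def to_entities(text: str) -> bytes:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace")


source = "caf\u00e9\u00a0bar"
raw_utf8 = source.encode("utf-8")
safe = to_entities(source)

# A parser that wrongly assumes ISO-8859-1 garbles the raw UTF-8 bytes...
garbled = raw_utf8.decode("iso-8859-1")

# ...but the ASCII-only form decodes to the same text under either assumption.
as_latin1 = html.unescape(safe.decode("iso-8859-1"))
as_utf8 = html.unescape(safe.decode("utf-8"))

print(safe, as_latin1 == source, as_utf8 == source, garbled == source)
```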
â€™ showing on page instead of '
Ensure the browser and editor are using UTF-8 encoding instead of ISO-8859-1/Windows-1252.
Or use &rsquo;.
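The three stray characters come from a three-byte UTF-8 sequence being read one byte at a time. A small Python sketch (illustrative only, names are mine):

```python
apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK

utf8_bytes = apostrophe.encode("utf-8")  # three bytes: 0xE2 0x80 0x99

# Windows-1252 maps each of those bytes to its own character:
# 0xE2 -> U+00E2, 0x80 -> U+20AC, 0x99 -> U+2122
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex(), [hex(ord(c)) for c in misread])
```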
What is this character ( Â ) and how do I remove it with PHP?
"Latin 1" is your problem here. Latin-1 can encode only 256 characters, so the vast majority of the characters a UTF-8 web page can contain (Unicode defines well over 100,000) cannot be stored in a Latin-1 code page.
For your immediate problem you should be able to strip it with:
$clean = str_replace(chr(194), " ", $dirty);
However, I would switch your database to use UTF-8 ASAP, as the problem will almost certainly recur.
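A replacement like this works on raw bytes, so it matters which bytes are targeted. The Python sketch below compares replacing byte 194 (0xC2) on its own with replacing the full sequence 0xC2 0xA0. The sample text is made up, and it assumes the data really is UTF-8:

```python
# Sample UTF-8 data containing a non-breaking space and a "»" character.
dirty = "10\u00a0km \u00bb next".encode("utf-8")

# Replacing byte 0xC2 alone also hits "»" (0xC2 0xBB) and leaves
# orphaned continuation bytes behind.
lone = dirty.replace(b"\xc2", b" ")

# Replacing the two-byte sequence touches only the non-breaking space.
paired = dirty.replace(b"\xc2\xa0", b" ")

try:
    lone.decode("utf-8")
    lone_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_is_valid_utf8 = False

print(lone_is_valid_utf8, paired.decode("utf-8"))
```

So the single-byte replacement is only a stopgap for data that will keep being displayed as Latin-1; once the database is UTF-8, match the whole sequence or, better, fix the encoding mismatch itself.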
Why am I printing Â» instead of »?
Why is Â being added to it?
Because your stylesheet is saved as UTF-8, but the browser is decoding it using Windows-1252. This is probably because the page that's referencing the stylesheet has no declared encoding and the browser is guessing Windows-1252, which is typically the default encoding on Western European locales. The byte sequence 0xC2 0xBB represents » in UTF-8 but Â» in Windows-1252.
Adding the <meta charset> declaration in Akjm's answer to the page(s) that reference the stylesheet should make this work. If you can't do this (for example because you are making a stylesheet that might be referenced by other people's pages which could be in any encoding), alternatives are:
- encoding the character using CSS backslash-escapes, as in @RobFonseca's answer. (The HTML character reference syntax in @Akjm's answer is not effective here.)
- putting the rule @charset "utf-8"; at the top of the stylesheet to tell the browser that the stylesheet has its own encoding, independently of whatever the page uses
- setting the web server to serve the stylesheet with an HTTP Content-Type: text/css;charset=utf-8 header
Support for approaches 2–4 has traditionally been rocky, though I haven't checked browser support recently.
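For the backslash-escape route, the escape is just the character's code point in hexadecimal. A small Python sketch for generating one follows; `css_escape` is a hypothetical helper and the `a::after` selector is only an example. It pads to six hex digits so that no terminating space is needed before the next character:

```python
def css_escape(ch: str) -> str:
    """Return a six-digit CSS backslash escape for a single character."""
    return "\\{:06X}".format(ord(ch))


# Build an ASCII-only rule that still renders "»".
rule = 'a::after {{ content: "{}"; }}'.format(css_escape("\u00bb"))

print(rule)
```

A stylesheet written this way contains no non-ASCII bytes, so it renders the same regardless of which encoding the referencing page makes the browser assume.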