How Do Browsers Deal with "Tofu" Characters

How do browsers deal with Tofu characters

In Chrome you can inspect the element, then switch into the "Computed" tab and it will show the Rendered Fonts. Chrome (and I think most modern browsers) will use the font for all characters that are supported, and will silently use a local font to display the remaining characters.

E.g. for Japanese it will use "Yu Gothic" for 5 glyphs and "SimSun" for 3 glyphs.

Japanese

For Tamil it uses 2 glyphs from "Orbitron" (the space and the exclamation mark) and the remaining 12 are rendered in "Nirmala UI":
Tamil

For Spanish it only renders "Orbitron", I'm not sure why, but I guess it basically uses the existing exclamation mark and rotates it.

Which local font file is used can vary based on Browser and OS.

How do web browsers implement font fallback?

Font fallback in browsers (as opposed to, say, in an OS) is based on two things:

  1. The CSS specification, which gives the fonts that are to be used for fallback, and
  2. The text engine, which does text shaping.

The CSS spec is fairly trivial in this respect, simply giving the list of fonts using their system names, but several possible "catch all" fonts that are in no way guaranteed to be the same from computer to computer (there is no reason to assume that serif maps to Times or Times New Roman, for instance).

The fallback algorithm used by text engines is entirely up to the engine, but usually kicks in during the glyph lookup step: the text engine sees a string of code points, and tries to use a font to shape that string. For each point in the sequence, it checks whether the font has a matching glyph (by consulting the CMAP table and subtables), or a rule that tells the engine that there may be a glyph to use only if more code points follow, through the GSUB mechanism (For instance, a font without glyphs for the individual letters e, t and c, but with a glyph for & and a GSUB rule that says the sequence e+t+c should be in-text replaced with the single glyph &), and when it's finished accumulating this kind of "unit of points", it shapes the text and hands it back to whatever asked it to shape text.

If, during glyph lookup, it turns out the font doesn't contain anything that lets the engine shape a particular code point (i.e. running through the CMAP data as well as the GSUB rules still shows "there is no glyph") then the text engine can do two things:

  1. Give up. There is no glyph, instead use the .notdef outline defined as glyph id 0, and generally give you text with lovely empty boxes (lovingly called "tofu" by font folks) or question marks.
  2. Attempt font fallback, where it will try another font to find a glyph for the unsupported code point in.

When using fallback, an engine can go down a list of alternative fonts until either: (a) a glyph is found, or (b) the list is exhausted, at which point the engine has to give up, and will use the .notdef glyph. Whether the engine grabs the .notdef glyph from the original font, or from the last font in the list, is entirely up to the engine (although usually it'll go with the first font, for legibility)

There is no "standard" algorithm for this defined anywhere; font fallback is basically a convenience mechanism offered by text engine authors, like how browsers come with bookmark managers (handy, and not part of any spec). As far as OpenType is concerned, there are no requirements on whether an engine should just serve up .notdef when a glyph is not found, or whether it should serve up the part it could shape, then find the missing glyph somewhere else, and render text that way. CSS implies that your text engine should have at least some form of font fallback, but it doesn't specify how it should work, or when it should kick in.

Rectangles instead of letters when browsing Angular 12 app using mobile browser

The root issue is that a font is being used to display the text that doesn't support all of the characters in the text. When a character isn't supported by a font, the ".notdef" glyph (glyph 0) would be used, and in many fonts that looks like a rectangle as shown in your image.

It appears you're requesting the Playfair Display font from Google Fonts. According to Googles documentation of the glyph repertoire for that font, it doesn't support characters from the Greek script.

However, the image you show isn't Playfair Display at all! All the same, whatever font was selected by the browser, the rectangles appear because those characters are not supported by that font.

The more interesting questions are:

  • Why wasn't Playfair Display used?
  • What font did the browser select?
  • Why didn't the browser selected another font for those characters not supported by the first font (font fallback)?

The last question in particular: for Chrome on Android, typically a Noto ("no tofu!") font would be used to avoid displaying any rectangles (aka pieces of tofu). You haven't given the actual string being displayed, so a possibility is that those aren't typical Greek characters but are some unusual Unicode characters not supported in the Noto fonts on the given device.

Increase font-size globally for CJK fonts?

Just changing the root size probably won't give you what you want as the rest of the tags use the variable for pixel margins etc.

I recommend customizing the build and there are two options for fine tuning the elements:

1) using a body/wrapper class:

LESS

.cjk{
& h1,
& h2,
& h3{
font-family:@font-family-zh
}
...
}

2) use guarded mixins to change specific elements such as:

in variables.less to change everything together

.SetFontSize() when (@cjk) {
@font-size-base: 17px;
}
.SetFontSize() when (@cjk = false) {
@font-size-base: 15px;
}
.SetFontSize();

or in specific files like scaffolding.less for finer control

body {
font-family: @font-family-base;
& when (@cjk) {
font-size: @font-size-base-adjusted;
}
& when (@cjk = false) {
font-size: @font-size-base;
}
font-weight: @font-weight-base;
line-height: @line-height-computed;
color: @text-color;
background-color: @body-bg;
position: relative;
}

Bootstrap.less

@cjk = false;
...
rest of the imports

Bootstrap-zh.less

@cjk = true;
...
rest of the imports

option 1 is more verbose but you only have one file.

option 2 is streamlined but you have to switch out the files bootstrap.css <> bootstrap-cjk.css

How to render a non english characters (japanese, chinise) in html

These two things should get you set. Inside the <head> of your HTML, you should have a meta tag to specify UTF-8 charset:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

and on the PHP side (before any HTML output):

header('Content-Type: text/html; charset=UTF-8');

How do I ensure my pdf-generating application supports as many language fonts as possible?

Depending on how big you application can be, you could take a look at Noto, which will "support all languages with a harmonious look and feel." But be warned that covering every writing system on the planet will require at least a gigabyte of fonts.

Browsers support a large range of writing systems ("languages") by relying on different fallback fonts supplied by the operating system. Only when they are exhausted you will see a tofu.

Problem with characters encoding text buffer from process

I'm fonded solution for this.

Console settings

In windows console settings i found the current encoding is CP866 and add it to mb_encoding.

For Russian symbols in console:

mb_convert_encoding($out, 'utf-8', 'cp866');

or

 iconv('cp866', 'utf-8', $out);

Check your console settings to figuraute what encoding it use.

Now all works like a sharm. Thanks a lot for the help!

check if glyph is rendered correctly

Unfortunately, there isn't; it's outright missing functionality. The closest you can get is to get the actual font info for a character — requires 8.6 I think — or to measure its width, but that doesn't really help:

% font actual TkFixedFont
-family Monaco -size 11 -weight normal -slant roman -underline 0 -overstrike 0
% font actual TkFixedFont \uf16c
-family Monaco -size 11 -weight normal -slant roman -underline 0 -overstrike 0
% font measure TkFixedFont \uf16c
14

(The character renders as the glyph-not-found symbol on this system with that font.)

Why is this the extended ascii character (â, é, etc) getting replaced with ? characters?

That's what the browser does when it doesn't know the encoding to use for a character. Make sure you specify the encoding type of the text you send to the client either in headers or markup meta.

In HTML:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

In PHP (before any other content is sent to the client):

header('Content-Type: text/html; charset=utf-8');

I'm assuming you'll want UTF-8 encoding. If your site uses another encoding for text, then you should replace UTF-8 with the encoding you're using.

One thing to note about using HTML to specify the encoding is that the browser will restart rendering a page once it sees the Content-Type meta tag, so you should include the <meta /> tag immediately after the <head /> tag in your page so the browser doesn't do any more extra processing than it needs.

Another common charset is "iso-8859-1" (Basic Latin), which you may want to use instead of UTF-8. You can find more detailed info from this awesome article on character encodings and the web. You can also get an exhaustive list of character encodings here if you need a specific type.


If nothing else works, another (rare) possibility is that you may not have a font installed on your computer with the characters needed to display the page. I've tried repeating your results on my own server and had no luck, possibly because I have a lot of fonts installed on my machine so the browser can always substitute unavailable characters from one font with another font.

What I did notice by investigating further is that if text is sent in an encoding different than the encoding the browser reports as, Unicode characters can render unexpectedly. To work around this, I used the HTML character entity representation of special characters, so â becomes â in my HTML and é becomes é. Once I did this, no matter what encoding I reported as, my characters rendered correctly.

Obviously you don't want to modify your database to HTML encode Unicode characters. Your best option if you must do this is to use a PHP function, htmlentities(). You should use this function on any data-driven text you expect to have Unicode characters in. This may be annoying to do, but if specifying the encoding doesn't help, this is a good last resort for forcing Unicode characters to work.



Related Topics



Leave a reply



Submit