How Can One Find the Unicode Codepoints That a Font Has Glyphs For, on a Debian-Based System

I would try any of the FreeType 2 language bindings. Here's a Perl solution to list the Unicode code points of a font using Font::FreeType:

use Font::FreeType;
Font::FreeType->new->face('DejaVuSans.ttf')->foreach_char(sub {
    printf("%04X\n", $_->char_code);
});

Finding out what characters a given font supports

Here is a method using the fontTools Python library (which you can install with something like pip install fonttools):

#!/usr/bin/env python
from itertools import chain
import sys

from fontTools.ttLib import TTFont
from fontTools.unicode import Unicode

with TTFont(
    sys.argv[1], 0, allowVID=0, ignoreDecompileErrors=True, fontNumber=-1
) as ttf:
    chars = chain.from_iterable(
        [y + (Unicode[y[0]],) for y in x.cmap.items()] for x in ttf["cmap"].tables
    )
    if len(sys.argv) == 2:  # print all code points
        for c in chars:
            print(c)
    elif len(sys.argv) >= 3:  # search code points / characters
        code_points = {c[0] for c in chars}
        for i in sys.argv[2:]:
            code_point = int(i)  # search code point
            # code_point = ord(i)  # search character
            print(Unicode[code_point])
            print(code_point in code_points)

The script takes as arguments the font path and optionally code points / characters to search for:

$ python checkfont.py /usr/share/fonts/**/DejaVuSans.ttf
(32, 'space', 'SPACE')
(33, 'exclam', 'EXCLAMATION MARK')
(34, 'quotedbl', 'QUOTATION MARK')


$ python checkfont.py /usr/share/fonts/**/DejaVuSans.ttf 65 12622 # A ㅎ
LATIN CAPITAL LETTER A
True
HANGUL LETTER HIEUH
False
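Printing every code point on its own line gets long for large fonts. The following is a minimal sketch, not part of the script above, that collapses the covered code points into ranges. It assumes fontTools is installed and uses its getBestCmap() method, which returns the font's preferred Unicode mapping; the helper names to_ranges and font_code_points are made up for this illustration, and font collections (.ttc) are not handled.

```python
#!/usr/bin/env python3
"""Print the code points a font covers as compact ranges (illustrative sketch)."""
import sys


def to_ranges(code_points):
    """Collapse an iterable of ints into sorted, inclusive (start, end) ranges."""
    ranges = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current run
        else:
            ranges.append((cp, cp))  # start a new run
    return ranges


def font_code_points(path):
    """Return the code points in the font's preferred Unicode cmap subtable."""
    # Imported lazily so to_ranges() stays usable without fontTools installed.
    from fontTools.ttLib import TTFont

    with TTFont(str(path)) as font:
        # getBestCmap() returns None when the font has no Unicode cmap.
        return set(font.getBestCmap() or {})


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for start, end in to_ranges(font_code_points(sys.argv[1])):
            if start == end:
                print(f"U+{start:04X}")
            else:
                print(f"U+{start:04X}-U+{end:04X}")
    else:
        print("usage: fontranges.py FONT_FILE", file=sys.stderr)
```

The script name fontranges.py is only a placeholder. Unlike the script above, this sketch looks at a single cmap subtable rather than all of them, so the two can disagree for fonts with unusual legacy subtables.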

How to check which fonts contain a specific character with perl?

For .ttf files, you can use Font::TTF and related modules:

use Font::TTF::Font;
my $font = Font::TTF::Font->open( "C:/Windows/Fonts/ariali.ttf" );
my @supported_codepoints = sort { $a <=> $b } $font->{cmap}->reverse;

I'm getting out of my depth, but there's also a Font::TTF::Ttc module in the Font::TTF distribution that you could poke around in and see if you can extract more information about supported code points.

(Font::TTF suggestion came from here)
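For comparison, here is a rough Python sketch of the same idea applied to a whole directory: it lists the .ttf files whose Unicode cmap contains a given character. It is not a translation of the Perl code above. It assumes fontTools is installed, and the function names (load_code_points, fonts_with_char) and the loader parameter are invented for this example.

```python
import sys
from pathlib import Path


def load_code_points(path):
    """Return the set of code points in one font's preferred Unicode cmap."""
    # Imported lazily; requires `pip install fonttools`.
    from fontTools.ttLib import TTFont

    with TTFont(str(path)) as font:
        return set(font.getBestCmap() or {})


def fonts_with_char(paths, char, loader=load_code_points):
    """Return the paths whose font maps the given single character."""
    wanted = ord(char)
    matches = []
    for path in paths:
        try:
            if wanted in loader(path):
                matches.append(path)
        except Exception as exc:  # unreadable or unsupported file: report and skip
            print(f"skipping {path}: {exc}", file=sys.stderr)
    return matches


if __name__ == "__main__":
    if len(sys.argv) == 3:
        font_dir, char = Path(sys.argv[1]), sys.argv[2]
        for match in fonts_with_char(sorted(font_dir.rglob("*.ttf")), char):
            print(match)
    else:
        print("usage: findfonts.py FONT_DIR CHARACTER", file=sys.stderr)
```

The loader parameter exists only so the search logic can be exercised without real font files. A cmap entry tells you the font maps the code point to some glyph; it does not guarantee the glyph looks right.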

Some Unicode characters cannot be printed in PrinceXML

As I found out, Times New Roman is the default serif font in PrinceXML, and it cannot print control characters (U+0000 to U+001F, plus U+007F). So I just remove the control characters from the string before passing it to PrinceXML, as suggested in this SO question (take a look at the accepted answer).

$string = preg_replace('/[\x00-\x1F\x7F]/u', '', $string);
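If the string is prepared in Python rather than PHP, the same filter might look like the sketch below. The function name strip_control_chars is just an illustrative choice. Note that, like the PHP pattern, it also removes tabs and newlines, since those fall inside the U+0000 to U+001F range.

```python
import re

# C0 control characters (U+0000-U+001F) plus DEL (U+007F), as in the PHP pattern.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def strip_control_chars(text):
    """Return text with C0 control characters and DEL removed."""
    return CONTROL_CHARS.sub("", text)


print(repr(strip_control_chars("Hello\x00, \x1fworld\x7f!")))  # 'Hello, world!'
```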

Find the font used to render a character, or containing the glyph?

See this page for an answer (if your GNOME version has not deprecated the feature):

https://fedoraproject.org/wiki/Identifying_fonts

How to properly display unicode symbols on Linux using QT?

There are several possibilities, but the most probable one is that the font you are using doesn't implement the glyph at all. Can you see the triangle in, e.g., an editor when using the same font as your Qt font?

Edit: A font (stored in the system's font files) is essentially a set of instructions for drawing the image of each character on the screen. Most, if not all, fonts are incomplete, which means they do not cover all 1,114,112 code points in the Unicode codespace (the numbers that represent the characters). The files would simply be too large to be practical.

The triangles you want printed are fairly basic, and should be available in many font sets. Liberation Sans and Liberation Serif are two I just checked.

I suspect Qt uses the font set of the system, which can probably be changed in the System Settings somewhere. If you tell us which distribution you are using (i.e. Ubuntu, Debian, ...), maybe we can help.

How can I find encoding of a file via a script on Linux?

It sounds like you're looking for enca. It can guess and even convert between encodings. Just look at the man page.

Or, failing that, use file -i (Linux) or file -I (OS X). That will output MIME-type information for the file, which will also include the character-set encoding. I found a man-page for it, too :)
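If neither tool is available, a very rough guess can be scripted. The sketch below is not how enca or file work internally; it only checks for a byte-order mark and then tries a short list of candidate encodings in order. The function name guess_encoding and the default candidate list are assumptions made for this example.

```python
import codecs
import sys

# Byte-order marks, checked longest first so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, candidates=("ascii", "utf-8")):
    """Return the first plausible encoding name for the bytes, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
    return None


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as handle:
            print(guess_encoding(handle.read()))
```

Single-byte encodings such as ISO-8859-1 accept any byte sequence, so adding them to the candidate list makes the function always return something; that is why they are left out by default and why statistical tools such as enca exist.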


