How to Prevent Unicode Characters from Rendering as Emoji in HTML from JavaScript

Prevent unicode characters from turning into emojis (especially in gmail)

What question do you want answered? Do you want these characters to display with text presentation in the GMail web app (viewed in a browser)? Or, do you want these characters to appear as authored in the email message which GMail sends? Or, do you want these characters to display with text presentation in a Gmail app for iOS, or Android? All these are different cases.

When I paste the Wikipedia table you show into a GMail message composition window in Firefox, I see the same thing as in your second example: the "base+VS15 (text)" row displays the same as the rows above and below. However, the HTML which GMail generates in the composition window shows something very interesting.

For instance, the cell in the "base+VS15 (text)" row, "26A0" column, has this content (line breaks added for readability):

<td><img data-emoji="⚠" class="an1 CToWUd" alt="⚠" aria-label="⚠" 
src="https://fonts.gstatic.com/s/e/notoemoji/14.0/26a0/32.png"
loading="lazy">&#xFE0E;</td>

Note: in the actual HTML, the sequence &#xFE0E; was instead the literal character U+FE0E VARIATION SELECTOR-15, not the HTML hexadecimal entity reference. The StackOverflow editor seems to discard the literal U+FE0E character, so I had to represent it with something visible in this answer.

So, it seems that GMail chooses to replace the literal U+26A0 WARNING SIGN character with an <img> tag, which instructs the browser to display the image at the URL …/26a0/32.png. That image is the emoji presentation you observed in GMail. However, GMail does not discard the U+FE0E VARIATION SELECTOR-15. That character is still present, just after the <img> tag.

When I send that email to another email program (I used Thunderbird), I see the table the same way as you showed in your first example. The "base+VS15 (text)" row displays differently from the rows above and below.

When I look at the source of the email message I received, and look in that same cell of the table, I see the following:

=E2=9A=A0=EF=B8=8E

These are the two characters, U+26A0 U+FE0E, expressed in UTF-8 and then represented with MIME quoted-printable encoding (each byte written as = followed by two hex digits). That is evidence that the email message which GMail sent no longer had GMail's <img> tag, and again had the original characters.
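To check that reading of the message source yourself, here is a minimal sketch that decodes such a fragment and lists its code points. The helper names decodeQuotedPrintable and codePoints are made up for this illustration, and the decoder deliberately ignores soft line breaks, so it only suits short fragments like the one above.

```javascript
// Decode a short quoted-printable fragment into a string.
// Simplification (assumption): soft line breaks ("=" at end of line) are not handled.
function decodeQuotedPrintable(fragment) {
  const bytes = [];
  for (let i = 0; i < fragment.length; i++) {
    const hex = fragment.slice(i + 1, i + 3);
    if (fragment[i] === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // "=E2" becomes the byte 0xE2
      i += 2;
    } else {
      bytes.push(fragment.charCodeAt(i)); // literal ASCII character
    }
  }
  // Interpret the collected bytes as UTF-8.
  return new TextDecoder('utf-8').decode(Uint8Array.from(bytes));
}

// List the code points of a string in U+XXXX notation.
function codePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

console.log(codePoints(decodeQuotedPrintable('=E2=9A=A0=EF=B8=8E')));
```

TextDecoder is available globally in current browsers and in recent Node.js versions; on older runtimes it may need to be imported or polyfilled.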

So, do you want these characters to display with text presentation in the GMail web app (viewed in a browser)? I think GMail chooses to deny you that. They perhaps made a design decision that it was better to be sure users saw some form of each emoji, regardless of what browser or fonts the user might have installed, than to get VS15 text display right.

But, do you want these characters to appear as authored in the email message which GMail sends? I think GMail gives you that with no special effort.

It is more difficult to see what is going on with the composition window display of GMail apps for iOS or Android, so I am not going to attempt that.

By the way, I got this HTML by using the Firefox developer tool, control-click…"Inspect Element" on the contents of that cell, then clicking on the tag, and using the control-click… "Copy… Outer HTML" to copy the HTML, then pasting the HTML into a plain text editor.

How to specify emoji version of a Unicode character in HTML?

I think you just specify the "emoji version" as a second entity. Like this:

<p>
&#x2714;<br/>
&#x2714;&#xFE0F;<br/>
</p>
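Variation selectors are invisible, so it can help to generate the entity form from JavaScript instead of typing it by hand. A minimal sketch, assuming you want hexadecimal entity references for every code point; the function name toHexEntities and the constants VS15 and VS16 are made up for this illustration:

```javascript
const VS15 = '\uFE0E'; // VARIATION SELECTOR-15: requests text presentation
const VS16 = '\uFE0F'; // VARIATION SELECTOR-16: requests emoji presentation

// Convert every code point of a string into an HTML hexadecimal entity reference.
// Array.from iterates by code point, so non-BMP characters stay in one piece.
function toHexEntities(text) {
  return Array.from(text, (ch) =>
    '&#x' + ch.codePointAt(0).toString(16).toUpperCase() + ';').join('');
}

console.log(toHexEntities('\u2714'));        // base character only
console.log(toHexEntities('\u2714' + VS16)); // base character plus emoji selector
console.log(toHexEntities('\u2714' + VS15)); // base character plus text selector
```

Whether the browser honours the selector still depends on the fonts and the platform, as the other answers on this page explain.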

How to remove emoji code using javascript?

The range you have selected is the Private Use Area, containing non-standard characters. Carriers used to encode emoji as different, inconsistent values inside this range.

More recently, the emoji have been given standardised 'unified' codepoints. Many of these are outside of the Basic Multilingual Plane, in the block U+1F300–U+1F5FF, including your example U+1F534 Large Red Circle.

You could detect these characters with [\U0001F300-\U0001F5FF] in a regex engine that supports non-BMP characters, but JavaScript's RegExp, at least without the ES2015 u flag, is not such a beast. The JS string model is based on UTF-16 code units, so there you have to work with the UTF-16 surrogates in the regexp:

return this.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')

However, note that there are other characters in the Basic Multilingual Plane that are used as emoji by phones but which long predate emoji. For example U+2665 is the traditional Heart Suit character ♥, but it may be rendered as an emoji graphic on some devices. It's up to you whether you treat this as emoji and try to remove it. See this list for more examples.
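On engines that support the ES2015 u flag, the same ranges can be written by code point instead of by surrogate pair. The following is a sketch under that assumption, not a complete emoji remover; the names stripEmoji, EMOJI_RANGES and EMOJI_RANGES_SURROGATES are made up for this illustration. It covers only the Private Use Area and the U+1F300–U+1F5FF block discussed above, so characters such as U+2665 are left alone.

```javascript
// Code-point form: requires the "u" flag (ES2015 and later).
const EMOJI_RANGES = /[\u{E000}-\u{F8FF}\u{1F300}-\u{1F5FF}]/gu;

// Surrogate-pair form from the answer above, kept for comparison.
const EMOJI_RANGES_SURROGATES =
  /([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g;

// Remove Private Use Area characters and the U+1F300..U+1F5FF block.
function stripEmoji(text) {
  return text.replace(EMOJI_RANGES, '');
}

// U+1F534 and U+E001 are removed; U+2665 (heart suit) is kept.
const sample = 'stop \u{1F534} \uE001 \u2665';
console.log(JSON.stringify(stripEmoji(sample)));
console.log(stripEmoji(sample) === sample.replace(EMOJI_RANGES_SURROGATES, ''));
```

Both patterns match the same set of characters; the code-point form is simply easier to read and to extend with further blocks.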

How to prevent emojis rendering in Edge

Try adding font-family: "Segoe UI Symbol"; to the CSS rule for the element that contains the HTML entity.

For example:

span { font-family: "Segoe UI Symbol"; }
<span>&#x2714;&#xFE0E;</span>

How can I control (across all devices and browsers) whether a character is displayed as the emoji version or text version?

The short answer is that you can’t. The text variation selector does not work generally for all characters; only those sequences explicitly defined in the standard are valid. Chrome on Windows is in fact violating the standard in your first example, because no variation sequences are defined for two of the characters shown there. There is no Unicode mechanism to stop these characters from behaving like emoji; <U+1F304, U+FE0E> must display identically to U+1F304 alone.

All emoji characters that allow variation selectors are listed in the data file emoji-variation-sequences.txt, and I also curate a visual table on my website for easy access.
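If you have a copy of that data file, a script can answer whether a given character has defined variation sequences. This is a rough sketch that assumes each data line has the layout "<base> <selector> ; <style>; # comment" with hexadecimal code points; check that against the file you actually download. The function names and the two-line excerpt are made up for this illustration and are not the real file contents.

```javascript
// Collect the base code points that have a VS15 or VS16 sequence.
// Assumption: data lines look like "26A0 FE0E ; text style; # comment".
function parseVariationSequences(fileText) {
  const bases = new Set();
  for (const rawLine of fileText.split(/\r?\n/)) {
    const data = rawLine.split('#')[0].trim(); // drop comments
    if (data === '') continue;                 // skip blank and comment-only lines
    const [sequence] = data.split(';');        // first field holds the code points
    const [base, selector] = sequence.trim().split(/\s+/);
    if (selector === 'FE0E' || selector === 'FE0F') {
      bases.add(parseInt(base, 16));
    }
  }
  return bases;
}

// True if the first code point of ch appears as a base in the parsed data.
function hasVariationSequence(bases, ch) {
  return bases.has(ch.codePointAt(0));
}

// Illustrative excerpt only; load the real file in practice.
const excerpt = [
  '# illustrative excerpt, not the full file',
  '26A0 FE0E  ; text style;  # WARNING SIGN',
  '26A0 FE0F  ; emoji style; # WARNING SIGN',
].join('\n');

const bases = parseVariationSequences(excerpt);
console.log(hasVariationSequence(bases, '\u26A0'));
console.log(hasVariationSequence(bases, '\u{1F304}'));
```

A character missing from the parsed set is one where appending U+FE0E or U+FE0F should have no effect, which is the situation described above for U+1F304.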

However, even for those characters that do support variation selectors, there is no guarantee that the selectors will actually work. Older Android phones, for example, cannot display many emoji in text style because they simply lack the necessary fonts.

If you want to ensure universal text-style display, you will need to supply your own fonts to override the system defaults.

Sidenote: While your Windows example gets the variation selectors wrong, it actually does handle ⚠ correctly because that character is meant to be text-style by default unlike all the others. If you need emoji-style display, you have to append the emoji variation selector U+FE0F like so: ⚠️. This is not necessary (but possible) for ⌛ and ⚡ because they’re emoji-default.


