\u200b (Zero Width Space) Characters in My JS Code: Where Did They Come From?

Here's a stab in the dark.

My bet would be on the Google Chrome Inspector. Searching through the Chromium source, I spotted the following block of code:

    if (hasText)
        attrSpanElement.appendChild(document.createTextNode("=\u200B\""));

    if (linkify && (name === "src" || name === "href")) {
        var rewrittenHref = WebInspector.resourceURLForRelatedNode(node, value);
        value = value.replace(/([\/;:\)\]\}])/g, "$1\u200B");
        attrSpanElement.appendChild(linkify(rewrittenHref, value, "webkit-html-attribute-value", node.nodeName().toLowerCase() === "a"));
    } else {
        value = value.replace(/([\/;:\)\]\}])/g, "$1\u200B");
        var attrValueElement = attrSpanElement.createChild("span", "webkit-html-attribute-value");
        attrValueElement.textContent = value;
    }

It's quite possible that I'm simply barking up the wrong tree here, but it looks like zero-width spaces were being inserted (to handle soft text wrapping?) during the display of attributes. Perhaps the "Copy as HTML" function had not properly removed them?
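To see the effect outside the inspector, the string transformation from the quoted Chromium code can be replayed on its own. This is only a sketch of that one `replace` call: the function name `addBreakOpportunities` and the sample URL are made up for illustration, and it says nothing about how Chrome's copy logic actually behaves.

```javascript
// Hypothetical re-creation of the Chromium line
//   value = value.replace(/([\/;:\)\]\}])/g, "$1\u200B");
// which appends a zero-width space after every / ; : ) ] or } character.
function addBreakOpportunities(value) {
  return value.replace(/([\/;:\)\]\}])/g, "$1\u200B");
}

const original = "http://example.com/a/b"; // illustrative value only
const displayed = addBreakOpportunities(original);

// The two strings render identically, but they are not equal.
console.log(displayed === original); // false
// One zero-width space after ":" and one after each of the four "/".
console.log((displayed.match(/\u200B/g) || []).length); // 5
// Stripping the zero-width spaces restores the original value.
console.log(displayed.replace(/\u200B/g, "") === original); // true
```

If something copied `displayed` rather than `original`, the five invisible characters would travel with it, which would fit the symptom in the question.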


Update

After fiddling with the Chrome element inspector, I'm almost convinced that's where your stray \u200b came from. Notice how the line can wrap not only at visible spaces but also after = or after any character matched by /([\/;:\)\]\}])/ (that is, / ; : ) ] or }), thanks to the inserted zero-width spaces.

[Screenshot of the Chrome element inspector]

Unfortunately, I can't replicate your problem of them inadvertently ending up in your clipboard (I used Chrome 13.0.782.112 on Windows XP).

It would certainly be worth submitting a bug report if you're able to reproduce the behaviour.

Escape \u200b (Zero width space) and other illegal JavaScript characters

If the character only appears inside strings, either escaped as \u200b or as a literal, you should be fine. It's only illegal in the code itself, for example as part of an identifier. And even then, you can still use it in an object property name with bracket (subscript) notation, e.g. obj["aaa\u200b"] = "foo", because a property key can be any string; this is standard behaviour, not a Chrome quirk.
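A quick way to check this is to hand both variants to the JavaScript parser. The sketch below uses the `Function` constructor purely to compile a snippet without running it; `parses` is a hypothetical helper name, not an existing API.

```javascript
// Returns true if `source` compiles as a function body, false on a SyntaxError.
// new Function() only compiles the code here; the function is never called.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// "\u200B" in these JS string literals becomes the real character in `source`.
console.log(parses('var s = "aaa\u200Bbbb";')); // true: inside a string literal
console.log(parses("var aaa\u200Bbbb = 1;"));   // false: inside an identifier

// Any string is a valid property key, so bracket notation works.
const obj = {};
obj["aaa\u200b"] = "foo";
console.log(obj["aaa\u200b"]); // foo
```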

Here are a few examples that seem to work:

var escaped = "aaa \u200b bbb";
var unescaped = "bbb ​ ccc";

console.log(escaped.charCodeAt(4));
console.log(unescaped.charCodeAt(4));

var obj = {};
obj[escaped] = "foo";
obj[unescaped] = "bar";
console.log(obj[escaped]);
console.log(obj[unescaped]);
console.log(obj["aaa \u200b bbb"]);
console.log(obj["bbb ​ ccc"]);

http://codepen.io/anon/pen/JqjEK

You might also be interested in this Q&A I wrote a while ago: No visible cause for "Unexpected token ILLEGAL"
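If you'd rather not keep invisible literals in your source at all, a small helper can rewrite them as the \u escapes mentioned above, which behave identically inside strings. This is a minimal sketch that assumes you only care about the four zero-width characters (U+200B to U+200D and U+FEFF); `escapeInvisible` is a hypothetical name.

```javascript
// Rewrites the four zero-width characters as \uXXXX escape text, so the
// result can be pasted into a string literal without hidden characters.
function escapeInvisible(text) {
  return text.replace(/[\u200B-\u200D\uFEFF]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(escapeInvisible("aaa \u200b bbb")); // aaa \u200B bbb
```

The output is only meant for string literals; an escaped U+200B is just as illegal in an identifier as the raw character.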

Remove zero-width space characters from a JavaScript string

Unicode has the following zero-width characters:

  • U+200B zero width space
  • U+200C zero width non-joiner
  • U+200D zero width joiner
  • U+FEFF zero width no-break space

To remove them from a string in JavaScript, you can use a simple regular expression:

var userInput = 'a\u200Bb\u200Cc\u200Dd\uFEFFe';
console.log(userInput.length); // 9
var result = userInput.replace(/[\u200B-\u200D\uFEFF]/g, '');
console.log(result.length); // 5

Note that many more characters may be invisible; some of ASCII’s control characters, for example.
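To cover those as well, one option (assuming an engine with Unicode property escapes, i.e. ES2018 or later) is to strip the whole Cc (control) and Cf (format) general categories while keeping tab, line feed and carriage return. Treat this as a sketch rather than a drop-in filter: Cf also contains U+200D, which emoji sequences rely on, so a combined emoji such as a family emoji will fall apart into its components. `stripInvisible` is a hypothetical name.

```javascript
// Matches any control (Cc) or format (Cf) character except tab, LF and CR.
// The negative lookahead skips the three whitespace controls we want to keep.
const INVISIBLE = /(?![\t\n\r])[\p{Cc}\p{Cf}]/gu;

function stripInvisible(text) {
  return text.replace(INVISIBLE, "");
}

const input = "a\u200Bb\u0007c\u00ADd\te\n"; // ZWSP, BEL, soft hyphen, tab, newline
console.log(JSON.stringify(stripInvisible(input))); // "abcd\te\n"
```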

Illegal Character error: '\u200b'

\u200b is a "zero-width space" in Unicode.

Using a simple text editor, delete line 12 (the blank line), save the file, then re-add the blank line and save again.

If that doesn't fix it, delete lines 11 and 13 as well and recreate them.
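Rather than deleting lines by guesswork, you can also locate the characters first. Below is a minimal sketch (the helper name `findZeroWidth` is hypothetical) that reports the 1-based line and column of each zero-width character in a piece of text:

```javascript
const ZERO_WIDTH = /[\u200B-\u200D\uFEFF]/g;

// Report the 1-based line and column of every zero-width character in `text`.
function findZeroWidth(text) {
  const hits = [];
  text.split("\n").forEach((lineText, i) => {
    for (const match of lineText.matchAll(ZERO_WIDTH)) {
      hits.push({
        line: i + 1,
        column: match.index + 1,
        codePoint: "U+" + match[0].charCodeAt(0).toString(16).toUpperCase(),
      });
    }
  });
  return hits;
}

console.log(findZeroWidth("ok\nbad\u200Bline"));
```

In Node.js you could feed it a file with `findZeroWidth(require("fs").readFileSync("yourfile.js", "utf8"))`, where `yourfile.js` is a placeholder. Columns are counted in UTF-16 code units, so they are off by one for each astral character (such as an emoji) earlier on the same line.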

How to remove zero-width space characters &zwj; from the text

The string &zwj; is the HTML character entity for the zero-width joiner. When a web browser sees it, it replaces it with an actual zero-width joiner, but as far as Ruby is concerned it is just a five-character string.

What you want to do is to specify the actual zero-width joiner character. It has the codepoint U+200D, so you can use it like this, using Ruby’s Unicode escape:

text.gsub("\u200D", "")

This should remove the zero-width joiner characters themselves, rather than looking for the string &zwj;, which is what your original code was doing.
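The same mix-up can happen in JavaScript, the language used elsewhere on this page. This sketch contrasts the five-character entity text with the single character it stands for:

```javascript
const entityText = "&zwj;";  // five literal characters: & z w j ;
const actualChar = "\u200D"; // one character: ZERO WIDTH JOINER
console.log(entityText.length, actualChar.length); // 5 1

const sample = "a\u200Db"; // contains the real character, not the entity text
// Looking for the entity text finds nothing to remove...
console.log(sample.replace(/&zwj;/g, "").length); // 3
// ...while looking for the character itself removes it.
console.log(sample.replace(/\u200D/g, "").length); // 2
```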

Zero-width space vs zero-width non-joiner

A zero-width non-joiner is almost nothing at all. Its only purpose is to split things in two (in cursive scripts such as Arabic or Persian, it stops neighbouring letters from joining). For example, 123 zero-width-non-joiner 456 is two numbers with nothing in between.

A zero-width space is a space character, just a very, very narrow one. For example, 123 zero-width-space 456 is two numbers with a space character in between.
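Typography aside, JavaScript does not count either character as whitespace: both have the Unicode general category Cf (format), so `/\s/` does not match them and `String.prototype.trim` leaves them in place. The hypothetical `classify` helper below checks this, with an ordinary space for comparison (it assumes ES2018+ for the `\p{Cf}` property escape).

```javascript
// Report how JavaScript classifies a single character.
function classify(ch) {
  return {
    matchesWhitespace: /\s/.test(ch),     // ECMAScript WhiteSpace / LineTerminator
    isFormatChar: /\p{Cf}/u.test(ch),     // Unicode general category Cf
    survivesTrim: ch.trim().length === 1, // String.prototype.trim keeps it
  };
}

console.log("ZWSP ", classify("\u200B"));
console.log("ZWNJ ", classify("\u200C"));
console.log("space", classify(" "));
```

For both ZWSP and ZWNJ it reports `matchesWhitespace: false`, `isFormatChar: true` and `survivesTrim: true`; the ordinary space gives the opposite on all three.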

String length shows that this string's length is 7, but actually string length is 6

From what I can see, the data you're receiving contains an extra U+200B zero-width space character. That's the main reason the length isn't what you expect.

I copied { table : '​circle' } from the comments and ran it in the browser console; this is how it looks there:

[Screenshot of the browser console output]

In the snippet below the same thing is happening: \u200b, the Unicode zero-width space character, has been added to the string.

let x = { table : '​circle' }

console.log(x.table.length);
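Writing the hidden character as an escape makes the problem visible in the source, and removing it gives the expected length. A minimal sketch (`data` is a hypothetical stand-in that mirrors the `x` object above):

```javascript
// Same value as above, but with the zero-width space written as an escape.
const data = { table: "\u200Bcircle" };
console.log(data.table.length); // 7

// Strip zero-width spaces before measuring or comparing the value.
const cleaned = data.table.replace(/\u200B/g, "");
console.log(cleaned.length); // 6
console.log(cleaned === "circle"); // true
```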

How to identify zero-width character?

Aha, use this website http://www.fileformat.info/info/unicode/char/search.htm?q=%E2%80%8B&preview=entity

Are you looking for Unicode character U+200B: ZERO WIDTH SPACE?

http://www.fileformat.info/info/unicode/char/200b/index.htm
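If you'd rather stay in the browser console, you can list the code points of a suspicious string and then look up any unfamiliar ones on that site. A minimal sketch; `describeCodePoints` is a hypothetical helper name:

```javascript
// Return each character of `text` as a "U+XXXX" label.
// Array.from iterates by code point, so astral characters (e.g. emoji) stay whole.
function describeCodePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(describeCodePoints("o\u200Bm").join(" ")); // U+006F U+200B U+006D
```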

JavaScript remove ZERO WIDTH SPACE (unicode 8203) from string

The number in a unicode escape should be in hex, and the hex for 8203 is 200B (which is indeed a Unicode zero-width space), so:

var b = a.replace(/\u200B/g,'');

Live Example:

var a = "o​m"; //the invisible character is between o and m
var b = a.replace(/\u200B/g,'');
console.log("a.length = " + a.length); // 3
console.log("a === 'om'? " + (a === 'om')); // false
console.log("b.length = " + b.length); // 2
console.log("b === 'om'? " + (b === 'om')); // true
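To double-check the conversion: 8203 = 2 × 16³ + 0 × 16² + 0 × 16 + 11, and 11 is B in hex, giving 0x200B. The sketch below does the same conversion in code and shows a regex-free alternative (`split`/`join`) for when you start from the decimal value; `removeChar` is a hypothetical helper.

```javascript
const decimal = 8203;
const hex = decimal.toString(16).toUpperCase(); // "200B"
const zwsp = String.fromCharCode(decimal);      // the same character as "\u200B"

// Remove every occurrence of `ch` without building a regular expression.
function removeChar(text, ch) {
  return text.split(ch).join("");
}

console.log(hex);                          // 200B
console.log(zwsp === "\u200B");            // true
console.log(removeChar("o\u200Bm", zwsp)); // om
```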

