What's the Difference Between "&nbsp;" and " "

What's the difference between "&nbsp;" and " "?

One is non-breaking space and the other is a regular space. A non-breaking space means that the line should not be wrapped at that point, just like it wouldn’t be wrapped in the middle of a word.

Furthermore, as Svend points out in his comment, non-breaking spaces are not collapsed the way consecutive regular spaces are.

Why is "&nbsp;" different from " " while comparing

The space " " and the non-breaking space are two different characters. The non-breaking space has a code unit of 160, whereas the space has a code unit of 32.

Going off this observation, the spec uses the following logic when strict equality is used between two non-numeric types:

7.2.13 SameValueNonNumeric ( x, y )

The internal comparison abstract operation SameValueNonNumeric(x, y),
where neither x nor y are numeric type values, produces true or false.
Such a comparison is performed as follows:

  • Assert: Type(x) is not Number or BigInt.

  • Assert: Type(x) is the same as Type(y).

  • If Type(x) is Undefined, return true.

  • If Type(x) is Null, return true.

  • If Type(x) is String, then

      • If x and y are exactly the same sequence of code units (same
        length and same code units at corresponding indices), return
        true; otherwise, return false.

    ...

The condition in that last step is not met, because the two strings have different code unit values (as seen above), so comparing them yields false. This shouldn't be too surprising, since we're comparing two different strings (as indicated by their code unit values).
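As a quick check of the reasoning above, here is a minimal sketch that compares the two characters directly. The variable names are illustrative only and do not come from the original question.

```javascript
// A regular space (U+0020) and a non-breaking space (U+00A0).
const space = " ";
const nbsp = "\u00A0";

// Each string is a single code unit, but the code unit values differ.
const spaceCode = space.charCodeAt(0);
const nbspCode = nbsp.charCodeAt(0);

// Strict equality compares the code unit sequences, so this is false.
const areEqual = space === nbsp;

console.log(spaceCode, nbspCode, areEqual); // 32 160 false
```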

When you use \s in a regular expression, however, you're referring to special whitespace characters:

Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to

[ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]

- MDN

The character set above includes both the space character (seen at the beginning of the character set) and the non-breaking space (which has a Unicode encoding of U+00A0), and so both your tests using regular expressions will return true.
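The following sketch illustrates that behaviour. The helper `normalizeSpaces` is a hypothetical name introduced here, shown only as one possible way to make a string comparable with text that uses regular spaces.

```javascript
// \s matches both the regular space and the non-breaking space.
const whitespace = /\s/;
const matchesSpace = whitespace.test(" ");
const matchesNbsp = whitespace.test("\u00A0");

// Hypothetical helper (not from the original answer): replace every
// non-breaking space with a regular space before comparing strings.
function normalizeSpaces(text) {
  return text.replace(/\u00A0/g, " ");
}

console.log(matchesSpace, matchesNbsp);          // true true
console.log(normalizeSpaces("a\u00A0b") === "a b"); // true
```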

When to use &nbsp;

&nbsp; (it should have a semicolon on the end) is an entity for a non-breaking space.

Use it between two words that should not have a line break inserted between them by word wrapping.

There is a good explanation about when this is appropriate grammar on the English StackExchange.


It is sometimes abused to create horizontal space between content in web pages (since it will not collapse like multiple regular spaces). Padding and margins should usually be used instead of this hack.

innerHTML and &nbsp;

technically, I just made an assignment of one variable to another, both are strings. How come the values are different?

Because yes, innerHTML is special. It's not just a simple value property, it's an accessor property. When you assign to it, you're calling a function under-the-covers which parses the HTML you give it and creates the necessary DOM nodes and elements. When you read its value, you're calling another function under-the-covers which recurses through the DOM tree starting from the element on which you access it and builds an HTML string representing that DOM tree.

So while you assigned the character U+00A0 to it, that got turned into a DOM text node; and when you read from it, that DOM text node was rendered as a canonical (per that browser's rules) HTML string: &nbsp;.

You can see that innerHTML isn't just a simple value property by using the DOM to manipulate the element:

var target = document.getElementById("target");
target.innerHTML = "\u00A0";
console.log(target.innerHTML); // "&nbsp;"
target.appendChild(
  document.createElement("span")
);
console.log(target.innerHTML); // "&nbsp;<span></span>"

<div id="target"></div>

&nbsp; is getting replaced with á on labels

This is a character encoding issue.

The probable chain of events is this:

  • The browser is rendering the &nbsp; entity into the Unicode code point "U+00A0 NO-BREAK SPACE".
  • This is being encoded in UTF-8, as the sequence of bytes C2 A0.
  • These bytes are being interpreted by the Zebra printer according to Code page 850, where C2 is mapped to "┬" (U+252C BOX DRAWINGS LIGHT DOWN AND HORIZONTAL) and A0 to "á" (U+00E1 LATIN SMALL LETTER A WITH ACUTE).

In code page 850, a non-breaking space is represented by the byte FF.

You may be able to tell whatever is interpreting the HTML to use Code page 850 instead of UTF-8, so that it sends the byte sequences the printer is expecting. You will need to make sure your input doesn't contain any literal non-ASCII UTF-8 text: escape all non-ASCII characters as HTML entities.

Otherwise, you will need to substitute byte-wise before sending to the printer, or encode in some other way.
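A rough sketch of the byte-wise substitution, under some assumptions: `CP850_EXCERPT` is a hand-written two-entry excerpt of code page 850 (not a full table), `replaceUtf8Nbsp` is a hypothetical helper, and `TextEncoder` is assumed to be available (it is in current browsers and Node.js). How the bytes actually reach the printer is outside the scope of this example.

```javascript
// UTF-8 encoding of U+00A0 produces the two bytes C2 A0.
const utf8Bytes = Array.from(new TextEncoder().encode("\u00A0"));
const utf8Hex = utf8Bytes
  .map((b) => b.toString(16).toUpperCase())
  .join(" ");

// Hand-written excerpt of code page 850, limited to two bytes of interest.
const CP850_EXCERPT = { 0xa0: "\u00E1", 0xff: "\u00A0" };

// The second UTF-8 byte (A0), read as code page 850, is "á".
const cp850ForA0 = CP850_EXCERPT[utf8Bytes[1]];

// Hypothetical fix: replace each UTF-8 pair C2 A0 with the single
// code page 850 byte FF before the data is sent on.
function replaceUtf8Nbsp(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0xc2 && bytes[i + 1] === 0xa0) {
      out.push(0xff);
      i++; // skip the A0 byte that was just consumed
    } else {
      out.push(bytes[i]);
    }
  }
  return out;
}

console.log(utf8Hex);    // "C2 A0"
console.log(cp850ForA0); // "á"
console.log(replaceUtf8Nbsp([0x41, 0xc2, 0xa0, 0x42])); // [ 65, 255, 66 ]
```

This only handles the non-breaking space; any other non-ASCII character in the input would need its own mapping.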

How to write out an HTML entity name (&nbsp;, etc.)

You can use &amp; instead of &.
So &nbsp; will be written as &amp;nbsp;
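If the markup is generated from a script, the same escaping can be applied programmatically. This is a minimal sketch; `escapeAmpersands` is a hypothetical helper name, and it only handles the ampersand, not other characters that may need escaping in HTML.

```javascript
// Hypothetical helper: escape every ampersand so that an entity name
// is displayed as text instead of being interpreted as an entity.
function escapeAmpersands(text) {
  return text.replace(/&/g, "&amp;");
}

const escaped = escapeAmpersands("&nbsp;");
console.log(escaped); // "&amp;nbsp;"
```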

Tab space instead of multiple non-breaking spaces ( nbsp )?

It's much cleaner to use CSS. Try padding-left:5em or margin-left:5em as appropriate instead.

Perl split string at character entity reference &nbsp;

You shouldn't have to know how the text in the document is encoded: findvalue returns an actual non-breaking space (U+00A0) when the document contains &nbsp;. As such, you'd use

split(/\xA0/, $title_string)
-or-
split(/\x{00A0}/, $title_string)
-or-
split(/\N{U+00A0}/, $title_string)
-or-
split(/\N{NBSP}/, $title_string)
-or-
split(/\N{NO-BREAK SPACE}/, $title_string)

