When Did C++ Compilers Start Considering More Than Two Hex Digits in String Literal Character Escapes

GCC is only following the standard. #877: "Each [...] hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence."

Tell where escaped hexadecimal ends in a string

From this C++ escape sequence reference:

Hexadecimal escape sequences have no length limit and terminate at the first character that is not a valid hexadecimal digit.

So without any workarounds there is simply no limit on the digits the compiler will read.

Note that the corresponding C reference says the same thing.
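A small sketch of what that means in practice, assuming an ASCII-compatible execution character set. The function names are made up for illustration. Splitting the literal is the usual way to stop the escape early, because adjacent literals are joined only after each escape has been translated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Splitting the literal ends the hex escape after "41". Written as "\x41BC",
// the same text would be ONE escape with the value 0x41BC, which is out of
// range for a char.
inline std::string split_literal() { return "\x41" "BC"; }

// An octal escape stops after three digits on its own (octal 101 == 0x41),
// so no split is needed here.
inline std::string octal_literal() { return "\101BC"; }

// sizeof counts the terminating null as well.
static_assert(sizeof("\x41" "BC") == 4, "three characters plus the terminator");

// Quick self-check that both spellings produce the same three characters.
inline void check_split_literal() {
    assert(split_literal().size() == 3);
    assert(split_literal() == octal_literal());
}
```

Calling check_split_literal() passes silently when both spellings agree.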

How does more than one hex-digit in a hex-escape work in the grammar?

You're missing the second expansion. The full definition is:

hexadecimal-escape-sequence:
"\x" hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit

So the minimal escape sequence is "\x" followed by a single digit, but you can extend the escape sequence by adding as many digits as you like.
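To see how the recursive rule plays out, here is a rough sketch of a scanner that applies it. The helper hex_escape_length is hypothetical (it is not part of any compiler or library) and only mimics the "keep taking hex digits while you can" behaviour on raw source text.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: if src[pos] starts a hex escape, return how many
// source characters the escape spans (including the "\x"); otherwise 0.
inline std::size_t hex_escape_length(const std::string& src, std::size_t pos) {
    // Need a backslash, an 'x', and at least one more character.
    if (pos + 2 >= src.size() || src[pos] != '\\' || src[pos + 1] != 'x') {
        return 0;
    }
    std::size_t end = pos + 2;
    // The recursive grammar rule: keep appending digits for as long as the
    // next character is a hexadecimal digit.
    while (end < src.size() &&
           std::isxdigit(static_cast<unsigned char>(src[end]))) {
        ++end;
    }
    // "\x" with no digit after it is not a valid escape.
    return end == pos + 2 ? 0 : end - pos;
}

// Self-check on raw (unescaped) source text.
inline void check_hex_escape_length() {
    assert(hex_escape_length(R"(\x124Vx)", 0) == 5);  // stops at 'V'
    assert(hex_escape_length(R"(\xg)", 0) == 0);      // no digit at all
}
```

Note that the scanner has no upper bound on the digit count, which mirrors the grammar: only a non-hex character (or the end of the literal) ends the escape.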

Why is percentage character not escaped with backslash in C?

Because the % is handled by printf. It is not a special character in C, but printf itself treats it differently.
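A minimal sketch of the difference, using snprintf so the result can be inspected. The function name format_percent is made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// To the compiler, "%%" is just two ordinary characters plus the terminator.
static_assert(sizeof("%%") == 3, "two '%' characters plus the terminator");

// Formats an integer percentage, e.g. 50 becomes "50%".
inline std::string format_percent(int value) {
    char buf[32];
    // It is the printf family, at run time, that reads "%%" in the format
    // string as "emit a single '%'".
    std::snprintf(buf, sizeof buf, "%d%%", value);
    return buf;
}

// Self-check: two '%' in the format string, one in the output.
inline void check_format_percent() {
    assert(format_percent(50) == "50%");
}
```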

Rules for C++ string literals escape character

Control characters:

(Hex codes assume an ASCII-compatible character encoding.)

  • \a = \x07 = alert (bell)
  • \b = \x08 = backspace
  • \t = \x09 = horizontal tab
  • \n = \x0A = newline (or line feed)
  • \v = \x0B = vertical tab
  • \f = \x0C = form feed
  • \r = \x0D = carriage return
  • \e = \x1B = escape (non-standard GCC extension)

Punctuation characters:

  • \" = quotation mark (backslash not required for '"')
  • \' = apostrophe (backslash not required for "'")
  • \? = question mark (used to avoid trigraphs)
  • \\ = backslash

Numeric character references:

  • \ + up to 3 octal digits
  • \x + any number of hex digits
  • \u + 4 hex digits (Unicode BMP; universal character names have been in the language since C++98)
  • \U + 8 hex digits (Unicode astral planes; likewise available since C++98)

\0 = \00 = \000 = octal escape for null character

If you do want an actual digit character after a \0, then yes, I recommend string concatenation. Note that the whitespace between the parts of the literal is optional, so you can write "\0""0".
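A quick way to convince yourself is to check the literal sizes at compile time. This is only a sketch, and the function name null_then_zero is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// sizeof counts the terminating null that every string literal carries.
static_assert(sizeof("\00") == 2, "a single octal escape plus the terminator");
static_assert(sizeof("\0" "0") == 3, "null, the digit '0', the terminator");
static_assert(sizeof("\0000") == 3, "three-digit escape, digit '0', terminator");

// Builds a std::string holding a null character followed by the digit '0'.
// The length is passed explicitly because the const char* constructor would
// stop at the embedded null.
inline std::string null_then_zero() {
    const char text[] = "\0" "0";
    return std::string(text, sizeof text - 1);
}

// Self-check of the resulting contents.
inline void check_null_then_zero() {
    const std::string s = null_then_zero();
    assert(s.size() == 2);
    assert(s[0] == '\0' && s[1] == '0');
}
```

The third assertion shows the other option: because an octal escape takes at most three digits, spelling the null character as \000 also leaves a following '0' alone.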

error: hex escape sequence out of range

The problem with:

BOOL r = [res isEqualToString:@"\x124Vx\xc3\xaa"];

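is with the \x124 part. A hex escape consumes every hex digit that follows it, so the compiler reads \x124 as a single character with the value 0x124, which does not fit in the range 00 - ff allowed for a char. And note the removal of the [ ] around the string.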

If you don't want the 4 to be considered part of the \x hex number, you can do this:

BOOL r = [res isEqualToString:@"\x12""4Vx\xc3\xaa"];

The two double-quote characters ensure the \x escape sequence stops where you need it to.

To eliminate the new warning about a possible missing comma when using such a string in an NSArray, you will need to use the older syntax to create the array:

NSArray *answers = [NSArray arrayWithObjects:
@"",
@"#",
@"\x12""4Vx\xc3\xaa",
nil
];

strlen - the length of the string is sometimes increased by 1

Let's write

char c[] = "abc\012\0x34";

with single characters:

char c[] = { 'a', 'b', 'c', '\012', '\0', 'x', '3', '4', '\0' };

The first \0 you see is the start of an octal escape sequence \012 that extends over the following octal digits.

Octal escape sequences are specified in section 6.4.4.4 of the standard (N1570 draft):

octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit

they consist of a backslash followed by one, two, or three octal digits. In paragraph 7 of that section, the extent of octal and hexadecimal escape sequences is given:

7 Each octal or hexadecimal escape sequence is the longest sequence of characters that can
constitute the escape sequence.

Note that while the length of an octal escape sequence is limited to at most three octal digits (thus "\123456" consists of five characters, { '\123', '4', '5', '6', '\0' }), hexadecimal escape sequences have unlimited length:

hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit

and thus "\x123456789abcdef" consists of only two characters ({ '\x123456789abcdef', '\0' }).
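The character counts above can be checked directly. A minimal sketch follows; the names sample and sample_strlen are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The array from the example: "\012" is one octal escape, and "\0" is a
// null character followed by the ordinary characters 'x', '3', '4'
// (it is NOT the start of a hex escape).
constexpr char sample[] = "abc\012\0x34";

static_assert(sizeof(sample) == 9, "eight characters plus the terminator");
static_assert(sizeof("\123456") == 5, "one escape, three digits, terminator");

// strlen stops at the first null character, which is the embedded one.
inline std::size_t sample_strlen() { return std::strlen(sample); }

// Self-check of the layout described above.
inline void check_sample() {
    assert(sample_strlen() == 4);
    assert(sample[4] == '\0' && sample[5] == 'x');
}
```

So sizeof reports the whole array, while strlen only sees "abc" plus the character produced by \012.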

std::string control characters: assigning a hex number to a string

You want this:

std::string pattern = "\xDD\xAF\x57\x42"; 

Otherwise, the compiler tries to read your entire hex code as one char, which it then either rejects as out of range or truncates to the low 8 bits, depending on the compiler.
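A short sketch that checks the result byte by byte. The helper names make_pattern and byte_at are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Each "\x.." escape is ended by the backslash of the next one (or by the
// closing quote), so every escape yields exactly one byte.
inline std::string make_pattern() { return "\xDD\xAF\x57\x42"; }

// Reads a byte as an unsigned value, since plain char may be signed.
inline unsigned byte_at(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// Self-check: four bytes, in the order they were written.
inline void check_pattern() {
    const std::string p = make_pattern();
    assert(p.size() == 4);
    assert(byte_at(p, 0) == 0xDD && byte_at(p, 3) == 0x42);
}
```

If one of the bytes were 0x00, the std::string(const char*, length) constructor would be needed instead, because construction from a plain literal stops at the first null character.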
