Stringification - How Does It Work

The relevant steps of macro expansion are (per C 2011 [n1570] 6.10.3.1 and C++ 1998 16.3.1):

  1. Process tokens that are preceded by # or ##.
  2. Apply macro replacement to each argument.
  3. Replace each parameter with the corresponding result of the above macro replacement.
  4. Rescan for more macros.

Thus, with xstr(foo), we have:

  1. The replacement text, str(s), contains no # or ##, so nothing happens.
  2. The argument foo is replaced with 4, so it is as if xstr(4) had been used.
  3. In the replacement text str(s), the parameter s is replaced with 4, producing str(4).
  4. str(4) is rescanned. (The resulting steps produce "4".)

Note that the problem with str(foo) is that step 2, which would replace foo with 4, comes after step 1, which changes the argument to a string. In step 1, foo is still foo; it has not been replaced with 4, so the result is "foo".

This is why a helper macro is used. The outer macro (xstr) contains no # or ##, so step 2 is performed on its argument; it then invokes a second macro (str) that performs step 1 on the already-expanded result.

C Preprocessor stringification (again)

Is it possible to have stringification after numeric evaluation?

Yes. For reference here are some specific definitions I'll use:

  • arithmetic: The ability to perform calculations on numbers using common primitives such as adding, subtracting, multiplication, division, etc.
  • expression: A grammatical representation of arithmetic using common operators such as parenthetical groupings, infix operators +, -, *, /, etc.

Arithmetic approach

Given these definitions, macro expansions cannot evaluate expressions, but they can perform arithmetic. Instead of the operators you would use macros, each of which would implement arithmetic from scratch. Here's a boost pp usage using this approach:

#include <boost/preprocessor/arithmetic.hpp>
#include <boost/preprocessor/stringize.hpp>

#define A 1
#define B 2
#define SUM BOOST_PP_ADD(A, B)
BOOST_PP_STRINGIZE(SUM)

With this approach, you get to simply expand a macro to a result; that result can then be stringified. But you're implementing arithmetic itself in macros as opposed to operators, and that takes a lot of macros. So to pull this off, either your number ranges need to be severely limited, or you need to use numbers that are decomposable to macro evaluation (e.g., represent 20000 as (2,0,0,0,0) as opposed to 20000). Boost pp arithmetic uses the former approach (works with ranges from 0 to 256; yes, I realize that's 257 numbers).

Expression approach

Alternately, you can evaluate expressions. As noted, the preprocessor can evaluate expressions in conditional directives. Using that as a primitive, you can tease out the result; e.g., if EXPRESSION expands to your expression, you can #define D0, representing the unit digit of the result, using a construct like this:

#if   ((EXPRESSION)%10)==9
#define D0 9
#elif ((EXPRESSION)%10)==8
#define D0 8
...

You can then similarly #define D1 to be the digit for the tens place, D2 for the hundreds, and so on, then have a RESULT macro expand to ... D3##D2##D1##D0. Wrap this entire thing into something like evaluator.hpp and you can pump an arbitrary expression in by defining EXPRESSION as your expression, using #include "evaluator.hpp" to evaluate it, and finally using RESULT to represent the result. With this approach each "evaluator" needs its own Dx macros defined to the specific digits to work, so an evaluator behaves analogously to a "variable", but holding one result consumes the entire evaluator.

Boost pp has this capability with each evaluator known as a "slot", and provides 5 slots. So you have to pump with #include's, and each evaluator can only store one result at a time... but in return your range is not restricted (more than native ranges) and you're actually evaluating expressions. Here's an example of using this approach using boost pp:

#include <boost/preprocessor/slot/slot.hpp>
#include <boost/preprocessor/stringize.hpp>

#define A 1
#define B 2
#define SUM (A+B)

#define BOOST_PP_VALUE SUM
#include BOOST_PP_ASSIGN_SLOT(1)

BOOST_PP_STRINGIZE(BOOST_PP_SLOT(1))

(Edit: Hand rolled evaluator here (wandbox) might be worth review to see how the mechanics explained above work).

TL;DR summary

    • arithmetic
        • pro: evaluates entirely by macro invocation
        • con: macros instead of infix operators
        • con: either ranges are limited or uses alternate literal representations
    • expressions
        • pro: evaluates actual expressions (infix, groups, etc)
        • pro: ranges as open as native ranges, with ordinary literals
        • con: requires #include to pump generic mechanism
        • con: reuse of evaluator must lose previous result

C preprocessor stringification weirdness

Because of the order of expansion. The GCC documentation says:

Macro arguments are completely macro-expanded before they are substituted into a macro body, unless they are stringified or pasted with other tokens. After substitution, the entire macro body, including the substituted arguments, is scanned again for macros to be expanded. The result is that the arguments are scanned twice to expand macro calls in them.

So if the argument will be stringified, it is not expanded first. You are getting the literal text in the parentheses. But if it's being passed to another macro, it is expanded. Therefore, if you want to expand it, you need two levels of macros.

This is done because there are cases where you do not want to expand the argument before stringification, most common being the assert() macro. If you write:

assert(MIN(width, height) >= 240);

you want the message to be:

Assertion MIN(width, height) >= 240 failed

and not some insane thing the MIN macro expands to (in gcc it uses several gcc-specific extensions and is quite long IIRC).

Stringification of int in C/C++

The reason is that preprocessors operate on tokens passed into them, not on values associated with those tokens.

#include <stdio.h>

#define vstr(s) str(s)
#define str(s) #s

int main()
{
    puts(vstr(10+10));
    return 0;
}

Outputs:
10+10

Stringification of a macro value

The xstr macro defined below will stringify after doing macro-expansion.

#define xstr(a) str(a)
#define str(a) #a

#define RECORDS_PER_PAGE 10

#define REQUEST_RECORDS \
"SELECT Fields FROM Table WHERE Conditions" \
" OFFSET %d * " xstr(RECORDS_PER_PAGE) \
" LIMIT " xstr(RECORDS_PER_PAGE) ";"

How JSON.stringify() works internally?

That's probably some low-level, highly optimized native code. But let's assume it is just a regular JavaScript function instead; that makes things easier. The function would be defined as such:

 JSON.stringify = function(toStringify, replacer) {

Now that function has to determine what toStringify is first, e.g.:

  if(typeof toStringify === "object") {

In that case, the code has to go over all the object's key/value pairs:

  for(let key in toStringify) {
    let value = toStringify[key];

Now the code can call the replacer with those pairs:

   value = replacer(key, value);

Then a string can be built up as:

   result += `"${key}": ${JSON.stringify(value)}`;

Then that result gets returned.

why Stringification not working as expected

Because of the macro expansion rules in C. With the str(s) you defined, foo is immediately substituted as #foo rather than being replaced by its value. When you wrap it with xstr, the preprocessor gets a chance to actually expand foo before applying stringification.

The process looks something like this

str(foo)->#foo->"foo"
xstr(foo)->str(4)->#4->"4"

preprocessor macro stringify

why one of the below preprocessor macros doesn't work while the other does

While the preprocessor will expand most further macros that result out of a current expansion, it will only do a single expansion step. So ASTRINGZ(__FILE__) is not going to be expanded all the way before being passed to the stringification of TODO.

You have several options to deal with this; the easiest is to rely on the fact that __FILE__ is already a string literal.

#define msg(s) TODO( s " - @ - " __FILE__)

But if you wish to experiment with macro expansion, you can try a deferring technique. This will delay the moment TODO is actually expanded itself, and give the arguments time to be expanded themselves.

#define EMPTY() 
#define DEFER(m) m EMPTY EMPTY()()

#define msg(s) DEFER(TODO)( s " - @ - " ASTRINGZ(__FILE__))

The above makes the ( s " - @ - " ASTRINGZ(__FILE__)) not be arguments to a macro, so ASTRINGZ will be expanded. DEFER(TODO) is a macro, however, so it will be expanded to TODO EMPTY EMPTY()(). It will take two more expansion cycles (one for each EMPTY()) for TODO (...) to be handed back to the preprocessor, at which point everything should be properly expanded.

what is the difference between #pragma and _Pragma

_Pragma is another standard way to provide a compiler-specific pragma directive. The difference is that _Pragma can be the result of macro expansion, while #pragma, being a directive, may not.

why do we wrap STRINGZ with ASTRINGZ?

It's another deferral technique, for the case where the argument to ASTRINGZ is itself the result of some non-trivial preprocessor expansion.

How, exactly, does the double-stringize trick work?

Yes, it's guaranteed.

It works because arguments to macros are themselves macro-expanded, except where the macro argument name appears in the macro body with the stringifier # or the token-paster ##.

6.10.3.1/1:

... After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list, unless preceded by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding argument after all macros contained therein have been expanded...

So, if you do STR1(THE_ANSWER) then you get "THE_ANSWER", because the argument of STR1 is not macro-expanded. However, the argument of STR2 is macro-expanded when it's substituted into the definition of STR2, which therefore gives STR1 an argument of 42, with the result of "42".


