How to Run the Preprocessor on Local Headers Only

How do I run the preprocessor on local headers only?

How much effort are you willing to go to? There's an obnoxiously obscure way to do it but it requires you to set up a dummy directory to hold surrogates for the system headers. OTOH, it doesn't require any changes in any of your source code. The same technique works equally well for C code.

Setup

Files:

./class_a.hpp
./class_b.hpp
./example.cpp
./system-headers/iostream
./system-headers/string

The 'system headers' such as ./system-headers/iostream contain a single line (there is no # on that line!):

include <iostream>

The class headers each contain a single line like:

class A{};

The contents of example.cpp are what you show in the question:

#include <iostream>     //system
#include "class_a.hpp" //local
#include <string> //system
#include "class_b.hpp" //local

int main() {}
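
For anyone who wants to reproduce this, the layout can be recreated with a few shell commands. This is only a sketch: the scratch directory name demo is an assumption, and the contents of class_b.hpp are inferred from the preprocessor output shown further down.

```shell
#!/bin/sh
# Recreate the example layout in a scratch directory.
# The directory name "demo" is an assumption made for illustration.
demo="${1:-demo}"
mkdir -p "$demo/system-headers"

# Local headers: one class definition each.
echo 'class A{};' > "$demo/class_a.hpp"
echo 'class B{};' > "$demo/class_b.hpp"

# Surrogate system headers: note that there is no '#' before 'include'.
echo 'include <iostream>' > "$demo/system-headers/iostream"
echo 'include <string>'   > "$demo/system-headers/string"

# The source file from the question.
cat > "$demo/example.cpp" <<'EOF'
#include <iostream>     //system
#include "class_a.hpp"  //local
#include <string>       //system
#include "class_b.hpp"  //local

int main() {}
EOF
```

After running it, cd into the scratch directory before trying the preprocessor commands.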

Running the C preprocessor

Running the C preprocessor like this produces the output shown:

$ cpp -Dinclude=#include -I. -Isystem-headers example.cpp
# 1 "example.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "example.cpp"
# 1 "system-headers/iostream" 1
#include <iostream>
# 2 "example.cpp" 2
# 1 "class_a.hpp" 1
class A{};
# 3 "example.cpp" 2
# 1 "system-headers/string" 1
#include <string>
# 4 "example.cpp" 2
# 1 "class_b.hpp" 1
class B{};
# 5 "example.cpp" 2

int main() {}
$

If you eliminate the # n lines, that output is:

$ cpp -Dinclude=#include -I. -Isystem-headers example.cpp | grep -v '^# [0-9]'
#include <iostream>
class A{};
#include <string>
class B{};

int main() {}
$

which, give or take the space at the beginning of the lines containing #include, is what you wanted.
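
If the leading space matters to you, the filter can remove it too. A minimal sketch using sed in place of grep -v; the function name strip_linemarkers is hypothetical, and it assumes the linemarkers have the `# N "file"` form shown above.

```shell
#!/bin/sh
# strip_linemarkers (hypothetical helper): read preprocessor output on stdin,
# drop linemarker lines such as '# 1 "example.cpp"', and remove any spaces
# that the preprocessor put in front of a macro-generated '#include'.
strip_linemarkers() {
    sed -e '/^# [0-9]/d' -e 's/^ *#include/#include/'
}

# Demonstration on a fragment of the output shown above (no compiler needed).
printf '%s\n' \
    '# 1 "system-headers/iostream" 1' \
    ' #include <iostream>' \
    '# 2 "example.cpp" 2' \
    'class A{};' | strip_linemarkers
```

In real use it would sit at the end of the pipeline: cpp -Dinclude=#include -I. -Isystem-headers example.cpp | strip_linemarkers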

Analysis

The -Dinclude=#include argument is equivalent to #define include #include. When the preprocessor generates output from a macro, the result is not a preprocessor directive even if it looks like one (such as #include). Quoting the C++11 standard, ISO/IEC 14882:2011 (AFAIK this has not changed between versions, and the C11 standard, ISO/IEC 9899:2011, says the same thing verbatim in §6.10.3):

§16.3 Macro replacement

¶8 If a # preprocessing token, followed by an identifier, occurs lexically at the point at which a preprocessing directive could begin, the identifier is not subject to macro replacement.

§16.3.4 Rescanning and further replacement

¶2 If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. …

¶3 The resulting completely macro-replaced preprocessing token sequence is not processed as a preprocessing directive even if it resembles one, …

When the preprocessor encounters #include <iostream>, it looks in the current directory and finds no file, then looks in ./system-headers and finds the file iostream so it processes that into the output. It contains a single line, include <iostream>. Since include is a macro, it is expanded (to #include) but further expansion is prevented, and the # is not processed as a directive because of §16.3.4 ¶3. Thus, the output contains #include <iostream>.

When the preprocessor encounters #include "class_a.hpp", it looks in the current directory and finds the file and includes its contents in the output.

Rinse and repeat for the other headers. If class_a.hpp contained #include <iostream>, then that ends up expanding to #include <iostream> again (with the leading space). If your system-headers directory is missing any header, then the preprocessor will search in the normal locations and find and include that. If you use the compiler rather than cpp directly, you can prohibit it from looking in the system directories with -nostdinc — so the preprocessor will generate an error if system-headers is missing a (surrogate for a) system header.

$ g++ -E -nostdinc -Dinclude=#include -I. -Isystem-headers example.cpp | grep -v '^# [0-9]'
#include <iostream>
class A{};
#include <string>
class B{};

int main() {}
$
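
The rule from the analysis (a macro expansion that looks like a directive is not treated as one) can also be seen in isolation, without any files. This is a sketch that assumes gcc is installed; the function name show_macro_directive is hypothetical.

```shell
#!/bin/sh
# show_macro_directive (hypothetical helper): feed one surrogate-style line
# to the preprocessor on stdin and print what comes out.
# Assumes gcc is installed.
#   -x c  treat stdin as C source
#   -P    suppress linemarkers in the output
show_macro_directive() {
    printf 'include <%s>\n' "$1" |
        gcc -E -P -x c -Dinclude='#include' -
}
```

Calling show_macro_directive stdio.h should print an #include <stdio.h> line (possibly with a leading space) rather than the contents of the real header, because the expanded text is never processed as a directive.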

Note that it is very easy to generate the surrogate system headers:

for header in algorithm chrono iostream string …
do echo "include <$header>" > system-headers/$header
done

JFTR, testing was done on Mac OS X 10.11.5 with GCC 6.1.0. If you're using GCC (the GNU Compiler Collection, with leading example compilers gcc and g++), your mileage shouldn't vary very much with any plausible alternative version.

If you're uncomfortable using the macro name include, you can change it to anything else that suits you — syzygy, apoplexy, nadir, reinclude, … — and change the surrogate headers to use that name, and define that name on the preprocessor (compiler) command line. One advantage of include is that it's improbable that you have anything using that as a macro name.

Automatically generating surrogate headers

osgx asks:

How can we automate the generation of mock system headers?

There are a variety of options. One is to analyze your code (with grep, for example) to find the names that are, or might be, referenced and generate the appropriate surrogate headers. It doesn't matter if you generate a few unused headers; they won't affect the process. Note that if you use #include <sys/wait.h>, the surrogate must be ./system-headers/sys/wait.h; that slightly complicates the shell code shown, but not by very much. Another way would be to look at the headers in the system header directories (/usr/include, /usr/local/include, etc.) and generate surrogates for the headers you find there.

For example, mksurrogates.sh might be:

#!/bin/sh

sysdir="./system-headers"
for header in "$@"
do
    mkdir -p "$sysdir/$(dirname "$header")"
    echo "include <$header>" > "$sysdir/$header"
done
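
The second option mentioned above, generating surrogates from what is actually present in a system header directory, could be sketched like this. The function name mirror_surrogates and its two arguments are assumptions for illustration; it only handles regular files (symlinked headers are skipped) and assumes header names contain no newlines.

```shell
#!/bin/sh
# mirror_surrogates (hypothetical helper): create one surrogate for every
# regular file found under an include directory.
#   $1  directory to scan (for example /usr/include)
#   $2  surrogate directory (defaults to ./system-headers)
mirror_surrogates() {
    incdir=$1
    sysdir=${2:-./system-headers}
    # List paths relative to $incdir so that sys/wait.h keeps its
    # subdirectory; only find runs inside the subshell's directory.
    (cd "$incdir" && find . -type f | sed 's|^\./||') |
    while IFS= read -r header
    do
        mkdir -p "$sysdir/$(dirname "$header")"
        echo "include <$header>" > "$sysdir/$header"
    done
}
```

A call such as mirror_surrogates /usr/include would create far more surrogates than most projects need, but as noted, unused ones do no harm.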

And we can write listsyshdrs.sh to find the system headers referenced in source code under a named directory:

#!/bin/sh

grep -h -e '^[[:space:]]*#[[:space:]]*include[[:space:]]*<[^>]*>' -r "${@:-.}" |
sed 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*<\([^>]*\)>.*/\1/' |
sort -u

With a bit of formatting added, that generated a list of headers like this when I scanned the source tree with my answers to SO questions:

algorithm         arpa/inet.h       assert.h          cassert
chrono            cmath             cstddef           cstdint
cstdlib           cstring           ctime             ctype.h
dirent.h          errno.h           fcntl.h           float.h
getopt.h          inttypes.h        iomanip           iostream
limits.h          locale.h          map               math.h
memory.h          netdb.h           netinet/in.h      pthread.h
semaphore.h       signal.h          sstream           stdarg.h
stdbool.h         stddef.h          stdint.h          stdio.h
stdlib.h          string            string.h          sys/ipc.h
sys/mman.h        sys/param.h       sys/ptrace.h      sys/select.h
sys/sem.h         sys/shm.h         sys/socket.h      sys/stat.h
sys/time.h        sys/timeb.h       sys/times.h       sys/types.h
sys/wait.h        termios.h         time.h            unistd.h
utility           vector            wchar.h

So, to generate the surrogates for the source tree under the current directory:

$ sh mksurrogates.sh $(sh listsyshdrs.sh)
$ ls -lR system-headers
total 344
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 algorithm
drwxr-xr-x 3 jleffler staff 102 Jul 2 17:27 arpa
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 assert.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 cassert
-rw-r--r-- 1 jleffler staff 17 Jul 2 17:27 chrono
-rw-r--r-- 1 jleffler staff 16 Jul 2 17:27 cmath
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 cstddef
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 cstdint
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 cstdlib
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 cstring
-rw-r--r-- 1 jleffler staff 16 Jul 2 17:27 ctime
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 ctype.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 dirent.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 errno.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 fcntl.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 float.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 getopt.h
-rw-r--r-- 1 jleffler staff 21 Jul 2 17:27 inttypes.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 iomanip
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 iostream
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 limits.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 locale.h
-rw-r--r-- 1 jleffler staff 14 Jul 2 17:27 map
-rw-r--r-- 1 jleffler staff 17 Jul 2 17:27 math.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 memory.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 netdb.h
drwxr-xr-x 3 jleffler staff 102 Jul 2 17:27 netinet
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 pthread.h
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 semaphore.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 signal.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 sstream
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 stdarg.h
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 stdbool.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 stddef.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 stdint.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 stdio.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 stdlib.h
-rw-r--r-- 1 jleffler staff 17 Jul 2 17:27 string
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 string.h
drwxr-xr-x 16 jleffler staff 544 Jul 2 17:27 sys
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 termios.h
-rw-r--r-- 1 jleffler staff 17 Jul 2 17:27 time.h
-rw-r--r-- 1 jleffler staff 19 Jul 2 17:27 unistd.h
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 utility
-rw-r--r-- 1 jleffler staff 17 Jul 2 17:27 vector
-rw-r--r-- 1 jleffler staff 18 Jul 2 17:27 wchar.h

system-headers/arpa:
total 8
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 inet.h

system-headers/netinet:
total 8
-rw-r--r-- 1 jleffler staff 23 Jul 2 17:27 in.h

system-headers/sys:
total 112
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 ipc.h
-rw-r--r-- 1 jleffler staff 21 Jul 2 17:27 mman.h
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 param.h
-rw-r--r-- 1 jleffler staff 23 Jul 2 17:27 ptrace.h
-rw-r--r-- 1 jleffler staff 23 Jul 2 17:27 select.h
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 sem.h
-rw-r--r-- 1 jleffler staff 20 Jul 2 17:27 shm.h
-rw-r--r-- 1 jleffler staff 23 Jul 2 17:27 socket.h
-rw-r--r-- 1 jleffler staff 21 Jul 2 17:27 stat.h
-rw-r--r-- 1 jleffler staff 21 Jul 2 17:27 time.h
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 timeb.h
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 times.h
-rw-r--r-- 1 jleffler staff 22 Jul 2 17:27 types.h
-rw-r--r-- 1 jleffler staff 21 Jul 2 17:27 wait.h
$

This assumes that header file names contain no spaces, which is not unreasonable — it would be a brave programmer who created header file names with spaces or other tricky characters.

A full production-ready version of mksurrogates.sh would accept an argument specifying the surrogate header directory.
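
A step in that direction might look like the following. It is a sketch rather than a finished tool: the -d option letter is an assumption, and the logic is wrapped in a function so it can be reused or sourced.

```shell
#!/bin/sh
# mksurrogates.sh variant that accepts the surrogate directory as an option.
# Usage: mksurrogates.sh [-d surrogate-dir] header ...
# The -d option letter is an assumption made for this sketch.
mksurrogates() {
    sysdir="./system-headers"
    OPTIND=1                       # reset so the function can be called twice
    while getopts d: opt
    do
        case "$opt" in
        d)  sysdir=$OPTARG ;;
        *)  echo "Usage: mksurrogates.sh [-d surrogate-dir] header ..." >&2
            return 1 ;;
        esac
    done
    shift $((OPTIND - 1))

    for header in "$@"
    do
        mkdir -p "$sysdir/$(dirname "$header")"
        echo "include <$header>" > "$sysdir/$header"
    done
}

mksurrogates "$@"
```

For example: sh mksurrogates.sh -d /tmp/surrogates $(sh listsyshdrs.sh)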

How to show 'preprocessed' code ignoring includes with GCC

I agree with Matteo Italia's comment that if you just prevent the #include directives from being expanded, then the resulting code won't represent what the compiler actually sees, and therefore it will be of limited use in troubleshooting.

Here's an idea to get around that. Add a variable declaration before and after your includes. Any variable that is reasonably unique will do.

int begin_includes_tag;
#include <stdio.h>
... other includes
int end_includes_tag;

Then you can do:

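$ gcc -E main.c | sed '/begin_includes_tag/,/end_includes_tag/d' > out.c

The sed command deletes everything from the first tag declaration through the second, including the two declarations themselves. Note that the preprocessed text has to go to standard output for the pipe to see it, so the redirection to out.c comes after sed rather than as a -o option to gcc.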
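
To see what the range delete does without involving a compiler, here is a small sketch that runs it on stand-in text. The function name strip_tagged_includes and the sample "expanded header" line are made up for illustration.

```shell
#!/bin/sh
# strip_tagged_includes (hypothetical helper): delete everything from the
# line containing begin_includes_tag through the line containing
# end_includes_tag, both tag lines included.
strip_tagged_includes() {
    sed '/begin_includes_tag/,/end_includes_tag/d'
}

# Stand-in for preprocessor output; the extern line plays the role of
# whatever the system headers expanded to.
printf '%s\n' \
    'int begin_includes_tag;' \
    'extern int printf(const char *, ...);' \
    'int end_includes_tag;' \
    'int main(void) { return 0; }' | strip_tagged_includes
```

Only the main line survives, which is the effect you want on real preprocessor output.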

How to partially preprocess a C file with specific working directory

I found a utility that does exactly what I was looking for:

$ cpphs --nowarn --nomacro -I./ input.c | sed -E 's|#line 1 "missing file: (.*)"|#include <\1>|'

Using clang or g++/gcc to print preprocessed code without including files from system paths

(It's not really an answer - just a "hack")

To solve this I created a text file with all system headers by:

rem my GCC STL-PATH
cd Z:\usr\include\c++\10

dir /b > F:\DummySTL\files.txt

Then, from F:\DummySTL, I executed the following command:

for /f "delims=" %F in (files.txt) do copy nul "%F"

This creates an empty file for every name listed in files.txt.
Now I can call gcc or clang just with:

-isystem"F:\DummySTL"

Is it safe to run the C preprocessor several times on the same source?

In general, preprocessing via cpp is not guaranteed to be idempotent (a no-op after the first run). A simple counterexample:

#define X #define Y z
X
Y

The first invocation will yield:

 #define Y z
Y

The second one:

z

Having said that, valid C code shouldn't be doing something like that (because the output wouldn't be valid input for the next stages of the compiler).

Moreover, depending on what you are trying to do, cpp has options that may help, such as -fpreprocessed, which tells the preprocessor that its input has already been preprocessed.
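
The counterexample is easy to try from a shell. The sketch below assumes gcc is installed; the helper names preprocess and show_passes are hypothetical.

```shell
#!/bin/sh
# preprocess (hypothetical helper): one preprocessing pass over stdin.
# Assumes gcc is installed; -P suppresses linemarkers so that the output
# can be fed straight back in as input.
preprocess() {
    gcc -E -P -x c -
}

# show_passes (hypothetical helper): print the counterexample after one
# pass and after two passes so the two results can be compared.
show_passes() {
    src='#define X #define Y z
X
Y'
    echo "--- after one pass ---"
    printf '%s\n' "$src" | preprocess
    echo "--- after two passes ---"
    printf '%s\n' "$src" | preprocess | preprocess
}
```

Running show_passes should show the #define Y z line surviving the first pass and being consumed by the second, which then replaces Y with z.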

How to tell the preprocessor to search for a particular folder for header files, when I say #include xyz.h

You should add the path to "Additional Include Directories" in the "C/C++" section of the project properties (the "General" page). You can use environment variables, as well as the "this folder" (.) and "up one folder" (..) shortcuts, so that the setting is not bound to a particular directory structure.

Get #included header files after including

You'll need to write your own script for this. What you're asking for is far less than what gcc -E does (preprocess everything). Further, it's less than what cpp -fdirectives-only does, because you don't want to recurse. You effectively have a project-specific requirement, and you'll need to write project-specific code for it.


