- Are preprocessor directives like #include and #define counted when
tokenizing in the C language (when asked as a question in an exam paper)?
The C standard specifies analysis of a program as first parsing the source text into preprocessing tokens, which include # and include as separate preprocessing tokens. Then preprocessing is performed. After that, all remaining preprocessing tokens are converted to tokens, and the main compilation occurs. (This is a conceptual order in the C standard, not necessarily the actual order used in a compiler.)
You will have to determine whether you want to count preprocessing tokens before preprocessing or count tokens after preprocessing.
#include and #define are not tokens (preprocessing tokens or tokens). The tokens are #, include, #, and define.
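As a small illustration that # and the directive name are separate preprocessing tokens, white space is allowed between them. The sketch below is only an example; the names GREETING and greeting_length are hypothetical and not from the question.

```c
/* "#" and "include" are two preprocessing tokens, so white space
   between them is permitted. */
#   include <stddef.h>

/* The same holds for "#" and "define". GREETING is a hypothetical
   macro used only for this illustration. */
#   define GREETING "hello"

/* Returns the number of characters in GREETING, excluding the
   terminating null character. */
size_t greeting_length(void)
{
    return sizeof GREETING - 1;
}
```

If #include were a single token, the spaced form would not be accepted by a conforming compiler.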
- If these count as tokens, is <stdio.h> considered a single token or separate tokens like
<, stdio.h, >?
The tokens in #include <stdio.h> are #, include, and <stdio.h>. There is a special sub-grammar for header names that results in <stdio.h> being a single token. Outside of an #include directive and certain other places where a header name is expected, <stdio.h> would be multiple tokens: <, stdio, ., h, and >. This means any parser that counts tokens must be context-dependent.
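The context dependence can be sketched with a deliberately simplified counter. This is not a conforming lexer: count_pp_tokens is a hypothetical helper that assumes one logical line of input and recognizes only identifiers, rough pp-numbers, single-character punctuators, and header names directly after # include. Comments, string literals, character constants, and multi-character punctuators are not handled.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: counts preprocessing tokens on one logical line
   under the simplifying assumptions stated above. */
int count_pp_tokens(const char *s)
{
    int count = 0;
    int prev_hash = 0;      /* previous token was '#' at the start of the line */
    int expect_header = 0;  /* previous two tokens were '#' and 'include' */

    while (*s != '\0') {
        unsigned char c = (unsigned char)*s;

        if (isspace(c)) {
            s++;
            continue;
        }

        /* Header names are recognized only in this context. */
        if (expect_header && (c == '<' || c == '"')) {
            const char *end = strchr(s + 1, c == '<' ? '>' : '"');
            if (end != NULL) {
                s = end + 1;
                count++;
                expect_header = 0;
                continue;
            }
        }
        expect_header = 0;

        if (isalpha(c) || c == '_') {
            /* Identifier: letters, digits, and underscores. */
            const char *start = s;
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
            expect_header = prev_hash && (s - start) == 7 &&
                            strncmp(start, "include", 7) == 0;
            prev_hash = 0;
        } else if (isdigit(c)) {
            /* Rough pp-number: a digit followed by letters, digits, '.', '_'. */
            while (isalnum((unsigned char)*s) || *s == '.' || *s == '_')
                s++;
            prev_hash = 0;
        } else {
            /* Every other character counts as a one-character punctuator. */
            prev_hash = (count == 0 && c == '#');
            s++;
        }
        count++;
    }
    return count;
}
```

With this sketch, the line #include <stdio.h> yields 3 tokens, while <stdio.h> on its own yields 5, because the header-name rule applies only after # include.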
- Should we count macro parameters like (x) and macro body tokens?
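A token is a token. The parameter list and the replacement list of a macro definition are made of preprocessing tokens like any other text, so (, x, and ) each count.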
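For example, consider a function-like macro definition. SQUARE and square_of_sum are hypothetical names chosen for illustration and may differ from the macro in the question; the comment lists the preprocessing tokens of the directive one by one.

```c
/* Preprocessing tokens of the directive below (the new-line that ends
   the directive is not a token):

     #  define  SQUARE  (  x  )        6 tokens: directive, name, parameters
     (  (  x  )  *  (  x  )  )         9 tokens: replacement list

   That gives 15 preprocessing tokens in total. */
#define SQUARE(x) ((x) * (x))

/* Uses the macro so that the replacement list is actually expanded:
   SQUARE(a + b) becomes ((a + b) * (a + b)). */
int square_of_sum(int a, int b)
{
    return SQUARE(a + b);
}
```

The parentheses around the parameter in the definition and those in the replacement list are each counted as separate punctuators.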
- How many total tokens are there in this code from a C compiler's
perspective?
I manually counted 53 preprocessing tokens but easily could have made a mistake.
- What is the exact rule or reference from the C standard or GCC
documentation that explains this clearly?
There is no single rule. The grammar for a preprocessing token starts in C 2024 6.4.1 where it defines preprocessing-token to be one of header-name, identifier, pp-number, character-constant, string-literal, punctuator, a universal character name that cannot be one of the aforementioned, or a non-white-space character that cannot be one of the aforementioned. Definitions of those continue in other parts of the C standard. To count tokens, you will have to parse at least the preprocessing grammar of the C standard.
The preprocessor replaces #include directives with the actual content of the files named (and any files #included within those files). It also replaces any macro references with their definitions (e.g. PI will be replaced by its defined value: 3.14159). This newly created source file is then passed to the actual compiler to be turned into object code. So which tokens are you referring to? Most compilers have an option to save the preprocessed source to a file so you can examine it.
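A minimal sketch of the PI case, assuming a hypothetical file named area.c. With GCC, the -E option stops after preprocessing and writes the result to standard output, so running gcc -E area.c shows the text that the compiler proper receives.

```c
/* Hypothetical file area.c. PI is an object-like macro; the
   preprocessor substitutes its replacement list wherever the
   identifier PI appears as a token. */
#define PI 3.14159

/* Returns the area of a circle with radius r. */
double circle_area(double r)
{
    /* After preprocessing, this statement reads:
       return 3.14159 * r * r; */
    return PI * r * r;
}
```

Counting tokens in the preprocessed output gives a different number than counting preprocessing tokens in the original file, since the #define line disappears and PI is replaced by a single pp-number.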