
From what I've seen across many Stack Overflow questions, among other places, the way to define globals is to define them in exactly one .c file, declare them as extern in a header file, and include that header in every .c file that needs them.

However, today I saw a global variable defined in a header file in a codebase. I argued with the person who wrote it, but he insisted it would work. I had no idea why, so I quickly created a small project to test it:

a.c

#include <stdio.h>
#include "a.h"

int main()
{
    p1.x = 5;
    p1.x = 4;
    com = 6;
    change();
    printf("p1 = %d, %d\ncom = %d\n", p1.x, p1.y, com);
    return 0;
}

b.c

#include "a.h"

void change(void)
{
    p1.x = 7;
    p1.y = 9;
    com = 1;
}

a.h

typedef struct coord{
    int x;
    int y;
} coord;

coord p1;
int com;

void change(void);

Makefile

all:
    gcc -c a.c -o a.o
    gcc -c b.c -o b.o
    gcc a.o b.o -o run.out

clean:
    rm a.o b.o run.out

Output

p1 = 7, 9
com = 1

How is this working? Is this an artifact of the way I've set up the test? Is it that newer gcc has managed to catch this condition? Or is my interpretation of the whole thing completely wrong? Please help...

  • The program has undefined behavior. Commented Feb 4, 2021 at 11:22
  • @VladfromMoscow Ok, thank you. Could you please give an example of what would trigger the undefined behaviour? Commented Feb 4, 2021 at 11:27
  • The compiler will create a tentative definition of the same variable in each translation unit, so the resulting behavior depends on the linker. It can issue an error message, or it can remove one of the tentative definitions. Or the program can end up with two separate objects, in which case changing the value of one does not affect the other, as if each object had internal linkage. Commented Feb 4, 2021 at 11:29
  • Nothing, or everything. UB means your compiler has no contract to fulfill. The program can behave in practically any way. Commented Feb 4, 2021 at 11:30
  • The linker should complain, just as it would if you had two functions with the same name in two different translation units. Commented Feb 4, 2021 at 11:35

2 Answers


This relies on so-called "common symbols", an extension of standard C's notion of tentative definitions (https://port70.net/~nsz/c/c11/n1570.html#6.9.2p2). Standard C merges tentative definitions only within a single translation unit, but most UNIX linkers make this work across translation units too (and many even across shared dynamic libraries).

AFAIK, the feature has existed pretty much forever, and it has to do with compatibility with Fortran's COMMON blocks.

It works by the compiler giving uninitialized (tentative) globals a special "common" category (shown by the nm utility as "C", for "common").

Example of data symbol categories:

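#!/bin/sh -eu
# Since GCC 10, -fno-common is the default, so -fcommon is passed
# here to get a common symbol at all.
gcc -fcommon -xc - -c -o data_symbol_types.o <<'EOF'
int common_symbol;           //C
int zero_init_symbol = 0;    //B
int data_init_symbol = 4;    //D
const int const_symbol = 4;  //R
EOF
nm data_symbol_types.o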

Output:

0000000000000004 C common_symbol
0000000000000000 R const_symbol
0000000000000000 D data_init_symbol
0000000000000000 B zero_init_symbol

Whenever a linker sees multiple definitions of a particular symbol, it usually generates linker errors.

But when those definitions are all in the common category, the linker merges them into one. Also, if there are N-1 common definitions of a particular symbol and one non-tentative definition (in the R, D, or B category), all the definitions are merged into the one non-tentative definition, and again no error is generated.

In other cases you get symbol redefinition errors.

Although common symbols are widely supported, they aren't standard C, and relying on them is technically undefined behavior (even though in practice it often works).

clang and tinycc, as far as I've noticed, do not generate common symbols (there you should get a redefinition error). On gcc, common symbol generation can be disabled with -fno-common; since GCC 10 this is the default, and -fcommon restores the old behavior.

(Ian Lance Taylor's series on linkers has more info on common symbols, and it also mentions that linkers even allow merging differently sized common symbols, using the largest size for the final object: https://www.airs.com/blog/archives/42 . I believe this weird trick was once used by some libcs to some effect.)


Comments

What, pray tell, was bad about the entire list of duplicates discussing this and their variety of answers, that warranted re-opening this Q just to post yet another answer re-hashing the extension?
@StoryTeller-UnslanderMonica OK, I opened a couple of the so-called duplicates, and I thought they were not relevant because they just mentioned it's not standard compliant, while this question asked why it works. Maybe there was an actual duplicate answer I missed (in which case, let's close it), but I have to admit I got a little irritated by having the question closed right under me when I already had the answer written, and so I just clicked vote to reopen.
@StoryTeller Maybe that choice was questionable, but so, I think, is SO's policy of not letting us submit finished answers if somebody closes the question in the meantime.
And you know what else is irksome? Poring over a list of topics to point the OP at, only to be undermined.
I had, in the past, the capability to post an answer that was in progress when the question was closed. Is the dupe closure an exception to the grace period allowing an answer in progress to be posted? It is possibly a good subject for a meta-question

That program should not build: it compiles, but you should get multiple-definition errors in the linking phase because of how the variables are defined in your header file.

A header file tells the compiler about the external environment that it cannot work out by itself, such as external variables defined in other modules.

As your question deals with this, I'll try to explain the correct way to define a global variable in one module, and how to inform the compiler about it in other modules.

Let's say you have a module A.c with some variable defined in it:

A.c

int I_am_a_global_variable;  /* you can even initialize it */

Normally, to let the compiler know, when compiling other modules, that the variable is defined elsewhere, you write something like this (the trick is the extern keyword, which says that the variable is not defined here):

B.c

extern int I_am_a_global_variable; /* you cannot initialize it, as it is defined elsewhere */

As this variable is a property of the module A.c, we can write an A.h file stating that somewhere else in the program there is a variable named I_am_a_global_variable of type int, so that other modules can access it.

A.h

extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */

Then, instead of declaring the variable in B.c by hand, we can include A.h in B.c, which ensures that the variable is declared exactly as the author of A.c intended.

So now B.c is:

B.c

#include "A.h"

void some_function(void) {
    /* ... */
    I_am_a_global_variable = 2 * I_am_a_global_variable + 1; /* some complicated expression */
}

This ensures that if the author of A.c decides to change the type or the declaration of the variable, they can do so by changing A.h, and every file that #includes it then gets recompiled (you can arrange this in the Makefile, for example).

A.c

#include "A.h"   /* extern int I_am_a_global_variable; */
int I_am_a_global_variable = 27; 

In order to prevent errors, it is good that A.c also #includes the file A.h, so the declaration

extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */

and the definition itself (which is in A.c):

int I_am_a_global_variable = 27; /* I have initialized it to a non-default value to show how to do it */

are consistent with each other. Suppose the author changes the type of I_am_a_global_variable to double in A.c and forgets to change the declaration in A.h: the compiler will then complain about the mismatched declaration and definition when compiling A.c (which now includes A.h).

Why do I say that you will get multiple-definition errors when linking?

Well, suppose several modules each end up containing the following statements (for example, as the result of #includeing in all of them a header that contains the definition):

#include "A.h" /* this has an extern int I_am_a_global_variable; that informs the
                * compiler that the variable is defined elsewhere, but see below */
int I_am_a_global_variable; /* here is _elsewhere_ :) */

then every one of those modules will have a global variable I_am_a_global_variable, initialized to 0, because the compiler defines it in every module: you are not saying that the variable is defined elsewhere, you are telling the compiler to declare and define it in this compilation unit. When you link all the modules together, you end up with several definitions of a variable with the same name in several places, and references to it from other modules have no way of knowing which one to use.

When the compiler is compiling module A, it knows nothing about the other compilations that make up the application, so you need some means of telling it what is happening around it. Just as you use a function prototype to tell it that somewhere there is a function that takes some number of arguments of types A, B, C, etc. and returns a value of type Z, you need to tell it that there is a variable of type X defined elsewhere, so that every access to it in this module is compiled correctly.
