The symbols emitted into object files act as a bridge between the parts of a program, since they play an important role at link time. It’s a good habit for programmers to look at the symbols from time to time, because doing so helps a lot in debugging, especially with link errors. Recently, I found something interesting.
Let’s start with the following case.
$ cat a.C
struct father { int a; };
struct child : virtual father { int b; child() {} };
struct grandchild : child { int c; };
int main() {
    child c;
    grandchild g;
}
Now compile it. The environment here is little-endian Linux, and I’m using the IBM XL C/C++ compiler. Look carefully at the resulting binary and you will find something odd.
$ xlC -+ a.C
$ nm -gC a.out
……
00000000100009f8 W grandchild::grandchild()
0000000010000938 W child::child()
000000001000099c W child::child()
……
Here, nm -g prints only the external symbols of a.out, and -C demangles their names. In each line, the hexadecimal number is the value of the symbol, “W” means the symbol is weak, and the string that follows is its name.
See? There are two constructors with the same name child::child()!
Fortunately, this is not a defect in the compiler or in nm. It is related to how virtual inheritance is implemented. Let’s begin with the Linux C++ Application Binary Interface (ABI).
Two program entities!
Let’s take a closer look by dropping -C, so that nm prints the raw mangled names:
$ nm -g a.out
……
00000000100009f8 W _ZN10grandchildC1Ev
0000000010000938 W _ZN5childC1Ev
000000001000099c W _ZN5childC2Ev
……
In fact, the two constructors only “look” alike. Their mangled names differ slightly.
Take _ZN5childC1Ev as an example. In the nested name 5childC1 (the part enclosed by N and E), 5child is the prefix, which is the class name child preceded by its length. C1 takes the place of the unqualified function name, which in this case is a constructor name. The trailing v means the parameter list is empty (void). Now here’s the question: what is C1 for? And what is C2? The answer is in the Itanium C++ ABI.
The Itanium C++ ABI (the object code interfaces between user C++ code and the impl…), which Linux C++ compilers follow, specifies these names:
5.1.4.3 Constructors and Destructors
Constructors and destructors are simply special cases of <unqualified-name>, where the final <unqualified-name> of a nested name is replaced by one of the following:
<ctor-dtor-name> ::= C1   # complete object constructor
                 ::= C2   # base object constructor
                 ::= C3   # complete object allocating constructor
                 ::= D0   # deleting destructor
                 ::= D1   # complete object destructor
                 ::= D2   # base object destructor
C1 denotes the complete object constructor, and C2 the base object constructor. The complete object constructor is invoked when a complete object of the class itself is constructed. The base object constructor is invoked when an object of a derived class is constructed, in other words, when the object is constructed as a base subobject. In most cases the two constructors are identical. Things differ, however, once virtual inheritance is involved.
Virtual inheritance makes things complicated? We have the virtual table table (VTT)!
If a class child inherits virtually from a class father, child contains only one copy of father’s data members. This holds even if child inherits from several classes that share father as a common virtual base, and it also holds for any class further derived from child. The compiler must therefore guarantee that the virtual base father is constructed only once during the construction of child, even though father is inherited along multiple paths. Let’s see how the compiler accomplishes that.
Thankfully, compilers can dump the data layout and virtual tables of classes when given the right option. Compile the initial case with the XL C/C++ option -qdu…, and the dump for grandchild looks like this:
Class grandchild
   size=24 align=8
grandchild (0x00000000) 0
    vptridx=0 vptr=((&grandchild::_ZTV10grandchild) + 24)
  child (0x00000000) 0
      primary-for grandchild (0x00000000)
      subvttidx=8
    father (0x00000000) 16 virtual canonical
        vbaseoffset=-24
Note that father is a virtual base, so it is placed after all the non-virtual data of grandchild (b from child and c from grandchild itself). At offset 0 there is a virtual pointer (vptr) to the virtual table of grandchild, which records, among other things, where the virtual base father is located. This virtual table is shared with the base child, because child is the primary base of grandchild and sits at the same offset (the this pointer still points to offset 0 when the object is converted to child). Since the program is built as 64-bit, a pointer is 8 bytes, an int is 4 bytes, and the alignment is 8 bytes. Thus, the layout looks like this:
0  | vptr of grandchild |
4  |                    |
8  | b                  |
12 | c                  |
16 | a                  |
20 | empty              |
The virtual table for grandchild looks like:
Vtable for grandchild
grandchild::_ZTV10grandchild: 3 entries
0     16                  ## [grandchild-child] VBase offset of struct father
8     0                   ## [grandchild-child] Offset to top: struct child
16    &_ZTI10grandchild   ## [grandchild-child] Class Info
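The first column is the byte offset of each entry, the second column is the entry’s value, and the text after ## is a comment. The vptr stored in the object points just past these three entries, at ((&_ZTV10grandchild) + 24) according to the class dump. The virtual base offset of father therefore lies 24 bytes before that address point, which is what vbaseoffset=-24 in the dump means.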
Now let’s walk through the construction. The virtual base father is constructed first, by invoking the default constructor father::father(). At this point, the this pointer points to the father subobject (offset 16). Nothing special so far. The layout becomes:
0         |                    |
4         |                    |
8         |                    |
12        |                    |
16 (this) | a                  |
20        | empty              |
Next, it’s child’s turn. The compiler invokes child::child() and adjusts the this pointer to the child subobject, which is at offset 0:
0 (this)  |                    |
4         |                    |
8         |                    |
12        |                    |
16        | a                  |
20        | empty              |
Let’s see what would happen if we naively used child’s own virtual table for this construction. The data layout and virtual table of a complete child object are shown below.
0 | vptr of child |
4 | |
8 | b |
12 | a |
Vtable for child
child::_ZTV5child: 3 entries
0 12 ## [child] VBase offset of struct father
8 0 ## [child] Offset to top: struct child
16 &_ZTI5child ## [child] Class Info
Wait! The virtual base offset of father in child’s own virtual table is 12, but in grandchild it must be 16 (from child at offset 0 to father at offset 16). How would child::child() know that it is constructing a child inside a grandchild? In other words, how would child::child() know the offset of father? With child’s own virtual table alone, it cannot. This is where the virtual table table (VTT) comes in.
For classes with virtual bases, the compiler creates a special construction virtual table for each base subobject that needs one, used only while the complete object is being constructed. The following is the one for child in grandchild:
Construction vtable for child (@0x100310b8260 instance) in grandchild
grandchild::_ZTC10grandchild0_5child: 3 entries
0     16            ## [child] VBase offset of struct father
8     0             ## [child] Offset to top: struct child
16    &_ZTI5child   ## [child] Class Info
Pointers to these construction virtual tables, together with a pointer to grandchild’s own virtual table, are collected in a table called the virtual table table (VTT):
VTT for grandchild
grandchild::_ZTT10grandchild: 2 entries
0     ((&grandchild::_ZTV10grandchild) + 24)             ## grandchild
8     ((&grandchild::_ZTC10grandchild0_5child) + 24)     ## child in construction vtable for child (@0x100310b8260 instance) in grandchild
With these tables, grandchild’s constructor can fetch the child-in-grandchild construction virtual table from the VTT (the entry at offset 8, matching subvttidx=8 in the class dump) and pass it to child::child() when constructing the child subobject. This tells child::child() where the virtual base father is: 16 bytes away instead of 12.
Problem solved!
Complete object constructor and base object constructor
Now back to the constructors. A VTT is generated for classes with virtual bases, and the constructor of the complete object uses it to pass a construction virtual table to each base constructor it invokes. Fetching entries from the VTT and constructing the virtual bases are the job of the most-derived class (grandchild here), not of a base being constructed inside it, such as child in grandchild. So constructing a complete object differs slightly from constructing a base subobject, and that is why we need two kinds of constructors.
The names tell the difference. The complete object constructor (C1) does the whole job: it constructs the virtual bases and passes the VTT entries down to the base constructors it invokes. The base object constructor (C2) is called only when the object is constructed as a base subobject. It takes care of the object itself, skipping the virtual bases and relying on the construction virtual table handed to it.
In our example, when g is defined, the complete object constructor (C1) of grandchild is called. It invokes father::father() to construct the virtual base, and then invokes the base object constructor (C2) of child, which does not construct father again. So both constructors of child are generated: C1 is used for child c, and C2 for the child subobject of grandchild g. Since nothing derives from grandchild in this program, its base object constructor is never needed, so only its complete object constructor is emitted.
Classes without virtual bases
For classes without virtual bases, such as father in this case, the complete object constructor has no VTT work to do, so it is identical to the base object constructor. Whether both are emitted is then up to the compiler. Some compilers generate two functions with the same body, some emit one body and make the other symbol an alias, and others emit only the base object constructor.
So don’t be too surprised if you find two symbols for one constructor in your binary. They exist for a reason.
A bit more about debugging
What about debugging the constructors of classes with virtual bases? A breakpoint set in such a constructor resolves to two instruction addresses!
(gdb) break child::child()
Breakpoint 1 at 0x10000950: child::child(). (2 locations)
That’s not strange: two function bodies were generated from the same source line.
Breakpoint 1, child::child (this=0x3fffffffe710) at a.C:2
2       struct child : virtual father { int b; child() {} };
(gdb) backtrace
#0  child::child (this=0x3fffffffe710) at a.C:2
#1  0x0000000010000ab0 in main () at a.C:6
(gdb) continue
Continuing.
Breakpoint 1, child::child (this=0x3fffffffe720) at a.C:2
2       struct child : virtual father { int b; child() {} };
(gdb) backtrace
#0  child::child (this=0x3fffffffe720) at a.C:2
#1  0x0000000010000a2c in grandchild::grandchild (this=0x3fffffffe720) at a.C:3
#2  0x0000000010000abc in main () at a.C:7
Clearly, the two stops are in two different constructors. The first is called directly from main for child c, and the second from grandchild::grandchild for g. If you use disassemble to view the machine code, you will see that they stop at different instructions in different functions. In practice, this should cause little inconvenience when debugging. The source statements are the same in both functions, so the two constructors appear to be one and the same to the user (unless you step through instruction by instruction with stepi).
Conclusion
We found two seemingly identical constructor symbols in our binary. Their mangled names differ, however: according to the Linux C++ ABI, one is a base object constructor (C2) and the other a complete object constructor (C1). They exist because of how objects with virtual bases are constructed. For a class with virtual bases, the compiler may generate two functions from each user-written constructor: a base object constructor (C2) that constructs the object as a base subobject, and a complete object constructor (C1) that additionally constructs the virtual bases and handles the VTT. Thankfully, the compiler handles all of this carefully, so users rarely need to care.
Next time you see both of them, don’t be surprised!
English Editor: Shuai Cao (Erik). Many thanks to Erik!