Introduction
In this article I would like to look behind the scenes of the Visual C/C++ compiler and show how individual compiler switches influence the generated code.
Background
I assume the reader is familiar with the basics of the Visual C/C++ compiler and is not afraid of using the compiler from the command line.
Enable Read-Only String Pooling
The command line help describes the /GF compiler switch as "enable read-only string pooling". What exactly does this mean?
In a nutshell, it means that identical string literals occurring in several places in your source code are translated to a single data item in the binary image. The /GF option therefore helps you optimize your code by producing smaller binaries. Let us look at the following code snippet:
char* str1 = "Bart Simpson";
char* str2 = "Milhouse van Houten";
void foo() {
static char* s1 = "Bart Simpson";
static char* s2 = "Milhouse van Houten";
}
int main() {
}
Without specifying the /GF option the Visual C/C++ compiler will generate code like this:
_DATA SEGMENT
$SG855 DB 'Bart Simpson', 00H
ORG $+3
str1 DQ FLAT:$SG855
$SG857 DB 'Milhouse van Houten', 00H
ORG $+4
str2 DQ FLAT:$SG857
$SG862 DB 'Bart Simpson', 00H
ORG $+3
?s1@?1??foo@@9@9 DQ FLAT:$SG862
$SG865 DB 'Milhouse van Houten', 00H
ORG $+4
?s2@?1??foo@@9@9 DQ FLAT:$SG865
_DATA ENDS
As you can see, each string literal in the C++ source is placed into the binary image, even if some string literals are identical. The .data section of the resulting image confirms this:
SECTION HEADER #3
.data name
21A0 virtual size
9000 virtual address (0000000140009000 to 000000014000B19F)
1000 size of raw data
7800 file pointer to raw data (00007800 to 000087FF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
C0000040 flags
Initialized Data
Read Write
RAW DATA #3
0000000140009000: 42 61 72 74 20 53 69 6D 70 73 6F 6E 00 00 00 00 Bart Simpson....
0000000140009010: 00 90 00 40 01 00 00 00 4D 69 6C 68 6F 75 73 65 ...@....Milhouse
0000000140009020: 20 76 61 6E 20 48 6F 75 74 65 6E 00 00 00 00 00 van Houten.....
0000000140009030: 18 90 00 40 01 00 00 00 42 61 72 74 20 53 69 6D ...@....Bart Sim
0000000140009040: 70 73 6F 6E 00 00 00 00 38 90 00 40 01 00 00 00 pson....8..@....
0000000140009050: 4D 69 6C 68 6F 75 73 65 20 76 61 6E 20 48 6F 75 Milhouse van Hou
0000000140009060: 74 65 6E 00
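The cost of the duplicates can be estimated with a little arithmetic. The following is only a toy model, not the compiler's algorithm: it counts the bytes of the four literals with and without merging identical ones, and it ignores the alignment padding visible in the dump. The function names are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Bytes needed when every literal gets its own copy
// (text plus the terminating NUL byte).
std::size_t unpooled_bytes(const std::vector<std::string>& literals) {
    std::size_t total = 0;
    for (const std::string& s : literals) total += s.size() + 1;
    return total;
}

// Bytes needed when identical literals are stored only once.
std::size_t pooled_bytes(const std::vector<std::string>& literals) {
    std::set<std::string> unique(literals.begin(), literals.end());
    std::size_t total = 0;
    for (const std::string& s : unique) total += s.size() + 1;
    assert(total <= unpooled_bytes(literals));  // merging never costs space
    return total;
}

// The four literals of the example program, in source order.
std::vector<std::string> example_literals() {
    return {"Bart Simpson", "Milhouse van Houten",
            "Bart Simpson", "Milhouse van Houten"};
}
```

For the example program this model gives 66 bytes of string data without merging and 33 bytes with it. In a real program the saving depends entirely on how often literals repeat.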
Now let us rebuild the code with the /GF compiler switch. As the assembly listing shows, the compiler now generates quite different code:
CONST SEGMENT
??_C@_0BE@BMDGJIMK@Milhouse?5van?5Houten?$AA@ DB 'Milhouse van Houten', 00H
CONST ENDS
CONST SEGMENT
??_C@_0N@MPADFJH@Bart?5Simpson?$AA@ DB 'Bart Simpson', 00H
CONST ENDS
_DATA SEGMENT
str1 DQ FLAT:??_C@_0N@MPADFJH@Bart?5Simpson?$AA@
str2 DQ FLAT:??_C@_0BE@BMDGJIMK@Milhouse?5van?5Houten?$AA@
?s1@?1??foo@@9@9 DQ FLAT:??_C@_0N@MPADFJH@Bart?5Simpson?$AA@
?s2@?1??foo@@9@9 DQ FLAT:??_C@_0BE@BMDGJIMK@Milhouse?5van?5Houten?$AA@
_DATA ENDS
We can make two observations:
- several identical string literals will be translated into a single data item
- the string data will be placed in a read-only data segment
SECTION HEADER #2
.rdata name
2560 virtual size
6000 virtual address (0000000140006000 to 000000014000855F)
2600 size of raw data
5200 file pointer to raw data (00005200 to 000077FF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
40000040 flags
Initialized Data
Read Only
RAW DATA #2
0000000140006000: 48 81 00 00 00 00 00 00 5A 81 00 00 00 00 00 00 H.......Z.......
...
0000000140006210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000000140006220: 4D 69 6C 68 6F 75 73 65 20 76 61 6E 20 48 6F 75 Milhouse van Hou
0000000140006230: 74 65 6E 00 00 00 00 00 42 61 72 74 20 53 69 6D ten.....Bart Sim
0000000140006240: 70 73 6F 6E 00
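The first observation can also be checked at run time by comparing addresses. The following is a minimal sketch, assuming a const-correct variant of the example program; global_bart, local_bart and literals_are_pooled are hypothetical names. Whether identical literals share storage is left to the implementation, so the result depends on the compiler and its options.

```cpp
#include <cassert>
#include <cstring>

// Namespace-scope pointer to a literal (const-correct variant of str1).
const char* global_bart = "Bart Simpson";

// Function-local static pointer to an identical literal (variant of s1).
const char* local_bart() {
    static const char* s = "Bart Simpson";
    return s;
}

// True when both pointers refer to the same bytes, that is, when the
// two literals were merged into one data item. Compiler dependent.
bool literals_are_pooled() {
    // The contents match in either case; only the addresses may differ.
    assert(std::strcmp(global_bart, local_bart()) == 0);
    return global_bart == local_bart();
}
```

Judging from the listings above, one would expect literals_are_pooled() to return false when the program is built without /GF and true when it is built with /GF. Other compilers may merge identical literals by default.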
Points of Interest
Strings in C and C++ confuse even advanced developers, and one can find countless questions about them. Here are just a few notes regarding the code snippet in this article.
According to the Annotated C++ Reference Manual a string literal has type char[] ("array of char"), and an attempt to modify a string literal results in undefined behavior. Standard C++ has since typed string literals as const char[], but for compatibility with classic C it long permitted assigning a literal to a plain char*; that conversion was deprecated in C++98 and removed in C++11.
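The type of a literal can be inspected at compile time. The following is a minimal sketch, assuming a compiler that supports C++11 or later; bart_size, bart and first_letter are hypothetical names. It reflects the standard's typing of a narrow string literal as an array of const char, which differs from the pre-standard reference manual.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A narrow string literal is an lvalue of type "array of N const char",
// so decltype yields a reference to that array type.
static_assert(std::is_same<decltype("Bart Simpson"), const char (&)[13]>::value,
              "12 characters plus the terminating NUL");

// Size of the literal in bytes, terminator included.
constexpr std::size_t bart_size = sizeof("Bart Simpson");

// Binding the literal to a pointer to const is always well-formed.
const char* bart() { return "Bart Simpson"; }

// Reading through the pointer is fine; writing through it would not compile.
char first_letter() {
    const char* p = bart();
    assert(p != nullptr);
    return p[0];
}
```

The size of 13 matches the length encoded in the mangled name of the pooled literal in the /GF listing, where 0N stands for 13 bytes.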
So let us see what the compiler does when we declare the strings as char[] instead of char*.
char str1[] = "Bart Simpson";
char str2[] = "Milhouse van Houten";
void foo() {
static char s1[] = "Bart Simpson";
static char s2[] = "Milhouse van Houten";
}
int main() {
}
Perhaps surprisingly, the /GF option has no effect in this situation: multiple data items are placed in the data segment of the binary image. This is not a defect of the compiler. Each array is a separate, writable object that is merely initialized from the literal, so the compiler has to keep the copies apart.
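That behavior can be demonstrated with a short test. The following is a minimal sketch; simpson_a, simpson_b and the two helper functions are hypothetical names. It shows that two arrays initialized from identical literals occupy different addresses and that writing to one leaves the other untouched, which is why their storage cannot be merged.

```cpp
#include <cassert>
#include <cstring>

// Two writable arrays initialized from identical literals.
char simpson_a[] = "Bart Simpson";
char simpson_b[] = "Bart Simpson";

// Distinct objects have distinct addresses, so this is always false.
bool arrays_share_storage() {
    return &simpson_a[0] == &simpson_b[0];
}

// Writes to one array and checks that the other one is unaffected.
bool write_is_isolated() {
    const char saved = simpson_a[0];
    simpson_a[0] = 'L';                          // simpson_a now differs
    const bool isolated = (simpson_b[0] == 'B'); // simpson_b still intact
    simpson_a[0] = saved;                        // restore original content
    assert(std::strcmp(simpson_a, simpson_b) == 0);
    return isolated;
}
```

If the two arrays shared one data item, the write in write_is_isolated() would change both strings, which the language does not allow.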
History
- March, 2012 - Article first published.