
Comparing Compiler Options and Output Files (Lab 2)

Depending on how a program is compiled, the result will have a different run time, file size, and assembly code. This is explored below through 7 different compilations of a C program:
 
**All images are hosted here [click the green Assembly Code Pictures.docx text to download] to make this post lighter on the planet. Every picture is referenced as "figure" followed by a number.**

1: gcc lab2.c -g (debugging) -O0 (no optimization) -fno-builtin (no built-in function optimization)

When exploring the assembly version of this code we use objdump -d on the compiled binary (a.out by default, since objdump reads object files rather than C source), which tells us a few things. Firstly, the code I wrote is in the <main> section in figure 1. Secondly, the function I am calling (a simple "Hello World!" printf statement) is displayed in figure 2. Lastly, with objdump -h we can see there are 30 section headers, and using ls -lh we can see the file is approximately 72 KB.

2: gcc lab2.c -static -g -O0 -fno-builtin

Without explaining -static, let's see what it does to our file size, section headers, and function call. After compilation the file size is approximately 815 KB, many times bigger than our dynamically linked program! We can also see that the section header count has grown to 31, and I assume our small program does not need the extra ones. Lastly, the function call now references a different symbol, <_IO_printf>, shown in figure 3. This function is much longer than its <printf@plt> counterpart, as seen in figure 4, because the static binary contains the full printf implementation, whereas <printf@plt> is only a short stub that jumps to the shared library at run time.

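3: gcc lab2.c -g -O0

Without -fno-builtin, we notice an important change in the function call: it now references <puts@plt>, as noted in figure 5. With built-in function optimization allowed, the compiler can swap a printf call that only prints a fixed string ending in a newline for the simpler puts. This stub is also quite small, as noted in figure 6.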

4: gcc lab2.c -O0 -fno-builtin

Without debugging information (no -g) we notice some differences in size, section headers, and disassembly output. Firstly, the size is slightly lower, now approximately 70 KB instead of 72 KB. Secondly, there are now only 25 section headers, fewer than the original 30; the missing ones are presumably the .debug_* sections that -g adds. Lastly, the disassembly output is significantly smaller than the original output.

5: gcc lab2.c -g -O0 -fno-builtin

This time, I added more print arguments. The statement looks like the one below.

printf("Hello World!\n%d\n%d\n%d\n%d\n%d\n%d\n%d\n%d\n%d\n%d", 1,2,3,4,5,6,7,8,9,10);

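We notice in figure 7 that each number is moved into a register before the call. The register names are specific to the architecture being used, which in my case is AArch64. Under the AArch64 calling convention only the first eight arguments travel in registers, so with a format string plus ten numbers the last few values should end up on the stack.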

6: gcc lab2.c -g -O0 -fno-builtin

Changing it up again, I moved the basic "Hello World!" statement into a void function called output() and called it from the main function. The result is the code in figure 8, where <main> now calls <output>, which in turn calls the previously mentioned <printf@plt>. This occurs because at -O0 the compiler does not inline output(), so execution must pass through the extra function call to reach the code we want to run.

7: gcc lab2.c -g -O3 -fno-builtin

Using full optimization with the -O3 option, while again using the standard "Hello World!" printf statement, yields a very unexpected result. With built-in function optimization still disabled, the compiler cannot rewrite very much of the code, and the result is nearly identical to the -O0 compilation.

These tests with compiler options have allowed for some conclusions to be made. Firstly, for the best run time speeds in the finished program, it is crucial to use the -O3 compilation flag and keep function calls to a minimum. Secondly, for debugging purposes, the -g option should always be added as a compilation option. Lastly, the -static flag should only be used deliberately, as it appears to copy the library code into the executable instead of calling into the shared library when needed.




