Vectorization and SIMD (Lab 5)

Our task was to create a small program that:

1) Creates two 1000-element integer arrays and fills them with random numbers in the range -1000 to +1000

2) Sums those two arrays element-by-element into a third array

3) Sums the third array and prints the result

The source code for the program is here. The best way I found to get this to compile vectorized on our aarch64 machine was:

gcc -O3 lab5.c

Before this, I tried:

gcc -ftree-vectorize lab5.c // already enabled by -O3; on its own it leaves a messy main that is hard to understand.
gcc -ftree-vectorizer-verbose=2 lab5.c // supposed to show how the program was vectorized, but has been retired in favor of -fopt-info-
gcc -O2 -ftree-vectorize -fopt-info-vec-all lab5.c // shows the vectorization process; useful for making sure your code structure allows vectorization. Interestingly, this helped me understand that having just one for loop allows for no vectorization.

A picture demonstrating the breakdown of the code is here. I thought this task was really practical for real-world applications: if code is written in a way that can be vectorized, the compiler can deliver the speed-up instead of the programmer having to do it by hand. Commenting the assembly line by line was simple but time-consuming, and it was very useful for seeing how vectorization actually shows up in the assembly.

In case anyone needs help, this guide is really useful for decoding aarch64 assembly and gave me a better understanding of how everything works in the big picture.
