Our task was to create a small program that:
1) Creates two 1000-element integer arrays and fills them with random numbers in the range -1000 to +1000
2) Sums those two arrays element-by-element to a third array
3) Sums the third array and prints the result
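As a rough sketch of those three steps, something like the following would do. This is illustrative only and is not the lab5.c from the post; the function names fill_random, add_arrays, and sum_array are made up for this example, and using rand() with a modulo is just one simple assumption for getting values in the -1000 to +1000 range.

```c
#include <stddef.h>
#include <stdlib.h>

#define N 1000  /* array length from the task description */

/* Step 1: fill an array with pseudo-random values in [-1000, +1000]. */
void fill_random(int *a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        a[i] = rand() % 2001 - 1000;  /* 2001 possible values: -1000..1000 */
    }
}

/* Step 2: element-by-element sum of a and b, stored in c. */
void add_arrays(const int *a, const int *b, int *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Step 3: total of all elements. The largest possible magnitude is
   1000 * 2000, so long is more than wide enough. */
long sum_array(const int *c, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += c[i];
    }
    return total;
}
```

A main function would call fill_random on two arrays of N ints, pass them to add_arrays, and print the value returned by sum_array.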
The source code for the program is here. The best way I found to get this to compile vectorized on our aarch64 machine was:
gcc -O3 lab5.c
Before this, I tried:
gcc -ftree-vectorize lab5.c // already included in -O3; leaves a messy main that is hard to understand.
gcc -ftree-vectorizer-verbose=2 lab5.c // supposed to show how the program was vectorized, but has been retired in favor of -fopt-info-
gcc -O2 -ftree-vectorize -fopt-info-vec-all lab5.c // shows the vectorization process; useful for making sure your code structure allows vectorization.
Interestingly, this helped me understand that putting everything in one for loop allows for no vectorization.
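To illustrate the single-loop point, here is a guess at what the two code structures might look like. The function names fused_total, split_total, and rand_in_range are hypothetical and not taken from lab5.c. In the fused version every iteration calls rand(), and a call to an opaque library function inside the loop body is the kind of thing that typically stops GCC's vectorizer. In the split version, the add loop and the total loop only touch arrays, so they are much better candidates.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: one pseudo-random value in [-1000, +1000]. */
static int rand_in_range(void) {
    return rand() % 2001 - 1000;
}

/* Fused: fill, add, and total in one loop. The rand() calls sit in the
   loop body, so the compiler will most likely keep this loop scalar. */
long fused_total(int *a, int *b, int *c, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        a[i] = rand_in_range();
        b[i] = rand_in_range();
        c[i] = a[i] + b[i];
        total += c[i];
    }
    return total;
}

/* Split: the fill loop still calls rand(), but the add loop and the
   total loop work purely on arrays, which makes them candidates for
   vectorization. */
long split_total(int *a, int *b, int *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        a[i] = rand_in_range();
        b[i] = rand_in_range();
    }
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += c[i];
    }
    return total;
}
```

Both functions compute the same result for the same seed; only the loop structure differs. Compiling a file like this with the -fopt-info-vec-all option should report which loops were and were not vectorized, though the exact messages depend on the GCC version.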
A picture demonstrating the breakdown of the code is here. I thought this task was really practical for real-world applications: if the code is written in a way that can be vectorized, the compiler can do the speed-up instead of the programmer. Commenting line by line was rather simple but time-consuming; still, it was very useful for seeing the vectorization process in assembly. In case anyone needs help, this guide is really useful for decoding aarch64 assembly and gave me a better understanding of how everything works in the big picture.