When scaling a large amount of data by a single factor, the approach taken can have a significant impact on performance. In this lab the task was to simulate sound samples and reduce the volume to 75% of the original. Because the samples are stored as int16_t, each value must fall in the range -32,768 to +32,767. To simulate the sound, random values were generated, albeit with a fixed seed so that every run produces the same results.
The basic program had to work something like this:
//make a 500-million-element array of random int16_t numbers
//scale the values by 0.75 and see how long it takes
//sum the result and display it
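A minimal C sketch of that setup might look like the following. The constant name, helper function, and use of rand() are assumptions for illustration, not the exact code used in the lab.

    // Sketch of the setup: allocate ~1 GB of int16_t samples and fill them
    // with pseudo-random values from a fixed seed so every run is reproducible.
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define SAMPLES 500000000

    static int16_t *make_samples(void) {
        int16_t *data = calloc(SAMPLES, sizeof(int16_t));   // zero-initialized
        if (data == NULL) {
            fprintf(stderr, "allocation failed\n");
            exit(1);
        }
        srand(1);                                            // fixed seed
        for (size_t i = 0; i < SAMPLES; i++) {
            data[i] = (int16_t)((rand() % 65536) - 32768);   // full int16_t range
        }
        return data;
    }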
Part 1
This part simply multiplies each value in the array by 0.75 and stores the result back in the array. The code I made for that is here. This approach took the most time, coming in at approximately 8 seconds.
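A sketch of this naive approach is below; the function and parameter names are mine, not necessarily those used in the linked code.

    #include <stddef.h>
    #include <stdint.h>

    // Part 1 sketch: scale every sample in place with a floating-point
    // multiply, casting the result back to int16_t.
    void scale_naive(int16_t *data, size_t n, float volume) {
        for (size_t i = 0; i < n; i++) {
            data[i] = (int16_t)(data[i] * volume);   // e.g. volume = 0.75
        }
    }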
Part 2
This part uses a precalculated lookup table of every possible sample value scaled to 0.75 volume. The work inside the timed loop is much simpler: each sample's current value + 32768 (so the index can't be negative) is used to look up the scaled result in the table. The code for this step is here. This version was faster, coming in at approximately 6 seconds.
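A sketch of the lookup-table idea, again with assumed names rather than the exact code linked above:

    #include <stddef.h>
    #include <stdint.h>

    // Part 2 sketch: precalculate the scaled result for every possible
    // int16_t value, then replace the per-sample multiply with a table
    // lookup. The index is sample + 32768 so it is always in 0..65535.
    void scale_lookup(int16_t *data, size_t n, float volume) {
        static int16_t table[65536];
        for (int i = 0; i < 65536; i++) {
            table[i] = (int16_t)((i - 32768) * volume);
        }
        for (size_t i = 0; i < n; i++) {
            data[i] = table[data[i] + 32768];
        }
    }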
Part 3
The final part demonstrates the speed of fixed-point arithmetic with bit shifting. The 0.75 volume factor is first multiplied by 256 to convert it to a fixed-point integer. Each sample is then multiplied by that integer factor and the product is shifted right by 8 bits. The code for this final part can be found here. This was by far the fastest approach; the calculation took approximately 3 seconds to complete.
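A sketch of the fixed-point version, with the same caveat that the names are assumptions:

    #include <stddef.h>
    #include <stdint.h>

    // Part 3 sketch: convert the volume factor to 8.8 fixed point once
    // (0.75 * 256 = 192), then scale each sample with an integer multiply
    // followed by a right shift of 8 bits. The right shift of a negative
    // int is an arithmetic shift on the platforms used here.
    void scale_fixed(int16_t *data, size_t n, float volume) {
        int16_t factor = (int16_t)(volume * 256);            // 0.75 -> 192
        for (size_t i = 0; i < n; i++) {
            data[i] = (int16_t)((data[i] * factor) >> 8);
        }
    }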
These run times have been captured live and are available here for future reference. There are some important questions that help us analyze what we have just made.
1) What is the impact of various optimization levels on the software performance?
Compiling with aggressive optimization (-O3) cuts the time in half or better for all three parts, a huge improvement. This can be seen here.
2) Does the distribution of data matter?
The distribution of data does matter, and part 2 being faster than part 1 shows this: when the scaled values are precalculated into a separate array, looking each result up in that array is faster than performing the math calculation for every sample.
3) If samples are fed at CD rate (44100 samples per second x 2 channels), can both algorithms keep up?
All three algorithms can easily keep up. How can we determine this? At CD rate with two channels we need to process 88,200 samples per second. Even the slowest algorithm processes 500 million samples in under 10 seconds, which is more than 50 million samples per second, so all three comfortably meet this requirement.
4) What is the memory footprint of each approach?
Because the arrays are allocated with calloc(), these programs use only the memory they request, so the footprint is essentially bound by the size and type of the array(s). For the sound array that is 16 bits × 500,000,000 samples = 8,000,000,000 bits = 1,000,000,000 bytes, or roughly 1 GB. A lookup table covering every possible int16_t value, as in part 2, would add only about 65,536 × 2 bytes = 128 KB on top of this.
5) What is the performance of each approach?
The second approach is slightly faster than the first approach, and the third approach is drastically faster than both.
6) What is the energy consumption of each approach? (What information do you need to calculate this?)
To calculate energy consumption we would need the processor's power draw on the given architecture (ARM in this case) along with the time taken, since energy is power multiplied by time. A longer run time will generally consume more energy, as the CPU is kept busy rather than sitting idle during the calculation.
7) Aarchie and Betty (Xerxes) have different performance profiles, so it's not reasonable to compare performance between the machines, but it is reasonable to compare the relative performance of the two algorithms in each context. Do you get similar results?
The relative performance of the algorithms was similar on both machines, although this is difficult to measure precisely because the load on each machine varies and causes discrepancies.
8) What other optimizations can be applied to this problem?
To push these algorithms to their limit, they could be written in assembly, which has already been shown to yield the fastest times.