Why is the Exynos Octa 5420 unusually slow?
My code:
#include <ctime>
#include <cstdio>

int main() {
    struct timespec t, mt1, mt2;
    unsigned long long int mt;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &mt1);
    // Measured block begin
    for (int i = 0; i < 1000000; i++)
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t);
    // Measured block end
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &mt2);
    // Elapsed thread CPU time in nanoseconds
    mt = (mt2.tv_sec - mt1.tv_sec) * 1000000000LL + mt2.tv_nsec - mt1.tv_nsec;
    printf("%llu\n", mt);
    return 0;
}
I'm using a standalone arm-v7a toolchain generated from Android NDK r9d, which resides under /opt/android-toolchain.
Configuration 1:
These are the default flags generated by the toolchain file in https://github.com/taka-no-me/android-cmake.
Compiler configuration:
/opt/android-toolchain/bin/arm-linux-androideabi-g++
-DANDROID -Wno-psabi --sysroot=/opt/android-toolchain/sysroot
-fpic -funwind-tables -finline-limit=64 -fsigned-char
-no-canonical-prefixes -march=armv7-a -mfloat-abi=softfp
-mfpu=vfpv3-d16 -fdata-sections -ffunction-sections
-Wa,--noexecstack -mthumb -fomit-frame-pointer
-fno-strict-aliasing -O3 -DNDEBUG
-isystem /opt/android-toolchain/sysroot/usr/include
-isystem /opt/android-toolchain/include/c++/4.8
-isystem /opt/android-toolchain/include/c++/4.8/arm-linux-androideabi/armv7-a
-o my-object-file.o -c my-source-file.cpp
Linker configuration:
/opt/android-toolchain/bin/arm-linux-androideabi-gcc
-Wno-psabi --sysroot=/opt/android-toolchain/sysroot
-fpic -funwind-tables -finline-limit=64 -fsigned-char
-no-canonical-prefixes -march=armv7-a -mfloat-abi=softfp
-mfpu=vfpv3-d16 -fdata-sections -ffunction-sections
-Wa,--noexecstack -mthumb -fomit-frame-pointer
-fno-strict-aliasing -O3 -DNDEBUG -Wl,--fix-cortex-a8
-Wl,--no-undefined -Wl,-allow-shlib-undefined -Wl,--gc-sections
-Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now
-Wl,-z,nocopyreloc my-object-file.o -o my-executable
-L/libs/armeabi-v7a -rdynamic
"/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libstdc++.a"
"/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libsupc++.a"
-lm
Configuration 2:
Nearly all flags from before are removed.
Compiler configuration:
/opt/android-toolchain/bin/arm-linux-androideabi-g++
-DANDROID --sysroot=/opt/android-toolchain/sysroot
-O3 -DNDEBUG
-isystem /opt/android-toolchain/sysroot/usr/include
-isystem /opt/android-toolchain/include/c++/4.8
-isystem /opt/android-toolchain/include/c++/4.8/arm-linux-androideabi/armv7-a
-o my-object-file.o -c my-source-file.cpp
Linker configuration:
/opt/android-toolchain/bin/arm-linux-androideabi-gcc
--sysroot=/opt/android-toolchain/sysroot -O3 -DNDEBUG
-Wl,-z,nocopyreloc my-object-file.o -o my-executable
-L/libs/armeabi-v7a -rdynamic
"/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libstdc++.a"
"/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libsupc++.a"
-lm
Notes for both configurations:
I set the lower limit of the CPU clock frequency to the highest possible value, i.e. 1.9 GHz, using a CPU tuning app.
I made sure there are no background processes hogging the CPU.
I also tried the -mcpu=cortex-a15 flag specifically; it doesn't change the execution time significantly.
I also tried -mfpu=neon -marm -mtune=cortex-a15; these don't change the execution time significantly either.
clock_gettime() is not the culprit; the code is visibly slower.
Other pieces of code I tried, including parts of OpenCV imgproc and STL calls such as std::map::find() and std::sort(), are all visibly and clock_gettime()-measurably slower on the Exynos Octa 5420 compared to the two others I listed above.
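One way to take clock_gettime() out of the measured block entirely is to time a loop that makes no system calls, and to record both thread CPU time and wall-clock time around it. The following is only a sketch: now_ns() and spin() are hypothetical helper names, and the LCG constants are just a convenient way to keep the compiler from removing the loop.

```cpp
#include <ctime>

// Current value of the given clock, in nanoseconds.
unsigned long long now_ns(clockid_t clock) {
    struct timespec ts;
    clock_gettime(clock, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

// Pure integer workload with a loop-carried dependency and no syscalls.
// Each iteration is one step of a linear congruential generator.
unsigned int spin(unsigned int iterations) {
    unsigned int x = 1;
    for (unsigned int i = 0; i < iterations; ++i)
        x = x * 1664525u + 1013904223u;
    return x;
}
```

Calling now_ns(CLOCK_THREAD_CPUTIME_ID) and now_ns(CLOCK_MONOTONIC) before and after spin(100000000u), and printing the result of spin() so it is not optimized away, gives two durations. If they are close, the thread was running the whole time and the core itself is slow (low clock or a small core); if wall time is much larger, the thread was probably being descheduled.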
My hypotheses:
My thread is somehow getting stuck on one of the Cortex-A7 cores instead of hopping onto one of the Cortex-A15 ones. If so, how can I confirm it, and how can I force my threads onto the Cortex-A15 cores?
I failed to set the lower limit of the CPU clock frequency and the CPU is being throttled. If so, how can I confirm it?
Samsung's kernel is somehow worse compared to CM's. Can this cause this much difference in execution time?
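For the first hypothesis, a possible experiment is to pin the thread to a single CPU index with sched_setaffinity() and rerun the benchmark once per index. This is a minimal sketch, assuming the toolchain's sched.h exposes cpu_set_t and the CPU_* macros; pin_current_thread_to_cpu() and first_allowed_cpu() are hypothetical helper names. Which indices (if any) correspond to Cortex-A15 cores is not known here; depending on how the kernel exposes the big.LITTLE clusters, a CPU index may not identify a core type at all.

```cpp
#include <sched.h>

// Restrict the calling thread to a single CPU index.
// Returns true if the kernel accepted the new affinity mask.
bool pin_current_thread_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // A pid of 0 means "the calling thread" on Linux.
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Lowest CPU index the calling thread is currently allowed to run on,
// or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Calling pin_current_thread_to_cpu(n) at the start of main() for each n in turn and comparing the measured times would show whether some indices are consistently faster than others. Where available, sched_getcpu() can also be printed inside the measured code to see which index the thread is actually running on.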
I'm pretty much stumped at this point. What advice and insights do you have so that I can get my money's worth out of this device?
Edit: I flashed a custom tweaked kernel (http://forum.xda-developers.com/showthread.php?t=2725193) and set the governor to performance, and the execution time went down to about 1.3 seconds, so I think my 3rd hypothesis is a bit stronger now. It is still slower than the older CPUs, though.
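The governor and frequency limits can also be read from the program itself through the standard Linux cpufreq sysfs files, which makes it possible to check whether a setting from a tuning app actually took effect. This is a sketch only: read_first_line() and cpufreq_path() are hypothetical helper names, and it assumes the kernel exposes the usual cpufreq directory for each CPU (files for offline cores may be missing).

```cpp
#include <fstream>
#include <sstream>
#include <string>

// First line of a text file, or an empty string if it cannot be opened.
std::string read_first_line(const std::string& path) {
    std::ifstream in(path.c_str());
    std::string line;
    if (in)
        std::getline(in, line);
    return line;
}

// sysfs path of one cpufreq attribute for a given CPU index, e.g.
// "scaling_governor", "scaling_cur_freq", "scaling_min_freq" or
// "scaling_max_freq" (frequencies are reported in kHz).
std::string cpufreq_path(int cpu, const std::string& attr) {
    std::ostringstream os;
    os << "/sys/devices/system/cpu/cpu" << cpu << "/cpufreq/" << attr;
    return os.str();
}
```

Printing read_first_line(cpufreq_path(n, "scaling_cur_freq")) for each CPU index right before and right after the measured block shows whether the clock stayed at the expected value during the run. std::ostringstream is used instead of std::to_string to stay within C++03.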