Why is the Exynos Octa 5420 unusually slow?

My code:

#include<ctime>
#include<cstdio>

int main(){
    struct timespec t,mt1,mt2;
    unsigned long long int mt;

    clock_gettime(CLOCK_THREAD_CPUTIME_ID,&mt1);

    //Measured block begin
    for(int i=0;i<1000000;i++)
        clock_gettime(CLOCK_THREAD_CPUTIME_ID,&t);
    //Measured block end

    clock_gettime(CLOCK_THREAD_CPUTIME_ID,&mt2);
    //Elapsed thread CPU time in nanoseconds
    mt = (mt2.tv_sec - mt1.tv_sec)*1000000000LL + mt2.tv_nsec - mt1.tv_nsec;

    printf("%llu\n",mt);

    return 0;
}

I'm using a standalone armv7-a toolchain generated from Android NDK r9d, installed under /opt/android-toolchain.

Configuration 1:

These are the default flags generated by the toolchain file in https://github.com/taka-no-me/android-cmake.

Compiler configuration:

/opt/android-toolchain/bin/arm-linux-androideabi-g++ 
    -DANDROID -Wno-psabi --sysroot=/opt/android-toolchain/sysroot 
    -fpic -funwind-tables -finline-limit=64 -fsigned-char 
    -no-canonical-prefixes -march=armv7-a -mfloat-abi=softfp 
    -mfpu=vfpv3-d16 -fdata-sections -ffunction-sections 
    -Wa,--noexecstack  -mthumb -fomit-frame-pointer 
    -fno-strict-aliasing -O3 -DNDEBUG 
    -isystem /opt/android-toolchain/sysroot/usr/include 
    -isystem /opt/android-toolchain/include/c++/4.8 
    -isystem /opt/android-toolchain/include/c++/4.8/arm-linux-androideabi/armv7-a 
    -o my-object-file.o -c my-source-file.cpp

Linker configuration:

/opt/android-toolchain/bin/arm-linux-androideabi-gcc 
    -Wno-psabi --sysroot=/opt/android-toolchain/sysroot 
    -fpic -funwind-tables -finline-limit=64 -fsigned-char 
    -no-canonical-prefixes -march=armv7-a -mfloat-abi=softfp 
    -mfpu=vfpv3-d16 -fdata-sections -ffunction-sections 
    -Wa,--noexecstack  -mthumb -fomit-frame-pointer 
    -fno-strict-aliasing -O3 -DNDEBUG -Wl,--fix-cortex-a8 
    -Wl,--no-undefined -Wl,-allow-shlib-undefined -Wl,--gc-sections 
    -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now 
    -Wl,-z,nocopyreloc my-object-file.o -o my-executable 
    -L/libs/armeabi-v7a -rdynamic 
    "/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libstdc++.a" 
    "/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libsupc++.a" 
    -lm

  • Samsung Galaxy Note 10.1 2014 Edition with Exynos Octa 5420 @1.9 GHz running Samsung stock 4.4.2 ROM, code takes 2.0 seconds
  • Samsung Galaxy Note II with Exynos 4412 @1.6 GHz running CyanogenMod 11 based on Android 4.4.4, code takes 0.75 seconds
  • Samsung Galaxy S3 with Exynos 4412 @1.4 GHz running CyanogenMod 11 based on Android 4.4.4, code takes 1.1 seconds

Configuration 2:

Nearly all flags from before are removed.

Compiler configuration:

/opt/android-toolchain/bin/arm-linux-androideabi-g++ 
    -DANDROID --sysroot=/opt/android-toolchain/sysroot 
    -O3 -DNDEBUG 
    -isystem /opt/android-toolchain/sysroot/usr/include 
    -isystem /opt/android-toolchain/include/c++/4.8 
    -isystem /opt/android-toolchain/include/c++/4.8/arm-linux-androideabi/armv7-a 
    -o my-object-file.o -c my-source-file.cpp

Linker configuration:

/opt/android-toolchain/bin/arm-linux-androideabi-gcc 
    --sysroot=/opt/android-toolchain/sysroot -O3 -DNDEBUG 
    -Wl,-z,nocopyreloc my-object-file.o -o my-executable 
    -L/libs/armeabi-v7a -rdynamic 
    "/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libstdc++.a" 
    "/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libsupc++.a" 
    -lm
    
  • Samsung Galaxy Note 10.1 2014 Edition with Exynos Octa 5420 @1.9 GHz running Samsung stock 4.4.2 ROM, code takes 2.2 seconds
  • Samsung Galaxy Note II with Exynos 4412 @1.6 GHz running CyanogenMod 11 based on Android 4.4.4, code takes 0.94 seconds
  • Samsung Galaxy S3 with Exynos 4412 @1.4 GHz running CyanogenMod 11 based on Android 4.4.4, code takes 1.1 seconds

Notes for both configurations:

  • I raised the minimum CPU clock frequency to the maximum, i.e. 1.9 GHz, using a CPU tuning app.

  • I made sure there are no background processes hogging the CPU.

  • I also tried the -mcpu=cortex-a15 flag specifically; it doesn't change the execution time significantly.

  • I also tried -mfpu=neon -marm -mtune=cortex-a15; it doesn't change the execution time significantly either.

  • clock_gettime() is not the culprit; the code is visibly slower.

  • Other pieces of code I tried, including parts of OpenCV imgproc and STL calls such as std::map::find() and std::sort(), are all slower on the Exynos Octa 5420 than on the two other devices listed above, both visibly and as measured with clock_gettime().
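
To separate clock_gettime() overhead from raw compute speed, a loop with no system calls can be timed with both the thread CPU clock and the monotonic wall clock. Below is a minimal sketch of such a check, not the code behind the numbers above; time_compute_loop, Timing and the xorshift workload are hypothetical names picked for illustration, and the asm barrier assumes GCC or Clang.

```cpp
#include <cassert>
#include <stdint.h>
#include <time.h>

// Nanoseconds elapsed between two timespec samples (b taken after a).
static int64_t elapsed_ns(const timespec& a, const timespec& b) {
    return (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

// Result of one measurement (hypothetical helper type).
struct Timing {
    int64_t  cpu_ns;    // thread CPU time spent in the loop
    int64_t  wall_ns;   // wall-clock time spent in the loop
    uint32_t checksum;  // final loop state, returned so the loop is not optimized away
};

// Times a pure-arithmetic xorshift32 loop that makes no system calls.
Timing time_compute_loop(uint32_t iterations) {
    assert(iterations > 0);
    timespec c1, c2, w1, w2;

    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c1);

    uint32_t x = 2463534242u;  // arbitrary nonzero seed
    for (uint32_t i = 0; i < iterations; ++i) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
    }
    // GCC/Clang-specific barrier: forces x to be computed before the clocks are read again.
    asm volatile("" : "+r"(x) : : "memory");

    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c2);
    clock_gettime(CLOCK_MONOTONIC, &w2);

    Timing t = { elapsed_ns(c1, c2), elapsed_ns(w1, w2), x };
    return t;
}
```

Calling time_compute_loop() with the same iteration count on each device and comparing cpu_ns would show whether plain arithmetic is slower too. A wall_ns much larger than cpu_ns would suggest the thread spent part of the interval not running, for example because it was preempted.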

My hypotheses:

  • My thread is somehow getting stuck on one of the Cortex-A7 cores instead of migrating to one of the Cortex-A15 ones. If so, how can I verify this, and how can I force my threads onto the Cortex-A15 cores?

  • I failed to set the lower limit of the CPU clock frequency, and the CPU is being throttled. If so, how can I verify this?

  • Samsung's kernel is somehow worse than CM's. Can this cause this much difference in execution time?
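
For the first hypothesis, one possible check is to pin the thread to a single logical CPU and rerun the benchmark once per CPU. The sketch below is only an illustration under assumptions: pin_to_cpu, first_allowed_cpu and current_cpu are hypothetical helper names, it assumes the libc exposes sched_setaffinity(), sched_getaffinity() and sched_getcpu() (glibc does; an older Bionic may not provide all of them), and which logical CPU numbers correspond to Cortex-A15 cores depends on the kernel and is not something this code can tell.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed by glibc for the CPU_* macros and sched_getcpu()
#endif
#include <cassert>
#include <sched.h>

// Restricts the calling thread to one logical CPU.
// Returns 0 on success and -1 on failure (errno is set by the libc).
int pin_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  // pid 0 = calling thread
}

// Returns the lowest-numbered CPU the calling thread is allowed to run on,
// or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

// Returns the logical CPU the calling thread is running on right now, or -1 on error.
int current_cpu() {
    return sched_getcpu();
}
```

Calling pin_to_cpu(n) before the measured block, for each n that the device reports, and printing current_cpu() next to the timing would show whether the run time depends on the core the thread lands on.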

I'm pretty much stumped at this point. What advice or insights can you offer so that I can get my money's worth out of this device?

Edit: I flashed a custom tweaked kernel (http://forum.xda-developers.com/showthread.php?t=2725193) and set the governor to performance, and the execution time went down to about 1.3 seconds, so I think my third hypothesis is a bit stronger now. It is still slower than the older CPUs, though.
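
To check whether the governor and frequency limits are really what the tuning app claims, the standard Linux cpufreq sysfs attributes can be read from inside the benchmark itself. This is a rough sketch that assumes the kernel exposes the usual /sys/devices/system/cpu/cpuN/cpufreq/ directory and that the files are readable by the process; cpufreq_path and read_first_line are hypothetical helper names.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Builds the sysfs path of one cpufreq attribute for a logical CPU,
// e.g. "scaling_governor", "scaling_cur_freq" or "scaling_min_freq".
std::string cpufreq_path(int cpu, const std::string& attribute) {
    assert(cpu >= 0);
    std::ostringstream path;
    path << "/sys/devices/system/cpu/cpu" << cpu << "/cpufreq/" << attribute;
    return path.str();
}

// Returns the first line of a file, or an empty string if it cannot be opened.
std::string read_first_line(const std::string& path) {
    std::ifstream in(path.c_str());
    std::string line;
    if (in)
        std::getline(in, line);
    return line;
}
```

Printing read_first_line(cpufreq_path(n, "scaling_governor")) and read_first_line(cpufreq_path(n, "scaling_cur_freq")) right before and after the measured block would show which governor is active and the frequency, in kHz, that the kernel reports at that moment. An empty string means the attribute is missing or unreadable for that CPU.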
