Linaro compilation speed

I am working on software running on an embedded ARM platform. In the course of updating our platform, we are switching from an OpenEmbedded based system to Linaro.

On my machine, it currently takes about 9 minutes to cross-compile our software for ARM, using the 32-bit gcc 4.6.4 that OpenEmbedded built for us. For the new system we are now of course trying Linaro's gcc 4.7 binary, with the surprising result that compilation suddenly takes about twice as long (18 minutes). The Linaro gcc 4.6 binary has the same issue, so it is not specific to the gcc version.

Using Linaro's crosstool-ng to build an adjusted version of their compiler (e.g. trying to get the configure options as close as possible to those of our old compiler) did not speed it up.

The main differences between our old gcc compiler and the Linaro one:

  • The old one uses softfp; Linaro's uses hard float and specifies the FPU.
  • The old one targets no particular ARM processor/architecture (arm-none-linux-gnueabi); Linaro's gcc (arm-linux-gnueabihf) has --with-arch=armv7-a and --with-tune=cortex-a9 explicitly set.

    Changing gcc configure options such as enabling ssp, thumb/arm mode, multilib, or the target CPU (cortex-a8 vs. a9) does not yield an improvement.

    Compilation speed already differs for a simple test.cpp that just has a main function with a vector<int>, so it is not related to linking, and I doubt that the STL header files alone cause that much difference.

    I am running out of ideas about what else to tweak. Does anybody have an idea?


    EDIT4: I also tried the ARM cross compiler from Ubuntu 12.04 (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)), and its compilation times are comparable to those of my 4.6.4 version. So there seems to be something specific to Linaro's own build: either a setting I can't manage to turn off, or some special patch they applied.


    EDIT3: -ftime-report from Linaro gcc 4.7 for an actual source file from the project:

    Execution times (seconds)
     phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall    2076 kB ( 2%) ggc
     phase parsing           :   1.72 (74%) usr   0.34 (85%) sys   2.04 (75%) wall   66732 kB (79%) ggc
     phase lang. deferred    :   0.28 (12%) usr   0.04 (10%) sys   0.33 (12%) wall   10215 kB (12%) ggc
     phase cgraph            :   0.32 (14%) usr   0.02 ( 5%) sys   0.33 (12%) wall    5481 kB ( 6%) ggc
     phase generate          :   0.60 (26%) usr   0.06 (15%) sys   0.66 (24%) wall   15700 kB (19%) ggc
     |name lookup            :   0.28 (12%) usr   0.02 ( 5%) sys   0.24 ( 9%) wall    8058 kB (10%) ggc
     |overload resolution    :   0.32 (14%) usr   0.06 (15%) sys   0.36 (13%) wall   10042 kB (12%) ggc
     callgraph construction  :   0.06 ( 3%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall     551 kB ( 1%) ggc
     callgraph optimization  :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     224 kB ( 0%) ggc
     varpool construction    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      94 kB ( 0%) ggc
     df scan insns           :   0.06 ( 3%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall       5 kB ( 0%) ggc
     df reg dead/unused notes:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      22 kB ( 0%) ggc
     alias analysis          :   0.00 ( 0%) usr   0.02 ( 5%) sys   0.00 ( 0%) wall      11 kB ( 0%) ggc
     preprocessing           :   0.08 ( 3%) usr   0.10 (25%) sys   0.29 (11%) wall    1069 kB ( 1%) ggc
     parser (global)         :   0.58 (25%) usr   0.08 (20%) sys   0.43 (16%) wall   25145 kB (30%) ggc
     parser struct body      :   0.28 (12%) usr   0.02 ( 5%) sys   0.34 (12%) wall   12400 kB (15%) ggc
     parser enumerator list  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     121 kB ( 0%) ggc
     parser function body    :   0.14 ( 6%) usr   0.00 ( 0%) sys   0.15 ( 6%) wall    2435 kB ( 3%) ggc
     parser inl. func. body  :   0.10 ( 4%) usr   0.02 ( 5%) sys   0.18 ( 7%) wall    3682 kB ( 4%) ggc
     parser inl. meth. body  :   0.24 (10%) usr   0.02 ( 5%) sys   0.20 ( 7%) wall    5298 kB ( 6%) ggc
     template instantiation  :   0.58 (25%) usr   0.14 (35%) sys   0.75 (28%) wall   26588 kB (31%) ggc
     tree gimplify           :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.03 ( 1%) wall     785 kB ( 1%) ggc
     tree CFG construction   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     543 kB ( 1%) ggc
     tree SSA other          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall      32 kB ( 0%) ggc
     out of ssa              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
     expand                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     438 kB ( 1%) ggc
     varconst                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       6 kB ( 0%) ggc
     integrated RA           :   0.04 ( 2%) usr   0.00 ( 0%) sys   0.09 ( 3%) wall    1313 kB ( 2%) ggc
     reload                  :   0.06 ( 3%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall      60 kB ( 0%) ggc
     thread pro- & epilogue  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall      92 kB ( 0%) ggc
     final                   :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall       4 kB ( 0%) ggc
     rest of compilation     :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall     133 kB ( 0%) ggc
     unaccounted todo        :   0.02 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
     TOTAL                 :   2.32             0.40             2.72              84519 kB
    

    and the same for my gcc-4.6:

    Execution times (seconds)
     callgraph construction:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall     527 kB ( 1%) ggc
     trivially dead code   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
     df scan insns         :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       5 kB ( 0%) ggc
     df reg dead/unused notes:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      22 kB ( 0%) ggc
     preprocessing         :   0.08 ( 7%) usr   0.10 (26%) sys   0.14 ( 9%) wall    1016 kB ( 1%) ggc
     parser                :   0.68 (58%) usr   0.24 (63%) sys   0.83 (52%) wall   52215 kB (76%) ggc
     name lookup           :   0.28 (24%) usr   0.02 ( 5%) sys   0.41 (26%) wall   10211 kB (15%) ggc
     inline heuristics     :   0.00 ( 0%) usr   0.02 ( 5%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
     tree gimplify         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     637 kB ( 1%) ggc
     tree CFG construction :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     463 kB ( 1%) ggc
     expand                :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     426 kB ( 1%) ggc
     varconst              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall     132 kB ( 0%) ggc
     integrated RA         :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.03 ( 2%) wall     304 kB ( 0%) ggc
     reload                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      58 kB ( 0%) ggc
     machine dep reorg     :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       3 kB ( 0%) ggc
     final                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall       4 kB ( 0%) ggc
     rest of compilation   :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     133 kB ( 0%) ggc
     unaccounted todo      :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.03 ( 2%) wall       0 kB ( 0%) ggc
     TOTAL                 :   1.18             0.38             1.59              68315 kB
    

    EDIT2: Linaro gcc 4.6.3's -ftime-report output for a VERY simple test.cpp (compiled with the options -fno-graphite-identity -fno-graphite):

    Execution times (seconds)
     preprocessing         :   0.00 ( 0%) usr   0.02 (50%) sys   0.02 (10%) wall     121 kB ( 2%) ggc
     parser                :   0.10 (62%) usr   0.02 (50%) sys   0.11 (55%) wall    4022 kB (65%) ggc
     name lookup           :   0.02 (12%) usr   0.00 ( 0%) sys   0.04 (20%) wall     879 kB (14%) ggc
     tree gimplify         :   0.02 (13%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      20 kB ( 0%) ggc
     expand                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 5%) wall      34 kB ( 1%) ggc
     integrated RA         :   0.02 (12%) usr   0.00 ( 0%) sys   0.01 ( 5%) wall      59 kB ( 1%) ggc
     TOTAL                 :   0.16             0.04             0.20               6207 kB
    

    and for the same file with my old gcc 4.6.4:

    Execution times (seconds)
     preprocessing         :   0.02 (25%) usr   0.00 ( 0%) sys   0.02 (14%) wall     119 kB ( 2%) ggc
     parser                :   0.00 ( 0%) usr   0.04 (100%) sys   0.06 (43%) wall    4021 kB (65%) ggc
     name lookup           :   0.04 (50%) usr   0.00 ( 0%) sys   0.03 (21%) wall     879 kB (14%) ggc
     expand                :   0.02 (25%) usr   0.00 ( 0%) sys   0.01 ( 7%) wall      34 kB ( 1%) ggc
     unaccounted todo      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 7%) wall       0 kB ( 0%) ggc
     TOTAL                 :   0.08             0.04             0.14               6204 kB
    

    Generating the preprocessed file with both compilers yielded no significant difference (the output of Linaro's gcc was only 3 lines longer).
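To compare two such reports row by row, a small script can help. The following is a minimal sketch in Python (not part of the original setup); `parse_time_report` and `compare` are hypothetical helper names, and the sample rows are copied from the EDIT2 reports above. Note that gcc 4.7 splits the parser into finer-grained rows than 4.6 does, so only rows with identical names, plus TOTAL, line up directly.

```python
import re

# One regular row: "name : usr (x%) usr  sys (x%) sys  wall (x%) wall ..."
ROW = re.compile(
    r"^\s*\|?\s*(?P<name>.+?)\s*:\s+"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall"
)
# The TOTAL row carries no percentages: "TOTAL : usr sys wall kB"
TOTAL = re.compile(r"^\s*TOTAL\s*:\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)")


def parse_time_report(text):
    """Return {row name: wall seconds} for one -ftime-report dump."""
    walls = {}
    for line in text.splitlines():
        total = TOTAL.match(line)
        if total:
            walls["TOTAL"] = float(total.group(3))
            continue
        row = ROW.match(line)
        if row:
            walls[row.group("name")] = float(row.group("wall"))
    return walls


def compare(old, new):
    """List (name, old wall, new wall), largest slowdown first."""
    names = sorted(set(old) | set(new))
    rows = [(n, old.get(n, 0.0), new.get(n, 0.0)) for n in names]
    return sorted(rows, key=lambda r: r[2] - r[1], reverse=True)


# Sample rows copied from the two EDIT2 reports (old gcc 4.6.4, Linaro gcc 4.6.3).
OLD = """\
 parser                :   0.00 ( 0%) usr   0.04 (100%) sys   0.06 (43%) wall    4021 kB (65%) ggc
 name lookup           :   0.04 (50%) usr   0.00 ( 0%) sys   0.03 (21%) wall     879 kB (14%) ggc
 TOTAL                 :   0.08             0.04             0.14               6204 kB
"""
NEW = """\
 parser                :   0.10 (62%) usr   0.02 (50%) sys   0.11 (55%) wall    4022 kB (65%) ggc
 name lookup           :   0.02 (12%) usr   0.00 ( 0%) sys   0.04 (20%) wall     879 kB (14%) ggc
 TOTAL                 :   0.16             0.04             0.20               6207 kB
"""

for name, old_wall, new_wall in compare(parse_time_report(OLD), parse_time_report(NEW)):
    print(f"{name:<24} {old_wall:5.2f} -> {new_wall:5.2f}")
```

With timings this small a single run is noisy, so the comparison is more meaningful on a larger source file such as the one in EDIT3.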


    EDIT1: gcc -v for the old one (paths shortened or removed, e.g. --sbindir):

    # arm-none-linux-gnueabi-g++ -v
    Using built-in specs.
    COLLECT_GCC=..sysroots/i686-linux/usr/bin/arm-none-linux-gnueabi-g++
    COLLECT_LTO_WRAPPER=../libexec/gcc/arm-none-linux-gnueabi/4.6.4/lto-wrapper
    Target: arm-none-linux-gnueabi
    Configured with: ..tmp/work/armv7a-none-linux-gnueabi/gcc-cross-4.6.3+svnr184847-r27/gcc-4_6-branch/configure --build=i686-linux --host=i686-linux --target=arm-none-linux-gnueabi  --with-gnu-ld --enable-shared --enable-languages=c,c++ --enable-threads=posix --disable-multilib --enable-c99 --enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch --program-prefix=arm-none-linux-gnueabi- --without-local-prefix --enable-lto --enable-libssp --disable-bootstrap --disable-libgomp --disable-libmudflap --with-system-zlib --with-linker-hash-style=gnu --with-ppl=no --with-cloog=no --enable-cheaders=c_global --enable-languages=c,c++,fortran --disable-libunwind-exceptions --with-mpfr=..sysroots/i686-linux/usr --with-system-zlib --enable-__cxa_atexit
    Thread model: posix
    gcc version 4.6.4 20120303 (prerelease) (GCC) 
    

    and Linaro gcc -v

    # arm-linux-gnueabihf-g++ -v
    Using built-in specs.
    COLLECT_GCC=..compiler/bin/arm-linux-gnueabihf-g++
    COLLECT_LTO_WRAPPER=../libexec/gcc/arm-linux-gnueabihf/4.7.2/lto-wrapper
    Target: arm-linux-gnueabihf
    Configured with: .build/src/gcc-linaro-4.7-2012.08/configure --build=i686-build_pc-linux-gnu --host=i686-build_pc-linux-gnu --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran --enable-multilib --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard --with-pkgversion='crosstool-NG linaro-1.13.1-2012.08-20120827 - Linaro GCC 2012.08' --with-bugurl=https://bugs.launchpad.net/gcc-linaro --enable-__cxa_atexit --enable-libmudflap --enable-libgomp --enable-libssp --with-gmp=.. --with-mpfr=.. --with-mpc=.. --with-ppl=.. --with-cloog=.. --with-libelf=.. --with-host-libstdcxx='-L.. -lpwl' --enable-threads=posix --disable-libstdcxx-pch --enable-linker-build-id --enable-gold --with-local-prefix=.. --enable-c99 --enable-long-long --with-mode=thumb
    Thread model: posix
    gcc version 4.7.2 20120731 (prerelease) (crosstool-NG linaro-1.13.1-2012.08-20120827 - Linaro GCC 2012.08) 
    

    For the latter I also adjusted the build to use --disable-multilib, --disable-libmudflap and --disable-libgomp.

    And here's Ubuntu 12.04's arm compiler:

    > arm-linux-gnueabihf-g++-4.6 -v
    Using built-in specs.
    COLLECT_GCC=arm-linux-gnueabihf-g++-4.6
    COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
    Target: arm-linux-gnueabihf
    Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5'  --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/arm-linux-gnueabihf/include/c++/4.6.3 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=arm-linux-gnueabihf 
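Comparing these long "Configured with:" lines by eye is error-prone. As a rough sketch (Python, with hypothetical helper names `configure_options` and `diff_options`), the options can be normalized and diffed. It relies on the standard autoconf convention that --disable-X means --enable-X=no and --without-X means --with-X=no. The two sample strings are shortened excerpts of the old and the Linaro configure lines above, not the full lines.

```python
import shlex


def configure_options(configured_with):
    """Map a 'Configured with:' line to {option: value}, folding
    --disable-X into enable-X=no and --without-X into with-X=no."""
    options = {}
    for token in shlex.split(configured_with):
        if not token.startswith("--"):
            continue  # skip the configure script path and flags such as -v
        name, _, value = token[2:].partition("=")
        if name.startswith("disable-"):
            name, value = "enable-" + name[len("disable-"):], "no"
        elif name.startswith("without-"):
            name, value = "with-" + name[len("without-"):], "no"
        options[name] = value or "yes"
    return options


def diff_options(old_line, new_line):
    """Return (option, old value, new value) for every option that differs.
    None means the option does not appear on that side."""
    old, new = configure_options(old_line), configure_options(new_line)
    return [
        (name, old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    ]


# Shortened excerpts of the two configure lines shown above.
OLD_CFG = ("configure --target=arm-none-linux-gnueabi --disable-multilib "
           "--enable-libstdcxx-pch --disable-libgomp --with-ppl=no")
NEW_CFG = ("configure --target=arm-linux-gnueabihf --enable-multilib "
           "--disable-libstdcxx-pch --enable-libgomp --with-ppl=.. --with-float=hard")

for name, old_value, new_value in diff_options(OLD_CFG, NEW_CFG):
    print(f"{name:<22} {old_value!s:<24} {new_value!s}")
```

Run on the full lines, such a diff also surfaces differences that are easy to overlook, for example that the old compiler was built with --enable-libstdcxx-pch and the Linaro one with --disable-libstdcxx-pch. Whether any of them explains the slowdown still has to be tested.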
    

    You could pass -time or -ftime-report to gcc to find out where it is spending the compilation time.

    But why does the compilation time matter so much to you?

    You should care more about the execution time of the produced executable binary.

    Also, show us the output of the -v option passed to your gcc.

    And you might pass the -j option to your make command to have it work in parallel (e.g. running several gcc processes in parallel). You could also lower the optimization level, e.g. from -O3 to -O2 or -O1.
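For a quick wall-clock comparison of the two compilers on a single file, something like the following Python sketch could be used. It is only an illustration: `best_wall_time` and `compare_compilers` are hypothetical helpers, the file name test.cpp and the flags `-c ... -o /dev/null` are assumptions, and the compiler names are the ones from the question.

```python
import subprocess
import time


def best_wall_time(command, repeats=3):
    """Run `command` several times and return the fastest wall-clock time
    in seconds; taking the minimum reduces noise from caches and other load."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def compare_compilers(compilers, source="test.cpp"):
    """Compile `source` with each compiler (object file discarded)
    and return {compiler: best wall time in seconds}."""
    return {
        cxx: best_wall_time([cxx, "-c", source, "-o", "/dev/null"])
        for cxx in compilers
    }
```

With both toolchains on the PATH, `compare_compilers(["arm-none-linux-gnueabi-g++", "arm-linux-gnueabihf-g++"])` would return one timing per compiler. Any optimization flags used in the real build should be added to the command so that both compilers do the same work.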


    OK, your -ftime-report tests clearly show that "parser" is the culprit; I'm guessing templates (in general) and the STL (in particular) are the root cause.

    SUGGESTION:

    See if there's any way you can use "precompiled headers" in your tool chain. If you can, that might eliminate the entire problem.

    LINKS (unfortunately, I'm not sure which may or may not be applicable to you):

  • Why does C++ compilation take so long?

  • http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Precompiled-Headers.html

  • http://clang.llvm.org/docs/UsersManual.html#precompiledheaders

  • In GCC, can precompiled headers be included from other headers?

  • http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472c/CIHJFBHC.html
