Is Java ArrayList really this much slower than C++ vector?
I do not want to start yet another pointless flamewar about whether Java or C++ is the better language in general. I want to know whether a comparison I did for a specific task is fair and the measured data correct.
We need to decide whether to use Java or C++ for our next project. I'm in the C++ camp but I want to have solid arguments for my case. Our application is a special and has the following needs:
I implemented a toy program in C++ and in Java. First, I present the C++ version:
#include <iostream>
#include <vector>
#include <cstdlib>
using namespace std;
struct Point{
float x, y;
};
int main(int argc, char*argv[]){
int n = atoi(argv[1]);
vector<Point>arr;
for(int i=0; i<n; ++i){
Point p;
p.x = i;
p.y = i+0.5f;
arr.push_back(p);
}
float dotp = 0;
for(int i=0; i<n; ++i)
dotp += arr[i].x * arr[i].y;
cout << dotp << endl;
}
Next is the Java version that does the same thing:
import java.util.*;
class Point{
public float x, y;
}
class Main{
static public void main(String[]args){
int n = Integer.parseInt(args[0]);
ArrayList<Point> arr = new ArrayList<Point>();
for(int i=0; i<n; ++i){
Point p = new Point();
p.x = i;
p.y = i+0.5f;
arr.add(p);
}
float dotp = 0;
for(int i=0; i<n; ++i)
dotp += arr.get(i).x * arr.get(i).y;
System.out.println(dotp);
}
}
I pass the number of elements using the commandline to the program to prevent the optimizer from executing the program during compilation. The computed value is not useful. The only interesting question is how fast the programs run and how much memory they use. I start with C++:
$ g++ --version
g++ (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
$ g++ -O3 test.cpp -o test
$ /usr/bin/time ./test 1000000
3.33381e+17
0.01user 0.00system 0:00.02elapsed 100%CPU (0avgtext+0avgdata 10084maxresident)k
0inputs+0outputs (0major+2348minor)pagefaults 0swaps
$ /usr/bin/time ./test 10000000
3.36984e+20
0.08user 0.01system 0:00.09elapsed 100%CPU (0avgtext+0avgdata 134380maxresident)k
0inputs+0outputs (0major+4074minor)pagefaults 0swaps
$ /usr/bin/time ./test 100000000
2.42876e+23
0.77user 0.09system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 1050400maxresident)k
0inputs+0outputs (0major+6540minor)pagefaults 0swaps
The "user" time is how long the program ran. For 10^6 elements, it ran for 0.01 sec, for 10^7 elements 0.08 sec, and for 10^8 elements 0.77 sec. "maxresident" is the amount of physically memory in kilobyte that the kernel gave the program. For 10^6 its 10 MB, for 10^7 its 132 MB, and for 10^8 its 1 GB.
The memory consumption sounds right. An array with x elements needs sizeof(float)*2*x=8*x bytes of memory. For 10^6 elements that is about 8MB, for 10^7 about 76MB, and for 10^8 about 762 MB.
Next, I run the Java program:
$ javac -version
javac 1.6.0_41
$ javac Main.java
$ java -version
java version "1.7.0_131"
OpenJDK Runtime Environment (IcedTea 2.6.9) (7u131-2.6.9-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.131-b00, mixed mode)
$ /usr/bin/time java Main 1000000
3.33381168E17
0.16user 0.00system 0:00.09elapsed 173%CPU (0avgtext+0avgdata 79828maxresident)k
0inputs+64outputs (0major+4314minor)pagefaults 0swaps
$ /usr/bin/time java Main 10000000
3.3698438E20
5.23user 0.18system 0:02.07elapsed 261%CPU (0avgtext+0avgdata 424180maxresident)k
0inputs+64outputs (0major+13508minor)pagefaults 0swaps
$ /usr/bin/time java Main 100000000
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at Main.main(Main.java:14)
Command exited with non-zero status 1
3840.72user 13.06system 17:11.79elapsed 373%CPU (0avgtext+0avgdata 2281416maxresident)k
0inputs+1408outputs (0major+139893minor)pagefaults 0swaps
For 10^6 elements it needs 0.16 sec and 78 MB. For 10^7 elements it needs 5.23 sec and 414 MB. I tried to run the program for 10^8 elements but Java crashed. It used all cores of my machine (in a sequential program!) and ran for 17 min while occupying 2.2GB. My machine has 8 GB of memory.
For 10^6 elements C++ is 0.16 / 0.01 = 16 times faster and needs 78/10 = 7.8 times less memory. For 10^7 elements C++ is 5.23/0.08 = 65 times faster and needs 414/132 = 3.1 times less memory. Java did not finish on the test instance with 10^8 elements while the C++ program finished within well below a second.
For 10^6 Java seems manageable but less than ideal. For 10^7 and 10^8 it is an absolute no-go. I expected a slight performance advantage of C++ over Java but not something this drastic.
The most likely explanation is that my testing methodology is wrong or that I have a non-obvious performance bottleneck in my Java code. Another explanation would be that the OpenJDK JVM significantly lacks behind JVMs from other vendors.
Please explain to me why Java performs so bad in this benchmark. How did I unintentionally make Java look worse than it is?
Thanks
The JIT not running thus explains a small part of the effect but not the primary slowdown.
Right, Java is slow on start up because of JIT and it takes some time till it runs with full speed.
But the performance you describe is catastrophic and has another reason: You wrote
It used all cores of my machine (in a sequential program!)
and this must have been the garbage collector. The GC running hard means you're running out of memory. On my machine the times were
28.689 millis for 1 M pairs
143.104 millis for 10 M pairs
3100.856 millis for 100 M pairs
10.404 millis for 1 M pairs
113.054 millis for 10 M pairs
2528.371 millis for 100 M pairs
which is still a pain, but a possibly usable starting point. Observe that the second run is faster as it gets optimized better. Note that this is not how Java benchmarks should be written!
The reason for the memory consumption is that you have a List
of references to objects containing two floats instead of vector
of pairs of floats. Every references adds 4 or 8 bytes overhead, every object adds even more. Additionally, there's an indirection on every access.
If memory matters, then this isn't the right way how to code it in Java. There's surely a better way (I made give it try), but the code may get ugly. Java without value types sucks at such computations. IMHO it excels nearly everywhere else (personal opinion).
An equivalent C++ code would use vector<Point*>
. If you do this, your code gets slower and memory bigger, but still better than Java (object header overhead).
I rewrote the code so it uses a PointArray
storing the two floats next to each other in a single array. Without measuring anything, I claim that the memory consumption is about the same now. The times now are
31.505 millis for 1 M pairs
232.658 millis for 10 M pairs
1870.664 millis for 100 M pairs
17.536 millis for 1 M pairs
219.222 millis for 10 M pairs
1757.475 millis for 100 M pairs
which is still too slow. I guess, it's the bound checking, which can't be switched off in Java (unless you resolve to Unsafe
). Usually, the JIT (just-in-time compiler) can move them out of loops, which makes their cost negligible.
It may be also my slow (factor 1.5) array resizing (IIRC vector
uses a factor of 2). Or just a pity when the array gets resized when you're nearly done. As you're doing about nothing with the array, it may weight a lot.
In any case, you need at least one good Java programmer when you want to get fast processing of arrays of primitives. It may take them a day or two till you get good performance. A good library could do as well. Using List<Point>
is just too inefficient before Java 10 (or 11, or ...).
上一篇: 正确拉动edxops /论坛的方式