Java or C# (or good old C)?

I'm currently choosing a platform on which to build a scientific computational product, and am deciding between C#, Java, and plain C with the Intel compiler on Core 2 Quad CPUs. It's mostly integer arithmetic.

My benchmarks so far show Java and C are about on par with each other, and .NET/C# trails by about 5%. However, a number of my coworkers are claiming that .NET, with the right optimizations, will beat both of these given enough time for the JIT to do its work.

I always assumed that the JIT would have done its job within a few minutes of the app starting (probably a few seconds in my case, as it's mostly tight loops), so I'm not sure whether to believe them.

Can anyone shed any light on the situation? Would .NET beat Java? Or am I best just sticking with C at this point?

The code is highly multithreaded and data sets are several terabytes in size.

Haskell, Erlang, etc. are not options in this case, as there is a significant quantity of existing legacy C code that will be ported to the new system, and porting C to Java/C# is a lot simpler than porting it to Haskell or Erlang (unless, of course, these provide a significant speedup).

Edit: We are considering moving to C# or Java because they may, in theory, be faster. Every percent we can shave off our processing time saves us tens of thousands of dollars per year. At this point we are just trying to evaluate whether C, Java, or C# would be faster.


The key piece of information in the question is this:

Every percent we can shave off our processing time saves us tens of thousands of dollars per year

So you need to consider how much it will cost you to shave each percent off. If the optimization effort costs more than it saves, it isn't worth doing; you could make a bigger saving by firing a programmer.

With the right skills (which today are rarer and therefore more expensive) you can hand-craft assembler to get the fastest possible code. With slightly less rare (and expensive) skills, you can do almost as well with some really ugly-looking C code. And so on. The more performance you squeeze out of it, the more it will cost you in development effort, and there will be diminishing returns for ever greater effort. If the profit from this stays at "tens of thousands of dollars per year", then there will come a point where it is no longer worth the effort. In fact, I would hazard a guess that you're already at that point, because "tens of thousands of dollars per year" is in the range of one salary, and probably not enough to buy the skills required to hand-optimize a complex program.

I would guess that if you have code already written in C, the effort of rewriting it all as a direct translation into another language will be 90% wasted. It will very likely run slower, simply because you won't be taking advantage of the capabilities of the platform, but instead working against them, e.g. trying to use Java as if it were C.

Also, within your existing code there will be parts that make a crucial contribution to the running time (they run frequently) and other parts that are totally irrelevant (they run rarely). So if you have some idea for speeding up the program, there is no economic sense in wasting time applying it to the parts of the program that don't affect the running time.

So use a profiler to find the hot spots, and see where time is being wasted in the existing code.
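
Even before reaching for a full profiler (gprof, perf, VTune, etc.), crude manual instrumentation will usually point at the hot spots. Below is a minimal sketch in C, assuming a POSIX system; process_chunk() is a hypothetical stand-in for one of the existing routines, not anything from the actual code base.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Hypothetical stand-in for one of the existing integer routines. */
    static long process_chunk(const int *data, long n)
    {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += data[i] * 3 - (data[i] >> 1);
        return sum;
    }

    int main(void)
    {
        const long n = 10 * 1000 * 1000;
        int *data = calloc(n, sizeof *data);
        if (!data) return 1;

        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);

        long result = process_chunk(data, n);   /* candidate hot spot */

        clock_gettime(CLOCK_MONOTONIC, &end);
        double elapsed = (end.tv_sec - start.tv_sec)
                       + (end.tv_nsec - start.tv_nsec) / 1e9;

        printf("result=%ld  elapsed=%.3f s\n", result, elapsed);
        free(data);
        return 0;
    }

Timing a handful of candidate routines like this (or, better, running the whole workload under a real profiler) tells you where the time actually goes before you spend money on a rewrite.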

Update, after noticing the reference to the code being "multithreaded":

In that case, if you focus your effort on removing bottlenecks so that your program scales well over a large number of cores, it will automatically get faster every year at a rate that will dwarf any other optimization you can make. This time next year, quad cores will be standard on desktops. The year after that, 8 cores will be getting cheaper (I bought one over a year ago for a few thousand dollars), and I would predict that a 32-core machine will cost less than a developer by that time.
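
As a rough illustration of the kind of structure that scales with core count, here is a minimal pthreads sketch in C. The data and the worker's arithmetic are hypothetical placeholders for the real integer work; the point is that each thread owns a disjoint slice and writes only its own result, so adding cores adds throughput instead of contention. (Compile with -pthread.)

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4                 /* in practice, match the core count */
    #define N        (8 * 1000 * 1000)

    static int data[N];                /* hypothetical shared, read-only input */

    struct slice { long begin, end, sum; };

    static void *worker(void *arg)
    {
        struct slice *s = arg;
        long sum = 0;
        for (long i = s->begin; i < s->end; i++)
            sum += data[i];            /* stand-in for the real integer work */
        s->sum = sum;                  /* each thread writes only its own slot */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        struct slice slices[NTHREADS];

        for (long i = 0; i < N; i++)
            data[i] = (int)(i & 0xFF);

        long chunk = N / NTHREADS;
        for (int t = 0; t < NTHREADS; t++) {
            slices[t].begin = t * chunk;
            slices[t].end   = (t == NTHREADS - 1) ? N : (t + 1) * chunk;
            pthread_create(&tid[t], NULL, worker, &slices[t]);
        }

        long total = 0;
        for (int t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            total += slices[t].sum;
        }
        printf("total=%ld\n", total);
        return 0;
    }

The same partitioning pattern carries over to Java threads or .NET's Parallel Extensions; the language matters far less than keeping the workers independent.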


I'm sorry, but that is not a simple question. It would depend a lot on what exactly is going on. C# is certainly no slouch, and you'd be hard-pressed to say "Java is faster" or "C# is faster". C is a very different beast... it may have the potential to be faster, if you get it right; but in most cases it'll be about the same, only much harder to write.

It also depends on how you do it: locking strategies, how you do the parallelization, the main code body, etc.

Re the JIT: you could use NGEN to flatten this out, but yes, if you are hitting the same code repeatedly, it should be JITted very early on.

One very useful feature of C#/Java (over C) is that they have the potential to make better use of the local CPU (optimizations etc), without you having to worry about it.

Also, with .NET, consider things like "Parallel Extensions" (to be bundled in 4.0), which give you a much stronger threading story (compared to .NET without PFX).


Don't worry about language; parallelize!

If you have a highly multithreaded, data-intensive scientific code, then I don't think worrying about language is the biggest issue for you. I think you should concentrate on making your application parallel, especially making it scale past a single node. This will get you far more performance than just switching languages.

As long as you're confined to a single node, you're going to be starved for compute power and bandwidth for your app. On upcoming many-core machines, it's not clear that you'll have the bandwidth you need to do data-intensive computing on all the cores. You can do computationally intensive work (like a GPU does), but you may not be able to feed all the cores if you need to stream a lot of data to every one of them.

I think you should consider two options:

  • MapReduce
    Your problem sounds like a good match for something like Hadoop, which is designed for very data-intensive jobs.

    Hadoop has scaled to 10,000 nodes on Linux, and you can shunt your work off either to someone else's (e.g. Amazon's, Microsoft's) or your own compute cloud. It's written in Java, so as far as porting goes, you can either call your existing C code from within Java, or you can port the whole thing to Java.

  • MPI
    If you don't want to bother porting to MapReduce, or if for some reason your parallel paradigm doesn't fit the MapReduce model, you could consider adapting your app to use MPI. This would also allow you to scale out to potentially thousands of cores. MPI is the de facto standard for computationally intensive, distributed-memory applications, and I believe there are Java bindings, but mostly people use MPI with C, C++, and Fortran. So you could keep your code in C and focus on parallelizing the performance-intensive parts (see the sketch just below this list). Take a look at OpenMPI for starters if you are interested.
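
To give a flavour of what that looks like, here is a minimal MPI sketch in C. It assumes any standard MPI implementation (OpenMPI, MPICH, ...); the per-rank arithmetic is a hypothetical stand-in for the real work, and the pattern is simply that every rank computes its own share and the partial results are combined at the end.

    #include <mpi.h>
    #include <stdio.h>

    #define N (1000L * 1000L)          /* hypothetical problem size */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Each rank works on its own contiguous slice of the index range. */
        long begin = (long)rank * N / nprocs;
        long end   = (long)(rank + 1) * N / nprocs;

        long local = 0;
        for (long i = begin; i < end; i++)
            local += i % 7;            /* stand-in for the real integer work */

        long total = 0;
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("total=%ld over %d ranks\n", total, nprocs);

        MPI_Finalize();
        return 0;
    }

Built with mpicc and launched with mpirun -np <n>, the same binary runs on one multi-core box or across a cluster, which is what lets you scale past a single node without touching the C core.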
