How are you taking advantage of Multicore?

As someone in the world of HPC who came from the world of enterprise web development, I'm always curious to see how developers back in the "real world" are taking advantage of parallel computing. This is much more relevant now that all chips are going multicore, and it'll be even more relevant when there are thousands of cores on a chip instead of just a few.

My questions are:

  1. How does this affect your software roadmap?
  2. I'm particularly interested in real stories about how multicore is affecting different software domains, so specify what kind of development you do in your answer (e.g. server-side, client-side apps, scientific computing, etc.).
  3. What are you doing with your existing code to take advantage of multicore machines, and what challenges have you faced? Are you using OpenMP, Erlang, Haskell, CUDA, TBB, UPC or something else?
  4. What do you plan to do as concurrency levels continue to increase, and how will you deal with hundreds or thousands of cores?
  5. If your domain doesn't easily benefit from parallel computation, then explaining why is interesting, too.
  6. Finally, I've framed this as a multicore question, but feel free to talk about other types of parallel computing. If you're porting part of your app to use MapReduce, or if MPI on large clusters is the paradigm for you, then definitely mention that, too.

    Update: If you do answer #5, mention whether you think things will change if there get to be more cores (100, 1000, etc.) than you can feed with available memory bandwidth (since bandwidth per core keeps shrinking). Can you still use the remaining cores for your application?


    My research work includes work on compilers and on spam filtering. I also do a lot of 'personal productivity' Unix stuff. Plus I write and use software to administer classes that I teach, which includes grading, testing student code, tracking grades, and myriad other trivia.

  • Multicore affects me not at all except as a research problem for compilers to support other applications. But those problems lie primarily in the run-time system, not the compiler.
  • At great trouble and expense, Dave Wortman showed around 1990 that you could parallelize a compiler to keep four processors busy. Nobody I know has ever repeated the experiment. Most compilers are fast enough to run single-threaded. And it's much easier to run your sequential compiler on several different source files in parallel than it is to make your compiler itself parallel. For spam filtering, learning is an inherently sequential process. And even an older machine can learn hundreds of messages a second, so even a large corpus can be learned in under a minute. Again, training is fast enough.
  • The only significant way I have of exploiting parallel machines is using parallel make. It is a great boon, and big builds are easy to parallelize. Make does almost all the work automatically. The only other thing I can remember is using parallelism to time long-running student code by farming it out to a bunch of lab machines, which I could do in good conscience because I was only clobbering a single core per machine, using just 1/4 of each machine's CPU resources. Oh, and I wrote a Lua script that will use all 4 cores when ripping MP3 files with lame. That script was a lot of work to get right. (See the sketch after this list for the general pattern.)
  • I will ignore tens, hundreds, and thousands of cores. The first time I was told "parallel machines are coming; you must get ready" was 1984. It was true then and is true today that parallel programming is a domain for highly skilled specialists. The only thing that has changed is that today manufacturers are forcing us to pay for parallel hardware whether we want it or not. But just because the hardware is paid for doesn't mean it's free to use. The programming models are awful, and making the thread/mutex model work, let alone perform well, is an expensive job even if the hardware is free. I expect most programmers to ignore parallelism and quietly get on about their business. When a skilled specialist comes along with a parallel make or a great computer game, I will quietly applaud and make use of their efforts. If I want performance for my own apps I will concentrate on reducing memory allocations and ignore parallelism.
  • Parallelism is really hard. Most domains are hard to parallelize. A widely reusable exception like parallel make is cause for much rejoicing.
  • Summary (which I heard from a keynote speaker who works for a leading CPU manufacturer): the industry backed into multicore because they couldn't keep making machines run faster and hotter and they didn't know what to do with the extra transistors. Now they're desperate to find a way to make multicore profitable because if they don't have profits, they can't build the next generation of fab lines. The gravy train is over, and we might actually have to start paying attention to software costs.
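    To make that concrete, here is a minimal C++ sketch (not the actual Lua script mentioned above) of the task-farming pattern behind parallel make, the lab-machine timing runs, and the MP3 ripper: run a sequential tool once per independent input, one job per core. The file names and the lame command line are illustrative assumptions.

```cpp
// Minimal task-farming sketch: run one sequential job per input file and
// let the OS spread the jobs across cores. A real script would cap the
// number of in-flight jobs at std::thread::hardware_concurrency().
#include <cstdlib>
#include <future>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical wrapper around whatever sequential tool is being farmed out.
int run_job(const std::string& file) {
    return std::system(("lame --quiet " + file + ".wav " + file + ".mp3").c_str());
}

int main() {
    std::vector<std::string> files = {"track1", "track2", "track3", "track4"};

    // Launch each job on its own thread; each job is itself single-threaded.
    std::vector<std::future<int>> jobs;
    for (const auto& f : files)
        jobs.push_back(std::async(std::launch::async, run_job, f));

    // Wait for every job and report any failures.
    for (std::size_t i = 0; i < jobs.size(); ++i)
        if (jobs[i].get() != 0)
            std::cerr << "job failed: " << files[i] << '\n';
}
```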

    Many people who are serious about parallelism are ignoring these toy 4-core or even 32-core machines in favor of GPUs with 128 processors or more. My guess is that the real action is going to be there.


    For web applications it's very, very easy: ignore it. Unless you've got some code that really begs to be done in parallel you can simply write old-style single-threaded code and be happy.

    You usually have a lot more requests to handle at any given moment than you have cores. And since each one is handled in its own Thread (or even process, depending on your technology) this is already working in parallel.

    The only place you need to be careful is when accessing some kind of global state that requires synchronization. Keep that to a minimum to avoid introducing artificial bottlenecks to an otherwise (almost) perfectly scalable world.
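    As a minimal illustration of that point (sketched in C++ rather than a real web stack; handle_request and the thread count are made-up stand-ins), the shared counter below is the only place the "requests" contend, and everything else runs in parallel untouched:

```cpp
// Sketch: many independent "requests" in parallel, with one small piece of
// global state behind a mutex. The narrower this critical section, the
// closer the whole thing gets to perfect scaling.
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex stats_mutex;
long requests_served = 0;  // the only shared, synchronized state

void handle_request(int /*request_id*/) {
    // ... per-request work happens here, touching no shared state ...

    std::lock_guard<std::mutex> lock(stats_mutex);  // keep this section tiny
    ++requests_served;
}

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 16; ++i)              // one thread per "request"
        workers.emplace_back(handle_request, i);
    for (auto& t : workers) t.join();
    std::cout << requests_served << " requests served\n";
}
```

    The design choice is simply to hold the lock as briefly as possible; the less time spent in the critical section, the closer you stay to that (almost) perfectly scalable world.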

    So for me multi-core basically boils down to these items:

  • My servers have fewer "CPUs" while each one sports more cores (not much of a difference to me)
  • The same number of CPUs can sustain a larger number of concurrent users
  • When there seems to be a performance bottleneck that's not the result of the CPU being 100% loaded, that's an indication that I'm doing some bad synchronization somewhere

  • At the moment it doesn't affect my roadmap much, to be honest. I'm more in the 'preparation stage', learning about the technologies and language features that make this possible.
  • I don't have one particular domain, but I've encountered domains like math (where multi-core is essential), data sort/search (where divide & conquer on multi-core is helpful) and multi-computer requirements (e.g., a requirement that a back-up station's processing power is used for something).
  • This depends on what language I'm working in. Obviously in C#, my hands are tied with a not-yet-ready implementation of Parallel Extensions that does seem to boost performance, until you start comparing the same algorithms with OpenMP (perhaps not a fair comparison). So on .NET it's going to be an easy ride with some Parallel.For refactorings and the like.
    Where things get really interesting is with C++, because the performance you can squeeze out of things like OpenMP is staggering compared to .NET (see the sketch after this list). In fact, OpenMP surprised me a lot, because I didn't expect it to work so efficiently. Well, I guess its developers have had a lot of time to polish it. I also like that it is available in Visual Studio out of the box, unlike TBB, which you have to pay for.
    As for MPI, I use PureMPI.net for little home projects (I have a LAN) to fool around with computations that one machine can't quite take. I've never used MPI commercially, but I do know that MKL has some MPI-optimized functions, which might be interesting to look at for anyone needing them.
  • I plan to do 'frivolous computing', i.e. use extra cores for precomputation of results that might or might not be needed, RAM permitting, of course. I also intend to delve into costly algorithms and approaches that most end users' machines currently cannot handle.
  • As for domains not benefiting from parallelization... well, one can always find something. One thing I am concerned about is decent support in .NET, though regrettably I have given up hope that speeds similar to C++ can be attained.
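    As promised above, here is a minimal sketch of the kind of OpenMP refactoring in question: a single pragma parallelizes an existing sequential loop. The dot-product loop is an invented example, not one of the algorithms from the comparison.

```cpp
// Sketch: parallelizing an existing sequential loop with one OpenMP pragma.
// Compile with e.g. g++ -fopenmp, or enable /openmp in Visual Studio.
#include <iostream>
#include <vector>

int main() {
    const int n = 10000000;
    std::vector<double> a(n, 1.5), b(n, 2.0);
    double sum = 0.0;

    // Each thread sums its own chunk of the range; the reduction clause
    // combines the per-thread partial sums at the end of the loop.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];

    std::cout << "dot product = " << sum << '\n';
}
```

    The rough .NET counterpart would be a Parallel.For over the same range, which is the kind of refactoring mentioned above.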