process for cpu intensive task?

So I'm starting to use node.js for a project I'm doing.

When a client makes a request, my Node.js server fetches JSON from another server and reformats it into new JSON that gets served to the client. However, the JSON from the other server can be pretty big, so that "massaging" of the data is pretty CPU intensive.

I've been reading for the past few hours about how Node.js isn't great for CPU-bound tasks, and the main advice I've seen is to spawn a child process (basically a .js file running in a separate instance of node) to handle any CPU-intensive work that might block the main event loop.

So let's say I have 20,000 concurrent users. That would mean spawning 20,000 OS-level processes, one child process per request.

Does this sound like a good idea? (A different web server would just create 20,000 threads in the same process.)

I'm not sure if I should be running a child process, but I do need the CPU-intensive task to be non-blocking. Any ideas on what I should do?


The V8 JavaScript engine that powers Node is actually pretty fast compared to the runtimes of many server-side languages.

The issue is that Node's evented model is very similar to cooperative multitasking: a particular request's operations will continue until it cedes control back to the JavaScript event loop, so CPU-heavy tasks will block the loop. That means a random selection of users gets perfect performance while another group gets timeouts, instead of performance degrading gracefully with load.
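To see the blocking effect concretely, here is a minimal sketch that measures how late a 0 ms timer fires when synchronous work holds the loop. The names busyWork and measureTimerDelay are made up for this illustration, and the spin loop merely stands in for the JSON massaging.

```javascript
// Spin synchronously for roughly `ms` milliseconds (stand-in for CPU-heavy work).
function busyWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU; nothing else on the event loop can run meanwhile
  }
}

// Schedule a 0 ms timer, then block the loop; resolve with the observed delay.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    busyWork(blockMs); // the timer callback cannot run until this returns
  });
}

measureTimerDelay(100).then((delay) => {
  console.log('0 ms timer actually fired after ' + delay + ' ms');
});
```

Any other request's callback would be stuck behind the spin loop in the same way the timer is.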

So, for CPU-intensive tasks, there are several solutions you can use:

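  • You can treat your code like a processing pipeline and simply yield with process.nextTick between significant chunks of processing (on Node 0.10 and later use setImmediate instead, because process.nextTick callbacks run before pending I/O and so never actually yield to other requests). This reduces the average latency while increasing the absolute minimum; basically you are being more "cooperative" and not letting any one request hog the CPU for a long time.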
  • If your work is pure JavaScript (no Node modules needed), you can use the node-webworker-threads library to offload the CPU-intensive work to threads. However, constantly spawning new threads is probably a bad idea, so you'll probably want a pool of threads fed by a work queue: you enqueue work, the workers pull from the queue, and they send results back into an output queue. In which case...
  • You can create a pool of child process workers and use the same queuing mechanism, where the pool size depends on the percentage of requests that need the CPU-intensive path, the total number of requests, and the latency increase you can tolerate for these requests.
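As a rough sketch of the pool-plus-queue idea, the following uses Node's built-in worker_threads module (available in current Node versions) rather than node-webworker-threads or child processes; the queuing logic would look much the same with either. WorkerPool, its methods, and the placeholder "massage" step (upper-casing top-level keys) are all invented for this example.

```javascript
const { Worker } = require('worker_threads');

// Worker source inlined so the sketch fits in one file. The transformation
// is a placeholder: it upper-cases every top-level key of the JSON object.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (json) => {
    const input = JSON.parse(json);
    const output = {};
    for (const key of Object.keys(input)) {
      output[key.toUpperCase()] = input[key];
    }
    parentPort.postMessage(JSON.stringify(output));
  });
`;

class WorkerPool {
  constructor(size) {
    this.queue = [];   // jobs waiting for a free worker
    this.idle = [];    // workers with nothing to do
    this.workers = [];
    for (let i = 0; i < size; i++) {
      const worker = new Worker(workerSource, { eval: true });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Enqueue a JSON string; resolves with the transformed JSON string.
  run(json) {
    return new Promise((resolve, reject) => {
      this.queue.push({ json, resolve, reject });
      this.dispatch();
    });
  }

  // Hand queued jobs to idle workers, one job per worker at a time.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const job = this.queue.shift();
      const onMessage = (result) => {
        worker.off('error', onError);
        this.idle.push(worker);
        job.resolve(result);
        this.dispatch();
      };
      const onError = (err) => {
        // Simplification: a crashed worker is not replaced in this sketch.
        worker.off('message', onMessage);
        job.reject(err);
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(job.json);
    }
  }

  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}

// Three jobs share two workers; the third waits in the queue.
const pool = new WorkerPool(2);
Promise.all([pool.run('{"a":1}'), pool.run('{"b":2}'), pool.run('{"c":3}')])
  .then((results) => console.log(results))
  .finally(() => pool.close());
```

The pool size of 2 is arbitrary here; in practice you would size it from the factors listed above, and the request handlers on the main thread only ever await pool.run.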

    The people who say Node can't handle this don't know how to architect solutions.

    Node.js is exactly what its name says: it is a node, and should be treated as such.

    In your example, your node instance connects to an external API and grabs JSON to process and send back.

    That is:
    1. GET // server.com/getJSON
    2. Process the JSON
    3. POST // server.com/postJSON

    So what do you do? Ask yourself whether the latency of each individual request is an issue. If it is, then Node isn't the solution. However, if you are more interested in raw throughput, then instead of 1 request done in 4 seconds, you are interested in 200 requests finishing in 10 seconds, with each individual one taking about the full 10 seconds.

    Measure how long your JSON takes to massage. If it is less than 1 second, just run 4 node instances instead of 1.

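    However, if it's more complex than that, break the JSON into segments and use asynchronous callbacks to process each segment:

    // setImmediate rather than process.nextTick: since Node 0.10,
    // nextTick callbacks run before pending I/O and would not yield.
    setImmediate(function () {
        doProcess(segment1);
        setImmediate(function () {
            doProcess(segment2);
        });
    });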

    Each doProcess call schedules the next one, and between calls Node.js will trade time between requests.
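A slightly more general version of that chaining, as a minimal sketch: processInChunks is a hypothetical helper (not part of Node) that works through an array in slices and uses setImmediate to hand control back to the event loop between slices.

```javascript
// Process `items` in slices of `chunkSize`, yielding to the event loop
// between slices so callbacks belonging to other requests can run.
function processInChunks(items, chunkSize, transform, done) {
  const results = [];
  let index = 0;

  function nextChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      results.push(transform(items[index]));
    }
    if (index < items.length) {
      setImmediate(nextChunk); // pending I/O and timers get a turn first
    } else {
      done(results);
    }
  }

  nextChunk();
}

// Two "requests" processed concurrently; their chunks interleave.
processInChunks([1, 2, 3, 4], 2, (n) => n * 10, (out) => console.log('first', out));
processInChunks([5, 6, 7, 8], 2, (n) => n * 10, (out) => console.log('second', out));
```

Smaller chunks mean fairer sharing but more scheduling overhead per request, so the chunk size is something to tune against your own data.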

    Now take that solution and scale it to 4 node instances per server and 2-5 servers, and suddenly you have an extremely scalable and cost-effective solution.
