How should I spawn threads for parallel computation?

Today, I got into multi-threading. Since it's a new concept, I thought I could begin to learn by translating a simple iteration to a parallelized one. But, I think I got stuck before I even began.

Initially, my loop looked something like this:

let stuff: Vec<u8> = items.into_iter().map(|item| {
    some_item_worker(&item)
}).collect();

I had put a reasonably large amount of stuff into items and it took about 0.05 seconds to finish the computation. So, I was really excited to see the time reduction once I successfully implemented multi-threading!

When I used threads, I got into trouble, probably due to my bad reasoning.

use std::thread;

let threads: Vec<_> = items.into_iter().map(|item| {
    thread::spawn(move || {
        some_item_worker(&item)
    })
}).collect(); // yeah, this is followed by another iter() that unwraps the values

I have a quad-core CPU, which means that I can run only up to 4 threads concurrently. I guessed that it worked this way: once the iterator starts, threads are spawned. Whenever a thread ends, another thread begins, so that at any given time, 4 threads run concurrently.

The result was that it took (after some re-runs) ~0.2 seconds to finish the same computation. Clearly, there's no parallel computing going on here. I don't know why the time increased by 4 times, but I'm sure that I've misunderstood something.

Since this isn't the right way, how should I go about modifying the program so that the threads execute concurrently?

EDIT:

I'm sorry, I was wrong about that ~0.2 seconds. I woke up and tried it again, when I noticed that the usual iteration ran for 2 seconds. It turned out that some process had been consuming the memory wildly. When I rebooted my system and tried the threaded iteration again, it ran for about 0.07 seconds. Here are some timings for each run.

Actual iteration (first one):

0.0553760528564 seconds
0.0539519786835 seconds
0.0564560890198 seconds

Threaded one:

0.0734670162201 seconds
0.0727820396423 seconds
0.0719120502472 seconds

I agree that the threads are indeed running concurrently, but it seems to consume another 20 ms to finish the job. My actual goal was to utilize my processor to run threads parallel and finish the job soon. Is this gonna be complicated? What should I do to make those threads run in parallel, not concurrent?


I have a quad-core CPU, which means that I can run only up to 4 threads concurrently.

Only 4 may be running concurrently, but you can certainly create more than 4...

whenever a thread ends, another thread begins, so that at any given time, 4 threads run concurrently (it was just a guess).

Whenever you have a guess, you should create an experiment to figure out if your guess is correct. Here's one:

use std::{iter, thread, time::Duration};

fn main() {
    let items: Vec<_> = iter::repeat(0).take(500).collect();

    let threads: Vec<_> = items
        .into_iter()
        .map(|_| {
            thread::spawn(move || {
                println!("Started!");
                thread::sleep(Duration::from_millis(500));
                println!("Finished!");
            })
        })
        .collect();

    for handle in threads {
        handle.join().unwrap()
    }
}

If you run this, you will see that "Started!" is printed out 500 times, followed by 500 "Finished!"

Clearly, there's no parallel computing going on here

Unfortunately, your question isn't fleshed out enough for us to tell why your time went up. In the example I've provided, it takes a little less than 600 ms, so it's clearly not happening in serial!


Creating a thread has a cost. If the cost of the computation inside the thread is small enough, it'll be dwarfed by the cost of the threads or the inefficiencies caused by the threads.

For example, spawning 10 million threads to double 10 million u8s will probably not be worth it. Vectorizing it would probably yield better performance.

That said, you still might be able to get some improvement through parallelizing cheap tasks. But you want to use fewer threads through a thread pool w/ a small number of threads (so you have a (small) number of threads created at any given point, less CPU contention) or something more sophisticated (under the hood, the api is quite simple) like Rayon.

// Notice `.par_iter()` turns it into a `parallel iterator`
let stuff: Vec<u8> = items.par_iter().map(|item| {
    some_item_worker(&item)
}).collect();
链接地址: http://www.djcxy.com/p/35392.html

上一篇: ExecutorService和ForkJoinPool

下一篇: 我应该如何为并行计算生成线程?