What is the advantage of a parallel pipeline compared with Task Parallelism?

2018-06-17 18:16:32

I often read about the pipeline pattern as a common and helpful pattern in terms of exploiting concurrency. But I wonder if there is any advantage of the parallel pipeline pattern compared with the Task Parallel Pattern.

Suppose we have three stages in a pipeline: A, B, C. When data needs to be processed, A takes it, processes it, and hands it over to B. When the next data chunk arrives, the same happens, so A and B work concurrently.

So different stages in the pipeline can be executed in parallel, but when we use three pipelines working in parallel (as in the Task Parallelism Pattern), we get exactly the same picture. When two data chunks arrive one after another, the first chunk is taken by Pipeline 1, the next chunk is taken by Pipeline 2, and both chunks are processed concurrently.

Furthermore, I can easily imagine a lot of problems with a parallel pipeline: the buffer between stages could block (or overflow), or one stage could be much slower than the others, so that every stage before it has to wait, etc.

These problems do not exist in the Task Parallelism Pattern. Additionally, this pattern is more flexible when chunks arrive faster than the first stage of the pipeline can process them (or when they can be fetched concurrently).

So why should I ever use the parallel pipeline pattern?

Thanks in advance for any ideas!

If you have a pipeline A=>B=>C and no further restrictions on it, that's indeed useless. You could have just used a function C(B(A(input))).

The concept becomes more useful if you allow different degrees of parallelism at the pipeline stages. Maybe step B accesses an SSD and you want at most 4 concurrent accesses. You could achieve the same thing with a semaphore.
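As a rough sketch of the semaphore alternative, here is what it could look like in Python. The stage functions `stage_a`, `stage_b`, `stage_c`, the `time.sleep` call standing in for the SSD access, and the pool size of 16 are all made up for illustration; only the limit of 4 comes from the example above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_B = 4  # at most 4 concurrent accesses in step B
b_slots = threading.Semaphore(MAX_B)

# Bookkeeping only, to observe how many workers are inside B at once.
_lock = threading.Lock()
_in_b = 0
peak_in_b = 0

def stage_a(x):
    return x + 1

def stage_b(x):
    global _in_b, peak_in_b
    with b_slots:  # blocks while MAX_B workers are already inside B
        with _lock:
            _in_b += 1
            peak_in_b = max(peak_in_b, _in_b)
        time.sleep(0.01)  # hypothetical stand-in for the SSD access
        with _lock:
            _in_b -= 1
        return x * 2

def stage_c(x):
    return x - 3

def process(x):
    # Plain task parallelism: each item runs C(B(A(x))) on one worker.
    return stage_c(stage_b(stage_a(x)))

def run(items, workers=16):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, items))

results = run(range(20))
```

Each item still runs through `process` as one task; the semaphore only throttles the part of the task that touches the limited resource.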

If A, B and C are each limited to a degree of parallelism of 1, the pipeline also has value: in the pipeline model all 3 nodes can execute concurrently. Using "three pipelines", as you put it, is impossible because of the assumed parallelism limit (or you'd need 3 locks, which is equivalent to the pipeline solution).

Sometimes you want buffering between the nodes. Maybe A occasionally emits large bursts that B works through over time. Buffering helps keep A working instead of stalling.
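A minimal sketch of that setup in Python, assuming one thread per stage and a queue between neighbours. The helper names (`stage`, `run_pipeline`) and the stage functions `a`, `b`, `c` are hypothetical. Note that in CPython, threads only overlap for I/O-bound or GIL-releasing work, so treat this as an illustration of the structure rather than a speed-up for CPU-bound code.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def stage(func, inbox, outbox):
    """Apply func to every item from inbox, one at a time, and forward it."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the end-of-stream marker downstream
            return
        outbox.put(func(item))

def run_pipeline(items, funcs):
    # One queue in front of each stage, plus one for the final output.
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

def a(x): return x + 1
def b(x): return x * 2
def c(x): return x - 3

results = run_pipeline(range(5), [a, b, c])
```

Because each stage has exactly one worker, no stage function is ever entered by two threads at once, yet A can already work on the next item while B and C handle earlier ones. Output order matches input order for the same reason.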
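To make the effect of the buffer size concrete, here is a small sketch using Python's `queue.Queue` as the buffer between A and B. The function `burst_accepted` is hypothetical, and it assumes B has not consumed anything yet while A emits its burst.

```python
import queue

def burst_accepted(buffer_size, burst):
    """Return how many items of a burst A can hand off without waiting,
    assuming B has not consumed anything yet. buffer_size must be > 0
    (queue.Queue treats maxsize <= 0 as unbounded)."""
    buf = queue.Queue(maxsize=buffer_size)
    accepted = 0
    for item in range(burst):
        try:
            buf.put_nowait(item)
        except queue.Full:
            break  # at this point A would have to block until B catches up
        accepted += 1
    return accepted

small = burst_accepted(buffer_size=2, burst=10)
large = burst_accepted(buffer_size=16, burst=10)
```

With the small buffer A stalls after two items; with the larger one the whole burst is absorbed and A can keep going.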

Sometimes, it's not a pipeline but a data flow network that branches in and out (possibly joins).

All in all I very rarely find a use case for dataflow networks. Often, it's simpler to just use data parallelism and use appropriate locks and semaphores. But this might be because of the domains I typically work in. YMMV.

Pipeline and Task Parallelism are definitely two different concepts.

Pipeline:

It implements the Producer-Consumer pattern. ProcessA gets some data, processes it, and passes it on to the next stage (ProcessB). B can't do anything with an item until A has finished processing it. The same holds for B and C, and so on. There are dependencies among the processes.

Ex: Refer this

Task Parallelism:

There are simply no dependencies between the tasks.

Ex: parallel loops

So you can't use Task Parallelism for dependent tasks.
