Multiprocessing: More processes than cpu_count
Note: I "forayed" into the land of multiprocessing 2 days ago, so my understanding is very basic.
I am writing an application for uploads to Amazon S3 buckets. In case the file size is large (100 MB), I've implemented parallel uploads using Pool from the multiprocessing module. I am using a machine with a Core i7, and cpu_count() returns 8. I was under the impression that if I do pool = Pool(processes=6), I use 6 cores: the file begins to upload in parts, and the uploads for the first 6 parts begin simultaneously.
To see what happens when processes is greater than cpu_count, I entered 20 (implying that I want to use 20 cores). To my surprise, instead of getting a block of errors, the program began to upload 20 parts simultaneously (I used a smaller chunk size to make sure there are plenty of parts). I don't understand this behavior. I have only 8 cores, so how can the program accept an input of 20? When I say processes=6, does it actually use 6 threads? That would be the only explanation for 20 being a valid input, since there can be thousands of threads. Can someone please explain this to me?
Edit: I 'borrowed' the code from here. I have changed it only slightly: instead of setting parallel_processes to 4, I ask the user how many cores to use.
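The change described in the edit (letting the user pick the pool size instead of hard-coding parallel_processes = 4) could look roughly like this. `choose_pool_size` is a hypothetical name; the only limit it enforces is the one Pool itself has, namely at least one process. Nothing rejects a value above cpu_count(), which is why 20 was accepted.

```python
import multiprocessing

DEFAULT_PROCESSES = 4  # the value the borrowed code hard-coded as parallel_processes


def choose_pool_size(answer, default=DEFAULT_PROCESSES):
    """Turn the user's typed answer into a pool size (hypothetical helper).

    An empty answer falls back to the default. Values above
    multiprocessing.cpu_count() are deliberately allowed: Pool only
    rejects sizes below 1.
    """
    answer = answer.strip()
    if not answer:
        return default
    n = int(answer)  # raises ValueError for non-numeric input
    if n < 1:
        raise ValueError("a pool needs at least one process")
    return n


if __name__ == "__main__":
    print("cpu_count():", multiprocessing.cpu_count())
    # In the real program the string would come from input().
    for typed in ("", "6", "20"):
        print(repr(typed), "->", choose_pool_size(typed))
```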
The number of processes running concurrently on your computer is not limited by the number of cores. In fact, you probably have hundreds of programs running on your computer right now, each with its own process. To make this work, the OS assigns one of your 8 processors to each process or thread only temporarily: at some point it may be stopped and another process will take its place. So Pool(processes=20) really does start 20 worker processes (not threads); the OS simply shares your 8 cores among them. If you want to find out more, see the question "What is the difference between concurrent programming and parallel programming?"
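The question of processes versus threads can be checked directly. A sketch, assuming CPython's standard multiprocessing module (`report_pid` and `inspect_pool` are hypothetical names): it starts a pool larger than cpu_count() and looks at what the OS created. Each worker reports its own process ID, which differs from the parent's; threads would all report the parent's PID.

```python
import multiprocessing
import os
import time


def report_pid(_):
    # Sleep briefly so the tasks get spread over several workers.
    time.sleep(0.1)
    return os.getpid()


def inspect_pool(n_processes):
    """Start a Pool with n_processes workers and report what was created."""
    with multiprocessing.Pool(processes=n_processes) as pool:
        # Pool starts all of its worker processes when it is constructed,
        # so they already show up as live children of this process.
        n_children = len(multiprocessing.active_children())
        worker_pids = set(pool.map(report_pid, range(n_processes)))
    return n_children, worker_pids


if __name__ == "__main__":
    cores = multiprocessing.cpu_count()
    n_children, worker_pids = inspect_pool(cores + 12)
    print(f"cores: {cores}, worker processes: {n_children}")
    print(f"parent PID among worker PIDs? {os.getpid() in worker_pids}")
```

On an 8-core machine this should report 20 worker processes, all of which the OS time-slices onto the 8 cores, and the parent's PID should not appear among the workers.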
Edit: Assigning more processes in your uploading example may or may not make sense. Reading from disk and sending over the network are normally blocking operations in Python. A process that is waiting for its chunk of data to be read or sent can be paused so that another process can start its I/O. On the other hand, with too many processes either file I/O or network I/O will become a bottleneck, and your program will slow down because of the additional overhead of switching between processes.
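The I/O argument can be illustrated with a simulated upload. In this sketch `fake_upload` and `timed_run` are hypothetical names, and `fake_upload` just calls time.sleep, which blocks without using the CPU, much like a worker waiting on a socket. It models only the waiting, not the shared bandwidth or disk that eventually becomes the bottleneck.

```python
import multiprocessing
import time

PART_COUNT = 20
WAIT_SECONDS = 0.5  # stand-in for the time one part spends waiting on disk or network


def fake_upload(part_number):
    # Blocks without using the CPU, like a process waiting for its data to be sent.
    time.sleep(WAIT_SECONDS)
    return part_number


def timed_run(n_processes):
    """Run PART_COUNT simulated uploads on n_processes workers; return seconds taken."""
    with multiprocessing.Pool(processes=n_processes) as pool:
        start = time.perf_counter()
        pool.map(fake_upload, range(1, PART_COUNT + 1))
        return time.perf_counter() - start


if __name__ == "__main__":
    for n in (4, 20):
        print(f"{n:2d} processes: {timed_run(n):.2f} s")
```

With 20 parts of 0.5 s each, four workers need at least 2.5 s (ten seconds of waiting spread over four processes), while twenty workers can overlap all the waits, so the second run should be clearly faster regardless of the core count. With CPU-bound work there is no waiting to overlap, so processes beyond cpu_count() mostly add switching overhead.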