threading, concurrency and parallelism on a multicore processor
I've been reading about this topic and I'm a bit confused about the relationship between multithreading and parallelism. I've read this question:
"How are threads distributed between CORES in a multithreaded system? Let's say I have a program that creates 6 threads. My system has 3 CORES. In this case will the threads be distributed between the 3 CORES, or will all the threads execute on the same CORE?" - physical CPU with 3 logical cores
The answer to that question suggests that I have to tell the OS which core executes what. Is this universally true when multithreading in a language like Java or C#?
Link to question
In a scenario where I don't specify which core does what, can I achieve parallelism in a multithreaded program written in a language like Java or C#?
I've been learning some Erlang and I've reached the topic of spawning processes. When these processes are spawned, does Erlang tell the OS to assign the different processes to different cores? I saw this code from Learn You Some Erlang; it spawns 10 processes and each prints a number. But why use timer:sleep? Isn't that just making the program look parallel, when really it's just making one thing stop so something else can run (as in concurrency)?
4> G = fun(X) -> timer:sleep(10), io:format("~p~n", [X]) end.
#Fun<erl_eval.6.13229925>
5> [spawn(fun() -> G(X) end) || X <- lists:seq(1,10)].
I implemented this in Java and got a similar result. I created five threads, and each thread has a loop where it prints the thread name, then a number, then sleeps for 10 milliseconds.
public class ThreadTest {
    public static void main(String[] args) {
        Thread t1 = new Thread(new Thread2());
        Thread t2 = new Thread(new Thread3());
        Thread t3 = new Thread(new Thread4());
        Thread t4 = new Thread(new Thread5());
        Thread t5 = new Thread(new Thread6());
        t1.start();
        t2.start();
        t3.start();
        t4.start();
        t5.start();
    }
}
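The classes Thread2 through Thread6 are deliberately omitted below, but based on the description above (each one loops, printing the thread name and a number, then sleeping for 10 ms), one of them might look like this hypothetical sketch (the class name and loop bound are assumptions, not the original code):

```java
// Hypothetical reconstruction of one of the omitted Runnable classes.
// Each one loops, prints the current thread's name and a counter,
// then sleeps for 10 ms, mirroring the Erlang timer:sleep(10) example.
public class Thread2 implements Runnable {
    @Override
    public void run() {
        for (int i = 0; i < 10; i++) {
            System.out.println(Thread.currentThread().getName() + " " + i);
            try {
                Thread.sleep(10); // give up the CPU for ~10 ms
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```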
Looking at both programs, are they executing things concurrently or in parallel? I also realise that computers are fast, but even if I don't make the Java program sleep, it still just prints things one after another. With Erlang, however, if I remove the sleep it sometimes prints a batch of the numbers, then the list of spawned process identifiers, then continues counting; other times it prints all the numbers first and the process list last.
In reference to the question above: is Java doing things on one core concurrently (with context switching), or is it utilising more cores and doing things in parallel, just too fast to give me interleaved results? (without sleep)
Is Erlang using more cores and doing things in parallel, since it sometimes prints the process list in the middle of the counting? (without sleep)
Note: I have purposefully left out the code for the other threads; I thought it would be better to just explain what those classes do.
A conventional operating system (OS), for instance Linux, manages the execution of a number of processes (processes essentially correspond to programs).
A process initially executes on one thread, but can create additional threads as it executes. One of the main tasks of an OS is to manage the execution of all the process threads.
The behavior of concurrency or threading in a language depends on how it is implemented.
With Java, implementations of the JVM will most likely use the threading mechanism provided by the OS, i.e. POSIX threads (see this question). The scheduling behaviour of a multithreaded Java program is therefore determined by the OS. For example, see the details of the Linux scheduler.
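Because the JVM hands its threads to the OS, a plain Java program never has to pick cores itself. A minimal sketch (the class name is my own) that asks the JVM how many logical cores the OS exposes, then starts one CPU-bound thread per core and lets the OS scheduler alone decide where each one runs:

```java
public class ParallelDemo {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical cores visible to the JVM: " + cores);

        // One CPU-bound worker per core. Note that nothing here names a
        // core: the OS scheduler decides where each thread executes.
        Thread[] workers = new Thread[cores];
        for (int i = 0; i < cores; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                long sum = 0;
                for (long n = 0; n < 50_000_000L; n++) sum += n;
                System.out.println("worker-" + id + " done, sum=" + sum);
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join(); // wait for every worker to finish
        }
    }
}
```

On a multicore machine a system monitor will show these workers saturating several cores at once, even though the code never mentions a core.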
With Erlang, the situation is slightly different, and I think this is the source of the confusion. Because Erlang advocates the use of very large numbers of processes (i.e. lightweight threads) that communicate by message passing, the threading implementation must be extremely cheap. For this reason POSIX threads are not suitable, and the Erlang virtual machine has its own threading mechanism: it allocates one OS thread per core, with a fixed affinity, and runs an Erlang scheduler on each of them.
The principles of how an OS schedules threads on a multicore computer aren't so different from how it is done on a single-core machine. In effect, each core runs a copy of the OS's scheduler, and these schedulers talk to each other to decide how best to allocate threads to cores. If one of those schedulers finds that it has no ready threads to run on its core, it will take over a ready thread that is waiting on another core's run queue.
The thing is that those cores run asynchronously and largely independently of each other, and there really is no guarantee of execution order. So the order in which strings are output by different threads is non-deterministic and can easily be different every time you run the program.
All this works because multicore computers of today implement Symmetric Multiprocessing (SMP). All cores can see all memory, so there is no fundamental problem in choosing which core is the best one to run a thread.
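That shared view of memory is what lets threads on different cores cooperate through one data structure. A minimal sketch (class name and counts are my own) in which several threads, possibly scheduled on different cores, atomically increment a single shared counter:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedCounter {
    // Start nThreads threads that each atomically increment the same
    // counter perThread times; under SMP every core sees the same memory,
    // so the final value is exact regardless of which cores ran them.
    static int run(int nThreads, int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 100_000)); // prints 400000
    }
}
```

Without the atomic increment (e.g. a plain `int` and `counter++`), the same program would usually lose updates, which is the flip side of cores sharing memory.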
However, the SMP we get from Xeons and the like is 'fake': they are partly NUMA machines, with SMP synthesised over QPI. AMD processors are entirely NUMA, with SMP synthesised over HyperTransport. QPI and HyperTransport are very good, but there is an overhead that an OS shouldn't ignore. In short, it takes longer for a thread on one core to access memory that is electronically connected to a different core. So a good OS will attempt to run threads on the cores closest to the memory they're accessing, to get good performance. In Xeon land this is very complicated, because the machine's default memory map is interleaved between CPUs (an attempt to make the faked SMP perform better).
So as soon as you start delving into core affinity and binding threads to particular cores (something that OSes let you do if you really want to), your program has to fully understand the microelectronic architecture of whatever hardware it happens to find itself running on. In general that's not wholly realistic, and in almost all cases it's probably better to just let the OS sort things out for you.