How are Haskell programs compiled and executed internally?

2018-05-31 15:24:33

I'm having trouble understanding how Haskell (GHC) compiles programs, and how those programs are run.

GHC is the canonical example of a nontrivial program written in Haskell. However, parts of GHC seem not to be written in Haskell, namely the runtime environment (in C/C--). Why is that? Performance reasons? (I am aware of this site and its friends, but cannot make much sense of them.)

Speaking of the runtime environment: Why does a compiled language need one? Shouldn't the compiled program be machine code and nothing else? From what I understand, a runtime environment is somewhat similar to a virtual machine or a bytecode interpreter that deals with some form of meta code and does the actual calculations based on it. So: what does the GHC runtime do exactly, and why is it necessary in the first place?

Concerning the FFI: How are C calls handled? Initially, I thought using the FFI generates a single executable where Haskell and C are compiled together. However, I read multiple times that GHC programs kind of do a call out of the program to the C function. This is especially relevant to understand the problem the FFI has with parallel programming. So: how are FFI functions different from normal Haskell functions?

To compile and execute a programming language on stock hardware you need a number of things:

a compiler to translate your source language into assembly code executable by the native host

a support library (aka runtime) for primitive language services, such as memory management, IO and thread management, which must be built on top of lower-level system services.

C, Java, and GHC Haskell are examples of such systems. In the case of GHC, the entire architecture is described here. The pieces are also described individually, and in detail.

The compiler (written in Haskell) translates Haskell to C, assembly, LLVM bitcode and other formats. The strategy it uses is described best here: Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine.

The runtime services (aka "the GHC runtime") are described over several papers:

The multicore garbage collector with thread-local heaps

How the garbage collector services work

How multithreading works in GHC

How laziness is implemented

How the runtime calls code in foreign languages
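To make "how laziness is implemented" a little more concrete, here is a toy model of an updatable thunk. This is a hypothetical sketch (the names `Thunk`, `delay`, `force` and `thunkDemo` are made up for illustration), not GHC's actual representation. It only shows the core idea: a suspended computation that, the first time it is forced, is overwritten with its value so that later forces reuse the result instead of recomputing it.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Toy model of an updatable thunk: either a suspended computation
-- or the value it already produced.
data ThunkState a = Unevaluated (IO a) | Evaluated a

newtype Thunk a = Thunk (IORef (ThunkState a))

-- Suspend a computation without running it.
delay :: IO a -> IO (Thunk a)
delay action = Thunk <$> newIORef (Unevaluated action)

-- Force the thunk: run the computation the first time, then overwrite
-- the thunk with its value so that later forces just read it.
force :: Thunk a -> IO a
force (Thunk ref) = do
  st <- readIORef ref
  case st of
    Evaluated v -> pure v
    Unevaluated action -> do
      v <- action
      writeIORef ref (Evaluated v)
      pure v

-- Force the same thunk twice and count how often its body actually ran.
thunkDemo :: IO (Int, Int, Int)
thunkDemo = do
  counter <- newIORef 0
  t <- delay $ do
    modifyIORef' counter (+ 1)
    pure (sum [1 .. 100])
  a <- force t
  b <- force t
  runs <- readIORef counter
  pure (a, b, runs)

main :: IO ()
main = thunkDemo >>= print -- prints (5050,5050,1)
```

The real runtime does this at the machine level on heap objects and also has to cope with several threads forcing the same thunk at once, which this single-threaded model ignores.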

I can offer some precision on what a runtime is.

A virtual machine is "a" kind of runtime, but not the only one. A runtime system is simply the environment (and the set of services) that your program can assume will be present during its execution. Even very low-level languages like C and C++ have runtime systems (think about malloc: someone or something is doing the allocation for you; or even division-by-zero checks).

In general, higher-level languages have a richer runtime (meaning the runtime offers more services to the executing program); these range from memory management (e.g. garbage collection) to reflection/introspection infrastructure (think Ruby, etc.) to array bounds checking, but pretty much all languages have some kind of runtime system (if only the operating system).

1: Why is the RTS not written in Haskell?

Because it does low-level stuff that cannot be expressed in Haskell. Much like the Linux kernel is a system for running C programs, and yet parts of the Linux kernel are written in assembly, not C.

2: Why does a compiled program need a runtime environment? From what I understand, that's something like the Java bytecode interpreter.

GHCi uses something almost exactly like the Java bytecode interpreter. Compiled GHC programs do not; the compiled program is raw machine code.

Rather, the Haskell RTS is more like a kind of mini-OS. It does memory management, it does thread scheduling, it does certain aspects of exception handling, and it does transaction handling (for software transactional memory). Every Haskell program runs under this mini-OS.
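As an illustration of the "transaction handling" part, here is a small sketch using software transactional memory. It imports the STM primitives from `GHC.Conc` in `base` so that it needs no extra package. The more usual interface is `Control.Concurrent.STM` from the `stm` package. The bank-transfer scenario and the names `transfer` and `concurrentTransfers` are purely illustrative. The RTS is what runs each `atomically` block as a transaction and re-runs or blocks it (via `retry`) when needed.

```haskell
import Control.Concurrent (forkIO)
import Control.Monad (forM_)
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, readTVarIO, retry, writeTVar)

-- Move `amount` between two accounts as one atomic transaction.
-- If the source balance is too low, `retry` blocks this transaction
-- until another thread changes one of the TVars it has read.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  bal <- readTVar from
  if bal < amount
    then retry
    else do
      writeTVar from (bal - amount)
      toBal <- readTVar to
      writeTVar to (toBal + amount)

-- Start `n` threads that each transfer 1 unit from account a to account b,
-- wait for all of them, and return the final balances.
concurrentTransfers :: Int -> IO (Int, Int)
concurrentTransfers n = do
  a <- newTVarIO n
  b <- newTVarIO 0
  done <- newTVarIO (0 :: Int)
  forM_ [1 .. n] $ \_ -> forkIO $ atomically $ do
    transfer a b 1
    c <- readTVar done
    writeTVar done (c + 1)
  -- Block until every worker has committed its transaction.
  atomically $ do
    c <- readTVar done
    if c < n then retry else pure ()
  (,) <$> readTVarIO a <*> readTVarIO b

main :: IO ()
main = concurrentTransfers 1000 >>= print -- prints (0,1000)
```

No locks appear in the program. The runtime detects conflicting transactions and re-executes them, so no update to either balance is lost.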

(It's a bit like a C program: even though it is compiled to raw machine code, you still can't run it without an operating system such as Windows or Linux.)

For example, every time a Haskell program runs out of memory (more precisely, fills the area the RTS has set aside for new allocations), the Haskell program stops running and the garbage collector starts running. The garbage collector tries to free up some memory, and once it has, the Haskell program starts running again.
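You can observe the garbage collector from inside a program. The sketch below uses `performGC` from `System.Mem` and the `GHC.Stats` interface from `base`. The function name `gcReport` and the allocation workload are made up for illustration. GHC only collects these statistics when the program runs with `+RTS -T`, so the sketch returns `Nothing` otherwise.

```haskell
import Data.List (foldl')
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- Allocate a lot of short-lived data, ask the RTS for a collection,
-- then report (number of GCs so far, maximum live bytes) if the RTS
-- is collecting statistics (i.e. the program was run with +RTS -T).
gcReport :: IO (Maybe (Integer, Integer))
gcReport = do
  let total = foldl' (+) 0 (map toInteger [1 .. 1000000 :: Int])
  total `seq` performGC
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      s <- getRTSStats
      pure (Just (toInteger (gcs s), toInteger (max_live_bytes s)))
    else pure Nothing

main :: IO ()
main = do
  r <- gcReport
  case r of
    Nothing -> putStrLn "run with +RTS -T to enable GC statistics"
    Just (n, live) ->
      putStrLn ("GCs so far: " ++ show n ++ ", max live bytes: " ++ show live)
```

None of this code manages memory itself. Both the collections and the bookkeeping behind `getRTSStats` are done by the RTS that is linked into every compiled program.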

Every compiled Haskell program has a copy of this garbage collector program, which is just one part of the Haskell RTS. Similarly, multiple Haskell threads can run inside one OS thread, so the RTS has a thread scheduler inside it. I could go on...
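The thread scheduler is easy to exercise. `forkIO` creates a lightweight Haskell thread that the RTS schedules itself, so creating thousands of them is cheap. With the default (non-threaded) RTS they all share one OS thread, and `rtsSupportsBoundThreads` reports whether the program was linked with `-threaded`. The function name `sumOfSquaresConcurrently` is just for this example.

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, rtsSupportsBoundThreads, takeMVar)
import Control.Monad (forM)

-- Spawn n lightweight Haskell threads; each computes one square and
-- hands the result back through its own MVar.
sumOfSquaresConcurrently :: Int -> IO Int
sumOfSquaresConcurrently n = do
  vars <- forM [1 .. n] $ \i -> do
    v <- newEmptyMVar
    _ <- forkIO (putMVar v $! i * i)
    pure v
  results <- mapM takeMVar vars -- blocks until each thread has reported
  pure (sum results)

main :: IO ()
main = do
  putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
  total <- sumOfSquaresConcurrently 10000
  print total
```

The 10000 threads here are RTS data structures, not OS threads. Blocking on an empty `MVar` simply makes the scheduler switch to another Haskell thread.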

3: How is FFI handled? I thought the stuff was all compiled together.

It is all compiled [or rather, linked] together. If you write a C program, one C function can call another C function. When Haskell calls a C function, it's pretty much like any other function calling that C function. Depending on what the function call does, there are a few things that happen on the Haskell side though, which may add some overhead.
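A minimal FFI binding looks like this. It assumes the C math library's `sin` is linked into the program, which is normally the case with GHC because the RTS itself depends on libm. The names `c_sin` and `haskellSin` are illustrative. The C function ends up in the same executable, and the call is an ordinary machine-level call with a little marshalling (`CDouble` to and from `Double`) around it.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble (..))

-- Bind the C function `double sin(double)` from math.h.
-- `unsafe` makes the call as cheap as possible (see the note below).
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Convert between Haskell's Double and C's double around the call.
haskellSin :: Double -> Double
haskellSin = realToFrac . c_sin . realToFrac

main :: IO ()
main = print (haskellSin (pi / 2))
```

The `safe`/`unsafe` annotation is where the extra work on the Haskell side comes from. An `unsafe` call is essentially a plain C call, but while it runs no other Haskell thread can use that capability and garbage collection has to wait. A `safe` call (the default) lets the RTS hand the capability to other Haskell threads for the duration of the call (with the `-threaded` RTS this is what keeps long-running C calls from stalling the rest of the program), at the cost of some bookkeeping per call.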
