What is the fastest virtual machine design for x86?

I will implement a virtual machine in x86 and I wonder what kind of design would yield the best results. What should I concentrate on to squeeze out the most performance? I plan to implement the whole virtual machine in x86 assembly.

I don't have many instructions, and I can choose their form. The instructions map directly onto Smalltalk's block syntax. Here is the instruction design I was thinking of:

^ ...       # return
^null     # return nothing
object    # address to object
... selector: ... # message pass (in this case arity:1 selector: #selector:)
var := ... # set
var # get

The sort of VM dispatch loop I was thinking about:

mov eax, [esi]            ; fetch the next instruction word
add esi, 2                ; each instruction is 2 bytes: operand byte, opcode byte
mov ecx, eax
and eax, 0xff             ; eax = operand
and ecx, 0xff00           ; ecx = opcode * 256
shr ecx, 8                ; ecx = opcode
jmp [operations + ecx*4]  ; index the dword jump table

align 4
operations:
    dd op_retnull
    dd op_ret
    ; and so on...

op_retnull:               ; execution lands here for retnull
    ; ... retnull action
op_ret:                   ; 'ret' is a mnemonic, so the labels get a prefix
    ; ... ret action
; etc.

Don't start asking why I need yet another virtual machine implementation. Interpretive routines aren't stock components you just pick up whenever you need them. Most virtual machines proposed elsewhere are weighted towards portability at the cost of performance. My goal is not portability; my goal is performance.

The reason this interpreter is needed at all is that Smalltalk blocks do not all get interpreted the same way:

A := B subclass: [
    def a:x [^ x*x]
    clmet b [...]
    def c [...]
    def d [...]
]

[ 2 < x ] whileTrue: [...]

(i isNeat) ifTrue: [...] ifFalse: [...]

List fromBlock: [
    "carrots"
    "apples"
    "oranges" toUpper
]

I need the real benefit of interpretive routines: the choice of context in which to read the program. Of course, a good compiler should compile the obvious cases most of the time, like 'ifTrue:ifFalse:', 'whileTrue:', or the list example. The need for an interpreter doesn't just disappear, though, because you may always hit a case where you can't be sure the block gets the treatment you expect.


I see there is some confusion about portability here, so I feel obliged to clarify matters a bit. These are my humble opinions, so you are, of course, free to object to them.

I assume you have come across http://www.complang.tuwien.ac.at/forth/threading/ if you are seriously considering writing a VM, so I won't dwell on the techniques described there.
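
For readers who haven't seen that page: the core idea is replacing one central switch with per-opcode dispatch at the end of every handler. A minimal sketch of token-threaded dispatch, written in C with the GCC/Clang computed-goto extension; the opcodes and bytecode layout are invented purely for illustration:

/* Token-threaded dispatch sketch; requires GCC or Clang (computed goto). */
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const unsigned char *ip) {
    static void *ops[] = { &&op_push, &&op_add, &&op_print, &&op_halt };
    int stack[64], sp = 0;

#define NEXT() goto *ops[*ip++]      /* fetch next opcode, jump to its handler */
    NEXT();

op_push:  stack[sp++] = *ip++;              NEXT();
op_add:   sp--; stack[sp - 1] += stack[sp]; NEXT();
op_print: printf("%d\n", stack[sp - 1]);    NEXT();
op_halt:  return;
#undef NEXT
}

int main(void) {
    const unsigned char code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(code);                       /* prints 5 */
    return 0;
}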

As already mentioned, targeting a VM has some advantages, such as reduced code size, reduced compiler complexity (which often translates to faster compilation), and portability (note that the point of a VM is portability of the language, so it doesn't matter if the VM itself is not portable).

Considering the dynamic nature of your example, your VM will resemble a JIT compiler more than the more popular designs do. So, although S.Lott missed the point in this case, his mention of Forth is very much on the spot. If I were to design a VM for a very dynamic language, I would separate interpretation into two stages (a toy sketch of both stages follows the list below):

  • A producer stage that consults an AST stream on demand and transforms it into a more meaningful form (for example, taking a block and deciding whether it should be executed right away or stored somewhere for later execution), possibly introducing new kinds of tokens. Essentially, this is where you recover context-sensitive information that may be lost in parsing.

  • A consumer stage that fetches the stream generated by stage 1 and executes it blindly like any other machine. If you make it Forth-like, you can just push a stored stream and be done with it instead of jumping the instruction pointer around.

  • As you say, just mimicking how the damn processor works in another form doesn't give you any of the dynamism (or any other feature worth a damn, like security) that you require. Otherwise, you would be writing a compiler.

    Of course, you can add arbitrarily complex optimizations in stage 1.
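
Here is the toy C sketch of the two stages mentioned above. All names (Node, Instr, OP_*) and the decision of when to inline a block are invented for illustration; the point is only that context-sensitive choices live in the producer while the consumer just fetches and executes:

#include <stdio.h>

typedef enum { N_LITERAL, N_BLOCK, N_SEND } NodeKind;
typedef struct Node {
    NodeKind kind;
    int value;              /* literal value, block id, or selector id */
    struct Node *body;      /* block body, if kind == N_BLOCK */
    struct Node *next;
} Node;

typedef enum { OP_PUSH, OP_PUSH_BLOCK, OP_SEND, OP_HALT } Op;
typedef struct { Op op; int arg; } Instr;

/* Stage 1: producer. Decides, from context, whether a block is executed
   right away (inlined) or stored for later execution. */
static int produce(Node *n, Instr *out, int i, int inline_blocks) {
    for (; n; n = n->next) {
        switch (n->kind) {
        case N_LITERAL: out[i++] = (Instr){ OP_PUSH, n->value }; break;
        case N_BLOCK:
            if (inline_blocks)                      /* execute right away */
                i = produce(n->body, out, i, inline_blocks);
            else                                    /* defer: push a block handle */
                out[i++] = (Instr){ OP_PUSH_BLOCK, n->value };
            break;
        case N_SEND:    out[i++] = (Instr){ OP_SEND, n->value }; break;
        }
    }
    return i;
}

/* Stage 2: consumer. No context left; it executes the flat stream blindly. */
static void consume(const Instr *ip) {
    int stack[64], sp = 0;
    for (;; ip++) {
        switch (ip->op) {
        case OP_PUSH:       stack[sp++] = ip->arg; break;
        case OP_PUSH_BLOCK: stack[sp++] = ~ip->arg; break;    /* toy block handle */
        case OP_SEND:       printf("send #%d (top of stack: %d)\n",
                                   ip->arg, sp ? stack[sp - 1] : 0); break;
        case OP_HALT:       return;
        }
    }
}

int main(void) {
    /* a deferred block containing a literal, followed by a message send */
    Node lit  = { N_LITERAL, 7,  NULL, NULL };
    Node send = { N_SEND,    42, NULL, NULL };
    Node blk  = { N_BLOCK,   1,  &lit, &send };

    Instr code[16];
    int n = produce(&blk, code, 0, /* inline_blocks = */ 0);
    code[n] = (Instr){ OP_HALT, 0 };
    consume(code);
    return 0;
}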


If you want something really fast, try using LLVM. It can generate native code for most processors from a high-level program description. You can either go with your own assembly language or generate the LLVM structures directly, skipping the assembly phase, depending on what you find most convenient.

I'm not sure if it's the best for your problem, but it's definitely what I would use if I had to do performance-critical execution of code that can't be compiled with the rest of the program.
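
As a rough illustration of "generating the LLVM structure", here is a minimal sketch against the LLVM-C core API (llvm-c/Core.h) that builds IR for something like the block [:x | x * x] and dumps it. Producing actual native code would additionally go through LLVM's JIT or a target machine; this only shows the IR-building part, and the module and function names are arbitrary:

#include <llvm-c/Core.h>

int main(void) {
    LLVMModuleRef mod = LLVMModuleCreateWithName("block_sq");
    LLVMTypeRef i32 = LLVMInt32Type();
    LLVMTypeRef params[] = { i32 };
    LLVMTypeRef fnty = LLVMFunctionType(i32, params, 1, 0);
    LLVMValueRef fn = LLVMAddFunction(mod, "square", fnty);

    LLVMBuilderRef b = LLVMCreateBuilder();
    LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(fn, "entry"));
    LLVMValueRef x = LLVMGetParam(fn, 0);
    LLVMBuildRet(b, LLVMBuildMul(b, x, x, "sq"));   /* return x * x */

    LLVMDumpModule(mod);        /* print the generated IR */
    LLVMDisposeBuilder(b);
    LLVMDisposeModule(mod);
    return 0;
}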


The point of an interpreter is portability, most of the time. The fastest approach I can think of is to generate x86 code in memory directly, just like JIT compilers do, but then, of course, you don't have an interpreter anymore. You have a compiler.
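
To make the "generate x86 code in memory" idea concrete, here is a minimal sketch (POSIX, x86/x86-64): the bytes below encode `mov eax, 42 ; ret`, which returns 42 in eax under the usual calling conventions. A real JIT would of course emit code derived from the program being run, and on hardened systems you would map the page read/write first and flip it to read/execute after copying:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00,   /* mov eax, 42 */
                             0xC3 };                          /* ret         */

    void *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;
    memcpy(mem, code, sizeof code);

    int (*fn)(void) = (int (*)(void))mem;    /* treat the buffer as a function */
    printf("%d\n", fn());                    /* prints 42 */

    munmap(mem, 4096);
    return 0;
}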

However, I'm not sure writing the interpreter in assembler will give you the best performance (unless you're an assembler guru and your project is very limited in scope). Using a higher-level language can help you focus on better algorithms for, say, symbol lookup and register allocation strategies.
