decoded threading in instruction set translator / emulator
I have written a full system simulator / emulator of a RISC-style processor (and all the peripherals). Currently, it uses an indirect threaded emulation loop, i.e. all the instruction footers look something like this:
pc += 4;
inst = loadWord(mem, pc);
instp = decodeTable[opcode(inst)];
goto *instp;
This performs quite well: I get around 70-80 MIPS on a modern machine when booting Linux.
However, I am looking at moving to a direct predecoded threaded interpreter model, that is, something that looks as follows:
tPC += 1;
instp = predecodeMem[tPC].operation;
goto *instp;
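To make the predecoded footer concrete, here is a minimal runnable sketch of the idea, not the actual emulator. The toy ISA (opcode in the top 8 bits, immediate in the low 24), the names `PredecodedInst`, `run_predecoded`, and `NEXT`, and the sentinel halt slot are all assumptions made for illustration. It relies on the GNU C labels-as-values extension (GCC/Clang), the same one `goto *instp` already needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy ISA: opcode in bits 31..24, immediate in bits 23..0. */
enum { OP_HALT = 0, OP_ADDI = 1, OP_SUBI = 2, NUM_OPS = 3 };

/* One shadow-memory slot per guest instruction word. */
typedef struct {
    void    *operation;  /* address of the handler label */
    uint32_t imm;        /* operand extracted once, at predecode time */
} PredecodedInst;

/* The footer from the question: no load, no decode, just an indexed jump. */
#define NEXT() do { tPC += 1; instp = slots[tPC].operation; goto *instp; } while (0)

/* Predecodes `count` words from `code` into `slots`, then executes them.
 * `slots` must have room for count + 1 entries; the extra entry is a
 * sentinel halt so that execution cannot run off the end.
 * Returns the final accumulator value. */
int64_t run_predecoded(const uint32_t *code, size_t count, PredecodedInst *slots)
{
    static void *const handlers[NUM_OPS] = { &&do_halt, &&do_addi, &&do_subi };
    int64_t acc = 0;
    size_t tPC = 0;
    void *instp;

    /* Predecode pass: unknown opcodes are mapped to halt in this sketch. */
    for (size_t i = 0; i < count; i++) {
        uint32_t op = code[i] >> 24;
        slots[i].operation = handlers[op < NUM_OPS ? op : OP_HALT];
        slots[i].imm = code[i] & 0x00FFFFFFu;
    }
    slots[count].operation = handlers[OP_HALT];
    slots[count].imm = 0;

    instp = slots[tPC].operation;
    goto *instp;

do_addi:
    acc += slots[tPC].imm;
    NEXT();
do_subi:
    acc -= slots[tPC].imm;
    NEXT();
do_halt:
    return acc;
}
```

A real emulator would predecode per page rather than per program, but the dispatch path has the same shape as the three-line footer above.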
The pre-decoding is not much of a problem in itself; it is just the replacement of the existing decoder and the addition of some shadow memory. My main problem is related to self-modifying code (or semi-self-modifying code).
In the simple case, we can allocate the predecode pages lazily, the first time a page that has not been executed before is visited. The software TLB is then purged of all entries, to ensure that the next write to that page goes through the memory simulation system. Writes to executable pages therefore have to update the decode info as well, which costs performance, but since such writes are rare this should not be a problem (and it can be sped up by adding sub-page executable bits computed at runtime).
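The lazy scheme can be sketched roughly as follows. This is only an illustration under assumptions: `Machine`, `fetch_predecoded`, `store_word`, the 4 KiB page size, and the trivial `decode_word` are hypothetical names and choices, and the software TLB purge is reduced to a comment. The `redecodes` counter exists only to make the cost of writing to an already-executed page visible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT     12u
#define PAGE_SIZE      (1u << PAGE_SHIFT)
#define WORDS_PER_PAGE (PAGE_SIZE / 4u)
#define NUM_PAGES      16u

/* Stand-in for the real predecoded form of one instruction. */
typedef struct { uint8_t opcode; uint32_t imm; } Predecoded;

typedef struct {
    uint32_t      mem[NUM_PAGES * WORDS_PER_PAGE]; /* guest memory */
    Predecoded   *shadow[NUM_PAGES];  /* NULL until the page is first executed */
    unsigned long redecodes;          /* writes that had to update the shadow */
} Machine;

static Predecoded decode_word(uint32_t w)
{
    Predecoded d = { (uint8_t)(w >> 24), w & 0x00FFFFFFu };
    return d;
}

/* Instruction fetch: allocates and fills the predecode page on first use. */
const Predecoded *fetch_predecoded(Machine *m, uint32_t addr)
{
    uint32_t page = addr >> PAGE_SHIFT;
    uint32_t word = (addr & (PAGE_SIZE - 1u)) / 4u;
    assert(page < NUM_PAGES);
    if (m->shadow[page] == NULL) {
        Predecoded *p = malloc(WORDS_PER_PAGE * sizeof *p);
        if (p == NULL)
            abort();
        for (uint32_t i = 0; i < WORDS_PER_PAGE; i++)
            p[i] = decode_word(m->mem[page * WORDS_PER_PAGE + i]);
        m->shadow[page] = p;
        /* A real emulator would purge the software TLB here so that later
         * writes to this page take the slow path in store_word(). */
    }
    return &m->shadow[page][word];
}

/* Slow-path store: writes to a page with a shadow must re-decode the slot. */
void store_word(Machine *m, uint32_t addr, uint32_t value)
{
    uint32_t page  = addr >> PAGE_SHIFT;
    uint32_t index = addr / 4u;
    assert(page < NUM_PAGES);
    m->mem[index] = value;
    if (m->shadow[page] != NULL) {
        m->shadow[page][index % WORDS_PER_PAGE] = decode_word(value);
        m->redecodes++;
    }
}
```

Writes to pages that were never executed stay cheap; only pages with a shadow pay the re-decode cost on each store.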
The problem is long-term code discovery when pages are reused by the operating system running inside the emulator. For example, a memory page may be allocated by the Linux kernel and assigned as code for one process. The next time a process is created, the same page may be allocated as data, but in the scheme just described this causes problems, since the now pure data page has to go through rather slow predecoding on every byte write.
Some current ideas, but none of these I find particularly nice, i.e. they all have significant drawbacks:
I find that the literature is seriously lacking in discussion of this topic. What general methods exist for aging pages when they are no longer used, so that we can, for example, clear the execute bit and free the predecode memory associated with the page?
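To illustrate what such aging could look like, here is one possible sketch, borrowed from the second-chance (clock) idea used in OS page replacement and combined with a write counter. It is an assumption about what might work, not an established answer: `PageState`, `note_exec`, `note_write`, `sweep`, and the threshold of 64 writes are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: writes tolerated without any execution. */
enum { DEMOTE_WRITE_THRESHOLD = 64 };

typedef struct {
    bool     has_predecode;     /* page currently has shadow decode memory */
    bool     exec_since_sweep;  /* set on the slow fetch path, cleared by sweep */
    uint32_t writes_since_exec; /* slow-path writes since last noted execution */
} PageState;

/* Drops the execute bit; a real emulator would also free the shadow page. */
static void demote(PageState *p)
{
    p->has_predecode = false;
    p->exec_since_sweep = false;
    p->writes_since_exec = 0;
}

/* Called when execution enters the page through the slow path. */
void note_exec(PageState *p)
{
    p->has_predecode = true;
    p->exec_since_sweep = true;
    p->writes_since_exec = 0;
}

/* Called on a slow-path write. Returns true if the page was demoted to
 * plain data because it was written many times without being executed. */
bool note_write(PageState *p)
{
    if (!p->has_predecode)
        return false;
    if (++p->writes_since_exec < DEMOTE_WRITE_THRESHOLD)
        return false;
    demote(p);
    return true;
}

/* Periodic second-chance sweep. Pages executed since the last sweep keep
 * their predecode but lose the flag; pages not executed are demoted.
 * For the flag to be set again, a real emulator would also have to force
 * the next fetch from each surviving page back onto the slow path.
 * Returns the number of pages demoted. */
size_t sweep(PageState *pages, size_t n)
{
    size_t demoted = 0;
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].has_predecode)
            continue;
        if (pages[i].exec_since_sweep) {
            pages[i].exec_since_sweep = false;
        } else {
            demote(&pages[i]);
            demoted++;
        }
    }
    return demoted;
}
```

The write counter targets exactly the reuse case described above (a former code page now receiving a stream of data writes), while the sweep reclaims predecode memory for pages that simply fell out of use. Both the threshold and the sweep interval would need tuning against a real workload.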