Serializing Running Programs in a Functional Interpreter

2018-06-20 21:54:11

I am writing an interpreter implemented functionally using a variations of the Cont Monad. Inspired by Smalltalk's use of images to capture a running program, I am investigating how to serialize the executing hosted program and need help determining how to accomplish this at a high level.

Problem Statement

Using the Cont monad, I can capture the current continuation of a running program in my interpreter. Storing the current continuation allows resuming interpreter execution by calling the continuation. I would like to serialize this continuation so that the state of a running program can be saved to disk or loaded by another interpreter instance. However, my language (I am both targeting and working in Javascript) does not support serializing functions this way.

I would be interested in an approach that can be used to build up the continuation at a given point of execution given some metadata without running the entire program again until it reaches that point. Preferably, minimal changes to the implementation of the interpreter itself would be made.

Considered Approach

One approach that may work is to push all control flow logic into the program state. For example, I currently express a C style for loop using the host language's recursion for the looping behavior:

var forLoop = function(init, test, update, body) {
    var iter = function() {
        // When test is true, evaluate the loop body and start iter again
        // otherwise, evaluate an empty statement and finish
        return branch(test,    
            next(             
                next(body, update),
                iter()),
             empty);
    };

    return next(
        init,
        iter());
};

This is a nice solution but if I pause the program midway though a for loop, I don't know how I can serialize the continuation that has been built up.

I know I can serialize a transformed program using jumps however, and the for loop can be constructed from jump operations. A first pass of my interpreter would generate blocks of code and save these in the program state. These blocks would capture some action in the hosted language and potentially execute other blocks. The preprocessed program would look something like this:

Label         Actions (Block of code, there is no sequencing)
-----------------------------------
start:        init, GOTO loop

loop:         IF test GOTO loop_body ELSE GOTO end

loop_body:    body, GOTO update

update:       update, GOTO loop

end:          ...

This makes each block of code independent, only relying on values stored in the program state.

To serialize, I would save off the current label name and the state when it was entered. Deserialization would preprocess the input code to build the labels again and then resume at the given label with the given state. But now I have to think in terms of these blocks when implementing my interpreter. Even using composition to hide some of this seems kind of ugly.

Question

Are there any good existing approaches for addressing this problem? Am I thinking about serializing a program the entirely wrong way? Is this even possible for structures like this?

After more research, I have some thoughts on how I would do this. However, I'm not sure that adding serialization is something I want to do at this point as it would effect the rest of the implementation so much.

I'm not satisfied with this approach and would greatly like to hear any alternatives.

Problem

As I noted, transforming the program into a list of statements makes serialization easier. The entire program can be transformed into something like assembly language, but I wanted to avoid this.

Keeping a concept of expressions, what I didn't originally consider is that function calls can occur inside of deeply nested expressions. Take this program for example:

function id(x) { return x; }
10 + id(id(5)) * id(3);

The serializer should be able to serialize the program at any statement, but the statement could potentially be evaluated inside of an expression.

Host Functions In the State

The reason the program state cannot be easily serialized is that it contains host functions for continuations. These continuations must be transformed into data structures that can be serialized and independently reconstructed into the action the original continuation represented. Defunctionalization is most often used to express a higher order language in a first order language, but I believe it would also enable serialization.

Not all continuations can be easily defunctionalized without drastically rewriting the interpreter. As we are only interested in serialization at specific points, serialization at these points requires the entire continuation stack be defunctionalized. So all statements and expressions must be defunctionalized, but internal logic can remain unchanged in most cases because we don't want to allow serialization partway though an internal operation.

However, to my knowledge, defunctionalization does not work the Cont Monad because of bind statements. The lack of a good abstraction makes it difficult to work with.

Thoughts on a Solution

The goal is to create an object made up of only simple data structures that can be used to reconstruct the entire program state.

First, to minimize the amount of work required, I would rewrite the statements level interpreter to use something more like a state machine that can be more easily serialized. Then, I would defunctionalize expressions. Function calls would push the defunctionlized continuation for the remaining expression onto an internal stack for the state machine.

Making Program State a Serializable Object

Looking at how statements work, I'm not convinced that the Cont Monad is the best approach for chaining statements together (I do think it works pretty well at the expression level and for internal operations however). A state machine seems more natural approach, and this would also be easier to serialize.

A machine that steps between statements would be written. All other structures in the state would also be made serializable. Builtin functions would have to use serializable handles to identify them so that no functions are in the state.

Handling Expressions

Expressions would be rewritten to pass defunctionalized continuations instead of host function continuations. When a function call is encountered in an expression, it captures the defunctionalized current continuation and pushes it onto the statement machine's internal stack (This would only happen for hosted functions, not builtin ones), creating a restore point where computation can resume.

When the function returns, the defunctionalized continuation is passed the result.

Concerns

Javascript also allows hosted functions to be evaluated inside almost any expression (getters, setters, type conversion, higher order builtins) and this may complicate things if we allow serialization inside of those functions.

Defunctionalization seems to require working directly with continuations and would make the entire interpreter less flexible.

链接地址: http://www.djcxy.com/p/58764.html

上一篇: C＃连续Monad实现

下一篇: 在功能解释器中序列化正在运行的程序