Embedding a Low Performance Scripting Language in Python

2018-07-04 06:37:12

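I have a web application. As part of it, I need users of the application to be able to write (or copy and paste) very simple scripts to run against their data.

The scripts really can be very simple, and performance is only the most minor issue. Here is an example of the level of complexity I mean: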

ratio = 1.2345678
minimum = 10

def convert(money)
    return money * ratio
end

if price < minimum
    cost = convert(minimum)
else
    cost = convert(price)
end

Here price and cost are global variables (I can put them into the environment and read them back after the computation).
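If Python itself were used as the script language (as one of the answers below assumes), the price/cost hand-off could look roughly like this minimal sketch. The script is a hypothetical Python translation of the example above, and run_pricing_script is an illustrative helper name. Passing an empty __builtins__ to exec only hides the built-in functions. It is not a sandbox by itself, which is why the guarantees listed next are still needed.

```python
# Hypothetical Python translation of the example script above.
SCRIPT = """
ratio = 1.2345678
minimum = 10

def convert(money):
    return money * ratio

if price < minimum:
    cost = convert(minimum)
else:
    cost = convert(price)
"""


def run_pricing_script(source, price):
    """Inject price as a global, run the script, and read cost back out.

    The empty __builtins__ hides open(), __import__() and friends, but it
    is NOT a security boundary on its own (attribute tricks can escape it).
    """
    env = {"__builtins__": {}, "price": price}
    exec(compile(source, "<user script>", "exec"), env, env)
    return env["cost"]


print(run_pricing_script(SCRIPT, 5))   # below the minimum, so convert(10)
print(run_pricing_script(SCRIPT, 20))  # convert(20)
```

The same dict is passed as globals and locals, so the script's top-level assignments (including cost) land where the host can read them.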

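However, I do need to guarantee a few things.

Any script that runs must have no access to the Python environment. It must not be able to import things, call methods I haven't explicitly exposed, read or write files, spawn threads, etc. I need total lockdown.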

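I need to be able to put a hard limit on the number of "cycles" a script runs for. "Cycle" is a generic term here: it could be VM instructions if the language is bytecode-compiled, Apply calls in an Eval/Apply loop, or just iterations through whatever central processing loop runs the script. The details matter less than my ability to stop something from running after a short time and email the owner saying "your script seems to be doing more than adding a few numbers together; sort it out."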
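The "central processing loop" kind of cycle can be illustrated with a toy tree-walking evaluator. Every node it visits costs one cycle, so the host can stop a script after a fixed number of steps no matter what the script is doing. This is a minimal sketch that supports expressions only: numbers, variables, + - * /, a single comparison, and the conditional expression. It borrows Python's ast module purely as a parser. BudgetedEvaluator, BudgetExceeded and run are hypothetical names.

```python
import ast
import operator

# Hypothetical operator tables for this toy language.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
CMP_OPS = {
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
}


class BudgetExceeded(Exception):
    """Raised when a script uses more steps than it was allotted."""


class BudgetedEvaluator:
    """Evaluates an expression tree, charging one step per node visited."""

    def __init__(self, variables, max_steps):
        self.variables = dict(variables)
        self.max_steps = max_steps
        self.steps = 0

    def visit(self, node):
        # Charge one "cycle" before doing any work for this node.
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"script exceeded {self.max_steps} steps")

        if isinstance(node, ast.Expression):
            return self.visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return self.variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            left, right = self.visit(node.left), self.visit(node.right)
            return BIN_OPS[type(node.op)](left, right)
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in CMP_OPS):
            left = self.visit(node.left)
            right = self.visit(node.comparators[0])
            return CMP_OPS[type(node.ops[0])](left, right)
        if isinstance(node, ast.IfExp):
            # Only the chosen branch is evaluated (and charged for).
            branch = node.body if self.visit(node.test) else node.orelse
            return self.visit(branch)
        raise ValueError(f"unsupported construct: {type(node).__name__}")


def run(expression, variables, max_steps=50):
    tree = ast.parse(expression, mode="eval")
    return BudgetedEvaluator(variables, max_steps).visit(tree)


print(run("(price if price > minimum else minimum) * ratio",
          {"price": 5, "minimum": 10, "ratio": 1.2345678}))
```

The budget check sits in the single dispatch method, so every construct added to the language is metered automatically.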

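It must run on vanilla, unpatched CPython.

So far I've been writing my own DSL for this task. I can do that, but I wondered whether I could stand on the shoulders of giants. Is there a mini-language available for Python that would do this?

There are plenty of hacky Lisp variants (including one I wrote, on GitHub), but I'd prefer something with less specialist syntax (more C- or Pascal-like, say). Since I'm considering this as an alternative to coding one myself, I'd like something a bit more mature.

Any ideas?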

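Here's my take on this problem. Requiring that user scripts run inside vanilla CPython leaves two options. You can write an interpreter for your mini-language. Alternatively, you can compile it to Python bytecode (or use Python as your source language) and then "sanitize" the bytecode before executing it.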

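I've assumed that users can write their scripts in Python. I've also assumed that the source and bytecode can be sufficiently sanitized by some combination of filtering unsafe syntax out of the parse tree and removing unsafe opcodes from the bytecode.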
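The parse-tree half of that sanitizing step can be sketched with the standard ast module: walk every node and reject anything not on an explicit whitelist. This is a minimal sketch, assuming Python 3.8+, the hypothetical whitelist below (tailored to scripts like the pricing example), and that the compiled result is run with an empty __builtins__. It rejects attribute access, imports, loops, try/except, and names starting with an underscore. Recursion through user-defined functions is still possible, so this does not replace a cycle limit. Sandboxing Python this way is also notoriously easy to get wrong. validate, UnsafeScript and ALLOWED_NODES are illustrative names.

```python
import ast

# Hypothetical whitelist: only the node types a simple pricing script needs.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.If, ast.IfExp, ast.FunctionDef, ast.arguments, ast.arg,
    ast.Return, ast.Call,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.USub, ast.UAdd,
    ast.Not, ast.And, ast.Or,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


class UnsafeScript(Exception):
    """Raised when a script uses a construct outside the whitelist."""


def _identifiers(node):
    """Yield the identifiers a node defines or refers to."""
    if isinstance(node, ast.Name):
        yield node.id
    elif isinstance(node, ast.FunctionDef):
        yield node.name
    elif isinstance(node, ast.arg):
        yield node.arg


def validate(source):
    """Parse source, reject anything off the whitelist, return a code object."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise UnsafeScript(f"{type(node).__name__} is not allowed")
        # Block dunder tricks such as __import__ or __builtins__.
        for name in _identifiers(node):
            if name.startswith("_"):
                raise UnsafeScript(f"name {name!r} is not allowed")
        # Only plain calls like convert(x); rejects things like f()(x).
        if isinstance(node, ast.Call) and not isinstance(node.func, ast.Name):
            raise UnsafeScript("only calls to plain names are allowed")
    return compile(tree, "<user script>", "exec")
```

A script that passes validate() would then be executed with something like exec(code, {"__builtins__": {}, "price": ...}) so that names such as open resolve to nothing.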

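The second part of the solution requires two things. First, a watchdog task must periodically interrupt the user script's bytecode to ensure the script does not exceed some opcode limit. Second, all of this must run on vanilla CPython.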

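Here is a summary of my attempt, which mostly focuses on the second part of the problem:

User scripts are written in Python.

Use byteplay to filter and modify the bytecode.

Instrument the user's bytecode to insert an opcode counter and calls to a function that context-switches to the watchdog task.

Use greenlet to execute the user's bytecode, switching back and forth between the user's script and the watchdog coroutine.

The watchdog enforces a preset limit on the number of opcodes that can be executed before an error is raised.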
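The same watchdog idea can also be sketched without byteplay or greenlet by using CPython's built-in sys.settrace hook. That hook calls a function for every source line executed in traced frames. This is a swapped-in technique, not the one used in the code below, and it has several caveats. It counts lines rather than opcodes. The hook is per-thread and adds considerable overhead. A script that caught the exception could keep running with tracing switched off, so this sketch assumes an AST filter that rejects try/except. ScriptTimeout and run_with_line_budget are hypothetical names.

```python
import sys


class ScriptTimeout(Exception):
    """Raised when a user script executes too many source lines."""


def run_with_line_budget(code, env, max_lines):
    """Execute code in env, aborting once max_lines line events have fired.

    sys.settrace is a debugging hook: it is per-thread, counts source lines
    (not opcodes), and is switched off if the trace function raises.
    """
    executed = 0

    def tracer(frame, event, arg):
        nonlocal executed
        if event == "line":
            executed += 1
            if executed > max_lines:
                # Propagates into the user's frame as if raised there.
                raise ScriptTimeout(f"script exceeded {max_lines} lines")
        return tracer  # keep tracing this frame and any frames it calls

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(code, env, env)
    finally:
        sys.settrace(previous)


runaway = compile("x = 0\nwhile x >= 0:\n    x = x + 1\n", "<user script>", "exec")
try:
    run_with_line_budget(runaway, {"__builtins__": {}}, max_lines=1000)
except ScriptTimeout as exc:
    print("stopped:", exc)
```

Compared with bytecode rewriting, this avoids depending on one CPython version's bytecode format. The price is coarser, line-level metering.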

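Hopefully this at least points in the right direction. I'm interested to hear about your solution when you arrive at it.

Source code for lowperf.py (Python 2; it needs the third-party byteplay and greenlet packages):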

# std
import ast
import dis
import sys
from pprint import pprint

# vendor
import byteplay
import greenlet

# bytecode snippet to increment our global opcode counter
INCREMENT = [
    (byteplay.LOAD_GLOBAL, '__op_counter'),
    (byteplay.LOAD_CONST, 1),
    (byteplay.INPLACE_ADD, None),
    (byteplay.STORE_GLOBAL, '__op_counter')
    ]

# bytecode snippet to perform a yield to our watchdog tasklet.
YIELD = [
    (byteplay.LOAD_GLOBAL, '__yield'),
    (byteplay.LOAD_GLOBAL, '__op_counter'),
    (byteplay.CALL_FUNCTION, 1),
    (byteplay.POP_TOP, None)
    ]

def instrument(orig):
    """
    Instrument bytecode.  We place a call to our yield function before
    jumps and returns.  You could choose alternate places depending on 
    your use case.
    """
    line_count = 0
    res = []
    for op, arg in orig.code:
        line_count += 1

        # NOTE: you could put an advanced bytecode filter here.

        # whenever a code block is loaded we must instrument it
        if op == byteplay.LOAD_CONST and isinstance(arg, byteplay.Code):
            code = instrument(arg)
            res.append((op, code))
            continue

        # 'setlineno' opcode is a safe place to increment our global 
        # opcode counter.
        if op == byteplay.SetLineno:
            res += INCREMENT
            line_count += 1

        # append the opcode and its argument
        res.append((op, arg))

        # if we're at a jump or return, or we've processed 10 lines of
        # source code, insert a call to our yield function.  you could 
        # choose other places to yield more appropriate for your app.
        # (the condition is parenthesized so it can span two lines)
        if (op in (byteplay.JUMP_ABSOLUTE, byteplay.RETURN_VALUE)
                or line_count > 10):
            res += YIELD
            line_count = 0

    # finally, build and return new code object
    return byteplay.Code(res, orig.freevars, orig.args, orig.varargs,
        orig.varkwargs, orig.newlocals, orig.name, orig.filename,
        orig.firstlineno, orig.docstring)

def transform(path):
    """
    Transform the Python source into a form safe to execute and return
    the bytecode.
    """
    # NOTE: you could call ast.parse(data, path) here to get an
    # abstract syntax tree, then filter that tree down before compiling
    # it into bytecode.  i've skipped that step as it is pretty verbose.
    with open(path, 'rb') as f:
        data = f.read()
    suite = compile(data, path, 'exec')
    orig = byteplay.Code.from_code(suite)
    return instrument(orig)

def execute(path, limit = 40):
    """
    This transforms the user's source code into bytecode, instrumenting
    it, then kicks off the watchdog and user script tasklets.
    """
    code = transform(path)
    target = greenlet.greenlet(run_task)

    def watcher_task(op_count):
        """
        Task which is yielded to by the user script, making sure it doesn't
        use too many resources.
        """
        while 1:
            if op_count > limit:
                raise RuntimeError("script used too many resources")
            op_count = target.switch()

    watcher = greenlet.greenlet(watcher_task)
    target.switch(code, watcher.switch)

def run_task(code, yield_func):
    "This is the greenlet task which runs our user's script."
    globals_ = {'__yield': yield_func, '__op_counter': 0}
    eval(code.to_code(), globals_, globals_)

if __name__ == '__main__':
    execute(sys.argv[1])

Here is a sample user script, user.py:

def otherfunc(b):
    return b * 7

def myfunc(a):
    for i in range(0, 20):
        print i, otherfunc(i + a + 3)

myfunc(2)

Here is a sample run:

% python lowperf.py user.py

0 35
1 42
2 49
3 56
4 63
5 70
6 77
7 84
8 91
9 98
10 105
11 112
Traceback (most recent call last):
  File "lowperf.py", line 114, in <module>
    execute(sys.argv[1])
  File "lowperf.py", line 105, in execute
    target.switch(code, watcher.switch)
  File "lowperf.py", line 101, in watcher_task
    raise RuntimeError("script used too many resources")
RuntimeError: script used too many resources

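Jispy is the perfect fit!

It's a JavaScript interpreter written in Python, built primarily for embedding JS in Python.

Notably, it provides checks and caps on recursion and looping, which is exactly what is needed here.

It makes it easy to expose Python functions to JavaScript code.

By default, it doesn't expose the host's file system or any other sensitive element.

Full disclosure:

Jispy is my project, so I am obviously biased towards it.

Nonetheless, it really does seem to be a perfect fit here.

PS:

This answer is being written 3 years after the question was asked.

The motivation behind such a late answer is simple:
given how closely Jispy fits this question, future readers with similar requirements should be able to benefit from it.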

Try Lua. The syntax you showed is almost identical to Lua's. See "How could I embed Lua into Python 3.x?"
