Regular expression for parsing similar assembler instructions

The introduction is a bit long, so please bear with me. :)

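I am writing a simple regex-based parser for a large source file written in assembly. Most of the instructions are just moves, increments, decrements, and jumps, but it is a very large file that I need to port to two different languages, and I am too lazy to do it by hand. That is the requirement, and I can't do much about it (so please don't answer "why don't you simply use ANTLR").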

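So, after some preprocessing (I have already done this part: defines and macros are replaced, and extra whitespace and comments are stripped), I now basically have to read the file line by line and parse one or more lines into "intermediate" instructions, which I will then use to generate more or less 1-to-1 equivalents (using real integer arithmetic and a bunch of GOTOs).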

So, suppose I can have all of these different addressing modes:

[Figure: addressing modes depending on the format of the instruction]

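I can go two different ways:

  • have a single MOV regex that covers all these cases, or
  • have multiple MOV regexes, one for each type of instruction. The problem with this approach is that I would have to design each regex carefully to avoid any ambiguity. It also looks like there would be a lot of repetition, since the source and destination operands share many addressing modes.

My question is: if I have a single regex for all the instructions, how should I specify my groups and captures so that I can tell the different modes apart?

Or should I simply capture everything, and then process the source and destination addresses after the initial match?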

For example, a fairly simple match-all regex would be:

    ^MOV\s+(?<dest>[^\s,]+)\s*,\s*(?<src>[^\s,]+)$


(split into multiple lines, with comments):

    ^MOV              (?#instruction)
    \s+               (?#some whitespace)
    (?<dest>[^\s,]+)  (?#match everything except whitespace and comma)
    \s*,\s*           (?#match comma, allow some whitespace)
    (?<src>[^\s,]+)   (?#match everything except whitespace and comma)$
    

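So, I can certainly do this and then process the dest and src groups separately. But would it be better to create one nasty, complex regex that matches all the cases in the addressing-mode table? In that case, I am not sure how I would interpret the captures to find out which addressing mode was matched.

I am using C#, if that makes any difference.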
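To make the second option concrete (capture everything, then post-process each operand), here is a minimal sketch. It is written in Python rather than C#, so named groups use `(?P<name>...)` where .NET uses `(?<name>...)`. The mode names, the `classify` and `parse_mov` helpers, and the assumption that a register looks like `R` followed by word characters are all invented for this illustration.

```python
import re

# Generic MOV pattern from the question, in Python's named-group syntax.
MOV = re.compile(r"^MOV\s+(?P<dest>[^\s,]+)\s*,\s*(?P<src>[^\s,]+)$")

# Hypothetical operand classifiers; the mode names are made up for this sketch.
# Assumption: a register is written as "R" followed by word characters.
OPERAND_MODES = [
    ("register",       re.compile(r"^(?P<reg>R\w+)$")),
    ("immediate",      re.compile(r"^#(?P<value>\w+)$")),
    ("indirect",       re.compile(r"^\[(?P<reg>R\w+)\]$")),
    ("post_increment", re.compile(r"^\[(?P<reg>R\w+)\+\]$")),
    ("pre_decrement",  re.compile(r"^\[-(?P<reg>R\w+)\]$")),
]


def classify(operand):
    """Return (mode, captured parts) for one operand, or raise if unknown."""
    for mode, pattern in OPERAND_MODES:
        m = pattern.match(operand)
        if m:
            return mode, m.groupdict()
    raise ValueError(f"unrecognized operand: {operand!r}")


def parse_mov(line):
    """Parse one MOV line into (dest, src); return None if it is not a MOV."""
    m = MOV.match(line)
    if m is None:
        return None
    return classify(m.group("dest")), classify(m.group("src"))
```

Because source and destination share one classifier table, each addressing mode is written only once, which avoids the repetition mentioned above.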


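You are finding out what happens when you try to make a regular expression do a parser's job. I think much of your difficulty comes from trying to do too much with regexes.

So yes, I would suggest a parser generator such as ANTLR or something similar.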

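If you go this route, you write many small regular expressions to recognize tokens ("MOV", "#", "[", ...), and then you write a grammar that defines how those tokens are composed into instructions. If nothing else, this makes the parsing part much easier to write.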
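The split between token regexes and a grammar can be sketched by hand, without a parser generator. The following is only an illustration in Python, not ANTLR output: the token names, the tiny grammar in the comments, and the `tokenize`, `Parser`, and `parse` names are all assumptions made for this example, as is the register shape `R` plus word characters.

```python
import re

# One small regex per token kind, combined into a single scanner.
TOKEN = re.compile(r"""
    (?P<MNEMONIC>MOV)
  | (?P<REG>R\w+)
  | (?P<IMM>\#\w+)
  | (?P<LBRACKET>\[)
  | (?P<RBRACKET>\])
  | (?P<PLUS>\+)
  | (?P<MINUS>-)
  | (?P<COMMA>,)
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(line):
    """Turn a source line into a list of (kind, text) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {line[pos]!r} at column {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


class Parser:
    # Grammar (hypothetical):
    #   instruction := MNEMONIC operand COMMA operand
    #   operand     := REG | IMM | LBRACKET [MINUS] REG [PLUS] RBRACKET
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def accept(self, kind):
        if self.peek() == kind:
            self.pos += 1
            return True
        return False

    def operand(self):
        if self.peek() == "REG":
            return ("register", self.expect("REG"))
        if self.peek() == "IMM":
            return ("immediate", self.expect("IMM")[1:])  # drop the leading '#'
        self.expect("LBRACKET")
        pre_dec = self.accept("MINUS")
        reg = self.expect("REG")
        post_inc = self.accept("PLUS")
        self.expect("RBRACKET")
        if pre_dec and post_inc:
            raise SyntaxError("pre-decrement and post-increment cannot be combined")
        mode = "pre_decrement" if pre_dec else "post_increment" if post_inc else "indirect"
        return (mode, reg)

    def instruction(self):
        op = self.expect("MNEMONIC")
        dest = self.operand()
        self.expect("COMMA")
        src = self.operand()
        if self.peek() is not None:
            raise SyntaxError("trailing tokens after instruction")
        return (op, dest, src)


def parse(line):
    return Parser(tokenize(line)).instruction()
```

Adding another instruction then means extending the `MNEMONIC` token and the grammar rule, rather than reworking one large regex.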

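You can see what this looks like for assembly code. (It uses a system other than ANTLR, but the idea is the same.) It was fairly simple to write, with none of the pain of trying to write one regex to rule them all. [I did this example in one evening and have used it to parse a large batch of source files.]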

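You are not clear about what you mean by "port". Presumably you are going to another assembler syntax, if not to another machine architecture. To do that, you need access to the individual parts of each instruction (which a single regex for all possible MOV instructions will not give you). The beauty of parsing and building a tree is that all those parts are exposed to you, embedded in the structure they belong to. You can even generate a single instruction from several assembly-language statements, because the tree contains the whole program. (On systems with gigabytes of RAM, the size of the tree is not much of a concern.)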
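As a rough sketch of why exposed instruction parts help when porting, the following Python fragment walks a small tree node and emits C-like statements. Everything here is an assumption for illustration: the `Operand` and `Mov` classes, the `mem[...]` notation, the word size of 2, and in particular the semantics and ordering chosen for the pre-decrement and post-increment modes, which must be checked against the real instruction set.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    mode: str  # "register", "immediate", "indirect", "post_increment", "pre_decrement"
    text: str  # register name or immediate value


@dataclass
class Mov:
    dest: Operand
    src: Operand


WORD = 2  # assumed word size in bytes; adjust for the real target


def access(op):
    """Return (statements before, expression, statements after) for one operand."""
    if op.mode in ("register", "immediate"):
        return [], op.text, []
    if op.mode == "indirect":
        return [], f"mem[{op.text}]", []
    if op.mode == "post_increment":
        return [], f"mem[{op.text}]", [f"{op.text} += {WORD};"]
    if op.mode == "pre_decrement":
        return [f"{op.text} -= {WORD};"], f"mem[{op.text}]", []
    raise ValueError(f"unknown addressing mode: {op.mode}")


def emit(instr):
    """Translate one Mov node into a list of C-like statements."""
    d_pre, d_expr, d_post = access(instr.dest)
    s_pre, s_expr, s_post = access(instr.src)
    return d_pre + s_pre + [f"{d_expr} = {s_expr};"] + s_post + d_post


# A two-instruction "program" held as a list of tree nodes.
program = [
    Mov(Operand("register", "R1"), Operand("post_increment", "R2")),
    Mov(Operand("pre_decrement", "R0"), Operand("register", "R1")),
]
for node in program:
    for statement in emit(node):
        print(statement)
```

Since the whole program is available as a list of nodes, a later pass could also inspect neighbouring instructions and merge them before emitting anything.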


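Here is a regex that does what you want (you will need to edit it for the actual form of your data; instead of all the register labels ax, bx, ... I just used 'reg' etc.):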

     (?<Opt1>MOV\s*Rw,\sRw)
    |(?<Opt2>MOV\s*Rw,\s\#data4)
    |(?<Opt3>MOV\s*Rw,\s\#data16)
    |(?<Opt4>MOV\s*Rw,\s\[Rw\])
    |(?<Opt5>MOV\s*Rw,\s\[Rw\+\])
    |(?<Opt6>MOV\s*\[Rw\],\sRw)
    |(?<Opt7>MOV\s*\[-Rw\],\sRw)
    |(?<Opt8>MOV\s*\[Rw\],\s\[Rw\])
    |(?<Opt9>MOV\s*\[Rw\+\],\s\[Rw\])
    |(?<OptA>MOV\s*\[Rw\],\s\[Rw\+\])
    

With this data:

    MOV Rw, Rw
    MOV Rw, #data4
    MOV Rw, #data16
    MOV Rw, [Rw]
    MOV Rw, [Rw+]
    MOV [Rw], Rw
    MOV [-Rw], Rw
    MOV [Rw], [Rw]
    MOV [Rw+], [Rw]
    MOV [Rw], [Rw+]
    

RegexBuddy produces this:

    Match 1:    MOV Rw, Rw       0      10
    Group "Opt1":   MOV Rw, Rw       0      10
    Group "Opt2" did not participate in the match
    Group "Opt3" did not participate in the match
    Group "Opt4" did not participate in the match
    Group "Opt5" did not participate in the match
    Group "Opt6" did not participate in the match
    Group "Opt7" did not participate in the match
    Group "Opt8" did not participate in the match
    Group "Opt9" did not participate in the match
    Group "OptA" did not participate in the match
    Match 2:    MOV Rw, #data4      12      14
    Group "Opt1" did not participate in the match
    Group "Opt2":   MOV Rw, #data4      12      14
    Group "Opt3" did not participate in the match
    Group "Opt4" did not participate in the match
    Group "Opt5" did not participate in the match
    Group "Opt6" did not participate in the match
    Group "Opt7" did not participate in the match
    Group "Opt8" did not participate in the match
    Group "Opt9" did not participate in the match
    Group "OptA" did not participate in the match
    Match 3:    MOV Rw, #data16     28      15
    Group "Opt1" did not participate in the match
    Group "Opt2" did not participate in the match
    Group "Opt3":   MOV Rw, #data16     28      15
    Group "Opt4" did not participate in the match
    Group "Opt5" did not participate in the match
    Group "Opt6" did not participate in the match
    Group "Opt7" did not participate in the match
    Group "Opt8" did not participate in the match
    Group "Opt9" did not participate in the match
    Group "OptA" did not participate in the match
    Match 4:    MOV Rw, [Rw]        45      12
    Group "Opt1" did not participate in the match
    Group "Opt2" did not participate in the match
    Group "Opt3" did not participate in the match
    Group "Opt4":   MOV Rw, [Rw]        45      12
    Group "Opt5" did not participate in the match
    Group "Opt6" did not participate in the match
    Group "Opt7" did not participate in the match
    Group "Opt8" did not participate in the match
    Group "Opt9" did not participate in the match
    Group "OptA" did not participate in the match
    Match 5:    MOV Rw, [Rw+]       59      13
    Group "Opt1" did not participate in the match
    Group "Opt2" did not participate in the match
    Group "Opt3" did not participate in the match
    Group "Opt4" did not participate in the match
    Group "Opt5":   MOV Rw, [Rw+]       59      13
    Group "Opt6" did not participate in the match
    Group "Opt7" did not participate in the match
    Group "Opt8" did not participate in the match
    Group "Opt9" did not participate in the match
    Group "OptA" did not participate in the match
    Match 6:    MOV [Rw], Rw        74      12
    Group "Opt1" did not participate in the match
    Group "Opt2" did not participate in the match
    Group "Opt3" did not participate in the match
    Group "Opt4" did not participate in the match
    Group "Opt5" did not participate in the match
    Group "Opt6":   MOV [Rw], Rw        74      12
    Group "Opt7" did not participate in the match
    Group "Opt8" did not participate in the match
    Group "Opt9" did not participate in the match
    Group "OptA" did not participate in the match
    Match 7:    MOV [-Rw], Rw       88      13
    Group "Opt1" did not participate in the match
    Group "Opt2" did not participate in the match
    Group "Opt3" did not participate in the match
    Group "Opt4" did not participate in the match
    Group "Opt5" did not participate in the match
    Group "Opt6" did not participate in the match
    Group "Opt7":   MOV [-Rw], Rw       88      13
    Group "Opt8" did not participate in the match
    Group "Opt9" did not participate in the match
    Group "OptA" did not participate in the match
    Match 8:    MOV [Rw], [Rw]     103      14
    Group "Opt1" did not participate in the match
    Group "Opt2" did not participate in the match
    Group "Opt3" did not participate in the match
    Group "Opt4" did not participate in the match
    Group "Opt5" did not participate in the match
    Group "Opt6" did not participate in the match
    Group "Opt7" did not participate in the match
    Group "Opt8":   MOV [Rw], [Rw]     103      14
    Group "Opt9" did not participate in the match
    Group "OptA" did not participate in the match
    Match 9:    MOV [Rw+], [Rw]    119      15
    Group "Opt1" did not participate in the match
    Group "Opt2" did not participate in the match
    Group "Opt3" did not participate in the match
    Group "Opt4" did not participate in the match
    Group "Opt5" did not participate in the match
    Group "Opt6" did not participate in the match
    Group "Opt7" did not participate in the match
    Group "Opt8" did not participate in the match
    Group "Opt9":   MOV [Rw+], [Rw]    119      15
    Group "OptA" did not participate in the match
    Match 10:   MOV [Rw], [Rw+]    136      15
    Group "Opt1" did not participate in the match
    Group "Opt2" did not participate in the match
    Group "Opt3" did not participate in the match
    Group "Opt4" did not participate in the match
    Group "Opt5" did not participate in the match
    Group "Opt6" did not participate in the match
    Group "Opt7" did not participate in the match
    Group "Opt8" did not participate in the match
    Group "Opt9" did not participate in the match
    Group "OptA":   MOV [Rw], [Rw+]    136      15
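In the listing above, exactly one `OptN` group participates in each match, so the name of that group identifies the addressing mode. A minimal Python sketch of reading this in code follows; it uses Python's `(?P<name>...)` syntax and `Match.lastgroup`, and the `addressing_mode` helper is a name invented for this example. In C#, the equivalent check would be to test `match.Groups["Opt1"].Success` and so on for each group name.

```python
import re

# The same ten alternatives, in Python's named-group syntax.
# re.VERBOSE ignores the layout whitespace, so '#' must be escaped.
MODES = re.compile(r"""
    (?P<Opt1>MOV\s*Rw,\sRw)
  | (?P<Opt2>MOV\s*Rw,\s\#data4)
  | (?P<Opt3>MOV\s*Rw,\s\#data16)
  | (?P<Opt4>MOV\s*Rw,\s\[Rw\])
  | (?P<Opt5>MOV\s*Rw,\s\[Rw\+\])
  | (?P<Opt6>MOV\s*\[Rw\],\sRw)
  | (?P<Opt7>MOV\s*\[-Rw\],\sRw)
  | (?P<Opt8>MOV\s*\[Rw\],\s\[Rw\])
  | (?P<Opt9>MOV\s*\[Rw\+\],\s\[Rw\])
  | (?P<OptA>MOV\s*\[Rw\],\s\[Rw\+\])
""", re.VERBOSE)


def addressing_mode(line):
    """Return the name of the alternative that matched the whole line, or None."""
    m = MODES.fullmatch(line)
    return m.lastgroup if m else None


SAMPLES = [
    "MOV Rw, Rw",
    "MOV Rw, #data4",
    "MOV Rw, #data16",
    "MOV Rw, [Rw]",
    "MOV Rw, [Rw+]",
    "MOV [Rw], Rw",
    "MOV [-Rw], Rw",
    "MOV [Rw], [Rw]",
    "MOV [Rw+], [Rw]",
    "MOV [Rw], [Rw+]",
]
for sample in SAMPLES:
    print(sample, "->", addressing_mode(sample))
```

Matching the whole line with `fullmatch` keeps a shorter alternative such as `Opt4` from claiming a line that really belongs to `Opt5`.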
    