Request for comments on simple Alex parser

I've been looking at contributing code to the Haskell Yi editor, and I want to add Git commit and rebase modes to it. I've never done anything with Alex before, so I decided to write a standalone commit parser outside of Yi before trying to add one to the editor. I couldn't find much documentation on Alex aside from the docs on the Alex page, which are really light on information about the monad wrapper, and that wrapper seems to be what the Yi project emulates.

Could anyone give me comments about what's wrong (and hopefully right) with this code? I'm pretty new to Haskell, so any comments there would also be appreciated. The idea is that this will correctly handle message lines, comments, and diff lines (which appear below all of the above when you run git commit -v). I may add support for differentiating between the digest line and subsequent lines later, but I wanted to stay simple for now.


{
module Main where
}

%wrapper "monad"

$commitChars = [$printable\t]
@diffStart = diff\ \-\-git\ $commitChars*

gitCommit :-

<0> {
^@diffStart$                   {makeAlexAction DiffDeclaration `andBegin` diff}
^\# $commitChars*$             {makeAlexAction Comment}
$commitChars*$                 {makeAlexAction MessageLine}
}

<diff> {
^@diffStart$                   {makeAlexAction DiffDeclaration}
^\- $commitChars*$             {makeAlexAction DiffRemove}
^\+ $commitChars*$             {makeAlexAction DiffAdd}
^$commitChars*$                {makeAlexAction DiffContext}
}

.                                     ;
[\n\r]                                ;

{
data GitCommitToken = Digest String
                    | MessageLine String
                    | Comment String
                    | DiffDeclaration String
                    | DiffAdd String
                    | DiffRemove String
                    | DiffContext String
                    | CommitEOF
                  deriving (Show, Eq)

makeAlexAction :: Monad m => (String -> GitCommitToken) -> AlexInput -> Int -> m GitCommitToken
makeAlexAction cons = \(_,_,inp) len -> return $ cons (take len inp)

alexEOF = return CommitEOF

alexMonadScanTokens :: Alex [GitCommitToken]
alexMonadScanTokens = do
  inp <- alexGetInput
  sc <- alexGetStartCode
  case alexScan inp sc of
    AlexEOF -> alexEOF >>= \eof -> return [eof]
    AlexError inp' -> alexError $ "lexical error: " ++ show inp'
    AlexSkip  inp' len -> do
        alexSetInput inp'
        alexMonadScanTokens
    AlexToken inp' len action -> do
        alexSetInput inp'
        token <- action inp len
        tokens <- alexMonadScanTokens
        return $ token : tokens

main = do
     s <- getContents
     mapM_ print $ either (\_ -> []) id (runAlex s alexMonadScanTokens)
}

First, thanks for contributing to Haskell, and to Yi!

Code review

  • I'd use bytestring parsing. Alex supports this now. Use the %wrapper "strict-bytestring" mode.
  • either (\_ -> []) id : a bit odd, since it silently turns a lexical error into an empty token list. I'd use an explicit case, with a better error for parse failure.
  • alexMonadScanTokens may have a space leak. At the least, it consumes the stack, because each token is consed onto the result only after the recursive call returns. Alex defines alexScanTokens for you, to be a left fold over the input. I think you should be using alexMonadScan.
  • In a heavy-duty parser, I'd use unpacked, strict bytestrings for the token types, e.g. Digest !ByteString
  • Your regexes look fine.
  • Summary: a pretty good first go! Switch to bytestring parsing, and try testing on some large files to ensure that your scanner has no space leak, assuming you don't use the out-of-the-box alexScanTokens.
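
To make a few of these points concrete, here is a rough, self-contained sketch. It does not use Alex at all: the Lexer type and nextToken are hypothetical stand-ins for the Alex monad and alexMonadScan, the token type is trimmed to three constructors, and the NUL-byte check is an arbitrary way to trigger a failure. What it shows is the shape I have in mind: strict bytestring fields, an accumulator loop instead of consing after the recursive call, and an explicit case on the result.

```haskell
module Main where

import qualified Data.ByteString.Char8 as B
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Trimmed-down token type with unpacked, strict ByteString fields.
data GitCommitToken
  = MessageLine {-# UNPACK #-} !B.ByteString
  | Comment     {-# UNPACK #-} !B.ByteString
  | CommitEOF
  deriving (Show, Eq)

-- HYPOTHETICAL stand-in for the Alex monad (not Alex's real type):
-- remaining input lines in, either an error or (rest, result) out.
newtype Lexer a = Lexer { unLexer :: [B.ByteString] -> Either String ([B.ByteString], a) }

instance Functor Lexer where
  fmap f (Lexer g) = Lexer $ \s -> fmap (fmap f) (g s)

instance Applicative Lexer where
  pure x = Lexer $ \s -> Right (s, x)
  Lexer f <*> Lexer g = Lexer $ \s -> case f s of
    Left e        -> Left e
    Right (s', h) -> case g s' of
      Left e         -> Left e
      Right (s'', x) -> Right (s'', h x)

instance Monad Lexer where
  Lexer g >>= k = Lexer $ \s -> case g s of
    Left e        -> Left e
    Right (s', x) -> unLexer (k x) s'

-- HYPOTHETICAL stand-in for alexMonadScan: one token per input line.
-- A NUL byte is treated as a lexical error purely for illustration.
nextToken :: Lexer GitCommitToken
nextToken = Lexer step
  where
    step [] = Right ([], CommitEOF)
    step (l : ls)
      | B.elem '\0' l              = Left ("lexical error in line: " ++ show l)
      | B.isPrefixOf (B.pack "#") l = Right (ls, Comment l)
      | otherwise                  = Right (ls, MessageLine l)

-- Accumulate tokens in reverse and flip once at EOF, so the recursive
-- call is the last thing each iteration does.
collectTokens :: Lexer [GitCommitToken]
collectTokens = go []
  where
    go acc = do
      tok <- nextToken
      case tok of
        CommitEOF -> return (reverse (tok : acc))
        _         -> go (tok : acc)

lexCommit :: B.ByteString -> Either String [GitCommitToken]
lexCommit s = fmap snd (unLexer collectTokens (B.lines s))

main :: IO ()
main = do
  s <- B.getContents
  -- Explicit case: report the failure instead of printing nothing.
  case lexCommit s of
    Left err     -> hPutStrLn stderr ("parse failure: " ++ err) >> exitFailure
    Right tokens -> mapM_ print tokens
```

With the real generated lexer, the loop would call alexMonadScan where this sketch calls nextToken, and main would case on the result of runAlex in the same way.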

Resources

Look at the bytestring-lexing package for a bytestring-based Alex parser. Secondarily, the alex package itself has many nice examples that can help with idiomatic parsing:

  $ cabal unpack alex
  $ cd examples
    