Request for comments on simple Alex parser
I've been looking at contributing code to the Haskell Yi editor, and I want to add Git commit and rebase modes to it. I've never done anything with Alex before, so I decided to write a commit parser standalone, outside of Yi, before trying to add one to the editor. I couldn't find much documentation on Alex aside from the docs on the Alex page, which are really light on information about the monad wrapper, which seems to be what the Yi project emulates.
Could anyone give me comments about what's wrong (and hopefully right) with this code? I'm pretty new to Haskell, so any comments there would also be appreciated. The idea is that this will correctly handle message lines, comments, and diff lines (which appear below all of the above when you run git commit -v). I may add support for differentiating between the digest line and subsequent lines later, but I wanted to stay simple for now.
{
module Main where
}
%wrapper "monad"
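$commitChars = [$printable\t]
@diffStart = diff\ \-\-git\ $commitChars*

gitCommit :-

-- Start code 0: the commit message. The first diff header switches to <diff>.
<0> {
  ^@diffStart$         { makeAlexAction DiffDeclaration `andBegin` diff }
  ^\# $commitChars*$   { makeAlexAction Comment }
  $commitChars*$       { makeAlexAction MessageLine }
}

-- Start code diff: everything after the first "diff --git" line.
<diff> {
  ^@diffStart$         { makeAlexAction DiffDeclaration }
  ^\- $commitChars*$   { makeAlexAction DiffRemove }
  ^\+ $commitChars*$   { makeAlexAction DiffAdd }
  ^$commitChars*$      { makeAlexAction DiffContext }
}

-- Skip anything the rules above do not match, and line terminators.
.      ;
[\n\r] ;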
{
data GitCommitToken = Digest String
                    | MessageLine String
                    | Comment String
                    | DiffDeclaration String
                    | DiffAdd String
                    | DiffRemove String
                    | DiffContext String
                    | CommitEOF
                    deriving (Show, Eq)

-- Build an Alex action that wraps the matched text in the given constructor.
-- The pattern assumes a three-field AlexInput; newer Alex releases may define
-- AlexInput for this wrapper with an extra field.
makeAlexAction :: Monad m => (String -> GitCommitToken) -> AlexInput -> Int -> m GitCommitToken
makeAlexAction cons = \(_,_,inp) len -> return $ cons (take len inp)

alexEOF = return CommitEOF

-- Scan the whole input, returning every token followed by CommitEOF.
alexMonadScanTokens :: Alex [GitCommitToken]
alexMonadScanTokens = do
    inp <- alexGetInput
    sc <- alexGetStartCode
    case alexScan inp sc of
        AlexEOF -> alexEOF >>= \eof -> return [eof]
        AlexError inp' -> alexError $ "lexical error: " ++ show inp'
        AlexSkip inp' len -> do
            alexSetInput inp'
            alexMonadScanTokens
        AlexToken inp' len action -> do
            alexSetInput inp'
            token <- action inp len
            tokens <- alexMonadScanTokens
            return $ token : tokens

main = do
    s <- getContents
    mapM_ print $ either (\_ -> []) id (runAlex s alexMonadScanTokens)
}
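As a cross-check for the lexer above, the intended classification can also be written as a plain Haskell function with no Alex involved. This is only a sketch under assumptions: the classify name, the exact prefix tests, and the choice to keep the leading marker in each token are illustrative and are not taken from Yi or from the code Alex generates. The Digest and CommitEOF constructors are left out for brevity.

```haskell
import Data.List (isPrefixOf)

-- Same constructor names as the lexer's token type, minus Digest and CommitEOF.
data GitCommitToken
  = MessageLine String
  | Comment String
  | DiffDeclaration String
  | DiffAdd String
  | DiffRemove String
  | DiffContext String
  deriving (Show, Eq)

-- | Classify the lines of a commit buffer. The Bool tracks whether a
-- "diff --git" header has been seen, mirroring the switch to the diff state.
classify :: [String] -> [GitCommitToken]
classify = go False
  where
    go _ [] = []
    go inDiff (l : ls)
      | "diff --git " `isPrefixOf` l = DiffDeclaration l : go True ls
      | inDiff                       = diffLine l : go True ls
      | "#" `isPrefixOf` l           = Comment l : go False ls
      | otherwise                    = MessageLine l : go False ls

    -- Inside the diff, only the first character decides the token.
    diffLine l
      | "-" `isPrefixOf` l = DiffRemove l
      | "+" `isPrefixOf` l = DiffAdd l
      | otherwise          = DiffContext l
```

Feeding the same input to both (for example classify (lines s) against the lexer's output) is one way to spot rules that fire in the wrong state.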
First, thanks for contributing to Haskell, and to Yi!
Code review
Consider the %wrapper "strict-bytestring" mode rather than a String-based wrapper.
either (\_ -> []) id: a bit odd. I'd use an explicit case, with a better error for parse failure.
alexMonadScanTokens: not tail recursive (at least, it consumes the stack). Alex defines alexScanTokens for you, to be a left fold over the input. I think you should be using alexMonadScan.
Token fields can then be strict bytestrings, e.g. Digest !ByteString.
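The explicit case suggested above might look roughly like this. It's a sketch rather than the definitive form: renderResult and reportTokens are hypothetical names, and the only thing assumed about Alex is that runAlex returns an Either String a with the error message on the Left.

```haskell
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- | Pure part: keep the failure message (with some context) instead of
-- silently turning a failed parse into an empty token list.
renderResult :: Show tok => Either String [tok] -> Either String [String]
renderResult result =
  case result of
    Left err     -> Left ("could not lex commit message: " ++ err)
    Right tokens -> Right (map show tokens)

-- | Effectful part: print the tokens on success; on failure, report on
-- stderr and exit with a non-zero status.
reportTokens :: Show tok => Either String [tok] -> IO ()
reportTokens result =
  case renderResult result of
    Left msg -> do
      hPutStrLn stderr msg
      exitFailure
    Right shown -> mapM_ putStrLn shown
```

In the question's main, this would stand in for the either expression, along the lines of reportTokens (runAlex s alexMonadScanTokens).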
Summary: pretty good first go! Switch to bytestring parsing, and try testing on some large files to ensure that your scanner has no space leak, assuming you don't use the out-of-the-box alexScanTokens.
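One way to collect tokens without the stack-consuming pattern is to call a single-token scanner in a loop and accumulate the results. The sketch below is generic over the monad so it can be tried without Alex; scanAll and countingDemo are hypothetical names. With the monad wrapper, the assumption is that it would be used as runAlex s (scanAll (== CommitEOF) alexMonadScan). Whether this removes every space problem on large inputs should still be measured rather than assumed.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

-- | Run a single-token scanner until it yields a token satisfying 'isEnd'.
-- Tokens are consed onto an accumulator and the recursive call is the last
-- action in the loop; the list is reversed once at the end. The end token
-- is included in the result.
scanAll :: Monad m => (tok -> Bool) -> m tok -> m [tok]
scanAll isEnd scanOne = go []
  where
    go acc = do
      tok <- scanOne
      if isEnd tok
        then return (reverse (tok : acc))
        else go (tok : acc)

-- | Stand-in scanner for experiments: yields 0, 1, 2, ... from a counter
-- and treats the first value >= 'end' as the end-of-input token.
countingDemo :: Int -> IO [Int]
countingDemo end = do
  ref <- newIORef 0
  let next = do
        n <- readIORef ref
        writeIORef ref (n + 1)
        return n
  scanAll (>= end) next
```

The accumulator keeps the whole token list in memory, which is what the original function returns as well; a consumer that processes tokens one at a time would need a different shape.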
Resources
Look at the bytestring-lexing package for a bytestring-based Alex parser. The alex package itself also has many nice examples that can help with idiomatic parsing:
$ cabal unpack alex
$ cd alex-*/examples