Generating a unique value in Haskell do
To generate x86 assembly code, I have defined a custom type called X86
:
data X86 a = X86 { code :: String, counter :: Integer, value :: (X86 a -> a) }
This type is used in do-notation like the following. This makes it easy to write templates for generating if-statements, for-loops, etc...
generateCode :: X86 ()
generateCode = do
label1 <- allocateUniqueLabel
label2 <- allocateUniqueLabel
jmp label1
label label1
jmp label2
label label2
Instructions are defined like this:
jmp :: String -> X86 ()
jmp l = X86 { code = "jmp " ++ l ++ ";n", counter = 0, value = const () }
label :: String -> X86 ()
label l = X86 { code = l ++ ":n", counter = 0, value = const () }
And the completed assembly file is printed like so:
printAsm :: X86 a -> String
printAsm X86{code=code} = code
main = do
putStrLn (printAsm generateCode)
I implemented the X86
monad in the following manner. Essentially, the sequence operator concatenates blocks of assembly code in order and ensures the counters are incremented.
instance Monad X86 where
x >> y = X86 { code = code x ++ code y, counter = counter x + counter y, value = value y }
x >>= f = x >> y
where y = f (value x x)
The problem is the labels are not incremented properly, so they are not unique! The following is the output:
jmp Label1;
Label1:
jmp Label1;
Label1:
I desire the output to have a unique value for each label:
jmp Label1;
Label1:
jmp Label2;
Label2:
To complete the example, here is the implementation of the allocatedUniqueLabel
function:
allocateUniqueId :: X86 Integer
allocateUniqueId = X86 { code = "", counter = 1, value = counter }
allocateUniqueLabel :: X86 String
allocateUniqueLabel = do
id <- allocateUniqueId
return ("Label" ++ show id)
How can I fix my X86
monad so the labels are unique?
Here is what I've tried:
State
monad. => I have looked into a number of examples, but do not understand how to integrate them into my existing X86
monad. We can use mtl classes to describe X86 code as effectful programs. We want:
Writer
effect; State
effect. We worry about instantiating these effects last, and in the description of the programs we use MonadWriter
and MonadState
constraints.
import Control.Monad.State -- mtl
import Control.Monad.Writer
Allocating a new identifier increments the counter, without generating any code. This only uses the State
effect.
type Id = Integer
allocateUniqueLabel :: MonadState Id m => m Id
allocateUniqueLabel = do
i <- get
put (i+1) -- increment
return ("Label" ++ show (i+1))
And of course, we have actions to generate code, that don't need to care about the current state. So they use the Writer
effect.
jmp :: MonadWriter String m => String -> m ()
jmp l = tell ("jmp " ++ l ++ ";n")
label :: MonadWriter String m => String -> m ()
label l = tell (l ++ ":n")
The actual program looks the same as the original, but with more general types.
generateCode :: (MonadState Id m, MonadWriter String m) => m ()
generateCode = do
label1 <- allocateUniqueLabel
label2 <- allocateUniqueLabel
jmp label1
label label1
jmp label2
label label2
The effects are instantiated when we run this program, here using runWriterT
/ runWriter
and runStateT
/ runState
(the order doesn't matter much, these two effects commute).
type X86 = WriterT String (State Id)
runX86 :: X86 () -> String
runX86 gen = evalState (execWriterT gen) 1 -- start counting from 1
-- evalState and execWriterT are wrappers around `runStateT` and `runWriterT`:
-- - execWriterT: discards the result (of type ()), only keeping the generated code.
-- - evalState: discards the final state, only keeping the generated code,
-- and does some unwrapping after there are no effects to handle.
You probably want to use this monad stack:
type X86 a = StateT Integer (Writer String) a
Since you have a state and a writer, you could also consider using RWS
(reader-writer-state all in one):
type X86 a = RWS () String Integer a
Let's pick the first one for fun. I'd first define a helper function to increment the counter (monads cannot lawfully increment a counter "automatically"):
instr :: X86 a -> X86 a
instr i = do
x <- i
modify (+1)
return x
Then you could define jmp
as:
jmp :: String -> X86 ()
jmp l = instr $ do
lift (tell ("jmp " ++ l ++ ";n"))
-- 'tell' is one of Writer's operations, and then we 'lift'
-- it into StateT
(The do
there is superfluous, however I suspect there will be a pattern of starting instruction definitions with instr $ do
)
I would not roll my own monad for this -- it can be instructive to do so, but I think you'll get more mileage using the standard libraries for this one.
As you probably now underestand from the other answers, the problem with your approach was that even though you were using the counter, you were still generating your labels locally. In particular
label1 <- allocateUniqueLabel
label label1
was equivalent to
X86 { code = "Label1:n", counter = 1, value = const () }
We need to assemble the whole code first, generate the labels, and only afterwards (in some sense) generate the actual code using the labels. And this is what the other answers are suggesting by having the counter stored in the State
(or RWS
) monad.
There is yet another issue that we can address: You want to be able to jump both forwards and backwards. This is most likely why you have separate allocateUniqueLabel
and label
functions. But this allows to set the same label twice.
It is actually possible to use to do
notation with "backwards" binding using MonadFix
, which defines this monadic operation:
mfix :: (a -> m a) -> m a
Since both State
and RWS
have MonadFix
instances, we can indeed write code like this:
{-# LANGUAGE GeneralizedNewtypeDeriving, RecursiveDo #-}
module X86
( X86()
, runX86
, label
, jmp
) where
import Control.Monad.RWS
-- In production code it'll be much faster if we replace String with
-- ByteString.
newtype X86 a = X86 (RWS () String Int a)
deriving (Functor, Applicative, Monad, MonadFix)
runX86 :: X86 a -> String
runX86 (X86 k) = snd (execRWS k () 1)
newtype Label = Label { getLabel :: String }
label :: X86 Label
label = X86 $ do
counter <- get
let l = "Label" ++ show counter
tell (l ++ ":n")
modify (+1)
return (Label l)
jmp :: Label -> X86 ()
jmp (Label l) = X86 . tell $ "jmp " ++ l ++ ";n"
And use it like this:
example :: X86 ()
example = do
rec l1 <- label
jmp l2
l2 <- label
jmp l1
There are a few things to note:
RecursiveDo
extension to enable the rec
keyword. rec
delimits a block of mutually recursive definitions. In our case it could also start one line later ( rec jmp l2
). GHC then translates it into using mfix
internally. (Using the deprecated mdo
keyword instead of rec
would make the code somewhat more natural.) We wrap the internals in the X86
newtype. First it's always good to hide the internal implementation, it allows easy refactorings later. Second, mfix
requires that the function passed to it a -> ma
isn't strict in its argument. The effect must not depend on the argument, otherwise mfix
diverges. This is condition is satisfied for our functions, but if the internals are exposed, someone could define a contrived function like this:
-- | Reset the counter to the specified label.
evilReset :: Label -> X86 ()
evilReset = X86 . put . read . drop 5 . getLabel
Not only it breaks the uniqueness of labels, but also causes the following code to hang:
diverge :: X86 ()
diverge = do
rec evilReset l2
l2 <- label
return ()
Another quite similar alternative would be to use the Rand
monad and generate labels with the Random
instance of UUID
. Something like WriterT String Rand a
, which also has a MonadFix
instance.
(From purely academic perspective it might be possible to construct an arrow instead of a monad, that'd implement ArrowLoop
, but disallow state modifications that depend on values, such as in evilReset
. But encapsulation of X86
achieves the same goal, keeping the much friendlier do
syntax.)
上一篇: Haskell有什么大惊小怪的?
下一篇: 在Haskell中生成一个独特的值