Library ghc RULES don't activate
fgl
is a Haskell library for graph manipulation. This library comes with an implementation of its base classes - Data.Graph.Inductive.PatriciaTree
- that is supposedly highly tuned for performance. Part of that performance tuning involves ghc RULES pragmas to replace certain generic functions with specialized versions that are much faster.
However, my evidence is that these RULES don't seem to work at all, and I don't understand why not. For people trying to replicate exactly what I see, I've put my test project up at https://github.com/fizbin/GraphOptiTest and am using ghc version 7.10.2
.
Here's my test program:
{-# LANGUAGE TupleSections #-}
module Main where
import Control.Exception
import Control.Monad
import Data.Graph.Inductive
import qualified Data.Graph.Inductive.PatriciaTree as Pt
import qualified MyPatriciaTree as MPt
makeGraph :: (DynGraph gr) => Int -> gr () Int
makeGraph n = mkGraph (map (,()) [1 .. n])
(concatMap (x -> map (y -> (x, y, x*y)) [x .. n]) [1 .. n])
main1 :: IO ()
main1 =
replicateM_ 200 $ let x = makeGraph 200 :: Pt.Gr () Int
in evaluate (length $ show x)
main2 :: IO ()
main2 =
replicateM_ 200 $ let x = makeGraph 200 :: MPt.Gr () Int
in evaluate (length $ show x)
main :: IO ()
main = main1 >> main2
Now, Data.Graph.Inductive.PatriciaTree
has this definition for the class function mkGraph
:
mkGraph vs es = insEdges es
. Gr
. IM.fromList
. map (second (l -> (IM.empty,l,IM.empty)))
$ vs
Where insEdges
is a function defined in the module Data.Graph.Inductive.Graph
as:
insEdges :: (DynGraph gr) => [LEdge b] -> gr a b -> gr a b
insEdges es g = foldl' (flip insEdge) g es
And Data.Graph.Inductive.PatriciaTree
has this to say about insEdge
:
{-# RULES
"insEdge/Data.Graph.Inductive.PatriciaTree" insEdge = fastInsEdge
#-}
fastInsEdge :: LEdge b -> Gr a b -> Gr a b
fastInsEdge (v, w, l) (Gr g) = g2 `seq` Gr g2
where
g1 = IM.adjust addSucc' v g
g2 = IM.adjust addPred' w g1
addSucc' (ps, l', ss) = (ps, l', IM.insertWith addLists w [l] ss)
addPred' (ps, l', ss) = (IM.insertWith addLists v [l] ps, l', ss)
So, in theory, when I run main1
in my test program I should get that compiled down into something that eventually calls fastInsEdge
.
To test this, I compare against a modified version of Data.Graph.Inductive.PatriciaTree
that uses this as its definition of the mkGraph
method: (this is the class MyPatriciaTree
used above in main2
)
mkGraph vs es = doInsEdges
. Gr
. IM.fromList
. map (second (l -> (IM.empty,l,IM.empty)))
$ vs
where
doInsEdges g = foldl' (flip fastInsEdge) g es
When I run my test program (after cabal configure --enable-library-profiling --enable-executable-profiling
and cabal build GraphOptiTest
), though, the main2
method smokes the main1
method. It isn't even close - the profile shows 99.2% of the program's time is spent inside main1
. (and changing the program to just run main2
shows that yes, main2
is really fast on its own)
Yes, I do have -O
in the ghc-options
section of my cabal file.
Trying ghc options like -ddump-rule-firings
doesn't really help - all I can see is that these replacement rules aren't firing, but I have no idea why. I don't know how to get the compiler to tell me why it didn't activate the replacement rules.
Bringing up something discovered by messing around with fgl
's source in response @dfeuer's answer below:
If I add a specialized version of insEdges
to Data.Graph.Inductive.PatriciaTree
as:
{-# RULES
"insEdges/Data.Graph.Inductive.PatriciaTree" insEdges = fastInsEdges
#-}
fastInsEdges :: [LEdge b] -> Gr a b -> Gr a b
fastInsEdges es g = foldl' (flip fastInsEdge) g es
Then both main1
and main2
are now fast. This replacement rule fires; why doesn't the other one? (And no, telling ghc to NOINLINE
the function insEdge
does no good)
EPILOGUE:
So there's now a bug filed with the fgl
package for not tagging their functions that use insEdge
and insNode
appropriately so that the fast versions will be used. But in my code now I work around this and the workaround may be useful in more situations, so I thought I'd share it. At the top of my code now, I have:
import qualified Data.Graph.Inductive as G
import qualified Data.Graph.Inductive.PatriciaTree as Pt
-- Work around design and implementation performance issues
-- in the Data.Graph.Inductive package.
-- Specifically, the tuned versions of insNode, insEdge, gmap, nmap, and emap
-- for PatriciaTree graphs are exposed only through RULES pragmas, meaning
-- that you only get them when the compiler can specialize the function
-- to that specific instance of G.DynGraph. Therefore, I create my own
-- type class with the functions that have specialized versions and use that
-- type class here; the compiler then can do the specialized RULES
-- replacement on the Pt.Gr instance of my class.
class (G.DynGraph gr) => MyDynGraph gr where
mkGraph :: [G.LNode a] -> [G.LEdge b] -> gr a b
insNodes :: [G.LNode a] -> gr a b -> gr a b
insEdges :: [G.LEdge b] -> gr a b -> gr a b
insNode :: G.LNode a -> gr a b -> gr a b
insEdge :: G.LEdge b -> gr a b -> gr a b
gmap :: (G.Context a b -> G.Context c d) -> gr a b -> gr c d
nmap :: (a -> c) -> gr a b -> gr c b
emap :: (b -> c) -> gr a b -> gr a c
instance MyDynGraph Pt.Gr where
mkGraph nodes edges = insEdges edges $ G.mkGraph nodes []
insNodes vs g = foldl' (flip G.insNode) g vs
insEdges es g = foldl' (flip G.insEdge) g es
insNode = G.insNode
insEdge = G.insEdge
gmap = G.gmap
nmap = G.nmap
emap = G.emap
(Had I used the nemap
function in my code I would have included that in the class too) Then, any code of mine which was formerly written in terms of (G.DynGraph gr) => ...
is now written in terms of (MyDynGraph gr) => ...
. The compiler RULES activate for the Pt.Gr
instance, and I then get the optimized version for each function.
Essentially, this trades away the ability of the compiler to inline any of these functions into the calling code and possibly do other optimizations for always getting the optimized versions. (and the cost of an extra pointer indirection at runtime, but that's trivial in comparison) Since profiling showed that those other optimizations never yielded anything significant anyway, this was a clear net win in my case.
Many people's code could use SPECIALIZE
rules aggressively to get the optimized versions everywhere; however, sometimes that isn't possible and it wasn't in the real production code that caused my question without refactoring huge chunks of the application. I had a data structure with a member that has the type (forall gr. G.DynGraph gr => tokType -> gr () (MyEdge c))
- that now uses MyDynGraph
for the class constraint, but completely unwinding it to not have forall gr.
in the signature would have been a huge effort, and such a signature prevents specialization from working across that boundary.
I haven't done any experiments, but here's my guess. The insEdge
function is not marked with a (phased) INLINE
or NOINLINE
, so the inliner is free to inline it whenever it's fully applied. In the definition of insEdges
, we see
foldl' (flip insEdge) g es
Inlining foldl'
gives
foldr f' id es g
where f' x k z = k $! flip insEdge z x
flip
is now fully applied, so we can inline it:
foldr f' id es g
where f' x k z = k $! insEdge x z
Now insEdge
is fully applied, so GHC may choose to inline it right then and there, before the rule ever has a chance to fire.
Try adding {-# NOINLINE [0] insEdge #-}
right by the definition of insEdge
and see what happens. If it works, submit a pull request to fgl
.
PS In my opinion, this sort of thing should really be done by using class methods with defaults, rather than rewrite rules. Rules are always a bit fussy.
As the comments revealed, the big problem wasn't premature inlining, but rather a failure to specialize insEdge
. In particular, Data.Graph.Inductive.Graph
does not export an unfolding for insEdges
, so it's impossible to specialize it, and the insEdge
it calls, to the appropriate type. The ultimate fix was to mark insEdges
INLINABLE
, but I would still advise marking insEdge
NOINLINE [0]
out of an abundance of caution.
上一篇: 在Haskell程序中收集暂停时间
下一篇: 图书馆ghc规则不激活