更快地在Haskell中进行直方图计算
我对Haskell很新,我想创建一个直方图。 我正在使用Data.Vector.Unboxed来融合数据上的操作; 这是快速的(用-O -fllvm编译时),瓶颈是我的折叠应用程序; 它汇总了桶的数量。
我怎样才能让它更快? 我读过关于通过严格保密来减少Thunk的数量,所以我通过使用seq和foldr来制定严格的规定,但是没有看到性能提高。 强烈鼓励您的想法。
import qualified Data.Vector.Unboxed as V
histogram :: [(Int,Int)]
histogram = V.foldr' agg [] $ V.zip k v
where
n = 10000000
c = 1000000
k = V.generate n (i -> i `div` c * c)
v = V.generate n (i -> 1)
agg kv [] = [kv]
agg kv@(k,v) acc@((ck,cv):as)
| k == ck = let a = (ck,cv+v):as in a `seq` a
| otherwise = let a = kv:acc in a `seq` a
main :: IO ()
main = print histogram
编译:
ghc --make -O -fllvm histogram.hs
首先,用-O2 -rtsopts
编译程序。 然后,要获得可以优化的第一个想法,请使用选项+RTS -sstderr
运行程序:
$ ./question +RTS -sstderr
[(0,1000000),(1000000,1000000),(2000000,1000000),(3000000,1000000),(4000000,1000000),(5000000,1000000),(6000000,1000000),(7000000,1000000),(8000000,1000000),(9000000,1000000)]
1,193,907,224 bytes allocated in the heap
1,078,027,784 bytes copied during GC
282,023,968 bytes maximum residency (7 sample(s))
86,755,184 bytes maximum slop
763 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1964 colls, 0 par 3.99s 4.05s 0.0021s 0.0116s
Gen 1 7 colls, 0 par 1.60s 1.68s 0.2399s 0.6665s
INIT time 0.00s ( 0.00s elapsed)
MUT time 2.67s ( 2.68s elapsed)
GC time 5.59s ( 5.73s elapsed)
EXIT time 0.02s ( 0.03s elapsed)
Total time 8.29s ( 8.43s elapsed)
%GC time 67.4% (67.9% elapsed)
Alloc rate 446,869,876 bytes per MUT second
Productivity 32.6% of total user, 32.0% of total elapsed
注意你的时间有67%花在GC上! 显然有一些错误。 为了找出问题所在,我们可以在启用堆分析的情况下运行程序(使用+RTS -h
),产生下图:
所以,你正在泄漏thunk。 这是如何发生的? 查看代码,唯一一次在agg
构建(递归)thunk的时间是在添加时进行的。 通过添加爆炸模式来制作cv
严格来修复问题:
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as V
histogram :: [(Int,Int)]
histogram = V.foldr' agg [] $ V.zip k v
where
n = 10000000
c = 1000000
k = V.generate n (i -> i `div` c * c)
v = V.generate n id
agg kv [] = [kv]
agg kv@(k,v) acc@((ck,!cv):as) -- Note the !
| k == ck = (ck,cv+v):as
| otherwise = kv:acc
main :: IO ()
main = print histogram
输出:
$ time ./improved +RTS -sstderr
[(0,499999500000),(1000000,1499999500000),(2000000,2499999500000),(3000000,3499999500000),(4000000,4499999500000),(5000000,5499999500000),(6000000,6499999500000),(7000000,7499999500000),(8000000,8499999500000),(9000000,9499999500000)]
672,063,056 bytes allocated in the heap
94,664 bytes copied during GC
160,028,816 bytes maximum residency (2 sample(s))
1,464,176 bytes maximum slop
155 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 992 colls, 0 par 0.03s 0.03s 0.0000s 0.0001s
Gen 1 2 colls, 0 par 0.03s 0.03s 0.0161s 0.0319s
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.24s ( 1.25s elapsed)
GC time 0.06s ( 0.06s elapsed)
EXIT time 0.03s ( 0.03s elapsed)
Total time 1.34s ( 1.34s elapsed)
%GC time 4.4% (4.5% elapsed)
Alloc rate 540,674,868 bytes per MUT second
Productivity 95.5% of total user, 95.1% of total elapsed
./improved +RTS -sstderr 1,14s user 0,20s system 99% cpu 1,352 total
这好多了。
所以现在你可以问,为什么这个问题出现了,即使你使用了seq
? 原因是seq
只会迫使第一个参数成为WHNF,并且对于(_,_)
(其中_是未评估的thunk)已经是WHNF! 此外, seq aa
是相同的a
,因为它seq ab
(非正式)表示:评价一个计算b之前,所以seq aa
只是意味着:评估前的评估,这是一样的只是评估!