Performance of Foldable's default methods
I've been exploring the Foldable
class and also the the Monoid
class.
Firstly, lets say I want to fold over a list of the Monoid First
. Like so:
x :: [First a]
fold? mappend mempty x
Then I assume in this case the most appropriate fold would be foldr
, as mappend
for First
is lazy in it's second argument.
Conversely, for Last
we'd want to foldl'
(or just foldl
I'm not sure).
Now moving away from lists, I've defined a simple binary tree like so:
{-# LANGUAGE GADTs #-}
data BinaryTree a where
BinaryTree :: BinaryTree a -> BinaryTree a -> BinaryTree a
Leaf :: a -> BinaryTree a
And I've made it Foldable
with the most straightforward definition:
instance Foldable BinaryTree where
foldMap f (BinaryTree left right) =
(foldMap f left) `mappend` (foldMap f right)
foldMap f (Leaf x) = f x
As Foldable
defines fold
as simply foldMap id
we can now do:
x1 :: BinaryTree (First a)
fold x1
x2 :: BinaryTree (Last a)
fold x2
Assuming our BinaryTree is balanced, and there's not many Nothing
values, these operations should take O(log(n))
time I believe.
But Foldable
also defines a whole lot of default methods like foldl
, foldl'
, foldr
and foldr'
based on foldMap
.
These default definitions seem to be implemented by composing a bunch of functions, wrapped in a Monoid called Endo
, one for each element in the collection, and then composing them all.
For the purpose of this discussion I am not modifying these default definitions.
So lets now consider:
x1 :: BinaryTree (First a)
foldr mappend mempty x1
x2 :: BinaryTree (Last a)
foldl mappend mempty x2
Does running these retain O(log(n))
performance of the ordinary fold
? (I'm not worried about constant factors for the moment). Does laziness result in the tree not needing to be fully traversed? Or will the default definitions of foldl
and foldr
require an entire traversal of the tree?
I tried to go though the algorithm step by step (much like they did on the Foldr Foldl Foldl' article) but I ended up completely confusing myself as this is a bit more complex as it involves an interaction between Foldable
, Monoid
and Endo
.
So what I'm looking for is an explanation of why (or why not) the default definition of say foldr
, would only take O(log(n))
time on a balanced binary tree like above. A step by step example like what's from the Foldr Foldl Foldl' article would be really helpful, but I understand if that's too difficult, as I totally confused myself attempting it.
Yes, it has O(log(n)) best case performance.
Endo
is a wrapper around (a -> a) kind of functions that:
instance Monoid (Endo a) where
mempty = Endo id
Endo f `mappend` Endo g = Endo (f . g)
And the default implementation of foldr
in Data.Foldable:
foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = appEndo (foldMap (Endo #. f) t) z
The definition of .
(function composition) in case:
(.) f g = x -> f (g x)
Endo is defined by newtype constructor, so it only exists at compile stage, not run-time. #.
operator changes the type of it's second operand and discard the first. The newtype constructor and #.
operator guarantee that you can ignore the wrapper when considering performance issues.
So the default implementation of foldr
can be reduced to:
-- mappend = (.), mempty = id from instance Monoid (Endo a)
foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = foldMap f t z
For your Foldable BinaryTree
:
foldr f z t
= foldMap f t z
= case t of
Leaf a -> f a z
-- what we care
BinaryTree l r -> ((foldMap f l) . (foldMap f r)) z
The default lazy evaluation in Haskell is ultimately simple, there are just two rules:
That makes it easy to trace the evaluation of the last line of the code above:
((foldMap f l) . (foldMap f r)) z
= (z -> foldMap f l (foldMap f r z)) z
= foldMap f l (foldMap f r z)
-- let z' = foldMap f r z
= foldMap f l z' -- height 1
-- if the branch l is still not a Leaf node
= ((foldMap f ll) . (foldMap f lr)) z'
= (z -> foldMap f ll (foldMap f lr)) z'
= foldMap f ll (foldMap f lr z')
-- let z'' = foldMap f lr z'
= foldMap f ll z'' -- height 2
The right branch of the tree is never expanded before the left has been fully expanded, and it goes one level higher after an O(1) operation of function expansion and application, therefore when it reached the left-most Leaf node:
= foldMap f leaf@(Leaf a) z'heightOfLeftMostLeaf
= f a z'heightOfLeftMostLeaf
Then f
looks at the value a
and decides to ignore its second argument (like what mappend
will do to First
values), the evaluation short-circuits, results O(height of the left-most leaf), or O(log(n)) performance when the tree is balanced.
foldl
is all the same, it's just foldr
with mappend
flipped ie O(log(n)) best case performance with Last
.
foldl'
and foldr'
are different.
foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
where f' x k z = k $! f z x
At every step of reduction, the argument is evaluated first and then the function application, the tree will be traversed ie O(n) best case performance.
链接地址: http://www.djcxy.com/p/63636.html下一篇: 可折叠的默认方法的性能