Performance of Foldable's default methods

2018-06-22 16:20:34

I've been exploring the Foldable class and also the the Monoid class.

Firstly, lets say I want to fold over a list of the Monoid First . Like so:

x :: [First a]

fold? mappend mempty x

Then I assume in this case the most appropriate fold would be foldr , as mappend for First is lazy in it's second argument.

Conversely, for Last we'd want to foldl' (or just foldl I'm not sure).

Now moving away from lists, I've defined a simple binary tree like so:

{-# LANGUAGE GADTs #-}

data BinaryTree a where
  BinaryTree :: BinaryTree a -> BinaryTree a -> BinaryTree a
  Leaf :: a -> BinaryTree a

And I've made it Foldable with the most straightforward definition:

instance Foldable BinaryTree where
  foldMap f (BinaryTree left right) = 
    (foldMap f left) `mappend` (foldMap f right)
  foldMap f (Leaf x) = f x

As Foldable defines fold as simply foldMap id we can now do:

x1 :: BinaryTree (First a)
fold x1 

x2 :: BinaryTree (Last a)
fold x2

Assuming our BinaryTree is balanced, and there's not many Nothing values, these operations should take O(log(n)) time I believe.

But Foldable also defines a whole lot of default methods like foldl , foldl' , foldr and foldr' based on foldMap .

These default definitions seem to be implemented by composing a bunch of functions, wrapped in a Monoid called Endo , one for each element in the collection, and then composing them all.

For the purpose of this discussion I am not modifying these default definitions.

So lets now consider:

x1 :: BinaryTree (First a)
foldr mappend mempty x1 

x2 :: BinaryTree (Last a)
foldl mappend mempty x2

Does running these retain O(log(n)) performance of the ordinary fold ? (I'm not worried about constant factors for the moment). Does laziness result in the tree not needing to be fully traversed? Or will the default definitions of foldl and foldr require an entire traversal of the tree?

I tried to go though the algorithm step by step (much like they did on the Foldr Foldl Foldl' article) but I ended up completely confusing myself as this is a bit more complex as it involves an interaction between Foldable , Monoid and Endo .

So what I'm looking for is an explanation of why (or why not) the default definition of say foldr , would only take O(log(n)) time on a balanced binary tree like above. A step by step example like what's from the Foldr Foldl Foldl' article would be really helpful, but I understand if that's too difficult, as I totally confused myself attempting it.

Yes, it has O(log(n)) best case performance.

Endo is a wrapper around (a -> a) kind of functions that:

instance Monoid (Endo a) where
  mempty = Endo id
  Endo f `mappend` Endo g = Endo (f . g)

And the default implementation of foldr in Data.Foldable:

foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = appEndo (foldMap (Endo #. f) t) z

The definition of . (function composition) in case:

(.) f g = x -> f (g x)

Endo is defined by newtype constructor, so it only exists at compile stage, not run-time. #. operator changes the type of it's second operand and discard the first. The newtype constructor and #. operator guarantee that you can ignore the wrapper when considering performance issues.

So the default implementation of foldr can be reduced to:

-- mappend = (.), mempty = id from instance Monoid (Endo a)
foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = foldMap f t z

For your Foldable BinaryTree :

foldr f z t
  = foldMap f t z
  = case t of
    Leaf a -> f a z
    -- what we care
    BinaryTree l r -> ((foldMap f l) . (foldMap f r)) z

The default lazy evaluation in Haskell is ultimately simple, there are just two rules:

function application first

evaluate the arguments from left to right if the values matter

That makes it easy to trace the evaluation of the last line of the code above:

  ((foldMap f l) . (foldMap f r)) z
= (z -> foldMap f l (foldMap f r z)) z
= foldMap f l (foldMap f r z)
-- let z' = foldMap f r z
= foldMap f l z' -- height 1
-- if the branch l is still not a Leaf node
= ((foldMap f ll) . (foldMap f lr)) z'
= (z -> foldMap f ll (foldMap f lr)) z'
= foldMap f ll (foldMap f lr z')
-- let z'' = foldMap f lr z'
= foldMap f ll z'' -- height 2

The right branch of the tree is never expanded before the left has been fully expanded, and it goes one level higher after an O(1) operation of function expansion and application, therefore when it reached the left-most Leaf node:

= foldMap f leaf@(Leaf a) z'heightOfLeftMostLeaf
= f a z'heightOfLeftMostLeaf

Then f looks at the value a and decides to ignore its second argument (like what mappend will do to First values), the evaluation short-circuits, results O(height of the left-most leaf), or O(log(n)) performance when the tree is balanced.

foldl is all the same, it's just foldr with mappend flipped ie O(log(n)) best case performance with Last .

foldl' and foldr' are different.

foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
  where f' x k z = k $! f z x

At every step of reduction, the argument is evaluated first and then the function application, the tree will be traversed ie O(n) best case performance.

链接地址: http://www.djcxy.com/p/63636.html

上一篇: 在NetBeans中没有项目的文件中自动完成

下一篇: 可折叠的默认方法的性能