Haskell/Performance examples

Goal: Explain optimizations step by step with examples that actually happened.

Tight loop
dons: Write Haskell as fast as C: exploiting strictness, laziness and recursion.

dons: Write Haskell as fast as C: exploiting strictness, laziness and recursion.

CSV Parsing
haskell-cafe: another Newbie performance question I hope he doesn't mind if I post his code here, I still have to ask him. -- apfe&lambda;mus 08:46, 18 May 2008 (UTC) type CSV = String

main = do                 args <- getArgs file <- readFile (head args) writeFile (head args ++ "2") (processFile (args !! 1) file)

processFile s    = writeCSV. doInteraction s. readCSV doInteraction line csv = insertLine (show line) (length csv - 1) csv writeCSV         = (\x -> x ++ "\n"). concat. intersperse "\n". (map (concat. intersperse ",". (map show))) insertLine line pos csv = (take pos csv) ++ [readCSVLine line] ++ drop pos csv readCSVLine      = read. (\x -> "["++x++"]") readCSV          = map readCSVLine. lines

I think there was another cvs parsing thread on the mailing list which I deemed appropriate, but I can't remember.

Space Leak
jkff asked about some code in #haskell which was analyzing a logfile. Basically, it was building a histogram

foldl' (\m (x,y) -> insertWith' x (\[y] ys -> y:ys) [y] m) M.empty [(ByteString.copy foo, ByteString.copy bar) | (foo,bar) <- map (match regex) lines]

The input was a 1GB logfile and the program blew the available memory mainly because the  weren't forced and the whole file lingered around in memory.

Branch Prediction Failure
This code demonstrates how your CPU may fail or succeed in predicting code branches (aka if clauses). The code generates a large random Vector of Ints. From this we compute the sum of all numbers larger than 999. If the data is sorted the CPU can easily deduce the probable outcome of the next branch. If the data is not sorted this will be much more difficult and less successful. Unfortunately, sorting for one purpose only is seldom a good idea. The code uses the criterion package for time measurement. Compile this e.g. with -Wall -keep-llvm-files -O2 -optlo-O3 -fllvm -rtsopts -threaded -eventlog -feager-blackholing -fmax-simplifier-iterations=20 -fsimplifier-phases=3 -fno-liberate-case -fspecialise-aggressively. The idea is from github.com/Kobzol/hardware-effects. (tkx68hamburg)

{-# LANGUAGE FlexibleInstances, RankNTypes, BangPatterns, CPP, PartialTypeSignatures, UnicodeSyntax, OverloadedStrings #-} {-# OPTIONS_GHC -fno-warn-orphans #-} module Main (main) where

import          Control.DeepSeq import          Control.Exception (evaluate) import          Control.Monad.ST import           Criterion.Main import qualified Data.Vector.Algorithms.Radix as Radix import qualified Data.Vector.Unboxed as UV import          System.Random

instance (NFData a) => NFData (V.Vector a)             where rnf v = V.foldr (\a b -> rnf a `seq` b) v
 * 1) ifdef DEFINE_NFDATA_VECTOR
 * 1) endif

setupUnsorted :: IO [UV.Vector Int] setupUnsorted = do small <- randomSampleUVVector 1024 med <- randomSampleUVVector $ 1024*1024 large <- randomSampleUVVector $ 20*1024*1024 return [small, med, large]

setupSorted :: IO [UV.Vector Int] setupSorted = fmap (fmap sortUVec) setupUnsorted

main :: IO main = do defaultMainWith defaultConfig [ env setupUnsorted (\ ~[small, med, large] -> bgroup "Branch without sort" [            bench "small" $ whnf core1 small,             bench "medium" $ whnf core1 med,             bench "large" $ whnf core1 large             ]          ), env setupUnsorted (\ ~[small, med, large] -> bgroup "Branch without sort using filter" [            bench "small" $ whnf core1a small,             bench "medium" $ whnf core1a med,             bench "large" $ whnf core1a large             ]          ), env setupUnsorted (\ ~[small, med, large] -> bgroup "Branch including sort" [            bench "small" $ whnf core2 small,             bench "medium" $ whnf core2 med,             bench "large" $ whnf core2 large             ]          ), env setupSorted (\ ~[small, med, large] -> bgroup "Branch after sort" [            bench "small" $ whnf core1 small,             bench "medium" $ whnf core1 med,             bench "large" $ whnf core1 large             ]          ) ]

{- INLINE -} core1 :: UV.Vector Int -> Int core1 = UV.foldl' condSum 0

{- INLINE -} core2 :: UV.Vector Int -> Int core2 = UV.foldl' condSum 0. sortUVec

{- INLINE -} core1a :: UV.Vector Int -> Int core1a = UV.sum. (UV.filter (>999))

{- INLINE -} condSum :: Int -> Int -> Int condSum !x !y = if y>999 then x+y else x

randomSampleUVVector :: Int -> IO (UV.Vector Int) randomSampleUVVector i = evaluate $ force $ UV.fromList (take i (randoms (mkStdGen 0) :: [Int]))

sortUVec :: forall a. (Ord a, Radix.Radix a, UV.Unbox a) => UV.Vector a -> UV.Vector a sortUVec vec = runST (do mv <- UV.thaw vec       Radix.sort mv        UV.unsafeFreeze mv)