Building a histogram with haskell, many times slower than with python

Building a histogram with haskell, many times slower than with python - haskell

I was going to test naive bayes classification. One part of it was going to be building a histogram of the training data. The problem is, I am using a large training data, the haskell-cafe mailing list since a couple of years back, and there are over 20k files in the folder.
It takes a while over two minutes to create the histogram with python, and a little over 8 minutes with haskell. I'm using Data.Map (insertWith'), enumerators and text. What else can I do to speed up the program?
Haskell:
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import System.Directory
import Control.Applicative
import Control.Monad (filterM, foldM)
import System.FilePath.Posix ((</>))
import qualified Data.Map as M
import Data.Map (Map)
import Data.List (foldl')
import Control.Exception.Base (bracket)
import System.IO (Handle, openFile, hClose, hSetEncoding, IOMode(ReadMode), latin1)
import qualified Data.Enumerator as E
import Data.Enumerator (($$), (>==>), (<==<), (==<<), (>>==), ($=), (=$))
import qualified Data.Enumerator.List as EL
import qualified Data.Enumerator.Text as ET
withFile' :: (Handle -> IO c) -> FilePath -> IO c
withFile' f fp = do
bracket
(do
h ← openFile fp ReadMode
hSetEncoding h latin1
return h)
hClose
(f)
buildClassHistogram c = do
files ← filterM doesFileExist =<< map (c </> ) <$> getDirectoryContents c
foldM fileHistogram M.empty files
fileHistogram m file = withFile' (λh → E.run_ $ enumHist h) file
where
enumHist h = ET.enumHandle h $$ EL.fold (λm' l → foldl' (λm'' w → M.insertWith' (const (+1)) w 1 m'') m' $ T.words l) m
Python:
for filename in listdir(root):
filepath = root + "/" + filename
# print(filepath)
fp = open(filepath, "r", encoding="latin-1")
for word in fp.read().split():
if word in histogram:
histogram[word] = histogram[word]+1
else:
histogram[word] = 1
Edit: Added imports

You could try using imperative hash maps from the hashtables package: http://hackage.haskell.org/package/hashtables
I remember I once got a moderate speedup compared to Data.Map. I wouldn't expect anything spectacular though.
UPDATE
I simplified your python code so I could test it on a single big file (100 million lines):
import sys
histogram={}
for word in sys.stdin.readlines():
if word in histogram:
histogram[word] = histogram[word]+1
else:
histogram[word] = 1
print histogram.get("the")
Takes 6.06 seconds
Haskell translation using hashtables:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as T
import qualified Data.HashTable.IO as HT
main = do
ls <- T.lines `fmap` T.getContents
h <- HT.new :: IO (HT.BasicHashTable T.ByteString Int)
flip mapM_ ls $ \w -> do
r <- HT.lookup h w
case r of
Nothing -> HT.insert h w (1::Int)
Just c -> HT.insert h w (c+1)
HT.lookup h "the" >>= print
Run with a large allocation area: histogram +RTS -A500M
Takes 9.3 seconds, with 2.4% GC. Still quite a bit slower than Python but not too bad.
According to the GHC user guide, you can change the RTS options while compiling:
GHC lets you change the default RTS options for a program at compile
time, using the -with-rtsopts flag (Section 4.12.6, “Options affecting
linking”). A common use for this is to give your program a default
heap and/or stack size that is greater than the default. For example,
to set -H128m -K64m, link with -with-rtsopts="-H128m -K64m".

Your Haskell and Python implementations are using maps with different complexities. Python dictionaries are hash maps so the expected time for each operation (membership test, lookup, and insertion) is O(1). The Haskell version uses Data.Map which is a balanced binary search tree so the same operations take O(lg n) time. If you change your Haskell version to use a different map implementation, say a hash table or some sort of trie, it should get a lot quicker. However, I'm not familiar enough with the different modules implementing these data structures to say which is best. I'd start with the Data category on Hackage and look for one that you like. You might also look for a map that allows destructive updates like STArray does.

We need more information:
How long does it take both programs to process the words from the input, with no data structure for maintaining counts?
How many distinct words are there, so we can judge whether the extra log N cost for balanced trees is a consideration?
What does GHC's profiler say? In particular, how much time is spent in allocation? It's possible that the Haskell version is spending most of its time allocating tree nodes that quickly become obsolete.
UPDATE: I missed that lowercase "text" might mean Data.Text. You may be comparing applies and oranges. Python's Latin1 encoding uses one byte per char. Although it tries to be efficient, Data.Text must allow for the possiblity of more than 256 characters. What happens if you switch to String, or better, Data.ByteString?
Depending on what these indicators say, here are a couple of things to try:
If analyzing the input is a bottleneck, try driving all your I/O and analysis from Data.ByteString instead of Text.
If the data structure is a bottleneck, Bentley and Sedgewick's ternary search trees are purely functional but perform competetively with hash tables. There is a TernaryTrees package on Hackage.

Related

Efficient Conversion of Bytestring to [Word16]

I am attempting to do a plain conversion from a bytestring to a list of Word16s. The implementation below using Data.Binary.Get works, though it is a performance bottleneck in the code. This is understandable as IO is always going to be slow, but I was wondering if there isn't a more efficient way of doing this.
getImageData' = do
e <- isEmpty
if e then return []
else do
w <- getWord16be
ws <- getImageData'
return $ w : ws

I suspect the big problem you're encountering with Data.Binary.Get is that the decoders are inherently much too strict for your purpose. This also appears to be the case for serialise, and probably also the other serialization libraries. I think the fundamental trouble is that while you know that the operation will succeed as long as the ByteString has an even number of bytes, the library does not know that. So it has to read in the whole ByteString before it can conclude "Ah yes, all is well" and construct the list you've requested. Indeed, the way you're building the result, it's going to build a slew of closures (proportional in number to the length) before actually doing anything useful.
How can you fix this? Just use the bytestring library directly. The easiest thing is to use unpack, but I think you'll probably get slightly better performance like this:
{-# language BangPatterns #-}
module Wordy where
import qualified Data.ByteString as BS
import Data.ByteString (ByteString)
import Data.Word (Word16)
import Data.Maybe (fromMaybe)
import Data.Bits (unsafeShiftL)
toDBs :: ByteString -> Maybe [Word16]
toDBs bs0
| odd (BS.length bs0) = Nothing
| otherwise = Just (go bs0)
where
go bs = fromMaybe [] $ do
(b1, bs') <- BS.uncons bs
(b2, bs'') <- BS.uncons bs'
let !res = (fromIntegral b1 `unsafeShiftL` 8) + fromIntegral b2
Just (res : go bs'')

Case conversion of large dictionary word set

I'm trying to match words from a dictionary, case-insensitively. My initial approach
looks like so:
read dict; convert all words to lowercase, store in set.
check new word for membership in set
Is there a better (more efficient) way to achieve this? I'm new to Haskell.
import System.IO
import Data.Text (toLower, pack, unpack)
import Data.Set (fromList, member)
main = do
let path = "/usr/share/dict/american-english"
h <- openFile path ReadMode
hSetEncoding h utf8
contents <- hGetContents h
let mySet = (fromList . map (unpack . toLower . pack) . lines) contents
putStrLn $ show $ member "acadia" mySet

I would just work with Text directly instead of converting to/from Strings.
Data.Text.IO contains versions of hGetContents, readFile, etc. for reading Text from files, and Data.Text has lines for Text.
{-# LANGUAGE OverloadedStrings #-}
import System.IO
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Set as S
main = do
let path = "/usr/share/dict/american-english"
h <- openFile path ReadMode
hSetEncoding h utf8
contents <- T.hGetContents h
let mySet = (S.fromList . map T.toLower . T.lines) contents
putStrLn $ show $ S.member "acadia" mySet
By using T.tolower and T.lines we avoid explicit pack/unpack calls.
mySet is now a set of Text values rather than of Strings. By using
the OverloadedStrings pragma the literal "acadia" will be interpreted
as a Text value.

Yes, what you propose is reasonable. Some few remarks, mostly unrelated to the main question:
It would be more efficient to restrict your self to using only Text and not String.
Prefer the toCaseFold function to toLower, it's more appropriate for this case.

Even though you found my first answer helpful, let me propose another approach...
A boggle solver I wrote simply reads in the entire dictionary as a single ByteString, and to look up words performs a binary search on that ByteString.
The dictionary must already be in sorted order and normalized to lower case, but usually this is not a problem since the dictionary is static
and known in advance.
Of course, when you compute (lo+hi)/2 in performing the binary search you might land in the middle of word, so you simply back up to the beginning of the current word.
The main advantage of this is that loading the dictionary is extremely fast and it is memory efficient. Moreover, the search algorithm has good memory locality. I haven't measured it, but I wouldn't be surprised if creating a Data.Set will more than double the size of the raw data.
The code is available here: https://github.com/erantapaa/hoggle

Haskell requires more memory than Python when I read map from file. Why?

I have this simple code in Python:
input = open("baseforms.txt","r",encoding='utf8')
S = {}
for i in input:
words = i.split()
S.update( {j:words[0] for j in words} )
print(S.get("sometext","not found"))
print(len(S))
It requires 300MB for work. "baseforms.txt" size is 123M.
I've wrote the same code in Haskell:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map as M
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Text.Lazy.Encoding(decodeUtf8)
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as I
import Control.Monad(liftM)
main = do
text <- B.readFile "baseforms.txt"
let m = (M.fromList . (concatMap (parseLine.decodeUtf8))) (B.lines text)
print (M.lookup "sometext" m)
print (M.size m)
where
parseLine line = let base:forms = T.words line in [(f,base)| f<-forms]
It requires 544 MB and it's slower than Python version. Why? Is it possible to optimise Haskell version?

There is a lot happening in the Haskell version that's not happening in the Python version.
readFile uses lazy IO, which is a bit weird in general. I would generally avoid lazy IO.
The file, as a bytestring, is broken into lines which are then decoded as UTF-8. This seems a little unnecessary, given the existence of Text IO functions.
The Haskell version is using a tree (Data.Map) whereas the Python version is using a hash table.
The strings are all lazy, which is probably not necessary if they're relatively short. Lazy strings have a couple words of overhead per string, which can add up. You could fuse the lazy strings, or you could read the file all at once, or you could use something like conduit.
GHC uses a copying collector, whereas the default Python implementation uses malloc() with reference counting and the occasional GC. This fact alone can account for large differences in memory usage, depending on your program.
Who knows how many thunks are getting created in the Haskell version.
It's unknown whether you've enabled optimizations.
It's unknown how much slower the Haskell version is.
We don't have your data file so we can't really test it ourselves.

It's a bit late, but I studied this a little and think Dietrich Epp's account is right, but can be simplified a little. Notice that there doesn't seem to be any real python programming going on in the python file: it is orchestrating a very simple sequence of calls to C string operations and then to a C hash table implementation. (This is often a problem with really simple python v. Haskell benchmarks.) The Haskell, by contrast, is building an immense persistent Map, which is a fancy tree. So the main points of opposition here are C vs Haskell, and hashtable-with-destructive-update vs persistent map. Since there is little overlap in the input file, the tree you are constructing includes all the information in the input string, some of it repeated, and then rearranged with a pile of Haskell constructors. This is I think the source of the alarm you are experiencing, but it can be explained.
Compare these two files, one using ByteString:
import qualified Data.Map as M
import qualified Data.ByteString.Char8 as B
main = do m <- fmap proc (B.readFile "baseforms.txt")
print (M.lookup (B.pack "sometext") m)
print (M.size m)
proc = M.fromList . concatMap (\(a:bs) -> map (flip (,) a) bs)
. map B.words . B.lines
and the other a Text-ified equivalent:
import qualified Data.Map as M
import qualified Data.ByteString.Char8 as B
import Data.Text.Encoding(decodeUtf8)
import qualified Data.Text as T
main = do
m <- fmap proc (B.readFile "baseforms.txt")
print (M.lookup (T.pack "sometext") m)
print (M.size m)
proc = M.fromList . concatMap (\(a:bs) -> map (flip (,) a) bs)
. map T.words . T.lines . decodeUtf8
On my machine, the python/C takes just under 6 seconds, the bytestring file takes 8 seconds, and the text file just over 10.
The bytestring implementation seems to use a bit more memory than the python, the text implementation distinctly more. The text implementation takes more time because, of course, it adds a conversion to text and then uses text operations to break the string and text comparisons to build the map.
Here is a go at analyzing the memory phenomena in the text case. First we have the bytestring in memory (130m). Once the text is constructed (~250m, to judge unscientifically from what's going on in top), the bytestring is garbage collected while we construct the tree. In the end the text tree (~380m it looks like) uses more memory than the bytestring tree (~260m) because the text fragments in the tree are bigger. The program as a whole uses more because the text held in memory during the tree construction is itself bigger. To put it crudely: each bit of white-space is being turned into a tree constructor and two text constructors together with the text version of whatever the first 'word' of the line was and whatever the text representation next word is. The weight of the constructors seems in either case to be about 130m, so at the last moment of the construction of the tree we are using something like 130m + 130m + 130m = 390m in the bytestring case, and 250m + 130m + 250m = 630m in the text case.

Why is this Haskell IO Code so slow? [duplicate]

I have a simple script written in both Python and Haskell. It reads a file with 1,000,000 newline separated integers, parses that file into a list of integers, quick sorts it and then writes it to a different file sorted. This file has the same format as the unsorted one. Simple.
Here is Haskell:
quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
where
lesser = filter (< p) xs
greater = filter (>= p) xs
main = do
file <- readFile "data"
let un = lines file
let f = map (\x -> read x::Int ) un
let done = quicksort f
writeFile "sorted" (unlines (map show done))
And here is Python:
def qs(ar):
if len(ar) == 0:
return ar
p = ar[0]
return qs([i for i in ar if i < p]) + [p] + qs([i for i in ar if i > p])
def read_file(fn):
f = open(fn)
data = f.read()
f.close()
return data
def write_file(fn, data):
f = open('sorted', 'w')
f.write(data)
f.close()
def main():
data = read_file('data')
lines = data.split('\n')
lines = [int(l) for l in lines]
done = qs(lines)
done = [str(l) for l in done]
write_file('sorted', "\n".join(done))
if __name__ == '__main__':
main()
Very simple. Now I compile the Haskell code with
$ ghc -O2 --make quick.hs
And I time those two with:
$ time ./quick
$ time python qs.py
Results:
Haskell:
real 0m10.820s
user 0m10.656s
sys 0m0.154s
Python:
real 0m9.888s
user 0m9.669s
sys 0m0.203s
How can Python possibly be faster than native code Haskell?
Thanks
EDIT:
Python version: 2.7.1
GHC version: 7.0.4
Mac OSX, 10.7.3
2.4GHz Intel Core i5
List generated by
from random import shuffle
a = [str(a) for a in xrange(0, 1000*1000)]
shuffle(a)
s = "\n".join(a)
f = open('data', 'w')
f.write(s)
f.close()
So all numbers are unique.

The Original Haskell Code
There are two issues with the Haskell version:
You're using string IO, which builds linked lists of characters
You're using a non-quicksort that looks like quicksort.
This program takes 18.7 seconds to run on my Intel Core2 2.5 GHz laptop. (GHC 7.4 using -O2)
Daniel's ByteString Version
This is much improved, but notice it still uses the inefficient built-in merge sort.
His version takes 8.1 seconds (and doesn't handle negative numbers, but that's more of a non-issue for this exploration).
Note
From here on this answer uses the following packages: Vector, attoparsec, text and vector-algorithms. Also notice that kindall's version using timsort takes 2.8 seconds on my machine (edit: and 2 seconds using pypy).
A Text Version
I ripped off Daniel's version, translated it to Text (so it handles various encodings) and added better sorting using a mutable Vector in an ST monad:
import Data.Attoparsec.Text.Lazy
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as TIO
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as I
import Control.Applicative
import Control.Monad.ST
import System.Environment (getArgs)
parser = many (decimal <* char '\n')
main = do
numbers <- TIO.readFile =<< fmap head getArgs
case parse parser numbers of
Done t r | T.null t -> writeFile "sorted" . unlines
. map show . vsort $ r
x -> error $ Prelude.take 40 (show x)
vsort :: [Int] -> [Int]
vsort l = runST $ do
let v = V.fromList l
m <- V.unsafeThaw v
I.sort m
v' <- V.unsafeFreeze m
return (V.toList v')
This runs in 4 seconds (and also doesn't handle negatives)
Return to the Bytestring
So now we know we can make a more general program that's faster, what about making the ASCii -only version fast? No problem!
import qualified Data.ByteString.Lazy.Char8 as BS
import Data.Attoparsec.ByteString.Lazy (parse, Result(..))
import Data.Attoparsec.ByteString.Char8 (decimal, char)
import Control.Applicative ((<*), many)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as I
import Control.Monad.ST
parser = many (decimal <* char '\n')
main = do
numbers <- BS.readFile "rands"
case parse parser numbers of
Done t r | BS.null t -> writeFile "sorted" . unlines
. map show . vsort $ r
vsort :: [Int] -> [Int]
vsort l = runST $ do
let v = V.fromList l
m <- V.unsafeThaw v
I.sort m
v' <- V.unsafeFreeze m
return (V.toList v')
This runs in 2.3 seconds.
Producing a Test File
Just in case anyone's curious, my test file was produced by:
import Control.Monad.CryptoRandom
import Crypto.Random
main = do
g <- newGenIO :: IO SystemRandom
let rs = Prelude.take (2^20) (map abs (crandoms g) :: [Int])
writeFile "rands" (unlines $ map show rs)
If you're wondering why vsort isn't packaged in some easier form on Hackage... so am I.

In short, don't use read. Replace read with a function like this:
import Numeric
fastRead :: String -> Int
fastRead s = case readDec s of [(n, "")] -> n
I get a pretty fair speedup:
~/programming% time ./test.slow
./test.slow 9.82s user 0.06s system 99% cpu 9.901 total
~/programming% time ./test.fast
./test.fast 6.99s user 0.05s system 99% cpu 7.064 total
~/programming% time ./test.bytestring
./test.bytestring 4.94s user 0.06s system 99% cpu 5.026 total
Just for fun, the above results include a version that uses ByteString (and hence fails the "ready for the 21st century" test by totally ignoring the problem of file encodings) for ULTIMATE BARE-METAL SPEED. It also has a few other differences; for example, it ships out to the standard library's sort function. The full code is below.
import qualified Data.ByteString as BS
import Data.Attoparsec.ByteString.Char8
import Control.Applicative
import Data.List
parser = many (decimal <* char '\n')
reallyParse p bs = case parse p bs of
Partial f -> f BS.empty
v -> v
main = do
numbers <- BS.readFile "data"
case reallyParse parser numbers of
Done t r | BS.null t -> writeFile "sorted" . unlines . map show . sort $ r

More a Pythonista than a Haskellite, but I'll take a stab:
There's a fair bit of overhead in your measured runtime just reading and writing the files, which is probably pretty similar between the two programs. Also, be careful that you've warmed up the cache for both programs.
Most of your time is spent making copies of lists and fragments of lists. Python list operations are heavily optimized, being one of the most-frequently used parts of the language, and list comprehensions are usually pretty performant too, spending much of their time in C-land inside the Python interpreter. There is not a lot of the stuff that is slowish in Python but wicked fast in static languages, such as attribute lookups on object instances.
Your Python implementation throws away numbers that are equal to the pivot, so by the end it may be sorting fewer items, giving it an obvious advantage. (If there are no duplicates in the data set you're sorting, this isn't an issue.) Fixing this bug probably requires making another copy of most of the list in each call to qs(), which would slow Python down a little more.
You don't mention what version of Python you're using. If you're using 2.x, you could probably get Haskell to beat Python just by switching to Python 3.x. :-)
I'm not too surprised the two languages are basically neck-and-neck here (a 10% difference is not noteworthy). Using C as a performance benchmark, Haskell loses some performance for its lazy functional nature, while Python loses some performance due to being an interpreted language. A decent match.
Since Daniel Wagner posted an optimized Haskell version using the built-in sort, here's a similarly optimized Python version using list.sort():
mylist = [int(x.strip()) for x in open("data")]
mylist.sort()
open("sorted", "w").write("\n".join(str(x) for x in mylist))
3.5 seconds on my machine, vs. about 9 for the original code. Pretty much still neck-and-neck with the optimized Haskell. Reason: it's spending most of its time in C-programmed libraries. Also, TimSort (the sort used in Python) is a beast.

This is after the fact, but I think most of the trouble is in the Haskell writing. The following module is pretty primitive -- one should use builders probably and certainly avoid the ridiculous roundtrip via String for showing -- but it is simple and did distinctly better than pypy with kindall's improved python and better than the 2 and 4 sec Haskell modules elsewhere on this page (it surprised me how much they were using lists, so I made a couple more turns of the crank.)
$ time aa.hs real 0m0.709s
$ time pypy aa.py real 0m1.818s
$ time python aa.py real 0m3.103s
I'm using the sort recommended for unboxed vectors from vector-algorithms. The use of Data.Vector.Unboxed in some form is clearly now the standard, naive way of doing this sort of thing -- it's the new Data.List (for Int, Double, etc.) Everything but the sort is irritating IO management, which could I think still be massively improved, on the write end in particular. The reading and sorting together take about 0.2 sec as you can see from asking it to print what's at a bunch of indexes instead of writing to file, so twice as much time is spent writing as in anything else. If the pypy is spending most of its time using timsort or whatever, then it looks like the sorting itself is surely massively better in Haskell, and just as simple -- if you can just get your hands on the darned vector...
I'm not sure why there aren't convenient functions around for reading and writing vectors of unboxed things from natural formats -- if there were, this would be three lines long and would avoid String and be much faster, but maybe I just haven't seen them.
import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed.Mutable as M
import qualified Data.Vector.Unboxed as V
import Data.Vector.Algorithms.Radix
import System.IO
main = do unsorted <- fmap toInts (BL.readFile "data")
vec <- V.thaw unsorted
sorted <- sort vec >> V.freeze vec
withFile "sorted" WriteMode $ \handle ->
V.mapM_ (writeLine handle) sorted
writeLine :: Handle -> Int -> IO ()
writeLine h int = B.hPut h $ B.pack (show int ++ "\n")
toInts :: BL.ByteString -> V.Vector Int
toInts bs = V.unfoldr oneInt (BL.cons ' ' bs)
oneInt :: BL.ByteString -> Maybe (Int, BL.ByteString)
oneInt bs = if BL.null bs then Nothing else
let bstail = BL.tail bs
in if BL.null bstail then Nothing else BL.readInt bstail

To follow up #kindall interesting answer, those timings are dependent from both the python / Haskell implementation you use, the hardware configuration on which you run the tests, and the algorithm implementation you right in both languages.
Nevertheless we can try to get some good hints of the relative performances of one language implementation compared to another, or from one language to another language. With well known alogrithms like qsort, it's a good beginning.
To illustrate a python/python comparison, I just tested your script on CPython 2.7.3 and PyPy 1.8 on the same machine:
CPython: ~8s
PyPy: ~2.5s
This shows there can be room for improvements in the language implementation, maybe compiled Haskell is not performing at best the interpretation and compilation of your corresponding code. If you are searching for speed in Python, consider also to switch to pypy if needed and if your covering code permits you to do so.

i noticed some problem everybody else didn't notice for some reason; both your haskell and python code have this. (please tell me if it's fixed in the auto-optimizations, I know nothing about optimizations). for this I will demonstrate in haskell.
in your code you define the lesser and greater lists like this:
where lesser = filter (<p) xs
greater = filter (>=p) xs
this is bad, because you compare with p each element in xs twice, once for getting in the lesser list, and again for getting in the greater list. this (theoretically; I havn't checked timing) makes your sort use twice as much comparisons; this is a disaster. instead, you should make a function which splits a list into two lists using a predicate, in such a way that
split f xs
is equivalent to
(filter f xs, filter (not.f) xs)
using this kind of function you will only need to compare each element in the list once to know in which side of the tuple to put it.
okay, lets do it:
where
split :: (a -> Bool) -> [a] -> ([a], [a])
split _ [] = ([],[])
split f (x:xs)
|f x = let (a,b) = split f xs in (x:a,b)
|otherwise = let (a,b) = split f xs in (a,x:b)
now lets replace the lesser/greater generator with
let (lesser, greater) = split (p>) xs in (insert function here)
full code:
quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (p:xs) =
let (lesser, greater) = splitf (p>) xs
in (quicksort lesser) ++ [p] ++ (quicksort greater)
where
splitf :: (a -> Bool) -> [a] -> ([a], [a])
splitf _ [] = ([],[])
splitf f (x:xs)
|f x = let (a,b) = splitf f xs in (x:a,b)
|otherwise = let (a,b) = splitf f xs in (a,x:b)
for some reason I can't right the getter/lesser part in the where clauses so I had to right it in let clauses.
also, if it is not tail-recursive let me know and fix it for me (I don't know yet how tail-recorsive works fully)
now you should do the same for the python code. I don't know python so I can't do it for you.
EDIT:
there actually happens to already be such function in Data.List called partition. note this proves the need for this kind of function because otherwise it wouldn't be defined.
this shrinks the code to:
quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (p:xs) =
let (lesser, greater) = partition (p>) xs
in (quicksort lesser) ++ [p] ++ (quicksort greater)

Python is really optimized for this sort of thing. I suspect that Haskell isn't. Here's a similar question that provides some very good answers.

Space explosion when folding over Producers/Parsers in Haskell

Supposing I have a module like this:
module Explosion where
import Pipes.Parse (foldAll, Parser, Producer)
import Pipes.ByteString (ByteString, fromLazy)
import Pipes.Aeson (DecodingError)
import Pipes.Aeson.Unchecked (decoded)
import Data.List (intercalate)
import Data.ByteString.Lazy.Char8 (pack)
import Lens.Family (view)
import Lens.Family.State.Strict (zoom)
produceString :: Producer ByteString IO ()
produceString = fromLazy $ pack $ intercalate " " $ map show [1..1000000]
produceInts ::
Producer Int IO (Either (DecodingError, Producer ByteString IO ()) ())
produceInts = view decoded produceString
produceInts' :: Producer Int IO ()
produceInts' = produceInts >> return ()
parseBiggest :: Parser ByteString IO Int
parseBiggest = zoom decoded (foldAll max 0 id)
The 'produceString' function is a bytestring producer, and I am concerned with folding a parse over it to produce some kind of result.
The following two programs show different ways of tackling the problem of finding the maximum value in the bytestring by parsing it as a series of JSON ints.
Program 1:
module Main where
import Explosion (produceInts')
import Pipes.Prelude (fold)
main :: IO ()
main = do
biggest <- fold max 0 id produceInts'
print $ show biggest
Program 2:
module Main where
import Explosion (parseBiggest, produceString)
import Pipes.Parse (evalStateT)
main :: IO ()
main = do
biggest <- evalStateT parseBiggest produceString
print $ show biggest
Unfortunately, both programs eat about 200MB of memory total when I profile them, a problem I'd hoped the use of streaming parsers would solve. The first program spends most of its time and memory (> 70%) in (^.) from Lens.Family, while the second spends it in fmap, called by zoom from Lens.Family.State.Strict. The usage graphs are below. Both programs spend about 70% of their time doing garbage collection.
Am I doing something wrong? Is the Prelude function max not strict enough? I can't tell if the library functions are bad, or if I'm using the library wrong! (It's probably the latter.)
For completeness, here's a git repo that you can clone and run cabal install in if you'd like to see what I'm talking about first-hand, and here's the memory usage of the two programs:

Wrapping a strict bytestring in a single yield doesn't make it lazy. You have to yield smaller chunks to get any streaming behavior.
Edit: I found the error. pipes-aeson internally uses a consecutively function defined like this:
consecutively parser = step where
step p0 = do
(mr, p1) <- lift $
S.runStateT atEndOfBytes (p0 >-> PB.dropWhile B.isSpaceWord8)
case mr of
Just r -> return (Right r)
Nothing -> do
(ea, p2) <- lift (S.runStateT parser p1)
case ea of
Left e -> return (Left (e, p2))
Right a -> yield a >> step p2
The problematic line is the one with PB.dropWhile. This adds a quadratic blow up proportional to the number of parsed elements.
What happens is that the pipe that is threaded through this computation accumulates a new cat pipe downstream of it after each parse. So after N parses you get N cat pipes, which adds O(N) overhead to each parsed element.
I've created a Github issue to fix this. pipes-aeson is maintained by Renzo and he has fixed this issue before.
Edit: I've submitted a pull request to fix a second problem (you needed to use the intercalate for lazy bytestrings). Now the program runs in 5 KB constant space for both versions:

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string