Suppose I have a tree represented as a list of parents and I want to reverse the edges, obtaining a list of children for each node. For this tree - http://i.stack.imgur.com/uapqT.png - transformation would look like:
[0,0,0,1,1,2,5,4,4] -> [[2,1],[4,3],[5],[],[8,7],[6],[],[],[]]
But it's not limited to graph transposing, however. I have a few other problems that I would solve in imperative language in the following way: traverse some source data array and non-sequentially update a resulting array as I get to know something about it.
Essentially, my question is "what is Haskell's idiomatic way to solve things like this?". As I understand, I can do it in imperative way by means of mutable vectors, but isn't there some purely functional method? If not, how would I properly use mutables?
Finally, I need it to work fast, that is O(n) complexity, and non-standard packages are not an option for me.
It's worth to consider the pure functions in Data.Vector or Data.Array that internally use mutation, in order to be more efficient (the accum-s in both libraries, plus the unfolds and construct-s in vector).
The accum-s are great when we don't care about intermediate states of an array during construction. They're nicely applicable for transposing graphs, although we have to provide a range for the node keys:
{-# LANGUAGE TupleSections #-}
import qualified Data.Array as A
type Graph = [(Int, [Int])]
transpose :: (Int, Int) -> Graph -> Graph
transpose range g =
A.assocs $ A.accumArray (flip (:)) [] range (do {(i, ns) <- g; map (,i) ns})
Here we first unroll the graph into an adjacency list, but with swapped pairs of indices, and then accumulate them into an array. It's roughly as fast as a standard imperative loop over a mutable array, and it's more convenient than the ST monad.
Alternatively, we can just use IntMap, likely alongside the State monad, and just port our imperative algorithms as they are, and the performance will be satisfactory for most purposes.
Fortunately IntMap provides a lot of higher-order functions, so we're not (always) forced to program in an imperative style with it. There's an analogue for accum, for instance:
import qualified Data.IntMap.Strict as IM
transpose :: Graph -> Graph
transpose g =
IM.assocs $ IM.fromListWith (++) (do {(i, ns) <- g; (i,[]) : map (,[i]) ns})
A purely functional way would be to use a map to store the information, producing O(n log n) algorithm:
import qualified Data.IntMap as IM
import Data.Maybe (fromMaybe)
childrenMap :: [Int] -> IM.IntMap [Int]
childrenMap xs = foldr addChild IM.empty $ zip xs [0..]
where
addChild :: (Int, Int) -> IM.IntMap [Int] -> IM.IntMap [Int]
addChild (parent, child) = IM.alter (Just . (child :) . fromMaybe []) parent
You could also use an imperative solution and keep things pure using the ST monad, which is obviously O(n), but the imperative code somewhat obscures the main idea:
import Control.Monad (forM_)
import Data.Array
import Data.Array.MArray
import Data.Array.ST
childrenST :: [Int] -> [[Int]]
childrenST xs = elems $ runSTArray $ do
let l = length xs
arr <- newArray (0, l - 1) []
let add (parent, child) =
writeArray arr parent . (child :) =<< readArray arr parent
forM_ (zip xs [0..]) add
return arr
One drawback of this approach is that an index is out of bounds, it just fails.
Another is that you traverse the list twice. However, if you used arrays instead of lists everywhere, this wouldn't matter.
Related
In Haskell we have list generators, such as:
[x+y | x<-[1,2,3], y<-[1,2,3]]
with which we get
[2,3,4,3,4,5,4,5,6]
Is it possible to have a set generator which doesn't automatically add an element if it is already in the list?
In our example we would obtain:
[2,3,4,5,6]
If so, how? If it is not already implemented, how would you implement it?
Haskell can do this, but not quite out-of-the-box.
The basic underpinning is that a list comprehension can also be written as a monadic binding chain, in the list monad:
Prelude> [x+y | x<-[1,2,3], y<-[1,2,3]]
[2,3,4,3,4,5,4,5,6]
Prelude> [1,2,3] >>= \x -> [1,2,3] >>= \y -> return (x+y)
[2,3,4,3,4,5,4,5,6]
...or, with better readable do-syntax (which is syntactic sugar for monadic binding)
Prelude> do x<-[1,2,3]; y<-[1,2,3]; return (x+y)
[2,3,4,3,4,5,4,5,6]
In fact, there's a language extension that also turns all list comprehensions into syntactic sugar for such a monadic chain. Example in the tuple (aka writer) monad:
Prelude> :set -XMonadComprehensions
Prelude> [x+y | x <- ("Hello", 4), y <- ("World", 5)] :: (String, Int)
("HelloWorld",9)
So really, all we need is a set monad. This is sensible enough, however Data.Set.Set is not a monad on Hask (the category of all Haskell types) but only only the subcategory that satisfies the Ord constraint (which is needed for lookup / to avoid duplicates). In the case of sets, there is however a hack that allows hiding that constraint from the actual monad instance; it's used in the set-monad package. Et voilĂ :
Prelude Data.Set.Monad> [x+y | x<-fromList[1,2,3], y<-fromList[1,2,3]]
fromList [2,3,4,5,6]
The hack that's needed for instance Monad Set comes at a price. It works like this:
{-# LANGUAGE GADTs, RankNTypes #-}
import qualified Data.Set as DS
data Set r where
Prim :: (Ord r => DS.Set r) -> Set r
Return :: a -> Set a
Bind :: Set a -> (a -> Set b) -> Set b
...
What this means is: a value of type Data.Set.Monad.Set Int does not really contain a concrete set of integers. Rather, it contains an abstract syntax expression for a computation that yields a set as the result. This isn't great for performance, in particular it means that values won't be shared. So don't use this for big sets.
There's a better option: use it directly as a monad in the proper category (which only contains orderable types to begin with). This unfortunately requires even more language-bending, but it's possible; I've made an example in the constrained-categories library.
If your values can be put in Data.Set.Set (i.e. they are in class Ord) you can just apply Data.Set.toList . Data.Set.fromList to your list:
Prelude> import Data.Set
Prelude Data.Set> Data.Set.toList . Data.Set.fromList $ [x+y | x<-[1,2,3], y<-[1,2,3]]
[2,3,4,5,6]
The complexity of this would be O(n log n).
If the type obeys (Eq a, Hashable a), you can use Data.HashSet in much the same way. The average complexity is O(n).
If all you have is Eq, you have to get by with something like Data.List.nub:
Prelude> import Data.List
Prelude Data.List> nub [x+y | x<-[1,2,3], y<-[1,2,3]]
[2,3,4,5,6]
but the complexity is inherently quadratic.
As part of learning how to work with StateT and the nondeterminism monad, I'd like to write a function which uses these to enumerate the partitions of an integer (while being allowed to reuse integers). For example, passing an argument of 4 should result in [[1,1,1,1],[1,1,2],[2,2],[1,3],[4]] (uniqueness doesn't matter, I'm more concerned with just getting to working code).
(Also, I'm aware that there's a recursive solution for generating partitions as well as dynamic programming and generating function based solutions for counting partitions - the purpose of this exercise is to construct a minimal working example that combines StateT and [].)
Here's my attempt that was designed to work on any input less than or equal to 5:
{-# LANGUAGE NoImplicitPrelude #-}
{-# OPTIONS_GHC -Wall #-}
import CorePrelude
import Control.Monad.State.Lazy
sumState :: StateT Int [] [Int]
sumState = do
m <- lift [1..5]
n <- get <* modify (-m+)
case compare n 0 of
LT -> mzero
EQ -> return [m]
GT -> fmap (n:) sumState
runner :: Int -> [([Int],Int)]
runner = runStateT sumState
I'm using runStateT rather than evalStateT to help with debugging (it's helpful to see the final state values). Like I said, I'm not too worried about generating unique partitions since I'd first like to just understand the correct way to use these two monads together.
Loading it in GHCi and evaluating runner 4 results in the following and I'm confused as to why the above code produces this output.
[([4,3,2,1,1],-1),([4,3,2,1,2],-2),([4,3,2,1,3],-3),([4,3,2,1,4],-4),([4,3,2,1,5],-5),([4,3,2,1],-1),([4,3,2,2],-2),([4,3,2,3],-3),([4,3,2,4],-4),([4,3,2,5],-5),([4,3,1,1],-1),([4,3,1,2],-2),([4,3,1,3],-3),([4,3,1,4],-4),([4,3,1,5],-5),([4,3,1],-1),([4,3,2],-2),([4,3,3],-3),([4,3,4],-4),([4,3,5],-5),([4,2,1,1],-1),([4,2,1,2],-2),([4,2,1,3],-3),([4,2,1,4],-4),([4,2,1,5],-5),([4,2,1],-1),([4,2,2],-2),([4,2,3],-3),([4,2,4],-4),([4,2,5],-5),([4,1,1],-1),([4,1,2],-2),([4,1,3],-3),([4,1,4],-4),([4,1,5],-5),([4,1],-1),([4,2],-2),([4,3],-3),([4,4],-4),([4,5],-5)]
What am I doing wrong? What's the correct way to combine StateT and [] in order to enumerate partitions?
You just have two little mistakes. The first is here:
n <- get <* modify (-m+)
This gets the value of n before we subtract m. You almost certainly want
n <- modify (-m+) >> get
instead, or
modify (-m+)
n <- get
if you prefer that spelling. The other is that you're putting the current state in the list instead of the value you're adding in the GT branch:
GT -> fmap (n:) sumState
Change that to
GT -> fmap (m:) sumState
and you're golden:
*Main> runner 4
[([1,1,1,1],0),([1,1,2],0),([1,2,1],0),([1,3],0),([2,1,1],0),([2,2],0),([3,1],0),([4],0)]
This is my FIRST haskell program! "wordCount" takes in a list of words and returns a tuple with with each case-insensitive word paired with its usage count. Any suggestions for improvement on either code readability or performance?
import List;
import Char;
uniqueCountIn ns xs = map (\x -> length (filter (==x) xs)) ns
nubl (xs) = nub (map (map toLower) xs) -- to lowercase
wordCount ws = zip ns (uniqueCountIn ns ws)
where ns = nubl ws
Congrats on your first program!
For cleanliness: lose the semicolons. Use the new hierarchical module names instead (Data.List, Data.Char). Add type signatures. As you get more comfortable with function composition, eta contract your function definitions (remove rightmost arguments). e.g.
nubl :: [String] -> [String]
nubl = nub . map (map toLower)
If you want to be really rigorous, use explicit import lists:
import Data.List (nub)
import Data.Char (toLower)
For performance: use a Data.Map to store the associations instead of nub and filter. In particular, see fromListWith and toList. Using those functions you can simplify your implementation and improve performance at the same time.
One of the ways to improve readibility is to try to get used to the standard functions. Hoogle is one of the tools that sets Haskell apart from the rest of the world ;)
import Data.Char (toLower)
import Data.List (sort, group)
import Control.Arrow ((&&&))
wordCount :: String -> [(String, Int)]
wordCount = map (head &&& length) . group . sort . words . map toLower
EDIT: Explanation: So you think of it as a chain of mappings:
(map toLower) :: String -> String lowercases the entire text, for the purpose of case
insensitivity
words :: String -> [String] splits a piece of text into words
sort :: Ord a => [a] -> [a] sorts
group :: Eq a => [a] -> [[a]] gathers identicial elements in a list, for example, group
[1,1,2,3,3] -> [[1,1],[2],[3,3]]
&&& :: (a -> b) -> (a -> c) -> (a -> (b, c)) applies two functions on the same piece of data, then returns
the tuple of results. For example: (head &&& length) ["word","word","word"] -> ("word", 3) (actually &&& is a little more general, but the simplified explanation works for this example)
EDIT: Or actually, look for the "multiset" package on Hackage.
It is always good to ask more experienced developers for feedback. Nevertheless you could use hlint to get feedback on some small scale issues. It'll tell you about hierarchical imports, unncessary parenthesis, alternative higher-order functions, etc.
Regarding the function, nub1. If you don't follow luqui's advice to remove the parameter altogether yet, I would at least remove the parenthesis around xs on the right side of the equation.
Is it possible to implement a quicksort in Haskell (with RANDOM-PIVOT) that still has a simple Ord a => [a]->[a] signature?
I'm starting to understand Monads, and, for now, I'm kind of interpreting monads as somethink like a 'command pattern', which works great for IO.
So, I understand that a function that returns a random number should actually return a monadic value like IO, because, otherwise, it would break referential transparency. I also understand that there should be no way to 'extract' the random integer from the returned monadic value, because, otherwise, it would, again, break referential transparency.
But yet, I still think that it should be possible to implement a 'pure' [a]->[a] quicksort function, even if it uses random pivot, because, it IS referential transparent. From my point of view, the random pivot is just a implementation detail, and shouldn't change the function's signature
OBS: I'm not actually interested in the specific quicksort problem (so, I don't want to sound rude but I'm not looking for "use mergesort" or "random pivot doesn't increase performance in practice" kind of answers) I'm actually interested in how to implement a 'pure' function that uses 'impure' functions inside it, in cases like quicksort, where I can assure that the function actually is a pure one.
Quicksort is just a good example.
You are making a false assumption that picking the pivot point is just an implementation detail. Consider a partial ordering on a set. Like a quicksort on cards where
card a < card b if the face value is less but if you were to evaluate booleans:
4 spades < 4 hearts (false)
4 hearts < 4 spades (false)
4 hearts = 4 spades (false)
In that case the choice of pivots would determine the final ordering of the cards. In precisely the same way
for a function like
a = get random integer
b = a + 3
print b
is determined by a. If you are randomly choosing something then your computation is or could be non deterministic.
OK, check this out.
Select portions copied form the hashable package, and voodoo magic language pragmas
{-# LANGUAGE FlexibleInstances, UndecidableInstances, NoMonomorphismRestriction, OverlappingInstances #-}
import System.Random (mkStdGen, next, split)
import Data.List (foldl')
import Data.Bits (shiftL, xor)
class Hashable a where
hash :: a -> Int
instance (Integral a) => Hashable a where
hash = fromIntegral
instance Hashable Char where
hash = fromEnum
instance (Hashable a) => Hashable [a] where
hash = foldl' combine 0 . map hash
-- ask the authors of the hashable package about this if interested
combine h1 h2 = (h1 + h1 `shiftL` 5) `xor` h2
OK, so now we can take a list of anything Hashable and turn it into an Int. I've provided Char and Integral a instances here, more and better instances are in the hashable packge, which also allows salting and stuff.
This is all just so we can make a number generator.
genFromHashable = mkStdGen . hash
So now the fun part. Let's write a function that takes a random number generator, a comparator function, and a list. Then we'll sort the list by consulting the generator to select a pivot, and the comparator to partition the list.
qSortByGen _ _ [] = []
qSortByGen g f xs = qSortByGen g'' f l ++ mid ++ qSortByGen g''' f r
where (l, mid, r) = partition (`f` pivot) xs
pivot = xs !! (pivotLoc `mod` length xs)
(pivotLoc, g') = next g
(g'', g''') = split g'
partition f = foldl' step ([],[],[])
where step (l,mid,r) x = case f x of
LT -> (x:l,mid,r)
EQ -> (l,x:mid,r)
GT -> (l,mid,x:r)
Library functions: next grabs an Int from the generator, and produces a new generator. split forks the generator into two distinct generators.
My functions: partition uses f :: a -> Ordering to partition the list into three lists. If you know folds, it should be quite clear. (Note that it does not preserve the initial ordering of the elements in the sublists; it reverses them. Using a foldr could remedy this were it an issue.) qSortByGen works just like I said before: consult the generator for the pivot, partition the list, fork the generator for use in the two recursive calls, recursively sort the left and right sides, and concatenate it all together.
Convenience functions are easy to compose from here
qSortBy f xs = qSortByGen (genFromHashable xs) f xs
qSort = qSortBy compare
Notice the final function's signature.
ghci> :t qSort
qSort :: (Ord a, Hashable a) => [a] -> [a]
The type inside the list must implement both Hashable and Ord. There's the "pure" function you were asking for, with one logical added requirement. The more general functions are less restrictive in their requirements.
ghci> :t qSortBy
qSortBy :: (Hashable a) => (a -> a -> Ordering) -> [a] -> [a]
ghci> :t qSortByGen
qSortByGen
:: (System.Random.RandomGen t) =>
t -> (a -> a -> Ordering) -> [a] -> [a]
Final notes
qSort will behave exactly the same way for all inputs. The "random" pivot selection is. in fact, deterministic. But it is obscured by hashing the list and then seeding a random number generator, making it "random" enough for me. ;)
qSort also only works for lists with length less than maxBound :: Int, which ghci tells me is 9,223,372,036,854,775,807. I thought there would be an issue with negative indexes, but in my ad-hoc testing I haven't run into it yet.
Or, you can just live with the IO monad for "truer" randomness.
qSortIO xs = do g <- getStdGen -- add getStdGen to your imports
return $ qSortByGen g compare xs
ghci> :t qSortIO
qSortIO :: (Ord a) => [a] -> IO [a]
ghci> qSortIO "Hello world"
" Hdellloorw"
ghci> qSort "Hello world"
" Hdellloorw"
In such cases, where you know that the function is referentially transparent, but you can't proof it to the compiler, you may use the function unsafePerformIO :: IO a -> a from the module Data.Unsafe.
For instance, you may use unsafePerformIO to get an initial random state and then do anything using just this state.
But please notice: Don't use it if it's not really needed. And even then, think twice about it. unsafePerformIO is somewhat the root of all evil, since it's consequences can be dramatical - anything is possible from coercing different types to crashing the RTS using this function.
Haskell provides the ST monad to perform non-referentially-transparent actions with a referentially transparent result.
Note that it doesn't enforce referential transparency; it just insures that potentially non-referentially-transparent temporary state can't leak out. Nothing can prevent you from returning manipulated pure input data that was rearranged in a non-reproducible way. Best is to implement the same thing in both ST and pure ways and use QuickCheck to compare them on random inputs.
As an exercise in Haskell, I'm trying to implement heapsort. The heap is usually implemented as an array in imperative languages, but this would be hugely inefficient in purely functional languages. So I've looked at binary heaps, but everything I found so far describes them from an imperative viewpoint and the algorithms presented are hard to translate to a functional setting. How to efficiently implement a heap in a purely functional language such as Haskell?
Edit: By efficient I mean it should still be in O(n*log n), but it doesn't have to beat a C program. Also, I'd like to use purely functional programming. What else would be the point of doing it in Haskell?
There are a number of Haskell heap implementations in an appendix to Okasaki's Purely Functional Data Structures. (The source code can be downloaded at the link. The book itself is well worth reading.) None of them are binary heaps, per se, but the "leftist" heap is very similar. It has O(log n) insertion, removal, and merge operations. There are also more complicated data structures like skew heaps, binomial heaps, and splay heaps which have better performance.
Jon Fairbairn posted a functional heapsort to the Haskell Cafe mailing list back in 1997:
http://www.mail-archive.com/haskell#haskell.org/msg01788.html
I reproduce it below, reformatted to fit this space. I've also slightly simplified the code of merge_heap.
I'm surprised treefold isn't in the standard prelude since it's so useful. Translated from the version I wrote in Ponder in October 1992 -- Jon Fairbairn
module Treefold where
-- treefold (*) z [a,b,c,d,e,f] = (((a*b)*(c*d))*(e*f))
treefold f zero [] = zero
treefold f zero [x] = x
treefold f zero (a:b:l) = treefold f zero (f a b : pairfold l)
where
pairfold (x:y:rest) = f x y : pairfold rest
pairfold l = l -- here l will have fewer than 2 elements
module Heapsort where
import Treefold
data Heap a = Nil | Node a [Heap a]
heapify x = Node x []
heapsort :: Ord a => [a] -> [a]
heapsort = flatten_heap . merge_heaps . map heapify
where
merge_heaps :: Ord a => [Heap a] -> Heap a
merge_heaps = treefold merge_heap Nil
flatten_heap Nil = []
flatten_heap (Node x heaps) = x:flatten_heap (merge_heaps heaps)
merge_heap heap Nil = heap
merge_heap node_a#(Node a heaps_a) node_b#(Node b heaps_b)
| a < b = Node a (node_b: heaps_a)
| otherwise = Node b (node_a: heaps_b)
You could also use the ST monad, which allows you to write imperative code but expose a purely functional interface safely.
As an exercise in Haskell, I implemented an imperative heapsort with the ST Monad.
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Monad (forM, forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.MArray (newListArray, readArray, writeArray)
import Data.Array.ST (STArray)
import Data.STRef (newSTRef, readSTRef, writeSTRef)
heapSort :: forall a. Ord a => [a] -> [a]
heapSort list = runST $ do
let n = length list
heap <- newListArray (1, n) list :: ST s (STArray s Int a)
heapSizeRef <- newSTRef n
let
heapifyDown pos = do
val <- readArray heap pos
heapSize <- readSTRef heapSizeRef
let children = filter (<= heapSize) [pos*2, pos*2+1]
childrenVals <- forM children $ \i -> do
childVal <- readArray heap i
return (childVal, i)
let (minChildVal, minChildIdx) = minimum childrenVals
if null children || val < minChildVal
then return ()
else do
writeArray heap pos minChildVal
writeArray heap minChildIdx val
heapifyDown minChildIdx
lastParent = n `div` 2
forM_ [lastParent,lastParent-1..1] heapifyDown
forM [n,n-1..1] $ \i -> do
top <- readArray heap 1
val <- readArray heap i
writeArray heap 1 val
writeSTRef heapSizeRef (i-1)
heapifyDown 1
return top
btw I contest that if it's not purely functional then there is no point in doing so in Haskell. I think my toy implementation is much nicer than what one would achieve in C++ with templates, passing around stuff to the inner functions.
And here is a Fibonacci Heap in Haskell:
https://github.com/liuxinyu95/AlgoXY/blob/algoxy/datastruct/heap/other-heaps/src/FibonacciHeap.hs
Here are the pdf file for some other k-ary heaps based on Okasaki's work.
https://github.com/downloads/liuxinyu95/AlgoXY/kheap-en.pdf
Just like in efficient Quicksort algorithms written in Haskell, you need to use monads (state transformers) to do stuff in-place.
Arrays in Haskell aren't as hugely inefficient as you might think, but typical practice in Haskell would probably be to implement this using ordinary data types, like this:
data Heap a = Empty | Heap a (Heap a) (Heap a)
fromList :: Ord a => [a] -> Heap a
toSortedList :: Ord a => Heap a -> [a]
heapSort = toSortedList . fromList
If I were solving this problem, I might start by stuffing the list elements into an array, making it easier to index them for heap creation.
import Data.Array
fromList xs = heapify 0 where
size = length xs
elems = listArray (0, size - 1) xs :: Array Int a
heapify n = ...
If you're using a binary max heap, you might want to keep track of the size of the heap as you remove elements so you can find the bottom right element in O(log N) time. You could also take a look at other types of heaps that aren't typically implemented using arrays, like binomial heaps and fibonacci heaps.
A final note on array performance: in Haskell there's a tradeoff between using static arrays and using mutable arrays. With static arrays, you have to create new copies of the arrays when you change the elements. With mutable arrays, the garbage collector has a hard time keeping different generations of objects separated. Try implementing the heapsort using an STArray and see how you like it.
I tried to port standard binary heap into functional settings. There is an article with described idea: A Functional Approach to Standard Binary Heaps. All the source code listings in the article are in Scala. But it might be ported very easy into any other functional language.
Here is a page containing an ML version of HeapSort. It's quite detailed and should provide a good starting point.
http://flint.cs.yale.edu/cs428/coq/doc/Reference-Manual021.html