I am trying to construct a lazy data structure that holds an infinite bitmap. I would like to support the following operations:
true :: InfBitMap
Returns an infinite bitmap of True, i.e. all positions should have value True.
falsify :: InfBitMap -> [Int] -> InfBitMap
Set all positions in the list to False. The list is possibly infinite. For example, falsify true [0,2..] will return a bitmap where all (and only) odd positions are True.
check :: InfBitMap -> Int -> Bool
Check the value of the index.
Here is what I have so far.
import Data.List (find, foldl')
import Data.Maybe (fromJust)
import Data.Sequence (Seq, (><), index, singleton, update)

-- InfBitMap will look like [(#), (#, #), (#, #, #, #), ..]
type InfBitMap = [Seq Bool]

true :: InfBitMap
true = iterate (\x -> x >< x) $ singleton True
-- O(L * log N), where N is the biggest index later checked and L is the
-- length of the index list. The list is assumed to be sorted and unique.
falsify :: InfBitMap -> [Int] -> InfBitMap
falsify ls is = map (falsify' is) ls
  where
    -- Updates one sequence with all indices that fall within its length.
    -- Basically composes a list of (update pos False) for all positions
    -- within the length of the sequence and then applies it.
    falsify' is l = foldl' (.) id
                           (map (flip update False)
                                (takeWhile (< length l) is))
                           l
-- O(log N) where N is the index.
check :: InfBitMap -> Int -> Bool
check ls i = index (fromJust $ find ((> i) . length) ls) i
I am wondering if there is some Haskellish concept/data-structure that I am missing that would make my code more elegant / more efficient (constants do not matter to me, just order). I tried looking at Zippers and Lenses but they do not seem to help. I would like to keep the complexities of updates and checks logarithmic (maybe just amortized logarithmic).
Note: before someone suspects it, no this is not a homework problem!
Update:
It just occurred to me that check can be improved to:
-- O(log N) where N is the index.
-- Returns a "collapsed" bitmap for later, more efficient checks.
check :: InfBitMap -> Int -> (Bool, InfBitMap)
check ls i = (index l i, ls')
  where
    ls'@(l:_) = dropWhile ((<= i) . length) ls
This can be wrapped in a State monad for code cleanliness.
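For instance, here is a minimal sketch of that State wrapping, assuming mtl's Control.Monad.State (checkM is a hypothetical name):

import Control.Monad.State
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

type InfBitMap = [Seq Bool]

-- Threads the "collapsed" bitmap through the state, so that repeated
-- checks keep benefiting from earlier collapsing.
checkM :: Int -> State InfBitMap Bool
checkM i = state $ \ls ->
  let ls'@(l:_) = dropWhile ((<= i) . Seq.length) ls
  in (Seq.index l i, ls')

Checks in any order remain correct, e.g. evalState (mapM checkM [5,0,3]) true.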
A slight variation on the well-known integer trie seems to be applicable here.
{-# LANGUAGE DeriveFunctor #-}

data Trie a = Trie a (Trie a) (Trie a) deriving (Functor)

true :: Trie Bool
true = Trie True true true

-- O(log(index))
check :: Trie a -> Int -> a
check t i | i < 0 = error "negative index"
check t i = go t (i + 1) where
  go (Trie a _ _) 1 = a
  go (Trie _ l r) i = go (if even i then l else r) (div i 2)

-- O(log(index))
modify :: Trie a -> Int -> (a -> a) -> Trie a
modify t i f | i < 0 = error "negative index"
modify t i f = go t (i + 1) where
  go (Trie a l r) 1 = Trie (f a) l r
  go (Trie a l r) i | even i = Trie a (go l (div i 2)) r
  go (Trie a l r) i = Trie a l (go r (div i 2))
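As a quick sanity check (my own example), flipping one position leaves every other position untouched:

> check (modify true 5 not) 5
False
> check (modify true 5 not) 4
True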
Unfortunately we can't use modify to implement falsify because we can't handle infinite lists of indices that way (all modifications have to be performed before an element of the trie can be inspected). Instead, we should do something more like a merge:
ascIndexModify :: Trie a -> [(Int, a -> a)] -> Trie a
ascIndexModify t is = go 1 t is where
  go _ t [] = t
  go i t@(Trie a l r) ((i', f):is) = case compare i (i' + 1) of
    LT -> Trie a (go (2*i) l ((i', f):is)) (go (2*i+1) r ((i', f):is))
    GT -> go i t is
    EQ -> Trie (f a) (go (2*i) l is) (go (2*i+1) r is)
falsify :: Trie Bool -> [Int] -> Trie Bool
falsify t is = ascIndexModify t [(i, const False) | i <- is]
We assume strictly ascending indices in is, since otherwise we would skip places in the trie or even get non-termination, for example in check (falsify t (repeat 0)) 1.
The time complexities are a bit complicated by laziness. In check (falsify t is) index, we pay an additional cost proportional to log2(index) comparisons, plus a further length (filter (< index) is) comparisons (i.e. the cost of stepping over all indices smaller than the one we're looking up). You could say it's O(max(log(index), length(filter (< index) is))). In any case, it's definitely better than the O(length is * log(index)) we would get for a falsify implemented for finite is-es using modify.
We must keep in mind that tree nodes are evaluated at most once, so subsequent checks of the same index after the first one pay no extra cost for falsify. Again, laziness makes this a bit complicated.
This falsify is also pretty well-behaved when we want to traverse a prefix of a trie. Take this trieToList function:
trieToList :: Trie a -> [a]
trieToList t = go [t] where
  go ts = [a | Trie a _ _ <- ts]
       ++ go (do { Trie _ l r <- ts; [l, r] })
It's a standard breadth-first traversal, in linear time. The traversal time remains linear when we compute take n $ trieToList (falsify t is), since falsify incurs at most n + length (filter (<n) is) extra comparisons, which is at most 2 * n, assuming strictly increasing is.
(side note: the space requirement of breadth-first traversal is rather painful, but I can't see a simple way to help it, since iterative deepening is even worse here, because there the whole tree must be held in memory, while bfs only has to remember the bottom level of the tree).
One way to represent this is as a function.
true = const True
falsify ls is = \i -> not (i `elem` is) && ls i
check ls i = ls i
The true and falsify functions are nice and efficient. The check function can be as bad as linear in the number of falsified indices, and it never terminates when is is infinite and i is not an element of it. It's possible to improve the efficiency of the same basic idea. I like its elegance.
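Here is a minimal sketch of one such improvement, under the extra assumption (as in the trie answer) that the falsified indices are strictly ascending; notElemAsc is a hypothetical helper:

-- Membership tests can stop at the first index >= i, so a check no
-- longer diverges on an infinite (ascending) index list.
notElemAsc :: Int -> [Int] -> Bool
notElemAsc x (y:ys)
  | y < x     = notElemAsc x ys
  | y == x    = False
  | otherwise = True
notElemAsc _ [] = True

true :: Int -> Bool
true = const True

falsify :: (Int -> Bool) -> [Int] -> (Int -> Bool)
falsify ls is = \i -> notElemAsc i is && ls i

Each check still scans every falsified index below i, so this only fixes termination, not the asymptotics.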
Related
I'd like to create a list data structure that can zipWith with better behavior under self-reference. This is for an esoteric language that relies on self-reference and laziness to be Turing complete using only values (no user functions). I've already created it, called Atlas, but it has many built-ins; I'd like to reduce those and be able to compile/interpret it in Haskell.
The issue is that zipWith checks whether either list is empty and returns empty if so. But when the result of the zipWith feeds back into one of its inputs, that check loops infinitely. Essentially I'd like it to detect this case and have faith that the list won't be empty. Here is an example using DList:
import Data.DList
import Data.List (uncons)

zipDL :: (a -> b -> c) -> DList a -> DList b -> DList c
zipDL f a b = fromList $ zipL f (toList a) (toList b)

zipL :: (a -> b -> c) -> [a] -> [b] -> [c]
zipL _ [] _ = []
zipL _ _ [] = []
zipL f ~(a:as) ~(b:bs) = f a b : zipL f as bs

a = fromList [5,6,7]

main = print dh
  where
    d = zipDL (+) a $ snoc (fromList dt) 0
    ~(Just (dh, dt)) = uncons $ toList d
This code would sum the list 5,6,7, except for the issue. It can be fixed by removing zipL _ _ [] = [], because then it assumes the result won't be empty, and it does in fact turn out to be non-empty. But this is a bad solution, because we can't always assume that it is the second list that has the self-reference.
Another way of explaining it is in terms of the sizes of these lists:
size (zip a b) = min (size a) (size b)
So in this example: size d = min (size a) (size d - 1 + 1)
And therein lies the problem: if the size of d is 0, then size d = 0 is consistent; a size of 1 is consistent too; but once the size of d is said to be greater than the size of a, the size would have to be the size of a, which is a contradiction. Any size from 0 to size a works, which means it is undefined.
Essentially I want to detect this case and make the size of d equal to the size of a.
So far the only thing I have figured out is to make all lists lists of Maybe, and to terminate lists with a Nothing value. Then, in the zipWith binary function, return Nothing if either value is Nothing. You can then take out both of the [] checks in zip, because you can think of all lists as being infinite. Finally, to make the summation example work, instead of doing a snoc, do a map that replaces any Nothing value with the snoc value. This works because the Nothing test on the second list can proceed lazily, since no value of that list can be Nothing.
Here is that code:
import Data.Maybe

data L a = L (Maybe a) (L a)

nil :: L a
nil = L Nothing nil

fromL :: [a] -> L a
fromL [] = nil
fromL (x:xs) = L (Just x) (fromL xs)

binOpMaybe :: (a -> b -> c) -> Maybe a -> Maybe b -> Maybe c
binOpMaybe f Nothing _ = Nothing
binOpMaybe f _ Nothing = Nothing
binOpMaybe f (Just a) (Just b) = Just (f a b)

zip2W :: (a -> b -> c) -> L a -> L b -> L c
zip2W f ~(L a as) ~(L b bs) = L (binOpMaybe f a b) (zip2W f as bs)

unconsL :: L a -> (Maybe a, Maybe (L a))
unconsL ~(L a as) = (a, Just as)

mapOr :: a -> L a -> L a
mapOr v ~(L a as) = L (Just $ fromMaybe v a) $ mapOr v as

main = print h
  where
    a = fromL [4,5,6]
    b = zip2W (+) a (mapOr 0 (fromJust t))
    (h, t) = unconsL b
The downside to this approach is that it needs this extra operator, a map with Just . fromMaybe initialvalue. This is a less intuitive operator than ++. Without it the language could be built entirely on ++, uncons, and (:[]), which would be pretty neat.
The other thing I've figured out is, in the current Ruby implementation, to throw an error when a value depends on itself, and to catch it in the empty-list detection. But this is very hacky and not entirely sound, although it does work for cases like this. I don't think this can work in Haskell, since I don't think you can detect self-dependence?
Sorry for the long description and the very odd use case. I've spent tons of time thinking about this, but haven't solved it yet and can't explain it any more succinctly! Not expecting an answer but figured it is worth a shot, thanks for considering.
EDIT:
After seeing it framed as a greatest fixed point question, it seems like a poor question because there is no efficient general solution to such a problem. For example, suppose the code was b = zipWith (+) a (if length b < 1 then [1] else []).
For my purposes it could still be nice to handle some cases correctly - the example provided does have a solution. So I could reframe the question as: when can we find the greatest fixed point efficiently and what is that fixed point? But I believe there is no simple answer to such a question, and so it would be a poor basis for a programming language to rely on ad hoc rules.
Sounds like you want a greatest fixed point. I'm not sure I've seen this done before, but maybe it's possible to make a sensible type class for types that support those.
class GF a where gfix :: (a -> a) -> a

instance GF a => GF [a] where
  gfix f = case (f (repeat undefined), f []) of
    (_:_, _) -> b : bs where
      b  = gfix (\a'  -> head (f (a' : bs)))
      bs = gfix (\as' -> tail (f (b : as')))
    ([], []) -> []
    _ -> error "no fixed point greater than bottom exists"

-- use the usual least fixed point. this ain't quite right, but
-- it works for this example, and maybe it's Good Enough
instance GF Int where gfix f = let x = f x in x
Try it out in ghci:
> gfix (\xs -> zipWith (+) [5,6,7] (tail xs ++ [0])) :: [Int]
[18,13,7]
This implementation isn't particularly efficient; e.g. replacing [5,6,7] with [1..n] results in a runtime that's quadratic in n. Perhaps with some cleverness that can be improved, but it's not immediately obvious to me how that would go.
I have an answer for this specific case, not general.
appendRepeat :: a -> [a] -> [a]
appendRepeat v a = h : appendRepeat v t
  where
    ~(h, t) =
      if null a
        then (v, [])
        else (head a, tail a)

a = [4,5,6]

main = print $ head b
  where
    b = zipWith (+) a $ appendRepeat 0 (tail b)
appendRepeat appends an infinite list of a repeated value to the end of a list. The key thing about it is that it doesn't check whether the list is empty before deciding to return a non-empty list whose tail is a recursive call. This way laziness never ends up in an infinite loop checking the zipWith _ [] case.
So this code works, and for the purposes of the original question it can be used to convert the language to just two simple functions (++ and (:[])). But the interpreter would need to do some static analysis to detect appending a repeated value and replace it with this special appendRepeat function (which can easily be done in Atlas). It seems hacky to make only this one implementation switcheroo, but that is all that is needed.
Did I implement inorder tree traversal using tail recursion correctly?
inorder :: Tree a -> [a] -> [a]
inorder (Leaf n) temp = n : temp
inorder (Node (n, left, right)) temp = inorder left (n : inorder right temp)
Tree is declared as
data Tree a = Leaf a | Node (a, Tree a, Tree a) deriving Show
and returns
[2,1,3] on call inorder three [] where three = Node (1, Leaf 2, Leaf 3)
This technically isn't tail recursive, because you have a recursive call inorder right temp in a non-tail position. One way to fix this is with continuations. You write a function which takes an accumulator like before, but rather than the accumulator being just a list, it's a function representing the work left to do in the computation. This means that instead of making a non-tail call and just returning, we can always tail call, because the context we need is saved to the continuation.
inorder :: Tree a -> [a]
inorder = go id
  where
    go :: ([a] -> r) -> Tree a -> r
    go k (Leaf n) = k [n]
    go k (Node (n, l, r)) =
      go (\ls -> go (\rs -> k (ls ++ n : rs)) r) l
Here every call is a tail call, as required, but it's quite inefficient because it performs a ++ at every level, pushing us into quadratic cost. A more efficient algorithm avoids building up an explicit list and instead builds a difference list, delaying construction of the concrete structure and giving a more efficient algorithm:
type Diff a = [a] -> [a] -- A difference list is just a function

nil :: Diff a
nil xs = xs

cons :: a -> Diff a -> Diff a
cons a d = (:) a . d

append :: Diff a -> Diff a -> Diff a
append xs ys = xs . ys

toList :: Diff a -> [a]
toList xs = xs []
Note that all of these operations are O(1), except for toList, which is O(n) in the number of entries. The important point is that diff lists are cheap and easy to append, so we'll build these up in our algorithm and construct the concrete list only at the very end:
inorder :: Tree a -> [a]
inorder = go toList
  where
    go :: (Diff a -> r) -> Tree a -> r
    go k (Leaf n) = k (cons n nil)
    go k (Node (n, l, r)) =
      go (\ls -> go (\rs -> k (ls `append` cons n rs)) r) l
And now, through gratuitous application of functions, we've gotten a completely unidiomatic Haskell program. You see, in Haskell we don't really care about tail calls, because we generally want to handle infinite structures correctly, and that's not really possible if we demand that everything be tail recursive. In fact, I would say that while not tail recursive, the code you originally had is the most idiomatic; that's even how it's implemented in Data.Set! It has the property that we can lazily consume its result, and it will lazily process the tree for us. So in your implementation, something like
min :: Tree a -> Maybe a
min = listToMaybe . flip inorder []  -- listToMaybe from Data.Maybe
is going to be pretty darn close to how you would implement it by hand, efficiency-wise! It will not have to traverse the whole tree first, like my version will. This sort of compositional effect of laziness pays more dividends in real Haskell code than syntactically making everything a tail call (which does nothing to actually guarantee space usage anyway).
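To see that laziness concretely, here is a small self-contained demo (my own example, using the question's Tree and the accumulator-style inorder): even on a tree that is infinite to the right, the first element is produced immediately.

data Tree a = Leaf a | Node (a, Tree a, Tree a)

inorder :: Tree a -> [a] -> [a]
inorder (Leaf n) temp = n : temp
inorder (Node (n, left, right)) temp = inorder left (n : inorder right temp)

-- infinite to the right, yet the left-most element is reachable lazily
t :: Tree Int
t = Node (1, Leaf 0, t)

main :: IO ()
main = print (head (inorder t []))  -- prints 0 without forcing the whole tree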
EDIT: while I'm still interested in an answer about the problems the execution faces in this case, it appears that it was indeed related to strictness, since compiling with -O fixes the execution and the program handles the tree really quickly.
I'm currently working on the 67th problem of Project Euler.
I already solved it using simple lists and dynamic programming.
I'd like to solve it now using a tree datastructure (well, where a Node can have two parents so it's not really a tree). I thought I'd use a simple tree but would take care to craft it so that Nodes are shared when appropriate:
data Tree a = Leaf a | Node a (Tree a) (Tree a) deriving (Show, Eq)
Solving the problem is then just a matter of going through the tree recursively:
calculate :: (Ord a, Num a) => Tree a -> a
calculate (Node v l r) = v + max (calculate l) (calculate r)
calculate (Leaf v) = v
Obviously this has exponential time complexity, though. So I tried to memoize the results with:

calculate :: (Ord a, Num a) => Tree a -> a
calculate = memo go
  where go (Node v l r) = v + max (calculate l) (calculate r)
        go (Leaf v) = v
where memo comes from Stable Memo. Stable Memo is supposed to memoize based on whether or not it has seen the exact same arguments (as in, same in memory).
So I used ghc-vis to see if my tree was correctly sharing nodes to avoid recomputation of things already computed in another branch.
On a sample tree produced by my function, lists2tree [[1], [2, 3], [4, 5, 6]], it shows the following correct sharing:
[ghc-vis heap visualization omitted; source: crydee.eu]
Here we can see that the node 5 is shared.
Yet it seems that my tree in the actual Euler Problem isn't getting memoized correctly.
The code is available on github, but I guess that apart from the calculate method above, the only other important method is the one that creates the tree. Here it is:
lists2tree :: [[a]] -> Tree a
lists2tree = head . l2t

l2t :: [[a]] -> [Tree a]
l2t (xs:ys:zss) = l2n xs ts t
  where (t:ts) = l2t (ys:zss)
l2t (x:[]) = l2l x
l2t [] = undefined

l2n :: [a] -> [Tree a] -> Tree a -> [Tree a]
l2n (x:xs) (y:ys) p = Node x p y : l2n xs ys y
l2n [] [] _ = []
l2n _ _ _ = undefined

l2l :: [a] -> [Tree a]
l2l = map (\l -> Leaf l)
It basically goes through the list of lists two rows at a time and creates nodes from bottom to top recursively.
What is wrong with this approach? I thought it might be that the program still builds the complete tree as thunks before getting to the leaves, and hence before memoizing, losing all the benefits of memoization, but I'm not sure that's the case. If it is, is there a way to fix it?
This doesn't really address the original question, but I find it is usually easier and more powerful to use explicit memoization.
I chose to store the triangle as a list indexed by a position rather than a tree:
[ ((1,1),3),
((2,1),7), ((2,2), 4),
....
Suppose that part of the result has already been memoized in a list of this format. Then computing the answer at a particular coordinate is trivial:
a # i = let Just v = lookup i a in v
compute tree result (x,y) = tree # (x,y) + max (result # (x+1,y)) (result # (x+1,y+1))
Now we must build result. This is also trivial; all we have to do is map compute over all valid indices.
euler67 :: [((Int, Int), Integer)] -> Integer
euler67 tree = result # (1,1)
  where
    xMax = maximum $ map (fst . fst) tree
    result = [ ((x,y), compute (x,y)) | x <- [1 .. xMax], y <- [1 .. x] ]
          ++ [ ((xMax + 1, y), 0) | y <- [1 .. xMax + 1] ]
    compute (x,y) = tree # (x,y) + max (result # (x+1,y)) (result # (x+1,y+1))
Computing the height of the triangle (xMax) is just a matter of getting the maximum x-index. Of course we are assuming that the tree is well formed.
The only remotely complicated part is determining which indices are valid for result. Obviously we need 1 row for every row in the original tree. Row x will have x items. We also add an extra row of zeroes at the bottom - we could handle the base case in a special way in compute but it is probably easier this way.
You'll notice that this is quite slow for the hundred-row triangle. This is because lookup traverses the lists linearly, and compute does three such lookups per call. To speed it up I used arrays:
import Data.Array

euler67' :: Array (Int, Int) Integer -> Integer
euler67' tree = result ! (1,1)
  where
    ((xMin, yMin), (xMax, yMax)) = bounds tree
    result = accumArray (+) 0 ((xMin, yMin), (xMax + 1, yMax + 1)) $
         [ ((x,y), compute (x,y)) | x <- [xMin .. xMax], y <- [yMin .. x] ]
      ++ [ ((xMax + 1, y), 0) | y <- [yMin .. xMax + 1] ]
    compute (x,y) = tree ! (x,y) + max (result ! (x+1,y)) (result ! (x+1,y+1))
Also here is the code I used for reading the files:
readTree' :: String -> IO (Array (Int, Int) Integer)
readTree' path = do
  tree <- readTree path
  let xMax = maximum $ map (fst . fst) tree
      yMax = maximum $ map (snd . fst) tree
  return $ array ((1,1), (xMax, yMax)) tree

readTree :: String -> IO [((Int, Int), Integer)]
readTree path = do
  s <- readFile path
  return $ map f
         $ concat
         $ zipWith (\n xs -> zip (repeat n) xs) [1..]
         $ map (zip [1..] . map read . words)
         $ lines s
  where
    f (a, (b, c)) = ((a, b), c)
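A hypothetical driver tying the pieces together ("triangle.txt" is an assumed input path):

main :: IO ()
main = do
  tree <- readTree' "triangle.txt"  -- assumed input file
  print (euler67' tree)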
I am a newcomer to the Haskell world and I am wondering if there is something like this:
data IndexedList a = IList Int [a]

findIndex :: (Int -> Int) -> IndexedList a -> (a, IndexedList a)
findIndex f (IList x l) = (l !! f x, IList (f x) l)

next :: IndexedList a -> (a, IndexedList a)
next x = findIndex (+1) x
I've noticed that this kind of list is not purely functional but kind of useful for some applications. Should it be considered harmful?
Thanks,
Bob
It's certainly useful to have a list that comes equipped with a pointer to a particular location in the list. However, the way it's usually done in Haskell is somewhat different: rather than using an explicit index, we tend to use a zipper.
The list zipper looks like this:
data ListZipper a = LZ [a] a [a] deriving (Show)
You should think of the middle field a as being the element that is currently pointed to, the first field [a] as being the elements before the current position, and the final field [a] as being the elements after the current position.
Usually we store the elements before the current one in reverse order, for efficiency, so that the list [0, 1, 2, *3*, 4, 5, 6] with a pointer to the middle element, would be stored as
LZ [2,1,0] 3 [4,5,6]
You can define functions that move the pointer to the left or right
left (LZ (a:as) b bs) = LZ as a (b:bs)
right (LZ as a (b:bs)) = LZ (a:as) b bs
If you want to move to the left or right n times, then you can do that with the help of a function that takes another function, and applies it n times to its argument
times n f = (!!n) . iterate f
so that to move left three times, you could use
>> let lz = LZ [2,1,0] 3 [4,5,6]
>> (3 `times` left) lz
LZ [] 0 [1,2,3,4,5,6]
Your two functions findIndex and next can be written as
next :: ListZipper a -> (a, ListZipper a)
next = findIndex 1

findIndex :: Int -> ListZipper a -> (a, ListZipper a)
findIndex n x = let y@(LZ _ a _) = (n `times` right) x in (a, y)
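For example, moving two places right from the zipper above (output checked by hand):

>> findIndex 2 (LZ [2,1,0] 3 [4,5,6])
(5,LZ [4,3,2,1,0] 5 [6])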
Contrary to what you think, this list is in fact purely functional. The reason is that IList (f x) l creates a new list (and does not, as you may think, modify the current IndexedList). It is in general not that easy to create non-purely-functional data structures or functions in Haskell, as long as you stay away from unsafePerformIO.
The reason I would recommend against using the IndexedList is that there is no assurance that the index is less than the length of the list. In this case the lookup l!!(f x) will fail with an exception, which is generally considered bad style in Haskell. An alternative could be to use a safe lookup, which returns a Maybe a like the following:
findIndex :: (Int -> Int) -> IndexedList a -> (Maybe a, IndexedList a)
findIndex f (IList i l) = (maybe_x, IList new_i l)
  where
    new_i = f i
    maybe_x = if new_i >= 0 && new_i < length l
                then Just (l !! new_i)
                else Nothing
I can also not think of a use case where such a list would be useful, but I guess I am limited by my creativity ;)
I've coded up the 0-1 Knapsack problem in Haskell. I'm fairly proud about the laziness and level of generality achieved so far.
I start by providing functions for creating and dealing with a lazy 2d matrix.
mkList f = map f [0..]
mkTable f = mkList (\i -> mkList (\j -> f i j))
tableIndex table i j = table !! i !! j
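For example, indexing an infinite table of the addition function (a quick sanity check of my own):

> tableIndex (mkTable (+)) 2 3
5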
I then make a specific table for a given knapsack problem
knapsackTable = mkTable f
  where f 0 _ = 0
        f _ 0 = 0
        f i j | ws !! i > j = leaveI
              | otherwise   = max takeI leaveI
          where takeI  = tableIndex knapsackTable (i-1) (j - ws !! i) + vs !! i
                leaveI = tableIndex knapsackTable (i-1) j

-- weight-value pairs; item i has weight ws!!i and value vs!!i
ws = [0,1,2, 5, 6, 7] -- weights
vs = [0,1,7,11,21,31] -- values
And finish off with a couple of helper functions for looking at the table:
viewTable table maxI maxJ = take (maxI+1) . map (take (maxJ+1)) $ table
printTable table maxI maxJ = mapM_ print $ viewTable table maxI maxJ
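With the weights and values above, a quick spot check (the expected value is my own hand calculation: the items of weight 1, 2, and 7 fit in capacity 10 and give value 1 + 7 + 31):

> tableIndex knapsackTable 5 10
39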
This much was pretty easy. But I want to take it a step further.
I want a better data structure for the table. Ideally, it should be
Unboxed (immutable) [edit] never mind this
Lazy
Unbounded
O(1) time to construct
O(1) time complexity for looking up a given entry,
(more realistically, at worst O(log n), where n is i*j for looking up the entry at row i, column j)
Bonus points if you can explain why/how your solution satisfies these ideals.
Also bonus points if you can further generalize knapsackTable, and prove that it is efficient.
In improving the data structure you should try to satisfy the following goals:
If I ask for the solution where the maximum weight is 10 (in my current code, that would be tableIndex knapsackTable 5 10; the 5 means include items 1-5), only the minimal amount of work necessary should be performed. Ideally this means no O(i*j) work for forcing the spine of each row of the table to the necessary column length. You could say this isn't "true" DP, if you believe DP means evaluating the entirety of the table.
If I ask for the entire table to be printed (something like printTable knapsackTable 5 10), the values of each entry should be computed once and only once. The values of a given cell should depend on the values of other cells (DP style: the idea being, never recompute the same subproblem twice)
Ideas:
Data.Array bounded :(
UArray strict :(
Memoization techniques (SO question about DP in Haskell) this might work
Answers that make some compromises to my stated ideals will be upvoted (by me, anyways) as long as they are informative. The answer with the least compromises will probably be the "accepted" one.
First, your criterion for an unboxed data structure is probably a bit misled. Unboxed values must be strict, and they have nothing to do with immutability. The solution I'm going to propose is immutable, lazy, and boxed. Also, I'm not sure in what way you are wanting construction and querying to be O(1). The structure I'm proposing is lazily constructed, but because it's potentially unbounded, its full construction would take infinite time. Querying the structure takes O(k) time for any particular key of size k, but of course the value you're looking up may take further time to compute.
The data structure is a lazy trie. I'm using Conal Elliott's MemoTrie library in my code. For genericity, it takes functions instead of lists for the weights and values.
import Data.MemoTrie (HasTrie, memo2)

knapsack :: (Num a, Eq a, Enum a, Num w, Ord w, Num v, Ord v,
             HasTrie a, HasTrie w) =>
            (a -> w) -> (a -> v) -> a -> w -> v
knapsack weight value = knapsackMem
  where knapsackMem = memo2 knapsack'
        knapsack' 0 w = 0
        knapsack' i 0 = 0
        knapsack' i w
          | weight i > w = knapsackMem (pred i) w
          | otherwise = max (knapsackMem (pred i) w)
                            (knapsackMem (pred i) (w - weight i) + value i)
Basically, it's implemented as a trie with a lazy spine and lazy values. It's bounded only by the key type. Because the entire thing is lazy, its construction before forcing it with queries is O(1). Each query forces a single path down the trie and its value, so it's O(key size), i.e. O(log n) for an integer key n. As I already said, it's immutable, but not unboxed.
It will share all work in the recursive calls. It doesn't actually allow you to print the trie directly, but something like this should not do any redundant work:
mapM_ (print . uncurry (knapsack ws vs)) $ range ((0,0), (i,w))
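For concreteness, here is a hypothetical way to run it on the question's data, turning the lists into functions with (!!); the names wsL and vsL and the main are my own additions:

wsL, vsL :: [Int]
wsL = [0,1,2, 5, 6, 7]
vsL = [0,1,7,11,21,31]

main :: IO ()
main = print (knapsack (wsL !!) (vsL !!) 5 10)  -- should agree with the list table (39)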
Unboxed implies strict and bounded. Anything 100% unboxed cannot be lazy or unbounded. The usual compromise is embodied in converting [Word8] to Data.ByteString.Lazy, where there are unboxed chunks (strict ByteStrings) linked together lazily in an unbounded way.
A much more efficient table generator (enhanced to track individual items) can be made using scanl, zipWith, and my takeOnto. This effectively avoids using (!!) while creating the table:
import Data.List (sort, genericTake)

type Table = [[Entry]]

data Entry = Entry { bestValue :: !Integer, pieces :: [[WV]] }
  deriving (Read, Show)

data WV = WV { weight, value :: !Integer }
  deriving (Read, Show, Eq, Ord)

instance Eq Entry where
  (==) a b = (==) (bestValue a) (bestValue b)

instance Ord Entry where
  compare a b = compare (bestValue a) (bestValue b)

solutions :: Entry -> Int
solutions = length . filter (not . null) . pieces

addItem :: Entry -> WV -> Entry
addItem e wv = Entry { bestValue = bestValue e + value wv
                     , pieces = map (wv:) (pieces e) }

-- Utility function for improve
takeOnto :: ([a] -> [a]) -> Integer -> [a] -> [a]
takeOnto endF = go where
  go n rest | n <= 0 = endF rest
            | otherwise = case rest of
                (x:xs) -> x : go (pred n) xs
                [] -> error "takeOnto: unexpected []"

improve oldList wv@(WV {weight = wi, value = vi}) = newList where
  newList | vi <= 0 = oldList
          | otherwise = takeOnto (zipWith maxAB oldList) wi oldList
  -- Dual traversal of index (w-wi) and index w makes this a zipWith
  maxAB e2 e1 = let e2v = addItem e2 wv
                in case compare e1 e2v of
                     LT -> e2v
                     EQ -> Entry { bestValue = bestValue e1
                                 , pieces = pieces e1 ++ pieces e2v }
                     GT -> e1

-- Note that the returned table is finite.
-- The dependence on only the previous row makes this a "scanl" operation.
makeTable :: [Int] -> [Int] -> Table
makeTable ws vs =
  let wvs = zipWith WV (map toInteger ws) (map toInteger vs)
      nil = repeat (Entry { bestValue = 0, pieces = [[]] })
      totW = sum (map weight wvs)
  in map (genericTake (succ totW)) $ scanl improve nil wvs

-- Create specific table; note that weights (1+7) equal weight 8
ws, vs :: [Int]
ws = [2,3, 5, 5, 6, 7] -- weights
vs = [1,7,8,11,21,31] -- values

t = makeTable ws vs

-- Investigate table
seeTable = mapM_ seeBestValue t
  where seeBestValue row =
          mapM_ (\v -> putStr (' ' : show (bestValue v))) row >> putChar '\n'

ways = mapM_ seeWays t
  where seeWays row =
          mapM_ (\v -> putStr (' ' : show (solutions v))) row >> putChar '\n'

-- This has two ways of reaching a bestValue of 8 for 3 items up to total weight 5
interesting = print (t !! 3 !! 5)
Lazy storable vectors: http://hackage.haskell.org/package/storablevector
Unbounded, lazy, O(chunksize) time to construct, O(n/chunksize) indexing, where chunksize can be made sufficiently large for any given purpose. Basically a lazy list with some significant constant-factor benefits.
To memoize functions, I recommend a library like Luke Palmer's memo combinators. The library uses tries, which are unbounded and have O(key size) lookup. (In general, you can't do better than O(key size) lookup because you always have to touch every bit of the key.)
knapsack :: (Int, Int) -> Solution
knapsack = memo f
  where
    memo = pair integral integral
    f (i,j) = ... knapsack (i-b,j) ...
Internally, the integral combinator probably builds an infinite data structure
import Prelude hiding (lookup)  -- our lookup below shadows the Prelude's

data IntTrie a = Branch (IntTrie a) a (IntTrie a)

integral :: (Int -> a) -> Int -> a
integral f = \n -> lookup n table
  where
    table = build f
    -- each child trie covers one residue class, built recursively
    build g = Branch (build (\n -> g (2*n))) (g 0) (build (\n -> g (2*n+1)))
Lookup works like this:
lookup 0 (Branch l a r) = a
lookup n (Branch l a r) = if even n then lookup n2 l else lookup n2 r
where n2 = n `div` 2
There are other ways to build infinite tries, but this one is popular.
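A quick way to convince yourself the indexing scheme is consistent (my own test, assuming the recursive build sketched above):

main :: IO ()
main = print (map (integral (* 10)) [0 .. 6])  -- expect [0,10,20,30,40,50,60]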
Why not use a Data.Map, putting another Data.Map into it? As far as I know it's quite fast.
It wouldn't be lazy though.
More than that, you can implement the Ord typeclass for your data
data Index = Index Int Int deriving (Eq, Ord)
and put a two-dimensional index directly as a key. You can achieve laziness by generating this map as a list and then just using
fromList [(Index 0 0, value11), (Index 0 1, value12), ...]
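As a sketch of this suggestion (my own elaboration; note that fromList forces the spine of its argument, so the index range has to be bounded, matching the laziness caveat above):

import qualified Data.Map as Map

data Index = Index Int Int deriving (Eq, Ord, Show)

-- a finite two-dimensional table keyed by the derived Ord on Index
table :: Map.Map Index Integer
table = Map.fromList [ (Index i j, fromIntegral (i * j)) | i <- [0 .. 99], j <- [0 .. 99] ]

main :: IO ()
main = print (Map.lookup (Index 7 6) table)  -- Just 42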