Haskell Knapsack - haskell

I've written an answer to the bounded knapsack problem with one of each item in Scala, and tried transposing it to Haskell with the following result:
knapsack :: [ ( Int, Int ) ] -> [ ( Int, Int ) ] -> Int -> [ ( Int, Int ) ]
knapsack xs [] _ = xs
knapsack xs ys max =
    foldr (maxOf) [ ] [ knapsack ( y : xs ) ( filter (y /=) ys ) max | y <- ys
                      , weightOf( y : xs ) <= max ]
maxOf :: [ ( Int, Int ) ] -> [ ( Int, Int ) ] -> [ ( Int, Int ) ]
maxOf a b = if valueOf a > valueOf b then a else b
valueOf :: [ ( Int, Int ) ] -> Int
valueOf [ ] = 0
valueOf ( x : xs ) = fst x + valueOf xs
weightOf :: [ ( Int, Int ) ] -> Int
weightOf [ ] = 0
weightOf ( x : xs ) = snd x + weightOf xs
I'm not looking for tips on how to clean up the code, just to get it working. To my knowledge it should be doing the following:
For each tuple option (in ys)
if the weight of the current tuple (y) and the running total (xs) combined is less than the capacity
get the optimal knapsack that contains the current tuple and the current total (xs), using the available tuples (in ys) less the current tuple
Finally, get the most valuable of these results and return it
Edit: Sorry, I forgot to say what's wrong. It compiles fine, but it gives the wrong answer. For the following inputs, here is what I expect and what it produces:
knapsack [] [(1,1),(2,2)] 5
Expect: [(1,1),(2,2)]
Produces: [(1,1),(2,2)]
knapsack [] [(1,1),(2,2),(3,3)] 5
Expect: [(2,2),(3,3)]
Produces: []
knapsack [] [(2,1),(3,2),(4,3),(6,4)] 5
Expect: [(2,1),(6,4)]
Produces: []
So I was wondering what could be the cause of the discrepancy?
The solution, thanks to sepp2k:
ks = knapsack []
knapsack :: [ ( Int, Int ) ] -> [ ( Int, Int ) ] -> Int -> [ ( Int, Int ) ]
knapsack xs [] _ = xs
knapsack xs ys max =
    foldr (maxOf) [ ] ( xs : [ knapsack ( y : xs ) ( ys #- y ) max
                             | y <- ys, weightOf( y : xs ) <= max ] )
(#-) :: [ ( Int, Int ) ] -> ( Int, Int ) -> [ ( Int, Int ) ]
[ ] #- _ = [ ]
( x : xs ) #- y = if x == y then xs else x : ( xs #- y )
maxOf :: [ ( Int, Int ) ] -> [ ( Int, Int ) ] -> [ ( Int, Int ) ]
maxOf a b = if valueOf a > valueOf b then a else b
valueOf :: [ ( Int, Int ) ] -> Int
valueOf [ ] = 0
valueOf ( x : xs ) = fst x + valueOf xs
weightOf :: [ ( Int, Int ) ] -> Int
weightOf [ ] = 0
weightOf ( x : xs ) = snd x + weightOf xs
Which returns the expected results, above.

Your first case fires when ys contains nothing, so for knapsack [foo,bar] [] 42 you get back [foo, bar], which is what you want. However, it does not fire when ys contains nothing except elements that would put you over the max weight, i.e. knapsack [(x, 20), (y,20)] [(bla, 5)] will return [] and thus discard the previous result. Since this is not what you want, you should adjust your cases so that the second case only fires if there's at least one element in ys that keeps you within the max weight.
One way to do that would be to throw out any elements that put you over the max weight when recursing, so that that scenario simply can't happen.
Another way would be to switch the order of the cases and add a guard to the first case that says that ys must contain at least one element that does not put you over the total weight (and adjust the other case to not require ys to be empty).
PS: Another, unrelated problem with your code is that it ignores duplicates. I.e. if you use it on the list [(2,2), (2,2)] it will act as if the list was just [(2,2)] because filter (y /=) ys will throw out all occurrences of y, not just one.
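To make the duplicate issue concrete, here is a small sketch contrasting the two ways of removing an item. The helper names removeAll and removeOne are made up for illustration; delete is the standard function from Data.List.

```haskell
import Data.List (delete)

-- Hypothetical helper names, for illustration only.

-- Drops every occurrence of y, which is what filter (y /=) does.
removeAll :: Eq a => a -> [a] -> [a]
removeAll y = filter (y /=)

-- Drops only the first occurrence of y.
removeOne :: Eq a => a -> [a] -> [a]
removeOne = delete
```

With the list [(2,2),(2,2),(3,3)], removeAll (2,2) leaves [(3,3)], while removeOne (2,2) leaves [(2,2),(3,3)], so the second copy is still available to the recursion.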

Some improvements on your working version:
import Data.List
import Data.Function(on)
ks = knapsack []
knapsack :: [(Int, Int)] -> [(Int, Int)] -> Int -> [(Int, Int)]
knapsack xs [] _ = xs
knapsack xs ys max =
    foldr (maxOf) [] (xs: [knapsack (y:xs) (delete y ys) max
                          | y <- ys, weightOf(y:xs) <= max ] ) where
        weightOf = sum . map snd
maxOf :: [(Int, Int)] -> [(Int, Int)] -> [(Int, Int)]
maxOf a b = maximumBy (compare `on` valueOf) [a,b] where
    valueOf = sum . map fst

Might I suggest using a dynamic programming approach? This way of solving 0-1 knapsack problems is almost painfully slow, at least when the number of variables gets larger than around 20. While it's simple, it's just too inefficient. Here's my shot at it:
import Data.Array
-- creates the dynamic programming table as an array
dynProgTable (var,cap) = a where
    a = array ((0,0),(length var,cap)) [ ((i,j), best i j)
                                       | i <- [0..length var] , j <- [0..cap] ] where
        best 0 _ = 0
        best _ 0 = 0
        best i j
            | snd (var !! (i-1)) > j = a!decline
            | otherwise = maximum [a!decline,value+a!accept]
            where decline = (i-1,j)
                  accept = (i-1,j - snd (var !! (i-1)))
                  value = fst (var !! (i-1))
--Backtracks the solution from the dynamic programming table
--Output on the form [Int] where i'th element equals 1 if
--i'th variable was accepted, 0 otherwise.
solve (var,cap) =
    let j = cap
        i = length var
        table = dynProgTable (var,cap)
        step _ 0 _ = []
        step a k 0 = step table (k-1) 0 ++ [0]
        step a k l
            | a!(k,l) == a!(k-1,l) = step a (k-1) l ++ [0]
            | otherwise = step a (k-1) (l - snd (var !! (k-1))) ++ [1]
    in step table i j
In the input (var,cap), var is a list of variables in the form of 2-tuples (c,w), where c is the cost and w is the weight. cap is the maximum weight allowance.
I'm sure the above code could be cleaned up to make it more readable and obvious, but that's how it turned out for me :) While the code snippet by Landei above is short, my computer took ages computing instances with only 20 variables with it. The dynamic programming approach above gave me a solution for 1000 variables faster.
If you don't know about dynamic programming, you should check out this link: Lecture slides on dynamic programming. It helped me a lot.
For an introduction to arrays, check out Array tutorial.
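As a rough companion to the table-based code above, the same recurrence can be sketched with a single row of best values that is updated once per item. This is only an illustration, not the answer's method: bestValue is a made-up name, it returns just the optimal value (not which items were picked), and it assumes weights are non-negative.

```haskell
-- Best achievable value for a 0-1 knapsack; items are (value, weight) pairs.
-- bestValue is a hypothetical name, not part of the code above.
bestValue :: [(Int, Int)] -> Int -> Int
bestValue items cap = last (foldl addItem (replicate (cap + 1) 0) items)
  where
    -- row !! j is the best value within capacity j using the items seen so far.
    -- For j < w the item cannot fit, so the old entry is kept; for j >= w we
    -- compare declining the item (row !! j) with accepting it (row !! (j-w) + v).
    addItem row (v, w) = zipWith max row (take w row ++ map (+ v) row)
```

For the inputs from the question, bestValue [(2,1),(3,2),(4,3),(6,4)] 5 evaluates to 8, the value of the expected knapsack [(2,1),(6,4)].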

Related

List Comprehension with list of lists

I have trouble understanding the following piece of code:
treePositions :: Tree a -> [[Int]]
treePositions (Node _ ts) =
    [] : [ (i : is ) | i <- [0..(length ts - 1)],
                       is <- treePositions (index ts i) ]
This function would calculate any valid paths to a position in the given tree, where for every node the edges are marked with 0..lastOutgoingEdge.
If I understood it right the index function would return the node at Index i in the node list of the tree.
index :: [a] -> Int -> a
index [] i = error "invalid index"
index (x:xs) 0 = x
index (x:xs) i = index xs (i-1)
Now for given trees:
t1 = Node "a" [
       Node "b" [
         Node "c"[],
         Node "d"[]
       ]
     ]
t2 = Node "z" []
the function would return:
treePositions t1 == [ [], [0], [0,0], [0,1] ]
treePositions t2 == [ [] ]
What I don't understand is this part:
[] : [ (i : is ) | i <- [0..(length ts - 1)],is <- treePositions (index ts i) ]
My thoughts:
If I have x|x <- [0..10], x < 2, this would translate to "take every x in [0..10] for which x < 2 applies". So if I take every i in [0..(length ts - 1)], how would this then return lists for a condition?
In your case, the part after the comma is not a condition, but a second generator. Simplified, the notation looks like this:
> [ (a, b) | a <- [1..3], b <- [1..2] ]
> [(1,1),(1,2),(2,1),(2,2),(3,1),(3,2)]
The example comprehension above means the following:
go through list [1..3], and for every element of it,
go through list [1..2], and for every element of it,
produce a tuple
Further, a generator can depend on elements of previous generators, e.g.:
> [ (a, b) | a <- [1..3], b <- [1..a] ]
> [(1,1),(2,1),(2,2),(3,1),(3,2),(3,3)]
So in your case:
[ (i : is ) | i <- [0..(length ts - 1)],is <- treePositions (index ts i) ]
the logic is this:
for every i in 0..(length ts - 1),
go through every is in treePositions (index ts i),
and produce i : is as result
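The "for every ..., go through every ..." reading above can be sketched by writing the two generators as nested concatMap calls. The Tree type below is an assumption (the question does not show its definition), and pairing each subtree with its position via zip replaces the index helper.

```haskell
-- Assumed definition; the question does not show the actual Tree type.
data Tree a = Node a [Tree a]

-- Two generators written as a comprehension ...
pairsComp :: [(Int, Int)]
pairsComp = [ (a, b) | a <- [1..3], b <- [1..a] ]

-- ... and the same list built with nested concatMap calls.
pairsNested :: [(Int, Int)]
pairsNested = concatMap (\a -> concatMap (\b -> [(a, b)]) [1..a]) [1..3]

-- treePositions in the same style: for every subtree t at position i,
-- go through every path of t and put i in front of it.
treePositions :: Tree a -> [[Int]]
treePositions (Node _ ts) =
  [] : concatMap (\(i, t) -> map (i :) (treePositions t)) (zip [0 ..] ts)
```

Both pairsComp and pairsNested evaluate to [(1,1),(2,1),(2,2),(3,1),(3,2),(3,3)], and treePositions gives [[],[0],[0,0],[0,1]] for the tree t1 from the question.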

counting frequencies in a collection of bins

I need to count the values that fall between consecutive values in a list, i.e. [135,136,138,140] would count all the numbers between 135-136, 136-138 and 138-140. With the input list [135.2,135.3,137,139] it would output [2,1,1], using the type [Float] -> [Float] -> [Int]. So far I have:
heightbetween :: Float -> Float -> [Float] -> Int
heightbetween _ _ [] = 0
heightbetween n s (x:xs)
| (n < x) && (s > x) = 1 + (heightbetween n s xs)
| otherwise = heightbetween n s xs
count :: [Float] -> [Float] -> [Int]
count [] [] = []
count [x,y] = [(x,y)]
count (x:y:ys) = (x,y):count (y:ys)
forEach fun lst = heightbetween op ([],lst)
where
op (start,[]) = Nothing
op (start,a:as) = Just (start++(fun a):as
,(start++[a],as))
forPairs fun lst lst2 = map (map fst)
$ forEach (\(a,b)->(fun a b,b))
$ zip lst lst2
Your count looks strange. It should be like this:
-- count -> ranges -> data -> [counts]
count :: [Float] -> [Float] -> [Int]
count [] _ = [] -- no ranges given -> empty list
count [_] _ = [] -- no ranges, but single number -> empty list
count _ [] = [] -- no data given -> empty list
count (x:y:xs) d =
    (heightbetween x y d) : count (y:xs) d
heightbetween :: Float -> Float -> [Float] -> Int
heightbetween _ _ [] = 0
heightbetween n s (x:xs)
    | (n < x) && (s > x) = 1 + (heightbetween n s xs)
    | otherwise = heightbetween n s xs
The other lines are obsolete.
Then invoking
count [135,136,138,140] [135.2,135.3,137,139]
gives
[2,1,1]
First, make sure that your range list is in order....
rangePoints = [135,136,138,140]
orderedRangePoints = sort rangePoints
Next, you will find it much easier to work with actual ranges (which you can represent using a 2-tuple (low,high))
ranges = zip orderedRangePoints $ tail orderedRangePoints
You will need an inRange function (one already exists in Data.Ix, but unfortunately it includes the upperbound, so you can't use it)
inRange (low,high) val | val >= low && val < high = True
inRange _ _ = False
You will also want to order your input points
theData = sort [135.2,135.3,137,139]
With all of this out of the way, the binCount function is easy to write.
binCount'::[(Float, Float)]->[Float]->[Int]
binCount' [] [] = []
binCount' (range:rest) vals =
    length valsInRange:binCount' rest valsAboveRange
    where
        (valsInRange, valsAboveRange) = span (inRange range) vals
Notice, that I defined a function called binCount', not binCount. I did this, because I consider this an unsafe function, because it only works on ordered ranges and values.... You should finalize this by writing a safer binCount function, which puts all of the stuff above in its where clause. You should probably add all the types and some error checking also (what happens if a value is outside of all ranges?).
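One possible shape for such a safer wrapper is sketched below. It is not the single-pass design described above: it sorts the boundaries itself and filters the data once per bin, which costs more but does not care about the order of the values and simply ignores values outside every bin. The local helper inBin is a made-up name.

```haskell
import Data.List (sort)

-- Hypothetical "safe" variant: counts each half-open bin [low, high)
-- independently, so unsorted input is fine and out-of-range values are skipped.
binCount :: [Float] -> [Float] -> [Int]
binCount points vals = [ length (filter (inBin r) vals) | r <- bins ]
  where
    sorted = sort points
    bins   = zip sorted (drop 1 sorted)   -- consecutive boundaries as (low, high)
    inBin (low, high) v = v >= low && v < high
```

Calling binCount [135,136,138,140] [135.2,135.3,137,139] gives [2,1,1], matching the result shown earlier.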

Haskell: Generating k-itemsets for apriori

I am trying to generate all k-item sets for use in apriori, I am following this pseudocode:
L1= {frequent items};
for (k= 2; Lk-1 !=∅; k++) do begin
Ck= candidates generated from Lk-1 (that is: cartesian product Lk-1 x Lk-1 and eliminating any
k-1 size itemset that is not frequent);
for each transaction t in database do
increment the count of all candidates in
Ck that are contained in t
Lk = candidates in Ck with min_sup
end
return U_k Lk;
Here is the code I have:
-- d transactions, threshold
kItemSets d thresh = kItemSets' 2 $ frequentItems d thresh
    where
        kItemSets' _ [] = [[]]
        kItemSets' k t = ck ++ (kItemSets' (k+1) ck)
            where
                -- those (k-1) length sets that meet the threshold of being a subset of the transactions in d
                ck = filter (\x->(countSubsets x d) >= thresh) $ combinations k t
-- length n combinations that can be made from xs
combinations 0 _ = [[]]
combinations _ [] = []
combinations n xs@(y:ys)
    | n < 0 = []
    | otherwise = case drop (n-1) xs of
                    [ ] -> []
                    [_] -> [xs]
                    _ -> [y:c | c <- combinations (n-1) ys]
                            ++ combinations n ys
-- those items with a frequency of at least o in the dataset
frequentItems xs o = [y| y <- nub cs, x<-[count y cs], x >= o]
    where
        cs = concat xs
isSubset a b = not $ any (`notElem` b) a
-- Count how many times the list y appears as a subset of a list of lists xs
countSubsets y xs = length $ filter (isSubset y ) xs
count :: Eq a => a -> [a] -> Int
count x [] = 0
count x (y:ys) | x == y = 1+(count x ys)
               | otherwise = count x ys
transactions =[["Butter", "Biscuits", "Cream", "Newspaper", "Bread", "Chocolate"],
["Cream", "Newspaper", "Tea", "Oil", "Chocolate"] ,
["Chocolate", "Cereal", "Bread"],
["Chocolate", "Flour", "Biscuits", "Newspaper"],
["Chocolate", "Biscuits", "Newspaper"] ]
But when I compile I get the error:
apriori.hs:5:51:
Occurs check: cannot construct the infinite type: a0 = [a0]
Expected type: [a0]
Actual type: [[a0]]
In the second argument of kItemSets', namely `ck'
In the second argument of `(++)', namely `(kItemSets' (k + 1) ck)'
Failed, modules loaded: none.
But when I run from ghci:
*Main> mapM_ print $ filter (\x->(countSubsets x transactions ) >= 2 ) $ combinations 2 $ frequentItems transactions 2
["Biscuits","Newspaper"]
["Biscuits","Chocolate"]
["Cream","Newspaper"]
["Cream","Chocolate"]
["Newspaper","Chocolate"]
["Bread","Chocolate"]
Which is correct, since it's those 2-item sets that meet the occurrence threshold in the set of transactions. But what I need for the 3-item sets is
[["Biscuits", "Chocolate", "Newspaper" ],
["Chocolate", "Cream", "Newspaper"]]
and for this to be appended to the list of 2-item sets. How would I change my current code to achieve this? I know it can be built from the 2-item set, but I'm not sure how to go about it.
Had to use this for line 5:
kItemSets' k t = ck ++ (kItemSets' (k+1) $ nub $ concat ck)
Not the most efficient but it works.
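For comparison, here is a sketch of building the k-item candidates directly from the frequent (k-1)-item sets, as the pseudocode describes, rather than from the flattened item list. The names candidates and frequent are made up, and the sketch skips the pruning of candidates that have an infrequent (k-1)-subset.

```haskell
import Data.List (nub, sort, union)

-- Join step: merge every pair of (k-1)-item sets and keep the merges
-- that have exactly k distinct items. Sorting makes duplicates comparable.
candidates :: Ord a => Int -> [[a]] -> [[a]]
candidates k prev =
  nub [ c | a <- prev, b <- prev, let c = sort (a `union` b), length c == k ]

-- Keep the candidates contained in at least thresh transactions of db.
frequent :: Eq a => Int -> [[a]] -> [[a]] -> [[a]]
frequent thresh db = filter (\c -> support c >= thresh)
  where support c = length (filter (\t -> all (`elem` t) c) db)
```

Applied to the six 2-item sets printed in the question, frequent 2 transactions (candidates 3 twoItemSets) keeps only the two 3-item sets the question asks for (possibly in a different order).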

Retrieve strings from Matrix

I'm stuck with my homework task, somebody help, please..
Here is the task:
Find all possible partitions of string into words of some dictionary
And here is how I'm trying to do it:
I use the dynamic programming concept to fill a matrix, and then I'm stuck with how to retrieve the data from it
-- Task5_2
retrieve :: [[Int]] -> [String] -> Int -> Int -> Int -> [[String]]
retrieve matrix dict i j size
    | i >= size || j >= size = []
    | index /= 0 = [(dict !! index)]:(retrieve matrix dict (i + sizeOfWord) (i + sizeOfWord) size) ++ retrieve matrix dict i (next matrix i j) size
    where index = (matrix !! i !! j) - 1; sizeOfWord = length (dict !! index)
next matrix i j
    | j >= (length matrix) = j
    | matrix !! i !! j > 0 = j
    | otherwise = next matrix i (j + 1)
getPartitionMatrix :: String -> [String] -> [[Int]]
getPartitionMatrix text dict = [[ indiceOfWord (getWord text i j) dict 1 | j <- [1..(length text)]] | i <- [1..(length text)]]
--------------------------
getWord :: String -> Int -> Int -> String
getWord text from to = map fst $ filter (\a -> (snd a) >= from && (snd a) <= to) $ zip text [1..]
indiceOfWord :: String -> [String] -> Int -> Int
indiceOfWord _ [] _ = 0
indiceOfWord word (x:xs) n
    | word == x = n
    | otherwise = indiceOfWord word xs (n + 1)
-- TESTS
dictionary = ["la", "a", "laa", "l"]
string = "laa"
matr = getPartitionMatrix string dictionary
test = retrieve matr dictionary 0 0 (length string)
Here is some code that does what you ask for. It doesn't work exactly like your solution, but it should be as fast if (and only if) both our dictionary lookups were improved to use tries, as would be reasonable. As it is, I think it may be a bit faster than your solution:
module Partitions (partitions) where
import Data.Array
import Data.List
data Branches a = Empty | B [([a],Branches a)] deriving (Show)
isEmpty Empty = True
isEmpty _ = False
flatten :: Branches a -> [ [ [a] ] ]
flatten Empty = []
flatten (B []) = [[]]
flatten (B ps) = concatMap (\(word, bs) -> ...) ps
type Dictionary a = [[a]]
partitions :: (Ord a) => Dictionary a -> [a] -> [ [ [a] ] ]
partitions dict xs = flatten (parts ! 0)
    where
      parts = listArray (0,length xs) $ zipWith (\i ys -> starting i ys) [0..] (tails xs)
      starting _ [] = B []
      starting i ys
          | null words = ...
          | otherwise = ...
          where
            words = filter (`isPrefixOf` ys) $ dict
            go word = (word, parts ! (i + length word))
It works like this: at each position of the string, it searches the dictionary for all possible words starting there and evaluates to a Branches, which is either a dead end (Empty) or a list of pairs of a word and all possible continuations after it, discarding those words that can't be continued.
Dynamic programming enters the picture to record every possibility starting from a given index in a lazy array. Note that the knot is tied: we compute parts by using starting, which uses parts to look up which continuations are possible from a given index. This only works because we only look up indices after the one starting is computing, and starting doesn't use parts for the last index.
Retrieving the list of partitions from this Branches datatype is analogous to listing all paths in a tree.
EDIT: I removed some crucial parts of the solution in order to let the questioner search for himself, though it shouldn't be too hard to complete with some thinking. I'll probably put them back with a somewhat cleaned-up version later.
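To show what the final result looks like, here is a plain recursive sketch of the same task. It is not a reconstruction of the omitted parts: it uses no lazy array and no Branches type, so it recomputes shared suffixes, and the name wordBreaks is made up.

```haskell
import Data.List (isPrefixOf)

-- Every way of splitting s into words of dict, tried in dictionary order.
wordBreaks :: [String] -> String -> [[String]]
wordBreaks _ "" = [[]]            -- the empty string has one partition: no words
wordBreaks dict s =
  [ w : rest
  | w <- dict, not (null w), w `isPrefixOf` s   -- words that start here
  , rest <- wordBreaks dict (drop (length w) s) -- every way to finish the string
  ]
```

With the test data from the question, wordBreaks ["la", "a", "laa", "l"] "laa" gives [["la","a"],["laa"],["l","a","a"]].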

Comparing 3 output lists in haskell

I am doing another Project Euler problem and I need to find when the results of these 3 lists are equal (we are given 40755 as the first time they are equal; I need to find the next):
hexag n = [ n*(2*n-1) | n <- [40755..]]
penta n = [ n*(3*n-1)/2 | n <- [40755..]]
trian n = [ n*(n+1)/2 | n <- [40755..]]
I tried adding in the other lists as predicates of the first list, but that didn't work:
hexag n = [ n*(2*n-1) | n <- [40755..], penta n == n, trian n == n]
I am stuck as to where to go from here.
I tried graphing the function and even calculus but to no avail, so I must resort to a Haskell solution.
Your functions are weird. They get n and then ignore it?
You also have a confusion between function's inputs and outputs. The 40755th hexagonal number is 3321899295, not 40755.
If you really want a spoiler to the problem (but doesn't that miss the point?):
binarySearch :: Integral a => (a -> Bool) -> a -> a -> a
binarySearch func low high
    | low == high = low
    | func mid = search low mid
    | otherwise = search (mid + 1) high
    where
        search = binarySearch func
        mid = (low+high) `div` 2
infiniteBinarySearch :: Integral a => (a -> Bool) -> a
infiniteBinarySearch func =
    binarySearch func ((lim+1) `div` 2) lim
    where
        lim = head . filter func . lims $ 0
        lims x = x:lims (2*x+1)
inIncreasingSerie :: (Ord a, Integral i) => (i -> a) -> a -> Bool
inIncreasingSerie func val =
    val == func (infiniteBinarySearch ((>= val) . func))
figureNum :: Integer -> Integer -> Integer
figureNum shape index = (index*((shape-2)*index+4-shape)) `div` 2
main :: IO ()
main =
    print . head . filter r $ map (figureNum 6) [144..]
    where
        r x = inIncreasingSerie (figureNum 5) x && inIncreasingSerie (figureNum 3) x
Here's a simple, direct answer to exactly the question you gave:
*Main> take 1 $ filter (\(x,y,z) -> (x == y) && (y == z)) $ zip3 [1,2,3] [4,2,6] [8,2,9]
[(2,2,2)]
Of course, yairchu's answer might be more useful in actually solving the Euler question :)
There are at least a couple of ways you can do this.
You could look at the first item, and compare the rest of the items to it:
Prelude> (\x -> all (== (head x)) $ tail x) [ [1,2,3], [1,2,3], [4,5,6] ]
False
Prelude> (\x -> all (== (head x)) $ tail x) [ [1,2,3], [1,2,3], [1,2,3] ]
True
Or you could make an explicitly recursive function similar to the previous:
-- test.hs
f [] = True
f (x:xs) = f' x xs where
    f' orig (y:ys) = if orig == y then f' orig ys else False
    f' _ [] = True
Prelude> :l test.hs
[1 of 1] Compiling Main ( test.hs, interpreted )
Ok, modules loaded: Main.
*Main> f [ [1,2,3], [1,2,3], [1,2,3] ]
True
*Main> f [ [1,2,3], [1,2,3], [4,5,6] ]
False
You could also do a takeWhile and compare the length of the returned list, but that would be neither efficient nor idiomatic Haskell.
Oops, just saw that didn't answer your question at all. Marking this as CW in case anyone stumbles upon your question via Google.
The easiest way is to respecify your problem slightly
Rather than deal with three lists (note the removal of the superfluous n argument):
hexag = [ n*(2*n-1) | n <- [40755..]]
penta = [ n*(3*n-1)/2 | n <- [40755..]]
trian = [ n*(n+1)/2 | n <- [40755..]]
You could, for instance generate one list:
matches :: [Int]
matches = matches' 40755
matches' :: Int -> [Int]
matches' n
    | hex == pen && pen == tri = n : matches' (n + 1)
    | otherwise = matches' (n + 1) where
        hex = n*(2*n-1)
        pen = n*(3*n-1) `div` 2
        tri = n*(n+1) `div` 2
Now, you could then try to optimize this for performance by noticing recurrences. For instance when computing the next match at (n + 1):
(n+1)*(n+2)/2 - n*(n+1)/2 = n + 1
so you could just add (n + 1) to the previous tri to obtain the new tri value.
Similar algebraic simplifications can be applied to the other two functions, and you can carry all of them in accumulating parameters to the function matches'.
That said, there are more efficient ways to tackle this problem.
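One such way, sketched here under the assumption that each sequence is generated from index 1 as a strictly increasing list, is to walk the sorted lists in step and keep only the values that appear in all of them. The names common and shared are made up for this sketch.

```haskell
-- Elements common to two strictly increasing (possibly infinite) lists.
common :: Ord a => [a] -> [a] -> [a]
common xs@(x:xt) ys@(y:yt)
  | x < y     = common xt ys       -- x is too small to ever match, skip it
  | x > y     = common xs yt       -- y is too small to ever match, skip it
  | otherwise = x : common xt yt   -- found a value present in both lists
common _ _ = []

trian, penta, hexag :: [Integer]
trian = [ n * (n + 1) `div` 2     | n <- [1 ..] ]
penta = [ n * (3 * n - 1) `div` 2 | n <- [1 ..] ]
hexag = [ n * (2 * n - 1)         | n <- [1 ..] ]

-- Numbers that are triangular, pentagonal and hexagonal at the same time.
shared :: [Integer]
shared = trian `common` penta `common` hexag
```

Here take 2 shared gives [1,40755], the trivial match followed by the one quoted in the question; the next element of shared is the value the problem asks for.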

Resources