I'm trying to understand histomorphisms from this blog on recursion schemes. I'm facing a problem when I'm running the example to solve the change making problem as mentioned in the blog.
The change-making problem takes the denominations of a currency and asks for the minimum number of coins required to make a given sum of money. The code below is taken from the blog and should compute the answer.
{-# LANGUAGE DeriveFunctor #-}
module Main where
import Control.Arrow ( (>>>) )
import Data.List ( partition )
import Prelude hiding (lookup)
newtype Term f = In {out :: f (Term f)}
data Attr f a = Attr
  { attribute :: a
  , hole :: f (Attr f a)
  }
type CVAlgebra f a = f (Attr f a) -> a
histo :: Functor f => CVAlgebra f a -> Term f -> a
histo h = out >>> fmap worker >>> h
  where
    worker t = Attr (histo h t) (fmap worker (out t))
type Cent = Int
coins :: [Cent]
coins = [50, 25, 10, 5, 1]
data Nat a
  = Zero
  | Next a
  deriving (Functor)
-- Convert from a natural number to its foldable equivalent, and vice versa.
expand :: Int -> Term Nat
expand 0 = In Zero
expand n = In (Next (expand (n - 1)))
compress :: Nat (Attr Nat a) -> Int
compress Zero = 0
compress (Next (Attr _ x)) = 1 + compress x
change :: Cent -> Int
change amt = histo go (expand amt)
  where
    go :: Nat (Attr Nat Int) -> Int
    go Zero = 1
    go curr@(Next attr) =
      let given = compress curr
          validCoins = filter (<= given) coins
          remaining = map (given -) validCoins
          (zeroes, toProcess) = partition (== 0) remaining
          results = sum (map (lookup attr) toProcess)
       in length zeroes + results
lookup :: Attr Nat a -> Int -> a
lookup cache 0 = attribute cache
lookup cache n = lookup inner (n - 1) where (Next inner) = hole cache
Now if you evaluate change 10 it will give you 3.
Which is... incorrect because you can make 10 using 1 coin of value 10.
So I considered that maybe it's solving the coin change problem, which counts the number of ways in which you can make the given sum of money. For example, you can make 10 in 4 ways: { 1, 1, ... 10 times }, { 1, 1, 1, 1, 1, 5 }, { 5, 5 }, { 10 }.
So what is wrong with this piece of code? Where is it going wrong in solving the problem?
TLDR
The above piece of code from this blog on recursion schemes is not finding minimum or maximum ways to change a sum of money. Why is it not working?
I put some more thought into encoding this problem with recursion schemes. Maybe there's a good way to solve the unordered problem (i.e., considering 5c + 1c to be the same as 1c + 5c) using a histomorphism to cache the undirected recursive calls, but I don't know what it is. Instead, I looked for a way to use recursion schemes to implement the dynamic-programming algorithm, where the search tree is probed in a specific order so that you're sure you never visit any node more than once.
The tool that I used is the hylomorphism, which comes up a bit later in the article series you're reading. It composes an unfold (anamorphism) with a fold (catamorphism). A hylomorphism uses ana to build up an intermediate structure, and then cata to tear it down into a final result. In this case, the intermediate structure I used describes a subproblem. It has two constructors: either the subproblem is solved already, or there is some amount of money left to make change for, and a pool of coin denominations to use:
data ChangePuzzle a = Solved Int
                    | Pending {spend, forget :: a}
                    deriving Functor
type Cent = Int
type ChangePuzzleArgs = ([Cent], Cent)
We need a coalgebra that turns a single problem into subproblems:
divide :: Coalgebra ChangePuzzle ChangePuzzleArgs
divide (_, 0) = Solved 1
divide ([], _) = Solved 0
divide (coins@(x:xs), n) | n < 0 = Solved 0
                         | otherwise = Pending (coins, n - x) (xs, n)
I hope the first three cases are obvious. The last case is the only one with multiple subproblems. We can either use one coin of the first listed denomination, and continue to make change for that smaller amount, or we can leave the amount the same but reduce the list of coin denominations we're willing to use.
The algebra for combining subproblem results is much simpler: we simply add them up.
conquer :: Algebra ChangePuzzle Int
conquer (Solved n) = n
conquer (Pending a b) = a + b
I originally tried to write conquer = sum (with the appropriate Foldable instance), but this is incorrect. We're not summing up the a types in the subproblem; rather, all the interesting values are in the Int field of the Solved constructor, and sum doesn't look at those because they're not of type a.
Finally, we let recursion schemes do the actual recursion for us with a simple hylo call:
waysToMakeChange :: ChangePuzzleArgs -> Int
waysToMakeChange = hylo conquer divide
And we can confirm it works in GHCI:
*Main> waysToMakeChange (coins, 10)
4
*Main> waysToMakeChange (coins, 100)
292
Whether you think this is worth the effort is up to you. Recursion schemes have saved us very little work here, as this problem is easy to solve by hand. But you may find reifying the intermediate states makes the recursive structure explicit, instead of implicit in the call graph. Anyway it's an interesting exercise if you want to practice recursion schemes in preparation for more complicated tasks.
The full, working file is included below for convenience.
{-# LANGUAGE DeriveFunctor #-}
import Control.Arrow ( (>>>), (<<<) )
newtype Term f = In {out :: f (Term f)}
type Algebra f a = f a -> a
type Coalgebra f a = a -> f a
cata :: (Functor f) => Algebra f a -> Term f -> a
cata fn = out >>> fmap (cata fn) >>> fn
ana :: (Functor f) => Coalgebra f a -> a -> Term f
ana f = In <<< fmap (ana f) <<< f
hylo :: Functor f => Algebra f b -> Coalgebra f a -> a -> b
hylo alg coalg = ana coalg >>> cata alg
data ChangePuzzle a = Solved Int
                    | Pending {spend, forget :: a}
                    deriving Functor
type Cent = Int
type ChangePuzzleArgs = ([Cent], Cent)
coins :: [Cent]
coins = [50, 25, 10, 5, 1]
divide :: Coalgebra ChangePuzzle ChangePuzzleArgs
divide (_, 0) = Solved 1
divide ([], _) = Solved 0
divide (coins@(x:xs), n) | n < 0 = Solved 0
                         | otherwise = Pending (coins, n - x) (xs, n)
conquer :: Algebra ChangePuzzle Int
conquer (Solved n) = n
conquer (Pending a b) = a + b
waysToMakeChange :: ChangePuzzleArgs -> Int
waysToMakeChange = hylo conquer divide
The initial confusion with the blog post arose because its Wikipedia link points to a different problem.
Taking another look at change, it's trying to find the number of "ordered" ways of making change for a given value. This means that the ordering of coins matters. The correct value of change 10 should be 9.
Coming back to the problem, the main issue is with the implementation of the lookup method. The key point is that lookup is backwards, i.e. to calculate the contribution of a denomination to the sum, the denomination itself should be passed as the argument to lookup, not its difference from the given value.
-- To find the contribution of 5 to the number of ways we can
-- change 15, we should pass the cache of 15 and 5 as the
-- parameters. The cache will then be unrolled 5 times
-- to get the value from the cache of 10.
lookup :: Attr Nat a -- ^ cache
       -> Int        -- ^ how much to roll back
       -> a
lookup cache 1 = attribute cache
lookup cache n = lookup inner (n - 1) where (Next inner) = hole cache
The complete solution is described in this issue by @howsiwei.
Edit: Based on the discussion in the comments, this can be solved using histomorphisms, but with a few challenges.
The cache and functor types will need to be more complex to hold more state. Namely:
The cache will need to keep a list of permitted denominations for a particular amount; this will allow us to eliminate overlap.
The harder challenge is to come up with a functor that can order all the information. Nat will not be sufficient because it cannot distinguish between different values of a complex cache type.
I see two problems with this program. One of them I know how to fix, but the other apparently requires more knowledge of recursion schemes than I have.
The one I can fix is that it's looking up the wrong values in its cache. When given = 10, of course validCoins = [10,5,1], and so we find (zeroes, toProcess) = ([0], [5,9]). So far so good: we can give a dime directly, or give a nickel and then make change for the remaining five cents, or we can give a penny and change the remaining nine cents. But then when we write lookup attr 9, we're saying "look 9 steps in history to when curr = 1", where what we meant was "look 1 step into history to when curr = 9". As a result we drastically undercount in pretty much all cases: even change 100 is only 16, while a Google search claims the right result is 292 (I haven't verified this today by implementing it myself).
There are a few equivalent ways to fix this; the smallest diff would be to replace
results = sum (map (lookup attr) toProcess)
with
results = sum (map (lookup attr . (given -)) toProcess)
The second problem is: the values in the cache are wrong. As I mentioned in a comment on the question, this counts different orderings of the same denominations as separate answers to the question. After I fix the first problem, the lowest input where this second problem manifests is 7, with the incorrect result change 7 = 3. If you try change 100 I don't know how long it takes to compute: much longer than it should, probably a very long time. But even a modest value like change 30 yields a number that's much larger than it should be.
I don't see a way to fix this without a substantial algorithm rework. Traditional dynamic-programming solutions to this problem involve producing the solutions in a specific order so you can avoid double-counting. i.e., they first decide how many dimes to use (here, 0 or 1), then compute how to make change for the remaining amounts without using any dimes. I don't know how to work that idea in here - your cache key would need to be larger, including both the target amount and also the allowed set of coins.
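The "decide one denomination at a time" idea above can be sketched without recursion schemes. This is only a minimal illustration, assuming every denomination is at least 1 and the target is non-negative; countWays and addCoin are hypothetical names that do not come from the blog. Each denomination is folded in once, so a given multiset of coins is counted exactly once.

```haskell
-- Count unordered ways to make change, one denomination at a time.
-- Assumes every coin value is >= 1 and target >= 0.
-- countWays and addCoin are illustrative names, not from the blog.
countWays :: [Int] -> Int -> Int
countWays denominations target = foldl addCoin base denominations !! target
  where
    -- With no coins allowed, only the amount 0 can be made (in one way).
    base = 1 : replicate target 0
    -- new !! i = ways !! i        (do not use coin c)
    --          + new !! (i - c)   (use coin c at least once more)
    addCoin ways c = new
      where
        new = zipWith (+) ways (replicate c 0 ++ new)
```

With the coin list from the question, countWays [50, 25, 10, 5, 1] 10 evaluates to 4.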
Related
I'm following the NLPWP Computational Linguistics site and trying to create a Haskell procedure to find collocations (most common groupings of two words, like "United States" or "to find") in a list of words. I've got the following working code to find bigram frequency:
import Data.Map (Map)
import qualified Data.Map as Map
-- | Function for creating a list of bigrams
-- | e.g. [("Colorless", "green"), ("green", "ideas")]
bigram :: [a] -> [[a]]
bigram [] = []
bigram [_] = []
bigram xs = take 2 xs : bigram (tail xs)
-- | Helper for freqList and freqBigram
countElem base alow = case (Map.lookup alow base) of
                        Just v -> Map.insert alow (v + 1) base
                        Nothing -> Map.insert alow 1 base
-- | Maps each word to its frequency.
freqList alow = foldl countElem Map.empty alow
-- | Maps each bigram to its frequency.
freqBigram alow = foldl countElem Map.empty (bigram alow)
I'm trying to write a function that outputs a Map from each bigram to [freq of bigram]/[(freq word 1)*(freq word 2)]. Could you possibly provide advice on how to approach it?
None of the following code is working, but it gives a vague outline for what I was trying to do.
collocations alow =
| let f key = (Map.lookup key freqBi) / ((Map.lookup (first alow) freqs)*(Map.lookup (last alow) freqs))
in Map.mapWithKey f = freqBi
where freqs = (freqList alow)
where freqBi = (freqBigram alow)
I'm very new to Haskell, so let me know if you've got any idea how to fix the collocations procedure. Style tips are also welcome.
Most of your code looks sane, except for the final collocations function.
I'm not sure why there's a stray pipe in there after the equals sign. You're not trying to write any kind of pattern guard, so I don't think that should be there.
Map.lookup returns a Maybe value, so trying to do division or multiplication on it directly isn't going to work. Maybe what you want is some kind of function that takes a key and a map, and returns the associated count, or zero if the key doesn't exist?
Other than that, it looks like you're not too far off having this work.
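One way to get "the count, or zero if the key is missing" is Map.findWithDefault from Data.Map. A minimal sketch; countOf is a hypothetical helper name, not something from the question:

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Count stored for a key, or 0 when the key is absent.
-- countOf is an illustrative name.
countOf :: Ord k => k -> Map k Int -> Int
countOf = Map.findWithDefault 0
```

Whether a missing key should really count as zero is a design choice: for a ratio of frequencies, a zero denominator would need separate handling.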
As I read it, your confusion stems from mistaking types, more or less. General advice: Use type signatures on all your top level functions and make sure they are sensible and what you expect of the function (I often do this even before implementing the function).
Let's take a look at your
-- | Function for creating a list of bigrams
-- | e.g. [("Colorless", "green"), ("green", "ideas")]
bigram :: [a] -> [[a]]
If you're giving in a list of Strings, you'll be getting a list of lists of Strings, so your bigram is a list.
You could decide to be more explicit (only allow Strings instead of some type a, for the beginning at least). So we actually take a list of Words and make a list of Bigrams from it:
type Word = String
type Bigram = (Word, Word)
bigram :: [Word] -> [Bigram]
For the implementation you can try to use readily available functions from Data.List, for example zipWith and tail.
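A minimal sketch of that suggestion, using zip (the pairing special case of zipWith) and drop 1 in place of tail so that the empty list needs no special case. The name bigrams is hypothetical, chosen to avoid clashing with the question's bigram:

```haskell
-- Pair each word with its successor.
-- bigrams is an illustrative name; drop 1 behaves like tail but is total.
bigrams :: [String] -> [(String, String)]
bigrams ws = zip ws (drop 1 ws)
```

For example, bigrams ["Colorless", "green", "ideas"] gives [("Colorless","green"),("green","ideas")].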
Now your freqList and freqBigram look like
freqList :: [Word] -> Map Word Int
freqBigram :: [Word] -> Map Bigram Int
With these signatures, the compiler's error messages will be clearer to you. To point at the problem: take care what you're doing in the lookups for the word frequencies. You're searching for the frequency of word1 and word2, and the bigram is (word1,word2).
Now you should be able to figure the solution out on your own, I guess.
First of all I advise you to have a look at the function
insertWith :: Ord k => (a -> a -> a) -> k -> a -> Map k a -> Map k a
maybe you'll recognize the pattern if used
f freqs bg = insertWith (+) bg 1 freqs
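Folding that step over a list gives a frequency map in one line. A minimal sketch, assuming a strict left fold is acceptable; frequencies is a hypothetical name standing in for freqList and freqBigram:

```haskell
import Data.List (foldl')
import qualified Data.Map as Map
import Data.Map (Map)

-- Map each element to the number of times it occurs.
-- frequencies is an illustrative name.
frequencies :: Ord a => [a] -> Map a Int
frequencies = foldl' (\m x -> Map.insertWith (+) x 1 m) Map.empty
```

The same function works for words and for bigrams, since both only need an Ord instance.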
Next, as @MathematicalOrchid already pointed out, your solution is not too far from being correct.
lookup :: Ord k => k -> Map k a -> Maybe a
You already took care of that in your countElem function.
What I'd like to note is that there is a neat abstraction called Applicative, which works really well for problems like yours.
First of all, you have to import Control.Applicative if you're using a GHC prior to 7.10; in newer versions it is already at your fingertips.
So what does this abstraction provide? Similar to Functor, it gives you a way to handle "side effects", in your case the possibility of a failing lookup resulting in Nothing.
We have two operations provided by Applicative: pure and <*>. In addition, as every Applicative is required to be a Functor, we also get fmap and <$>; the latter is just an infix alias for convenience.
So how does this apply to your situation?
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
(<$>) :: Functor f => (a -> b) -> f a -> f b
First of all you see that those two look darn similar but with <*> being slightly less familiar.
Now having a function
f :: Int -> Int
f x = x + 3
and
x1 :: Maybe Int
x1 = Just 4
x2 :: Maybe Int
x2 = Nothing
one couldn't simply write f x1, because that wouldn't typecheck. But, and this is the first idea to keep in mind, Maybe is a Functor (it is also an Applicative, and it is even an M-thing, but let's not go there).
f <$> x1 = Just 7
f <$> x2 = Nothing
so you can imagine the f looking up the value and performing the calculation inside the Just and if there is no value - a.k.a. we have the Nothing situation, we'll do what every lazy student does - be lazy and do nothing ;-).
Now we get to the next part <*>
g1 :: Maybe (Int -> Int)
g1 = Just (\x -> x + 3)
g2 :: Maybe (Int -> Int)
g2 = Nothing
Still g1 x1 wouldn't work, but
g1 <*> x1 = Just 7
g1 <*> x2 = Nothing
g2 <*> x1 = Nothing -- remember g2 is Nothing
g2 <*> x2 = Nothing
NEAT! - but still how does this solve your problem?
The 'magic' is using both operators ... for multi-argument functions
h :: Int -> Int -> Int
h x y = x + y + 2
and partial function application, which just means put in one value get back a function that waits for the next value.
GHCi> :type h 1
h 1 :: Int -> Int
Now the strange thing happens: we can use <$> with a function like h.
GHCi> :type h <$> x1
h <$> x1 :: Maybe (Int -> Int)
well that's good because then we can use our <*> with it
y1 :: Maybe Int
y1 = Just 7
h <$> x1 <*> y1 = Just (4 + 7 + 2)
                = Just 13
and this even works with an arbitrary number of arguments
k :: Int -> Int -> Int -> Int -> Int
k x y z w = ...
k <$> x1 <*> y1 <*> z1 <*> w1 = ...
So design a pure function that works with Int, Float, Double or whatever you like and then use the Functor/Applicative abstraction to make your lookup and frequency calculation work with each other.
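Applied to the collocation question, that advice might look like the following. This is a sketch under assumptions: the maps are keyed by String and (String, String), and collocationScore and ratio are hypothetical names. The pure function ratio does the arithmetic, and <$> and <*> thread the three lookups through Maybe.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Score of one bigram: freq(bigram) / (freq(word1) * freq(word2)).
-- Returns Nothing if any of the three lookups fails.
-- collocationScore and ratio are illustrative names.
collocationScore
  :: Map String Int            -- word frequencies
  -> Map (String, String) Int  -- bigram frequencies
  -> (String, String)
  -> Maybe Double
collocationScore wordFreqs bigramFreqs bg@(w1, w2) =
  ratio <$> Map.lookup bg bigramFreqs
        <*> Map.lookup w1 wordFreqs
        <*> Map.lookup w2 wordFreqs
  where
    ratio b f1 f2 = fromIntegral b / (fromIntegral f1 * fromIntegral f2)
```

Mapping this over the keys of the bigram map (for instance with Map.mapWithKey) would give a score per bigram.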
Given a function of type f :: a -> a, we can produce a function that applies f for n times:
nTimes :: Int -> (a -> a) -> (a -> a)
nTimes 0 _ = id
nTimes 1 f = f
nTimes n f = f . nTimes (n-1) f
I can use exponentiating by squaring method here to implement another nTimes function:
nTimes' :: Int -> (a -> a) -> (a -> a)
nTimes' = nTimes'' id
where
nTimes'' acc n f
| n == 0 = acc
| even n = nTimes'' acc (n `div` 2) (f . f)
| otherwise = nTimes'' (acc . f) (n-1) f
My question is:
Do nTimes and nTimes' always produce the same result?
Will nTimes' be faster?
Although they are equivalent, I would be extremely surprised if nTimes' were actually faster or memory-saving in any real situation. The problem is that unlike with the x * x doubling in ordinary exponentiation by squaring, f . f does not actually share any of the real work done when applying f. It is still going to turn in the end into applying the outermost single f to an argument constructed by all the remainder somehow. And nTimes (n-1) f x is going to be about the most compact representation you can have of that remainder until it is itself actually needed to be evaluated, which will require applying its leftmost f to a representation of nTimes (n-2) f x, etc.
EDIT: Let me add that this could change significantly if you were doing memoization, i.e. replacing f . f by memo (f . f) for some memo-combinator that modifies a function to remember its results. In that case actual work could be shared, and this version of nTimes' might sometimes be an improvement. Other times it could waste an awful lot of memory, though.
It will produce the same result in both cases, because both * and . are associative operators.
However, the "speedup" is not the speedup you might be thinking of. Exponentiation by squaring is good because it decreases the number of times the * operator is applied, from linear to logarithmic number of times. In this case, you are decreasing the number of times the . operator is applied, from linear to logarithmic number of times.
However, like Ørjan Johansen said, unlike *, the . operator doesn't really do much -- it just takes two function values, and outputs a new function value, which essentially wraps the given two functions plus some code.
The resulting function that you get from nTimes', when applied to a value, must still run f n times. Therefore, there is no improvement in actually running the resulting function, only an improvement in the process of constructing the resulting function using (.).
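A quick way to see both points is to apply (+ 1) to 0: the result equals the number of times f actually ran. A minimal sketch, assuming n is non-negative; the local helper go and the check sameResult are hypothetical names, and the definitions otherwise follow the question.

```haskell
nTimes :: Int -> (a -> a) -> (a -> a)
nTimes 0 _ = id
nTimes n f = f . nTimes (n - 1) f

-- Squaring variant from the question, with the helper renamed to go.
nTimes' :: Int -> (a -> a) -> (a -> a)
nTimes' = go id
  where
    go acc n f
      | n == 0    = acc
      | even n    = go acc (n `div` 2) (f . f)
      | otherwise = go (acc . f) (n - 1) f

-- Counting applications of (+ 1): both versions apply it n times.
-- sameResult is an illustrative name; assumes n >= 0.
sameResult :: Int -> Bool
sameResult n = nTimes n (+ 1) 0 == (nTimes' n (+ 1) 0 :: Int)
```

Here nTimes' 10 (+ 1) 0 is 10: the composed function was built in fewer steps, but the increment still runs ten times.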
Do nTimes and nTimes' always produce the same result?
Yes. (Unless you have bugs; I didn't check that.)
Will nTimes' be faster?
Probably not significantly. f itself is shared in both cases, so there is no recomputation there.
If f is not sufficiently lazy you are building a list of references to f vs. a tree with sharing of references to f, so you are saving a bit of memory here.
I'm new to Haskell and am trying to solve problem 3 from http://projecteuler.net/.
The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143 ?
My solution:
import Data.List
getD :: Int -> Int
getD x =
  -- find divisors
  let deriveList = filter (\y -> (x `mod` y) == 0) [1 .. x]
      filteredList = filter isSimpleNumber deriveList
  in maximum filteredList
-- Check whether the number is prime
isSimpleNumber :: Int -> Bool
isSimpleNumber x = let deriveList = map (\y -> (x `mod` y)) [1 .. x]
                       filterLength = length ( filter (\z -> z == 0) deriveList)
                   in
                     case filterLength of
                       2 -> True
                       _ -> False
I try to run for example:
getD 13195
> 29
But when i try:
getD 600851475143
I get the error Exception: Prelude.maximum: empty list. Why?
Thank you @Barry Brown, I think I must use:
getD :: Integer -> Integer
But i get error:
Couldn't match expected type `Int' with actual type `Integer'
Expected type: [Int]
Actual type: [Integer]
In the second argument of `filter', namely `deriveList'
In the expression: filter isSimpleNumber deriveList
Thank you.
Your type signature limits the values to the range of Int, which is only guaranteed to reach about 2^29 (GHC uses the machine word size). Try changing Int to Integer.
Edit:
I see that you already realised that you need to use Integer instead of Int. You need to change the types of both getD and isSimpleNumber otherwise you will get a type mismatch.
Also in general, if you are having trouble with types, simply remove the type declarations and let Haskell tell you the correct types.
Main> :t getD
getD :: Integral a => a -> a
Main> :t isSimpleNumber
isSimpleNumber :: Integral a => a -> Bool
After you found the error, may I point out that your solution is quite verbose? In this case a very simple implementation using brute force is good enough:
getD n = getD' n 2 where
  getD' n f | n == f = f
            | n `mod` f == 0 = getD' (n `div` f) f
            | otherwise = getD' n (succ f)
This question is easy enough for a brute-force solution, but that is a bad idea, because the whole point of Project Euler is problems you need to really think about to solve (see the end of this answer).
So here are some of your program's flaws:
First, use rem instead of mod. It is more efficient.
Some mathematical thinking should have told you that you don't need to check all numbers from 1 to x in the isprime function and the getD function; checking all numbers from the square root down to one (or in reverse) is sufficient. Note that in getD you will actually need to filter the numbers between x and the square root, because you are searching for the biggest one.
Why do you use the maximum function in getD? You know the list is monotonically increasing, so you may as well take the last element.
Although you only need the biggest divisor (which is prime), you compute the divisor list from small to big, making the computer check each value for divisibility and then discard the result once a bigger divisor is found. This can be fixed by filtering the list of numbers from x down to 1, not from 1 to x. The computer will then check divisibility for the biggest possible divisor first, without throwing away the knowledge of previous checks. Note that this optimization takes effect only if the previous point is addressed, because otherwise the computer will compute all divisors anyway.
With the previous points combined, you should have filtered all numbers [x,x-1 .. squareroot x] and taken the first.
You don't use an efficient isPrime function. If I were you, I would have searched for an isprime library function, which is likely to be efficient.
And there are more...
With this kind of code you will never be able to solve the harder Project Euler problems. They are designed to need extra thinking about the problem (for instance, noticing that you don't have to check numbers greater than the square root) and fast, efficient code. This is the purpose of Project Euler: being smart about programming. So don't skip it.
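The square-root observation can be sketched as plain trial division. This is one possible approach rather than the approach, assuming the input is at least 2; largestPrimeFactor and go are hypothetical names. Each factor found is divided out, so the search stops as soon as the candidate exceeds the square root of what remains.

```haskell
-- Largest prime factor by trial division, assuming the input is >= 2.
-- largestPrimeFactor and go are illustrative names.
largestPrimeFactor :: Integer -> Integer
largestPrimeFactor n0 = go n0 2
  where
    go n f
      | f * f > n      = n                  -- no factor <= sqrt n left: n is prime
      | n `rem` f == 0 = go (n `quot` f) f  -- divide the factor out and retry it
      | otherwise      = go n (f + 1)
```

Using Integer throughout avoids the overflow from the question, and largestPrimeFactor 13195 gives 29 as in the problem statement.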
I apologize for not coming up with a good title for this question. I'm having some trouble expressing what I need. I have a simple problem in Haskell and I am wondering what the best approach is to solve it.
Let's say I have a list of numbers: [-3,2,1,2]. I want to return the value with the highest absolute value. That is, I want to return -3. So I want:
f = maximum . map abs
The problem is, of course, that this returns the calculated value (3) and not the original value (-3).
I could figure out a way of doing this, maybe mapping the original list to a tuple of (originalValue, calculatedValue), finding the tuple whose snd is returned by my function (maximum) and then returning fst of that tuple.
But this seems like a lot of "plumbing" for a simple problem like this, and I wonder if there is some abstraction I'm missing that solves this. That is, there is this general procedure I do all the time, and I want some way of neatly doing it:
I want to take a list of items.
I want to map them to a certain value (let's say the absolute value)
Then I want to select one based on some criteria (let's say I want the maximum or maybe the minimum).
But then I want to return the original value. (If the list was [-3,2,1,2] and I want to return the value with the highest abs, then I would return -3).
Is there a library function for this? Is there a functor or a monad for this?
I think I want a function with the signature:
f :: ([b] -> b) -> (a -> b) -> [a] -> a
i.e.
f maximum abs [-3,2,1,2]
This feels very "functory" to me or maybe "monadic".
Use maximumBy which takes a comparison function. You can then pass some function that compares the way you want.
maximumBy (compare `on` abs)
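For completeness, a self-contained version of that one-liner with the imports it needs (on lives in Data.Function, maximumBy in Data.List). The name largestByAbs is hypothetical:

```haskell
import Data.Function (on)
import Data.List (maximumBy)

-- Element with the greatest absolute value; errors on an empty list,
-- just like maximum. largestByAbs is an illustrative name.
largestByAbs :: [Int] -> Int
largestByAbs = maximumBy (compare `on` abs)
```

With the list from the question, largestByAbs [-3, 2, 1, 2] returns -3.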
Stop...hoogle time!
So you've got a list of stuff [a]. And you want to end up with just one of those a. You also want to compare elements of this list in some special way (not their natural ordering), in order to determine which comes first. This is the tricky part, but you should be able to see that what I've described is a function of the form a -> a -> Ordering.
Put it all together:
(a -> a -> Ordering) -> [a] -> a
And hoogle it. maximumBy and minimumBy are the first hits :) Hoogle can be a powerful asset when you learn to use it. (See augustss's answer for details on how to use maximumBy in this case)
Another way to do it, if the conversion is a bit expensive:
maximumWith :: (Ord b) => (a -> b) -> [a] -> a
maximumWith f = snd . maximumBy (compare `on` fst) . map (f &&& id)
This type is similar to GHC.Exts's sortWith, which gives us another way to do it:
maximumWith :: (Ord b) => (a -> b) -> [a] -> a
maximumWith f = head . sortWith (Down . f)
We can define a minimumWith similarly:
minimumWith :: (Ord b) => (a -> b) -> [a] -> a
minimumWith f = head . sortWith f
A look at the source for sortWith reveals that it's implemented by sortBy, so it lacks the caching that the first definition for maximumWith had.
This obviously calls for some benchmarking:
module Main where
import Control.Arrow ((&&&))
import Data.List (sortBy)
import Data.Function (on)
import GHC.Exts (sortWith)
import Criterion.Main
sortWith :: (Ord b) => (a -> b) -> [a] -> [a]
sortWith f = map snd . sortBy (compare `on` fst) . map (f &&& id)
badFib :: Int -> Int
badFib 0 = 1
badFib 1 = 1
badFib n = badFib (n - 1) + badFib (n - 2)
main = defaultMain [ bench "GHC.Exts.sortWith" $ nf (GHC.Exts.sortWith badFib) [0..20]
                   , bench "Main.sortWith" $ nf (Main.sortWith badFib) [0..20]
                   ]
The results on my laptop:
benchmarking GHC.Exts.sortWith
collecting 100 samples, 12 iterations each, in estimated 1.504415 s
bootstrapping with 100000 resamples
mean: 1.264608 ms, lb 1.260519 ms, ub 1.270248 ms, ci 0.950
std dev: 24.42169 us, lb 19.21734 us, ub 31.50275 us, ci 0.950
found 8 outliers among 100 samples (8.0%)
5 (5.0%) high mild
3 (3.0%) high severe
variance introduced by outliers: 0.996%
variance is unaffected by outliers
benchmarking Main.sortWith
collecting 100 samples, 50 iterations each, in estimated 1.516733 s
bootstrapping with 100000 resamples
mean: 305.9089 us, lb 304.0602 us, ub 310.9257 us, ci 0.950
std dev: 14.41005 us, lb 6.680240 us, ub 30.26940 us, ci 0.950
found 18 outliers among 100 samples (18.0%)
9 (9.0%) high mild
9 (9.0%) high severe
variance introduced by outliers: 0.999%
variance is unaffected by outliers
If you are trying to have something ordered and compared by a projection always, rather than just at a specific usage (in which case see augustss's answer), then use a newtype wrapper:
newtype AbsInt = AbsInt Int
instance Eq AbsInt where
  AbsInt x == AbsInt y = abs x == abs y
instance Ord AbsInt where
  compare (AbsInt x) (AbsInt y) = compare (abs x) (abs y)
Now, for example:
maximum [AbsInt 1, AbsInt 10, AbsInt (-50)] = AbsInt (-50)
Presumably you would be working with AbsInt as your objects of study, so you wouldn't be writing those AbsInts everywhere.
The more operations you need on AbsInt, the more boilerplate you need. However if you just want to "pass through" some instances, GHC has an extension GeneralizedNewtypeDeriving that allows that; eg.:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
newtype AbsInt = AbsInt Int
  deriving (Num)
Now AbsInt behaves like an Int with regard to arithmetic, but (given the instances above) by absolute values with regard to comparison. Also note that the Num instance gives you the ability to use literals, so:
(maximum [1,2,-3] :: AbsInt) = AbsInt (-3)
I believe something along the lines of the following should work.
foldl abs_max (head xs) xs
  where abs_max x y = if (abs x) > (abs y) then x else y
Looking beyond the task at hand you could generalize it by abstracting out the comparison function and passing it in later.
Here is something I cooked up. It's kind of meh, because it requires (Eq b)
selectOn :: (Eq b) => ([b] -> b) -> (a -> b) -> [a] -> a
selectOn reducer f list = head $ filter (\x -> f(x) == k ) list
  where k = reducer $ map f list
And then:
selectOn maximum abs [1,2,-3]
Or:
selectOn sum id [-3, 0, 3]
I guess I can generalize compare on and get the exact same effect.
I've coded up the 0-1 Knapsack problem in Haskell. I'm fairly proud about the laziness and level of generality achieved so far.
I start by providing functions for creating and dealing with a lazy 2d matrix.
mkList f = map f [0..]
mkTable f = mkList (\i -> mkList (\j -> f i j))
tableIndex table i j = table !! i !! j
I then make a specific table for a given knapsack problem
knapsackTable = mkTable f
  where f 0 _ = 0
        f _ 0 = 0
        f i j | ws!!i > j = leaveI
              | otherwise = max takeI leaveI
          where takeI  = tableIndex knapsackTable (i-1) (j-(ws!!i)) + vs!!i
                leaveI = tableIndex knapsackTable (i-1) j
-- weight value pairs; item i has weight ws!!i and value vs!!i
ws = [0,1,2, 5, 6, 7] -- weights
vs = [0,1,7,11,21,31] -- values
And finish off with a couple helper functions for looking at the table
viewTable table maxI maxJ = take (maxI+1) . map (take (maxJ+1)) $ table
printTable table maxI maxJ = mapM_ print $ viewTable table maxI maxJ
This much was pretty easy. But I want to take it a step further.
I want a better data structure for the table. Ideally, it should be
Unboxed (immutable) [edit] never mind this
Lazy
Unbounded
O(1) time to construct
O(1) time complexity for looking up a given entry,
(more realistically, at worst O(log n), where n is i*j for looking up the entry at row i, column j)
Bonus points if you can explain why/how your solution satisfies these ideals.
Also bonus points if you can further generalize knapsackTable, and prove that it is efficient.
In improving the data structure you should try to satisfy the following goals:
If I ask for the solution where the maximum weight is 10 (in my current code, that would be tableIndex knapsackTable 5 10, where the 5 means include items 1-5), only the minimal amount of work necessary should be performed. Ideally this means no O(i*j) work for forcing the spine of each row of the table to the necessary column length. You could say this isn't "true" DP, if you believe DP means evaluating the entirety of the table.
If I ask for the entire table to be printed (something like printTable knapsackTable 5 10), the values of each entry should be computed once and only once. The values of a given cell should depend on the values of other cells (DP style: the idea being, never recompute the same subproblem twice)
Ideas:
Data.Array bounded :(
UArray strict :(
Memoization techniques (SO question about DP in Haskell) this might work
Answers that make some compromises to my stated ideals will be upvoted (by me, anyways) as long as they are informative. The answer with the least compromises will probably be the "accepted" one.
First, your criterion for an unboxed data structure is probably a bit misguided. Unboxed values must be strict, and they have nothing to do with immutability. The solution I'm going to propose is immutable, lazy, and boxed. Also, I'm not sure in what way you are wanting construction and querying to be O(1). The structure I'm proposing is lazily constructed, but because it's potentially unbounded, its full construction would take infinite time. Querying the structure will take O(k) time for any particular key of size k, but of course the value you're looking up may take further time to compute.
The data structure is a lazy trie. I'm using Conal Elliott's MemoTrie library in my code. For genericity, it takes functions instead of lists for the weights and values.
knapsack :: (Enum a, Eq a, Num w, Num v, Num a, Ord w, Ord v, HasTrie a, HasTrie w) =>
            (a -> w) -> (a -> v) -> a -> w -> v
knapsack weight value = knapsackMem
  where knapsackMem = memo2 knapsack'
        knapsack' 0 w = 0
        knapsack' i 0 = 0
        knapsack' i w
          | weight i > w = knapsackMem (pred i) w
          -- value i is added only on the branch that takes item i
          | otherwise = max (knapsackMem (pred i) w)
                            (knapsackMem (pred i) (w - weight i) + value i)
Basically, it's implemented as a trie with a lazy spine and lazy values. It's bounded only by the key type. Because the entire thing is lazy, its construction before forcing it with queries is O(1). Each query forces a single path down the trie and its value, so a lookup takes time proportional to the key size: O(1) for a bounded key type, or O(log n) for a key of magnitude n. As I already said, it's immutable, but not unboxed.
It will share all work in the recursive calls. It doesn't actually allow you to print the trie directly, but something like this should not do any redundant work:
mapM_ (print . uncurry (knapsack ws vs)) $ range ((0,0), (i,w))
Unboxed implies strict and bounded. Anything 100% Unboxed cannot be Lazy or Unbounded. The usual compromise is embodied in converting [Word8] to Data.ByteString.Lazy where there are unboxed chunks (strict ByteString) which are linked lazily together in an unbounded way.
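To make the chunked compromise concrete, here is a minimal sketch using the bytestring package. The names chunked, endless, and firstEight are made up for illustration; the point is only that each chunk is a strict, unboxed block while the links between chunks stay lazy, so an unbounded value can still be consumed piecewise.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Three strict, unboxed chunks linked lazily into one lazy ByteString.
chunked :: BL.ByteString
chunked = BL.fromChunks [BS.pack [1, 2], BS.pack [3], BS.pack [4, 5, 6]]

-- An unbounded lazy ByteString; only the chunks that are demanded get built.
endless :: BL.ByteString
endless = BL.cycle chunked

-- Taking a finite prefix of the unbounded value terminates.
firstEight :: [Word8]
firstEight = BL.unpack (BL.take 8 endless)
```

In GHCi, firstEight evaluates to [1,2,3,4,5,6,1,2], and map BS.length (BL.toChunks chunked) shows the chunk sizes [2,1,3].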
A much more efficient table generator (enhanced to track individual items) could be made using "scanl", "zipWith", and my "takeOnto". This effectively avoids using (!!) while creating the table:
import Data.List(sort,genericTake)
type Table = [ [ Entry ] ]
data Entry = Entry { bestValue :: !Integer, pieces :: [[WV]] }
  deriving (Read,Show)
data WV = WV { weight, value :: !Integer }
  deriving (Read,Show,Eq,Ord)
instance Eq Entry where
  (==) a b = (==) (bestValue a) (bestValue b)
instance Ord Entry where
  compare a b = compare (bestValue a) (bestValue b)
solutions :: Entry -> Int
solutions = length . filter (not . null) . pieces
addItem :: Entry -> WV -> Entry
addItem e wv = Entry { bestValue = bestValue e + value wv, pieces = map (wv:) (pieces e) }
-- Utility function for improve
takeOnto :: ([a] -> [a]) -> Integer -> [a] -> [a]
takeOnto endF = go where
  go n rest | n <=0 = endF rest
            | otherwise = case rest of
                            (x:xs) -> x : go (pred n) xs
                            [] -> error "takeOnto: unexpected []"
improve oldList wv@(WV {weight=wi,value = vi}) = newList where
  newList | vi <=0 = oldList
          | otherwise = takeOnto (zipWith maxAB oldList) wi oldList
  -- Dual traversal of index (w-wi) and index w makes this a zipWith
  maxAB e2 e1 = let e2v = addItem e2 wv
                in case compare e1 e2v of
                     LT -> e2v
                     EQ -> Entry { bestValue = bestValue e1
                                 , pieces = pieces e1 ++ pieces e2v }
                     GT -> e1
-- Note that the returned table is finite
-- The dependence on only the previous row makes this a "scanl" operation
makeTable :: [Int] -> [Int] -> Table
makeTable ws vs =
  let wvs = zipWith WV (map toInteger ws) (map toInteger vs)
      nil = repeat (Entry { bestValue = 0, pieces = [[]] })
      totW = sum (map weight wvs)
  in map (genericTake (succ totW)) $ scanl improve nil wvs
-- Create specific table, note that weights (1+7) equal weight 8
ws, vs :: [Int]
ws = [2,3, 5, 5, 6, 7] -- weights
vs = [1,7,8,11,21,31] -- values
t = makeTable ws vs
-- Investigate table
seeTable = mapM_ seeBestValue t
  where seeBestValue row = mapM_ (\v -> putStr (' ':(show (bestValue v)))) row >> putChar '\n'
ways = mapM_ seeWays t
  where seeWays row = mapM_ (\v -> putStr (' ':(show (solutions v)))) row >> putChar '\n'
-- This has two ways of satisfying a bestValue of 8 for 3 items up to total weight 5
interesting = print (t !! 3 !! 5)
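If you only need the best values and not the item lists, the same scanl/zipWith idea can be sketched much more compactly. This is a simplified illustration rather than the code above: addItemRow, bestValues, and exampleItems are hypothetical names, and rows are kept as infinite lists indexed by capacity instead of being truncated to the total weight.

```haskell
-- One row update for a 0/1 knapsack: for capacity j >= w the new entry is
-- max old[j] (old[j - w] + v); entries below w are copied unchanged.
addItemRow :: [Integer] -> (Int, Integer) -> [Integer]
addItemRow old (w, v)
  | v <= 0    = old
  | otherwise = take w old ++ zipWith max (drop w old) (map (+ v) old)

-- Each row depends only on the previous one, so the table is a scanl.
-- Row i holds the best values using the first i items.
bestValues :: [(Int, Integer)] -> [[Integer]]
bestValues = scanl addItemRow (repeat 0)

-- The same weights and values as in the table above, as (weight, value) pairs.
exampleItems :: [(Int, Integer)]
exampleItems = zip [2, 3, 5, 5, 6, 7] [1, 7, 8, 11, 21, 31]
```

For example, bestValues exampleItems !! 3 !! 5 gives 8, matching the bestValue of the "interesting" entry. Because every row is lazy, only the prefix of each row that a query actually reaches gets evaluated.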
Lazy storable vectors: http://hackage.haskell.org/package/storablevector
Unbounded, lazy, O(chunksize) time to construct, O(n/chunksize) indexing, where chunksize can be sufficiently large for any given purpose. Basically a lazy list with some significant constant-factor benefits.
To memoize functions, I recommend a library like Luke Palmer's memo combinators. The library uses tries, which are unbounded and have O(key size) lookup. (In general, you can't do better than O(key size) lookup because you always have to touch every bit of the key.)
knapsack :: (Int,Int) -> Solution
knapsack = memo f
  where
    memo = pair integral integral
    f (i,j) = ... knapsack (i-b,j) ...
Internally, the integral combinator probably builds an infinite data structure
data IntTrie a = Branch (IntTrie a) a (IntTrie a)
integral f = \n -> lookup n table
  where
    table = build f
    build g = Branch (build (\n -> g (2*n))) (g 0) (build (\n -> g (2*n+1)))
Lookup works like this (for non-negative n):
lookup 0 (Branch l a r) = a
lookup n (Branch l a r) = if even n then lookup n2 l else lookup n2 r
  where n2 = n `div` 2
There are other ways to build infinite tries, but this one is popular.
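For anyone who wants to experiment without installing a library, the trie idea can be sketched in plain base Haskell. This is an illustration under assumptions, not the library's actual implementation: buildTrie, lookupTrie, memoInt, memoInt2, and this list-based knapsack are hypothetical names, and keys must be non-negative.

```haskell
-- Infinite binary trie indexed by the bits of a non-negative Int.
data IntTrie a = Branch (IntTrie a) a (IntTrie a)

-- Lazily tabulate f: the left subtree holds even keys, the right odd keys.
buildTrie :: (Int -> a) -> IntTrie a
buildTrie f =
  Branch (buildTrie (\n -> f (2 * n))) (f 0) (buildTrie (\n -> f (2 * n + 1)))

-- Walk one path, consuming one bit of the key per step (O(log n) steps).
lookupTrie :: Int -> IntTrie a -> a
lookupTrie 0 (Branch _ a _) = a
lookupTrie n (Branch l _ r)
  | even n    = lookupTrie (n `div` 2) l
  | otherwise = lookupTrie (n `div` 2) r

-- Memoize a function of one non-negative Int.
memoInt :: (Int -> a) -> Int -> a
memoInt f = \n -> lookupTrie n table
  where table = buildTrie f

-- Memoize a function of two non-negative Ints by nesting tries.
memoInt2 :: (Int -> Int -> a) -> Int -> Int -> a
memoInt2 f = memoInt (\i -> memoInt (f i))

-- 0/1 knapsack over items 1..i with capacity w; recursion goes through the memo.
-- Item lookup with (!!) is O(i) and is used here only to keep the sketch short.
knapsack :: [Int] -> [Int] -> Int -> Int -> Int
knapsack ws vs = go
  where
    go = memoInt2 go'
    go' 0 _ = 0
    go' _ 0 = 0
    go' i w
      | wi > w    = go (i - 1) w
      | otherwise = max (go (i - 1) w) (go (i - 1) (w - wi) + vi)
      where
        wi = ws !! (i - 1)
        vi = vs !! (i - 1)
```

With the weights and values from the table answer, knapsack [2,3,5,5,6,7] [1,7,8,11,21,31] 3 5 evaluates to 8. Only the cells reachable from the query are ever forced, and each of them is computed once.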
Why not use a Data.Map with another Data.Map nested inside it? As far as I know it's quite fast.
It wouldn't be lazy though.
What's more, you can implement the Ord typeclass for your data type
data Index = Index Int Int
and use a two-dimensional index directly as the key. You can achieve laziness by generating the entries as a list and then just using
fromList [(Index 0 0, value11), (Index 0 1, value12), ...]
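A minimal sketch of that idea, assuming the containers package: Data.Map is strict in its keys but lazy in its values, so each cell can be a thunk that refers back to the map itself. The names knapsackMap and exampleItems are hypothetical, and Index derives Ord here rather than implementing it by hand. Note the compromise: the key set has to be finite and fixed up front, so this is bounded, unlike the trie-based answers.

```haskell
import qualified Data.Map as M

-- Two-dimensional index: (number of items considered, capacity).
data Index = Index Int Int
  deriving (Eq, Ord, Show)

-- Build the whole key set eagerly, but leave every value as a lazy thunk
-- that looks up other cells of the same map when it is finally forced.
knapsackMap :: [(Int, Int)] -> Int -> M.Map Index Int
knapsackMap items maxW = table
  where
    n = length items
    table = M.fromList [ (Index i w, cell i w) | i <- [0 .. n], w <- [0 .. maxW] ]
    cell 0 _ = 0
    cell i w
      | wi > w    = table M.! Index (i - 1) w
      | otherwise = max (table M.! Index (i - 1) w)
                        (table M.! Index (i - 1) (w - wi) + vi)
      where (wi, vi) = items !! (i - 1)

-- (weight, value) pairs, reusing the numbers from the table answer above.
exampleItems :: [(Int, Int)]
exampleItems = zip [2, 3, 5, 5, 6, 7] [1, 7, 8, 11, 21, 31]
```

For example, knapsackMap exampleItems 28 M.! Index 3 5 gives 8. Building the map costs O(i*j log(i*j)) for the keys, but a cell's value is computed only when demanded and then shared.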