I'm trying to write a function in Haskell to generate multidimensional lists.
(Technically I'm using Curry, but my understanding is that it's mostly a superset of Haskell, and the thing I'm trying to do is common to Haskell as well.)
After a fair bit of head scratching, I realized my initial desired function (m_array generating_function list_of_dimensions, giving a list nested to a depth equal to length list_of_dimensions) was probably at odds with the type system itself, since (AFAICT) the nesting depth of a list is part of its type. My function wanted to return values whose nesting depth depended on the value of a parameter, meaning it wanted to return values whose types varied based on the value of a parameter, which (AFAICT) isn't supported in Haskell. (If I'm wrong, and this CAN be done, please tell me.) At this point I moved on to the approach in the next paragraph, but if there's a workaround I've missed that takes very similar parameters and still outputs a nested list, let me know. Like, maybe if you can encode the indices as some data type that implicitly includes the nesting level in its type, and is instantiated with e.g. dimensions 5 2 6 ..., maybe that'd work? Not sure.
In any case, I thought that perhaps I could encode the nesting-depth by nesting the function itself, while still keeping the parameters manageable. This did work, and I ended up with the following:
ma f (l:ls) idx = [f ls (idx++[i]) | i <- [0..(l-1)]]
However, so far it's still a little clunky to use: you need to nest the calls, like
ma (ma (ma (\_ i -> 0))) [2,2,2] []
(which, btw, gives [[[0,0],[0,0]],[[0,0],[0,0]]]. If you use (\_ i -> i), it fills the array with the indices of the corresponding element, which is a result I'd like to keep available, but could be a confusing example.)
I'd prefer to minimize the boilerplate necessary. If I can't just call
ma (\_ i -> i) [2,2,2]
I'd LIKE to be able to call, at worst,
ma ma ma (\_ i -> i) [2,2,2] []
But if I try that, I get errors. Presumably the list of parameters is being divvied up in a way that doesn't make sense for the function. I've spent about half an hour googling and experimenting, trying to figure out Haskell's mechanism for parsing strings of functions like that, but I haven't found a clear explanation, and understanding eludes me. So, the formal questions:
How does Haskell parse e.g. f1 f2 f3 x y z? How are the arguments assigned? Is it dependent on the signatures of the functions, or does it e.g. just try to call f1 with 5 arguments?
Is there a way of restructuring ma to permit calling it without parentheses? (Adding at most two helper functions would be permissible, e.g. maStart ma ma maStop (\_ i -> i) [1,2,3,4] [], if necessary.)
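On the parsing question, one thing can be said with confidence: function application in Haskell is left-associative and is parsed without consulting any type signatures, so f1 f2 f3 x y z means ((((f1 f2) f3) x) y) z, and types are only checked afterwards. The sketch below illustrates this with a hypothetical helper applyTwice (not part of the question). It also shows that the nested calls to ma can be written with function composition instead of nested parentheses, assuming the type signature given for ma here matches the intended one.

```haskell
-- Application is left-associative: f a b c parses as ((f a) b) c,
-- independent of the types involved.

-- Hypothetical helper, only for illustration.
applyTwice :: (a -> a) -> a -> a
applyTwice g v = g (g v)

-- Parsed as ((applyTwice applyTwice) (+ 1)) 5: the first applyTwice receives
-- the second one as its function argument, so (+ 1) ends up applied four times.
fourTimes :: Int
fourTimes = applyTwice applyTwice (+ 1) 5

-- The question's ma, with an assumed type signature.
ma :: ([Int] -> [Int] -> a) -> [Int] -> [Int] -> [a]
ma f (l:ls) idx = [f ls (idx ++ [i]) | i <- [0 .. l - 1]]

-- ma (ma (ma f)) written with composition instead of nested parentheses.
cube :: [[[Int]]]
cube = (ma . ma . ma) (\_ _ -> 0) [2, 2, 2] []
```

Read this way, ma ma ma f dims [] hands the second ma to the first as its generating function and the third ma as its dimension list, which is why the type checker rejects it.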
The function you want in your head-scratching paragraph is possible directly -- though a bit noisily. With GADTs and DataKinds, values can be parameterized by numbers. You won't be able to use lists directly, because they don't mention their length in their type, but a straightforward variant that does works great. Here's how it looks.
{-# Language DataKinds #-}
{-# Language GADTs #-}
{-# Language ScopedTypeVariables #-}
{-# Language StandaloneDeriving #-}
{-# Language TypeOperators #-}
import GHC.TypeLits
infixr 5 :+
data Vec n a where
    O :: Vec 0 a -- O is supposed to look a bit like a mix of 0 and []
    (:+) :: a -> Vec n a -> Vec (n+1) a
data FullTree n a where
    Leaf :: a -> FullTree 0 a
    Branch :: [FullTree n a] -> FullTree (n+1) a
deriving instance Show a => Show (Vec n a)
deriving instance Show a => Show (FullTree n a)
ma :: forall n a. ([Int] -> a) -> Vec n Int -> FullTree n a
ma f = go [] where
    go :: [Int] -> Vec n' Int -> FullTree n' a
    go is O = Leaf (f is)
    go is (l :+ ls) = Branch [go (i:is) ls | i <- [0..l-1]]
Try it out in ghci:
> ma (\_ -> 0) (2 :+ 2 :+ 2 :+ O)
Branch [Branch [Branch [Leaf 0,Leaf 0],Branch [Leaf 0,Leaf 0]],Branch [Branch [Leaf 0,Leaf 0],Branch [Leaf 0,Leaf 0]]]
> ma (\i -> i) (2 :+ 2 :+ 2 :+ O)
Branch [Branch [Branch [Leaf [0,0,0],Leaf [1,0,0]],Branch [Leaf [0,1,0],Leaf [1,1,0]]],Branch [Branch [Leaf [0,0,1],Leaf [1,0,1]],Branch [Leaf [0,1,1],Leaf [1,1,1]]]]
A low-tech solution:
In Haskell, you can model multi-level lists by using the so-called free monad.
The base definition is:
data Free ft a = Pure a | Free (ft (Free ft a))
where ft can be any functor, but here we are interested in ft being [], that is the list functor.
So we define our multidimensional list like this:
import Control.Monad
import Control.Monad.Free
type Mll = Free [] -- Multi-Level List
The Mll type constructor happens to be an instance of the Functor, Foldable and Traversable classes, which can come in handy.
To make an array of arbitrary dimension, we start with:
the list of dimensions, for example [5,2,6]
the filler function, which returns a value for a given set of indices
We can start by making a “grid” object, whose item at indices say [x,y,z] is precisely the [x,y,z] list. As we have a functor instance, we can complete the process by just applying fmap filler to our grid object.
This gives the following code:
makeNdArray :: ([Int] -> a) -> [Int] -> Mll a
makeNdArray filler dims =
    let
        addPrefix x (Pure xs) = Pure (x:xs)
        addPrefix x (Free xss) = Free $ map (fmap (x:)) xss
        makeGrid [] = Pure []
        makeGrid (d:ds) = let base = 0
                              fn k = addPrefix k (makeGrid ds)
                          in Free $ map fn [base .. (d-1+base)]
        grid = makeGrid dims
    in
        fmap filler grid -- because we are an instance of the Functor class
To visualize the resulting structure, it is handy to be able to remove the constructor names:
displayMll :: Show a => Mll a -> String
displayMll = filter (\ch -> not (elem ch "Pure Free")) . show
The resulting structure can easily be flattened if need be:
toListFromMll :: Mll a -> [a]
toListFromMll xs = foldr (:) [] xs
For numeric base types, we can get a multidimensional sum function “for free”, so to speak:
mllSum :: Num a => (Mll a) -> a
mllSum = sum -- because we are an instance of the Foldable class
-- or manually: foldr (+) 0
Some practice:
We use [5,2,6] as the dimension set. To visualize the structure, we associate a decimal digit to every index. We can pretend to have 1-base indexing by adding 111, because that way all the resulting numbers are 3 digits long, which makes the result easier to check. Extra newlines added manually.
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> dims = [5,2,6]
λ> filler = \[x,y,z] -> (100*x + 10*y + z + 111)
λ>
λ> mxs = makeNdArray filler dims
λ>
λ> displayMll mxs
"[[[111,112,113,114,115,116],[121,122,123,124,125,126]],
[[211,212,213,214,215,216],[221,222,223,224,225,226]],
[[311,312,313,314,315,316],[321,322,323,324,325,326]],
[[411,412,413,414,415,416],[421,422,423,424,425,426]],
[[511,512,513,514,515,516],[521,522,523,524,525,526]]]"
λ>
As mentioned above, we can flatten the structure:
λ>
λ> xs = toListFromMll mxs
λ> xs
[111,112,113,114,115,116,121,122,123,124,125,126,211,212,213,214,215,216,221,222,223,224,225,226,311,312,313,314,315,316,321,322,323,324,325,326,411,412,413,414,415,416,421,422,423,424,425,426,511,512,513,514,515,516,521,522,523,524,525,526]
λ>
or take its overall sum:
λ>
λ> sum mxs
19110
λ>
λ> sum xs
19110
λ>
λ>
λ> length mxs
60
λ>
λ> length xs
60
λ>
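For readers without the free package installed, the same idea can be sketched with a hand-rolled type. This is a simplified variant, not the code from the answer: Nested, Item, Nest, grid, makeNd and filler3 are hypothetical names, and the Functor and Foldable instances are derived rather than taken from Control.Monad.Free.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}

-- A list nested to arbitrary depth; a specialised stand-in for Free [].
data Nested a = Item a | Nest [Nested a]
  deriving (Show, Functor, Foldable)

-- Build the "grid" whose element at indices [x,y,z] is the list [x,y,z].
grid :: [Int] -> Nested [Int]
grid []     = Item []
grid (d:ds) = Nest [fmap (i :) (grid ds) | i <- [0 .. d - 1]]

-- Fill the grid by mapping the filler function over it.
makeNd :: ([Int] -> a) -> [Int] -> Nested a
makeNd filler = fmap filler . grid

-- Same filler as in the session above, written as a named function.
filler3 :: [Int] -> Int
filler3 [x, y, z] = 100 * x + 10 * y + z + 111
filler3 _         = error "filler3: expected exactly three indices"
```

With these definitions, sum (makeNd filler3 [5,2,6]) and length (makeNd filler3 [5,2,6]) should agree with the 19110 and 60 shown in the session above.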
Using the SBV library, I'm trying to satisfy conditions on a symbolic list of states:
data State = Intro | Start | Content | Comma | Dot
mkSymbolicEnumeration ''State
-- examples of such lists
[Intro, Start, Content, Comma, Start, Comma, Content, Dot]
[Intro, Comma, Start, Content, Comma, Content, Start, Dot]
All works fine except that I need the final list to contain exactly n elements of either [Intro, Start, Content] in total. Currently I do it using a bounded filter:
answer :: Int -> Symbolic [State]
answer n = do
    seq <- sList "seq"
    let maxl = n+6
    let minl = n+2
    constrain $ L.length seq .<= fromIntegral maxl
    constrain $ L.length seq .>= fromIntegral minl
    -- some additional constraints hidden for brevity purposes
    let etypes e = e `sElem` [sIntro, sStart, sContent]
    constrain $ L.length (L.bfilter maxl etypes seq) .== fromIntegral n
As you can see, the list can be of any length between n+2 and n+6, the important bit is that it has the right count of [sIntro, sStart, sContent] elements within it.
It all works fine, except it's extremely slow. For n=4 it takes a few seconds, but for n>=6 it takes forever (more than 30 minutes and still counting). If I remove the bounded filter constraint, the result is instant with n up to 25 or so.
In the end, I don't particularly care about using L.bfilter. All I need is a way to declare that the final symbolic list should contain exactly n elements of some given types.
-> Is there a faster way to be able to satisfy for count(sIntro || sStart || sContent)?
-- EDIT after discussion in comments:
The code below is supposed to make sure that all valid elements are up-front in the elts list. For example, if we count 8 valid elements in elts, then we take the first 8 elts and count the valid elements in this sub-list (validTaken). If the result is 8, it means that all 8 valid elements are up-front in elts. Sadly, this results in a systematic Unsat outcome, even after removing all other constraints. The function works well when tested against some dummy lists of elements, though.
-- | test that all valid elements are upfront in the list of elements
validUpFront :: SInteger -> [Elem] -> SBool
validUpFront valids elts =
    let takeValids = flip take elts <$> (fromInteger <$> unliteral valids)
        validTaken = sum $ map (oneIf . included) $ fromMaybe [] takeValids
    in valids .== validTaken
-- ...
answer n = runSMT $ do
    -- ...
    let valids = sum $ map (oneIf . included) elts :: SInteger
    constrain $ validUpFront valids elts
Solvers for the sequence logic, while quite versatile, are notoriously slow. For this particular problem, I'd recommend using regular boolean logic, which will perform much better. Here's how I'd code your problem:
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE StandaloneDeriving #-}
import Data.SBV
import Data.SBV.Control
import Data.Maybe
import Control.Monad
data State = Intro | Start | Content | Comma | Dot
mkSymbolicEnumeration ''State
data Elem = Elem { included :: SBool
                 , element :: SState
                 }
new :: Symbolic Elem
new = do i <- free_
         e <- free_
         pure Elem {included = i, element = e}
get :: Elem -> Query (Maybe State)
get e = do isIn <- getValue (included e)
           if isIn
              then Just <$> getValue (element e)
              else pure Nothing
answer :: Int -> IO [State]
answer n = runSMT $ do
    let maxl = n+6
    let minl = n+2
    -- allocate up to maxl elements
    elts <- replicateM maxl new
    -- ask for at least minl of them to be valid
    let valids :: SInteger
        valids = sum $ map (oneIf . included) elts
    constrain $ valids .>= fromIntegral minl
    -- count the interesting ones
    let isEtype e = included e .&& element e `sElem` [sIntro, sStart, sContent]
        eTypeCount :: SInteger
        eTypeCount = sum $ map (oneIf . isEtype) elts
    constrain $ eTypeCount .== fromIntegral n
    query $ do cs <- checkSat
               case cs of
                 Sat -> catMaybes <$> mapM get elts
                 _ -> error $ "Query is " ++ show cs
Example run:
*Main> answer 5
[Intro,Comma,Comma,Intro,Intro,Intro,Start]
I've been able to run up to answer 500, which returned in about 5 seconds on my relatively old machine.
Making sure all valids are at the beginning
The easiest way to make sure all the valid elements are at the beginning of the list is to count the alternations in the included value, and make sure you allow only one such transition:
-- make sure there's at most one-flip in the sequence.
-- This'll ensure all the selected elements are up-front.
let atMostOneFlip [] = sTrue
    atMostOneFlip (x:xs) = ite x (atMostOneFlip xs) (sAll sNot xs)
constrain $ atMostOneFlip (map included elts)
This'll make sure all the valids precede the suffix of the list that contains the invalid entries. When you write your other properties, you'd have to check that both the current element and the next element are valid. In template form:
foo (x:y:rest) = ((included x .&& included y) .=> (element y .== sStart .=> element x .== sDot))
             .&& foo (y:rest)
By symbolically looking at the values of included x and included y, you can determine if they are both included, or if x is the last element, or if they're both out; and write the corresponding constraints as implications in each case. The above shows the case for when you're in the middle of the sequence somewhere, with both x and y included.
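To see what atMostOneFlip accepts and rejects without involving a solver, here is a plain-Bool analogue. It is only an illustration: validPrefix is a hypothetical name, and it replaces the symbolic ite and sAll sNot with ordinary if and all not.

```haskell
-- Plain-Bool analogue of atMostOneFlip: True exactly when every True
-- comes before every False, i.e. the included elements form a prefix.
validPrefix :: [Bool] -> Bool
validPrefix []     = True
validPrefix (x:xs) = if x then validPrefix xs else all not xs
```

For example, validPrefix [True, True, False, False] holds, while validPrefix [True, False, True] does not, because an included element follows an excluded one.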
I am writing a small function in Haskell to check if a list is a palindrome by comparing it with its reverse.
checkPalindrome :: [Eq a] -> Bool
checkPalindrome l = (l == reverse l)
    where
        reverse :: [a] -> [a]
        reverse xs
            | null xs = []
            | otherwise = (last xs) : reverse newxs
            where
                before = (length xs) - 1
                newxs = take before xs
I understand that I should use [Eq a] in the function definition because I use the equality operator later on, but I get this error when I compile:
Expected kind ‘*’, but ‘Eq a’ has kind ‘GHC.Prim.Constraint’
In the type signature for ‘checkPalindrome’:
checkPalindrome :: [Eq a] -> Bool
P.S. Feel free to correct me if I am doing something wrong with my indentation; I'm very new to the language.
Unless Haskell adopted a new syntax, your type signature should be:
checkPalindrome :: Eq a => [a] -> Bool
Declare the constraint on the left hand side of a fat-arrow, then use it on the right hand side.
Unlike OO languages, Haskell makes a quite fundamental distinction between
Constraints – typeclasses like Eq.
Types – concrete types like Bool or lists of some type.
In OO languages, both of these would be represented by classes†, but a Haskell type class is completely different. You never have “values of class C”, only “types of class C”. (These concrete types may then contain values, but the classes don't.)
This distinction may seem pedantic, but it's actually very useful. What you wrote, [Eq a] -> Bool, would supposedly mean: each element of the list must be comparable... but comparable to what? You could have elements of different type in the list, how do you know that these elements are comparable to each other? In Haskell, that's no issue, because whenever the function is used you first settle on one type a. This type must be in the Eq class. The list then must have all elements from the same type a. This way you ensure that each element of the list is comparable to all of the others, not just, like, comparable to itself. Hence the signature
checkPalindrome :: Eq a => [a] -> Bool
This is the usual distinction on the syntax level: constraints must always‡ be written on the left of an => (implication arrow).
The constraints before the => are “implicit arguments”: you don't explicitly “pass Eq a to the function” when you call it, instead you just pass the stuff after the =>, i.e. in your example a list of some concrete type. The compiler will then look at the type and automatically look up its Eq typeclass instance (or raise a compile-time error if the type does not have such an instance). Hence,
GHCi, version 7.10.2: http://www.haskell.org/ghc/ :? for help
Prelude> let palin :: Eq a => [a] -> Bool; palin l = l==reverse l
Prelude> palin [1,2,3,2,1]
True
Prelude> palin [1,2,3,4,5]
False
Prelude> palin [sin, cos, tan]
<interactive>:5:1:
No instance for (Eq (a0 -> a0))
(maybe you haven't applied enough arguments to a function?)
arising from a use of ‘palin’
In the expression: palin [sin, cos, tan]
In an equation for ‘it’: it = palin [sin, cos, tan]
...because functions can't be equality-compared.
†Constraints may in OO also be interfaces / abstract base classes, which aren't “quite proper classes” but are still in many ways treated the same way as OO value-classes. Most modern OO languages now also support Haskell-style parametric polymorphism in addition to “element-wise”/covariant/existential polymorphism, but they require somewhat awkward extends trait-mechanisms because this was only implemented as an afterthought.
‡There are also functions which have “constraints in the arguments”, but that's a more advanced concept called rank-n polymorphism.
This is really an extended comment. Aside from your little type error, your function has another problem: it's extremely inefficient. The main problem is your definition of reverse.
reverse :: [a] -> [a]
reverse xs
    | null xs = []
    | otherwise = (last xs) : reverse newxs
    where
        before = (length xs) - 1
        newxs = take before xs
last is O(n), where n is the length of the list. length is also O(n), where n is the length of the list. And take is O(k), where k is the length of the result. So your reverse will end up taking O(n^2) time. One fix is to just use the standard reverse function instead of writing your own. Another is to build up the result recursively, accumulating the result as you go:
reverse :: [a] -> [a]
reverse xs0 = go [] xs0
  where
    go acc [] = acc
    go acc (x : xs) = go (x : acc) xs
This version is O(n).
There's another source of inefficiency in your implementation:
checkPalindrome l = (l == reverse l)
This isn't nearly as bad, but let's look at what it does. Suppose we have the string "abcdefedcba". Then we test whether "abcdefedcba" == "abcdefedcba". By the time we've checked half the list, we already know the answer. So we'd like to stop there! There are several ways to accomplish this. The simplest efficient one is probably to calculate the length of the list as part of the process of reversing it so we know how much we'll need to check:
reverseCount :: [a] -> (Int, [a])
reverseCount xs0 = go 0 [] xs0 where
    go len acc [] = (len, acc)
    go len acc (x : xs) = len `seq`
        go (len + 1) (x : acc) xs
Don't worry about the len `seq` bit too much; that's just a bit of defensive programming to make sure laziness doesn't make things inefficient; it's probably not even necessary if optimizations are enabled. Now you can write a version of == that only looks at the first n elements of the lists:
eqTo :: Eq a => Int -> [a] -> [a] -> Bool
eqTo 0 _ _ = True
eqTo _ [] [] = True
eqTo n (x : xs) (y : ys) =
    x == y && eqTo (n - 1) xs ys
eqTo _ _ _ = False
So now
isPalindrome xs = eqTo ((len + 1) `quot` 2) xs rev_xs
  where
    (len, rev_xs) = reverseCount xs
Here's another way, that's more efficient and arguably more elegant, but a bit tricky. We don't actually need to reverse the whole list; we only need to reverse half of it. This saves memory allocation. We can use a tortoise and hare trick:
splitReverse ::
    [a] ->
    ( [a] -- the first half, reversed
    , Maybe a -- the middle element
    , [a] ) -- the second half, in order
splitReverse xs0 = go [] xs0 xs0 where
    go front rear [] = (front, Nothing, rear)
    go front (r : rs) [_] = (front, Just r, rs)
    go front (r : rs) (_ : _ : xs) =
        go (r : front) rs xs
Now
isPalindrome xs = front == rear
  where
    (front, _, rear) = splitReverse xs
Now for some numbers, using the test case
somePalindrome :: [Int]
somePalindrome = [1..10000] ++ [10000,9999..1]
Your original implementation takes 7.523s (2.316 mutator; 5.204 GC) and allocates 11 gigabytes to build the test list and check if it's a palindrome. My counting implementation takes less than 0.01s and allocates 2.3 megabytes. My tortoise and hare implementation takes less than 0.01s and allocates 1.7 megabytes.
I am using Data.MemoCombinators (https://hackage.haskell.org/package/data-memocombinators-0.3/docs/Data-MemoCombinators.html) to memoize a function that takes a set as its parameter and returns a set (this is a contrived example that does nothing but takes a long time to finish):
test s = case Set.toList s of
    [] -> Set.singleton 0
    [x] -> Set.singleton 1
    (x:xs) -> test (Set.singleton x) `Set.union` test (Set.fromList xs)
Since Data.MemoCombinators does not implement a table for sets, I wanted to write my own:
{-# LANGUAGE RankNTypes #-}
import Data.MemoCombinators (Memo)
import qualified Data.MemoCombinators as Memo
import Data.Set (Set)
import qualified Data.Set as Set
set :: Ord a => Memo a -> ((Set a) -> r) -> (Set a) -> r
set m f = Memo.list m (f . Set.fromList) . Set.toList
and here is my test that was supposed to be memoized:
test s = set Memo.integral test' s
    where
        test' s = case Set.toList s of
            [] -> Set.singleton 0
            [x] -> Set.singleton 1
            (x:xs) -> test (Set.singleton x) `Set.union` test (Set.fromList xs)
There is no documentation for Data.MemoCombinators that is clear to me, so basically I do not know exactly what I am doing.
My questions are:
what is the second parameter to the Memo.list function? Is it a memoizer for the elements of the list?
how to implement a table for a set directly, without using Memo.list? Here I would like to figure out how to implement memoization manually without using someone's library. For example, using a Map. I have seen examples that memoize integers using an infinite list, but in the case of a map I cannot figure out how to initialize the map and how to insert into it.
Thanks for any help.
what is the second parameter to the Memo.list function? Is it a memoizer for the elements of the list?
The first parameter m is the memoizer for the elements of the list. The second parameter f is the function that you want to apply to the list (and that will be memoized too).
how to implement a table for a set directly, without using Memo.list? Here I would like to figure out how to implement memoization manually without using someone's library. For example, using a Map. I have seen examples that memoize integers using an infinite list, but in the case of a map I cannot figure out how to initialize the map and how to insert into it.
Using the same strategy as Data.MemoCombinators, you can do something similar to what they do for lists. This approach does not use an explicit data structure; instead it exploits lazy evaluation and the way Haskell keeps things in memory.
set :: Ord a => Memo a -> Memo (Set a)
set m f = table (f Set.empty) (m (\x -> set m (f . (x `Set.insert`))))
    where
        table nil cons set | Set.null set = nil
                           | otherwise = uncurry cons (Set.deleteFindMin set)
You can also do memoization in Haskell with an explicit data structure (like a Map). I will use the Fibonacci example to demonstrate that, because it is easier to benchmark, but it would be similar for other functions.
Let's start with the naive implementation:
fib0 :: Integer -> Integer
fib0 0 = 0
fib0 1 = 1
fib0 x = fib0 (x-1) + fib0 (x-2)
Then Data.MemoCombinators proposes this implementation:
import qualified Data.MemoCombinators as Memo
fib1 :: Integer -> Integer
fib1 = Memo.integral fib'
    where
        fib' 0 = 0
        fib' 1 = 1
        fib' x = fib1 (x-1) + fib1 (x-2)
And finally, my version using Map:
import Data.Map (Map)
import qualified Data.Map as Map
fib2 :: Integer -> Integer
fib2 = fst . fib' (Map.fromList [(0, 0),(1, 1)])
    where
        fib' m0 x | x `Map.member` m0 = (Map.findWithDefault 0 x m0, m0)
                  | otherwise = let (v1, m1) = fib' m0 (x-1)
                                    (v2, m2) = fib' m1 (x-2)
                                    y = v1 + v2
                                in (y, Map.insert x y m2)
Now, let's see how they perform:
fib0 40: 13.529371s
fib1 40: 0.000121s
fib2 40: 0.000048s
The fib0 was already too slow. Let's do a proper test with the other two:
fib1 400000: 6.234243s
fib2 400000: 4.022798s
fib1 500000: 8.683649s
fib2 500000: 5.781104s
The Map solution actually seems to outperform the Memo solution in all the tests I performed. But I think the greatest advantage of Data.MemoCombinators is getting this great performance without having to write much more code than the naive solution.
Updated: I changed the conclusions, because I was not doing the benchmark properly. I was making several calls in the same execution, and in the case of 500000, whichever call came second (either fib1 or fib2) was taking too long.
What you have for test is fine, although normally you would define test as a function on sets using Set operations. Here is an example of what I'm talking about:
-- memoize a function on Set Int
foo = set Memo.integral foo'
  where foo' s | Set.null s = 0
        foo' s = let a = Set.findMin s
                     b = Set.findMax s
                     m = (a+b) `div` 2
                     (lo,found,hi) = Set.splitMember m s
                 in if a >= b
                      then 1
                      else (if found then 1 else 0) + foo lo + foo hi
This is a very inefficient way of counting the number of elements in a set, but note how foo' is defined in terms of Set operations.
Re your other questions:
what is the second parameter to the Memo.list function? Is it a memoizer for the elements of the list?
Memo.list has signature Memo a -> Memo [a], so in the expression Memo.list m f we have:
m :: Memo a
f :: [a] -> r -- some type r
Memo.list m f :: [a] -> r
So f is the function on [a] that you are memoizing, and m is a memoizer for functions taking a parameter of type a.
how to implement a table for a set directly?
It depends on what you mean by "directly". Memoizing in this fashion is going to involve creating a (possibly infinite) lazy data structure. The string, integral and list memoizers all use some form of lazy trie. This is very different from memoization in imperative languages, where you explicitly check a hash map to see if you've already computed something and update that hash map with the function's value, etc. (Btw - you can do that sort of memoization in the ST or IO monads and it might work even better than the Data.MemoCombinators approach - something to consider.)
Your idea of memoizing a Set a -> r function by passing to a list is a fine idea, but I would use to/from AscList:
set m f = Memo.list m (f . Set.fromAscList) . Set.toAscList
That way the set Set.fromList [3,4,5] will re-use the same part of the trie that was created to memoize the value for Set.fromList [3,4].
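The imperative-style memoization mentioned above can be sketched for the question's test function as follows. This is a minimal illustration, not a tuned implementation: Table, testMemo and runTest are hypothetical names, the table is an ordinary Data.Map keyed by the set itself (sets have an Ord instance), and it lives in an IORef, so the memoized function runs in IO.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.Map.Strict as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Explicit memo table: argument set -> result set.
type Table = Map.Map (Set Int) (Set Int)

-- The question's test function, consulting and updating the table by hand.
testMemo :: IORef Table -> Set Int -> IO (Set Int)
testMemo ref s = do
  table <- readIORef ref
  case Map.lookup s table of
    Just r  -> pure r                      -- already computed
    Nothing -> do
      r <- case Set.toList s of
             []     -> pure (Set.singleton 0)
             [_]    -> pure (Set.singleton 1)
             (x:xs) -> Set.union <$> testMemo ref (Set.singleton x)
                                 <*> testMemo ref (Set.fromList xs)
      modifyIORef' ref (Map.insert s r)    -- remember the result
      pure r

-- Start from an empty table and run one query.
runTest :: Set Int -> IO (Set Int)
runTest s = do
  ref <- newIORef Map.empty
  testMemo ref s
```

The map is initialised with Map.empty and grows by one Map.insert per distinct argument, which addresses the "how to initialize the map and how to insert into it" part of the question. The price is that the function is no longer pure.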
Haskell's expressiveness enables us to rather easily define a powerset function:
import Control.Monad (filterM)
powerset :: [a] -> [[a]]
powerset = filterM (const [True, False])
To be able to perform my task it is crucial for said powerset to be sorted by a specific function, so my implementation kind of looks like this:
import Data.List (sortBy)
import Data.Ord (comparing)
powersetBy :: Ord b => ([a] -> b) -> [a] -> [[a]]
powersetBy f = sortBy (comparing f) . powerset
Now my question is whether there is a way to only generate a subset of the powerset given a specific start and endpoint, where f(start) < f(end) and |start| < |end|. For example, my parameter is a list of integers ([1,2,3,4,5]) and the subsets are sorted by their sum. Now I want to extract only the subsets in a given range, let's say 3 to 7. One way to achieve this would be to filter the powerset to only include my range, but this seems (and is) inefficient when dealing with larger sets:
badFunction :: Ord b => b -> b -> ([a] -> b) -> [a] -> [[a]]
badFunction start end f = filter (\x -> f x >= start && f x <= end) . powersetBy f
badFunction 3 7 sum [1,2,3,4,5] produces [[1,2],[3],[1,3],[4],[1,4],[2,3],[5],[1,2,3],[1,5],[2,4],[1,2,4],[2,5],[3,4]].
Now my question is whether there is a way to generate this list directly, without having to generate all 2^n subsets first, since it will improve performance drastically by not having to check all elements but rather generating them "on the fly".
If you want to allow for completely general ordering-functions, then there can't be a way around checking all elements of the powerset. (After all, how would you know there isn't a special clause built in that gives, say, the particular set [6,8,34,42] a completely different ranking from its neighbours?)
However, you could make the algorithm already drastically faster by
Only sorting after filtering: sorting is O (n · log n), so you want to keep n low here; for the O (n) filtering step it matters less. (And anyway, the number of elements doesn't change through sorting.)
Apply the ordering-function only once to each subset.
So
import Control.Arrow ((&&&))
lessBadFunction :: Ord b => (b,b) -> ([a]->b) -> [a] -> [[a]]
lessBadFunction (start,end) f
    = map snd . sortBy (comparing fst)
    . filter (\(k,_) -> k>=start && k<=end)
    . map (f &&& id)
    . powerset
Basically, let's face it, powersets of anything but a very small basis are infeasible. The particular application “sum in a certain range” is pretty much a packaging problem; there are quite efficient ways to do that kind of thing, but you'll have to give up the idea of perfect generality and of quantification over general subsets.
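The answer leaves the specialised approach unspecified. As one possible sketch for the sum-in-a-range case only, assuming all elements are non-negative integers: sort the input and abandon a branch as soon as the running sum would exceed the upper bound. subsetsInRange is a hypothetical name, and this is plain backtracking, not a claim about which algorithm the answer had in mind.

```haskell
import Data.List (sort)

-- All subsets of xs whose sum lies in [lo, hi].
-- Assumption: every element of xs is non-negative.
subsetsInRange :: Int -> Int -> [Int] -> [[Int]]
subsetsInRange lo hi = go 0 . sort
  where
    -- acc is the sum of the elements chosen so far.
    go acc []
      | acc >= lo && acc <= hi = [[]]
      | otherwise              = []
    go acc (x:xs)
      -- The input is sorted, so once x no longer fits, nothing after it fits either.
      | acc + x > hi = go acc []
      | otherwise    = map (x :) (go (acc + x) xs) ++ go acc xs
```

For the example in the question, subsetsInRange 3 7 [1,2,3,4,5] yields the same 13 subsets as badFunction 3 7 sum [1,2,3,4,5], though not sorted by sum; a final sortBy (comparing sum) would restore that order.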
Since your problem is essentially a constraint satisfaction problem, using an external SMT solver might be the better alternative here; assuming you can afford the extra IO in the type and the need for such a solver to be installed. The SBV library allows construction of such problems. Here's one encoding:
import Data.SBV
-- c is the cost type
-- e is the element type
pick :: (Num e, SymWord e, SymWord c) => c -> c -> ([SBV e] -> SBV c) -> [e] -> IO [[e]]
pick begin end cost xs = do
    solutions <- allSat constraints
    return $ map extract $ extractModels solutions
  where extract ts = [x | (t, x) <- zip ts xs, t]
        constraints = do tags <- mapM (const free_) xs
                         let tagged = zip tags xs
                             finalCost = cost [ite t (literal x) 0 | (t, x) <- tagged]
                         solve [finalCost .>= literal begin, finalCost .<= literal end]
test :: IO [[Integer]]
test = pick 3 7 sum [1,2,3,4,5]
We get:
Main> test
[[1,2],[1,3],[1,2,3],[1,4],[1,2,4],[1,5],[2,5],[2,3],[2,4],[3,4],[3],[4],[5]]
For large lists, this technique will beat out generating all subsets and filtering; assuming the cost function generates reasonable constraints. (Addition will be typically OK, if you've multiplications, the backend solver will have a harder time.)
(As a side note, you should never use filterM (const [True, False]) to generate power-sets to start with! While that expression is cute and fun, it is extremely inefficient!)
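The side note does not say which alternative it has in mind. One commonly used definition, given here as a sketch rather than as that author's recommendation, builds the subsets by direct recursion and shares the recursive result between the two halves. powersetRec and powersetM are hypothetical names; whether the recursive version is faster in a given setting is best measured.

```haskell
import Control.Monad (filterM)

-- Direct recursion: every subset either contains x or it does not.
-- The subsets of the tail are computed once and used for both cases.
powersetRec :: [a] -> [[a]]
powersetRec []     = [[]]
powersetRec (x:xs) = map (x :) rest ++ rest
  where rest = powersetRec xs

-- The filterM version from the question, for comparison.
powersetM :: [a] -> [[a]]
powersetM = filterM (const [True, False])
```

Both functions produce the subsets in the same order, so powersetRec can be dropped into powersetBy without changing its results.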
In the following snippet:
import qualified Data.Set as Set
data Nat = Zero | Succ Nat deriving (Eq, Show, Ord)
instance Enum Nat where
    pred (Succ x) = x
    succ x = Succ x
    toEnum 0 = Zero
    toEnum x = Succ (toEnum (x-1))
    fromEnum Zero = 0
    fromEnum (Succ x) = 1 + (fromEnum x)
nats :: [Nat]
nats = [Zero ..]
natSet :: Set.Set Nat
natSet = Set.fromList nats
Why does:
elem (toEnum 100) nats == True
but
Set.member (toEnum 100) natSet never ends?
The existing answers are sufficient, but I want to expound a little bit on the behavior of Sets.
Looks like you are hoping for a lazy set of all Nats; you take the infinite list of all Nats and use Set.fromList on it. That would be nice; mathematicians often talk in terms of the "set of all natural numbers". The problem is the implementation of Set is not as accommodating of laziness as lists are.
The implementation of Set is based on size balanced binary trees (or trees of bounded balance)
The docs for Data.Set
Suppose you wish to lazily construct a binary tree from a list. Elements from the list would only be inserted into the tree when a deeper traversal of the tree was necessary. So then you ask if 100 is in the tree. It would go along adding numbers 1-99 to the tree, one at a time. Then it would finally add 100 to the tree, and discover that 100 is indeed an element in the tree. But notice what we did. We just performed an in-order traversal of the lazy list! So the first time, our imaginary LazyTree.contains would have roughly the same complexity as List.find (assuming an amortized O(1) insert, which is a bad assumption for a simple binary tree, which would have O(log n) complexity). And without balancing, our tree would be very lopsided (we added the numbers 1 through 100 in order, so it would just be a big linked list down the right child of each branch). But with tree balancing during the traversal, it would be hard to know where to begin the traversal again; at least it certainly wouldn't be immediately intuitive.
tl;dr: nobody (afaik) has made a good lazy Set yet. So infinite Sets are more easily represented as infinite lists, for now.
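As a small illustration of representing an infinite set as an infinite list: if the list is known to be strictly ascending and unbounded, membership can be decided by stopping at the first element greater than the target. memberSorted is a hypothetical helper, and the ordering requirement is an assumption that the code does not check.

```haskell
-- Membership test for a strictly ascending (possibly infinite) list.
-- Only the prefix of elements <= x is ever inspected, so the search
-- terminates even when x is absent, provided the list eventually exceeds x.
memberSorted :: Ord a => a -> [a] -> Bool
memberSorted x = elem x . takeWhile (<= x)
```

Unlike a plain elem on an infinite list, this also returns False for absent elements instead of searching forever.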
Set.fromList is not lazy, so it will not end if you pass it an infinite list. But natSet is not constructed until it is needed, so you only notice it when you run Set.member on it.
For example, even Set.null $ Set.fromList [0..] does not terminate.
You can't have infinite sets. This doesn't just affect Set.member: whenever you do anything that causes natSet to be evaluated even one step (even Set.null), it will go into an infinite loop.
Let's see what happens when we adapt GHC's Set code to accommodate infinite sets:
module InfSet where
data InfSet a = Bin a (InfSet a) (InfSet a)
-- create an infinite set by unfolding a value
ofUnfold :: (x -> (x, a, x)) -> x -> InfSet a
ofUnfold f x =
    let (lx,a,rx) = f x
        l = ofUnfold f lx
        r = ofUnfold f rx
    in Bin a l r
-- check for membership in the infinite set
member :: Ord a => a -> InfSet a -> Bool
member x (Bin y l r) = case compare x y of
    LT -> member x l
    GT -> member x r
    EQ -> True
-- construct an infinite set representing a range of numbers
range :: Fractional a => (a, a) -> InfSet a
range = ofUnfold $ \(lo,hi) ->
    let mid = (hi+lo) / 2
    in ( (lo, mid), mid, (mid, hi) )
Note how, instead of constructing the infinite set from an infinite list, I define a function ofUnfold to unfold a single value into an infinite tree. It allows us to construct both branches lazily in parallel (we don't need to finish one branch before constructing another).
Let's give it a whirl:
ghci> :l InfSet
[1 of 1] Compiling InfSet ( InfSet.hs, interpreted )
Ok, modules loaded: InfSet.
ghci> let r = range (0,128)
ghci> member 64 r
True
ghci> member 63 r
True
ghci> member 62 r
True
ghci> member (1/2) r
True
ghci> member (3/4) r
True
Well, that seems to work. What if we try a value outside of the Set?
ghci> member 129 r
^CInterrupted.
That will just run and run and never quit. There are no stopping branches in the infinite set, so the search never quits. We could check the original range somehow, but that's not practical for infinite sets of discrete elements:
ghci> let ex = ofUnfold (\f -> ( f . (LT:), f [EQ], f . (GT:) )) id
ghci> :t ex
ex :: InfSet [Ordering]
ghci> member [EQ] ex
True
ghci> member [LT,EQ] ex
True
ghci> member [EQ,LT] ex
^CInterrupted.
So infinite sets are possible but I'm not sure they're useful.
I felt the same way, so I added a set that works with infinite lists. However, they need to be sorted, so that my algorithm knows when to stop looking for more.
Prelude> import Data.Set.Lazy
Prelude Data.Set.Lazy> natset = fromList [1..]
Prelude Data.Set.Lazy> 100 `member` natset
True
Prelude Data.Set.Lazy> (-10) `member` natset
False
It's on Hackage.
http://hackage.haskell.org/package/lazyset-0.1.0.0/docs/Data-Set-Lazy.html