Why doesn't Haskell's Data.Set support infinite sets?

Why doesn't Haskell's Data.Set support infinite sets? - haskell

In the following snippet:
import qualified Data.Set as Set
data Nat = Zero | Succ Nat deriving (Eq, Show, Ord)
instance Enum Nat where
pred (Succ x) = x
succ x = Succ x
toEnum 0 = Zero
toEnum x = Succ (toEnum (x-1))
fromEnum Zero = 0
fromEnum (Succ x) = 1 + (fromEnum x)
nats :: [Nat]
nats = [Zero ..]
natSet :: Set.Set Nat
natSet = Set.fromList nats
Why does:
elem (toEnum 100) nats == True
but
Set.member (toEnum 100) natSet never ends?

The existing answers are sufficient, but I want to expound a little bit on the behavior of Sets.
Looks like you are hoping for a lazy set of all Nats; you take the infinite list of all Nats and use Set.toList on it. That would be nice; mathematicians often talk in terms of the "set of all natural numbers". The problem is the implementation of Set is not as accommodating of laziness as lists are.
The implementation of Set is based on size balanced binary trees (or
trees of bounded balance)
The docs for Data.Set
Suppose you wish to lazily construct a binary tree from a list. Elements from the list would only be inserted into the tree when a deeper traversal of the tree was necessary. So then you ask if 100 is in the tree. It would go along adding numbers 1-99 to the tree, one at a time. Then it would finally add 100 to the tree, and discover that 100 is indeed an element in the tree. But notice what we did. We just performed an in-order traversal of the lazy list! So the first time, our imaginary LazyTree.contains would have roughly the same complexity as List.find (assuming an ammortized O(1) insert, which is a bad assumption for a simple binary tree, which would have O(log n) complexity). And without balancing, our tree would be very lopsided (we added the numbers 1 through 100 in order, so it would just be a big linked list down the right child of each branch). But with tree balancing during the traversal, it would be hard to know where to begin the traversal again; at least it certainly wouldn't be immediately intuitive.
tl;dr: nobody (afaik) has made a good lazy Set yet. So infinite Sets are more easily represented as infinite lists, for now.

Set.fromList is not lazy, so it will not end if you pass it an infinite list. But natSet is not constructed until it is needed, so you only notice it when you run Set.member on it.
For example, even Set.null $ Set.fromList [0..] does not terminate.

You can't have infinite sets. This doesn't just affect Set.member, whenever you do anything which will cause natSet to be evaluated even one step (even Set.null), it will go into an infinite loop.

Let's see what happens when we adapt GHC's Set code to accommodate infinite sets:
module InfSet where
data InfSet a = Bin a (InfSet a) (InfSet a)
-- create an infinite set by unfolding a value
ofUnfold :: (x -> (x, a, x)) -> x -> InfSet a
ofUnfold f x =
let (lx,a,rx) = f x
l = ofUnfold f lx
r = ofUnfold f rx
in Bin a l r
-- check for membership in the infinite set
member :: Ord a => a -> InfSet a -> Bool
member x (Bin y l r) = case compare x y of
LT -> member x l
GT -> member x r
EQ -> True
-- construct an infinite set representing a range of numbers
range :: Fractional a => (a, a) -> InfSet a
range = ofUnfold $ \(lo,hi) ->
let mid = (hi+lo) / 2
in ( (lo, mid), mid, (mid, hi) )
Note how, instead of constructing the infinite set from an infinite list,
I instead define a function ofUnfold to unfold a single value into an infinite list.
It allows us to construct both branches lazily in parallel (we don't need to finish
one branch before constructing another).
Let's give it a whirl:
ghci> :l InfSet
[1 of 1] Compiling InfSet ( InfSet.hs, interpreted )
Ok, modules loaded: InfSet.
ghci> let r = range (0,128)
ghci> member 64 r
True
ghci> member 63 r
True
ghci> member 62 r
True
ghci> member (1/2) r
True
ghci> member (3/4) r
True
Well, that seems to work. What if we try a value outside of the Set?
ghci> member 129 r
^CInterrupted.
That will just run and run and never quit. There's no stopping branches in the inifinite set,
so the search never quits. We could check the original range somehow, but that's not practical for infinite sets of discrete elements:
ghci> let ex = ofUnfold (\f -> ( f . (LT:), f [EQ], f . (GT:) )) id
ghci> :t ex
ex :: InfSet [Ordering]
ghci> member [EQ] ex
True
ghci> member [LT,EQ] ex
True
ghci> member [EQ,LT] ex
^CInterrupted.
So infinite sets are possible but I'm not sure they're useful.

I felt the same way so I added a set that works with infinite Lists. However they need to be sorted, so my algorithm knowns when to stop looking for more.
Prelude> import Data.Set.Lazy
Prelude Data.Set.Lazy> natset = fromList [1..]
Prelude Data.Set.Lazy> 100 `member` natset
True
Prelude Data.Set.Lazy> (-10) `member` natset
False
Its on hackage.
http://hackage.haskell.org/package/lazyset-0.1.0.0/docs/Data-Set-Lazy.html

Related

Haskell nested function order

I'm trying to write a function in Haskell to generate multidimensional lists.
(Technically I'm using Curry, but my understanding is that it's mostly a superset of Haskell, and the thing I'm trying to do is common to Haskell as well.)
After a fair bit of head scratching, I realized my initial desired function (m_array generating_function list_of_dimensions, giving a list nested to a depth equal to length list_of_dimensions) was probably at odds with they type system itself, since (AFAICT) the nesting-depth of lists is part of its type, and my function wanted to return values whose nesting-depths differed based on the value of a parameter, meaning it wanted to return values whose types varied based on the value of a parameter, which (AFAICT) isn't supported in Haskell. (If I'm wrong, and this CAN be done, please tell me.) At this point I moved on to the next paragraph, but if there's a workaround I've missed that takes very similar parameters and still outputs a nested list, let me know. Like, maybe if you can encode the indices as some data type that implicitly includes the nesting level in its type, and is instantiated with e.g. dimensions 5 2 6 ..., maybe that'd work? Not sure.
In any case, I thought that perhaps I could encode the nesting-depth by nesting the function itself, while still keeping the parameters manageable. This did work, and I ended up with the following:
ma f (l:ls) idx = [f ls (idx++[i]) | i <- [0..(l-1)]]
However, so far it's still a little clunky to use: you need to nest the calls, like
ma (ma (ma (\_ i -> 0))) [2,2,2] []
(which, btw, gives [[[0,0],[0,0]],[[0,0],[0,0]]]. If you use (\_ i -> i), it fills the array with the indices of the corresponding element, which is a result I'd like to keep available, but could be a confusing example.)
I'd prefer to minimize the boilerplate necessary. If I can't just call
ma (\_ i -> i) [2,2,2]
I'd LIKE to be able to call, at worst,
ma ma ma (\_ i -> i) [2,2,2] []
But if I try that, I get errors. Presumably the list of parameters is being divvied up in a way that doesn't make sense for the function. I've spent about half an hour googling and experimenting, trying to figure out Haskell's mechanism for parsing strings of functions like that, but I haven't found a clear explanation, and understanding eludes me. So, the formal questions:
How does Haskell parse e.g. f1 f2 f3 x y z? How are the arguments assigned? Is it dependent on the signatures of the functions, or does it e.g. just try to call f1 with 5 arguments?
Is there a way of restructuring ma to permit calling it without parentheses? (Adding at most two helper functions would be permissible, e.g. maStart ma ma maStop (\_ i -> i) [1,2,3,4] [], if necessary.)

The function you want in your head-scratching paragraph is possible directly -- though a bit noisily. With GADTs and DataKinds, values can be parameterized by numbers. You won't be able to use lists directly, because they don't mention their length in their type, but a straightforward variant that does works great. Here's how it looks.
{-# Language DataKinds #-}
{-# Language GADTs #-}
{-# Language ScopedTypeVariables #-}
{-# Language StandaloneDeriving #-}
{-# Language TypeOperators #-}
import GHC.TypeLits
infixr 5 :+
data Vec n a where
O :: Vec 0 a -- O is supposed to look a bit like a mix of 0 and []
(:+) :: a -> Vec n a -> Vec (n+1) a
data FullTree n a where
Leaf :: a -> FullTree 0 a
Branch :: [FullTree n a] -> FullTree (n+1) a
deriving instance Show a => Show (Vec n a)
deriving instance Show a => Show (FullTree n a)
ma :: forall n a. ([Int] -> a) -> Vec n Int -> FullTree n a
ma f = go [] where
go :: [Int] -> Vec n' Int -> FullTree n' a
go is O = Leaf (f is)
go is (l :+ ls) = Branch [go (i:is) ls | i <- [0..l-1]]
Try it out in ghci:
> ma (\_ -> 0) (2 :+ 2 :+ 2 :+ O)
Branch [Branch [Branch [Leaf 0,Leaf 0],Branch [Leaf 0,Leaf 0]],Branch [Branch [Leaf 0,Leaf 0],Branch [Leaf 0,Leaf 0]]]
> ma (\i -> i) (2 :+ 2 :+ 2 :+ O)
Branch [Branch [Branch [Leaf [0,0,0],Leaf [1,0,0]],Branch [Leaf [0,1,0],Leaf [1,1,0]]],Branch [Branch [Leaf [0,0,1],Leaf [1,0,1]],Branch [Leaf [0,1,1],Leaf [1,1,1]]]]

A low-tech solution:
In Haskell, you can model multi-level lists by using the so-called free monad.
The base definition is:
data Free ft a = Pure a | Free (ft (Free ft a))
where ft can be any functor, but here we are interested in ft being [], that is the list functor.
So we define our multidimensional list like this:
import Control.Monad
import Control.Monad.Free
type Mll = Free [] -- Multi-Level List
The Mll type transformer happens to be an instance of the Functor, Foldable, Traversable classes, which can come handy.
To make an array of arbitrary dimension, we start with:
the list of dimensions, for example [5,2,6]
the filler function, which returns a value for a given set of indices
We can start by making a “grid” object, whose item at indices say [x,y,z] is precisely the [x,y,z] list. As we have a functor instance, we can complete the process by just applying fmap filler to our grid object.
This gives the following code:
makeNdArray :: ([Int] -> a) -> [Int] -> Mll a
makeNdArray filler dims =
let
addPrefix x (Pure xs) = Pure (x:xs)
addPrefix x (Free xss) = Free $ map (fmap (x:)) xss
makeGrid [] = Pure []
makeGrid (d:ds) = let base = 0
fn k = addPrefix k (makeGrid ds)
in Free $ map fn [base .. (d-1+base)]
grid = makeGrid dims
in
fmap filler grid -- because we are an instance of the Functor class
To visualize the resulting structure, it is handy to be able to remove the constructor names:
displayMll :: Show a => Mll a -> String
displayMll = filter (\ch -> not (elem ch "Pure Free")) . show
The resulting structure can easily be flattened if need be:
toListFromMll :: Mll a -> [a]
toListFromMll xs = foldr (:) [] xs
For numeric base types, we can get a multidimensional sum function “for free”, so to speak:
mllSum :: Num a => (Mll a) -> a
mllSum = sum -- because we are an instance of the Foldable class
-- or manually: foldr (+) 0
Some practice:
We use [5,2,6] as the dimension set. To visualize the structure, we associate a decimal digit to every index. We can pretend to have 1-base indexing by adding 111, because that way all the resulting numbers are 3 digits long, which makes the result easier to check. Extra newlines added manually.
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> dims = [5,2,6]
λ> filler = \[x,y,z] -> (100*x + 10*y + z + 111)
λ>
λ> mxs = makeNdArray filler dims
λ>
λ> displayMll mxs
"[[[111,112,113,114,115,116],[121,122,123,124,125,126]],
[[211,212,213,214,215,216],[221,222,223,224,225,226]],
[[311,312,313,314,315,316],[321,322,323,324,325,326]],
[[411,412,413,414,415,416],[421,422,423,424,425,426]],
[[511,512,513,514,515,516],[521,522,523,524,525,526]]]"
λ>
As mentioned above, we can flatten the structure:
λ>
λ> xs = toListFromMll mxs
λ> xs
[111,112,113,114,115,116,121,122,123,124,125,126,211,212,213,214,215,216,221,222,223,224,225,226,311,312,313,314,315,316,321,322,323,324,325,326,411,412,413,414,415,416,421,422,423,424,425,426,511,512,513,514,515,516,521,522,523,524,525,526]
λ>
or take its overall sum:
λ>
λ> sum mxs
19110
λ>
λ> sum xs
19110
λ>
λ>
λ> length mxs
60
λ>
λ> length xs
60
λ>

How do I fix ‘Eq a’ has kind ‘GHC.Prim.Constraint’ error in haskell?

I am writing a small function in Haskell to check if a list is a palindrome by comparing it with it's reverse.
checkPalindrome :: [Eq a] -> Bool
checkPalindrome l = (l == reverse l)
where
reverse :: [a] -> [a]
reverse xs
| null xs = []
| otherwise = (last xs) : reverse newxs
where
before = (length xs) - 1
newxs = take before xs
I understand that I should use [Eq a] in the function definition because I use the equality operator later on, but I get this error when I compile:
Expected kind ‘*’, but ‘Eq a’ has kind ‘GHC.Prim.Constraint’
In the type signature for ‘checkPalindrome’:
checkPalindrome :: [Eq a] -> Bool
P.s Feel free to correct me if I am doing something wrong with my indentation, I'm very new to the language.

Unless Haskell adopted a new syntax, your type signature should be:
checkPalindrome :: Eq a => [a] -> Bool
Declare the constraint on the left hand side of a fat-arrow, then use it on the right hand side.

Unlike OO languages, Haskell makes a quite fundamental distinction between
Constraints – typeclasses like Eq.
Types – concrete types like Bool or lists of some type.
In OO languages, both of these would be represented by classes†, but a Haskell type class is completely different. You never have “values of class C”, only “types of class C”. (These concrete types may then contain values, but the classes don't.)
This distinction may seem pedantic, but it's actually very useful. What you wrote, [Eq a] -> Bool, would supposedly mean: each element of the list must be comparable... but comparable to what? You could have elements of different type in the list, how do you know that these elements are comparable to each other? In Haskell, that's no issue, because whenever the function is used you first settle on one type a. This type must be in the Eq class. The list then must have all elements from the same type a. This way you ensure that each element of the list is comparable to all of the others, not just, like, comparable to itself. Hence the signature
checkPalindrome :: Eq a => [a] -> Bool
This is the usual distinction on the syntax level: constraints must always‡ be written on the left of an => (implication arrow).
The constraints before the => are “implicit arguments”: you don't explicitly “pass Eq a to the function” when you call it, instead you just pass the stuff after the =>, i.e. in your example a list of some concrete type. The compiler will then look at the type and automatically look up its Eq typeclass instance (or raise a compile-time error if the type does not have such an instance). Hence,
GHCi, version 7.10.2: http://www.haskell.org/ghc/ :? for help
Prelude> let palin :: Eq a => [a] -> Bool; palin l = l==reverse l
Prelude> palin [1,2,3,2,1]
True
Prelude> palin [1,2,3,4,5]
False
Prelude> palin [sin, cos, tan]
<interactive>:5:1:
No instance for (Eq (a0 -> a0))
(maybe you haven't applied enough arguments to a function?)
arising from a use of ‘palin’
In the expression: palin [sin, cos, tan]
In an equation for ‘it’: it = palin [sin, cos, tan]
...because functions can't be equality-compared.
†Constraints may in OO also be interfaces / abstract base classes, which aren't “quite proper classes” but are still in many ways treated the same way as OO value-classes. Most modern OO languages now also support Haskell-style parametric polymorphism in addition to “element-wise”/covariant/existential polymorphism, but they require somewhat awkward extends trait-mechanisms because this was only implemented as an afterthought.
‡There are also functions which have “constraints in the arguments”, but that's a more advanced concept called rank-n polymorphism.

This is really an extended comment. Aside from your little type error, your function has another problem: it's extremely inefficient. The main problem is your definition of reverse.
reverse :: [a] -> [a]
reverse xs
| null xs = []
| otherwise = (last xs) : reverse newxs
where
before = (length xs) - 1
newxs = take before xs
last is O(n), where n is the length of the list. length is also O(n), where n is the length of the list. And take is O(k), where k is the length of the result. So your reverse will end up taking O(n^2) time. One fix is to just use the standard reverse function instead of writing your own. Another is to build up the result recursively, accumulating the result as you go:
reverse :: [a] -> [a]
reverse xs0 = go [] xs0
go acc [] = acc
go acc (x : xs) = go (x : acc) xs
This version is O(n).
There's another source of inefficiency in your implementation:
checkPalindrome l = (l == reverse l)
This isn't nearly as bad, but let's look at what it does. Suppose we have the string "abcdefedcba". Then we test whether "abcdefedcba" == "abcdefedcba". By the time we've checked half the list, we already know the answer. So we'd like to stop there! There are several ways to accomplish this. The simplest efficient one is probably to calculate the length of the list as part of the process of reversing it so we know how much we'll need to check:
reverseCount :: [a] -> (Int, [a])
reverseCount xs0 = go 0 [] xs0 where
go len acc [] = (len, acc)
go len acc (x : xs) = len `seq`
go (len + 1) (x : acc) xs
Don't worry about the len `seq` bit too much; that's just a bit of defensive programming to make sure laziness doesn't make things inefficient; it's probably not even necessary if optimizations are enabled. Now you can write a version of == that only looks at the first n elements of the lists:
eqTo :: Eq a => Int -> [a] -> [a] -> Bool
eqTo 0 _ _ = True
eqTo _ [] [] = True
eqTo n (x : xs) (y : ys) =
x == y && eqTo (n - 1) xs ys
eqTo _ _ _ = False
So now
isPalindrome xs = eqTo ((len + 1) `quot` 2) xs rev_xs
where
(len, rev_xs) = reverseCount xs
Here's another way, that's more efficient and arguably more elegant, but a bit tricky. We don't actually need to reverse the whole list; we only need to reverse half of it. This saves memory allocation. We can use a tortoise and hare trick:
splitReverse ::
[a] ->
( [a] -- the first half, reversed
, Maybe a -- the middle element
, [a] ) -- the second half, in order
splitReverse xs0 = go [] xs0 xs0 where
go front rear [] = (front, Nothing, rear)
go front (r : rs) [_] = (front, Just r, rs)
go front (r : rs) (_ : _ : xs) =
go (r : front) rs xs
Now
isPalindrome xs = front == rear
where
(front, _, rear) = splitReverse xs
Now for some numbers, using the test case
somePalindrome :: [Int]
somePalindrome = [1..10000] ++ [10000,9999..1]
Your original implementation takes 7.523s (2.316 mutator; 5.204 GC) and allocates 11 gigabytes to build the test list and check if it's a palindrome. My counting implementation takes less than 0.01s and allocates 2.3 megabytes. My tortoise and hare implementation takes less than 0.01s and allocates 1.7 megabytes.

memoizing a function that takes a set as parameter

I am using Data.MemoCombinators (https://hackage.haskell.org/package/data-memocombinators-0.3/docs/Data-MemoCombinators.html) to memoize a function that takes a set as its parameter and returns a set (this is a contrived example that does nothing but takes a long time to finish):
test s = case Set.toList s of
[] -> Set.singleton 0
[x] -> Set.singleton 1
(x:xs) -> test (Set.singleton x) `Set.union` test (Set.fromList xs)
Since Data.MemoCombinators does not implement a table for sets, I wanted to write my own:
{-# LANGUAGE RankNTypes #-}
import Data.MemoCombinators (Memo)
import qualified Data.MemoCombinators as Memo
import Data.Set (Set)
import qualified Data.Set as Set
set :: Ord a => Memo a -> ((Set a) -> r) -> (Set a) -> r
set m f = Memo.list m (f . Set.fromList) . Set.toList
and here is my test that was supposed to be memoized:
test s = set Memo.integral test' s
where
test' s = case Set.toList s of
[] -> Set.singleton 0
[x] -> Set.singleton 1
(x:xs) -> test (Set.singleton x) `Set.union` test (Set.fromList xs)
There is no documentation for Data.MemoCombinators that is clear to me, so basically I do not know exactly what I am doing.
My questions are:
what is the second parameter to the Memo.list function? Is it a memoizer for the elements of the list?
how to implement a table for a set directly, without using Memo.list? Here is would like to figure out how to implement memoization manually without using someone's library. For example, using a Map. I have seen examples that memoize integers using an infinite list but in case of a map I cannot figure out how to initialize the map and how to insert into it.
Thanks for any help.

what is the second parameter to the Memo.list function? Is it a memoizer for the elements of the list?
The first parameter m is the memoizer for the elements of the list. The second parameter f is the function that you want to apply to the list (and that will be memoized too).
how to implement a table for a set directly, without using Memo.list? Here is would like to figure out how to implement
memoization manually without using someone's library. For example,
using a Map. I have seen examples that memoize integers using an
infinite list but in case of a map I cannot figure out how to
initialize the map and how to insert into it.
Using the same strategy of Data.MemoCombinators, you can do something similar to want they do for lists. This approach does not use an explicit data structure for that, but explores the way Haskell keep things in memory and lazy evaluation.
set :: Ord a => Memo a -> Memo (Set a)
set m f = table (f Set.empty) (m (\x -> set m (f . (x `Set.insert`))))
where
table nil cons set | Set.null set = nil
| otherwise = uncurry cons (Set.deleteFindMin set)
You can also use memoization in Haskell using an explicit data structure (like a Map). I will use the Fibonacci example to demonstrate that, because it easier to benchmark, but it would be similar for other functions.
Let's start with the naive implementation:
fib0 :: Integer -> Integer
fib0 0 = 0
fib0 1 = 1
fib0 x = fib0 (x-1) + fib0 (x-2)
Then Data.MemoCombinators proposes this implementation:
import qualified Data.MemoCombinators as Memo
fib1 :: Integer -> Integer
fib1 = Memo.integral fib'
where
fib' 0 = 0
fib' 1 = 1
fib' x = fib1 (x-1) + fib1 (x-2)
And finally, my version using Map:
import Data.Map (Map)
import qualified Data.Map as Map
fib2 :: Integer -> Integer
fib2 = fst . fib' (Map.fromList [(0, 0),(1, 1)])
where
fib' m0 x | x `Map.member` m0 = (Map.findWithDefault 0 x m0, m0)
| otherwise = let (v1, m1) = fib' m0 (x-1)
(v2, m2) = fib' m1 (x-2)
y = v1 + v2
in (y, Map.insert x y m2)
Now, let's see how they perform:
fib0 40: 13.529371s
fib1 40: 0.000121s
fib2 40: 0.000048s
The fib0 was already too slow. Let's do a proper test with the other two:
fib1 400000: 6.234243s
fib2 400000: 4.022798s
fib1 500000: 8.683649s
fib2 500000: 5.781104s
The Map solution seem actually to outperform the Memo solution for all tests I performed. But I think the greatest advantage of Data.MemoCombinators is actually having this great performance without having to write much more code than the naive solution.
Updated: I changed the conclusions, because I was not doing the benchmark properly. I was doing several calls in the same execution, and in the case of 500000, whatever was the second call (either fib1 or fib2), that was taking too long.

What you have for test is fine, although normally you would define test as a function on sets using Set operations. Here is an example of what I'm talking about:
-- memoize a function on Set Int
foo = set M.integral foo'
where foo' s | Set.null s = 0
foo' s = let a = Set.findMin s
b = Set.findMax s
m = (a+b) `div` 2
(lo,found,hi) = Set.splitMember m s
in if a >= b
then 1
else (if found then 1 else 0) + foo lo + foo hi
This is a very inefficient way of counting the number of elements in a set, but note how foo' is defined in terms of Set operations.
Re your other questions:
what is the second parameter to the Memo.list function? Is it a memoizer for the elements of the list?
Memo.list has signature Memo a -> Memo [a], so in the expression Memo.list m f we have:
m :: Memo a
f :: [a] -> r -- some type r
Memo.list m f :: [a] -> r
So f is the function on [a] that you are memoizing, and m is a memoizer for functions taking a parameter of type a.
how to implement a table for a set directly?
It depends on what you mean by "directly". Memoizing in this fashion is going to involving creating an (possibly infinite) lazy data structure. The string, integral and list memoizers all use some form a lazy trie. This is very different from memoization in imperative languages where you explicitly check a hash map to see if you've already computed something and update that hash map with the function's value, etc. (Btw - you can do that sort of memoization in the ST or IO monads and it might work even better than the Data.Memocombinators approach - something to consider.)
Your idea of memoizing a Set a -> r function by passing to a list is a fine idea, but I would use to/from AscList:
set m f = Memo.list m (f . Set.fromAscList) . Set.toAscList
That way the set Set.fromList [3,4,5] will re-use the same part of the trie that was created to memoize the value for Set.fromList [3,4].

Infinite lazy bitmap

I am trying to construct a lazy data structure that holds an infinite bitmap. I would like to support the following operations:
true :: InfBitMap
Returns an infinite bitmap of True, i.e. all positions should have value True.
falsify :: InfBitMap -> [Int] -> InfBitMap
Set all positions in the list to False. The list is possible infinite. For example, falsify true [0,2..] will return a list where all (and only) odd positions are True.
check :: InfBitMap -> Int -> Bool
Check the value of the index.
Here is what I could do so far.
-- InfBitMap will look like [(#), (#, #), (#, #, #, #)..]
type InfBitMap = [Seq Bool]
true :: InfBitMap
true = iterate (\x -> x >< x) $ singleton True
-- O(L * log N) where N is the biggest index in the list checked for later
-- and L is the length of the index list. It is assumed that the list is
-- sorted and unique.
falsify :: InfBitMap -> [Int] -> InfBitMap
falsify ls is = map (falsify' is) ls
where
-- Update each sequence with all indices within its length
-- Basically composes a list of (update pos False) for all positions
-- within the length of the sequence and then applies it.
falsify' is l = foldl' (.) id
(map ((flip update) False)
(takeWhile (< length l) is))
$ l
-- O(log N) where N is the index.
check :: InfBitMap -> Int -> Bool
check ls i = index (fromJust $ find ((> i) . length) ls) i
I am wondering if there is some Haskellish concept/data-structure that I am missing that would make my code more elegant / more efficient (constants do not matter to me, just order). I tried looking at Zippers and Lenses but they do not seem to help. I would like to keep the complexities of updates and checks logarithmic (maybe just amortized logarithmic).
Note: before someone suspects it, no this is not a homework problem!
Update:
It just occurred to me that check can be improved to:
-- O(log N) where N is the index.
-- Returns "collapsed" bitmap for later more efficient checks.
check :: InfBitMap -> Int -> (Bool, InfBitMap)
check ls i = (index l i, ls')
where
ls'#(l:_) = dropWhile ((<= i) . length) ls
Which can be turned into a Monad for code cleanliness.

A slight variation on the well-known integer trie seems to be applicable here.
{-# LANGUAGE DeriveFunctor #-}
data Trie a = Trie a (Trie a) (Trie a) deriving (Functor)
true :: Trie Bool
true = Trie True true true
-- O(log(index))
check :: Trie a -> Int -> a
check t i | i < 0 = error "negative index"
check t i = go t (i + 1) where
go (Trie a _ _) 1 = a
go (Trie _ l r) i = go (if even i then l else r) (div i 2)
--O(log(index))
modify :: Trie a -> Int -> (a -> a) -> Trie a
modify t i f | i < 0 = error "negative index"
modify t i f = go t (i + 1) where
go (Trie a l r) 1 = Trie (f a) l r
go (Trie a l r) i | even i = Trie a (go l (div i 2)) r
go (Trie a l r) i = Trie a l (go r (div i 2))
Unfortunately we can't use modify to implement falsify because we can't handle infinite lists of indices that way (all modifications have to be performed before an element of the trie can be inspected). Instead, we should do something more like a merge:
ascIndexModify :: Trie a -> [(Int, a -> a)] -> Trie a
ascIndexModify t is = go 1 t is where
go _ t [] = t
go i t#(Trie a l r) ((i', f):is) = case compare i (i' + 1) of
LT -> Trie a (go (2*i) l ((i', f):is)) (go (2*i+1) r ((i', f):is))
GT -> go i t is
EQ -> Trie (f a) (go (2*i) l is) (go (2*i+1) r is)
falsify :: Trie Bool -> [Int] -> Trie Bool
falsify t is = ascIndexModify t [(i, const False) | i <- is]
We assume strictly ascending indices in is, since otherwise we would skip places in the trie or even get non-termination, for example in check (falsify t (repeat 0)) 1.
The time complexities are a bit complicated by laziness. In check (falsify t is) index, we pay an additional cost of a constant log 2 index number of comparisons, and a further length (filter (<index) is) number of comparisons (i. e. the cost of stepping over all indices smaller than what we're looking up). You could say it's O(max(log(index), length(filter (<index) is)). Anyway, it's definitely better than the O(length is * log (index)) that we would get for a falsify implemented for finite is-es using modify.
We must keep in mind that tree nodes are evaluated once, and subsequent check-s for the same index after the first check are not paying any extra cost for falsify. Again, laziness makes this a bit complicated.
This falsify is also pretty well-behaved when we want to traverse a prefix of a trie. Take this toList function:
trieToList :: Trie a -> [a]
trieToList t = go [t] where
go ts = [a | Trie a _ _ <- ts]
++ go (do {Trie _ l r <- ts; [l, r]})
It's a standard breadth-first traversal, in linear time. The traversal time remains linear when we compute take n $ trieToList (falsify t is), since falsify incurs at most n + length (filter (<n) is) extra comparisons, which is at most 2 * n, assuming strictly increasing is.
(side note: the space requirement of breadth-first traversal is rather painful, but I can't see a simple way to help it, since iterative deepening is even worse here, because there the whole tree must be held in memory, while bfs only has to remember the bottom level of the tree).

One way to represent this is as a function.
true = const True
falsify ls is = \i -> not (i `elem` is) && ls i
check ls i = ls i
The true and falsify functions are nice and efficient. The check function can be as bad as linear. It's possible to improve the efficiency of the same basic idea. I like its elegance.

Is there an indexed list in Haskell and is it good or bad?

I am a new comer to the Haskell world and I am wondering if there is something like this:
data IndexedList a = IList Int [a]
findIndex::(Int->Int)->IndexedList a->(a,IndexedList a)
findIndex f (IList x l) = (l!!(f x), IList (f x) l)
next::IndexedList a->(a,IndexedList a)
next x = findIndex (+1) x
I've noticed that this kind of list is not purely functional but kind of useful for some applications. Should it be considered harmful?
Thanks,
Bob

It's certainly useful to have a list that comes equipped with a pointed to a particular location in the list. However, the way it's usually done in Haskell is somewhat different - rather than using an explicit pointer, we tend to use a zipper.
The list zipper looks like this
data ListZipper a = LZ [a] a [a] deriving (Show)
You should think of the middle field a as being the element that is currently pointed to, the first field [a] as being the elements before the current position, and the final field [a] as being the elements after the current position.
Usually we store the elements before the current one in reverse order, for efficiency, so that the list [0, 1, 2, *3*, 4, 5, 6] with a pointer to the middle element, would be stored as
LZ [2,1,0] 3 [4,5,6]
You can define functions that move the pointer to the left or right
left (LZ (a:as) b bs) = LZ as a (b:bs)
right (LZ as a (b:bs)) = LZ (a:as) b bs
If you want to move to the left or right n times, then you can do that with the help of a function that takes another function, and applies it n times to its argument
times n f = (!!n) . iterate f
so that to move left three times, you could use
>> let lz = LZ [2,1,0] 3 [4,5,6]
>> (3 `times` left) lz
LZ [] 0 [1,2,3,4,5,6]
Your two functions findIndex and next can be written as
next :: ListZipper a -> (a, ListZipper a)
next = findIndex 1
findIndex :: Int -> ListZipper a -> (a, ListZipper a)
findIndex n x = let y#(LZ _ a _) = (n `times` right) x in (a, y)

Contrary to what you think this list is in fact purely functional. The reason is that IList (f x) l creates a new list (and does not, as you may think, modify the current IndexedList). It is in general not that easy to create non-purely functional data structures or functions in Haskell, as long as you stay away from unsafePerformIO.
The reason I would recommend against using the IndexedList is that there is no assurance that the index is less than the length of the list. In this case the lookup l!!(f x) will fail with an exception, which is generally considered bad style in Haskell. An alternative could be to use a safe lookup, which returns a Maybe a like the following:
findIndex :: (Int -> Int) -> IndexedList a -> (Maybe a, IndexedList a)
findIndex f (IList i l) = (maybe_x, IList new_i l)
where
new_i = f i
maybe_x = if new_i < length l
then Just (l !! newI)
else Nothing
I can also not think of a usecase where such a list would be useful, but I guess I am limited by my creativity ;)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string