what is the most appropriate data structure supporting binary search to solve LIS? - haskell

I would like to solve the Longest increasing subsequence problem in Haskell, with the patience sorting algorithm.
I first did it with lists, and it worked in O(n^2) time.
Now I would like to do create an algorithm that solve it in O(n log n) time. To do it I need to get the "first fit" when I insert each value v, in other words finding the first pile whose last element is bigger than v in n log n time.
data OrderedStruct = undefined -- ???
-- I need a method to remove elements in (n log n) time
popFirstBigger :: Ord a -> a -> OrderedStruct a -> (Maybe a, OrderedStruct a)
popFirstBigger a t = undefined
-- and one to insert them in (n log n) or faster
insert :: Ord a -> a -> OrderedStruct a -> OrderedStruct a
insert a t = undefined
I could do it with a balanced binary search tree, but I would like to know if it exists a shortest way.
For example, any structure that I could use in a dichotomic search would be sufficient.
Does such a data structure exists in standard Haskell (Seq for example?)
Otherwise which data structure could I use?

Data.Set offers insert, delete, and lookupGT, all in O(log n) time.

Related

Haskell taking in two list with a int and returning a tuple

I am trying to learn haskell and saw a exercise which says
Write two different Haskell functions having the same type:
[a] -> [b] -> Int -> (a,b)
So from my understanding the expressions should take in two lists, an int and return a tuple of the type of the lists.
What i tried so far was
together :: [a] -> [b] -> Int -> (a,b)
together [] [] 0 = (0,0)
together [b] [a] x = if x == a | b then (b,a) else (0,0)
I know I am way off but any help is appreciated!
First you need to make your mind up what the function should return. That is partly determined by the signature. But still you can come up with a lot of functions that return different things, but have the same signature.
Here one of the most straightforward functions is probably to return the elements that are placed on the index determined by the third parameter.
It makes no sense to return (0,0), since a and b are not per se numerical types. Furthermore if x == a | b is not semantically valid. You can write this as x == a || x == b, but this will not work, since a and b are not per se Ints.
We can implement a function that returns the heads of the two lists in case the index is 0. In case the index is negative, or at least one of the two lists is exhausted, then we can raise an error. I leave it as an exercise what to do in case the index is greater than 0:
together :: [a] -> [b] -> Int -> (a,b)
together [] _ = error "List exhausted"
together _ [] = error "List exhausted"
together (a:_) (b:_) 0 = (a, b)
together (a:_) (b:_) n | n < 0 = error "Negative index!"
| …
you thus still need to fill in the ….
I generally dislike those "write any function with this signature"-type excercises precisely because of how arbitrary they are. You're supposed to figure out a definition that would make sense for that particular signature and implement it. In a lot of cases, you can wing it by ignoring as many arguments as possible:
fa :: [a] -> [b] -> Int -> (a,b)
fa (a:_) (b:_) _ = (a,b)
fa _ _ _ = error "Unfortunately, this function can't be made total because lists can be empty"
The error here is the important bit to note. You attempted to go around that problem by returning 0s, but this will only work when 0 is a valid value for types of a and b. Your next idea could be some sort of a "Default" value, but not every type has such a concept. The key observation is that without any knowledge about a type, in order to produce a value from a function, you need to get this value from somewhere else first*.
If you actually wanted a more sensible definition, you'd need to think up a use for that Int parameter; maybe it's the nth element from each
list? With the help of take :: Int -> [a] -> [a] and head :: [a] -> a this should be doable as an excercise.
Again, your idea of comparing x with a won't work for all types; not every type is comparable with an Int. You might think that this would make generic functions awfully limited; that's the point where you typically learn about how to express certain expectations about the types you get, which will allow you to operate only on certain subsets of all possible types.
* That's also the reason why id :: a -> a has only one possible implementation.
Write two different Haskell functions having the same type:
[a] -> [b] -> Int -> (a,b)
As Willem and Bartek have pointed out, there's a lot of gibberish functions that have this type.
Bartek took the approach of picking two based on what the simplest functions with that type could look like. One was a function that did nothing but throw an error. And one was picking the first element of each list, hoping they were not empty and failing otherwise. This is a somewhat theoretical approach, since you probably don't ever want to use those functions in practice.
Willem took the approach of suggesting an actually useful function with that type and proceeded to explore how to exhaust the possible patterns of such a function: For lists, match the empty list [] and the non-empty list a:_, and for integers, match some stopping point, 0 and some categories n < 0 and ….
A question that arises to me is if there is any other equally useful function with this type signature, or if a second function would necessarily have to be hypothetically constructed. It would seem natural that the Int argument has some relation to the positions of elements in [a] and [b], since they are also integers, especially because a pair of single (a,b) is returned.
But the only remotely useful functions (in the sense of not being completely silly) that I can think of are small variations of this: For example, the Int could be the position from the end rather than from the beginning, or if there's not enough elements in one of the lists, it could default to the last element of a list rather than an error. Neither of these are very pleasing to make ("from the end" conflicts with the list being potentially infinite, and having a fall-back to the last element of a list conflicts with the fact that lists don't necessarily have a last element), so it is tempting to go with Bartek's approach of writing the simplest useless function as the second one.

Remove root element from a heap tree

How do I remove the smallest element of an heap tree?
This element is at the root of the tree. If I remove that, I'm left with two independent subtrees.
data Heap a = Empty
| Node a (Heap a) (Heap a)
The type of the function is:
removeMin :: Heap a -> (a, Heap a)
It should return the tree and the minimum removed.
Should I make an auxiliary function to build a new tree, or is there a faster way to do this?
Your type, as written, raises some questions:
Q: What's the output from removeMin Empty?
A: You can't produce an a from nothing, so the result should be wrapped in Maybe.
Q: If I've put (+), (-) and (*) in a Heap (Int -> Int -> Int), which one should be returned by removeMin?
A: Not all data types have an ordering (notably, functions lack one), so it makes sense to require that the data type have an Ord instance.
So the updated type becomes:
removeMin :: Ord a => Heap a -> Maybe (a, Heap a)
Now consider it case by case:
Empty has no min element:
removeMin Empty = Nothing
If one branch is empty, the remaining heap is the other branch
removeMin (Node a Empty r) = Just (a, r)
removeMin (Node a l Empty) = Just (a, l)
Convince yourself that this works for Node a Empty Empty.
If neither branch is empty, then the new smallest min element must be the root of one of the branches.
The branches in the resulting Heap are just the branch of the larger element, and the branch of the smaller element, with its minimum removed.
Fortunately, we already have a helper to remove the minimum from a Heap!
removeMin (Node a l#(Node la _ _) r#(Node ra _ _)) = Just (a, Node mina maxN minN')
where (minN, maxN) = if la <= ra then (l,r) else (r,l)
Just (mina, minN') = removeMin minN
Now, while this produces a valid heap, it's not necessarily the best algorithm because it's not guaranteed to produce a balanced heap. A poorly balanced heap is no better than a linked list, giving you O(n) insertion and deletion times where a balanced heap can give you O(log n).
You should build an appropriate function to build new tree, but don't worry- it will not perform poorly. GHC can optimize such use cases and this operation could be just as fast as you want (including large, even infinite (recursive) data structures).
I understand you are able to create such auxiliary function by yourself? It is straightforward - anyway, in case of troubles I can write it later.
Think of it this way: After removing the top node, you're left with two heaps. So you need to implement (recursive) merging of two heaps, something like
merge :: (Ord a) => Heap a -> Heap a -> Heap a
You could also implement a monoid instance for Heap
instance (Ord a) => Monoid (Heap a) where
mempty = Empty
mappend = -- the merging function

Set-Like data structure that maintains insertion Order?

The properties I'm looking for are
initially maintains insertion order
transversing in the insertion order
and of course maintain that each element is unique
But there are cases where It's okay to disregard insertion order, such as...
retrieving a difference between two different sets
performing a union the two sets eliminating any duplicates
Java's LinkedHashSet seems to be exactly what I'm after, except for the fact it's not written in Haskell.
current & initial solution
The easiest (and a relevantly inefficient) solution is to implement it as a list and transform it into a set when I need too, but I believe there is likely a better way.
other ideas
My first idea was to implement it as a Data.Set of a newtype of (Int, a) where it would be ordered by the first tuple index, and the second index (a) being the actual value. I quickly realised this wasn't going to work because as the set would allow for duplicates of the type a, which would have defeated the whole purpose of using a set.
simultaneously maintaining a list and a set? (nope)
Another Idea I had was have an abstract data type that would maintain both a list and set representation of the data, which doesn't sound to efficient either.
recap
Are there any descent implementations of such a data structure in Haskell? I've seen Data.List.Ordered but it seems to just add set operations to lists, which sounds terribly inefficient as well (but likely what I'll settle with if I can't find a solution). Another solution suggested here, was to implement it via finger tree, but I would prefer to not reimplement it if it's already a solved problem.
You can certainly use Data.Set with what is isomorphic to (Int, a), but wrapped in a newtype with different a Eq instance:
newtype Entry a = Entry { unEntry :: (Int, a) } deriving (Show)
instance Eq a => Eq (Entry a) where
(Entry (_, a)) == (Entry (_, b)) = a == b
instance Ord a => Ord (Entry a) where
compare (Entry (_, a)) (Entry (_, b)) = compare a b
But this won't quite solve all your problems if you want automatic incrementing of your index, so you could make a wrapper around (Set (Entry a), Int):
newtype IndexedSet a = IndexedSet (Set (Entry a), Int) deriving (Eq, Show)
But this does mean that you'll have to re-implement Data.Set to respect this relationship:
import qualified Data.Set as S
import Data.Set (Set)
import Data.Ord (comparing)
import Data.List (sortBy)
-- declarations from above...
null :: IndexedSet a -> Bool
null (IndexedSet (set, _)) = S.null set
-- | If you re-index on deletions then size will just be the associated index
size :: IndexedSet a -> Int
size (IndexedSet (set, _)) = S.size set
-- Remember that (0, a) == (n, a) for all n
member :: Ord a => a -> IndexedSet a -> Bool
member a (IndexedSet (set, _)) = S.member (Entry (0, a)) set
empty :: IndexedSet a
empty = IndexedSet (S.empty, 0)
-- | This function is critical, you have to make sure to increment the index
-- Might also want to consider making it strict in the i field for performance
insert :: Ord a => a -> IndexedSet a -> IndexedSet a
insert a (IndexedSet (set, i)) = IndexedSet (S.insert (Entry (i, a)) set, i + 1)
-- | Simply remove the `Entry` wrapper, sort by the indices, then strip those off
toList :: IndexedSet a -> [a]
toList (IndexedSet (set, _))
= map snd
$ sortBy (comparing fst)
$ map unEntry
$ S.toList set
But this is fairly trivial in most cases and you can add functionality as you need it. The only thing you'll need to really worry about is what to do in deletions. Do you re-index everything or are you just concerned about order? If you're just concerned about order, then it's simple (and size can be left sub-optimal by having to actually calculate the size of the underlying Set), but if you re-index then you can get your size in O(1) time. These sorts of decisions should be decided based on what problem you're trying to solve.
I would prefer to not reimplement it if it's already a solved problem.
This approach is definitely a re-implementation. But it isn't complicated in most of the cases, could be pretty easily turned into a nice little library to upload to Hackage, and retains a lot of the benefits of sets without much bookkeeping.

Haskell newbie on types

I'm completely new to Haskell (and more generally to functional programming), so forgive me if this is really basic stuff. To get more than a taste, I try to implement in Haskell some algorithmic stuff I'm working on. I have a simple module Interval that implements intervals on the line. It contains the type
data Interval t = Interval t t
the helper function
makeInterval :: (Ord t) => t -> t -> Interval t
makeInterval l r | l <= r = Interval l r
| otherwise = error "bad interval"
and some utility functions about intervals.
Here, my interest lies in multidimensional intervals (d-intervals), those objects that are composed of d intervals. I want to separately consider d-intervals that are the union of d disjoint intervals on the line (multiple interval) from those that are the union of d interval on d separate lines (track interval). With distinct algorithmic treatments in mind, I think it would be nice to have two distinct types (even if both are lists of intervals here) such as
import qualified Interval as I
-- Multilple interval
newtype MInterval t = MInterval [I.Interval t]
-- Track interval
newtype TInterval t = TInterval [I.Interval t]
to allow for distinct sanity checks, e.g.
makeMInterval :: (Ord t) => [I.Interval t] -> MInterval t
makeMInterval is = if foldr (&&) True [I.precedes i i' | (i, i') <- zip is (tail is)]
then (MInterval is)
else error "bad multiple interval"
makeTInterval :: (Ord t) => [I.Interval t] -> TInterval t
makeTInterval = TInterval
I now get to the point, at last! But some functions are naturally concerned with both multiple intervals and track intervals. For example, a function order would return the number of intervals in a multiple interval or a track interval. What can I do? Adding
-- Dimensional interval
data DInterval t = MIntervalStuff (MInterval t) | TIntervalStuff (TInterval t)
does not help much, since, if I understand well (correct me if I'm wrong), I would have to write
order :: DInterval t -> Int
order (MIntervalStuff (MInterval is)) = length is
order (TIntervalStuff (TInterval is)) = length is
and call order as order (MIntervalStuff is) or order (TIntervalStuff is) when is is a MInterval or a TInterval. Not that great, it looks odd. Neither I want to duplicate the function (I have many functions that are concerned with both multiple and track intevals, and some other d-interval definitions such as equal length multiple and track intervals).
I'm left with the feeling that I'm completely wrong and have missed some important point about types in Haskell (and/or can't forget enough here about OO programming). So, quite a newbie question, what would be the best way in Haskell to deal with such a situation? Do I have to forget about introducing MInterval and TInterval and go with one type only?
Thanks a lot for your help,
Garulfo
Edit: this is the same idea as sclv's answer; his links provide more information on this technique.
What about this approach?
data MInterval = MInterval --multiple interval
data TInterval = TInterval --track interval
data DInterval s t = DInterval [I.Interval t]
makeMInterval :: (Ord t) => [I.Interval t] -> Maybe (DInterval MInterval t)
makeMInterval is = if foldr (&&) True [I.precedes i i' | (i, i') <- zip is (tail is)]
then Just (DInterval is)
else Nothing
order :: DInterval s t -> Int
order (DInterval is) = length is
equalOrder :: DInterval s1 t -> DInterval s2 t -> Bool
equalOrder i1 i2 = order i1 == order i2
addToMInterval :: DInterval MInterval t -> Interval t -> Maybe (DInterval MInterval t)
addToMInterval = ..
Here the type DInterval represents multi-dimension intervals, but it takes an extra type parameter as a phantom type. This extra type information allows the typechecker to differentiate between different kinds of intervals even though they have exactly the same representation.
You get the type safety of your original design, but all your structures share the same implementation. Crucially, when the type of the interval doesn't matter, you can leave it unspecified.
I also changed the implementation of your makeMInterval function; returning a Maybe for a function like this is much more idiomatic than calling error.
More explanation on Maybe:
Let's examine your function makeInterval. This function is supposed to take a list of Interval's, and if they meet the criteria return a multiple interval, otherwise a track interval. This explanation leads to the type:
makeInterval :: (Ord t) =>
[I.Interval t] ->
Either (DInterval TInterval t) (DInterval MInterval t)
Now that we have the type, how can we implement it? We'd like to re-use our makeMInterval function.
makeInterval is = maybe
(Left $ DInterval TInterval is)
Right
(makeMInterval is)
The function maybe takes three arguments: a default b to use if the Maybe is Nothing, a function a -> b if the Maybe is Just a, and a Maybe a. It returns either the default or the result of applying the function to the Maybe value.
Our default is the track interval, so we create a Left track interval for the first argument. If the maybe is Just (DInterval MInterval t), the multiple interval already exists so all that's necessary is to stick it into the Right side of the either. Finally, makeMInterval is used to create the multiple interval.
What you want are types with the same structure and common operations, but which may be distinguished, and for which certain operations only make sense on one or another type. The idiom for this in Haskell is Phantom Types.
http://www.haskell.org/haskellwiki/Phantom_type
http://www.haskell.org/haskellwiki/Wrapper_types
http://blog.malde.org/index.php/2009/05/14/using-a-phantom-type-to-label-different-kinds-of-sequences/
http://neilmitchell.blogspot.com/2007/04/phantom-types-for-real-problems.html
What you want is a typeclass. Typeclasses is how overloading is done in Haskell. A type class is a common structure that is shared by many types. In your case you want to create a class of all types that have a order function:
class Dimensional a where
order :: a -> Int
This is saying there's a class of types a called Dimensional, and for each type belonging to this class there's a function a -> Int called order. Now you need to declare all your types as an belonging to the class:
instance Dimensional MInterval where
order (MInterval is) = length is
and so on. You can also declare functions that act on things that are from a given class, this way:
isOneDimensional :: (Dimensional a) => a -> Bool
isOneDimensional interval = (order interval == 1)
The type declaration can be read "for all types a that belong to typeclass Dimensional, this functions takes a and returns a Bool".
I like to think about classes as algebraic structures in mathematics (it's not quite the same thing as you can't enforce some logical constraints, but...): you state what are the requirements for a type to belong to some class (in the analogy, the operations must be defined over a set for it to belong to a certain algebraic construction), and then, for each specific type you state what is the specific implementation of the required functions (in the analogy, what are the specific functions for each specific set).
Take, for example, a group. Groups must have an identity, an inverse for each element and a multiplication:
class Group a where:
identity :: a
inverse :: a -> a
(<*>) :: a -> a -> a
and Integers form a group under addition:
instance Group Integer where:
identity = 0
inverse x = (- x)
x <*> y = x + y
Note that this will not solve the repetition, as you have to instantiate each type.
consider the use of typeclasses to show similarities between different types. See the haskell tutorial[1] about classes and overloading for help.
[1]: http://www.haskell.org/tutorial/classes.html haskell tutorial

How can iterative deepening search implemented efficient in haskell?

I have an optimization problem I want to solve. You have some kind of data-structure:
data Foo =
{ fooA :: Int
, fooB :: Int
, fooC :: Int
, fooD :: Int
, fooE :: Int
}
and a rating function:
rateFoo :: myFoo -> Int
I have to optimize the result of rateFoo by changing the values in the struct. In this specific case, I decided to use iterative deepening search to solve the problem. The (infinite) search tree for the best optimization is created by another function, which simply applies all possible changes recursivly to the tree:
fooTree :: Foo -> Tree
My searching function looks something like this:
optimize :: Int -> Foo -> Foo
optimize threshold foo = undefined
The question I had, before I start is this:
As the tree can be generated by the data at each point, is it possible to have only the parts of the tree generated, which are currently needed by the algorithm? Is it possible to have the memory freed and the tree regenerated if needed in order to save memory (A leave at level n can be generated in O(n) and n remains small, but not small enough to have the whole tree in memory over time)?
Is this something I can excpect from the runtime? Can the runtime unevaluate expressions (turn an evaluated expression into an unevaluated one)? Or what is the dirty hack I have to do for this?
The runtime does not unevaluate expressions.
There's a straightforward way to get what you want however.
Consider a zipper-like structure for your tree. Each node holds a value and a thunk representing down, up, etc. When you move to the next node, you can either move normally (placing the previous node value in the corresponding slot) or forgetfully (placing an expression which evaluates to the previous node in the right slot). Then you have control over how much "history" you hang on to.
Here's my advice:
Just implement your algorithm in the
most straightforward way possible.
Profile.
Optimize for speed or memory use if necessary.
I very quickly learned that I'm not smart and/or experienced enough to reason about what GHC will do or how garbage collection will work. Sometimes things that I'm sure will be disastrously memory-inefficient work smoothly the first time around, and–less often–things that seem simple require lots of fussing with strictness annotations, etc.
The Real World Haskell chapter on profiling and optimization is incredibly helpful once you get to steps 2 and 3.
For example, here's a very simple implementation of IDDFS, where f expands children, p is the search predicate, and x is the starting point.
search :: (a -> [a]) -> (a -> Bool) -> a -> Bool
search f p x = any (\d -> searchTo f p d x) [1..]
where
searchTo f p d x
| d == 0 = False
| p x = True
| otherwise = any (searchTo f p $ d - 1) (f x)
I tested by searching for "abbaaaaaacccaaaaabbaaccc" with children x = [x ++ "a", x ++ "bb", x ++ "ccc"] as f. It seems reasonably fast and requires very little memory (linear with the depth, I think). Why not try something like this first and then move to a more complicated data structure if it isn't good enough?

Resources