Adding an element to a list - haskell

I want to write a function that adds an item to a list in the correct order say 1 in [2, 3]. I am new to haskell and need help on how to do it without using Ord.

Your question makes no sense without using Ord.
Using Ord, the function you want is insert
The insert function takes an element and a list and inserts the element into the list at the last position where it is still less than or equal to the next element.

It's not hard to write a function that inserts an element into a sorted list. It would look something like this:
insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys)
| x > y = y : insert x ys
| otherwise = x : y : ys
However, this is unlikely to be efficient for your use case. The problem with a list is that you end up creating a new copy of a large part of the spine repeatedly with this kind of insertion problem. You also have to scan linearly across the list until you find the right location, which isn't the fastest way to search for the right location.
You might be better off using a data structure such as the one in Data.Set or Data.IntSet. These are typically O(log n) for insertion because they use trees or other data structures that permit more sharing than a list does, and find the right location fast.

Related

Haskell: Should I get "Stack space overflow" when constructing an IntMap from a list with a million values?

My problem is that when using any of the Map implementations in Haskell that I always get a "Stack space overflow" when working with a million values.
What I'm trying to do is process a list of pairs. Each pair contains two Ints (not Integers, I failed miserably with them so I tried Ints instead). I want to go through each pair in the list and use the first Int as a key. For each unique key I want to build up a list of second elements where each of the second elements are in a pair that have the same first element. So what I want at the end is a "Map" from an Int to a list of Ints. Here's an example.
Given a list of pairs like this:
[(1,10),(2,11),(1,79),(3,99),(1,42),(3,18)]
I would like to end up with a "Map" like this:
{1 : [42,79,10], 2 : [11], 3 : [18,99]}
(I'm using a Python-like notation above to illustrate a "Map". I know it ain't Haskell. It's just there for illustrative purposes.)
So the first thing I tried was my own hand built version where I sorted the list of pairs of Ints and then went through the list building up a new list of pairs but this time the second element was a list. The first element is the key i.e. the unique Int values of the first element of each pair and the second element is a list of the second values of each original pair which have the key as the first element.
So given a list of pairs like this:
[(1,10),(2,11),(1,79),(3,99),(1,42),(3,18)]
I end up with a list of pairs like this:
[(1, [42,79,10], (2, [11]), (3, [18,99])]
This is easy to do. But there is one problem. The performance of the "sort" function on the original list (of 10 million pairs) is shockingly bad. I can generate the original list of pairs in less than a second. I can process the sorted list into my hand built map in less than a second. However, sorting the original list of pairs takes 40 seconds.
So I thought about using one of the built-in "Map" data structures in Haskell to do the job. The idea being I build my original list of pairs and then using standard Map functions to build a standard Map.
And that's where it all went pear-shaped. It works well on a list of 100,000 values but when I move to 1 million values, I get a "Stack space overflow" error.
So here's some example code that suffers from the problem. Please, please note that is not the actual code that I want to implement. It is just a very simplified version of code for which the same problem exists. I don't really want to separate a million consecutive numbers into odd and even partitions!!
import Data.IntMap.Strict(IntMap, empty, findWithDefault, insert, size)
power = 6
ns :: [Int]
ns = [1..10^power-1]
mod2 n = if odd n then 1 else 0
mod2Pairs = zip (map mod2 ns) ns
-- Takes a list of pairs and returns a Map where the key is the unique Int values
-- of the first element of each pair and the value is a list of the second values
-- of each pair which have the key as the first element.
-- e.g. makeMap [(1,10),(2,11),(1,79),(3,99),(1,42),(3,18)] =
-- 1 -> [42,79,10], 2 -> [11], 3 -> [18,99]
makeMap :: [(Int,a)] -> IntMap [a]
makeMap pairs = makeMap' empty pairs
where
makeMap' m [] = m
makeMap' m ((a, b):cs) = makeMap' m' cs
where
bs = findWithDefault [] a m
m' = insert a (b:bs) m
mod2Map = makeMap mod2Pairs
main = do
print $ "Yowzah"
print $ "length mod2Pairs="++ (show $ length mod2Pairs)
print $ "size mod2Map=" ++ (show $ size mod2Map)
When I run this, I get:
"Yowzah"
"length mod2Pairs=999999"
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
From the above output, it should be clear that the stack space overflow happens when I try to do "makeMap mod2Pairs".
But to my naive eye all this seems to do is go through a list of pairs and for each pair lookup a key (the first element of each pair) and A) if it doesn't find a match return an empty list or B) if it does find a match, return the list that has previously been inserted. In either case it "cons"'s the second element of the pair to the "found" list and inserts that back into the Map with the same key.
(PS instead of findWithDefault, I've also tried lookup and handled the Just and Nothing using case but to no avail.)
I've had a look through the Haskell documentation on the various Map implementations and from the point of view of performance in terms of CPU and memory (especially stack memory), it seems that A) a strict implementation and B) one where the keys are Ints would be the best. I have also tried Data.Map and Data.Strict.Map and they also suffer from the same problem.
I am convinced the problem is with the "Map" implementation. Am I right? Why would I get a stack overflow error i.e. what is the Map implementation doing in the background that is causing a stack overflow? Is it making lots and lots of recursive calls behind the scenes?
Can anyone help explain what is going on and how to get around the problem?
I don't have an old enough GHC to check (this works just fine for me, and I don't have 7.6.3 as you do), but my guess would be that your makeMap' is too lazy. Probably this will fix it:
makeMap' m ((a, b):cs) = m `seq` makeMap' m' cs
Without it, you are building up a million-deep nested thunk, and deeply-nested thunks is the traditional way to cause stack overflows in Haskell.
Alternately, I would try just replacing the whole makeMap implementation with fromListWith:
makeMap pairs = fromListWith (++) [(k, [v]) | (k, v) <- pairs]

Haskell Remove duplicates from list

I am new to Haskell and I am trying the below code to remove duplicates from a list. However it does not seem to work.
compress [] = []
compress (x:xs) = x : (compress $ dropWhile (== x) xs)
I have tried some search, all the suggestions use foldr/ map.head. Is there any implementation with basic constructs?
I think that the issue that you are referring to in your code is that your current implementation will only get rid of adjacent duplicates. As it was posted in a comment, the builtin function nub will eliminate every duplicate, even if it's not adjacent, and keep only the first occurrence of any element. But since you asked how to implement such a function with basic constructs, how about this?
myNub :: (Eq a) => [a] -> [a]
myNub (x:xs) = x : myNub (filter (/= x) xs)
myNub [] = []
The only new function that I introduced to you there is filter, which filters a list based on a predicate (in this case, to get rid of every element present in the rest of that list matching the current element).
I hope this helps.
First of all, never simply state "does not work" in a question. This leaves to the reader to check whether it's a compile time error, run time error, or a wrong result.
In this case, I am guessing it's a wrong result, like this:
> compress [1,1,2,2,3,3,1]
[1,2,3,1]
The problem with your code is that it removes successive duplicates, only. The first pair of 1s gets compressed, but the last lone 1 is not removed because of that.
If you can, sort the list in advance. That will make equal elements close, and then compress does the right job. The output will be in a different order, of course. There are ways to keep the order too if needed (start with zip [0..] xs, then sort, then ...).
If you can not sort becuase there is really no practical way to define a comparison, but only an equality, then use nub. Be careful that this is much less efficient than sorting & compressing. This loss of performance is intrinsic: without a comparator, you can only use an inefficient quadratic algorithm.
foldr and map are very basic FP constructs. However, they are very general and I found them a bit mind-bending to understand for a long time. Tony Morris' talk Explain List Folds to Yourself helped me a lot.
In your case an existing list function like filter :: (a -> Bool) -> [a] -> [a] with a predicate that exludes your duplicate could be used in lieu of dropWhile.

What datatype to choose for a dungeon map

As part of a coding challenge I have to implement a dungeon map.
I have already designed it using Data.Map as a design choice because printing the map was not required and sometimes I had to update an map tile, e.g. when an obstacle was destroyed.
type Dungeon = Map Pos Tile
type Pos = (Int,Int) -- cartesian coordinates
data Tile = Wall | Destroyable | ...
But what if I had to print it too - then I would have to use something like
elaboratePrint . sort $ fromList dungeon where elaboratePrint takes care of the linebreaks and makes nice unicode symbols from the tileset.
Another choice I considered would be a nested list
type Dungeon = [[Tile]]
This would have the disadvantage, that it is hard to update a single element in such a data structure. But printing then would be a simple one liner unlines . map show.
Another structure I considered was Array, but as I am not used to arrays a short glance at the hackage docs - i only found a map function that operated on indexes and one that worked on elements, unless one is willing to work with mutable arrays updating one element is not easy at first glance. And printing an array is also not clear how to do that fast and easily.
So now my question - is there a better data structure for representing a dungeon map that has the property of easy printing and easy updating single elements.
How about an Array? Haskell has real, 2-d arrays.
import Data.Array.IArray -- Immutable Arrays
Now an Array is indexed by any Ix a => a. And luckily, there is an instance (Ix a, Ix b) => Ix (a, b). So we can have
type Dungeon = Array (Integer, Integer) Tile
Now you construct one of these with any of several functions, the simplest to use being
array :: Ix i => (i, i) -> [(i, a)] -> Array i a
So for you,
startDungeon = array ( (0, 0), (100, 100) )
[ ( (x, y), Empty ) | x <- [0..100], y <- [0..100]]
And just substitute 100 and Empty for the appropriate values.
If speed becomes a concern, then it's a simple fix to use MArray and ST. I'd suggest not switching unless speed is actually a real concern here.
To address the pretty printing
import Data.List
import Data.Function
pretty :: Array (Integer, Integer) Tile -> String
pretty = unlines . map show . groupBy ((==) `on` snd.fst) . assoc
And map show can be turned in to however you want to format [Tile] into a row. If you decide that you really want these to be printed in an awesome and efficient manner (Console game maybe) you should look at a proper pretty printing library, like this one.
First — tree-likes such as Data.Map and lists remain the natural data structures for functional languages. Map is a bit of an overkill structure-wise if you only need rectangular maps, but [[Tile]] may actually be pretty fine. It has O(√n) for both random-access and updates, that's not too bad.
In particular, it's better than pure-functional updates of a 2D array (O(n))! So if you need really good performance, there's no way around using mutable arrays. Which isn't necessarily bad though, after all a game is intrinsically concerned with IO and state. What is good about Data.Array, as noted by jozefg, is the ability to use tuples as Ix indexes, so I would go with MArray.
Printing is easy with arrays. You probably only need rectangular parts of the whole map, so I'd just extract such slices with a simple list comprehension
[ [ arrayMap ! (x,y) | x<-[21..38] ] | y<-[37..47] ]
You already know how to print lists.

Getting constant length retrieve time constant with immutable lists in a functional programming context

I am currently facing the problem of having to make my calculations based on the length of a given list. Having to iterate over all the elements of the list to know its size is a big performance penalty as I'm using rather big lists.
What are the suggested approaches to the problem?
I guess I could always carry a size value together with the list so I know beforehand its size without having to compute it at the call site but that seems a brittle approach. I could also define a own type of list where each node has as property its the lists' size but then I'd lose the leverage provided by my programming language's libraries for standard lists.
How do you guys handle this on your daily routine?
I am currently using F#. I am aware I can use .NET's mutable (array) lists, which would solve the problem. I am way more interested, though, in the purely immutable functional approach.
The built-in F# list type doesn't have any caching of the length and there is no way to add that in some clever way, so you'll need to define your own type. I think that writing a wrapper for the existing F# list type is probably the best option.
This way, you can avoid explicit conversions - when you wrap the list, it will not actually copy it (as in svick's implementation), but the wrapper can easily cache the Length property:
open System.Collections
type LengthList<'T>(list:list<'T>) =
let length = lazy list.Length
member x.Length = length.Value
member x.List = list
interface IEnumerable with
member x.GetEnumerator() = (list :> IEnumerable).GetEnumerator()
interface seq<'T> with //'
member x.GetEnumerator() = (list :> seq<_>).GetEnumerator()
[<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
module LengthList =
let ofList l = LengthList<_>(l)
let ofSeq s = LengthList<_>(List.ofSeq s)
let toList (l:LengthList<_>) = l.List
let length (l:LengthList<_>) = l.Length
The best way to work with the wrapper is to use LengthList.ofList for creating LengthList from a standard F# list and to use LengthList.toList (or just the List) property before using any functions from the standard List module.
However, it depends on the complexity of your code - if you only need length in a couple of places, then it may be easier to keep it separately and use a tuple list<'T> * int.
How do you guys handle this on your daily routine?
We don't, because this isn't a problem in daily routine. It sounds like a problem perhaps in limited domains.
If you created the lists recently, then you've probably already done O(N) work, so walking the list to get its length is probably not a big deal.
If you're making a few very large lists that are then not 'changing' much (obviously never changing, but I mean changing set of references to heads of lists that are used in your domain/algorithm), then it may make sense to just have a dictionary off to the side of reference-to-list-head*length tuples, and consult the dictionary when asking for lengths (doing the real work to walk them when needed, but caching results for future asks about the same list).
Finally, if you really are dealing with some algorithm that needs to be constantly updating the lists in play and constantly consulting the lengths, then create your own list-like data type (yes, you'll also need to write map/filter and any others).
(Very generally, I think it is typically best to use the built-in data structures 99.99% of the time. In the 0.01% of the time where you are developing an algorithm or bit of code that needs to be very highly optimized, then almost always you need to abandon built-in data structures (which are good enough for most cases) and use a custom data structure designed to solve the exact problem you are working on. Look to wikipedia or Okasaki's "'Purely Functional Data Structures" for ideas and inspriation in that case. But rarely go to that case.)
I don't see why carying the length around is a brittle approach. Try something like this (Haskell):
data NList a = NList Int [a]
nNil :: NList [a]
nNil = NList 0 []
nCons :: a -> NList a -> NList a
nCons x (NList n xs) = NList (n+1) (x:xs)
nHead :: NList a -> a
nHead (NList _ (x:_)) = x
nTail :: NList a -> NList a
nTail (NList n (_:xs)) = NList (n-1) xs
convert :: [a] -> NList a
convert xs = NList (length xs) xs
and so on. If this is in a library or module, you can make it safe (I think) by not exporting the constructor NList.
It may also be possible to coerce GHC into memoizing length, but I'm not sure how or when.
In F#, most List functions have an equivalent Seq functions. That means, you can just implement your own immutable linked list that carries the length with each node. Something like this:
type MyList<'T>(item : Option<'T * MyList<'T>>) =
let length =
match item with
| None -> 0
| Some (_, tail) -> tail.Length + 1
member this.Length = length
member private this.sequence =
match item with
| None -> Seq.empty
| Some (x, tail) ->
seq {
yield x
yield! tail.sequence
}
interface seq<'T> with
member this.GetEnumerator() =
(this.sequence).GetEnumerator()
member this.GetEnumerator() =
(this.sequence :> System.Collections.IEnumerable).GetEnumerator()
module MyList =
let rec ofList list =
match list with
| [] -> MyList None
| head::tail -> MyList(Some (head, ofList tail))

Does there exist something like (xs:x)

I'm new to Haskell. I know I can create a reverse function by doing this:
reverse :: [a] -> [a]
reverse [] = []
reverse (x:xs) = (Main.reverse xs) ++ [x]
Is there such a thing as (xs:x) (a list concatenated with an element, i.e. x is the last element in the list) so that I put the last list element at the front of the list?
rotate :: [a] -> [a]
rotate [] = []
rotate (xs:x) = [x] ++ xs
I get these errors when I try to compile a program containing this function:
Occurs check: cannot construct the infinite type: a = [a]
When generalising the type(s) for `rotate'
I'm also new to Haskell, so my answer is not authoritative. Anyway, I would do it using last and init:
Prelude> last [1..10] : init [1..10]
[10,1,2,3,4,5,6,7,8,9]
or
Prelude> [ last [1..10] ] ++ init [1..10]
[10,1,2,3,4,5,6,7,8,9]
The short answer is: this is not possible with pattern matching, you have to use a function.
The long answer is: it's not in standard Haskell, but it is if you are willing to use an extension called View Patterns, and also if you have no problem with your pattern matching eventually taking longer than constant time.
The reason is that pattern matching is based on how the structure is constructed in the first place. A list is an abstract type, which have the following structure:
data List a = Empty | Cons a (List a)
deriving (Show) -- this is just so you can print the List
When you declare a type like that you generate three objects: a type constructor List, and two data constructors: Empty and Cons. The type constructor takes types and turns them into other types, i.e., List takes a type a and creates another type List a. The data constructor works like a function that returns something of type List a. In this case you have:
Empty :: List a
representing an empty list and
Cons :: a -> List a -> List a
which takes a value of type a and a list and appends the value to the head of the list, returning another list. So you can build your lists like this:
empty = Empty -- similar to []
list1 = Cons 1 Empty -- similar to 1:[] = [1]
list2 = Cons 2 list1 -- similar to 2:(1:[]) = 2:[1] = [2,1]
This is more or less how lists work, but in the place of Empty you have [] and in the place of Cons you have (:). When you type something like [1,2,3] this is just syntactic sugar for 1:2:3:[] or Cons 1 (Cons 2 (Cons 3 Empty)).
When you do pattern matching, you are "de-constructing" the type. Having knowledge of how the type is structured allows you to uniquely disassemble it. Consider the function:
head :: List a -> a
head (Empty) = error " the empty list have no head"
head (Cons x xs) = x
What happens on the type matching is that the data constructor is matched to some structure you give. If it matches Empty, than you have an empty list. If if matches Const x xs then x must have type a and must be the head of the list and xs must have type List a and be the tail of the list, cause that's the type of the data constructor:
Cons :: a -> List a -> List a
If Cons x xs is of type List a than x must be a and xs must be List a. The same is true for (x:xs). If you look to the type of (:) in GHCi:
> :t (:)
(:) :: a -> [a] -> [a]
So, if (x:xs) is of type [a], x must be a and xs must be [a] . The error message you get when you try to do (xs:x) and then treat xs like a list, is exactly because of this. By your use of (:) the compiler infers that xs have type a, and by your use of
++, it infers that xs must be [a]. Then it freaks out cause there's no finite type a for which a = [a] - this is what he's trying to tell you with that error message.
If you need to disassemble the structure in other ways that don't match the way the data constructor builds the structure, than you have to write your own function. There are two functions in the standard library that do what you want: last returns the last element of a list, and init returns all-but-the-last elements of the list.
But note that pattern matching happens in constant time. To find out the head and the tail of a list, it doesn't matter how long the list is, you just have to look to the outermost data constructor. Finding the last element is O(N): you have to dig until you find the innermost Cons or the innermost (:), and this requires you to "peel" the structure N times, where N is the size of the list.
If you frequently have to look for the last element in long lists, you might consider if using a list is a good idea after all. You can go after Data.Sequence (constant time access to first and last elements), Data.Map (log(N) time access to any element if you know its key), Data.Array (constant time access to an element if you know its index), Data.Vector or other data structures that match your needs better than lists.
Ok. That was the short answer (:P). The long one you'll have to lookup a bit by yourself, but here's an intro.
You can have this working with a syntax very close to pattern matching by using view patterns. View Patterns are an extension that you can use by having this as the first line of your code:
{-# Language ViewPatterns #-}
The instructions of how to use it are here: http://hackage.haskell.org/trac/ghc/wiki/ViewPatterns
With view patterns you could do something like:
view :: [a] -> (a, [a])
view xs = (last xs, init xs)
someFunction :: [a] -> ...
someFunction (view -> (x,xs)) = ...
than x and xs will be bound to the last and the init of the list you provide to someFunction. Syntactically it feels like pattern matching, but it is really just applying last and init to the given list.
If you're willing to use something different from plain lists, you could have a look at the Seq type in the containers package, as documented here. This has O(1) cons (element at the front) and snoc (element at the back), and allows pattern matching the element from the front and the back, through use of Views.
"Is there such a thing as (xs:x) (a list concatenated with an element, i.e. x is the last element in the list) so that I put the last list element at the front of the list?"
No, not in the sense that you mean. These "patterns" on the left-hand side of a function definition are a reflection of how a data structure is defined by the programmer and stored in memory. Haskell's built-in list implementation is a singly-linked list, ordered from the beginning - so the pattern available for function definitions reflects exactly that, exposing the very first element plus the rest of the list (or alternatively, the empty list).
For a list constructed in this way, the last element is not immediately available as one of the stored components of the list's top-most node. So instead of that value being present in pattern on the left-hand side of the function definition, it's calculated by the function body onthe right-hand side.
Of course, you can define new data structures, so if you want a new list that makes the last element available through pattern-matching, you could build that. But there's be some cost: Maybe you'd just be storing the list backwards, so that it's now the first element which is not available by pattern matching, and requires computation. Maybe you're storing both the first and last value in the structures, which would require additional storage space and bookkeeping.
It's perfectly reasonable to think about multiple implementations of a single data structure concept - to look forward a little bit, this is one use of Haskell's class/instance definitions.
Reversing as you suggested might be much less efficient. Last is not O(1) operation, but is O(N) and that mean that rotating as you suggested becomes O(N^2) alghorhim.
Source:
http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/src/GHC-List.html#last
Your first version has O(n) complexity. Well it is not, becuase ++ is also O(N) operation
you should do this like
rotate l = rev l []
where
rev [] a = a
rev (x:xs) a = rev xs (x:a)
source : http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/src/GHC-List.html#reverse
In your latter example, x is in fact a list. [x] becomes a list of lists, e.g. [[1,2], [3,4]].
(++) wants a list of the same type on both sides. When you are using it, you're doing [[a]] ++ [a] which is why the compiler is complaining. According to your code a would be the same type as [a], which is impossible.
In (x:xs), x is the first item of the list (the head) and xs is everything but the head, i.e., the tail. The names are irrelevant here, you might as well call them (head:tail).
If you really want to take the last item of the input list and put that in the front of the result list, you could do something like:
rotate :: [a] -> [a]
rotate [] = []
rotate lst = (last lst):(rotate $ init lst)
N.B. I haven't tested this code at all as I don't have a Haskell environment available at the moment.

Resources