Is there some way to reduce the pain of range tracking? - haskell

There's currently a pull request by Jonathan S. to replace the implementation of Data.IntMap with one explained in this README based on ideas from a blog post by Edward Kmett.
The basic concept Jonathan S. developed from is that an IntMap is a binary tree that looks like this (I've made some slight changes to his development for the sake of consistency):
data IntMap0 a = Empty | NonEmpty (IntMapNE0 a)
data IntMapNE0 a =
Tip !Int a
| Bin { lo :: !Int
, hi :: !Int
, left :: !(IntMapNE0 a)
, right :: !(IntMapNE0 a) }
In this representation, each node has a field indicating the least and greatest key contained in the IntMapNE0. Using just a little bit fiddling allows this to be used as a PATRICIA trie. Jonathan noted that this structure has almost twice as much range information as it needs. Following a left or right spine will produce all the same lo or hi bounds. So he cut those out by only including the bound not determined by the ancestors:
data IntMap1 a = Empty | NonEmpty { topLo :: !Int, child :: !(IntMapNE1 a) }
data IntMapNE1 a =
Tip a
| IntMapNE1 { bound :: !Int
, left :: !(IntMapNE1 a)
, right :: !(IntMapNE1 a)
Now each node has either a left bound or a right bound, but not both. A right child will have only a left bound, while a left child will have only a right bound.
Jonathan makes one further change, moving the values out of the leaves and into the internal nodes, which places them exactly where they are determined. He also uses phantom types to help track left and right. The final type (for now, anyway) is
data L
data R
newtype IntMap a = IntMap (IntMap_ L a) deriving (Eq)
data IntMap_ t a = NonEmpty !Int a !(Node t a) | Empty deriving (Eq)
data Node t a = Bin !Int a !(Node L a) !(Node R a) | Tip deriving (Eq, Show)
Certain aspects of this new implementation are quite attractive. Most importantly, many of the most-used operations are substantially faster. Less importantly, but very nicely, the bit fiddling involved is much easier to understand.
However, there is one serious pain point: passing the missing range information down through the tree. This isn't so bad for lookups, insertions, etc., but gets pretty seriously hairy in the union and intersection code. Is there some abstraction that would allow this to be done automatically?
A couple extremely vague thoughts:
Could the phantom types be used with a custom class to tie treatment directly to handedness?
The "missing piece" nature is somewhat reminiscent of some zippery situations. Might there be a way to use ideas from that realm?
I've started thinking about using an intermediate type of some sort to provide a symmetrical "view" of the structure, but I'm a bit stuck. I can fairly easily convert back and forth between the basic structure and the fancy one, but that conversion is recursive. I need a way to convert only partially, but I don't know nearly enough about fancily built types to get it done.

Is there some abstraction that would allow this to be done automatically?
You should be able to define a set of pattern synonyms that give you that. I’ll start from the second-to-last variant of your code, i.e.:
data IntMap1 a = Empty | NonEmpty { topLo :: !Int, child :: !(IntMapNE1 a) }
data IntMapNE1 a =
Tip a
| IntMapNE1 { bound :: !Int
, left :: !(IntMapNE1 a)
, right :: !(IntMapNE1 a)
We tuple such a value with the bound from the parent in an Either (indicating whether it is a low or a high bound).
viewLoHi (Left lo, IntMapNE1 hi left right)
= Just (lo, hi, (Left lo, left), (Right hi, right)
viewLoHi (Right hi, IntMapNE1 lo left right)
= Just (lo, hi, (Left lo, left), (Right hi, right)
viewLoHi _
= Nothing
pattern Bin' lo hi left right <- (viewLoHi -> Just (lo, hi, left, right))
The top-level data type is different, so it needs its own pattern synonym
viewLoHi' (NonEmpty lo child) = viewLoHi (Left lo, child)
viewLoHi' Empty = Nothing
pattern NonEmpty' lo hi left right <- (viewLoHi' -> Just (lo, hi, left, right)
Using only NonEmpty' and Bin' as you traverse the tree, the bookkeeping should now be completely hidden. (Code not tested, so there will be typos here)

Related

Function to measure size of a binary tree

When i tried to size (N a (left a) (right a)) instead of size (N a left right), i was told by ghci that this line conflicts when the definition. I am not sure why because in my data signature, it is N a (Tree a) (Tree a). size is a function to count the number of nodes in a bin tree.
data Tree a = Nil | N a (Tree a) (Tree a) deriving (Show, Read, Eq)
size :: Tree Int -> Int
size Nil = 0
size (N _ left right) = 1 + size left + size right
When i tried to size (N a (left a) (right a)) instead of size (N a left right)
left and right in this case are expressions of type Tree Int.
a is not a known variable or type in this context.
In case the definition is updated to size (N a left right), then a is a bound expression of type Int.
To help you see what’s going on, you could write your match internal nodes to name the left and right subtrees and their respective values with
size (N _ left#(N a _ _) right#(N b _ _)) = 1 + size left + size right
Section 3.17.1 “Patterns” describes what is happening with the at signs, which allow the programmer to name the left and right subtrees.
Patterns of the form var#pat are called as-patterns, and allow one to use var as a name for the value being matched by pat.
The broad approach is inelegant for a number of reasons.
left and right are already constrained to be of type Tree because of the declaration of the Tree algebraic datatype.
Much worse, you’d also have to define the other three cases of size for either one or two Nil arguments.
Section 3.17.2 Informal Semantics of Pattern Matching outlines the cases for how the language handles patterns. Of especial note to you in the context of this question are
1. Matching the pattern var against a value v always succeeds and binds var to v.
and
5. Matching the pattern con pat1 … patn against a value, where con is a constructor defined by data, depends on the value:
If the value is of the form con v1 … vn, sub-patterns are matched left-to-right against the components of the data value; if all matches succeed, the overall match succeeds; the first to fail or diverge causes the overall match to fail or diverge, respectively.
If the value is of the form con′ v1 … vn, where con is a different constructor to con′, the match fails.
If the value is ⊥, the match diverges.
The first is how you want to do it and how you wrote it in your question, by binding the left and right subtree to variables. Your first attempt looked vaguely like binding to a constructor, and that’s why you got a syntax error.
Haskell pattern matching can be more sophisticated, e.g., view patterns. For learning exercises, master the basics first.

Accessing values in haskell custom data type

I'm very new to haskell and need to use a specific data type for a problem I am working on.
data Tree a = Leaf a | Node [Tree a]
deriving (Show, Eq)
So when I make an instance of this e.g Node[Leaf 1, Leaf2, Leaf 3] how do I access these? It won't let me use head or tail or indexing with !! .
You perform pattern matching. For example if you want the first child, you can use:
firstChild :: Tree a -> Maybe (Tree a)
firstChild (Node (h:_)) = Just h
firstChild _ = Nothing
Here we wrap the answer in a Maybe type, since it is possible that we process a Leaf x or a Node [], such that there is no first child.
Or we can for instance obtain the i-th item with:
iThChild :: Int -> Tree a -> Tree a
iThChild i (Node cs) = cs !! i
So here we unwrap the Node constructor, obtain the list of children cs, and then perform cs !! i to obtain the i-th child. Note however that (!!) :: [a] -> Int -> a is usually a bit of an anti-pattern: it is unsafe, since we have no guarantees that the list contains enough elements, and using length is an anti-pattern as well, since the list can have infinite length, so we can no do such bound check.
Usually if one writes algorithms in Haskell, one tends to make use of linear access, and write total functions: functions that always return something.

Is there a sense of 'object equality' in Haskell?

If I have a singly linked list in Haskell:
data LL a = Empty | Node a (LL a) deriving (Show, Eq)
I can easily implement methods to insert at the end and at the beginning. But what about inserting after or before a particular element? If I have a LL of Integer, can I make a distinction in Haskell between inserting 4 after a particular node containing a 1, rather than the first 1 that it sees when processing the list?
Node 1 (Node 2 (Node 3 (Node 1 Empty)))
I'm curious how an insertAfter method would look that you would be able to specify "insert 5 after this particular node containing a 1". If I wanted to insert after the first node containing 1, would I have to pass in the entire list to specify this, and for the last node, only Node 1 Empty?
I'm not sure if it's right to address this as 'object equality'- but I'm wondering if there's a way to refer to particular elements of a type with the same payload in a data structure like this.
No, there is no such thing. The only way to tell apart values is by their structure; there is no identity like objects in some languages have. That is, there's no way you could tell apart these two values: (Just 5, Just 5) behaves exactly the same as let x = Just 5 in (x, x). Likewise, there is no difference between "this Node 1" and "some other Node 1": they are indistinguishable.
Usually the "solution" to this problem is to think of your problem in some other way so that there's no longer a need to distinguish based on identity (and usually there in fact is no need). But, as mentioned in the comments, you can emulate the "pointer" mechanic of other languages yourself, by generating distinct tags of some sort, eg increasing integers, and assigning one to each object so that you can tell them apart.
As others have pointed, in Haskell every value is immutable and there is no object.
To specify an unique node, you either need to specify it structually (the first node in the linked list that contains 1, for example) or give every node an extra tag somehow (simulating what happens in an imperative world) so that we can distinguish them.
To structurally distinguish a node from others, we basically need to know the location of
that node, e.g. a zipper that not only gives you the value at the point, but also its "neighborhoods".
And more detailed about "giving every node an extra tag":
First of all, you need to make every value an object, that requires you to generate unique tags at runtime. This is usually done by an allocator, the simplest allocator might just keep an integer, bump it when we need to create a new object:
-- | bumps counter
genId :: (Monad m, Functor m, Enum e) => StateT e m e
genId = get <* modify succ
-- | given a value, initializes a new node value
newNode :: (Monad m, Functor m, Enum e) => a -> StateT e m (a,e)
newNode x = genId >>= return . (x,)
And if you want to make an existing linked list work, we need to walk through it and give every node value a tag to make it an object:
-- | tags the llnked list with an extra value
tagged :: (Traversable f, Enum e, Monad m, Functor m)
=> f a -> StateT e m (f (a,e))
tagged = traverse newNode
And here is the full demo, it does look Maybe "a little" awkward:
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable, TupleSections #-}
import Control.Applicative
import Control.Monad.State hiding (mapM_)
import Data.Traversable
import Data.Foldable
import Prelude hiding (mapM_)
data LL a = Empty | Node a (LL a)
deriving (Show, Eq, Functor, Foldable, Traversable)
-- | bumps counter
genId :: (Monad m, Functor m, Enum e) => StateT e m e
genId = get <* modify succ
-- | given a value, initializes a new node value
newNode :: (Monad m, Functor m, Enum e) => a -> StateT e m (a,e)
newNode x = genId >>= return . (x,)
example :: LL Int
example = Node 1 (Node 2 (Node 3 (Node 1 Empty)))
-- | tags the llnked list with an extra value
tagged :: (Traversable f, Enum e, Monad m, Functor m)
=> f a -> StateT e m (f (a,e))
tagged = traverse newNode
insertAfter :: (a -> Bool) -> a -> LL a -> LL a
insertAfter cond e ll = case ll of
Empty -> Empty
Node v vs -> Node v (if cond v
then Node e vs
else insertAfter cond e vs)
demo :: StateT Int IO ()
demo = do
-- ll1 = Node (1,0) (Node (2,1) (Node (3,2) (Node (1,3) Empty)))
ll1 <- tagged example
nd <- newNode 10
let tagIs t = (== t) . snd
ll2 = insertAfter (tagIs 0) nd ll1
-- ll2 = Node (1,0) (Node (10,4) (Node (2,1) (Node (3,2) (Node (1,3) Empty))))
ll3 = insertAfter (tagIs 3) nd ll1
-- ll3 = Node (1,0) (Node (2,1) (Node (3,2) (Node (1,3) (Node (10,4) Empty))))
liftIO $ mapM_ print [ll1,ll2,ll3]
main :: IO ()
main = evalStateT demo (0 :: Int)
In this demo, tagIs is essentially doing the "object equality" thing because it is only interested in the extra tag we added before. Notice here I cheated in order to specify two nodes with their "values" being 1: one tagged 0 and the other tagged 3. Before running the program, it's impossible to tell what the actually tag would be. (Just like hard-coding a pointer value and hope it happens to work) In a more realistic setting, you would need another function to scan through the linked list and collect you a list of tags with a certain value (in this example, if you search the linked list to find all the nodes with "value" 1, you would have [0,3]) to work with.
"object equality" seems more like a concept from imperative programming languages, which assumes that there are allocators to offer "references" or "pointers" so that we can talk about "object equality". We have to simulate that allocator, I guess this is the thing that makes functional programming a little awkward to deal with it.
Kristopher Micinski remarked that you actually can do something similar with the ST monad, and you can do it with IO as well. Specifically, you can create an STRef or IORef, which is a sort of mutable box. The box can only be accessed using IO or ST actions as appropriate, which maintains the clean separation between "pure" and "impure" code. These references have identity—asking if two are equal tells you if they are actually the same box, rather than whether they have the same contents. But this is not really so pleasant, and not something you're likely to do without a good reason.
No, because it would break referential transparency. The results from calling a method with the same input multiple times should be indistinguishable, and it should be possible to replace it transparently with calling the method with that input once and then re-using the result. However, calling a method that returns some structure multiple times may produce a new copy of the structure every time -- structures with different "identity". If you could somehow tell that they have different identities, then it violates referential transparency.

Objects of multiple datatypes

I need to implement a chess game for a school assignment, and you have to make an interface that will work for other games on the same board. So, you have to implement chess pieces, but also pieces for other games.
I tried to do this:
data ChessPiece = King | Queen | Knight | Rook | Bishop | Pawn deriving (Enum, Eq, Show)
data Piece = ChessPiece | OtherGamePiece deriving (Enum, Eq, Show)
data ColoredPiece = White Piece | Black Piece
data Board = Board { boardData :: (Array Pos (Maybe ColoredPiece)) }
Then I try to load the begin f the chess game with:
beginBoard = Board (listArray (Pos 0 0, Pos 7 7) (pieces White ++ pawns White ++ space ++ pawns Black ++ pieces Black)) where
pieces :: (Piece -> ColoredPiece) -> [Maybe ColoredPiece]
pieces f = [Just (f Rook), Just (f Knight), Just (f Bishop), Just (f Queen), Just (f King), Just (f Bishop), Just (f Knight), Just (f Rook)]
pawns :: (Piece -> ColoredPiece) -> [Maybe ColoredPiece]
pawns f = (take 8 (repeat (Just (f Pawn))))
space = take 32 (repeat Nothing)
And I get the error "Couldn't match expected type Piece' with actual typeChessPiece'
In the first argument of f', namelyRook'
In the first argument of Just', namely(f Rook)'
In the expression: Just (f Rook)"
So, I've the feeling that the ChessPiece needs to be 'casted' to a (regular) Piece somehow.
(I know, I am using terms from imperative programming, but I hope that I make myself clear here, I will be happy to make my question clearer if needed).
Is the construct that I'm trying to make possible? (sort of like a class structure from OO languages, but then applied to datatypes, where one datatype is a sub-datatype from the other, and an object can be two datatypes at the same time. For example a Rook is a ChessPiece and therefore a Piece)
What am I doing wrong? Any suggestions on how to implement the structure I need?
What you are after is normally referred to as sub-typing. Most OO languages achieve sub-typing using sub-classes.
Haskell, however, is decidedly not an OO language; in fact, it does not really have any sort of sub-typing at all. Happily, you can usually achieve much the same effect using "parametric polymorphism". Now, "parametric polymorphism" is a scary-sounding term! What does it mean?
In fact, it has a very simple meaning: you can write code that works for all (concrete) types. The Maybe type, which you already know how to use, is a great example here. The type is defined as follows:
data Maybe a = Just a | Nothing
note how it is written as Maybe a rather than just Maybe; the a is a type variable. This means that, when you go to use Maybe, you can use it with any type. You can have a Maybe Int, a Maybe Bool, a Maybe [Int] and even a Maybe (Maybe (Maybe (Maybe Double))).
You can use this approach to define your board. For basic board functions, you do not care about what "piece" is actually on the board--there are some actions that make sense for any piece. On the other hand, if you do care about the type of the piece, you will be caring about what the type is exactly, because the rules for each game are going to be different.
This means that you can define your board with some type variable for pieces. Right now, your board representation looks like this:
data Board = Board {boardData :: Array Pos (Maybe ColoredPiece)}
since you want to generalize the board to any sort of piece, you need to add a type variable instead of specifying ColoredPiece:
data Board p = Board {boardData :: Array Pos p}
now you've defined a Board type for any piece type you could possibly imagine!
So, to use this board representation for chess pieces, you need to pass the type of the piece to your new Board type. This will look something like this:
type ChessBoard = Board ColoredPiece
(For reference, type just creates a synonym--now writing ChessBoard is completely equivalent to writing Board ColoredPiece.)
So now, whenever you have a chess board, use your new ChessBoard type.
Additionally, you can write some useful functions that work on any board. For example, let's imagine all you want to do is get a list of the pieces. The type of this function would then be:
listPieces :: Board p -> [p]
You can write a whole bunch of other similar functions that don't care about the actual piece by using type variables like p in your function types. This function will now work for any board you give it, including a Board ColoredPiece, otherwise know as ChessBoard.
In summary: you want to write your Board representation polymorphically. This lets you achieve the same effect as you wanted to try with sub-typing.
Tikhon's solution is the way to go. FYI though, note the difference between a type constructor and a data constructor. Right here, for example:
data ChessPiece = King | Queen | Knight | Rook | Bishop | Pawn deriving (Enum, Eq, Show)
data Piece = ChessPiece | OtherGamePiece deriving (Enum, Eq, Show)
This won't work because you're defining a type constructor called ChessPiece in the first line and a data constructor called ChessPiece in the other, and these aren't the same thing. The type constructor says something like: "a ChessPiece type can be a King, or a Queen, or a..." while the data constructor just creates generic data (that also happens to be called ChessPiece).
What you can do is redefine the first data constructor for the Piece type; some generic data called ChessPiece that carries some information about the type ChessPiece under the hood. The following typechecks:
data ChessPiece = King | Queen | Knight | Rook | Bishop | Pawn deriving (Enum, Eq, Show)
data Piece = ChessPiece ChessPiece | OtherGamePiece -- note the change
data ColoredPiece = White Piece | Black Piece
and you could alter your functions like so:
pieces :: (Piece -> ColoredPiece) -> [Maybe ColoredPiece]
pieces f = [Just (f (ChessPiece Rook)), Just (f (ChessPiece Knight)), Just (f (ChessPiece Bishop)), Just (f (ChessPiece Queen)), Just (f (ChessPiece King)), Just (f (ChessPiece Bishop)), Just (f (ChessPiece Knight)), Just (f (ChessPiece Rook))]
To make the difference between type and data constructors more obvious, here's a limited version that that uses different names for each:
data ChessRoyalty = King | Queen
data Piece = ChessPiece ChessRoyalty | OtherGamePiece
data ColoredPiece = White Piece | Black Piece

How do I use Haskell's type system to enforce correctness while still being able to pattern-match?

Let's say that I have an adt representing some kind of tree structure:
data Tree = ANode (Maybe Tree) (Maybe Tree) AValType
| BNode (Maybe Tree) (Maybe Tree) BValType
| CNode (Maybe Tree) (Maybe Tree) CValType
As far as I know there's no way of pattern matching against type constructors (or the matching functions itself wouldn't have a type?) but I'd still like to use the compile-time type system to eliminate the possibility of returning or parsing the wrong 'type' of Tree node. For example, it might be that CNode's can only be parents to ANodes. I might have
parseANode :: Parser (Maybe Tree)
as a Parsec parsing function that get's used as part of my CNode parser:
parseCNode :: Parser (Maybe Tree)
parseCNode = try (
string "<CNode>" >>
parseANode >>= \maybeanodel ->
parseANode >>= \maybeanoder ->
parseCValType >>= \cval ->
string "</CNode>"
return (Just (CNode maybeanodel maybeanoder cval))
) <|> return Nothing
According to the type system, parseANode could end up returning a Maybe CNode, a Maybe BNode, or a Maybe ANode, but I really want to make sure that it only returns a Maybe ANode. Note that this isn't a schema-value of data or runtime-check that I want to do - I'm actually just trying to check the validity of the parser that I've written for a particular tree schema. IOW, I'm not trying to check parsed data for schema-correctness, what I'm really trying to do is check my parser for schema correctness - I'd just like to make sure that I don't botch-up parseANode someday to return something other than an ANode value.
I was hoping that maybe if I matched against the value constructor in the bind variable, that the type-inferencing would figure out what I meant:
parseCNode :: Parser (Maybe Tree)
parseCNode = try (
string "<CNode>" >>
parseANode >>= \(Maybe (ANode left right avall)) ->
parseANode >>= \(Maybe (ANode left right avalr)) ->
parseCValType >>= \cval ->
string "</CNode>"
return (Just (CNode (Maybe (ANode left right avall)) (Maybe (ANode left right avalr)) cval))
) <|> return Nothing
But this has a lot of problems, not the least of which that parseANode is no longer free to return Nothing. And it doesn't work anyways - it looks like that bind variable is treated as a pattern match and the runtime complains about non-exhaustive pattern matching when parseANode either returns Nothing or Maybe BNode or something.
I could do something along these lines:
data ANode = ANode (Maybe BNode) (Maybe BNode) AValType
data BNode = BNode (Maybe CNode) (Maybe CNode) BValType
data CNode = CNode (Maybe ANode) (Maybe ANode) CValType
but that kind of sucks because it assumes that the constraint is applied to all nodes - I might not be interested in doing that - indeed it might just be CNodes that can only be parenting ANodes. So I guess I could do this:
data AnyNode = AnyANode ANode | AnyBNode BNode | AnyCNode CNode
data ANode = ANode (Maybe AnyNode) (Maybe AnyNode) AValType
data BNode = BNode (Maybe AnyNode) (Maybe AnyNode) BValType
data CNode = CNode (Maybe ANode) (Maybe ANode) CValType
but then this makes it much harder to pattern-match against *Node's - in fact it's impossible because they're just completely distinct types. I could make a typeclass wherever I wanted to pattern-match I guess
class Node t where
matchingFunc :: t -> Bool
instance Node ANode where
matchingFunc (ANode left right val) = testA val
instance Node BNode where
matchingFunc (BNode left right val) = val == refBVal
instance Node CNode where
matchingFunc (CNode left right val) = doSomethingWithACValAndReturnABool val
At any rate, this just seems kind of messy. Can anyone think of a more succinct way of doing this?
I don't understand your objection to your final solution. You can still pattern match against AnyNodes, like this:
f (AnyANode (ANode x y z)) = ...
It's a little more verbose, but I think it has the engineering properties you want.
I'd still like to use the compile-time type system to eliminate the possibility of returning or parsing the wrong 'type' of Tree node
This sounds like a use case for GADTs.
{-# LANGUAGE GADTs, EmptyDataDecls #-}
data ATag
data BTag
data CTag
data Tree t where
ANode :: Maybe (Tree t) -> Maybe (Tree t) -> AValType -> Tree ATag
BNode :: Maybe (Tree t) -> Maybe (Tree t) -> BValType -> Tree BTag
CNode :: Maybe (Tree t) -> Maybe (Tree t) -> CValType -> Tree CTag
Now you can use Tree t when you don't care about the node type, or Tree ATag when you do.
An extension of keegan's answer: encoding the correctness properties of red/black trees is sort of a canonical example. This thread has code showing both the GADT and nested data type solution: http://www.reddit.com/r/programming/comments/w1oz/how_are_gadts_useful_in_practical_programming/cw3i9

Resources