Is there a sense of 'object equality' in Haskell?

If I have a singly linked list in Haskell:
data LL a = Empty | Node a (LL a) deriving (Show, Eq)
I can easily implement methods to insert at the end and at the beginning. But what about inserting after or before a particular element? If I have a LL of Integer, can I make a distinction in Haskell between inserting 4 after a particular node containing a 1, rather than the first 1 that it sees when processing the list?
Node 1 (Node 2 (Node 3 (Node 1 Empty)))
I'm curious how an insertAfter method would look that you would be able to specify "insert 5 after this particular node containing a 1". If I wanted to insert after the first node containing 1, would I have to pass in the entire list to specify this, and for the last node, only Node 1 Empty?
I'm not sure if it's right to address this as 'object equality'- but I'm wondering if there's a way to refer to particular elements of a type with the same payload in a data structure like this.

No, there is no such thing. The only way to tell values apart is by their structure; there is no notion of identity like objects have in some languages. That is, there's no way you could tell apart these two values: (Just 5, Just 5) behaves exactly the same as let x = Just 5 in (x, x). Likewise, there is no difference between "this Node 1" and "some other Node 1": they are indistinguishable.
Usually the "solution" to this problem is to rethink your problem so that there's no longer a need to distinguish values based on identity (and usually there is in fact no need). But, as mentioned in the comments, you can emulate the "pointer" mechanic of other languages yourself, by generating distinct tags of some sort, e.g. increasing integers, and assigning one to each object so that you can tell them apart.

As others have pointed out, in Haskell every value is immutable and there are no objects.
To specify a unique node, you either need to specify it structurally (the first node in the linked list that contains 1, for example) or give every node an extra tag somehow (simulating what happens in an imperative world) so that we can distinguish them.
To structurally distinguish a node from the others, we basically need to know the location of that node, e.g. a zipper that not only gives you the value at the focus, but also its "neighborhood".
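For a list, a minimal zipper might look like this (a sketch with made-up names; the focused element plays the role of "this particular node"):
-- Elements before the focus (kept in reverse order), the focus, and the rest.
data Zipper a = Zipper [a] a [a] deriving Show

-- Insert a new element immediately after the focused node.
insertAfterFocus :: a -> Zipper a -> Zipper a
insertAfterFocus x (Zipper before focus after) = Zipper before focus (x : after)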
In more detail, about "giving every node an extra tag": first of all, you need to make every value an object, which requires generating unique tags at runtime. This is usually done by an allocator; the simplest allocator might just keep an integer and bump it whenever we need to create a new object:
-- | bumps counter
genId :: (Monad m, Functor m, Enum e) => StateT e m e
genId = get <* modify succ
-- | given a value, initializes a new node value
newNode :: (Monad m, Functor m, Enum e) => a -> StateT e m (a,e)
newNode x = genId >>= return . (x,)
And if you want to make an existing linked list work, we need to walk through it and give every node value a tag to make it an object:
-- | tags the linked list with an extra value
tagged :: (Traversable f, Enum e, Monad m, Functor m)
       => f a -> StateT e m (f (a,e))
tagged = traverse newNode
And here is the full demo; it admittedly does look "a little" awkward:
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable, TupleSections #-}
import Control.Applicative
import Control.Monad.State hiding (mapM_)
import Data.Traversable
import Data.Foldable
import Prelude hiding (mapM_)
data LL a = Empty | Node a (LL a)
deriving (Show, Eq, Functor, Foldable, Traversable)
-- | bumps counter
genId :: (Monad m, Functor m, Enum e) => StateT e m e
genId = get <* modify succ
-- | given a value, initializes a new node value
newNode :: (Monad m, Functor m, Enum e) => a -> StateT e m (a,e)
newNode x = genId >>= return . (x,)
example :: LL Int
example = Node 1 (Node 2 (Node 3 (Node 1 Empty)))
-- | tags the linked list with an extra value
tagged :: (Traversable f, Enum e, Monad m, Functor m)
       => f a -> StateT e m (f (a,e))
tagged = traverse newNode
insertAfter :: (a -> Bool) -> a -> LL a -> LL a
insertAfter cond e ll = case ll of
  Empty -> Empty
  Node v vs -> Node v (if cond v
                         then Node e vs
                         else insertAfter cond e vs)
demo :: StateT Int IO ()
demo = do
  -- ll1 = Node (1,0) (Node (2,1) (Node (3,2) (Node (1,3) Empty)))
  ll1 <- tagged example
  nd <- newNode 10
  let tagIs t = (== t) . snd
      ll2 = insertAfter (tagIs 0) nd ll1
      -- ll2 = Node (1,0) (Node (10,4) (Node (2,1) (Node (3,2) (Node (1,3) Empty))))
      ll3 = insertAfter (tagIs 3) nd ll1
      -- ll3 = Node (1,0) (Node (2,1) (Node (3,2) (Node (1,3) (Node (10,4) Empty))))
  liftIO $ mapM_ print [ll1,ll2,ll3]
main :: IO ()
main = evalStateT demo (0 :: Int)
In this demo, tagIs is essentially doing the "object equality" thing, because it is only interested in the extra tag we added before. Notice that here I cheated in order to specify the two nodes whose "values" are 1: one is tagged 0 and the other tagged 3. Before running the program, it's impossible to tell what the actual tags will be (just like hard-coding a pointer value and hoping it happens to work). In a more realistic setting, you would need another function to scan through the linked list and collect a list of tags whose nodes carry a certain value (in this example, if you search the linked list for all the nodes with "value" 1, you get [0,3]) to work with.
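Such a scanning function might look like this (a sketch; tagsFor is a made-up name, relying on the Foldable instance derived in the demo):
-- Collect the tags of all nodes whose payload equals the given value.
tagsFor :: (Foldable f, Eq a) => a -> f (a, e) -> [e]
tagsFor v = foldr (\(x, t) acc -> if x == v then t : acc else acc) []

-- e.g. tagsFor 1 ll1 evaluates to [0,3] for the ll1 above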
"object equality" seems more like a concept from imperative programming languages, which assumes that there are allocators to offer "references" or "pointers" so that we can talk about "object equality". We have to simulate that allocator, I guess this is the thing that makes functional programming a little awkward to deal with it.

Kristopher Micinski remarked that you actually can do something similar with the ST monad, and you can do it with IO as well. Specifically, you can create an STRef or IORef, which is a sort of mutable box. The box can only be accessed using IO or ST actions as appropriate, which maintains the clean separation between "pure" and "impure" code. These references have identity—asking if two are equal tells you if they are actually the same box, rather than whether they have the same contents. But this is not really so pleasant, and not something you're likely to do without a good reason.
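A small demonstration of that identity, using only Data.IORef:
import Data.IORef

main :: IO ()
main = do
  a <- newIORef (5 :: Int)
  b <- newIORef 5
  print (a == b)  -- False: two distinct boxes, despite equal contents
  print (a == a)  -- True: the very same box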

No, because it would break referential transparency. Calling a function with the same input multiple times must be indistinguishable from calling it once and re-using the result. But calling a function that returns some structure may, operationally, build a fresh copy of the structure each time: structures with different "identity". If you could somehow observe that they have different identities, referential transparency would be violated.

Related

The simplest way to generically traverse a tree in Haskell

Suppose I used language-javascript library to build AST in Haskell. The AST has nodes of different types, and each node can have fields of those different types.
And each type can have numerous constructors. (All the types instantiate Data, Eq and Show).
I would like to count each type's constructor occurrences in the tree. I could use toConstr to get the constructor, and ideally I'd make a Tree -> [Constr] function first (then counting is easy).
There are different ways to do that. Obviously pattern matching is too verbose (imagine around 3 types with 9-28 constructors).
So I'd like to use a generic traversal, and I tried to find the solution in SYB library.
There is an everywhere function, which doesn't suit my needs since I don't need a Tree -> Tree transformation.
There is gmapQ, which seems suitable in terms of its type, but as it turns out it's not recursive.
The most viable option so far is everywhereM. It still does the useless transformation, but I can use a Writer to collect toConstr results. Still, this way doesn't really feel right.
Is there an alternative that will not perform a useless (for this task) transformation and still deliver the list of constructors? (The order of their appearance in the tree doesn't matter for now)
Not sure if it's the simplest, but:
> data T = L | B T T deriving Data
> everything (++) (const [] `extQ` (\x -> [toConstr (x::T)])) (B L (B (B L L) L))
[B,L,B,B,L,L,L]
Here ++ says how to combine the results from subterms.
const [] is the base case for subterms that are not of type T. For those of type T, instead, we apply \x -> [toConstr (x::T)].
If you have multiple tree types, you'll need to extend the query using
const [] `extQ` (handleType1) `extQ` (handleType2) `extQ` ...
This is needed to identify the types for which we want to take the constructors. If there are a lot of types, probably this can be made shorter in some way.
Note that the code above is not very efficient on large trees, since using ++ this way can lead to quadratic complexity. Performance-wise, it would be better to return a Data.Map.Map Constr Int (even though we would then need some Ord instance for Constr).
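Following that suggestion, here is a sketch of the Map-based variant (countConstrs is a made-up name; it keys on the constructor's show representation to avoid needing Ord Constr, and assumes the T type defined above):
import Data.Generics (Data, everything, extQ, toConstr)
import qualified Data.Map as Map

-- Count constructor occurrences of type T anywhere in a Data value.
countConstrs :: Data a => a -> Map.Map String Int
countConstrs = everything (Map.unionWith (+))
  (const Map.empty `extQ` (\x -> Map.singleton (show (toConstr (x :: T))) 1))

-- e.g. countConstrs (B L (B (B L L) L)) == Map.fromList [("B",3),("L",4)]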
universe from the Data.Generics.Uniplate.Data module can give you a list of all the sub-trees of the same type. So using Ilya's example:
data T = L | B T T deriving (Data, Show)
tree :: T
tree = B L (B (B L L) L)
λ> import Data.Generics.Uniplate.Data
λ> universe tree
[B L (B (B L L) L),L,B (B L L) L,B L L,L,L,L]
λ> fmap toConstr $ universe tree
[B,L,B,B,L,L,L]

QuickCheck giving up investigating a recursive data structure (rose tree)

Given an arbitrary tree, I can construct a subtype relation over that tree, using Schubert numbering:
constructH :: Tree a -> Tree (Type a)
where Type nests the original label, and additionally provides the data needed to perform child/parent (or subtype) checks. With Schubert Numbering, the two Int parameters are sufficient for that.
data Type a where
  Type :: !Int -> !Int -> a -> Type a
This leads to the binary predicate
subtypeOf :: Type a -> Type a -> Bool
I now want to test with QuickCheck that this does indeed do what I want it to do. The following property, however, does not work, because QuickCheck just gives up:
subtypeSanity ∷ Tree (Type ()) → Gen Prop
subtypeSanity Node { rootLabel = t, subForest = f } =
  let subtypes = concatMap flatten f
  in (not $ null subtypes) ==> conjoin
       (forAll (elements subtypes) (\x → x `subtypeOf` t) : map subtypeSanity f)
If I leave out the recursive call to subtypeSanity, i.e. the tail of the list I'm passing to conjoin, the property runs fine, but tests just the root node of the tree! How can I descend into my data structure recursively without QuickCheck giving up on generating new test cases?
If needed, I could provide the code to construct the Schubert Hierarchy, and the Arbitrary instance for Tree (Type a), to provide a complete runnable example, but that would be quite a bit of code. I'm convinced that I'm just not "getting" QuickCheck, and using it in the wrong way here.
EDIT: unfortunately, the sized function does not seem to eliminate the problem here. It ends up with the same result (see the comment on J. Abrahamson's answer).
EDIT II: I ended up "fixing" my problem by avoiding the recursive step, and avoiding conjoin. We just make a list of all nodes in the tree, then test the single-node property (which worked fine from the beginning) on those.
allNodes ∷ Tree a → [Tree a]
allNodes n@(Node { subForest = f }) = n : concatMap allNodes f

subtypeSanity ∷ Tree (Type ()) → Gen Prop
subtypeSanity tree = forAll (elements $ allNodes tree)
  (\(Node { rootLabel = t, subForest = f }) →
     let subtypes = concatMap flatten f
     in (not $ null subtypes) ==> forAll (elements subtypes) (\x → x `subtypeOf` t))
Tweaking the Arbitrary instance for trees did not work. Here is the arbitrary instance I'm still using:
instance (Arbitrary a, Eq a) ⇒ Arbitrary (Tree (Type a)) where
  arbitrary = liftM constructH $ sized arbTree

arbTree ∷ Arbitrary a ⇒ Int → Gen (Tree a)
arbTree n = do
  m ← choose (0,n)
  if m == 0
    then Node <$> arbitrary <*> return []
    else do part ← randomPartition n m
            Node <$> arbitrary <*> mapM arbTree part

-- this is a crude way to find a sufficiently random x1,..,xm,
-- such that x1 + .. + xm = n, for any n, m with 0 < m.
randomPartition ∷ Int → Int → Gen [Int]
randomPartition n m' = do
  let m = m' - 1
  seed ← liftM ((++[n]) . sort) $ replicateM m (choose (0,n))
  return $ zipWith (-) seed (0:seed)
I consider the problem "solved for now", but if someone could explain to me why the recursive step and/or conjoin made QuickCheck give up (after passing "only" 0 tests), I would be more than grateful.
When generating Arbitrary recursive structures, QuickCheck is often a bit too eager and generates sprawling, enormous random examples. These are undesirable, as they usually don't check the properties of interest any better and can be very slow. Two solutions are
Use things like the size parameter (sized function) and frequency function to bias the generator toward small trees.
Use a small-type oriented generator like those in smallcheck. These try to exhaustively generate all "small" examples and thus help to keep the size of the tree down.
To clarify the sized and frequency method of controlling generation size, here's an example RoseTree
data Rose a = It a | Rose [Rose a]
instance Arbitrary a => Arbitrary (Rose a) where
  arbitrary = frequency
    [ (3, It <$> arbitrary)  -- The 3-to-1 ratio is chosen, ah,
                             -- arbitrarily... you'll want to tune it
    , (1, Rose <$> children)
    ]
    where children = sized $ \n -> vectorOf n arbitrary
It can be done even more simply with a different Rose formation by very carefully controlling the size of the child list
data Rose a = Rose a [Rose a]
instance Arbitrary a => Arbitrary (Rose a) where
  arbitrary = Rose <$> arbitrary <*> sized (\n -> vectorOf (tuneUp n) arbitrary)
    where tuneUp n = round $ fromIntegral n / 4.0
You could do this without referencing sized, but that gives the user of your Arbitrary instance a knob to ask for larger trees if needed.
In case it's useful for those stumbling across this issue: when QuickCheck "gives up", it's a sign that your pre-condition (using ==>) is too hard to satisfy.
QuickCheck uses a simple rejection sampling technique: pre-conditions have no effect on the generation of values. QuickCheck generates a bunch of random values like normal. After these are generated, they're sent through the pre-condition: if the result is True, the property is tested with that value; if it's False, that value is discarded. If your pre-condition rejects most of the values QuickCheck has generated, then QuickCheck will "give up" (better to give up completely, than to make statistically dubious pass/fail claims).
In particular, QuickCheck will not attempt to produce values which satisfy a given pre-condition. It's up to you to make sure that the generator you're using (arbitrary or otherwise) produces lots of values which pass your pre-condition.
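One way out, then, is to move the pre-condition into the generator itself, e.g. with suchThat (a sketch with a made-up property):
import Test.QuickCheck

-- Generate only non-empty lists, instead of discarding empty ones with ==>.
prop_nonEmptyHead :: Property
prop_nonEmptyHead =
  forAll (arbitrary `suchThat` (not . null)) $ \xs ->
    head (xs :: [Int]) `elem` xs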
Let's see how this is manifesting in your example:
subtypeSanity :: Tree (Type ()) -> Gen Prop
subtypeSanity Node { rootLabel = t, subForest = f } =
  let subtypes = concatMap flatten f
  in (not $ null subtypes) ==> conjoin
       (forAll (elements subtypes) (`subtypeOf` t) : map subtypeSanity f)
There is only one occurrence of ==>, so its precondition (not $ null subtypes) must be too hard to satisfy. This is due to the recursive call map subtypeSanity f: not only are you rejecting any Tree which has an empty subForest, you're also (due to the recursion) rejecting any Tree whose subForest contains Trees with empty subForests, and rejecting any Tree whose subForest contains Trees with subForests containing Trees with empty subForests, and so on.
According to your arbitrary instance, Trees are only nested to finite depth: eventually we will always reach an empty subForest, hence your recursive precondition will always fail, and QuickCheck will give up.

How efficient is the derived Eq instance in GHC?

Is there a short-circuit built into GHC's (and Haskell's in general) derived Eq instance that will fire when I compare the same instance of a data type?
-- will this fire?
let same = complex == complex
My plan is to read in a lazy data structure (let's say a tree), change some values, and then compare the old and the new version to create a diff that will then be written back to the file.
If there were a short-circuit built in, the comparison would stop as soon as it found that the new structure references old values. At the same time, this wouldn't read in more than necessary from the file in the first place.
I know I'm not supposed to worry about references in Haskell, but this seems like a nice way to handle lazy file changes. If there is no short-circuit built in, is there a way to implement this? Suggestions on different schemes welcome.
StableNames are specifically designed to solve problems like yours.
Note that StableNames can only be created in the IO monad. So you have two choices: either create your objects in the IO monad, or use unsafePerformIO in your (==) implementation (which is more or less fine in this situation).
But I should stress that it is possible to do this in a totally safe way (without unsafe* functions): only creation of stable names should happen in IO; after that, you may compare them in a totally pure way.
E.g.
data SNWrapper a = SNW !a !(StableName a)

snwrap :: a -> IO (SNWrapper a)
snwrap a = SNW a <$> makeStableName a

instance Eq a => Eq (SNWrapper a) where
  SNW a sna == SNW b snb = sna == snb || a == b
Notice that if stable name comparison says "no", you still need to perform full value comparison to get a definitive answer.
In my experience, this works pretty well when you have lots of sharing and for some reason are not willing to use other methods of indicating sharing.
(Speaking of other methods, you could, for example, replace the IO monad with a State Integer monad and generate unique integers in that monad as an equivalent of "stable names".)
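Such a State-based scheme might look like this (a sketch; Tagged and tag are made-up names):
import Control.Monad.State

data Tagged a = Tagged !Integer a

-- Allocate the next unique integer and attach it to the value.
tag :: a -> State Integer (Tagged a)
tag x = do
  n <- get
  put (n + 1)
  return (Tagged n x)

-- Compare tags first; fall back to full comparison when they differ.
instance Eq a => Eq (Tagged a) where
  Tagged m a == Tagged n b = m == n || a == b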
Another trick is, if you have a recursive data structure, make the recursion go through SNWrapper. E.g. instead of
data Tree a = Bin (Tree a) (Tree a) | Leaf a
type WrappedTree a = SNWrapper (Tree a)
use
data Tree a = Bin (WrappedTree a) (WrappedTree a) | Leaf a
type WrappedTree a = SNWrapper (Tree a)
This way, even if short-circuiting doesn't fire at the topmost layer, it might fire somewhere in the middle and still save you some work.
There's no short-circuiting when both arguments of (==) are the same object. The derived Eq instance will do a structural comparison, and in the case of equality, of course needs to traverse the entire structure. You can build in a possible shortcut yourself using
GHC.Prim.reallyUnsafePtrEquality# :: a -> a -> GHC.Prim.Int#
but that will in fact fire only rarely:
Prelude GHC.Base> let x = "foo"
Prelude GHC.Base> I# (reallyUnsafePtrEquality# x x)
1
Prelude GHC.Base> I# (reallyUnsafePtrEquality# True True)
1
Prelude GHC.Base> I# (reallyUnsafePtrEquality# 3 3)
0
Prelude GHC.Base> I# (reallyUnsafePtrEquality# (3 :: Int) 3)
0
And if you read a structure from file, it will certainly not find it the same object as one that was already in memory.
You can use rewrite rules to avoid the comparison of lexically identical objects
module Equal where
{-# RULES
"==/same" forall x. x == x = True
#-}
main :: IO ()
main = let x = [1 :: Int .. 10] in print (x == x)
which leads to
$ ghc -O -ddump-rule-firings Equal.hs
[1 of 1] Compiling Equal ( Equal.hs, Equal.o )
Rule fired: Class op enumFromTo
Rule fired: ==/same
Rule fired: Class op show
which shows the rule firing. (Note: it didn't fire with let x = "foo", but with user-defined types it should.)

Can someone explain the traverse function in Haskell?

I am trying and failing to grok the traverse function from Data.Traversable. I am unable to see its point. Since I come from an imperative background, can someone please explain it to me in terms of an imperative loop? Pseudo-code would be much appreciated. Thanks.
traverse is the same as fmap, except that it also allows you to run effects while you're rebuilding the data structure.
Take a look at the example from the Data.Traversable documentation.
data Tree a = Empty | Leaf a | Node (Tree a) a (Tree a)
The Functor instance of Tree would be:
instance Functor Tree where
  fmap f Empty = Empty
  fmap f (Leaf x) = Leaf (f x)
  fmap f (Node l k r) = Node (fmap f l) (f k) (fmap f r)
It rebuilds the entire tree, applying f to every value.
instance Traversable Tree where
  traverse f Empty = pure Empty
  traverse f (Leaf x) = Leaf <$> f x
  traverse f (Node l k r) = Node <$> traverse f l <*> f k <*> traverse f r
The Traversable instance is almost the same, except the constructors are applied in applicative style. This means that we can have (side-)effects while rebuilding the tree. Applicative is almost the same as Monad, except that effects cannot depend on previous results. In this example it means that you could not, say, rebuild the right branch of a node differently depending on the results of rebuilding the left branch.
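As a concrete illustration of "effects while rebuilding", here is a stateful traversal that numbers the elements left to right (a self-contained sketch that derives the instances instead of writing them by hand):
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Control.Monad.State

data Tree a = Empty | Leaf a | Node (Tree a) a (Tree a)
  deriving (Show, Functor, Foldable, Traversable)

-- Pair every element with its left-to-right position; the counter is
-- the effect that runs while the tree is being rebuilt.
label :: Tree a -> Tree (a, Int)
label t = evalState (traverse step t) 0
  where
    step x = do
      n <- get
      put (n + 1)
      return (x, n)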
For historical reasons, the Traversable class also contains a monadic version of traverse called mapM. For all intents and purposes mapM is the same as traverse - it exists as a separate method because Applicative only later became a superclass of Monad.
If you implemented this in an impure language, fmap would be the same as traverse, as there is no way to prevent side effects. You can't implement it as a simple loop, as you have to traverse your data structure recursively. Here's a small example of how I would do it in JavaScript:
Node.prototype.traverse = function (f) {
  return new Node(this.l.traverse(f), f(this.k), this.r.traverse(f));
};
Implementing it like this limits you to the effects that the language allows, though. If, for example, you want non-determinism (which the list instance of Applicative models) and your language doesn't have it built in, you're out of luck.
traverse turns things inside a Traversable into a Traversable of things "inside" an Applicative, given a function that makes Applicatives out of things.
Let's use Maybe as Applicative and list as Traversable. First we need the transformation function:
half x = if even x then Just (x `div` 2) else Nothing
So if a number is even, we get half of it (inside a Just), else we get Nothing. If everything goes "well", it looks like this:
traverse half [2,4..10]
--Just [1,2,3,4,5]
But...
traverse half [1..10]
-- Nothing
The reason is that the <*> function is used to build the result, and when one of the arguments is Nothing, we get Nothing back.
Another example:
rep x = replicate x x
This function generates a list of length x with the content x, e.g. rep 3 = [3,3,3]. What is the result of traverse rep [1..3]?
We get the partial results of [1], [2,2] and [3,3,3] using rep. Now the semantics of lists as Applicatives is "take all combinations", e.g. (+) <$> [10,20] <*> [3,4] is [13,14,23,24].
"All combinations" of [1] and [2,2] are two times [1,2]. All combinations of two times [1,2] and [3,3,3] are six times [1,2,3]. So we have:
traverse rep [1..3]
--[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
I think it's easiest to understand in terms of sequenceA, as traverse can be defined as follows.
traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
traverse f = sequenceA . fmap f
sequenceA sequences together the elements of a structure from left to right, returning a structure with the same shape containing the results.
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
sequenceA = traverse id
You can also think of sequenceA as reversing the order of two functors, e.g. going from a list of actions into an action returning a list of results.
So traverse takes some structure, and applies f to transform every element in the structure into some applicative, it then sequences up the effects of those applicatives from left to right, returning a structure with the same shape containing the results.
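For example, with Maybe as the applicative, sequenceA flips [Maybe a] into Maybe [a]:
λ> sequenceA [Just 1, Just 2, Just 3]
Just [1,2,3]
λ> sequenceA [Just 1, Nothing, Just 3]
Nothing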
You can also compare it to Foldable, which defines the related function traverse_.
traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f ()
So you can see that the key difference between Foldable and Traversable is that the latter allows you to preserve the shape of the structure, whereas the former requires you to fold the result up into some other value.
A simple example of its usage is using a list as the traversable structure, and IO as the applicative:
λ> import Data.Traversable
λ> let qs = ["name", "quest", "favorite color"]
λ> traverse (\thing -> putStrLn ("What is your " ++ thing ++ "?") *> getLine) qs
What is your name?
Sir Lancelot
What is your quest?
to seek the holy grail
What is your favorite color?
blue
["Sir Lancelot","to seek the holy grail","blue"]
While this example is rather unexciting, things get more interesting when traverse is used on other types of containers, or using other applicatives.
It's kind of like fmap, except that you can run effects inside the mapper function, which also changes the result type.
Imagine a list of integers representing user IDs in a database: [1, 2, 3]. If you want to fmap these user IDs to usernames, you can't use a traditional fmap, because inside the function you need to access the database to read the usernames (which requires an effect -- in this case, using the IO monad).
The signature of traverse is:
traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
With traverse, you can do effects, therefore, your code for mapping user IDs to usernames looks like:
mapUserIDsToUsernames :: (Int -> IO String) -> [Int] -> IO [String]
mapUserIDsToUsernames fn ids = traverse fn ids
There's also a function called mapM:
mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
Any use of mapM can be replaced with traverse, but not the other way around. mapM only works for monads, whereas traverse is more generic.
If you just want to achieve an effect and not return any useful value, there are traverse_ and mapM_ versions of these functions, both of which ignore the return value from the function and are slightly faster.
traverse is the loop. Its implementation depends on the data structure to be traversed: that might be a list, a tree, Maybe, a Seq(uence), or anything that has a generic way of being traversed, the way an imperative for-loop or recursive function would do it. An array would use a for-loop, a linked list a while-loop, a tree either recursion or a stack combined with a while-loop; but in functional languages you do not need these cumbersome loop constructs: you combine the inner part of the loop (in the shape of a function) with the data structure in a more direct and less verbose manner.
With the Traversable type class, you can write your algorithms more independently of any particular data structure, and more versatilely. But my experience says that Traversable is usually only used to simply glue algorithms to existing data structures. It is also quite nice not to need to write similar functions for different data types.

What are arrows, and how can I use them?

I tried to learn the meaning of arrows, but I didn't understand them.
I used the Wikibooks tutorial. I think Wikibook's problem is mainly that it seems to be written for somebody who already understands the topic.
Can somebody explain what arrows are and how I can use them?
I don't know a tutorial, but I think it's easiest to understand arrows if you look at some concrete examples. The biggest problem I had learning how to use arrows was that none of the tutorials or examples actually show how to use arrows, just how to compose them. So, with that in mind, here's my mini-tutorial. I'll examine two different arrows: functions and a user-defined arrow type MyArr.
-- type representing a computation
data MyArr b c = MyArr (b -> (c,MyArr b c))
1) An Arrow is a computation from input of a specified type to output of a specified type. The Arrow type class takes a single type argument: a type constructor of kind * -> * -> *, whose two remaining parameters are the input and output types. Looking at the instance heads for arrow instances, we find:
instance Arrow (->) where
instance Arrow MyArr where
The Arrow (either (->) or MyArr) is an abstraction of a computation.
For a function b -> c, b is the input and c is the output.
For a MyArr b c, b is the input and c is the output.
2) To actually run an arrow computation, you use a function specific to your arrow type. For functions you simply apply the function to an argument. For other arrows, there needs to be a separate function (just like runIdentity, runState, etc. for monads).
-- run a function arrow
runF :: (b -> c) -> b -> c
runF = id
-- run a MyArr arrow, discarding the remaining computation
runMyArr :: MyArr b c -> b -> c
runMyArr (MyArr step) = fst . step
3) Arrows are frequently used to process a list of inputs. For functions these can be done in parallel, but for some arrows output at any given step depends upon previous inputs (e.g. keeping a running total of inputs).
-- run a function arrow over multiple inputs
runFList :: (b -> c) -> [b] -> [c]
runFList f = map f
-- run a MyArr over multiple inputs.
-- Each step of the computation gives the next step to use
runMyArrList :: MyArr b c -> [b] -> [c]
runMyArrList _ [] = []
runMyArrList (MyArr step) (b:bs) = let (this, step') = step b
                                   in this : runMyArrList step' bs
This is one reason Arrows are useful. They provide a computation model that can implicitly make use of state without ever exposing that state to the programmer. The programmer can use arrowized computations and combine them to create sophisticated systems.
Here's a MyArr that keeps count of the number of inputs it has received:
-- count the number of inputs received:
count :: MyArr b Int
count = count' 0
  where
    count' n = MyArr (\_ -> (n+1, count' (n+1)))
Now the function runMyArrList count will take a list of length n as input and return a list of Ints from 1 to n.
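For example (count ignores its inputs, so any element type will do):
λ> runMyArrList count "abcd"
[1,2,3,4]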
Note that we still haven't used any "arrow" functions, that is either Arrow class methods or functions written in terms of them.
4) Most of the code above is specific to each Arrow instance[1]. Everything in Control.Arrow (and Control.Category) is about composing arrows to make new arrows. If we pretend that Category is part of Arrow instead of a separate class:
-- combine two arrows in sequence
(>>>) :: Arrow a => a b c -> a c d -> a b d
-- the function arrow instance
-- (>>>) :: (b -> c) -> (c -> d) -> (b -> d)
-- this is just flip (.)
-- MyArr instance
-- (>>>) :: MyArr b c -> MyArr c d -> MyArr b d
The >>> function takes two arrows and uses the output of the first as input to the second.
Here's another operator, commonly called "fanout":
-- &&& applies two arrows to a single input in parallel
(&&&) :: Arrow a => a b c -> a b c' -> a b (c,c')
-- function instance type
-- (&&&) :: (b -> c) -> (b -> c') -> (b -> (c,c'))
-- MyArr instance type
-- (&&&) :: MyArr b c -> MyArr b c' -> MyArr b (c,c')
-- first and second omitted for brevity, see the accepted answer from KennyTM's link
-- for further details.
Since Control.Arrow provides a means to combine computations, here's one example:
-- function that, given an input n, returns "n+1" and "n*2"
calc1 :: Int -> (Int,Int)
calc1 = (+1) &&& (*2)
I've frequently found functions like calc1 useful in complicated folds, or in functions that operate on pointers, for example.
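For example, in GHCi:
λ> calc1 7
(8,14)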
The Monad type class provides us with a means to combine monadic computations into a single new monadic computation using the (>>=) function. Similarly, the Arrow class provides us with means to combine arrowized computations into a single new arrowized computation using a few primitive functions (first, arr, and (***), with (>>>) and id from Control.Category). Also similar to Monads, the question of "What does an arrow do?" can't be answered generally. It depends on the arrow.
Unfortunately I don't know of many examples of arrow instances in the wild. Functions and FRP seem to be the most common applications. HXT is the only other significant usage that comes to mind.
[1] Except count. It's possible to write a count function that does the same thing for any instance of ArrowLoop.
From a glance at your history on Stack Overflow, I'm going to assume that you're comfortable with some of the other standard type classes, particularly Functor and Monoid, and start with a brief analogy from those.
The single operation on Functor is fmap, which serves as a generalized version of map on lists. This is pretty much the entire purpose of the type class; it defines "things that you can map over". So, in a sense Functor represents a generalization of that particular aspect of lists.
The operations for Monoid are generalized versions of the empty list and (++), and it defines "things that can be combined associatively, with a particular thing that's an identity value". Lists are pretty much the simplest thing that fits that description, and Monoid represents a generalization of that aspect of lists.
In the same way as the above two, the operations on the Category type class are generalized versions of id and (.), and it defines "things connecting two types in a particular direction, that can be connected head-to-tail". So, this represents a generalization of that aspect of functions. Notably not included in the generalization are currying or function application.
The Arrow type class builds off of Category, but the underlying concept is the same: Arrows are things that compose like functions and have an "identity arrow" defined for any type. The additional operations defined on the Arrow class itself just define a way to lift an arbitrary function to an Arrow and a way to combine two arrows "in parallel" as a single arrow between tuples.
So, the first thing to keep in mind here is that expressions building Arrows are, essentially, elaborate function composition. The combinators like (***) and (>>>) are for writing "pointfree" style, while proc notation gives a way to assign temporary names to inputs and outputs while wiring things up.
A useful thing to note here is that, even though Arrows are sometimes described as being "the next step" from Monads, there's really not a terribly meaningful relationship there. For any Monad you can work with Kleisli arrows, which are just functions with a type like a -> m b. The (<=<) operator in Control.Monad is arrow composition for these. On the other hand, Arrows don't get you a Monad unless you also include the ArrowApply class. So there's no direct connection as such.
The key difference here is that whereas Monads can be used to sequence computations and do things step-by-step, Arrows are in some sense "timeless" just like regular functions. They can include extra machinery and functionality that gets spliced in by (.), but it's more like building a pipeline, not accumulating actions.
The other related type classes add additional functionality to an arrow, such as being able to combine arrows with Either as well as (,).
My favorite example of an Arrow is stateful stream transducers, which look something like this:
data StreamTrans a b = StreamTrans (a -> (b, StreamTrans a b))
A StreamTrans arrow converts an input value to an output and an "updated" version of itself; consider the ways that this differs from a stateful Monad.
Writing instances for Arrow and its related type classes for the above type may be a good exercise for understanding how they work!
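If you want to check your work, here is one possible set of instances, sketched under the assumption that the StreamTrans definition above is used verbatim:
import Prelude hiding (id, (.))
import Control.Category
import Control.Arrow

instance Category StreamTrans where
  -- the identity transducer passes values through unchanged, forever
  id = StreamTrans (\a -> (a, id))
  -- feed the input through f, then g, and compose the updated selves
  StreamTrans g . StreamTrans f = StreamTrans $ \a ->
    let (b, f') = f a
        (c, g') = g b
    in (c, g' . f')

instance Arrow StreamTrans where
  -- lift a pure function; its "updated self" is itself
  arr f = StreamTrans (\a -> (f a, arr f))
  -- run the transducer on the first component, pass the second through
  first (StreamTrans f) = StreamTrans $ \(a, c) ->
    let (b, f') = f a
    in ((b, c), first f')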
I also wrote a similar answer previously that you may find helpful.
I'd like to add that arrows in Haskell are far simpler than they might appear
based on the literature. They are simply abstractions of functions.
To see how this is practically useful, consider that you have a bunch of
functions you want to compose, where some of them are pure and some are
monadic. For example, f :: a -> b, g :: b -> m1 c, and h :: c -> m2 d.
Knowing each of the types involved, I could build a composition by hand, but
the output type of the composition would have to reflect the intermediate
monad types (in the above case, m1 (m2 d)). What if I just wanted to treat
the functions as if they were just a -> b, b -> c, and c -> d? That is,
I want to abstract away the presence of monads and reason only about the
underlying types. I can use arrows to do exactly this.
Here is an arrow which abstracts away the presence of IO for functions in the
IO monad, such that I can compose them with pure functions without the
composing code needing to know that IO is involved. We start by defining an
IOArrow to wrap IO functions:
data IOArrow a b = IOArrow { runIOArrow :: a -> IO b }

instance Category IOArrow where
  id = IOArrow return
  IOArrow f . IOArrow g = IOArrow $ f <=< g

instance Arrow IOArrow where
  arr f = IOArrow $ return . f
  first (IOArrow f) = IOArrow $ \(a, c) -> do
    x <- f a
    return (x, c)
Then I make some simple functions that I want to compose:
foo :: Int -> String
foo = show
bar :: String -> IO Int
bar = return . read
And use them:
main :: IO ()
main = do
  let f = arr (++ "!") . arr foo . IOArrow bar . arr id
  result <- runIOArrow f "123"
  putStrLn result
Here I am calling IOArrow and runIOArrow, but if I were passing these arrows
around in a library of polymorphic functions, they would only need to accept
arguments of type "Arrow a => a b c". None of the library code would need to
be made aware that a monad was involved. Only the creator and end user of the
arrow needs to know.
Generalizing IOArrow to work for functions in any Monad is called the "Kleisli
arrow", and there is already a builtin arrow for doing just that:
main :: IO ()
main = do
  let g = arr (++ "!") . arr foo . Kleisli bar . arr id
  result <- runKleisli g "123"
  putStrLn result
You could of course also use arrow composition operators, and proc syntax, to
make it a little clearer that arrows are involved:
arrowUser :: Arrow a => a String String -> a String String
arrowUser f = proc x -> do
  y <- f -< x
  returnA -< y
main :: IO ()
main = do
  let h = arr (++ "!")
        <<< arr foo
        <<< Kleisli bar
        <<< arr id
  result <- runKleisli (arrowUser h) "123"
  putStrLn result
Here it should be clear that although main knows the IO monad is involved,
arrowUser does not. There would be no way of "hiding" IO from arrowUser
without arrows -- not without resorting to unsafePerformIO to turn the
intermediate monadic value back into a pure one (and thus losing that context
forever). For example:
arrowUser' :: (String -> String) -> String -> String
arrowUser' f x = f x
main' :: IO ()
main' = do
  let h = (++ "!") . foo . unsafePerformIO . bar . id
      result = arrowUser' h "123"
  putStrLn result
Try writing that without unsafePerformIO, and without arrowUser' having to
deal with any Monad type arguments.
There are John Hughes's Lecture Notes from an AFP (Advanced Functional Programming) workshop. Note they were written before the Arrow classes were changed in the Base libraries:
http://www.cse.chalmers.se/~rjmh/afp-arrows.pdf
When I began exploring these compositions (essentially Kleisli composition in a monad), my approach was to break out of the functional syntax and composition they are most commonly associated with, and to begin by understanding their principles using a more declarative approach. With this in mind, I find the following breakdown more intuitive:
function (x) {
  var func1result = func1(x);
  if (func1result == null) {
    return null;
  } else {
    var func2result = func2(func1result);
    if (func2result == null) {
      return null;
    } else {
      return func3(func2result);
    }
  }
}
So, essentially: for some value x, first call func1, which we assume may return null. If its result is not null, pass it into func2, which may also return null, and only if that result is not null, pass it into func3. This is more deterministic, and the explicit control flow allows you to construct more sophisticated exception handling.
Here we can utilise Kleisli composition: (func3 <=< func2 <=< func1) x.
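The same flow in Haskell, with Maybe standing in for the null checks (func1, func2, func3 are placeholders):
import Control.Monad ((<=<))

func1, func2, func3 :: Int -> Maybe Int
func1 x = if x > 0   then Just (x + 1) else Nothing
func2 x = if even x  then Just (x * 2) else Nothing
func3 x = if x < 100 then Just x       else Nothing

-- Apply func1 first, then func2, then func3; a Nothing anywhere
-- short-circuits the whole pipeline.
pipeline :: Int -> Maybe Int
pipeline = func3 <=< func2 <=< func1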
