How to make a sequence-polymorphic zip function that works for both finite and infinite sequences? - haskell

I'm trying to write some sequence-related functions for my own edification and interest. So far, I have the following typeclasses:
class (Traversable s, Monad s) => Sequential s where
infixr 5 ~:
(~:) :: a -> s a -> s a
headEx :: s a -> a
tailEx :: s a -> s a
null :: s a -> Bool
class (Sequential s) => Finite s where
emptyS :: s a
The first of these (Sequential) is meant to represent all sequency things (including both finite and infinite sequences), while the second is for only finite sequency things.
I'm having some polymorphism-related issues in defining some typical list functions based on these. An example of these would be zip, which to my initial guess, would have a type signature of:
zip :: (Sequential s1, Sequential s2, Sequential s3) => s1 a -> s2 b -> s3 (a, b)
However, I run up against a problem - if I want to zip together an infinite sequence and a finite sequence, I obviously want the result to be a finite sequence as well. However, my attempt at an implementation fails, as Sequence has no concept of emptiness (and indeed cannot have such a concept, because infinite sequences can't be empty). Additionally, I cannot see any way of checking whether one of the Sequentials being passed in is actually a Finite, which would lead to a Finite result. As a result, I am not sure how I could write an implementation of this in its most general form.
I'm not sure whether my approach is at fault, or there's something in the language I am missing, but I would really like to know how I can implement a zip generic across finite and infinite sequences in any combination.

Related

How do we overcome the compile time and runtime gap when programming in a Dependently Typed Language?

I'm told that in dependent type system, "types" and "values" is mixed, and we can treat both of them as "terms" instead.
But there is something I can't understand: in a strongly typed programming language without Dependent Type (like Haskell), Types is decided (infered or checked) at compile time, but values is decided (computed or inputed) at runtime.
I think there must be a gap between these two stages. Just think that if a value is interactively read from STDIN, how can we reference this value in a type which must be decided AOT?
e.g. There is a natural number n and a list of natural number xs (which contains n elements) which I need to read from STDIN, how can I put them into a data structure Vect n Nat?
Suppose we input n :: Int at runtime from STDIN. We then read n strings, and store them into vn :: Vect n String (pretend for the moment this can be done).
Similarly, we can read m :: Int and vm :: Vect m String. Finally, we concatenate the two vectors: vn ++ vm (simplifying a bit here). This can be type checked, and will have type Vect (n+m) String.
Now it is true that the type checker runs at compile time, before the values n,m are known, and also before vn,vm are known. But this does not matter: we can still reason symbolically on the unknowns n,m and argue that vn ++ vm has that type, involving n+m, even if we do not yet know what n+m actually is.
It is not that different from doing math, where we manipulate symbolic expressions involving unknown variables according to some rules, even if we do not know the values of the variables. We don't need to know what number is n to see that n+n = 2*n.
Similarly, the type checker can type check
-- pseudocode
readNStrings :: (n :: Int) -> IO (Vect n String)
readNStrings O = return Vect.empty
readNStrings (S p) = do
s <- getLine
vp <- readNStrings p
return (Vect.cons s vp)
(Well, actually some more help from the programmer could be needed to typecheck this, since it involves dependent matching and recursion. But I'll neglect this.)
Importantly, the type checker can check that without knowing what n is.
Note that the same issue actually already arises with polymorphic functions.
fst :: forall a b. (a, b) -> a
fst (x, y) = x
test1 = fst # Int # Float (2, 3.5)
test2 = fst # String # Bool ("hi!", True)
...
One might wonder "how can the typechecker check fst without knowing what types a and b will be at runtime?". Again, by reasoning symbolically.
With type arguments this is arguably more obvious since we usually run the programs after type erasure, unlike value parameters like our n :: Int above, which can not be erased. Still, there is some similarity between universally quantifying over types or over Int.
It seems to me that there are two questions here:
Given that some values are unknown during compile-time (e.g., values read from STDIN), how can we make use of them in types? (Note that chi has already given an excellent answer to this.)
Some operations (e.g., getLine) seem to make absolutely no sense at compile-time; how could we possibly talk about them in types?
The answer to (1), as chi has said, is symbolic or abstract reasoning. You can read in a number n, and then have a procedure that builds a Vect n Nat by reading from the command line n times, making use of arithmetic properties such as the fact that 1+(n-1) = n for nonzero natural numbers.
The answer to (2) is a bit more subtle. Naively, you might want to say "this function returns a vector of length n, where n is read from the command line". There are two types you might try to give this (apologies if I'm getting Haskell notation wrong)
unsafePerformIO (do n <- getLine; return (IO (Vect (read n :: Int) Nat)))
or (in pseudo-Coq notation, since I'm not sure what Haskell's notation for existential types is)
IO (exists n, Vect n Nat)
These two types can actually both be made sense of, and say different things. The first type, to me, says "at compile time, read n from the command line, and return a function which, at runtime, gives a vector of length n by performing IO". The second type says "at runtime, perform IO to get a natural number n and a vector of length n".
The way I like looking at this is that all side effects (other than, perhaps, non-termination) are monad transformers, and there is only one monad: the "real-world" monad. Monad transformers work just as well at the type level as at the term level; the one thing which is special is run :: M a -> a which executes the monad (or stack of monad transformers) in the "real world". There are two points in time at which you can invoke run: one is at compile time, where you invoke any instance of run which shows up at the type level. Another is at runtime, where you invoke any instance of run which shows up at the value level. Note that run only makes sense if you specify an evaluation order; if your language does not specify whether it is call-by-value or call-by-name (or call-by-push-value or call-by-need or call-by-something-else), you can get incoherencies when you try to compute a type.

How to make a custom Attoparsec parser combinator that returns a Vector instead of a list?

{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Control.Applicative(many)
import Data.Word
parseManyNumbers :: Parser [Int] -- I'd like many to return a Vector instead
parseManyNumbers = many (decimal <* skipSpace)
main :: IO ()
main = print $ parseOnly parseManyNumbers "131 45 68 214"
The above is just an example, but I need to parse a large amount of primitive values in Haskell and need to use arrays instead of lists. This is something that possible in the F#'s Fparsec, so I've went as far as looking at Attoparsec's source, but I can't figure out a way to do it. In fact, I can't figure out where many from Control.Applicative is defined in the base Haskell library. I thought it would be there as that is where documentation on Hackage points to, but no such luck.
Also, I am having trouble deciding what data structure to use here as I can't find something as convenient as a resizable array in Haskell, but I would rather not use inefficient tree based structures.
An option to me would be to skip Attoparsec and implement an entire parser inside the ST monad, but I would rather avoid it except as a very last resort.
There is a growable vector implementation in Haskell, which is based on the great AMT algorithm: "persistent-vector". Unfortunately, the library isn't that much known in the community so far. However to give you a clue about the performance of the algorithm, I'll say that it is the algorithm that drives the standard vector implementations in Scala and Clojure.
I suggest you implement your parser around that data-structure under the influence of the list-specialized implementations. Here the functions are, btw:
-- | One or more.
some :: f a -> f [a]
some v = some_v
where
many_v = some_v <|> pure []
some_v = (fmap (:) v) <*> many_v
-- | Zero or more.
many :: f a -> f [a]
many v = many_v
where
many_v = some_v <|> pure []
some_v = (fmap (:) v) <*> many_v
Some ideas:
Data Structures
I think the most practical data structure to use for the list of Ints is something like [Vector Int]. If each component Vector is sufficiently long (i.e. has length 1k) you'll get good space economy. You'll have
to write your own "list operations" to traverse it, but you'll avoid re-copying data that you would have to perform to return the data in a single Vector Int.
Also consider using a Dequeue instead of a list.
Stateful Parsing
Unlike Parsec, Attoparsec does not provide for user state. However, you
might be able to make use of the runScanner function (link):
runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)
(It also returns the parsed ByteString which in your case may be problematic since it will be very large. Perhaps you can write an alternate version which doesn't do this.)
Using unsafeFreeze and unsafeThaw you can incrementally fill in a Vector. Your s data structure might look
something like:
data MyState = MyState
{ inNumber :: Bool -- True if seen a digit
, val :: Int -- value of int being parsed
, vecs :: [ Vector Int ] -- past parsed vectors
, v :: Vector Int -- current vector we are filling
, vsize :: Int -- number of items filled in current vector
}
Maybe instead of a [Vector Int] you use a Dequeue (Vector Int).
I imagine, however, that this approach will be slow since your parsing function will get called for every single character.
Represent the list as a single token
Parsec can be used to parse a stream of tokens, so how about writing
your own tokenizer and letting Parsec create the AST.
The key idea is to represent these large sequences of Ints as a single token. This gives you a lot more latitude in how you parse them.
Defer Conversion
Instead of converting the numbers to Ints at parse time, just have parseManyNumbers return a ByteString and defer the conversion until
you actually need the values. This much enable you to avoid reifying
the values as an actual list.
Vectors are arrays, under the hood. The tricky thing about arrays is that they are fixed-length. You pre-allocate an array of a certain length, and the only way of extending it is to copy the elements into a larger array.
This makes linked lists simply better at representing variable-length sequences. (It's also why list implementations in imperative languages amortise the cost of copying by allocating arrays with extra space and copying only when the space runs out.) If you don't know in advance how many elements there are going to be, your best bet is to use a list (and perhaps copy the list into a Vector afterwards using fromList, if you need to). That's why many returns a list: it runs the parser as many times as it can with no prior knowledge of how many that'll be.
On the other hand, if you happen to know how many numbers you're parsing, then a Vector could be more efficient. Perhaps you know a priori that there are always n numbers, or perhaps the protocol specifies before the start of the sequence how many numbers there'll be. Then you can use replicateM to allocate and populate the vector efficiently.

Haskell: Computation "in a monad" -- meaning?

In reading about monads, I keep seeing phrases like "computations in the Xyz monad". What does it mean for a computation to be "in" a certain monad?
I think I have a fair grasp on what monads are about: allowing computations to produce outputs that are usually of some expected type, but can alternatively or additionally convey some other information, such as error status, logging info, state and so on, and allow such computations to be chained.
But I don't get how a computation would be said to be "in" a monad. Does this just refer to a function that produces a monadic result?
Examples: (search "computation in")
http://www.haskell.org/tutorial/monads.html
http://www.haskell.org/haskellwiki/All_About_Monads
http://ertes.de/articles/monads.html
Generally, a "computation in a monad" means not just a function returning a monadic result, but such a function used inside a do block, or as part of the second argument to (>>=), or anything else equivalent to those. The distinction is relevant to something you said in a comment:
"Computation" occurs in func f, after val extracted from input monad, and before result is wrapped as monad. I don't see how the computation per se is "in" the monad; it seems conspicuously "out" of the monad.
This isn't a bad way to think about it--in fact, do notation encourages it because it's a convenient way to look at things--but it does result in a slightly misleading intuition. Nowhere is anything being "extracted" from a monad. To see why, forget about (>>=)--it's a composite operation that exists to support do notation. The more fundamental definition of a monad are three orthogonal functions:
fmap :: (a -> b) -> (m a -> m b)
return :: a -> m a
join :: m (m a) -> m a
...where m is a monad.
Now think about how to implement (>>=) with these: starting with arguments of type m a and a -> m b, your only option is using fmap to get something of type m (m b), after which you can use join to flatten the nested "layers" to get just m b.
In other words, nothing is being taken "out" of the monad--instead, think of the computation as going deeper into the monad, with successive steps being collapsed into a single layer of the monad.
Note that the monad laws are also much simpler from this perspective--essentially, they say that when join is applied doesn't matter as long as the nesting order is preserved (a form of associativity) and that the monadic layer introduced by return does nothing (an identity value for join).
Does this just refer to a function that produces a monadic result?
Yes, in short.
In long, it's because Monad allows you to inject values into it (via return) but once inside the Monad they're stuck. You have to use some function like evalWriter or runCont which is strictly more specific than Monad to get values back "out".
More than that, Monad (really, its partner, Applicative) is the essence of having a "container" and allowing computations to happen inside of it. That's what (>>=) gives you, the ability to do interesting computations "inside" the Monad.
So functions like Monad m => m a -> (a -> m b) -> m b let you compute with and around and inside a Monad. Functions like Monad m => a -> m a let you inject into the Monad. Functions like m a -> a would let you "escape" the Monad except they don't exist in general (only in specific). So, for conversation's sake we like to talk about functions that have result types like Monad m => m a as being "inside the monad".
Usually monad stuff is easier to grasp when starting with "collection-like" monads as example. Imagine you calculate the distance of two points:
data Point = Point Double Double
distance :: Point -> Point -> Double
distance p1 p2 = undefined
Now you may have a certain context. E.g. one of the points may be "illegal" because it is out of some bounds (e.g. on the screen). So you wrap your existing computation in the Maybe monad:
distance :: Maybe Point -> Maybe Point -> Maybe Double
distance p1 p2 = undefined
You have exactly the same computation, but with the additional feature that there may be "no result" (encoded as Nothing).
Or you have a have a two groups of "possible" points, and need their mutual distances (e.g. to use later the shortest connection). Then the list monad is your "context":
distance :: [Point] -> [Point] -> [Double]
distance p1 p2 = undefined
Or the points are entered by a user, which makes the calculation "nondeterministic" (in the sense that you depend on things in the outside world, which may change), then the IO monad is your friend:
distance :: IO Point -> IO Point -> IO Double
distance p1 p2 = undefined
The computation remains always the same, but happens to take place in a certain "context", which adds some useful aspects (failure, multi-value, nondeterminism). You can even combine these contexts (monad transformers).
You may write a definition that unifies the definitions above, and works for any monad:
distance :: Monad m => m Point -> m Point -> m Double
distance p1 p2 = do
Point x1 y1 <- p1
Point x2 y2 <- p2
return $ sqrt ((x1-x2)^2 + (y1-y2)^2)
That proves that our computation is really independent from the actual monad, which leads to formulations as "x is computed in(-side) the y monad".
Looking at the links you provided, it seems that a common usage of "computation in" is with regards to a single monadic value. Excerpts:
Gentle introduction - here we run a computation in the SM monad, but the computation is the monadic value:
-- run a computation in the SM monad
runSM :: S -> SM a -> (a,S)
All about monads - previous computation refers to a monadic value in the sequence:
The >> function is a convenience operator that is used to bind a monadic computation that does not require input from the previous computation in the sequence
Understanding monads - here the first computation could refer to e.g. getLine, a monadic value :
(binding) gives an intrinsic idea of using the result of a computation in another computation, without requiring a notion of running computations.
So as an analogy, if I say i = 4 + 2, then i is the value 6, but it is equally a computation, namely the computation 4 + 2. It seems the linked pages uses computation in this sense - computation as a monadic value - at least some of the time, in which case it makes sense to use the expression "a computation in" the given monad.
Consider the IO monad. A value of type IO a is a description of a large (often infinite) number of behaviours where a behaviour is a sequence of IO events (reads, writes, etc). Such a value is called a "computation"; in this case it is a computation in the IO monad.

Why does State need a value?

Just learning about State monad from this excellent tutorial. However, when I tried to explain it to a non-programmer they had a question that stumped me.
If the purpose of the State is to simulate mutable memory, why is the function that state monad stores is of the type:
s -> (a, s)
and not simply:
s -> s
In other words, what is the need for the "intermediate" value? For example, couldn't we, in the cases where we need it, simulate it by simply defining a state as a tuple of (state, value)?
I'm sure I confused something, any help is appreciated.
To draw a parallel with an imperative language like C, s -> s corresponds to a function with the return type void, which is invoked purely for side effects (such as mutating the memory). It is isomorphic to State s ().
And indeed, it is possible to write C functions which communicate only through global variables. But, as in C, it is often convenient to return values from functions. That's what a is for.
Of course it's possible that for your particular problem s -> s is a better choice. Although it's not a Monad, it is a Monoid (when wrapped in Endo). So you can construct such functions using <> and mempty, which correspond to >>= and return of Monad.
To expand a bit on Nick's answer:
s is the state. If all your functions were s -> s (state to state), your functions would not be able to return any values. You could define your state as (the actual state, value returned), but that conflates the state with the value the state-ful functions are computing. And it's also the common case that you'll want functions to actually compute and return values...
s' -> s' is equivalent to (a, s) -> (a, s). Here it is obvious that your State will need an initial a to start things off in addition to s.
On the other hand s -> (a, s) only needs the seed s to begin things and does not require an a value at all.
Thus the type of s -> (a, s) tells you that State is less complex than if it were (a, s) -> (a, s). Types in Haskell convey LOTS of information.
If the purpose of the State is to simulate mutable memory, why is the function that state monad stores is of the type:
s -> (a, s)
and not simply:
s -> s
The purpose of the State monad is not to simulate mutable memory, but rather to model computations that both produce a value and have a side effect. Simply, given some initial state of type s, your computation will produce some value of type a, as well as an updated state.
Maybe your computation does not produce a value... Then, easy: the value type a is simply (). Perhaps on the other hand your computation does not have a side effect. Again, easy: you might think of your state transition function (the s -> s argument to modify) as just being id. But often you're dealing with both at the same time.
You can actually use get and put as relatively simple examples:
get :: State s s -- s -> (s, s)
put :: s -> State () -- s -> (s -> ((), s))
get is a computation which, given the current state (the first s), will return it both as a value -- that is, the result of the computation -- and as the "new" (unmodified) state.
put is a computation which, given a new state (the first s) and a current state (the second s), will simply ignore the current state. It will produce () as the computed value (because, of course, it hasn't computed any value!) and hang onto the new state provided.
Presumably you want to use your stateful computations inside of do notation?
You should ask yourself what the Monad instance would look like for a stateful computation defined by
newtype State s = { runState :: s -> s }
The problem to be solved is that you have an input and a series of functions, and you want to apply the functions to the input in order.
If the functions are purely state-changing functions, s -> s on an input of type s, then you don't need State to use them. Haskell is very good at chaining together functions like these, e.g. with the standard composition operator ., or something like foldr (.) id, or foldr id.
However, if the functions both mutate a state and report some result of doing so, so that you could give them the type s -> (s,a), then gluing them all together is a bit of a nuisance. You have to unpack the result tuple and pass the new state value to the next function, use the reported value somewhere else, and then unpack that result, and so on. It's easy to pass the wrong state to an input function because you have to name each result and input explicitly to do the unpacking. You end up with something like this:
let
(res1, s1) = fun1 s0
(res2, s2) = fun2 s1
(res3, s3) = fun3 res1 res2 s1
...
in resN
There, I accidentally passed s1 instead of s2, maybe because I added the second line in later and didn't realise the third line needed changing. When composing the s -> s functions, this problem can't possibly arise because there are no names to get right:
let
resN = fun1 . fun2 . fun3 . -- etc.
So we invented State to do the same trick. State is essentially just a way of gluing functions like s -> (s,a) together in such a way that the right state always gets passed to the right function.
So it's not so much that people went "we want to use State, let's use s -> (s,a)" but rather "we're writing functions like s -> (s,a), let's invent State to make that easy". With functions s -> s, it's already easy and we don't have to invent anything.
As an example of how s -> (s,a) arises naturally, consider parsing: a parser will be given some input, consume some of the input and return a value. In Haskell, this is naturally modelled as taking an input list, and returning a pair of the value and the remaining input - i.e. [Input] -> ([Input], a), or State [Input].
a is the value returned, and the s is the final state.
http://www.haskell.org/haskellwiki/State_Monad#Implementation

How to pick a random list element in a pure function?

I want to make a Haskell function that can pick out a random number from a given list.
My type signature is:
randomPick :: [a] -> a
What should I do?
Part of the definition of a "pure" function in Haskell is that it is referentially transparent, that is, interchangeable with the result of evaluating it. This implies that the result of evaluating it must be the same every time. Ergo, the function you want isn't possible, I'm afraid. To generate random numbers in Haskell, a function needs to do one of two things:
Take and return a pseudorandom number generator, e.g.:
randomPick :: RNG -> [a] -> (a, RNG)
Or use IO to access randomness from "the outside world":
randomPick :: [a] -> IO a
Both styles are provided by the module System.Random. Also, in the former case, passing the PRNG around can be abstracted out using the State monad, or perhaps a special-purpose Random monad.
What you've described can't be done in pure functional code.
Pure functional code implies that you will get the same output for the same input, every time. Since a randomizing function, by definition, gives you different output for the same input, this is impossible in pure functional code.
Unless you pass around an extra value as explained in #camccann's answer. Technically, it doesn't even have to be as advanced as an RNG, depending on your needs. You could pass around an integer, and multiply it by 10 and subtract 3 (or whatever), then take the modulo of that to find your index. Then your function remains pure, but you do directly control the randomness.
Another option is to use RandomRIO to generate a number in a range, which you can then use to select an index from the list. This will require you to enter the IO monad.
If you want to use random number generators in purely functional code but not have to explicitly pass around generator state then you can use state monad (or monad transformer) and hide the plumbing. State monads are still referentially transparent and it's safe & normal to escape a state monad. You could also use the ST monad if you want true local mutable state that is purely functional on the outside.
Here is some useful code I wrote and use sometimes:
rand :: (Random a, RandomGen g, MonadState g m) => a -> a -> m a
rand lo hi = do
r <- get
let (val, r') = randomR (lo, hi) r
put r'
return val

Resources