How can I turn this parser into an applicative, polyvariadic one? - haskell

I'm trying to parse dates such as 09/10/2015 17:20:52:
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec
import Text.Parsec.String
import Text.Read
import Control.Applicative hiding (many, (<|>))
data Day = Day
{ mo :: Int
, dy :: Int
, yr :: Int
} deriving (Show)
data Time = Time
{ hr :: Int
, min :: Int
, sec :: Int
} deriving (Show)
day = listUncurry Day <$> (sepCount 3 (char '/') $ read <$> many digit)
time = listUncurry Time <$> (sepCount 3 (char ':') $ dign 2 )
dign :: (Stream s m Char, Read b) => Int -> ParsecT s u m b
dign = (read <$>) . flip count digit
-- how generalize to n?
listUncurry h [x1,x2,x3] = h x1 x2 x3
sepCount n sep p = (:) <$> p <*> (count (n-1) $ sep *> p)
I have a hunch that some kind of zipWithN would generalize listUncurry. Maybe some kind of foldl ($)?
As a side question (out of curiosity), can parsec parsers be used generatively?

Actually, you only need Functor:
listUncurry :: Functor f => (a -> a -> a -> r) -> f [a] -> f r
listUncurry h p =
(\[x, y, z] -> h x y z) <$> p
To me, a hint that only Functor is necessary is when you have a code pattern like:
do x <- m
return (f ...)
This is equivalent to
m >>= (\x -> return (f ...))
which is the same as
fmap (\x -> f ...) m
This is because the monad laws imply this identity:
fmap f xs = xs >>= return . f
Polyvariadic listUncurry
I don't really recommend this in most circumstances since it turns what would be compile time errors into runtime errors, but this is how you could implement a polyvariadic listUncurry:
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
class ListUncurry a x r where
listUncurry :: a -> [x] -> r
instance ListUncurry k a r => ListUncurry (a -> k) a r where
listUncurry f (x:xs) = listUncurry (f x) xs
listUncurry _ _ = error "listUncurry: Too few arguments given"
instance ListUncurry r a r where
listUncurry r [] = r
listUncurry _ _ = error "listUncurry: Too many arguments given"
You will need a lot of explicit type annotations if you use it too. There is probably a way to use a type family or functional dependency to help with that, but I can't think of it off the top of my head at the moment. Since that is probably solvable (to an extent at least), in my mind the bigger problem is the type errors being changed from compile time errors to runtime errors.
Sample usage:
ghci> listUncurry ord ['a'] :: Int
97
ghci> listUncurry ((==) :: Int -> Int -> Bool) [1,5::Int] :: Bool
False
ghci> listUncurry ((==) :: Char -> Char -> Bool) ['a'] :: Bool
*** Exception: listUncurry: Too few arguments given
ghci> listUncurry ((==) :: Char -> Char -> Bool) ['a','b','c'] :: Bool
*** Exception: listUncurry: Too many arguments given
A safer listUncurry
If you change the class to
class ListUncurry a x r where
listUncurry :: a -> [x] -> Maybe r
and change the error cases in the instances appropriately, you will at least get a better interface to handling the errors. You could also replace the Maybe with a type that differentiates between the "too many" and "too few" argument errors if you wanted to retain that information.
I feel that this would be a bit better of an approach, although you will need to add a bit more error handling (Maybe's Functor, Applicative and Monad interfaces will make this fairly nice though).
Comparing the two approaches
It ultimately depends on what sort of error this would represent. If the program execution can no longer continue in any meaningful way if it runs into such an error, then the first approach (or something like it) might be more appropriate than the second. If there is any way to recover from the error, the second approach would be better than the first.
Whether or not a polyvariadic technique should be used in the first place is a different question. It might be better to restructure the program to avoid the additional complexity of the polyvariadic stuff.

also i'm sure i shouldn't be snocing a list -- what's the right way to do this?
The following implementation of sepCount is more efficient:
-- | #sepCount n sep p# applies #n# (>=1) occurrences of #p#,
-- separated by #sep#. Returns a list of the values returned by #p#.
sepCount n sep p = p <:> count (n - 1) (sep *> p)
where (<:>) = liftA2 (:)

Related

How to create a function that takes a list of arguments and returns a function that applies arguments to another function [duplicate]

Shouldn’t this definition be allowed in a lazy language like Haskell in which functions are curried?
apply f [] = f
apply f (x:xs) = apply (f x) xs
It’s basically a function that applies the given function to the given list of arguments and is very easily done in Lisp for example.
Are there any workarounds?
It is hard to give a static type to the apply function, since its type depends on the type of the (possibly heterogeneous) list argument. There are at least two ways one way to write this function in Haskell that I can think of:
Using reflection
We can defer type checking of the application until runtime:
import Data.Dynamic
import Data.Typeable
apply :: Dynamic -> [Dynamic] -> Dynamic
apply f [] = f
apply f (x:xs) = apply (f `dynApp` x) xs
Note that now the Haskell program may fail with a type error at runtime.
Via type class recursion
Using the semi-standard Text.Printf trick (invented by augustss, IIRC), a solution can be coded up in this style (exercise). It may not be very useful though, and still requires some trick to hide the types in the list.
Edit: I couldn't come up with a way to write this, without using dynamic types or hlists/existentials. Would love to see an example
I like Sjoerd Visscher's reply, but the extensions -- especially IncoherentInstances, used in this case to make partial application possible -- might be a bit daunting. Here's a solution that doesn't require any extensions.
First, we define a datatype of functions that know what to do with any number of arguments. You should read a here as being the "argument type", and b as being the "return type".
data ListF a b = Cons b (ListF a (a -> b))
Then we can write some (Haskell) functions that munge these (variadic) functions. I use the F suffix for any functions that happen to be in the Prelude.
headF :: ListF a b -> b
headF (Cons b _) = b
mapF :: (b -> c) -> ListF a b -> ListF a c
mapF f (Cons v fs) = Cons (f v) (mapF (f.) fs)
partialApply :: ListF a b -> [a] -> ListF a b
partialApply fs [] = fs
partialApply (Cons f fs) (x:xs) = partialApply (mapF ($x) fs) xs
apply :: ListF a b -> [a] -> b
apply f xs = headF (partialApply f xs)
For example, the sum function could be thought of as a variadic function:
sumF :: Num a => ListF a a
sumF = Cons 0 (mapF (+) sumF)
sumExample = apply sumF [3, 4, 5]
However, we also want to be able to deal with normal functions, which don't necessarily know what to do with any number of arguments. So, what to do? Well, like Lisp, we can throw an exception at runtime. Below, we'll use f as a simple example of a non-variadic function.
f True True True = 32
f True True False = 67
f _ _ _ = 9
tooMany = error "too many arguments"
tooFew = error "too few arguments"
lift0 v = Cons v tooMany
lift1 f = Cons tooFew (lift0 f)
lift2 f = Cons tooFew (lift1 f)
lift3 f = Cons tooFew (lift2 f)
fF1 = lift3 f
fExample1 = apply fF1 [True, True, True]
fExample2 = apply fF1 [True, False]
fExample3 = apply (partialApply fF1 [True, False]) [False]
Of course, if you don't like the boilerplate of defining lift0, lift1, lift2, lift3, etc. separately, then you need to enable some extensions. But you can get quite far without them!
Here is how you can generalize to a single lift function. First, we define some standard type-level numbers:
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts, TypeFamilies, UndecidableInstances #-}
data Z = Z
newtype S n = S n
Then introduce the typeclass for lifting. You should read the type I n a b as "n copies of a as arguments, then a return type of b".
class Lift n a b where
type I n a b :: *
lift :: n -> I n a b -> ListF a b
instance Lift Z a b where
type I Z a b = b
lift _ b = Cons b tooMany
instance (Lift n a (a -> b), I n a (a -> b) ~ (a -> I n a b)) => Lift (S n) a b where
type I (S n) a b = a -> I n a b
lift (S n) f = Cons tooFew (lift n f)
And here's the examples using f from before, rewritten using the generalized lift:
fF2 = lift (S (S (S Z))) f
fExample4 = apply fF2 [True, True, True]
fExample5 = apply fF2 [True, False]
fExample6 = apply (partialApply fF2 [True, False]) [False]
No, it cannot. f and f x are different types. Due to the statically typed nature of haskell, it can't take any function. It has to take a specific type of function.
Suppose f is passed in with type a -> b -> c. Then f x has type b -> c. But a -> b -> c must have the same type as a -> b. Hence a function of type a -> (b -> c) must be a function of type a -> b. So b must be the same as b -> c, which is an infinite type b -> b -> b -> ... -> c. It cannot exist. (continue to substitute b -> c for b)
Here's one way to do it in GHC. You'll need some type annotations here and there to convince GHC that it's all going to work out.
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE IncoherentInstances #-}
class Apply f a r | f -> a r where
apply :: f -> [a] -> r
instance Apply f a r => Apply (a -> f) a r where
apply f (a:as) = apply (f a) as
instance Apply r a r where
apply r _ = r
test = apply ((+) :: Int -> Int -> Int) [1::Int,2]
apply' :: (a -> a -> a) -> [a] -> a
apply' = apply
test' = apply' (+) [1,2]
This code is a good illustration of the differences between static and dynamic type-checking. With static type-checking, the compiler can't be sure that apply f really is being passed arguments that f expects, so it rejects the program. In lisp, the checking is done at runtime and the program might fail then.
I am not sure how much this would be helpful as I am writing this in F# but I think this can be easily done in Haskell too:
type 'a RecFunction = RecFunction of ('a -> 'a RecFunction)
let rec apply (f: 'a RecFunction) (lst: 'a list) =
match (lst,f) with
| ([],_) -> f
| ((x::xs), RecFunction z) -> apply (z x) xs
In this case the "f" in question is defined using a discriminated union which allows recursive data type definition. This can be used to solved the mentioned problem I guess.
With the help and input of some others I defined a way to achieve this (well, sort of, with a custom list type) which is a bit different from the previous answers. This is an old question, but it seems to still be visited so I will add the approach for completeness.
We use one extension (GADTs), with a list type a bit similar to Daniel Wagner's, but with a tagging function type rather than a Peano number. Let's go through the code in pieces. First we set the extension and define the list type. The datatype is polymorphic so in this formulation arguments don't have to have the same type.
{-# LANGUAGE GADTs #-}
-- n represents function type, o represents output type
data LApp n o where
-- no arguments applied (function and output type are the same)
End :: LApp o o
-- intentional similarity to ($)
(:$) :: a -> LApp m o -> LApp (a -> m) o
infixr 5 :$ -- same as :
Let's define a function that can take a list like this and apply it to a function. There is some type trickery here: the function has type n, a call to listApply will only compile if this type matches the n tag on our list type. By leaving our output type o unspecified, we leave some freedom in this (when creating the list we don't have to immediately entirely fix the kind of function it can be applied to).
-- the apply function
listApply :: n -> LApp n o -> o
listApply fun End = fun
listApply fun (p :$ l) = listApply (fun p) l
That's it! We can now apply functions to arguments stored in our list type. Expected more? :)
-- showing off the power of AppL
main = do print . listApply reverse $ "yrruC .B lleksaH" :$ End
print . listApply (*) $ 1/2 :$ pi :$ End
print . listApply ($) $ head :$ [1..] :$ End
print $ listApply True End
Unfortunately we are kind of locked in to our list type, we can't just convert normal lists to use them with listApply. I suspect this is a fundamental issue with the type checker (types end up depending on the value of a list) but to be honest I'm not entirely sure.
-- Can't do this :(
-- listApply (**) $ foldr (:$) End [2, 32]
If you feel uncomfortable about using a heterogeneous list, all you have to do is add an extra parameter to the LApp type, e.g:
-- Alternative definition
-- data FList n o a where
-- Nil :: FList o o a
-- Cons :: a -> FList f o a -> FList (a -> f) o a
Here a represents the argument type, where the function which is applied to will also have to accept arguments of all the same type.
This isn't precisely an answer to your original question, but I think it might be an answer to your use-case.
pure f <*> [arg] <*> [arg2] ...
-- example
λ>pure (\a b c -> (a*b)+c) <*> [2,4] <*> [3] <*> [1]
[7,13]
λ>pure (+) <*> [1] <*> [2]
[3]
The applicative instance of list is a lot broader than this super narrow use-case though...
λ>pure (+1) <*> [1..10]
[2,3,4,5,6,7,8,9,10,11]
-- Or, apply (+1) to items 1 through 10 and collect the results in a list
λ>pure (+) <*> [1..5] <*> [1..5]
[2,3,4,5,6,3,4,5,6,7,4,5,6,7,8,5,6,7,8,9,6,7,8,9,10]
{- The applicative instance of list gives you every possible combination of
elements from the lists provided, so that is every possible sum of pairs
between one and five -}
λ>pure (\a b c -> (a*b)+c) <*> [2,4] <*> [4,3] <*> [1]
[9,7,17,13]
{- that's - 2*4+1, 2*3+1, 4*4+1, 4*3+1
Or, I am repeating argC when I call this function twice, but a and b are
different -}
λ>pure (\a b c -> show (a*b) ++ c) <*> [1,2] <*> [3,4] <*> [" look mah, other types"]
["3 look mah, other types","4 look mah, other types","6 look mah, other types","8 look mah, other types"]
So it's not the same concept, precisely, but it a lot of those compositional use-cases, and adds a few more.

Is it possible to iterate the application of a non-endomorphism?

In Haskell if I want to repeatedly apply an endomorphism a -> a to a value of type a I can just use iterate.
What about a function that is not an endomorphisms, but generic enough to work correctly on its return type?
Consider for example Just :: a -> Maybe a; I can write
Just . Just . Just ...
as many times as I want. Is there a way to write this shortly with something like
iterate' 3 Just :: a -> Maybe (Maybe (Maybe a))
or do we need something like dependent types to do this?
It is possible with a minor tweak to the syntax you proposed: iterate' #3 Just instead of iterate' 3 Just.
This is because the result type depends on the number, so the number has to be a type literal, not a value literal. As you correctly note, doing this with arbitrary numbers would require dependent types[1], which Haskell doesn't have.
{-# OPTIONS_GHC -Wall #-}
{-# LANGUAGE TypeFamilies, KindSignatures, DataKinds,
FlexibleInstances, UndecidableInstances, ScopedTypeVariables,
FunctionalDependencies, TypeApplications, RankNTypes, FlexibleContexts,
AllowAmbiguousTypes #-}
import qualified GHC.TypeLits as Lit
-- from type-natural
import Data.Type.Natural
import Data.Type.Natural.Builtin
class Iterate (n :: Nat) (f :: * -> *) (a :: *) (r :: *)
| n f a -> r
where
iterate_peano :: Sing n -> (forall b . b -> f b) -> a -> r
instance Iterate 'Z f a a where
iterate_peano SZ _ = id
instance Iterate n f (f a) r => Iterate ('S n) f a r where
iterate_peano (SS n) f x = iterate_peano n f (f x)
iterate'
:: forall (n :: Lit.Nat) f a r .
(Iterate (ToPeano n) f a r, SingI n)
=> (forall b . b -> f b) -> a -> r
iterate' f a = iterate_peano (sToPeano (sing :: Sing n)) f a
If you load this in ghci, you can say
*Main> :t iterate' #3 Just
iterate' #3 Just :: a -> Maybe (Maybe (Maybe a))
*Main> iterate' #3 Just True
Just (Just (Just True))
This code uses two different type-level naturals: the built-in Nat from GHC.TypeLits and the classic Peano numerals from Data.Type.Natural. The former are needed to provide the nice iterate' #3 syntax, the latter are needed to perform the recursion (which happens in the Iterate class). I used Data.Type.Natural.Builtin to convert from a literal to the corresponding Peano numeral.
[1] However, given a specific way to consume the iterated values (e.g. if you know in advance that you'll only want to show them), you probably could adapt this code to work even for dynamic values of n. There's nothing in the type of iterate' that requires a statically known Nat; the only challenge is to prove that the result of the iteration satisfies the constraints you need.
You can do it with template haskell, if you know the number at compile time (but unless the number is pretty large I don't think it's worth the hassle). If you don't know the number yet at compile time, you need to correctly model the return type, which we can do using a non-regular type:
data Iter f a = Iter0 a | IterS (Iter f (f a))
iterate' :: Int -> (forall x. x -> f x) -> a -> Iter f a
iterate' 0 f x = Iter0 x
iterate' n f x = IterS (iterate' (n-1) f (f x))
Iter is essentially a way of expressing the data type a | f a | f (f a) | f (f (f a)) | .... To use the result you need to recurse on Iter. Also the function has to be of the form a -> f a for some type constructor f, so you may need to do some newtype wrapping to get there. So... it's kind of a pain either way.
You can do this without Template Haskell or type-level Nats. The kind of variable-depth recursive type you are building actually fits perfectly into the model of a free monad. We can use the unfold function from the free package to build up a Free structure and short-circuit when our counter reaches 0.
-- This extension is enabled so we can have nice type annotations
{-# Language ScopedTypeVariables #-}
import Control.Monad.Free (Free)
import qualified Control.Monad.Free as Free
iterate' :: forall f a. Functor f => Int -> (a -> f a) -> a -> Free f a
iterate' counter0 f x0 = Free.unfold run (counter0, x0)
where
-- If counter is 0, short circuit with current result
-- Otherwise, continue computation with modified counter
run :: (Int, a) -> Either a (f (Int, a))
run (0 , x) = Left x
run (counter, x) = Right (countDown counter <$> f x)
countDown :: Int -> a -> (Int, a)
countDown counter x = (counter - 1, x)
Now, it's easy to create and digest these types of values for any Functor.
> iterate' 3 Just True
Free (Just (Free (Just (Free (Just (Pure True))))))
> let f i = if i == 1 then Left "abort" else Right (i+1)
> iterate' 0 f 0
Pure 0
> iterate' 1 f 0
Free (Right (Pure 1))
> iterate' 2 f 0
Free (Right (Free (Left "abort")))
If your Functor also happens to be a Monad, you can use retract to collapse the recursive structure.
> Free.retract (iterate' 3 Just True)
Just True
> Free.retract (iterate' 0 f 0)
Right 0
> Free.retract (iterate' 1 f 0)
Right 1
> Free.retract (iterate' 2 f 0)
Left "abort"
I suggest reading the docs for Control.Monad.Free so you can get an idea for how these structures are created/consumed.
(Just as an aside, a -> Maybe a is an endomorphism, but it's an endomorphism in the Kleisli category of Maybe.)

Excluding Types in Haskell

How would I use pattern matching in order to exclude certain types of inputs? For instance given the following:
f list k =
if null list
then []
else head list + k : f (tail list) k
How can I use pattern matching to make f s/t it only allows Int and Integer but not Double or Float?
As suchtgott explained, you should not actually try to limit your function to exactly Int and Integer; you probably want to use an Integral constraint.
Suppose, for fun, that you really did want to limit it just like that. The way to do this in Haskell 98 is a bit weird (jump down to the break to see how this is done in GHC Haskell:
class Integral a => IntOrInteger a where
intOrInteger :: Either (f a -> f Int) (f a -> f Integer)
instance IntOrInteger Int where
intOrInteger = Left id
instance IntOrInteger Integer where
intOrInteger = Right id
How do you use such a thing? Well, to start off with
intOrIntegerSimple :: IntOrInteger a => Either (a -> Int) (a -> Integer)
intOrIntegerSimple = case intOrInteger of
Left f -> Left (runIdentity . f . Identity)
Right f -> Right (runIdentity . f . Identity)
But how do you flip it around, and turn an Int or Integer back to an instance of IntOrInteger?
newtype Switch f a i = Switch {runSwitch :: f i -> f a}
intOrInteger' :: IntOrInteger a => Either (f Int -> f a) (f Integer -> f a)
intOrInteger' = case intOrInteger of
Left f -> Left (runSwitch (f (Switch id)))
Right f -> Right (runSwitch (f (Switch id)))
intOrIntegerSimple' :: IntOrInteger a => Either (Int -> a) (Integer -> a)
intOrIntegerSimple' = case intOrInteger' of
Left f -> Left (runIdentity . f . Identity)
Right f -> Right (runIdentity . f . Identity)
When you don't really care whether you have an Int or an Integer, but want to be sure you really have one of them, you have to watch out for invalid instances. What might an invalid instance look like? This is one option:
instance IntOrInteger Word where
intOrInteger = Left (const undefined)
To just test that the instance is valid, you can use this (importing Data.Proxy):
ensureIntOrInteger :: IntOrInteger a => Proxy a -> ()
ensureIntOrInteger p = case intOrInteger of
Left f -> f p `seq` ()
Right f -> f p `seq` ()
The result of ensureIntOrInteger will be defined if and only if the type a is actually an Int or an Integer.
Nice; it works. But in practice it's pretty nasty to use. You can do much better with a few GHC extensions:
{-# LANGUAGE GADTs, TypeFamilies, TypeOperators, ConstraintKinds,
UndecidableInstances, UndecidableSuperClasses, DataKinds #-}
-- In 8.0 and later, Constraint is also available from Data.Kind
import GHC.Exts (Constraint)
import Data.Type.Equality ((:~:)(..))
import GHC.TypeLits (TypeError, ErrorMessage (..))
type family IntOrIntegerC a :: Constraint where
IntOrIntegerC Int = ()
IntOrIntegerC Integer = ()
IntOrIntegerC t = TypeError ('ShowType t :<>:
'Text " is not an Int or an Integer.")
class (Integral a, IntOrIntegerC a) => IntOrInteger a where
intOrInteger :: Either (a :~: Int) (a :~: Integer)
instance IntOrInteger Int where
intOrInteger = Left Refl
instance IntOrInteger Integer where
intOrInteger = Right Refl
With this formulation, the IntOrIntegerC constraint family blocks out any invalid types without your needing to do anything, giving a useful error message if someone tries to write a bogus instance. And if you actually do need to use the equality evidence, it's simply a matter of pattern matching on intOrInteger or using the various handy functions in Data.Type.Equality.
A point of style: using head and tail is generally discouraged in Haskell. We prefer to pattern match instead. Your function could be written
f [] _ = []
f (x : xs) k = x + k : f xs k
f/mapAdd implements a map. map is
map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x : xs) = f x : map f xs
and can be used to formulate mapAdd:
Prelude> mapAdd n = map (+ n)
Prelude> mapAdd 3 [1,2,3,4]
[4,5,6,7]
GHCi's :type command tells you the automatically inferred type signature of mapAdd:
Prelude> :type mapAdd
mapAdd :: Num b => b -> [b] -> [b]
Here, mapAdd's bs are constrained to the general numerical type class Num. You can explicitly constrain to the type class Integral, which (basically) exclusively contains the data types Int, and Integer. Integral is a subclass of Num.
Prelude> :{
Prelude| mapAdd :: Integral a => a -> [a] -> [a]
Prelude| mapAdd n = map (+ n)
Prelude| :}
Prelude> :t mapAdd
mapAdd :: Integral a => a -> [a] -> [a]
mapAdd n = map (+ n) is pointfree for mapAdd n lst = map (+ n) lst.
You can create a dumb type class with instances only for Int and Integer and then use a type constraint:
class (Num a) => IntOrInteger a
instance IntOrInteger Int
instance IntOrInteger Integer
f :: (IntOrInteger a) => [a] -> a -> [a]
f list k =
if null list
then []
else head list + k : f (tail list) k
Then if you use f with Int, it will compile:
> f [1,2,3] 1
[2,3,4]
But it will not compile with Double:
> f [1,2,3] (1::Double)
<interactive>:19:1: error:
* No instance for (IntOrInteger Double) arising from a use of `f'
* In the expression: f [1, 2, 3] (1 :: Double)
In an equation for `it': it = f [1, 2, 3] (1 :: Double)
There are a lot of ways to constrain types, the most direct is just with type families.
type family Elem x xs :: Constraint where
Elem x (x ': xs) = ()
Elem x (y ': xs) = Elem x xs
Elem x '[] = TypeError ('ShowType x :<>: 'Text " is not a permitted type.")
type a ~~ b = Elem a b
function :: (a ~~ [Int, Integer], Num a) => [a] -> [a]
This allows you to white list a set of types. You won't know which one in particular you have though, that's more complicated.
However, you said you wanted to exclude certain types, so perhaps you want a black list. We simply flip the first and last case:
type family NotElem x xs :: Constraint where
NotElem x (x ': xs) = TypeError ('ShowType x :<>: 'Text " is a forbidden type.")
NotElem x (y ': xs) = NotElem x xs
NotElem x '[] = ()
type a !~~ b = NotElem a b
function :: (a !~~ [Double, Float], Num a) => [a] -> [a]
function will now accept any Num type that is not a Double or Float.
However, there's no reason to actually do any of this for your example function. You lose nothing at all by allowing it to work on the full range of Num types. In fact, these hijinks will at the least cost you a tiny smidgen more time on type checking, or if you go further and do reflection based things, possibly run time overhead as well.

Unexpected memory growth with Control.Monad foldM

I have the following code, which has been stripped down and is I think as minimal as possible that has some very odd behaviour.
The code consists of two source files:
One to define some data:
module MyFunction where
data MyFunction =
MyFunction {
functionNumber :: Int,
functionResult :: IO String
}
makeMyFunction :: Show a => Int -> IO a -> MyFunction
makeMyFunction number result = MyFunction {
functionNumber = number,
functionResult = result >>= return . show }
And the other is Main:
module Main (main) where
import System.CPUTime (getCPUTime)
import Data.List (foldl')
import Data.Foldable (foldlM)
import Control.Monad (foldM)
import MyFunction
exampleFunction = do
--let x = foldl' (\a b -> a `seq` (a + b)) 0 [1..20000000] -- This works
--x <- foldlM (\a b -> a `seq` return (a + b)) 0 [1..20000000] -- This works (*)
x <- foldM (\a b -> a `seq` return (a + b)) 0 [1..20000000] -- This doesn't
print x
return ()
runFunction fn = do
result <- functionResult fn
duration <- getCPUTime
if result /= "()"
then putStrLn ""
else return ()
putStrLn (show (fromIntegral duration / (10^9)) ++ "ms")
return fn
main = do
runFunction (makeMyFunction 123 exampleFunction)
return ()
The code as above (compiled using GHC 7.10.3 with stack 1.0.0 with default flags) has a rapid increase in memory usage (exceeding 1GB), and takes typically 3.3 seconds.
If I make a changes to the code, for example:
Use one of the commented alternatives to the problem line
Take out any line from runFunction
The memory usage will remain minimal, and takes only about 1 second.
One feature that I think is most surprising to me is that replacing foldM with foldlM (which as far as I know foldM = foldlM) fixes the problem.
Also making changes to code that I don't see has any relationship to the problem lines of code also fixes the problem. For example removing the last putStrLn.
Another oddity is that if I merge the MyFunction module into the Main module, while it doesn't fix the problem, it actually causes foldlM to behave as foldM using excessive memory.
In the real code that this came from, I have a large number exampleFunctions, and there is significantly more Main code, and every so often I encounter this sort of unexplained memory usage from functions, that can usually be resolved by some sort of voodoo.
I'm looking for an explanation for the behaviour. If I know why this occurs I can then look into avoiding it. Could this be a compiler issue, or maybe just a misunderstanding on my part?
(*) I've highlighted the secondary issue that causes the same memory growth to occur with foldlM.
Here is foldlM from Foldable.hs (ghc)
-- | Monadic fold over the elements of a structure,
-- associating to the left, i.e. from left to right.
foldlM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
foldlM f z0 xs = foldr f' return xs z0
where f' x k z = f z x >>= k
and foldM from Monad.hs
foldM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b
{-# INLINEABLE foldM #-}
{-# SPECIALISE foldM :: (a -> b -> IO a) -> a -> [b] -> IO a #-}
{-# SPECIALISE foldM :: (a -> b -> Maybe a) -> a -> [b] -> Maybe a #-}
foldM = foldlM
I placed these definitions to a separate module Test and tested the execution with and without INLINEABLE / SPESIALISE lines. Whatever the reason is, leaving out the SPECIALISE directives helped and the execution time and memory usage was like with foldlM.
After a little bit more digging, removing line
{-# SPECIALISE foldM :: (a -> b -> IO a) -> a -> [b] -> IO a #-}
effected the most.

Constructing efficient monad instances on `Set` (and other containers with constraints) using the continuation monad

Set, similarly to [] has a perfectly defined monadic operations. The problem is that they require that the values satisfy Ord constraint, and so it's impossible to define return and >>= without any constraints. The same problem applies to many other data structures that require some kind of constraints on possible values.
The standard trick (suggested to me in a haskell-cafe post) is to wrap Set into the continuation monad. ContT doesn't care if the underlying type functor has any constraints. The constraints become only needed when wrapping/unwrapping Sets into/from continuations:
import Control.Monad.Cont
import Data.Foldable (foldrM)
import Data.Set
setReturn :: a -> Set a
setReturn = singleton
setBind :: (Ord b) => Set a -> (a -> Set b) -> Set b
setBind set f = foldl' (\s -> union s . f) empty set
type SetM r a = ContT r Set a
fromSet :: (Ord r) => Set a -> SetM r a
fromSet = ContT . setBind
toSet :: SetM r r -> Set r
toSet c = runContT c setReturn
This works as needed. For example, we can simulate a non-deterministic function that either increases its argument by 1 or leaves it intact:
step :: (Ord r) => Int -> SetM r Int
step i = fromSet $ fromList [i, i + 1]
-- repeated application of step:
stepN :: Int -> Int -> Set Int
stepN times start = toSet $ foldrM ($) start (replicate times step)
Indeed, stepN 5 0 yields fromList [0,1,2,3,4,5]. If we used [] monad instead, we would get
[0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5]
instead.
The problem is efficiency. If we call stepN 20 0 the output takes a few seconds and stepN 30 0 doesn't finish within a reasonable amount of time. It turns out that all Set.union operations are performed at the end, instead of performing them after each monadic computation. The result is that exponentially many Sets are constructed and unioned only at the end, which is unacceptable for most tasks.
Is there any way around it, to make this construction efficient? I tried but without success.
(I even suspect that there could be some kinds of theoretical limits following from Curry-Howard isomorphism and Glivenko's theorem. Glivenko's theorem says that for any propositional tautology φ the formula ¬¬φ can be proved in intuitionistic logic. However, I suspect that the length of the proof (in normal form) can be exponentially long. So, perhaps, there could be cases when wrapping a computation into the continuation monad will make it exponentially longer?)
Monads are one particular way of structuring and sequencing computations. The bind of a monad cannot magically restructure your computation so as to happen in a more efficient way. There are two problems with the way you structure your computation.
When evaluating stepN 20 0, the result of step 0 will be computed 20 times. This is because each step of the computation produces 0 as one alternative, which is then fed to the next step, which also produces 0 as alternative, and so on...
Perhaps a bit of memoization here can help.
A much bigger problem is the effect of ContT on the structure of your computation. With a bit of equational reasoning, expanding out the result of replicate 20 step, the definition of foldrM and simplifying as many times as necessary, we can see that stepN 20 0 is equivalent to:
(...(return 0 >>= step) >>= step) >>= step) >>= ...)
All parentheses of this expression associate to the left. That's great, because it means that the RHS of each occurrence of (>>=) is an elementary computation, namely step, rather than a composed one. However, zooming in on the definition of (>>=) for ContT,
m >>= k = ContT $ \c -> runContT m (\a -> runContT (k a) c)
we see that when evaluating a chain of (>>=) associating to the left, each bind will push a new computation onto the current continuation c. To illustrate what is going on, we can use again a bit of equational reasoning, expanding out this definition for (>>=) and the definition for runContT, and simplifying, yielding:
setReturn 0 `setBind`
(\x1 -> step x1 `setBind`
(\x2 -> step x2 `setBind` (\x3 -> ...)...)
Now, for each occurrence of setBind, let's ask ourselves what the RHS argument is. For the leftmost occurrence, the RHS argument is the whole rest of the computation after setReturn 0. For the second occurrence, it's everything after step x1, etc. Let's zoom in to the definition of setBind:
setBind set f = foldl' (\s -> union s . f) empty set
Here f represents all the rest of the computation, everything on the right hand side of an occurrence of setBind. That means that at each step, we are capturing the rest of the computation as f, and applying f as many times as there are elements in set. The computations are not elementary as before, but rather composed, and these computations will be duplicated many times.
The crux of the problem is that the ContT monad transformer is transforming the initial structure of the computation, which you meant as a left associative chain of setBind's, into a computation with a different structure, ie a right associative chain. This is after all perfectly fine, because one of the monad laws says that, for every m, f and g we have
(m >>= f) >>= g = m >>= (\x -> f x >>= g)
However, the monad laws do not impose that the complexity remain the same on each side of the equations of each law. And indeed, in this case, the left associative way of structuring this computation is a lot more efficient. The left associative chain of setBind's evaluates in no time, because only elementary subcomputations are duplicated.
It turns out that other solutions shoehorning Set into a monad also suffer from the same problem. In particular, the set-monad package, yields similar runtimes. The reason being, that it too, rewrites left associative expressions into right associative ones.
I think you have put the finger on a very important yet rather subtle problem with insisting that Set obeys a Monad interface. And I don't think it can be solved. The problem is that the type of the bind of a monad needs to be
(>>=) :: m a -> (a -> m b) -> m b
ie no class constraint allowed on either a or b. That means that we cannot nest binds on the left, without first invoking the monad laws to rewrite into a right associative chain. Here's why: given (m >>= f) >>= g, the type of the computation (m >>= f) is of the form m b. A value of the computation (m >>= f) is of type b. But because we can't hang any class constraint onto the type variable b, we can't know that the value we got satisfies an Ord constraint, and therefore cannot use this value as the element of a set on which we want to be able to compute union's.
Recently on Haskell Cafe Oleg gave an example how to implement the Set monad efficiently. Quoting:
... And yet, the efficient genuine Set monad is possible.
...
Enclosed is the efficient genuine Set monad. I wrote it in direct style (it seems to be faster, anyway). The key is to use the optimized choose function when we can.
{-# LANGUAGE GADTs, TypeSynonymInstances, FlexibleInstances #-}
module SetMonadOpt where
import qualified Data.Set as S
import Control.Monad
data SetMonad a where
SMOrd :: Ord a => S.Set a -> SetMonad a
SMAny :: [a] -> SetMonad a
instance Monad SetMonad where
return x = SMAny [x]
m >>= f = collect . map f $ toList m
toList :: SetMonad a -> [a]
toList (SMOrd x) = S.toList x
toList (SMAny x) = x
collect :: [SetMonad a] -> SetMonad a
collect [] = SMAny []
collect [x] = x
collect ((SMOrd x):t) = case collect t of
SMOrd y -> SMOrd (S.union x y)
SMAny y -> SMOrd (S.union x (S.fromList y))
collect ((SMAny x):t) = case collect t of
SMOrd y -> SMOrd (S.union y (S.fromList x))
SMAny y -> SMAny (x ++ y)
runSet :: Ord a => SetMonad a -> S.Set a
runSet (SMOrd x) = x
runSet (SMAny x) = S.fromList x
instance MonadPlus SetMonad where
mzero = SMAny []
mplus (SMAny x) (SMAny y) = SMAny (x ++ y)
mplus (SMAny x) (SMOrd y) = SMOrd (S.union y (S.fromList x))
mplus (SMOrd x) (SMAny y) = SMOrd (S.union x (S.fromList y))
mplus (SMOrd x) (SMOrd y) = SMOrd (S.union x y)
choose :: MonadPlus m => [a] -> m a
choose = msum . map return
test1 = runSet (do
n1 <- choose [1..5]
n2 <- choose [1..5]
let n = n1 + n2
guard $ n < 7
return n)
-- fromList [2,3,4,5,6]
-- Values to choose from might be higher-order or actions
test1' = runSet (do
n1 <- choose . map return $ [1..5]
n2 <- choose . map return $ [1..5]
n <- liftM2 (+) n1 n2
guard $ n < 7
return n)
-- fromList [2,3,4,5,6]
test2 = runSet (do
i <- choose [1..10]
j <- choose [1..10]
k <- choose [1..10]
guard $ i*i + j*j == k * k
return (i,j,k))
-- fromList [(3,4,5),(4,3,5),(6,8,10),(8,6,10)]
test3 = runSet (do
i <- choose [1..10]
j <- choose [1..10]
k <- choose [1..10]
guard $ i*i + j*j == k * k
return k)
-- fromList [5,10]
-- Test by Petr Pudlak
-- First, general, unoptimal case
step :: (MonadPlus m) => Int -> m Int
step i = choose [i, i + 1]
-- repeated application of step on 0:
stepN :: Int -> S.Set Int
stepN = runSet . f
where
f 0 = return 0
f n = f (n-1) >>= step
-- it works, but clearly exponential
{-
*SetMonad> stepN 14
fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]
(0.09 secs, 31465384 bytes)
*SetMonad> stepN 15
fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
(0.18 secs, 62421208 bytes)
*SetMonad> stepN 16
fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
(0.35 secs, 124876704 bytes)
-}
-- And now the optimization
chooseOrd :: Ord a => [a] -> SetMonad a
chooseOrd x = SMOrd (S.fromList x)
stepOpt :: Int -> SetMonad Int
stepOpt i = chooseOrd [i, i + 1]
-- repeated application of step on 0:
stepNOpt :: Int -> S.Set Int
stepNOpt = runSet . f
where
f 0 = return 0
f n = f (n-1) >>= stepOpt
{-
stepNOpt 14
fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]
(0.00 secs, 515792 bytes)
stepNOpt 15
fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
(0.00 secs, 515680 bytes)
stepNOpt 16
fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
(0.00 secs, 515656 bytes)
stepNOpt 30
fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]
(0.00 secs, 1068856 bytes)
-}
I don't think your performance problems in this case are due to the use of Cont
step' :: Int -> Set Int
step' i = fromList [i,i + 1]
foldrM' f z0 xs = Prelude.foldl f' setReturn xs z0
where f' k x z = f x z `setBind` k
stepN' :: Int -> Int -> Set Int
stepN' times start = foldrM' ($) start (replicate times step')
gets similar performance to the Cont based implementation but occurs entirely in the Set "restricted monad"
I am not sure if I believe your claim about Glivenko's theorem leading to exponential increase in (normalized) proof size--at least in the Call-By-Need context. That is because we can arbitrarily reuse subproofs (and our logic is second order, we need only a single proof of forall a. ~~(a \/ ~a)). Proofs are not trees, they are graphs (sharing).
In general, you are likely to see performance costs from Cont wrapping Set but they can usually be avoided via
smash :: (Ord r, Ord k) => SetM r r -> SetM k r
smash = fromSet . toSet
I found out another possibility, based on GHC's ConstraintKinds extension. The idea is to redefine Monad so that it includes a parametric constraint on allowed values:
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE RebindableSyntax #-}
import qualified Data.Foldable as F
import qualified Data.Set as S
import Prelude hiding (Monad(..), Functor(..))
class CFunctor m where
-- Each instance defines a constraint it valust must satisfy:
type Constraint m a
-- The default is no constraints.
type Constraint m a = ()
fmap :: (Constraint m a, Constraint m b) => (a -> b) -> (m a -> m b)
class CFunctor m => CMonad (m :: * -> *) where
return :: (Constraint m a) => a -> m a
(>>=) :: (Constraint m a, Constraint m b) => m a -> (a -> m b) -> m b
fail :: String -> m a
fail = error
-- [] instance
instance CFunctor [] where
fmap = map
instance CMonad [] where
return = (: [])
(>>=) = flip concatMap
-- Set instance
instance CFunctor S.Set where
-- Sets need Ord.
type Constraint S.Set a = Ord a
fmap = S.map
instance CMonad S.Set where
return = S.singleton
(>>=) = flip F.foldMap
-- Example:
-- prints fromList [3,4,5]
main = print $ do
x <- S.fromList [1,2]
y <- S.fromList [2,3]
return $ x + y
(The problem with this approach is in the case the monadic values are functions, such as m (a -> b), because they can't satisfy constraints like Ord (a -> b). So one can't use combinators like <*> (or ap) for this constrained Set monad.)

Resources