newtype-like operational semantics for a GADT where the type fully determines the constructor - haskell

Suppose I have a GADT like the following:
data Tag = A | B | C
data Payload (tag :: Tag) where
PA :: Int -> Payload A
PB :: Double -> Payload B
PC :: Bool -> Payload C
I want Payload to have no runtime representation on its own -- i.e. I want to have Coercible Int (Payload A), I want zero-cost pattern matching, and in general I want it to behave as if I had the following three newtype definitions instead:
newtype PayloadA = PA Int
newtype PayloadB = PB Double
newtype PayloadC = PC Bool
Is there a way to convince GHC to give me that?

This looks impossible to achieve, at least with current GHC. Assume your Payload A had the same representation of Int, and so on.
data Tag = A | B | C
data Payload (tag :: Tag) where
PA :: Int -> Payload A
PB :: Double -> Payload B
PC :: Bool -> Payload C
Then, how should we implement this?
foo :: Payload tag -> Int
foo (PA i) = i
foo (PB _) = 1
foo (PC _) = 2
To implement foo, we somehow need to extract the tag from the Payload tag, but that is impossible if Payload tag does not store the tag in its representation.
What we could do, instead, is to separate the runtime tag representation from its payload data.
type family Payload (tag :: Tag) where
Payload 'A = Int
Payload 'B = Double
Payload 'C = Bool
Now, Payload 'A is exactly an Int. We however lose the ability to write foo, since the tag is no longer stored at runtime along the payload. We can instead write this:
-- singleton, could be auto-generated using the singletons library
data STag (tag :: Tag) where
SA :: STag 'A
SB :: STag 'B
SC :: STag 'C
bar :: STag tag -> Payload tag -> Int
bar PA i = i
bar PB _ = 1
bar PC _ = 2
Note how we essentially add the tag as an argument, since we do need it to be represented at runtime.

You can do like:
data family Payload (tag :: Tag)
newtype instance Payload A = PA Int
newtype instance Payload B = PB Double
newtype instance Payload C = PC Bool
This meets your requirement of actually having types distinct from Int, Double, Bool, and of having the operation semantics of a newtype. Of course, the price you pay is you won't be able to pattern-match to determine which is which. But you can recover such things with a typeclass or by explicitly passing the tag (which mostly amount to the same thing under the hood); for example:
class Foo (t :: Tag) where foo :: Payload t -> Int
instance Foo A where foo (PA i) = i
instance Foo B where foo (PB _) = 2
instance Foo C where foo (PC _) = 3

Related

Encode Custom Data Type to Lazy Byte String

when you want to convert a custom type into bytestring you would do following:
data Foo = Foo {value1 :: Int}
instance Binary Foo where
get =liftM Foo get
put b = put (value1 b)
and if you have a type with multiple viable values as such:
data Foo2 = Foo2A Int | Foo2B Int
instance Binary Foo2 where
get = do flag <- getWord8
case flag of
0 -> fmap Foo2A get
1 -> fmap Foo2B get
put (Foo2A i) = do put (0 :: Word8)
put i
put (Foo2B i) = do put (1 :: Word8)
put i
but if you have a type as such (following...) how would I do this?:
data Foo3 = Foo3A Int | Foo3B
instance Binary Foo3 where
get = do flag <- getWord8
case flag of
0 -> fmap Foo3A get
1 -> ....????? Foo3B has no value - only Data Constructor
put (Foo3A i) = do put (0 :: Word8)
put i
put (Foo3B) = put (1 :: Word8)
You can also derive these instances:
newtype Foo = Foo {value1 :: Int}
deriving newtype Binary
data Foo2 = Foo2A Int | Foo2B Int
deriving stock Generic
deriving anyclass Binary
data Foo3 = Foo3A Int | Foo3B
deriving stock Generic
deriving anyclass Binary
To match what you wrote for put, you want pure Foo3B there in get.

Improve Coroutine Request type safety

Working with monad-coroutine package I have some coroutine doing a lot of work and it needs some input from time to time:
Coroutine (Request SomeRequest SomeResponse) (State MyState) a
where
data SomeRequest
= GetImportantData Int
| OtherImportantStuff Float
| SomethingElse (Vector Int)
| ...
data SomeResponse
= ImprtantData (Vector Float)
| ImportantStuff Int
| ...
As you can see for each entry in SomeRequest I have a respective entry in SomeResponse.
During the runtime of this coroutine I have something like this:
...
ImportantData info <- request (GetImportantData 666)
...
Now I'm afraid that this approach is not good because what I want is to make sure that whenever I request important data with GetImportantData the only possible response is ImportantData and nothing else. With my current approach I have to pattern match every single time I make a request (to make sure that the input is actually what I want).
Any way I can improve the design/approach to make sure that for GetImportantData I get ImportantData back only, for OtherImportantStuff I get ImportantStuff only, etc ?
Rather than using the monad-coroutine-provided
data Request request response x = Request request (response -> x)
define your own suspension type
data MySuspension x
= GetImportantData Int (Vector Float -> x)
| GetOtherImportantStuff Float (Int -> x)
| ...
deriving (Functor)
Or you can use a GADT
data MyRequest r where
GetImportantData :: Int -> MyRequest (Vector Float)
GetOtherImportantStuff :: Float -> MyRequest Int
...
and a corresponding suspension type involving an existential, as in the operational package. (monad-coroutine just provides a free monad transformer, and operational provides a slightly different kind of free monad transformer. Coroutine MySuspension m r is essentially the same as ProgramT MyRequest m r.)
Phantom types and GADTs may help you achieve more type safety here.
{-# LANGUAGE GADTs #-}
import qualified Data.Vector as V
data Important
data SomethingElse
data Request a where
GetImportantData :: Int -> Request Important
OtherRequest :: Float -> Request SomethingElse
data Response a where
ImportantData :: V.Vector Int -> Response Important
OtherResponse :: Int -> Response SomethingElse
-- a typical use case
controller :: Request Important -> Response Important
controller (GetImportantData n) = ImportantData $ V.singleton n
-- or, more generally
controller' :: Request a -> Response a
controller' (GetImportantData n) = ImportantData $ V.singleton n
-- error: Couldn't match type 'Important' with 'SomethingElse'
badController :: Request a -> Response a
badController (GetImportantData n) = OtherResponse n
Request a and Response a are phantom types because the type parameter a has nothing to do with the underlying values (e.g. Int in GetImportantData) . Phantom type is widely used for ensuring type safety.
The language extension GADTs permits explicit type declaration of a constructor, make it easy to distinguish between constructors of a data type.
Instead of
data Foo = Bar | Qux
where Bar and Qux both have type Foo, with GADTs one can define
data Foo a where
Bar :: Foo Int
Qux :: Foo Float
by doing so Bar and Qux have different types.
There are some brilliant tutorials about this topic on WikiBooks and Haskell wiki.
https://wiki.haskell.org/Phantom_type
https://en.wikibooks.org/wiki/Haskell/GADT

Deserializing many network messages without using an ad-hoc parser implementation

I have a question pertaining to deserialization. I can envision a solution using Data.Data, Data.Typeable, or with GHC.Generics, but I'm curious if it can be accomplished without generics, SYB, or meta-programming.
Problem Description:
Given a list of [String] that is known to contain the fields of a locally defined algebraic data type, I would like to deserialize the [String] to construct the target data type. I could write a parser to do this, but I'm looking for a generalized solution that will deserialize to an arbitrary number of data types defined within the program without writing a parser for each type. With knowledge of the number and type of value constructors an algebraic type has, it's as simple as performing a read on each string to yield the appropriate values necessary to build up the type. However, I don't want to use generics, reflection, SYB, or meta-programming (unless it's otherwise impossible).
Say I have around 50 types defined similar to this (all simple algebraic types composed of basic primitives (no nested or recursive types, just different combinations and orderings of primitives) :
data NetworkMsg = NetworkMsg { field1 :: Int, field2 :: Int, field3 :: Double}
data NetworkMsg2 = NetworkMsg2 { field1 :: Double, field2 :: Int, field3 :: Double }
I can determine the data-type to be associated with a [String] I've received over the network using a tag id that I parse before each [String].
Possible conjectured solution path:
Since data constructors are first-class values in Haskell, and actually have a type-- Can NetworkMsg constructor be thought of as a function, such as:
NetworkMsg :: Int -> Int -> Double -> NetworkMsg
Could I transform this function into a function on tuples using uncurryN then copy the [String] into a tuple of the same shape the function now takes?
NetworkMsg' :: (Int, Int, Double) -> NetworkMsg
I don't think this would work because I'd need knowledge of the value constructors and type information, which would require Data.Typeable, reflection, or some other metaprogramming technique.
Basically, I'm looking for automatic deserialization of many types without writing type instance declarations or analyzing the type's shape at run-time. If it's not feasible, I'll do it an alternative way.
You are correct in that the constructors are essentially just functions so you can write generic instances for any number of types by just writing instances for the functions. You'll still need to write a separate instance
for all the different numbers of arguments, though.
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
import Text.Read
import Control.Applicative
class FieldParser p r where
parseFields :: p -> [String] -> Maybe r
instance Read a => FieldParser (a -> r) r where
parseFields con [a] = con <$> readMaybe a
parseFields _ _ = Nothing
instance (Read a, Read b) => FieldParser (a -> b -> r) r where
parseFields con [a, b] = con <$> readMaybe a <*> readMaybe b
parseFields _ _ = Nothing
instance (Read a, Read b, Read c) => FieldParser (a -> b -> c -> r) r where
parseFields con [a, b, c] = con <$> readMaybe a <*> readMaybe b <*> readMaybe c
parseFields _ _ = Nothing
{- etc. for as many arguments as you need -}
Now you can use this type class to parse any message based on the constructor as long as the type-checker is able to figure out the resulting message type from context (i.e. it is not able to deduce it simply from the given constructor for these sort of multi-param type class instances).
data Test1 = Test1 {fieldA :: Int} deriving Show
data Test2 = Test2 {fieldB ::Int, fieldC :: Float} deriving Show
test :: String -> [String] -> IO ()
test tag fields = case tag of
"Test1" -> case parseFields Test1 fields of
Just (a :: Test1) -> putStrLn $ "Succesfully parsed " ++ show a
Nothing -> putStrLn "Parse error"
"Test2" -> case parseFields Test2 fields of
Just (a :: Test2) -> putStrLn $ "Succesfully parsed " ++ show a
Nothing -> putStrLn "Parse error"
I'd like to know how exactly you use the message types in the application, though, because having each message as its separate type makes it very difficult to have any sort of generic message handler.
Is there some reason why you don't simply have a single message data type? Such as
data NetworkMsg
= NetworkMsg1 {fieldA :: Int}
| NetworkMsg2 {fieldB :: Int, fieldC :: Float}
Now, while the instances are built in pretty much the same way, you get much better type inference since the result type is always known.
instance Read a => MessageParser (a -> NetworkMsg) where
parseMsg con [a] = con <$> readMaybe a
instance (Read a, Read b) => MessageParser (a -> b -> NetworkMsg) where
parseMsg con [a, b] = con <$> readMaybe a <*> readMaybe b
instance (Read a, Read b, Read c) => MessageParser (a -> b -> c -> NetworkMsg) where
parseMsg con [a, b, c] = con <$> readMaybe a <*> readMaybe b <*> readMaybe c
parseMessage :: String -> [String] -> Maybe NetworkMsg
parseMessage tag fields = case tag of
"NetworkMsg1" -> parseMsg NetworkMsg1 fields
"NetworkMsg2" -> parseMsg NetworkMsg2 fields
_ -> Nothing
I'm also not sure why you want to do type-generic programming specifically without actually using any of the tools meant for generics. GHC.Generics, SYB or Template Haskell is usually the best solution for this kind of problem.

Is there a compiler-extension for untagged union types in Haskell?

In some languages (#racket/typed, for example), the programmer can specify a union type without discriminating against it, for instance, the type (U Integer String) captures integers and strings, without tagging them (I Integer) (S String) in a data IntOrStringUnion = ... form or anything like that.
Is there a way to do the same in Haskell?
Either is what you're looking for... ish.
In Haskell terms, I'd describe what you're looking for as an anonymous sum type. By anonymous, I mean that it doesn't have a defined name (like something with a data declaration). By sum type, I mean a data type that can have one of several (distinguishable) types; a tagged union or such. (If you're not familiar with this terminology, try Wikipedia for starters.)
We have a well-known idiomatic anonymous product type, which is just a tuple. If you want to have both an Int and a String, you just smush them together with a comma: (Int, String). And tuples (seemingly) can go on forever--(Int, String, Double, Word), and you can pattern-match the same way. (There's a limit, but never mind.)
The well-known idiomatic anonymous sum type is Either, from Data.Either (and the Prelude):
data Either a b = Left a | Right b
deriving (Eq, Ord, Read, Show, Typeable)
It has some shortcomings, most prominently a Functor instance that favors Right in a way that's odd in this context. The problem is that chaining it introduces a lot of awkwardness: the type ends up like Either (Int (Either String (Either Double Word))). Pattern matching is even more awkward, as others have noted.
I just want to note that we can get closer to (what I understand to be) the Racket use case. From my extremely brief Googling, it looks like in Racket you can use functions like isNumber? to determine what type is actually in a given value of a union type. In Haskell, we usually do that with case analysis (pattern matching), but that's awkward with Either, and function using simple pattern-matching will likely end up hard-wired to a particular union type. We can do better.
IsNumber?
I'm going to write a function I think is an idiomatic Haskell stand-in for isNumber?. First, we don't like doing Boolean tests and then running functions that assume their result; instead, we tend to just convert to Maybe and go from there. So the function's type will end with -> Maybe Int. (Using Int as a stand-in for now.)
But what's on the left hand of the arrow? "Something that might be an Int -- or a String, or whatever other types we put in the union." Uh, okay. So it's going to be one of a number of types. That sounds like typeclass, so we'll put a constraint and a type variable on the left hand of the arrow: MightBeInt a => a -> Maybe Int. Okay, let's write out the class:
class MightBeInt a where
isInt :: a -> Maybe Int
fromInt :: Int -> a
Okay, now how do we write the instances? Well, we know if the first parameter to Either is Int, we're golden, so let's write that out. (Incidentally, if you want a nice exercise, only look at the instance ... where parts of these next three code blocks, and try to implement that class members yourself.)
instance MightBeInt (Either Int b) where
isInt (Left i) = Just i
isInt _ = Nothing
fromInt = Left
Fine. And ditto if Int is the second parameter:
instance MightBeInt (Either a Int) where
isInt (Right i) = Just i
isInt _ = Nothing
fromInt = Right
But what about Either String (Either Bool Int)? The trick is to recurse on the right hand type: if it's not Int, is it an instance of MightBeInt itself?
instance MightBeInt b => MightBeInt (Either a b) where
isInt (Right xs) = isInt xs
isInt _ = Nothing
fromInt = Right . fromInt
(Note that these all require FlexibleInstances and OverlappingInstances.) It took me a long time to get a feel for writing and reading these class instances; don't worry if this instance is surprising. The punch line is that we can now do this:
anInt1 :: Either Int String
anInt1 = fromInt 1
anInt2 :: Either String (Either Int Double)
anInt2 = fromInt 2
anInt3 :: Either String Int
anInt3 = fromInt 3
notAnInt :: Either String Int
notAnInt = Left "notint"
ghci> isInt anInt3
Just 3
ghci> isInt notAnInt
Nothing
Great!
Generalizing
Okay, but now do we need to write another type class for each type we want to look up? Nope! We can parameterize the class by the type we want to look up! It's a pretty mechanical translation; the only question is how to tell the compiler what type we're looking for, and that's where Proxy comes to the rescue. (If you don't want to install tagged or run base 4.7, just define data Proxy a = Proxy. It's nothing special, but you'll need PolyKinds.)
class MightBeA t a where
isA :: proxy t -> a -> Maybe t
fromA :: t -> a
instance MightBeA t t where
isA _ = Just
fromA = id
instance MightBeA t (Either t b) where
isA _ (Left i) = Just i
isA _ _ = Nothing
fromA = Left
instance MightBeA t b => MightBeA t (Either a b) where
isA p (Right xs) = isA p xs
isA _ _ = Nothing
fromA = Right . fromA
ghci> isA (Proxy :: Proxy Int) anInt3
Just 3
ghci> isA (Proxy :: Proxy String) notAnInt
Just "notint"
Now the usability situation is... better. The main thing we've lost, by the way, is the exhaustiveness checker.
Notational Parity With (U String Int Double)
For fun, in GHC 7.8 we can use DataKinds and TypeFamilies to eliminate the infix type constructors in favor of type-level lists. (In Haskell, you can't have one type constructor--like IO or Either--take a variable number of parameters, but a type-level list is just one parameter.) It's just a few lines, which I'm not really going to explain:
type family OneOf (as :: [*]) :: * where
OneOf '[] = Void
OneOf '[a] = a
OneOf (a ': as) = Either a (OneOf as)
Note that you'll need to import Data.Void. Now we can do this:
anInt4 :: OneOf '[Int, Double, Float, String]
anInt4 = fromInt 4
ghci> :kind! OneOf '[Int, Double, Float, String]
OneOf '[Int, Double, Float, String] :: *
= Either Int (Either Double (Either Float [Char]))
In other words, OneOf '[Int, Double, Float, String] is the same as Either Int (Either Double (Either Float [Char])).
You need some kind of tagging because you need to be able to check if a value is actually an Integer or a String to use it for anything. One way to alleviate having to create a custom ADT for every combination is to use a type such as
{-# LANGUAGE TypeOperators #-}
data a :+: b = L a | R b
infixr 6 :+:
returnsIntOrString :: Integer -> Integer :+: String
returnsIntOrString i
| i `rem` 2 == 0 = R "Even"
| otherwise = L (i * 2)
returnsOneOfThree :: Integer -> Integer :+: String :+: Bool
returnsOneOfThree i
| i `rem` 2 == 0 = (R . L) "Even"
| i `rem` 3 == 0 = (R . R) False
| otherwise = L (i * 2)

Can you pattern match constructors on a type class constrained parameter?

See code example below. It won't compile. I had thought that maybe it's because it has to have a single type for the first parameter in the test function. But that doesn't make sense because if I don't pattern match on it so it will compile, I can call it with both MyObj11 5 and MyObj21 5 which are two different types.
So what is it that restricts so you can't pattern match on constructors with a type class constrained parameter? Or is there some mechanism by which you can?
class SomeClass a where toString :: a -> String
instance SomeClass MyType1 where toString v = "MyType1"
instance SomeClass MyType2 where toString v = "MyType2"
data MyType1 = MyObj11 Int | MyObj12 Int Int
data MyType2 = MyObj21 Int | MyObj22 Int Int
test :: SomeClass a => a -> String
test (MyObj11 x) = "11"
test (MyObj12 x y) = "12" -- Error here if remove 3rd line: rigid type bound error
test (MyObj22 x y) = "22" -- Error here about not match MyType1.
what is it that restricts so you can't pattern match on constructors with a type class constrained parameter?
When you pattern match on an explicit constructor, you commit to a specific data type representation. This data type is not shared among all instances of the class, and so it is simply not possible to write a function that works for all instances in this way.
Instead, you need to associate the different behaviors your want with each instance, like so:
class C a where
toString :: a -> String
draw :: a -> String
instance C MyType1 where
toString v = "MyType1"
draw (MyObj11 x) = "11"
draw (MyObj12 x y) = "12"
instance C MyType2 where
toString v = "MyType2"
draw (MyObj22 x y) = "22"
data MyType1 = MyObj11 Int | MyObj12 Int Int
data MyType2 = MyObj21 Int | MyObj22 Int Int
test :: C a => a -> String
test x = draw x
The branches of your original test function are now distributed amongst the instances.
Some alternative tricks involve using class-associated data types (where you prove to the compiler that a data type is shared amongst all instances), or view patterns (which let you generalize pattern matching).
View patterns
We can use view patterns to clean up the connection between pattern matching and type class instances, a little, allowing us to approximate pattern matching across instances by pattern matching on a shared type.
Here's an example, where we write one function, with two cases, that lets us pattern match against anything in the class.
{-# LANGUAGE ViewPatterns #-}
class C a where
view :: a -> View
data View = One Int
| Two Int Int
data MyType1 = MyObj11 Int | MyObj12 Int Int
instance C MyType1 where
view (MyObj11 n) = One n
view (MyObj12 n m) = Two n m
data MyType2 = MyObj21 Int | MyObj22 Int Int
instance C MyType2 where
view (MyObj21 n) = One n
view (MyObj22 n m) = Two n m
test :: C a => a -> String
test (view -> One n) = "One " ++ show n
test (view -> Two n m) = "Two " ++ show n ++ show m
Note how the -> syntax lets us call back to the right view function in each instance, looking up a custom data type encoding per-type, in order to pattern match on it.
The design challenge is to come up with a view type that captures all the behavior variants you're interested in.
In your original question, you wanted every constructor to have a different behavior, so there's actually no reason to use a view type (dispatching directly to that behavior in each instance already works well enough).

Resources