Haskell data serialization of some data implementing a common type class - haskell

Let's start with the following
data A = A String deriving Show
data B = B String deriving Show
class X a where
  spooge :: a -> Q
[ Some implementations of X for A and B ]
Now let's say we have custom implementations of show and read, named show' and read' respectively which utilize Show as a serialization mechanism. I want show' and read' to have types
show' :: X a => a -> String
read' :: X a => String -> a
So I can do things like
f :: String -> [Q]
f d = map (\x -> spooge $ read' x) d
Where data could have been
[show' (A "foo"), show' (B "bar")]
In summary, I wanna serialize stuff of various types which share a common typeclass so I can call their separate implementations on the deserialized stuff automatically.
Now, I realize you could write some template haskell which would generate a wrapper type, like
data XWrap = AWrap A | BWrap B deriving (Show)
and serialize the wrapped type which would guarantee that the type info would be stored with it, and that we'd be able to get ourselves back at least an XWrap... but is there a better way using haskell ninja-ery?
EDIT
Okay I need to be more application specific. This is an API. Users will define their As, Bs and fs as they see fit. I don't ever want them hacking through the rest of the code updating their XWraps, or switches or anything. The most I'm willing to compromise is one list somewhere of all the A, B, etc. in some format. Why?
Here's the application. A is "Download a file from an FTP server." B is "convert from flac to mp3". A contains username, password, port, etc. information. B contains file path information. There could be MANY As and Bs. Hundreds. As many as people are willing to compile into the program. Two was just an example. A and B are Xs, and Xs shall be called "Tickets." Q is IO (). Spooge is runTicket. I want to read the tickets off into their relevant data types and then write generic code that will runTicket on the stuff read' from the stuff on disk. At some point I have to jam type information into the serialized data.

I'd first like to stress for all our happy listeners out there that XWrap is a very good way, and a lot of the time you can write one yourself faster than writing it using Template Haskell.
You say you can get back "at least an XWrap", as if that meant you couldn't recover the types A and B from XWrap or you couldn't use your typeclass on them. Not true! You can even define
separateAB :: [XWrap] -> ([A],[B])
If you didn't want them mixed together, you should serialise them separately!
This is nicer than haskell ninja-ery; maybe you don't need to handle arbitrary instances, maybe just the ones you made.
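For what it's worth, here is a minimal sketch of the XWrap route. It assumes that A and B also derive Read (the question only derives Show) and uses String as a stand-in for Q; the instance bodies and the roundTrip helper are made up for illustration.

```haskell
data A = A String deriving (Show, Read)
data B = B String deriving (Show, Read)

-- The wrapper carries the type information into the serialised text.
data XWrap = AWrap A | BWrap B deriving (Show, Read)

class X a where
  spooge :: a -> String -- String is a stand-in for Q (assumption)

instance X A where spooge (A s) = "A:" ++ s
instance X B where spooge (B s) = "B:" ++ s

-- The wrapper just dispatches to the wrapped instance.
instance X XWrap where
  spooge (AWrap a) = spooge a
  spooge (BWrap b) = spooge b

separateAB :: [XWrap] -> ([A], [B])
separateAB ws = ([a | AWrap a <- ws], [b | BWrap b <- ws])

-- Serialise with show, deserialise with read, then spooge the result.
roundTrip :: [XWrap] -> [String]
roundTrip = map (spooge . (read :: String -> XWrap) . show)
```

Because read is always asked for an XWrap, there is no ambiguous type: the constructor name in the text decides which instance ends up running.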
Do you really need your original types back? If you feel like using existential types because you just want to spooge your deserialised data, why not either serialise the Q itself, or have some intermediate data type PoisedToSpooge that you serialise, which can deserialise to give you all the data you need for a really good spooging. Why not make it an instance of X too?
You could add a method to your X class that converts to PoisedToSpooge.
You could call it something fun like toPoisedToSpooge, which trips nicely off the tongue, don't you think? :)
Anyway this would remove your typesystem complexity at the same time as resolving the annoying ambiguous type in
f d = map (\x -> spooge $ read' x) d -- oops, the type of read' x depends on the String
You can replace read' with
stringToPoisedToSpoogeToDeserialise :: String -> PoisedToSpooge -- use to deserialise
and define
f d = map (\x -> spooge $ stringToPoisedToSpoogeToDeserialise x) d -- no ambiguous type
which we could of course write more succinctly as
f = map (spooge.stringToPoisedToSpoogeToDeserialise)
although I recognise the irony here in suggesting making your code more succinct. :)

If what you really want is a heterogeneous list then use existential types. If you want serialization then use Cereal + ByteString. If you want dynamic typing, which is what I think your actual goal is, then use Data.Dynamic. If none of this is what you want, or you want me to expand please press the pound key.
Based on your edit, I don't see any reason a list of thunks won't work. In what way does IO () fail to represent both the operations of "Download a file from an FTP server" and "convert from flac to MP3"?
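To make the Data.Dynamic option concrete, here is a small sketch using only base. The values in the list are arbitrary; the point is just that a Dynamic remembers its type and fromDynamic gives it back only when asked for the right one.

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.Maybe (mapMaybe)

-- A heterogeneous list: every element carries its own type representation.
mixed :: [Dynamic]
mixed = [toDyn (1 :: Int), toDyn "two", toDyn (3 :: Int)]

-- fromDynamic returns Nothing for elements of any other type,
-- so mapMaybe keeps exactly the Ints.
ints :: [Dynamic] -> [Int]
ints = mapMaybe fromDynamic

strings :: [Dynamic] -> [String]
strings = mapMaybe fromDynamic
```

Note that Dynamic on its own gives you no serialisation; it only covers the "several types in one list" part.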

I'll assume you want to do more things with deserialised Tickets
than run them, because if not you may as well ask the user to supply a bunch of String -> IO()
or similar, nothing clever needed at all.
If so, hooray! It's not often I feel it's appropriate to recommend advanced language features like this.
class Ticketable a where
  show' :: a -> String
  read' :: String -> Maybe a
  runTicket :: a -> IO ()
  -- other useful things to do with tickets
This all hinges on the type of read'. read' :: Ticketable a => String -> a isn't very useful,
because the only thing it can do with invalid data is crash.
If we change the type to read' :: Ticketable a => String -> Maybe a this can allow us to read from disk and
try all the possibilities or fail altogether.
(Alternatively you could use a parser: parse :: Ticketable a => String -> Maybe (a,String).)
Let's use a GADT to give us ExistentialQuantification without the syntax and with nicer error messages:
{-# LANGUAGE GADTs #-}
data Ticket where
  MkTicket :: Ticketable a => a -> Ticket
showT :: Ticket -> String
showT (MkTicket a) = show' a
runT :: Ticket -> IO ()
runT (MkTicket a) = runTicket a
Notice how the MkTicket constructor supplies the context Ticketable a for free! GADTs are great.
It would be nice to make Ticket an instance of Ticketable, but that won't work, because there would be
an ambiguous type a hidden in it. Let's take functions that read Ticketable types and make them read
Tickets.
ticketize :: Ticketable a => (String -> Maybe a) -> (String -> Maybe Ticket)
ticketize = ((.).fmap) MkTicket -- a little pointfree fun
You could use some unusual sentinel string such as
"\n-+-+-+-+-+-Ticket-+-+-+-Border-+-+-+-+-+-+-+-\n" to separate your serialised data or better, use separate files
altogether. For this example, I'll just use "\n" as the separator.
readTickets :: [String -> Maybe Ticket] -> String -> [Maybe Ticket]
readTickets readers xs = map (foldr orelse (const Nothing) readers) (lines xs)
orelse :: (a -> Maybe b) -> (a -> Maybe b) -> (a -> Maybe b)
(f `orelse` g) x = case f x of
  Nothing -> g x
  just_y -> just_y
Now let's get rid of the Justs and ignore the Nothings:
runAll :: [String -> Maybe Ticket] -> String -> IO ()
runAll ps xs = mapM_ runT . catMaybes $ readTickets ps xs
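If the fold over orelse in readTickets looks opaque, here is the same "first reader that succeeds wins" idea in isolation. Token, readNumber and readFlag are hypothetical stand-ins for Ticket and the ticket readers, chosen so the sketch is pure and needs only base.

```haskell
import Text.Read (readMaybe)

orelse :: (a -> Maybe b) -> (a -> Maybe b) -> (a -> Maybe b)
(f `orelse` g) x = case f x of
  Nothing -> g x
  just_y  -> just_y

-- Hypothetical common result type, playing the role of Ticket.
data Token = Number Int | Flag Bool deriving (Show, Eq)

readNumber, readFlag :: String -> Maybe Token
readNumber = fmap Number . readMaybe
readFlag   = fmap Flag . readMaybe

-- Try each reader in order; fall through to Nothing if none accepts.
firstMatch :: [String -> Maybe Token] -> String -> Maybe Token
firstMatch = foldr orelse (const Nothing)
```

With the readers [readNumber, readFlag], the input "7" is claimed by the first reader, "True" by the second, and anything else falls through to Nothing, which is what catMaybes later discards.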
Let's make a trivial ticket that just prints the contents of some directory
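-- imports needed at the top of the file: Control.Monad (when, (>=>)),
-- Data.Maybe (catMaybes), System.Directory (doesDirectoryExist, getDirectoryContents)
newtype Dir = Dir {unDir :: FilePath} deriving Show
readDir xs = let (front,back) = splitAt 4 xs in
  if front == "dir:" then Just $ Dir back else Nothing
instance Ticketable Dir where
  show' (Dir p) = "dir:" ++ p -- plain path (not show p), so that readDir reads back the same path
  read' = readDir
  runTicket (Dir p) = doesDirectoryExist p >>= flip when
    (getDirectoryContents >=> mapM_ putStrLn $ p)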
and an even more trivial ticket
data HelloWorld = HelloWorld deriving Show
readHW "HelloWorld" = Just HelloWorld
readHW _ = Nothing
instance Ticketable HelloWorld where
  show' HelloWorld = "HelloWorld"
  read' = readHW
  runTicket HelloWorld = putStrLn "Hello World!"
and then put it all together (the last line is deliberately invalid and gets ignored):
myreaders = [ticketize readDir,ticketize readHW]
main = runAll myreaders $ unlines ["HelloWorld","dir:.","HelloWorld","dir:..",",HelloWorld"]

Just use Either. Your users don't even have to wrap it themselves. You have your deserializer wrap it in the Either for you. I don't know exactly what your serialization protocol is, but I assume that you have some way to detect which kind of request, and the following example assumes the first byte distinguishes the two requests:
deserializeRequest :: IO (Either A B)
deserializeRequest = do
  byte <- get1stByte
  case byte of
    0 -> do
      ...
      return $ Left $ A <A's fields>
    1 -> do
      ...
      return $ Right $ B <B's fields>
Then you don't even need to type-class spooge. Just make it a function of Either A B:
spooge :: Either A B -> Q
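A pure sketch of the same idea, in case it helps. The wire format here is an assumption made up for illustration (first character '0' or '1' as the tag, the rest as the single String field), and String stands in for Q.

```haskell
data A = A String deriving (Show, Eq)
data B = B String deriving (Show, Eq)

-- Hypothetical format: one tag character followed by the payload.
deserializeRequest :: String -> Maybe (Either A B)
deserializeRequest ('0' : rest) = Just (Left (A rest))
deserializeRequest ('1' : rest) = Just (Right (B rest))
deserializeRequest _            = Nothing

-- No type class needed: one function handles both cases.
spooge :: Either A B -> String
spooge = either (\(A s) -> "downloading " ++ s) (\(B s) -> "converting " ++ s)
```

The catch, given the edit to the question, is that Either only covers two request types; with hundreds of them you are back to a big sum type.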

Related

Putting a type in the Read typeclass doesn't work in the REPL

I'm defining a type GosperInteger, representing the Eisenstein integers in a complex base, and I'd like to enter these numbers in the REPL and do operations on them. So I put the type in the Read and Show typeclasses. Here's the code (there's also an Internals module, see https://github.com/phma/gosperbase to run it):
module Data.GosperBase where
import Data.Array.Unboxed
import Data.Word
import Data.GosperBase.Internals
import qualified Data.Sequence as Seq
import Data.Sequence ((><), (<|), (|>), Seq((:<|)), Seq((:|>)))
import Data.Char
import Data.List
import Data.Maybe
{- This computes complex numbers in base 2.5-√(-3/4), called the Gosper base
because it is the scale factor from one Gosper island to the next bigger one.
The digits are cyclotomic:
 2 3
6 0 1
 4 5
For layout of all numbers up to 3 digits, see doc/GosperBase.ps .
-}
newtype GosperInteger = GosperInteger (Seq.Seq Word)
chunkDigitsInt :: Seq.Seq Char -> Maybe (Seq.Seq (Seq.Seq Char))
-- ^If the string ends in 'G', reverses the rest of the characters
-- and groups them into chunks of digitsPerLimb.
chunkDigitsInt (as:|>'G') = Just (Seq.reverse (Seq.chunksOf (fromIntegral digitsPerLimb) (Seq.reverse as)))
chunkDigitsInt as = Nothing
parseChunkRjust :: Seq.Seq Char -> Maybe Word
parseChunkRjust Seq.Empty = Just 0
parseChunkRjust (n:<|ns) =
  let ms = parseChunkRjust ns
  in case ms of
    Just num -> if (n >= '0' && n < '7')
      then Just (7 * num + fromIntegral (ord n - ord '0'))
      else Nothing
    Nothing -> Nothing
showLimb :: Word -> Word -> String
showLimb _ 0 = ""
showLimb val ndig = chr (fromIntegral ((val `div` 7 ^ (ndig-1)) `mod` 7) + ord '0') : (showLimb val (ndig-1))
parseRjust :: Seq.Seq Char -> Maybe (Seq.Seq Word)
parseRjust as =
  let ns = chunkDigitsInt as
  in case ns of
    Just chunks -> traverse parseChunkRjust chunks
    Nothing -> Nothing
showRjust' :: Seq.Seq Word -> String
showRjust' Seq.Empty = ""
showRjust' (a:<|as) = (showLimb a digitsPerLimb) ++ (showRjust' as)
showRjust :: Seq.Seq Word -> String
showRjust Seq.Empty = "0"
showRjust (a:<|as) = (showLimb a (snd (msdPosLimb a))) ++ (showRjust' as)
parse1InitTail :: (String, String) -> Maybe (GosperInteger, String)
parse1InitTail (a,b) =
  let aParse = parseRjust (Seq.fromList a)
  in case aParse of
    Just mant -> Just (GosperInteger mant,b)
    Nothing -> Nothing
parseGosperInteger :: String -> [(GosperInteger, String)]
parseGosperInteger str =
  let its = zip (inits str) (tails str) -- TODO stop on invalid char
  in catMaybes (fmap parse1InitTail its)
instance Read GosperInteger where
  readsPrec _ str = parseGosperInteger str
instance Show GosperInteger where
  show (GosperInteger m) = showRjust m ++ "G"
iAdd :: GosperInteger -> GosperInteger -> GosperInteger
iAdd (GosperInteger a) (GosperInteger b) =
  GosperInteger (stripLeading0 (addRjust a b))
iMult :: GosperInteger -> GosperInteger -> GosperInteger
iMult (GosperInteger a) (GosperInteger b) =
  GosperInteger (stripLeading0 (mulMant a b))
I'd like to do
> 425G * 256301G
16061525G
which requires putting GosperInteger in the Num typeclass, which I haven't done yet.
Showing a number works, and calling read on a string works, but reading a number typed into the REPL does not. Why?
> read "45G" :: GosperInteger
45G
> 45G
<interactive>:2:3: error: Data constructor not in scope: G
It is not possible to do that in a proper way (you can probably bodge this by writing an odd Num instance).
I think a better approach would be to just write that num instance, then you can write:
ghci> 425 * 256301 :: GosperInteger
16061525
If you don't want to have to write that :: GosperInteger signature you can do a few things:
Use ghci> default (GosperInteger, Double) that will mean it will automatically pick your GosperInteger type if there is ambiguity. You can also use this in normal source files.
Define a function g :: GosperInteger -> GosperInteger; g = id which you can use to disambiguate manually with less syntactic overhead:
ghci> g (425 * 256301)
16061525
The GHCi repl doesn't simply call read on the text that you type in. Instead, it has a much more complicated parser that separates your text into various tokens. One type of token is numeric: any integral number you type in will get "read" as an Integer. Of course, if you type 32 and want it to be an Int, not an Integer, this would be a problem, so the Num type class has a super convenient fromInteger function. With this, an Integer token can be converted into any instance of the Num class.
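To see fromInteger doing that job on a user-defined type, here is a toy sketch. Mod7 is a made-up type (integers modulo 7), not the GosperInteger from the question, and g is the disambiguating identity helper suggested in the other answer.

```haskell
-- Toy numeric type, purely for illustration.
newtype Mod7 = Mod7 Integer deriving (Show, Eq)

instance Num Mod7 where
  -- Every integer literal used at type Mod7 goes through this function.
  fromInteger n = Mod7 (n `mod` 7)
  Mod7 a + Mod7 b = fromInteger (a + b)
  Mod7 a * Mod7 b = fromInteger (a * b)
  negate (Mod7 a) = fromInteger (negate a)
  abs = id
  signum (Mod7 a) = fromInteger (signum a)

-- Identity function that only pins down the type.
g :: Mod7 -> Mod7
g = id
```

Writing g (3 * 5) makes both literals Mod7 values via fromInteger, so the multiplication is the one from the instance above and the result is Mod7 1.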
But, you want something slightly different: you want the parser to group together the numeric token along with the G token and treat them as one unit. For full support, you'd need to make an extension to the GHC parser, much like how if you type 2e7 into the prompt, you correctly get a floating point number. This isn't a simple change you can address in your source file or GHCi settings.
With all that said, there are some hacks we can play with. As Noughtmare mentions, "you can probably bodge this by writing an odd Num instance", and indeed you can! Fair warning: you probably don't want to do this, but let's explore it anyway.
The problem is that the parser returned two tokens, one that's numeric and the other that's G. Since it's uppercase, that G token is being interpreted as a data constructor (your error message pointed that out too: "Data constructor not in scope: G"). The key is to use this to our advantage.
Consider the following:
data G = G
  deriving Show
instance Num (G -> GosperInteger) where
  fromInteger i G = integerToGosperInteger i
Now, assuming you wrote that function integerToGosperInteger (and enabled FlexibleInstances, which this instance head needs), this instance would let you type, e.g., 45G and produce a GosperInteger 45G. Hurrah! You can even do 425G * 256301G and it will work as expected. Furthermore, if you cleverly omit a fromInteger definition from your Num GosperInteger instance, then you'll get a runtime error if you try to simply use a number like 425 as a GosperInteger (that is, you'll get an error for implicit coercions that don't have the G).
There are some problems.
If you try this, you'll find that type inference is pretty terrible. It probably won't work right at the prompt unless you set default (GosperInteger, Double), and you'll probably want to use lots of type annotations in your source files.
If you leave out the G, you'll get terrible type error messages or, even worse, runtime errors.
You'll get a warning that your Num instance for G -> GosperInteger is incomplete. It is incomplete, but there's no sensible definitions for anything else. You could suppress the warning or set all of the missing methods to error "This isn't how this is supposed to be used" or something, but it's still a bit of a blemish in the code.
But, if you can deal with the problems and you squint hard enough, it sorta kinda gets you what you want.

How to interact with pure algorithm in IO code

To illustrate the point with a trivial example, say I have implemented filter:
filter :: (a -> Bool) -> [a] -> [a]
And I have a predicate p that interacts with the real world:
p :: a -> IO Bool
How do I make it work with filter without writing a separate implementation:
filterIO :: (a -> IO Bool) -> [a] -> IO [a]
Presumably if I can turn p into p':
p' :: IO (a -> Bool)
Then I can do
main :: IO ()
main = do
  p'' <- p'
  print $ filter p'' [1..100]
But I haven't been able to find the conversion.
Edited:
As people have pointed out in the comment, such a conversion doesn't make sense as it would break the encapsulation of the IO Monad.
Now the question is, can I structure my code so that the pure and IO versions don't completely duplicate the core logic?
How do I make it work with filter without writing a separate implementation
That isn't possible and the fact this sort of thing isn't possible is by design - Haskell places firm limits on its types and you have to obey them. You cannot sprinkle IO all over the place willy-nilly.
Now the question is, can I structure my code so that the pure and IO versions don't completely duplicate the core logic?
You'll be interested in filterM. Then, you can get both the functionality of filterIO by using the IO monad and the pure functionality using the Identity monad. Of course, for the pure case, you now have to pay the extra price of wrapping/unwrapping (or coerceing) the Identity wrapper. (Side remark: since Identity is a newtype this is only a code readability cost, not a runtime one.)
ghci> data Color = Red | Green | Blue deriving (Read, Show, Eq)
Here is a monadic example (note that the lines containing only Red, Blue, and Blue are user-entered at the prompt):
ghci> filterM (\x -> do y<-readLn; pure (x==y)) [Red,Green,Blue]
Red
Blue
Blue
[Red,Blue] :: IO [Color]
Here is a pure example:
ghci> filterM (\x -> Identity (x /= Green)) [Red,Green,Blue]
Identity [Red,Blue] :: Identity [Color]
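Put together as a source file, the idea might look like the sketch below: the core logic is written once against an arbitrary monad and then instantiated twice. The names keepMatching, purePick and ioPick are made up for this example.

```haskell
import Control.Monad (filterM)
import Data.Functor.Identity (Identity (..))

data Color = Red | Green | Blue deriving (Read, Show, Eq)

-- Core logic, written once for any monad.
keepMatching :: Monad m => (a -> m Bool) -> [a] -> m [a]
keepMatching = filterM

-- Pure use: wrap the predicate in Identity and unwrap the result.
purePick :: [Color] -> [Color]
purePick = runIdentity . keepMatching (Identity . (/= Green))

-- Effectful use: the predicate may do IO (here it just logs each check).
ioPick :: [Color] -> IO [Color]
ioPick = keepMatching $ \c -> do
  putStrLn ("checking " ++ show c)
  pure (c /= Green)
```

Both functions keep Red and Blue from [Red,Green,Blue]; only ioPick prints anything along the way.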
As already said, you can use filterM for this specific task. However, it is usually better to keep with Haskell's characteristic strict separation of IO and calculations. In your case, you can just tick off all necessary IO in one go and then do the interesting filtering in nice, reliable, easily testable pure code (i.e. here, simply with the normal filter):
type A = Int
type Annotated = (A, Bool)
p' :: Annotated -> Bool
p' = snd
main :: IO ()
main = do
  candidates <- forM [1..100] $ \n -> do
    permitted <- p n
    return (n, permitted)
  print $ fst <$> filter p' candidates
Here, we first annotate each number with a flag indicating what the environment says. This flag can then simply be read out in the actual filtering step, without requiring any further IO.
In short, this would be written (the (n,) tuple section needs the TupleSections extension):
main :: IO ()
main = do
  candidates <- forM [1..100] $ \n -> (n,) <$> p n
  print $ fst <$> filter snd candidates
While it is not feasible for this specific task, I'd also add that you can in principle achieve the IO separation with something like your p'. This requires that the type A is “small enough” that you can evaluate the predicate with all values that are possible at all. For instance,
import qualified Data.Map as Map
type A = Char
p' :: IO (A -> Bool)
p' = (Map.!) . Map.fromList <$> mapM (\c -> (c,) <$> p c) ['\0'..]
This evaluates the predicate once for all of the 1114112 chars there are and stores the results in a lookup table.
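The same tabulation trick, sketched for any small Bounded/Enum type and using only base (an association list instead of Data.Map). Suit, isRed and the call counter are hypothetical; the counter is only there to show that the effectful predicate runs once per possible value, not once per list element.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

data Suit = Hearts | Spades | Clubs | Diamonds
  deriving (Show, Eq, Enum, Bounded)

-- Run the effectful predicate on every value of the type up front,
-- then answer later queries from the table without further IO.
tabulate :: (Bounded a, Enum a, Eq a) => (a -> IO Bool) -> IO (a -> Bool)
tabulate q = do
  table <- mapM (\x -> (,) x <$> q x) [minBound .. maxBound]
  pure (\x -> lookup x table == Just True)

-- Hypothetical effectful predicate that counts its own invocations.
isRed :: IORef Int -> Suit -> IO Bool
isRed counter s = do
  modifyIORef' counter (+ 1)
  pure (s `elem` [Hearts, Diamonds])
```

After red <- tabulate (isRed counter), the plain filter red works on any list of suits, and the counter stays at 4 however long that list is.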

Parsing to Free Monads

Say I have the following free monad:
data ExampleF a
  = Foo Int a
  | Bar String (Int -> a)
  deriving Functor
type Example = Free ExampleF -- this is the free monad I want to discuss
I know how I can work with this monad, eg. I could write some nice helpers:
foo :: Int -> Example ()
foo i = liftF $ Foo i ()
bar :: String -> Example Int
bar s = liftF $ Bar s id
So I can write programs in haskell like:
fooThenBar :: Example Int
fooThenBar =
  do
    foo 10
    bar "nice"
I know how to print it, interpret it, etc. But what about parsing it?
Would it be possible to write a parser that could parse arbitrary
programs like:
foo 12
bar nice
foo 11
foo 42
So I can store them, serialize them, use them in cli programs etc.
The problem I keep running into is that the type of the program depends on which program is being parsed. If the program ends with a foo it's of
type Example () if it ends with a bar it's of type Example Int.
I do not feel like writing parsers for every possible permutation (it's simple here because there are only two possibilities, but imagine we add
Baz Int (String -> a), Doo (Int -> a), Moz Int a, Foz String a, .... This gets tedious and error-prone).
Perhaps I'm solving the wrong problem?
Boilerplate
To run the above examples, you need to add this to the beginning of the file:
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad.Free
import Text.ParserCombinators.Parsec
Note: I put up a gist containing this code.
Not every Example value can be represented on the page without reimplementing some portion of Haskell. For example, return putStrLn has a type of Example (String -> IO ()), but I don't think it makes sense to attempt to parse that sort of Example value out of a file.
So let's restrict ourselves to parsing the examples you've given, which consist only of calls to foo and bar sequenced with >> (that is, no variable bindings and no arbitrary computations)*. The Backus-Naur form for our grammar looks approximately like this:
<program> ::= "" | <expr> "\n" <program>
<expr> ::= "foo " <integer> | "bar " <string>
It's straightforward enough to parse our two types of expression...
type Parser = Parsec String ()
int :: Parser Int
int = fmap read (many1 digit)
parseFoo :: Parser (Example ())
parseFoo = string "foo " *> fmap foo int
parseBar :: Parser (Example Int)
parseBar = string "bar " *> fmap bar (many1 alphaNum)
... but how can we give a type to the composition of these two parsers?
parseExpr :: Parser (Example ???)
parseExpr = parseFoo <|> parseBar
parseFoo and parseBar have different types, so we can't compose them with <|> :: Alternative f => f a -> f a -> f a. Moreover, there's no way to know ahead of time which type the program we're given will be: as you point out, the type of the parsed program depends on the value of the input string. "Types depending on values" is called dependent types; Haskell doesn't feature a proper dependent type system, but it comes close enough for us to have a stab at making this example work.
Let's start by forcing the expressions on either side of <|> to have the same type. This involves erasing Example's type parameter using existential quantification.†
data Ex a = forall i. Wrap (a i)
parseExpr :: Parser (Ex Example)
parseExpr = fmap Wrap parseFoo <|> fmap Wrap parseBar
This typechecks, but the parser now returns an Example containing a value of an unknown type. A value of unknown type is of course useless - but we do know something about Example's parameter: it must be either () or Int because those are the return types of parseFoo and parseBar. Programming is about getting knowledge out of your brain and onto the page, so we're going to wrap up the Example value with a bit of GADT evidence which, when unwrapped, will tell you whether a was Int or ().
data Ty a where
  IntTy :: Ty Int
  UnitTy :: Ty ()
data (a :*: b) i = a i :&: b i
type Sig a b = Ex (a :*: b)
pattern Sig x y = Wrap (x :&: y)
parseExpr :: Parser (Sig Ty Example)
parseExpr = fmap (\x -> Sig UnitTy x) parseFoo <|>
            fmap (\x -> Sig IntTy x) parseBar
Ty is (something like) a runtime "singleton" representative of Example's type parameter. When you pattern match on IntTy, you learn that a ~ Int; when you pattern match on UnitTy you learn that a ~ (). (Information can be made to flow the other way, from types to values, using classes.) :*:, the functor product, pairs up two type constructors ensuring that their parameters are equal; thus, pattern matching on the Ty tells you about its accompanying Example.
Sig is therefore called a dependent pair or sigma type - the type of the second component of the pair depends on the value of the first. This is a common technique: when you erase a type parameter by existential quantification, it usually pays to make it recoverable by bundling up a runtime representative of that parameter.
Note that this use of Sig is equivalent to Either (Example Int) (Example ()) - a sigma type is a sum, after all - but this version scales better when you're summing over a large (or possibly infinite) set.
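Stripped of the parser and the free monad, the "bundle a runtime representative with the erased type" technique can be sketched as follows. Some and describe are made-up names for this illustration; Ty is the same singleton as above.

```haskell
{-# LANGUAGE GADTs #-}

-- Runtime representative of the hidden type parameter.
data Ty a where
  IntTy  :: Ty Int
  UnitTy :: Ty ()

-- A dependent pair: the type of the second field is hidden,
-- but the first field records which type it is.
data Some where
  Some :: Ty a -> a -> Some

-- Matching on the representative tells the type checker what 'a' is,
-- so the payload can be used at its real type again.
describe :: Some -> String
describe (Some IntTy n)  = "Int: " ++ show (n + 1)
describe (Some UnitTy _) = "unit"
```

In the IntTy branch the payload n is known to be an Int, which is why n + 1 type checks even though Some itself mentions no type parameter.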
Now it's easy to build our expression parser into a program parser. We just have to repeatedly apply the expression parser, and then manipulate the dependent pairs in the list.
parseProgram :: Parser (Sig Ty Example)
parseProgram = fmap (foldr1 combine) $ parseExpr `sepBy1` (char '\n')
  where combine (Sig _ val) (Sig ty acc) = Sig ty (val >> acc)
The code I've shown you is not exemplary. It doesn't separate the concerns of parsing and typechecking. In production code I would modularise this design by first parsing the data into an untyped syntax tree - a separate data type which doesn't enforce the typing invariant - then transform that into a typed version by type-checking it. The dependent pair technique would still be necessary to give a type to the output of the type-checker, but it wouldn't be tangled up in the parser.
*If binding is not a requirement, have you thought about using a free applicative to represent your data?
†Ex and :*: are reusable bits of machinery which I lifted from the Hasochism paper
So, I worry that this is the same sort of premature abstraction that you see in object-oriented languages, getting in the way of things. For example, I am not 100% sure that you are using the structure of the free monad -- your helpers for example simply seem to use id and () in a rather boring way, in fact I'm not sure if your Int -> x is ever anything other than either Pure :: Int -> Free ExampleF Int or const (something :: Free ExampleF Int).
The free monad for a functor F can basically be described as a tree whose data is stored in leaves and whose branching factor is controlled by the recursion in each constructor of the functor F. So for example Free Identity has no branching, hence only one leaf, and thus has the same structure as the monad:
data MonoidalFree m x = MF m x deriving (Functor)
instance Monoid m => Monad (MonoidalFree m) where
  return x = MF mempty x
  MF m x >>= my_x = case my_x x of MF n y -> MF (mappend m n) y
In fact Free Identity is isomorphic to MonoidalFree (Sum Integer), the difference is just that instead of MF (Sum 3) "Hello" you see Free . Identity . Free . Identity . Free . Identity $ Pure "Hello" as the means of tracking this integer. On the other hand if you have data E x = L x | R x deriving (Functor) then you get a sort of "path" of Ls and Rs before you hit this one leaf, Free E is going to be isomorphic to MonoidalFree [Bool].
The reason I'm going through this is that when you combine Free with an Integer -> x functor, you get an infinitely branching tree, and when I'm looking through your code to figure out how you're actually using this tree, all I see is that you use the id function with it. As far as I can tell, that restricts the recursion to either have the form Free (Bar "string" Pure) or else Free (Bar "string" (const subExpression)), in which case the system would seem to reduce completely to the MonoidalFree [Either Int String] monad.
(At this point I should pause to ask: Is that correct as far as you know? Was this what was intended?)
Anyway. Aside from my problems with your premature abstraction, the specific problem that you're citing with your monad (you can't tell the difference between () and Int) has a bunch of really complicated solutions, but one really easy one. The really easy solution is to yield a value of type Example (Either () Int) and if you have a () you can fmap Left onto it and if you have an Int you can fmap Right onto it.
Without a much better understanding of how you're using this thing over TCP/IP we can't recommend a better structure for you than the generic free monads that you seem to be finding -- in particular we'd need to know how you're planning on using the infinite-branching of Int -> x options in practice.
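Here is a rough sketch of that easy solution. To keep it runnable with base alone it makes several assumptions that are not in the question: a minimal Free is defined locally instead of importing Control.Monad.Free, the "parser" just splits lines with words rather than using Parsec, and runPure is a made-up interpreter in which bar answers with the length of its argument.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Text.Read (readMaybe)

-- Minimal local free monad (stand-in for Control.Monad.Free).
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

data ExampleF a = Foo Int a | Bar String (Int -> a) deriving Functor
type Example = Free ExampleF

foo :: Int -> Example ()
foo i = liftF (Foo i ())

bar :: String -> Example Int
bar s = liftF (Bar s id)

-- Every line gets the same result type: Left () for foo, Right n for bar.
parseLine :: String -> Maybe (Example (Either () Int))
parseLine l = case words l of
  ["foo", n] -> do
    i <- readMaybe n
    Just (Left <$> foo i)
  ["bar", s] -> Just (Right <$> bar s)
  _          -> Nothing

-- Sequence the lines with (>>); the last line decides Left or Right.
parseProgram :: String -> Maybe (Example (Either () Int))
parseProgram src = case lines src of
  [] -> Nothing
  ls -> foldr1 (>>) <$> traverse parseLine ls

-- Hypothetical pure interpreter: logs each step, bar s yields length s.
runPure :: Example a -> ([String], a)
runPure (Pure a)         = ([], a)
runPure (Free (Foo i k)) = let (l, a) = runPure k in (("foo " ++ show i) : l, a)
runPure (Free (Bar s k)) = let (l, a) = runPure (k (length s)) in (("bar " ++ s) : l, a)
```

Since all lines share one type, the usual list combinators work, at the cost of having to inspect the Either after running the program.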

Deserializing many network messages without using an ad-hoc parser implementation

I have a question pertaining to deserialization. I can envision a solution using Data.Data, Data.Typeable, or with GHC.Generics, but I'm curious if it can be accomplished without generics, SYB, or meta-programming.
Problem Description:
Given a list of [String] that is known to contain the fields of a locally defined algebraic data type, I would like to deserialize the [String] to construct the target data type. I could write a parser to do this, but I'm looking for a generalized solution that will deserialize to an arbitrary number of data types defined within the program without writing a parser for each type. With knowledge of the number and type of value constructors an algebraic type has, it's as simple as performing a read on each string to yield the appropriate values necessary to build up the type. However, I don't want to use generics, reflection, SYB, or meta-programming (unless it's otherwise impossible).
Say I have around 50 types defined similar to this (all simple algebraic types composed of basic primitives (no nested or recursive types, just different combinations and orderings of primitives) :
data NetworkMsg = NetworkMsg { field1 :: Int, field2 :: Int, field3 :: Double}
data NetworkMsg2 = NetworkMsg2 { field1 :: Double, field2 :: Int, field3 :: Double }
I can determine the data-type to be associated with a [String] I've received over the network using a tag id that I parse before each [String].
Possible conjectured solution path:
Since data constructors are first-class values in Haskell, and actually have a type-- Can NetworkMsg constructor be thought of as a function, such as:
NetworkMsg :: Int -> Int -> Double -> NetworkMsg
Could I transform this function into a function on tuples using uncurryN then copy the [String] into a tuple of the same shape the function now takes?
NetworkMsg' :: (Int, Int, Double) -> NetworkMsg
I don't think this would work because I'd need knowledge of the value constructors and type information, which would require Data.Typeable, reflection, or some other metaprogramming technique.
Basically, I'm looking for automatic deserialization of many types without writing type instance declarations or analyzing the type's shape at run-time. If it's not feasible, I'll do it an alternative way.
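For reference, the tuple idea can be sketched for one arity as below. This is only an illustration under assumptions: uncurry3 and readTuple are made-up helpers, one uncurryN is still needed per arity, and gluing the fields into a tuple literal only works if every field is a single well-formed token (no stray commas or parentheses).

```haskell
import Data.List (intercalate)
import Text.Read (readMaybe)

data NetworkMsg = NetworkMsg Int Int Double deriving (Show, Eq)

-- Turn a three-argument constructor into a function on triples.
uncurry3 :: (a -> b -> c -> r) -> (a, b, c) -> r
uncurry3 f (a, b, c) = f a b c

-- Build the text of a tuple literal and let the tuple's Read instance
-- parse it; the tuple type is inferred from how the result is used.
readTuple :: Read t => [String] -> Maybe t
readTuple fields = readMaybe ("(" ++ intercalate "," fields ++ ")")

parseNetworkMsg :: [String] -> Maybe NetworkMsg
parseNetworkMsg = fmap (uncurry3 NetworkMsg) . readTuple
```

Here the field types come from the constructor's own type, so nothing is inspected at run time, but the shape of each message is still written out once where the constructor is uncurried.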
You are correct in that the constructors are essentially just functions so you can write generic instances for any number of types by just writing instances for the functions. You'll still need to write a separate instance
for all the different numbers of arguments, though.
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
import Text.Read
import Control.Applicative
class FieldParser p r where
  parseFields :: p -> [String] -> Maybe r
instance Read a => FieldParser (a -> r) r where
  parseFields con [a] = con <$> readMaybe a
  parseFields _ _ = Nothing
instance (Read a, Read b) => FieldParser (a -> b -> r) r where
  parseFields con [a, b] = con <$> readMaybe a <*> readMaybe b
  parseFields _ _ = Nothing
instance (Read a, Read b, Read c) => FieldParser (a -> b -> c -> r) r where
  parseFields con [a, b, c] = con <$> readMaybe a <*> readMaybe b <*> readMaybe c
  parseFields _ _ = Nothing
{- etc. for as many arguments as you need -}
Now you can use this type class to parse any message based on the constructor as long as the type-checker is able to figure out the resulting message type from context (i.e. it is not able to deduce it simply from the given constructor for these sort of multi-param type class instances).
data Test1 = Test1 {fieldA :: Int} deriving Show
data Test2 = Test2 {fieldB ::Int, fieldC :: Float} deriving Show
-- the pattern signatures below need {-# LANGUAGE ScopedTypeVariables #-}
test :: String -> [String] -> IO ()
test tag fields = case tag of
  "Test1" -> case parseFields Test1 fields of
    Just (a :: Test1) -> putStrLn $ "Successfully parsed " ++ show a
    Nothing -> putStrLn "Parse error"
  "Test2" -> case parseFields Test2 fields of
    Just (a :: Test2) -> putStrLn $ "Successfully parsed " ++ show a
    Nothing -> putStrLn "Parse error"
  _ -> putStrLn "Unknown tag"
I'd like to know how exactly you use the message types in the application, though, because having each message as its separate type makes it very difficult to have any sort of generic message handler.
Is there some reason why you don't simply have a single message data type? Such as
data NetworkMsg
  = NetworkMsg1 {fieldA :: Int}
  | NetworkMsg2 {fieldB :: Int, fieldC :: Float}
Now, while the instances are built in pretty much the same way, you get much better type inference since the result type is always known.
class MessageParser p where
  parseMsg :: p -> [String] -> Maybe NetworkMsg
instance Read a => MessageParser (a -> NetworkMsg) where
  parseMsg con [a] = con <$> readMaybe a
  parseMsg _ _ = Nothing
instance (Read a, Read b) => MessageParser (a -> b -> NetworkMsg) where
  parseMsg con [a, b] = con <$> readMaybe a <*> readMaybe b
  parseMsg _ _ = Nothing
instance (Read a, Read b, Read c) => MessageParser (a -> b -> c -> NetworkMsg) where
  parseMsg con [a, b, c] = con <$> readMaybe a <*> readMaybe b <*> readMaybe c
  parseMsg _ _ = Nothing
parseMessage :: String -> [String] -> Maybe NetworkMsg
parseMessage tag fields = case tag of
  "NetworkMsg1" -> parseMsg NetworkMsg1 fields
  "NetworkMsg2" -> parseMsg NetworkMsg2 fields
  _ -> Nothing
I'm also not sure why you want to do type-generic programming specifically without actually using any of the tools meant for generics. GHC.Generics, SYB or Template Haskell is usually the best solution for this kind of problem.

Binary instance for an existential

Given an existential data type, for example:
data Foo = forall a . (Typeable a, Binary a) => Foo a
I'd like to write instance Binary Foo. I can write the serialisation (serialise the TypeRep then serialise the value), but I can't figure out how to write the deserialisation. The basic problem is that given a TypeRep you need to map back to the type dictionary for that type - and I don't know if that can be done.
This question has been asked before on the haskell mailing list http://www.haskell.org/pipermail/haskell/2006-September/018522.html, but no answers were given.
You need some way that each Binary instance can register itself (just as in your witness version). You can do this by bundling each instance declaration with an exported foreign symbol, where the symbol name is derived from the TypeRep. Then when you want to deserialize you get the name from the TypeRep and look up that symbol dynamically (with dlsym() or something similar). The value exported by the foreign export can, e.g., be the deserializer function.
It's crazy ugly, but it works.
This can be solved in GHC 7.10 and onwards using the Static Pointers Language extension:
{-# LANGUAGE StaticPointers #-}
{-# LANGUAGE InstanceSigs #-}
data Foo = forall a . (StaticFoo a, Binary a, Show a) => Foo a
class StaticFoo a where
  staticFoo :: a -> StaticPtr (Get Foo)
instance StaticFoo String where
  staticFoo _ = static (Foo <$> (get :: Get String))
instance Binary Foo where
  put (Foo x) = do
    put $ staticKey $ staticFoo x
    put x
  get = do
    ptr <- get
    case unsafePerformIO (unsafeLookupStaticPtr ptr) of
      Just value -> deRefStaticPtr value :: Get Foo
      Nothing -> error "Binary Foo: unknown static pointer"
A full description of the solution can be found on this blog post, and a complete snippet here.
If you could do that, you would also be able to implement:
isValidRead :: TypeRep -> String -> Bool
This would be a function that changes its behavior due to someone defining a new type! Not very pure-ish.. I think (and hope) that one can't implement this in Haskell..
I have an answer that slightly works in some situations (not enough for my purposes), but may be the best that can be done. You can add a witness function to witness any types that you have, and then the deserialisation can lookup in the witness table. The rough idea is (untested):
witnesses :: IORef [Foo]
witnesses = unsafePerformIO $ newIORef []
witness :: (Typeable a, Binary a) => a -> IO ()
witness x = modifyIORef witnesses (Foo x :)
instance Binary Foo where
  put (Foo x) = put (typeOf x) >> put x
  get = do
    ty <- get
    let wits = unsafePerformIO $ readIORef witnesses
    case [Foo x | Foo x <- wits, typeOf x == ty] of
      Foo x:_ -> fmap Foo $ get `asTypeOf` return x
      [] -> error $ "Could not find a witness for the type: " ++ show ty
The idea is that as you go through, you call witness on values of every type that you may plausibly encounter when deserialising. When you deserialise you search this list. The obvious problem is that if you fail to call witness before deserialisation you get a crash.
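A variation on the same theme that avoids the global IORef: pass an explicit table of known types to the deserialiser. To stay within base, this sketch swaps Binary for Show/Read and uses the printed TypeRep as the tag; Registry, entry, serialise, deserialise and unFoo are all made-up names for illustration, not part of any library.

```haskell
{-# LANGUAGE ExistentialQuantification, ScopedTypeVariables #-}
import Data.Typeable (Proxy (..), Typeable, cast, typeOf, typeRep)
import Text.Read (readMaybe)

-- Show/Read stand in for Binary here (assumption for the sketch).
data Foo = forall a. (Typeable a, Show a, Read a) => Foo a

-- One reader per registered type, keyed by the printed type name.
type Registry = [(String, String -> Maybe Foo)]

entry :: forall a. (Typeable a, Show a, Read a) => Proxy a -> (String, String -> Maybe Foo)
entry p = (show (typeRep p), \s -> Foo <$> (readMaybe s :: Maybe a))

-- Store the type tag next to the payload.
serialise :: Foo -> (String, String)
serialise (Foo x) = (show (typeOf x), show x)

-- Look the tag up in the registry; unknown types give Nothing, not a crash.
deserialise :: Registry -> (String, String) -> Maybe Foo
deserialise reg (tag, payload) = lookup tag reg >>= \reader -> reader payload

registry :: Registry
registry = [entry (Proxy :: Proxy Int), entry (Proxy :: Proxy Bool)]

-- Recover a concrete value when the caller knows which type to expect.
unFoo :: Typeable b => Foo -> Maybe b
unFoo (Foo x) = cast x
```

The trade-off is the same as with witness: a type that was never registered cannot be read back, but here that shows up as a Nothing from a pure function.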

Resources