How to obtain a Data.Data.Constr etc. from a Type Representation?

I'm currently writing a minimalistic Haskell persistence framework that uses Data.Data Generics to provide persistence operations for data types in record syntax (I call them Entities here).
This works quite well overall (see the code repo here: https://github.com/thma/generic-persistence); I've got just one ugly spot left.
Currently my function for looking up an entity by primary key has the following signature:
retrieveEntityById
  :: forall a conn id. (Data a, IConnection conn, Show id)
  => conn -> TypeInfo -> id -> IO a
This function takes an HDBC database connection, a TypeInfo object and the primary key value id.
The TypeInfo contains data describing the type a. This info is used to generate and execute a select statement for the primary-key lookup and to construct an instance of type a from the HDBC result row.
TypeInfo contains a Data.Data.Constr plus a description of all constructor fields, which are obtained with the following function:
-- | A function that returns a list of FieldInfos representing the
--   name, constructor and type of each field in a data type.
fieldInfo :: (Data a) => a -> [FieldInfo]
fieldInfo x = zipWith3 FieldInfo names constrs types
  where
    constructor = toConstr x
    candidates = constrFields constructor
    constrs = gmapQ toConstr x
    types = gmapQ typeOf x
    names =
      if length candidates == length constrs
        then map Just candidates
        else replicate (length constrs) Nothing
Deriving this kind of information works great when an actual Data a value is at hand (for example in my function for updating an existing entity).
But in the retrieveEntityById case this is not possible, as the object has yet to be loaded from the DB. That's why I have to call the function with an extra TypeInfo parameter that I create from a sample entity.
I would like to get rid of this extra parameter, to have a function signature like:
retrieveEntityById
  :: forall a conn id. (Data a, IConnection conn, Show id)
  => conn -> id -> IO a
I tried several things like using a Proxy or a TypeRep, but I did not manage to derive my TypeInfo data from them.
Any hints and ideas are most welcome!

You can access metadata from Data a constraints without a value x :: a:
dataTypeOf (undefined :: a) is a representation of the type a. It takes an undefined argument because Data is an old interface from a time when undefined was considered more acceptable.
dataTypeConstrs extracts the list of constructors from that representation.
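For example, a minimal sketch (a proxy argument pins down the type, since no value of type a is available; constructorsOf is my name, not a library function):

{-# LANGUAGE ScopedTypeVariables #-}
import Data.Data

-- all constructors of a type, obtained without a value of that type
constructorsOf :: forall a proxy. Data a => proxy a -> [Constr]
constructorsOf _ = dataTypeConstrs (dataTypeOf (undefined :: a))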
As you already noted in your code, although there is constrFields to get field names, there isn't an obvious way to get the arity of a non-record constructor. A solution to count the fields of a constructor is to use gunfold with the Const Int functor.
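Here is a sketch of that trick; the Const Int functor ignores the fields and simply counts how many times gunfold steps (constrArity is an illustrative name):

{-# LANGUAGE ScopedTypeVariables #-}
import Data.Data
import Data.Functor.Const (Const (..))

-- count the fields of a constructor belonging to type a
constrArity :: forall a proxy. Data a => proxy a -> Constr -> Int
constrArity _ c = getConst (gunfold step start c :: Const Int a)
  where
    step :: Const Int (b -> r) -> Const Int r
    step (Const n) = Const (n + 1)
    start :: r -> Const Int r
    start _ = Const 0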
Although your current approach seems specialized to types with a single constructor, this is not a fundamental limitation. For more than one constructor, you could store the constructor name in its persistent encoding, and on decoding, it can be looked up in the list from dataTypeConstrs.
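That lookup is what readConstr from Data.Data provides; a sketch of the decoding step (constrByName is an illustrative name):

{-# LANGUAGE ScopedTypeVariables #-}
import Data.Data

-- recover the Constr for a constructor name stored with the entity
constrByName :: forall a proxy. Data a => proxy a -> String -> Maybe Constr
constrByName _ name = readConstr (dataTypeOf (undefined :: a)) name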

Related

Clarification on Existential Types in Haskell

I am trying to understand Existential types in Haskell and came across a PDF http://www.ii.uni.wroc.pl/~dabi/courses/ZPF15/rlasocha/prezentacja.pdf
Please correct the understandings below, which I have gathered so far.
Existential types don't seem to be interested in the type they contain; pattern matching on them only says that there exists some type, but we can't know what it is unless we use Typeable or Data.
We use them when we want to hide types (e.g., for heterogeneous lists) or when we don't really know the types at compile time.
GADTs provide a clearer and better syntax for coding with existential types by providing implicit foralls.
My Doubts
On page 20 of the above PDF it is mentioned, for the code below, that it is impossible for a function to demand a specific Buffer. Why is that? When I am drafting a function I know exactly what kind of buffer I am going to use, even though I may not know what data I am going to put into it.
What's wrong with having :: Worker MemoryBuffer Int? If they really want to abstract over Buffer they can have a sum type data Buffer = MemoryBuffer | NetBuffer | RandomBuffer and a type like :: Worker Buffer Int.
data Worker x = forall b. Buffer b => Worker {buffer :: b, input :: x}
data MemoryBuffer = MemoryBuffer

memoryWorker :: Worker Int
memoryWorker = Worker MemoryBuffer (1 :: Int)
As Haskell is a full type-erasure language like C, how does it know at runtime which function to call? Do we maintain some information and pass around a huge v-table of functions, with the runtime figuring out which entry to call? If so, what sort of information is stored?
GADTs provide a clearer and better syntax for coding with existential types by providing implicit foralls.
I think there's general agreement that the GADT syntax is better. I wouldn't say that it's because GADTs provide implicit foralls, but rather because the original syntax, enabled with the ExistentialQuantification extension, is potentially confusing/misleading. That syntax, of course, looks like:
data SomeType = forall a. SomeType a
or with a constraint:
data SomeShowableType = forall a. Show a => SomeShowableType a
and I think the consensus is that the use of the keyword forall here allows the type to be easily confused with the completely different type:
data AnyType = AnyType (forall a. a) -- need RankNTypes extension
A better syntax might have used a separate exists keyword, so you'd write:
data SomeType = SomeType (exists a. a) -- not valid GHC syntax
The GADT syntax, whether used with implicit or explicit forall, is more uniform across these types, and seems to be easier to understand. Even with an explicit forall, the following definition gets across the idea that you can take a value of any type a and put it inside a monomorphic SomeType':
data SomeType' where
  SomeType' :: forall a. (a -> SomeType') -- parentheses optional
and it's easy to see and understand the difference between that type and:
data AnyType' where
  AnyType' :: (forall a. a) -> AnyType'
Existential types don't seem to be interested in the type they contain; pattern matching on them only says that there exists some type, but we can't know what it is unless we use Typeable or Data.
We use them when we want to hide types (e.g., for heterogeneous lists) or when we don't really know the types at compile time.
I guess these aren't too far off, though you don't have to use Typeable or Data to use existential types. I think it would be more accurate to say an existential type provides a well-typed "box" around an unspecified type. The box does "hide" the type in a sense, which allows you to make a heterogeneous list of such boxes, ignoring the types they contain. It turns out that an unconstrained existential, like SomeType' above, is pretty useless, but a constrained type:
data SomeShowableType' where
  SomeShowableType' :: forall a. (Show a) => a -> SomeShowableType'
allows you to pattern match to peek inside the "box" and make the type class facilities available:
showIt :: SomeShowableType' -> String
showIt (SomeShowableType' x) = show x
Note that this works for any type class, not just Typeable or Data.
With regard to your confusion about page 20 of the slide deck, the author is saying that it's impossible for a function that takes an existential Worker to demand a Worker having a particular Buffer instance. You can write a function to create a Worker using a particular type of Buffer, like MemoryBuffer:
class Buffer b where
  output :: String -> b -> IO ()

data Worker x = forall b. Buffer b => Worker {buffer :: b, input :: x}
data MemoryBuffer = MemoryBuffer
instance Buffer MemoryBuffer

memoryWorker :: Worker Int
memoryWorker = Worker MemoryBuffer (1 :: Int)
but if you write a function that takes a Worker as argument, it can only use the general Buffer type class facilities (e.g., the function output):
doWork :: Worker Int -> IO ()
doWork (Worker b x) = output (show x) b
It can't try to demand that b be a particular type of buffer, even via pattern matching:
doWorkBroken :: Worker Int -> IO ()
doWorkBroken (Worker b x) = case b of
  MemoryBuffer -> error "try this" -- type error
  _            -> error "try that"
Finally, runtime information about existential types is made available through implicit "dictionary" arguments for the typeclasses that are involved. The Worker type above, in addition to having fields for the buffer and input, also has an invisible implicit field that points to the Buffer dictionary (somewhat like a v-table, though it's hardly huge, as it just contains a pointer to the appropriate output function).
Internally, the type class Buffer is represented as a data type with function fields, and instances are "dictionaries" of this type:
data Buffer' b = Buffer' { output' :: String -> b -> IO () }
dBuffer_MemoryBuffer :: Buffer' MemoryBuffer
dBuffer_MemoryBuffer = Buffer' { output' = undefined }
The existential type has a hidden field for this dictionary:
data Worker' x = forall b. Worker' { dBuffer :: Buffer' b, buffer' :: b, input' :: x }
and a function like doWork that operates on existential Worker' values is implemented as:
doWork' :: Worker' Int -> IO ()
doWork' (Worker' dBuf b x) = output' dBuf (show x) b
For a type class with only one function, the dictionary is actually optimized to a newtype, so in this example, the existential Worker type includes a hidden field that consists of a function pointer to the output function for the buffer, and that's the only runtime information needed by doWork.
On page 20 of the above PDF it is mentioned, for the code below, that it is impossible for a function to demand a specific Buffer. Why is that?
Because Worker, as defined, takes only one type argument, the type of the "input" field (type variable x). E.g., Worker Int is a type. The type variable b, by contrast, is not a parameter of Worker, but a sort of "local variable", so to speak. It cannot be passed explicitly, as in Worker Int String -- that would trigger a type error.
If we instead defined:
data Worker x b = Worker {buffer :: b, input :: x}
then Worker Int String would work, but the type is no longer existential -- we now always have to pass the buffer type as well.
As Haskell is a full type-erasure language like C, how does it know at runtime which function to call? Do we maintain some information and pass around a huge v-table of functions, with the runtime figuring out which entry to call? If so, what sort of information is stored?
This is roughly correct. Briefly put, each time you apply constructor Worker, GHC infers the b type from the arguments of Worker, and then searches for an instance Buffer b. If that is found, GHC includes an additional pointer to the instance in the object. In its simplest form, this is not too different from the "pointer to vtable" which is added to each object in OOP when virtual functions are present.
In the general case, it can be much more complex, though. The compiler might use a different representation and add more pointers instead of a single one (say, directly adding the pointers to all the instance methods), if that speeds up code. Also, sometimes the compiler needs to use multiple instances to satisfy a constraint. E.g., if we need to store the instance for Eq [Int] ... then there is not one but two: one for Int and one for lists, and the two need to be combined (at run time, barring optimizations).
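For instance, here is a hand-written sketch of that dictionary passing (the names are illustrative, not GHC's actual Core):

-- a type class becomes a record of its methods
data Eq' a = Eq' { eq :: a -> a -> Bool }

-- the dictionary for instance Eq Int
dEqInt :: Eq' Int
dEqInt = Eq' (==)

-- instance Eq a => Eq [a] becomes a function between dictionaries
dEqList :: Eq' a -> Eq' [a]
dEqList d = Eq' go
  where
    go [] [] = True
    go (x : xs) (y : ys) = eq d x y && go xs ys
    go _ _ = False

-- an Eq [Int] constraint is satisfied by combining the two at run time
dEqListInt :: Eq' [Int]
dEqListInt = dEqList dEqInt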
It is hard to guess exactly what GHC does in each case: that depends on a ton of optimizations which might or might not trigger.
You could try googling for the "dictionary based" implementation of type classes to see more about what's going on. You can also ask GHC to print the internal optimized Core with -ddump-simpl and observe the dictionaries being constructed, stored, and passed around. I have to warn you: Core is rather low level, and can be hard to read at first.

Code reuse in Haxl - avoiding GADT constructor-per-request-type

Haxl is an amazing library, but one of the major pain points I find is caused by the fact that each sort of request to the data source requires its own constructor in the Request GADT. For example, taking the example from the tutorial:
data BlogRequest a where
  FetchPosts :: BlogRequest [PostId]
  FetchPostContent :: PostId -> BlogRequest PostContent
Each of these constructors is then pattern matched on and processed separately in the match function of the DataSource instance. This style results in a lot of boilerplate for a non-trivial application. Take, for example, an application using a relational database, where each table has a primary key. There may be many hundreds of tables, so I don't want to define a constructor for each table (let alone all the possible joins across tables...). What I really want is something like:
data DBRequest a where
  RequestById :: PersistEntity a => Key a -> DBRequest (Maybe a)
I'm using persistent to create types from my tables, but that isn't a critical detail -- I just want to use a single constructor for multiple possible return types.
The problem comes when trying to write the fetch function. The usual procedure with Haxl is to pattern match on the constructor to separate out the various types of BlockedFetch requests, which in the above example would correspond to something like this:
resVars :: [ResultVar (Maybe a)]
args    :: [Key a]

(psArgs, psResVars) = unzip
  [(key, r) | BlockedFetch (RequestById key) r <- blockedFetches]
...then I would (somehow) group the arguments by their key type and dispatch a SQL query for each group. But that approach won't work, because there may be requests for multiple PersistEntity types (i.e. database tables), each of which is a different type a, so building the list is impossible. I've thought of using an existentially quantified type to get around this issue (something like SomeSing in the singletons library), but then I see no way to group the requests as required without pattern matching on every possible table/type.
Is there any way to achieve this sort of code reuse?
I see two approaches:
Typeable:
data DBRequest a where
  RequestById :: (Typeable a, PersistEntity a) => Key a -> DBRequest (Maybe a)
or GADT "tag" type:
data Tag a where
  TagValue1 :: Tag Value1
  TagValue2 :: Tag Value2
  TagValue3 :: Tag Value3
  TagValue4 :: Tag Value4
  TagValue5 :: Tag Value5

data DBRequest a where
  RequestById :: PersistEntity a => Tag a -> Key a -> DBRequest (Maybe a)
These are very similar patterns, especially if you use GHC 8.2 with Type.Reflection (https://hackage.haskell.org/package/base-4.10.1.0/docs/Type-Reflection.html): replace Tag a with TypeRep a.
Either way, you can group Key a using the tag. I haven't tried, but
dependent-map might be handy: http://hackage.haskell.org/package/dependent-map
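For what it's worth, here is a self-contained sketch of the grouping idea using Typeable; Key here is a simplified stand-in for the persistent type, and SomeKey plays the role of an existentially wrapped request:

{-# LANGUAGE GADTs, ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))
import Data.Typeable
import qualified Data.Map.Strict as Map

newtype Key a = Key Int

data SomeKey where
  SomeKey :: Typeable a => Key a -> SomeKey

-- bucket the keys by the TypeRep of their entity type, so each bucket
-- can be dispatched as a single SQL query
groupByType :: [SomeKey] -> Map.Map TypeRep [SomeKey]
groupByType = foldr ins Map.empty
  where
    ins sk@(SomeKey (_ :: Key a)) =
      Map.insertWith (++) (typeRep (Proxy :: Proxy a)) [sk]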

Redundancy regarding product types and tuples in Haskell

In Haskell you have product types and you have tuples.
You use tuples if you don't want to associate a dedicated type with the value, and you can use product types if you wish to do so.
However I feel there is redundancy in the notation of product types
data Foo = Foo (String, Int, Char)
data Bar = Bar String Int Char
Why are there both kinds of notation? Is there any case where you would prefer one over the other?
I guess you can't use record notation when using tuples, but that's just a convenience problem. Another thing might be the notion of order in tuples, as opposed to product types, but I think that's just due to the naming of the functions fst and snd.
#chi's answer is about the technical differences in terms of Haskell's evaluation model. I hope to give you some insight into the philosophy of this sort of typed programming.
In category theory we generally work with objects "up to isomorphism". Your Bar is of course isomorphic to (String, Int, Char), so from a categorical perspective they're the same thing.
import Control.Lens (Iso', iso)

bar_tuple :: Iso' Bar (String, Int, Char)
bar_tuple = iso to from
  where
    to (Bar s i c) = (s, i, c)
    from (s, i, c) = Bar s i c
In some sense tuples are a Platonic form of product type, in that they have no meaning beyond being a collection of disparate values. All the other product types can be mapped to and from a plain old tuple.
So why not use tuples everywhere, when all Haskell types ultimately boil down to a sum of products? It's about communication. As Martin Fowler says,
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
Names are important! Writing down a custom product type like
data Customer = Customer { name :: String, address :: String }
imbues the type Customer with meaning to the person reading the code, unlike (String, String) which just means "two strings".
Custom types are particularly useful when you want to enforce invariants by hiding the representation of your data and using smart constructors:
newtype NonEmpty a = NonEmpty [a]
nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just (NonEmpty xs)
Now, if you don't export the NonEmpty constructor, you can force people to go through the nonEmpty smart constructor. If someone hands you a NonEmpty value you may safely assume that it has at least one element.
You can of course represent Customer as a tuple under the hood and expose evocatively-named field accessors,
newtype Customer = Customer (String, String)

name, address :: Customer -> String
name (Customer (n, _)) = n
address (Customer (_, a)) = a
but this doesn't really buy you much, except that it's now cheaper to convert Customer to a tuple (if, say, you're writing performance-sensitive code that works with a tuple-oriented API).
If your code is intended to solve a particular problem - which of course is the whole point of writing code - it pays to not just solve the problem, but make it look like you've solved it too. Someone - maybe you in a couple of years - is going to have to read this code and understand it with no a priori knowledge of how it works. Custom types are a very important communication tool in this regard.
The type
data Foo = Foo (String, Int, Char)
represents a double-lifted tuple. It values comprise
undefined
Foo undefined
Foo (undefined, undefined, undefined)
etc.
This is usually troublesome. Because of this, it's rare to see such definitions in actual code. We either have plain data types
data Foo = Foo String Int Char
or newtypes
newtype Foo = Foo (String, Int, Char)
The newtype can be just as inconvenient to use, but at least it
does not double-lift the tuple: undefined and Foo undefined are now equal values.
The newtype also provides zero-cost conversion between a plain tuple and Foo, in both directions.
You can see such newtypes in use e.g. when the programmer needs a different instance for some type class than the one already associated with the tuple. Or, perhaps, it is used in a "smart constructor" idiom.
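For example, a minimal sketch of the first use: a newtype over a tuple so we can attach a Semigroup instance of our choosing:

-- keep a running minimum and maximum, rather than combining componentwise
newtype MinMax = MinMax (Int, Int)

instance Semigroup MinMax where
  MinMax (lo1, hi1) <> MinMax (lo2, hi2) = MinMax (min lo1 lo2, max hi1 hi2)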
I would not expect the pattern used in Foo to be frequent. There is a slight difference in what the constructor acts like: Foo :: (String, Int, Char) -> Foo as opposed to Bar :: String -> Int -> Char -> Bar. Then Foo undefined and Foo (undefined, ..., ...) are strictly speaking different things, whereas you miss one level of undefinedness in Bar.

Why DuplicateRecordFields cannot have type inference?

Related post: How to disambiguate selector function?
https://ghc.haskell.org/trac/ghc/wiki/Records/OverloadedRecordFields/DuplicateRecordFields
However, we do not infer the type of the argument to determine the datatype, or have any way of deferring the choice to the constraint solver.
It's actually annoying that this feature is not implemented. I tried to look up multiple sources, but I could not find a reason why they decided not to infer types.
Does anyone know a good reason for this? Is it because of a limitation of the current type system?
You will be interested in OverloadedRecordFields which is still being implemented.
The current implementation is intentionally crippled, so as to not introduce too much new stuff all at once. Inferring the record projection types turns out to open a nasty can of worms (which the aforementioned extension addresses).
Consider the following GHCi interaction
ghci> data Record1 = Record1 { field :: Int }
ghci> data Record2 = Record2 { field :: Bool }
ghci> :t field
What should the type of field be now? Somehow, we need a way to capture the notion of "any record with a field called field". To this end, OverloadedRecordFields introduces a new built-in type class
The new module GHC.Records defines the following:
class HasField (x :: k) r a | x r -> a where
  getField :: r -> a
A HasField x r a constraint represents the fact that x is a field of
type a belonging to a record type r. The getField method gives the
record selector function.
Then, from our example above, it is as if the following instances were magically generated by GHC (in fact that is not actually what will happen, but it is a good first approximation).
instance HasField "field" Record1 Int where
  getField (Record1 f) = f

instance HasField "field" Record2 Bool where
  getField (Record2 f) = f
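In GHC 8.2 and later, GHC.Records ships this class and the compiler solves HasField constraints for record fields automatically, so the example above can already be tried along these lines:

{-# LANGUAGE DataKinds, DuplicateRecordFields, TypeApplications #-}
import GHC.Records (HasField (..))

data Record1 = Record1 { field :: Int }
data Record2 = Record2 { field :: Bool }

-- the field name is given as a type-level string; the record type
-- determines which instance is picked
main :: IO ()
main = do
  print (getField @"field" (Record1 42))   -- 42
  print (getField @"field" (Record2 True)) -- True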
If you are interested, I recommend reading the proposal. The other feature I haven't mentioned is the IsLabel class. Once all of this is implemented (and some more for updating records) I look forward to being able to get my lenses for free (so I can stop declaring field names starting with an underscore and enabling TemplateHaskell for makeLenses).

Statically enforcing that two objects were created from the same (Int) "seed"

In a library I'm working on, I have an API similar to the following:
data Collection a = Collection Seed {-etc...-}
type Seed = Int
newCollection :: Seed -> IO (Collection a)
newCollection = undefined
insert :: a -> Collection a -> IO () -- ...and other mutable set-like functions
insert = undefined
mergeCollections :: Collection a -> Collection a -> IO (Collection a)
mergeCollections (Collection s0 {-etc...-}) (Collection s1 {-etc...-})
  | s0 /= s1  = error "This is invalid; how can we make it statically unreachable?"
  | otherwise = undefined
I'd like to be able to enforce that the user cannot call mergeCollections on Collections created with different Seed values.
I thought of trying to tag Collection with a type-level natural: I think this would mean that the Seed would have to be statically known at compile time, but my users might be getting it from an environment variable or user input, so I don't think that would work.
I also hoped I might be able to do something like:
newtype Seed u = Seed Int
newSeed :: Int -> Seed u
newCollection :: Seed u -> IO (Collection u a)
mergeCollections :: Collection u a -> Collection u a -> IO (Collection u a)
Where somehow a Seed is tagged with a unique type, such that the type system could track that both arguments to merge were created from the seed returned by the same invocation of newSeed. To be clear, in this (hand-wavy) scheme a and b here would somehow not unify: let a = newSeed 1; b = newSeed 1;.
Is something like this possible?
Examples
Here are some examples of ways I can imagine users creating Seeds and Collections. Users would like to use the other operations (inserting, merging, etc) as freely as they could with any other IO mutable collection:
We only need one seed for all Collections (dynamically) created during the program, but the user must be able to specify in some way how the seed should be determined from the environment at runtime.
One or more static keys gathered from environment vars (or command line args):
main = do
  s1 <- getEnv "SEED1"
  s2 <- getEnv "SEED2"
  -- ... many Collections may be created dynamically from these seeds
  -- and dynamically merged later
Probably not in a convenient way. For handling seeds that are known only at runtime, you can use existential types; but then you cannot statically check that two of these existentially wrapped collections match up. The much simpler solution is simply this:
merge :: Collection a -> Collection a -> IO (Maybe (Collection a))
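A body for that signature might look like this (actuallyMerge is a hypothetical helper standing in for the real merging logic):

merge :: Collection a -> Collection a -> IO (Maybe (Collection a))
merge c1@(Collection s0) c2@(Collection s1)
  | s0 /= s1  = return Nothing
  | otherwise = Just <$> actuallyMerge c1 c2
  where
    actuallyMerge = undefined -- hypothetical: the real merging logic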
On the other hand, if it is okay to force all operations to be done "together", in a sense, then you can do something like what the ST monad does: group all the operations together, then supply an operation for "running" them that works only if the operations don't leak collections, by demanding that they be perfectly polymorphic over a phantom variable, so that the return type cannot mention that variable. (Tikhon Jelvis also suggests this in his comments.) Here's how that might look:
{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Collection (Collection, COp, newCollection, merge, inspect, runCOp) where

import Control.Monad.Reader

type Seed = Int

data Collection s a = Collection Seed

newtype COp s a = COp (Seed -> a)
  deriving (Functor, Applicative, Monad, MonadReader Seed)

newCollection :: COp s (Collection s a)
newCollection = Collection <$> ask

merge :: Collection s a -> Collection s a -> COp s (Collection s a)
merge l r = return (whatever l r)
  where whatever = const

-- just an example; substitute whatever functions you want to have for
-- consuming Collections
inspect :: Collection s a -> COp s Int
inspect (Collection seed) = return seed

runCOp :: (forall s. COp s a) -> Seed -> a
runCOp (COp f) = f
Note in particular that the COp and Collection constructors are not exported. Consequently we need never fear that a Collection will escape its COp; runCOp newCollection is not well-typed (and any other operation that tries to "leak" the collection to the outside world will have the same property). Therefore it will not be possible to pass a Collection constructed with one seed to a merge operating in the context of another seed.
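For example, a hypothetical client of this module would chain all the seed-dependent operations inside one COp and seal them with runCOp:

-- the phantom s keeps collections from different runCOp invocations apart;
-- Seed is not exported, but it is just a synonym for Int
example :: Int -> Int
example = runCOp go
  where
    go :: COp s Int
    go = do
      c1 <- newCollection
      c2 <- newCollection
      c3 <- merge c1 c2
      inspect c3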
I believe this is impossible given the constraint that the seeds come from runtime values, like user input. The typechecker can only reject a program if we can determine at compile time that the program is invalid. If there were a type that let the typechecker reject programs based on user input, we could deduce that the typechecker is doing some sort of time travel or wholly simulating our deterministic universe. The best you can do as a library author is to wrap the failure in something like ExceptT, which documents the seed precondition and makes callers aware of it.
