Mapping data constructors to types - haskell

I'm creating an application that allows an admin to build a form, which a user can fill out. Questions can be of different types. Each kind of question corresponds to a type of response data.
Is it possible to encode this at the type level? How would you organize this?
data QuestionType = EmailText | PlainText | Numeric
-- this doesn't have any response information, it's the metadata
-- about the question itself.
data Question = { id :: ID, label :: Text, questionType :: QuestionType }
data Answer = { questionID :: ID, response :: Response }
-- I would like to map those question types to different response types
data Response = ???
-- EmailText => Text
-- PlainText => Text
-- Numeric => Int
I've thought about Type Families, which would work perfectly except that I want to map from different data constructors to different types, and type families would require separate types for each one.
This would be a good fit for a single ADT, with the response information included in each constructor, but I need to be able to deal with question types independently from responses.
How should I approach this?

I've not completely understood what you exactly want, but maybe this can be a starting point:
{-# LANGUAGE DataKinds, GADTs #-}
data Response (qt :: QuestionType) where
RPlainText :: Text -> Response PlainText
REmailText :: Text -> Response EmailText
RNumeric :: Int -> Response Numeric
data Answer qt = Answer {questionID :: ID, response :: Response qt}
If you do not want the qt argument in Answer qt, you probably need existential types to hide that, but at that point you'll probably want to link it with the questions somehow.

Having types depend on values is exactly what dependent types are for. Unfortunately, Haskell doesn't have dependent types. However, you can use a sum type to capture the possible response types:
data Response = ResponseText Text | ResponseInt Int

Related

Existential type classes vs. Data Constructors vs. Coproducts

I find myself running up against the same pattern in my designs where I start with a type with a few data constructors, eventually want to be able to type against those data constructors and thus split them into their own types, just to then have to increase the verbosity of other parts of the program by needing to use Either or another tagged-union for situations where I still need to represent multiple of these types (namely collections).
I am hoping someone can point me to a better way of accomplishing what I'm trying to do. Let me start with a simple example. I am modeling a testing system, where you can have nested test suites which eventually end in tests. So, something like this:
data Node =
Test { source::string }
Suite { title::string, children::[Node] }
So, pretty simple so far, essentially a fancy Tree/Leaf declaration. However, I quickly realize that I want to be able to make functions that take Tests specifically. As such, I'll now split it up as so:
data Test = Test { source::string }
data Suite = Suite { title::string, children::[Either Test Suite] }
Alternatively I might roll a "custom" Either (especially if the example is more complicated and has more than 2 options), say something like:
data Node =
fromTest Test
fromSuite Suite
So, already its pretty unfortunate that just to be able to have a Suite that can have a combination of Suites or Tests I end up with a weird overhead Either class (whether it be with an actual Either or a custom one). If I use existential type classes, I could get away with making both Test and Suite derive "Node_" and then have Suite own a List of said Nodes. Coproducts would allow something similar, where I'd essentially do the same Either strategy without the verbosity of the tags.
Allow me to expand now with a more complex example. The results of the tests can be either Skipped (the test was disabled), Success, Failure, or Omitted (the test or suite could not be run due to a previous failure). Again, I originally started with something like this:
data Result = Success | Omitted | Failure | Skipped
data ResultTree =
Tree { children::[ResultTree], result::Result } |
Leaf Result
But I quickly realized I wanted to be able to write functions that took specific results, and more importantly, have the type itself enforce the ownership properties: A successful suite must only own Success or Skipped children, Failure's children can be anything, Omitted can only own Omitted, etc. So now I end up with something like this:
data Success = Success { children::[Either Success Skipped] }
data Failure = Failure { children::[AnyResult] }
data Omitted = Omitted { children::[Omitted] }
data Skipped = Skipped { children::[Skipped] }
data AnyResult =
fromSuccess Success |
fromFailure Failure |
fromOmitted Omitted |
fromSkipped Skipped
Again, I now have these weird "Wrapper" types like AnyResult, but, I get type enforcement of something that used to only be enforced from runtime operation. Is there a better strategy to this that doesn't involve turning on features like existential type classes?
The first thing that came to my mind reading your sentence: "I quickly realized I wanted to be able to write functions that took specific results" is Refinement Types.
They allow to take only some values from a type as input, and make those constraints compile-time check/error.
There is this video from a talk at HaskellX 2018, that introduces LiquidHaskell, which allows the use of Refinement Types in Haskell:
https://skillsmatter.com/skillscasts/11068-keynote-looking-forward-to-niki-vazou-s-keynote-at-haskellx-2018
You have to decorate your haskell function signature, and have LiquidHaskell installed:
f :: Int -> i : Int {i | i < 3} -> Int would be a function which could only accept as second parameter an Int with a value < 3, checked at compile time.
You might as well put constraints on your Result type.
I think what you may be looking for is GADTs with DataKinds. This lets you refine the types of each constructor in a data type to a particular set of possible values. For example:
data TestType = Test | Suite
data Node (t :: TestType) where
TestNode :: { source :: String } -> Node 'Test
SuiteNode :: { title :: String, children :: [SomeNode] } -> Node 'Suite
data SomeNode where
SomeNode :: Node t -> SomeNode
Then when a function operates only on tests, it can take a Node 'Test; on suites, a Node 'Suite; and on either, a polymorphic Node a. When pattern-matching on a Node a, each case branch gets access to an equality constraint:
useNode :: Node a -> Foo
useNode node = case node of
TestNode source -> {- here it’s known that (a ~ 'Test) -}
SuiteNode title children -> {- here, (a ~ 'Suite) -}
Indeed if you took a concrete Node 'Test, the SuiteNode branch would be disallowed by the compiler, since it can’t ever match.
SomeNode is an existential that wraps a Node of an unknown type; you can add extra class constraints to this if you want.
You can do a similar thing with Result:
data ResultType = Success | Omitted | Failure | Skipped
data Result (t :: ResultType) where
SuccessResult
:: [Either (Result 'Success) (Result 'Skipped)]
-> Result 'Success
FailureResult
:: [SomeResult]
-> Result 'Failure
OmittedResult
:: [Result 'Omitted]
-> Result 'Omitted
SkippedResult
:: [Result 'Skipped]
-> Result 'Skipped
data SomeResult where
SomeResult :: Result t -> SomeResult
Of course I assume in your actual code there’s more information in these types; as it is, they don’t represent much. When you have a dynamic computation such as running a test that may produce different kinds of result, you can return it wrapped in SomeResult.
In order to work with dynamic results, you may need to prove to the compiler that two types are equal; for that, I direct you to Data.Type.Equality, which provides a type a :~: b which is inhabited by a single constructor Refl when the two types a and b are equal; you can pattern-match on this to inform the typechecker about type equalities, or use the various combinators to carry out more complicated proofs.
Also useful in conjunction with GADTs (and ExistentialTypes, less generally) is RankNTypes, which basically enables you to pass polymorphic functions as arguments to other functions; this is necessary if you want to consume an existential generically:
consumeResult :: SomeResult -> (forall t. Result t -> r) -> r
consumeResult (SomeResult res) k = k res
This is an example of continuation-passing style (CPS), where k is the continuation.
As a final note, these extensions are widely used and largely uncontroversial; you needn’t be wary of opting in to (most) type system extensions when they let you express what you mean more directly.

"Embedding/inheriting" one `data` constructor in another?

Consider the following fragment:
data File
= NoFile
| FileInfo {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime
}
| FileFull {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime,
content :: String
}
deriving Eq
That duplication is a bit of a "wart", though in this one-off instance not particularly painful. In order to further improve my understanding of Haskell's rich type system, what might be preferred "clean"/"idiomatic" approaches for refactoring other than either simply creating a separate data record type for the 2 duplicate fields (then replacing them with single fields of that new data type) or replacing the FileFull record notation with something like | FileFull File String, which wouldn't be quite clean either (as here one would only want FileInfo in there for example, not NoFile)?
(Both these "naive" approaches would be somewhat intrusive/annoying with respect to having to then fix up many modules manually throughout the rest of the code-base here.)
One thing I considered would be parameterizing like so:
data File a
= NoFile
| FileMaybeWithContent {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime
content :: a
}
deriving Eq
Then for those "just info, not loaded" contexts a would be (), otherwise String. Seems too general anyway, we want either String or nothing, leading us to Maybe, doing once again away with the a parameter.
Of course we've been there before: content could just be done with Maybe String of course, then "refactor any compile errors away" and "done". That'll probably be the order of the day, but knowing Haskell and the many funky GHC extensions.. who knows just what exotic theoretic trick/axiom/law I've been missing, right?! See, the differently-named "semantic insta-differentiator" between a "just meta-data info" value and a "file content with meta info" value does work well throughout the rest of the code-base as far as eased comprehension.
(And yes, I perhaps should have removed NoFile and used Maybe Files throughout, but then... not sure whether there's really a solid reason to do so and a different question altogether anyway..)
All of the following are equivalent/isomorphic, as I think you've discovered:
data F = U | X A B | Y A B C
data F = U | X AB | Y AB C
data AB = AB A B
data F = U | X A B (Maybe C)
So the color of the bike shed really depends on the context (e.g. do you have use for an AB elsewhere?) and your own aesthetic preferences.
It might clarify things and help you understand what you're doing to have some sense of the algebra of algebraic data types
We call types like Either "sum types" and types like (,) "product types" and they are subject to the same kinds of transformations you're familiar with like factoring
f = 1 + (a * b) + (a * b * c)
= 1 + ((a * b) * ( 1 + c))
As others have noted, the NoFile constructor is probably not necessary, but you can keep it if you want. If you feel your code is more readable and/or better understood with it, then I say keep it.
Now the trick with combining the other two constructors is by hiding the content field. You were on the right track by parameterizing File, but that alone isn't enough since then we can have File Foo, File Bar, etc. Fortunately, GHC has some nifty ways to help us.
I'll write out the code here and then explain how it works.
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DataKinds #-}
import Data.Void
data Desc = Info | Full
type family Content (a :: Desc) where
Content Full = String
Content _ = Void
data File a = File
{ path :: FilePath
, modTime :: UTCTime
, content :: Content a
}
There are a few things going on here.
First, note that in the File record, the content field now has type Content a instead of just a. Content is a type family, which is (in my opinion) a confusing name for type-level function. That is, the compiler replaces Content a with some other type based on what a is and how we've defined Content.
We defined Content Full to be String, so that when we have a value f1 :: File Full, its content field will have a String value. On the other hand, f2 :: File Info will have a content field with type Void which has no values.
Cool right? But what's preventing us from having File Foo now?
That's where DataKinds comes to the rescue. It "promotes" the data type Desc to a kind (the type of types in Haskell) and type constructors ,Info and Full, to types of kind Desc instead of merely values of type Desc.
Notice in the declaration of Content that I have annotated a. It looks like a type annotation, but a is already a type. This is a kind annotation. It forces a to be something of kind Desc and the only types of kind Desc are Info and Full.
By now you're probably totally sold on how awesome this is, but I should warn you there's no free lunch. In particular, this is a compile-time construction. Your single File type becomes two different types. This can cause other related logic (producers and consumers of File records) to become complicated. If your use case doesn't mix File Info records with File Full records, then this is the way to go. On the other hand, if you want to do something like have a list of File records which can be a mixture of both types, then you're better off just making the type of your content field Maybe String.
Another thing is, how exactly do you make a File Info since there's no value of Void to use for the content field? Well, technically it should be ok to use undefined or error "this should never happen" since it is (morally) impossible to have a function of type Void -> a, but if that makes you feel uneasy (and it probably should), then just replace Void with (). Unit is almost as useless and doesn't require 'values' of bottom.

Path dependent types in Haskell

I am trying to design an API for some database system in Haskell, and I would like to model the columns of this database in a way such that interactions between columns of different tables cannot get mixed up.
More precisely, imagine that you have a type to represent a table in a database, associated to some type:
type Table a = ...
and that you can extract the columns of the table, along with the type of the column:
type Column col = ...
Finally, there are various extractors. For example, if your table contains descriptions of frogs, a function would let you extract the column containing the weight of the frog:
extractCol :: Table Frog -> Column Weight
Here is the question: I would like to distinguish the origin of the columns so that users cannot do operations between tables. For example:
bullfrogTable = undefined :: Table Frog
toadTable = undefined :: Table Frog
bullfrogWeights = extractCol bullfrogTable
toadWeights = extractCol toadTable
-- Or some other columns from the toad table
toadWeights' = extractCol toadTable
-- This should compile
addWeights toadWeights' toadWeights
-- This should trigger a type error
addWeights bullfrogWeights toadWeights
I know how to achieve this in Scala (using path-dependent types, see [1]), and I have been thinking of 3 options in Haskell:
not using types, and just doing a check at runtime (the current solution)
the TypeInType extension to add a phantom type on the Table type itself, and pass this extra type to the columns. I am not keen on it, because the construction of such a type would be very complicated (tables are generated through complex DAG operations) and probably slow to compile in this context.
wrapping the operations using a forall construct similar to the ST monad, but in my case, I would like the extra tagging type to actually escape the construction.
I am happy to have a very limited valid scoping for the construction of the same columns (i.e. columns from table and (id table) not being mixable), and I mostly care about the DSL feel of the API rather than the safety.
[1] What is meant by Scala's path-dependent types?
My current solution
Here is what I ended up doing, using RankNTypes.
I still want to give users the ability to use columns how they see fit without having some strong type checks, and opt in if they want some stronger type guarantees: this is a DSL for data scientist who will not know the power of the Haskell side
Tables are still tagged by their content:
type Table a = ...
and columns are now tagged with some extra reference types, on top of the type of the data they contain:
type Column ref col = ...
Projections from tables to columns are either tagged or untagged. In practice, this is hidden behind a lens-like DSL.
extractCol :: Table Frog -> Column Frog Weight
data TaggedTable ref a = TaggedTable { _ttTable :: Table a }
extractColTagged :: Table ref Frog -> Column ref Weight
withTag :: Table a -> (forall ref. TaggedTable ref a -> b) -> b
withTag tb f = f (TaggedTable tb)
Now I can write some code as following:
let doubleToadWeights = withTag toadTable $ \ttoadTable ->
let toadWeights = extractColTagged ttoadTable in
addWeights toadWeights toadWeights
and this will not compile, as desired:
let doubleToadWeights =
toadTable `withTag` \ttoads ->
bullfrogTable `withTag` \tbullfrogs ->
let toadWeights = extractColTagged ttoads
bullfrogWeights = extractColTagged tbullfrogs
in addWeights toadWeights bullfrogWeights -- Type error
From a DSL perspective, I believe it is not as straightforward as what one could achieve with Scala, but the type error message is understandable, which is paramount for me.
Haskell does not (as far as I know) have path dependent types, but you can get some of the way by using rank 2 types. For instance the ST monad has a dummy type parameter s that is used to prevent leakage between invocations of runST:
runST :: (forall s . ST s a) -> a
Within an ST action you can have an STRef:
newSTRef :: a -> ST s (STRef s a)
But the STRef you get carries the s type parameter, so it isn't allowed to escape from the runST.

Haskell: Confusion with own data types. Record syntax and unique fields

I just uncovered this confusion and would like a confirmation that it is what it is. Unless, of course, I am just missing something.
Say, I have these data declarations:
data VmInfo = VmInfo {name, index, id :: String} deriving (Show)
data HostInfo = HostInfo {name, index, id :: String} deriving (Show)
vm = VmInfo "vm1" "01" "74653"
host = HostInfo "host1" "02" "98732"
What I always thought and what seems to be so natural and logical is this:
vmName = vm.name
hostName = host.name
But this, obviously, does not work. I got this.
Questions
So my questions are.
When I create a data type with record syntax, do I have to make sure that all the fields have unique names? If yes - why?
Is there a clean way or something similar to a "scope resolution operator", like :: or ., etc., so that Haskell distinguishes which data type the name (or any other none unique fields) belongs to and returns the correct result?
What is the correct way to deal with this if I have several declarations with the same field names?
As a side note.
In general, I need to return data types similar to the above example.
First I returned them as tuples (seemed to me the correct way at the time). But tuples are hard to work with as it is impossible to extract individual parts of a complex type as easy as with the lists using "!!". So next thing I thought of the dictionaries/hashes.
When I tried using dictionaries I thought what is the point of having own data types then?
Playing/learning data types I encountered the fact that led me to the above question.
So it looks like it is easier for me to use dictionaries instead of own data types as I can use the same fields for different objects.
Can you please elaborate on this and tell me how it is done in real world?
Haskell record syntax is a bit of a hack, but the record name emerges as a function, and that function has to have a unique type. So you can share record-field names among constructors of a single datatype but not among distinct datatypes.
What is the correct way to deal with this if I have several declarations with the same field names?
You can't. You have to use distinct field names. If you want an overloaded name to select from a record, you can try using a type class. But basically, field names in Haskell don't work the way they do in say, C or Pascal. Calling it "record syntax" might have been a mistake.
But tuples are hard to work with as it is impossible to extract individual parts of a complex type
Actually, this can be quite easy using pattern matching. Example
smallId :: VmInfo -> Bool
smallId (VmInfo { vmId = n }) = n < 10
As to how this is done in the "real world", Haskell programmers tend to rely heavily on knowing what type each field is at compile time. If you want the type of a field to vary, a Haskell programmer introduces a type parameter to carry varying information. Example
data VmInfo a = VmInfo { vmId :: Int, vmName :: String, vmInfo :: a }
Now you can have VmInfo String, VmInfo Dictionary, VmInfo Node, or whatever you want.
Summary: each field name must belong to a unique type, and experienced Haskell programmers work with the static type system instead of trying to work around it. And you definitely want to learn about pattern matching.
There are more reasons why this doesn't work: lowercase typenames and data constructors, OO-language-style member access with .. In Haskell, those member access functions actually are free functions, i.e. vmName = name vm rather than vmName = vm.name, that's why they can't have same names in different data types.
If you really want functions that can operate on both VmInfo and HostInfo objects, you need a type class, such as
class MachineInfo m where
name :: m -> String
index :: m -> String -- why String anyway? Shouldn't this be an Int?
id :: m -> String
and make instances
instance MachineInfo VmInfo where
name (VmInfo vmName _ _) = vmName
index (VmInfo _ vmIndex _) = vmIndex
...
instance MachineInfo HostInfo where
...
Then name machine will work if machine is a VmInfo as well as if it's a HostInfo.
Currently, the named fields are top-level functions, so in one scope there can only be one function with that name. There are plans to create a new record system that would allow having fields of the same name in different record types in the same scope, but that's still in the design phase.
For the time being, you can make do with unique field names, or define each type in its own module and use the module-qualified name.
Lenses can help take some of the pain out of dealing with getting and setting data structure elements, especially when they get nested. They give you something that looks, if you squint, kind of like object-oriented accessors.
Learn more about the Lens family of types and functions here: http://lens.github.io/tutorial.html
As an example for what they look like, this is a snippet from the Pong example found at the above github page:
data Pong = Pong
{ _ballPos :: Point
, _ballSpeed :: Vector
, _paddle1 :: Float
, _paddle2 :: Float
, _score :: (Int, Int)
, _vectors :: [Vector]
-- Since gloss doesn't cover this, we store the set of pressed keys
, _keys :: Set Key
}
-- Some nice lenses to go with it
makeLenses ''Pong
That makes lenses to access the members without the underscores via some TemplateHaskell magic.
Later on, there's an example of using them:
-- Update the paddles
updatePaddles :: Float -> State Pong ()
updatePaddles time = do
p <- get
let paddleMovement = time * paddleSpeed
keyPressed key = p^.keys.contains (SpecialKey key)
-- Update the player's paddle based on keys
when (keyPressed KeyUp) $ paddle1 += paddleMovement
when (keyPressed KeyDown) $ paddle1 -= paddleMovement
-- Calculate the optimal position
let optimal = hitPos (p^.ballPos) (p^.ballSpeed)
acc = accuracy p
target = optimal * acc + (p^.ballPos._y) * (1 - acc)
dist = target - p^.paddle2
-- Move the CPU's paddle towards this optimal position as needed
when (abs dist > paddleHeight/3) $
case compare dist 0 of
GT -> paddle2 += paddleMovement
LT -> paddle2 -= paddleMovement
_ -> return ()
-- Make sure both paddles don't leave the playing area
paddle1 %= clamp (paddleHeight/2)
paddle2 %= clamp (paddleHeight/2)
I recommend checking out the whole program in its original location and looking through the rest of the lens material; it's very interesting even if you don't end up using them.
Yes, you cannot have two records in the same module with the same field names. The field names are added to the module's scope as functions, so you would use name vm rather than vm.name. You could have two records with the same field names in different modules and import one of the modules qualified as some name, but this is probably awkward to work with.
For a case like this, you should probably just use a normal algebraic data type:
data VMInfo = VMInfo String String String
(Note that the VMInfo has to be capitalized.)
Now you can access the fields of VMInfo by pattern matching:
myFunc (VMInfo name index id) = ... -- name, index and id are bound here

Any nice record Handling tricks in Haskell?

I'm aware of partial updates for records like :
data A a b = A { a :: a, b :: b }
x = A { a=1,b=2 :: Int }
y = x { b = toRational (a x) + 4.5 }
Are there any tricks for doing only partial initialization, creating a subrecord type, or doing (de)serialization on subrecord?
In particular, I found that the first of these lines works but the second does not :
read "A {a=1,b=()}" :: A Int ()
read "A {a=1}" :: A Int ()
You could always massage such input using a regular expression, but I'm curious what Haskell-like options exist.
Partial initialisation works fine: A {a=1} is a valid expression of type A Int (); the Read instance just doesn't bother parsing anything the Show instance doesn't output. The b field is initialised to error "...", where the string contains file/line information to help with debugging.
You generally shouldn't be using Read for any real-world parsing situations; it's there for toy programs that have really simple serialisation needs and debugging.
I'm not sure what you mean by "subrecord", but if you want serialisation/deserialisation that can cope with "upgrades" to the record format to contain more information while still being able to process old (now "partial") serialisations, then the safecopy library does just that.
You cannot leave some value in Haskell "uninitialized" (it would not be possible to "initialize" it later anyway, since Haskell is pure). If you want to provide "default" values for the fields, then you can make some "default" value for your record type, and then do a partial update on that default value, setting only the fields you care about. I don't know how you would implement read for this in a simple way, however.

Resources