Polymorphic Vector storage for in-memory column store type of task


"My brain is about to explode," ghc tells me, and the feeling is mutual.
There's an aggregation function that works like a charm on polymorphic vectors (in-memory column store context): it takes 2 vectors and groups by the unique values of the 1st one while applying a function f to the values of the 2nd. So, similar to SQL GROUP BY or mongo aggregation. Examples:
> groupColumns creg (+) cint
fromList [("EMEA",345),("NA",681),("RoW",988)]
> groupColumns cint (*) cdouble
fromList [(1,13.0),(2,46.0),(4,16.0),(23,5359.0),(24,528.0),(234,5475.599999999999),(43,18619.0),(12,14.399999999999999),(412,1697.44),(252,-6350.4)]
Relevant code:
groupColumns :: (Eq k, G.Vector v k, G.Vector v2 a, Hashable k) =>
    v k -> (a -> a -> a) -> v2 a -> Map.HashMap k a
groupColumns xxs f yys = ...
cint :: U.Vector Int64
cint = U.fromList [1,2,43,234,23,412,24,12,4,252,1,2,43,234,23,412,24,12,4,252]
cdouble :: U.Vector Double
cdouble = U.fromList [13,23,433,23.4,233,4.12,22,1.2,4,-25.2, 1,2,43,234,23,412,24,12,4,252]
creg :: V.Vector Text
creg = V.fromList ["EMEA", "NA", "EMEA", "RoW", "NA", "RoW", "EMEA", "EMEA", "NA", "RoW", "EMEA", "NA", "RoW", "NA", "RoW", "NA", "RoW", "EMEA", "NA", "EMEA"]
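The body of groupColumns is elided above, so here is a minimal sketch of the same grouping idea, not the author's implementation. It assumes plain lists and Data.Map from the containers package in place of the vector and HashMap types, and String in place of Text, so that it needs nothing beyond what ships with GHC:

```haskell
import           Data.Int        (Int64)
import qualified Data.Map.Strict as Map

-- Group the values of the second column by the keys in the first column,
-- combining values that share a key with f (earlier value on the left).
-- Assumption: lists and an ordered Map stand in for vectors and HashMap.
groupColumns :: Ord k => [k] -> (a -> a -> a) -> [a] -> Map.Map k a
groupColumns keys f vals = Map.fromListWith (flip f) (zip keys vals)

creg :: [String]
creg = [ "EMEA", "NA", "EMEA", "RoW", "NA", "RoW", "EMEA", "EMEA", "NA", "RoW"
       , "EMEA", "NA", "RoW", "NA", "RoW", "NA", "RoW", "EMEA", "NA", "EMEA" ]

cint :: [Int64]
cint = [1,2,43,234,23,412,24,12,4,252,1,2,43,234,23,412,24,12,4,252]
```

In GHCi, groupColumns creg (+) cint gives the same three totals as the first example above.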
The problem arises when I want to parse a user's input and run an aggregation function built dynamically. Let's say there's a source data table of "Region" : Text, "Revenue" : Int, "Country" : Text, "Booking Date" : Int. A user may want to do (pseudocode): 1) groupby "Region", "Country" sum "Revenue" or 2) groupby "Country", "Booking Date" sum "Revenue" etc etc. The issue is that "Region" and "Country" vectors are V.Vector Text while "Booking Date" and "Revenue" are U.Vector Int64 - so I can't store them in one Hashtable or List and do an obvious thing: get abstract (Vector a) from this one Hashtable or List, pass them to groupColumns function (which already perfectly supports polymorphic vectors!!!) and get a result. I don't care about specific type here for groupColumns, I only care that whatever I'm passing supports Vector interface (by being part of the type family).
So, it boils down to: I need some sort of Storage type that 1) given Text (name of the vector) 2) gives back U.Vector a or V.Vector b without explicit type signature. In the ideal world it'd be just one line call: groupColumns (extractVec col1 cms) func (extractVec col2 cms), where col1 and col2 is Text parsed from user input and func is a function dynamically set from parsed user input.
In the real world I tried:
1) heterogeneous tricks (of the data HV where HV :: (Vector v a) => v a -> HV sort), but there both my brain and ghc's start to explode because various type variables escape their scope (e.g., in the f function that is passed to groupColumns) - even though getting (HV U.Vector Int64) from [(Text, HV a)] is straightforward.
2) Typed vector storage, like this:
data ColumnMemoryStore = ColumnMemoryStore {
intCols :: Map.HashMap Text (U.Vector Int64),
doubleCols :: Map.HashMap Text (U.Vector Double),
textCols :: Map.HashMap Text (V.Vector Text),
typeSchema :: Map.HashMap Text SupportedTypes -- helper map from column names to their types
}
with polymorphic extractVector function, so I can do extractVector name cms :: U.Vector Int64 - and it automagically returns U.Vector Int64 from the intCols and respectively for others. Here, the problem is that after parsing user input I have to analyze what is the Type of the vectors he wants to aggregate by (by consulting typeSchema) and give corresponding type signatures to extractVector calls - which turns into absolutely, terrifyingly ugly spaghetti of case statements that makes me want to write everything in C as it will be 5x shorter. Here's a sample:
let t1 = checkColType' colname (cms gs)
case t1 of
    PText -> let col = extractVec colname (cms gs) :: V.Vector T.Text
                 result = groupColumns col (+) aggCol
             in outputStrLn $ show result
    PInt -> let col = extractVec colname (cms gs) :: U.Vector Int64
                result = groupColumns col (+) aggCol
            in outputStrLn $ show result
etc etc etc. This compiles and works but it's ugly, non-functional and boilerplate. I mean, the ONLY reason for doing this is the need to specify return type of extractVec explicitly while then it's never used by groupColumns, which simply expects anything that's Vector v a! There has to be a way around it...
3) Should I even think about Data.Reflection or something similar but no less scary? Template Haskell?
I am sorry for the long description, but I spent tons of time researching and feel completely stuck - which probably (hopefully) means I'm missing something pretty obvious (like not enough abstraction levels) and those of you who think in Haskell can at least point me to the correct approach to solving this issue. Thanks a lot!
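One direction that may shrink the case-statement spaghetti, offered only as a sketch: keep a closed sum type of columns, and do the type dispatch exactly once, inside a function that hands the column to a rank-2 continuation. Everything named here (Column, Store, withKeyColumn, groupSum, exampleStore) is hypothetical, and lists with Data.Map again stand in for the vector and HashMap types so the example needs only base and containers:

```haskell
{-# LANGUAGE RankNTypes #-}
import           Data.Int        (Int64)
import qualified Data.Map.Strict as Map

-- One constructor per supported element type (hypothetical names).
data Column
  = IntCol    [Int64]
  | DoubleCol [Double]
  | TextCol   [String]

type Store = Map.Map String Column

-- The only place that inspects the column type. The continuation must work
-- for every orderable, showable key type, so the concrete type never escapes.
withKeyColumn :: Column -> (forall k. (Ord k, Show k) => [k] -> r) -> r
withKeyColumn (IntCol xs)    cont = cont xs
withKeyColumn (DoubleCol xs) cont = cont xs
withKeyColumn (TextCol xs)   cont = cont xs

groupColumns :: Ord k => [k] -> (a -> a -> a) -> [a] -> Map.Map k a
groupColumns keys f vals = Map.fromListWith (flip f) (zip keys vals)

-- Sum an Int64 measure column grouped by any key column, looked up by name.
-- Returns Nothing if a column is missing or the measure is not an IntCol.
groupSum :: Store -> String -> String -> Maybe [(String, Int64)]
groupSum store keyName valName = do
  keyCol      <- Map.lookup keyName store
  IntCol vals <- Map.lookup valName store
  pure (withKeyColumn keyCol (\keys ->
          [ (show k, v) | (k, v) <- Map.toList (groupColumns keys (+) vals) ]))

exampleStore :: Store
exampleStore = Map.fromList
  [ ("Region",  TextCol ["EMEA", "NA", "EMEA"])
  , ("Date",    IntCol  [7, 7, 8])
  , ("Revenue", IntCol  [1, 2, 43])
  ]
```

The design choice is that the result type of the continuation (here a list of rendered rows) must not mention the key type; that is what keeps the type variable from escaping its scope.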

Related

Anonymous records: what ways to type-level tag in Haskell?

I'm playing with lightweight anonymous record-alikes, more to explore the type theory for them than anything 'industrial strength'. I want the fields to be simply type-tagged.
myRec = (EmpId 54321, EmpName "Jo", EmpPhone "98-7654321") -- in which
newtype EmpPhone a = EmpPhone a -- and maybe
data EmpName a where EmpName :: IsString a => a -> EmpName a -- GADT
data EmpId a where EmpId :: Int -> EmpId Int -- GADT to same pattern
Although I could put newtype EmpId = EmpId Int, I want to follow the same pattern for all tags, so that I can go for example:
project (EmpId, EmpName) myRec -- use tags as field names
I'll also use StandaloneDeriving/DeriveAnyClass to derive instance Eq, Show, Num etc.
Other possible designs
For the records, rather than Haskell tuples I could use HList or make my own data types Tuple0, Tuple1, Tuple2, .... I don't think that would affect the typing issues below.
For the tags/fields I could pair a Symbol (type-level String) as phantom type with the value -- for example CTRex does something like that. Then use TypeApplications to build fields.
data Tag (tag :: Symbol) a = Tag a
myRec = (Tag @"EmpId" 54321, ...)
That makes the field syntax (and projection list) rather 'noisy'; also prevents any validation that EmpIds are Int, etc.
Three related lines of questions on typing for these:
1. How best to prevent
sillyRec = (EmpId 65432, Just "not my tag", "or [] as constructor",
Right "or even worse" :: Either Int String)
I could declare a class, put my tags only in it (not too bad with DeriveAnyClass), put constraints everywhere. But my tags have a consistent structure: single data constructor named same as the type; single type parameter which is the only parameter to the data constructor.
2. How to express I want each record-alike to follow a consistent type pattern? That is prevent:
notaRec = (EmpId 76543, EmpName)
Bare EmpName is OK in a projection list, providing all the other fields are bare constructors. I want to say that notaRec is not well-Kinded, but bare EmpName is Kind * -> *, which is unifiable with *. So I mean more like: all fields in the record fit the same type pattern.
3. Then when I get to sets-of-records (aka tables/relations)
myTable = ( myRec, -- tuple of tuples
(EmpName "Kaz", EmpPhone 987654312, EmpId 87654),
EmpId 98765, EmpPhone "21-4365879", EmpName "Bo")
Putting the fields in a different order is OK because we have a tuple-of-tuples. But EmpPhone is at two different types in the two records. And the last line isn't a record at all: it's fields at the 'wrong' pattern. (Same mis-match as with bare EmpName in 2.)
Again I want to say these are ill-Kinded. My field tags are appearing at different 'depths' or in differing type patterns.
I guess I could get there with a great deal of hard-coding for valid instances/combos of types. Is there a more generic way?
EDIT: In response to comments. (Yes I'm mortal too. Thanks @duplode for figuring out the formatting.)
why not type Record = (EmpId Int, EmpName String, EmpPhone String)?
As a type synonym that's fine. But doesn't answer the question because I want it equivalent to any permutation of those tags. (I think I can verify that equivalence at type level using HList techniques.)
some sort of high-level overview of your objective [thank you David]
I want to treat the ( ... , ... , ... ) as a set. Because the Relational Database Model says relations are sets of 'tuples' [not Haskell tuples] and 'tuples' are sets of pairs of tag-value. I also want to treat the project function as having a first-class parameter which is a set of tags. (Contrast that in Codd's Relational Algebra, the π operator has its set of tags subscripted as if part of the operator.)
These couldn't be Haskell Sets because the elements are not the same type. I want to say the elements are the same Kind; and that a Haskell-tuple of same-Kinded elements represents a set-of that Kind. But I know that's abusing terminology. (The alternative design I considered using Symbol tags perhaps shows better there's a Kindiness aspect.)
If I can treat the Haskell tuples as set-ish, I can use well-known HList techniques to emulate the Relational Operators.
If this helps explain, I could do this with a lot of boilerplate:
class MyTag a -- type/kind-level predicate
deriving instance MyTag (EmpId Int) -- uses DeriveAnyClass
-- etc for all my tags
class WellKinded tup
instance WellKinded ()
instance {-# OVERLAPPING #-}
(MyTag (n1 a1), MyTag (n2 a2), MyTag (n3 a3))
=> WellKinded (n1 a1, n2 a2, n3 a3) -- and so on for every arity of tuple
instance {-# OVERLAPPABLE #-}
(MyTag (n1 a1), MyTag (n2 a2), MyTag (n3 a3))
=> WellKinded (a1 -> n1 a1, a2 -> n2 a2, a3 -> n3 a3)
All those instances for different arities are rapidly going to get tedious, so I could convert to HList; despatch an instance on the Kind of the first element; iterate down the list verifying all the same Kind.
For tuple-of-tuples, detect the Kind of the first element of the first sub-tuple; iterate both across and down. (Again needs OverlappingInstances: a tuple-of-tuples-of-tuples is still a tuple. This is what I mean by "a great deal of hard-coding" above.) It doesn't seem unachievable. But it does feel like going down the wrong rabbit-hole.

This is crazy enough it might just work. Pattern synonyms to the rescue:
newtype Label (n :: Symbol) (a :: *) = MkLab a -- newtype yay!
deriving (Eq, Ord, Show)
pattern EmpPhone x = MkLab x :: Label "EmpPhone" a
pattern EmpName x = MkLab x :: IsString a => Label "EmpName" a
pattern EmpId x = MkLab x :: Label "EmpId" Int
myRec = (EmpId 54321, EmpName "Jo", EmpPhone "98-7654321") -- works a treat
Then to answer the q's
To count as a record, all tuple elements must be of type Label s a.
To count as a projection list, all tuple elements must be of type a -> Label s a.
(That works, by the way.)
Those are the only types/kinds allowed in tuples-as-records.
So to parse a tuple-of-tuples at type level, I need only despatch on the type of the leftmost element.
I'm looking for type constructor Label.
All the rest I can do with HList-style type matching.
For those patterns I did need to switch on a swag of extensions:
{-# LANGUAGE PatternSynonyms,
KindSignatures, DataKinds,
ScopedTypeVariables, -- for the signatures on patterns
RankNTypes #-} -- for the signatures with contexts
import GHC.TypeLits -- for the Symbols
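For comparison, here is a minimal sketch of the same Label idea written with standalone pattern synonym signatures rather than signatures on the right-hand side. This is an assumed variation, not the answer's exact code: the helpers labelName and unLabel are hypothetical, and the IsString context on EmpName is dropped to keep the example small.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, PatternSynonyms, ScopedTypeVariables #-}
import Data.Kind    (Type)
import Data.Proxy   (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- A value of type a tagged with a type-level field name n.
newtype Label (n :: Symbol) (a :: Type) = MkLab a
  deriving (Eq, Ord, Show)

-- Each tag is a bidirectional pattern synonym that fixes the Symbol.
pattern EmpId :: Int -> Label "EmpId" Int
pattern EmpId x = MkLab x

pattern EmpName :: a -> Label "EmpName" a
pattern EmpName x = MkLab x

-- Recover the field name from the type alone (hypothetical helper).
labelName :: forall n a. KnownSymbol n => Label n a -> String
labelName _ = symbolVal (Proxy :: Proxy n)

-- Strip the tag (hypothetical helper).
unLabel :: Label n a -> a
unLabel (MkLab x) = x

myRec :: (Label "EmpId" Int, Label "EmpName" String)
myRec = (EmpId 54321, EmpName "Jo")
```

Because every field has the shape Label s a, a type-level check only has to look for the Label constructor, which is the point made above.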

Here's a kinda answer or at least explanation for 2., 3.; a partial answer to 1.
How to express I want each record-alike to follow a consistent type pattern? That is prevent:
notaRec = (EmpId 76543, EmpName)
On the face of it EmpId 76543 matches type pattern (n a); whereas EmpName :: a -> (n a). But Hindley-Milner doesn't "match" simplistically like that, it uses unifiability. So all of these unify with (n a):
-- as `( n a )`
a -> (n a) -- as `( ((->) a) (n a) )`
(b, c) -- as `( (,) b ) c `
(b, c, d) -- as `( (,,) b c ) d ` -- etc for all larger Haskell tuples
[ a ], Maybe a -- as `( [] a )`, `( Maybe a )`
Either b c -- as `( (Either b) c )`
b -> (Either b c) -- as `( ((->) b) (Either b c) )` -- for example, bare `Left`
To disagree with myself on the abuse of terminology:
I want to say these are ill-Kinded. My field tags are appearing at different 'depths' ...
But I know that's abusing terminology.
Any type with a -> outermost constructor is at a different Kind vs one without. Either is at a different Kind vs EmpId, because it is different arity. Type unification builds the 'most general unifier', and that makes them appear same-Kinded.
For the purposes here we want the opposite of the mgu -- call it the 'maximally specific Kind', MaSK for short.
We can express it with a closed Type Family and lots of overlapping equations (so the order of them is critical). This can also catch the Prelude's constructors that shouldn't count:
type family MaSK ( a :: * ) where
-- presume the result is one from some pre-declared bunch of types
-- use that result to verify all 'elements' of a set are same-kinded
MaSK (_ -> _ _ _) = No -- e.g. bare `Left`
MaSK (_ -> [ _ ]) = No -- reject unwanted constructors
MaSK (_ -> Maybe _ ) = No -- ditto
MaSK (a' -> n a') = YesAsBareTag -- this we want
MaSK (_ -> _ _ ) = No --
MaSK (_ -> _ ) = No
MaSK ( _ , _ , _ , _ ) = YesAsSet -- etc for greater arities
MaSK ( _ , _ , _ ) = YesAsSet
MaSK ( _ , _ ) = YesAsSet
MaSK (_ _ _ ) = No -- too much arity, e.g. `Either b c`
MaSK [ _ ] = No -- reject unwanted constructors
MaSK (Maybe _) = No -- ditto
MaSK (n a) = YesAsTagValue -- this we want providing all the above eliminated
MaSK _ = No -- i.e. bare `Int, Bool, Char, ...`
Limitations: this approach can't check there's a single data constructor for the type, nor that other constructors for that type match the pattern, nor that the constructor is named same as the type, nor that the constructor might smuggle in existentially-quantified parameters. For that, go full metal generics.

Polymorphic return types and "rigid type variable" error in Haskell

There's a simple record Column v a which holds a Vector from the Data.Vector family (so that v can be Vector.Unboxed, just Vector etc), its name and its type (a simple enum-like ADT SupportedTypes). I would like to be able to serialize it using the binary package. To do that, I try to define a Binary instance below.
Now put works fine, however when I try to define deserialization in the get function and want to set a specific type to the rawVector that is being returned based on the colType (U.Vector Int64 when it's PInt, U.Vector Double when it's PDouble etc) - I get this error message:
Couldn't match type v with U.Vector
v is a rigid type variable bound by the instance declaration at src/Quark/Base/Column.hs:75:10
Expected type: v a
Actual type: U.Vector Int64
error.
Is there a better way to achieve my goal - deserialize Vectors of different types based on the colType value or am I stuck with defining Binary instance for all possible Vector / primitive type combinations? Shouldn't be the case...
Somewhat new to Haskell and appreciate any help! Thanks!
{-# LANGUAGE OverloadedStrings, TransformListComp, RankNTypes,
TypeSynonymInstances, FlexibleInstances, OverloadedLists, DeriveGeneric #-}
{-# LANGUAGE MultiParamTypeClasses, FlexibleContexts,
TypeFamilies, ScopedTypeVariables, InstanceSigs #-}
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U
data Column v a = Column {rawVector :: G.Vector v a => v a, colName :: Text, colType :: SupportedTypes }
instance (G.Vector v a, Binary (v a)) => Binary (Column v a) where
    put Column {rawVector = vec, colName = cn, colType = ct} = do put (fromEnum ct) >> put cn >> put vec
    get = do t <- get :: Get Int
             nm <- get :: Get Text
             let pt = toEnum t :: SupportedTypes
             case pt of
                 PInt -> do vec <- get :: Get (U.Vector Int64)
                            return Column {rawVector = vec, colName = nm, colType = pt}
                 PDouble -> do vec <- get :: Get (U.Vector Double)
                               return Column {rawVector = vec, colName = nm, colType = pt}
UPDATED Thank you for all the answers below, some pretty good ideas! It's quite clear that what I want to do is impossible to achieve head-on - so that is my answer. But the other suggested solutions are a good reading in itself, thanks a bunch!

The type you are really trying to represent is
data Column v = Column (Either (v Int) (v Double))
but this representation may be unsatisfactory to you. So how do you write this type with the vector itself at the 'top level' of the constructor?
First, start with a representation of your sum (Either Int Double) at the type level, as opposed to the value level:
data IsSupportedType a where
TInt :: IsSupportedType Int
TDouble :: IsSupportedType Double
From here Column is actually quite simple:
data Column v a = Column (IsSupportedType a) (v a)
But you'll probably want the type a to be existentially quantified to use it how you want:
data Column v = forall a . Column (IsSupportedType a) (v a)
The binary instance is as follows:
instance (Binary (v Int), Binary (v Double)) => Binary (Column v) where
  put (Column t v) = do
    case t of
      TInt -> put (0 :: Int) >> put v
      TDouble -> put (1 :: Int) >> put v
  get = do
    t :: Int <- get
    case t of
      0 -> Column TInt <$> get
      1 -> Column TDouble <$> get
      _ -> fail "unknown column type tag"
Note that there is no inherent reliance in Vector here - v could really be anything.
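To see how such a Column is consumed afterwards, here is a small sketch that leaves the binary package out entirely. It assumes a list as the container v, and the functions sumColumn and decode are hypothetical; decode plays the role of get by choosing the element type from a runtime tag.

```haskell
{-# LANGUAGE ExistentialQuantification, GADTs #-}

-- Type-level witness of the supported element types, as in the answer.
data IsSupportedType a where
  TInt    :: IsSupportedType Int
  TDouble :: IsSupportedType Double

-- The element type is hidden; the witness lets us recover it later.
data Column v = forall a. Column (IsSupportedType a) (v a)

-- Matching on the witness tells the type checker what a is in each branch.
sumColumn :: Column [] -> Double
sumColumn (Column TInt xs)    = fromIntegral (sum xs)
sumColumn (Column TDouble xs) = sum xs

-- Pick the element type from a runtime tag (hypothetical stand-in for get).
decode :: Int -> [Double] -> Maybe (Column [])
decode 0 raw = Just (Column TInt (map round raw))
decode 1 raw = Just (Column TDouble raw)
decode _ _   = Nothing
```

The return type of decode is the same for both tags, which is why this type checks where the original get with its rigid v a did not.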

The problem you're actually running into (or if you're not yet, that you will) is that you're trying to decide a resulting type from an input value. You cannot do that. At all. You could cleverly lock the result type in a box and throw away the key so the type appears to be normal from the outside, but then you cannot do anything much with it because you locked the type in a box and threw away the key. You can store extra information about it using GADTs and boxing it up with a type class instance, but even still this is not a great idea.
You could make your life far easier here if you simply had two constructors for Column to reflect whether there was a vector of Ints or Doubles.
But really, don't do any of that. Just let the automatically derivable Binary instance deserialize any deserializable value into your vector for you.
data Column a = ... deriving (Binary)
This uses the DeriveAnyClass extension, which lets you derive any class that has a Generic-based default implementation (which Binary has). Then just deserialize a Column Double or a Column Int when you need it.

As the comment says, you can simply not case on the type, and always call
vec <- get
return Column {rawVector = vec, colName = nm, colType = pt}
This fulfills your type signature properly. But note that colType is not useful to you here -- you have no way to enforce that it corresponds to the type within your vector, since it only exists at the value level. But that may be ok, and you may simply want to remove colType from your data structure altogether, since you can always derive it directly from the concrete type of a chosen in Column v a.
In fact, the constraint in the Column type isn't doing much good either, and I think it would be better to render it just as
data Column v a = Column {rawVector :: v a, colName :: Text}
Now you can just enforce the G.Vector constraint at call sites where necessary...

Why are type-safe relational operations so difficult?

I was trying to code a relational problem in Haskell, when I had to find out that doing this in a type safe manner is far from obvious. E.g. a humble
select 1,a,b from T
already raises a number of questions:
what is the type of this function?
what is the type of the projection 1,a,b ? What is the type of a projection in general?
what is the result type and how do I express the relationship between the result type and the projection?
what is the type of such a function which accepts any valid projection?
how can I detect invalid projections at compile time ?
How would I add a column to a table or to a projection?
I believe even Oracle's PL/SQL language does not get this quite right. While invalid projections are mostly detected at compile time, there is a large number of type errors which only show at runtime. Most other bindings to RDBMSs (e.g. Java's jdbc and perl's DBI) use SQL contained in Strings and thus give up type-safety entirely.
Further research showed that there are some Haskell libraries (HList, vinyl and TRex), which provide type-safe extensible records and some more. But these libraries all require Haskell extensions like DataKinds, FlexibleContexts and many more. Furthermore these libraries are not easy to use and have a smell of trickery, at least to uninitiated observers like me.
This suggests that type-safe relational operations do not fit in well with the functional paradigm, at least not as it is implemented in Haskell.
My questions are the following:
What are the fundamental causes of this difficulty to model relational operations in a type safe way. Where does Hindley-Milner fall short? Or does the problem originate at typed lambda calculus already?
Is there a paradigm, where relational operations are first class citizens? And if so, is there a real-world implementation?

Let's define a table indexed on some columns as a type with two type parameters:
data IndexedTable k v = ???
groupBy :: (v -> k) -> IndexedTable k v
-- A table without an index just has an empty key
type Table = IndexedTable ()
k will be a (possibly nested) tuple of all columns that the table is indexed on. v will be a (possibly nested) tuple of all columns that the table is not indexed on.
So, for example, if we had the following table
| Id | First Name | Last Name |
|----|------------|-----------|
| 0 | Gabriel | Gonzalez |
| 1 | Oscar | Boykin |
| 2 | Edgar | Codd |
... and it were indexed on the first column, then the type would be:
type Id = Int
type FirstName = String
type LastName = String
IndexedTable Int (FirstName, LastName)
However, if it were indexed on the first and second column, then the type would be:
IndexedTable (Int, FirstName) LastName
Table would implement the Functor, Applicative, and Alternative type classes. In other words:
instance Functor (IndexedTable k)
instance Applicative (IndexedTable k)
instance Alternative (IndexedTable k)
So joins would be implemented as:
join :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, v2)
join t1 t2 = liftA2 (,) t1 t2
leftJoin :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, Maybe v2)
leftJoin t1 t2 = liftA2 (,) t1 (optional t2)
rightJoin :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (Maybe v1, v2)
rightJoin t1 t2 = liftA2 (,) (optional t1) t2
Then you would have a separate type that we will call a Select. This type will also have two type parameters:
data Select v r = ???
A Select would consume a bunch of rows of type v from the table and produce a result of type r. In other words, we should have a function of type:
selectIndexed :: IndexedTable k v -> Select v r -> r
Some example Selects that we might define would be:
count :: Select v Integer
sum :: Num a => Select a a
product :: Num a => Select a a
max :: Ord a => Select a a
This Select type would implement the Applicative interface, so we could combine multiple Selects into a single Select. For example:
liftA2 (,) count sum :: Select Integer (Integer, Integer)
That would be analogous to this SQL:
SELECT COUNT(*), SUM(*)
However, often our table will have multiple columns, so we need a way to focus a Select onto a single column. Let's call this function focus:
focus :: Lens' a b -> Select b r -> Select a r
So that we can write things like:
liftA3 (,,) (focus _1 sum) (focus _2 product) (focus _3 max)
:: (Num a, Num b, Ord c)
=> Select (a, b, c) (a, b, c)
So if we wanted to write something like:
SELECT COUNT(*), MAX(firstName) FROM t
That would be equivalent to this Haskell code:
firstName :: Lens' Row String
table :: Table Row
select table (liftA2 (,) count (focus firstName max)) :: (Integer, String)
So you might wonder how one might implement Select and Table.
I describe how to implement Table in this post:
http://www.haskellforall.com/2014/12/a-very-general-api-for-relational-joins.html
... and you can implement Select as just:
type Select = Control.Foldl.Fold
focus = Control.Foldl.pretraverse
-- Assuming you define a `Foldable` instance for `IndexedTable`
select t s = Control.Foldl.fold s t
Also, keep in mind that these are not the only ways to implement Table and Select. They are just a simple implementation to get you started and you can generalize them as necessary.
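If you would rather not pull in the foldl package, the idea behind Select can be sketched in a few lines. This is a hand-rolled approximation that mirrors the shape of a left fold (step function, initial state, final extraction), not the library's code. Two assumptions to note: focus takes a plain getter function instead of a lens, and the sum selector is called total to avoid clashing with the Prelude. The example value countAndTotal is hypothetical.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Control.Applicative (liftA2)
import Data.Foldable       (foldl')

-- A Select is a left fold: a step function, an initial state, and a
-- final extraction, with the state type x hidden.
data Select v r = forall x. Select (x -> v -> x) x (x -> r)

instance Functor (Select v) where
  fmap f (Select step begin done) = Select step begin (f . done)

-- Combining two Selects runs both over the rows in a single pass.
instance Applicative (Select v) where
  pure r = Select (\() _ -> ()) () (const r)
  Select stepL beginL doneL <*> Select stepR beginR doneR =
    Select (\(l, r) v -> (stepL l v, stepR r v))
           (beginL, beginR)
           (\(l, r) -> doneL l (doneR r))

-- Run a Select over any Foldable collection of rows.
select :: Foldable t => t v -> Select v r -> r
select table (Select step begin done) = done (foldl' step begin table)

count :: Select v Integer
count = Select (\n _ -> n + 1) 0 id

total :: Num a => Select a a
total = Select (+) 0 id

-- Assumption: a getter function stands in for the lens argument.
focus :: (a -> b) -> Select b r -> Select a r
focus getter (Select step begin done) =
  Select (\x a -> step x (getter a)) begin done

-- Roughly: SELECT COUNT(*), SUM(first column)
countAndTotal :: Select (Int, String) (Integer, Int)
countAndTotal = liftA2 (,) count (focus fst total)
```

The existential state type is what lets count and total be combined even though they accumulate different things internally.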
What about selecting columns from a table? Well, you can define:
column :: Select a [a]
column = Control.Foldl.list
So if you wanted to do:
SELECT col FROM t
... you would write:
field :: Lens' Row Field
table :: Table Row
select table (focus field column) :: [Field]
The important takeaway is that you can implement a relational API in Haskell just fine without any fancy type system extensions.

Is there a compiler-extension for untagged union types in Haskell?

In some languages (#racket/typed, for example), the programmer can specify a union type without discriminating against it, for instance, the type (U Integer String) captures integers and strings, without tagging them (I Integer) (S String) in a data IntOrStringUnion = ... form or anything like that.
Is there a way to do the same in Haskell?

Either is what you're looking for... ish.
In Haskell terms, I'd describe what you're looking for as an anonymous sum type. By anonymous, I mean that it doesn't have a defined name (like something with a data declaration). By sum type, I mean a data type that can have one of several (distinguishable) types; a tagged union or such. (If you're not familiar with this terminology, try Wikipedia for starters.)
We have a well-known idiomatic anonymous product type, which is just a tuple. If you want to have both an Int and a String, you just smush them together with a comma: (Int, String). And tuples (seemingly) can go on forever--(Int, String, Double, Word), and you can pattern-match the same way. (There's a limit, but never mind.)
The well-known idiomatic anonymous sum type is Either, from Data.Either (and the Prelude):
data Either a b = Left a | Right b
deriving (Eq, Ord, Read, Show, Typeable)
It has some shortcomings, most prominently a Functor instance that favors Right in a way that's odd in this context. The problem is that chaining it introduces a lot of awkwardness: the type ends up like Either Int (Either String (Either Double Word)). Pattern matching is even more awkward, as others have noted.
I just want to note that we can get closer to (what I understand to be) the Racket use case. From my extremely brief Googling, it looks like in Racket you can use functions like isNumber? to determine what type is actually in a given value of a union type. In Haskell, we usually do that with case analysis (pattern matching), but that's awkward with Either, and a function using simple pattern-matching will likely end up hard-wired to a particular union type. We can do better.
IsNumber?
I'm going to write a function I think is an idiomatic Haskell stand-in for isNumber?. First, we don't like doing Boolean tests and then running functions that assume their result; instead, we tend to just convert to Maybe and go from there. So the function's type will end with -> Maybe Int. (Using Int as a stand-in for now.)
But what's on the left hand of the arrow? "Something that might be an Int -- or a String, or whatever other types we put in the union." Uh, okay. So it's going to be one of a number of types. That sounds like typeclass, so we'll put a constraint and a type variable on the left hand of the arrow: MightBeInt a => a -> Maybe Int. Okay, let's write out the class:
class MightBeInt a where
isInt :: a -> Maybe Int
fromInt :: Int -> a
Okay, now how do we write the instances? Well, we know if the first parameter to Either is Int, we're golden, so let's write that out. (Incidentally, if you want a nice exercise, only look at the instance ... where parts of these next three code blocks, and try to implement the class members yourself.)
instance MightBeInt (Either Int b) where
isInt (Left i) = Just i
isInt _ = Nothing
fromInt = Left
Fine. And ditto if Int is the second parameter:
instance MightBeInt (Either a Int) where
isInt (Right i) = Just i
isInt _ = Nothing
fromInt = Right
But what about Either String (Either Bool Int)? The trick is to recurse on the right hand type: if it's not Int, is it an instance of MightBeInt itself?
instance MightBeInt b => MightBeInt (Either a b) where
isInt (Right xs) = isInt xs
isInt _ = Nothing
fromInt = Right . fromInt
(Note that these all require FlexibleInstances and OverlappingInstances.) It took me a long time to get a feel for writing and reading these class instances; don't worry if this instance is surprising. The punch line is that we can now do this:
anInt1 :: Either Int String
anInt1 = fromInt 1
anInt2 :: Either String (Either Int Double)
anInt2 = fromInt 2
anInt3 :: Either String Int
anInt3 = fromInt 3
notAnInt :: Either String Int
notAnInt = Left "notint"
ghci> isInt anInt3
Just 3
ghci> isInt notAnInt
Nothing
Great!
Generalizing
Okay, but now do we need to write another type class for each type we want to look up? Nope! We can parameterize the class by the type we want to look up! It's a pretty mechanical translation; the only question is how to tell the compiler what type we're looking for, and that's where Proxy comes to the rescue. (If you don't want to install tagged or run base 4.7, just define data Proxy a = Proxy. It's nothing special, but you'll need PolyKinds.)
class MightBeA t a where
isA :: proxy t -> a -> Maybe t
fromA :: t -> a
instance MightBeA t t where
isA _ = Just
fromA = id
instance MightBeA t (Either t b) where
isA _ (Left i) = Just i
isA _ _ = Nothing
fromA = Left
instance MightBeA t b => MightBeA t (Either a b) where
isA p (Right xs) = isA p xs
isA _ _ = Nothing
fromA = Right . fromA
ghci> isA (Proxy :: Proxy Int) anInt3
Just 3
ghci> isA (Proxy :: Proxy String) notAnInt
Just "notint"
Now the usability situation is... better. The main thing we've lost, by the way, is the exhaustiveness checker.
Notational Parity With (U String Int Double)
For fun, in GHC 7.8 we can use DataKinds and TypeFamilies to eliminate the infix type constructors in favor of type-level lists. (In Haskell, you can't have one type constructor--like IO or Either--take a variable number of parameters, but a type-level list is just one parameter.) It's just a few lines, which I'm not really going to explain:
type family OneOf (as :: [*]) :: * where
OneOf '[] = Void
OneOf '[a] = a
OneOf (a ': as) = Either a (OneOf as)
Note that you'll need to import Data.Void. Now we can do this:
anInt4 :: OneOf '[Int, Double, Float, String]
anInt4 = fromInt 4
ghci> :kind! OneOf '[Int, Double, Float, String]
OneOf '[Int, Double, Float, String] :: *
= Either Int (Either Double (Either Float [Char]))
In other words, OneOf '[Int, Double, Float, String] is the same as Either Int (Either Double (Either Float [Char])).
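As a small self-contained check of that expansion, here is a sketch that consumes such a value with ordinary Left/Right patterns. It assumes a recent GHC, so it writes Type from Data.Kind where the text above uses *, and the render function is hypothetical:

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)
import Data.Void (Void)

-- Expand a type-level list into nested Eithers.
type family OneOf (as :: [Type]) :: Type where
  OneOf '[]       = Void
  OneOf '[a]      = a
  OneOf (a ': as) = Either a (OneOf as)

-- OneOf '[Int, Double, String] reduces to Either Int (Either Double String),
-- so the usual constructors can be matched directly.
render :: OneOf '[Int, Double, String] -> String
render (Left i)          = "Int " ++ show i
render (Right (Left d))  = "Double " ++ show d
render (Right (Right s)) = "String " ++ s
```

Note that the last alternative is a bare String rather than a Left or Right, because of the single-element equation.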

You need some kind of tagging because you need to be able to check if a value is actually an Integer or a String to use it for anything. One way to alleviate having to create a custom ADT for every combination is to use a type such as
{-# LANGUAGE TypeOperators #-}
data a :+: b = L a | R b
infixr 6 :+:
returnsIntOrString :: Integer -> Integer :+: String
returnsIntOrString i
| i `rem` 2 == 0 = R "Even"
| otherwise = L (i * 2)
returnsOneOfThree :: Integer -> Integer :+: String :+: Bool
returnsOneOfThree i
| i `rem` 2 == 0 = (R . L) "Even"
| i `rem` 3 == 0 = (R . R) False
| otherwise = L (i * 2)
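On the consuming side, the tags are what you pattern match on. A short sketch, repeating the definitions above so it stands alone; the describe function is a hypothetical example:

```haskell
{-# LANGUAGE TypeOperators #-}

data a :+: b = L a | R b
infixr 6 :+:

returnsOneOfThree :: Integer -> Integer :+: String :+: Bool
returnsOneOfThree i
  | i `rem` 2 == 0 = (R . L) "Even"
  | i `rem` 3 == 0 = (R . R) False
  | otherwise      = L (i * 2)

-- Because :+: is right-associative, the second and third alternatives
-- sit under one and two R constructors respectively.
describe :: Integer :+: String :+: Bool -> String
describe (L n)     = "integer " ++ show n
describe (R (L s)) = "string " ++ s
describe (R (R b)) = "bool " ++ show b
```

Unlike the class-based approach in the other answer, this keeps the exhaustiveness checker, at the cost of spelling out the nesting.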

Map identity functor over record

I have a record type like this one:
data VehicleState f = VehicleState
{
orientation :: f (Quaternion Double),
orientationRate :: f (Quaternion Double),
acceleration :: f (V3 (Acceleration Double)),
velocity :: f (V3 (Velocity Double)),
location :: f (Coordinate),
elapsedTime :: f (Time Double)
}
deriving (Show)
This is cool, because I can have a VehicleState Signal where I have all sorts of metadata, I can have a VehicleState (Wire s e m ()) where I have the netwire semantics of each signal, or I can have a VehicleState Identity where I have actual values observed at a certain time.
Is there a good way to map back and forth between VehicleState Identity and VehicleState', defined by mapping runIdentity over each field?
data VehicleState' = VehicleState'
{
orientation :: Quaternion Double,
orientationRate :: Quaternion Double,
acceleration :: V3 (Acceleration Double),
velocity :: V3 (Velocity Double),
location :: Coordinate,
elapsedTime :: Time Double
}
deriving (Show)
Obviously it's trivial to write one, but I actually have several types like this in my real application and I keep adding or removing fields, so it is tedious.
I am writing some Template Haskell that does it, just wondering if I am reinventing the wheel.

If you're not opposed to type families and don't need too much type inference, you can actually get away with using a single datatype:
import Data.Singletons.Prelude
data Record f = Record
{ x :: Apply f Int
, y :: Apply f Bool
, z :: Apply f String
}
type Record' = Record IdSym0
test1 :: Record (TyCon1 Maybe)
test1 = Record (Just 3) Nothing (Just "foo")
test2 :: Record'
test2 = Record 2 False "bar"
The Apply type family is defined in the singletons package. It can be applied to various type functions also defined in that package (and of course, you can define your own). The IdSym0 has the property that Apply IdSym0 x reduces to plain x. And TyCon1 has the property that Apply (TyCon1 f) x reduces to f x.
As demonstrated by test1 and test2, this allows both versions of your datatype. However, you need type annotations for most records now.
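If you want to try the idea without installing singletons, a closed type family that special-cases Identity gives a similar effect. To be clear, this is a different technique from Apply/IdSym0, and the names Field and complete are hypothetical:

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Functor.Identity (Identity)
import Data.Kind             (Type)

-- Field Identity a is just a; for any other functor it is f a.
type family Field (f :: Type -> Type) (a :: Type) :: Type where
  Field Identity a = a
  Field f        a = f a

data Record f = Record
  { x :: Field f Int
  , y :: Field f Bool
  , z :: Field f String
  }

type Record' = Record Identity

test1 :: Record Maybe
test1 = Record (Just 3) Nothing (Just "foo")

test2 :: Record'
test2 = Record 2 False "bar"

-- Collapse a record of optional fields into a plain record,
-- failing if any field is missing.
complete :: Record Maybe -> Maybe Record'
complete (Record mx my mz) = do
  vx <- mx
  vy <- my
  vz <- mz
  pure (Record vx vy vz)
```

As with the singletons version, type inference is weaker than with a plain functor parameter, so expect to write signatures on most record values.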
