Implementing DataType abstraction in Haskell - haskell

I want to implement an abstract data type in Haskell.
Given a module with a defined type, say MyType:
module A (myType,MyType) where
type MyType = Float
myType :: Float -> MyType
myType f = f
The type is implemented internally as a single Float, and I export only the type itself and a function that constructs a value from a Float.
The problem is that when I import that module, I can still access the implementation.
Given:
module B where
import A
data OtherType = One MyType
  | Two MyType MyType
  deriving Show
I can construct an object of type OtherType like this:
One $ myType 1.0
Or like this:
One $ (1.0 :: Float)
With a real abstraction I shouldn't be able to do that!
How can I export the type MyType in such a way that values can only be constructed with my constructor functions?

You can create an algebraic data type instead:
module A (myType,MyType) where
data MyType = MyType Float
myType :: Float -> MyType
myType f = MyType f
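Then, in module B, an expression like
One (MyType 3.0)
is rejected at compile time with "Not in scope: data constructor `MyType'", because the export list mentions MyType without (..), which exports the type but not its data constructor.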
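A minimal single-file sketch of the same idea, assuming you swap data for newtype (same runtime representation as a Float, but a distinct type). The constructor name MkMyType and the accessor fromMyType are hypothetical names chosen so the type and its constructor are easy to tell apart; the export list that does the actual hiding is shown in a comment, since it only takes effect when module A lives in its own file.

```haskell
-- Sketch of module A. In its own file it would start with
--   module A (MyType, myType, fromMyType) where
-- Listing MyType without (..) exports the type but not MkMyType.

-- newtype: a distinct type with the same runtime representation as Float
newtype MyType = MkMyType Float
  deriving Show

-- Smart constructor: the only way for other modules to build a MyType
myType :: Float -> MyType
myType f = MkMyType f

-- Read-only accessor, so clients can inspect values without forging them
fromMyType :: MyType -> Float
fromMyType (MkMyType f) = f

-- Module B's type, whose values can only be built through myType
data OtherType = One MyType
               | Two MyType MyType
  deriving Show
```

From module B, One (myType 1.0) is fine, while One (1.0 :: Float) is a type error and One (MkMyType 1.0) is a scope error.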

Related

Why can't I match type Int with type a

Haskell Noob here.
An oversimplified case of what I'm trying to do here:
test :: Int -> a
test i = i -- Couldn't match expected type ‘a’ with actual type ‘Int’. ‘a’ is a rigid type variable bound by ...
I don't quite understand why this wouldn't work. I mean, Int is surely included in something of type a.
What I was really trying to achieve is this:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
data EnumType = Enum1 | Enum2 | Enum3
data MyType (a :: EnumType) where
  Type1 :: Int -> MyType 'Enum1
  Type2 :: String -> MyType 'Enum2
  Type3 :: Bool -> MyType 'Enum3
myFunc :: EnumType -> MyType 'Enum1 -> MyType any
myFunc Enum1 t = t -- Can't match type `any` with `Enum1`. any is a rigid type variable bound by ...
myFunc Enum2 _ = Type2 "hi"
myFunc Enum3 _ = Type3 True
What is going on here? Is there a way to work around this or is it just something you can't do?
For the GADT function you want to write, the standard technique is to use singletons. The problem is that values of type EnumType are value-level things, but you want to inform the type system of something. So you need a way to connect types of kind EnumType with values of type EnumType (which itself has kind Type). That's impossible, so we cheat: we connect types x of kind EnumType with values of a new type, SEnumType x, such that the value uniquely determines x. Here's how it looks:
data SEnumType a where
  SEnum1 :: SEnumType Enum1
  SEnum2 :: SEnumType Enum2
  SEnum3 :: SEnumType Enum3
myFunc :: SEnumType a -> MyType Enum1 -> MyType a
myFunc SEnum1 t = t
myFunc SEnum2 _ = Type2 "hi"
myFunc SEnum3 _ = Type3 True
Now the a in the return type MyType a isn't just fabricated out of thin air; it is constrained to be equal to the incoming a from SEnumType, and pattern matching on which SEnumType it is lets you observe whether a is Enum1, Enum2, or Enum3.
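Putting the pieces together, here is a self-contained sketch of the singleton approach. The question's declarations are repeated so it compiles on its own; describe and fromEnum2 are hypothetical helpers added only to show that the caller gets back a precisely typed result.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

data EnumType = Enum1 | Enum2 | Enum3

data MyType (a :: EnumType) where
  Type1 :: Int    -> MyType 'Enum1
  Type2 :: String -> MyType 'Enum2
  Type3 :: Bool   -> MyType 'Enum3

-- Singleton: each value determines exactly one type-level EnumType
data SEnumType (a :: EnumType) where
  SEnum1 :: SEnumType 'Enum1
  SEnum2 :: SEnumType 'Enum2
  SEnum3 :: SEnumType 'Enum3

-- Matching on the singleton refines a in each equation
myFunc :: SEnumType a -> MyType 'Enum1 -> MyType a
myFunc SEnum1 t = t
myFunc SEnum2 _ = Type2 "hi"
myFunc SEnum3 _ = Type3 True

-- Hypothetical helper: works for any index
describe :: MyType a -> String
describe (Type1 n) = "Type1 " ++ show n
describe (Type2 s) = "Type2 " ++ s
describe (Type3 b) = "Type3 " ++ show b

-- Hypothetical helper: the index is known, so Type2 is the only possible case
fromEnum2 :: MyType 'Enum2 -> String
fromEnum2 (Type2 s) = s
```

Because myFunc SEnum2 t has type MyType 'Enum2, it can be passed straight to fromEnum2 with no runtime check.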
Is there a way to work around this or is it just something you can't do?
I'm afraid that it's just "something you can't do". The reason is simple to explain.
When you write a type signature like (to take your first, simpler example)
test :: Int -> a
or, to write it in the more literal, expanded form
test :: forall a. Int -> a
You are saying that literally, "for all a", this function can take an Int and return a value of type a. This is important, because calling code has to believe this type signature and therefore be able to do something like this (this isn't realistic code, but imagine a case where you feed the result of test 2 to a function that requires a Char or one of those other types):
test 2 :: Char
test 2 :: [Int]
test 2 :: (Double, [Char])
and so on. Clearly your function can't work with any of these examples, but it has to be able to work with all of them if you give it this type signature. Your code, quite simply, does not fit that type signature. (Nor could any other code, unless you "cheat" with something like test x = undefined.)
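To make the "caller chooses a" point concrete, here is a hedged sketch that swaps in a Num constraint (not part of the question or this answer). The constraint is what gives the function a way to produce a value of whatever type the caller picks, as long as that type has a Num instance.

```haskell
-- Hypothetical variant: the Num constraint supplies fromIntegral,
-- which can build a value of any numeric type the caller chooses for a.
test :: Num a => Int -> a
test i = fromIntegral i
```

With this signature, test 2 :: Double and test 2 :: Integer both type-check, while test 2 :: Char is rejected because there is no Num Char instance.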
This shouldn't be a problem though - the compiler is simply protecting you from a mistake, because I'm sure you realise that your code cannot satisfy this type signature. To take your "real" example:
myFunc :: EnumType -> MyType Enum1 -> MyType any
although this produces a compilation error, your code in the function is likely correct, and the problem is the type signature. If you replace it with
myFunc :: EnumType -> MyType Enum1 -> MyType Enum1
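then the Enum1 equation will type-check, and if you only ever need a MyType 'Enum1 back, that may be all you need. Note, however, that the Enum2 and Enum3 equations return a MyType 'Enum2 and a MyType 'Enum3, so they will still be rejected under this signature. If you really do need the result type to depend on the EnumType argument, you need one of the techniques from the other answers (a singleton argument or an existential wrapper); otherwise, I'd suggest asking a separate question where you elaborate on what you actually need.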
As was already said, your signature expresses a universal type
myFunc :: ∀ a . EnumType -> MyType 'Enum1 -> MyType a
whereas what you're actually trying to express is an existential type
myFunc :: EnumType -> MyType 'Enum1 -> (∃ a . MyType a)
Haskell doesn't have a feature quite like that, but it does have ways to achieve essentially the same thing.
Both the GADTs and the ExistentialQuantification extensions allow expressing existentials, but you need to define a separate type for them.
data MyDynType where
  MyDynWrap :: MyType a -> MyDynType
myFunc :: EnumType -> MyType 'Enum1 -> MyDynType
myFunc Enum1 t = MyDynWrap t
myFunc Enum2 _ = MyDynWrap $ Type2 "hi"
myFunc Enum3 _ = MyDynWrap $ Type3 True
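A self-contained sketch of the existential wrapper, repeating the question's GADT so it compiles alone. The consumer functions describeT and describe are hypothetical; they show that once the index is hidden, code that unwraps a MyDynType must be ready for every constructor.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

data EnumType = Enum1 | Enum2 | Enum3

data MyType (a :: EnumType) where
  Type1 :: Int    -> MyType 'Enum1
  Type2 :: String -> MyType 'Enum2
  Type3 :: Bool   -> MyType 'Enum3

-- Existential wrapper: the index a is hidden inside
data MyDynType where
  MyDynWrap :: MyType a -> MyDynType

myFunc :: EnumType -> MyType 'Enum1 -> MyDynType
myFunc Enum1 t = MyDynWrap t
myFunc Enum2 _ = MyDynWrap (Type2 "hi")
myFunc Enum3 _ = MyDynWrap (Type3 True)

-- Hypothetical consumer that works for any index
describeT :: MyType a -> String
describeT (Type1 n) = "Type1 " ++ show n
describeT (Type2 s) = "Type2 " ++ s
describeT (Type3 b) = "Type3 " ++ show b

-- Unwrapping: a is unknown here, so all constructors must be handled
describe :: MyDynType -> String
describe (MyDynWrap t) = describeT t
```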
Maybe you don't even need a separate type, but can simply modify MyType to be “dynamic” in the first place.
data MyType = Type1 Int | Type2 String | Type3 Bool
myFunc :: EnumType -> MyType -> MyType
myFunc Enum1 (Type1 i) = Type1 i
myFunc Enum2 _ = Type2 "hi"
myFunc Enum3 _ = Type3 True
Existentials can also be emulated on the spot, anonymously, by switching to continuation-passing style and using the dual universal quantifier via the RankNTypes extension: instead of returning "some MyType a", myFunc takes a continuation that must accept a MyType a for every a.
{-# LANGUAGE RankNTypes #-}
data MyType (a :: EnumType) where ... -- as original
myFunc :: EnumType -> MyType 'Enum1 -> (forall a . MyType a -> r) -> r
myFunc Enum1 t q = q t
myFunc Enum2 _ q = q (Type2 "hi")
myFunc Enum3 _ q = q (Type3 True)
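Spelled out in full, the continuation-passing version might look like the sketch below. The GADT is repeated in place of the "as original" placeholder, and describeT is a hypothetical continuation; any function that is polymorphic in the index can be passed.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}

data EnumType = Enum1 | Enum2 | Enum3

data MyType (a :: EnumType) where
  Type1 :: Int    -> MyType 'Enum1
  Type2 :: String -> MyType 'Enum2
  Type3 :: Bool   -> MyType 'Enum3

-- Rank-2 argument: the continuation must work for every index a
myFunc :: EnumType -> MyType 'Enum1 -> (forall a. MyType a -> r) -> r
myFunc Enum1 t q = q t
myFunc Enum2 _ q = q (Type2 "hi")
myFunc Enum3 _ q = q (Type3 True)

-- Hypothetical continuation, polymorphic in a
describeT :: MyType a -> String
describeT (Type1 n) = "Type1 " ++ show n
describeT (Type2 s) = "Type2 " ++ s
describeT (Type3 b) = "Type3 " ++ show b
```

A call such as myFunc Enum2 (Type1 0) describeT returns a plain String, so no wrapper type is needed at all.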

Clarifying Data Constructor in Haskell

In the following:
data DataType a = Data a | Datum
I understand that data constructors are value-level functions. What we do above is define their types. They can be functions of any arity, or constants. That's fine. I'm OK with saying Datum constructs Datum. What is not explicit and clear to me here is the difference between the constructor function and what it produces. Please let me know if I am getting it right:
1 - a) Basically, is writing Data a defining both a data structure and its constructor function (as in Scala or Java, where the class and the constructor usually have the same name)?
2 - b) So if I unpack this and make an analogy: with Data a we are defining a structure of objects (a data structure; I don't want to say class, because class already implies a type, I think, but maybe we could), and the constructor function (data constructor/value constructor), and the latter returns an object with that structure. Finally, the type of that object structure is given by the type constructor. An object structure, in a sense, is just a tag surrounding a bunch of values of some types. Is my understanding correct?
3 - c) Can I formally say:
Data constructors that are nullary represent constant values -> they return the constant value itself, whose type is given by the type constructor at the definition site.
Data constructors that take an argument represent a class of values, where the class is a tag? -> they return an infinite number of objects of that class, whose type is given by the type constructor at the definition site.
Another way of writing this:
data DataType a = Data a | Datum
is with generalised algebraic data type (GADT) syntax, using the GADTSyntax extension, which lets us specify the types of the constructors explicitly:
{-# LANGUAGE GADTSyntax #-}
data DataType a where
  Data :: a -> DataType a
  Datum :: DataType a
(The GADTs extension would work too; it would also allow us to specify constructors with different type arguments in the result, like DataType Int vs. DataType Bool, but that’s a more advanced topic, and we don’t need that functionality here.)
These are exactly the types you would see in GHCi if you asked for the types of the constructor functions with :type / :t:
> :{
| data DataType a where
|   Data :: a -> DataType a
|   Datum :: DataType a
| :}
> :type Data
Data :: a -> DataType a
> :t Datum
Datum :: DataType a
With ExplicitForAll we can also specify the scope of the type variables explicitly, and make it clearer that the a in the data definition is a separate variable from the a in the constructor definitions by also giving them different names:
data DataType a where
  Data :: forall b. b -> DataType b
  Datum :: forall c. DataType c
Some more examples of this notation with standard prelude types:
data Either a b where
  Left :: forall a b. a -> Either a b
  Right :: forall a b. b -> Either a b
data Maybe a where
  Nothing :: Maybe a
  Just :: a -> Maybe a
data Bool where
  False :: Bool
  True :: Bool
data Ordering where
  LT, EQ, GT :: Ordering -- Shorthand for repeated ‘:: Ordering’
I understand that data constructors are value-level functions. What we do above is define their types. They can be functions of any arity, or constants. That's fine. I'm OK with saying Datum constructs Datum. What is not explicit and clear to me here is the difference between the constructor function and what it produces.
Datum and Data are both “constructors” of DataType a values; neither Datum nor Data is a type! These are just “tags” that select between the possible varieties of a DataType a value.
What is produced is always a value of type DataType a for a given a; the constructor selects which “shape” it takes.
A rough analogue of this is a union in languages like C or C++, plus an enumeration for the “tag”. In pseudocode:
enum Tag {
  DataTag,
  DatumTag,
}
// A single anonymous field.
struct DataFields<A> {
  A field1;
}
// No fields.
struct DatumFields<A> {};
// A union of the possible field types.
union Fields<A> {
  DataFields<A> data;
  DatumFields<A> datum;
}
// A pair of a tag with the fields for that tag.
struct DataType<A> {
  Tag tag;
  Fields<A> fields;
}
The constructors are then just functions returning a value with the appropriate tag and fields. Pseudocode:
<A> DataType<A> newData(A x) {
  DataType<A> result;
  result.tag = DataTag;
  result.fields.data.field1 = x;
  return result;
}
<A> DataType<A> newDatum() {
  DataType<A> result;
  result.tag = DatumTag;
  // No fields.
  return result;
}
Unions are unsafe, since the tag and fields can get out of sync, but sum types are safe because they couple these together.
A pattern-match like this in Haskell:
case someDT of
  Datum -> f
  Data x -> g x
is a combination of testing the tag and extracting the fields. Again, in pseudocode:
if (someDT.tag == DatumTag) {
  f();
} else if (someDT.tag == DataTag) {
  var x = someDT.fields.data.field1;
  g(x);
}
Again this is coupled in Haskell to ensure that you can only ever access the fields if you have checked the tag by pattern-matching.
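For comparison, the Haskell side of that pseudocode fits in one function. This is a sketch; foldDT is a hypothetical name, and its shape is the same as the Prelude's maybe :: b -> (a -> b) -> Maybe a -> b.

```haskell
data DataType a = Data a | Datum
  deriving Show

-- One case expression both tests the tag and binds the field,
-- so the field can never be read under the wrong tag.
foldDT :: r -> (a -> r) -> DataType a -> r
foldDT f g someDT = case someDT of
  Datum  -> f
  Data x -> g x
```

For example, foldDT 0 (+ 1) (Data 41) gives 42, and foldDT 0 (+ 1) Datum gives 0.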
So, in answer to your questions:
1 - a) Basically, is writing Data a defining both a data structure and its constructor function (as in Scala or Java, where the class and the constructor usually have the same name)?
Data a in your original code is not defining a data structure, in that Data is not a separate type from DataType a; it’s just one of the possible tags that a DataType a value may have. Internally, a value of type DataType Int is one of the following:
The tag for Data (in GHC, a pointer to an “info table” for the constructor), and a reference to a value of type Int.
x = Data (1 :: Int) :: DataType Int
       +----------+----------------+     +---------+----------------+
x ---->| Data tag | pointer to Int |---->| Int tag | unboxed Int# 1 |
       +----------+----------------+     +---------+----------------+
The tag for Datum, and no other fields.
y = Datum :: DataType Int
        +-----------+
y ----> | Datum tag |
        +-----------+
In a language with unions, the size of a union is the maximum of all its alternatives, since a value of the union type must be able to hold any of the alternatives and may be mutated from one to another. In Haskell, values are immutable, so each one needs only the space for its own constructor’s fields, with no extra “padding”.
It’s a similar situation for standard data types, e.g., a product or sum type:
(x :: X, y :: Y) :: (X, Y)
+---------+--------------+--------------+
| (,) tag | pointer to X | pointer to Y |
+---------+--------------+--------------+
Left (m :: M) :: Either M N
+-----------+--------------+
| Left tag  | pointer to M |
+-----------+--------------+
Right (n :: N) :: Either M N
+-----------+--------------+
| Right tag | pointer to N |
+-----------+--------------+
2 - b) So if I unpack this and make an analogy: with Data a we are defining a structure of objects (a data structure; I don't want to say class, because class already implies a type, I think, but maybe we could), and the constructor function (data constructor/value constructor), and the latter returns an object with that structure. Finally, the type of that object structure is given by the type constructor. An object structure, in a sense, is just a tag surrounding a bunch of values of some types. Is my understanding correct?
This is sort of correct, but again, the constructors Data and Datum aren’t “data structures” by themselves. They’re just the names used to introduce (construct) and eliminate (match) values of type DataType a, for some type a that is chosen by the caller of the constructors to fill in the forall.
data DataType a = Data a | Datum says:
If some term e has type T, then the term Data e has type DataType T
Inversely, if some value of type DataType T matches the pattern Data x, then x has type T in the scope of the match (case branch or function equation)
The term Datum has type DataType T for any type T
3 - c) Can I formally say:
Data constructors that are nullary represent constant values -> they return the constant value itself, whose type is given by the type constructor at the definition site.
Data constructors that take an argument represent a class of values, where the class is a tag? -> they return an infinite number of objects of that class, whose type is given by the type constructor at the definition site.
Not exactly. A type constructor like DataType :: Type -> Type, Maybe :: Type -> Type, or Either :: Type -> Type -> Type, or [] :: Type -> Type (list), or a polymorphic data type, represents an “infinite” family of concrete types (Maybe Int, Maybe Char, Maybe (String -> String), …) but only in the same way that id :: forall a. a -> a represents an “infinite” family of functions (id :: Int -> Int, id :: Char -> Char, id :: String -> String, …).
That is, the type a here is a parameter filled in with an argument value given by the caller. Usually this is implicit, through type inference, but you can specify it explicitly with the TypeApplications extension:
-- Akin to: \ (a :: Type) -> \ (x :: a) -> x
id :: forall a. a -> a
id x = x
id @Int :: Int -> Int
id @Int 1 :: Int
Data :: forall a. a -> DataType a
Data @Char :: Char -> DataType Char
Data @Char 'x' :: DataType Char
The data constructors of each instantiation don’t really have anything to do with each other. There’s nothing in common between the instantiations Data :: Int -> DataType Int and Data :: Char -> DataType Char, apart from the fact that they share the same tag name.
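As a compilable sketch of those type applications (the names dataChar, datumInt and idInt are hypothetical, chosen for illustration):

```haskell
{-# LANGUAGE TypeApplications #-}

data DataType a = Data a | Datum
  deriving Show

-- Instantiate the constructor's type variable explicitly at Char
dataChar :: DataType Char
dataChar = Data @Char 'x'

-- Datum has no field to infer a from; @Int fixes it at the use site
datumInt :: DataType Int
datumInt = Datum @Int

-- The same mechanism instantiates an ordinary polymorphic function
idInt :: Int -> Int
idInt = id @Int
```

Each instantiation is a separate, monomorphic use of the same polymorphic constructor or function.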
Another way of thinking about this in Java terms is with the visitor pattern. DataType would be represented as a function that accepts a “DataType visitor”, and then the constructors don’t correspond to separate data types, they’re just the methods of the visitor which accept the fields and return some result. Writing the equivalent code in Java is a worthwhile exercise, but here it is in Haskell:
{-# LANGUAGE RankNTypes #-}
-- (Allows passing polymorphic functions as arguments.)
type DataType a
  = forall r.   -- A visitor with a generic result type
    r           -- With one “method” for the ‘Datum’ case (no fields)
    -> (a -> r) -- And one for the ‘Data’ case (one field)
    -> r        -- Returning the result
newData :: a -> DataType a
newData field = \ _visitDatum visitData -> visitData field
newDatum :: DataType a
newDatum = \ visitDatum _visitData -> visitDatum
Pattern-matching is simply running the visitor:
matchDT :: DataType a -> b -> (a -> b) -> b
matchDT dt visitDatum visitData = dt visitDatum visitData
-- Or: matchDT dt = dt
-- Or: matchDT = id
-- case someDT of { Datum -> f; Data x -> g x }
-- corresponds to:
--   someDT f (\ x -> g x)
-- where
--   f      :: r
--   g      :: a -> r
--   someDT :: DataType a
--          :: forall r. r -> (a -> r) -> r
Similarly, in Haskell, data constructors are just the ways of introducing and eliminating values of a user-defined type.
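To check that the visitor encoding really carries the same information as the ordinary data type, here is a small sketch that converts to and from Maybe, which has the same shape (Nothing for Datum, Just for Data). The names toMaybe and fromMaybeDT are hypothetical.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Visitor encoding, as in the answer above
type DataType a = forall r. r -> (a -> r) -> r

newData :: a -> DataType a
newData field = \ _visitDatum visitData -> visitData field

newDatum :: DataType a
newDatum = \ visitDatum _visitData -> visitDatum

-- "Pattern matching" is running the visitor with Maybe's constructors
toMaybe :: DataType a -> Maybe a
toMaybe dt = dt Nothing Just

-- And the other direction: pick the visitor-encoded constructor
fromMaybeDT :: Maybe a -> DataType a
fromMaybeDT Nothing  = newDatum
fromMaybeDT (Just x) = newData x
```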
What is not explicit and clear to me here is the difference between the constructor function and what it produces
I'm having trouble following your question, but I think you are complicating things. I would suggest not thinking too deeply about the "constructor" terminology.
But hopefully the following helps:
Starting simple:
data DataType = Data Int | Datum
The above reads "Declare a new type named DataType, which has the possible values Datum or Data <some_number> (e.g. Data 42)"
So e.g. Datum is a value of type DataType.
Going back to your example with a type parameter, I want to point out what the syntax is doing:
data DataType a = Data a | Datum
     ^        ^        ^ These things appear in type signatures (type level)
                  ^        ^ These things appear in code (value level stuff)
There's a bit of punning happening here, so in the data declaration you might see "Data Int", which mixes type-level and value-level stuff in a way that you wouldn't see in code. In code you'd see e.g. Data 42 or Data someVal.
I hope that helps a little...

What is the right way to declare data that is an extension of another data

I am modelling a set of "things". For the most part all the things have the same characteristics.
data Thing = Thing { chOne :: Int, chTwo :: Int }
There is a small subset of things that can be considered to have an "extended" set of characteristics in addition to the base set shared by all members.
chThree :: String
I'd like to have functions that can operate on both kinds of things (these functions only care about properties chOne and chTwo):
foo :: Thing -> Int
I'd also like to have functions that operate on the kind of things with the chThree characteristic.
bar :: ThingLike -> String
I could do
data ThingBase = Thing { chOne :: Int, chTwo :: Int }
data ThingExt = Thing { chOne :: Int, chTwo :: Int, chThree :: Int }
fooBase :: ThingBase -> Int
fooExt :: ThingExt -> Int
bar :: ThingExt -> String
But this is hideous.
I guess I could use type classes, but all the boilerplate suggests this is wrong:
class ThingBaseClass a where
  chOne' :: a -> Int
  chTwo' :: a -> Int
instance ThingBaseClass ThingBase where
  chOne' = chOne
  chTwo' = chTwo
instance ThingBaseClass ThingExt where
  chOne' = chOne
  chTwo' = chTwo
class ThingExtClass a where
  chThree' :: a -> String
instance ThingExtClass ThingExt where
  chThree' = chThree
foo :: ThingBaseClass a => a -> Int
bar :: ThingExtClass a => a -> String
What is the right way to do this?
One way to do this is the equivalent of OO aggregation:
data ThingExt = ThingExt { thing :: Thing, chThree :: String }
You can then write instances of a class like the one in your post:
instance ThingBaseClass ThingExt where
  chOne' = chOne . thing
  chTwo' = chTwo . thing
If you are using the lens library, you can use makeClassy, which will generate all of this boilerplate for you.
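A self-contained sketch of the aggregation approach without lens. The class HasBase and its methods baseOne and baseTwo are hypothetical stand-ins for the class in the question, renamed so they don't clash with the record fields.

```haskell
-- Base record shared by all things
data Thing = Thing { chOne :: Int, chTwo :: Int }
  deriving Show

-- Extended thing: aggregates a Thing plus the extra field
data ThingExt = ThingExt { thing :: Thing, chThree :: String }
  deriving Show

-- Hypothetical class for anything that has the base characteristics
class HasBase t where
  baseOne :: t -> Int
  baseTwo :: t -> Int

instance HasBase Thing where
  baseOne = chOne
  baseTwo = chTwo

-- Delegate to the embedded Thing
instance HasBase ThingExt where
  baseOne = chOne . thing
  baseTwo = chTwo . thing

-- Works on both kinds of things
foo :: HasBase t => t -> Int
foo t = baseOne t + baseTwo t

-- Only for extended things
bar :: ThingExt -> String
bar e = chThree e ++ ": " ++ show (foo e)
```

The only per-type boilerplate is one delegating instance for each extended type.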
You can make a data type that is a type union of the two distinct types of things:
data ThingBase = ThingBase { chBaseOne :: Int, chBaseTwo :: Int }
data ThingExt = ThingExt { chExtOne :: Int, chExtTwo :: Int, chExtThree :: Int }
data ThingLike = CreatedWithBase ThingBase
               | CreatedWithExt ThingExt
Then for any function which should take either a ThingBase or a ThingExt and do different things depending on which it gets, you can pattern match on the data constructor:
foo :: ThingLike -> Int
foo (CreatedWithBase (ThingBase c1 c2)) = c1 + c2
foo (CreatedWithExt (ThingExt c1 c2 c3)) = c3
-- Or another way:
bar :: ThingLike -> Int
bar (CreatedWithBase v) = (chBaseOne v) + (chBaseTwo v)
bar (CreatedWithExt v) = chExtThree v
This has the benefit that it forces you to pedantically specify exactly what happens to ThingBases or ThingExts wherever they appear to be processed as part of handling a ThingLike, by creating the extra wrapping layer of constructors (the CreatedWithBase and CreatedWithExt constructors I used, whose sole purpose is to indicate which type of thing you expect at a certain point of code).
But it has the disadvantage that it doesn't allow for overloaded names for the field accessor functions. Personally I don't see this as too big of a loss, since the extra verbosity required to reference attributes acts like a natural complexity penalty and helps motivate the programmer to keep the code sparse and use fewer bad accessor/getter/setter anti-patterns. However, if you want to go far with overloaded accessor names, you should look into lenses.
This is just one idea and it's not right for every problem. The example you already give with type classes is also perfectly fine and I don't see any good reason to call it hideous.
Just about the only "bad" thing would be wanting to somehow implicitly process ThingBases differently from ThingExts without needing anything in the type signature or the pattern matching sections of a function body to explicitly tell people reading your code precisely when and where the two different types are differentiated, which would be more like a duck typing approach which is not really what you should do in Haskell.
This seems to be what you're trying to get at by trying to force both ThingBase and ThingExt to have a value constructor with the same name of just Thing -- it seems artificially nice that the same word can construct values of either type, but my feeling is it's not actually nice. I might be misunderstanding though.
A very simple solution is to introduce a type parameter:
data ThingLike a = ThingLike { chOne, chTwo :: Int, chThree :: a }
  deriving Show
Then, a ThingBase is just a ThingLike with no third element, so
type ThingBase = ThingLike ()
ThingExt contains an additional Int, so
type ThingExt = ThingLike Int
This has the advantage of using only a single constructor and only three record accessors. There is minimal duplication, and writing your desired functions is simple:
foo :: ThingLike a -> Int
foo (ThingLike x y _) = x+y
bar :: ThingExt -> String
bar (ThingLike x y z) = show $ x+y+z
One option is:
data Thing = Thing { chOne :: Int, chTwo :: Int }
  | OtherThing { chOne :: Int, chTwo :: Int, chThree :: String }
Another is
data Thing = Thing { chOne :: Int, chTwo :: Int, chThree :: Maybe String }
If you want to distinguish the two Things at the type level and have overloaded accessors then you need to make use of a type class.
You could use a Maybe ThingExt field on ThingBase I guess, at least if you only have one extension type.
If you have several extensions like this, you can use a combination of embedding and matching on various constructors of the embedded data type, where each constructor represents one way to extend the base structure.
Once that becomes unmanageable, classes might become inevitable, but some kind of data type composition would still be useful to avoid duplication.

Haskell data type fields

I have my own data type:
type Types = String
data MyType = MyType [Types]
I have a utility function:
initMyType :: [Types] -> MyType
initMyType types = MyType types
Now I create:
let a = MyType ["1A", "1B", "1C"]
How can I get the list ["1A", "1B", "1C"] from a? And in general, how can I get data from a data constructor?
Besides using pattern matching, as in arrowdodger's answer, you can also use record syntax to define the accessor automatically:
data MyType = MyType { getList :: [Types] }
This defines a type exactly the same as
data MyType = MyType [Types]
but also defines a function
getList :: MyType -> [Types]
and allows (but doesn't require) the syntax MyType { getList = ["a", "b", "c"] } for constructing values of MyType. By the way, initMyType is not really necessary unless it does something else besides constructing the value, because it does exactly the same as the constructor MyType (but can't be used in pattern matching).
You can pattern-match it somewhere in your code, or write deconstructing function:
getList (MyType lst) = lst
There are many ways of pattern matching in Haskell. See http://www.haskell.org/tutorial/patterns.html for details on where patterns can be used (e.g. case expressions), and on different kinds of patterns (lazy patterns, 'as' patterns, etc.)
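Both options side by side, as a sketch: the record accessor generated by getList, a hand-written deconstructor, and a case expression. The names getList' and countTypes are hypothetical.

```haskell
type Types = String

-- Record syntax generates the accessor getList automatically
data MyType = MyType { getList :: [Types] }
  deriving Show

-- Hand-written equivalent using a pattern in the function argument
getList' :: MyType -> [Types]
getList' (MyType lst) = lst

-- The same pattern inside a case expression
countTypes :: MyType -> Int
countTypes m = case m of
  MyType lst -> length lst

a :: MyType
a = MyType ["1A", "1B", "1C"]
```

Here getList a and getList' a both evaluate to ["1A","1B","1C"].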
(Answer for future generations.)
So, your task is to obtain all fields from a data type constructor.
Here is a pretty elegant way to do this.
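We're going to use the DeriveFoldable GHC extension. Foldable instances are for type constructors of kind * -> *, and the derived instance collects the values of the last type parameter, so we must change your datatype declaration like so: data MyType a = MyType [a] deriving (Show, Foldable).
This gives a generic way to get all the data out of a data constructor.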
This is the whole solution:
{-# LANGUAGE DeriveFoldable #-}
import Data.Foldable (toList)
type Types = String
data MyType a = MyType [a] deriving (Show, Foldable)
main =
  let a = MyType ["1A", "1B", "1C"] :: MyType Types
      result = toList a
  in print result
Printing the result will give you '["1A","1B","1C"]' as you wanted.

Using Data.Array in a Haskell Data Type

I have been developing some code that uses Data.Array to use multidimensional arrays,
now I want to put those arrays into a data type so I have something like this
data MyType = MyType { a :: Int, b :: Int, c :: Array }
Data.Array has type:
(Ix i, Num i, Num e) => Array i e
Where "e" can be of any type not just Num.
I am convinced I am missing a concept completely.
How do I accomplish this?
What is special about the Data.Array type that is different from Int, Num, String etc?
Thanks for the help!
Array is not a type. It's a type constructor. It has kind * -> * -> * which means that you give it two types to get a type back. You can sort of think of it like a function. Types like Int are of kind *. (Num is a type class, which is an entirely different thing).
You're declaring c to be a field of a record, i.e., c is a value. Values have to have a type of kind *. (There are actually a few more kinds for unboxed values but don't worry about that for now).
So you need to provide two type arguments to make a type for c. You can choose two concrete types, or you can add type arguments to MyType to allow the choice to be made elsewhere.
data MyType1 = MyType { a, b :: Int, c :: Array Foo Bar }
data MyType2 i e = MyType { a, b :: Int, c :: Array i e }
References
Kinds for C++ users.
Kind (type theory) on Wikipedia.
You need to add the type variables i and e to your MyType:
data MyType i e = MyType { a, b :: Int, c :: Array i e }
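A runnable sketch of the parameterised record, with one concrete choice of index and element types. The Grid synonym, mkGrid and cell are hypothetical. Note that the Ix constraint belongs to array functions such as listArray and (!), not to the Array type itself, which is why the record field needs no constraint.

```haskell
import Data.Array (Array, listArray, (!))

-- The caller chooses the index type i and the element type e
data MyType i e = MyType { a, b :: Int, c :: Array i e }

-- Hypothetical concrete choice: a 2x3 grid of Doubles indexed by (row, col)
type Grid = MyType (Int, Int) Double

-- listArray fills the cells in index order: (0,0), (0,1), (0,2), (1,0), ...
mkGrid :: Grid
mkGrid = MyType { a = 2, b = 3, c = listArray ((0, 0), (1, 2)) [1 .. 6] }

-- Look up one cell with (!)
cell :: Grid -> (Int, Int) -> Double
cell g ix = c g ! ix
```

With this data, cell mkGrid (0, 1) is 2.0 and cell mkGrid (1, 2) is 6.0.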
