Haskell - Using one data type kind in another - haskell

Haskell newbie; I want to be able to declare Val which can be either IntVal, StringVal FloatVal and a List which can be either StringList, IntList, FloatList, whose elements are (correspondingly): StringVal, IntVal and FloatVal.
My attempt so far:
data Val = IntVal Int
| FloatVal Float
| StringVal String deriving Show
data List = IntList [(IntVal Int)]
| FloatList [(FloatVal Float)]
| StringList [(StringVal String)] deriving Show
fails with the error:
Not in scope: type constructor or class ‘IntVal’
A data constructor of that name is in scope; did you mean DataKinds?
data List = IntList [(IntVal Int)]
... (similarly for StringVal, FloatVal..)
what is the right way to achieve this?
PS:
declaring List as data List = List [Val] ends up allowing Lists as follows:
l = [(IntVal 10),(StringVal "Hello")], which I do not want to allow.
I want each element of list to be a Value of same kind

There is a solution using GADTs. The problem is that IntVal etc are not actually types, they are just constructors (basically functions that also support pattern matching) for the single type Val. So once you have made a Val, the information about which kind of value it is is completely lost at the type level (that is, compile time).
The trick is to tag Val with the type it contains.
data Val a where
IntVal :: Int -> Val Int
FloatVal :: Float -> Val Float
StringVal :: String -> Val String
Then if you have a plain list [Val a] it will already be homogeneous. If you must:
data List = IntList [Val Int]
| FloatList [Val Float]
...
which is slightly different in that it "erases" the type of list, and it can distinguish between an empty list of ints and an empty list of floats, for example. You could also use the same GADT trick with List
data List a where
IntList :: [Val Int] -> List Int
FloatList :: [Val Float] -> List Float
...
but in that case I think a better design is probably the simpler
newtype List a = List [Val a]
The trade-offs between all these different designs really depends on what you are planning to do with them.

Related

Haskell : recursive data type (Parameterized Types)

I have this :
data Val s i a = S s | I i | A a deriving (Show)
To play with non-homogenous lists in Haskell. So I can do something like (just an example function ):
oneDown :: Val String Int [String]-> Either String String
But I would actually like this to be a list of Vals, i.e. something like :
oneDown :: Val String Int [Val]-> Either String String
What you're looking for would result in an infinite data type, which Haskell explicitly disallows. However, we can hide this infinity behind a newtype and the compiler won't complain.
data Val s i a = S s | I i | A a deriving (Show)
newtype Val' = Val' (Val String Int [Val']) deriving (Show)
It's still doing exactly what your example did (plus a few type constructors that will get optimized away at runtime), but now we can infinitely recurse because we've guarded the recursive type.
This is actually what the recursion-schemes library does to get inductively-defined data that we can define generic recursion techniques on. If you're interested in generalized data types like this, you may have a look at that library.
To construct this newly-made type, we have to use the Val' constructor.
let myVal = A [Val' (I 3), Val' (S "ABC"), Val' (A [])]

What does such notation mean in haskell?

function :: Type1 Type2
Are Type1 and Type2 return values (tuples)?
data Loc = Loc String Int Int
data Parser b a = P (b -> [(a, b)])
parse :: Parser b a -> b -> [(a, b)]
parse (P p) inp = p inp
type Lexer a = Parser (Loc, String) a
item :: Lexer Char
item = ????
How should I return Lexer and Char from item function?
Could you please give me some simple example.
No, it is not a tuple, types can be parameterized as well. In imperative languages like Java, this concept is usually know as generic types (although there is no one-on-one mapping of the two concepts).
In Java for instance you have classes like:
class LinkedList<E> {
// ...
}
Now here we can see LinkedList as a function that takes as input a parameter E, and then returns a real type (for example LinkedList<String> is a linked list that stores Strings). So we can see such abstract type as a function.
This is a concept that is used in Haskell as well. We have for instance the Maybe type:
data Maybe a = Nothing | Just a
Notice the a here. This is a type parameter that we need to fill in. A function can not return a Maybe, it can only return a Maybe a where a is filled in. For example a Maybe Char: a Maybe type that is a Nothing, or a Just x with x a Char.

How should types be used in Haskell type classes?

I'm new to Haskell, and a little confused about how type classes work. Here's a simplified example of something I'm trying to do:
data ListOfInts = ListOfInts {value :: [Int]}
data ListOfDoubles = ListOfDoubles {value :: [Double]}
class Incrementable a where
increment :: a -> a
instance Incrementable ListOfInts where
increment ints = map (\x -> x + 1) ints
instance Incrementable ListOfDoubles where
increment doubles = map (\x -> x + 1) doubles
(I realize that incrementing each element of a list can be done very simply, but this is just a simplified version of a more complex problem.)
The compiler tells me that I have multiple declarations of value. If I change the definitions of ListOfInts and ListOfDoubles as follows:
type ListOfInts = [Int]
type ListOfDoubles = [Double]
Then the compiler says "Illegal instance declaration for 'Incrementable ListOfInts'" (and similarly for ListOfDoubles. If I use newtype, e.g., newtype ListOfInts = ListOfInts [Int], then the compiler tells me "Couldn't match expected type 'ListOfInts' with actual type '[b0]'" (and similarly for ListOfDoubles.
My understanding of type classes is that they facilitate polymorphism, but I'm clearly missing something. In the first example above, does the compiler just see that the type parameter a refers to a record with a field called value and that it appears I'm trying to define increment for this type in multiple ways (rather than seeing two different types, one which has a field whose type of a list of Ints, and the other whose type is a list of Doubles)? And similarly for the other attempts?
Thanks in advance.
You're really seeing two separate problems, so I'll address them as such.
The first one is with the value field. Haskell records work in a slightly peculiar way: when you name a field, it is automatically added to the current scope as a function. Essentially, you can think of
data ListOfInts = ListOfInts {value :: [Int]}
as syntax sugar for:
data ListOfInts = ListOfInts [Int]
value :: ListOfInt -> [Int]
value (ListOfInts v) = v
So having two records with the same field name is just like having two different functions with the same name--they overlap. This is why your first error tells you that you've declared values multiple times.
The way to fix this would be to define your types without using the record syntax, as I did above:
data ListOfInts = ListOfInts [Int]
data ListOfDoubles = ListOfDoubles [Double]
When you used type instead of data, you simply created a type synonym rather than a new type. Using
type ListOfInts = [Int]
means that ListOfInts is the same as just [Int]. For various reasons, you can't use type synonyms in class instances by default. This makes sense--it would be very easy to make a mistake like trying to write an instance for [Int] as well as one for ListOfInts, which would break.
Using data to wrap a single type like [Int] or [Double] is the same as using newtype. However, newtype has the advantage that it carries no runtime overhead at all. So the best way to write these types would indeed be with newtype:
newtype ListOfInts = ListOfInts [Int]
newtype ListOfDoubles = ListOfDoubles [Double]
An important thing to note is that when you use data or newtype, you also have to "unwrap" the type if you want to get at its content. You can do this with pattern matching:
instance Incrementable ListOfInts where
increment (ListOfInts ls) = ListOfInts (map (\ x -> x + 1) ls)
This unwraps the ListOfInts, maps a function over its contents and wraps it back up.
As long as you unwrap the value this way, your instances should work.
On a side note, you can write map (\ x -> x + 1) as map (+ 1), using something that is called an "operator section". All this means is that you implicitly create a lambda filling in whichever argument of the operator is missing. Most people find the map (+ 1) version easier to read because there is less unnecessary noise.

Create a type that can contain an int and a string in either order

I'm following this introduction to Haskell, and this particular place (user defined types 2.2) I'm finding particularly obscure. To the point, I don't even understand what part of it is code, and what part is the thoughts of the author. (What is Pt - it is never defined anywhere?). Needless to say, I can't execute / compile it.
As an example that would make it easier for me to understand, I wanted to define a type, which is a pair of an Integer and a String, or a String and an Integer, but nothing else.
The theoretical function that would use it would look like so:
combine :: StringIntPair -> String
combine a b = (show a) ++ b
combine a b = a ++ (show b)
If you need a working code, that does the same, here's CL code for doing it:
(defgeneric combine (a b)
(:documentation "Combines strings and integers"))
(defmethod combine ((a string) (b integer))
(concatenate 'string a (write-to-string b)))
(defmethod combine ((a integer) (b string))
(concatenate 'string (write-to-string a) b))
(combine 100 "500")
Here's one way to define the datatype:
data StringIntPair = StringInt String Int |
IntString Int String
deriving (Show, Eq, Ord)
Note that I've defined two constructors for type StringIntPair, and they are StringInt and IntString.
Now in the definition of combine:
combine :: StringIntPair -> String
combine (StringInt s i) = s ++ (show i)
combine (IntString i s) = (show i) ++ s
I'm using pattern matching to match the constructors and select the correct behavior.
Here are some examples of usage:
*Main> let y = StringInt "abc" 123
*Main> let z = IntString 789 "a string"
*Main> combine y
"abc123"
*Main> combine z
"789a string"
*Main> :t y
y :: StringIntPair
*Main> :t z
z :: StringIntPair
A few things to note about the examples:
StringIntPair is a type; doing :t <expression> in the interpreter shows the type of an expression
StringInt and IntString are constructors of the same type
the vertical bar (|) separates constructors
a well-written function should match each constructor of its argument's types; that's why I've written combine with two patterns, one for each constructor
data StringIntPair = StringInt String Int
| IntString Int String
combine :: StringIntPair -> String
combine (StringInt s i) = s ++ (show i)
combine (IntString i s) = (show i) ++ s
So it can be used like that:
> combine $ StringInt "asdf" 3
"asdf3"
> combine $ IntString 4 "fasdf"
"4fasdf"
Since Haskell is strongly typed, you always know what type a variable has. Additionally, you will never know more. For instance, consider the function length that calculates the length of a list. It has the type:
length :: [a] -> Int
That is, it takes a list of arbitrary a (although all elements have the same type) and returns and Int. The function may never look inside one of the lists node and inspect what is stored in there, since it hasn't and can't get any informations about what type that stuff stored has. This makes Haskell pretty efficient, since, as opposed to typical OOP languages such as Java, no type information has to be stored at runtime.
To make it possible to have different types of variables in one parameter, one can use an Algebraic Data Type (ADT). One, that stores either a String and an Int or an Int and a String can be defined as:
data StringIntPair = StringInt String Int
| IntString Int String
You can find out about which of the two is taken by pattern matching on the parameter. (Notice that you have only one, since both the string and the in are encapsulated in an ADT):
combine :: StringIntPair -> String
combine (StringInt str int) = str ++ show int
combine (IntString int str) = show int ++ str

Haskell Access Tuple Data Inside List Comprehension

I have defined a custom type as follows:
-- Atom reference number, x coordinate, y coordinate, z coordinate, element symbol,
-- atom name, residue sequence number, amino acid abbreviation
type Atom = (Int, Double, Double, Double, Word8, ByteString, Int, ByteString)
I would like to gather all of the atoms with a certain residue sequence number nm.
This would be nice:
[x | x <- p, d == nm]
where
(_, _, _, _, _, _, d, _) = x
where p is a list of atoms.
However, this does not work because I can not access the variable x outside of the list comprehension, nor can I think of a way to access a specific tuple value from inside the list comprehension.
Is there a tuple method I am missing, or should I be using a different data structure?
I know I could write a recursive function that unpacks and checks every tuple in the list p, but I am actually trying to use this nested inside an already recursive function, so I would rather not need to introduce that complexity.
This works:
[x | (_, _, _, _, _, _, d, _) <- p, d == nm]
However, you should really define your own data type here. A three-element tuple is suspicious; an eight-element tuple is very bad news indeed. Tuples are difficult to work with and less type-safe than data types (if you represent two different kinds of data with two tuples with the same element types, they can be used interchangeably). Here's how I'd write Atom as a record:
data Point3D = Point3D Double Double Double
data Atom = Atom
{ atomRef :: Int
, atomPos :: Point3D
, atomSymbol :: Word8
, atomName :: ByteString
, atomSeqNum :: Int
, atomAcidAbbrev :: ByteString
} deriving (Eq, Show)
(The "atom" prefix is to avoid clashing with the names of fields in other records.)
You can then write the list comprehension as follows:
[x | x <- p, atomSeqNum x == nm]
As a bonus, your definition of Atom becomes self-documenting, and you reap the benefits of increased type safety. Here's how you'd create an Atom using this definition:
myAtom = Atom
{ atomRef = ...
, atomPos = ...
, ... etc. ...
}
By the way, it's probably a good idea to make some of the fields of these types strict, which can be done by putting an exclamation mark before the type of the field; this helps avoid space leaks from unevaluated thunks building up. For instance, since it doesn't make much sense to evaluate a Point3D without also evaluating all its components, I would instead define Point3D as:
data Point3D = Point3D !Double !Double !Double
It would probably be a good idea to make all the fields of Atom strict too, although perhaps not all of them; for example, the ByteString fields should be left non-strict if they're generated by the program, not always accessed and possibly large. On the other hand, if their values are read from a file, then they should probably be made strict.
You should definitely use a different structure. Instead of using a tuple, take a look at records.
data Atom = Atom { reference :: Int
, position :: (Double, Double, Double)
, symbol :: Word8
, name :: ByteString
, residue :: Int
, abbreviation :: ByteString
}
You can then do something like this:
a = Atom ...
a {residue=10} -- this is now a with a residue of 10

Resources