Can I name a function signature? - haskell

I'm passing around a partially applied function. The full signature is:
import qualified Data.Map as Map
-- Update the correct bin of the histogram based on the min value, bin width,
-- the histogram stored as a map, and the actual value we are interested in.
updateHist :: Double -> Double -> Map.Map Bin Double -> Double ->
Map.Map Bin Double
The function updates a Map which stores data for a histogram. The first parameter gives the lower bound of the data we are interested in, and the next is the bin width for the histogram. I fill these values in when the program starts up and pass the partially applied function all over the module. This means I have a ton of functions with a signature like:
-- Extract the data out of the string and update the histogram (in the Map) with it.
doSomething :: String -> (Map.Map Bin Double -> Double -> Map.Map Bin Double) ->
Map.Map Bin Double
This is all fine and dandy, but writing "(Map.Map Bin Double -> Double -> Map.Map Bin Double)" is rather verbose. I'd like to replace them all with "UpdateHistFunc" as a type but for some reason I keep failing.
I tried:
newtype UpdateHistFunc = Map.Map Bin Double -> Double -> Map.Map Bin Double
This failed with the error:
HistogramForColumn.hs:84:44: parse error on input `->'
What am I doing wrong?

Are you confusing type and newtype here?
Using type defines a type synonym, which is what you seem to be trying to do, whereas newtype creates a new type that needs a constructor name, like with data.
In other words, you probably want this:
type UpdateHistFunc = Map.Map Bin Double -> Double -> Map.Map Bin Double
...or maybe this:
newtype UpdateHistFunc = UpdateHistFunc (Map.Map Bin Double -> Double -> Map.Map Bin Double)
The latter obviously needs to be "unwrapped" in order to apply the function.
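A minimal sketch of both options side by side. Bin is not defined in the question, so it is assumed to be an Int index here, and UpdateHist, addToBin, viaSynonym and applyUpdate are hypothetical names used only for illustration:

```haskell
import qualified Data.Map as Map

-- Assumption: the question does not show Bin, so an Int index stands in for it.
type Bin = Int

-- Option 1: a type synonym. It is interchangeable with the function type.
type UpdateHistFunc = Map.Map Bin Double -> Double -> Map.Map Bin Double

-- Option 2: a newtype. It is a distinct type with its own constructor.
newtype UpdateHist = UpdateHist (Map.Map Bin Double -> Double -> Map.Map Bin Double)

-- Hypothetical updater: adds one count to the bin that holds the value,
-- given a minimum value and a bin width.
addToBin :: Double -> Double -> UpdateHistFunc
addToBin minVal width hist x =
  Map.insertWith (+) (floor ((x - minVal) / width)) 1 hist

-- The synonym can be applied directly.
viaSynonym :: UpdateHistFunc -> Map.Map Bin Double -> Double -> Map.Map Bin Double
viaSynonym f hist x = f hist x

-- The newtype must be unwrapped by pattern matching before it can be applied.
applyUpdate :: UpdateHist -> Map.Map Bin Double -> Double -> Map.Map Bin Double
applyUpdate (UpdateHist f) hist x = f hist x
```

The synonym costs nothing to adopt, since every existing signature keeps type checking. The newtype gives better error messages and lets you define instances, at the price of wrapping and unwrapping.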
For reference:
data defines a new algebraic data type, which can be recursive, have distinct instances of type classes, introduces an extra layer of possible laziness, all that stuff.
newtype defines a data type with a single constructor taking a single argument, which can be recursive and have distinct instances, but only for type checking; after compilation, it's equivalent to the type it contains.
type defines a type synonym, which can't be recursive or have distinct instances, is fully expanded when type checking, and amounts to little more than a macro.
If you're wondering about the semantic distinction between data and newtype where "extra laziness" is concerned, compare these two types and the possible values they can have:
data DType = DCon DType
newtype NType = NCon NType
For instance, what do you think these functions will do if applied to undefined vs. DCon undefined and NCon undefined, respectively?
fd (DCon x) = x
fn (NCon x) = x
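One way to probe the question above is to match on the constructor without using the field. The helper names matchD and matchN are hypothetical. This is a sketch assuming GHC's usual semantics: matching a data constructor forces the argument far enough to see the constructor, while matching a newtype constructor forces nothing, because the constructor does not exist at run time.

```haskell
data DType = DCon DType
newtype NType = NCon NType

-- Matching on DCon has to evaluate the argument far enough to see the
-- constructor, so `matchD undefined` throws when its result is demanded.
matchD :: DType -> ()
matchD (DCon _) = ()

-- NCon is erased after type checking, so this match never inspects the
-- argument and `matchN undefined` simply returns ().
matchN :: NType -> ()
matchN (NCon _) = ()
```

With these definitions, matchD (DCon undefined) and matchN undefined both return (), while matchD undefined diverges. So undefined and DCon undefined are different values of DType, whereas undefined and NCon undefined are the same value of NType.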

Related

Clarifying Data Constructor in Haskell

In the following:
data DataType a = Data a | Datum
I understand that data constructors are value-level functions. What we do above is define their types. They can be functions of any arity, or constants. That's fine. I'm OK with saying Datum constructs Datum. What is not explicit and clear to me here is the difference between the constructor function and what it produces. Please let me know if I am understanding this correctly:
1 - a) Basically writing Data a, is defining both a Data Structure and its Constructor function (as in scala or java usually the class and the constructor have the same name) ?
2 - b) So if I unpack this and make an analogy: with Data a we are defining a structure of object (a data structure; I don't want to say class, because I think a class already implies a type, but maybe we could), the constructor function (data constructor/value constructor), and the latter returns an object of that structure. Finally, the type of that structure is given by the type constructor. An object structure, in a sense, is just a tag surrounding a bunch of values of some type. Is my understanding correct?
3 - c) Can I formally Say:
Data constructors that are nullary represent constant values -> they return the constant value itself, whose type is given by the type constructor at the definition site.
Data constructors that take an argument represent classes of values, where a class is a tag? -> they return an infinite number of objects of that class, whose type is given by the type constructor at the definition site.
Another way of writing this:
data DataType a = Data a | Datum
Is with generalised algebraic data type (GADT) syntax, using the GADTSyntax extension, which lets us specify the types of the constructors explicitly:
{-# LANGUAGE GADTSyntax #-}
data DataType a where
Data :: a -> DataType a
Datum :: DataType a
(The GADTs extension would work too; it would also allow us to specify constructors with different type arguments in the result, like DataType Int vs. DataType Bool, but that’s a more advanced topic, and we don’t need that functionality here.)
These are exactly the types you would see in GHCi if you asked for the types of the constructor functions with :type / :t:
> :{
| data DataType a where
| Data :: a -> DataType a
| Datum :: DataType a
| :}
> :type Data
Data :: a -> DataType a
> :t Datum
Datum :: DataType a
With ExplicitForAll we can also specify the scope of the type variables explicitly, and make it clearer that the a in the data definition is a separate variable from the a in the constructor definitions by also giving them different names:
data DataType a where
Data :: forall b. b -> DataType b
Datum :: forall c. DataType c
Some more examples of this notation with standard prelude types:
data Either a b where
Left :: forall a b. a -> Either a b
Right :: forall a b. b -> Either a b
data Maybe a where
Nothing :: Maybe a
Just :: a -> Maybe a
data Bool where
False :: Bool
True :: Bool
data Ordering where
LT, EQ, GT :: Ordering -- Shorthand for repeated ‘:: Ordering’
I understand that data constructors are value-level functions. What we do above is define their types. They can be functions of any arity, or constants. That's fine. I'm OK with saying Datum constructs Datum. What is not explicit and clear to me here is the difference between the constructor function and what it produces.
Datum and Data are both “constructors” of DataType a values; neither Datum nor Data is a type! These are just “tags” that select between the possible varieties of a DataType a value.
What is produced is always a value of type DataType a for a given a; the constructor selects which “shape” it takes.
A rough analogue of this is a union in languages like C or C++, plus an enumeration for the “tag”. In pseudocode:
enum Tag {
DataTag,
DatumTag,
}
// A single anonymous field.
struct DataFields<A> {
A field1;
}
// No fields.
struct DatumFields<A> {};
// A union of the possible field types.
union Fields<A> {
DataFields<A> data;
DatumFields<A> datum;
}
// A pair of a tag with the fields for that tag.
struct DataType<A> {
Tag tag;
Fields<A> fields;
}
The constructors are then just functions returning a value with the appropriate tag and fields. Pseudocode:
<A> DataType<A> newData(A x) {
DataType<A> result;
result.tag = DataTag;
result.fields.data.field1 = x;
return result;
}
<A> DataType<A> newDatum() {
DataType<A> result;
result.tag = DatumTag;
// No fields.
return result;
}
Unions are unsafe, since the tag and fields can get out of sync, but sum types are safe because they couple these together.
A pattern-match like this in Haskell:
case someDT of
Datum -> f
Data x -> g x
Is a combination of testing the tag and extracting the fields. Again, in pseudocode:
if (someDT.tag == DatumTag) {
f();
} else if (someDT.tag == DataTag) {
var x = someDT.fields.data.field1;
g(x);
}
Again this is coupled in Haskell to ensure that you can only ever access the fields if you have checked the tag by pattern-matching.
So, in answer to your questions:
1 - a) Basically writing Data a, is defining both a Data Structure and its Constructor function (as in scala or java usually the class and the constructor have the same name) ?
Data a in your original code is not defining a data structure, in that Data is not a separate type from DataType a, it’s just one of the possible tags that a DataType a value may have. Internally, a value of type DataType Int is one of the following:
The tag for Data (in GHC, a pointer to an “info table” for the constructor), and a reference to a value of type Int.
x = Data (1 :: Int) :: DataType Int
+----------+----------------+ +---------+----------------+
x ---->| Data tag | pointer to Int |---->| Int tag | unboxed Int# 1 |
+----------+----------------+ +---------+----------------+
The tag for Datum, and no other fields.
y = Datum :: DataType Int
+-----------+
y ----> | Datum tag |
+-----------+
In a language with unions, the size of a union is the maximum of all its alternatives, since the type must support representing any of the alternatives with mutation. In Haskell, since values are immutable, they don’t require any extra “padding” since they can’t be changed.
It’s a similar situation for standard data types, e.g., a product or sum type:
(x :: X, y :: Y) :: (X, Y)
+---------+--------------+--------------+
| (,) tag | pointer to X | pointer to Y |
+---------+--------------+--------------+
Left (m :: M) :: Either M N
+-----------+--------------+
| Left tag | pointer to M |
+-----------+--------------+
Right (n :: N) :: Either M N
+-----------+--------------+
| Right tag | pointer to N |
+-----------+--------------+
2 - b) So if I unpack this and make an analogy: with Data a we are defining a structure of object (a data structure; I don't want to say class, because I think a class already implies a type, but maybe we could), the constructor function (data constructor/value constructor), and the latter returns an object of that structure. Finally, the type of that structure is given by the type constructor. An object structure, in a sense, is just a tag surrounding a bunch of values of some type. Is my understanding correct?
This is sort of correct, but again, the constructors Data and Datum aren’t “data structures” by themselves. They’re just the names used to introduce (construct) and eliminate (match) values of type DataType a, for some type a that is chosen by the caller of the constructors to fill in the forall.
data DataType a = Data a | Datum says:
If some term e has type T, then the term Data e has type DataType T
Inversely, if some value of type DataType T matches the pattern Data x, then x has type T in the scope of the match (case branch or function equation)
The term Datum has type DataType T for any type T
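The three rules above can be sketched in code. The names wrapped, noValue, and fromDataType are hypothetical and chosen only for illustration:

```haskell
data DataType a = Data a | Datum

-- Introduction: 'x' has type Char, so Data 'x' has type DataType Char.
wrapped :: DataType Char
wrapped = Data 'x'

-- Datum has type DataType T for any T; here the annotation picks T = Char.
noValue :: DataType Char
noValue = Datum

-- Elimination: inside the Data branch, x has type a.
fromDataType :: a -> DataType a -> a
fromDataType _        (Data x) = x
fromDataType fallback Datum    = fallback
```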
3 - c) Can I formally Say:
Data constructors that are nullary represent constant values -> they return the constant value itself, whose type is given by the type constructor at the definition site.
Data constructors that take an argument represent classes of values, where a class is a tag? -> they return an infinite number of objects of that class, whose type is given by the type constructor at the definition site.
Not exactly. A type constructor like DataType :: Type -> Type, Maybe :: Type -> Type, or Either :: Type -> Type -> Type, or [] :: Type -> Type (list), or a polymorphic data type, represents an “infinite” family of concrete types (Maybe Int, Maybe Char, Maybe (String -> String), …) but only in the same way that id :: forall a. a -> a represents an “infinite” family of functions (id :: Int -> Int, id :: Char -> Char, id :: String -> String, …).
That is, the type a here is a parameter filled in with an argument value given by the caller. Usually this is implicit, through type inference, but you can specify it explicitly with the TypeApplications extension:
-- Akin to: \ (a :: Type) -> \ (x :: a) -> x
id :: forall a. a -> a
id x = x
id @Int :: Int -> Int
id @Int 1 :: Int
Data :: forall a. a -> DataType a
Data @Char :: Char -> DataType Char
Data @Char 'x' :: DataType Char
The data constructors of each instantiation don’t really have anything to do with each other. There’s nothing in common between the instantiations Data :: Int -> DataType Int and Data :: Char -> DataType Char, apart from the fact that they share the same tag name.
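A compilable sketch of explicit instantiation, assuming GHC with the TypeApplications extension, where the type argument is written with an @ prefix. The binding names (mkCharData, mkIntData, noBool, unwrapOr) are hypothetical:

```haskell
{-# LANGUAGE TypeApplications #-}

data DataType a = Data a | Datum

-- Data instantiated at Char: an ordinary function from Char to DataType Char.
mkCharData :: Char -> DataType Char
mkCharData = Data @Char

-- Data instantiated at Int is a different function that shares only the tag name.
mkIntData :: Int -> DataType Int
mkIntData = Data @Int

-- Nullary constructors can be instantiated explicitly as well.
noBool :: DataType Bool
noBool = Datum @Bool

-- Small eliminator so the results can be inspected.
unwrapOr :: a -> DataType a -> a
unwrapOr _ (Data x) = x
unwrapOr d Datum    = d
```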
Another way of thinking about this in Java terms is with the visitor pattern. DataType would be represented as a function that accepts a “DataType visitor”, and then the constructors don’t correspond to separate data types, they’re just the methods of the visitor which accept the fields and return some result. Writing the equivalent code in Java is a worthwhile exercise, but here it is in Haskell:
{-# LANGUAGE RankNTypes #-}
-- (Allows passing polymorphic functions as arguments.)
type DataType a
= forall r. -- A visitor with a generic result type
r -- With one “method” for the ‘Datum’ case (no fields)
-> (a -> r) -- And one for the ‘Data’ case (one field)
-> r -- Returning the result
newData :: a -> DataType a
newData field = \ _visitDatum visitData -> visitData field
newDatum :: DataType a
newDatum = \ visitDatum _visitData -> visitDatum
Pattern-matching is simply running the visitor:
matchDT :: DataType a -> b -> (a -> b) -> b
matchDT dt visitDatum visitData = dt visitDatum visitData
-- Or: matchDT dt = dt
-- Or: matchDT = id
-- case someDT of { Datum -> f; Data x -> g x }
-- f :: r
-- g :: a -> r
-- someDT :: DataType a
-- :: forall r. r -> (a -> r) -> r
someDT f (\ x -> g x)
Similarly, in Haskell, data constructors are just the ways of introducing and eliminating values of a user-defined type.
What is not explicit and clear to me here is the difference between the constructor function and what it produces
I'm having trouble following your question, but I think you are complicating things. I would suggest not thinking too deeply about the "constructor" terminology.
But hopefully the following helps:
Starting simple:
data DataType = Data Int | Datum
The above reads "Declare a new type named DataType, which has the possible values Datum or Data <some_number> (e.g. Data 42)"
So e.g. Datum is a value of type DataType.
Going back to your example with a type parameter, I want to point out what the syntax is doing:
data DataType a = Data a | Datum
^ ^ ^ These things appear in type signatures (type level)
^ ^ These things appear in code (value level stuff)
There's a bit of punning happening here. In the data declaration you might see "Data Int", which mixes type-level and value-level stuff in a way that you wouldn't see in code. In code you'd see e.g. Data 42 or Data someVal.
I hope that helps a little...

List of a Type Class instance

I've been playing around with Haskell type classes and I am facing a problem I hope someone could help me solve. Bear in mind that I come from a Swift background and am trying to port some protocol-oriented knowledge to Haskell code.
Initially I declared a bunch of JSON parsers which had the same structure, just a different implementation:
data Candle = Candle {
mts :: Integer,
open :: Double,
close :: Double
}
data Bar = Bar {
mts :: Integer,
min :: Double,
max :: Double
}
Then I decided to create a "Class" that would define their basic operations:
class GenericData a where
dataName :: a -> String
dataIdentifier :: a -> Double
dataParsing :: a -> String -> Maybe a
dataEmptyInstance :: a
instance GenericData Candle where
dataName _ = "Candle"
dataIdentifier = fromInteger . mts
dataParsing _ = candleParsing
dataEmptyInstance = emptyCandle
instance GenericData Bar where
dataName _ = "Bar"
dataIdentifier = fromInteger . mts
dataParsing _ = barParsing
dataEmptyInstance = emptyBar
My first code smell was the need to include "a" when it was not needed (dataName or dataParsing), but I proceeded anyway.
analyzeArguments :: GenericData a => [] -> [String] -> Maybe (a, [String])
analyzeArguments [] _ = Nothing
analyzeArguments _ [] = Nothing
analyzeArguments name data
| name == "Candles" = Just (head possibleCandidates, data)
| name == "Bar" = Just (last possibleRecordCandidates, data)
| otherwise = Nothing
possibleCandidates :: GenericData a => [a]
possibleCandidates = [emptyCandle, emptyBar]
Now, when I want to select if either instance should be selected to perform parsing, I always get the following error
• Couldn't match expected type ‘a’ with actual type ‘Candle’
‘a’ is a rigid type variable bound by
the type signature for:
possibleCandidates :: forall a. GenericData a => [a]
at src/GenericRecords.hs:42:29
My objective was to create a list of instances of GenericData because other functions depend on that being selected to execute the correct dataParser. I understand this has something to do with the type class checker, the * -> Constraint, but still not finding a way to solve this conflict. I have used several GHC language extensions but none has solved the problem.
You have a type signature:
possibleCandidates :: GenericData a => [a]
Which you might think implies that you can put anything in that list as long as it is an instance of GenericData. But that is not the way Haskell's type system actually works. The value possibleCandidates can be a list of any type which has a GenericData instance, but every element of the list must be of the same type.
What the GHC error message is telling you (in its own special way) is that the first element of the list is a Candle so it thinks that the rest of the list should also be of type Candle but the second element is actually a Bar.
Now there are ways to make heterogeneous lists (and other collections) in Haskell, but it is almost never the right thing to do.
One typical solution to this problem is to just merge everything down into one sum data type:
data GenericData = GenericCandle Candle | GenericBar Bar
You could even forgo the step of indirection and just put the Candle and Bar data directly into the data structure.
Now instead of a class you just have a datatype and your class functions become normal functions:
dataName :: GenericData -> String
dataIdentifier :: GenericData -> Double
dataParsing :: GenericData -> String -> Maybe GenericData
dataEmptyInstance :: String -> GenericData
There are some other more complex ways to make this work, but if a sum data type fits the bill, use it. It is very common for parsers in Haskell to have a large sum data type (usually also recursive) as their result. Take a look at the Value type in Aeson, the standard JSON library, for an example.
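A minimal sketch of the sum-type approach. The field names are prefixed (candleMts, barMts, and so on) because two records in one module cannot share a field name without extensions, and min/max would clash with the Prelude. The function emptyFor is a hypothetical replacement for picking from the list of candidates, and the zeroed "empty" values are assumptions:

```haskell
data Candle = Candle
  { candleMts   :: Integer
  , candleOpen  :: Double
  , candleClose :: Double
  } deriving (Show, Eq)

data Bar = Bar
  { barMts :: Integer
  , barMin :: Double
  , barMax :: Double
  } deriving (Show, Eq)

-- One closed type that covers every kind of record.
data GenericData = GenericCandle Candle | GenericBar Bar
  deriving (Show, Eq)

dataName :: GenericData -> String
dataName (GenericCandle _) = "Candle"
dataName (GenericBar _)    = "Bar"

dataIdentifier :: GenericData -> Double
dataIdentifier (GenericCandle c) = fromInteger (candleMts c)
dataIdentifier (GenericBar b)    = fromInteger (barMts b)

-- Select an empty value by name; the caller gets one ordinary type back.
emptyFor :: String -> Maybe GenericData
emptyFor "Candle" = Just (GenericCandle (Candle 0 0 0))
emptyFor "Bar"    = Just (GenericBar (Bar 0 0 0))
emptyFor _        = Nothing

-- A homogeneous list is now possible because every element is a GenericData.
possibleCandidates :: [GenericData]
possibleCandidates = [GenericCandle (Candle 0 0 0), GenericBar (Bar 0 0 0)]
```

Because the set of alternatives is closed, the compiler can warn about a missing case whenever a new constructor is added.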

What is this type?

Haskell novice here. I know from type classes that => means "in the context of". Yet I can't read the following type, found in module Statistics.Sample:
(Vector v (Double, Double), Vector v Double) => v (Double, Double) -> Double
What constraints are being applied on v left of => ?
The Data.Vector.Generic.Vector typeclass takes two type arguments, v and a where v :: * -> * is the type of the container and a :: * is the type of the elements in the container. This is simply a generic interface for the vector types defined in the vector package, notably Data.Vector.Unboxed.Vector.
This is essentially saying that the type v must be able to hold (Double, Double) and Double, although not simultaneously. If you were to use v ~ Data.Vector.Unboxed.Vector then this works just fine. The reason is due to the implementation of correlation, which uses unzip. This function splits a v (a, b) into (v a, v b). Since correlation is working on v (Double, Double), it needs the additional constraint that v can hold Doubles.
This generic type is meant to make the correlation function work with more types than Data.Vector.Vector, including any vector style types that might be implemented in other libraries.
I want to stress that these constraints
Data.Vector.Generic.Vector v (Double, Double)
Data.Vector.Generic.Vector v Double
state that whatever type you choose for v is capable of holding (Double, Double) and is also capable of holding Double. This specifies certain prerequisites for your vector type, not the actual contents of the vector. The actual contents of the vector are specified in the first argument to the correlation function.
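To see why both constraints appear, here is a sketch of a hypothetical function (sumOfProducts, which is not the library's correlation) that unzips its input. It assumes the vector package is installed. Data.Vector.Generic.unzip needs the vector type to hold the pairs as well as each component type, which is where the two constraints come from:

```haskell
{-# LANGUAGE FlexibleContexts #-}

import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U

-- Hypothetical example: sum of x * y over all pairs.
-- G.unzip requires that v can hold (Double, Double) and Double.
sumOfProducts :: (G.Vector v (Double, Double), G.Vector v Double)
              => v (Double, Double) -> Double
sumOfProducts pairs =
  let (xs, ys) = G.unzip pairs
  in  G.sum (G.zipWith (*) xs ys)

-- Choosing v = unboxed Vector satisfies both constraints.
example :: Double
example = sumOfProducts (U.fromList [(1, 2), (3, 4)])
```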

Haskell generic data structure

I want to create a type to store some generic information. In my case, this type is Molecule, where I store the chemical graph and molecular properties.
data Molecule = Molecule {
name :: Maybe String,
graph :: Gr Atom Bond,
property :: Maybe [Property] -- that's a question
} deriving(Show)
Properties I want to represent as tuple
type Property a = (String,a)
because a property may have any type: Float, Int, String, etc.
The question is how to form Molecule data structure, so I will be able to collect any numbers of any types of properties in Molecule. If I do
data Molecule a = Molecule {
name :: Maybe String,
graph :: Gr Atom Bond,
property :: Maybe [Property a]
} deriving(Show)
I have to directly assign one type when I create a molecule.
If you know in advance the set of properties a molecule might have, you could define a sum type:
data Property = Mass Float | CatalogNum Int | Comment String
If you want this type to be extensible, you could use Data.Dynamic as another answer suggests. For instance:
data Molecule = Molecule { name :: Maybe String,
graph :: Gr Atom Bond,
property :: [(String,Dynamic)]
} deriving (Show)
mass :: Molecule -> Maybe Float
mass m = case lookup "mass" (property m) of
Nothing -> Nothing
Just i -> fromDynamic i
You could also get rid of the "stringly-typed" (String,a) pairs, say:
-- in Molecule:
-- property :: [Dynamic]
data Mass = Mass Float
mass :: Molecule -> Maybe Mass
mass m = ...
Neither of these attempts gives much type safety over just parsing out of (String,String) pairs since there is no way to enforce the invariant that the user creates well-formed properties (short of wrapping properties in a new type and hiding the constructors in another module, which again breaks extensibility).
What you might want are OCaml-style polymorphic variants. You could look at Vinyl, which provides type-safe extensible records.
As an aside, you might want to get rid of the Maybe wrapper around the list of properties, since the empty list already encodes the case of no properties.
You might want to look at Data.Dynamic for a pseudo-dynamic typing solution.
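A self-contained sketch of the Data.Dynamic approach. The graph field is omitted here so that the example needs only base, and getProperty and water are hypothetical names used for illustration:

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.Typeable (Typeable)

-- Simplified Molecule: the graph field from the question is left out.
data Molecule = Molecule
  { name     :: Maybe String
  , property :: [(String, Dynamic)]
  } deriving (Show)

-- Look up a property by key and try to read it at the requested type.
-- Returns Nothing if the key is missing or the stored type differs.
getProperty :: Typeable a => String -> Molecule -> Maybe a
getProperty key m = lookup key (property m) >>= fromDynamic

-- Example value with two properties of different types.
water :: Molecule
water = Molecule
  { name     = Just "water"
  , property = [ ("mass", toDyn (18.015 :: Float))
               , ("comment", toDyn "solvent")
               ]
  }

mass :: Molecule -> Maybe Float
mass = getProperty "mass"
```

Note that asking for the wrong type, such as getProperty "mass" water :: Maybe Int, yields Nothing rather than a compile-time error, which is the type-safety trade-off mentioned above.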

Substitution Algorithm in Haskell

I'm trying to write a substitution algorithm in Haskell.
I have defined a polymorphic data type Subst a with a single constructor S::[(String, a)] -> Subst a as so:
data Subst a = S [(String, a)]
I now want to write a function single::String -> a -> Subst a for constructing a substitution for only a single variable
This is what I tried:
single::String -> a -> Subst a
single s1 (Subst a) = s1 a
However, I'm getting this error: Not in scope: data constructor 'Subst'
Does anyone have insight to what i'm doing wrong?
The data constructor is not the same thing as the type constructor.
In your code the type constructor is Subst and the data constructor is S.
Type constructors are used to create new types, e.g. in data Foo = Foo (Maybe Int) Maybe is a type constructor, Foo is the data constructor (as well as the type constructor, but they can be named differently as you discovered). Data constructors are used to create instances of types (also don't confuse this with creating an instance of a polymorphic type, e.g. Int -> Int is an instance of a -> a).
So you need to use S when you want to pattern match in your single function. Not Subst.
Hopefully that makes sense, if not, please tell me :)
P.S. data constructors are, for all intents and purposes, functions, which means you can do the same things with them that you'd typically do with functions. E.g. you can do map Bar [a,b,c] and it will apply the data constructor to each element.
single :: String -> a -> Subst a
single str a = S [(str, a)]
The [(str, a)] part creates a list with one element. That element is a tuple (or "pair"), with str as the left part of the tuple and a as the right part of the tuple. The above function then wraps that single-element list in the S constructor to create a value of type Subst a.
The result is a substitution whose list contains the rule for a single substitution from str to the value a.
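Putting the pieces together, here is a small sketch that builds a substitution with single and pattern matches on S to read it back. The function lookupSubst is a hypothetical helper, not part of the question:

```haskell
data Subst a = S [(String, a)]

-- Build a substitution for exactly one variable.
single :: String -> a -> Subst a
single str a = S [(str, a)]

-- Pattern match on the data constructor S (not the type constructor Subst)
-- to reach the underlying association list.
lookupSubst :: String -> Subst a -> Maybe a
lookupSubst var (S pairs) = lookup var pairs
```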

Resources