Avoiding boilerplate code due to equivalent constructors

Avoiding boilerplate code due to equivalent constructors - haskell

I have an ADT as follows:
Prelude> data Bond = FixedRateBond Float Float | FloatingRateBond Float Float
I want to do an operation on every value constructors of this ADT as follows:
Prelude> let foo :: Bond -> Float
Prelude| foo (FixedRateBond a b) = a + b
Prelude| foo (FloatingRateBond a b) = a + b
As you can see I have code duplication here; for every value I have a + b. I will have more value constructors so this is going to be repeated even more. To me this is code smell, but I don't know how I would refactor it to eliminate the duplicated code. Is there a functional way to avoid this repeated code? This is a trivial example as I have stripped down the real problem to bare essentials to explain the problem.

You're correct. This is a code smell, and it's actually a very common modelling mistake. All you need to do is just factor the rate-type out. E.g.,
data RateType = Fixed | Floating
data Bond = Bond RateType Float Float
Then you'll have
foo :: Bond -> Float
foo (Bond _ a b) = a + b
atop of other benefits like RateType now actually being a type, which you can have Enum and Bounded instances for.
Basically, the rule of thumb here is: if you have multiple constructors implementing the same thing, there must be an enum asking to be factored out.

Related

Algebraic Data Types in Haskell

Hello I have only worked with imperative programming so far and I am learning Haskell :)
I have the following algebraic data types:
data Day = I | ... | XXXI deriving(Ord,Eq) (Days in roman numerals)
data Month = Jan | ... | Dec deriving(Ord,Eq) (Months abbreviated to 3 letters)
data Year = 0 | ... | 2021 (actually Ints)
and I need to do some calculations with those. My first thought was mapping the Days and Months to Ints and do the calculations from there. For example:
dayConversionMap = [(I,1), (II,2), (III,3), (IV,4), ... , (XXXI,31)]
monthConversionMap = [(Jan,1), (Feb,2), (Mar,3), ... , (Dec,12)]
My question is:
Is this a good solution for my problem?
How could i convert the Days/Months into Ints in a function given I have those maps.
Thanks in advance! :)

Don't, and I repeat don't do date and time calculations yourself. Ship out to a good library. See also Falsehoods programmers believe about time for an incomplete list of reasons to do this.

Answering your questions:
There are many ways of doing what you are looking for, and a practical way would be to use a library, as already pointed out. Even better, find a library and read the source code. Hoogle is your friend.
But For learning Haskell purposes:
Instead of mapping them manually you could try provide a function. And because this is a behaviour you want more types to share you could create a Type Class. One naive (maybe didactic, less so practical, maybe fun to play with) way would be:
Define a Class that provides a way to convert to and from Int, let's call it ToInt. Now you have a common interface of getting that behaviour out of all your Types
class ToInt a where
toInt :: a -> Int
fromInt :: Int -> Maybe a
Now you can implement your to and from for your types and use them to convert.
instance ToInt Day where
toInt x =
case x of
I -> 1
II -> 2
-- ...
fromInt x =
case x of
1 -> I
-- ...
instance ToInt Month where
-- ...
calculation :: Day -> Month -> Year -> Int
calculation d m y = doSomething (toInt d)
where
doSomething :: Int -> Int
doSomething = ...
Note that this is a simple but bad example.
You can see the tedious nature of both implementation, and the limited help you get from the type checker. But once you start implementing something, you get a feel for what works and what not. Then you can search for how others deal with these issues in the actual libraries, for example.

Is it bad form to make new types/datas for clarity? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I would like to know if it is bad form to do something like this:
data Alignment = LeftAl | CenterAl | RightAl
type Delimiter = Char
type Width = Int
setW :: Width -> Alignment -> Delimiter -> String -> String
Rather than something like this:
setW :: Int -> Char -> Char -> String -> String
I do know that remaking those types effectively does nothing but take up a few lines in exchange for clearer code. However, if I use the type Delimiter for multiple functions, this would be much clearer to someone importing this module, or reading the code later.
I am relatively new to Haskell so I do not know what is good practice for this type of stuff. If this is not a good idea, or there is something that would improve clarity that is preferred, what would that be?

You're using type aliases, they only slightly help with code readability. However, it's better to use newtype instead of type for better type-safety. Like this:
data Alignment = LeftAl | CenterAl | RightAl
newtype Delimiter = Delimiter { unDelimiter :: Char }
newtype Width = Width { unWidth :: Int }
setW :: Width -> Alignment -> Delimiter -> String -> String
You will deal with extra wrapping and unwrapping of newtype. But the code will be more robust against further refactorings. This style guide suggests to use type only for specializing polymorphic types.

I wouldn't consider that bad form, but clearly, I don't speak for the Haskell community at large. The language feature exists, as far as I can tell, for that particular purpose: to make the code easier to read.
One can find examples of the use of type aliases in various 'core' libraries. For example, the Read class defines this method:
readList :: ReadS [a]
The ReadS type is just a type alias
type ReadS a = String -> [(a, String)]
Another example is the Forest type in Data.Tree:
type Forest a = [Tree a]
As Shersh points out, you can also wrap new types in newtype declarations. That's often useful if you need to somehow constrain the original type in some way (e.g. with smart constructors) or if you want to add functionality to a type without creating orphan instances (a typical example is to define QuickCheck Arbitrary instances to types that don't otherwise come with such an instance).

Using newtype—which creates a new type with the same representation as the underlying type but not substitutable with it— is considered good form. It's a cheap way to avoid primitive obsession, and it's especially useful for Haskell because in Haskell the names of function arguments are not visible in the signature.
Newtypes can also be a place on which to hang useful typeclass instances.
Given that newtypes are ubiquitous in Haskell, over time the language has gained some tools and idioms to manipulate them:
Coercible A "magical" typeclass that simplifies conversions between newtypes and their underlying types, when the newtype constructor is in scope. Often useful to avoid boilerplate in function implementations.
ghci> coerce (Sum (5::Int)) :: Int
ghci> coerce [Sum (5::Int)] :: [Int]
ghci> coerce ((+) :: Int -> Int -> Int) :: Identity Int -> Identity Int -> Identity Int
ala. An idiom (implemented in various packages) that simplifies the selection of a newtype that we might want to use with functions like foldMap.
ala Sum foldMap [1,2,3,4 :: Int] :: Int
GeneralizedNewtypeDeriving. An extension for auto-deriving instances for your newtype based on instances available in the underlying type.
DerivingVia A more general extension, for auto-deriving instances for your newtype based on instances available in some other newtype with the same underlying type.

One important thing to note is that Alignment versus Char is not just a matter of clarity, but one of correctness. Your Alignment type expresses the fact that there are only three valid alignments, as opposed to however many inhabitants Char has. By using it, you avoid trouble with invalid values and operations, and also enable GHC to informatively tell you about incomplete pattern matches if warnings are turned on.
As for the synonyms, opinions vary. Personally, I feel type synonyms for small types like Int can increase cognitive load, by making you track different names for what is rigorously the same thing. That said, leftaroundabout makes a great point in that this kind of synonym can be useful in the early stages of prototyping a solution, when you don't necessarily want to worry about the details of the concrete representation you are going to adopt for your domain objects.
(It is worth mentioning that the remarks here about type largely don't apply to newtype. The use cases are different, though: while type merely introduces a different name for the same thing, newtype introduces a different thing by fiat. That can be a surprisingly powerful move -- see danidiaz's answer for further discussion.)

Definitely is good, and here is another example, supose you have this data type with some op:
data Form = Square Int | Rectangle Int Int | EqTriangle Int
perimeter :: Form -> Int
perimeter (Square s) = s * 4
perimeter (Rectangle b h) = (b * h) * 2
perimeter (EqTriangle s) = s * 3
area :: Form -> Int
area (Square s) = s ^ 2
area (Rectangle b h) = (b * h)
area (EqTriangle s) = (s ^ 2) `div` 2
Now imagine you add the circle:
data Form = Square Int | Rectangle Int Int | EqTriangle Int | Cicle Int
add its operations:
perimeter (Cicle r ) = pi * 2 * r
area (Cicle r) = pi * r ^ 2
it is not very good right? Now I want to use Float... I have to change every Int for Float
data Form = Square Double | Rectangle Double Double | EqTriangle Double | Cicle Double
area :: Form -> Double
perimeter :: Form -> Double
but, what if, for clarity and even for reuse, I use type?
data Form = Square Side | Rectangle Side Side | EqTriangle Side | Cicle Radius
type Distance = Int
type Side = Distance
type Radius = Distance
type Area = Distance
perimeter :: Form -> Distance
perimeter (Square s) = s * 4
perimeter (Rectangle b h) = (b * h) * 2
perimeter (EqTriangle s) = s * 3
perimeter (Cicle r ) = pi * 2 * r
area :: Form -> Area
area (Square s) = s * s
area (Rectangle b h) = (b * h)
area (EqTriangle s) = (s * 2) / 2
area (Cicle r) = pi * r * r
That allows me to change the type only changing one line in the code, supose I want the Distance to be in Int, I will only change that
perimeter :: Form -> Distance
...
totalDistance :: [Form] -> Distance
totalDistance = foldr (\x rs -> perimeter x + rs) 0
I want the Distance to be in Float, so I just change:
type Distance = Float
If I want to change it to Int, I have to make some adjustments in the functions, but thats other issue.

Why do We Need Sum Types?

Imagine a language which doesn't allow multiple value constructors for a data type. Instead of writing
data Color = White | Black | Blue
we would have
data White = White
data Black = Black
data Blue = Black
type Color = White :|: Black :|: Blue
where :|: (here it's not | to avoid confusion with sum types) is a built-in type union operator. Pattern matching would work in the same way
show :: Color -> String
show White = "white"
show Black = "black"
show Blue = "blue"
As you can see, in contrast to coproducts it results in a flat structure so you don't have to deal with injections. And, unlike sum types, it allows to randomly combine types resulting in greater flexibility and granularity:
type ColorsStartingWithB = Black :|: Blue
I believe it wouldn't be a problem to construct recursive data types as well
data Nil = Nil
data Cons a = Cons a (List a)
type List a = Cons a :|: Nil
I know union types are present in TypeScript and probably other languages, but why did the Haskell committee chose ADTs over them?

Haskell's sum type is very similar to your :|:.
The difference between the two is that the Haskell sum type | is a tagged union, while your "sum type" :|: is untagged.
Tagged means every instance is unique - you can distunguish Int | Int from Int (actually, this holds for any a):
data EitherIntInt = Left Int | Right Int
In this case: Either Int Int carries more information than Int because there can be a Left and Right Int.
In your :|:, you cannot distinguish those two:
type EitherIntInt = Int :|: Int
How do you know if it was a left or right Int?
See the comments for an extended discussion of the section below.
Tagged unions have another advantage: The compiler can verify whether you as the programmer handled all cases, which is implementation-dependent for general untagged unions. Did you handle all cases in Int :|: Int? Either this is isomorphic to Int by definition or the compiler has to decide which Int (left or right) to choose, which is impossible if they are indistinguishable.
Consider another example:
type (Integral a, Num b) => IntegralOrNum a b = a :|: b -- untagged
data (Integral a, Num b) => IntegralOrNum a b = Either a b -- tagged
What is 5 :: IntegralOrNum Int Double in the untagged union? It is both an instance of Integral and Num, so we can't decide for sure and have to rely on implementation details. On the other hand, the tagged union knows exactly what 5 should be because it is branded with either Left or Right.
As for naming: The disjoint union in Haskell is a union type. ADTs are only a means of implementing these.

I will try to expand the categorical argument mentioned by #BenjaminHodgson.
Haskell can be seen as the category Hask, in which objects are types and morphisms are functions between types (disregarding bottom).
We can define a product in Hask as tuple - categorically speaking it meets the definition of the product:
A product of a and b is the type c equipped with projections p and q such that p :: c -> a and q :: c -> b and for any other candidate c' equipped with p' and q' there exists a morphism m :: c' -> c such that we can write p' as p . m and q' as q . m.
Read up on this in Bartosz' Category Theory for Programmers for further information.
Now for every category, there exists the opposite category, which has the same morphism but reverses all the arrows. The coproduct is thus:
The coproduct c of a and b is the type c equipped with injections i :: a -> c and j :: b -> c such that for all other candidates c' with i' and j' there exists a morphism m :: c -> c' such that i' = m . i and j' = m . j.
Let's see how the tagged and untagged union perform given this definition:
The untagged union of a and b is the type a :|: b such that:
i :: a -> a :|: b is defined as i a = a and
j :: b -> a :|: b is defined as j b = b
However, we know that a :|: a is isomorphic to a. Based on that observation we can define a second candidate for the product a :|: a :|: b which is equipped with the exact same morphisms. Therefore, there is no single best candidate, since the morphism m between a :|: a :|: b and a :|: b is id. id is a bijection, which implies that m is invertible and "convert" types either way. A visual representation of that argument. Replace p with i and q with j.
Restricting ourselves Either, as you can verify yourself with:
i = Left and
j = Right
This shows that the categorical complement of the product type is the disjoint union, not the set-based union.
The set union is part of the disjoint union, because we can define it as follows:
data Left a = Left a
data Right b = Right b
type DisjUnion a b = Left a :|: Right b
Because we have shown above that the set union is not a valid candidate for the coproduct of two types, we would lose many "free" properties (which follow from parametricity as leftroundabout mentioned) by not choosing the disjoint union in the category Hask (because there would be no coproduct).

This is an idea I've thought a lot about myself: a language with “first-class type algebra”. Pretty sure we could do about everything this way that we do in Haskell. Certainly if these disjunctions were, like Haskell alternatives, tagged unions; then you could directly rewrite any ADT to use them. In fact GHC can do this for you: if you derive a Generic instance, a variant type will be represented by a :+: construct, which is in essence just Either.
I'm not so sure if untagged unions would also do. As long as you require the types participating in a sum to be discernibly different, the explicit tagging should in principle not be necessary. The language would then need a convenient way to match on types at runtime. Sounds a lot like what dynamic languages do – obviously comes with quite some overhead though.
The biggest problem would be that if the types on both sides of :|: must be unequal then you lose parametricity, which is one of Haskell's nicest traits.

Given that you mention TypeScript, it is instructive to have a look at what its docs have to say about its union types. The example there starts from a function...
function padLeft(value: string, padding: any) { //etc.
... that has a flaw:
The problem with padLeft is that its padding parameter is typed as any. That means that we can call it with an argument that’s neither a number nor a string
One plausible solution is then suggested, and rejected:
In traditional object-oriented code, we might abstract over the two types by creating a hierarchy of types. While this is much more explicit, it’s also a little bit overkill.
Rather, the handbook suggests...
Instead of any, we can use a union type for the padding parameter:
function padLeft(value: string, padding: string | number) { // etc.
Crucially, the concept of union type is then described in this way:
A union type describes a value that can be one of several types.
A string | number value in TypeScript can be either of string type or of number type, as string and number are subtypes of string | number (cf. Alexis King's comment to the question). An Either String Int value in Haskell, however, is neither of String type nor of Int type -- its only, monomorphic, type is Either String Int. Further implications of that difference show up in the remainder of the discussion:
If we have a value that has a union type, we can only access members that are common to all types in the union.
In a roughly analogous Haskell scenario, if we have, say, an Either Double Int, we cannot apply (2*) directly on it, even though both Double and Int have instances of Num. Rather, something like bimap is necessary.
What happens when we need to know specifically whether we have a Fish? [...] we’ll need to use a type assertion:
let pet = getSmallPet();
if ((<Fish>pet).swim) {
(<Fish>pet).swim();
}
else {
(<Bird>pet).fly();
}
This sort of downcasting/runtime type checking is at odds with how the Haskell type system ordinarily works, even though it can be implemented using the very same type system (also cf. leftaroundabout's answer). In contrast, there is nothing to figure out at runtime about the type of an Either Fish Bird: the case analysis happens at value level, and there is no need to deal with anything failing and producing Nothing (or worse, null) due to runtime type mismatches.

Can GADTs be used to prove type inequalities in GHC?

So, in my ongoing attempts to half-understand Curry-Howard through small Haskell exercises, I've gotten stuck at this point:
{-# LANGUAGE GADTs #-}
import Data.Void
type Not a = a -> Void
-- | The type of type equality proofs, which can only be instantiated if a = b.
data Equal a b where
Refl :: Equal a a
-- | Derive a contradiction from a putative proof of #Equal Int Char#.
intIsNotChar :: Not (Equal Int Char)
intIsNotChar intIsChar = ???
Clearly the type Equal Int Char has no (non-bottom) inhabitants, and thus semantically there ought to be an absurdEquality :: Equal Int Char -> a function... but for the life of me I can't figure out any way to write one other than using undefined.
So either:
I'm missing something, or
There is some limitation of the language that makes this an impossible task, and I haven't managed to understand what it is.
I suspect the answer is something like this: the compiler is unable to exploit the fact that there are no Equal constructors that don't have a = b. But if that is so, what makes it true?

Here's a shorter version of Philip JF's solution, which is the way dependent type theorists have been refuting equations for years.
type family Discriminate x
type instance Discriminate Int = ()
type instance Discriminate Char = Void
transport :: Equal a b -> Discriminate a -> Discriminate b
transport Refl d = d
refute :: Equal Int Char -> Void
refute q = transport q ()
In order to show that things are different, you have to catch them behaving differently by providing a computational context which results in distinct observations. Discriminate provides exactly such a context: a type-level program which treats the two types differently.
It is not necessary to resort to undefined to solve this problem. Total programming sometimes involves rejecting impossible inputs. Even where undefined is available, I would recommend not using it where a total method suffices: the total method explains why something is impossible and the typechecker confirms; undefined merely documents your promise. Indeed, this method of refutation is how Epigram dispenses with "impossible cases" whilst ensuring that a case analysis covers its domain.
As for computational behaviour, note that refute, via transport is necessarily strict in q and that q cannot compute to head normal form in the empty context, simply because no such head normal form exists (and because computation preserves type, of course). In a total setting, we'd be sure that refute would never be invoked at run time. In Haskell, we're at least certain that its argument will diverge or throw an exception before we're obliged to respond to it. A lazy version, such as
absurdEquality e = error "you have a type error likely to cause big problems"
will ignore the toxicity of e and tell you that you have a type error when you don't. I prefer
absurdEquality e = e `seq` error "sue me if this happens"
if the honest refutation is too much like hard work.

I don't understand the problem with using undefined every type is inhabited by bottom in Haskell. Our language is not strongly normalizing... You are looking for the wrong thing. Equal Int Char leads to type errors not nice well kept exceptions. See
{-# LANGUAGE GADTs, TypeFamilies #-}
data Equal a b where
Refl :: Equal a a
type family Pick cond a b
type instance Pick Char a b = a
type instance Pick Int a b = b
newtype Picker cond a b = Picker (Pick cond a b)
pick :: b -> Picker Int a b
pick = Picker
unpick :: Picker Char a b -> a
unpick (Picker x) = x
samePicker :: Equal t1 t2 -> Picker t1 a b -> Picker t2 a b
samePicker Refl x = x
absurdCoerce :: Equal Int Char -> a -> b
absurdCoerce e x = unpick (samePicker e (pick x))
you could use this to create the function you want
absurdEquality e = absurdCoerce e ()
but that will produce undefined behavior as its computation rule. false should cause programs to abort, or at the very least run for ever. Aborting is the computation rule that is akin to turning minimal logic into intiutionistic logic by adding not. The correct definition is
absurdEquality e = error "you have a type error likely to cause big problems"
as to the question in the title: essentially no. To the best of my knowledge, type inequality is not representable in a practical way in current Haskell. Coming changes to the type system may lead to this getting nicer, but as of right now, we have equalities but not inequalites.

Am I thinking about and using singleton types in Haskell correctly?

I want to create several incompatible, but otherwise equal, datatypes. That is, I'd like to have a parameterized type Foo a, and functions such as
bar :: (Foo a) -> (Foo a) -> (Foo a)
without actually caring about what a is. To clarify further, I'd like the type system to stop me from doing
x :: Foo Int
y :: Foo Char
bar x y
while I at the same time don't really care about Int and Char (I only care that they're not the same).
In my actual code I have a type for polynomials over a given ring. I don't actually care what the indeterminates are, as long as the type system stops me from adding a polynomial in t with a polynomial in s. So far I've solved this by creating a typeclass Indeterminate, and parameterizing my polynomial type as
data (Ring a, Indeterminate b) => Polynomial a b
This approach feels perfectly natural for the Ring part because I do care about which particular ring a given polynomial is over. It feels very contrived for the Indeterminate part, as detailed below.
The above approach works fine, but feels contrived. Especially so this part:
class Indeterminate a where
indeterminate :: a
data T = T
instance Indeterminate T where
indeterminate = T
data S = S
instance Indeterminate S where
indeterminate = S
(and so on for perhaps a few more indeterminates). It feels weird and wrong. Essentially I'm trying to demand that instances of Indeterminate be singletons (in this sense). The feeling of weirdness is one indicator that I might be attacking this wrongly. Another is the fact that I end up having to annotate a lot of my Polynomial a bs since the actual type b often cannot be inferred (that's not strange, but is annoying nevertheless).
Any suggestions? Should I just keep on doing it like this, or am I missing something?
PS: Don't feel offended if I don't upvote or accept answers immediately. I'll be unable to check back in for a few days.

First of all, I'm not sure this:
data (Ring a, Indeterminate b) => Polynomial a b
...is doing what you expect it to. Contexts on data definitions are not terribly useful--see the discussion here for some reasons why, most of which amount to them forcing you to add extra annotations without actually providing many additional type guarantees.
Second, do you actually care about the "indeterminate" parameter other than to ensure that the types are kept distinct? A pretty standard way of doing that sort of thing is what's called phantom types--essentially, parameters in the type constructor that aren't used in the data constructor. You'll never use or need a value of the phantom type, so functions can be as polymorphic as you want, e.g.:
data Foo a b = Foo b
foo :: Foo a b -> Foo a b
foo (Foo x) = Foo x
bar :: Foo a c -> Foo b c
bar (Foo x) = Foo x
baz :: Foo Int Int -> Foo Char Int -> Foo () Int
baz (Foo x) (Foo y) = Foo $ x + y
Obviously this does require annotations, but only in places where you're deliberately adding restrictions. Otherwise, inference will work normally for the phantom type parameter.
It seems to me that the above approach should be sufficient for what you're doing here--the business with singleton types is mostly about bridging the gap between more complicated type-level stuff and regular value-level computations by creating type proxies for values. This could be useful for, say, marking vectors with types that indicate their basis, or marking numeric values with physical units--both cases where the annotation has more meaning than just "an indeterminate called X".

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string