Is nested pair a good idea in Haskell - haskell

AFAIK there is no way to do heterogeneous arrays in Haskell, or to extend a data type.
However, it seems this could be achieved easily by using nested pairs (like cons cells).
For example
data Point2D a = Point2D a a
data Point3D a = Point3D a a a
These could be written using nested pairs like this:
type Point2D a = (a, (a, ()))
type Point3D a = (a, (a, (a, ())))
That way, accessors can be shared between Point2D and Point3D:
x = fst
y = fst.snd
z = fst.snd.snd
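For instance, a quick sanity check (assuming the type synonyms above):
p2 :: Point2D Double
p2 = (1.0, (2.0, ()))
p3 :: Point3D Double
p3 = (1.0, (2.0, (3.0, ())))
-- x p2 == 1.0, y p2 == 2.0, z p3 == 3.0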
This technique can also be used to extend records, like this:
type Person = (String, ())
name = fst
type User = (String, (String, ()))
email = fst.snd
etc ...
Is this a good idea, and if so, why is there no built-in support for such a thing in Haskell?
Is this what GADTs are about?

It can be a good idea, if that's what you need.
GADTs are not directly related to this idea (if you want to know more about GADTs, it's probably better to google GADTs and read some of the links, and/or ask a separate Stack Overflow question).
There's no built-in support for the same reason there's no built-in support for graphics, matrix operations, or most other things: there's no need for it to be built in, because it can be added perfectly well by libraries, such as the HList package (indeed, even most of the functionality that is "built in" to Haskell is really just implemented in libraries; libraries that happen to be distributed with every Haskell implementation).
HList has fancy types for representing "heterogeneous lists" and convenient functions for performing operations on them, and also uses these to develop "extensible records". HLists are basically equivalent to what you could get by developing your nested-tuple idea further, so your basic idea is good enough that someone has already thought of it. :)
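For a flavour of what such an encoding looks like with modern extensions, here is a minimal sketch in the spirit of HList (the general idea only, not the package's actual API):
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}
import Data.Kind (Type)

-- a list whose type tracks the type of every element
data HList (ts :: [Type]) where
  HNil  :: HList '[]
  HCons :: t -> HList ts -> HList (t ': ts)

example :: HList '[Int, Bool, String]
example = HCons 1 (HCons True (HCons "hello" HNil))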

Here is a way to have arbitrarily nested pairs:
data Y f = In (f (Y f))
data P a = Pair Int a | STOP
type PY = Y P
a, b :: PY
a = In (Pair 23 (In (Pair 24 (In STOP))))
b = In (Pair 23 (In (Pair 24 (In (Pair 25 (In STOP))))))
c = [a,b] -- a and b have the same type
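Accessors for this encoding have to account for the STOP case; a minimal sketch (my addition, not part of the original answer):
headP :: PY -> Maybe Int
headP (In (Pair n _)) = Just n
headP (In STOP)       = Nothing

tailP :: PY -> Maybe PY
tailP (In (Pair _ r)) = Just r
tailP (In STOP)       = Nothing
-- headP a == Just 23; (tailP b >>= headP) == Just 24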

Related

Existential types in Haskell and generics in other languages

I was trying to grasp the concept of existential types in Haskell using the article Haskell/Existentially quantified types. At first glance, the concept seems clear and somewhat similar to generics in object-oriented languages. The main example there is something called a "heterogeneous list", defined as follows:
data ShowBox = forall s. Show s => SB s

heteroList :: [ShowBox]
heteroList = [SB (), SB 5, SB True]

instance Show ShowBox where
  show (SB s) = show s

f :: [ShowBox] -> IO ()
f xs = mapM_ print xs

main = f heteroList
I had a different notion of a "heterogeneous list", something like Shapeless in Scala. But here, it's just a list of items wrapped in an existential type that only adds a type constraint. The exact type of its elements is not manifested in its type signature, the only thing we know is that they all conform to the type constraint.
In object-oriented languages, it seems very natural to write something like this (example in Java). This is a ubiquitous use case, and I don't need to create a wrapper type to process a list of objects that all implement a certain interface. The animals list has a generic type List<Vocal>, so I can assume that its elements all conform to this Vocal interface:
interface Vocal {
    void voice();
}

class Cat implements Vocal {
    public void voice() {
        System.out.println("meow");
    }
}

class Dog implements Vocal {
    public void voice() {
        System.out.println("bark");
    }
}
var animals = Arrays.asList(new Cat(), new Dog());
animals.forEach(Vocal::voice);
I noticed that existential types are only available as a language extension, and that they are not described in most of the "basic" Haskell books or tutorials, so my impression is that this is quite an advanced language feature.
My question is, why? Something that seems basic in languages with generics (constructing and using a list of objects whose types implement some interface and accessing them polymorphically), in Haskell requires a language extension, custom syntax and creating an additional wrapper type? Is there no way of achieving something like that without using existential types, or is there just no basic-level use cases for this?
Or maybe I'm just mixing up the concepts, and existential types and generics mean completely different things. Please help me make sense of it.
Yes, existential types and generics mean different things. An existential type can be used similarly to an interface in an object-oriented language. You can put one in a list, of course, but a list or any other generic type is not needed to use an interface. It is enough to have a variable of type Vocal to demonstrate its usage.
It is not widely used in Haskell because it is not really needed most of the time.
nonHeteroList :: [IO ()]
nonHeteroList = [print (), print 5, print True]
does the same thing without any language extension.
An existential type (or an interface in an object-oriented language) is nothing but a piece of data with a bundled dictionary of methods. If you only have one method in your dictionary, just use a function. If you have more than one, you can use a tuple or a record of those. So if you have something like
interface Shape {
    void Draw();
    double Area();
}
you can express it in Haskell as, for example,
type Shape = (IO (), Double)
and say
circle center radius = (drawCircle center radius, pi * radius * radius)
rectangle topLeft bottomRight = (drawRectangle topLeft bottomRight,
                                 abs $ (topLeft.x - bottomRight.x) * (topLeft.y - bottomRight.y))
shapes = [circle (P 2.0 3.5) 4.2, rectangle (P 3.3 7.2) (P (-2.0) 3.1)]
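A short usage sketch of this tuple encoding (assuming the drawCircle/drawRectangle primitives referred to above): fst plays the role of the "draw" method and snd the "area" method.
drawAll :: [Shape] -> IO ()
drawAll = mapM_ fst

totalArea :: [Shape] -> Double
totalArea = sum . map snd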
though you can express exactly the same thing with type classes, instances and existentials
class Shape a where
  draw :: a -> IO ()
  area :: a -> Double

data ShapeBox = forall s. Shape s => SB s
instance Shape ShapeBox where
  draw (SB a) = draw a
  area (SB a) = area a

data Circle = Circle Point Double
instance Shape Circle where
  draw (Circle c r) = drawCircle c r
  area (Circle _ r) = pi * r * r

data Rectangle = Rectangle Point Point
instance Shape Rectangle where
  draw (Rectangle tl br) = drawRectangle tl br
  area (Rectangle tl br) = abs $ (tl.x - br.x) * (tl.y - br.y)

shapes = [SB (Circle (P 2.0 3.5) 4.2), SB (Rectangle (P 3.3 7.2) (P (-2.0) 3.1))]
and there you have it, N times longer.
is there just no basic-level use cases for this?
Sort of, yeah. While in Java you have no choice but to have open classes, Haskell has ADTs, which you'd normally use for these kinds of use cases. In your example, Haskell can represent it in one of two ways:
data Cat = Cat
data Dog = Dog
class Animal a where
  voice :: a -> String

instance Animal Cat where
  voice Cat = "meow"

instance Animal Dog where
  voice Dog = "woof"
or
data Animal = Cat | Dog

voice :: Animal -> String
voice Cat = "meow"
voice Dog = "woof"
If you needed something extensible, you'd use the former, but if you need to be able to case on the type of animal, you'd use the latter. If you wanted the former, but wanted a list, you don't have to use existential types, you could instead capture what you wanted in a list, like:
voicesOfAnimals :: [() -> String]
voicesOfAnimals = [\_ -> voice Cat, \_ -> voice Dog]
Or even more simply
voicesOfAnimals :: [String]
voicesOfAnimals = [voice Cat, voice Dog]
This is kind of what you're doing with heterogeneous lists anyway: you have a constraint, in this case Animal a, on each element, which lets you call voice on each element but nothing else, since the constraint doesn't give you any more information about the value (if you had the constraint Typeable a you'd be able to do more, but let's not worry about dynamic types here).
As for why Haskell doesn't support heterogeneous lists without extensions and wrappers, I'll let someone else explain it, but the key topics are:
subtyping
variance
inference
https://gitlab.haskell.org/ghc/ghc/-/wikis/impredicative-polymorphism (I think)
In your Java example, what's the type of Arrays.asList(new Cat())? Well, it depends on what you declare it as: you can declare the variable as List<Cat>, List<Animal>, or List<Object>, and it typechecks. But if you declared it as a List<Cat>, you wouldn't be able to reassign it to a List<Animal>, as that would be unsound.
In Haskell, typeclasses can't be used as the type within a list (so [Cat] is valid in the first example and [Animal] is valid in the second example, but [Animal] isn't valid in the first example), and this seems to be due to impredicative polymorphism not being supported in Haskell (not 100% sure). Haskell lists are defined something like data [a] = [] | a : [a], so [x, y, z] is just syntactic sugar for x : (y : (z : [])). Now consider the example in Haskell. Say you type [Dog] in the REPL (this is equivalent to Dog : [], by the way). Haskell infers this to have the type [Dog]. But if you were to put a Cat at the front, like [Cat, Dog] (that is, Cat : Dog : []), it would match the second constructor (:) and infer the type of Cat : ... to be [Cat], which Dog : [] would fail to match.
Since others have explained how you can avoid existential types in many cases, I figured I'd point out why you might want them. The simplest example I can think of is called Coyoneda:
data Coyoneda f a = forall x. Coyoneda (x -> a) (f x)
Coyoneda f a holds a container (or other functor) full of some type x and a function that can be mapped over it to produce an f a. Here's the Functor instance:
instance Functor (Coyoneda f) where
  fmap f (Coyoneda g x) = Coyoneda (f . g) x
Note that this does not have a Functor f constraint! What makes it useful? To explain that takes two more functions:
liftCoyoneda :: f a -> Coyoneda f a
liftCoyoneda = Coyoneda id
lowerCoyoneda :: Functor f => Coyoneda f a -> f a
lowerCoyoneda (Coyoneda f x) = fmap f x
The cool thing is that fmap applications get built up and performed all together:
lowerCoyoneda . fmap f . fmap g . fmap h . liftCoyoneda
is operationally
fmap (f . g . h)
rather than
fmap f . fmap g . fmap h
This can be useful if fmap is expensive in the underlying functor.
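For example, a small demonstration (assuming the definitions above): the two inner fmaps merely compose functions, and the single real fmap over the list happens in lowerCoyoneda.
example :: [Int]
example = lowerCoyoneda (fmap (+ 1) (fmap (* 2) (liftCoyoneda [1, 2, 3])))
-- example == [3, 5, 7], with exactly one traversal of the list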

What are Prisms?

I'm trying to achieve a deeper understanding of lens library, so I play around with the types it offers. I have already had some experience with lenses, and know how powerful and convenient they are. So I moved on to Prisms, and I'm a bit lost. It seems that prisms allow two things:
Determining if an entity belongs to a particular branch of a sum type, and if it does, capturing the underlying data in a tuple or a singleton.
Destructuring and reconstructing an entity, possibly modifying it in process.
The first point seems useful, but usually one doesn't need all the data from an entity, and ^? with plain lenses allows getting Nothing if the field in question doesn't belong to the branch the entity represents, just like it does with prisms.
The second point... I don't know, might have uses?
So the question is: what can I do with a Prism that I can't with other optics?
Edit: thank you everyone for excellent answers and links for further reading! I wish I could accept them all.
Lenses characterise the has-a relationship; Prisms characterise the is-a relationship.
A Lens s a says "s has an a"; it has methods to get exactly one a from an s and to overwrite exactly one a in an s. A Prism s a says "a is an s"; it has methods to upcast an a to an s and to (attempt to) downcast an s to an a.
Putting that intuition into code gives you the familiar "get-set" (or "costate comonad coalgebra") formulation of lenses,
data Lens s a = Lens {
  get :: s -> a,
  set :: a -> s -> s
}
and an "upcast-downcast" representation of prisms,
data Prism s a = Prism {
  up :: a -> s,
  down :: s -> Maybe a
}
up injects an a into s (without adding any information), and down tests whether the s is an a.
In lens, up is spelled review and down is preview. There’s no Prism constructor; you use the prism' smart constructor.
What can you do with a Prism? Inject and project sum types!
_Left :: Prism (Either a b) a
_Left = Prism {
  up = Left,
  down = either Just (const Nothing)
}

_Right :: Prism (Either a b) b
_Right = Prism {
  up = Right,
  down = either (const Nothing) Just
}
Lenses don't support this - you can't write a Lens (Either a b) a because you can't implement get :: Either a b -> a. As a practical matter, you can write a Traversal (Either a b) a, but that doesn't allow you to create an Either a b from an a - it'll only let you overwrite an a which is already there.
Aside: I think this subtle point about Traversals is the source of your confusion about partial record fields.
^? with plain lenses allows getting Nothing if the field in question doesn't belong to the branch the entity represents
Using ^? with a real Lens will never return Nothing, because a Lens s a identifies exactly one a inside an s.
When confronted with a partial record field,
data Wibble = Wobble { _wobble :: Int } | Wubble { _wubble :: Bool }
makeLenses will generate a Traversal, not a Lens.
wobble :: Traversal' Wibble Int
wubble :: Traversal' Wibble Bool
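For instance (assuming makeLenses has been applied to Wibble):
λ> Wobble 3 ^? wobble
Just 3
λ> Wubble True ^? wobble
Nothing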
For an example of how Prisms can be applied in practice, look to Control.Exception.Lens, which provides a collection of Prisms into Haskell's extensible Exception hierarchy. This lets you perform runtime type tests on SomeExceptions and inject specific exceptions into SomeException.
_ArithException :: Prism' SomeException ArithException
_AsyncException :: Prism' SomeException AsyncException
-- etc.
(These are slightly simplified versions of the actual types. In reality these prisms are overloaded class methods.)
Thinking at a higher level, certain whole programs can be thought of as being "basically a Prism". Encoding and decoding data is one example: you can always convert structured data to a String, but not every String can be parsed back:
showRead :: (Show a, Read a) => Prism String a
showRead = Prism {
  up = show,
  down = listToMaybe . fmap fst . reads
}
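For example, with this record representation:
λ> down showRead "42" :: Maybe Int
Just 42
λ> down showRead "not a number" :: Maybe Int
Nothing
λ> up showRead (42 :: Int)
"42"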
To summarise, Lenses and Prisms together encode the two core design tools of object-oriented programming: composition and subtyping. Lenses are a first-class version of Java's . and = operators, and Prisms are a first-class version of Java's instanceof and implicit upcasting.
One fruitful way of thinking about Lenses is that they give you a way of splitting up a composite s into a focused value a and some context c. Pseudocode:
type Lens s a = exists c. s <-> (a, c)
In this framework, a Prism gives you a way to look at an s as being either an a or some context c.
type Prism s a = exists c. s <-> Either a c
(I'll leave it to you to convince yourself that these are isomorphic to the simple representations I demonstrated above. Try implementing get/set/up/down for these types!)
In this sense a Prism is a co-Lens. Either is the categorical dual of (,); Prism is the categorical dual of Lens.
You can also observe this duality in the "profunctor optics" formulation - Strong and Choice are dual.
type Lens s t a b = forall p. Strong p => p a b -> p s t
type Prism s t a b = forall p. Choice p => p a b -> p s t
This is more or less the representation which lens uses, because these Lenses and Prisms are very composable. You can compose Prisms to get bigger Prisms ("a is an s, which is a p") using (.); composing a Prism with a Lens gives you a Traversal.
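For instance, using the lens library's _Left and _Just:
λ> Left (Just 3) ^? _Left . _Just
Just 3
λ> review (_Left . _Just) 3 :: Either (Maybe Int) ()
Left (Just 3)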
I just wrote a blog post, which might help build some intuition about Prisms: Prisms are constructors (Lenses are fields). http://oleg.fi/gists/posts/2018-06-19-prisms-are-constructors.html
Prisms could be introduced as first-class pattern matching, but that is a one-sided view. I'd say they are generalised constructors, though maybe more often used for pattern matching than for actual construction.
The important property of constructors (and lawful prisms) is their injectivity. Though the usual prism laws don't state it directly, the injectivity property can be deduced.
To quote lens-library documentation, the prisms laws are:
First, if I review a value with a Prism and then preview, I will get it back:
preview l (review l b) ≡ Just b
Second, if you can extract a value a using a Prism l from a value s, then the value s is completely described by l and a:
preview l s ≡ Just a ⇒ review l a ≡ s
In fact, the first law alone is enough to prove the injectivity of construction via Prism:
review l x ≡ review l y ⇒ x ≡ y
The proof is straightforward:
review l x ≡ review l y
-- x ≡ y -> f x ≡ f y
preview l (review l x) ≡ preview l (review l y)
-- rewrite both sides with the first law
Just x ≡ Just y
-- injectivity of Just
x ≡ y
We can use the injectivity property as an additional tool in the equational reasoning toolbox. Or we can use it as an easy property to check when deciding whether something is a lawful Prism. The check is easy, as we only need the review side of the Prism. Many smart constructors, for example ones which normalise their input data, aren't lawful prisms.
An example using case-insensitive:
-- Bad!
_CI :: FoldCase s => Prism' (CI s) s
_CI = prism' ci (Just . foldedCase)
λ> review _CI "FOO" == review _CI "foo"
True
λ> "FOO" == "foo"
False
The first law is also violated:
λ> preview _CI (review _CI "FOO")
Just "foo"
In addition to the other excellent answers, I feel Isos provide a nice vantage point for considering this matter.
There being some i :: Iso' s a means if you have an s value you also (virtually) have an a value, and vice versa. The Iso' gives you two conversion functions, view i :: s -> a and review i :: a -> s which are both guaranteed to succeed and lossless.
There being some l :: Lens' s a means if you have an s you also have an a, but not vice versa. view l :: s -> a may drop information along the way, as the conversion isn't required to be lossless, and so you can't go the other way if all you have is an a (cf. set l :: a -> s -> s, which also requires an s in addition to the a value in order to provide the missing information).
There being some p :: Prism' s a means if you have an s value you might also have an a, but there are no guarantees. The conversion preview p :: s -> Maybe a is not guaranteed to succeed. Still, you do have the other direction, review p :: a -> s.
In other words, an Iso is invertible and always succeeds. If you drop the invertibility requirement, you get a Lens; if you drop the success guarantee, you get a Prism. If you drop both, you get an affine traversal (which is not in lens as a separate type), and if you go a step further and give up on having at most one target you end up with a Traversal. That is reflected in one of the diamonds of the lens subtype hierarchy:
      Traversal
       /     \
      /       \
     /         \
  Lens        Prism
     \         /
      \       /
       \     /
        Iso
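In the simple record style used earlier in this thread, the affine traversal mentioned above could be sketched like this (my illustration; lens doesn't export such a type, and the field names are ad hoc):
-- at most one target: Prism-style lookup plus Lens-style update
data Affine s a = Affine {
  peek :: s -> Maybe a, -- may fail, like down
  poke :: a -> s -> s   -- needs the original s, like set
}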

Simple dependent type example in Haskell for Dummies. How are they useful in practice in Haskell? Why should I care about dependent types?

I hear a lot about dependent types nowadays and I heard that DataKinds is somehow related to dependent typing (but I am not sure about this... just heard it on a Haskell Meetup).
Could someone illustrate with a super simple Haskell example what dependent typing is and what it is good for?
On Wikipedia it is written that dependent types can help prevent bugs. Could you give a simple example of how dependent types in Haskell can prevent bugs?
Something that I could start using in five minutes right now to prevent bugs in my Haskell code?
Dependent types are basically functions from values to types; how can this be used in practice? Why is that good?
Late to the party, this answer is basically a shameless plug.
Sam Lindley and I wrote a paper about Hasochism, the pleasure and pain of dependently typed programming in Haskell. It gives plenty of examples of what's possible now in Haskell and draws points of comparison (favourable as well as not) with the Agda/Idris generation of dependently typed languages.
Although it is an academic paper, it is about actual programs, and you can grab the code from Sam's repo. We have lots of little examples (e.g. orderedness of mergesort output) but we end up with a text editor example, where we use indexing by width and height to manage screen geometry: we make sure that components are regular rectangles (vectors of vectors, not ragged lists of lists) and that they fit together exactly.
The key power of dependent types is to maintain consistency between separate data components (e.g., the head vector in a matrix and every vector in its tail must all have the same length). That's never more important than when writing conditional code. The situation (which will one day come to be seen as having been ridiculously naïve) is that the following are all type-preserving rewrites
if b then t else e => if b then e else t
if b then t else e => t
if b then t else e => e
Although we are presumably testing b because it gives us some useful insight into what would be appropriate (or even safe) to do next, none of that insight is mediated via the type system: the idea that b's truth justifies t and its falsity justifies e is missing, despite being critical.
Plain old Hindley-Milner does give us one means to ensure some consistency. Whenever we have a polymorphic function
f :: forall a. r[a] -> s[a] -> t[a]
we must instantiate a consistently: however the first argument fixes a, the second argument must play along, and we learn something useful about the result while we are at it. Allowing data at the type level is useful because some forms of consistency (e.g. lengths of things) are more readily expressed in terms of data (numbers).
But the real breakthrough is GADT pattern matching, where the type of a pattern can refine the type of the argument it matches. You have a vector of length n; you look to see whether it's nil or cons; now you know whether n is zero or not. This is a form of testing where the type of the code in each case is more specific than the type of the whole, because in each case something which has been learned is reflected at the type level. It is learning by testing which makes a language dependently typed, at least to some extent.
Here's a silly game to play, whatever typed language you use. Replace every type variable and every primitive type in your type expressions with 1 and evaluate types numerically (sum the sums, multiply the products, s -> t means t-to-the-s) and see what you get: if you get 0, you're a logician; if you get 1, you're a software engineer; if you get a power of 2, you're an electronic engineer; if you get infinity, you're a programmer. What's going on in this game is a crude attempt to measure the information we're managing and the choices our code must make. Our usual type systems are good at managing the "software engineering" aspects of coding: unpacking and plugging together components. But as soon as a choice has been made, there is no way for types to observe it, and as soon as there are choices to make, there is no way for types to guide us: non-dependent type systems approximate all values in a given type as the same. That's a pretty serious limitation on their use in bug prevention.
The common example is to encode the length of a list in its type, so you can do things like this (pseudocode):
cons :: a -> List a n -> List a (n+1)
where n is an integer. This lets you specify that adding an element to a list increments its length by one.
You can then prevent head (which gives you the first element of a list) from being run on an empty list:
head :: n > 0 => List a n -> a
Or do things like
to3uple :: List a 3 -> (a,a,a)
The problem with this kind of approach is that you then can't call head on an arbitrary list without first proving that the list is not empty.
Sometimes the proof can be done by the compiler, e.g.:
head (a `cons` l)
Otherwise, you have to do things like
if null list
  then ...
  else (head list)
Here it's safe to call head, because you are in the else branch and therefore guaranteed that the length is not zero.
However, Haskell doesn't have full dependent types at the moment, so the examples given above won't work as nicely. You can, however, declare this kind of list using DataKinds, because an integer can be promoted to the type level, which allows you to instantiate List a n with List Int 1 (n is a phantom type parameter taking a type-level literal).
If you are interested in this kind of safety, you can also have a look at Liquid Haskell.
Here is an example of such code:
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
import GHC.TypeLits
data List a (n:: Nat) = List [a] deriving Show
cons :: a -> List a n -> List a (n + 1)
cons x (List xs) = List (x:xs)
singleton :: a -> List a 1
singleton x = List [x]
data NonEmpty
data EmptyList
type family ListLength a where
  ListLength (List a 0) = EmptyList
  ListLength (List a n) = NonEmpty
head' :: (ListLength (List a n) ~ NonEmpty) => List a n -> a
head' (List xs) = head xs
tail' :: (ListLength (List a n) ~ NonEmpty) => List a n -> List a (n-1)
tail' (List xs) = List (tail xs)
list = singleton "a"
head' list -- returns "a"
Trying to do head' (tail' list) doesn't compile and gives
Couldn't match type ‘EmptyList’ with ‘NonEmpty’
Expected type: NonEmpty
  Actual type: ListLength (List [Char] 0)
In the expression: head' (tail' list)
In an equation for ‘it’: it = head' (tail' list)
Adding to #mb14's example, here's some simpler working code.
First, we need DataKinds, GADTs, and KindSignatures to really make it clear:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
Now let's define a Nat type, and a Vector type based on it:
data Nat :: * where
  Z :: Nat
  S :: Nat -> Nat

data Vector :: Nat -> * -> * where
  Nil   :: Vector Z a
  (:-:) :: a -> Vector n a -> Vector (S n) a
And voila, lists using dependent types that can be called safe in certain circumstances.
Here are the head and tail functions:
head' :: Vector (S n) a -> a
head' (a :-: _) = a
-- The other constructor, Nil, doesn't apply here because of the type signature!
tail' :: Vector (S n) a -> Vector n a
tail' (_ :-: xs) = xs
-- Ditto here.
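A quick check of these (assuming the definitions above; the explicit parentheses avoid needing a fixity declaration for :-:):
two :: Vector (S (S Z)) Int
two = 1 :-: (2 :-: Nil)
-- head' two == 1; head' Nil is rejected at compile time,
-- because Nil :: Vector Z a never matches Vector (S n) a.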
This is a more concrete and understandable example than above, but does the same sort of thing.
Note that in Haskell, Types can influence values, but values cannot influence types in the same dependent ways. There are languages such as Idris that are similar to Haskell but also support value-to-type dependent typing, which I would recommend looking into.
The machines package lets users define machines that can request values. Many machines request only one type of value, but it's also possible to define machines that sometimes ask for one type and sometimes ask for another type. The requests are values of a GADT type, which allows the value of the request to determine the type of the response.
data Step k o r = ...
  | forall t. Await (t -> r) (k t) r
The machine provides a request of type k t for some unspecified type t, and a function to deal with the result. By pattern matching on the request, the machine runner learns what type it must supply the machine. The machine's response handler doesn't need to check that it got the right sort of response.
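A minimal sketch of that pattern (my illustration, not machines' actual request types):
{-# LANGUAGE GADTs #-}

-- the constructor chosen by the machine pins down the response type
data Request t where
  AskInt  :: Request Int
  AskBool :: Request Bool

-- pattern matching refines t, so each branch knows what to supply
respond :: Request t -> t
respond AskInt  = 42
respond AskBool = True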

How to work around F#'s type system

In Haskell, you can use unsafeCoerce to override the type system. How to do the same in F#?
For example, to implement the Y-combinator.
I'd like to offer a different solution, based on embedding the untyped lambda calculus in a typed functional language. The idea is to create a data type that allows us to change between types α and α → α, which subsequently allows us to escape the restrictions of a type system. I'm not very familiar with F#, so I'll give my answer in Haskell, but I believe it could be adapted easily (perhaps the only complication could be F#'s strictness).
-- | Roughly represents a morphism between 'a' and 'a -> a'.
-- Therefore we can embed an arbitrary closed λ-term into 'Any a'. Any time we
-- need to create a λ-abstraction, we just nest into one 'Any' constructor.
--
-- The type parameter allows us to embed ordinary values into the type and
-- retrieve results of computations.
data Any a = Any (Any a -> a)
Note that the type parameter isn't significant for combining terms. It just allows us to embed values into our representation and extract them later. All terms of a particular type Any a can be combined freely without restrictions.
-- | Embed a value into a λ-term. If viewed as a function, it ignores its
-- input and produces the value.
embed :: a -> Any a
embed = Any . const
-- | Extract a value from a λ-term, assuming it's a valid value (otherwise it'd
-- loop forever).
extract :: Any a -> a
extract x@(Any x') = x' x
With this data type we can use it to represent arbitrary untyped lambda terms. If we want to interpret a value of Any a as a function, we just unwrap its constructor.
First let's define function application:
-- | Applies a term to another term.
($$) :: Any a -> Any a -> Any a
(Any x) $$ y = embed $ x y
And λ abstraction:
-- | Represents a lambda abstraction
l :: (Any a -> Any a) -> Any a
l x = Any $ extract . x
Now we have everything we need for creating complex λ terms. Our definitions mimic the classical λ-term syntax, all we do is using l to construct λ abstractions.
Let's define the Y combinator:
-- λf.(λx.f(xx))(λx.f(xx))
y :: Any a
y = l (\f -> let t = l (\x -> f $$ (x $$ x))
             in t $$ t)
And we can use it to implement Haskell's classical fix. First we'll need to be able to embed a function of a -> a into Any a:
embed2 :: (a -> a) -> Any a
embed2 f = Any (f . extract)
Now it's straightforward to define
fix :: (a -> a) -> a
fix f = extract (y $$ embed2 f)
and subsequently a recursively defined function:
fact :: Int -> Int
fact = fix f
  where
    f _ 0 = 1
    f r n = n * r (n - 1)
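which behaves like the ordinary factorial:
λ> fact 5
120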
Note that in the above text there is no recursive function. The only recursion is in the Any data type, which allows us to define y (which is also defined non-recursively).
In Haskell, unsafeCoerce has the type a -> b and is generally used to assert to the compiler that the thing being coerced actually has the destination type and it's just that the type-checker doesn't know it.
Another, less common use, is to reinterpret a pattern of bits as another type. For example an unboxed Double# could be reinterpreted as an unboxed Int64#. You have to be sure about the underlying representations for this to be safe.
In F#, the first application can be achieved with box |> unbox as John Palmer said in a comment on the question. If possible use explicit type arguments to make sure that you don't accidentally have the wrong coercion inferred, e.g. box<'a> |> unbox<'b> where 'a and 'b are type variables or concrete types that are already in scope in your code.
For the second application, look at the BitConverter class for specific conversions of bit-patterns. In theory you could also do something like interfacing with unmanaged code to achieve this, but that seems very heavyweight.
These techniques won't work for implementing the Y combinator because the cast is only valid if the runtime objects actually do have the target type, but with the Y combinator you actually need to call the same function again but with a different type. For this you need the kinds of encoding tricks mentioned in the question John Palmer linked to.

Are newtypes faster than enumerations?

According to this article,
Enumerations don't count as single-constructor types as far as GHC is concerned, so they don't benefit from unpacking when used as strict constructor fields, or strict function arguments. This is a deficiency in GHC, but it can be worked around.
And instead the use of newtypes is recommended. However, I cannot verify this with the following code:
{-# LANGUAGE MagicHash,BangPatterns #-}
{-# OPTIONS_GHC -O2 -funbox-strict-fields -rtsopts -fllvm -optlc --x86-asm-syntax=intel #-}
module Main(main,f,g)
where
import GHC.Base
import Criterion.Main
data D = A | B | C
newtype E = E Int deriving(Eq)
f :: D -> Int#
f z | z `seq` False = 3422#
f z = case z of
  A -> 1234#
  B -> 5678#
  C -> 9012#
g :: E -> Int#
g z | z `seq` False = 7432#
g z = case z of
  (E 0) -> 2345#
  (E 1) -> 6789#
  (E 2) -> 3535#
f' x = I# (f x)
g' x = I# (g x)
main :: IO ()
main = defaultMain [ bench "f" (whnf f' A)
                   , bench "g" (whnf g' (E 0))
                   ]
Looking at the assembly, the tags for each constructor of the enumeration D are actually unpacked and directly hard-coded into the instructions. Furthermore, the function f lacks error-handling code and is more than 10% faster than g. In a more realistic case I have also experienced a slowdown after converting an enumeration to a newtype. Can anyone give me some insight about this? Thanks.
It depends on the use case. For the functions you have, it's expected that the enumeration performs better. Basically, the three constructors of D become Ints (resp. Int#s) when the strictness analysis allows it, and GHC knows it's statically checked that the argument can only have one of the three values 0#, 1#, 2#, so it need not insert error-handling code for f. For E, the static guarantee that only one of three values is possible isn't given, so it needs to add error-handling code for g, and that slows things down significantly. If you change the definition of g so that the last case becomes
E _ -> 3535#
the difference vanishes completely or almost completely (I get a 1% - 2% better benchmark for f still, but I haven't done enough testing to be sure whether that's a real difference or an artifact of benchmarking).
But this is not the use case the wiki page is talking about. What it's talking about is unpacking the constructors into other constructors when the type is a component of other data, e.g.
data FooD = FD !D !D !D
data FooE = FE !E !E !E
Then, if compiled with -funbox-strict-fields, the three Int#s can be unpacked into the constructor of FooE, so you'd basically get the equivalent of
struct FooE {
    long x, y, z;
};
while the fields of FooD have the multi-constructor type D and cannot be unpacked into the constructor FD (1), so that would basically give you
struct FooD {
    long *px, *py, *pz;
};
That can obviously have significant impact.
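In Haskell source, the same intent can be made explicit with UNPACK pragmas (a sketch; -funbox-strict-fields does this implicitly for strict fields, and the FooE'/FE' names here are made up for illustration):
-- each !E field unpacks to a raw Int# in the FE' constructor
data FooE' = FE' {-# UNPACK #-} !E {-# UNPACK #-} !E {-# UNPACK #-} !E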
I'm not sure about the case of single-constructor function arguments. That has obvious advantages for types with contained data, like tuples, but I don't see how that would apply to plain enumerations, where you just have a case and splitting off a worker and a wrapper makes no sense (to me).
Anyway, the worker/wrapper transformation isn't so much a single-constructor thing, constructor specialisation can give the same benefit to types with few constructors. (For how many constructors specialisations would be created depends on the value of -fspec-constr-count.)
(1) That might have changed, but I doubt it. I haven't checked it though, so it's possible the page is out of date.
I would guess that GHC has changed quite a bit since that page was last updated in 2008. Also, you're using the LLVM backend, so that's likely to have some effect on performance as well. GHC can (and will, since you've used -O2) strip any error handling code from f, because it knows statically that f is total. The same cannot be said for g. I would guess that it's the LLVM backend that then unpacks the constructor tags in f, because it can easily see that there is nothing else used by the branching condition. I'm not sure of that, though.
