Implicit, static type cast (coercion) in Haskell - haskell

Consider the following design problem in Haskell. I have a simple, symbolic EDSL in which I want to express variables and general expressions (multivariate polynomials) such as x^2 * y + 2*z + 1. In addition, I want to express certain symbolic equations over expressions, say x^2 + 1 = 1, as well as definitions, like x := 2*y - 2.
The goal is to:
Have a separate type for variables and general expressions - certain
functions might be applied to variables and not complex expressions.
For instance, a definition operator := might be of type
(:=) :: Variable -> Expression -> Definition and it should not
be possible to pass a complex expression as its left-hand side
parameter (though it should be possible to pass a variable as its
right-hand side parameter, without explicit casting).
Make expressions an instance of Num, so that it's possible to
promote integer literals to expressions and use a convenient
notation for common algebraic operations like addition or
multiplication without introducing some auxiliary wrapper operators.
In other words, I would like to have an implicit and static type cast (coercion) of variables to expressions. Now, I know that as such, there are no implicit type casts in Haskell. Nevertheless, certain object-oriented programming concepts (simple inheritance, in this case) are expressible in Haskell's type system, either with or without language extensions. How could I satisfy both above points while keeping a lightweight syntax? Is it even possible?
It is clear that the main problem here is Num's type restriction, e.g.
(+) :: Num a => a -> a -> a
In principle, it's possible to write a single (generalised) algebraic data type for both variables and expressions. Then, one could write := in such a way, that the left-hand side expression is discriminated and only a variable constructor is accepted, with a run-time error otherwise. That's however not a clean, static (i.e. compile-time) solution...
Ideally, I would like to achieve a lightweight syntax such as
computation = do
  x <- variable
  t <- variable
  t |:=| x^2 - 1
  solve (t |==| 0)
In particular, I want to forbid notation like
t + 1 |:=| x^2 - 1 since := should give a definition of a variable and not an entire left-hand side expression.

To leverage polymorphism rather than subtyping (because that's all you have in Haskell), don't think "a variable is an expression", but "both variables and expressions have some operations in common". Those operations can be put in a type class:
class HasVar e where fromVar :: Variable -> e
instance HasVar Variable where fromVar = id
instance HasVar Expression where ...
Then, rather than casting things, make things polymorphic. If you have v :: forall e. HasVar e => e, it can be used both as an expression and as a variable.
example :: (forall e. HasVar e => e) -> Definition
example v = (v := v) -- v can be used as both Variable and Expression
(:=) :: Variable -> Expression -> Definition
Skeleton to make the code below typecheck:
computation :: Solver ()
computation = do
  V x <- variable
  V t <- variable
  t |:=| x^2 - 1
  solve (t |==| 0)
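One way such a skeleton might look is sketched below. Only HasVar, fromVar, V, variable, |:=|, |==|, solve and computation come from the text above; the constructors of Expression, the Statement and Equation types, and the Solver monad (a hand-rolled counter plus a log of statements) are assumptions made purely so that the example compiles without extra packages.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Hypothetical representation of variables and expressions.
newtype Variable = Variable Int deriving (Eq, Show)

data Expression
  = Var Variable
  | Lit Integer
  | Add Expression Expression
  | Mul Expression Expression
  | Neg Expression
  deriving (Eq, Show)

data Equation = Expression :==: Expression deriving (Eq, Show)

data Statement = Define Variable Expression | Solve Equation
  deriving (Eq, Show)

class HasVar e where fromVar :: Variable -> e
instance HasVar Variable where fromVar = id
instance HasVar Expression where fromVar = Var

instance Num Expression where
  (+) = Add
  (*) = Mul
  negate = Neg
  fromInteger = Lit
  abs = error "abs: not supported in this sketch"
  signum = error "signum: not supported in this sketch"

-- A variable that can be used at any type with a HasVar instance.
newtype V = V (forall e. HasVar e => e)

-- Assumed stand-in for a real solver: threads a fresh-variable counter
-- and collects the statements in order.
newtype Solver a = Solver { runSolver :: Int -> (a, Int, [Statement]) }

instance Functor Solver where
  fmap f (Solver g) = Solver $ \n -> let (a, n', w) = g n in (f a, n', w)

instance Applicative Solver where
  pure a = Solver $ \n -> (a, n, [])
  Solver f <*> Solver g = Solver $ \n ->
    let (h, n1, w1) = f n
        (a, n2, w2) = g n1
    in (h a, n2, w1 ++ w2)

instance Monad Solver where
  Solver g >>= k = Solver $ \n ->
    let (a, n1, w1) = g n
        (b, n2, w2) = runSolver (k a) n1
    in (b, n2, w1 ++ w2)

variable :: Solver V
variable = Solver $ \n -> (V (fromVar (Variable n)), n + 1, [])

infix 1 |:=|
(|:=|) :: Variable -> Expression -> Solver ()
v |:=| e = Solver $ \n -> ((), n, [Define v e])

infix 4 |==|
(|==|) :: Expression -> Expression -> Equation
(|==|) = (:==:)

solve :: Equation -> Solver ()
solve eq = Solver $ \n -> ((), n, [Solve eq])

computation :: Solver ()
computation = do
  V x <- variable
  V t <- variable
  t |:=| x^2 - 1
  solve (t |==| 0)
```

With these definitions, t + 1 |:=| x^2 - 1 should be rejected at compile time, because the left-hand side would need a Num Variable instance and none is provided.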


What does a stand for in a data type declaration?

Normally when using type declarations we do:
function_name :: Type -> Type
However in an exercise I am trying to solve there is the following structure:
function_name :: Type a -> Type a
or explicitly as in the exercise
alphabet :: DFA a -> Alphabet a
alphabet = undefined
What does a stand for?
Short answer: it's a type variable.
At the computation level, the way we define functions is to use variables to refer to their arguments. Like this:
f x = x + 3
Here x is a variable, and its value will be chosen when the function is called. Haskell has a similar (but not identical...) mechanism in its type sublanguage. For example, you can write things like:
type F x = (x, Int, x)
type Endo a = a -> a
Here again x is a variable in the first one (and a in the second), and its value will be chosen at use sites. One can also use this mechanism when defining new types. (The previous two examples just give new names to existing types, but the following does more.) One of the most basic nontrivial examples of this is the Maybe family of types:
data Maybe a = Nothing | Just a
The things on the right of the = are computation-level, so you can mostly ignore them for now, but on the left we are declaring a new family of types Maybe which accepts other types as an argument. For example, Maybe Int, Maybe (Bool, String), Maybe (Endo Char), and even passing in expressions that have variables like Maybe (x, Int, x) are all possible.
Syntactically, type constructors (things which are defined as part of the program text and that we expect the compiler to look up the definition for) start with an upper case letter and type variables (things which will be instantiated later and so don't currently have a concrete definition) start with lower case letters.
So, in the type signature you showed:
alphabet :: DFA a -> Alphabet a
I suspect there are actually two constructs new to you, not just one: first, the type variable a that you asked about, and second, the concept of type application, where we apply at the type level one "function-like" type to another. (Outside of this answer, people say "parameterized" instead of "function-like".)
...and, believe it or not, there is even a type system for types that makes sure you don't write things like these:
Int a -- Int is not parameterized, so shouldn't be applied to arguments
Int Char -- ditto
Maybe -> String -- Maybe is parameterized, so should be applied to
-- arguments, but isn't

How are tail-position contexts GHC join points paper formed?

Compiling without Continuations describes a way to extend ANF System F with join points. GHC itself has join points in Core (an intermediate representation) rather than exposing join points directly in the surface language (Haskell). Out of curiosity, I started trying to write a language that simply extends System F with join points. That is, the join points are user facing. However, there's something about the typing rules in the paper that I don't understand. Here are the parts that I do understand:
There are two environments, one for ordinary values/functions and one that only has join points.
The rationale for ∆ being ε in several of the rules. In the expression let x:σ = u in ..., u cannot reference any join points (VBIND) because join points cannot return to arbitrary locations.
The strange typing rule for JBIND. The paper does a good job explaining this.
Here's what I don't get. The paper introduces a notation that I will call the "overhead arrow", but the paper itself does not explicitly give it a name or mention it. Visually, it looks like an arrow pointing to the right, and it goes above an expression. Roughly, this seems to indicate a "tail context" (the paper does use this term). In the paper, these overhead arrows can be applied to terms, types, data constructors, and even environments. They can be nested as well. Here's the main difficulty I'm having. There are several rules with premises that include type environments under an overhead arrow. JUMP, CASE, RVBIND, and RJBIND all include premises with such a type environment (Figure 2 in the paper). However, none of the typing rules have a conclusion where the type environment is under an overhead arrow. So, I cannot see how JUMP, CASE, etc. can ever be used since the premises cannot be derived by any of the other rules.
That's the question, but if anyone has any supplementary material that provides more context on the overhead arrow convention, or if anyone is aware of an implementation of the System-F-with-join-points type system (other than in GHC's IR), that would be helpful too.
In this paper, x⃗ means “A sequence of x, separated by appropriate delimiters”.
A few examples:
If x is a variable, λx⃗. e is an abbreviation for λx1. λx2. … λxn. e. In other words, many nested 1-argument lambdas, or a many-argument lambda.
If σ and τ are types, σ⃗ → τ is an abbreviation for σ1 → σ2 → … → σn → τ. In other words, a function type with many parameter types.
If a is a type variable and σ is a type, ∀a⃗. σ is an abbreviation for ∀a1. ∀a2. … ∀an. σ. In other words, many nested polymorphic functions, or a polymorphic function with many type parameters.
In Figure 1 of the paper, the syntax of a jump expression is defined as:
e, u, v ⩴ … | jump j ϕ⃗ e⃗ τ
If this declaration were translated into a Haskell data type, it might look like this:
data Term
  -- | A jump expression has a label that it jumps to, a list of type argument
  -- applications, a list of term argument applications, and the return type
  -- of the overall `jump`-expression.
  = Jump LabelVar [Type] [Term] Type
  | ... -- Other syntactic forms.
That is, a data constructor that takes a label variable j, a sequence of type arguments ϕ⃗, a sequence of term arguments e⃗, and a return type τ.
“Zipping” things together:
Sometimes, multiple uses of the overhead arrow place an implicit constraint that their sequences have the same length. One place that this occurs is with substitutions.
{ϕ/⃗a} means “replace a1 with ϕ1, replace a2 with ϕ2, …, replace an with ϕn”, implicitly asserting that both a⃗ and ϕ⃗ have the same length, n.
Worked example: the JUMP rule:
The JUMP rule is interesting because it provides several uses of sequencing, and even a sequence of premises. Here’s the rule again:
(j : ∀a⃗. σ⃗ → ∀r. r) ∈ Δ
(Γ; ε ⊢⃗ u : σ {ϕ/⃗a})
------------------------------
Γ; Δ ⊢ jump j ϕ⃗ u⃗ τ : τ
The first premise should be fairly straightforward, now: lookup j in the label context Δ, and check that the type of j starts with a bunch of ∀s, followed by a bunch of function types, ending with a ∀r. r.
The second “premise” is actually a sequence of premises. What is it looping over? So far, the sequences we have in scope are ϕ⃗, σ⃗, a⃗, and u⃗.
ϕ⃗ and a⃗ are used in a nested sequence, so probably not those two.
On the other hand, u⃗ and σ⃗ seem quite plausible if you consider what they mean.
σ⃗ is the list of argument types expected by the label j, and u⃗ is the list of argument terms provided to the label j, and it makes sense that you might want to iterate over argument types and argument terms together.
So this “premise” actually means something like this:
for each pair of σ and u:
Γ; ε ⊢ u : σ {ϕ/⃗a}
Pseudo-Haskell implementation
Finally, here’s a somewhat-complete code sample illustrating what this typing rule might look like in an actual implementation. x⃗ is implemented as a list of x values, and some monad M is used to signal failure when a premise is not satisfied.
import Control.Monad (forM_)
data LabelVar
data Type
  = ...
data Term
  = Jump LabelVar [Type] [Term] Type
  | ...
typecheck :: TermContext -> LabelContext -> Term -> M Type
typecheck gamma delta (Jump j phis us tau) = do
  -- Look up `j` in the label context. If it's not there, throw an error.
  typeOfJ <- lookupLabel j delta
  -- Check that the type of `j` has the right shape: a bunch of `foralls`,
  -- followed by a bunch of function types, ending with `forall r.r`. If it
  -- has the correct shape, split it into a list of `a`s, a list of `\sigma`s
  -- and the return type, `forall r.r`.
  (as, sigmas, _ret) <- splitLabelType typeOfJ
  -- let subst = { \sequence{\phi / a} }
  -- The substitution is the same for every argument, so build it once.
  subst <- exactZip as phis
  -- exactZip is a helper function that "zips" two sequences together.
  -- If the sequences have the same length, it produces a list of pairs of
  -- corresponding elements. If not, it raises an error.
  argsWithTypes <- exactZip us sigmas
  forM_ argsWithTypes $ \(u, sigma) -> do
    -- Type-check the argument `u` in a context without any labels (so no
    -- jumps are allowed inside it), and assert that its type has the
    -- correct form.
    sigma' <- typecheck gamma emptyLabelContext u
    assert (applySubst subst sigma == sigma')
  -- After all the premises have been satisfied, the type of the `jump`
  -- expression is just its return type.
  return tau
-- Other syntactic forms
typecheck gamma delta u = ...
-- Auxiliary definitions
type M = ...
instance Monad M
emptyLabelContext :: LabelContext
lookupLabel :: LabelVar -> LabelContext -> M Type
splitLabelType :: Type -> M ([TypeVar], [Type], Type)
exactZip :: [a] -> [b] -> M [(a, b)]
applySubst :: [(TypeVar, Type)] -> Type -> Type
assert :: Bool -> M () -- fails in M when the condition is False
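For readers who want something they can load into GHCi, the pseudo-Haskell above can be filled in as the following minimal sketch. Everything the original leaves elided is an assumption here: Type has just enough constructors to write a label type, Term has only integer literals and jumps, M is Either String, contexts are association lists, and type equality is plain structural equality (no alpha-equivalence, and the substitution does not avoid capture). It is not the paper's or GHC's implementation.

```haskell
import Control.Monad (forM_, unless)

type TypeVar = String
type LabelVar = String

-- Assumed minimal type and term languages.
data Type
  = TInt
  | TVar TypeVar
  | Fun Type Type
  | Forall TypeVar Type
  deriving (Eq, Show)

data Term
  = IntLit Int
  | Jump LabelVar [Type] [Term] Type
  deriving (Eq, Show)

type M = Either String
type TermContext = [(String, Type)]
type LabelContext = [(LabelVar, Type)]

-- Zip two sequences, failing if their lengths differ.
exactZip :: [a] -> [b] -> M [(a, b)]
exactZip [] [] = Right []
exactZip (x : xs) (y : ys) = ((x, y) :) <$> exactZip xs ys
exactZip _ _ = Left "sequences differ in length"

-- Naive substitution: respects shadowing but does not avoid capture.
applySubst :: [(TypeVar, Type)] -> Type -> Type
applySubst _ TInt = TInt
applySubst s (TVar a) = maybe (TVar a) id (lookup a s)
applySubst s (Fun x y) = Fun (applySubst s x) (applySubst s y)
applySubst s (Forall a t) = Forall a (applySubst (filter ((/= a) . fst) s) t)

-- Recognises the result type `forall r. r`.
isBottom :: Type -> Bool
isBottom (Forall r (TVar r')) = r == r'
isBottom _ = False

-- Split `forall as. sigmas -> forall r. r` into its `as` and `sigmas`.
splitLabelType :: Type -> M ([TypeVar], [Type])
splitLabelType ty | isBottom ty = Right ([], [])
splitLabelType (Forall a body) = do
  (as, sigmas) <- splitLabelType body
  return (a : as, sigmas)
splitLabelType ty = do
  sigmas <- splitArgs ty
  return ([], sigmas)

splitArgs :: Type -> M [Type]
splitArgs ty | isBottom ty = Right []
splitArgs (Fun sigma rest) = (sigma :) <$> splitArgs rest
splitArgs ty = Left ("malformed label type near: " ++ show ty)

typecheck :: TermContext -> LabelContext -> Term -> M Type
typecheck _ _ (IntLit _) = Right TInt
typecheck gamma delta (Jump j phis us tau) = do
  typeOfJ <- maybe (Left ("unbound label " ++ j)) Right (lookup j delta)
  (as, sigmas) <- splitLabelType typeOfJ
  subst <- exactZip as phis
  pairs <- exactZip us sigmas
  forM_ pairs $ \(u, sigma) -> do
    -- Arguments are checked with an empty label context (the ε in the rule).
    sigma' <- typecheck gamma [] u
    unless (applySubst subst sigma == sigma') $
      Left ("argument type mismatch in jump to " ++ j)
  return tau
```

Note how a jump nested inside an argument of another jump is rejected: the inner jump is checked against the empty label context, so its label lookup fails.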
As far as I know SPJ’s style for notation, and this does align with what I see in the paper, it simply means “0 or more”. E.g. you can replace \overarrow{a} with a_1, …, a_n, n >= 0.
It may be “1 or more” in some cases, but it shouldn’t be hard to figure out which of the two applies.

How to work around F#'s type system

In Haskell, you can use unsafeCoerce to override the type system. How to do the same in F#?
For example, to implement the Y-combinator.
I'd like to offer a different solution, based on embedding the untyped lambda calculus in a typed functional language. The idea is to create a data type that allows us to change between types α and α → α, which subsequently allows us to escape the restrictions of a type system. I'm not very familiar with F# so I'll give my answer in Haskell, but I believe it could be adapted easily (perhaps the only complication could be F#'s strictness).
-- | Roughly represents morphism between @a@ and @a -> a@.
-- Therefore we can embed an arbitrary closed λ-term into @Any a@. Any time we
-- need to create a λ-abstraction, we just nest into one @Any@ constructor.
-- The type parameter allows us to embed ordinary values into the type and
-- retrieve results of computations.
data Any a = Any (Any a -> a)
Note that the type parameter isn't significant for combining terms. It just allows us to embed values into our representation and extract them later. All terms of a particular type Any a can be combined freely without restrictions.
-- | Embed a value into a λ-term. If viewed as a function, it ignores its
-- input and produces the value.
embed :: a -> Any a
embed = Any . const
-- | Extract a value from a λ-term, assuming it's a valid value (otherwise it'd
-- loop forever).
extract :: Any a -> a
extract x@(Any x') = x' x
We can use this data type to represent arbitrary untyped lambda terms. If we want to interpret a value of Any a as a function, we just unwrap its constructor.
First let's define function application:
-- | Applies a term to another term.
($$) :: Any a -> Any a -> Any a
(Any x) $$ y = embed $ x y
And λ abstraction:
-- | Represents a lambda abstraction
l :: (Any a -> Any a) -> Any a
l x = Any $ extract . x
Now we have everything we need for creating complex λ terms. Our definitions mimic the classical λ-term syntax; all we do is use l to construct λ abstractions.
Let's define the Y combinator:
-- λf.(λx.f(xx))(λx.f(xx))
y :: Any a
y = l (\f -> let t = l (\x -> f $$ (x $$ x))
             in t $$ t)
And we can use it to implement Haskell's classical fix. First we'll need to be able to embed a function of a -> a into Any a:
embed2 :: (a -> a) -> Any a
embed2 f = Any (f . extract)
Now it's straightforward to define
fix :: (a -> a) -> a
fix f = extract (y $$ embed2 f)
and subsequently a recursively defined function:
fact :: Int -> Int
fact = fix f
  where
    f _ 0 = 1
    f r n = n * r (n - 1)
Note that in the above text there is no recursive function. The only recursion is in the Any data type, which allows us to define y (which is also defined non-recursively).
In Haskell, unsafeCoerce has the type a -> b and is generally used to assert to the compiler that the thing being coerced actually has the destination type and it's just that the type-checker doesn't know it.
Another, less common use, is to reinterpret a pattern of bits as another type. For example an unboxed Double# could be reinterpreted as an unboxed Int64#. You have to be sure about the underlying representations for this to be safe.
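Both uses can be illustrated on the Haskell side with a short sketch. Age, ageToInt and bitsOf are hypothetical names for this example. The first function relies on a newtype sharing its runtime representation with the wrapped type; in real code Data.Coerce.coerce would be the safe way to do that particular conversion. For the bit-pattern use, the sketch swaps in the safe boxed conversions from GHC.Float rather than coercing unboxed values directly.

```haskell
import Data.Word (Word64)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)
import Unsafe.Coerce (unsafeCoerce)

-- A hypothetical newtype; at runtime it is represented exactly like Int.
newtype Age = Age Int

-- First use: assert to the compiler that the value already has the
-- destination type. Nothing checks this claim, so a wrong one can crash.
ageToInt :: Age -> Int
ageToInt = unsafeCoerce

-- Second use: reinterpret a bit pattern. These base functions do it
-- without any unsafe coercion on the caller's side.
bitsOf :: Double -> Word64
bitsOf = castDoubleToWord64

fromBits :: Word64 -> Double
fromBits = castWord64ToDouble
```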
In F#, the first application can be achieved with box |> unbox as John Palmer said in a comment on the question. If possible use explicit type arguments to make sure that you don't accidentally have the wrong coercion inferred, e.g. box<'a> |> unbox<'b> where 'a and 'b are type variables or concrete types that are already in scope in your code.
For the second application, look at the BitConverter class for specific conversions of bit-patterns. In theory you could also do something like interfacing with unmanaged code to achieve this, but that seems very heavyweight.
These techniques won't work for implementing the Y combinator because the cast is only valid if the runtime objects actually do have the target type, but with the Y combinator you actually need to call the same function again but with a different type. For this you need the kinds of encoding tricks mentioned in the question John Palmer linked to.

Functions don't just have types: They ARE Types. And Kinds. And Sorts. Help put a blown mind back together

I was doing my usual "Read a chapter of LYAH before bed" routine, feeling like my brain was expanding with every code sample. At this point I was convinced that I understood the core awesomeness of Haskell, and now just had to understand the standard libraries and type classes so that I could start writing real software.
So I was reading the chapter about applicative functors when all of a sudden the book claimed that functions don't merely have types, they are types, and can be treated as such (For example, by making them instances of type classes). (->) is a type constructor like any other.
My mind was blown yet again, and I immediately jumped out of bed, booted up the computer, went to GHCi and discovered the following:
Prelude> :k (->)
(->) :: ?? -> ? -> *
What on earth does it mean?
If (->) is a type constructor, what are the value constructors? I can take a guess, but would have no idea how to define it in the traditional data (->) ... = ... | ... | ... format. It's easy enough to do this with any other type constructor: data Either a b = Left a | Right b. I suspect my inability to express it in this form is related to the extremely weird type signature.
What have I just stumbled upon? Higher kinded types have kind signatures like * -> * -> *. Come to think of it... (->) appears in kind signatures too! Does this mean that not only is it a type constructor, but also a kind constructor? Is this related to the question marks in the type signature?
I have read somewhere (wish I could find it again, Google fails me) about being able to extend type systems arbitrarily by going from Values, to Types of Values, to Kinds of Types, to Sorts of Kinds, to something else of Sorts, to something else of something elses, and so on forever. Is this reflected in the kind signature for (->)? Because I've also run into the notion of the Lambda cube and the calculus of constructions without taking the time to really investigate them, and if I remember correctly it is possible to define functions that take types and return types, take values and return values, take types and return values, and take values which return types.
If I had to take a guess at the type signature for a function which takes a value and returns a type, I would probably express it like this:
a -> ?
or possibly
a -> *
Although I see no fundamental immutable reason why the second example couldn't easily be interpreted as a function from a value of type a to a value of type *, where * is just a type synonym for string or something.
The first example better expresses a function whose type transcends a type signature in my mind: "a function which takes a value of type a and returns something which cannot be expressed as a type."
You touch so many interesting points in your question, so I am
afraid this is going to be a long answer :)
Kind of (->)
The kind of (->) is * -> * -> *, if we disregard the boxity GHC
inserts. But there is no circularity going on, the ->s in the
kind of (->) are kind arrows, not function arrows. Indeed, to
distinguish them kind arrows could be written as (=>), and then
the kind of (->) is * => * => *.
We can regard (->) as a type constructor, or maybe rather a type
operator. Similarly, (=>) could be seen as a kind operator, and
as you suggest in your question we need to go one 'level' up. We
return to this later in the section Beyond Kinds, but first:
How the situation looks in a dependently typed language
You ask how the type signature would look for a function that takes a
value and returns a type. This is impossible to do in Haskell:
functions cannot return types! You can simulate this behaviour using
type classes and type families, but let us for illustration change
language to the dependently typed language
Agda. This is a
language with similar syntax as Haskell where juggling types together
with values is second nature.
To have something to work with, we define a data type of natural
numbers, for convenience in unary representation as in
Peano Arithmetic.
Data types are written in
GADT style:
data Nat : Set where
  Zero : Nat
  Succ : Nat -> Nat
Set is equivalent to * in Haskell, the "type" of all (small) types,
such as Natural numbers. This tells us that the type of Nat is
Set, whereas in Haskell, Nat would not have a type, it would have
a kind, namely *. In Agda there are no kinds, but everything has
a type.
We can now write a function that takes a value and returns a type.
Below is a function which takes a natural number n and a type,
and iterates the List constructor n times, applied to this
type. (In Agda, [a] is usually written List a)
listOfLists : Nat -> Set -> Set
listOfLists Zero a = a
listOfLists (Succ n) a = List (listOfLists n a)
Some examples:
listOfLists Zero Bool = Bool
listOfLists (Succ Zero) Bool = List Bool
listOfLists (Succ (Succ Zero)) Bool = List (List Bool)
We can now make a map function that operates on listOfLists.
We need to take a natural number that is the number of iterations
of the list constructor. There are two base cases. When the number is
Zero, listOfLists is just the identity and we apply the function.
When the list is empty, the empty list is returned.
The step case is a bit more involved: we apply mapN to the head
of the list, which has one layer less of nesting, and mapN
to the rest of the list.
mapN : {a b : Set} -> (a -> b) -> (n : Nat) ->
       listOfLists n a -> listOfLists n b
mapN f Zero x = f x
mapN f (Succ n) [] = []
mapN f (Succ n) (x :: xs) = mapN f n x :: mapN f (Succ n) xs
In the type of mapN, the Nat argument is named n, so the rest of
the type can depend on it. So this is an example of a type that
depends on a value.
As a side note, there are also two other named variables here,
namely the first arguments, a and b, of type Set. Type
variables are implicitly universally quantified in Haskell, but
here we need to spell them out, and specify their type, namely
Set. The brackets are there to make them invisible in the
definition, as they are always inferable from the other arguments.
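For comparison, the same idea can be approximated in Haskell, as hinted earlier, using type families. The sketch below is one possible encoding and not part of the Agda development: it assumes the GHC extensions DataKinds, GADTs and TypeFamilies, and it introduces a hypothetical singleton type SNat because a Haskell function cannot take the number as an ordinary value and still mention it in the type.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

data Nat = Zero | Succ Nat

-- Type-level counterpart of listOfLists: computes a type from a
-- (promoted) natural number and an element type.
type family ListOfLists (n :: Nat) a where
  ListOfLists 'Zero a = a
  ListOfLists ('Succ n) a = [ListOfLists n a]

-- Singleton: a run-time value whose type records which number it is.
data SNat (n :: Nat) where
  SZero :: SNat 'Zero
  SSucc :: SNat n -> SNat ('Succ n)

-- Counterpart of mapN. Matching on the singleton tells the type checker
-- what ListOfLists n a reduces to in each branch.
mapN :: (a -> b) -> SNat n -> ListOfLists n a -> ListOfLists n b
mapN f SZero x = f x
mapN f (SSucc n) xs = map (mapN f n) xs
```

For instance, mapN (+ 1) (SSucc (SSucc SZero)) works on a list of lists, while mapN show SZero applies show directly to a single value.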
Set is abstract
You ask what the constructors of (->) are. One thing to point out
is that Set (as well as * in Haskell) is abstract: you cannot
pattern match on it. So this is illegal Agda:
cheating : Set -> Bool
cheating Nat = True
cheating _ = False
Again, you can simulate pattern matching on type constructors in
Haskell using type families; one canonical example is given on
Brent Yorgey's blog.
Can we define -> in Agda? Since we can return types from
functions, we can define our own version of -> as follows:
_=>_ : Set -> Set -> Set
a => b = a -> b
(infix operators are written _=>_ rather than (=>)) This
definition has very little content, and is very similar to doing a
type synonym in Haskell:
type Fun a b = a -> b
Beyond kinds: Turtles all the way down
As promised above, everything in Agda has a type, but then
the type of _=>_ must have a type! This touches your point
about sorts, which is, so to speak, one layer above Set (the kinds).
In Agda this is called Set1:
FunType : Set1
FunType = Set -> Set -> Set
And in fact, there is a whole hierarchy of them! Set is the type of
"small" types: data types in Haskell. But then we have Set1,
Set2, Set3, and so on. Set1 is the type of types which mention
Set. This hierarchy is to avoid inconsistencies such as Girard's
paradox.
As noticed in your question, -> is used for types and kinds in
Haskell, and the same notation is used for function space at all
levels in Agda. This must be regarded as a built in type operator,
and the constructors are lambda abstraction (or function
definitions). This hierarchy of types is similar to the setting in
System F omega, and more
information can be found in the later chapters of
Pierce's Types and Programming Languages.
Pure type systems
In Agda, types can depend on values, and functions can return types,
as illustrated above, and we also had a hierarchy of
types. The different systems of lambda
calculi are investigated systematically under the name Pure Type Systems. A good
reference is
Lambda Calculi with Types by Barendregt,
where PTS are introduced on page 96, and many examples on page 99 and onwards.
You can also read more about the lambda cube there.
Firstly, the ?? -> ? -> * kind is a GHC-specific extension. The ? and ?? are just there to deal with unboxed types, which behave differently from just * (which has to be boxed, as far as I know). So ?? can be any normal type or an unboxed type (e.g. Int#); ? can be either of those or an unboxed tuple. There is more information here: Haskell Weird Kinds: Kind of (->) is ?? -> ? -> *
I think a function can't return an unboxed type because functions are lazy. Since a lazy value is either a value or a thunk, it has to be boxed. Boxed just means it is a pointer rather than just a value: it's like Integer vs int in Java.
Since you are probably not going to be using unboxed types in LYAH-level code, you can imagine that the kind of -> is just * -> * -> *.
Since the ? and ?? are basically just more general versions of *, they do not have anything to do with sorts or anything like that.
However, since -> is just a type constructor, you can actually partially apply it; for example, (->) e is an instance of Functor and Monad. Figuring out how to write these instances is a good mind-stretching exercise.
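Without spoiling the exercise of writing the instances, here is a small sketch of what using the instances that ship with base looks like. The names addThenDouble and sumBoth are made up for this example.

```haskell
-- fmap on a function post-processes the function's result,
-- so this first adds 1 and then doubles.
addThenDouble :: Int -> Int
addThenDouble = fmap (* 2) (+ 1)

-- In the Monad instance for ((->) e), every "action" is a function of the
-- same argument; the do block feeds that one argument to each of them.
sumBoth :: Int -> Int
sumBoth = do
  a <- (+ 1)
  b <- (* 2)
  return (a + b)
```

Applied to 3, sumBoth passes 3 to both (+ 1) and (* 2) and adds the two results.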
As far as value constructors go, they would have to just be lambdas (\ x ->) or function declarations. Since functions are so fundamental to the language, they get their own syntax.

SML conversions to Haskell

A few basic questions, for converting SML code to Haskell.
1) I am used to having local embedded expressions in SML code, for example test expressions, prints, etc., which function as local tests and produce output when the code is loaded (evaluated).
In Haskell it seems that the only way to get results (evaluation) is to add code in a module, and then go to main in another module and add something to invoke and print results.
Is this right? in GHCi I can type expressions and see the results, but can this be automated?
Having to go to the top level main for each test evaluation seems inconvenient to me - maybe just need to shift my paradigm for laziness.
2) in SML I can do pattern matching and unification on a returned result, e.g.
val myTag(x) = somefunct(a,b,c);
and get the value of x after a match.
Can I do something similar in Haskell easily, without writing separate extraction functions?
3) How do I do a constructor with a tuple argument, i.e. uncurried.
in SML:
datatype Thing = Info of Int * Int;
but in Haskell, I tried;
data Thing = Info ( Int Int)
which fails. ("Int is applied to too many arguments in the type:A few Int Int")
The curried version works fine,
data Thing = Info Int Int
but I wanted un-curried.
This question is a bit unclear -- you're asking how to evaluate functions in Haskell?
If it is about inserting debug and tracing into pure code, this is typically only needed for debugging. To do this in Haskell, you can use Debug.Trace.trace, in the base package.
If you're concerned about calling functions, Haskell programs evaluate from main downwards, in dependency order. In GHCi you can, however, import modules and call any top-level function you wish.
You can return the original argument to a function, if you wish, by making it part of the function's result, e.g. with a tuple:
f x = (x, y)
  where y = g a b c
Or do you mean to return either one value or another? Then using a tagged union (sum-type), such as Either:
f x = if x > 0 then Left x
               else Right (g a b c)
How do I do a constructor with a tuple argument, i.e. uncurried in SML
Using the (,) constructor. E.g.
data T = T (Int, Int)
though more Haskell-like would be:
data T = T Int Bool
and those should probably be strict fields in practice:
data T = T !Int !Bool
Debug.Trace allows you to print debug messages inline. However, since these functions use unsafePerformIO, they might behave in unexpected ways compared to a call-by-value language like SML.
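A minimal sketch of inline tracing, assuming a made-up function square: the message goes to stderr at the moment the result is actually demanded, which under lazy evaluation may be later than an SML programmer would expect, or never if the result is unused.

```haskell
import Debug.Trace (trace)

-- trace prints its first argument (to stderr) and returns its second.
-- The message appears only when the value of `square n` is forced.
square :: Int -> Int
square n = trace ("square called with " ++ show n) (n * n)
```

Loading this in GHCi and evaluating square 4 prints the message followed by the result.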
I think the @ syntax (an as-pattern) is what you're looking for here:
data MyTag = MyTag Int Bool String
someFunct :: MyTag -> (MyTag, Int, Bool, String)
someFunct x@(MyTag a b c) = (x, a, b, c) -- x is bound to the entire argument
In Haskell, tuple types are separated by commas, e.g., (t1, t2), so what you want is:
data Thing = Info (Int, Int)
Reading the other answers, I think I can provide a few more example and one recommendation.
data ThreeConstructors = MyTag Int | YourTag (String,Double) | HerTag [Bool]
someFunct :: Char -> Char -> Char -> ThreeConstructors
MyTag x = someFunct 'a' 'b' 'c'
This is like the "let MyTag x = someFunct a b c" examples, but it is at the top level of the module.
As you have noticed, Haskell's top level can define commands but there is no way to automatically run any code merely because your module has been imported by another module. This is entirely different from Scheme or SML. In Scheme the file is interpreted as being executed form-by-form, but Haskell's top level is only declarations. Thus libraries cannot do normal things like run initialization code when loaded; they have to provide a "pleaseRunMe :: IO ()" kind of command to do any initialization.
As you point out this means running all the tests requires some boilerplate code to list them all. You can look under hackage's Testing group for libraries to help, such as test-framework-th.
For #2, yes, Haskell's pattern matching does the same thing. Both let and where do pattern matching. You can do
let MyTag x = someFunct a b c
in ...
where MyTag x = someFunct a b c
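Putting the pieces from these answers together, here is a small self-contained sketch. The function names (mkThing, sumViaLet, sumViaWhere, describe) are made up for illustration; Thing is the uncurried constructor from the question.

```haskell
-- Constructor with a single tuple argument (the "uncurried" form).
data Thing = Info (Int, Int) deriving (Eq, Show)

-- Hypothetical function standing in for somefunct.
mkThing :: Int -> Thing
mkThing n = Info (n, n + 1)

-- Pattern matching on the result with let ...
sumViaLet :: Int -> Int
sumViaLet n = let Info (a, b) = mkThing n in a + b

-- ... and the same with where.
sumViaWhere :: Int -> Int
sumViaWhere n = a + b
  where Info (a, b) = mkThing n

-- An as-pattern binds the whole argument and its parts at once.
describe :: Thing -> (Thing, Int)
describe whole@(Info (a, _)) = (whole, a)
```

Note that such pattern bindings are lazy: if the constructor did not match, the error would only surface when a or b is used.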
