Source-level definition of `seq` in Haskell

I'm trying to understand seq in Haskell. I've got some understanding of seq in English, so now I'd like to read its implementation.
However, the source code says
infixr 0 `seq`
seq :: a -> b -> b
seq = seq
How can I read this? Is this an infinitely recursive definition? I suspect it's not, though.

seq cannot be implemented in plain Haskell. That is a "dummy" definition. It is "replaced" with the actual seq behavior by GHC. So the real implementation (so to speak) of seq is inside the compiler.
To put it another way: If we ignore the name of the function in that code snippet, that definition has no relationship to seq. The compiler sees the name "seq" (and maybe the fact that it's defined in that specific module; I'm not sure about the details) and that tells it to replace the definition with the actual seq behavior during compilation.
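To see the behavior GHC wires in, here is a minimal sketch (the try/evaluate scaffolding is my addition so the demo survives the exception):
import Control.Exception (SomeException, evaluate, try)

main :: IO ()
main = do
  let x = undefined :: Int
  -- Laziness: x is never forced here, so this happily prints 42.
  print (const 42 x)
  -- seq evaluates x to weak head normal form first, so this throws.
  r <- try (evaluate (x `seq` 42)) :: IO (Either SomeException Int)
  case r of
    Left _  -> putStrLn "seq forced the undefined value"
    Right n -> print n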

The file you linked starts with
{- This is a generated file (generated by genprimopcode). It is not
code to actually be used. Its only purpose is to be consumed by
haddock. -}
hence it is only generated so that it can in turn cause documentation to be generated.
Every single definition in the file is given by a dummy recursion x = x. That is not the real definition, but only a placeholder.
Haskell's primitives in GHC.Prim, like seq, cannot simply be implemented in a Haskell module (otherwise they would not be called primitives). Instead, they are handled in a special way during compilation.

Related

What is the Unit type?

Normally in Haskell, tuples of length one aren't allowed (AFAIK). However, when messing with Template Haskell, I got this:
oneElementTuple = $(do {
    x <- newName "x";
    return $ LamE
        [VarP x]
        (TupE [Just (VarE x)])  -- one-element tuple?
    })
GHCi tells me that oneElementTuple is of type a -> Unit a. I couldn't find any documentation on this Unit type, and it doesn't seem to be an instance of any basic typeclasses like Show or Functor. So, where is Unit defined, if it isn't just built-in magic? Is it at all useful?
There's a Unit type defined in GHC.Tuple.
Quoting the source:
-- The desugarer uses 1-tuples,
-- but "()" is already used up for 0-tuples
-- See Note [One-tuples] in TysWiredIn
data Unit a = Unit a
Nothing fancy. It has no special tuple syntax. It looks like TH uses this type when one tries to make a one-tuple as you did.
Note that since this type lives in a GHC.* module, it is considered low-level. Normally, when one-tuples are needed in everyday programming, one uses the Identity newtype instead (which also avoids the additional lifting of Unit).
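For comparison, a quick sketch of the Identity alternative (from Data.Functor.Identity in base):
import Data.Functor.Identity (Identity (..))

-- Identity is the everyday one-tuple: being a newtype, it adds
-- no extra box at runtime, unlike the lifted Unit datatype.
wrapped :: Identity Int
wrapped = Identity 42

unwrapped :: Int
unwrapped = runIdentity wrapped  -- 42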

Why does Haskell hide functions with the same name but different type signatures?

Suppose I was to define (+) on Strings but not by giving an instance of Num String.
Why does Haskell now hide Num's (+) function? After all, the function I have provided:
(+) :: String -> String -> String
can be distinguished by the compiler from Prelude's (+). Why can't both functions exist in the same namespace, but with different, non-overlapping type signatures?
As long as there is no call to the function in the code, Haskell need not care that there's an ambiguity. A call to the function with concrete arguments would then determine the types, so that the appropriate implementation can be chosen.
Of course, once there is an instance Num String, there would actually be a conflict, because at that point Haskell couldn't decide based upon the parameter type which implementation to choose, if the function were actually called.
In that case, an error should be raised.
Wouldn't this allow function overloading without pitfalls/ambiguities?
Note: I am not talking about dynamic binding.
Haskell simply does not support function overloading (except via typeclasses). One reason for that is that function overloading doesn't work well with type inference. If you had code like f x y = x + y, how would Haskell know whether x and y are Nums or Strings, i.e. whether the type of f should be f :: Num a => a -> a -> a or f :: String -> String -> String?
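To make the typeclass route concrete, here is a sketch with a hypothetical Plus class (the class and its instances are invented for illustration, not a standard API):
{-# LANGUAGE FlexibleInstances #-}

-- A class that "overloads" an addition-like operator;
-- inference picks the instance per call site.
class Plus a where
  plus :: a -> a -> a

instance Plus Int where
  plus = (+)

instance Plus String where
  plus = (++)

main :: IO ()
main = do
  print (plus (1 :: Int) 2)    -- 3
  putStrLn (plus "foo" "bar")  -- foobar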
PS: This isn't really relevant to your question, but the types aren't strictly non-overlapping if you assume an open world, i.e. in some module somewhere there might be an instance for Num String, which, when imported, would break your code. So Haskell never makes any decisions based on the fact that a given type does not have an instance for a given typeclass. Of course, function definitions hide other function definitions with the same name even if there are no typeclasses involved, so as I said: not really relevant to your question.
Regarding why it's necessary for a function's type to be known at the definition site as opposed to being inferred at the call-site: First of all the call-site of a function may be in a different module than the function definition (or in multiple different modules), so if we had to look at the call site to infer a function's type, we'd have to perform type checking across module boundaries. That is when type checking a module, we'd also have to go all through the modules that import this module, so in the worst case we have to recompile all modules every time we change a single module. This would greatly complicate and slow down the compilation process. More importantly it would make it impossible to compile libraries because it's the nature of libraries that their functions will be used by other code bases that the compiler does not have access to when compiling the library.
"As long as the function isn't called"
"At some point, when using the function"
No, no, no. In Haskell you don't think in terms of "before" or "the minute you do..."; you define things once and for all time. That's most apparent in the runtime behaviour of variables, but it also extends to function signatures and class instances. This way, you don't have to do all the tedious thinking about compilation order, and you are safe from the many ways in which e.g. C++ templates/overloads often break horribly because of one tiny change in the program.
Also, I don't think you quite understand how Hindley-Milner works.
"Before you call the function, at which time you know the type of the argument, it doesn't need to know."
Well, you normally don't know the type of the argument! It may sometimes be explicitly given, but usually it's deduced from the other argument or the return type. For instance, in
map (+3) [5,6,7]
the compiler doesn't know what types the numeric literals have, it only knows that they are numbers. This way, you can evaluate the result as whatever you like, and that allows for things you could only dream of in other languages, for instance a symbolic type where
> map (+3) [5,6,7] :: [SymbolicNum]
[SymbolicPlus 5 3, SymbolicPlus 6 3, SymbolicPlus 7 3]
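A minimal sketch of such a symbolic type (the names SymbolicNum and SymbolicPlus come from the answer above; the implementation, with unused Num methods stubbed out, is my assumption):
data SymbolicNum
  = SymbolicLit Integer
  | SymbolicPlus SymbolicNum SymbolicNum
  deriving Show

instance Num SymbolicNum where
  fromInteger = SymbolicLit
  (+)         = SymbolicPlus
  (*)         = error "not needed for this sketch"
  abs         = error "not needed for this sketch"
  signum      = error "not needed for this sketch"
  negate      = error "not needed for this sketch"

-- ghci> map (+3) [5,6,7] :: [SymbolicNum]
-- [SymbolicPlus (SymbolicLit 5) (SymbolicLit 3), ...]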

Is everything in Haskell stored in thunks, even simple values?

What do the thunks for the following value/expression/function look like in the Haskell heap?
val = 5 -- is `val` a pointer to a box containing 5?
add x y = x + y
result = add 2 val
main = print $ result
Would be nice to have a picture of how these are represented in Haskell, given its lazy evaluation mode.
Official answer
It's none of your business. It's strictly an implementation detail of your compiler.
Short answer
Yes.
Longer answer
To the Haskell program itself, the answer is always yes, but the compiler can and will do things differently if it finds out that it can get away with it, for performance reasons.
For example, for add x y = x + y, a compiler might generate code that works with thunks for x and y and constructs a thunk as a result.
But consider the following:
foo :: Int -> Int -> Int
foo x y = x * x + y * y
Here, an optimizing compiler will generate code that first takes x and y out of their boxes, then does all the arithmetic, and then stores the result in a box.
Advanced answer
This paper describes how GHC switched from one way of implementing thunks to another that was actually both simpler and faster:
http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/
In general, even primitive values in Haskell (e.g. of type Int and Float) are represented by thunks. This is indeed required by the non-strict semantics; consider the following fragment:
bottom :: Int
bottom = div 1 0
This definition will generate a div-by-zero exception only if the value of bottom is inspected, but not if the value is never used.
Consider now the add function:
add :: Int -> Int -> Int
add x y = x+y
A naive implementation of add must force the thunk for x, force the thunk for y, add the values and create an (evaluated) thunk for the result. This is a huge overhead for arithmetic compared to strict functional languages (not to mention imperative ones).
However, an optimizing compiler such as GHC can mostly avoid this overhead; this is a simplified view of how GHC translates the add function:
add :: Int -> Int -> Int
add (I# x) (I# y) = case x +# y of z -> I# z
Internally, a basic type like Int is seen as an ordinary datatype with a single constructor. The type Int# is the "raw" machine type for integers, and +# is the primitive addition on raw values.
Operations on raw types are implemented directly on bit patterns (e.g. in registers), not on thunks. Boxing and unboxing are then translated as constructor application and pattern matching.
The advantage of this approach (not visible in this simple example) is that the compiler is often capable of inlining such definitions and removing intermediate boxing/unboxing operations, leaving only the outermost ones.
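For reference, GHC.Types really does define Int as a single-constructor datatype (data Int = I# Int#), and you can write the unboxed version by hand; a sketch using the MagicHash extension:
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), (+#))

-- A hand-written version of the unboxing GHC performs for Int addition.
addUnboxed :: Int -> Int -> Int
addUnboxed (I# x) (I# y) = I# (x +# y)

main :: IO ()
main = print (addUnboxed 2 3)  -- 5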
It would be absolutely correct to wrap every value in a thunk. But since Haskell is non-strict, compilers can choose when to evaluate thunks/expressions. In particular, compilers can choose to evaluate an expression earlier than strictly necessary, if it results in better code.
Optimizing Haskell compilers (GHC) perform strictness analysis to figure out which values will always be computed.
In the beginning, the compiler has to assume that none of a function's arguments are ever used. Then it goes over the body of the function and tries to find function applications that 1) are known to be strict in (at least some of) their arguments and 2) always have to be evaluated to compute the function's result.
In your example, we have the function (+), which is strict in both its arguments. Thus the compiler knows that both x and y are always required to be evaluated at this point.
Now it just so happens that the expression x+y is always necessary to compute the function's result, so the compiler can record that the function add is strict in both x and y.
The generated code for add* will thus expect integer values as parameters and not thunks. The algorithm becomes much more complicated when recursion is involved (a fixed point problem), but the basic idea remains the same.
Another example:
mkList x y =
  if x
    then y : []
    else []
This function will take x in evaluated form (as a boolean) and y as a thunk. The expression x needs to be evaluated on every possible execution path through mkList, so we can have the caller evaluate it. The expression y, on the other hand, is never used in any function application that is strict in its arguments: the cons function (:) never looks at y, it just stores it in a list. Thus y needs to be passed as a thunk in order to satisfy the lazy Haskell semantics.
mkList False undefined -- absolutely legal
*: add is of course polymorphic and the exact type of x and y depends on the instantiation.
Short answer: Yes.
Long answer:
val = 5
This has to be stored in a thunk, because imagine if we wrote this anywhere in our code (like, in a library we imported or something):
val = undefined
If this has to be evaluated when our program starts, it would crash, right? If we actually use that value for something, that would be what we want, but if we don't use it, it shouldn't be able to influence our program so catastrophically.
For your second example, let me change it a little:
div' x y = x / y  -- renamed so it doesn't shadow Prelude's div
This value has to be stored in a thunk as well, because imagine some code like this:
average list =
  if null list
    then 0
    else div' (sum list) (fromIntegral (length list))  -- length returns an Int
If div' were strict here, it would be evaluated even when the list is null (i.e. empty), meaning that writing the function like this wouldn't work, because it wouldn't have a chance to return 0 when given the empty list, even though that is what we want in this case.
Your final example is just a variation of example 1, and it has to be lazy for the same reasons.
All this being said, it is possible to force the compiler to make some values strict, but that goes beyond the scope of this question.
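For completeness, one common way to request that strictness (a sketch; bang patterns are only one of several mechanisms, seq being another):
{-# LANGUAGE BangPatterns #-}

-- The bang pattern forces the accumulator before each recursive
-- call, avoiding a long chain of (+) thunks.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs

main :: IO ()
main = print (sumStrict [1 .. 1000000])  -- 500000500000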
I think the others answered your question nicely, but just for completeness's sake let me add that GHC offers you the possibility of using unboxed values directly as well. This is what Haskell Wiki says about it:
When you are really desperate for speed, and you want to get right down to the “raw bits.” Please see GHC Primitives for some information about using unboxed types.
This should be a last resort, however, since unboxed types and primitives are non-portable. Fortunately, it is usually not necessary to resort to using explicit unboxed types and primitives, because GHC's optimiser can do the work for you by inlining operations it knows about, and unboxing strict function arguments. Strict and unpacked constructor fields can also help a lot. Sometimes GHC needs a little help to generate the right code, so you might have to look at the Core output to see whether your tweaks are actually having the desired effect.
One thing that can be said for using unboxed types and primitives is that you know you're writing efficient code, rather than relying on GHC's optimiser to do the right thing, and being at the mercy of changes in GHC's optimiser down the line. This may well be important to you, in which case go for it.
As mentioned, it's non-portable, so you need a GHC language extension; see the GHC users guide for the documentation.

In Haskell, is there some way to forcefully coerce a polymorphic call?

I have a list of values (or functions) of any type. I have another list of functions of any type. The user at runtime will choose one from the first list, and another from the second list. I have a mechanism to ensure that the two items are type compatible (value or output from first is compatible with input of second).
I need some way to call the function with the value (or compose the functions). If the second function has concrete types, unsafeCoerce works fine. But if it's of the form:
polyFunc :: MyTypeclass a => a -> IO ()
polyFunc x = print . show $ typeclassFunc x
Then unsafeCoerce doesn't work since it can't resolve to a concrete type.
Is there any way to do what I'm trying to do?
Here's an example of what the lists might look like. However... I'm not limited to this, if there is some other way to represent these that will solve the problem, I would like to know. A critical thing to consider is that: the list can change at runtime so I do not know at compile time all the possible types that might be involved.
data Wrapper = forall a. Wrapper a
firstList :: [Wrapper]
firstList = [Wrapper "blue", Wrapper 5, Wrapper valueOfMyTypeclass]
data OtherWrapper = forall a. OtherWrapper (a -> IO ())
secondList :: [OtherWrapper]
secondList = [OtherWrapper print, OtherWrapper polyFunc]
Note: As for why I want to do such a crazy thing:
I'm generating code and typechecking it with hint. But that happens at runtime. The problem is that hint is slow at actually executing things and high performance for this is critical. Also, at least in certain cases, I do not want to generate code and run it through ghc at runtime (though we have done some of that, too). So... I'm trying to find somewhere in the middle: dynamically hook things together without having to generate code and compile, but run it at compiled speed instead of interpreted.
Edit: Okay, so now that I see what's going on a bit more, here's a very general approach -- don't use polymorphic functions directly at all! Instead, use functions of type Dynamic -> IO ()! Then, they can use "typecase"-style dispatch directly to choose which monomorphic function to invoke -- i.e. just switching on the TypeRep. You do have to encode this dispatch directly for each polymorphic function you're wrapping. However, you can automate this with some Template Haskell if it becomes enough of a hassle.
Essentially, rather than overloading Haskell's polymorphism, just as Dynamic embeds a dynamically typed language in a statically typed language, you now extend that to embed dynamic polymorphism in a statically typed language.
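A sketch of that typecase-style dispatch with Data.Dynamic (printDyn and its supported types are illustrative choices, not part of the answer):
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Dynamic (Dynamic, fromDynamic, toDyn)

-- A monomorphic wrapper around "polymorphic" printing: try each
-- supported type in turn by switching on the Dynamic's type.
printDyn :: Dynamic -> IO ()
printDyn d
  | Just (s :: String) <- fromDynamic d = putStrLn s
  | Just (n :: Int)    <- fromDynamic d = print n
  | otherwise                           = putStrLn "printDyn: unsupported type"

main :: IO ()
main = mapM_ printDyn [toDyn "blue", toDyn (5 :: Int)]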
--
Old answer: More code would be helpful. But, as far as I can tell, this is the read/show problem. I.e. you have a function that produces a polymorphic result, and a function that takes a polymorphic input. The issue is that you need to pick the intermediate value such that it satisfies both constraints. If you have a mechanism to do so, then the usual tricks will work, making sure you answer that open question which the compiler can't know the answer to.
I'm not sure that I completely understand your question. But since you have a value and a function with compatible types, you could combine them into a single value. Then the compiler can prove that the types match.
{-# LANGUAGE ExistentialQuantification #-}
data Vault = forall a . Vault (a -> IO ()) a
runVault :: Vault -> IO ()
runVault (Vault f x) = f x
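A hypothetical usage, assuming each Vault is built at a point where the types are still known to match:
main :: IO ()
main = mapM_ runVault [Vault print (42 :: Int), Vault putStrLn "blue"]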

How do you do generic programming in Haskell?

Coming from C++, I find generic programming indispensable. I wonder how people approach that in Haskell?
Say, how do you write a generic swap function in Haskell?
Is there an equivalent concept of partial specialization in Haskell?
In C++, I can partially specialize the generic swap function with a special one for a generic map/hash_map container that has a special swap method for O(1) container swap. How do you do that in Haskell or what's the canonical example of generic programming in Haskell?
This is closely related to your other question about Haskell and quicksort. I think you probably need to read at least the introduction of a book about Haskell. It sounds as if you haven't yet grasped the key point about it, which is that it bans you from modifying the values of existing variables.
Swap (as understood and used in C++) is, by its very nature, all about modifying existing values. It's so we can use a name to refer to a container, and replace that container with completely different contents, and specialize that operation to be fast (and exception-free) for specific containers, allowing us to implement a modify-and-publish approach (crucial for writing exception-safe code or attempting to write lock-free code).
You can write a generic swap in Haskell, but it would probably take a pair of values and return a new pair containing the same values with their positions reversed, or something like that. Not really the same thing, and not having the same uses. It wouldn't make any sense to try and specialise it for a map by digging inside that map and swapping its individual member variables, because you're just not allowed to do things like that in Haskell (you can do the specialization, but not the modifying of variables).
Suppose we wanted to "measure" a list in Haskell:
measure :: [a] -> Integer
That's a type declaration. It means that the function measure takes a list of anything (a is a generic type parameter because it starts with a lowercase letter) and returns an Integer. So this works for a list of any element type - it's what would be called a function template in C++, or a polymorphic function in Haskell (not the same as a polymorphic class in C++).
We can now define that by providing specializations for each interesting case:
measure [] = 0
i.e. measure the empty list and you get zero.
Here's a very general definition that covers all other cases:
measure (h:r) = 1 + measure r
The bit in parentheses on the LHS is a pattern. It means: take a list, break off the head and call it h, call the remaining part r. Those names are then parameters we can use. This will match any list with at least one item on it.
If you've tried template metaprogramming in C++ this will all be old hat to you, because it involves exactly the same style - recursion to do loops, specialization to make the recursion terminate. Except that in Haskell it works at runtime (specialization of the function for particular values or patterns of values).
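Putting the pieces together into a runnable module (the element type is irrelevant, as the signature promises):
measure :: [a] -> Integer
measure []    = 0
measure (h:r) = 1 + measure r

main :: IO ()
main = do
  print (measure "abc")          -- 3
  print (measure [True, False])  -- 2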
As Earwicker says, the example is not as meaningful in Haskell. If you absolutely want to have it anyway, here is something similar (swapping the two parts of a pair), copied and pasted from an interactive session:
GHCi, version 6.8.2: http://www.haskell.org/ghc/ :? for help
Loading package base ... linking ... done.
Prelude> let swap (a,b) = (b,a)
Prelude> swap("hello", "world")
("world","hello")
Prelude> swap(1,2)
(2,1)
Prelude> swap("hello",2)
(2,"hello")
In Haskell, functions are as generic (polymorphic) as possible - the compiler will infer the "most general type". For example, TheMarko's example swap is polymorphic by default in the absence of a type signature:
*Main> let swap (a,b) = (b,a)
*Main> :t swap
swap :: (t, t1) -> (t1, t)
As for partial specialization, GHC has a non-Haskell-98 extension, the SPECIALIZE pragma:
file:///C:/ghc/ghc-6.10.1/doc/users_guide/pragmas.html#specialize-pragma
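A sketch of the pragma in use (square is a made-up function; the pragma asks GHC to also compile a dedicated Int copy of it):
-- The pragma requests a monomorphic copy alongside the generic one.
square :: Num a => a -> a
square x = x * x
{-# SPECIALIZE square :: Int -> Int #-}

main :: IO ()
main = print (square (7 :: Int))  -- 49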
Also, note that there's a mismatch in terminology. What's called generic in C++, Java, and C# is called polymorphic in Haskell. "Generic" in Haskell usually means polytypic:
http://haskell.readscheme.org/generic.html
But above I use the C++ meaning of generic.
In Haskell you would create type classes. Type classes are not like classes in OO languages. Take the Num type class: it says that anything that is an instance of the class can perform certain operations (+, -, * and so on), so Integer is an instance of Num, provides implementations of the necessary functions, and can be used anywhere a Num is expected.
Say you want to be able to foo Ints and Strings. Then you would declare Int and String to be instances of the type class Foo. Now anywhere you see the constraint (Foo a) you can use Int or String.
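A minimal sketch of that hypothetical Foo class (all names invented for illustration):
{-# LANGUAGE FlexibleInstances #-}

class Foo a where
  foo :: a -> String

instance Foo Int where
  foo n = "Int: " ++ show n

instance Foo String where
  foo s = "String: " ++ s

-- Works for any type with a Foo instance.
describe :: Foo a => a -> String
describe = foo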
The reason why you can't add Ints and Floats directly is because add has the type (Num a) => a -> a -> a. Here a is a type variable, and just like regular variables it can only be bound once, so as soon as you bind it to Int, every a in the signature must be Int.
After reading enough in a Haskell book to really understand Earwicker's answer I'd suggest you also read about type classes. I'm not sure what “partial specialization” means, but it sounds like they could come close.
