CoercibleStrings toy GHC extension

I'm trying to write a small GHC extension to get some hands-on experience with GHC hacking. As suggested by the GHC GitLab wiki, I started by reading this article by SPJ and SM to get a feel for GHC's architecture.
I now have an idea for what I think would be a relatively small GHC extension, which I would like to try as a first attempt at actually modifying GHC's source code: CoercibleStrings. The outcome would be similar in spirit to that of the existing OverloadedStrings extension. However, I believe the latter "only" infers the type (Data.String.IsString a) => a for a string literal l during type checking and then desugars the literal to fromString l, where fromString :: (Data.String.IsString a) => String -> a is the method of the typeclass Data.String.IsString.
This comes in handy when using libraries that make frequent use of alternatives to Haskell's standard String type, particularly if we frequently wish to pass in arguments represented as string literals, e.g. libraries used for terminal IO. For instance, consider the logging library simple-logger, which makes frequent use of Data.Text.Text, and contains the function logError :: (?callStack :: CallStack) => MonadIO m => Text -> m ().
Rather than writing something like:
import qualified Data.Text as T
import Control.Logger.Simple
...
logError . T.pack $ "Some error occurred."
Using the OverloadedStrings GHC extension, the last line can be replaced with:
logError $ "Some error occurred."
As Data.Text.Text is an instance of the typeclass IsString, this type checks and desugars to something like:
($) logError (fromString "Some error occurred.")
This can make code significantly less cluttered because these conversions are elided. However, in a similar and common case, where operations applied to the string literal force it to be typed as a String, this benefit is lost. For example, consider:
import qualified Data.Text as T
import Control.Logger.Simple
...
logError . T.pack $ "Some error occurred # " ++ (show locus)
where locus is bound in the enclosing context and has a type that is an instance of the typeclass Show.
Now, due to the type of the concatenation operator, (++) :: [a] -> [a] -> [a], the result of evaluating the expression "Some error occurred # " ++ (show locus) must be of type [a], where a is such that [a] is an instance of the typeclass IsString. Since show returns a String, a is Char and the whole expression has type String.
Therefore, in this case, the OverloadedStrings extension does not help us. To remedy this, I propose the CoercibleStrings GHC extension, which would:
1) Allow the type checker to unify the type String with any instance of the typeclass IsString.
2) During desugaring, insert an application of the function fromString where these coercions are necessary.
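To make the gap concrete, here is a minimal sketch that needs only base. Msg and render are hypothetical stand-ins for Data.Text.Text and logError, and since CoercibleStrings does not exist, the second call writes out by hand the fromString application that the proposed extension would insert during desugaring.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.String (IsString (..))

-- Hypothetical stand-in for Data.Text.Text, so the sketch needs only base.
newtype Msg = Msg String deriving (Eq, Show)

instance IsString Msg where
  fromString = Msg

-- Hypothetical stand-in for logError; it just formats the message.
render :: Msg -> String
render (Msg s) = "[error] " ++ s

main :: IO ()
main = do
  let locus = 42 :: Int
  -- A bare literal is typed (IsString a) => a and elaborated to fromString "...".
  putStrLn (render "Some error occurred.")
  -- (++) and show pin both operands to String, so OverloadedStrings alone
  -- does not help and the conversion has to be written explicitly.
  putStrLn (render (fromString ("Some error occurred # " ++ show locus)))
```

Removing the explicit fromString in the last line makes GHC reject the program, because the argument has type String where Msg is expected.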
Although the architecture of such an extension would be quite different from that of the OverloadedStrings extension, the idea seems quite straightforward, while providing, I would argue, a fairly large advantage over it. This brings me to the question of why it has yet to be implemented.
I can think of a few potential problems with such an extension. The first is that, given that String is an instance of the typeclass IsString, a type inference rule of the form
∀a ∈ IsString. Γ ⊢ e :: String => Γ ⊢ e :: a
can be applied to an expression of type String an unbounded number of times. So it may not be possible to modify GHC's type inference algorithm in such a way as to guarantee termination of type checking given the addition of such a rule. I think however that this issue can be prevented by requiring that a is not String and only allowing GHC to apply this rule if no other rules apply.
The second is from point 2) above, which is where to insert the applications of fromString. The simplest answer I guess would be to apply it to every expression of type String and given that the definition of fromString for String is simply id, any unnecessary applications would likely be optimised away. This may of course lengthen compilation times somewhat. I believe however that this could be targeted better given information from the type checker.
The third is that this type of implicit coercion goes against the philosophy of strongly typed languages such as Haskell, or indeed that these implicit coercions may introduce extra run-time costs that will be invisible to the programmer.
The second and third points are not existential issues that would prevent the extension from being written. However, I am unsure whether the solution I am suggesting for the first point actually works, is reasonably implementable within the existing GHC infrastructure, and does not interact badly with other existing extensions.
I'd welcome any comment on these points or any pointers to up-to-date information on GHC's type checking algorithm. Thank you!

Related

Why does Haskell treat a stuck application of a type family as a valid type?

Take this code snippet (I changed the type family to a closed one after the various comments and answers mentioned it):
{-# LANGUAGE TypeFamilies #-}
import Data.Function (fix)
-- a closed type family for which I won't define any instances
type family EmptyTypeFamily t where
-- 1. this type doesn't exist, since there's no `instance EmptyTypeFamily ()`
type NotAType = EmptyTypeFamily ()
-- 2. this value has a type that doesn't exist
untypedValue :: NotAType
untypedValue = undefined
main :: IO ()
main = do
  -- 3. the value with no type can be used in expressions
  return untypedValue
  return . id . fix $ \x -> untypedValue
  -- ERROR, finally! No instance for `Show (EmptyTypeFamily ())`
  print untypedValue
I find it quite counterintuitive that I can name a stuck application of a type family (1) and even define values for it (2) and use them in expressions (3). Of course I can't define any typeclass instances for it, since the type doesn't even exist. But shouldn't I get an error just by naming that non-type, let alone using it? This makes it harder to detect issues in my code. I read the GHC doc, which mentions "stuck" twice, but doesn't answer my question.
What's the reason why Haskell treats EmptyTypeFamily () as a valid type for which you can even define values and whatnot, when it isn't?

As chi remarked: it's not quite accurate to say your type family is empty, rather it merely happens to not contain any yet-known instances. But I think that's missing the point of the question, so let's discuss instead the type family
type family EmptyTypeFamily t where {}
This one really is empty: you can't write any instances for it.
Still, this causes exactly the same behaviour as you observed. Why?
Well, what GHC does with types during compilation is in some ways similar to how we're dealing with values in all Haskell programs: lazy evaluation and pattern matching. It's well known how the program
f :: Int -> Int
f _ = 624
main :: IO ()
main = print $ f undefined
runs without problems, even though it “evaluates” undefined as the argument to f. It's ok because f doesn't actually need to know anything about its argument.
Similarly, if you write return untypedValue, then the type checker doesn't need to know anything about EmptyTypeFamily () – it just treats it as an abstract entity. It can even unify different type variables to EmptyTypeFamily (). This only requires the assumption that, whatever EmptyTypeFamily is under the hood, it must certainly be deterministic. That much is guaranteed, because the compiler would complain if you wrote two conflicting instances for it.
So, as long as you only use untypedValue with unconstrained polymorphic functions, it simply doesn't matter that its type doesn't actually exist, because it never needs to be evaluated any further.
In other words, NotAType is purely symbolic. It's a bit like in maths, you can write a theorem starting with “let S be a set and x ∈ S, then bla bla” without actually pinning down what this set is or a value for x.
That changes only as you require additional constraints, like unifying it with a concrete type
f :: Int -> Int
f x = x + 1
main = print $ f untypedValue
• Couldn't match type ‘EmptyTypeFamily ()’ with ‘Int’
Expected type: Int
Actual type: NotAType
• In the first argument of ‘f’, namely ‘untypedValue’
...or, as you observed, use typeclass methods on it. In each case, the compiler has to start with performing an actual pattern match on the type-level value EmptyTypeFamily (). Well, in fact it's a more general process: what the compiler keeps track of are rather propositions. It has a database of knowledge that such and such types are equal, and such and such types match such and such class, and in this case GHC just determines that it doesn't have information which would allow deciding on any known Show instance.
Note that it is actually possible to have a hypothetical context in which that information is available, however nonsensical that is:
{-# LANGUAGE RankNTypes, FlexibleContexts #-}
hypothetically :: (Show NotAType => r) -> ()
hypothetically _ = ()
main = return $ hypothetically (print untypedValue)
In this case, responsibility for proving Show NotAType is deferred to the implementation of hypothetically. As the compiler tackles print untypedValue it just browses through the context and sees that Show NotAType will be proved by whatever code eventually uses the argument to hypothetically. ...Which of course never happens, but the compiler doesn't worry about that when typechecking main.
This, in the maths analogy, is like writing a proof starting with “let x ∈ ℝ such that x² = -1...” – this is perfectly valid and allows you to prove exciting things. Only, nobody will be able to use the theorem for computing something, because there exists no real number with the required property.

There are two factors that mean GHC needs to be able to contemplate stuck type families without immediately throwing an error and giving up.
The first is, as chi's answer pointed out, type families defined via separate type instance declarations are open. It's theoretically possible for your module to define EmptyTypeFamily, NotAType, and untypedValue and export them all, and for some other module to import your module and add a type instance EmptyTypeFamily () declaration that makes sense of this.
But there are also closed type families, where the set of instances are known statically to any module that has imported the type family at all. It's tempting to say these should at least immediately throw an error if they can't be resolved, and indeed that could conceivably be done in a case like yours where all of the arguments to the type family are fully concrete types. But the second reason GHC needs the concept of stuck type families that aren't an error is that type families are usually used in more complex cases with type variables in play. Consider this:
{-# LANGUAGE TypeFamilies #-}
type family TF t
  where TF Bool = Int
        TF Char = Bool
foo :: t -> TF t -> TF t
foo = flip const
The TF t in the type of foo is a stuck type family application; GHC can't resolve it here. Whether the declared type for foo is an error or not depends on the variable t, which depends on the context in which foo is used, not just its definition.
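A minimal sketch of how the same application gets unstuck at a use site, reusing the TF and foo definitions above; only main is new. Once t is instantiated to a concrete type with a matching equation, GHC can reduce TF t.

```haskell
{-# LANGUAGE TypeFamilies #-}
module Main where

type family TF t where
  TF Bool = Int
  TF Char = Bool

-- TF t is stuck here: nothing is known about t inside the definition.
foo :: t -> TF t -> TF t
foo = flip const

main :: IO ()
main = do
  print (foo True 3)     -- t = Bool, so TF t reduces to Int
  print (foo 'x' False)  -- t = Char, so TF t reduces to Bool
```

A call such as foo () undefined still type checks, because TF () simply stays stuck; an error only appears once something demands more of that type, for example a Show instance.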
So GHC needs this ability to work with things with a type given by a type expression that it cannot actually resolve now. Given that, I'm not overly bothered by it not throwing errors eagerly in cases like the OP (assuming EmptyTypeFamily was rewritten to be a closed family; it definitely shouldn't with the open family). I don't know precisely, but I've always assumed it was either:
It's not possible to make a general procedure for GHC to be able to tell the difference between a stuck application that could potentially be unstuck with more information, and one that definitely can't.
It is possible, but the inconsistency and unpredictability is not considered desirable. Remember that some other GADTs or type families might be able to resolve for any type at all (much like a non-strict function not necessarily needing its argument to be defined), and be applied to a stuck family application; if GHC is distinguishing between "definitely stuck" applications that immediately throw an error and "potentially unstuck-able" applications, the programmer would need to be able to always make the same determination in their head to tell when complex cases are allowable. I'm genuinely unsure whether this could be made pleasant to work with.
It's possible to implement and can be made nicely predictable, but it's a special case in the general handling of type families that no one has added yet. The original feature only had open families, I'm fairly sure, so the original guts of type family handling would be written without any real possibility of "definitely stuck" applications existing to be detected.
If you're using closed type families, one thing you can do is add a catch-all case that resolves to a TypeError. Like so:
{-# LANGUAGE DataKinds, TypeFamilies, UndecidableInstances #-}
-- These are in GHC.TypeError as of base 4.17, but the current
-- default on my system is base 4.16. GHC.TypeLits still exports
-- them for backwards compatibility so this will work, but might
-- eventually be deprecated?
import GHC.TypeLits ( TypeError
                    , ErrorMessage ( Text, ShowType, (:<>:) )
                    )
type family TF t
  where TF t = TypeError (Text "No TF instance for " :<>: ShowType t)
bad :: TF ()
bad = undefined
This helps, but it doesn't catch every possible use of a stuck TF application. It won't catch definitions like bad, because it's still "lazy" about actually evaluating TF (); you told it the type was TF (), and the implementation is undefined which can be any type so it happily unifies with TF () and passes the type check without the compiler ever needing to actually evaluate TF (). But it does catch more cases than a stuck type family application would because it's not stuck: it resolves to a type error. If you have any other binding where it has to infer a type and that depends on the type of bad, it seems to hit the error; even something like boom = bad where you'd think it would similarly be able to unify without any evaluation. And even asking for the type of bad in ghci with :t generates a type error.
At the very least it gives you better error messages than No instance for Show (EmptyTypeFamily ()) because it will resolve the type family to the type error, rather than go looking for an instance that could match a stuck type family and complain about the missing instance.

since there's no instance EmptyTypeFamily ()
There is no such instance, yet.
When compiling a module, we do not (yet) know whether another module actually defines such an instance, hence we must be ready for that to happen. This is called the "open world assumption".
Haskell is meant to allow for separate compilation, i.e. compiling each module individually. Also, Haskell allows each module to add arbitrary instances. Coherently, we can never be sure an instance does not exist during compilation.
To be pedantic, Haskell also allows closed type families, which cannot be extended in such a way, but IIRC it handles stuck closed type families like stuck open ones. Perhaps it should reject those, after all. In closed type families one can however force rejection by adding an explicit TypeError ".." final case in the definition of the closed type family.

What's the right way to use type aliases in Haskell

I'm a total beginner with Haskell. While trying to solve some practice exercises on hackerrank I stumbled over an error, which made me wonder about doing it "the right way"(tm).
What I was trying to do was this:
import Data.Matrix
newtype Puzzle = Matrix Char
complete :: Puzzle -> Bool
complete p = '-' `elem` (toList p)
[... more functions on 'Matrix Char']
and this gave me:
Couldn't match expected type ‘Matrix Char’
with actual type ‘Puzzle’
In the first argument of ‘toList’, namely ‘p’
In the second argument of ‘elem’, namely ‘(toList p)’
The obvious solution is of course just using Matrix Char instead of Puzzle. But I don't feel that this is an elegant solution. An abstraction into a more specific type feels like the right way to go...

Use type, not newtype. The former creates a type alias; the latter declares a new type. Specifically, newtype is a kind of special case of data, where the new type represents a "wrapper" over an existing type (a case that the compiler can optimize).

I think a better solution than the one offered by Jeffrey's answer, at least in the case of more substantial code bases than a toy game, is to keep using newtype but to change the code to this:
import Data.Matrix
newtype Puzzle = Puzzle (Matrix Char)
complete :: Puzzle -> Bool
complete (Puzzle matrix) = '-' `elem` toList matrix
This will allow you to keep using a truly distinct data type as opposed to resorting to a type alias, which doesn't introduce any new types and will allow completely interchangeable use of Puzzle and Matrix Char with no added type safety (nor expressiveness).
Also, Jeffrey is right in that newtype is more similar to data than type — newtype offers some performance optimisations over data but is more restricted and slightly affects the program evaluation semantics. You're better off reading up on all the various ways of defining types and type aliases in Haskell.
In your case, you might as well substitute data for newtype without changing the behavior of your program; the rest of the program should continue to work identically.
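The difference in type safety can be sketched without the matrix package. In this sketch a list of rows stands in for Matrix Char, and Grid, countRows and hasBlank are hypothetical names chosen for illustration.

```haskell
module Main where

-- Alias: Grid and [String] are the same type to the type checker.
type Grid = [String]

-- Wrapper: Puzzle is a distinct type that must be wrapped and unwrapped.
newtype Puzzle = Puzzle [String]

countRows :: Grid -> Int
countRows = length

hasBlank :: Puzzle -> Bool
hasBlank (Puzzle rows) = any ('-' `elem`) rows

main :: IO ()
main = do
  -- A plain [String] is accepted wherever a Grid is expected.
  print (countRows ["ab", "cd"])
  -- A Puzzle has to be constructed explicitly.
  print (hasBlank (Puzzle ["a-", "cd"]))
  -- hasBlank ["a-", "cd"] would be rejected: [String] is not a Puzzle.
```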
See also: Haskell type vs. newtype with respect to type safety

Haskell: Matching two expressions that are not from class Eq

First of all, I want to clarify that I've tried to find a solution to my problem googling but I didn't succeed.
I need a way to compare two expressions. The problem is that these expressions are not comparable. I'm coming from Erlang, where I can do:
case exp1 of
  exp2 -> ...
where exp1 and exp2 are bound. But Haskell doesn't allow me to do this. However, in Haskell I could compare using ==. Unfortunately, their type is not a member of the class Eq. Of course, both expressions are unknown until runtime, so I can't write a static pattern in the source code.
How could I compare these two expressions without having to define my own comparison function? I suppose that pattern matching could be used here in some way (as in Erlang), but I don't know how.
Edit
I think that explaining my goal could help to understand the problem.
I'm modifying an Abstract Syntax Tree (AST). I am going to apply a set of rules that modify this AST, but I want to store the modifications in a list. Each element of this list should be a tuple containing the original piece of the AST and its modification. The last step is then, for each tuple, to search for a piece of the AST that is exactly the same as the first element and substitute it with the second element. So, I will need something like this:
change (old,new):t piece_of_ast =
  case piece_of_ast of
    old -> new
    _ -> piece_of_ast
I hope this explanation clarifies my problem.
Thanks in advance

It's probably an oversight in the library (but maybe I'm missing a subtle reason why Eq is a bad idea!) and I would contact the maintainer to get the needed Eq instances added in.
But for the record and the meantime, here's what you can do if the type you want to compare for equality doesn't have an instance for Eq, but does have one for Data - as is the case in your question.
The Data.Generics.Twins module (from the syb package) offers a generic version of equality, with type
geq :: Data a => a -> a -> Bool
As the documentation states, this is 'Generic equality: an alternative to "deriving Eq" '. It works by comparing the toplevel constructors and if they are the same continues on to the subterms.
Since the Data class inherits from Typeable, you could even write a function like
veryGenericEq :: (Data a, Data b) => a -> b -> Bool
veryGenericEq a b = case cast a of
  Nothing -> False
  Just a' -> geq a' b
but I'm not sure this is a good idea - it certainly is unhaskelly, smashing all types into one big happy universe :-)
If you don't have a Data instance either, but the data type is simple enough that comparing for equality is 100% straightforward then StandaloneDeriving is the way to go, as @ChristianConkle indicates. To do this you need to add a {-# LANGUAGE StandaloneDeriving #-} pragma at the top of your file and add a number of clauses
deriving instance Eq a => Eq (CStatement a)
one for each type CStatement uses that doesn't have an Eq instance, like CAttribute. GHC will complain about each one you need, so you don't have to trawl through the source.
This will create a bunch of so-called 'orphan instances.' Normally, an instance like instance C T where will be defined in either the module that defines C or the module that defines T. Your instances are 'orphans' because they're separated from their 'parents.' Orphan instances can be bad because you might start using a new library which also has those instances defined; which instance should the compiler use? There's a little note on the Haskell wiki about this issue. If you're not publishing a library for others to use, it's fine; it's your problem and you can deal with it. It's also fine for testing; if you can implement Eq, then the library maintainer can probably include deriving Eq in the library itself, solving your problem.
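A minimal sketch of the StandaloneDeriving route, applied to the substitution step from the question. Attr and Stmt are hypothetical stand-ins for the library's AST types, and change is one possible way to write the lookup once Eq is available. Here the types live in the same module, so the instances are not orphans; against a real library they would be.

```haskell
{-# LANGUAGE StandaloneDeriving #-}
module Main where

import Data.Maybe (fromMaybe)

-- Hypothetical stand-ins for library AST types that ship without Eq.
data Attr = Attr String
data Stmt a = Skip a | Tagged Attr (Stmt a)

-- One clause per type that lacks the instance.
deriving instance Eq Attr
deriving instance Eq a => Eq (Stmt a)
deriving instance Show Attr
deriving instance Show a => Show (Stmt a)

-- Replace a piece of the AST by its recorded modification, if there is one.
-- lookup compares with (==), which is why the Eq instances are needed.
change :: Eq a => [(a, a)] -> a -> a
change subs piece = fromMaybe piece (lookup piece subs)

main :: IO ()
main = do
  let old = Tagged (Attr "x") (Skip (1 :: Int))
      new = Skip 2
  print (change [(old, new)] old)       -- a recorded piece is replaced
  print (change [(old, new)] (Skip 7))  -- an unknown piece is left alone
```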

I'm not familiar with Erlang, but Haskell does not assume that all expressions can be compared for equality. Consider, for instance, undefined == undefined or undefined == let x = x in x.
Equality testing with (==) is an operation on values. Values of some types are simply not comparable. Consider, for instance, two values of type IO String: return "s" and getLine. Are they equal? What if you run the program and type "s"?
On the other hand, consider this:
f :: IO Bool
f = do
  x <- return "s" -- note that using return like this is pointless
  y <- getLine
  return (x == y) -- both x and y have type String.
It's not clear what you're looking for. Pattern matching may be the answer, but if you're using a library type that's not an instance of Eq, the likely answer is that comparing two values is actually impossible (or that the library author has for some reason decided to impose that restriction).
What types, specifically, are you trying to compare?
Edit: As a commenter mentioned, you can also compare using Data. I don't know if that is easier for you in this situation, but to me it is unidiomatic; a hack.
You also asked why Haskell can't do "this sort of thing" automatically. There are a few reasons. In part it's historical; in part it's that, with deriving, Eq satisfies most needs.
But there's also an important principle in play: the public interface of a type ideally represents what you can actually do with values of that type. Your specific type is a bad example, because it exposes all its constructors, and it really looks like there should be an Eq instance for it. But there are many other libraries that do not expose the implementation of their types, and instead require you to use provided functions (and class instances). Haskell allows that abstraction; a library author can rely on the fact that a consumer can't muck about with implementation details (at least without using unsafe tools like unsafeCoerce).

Compile time and run time difference between type and newtype

What is the difference, at various stages of the read-compile-run pipeline, between a type declaration and a newtype declaration?
My assumption was that they compiled down to the same machine instructions, and that the only difference was when the program is typechecked, where for example
type Name = String
newtype Name_ = N String
You can use a Name anywhere a String is required, but the typechecker will call you out if you use a Name_ where a String is expected, even though they encode the same information.
I'm asking the question because, if this is the case, I don't see any reason why the following declarations shouldn't be valid:
type List a = Either () (a, List a)
newtype List_ a = L (Either () (a, List_ a))
However, the type checker accepts the second one but rejects the first. Why is that?

Luqui's comment should be an answer. Type synonyms in Haskell are, to a first approximation, nothing more than macros. That is, they are expanded by the type checker into fully evaluated types. The type checker cannot handle infinite types, so Haskell does not have equi-recursive types.
newtypes provide you iso-recursive types that, in GHC, essentially compile down to equi-recursive types in the core language. Haskell is not GHC core, so you don't have access to such types. Equi-recursive types are just a bit harder to work with both for type checkers and humans, while iso-recursive types have equivalent power.
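A small sketch of what the iso-recursive encoding looks like in practice, using the List_ declaration from the question. The helper names nil, cons and toList are assumptions added for illustration. The constructor L marks each point where the recursion is folded, and pattern matching on L unfolds it again.

```haskell
module Main where

-- Accepted: the recursion goes through the constructor L.
newtype List_ a = L (Either () (a, List_ a))

-- The synonym version is rejected as a cyclic type synonym,
-- because expanding it would never terminate:
-- type List a = Either () (a, List a)

nil :: List_ a
nil = L (Left ())

cons :: a -> List_ a -> List_ a
cons x xs = L (Right (x, xs))

-- Each match on L unfolds one layer of the recursive type.
toList :: List_ a -> [a]
toList (L (Left ())) = []
toList (L (Right (x, xs))) = x : toList xs

main :: IO ()
main = print (toList (cons 1 (cons 2 nil)))
```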

How to create a Haskell function that would introduce a new type?

I'm currently writing an expression parser. I've done the lexical and syntactic analysis and now I'm checking the types. I have the expression in a data structure like this (simplified version):
data Expr = EBinaryOp String Expr Expr
          | EInt Int
          | EFloat Float
And now I need a function which would convert this to a new type, say TypedExpr, which would also contain type information. My main problem is what this type should look like. I have two ideas - with a type parameter:
data TypedExpr t = TEBinaryOp (TBinaryOp a b t) (TExpr a) (TExpr b)
| TEConstant t
addTypes :: (ExprType t) => Expr -> TypedExpr t
or without:
data TypedExpr = TEBinaryOp Type BinaryOp TypedExpr TypedExpr
| TEConstant Type Dynamic
addTypes :: Expr -> TypedExpr
I started with the first option, but I ran into problems, because this approach assumes that you know type of the expression before parsing it (for me, it's true in most cases, but not always). However, I like it, because it lets me use Haskell's type system and check for most errors at compile time.
Is it possible to do it with the first option?
Which one would you choose? Why?
What problems should I expect with each option?

The type of your function
addTypes :: Expr -> TypedExpr t
is wrong, because it would mean that you get a TypedExpr t for any t you like. In contrast, what you actually want is one particular t that is determined by the argument of type Expr.
This reasoning already shows that you are going beyond the capabilities of the Hindley-Milner type system. After all, the return type of addTypes should depend on the value of the argument, but in plain Haskell 2010, types may not depend on values. Hence, you need an extension of the type system that brings you closer to dependent types. In Haskell, generalized algebraic data types (GADTs) can do that.
For a first introduction to GADTs, see also my video on GADTs.
However, after becoming familiar with GADTs, you still have the problem of parsing an untyped expression into a typed one, i.e. to write a function
addTypes :: Expr -> (exists t. TypedExpr t)
Of course, you have to perform some type checking yourself, but even then, it is not easy to convince the Haskell compiler that your type checks (which happen on the value level) can be lifted to the type level. Fortunately, other people have already thought about it, see for example the following message in the haskell-cafe mailing list:
Edward Kmett.
Re: Manual Type-Checking to provide Read instances for GADTs.
(was Re: [Haskell-cafe] Read instance for GATD)
http://article.gmane.org/gmane.comp.lang.haskell.cafe/76466
(Does anyone know of a formally published / nicely written up reference?)
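As a rough sketch of the GADT approach, assuming only the Expr type from the question: the existential exists t. TypedExpr t is encoded as a wrapper SomeExpr that also carries a run-time witness Ty t of the hidden type. The names Ty, SomeExpr, TEAdd, eval and describe are hypothetical, and only the operator "+" is handled.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

data Expr = EBinaryOp String Expr Expr
          | EInt Int
          | EFloat Float

-- Run-time witness of an expression's type.
data Ty t where
  TInt   :: Ty Int
  TFloat :: Ty Float

data TypedExpr t where
  TEInt   :: Int -> TypedExpr Int
  TEFloat :: Float -> TypedExpr Float
  TEAdd   :: Num t => TypedExpr t -> TypedExpr t -> TypedExpr t

-- Encodes "exists t. TypedExpr t": the type t is hidden inside the wrapper.
data SomeExpr where
  SomeExpr :: Ty t -> TypedExpr t -> SomeExpr

-- Type checking happens at run time and can fail, hence Maybe.
addTypes :: Expr -> Maybe SomeExpr
addTypes (EInt n)   = Just (SomeExpr TInt (TEInt n))
addTypes (EFloat x) = Just (SomeExpr TFloat (TEFloat x))
addTypes (EBinaryOp "+" l r) =
  case (addTypes l, addTypes r) of
    -- Matching on the witnesses tells GHC that both sides have the same type.
    (Just (SomeExpr TInt a), Just (SomeExpr TInt b)) ->
      Just (SomeExpr TInt (TEAdd a b))
    (Just (SomeExpr TFloat a), Just (SomeExpr TFloat b)) ->
      Just (SomeExpr TFloat (TEAdd a b))
    _ -> Nothing
addTypes (EBinaryOp _ _ _) = Nothing

-- The evaluator needs no run-time tags: the index t fixes the result type.
eval :: TypedExpr t -> t
eval (TEInt n)   = n
eval (TEFloat x) = x
eval (TEAdd a b) = eval a + eval b

describe :: SomeExpr -> String
describe (SomeExpr TInt e)   = "Int: " ++ show (eval e)
describe (SomeExpr TFloat e) = "Float: " ++ show (eval e)

main :: IO ()
main = do
  putStrLn (maybe "type error" describe (addTypes (EBinaryOp "+" (EInt 1) (EInt 2))))
  putStrLn (maybe "type error" describe (addTypes (EBinaryOp "+" (EInt 1) (EFloat 2))))
```

The first call reports an Int result and the second reports a type error, since the sketch does not insert any numeric conversions.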

I have recently started using tagless-final syntax for embedded DSLs, and I've found it to be much nicer than the standard GADT method (which you're heading towards, and which Apfelmus describes).
The key to tagless-final syntax is that instead of using an expression data type, you represent operations with a type class. For functions like your eBinaryOp, I've found it best to use two classes:
class Repr repr where
  eInt :: repr Int
  eFloat :: repr Float
class Repr repr => BinaryOp repr a b c where
  eBinaryOp :: String -> repr a -> repr b -> repr c
I would make separate BinaryOp functions rather than use a String though.
There's a lot more information on Oleg's web page, including a parser that uses Haskell's type system.
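A minimal sketch of the tagless-final style with two interpreters. It deviates from the classes above as an assumption for the sake of a runnable example: the literal method takes the Int it embeds, and a dedicated add method replaces the String-tagged binary operator. Eval and Pretty are hypothetical interpreter names.

```haskell
module Main where

-- Operations of the DSL, abstracted over the representation repr.
class Repr repr where
  int :: Int -> repr Int
  add :: repr Int -> repr Int -> repr Int

-- Interpreter 1: evaluate to a Haskell value.
newtype Eval a = Eval { runEval :: a }

instance Repr Eval where
  int = Eval
  add (Eval a) (Eval b) = Eval (a + b)

-- Interpreter 2: pretty-print; the type index a is a phantom here.
newtype Pretty a = Pretty { runPretty :: String }

instance Repr Pretty where
  int n = Pretty (show n)
  add (Pretty a) (Pretty b) = Pretty ("(" ++ a ++ " + " ++ b ++ ")")

-- One term, usable with any interpreter.
example :: Repr repr => repr Int
example = add (int 1) (add (int 2) (int 3))

main :: IO ()
main = do
  print (runEval example)
  putStrLn (runPretty example)
```

Ill-typed terms cannot be written at all in this style, because Haskell's own type checker rejects them; the remaining work for a parser is to produce such a term from untyped input.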

Since you're doing the parsing at runtime, not compile time, you can't piggy back off of Haskell's type system (unless you import the relevant modules and manually call it yourself.)
You may want to turn to TAPL’s ML examples of type checkers for a simple lambda calculus for inspiration. http://www.cis.upenn.edu/~bcpierce/tapl/ (under implementations). They do a bit more than your expression parser, since you don’t support lambdas.
