Proving "no corruption" in Haskell - haskell

I work in a safety-critical industry, and our software projects generally have safety requirements imposed: properties that we have to demonstrate, to a high degree of certainty, that the software possesses. Often these are negatives, such as "… shall not corrupt more frequently than 1 in …". (I should add that these requirements are derived from statistical system safety requirements.)
One source of corruption is clearly coding errors, and I would like to use the Haskell type system to exclude at least some classes of these errors. Something like this:
First, here is our critical data item that must not be corrupted.
newtype Critical = Critical String
Now I want to store this item in some other structures.
data Foo = Foo Integer Critical
data Bar = Bar String Critical
Now I want to write a conversion function from Foo to Bar which is guaranteed not to mess with the Critical data.
goodConvert, badConvert :: Foo -> Bar
goodConvert (Foo n c) = Bar (show n) c
badConvert (Foo n (Critical s)) = Bar (show n) (Critical $ "Bzzt - " ++ s)
I want "goodConvert" to type check, but "badConvert" to fail type checking.
Obviously I can carefully not import the Critical constructor into the module that does conversion. But it would be much better if I could express this property in the type, because then I can compose up functions that are guaranteed to preserve this property.
I've tried adding phantom types and "forall" in various places, but that doesn't help.
One thing that would work would be to not export the Critical constructor, and then have
mkCritical :: String -> IO Critical
Since the only place that these Critical data items get created is in the input functions, this makes some sense. But I'd prefer a more elegant and general solution.
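For illustration, a minimal sketch of what that module might look like (the accessor getCritical is an assumed name, not from the original):
-- A sketch of the smart-constructor approach: the constructor stays
-- private, so new Critical values can only be made via mkCritical.
module Critical
    ( Critical     -- the type is exported, its constructor is not
    , mkCritical
    , getCritical  -- assumed accessor, for illustration only
    ) where

newtype Critical = Critical String

-- Creation is confined to IO, so only input code can mint new values.
mkCritical :: String -> IO Critical
mkCritical = return . Critical

-- Reading the payload is harmless; it cannot corrupt the value.
getCritical :: Critical -> String
getCritical (Critical s) = s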
Edit
In the comments FUZxxl suggested a look at Safe Haskell. This looks like the best solution. Rather than adding a "no corruption" modifier at the type level as I originally wanted, it looks like you can do it at the module level, like this:
1: Create a module "Critical" that exports all the features of the Critical data type, including its constructor. Mark this module as "unsafe" by putting "{-# LANGUAGE Unsafe #-}" in the header.
2: Create a module "SafeCritical" that re-exports everything except the constructor and any other functions that might be used to corrupt a critical value. Mark this module as "trustworthy".
3: Mark any modules that are required to handle Critical values without corruption as "safe". Then use this to demonstrate that any function imported as "safe" cannot cause corruption to a Critical value.
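For concreteness, a minimal sketch of that three-module layout (the module name Convert is an assumed example):
-- Critical.hs: full access, marked unsafe.
{-# LANGUAGE Unsafe #-}
module Critical (Critical(..)) where

newtype Critical = Critical String

-- SafeCritical.hs: re-exports the type but not the constructor.
{-# LANGUAGE Trustworthy #-}
module SafeCritical (Critical) where

import Critical (Critical)

-- Convert.hs: code that must be demonstrably corruption-free.
{-# LANGUAGE Safe #-}
module Convert where

import SafeCritical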
This will leave a small minority of code, such as the input code that parses Critical values, requiring further verification. We can't eliminate this code, but reducing the amount that needs detailed verification is still a significant win.
The method is based on the fact that a function cannot invent a new Critical value unless some function it can call returns one. If a function receives only one Critical value (as in the "convert" function above), then that is the only one it can return.
A harder variation of the problem comes when a function has two or more Critical values of the same type; it has to guarantee not to mix them up. For instance,
swapFooBar :: (Foo, Bar) -> (Bar, Foo)
swapFooBar (Foo n c1, Bar s c2) = (Bar s c1, Foo n c2)
However this can be handled by giving the same treatment to the containing data structures.

You can use parametricity to get partway there
data Foo c = Foo Integer c
data Bar c = Bar String c
goodConvert :: Foo c -> Bar c
goodConvert (Foo n c) = Bar (show n) c
Since c is an unconstrained type variable, you know that the function goodConvert cannot know anything about c, and therefore cannot construct a different value of that type. It has to use the one provided in the input.
Well, almost. Bottom values allow you to break this guarantee. However, you at least know that if you try to use a "corrupted" value, it will result in an exception (or non-termination).
badConvert :: Foo c -> Bar c
badConvert (Foo n c) = Bar (show n) undefined
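To recover the original monomorphic types from the question, you can fix the parameter at Critical; the conversion functions stay polymorphic (a small usage sketch, with assumed type-alias names):
-- Fixing the type parameter recovers the original types, while the
-- conversion stays parametric (and hence cannot touch the payload).
type FooC = Foo Critical
type BarC = Bar Critical

convertCritical :: FooC -> BarC
convertCritical = goodConvert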

While hammar's solution is excellent and I would normally suggest smart constructors / not exporting the constructor, today I decided to try solving this in the Coq proof assistant and extracting to Haskell.
Take note! I am not very well versed in Coq / extraction. Some people have done good work with proving and extracting Haskell code, so look to them for quality examples - I'm just toying!
First we want to define your data types. In Coq this looks much like Haskell GADTs:
Require Import String.
Require Import ZArith.
Inductive Critical :=
  Crit : string -> Critical.

Inductive FooT :=
  Foo : Z -> Critical -> FooT.

Inductive BarT :=
  Bar : string -> Critical -> BarT.
Think of those Inductive lines, such as Inductive FooT := Foo : ... ., as data type declarations: data FooT = Foo Integer Critical
For ease of use, let's get some field accessors:
Definition critF f := match f with Foo _ c => c end.
Definition critB b := match b with Bar _ c => c end.
Since Coq doesn't define many "show" style functions, I'll use a placeholder for showing integers.
Definition ascii_of_Z (z : Z) : string := EmptyString. (* FIXME *)
Now that we've got the basics, let's define the goodConvert function!
Definition goodConvert (foo : FooT) : BarT :=
  match foo with
    Foo n c => Bar (ascii_of_Z n) c
  end.
That's all fairly obvious - it's your convert function, but in Coq, using a case-like statement instead of top-level pattern matching. But how do we know this function is actually going to maintain the invariant? We prove it!
Lemma convertIsGood : forall (f : FooT) (b : BarT),
    goodConvert f = b -> critF f = critB b.
Proof.
  intros.
  destruct f. destruct b.
  unfold goodConvert in H. simpl.
  inversion H. reflexivity.
Qed.
That says that if converting f results in b then the critical field of f must be the same as the critical field of b (assuming some minor things, such as you not messing up the field accessor implementations).
Now let's extract this to Haskell!
Extraction Language Haskell.
Extract Constant ascii_of_Z => "Prelude.show". (* obviously, all sorts of unsafe and incorrect behavior can be introduced by your extraction *)
Extract Inductive string => "Prelude.String" ["[]" ":"].
Extract Inductive positive => "Prelude.Integer" ["`Data.Bits.shiftL` 1 + 1" "`Data.Bits.shiftL` 1" "1"].
Extract Inductive Z => "Prelude.Integer" ["0" "" ""].
Extraction "so.hs" goodConvert critF critB.
Producing:
module So where

import qualified Prelude

data Bool =
   True
 | False

data Ascii0 =
   Ascii Bool Bool Bool Bool Bool Bool Bool Bool

type Critical =
  Prelude.String
  -- singleton inductive, whose constructor was crit

data FooT =
   Foo Prelude.Integer Critical

data BarT =
   Bar Prelude.String Critical

critF :: FooT -> Critical
critF f =
  case f of {
   Foo z c -> c}

critB :: BarT -> Critical
critB b =
  case b of {
   Bar s c -> c}

ascii_of_Z :: Prelude.Integer -> Prelude.String
ascii_of_Z z =
  []

goodConvert :: FooT -> BarT
goodConvert foo =
  case foo of {
   Foo n c -> Bar (ascii_of_Z n) c}
Can we run it?? Does it work?
> critB $ goodConvert (Foo 32 "hi")
"hi"
Great! If anyone has suggestions for me, even though this is an "answer", I'm all ears. I'm not sure how to drop the dead code such as Ascii0 or Bool, not to mention how to make good Show instances. If anyone's curious, I think the field accessors could be generated automatically if I used a Record instead of an Inductive, but that might make this post syntactically uglier.

I think the solution of hiding constructors is idiomatic. You can export two functions:
mkCritical :: String -> D Critical
extract :: Critical -> String
where D is the trivial monad (or any other). Any function that creates Critical values is thereby marked with D in its type. A function without that D can extract data from Critical values, but not create new ones.
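For instance, D could simply be the identity monad in disguise; a minimal sketch (only the two exported signatures come from the answer, the implementation details are assumptions):
-- Inside the defining module, the Critical constructor is in scope.
newtype D a = D a

instance Functor D where
    fmap f (D a) = D (f a)

instance Applicative D where
    pure = D
    D f <*> D a = D (f a)

instance Monad D where
    D a >>= f = f a

mkCritical :: String -> D Critical
mkCritical = D . Critical

extract :: Critical -> String
extract (Critical s) = s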
Alternatively:
data C a = C a Critical
modify :: (a -> String -> b) -> C a -> C b
modify f (C x (Critical y)) = C (f x y) (Critical y)
If you don't export constructor C, only modify, you can write:
goodConvert :: C Int -> C String
goodConvert = modify (\a _ -> show a)
but badConvert is impossible to write.

Related

Parsing to Free Monads

Say I have the following free monad:
data ExampleF a
  = Foo Int a
  | Bar String (Int -> a)
  deriving Functor

type Example = Free ExampleF -- this is the free monad I want to discuss
I know how I can work with this monad, eg. I could write some nice helpers:
foo :: Int -> Example ()
foo i = liftF $ Foo i ()
bar :: String -> Example Int
bar s = liftF $ Bar s id
So I can write programs in haskell like:
fooThenBar :: Example Int
fooThenBar = do
  foo 10
  bar "nice"
I know how to print it, interpret it, etc. But what about parsing it?
Would it be possible to write a parser that could parse arbitrary
programs like:
foo 12
bar nice
foo 11
foo 42
So I can store them, serialize them, use them in cli programs etc.
The problem I keep running into is that the type of the program depends on which program is being parsed. If the program ends with a foo it's of type Example (); if it ends with a bar it's of type Example Int.
I do not feel like writing parsers for every possible permutation (it's simple here because there are only two possibilities, but imagine we add
Baz Int (String -> a), Doo (Int -> a), Moz Int a, Foz String a, .... This gets tedious and error-prone).
Perhaps I'm solving the wrong problem?
Boilerplate
To run the above examples, you need to add this to the beginning of the file:
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad.Free
import Text.ParserCombinators.Parsec
Note: I put up a gist containing this code.
Not every Example value can be represented on the page without reimplementing some portion of Haskell. For example, return putStrLn has a type of Example (String -> IO ()), but I don't think it makes sense to attempt to parse that sort of Example value out of a file.
So let's restrict ourselves to parsing the examples you've given, which consist only of calls to foo and bar sequenced with >> (that is, no variable bindings and no arbitrary computations)*. The Backus-Naur form for our grammar looks approximately like this:
<program> ::= "" | <expr> "\n" <program>
<expr> ::= "foo " <integer> | "bar " <string>
It's straightforward enough to parse our two types of expression...
type Parser = Parsec String ()
int :: Parser Int
int = fmap read (many1 digit)
parseFoo :: Parser (Example ())
parseFoo = string "foo " *> fmap foo int
parseBar :: Parser (Example Int)
parseBar = string "bar " *> fmap bar (many1 alphaNum)
... but how can we give a type to the composition of these two parsers?
parseExpr :: Parser (Example ???)
parseExpr = parseFoo <|> parseBar
parseFoo and parseBar have different types, so we can't compose them with <|> :: Alternative f => f a -> f a -> f a. Moreover, there's no way to know ahead of time which type the program we're given will be: as you point out, the type of the parsed program depends on the value of the input string. "Types depending on values" is what dependent types are about; Haskell doesn't feature a proper dependent type system, but it comes close enough for us to have a stab at making this example work.
Let's start by forcing the expressions on either side of <|> to have the same type. This involves erasing Example's type parameter using existential quantification.†
data Ex a = forall i. Wrap (a i)   -- needs ExistentialQuantification
parseExpr :: Parser (Ex Example)
parseExpr = fmap Wrap parseFoo <|> fmap Wrap parseBar
This typechecks, but the parser now returns an Example containing a value of an unknown type. A value of unknown type is of course useless - but we do know something about Example's parameter: it must be either () or Int because those are the return types of parseFoo and parseBar. Programming is about getting knowledge out of your brain and onto the page, so we're going to wrap up the Example value with a bit of GADT evidence which, when unwrapped, will tell you whether a was Int or ().
-- (This needs GADTs, TypeOperators and PatternSynonyms, beyond the boilerplate above.)
data Ty a where
    IntTy :: Ty Int
    UnitTy :: Ty ()

data (a :*: b) i = a i :&: b i
type Sig a b = Ex (a :*: b)

pattern Sig x y = Wrap (x :&: y)

parseExpr :: Parser (Sig Ty Example)
parseExpr = fmap (\x -> Sig UnitTy x) parseFoo <|>
            fmap (\x -> Sig IntTy x) parseBar
Ty is (something like) a runtime "singleton" representative of Example's type parameter. When you pattern match on IntTy, you learn that a ~ Int; when you pattern match on UnitTy you learn that a ~ (). (Information can be made to flow the other way, from types to values, using classes.) :*:, the functor product, pairs up two type constructors ensuring that their parameters are equal; thus, pattern matching on the Ty tells you about its accompanying Example.
Sig is therefore called a dependent pair or sigma type - the type of the second component of the pair depends on the value of the first. This is a common technique: when you erase a type parameter by existential quantification, it usually pays to make it recoverable by bundling up a runtime representative of that parameter.
Note that this use of Sig is equivalent to Either (Example Int) (Example ()) - a sigma type is a sum, after all - but this version scales better when you're summing over a large (or possibly infinite) set.
Now it's easy to build our expression parser into a program parser. We just have to repeatedly apply the expression parser, and then manipulate the dependent pairs in the list.
parseProgram :: Parser (Sig Ty Example)
parseProgram = fmap (foldr1 combine) $ parseExpr `sepBy1` (char '\n')
    where combine (Sig _ val) (Sig ty acc) = Sig ty (val >> acc)
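To consume the result, pattern matching on the Ty evidence brings the type information back into scope; a small sketch (describeProgram is an assumed name):
describeProgram :: Sig Ty Example -> String
describeProgram (Sig IntTy prog)  = "ends in bar"  -- here prog :: Example Int
describeProgram (Sig UnitTy prog) = "ends in foo"  -- here prog :: Example ()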
The code I've shown you is not exemplary. It doesn't separate the concerns of parsing and typechecking. In production code I would modularise this design by first parsing the data into an untyped syntax tree - a separate data type which doesn't enforce the typing invariant - then transform that into a typed version by type-checking it. The dependent pair technique would still be necessary to give a type to the output of the type-checker, but it wouldn't be tangled up in the parser.
*If binding is not a requirement, have you thought about using a free applicative to represent your data?
†Ex and :*: are reusable bits of machinery which I lifted from the Hasochism paper
So, I worry that this is the same sort of premature abstraction that you see in object-oriented languages, getting in the way of things. For example, I am not 100% sure that you are using the structure of the free monad: your helpers simply seem to use id and () in a rather boring way, and in fact I'm not sure whether your Int -> x is ever anything other than either Pure :: Int -> Free ExampleF Int or const (something :: Free ExampleF Int).
The free monad for a functor F can basically be described as a tree whose data is stored in leaves and whose branching factor is controlled by the recursion in each constructor of the functor F. So for example Free Identity has no branching, hence only one leaf, and thus has the same structure as the monad:
data MonoidalFree m x = MF m x deriving (Functor)

instance Monoid m => Monad (MonoidalFree m) where
    return x = MF mempty x
    MF m x >>= my_x = case my_x x of MF n y -> MF (mappend m n) y
In fact Free Identity is isomorphic to MonoidalFree (Sum Integer), the difference is just that instead of MF (Sum 3) "Hello" you see Free . Identity . Free . Identity . Free . Identity $ Pure "Hello" as the means of tracking this integer. On the other hand if you have data E x = L x | R x deriving (Functor) then you get a sort of "path" of Ls and Rs before you hit this one leaf, Free E is going to be isomorphic to MonoidalFree [Bool].
The reason I'm going through this is that when you combine Free with an Integer -> x functor, you get an infinitely branching tree, and when I'm looking through your code to figure out how you're actually using this tree, all I see is that you use the id function with it. As far as I can tell, that restricts the recursion to either have the form Free (Bar "string" Pure) or else Free (Bar "string" (const subExpression)), in which case the system would seem to reduce completely to the MonoidalFree [Either Int String] monad.
(At this point I should pause to ask: Is that correct as far as you know? Was this what was intended?)
Anyway. Aside from my problems with your premature abstraction, the specific problem that you're citing with your monad (you can't tell the difference between () and Int) has a bunch of really complicated solutions, but one really easy one: yield a value of type Example (Either () Int); if you have a () you can fmap Left onto it, and if you have an Int you can fmap Right onto it.
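Using the parsers from the previous answer, that easy solution might look like this (a sketch; parseExprE is an assumed name):
-- The outer fmap maps over the Parser, the inner one over the Example.
parseExprE :: Parser (Example (Either () Int))
parseExprE = fmap (fmap Left) parseFoo <|> fmap (fmap Right) parseBar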
Without a much better understanding of how you're using this thing over TCP/IP we can't recommend a better structure for you than the generic free monads that you seem to be finding -- in particular we'd need to know how you're planning on using the infinite-branching of Int -> x options in practice.

What are the benefits of replacing a Haskell record with a function

I was reading this interesting article about continuations and I discovered this clever trick. Where I would naturally have used a record, the author uses instead a function with a sum type as the first argument.
So for example, instead of doing this
data Processor = Processor { processString :: String -> IO ()
                           , processInt :: Int -> IO ()
                           }

processor = Processor (\s -> print $ "Hello " ++ s)
                      (\x -> print $ "value" ++ (show x))
We can do this:
data Arg = ArgString String | ArgInt Int

processor :: Arg -> IO ()
processor (ArgString s) = print $ "Hello " ++ s
processor (ArgInt x)    = print $ "value" ++ show x
Apart from being clever, what are the benefits of it over a simple record?
Is it a common pattern, and does it have a name?
Well, it's just a simple isomorphism. In ADT algebra, writing the function type a -> b as the exponential b^a:
(IO ())^String × (IO ())^Int ≅ (IO ())^(String + Int)
The obvious benefit of the RHS is perhaps that it only contains IO () once – DRY FTW.
This is a very loose example but you can see the Arg method as being an initial encoding and the Processor method as being a final encoding. They are, as others have noted, of equal power when viewed in many lights; however, there are some differences.
1. Initial encodings enable us to examine the "commands" being executed. In some sense, it means we've sliced the operation so that the input and the output are separated. This lets us choose many different outputs given the same input.
2. Final encodings enable us to abstract over implementations more easily. For instance, if we have two values of type Processor then we can treat them identically even if the two have different effects or achieve their effects by different means. This kind of abstraction is popularized in OO languages.
3. Initial encodings enable (in some sense) an easier time adding new functions since we just have to add a new branch to the Arg type. If we had many different ways of building Processors then we'd have to update each of these mechanisms.
Honestly, what I've described above is rather stretched. It is the case that Arg and Processor fit these patterns somewhat, but they do not do so in such a significant way as to really benefit from the distinction. It may be worth studying more examples if you're interested—a good search term is the "expression problem" which emphasizes the distinction in points (2) and (3) above.
To expand a bit on leftroundabout's response, there is a way of writing function types as Output^Input, because of cardinality (how many things there are). For example, if you consider all of the mappings from the set {0, 1, 2} of cardinality 3 to the set {0, 1} of cardinality 2, you see that 0 can map to 0 or 1, independently of whether 1 maps to 0 or 1, independently of whether 2 maps to 0 or 1. Counting the total number of functions we get 2 * 2 * 2, or 2^3.
In the same notation, sum types are written with + and product types with *, and there is a cute way to phrase the above isomorphism as Out^(In1 + In2) = Out^In1 * Out^In2; we could write the isomorphism as:
combiner :: (a -> z, b -> z) -> Either a b -> z
combiner (za, zb) e_ab = case e_ab of Left a -> za a; Right b -> zb b
splitter :: (Either a b -> z) -> (a -> z, b -> z)
splitter z_eab = (\a -> z_eab $ Left a, \b -> z_eab $ Right b)
and we can reify it in your code with:
type Processor = Either String Int -> IO ()
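Under that alias, the earlier processor can be rebuilt with combiner (a usage sketch):
-- combiner turns the pair of handlers into the sum-type-function form.
processor :: Processor
processor = combiner ( \s -> print $ "Hello " ++ s
                     , \x -> print $ "value" ++ show x )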
So what's the difference? There aren't many:
1. The combined form requires both things to have the exact same tail-end. You can't apply combiner to something of type a -> b -> z since that parses as a -> (b -> z) and b -> z is not unifiable with z. If you wanted to unify a -> b -> z with c -> z then you have to first uncurry the function to (a, b) -> z, which looks like a bit of work -- it's just not an issue when you use the record version.
2. The split form is also a little more concise for application; you just write fst split a instead of combined $ Left a. But this also means that you can't quite do something like yz . combined (whose equivalent is (yz . fst split, yz . snd split)) so easily. When you've actually got the Processor record defined it might be worth it to extend its kind to * -> * and make it a Functor (see the sketch after this list).
3. The record can in general participate in type classes more easily than the sum-type-function.
4. Sum types will look more imperative, so they'll probably be clearer to read. For example, if I hand you the pattern withProcState p () [Read path1, Apply (map toUpper), Write path2] it's pretty easy to see that this feeds the processor with commands to uppercase path1 into path2. The equivalent of defining processors would look like procWrite p path2 $ procApply p (map toUpper) $ procRead p path1 () which is still pretty clear but not quite as awesome as the previous case.
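As for the suggestion in point 2 of making the record a Functor, a sketch of that extension (generalising the result type):
-- Generalising the record over its result type, so post-processing
-- (the yz above) becomes a single fmap over the whole record.
data Processor z = Processor { processString :: String -> z
                             , processInt    :: Int -> z
                             }

instance Functor Processor where
    fmap f (Processor g h) = Processor (f . g) (f . h)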

Why is context not considered when selecting a typeclass instance in Haskell?

I understand that when having
instance (Foo a) => Bar a
instance (Xyy a) => Bar a
GHC doesn't consider the contexts, and the instances are reported as duplicate.
What is counterintuitive is that (I guess) after selecting an instance, it still needs to check whether the context matches, and if not, discard the instance. So why not reverse the order: discard instances with non-matching contexts, and proceed with the remaining set?
Would this be intractable in some way? I see how it could cause more constraint resolution work upfront, but just as there is UndecidableInstances / IncoherentInstances, couldn't there be a ConsiderInstanceContexts when "I know what I am doing"?
This breaks the open-world assumption. Assume:
class B1 a
class B2 a
class T a
If we allow constraints to disambiguate instances, we may write
instance B1 a => T a
instance B2 a => T a
And may write
instance B1 Int
Now, if I have
f :: T a => a
Then f :: Int works. But, the open world assumption says that, once something works, adding more instances cannot break it. Our new system doesn't obey:
instance B2 Int
will make f :: Int ambiguous. Which implementation of T should be used?
Another way to state this is that you've broken coherence. For typeclasses to be coherent means that there is only one way to satisfy a given constraint. In normal Haskell, a constraint c has only one implementation. Even with overlapping instances, coherence generally holds true. The idea is that instance T a and instance {-# OVERLAPPING #-} T Int do not break coherence, because GHC can't be tricked into using the former instance in a place where the latter would do. (You can trick it with orphans, but you shouldn't.) Coherence, at least to me, seems somewhat desirable. Typeclass usage is "hidden", in some sense, and it makes sense to enforce that it be unambiguous. You can also break coherence with IncoherentInstances and/or unsafeCoerce, but, y'know.
In a category theoretic way, the category Constraint is thin: there is at most one instance/arrow from one Constraint to another. We first construct two arrows a : () => B1 Int and b : () => B2 Int, and then we break thinness by adding new arrows x_Int : B1 Int => T Int, y_Int : B2 Int => T Int such that x_Int . a and y_Int . b are both arrows () => T Int that are not identical. Diamond problem, anyone?
This does not answer your question as to why this is the case. Note, however, that you can always define a newtype wrapper to disambiguate between the two instances:
newtype FooWrapper a = FooWrapper a
newtype XyyWrapper a = XyyWrapper a
instance (Foo a) => Bar (FooWrapper a)
instance (Xyy a) => Bar (XyyWrapper a)
This has the added advantage that by passing around either a FooWrapper or a XyyWrapper you explicitly control which of the two instances you'd like to use if your a happens to satisfy both.
Classes are a bit weird. The original idea (which still pretty much works) is a sort of syntactic sugar around what would otherwise be data statements. For example you can imagine:
data Num a = Num {plus :: a -> a -> a, ... , fromInt :: Integer -> a}
numInteger :: Num Integer
numInteger = Num (+) ... id
then you can write functions which have e.g. type:
test :: Num x -> x -> x -> x -> x
test lib a b c = a + b * (abs (c + b))
  where (+) = plus lib
        (*) = times lib
        abs = absoluteValue lib
So the idea is "we're going to automatically derive all of this library code." The question is, how do we find the library that we want? It's easy if we have a library of type Num Integer, but how do we extend it to "constrained instances" based on functions of type:
fooLib :: Foo x -> Bar x
xyyLib :: Xyy x -> Bar x
The present solution in Haskell is to do a type-pattern-match on the output-types of those functions and propagate the inputs to the resulting declaration. But when there's two outputs of the same type, we would need a combinator which merges these into:
eitherLib :: Either (Foo x) (Xyy x) -> Bar x
and basically the problem is that there is no good constraint-combinator of this kind right now. That's your objection.
Well, that's true, but there are ways to achieve something morally similar in practice. Suppose we define some functions with types:
data F
data X
foobar'lib :: Foo x -> Bar' x F
xyybar'lib :: Xyy x -> Bar' x X
bar'barlib :: Bar' x y -> Bar x
Clearly the y is a sort of "phantom type" threaded through all of this, but it remains powerful because given that we want a Bar x we will propagate the need for a Bar' x y, and given the need for the Bar' x y we will generate either a Bar' x F or a Bar' x X. So with phantom types and multi-parameter type classes, we get the result we want.
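Here is a dictionary-style sketch of that plumbing, in the same record-as-class spirit as above (the method names and bodies are assumed placeholders, only the three signatures come from the answer):
data F
data X

newtype Foo x    = Foo  { fooOp :: x -> x }    -- placeholder dictionary
newtype Xyy x    = Xyy  { xyyOp :: x -> x }    -- placeholder dictionary
newtype Bar' x y = Bar' { bar'Op :: x -> x }   -- y is the phantom tag
newtype Bar x    = Bar  { barOp :: x -> x }

foobar'lib :: Foo x -> Bar' x F
foobar'lib lib = Bar' (fooOp lib)

xyybar'lib :: Xyy x -> Bar' x X
xyybar'lib lib = Bar' (xyyOp lib)

bar'barlib :: Bar' x y -> Bar x
bar'barlib lib = Bar (bar'Op lib)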
More info: https://www.haskell.org/haskellwiki/GHC/AdvancedOverlap
Adding backtracking would make instance resolution require exponential time, in the worst case.
Essentially, instances become logical statements of the form
P(x) => R(f(x)) /\ Q(x) => R(f(x))
which is equivalent to
(P(x) \/ Q(x)) => R(f(x))
Computationally, the cost of this check is (in the worst case)
c_R(n) = c_P(n-1) + c_Q(n-1)
assuming P and Q have similar costs
c_R(n) = 2 * c_PQ(n-1)
which leads to exponential growth.
To avoid this issue, it is important to have fast ways to choose a branch, i.e. to have clauses of the form
((fastP(x) /\ P(x)) \/ (fastQ(x) /\ Q(x))) => R(f(x))
where fastP and fastQ are computable in constant time, and are incompatible so that at most one branch needs to be visited.
Haskell decided that this "fast check" is head compatibility (hence disregarding contexts). It could use other fast checks, of course -- it's a design decision.

Match Data constructor functions

I'm trying to match data constructors in a generic way, so that any Task of a certain type will be executed.
data Task = TaskTypeA Int | TaskTypeB (Float,Float)

genericTasks :: StateLikeMonad s
genericTasks = do
  want (TaskTypeA 5)
  TaskTypeA #> \input -> do
    want (TaskTypeB (1.2,4.3))
    runTaskTypeA input
  TaskTypeB #> \(x,y) -> runTaskTypeB x y

main = runTask genericTasks
Here, the genericTasks function goes through the do-instructions, building a list of things to do (from want, handled by some sort of state monad) and a list of ways to do them (via the (#>) function). The runTask function will run genericTasks, take the resulting to-do and how-to-do lists, and do the computations.
However, I'm having quite some trouble figuring out how to extract the "type" (TaskTypeA, TaskTypeB) from (#>), such that one can call it later. If you do a :t TaskTypeA, you get Int -> Task.
I.e., How to write (#>)?
I'm also not entirely confident that it's possible to do what I'm thinking here in such a generic way. For reference, I'm trying to build something similar to the Shake library, where (#>) is similar to (*>). However Shake uses a String as the argument to (*>), so the matching is done entirely using String matching. I'd like to do it without requiring strings.
Your intuition is correct, it's not possible to write (#>) as you have specified. The only time a data constructor acts as a pattern is when it is in pattern position, namely, appearing as a parameter to a function
f (TaskTypeA z) = ...
as one of the alternatives of a case statement
case tt of
  TaskTypeA z -> ...
or in a monadic or pattern binding
do TaskTypeA z <- Just tt
   return z
When used in value position (e.g. as an argument to a function), it loses its patterny nature and becomes a regular function. That means, unfortunately, that you cannot abstract over patterns this easily.
There is, however, a simple formalization of patterns:
type Pattern d a = d -> Maybe a
It's a little bit of work to make them.
taskTypeA :: Pattern Task Int
taskTypeA (TaskTypeA z) = Just z
taskTypeA _ = Nothing
If you also need to use the constructor "forwards" (i.e. a -> d), then you could pair the two together (plus some functions to work with it):
data Constructor d a = Constructor (a -> d) (d -> Maybe a)

apply :: Constructor d a -> a -> d
apply (Constructor f _) = f

match :: Constructor d a -> d -> Maybe a
match (Constructor _ m) = m

taskTypeA :: Constructor Task Int
taskTypeA = Constructor TaskTypeA $ \case   -- needs LambdaCase
    TaskTypeA z -> Just z
    _           -> Nothing
This is known as a "prism", and (a very general form of) it is implemented in lens.
There are advantages to using an abstraction like this -- namely, that you can construct prisms which may have more structure than data types are allowed to (e.g. d can be a function type), and you can write functions that operate on constructors, composing simpler ones to make more complex ones generically.
If you are using plain data types, though, it is a pain to have to implement the Constructor objects for each constructor like I did for TaskTypeA above. If you have a lot of these to work with, you can use Template Haskell to write your boilerplate for you. The necessary Template Haskell routine is already implemented in lens -- it may be worth it to learn how to use the lens library because of that. (But it can be a bit daunting to navigate)
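For example, lens's Template Haskell can generate prisms for Task; a sketch assuming the lens package:
{-# LANGUAGE TemplateHaskell #-}
import Control.Lens

data Task = TaskTypeA Int | TaskTypeB (Float, Float)
makePrisms ''Task
-- generates _TaskTypeA :: Prism' Task Int
--       and _TaskTypeB :: Prism' Task (Float, Float)

-- preview plays the role of match, review the role of apply:
-- preview _TaskTypeA (TaskTypeA 5) == Just 5
-- review _TaskTypeA 5             == TaskTypeA 5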
(Style note: the second Constructor above and its two helper functions can be written equivalently using a little trick:
data Constructor d a = Constructor { apply :: a -> d, match :: d -> Maybe a }
)
With this abstraction in place, it is now possible to write (#>). A simple example would be
(#>) :: Constructor d a -> (a -> State d ()) -> State d ()
cons #> f = do
    d <- get
    case match cons d of
        Nothing -> return ()
        Just a -> f a
or perhaps something more sophisticated, depending on what precisely you want.

Can GADTs be used to prove type inequalities in GHC?

So, in my ongoing attempts to half-understand Curry-Howard through small Haskell exercises, I've gotten stuck at this point:
{-# LANGUAGE GADTs #-}

import Data.Void

type Not a = a -> Void

-- | The type of type equality proofs, which can only be instantiated if a = b.
data Equal a b where
    Refl :: Equal a a

-- | Derive a contradiction from a putative proof of Equal Int Char.
intIsNotChar :: Not (Equal Int Char)
intIsNotChar intIsChar = ???
Clearly the type Equal Int Char has no (non-bottom) inhabitants, and thus semantically there ought to be an absurdEquality :: Equal Int Char -> a function... but for the life of me I can't figure out any way to write one other than using undefined.
So either:
I'm missing something, or
There is some limitation of the language that makes this an impossible task, and I haven't managed to understand what it is.
I suspect the answer is something like this: the compiler is unable to exploit the fact that there are no Equal constructors that don't have a = b. But if that is so, what makes it true?
Here's a shorter version of Philip JF's solution, which is the way dependent type theorists have been refuting equations for years.
type family Discriminate x
type instance Discriminate Int  = ()
type instance Discriminate Char = Void

transport :: Equal a b -> Discriminate a -> Discriminate b
transport Refl d = d

refute :: Equal Int Char -> Void
refute q = transport q ()
In order to show that things are different, you have to catch them behaving differently by providing a computational context which results in distinct observations. Discriminate provides exactly such a context: a type-level program which treats the two types differently.
It is not necessary to resort to undefined to solve this problem. Total programming sometimes involves rejecting impossible inputs. Even where undefined is available, I would recommend not using it where a total method suffices: the total method explains why something is impossible and the typechecker confirms; undefined merely documents your promise. Indeed, this method of refutation is how Epigram dispenses with "impossible cases" whilst ensuring that a case analysis covers its domain.
As for computational behaviour, note that refute, via transport is necessarily strict in q and that q cannot compute to head normal form in the empty context, simply because no such head normal form exists (and because computation preserves type, of course). In a total setting, we'd be sure that refute would never be invoked at run time. In Haskell, we're at least certain that its argument will diverge or throw an exception before we're obliged to respond to it. A lazy version, such as
absurdEquality e = error "you have a type error likely to cause big problems"
will ignore the toxicity of e and tell you that you have a type error when you don't. I prefer
absurdEquality e = e `seq` error "sue me if this happens"
if the honest refutation is too much like hard work.
I don't understand the problem with using undefined, as every type is inhabited by bottom in Haskell. Our language is not strongly normalizing... You are looking for the wrong thing. Equal Int Char leads to type errors, not nice well-kept exceptions. See:
{-# LANGUAGE GADTs, TypeFamilies #-}

data Equal a b where
    Refl :: Equal a a

type family Pick cond a b
type instance Pick Char a b = a
type instance Pick Int a b = b

newtype Picker cond a b = Picker (Pick cond a b)

pick :: b -> Picker Int a b
pick = Picker

unpick :: Picker Char a b -> a
unpick (Picker x) = x

samePicker :: Equal t1 t2 -> Picker t1 a b -> Picker t2 a b
samePicker Refl x = x

absurdCoerce :: Equal Int Char -> a -> b
absurdCoerce e x = unpick (samePicker e (pick x))
you could use this to create the function you want
absurdEquality e = absurdCoerce e ()
but that will produce undefined behavior as its computation rule. false should cause programs to abort, or at the very least run forever. Aborting is the computation rule that is akin to turning minimal logic into intuitionistic logic by adding not. The correct definition is
absurdEquality e = error "you have a type error likely to cause big problems"
as to the question in the title: essentially no. To the best of my knowledge, type inequality is not representable in a practical way in current Haskell. Coming changes to the type system may lead to this getting nicer, but as of right now, we have equalities but not inequalities.
