Future-proofing wildcard pattern matches

Let's say I have:
data C = C Bool
and then define:
f :: C -> Int
f (C _) = 0
This is perfectly good, except the match against the underscore is a bit dangerous. It completely ignores the type of the field. That is, if I change the data type later on like this:
data C = C Int
the function f still type-checks just fine.
This is precisely the situation I'd like to avoid: I'd like to get a warning for the definition of f, in the sense that it is way too permissive with regard to changes to the fields of the datatype C.
Note that if I act defensively, I can do:
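{-# LANGUAGE ScopedTypeVariables #-}  -- at the top of the module; enables the pattern signature below
f' :: C -> Int
f' (C (_ :: Bool)) = 0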
This is nice since if I change the field of C to contain a field with a different type later on, I get a nice error message from GHC. While this is exactly what I'd like, it'd be even better if I can get GHC to warn me about this possible pitfall if I forget to be defensive in the above sense. That is, if I pass, say, -warn-permissive-patternmatch, or annotate the data-type C in some way to require more checking, I'd like to get a warning.
While the above is a contrived example, you can imagine this being useful in a real-life scenario, with a data-type with many constraints and many fields. (As in a compiler intermediate-representation for instance.) When writing functions over these types, we usually simply put an _ for fields we don't care about for a given function. But if we later decide to change the field-type to something different, we'd like to review all these wildcard pattern-matches to make sure the necessary changes are done. It would be nice if GHC told us there's a possible "maintenance" headache in these cases, and the proper solution would be to put the extra annotation as in f' above. Obviously, this should be opt-in, and most likely on a per-data-type-declaration, instead of at the module-level. (Though the latter can be made to work I suppose.)
Is there some trick one can use in GHC to get a similar effect today? Can this be implemented by a compiler plugin? Or, does this belong to GHC proper, and thus will have to be implemented directly by the compiler? How useful would it be for others? What other mechanisms do you deploy to avoid such maintenance issues for long-running projects? I'd especially appreciate feedback from a long-term code maintenance perspective.
Update
Thanks for the comments so far. I do agree that it'd be impossible to get something 100% rock-solid, since GHC has no way of foretelling what we intended, nor does it see "previous" versions of the code. To make this more precise, what I'm really looking for is this: if I write a function like f above, GHC should tell me that I should annotate the wildcard to guard against possible changes to the ignored value. That is, it would essentially ask me to write f' instead, which is defensive against this sort of long-term code maintenance issue. (As @amalloy pointed out, there's no "perfect" solution here, but I think an opt-in warning would be nice and can help in large/long-running projects.)
An oft-quoted "strength" of Haskell is that maintenance is easier: Change the data-type and let GHC walk you through all the pieces of code you have to modify. This is one case where that promise fails, unfortunately. I'm trying to figure out a "trick" (or a new feature in GHC) that'd make this strength of Haskell go further with respect to changes of this sort.

I think the goal here is a bit fuzzy. Do you want the compiler to warn you every time you ignore a field in a pattern-match? In that case you can stop using _ altogether, since its purpose is to tell the compiler you are ignoring a field on purpose and to stop warning you. Instead turn on -Wunused-matches and replace all your underscores with variables.
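A minimal sketch of that suggestion, reusing C and f from the question (the second function g is only added here for contrast): with -Wunused-matches enabled, naming the ignored field makes GHC report it on every build, regardless of whether the field's type ever changes.

```haskell
{-# OPTIONS_GHC -Wunused-matches #-}

data C = C Bool

-- Naming the ignored field makes GHC report 'b' as
-- defined but not used each time the module is compiled.
f :: C -> Int
f (C b) = 0

-- A binder that starts with an underscore is treated like a wildcard,
-- so this version compiles without the warning.
g :: C -> Int
g (C _b) = 0
```

Note that the warning is unconditional: it does not react to a change in C, which is why it does not really solve the problem described in the question.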
But I don't think you really want that. You want something like _ which will not issue any warnings until you make some change to the type being matched on. I don't think this is something GHC can help you with. Even in your contrived example, the workaround you have (adding a type annotation) doesn't really work. It's fine if you imagine changing the definition of C, but what if you changed the definition of Bool instead? Then the type annotation would still be correct, so you'd get no warning, but your code has still become wrong. Of course in real life we don't change Bool, but a similar problem arises with types you do own. Here is a somewhat more realistic example:
data Location
data TrafficLightColor = Red | Green
data TrafficLight = TrafficLight Location TrafficLightColor
canSpeedUp :: TrafficLight -> Bool
canSpeedUp (TrafficLight loc color) = case (farAway loc, color) of
  (True, _) -> True
  (False, Red) -> False
  (False, _) -> True
In some of the case bodies, you ignore the color with _. So these lines are examples of locations you'd like GHC to help you with, by somehow alerting you anytime the type of color changes. But this is not enough: even if color :: TrafficLightColor remains true, this code can become wrong if someone adds Yellow to the TrafficLightColor definition. The first case is still right, because you really want to ignore any faraway light, but the last case is wrong because you should slow down for yellow lights too.
What I'm getting at is that every usage of _ in any context can potentially become wrong, and GHC has no way to know if a given change to your codebase has made one of these _s wrong or not. It doesn't get a diff between the old version of your code and the new one, so it can't warn you, hey, this new value is falling into the _ case, did you mean to do something different? All it can do is see that you have an unused variable, but you explicitly said you don't care about its value by using _, so it won't bother you. If you want a different behavior, you have to give up the convenience of _.
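One common mitigation, which this answer does not propose and which only covers the "new constructor" case, is to give up _ exactly where the distinction matters and list every constructor explicitly, relying on -Wincomplete-patterns. In the sketch below, the constructors of Location and the body of farAway are hypothetical stand-ins, since the original leaves both unspecified.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- Hypothetical constructors; the original Location is left abstract.
data Location = Near | Far
data TrafficLightColor = Red | Green
data TrafficLight = TrafficLight Location TrafficLightColor

-- Hypothetical definition, only here to make the sketch compile.
farAway :: Location -> Bool
farAway Far  = True
farAway Near = False

canSpeedUp :: TrafficLight -> Bool
canSpeedUp (TrafficLight loc color)
  | farAway loc = True          -- far-away lights are ignored on purpose
  | otherwise   = case color of -- no wildcard: every color is spelled out
      Red   -> False
      Green -> True
```

If Yellow is later added to TrafficLightColor, the inner case becomes non-exhaustive and GHC warns about it, while the far-away branch keeps ignoring the color as intended.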

Related

Getting value out of context or leaving them in?

Assume code below. Is there a quicker way to get the contextual values out of findSerial rather than writing a function like outOfContext?
The underlying question is: does one usually stick within context and use Functors, Applicatives, Monoids and Monads to get the job done, or is it better to take it out of context and apply the usual non-contextual computation methods. In brief: don't want to learn Haskell all wrong, since it takes time enough as it does.
import qualified Data.Map as Map
type SerialNumber = (String, Int)
serialList :: Map.Map String SerialNumber
serialList = Map.fromList [("belt drive",("BD",0001))
                          ,("chain drive",("CD",0002))
                          ,("drive pulley",("DP",0003))
                          ,("drive sprocket",("DS",0004))
                          ]
findSerial :: Ord k => k -> Map.Map k a -> Maybe a
findSerial input = Map.lookup input
outOfContext (Just (a, b)) = (a, b)

Assuming I understand it correctly, I think your question essentially boils down to “Is it idiomatic in Haskell to write and use partial functions?” (which your outOfContext function is, since it’s just a specialized form of the built-in partial function fromJust). The answer to that question is a resounding no. Partial functions are avoided whenever possible, and code that uses them can usually be refactored into code that doesn’t.
The reason partial functions are avoided is that they voluntarily compromise the effectiveness of the type system. In Haskell, when a function has type X -> Y, it is generally assumed that providing it an X will actually produce a Y, and that it will not do something else entirely (i.e. crash). If you have a function that doesn’t always succeed, reflecting that information in the type by writing X -> Maybe Y forces the caller to somehow handle the Nothing case, and it can either handle it directly or defer the failure further to its caller (by also producing a Maybe). This is great, since it means that programs that typecheck won’t crash at runtime. The program might still have logical errors, but knowing before even running the program that it won’t blow up is still pretty nice.
Partial functions throw this guarantee out the window. Any program that uses a partial function will crash at runtime if the function’s preconditions are accidentally violated, and since those preconditions are not reflected in the type system, the compiler cannot statically enforce them. A program might be logically correct at the time of its writing, but without enforcing that correctness with the type system, further modification, extension, or refactoring could easily introduce a bug by mistake.
For example, a programmer might write the expression
if isJust n then fromJust n else 0
which will certainly never crash at runtime, since fromJust’s precondition is always checked before it is called. However, the type system cannot enforce this, and a further refactoring might swap the branches of the if, or it might move the fromJust n to a different part of the program entirely and accidentally omit the isJust check. The program will still compile, but it may fail at runtime.
In contrast, if the programmer avoids partial functions, using explicit pattern-matching with case or total functions like maybe and fromMaybe, they can replace the tricky conditional above with something like
fromMaybe 0 n
which is not only clearer, but ensures any accidental misuse will simply fail to typecheck, and the potential bug will be detected much earlier.
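Applied to the serial-number lookup from the question, this might look like the sketch below. The helper names serialCode and describe, and the "unknown part" default, are assumptions made for illustration; the map is shortened to two entries.

```haskell
import qualified Data.Map as Map

type SerialNumber = (String, Int)

serialList :: Map.Map String SerialNumber
serialList = Map.fromList
  [ ("belt drive",  ("BD", 1))
  , ("chain drive", ("CD", 2))
  ]

-- Stay in context: transform the result inside Maybe with fmap (<$>).
serialCode :: String -> Maybe String
serialCode name = fst <$> Map.lookup name serialList

-- Leave the context only at the edge, and say what happens on Nothing.
describe :: String -> String
describe name = maybe "unknown part" render (Map.lookup name serialList)
  where
    render (code, n) = code ++ "-" ++ show n
```

Both functions are total: a missing key yields Nothing or the default string instead of a runtime crash.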
For some concrete examples of how the type system can be a powerful ally if you stick exclusively to total functions, as well as some good food for thought about different ways to encode type safety for your domain into Haskell’s type system, I highly recommend reading Matt Parsons’s wonderful blog post, Type Safety Back and Forth, which explores these ideas in more depth. It additionally highlights how using Maybe as a catch-all representation of failure can be awkward, and it shows how the type system can be used to enforce preconditions to avoid needing to propagate Maybe throughout an entire system.

Function searching for applicable types for applying to Lensed pieces of Haskell AST

My primary question is: is there, within some Haskell AST, a way I can determine a list of the available declarations and their types? I'm trying to build an editor that shows the user all the appropriate edits available, such as inserting functions and/or other declared values that can be used or inserted at any point. It'll also disallow syntax errors as well as type errors. (That is, it'll be a semantic structural editor, and I'll also use the typechecker to make sure the edits make sense for the language in question, in this case Haskell.)
The second part of my question is: once I have that list, given a particular expression or function or focussed-on piece of AST (using Lens), how could I filter the list based on what could possibly replace or fit that particular focussed-on AST piece (whether by providing arguments to a function, or if it's a value, just "as-is"). Perhaps I need to add some concrete example here... something like: "Haskell, which declarations could possibly be applied (for functions) and/or placed into the hole at yay x y z = (x + y - z) * _?" then if there was an expression number2 :: Num a => a ; number2 = 23 it would put this in the list, as well as the functions available in the context, as well as those from Num itself such as (+) :: Num a => a -> a -> a, (*) :: Num a => a -> a -> a, and any other declarations that resulted in a type that would match such as Num a => a etc. etc.
More details follow:
I’ve done a fair bit of research into this area over quite a long time: looked at and used hint, Language.Haskell.Exts and Control.Lens a fair bit. Also had a look into Dynamic. Control.Lens is relevant for the second half of my question. I've also looked at quite a few projects along the way including Conal Elliott's "Semantic Editing Combinators", Paul Chiusano's Unison system and quite a few things in Clojure and Lisp as well.
So, I know I can get a list of the exports of a module with hint as [String], and I could coerce that to [Dynamic], I think (possibly?), but I’m not sure how I’d get sub-function declarations and their types. (Maybe I could take the declarations within that scope with AST and put them in their own modules in a String and pull them in by getting the top level declarations with hint? that would work but feels hacky and cumbersome)
I can use (:~:) from Data.Typeable to do "propositional equality" (ie typechecking?) on two terms, but what I actually need to do is see if a term could be matched into a position in the source/AST (I'm using lenses and prisms to focus on those parts of the AST) given some number of arguments. Some kind of partial type-checking, or result type-checking? Because the thing I might be focussing on could very well be a function, and I might need to keep the same arity.
I feel like perhaps this is very similar to Idris' term-searching, though I haven't looked into the source for that and I'm not sure if that's something only possible in a dependently typed language.
Any help would be great.

Looks like I kind of answered my own questions, so I'm going to do so formally here.
The answer to the first part of my question can be found in the Reflection module of the hint library. I knew I could get the exports of these modules as a [String], but there's a function in there with the type getModuleExports :: MonadInterpreter m => ModuleName -> m [ModuleElem], which is most likely the sort of thing I'm after. This works because hint provides access to a large part of the GHC API. It also provides some lookup functions which I can then use to get the types of these top-level terms.
https://github.com/mvdan/hint/blob/master/src/Hint/Reflection.hs#L30
Also, Template Haskell provides some of the functionality I'm interested in, and I'll probably end up using quite a bit of that to build my functions, or at least a set of lenses for whatever syntax is being used by the code (/text) under consideration.
In terms of the second part of the question, I still don't have a particularly good answer, so my first attempt will be to use some String munging on the output of the lookup functions and see what I can do.

Haskell constructor aliases

Is there a way to have something equivalent to creating "constructor aliases" in Haskell? I'm thinking similar to type aliases where you can give the type a different name but it still behaves in every way as the aliased type.
My use case is a system where I have an assigned time as a property of some objects I'm modelling, so UTCTime. Some of these could be "variable" times, meaning it might not yet be assigned a time or the time it does have is "movable". So Maybe UTCTime.
But only some of the objects have variable times. Others have fixed times that the system has to take as a constant; a time variable currently assigned to a particular time is not handled the same way as a fixed time. Which now suggests Either UTCTime (Maybe UTCTime); it's either a fixed time or a variable time that might be unassigned.
The generic types seem to fit what I'm trying to model really well, so using them feels natural. But while it's obvious what Either UTCTime (Maybe UTCTime) is, it's not particularly obvious what it means, so some descriptive special-case names would be nice.
A simple type Timeslot = Either UTCTime (Maybe UTCTime) would definitely clean up my type signatures a lot, but that does nothing for the constructors. I can use something like bound = Just to get a name for constructing values, but not for pattern matching.
At the other end I can define a custom ADT with whatever names I want, but then I lose all the predefined functionality of the Either and Maybe types. Or rather I'll be applying transformations back and forth all the time (which I suppose is no worse than the situation with using newtype wrappers for things, only without the efficiency guarantee, but I doubt this would be a bottleneck anyway). And I suppose to understand code using generic Either and Maybe functions to manipulate my Timeslot values I'll need to know the way the standard constructors are mapped to whatever I want to use anyway, and the conversion functions would supply a handy compiler-enforced definition of that mapping. So maybe this is a good approach after all.
I'm pretty sure I know Haskell well enough to say that there is no such thing as constructor-aliasing, but I'm curious whether there's some hack I don't know about, or some other good way of handling this situation.

Despite the drawbacks you mentioned, I strongly suggest simply creating a fresh ADT for your type; for example
data TimeVariable = Constant UTCTime | Assigned UTCTime | Unassigned
I offer these arguments:
Having descriptive constructors will make your code -- both construction and pattern matching -- significantly more readable. Compare Unassigned and Right Nothing. Now add six months and do the same comparison.
I suspect that as your application grows, you will find that this type needs to expand. Adding another constructor or another field to an existing constructor is much easier with a custom ADT, and it makes it very easy to identify code locations that need to be updated to deal with the new type.
Probably there will not be quite as many sensible operations on this type as there are in the standard library for munging Either and Maybe values -- so I bet you won't be duplicating nearly as much code as you think. And though you may be duplicating some code, giving your functions descriptive names is valuable for the same readability and refactoring reasons that giving your constructors descriptive names is.
I have personally written some code where all my sums were Either and all my products were (,). It was horrible. I could never remember which side of a sum meant which thing; when reading old code I had to constantly remind myself what conceptual type each value was supposed to be (e.g. Right doesn't tell you whether you're using Right here as part of a time variable or part of some other thing that you were too lazy to make an ADT for); I had to constantly mentally expand type aliases; etc. Learn from my pain. ;-)
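If the generic Either and Maybe functions are occasionally still wanted, the conversion functions mentioned in the question can be written once. This is only a sketch: the names toTimeslot and fromTimeslot are made up here, and UTCTime is assumed to come from the time package.

```haskell
import Data.Time (UTCTime)

data TimeVariable = Constant UTCTime | Assigned UTCTime | Unassigned
  deriving (Show, Eq)

-- The generic encoding discussed in the question.
type Timeslot = Either UTCTime (Maybe UTCTime)

-- The mapping between the two representations lives in exactly one place.
toTimeslot :: TimeVariable -> Timeslot
toTimeslot (Constant t) = Left t
toTimeslot (Assigned t) = Right (Just t)
toTimeslot Unassigned   = Right Nothing

fromTimeslot :: Timeslot -> TimeVariable
fromTimeslot (Left t)         = Constant t
fromTimeslot (Right (Just t)) = Assigned t
fromTimeslot (Right Nothing)  = Unassigned
```

Because both functions match on every constructor, adding a constructor to TimeVariable makes toTimeslot incomplete, which GHC can flag with -Wincomplete-patterns.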

The 'pattern synonyms' might get merged into ghc: http://ghc.haskell.org/trac/ghc/ticket/5144. In the meantime there is also -XViewPatterns, which lets you write things like:
type Timeslot = Either UTCTime (Maybe UTCTime)
fieldA = either Just (const Nothing)
fieldB = either (const Nothing) id
f (fieldA -> Just time) = ...
f (fieldB -> Just time) = ...
f _ = ...
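Pattern synonyms have since shipped with GHC as the PatternSynonyms extension (available from GHC 7.8), so "constructor aliases" over the generic type can now be written directly. The sketch below assumes a GHC recent enough for pattern signatures and the COMPLETE pragma (8.2 or later); the names Fixed, Bound, Unbound and isMovable are invented for illustration.

```haskell
{-# LANGUAGE PatternSynonyms #-}

import Data.Time (UTCTime)

type Timeslot = Either UTCTime (Maybe UTCTime)

-- Bidirectional synonyms: usable both to construct and to match.
pattern Fixed :: UTCTime -> Timeslot
pattern Fixed t = Left t

pattern Bound :: UTCTime -> Timeslot
pattern Bound t = Right (Just t)

pattern Unbound :: Timeslot
pattern Unbound = Right Nothing

-- Tell the exhaustiveness checker that these three cover every Timeslot.
{-# COMPLETE Fixed, Bound, Unbound #-}

isMovable :: Timeslot -> Bool
isMovable (Fixed _) = False
isMovable (Bound _) = True
isMovable Unbound   = True
```

Values built with these names are still ordinary Either and Maybe values, so functions such as either and fmap keep working on them.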

Should I always prefer more general types to specific types?

Compiled with ghc --make, these two programs produce the exact same binaries:
-- id1a.hs
main = print (id' 'a')
id' :: a -> a
id' x = x
-- id1b.hs
main = print (id' 'a')
id' :: Char -> Char
id' x = x
Is this just because of how trivial/contrived my example is, or does this
hold true as programs get more complex?
Also, is there any good reason to avoid making my types as general as
possible? I usually try keep specifics out where I don't need them, but
I am not extremely familiar with the effects of this on compiled languages,
especially Haskell/GHC.
Side Note:
I seem to recall a recent SO question where the answer was to make a type more
specific in order to improve some performance issue, though I cannot find it
now, so I may have imagined it.
Edit:
I understand from a usability / composability standpoint that more general is always better, I'm more interested in the effects this has on the compiled code. Is it possible for me to be too eager in abstracting my code? Or is this usually not a problem in Haskell?

I would make everything as general as possible. If you run into performance issues you can start thinking about concrete implementations, but IMHO this will not be a problem very often, and if it really becomes a problem then maybe your performance needs are great enough that you should think about moving into imperative-land again ;)

Is there any good reason to avoid making my types as general as possible?
No, as long as you have the SPECIALIZE pragma at your disposal for those rare situations where it might actually matter.
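For illustration, a minimal sketch of the pragma on a made-up function (sumSquares is not from the question): the signature stays general, and GHC is asked to additionally generate a copy specialised to Int.

```haskell
-- General signature: works for any Num instance.
sumSquares :: Num a => [a] -> a
sumSquares = foldr (\x acc -> x * x + acc) 0

-- Ask GHC to also compile a version specialised to Int, so that calls at
-- that type do not have to go through the Num dictionary.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}
```

Callers are unaffected: they keep using sumSquares at any numeric type, and the specialised copy is picked up when optimisation is enabled.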

Is this just because of how trivial/contrived my example is
Yes. Namely, try splitting the definition of id' and main into different modules and you should see a difference.
However, Carsten is right: there may be performance-related reasons to use concrete types, but you should generally start with general types and use concrete implementations only if you actually have a problem.

General types usually make your functions more usable, in my opinion.
This may be a poor example, but if you're writing a function such as elem (takes a list and an element and returns true if the list contains that element and false otherwise), using specific types will constrain the usability of your function. ie. if you specify the type as Int, you can't use that function to check if a String contains a certain character, for example.
I'm not quite sure about performance, but I haven't experienced any issues and I use general types almost all the time.

Haskell set datatype/datastructure

What I want to do is create a type Set in Haskell to represent a generic (polymorphic) set, e.g. {1,'x',"aasdf",Phi}.
First, I want to make clear that in my program I want to consider Phi (the empty set) as something that belongs to all sets.
Here is my code:
data Set a b= Phi | Cons a (Set a b)
    deriving (Show,Eq,Ord)
isMember Phi _ = True
isMember _ Phi = False
isMember x (Cons a b) = if x==a
                        then True
                        else isMember x b
I'm facing a couple of problems:
I want the type of isMember to be
isMember :: Eq a => a -> Set a b -> Bool
but according to my code it is
isMember :: Eq a => Set a b -> Set (Set a b) c -> Bool
If I have a set of different types, the == operator doesn't work correctly, so I need some help please :D

Regarding your type error, the problem looks like the first clause to me:
isMember Phi _ = True
This is an odd clause to write, because Phi is an entire set, not a set element. Just deleting it should give you a function of the type you expect.
Observe that your Set type never makes use of its second type argument, so it could be written instead as
data Set a = Phi | Cons a (Set a)
...and at that point you should just use [a], since it's isomorphic and has a huge entourage of functions already written for using and abusing them.
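Putting the two suggestions above together gives roughly the following sketch (the list-based isMember' is an added name, shown only for comparison):

```haskell
-- The unused second type parameter is dropped.
data Set a = Phi | Cons a (Set a)
  deriving (Show, Eq, Ord)

-- Without the "isMember Phi _" clause the first argument is an element,
-- so the type is the one the question asks for.
isMember :: Eq a => a -> Set a -> Bool
isMember _ Phi           = False
isMember x (Cons a rest) = x == a || isMember x rest

-- The same check on a plain list is just elem from the Prelude.
isMember' :: Eq a => a -> [a] -> Bool
isMember' = elem
```

For example, isMember 2 (Cons 1 (Cons 2 Phi)) evaluates to True, and isMember 3 Phi evaluates to False.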
Finally, you ask to be able to put things of different types in. The short answer is that Haskell doesn't really swing that way. It's all about knowing exactly what kind of type a thing is at compile time, which isn't really compatible with what you're suggesting. There are actually some ways to do this; however, I strongly recommend getting much more familiar with Haskell's particular brand of type bondage before trying to take the bonds off.

A) Doing this is almost always not what you actually want.
B) There are a variety of ways to do this from embedding dynamic types (Dynamic) to using very complicated types (HList).
C) Here's a page describing some ways and issues: http://www.haskell.org/haskellwiki/Heterogenous_collections
D) If you're really going to do this, I'd suggest HList: http://homepages.cwi.nl/~ralf/HList/
E) But if you start to look at the documentation / HList paper and find yourself hopelessly confused, fall back to the dynamic solution (or better yet, rethink why you need this) and come back to HLists once you're significantly more comfortable with Haskell.
(Oh yes, and the existential solution described on that page is probably a terrible idea, since it almost never does anything particularly useful for you).

What you are trying to do is very difficult, as Haskell does not store any type information by default. Two modules that are very useful for such things are Data.Typeable and Data.Dynamic. They provide support for storing a monomorphic (!) type and support for dynamic monomorphic typing.
I have not attempted to code something like this previously, but I have some ideas to accomplish that:
1. Each element of your set is a triple (quadruple) of the following things:
   - A TypeRep of the stored data-type.
   - The value itself, coerced into an Any.
   - A comparison function (you can only use monomorphic values, so you somehow have to store the context).
   - Similarly, a function to show the values.
2. Your set actually has two dimensions: first a tree keyed by the TypeRep, and then a list of values.
3. Whenever you insert a value, you coerce it into an Any and store all the required stuff together with it, as explained in (1), and put it in the right position as in (2).
4. When you want to find an element, you generate its TypeRep and find the subtree of the right type. Then you just compare each sub-element with the value you want to find.
Those are just some random thoughts. I guess it's actually much easier to use Dynamic.
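A rough sketch of the simpler Dynamic route, using only Data.Dynamic and Data.Typeable from base. The names bag and memberDyn are invented here, and a plain list stands in for the set, so there is no duplicate removal or ordering.

```haskell
import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Maybe (mapMaybe)
import Data.Typeable (Typeable)

-- A heterogeneous collection: each value is wrapped together with its type.
bag :: [Dynamic]
bag = [toDyn (1 :: Int), toDyn 'x', toDyn "aasdf"]

-- Membership test: fromDynamic keeps only the entries whose stored type
-- matches the type of x, and those are then compared with (==).
memberDyn :: (Typeable a, Eq a) => a -> [Dynamic] -> Bool
memberDyn x = elem x . mapMaybe fromDynamic
```

Every stored type has to be monomorphic, which is why the literal 1 carries an explicit Int annotation.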
