Is there a way to have something equivalent to creating "constructor aliases" in Haskell? I'm thinking similar to type aliases where you can give the type a different name but it still behaves in every way as the aliased type.
My use case is a system where I have an assigned time as a property of some objects I'm modelling, so UTCTime. Some of these could be "variable" times, meaning it might not yet be assigned a time or the time it does have is "movable". So Maybe UTCTime.
But only some of the objects have variable times. Others have fixed times that the system has to take as a constant; a time variable currently assigned to a particular time is not handled the same way as a fixed time. Which now suggests Either UTCTime (Maybe UTCTime); it's either a fixed time or a variable time that might be unassigned.
The generic types seem to fit what I'm trying to model really well, so using them feels natural. But while it's obvious what Either UTCTime (Maybe UTCTime) is, it's not particularly obvious what it means, so some descriptive special-case names would be nice.
A simple type Timeslot = Either UTCTime (Maybe UTCTime) would definitely clean up my type signatures a lot, but that does nothing for the constructors. I can use something like bound = Just to get a name for constructing values, but not for pattern matching.
At the other end I can define a custom ADT with whatever names I want, but then I lose all the predefined functionality of the Either and Maybe types. Or rather I'll be applying transformations back and forth all the time (which I suppose is no worse than the situation with using newtype wrappers for things, only without the efficiency guarantee, but I doubt this would be a bottleneck anyway). And I suppose to understand code using generic Either and Maybe functions to manipulate my Timeslot values I'll need to know the way the standard constructors are mapped to whatever I want to use anyway, and the conversion functions would supply a handy compiler-enforced definition of that mapping. So maybe this is a good approach after all.
I'm pretty sure I know Haskell well enough to say that there is no such thing as constructor-aliasing, but I'm curious whether there's some hack I don't know about, or some other good way of handling this situation.
Despite the drawbacks you mentioned, I strongly suggest simply creating a fresh ADT for your type; for example
data TimeVariable = Constant UTCTime | Assigned UTCTime | Unassigned
I offer these arguments:
Having descriptive constructors will make your code -- both construction and pattern matching -- significantly more readable. Compare Unassigned and Right Nothing. Now add six months and do the same comparison.
I suspect that as your application grows, you will find that this type needs to expand. Adding another constructor or another field to an existing constructor is much easier with a custom ADT, and it makes it very easy to identify code locations that need to be updated to deal with the new type.
Probably there will not be quite as many sensible operations on this type as there are in the standard library for munging Either and Maybe values -- so I bet you won't be duplicating nearly as much code as you think. And though you may be duplicating some code, giving your functions descriptive names is valuable for the same readability and refactoring reasons that giving your constructors descriptive names is.
I have personally written some code where all my sums were Either and all my products were (,). It was horrible. I could never remember which side of a sum meant which thing; when reading old code I had to constantly remind myself what conceptual type each value was supposed to be (e.g. Right doesn't tell you whether you're using Right here as part of a time variable or part of some other thing that you were too lazy to make an ADT for); I had to constantly mentally expand type aliases; etc. Learn from my pain. ;-)
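To make the "conversion functions as a compiler-enforced mapping" idea from the question concrete, here is a minimal sketch. The names toTimeslot and fromTimeslot are made up for illustration, and the deriving clause is my own addition:

```haskell
import Data.Time.Clock (UTCTime)

-- The generic encoding from the question.
type Timeslot = Either UTCTime (Maybe UTCTime)

-- The descriptive ADT suggested above.
data TimeVariable
  = Constant UTCTime
  | Assigned UTCTime
  | Unassigned
  deriving (Eq, Show)

-- Hypothetical conversion functions: together they document, in one
-- place, how the descriptive constructors map onto Either and Maybe.
toTimeslot :: TimeVariable -> Timeslot
toTimeslot (Constant t) = Left t
toTimeslot (Assigned t) = Right (Just t)
toTimeslot Unassigned   = Right Nothing

fromTimeslot :: Timeslot -> TimeVariable
fromTimeslot (Left t)         = Constant t
fromTimeslot (Right (Just t)) = Assigned t
fromTimeslot (Right Nothing)  = Unassigned
```

If you ever do need a stock Either or Maybe combinator, you can go through toTimeslot and back; everywhere else the code reads in terms of Constant, Assigned and Unassigned.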
The 'pattern synonyms' might get merged into ghc: http://ghc.haskell.org/trac/ghc/ticket/5144. In the meantime there is also -XViewPatterns, which lets you write things like:
type Timeslot = Either UTCTime (Maybe UTCTime)
fieldA = either Just (const Nothing)
fieldB = either (const Nothing) id
f (fieldA -> Just time) = ...
f (fieldB -> Just time) = ...
f _ = ...
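For GHC versions that do ship the PatternSynonyms extension, a sketch of what the "constructor aliases" could look like is below. The synonym names and the describe function are my own choices for illustration, and the COMPLETE pragma assumes a GHC recent enough to support it:

```haskell
{-# LANGUAGE PatternSynonyms #-}
import Data.Time.Clock (UTCTime)

type Timeslot = Either UTCTime (Maybe UTCTime)

-- Bidirectional synonyms: usable both to build and to match values.
pattern Constant :: UTCTime -> Timeslot
pattern Constant t = Left t

pattern Assigned :: UTCTime -> Timeslot
pattern Assigned t = Right (Just t)

pattern Unassigned :: Timeslot
pattern Unassigned = Right Nothing

-- Tells the exhaustiveness checker that these three cover every Timeslot.
{-# COMPLETE Constant, Assigned, Unassigned #-}

-- Hypothetical consumer that never mentions Left, Right, Just or Nothing.
describe :: Timeslot -> String
describe (Constant _) = "fixed"
describe (Assigned _) = "variable, currently assigned"
describe Unassigned   = "variable, unassigned"
```

The values are still plain Either and Maybe underneath, so either, maybe, fmap and friends keep working on them unchanged.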
In my app I'm doing a lot of conversions from Text to various datatypes, often just to Text itself, but sometimes to other datatypes.
Occasionally I also convert from other string types, e.g. String and ByteString.
Interestingly, Readable.fromText does the job for me, at least for Integer and Text. However I also now need UTCTime, which Readable.fromText doesn't have an instance for (but which I could write myself).
I was thinking that Readable.fromText was a Text analogue of Text.Read.readEither for [Char]. However, I've realised that Readable.fromText is actually subtly different: readEither at a string type isn't simply pure, but instead expects the input string to be quoted. This isn't the case for reading integers, however, which don't expect quotes.
I understand that this is because show shows strings with quotes, so for read to be consistent it needs to require quotes.
However this is not the behaviour I want. I'm looking for a typeclass where reading strings to strings is basically the id function.
Readable seems to do this, but it's misleadingly named, as its behaviour is not entirely analogous to read on [Char]. Is there another typeclass that also has this behaviour? Or am I best off just extending Readable, perhaps with newtypes or alternatively PRs?
The what
Just use Data.Text and Data.Text.Read directly
With signed decimal or just decimal you get a simple and yet expressive minimalistic parser function. It's directly usable:
type Reader a = Text -> Either String (a, Text)
decimal :: Integral a => Reader a
signed :: Num a => Reader a -> Reader a
Or you cook up your own runReader :: Reader a -> M a combinator for some M to possibly handle non-empty leftover and deal with the Left case.
For turning a String -> Text, all you have to do is use pack
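As a rough sketch of the runReader idea with M chosen as Either String: the helper below runs a Reader and rejects leftover input. parseAll, parseInteger and parseText are names I made up; only Reader, decimal and signed come from Data.Text.Read.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Read (Reader, decimal, signed)

-- Hypothetical helper (not part of the text package): run a Reader and
-- treat any leftover input as an error.
parseAll :: Reader a -> Text -> Either String a
parseAll reader input = do
  (value, rest) <- reader input
  if T.null rest
    then Right value
    else Left ("unexpected trailing input: " ++ T.unpack rest)

-- An optionally signed decimal integer, with no quoting involved.
parseInteger :: Text -> Either String Integer
parseInteger = parseAll (signed decimal)

-- "Reading" Text as Text needs no parser at all; it is just Right.
parseText :: Text -> Either String Text
parseText = Right
```

With this, parseInteger (T.pack "-42") gives Right (-42), while parseInteger (T.pack "12abc") gives a Left that mentions the leftover "abc".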
The why
Disclaimer: The matter of parsing data the right way is answered differently depending on who you ask.
I belong to the school that believes typeclasses are a poor fit for parsing mainly for two reasons.
Typeclasses limit you to one instance per type
You can easily have two different time formats in the data. Now you might tell yourself that you only have one use case, but what if you depend on another library that itself or transitively introduces another instance Readable UTCTime? Now you have to use newtypes for no reason other than to be able to select a particular implementation, which is not nice!
Code transparency
You cannot make any inference as to what parser behavior you get from a type name alone. And Haddock documentation for instances often does not exist, because the behavior is assumed to be obvious.
Consider for example: What will instance Readable Int64 do?
Will it assume an ASCII encoded numeric representation? Or some binary representation?
If binary, which endianness is going to be assumed?
What representation of signedness is expected? In the ASCII case perhaps a minus? Or maybe with a space? Or if binary, is it going to be ones' complement? Two's complement?
How will it handle overflow?
Code transparency on call-sites
But the lack of transparency extends to call sites as well. Consider the following example:
do fieldA <- fromText
   fieldB <- fromText
   fieldC <- fromText
   pure T{..}
What exactly does this do? Which parsers will be invoked? You will have to know the types of fieldA, fieldB and fieldC to answer that question. Now in simple code that might seem obvious, but you might easily forget if you look at the same code 2 weeks from now. Or you have more elaborate code, where the types involved are inferred non-locally. It becomes hard to follow which instance this will end up selecting (and the instance can make a huge difference, especially if you start newtyping for different formats. Say you cannot make any inference from a field name fooTimestamp because it might perhaps be UnixTime or UTCTime).
And much worse: If you refactor and change the type of one of the fields in the data declaration, say a time field from Word64 to UTCTime, this might silently and unexpectedly switch to a different parser, leading to a bug. Yuk!
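For contrast, here is a sketch of the same kind of call site with explicitly named parsers. The Event record, unixSeconds and parseEvent are hypothetical names invented for this example, and the timestamp format (decimal seconds since the Unix epoch) is an assumption:

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Read (decimal)
import Data.Word (Word64)

-- Hypothetical record, standing in for T in the example above.
data Event = Event
  { eventName      :: Text
  , eventTimestamp :: Word64   -- assumed: seconds since the Unix epoch
  } deriving (Eq, Show)

-- A named parser: the expected format is visible wherever it is used.
unixSeconds :: Text -> Either String Word64
unixSeconds input = case decimal input of
  Right (n, rest)
    | T.null rest -> Right n
    | otherwise   -> Left "trailing input after Unix timestamp"
  Left err        -> Left err

-- The call site now says which parser handles which field.
parseEvent :: Text -> Text -> Either String Event
parseEvent rawName rawTimestamp =
  Event <$> Right rawName <*> unixSeconds rawTimestamp
```

If eventTimestamp were later changed to UTCTime, parseEvent would stop type-checking instead of quietly picking up a different instance.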
On the topic of Show/Read
By the way, the reason why show/read behave the way they do for Prelude instances and deriving-generated instances can be discovered in the Haskell Report 2010.
On the topic of show it says
The result of show is a syntactically correct Haskell expression
containing only constants [...]
And equivalently for read
The result of show is readable by read if all component types are readable.
(This is true for all instances defined in the Prelude but may not be true
for user-defined instances.) [...]
So show for a string foo produces "foo" because that is the syntactically valid Haskell literal representing the string value of foo, and read will read that back, acting as a kind of eval.
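A small sketch of that round trip, using only Prelude functions and Text.Read.readEither (the top-level names are just labels for this example):

```haskell
import Text.Read (readEither)

-- show yields the Haskell literal, quotes included.
quoted :: String
quoted = show "foo"

-- read accepts that literal and gives back the plain string foo.
roundTrip :: String
roundTrip = read quoted

-- Without the quotes the input is not a string literal, so this is a Left.
unquotedString :: Either String String
unquotedString = readEither "foo"

-- Integer literals carry no quotes, so this is Right 42.
unquotedInt :: Either String Int
unquotedInt = readEither "42"
```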
Let's say I have:
data C = C Bool
and then define:
f :: C -> Int
f (C _) = 0
This is perfectly good, except the match against the underscore is a bit dangerous. It completely ignores the type of the field. That is, if I change the data type later on like this:
data C = C Int
the function f still type-checks just fine.
This is precisely the situation I'd like to avoid: I'd like to get a warning for the definition of f, in the sense that it is way too permissive with regard to changes to the fields of the datatype C.
Note that if I act defensively, I can do:
{-# LANGUAGE ScopedTypeVariables #-}  -- needed for the pattern type signature
f' :: C -> Int
f' (C (_ :: Bool)) = 0
This is nice since if I change the field of C to contain a field with a different type later on, I get a nice error message from GHC. While this is exactly what I'd like, it'd be even better if I can get GHC to warn me about this possible pitfall if I forget to be defensive in the above sense. That is, if I pass, say, -warn-permissive-patternmatch, or annotate the data-type C in some way to require more checking, I'd like to get a warning.
While the above is a contrived example, you can imagine this being useful in a real-life scenario, with a data-type with many constraints and many fields. (As in a compiler intermediate-representation for instance.) When writing functions over these types, we usually simply put an _ for fields we don't care about for a given function. But if we later decide to change the field-type to something different, we'd like to review all these wildcard pattern-matches to make sure the necessary changes are done. It would be nice if GHC told us there's a possible "maintenance" headache in these cases, and the proper solution would be to put the extra annotation as in f' above. Obviously, this should be opt-in, and most likely on a per-data-type-declaration, instead of at the module-level. (Though the latter can be made to work I suppose.)
Is there some trick one can use in GHC to get a similar effect today? Can this be implemented by a compiler plugin? Or, does this belong to GHC proper, and thus will have to be implemented directly by the compiler? How useful would it be for others? What other mechanisms do you deploy to avoid such maintenance issues for long-running projects? I'd especially appreciate feedback from a long-term code maintenance perspective.
Update
Thanks for the comments so far. I do agree that it'd be impossible to get something 100% rock solid, since GHC has no way of foretelling what we intended, nor does it see "previous" versions of the code. To make this more precise, what I'm really looking for is this: if I write a function like f above, GHC should tell me that I should annotate the wildcard to guard against possible changes to the ignored value. That is, it essentially would ask me to write f' instead, which is defensive against this sort of long-term code maintenance issue. (As @amalloy pointed out, there's no "perfect" solution here, but I think an opt-in warning would be nice and can help in large/long-running projects.)
An oft-quoted "strength" of Haskell is that maintenance is easier: Change the data-type and let GHC walk you through all the pieces of code you have to modify. This is one case where that promise fails, unfortunately. I'm trying to figure out a "trick" (or a new feature in GHC) that'd make this strength of Haskell go further with respect to changes of this sort.
I think the goal here is a bit fuzzy. Do you want the compiler to warn you every time you ignore a field in a pattern-match? In that case you can stop using _ altogether, since its purpose is to tell the compiler you are ignoring a field on purpose and to stop warning you. Instead turn on -Wunused-matches and replace all your underscores with variables.
But I don't think you really want that. You want something like _ which will not issue any warnings until you make some change to the type being matched on. I don't think this is something GHC can help you with. Even in your contrived example, the workaround you have (adding a type annotation) doesn't really work. It's fine if you imagine changing the definition of C, but what if you changed the definition of Bool instead? Then the type annotation would still be correct, so you'd get no warning, but your code has still become wrong. Of course in real life we don't change Bool, but a similar problem arises with types you do own. Here is a somewhat more realistic example:
data Location
data TrafficLightColor = Red | Green
data TrafficLight = TrafficLight Location TrafficLightColor
canSpeedUp :: TrafficLight -> Bool
canSpeedUp (TrafficLight loc color) = case (farAway loc, color) of
  (True, _) -> True
  (False, Red) -> False
  (False, _) -> True
In some of the case bodies, you ignore the color with _. So these lines are examples of locations you'd like GHC to help you with, by somehow alerting you anytime the type of color changes. But this is not enough: even if color :: TrafficLightColor remains true, this code can become wrong if someone adds Yellow to the TrafficLightColor definition. The first case is still right, because you really want to ignore any faraway light, but the last case is wrong because you should slow down for yellow lights too.
What I'm getting at is that every usage of _ in any context can potentially become wrong, and GHC has no way to know if a given change to your codebase has made one of these _s wrong or not. It doesn't get a diff between the old version of your code and the new one, so it can't warn you, hey, this new value is falling into the _ case, did you mean to do something different? All it can do is see that you have an unused variable, but you explicitly said you don't care about its value by using _, so it won't bother you. If you want a different behavior, you have to give up the convenience of _.
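To illustrate what giving up _ buys you, here is a simplified sketch of the traffic light example. I pass the distance as a Bool rather than going through Location and farAway, purely to keep it self-contained:

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

data TrafficLightColor = Red | Green

-- Simplified variant: isFarAway stands in for farAway loc.
canSpeedUp :: Bool -> TrafficLightColor -> Bool
canSpeedUp isFarAway color = case (isFarAway, color) of
  (True, _)      -> True   -- far-away lights are ignored on purpose
  (False, Red)   -> False
  (False, Green) -> True   -- spelled out instead of a catch-all _
```

With the last alternative spelled out, adding Yellow to TrafficLightColor makes the case expression non-exhaustive, so GHC reports the missing (False, Yellow) case. The remaining _ in the first alternative stays silent, which is what you want there.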
The Haskell tutorial states that:
by looking at the type signature of read
read :: Read a => String -> a
it follows that GHCI has no way of knowing which type we want in return when running
ghci> read "4"
Why is it necessary to provide a second value from which GHCI can extract a type to compare with?
Wouldn't it be feasible to check a single value against all possible types of the Read typeclass?
Reference:
http://learnyouahaskell.com/types-and-typeclasses
I think you have a (rather common among beginners - I had it myself) misunderstanding of what type classes are. The way Haskell works is logically incompatible with "check[ing] a single value against all possible types of the Read typeclass". Instance selection is based on types. Only types.
You should not think of read as a magical function that can return many types. It's actually a huge family of functions, and the type is used to select which member of the family to use. It's that direction of dependence that matters. Classes create a case where values (usually functions, but not always) - the things that exist at run time - are chosen based on types - the things that exist at compile time.
You're asking "Why not the other direction? Why can't the type depend on the value?", and the answer to that is that Haskell just doesn't work that way. It wasn't designed to, and the theory it was based on doesn't allow it. There is a theory for that (dependent types), and there are extensions being added to GHC that support an increasing set of features covering some aspects of dependent typing, but it's not there yet.
And even if it was, this example would still not work the way you want. Dependent types still need to know what type something is. You couldn't write a magical "returns anything" version of read. Instead, the type for read would have to involve some function that calculates the type from the value, and inherently only works for the closed set of types that function can return.
Those last two paragraphs are kind of an aside, though. The important part is that classes are ways to go from types to values, with handy compiler support to automatically figure it out for you most of the time. That's all they were designed to do, and it's all that they can do. There are advantages to this design, in terms of ease of compilation, predictability of behavior (open world assumption), and ability to optimize at compile time.
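A tiny sketch of that direction of dependence: the input string is identical in both definitions, and only the type signature decides which Read instance runs (the names are just labels for the example).

```haskell
-- Same string, different types: the signature picks the instance.
asInt :: Int
asInt = read "4"        -- uses the Read Int instance

asDouble :: Double
asDouble = read "4"     -- uses the Read Double instance
```

Here asInt is 4 and asDouble is 4.0. Nothing about the string "4" was inspected to choose between them.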
Wouldn't it be feasible to check a single value against all possible types of the Read typeclass?
Doing that would yield the same result; read "4" can potentially be anything that can be read from a String, and that's what ghci reports:
Prelude> :t read "4"
read "4" :: Read a => a
Until you actually do the parsing, the Read a => a represents a potential parsing result. Remember that typeclasses being open means that this could potentially be any type, depending on the presence of the instances.
It's also entirely possible that multiple types could share the same Show/Read textual representation, which brings me to my next point...
If you wanted to check what type the string can be parsed as, that would at the very least require resolving the ambiguity between multiple types that could accept the given input; which means you'd need to know those types beforehand, which Read can't do. And even if you did that, how do you propose such value be then used? You'd need to pack it into something, which implies that you need a closed set again.
All in all, the signature of read is as precise as it can be, given the circumstances.
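To show what "you need a closed set again" looks like in practice, here is a sketch that tries a fixed list of types in order and packs the result into a sum type. Parsed and parseAny are hypothetical names, and the choice and order of candidate types is arbitrary:

```haskell
import Control.Applicative ((<|>))
import Text.Read (readMaybe)

-- The closed set of result types, chosen up front.
data Parsed
  = ParsedInt Int
  | ParsedDouble Double
  | ParsedBool Bool
  deriving (Eq, Show)

-- Try each candidate type in a fixed order; the first success wins.
-- The order is how the ambiguity between Int and Double is resolved.
parseAny :: String -> Maybe Parsed
parseAny s =
      (ParsedInt    <$> readMaybe s)
  <|> (ParsedDouble <$> readMaybe s)
  <|> (ParsedBool   <$> readMaybe s)
```

Note that parseAny "4" comes back as ParsedInt 4 only because Int is listed first; a Double would have accepted the same input.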
Not meant as an answer, but this wouldn't fit into a comment cleanly.
In ghci, if you simply do a read "5", then ghci is going to need some help figuring out what you want it to be. However, if that result is being used somewhere, ghci (and Haskell in general) can figure out the type. For (a silly) example:
add1 :: Int -> Int
add1 i = i + 1
five = read "5"
six = add1 five
In that case, there's no need to annotate the read with a type signature, because ghc can infer it from the fact that five is being used in a function that only takes an Int. If you added another function with a different signature that also tried to use five, you'd end up with a compile error:
-- Adding this to our code above
-- Fails to compile
add1Integer :: Integer -> Integer
add1Integer i = i + 1
sixAsInteger = add1Integer five
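If you do want five to be usable at both types, one option is to give it an explicit polymorphic signature. A sketch (the signatures on six and sixAsInteger are added just for clarity):

```haskell
add1 :: Int -> Int
add1 i = i + 1

add1Integer :: Integer -> Integer
add1Integer i = i + 1

-- With an explicit signature, five stays polymorphic and each use
-- site picks its own Read instance.
five :: Read a => a
five = read "5"

six :: Int
six = add1 five

sixAsInteger :: Integer
sixAsInteger = add1Integer five
```

Without the signature, the monomorphism restriction makes the compiler settle on a single type for five, which is why the second use fails in the example above.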
My primary question is: is there, within some Haskell AST, a way I can determine a list of the available declarations and their types? I'm trying to build an editor that shows the user all the appropriate edits available, such as inserting functions and/or other declared values that can be used at any point. It'll also disallow syntax errors as well as type errors. (That is, it'll be a semantic structural editor, in which I'll also use the typechecker to make sure the pieces being edited make sense in the target language, in this case Haskell.)
The second part of my question is: once I have that list, given a particular expression or function or focussed-on piece of AST (using Lens), how could I filter the list based on what could possibly replace or fit that particular focussed-on AST piece (whether by providing arguments to a function, or if it's a value, just "as-is"). Perhaps I need to add some concrete example here... something like: "Haskell, which declarations could possibly be applied (for functions) and/or placed into the hole at yay x y z = (x + y - z) * _?" then if there was an expression number2 :: Num a => a ; number2 = 23 it would put this in the list, as well as the functions available in the context, as well as those from Num itself such as (+) :: Num a => a -> a -> a, (*) :: Num a => a -> a -> a, and any other declarations that resulted in a type that would match such as Num a => a etc. etc.
More details follow:
I’ve done a fair bit of research into this area over quite a long time: looked at and used hint, Language.Haskell.Exts and Control.Lens a fair bit. Also had a look into Dynamic. Control.Lens is relevant for the second half of my question. I've also looked at quite a few projects along the way including Conal Elliott's "Semantic Editing Combinators", Paul Chiusano's Unison system and quite a few things in Clojure and Lisp as well.
So, I know I can get a list of the exports of a module with hint as [String], and I could coerce that to [Dynamic], I think (possibly?), but I’m not sure how I’d get sub-function declarations and their types. (Maybe I could take the declarations within that scope with AST and put them in their own modules in a String and pull them in by getting the top level declarations with hint? that would work but feels hacky and cumbersome)
I can use (:~:) from Data.Typeable to do "propositional equality" (ie typechecking?) on two terms, but what I actually need to do is see if a term could be matched into a position in the source/AST (I'm using lenses and prisms to focus on those parts of the AST) given some number of arguments. Some kind of partial type-checking, or result type-checking? Because the thing I might be focussing on could very well be a function, and I might need to keep the same arity.
I feel like perhaps this is very similar to Idris' term-searching, though I haven't looked into the source for that and I'm not sure if that's something only possible in a dependently typed language.
Any help would be great.
Looks like I kind of answered my own questions, so I'm going to do so formally here.
The answer to the first part of my question can be found in the Reflection module of the hint library. I knew I could get a list of a module's exports as [String], but there's a function in there with the type getModuleExports :: MonadInterpreter m => ModuleName -> m [ModuleElem], which is most likely the sort of thing I'm after. This is because hint provides access to a large part of the GHC API. It also provides some lookup functions which I can then use to get the types of these top-level terms.
https://github.com/mvdan/hint/blob/master/src/Hint/Reflection.hs#L30
Also, Template Haskell provides some of the functionality I'm interested in, and I'll probably end up using quite a bit of that to build my functions, or at least a set of lenses for whatever syntax is being used by the code (/text) under consideration.
In terms of the second part of the question, I still don't have a particularly good answer, so my first attempt will be to use some String munging on the output of the lookup functions and see what I can do.
I'd like to create a Template Haskell function such that:
$(isInstanceOf ''Read ''SomeType)
will result in either True if SomeType is an instance of Read, and False otherwise.
I tried to look at the result of reify and I think I'm looking for the contents of the ClassI constructor, but the documentation is somewhat lacking and I'm having trouble deciphering what I need. Can someone provide guidance on where to look to find the data needed to create the above function?
Template Haskell already provides a function that does almost what you want. It's there as of version 2.5, and prior to that I'm not aware of any means to look up instances at all.
The difference is that the existing isClassInstance function takes a Name for the class--which is what you get from something like ''Read--but a Type to look for instances with. This probably makes more sense, because with a Name there's no obvious way to check for instances that require type parameters. For example, you wouldn't be able to check directly whether [Int] has a Show instance, or whether Either String is a Monad instance.
Note that a Type can be constructed almost as easily as a Name using a quotation, e.g. you could write something like $(isInstanceOf ''Monad [t| Either String |]).
Given the above, all you'd have to do is a bit of juggling to return a useful value from the splice, whatever you want that to be.
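As a sketch of that juggling: in the template-haskell versions I'm familiar with, the function is exposed as isInstance :: Name -> [Type] -> Q Bool, so that is what is used below instead of isClassInstance. Check the name against the version you have. The top-level names are made up for the example, and lift turns the resulting Bool into an expression for the splice:

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH (Type (ConT), isInstance)
import Language.Haskell.TH.Syntax (lift)

-- Does Int have a Read instance? Decided at compile time.
intHasRead :: Bool
intHasRead = $(isInstance ''Read [ConT ''Int] >>= lift)

-- Function types have no Read instance.
functionHasRead :: Bool
functionHasRead =
  $([t| Int -> Int |] >>= \t -> isInstance ''Read [t] >>= lift)

-- A type quotation lets you ask about partially applied types too.
eitherStringIsMonad :: Bool
eitherStringIsMonad =
  $([t| Either String |] >>= \t -> isInstance ''Monad [t] >>= lift)
```

Each splice is replaced by a literal True or False during compilation, so the answer reflects the instances in scope where the splice is compiled.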