In my app I'm doing a lot of conversions from Text to various datatypes, often just to Text itself, but sometimes to other datatypes.
I also rarely do conversions from other string types, e.g. String and ByteString.
Interestingly, Readable.fromText does the job for me, at least for Integer and Text. However I also now need UTCTime, which Readable.fromText doesn't have an instance for (but which I could write myself).
I was thinking that Readable.fromText was a Text analogy of Text.Read.readEither for [Char], however I've realised that Readable.fromText is actually subtlety different, in that readEither for text isn't just pure, but instead expects the input string to be quoted. This isn't the case however for reading integers however, who don't expect quotes.
I understand that this is because show shows strings with quotes, so for read to be consistent it needs to require quotes.
However this is not the behaviour I want. I'm looking for a typeclass where reading strings to strings is basically the id function.
Readable seems to do this, but it's misleadingly named, as its behaviour is not entirely analogous to read on [Char]. Is there another typeclass that has this behaviour also? Or am I best of just extending Readable, perhaps with newtypes or alternatively PRs?
The what
Just use Data.Text and Data.Text.Read directly
With signed decimal or just decimal you get a simple and yet expressive minimalistic parser function. It's directly usable:
type Reader a = Text -> Either String (a, Text)
decimal :: Integral a => Reader a
signed :: Num a => Reader a -> Reader a
Or you cook up your own runReader :: Reader a -> M a combinator for some M to possibly handle non-empty leftover and deal with the Left case.
For turning a String -> Text, all you have to do is use pack
The why
Disclaimer: The matter of parsing data the right way is answered differently depending on who you ask.
I belong to the school that believes typeclasses are a poor fit for parsing mainly for two reasons.
Typeclasses limit you to one instance per type
You can easily have two different time formats in the data. Now you might tell yourself that you only have one use case, but what if you depend on another library that itself or transitively introduces another instance Readable UTCTime? Now you have to use newtypes for no reason other than be able to select a particular implementation, which is not nice!
Code transparency
You cannot make any inference as to what parser behavior you get from a typename alone. And for the most part haddock instance documentation often does not exist because it is often assumed the behavior be obvious.
Consider for example: What will instance Readable Int64 do?
Will it assume an ASCII encoded numeric representation? Or some binary representation?
If binary, which endianness is going to be assumed?
What representation of signedness is expected? In ASCII case perhaps a minus? Or maybe with a space? Or if binary, is it going to be one-complement? Two-complement?
How will it handle overflow?
Code transparency on call-sites
But the intransparency extends to call-sites as well. Consider the following example
do fieldA <- fromText
fieldB <- fromText
fieldB <- fromText
pure T{..}
What exactly does this do? Which parsers will be invoked? You will have to know the types of fieldA, fieldB and fieldB to answer that question. Now in simple code that might seem obvious, but you might easily forget if you look at the same code 2 weeks from now. Or you have more elaborate code, where the types involves are inferred non-locally. It becomes hard to follow which instance this will end up selecting (and the instance can make a huge difference, especially if you start newtyping for different formats. Say you cannot make any inference from a field name fooTimestamp because it might perhaps be UnixTime or UTCTime)
And much worse: If you refactor and alter one of the field types data declaration from one type to another - say a time field from Word64 to UTCTime - this might silently and unexpectedly switch out to a different parser, leading to a bug. Yuk!
On the topic of Show/Read
By the way, the reason why show/read behave they way they do for Prelude instances and deriving-generated instances can be discovered in the Haskell Report 2010.
On the topic of show it says
The result of show is a syntactically correct Haskell expression
containing only constants [...]
And equivalently for read
The result of show is readable by read if all component types are readable.
(This is true for all instances defined in the Prelude but may not be true
for user-defined instances.) [...]
So show for a string foo produces "foo" because that is the syntactically valid Haskell literal representing the string value of foo, and read will read that back, acting as a kind of eval
Related
The Haskell tutorial states that:
by looking at the type signature of read
read :: Read a => String -> a
it follows that GHCI has no way of knowing which type we want in return when running
ghci> read "4"
Why is it necessary to provide a second value from which GHCI can extract a type to compare with?
Wouldn't it be feasible to check a single value against all possible types of the Read typeclass?
Reference:
http://learnyouahaskell.com/types-and-typeclasses
I think you have a (rather common among beginners - I had it myself) misunderstanding of what type classes are. The way Haskell works is logically incompatible with "check[ing] a single value against all possible types of the Read typeclass". Instance selection is based on types. Only types.
You should not think of read as a magical function that can return many types. It's actually a huge family of functions, and the type is used to select which member of the family to use. It's that direction of dependence that matters. Classes create a case where values (usually functions, but not always) - the things that exist at run time - are chosen based on types - the things that exist at compile time.
You're asking "Why not the other direction? Why can't the type depend on the value?", and the answer to that is that Haskell just doesn't work that way. It wasn't designed to, and the theory it was based on doesn't allow it. There is a theory for that (dependent types), and there are extensions being added to GHC that support an increasing set of feature that do some aspect of dependent typing, but it's not there yet.
And even if it was, this example would still not work the way you want. Dependent types still need to know what type something is. You couldn't write a magical "returns anything" version of read. Instead, the type for read would have to involve some function that calculates the type from the value, and inherently only works for the closed set of types that function can return.
Those last two paragraphs are kind of an aside, though. The important part is that classes are ways to go from types to values, with handy compiler support to automatically figure it out for you most of the time. That's all they were designed to do, and it's all that they can do. There are advantages to this design, in terms of ease of compilation, predictability of behavior (open world assumption), and ability to optimize at compile time.
Wouldn't it be feasible to check a single value against all possible types of the Read typeclass?
Doing that would yield the same result; read "4" can potentially be anything that can be read from a String, and that's what ghci reports:
Prelude> :t read "4"
read "4" :: Read a => a
Until you actually do the parsing, the Read a => a represents a potential parsing result. Remember that typeclasses being open means that this could potentially be any type, depending on the presence of the instances.
It's also entirely possible that multiple types could share the same Show/Read textual representation, which brings me to my next point...
If you wanted to check what type the string can be parsed as, that would at the very least require resolving the ambiguity between multiple types that could accept the given input; which means you'd need to know those types beforehand, which Read can't do. And even if you did that, how do you propose such value be then used? You'd need to pack it into something, which implies that you need a closed set again.
All in all, read signature is as precise it can be, given the circumstances.
Not meant as an answer, but this wouldn't fit into a comment cleanly.
In ghci, if you simply do a read "5", then ghci is going to need some help figuring out what you want it to be. However, if that result is being used somewhere, ghci (and Haskell in general) can figure out the type. For (a silly) example:
add1 :: Int -> Int
add1 i = i + 1
five = read "5"
six = add1 five
In that case, there's no need to annotate the read with a type signature, because ghc can infer it from the fact that five is being used in a function that only takes an Int. If you added another function with a different signature that also tried to use five, you'd end up with a compile error:
-- Adding this to our code above
-- Fails to compile
add1Integer :: Integer -> Integer
add1Integer i = i + 1
sixAsInteger = add1Integer five
My primary queston is: is there, within some Haskell AST, a way I can determine a list of the available declarations, and their types? I'm trying to build an editor that allows for the user to be shown all the appropriate edits available, such as inserting functions and/or other declared values that can be used or inserted at any point. It'll also disallows syntax errors as well as type-errors. (That is, it'll be a semantic structural editor, which I'll also use the typechecker to make sure the editing pieces make sense to in this case, Haskell).
The second part of my question is: once I have that list, given a particular expression or function or focussed-on piece of AST (using Lens), how could I filter the list based on what could possibly replace or fit that particular focussed-on AST piece (whether by providing arguments to a function, or if it's a value, just "as-is"). Perhaps I need to add some concrete example here... something like: "Haskell, which declarations could possibly be applied (for functions) and/or placed into the hole at yay x y z = (x + y - z) * _?" then if there was an expression number2 :: Num a => a ; number2 = 23 it would put this in the list, as well as the functions available in the context, as well as those from Num itself such as (+) :: Num a => a -> a -> a, (*) :: Num a => a -> a -> a, and any other declarations that resulted in a type that would match such as Num a => a etc. etc.
More details follow:
I’ve done a fair bit of research into this area over quite a long time: looked at and used hint, Language.Haskell.Exts and Control.Lens a fair bit. Also had a look into Dynamic. Control.Lens is relevant for the second half of my question. I've also looked at quite a few projects along the way including Conal Elliott's "Semantic Editing Combinators", Paul Chiusano's Unison system and quite a few things in Clojure and Lisp as well.
So, I know I can get a list of the exports of a module with hint as [String], and I could coerce that to [Dynamic], I think (possibly?), but I’m not sure how I’d get sub-function declarations and their types. (Maybe I could take the declarations within that scope with AST and put them in their own modules in a String and pull them in by getting the top level declarations with hint? that would work but feels hacky and cumbersome)
I can use (:~:) from Data.Typeable to do "propositional equality" (ie typechecking?) on two terms, but what I actually need to do is see if a term could be matched into a position in the source/AST (I'm using lenses and prisms to focus on those parts of the AST) given some number of arguments. Some kind of partial type-checking, or result type-checking? Because the thing I might be focussing on could very well be a function, and I might need to keep the same arity.
I feel like perhaps this is very similar to Idris' term-searching, though I haven't looked into the source for that and I'm not sure if that's something only possible in a dependently typed language.
Any help would be great.
Looks like I kind of answered my own questions, so I'm going to do so formally here.
The answer to the first part of my question can be found in the Reflection module of the hint library. I knew I could get a list a [String] of these modules, but there's a function in there that can be used which has type: getModuleExports :: MonadInterpreter m => ModuleName -> m [ModuleElem] and is most likely the sort of thing I'm after. This is because hint provides access to a large part of the GHC API. It also provides some lookup functions which I can then use to get the types of these top level terms.
https://github.com/mvdan/hint/blob/master/src/Hint/Reflection.hs#L30
Also, Template Haskell provides some of the functionality I'm interested in, and I'll probably end up using quite a bit of that to build my functions, or at least a set of lenses for whatever syntax is being used by the code (/text) under consideration.
In terms of the second part of the question, I still don't have a particularly good answer, so my first attempt will be to use some String munging on the output of the lookup functions and see what I can do.
Is there a way to have something equivalent to creating "constructor aliases" in Haskell? I'm thinking similar to type aliases where you can give the type a different name but it still behaves in every way as the aliased type.
My use case is a system where I have an assigned time as a property of some objects I'm modelling, so UTCTime. Some of these could be "variable" times, meaning it might not yet be assigned a time or the time it does have is "movable". So Maybe UTCTime.
But only some of the objects have variable times. Others have fixed times that the system has to take as a constant; a time variable currently assigned to a particular time is not handled the same way as a fixed time. Which now suggests Either UTCTime (Maybe UTCTime); it's either a fixed time or a variable time that might be unassigned.
The generic types seem to fit what I'm trying to model really well, so using them feels natural. But while it's obvious what Either UTCTime (Maybe UTCTime) is, it's not particularly obvious what it means, so some descriptive special-case names would be nice.
A simple type Timeslot = Either UTCTime (Maybe UTCTime) would definitely clean up my type signatures a lot, but that does nothing for the constructors. I can use something like bound = Just to get a name for constructing values, but not for pattern matching.
At the other end I can define a custom ADT with whatever names I want, but then I lose all the predefined functionality of the Either and Maybe types. Or rather I'll be applying transformations back and forth all the time (which I suppose is no worse than the situation with using newtype wrappers for things, only without the efficiency guarantee, but I doubt this would be a bottleneck anyway). And I suppose to understand code using generic Either and Maybe functions to manipulate my Timeslot values I'll need to know the way the standard constructors are mapped to whatever I want to use anyway, and the conversion functions would supply a handy compiler-enforced definition of that mapping. So maybe this is a good approach afterall.
I'm pretty sure I know Haskell well enough to say that there is no such thing as constructor-aliasing, but I'm curious whether there's some hack I don't know about, or some other good way of handling this situation.
Despite the drawbacks you mentioned, I strongly suggest simply creating a fresh ADT for your type; for example
data TimeVariable = Constant UTCTime | Assigned UTCTime | Unassigned
I offer these arguments:
Having descriptive constructors will make your code -- both construction and pattern matching -- significantly more readable. Compare Unassigned and Right Nothing. Now add six months and do the same comparison.
I suspect that as your application grows, you will find that this type needs to expand. Adding another constructor or another field to an existing constructor is much easier with a custom ADT, and it makes it very easy to identify code locations that need to be updated to deal with the new type.
Probably there will not be quite as many sensible operations on this type as there are in the standard library for munging Either and Maybe values -- so I bet you won't be duplicating nearly as much code as you think. And though you may be duplicating some code, giving your functions descriptive names is valuable for the same readability and refactoring reasons that giving your constructors descriptive names is.
I have personally written some code where all my sums were Either and all my products were (,). It was horrible. I could never remember which side of a sum meant which thing; when reading old code I had to constantly remind myself what conceptual type each value was supposed to be (e.g. Right doesn't tell you whether you're using Right here as part of a time variable or part of some other thing that you were too lazy to make an ADT for); I had to constantly mentally expand type aliases; etc. Learn from my pain. ;-)
The 'pattern synonyms' might get merged into ghc: http://ghc.haskell.org/trac/ghc/ticket/5144. In the meantime there is also -XViewPatterns, which lets you write things like:
type Timeslot = Either UTCTime (Maybe UTCTime)
fieldA = either Just (const Nothing)
fieldB = either (const Nothing) id
f (fieldA -> Just time) = ...
f (fieldB -> Just time) = ...
f _ = ...
I am so familiar with imperative language and their features. So, I wonder how I can convert one data type to another?
ex:
in c++
static_cast<>
in c
( data_type) <another_data_type>
Use an explicit coercion function.
For example, fromIntegral will convert any Integral type (Int, Integer, Word*, etc.) into any numeric type.
You can use Hoogle to find the actual function that suits your need by its type.
Haskell's type system is very different and quite a bit smarter then C/C++/Javas. To understand why you are not going to get the answer you expect, it will help to compare the two.
For C and friends the type is a way of describing the layout of data in memory. The compiler does makes a few sanity checks trying to ensure that memory is not corrupted, but in the end its all bytes and you can call them what ever you want. This is even more true of pointers which are always laid out the same in memory but can reference anything or (frighteningly) nothing.
In Haskell, types are a language that one writes to the compiler with. As a programmer you have no control over how the compiler represents data, and because haskell is lazy a lot of data in your program may be no more then a promise to produce a value on demand (called a thunk in the code GHC's and HUGS). While a c compiler can be instructed to treat data differently, there is no equivalent way to tell a haskell compiler to treat one type like another in general.
As mentioned in other answers, there are some types where there are obvious ways to convert one type to another. Any of the numerical types like Double, Fraction, or Real (in general any instance of the Num class) can be made from an Integer, but we need to use a function specifically designed to make this happen. In a sense this is not a 'cast' but an actual function in the same way that \x -> x > 0 is a function for converting numbers into booleans.
I'll make one last guess as to why you might be asking a question like this. When I was just starting with haskell I wrote a lot of functions like:
area :: Double -> Double -> Double -- find the area of a rectangle
area x y = x * y
I would then find myself dumping fromInteger calls all over the place to get my data in the correct type for the function. Having come from a C background I wrote all my functions with monomorphic types. The trick to not needing to cast from one type to another is to write functions that work with different types. Haskell type classes are a huge shift for OOP programmers and so they often get ignored the first couple of tries, but they are what make the otherwise very strict haskell type system usable. If you can relax your type signatures (e.g. area :: (Num a)=> a -> a -> a) you will find yourself wishing for that cast function much less often.
There are many different functions that convert different data types.
The examples would be:
fromIntegral - to convert between say Int and Double
pack / unpack - to convert between ByteString and String
read - to convert from String to Int
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I know newtype is more often compared to data in Haskell, but I'm posing this comparison from more of a design point-of-view than as a technical problem.
In imperitive/OO languages, there is the anti-pattern "primitive obsession", where the prolific use of primitive types reduces the type-safety of a program and introduces accidentally interchangeability of same-typed values, otherwise intended for different purposes. For example, many things can be a String, but it would be nice if a compiler could know, statically, which we mean to be a name and which we mean to be the city in an address.
So, how often then, do Haskell programmers employ newtype to give type distinctions to otherwise primitive values? The use of type introduces an alias and gives a program's readability clearer semantics, but doesn't prevent accidentally interchanges of values. As I learn haskell I notice that the type system is as powerful as any I have come across. Therefore, I would think this is a natural and common practice, but I haven't seen much or any discussion of the use of newtype in this light.
Of course a lot of programmers do things differently, but is this at all common in haskell?
The main uses for newtypes are:
For defining alternative instances for types.
Documentation.
Data/format correctness assurance.
I'm working on an application right now in which I use newtypes extensively. newtypes in Haskell are a purely compile-time concept. E.g. with unwrappers below, unFilename (Filename "x") compiled to the same code as "x". There is absolutely zero run-time hit. There is with data types. This makes it a very nice way to achieve the above listed goals.
-- | A file name (not a file path).
newtype Filename = Filename { unFilename :: String }
deriving (Show,Eq)
I don't want to accidentally treat this as a file path. It's not a file path. It's the name of a conceptual file somewhere in the database.
It's very important for algorithms to refer to the right thing, newtypes help with this. It's also very important for security, for example, consider upload of files to a web application. I have these types:
-- | A sanitized (safe) filename.
newtype SanitizedFilename =
SanitizedFilename { unSafe :: String } deriving Show
-- | Unique, sanitized filename.
newtype UniqueFilename =
UniqueFilename { unUnique :: SanitizedFilename } deriving Show
-- | An uploaded file.
data File = File {
file_name :: String -- ^ Uploaded file.
,file_location :: UniqueFilename -- ^ Saved location.
,file_type :: String -- ^ File type.
} deriving (Show)
Suppose I have this function which cleans a filename from a file that's been uploaded:
-- | Sanitize a filename for saving to upload directory.
sanitizeFilename :: String -- ^ Arbitrary filename.
-> SanitizedFilename -- ^ Sanitized filename.
sanitizeFilename = SanitizedFilename . filter ok where
ok c = isDigit c || isLetter c || elem c "-_."
Now from that I generate a unique filename:
-- | Generate a unique filename.
uniqueFilename :: SanitizedFilename -- ^ Sanitized filename.
-> IO UniqueFilename -- ^ Unique filename.
It's dangerous to generate a unique filename from an arbitrary filename, it should be sanitized first. Likewise, a unique filename is thus always safe by extension. I can save the file to disk now and put that filename in my database if I want to.
But it can also be annoying to have to wrap/unwrap a lot. In the long run, I see it as worth it especially for avoiding value mismatches. ViewPatterns help somewhat:
-- | Get the form fields for a form.
formFields :: ConferenceId -> Controller [Field]
formFields (unConferenceId -> cid) = getFields where
... code using cid ..
Maybe you'll say that unwrapping it in a function is a problem -- what if you pass cid to a function wrongly? Not an issue, all functions using a conference id will use the ConferenceId type. What emerges is a sort of function-to-function-level contract system that is forced at compile time. Pretty nice. So yeah I use it as often as I can, especially in big systems.
I think this is mostly a matter of the situation.
Consider pathnames. The standard prelude has "type FilePath = String" because, as a matter of convenience, you want to have access to all the string and list operations. If you had "newtype FilePath = FilePath String" then you would need filePathLength, filePathMap and so on, or else you would forever be using conversion functions.
On the other hand, consider SQL queries. SQL injection is a common security hole, so it makes sense to have something like
newtype Query = Query String
and then add extra functions that will convert a string into a query (or query fragment) by escaping quote characters, or fill in blanks in a template in the same way. That way you can't accidentally convert a user parameter into a query without going through the quote escaping function.
For simple X = Y declarations, type is documentation; newtype is type checking; this is why newtype is compared to data.
I fairly frequently use newtype for just the purpose you describe: ensuring that something which is stored (and often manipulated) in the same way as another type is not confused with something else. In that way it works just as a slightly more efficient data declaration; there's no particular reason to chose one over the other. Note that with GHC's GeneralizedNewtypeDeriving extension, for either you can automatically derive classes such as Num, allowing your temperatures or yen to be added and subtracted just as you can with the Ints or whatever lies beneath them. One wants to be a little bit careful with this, however; typically one doesn't multiply a temperature by another temperature!
For an idea of how often these things are used, In one reasonably large project I'm working on right now, I have about 122 uses of data, 39 uses of newtype, and 96 uses of type.
But the ratio, as far as "simple" types are concerned, is a bit closer than that demonstrates, because 32 of those 96 uses of type are actually aliases for function types, such as
type PlotDataGen t = PlotSeries t -> [String]
You'll note two extra complexities here: first, it's actually a function type, not just a simple X = Y alias, and second that it's parameterized: PlotDataGen is a type constructor that I apply to another type to create a new type, such as PlotDataGen (Int,Double). When you start doing this kind of thing, type is no longer just documentation, but is actually a function, though at the type level rather than the data level.
newtype is occasionally used where type can't be, such as where a recursive type definition is necessary, but I find this to be reasonably rare. So it looks like, on this particular project at least, about 40% of my "primitive" type definitions are newtypes and 60% are types. Several of the newtype definitions used to be types, and were definitely converted for the exact reasons you mentioned.
So in short, yes, this is a frequent idiom.
I think it is quite common to use newtype for type distinctions. In many cases this is because you want to give different type class instances, or to hide implementations, but simply wanting to protect against accidental conversions is also an obvious reason to do it.