I have recently started messing around with DataKinds in order to have compile-time scientific units for arithmetic. I have more or less figured out a way to do what I want but I feel like it could be a lot cleaner.
I needed integers that could potentially be negative (e.g. m^-1), so I decided to use integers rather than naturals. But as it turns out, when you do :k 5 it gives you GHC.Types.Nat, which does not fit my needs. I ended up making my own custom algebraic integer type, as well as defining addition and subtraction type families to use with it.
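For illustration, a minimal sketch of one possible hand-rolled encoding (not necessarily the poster's actual type; this naive Succ/Pred form doesn't normalize mixed chains like Succ (Pred n)):
{-# LANGUAGE DataKinds, TypeFamilies #-}

-- A unary type-level integer: zero, a successor, or a predecessor.
data ZInt = Zero | Succ ZInt | Pred ZInt

-- Hand-written addition over the promoted ZInt kind.
type family Add (a :: ZInt) (b :: ZInt) :: ZInt where
  Add 'Zero     b = b
  Add ('Succ a) b = 'Succ (Add a b)
  Add ('Pred a) b = 'Pred (Add a b)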
But this all seems very indirect; it seems like there is no good reason why I can't just directly use all the existing functions for manipulating data at compile time within type families.
Basically I want the following to essentially be generated automatically:
type family (a :: Int) + (b :: Int) :: Int where
-- Should be automatically derivable from (+) applied to Int
Is that possible, and if not, why not?
Also is there an easy way to obtain a runtime value back from a type? Specifically when writing a Show instance for all these types I basically just want to pull in the phantom type representing the unit combination and convert it to a string. All the ways I can think of doing that right now seem really verbose.
It looks like the answer to this question is simply that you cannot currently do either of those things automatically. Hopefully it won't take too long before this kind of thing becomes possible.
The Haskell tutorial states that:
by looking at the type signature of read
read :: Read a => String -> a
it follows that GHCi has no way of knowing which type we want in return when running
ghci> read "4"
Why is it necessary to provide a second value from which GHCi can extract a type to compare with?
Wouldn't it be feasible to check a single value against all possible types of the Read typeclass?
Reference:
http://learnyouahaskell.com/types-and-typeclasses
I think you have a (rather common among beginners - I had it myself) misunderstanding of what type classes are. The way Haskell works is logically incompatible with "check[ing] a single value against all possible types of the Read typeclass". Instance selection is based on types. Only types.
You should not think of read as a magical function that can return many types. It's actually a huge family of functions, and the type is used to select which member of the family to use. It's that direction of dependence that matters. Classes create a case where values (usually functions, but not always) - the things that exist at run time - are chosen based on types - the things that exist at compile time.
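For instance, in GHCi the annotation selects which instance's parser runs (prompt text may vary):
ghci> read "4" :: Int
4
ghci> read "4" :: Double
4.0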
You're asking "Why not the other direction? Why can't the type depend on the value?", and the answer to that is that Haskell just doesn't work that way. It wasn't designed to, and the theory it was based on doesn't allow it. There is a theory for that (dependent types), and there are extensions being added to GHC that support an increasing set of features covering some aspects of dependent typing, but it's not there yet.
And even if it was, this example would still not work the way you want. Dependent types still need to know what type something is. You couldn't write a magical "returns anything" version of read. Instead, the type for read would have to involve some function that calculates the type from the value, and inherently only works for the closed set of types that function can return.
Those last two paragraphs are kind of an aside, though. The important part is that classes are ways to go from types to values, with handy compiler support to automatically figure it out for you most of the time. That's all they were designed to do, and it's all that they can do. There are advantages to this design, in terms of ease of compilation, predictability of behavior (open world assumption), and ability to optimize at compile time.
Wouldn't it be feasible to check a single value against all possible types of the Read typeclass?
Doing that would yield the same result; read "4" can potentially be anything that can be read from a String, and that's what ghci reports:
Prelude> :t read "4"
read "4" :: Read a => a
Until you actually do the parsing, the Read a => a represents a potential parsing result. Remember that typeclasses being open means that this could potentially be any type, depending on the presence of the instances.
It's also entirely possible that multiple types could share the same Show/Read textual representation, which brings me to my next point...
If you wanted to check what type the string can be parsed as, that would at the very least require resolving the ambiguity between multiple types that could accept the given input; which means you'd need to know those types beforehand, which Read can't do. And even if you did that, how do you propose such a value then be used? You'd need to pack it into something, which implies that you need a closed set again.
All in all, read's signature is as precise as it can be, given the circumstances.
Not meant as an answer, but this wouldn't fit into a comment cleanly.
In GHCi, if you simply do a read "5", then GHCi is going to need some help figuring out what you want it to be. However, if that result is being used somewhere, GHCi (and Haskell in general) can figure out the type. For (a silly) example:
add1 :: Int -> Int
add1 i = i + 1
five = read "5"
six = add1 five
In that case, there's no need to annotate the read with a type signature, because GHC can infer it from the fact that five is being used in a function that only takes an Int. If you added another function with a different signature that also tried to use five, you'd end up with a compile error:
-- Adding this to our code above
-- Fails to compile
add1Integer :: Integer -> Integer
add1Integer i = i + 1
sixAsInteger = add1Integer five
I have a data structure, such as an expression tree or graph. I want to add some "measure" functions, such as depth and size.
How best to type these functions?
I see the following three variants as of roughly equal usefulness:
depth :: Expr -> Int
depth :: Expr -> Integer
depth :: Num a => Expr -> a
I have the following considerations in mind:
I'm looking at base and fgl as examples, and they consistently use Int, but Data.List also has functions such as genericLength that are polymorphic in return type, and I am thinking that maybe the addition of these generic functions is a reflection of a modernizing trend that I probably should respect and reinforce.
A similar movement of thought is noticeable in some widely used libraries providing a comprehensive set of functions with the same functionality when there are several probable choices of a return type to be desired by the user (e.g. xml-conduit offers parsers that accept both lazy and strict kinds of either ByteString or Text).
Integer is a nicer type overall than Int, and I sometimes find that I need to cast the length of a list to an Integer, say because an algorithm that operates in Integer needs to take this length into account.
Making functions return an Integral means these functions are made polymorphic, which may carry a performance penalty. I don't know all the particulars well, but, as I understand it, there may be some run-time cost, and polymorphic things are harder to memoize.
What is the accepted best practice? Which part of it is due to legacy and compatibility considerations? (I.e. if Data.List was designed today, what type would functions such as length have?) Did I miss any pros and cons?
Short answer: As a general rule use Int, and if you need to convert it to something else, use fromIntegral. (If you find yourself doing the conversion a lot, define fi = fromIntegral to save typing or else create your own wrapper.)
The main consideration is performance. You want to write the algorithm so it uses an efficient integer type internally. Provided Int is big enough for whatever calculation you're doing (the standard guarantees a signed 30-bit integer, but even on 32-bit platforms using GHC, it's a signed 32-bit integer), you can assume it will be a high-speed integer type on the platform, particularly in comparison to Integer (which has boxing and bignum calculation overhead that can't be optimized away). Note that the performance differences can be substantial. Simple counting algorithms will often be 5-10x faster using Ints compared to Integers.
While you could give your function a different signature:
depth :: Expr -> Integer
depth :: (Num a) => Expr -> a
and actually implement it under the hood using the efficient Int type, doing the conversion at the end, making the conversion implicit strikes me as poor practice. Particularly if this is a library function, making it clear that Int is being used internally by making it part of the signature strikes me as more sensible.
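For illustration, a small sketch of that recommendation (the Expr type here is hypothetical, just a stand-in for whatever tree you have):
data Expr = Leaf | Node Expr Expr

depth :: Expr -> Int            -- Int in the signature, Int internally
depth Leaf       = 0
depth (Node l r) = 1 + max (depth l) (depth r)

-- Callers that really need another numeric type convert explicitly:
depthInteger :: Expr -> Integer
depthInteger = fromIntegral . depth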
With respect to your listed considerations:
First, the generic* functions in Data.List aren't modern. In particular, genericLength was available in GHC 0.29, released July, 1996. At some prior point, length had been defined in terms of genericLength, as simply:
length :: [a] -> Int
length = genericLength
but in GHC 0.29, this definition was commented out with an #ifdef USE_REPORT_PRELUDE, and several hand-optimized variants of length were defined independently. The other generic* functions weren't in 0.29, but they were already around by GHC 4.02 (1998).
Most importantly, when the Prelude version of length was generalized from lists to Foldables, which is a fairly recent development (since GHC 7.10?), nobody cared enough to do anything with genericLength. I also don't think I've ever seen these functions used "in the wild" in any serious Haskell code. For the most part, you can think of them as deprecated.
Second, the use of lazy/strict and ByteString/Text variants in libraries represents a somewhat different situation. In particular, an xml-conduit user will normally be making the decision between lazy and strict variants and between ByteString and Text types based on considerations about the data being processed and the construction of the algorithms that are far-reaching and pervade the entire type system of a given program. If the only way to use xml-conduit with a lazy Text type was to convert it piecemeal to strict ByteStrings, pass it to the library, and then pull it back out and convert it back to a lazy Text type, no one would accept that complexity. In contrast, a monomorphic Int-based definition of depth works fine, because all you need is fromIntegral . depth to adapt it to any numeric context.
Third, as noted above, Integer is only a "nicer" type from the standpoint of having arbitrary precision in situations where you don't care about performance. For things like depth and count in any practical setting, performance is likely to be more important than unlimited precision.
Fourth, I don't think either the runtime cost or failure-to-memoize should be serious considerations in choosing between polymorphic or non-polymorphic versions here. In most situations, GHC will generate a specialized version of the polymorphic function in a context where memoization is no problem.
On this basis, I suspect if Data.List was designed today, it would still use Ints.
I agree with all the points in K. A. Buhr's great answer, but here are a couple more:
You should use a return type of Integer if you expect to support an expression tree that somehow doesn't fit in memory (which seems interesting, but unlikely). If I saw Expr -> Integer I would go looking in the code or docs to try to understand how or why the codomain might be so large.
Re. performance of Integer: normal machine-word arithmetic will be used if the number is not larger than the max width of a machine word. Simplifying, the type is basically:
data Integer = SmallInteger Int | LargeInteger ByteArray
K. A. Buhr mentions that there is an unavoidable performance penalty: this value cannot be unboxed (that is, it will always have a heap representation and will be read from and written to memory), and that sounds right to me.
In contrast, functions on Int (or Word) are often unboxed, so that in Core you will see types that look like Int# -> Int# -> Int#. You can think of an Int# as existing only in a machine register. This is what you want your numeric code to look like if you care about performance.
Re. polymorphic versions: designing libraries around concrete numeric inputs and polymorphic numeric outputs probably works okay, in terms of convenient type inference. We already have this to a certain degree, in that numeric literals are overloaded. There are certainly times when literals (e.g. also string literals when -XOverloadedStrings) need to be given type signatures, and so I'd expect that if base were designed to be more polymorphic that you would run into more occasions where this would be required (but fewer uses of fromIntegral).
Another option you haven't mentioned is using Word to express that the depth is non-negative. This is more useful for inputs, but even then it's often not worth it: Word will still overflow, and negative literals are valid (although GHC will issue a warning); to a certain extent it just moves where the bug occurs.
I am currently writing my very first program in Haskell.
In the specification I am working with, [0] 5 is used to define a MAC key that could be written "\x00\x00\x00\x00\x00"::ByteString.
I somewhat fancy the idea of reusing that notation (even though it makes very little sense from a programming perspective). Eventually writing mackey so that mackey [0] 5 does the right thing was simple enough.
The only question that remains is how to define my input type so that it enforces the use of a list with a single integer element. Is that even possible?
NB: normally, I wouldn't bother too much about that. I shouldn't even use a list in such case: a simple Int would be enough and "enforce" everything I need; so I know that the correct way is to use a simple integer. But this is a very good way to explore what can be done (or not) with Haskell type system. :)
As you've observed yourself, a single Int does exactly what's needed and is probably the way to go. Don't use a list if you don't want a list!
That said, using a plain Int may not be the best thing either. Perhaps you want to be clear about the meaning of each argument. You might for that purpose make a wrapper for Int and name it accordingly:
newtype KeyWord = KeyWord Int
macKey :: KeyWord -> Int -> MAC
In this case the syntax at the call site would then be macKey (KeyWord 0) 5.
It would be possible to shorten that a bit more, but it's probably not worth it. In fact, even the newtype is probably overkill – the main benefit is that the type signature becomes more explicit, but for calling the function this is mostly boilerplate. A simple type-alias is probably enough:
type KeyWord = Int
and then you can again write macKey 0 5 while retaining the clear signature.
If you need to write out lots of those keys in a concise manner, you might consider making macKey an infix operator:
infix 7 #*
(#*) :: KeyWord -> Int -> MAC
and then write 0 #* 5.
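For completeness, a sketch pulling those pieces together; MAC and the key construction here are placeholders, since the question doesn't show the real mackey body:
import qualified Data.ByteString as BS
import Data.Word (Word8)

type KeyWord = Int
type MAC = BS.ByteString   -- hypothetical stand-in for the real MAC key type

infix 7 #*
(#*) :: KeyWord -> Int -> MAC
w #* n = BS.pack (replicate n (fromIntegral w :: Word8))

-- 0 #* 5  ==  BS.pack [0,0,0,0,0]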
My primary question is: is there, within some Haskell AST, a way I can determine a list of the available declarations, and their types? I'm trying to build an editor that allows for the user to be shown all the appropriate edits available, such as inserting functions and/or other declared values that can be used or inserted at any point. It'll also disallow syntax errors as well as type errors. (That is, it'll be a semantic structural editor; I'll also use the typechecker to make sure the edited pieces make sense in the language at hand, in this case Haskell.)
The second part of my question is: once I have that list, given a particular expression or function or focussed-on piece of AST (using Lens), how could I filter the list based on what could possibly replace or fit that particular focussed-on AST piece (whether by providing arguments to a function, or, if it's a value, just "as-is")? Perhaps I need a concrete example here... something like: "Haskell, which declarations could possibly be applied (for functions) and/or placed into the hole in yay x y z = (x + y - z) * _?" Then, if there were an expression number2 :: Num a => a; number2 = 23, it would be put in the list, along with the functions available in the context, those from Num itself such as (+) :: Num a => a -> a -> a and (*) :: Num a => a -> a -> a, and any other declarations that result in a matching type such as Num a => a, and so on.
More details follow:
I’ve done a fair bit of research into this area over quite a long time: looked at and used hint, Language.Haskell.Exts and Control.Lens a fair bit. Also had a look into Dynamic. Control.Lens is relevant for the second half of my question. I've also looked at quite a few projects along the way including Conal Elliott's "Semantic Editing Combinators", Paul Chiusano's Unison system and quite a few things in Clojure and Lisp as well.
So, I know I can get a list of the exports of a module with hint as a [String], and I could coerce that to [Dynamic], I think (possibly?), but I'm not sure how I'd get sub-function declarations and their types. (Maybe I could take the declarations within that scope with the AST, put them in their own modules as a String, and pull them in by getting the top-level declarations with hint? That would work, but it feels hacky and cumbersome.)
I can use (:~:) from Data.Typeable to do "propositional equality" (ie typechecking?) on two terms, but what I actually need to do is see if a term could be matched into a position in the source/AST (I'm using lenses and prisms to focus on those parts of the AST) given some number of arguments. Some kind of partial type-checking, or result type-checking? Because the thing I might be focussing on could very well be a function, and I might need to keep the same arity.
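For reference, this is roughly what that (:~:)/eqT check amounts to (the sameType helper is purely illustrative):
{-# LANGUAGE ScopedTypeVariables, TypeOperators #-}
import Data.Typeable (Typeable, eqT)
import Data.Type.Equality ((:~:)(..))

-- True exactly when the two arguments have the same (monomorphic) type.
sameType :: forall a b. (Typeable a, Typeable b) => a -> b -> Bool
sameType _ _ = case eqT :: Maybe (a :~: b) of
                 Just Refl -> True
                 Nothing   -> False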
I feel like perhaps this is very similar to Idris' term-searching, though I haven't looked into the source for that and I'm not sure if that's something only possible in a dependently typed language.
Any help would be great.
Looks like I kind of answered my own questions, so I'm going to do so formally here.
The answer to the first part of my question can be found in the Reflection module of the hint library. I knew I could get a [String] list of these modules, but there's a function in there with the type getModuleExports :: MonadInterpreter m => ModuleName -> m [ModuleElem], and it is most likely the sort of thing I'm after. This is because hint provides access to a large part of the GHC API. It also provides some lookup functions which I can then use to get the types of these top-level terms.
https://github.com/mvdan/hint/blob/master/src/Hint/Reflection.hs#L30
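As a rough illustration, something along these lines should work with hint (a sketch only; the module choice and error handling are arbitrary):
import Language.Haskell.Interpreter

-- Print every export of Data.List together with its type.
listExports :: IO ()
listExports = do
  r <- runInterpreter $ do
         setImports ["Prelude", "Data.List"]
         elems <- getModuleExports "Data.List"
         mapM describe elems
  either print (mapM_ putStrLn) r
  where
    describe (Fun name) = do t <- typeOf ("(" ++ name ++ ")")  -- parens so operators parse too
                             pure (name ++ " :: " ++ t)
    describe other      = pure (show other)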
Also, Template Haskell provides some of the functionality I'm interested in, and I'll probably end up using quite a bit of that to build my functions, or at least a set of lenses for whatever syntax is being used by the code (/text) under consideration.
In terms of the second part of the question, I still don't have a particularly good answer, so my first attempt will be to use some String munging on the output of the lookup functions and see what I can do.
I am quite familiar with imperative languages and their features, so I wonder: how can I convert one data type to another?
For example:
in C++:
static_cast<data_type>(value)
in C:
(data_type) value
Use an explicit coercion function.
For example, fromIntegral will convert any Integral type (Int, Integer, Word*, etc.) into any numeric type.
You can use Hoogle to find the actual function that suits your need by its type.
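For instance, a common pattern (the average function here is just illustrative):
-- length returns an Int; fromIntegral lets the context pick the numeric type.
average :: [Double] -> Double
average xs = sum xs / fromIntegral (length xs)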
Haskell's type system is very different from, and quite a bit smarter than, C/C++/Java's. To understand why you are not going to get the answer you expect, it will help to compare the two.
For C and friends, the type is a way of describing the layout of data in memory. The compiler does make a few sanity checks trying to ensure that memory is not corrupted, but in the end it's all bytes and you can call them whatever you want. This is even more true of pointers, which are always laid out the same in memory but can reference anything or (frighteningly) nothing.
In Haskell, types are a language that one writes to the compiler with. As a programmer you have no control over how the compiler represents data, and because Haskell is lazy, a lot of data in your program may be no more than a promise to produce a value on demand (called a thunk in GHC and Hugs). While a C compiler can be instructed to treat data differently, there is no equivalent way to tell a Haskell compiler to treat one type like another in general.
As mentioned in other answers, there are some types with obvious ways to convert one type to another. Any of the numeric types like Double or Rational (in general, any instance of the Num class) can be made from an Integer, but we need to use a function specifically designed to make this happen. In a sense this is not a 'cast' but an actual function, in the same way that \x -> x > 0 is a function for converting numbers into booleans.
I'll make one last guess as to why you might be asking a question like this. When I was just starting with Haskell, I wrote a lot of functions like:
area :: Double -> Double -> Double -- find the area of a rectangle
area x y = x * y
I would then find myself dumping fromInteger calls all over the place to get my data into the correct type for the function. Having come from a C background, I wrote all my functions with monomorphic types. The trick to not needing to cast from one type to another is to write functions that work with different types. Haskell type classes are a huge shift for OOP programmers, and so they often get ignored the first couple of tries, but they are what make the otherwise very strict Haskell type system usable. If you can relax your type signatures (e.g. area :: Num a => a -> a -> a), you will find yourself wishing for that cast function much less often.
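For example, with the relaxed signature the same function can be used at several numeric types without any conversion (a small sketch):
area :: Num a => a -> a -> a
area x y = x * y

main :: IO ()
main = do
  print (area (3 :: Int) 4)          -- 12
  print (area (2.5 :: Double) 4.0)   -- 10.0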
There are many different functions that convert between different data types.
Some examples:
fromIntegral - to convert from an Integral type like Int to, say, Double
pack / unpack - to convert between ByteString and String
read - to convert from String to, say, Int (or any other Read instance)
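For illustration, a few small examples of these in use (Data.ByteString.Char8 is just one of several pack/unpack variants):
import qualified Data.ByteString.Char8 as BS

main :: IO ()
main = do
  print (fromIntegral (3 :: Int) :: Double)   -- 3.0
  print (read "42" :: Int)                    -- 42
  print (BS.unpack (BS.pack "hello"))         -- "hello"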