So I'm learning Scala and I came across this finicky issue...
If we have a String and want to convert it to an Int, all examples that I've found online say, "It's so simple! Just use String.toInt!"
Okay:
var x = readLine().toInt
Simple enough. I'm assuming toInt is a function of String. But if that's the case, I should be able to call toInt with parentheses, right? Coming from Java, this feels more natural to me:
var x = readLine().toInt()
But alas! Scala gives me the following error:
[error] /home/myhome/code/whatever/hello.scala:13: Int does not take
parameters
[error] var x = readLine().toInt()
Curious. What does this error mean? Is toInt not a function of String? Similarly, why can I do both:
var x = readLine().toLowerCase()
var y = readLine().toLowerCase
without any problems?
Edit: The duplicate question does not address the toLowerCase vs toInt issue.
This is a good question, and I don't think it counts as a duplicate of the question about defining zero-arity methods with or without parentheses.
The difference between toInt and toLowerCase here is that toInt is a syntactic enrichment provided by the standard library's StringOps class. You can check this by using reify and showCode in the REPL:
scala> import scala.reflect.runtime.universe.{ reify, showCode }
import scala.reflect.runtime.universe.{reify, showCode}
scala> showCode(reify("1".toInt).tree)
res0: String = Predef.augmentString("1").toInt
This just desugars the enrichment method call and shows you what implicit conversion has been applied to support it.
If we look at the toInt method on StringOps, we see that it's defined without parentheses. As the answer to the possible duplicate question points out, if a zero-arity method in Scala is defined without parentheses, it can't be called with parentheses (although if it's defined with parentheses it can be called either way).
So that's why "1".toInt() doesn't work. If String were a normal Scala class following normal Scala naming conventions, "ABC".toLowerCase() wouldn't work either: the method would be defined without parentheses, since it's not side-effecting (this is just a convention, but it tends to be applied pretty consistently in Scala code).
The problem is that String isn't actually a class in the Scala standard library—it's just java.lang.String. At the JVM level, there's no difference between a zero-arity method defined with parentheses and one defined without parentheses—this is purely a Scala distinction, and it's not encoded in the JVM method signature, but in a separate batch of metadata stored in the class file.
The Scala language designers decided to treat all zero-arity methods that aren't defined in Scala (and therefore don't have this extra metadata) as if they were defined with parentheses. Since the toLowerCase method on String is really and truly a method of the java.lang.String class (not a syntactic enrichment method like toInt), it falls into this category, which means you can call it either way.
In my app I'm doing a lot of conversions from Text to various datatypes, often just to Text itself, but sometimes to other datatypes.
I also rarely do conversions from other string types, e.g. String and ByteString.
Interestingly, Readable.fromText does the job for me, at least for Integer and Text. However I also now need UTCTime, which Readable.fromText doesn't have an instance for (but which I could write myself).
I was thinking that Readable.fromText was a Text analogue of Text.Read.readEither for [Char]. However, I've realised that Readable.fromText is actually subtly different: readEither for strings isn't just pure; it expects the input string to be quoted. This isn't the case for reading integers, which don't expect quotes.
I understand that this is because show shows strings with quotes, so for read to be consistent it needs to require quotes.
However this is not the behaviour I want. I'm looking for a typeclass where reading strings to strings is basically the id function.
Readable seems to do this, but it's misleadingly named, as its behaviour is not entirely analogous to read on [Char]. Is there another typeclass that has this behaviour? Or am I best off just extending Readable, perhaps with newtypes, or alternatively with PRs?
The what
Just use Data.Text and Data.Text.Read directly
With signed decimal or just decimal you get a simple yet expressive, minimalistic parser function. It's directly usable:
type Reader a = Text -> Either String (a, Text)
decimal :: Integral a => Reader a
signed :: Num a => Reader a -> Reader a
Or you cook up your own runReader :: Reader a -> Text -> M a combinator for some M, to handle possible non-empty leftover input and deal with the Left case.
For turning a String into a Text, all you have to do is use pack.
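Tying the above together, here is a minimal sketch of such a combinator (parseInt is a hypothetical name; it specialises M to Either String and rejects leftover input):

import Data.Text (Text, pack)
import qualified Data.Text as T
import Data.Text.Read (decimal, signed)

-- Accept an optional sign and require the whole input to be consumed.
parseInt :: Text -> Either String Int
parseInt t = case signed decimal t of
  Right (n, rest)
    | T.null rest -> Right n
    | otherwise   -> Left ("leftover input: " ++ T.unpack rest)
  Left err -> Left err

main :: IO ()
main = do
  print (parseInt (pack "-42"))   -- Right (-42)
  print (parseInt (pack "12xy"))  -- Left "leftover input: xy"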
The why
Disclaimer: The matter of parsing data the right way is answered differently depending on who you ask.
I belong to the school that believes typeclasses are a poor fit for parsing, mainly for two reasons.
Typeclasses limit you to one instance per type
You can easily have two different time formats in the data. Now you might tell yourself that you only have one use case, but what if you depend on another library that itself, or transitively, introduces another instance Readable UTCTime? Now you have to use newtypes for no reason other than to be able to select a particular implementation, which is not nice!
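As a sketch of that workaround (FromText and UnixSeconds are hypothetical names, standing in for Readable and a second time format):

import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Read (decimal)

class FromText a where
  fromText :: Text -> Either String a

-- One instance per type: parsing the same underlying type from a
-- second format forces a wrapper per format.
newtype UnixSeconds = UnixSeconds Integer deriving Show

instance FromText UnixSeconds where
  fromText t = case decimal t of
    Right (n, rest) | T.null rest -> Right (UnixSeconds n)
    _ -> Left "expected a whole number of seconds"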
Code transparency
You cannot infer what parser behavior you get from a type name alone. And Haddock instance documentation often does not exist, because the behavior is assumed to be obvious.
Consider for example: What will instance Readable Int64 do?
Will it assume an ASCII encoded numeric representation? Or some binary representation?
If binary, which endianness is going to be assumed?
What representation of signedness is expected? In the ASCII case, perhaps a minus sign? Or maybe a space? Or, if binary, is it going to be one's complement? Two's complement?
How will it handle overflow?
Code transparency on call-sites
But the opacity extends to call sites as well. Consider the following example:
do fieldA <- fromText
   fieldB <- fromText
   fieldC <- fromText
   pure T{..}
What exactly does this do? Which parsers will be invoked? You have to know the types of fieldA, fieldB and fieldC to answer that question. In simple code that might seem obvious, but you might easily forget when you look at the same code two weeks from now, or in more elaborate code the types involved may be inferred non-locally. It becomes hard to follow which instance this will end up selecting, and the instance can make a huge difference, especially if you start newtyping for different formats: you cannot infer anything from a field name like fooTimestamp, because it might be UnixTime or perhaps UTCTime.
And much worse: if you refactor and alter one of the field types in the data declaration, say a time field from Word64 to UTCTime, this might silently and unexpectedly switch to a different parser, leading to a bug. Yuk!
On the topic of Show/Read
By the way, the reason why show/read behave the way they do for Prelude instances and deriving-generated instances can be found in the Haskell 2010 Report.
On the topic of show it says
The result of show is a syntactically correct Haskell expression
containing only constants [...]
And equivalently for read
The result of show is readable by read if all component types are readable.
(This is true for all instances defined in the Prelude but may not be true
for user-defined instances.) [...]
So show for a string foo produces "foo" because that is the syntactically valid Haskell literal representing the string value of foo, and read will read that back, acting as a kind of eval.
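A quick illustration of that round trip:

main :: IO ()
main = do
  putStrLn (show "foo")             -- "foo"  (quotes included)
  print (read "\"foo\"" :: String)  -- "foo"
  print (read "42" :: Int)          -- 42  (no quotes for numbers)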
As I learn Haskell, I can't help but try to understand everything from a formal point of view. After all this is the theoretical coherence I came to look for as a Scala Programmer.
One thing that does not compute well in my mind is fitting the meaning of a type declaration into the overarching theory of lambda calculus, where everything is an expression. Yes, there is binding, but binding does not work for type declarations either.
Example
data DataType a = Data a | Datum
Question:
What is the meaning of = here? If this were the declaration of a function, then on the RHS we would have an expression reducible to an irreducible value, or to another expression (i.e. returning a function). This is not what we have above.
My confusion
We have a type-level function in DataType a, and data constructors (a.k.a. value constructors) in Data a and Datum. Clearly a type is not equal to a value: one is at the term level and the other at the type level, not the same space at all. It might work to follow the pronunciation provided at https://wiki.haskell.org/Pronunciation, where = is pronounced "is", as in "a DataType is those values". But that is a stretch, because it is not the same semantics as for a function declaration. Hence I'm puzzled: a type-level function being equal to value-level functions makes no sense to me.
My question, reformulated differently to explain what I am looking for
So, in a sense, I would like to understand the semantics of a data declaration, and where it fits into the "everything is an expression" view of Haskell's theoretical framework.
Also, to clarify the difference with binding: generally speaking, can we say a binding is an expression of type Unit? If not, what is it, what is its type, and where does it fit in the lambda calculus, or whatever we should call the theoretical framework that backs Haskell?
I think the simplest answer here is that = doesn’t have any one meaning in particular — it is simply the syntax which the Haskell designers decided to use for data types. There is no particular relationship between the = used in function definitions and the = used in ADT definitions; both constructions are used to declare something on the LHS using some sort of specification on the RHS, but that’s about it. The two constructions have more differences than similarities.
As it happens, modern Haskell doesn’t necessarily require the use of = when defining an ADT. The GADTSyntax and GADTs GHC language extensions enable ‘GADT syntax’, which is an alternative way to define ADTs. It looks like this:
data DataType a where
  Data :: a -> DataType a
  Datum :: DataType a
This illustrates that the use of = is not necessarily a mandatory part of type declarations: it is simply one convention amongst many.
(As to why the Haskell designers chose to use the convention with =, I’m not entirely sure. Possibly it might have been by analogy with type synonyms, like type Synonym = Maybe Int, in which = does have similarities to its use in function definitions, in that the RHS expresses a type which is then assigned the name on the LHS.)
Since you mentioned you are a Scala programmer, here is what DataType might look like in Scala:
sealed trait DataType[+A]
case class Data[A](a: A) extends DataType[A]
object Datum extends DataType[Nothing]
I read your Haskell code as "A DataType a can be either a Data a or Datum". DataType is a type constructor, whereas Data is a value constructor and Datum is a value. You can think of Data as a function Data :: a -> DataType a, and Datum as something like Datum :: forall a. DataType a (imagine the names were lowercase). The RHS tells you which functions you can use to get a value of type LHS.
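To make the "constructors are functions" reading concrete, here is a small runnable sketch (names reused from the question):

data DataType a = Data a | Datum deriving Show

-- Data is an ordinary function of type a -> DataType a, so it can be
-- passed to map; Datum is a plain value of type DataType a.
values :: [DataType Int]
values = map Data [1, 2, 3] ++ [Datum]

main :: IO ()
main = print values  -- [Data 1,Data 2,Data 3,Datum]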
When I try to compile this with GHC, it complains that the numbers of parameters in the left-hand sides of the function definition are different.
module Example where
import Data.Maybe
from_maybe :: a -> Maybe a -> a
from_maybe a Nothing = a
from_maybe _ = Data.Maybe.fromJust
I'm wondering whether this is a GHC restriction. I tried to see if I could find anything about the number of parameters in the Haskell 2010 Report, but I wasn't successful.
Is this legal Haskell or isn't it? If not, where is this parameter-count restriction listed?
It's not legal. The restriction is described in the Haskell 2010 Report:
4.4.3.1 Function bindings
[...]
Note that all clauses defining a function must be contiguous, and the number of patterns in each clause must be the same.
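For the code in the question, the minimal fix is to give both clauses the same number of patterns:

from_maybe :: a -> Maybe a -> a
from_maybe a Nothing  = a
from_maybe _ (Just x) = x
-- (this is exactly Data.Maybe.fromMaybe)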
The answer from melpomene sums up the syntax rules. In this answer I want to try to describe why this has to be the case.
From the point of view of the programmer, the syntax in the question seems quite reasonable. There is a first, specific case, which is marked by a value in the second parameter. Then there is a general solution, which reduces to an existing function. Either pattern on its own is valid, so why can they not co-exist?
The rule that all clauses must have exactly the same number of patterns is a necessary result of Haskell being a lazy language. Being lazy means that the values of parameters are not evaluated until they are needed.
Let's look at what happens when we partially apply a function, that is, provide it with fewer arguments than the number of parameters in the declaration. When that happens, Haskell produces an anonymous function* that will complete the call, and stores the arguments supplied so far in the scope of that function without evaluating them.
That last part is important. If clauses could have different numbers of patterns, some arguments would have to be evaluated just to decide which clause applies, even though laziness requires that they remain unevaluated.
In other words, the compiler needs to know exactly how many arguments a function takes before it evaluates any of them to choose among the patterns. So while it seems like we should not need to put an extra _ in, the compiler needs it to be there.
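A small illustration of how a partial application stores its argument without evaluating it:

f :: Int -> Int -> Int
f _ y = y

main :: IO ()
main = do
  let g = f undefined  -- undefined is stored, not forced
  print (g 5)          -- prints 5; the first argument is never evaluated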
As an aside, the order in which parameters occur can make a big difference in how easy a function is to implement and use. See the answers to my question Ordering of parameters to make use of currying for suggestions. If the order is reversed on the example function, it can be implemented with only the one named point.
from_maybe :: Maybe a -> a -> a
from_maybe Nothing = id
from_maybe (Just x) = const x
*: What an implementation like GHC actually does in this situation is subject to optimization and may not work exactly this way, but the end result must be the same as if it did.
My primary question is: is there, within some Haskell AST, a way I can determine a list of the available declarations, and their types? I'm trying to build an editor that shows the user all the appropriate edits available, such as inserting functions and/or other declared values that can be used or inserted at any point. It'll also disallow syntax errors as well as type errors. (That is, it'll be a semantic structural editor; I'll also use the typechecker to make sure the edited pieces make sense in the language at hand, in this case Haskell.)
The second part of my question is: once I have that list, given a particular expression or function or focussed-on piece of AST (using Lens), how could I filter the list based on what could possibly replace or fit that particular focussed-on AST piece (whether by providing arguments to a function, or, if it's a value, just "as-is")? Perhaps I need to add a concrete example here, something like: "Haskell, which declarations could possibly be applied (for functions) and/or placed into the hole in yay x y z = (x + y - z) * _?" Then, if there were an expression number2 :: Num a => a ; number2 = 23, it would put this in the list, as well as the functions available in the context, those from Num itself such as (+) :: Num a => a -> a -> a and (*) :: Num a => a -> a -> a, and any other declarations resulting in a type that would match, such as Num a => a, etc.
More details follow:
I’ve done a fair bit of research into this area over quite a long time: looked at and used hint, Language.Haskell.Exts and Control.Lens a fair bit. Also had a look into Dynamic. Control.Lens is relevant for the second half of my question. I've also looked at quite a few projects along the way including Conal Elliott's "Semantic Editing Combinators", Paul Chiusano's Unison system and quite a few things in Clojure and Lisp as well.
So, I know I can get a list of the exports of a module with hint as [String], and I could coerce that to [Dynamic], I think (possibly?), but I'm not sure how I'd get sub-function declarations and their types. (Maybe I could take the declarations within that scope with the AST, put them in their own modules in a String, and pull them in by getting the top-level declarations with hint? That would work, but it feels hacky and cumbersome.)
I can use (:~:) from Data.Typeable to do "propositional equality" (ie typechecking?) on two terms, but what I actually need to do is see if a term could be matched into a position in the source/AST (I'm using lenses and prisms to focus on those parts of the AST) given some number of arguments. Some kind of partial type-checking, or result type-checking? Because the thing I might be focussing on could very well be a function, and I might need to keep the same arity.
I feel like perhaps this is very similar to Idris' term-searching, though I haven't looked into the source for that and I'm not sure if that's something only possible in a dependently typed language.
Any help would be great.
Looks like I kind of answered my own questions, so I'm going to do so formally here.
The answer to the first part of my question can be found in the Reflection module of the hint library. I knew I could get a [String] list of these modules, but there's a function in there that can be used which has the type getModuleExports :: MonadInterpreter m => ModuleName -> m [ModuleElem], and it is most likely the sort of thing I'm after. This is because hint provides access to a large part of the GHC API. It also provides some lookup functions which I can then use to get the types of these top-level terms.
https://github.com/mvdan/hint/blob/master/src/Hint/Reflection.hs#L30
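Here is a minimal sketch of that approach, assuming the hint package (the program layout is mine, not part of hint):

import Control.Monad.IO.Class (liftIO)
import Language.Haskell.Interpreter

-- List the exports of Data.Maybe, printing each function's type.
main :: IO ()
main = do
  r <- runInterpreter $ do
    setImports ["Prelude", "Data.Maybe"]
    elems <- getModuleExports "Data.Maybe"
    mapM_ describe elems
  either print return r
  where
    describe (Fun f) = do
      t <- typeOf f
      liftIO (putStrLn (f ++ " :: " ++ t))
    describe other = liftIO (print other)  -- classes and data types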
Also, Template Haskell provides some of the functionality I'm interested in, and I'll probably end up using quite a bit of that to build my functions, or at least a set of lenses for whatever syntax is being used by the code (/text) under consideration.
In terms of the second part of the question, I still don't have a particularly good answer, so my first attempt will be to use some String munging on the output of the lookup functions and see what I can do.
Suppose I was to define (+) on Strings but not by giving an instance of Num String.
Why does Haskell now hide Num's (+) function? After all, the function I have provided:
(+) :: String -> String -> String
can be distinguished by the compiler from Prelude's (+). Why can't both functions exist in the same namespace, but with different, non-overlapping type signatures?
As long as there is no call to the function in the code, Haskell shouldn't need to care that there's an ambiguity. Placing a call to the function with arguments would then determine the types, such that the appropriate implementation can be chosen.
Of course, once there is an instance Num String, there would actually be a conflict, because at that point Haskell couldn't decide based upon the parameter type which implementation to choose, if the function were actually called.
In that case, an error should be raised.
Wouldn't this allow function overloading without pitfalls/ambiguities?
Note: I am not talking about dynamic binding.
Haskell simply does not support function overloading (except via typeclasses). One reason for that is that function overloading doesn't work well with type inference. If you had code like f x y = x + y, how would Haskell know whether x and y are Nums or Strings, i.e. whether the type of f should be f :: Num a => a -> a -> a or f :: String -> String -> String?
PS: This isn't really relevant to your question, but the types aren't strictly non-overlapping if you assume an open world, i.e. in some module somewhere there might be an instance for Num String, which, when imported, would break your code. So Haskell never makes any decisions based on the fact that a given type does not have an instance for a given typeclass. Of course, function definitions hide other function definitions with the same name even if there are no typeclasses involved, so as I said: not really relevant to your question.
Regarding why it's necessary for a function's type to be known at the definition site, as opposed to being inferred at the call site: first of all, the call site of a function may be in a different module than the function definition (or in multiple different modules), so if we had to look at the call site to infer a function's type, we'd have to perform type checking across module boundaries. That is, when type checking a module, we'd also have to go through all the modules that import it, so in the worst case we'd have to recompile all modules every time we change a single module. This would greatly complicate and slow down the compilation process. More importantly, it would make it impossible to compile libraries, because it's the nature of libraries that their functions will be used by other code bases that the compiler does not have access to when compiling the library.
As long as the function isn't called
At some point, when using the function
No, no, no. In Haskell you don't think in terms of "before" or "the minute you do...": you define stuff once and for all time. That's most apparent in the runtime behaviour of variables, but it also translates to function signatures and class instances. This way, you don't have to do all the tedious thinking about compilation order, and you're safe from the many ways in which e.g. C++ templates/overloads often break horribly because of one tiny change in the program.
Also, I don't think you quite understand how Hindley-Milner works.
Before you call the function, at which time you know the type of the argument, it doesn't need to know.
Well, you normally don't know the type of the argument! It may sometimes be explicitly given, but usually it's deduced from the other argument or the return type. For instance, in
map (+3) [5,6,7]
the compiler doesn't know what types the numeric literals have; it only knows that they are numbers. This way, you can evaluate the result as whatever type you like, and that allows for things you could only dream of in other languages, for instance a symbolic type where
> map (+3) [5,6,7] :: [SymbolicNum]
[SymbolicPlus 5 3, SymbolicPlus 6 3, SymbolicPlus 7 3]
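A minimal sketch of such a symbolic type (the constructor names are made up; only the methods needed here are implemented, and the derived Show output is slightly more verbose than the illustration above):

data SymbolicNum = SymbolicLit Integer
                 | SymbolicPlus SymbolicNum SymbolicNum
  deriving Show

instance Num SymbolicNum where
  fromInteger = SymbolicLit   -- numeric literals become symbolic
  (+)         = SymbolicPlus  -- (+) builds a syntax tree instead of adding
  (*)         = error "not needed for this sketch"
  abs         = error "not needed for this sketch"
  signum      = error "not needed for this sketch"
  negate      = error "not needed for this sketch"

main :: IO ()
main = print (map (+3) [5, 6, 7] :: [SymbolicNum])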