violation of Haskell indentation rules for if-then-else


According to the Haskell indentation rules, "Code which is part of some expression should be indented further in than the beginning of that expression". However, I found the following example, which seems to violate the rule above, compiles without any error or warning:
someFunction :: Bool -> Int -> Int -> Int
someFunction condition a b = if condition
then a - b
else a + b
Here I am defining a function someFunction, its body is an if-then-else block. According to the indentation rule, the then block is a part of the same expression in the first line, so it should be indented further than its previous line. Yet in my example, the second line then starts at the same column as the first line, and this example compiles.
I am not sure what is going on here. I am working with GHC version 8.0.1.

I'm reasonably sure this is an artifact of a deliberate GHC variation on the indentation rule. Nice catch!
GHC reads this
foo = do
  item
  if a
  then b
  else c
  item
as
foo = do {
  item ;
  if a ;
  then b ;
  else c ;
  item }
which should trigger a parse error.
However, this was so common that at a certain point the GHC devs decided to allow for an optional ; before then and else. This change to the if grammar makes the code compile.
This means that if became "special", in that it does not have to be indented more, but only as much as the previous item. In the code posted in the question, then is indented as much as the previous item, so there's an implicit ; before it, and that makes the code compile.
I would still try to avoid this "style", though, since it's quirky.
(Personally, I wouldn't have added this special case to GHC. But it's not a big deal, anyway.)
I now noticed that the Wikibook mentions this variant as a "proposal" for a future version of Haskell. This is a bit outdated now, and has been implemented in GHC since then.
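To make the optional-semicolon behaviour concrete, here is a minimal sketch. The function names classify and classifyExplicit are made up for illustration, and the sketch assumes a GHC language mode where DoAndIfThenElse is on (it is part of Haskell 2010, so this should hold unless Haskell 98 mode is requested). The first definition lines up then and else with if inside a do block; the second writes out the braces and semicolons that the layout rule would insert.

```haskell
-- 'then' and 'else' start at the same column as 'if', so the layout
-- rule inserts a ';' before each of them. The grammar tolerates that.
classify :: Int -> Maybe String
classify n = do
  m <- if n >= 0 then Just n else Nothing
  if even m
  then Just "even"
  else Just "odd"

-- The same function with explicit braces and semicolons, which is
-- roughly what the parser sees after layout processing.
classifyExplicit :: Int -> Maybe String
classifyExplicit n = do { m <- if n >= 0 then Just n else Nothing ; if even m ; then Just "even" ; else Just "odd" }
```

Loaded into GHCi, both definitions should behave identically; indenting then and else one step further than if avoids relying on the special case at all.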

Related

Why does the Maybe return type make this crash?

I'm restricting my use of built-in functions for training purposes. I have recoded length as count and it works.
I have a search function that simply returns the value at a given index in a list when given a list and an index. It works completely fine. It throws an error when the index is too large.
search [] _ = error "index too large"
search (a:_) 0 = a
search (_:a) b = search a (b - 1)
Now, I want a safeSearch function that returns Nothing if the index is too large or if the list is empty. So I've simply done this.
safeSearch :: [a] -> Int -> Maybe a
safeSearch a b
  | b < 0 || b >= count a = Nothing
  | otherwise = Just (search a b)
And it works! ... as long as you don't try it on an empty list. Even with an index too large for the list length.
main = print(safeSearch [] 5)
This crashes and I really can't find any way around it.
Even though I don't think my second line is useful (because if the list is empty, its count is 0, so we fall into the first guard and it should return Nothing?), it's not working. Removing it does not solve the problem.
Here's the compile-time error.
main.hs:91:8: error:
* Ambiguous type variable `a0' arising from a use of `print'
prevents the constraint `(Show a0)' from being solved.
Probable fix: use a type annotation to specify what `a0' should be.
These potential instances exist:
instance Show Ordering -- Defined in `GHC.Show'
instance Show Integer -- Defined in `GHC.Show'
instance Show a => Show (Maybe a) -- Defined in `GHC.Show'
...plus 22 others
...plus 13 instances involving out-of-scope types
(use -fprint-potential-instances to see them all)
* In the expression: print (safeSearch [] 5)
In an equation for `main': main = print (safeSearch [] 5)
|
91 | main = print(safeSearch [] 5)
| ^^^^^^^^^^^^^^^^^^^^^^
exit status 1
Any idea? Something I'm missing or even completely getting wrong? A concept I need to understand more deeply?

The problem is a compile error. That means it isn't actually running your code and hitting your error "index too large" call; the compiler is rejecting your code before it can even try to run it. So you're looking in the wrong place if you're trying to change the code to avoid that.
What's actually happening is that safeSearch [] 5 is returning a value of type Maybe a, where a is the type of the elements in the list. But you didn't include any elements in the list, so there is nothing at all to decide what that type a is.
Your function safeSearch can work for any type, so that's actually fine. But you also try to print the Maybe a value. Using print requires a Show instance, and the instance for Maybe a requires there to also be a Show instance for a. Because there is nothing saying what type a is, the compiler has no way of finding the appropriate Show instance for it, so it has to abort compilation with an error.
The most straightforward way to solve it is to add a type annotation (either of the list, or the Maybe a value resulting from safeSearch). Something like this:
main = print (safeSearch ([] :: [Int]) 5)
(This is what the error message is talking about when it says an ambiguous type variable is preventing a Show constraint from being solved, and that the probable fix is to add a type annotation)
Note that this sort of issue is rarely a problem in "real" code. Normally if you have a list processed into another structure with a related type, you will have other code that does something with the elements or the result, or that produced the list (which isn't always empty). You wouldn't normally write a program that does nothing but process an always-empty list and print the result, except for these kinds of quick tests. So normally, when there is that other code as well, there will be enough context for the compiler to deduce the type of your empty list, and the extra type annotations will not be needed. So this kind of extra type annotation is not usually considered a serious burden that needs to be avoided, because it is hardly ever needed in "real" code. You just code as you want, and on the occasion that a compile error makes you realise you need an annotation you simply add it and move on.
If you do this kind of quick check in GHCi rather than writing a full program with a main function, then you also would not have needed the extra type annotation. This is because GHCi has the ExtendedDefaultRules language extension turned on by default. The "default rules" are conditions when GHC will choose a type for you instead of throwing an "ambiguous type" error. The normal default rules are pretty strict, and really only designed for defaulting numeric constraints (like Num a or Real a, etc). They do not apply to your original example. The "extended default rules" apply more often to avoid needing lots of type signatures in the interactive interpreter (since there you enter one line at a time, instead of the compiler being able to see the full module to infer types from usage). In this case entering print (safeSearch [] 5) at the interpreter prompt will work because it defaults the returned type to Maybe (), and it just so happens that printing Nothing :: Maybe () produces the same output as it would if it had correctly guessed the type you actually meant.
But in almost any real program, defaulting a type variable to () will be a stupid thing to do that makes things work less, so I do not recommend getting into the habit of enabling ExtendedDefaultRules in an actual module. Just add the type annotation, or do quick checks in the interpreter instead of in a module.

What you've written works great for any real-world use case. It only fails when someone writes print (safeSearch [] x) - a literal empty list, with no context to tell what result type is expected. It works fine if they pass in a nonempty list, or a list expression that happens to evaluate to an empty list, or if they use the result in a way that lets type inference figure out what was intended.
Further, there is really no way to write the function so that it works when passed a contextless empty list. The burden to make the types clear is necessarily placed on call sites, not the definition. The comments on your question have already shown how to do this; you only have to be that explicit when you're calling your function in a way that's obviously useless.
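As a small illustration of where the annotation can go, here is a sketch that assumes a self-contained variant of safeSearch (written by direct recursion so it does not depend on the asker's count and search helpers) and a hypothetical report action. The definition stays fully polymorphic; only the call sites that pass a bare [] need to name a type.

```haskell
-- A total lookup: never calls 'error', works for any element type.
safeSearch :: [a] -> Int -> Maybe a
safeSearch [] _ = Nothing
safeSearch (x:xs) n
  | n < 0     = Nothing
  | n == 0    = Just x
  | otherwise = safeSearch xs (n - 1)

report :: IO ()
report = do
  -- Annotate the empty list ...
  print (safeSearch ([] :: [Int]) 5)    -- prints Nothing
  -- ... or annotate the result instead; either one fixes 'a'.
  print (safeSearch [] 5 :: Maybe Int)  -- prints Nothing
  -- With real elements no annotation is needed.
  print (safeSearch "abc" 1)            -- prints Just 'b'
```

Removing both annotations from the first two calls should bring back the ambiguous type variable error from the question, which shows that the function itself was never the problem.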

difference between variable definition in a Haskell source file and in GHCi?

In a Haskell source file, I can write
a = 1
and I had the impression that I have to write the same in GHCi as
let a = 1
, for a = 1 in GHCi gives a parse error on =.
Now, if I write
a = 1
a = 2
in a source file, I will get an error about Multiple declaration of a, but it is OK to write in GHCi:
let a = 1
let a = 2
Can someone help clarify the difference between the two styles?

Successive let "statements" in the interactive interpreter are really the equivalent of nested let expressions. They behave as if there is an implied in following the assignment, and the rest of the interpreter session comprises the body of the let. That is
>>> let a = 1
>>> let a = 2
>>> print a
is the same as
let a = 1 in
let a = 2 in
print a

There is a key difference in Haskell between having two definitions of the same name in identical scopes, and having two definitions of the same name in nested scopes. GHCi vs modules in a file isn't really related to the underlying concept here, but those situations do lead you to encounter problems if you're not familiar with it.
A let-expression (and a let-statement in a do block) creates a set of bindings with the same scope, not just a single binding. For example, as an expression:
let a = True
    a = False
in a
Or with braces and semicolons (more convenient to paste into GHCi without turning on multi-line mode):
let { a = True; a = False} in a
This will fail, whether in a module or in GHCi. There cannot be a single variable a that is both True and False, and there can't be two separate variables named a in the same scope (or it would be impossible to know which one was being referred to by the source text a).
The variables in a single binding set are all defined "at once"; the order they're written in is not relevant at all. You can see this because it's possible to define mutually-recursive bindings that all refer to each other, and couldn't possibly be defined one-at-a-time in any order:
λ let a = True : b
|     b = False : a
| in take 10 a
[True,False,True,False,True,False,True,False,True,False]
it :: [Bool]
Here I've defined an infinite list of alternating True and False, and used it to come up with a finite result.
A Haskell module is a single scope, containing all the definitions in the file. Exactly as in a let-expression with multiple bindings, all the definitions "happen at once"1; they're only in a particular order because writing them down in a file inevitably introduces an order. So in a module this:
a = True
a = False
gives you an error, as you've seen.
In a do-block you have let-statements rather than let-expressions.2 These don't have an in part since they just scope over the entire rest of the do-block.3 GHCi commands are very like entering statements in an IO do-block, so you have the same option there, and that's what you're using in your example.
However your example has two let-bindings, not one. So there are two separate variables named a defined in two separate scopes.
Haskell doesn't care (almost ever) about the written order of different definitions, but it does care about the "nesting order" of nested scopes; the rule is that when you refer to a variable a, you get the inner-most definition of a whose scope contains the reference.4
As an aside, hiding an outer-scope name by reusing a name in an inner scope is known as shadowing (we say the inner definition shadows the outer one). It's a useful general programming term to know, since the concept comes up in many languages.
So it's not that the rules about when you can define a name twice are different in GHCi vs a module, it's just that the different context makes different things easier.
If you want to put a bunch of definitions in a module, the easy thing to do is make them all top-level definitions, which all have the same scope (the whole module) and so you get an error if you use the same name twice. You have to work a bit more to nest the definitions.
In GHCi you're entering commands one-at-a-time, and it's more work to use multi-line commands or braces-and-semicolon style, so the easy thing when you want to enter several definitions is to use several let statements, and so you end up shadowing earlier definitions if you reuse names.5 You have to more deliberately try to actually enter multiple names in the same scope.
1 Or more accurately the bindings "just are" without any notion of "the time at which they happen" at all.
2 Or rather: you have let-statements as well as let-expressions, since statements are mostly made up of expressions and a let-expression is always valid as an expression.
3 You can see this as a general rule that later statements in a do-block are conceptually nested inside all earlier statements, since that's what they mean when you translate them to monadic operations; indeed let-statements are actually translated to let-expressions with the rest of the do-block inside the in part.
4 It's not ambiguous like two variables with the same name in the same scope would be, though it is impossible to refer to any further-out definitions.
5 And note that anything you've previously defined referring to the name before the shadowing will still behave exactly as it did before, referring to the previous name. This includes functions that return the value of the variable. It's easiest to understand shadowing as introducing a different variable that happens to have the same name as an earlier one, rather than trying to understand it as actually changing what the earlier variable name refers to.
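The point about shadowing can also be reproduced in a source file with nested let expressions. This is a minimal sketch; the names shadowed and f are made up for illustration. Each let opens a new scope, so the second a is a different variable, and f, which was defined while the first a was in scope, keeps using the first one.

```haskell
shadowed :: (Int, Int, Int)
shadowed =
  let a = 1 in
  let f = a + 10 in   -- refers to the outer a (1)
  let a = 2 in        -- a new variable that shadows the outer a
  (a, f, a + 10)      -- evaluates to (2, 11, 12)
```

Compiling this with warnings enabled should make GHC point out the shadowed binding, but it is not an error, unlike writing a = 1 and a = 2 as two top-level definitions.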

Syntax rules for Haskell infix datatype constructors

I'm trying to make a Haskell datatype a bit like a python dictionary, a ruby hash or a javascript object, in which a string is linked to a value, like so:
data Entry t = Entry String t
type Dictionary t = [Entry t]
The above code works fine. However, I would like a slightly nicer constructor, so I tried defining it like this:
data Entry t = String ~> t
This failed. I tried this:
data Entry t = [Char] ~> t
Again, it failed. I know that ~ has special meaning in Haskell, and GHCi still permits the operator ~>, but I still tried one other way:
data Entry t = [Char] & t
And yet another failure due to parse error. I find this confusing because, for some inexplicable reason, this works:
data Entry t = String :> t
Does this mean that there are certain rules for what characters may occur in infix type constructors, or is it a case of misinterpretation? I'm not a newbie in Haskell, and I'm aware that it would be more idiomatic to use the first constructor, but this one's stumping me, and it seems to be an important part of Haskell that I'm missing.

An infix constructor, whether a type constructor or a data constructor, must be an operator that starts with a colon (:); the one exception is the built-in (->). If you want the tilde, you could use :~>, but you're not going to get away with using something that doesn't start with a colon.
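Applied to the dictionary type from the question, the colon rule might look like the following sketch. The fixity declaration, the lookupEntry helper, and the phoneBook value are assumptions added for illustration; only the :~> spelling comes from the answer.

```haskell
-- Low precedence so the value on the right can be a larger expression.
infix 1 :~>

-- ':~>' is a data constructor because it starts with a colon.
data Entry t = String :~> t
  deriving (Show, Eq)

type Dictionary t = [Entry t]

-- Return the value stored under the first matching key, if any.
lookupEntry :: String -> Dictionary t -> Maybe t
lookupEntry _ [] = Nothing
lookupEntry key ((k :~> v) : rest)
  | key == k  = Just v
  | otherwise = lookupEntry key rest

phoneBook :: Dictionary Int
phoneBook = ["alice" :~> 1234, "bob" :~> 5678]
```

Because :~> is a constructor, it can be used in patterns as well as in expressions, which is what lookupEntry relies on.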

How could I remove the "if ... then ... else ..." keywords in Haskell (GHC)?

I would like to remove the if ... then ... else ... keywords, because I am embedding a language/DSL in Haskell. if, then and else convey a lot of meaning in many domains, and it would be great if I could redefine (or leave them undefined) them to reflect the nature of the language/domain.
I've searched on Google and stackoverflow, but found nothing. (I did find an old thread on why if ... then ... else ... was included as keywords in Haskell.)
My IDE is Leksah, and, if the keywords can be removed, it would also be nice to have a setting to change the if ... then ... else ... keywords back to their normal font/color/unbold.
I've already tried a naming convention of if' for if and so on. It doesn't feel as good, especially when I want to define if and if', and have to define if' and if'' instead, or if1 and if2. The presence of both if' and if might also be confusing. (The confusion is not that serious an issue in my situation as the users of the DSL are Haskell programmers, but I suppose it can help in other situations).
Summarizing the responses to date:
Use the RebindableSyntax extension to GHC. Not as general as removing the keywords: the syntax of Haskell's if-then-else is retained. (Frerich Raabe)
Workaround: Use very similar words/names, by using data Conditional b a = If b (Then a) (Else a) (only applicable in some contexts). (C. A. McCann)
If RebindableSyntax is a relatively new feature, then it's unlikely to find a more general way, at least not till the next version of GHC.

The RebindableSyntax extension to GHC lets you overload if ... then ... else expressions with your own version. In particular, the ifThenElse function is used to define alternative meanings: if e1 then e2 else e3 means ifThenElse e1 e2 e3.
See the blog article Rebindable if..then..else expressions for a nice discussion of this feature, including some examples.
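A minimal sketch of the idea, assuming a made-up meaning for the condition: here ifThenElse takes an Int and treats zero as false, C-style. The pick function is hypothetical and exists only to exercise the rebound syntax. Note that RebindableSyntax switches off the implicit Prelude import, so the sketch imports Prelude explicitly.

```haskell
{-# LANGUAGE RebindableSyntax #-}

import Prelude

-- With RebindableSyntax, 'if c then t else e' is desugared to a call
-- of whatever 'ifThenElse' is in scope; this one accepts an Int.
ifThenElse :: Int -> a -> a -> a
ifThenElse n t e
  | n == 0    = e
  | otherwise = t

-- The condition is an Int here, not a Bool.
pick :: Int -> String
pick n = if n then "nonzero" else "zero"
```

The keywords themselves stay reserved, as the question's summary notes; only their meaning changes within the module that enables the extension.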

You can't remove existing keywords. As was pointed out you can use RebindableSyntax, but that might not do what you want.
The only thing that gets close to removing keywords is to turn on the CPP option and do something like
#define if if_
#define then then_
#define else else_
The preprocessor will then expand if/then/else to if_/then_/else_.

How about:
cond :: Bool -> a -> a -> a
cond True t _ = t
cond False _ f = f

What languages have a while-else type control structure, and how does it work?

A long time ago, I thought I saw a proposal to add an else clause to for or while loops in C or C++... or something like that. I don't remember how it was supposed to work -- did the else clause run if the loop exited normally but not via a break statement?
Anyway, this is tough to search for, so I thought maybe I could get some CW answers here for various languages.
What languages support adding an else clause to something other than an if statement? What is the meaning of that clause? One language per answer please.

Python.
Example use:
for element in container:
    if element == target:
        break
else:
    # this will not be executed if the loop is quit with break.
    raise ElementNotFoundError()
From the Python docs:
it is executed when the loop
terminates through exhaustion of the
list (with for) or when the condition
becomes false (with while), but not
when the loop is terminated by a break
statement.

There is the so-called "Dijkstra's Loop" (also called "Dijkstra's Guarded Loop"). It was defined in The Guarded Command Language (GCL). You can find some information about its syntax and semantics in the above Wikipedia article, in section 6, Repetition: do.
Nowadays I actually know one programming language which supports this control structure directly. It is Oberon-07 (PDF, 70 KB), and it supports "Dijkstra's Loop" in the form of the while statement. Take a look at section 9.6, While statements, in the above PDF.
WHILE m > n DO m := m - n
ELSIF n > m DO n := n - m
END

Interestingly, neither the Python nor the Oberon construct is the one I've been searching for. In C, I frequently find myself wanting an 'otherwhile' or 'elsewhile' construct that is executed only if the loop was never taken. Perhaps this is the construction you are looking for as well?
So instead of:
if (condition) {
    do {
        condition = update(something);
    } while (condition);
} else {
    loop_never_taken(something);
}
I could write:
while (condition) {
    condition = update(something);
} otherwhile {
    loop_never_taken(something);
}
It's definitely shorter, and I would find it much clearer to read. It even translates easily into (pseudo) assembly:
while: test condition
bz elsewhile
loop: push something
call update
test: test condition
bnz loop
jmp done
elsewhile: push something
call loop_never_taken
done: ...
I feel like it's a basic enough structure that it deserves a little more sugar. But apparently there haven't been any successful language designers who rely on this structure as much as I do. I wonder how much I should read into that!
