I have a doubt of the Maybe and Either types, and their hypothetical relation to EAFP(Easier Ask Forgiveness to Permission). I've worked with Python and get used to work with the EAFP paradigm in the world of exceptions.
The classical example: Division by zero
def func(x,y):
if not y:
print "ERROR."
else: return (x/y)
and Python's style:
def func(x,y):
try:
return (x/y)
except: return None
In Haskell, the first function would be
func :: (Eq a, Fractional a) => a -> a -> a
func x y = if y==0 then error "ERROR." else x/y
and with Maybe:
func :: (Eq a, Fractional a) => a -> a -> Maybe a
func x y = if y==0 then Nothing else Just (x/y)
In Python's version, you run func without checking y. With Haskell, the story is the opposite: y is checked.
My question:
Formally, does Haskell support the EAFP paradigm or "prefers" LBYL although admits a semi-bizarre EAFP approximation?
PD: I called "semi-bizarre" because, even if it is intuitively readable, it looks (at least for me) like it vulnerates EAFP.
The Haskell style with Maybe and Either forces you to check for the error at some point, but it does not have to be right away. If you don't want to deal with the error now, you can just propagate it on through the rest of your computation.
Taking your hypothetical safe divide-by-0 example, you could use it in a broader computation without an explicit check:
do result <- func a b
let x = result * 10
return x
Here, you don't have to match on the Maybe returned by func: you just extract it into the result variable using do-notation, which automatically propagates failure throughout. The consequence is that you don't need to deal with the potential error immediately, but the final result of the computation is wrapped in Maybe itself.
This means that you can easily combine (compose) functions that miht result in an error without having to check the error at each step.
In a sense, this gives you the best of both worlds. You still only have to check for errors in one place, at the very end, but you're explicit about it. You have to use something like do-notation to take care of the actual propagation and you can't ignore the final error by accident: if you don't want to handle it, you have to turn it into a runtime error explicitly.
Isn't explicit better than implicit?
Now, Haskell also has a system of exceptions for working with runtime errors that you do not have to check at all. This is useful occasionally, but not too often. In Haskell, we only use it for errors that we do not expect to ever catch—truly exceptional situations. The rule of thumb is that a runtime exception represents a bug in your program, while an improper input or merely an uncommon case should be represented with Maybe or Either.
Related
As the title states, I see pieces of code online where the variables/functions have ' next to it, what does this do/mean?
ex:
function :: [a] -> [a]
function ...
function' :: ....
The notation comes from mathematics. It is read x prime. In pretty much any math manual you can find something like let x be a number and x' be the projection of ... (math stuff).
Why not using another convention? well, in mathematics It makes a lot of sense because It can be very pedagogical... In programming we aren't used to this convention so I don't see the point of using it, but I am not against it neither.
Just to give you an example of its use in mathematics so you can understand why It is used in Haskell. Below, the same triangle concept but one using prime convention and other not using it. It is pretty clear in the first picture that pairs (A, A'), (B, B'), ... are related by one being the vertex and the prime version being the midpoint of the oposite edge. Whereas in the second example, you just have to remember that A is the midpoint of the oposite edge of vertex P. First is easier and more pedagogical:
As the other answers said, function' is just another variable name. So,
don'tUse :: Int -> IO ()
don'tUse won'tBe''used'' = return ()
is just like
dontUse :: Int -> IO ()
dontUse wontBeUsed = return ()
with slightly different names. The only requirement is that the name starts with a lowercase-letter or underscore, after that you can have as many single-quote characters as you want.
Prelude> let _' = 1
Prelude> let _'' = 2
Prelude> let _''''''''' = 9
Prelude> _' + _'' * _'''''''''
19
...Of course it's not necessarily a good idea to name variables like that; normally such prime-names are used when making a slightly different version of an already named thing. For example, foldl and foldl' are functions with the same signature that do essentially the same thing, only with different strictness (which often affects performance memory usage and whether infinite inputs are allowed, but not the actual results).
That said, to the question
Haskell what does the ' symbol do?
– the ' symbol does in fact do various other things as well, but only when it appears not as a non-leading character in a name.
'a' is a character literal.
'Foo is a constructor used on the type level. See DataKinds.
'bar and ''Baz are quoted names. See TemplateHaskell.
From wiki.haskell.org:
First of all, common subexpression elimination (CSE) means that if an expression appears in several places, the code is rearranged so that the value of that expression is computed only once. For example:
foo x = (bar x) * (bar x)
might be transformed into
foo x = let x' = bar x in x' * x'
thus, the bar function is only called once. (And if bar is a particularly expensive function, this might save quite a lot of work.)
GHC doesn't actually perform CSE as often as you might expect. The trouble is, performing CSE can affect the strictness/laziness of the program. So GHC does do CSE, but only in specific circumstances --- see the GHC manual. (Section??)
Long story short: "If you care about CSE, do it by hand."
I'm wondering under what circumstances CSE "affects" the strictness/laziness of the program and what kind of effect that could be.
The naive CSE rule would be
e'[e, e] ~> let x = e in e'[x, x].
That is, whenever a subexpression e occurs twice in the expression e', we use a let-binding to compute e once. This however leads itself to some trivial space leaks. For example
sum [1..n] + prod [1..n]
is typically O(1) space usage in a lazy functional programming language like Haskell (as sum and prod would tail-recurse and blah blah blah), but would become O(n) when the naive CSE rule is enacted. This can be terrible for programs when n is high!
The approach is then to make this rule more specific, restricting it to a small set of cases that we know won't have the problem. We can begin by more specifically enumerating the problems with the naive rule, which will form a set of priorities for us to develop a better CSE:
The two occurrences of e might be far apart in e', leading to a long lifetime for the let x = e binding.
The let-binding must always allocate a closure where previously there might not have been one.
This can create an unbound number of closures.
There are cases where the closure might never deallocate.
Something better
let x = e in e'[e] ~> let x = e in e'[x]
This is a more conservative rule but is much safer. Here we recognize that e appears twice but the first occurrence syntactically dominates the second expression, meaning here that the programmer has already introduced a let-binding. We can safely just reuse that let-binding and replace the second occurrence of e with x. No new closures are allocated.
Another example of syntactic domination:
case e of { x -> e'[e] } ~> case e of { x -> e'[x] }
And yet another:
case e of {
Constructor x0 x1 ... xn ->
e'[e]
}
~>
case e of {
Constructor x0 x1 ... xn ->
e'[Constructor x0 x1 ... xn]
}
These rules all take advantage of existing structure in the program to ensure that the kinetics of space usage remain the same before and after the transformation. They are much more conservative than the original CSE but they are also much safer.
See also
For a full discussion of CSE in a lazy FPL, read Chitil's (very accessible) 1997 paper. For a full treatment of how CSE works in a production compiler, see GHC's CSE.hs module, which is documented very thoroughly thanks to GHC's tradition of writing long footnotes. The comment-to-code ratio in that module is off the charts. Also note how old that file is (1993)!
I noticed today that such a definition
safeDivide x 0 = x
safeDivide = (/)
is not possible. I am just curious what the (good) reason behind this is. There must be a very good one (it's Haskell after all :)).
Note: I am not looking suggestions for alternative implementations to the code above, it's a simple example to demonstrate my point.
I think it's mainly for consistency so that all clauses can be read in the same manner, so to speak; i.e. every RHS is at the same position in the type of the function. I think would mask quite a few silly errors if you allowed this, too.
There's also a slight semantic quirk: say the compiler padded out such clauses to have the same number of patterns as the other clauses; i.e. your example would become
safeDivide x 0 = x
safeDivide x y = (/) x y
Now consider if the second line had instead been safeDivide = undefined; in the absence of the previous clause, safeDivide would be ⊥, but thanks to the eta-expansion performed here, it's \x y -> if y == 0 then x else ⊥ — so safeDivide = undefined does not actually define safeDivide to be ⊥! This seems confusing enough to justify banning such clauses, IMO.
The meaning of a function with multiple clauses is defined by the Haskell standard (section 4.4.3.1) via translation to a lambda and case statement:
fn pat1a pat1b = r1
fn pat2a pat2b = r2
becomes
fn = \a b -> case (a,b) of
(pat1a, pat1b) -> r1
(pat2a, pat2b) -> r2
This is so that the function definition/case statement way of doing things is nice and consistent, and the meaning of each isn't specified redundantly and confusingly.
This translation only really makes sense when each clause has the same number of arguments. Of course, there could be extra rules to fix that, but they'd complicate the translation for little gain, since you probably wouldn't want to define things like that anyway, for your readers' sake.
Haskell does it this way because it's predecessors (like LML and Miranda) did. There is no technical reason it has to be like this; equations with fewer arguments could be eta expanded. But having a different number of arguments for different equations is probably a typo rather than intentional, so in this case we ban something sensible&rare to get better error reporting in the common case.
Started working with erlang quite recently and ran into the problem above, how do you go about comparing two strings in a guard statement? Tried the string:equal(x,y) method but couldn't get it to work inside a guard.
You could use pattern matching like this:
are_the_same(A, A) ->
true;
are_the_same(_, _) ->
false.
In first clause both arguments are named A which will result in them being pattern matched against each other. Or to be exact first argument will be bind to A variable with use of = operator, and than second argument will be bind to A variable with = operator, but since A is bound already it will be treated as "comparision". You can read more about this in docs.
And of course you could write write first clouse with use of guard like:
are_the_same(A, B) when A =:= B ->
You don't need the function string:equal/2 to compare strings; you can use the operators == or =:=, which are allowed in guard tests. For example:
foo(A, B) when A =:= B ->
equal;
foo(_, _) ->
not_equal.
Though in most cases you'd want to use pattern matching instead, as described in the other answer.
NB: As of Erlang/OTP 20.0, string:equal(A, B) is no longer equivalent to A =:= B. string:equal/2 now operates on grapheme clusters, and there are also string:equal/3 and string:equal/4 that can optionally ignore case when comparing and do Unicode normalisation. So you need to understand what you mean by "equal" before settling on a comparison method.
The functions you can use in guards are limited because of the nature of Erlang's scheduling; specifically, Erlang aims to avoid side-effects in guard statements (e.g., calling to another process) because guards are evaluated by the scheduler and do not count against reductions. This is why string:equal does not work.
That being said, you can use Erlang's pattern matching to match strings. Please bear in mind the use of strings as lists, binaries, or iolists (nested lists/binaries) in Erlang, and make sure you're testing/passing strings of the right type (iolists are particularly hard to pattern match and are usually best handled with the re module, or converting them to binaries via iolist_to_binary).
For example, say we want a function that tests to see if a string begins with "foo":
bar("foo" ++ _Rest) -> true;
bar(<<"foo", Rest/binary>>) -> true;
bar(_Else) -> false.
If you just want to test for a particular string, it's even easier:
bar("foo") -> true;
bar(<<"foo">>) -> true;
bar(_Else) -> false.
I noticed today that such a definition
safeDivide x 0 = x
safeDivide = (/)
is not possible. I am just curious what the (good) reason behind this is. There must be a very good one (it's Haskell after all :)).
Note: I am not looking suggestions for alternative implementations to the code above, it's a simple example to demonstrate my point.
I think it's mainly for consistency so that all clauses can be read in the same manner, so to speak; i.e. every RHS is at the same position in the type of the function. I think would mask quite a few silly errors if you allowed this, too.
There's also a slight semantic quirk: say the compiler padded out such clauses to have the same number of patterns as the other clauses; i.e. your example would become
safeDivide x 0 = x
safeDivide x y = (/) x y
Now consider if the second line had instead been safeDivide = undefined; in the absence of the previous clause, safeDivide would be ⊥, but thanks to the eta-expansion performed here, it's \x y -> if y == 0 then x else ⊥ — so safeDivide = undefined does not actually define safeDivide to be ⊥! This seems confusing enough to justify banning such clauses, IMO.
The meaning of a function with multiple clauses is defined by the Haskell standard (section 4.4.3.1) via translation to a lambda and case statement:
fn pat1a pat1b = r1
fn pat2a pat2b = r2
becomes
fn = \a b -> case (a,b) of
(pat1a, pat1b) -> r1
(pat2a, pat2b) -> r2
This is so that the function definition/case statement way of doing things is nice and consistent, and the meaning of each isn't specified redundantly and confusingly.
This translation only really makes sense when each clause has the same number of arguments. Of course, there could be extra rules to fix that, but they'd complicate the translation for little gain, since you probably wouldn't want to define things like that anyway, for your readers' sake.
Haskell does it this way because it's predecessors (like LML and Miranda) did. There is no technical reason it has to be like this; equations with fewer arguments could be eta expanded. But having a different number of arguments for different equations is probably a typo rather than intentional, so in this case we ban something sensible&rare to get better error reporting in the common case.