Erlang: Matching strings in guard statement - string

Started working with erlang quite recently and ran into the problem above, how do you go about comparing two strings in a guard statement? Tried the string:equal(x,y) method but couldn't get it to work inside a guard.

You could use pattern matching like this:
are_the_same(A, A) ->
true;
are_the_same(_, _) ->
false.
In first clause both arguments are named A which will result in them being pattern matched against each other. Or to be exact first argument will be bind to A variable with use of = operator, and than second argument will be bind to A variable with = operator, but since A is bound already it will be treated as "comparision". You can read more about this in docs.
And of course you could write write first clouse with use of guard like:
are_the_same(A, B) when A =:= B ->

You don't need the function string:equal/2 to compare strings; you can use the operators == or =:=, which are allowed in guard tests. For example:
foo(A, B) when A =:= B ->
equal;
foo(_, _) ->
not_equal.
Though in most cases you'd want to use pattern matching instead, as described in the other answer.
NB: As of Erlang/OTP 20.0, string:equal(A, B) is no longer equivalent to A =:= B. string:equal/2 now operates on grapheme clusters, and there are also string:equal/3 and string:equal/4 that can optionally ignore case when comparing and do Unicode normalisation. So you need to understand what you mean by "equal" before settling on a comparison method.

The functions you can use in guards are limited because of the nature of Erlang's scheduling; specifically, Erlang aims to avoid side-effects in guard statements (e.g., calling to another process) because guards are evaluated by the scheduler and do not count against reductions. This is why string:equal does not work.
That being said, you can use Erlang's pattern matching to match strings. Please bear in mind the use of strings as lists, binaries, or iolists (nested lists/binaries) in Erlang, and make sure you're testing/passing strings of the right type (iolists are particularly hard to pattern match and are usually best handled with the re module, or converting them to binaries via iolist_to_binary).
For example, say we want a function that tests to see if a string begins with "foo":
bar("foo" ++ _Rest) -> true;
bar(<<"foo", Rest/binary>>) -> true;
bar(_Else) -> false.
If you just want to test for a particular string, it's even easier:
bar("foo") -> true;
bar(<<"foo">>) -> true;
bar(_Else) -> false.

Related

Permutation parsing with megaparsec + parser-combinators too lenient

I'm attempting to parse permutations of flags. The behavior I want is "one or more flags in any order, without repetition". I'm using the following packages:
megaparsec
parser-combinators
The code I have is outputting what I want, but is too lenient on inputs. I don't understand why it's accepting multiples of the same flags. What am I doing wrong here?
pFlags :: Parser [Flag]
pFlags = runPermutation $ f <$>
toPermutation (optional (GroupFlag <$ char '\'')) <*>
toPermutation (optional (LeftJustifyFlag <$ char '-'))
where f a b = catMaybes [a, b]
Examples:
"'-" = [GroupFlag, LeftJustifyFlag] -- CORRECT
"-'" = [LeftJustifyFlag, GroupFlag] -- CORRECT
"''''-" = [GroupFlag, LeftJustifyFlag] -- INCORRECT, should fail if there's more than one of the same flag.
Instead of toPermutation with optional, I believe you need to use toPermutationWithDefault, something like this (untested):
toPermutationWithDefault Nothing (Just GroupFlag <$ char '\'')
The reasoning is described in the paper “Parsing Permutation Phrases” (PDF) in §4, “adding optional elements” (emph. added):
Consider, for example […] all permutations of a, b and c. Suppose b can be empty and we want to recognise ac. This can be done in three different ways since the empty b can be recognised before a, after a or after c. Fortunately, it is irrelevant for the result of a parse where exactly the empty b is derived, since order is not important. This allows us to use a strategy similar to the one proposed by Cameron: parse nonempty constituents as they are seen and allow the parser to stop if all remaining elements are optional. When the parser stops the default values are returned for all optional elements that have not been recognised.
To implement this strategy we need to be able to determine whether a parser can derive the empty string and split it into its default value and its non-empty part, i.e. a parser that behaves the same except that it does not recognise the empty string.
That is, the permutation parser needs to know which elements can succeed without consuming input, otherwise it will be too eager to commit to a branch. I don’t know why this would lead to accepting multiples of an element, though; perhaps you’re also missing an eof?

EAFP in Haskell

I have a doubt of the Maybe and Either types, and their hypothetical relation to EAFP(Easier Ask Forgiveness to Permission). I've worked with Python and get used to work with the EAFP paradigm in the world of exceptions.
The classical example: Division by zero
def func(x,y):
if not y:
print "ERROR."
else: return (x/y)
and Python's style:
def func(x,y):
try:
return (x/y)
except: return None
In Haskell, the first function would be
func :: (Eq a, Fractional a) => a -> a -> a
func x y = if y==0 then error "ERROR." else x/y
and with Maybe:
func :: (Eq a, Fractional a) => a -> a -> Maybe a
func x y = if y==0 then Nothing else Just (x/y)
In Python's version, you run func without checking y. With Haskell, the story is the opposite: y is checked.
My question:
Formally, does Haskell support the EAFP paradigm or "prefers" LBYL although admits a semi-bizarre EAFP approximation?
PD: I called "semi-bizarre" because, even if it is intuitively readable, it looks (at least for me) like it vulnerates EAFP.
The Haskell style with Maybe and Either forces you to check for the error at some point, but it does not have to be right away. If you don't want to deal with the error now, you can just propagate it on through the rest of your computation.
Taking your hypothetical safe divide-by-0 example, you could use it in a broader computation without an explicit check:
do result <- func a b
let x = result * 10
return x
Here, you don't have to match on the Maybe returned by func: you just extract it into the result variable using do-notation, which automatically propagates failure throughout. The consequence is that you don't need to deal with the potential error immediately, but the final result of the computation is wrapped in Maybe itself.
This means that you can easily combine (compose) functions that miht result in an error without having to check the error at each step.
In a sense, this gives you the best of both worlds. You still only have to check for errors in one place, at the very end, but you're explicit about it. You have to use something like do-notation to take care of the actual propagation and you can't ignore the final error by accident: if you don't want to handle it, you have to turn it into a runtime error explicitly.
Isn't explicit better than implicit?
Now, Haskell also has a system of exceptions for working with runtime errors that you do not have to check at all. This is useful occasionally, but not too often. In Haskell, we only use it for errors that we do not expect to ever catch—truly exceptional situations. The rule of thumb is that a runtime exception represents a bug in your program, while an improper input or merely an uncommon case should be represented with Maybe or Either.

ocaml style: parameterize programs

I have a OCaml program which modules have lots of functions that depend on a parameter, i.e. "dimension". This parameter is determined once at the beginning of a run of the code and stays constant until termination.
My question is: how can I write the code shorter, so that my functions do not all require a "dimension" parameter. Those modules call functions of each other, so there is no strict hierarchy (or I can't see this) between the modules.
how is the ocaml style to adress this problem? Do I have to use functors or are there other means?
You probably cannot evaluate the parameter without breaking dependencies between modules, otherwise you would just define it in one of the modules where it is accessible from other modules. A solution that comes to my mind is a bit "daring". Define the parameter as a lazy value, and suspend in it a dereference of a "global cell":
let hidden_box = ref None
let initialize_param p =
match !hidden_box with None -> hidden_box := Some p | _ -> assert false
let param =
lazy (match !hidden_box with None -> assert false | Some p -> p)
The downside is that Lazy.force param is a bit verbose.
ETA: Note that "there is no strict hierarchy between the modules" is either:
false, or
you have a recursive module definition, or
you are tying the recursive knot somewhere.
In case (2) you can just put everything into a functor. In case (3) you are passing parameters already.

What makes a good name for a helper function?

Consider the following problem: given a list of length three of tuples (String,Int), is there a pair of elements having the same "Int" part? (For example, [("bob",5),("gertrude",3),("al",5)] contains such a pair, but [("bob",5),("gertrude",3),("al",1)] does not.)
This is how I would implement such a function:
import Data.List (sortBy)
import Data.Function (on)
hasPair::[(String,Int)]->Bool
hasPair = napkin . sortBy (compare `on` snd)
where napkin [(_, a),(_, b),(_, c)] | a == b = True
| b == c = True
| otherwise = False
I've used pattern matching to bind names to the "Int" part of the tuples, but I want to sort first (in order to group like members), so I've put the pattern-matching function inside a where clause. But this brings me to my question: what's a good strategy for picking names for functions that live inside where clauses? I want to be able to think of such names quickly. For this example, "hasPair" seems like a good choice, but it's already taken! I find that pattern comes up a lot - the natural-seeming name for a helper function is already taken by the outer function that calls it. So I have, at times, called such helper functions things like "op", "foo", and even "helper" - here I have chosen "napkin" to emphasize its use-it-once, throw-it-away nature.
So, dear Stackoverflow readers, what would you have called "napkin"? And more importantly, how do you approach this issue in general?
General rules for locally-scoped variable naming.
f , k, g, h for super simple local, semi-anonymous things
go for (tail) recursive helpers (precedent)
n , m, i, j for length and size and other numeric values
v for results of map lookups and other dictionary types
s and t for strings.
a:as and x:xs and y:ys for lists.
(a,b,c,_) for tuple fields.
These generally only apply for arguments to HOFs. For your case, I'd go with something like k or eq3.
Use apostrophes sparingly, for derived values.
I tend to call boolean valued functions p for predicate. pred, unfortunately, is already taken.
In cases like this, where the inner function is basically the same as the outer function, but with different preconditions (requiring that the list is sorted), I sometimes use the same name with a prime, e.g. hasPairs'.
However, in this case, I would rather try to break down the problem into parts that are useful by themselves at the top level. That usually also makes naming them easier.
hasPair :: [(String, Int)] -> Bool
hasPair = hasDuplicate . map snd
hasDuplicate :: Ord a => [a] -> Bool
hasDuplicate = not . isStrictlySorted . sort
isStrictlySorted :: Ord a => [a] -> Bool
isStrictlySorted xs = and $ zipWith (<) xs (tail xs)
My strategy follows Don's suggestions fairly closely:
If there is an obvious name for it, use that.
Use go if it is the "worker" or otherwise very similar in purpose to the original function.
Follow personal conventions based on context, e.g. step and start for args to a fold.
If all else fails, just go with a generic name, like f
There are two techniques that I personally avoid. One is using the apostrophe version of the original function, e.g. hasPair' in the where clause of hasPair. It's too easy to accidentally write one when you meant the other; I prefer to use go in such cases. But this isn't a huge deal as long as the functions have different types. The other is using names that might connote something, but not anything that has to do with what the function actually does. napkin would fall into this category. When you revisit this code, this naming choice will probably baffle you, as you will have forgotten the original reason that you named it napkin. (Because napkins have 4 corners? Because they are easily folded? Because they clean up messes? They're found at restaurants?) Other offenders are things like bob and myCoolFunc.
If you have given a function a name that is more descriptive than go or h, then you should be able to look at either the context in which it is used, or the body of the function, and in both situations get a pretty good idea of why that name was chosen. This is where my point #3 comes in: personal conventions. Much of Don's advice applies. If you are using Haskell in a collaborative situation, then coordinate with your team and decide on certain conventions for common situations.

Correct way to define a function in Haskell

I'm new to Haskell and I'm trying out a few tutorials.
I wrote this script:
lucky::(Integral a)=> a-> String
lucky 7 = "LUCKY NUMBER 7"
lucky x = "Bad luck"
I saved this as lucky.hs and ran it in the interpreter and it works fine.
But I am unsure about function definitions. It seems from the little I have read that I could equally define the function lucky as follows (function name is lucky2):
lucky2::(Integral a)=> a-> String
lucky2 x=(if x== 7 then "LUCKY NUMBER 7" else "Bad luck")
Both seem to work equally well. Clearly function lucky is clearer to read but is the lucky2 a correct way to write a function?
They are both correct. Arguably, the first one is more idiomatic Haskell because it uses its very important feature called pattern matching. In this form, it would usually be written as:
lucky::(Integral a)=> a-> String
lucky 7 = "LUCKY NUMBER 7"
lucky _ = "Bad luck"
The underscore signifies the fact that you are ignoring the exact form (value) of your parameter. You only care that it is different than 7, which was the pattern captured by your previous declaration.
The importance of pattern matching is best illustrated by function that operates on more complicated data, such as lists. If you were to write a function that computes a length of list, for example, you would likely start by providing a variant for empty lists:
len [] = 0
The [] clause is a pattern, which is set to match empty lists. Empty lists obviously have length of 0, so that's what we are having our function return.
The other part of len would be the following:
len (x:xs) = 1 + len xs
Here, you are matching on the pattern (x:xs). Colon : is the so-called cons operator: it is appending a value to list. An expression x:xs is therefore a pattern which matches some element (x) being appended to some list (xs). As a whole, it matches a list which has at least one element, since xs can also be an empty list ([]).
This second definition of len is also pretty straightforward. You compute the length of remaining list (len xs) and at 1 to it, which corresponds to the first element (x).
(The usual way to write the above definition would be:
len (_:xs) = 1 + len xs
which again signifies that you do not care what the first element is, only that it exists).
A 3rd way to write this would be using guards:
lucky n
| n == 7 = "lucky"
| otherwise = "unlucky"
There is no reason to be confused about that. There is always more than 1 way to do it. Note that this would be true even if there were no pattern matching or guards and you had to use the if.
All of the forms we've covered so far use so-called syntactic sugar provided by Haskell. Pattern guards are transformed to ordinary case expressions, as well as multiple function clauses and if expressions. Hence the most low-level, unsugared way to write this would be perhaps:
lucky n = case n of
7 -> "lucky"
_ -> "unlucky"
While it is good that you check for idiomatic ways I'd recommend to a beginner that he uses whatever works for him best, whatever he understands best. For example, if one does (not yet) understand points free style, there is no reason to force it. It will come to you sooner or later.

Resources