Splitting hairs whilst splitting strings

Splitting hairs whilst splitting strings - string

I tried to make a function in Idris like so:
strSplit : String -> Maybe (Char, String)
This would 'un-cons' the string into its first Char and the rest of the string, and return Nothing if it were empty.
So I wrote this, which failed:
strSplit x = case strM of
StrNil => Nothing
StrCons c cd => Just (c, cs)
So I then tried this, a bit like in Prelude.Strings:
strSplit x with (strM x)
strSplit "" | StrNil = Nothing
strSplit (strCons c cs) | (StrCons c cs) = Just (c, cs)
Which compiled and ran with no problems.
My question is, why do I have to use that with rule to split a string in this way, and why does my original method fail?
Note: Sorry, I can't access an interpreter at the moment, so I can't write the error message here yet.

There are two problems here. Firstly, in the 'case' block, the argument is strM rather than strM x as it is in the 'with' block, so you're inspecting different things.
There's a more interesting problem though, which is that if you try fixing the first one:
strSplit : String -> Maybe (Char, String)
strSplit x = case strM x of
StrNil => Nothing
StrCons c cd => Just (c, cs)
You'll get a different error (this is from current master which has reworded error messages):
Type mismatch between
StrM "" (Type of StrNil)
and
StrM x (Expected type)
So the distinction between 'case' and 'with' is that 'with' takes into account that the thing you're inspecting may influence the types and values on the left hand side. In the 'case', matching strM x means that x must be "", but a 'case' can appear anywhere and takes no account of the effect on the types of the other arguments (working out appropriate type checking rules for this would be quite a challenge...).
On the other hand, 'with' can only appear at the top level: effectively what it's doing is adding another top level thing to match on which, being top level, can affect the types and values of the other patterns.
So, the short answer is that 'with' supports dependent pattern matching, but 'case' does not.

Related

How can I write a parser using Parsec that only accepts unique elements?

I have recently started learning Haskell and have been trying my hand at Parsec. However, for the past couple of days I have been stuck with a problem that I have been unable to find the solution to. So what I am trying to do is write a parser that can parse a string like this:
<"apple", "pear", "pineapple", "orange">
The code that I wrote to do that is:
collection :: Parser [String]
collection = (char '<') *> (string `sepBy` char ',')) <* (char '>')
string :: Parser String
string = char '"' *> (many (noneOf ['\"', '\r', '\n', '"'])) <* char '"'
This works fine for me as it is able to parse the string that I have defined above. Nevertheless, I would now like to enforce the rule that every element in this collection must be unique and that is where I am having trouble. One of the first results I found when searching on the internet was this one, which suggest the usage of the nub function. Although the problem stated in that question is not the same, it would in theory solve my problem. But what I don't understand is how I can apply this function within a Parser. I have tried adding the nub function to several parts of the code above without any success. Later I also tried doing it the following way:
collection :: Parser [String]
collection = do
char '<'
value <- (string `sepBy` char ','))
char '>'
return nub value
But this does not work as the type does not match what nub is expecting, which I believe is one of the problems I am struggling with. I am also not entirely sure whether nub is the right way to go. My fear is that I am going in the wrong direction and that I won't be able to solve my problem like this. Is there perhaps something I am missing? Any advice or help anyone could provide would be greatly appreciated.

The Parsec Parser type is an instance of MonadPlus which means that we can always fail (ie cause a parse error) whenever we want. A handy function for this is guard:
guard :: MonadPlus m => Bool -> m ()
This function takes a boolean. If it's true, it return () and the whole computation (a parse in this case) does not fail. If it's false, the whole thing fails.
So, as long as you don't care about efficiency, here's a reasonable approach: parse the whole list, check for whether all the elements are unique and fail if they aren't.
To do this, the first thing we have to do is write a predicate that checks if every element of a list is unique. nub does not quite do the right thing: it return a list with all the duplicates taken out. But if we don't care much about performance, we can use it to check:
allUnique ls = length (nub ls) == length ls
With this predicate in hand, we can write a function unique that wraps any parser that produces a list and ensures that list is unique:
unique parser = do res <- parser
guard (allUnique res)
return res
Again, if guard is give True, it doesn't affect the rest of the parse. But if it's given False, it will cause an error.
Here's how we could use it:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\">"
Right ["apple","pear","pineapple","orange"]
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):unknown parse error
This does what you want. However, there's a problem: there is no error message supplied. That's not very user friendly! Happily, we can fix this using <?>. This is an operator provided by Parsec that lets us set the error message of a parser.
unique parser = do res <- parser
guard (allUnique res) <?> "unique elements"
return res
Ahhh, much better:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):
expecting unique elements
All this works but, again, it's worth noting that it isn't efficient. It parses the whole list before realizing elements aren't unique, and nub takes quadratic time. However, this works and it's probably more than good enough for parsing small to medium-sized files: ie most things written by hand rather than autogenerated.

G-machine, (non-)strict contexts - why case expressions need special treatment

I'm currently reading Implementing functional languages: a tutorial by SPJ and the (sub)chapter I'll be referring to in this question is 3.8.7 (page 136).
The first remark there is that a reader following the tutorial has not yet implemented C scheme compilation (that is, of expressions appearing in non-strict contexts) of ECase expressions.
The solution proposed is to transform a Core program so that ECase expressions simply never appear in non-strict contexts. Specifically, each such occurrence creates a new supercombinator with exactly one variable which body corresponds to the original ECase expression, and the occurrence itself is replaced with a call to that supercombinator.
Below I present a (slightly modified) example of such transformation from 1
t a b = Pack{2,1} ;
f x = Pack{2,2} (case t x 7 6 of
<1> -> 1;
<2> -> 2) Pack{1,0} ;
main = f 3
== transformed into ==>
t a b = Pack{2,1} ;
f x = Pack{2,2} ($Case1 (t x 7 6)) Pack{1,0} ;
$Case1 x = case x of
<1> -> 1;
<2> -> 2 ;
main = f 3
I implemented this solution and it works like charm, that is, the output is Pack{2,2} 2 Pack{1,0}.
However, what I don't understand is - why all that trouble? I hope it's not just me, but the first thought I had of solving the problem was to just implement compilation of ECase expressions in C scheme. And I did it by mimicking the rule for compilation in E scheme (page 134 in 1 but I present that rule here for completeness): so I used
E[[case e of alts]] p = E[[e]] p ++ [Casejump D[[alts]] p]
and wrote
C[[case e of alts]] p = C[[e]] p ++ [Eval] ++ [Casejump D[[alts]] p]
I added [Eval] because Casejump needs an argument on top of the stack in weak head normal form (WHNF) and C scheme doesn't guarantee that, as opposed to E scheme.
But then the output changes to enigmatic: Pack{2,2} 2 6.
The same applies when I use the same rule as for E scheme, i.e.
C[[case e of alts]] p = E[[e]] p ++ [Casejump D[[alts]] p]
So I guess that my "obvious" solution is inherently wrong - and I can see that from outputs. But I'm having trouble stating formal arguments as to why that approach was bound to fail.
Can someone provide me with such argument/proof or some intuition as to why the naive approach doesn't work?

The purpose of the C scheme is to not perform any computation, but just delay everything until an EVAL happens (which it might or might not). What are you doing in your proposed code generation for case? You're calling EVAL! And the whole purpose of C is to not call EVAL on anything, so you've now evaluated something prematurely.
The only way you could generate code directly for case in the C scheme would be to add some new instruction to perform the case analysis once it's evaluated.
But we (Thomas Johnsson and I) decided it was simpler to just lift out such expressions. The exact historical details are lost in time though. :)

Erlang: Matching strings in guard statement

Started working with erlang quite recently and ran into the problem above, how do you go about comparing two strings in a guard statement? Tried the string:equal(x,y) method but couldn't get it to work inside a guard.

You could use pattern matching like this:
are_the_same(A, A) ->
true;
are_the_same(_, _) ->
false.
In first clause both arguments are named A which will result in them being pattern matched against each other. Or to be exact first argument will be bind to A variable with use of = operator, and than second argument will be bind to A variable with = operator, but since A is bound already it will be treated as "comparision". You can read more about this in docs.
And of course you could write write first clouse with use of guard like:
are_the_same(A, B) when A =:= B ->

You don't need the function string:equal/2 to compare strings; you can use the operators == or =:=, which are allowed in guard tests. For example:
foo(A, B) when A =:= B ->
equal;
foo(_, _) ->
not_equal.
Though in most cases you'd want to use pattern matching instead, as described in the other answer.
NB: As of Erlang/OTP 20.0, string:equal(A, B) is no longer equivalent to A =:= B. string:equal/2 now operates on grapheme clusters, and there are also string:equal/3 and string:equal/4 that can optionally ignore case when comparing and do Unicode normalisation. So you need to understand what you mean by "equal" before settling on a comparison method.

The functions you can use in guards are limited because of the nature of Erlang's scheduling; specifically, Erlang aims to avoid side-effects in guard statements (e.g., calling to another process) because guards are evaluated by the scheduler and do not count against reductions. This is why string:equal does not work.
That being said, you can use Erlang's pattern matching to match strings. Please bear in mind the use of strings as lists, binaries, or iolists (nested lists/binaries) in Erlang, and make sure you're testing/passing strings of the right type (iolists are particularly hard to pattern match and are usually best handled with the re module, or converting them to binaries via iolist_to_binary).
For example, say we want a function that tests to see if a string begins with "foo":
bar("foo" ++ _Rest) -> true;
bar(<<"foo", Rest/binary>>) -> true;
bar(_Else) -> false.
If you just want to test for a particular string, it's even easier:
bar("foo") -> true;
bar(<<"foo">>) -> true;
bar(_Else) -> false.

Correct way to define a function in Haskell

I'm new to Haskell and I'm trying out a few tutorials.
I wrote this script:
lucky::(Integral a)=> a-> String
lucky 7 = "LUCKY NUMBER 7"
lucky x = "Bad luck"
I saved this as lucky.hs and ran it in the interpreter and it works fine.
But I am unsure about function definitions. It seems from the little I have read that I could equally define the function lucky as follows (function name is lucky2):
lucky2::(Integral a)=> a-> String
lucky2 x=(if x== 7 then "LUCKY NUMBER 7" else "Bad luck")
Both seem to work equally well. Clearly function lucky is clearer to read but is the lucky2 a correct way to write a function?

They are both correct. Arguably, the first one is more idiomatic Haskell because it uses its very important feature called pattern matching. In this form, it would usually be written as:
lucky::(Integral a)=> a-> String
lucky 7 = "LUCKY NUMBER 7"
lucky _ = "Bad luck"
The underscore signifies the fact that you are ignoring the exact form (value) of your parameter. You only care that it is different than 7, which was the pattern captured by your previous declaration.
The importance of pattern matching is best illustrated by function that operates on more complicated data, such as lists. If you were to write a function that computes a length of list, for example, you would likely start by providing a variant for empty lists:
len [] = 0
The [] clause is a pattern, which is set to match empty lists. Empty lists obviously have length of 0, so that's what we are having our function return.
The other part of len would be the following:
len (x:xs) = 1 + len xs
Here, you are matching on the pattern (x:xs). Colon : is the so-called cons operator: it is appending a value to list. An expression x:xs is therefore a pattern which matches some element (x) being appended to some list (xs). As a whole, it matches a list which has at least one element, since xs can also be an empty list ([]).
This second definition of len is also pretty straightforward. You compute the length of remaining list (len xs) and at 1 to it, which corresponds to the first element (x).
(The usual way to write the above definition would be:
len (_:xs) = 1 + len xs
which again signifies that you do not care what the first element is, only that it exists).

A 3rd way to write this would be using guards:
lucky n
| n == 7 = "lucky"
| otherwise = "unlucky"
There is no reason to be confused about that. There is always more than 1 way to do it. Note that this would be true even if there were no pattern matching or guards and you had to use the if.
All of the forms we've covered so far use so-called syntactic sugar provided by Haskell. Pattern guards are transformed to ordinary case expressions, as well as multiple function clauses and if expressions. Hence the most low-level, unsugared way to write this would be perhaps:
lucky n = case n of
7 -> "lucky"
_ -> "unlucky"
While it is good that you check for idiomatic ways I'd recommend to a beginner that he uses whatever works for him best, whatever he understands best. For example, if one does (not yet) understand points free style, there is no reason to force it. It will come to you sooner or later.

Haskell: Delimit a string by chosen sub-strings and whitespace

Am still new to Haskell, so apologize if there is an obvious answer to this...
I would like to make a function that splits up the all following lists of strings i.e. [String]:
["int x = 1", "y := x + 123"]
["int x= 1", "y:= x+123"]
["int x=1", "y:=x+123"]
All into the same string of strings i.e. [[String]]:
[["int", "x", "=", "1"], ["y", ":=", "x", "+", "123"]]
You can use map words.lines for the first [String].
But I do not know any really neat ways to also take into account the others - where you would be using the various sub-strings "=", ":=", "+" etc. to break up the main string.
Thank you for taking the time to enlighten me on Haskell :-)

The Prelude comes with a little-known handy function called lex, which is a lexer for Haskell expressions. These match the form you need.
lex :: String -> [(String,String)]
What a weird type though! The list is there for interfacing with a standard type of parser, but I'm pretty sure lex always returns either 1 or 0 elements (0 indicating a parse failure). The tuple is (token-lexed, rest-of-input), so lex only pulls off one token. So a simple way to lex a whole string would be:
lexStr :: String -> [String]
lexStr "" = []
lexStr s =
case lex s of
[(tok,rest)] -> tok : lexStr rest
[] -> error "Failed lex"
To appease the pedants, this code is in terrible form. An explicit call to error instead of returning a reasonable error using Maybe, assuming lex only returns 1 or 0 elements, etc. The code that does this reliably is about the same length, but is significantly more abstract, so I spared your beginner eyes.

I would take a look at parsec and build a simple grammar for parsing your strings.

how about using words .)
words :: String -> [String]
and words wont care for whitespaces..
words "Hello World"
= words "Hello World"
= ["Hello", "World"]

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Splitting hairs whilst splitting strings - string

Related

How can I write a parser using Parsec that only accepts unique elements?

G-machine, (non-)strict contexts - why case expressions need special treatment

Erlang: Matching strings in guard statement

Correct way to define a function in Haskell

Haskell: Delimit a string by chosen sub-strings and whitespace

Categories

Resources