Why do Haskell numerical literals need to start and end with digits? - haskell

In The Haskell 98 Report it's said that
A floating literal must contain digits both before and after the decimal point; this ensures that a decimal point cannot be mistaken for another use of the dot character.
What other use might this be? I can't imagine any such legal expression.
(To clarify the motivation: I'm aware that many people write numbers like 9.0 or 0.7 all the time without needing to, but I can't quite get comfortable with this. I'm OK with 0.7 rather than the more compact but otherwise no better .7, but written-out trailing zeroes feel just wrong to me unless they express that some quantity is precise up to tenths, which is seldom the case on the occasions when Haskell makes me write numbers like 9.0.)
I forgot it's legal to write function composition without surrounding whitespace! That's of course a possibility, though one could avoid this problem by parsing floating literals greedily, such that replicate 3 . pred$8 ≡ ((replicate 3) . pred) 8 but replicate 3.pred$8 ≡ (replicate 3.0 pred)8.
Is there no expression where an integer literal is required to stand directly next to a ., without whitespace?

One example of other uses is a dot operator (or any other operator starting or ending with a dot): replicate 3.pred$8.
Another possible use is in range expressions: [1..10].
Also, you can (almost) always write 9 instead of 9.0, thus avoiding the need for . altogether.
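The uses of the dot mentioned in this answer can be put side by side in a small module. This is only an illustrative sketch: the names threeCopies, oneToTen and half are made up for the example, and it assumes a compiler with no record-dot or similar syntax extensions switched on.

```haskell
-- "3.pred" is lexed as the integer 3, the operator (.), and the name pred,
-- so this is (replicate 3 . pred) 8, i.e. replicate 3 7.
threeCopies :: [Integer]
threeCopies = replicate 3.pred$8

-- In a range, the two dots sit directly next to integer literals.
oneToTen :: [Integer]
oneToTen = [1..10]

-- A floating literal needs digits on both sides of the point.
half :: Double
half = 0.5
```

Loading this in GHCi and evaluating threeCopies gives [7,7,7]. If 3. were accepted as a floating literal, the first definition would instead try to call replicate with a fractional count.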

One of the most prominent uses of (.) is function composition. So the Haskell compiler interprets a . 1 as composing the function a with a number and does not know what to do; analogously the other way round. Other uses of (.) can be found here.
Other problems with .7 vs. 0.7 are not known to me.

I don't really see much of a problem with allowing '9.' and '.7'. I think the current design is more of a reflection of the ideas of the original designers of Haskell.

While it could probably be disambiguated, I don't think there is much to be gained from allowing .7 and 7.. Code is meant to be read by people as well as machines, and it's much easier to accidentally miss a decimal point at either end of a literal than in the middle.
I'll take the extra readability over the saved byte any day.

Related

Parse arithmetic/boolean expression but skip capture

Given the following expression
x = a + 3 + b * 5
I would like to store that in the following data structure, where I'm only interested in capturing the variables used on the RHS and keeping the string intact. I'm not interested in parsing a more specific structure, since I'm doing a transformation from language to language and not handling the evaluation.
Variable "x" (Expr ["a","b"] "a + 3 + b * 5")
I've been using this tutorial as my starting point, but I'm not sure how to write an expression parser without buildExpressionParser. That doesn't seem to be the way I should approach this.
I am not sure why you want to avoid buildExpressionParser, as it hides a lot of the complexity in parsing expressions with infix operators. It is the right way to do things....
Sorry about that, but now that I got that nag out of the way, I can answer your question.
First, here is some background:
The main reason writing a parser for expressions with infix operators is hard is because of operator precedence. You want to make sure that this
x+y*z
parses as this
  +
 / \
x   *
   / \
  y   z
and not this
    *
   / \
  +   z
 / \
x   y
Choosing the correct parsetree isn't a very hard problem to solve.... But if you aren't paying attention, you can write some really bad code. Why? Performance....
The number of possible parsetrees, ignoring precedence, grows exponentially with the size of the input. For instance, if you write code to try all possibilities then throw away all but the ones with the proper precedence, you will have a nasty surprise when your parser tackles anything in the real world (remember, exponential complexity often ain't just slow, it is basically not a solution at all.... You may find that you are waiting half an hour for a simple parse, no one will use that parser).
I won't repeat the details of the "proper" solution here (a google search will give the details), except to note that the proper solution runs in O(n) time in the size of the input, and that buildExpressionParser hides all the complexity of writing such a parser for you.
So, back to your original question....
Do you need to use buildExpressionParser to get the variables out of the RHS, or is there a better way?
You don't need it....
Since all you care about is getting the variables used in the right side, you don't care about operator precedence. You can just make everything left associative and write a simple O(n) parser. The parsetrees will be wrong, but who cares? You will still get the same variables out. You don't even need a context free grammar for this, this regular expression basically does it
<variable>(<operator><variable>)*
(where <variable> and <operator> are defined in the obvious way).
However....
I wouldn't recommend this because, as simple as it is, it will still be more work than using buildExpressionParser. It will also be trickier to extend (for example, by adding parentheses). But most importantly, later on you may accidentally use it somewhere where you do need a full parsetree, and be confused for a while about why the operator precedence is so completely messed up.
Another solution is, you could rewrite your grammar to remove the ambiguity (again, google will tell you how).... This would be good as a learning exercise, but you basically would be repeating what buildExpressionParser is doing internally.
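For the "just collect the variables" route described above, a minimal sketch might look like the following. It deliberately swaps in a plain character scan over the right-hand side instead of Parsec, so it uses only base. The Statement and Expr types mirror the structure in the question; identifiers and parseAssignment are hypothetical names, and the scan assumes that variables are alphanumeric words starting with a letter or underscore.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)
import Data.List (dropWhileEnd, nub)

-- Same shape as the structure shown in the question.
data Expr = Expr [String] String deriving (Show, Eq)
data Statement = Variable String Expr deriving (Show, Eq)

isIdentStart, isIdentChar :: Char -> Bool
isIdentStart c = isAlpha c || c == '_'
isIdentChar c = isAlphaNum c || c == '_'

-- Scan left to right, keeping identifiers and skipping everything else
-- (numbers, operators, parentheses, whitespace). No precedence is needed.
identifiers :: String -> [String]
identifiers [] = []
identifiers s@(c : cs)
  | isIdentStart c = let (name, rest) = span isIdentChar s
                     in name : identifiers rest
  | isAlphaNum c = identifiers (dropWhile isIdentChar s) -- a numeric literal
  | otherwise = identifiers cs

-- Split at the first '=' and keep the right-hand side verbatim.
parseAssignment :: String -> Maybe Statement
parseAssignment line = case break (== '=') line of
  (lhs, '=' : rhs)
    | [name] <- words lhs ->
        Just (Variable name (Expr (nub (identifiers rhs)) (trim rhs)))
  _ -> Nothing
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace
```

In GHCi, parseAssignment "x = a + 3 + b * 5" evaluates to Just (Variable "x" (Expr ["a","b"] "a + 3 + b * 5")). As the answer warns, this produces no parse tree at all, so it is only suitable while the variables are the only thing needed.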

Reversing a list, evaluation order

I am reading page 69 of Haskell School of Expression and I am not sure that I got the evaluation of rev [1,2,3,4] right.
Hudak does not explain the evaluation (rewriting) order for reverse in detail in his book.
Could someone please either confirm that my guess (shown in the attached picture) is correct or if not correct then point out what I got wrong. I believe that it is correct but I am not 100% sure, this is the reason for asking.
So the question is:
When I evaluate one step of reverse, then after the evaluation (i.e. rewriting) the result should be surrounded by parentheses, right?
If I understand correctly, this unlucky appearance of parentheses is the reason for the poor (read: quadratic) time complexity of reverse. In this example 6 steps are spent in total on list appending in order to reverse a 4 element list.
Yes, nested, left-associative calls to append (which in Haskell goes by the names (++) and (<>)) give poor performance on singly-linked lists.
There are several solutions to this problem, since it's been known about for 30 or 40 years, at least. I believe the library version of reverse uses an accumulator to achieve linear complexity rather than quadratic, but it's still not something you want to call frequently on lists.
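To make the difference concrete, here is a sketch of both versions. It assumes the definition under discussion is the usual naive one that appends a singleton list at each step; slowReverse and fastReverse are names chosen for this example, and fastReverse is an accumulator-style definition written for illustration rather than a copy of the library source.

```haskell
-- Naive version. Each step rewrites to (slowReverse xs) ++ [x], so the
-- appends nest to the left:
--   slowReverse [1,2,3,4]
--   = slowReverse [2,3,4] ++ [1]
--   = (slowReverse [3,4] ++ [2]) ++ [1]
--   = ((slowReverse [4] ++ [3]) ++ [2]) ++ [1]
--   = (((slowReverse [] ++ [4]) ++ [3]) ++ [2]) ++ [1]
-- Every (++) walks its whole left argument, hence quadratic time overall.
slowReverse :: [a] -> [a]
slowReverse [] = []
slowReverse (x : xs) = slowReverse xs ++ [x]

-- Accumulator version: each element is consed onto the result exactly once,
-- so the whole reversal is linear in the length of the list.
fastReverse :: [a] -> [a]
fastReverse = go []
  where
    go acc [] = acc
    go acc (x : xs) = go (x : acc) xs
```

Both functions return the same list for any finite input; only the amount of work differs.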

If Non Terminals in Gene Expression Programming are mono-type functions, how to build complex Programs?

It just seemed to me, studying GEP and especially analyzing Karva expressions, that Non Terminals are most suitable for functions whose type is a->a for some type a, in Haskell notation.
Like, with classic examples, Q+-*/ are all functions from 'some' Double to 'a' Double and they differ only in arity.
Now, how can a coder use functions of heterogeneous signature in one Karva expressed gene?
Brief Introduction to GEP/Karva
Gene Expression Programming uses dense representations of a population of expressions and applies evolutionary pressure to make better ones to solve a given problem.
Karva notation represents an expression tree as a string, represented in a non-traditional traversal of level-at-a-time, left-to-right - read more here. Using Karva notation, it is simple and quick to combine (or mutate) expressions to create the next generation.
You can parse Karva notation in Haskell as per this answer with explanation of linear time or this answer that's the same code, but with more diagrams and no proof.
Terminals are the constants or variables in a Karva expression, so /+-a*3cb2 (meaning (a+(b*2))/(3-c), read one level at a time) has terminals [a,b,2,3,c]. A Karva expression with no terminals is thus a function of some arity.
My Question is then more related to how one would use different types of functions without breaking the gene.
What if one wants to use a Non Terminal like a > function? One can count on the fact that, for example, it can compare Doubles. But the result, in a strongly typed language, would be a Bool. Now, assuming that the Non Terminal encoding for > is interspersed in the gene, the parse of the k-expression would result in invalid code, because anything calling it would expect a Double.
One can then think of manually and silently sneaking in a cast, as is done by Ms. Ferreira in her book, where she converts Bools into Ints like 0 and 1 for False and True.
So it seems to me that k-expressed genes are for Non Terminals of any arity that share the property of taking values of one type a and returning a type a.
In the end, has anyone any idea about how to overcome this?
I already know that one can use homeotic genes, providing some glue between different Sub Expression Trees, but that, IMHO, is somewhat rigid, because, again, you need to know in advance the returned types.
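To illustrate the cast workaround mentioned above, here is a rough sketch of a single-type evaluator in which > returns 1 or 0 as a Double, so every Non Terminal stays closed over one type. Everything in it is an assumption made for the example: the Tree type, the names arity, levels, decode and eval, the single-character symbol set, and the requirement that the gene is well formed. It is not taken from any GEP library.

```haskell
import Data.Char (digitToInt, isDigit)
import Data.Maybe (fromMaybe)

data Tree = Node Char [Tree] deriving (Show, Eq)

-- Arity of each symbol; anything that is not an operator is a terminal.
arity :: Char -> Int
arity c = if c `elem` "+-*/>" then 2 else 0

-- Split a K-expression into tree levels: the root, then as many symbols
-- as the previous level needs arguments, and so on.
levels :: String -> [String]
levels = go 1
  where
    go 0 _ = []
    go n s = let (lvl, rest) = splitAt n s
             in lvl : go (sum (map arity lvl)) rest

-- Build the trees of one level by handing out the trees of the next level.
build :: [String] -> [Tree]
build [] = []
build (lvl : rest) = attach lvl (build rest)
  where
    attach [] _ = []
    attach (c : cs) kids =
      let (mine, others) = splitAt (arity c) kids
      in Node c mine : attach cs others

decode :: String -> Tree
decode gene = case build (levels gene) of
  (t : _) -> t
  [] -> error "empty gene"

-- Every node evaluates to a Double; '>' is coerced to 1 or 0.
eval :: [(Char, Double)] -> Tree -> Double
eval env (Node c kids) = case (c, map (eval env) kids) of
  ('+', [x, y]) -> x + y
  ('-', [x, y]) -> x - y
  ('*', [x, y]) -> x * y
  ('/', [x, y]) -> x / y
  ('>', [x, y]) -> if x > y then 1 else 0
  (_, [])
    | isDigit c -> fromIntegral (digitToInt c)
    | otherwise -> fromMaybe (error ("unbound terminal " ++ [c])) (lookup c env)
  _ -> error "malformed gene"
```

With a = 1, b = 2, c = 1, the gene "+>cab" decodes to (a > b) + c and evaluates to 1.0. The price of this design is the one the question already names: the comparison result is just another Double, so the type checker can no longer tell a truth value from a quantity.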

Why do programming languages use commas to separate function parameters?

It seems like all programming languages use commas (,) to separate function parameters.
Why don't they use just spaces instead?
Absolutely not. What about this function call:
function(a, b - c);
How would that look with a space instead of the comma?
function(a b - c);
Does that mean function(a, b - c); or function(a, b, -c);? The use of the comma presumably comes from mathematics, where commas have been used to separate function parameters for centuries.
First of all, your premise is false. There are languages that use space as a separator (Lisp, ML, Haskell, possibly others).
The reason that most languages don't is probably that a) f(x,y) is the notation most people are used to from mathematics and b) using spaces leads to lots of nested parentheses (also called "the lisp effect").
Lisp-like languages use: (f arg1 arg2 arg3) which is essentially what you're asking for.
ML-like languages use concatenation to apply curried arguments, so you would write f arg1 arg2 arg3.
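As a small illustration of application by juxtaposition, and of how it settles the f(a b - c) question raised earlier, consider this Haskell sketch. The functions pair and triple are made up for the example.

```haskell
pair :: Int -> Int -> (Int, Int)
pair x y = (x, y)

triple :: Int -> Int -> Int -> (Int, Int, Int)
triple x y z = (x, y, z)

-- Two arguments: 1 and the difference 5 - 2. A compound argument
-- has to be wrapped in parentheses.
twoArgs :: (Int, Int)
twoArgs = pair 1 (5 - 2) -- (1,3)

-- Three arguments: 1, 5 and negative 2.
threeArgs :: (Int, Int, Int)
threeArgs = triple 1 5 (-2) -- (1,5,-2)
```

The ambiguity does not disappear; it is resolved by a fixed rule (application binds tighter than any operator), and the parentheses do the work that the comma does elsewhere.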
Tcl uses space as a separator between words passed to commands. Where it has a composite argument, that has to be bracketed or otherwise quoted. Mind you, even there you will find the use of commas as separators – in expression syntax only – but that's because the notation is in common use outside of programming. Mathematics has written n-ary function applications that way for a very long time; computing (notably Fortran) just borrowed.
You don't have to look further than most of our natural languages to see that the comma is used to separate items in lists. So using anything other than a comma for enumerating parameters would be unexpected for anyone learning a programming language for the first time.
There's a number of historical reasons already pointed out.
Also, it's because in most languages where , serves as a separator, whitespace sequences are largely ignored, or to be more exact, although they may separate tokens, they do not act as tokens themselves. This is more or less true for all languages deriving their syntax from C. A sequence of whitespace is much like the empty word, and having the empty word delimit anything probably is not the best of ideas.
Also, I think it is clearer and easier to read. Why have whitespace, which is invisible and essentially serves no purpose other than formatting, act as a really meaningful delimiter? It only introduces ambiguity. One example is that provided by Carl.
A second would be f(a (b + c)). Now is that f(a(b+c)) or f(a, b+c)?
The creators of JavaScript had a very useful idea, similar to yours, which yields just the same problems. The idea was, that ENTER could also serve as ;, if the statement was complete. Observe:
function a() {
return "some really long string or expression or whatsoever";
}
function b() {
return
"some really long string or expression or whatsoever";
}
alert(a());//"some really long string or expression or whatsoever"
alert(b());//"undefined" or "null" or whatever, because 'return;' is a valid statement
As a matter of fact, I sometimes tend to use the latter notation in languages that do not have this 'feature'. JavaScript forces a way to format my code upon me, because someone had the cool idea of using ENTER instead of ;.
I think there are a number of good reasons why some languages are the way they are. This matters especially in dynamic languages (such as PHP), where there's no compile-time check in which the compiler could warn you that the way it resolved an ambiguity like the one above doesn't match the signature of the call you want to make. You'd have a lot of weird runtime errors and a really hard life.
There are languages which allow this, but there are a number of reasons why they do so. First and foremost, a bunch of very clever people sat down and spent quite some time designing a language, then discovered that its syntax makes the , obsolete most of the time, and thus took the decision to eliminate it.
This may sound a bit wise, but I gather it's for the same reason that most languages on planet Earth use it (English, French, and those few others ;-) Also, it is intuitive to most.
Haskell doesn't use commas.
Example
multList :: [Int] -> Int -> [Int]
multList (x : xs) y = (x * y) : (multList xs y)
multList [] _ = []
The reason for using commas in C/C++ is that reading a long argument list without a separator can be difficult.
Try reading this
void foo(void * ptr point & * big list<pointers<point> > * t)
commas are useful like spaces are. In Latin nothing was written with spaces, periods, or lower case letters.
Try reading this
IAMTHEVERYMODELOFAWHATDOYOUWANTNOTHATSMYBUCKET
it's primarily to help you read things.
This is not true. Some languages don't use commas. Functions were mathematical concepts before they were programming constructs, so some languages keep the old notation. Then most of the newer ones were inspired by C (JavaScript, Java, C#, and PHP too; they share some formal rules, like the comma).
While some languages do use spaces, using a comma avoids ambiguous situations without the need for parentheses. A more interesting question might be why C uses the same character as a separator as is used for the "a then b" operator. That question is in some ways more interesting given that the C character set has at least three other characters that do not appear in any context (dollar sign, commercial-at, and grave), and I know at least one of those (the dollar sign) dates back to the 40-character punchcard set.
It seems like all programming languages use commas (,) to separate function parameters.
In natural languages that include the comma in their script, that character is used to separate things. For instance, if you were to enumerate fruits, you'd write: "lemon, orange, strawberry, grape". That is, using commas.
Hence, using a comma to separate parameters in a function is more natural than using another character ( | for instance ).
Consider:
someFunction( name, age, location )
vs.
someFunction( name|age|location )
Why don't they use just spaces instead?
That's possible. Lisp does it.
The main reason is that space is already used to separate tokens, and it's easier not to assign it an extra function.
I have programmed in quite a few languages and while the comma does not rule supreme it is certainly in front. The comma is good because it is a visible character so that script can be compressed by removing spaces without breaking things. If you have space then you can have tabs and that can be a pain in the ... There are issues with new-lines and spaces at the end of a line. Give me a comma any day, you can see it and you know what it does. Spaces are for readability (generally) and commas are part of syntax. Mind you there are plenty of exceptions where a space is required or de rigueur. I also like curly brackets.
It is probably tradition. If they used space they could not pass expression as param e.g.
f(a-b c)
would be very different from
f(a -b c)
Some languages, like Boo, allow you to specify the type of parameters or leave it out, like so:
def MyFunction(obj1, obj2, title as String, count as Int):
...do stuff...
Meaning: obj1 and obj2 can be of any type (inherited from object), whereas title and count must be of type String and Int respectively. This would be hard to do using spaces as separators.
