Is functional Clojure or imperative Groovy more readable?

OK, no cheating now.
No, really, take a minute or two and try this out.
What does "positions" do?
Edit: simplified according to cgrand's suggestion.
(defn redux [[current next] flag] [(if flag current next) (inc next)])
(defn positions [coll]
  (map first (reductions redux [1 2] (map = coll (rest coll)))))
Now, how about this version?
def positions(coll) {
    def (current, next) = [1, 1]
    def previous = coll[0]
    coll.collect {
        current = (it == previous) ? current : next
        next++
        previous = it
        current
    }
}
I'm learning Clojure and I'm loving it, because I've always enjoyed functional programming. It took me longer to come up with the Clojure solution, but I enjoyed having to think of an elegant solution. The Groovy solution is alright, but I'm at the point where I find this type of imperative programming boring and mechanical. After 12 years of Java, I feel in a rut and functional programming with Clojure is the boost I needed.
Right, get to the point. Well, I have to be honest and say that I wonder if I'll understand the Clojure code when I go back to it months later. Sure I could comment the heck out of it, but I don't need to comment my Java code to understand it.
So my question is: is it a question of getting more used to functional programming patterns? Are functional programming gurus reading this code and finding it a breeze to understand? Which version did you find easier to understand?
Edit: what this code does is calculate the positions of players according to their points, while keeping track of those who are tied. For example:
Pos Points
1. 36
1. 36
1. 36
4. 34
5. 32
5. 32
5. 32
8. 30
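(For reference - my addition, not part of the question - calling the Clojure version defined above on the points column gives the positions column; the result is shown as a comment:)
(positions [36 36 36 34 32 32 32 30])
;; => (1 1 1 4 5 5 5 8)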

I don't think there's any such thing as intrinsic readability. There's what you're used to, and what you aren't used to. I was able to read both versions of your code OK. I could actually read your Groovy version more easily, even though I don't know Groovy, because I too spent a decade looking at C and Java and only a year looking at Clojure. That doesn't say anything about the languages, it only says something about me.
Similarly I can read English more easily than Spanish, but that doesn't say anything about the intrinsic readability of those languages either. (Spanish is actually probably the "more readable" language of the two in terms of simplicity and consistency, but I still can't read it). I'm learning Japanese right now and having a heck of a hard time, but native Japanese speakers say the same about English.
If you spent most of your life reading Java, of course things that look like Java will be easier to read than things that don't. Until you've spent as much time looking at Lispy languages as looking at C-like languages, this will probably remain true.
To understand a language, among other things you have to be familiar with:
syntax ([vector] vs. (list), hyphens-in-names)
vocabulary (what does reductions mean? How/where can you look it up?)
evaluation rules (does treating functions as objects work? It's an error in most languages.)
idioms, like (map first (some set of reductions with extra accumulated values))
All of these take time and practice and repetition to learn and internalize. But if you spend the next 6 months reading and writing lots of Clojure, not only will you be able to understand that Clojure code 6 months from now, you'll probably understand it better than you do now, and maybe even be able to simplify it. How about this:
(use 'clojure.contrib.seq-utils)
(defn positions [coll]
  (mapcat #(repeat (count %) (inc (ffirst %)))
          (partition-by second (indexed coll))))
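(A side note of mine, not part of the original answer: clojure.contrib.seq-utils has long since been deprecated. A sketch of the same approach in current Clojure, using map-indexed in place of indexed, might look like this:)
(defn positions [coll]
  (mapcat #(repeat (count %) (inc (ffirst %)))
          (partition-by second (map-indexed vector coll))))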
Looking at Clojure code I wrote a year ago, I'm horrified at how bad it is, but I can read it OK. (Not saying your Clojure code is horrible; I had no trouble reading it at all, and I'm no guru.)

I agree with Timothy: you introduce too many abstractions. I reworked your code and ended up with:
(defn positions [coll]
  (reductions (fn [[_ prev-score :as prev] [_ score :as curr]]
                (if (= prev-score score) prev curr))
              (map vector (iterate inc 1) coll)))
About your code,
(defn use-prev [[a b]] (= a b))
(defn pairs [coll] (partition 2 1 coll))
(map use-prev (pairs coll))
can be simply refactored as:
(map = coll (rest coll))
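(A quick check of mine: both forms produce the same sequence of "equal to the previous score?" flags, with results shown as comments:)
(map use-prev (pairs [36 36 36 34 32]))          ;; => (true true false false)
(map = [36 36 36 34 32] (rest [36 36 36 34 32])) ;; => (true true false false)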

edit: may not be relevant anymore.
The Clojure one is convoluted to me. It contains more abstractions which need to be understood. This is the price of using higher-order functions: you have to know what they mean. So in an isolated case, imperative requires less knowledge. But the power of abstractions is in their means of combination. Every imperative loop must be read and understood, whereas sequence abstractions allow you to remove the complexity of a loop and combine powerful operations.
I would further argue that the Groovy version is at least partially functional as it uses collect, which is really map, a higher order function. It has some state in it also.
Here is how I would write the Clojure version:
(defn positions2 [coll]
  (let [current (atom 1)
        if-same #(if (= %1 %2) @current (reset! current (inc %3)))]
    (map if-same (cons (first coll) coll) coll (range (count coll)))))
This is quite similar to the Groovy version in that it uses a mutable "current", but differs in that it doesn't have a next/prev variable - instead using immutable sequences for those. As Brian eloquently put it, readability is not intrinsic. This version is my preference for this particular case, and seems to sit somewhere in the middle.

The Clojure one is more convoluted at first glance, though it may be more elegant.
OO is an attempt to make a language more "relatable" at a higher level.
Functional languages seem to have a more "algorithmic" (primitive/elementary) feel to them.
That's just how I feel at the moment.
Maybe that will change when I have more experience working with Clojure.
I'm afraid that we are descending into the game of which language can be the most concise or solve a problem in the fewest lines of code.
The issue is twofold for me:
How easy is it, at first glance, to get a feel for what the code is doing?
This is important for code maintainers.
How easy is it to guess at the logic behind the code?
Too verbose/long-winded? Too terse?
"Make everything as simple as possible, but not simpler."
Albert Einstein

I too am learning Clojure and loving it. But at this stage of my development, the Groovy version was easier to understand. What I like about Clojure though is reading the code and having the "Aha!" experience when you finally "get" what is going on. What I really enjoy is the similar experience that happens a few minutes later when you realize all of the ways the code could be applied to other types of data with no changes to the code. I've lost count of the number of times I've worked through some numerical code in Clojure and then, a little while later, thought of how that same code could be used with strings, symbols, widgets, ...
The analogy I use is about learning colors. Remember when you were introduced to the color red? You understood it pretty quickly -- there's all this red stuff in the world. Then you heard the term magenta and were lost for a while. But again, after a little more exposure, you understood the concept and had a much more specific way to describe a particular color. You have to internalize the concept, hold a bit more information in your head, but you end up with something more powerful and concise.

Groovy supports various styles of solving this problem too:
coll.groupBy{it}.inject([]){ c, n -> c + [c.size() + 1] * n.value.size() }
definitely not refactored to be pretty but not too hard to understand.

I know this is not an answer to the question, but I will be able to "understand" the code much better if there are tests, such as:
assert positions([1]) == [1]
assert positions([2, 1]) == [1, 2]
assert positions([2, 2, 1]) == [1, 1, 3]
assert positions([3, 2, 1]) == [1, 2, 3]
assert positions([2, 2, 2, 1]) == [1, 1, 1, 4]
That will tell me, one year from now, what the code is expected to do. Much better than any excellent version of the code I have seen here.
Am I really off topic?
The other thing is, I think "readability" depends on the context. It depends who will maintain the code. For example, in order to maintain the "functional" version of the Groovy code (however brilliant), it will take not only Groovy programmers, but functional Groovy programmers...
The other, more relevant, example is: if a few lines of code make it easier to understand for "beginner" Clojure programmers, then the code will overall be more readable because it will be understood by a larger community: no need to have studied Clojure for three years to be able to grasp the code and make edits to it.


Does Lisp's treatment of code as data make it more vulnerable to security exploits than a language that doesn't treat code as data? [closed]

I know that this might be a stupid question but I was curious.
Since Lisp treats code and data the same, does this mean that it's easier to write a payload and pass it as "innocent" data that can be used to exploit programs? In comparison to languages that don't do so?
For example, in Python you can do something like this:
malicious_str = "print('this is a malicious string')"
user_in = eval(malicious_str)
# output: this is a malicious string
P.S I have just started learning Lisp.
No, I don't think it does. In fact because of what is normally meant by 'code is data' in Lisp, it is potentially less vulnerable than some other languages.
[Note: this answer is really about Common Lisp: see the end for a note about that.]
There are two senses in which 'code can be data' in a language.
Turning objects into executable code: eval & friends
This is the first sense. What this means is that you can, say, take a string or some other object (not all types of object, obviously) and say 'turn this into something I can execute, and do that'.
Any language that can do this has either
to be extremely careful about doing this on unconstrained data;
or to be able to be certain that a given program does not actually do this.
Plenty of languages have equivalents of eval and its relations, so plenty of languages have this problem. You give an example of Python for instance, which is a good one, and there are probably other examples in Python (I've written programs even in Python 2 which supported dynamic loading of modules at runtime, which executes potentially arbitrary code, and I think this stuff is much better integrated in Python 3).
This is also not just a property of a language: it's a property of a system. C can't do this, right? Well, yes it can if you're on any kind of reasonable *nixy platform. Not only can you use an exec-family function, but you can probably dynamically load a shared library and execute code in it.
So one solution to this problem is to, somehow, be able to be certain that a given program doesn't do this. One thing that helps is if there are a finite, known number of ways of doing it. In Common Lisp I think those are probably
eval of course;
unconstrained read (because of *read-eval*);
load;
compile;
compile-file;
and probably some others that I have forgotten.
Well, can you detect calls to those, statically, in a program? Not really: consider this:
(funcall (symbol-function (find-symbol s)) ...)
and now you're in trouble unless you have very good control over what s is: it might be "EVAL" for instance.
So that's frightening, but I don't think it's more frightening than what Python can do, for instance (almost certainly you can poke around in the namespace to find eval or something?). And something like that in a program ought to be a really big hint that bad things might happen.
I think there are probably two approaches to this, neither of which CL adopts but which implementations could (and perhaps even programs written in CL could).
One would be to be able to run programs in such a way that the finite set of bad functions above simply are disallowed: they'd signal errors if you tried to call them. An implementation could clearly do that (see below).
The other would be to have something like Perl's 'tainting' where data which came from a user needs to be explicitly looked-at by the program somehow before it's used. That doesn't guarantee safety of course, but it does make it harder to make silly mistakes: if s above came from user input and was thus tainted you'd have to explicitly say 'it's OK to use it' and, well, then it's up to you.
So this is a problem, but I don't think it's worse than the problems that very many other languages (and language-families) have.
An example of an implementation that can address the first approach is LispWorks: if you're building an application with LW, you typically create the binary with a function called deliver, which has options which allow you to remove the definitions of functions from the resulting binary whether or not the delivery process would otherwise leave them there. So, for instance
(deliver 'foo "x" 5
         :functions-to-remove '(eval load compile compile-file read))
would result in an executable x which, whatever else it did, couldn't call those functions, because they're not present, at all.
Other implementations probably have similar features: I just don't know them as well.
But there's another sense in which 'code is data' in Lisp.
Program source code is available as structured data
This is the sense that people probably really mean when they say 'code is data' in Lisp, even if they don't know that. It's worth looking at your Python example again:
>>> eval("exit('exploded')")
exploded
$
So what eval eats is a string: a completely unstructured vector of characters. If you want to know whether that string contains something nasty, well, you've got a lot of work ahead of you (disclaimer: see below).
Compare this with CL:
> (let ((trying-to-be-bad "(end-the-world :now t)"))
    (eval trying-to-be-bad))
"(end-the-world :now t)"
OK, so that clearly didn't end the world. And it didn't end the world because eval evaluates a bit of Lisp source code, and the value of a string, as source code, is the string.
If I want to do something nasty I have to hand it an actual interesting structure:
> (let ((actually-bad '(eval (progn
                               (format *query-io* "? ")
                               (finish-output *query-io*)
                               (read *query-io*)))))
    (eval actually-bad))
? (defun foo () (foo))
foo
Now that's potentially quite nasty in at least several ways. But wait: in order to do this nasty thing, I had to hand eval a chunk of source code represented as an s-expression. And the structure of that s-expression is completely open to inspection by me. I can write a program which inspects this s-expression in any arbitrary way I like, and decides whether or not it is acceptable to me. That's just hugely easier than 'given this string, interpret it as a piece of source text for the language and tell me if it is dangerous':
the process of turning the sequence of characters into an s-expression has happened already;
the structure of s-expressions is both simple and standard.
So in this sense of 'code is data', Lisp is potentially much safer than other languages which have versions of eval which eat strings, like Python, say, because code is structured, standard, simple data. Lisp has an answer to the terrible 'language in a string' problem.
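To make that concrete (my illustration, not the answer's): because the code arrives as ordinary list structure, a deny-list check is just a short walk over that structure. A sketch in Clojure - the same idea carries over to Common Lisp - assuming we simply reject any form that mentions a forbidden symbol:
(def forbidden '#{eval load-file load-string read-string})

(defn safe-form?
  "Walk a quoted form and reject it if any forbidden symbol appears anywhere."
  [form]
  (not-any? forbidden (filter symbol? (tree-seq coll? seq form))))

;; (safe-form? '(+ 1 2))                          ;; => true
;; (safe-form? '(eval (read-string user-input)))  ;; => false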
I am fairly sure that Python does in fact have some approach to making the parse tree available in a standard way which can be inspected. But eval still happily eats strings.
As I said above, this answer is about Common Lisp. But there are many other Lisps of course, which will have varying versions of this problem. Racket for instance probably can really fairly tightly constrain things, using sandboxed execution and modules, although I haven't explored this.
Any language can be exploited if you are not careful.
A well-known attack against Lisp is via the #. reader macro:
(read-from-string "#.(start-the-war)")
will start the war if *read-eval* is non-nil - this is why one should always bind it when reading from an un-trusted stream.
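(A side note of mine, not from this answer: Clojure inherits the same trap through its #= reader form and its own *read-eval*. The usual guards are to bind *read-eval* to false, or to use clojure.edn/read-string, which never evaluates anything. A sketch:)
(binding [*read-eval* false]
  (read-string "#=(start-the-war)"))    ;; throws: EvalReader not allowed

(require '[clojure.edn :as edn])
(edn/read-string "#=(start-the-war)")   ;; also rejected: the edn reader never evaluates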
However, this is not directly related to "code is data" doctrine...

How does flexibility affect a language's syntax?

I am currently working on writing my own language(shameless plug), which is centered around flexibility. I am trying to make almost any part of the language syntax exchangeable through things like extensions/plugins. While writing the whole thing, it has got me thinking. I am wondering how that sort of flexibility could affect the language.
I know that Lisp is often referred to as one of the most extensible languages due to its extensive macro system. I do understand that concept of macros, but I am yet to find a language that allows someone to change the way it is parsed. To my knowledge, almost every language has an extremely concrete syntax as defined by some long specification.
My question is how could having a flexible syntax affect the intuitiveness and usability of the language? I know the basic "people might be confused when the syntax changes" and "semantic analysis will be hard". Those are things that I am already starting to compensate for. I am looking for a more conceptual answer on the pros and cons of having a flexible syntax.
The topic of language design is still quite foreign to me, so I apologize if I am asking an obvious or otherwise stupid question!
Edit:
I was just wanting to clarify the question I was asking. Where exactly does flexibility in a language's syntax stand, in terms of language theory? I don't really need examples or projects/languages with flexibility, I want to understand how it can affect the language's readability, functionality, and other things like that.
Perl is the most flexible language I know. Take a look at Moose, a postmodern object system for Perl 5. Its syntax is very different from Perl's, but it is still very Perl-ish.
IMO, the biggest problem with flexibility is precedence in infix notation. But no language I know of allows a datatype to have its own infix syntax. For example, take sets. It would be nice to use ⊂ and ⊇ in their syntax. But not only would a compiler have to recognize these symbols, it would have to be told their order of precedence.
Common Lisp allows you to change the way it's parsed - see reader macros. Racket allows you to modify its parser - see Racket languages.
And of course you can have flexible, dynamically extensible parsing alongside powerful macros if you use the right parsing techniques (e.g., PEG). Have a look at an example here - mostly a C syntax, but extensible with both syntax and semantic macros.
As for precedence, PEG goes really well together with Pratt.
To answer your updated question - there is surprisingly little research on programming language readability anyway. You may want to have a look at what Dr. Blackwell's group has been up to, but it's still far from conclusive.
So I can only share my hand-wavy anecdotes - flexible-syntax languages facilitate eDSL construction, and, in my opinion, eDSLs are the only way to eliminate unnecessary complexity from code, to make code actually maintainable in the long term. I believe that non-flexible languages are one of the biggest mistakes made by this industry, and it must be corrected at all costs, ASAP.
Flexibility allows you to manipulate the syntax of the language. For example, Lisp macros enable you to write programs that write programs, manipulating your syntax at compile time into valid Lisp expressions. Take the Loop macro, for example:
(loop for x from 1 to 5
      do (format t "~A~%" x))
1
2
3
4
5
NIL
And we can see how the code was translated with macroexpand-1:
(pprint (macroexpand-1 '(loop for x from 1 to 5
                              do (format t "~a~%" x))))
We can then see how a call to that macro is translated:
(BLOCK NIL
  (LET ((X 1))
    (DECLARE (TYPE (AND REAL NUMBER) X))
    (TAGBODY
     SB-LOOP::NEXT-LOOP
      (WHEN (> X '5) (GO SB-LOOP::END-LOOP))
      (FORMAT T "~a~%" X)
      (SB-LOOP::LOOP-DESETQ X (1+ X))
      (GO SB-LOOP::NEXT-LOOP)
     SB-LOOP::END-LOOP)))
Language Flexibility just allows you to create your own embedded language within a language and reduce the length of your program in terms of characters used. So in theory, this can make a language very unreadable since we can manipulate the syntax. For example we can create invalid code that's translated to valid code:
(defmacro backwards (expr)
  (reverse expr))
BACKWARDS
CL-USER> (backwards ("hello world" nil format))
"hello world"
CL-USER>
Clearly the above code can become complex since:
("hello world" nil format)
is not a valid Lisp expression.
Thanks to SK-logic's answer for pointing me in the direction of Alan Blackwell. I sent him an email asking his stance on the matter, and he responded with an absolutely wonderful explanation. Here it is:
So the person who responded to your StackOverflow question, saying that flexible syntax could be useful for DSLs, is certainly correct. It actually used to be fairly common to use the C preprocessor to create alternative syntax (that would be turned into regular syntax in an initial compile phase). A lot of the early esolangs were built this way.
In practice, I think we would have to say that a lot of DSLs are implemented as libraries within regular programming languages, and that the library design is far more significant than the syntax. There may be more purpose for having variety in visual languages, but making customisable general purpose compilers for arbitrary graphical syntax is really hard - much worse than changing text syntax features.
There may well be interesting things that your design could enable, so I wouldn't discourage experimentation. However, I think there is one reason why customisable syntax is not so common. This is related to the famous programmer's editor EMACS. In EMACS, everything is customisable - all key bindings, and all editor functions. It's fun to play with, and back in the day, many of us made our own personalised version that only we knew how to operate. But it turned out that it was a real hassle that everyone's editor worked completely differently. You could never lean over and make suggestions on another person's session, and teams always had to know who was logged in, in order to know whether the editor would work. So it turned out that, over the years, we all just started to use the default distribution and key bindings, which made things easier for everyone.
At this point in time, that is just about enough of an explanation that I was looking for. If anyone feels as though they have a better explanation or something to add, feel free to contact me.

From a high level programming perspective, what is the major difference between C# and F#? [closed]

I'm aware that they both use different programming paradigms, but from a high level perspective apart from differing syntax it seems most basic tasks can be achieved in similar fashion.
I only say this because when I've previously touched functional programming languages such as Haskell, writing code for basic tasks was (at first) difficult, frustrating, and required a completely different mindset.
For example the following took some time to get to grips with using recursive syntax:
loop :: Int -> IO ()
loop n = if 0 == n then return () else loop (n-1)
Whereas an F# loop is recognisable and understandable almost immediately:
let list1 = [ 1; 5; 100; 450; 788 ]
for i in list1 do
    printfn "%d" i
When C# programmers start learning F# they are advised to completely re-think their thought pattern (which was definitely required for Haskell), but I've now written several F# programs dealing with conditions, loops, data sets etc to perform practical tasks, and I'm wondering where the 'different-paradigm' barrier really kicks in?
Hopefully someone will be able to solve my confusion.
The barrier kicks in when people have to think in terms of functions, not in terms of objects.
Yes, it is totally possible to do object-oriented code in F#, and in this matter there is not that much of a difference between these two besides syntax. But that's not the point when using F#, even if F# allows you do to it.
The barrier kicks in when developers start solving problems in a functional way.
Here are some of the topics that are new for C#/OO developers when learning F#/FP:
Pattern matching. Sometimes people have hard times comprehending its usefulness.
Tail recursion (and the "recursive" way of solving problems)
Discriminated unions (people still try to look at them as to hierarchy of classes, IEquatable/IComparable implementations, etc instead of just thinking declaratively)
The "unit" value over "void".
Partial application (this gets a bit easier as the latest versions of C# allow us to deal with Funcs, but as it looks ugly, not many do)
The whole concept of values over variables (including immutability)
The main difference between C# and F# is that F# gives you all this, and it makes sense to take it and use it for good.
However, yes, it is still possible to write "Csharpish" code in F# without hitting any barrier, except that in this case one will hate F# for its syntax.
Your question is a bit misleading. From a very high-level perspective, pretty much all programming languages are equivalent. They are all Turing-complete and as such allow you to solve the same set of problems.
From a still high-level, but more concrete perspective, C# and F# differ insofar as F#'s functionality is a superset of C#'s. (Please do not flame me for this; I know it is not true, strictly speaking, but it gives a picture.)
F# being a .net language, it inherits .net's object model and in the object-oriented subset is therefore very similar to C#, with a more lightweight syntax due to better type inference.
However, F# also supports two other paradigms:
functional programming: F# "variables" - they are in fact called values - are immutable by default, and as such a C#-style int i = 0; i = i+1; looks very different in F#, because you need to allow for mutability explicitly: let mutable i = 0; i <- i + 1;. So if you look at the functional subset, F# is in fact a lot closer to Haskell than it is to C#.
imperative programming: You can also write F# code in a script-oriented style, without classes, modules, etc. Just a pure script, and in this case it also looks very different from C#.
Your example used loops similar in style to how you would write C# code and therefore it felt similar.
If you do a very small change, however, you can achieve the same thing in a way that is already quite different from C#. [ 1; 5; 100; 450; 788 ] |> List.iter (printfn "%d")
The reason why people tend to claim you need to change the way how you think about problems, is because the incentive of F#, for a C# programmer, usually is the functional subset, not the object-oriented one.
Looks like you haven't done too much Haskell?
How is, for example,
let list1 = [ 1, 5, 100, 450, 788 ]
forM_ list1 print
less recognizable?
If you like, you can even define for as an alias for forM_.

A language in which everything compiles

I'm trying to do some research for a new project, and I need to create objects dynamically from random data.
For this to work, I need a language / compiler that doesn't have problems with weird uncompilable code lying around.
Basically, I need the random code to compile (or be interpreted) as much as possible - meaning that the uncompilable parts will be ignored, and only the compilable parts will create the objects (which could be run).
Object Oriented-ness is not a must, but is a very strong advantage.
I thought of ASM, but it's very messy, and I'd probably need more readable code.
Thanks!
It sounds like you're doing something very much like genetic programming; even if you aren't, GP has to solve some of the same problems—using randomness to generate valid programs. The approach to this that is typically used is to work with a syntax tree: rather than storing the text x + y * 3 - 2, you store its tree (a subtraction node whose left child is the subtree for x + y * 3 and whose right child is the literal 2).
Then, instead of randomly changing the syntax, one can randomly change nodes in the tree instead. And if x should randomly change to, say, +, you can statically know that this means you need to insert two children (or not, depending on how you define +).
A good choice for a language to work with for this would be any Lisp dialect. In a Lisp, the above program would be written (- (+ x (* y 3)) 2), which is just a linearization of the syntax tree using parentheses to show depth. And in fact, Lisps expose this feature: you can just as easily work with the object '(- (+ x (* y 3)) 2) (note the leading quote). This is a three-element list, whose first element is -, second element is another list, and third element is 2. And, though you might or might not want it for your particular application, there's an eval function, such that (eval '(- (+ x (* y 3)) 2)) will take in the given list, treat it as a Lisp syntax tree/program, and evaluate it. This is what makes Lisps so attractive for doing this sort of work; Lisp syntax is basically a reification of the syntax-tree, and if you operate at the syntax-tree level, you can work on code as though it was a value. Lisp won't help you read /dev/random as a program directly, but with a little interpretation layered on top, you should be able to get what you want.
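Since the program is just nested lists, the "randomly change nodes in the tree" step can be an ordinary tree walk. A sketch of my own in Clojure, under the simplifying assumption that we only ever swap arithmetic operators:
(require '[clojure.walk :as walk])

(def operators '[+ - * /])

(defn mutate
  "Randomly replace some operator nodes in a quoted expression tree."
  [expr]
  (walk/postwalk
   (fn [node]
     (if (and (symbol? node) (some #{node} operators) (< (rand) 0.3))
       (rand-nth operators)
       node))
   expr))

;; (mutate '(- (+ x (* y 3)) 2))  ;; e.g. => (- (* x (* y 3)) 2)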
I should also mention, though I don't know anything about it (not that I know much about ordinary genetic programming) the existence of linear genetic programming. This is sort of like the assembly model that you mentioned—a linear stream of very, very simple instructions. The advantage here would seem to be that if you are working with /dev/random or something like it, the amount of interpretation needed is very small; the disadvantage would be, as you mentioned, the low-level nature of the code.
I'm not sure if this is what you're looking for, but any programming language can be made to function this way. For any programming language P, define the language P_always as follows:
If p is a valid program in P, then p is a valid program in P_always whose meaning is the same as its meaning in P.
If p is not a valid program in P, then p is a valid program in P_always whose meaning is the same as a program that immediately terminates.
For example, I could make the language C++_always so that this program:
#include <iostream>
using namespace std;
int main() {
    cout << "Hello, world!" << endl;
}
would compile and behave as normal, printing "Hello, world!", while this program:
Hahaha! This isn't legal C++ code!
Would be a legal program that just does absolutely nothing.
To solve your original problem, just take any OOP language like Java, Smalltalk, etc. and construct the appropriate Java_always, Smalltalk_always, etc. language from it. Again, I'm not sure if this is at all what you're looking for, but it could be done very easily.
Alternatively, consider finding a grammar for any OOP language and then using that grammar to produce random syntactically valid programs. You could then filter those programs down by using the P_always programming language for that language to eliminate syntactically but not semantically valid programs.
Divide the ASCII byte values into 9 classes (division modulo 9 would help). Then assign them to Brainfuck codewords (see http://en.wikipedia.org/wiki/Brainfuck). Then interpret as Brainfuck.
There you go, any sequence of ASCII characters is a program. Not that it's going to do anything sensible... This approach has a much better chance, compared to templatetypedef's answer, to get a nontrivial program from a random byte sequence.
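(A sketch of my own, not from the answer: the byte-to-command mapping itself is tiny. One of the nine classes is left as a no-op, since Brainfuck only has eight commands, and a forgiving interpreter would still need to cope with unmatched brackets. Clojure is used here purely for illustration:)
(def bf-commands [\> \< \+ \- \. \, \[ \]])   ;; the ninth class maps to "do nothing"

(defn bytes->brainfuck
  "Map an arbitrary byte sequence onto Brainfuck source: one command per byte, mod 9."
  [bytes]
  (apply str (keep #(get bf-commands (mod % 9)) bytes)))

;; (bytes->brainfuck (map int "any random text at all"))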
Text Editors
You could try feeding random character strings to an editor like Emacs or VI. Many (most?) characters will perform an editing action but some will do nothing (other than beep, perhaps). You would have to ensure that the random code mutator never generates the character sequence that exits the editor. However, this experience would be much like programming a Turing machine -- the code is not too readable.
Mathematica
In Mathematica, undefined symbols and other expressions evaluate to themselves, without error. So, that language might be a viable choice if you can arrange for the random code mutator to always generate well-formed expressions. This would be readily achievable since the basic Mathematica syntax is trivial, making it easy to operate on syntactic units rather than at the character level. It would be even easier if the mutator were written in Mathematica itself since expression-munging is Mathematica's forte. You could define a mini-language of valid operations within a Mathematica package that does not import the system-defined symbols. This would allow you to generate well-formed expressions to your heart's content without fear of generating a dangerous expression, like DeleteFile[FileNames["*.*", "/", Infinity]].
I believe Common Lisp should suit your needs. I always have some code in my SLIME/Emacs session that wouldn't compile. You can always tweak things, redefine functions in run-time. It is actually very good for prototyping.
A few years ago it took me quite a while to learn. But nowadays we have quicklisp and everything is so much easier.
Here I describe my development environment:
Install lisp on my linux machine
PS: I want to give an example, where Common Lisp was useful for me:
Up to maybe 2004 I used to write small programs in C (the keep it simple Unix way).
The last 3 years I had to get lots of different hardware running. Motorized stages, scientific cameras, IO cards.
The cameras turned out to be quite annoying. Usually you have to cool them down to -50 degrees Celsius or so, and (in some SDKs) they don't like it when you close them. But this is exactly how my C development cycle worked: write (30s), compile (1s), run (0.1s), repeat.
Eventually I decided to just use Common Lisp. Often it is straightforward to define the foreign function interfaces to talk to the SDKs, and I can do this without ever leaving the running Lisp image. I start the editor in the morning, define the open-device function to talk to the device, and after 3 hours I have enough of the functions implemented to set gain, temperature, and region of interest and obtain the video.
Then I can often put the SDK manual away and just use the camera.
I used the same interactive programming approach when I have to parse some webpage or some weird XML.

Can a language have Lisp's powerful macros without the parentheses?

Can a language have Lisp's powerful macros without the parentheses?
Sure; the question is whether the macros are convenient to use and how powerful they are.
Let's first look how Lisp is slightly different.
Lisp syntax is based on data, not text
Lisp has a two-stage syntax.
A) first there is the data syntax for s-expressions
examples:
(mary called tim to tell him the price of the book)
(sin ( x ) + cos ( x ))
s-expressions are atoms, or lists whose elements are themselves atoms or lists.
B) second there is the Lisp language syntax on top of s-expressions.
Not every s-expression is a valid Lisp program.
(3 + 4)
is not a valid Lisp program, because Lisp uses prefix notation.
(+ 3 4)
is a valid Lisp program. The first element is a function - here the function +.
S-expressions are data
The interesting part is now that s-expressions can be read and then Lisp uses the normal data structures (numbers, symbols, lists, strings) to represent them.
Most other programming languages don't have a primitive representation for internalized source - other than strings.
Note that s-expressions here are not representing an AST (Abstract Syntax Tree). It's more like a hierarchical token tree coming out of a lexer phase. A lexer identifies the lexical elements.
The internalized source code now makes it easy to calculate with code, because the usual functions to manipulate lists can be applied.
Simple code manipulation with list functions
Let's look at the invalid Lisp code:
(3 + 4)
The program
(defun convert (code)
  (list (second code) (first code) (third code)))
(convert '(3 + 4)) -> (+ 3 4)
has converted an infix expression into the valid Lisp prefix expression. We can evaluate it then.
(eval (convert '(3 + 4))) -> 7
EVAL evaluates the converted source code. eval takes as input an s-expression, here a list (+ 3 4).
How to calculate with code?
Programming languages now have at least three choices to make source calculations possible:
base the source code transformations on string transformations
use a similar primitive data structure like Lisp. A more complex variant of this is a syntax based on XML. One could then transform XML expressions. There are other possible external formats combined with internalized data.
use a real syntax description format and represent the source code internalized as a syntax tree using data structures that represent syntactic categories. -> use an AST.
For all these approaches you will find programming languages. Lisp is more or less in camp 2. The consequence: it is theoretically not really satisfying, and it makes it impossible to statically parse source code (if the code transformations are based on arbitrary Lisp functions). The Lisp community has struggled with this for decades (see for example the myriad of approaches that the Scheme community has tried). Fortunately it is relatively easy to use, compared to some of the alternatives, and quite powerful. Variant 1 is less elegant. Variant 3 leads to a lot of complexity in simple AND complex transformations. It usually also means that the expression was already parsed with respect to a specific language grammar.
Another problem is HOW to transform the code. One approach would be based on transformation rules (like in some Scheme macro variants). Another approach would be a special transformation language (like a template language which can do arbitrary computations). The Lisp approach is to use Lisp itself. That makes it possible to write arbitrary transformations using the full Lisp language. In Lisp there is not a separate parsing stage, but at any time expressions can be read, transformed and evaluated - because these functions are available to the user.
Lisp is kind of a local maximum of simplicity for code transformations.
Other frontend syntax
Also note that the function read reads s-expressions to internal data. In Lisp one could either use a different reader for a different external syntax or reuse the Lisp built-in reader and reprogram it using the read macro mechanism - this mechanism makes it possible to extend or change the s-expression syntax. There are examples for both approaches to provide a different external syntax in Lisp.
For example there are Lisp variants which have a more conventional syntax, where code gets parsed into s-expressions.
Why is the s-expression-based syntax popular among Lisp programmers?
The current Lisp syntax is popular among Lisp programmers for two reasons:
1) the data-is-code-is-data idea makes it easy to write all kinds of code transformations based on the internalized data. There is also a relatively direct path from reading code, through manipulating code, to printing code. The usual development tools can be used.
2) the text editor can be programmed in a straightforward way to manipulate s-expressions. That makes basic code and data transformations in the editor relatively easy.
Originally Lisp was thought to have a different, more conventional syntax. There were several attempts later to switch to other syntax variants - but for some reasons it either failed or spawned different languages.
Absolutely. It's just a couple orders of magnitude more complex, if you have to deal with a complex grammar. As Peter Norvig noted:
Python does have access to the abstract syntax tree of programs, but this is not for the faint of heart. On the plus side, the modules are easy to understand, and with five minutes and five lines of code I was able to get this:
>>> parse("2 + 2")
['eval_input', ['testlist', ['test', ['and_test', ['not_test', ['comparison',
['expr', ['xor_expr', ['and_expr', ['shift_expr', ['arith_expr', ['term',
['factor', ['power', ['atom', [2, '2']]]]], [14, '+'], ['term', ['factor',
['power', ['atom', [2, '2']]]]]]]]]]]]]]], [4, ''], [0, '']]
This was rather a disappointment to me. The Lisp parse of the equivalent expression is (+ 2 2). It seems that only a real expert would want to manipulate Python parse trees, whereas Lisp parse trees are simple for anyone to use. It is still possible to create something similar to macros in Python by concatenating strings, but it is not integrated with the rest of the language, and so in practice is not done.
Since I'm not a super-genius (or even a Peter Norvig), I'll stick with (+ 2 2).
Here's a shorter version of Rainer's answer:
In order to have lisp-style macros, you need a way of representing source-code in data structures. In most languages, the only "source code data structure" is a string, which doesn't have nearly enough structure to allow you to do real macros on. Some languages offer a real data structure, but it's too complex, like Python, so that writing real macros is stupidly complicated and not really worth it.
Lisp's lists and parentheses hit the sweet spot in the middle. Just enough structure to make it easy to handle, but not too much so you drown in complexity. As a bonus, when you nest lists you get a tree, which happens to be precisely the structure that programming languages naturally adopt (nearly all programming languages are first parsed into an "abstract syntax tree", or AST, before being actually interpreted/compiled).
Basically, programming Lisp is writing an AST directly, rather than writing some other language that then gets turned into an AST by the computer. You could possibly forgo the parens, but you'd just need some other way to group things into a list/tree. You probably wouldn't gain much from doing so.
Parentheses are irrelevant to macros. It's just Lisp's way of doing things.
For example, Prolog has a very powerful macro mechanism called "term expansion". Basically, whenever Prolog reads a term T, it tries a special rule term_expansion(T, R). If it is successful, the content of R is interpreted instead of T.
Not to mention the Dylan language, which has a pretty powerful syntactic macro system, which features (among other things) referential transparency, while being an infix (Algol-style) language.
Yes. Parentheses in Lisp are used in the classic way, as a grouping mechanism. Indentation is an alternative way to express groups. E.g. the following structures are equivalent:
A ((B C) D)
and
A
  B
    C
  D
Have a look at Sweet-expressions. Wheeler makes a very good case that the reason things like infix notation have not worked before is that typical notation also tries to add precedence, which then adds complexity, which causes difficulties in writing macros.
For this reason, he proposes infix syntax like {1 + 2 + 3} and {1 + {2 * 3}} (note the spaces between symbols), which are translated to (+ 1 2 3) and (+ 1 (* 2 3)) respectively. He adds that if someone writes {1 + 2 * 3}, it should become (nfx 1 + 2 * 3), which could be captured if you really want to provide precedence, but would, as a default, be an error.
He also suggests that indentation should be significant, proposes that functions could be called as fn(A B C) as well as (fn A B C), would like data[A] to translate to (bracketaccess data A), and that the entire system should be compatible with s-expressions.
Overall, it's an interesting set of proposals I'd like to experiment with extensively. (But don't tell anyone at comp.lang.lisp: they'll burn you at the stake for your curiosity :-).
Code rewriting in Tcl in a manner recognizably similar to Lisp macros is a common technique. For example, this is (trivial) code that makes it easier to write procedures that always import a certain set of global variables:
proc gproc {name arguments body} {
    set realbody "global foo bar boo;$body"
    uplevel 1 [list proc $name $arguments $realbody]
}
With that, all procedures declared with gproc xyz rather than proc xyz will have access to the foo, bar and boo globals. The whole key is that uplevel takes a command and evaluates it in the caller's context, and list is (among other things) an ideal constructor for substitution-safe code fragments.
Erlang's parse transforms are similar in power to Lisp macros, though they are much trickier to write and use (they are applied to the entire source file, rather than being invoked on demand).
Lisp itself had a brief dalliance with non-parenthesised syntax in the form of M-expressions. It didn't take with the community, though variants of the idea found their way into modern Lisps, so you get Lisp's powerful macros without the parentheses ... in Lisp!
Yes, you can definitely have Lisp macros without all the parentheses.
Take a look at "sweet-expressions", which provides a set of additional abbreviations for traditional s-expressions. They add indentation, a way to do infix, and traditional function calls like f(x), but in a way that is backwards-compatible (you can freely mix well-formatted s-expressions and sweet-expressions), generic, and homoiconic.
Sweet-expressions were developed on http://readable.sourceforge.net and there is a sample implementation.
For Scheme there is a SRFI for sweet-expressions, SRFI-110: http://srfi.schemers.org/srfi-110/
No, it's not necessary. Anything that gives you some sort of access to a parse tree would be enough to allow you to manipulate the macro body in the same way as is done in Common Lisp. However, as the manipulation of the AST in Lisp is identical to the manipulation of lists (something that is bordering on easy in the Lisp family), it's possibly not nearly as natural without having the "parse tree" and "written form" be essentially the same.
I think this was not mentioned.
C++ templates are Turing-complete and perform processing at compile time.
There is the well-known expression templates mechanism that allows transformations - not from arbitrary code, but at least from the subset of C++ operators.
So imagine you have 3 vectors of 1000 elements and you must perform:
(A + B + C)[0]
You can capture this tree in an expression template and arbitrarily manipulate it at compile time.
With this tree, at compile time, you can transform the expression.
For example, if that expression means A[0] + B[0] + C[0] for your domain, you could avoid the normal C++ processing, which would be:
Add A and B, adding 1000 elements.
Create a temporary for the result, and add with the 1000 elements of C.
Index the result to get the first element.
And replace with another transformed expression template tree that does:
Capture A[0]
Capture B[0]
Capture C[0]
Add all 3 results together in the result to return with += avoiding temporaries.
It is not better than lisp, I think, but it is still very powerful.
Yes, it is certainly possible. Especially if it is still a Lisp under the bonnet:
http://www.meta-alternative.net/pfront.pdf
http://www.meta-alternative.net/pfdoc.pdf
Boo has a nice "quoted" macro syntax that uses [| |] as delimiters, and has certain substitutions which are actually verified syntactically by the compiler pipeline using $variables. While simple and relatively painless to use, it's much more complicated to implement on the compiler side than s-expressions. Boo's solution may have a few limitations that haven't affected my own code. There's also an alternate syntax that reads more like ordinary OO code, but that falls into the "not for the faint of heart" category like dealing with Ruby or Python parse trees.
Javascript's template strings offer yet another approach to this sort of thing. For instance, Mark S. Miller's quasiParserGenerator implements a grammar syntax for parsers.
Go ahead and enter the Elixir programming language.
Elixir is a functional programming language that feels like Lisp with respect to macros, but it comes in Ruby's clothes and runs on top of the Erlang VM.
For those who do not like the parentheses but wish their language had powerful macros, Elixir is a great choice.
You can write macros in R (which has a more Algol-like syntax) that have a notion of delayed expressions, as in Lisp macros. You can call substitute() or quote() to avoid evaluating the delayed expression and instead get the actual expression and traverse its source code, as in Lisp. Even the structure of the expression's source code is Lisp-like: operators are the first item in the list. For example, input$foo, which gets property foo from the list input, is represented as the expression ['$', 'input', 'foo'], just like in Lisp.
You can check the ebook Metaprogramming in R, which also shows how to create macros in R (not something you would normally do, but it's possible). It's based on a 2001 article, Programmer's Niche: Macros in R, that explains how to write Lisp-style macros in R.
