How to prove the correctness of a given grammar? - programming-languages

I am wondering how programming language developers validate and prove that their grammar is correct. Suppose that I created a new grammar for a new language. I can test my grammar with a unit-test tool by providing different kinds of test programs. However, I will never be 100% sure that my grammar is correct. How do language developers ensure that their grammar is correct in the real world?
Let's say I created a grammar for a new language using pencil and paper. However, I made a mistake, and my grammar accepts expressions that end with a +, like 2+2+. If I don't find the mistake, I will implement my language using this incorrect grammar. After implementation and unit testing, I can find the error. Is it possible to find it before starting any implementation?
Of course, I can try my grammar on some sample inputs using pencil and paper (derivations etc.), but I may miss some corner cases. Is there a better approach, and how do real language developers test their grammars?

A proof is a logical argument that demonstrates the truth of a claim. There are as many ways to prove something as there are ways of thinking about a problem. A common way to prove things about discrete structures (like grammars) is using mathematical induction. Basically, you show that something is true in base cases - the simplest cases possible - and then show that if it's true for all cases under a certain size, it must therefore be true for cases of the next size.
In our case: suppose we wanted only to prove your grammar didn't generate + at the end of a word. We could do induction on the number of productions used in constructing a string in the language. We would identify all relevant base cases, show the property holds for these strings, and then show that longer strings in the language are constructed in such a way that it is impossible to get a + at the end. Here's an example.
S := S + S | (S) | x
Base case: the shortest string in the language is x, generated as S -> x. It does not end with a +.
Induction hypothesis: assume all strings produced using up to and including k productions do not end with +.
Induction step: we must show strings produced using more than k productions do not end with +. If we apply the rule (S) to any string generated from S, we do not add + so the property holds. If we apply S + S to strings generated from S, the last symbol in S + S is the last symbol of a shorter string (at least 2 symbols shorter) generated by S. By the induction hypothesis, that string did not end in +, so neither does this one. There are no other productions, so no string in the language ends in +. QED
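If you also want a mechanical sanity check to go along with such a proof, one option (a small sketch of my own, not part of the answer above) is to enumerate every string the grammar can derive within a bounded number of productions and test the property directly:

# Toy sanity check for the grammar S -> S + S | (S) | x :
# enumerate all strings derivable in at most `depth` productions
# and assert that none of them ends with '+'.
RULES = {'S': [['S', '+', 'S'], ['(', 'S', ')'], ['x']]}

def derive(symbols, depth):
    """Yield every terminal string derivable from `symbols` using <= depth productions."""
    nonterminals = [i for i, s in enumerate(symbols) if s in RULES]
    if not nonterminals:
        yield ''.join(symbols)
        return
    if depth == 0:
        return                                   # out of budget with nonterminals left
    i = nonterminals[0]                          # expand the leftmost nonterminal
    for rhs in RULES[symbols[i]]:
        yield from derive(symbols[:i] + rhs + symbols[i + 1:], depth - 1)

strings = set(derive(['S'], 5))
assert all(not s.endswith('+') for s in strings)
print(len(strings), "strings checked; shortest:", sorted(strings, key=len)[:3])

This does not replace the induction proof (it only covers derivations up to the chosen bound), but it catches mistakes like the trailing-+ bug before any implementation work.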

Related

Is Rust's lexical grammar regular, context-free or context-sensitive?

The lexical grammar of most programming languages is fairly unexpressive so that it can be lexed quickly. I'm not sure which category Rust's lexical grammar belongs to. Most of it seems regular, probably with the exception of raw string literals:
let s = r##"Hi lovely "\" and "#", welcome to Rust"##;
println!("{}", s);
Which prints:
Hi lovely "\" and "#", welcome to Rust
As we can add arbitrarily many #, it seems like it can't be regular, right? But is the grammar at least context-free? Or is there something non-context free about Rust's lexical grammar?
Related: Is Rust's syntactical grammar context-free or context-sensitive?
The raw string literal syntax is not context-free.
If you think of it as a string surrounded by r#ᵏ"…"#ᵏ (using the superscript k as a repetition count for the hashes), then you might expect it to be context-free:
raw_string_literal
    : 'r' delimited_quoted_string
delimited_quoted_string
    : quoted_string
    | '#' delimited_quoted_string '#'
But that is not actually the correct syntax, because the quoted_string is not allowed to contain "#ᵏ (a double-quote followed by k hashes), although it can contain "#ʲ for any j < k.
Excluding the terminating sequence without excluding any other similar sequence of a different length cannot be accomplished with a context-free grammar because it involves three (or more) uses of the k-repetition in a single production, and stack automata can only handle two. (The proof that the grammar is not context-free is surprisingly complicated, so I'm not going to attempt it here for lack of MathJax. The best proof I could come up with uses Ogden's lemma and the uncommonly cited (but highly useful) property that context-free grammars are closed under the application of a finite-state transducer.)
C++ raw string literals are also context-sensitive [or would be if the delimiter length were not limited, see Note 1], and pretty well all whitespace-sensitive languages (like Python and Haskell) are context-sensitive. None of these lexical analysis tasks is particularly complicated so the context-sensitivity is not a huge problem, although most standard scanner generators don't provide as much assistance as one might like. But there it is.
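As an illustration of how little machinery that context-sensitivity actually demands, here is a toy hand-written scanner of my own (not from this answer, and with essentially no error handling) that recognizes a Rust-style raw string simply by counting the opening hashes:

# Toy sketch: scan a Rust-style raw string starting at src[i] by counting hashes.
def scan_raw_string(src, i):
    assert src[i] == 'r'
    j = i + 1
    hashes = 0
    while src[j] == '#':                  # count the opening hashes
        hashes += 1
        j += 1
    assert src[j] == '"'
    j += 1
    closer = '"' + '#' * hashes           # the only sequence that may terminate the literal
    end = src.index(closer, j)            # raises ValueError if the literal is unterminated
    return src[j:end], end + len(closer)

text, rest = scan_raw_string('r##"Hi lovely "\\" and "#", welcome to Rust"##;', 0)
print(text)                               # Hi lovely "\" and "#", welcome to Rust

The hash counter is exactly the piece of state that a context-free grammar cannot express, yet it is trivial in ordinary code.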
Rust's lexical grammar offers a couple of other complications for a scanner generator. One issue is the double meaning of ', which is used both to create character literals and to mark lifetime variables and loop labels. Apparently it is possible to determine which of these applies by considering the previously recognized token. That could be solved with a lexical scanner which is capable of generating two consecutive tokens from a single pattern, or it could be accomplished with a scannerless parser; the latter solution would be context-free but not regular. (C++'s use of ' as part of numeric literals does not cause the same problem; the C++ tokens can be recognized with regular expressions, because the ' can not be used as the first character of a numeric literal.)
Another slightly context-dependent lexical issue is that the range operator, .., takes precedence over floating point values, so that 2..3 must be lexically analysed as three tokens: 2 .. 3, rather than two floating point numbers 2. .3, which is how it would be analysed in most languages which use the maximal munch rule. Again, this might or might not be considered a deviation from regular expression tokenisation, since it depends on trailing context. But since the lookahead is at most one character, it could certainly be implemented with a DFA.
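To make the one-character lookahead concrete, here is a toy number/range lexer of my own (not Rust's actual lexer): a '.' after digits starts a fractional part only if it is not immediately followed by another '.':

import re

# Toy sketch: lex integers/floats and the .. range operator with one character of lookahead.
def lex(s):
    tokens, i = [], 0
    while i < len(s):
        m = re.match(r'\d+', s[i:])
        if m:
            j = i + m.end()
            if j < len(s) and s[j] == '.' and not (j + 1 < len(s) and s[j + 1] == '.'):
                j += re.match(r'\.\d*', s[j:]).end()    # consume the fractional part
            tokens.append(('NUM', s[i:j]))
            i = j
        elif s.startswith('..', i):
            tokens.append(('DOTDOT', '..'))
            i += 2
        else:
            tokens.append(('OTHER', s[i]))
            i += 1
    return tokens

print(lex("2..3"))   # [('NUM', '2'), ('DOTDOT', '..'), ('NUM', '3')]
print(lex("2.5"))    # [('NUM', '2.5')]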
Postscript
On reflection, I am not sure that it is meaningful to ask about a "lexical grammar". Or, at least, it is ambiguous: the "lexical grammar" might refer to the combined grammar for all of the language's tokens, or it might refer to the act of separating a sentence into tokens. The latter is really a transducer, not a parser, and raises the question of whether the language can be tokenised with a finite-state transducer. (The answer, again, is no, because raw strings cannot be recognized by a FSA, or even a PDA.)
Recognizing individual tokens and tokenising an input stream are not necessarily equivalent. It is possible to imagine a language in which the individual tokens are all recognized by regular expressions but an input stream cannot be handled with a finite-state transducer. That will happen if there are two regular expressions T and U such that some string matching T is the longest token which is a strict prefix of an infinite set of strings in U. As a simple (and meaningless) example, take a language with tokens:
a
a*b
Both of these tokens are clearly regular, but the input stream cannot be tokenized with a finite-state transducer because it must examine any sequence of as (of any length) before deciding whether to fall back to the first a or to accept the token consisting of all the as and the following b (if present).
Few languages show this pathology (and, as far as I know, Rust is not one of them), but it is technically present in some languages in which keywords are multiword phrases.
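A toy demonstration of the pathology (my own sketch, using Python's backtracking regex engine to emulate maximal munch):

import re

# The two tokens from the example above: 'a*b' is listed first so the longer token wins.
TOKEN = re.compile(r'a*b|a')

def tokenize(s):
    out, i = [], 0
    while i < len(s):
        m = TOKEN.match(s, i)
        if m is None:
            raise ValueError("lexical error at position %d" % i)
        out.append(m.group())
        i = m.end()
    return out

print(tokenize("aaab"))   # ['aaab']          -- one a*b token
print(tokenize("aaa"))    # ['a', 'a', 'a']   -- only after scanning past every a looking for a b

The matcher commits to the first token only after scanning the entire run of as, which is precisely the unbounded lookahead that a finite-state transducer cannot provide.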
Notes
Actually, C++ raw string literals are, in a technical sense, regular (and therefore context free) because their delimiters are limited to strings of maximum length 16 drawn from an alphabet of 88 characters. That means that it is (theoretically) possible to create a regular expression consisting of 13,082,362,351,752,551,144,309,757,252,761 patterns, each matching a different possible raw string delimiter.

How to determine whether a given language is regular or not (by just looking at the language)?

Is there any trick to guess whether a language is regular just by looking at it?
In order to choose a proof method, I have to start from some hypothesis. Do you know any hints or patterns that cut down the time spent on long questions?
For instance, so that I don't spend time on the pumping lemma when the language is actually regular, and don't have to construct a DFA/grammar just to check.
For example:
1. L = { w ∈ {a,b}* | number of a's in w < number of b's in w }
2. L = { a^n b^m | n, m >= 0 }
How can I tell which one is regular just by looking at the above examples?
In general, when looking at a language, a good rule of thumb for whether the language is regular or not is to think of a program that can read a string and answer the question "is this string in the language?"
To write such a program, do you need to store some arbitrary value in a variable or is the program's state (that is, the combination of all possible variables' values) limited to some finite fixed number of possibilities? If the language can be recognized by a program that only needs a fixed number of variables that can only have a fixed number of values, then you've got a regular language. If not, then not.
Using this, I can see that the first language is not regular, but the second language is. In the first language, I need to remember how many as I've seen, and how many bs. (Or at the very least, I need to keep track of (# of as) - (# of bs), and accept if the string ends while that count is negative). At the same time, there's no limit on the number of as, so this count could go arbitrarily large.
In the second language, I don't care what n and m are at all. So with the second language, my program would just keep track of "have I seen at least one b yet?" to make sure we don't have any a characters that occur after the first b. (So, one variable with only two values - true or false)
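A minimal sketch of that recognizer (my own illustration, not part of the answer): one boolean of state is all it takes, which is what makes the language regular.

# Toy recognizer for L2 = { a^n b^m : n, m >= 0 }: the only state is "have we seen a b yet?"
def in_L2(w):
    seen_b = False
    for ch in w:
        if ch == 'b':
            seen_b = True
        elif ch == 'a':
            if seen_b:
                return False     # an 'a' after some 'b': reject
        else:
            return False         # character outside the alphabet {a, b}
    return True

print(in_L2("aaabb"), in_L2("abab"))   # True False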
So one way to make language 1 into a regular language is to change it to be:
1. L = { w ∈ {a,b}* | number of a's in w < number of b's in w, and number of a's in w < 100 }
Now I don't need to keep track of the number of as that I've seen once I hit 100 (since then I know automatically that the string isn't in the language), and likewise with the number of bs - once I hit 100, I can stop counting because I know that'll be enough unless the number of as is itself too large.
One common case you should watch out for with this is when someone asks you about languages where "number of as is a multiple of 13" or "w ∈ {0,1}* and w is the binary representation of a multiple of 13". With these, it might seem like you need to keep track of the whole number to make the determination, but in fact you don't - in both cases, you only need to keep a variable that can count from 0 to 12. So watch out for "multiple of"-type languages. (And the related "is odd" or "is even" or "is 1 more than a multiple of 13")
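For example, here is a sketch (my own illustration) of the "binary representation of a multiple of 13" recognizer; the only state it keeps is the current remainder, which has just 13 possible values:

# Toy recognizer: is w (a binary string) the representation of a multiple of 13?
def binary_multiple_of_13(w):
    r = 0
    for bit in w:
        r = (2 * r + int(bit)) % 13    # appending a bit doubles the value, then adds the bit
    return w != "" and r == 0

print(binary_multiple_of_13(format(26, "b")))   # True  (26 = 2 * 13)
print(binary_multiple_of_13(format(27, "b")))   # False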
Other mathematical properties though - for example, w ∈ {0,1}* and w is the binary representation of a perfect square - will result in non-regular languages.

If Non Terminals in Gene Expression Programming are mono-type functions, how to build complex Programs?

It just seemed to me, studying GEP and especially analyzing Karva expressions, that Non Terminals are most suitable for functions whose type is a -> a for some type a, in Haskell notation.
Like, with the classic examples, Q+-*/ are all functions from some number of Doubles to a Double; they just differ in arity.
Now, how can a coder use functions of heterogeneous signature in one Karva-expressed gene?
Brief Introduction to GEP/Karva
Gene Expression Programming uses dense representations of a population of expressions and applies evolutionary pressure to make better ones to solve a given problem.
Karva notation represents an expression tree as a string, represented in a non-traditional traversal of level-at-a-time, left-to-right - read more here. Using Karva notation, it is simple and quick to combine (or mutate) expressions to create the next generation.
You can parse Karva notation in Haskell as per this answer with explanation of linear time or this answer that's the same code, but with more diagrams and no proof.
Terminals are the constants or variables in a Karva expression, so /+-a*3cb2 (meaning (a+(b*2))/(3-c)) has terminals [a,b,2,3,c]. A Karva expression with no terminals is thus a function of some arity.
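To make the decoding concrete, here is a small sketch of my own (not from the answers linked above) that builds the tree level by level, left to right, from an arity table and then evaluates it:

import operator

# Toy Karva decoder: arities of the non-terminals; everything else is a terminal (arity 0).
ARITY = {'+': 2, '-': 2, '*': 2, '/': 2}
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

def karva_tree(expr):
    """Return the root node (symbol, [children]), filling children level by level."""
    nodes = [(s, []) for s in expr]
    i, frontier = 1, [nodes[0]]
    while frontier and i < len(nodes):
        next_frontier = []
        for sym, children in frontier:
            for _ in range(ARITY.get(sym, 0)):
                children.append(nodes[i])
                next_frontier.append(nodes[i])
                i += 1
        frontier = next_frontier
    return nodes[0]

def evaluate(node, env):
    sym, children = node
    if not children:
        return float(sym) if sym.isdigit() else env[sym]
    return OPS[sym](*(evaluate(c, env) for c in children))

tree = karva_tree("/+-a*3cb2")
print(evaluate(tree, {'a': 1.0, 'b': 2.0, 'c': 1.0}))   # (a+(b*2))/(3-c) = 2.5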
My Question is then more related to how one would use different types of functions without breaking the gene.
What if one wants to use a Non Terminal like the > function? One can count on the fact that, for example, it can compare Doubles. But the result, in a strongly typed language, would be a Bool. Now, assuming that the Non Terminal encoding > is interspersed in the gene, parsing the k-expression would produce invalid code, because anything calling it would expect a Double.
One can then think of manually and silently sneaking in a cast, as Ms. Ferreira does in her book, where she converts Bools into Ints, 0 and 1 for False and True.
So it seems to me that k-expressed genes are for Non Terminals of any arity that share the property of taking values of one type a and returning a value of that same type a.
In the end, does anyone have any idea how to overcome this?
I already know that one can use homeotic genes, providing some glue between different Sub Expression Trees, but that, IMHO, is somewhat rigid, because, again, you need to know the returned types in advance.

Mathematica: what is symbolic programming?

I am a big fan of Stephen Wolfram, but he is definitely one not shy of tooting his own horn. In many references, he extols Mathematica as a different symbolic programming paradigm. I am not a Mathematica user.
My questions are: what is this symbolic programming? And how does it compare to functional languages (such as Haskell)?
When I hear the phrase "symbolic programming", LISP, Prolog and (yes) Mathematica immediately leap to mind. I would characterize a symbolic programming environment as one in which the expressions used to represent program text also happen to be the primary data structure. As a result, it becomes very easy to build abstractions upon abstractions since data can easily be transformed into code and vice versa.
Mathematica exploits this capability heavily. Even more heavily than LISP and Prolog (IMHO).
As an example of symbolic programming, consider the following sequence of events. I have a CSV file that looks like this:
r,1,2
g,3,4
I read that file in:
Import["somefile.csv"]
--> {{r,1,2},{g,3,4}}
Is the result data or code? It is both. It is the data that results from reading the file, but it also happens to be the expression that will construct that data. As code goes, however, this expression is inert since the result of evaluating it is simply itself.
So now I apply a transformation to the result:
% /. {c_, x_, y_} :> {c, Disk[{x, y}]}
--> {{r,Disk[{1,2}]},{g,Disk[{3,4}]}}
Without dwelling on the details, all that has happened is that Disk[{...}] has been wrapped around the last two numbers from each input line. The result is still data/code, but still inert. Another transformation:
% /. {"r" -> Red, "g" -> Green}
--> {{Red,Disk[{1,2}]},{Green,Disk[{3,4}]}}
Yes, still inert. However, by a remarkable coincidence this last result just happens to be a list of valid directives in Mathematica's built-in domain-specific language for graphics. One last transformation, and things start to happen:
% /. x_ :> Graphics[x]
--> Graphics[{{Red,Disk[{1,2}]},{Green,Disk[{3,4}]}}]
Actually, you would not see that last result. In an epic display of syntactic sugar, Mathematica would instead render the result as a picture of a red disk and a green disk.
But the fun doesn't stop there. Underneath all that syntactic sugar we still have a symbolic expression. I can apply another transformation rule:
% /. Red -> Black
Presto! The red circle became black.
It is this kind of "symbol pushing" that characterizes symbolic programming. A great majority of Mathematica programming is of this nature.
Functional vs. Symbolic
I won't address the differences between symbolic and functional programming in detail, but I will contribute a few remarks.
One could view symbolic programming as an answer to the question: "What would happen if I tried to model everything using only expression transformations?" Functional programming, by contrast, can be seen as an answer to: "What would happen if I tried to model everything using only functions?" Just like symbolic programming, functional programming makes it easy to quickly build up layers of abstractions. The example I gave here could easily be reproduced in, say, Haskell using a functional reactive animation approach. Functional programming is all about function composition, higher-level functions, combinators -- all the nifty things that you can do with functions.
Mathematica is clearly optimized for symbolic programming. It is possible to write code in functional style, but the functional features in Mathematica are really just a thin veneer over transformations (and a leaky abstraction at that, see the footnote below).
Haskell is clearly optimized for functional programming. It is possible to write code in symbolic style, but I would quibble that the syntactic representations of programs and data are quite distinct, making the experience suboptimal.
Concluding Remarks
In conclusion, I advocate that there is a distinction between functional programming (as epitomized by Haskell) and symbolic programming (as epitomized by Mathematica). I think that if one studies both, then one will learn substantially more than studying just one -- the ultimate test of distinctness.
Leaky Functional Abstraction in Mathematica?
Yup, leaky. Try this, for example:
f[x_] := g[Function[a, x]];
g[fn_] := Module[{h}, h[a_] := fn[a]; h[0]];
f[999]
Duly reported to, and acknowledged by, WRI. The response: avoid the use of Function[var, body] (Function[body] is okay).
You can think of Mathematica's symbolic programming as a search-and-replace system where you program by specifying search-and-replace rules.
For instance you could specify the following rule
area := Pi*radius^2;
Next time you use area, it'll be replaced with Pi*radius^2. Now, suppose you define a new rule:
radius:=5
Now, whenever you use radius, it'll get rewritten into 5. If you evaluate area, it'll get rewritten into Pi*radius^2, which triggers the rewriting rule for radius, and you'll get Pi*5^2 as an intermediate result. This new form triggers a built-in rewriting rule for the ^ operation, so the expression gets further rewritten into Pi*25. At this point rewriting stops because there are no applicable rules.
You can emulate functional programming by using your replacement rules as functions. For instance, if you want to define a function that adds, you could do
add[a_,b_]:=a+b
Now add[x,y] gets rewritten into x+y. If you want add to only apply for numeric a,b, you could instead do
add[a_?NumericQ, b_?NumericQ] := a + b
Now, add[2,3] gets rewritten into 2+3 using your rule and then into 5 using built-in rule for +, whereas add[test1,test2] remains unchanged.
Here's an example of an interactive replacement rule
a := ChoiceDialog["Pick one", {1, 2, 3, 4}]
a+1
Here, a gets replaced with ChoiceDialog, which then gets replaced with the number the user chose in the dialog that popped up, which makes both quantities numeric and triggers the replacement rule for +. Here, ChoiceDialog has a built-in replacement rule along the lines of "replace ChoiceDialog[some stuff] with the value of the button the user clicked".
Rules can be defined using conditions which themselves need to go through rule rewriting in order to produce True or False. For instance, suppose you invented a new equation-solving method, but you think it only works when the final result of your method is positive. You could write the following rule:
solve[x + 5 == b_] := (result = b - 5; result /; result > 0)
Here, solve[x+5==20] gets replaced with 15, but solve[x + 5 == -20] remains unchanged because no rule applies. The condition that prevents the rule from applying is /;result>0. The evaluator essentially looks at the potential output of the rule application to decide whether to go ahead with it.
Mathematica's evaluator greedily rewrites every pattern with one of the rules that apply for that symbol. Sometimes you want finer control, and in such a case you can define your own rules and apply them manually, like this
myrules={area->Pi radius^2,radius->5}
area//.myrules
This will apply the rules defined in myrules until the result stops changing. This is pretty similar to the default evaluator, but now you could have several sets of rules and apply them selectively. A more advanced example shows how to make a Prolog-like evaluator that searches over sequences of rule applications.
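If it helps to see the idea outside Mathematica, here is a tiny fixed-point rewriter of my own in Python (a toy, not anything Mathematica does internally: it only replaces whole subexpressions that literally match a rule's left-hand side, with expressions encoded as nested tuples):

# Toy term rewriter: apply literal replacement rules bottom-up until nothing changes,
# roughly in the spirit of expr //. rules.
def rewrite_once(expr, rules):
    if expr in rules:
        return rules[expr]
    if isinstance(expr, tuple):
        return tuple(rewrite_once(e, rules) for e in expr)
    return expr

def rewrite_fixpoint(expr, rules):
    while True:
        new = rewrite_once(expr, rules)
        if new == expr:
            return new                   # no rule applies any more
        expr = new

rules = {'area': ('*', 'Pi', ('^', 'radius', 2)), 'radius': 5}
print(rewrite_fixpoint('area', rules))   # ('*', 'Pi', ('^', 5, 2))

Real Mathematica rules are far more general (patterns, conditions, attributes), but the "rewrite until stable" loop has the same shape.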
One drawback of the current Mathematica version comes up when you need to use Mathematica's default evaluator (to make use of Integrate, Solve, etc.) and want to change the default sequence of evaluation. That is possible but complicated, and I like to think that some future implementation of symbolic programming will have a more elegant way of controlling the evaluation sequence.
As others here already mentioned, Mathematica does a lot of term rewriting. Maybe Haskell isn't the best comparison though, but Pure is a nice functional term-rewriting language (that should feel familiar to people with a Haskell background). Maybe reading their Wiki page on term rewriting will clear up a few things for you:
http://code.google.com/p/pure-lang/wiki/Rewriting
Mathematica uses term rewriting heavily. The language provides special syntax for various forms of rewriting, and special support for rules and strategies. The paradigm is not that "new" and of course it's not unique, but they're definitely on the bleeding edge of this "symbolic programming" thing, alongside the other strong players such as Axiom.
As for the comparison to Haskell, well, you could do rewriting there, with a bit of help from the Scrap Your Boilerplate library, but it's not nearly as easy as in dynamically typed Mathematica.
Symbolic shouldn't be contrasted with functional; it should be contrasted with numerical programming. Consider MATLAB vs. Mathematica as an example. Suppose I want the characteristic polynomial of a matrix. In Mathematica, I could get an identity matrix (I) and the matrix (A) itself into Mathematica, then do this:
Det[A-lambda*I]
And I would get the characteristic polynomial (never mind that there's probably a built-in characteristic polynomial function). In MATLAB, on the other hand, I couldn't do it with base MATLAB, because base MATLAB (again, never mind that there's probably a characteristic polynomial function there too) is only good at calculating finite-precision numbers, not expressions with a symbolic lambda floating around in them. What you'd have to do is buy the add-on Symbolab, define lambda on its own line of code, and then write the expression out (whereupon it would convert your A matrix to a matrix of rational numbers rather than finite-precision decimals); and while the performance difference would probably be unnoticeable for a small case like this, it would probably be much slower than Mathematica in relative terms.
So that's the difference: symbolic languages are interested in doing calculations with perfect accuracy (often using rational numbers rather than floating point), whereas numerical programming languages are very good at the vast majority of calculations you would need to do. They tend to be faster at the numerical operations they're designed for (MATLAB is nearly unmatched in this regard among higher-level languages, excluding C++ etc.) and piss-poor at symbolic operations.
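For readers without Mathematica, the same symbolic-versus-numeric point can be made in Python with the SymPy library (an aside of my own; neither tool is mentioned in the answer above, and this assumes SymPy is installed):

from sympy import Matrix, eye, symbols

lam = symbols('lambda')                    # a symbol, not a number
A = Matrix([[2, 1], [1, 2]])
print((A - lam * eye(2)).det().expand())   # lambda**2 - 4*lambda + 3, computed exactly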

Can a language have Lisp's powerful macros without the parentheses?

Can a language have Lisp's powerful macros without the parentheses?
Sure; the question is whether the macros are convenient to use and how powerful they are.
Let's first look at how Lisp is slightly different.
Lisp syntax is based on data, not text
Lisp has a two-stage syntax.
A) first there is the data syntax for s-expressions
examples:
(mary called tim to tell him the price of the book)
(sin ( x ) + cos ( x ))
s-expressions are atoms, lists of atoms or lists.
B) second there is the Lisp language syntax on top of s-expressions.
Not every s-expression is a valid Lisp program.
(3 + 4)
is not a valid Lisp program, because Lisp uses prefix notation.
(+ 3 4)
is a valid Lisp program. The first element is a function - here the function +.
S-expressions are data
The interesting part is now that s-expressions can be read and then Lisp uses the normal data structures (numbers, symbols, lists, strings) to represent them.
Most other programming languages don't have a primitive representation for internalized source - other than strings.
Note that s-expressions here are not representing an AST (Abstract Syntax Tree). It's more like a hierarchical token tree coming out of a lexer phase. A lexer identifies the lexical elements.
The internalized source code now makes it easy to calculate with code, because the usual functions to manipulate lists can be applied.
Simple code manipulation with list functions
Let's look at the invalid Lisp code:
(3 + 4)
The program
(defun convert (code)
  (list (second code) (first code) (third code)))
(convert '(3 + 4)) -> (+ 3 4)
has converted an infix expression into the valid Lisp prefix expression. We can evaluate it then.
(eval (convert '(3 + 4))) -> 7
EVAL evaluates the converted source code. eval takes as input an s-expression, here a list (+ 3 4).
How to calculate with code?
Programming languages now have at least three choices to make source calculations possible:
base the source code transformations on string transformations
use a similar primitive data structure like Lisp. A more complex variant of this is a syntax based on XML. One could then transform XML expressions. There are other possible external formats combined with internalized data.
use a real syntax description format and represent the source code internalized as a syntax tree using data structures that represent syntactic categories. -> use an AST.
For all these approaches you will find programming languages. Lisp is more or less in camp 2. The consequence: it is theoretically not really satisfying and makes it impossible to statically parse source code (if the code transformations are based on arbitrary Lisp functions). The Lisp community has struggled with this for decades (see for example the myriad of approaches that the Scheme community has tried). Fortunately it is relatively easy to use, compared to some of the alternatives, and quite powerful. Variant 1 is less elegant. Variant 3 leads to a lot of complexity in simple AND complex transformations. It usually also means that the expression was already parsed with respect to a specific language grammar.
Another problem is HOW to transform the code. One approach would be based on transformation rules (like in some Scheme macro variants). Another approach would be a special transformation language (like a template language which can do arbitrary computations). The Lisp approach is to use Lisp itself. That makes it possible to write arbitrary transformations using the full Lisp language. In Lisp there is not a separate parsing stage, but at any time expressions can be read, transformed and evaluated - because these functions are available to the user.
Lisp is kind of a local maximum of simplicity for code transformations.
Other frontend syntax
Also note that the function read reads s-expressions to internal data. In Lisp one could either use a different reader for a different external syntax or reuse the Lisp built-in reader and reprogram it using the read macro mechanism - this mechanism makes it possible to extend or change the s-expression syntax. There are examples for both approaches to provide a different external syntax in Lisp.
For example there are Lisp variants which have a more conventional syntax, where code gets parsed into s-expressions.
Why is the s-expression-based syntax popular among Lisp programmers?
The current Lisp syntax is popular among Lisp programmers for two reasons:
1) the data is code is data idea makes it easy to write all kinds of code transformations based on the internalized data. There is also a relatively direct way from reading code, over manipulating code to printing code. The usual development tools can be used.
2) the text editor can be programmed in a straightforward way to manipulate s-expressions. That makes basic code and data transformations in the editor relatively easy.
Originally Lisp was intended to have a different, more conventional syntax. There were several later attempts to switch to other syntax variants - but for various reasons they either failed or spawned different languages.
Absolutely. It's just a couple orders of magnitude more complex, if you have to deal with a complex grammar. As Peter Norvig noted:
Python does have access to the abstract syntax tree of programs, but this is not for the faint of heart. On the plus side, the modules are easy to understand, and with five minutes and five lines of code I was able to get this:
>>> parse("2 + 2")
['eval_input', ['testlist', ['test', ['and_test', ['not_test', ['comparison',
['expr', ['xor_expr', ['and_expr', ['shift_expr', ['arith_expr', ['term',
['factor', ['power', ['atom', [2, '2']]]]], [14, '+'], ['term', ['factor',
['power', ['atom', [2, '2']]]]]]]]]]]]]]], [4, ''], [0, '']]
This was rather a disappointment to me. The Lisp parse of the equivalent expression is (+ 2 2). It seems that only a real expert would want to manipulate Python parse trees, whereas Lisp parse trees are simple for anyone to use. It is still possible to create something similar to macros in Python by concatenating strings, but it is not integrated with the rest of the language, and so in practice is not done.
Since I'm not a super-genius (or even a Peter Norvig), I'll stick with (+ 2 2).
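(As an aside of my own, not part of the quoted answer: the parser module shown above has since been removed from Python; the ast module is friendlier, though the tree is still far noisier than (+ 2 2).)

import ast

print(ast.dump(ast.parse("2 + 2", mode="eval")))
# roughly: Expression(body=BinOp(left=Constant(value=2), op=Add(), right=Constant(value=2)))
# (the exact field list varies between Python versions)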
Here's a shorter version of Rainer's answer:
In order to have lisp-style macros, you need a way of representing source-code in data structures. In most languages, the only "source code data structure" is a string, which doesn't have nearly enough structure to allow you to do real macros on. Some languages offer a real data structure, but it's too complex, like Python, so that writing real macros is stupidly complicated and not really worth it.
Lisp's lists and parentheses hit the sweet spot in the middle. Just enough structure to make it easy to handle, but not too much so you drown in complexity. As a bonus, when you nest lists you get a tree, which happens to be precisely the structure that programming languages naturally adopt (nearly all programming languages are first parsed into an "abstract syntax tree", or AST, before being actually interpreted/compiled).
Basically, programming Lisp is writing an AST directly, rather than writing some other language that then gets turned into an AST by the computer. You could possibly forgo the parens, but you'd just need some other way to group things into a list/tree. You probably wouldn't gain much from doing so.
Parentheses are irrelevant to macros. It's just Lisp's way of doing things.
For example, Prolog has a very powerful macro mechanism called "term expansion". Basically, whenever Prolog reads a term T, it tries a special rule term_expansion(T, R). If it succeeds, the content of R is interpreted instead of T.
Not to mention the Dylan language, which has a pretty powerful syntactic macro system, which features (among other things) referential transparency, while being an infix (Algol-style) language.
Yes. Parentheses in Lisp are used in the classic way, as a grouping mechanism. Indentation is an alternative way to express groups. E.g. the following structures are equivalent:
A ((B C) D)
and
A
        B
        C
    D
Have a look at Sweet-expressions. Wheeler makes a very good case that the reason things like infix notation have not worked before is that typical notation also tries to add precedence, which then adds complexity, which causes difficulties in writing macros.
For this reason, he proposes infix syntax like {1 + 2 + 3} and {1 + {2 * 3}} (note the spaces between symbols), which are translated to (+ 1 2 3) and (+ 1 (* 2 3)) respectively. He adds that if someone writes {1 + 2 * 3}, it should become (nfx 1 + 2 * 3), which could be captured if you really want to provide precedence, but would, by default, be an error.
He also suggests that indentation should be significant, proposes that functions could be called as fn(A B C) as well as (fn A B C), would like data[A] to translate to (bracketaccess data A), and that the entire system should be compatible with s-expressions.
Overall, it's an interesting set of proposals I'd like to experiment with extensively. (But don't tell anyone at comp.lang.lisp: they'll burn you at the stake for your curiosity :-).
Code rewriting in Tcl in a manner recognizably similar to Lisp macros is a common technique. For example, this is (trivial) code that makes it easier to write procedures that always import a certain set of global variables:
proc gproc {name arguments body} {
    set realbody "global foo bar boo;$body"
    uplevel 1 [list proc $name $arguments $realbody]
}
With that, all procedures declared with gproc xyz rather than proc xyz will have access to the foo, bar and boo globals. The whole key is that uplevel takes a command and evaluates it in the caller's context, and list is (among other things) an ideal constructor for substitution-safe code fragments.
Erlang's parse transforms are similar in power to Lisp macros, though they are much trickier to write and use (they are applied to the entire source file, rather than being invoked on demand).
Lisp itself had a brief dalliance with non-parenthesised syntax in the form of M-expressions. It didn't take with the community, though variants of the idea found their way into modern Lisps, so you get Lisp's powerful macros without the parentheses ... in Lisp!
Yes, you can definitely have Lisp macros without all the parentheses.
Take a look at "sweet-expressions", which provides a set of additional abbreviations for traditional s-expressions. They add indentation, a way to do infix, and traditional function calls like f(x), but in a way that is backwards-compatible (you can freely mix well-formatted s-expressions and sweet-expressions), generic, and homoiconic.
Sweet-expressions were developed on http://readable.sourceforge.net and there is a sample implementation.
For Scheme there is a SRFI for sweet-expressions, SRFI-110: http://srfi.schemers.org/srfi-110/
No, it's not necessary. Anything that gives you some sort of access to a parse tree would be enough to allow you to manipulate the macro body in the same way as is done in Common Lisp. However, since manipulating the AST in Lisp is identical to manipulating lists (something that borders on easy in the Lisp family), it's possibly not nearly as natural without having the "parse tree" and "written form" be essentially the same.
I think this was not mentioned.
C++ templates are Turing-complete and perform processing at compile time. There is the well-known expression templates mechanism that allows transformations, not of arbitrary code, but at least of the subset of C++ operators.
So imagine you have 3 vectors of 1000 elements and you must perform:
(A + B + C)[0]
You can capture this tree in an expression template and arbitrarily manipulate it at compile time.
With this tree, at compile time, you can transform the expression.
For example, if that expression means A[0] + B[0] + C[0] for your domain, you could avoid the normal C++ processing, which would be:
Add A and B, adding 1000 elements.
Create a temporary for the result, and add with the 1000 elements of C.
Index the result to get the first element.
And replace with another transformed expression template tree that does:
Capture A[0]
Capture B[0]
Capture C[0]
Add all three results together into the value to return, using +=, avoiding temporaries.
It is not better than Lisp, I think, but it is still very powerful.
Yes, it is certainly possible. Especially if it is still a Lisp under the bonnet:
http://www.meta-alternative.net/pfront.pdf
http://www.meta-alternative.net/pfdoc.pdf
Boo has a nice "quoted" macro syntax that uses [| |] as delimiters, and has certain substitutions which are actually verified syntactically by the compiler pipeline using $variables. While simple and relatively painless to use, it's much more complicated to implement on the compiler side than s-expressions. Boo's solution may have a few limitations that haven't affected my own code. There's also an alternate syntax that reads more like ordinary OO code, but that falls into the "not for the faint of heart" category like dealing with Ruby or Python parse trees.
Javascript's template strings offer yet another approach to this sort of thing. For instance, Mark S. Miller's quasiParserGenerator implements a grammar syntax for parsers.
Enter the Elixir programming language.
Elixir is a functional programming language that feels like Lisp with respect to macros, but it wears Ruby's clothes and runs on top of the Erlang VM.
For those who do not like the parentheses but wish their language had powerful macros, Elixir is a great choice.
You can write macros in R (which has a more Algol-like syntax) that have a notion of delayed expressions, as in LISP macros. You can call substitute() or quote() to keep the delayed expression from being evaluated and instead get the actual expression and traverse its source code, just as in LISP. Even the structure of the expression's source code is LISP-like: operators are the first item in the list. For example, input$foo, which gets property foo from the list input, is, as an expression, written as ['$', 'input', 'foo'], just like in LISP.
You can check the ebook Metaprogramming in R, which also shows how to create macros in R (not something you would normally do, but it's possible). It's based on the 2001 article Programmer's Niche: Macros in R, which explains how to write LISP-style macros in R.
