Is the metafont language a context free grammar?

Is the metafont language a context free grammar? - metafont

Is the metafont language a context free grammar?
I am new to lexer/parser/compiler writing, and am trying to get a general idea how hard it would be to write a metafont compiler.

It is not. The Metafont language is basicly a macro expander operating over a stream of tokens. It's a ingenious but very complex design, that allows the syntax to be changeable at runtime. Tokens are not fixed, like they are in most languages, but they are variable, their meaning can be changed at runtime.
Metafont makes a distinction between tokens used as part of variable names (named tags) and as syntactic elements (named sparks). When the parser encounters a tag, it first looks up the meaning of the tag before doing any evaluation. This has the consequence that the meaning of a stream of tokens may change completely during the execution of a metafont program!
Take the following example in mf:
$ mf
This is METAFONT, Version 2.718281 (TeX Live 2013/Debian)
**expr
(/usr/share/texlive/texmf-dist/metafont/base/expr.mf
gimme an expr: 1plus5
>> plus5
gimme an expr: begingroup let plus=+ endgroup
>> vacuous
gimme an expr: 1plus5
>> 6
The token stream 1plus2 parses as 1*plus2, where
plus2 is a variable, since plus is defined as a tag.
In the second expression we redefined plus as a spark with
the same meaning as +. (begingroup is needed to enclose
a command in an expression). When we parse 1plus5 again, it
stands for 1 + 5, so evaluates to 6!
Macro expansion in metafont is very complex, it can expand variables,
but sparks can also contain macros, with different rules for expansion.
There are also expanders for infix functions of different fixity.
The best way to learn about this is to read the metafontbook.

Related

By means of what language design principle is a string instantiated either as a string or as a variable in GNU Octave?

Having an Octave script (in the sense of dynamic languages here) move.m defining function move(direction), it can be invoked from another script (alternatively from the command line) in different ways: move left, move('left') or move(left). While the first two will instantiate direction with the string 'left', the last one will consider left as a variable.
The question is about the formal principle in language definition behind this. I understand that in the first mode, the script is invoked as a command, considering that the rest of the command line is just data, not variables (pretty much as in a Linux prompt); while in the last two it is called as a function, interpreting what follows (between parenthesis) as either data or variables. If this is a general design criteria among scripting languages, what is the principle behind it?

To answer your question, yes, this is by design, and it's syntactic sugar offered by matlab (and hence octave) for running certain functions that expect only string arguments. Here is the relevant section in the matlab manual: https://uk.mathworks.com/help/matlab/matlab_prog/command-vs-function-syntax.html
I should clarify some misconceptions though. First, it's not "data" vs "variables". Any argument supplied in command syntax is simply interpreted as a string. So these two are equivalent:
fprintf("1")
fprintf 1
I.e., in fprintf 1, the 1 is not numeric data. It's a string.
Secondly, not all m files are "scripts". You calling your m file a script caused me some confusion. Your particular file contains a function definition and nothing else, so it's a function, 100%.
The reason this is important here, is that all functions can be called either via functional syntax or command syntax (as long as it makes sense in terms of the expected arguments being strings), whereas scripts take no arguments, so there is no functional / command syntax at play, and if you were passing 'arguments' to a script you're doing something wrong.

I understand that in the first mode, the script is invoked as a command [...]
As far as Octave goes, you are better off forgetting about that distinction. I'm not sure if a "command" ever existed but it certainly does not exist now. The command syntax is just syntactic sugar in Octave. Makes it simpler for interactive plot adjustment since it's functions arguments mainly take strings.

How to run several operations back to back in a lambda in Haskell?

I'm teaching myself Haskell, and I am having difficulty understanding how I might pipeline a number of operations in the body of a lambda function without using a do block. Take the following for example:
par = (\c->
c + 1
c + 2
)
In Ruby and other imperative languages, I am used to being able to run serveral expressions in a lambda block by adding a linebreak between them. In Haskell, I looked for a similar construct and found that Haskell doesn't respect linebreaks in pure expressions.
So, syntactically, if I wanted to run the second line after the first without using a do block, what could I do?

Unless it's side-effectful code, having multiple independent expressions with no side effects makes no sense in Haskell (nor any other languages for that matter). You can use let to store the value of an expression in a name and then use it in the subsequent expression (the returm value of the lambda), or a do-block, but anything else is meaningless in Haskell and thus not a valid program.
In other words: it makes little sense to ask this question solely about syntax as Haskell's syntax is all about its semantics.

A language in which everything compiles

I'm trying to do some research for a new project, and I need to create objects dynamically from random data.
For this to work, I need a language / compiler that doesn't have problems with weird uncompilable code lying around.
Basically, I need the random code to compile (or be interpreted) as much as possible - Meaning that the uncompilable parts will be ignored, and only the compilable parts will create the objects (which could be ran).
Object Oriented-ness is not a must, but is a very strong advantage.
I thought of ASM, but it's very messy, and I'd probably need a more readable code
Thanks!

It sounds like you're doing something very much like genetic programming; even if you aren't, GP has to solve some of the same problems—using randomness to generate valid programs. The approach to this that is typically used is to work with a syntax tree: rather than storing x + y * 3 - 2, you store something like the following:
Then, instead of randomly changing the syntax, one can randomly change nodes in the tree instead. And if x should randomly change to, say, +, you can statically know that this means you need to insert two children (or not, depending on how you define +).
A good choice for a language to work with for this would be any Lisp dialect. In a Lisp, the above program would be written (- (+ x (* y 3)) 2), which is just a linearization of the syntax tree using parentheses to show depth. And in fact, Lisps expose this feature: you can just as easily work with the object '(- (+ x (* y 3)) 2) (note the leading quote). This is a three-element list, whose first element is -, second element is another list, and third element is 2. And, though you might or might not want it for your particular application, there's an eval function, such that (eval '(- (+ x (* y 3)) 2)) will take in the given list, treat it as a Lisp syntax tree/program, and evaluate it. This is what makes Lisps so attractive for doing this sort of work; Lisp syntax is basically a reification of the syntax-tree, and if you operate at the syntax-tree level, you can work on code as though it was a value. Lisp won't help you read /dev/random as a program directly, but with a little interpretation layered on top, you should be able to get what you want.
I should also mention, though I don't know anything about it (not that I know much about ordinary genetic programming) the existence of linear genetic programming. This is sort of like the assembly model that you mentioned—a linear stream of very, very simple instructions. The advantage here would seem to be that if you are working with /dev/random or something like it, the amount of interpretation needed is very small; the disadvantage would be, as you mentioned, the low-level nature of the code.

I'm not sure if this is what you're looking for, but any programming language can be made to function this way. For any programming language P, define the language Palways as follows:
If p is a valid program in P, then p is a valid program in Palways whose meaning is the same as its meaning in P.
If p is not a valid program in P, then p is a valid program in Palways whose meaning is the same as a program that immediately terminates.
For example, I could make the language C++always so that this program:
#include <iostream>
using namespace std;
int main() {
cout << "Hello, world!" << endl;
}
would compile as "Hello, world!", while this program:
Hahaha! This isn't legal C++ code!
Would be a legal program that just does absolutely nothing.
To solve your original problem, just take any OOP language like Java, Smalltalk, etc. and construct the appropriate Javaalways, Smalltalkalways, etc. language from it. Again, I'm not sure if this is at all what you're looking for, but it could be done very easily.
Alternatively, consider finding a grammar for any OOP language and then using that grammar to produce random syntactically valid programs. You could then filter those programs down by using the Palways programming language for that language to eliminate syntactically but not semantically valid programs.

Divide the ASCII byte values into 9 classes (division modulo 9 would help). Then assign then to Brainfuck codewords (see http://en.wikipedia.org/wiki/Brainfuck). Then interpret as Brainfuck.
There you go, any sequence of ASCII characters is a program. Not that it's going to do anything sensible... This approach has a much better chance, compared to templatetypedef's answer, to get a nontrivial program from a random byte sequence.

Text Editors
You could try feeding random character strings to an editor like Emacs or VI. Many (most?) characters will perform an editing action but some will do nothing (other than beep, perhaps). You would have to ensure that the random code mutator never generates the character sequence that exits the editor. However, this experience would be much like programming a Turing machine -- the code is not too readable.
Mathematica
In Mathematica, undefined symbols and other expressions evaluate to themselves, without error. So, that language might be a viable choice if you can arrange for the random code mutator to always generate well-formed expressions. This would be readily achievable since the basic Mathematica syntax is trivial, making it easy to operate on syntactic units rather than at the character level. It would be even easier if the mutator were written in Mathematica itself since expression-munging is Mathematica's forte. You could define a mini-language of valid operations within a Mathematica package that does not import the system-defined symbols. This would allow you to generate well-formed expressions to your heart's content without fear of generating a dangerous expression, like DeleteFile[FileNames["*.*", "/", Infinity]].

I believe Common Lisp should suit your needs. I always have some code in my SLIME/Emacs session that wouldn't compile. You can always tweak things, redefine functions in run-time. It is actually very good for prototyping.
A few years ago it took me quite a while to learn. But nowadays we have quicklisp and everything is so much easier.
Here I describe my development environment:
Install lisp on my linux machine
PS: I want to give an example, where Common Lisp was useful for me:
Up to maybe 2004 I used to write small programs in C (the keep it simple Unix way).
The last 3 years I had to get lots of different hardware running. Motorized stages, scientific cameras, IO cards.
The cameras turned out to be quite annoying. Usually you have to cool them down to -50 degree celsius or so and (in some SDKs) they don't like it when you close them. But this
is exactly how my C development cycle worked: write (30s), compile (1s), run (0.1s), repeat.
Eventually I decided to just use Common Lisp. Often it is straight forward to define the foreign function interfaces to talk to the SDKs and I can do this without ever leaving the running Lisp image. I start the editor in the morning define the open-device function, to talk to the device and after 3 hours I have enough of the functions implemented to set gain, temperature, region of interest and obtain the video.
Then I can often put the SDK manual away and just use the camera.
I used the same interactive programming approach when I have to parse some webpage or some weird XML.

Can a language have Lisp's powerful macros without the parentheses?

Can a language have Lisp's powerful macros without the parentheses?

Sure, the question is whether the macro is convenient to use and how powerful they are.
Let's first look how Lisp is slightly different.
Lisp syntax is based on data, not text
Lisp has a two-stage syntax.
A) first there is the data syntax for s-expressions
examples:
(mary called tim to tell him the price of the book)
(sin ( x ) + cos ( x ))
s-expressions are atoms, lists of atoms or lists.
B) second there is the Lisp language syntax on top of s-expressions.
Not every s-expression is a valid Lisp program.
(3 + 4)
is not a valid Lisp program, because Lisp uses prefix notation.
(+ 3 4)
is a valid Lisp program. The first element is a function - here the function +.
S-expressions are data
The interesting part is now that s-expressions can be read and then Lisp uses the normal data structures (numbers, symbols, lists, strings) to represent them.
Most other programming languages don't have a primitive representation for internalized source - other than strings.
Note that s-expressions here are not representing an AST (Abstract Syntax Tree). It's more like a hierarchical token tree coming out of a lexer phase. A lexer identifies the lexical elements.
The internalized source code now makes it easy to calculate with code, because the usual functions to manipulate lists can be applied.
Simple code manipulation with list functions
Let's look at the invalid Lisp code:
(3 + 4)
The program
(defun convert (code)
(list (second code) (first code) (third code)))
(convert '(3 + 4)) -> (+ 3 4)
has converted an infix expression into the valid Lisp prefix expression. We can evaluate it then.
(eval (convert '(3 + 4))) -> 7
EVAL evaluates the converted source code. eval takes as input an s-expression, here a list (+ 3 4).
How to calculate with code?
Programming languages now have at least three choices to make source calculations possible:
base the source code transformations on string transformations
use a similar primitive data structure like Lisp. A more complex variant of this is a syntax based on XML. One could then transform XML expressions. There are other possible external formats combined with internalized data.
use a real syntax description format and represent the source code internalized as a syntax tree using data structures that represent syntactic categories. -> use an AST.
For all these approaches you will find programming languages. Lisp is more or less in camp 2. The consequence: it is theoretically not really satisfying and makes it impossible to statically parse source code (if the code transformations are based on arbitrary Lisp functions). The Lisp community struggles with this for decades (see for example the myriad of approaches that the Scheme community has tried). Fortunately it is relatively easy to use, compared to some of the alternatives and quite powerful. Variant 1 is less elegant. Variant 3 leads to a lot complexity in simple AND complex transformations. It usually also means that the expression was already parsed with respect to a specific language grammar.
Another problem is HOW to transform the code. One approach would be based on transformation rules (like in some Scheme macro variants). Another approach would be a special transformation language (like a template language which can do arbitrary computations). The Lisp approach is to use Lisp itself. That makes it possible to write arbitrary transformations using the full Lisp language. In Lisp there is not a separate parsing stage, but at any time expressions can be read, transformed and evaluated - because these functions are available to the user.
Lisp is kind of a local maximum of simplicity for code transformations.
Other frontend syntax
Also note that the function read reads s-expressions to internal data. In Lisp one could either use a different reader for a different external syntax or reuse the Lisp built-in reader and reprogram it using the read macro mechanism - this mechanism makes it possible to extend or change the s-expression syntax. There are examples for both approaches to provide a different external syntax in Lisp.
For example there are Lisp variants which have a more conventional syntax, where code gets parsed into s-expressions.
Why is the s-expression-based syntax popular among Lisp programmers?
The current Lisp syntax is popular among Lisp programmers for two reasons:
1) the data is code is data idea makes it easy to write all kinds of code transformations based on the internalized data. There is also a relatively direct way from reading code, over manipulating code to printing code. The usual development tools can be used.
2) the text editor can be programmed in a straight forward way to manipulate s-expressions. That makes basic code and data transformations in the editor relatively easy.
Originally Lisp was thought to have a different, more conventional syntax. There were several attempts later to switch to other syntax variants - but for some reasons it either failed or spawned different languages.

Absolutely. It's just a couple orders of magnitude more complex, if you have to deal with a complex grammar. As Peter Norvig noted:
Python does have access to the
abstract syntax tree of programs, but
this is not for the faint of heart. On
the plus side, the modules are easy to
understand, and with five minutes and
five lines of code I was able to get
this:
>>> parse("2 + 2")
['eval_input', ['testlist', ['test', ['and_test', ['not_test', ['comparison',
['expr', ['xor_expr', ['and_expr', ['shift_expr', ['arith_expr', ['term',
['factor', ['power', ['atom', [2, '2']]]]], [14, '+'], ['term', ['factor',
['power', ['atom', [2, '2']]]]]]]]]]]]]]], [4, ''], [0, '']]
This was rather a disapointment to me. The Lisp parse of the equivalent expression is (+ 2 2). It seems that only a real expert would want to manipulate Python parse trees, whereas Lisp parse trees are simple for anyone to use. It is still possible to create something similar to macros in Python by concatenating strings, but it is not integrated with the rest of the language, and so in practice is not done.
Since I'm not a super-genius (or even a Peter Norvig), I'll stick with (+ 2 2).

Here's a shorter version of Rainer's answer:
In order to have lisp-style macros, you need a way of representing source-code in data structures. In most languages, the only "source code data structure" is a string, which doesn't have nearly enough structure to allow you to do real macros on. Some languages offer a real data structure, but it's too complex, like Python, so that writing real macros is stupidly complicated and not really worth it.
Lisp's lists and parentheses hit the sweet spot in the middle. Just enough structure to make it easy to handle, but not too much so you drown in complexity. As a bonus, when you nest lists you get a tree, which happens to be precisely the structure that programming languages naturally adopt (nearly all programming languages are first parsed into an "abstract syntax tree", or AST, before being actually interpreted/compiled).
Basically, programming Lisp is writing an AST directly, rather than writing some other language that then gets turned into an AST by the computer. You could possibly forgo the parens, but you'd just need some other way to group things into a list/tree. You probably wouldn't gain much from doing so.

Parentheses are irrelevant to macros. It's just Lisp's way of doing things.
For example, Prolog has a very powerful macros mechanism called "term expansion". Basically, whenever Prolog reads a term T, if tries a special rule term_expansion(T, R). If it is successful, the content of R is interpreted instead of T.

Not to mention the Dylan language, which has a pretty powerful syntactic macro system, which features (among other things) referential transparency, while being an infix (Algol-style) language.

Yes. Parentheses in Lisp are used in the classic way, as a grouping mechanism. Indentation is an alternative way to express groups. E.g. the following structures are equivalent:
A ((B C) D)
and
A
B
C
D

Have a look at Sweet-expressions. Wheeler makes a very good case that the reason things like infix notation have not worked before is that typical notation also tries to add precedence, which then adds complexity, which causes difficulties in writing macros.
For this reason, he proposes infix syntax like {1 + 2 + 3} and {1 + {2 * 3}} (note the spaces between symbols), that are translated to (+ 1 2) and (+ 1 (* 2 3)) respectively. He adds that if someone writes {1 + 2 * 3}, it should become (nfx 1 + 2 * 3), which could be captured, if you really want to provide precedence, but would, as a default, be an error.
He also suggests that indentation should be significant, proposes that functions could be called as fn(A B C) as well as (fn A B C), would like data[A] to translate to (bracketaccess data A), and that the entire system should be compatible with s-expressions.
Overall, it's an interesting set of proposals I'd like to experiment with extensively. (But don't tell anyone at comp.lang.lisp: they'll burn you at the stake for your curiosity :-).

Code rewriting in Tcl in a manner recognizably similar to Lisp macros is a common technique. For example, this is (trivial) code that makes it easier to write procedures that always import a certain set of global variables:
proc gproc {name arguments body} {
set realbody "global foo bar boo;$body"
uplevel 1 [list proc $name $arguments $realbody]
}
With that, all procedures declared with gproc xyz rather than proc xyz will have access to the foo, bar and boo globals. The whole key is that uplevel takes a command and evaluates it in the caller's context, and list is (among other things) an ideal constructor for substitution-safe code fragments.

Erlang's parse transforms are similar in power to Lisp macros, though they are much trickier to write and use (they are applied to the entire source file, rather than being invoked on demand).
Lisp itself had a brief dalliance with non-parenthesised syntax in the form of M-expressions. It didn't take with the community, though variants of the idea found their way into modern Lisps, so you get Lisp's powerful macros without the parentheses ... in Lisp!

Yes, you can definitely have Lisp macros without all the parentheses.
Take a look at "sweet-expressions", which provides a set of additional abbreviations for traditional s-expressions. They add indentation, a way to do infix, and traditional function calls like f(x), but in a way that is backwards-compatible (you can freely mix well-formatted s-expressions and sweet-expressions), generic, and homoiconic.
Sweet-expressions were developed on http://readable.sourceforge.net and there is a sample implementation.
For Scheme there is a SRFI for sweet-expressions, SRFI-110: http://srfi.schemers.org/srfi-110/

No, it's not necessary. Anything that gives you some sort of access to a parse tree would be enough to allow you to manipulate the macro body in hte same way as is done in Common Lisp. However, as the manipulation of the AST in lisp is identical to the manipulation of lists (something that is bordering on easy in the lisp family), it's possibly not nearly as natural without having the "parse tree" and "written form" be essentially the same.

I think this was not mentioned.
C++ templates are Turing-complete and perform processing at compile-time.
There is the well-known expression templates mechanism that allow transformations,
not from arbitrary code, but at least, from the subset of c++ operators.
So imagine you have 3 vectors of 1000 elements and you must perform:
(A + B + C)[0]
You can capture this tree in a expression template and arbitrarily manipulate it
at compile-time.
With this tree, at compile time, you can transform the expression.
For example, if that expression means A[0] + B[0] + C[0] for your domain, you could
avoid the normal c++ processing which would be:
Add A and B, adding 1000 elements.
Create a temporary for the result, and add with the 1000 elements of C.
Index the result to get the first element.
And replace with another transformed expression template tree that does:
Capture A[0]
Capture B[0]
Capture C[0]
Add all 3 results together in the result to return with += avoiding temporaries.
It is not better than lisp, I think, but it is still very powerful.

Yes, it is certainly possible. Especially if it is still a Lisp under the bonnet:
http://www.meta-alternative.net/pfront.pdf
http://www.meta-alternative.net/pfdoc.pdf

Boo has a nice "quoted" macro syntax that uses [| |] as delimiters, and has certain substitutions which are actually verified syntactically by the compiler pipeline using $variables. While simple and relatively painless to use, it's much more complicated to implement on the compiler side than s-expressions. Boo's solution may have a few limitations that haven't affected my own code. There's also an alternate syntax that reads more like ordinary OO code, but that falls into the "not for the faint of heart" category like dealing with Ruby or Python parse trees.

Javascript's template strings offer yet another approach to this sort of thing. For instance, Mark S. Miller's quasiParserGenerator implements a grammar syntax for parsers.

Go ahead and enter the Elixir programming language.
Elixir is a functional programming language that feels like Lisp with respect to macros, but it is on Ruby's clothes, and runs on top of the Erlang VM.
For those who do not like the parenthesis, but wish their language has powerful macros, Elixir is a great choice.

You can write macros in R (it have more like Algol Syntax) that have notion of delayed expression like in LISP macros. You can call substitute() or quote() to not evaluate the delayed expression but get actual expression and traverse its source code like in LISP. Even structure of the expression source code is like in LISP. Operators are first item in list. e.g.: input$foo which is getting property foo from list input as expression is written as ['$', 'input', 'foo'] just like in LISP.
You can check the ebook Metaprogramming in R that also show how to create Macros in R (not something you would normally do but it's possible). It's based on Article from 2001 Programmer’s Niche: Macros in R that explain how to write LIPS macros in R.

primitives of a programming language

Which do the concepts control flow, data type, statement, expression and operation belong to? Syntax or semantics?
What is the relation between control flow, data type, statement, expression, operation, function, ...? How a program is built from these primitives level by level?
I would like to understand these primitive concepts and their relations in order to figure out what aspects of a new language should one learn.
Thanks and regards!

All of those language elements have both syntax (how it is written) and semantics (how the way it is written corresponds to what it actually means). Control flow determines which statements are executed and when, expressions yield a value and can be made up of functions and other language elements (although the details depend on the programming language). An operation is usually a sequence of statements. The meaning of "function" varies from language to language; in some languages, any operation that can be invoked by name is a function. In other languages, a function is an operation that yields a result (as opposed to a procedure that does not report a result). Some languages also require that functions be non-mutating while procedures can be mutating, although this varies from language to language. Data types encapsulate both data and the operations/procedures/functions that can be operated on that data.

They belong to both worlds:
Syntax will describe which are the operators, which are primitive types (int, float), which are the keywords (return, for, while). So syntax decides which "words" you can use in the programming language. With word I mean every single possible token: = is a token, void is a token, varName12345 is a token that is considered as an identifier, 12.4 is a token considered as a float and so on..
Semantics will describe how these tokens can be combined together inside you language.
For example you will have that while semantics is something like:
WHILE ::= 'while' '(' CONDITION ')' '{' STATEMENTS '}'
CONDITION ::= CONDITION '&&' CONDITION | CONDITION '||' CONDITION | ...
STATEMENTS ::= STATEMENT ';' STATEMENTS | empty_rule
and so on. This is the grammar of the language that describes exactly how the language is structured. So it will be able to decide if a program is correct according to the language semantics.
Then there is a third aspect of the semantics, that is "what does that construct mean?". You can see it as a correspondence between, for example, a for loop and how it is translated into the lower level language needed to be executed.
This third aspect will decide if your program is correct with respect to the allowed operations. Usually you can make a compiler reject many of programs that have no meaning (because they violates the semantic) but to be able to find many different mistakes you will have to introduce a new tool: the type checker that will also check that whenever you do operations they are correct according to the types.
For example you grammar can allow doing varName = 12.4 but the typechecker will use the declaration of varName to understand if you can assign a float to it. (of course we're talking about static type checking)

Those concepts belong to both.
Statements, expressions, control flow operations, data types, etc. have their structure defined using the syntax. However, their meaning comes from the semantics.
When you have defined syntax and semantics for a programming language and its constructs, this basically provides you with a set of building blocks. The syntax is used to understand the structure in the code - usually represented using an abstract syntax tree, or AST. You can then traverse the tree and apply the semantics to each element to execute the program, or generate some instructions for some instruction set so you can execute the code later.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string