Is there a Python3LexerBase Python file in ANTLR4?

I found an ANTLR grammar for Python3 here, downloaded it, and generated the ANTLR recognizers using the ANTLR plugin in PyCharm (plugin version 1.17, containing ANTLR runtime 4.9.1).
One of the generated modules is Python3Lexer.py. That module imports something called Python3LexerBase near the top (which presumably would be another Python module), but I have not been able to find a Python module called Python3LexerBase.
The generated Python3Lexer module also contains functions like the following:
def NEWLINE_sempred(self, localctx:RuleContext, predIndex:int):
    if predIndex == 0:
        return this.atStartOfInput()
this.atStartOfInput() looks very Javaish.
I found a Python3LexerBase.java class here, so it seems to be trying to call a Java class from Python.
There is a very similar thread about the same base class in Go, where it was indicated that there is, or might be, one in Python.
Does anybody know:
Why is it (seemingly) trying to call Java code from Python?
Is there a Python3LexerBase module somewhere, or do I need to implement it myself?

Base classes for lexers and parsers are written in the target language of the generated files, so they do not belong to the grammar itself (which is target-language agnostic per se, as long as it contains no target-specific action code). Usually, though, there is at least one implementation nearby for some specific target language, and unless that implementation happens to match your target, you have to port it to your language (here Python) yourself.
If a language is used by enough people, a base implementation is usually contributed over time, as has happened for other targets of the Python grammar. But as @kaby76 already mentioned, this hasn't happened yet for Python as the target.
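In the meantime, you can port the Java base class by hand. Below is a hypothetical minimal sketch of what such a Python3LexerBase module might look like, based on the Java Python3LexerBase from grammars-v4. It covers only the atStartOfInput() predicate that the generated NEWLINE_sempred calls; the Java original also synthesizes INDENT/DEDENT tokens on newlines, which a complete port would have to reproduce.

# Python3LexerBase.py -- hypothetical minimal port of the Java base class,
# covering only the predicate referenced by NEWLINE_sempred above.
import sys
from antlr4 import Lexer

class Python3LexerBase(Lexer):
    def __init__(self, input, output=sys.stdout):
        super().__init__(input, output)

    def atStartOfInput(self):
        # True only before any character has been consumed, mirroring the
        # Java version's getCharPositionInLine() == 0 && getLine() == 1.
        return self.column == 0 and self.line == 1

Note that the generated this.atStartOfInput() also has to become self.atStartOfInput(): action and predicate code written with Java syntax in the grammar needs the same translation before the generated Python will run.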

Related

Is it possible to export a DSL compiler created by JetBrains MPS and use it independently (e.g. invoke it from another Java program)

I'd like to build a DSL and use it as follows:
The DSL compiles to Java.
Export the DSL compiler and package it (i.e. as a JAR), so I can invoke the DSL compiler from a Java application to compile "code written in my DSL" into "Java source code" (I'll use other libraries to programmatically compile Java into bytecode).
Can I use JetBrains MPS to build a DSL and export its compiler as described? If not, other suggestions are appreciated.
I raised the question on the MPS Support forum, and the answer I got was that it's not possible to export a compiler for my DSL (e.g. as a JAR) from the MPS IDE and then invoke the exported compiler from some Java application (think of a Java backend service), passing text input representing a program written in my DSL.
You can use Ant to invoke the MPS code generator (which is responsible for generating the target-language code, e.g. Java, for the input DSL program), but the generator expects as input "the MPS model" of your DSL program (I guess it's some AST-like internal MPS representation of the DSL program). And the only way to produce "the MPS model" of your DSL program is to use JetBrains' MPS IDE (or a stripped-down version of it, or IntelliJ with a plugin for your DSL). In other words, the only way to write/edit programs in your DSL and be able to compile them is to use the JetBrains MPS IDE (or one of its derivatives).
Link to the question I posted on MPS Support forum and the answer.
It seems to me your question is not so far from this documentation entry: https://confluence.jetbrains.com/display/MPSD32/Building+standalone+IDEs+for+your+languages
Maybe you cannot do it directly as a JAR library, but it is possible, with some Ant or Gradle magic, to call a DSL compiler (or, as it's called in MPS, a generator) from an Ant task. Documentation about this can be found at https://www.jetbrains.com/help/mps/building-mps-language-plugins.html#
I know it says building plugins but the same mechanism is used.
Why you would want to do this, though, eludes me, since the strong point of MPS is IDE support and very advanced multi-language integration, not necessarily code generation.
invoke the exported compiler [...] passing a text input representing a program written on my DSL
Your idea is sadly inherently flawed. There is no such thing as an MPS "DSL compiler" which takes text as input. In MPS there are generators which transform your DSL into another MPS language; in your case the target language would be BaseLanguage (MPS's version of Java). After the transformation, the Java source code is generated as .java files and is automatically compiled into .class files. So yeah, this can be done with an Ant script built in BuildLanguage and called from the command line. But the generator does NOT take text as input, it takes an AST. The AST is your program "coded" (the proper term would be modeled) in MPS.
So what you actually want is a parser (if your language is textual and parseable that is), which has text as input and AST as output. Once you have the AST in any form, you can somehow put it into an MPS model.
Please refer to my other answer, where I commented on portability (basically import and export) in MPS here. Among other things, I mentioned a project I am working on there; it allows importing a language and its programs into MPS.
If you don't want to use the MPS IDE at all, but rather work with text, you lose the advantage of MPS as a language workbench (LWB) with a projectional editor. Maybe you should use a textual LWB (e.g. Xtext) or a parser generator (e.g. ANTLR) instead. If the grammar definitions in parser generators scare you, you could use a model-based parser generator like YAJCo (to which I have contributed).

Creating libraries from machine readable specifications in Haskell

I have a specification and I wish to transform it into a library. I can write a program that writes out Haskell source. However, is there a cleaner way that would allow me to compile the specification directly (perhaps using templates)?
References to manuals and tutorials would be greatly appreciated.
Yes, you can use Template Haskell. There are a couple of approaches to using it.
One approach is to use quasiquotation to embed (parts of) the text of the specification in a quasiquotation within a source file. To implement it, you need to write a parser of the machine specification that outputs Haskell AST. This might be useful if the specification is relatively static, it makes sense to have subsets of the specification, or you want to manually map parts of the specification to different modules. This may also be useful, in addition to a different approach perhaps, to provide tools for users of the library to express things in terms of the specification.
Another approach is to execute IO in a normal Template Haskell splice. This would allow you to read the specification from a file (see addDependentFile too in this case), the network (don't do this), or to execute an arbitrary program to produce the Haskell AST needed. This might be more useful if the specification changes more often, or you want to keep a strict separation between the specification and code.
If it's much easier to produce Haskell source than Haskell AST, you can use a library like haskell-src-meta which will parse a string into Template Haskell AST.

How do functional language compilers work? [duplicate]

I've heard of the idea of bootstrapping a language, that is, writing a compiler/interpreter for the language in itself. I was wondering how this could be accomplished and looked around a bit, and saw someone say that it could only be done by either
writing an initial compiler in a different language.
hand-coding an initial compiler in Assembly, which seems like a special case of the first
To me, neither of these seem to actually be bootstrapping a language in the sense that they both require outside support. Is there a way to actually write a compiler in its own language?
Is there a way to actually write a compiler in its own language?
You have to have some existing language to write your new compiler in. If you were writing a new, say, C++ compiler, you would just write it in C++ and compile it with an existing compiler first. On the other hand, if you were creating a compiler for a new language, let's call it Yazzleof, you would need to write the new compiler in another language first. Generally, this would be another programming language, but it doesn't have to be. It can be assembly, or if necessary, machine code.
If you were going to bootstrap a compiler for Yazzleof, you generally wouldn't write a compiler for the full language initially. Instead you would write a compiler for Yazzle-lite, the smallest possible subset of Yazzleof (well, a pretty small subset at least). Then in Yazzle-lite, you would write a compiler for the full language. (Obviously this can occur iteratively instead of in one jump.) Because Yazzle-lite is a proper subset of Yazzleof, you now have a compiler which can compile itself.
There is a really good writeup about bootstrapping a compiler from the lowest possible level (which on a modern machine is basically a hex editor), titled Bootstrapping a simple compiler from nothing. It can be found at https://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html.
The explanation you've read is correct. There's a discussion of this in Compilers: Principles, Techniques, and Tools (the Dragon Book):
Write a compiler C1 for language X in language Y
Use the compiler C1 to write compiler C2 for language X in language X
Now C2 is a fully self-hosting environment.
The way I've heard of is to write an extremely limited compiler in another language, then use that to compile a more complicated version, written in the new language. This second version can then be used to compile itself, and the next version. Each time it is compiled the last version is used.
This is the definition of bootstrapping:
the process of a simple system activating a more complicated system that serves the same purpose.
EDIT: The Wikipedia article on compiler bootstrapping covers the concept better than me.
A super interesting discussion of this is in Unix co-creator Ken Thompson's Turing Award lecture.
He starts off with:
What I am about to describe is one of many "chicken and egg" problems that arise when compilers are written in their own language. In this case, I will use a specific example from the C compiler.
and proceeds to show how he wrote a version of the Unix C compiler that would always allow him to log in without a password, because the C compiler would recognize the login program and add in special code.
The second pattern is aimed at the C compiler. The replacement code is a Stage I self-reproducing program that inserts both Trojan horses into the compiler. This requires a learning phase as in the Stage II example. First we compile the modified source with the normal C compiler to produce a bugged binary. We install this binary as the official C. We can now remove the bugs from the source of the compiler and the new binary will reinsert the bugs whenever it is compiled. Of course, the login command will remain bugged with no trace in source anywhere.
Check out podcast Software Engineering Radio episode 61 (2007-07-06) which discusses GCC compiler internals, as well as the GCC bootstrapping process.
Donald E. Knuth actually built WEB by writing the compiler in it, and then hand-compiled it to assembly or machine code.
As I understand it, the first Lisp interpreter was bootstrapped by hand-compiling the constructor functions and the token reader. The rest of the interpreter was then read in from source.
You can check for yourself by reading the original McCarthy paper, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I.
Every example of bootstrapping a language I can think of (C, PyPy) was done after there was a working compiler. You have to start somewhere, and reimplementing a language in itself requires writing a compiler in another language first.
How else would it work? I don't think it's even conceptually possible to do otherwise.
Another alternative is to create a bytecode machine for your language (or use an existing one if its features aren't very unusual) and write a compiler to bytecode, either in the bytecode itself, or in your desired language using another intermediate, such as a parser toolkit which outputs the AST as XML, then compiling the XML to bytecode using XSLT (or another pattern-matching language and tree-based representation). It doesn't remove the dependency on another language, but it could mean that more of the bootstrapping work ends up in the final system.
It's the computer science version of the chicken-and-egg paradox. I can't think of a way not to write the initial compiler in assembler or some other language. If it could have been done, I should think Lisp could have done it.
Actually, I think Lisp almost qualifies. Check out its Wikipedia entry. According to the article, Lisp's eval function was first implemented on an IBM 704 in machine code, with a complete compiler (written in Lisp itself) coming into being in 1962 at MIT.
Some bootstrapped compilers or systems keep both the source form and the object form in their repository:
OCaml is a language which has both a bytecode compiler (i.e. a compiler to OCaml bytecode) and a native compiler (to x86-64 or ARM, etc. assembly). Its svn repository contains both the source code (files */*.{ml,mli}) and the bytecode form (file boot/ocamlc) of the compiler. So when you build it, the bytecode of a previous version of the compiler is first used to compile the compiler itself; later the freshly compiled bytecode is able to compile the native compiler. Hence the OCaml svn repository contains both *.ml[i] source files and the boot/ocamlc bytecode file.
The Rust compiler downloads (using wget, so you need a working Internet connection) a previous version of its binary to compile itself.
MELT is a Lisp-like language to customize and extend GCC. It is translated to C++ code by a bootstrapped translator. The generated C++ code of the translator is distributed, so the svn repository contains both *.melt source files and the melt/generated/*.cc "object" files of the translator.
J. Pitrat's CAIA artificial intelligence system is entirely self-generating. It is available as a collection of thousands of [A-Z]*.c generated files (along with a generated dx.h header file) and a collection of thousands of _[0-9]* data files.
Several Scheme compilers are also bootstrapped: Scheme48, Chicken Scheme, ...

What would be involved in calling ARPACK++ (a C++ library) from Haskell?

I've spent a couple of days developing a program in Haskell, while learning the language. Now I realize that I'll need to call Arpack (a Fortran library) or Arpack++ (a C++ wrapper to Arpack) -- I can't find a good implementation of the Lanczos method with Haskell bindings. Do any more experienced Haskell programmers have an opinion of how difficult this would be?
I've been able to get ".so" ("shared object") versions of libarpack and libarpack++ installed through Ubuntu's repository, but I'm not sure that will suffice. I suspect I'm going to ultimately need to build Arpack++ from source code, which is possible, but I'm getting a lot of build errors, so it will take time. Is there any way to use just the ".so" files, without knowing exactly which version of the header files were used to generate them?
I'm considering using GreenCard, because it looks like the most well-maintained Haskell/C bridge. I can't find much documentation though, so I'm wondering whether it will support C++ too.
I'm also starting to wonder whether I should rewrite my program in Python, and use scipy to call Arpack, but I've already sunk a couple of days into writing Haskell. I really like Haskell too, so I'm hoping I can make this work. I guess my overall question is this: What would be involved in making this work with Haskell?
Thanks much.
ELF is the standard format for executables and shared libraries, so accessing the code in these compiled modules is only a matter of knowing the function names. If I understand correctly, Fortran is interoperable with C. As a consequence, Fortran should be interoperable with any language which can use C bindings, including Haskell. FYI, you can find all names exported by a module (executable, shared object, or simple object archive) using the nm tool (it is usually available in all Linux distros by default). This of course only works if the binary file has not been "stripped", but AFAIK stripping is not common practice.
However, Haskell cannot use C++ bindings in a sane way, since C++'s polymorphic features require name mangling, and the method of this name transformation is highly compiler-dependent. It is a well-known problem which is not specific to Haskell. Of course, you could try to get a list of exported symbols from the C++ shared object and then bind them using the FFI, but... it isn't worth it.
As dsign said, you can use GHC's Foreign Function Interface feature to create bindings to foreign code. All you would require is the library headers (and the library itself, of course). In the case of the C language that would be header files (*.h), but since your library is written in Fortran, you have to find the analogue of header files in the library sources, refer to this page to match Fortran and C types, and then use this information to write the FFI bindings. It would be helpful to write C bindings first, i.e. write a C header. Then you can even use automatic FFI binding generators like c2hs.
It may also be helpful to look through the C++ bindings. It is possible that they contain the header file I've described above. If so, then writing FFI bindings will be no more difficult than writing them for any other library.
So, it is not entirely impossible, but it may require some thorough work. Writing bindings to scientific/purely computational libraries is way easier than writing them for some system library which does a lot of IO and keeps its own internal state, but since this library is not written in C... well, it may be advisable to invest your time in easier alternatives. I cannot say anything about scipy, I've never used it, but since Python as a language is much simpler than Haskell, it may be a good alternative.
I can tell you that using a C/Fortran library from Haskell, with the help of the Foreign Function Interface, would certainly be possible and not terribly complicated. Here is an introduction. In my understanding, you should be able to call anything with a C calling convention, and perhaps even Fortran, without needing to recompile the code. The only exception is with things that look like function calls but are in fact macros, in which case you will have to figure out what the macros do and reproduce them in Haskell.
As for GreenCard, I have never used it, so I cannot vouch for it.
Your second idea of using Python could potentially save you more than a couple of days. Sad as it is, I have never managed to get Haskell code to adapt easily to my changing requirements, while I find that trivial in Python. Of course, that could be a limitation of my skills with Haskell or my thinking process rather than something to blame on the language.

Interpreted standard library

It's common for a programming language to come with a standard library implemented at least partly in the language itself.
In the case of an interpreted language, the obvious implementation is to read the library source files when the interpreter starts up, but this runs into the messy but persistent problem of making sure the interpreter knows where to find those files even when both are moved around. It would be cleaner if they could be embedded in the interpreter itself, so there is just a single executable.
I can see a simple way to do this by just translating the library source files to C literal strings, but I'm curious as to whether there are any pitfalls I'm overlooking or refinements to the method.
So my question is: which existing interpreted languages attach library source files, written in the language itself, to the interpreter?
Bytecode virtual machines often provide an answer to this: store the bytecode in files (*.pyc, *.rbc) and load the bytecoded versions of the libraries using a simpler mechanism.
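For instance, in Python the same mechanism can be sketched with compile and marshal; this is essentially what a .pyc file stores, minus its header (the module name and source used here are invented for the example):

import marshal, types

# Precompile a library module's source once, ahead of time...
source = 'def greet(name):\n    return "hello " + name\n'
code = compile(source, "stdlib_module.py", "exec")
blob = marshal.dumps(code)      # bytes you can write to a file or embed

# ...and later load it without re-parsing any source text.
mod = types.ModuleType("stdlib_module")
exec(marshal.loads(blob), mod.__dict__)
assert mod.greet("world") == "hello world"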
Smalltalks do this by dumping the standard heap into a separate file called an "image".
As for single-file distribution, append the library file(s) to the end of the executable file and include special logic for the interpreter to read its own binary and locate the embedded program data, or alternatively build the interpreter with the program data statically included.
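A minimal sketch of the append-and-read variant, in Python for illustration (the footer layout and MAGIC marker are invented for this example): the library payload is appended to the executable together with a small footer, and the interpreter looks for that footer at the end of its own binary at startup.

import os
import struct

MAGIC = b"STDLIB!!"     # hypothetical 8-byte marker

def embed(binary_path, payload):
    # Append "<payload><8-byte length><8-byte magic>" to the executable.
    with open(binary_path, "ab") as f:
        f.write(payload)
        f.write(struct.pack("<Q", len(payload)))
        f.write(MAGIC)

def extract(binary_path):
    # Read the footer back; returns None if nothing was embedded.
    with open(binary_path, "rb") as f:
        f.seek(-16, os.SEEK_END)
        length, magic = struct.unpack("<Q8s", f.read(16))
        if magic != MAGIC:
            return None
        f.seek(-(16 + length), os.SEEK_END)
        return f.read(length)

Appending trailing bytes to an ELF or PE executable generally does not disturb the loader, which only maps what the headers describe; self-extracting archives rely on the same trick.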
