Groovy DSL scripts

I wrote a global AST transformation that should be applied to DSL scripts, and am now selecting the best way to identify specific Groovy scripts as these DSL scripts.
I considered the following options:
A custom file extension; the biggest disadvantage here is IDE support: many IDEs handle compilation/editing of files with non-Groovy extensions poorly (you can configure an editor, but it requires some tweaking).
A special file name suffix (or prefix); in this case the suffix should be truly unique (and thus relatively long) to avoid accidentally transforming regular Groovy files (my current choice).
A local AST transformation applied to a script class; the disadvantage is that each script needs some boilerplate code.
Having some unique first statement in the scripts that identifies the DSL.
What would in your opinion be the best option to choose and why? Are there any other options at my disposal that I haven't thought about?

If you compile your DSL scripts using GroovyShell, you can use CompilerConfiguration.addCompilationCustomizers(new ASTTransformationCustomizer(yourGlobalASTTransformation)) to apply the transformation to them.
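A minimal sketch of that setup, where MyDslTransformation stands in for your global transformation class:

    import org.codehaus.groovy.control.CompilerConfiguration
    import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer

    // MyDslTransformation is a placeholder for your global ASTTransformation class
    def config = new CompilerConfiguration()
    config.addCompilationCustomizers(new ASTTransformationCustomizer(new MyDslTransformation()))

    // Every script evaluated by this shell gets the transformation applied,
    // regardless of its file name or extension
    def shell = new GroovyShell(config)
    shell.evaluate(new File('some-script.dsl'))

This sidesteps the file-identification problem entirely: the transformation applies to exactly those scripts you run through this shell.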

Related

What does an Interpreter contain?

I'm using ANTLR4 to create a lexer, parser, and interpreter. The GUI it will be used in contains QScintilla2.
As QScintilla does not need a parser and has a CustomLexer module, will the (ANTLR4-built, Python 3 target) interpreter be enough?
I'm not asking for opinions but factual guidance. Thanks.
An interpreter must have some way to parse the code and then some way to run it. Usually the "way to parse the code" is handled by a lexer plus parser, but lexerless parsing is also possible. Either way, the parser will create some intermediate representation of the code, such as a tree or bytecode.
The "way to run it" will then be a phase that iterates over the generated tree or bytecode and executes it. JIT compilation (i.e. generating machine code from the tree or bytecode and then executing that) is also possible, but more advanced. You can also run various analyses between parsing and execution (for example, you can check whether any undefined variables are used anywhere, or you could do static type checking, though the latter is uncommon in interpreted languages).
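As a deliberately tiny sketch of those two phases (in Groovy for illustration; the "language" here is just sums like "1 + 2 + 3", and all names are invented):

    // Phase 1: lex/parse the source into an intermediate representation
    List parse(String source) {
        source.split(/\s*\+\s*/).collect { [type: 'int', value: it.toInteger()] }
    }

    // Phase 2: walk the representation and execute it
    int run(List tree) {
        tree.sum { node -> node.value }
    }

    assert run(parse('1 + 2 + 3')) == 6

A real interpreter's intermediate representation would of course be a proper tree (or bytecode) rather than a flat list of tokens.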
When using ANTLR, ANTLR will generate a lexer and parser for you, the latter of which will produce a parse tree as a result, which you can iterate over using the generated listener or visitor. At that point you proceed as you see fit with your own code. For example, you could generate bytecode from the parse tree and execute that, translate the parse tree to a simplified tree and execute that or execute the parse tree directly in a visitor.
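For illustration, assuming a grammar called MyLang and ANTLR's Java runtime driven from Groovy (the question uses the Python 3 target, but the flow is identical; all generated class names below are hypothetical):

    import org.antlr.v4.runtime.CharStreams
    import org.antlr.v4.runtime.CommonTokenStream

    // MyLangLexer and MyLangParser are what ANTLR would generate for a grammar
    // named MyLang; MyEvalVisitor is your own subclass of the generated base visitor
    def lexer  = new MyLangLexer(CharStreams.fromString('1 + 2'))
    def parser = new MyLangParser(new CommonTokenStream(lexer))
    def tree   = parser.program()   // 'program' is the assumed start rule

    // "Execute the parse tree directly in a visitor"
    def result = new MyEvalVisitor().visit(tree)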
QScintilla is about displaying the language and is not linked to the interpreter. In an IDE the console is where the interpreter comes into play along with running the script (from a 'Run' button for example). The only thing which is common to QScintilla and the interpreter is the script file - the interpreter is not connected or linked to QScintilla. Does this make basic sense?
Yes, that makes sense, but it doesn't have to be entirely like that. That is, it can make sense to reuse certain parts of your interpreter to implement certain features in your editor/IDE, but you don't have to.
You've specifically mentioned the "Run" button and as far as that is concerned, the implementation of the interpreter (and whether or not it uses ANTLR) is of absolutely no concern. In fact it doesn't even matter which language the interpreter is written in. If your interpreter is named mylangi and you're currently editing a file named foo.mylang, then hitting the "Run" button should simply execute subprocess.run(["mylangi", "foo.mylang"]) and display the result in some kind of tab or window.
Same if you want to have a "console" or "REPL" window where you can interact with the interpreter: You simply invoke the interpreter as a subprocess and connect it to the tab or subwindow that displays the console. Again the implementation of the interpreter is irrelevant for this - you treat it like any other command line application.
Now other features that IDEs and code editors have are syntax highlighting, auto-completion and error highlighting.
For syntax highlighting you need some code that goes through the source and tells the editor which parts of the code should have which color (or boldness etc.). With QScintilla, you accomplish this by supplying a lexer class that does this. You can define such a class by writing the code that detects token types by hand, but you can also reuse the lexer generated by ANTLR. So that's one way in which the implementation of your interpreter could be reused in the editor/IDE. However, since a syntax highlighter is usually fairly straightforward to write by hand, you don't have to do it this way.
For code completion you need to understand which variables and functions are defined in the file, what their scope is, and which other files are included in the current file. These days it's becoming common to implement this logic in a so-called language server: a separate tool that can be reused from different editors and IDEs. Regardless of whether you implement this logic in such a language server or directly in your editor, you'll need a parser (and, if applicable, a type checker) to be able to answer these types of questions. Again, that's something you can reuse from your interpreter, and this time it's definitely a good idea, because writing a second parser would be significant additional work (and easy to get out of sync with the interpreter's parser).
For error highlighting you can simply invoke the interpreter in "verify only" mode (i.e. only print out syntax errors and other errors that can be detected statically, but don't actually run the file -- many interpreters have such an option) and then parse the output to find out where to draw the squiggly lines. But you can also re-use the parser (and analyses if you have any) from your interpreter instead. If you go the route of having a language server, errors and warnings would also be handled by the language server.
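A sketch of that "verify only" route (the interpreter name, flag, and diagnostic format are all assumptions, and the snippet is Groovy purely for illustration):

    // Assume a 'mylangi --check-only' mode that prints "line:column: message"
    // diagnostics on stderr without running the script
    def proc = ['mylangi', '--check-only', 'foo.mylang'].execute()
    def diagnostics = proc.errorStream.readLines().collect { line ->
        def parts = line.split(':', 3)
        [line: parts[0].toInteger(), column: parts[1].toInteger(), message: parts[2].trim()]
    }
    proc.waitFor()

    // The editor would draw a squiggly line at each reported position
    diagnostics.each { d -> println "squiggle at ${d.line}:${d.column} -> ${d.message}" }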

Creating libraries from machine readable specifications in Haskell

I have a specification and I wish to transform it into a library. I can write a program that writes out Haskell source. However, is there a cleaner way that would allow me to compile the specification directly (perhaps using templates)?
References to manuals and tutorials would be greatly appreciated.
Yes, you can use Template Haskell. There are a couple of approaches to using it.
One approach is to use quasiquotation to embed (parts of) the text of the specification in a quasiquotation within a source file. To implement it, you need to write a parser for the machine-readable specification that outputs a Haskell AST. This might be useful if the specification is relatively static, if it makes sense to use subsets of the specification, or if you want to manually map parts of the specification to different modules. It may also be useful, perhaps in addition to another approach, for providing tools that let users of the library express things in terms of the specification.
Another approach is to execute IO in a normal Template Haskell splice. This would allow you to read the specification from a file (see addDependentFile too in this case), the network (don't do this), or to execute an arbitrary program to produce the Haskell AST needed. This might be more useful if the specification changes more often, or you want to keep a strict separation between the specification and code.
If it's much easier to produce Haskell source than Haskell AST, you can use a library like haskell-src-meta which will parse a string into Template Haskell AST.

XJC re-use classes across ant tasks

I am exploring options to reuse the same classes across two different ant tasks. In the first task I am already building the JAR and deleting the generated classes. XJC does not allow passing a JAR as a parameter for reference. One option I currently use is to regenerate the episode file from just the XSD and construct another JAR. Is there a better approach?
Please see this post:
http://blog.bdoughan.com/2011/12/reusing-generated-jaxb-classes.html
The right way to do this:
Use -episode to generate the episode file when compiling your first schema
Use the generated episode during the second compilation: the episode file is packaged into the first JAR as META-INF/sun-jaxb.episode, and you pass it to xjc as a binding file with -b
(In some cases) remove the leftovers
So episodes are definitely the way to go. I don't quite understand what you mean by "XJC does not allow passing a JAR as a parameter for reference", though.

Debug-able Domain Specific Language

My goal is to develop a DSL for my application, but I want users to be able to set a breakpoint in their DSL code without knowing anything about the underlying language the DSL runs on; all they should see is the DSL-related syntax, stack, watch variables, and so on.
How can I achieve this?
It depends on your target platform. For example, if you're implementing your DSL compiler on top of .NET, it is trivial to annotate your bytecode with debugging information (variable names, source code location for expressions and statements, etc.).
If you also provide a Visual Studio extension for your language, you'll be able to reuse a royalty-free MSVS Isolated Shell for both editing and debugging for your DSL code.
Nearly the same approach is possible with JVM (you can use Eclipse or Netbeans as a debugging frontend).
Native code generation is a little bit more complicated, but it is still possible to do some simple things, like generating C code stuffed with line pragmas.
You basically need to generate code for your DSL with built-in opportunities for breakpoints, each with facilities for observing the internal state variables. Then your debugger has to know how to map locations in the DSL to those breakpoints and, at each breakpoint, simply call the observers. (If the observers have names, e.g., variable names, you can let the user choose which ones to call.)
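A minimal sketch of that shape in Groovy, with every name invented: the generator emits a call to a debugger hook after each DSL statement, passing the DSL source line and the observable locals:

    // Hypothetical hook that generated code calls between DSL statements
    class DslDebugger {
        Set<Integer> breakpoints = [] as Set
        void at(int dslLine, Map<String, Object> locals) {
            if (dslLine in breakpoints) {
                println "Paused at DSL line $dslLine, locals: $locals"
                System.in.newReader().readLine()   // block until the user continues
            }
        }
    }

    def dbg = new DslDebugger(breakpoints: [2] as Set)

    // What the generator might emit for a two-statement DSL script:
    def total = 0
    dbg.at(1, [total: total])
    total += 42
    dbg.at(2, [total: total])

The debugger UI then only ever talks in terms of DSL line numbers and DSL variable names, never the generated code.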

#Grape in scripts with multiple files

I'd like to use #Grape in my groovy program but my program consists of several files. The examples on the Groovy Grape page all seem to assume that your script will consist of one file. How can I do this? Should I just add it to one of the files and expect that the imports will work from the others? If so, then is it common to place all the #Grape calls in one file with no other code? Do I need to add the Grape call to all files that will import the package? Do I need to download the JAR and create a Gradle file, which I was getting away without at this point?
The Grape engine and the @Grab annotation were created as part of core Groovy with single-file scripts in mind, to allow a chunk of text to easily become a fully functional program.
For larger applications, Gradle is an awesome build tool with lots of useful features.
But yes, you can manage all the application dependencies just with Grape.
Whether you annotate every file or a single one does not matter; just make sure the @Grab-annotated file is read before you try to use the external class.
Annotating the main class is probably better (see the sketch below), as you will easily lose track of library versions if the annotations are scattered.
And yes, you should consider Gradle for any application with more than a dozen files, or anything you might want to reuse elsewhere as a library.
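A minimal sketch of that single-annotation setup (the library and version are just placeholders):

    // Main.groovy - the one @Grab-annotated entry point; helper classes
    // compiled alongside it can use the grabbed library without repeating this
    @Grab('org.apache.commons:commons-lang3:3.12.0')
    import org.apache.commons.lang3.StringUtils

    println StringUtils.capitalize('grabbed once, used everywhere')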
In my opinion, it depends how your program is to be run...
If your program is to be run as a collection of standalone scripts, then I'd probably put the @Grab annotations each script requires at the top of that script.
If your program is more of a standard style program with a single point of entry, then I'd go for using a build tool like Gradle (as you say), as you get a lot of easy wins by using it.
Firstly, it makes it easy to define your dependencies (and build a single large jar containing all of them)
Secondly, Gradle makes it really easy to start writing tests, include code coverage plugins, or add useful tools like CodeNarc to suggest possible fixes or improvements to your code. These all become invaluable not only for improving your code (or knowing that your code works), but also when refactoring: you know you haven't broken anything that used to work.
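A minimal build.gradle along those lines (plugin choice, coordinates, and versions are illustrative, not prescriptive):

    plugins {
        id 'groovy'
        id 'application'
    }

    repositories {
        mavenCentral()
    }

    dependencies {
        implementation 'org.apache.groovy:groovy:4.0.15'
        implementation 'org.apache.commons:commons-lang3:3.12.0'
    }

    application {
        mainClass = 'Main'   // hypothetical entry point class
    }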
