Xtext - multiple files language - dsl

I'm pretty new to Xtext, so I don't understand very well all of the associated concepts. There's one question in particular I couldn't find an answer to: how can I manage a grammar for a language with multiple files?
The DSL I'm working on typically uses four files, three of which should be referenced in the first one. All files share the same extension, though not the same grammar. Is that possible at all?

How can I manage a grammar for a language with multiple files?
Xtext first parses the file, and then links cross-references. These cross-references can be "internal" to a file or "external". In both cases the linking and scoping systems will do the hard work for you.
All files share the same extension, though not the same grammar. Is that possible at all?
This seems to be a different question, but alas...
If the grammars are really different then you will have a hard time with Xtext. If Xtext sees a .foo file, how should it decide which parser to apply? Try each one until no error occurs? And what if the file is written in grammar B but really contains syntax errors? ...
But often there is a little trick: there is really one grammar, but the grammar contains two nearly separate parts. Which part is used is determined by the first few keywords in the file.
A small example:
File A.foo:
module A {
// more stuff here
}
module B {
// also more stuff
}
File B.foo:
system X {
use module A
use module B
}
The grammar might look like this:
Model: Modules | Systems;
Modules: modules += Module+;
Module: 'module' name=ID '{' '}';
Systems: systems += System+;
System: 'system' name=ID '{' used+=UsedModule* '}';
UsedModule: 'use' 'module' module=[Module];
In this grammar one file can only contain either module XOR system definitions but not a mix of them. The first occurrence of the keyword module or system determines what is allowed.
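The "first keyword decides" idea can be illustrated outside Xtext. This is only a minimal Python sketch, not how Xtext's generated parser works internally: `file_kind` is a hypothetical helper, and the `//` comment syntax is assumed from the example files above.

```python
import re

# Hypothetical helper, not part of Xtext: classify a .foo file by its first keyword.
_FIRST_WORD = re.compile(r"[A-Za-z_]\w*")


def file_kind(text: str) -> str:
    """Return 'modules' or 'systems' depending on the first keyword in text."""
    # Drop // line comments so a leading comment does not hide the keyword.
    stripped = re.sub(r"//[^\n]*", "", text)
    match = _FIRST_WORD.search(stripped)
    if match is None:
        raise ValueError("empty file: no keyword found")
    keyword = match.group(0)
    if keyword == "module":
        return "modules"
    if keyword == "system":
        return "systems"
    raise ValueError(f"unexpected first keyword: {keyword!r}")


# The two example files from the answer, as in-memory strings.
a_foo = "module A {\n// more stuff here\n}\nmodule B {\n// also more stuff\n}\n"
b_foo = "system X {\nuse module A\nuse module B\n}\n"

print(file_kind(a_foo))
print(file_kind(b_foo))
```

In the real grammar this decision falls out of the `Model: Modules | Systems;` alternative, so no separate dispatch step is needed.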

Related

ANTLR4 target file names

For the TypeScript ANTLR target that Sam and I have been working on, I would like to have the code generation tool create a single typescript file to hold all the classes generated from a named grammar input. Is this output file structure going to be hard?
So for example, I'd like Expr.g4 -> Expr.g4.ts. That one TypeScript file could contain named exports for the {ExprLexer, ExprParser, and ExprListener} classes, visitor code if requested, maybe even some loose factory functions etc.
I've been looking into the source code under tool/src/org/antlr/v4/codegen to find out how the number and names of the output files are determined, and in particular found CodeGenPipeline.java. This class works in conjunction with the language-specific target class, but the pipeline has a lot (perhaps too much) knowledge of possible output files built into it. None of what I see in CodeGenPipeline.java seems well matched to my 1:1 input-to-output file model.
It seems like the knowledge of what files should be generated for a given language target should come from the language.stg file if possible, but I can't find any evidence that approach has been implemented. Can anyone fill me in on any reasons that approach hasn't been tried or wouldn't work?

How to compile these COBOL grammar files?

I'm using the COBOL grammar files from the URL below:
https://github.com/antlr/grammars-v4/tree/master/cobol85
The given source contains 2 grammar files, Cobol85.g4 and Cobol85Preprocessor.g4.
Both work like a charm if I process them separately, like this:
~$ antlr4 -Dlanguage=Python2 Cobol85
and
~$ antlr4 -Dlanguage=Python2 Cobol85Preprocessor
However, I realized that only Cobol85Preprocessor is able to understand comments in COBOL; the Cobol85 grammar file is not. My best thought was that I might need to import both together into a single file.
So, I created another grammar file named Cobol.g4 which contains the code below:
grammar Cobol;
import Cobol85Preprocessor, Cobol85;
and compiled it with the following command:
~$ antlr4 -Dlanguage=Python2 Cobol
The good news is that I found no problem compiling it. The bad news is that it doesn't work as well as the previous method (processing the grammar files separately).
Instead, I received the error message below:
line 1:30 extraneous input '.\r\n ' expecting {<EOF>, ADATA, ADV...
Is there any way to solve this, or should I, by design, handle both separately? Could anyone please help me with this issue?
PS: I'm not sure if this piece of information will be useful. I'm using Antlr 4.7.1 with Listener.
Disclaimer: I am the author of these COBOL ANTLR4 grammar files.
The parser generated from grammar Cobol85.g4 has to be provided with COBOL source code, which has been preprocessed with a COBOL preprocessor. Cobol85Preprocessor.g4 is at the core of this preprocessor and enables parsing of statements such as COPY REPLACE, EXEC SQL etc.
Cobol85Preprocessor.g4 is meant to be augmented with quite extensive additional logic, which is not included in the grammar files and enables normalization of line formats, line breaks, comment lines, comment entries, EXEC SQL, EXEC CICS and so on. This missing code is leading to the problems you are noticing.
The ProLeap COBOL parser written by me implements all of this in Java based on the files Cobol.g4 and Cobol85Preprocessor.g4. However, there is no Python implementation, yet.
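To give a rough idea of the kind of line normalization meant here, below is a minimal Python sketch. It is not the ProLeap logic; it assumes classic fixed-format COBOL (sequence area in columns 1-6, indicator in column 7, code in columns 8-72) and only drops comment lines and blanks out the non-code areas. The function name `normalize_fixed_format` and the sample program are made up for illustration; continuation lines, COPY/REPLACE and EXEC blocks are deliberately ignored.

```python
def normalize_fixed_format(source: str) -> str:
    """Drop comment lines and blank out the sequence/identification areas.

    Assumes fixed-format COBOL: columns 1-6 sequence area, column 7
    indicator ('*' or '/' marks a comment line), columns 8-72 code.
    """
    out = []
    for line in source.splitlines():  # splitlines() also handles \r\n
        indicator = line[6] if len(line) > 6 else " "
        if indicator in "*/":
            continue  # comment line: the parser grammar never sees it
        code = line[7:72]  # keep only the code area (columns 8-72)
        out.append(" " * 7 + code.rstrip())
    return "\n".join(out)


# Hypothetical sample input, not taken from the question.
sample = (
    "000100 IDENTIFICATION DIVISION.\r\n"
    "000200* this is a comment line\r\n"
    "000300 PROGRAM-ID. HELLO.\r\n"
)

print(normalize_fixed_format(sample))
```

The output of such a step, rather than the raw source, is what would then be handed to the parser generated from Cobol85.g4.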

Simplest way to deal with "import" statement in ANTLR4

I’m using ANTLR4 and I have an "import" statement inside my grammar.
Does ANTLR4 have an option to automatically open and parse the imported file, instead of my doing it inside my visitor (creating another parser/lexer and visitor for each "import" declaration)?
I'm "pretty" sure that I've already seen it, but I can't find it anymore.
Inside my grammar :
importStatement : 'import' ID ';' // Here ? an action (Java code)
// to prepend an AST to my current AST ?
Inside an input file:
Import test;
There is no built-in functionality for this, primarily because every language requiring it has its own set of rules for how it needs to be done. In addition, this can quickly make the parse operation for your whole project go from O(n) to O(n²) (i.e. parsing each file once, to parsing up to the whole project for each file).
If your language allows you to build a correct parse tree prior to resolving the imports (e.g. it doesn't have arbitrary #define statements that can appear in imports), then you should be glad you aren't C/C++ and parse each file independently before resolving the import statements.
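A minimal Python sketch of "parse each file once, then resolve imports" follows. Everything in it is an assumption made for illustration: `parse_file` is a regex stand-in for a real ANTLR-generated parser, and sources are held in an in-memory dict instead of being read from disk.

```python
import re

# Matches statements of the form: import name ;
IMPORT_RE = re.compile(r"\bimport\s+([A-Za-z_]\w*)\s*;")

parse_log = []  # records every parse so we can see each file is parsed once


def parse_file(name, text):
    """Stand-in for a real parse: returns the names this file imports."""
    parse_log.append(name)
    return IMPORT_RE.findall(text)


def load_project(sources, entry):
    """Parse entry and everything it transitively imports, each file once."""
    parsed = {}
    pending = [entry]
    while pending:
        name = pending.pop()
        if name in parsed:
            continue  # already parsed: reuse the result, do not parse again
        if name not in sources:
            raise KeyError(f"unresolved import: {name}")
        parsed[name] = parse_file(name, sources[name])
        pending.extend(parsed[name])
    return parsed


# Hypothetical project: main imports a and b, a imports b as well.
sources = {
    "main": "import a; import b;",
    "a": "import b;",
    "b": "",
}
project = load_project(sources, "main")
print(sorted(project))
print(len(parse_log))
```

Because results are cached by file name, a file imported from several places (or in a cycle) is still parsed only once, which keeps the whole operation linear in the number of files.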

When are `include directives not needed in Verilog and SystemVerilog?

Suppose I have a top level file that I pass to my compiler that has:
`include "my_defines.sv"
`include "my_component.sv"
Inside "my_component.sv" file, I am using some defines from "my_defines.sv", like this:
my_variable = `CONSTANT_FROM_MY_DEFINES;
The question is the following: do I need to have `include "my_defines.sv" inside "my_component.sv"? Perhaps this requirement is compiler-specific?
If your "my_defines.sv" has an "include" guard, then it is safe and better to include "my_defines.sv" in all your other files. The "include" guard at the top of "my_defines.sv" will look like this:
`ifndef MY_DEFINES_SV
`define MY_DEFINES_SV
// put your own defines here ...
`endif
`include directives like that work like copying and pasting that file in at the point where the `include appears. The compiler:
Reads the file you give it.
When it encounters an include, it reads that file.
When it has finished that file, it continues with the original file.
The result is that the compiler sees one big flat file.
In your example you can use stuff from my_defines in my_component because it appears earlier.
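The copy-and-paste behaviour can be mimicked in a few lines of Python. This is only a sketch of the textual flattening, not a Verilog preprocessor: the file name "top.sv", the value 42 and the in-memory `files` dict are assumptions, and conditional directives such as `ifndef are not interpreted.

```python
import re

# Matches a line consisting only of: `include "some_file"
INCLUDE_RE = re.compile(r'^\s*`include\s+"([^"]+)"\s*$')


def flatten(name, files, depth=0):
    """Return the lines the compiler effectively sees, with includes pasted in."""
    if depth > 20:
        raise ValueError("include nesting too deep (circular include?)")
    out = []
    for line in files[name].splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            # Paste the included file in place of the `include line.
            out.extend(flatten(match.group(1), files, depth + 1))
        else:
            out.append(line)
    return out


# Hypothetical file contents modelled on the question.
files = {
    "top.sv": '`include "my_defines.sv"\n`include "my_component.sv"\n',
    "my_defines.sv": "`define CONSTANT_FROM_MY_DEFINES 42\n",
    "my_component.sv": "my_variable = `CONSTANT_FROM_MY_DEFINES;\n",
}

flat = flatten("top.sv", files)
print("\n".join(flat))
```

In the flattened result the `define line comes before the line that uses the macro, which is why the component file works without its own `include in this particular setup.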
The problem with doing a lot of this is that eventually you'll end up with conflicts. Maybe two things reference each other (which include comes first?), two things use the same name (clashing definitions), or multiple things have the same include statement (multiple definitions of the same thing).
Packages solve those problems. Once things start getting a little more complex, look into them.
It is dependent upon the order in which your source files are compiled. Because you are referring specifically to `define macros, which are global, it is required that the macro definitions are compiled before the macro is used. In your case, you do not need to include "my_defines.sv" inside "my_component.sv" since "my_defines.sv" was already compiled in your top file.
Macro definitions persist across files, but only to the end of the translation unit. Simulators must support two different methods of assigning source files to translation units, and it's hard to get `include files full of `defines to compile correctly with both methods.
It is better to use parameters or const variables for constants. Since parameters and constants follow normal scoping rules, you can safely include them in every file/scope that needs them. Then it doesn't matter how the code is broken into translation units; it always compiles. I think it is easier to find the definitions when you're browsing the code because the `include is probably in the same file instead of off in some other unrelated file.
You have to put `include "my_defines.sv" in my_component.sv...
Best practice is to put all includes in one package and import that package in each file.

VC++ 2005 project option to include stl?

I'm working on a cross platform project that uses STL. The other compiler includes STL support by default, but in VS2005 I need to add the following before the class definitions that use STL items:
#include <cstdlib>
using namespace std;
Is there a VS2005 option that would set this automatically? It's just a bit tedious to work around. I'm just trying to avoid lots of #ifdefs in the source.
EDIT: The other compiler is the IAR workbench for the ARM 926x family. Perhaps I should get them to explicitly do the includes?
Also - is "std::map<>" preferred over "using namespace std; map<>" ?
All compilers should require you to include those lines. If they don't, then they're just encouraging you to write non-portable code because you're relying on certain headers to be included automatically and you're relying on certain names to be in scope implicitly.
I don't mean to say that those two lines should always be required, though. I only mean that if the rest of your code is written to use things declared in the cstdlib header and in the std namespace, then those two lines need to appear first, and the compiler shouldn't act as though they are there when they really aren't.
Check whether your other compiler has some settings to disable this implicit code. If it doesn't, then it's probably a very, very old compiler, and you should consider not using it and not supporting it anymore.
Try referring to STL components by their namespace-qualified name (i.e. std::vector).
Doing a global 'using namespace std' is usually a bad idea.
Or maybe I'm not understanding the question.
The IAR compiler does not support the std namespace (I'm not sure why, because it does support namespaces in general if I remember right).
If you look in the runtime headers for IAR you'll see that they do some macro gymnastics to work around that (the runtime is licensed from Dinkumware, who provide runtimes for quite a few compilers).
You may need to do something similar if you want your stuff to work in multiple environments. A possible cleaner alternative is to just include the "using namespace std;" directive. I might be wrong, but I think the IAR compiler essentially ignored it (it didn't mind that you were using a namespace it didn't know about). A lot of people will think that's ugly, but sometimes you gotta do what the compiler you have wants you to do.
In general you should avoid "using namespace X", especially in header files (because everyone who includes your header gets that namespace too whether they want it or not), and especially for namespace std (because it's so big and the potential for name collisions is big).
Instead, in header files refer to names by their fully qualified form, e.g.:
#include <map>
#include <string>
// for plain functions (std::map needs both a key and a value type)
void foo(std::map<int, int> intMap);
// for classes
class person {
std::string name_;
public:
person(std::string name);
// ...
};
Then, in code files, you can do "using", but prefer using specific items in the namespace rather than pulling in the entire namespace. e.g.:
using std::map;
using std::string;
void foo(map<int, int> intMap) { ... }
person::person(string name) : name_(name) { ... }
etc. This way you avoid impacting others including your headers, and you avoid pulling in potentially zillions of names that might cause a collision with other stuff.

Resources