How can I parse nested source files with ANTLR4? - antlr4

I've asked this question before (slightly differently) but didn't understand the answers enough at the time to give intelligent feedback (sigh).
I need to be able to include files inside other files at arbitrary points so I need to be able to have a stack of files with a single parse tree.
If I were writing this myself (and I have done this in the past), my parser would recognize the "Include xyz" or "Import abc" and would cause the lexer to suspend reading from the current file, push that file on a stack, and continue reading characters from the new file until it is exhausted.
However, when using ANTLR4 (where so far I've avoided inserting any code into the grammar file itself) and using the visitor pattern, all I see is the created tree which of course is too late.
I've found references to PUSHSTREAM as something that can be done in the lexer but I cannot find an actual example and would really appreciate some help (either a pointer to an actual example that I perhaps missed when searching or a short code sample if someone has one).
Note that I'm writing code in C++, not Java.
Thanks in advance

Years ago I developed a solution for ANTLR 2.7 to parse Windows resource files (*.rc). Such files are structured very much like C/C++ header files and support preprocessor directives like #if/#endif/#pragma/#include.
For that I created a special character input stream (wrapping a nested char input stream) which implements a stack-based approach for include files. Whenever an include directive is found in the char input, a new stack entry is created that holds the current input stream, its position, and its line/column information (so that local source locations can be reported if a parsing problem is found). That entry is pushed onto a stack and a new input stream is created for the included file. Once the new stream is exhausted, the top of the stack (TOS) is popped and serving chars continues from the saved position (just after the #include statement). The lexer only sees one continuous stream of characters.
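The stack-based stream described above can be sketched roughly as follows. This is a minimal illustration in Python, not the original ANTLR 2.7 code and not an ANTLR API: the names `IncludeCharStream`, `Frame`, `next_char` and `read_all` are made up for the example, the directive syntax `#include "name"` is an assumption, and the "files" come from an in-memory dictionary so the sketch is self-contained.

```python
import re
from dataclasses import dataclass

# Assumed directive syntax: #include "name" at the start of a line.
INCLUDE_RE = re.compile(r'#include\s+"([^"]+)"[ \t]*\n?')


@dataclass
class Frame:
    """One open source: its name, its text, and the read position in it."""
    name: str
    text: str
    pos: int = 0
    line: int = 1


class IncludeCharStream:
    """Serves characters from nested sources as one continuous stream."""

    def __init__(self, sources, root):
        self.sources = sources   # hypothetical stand-in for the file system
        self.stack = []          # suspended outer sources
        self.current = Frame(root, sources[root])

    def next_char(self):
        """Return (char, source name, line), or None when all input is used."""
        while True:
            cur = self.current
            if cur.pos >= len(cur.text):
                if not self.stack:
                    return None
                self.current = self.stack.pop()   # resume the outer source
                continue
            at_line_start = cur.pos == 0 or cur.text[cur.pos - 1] == "\n"
            match = INCLUDE_RE.match(cur.text, cur.pos) if at_line_start else None
            if match:
                name = match.group(1)
                open_names = [frame.name for frame in self.stack] + [cur.name]
                if name in open_names:
                    raise ValueError(f"recursive include of {name!r}")
                cur.pos = match.end()             # continue after the directive
                if match.group(0).endswith("\n"):
                    cur.line += 1
                self.stack.append(cur)            # suspend the current source
                self.current = Frame(name, self.sources[name])
                continue
            char = cur.text[cur.pos]
            location = (cur.name, cur.line)
            cur.pos += 1
            if char == "\n":
                cur.line += 1
            return (char, location[0], location[1])


def read_all(stream):
    """Drain the stream into a list of (char, source name, line) tuples."""
    items = []
    while (item := stream.next_char()) is not None:
        items.append(item)
    return items
```

A consumer such as a lexer would call `next_char()` repeatedly; it never sees the directive itself, only the characters of the included source followed by the rest of the outer one, while each character still carries the name and line of the source it came from.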

Related

How to Pipeline File Lines in NodeJS

I am new to programming NodeJS and want to load multi-line sections from a file into MongoDB. I have seen a simple solution that reads the file into memory and parses it into lines. This will fill my need but doesn't seem to be the "NodeJS" way of doing it. If I just wanted to quickly load the file, I would load IntelliJ and do it in Java. I want to learn asynchronous programming and pipelines.
I see that the basic steps are: stream the file in chunks, parse the chunks into lines, group lines into sections, convert sections into JSON, and insert the JSON into MongoDB.
I like the idea of pipelines since I can easily replace parts and reuse others. It also helps in this case since the slowest step is the last and the whole input file could be loaded into memory before the first MongoDB document is written.
I have searched for a good example but they seem to be missing parts and explanations of what I need to modify. I have seen you can easily pipe a file stream in chunks but I need lines. I have seen you can easily stream a file to a line parser but that is not a pipeline.
Any ideas on how to do this or good examples?
Thanks,
Wes.

Can antlr4 be used to parse very large gzip compressed files?

I am trying to parse a very large (10+ GB) gzip-compressed file in Python 3. Instead of creating the parse tree, I used embedded actions based on the suggestions in this answer.
However, looking at the FileStream code it wants to read the entire file and then parse it. This will not work for big files.
So, this is a two-part question.
Can ANTLR4 use a file stream, probably custom, that allows it to read chunks of the file at a time? What should the class interface look like?
Assuming the answer to the above is "yes", would that class need to handle seek operations, which would be a problem if the underlying file is gzip compressed?
Short answer: no, not possible.
Long(er) answer: ANTLR4 can potentially use unlimited lookahead, so it relies on the stream being able to seek to any position without delay; otherwise parsing speed would drop to nearly a halt. For that reason all runtimes use a normal file stream that reads in the entire file at once.
There were discussions/attempts in the past to create a stream that buffers only part of the input, but I haven't heard of anything that actually works.
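Given that the runtimes read the whole input up front, the workable route for a gzip file is to decompress it completely into a string first. The sketch below shows only that step; `read_gzip_text` is a name made up for this example. It only helps when the decompressed text fits in memory, so it does not solve the 10+ GB case from the question.

```python
import gzip


def read_gzip_text(path, encoding="utf-8"):
    """Decompress a gzip file completely and return its contents as text."""
    # "rt" opens the file in text mode, so the bytes are decoded while reading.
    with gzip.open(path, "rt", encoding=encoding) as handle:
        return handle.read()
```

In the Python runtime, the returned string could then be handed to `antlr4.InputStream(text)` in place of a `FileStream`.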

How to find text strings in a .xxx file

I'm working on a program that needs to find a tag in a .xxx file to just tell me if it exists or not in the file. I've been doing quite a bit of troubleshooting but I've realized there are three key things I don't know:
What a .xxx file is
Where to find help on how to work with .xxx files (Google didn't return anything useful)
How to read a string out of a .xxx file
I'm looking for help with these 3 things - specifically the 3rd, but help on the other two would mean I don't have to ask more questions later! I'm not in need of troubleshooting help yet - I'm not too worried about making my code run at this moment. This is more for reference and general knowledge so I don't have to ask 100 more questions about tedious specifics later on.
So, if anyone out there knows anything about these three problems, or has any knowledge on .xxx files, can you help me out?
(If you happen to know the code to do this, I'm writing in C#)
If you're using ReadLines, then it assumes the input is a text file with line endings. If you're trying to use that on a binary file, then it won't necessarily work, and the best you may get is a count of 0 or 1 if there are no line endings in the binary file at all.
In that case you'll have to load the bytes and do a more thorough search through the binary file for instances of your string.
But if you're only wanting to know if a LINE contains at least one instance (as you have written your code above), then it won't work for binary files where you can't guarantee line endings exist.
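The byte-level search suggested above can be sketched like this. It is written in Python rather than the asker's C#, `file_contains` is a name made up for the example, and it assumes the tag is stored in the file as UTF-8 bytes; a binary format may well use another encoding such as UTF-16, in which case the `encoding` argument would need to change.

```python
def file_contains(path, tag, encoding="utf-8", chunk_size=64 * 1024):
    """Return True if the encoded tag occurs anywhere in the file's bytes."""
    needle = tag.encode(encoding)
    if not needle:
        return True                      # an empty tag matches trivially
    overlap = len(needle) - 1            # bytes to carry over between chunks
    tail = b""
    with open(path, "rb") as handle:     # binary mode: no line-ending handling
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False             # end of file, tag not found
            window = tail + chunk
            if needle in window:
                return True
            # Keep the last few bytes so a tag split across two chunks is found.
            tail = window[-overlap:] if overlap else b""
```

Reading in chunks keeps memory use bounded for large files; for small files, reading all bytes at once and doing a single containment check would work equally well.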

LiveCode File Creation

I'm not sure I'm asking this in the right place, but I've been working with LiveCode and I'm curious how the actual .livecode or .rev files get created. They look like some sort of mixed binary and LiveCode format. I've glanced through the source code, but it's not clear to me how the files are constructed.
Note that I'm talking about the project containers, not the standalones.
I'm also not sure that this is the right place to ask. It isn't really a programming question, even though it is related. I think that the stackfile format is binary, but parts appear in clear text because that's what they are. Everything that is unrecognisable can be two things. It can be a definition of a byte range, or it can be the description of the stack, card or control itself. This description can contain user data, including clear text, but also movie data, picture data, a unicode stream, etc. Encrypted stacks appear as binary data.
I would ask this question directly to RunRev...
To find out what happens when the file is saved, you have to look at the C++ functions inside the LiveCode engine that run when the savestack message is sent and handled.
No other way to tell, so you have to ask those familiar with the innards of the engine.

filter lines starting with ; using batch

Hi, I have a script (AutoLISP, for AutoCAD). In this language, comments start with the ; character. Is it possible to write a batch file that filters out all lines starting with ;? I then encrypt the file from LSP to the FAS type, which renders the comments useless (they can't be read once encrypted). However, AutoCAD still encrypts the comment text, which results in a fairly heavy file (double the size it should be). The current method is to delete every comment line by hand, but try doing that a few hundred times. I also need the comments to stay in place in the source to keep a neat record of what's happening, because I work from the unencrypted LISP file itself.
All in all, I also want the encryption because it's my hard work and my right to keep this secure, as it then also means more job security. It also allows me to block some smart-alec, self-proclaimed staff from making edits, and in addition file encryption is recommended for stability reasons by AutoCAD itself.
All in all even if it was because I like to without good reason then that should be valid enough.
I’m looking to achieve this through a batch script, as that is one of the few languages that I feel competent enough in… outside of the AutoCAD frame.
The following will convert a file named "source.lsp" and produce "noComment.lsp". It will strip out lines that start with a ; (including comment lines indented with spaces).
findstr /rvc:"^ *;" "source.lsp" >"noComment.lsp"
