ANTLR4: switch the lexer mode programmatically

I would like to decide programmatically which mode the lexer should switch to. Say I have a grammar that parses something like PHP code:
foo bar <?php ... ?>
In my case, however, it will support more than one language, say:
foo bar <?php ... ?>
baz <?cpp ... ?>
blah <?java ... ?>
The problem is that I would like one of the languages to be the default, but which one can only be determined programmatically, say from the file extension. For example, in a file named something.cpp-tmpl, the construct
<? ... ?>
should switch to cpp mode, while the same construct in a file named something.java-tmpl should switch to java mode.
Can I do that, and if so, how?

The short answer is yes. Look at org.antlr.v4.runtime.Lexer for the full set of methods for managing lexer modes programmatically, such as mode(int), pushMode(int), and popMode(). These are the methods that the mode-control commands in ANTLR lexer grammars are built on.
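To make the idea concrete, here is a minimal sketch in plain Python. It is not ANTLR code: it is a toy island splitter that only shows how a default mode could be chosen from the file extension and applied to a bare <? ... ?> block. The names DEFAULT_MODE_BY_EXTENSION, default_mode_for, and split_islands are made up for this illustration, and the .php-tmpl extension is an assumption added alongside the two from the question.

```python
import os

# Hypothetical mapping from template extension to default island language.
DEFAULT_MODE_BY_EXTENSION = {
    ".cpp-tmpl": "cpp",
    ".java-tmpl": "java",
    ".php-tmpl": "php",  # assumed, not from the question
}
KNOWN_MODES = set(DEFAULT_MODE_BY_EXTENSION.values())


def default_mode_for(filename):
    """Pick the mode a bare <? ... ?> island should switch to.

    Raises KeyError for an extension that is not in the mapping.
    """
    _, ext = os.path.splitext(filename)
    return DEFAULT_MODE_BY_EXTENSION[ext]


def split_islands(text, default_mode):
    """Return (mode, chunk) pairs; mode 'text' is everything outside islands."""
    result = []
    pos = 0
    while pos < len(text):
        start = text.find("<?", pos)
        if start == -1:
            result.append(("text", text[pos:]))
            break
        if start > pos:
            result.append(("text", text[pos:start]))
        end = text.find("?>", start + 2)
        if end == -1:
            raise ValueError("unterminated <? island")
        body = text[start + 2:end]
        # A language name directly after <? overrides the default mode.
        name_len = 0
        while name_len < len(body) and body[name_len].isalpha():
            name_len += 1
        name = body[:name_len]
        if name in KNOWN_MODES:
            mode, body = name, body[name_len:]
        else:
            mode = default_mode
        result.append((mode, body.strip()))
        pos = end + 2
    return result


islands = split_islands(
    "foo <?php echo 1; ?> bar <? int x; ?>",
    default_mode_for("something.cpp-tmpl"),
)
```

In a real ANTLR setup the equivalent decision would presumably be made once, when the lexer is constructed, and then applied through the mode methods of the runtime's Lexer class. How that is wired into a particular grammar is left open here.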

Related

Switch Case Conditional output in LaTeX

I want to write a report which has a structure like this:
\begin{document}
\input[option=a]{class}
\input[option=b]{class}
\input[option=c]{class}
\input[option=d]{class}
\end{document}
class.tex has content like this:
here are some shared content
switch(option)
case a
some text a
case b
some text b
case c
some text c
case d
some text d
endswitch
Here maybe more shared content.
Is there any way to do this in LaTeX?

A simpler way of doing this is with TeX's \if ... \else ... \fi conditionals.
At the top of the .tex file, set up a switch with
\newif\ifswitch
The default value is false. To set it to true, use
\switchtrue
Then in the text of the document use
\ifswitch
<<text to include if switch is true>>
\else
<<text to include if switch is false>>
\fi % ends the if statement
So for your particular question you could have a set of switches:
\newif\ifConditionA
\newif\ifConditionB
\newif\ifConditionC
\newif\ifConditionD
This is not as elegant as a switch statement, but it allows combinations, for example including the text from A and C at the same time.
This is discussed further in the question
"two versions of a document with 'if else' logic statements".

You can use the following (crude) method of identifying the textual components between which you want to extract stuff from a file:
\documentclass{article}
\usepackage{filecontents}
\begin{filecontents*}{class.tex}
switch(option)
case a
some text a
case b
some text b
case c
some text c
case d
some text d
endswitch
\end{filecontents*}
\usepackage{catchfile}
% \inputclass{<from>}{<to>} (the file name class.tex is hard-coded below)
\newcommand{\inputclass}[2]{%
\CatchFileDef{\class}{class.tex}{}%
\long\def\classsegment##1#1 ##2 #2##3\relax{##2}%
% \show\classsegment % uncomment to inspect the generated macro while debugging
\expandafter\classsegment\class\relax
}
\begin{document}
\inputclass{case c}{case d}
\inputclass{case a}{case b}
\inputclass{case d}{endswitch}
\inputclass{case b}{case c}
\end{document}
Related:
How to extract information between two unique words in a large text file
How to extract data between two different xml tags
\input only part of a file
The last one is a more adaptable approach using the catchfilebetweentags package. This requires the insertion of appropriate tags within your code, which might not be as helpful. You could also use listings to include specific lines of code from an external file.

As I understand it, what you want is to update the function for each different part of the text, while defining the function in just one place.
The easy way to do this is to renew a variable command at the start of each section.
At start:
\newcommand{\VARIABLENAME}{VARIABLE_1}
At section:
\renewcommand{\VARIABLENAME}{VARIABLE_2}
There are more advanced ways of doing this as well, involving defining variables, but for what it is worth, this one is more readable and simpler to implement.
Note: If you are planning to make something more dynamic than just a class, I recommend using another language such as Python to generate the LaTeX file, as that usually gives a lot more room for modification.

ANTLR get first production

I'm using ANTLR4 and, in particular, the C grammar available in their repo. The grammar does not seem to declare an initial rule, so I was wondering how to find it. Once the parser is initialized, I attach my listener, but I get syntax errors because I'm trying to parse two files with different kinds of code:
int a;
int foo() { return 0; }
In my example I call the parser with parser.primaryExpression(), which is the first production in the .g4 file. Is it possible to avoid naming the first production and have ANTLR pick it automatically instead?

In addition to @GRosenberg's answer:
The rule enum (in the generated parser) also contains an entry for each rule in the order they appear in the grammar, and the first rule has the value 0. However, just because a rule comes first in the grammar doesn't mean that it is the main entry point. Only the grammar author knows what the real entry point is, and sometimes you might even want to parse with just a subrule, which makes this decision even harder.

ANTLR provides no API to obtain the first rule. However, in the generated parser, the field
public static final String[] ruleNames = ....;
lists the rule names in the order in which they occur in the grammar. With reflection, you can access this field.
Beware: nothing in the ANTLR 'spec' defines this ordering. It has simply been true to date.

What is the usage of Nested comments in some programming languages?

Why do some programming languages, such as MATLAB, support nested comments? I would like to know how this kind of comment is used in a program and what advantages nested comments give us.

The answer is that nested comments allow you to comment out code that itself contains comments.
For example, C++ has block comments delimited by /* ... */ that can span multiple lines, and line comments introduced by //.
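As a rough illustration of why nesting matters when commenting out code, here is a small Python sketch of a comment stripper for /* ... */ comments. The function name strip_comments and the nested flag are made up for this example. With nesting, a depth counter tracks how many comments are open. Without it, the first */ closes the comment, so the tail of the disabled code leaks back into the program.

```python
def strip_comments(source, nested):
    """Remove /* ... */ comments; honor nesting only when nested is True."""
    out = []
    depth = 0  # number of currently open comments
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/*" and (depth == 0 or nested):
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep characters only when we are outside every comment.
            if depth == 0:
                out.append(source[i])
            i += 1
    return "".join(out)


code = "a /* off: b /* note */ c */ d"
with_nesting = strip_comments(code, nested=True)
without_nesting = strip_comments(code, nested=False)
```

With nesting the whole disabled region disappears. Without nesting, the text after the inner comment (here " c */") survives, which in a real compiler would typically be a syntax error.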

Usually, coding standards for a particular project or program have rules about which comment style to use when; a common convention is to use block comments (/* */) for method and class documentation, and inline comments (//) for remarks inside method bodies and such, e.g.:
/**
* Helper class to store Foo objects inside a bar.
*/
public class Foobar {
/**
* Stores a Foo in this Foobar's bar, unless the bar already contains
* an equivalent Foo.
* Returns the number of Foos added (always 0 or 1).
*/
public int storeFoo(Foo foo) {
// Don't add a foo we already have!
if (this.bar.contains(foo)) {
return 0;
}
// OK, we don't have this foo yet, so we'll add it.
this.bar.append(foo);
return 1;
}
}
If someone wants to temporarily disable entire methods or classes in a program like the one above, it is very helpful if the language allows nested comments.

You can use comments...:
to temporarily disable some lines of code.
as titles for sections.
to comment each line.
to add some notations or comments on other comments.
to send macro orders.
And you can mix all of them. That's why we need different ways to mark comments and create nested comments.

Good old Turbo Pascal aka Borland Pascal allows multi-line comments either with curly braces { } or with parenthesis star (* *), which nest independently of one another even though multi-line comments in the same style do not nest.
A good workaround from my old workplace was to use the typical brace { } comments for all informational comments and to reserve the less common parenthesis star (* *) for commenting out code. Marking the middle lines of commented-out code with something like ** is still a decent idea, and macros can be used to achieve this in programmer editors:
function ComputeCost(var x : longint);
{ Wide version: Apply discounts to raw price.}
(* CODE GRAVEYARD!
** function ComputeCost(var x : integer);
** {Apply discounts to raw price.}
*)
Minimalists will always discount the need for nested comments by saying that C style languages allow constructs like #ifdef SOMETHING or the elegantly short #if 0 to disable code. True minimalists want old code removed completely and say version control takes the place of keeping old code. A good counter is that commented out code together with programmer editors with folding support, e.g. Vim, allows visually stepping over dead code while keeping it for reference.

I feel that nested comments are not necessary. In general, a comment is ignored by the compiler, so its main purpose is to tell the programmer what he has done, or to help a new programmer follow the flow of the program. Why nest comments unnecessarily? The same indication can be given without nesting, e.g.:
for(;;)
{
if()
{
}
}/* a loop with an if condition*/
need not be written as
/*a loop/*if condition*/for n times*/

Escaped ampersand in JSF i18n Resource Bundle

I have something like
<s:link view="/member/index.xhtml" value="My&#160;News" propagation="none"/>
<s:link view="/member/index.xhtml" value="#{msg.myText}" propagation="none"/>
where the value of myText in messages.properties is
myText=My&#160;News
The first line of the example works fine and renders the text as "My News", but the second, which takes its value from the resource bundle, escapes the ampersand as well and shows "My&#160;News" literally.
I also tried Unicode escape sequences for the ampersand and/or the hash, with My\u0026\u0023160;News, My\u0026#160;News and My\u0026nbsp;News in the properties file, without success.
(I have since used CSS no-wrap instead of the XML encoding, but I would be interested in an answer anyway.)

EDIT - Answer to clarified question
The first value is obviously inline, so the interpreter knows that it is safe.
The second one comes from an external source (you are using Expression Language), so it is not considered safe and needs to be escaped. The result of the escaping is what you described: the page shows the literal text of the HTML entity.
This is related to security (XSS, for example) and not necessarily to i18n.
Previous attempt
I don't quite know what you are asking for, but I believe it is "how do I display it?".
Most of the standard JSF controls have an escape attribute that, if set to false, stops the text from being escaped. Unfortunately, it seems that you are using something like SeamTools, which does not have this attribute.
In that case there is not much to be done. Unless you can use a standard control, you could try saving your properties file as Unicode (UTF-16 big-endian, in fact) and simply putting in a valid Unicode non-breaking space character. Theoretically that should work; Unicode-encoded properties files are supported in the latest version of Java (although I cannot recall whether that was Java SE 5 or Java SE 6)...

lexer/parser ambiguity

How does a lexer solve this ambiguity?
/*/*/
How is it that it doesn't just say, oh yeah, that's the beginning of a multi-line comment, followed by another multi-line comment?
Wouldn't a greedy lexer just return the following tokens?
/*
/*
/
I'm in the midst of writing a shift-reduce parser for CSS, and yet this simple comment thing is in my way. You can read this question if you want some more background information.
UPDATE
Sorry for leaving this out in the first place. I'm planning to add extensions to the CSS language in the form /* # func ( args, ... ) */, but I don't want to confuse an editor that understands CSS but not this extension comment of mine. That's why the lexer can't just ignore comments.

One way to do it is for the lexer to enter a different internal state on encountering the first /*. For example, flex calls these "start conditions" (matching C-style comments is one of the examples on that page).
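A minimal hand-written sketch of that idea in Python (this is not flex; the state names INITIAL and COMMENT and the token names are chosen for the illustration only): once the lexer has seen /* it switches state, and in that state the only thing it looks for is */.

```python
def lex(source):
    """Tokenize with two lexer states: INITIAL and COMMENT."""
    tokens = []
    state = "INITIAL"
    start = 0  # index where the current comment began
    i = 0
    while i < len(source):
        if state == "INITIAL":
            if source.startswith("/*", i):
                state, start = "COMMENT", i
                i += 2  # both characters are consumed here
            else:
                tokens.append(("CHAR", source[i]))
                i += 1
        else:  # COMMENT state: only */ is meaningful
            if source.startswith("*/", i):
                i += 2
                tokens.append(("COMMENT", source[start:i]))
                state = "INITIAL"
            else:
                i += 1
    if state == "COMMENT":
        raise ValueError("unterminated comment")
    return tokens


tokens = lex("/*/*/")
```

Because the opening /* consumes its asterisk, that asterisk cannot also take part in a closing */, and the second /* is never looked for while in the COMMENT state. The whole input /*/*/ therefore comes out as one comment token.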

The simplest way would probably be to lex the comment as one single token - that is, don't emit a "START COMMENT" token, but instead continue reading in input until you can emit a "COMMENT BLOCK" token that includes the entire /*(anything)*/ bit.
Since comments are not relevant to the actual parsing of executable code, it's fine for them to basically be stripped out by the lexer (or at least, clumped into a single token). You don't care about token matches within a comment.

In most languages, this is not ambiguous: the first slash and asterisk are consumed to produce the "start of multi-line comment" token. It is followed by a slash, which is plain "content" within the comment, and finally the last two characters form the "end of multi-line comment" token.
Since the first two characters are consumed, the first asterisk cannot also be used to produce an end-of-comment token. I just noticed that the input could produce a second "start of comment" token... oops, that could be a problem, depending on how much context is available to the parser.
I speak here of tokens, assuming parser-level handling of the comments. But the same applies to a lexer, where the underlying rule is to start at '/*' and then not stop until '*/' is found. In effect, lexer-level handling of the whole comment wouldn't be confused by the second "start of comment".

Since CSS does not support nested comments, your example would typically parse into a single token, COMMENT.
That is, the lexer would see /* as a start-comment marker and then consume everything up to and including a */ sequence.

Use the regexp's algorithm: when you reach a closing */, search backward from the current location toward the beginning of the string for the matching /*.
if (chars[currentLocation] == '/' && chars[currentLocation - 1] == '*') {
for (int i = currentLocation - 2; i >= 0; i --) {
if (chars[i] == '/' && chars[i + 1] == '*') {
// .......
}
}
}
It's like applying the regexp /\*([^*]|\*[^/])*\*/ greedily and bottom-up.

One way to solve this would be to have your lexer return:
/
*
/
*
/
And have your parser deal with it from there. That's what I'd probably do for most programming languages, as the /'s and *'s can also be used for multiplication and other such things, which are all too complicated for the lexer to worry about. The lexer should really just be returning elementary symbols.
If what a token is starts to depend too much on context, what you're looking for may very well be a simpler token.
That being said, CSS is not a programming language, so /'s and *'s can't be overloaded. As far as I know, they can't be used for anything other than comments. So I'd be very tempted to just pass the whole thing as a single comment token unless you have a good reason not to: /\*.*\*/ (made non-greedy, as /\*.*?\*/, so that each comment ends at the first */).
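A quick check with Python's re module shows why the quantifier in such a pattern should be non-greedy. The sample string and variable names are made up for the illustration; re.DOTALL is added so that comments spanning several lines would match as well.

```python
import re

css = "a { } /* one */ b { } /* two */"

# Greedy: .* runs to the LAST */ and swallows the rule between the comments.
greedy = re.findall(r"/\*.*\*/", css, flags=re.DOTALL)

# Non-greedy: .*? stops at the FIRST */ after each /*.
lazy = re.findall(r"/\*.*?\*/", css, flags=re.DOTALL)

# The input from the question is matched as one comment.
tricky = re.findall(r"/\*.*?\*/", "/*/*/", flags=re.DOTALL)
```

Here greedy holds one long match covering both comments and the rule between them, while lazy holds the two comments separately.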
