How to define tokens that can appear in multiple lexical modes in ANTLR4?

I am learning ANTLR4 and was trying to play with lexical modes. How can I have the same token appear in multiple lexical modes? As a very simple example, say my grammar has two modes and I want to match white space and end-of-lines in both of them. How can I do that without ending up with, for example, WS_MODE1 and WS_MODE2? Is there a way to reuse the same definition in both cases? I am hoping to get WS tokens in the output stream for all white space irrespective of the mode. The same applies to EOL and other tokens that can appear in both modes.

The rules have to have different names, but you can use the -> type(...) lexer command to give them the same type.
WS : [ \t]+;
mode Mode1;
Mode1_WS : WS -> type(WS);
mode Mode2;
Mode2_WS : WS -> type(WS);
Even though Mode1_WS and Mode2_WS are not fragment rules, the code generator will see the type command and know that you reassigned their types, so it will not define tokens for them.
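To see this in context, here is a minimal self-contained lexer grammar along the same lines (the mode name Inner, the '<<' and '>>' trigger tokens, and the ID rules are invented for illustration):

```antlr
lexer grammar MultiModeWS;

WS    : [ \t\r\n]+ ;
ENTER : '<<' -> pushMode(Inner) ;    // hypothetical mode trigger
ID    : [a-z]+ ;

mode Inner;

Inner_WS   : WS -> type(WS) ;        // emitted as a WS token, not Inner_WS
Inner_EXIT : '>>' -> popMode ;
Inner_ID   : [A-Z]+ ;
```

With this, the token stream reports every run of white space as a WS token regardless of which mode the lexer was in when it matched.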

Related

GNU M4: Define a rule that matches text, and operates on that matched text?

Suppose I have:
File:
[x]
And I would like to define m4 macro:
define(`\[.*\]`, ...)
Question: Is this possible and how does one do it?
It isn't possible, as you can see in the m4 manual:
3.1 Macro names
A name is any sequence of letters, digits, and the character ‘_’
(underscore), where the first character is not a digit. m4 will use
the longest such sequence found in the input. If a name has a macro
definition, it will be subject to macro expansion (see Macros). Names
are case-sensitive.
Examples of legal names are: ‘foo’, ‘_tmp’, and ‘name01’.
The [ and ] characters aren't legal in a macro name.
If you're feeling adventurous, maybe you could take a look at this experimental feature mentioned in GNU m4 1.4.18's info page:
An experimental feature, which would improve 'm4' usefulness, allows
for changing the syntax for what is a "word" in 'm4'. You should use:
./configure --enable-changeword
if you want this feature compiled in. The current implementation slows
down 'm4' considerably and is hardly acceptable. In the future, 'm4'
2.0 will come with a different set of new features that provide similar
capabilities, but without the inefficiencies, so changeword will go away
and _you should not count on it_.

Define a syntax region which depends on the indentation level

I'm trying to build a lighter syntax file for reStructuredText in Vim. In rst, literal blocks start when "::" is encountered at the end of a line:
I'll show you some code::
if foo = bar then
do_something()
end
Literal blocks end when indentation level is lowered.
But, literal blocks can be inside other structures that are indented but not literal:
.. important::
Some code for you inside this ".. important" directive::
Code comes here
Back to normal text, but it is indented with respect to ".. important".
So, the problem is: how to make a region that detects the indentation? I did that with the following rule:
syn region rstLiteralBlock start=/^\%(\.\.\)\@!\z(\s*\).*::$/ms=e-1 skip=/^$/ end=/^\z1\S/me=e-1
It works pretty well but has a problem: any match or region that appears in a line that should be matched by "start" takes over the syntax rules. Example:
Foo `this is a link and should be colored`_. Code comes here::
It will not make my rule work, because there is a "link" rule that takes over the situation. This is because of the ms and me matching parameters, but I cannot take them off because that would just color the whole line.
Any help on that?
Thanks!
By matching the text before the :: as the region's start, you're indeed preventing other syntax rules from applying there. I would solve this with positive lookbehind; i.e. only assert the rules for the text before the ::, without including it in the match. With this, you don't even need the ms=e-1, as the only thing that gets matched for the region start is the :: itself:
syn region rstLiteralBlock start=/\%(^\%(\.\.\)\@!\z(\s*\).*\)\@<=::$/ skip=/^$/ end=/^\z1\S/me=e-1
The indentation will still be captured by the \z(...\).

Where are line breaks allowed within Haskell expressions?

Background
Most style guides recommend keeping line lengths to 79 characters or less. In Haskell, indentation rules mean that expressions frequently need to be broken up with new lines.
Questions:
Within expressions, where is it legal to place a new line?
Is this documented somewhere?
Extended question: I see GHC formatting my code when it reports an error, so someone has figured out how to automate the process of breaking long lines. Is there a utility that I can put Haskell code into and have it spit that code back nicely formatted?
You can place a newline anywhere between lexical tokens of an expression. However, there are constraints about how much indentation may follow the newline. The easy rule of thumb is to indent the next line to start to the right of the line containing the expression. Beyond that, some style things:
If you are indenting an expression that appears in a definition name = expression, it's good style to indent to the right of the = sign.
If you are indenting an expression that appears on the right-hand side of a do binding or a list comprehension, it's good style to indent to the right of the <- sign.
The authoritative documentation is probably the Haskell 98 Report (Chapter 2 on lexical structure), but personally I don't find this material very easy to read.
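To make the rule of thumb concrete, here is a small sketch (the name total and the numbers are invented): every continuation line just needs to be indented further right than the column where the definition starts.

```haskell
total :: Int
total =
    sum [1, 2, 3]   -- a newline is fine right after '=' ...
      + product     -- ... and anywhere between tokens,
          [4, 5]    -- as long as each line stays right of column 1

main :: IO ()
main = print total  -- prints 26
```

Because the top-level layout context opens at column 1, any line indented past column 1 continues the current declaration; outdenting back to column 1 would start a new one.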

How to enable syntax conceal only in certain contexts in Vim?

I want to conceal variables with names based on Greek symbols and turn them into their Unicode equivalent symbol, similarly to how vim-cute-python works. For instance, I have this:
syntax match scalaNiceKeyword "alpha" conceal cchar=α
defined in a file for concealing within Scala files which works great except that it's overly aggressive. If I write alphabet it then gets concealed to become αbet, which is noticeably wrong.
How can I modify or expand this conceal statement so that it only conceals keywords that match [ _]alpha[ _]? In other words, I want the following conversions:
alpha_1 => α_1
alpha => α
alphabet => alphabet
Note: This is similar to this question, however it seems like it's slightly more complicated since the group environment I want to match is spaces and underscores. Naively defining a syntax region like the following makes things all kinds of wrong:
syn region scalaGreekGroup start="[ _]" end="[ _]"
Thanks in advance!
Modify the pattern to match only the names delimited by word
boundaries or underscores:
:syntax match scalaNiceKeyword '\(_\|\<\)\zsalpha\ze\(\>\|_\)' conceal cchar=α
There's a script called unilatex.vim which defines imaps to do \alpha => α while editing, with back-conversion on saving. I am using it for LaTeX code and have modified it to drop the back-conversion, as my LaTeX compiler can handle Unicode correctly.
I don't know if Scala source code can be Unicode, but if it can be, you might have a look at my version.

Treat macro arguments in Common Lisp as (case-sensitive) strings

(This is one of those things that seems like it should be so simple that I imagine there may be a better approach altogether)
I'm trying to define a macro (for CLISP) that accepts a variable number of arguments as symbols (which are then converted to case-sensitive strings).
(defmacro symbols-to-words (&body body)
`(join-words (mapcar #'symbol-name '(,@body))))
converts the symbols to uppercase strings, whereas
(defmacro symbols-to-words (&body body)
`(join-words (mapcar #'symbol-name '(|,@body|))))
treats ,@body as a single symbol, with no expansion.
Any ideas? I'm thinking there's probably a much easier way altogether.
The symbol names are uppercased during the reader step, which occurs before macro expansion, so there is nothing you can do with macros to affect that. You can globally set READTABLE-CASE, but that will affect all code; in particular, you will have to write all standard symbols in uppercase in your source. There is also a '-modern' option for CLISP, which provides lowercased versions of the standard library names and sets the reader to be case-preserving, but it is itself non-standard. I have never used it myself, so I am not sure what caveats actually apply.
The other way to control the reader is through reader macros. Common Lisp already has a reader macro implementing a syntax for case-sensitive strings: the double quote. It is hard to offer more advice without knowing why you are not just using it.
As Ramarren correctly says, the case of symbols is determined during read time. Not at macro expansion time.
Common Lisp has a syntax for specifying symbols without changing the case:
|This is a symbol| - using the vertical bar as multiple escape character.
and there is also a backslash - a single escape character:
CL-USER > 'foo\bar
|FOObAR|
Other options are:
using a different global readtable case
using a read macro which reads and preserves case
using a read macro which uses its own reader
Also note that a syntax for something like |,@body| (where body is spliced in) does not exist in Common Lisp. Splicing only works for lists, not symbol names. |, the vertical bar, surrounds character elements of a symbol. The explanation in the Common Lisp Hyperspec is a bit cryptic: Multiple Escape Characters.
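As a small sketch of the expansion-time view (using &rest and list in place of the question's &body and join-words), note that whatever case the macro sees was already fixed by the reader; only escaped symbols in the source keep their case:

```lisp
;; The reader uppercases unescaped symbol names before any macro runs,
;; so case must be protected with vertical bars in the source itself.
(defmacro symbols-to-words (&rest symbols)
  ;; symbol-name is applied at expansion time; the resulting strings
  ;; are spliced into the expansion as literals.
  `(list ,@(mapcar #'symbol-name symbols)))

(symbols-to-words foo |Foo| |fOo|)
;; => ("FOO" "Foo" "fOo")
```

The first argument was written unescaped and therefore reaches the macro already uppercased; the escaped ones survive verbatim.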
