GNU M4: Define a rule that matches text, and operates on that matched text? - gnu

Suppose I have:
File:
[x]
And I would like to define m4 macro:
define(`\[.*\]`, ...)
Question: Is this possible and how does one do it?

It isn't possible, as the m4 manual explains:
3.1 Macro names
A name is any sequence of letters, digits, and the character ‘_’
(underscore), where the first character is not a digit. m4 will use
the longest such sequence found in the input. If a name has a macro
definition, it will be subject to macro expansion (see Macros). Names
are case-sensitive.
Examples of legal names are: ‘foo’, ‘_tmp’, and ‘name01’.
The [ and ] characters aren't legal in a macro name, so a pattern like \[.*\] cannot be defined as a macro.
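The naming rule quoted from the manual can be checked mechanically. Here is a minimal Python sketch of that rule, assuming default m4 settings (no changeword); the helper name is_legal_m4_name is made up for illustration and is not part of m4:

```python
import re

# Default m4 rule from the manual: letters, digits and '_',
# where the first character is not a digit.
M4_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_legal_m4_name(text):
    """Return True if the whole string is a legal default m4 macro name."""
    # is_legal_m4_name is a hypothetical helper, not an m4 API.
    return M4_NAME.fullmatch(text) is not None

for candidate in ["foo", "_tmp", "name01", "[x]", r"\[.*\]", "1abc"]:
    print(candidate, is_legal_m4_name(candidate))
```

The three examples from the manual pass, while [x] and the regex-like name from the question do not.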

If you're feeling adventurous, you could take a look at this experimental feature mentioned in GNU m4 1.4.18's info page:
An experimental feature, which would improve 'm4' usefulness, allows
for changing the syntax for what is a "word" in 'm4'. You should use:
./configure --enable-changeword
if you want this feature compiled in. The current implementation slows
down 'm4' considerably and is hardly acceptable. In the future, 'm4'
2.0 will come with a different set of new features that provide similar
capabilities, but without the inefficiencies, so changeword will go away
and _you should not count on it_.

Related

Is there a one-liner to tell vim/ctags autocompletion to search from the middle of a word?

In vim (in Insert mode, after running exuberant ctags), I am using ctrl-x followed by ctrl-] to bring up a dropdown of various possible words/tokens. It's a great feature.
The problem is that by default, this list starts with a bunch of numeric options and automatically inserts the first numeric option, and if I backspace to get rid of the numbers and start typing a part of a word fresh -- with the idea of searching from the middle of the word -- the autocompletion behavior exits entirely.
I know I could type the first letter of the word that I want, then go from there. But that assumes that I know the first letter of the word, which is not necessarily a given.
For example, if I'm working on a pair-programming project with a friend during a long weekend, I might not remember at any given moment whether he called his method promoteRecordStatus(), updateRecordStatus() or boostRecordStatus(). In this example, I would like to type RecordStatus and get the relevant result, which does not seem to be possible at a glance with the current behavior.
So with that scenario in mind: Is there a simple, vim-native way to tell the editor to start its autocompletion without any assumptions, then search all available tokens for my typed string in all parts of each token?
I will of course consider plugin suggestions helpful, but I would prefer a short, vim-native answer that doesn't require any plugins if possible. Ideally, the configuration could be set using just a line or two.
The built-in completions all require a match at the starting position. In some cases, you could drop separator characters from the 'iskeyword' option (e.g. in Vimscript, drop # to be able to complete individual components from foo#bar#BazFunction()), but this won't work for camelCaseWords at all.
Custom :help complete-functions can implement any completion search, though. To be based on the tags database, it would have to use taglist() as a source, and filter according to the completion base entered before triggering the completion. If you do not anchor this pattern match at the beginning, you have your desired completion.
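The filtering step is the core of such a completion function. As a language-neutral illustration (in Python rather than Vimscript; filter_tags and the sample tag names are hypothetical, and a real implementation would get its names from taglist()), compare an anchored match with an unanchored one:

```python
def filter_tags(tag_names, base, anchored=False):
    """Return the tag names that match the completion base.

    anchored=True only accepts names that start with `base`;
    anchored=False accepts names containing `base` anywhere.
    """
    if anchored:
        return [name for name in tag_names if name.startswith(base)]
    return [name for name in tag_names if base in name]

# Hypothetical tag names, taken from the question's example.
tags = ["promoteRecordStatus", "updateRecordStatus",
        "boostRecordStatus", "recordCount"]

print(filter_tags(tags, "RecordStatus", anchored=True))
print(filter_tags(tags, "RecordStatus"))
```

With the anchored match nothing is found for RecordStatus; the unanchored match returns all three methods from the question.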

Vim: Substitute only in syntax-selected text areas

The exact problem: I have a C++ source file and I need to rename a symbol. However, the replacement must affect the symbol only, not an identical-looking word in comments or in text inside "".
The information about which language section a piece of text belongs to is already defined well enough in the syntax highlighting rules. I know they can fail sometimes, but let's assume this isn't a problem. I need some way to walk through all occurrences of the phrase and check which section each one is in; if it's in text or a comment, that occurrence should be skipped. Otherwise the replacement should be done either immediately or after asking first, depending on the well-known c flag.
What I imagine would be at least theoretically possible is:
Having a kinda "callback" when doing substitution (called for each phrase found, and requesting the answer whether to substitute or not), or extract the list of positions where the phrase has been found, then iterate through all of them
Extract the name of the current "hi-linked" syntax highlighting rule, which is used to color the text at given position
Is it at all possible within the current features of vim?
Yes, with a :help sub-replace-expression, you can evaluate arbitrary expressions in the replacement part of :substitute. Vim's synID() and synstack() functions allow you to get the current syntax element.
Luc Hermitte has an implementation that omits replacement inside strings, here. You can easily adapt this to your use case.
With the help of my ingo-library plugin, you can define a short predicate function, e.g. matching comments and constants (strings, numbers, etc.):
" Predicate: true when the cursor is on a Comment or Constant syntax item.
function! CommentOrConstant()
    return ingo#syntaxitem#IsOnSyntax(getpos('.'), '^\%(Comment\|Constant\)$')
endfunction
My PatternsOnText plugin now provides a :SubstituteIf command that works like :substitute, but also takes a predicate expression. With that, it's very easy to do a replacement anywhere except in comments or constants:
:%SubstituteIf/pattern/replacement/g !CommentOrConstant()
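The "callback per match" idea from the question can be sketched outside of Vim, too. The following Python sketch is only an illustration of the technique, not how the plugins above are implemented: substitute_if is a hypothetical helper, and the SKIP regex is a crude stand-in for Vim's syntax groups that recognizes only C++ comments and double-quoted strings.

```python
import re

# ASSUMPTION: a crude approximation of "comment or string" regions.
# Real syntax highlighting is far more precise than this regex.
SKIP = re.compile(r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\\n])*"', re.DOTALL)

def substitute_if(text, pattern, replacement, skip_spans):
    """Replace matches of `pattern` unless they start inside a skipped span."""
    def decide(match):
        pos = match.start()
        inside = any(start <= pos < end for start, end in skip_spans)
        # Returning the matched text unchanged means "skip this occurrence".
        return match.group(0) if inside else replacement
    return re.sub(pattern, decide, text)

source = 'int count = 0; // count items\nputs("count"); count++;\n'
spans = [m.span() for m in SKIP.finditer(source)]
print(substitute_if(source, r"\bcount\b", "total", spans))
```

Only the two occurrences of count in code are renamed; the ones in the comment and in the string literal are left alone.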

syntax highlighting for Assembler

I need to add support for assembler language I'm working with (it is not x86, 68K, or 8051 which are well supported by vim). I looked at the existing syntax files, and here are my questions
1) When does it really make sense to use syn keyword and when syn match? My understanding is that the latter supports regex and gives more flexibility. On the other hand, looking at /usr/share/vim/vim70/syntax/asmh8300.vim, they define opcodes in both keyword and match; what benefit does that really give?
2) Instructions in my Asm have a common format:
INSTR OP1, OP2 ..; i.e. space delimits the instruction name from operands.
I think for this I'm OK with only defining all Asm commands in 'keyword', since the space symbol is by default in 'iskeyword'. Am I right?
3) The Asm also supports C-style structures, enums and comments. Can I just borrow its syntax definition from c.vim or it won't work and requires some tweaking?
If you have a limited set of identifiers, and if they all consist solely of keyword characters, then :syn keyword is the best choice. You're right in that :syn match provides a superset of functionality. Essentially,
:syn keyword myGroup foobar
is equivalent to
:syn match myGroup "\<foobar\>"
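To see what the \<...\> word boundaries buy you, here is a small Python sketch. Python's \b is only a rough analogue of Vim's \< and \> (Vim decides what a word character is from 'iskeyword'), and the mnemonic mov and the sample line are made up for illustration:

```python
import re

def find_starts(pattern, line):
    """Return the start offsets of all matches of `pattern` in `line`."""
    return [m.start() for m in re.finditer(pattern, line)]

# Hypothetical assembler line; "mov" also occurs inside other words.
line = "mov r1, r2 ; remove movx"

bare = find_starts(r"mov", line)         # like a match without \<...\>
bounded = find_starts(r"\bmov\b", line)  # rough analogue of "\<mov\>"

print(bare)
print(bounded)
```

The bare pattern also hits the middle of remove and the start of movx, while the bounded pattern only hits the standalone mnemonic.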
Beware of old versions
The syntax/asmh8300.vim syntax you've referenced is from 2002; it may not be the best example of how to write a syntax file. (For example, it omits the \<...\> around its matches, which looks like a bug to me. And it still has compatibility stuff for Vim 5 / 6 that's not needed any more.)
Also, do you actually use Vim 7.0?! Vim 7.0 is from 2006 and very outdated. It should be possible to install the latest version, 7.3. If you can't find a proper package for your distribution (for Windows, check the binaries from the Cream project), it's also not very difficult to compile it yourself (e.g. from the Mercurial sources) on Linux.
Borrow other syntax elements
If other syntaxes are embedded in your syntax, and clearly delimited (e.g. like JavaScript inside HTML), you can :syn include it. But if there are just similar constructs, it's best to copy-and-paste them into your syntax (and adapt at least the group names). You need to be careful to catch all contained syntax groups, too; together with syntax clusters, the hierarchy can be quite complex!
More tips
When writing a syntax, you often need to find out which syntax group causes the highlighting. :syn list shows all active groups, but it's easier when you install the SyntaxAttr.vim - Show syntax highlighting attributes of character under cursor plugin.

Treat macro arguments in Common Lisp as (case-sensitive) strings

(This is one of those things that seems like it should be so simple that I imagine there may be a better approach altogether)
I'm trying to define a macro (for CLISP) that accepts a variable number of arguments as symbols (which are then converted to case-sensitive strings).
(defmacro symbols-to-words (&body body)
  `(join-words (mapcar #'symbol-name '(,@body))))
converts the symbols to uppercase strings, whereas
(defmacro symbols-to-words (&body body)
  `(join-words (mapcar #'symbol-name '(|,@body|))))
treats ,@body as a single symbol, with no expansion.
Any ideas? I'm thinking there's probably a much easier way altogether.
The symbol names are uppercased during the reader step, which occurs before macroexpansion, and so there is nothing you can do with macros to affect that. You can globally set READTABLE-CASE, but that will affect all code, in particular you will have to write all standard symbols in uppercase in your source. There is also a '-modern' option for CLISP, which provides lowercased version for names of the standard library and sets the reader to be case-preserving, but it is itself non-standard. I have never used it myself so I am not sure what caveats actually apply.
The other way to control the reader is through reader macros. Common Lisp already has a reader macro implementing a syntax for case-sensitive strings: the double quote. It is hard to offer more advice without knowing why you are not just using it.
As Ramarren correctly says, the case of symbols is determined at read time, not at macro expansion time.
Common Lisp has a syntax for specifying symbols without changing the case:
|This is a symbol| - using the vertical bar as multiple escape character.
and there is also a backslash - a single escape character:
CL-USER > 'foo\bar
|FOObAR|
Other options are:
using a different global readtable case
using a read macro which reads and preserves case
using a read macro which uses its own reader
Also note that a syntax for something like |,@body| (where body is spliced in) does not exist in Common Lisp. Splicing only works for lists, not symbol names. The vertical bar, |, surrounds the characters of a symbol. The explanation in the Common Lisp Hyperspec is a bit cryptic: Multiple Escape Characters.

Why doesn't Vim's errorformat take regular expressions?

Vim's errorformat (for parsing compile/build errors) uses an arcane format from C for parsing errors.
Trying to set up an errorformat for nant seems almost impossible; I've tried for many hours and can't get it to work. I also see from my searches that a lot of people seem to be having the same problem. A regex to solve this would take minutes to write.
So why does vim still use this format? It's quite possible that the C parser is faster but that hardly seems relevant for something that happens once every few minutes at most. Is there a good reason or is it just an historical artifact?
It's not that Vim uses an arcane format from C. Rather it uses the ideas from scanf, which is a C function. This means that the string that matches the error message is made up of 3 parts:
whitespace
characters
conversion specifications
Whitespace is your tabs and spaces. Characters are the letters, numbers and other normal stuff. Conversion specifications are sequences that start with a '%' (percent) character. In scanf you would typically match an input string against %d or %f to convert to integers or floats. With Vim's error format, you are searching the input string (error message) for files, lines and other compiler specific information.
If you were using scanf to extract an integer from the string "99 bottles of beer", then you would use:
int i;
scanf("%d bottles of beer", &i); // i would be 99, string read from stdin
Now with Vim's error format it gets a bit trickier but it does try to match more complex patterns easily. Things like multiline error messages, file names, changing directory, etc, etc. One of the examples in the help for errorformat is useful:
Error 275
line 42
column 3
' ' expected after '--'
The appropriate error format string has to look like this:
:set efm=%EError\ %n,%Cline\ %l,%Ccolumn\ %c,%Z%m
Here %E tells Vim that it is the start of a multi-line error message. %n is an error number. %C is the continuation of a multi-line message, with %l being the line number, and %c the column number. %Z marks the end of the multiline message and %m matches the error message that would be shown in the status line. You need to escape spaces with backslashes, which adds a bit of extra weirdness.
While it might initially seem easier with a regex, this mini-language is specifically designed to help with matching compiler errors. It has a lot of shortcuts in there. I mean you don't have to think about things like matching multiple lines, multiple digits, matching path names (just use %f).
Another thought: How would you map numbers to mean line numbers, or strings to mean files or error messages if you were to use just a normal regexp? By group position? That might work, but it wouldn't be very flexible. Another way would be named capture groups, but then this syntax looks a lot like a short hand for that anyway. You can actually use regexp wildcards such as .* - in this language it is written %.%#.
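For comparison, this is roughly what the named-capture-group approach would look like for the four-line message from the help example (without any leading line numbers). It is a Python sketch for illustration only; the group names nr, lnum, col and text are my own choice and Vim does not accept this syntax in errorformat:

```python
import re

# One named group per piece of information that %n, %l, %c and %m extract.
ERROR_RE = re.compile(
    r"Error (?P<nr>\d+)\n"
    r"line (?P<lnum>\d+)\n"
    r"column (?P<col>\d+)\n"
    r"(?P<text>.*)"
)

message = "Error 275\nline 42\ncolumn 3\n' ' expected after '--'"
fields = ERROR_RE.match(message).groupdict()
print(fields)
```

Each named group plays the role of one conversion specification, which is why the errorformat codes read like a shorthand for this.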
OK, so it is not perfect. But it's not impossible either and makes sense in its own way. Get stuck in, read the help and stop complaining! :-)
I would recommend writing a post-processing filter for your compiler that uses regular expressions or whatever and outputs messages in a simple format that is easy to write an errorformat for. Why learn some new, baroque, single-purpose language unless you have to?
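A minimal sketch of such a filter in Python follows. The input format "path(line,col): message" is an assumption made up for this example, not the documented output of nant or any other tool, so the regex has to be adapted to what your build actually prints; normalize and filter_lines are hypothetical names.

```python
import re

# ASSUMPTION: the tool prints "path(line,col): message".
# Adjust this pattern to the real output of your build tool.
TOOL_LINE = re.compile(
    r"^\s*(?P<file>[^()]+)\((?P<line>\d+),(?P<col>\d+)\):\s*(?P<msg>.*)$"
)

def normalize(line):
    """Rewrite one tool line as "file:line:col: message", or return None."""
    m = TOOL_LINE.match(line)
    if m is None:
        return None
    return "{file}:{line}:{col}: {msg}".format(**m.groupdict())

def filter_lines(lines):
    """Yield normalized messages, dropping lines that are not errors."""
    for raw in lines:
        out = normalize(raw.rstrip("\n"))
        if out is not None:
            yield out

# Made-up sample output for demonstration.
sample = [
    "Build started",
    "src/Foo.cs(12,5): error CS1002: ; expected",
]
for message in filter_lines(sample):
    print(message)
```

In real use you would pass sys.stdin to filter_lines and pipe the build output through the script; the normalized lines can then be matched with something as simple as :set efm=%f:%l:%c:\ %m.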
According to :help quickfix,
it is also possible to specify (nearly) any Vim supported regular
expression in format strings.
However, the documentation is confusing and I didn't put much time into verifying how well it works and how useful it is. You would still need to use the scanf-like codes to pull out file names, etc.
They are a pain to work with, but to be clear: you can use regular expressions (mostly).
From the docs:
Pattern matching
The scanf()-like "%*[]" notation is supported for backward-compatibility
with previous versions of Vim. However, it is also possible to specify
(nearly) any Vim supported regular expression in format strings.
Since meta characters of the regular expression language can be part of
ordinary matching strings or file names (and therefore internally have to
be escaped), meta symbols have to be written with leading '%':
    %\      The single '\' character. Note that this has to be
            escaped ("%\\") in ":set errorformat=" definitions.
    %.      The single '.' character.
    %#      The single '*'(!) character.
    %^      The single '^' character. Note that this is not
            useful, the pattern already matches start of line.
    %$      The single '$' character. Note that this is not
            useful, the pattern already matches end of line.
    %[      The single '[' character for a [] character range.
    %~      The single '~' character.
When using character classes in expressions (see |/\i| for an overview),
terms containing the "\+" quantifier can be written in the scanf() "%*"
notation. Example: "%\\d%\\+" ("\d\+", "any number") is equivalent to "%*\\d".
Important note: The \(...\) grouping of sub-matches can not be used in format
specifications because it is reserved for internal conversions.
lol try looking at the actual vim source code sometime. It's a nest of C code so old and obscure you'll think you're on an archaeological dig.
As for why vim uses the C parser, there are plenty of good reasons starting with that it's pretty universal. But the real reason is that sometime in the past 20 years someone wrote it to use the C parser and it works. No one changes what works.
If it doesn't work for you the vim community will tell you to write your own. Stupid open source bastards.
