(This is one of those things that seems like it should be so simple that I imagine there may be a better approach altogether)
I'm trying to define a macro (for CLISP) that accepts a variable number of arguments as symbols (which are then converted to case-sensitive strings).
(defmacro symbols-to-words (&body body)
  `(join-words (mapcar #'symbol-name '(,@body))))
converts the symbols to uppercase strings, whereas
(defmacro symbols-to-words (&body body)
  `(join-words (mapcar #'symbol-name '(|,@body|))))
treats ,@body as a single symbol, with no expansion.
Any ideas? I'm thinking there's probably a much easier way altogether.
The symbol names are uppercased during the reader step, which occurs before macroexpansion, so there is nothing you can do with macros to affect that. You can globally set READTABLE-CASE, but that will affect all code; in particular, you will have to write all standard symbols in uppercase in your source. There is also a '-modern' option for CLISP, which provides lowercase versions of the standard library names and makes the reader case-preserving, but it is itself non-standard. I have never used it myself, so I am not sure what caveats actually apply.
The other way to control the reader is through reader macros. Common Lisp already has a reader macro implementing a syntax for case-sensitive strings: the double quote. It is hard to offer more advice without knowing why you are not just using it.
As Ramarren correctly says, the case of symbols is determined at read time, not at macro expansion time.
Common Lisp has a syntax for specifying symbols without changing the case:
|This is a symbol| - using the vertical bar as multiple escape character.
and there is also a backslash - a single escape character:
CL-USER > 'foo\bar
|FOObAR|
Other options are:
using a different global readtable case
using a read macro which reads and preserves case
using a read macro which uses its own reader
Also note that a syntax for something like |,@body| (where body is spliced in) does not exist in Common Lisp. Splicing only works for lists, not for symbol names. The vertical bar, |, surrounds the characters of a symbol name. The explanation in the Common Lisp Hyperspec is a bit cryptic: Multiple Escape Characters.
I need a function that behaves similarly to sscanf.
For example, let's suppose we have a format string that looks like this (the function I'm looking for doesn't have to be exactly like this, but something similar)
"This is normal text that has to exactly match, but here is a ${var}"
and have it return (or modify) a variable so that it looks like
{'var': <whatever was there>}
After researching this for a while, the only thing I could find was scanf, but that takes input from stdin, not from a string.
I am aware that there is a regex solution for this, but I'm looking for a function that does this without the need for regex (regex is slow). However, if there is no other solution for this, I will accept a regex solution.
The normal solution for this in most languages that have regular expressions built-in is to use regular expressions.
If you're not used to or don't like regular expressions, I'm sorry. Most of the programming world has assumed that knowledge of regular expressions is mandatory.
In any case, the normal solution to this is String.prototype.match:
let text = get_string_to_scan();
let match = text.match(/This is normal text that has to exactly match, but here is a (.+)/);
if (match) { // match is null if no match is found
  // The result you want is in match[1]
  console.log('value of var is:', match[1]);
}
What pattern you put in your capture group (the (..) part) depends on what you want. The code above captures anything at all including spaces and special characters.
If you just want to capture a "word", that is, a run of letters, digits and underscores with no spaces, then you can use (\w+):
text.match(/This is normal text that has to exactly match, but here is a (\w+)/)
If you want to capture a word with only letters but not numbers you can use ([a-zA-Z]+):
text.match(/This is normal text that has to exactly match, but here is a ([a-zA-Z]+)/)
The flexibility of regular expressions is why other methods of string scanning are usually not supported in languages that have had regular expressions built in since the beginning. But of course, flexibility comes with complexity.
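To get closer to the {'var': ...} result the question asks for, the same regular-expression approach can be wrapped in a small helper. This is only a sketch: scanTemplate is a made-up name, not a built-in, and it assumes placeholders look like ${name}, that each name is used once, and that every placeholder should capture "anything" (.+).

```javascript
// Hypothetical helper (not a built-in): turn a "${name}" template into a
// regular expression with one named capture group per placeholder.
function scanTemplate(template, input) {
  // Splitting on a capturing group keeps the names: odd indexes are names.
  const parts = template.split(/\$\{([A-Za-z_]\w*)\}/);
  const source = parts
    .map((part, i) =>
      i % 2 === 1
        ? `(?<${part}>.+)`                             // placeholder -> named group
        : part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')  // literal text, escaped
    )
    .join('');
  const match = new RegExp(`^${source}$`).exec(input);
  // Copy the groups into a plain object; null means "no match".
  return match ? { ...match.groups } : null;
}

const result = scanTemplate(
  'This is normal text that has to exactly match, but here is a ${var}',
  'This is normal text that has to exactly match, but here is a value'
);
console.log(result);
```

The literal parts are escaped so that characters such as . or ( in the template only match themselves. Named capture groups need a reasonably recent JavaScript engine.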
Do you mean to have the ${var} to act as a placeholder? If so you could do it by replacing the " with the backtick:
console.log(`This is normal text that has to exactly match, but here is a ${"whatever was there"}`)
The exact problem: I have C++ source code and I need to rename a symbol. However, the replacement must affect the symbol only, not an identical-looking word in comments or in text inside "".
The information about which language section a match is in is defined well enough by the syntax highlighting rules. I know they can fail sometimes, but let's say this isn't a problem. I need some way to walk through all occurrences of the phrase and check which section each one is in; if it's text or a comment, the occurrence should be skipped. Otherwise the replacement should be done either immediately or after asking first, depending on the well-known c flag.
What I imagine would be at least theoretically possible is:
Having a kinda "callback" when doing substitution (called for each phrase found, and requesting the answer whether to substitute or not), or extract the list of positions where the phrase has been found, then iterate through all of them
Extract the name of the current "hi-linked" syntax highlighting rule, which is used to color the text at given position
Is it at all possible within the current features of vim?
Yes, with a :help sub-replace-expression, you can evaluate arbitrary expressions in the replacement part of :substitute. Vim's synID() and synstack() functions allow you to get the current syntax element.
Luc Hermitte has an implementation that omits replacement inside strings, here. You can easily adapt this to your use case.
With the help of my ingo-library plugin, you can define a short predicate function, e.g. matching comments and constants (strings, numbers, etc.):
function! CommentOrConstant()
    return ingo#syntaxitem#IsOnSyntax(getpos('.'), '^\%(Comment\|Constant\)$')
endfunction
My PatternsOnText plugin now provides a :SubstituteIf command that works like :substitute, but also takes a predicate expression. With that, it's very easy to do a replacement anywhere except in comments or constants:
:%SubstituteIf/pattern/replacement/g !CommentOrConstant()
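Outside of Vim, the "callback per match" idea from the question can be sketched in JavaScript, where String.prototype.replace accepts a function. This is only a rough illustration under stated assumptions: renameSymbol is a made-up name, oldName is assumed to be a plain identifier, and the pattern only knows about // comments, /* */ comments and double-quoted strings (no raw strings, character literals or preprocessor tricks).

```javascript
// Rename `oldName` to `newName` in C++-like source, skipping comments and
// string literals. A rough sketch, not a real C++ lexer.
function renameSymbol(source, oldName, newName) {
  // Things to leave untouched: line comments, block comments, "strings".
  const skip = /\/\/[^\n]*|\/\*[\s\S]*?\*\/|"(?:\\.|[^"\\])*"/.source;
  const pattern = new RegExp(`(${skip})|\\b${oldName}\\b`, 'g');
  // The callback runs once per match and decides what to substitute:
  // if the "skip" group matched, return the text unchanged.
  return source.replace(pattern, (match, skipped) =>
    skipped !== undefined ? match : newName
  );
}

const renamed = renameSymbol(
  'int foo = 1; // foo\nputs("foo"); foo++;',
  'foo',
  'bar'
);
console.log(renamed);
```

Matching the comments and strings in the same pattern is what keeps the symbol inside them from being seen as a separate match.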
I am working on a processor that splits text into blocks with marks:
LOREM IPSUM SED AMED
will be parsed like:
{word:1}LOREM{/word:1}{space:2}
{word:3}IPSUM{/word:3}{space:4}
{word:5}SED{/word:5}{space:6}
{word:7}AMED{/word:7}
But I don't want to use "{word}" etc., because it trips up the processor: the mark is itself just a string again... I need to mark the blocks like this:
\E002\0001 LOREM \E003\0001 \E004\0002
\E002\0003 IPSUM \E003\0004 \E004\0005
\E002\0006 SED \E003\0006 \E004\0007
\E002\0008 AMED \E003\0008
The first part, \E002, is the element type number; its last bit marks the element's close, so the element number increments by 2.
The second part, \0001, is the element index for stacking.
I just picked \E002 arbitrarily for this example.
But \0001 is also within the Unicode range, and this leads me back to where I started...
So which Unicode range can I use? \ff0000? Or how else can I solve this?
Thanks!
The Unicode Consortium thought of this. There is a range of Unicode code points that are meant to never represent a displayable character, but meta-codes instead:
Noncharacters are code points that are permanently reserved and will never have characters
assigned to them.
...
Tag characters were intended to support a general scheme for the internal tagging of text
streams in the absence of other mechanisms, such as markup languages. The use of tag
characters for language tagging is deprecated.
(http://www.unicode.org/versions/Unicode9.0.0/ch23.pdf)
You should be able to use regular control characters as "private" tags, because these should never occur in proper strings. This would be the range from U+0000 to U+001F, excluding tab (U+0009), the common "returns" (U+000A and U+000D), and, for safety, U+0000 itself (some libraries do not like Null characters in the middle of strings).
Non-characters
Noncharacters are code points that are permanently reserved in the Unicode Standard for
internal use. They are not recommended for use in open interchange of Unicode text data.
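You can use U+FFFE and U+FFFF, which are officially defined as noncharacters. (Do not confuse them with U+FEFF, which is the byte order mark and a regular, assigned character.) There are several more noncharacters defined, namely U+FDD0–U+FDEF and the last two code points of every plane, and you can be fairly sure they would not occur in regular text strings.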
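If you go the noncharacter route, it may be worth checking incoming text for them before you add your own. A minimal sketch, assuming JavaScript strings; both function names are made up for illustration:

```javascript
// Returns true if `codePoint` is one of the 66 Unicode noncharacters:
// U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes
// (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
function isNoncharacter(codePoint) {
  const inFdd0Range = codePoint >= 0xfdd0 && codePoint <= 0xfdef;
  const endsPlane = (codePoint & 0xfffe) === 0xfffe && codePoint <= 0x10ffff;
  return inFdd0Range || endsPlane;
}

// Hypothetical helper: does a string already contain any noncharacter?
function containsNoncharacter(text) {
  // for...of iterates by code point, so astral code points are handled.
  for (const ch of text) {
    if (isNoncharacter(ch.codePointAt(0))) return true;
  }
  return false;
}

console.log(containsNoncharacter('LOREM IPSUM'));
```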
A few random sequences with predefined definitions, and so highly unlikely to occur in plain text strings are:
Specials: U+FFF0–U+FFF8
The nine unassigned Unicode code points in the range U+FFF0..U+FFF8 are reserved for
special character definitions.
Annotation Characters: U+FFF9–U+FFFB
An interlinear annotation consists of annotating text that is related to a sequence of annotated
characters. For all regular editing and text-processing algorithms, the annotated characters
are treated as part of the text stream. The annotating text is also part of the content,
but for all or some text processing, it does not form part of the main text stream.
Tag Characters: U+E0000–U+E007F
This block encodes a set of 95 special-use tag characters to enable the spelling out of ASCII-based
string tags using characters that can be strictly separated from ordinary text content
characters in Unicode.
(all quotations from the chapter as above)
Staying within conventions, you can also use U+2028 (line separator) and/or U+2029 (paragraph separator).
Technically, your use of U+E000–U+F8FF (the "Private Use Area") is okay-ish, because these code points only can define an unambiguous character in combination with a certain font. However, it is possible these codes may pop up if you get your plain text from a source where the font was included.
As for how to encode this into your strings: it doesn't really matter if the numerical code immediately following your private tag marker is a valid Unicode character or not. If you see one of your own tag markers, then the value immediately following is always your own private sequence number.
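A rough sketch of that rule in JavaScript. The marker values, the function names and the "one code unit per index" layout are all assumptions made for illustration, not something from the question's processor:

```javascript
// Assumed marker code points, taken from the Private Use Area.
const OPEN = '\uE002';
const CLOSE = '\uE003';

// Wrap `text` in an open/close marker pair, each followed by its index.
// Assumption: index fits in one UTF-16 code unit and is not a surrogate
// (so not in 0xD800..0xDFFF).
function tag(text, index) {
  const id = String.fromCharCode(index);
  return OPEN + id + text + CLOSE + id;
}

// Walk the string; the code unit right after a marker is always an index,
// whatever character it would otherwise represent.
function untag(marked) {
  const blocks = [];
  let current = null;
  for (let i = 0; i < marked.length; i++) {
    const ch = marked[i];
    if (ch === OPEN) {
      current = { index: marked.charCodeAt(++i), text: '' };
    } else if (ch === CLOSE) {
      i++; // skip the index that follows the close marker
      blocks.push(current);
      current = null;
    } else if (current) {
      current.text += ch;
    }
  }
  return blocks;
}

const marked = tag('LOREM', 1) + ' ' + tag('IPSUM', 3);
console.log(untag(marked));
```

Because the reader consumes the index together with its marker, an index that happens to equal a marker value (or a control character such as U+0001) is never misread as text.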
As you see, there are lots of possibilities. I guess the most important criterion is whether you want to use other functions on these strings. If you create a string that is technically invalid Unicode (for instance, because it includes not-a-character values), some external functions may choose to fail to work on them, or silently remove the bad values. In such a case, you'd need to rigorously stick to a system in which you only use 'valid' code points.
Another question on Emacs 24.1 and Haskell. I've noticed that it does indenting for me and it does very basic highlighting for me (types are in green, for example). But out-of-box Emacs 24.1 doesn't highlight commonly used functions like foldr, map, etc. Is there an ability for Emacs and haskell-mode to highlight commonly used functions?
Fundamentally, standard library functions are just that--functions. In fact, depending on your imports, any of them could be user-supplied rather than from the standard prelude! This actually happens often--for example, if you want to use Control.Category you usually hide id and replace it with a polymorphic version.
So in short, there is no real reason to highlight standard functions. So I really doubt this functionality is present in the standard Haskell mode.
That said, this is Emacs. You can easily add anything you want. If you have a list of all the function names you want to highlight, it should not be difficult to add that to the Haskell mode.
You can add your new functions to the haskell-mode highlighting with code something like this in your .emacs file:
(font-lock-add-keywords 'haskell-mode
'(("\\<\\(map\\|foldr\\|foldl\\)\\>" 1
'(:foreground "#3366FF") t)))
The weird looking string is an Emacs-style regular expression. \< and \> are word boundaries, like \b, while \(, \| and \) form a group with alternation inside it. Since there are no regex literals, every \ has to be escaped inside the string. The regex would be more readable as \<\(map\|foldr\|foldl\)\>. You can easily add other function names by adding new cases to the expression.
The (:foreground "#3366FF") just sets the color of the text to a rather fetching shade of blue.
Vim's errorformat (for parsing compile/build errors) uses an arcane format from C for parsing errors.
Trying to set up an errorformat for nant seems almost impossible; I've tried for many hours and can't get it. I also see from my searches that a lot of people seem to be having the same problem. A regex to solve this would take minutes to write.
So why does vim still use this format? It's quite possible that the C parser is faster but that hardly seems relevant for something that happens once every few minutes at most. Is there a good reason or is it just an historical artifact?
It's not that Vim uses an arcane format from C. Rather it uses the ideas from scanf, which is a C function. This means that the string that matches the error message is made up of 3 parts:
whitespace
characters
conversion specifications
Whitespace is your tabs and spaces. Characters are the letters, numbers and other normal stuff. Conversion specifications are sequences that start with a '%' (percent) character. In scanf you would typically match an input string against %d or %f to convert to integers or floats. With Vim's error format, you are searching the input string (error message) for files, lines and other compiler specific information.
If you were using scanf to extract an integer from the string "99 bottles of beer", then you would use:
int i;
scanf("%d bottles of beer", &i); // i would be 99, string read from stdin
Now with Vim's error format it gets a bit trickier but it does try to match more complex patterns easily. Things like multiline error messages, file names, changing directory, etc, etc. One of the examples in the help for errorformat is useful:
Error 275
line 42
column 3
' ' expected after '--'
The appropriate error format string has to look like this:
:set efm=%EError\ %n,%Cline\ %l,%Ccolumn\ %c,%Z%m
Here %E tells Vim that it is the start of a multi-line error message. %n is an error number. %C is the continuation of a multi-line message, with %l being the line number, and %c the column number. %Z marks the end of the multiline message and %m matches the error message that would be shown in the status line. You need to escape spaces with backslashes, which adds a bit of extra weirdness.
While it might initially seem easier with a regex, this mini-language is specifically designed to help with matching compiler errors. It has a lot of shortcuts in there. I mean you don't have to think about things like matching multiple lines, multiple digits, matching path names (just use %f).
Another thought: how would you map numbers to mean line numbers, or strings to mean files or error messages, if you were to use just a normal regexp? By group position? That might work, but it wouldn't be very flexible. Another way would be named capture groups, but then this syntax looks a lot like a shorthand for that anyway. You can actually use regexp wildcards such as .* - in this language it is written %.%#.
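For comparison, here is roughly what the named-capture-group version of the four-line example would look like in a general-purpose regex engine. This is a JavaScript sketch, not anything Vim does internally; the group names nr, line, col and text are made up to mirror %n, %l, %c and %m.

```javascript
// The multi-line error from the example above, joined by newlines.
const message = "Error 275\nline 42\ncolumn 3\n' ' expected after '--'";

// Each named group plays the role of one errorformat item:
// nr ~ %n, line ~ %l, col ~ %c, text ~ %m.
const pattern =
  /^Error (?<nr>\d+)\nline (?<line>\d+)\ncolumn (?<col>\d+)\n(?<text>.*)$/;

function parseError(input) {
  const m = pattern.exec(input);
  if (!m) return null; // not an error in this format
  const { nr, line, col, text } = m.groups;
  return { nr: Number(nr), line: Number(line), col: Number(col), text };
}

console.log(parseError(message));
```

The regex has to spell out the digits, the newlines and the conversion to numbers itself, which is the work that %n, %l, %c, %E, %C and %Z do for you in the errorformat string.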
OK, so it is not perfect. But it's not impossible either and makes sense in its own way. Get stuck in, read the help and stop complaining! :-)
I would recommend writing a post-processing filter for your compiler that uses regular expressions or whatever, and outputs messages in a simple format that is easy to write an errorformat for. Why learn some new, baroque, single-purpose language unless you have to?
According to :help quickfix,
it is also possible to specify (nearly) any Vim supported regular
expression in format strings.
However, the documentation is confusing and I didn't put much time into verifying how well it works and how useful it is. You would still need to use the scanf-like codes to pull out file names, etc.
They are a pain to work with, but to be clear: you can use regular expressions (mostly).
From the docs:
Pattern matching
The scanf()-like "%*[]" notation is supported for backward-compatibility
with previous versions of Vim. However, it is also possible to specify
(nearly) any Vim supported regular expression in format strings.
Since meta characters of the regular expression language can be part of
ordinary matching strings or file names (and therefore internally have to
be escaped), meta symbols have to be written with leading '%':
%\ The single '\' character. Note that this has to be
escaped ("%\\") in ":set errorformat=" definitions.
%. The single '.' character.
%# The single '*'(!) character.
%^ The single '^' character. Note that this is not
useful, the pattern already matches start of line.
%$ The single '$' character. Note that this is not
useful, the pattern already matches end of line.
%[ The single '[' character for a [] character range.
%~ The single '~' character.
When using character classes in expressions (see |/\i| for an overview),
terms containing the "\+" quantifier can be written in the scanf() "%*"
notation. Example: "%\\d%\\+" ("\d\+", "any number") is equivalent to "%*\\d".
Important note: The \(...\) grouping of sub-matches can not be used in format
specifications because it is reserved for internal conversions.
lol try looking at the actual vim source code sometime. It's a nest of C code so old and obscure you'll think you're on an archaeological dig.
As for why vim uses the C parser, there are plenty of good reasons starting with that it's pretty universal. But the real reason is that sometime in the past 20 years someone wrote it to use the C parser and it works. No one changes what works.
If it doesn't work for you the vim community will tell you to write your own. Stupid open source bastards.