This is a TeX legacy issue---it would have made more sense to require a whitespace when a whitespace is desired: 12,123 is probably a number, while 12, 123 is probably a list. Alas, it is what it is.
This is related to the question about MathJax rendering of commas in numbers, where the solution is to suppress the space by writing {,}. That works, but it is inconvenient. Is there a way to make this automatic?
The hack in https://github.com/mathjax/MathJax/issues/169#issuecomment-2040235 is concerned with European versus Anglo-American number formats. The equivalent hack,
<script type="text/x-mathjax-config">
MathJax.Hub.Register.StartupHook("TeX Jax Ready",function () {
MathJax.InputJax.TeX.Definitions.number =
/^(?:[0-9]+(?:\,[0-9]{3})*(?:\{\.\}[0-9]*)*|\{\.\}[0-9]+)/
});
</script>
solves the comma problem in 1,234.56, but now there is a space after the period (i.e., before the 5). I am not sure how the regex above works. Can someone help?
Change the pattern to
/^(?:[0-9]+(?:,[0-9]{3})*(?:\.[0-9]*)*|\.[0-9]+)/
to allow 12,345.6 to be treated as a number, while 12, 345 is a list of two numbers. In the original pattern, the \{\.\} requires a literal {.} (braces included), not just a decimal.
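To see the difference between the two patterns, here is a small sketch in Python. Python's re module is only a stand-in for the JavaScript regex engine that MathJax uses (these particular constructs behave the same in both), and the helper name leading_number is made up for illustration.

```python
import re

# Pattern from the question: a decimal part must be written as a literal "{.}".
ORIGINAL = re.compile(r"^(?:[0-9]+(?:\,[0-9]{3})*(?:\{\.\}[0-9]*)*|\{\.\}[0-9]+)")

# Pattern from the answer: a decimal part is a plain ".".
FIXED = re.compile(r"^(?:[0-9]+(?:,[0-9]{3})*(?:\.[0-9]*)*|\.[0-9]+)")


def leading_number(pattern, text):
    """Return the prefix of text that the pattern treats as one number, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None


for sample in ["1,234.56", "12,123", "12, 123", "1,234{.}56"]:
    print(sample, "->", leading_number(ORIGINAL, sample), "|", leading_number(FIXED, sample))
```

With the original pattern the number stops at 1,234, so the period and the 56 are handled as separate tokens, which would explain the extra space. With the fixed pattern the whole of 1,234.56 is one number, while 12, 123 still stops after 12.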
I am learning how to write Vim plugins and need to match the numeric literals that a language allows, so that I can highlight them in different colors. I wrote the following regexes in very magic mode:
syntax match cNumberGroup "\v\d+"
syntax match cNumberGroup "\v0x\x+"
syntax match cNumberGroup "\v[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?"
The first one is for decimal numbers. The second one is for hex numbers.
The third one is for numbers in scientific notation.
The first two work fine, but the last one does not. I want to know why it is not working, and also whether there is a better way to write the regexes so that all three number formats are covered efficiently.
Thanks.
Just slapping \v (very magic) in front of the regular expression doesn't make Vim's regular expression syntax Perl-compatible. As @Carpetsmoker has already commented, :help perl-patterns shows the differences.
For your example, the (?:...) has to be written as %(...) when \v is in effect:
syntax match cNumberGroup "\v[+\-]?%(0|[1-9]\d*)%(\.\d*)?%([eE][+\-]?\d+)?"
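For comparison, the Perl-style spelling from the question does work in an engine that is Perl-compatible. The sketch below uses Python's re module for that; the function name is_number is only for illustration, and this says nothing about how Vim itself evaluates the pattern.

```python
import re

# Same structure as the Vim pattern above, but with Perl-style (?:...) groups,
# which correspond to %(...) in Vim's very magic mode.
NUMBER = re.compile(r"[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?")


def is_number(text):
    """True if the whole string is a decimal or scientific-notation literal."""
    return NUMBER.fullmatch(text) is not None


for sample in ["0", "-42", "7.", "1.5e-3", "012", "0x1F", "e5"]:
    print(sample, is_number(sample))
```

Note that this pattern deliberately rejects 012 (leading zero) and 0x1F, so the hex case would still need its own rule.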
I have a .html file that is working perfectly fine but for some reason Sublime 3 decides that it has invalid code, check the image below:
Any idea why that's happening and how to fix it without having to modify the code?
The HTML5 spec states (my emphasis):
Comments must start with the four character sequence U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS (<!--). Following this sequence, the comment may have text, with the additional restriction that the text must not start with a single > (U+003E) character, nor start with a U+002D HYPHEN-MINUS character (-) followed by a > (U+003E) character,
nor contain two consecutive U+002D HYPHEN-MINUS characters (--),
nor end with a U+002D HYPHEN-MINUS character (-). Finally, the comment must be ended by the three character sequence U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN (-->).
So that's why it's complaining. As to how to fix it without changing the code, that's trickier.
Your contention that it works is no different really to C developers wondering why they need to worry about undefined behaviour because the code they wrote works fine. The fact that it works fine in one particular implementation is not relevant to portable code.
My advice is to actually change the code. It's not valid, after all, and any browser (current or future) would be well within its rights to simply reject it.
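If it helps with tracking down the offending comments, the restrictions quoted above are easy to check mechanically. This is a minimal sketch in Python that covers only the four rules in the quoted paragraph; the function name comment_text_problems is made up, and it expects the text between <!-- and -->, not the whole comment.

```python
def comment_text_problems(text):
    """List which of the quoted HTML5 comment rules the comment text breaks."""
    problems = []
    if text.startswith(">"):
        problems.append("starts with >")
    if text.startswith("->"):
        problems.append("starts with ->")
    if "--" in text:
        problems.append("contains --")
    if text.endswith("-"):
        problems.append("ends with -")
    return problems


print(comment_text_problems(" a perfectly fine comment "))
print(comment_text_problems(" section one -- section two "))
```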
As an aside, after some historical digging, it appears this is not allowed because SGML, on which HTML was based, had slightly different rules regarding comments.
On sensing the <!-- token, the parser was switched to a comment mode where > characters were actually allowed within the comment. If the -- sequence was encountered, it changed to a different mode where the > would end the comment.
In fact, it appears to have been a toggle switch between those two modes, so something like <!-- >>>>> -- xyzzy -- >>>>> --> was possible, but putting a > where the xyzzy is would end the comment.
XML, for one, didn't adopt this behaviour and HTML has now modified it to follow the "don't use -- within comments at all" rule, the reason being that hardly anyone knew that the comments behaved in the SGML way, causing some pain :-)
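The toggle behaviour described above can be modelled in a few lines. This is a toy model in Python of the description in this answer only, not a real SGML parser, and the function name sgml_declaration_end is hypothetical.

```python
def sgml_declaration_end(markup):
    """Return the index just past the '>' that ends the declaration, or -1.

    Each "--" toggles comment mode; a ">" only ends the declaration
    when comment mode is off.
    """
    if not markup.startswith("<!"):
        raise ValueError("expected markup starting with '<!'")
    in_comment = False
    i = 2
    while i < len(markup):
        if markup.startswith("--", i):
            in_comment = not in_comment  # toggle between the two modes
            i += 2
        elif markup[i] == ">" and not in_comment:
            return i + 1
        else:
            i += 1
    return -1


example = "<!-- >>>>> -- xyzzy -- >>>>> -->"
print(sgml_declaration_end(example) == len(example))
```

In this model the > characters inside the two -- ... -- runs are ignored, while a > placed where xyzzy sits would end the declaration early.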
I have several functions that start with get_ in my code:
get_num(...) , get_str(...)
I want to change them to get_*_struct(...).
Can I somehow match the get_* regex and then replace according to the pattern so that:
get_num(...) becomes get_num_struct(...),
get_str(...) becomes get_str_struct(...)
Can you also explain the logic behind it? The regexes from theory aren't like the ones used in UNIX (or in vi; are those different?) and I'm always struggling to figure them out.
This has to be done in the vi editor, as it is my main work tool.
Thanks!
To transform get_num(...) to get_num_struct(...), you need to capture the correct text in the input. And, you can't put the parentheses in the regular expression because you may need to match pointers to functions too, as in &get_distance, and uses in comments. However, and this depends partially on the fact that you are using vim and partially on how you need to keep the entire input together, I have checked that this works:
%s/get_\w\+/&_struct/g
On every line, find every expression starting with get_ and continuing with at least one letter, number, or underscore, and replace it with the entire matched string followed by _struct.
Darn it; I shouldn't answer these things on spec. Note that other regex engines might use \& instead of &. This depends on having magic set, which is default in vim.
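For readers more used to another engine, here is the same whole-match substitution sketched in Python, where \g<0> plays the role of & in vim. The function name add_struct_suffix is just for illustration.

```python
import re


def add_struct_suffix(source):
    """Append _struct to every identifier that starts with get_."""
    # \g<0> is the entire match, like & in a vim replacement.
    return re.sub(r"get_\w+", r"\g<0>_struct", source)


print(add_struct_suffix("x = get_num(a) + call(&get_str);"))
```

Because the pattern does not require a parenthesis, the function pointer &get_str is renamed as well.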
For an alternate way to do it:
%s/get_\(\w*\)(/get_\1_struct(/g
What this does:
\w matches to any "word character"; \w* matches 0 or more word characters.
\(...\) tells vim to remember whatever matches .... So, \(\w*\) means "match any number of word characters, and remember what you matched." You can then access it in the replacement with \1 (or \2 for the second group, etc.).
So, the overall pattern get_\(\w*\)( looks for get_, followed by any number of word chars, followed by (.
The replacement then just does exactly what you want.
(Sorry if that was too verbose - not sure how comfortable you are with vim regex.)
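The capture-group version looks much the same in Python, except that groups are written (...) rather than \(...\) and a literal parenthesis is \(. This is only a sketch, and the function name is made up.

```python
import re


def add_struct_before_paren(source):
    """Rename get_xxx( to get_xxx_struct( using a capture group."""
    # (\w*) remembers the part after get_; \1 inserts it in the replacement.
    return re.sub(r"get_(\w*)\(", r"get_\1_struct(", source)


print(add_struct_before_paren("get_num(a); p = &get_str;"))
```

Since this pattern insists on the opening parenthesis, uses without one (such as &get_str) are left untouched.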
Can someone give me a real-world scenario of a method/function with a string argument which came from user input (e.g. form field, parsed data from file, etc.) where leading or trailing spaces SHOULD NOT have been trimmed?
I can't ever recall such a situation for myself.
EDIT: Mind you, I didn't say trimming any whitespace. I said trimming leading or trailing (only) spaces (or whitespace).
Search string in any "Find" dialog in an editor.
Password input boxes. There's lots of data out there where whitespace can genuinely be considered an important part of the string. Restricting the question to leading and trailing whitespace narrows things down a lot, but there are still many examples. Stuff you pass through a PHP-style nl2br function.
If you are inputting code. There may be a scenario where whitespace at the beginning and end is necessary.
Also, look at Stack Overflow's markdown editor. Code examples are indented. If you posted just a code example, the leading and trailing whitespace must not be trimmed.
Perhaps a Whitespace interpreter.
Python....
A Stackoverflow answer, or more generally input written in markdown (four leading spaces -> code block).
A paragraph entry.
If the input is python code (say, for a pastebin kinda thing), you certainly can't trim leading white space; but you also can't trim trailing white space, because it could be a part of a multi-line string (triple quoted string).
I've used whitespace as a delimiter before, so there. Also, for anything that involves concatenating multiple inputs, removing leading/trailing whitespace can break formatting or possibly do worse. Aside from that, as Spencer said, for indented paragraphs you probably would not want to remove the leading whitespace.
Obviously passwords should not be trimmed. Passwords can contain leading or trailing whitespace that needs to be treated as valid characters.
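Two of the cases mentioned above (markdown code blocks and passwords) are easy to demonstrate. This is an illustrative sketch in Python with made-up helper names; the plain SHA-256 digest is only there to show that the two strings differ, and a real system would use a salted password hashing scheme.

```python
import hashlib


def is_indented_code(line):
    """Markdown treats a line with four leading spaces as code."""
    return line.startswith("    ")


def digest(password):
    """Illustration only: fingerprint of the exact characters typed."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


raw_line = "    print('hi')"
print(is_indented_code(raw_line), is_indented_code(raw_line.strip()))

raw_password = " secret "
print(digest(raw_password) == digest(raw_password.strip()))
```

Trimming turns the code line into ordinary text and turns the password into a different password.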
Vim's errorformat (for parsing compile/build errors) uses an arcane format from C for parsing errors.
Trying to set up an errorformat for NAnt seems almost impossible; I've tried for many hours and can't get it to work. I also see from my searches that a lot of people seem to be having the same problem. A regex to solve this would take minutes to write.
So why does vim still use this format? It's quite possible that the C parser is faster but that hardly seems relevant for something that happens once every few minutes at most. Is there a good reason or is it just an historical artifact?
It's not that Vim uses an arcane format from C. Rather it uses the ideas from scanf, which is a C function. This means that the string that matches the error message is made up of 3 parts:
whitespace
characters
conversion specifications
Whitespace is your tabs and spaces. Characters are the letters, numbers and other normal stuff. Conversion specifications are sequences that start with a '%' (percent) character. In scanf you would typically match an input string against %d or %f to convert to integers or floats. With Vim's error format, you are searching the input string (error message) for files, lines and other compiler specific information.
If you were using scanf to extract an integer from the string "99 bottles of beer", then you would use:
int i;
scanf("%d bottles of beer", &i); // i would be 99, string read from stdin
Now with Vim's error format it gets a bit trickier but it does try to match more complex patterns easily. Things like multiline error messages, file names, changing directory, etc, etc. One of the examples in the help for errorformat is useful:
Error 275
line 42
column 3
' ' expected after '--'
The appropriate error format string has to look like this:
:set efm=%EError\ %n,%Cline\ %l,%Ccolumn\ %c,%Z%m
Here %E tells Vim that it is the start of a multi-line error message. %n is an error number. %C is the continuation of a multi-line message, with %l being the line number, and %c the column number. %Z marks the end of the multiline message and %m matches the error message that would be shown in the status line. You need to escape spaces with backslashes, which adds a bit of extra weirdness.
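To make the roles of %n, %l, %c and %m concrete, here is roughly what that format string extracts, written as a Python regex with named groups. This is only an analogy for illustration, not how Vim implements errorformat, and the group names are simply borrowed from the quickfix field names.

```python
import re

# One named group per conversion: %n -> nr, %l -> lnum, %c -> col, %m -> text.
MESSAGE = re.compile(
    r"Error (?P<nr>\d+)\n"
    r"line (?P<lnum>\d+)\n"
    r"column (?P<col>\d+)\n"
    r"(?P<text>.*)"
)


def parse_error(block):
    """Parse one four-line error message into a dict, or return None."""
    match = MESSAGE.match(block)
    if match is None:
        return None
    return {
        "nr": int(match.group("nr")),
        "lnum": int(match.group("lnum")),
        "col": int(match.group("col")),
        "text": match.group("text"),
    }


print(parse_error("Error 275\nline 42\ncolumn 3\n' ' expected after '--'"))
```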
While it might initially seem easier with a regex, this mini-language is specifically designed to help with matching compiler errors. It has a lot of shortcuts in there. I mean you don't have to think about things like matching multiple lines, multiple digits, matching path names (just use %f).
Another thought: how would you map numbers to mean line numbers, or strings to mean files or error messages, if you were to use just a normal regexp? By group position? That might work, but it wouldn't be very flexible. Another way would be named capture groups, but then this syntax looks a lot like a shorthand for that anyway. You can actually use regexp wildcards such as .*; in this language it is written %.%#.
OK, so it is not perfect. But it's not impossible either and makes sense in its own way. Get stuck in, read the help and stop complaining! :-)
I would recommend writing a post-processing filter for your compiler that uses regular expressions or whatever, and outputs messages in a simple format that is easy to write an errorformat for. Why learn some new, baroque, single-purpose language unless you have to?
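A filter of that kind could look something like the sketch below. The input format is an assumption on my part: it takes lines shaped like "[csc] Program.cs(12,5): error CS1002: ; expected" and rewrites them as file:line:col: message, which a simple format such as %f:%l:%c:\ %m can pick up. The function names are made up, and the pattern would need adjusting to whatever your build tool actually prints.

```python
import re

# Assumed input shape: optional "[task]" prefix, then file(line,col): message.
COMPILER_LINE = re.compile(
    r"^\s*(?:\[\w+\]\s*)?"
    r"(?P<file>[^()\s][^()]*)"
    r"\((?P<line>\d+),(?P<col>\d+)\):\s*"
    r"(?P<msg>.*)$"
)


def normalize(line):
    """Rewrite one compiler line as file:line:col: message, or return None."""
    match = COMPILER_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return "{file}:{line}:{col}: {msg}".format(**match.groupdict())


def filter_lines(lines):
    """Yield only the lines that could be normalized."""
    for line in lines:
        converted = normalize(line)
        if converted is not None:
            yield converted


sample = ["Buildfile: default.build", "  [csc] Program.cs(12,5): error CS1002: ; expected"]
print(list(filter_lines(sample)))
```

To use it as a real filter you would feed it sys.stdin and print each result, then pipe the build output through the script before Vim sees it.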
According to :help quickfix,
it is also possible to specify (nearly) any Vim supported regular
expression in format strings.
However, the documentation is confusing and I didn't put much time into verifying how well it works and how useful it is. You would still need to use the scanf-like codes to pull out file names, etc.
They are a pain to work with, but to be clear: you can use regular expressions (mostly).
From the docs:
Pattern matching
The scanf()-like "%*[]" notation is supported for backward-compatibility
with previous versions of Vim. However, it is also possible to specify
(nearly) any Vim supported regular expression in format strings.
Since meta characters of the regular expression language can be part of
ordinary matching strings or file names (and therefore internally have to
be escaped), meta symbols have to be written with leading '%':
%\    The single '\' character. Note that this has to be
      escaped ("%\\") in ":set errorformat=" definitions.
%.    The single '.' character.
%#    The single '*'(!) character.
%^    The single '^' character. Note that this is not
      useful, the pattern already matches start of line.
%$    The single '$' character. Note that this is not
      useful, the pattern already matches end of line.
%[    The single '[' character for a [] character range.
%~    The single '~' character.
When using character classes in expressions (see |/\i| for an overview),
terms containing the "\+" quantifier can be written in the scanf() "%*"
notation. Example: "%\\d%\\+" ("\d\+", "any number") is equivalent to "%*\\d".
Important note: The \(...\) grouping of sub-matches can not be used in format
specifications because it is reserved for internal conversions.
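The table above amounts to a character-by-character translation, which can be sketched as follows. This is a naive Python helper of my own (the names regex_to_efm and escape_for_set are hypothetical) that blindly prefixes every listed meta character; it does not know about %f, %l and friends, and it only doubles backslashes for :set, so spaces and commas would still need escaping by hand.

```python
# Mapping taken from the table in the quoted help text.
EFM_META = {
    "\\": "%\\",
    ".": "%.",
    "*": "%#",
    "^": "%^",
    "$": "%$",
    "[": "%[",
    "~": "%~",
}


def regex_to_efm(pattern):
    """Rewrite Vim regex meta characters in their errorformat spelling."""
    return "".join(EFM_META.get(ch, ch) for ch in pattern)


def escape_for_set(efm):
    """Double the backslashes, as needed inside a :set errorformat= command."""
    return efm.replace("\\", "\\\\")


print(regex_to_efm(r".*"))
print(escape_for_set(regex_to_efm(r"\d\+")))
```

The two printed results correspond to the %.%# wildcard mentioned earlier in this thread and to the "%\\d%\\+" example in the help text.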
lol try looking at the actual vim source code sometime. It's a nest of C code so old and obscure you'll think you're on an archaeological dig.
As for why vim uses the C parser, there are plenty of good reasons starting with that it's pretty universal. But the real reason is that sometime in the past 20 years someone wrote it to use the C parser and it works. No one changes what works.
If it doesn't work for you the vim community will tell you to write your own. Stupid open source bastards.