How to match specific letters in words - vim

I'm currently learning Russian, and there is one caveat in the encoding of Cyrillic letters: some look exactly like ASCII letters. For example, the word »облако« (cloud) contains neither an »a« nor an »o«; instead, it contains a »а« and a »о«. If that isn't clear yet, open your browser's search dialog, enter an »a« or an »o«, turn on highlight-all, and you will see that »а« and »о« both remain unhighlighted.
So, now I want to highlight this problem in vim. Since I'm using mixed-language text files, I can't just highlight every ASCII letter (that would be easy). Instead, I want all ASCII letters in all words that contain at least one Cyrillic letter to be error-highlighted. My current approach is to use these matches:
" Here, I use бакло as a shortcut for the list of all Cyrillic letters;
" this makes this a small self-contained example for the word used in the
" problem description, without having the full list in all lines.
" To get the file I actually have, run
" :%s/бакло/ЖжФфЭэЗзЧчБбАаДдВЬвьЪъЫыСсЕеёНнЮюІіКкМмИиЙйПпЛлОоРрЯяГгТтЦцШшЩщХхУу/g
syn match russianWordOk "[бакло]\+"
syn match russianWordError "[бакло][a-zA-Z0-9_]\+"hs=s+1
syn match russianWordError "[a-zA-Z0-9_]\+[бакло]"he=e-1
syn match russianWordError "[бакло][a-zA-Z0-9_]\+[бакло]"hs=s+1,he=e-1
However, like in »облaко« (now a is ASCII), the highlighting would still mark »обл« as valid, »a« as invalid, »к« as not being part of a keyword (it is part of the matching russianWordError keyword) and finally the remaining »о« as valid again. What I want instead is to have the entire word being part of the matching russianWordError keyword but still only the »a« being highlighted as illegal. Is there a way and if yes, how do I accomplish that?

In order to only match whole words, not fragments inside other words, wrap your patterns in \< and \>. These assertions are based on Vim's 'iskeyword' setting, which should be fine here. (Alternatively, you can write other lookbehind and lookahead assertions via \@<= and \@=.)
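As a rough illustration of the lookbehind and lookahead atoms, the following sketch matches ASCII letters only when they directly touch a letter from the бакло shortcut set. The group names demoAfterCyrillic and demoBeforeCyrillic are made up for this example and are not part of the question or the answer.

```vim
" Hypothetical group names, for illustration only.
" ASCII letters directly preceded by a Cyrillic letter; \@<= is zero-width,
" so the Cyrillic letter itself is not part of the match.
syn match demoAfterCyrillic "[бакло]\@<=[a-zA-Z]\+"

" ASCII letters directly followed by a Cyrillic letter; \@= is zero-width,
" so the match ends before the Cyrillic letter.
syn match demoBeforeCyrillic "[a-zA-Z]\+[бакло]\@="
```

Because the assertions are zero-width, no hs= or he= offsets are needed to trim the match.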
syn match russianWordOk "\<[бакло]\+\>"
I would approach the highlighting of the wrong ASCII characters not via hs= / he=, but via a contained group. First, identify the bad mixed words. Each must have at least one Cyrillic letter, either at the beginning or at the end. The rest consists of one or more ASCII characters (hence the \%(...\) group repeated with \+; without it you would only match single-error words), potentially with other Cyrillic letters in between:
syn match russianWordBad "\<\%([бакло]*[a-zA-Z0-9_]\)\+[бакло]\+\>" contains=russianWordError
syn match russianWordBad "\<[бакло]\+\%([a-zA-Z0-9_][бакло]*\)\+\>" contains=russianWordError
This contains the ASCII syntax group that does the error highlighting. Because of contained, it only matches inside another group (here: russianWordBad).
syn match russianWordError "[a-zA-Z0-9_]" contained
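Put together, the rules from this answer could look like the sketch below. The scriptencoding line and the highlight link are additions that the answer does not spell out; they assume the syntax file is saved as UTF-8 and that the built-in Error group is the desired look.

```vim
" Assumption: this file is saved as UTF-8.
scriptencoding utf-8

" Pure Cyrillic words (бакло stands in for the full letter list).
syn match russianWordOk    "\<[бакло]\+\>"

" Mixed words: the whole word is one match, the ASCII parts are contained.
syn match russianWordBad   "\<\%([бакло]*[a-zA-Z0-9_]\)\+[бакло]\+\>" contains=russianWordError
syn match russianWordBad   "\<[бакло]\+\%([a-zA-Z0-9_][бакло]*\)\+\>" contains=russianWordError
syn match russianWordError "[a-zA-Z0-9_]" contained

" Assumption: only the contained group is linked, so the Cyrillic part of a
" bad word keeps its normal appearance and only the ASCII characters stand out.
hi def link russianWordError Error
```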

Related

How do I start a Vim syntax-region at the very start of the document, while also allowing a keyword-match in that same position?

Thanks to another question, I've managed to assign a syntax-region to my document that starts at the very start (\%^) of the document:
syn region menhirDeclarations start=/\%^./rs=e-1 end=/%%/me=s-1
To get that to work, the start pattern had to match the very first character of the document. That is, the above won't work with just start=/\%^/; it needs that last . (The match, when successful, then excludes that character; but it has to actually match before that happens …)
Problem is, any :syn-keyword match at the same location — even one lower in :syn-priority — is going to preempt my above region-match. Basically, this means I can't have any keyword that's allowed to match at the start of the document, or that keyword, when so placed, will prevent the above "whole-top-of-the-document" region from matching at all.
Concrete example. With the following syntax:
syn keyword menhirDeclarationKeyword %parameter %token " ...
syn region menhirDeclarations start=/\%^./rs=e-1 end=/%%/me=s-1
… the document …
%token <blah> blah
blah blah blah
… will not contain the requisite menhirDeclarations region, because menhirDeclarationKeyword matched at the very first character, consuming it, and preventing menhirDeclarations from matching.
I can bypass this by declaring everything in the syntax definition as a :syn-match or :syn-region, and defining the above region very last … but that's probably a performance problem, and more importantly, really difficult to manage.
tl;dr: Is there any way to match a region at the very start of the document, and allow keywords to match at the same location?
To keep the keywords, you have to make them contained. Else, Vim's syntax rules will always give them precedence, and they won't allow your region(s) to match. If I remember correctly from your last question, the whole document is parsed as a sequence of different regions; that would be great. Else, you have to create new regions or matches for those parts of the document that are not yet covered, but may also contain keywords.
syn keyword menhirDeclarationKeyword contained %parameter %token
syn region menhirDeclarations start=/\%^%token\>/rs=e-1 end=/%%/me=s-1 contains=menhirDeclarationKeyword
If that isn't feasible, you indeed have to use :syntax match instead:
syn match menhirDeclarationKeyword "\<%token\>"
Don't assume that this will be slower; measure it via :help :syntime, on various complex input files.
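A typical measuring session might look like the following sketch. It assumes a Vim build with the +profile feature, which :syntime requires.

```vim
" Start collecting timing data for syntax patterns.
:syntime on
" ... now scroll through a large sample file, or force a redraw with CTRL-L ...
" Show how much time each syntax pattern took.
:syntime report
" Stop collecting.
:syntime off
```

Comparing the reports for the keyword-based and the match-based variants on the same file shows whether the difference matters in practice.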
The "difficult to manage" part can be addressed via Vimscript metaprogramming. For example, you can keep all keywords in a list and build the definitions dynamically with a loop:
for s:keyword in ['parameter', 'token']
  execute printf('syntax match menhirDeclarationKeyword "\<%%%s\>"', s:keyword)
endfor

How to exclude capitalized words from spell checking in Vim?

There are too many acronyms and proper nouns to add to the dictionary. I would like any word that contains a capital letter to be excluded from spell checking. Words are delimited by either whitespace or special characters (i.e., non-alphabetic characters). Is this possible?
The first part of the answer fails when the lowercase and special characters surround the capitalized word:
,jQuery,
, iPad,
/demoMRdogood/
[CSS](css)
`appendTo()`,
The current answer gives false positives (excludes from the spellcheck) when the lowercase words are delimited by a special character. Here are the examples:
(async)
leetcode, eulerproject,
The bounty is for the person who fixes this problem.
You can try this command:
:syn match myExCapitalWords +\<[A-Z]\w*\>+ contains=@NoSpell
The above command instructs Vim to handle every pattern described by \<[A-Z]\w*\> as part of the @NoSpell cluster. Items of the @NoSpell cluster aren't spell checked.
If you further want to exclude all words from spell checking that contain at least one non-alphabetic character you can invoke the following command:
:syn match myExNonWords +\<\p*[^A-Za-z \t]\p*\>+ contains=@NoSpell
Type :h spell-syntax for more information.
Here is the solution that worked for me. This passes the cases I mentioned in the question:
syn match myExCapitalWords +\<\w*[A-Z]\K*\>+ contains=@NoSpell
Here is an alternative solution that uses \S instead of \K. It also excludes characters that are in parentheses and preceded by a capital letter. Since it is more lenient, it works better for URLs:
syn match myExCapitalWords +\<\w*[A-Z]\S*\>+ contains=@NoSpell
Exclude "'s" from the spellcheck
An s after an apostrophe is still flagged as misspelled regardless of the solution above. A quick fix is to add s to your dictionary, or to add a case for it:
syn match myExCapitalWords +\<\w*[A-Z]\K*\>\|'s+ contains=@NoSpell
This was not part of the question, but it is a common case when spell checking, so I mention it here.

What vim pattern matches a number which ends with dot?

In PDP11/40 assembly language, a number that ends with a dot is interpreted as a decimal number.
I use the following pattern but fail to match that notation, for example, 8.:
syn match asmpdp11DecNumber /\<[0-9]\+\.\>/
When I replace \. with D the pattern can match 8D without any problem. Could anyone tell me what is wrong with my "end-with-dot" pattern? Thanks.
Your regular expression syntax is fine (well, you can use \d instead of [0-9]), but your 'iskeyword' value does not include the period ., so you cannot match the end-of-word (\>) after it.
It looks like you're writing a syntax for a custom filetype. One option is to
:setlocal iskeyword+=.
in a corresponding ~/.vim/ftplugin/asmpdp11.vim filetype plugin. Do this when the period character is considered a keyword character in your syntax.
Otherwise, drop the \> to make the regular expression match. If you want to ensure that there's no non-whitespace character after the period, you can assert that condition after the match, e.g. like this:
:syn match asmpdp11DecNumber /\<\d\+\.\S\@!/
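For the filetype-plugin route, a minimal sketch could look like this. The undo line is an addition not mentioned in the answer, and it assumes no other plugin has already set b:undo_ftplugin for this filetype.

```vim
" ~/.vim/ftplugin/asmpdp11.vim
" Treat the period as a keyword character, so \> can match right after it.
setlocal iskeyword+=.

" Assumption: nothing else has set b:undo_ftplugin for this filetype.
" Restores the global 'iskeyword' value when the filetype changes.
let b:undo_ftplugin = 'setlocal iskeyword<'
```

Keep in mind that 'iskeyword' also affects word motions and commands such as *, so this only makes sense if the period really belongs to words in this language.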
Note that a word is defined by vim as:
A word consists of a sequence of letters, digits and underscores, or a
sequence of other non-blank characters, separated with white space
(spaces, tabs, <EOL>). This can be changed with the 'iskeyword'
option. An empty line is also considered to be a word.
so your pattern works fine if whitespace follows the number. You may want to skip the \>.
I think the problem is your end-of-word boundary marker. Try this:
syn match asmpdp11DecNumber /\<[0-9]\+\./
Note that I have removed the \> end-of-word boundary. I'm not sure what that was in there for, but it appears to work if you remove it. A . is not considered part of a word, which is why your version fails.

How to change word recognition in vim spell?

I like that vim 7.0 supports spell checking via :set spell, and I like that it by default only checks comments and text strings in my C code. But I wanted to find a way to change the behavior so that vim will know that when I write words containing underscores, I don't want that word spell checked.
The problem is that I often will refer to variable or function names in my comments, and so right now vim thinks that each piece of text that isn't a complete correct word is a spelling error. Eg.
/* The variable proj_abc_ptr is used in function do_func_stuff' */
Most of the time, the pieces separated by underscores are complete words, but other times they are abbreviations that I would prefer not to add to a word list. Is there any global way to tell vim to include _'s as part of the word when spell checking?
Here are some more general spell-checking exception rules to put in .vim/after/syntax/{LANG}.vim files:
" Disable spell-checking of bizarre words:
" - Mixed alpha / numeric
" - Mixed case (starting upper) / All upper
" - Mixed case (starting lower)
" - Contains strange character
syn match spellingException "\<\w*\d[\d\w]*\>" transparent contained containedin=pythonComment,python.*String contains=@NoSpell
syn match spellingException "\<\(\u\l*\)\{2,}\>" transparent contained containedin=pythonComment,python.*String contains=@NoSpell
syn match spellingException "\<\(\l\+\u\+\)\+\l*\>" transparent contained containedin=pythonComment,python.*String contains=@NoSpell
syn match spellingException "\S*[/\\_`]\S*" transparent contained containedin=pythonComment,python.*String contains=@NoSpell
Change pythonComment,python.*String for your language.
transparent means that the match inherits its highlighting properties from the containing block (i.e. these rules do not change the way text is displayed).
contained prevents these matches from extending past the containing block (the last rule ends with \S* which would likely match past the end of a block)
containedin holds a list of existing syntax groups to add these new rules to.
contains=@NoSpell overrides any and all inherited groups, thus telling the spellchecker to skip the matched text.
You'll need to move it into its own group. Something like this:
hi link cCommentUnderscore cComment
syn match cCommentUnderscore display '\k\+_\w\+'
syn cluster cCommentGroup add=cCommentUnderscore
In some highlighters you may need contains=@NoSpell on the end of the match line, but in C, the default is @NoSpell, so it should be fine like that.

Sub-match syntax highlighting in Vim

First, I'll show the specific problem I'm having, but I think the problem can be generalized.
I'm working with a language that has explicit parenthesis syntax (like Lisp), but has keywords that are only reserved against the left paren. Example:
(key key)
The former is a reserved word, but the latter is a reference to the variable named "key".
Unfortunately, I find highlighting the left paren annoying, so I end up using
syn keyword classification key
instead of
syn keyword classification (key
but the former triggers on the variable uses as well.
I'd take a hack to get around my problem, but I'd be more interested in a general method to highlight just a subset of a given match.
Using syn keyword alone for this situation doesn't work right because you want your highlighting to be more aware of the surrounding syntax. A combination of syn region, syn match, and syn keyword works well.
hi link lispFuncs Function
hi link lispFunc Identifier
hi link sExpr Statement
syn keyword lispFuncs key foo bar contained containedin=lispFunc
syn match lispFunc "(\@<=\w\+" contained containedin=sExpr contains=lispFuncs
syn region sExpr matchgroup=Special start="(" end=")" contains=sExpr,lispFuncs
The above will only highlight key, foo, and bar using the Function highlight group, only if they're also matched by lispFunc.
If there are any words other than key, foo, and bar which come after a (, they will be highlighted using the Identifier highlight group. This allows you to distinguish between standard function names and user-created ones.
The ( and ) will be highlighted using the Special highlight group, and anything inside the () past the first word will be highlighted using the Statement highlight group.
There does appear to be some capability for layered highlighting, as seen here: Highlighting matches in Vim over an inverted pattern
which gives ex commands
:match myBaseHighlight /foo/
:2match myGroup /./
I haven't been able to get anything like that to work in my syntax files, though. I tried something like:
syn match Keyword "(key"
syn match Normal "("
The whole match is highlighted as either Normal or Keyword, depending on which rule gets picked up first (which changes with their order in the file).
Vim soundly rejected using "2match" as a keyword after "syn".

Resources