Could someone explain a particular use case of ‘foldexpr’ syntax in Vim? - vim

Could anyone please provide an explanation of the syntax in the following example, or post me a link where there is a more general explanation of the individual symbols used in this expression? I found Vim help to be incomplete in this regard.
:set foldexpr=getline(v:lnum)=~'^\\s*$'&&getline(v:lnum+1)=~'\\S'?'<1':1
What is unclear to me is the following.
Why are strings enclosed in single quotes instead of double quotes? Is it a matter of choice?
What does the question mark mean in =~'\\S'?'<1':1?
What does the expression 'string1'string2'string3 mean?
What does :1 mean?

The foldexpr option is supposed to contain an expression that evaluates
to an integer or a string of a particular format that specifies the folding
level of the line whose number is stored in the v:lnum variable
at the moment of evaluation.
Let us follow the logic of this foldexpr example from top to bottom.
getline(v:lnum)=~'^\\s*$'&&getline(v:lnum+1)=~'\\S'?'<1':1
At the top level, the whole expression is an instance of the ternary
operator A ? B : C. The result of the operator is the value of the
B expression if A evaluates to non-zero, and the value of the
C expression otherwise (see :help expr1). In this case, B is
the string literal '<1', and C is the number 1 (for the meaning
of '<1' and 1 as fold level specifiers see :help fold-expr).
The A expression consists of two conditions joined by the
&& operator:
getline(v:lnum) =~ '^\\s*$' && getline(v:lnum+1) =~ '\\S'
Both conditions have the same form:
getline(N) =~ S
The getline function returns the contents of the line (in the current
buffer) that is referenced by the line number passed as an argument
(see :help getline). When the foldexpr is evaluated, the v:lnum
variable contains the number of the line for which the folding level
should be calculated.
The =~ operator tests whether its left operand matches a regular
expression given by its right string operand, and returns a boolean value,
1 or 0 (see :help expr4, in particular, near the end of the expr4 section).
Thus, the A condition is intended to check that the v:lnum-th line
matches the '^\\s*$' pattern, and the line following that v:lnum-th
line matches the '\\S' pattern.
The regular expression patterns are specified in the expression as
string literals. String literals have two syntactic forms and can be
quoted using double or single quotes. The difference between these
forms is that a double-quoted string can contain various control
sequences that start with a backslash. Those sequences make it possible
to specify special characters that cannot be easily typed otherwise (a
double quote, for example, is written \"). Single-quoted strings, on the
other hand, do not allow such backslash sequences. (For a complete
description of single- and double-quoted strings see :help expr-string
and :help literal-string.)
A notable consequence of the double-quoted string syntax is that
the backslash symbol itself must be escaped (\\). That is why single-quoted
strings are often used to specify regular expressions: there
is no need to escape the constantly needed backslash symbol. One can
notice, though, that backslashes are nevertheless escaped in the
patterns above. This is because some symbols (including the backslash)
have a special meaning in Ex commands (including :set, of
course). When you hit Enter to start the command
:set foldexpr=...
Vim interprets some character sequences first (see :help cmdline-special
and, for :set specifically, :help option-backslash).
In particular, the \\ sequence is treated as a single backslash.
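The quoting rules described above can be checked directly. The following is only a small sketch: the variable name g:fold_from_set is made up for illustration, and the :let form is an alternative way of setting the option that the original command does not use.

```vim
" Double quotes process backslash escapes; single quotes keep them literally.
" These three lines echo 2, 3 and 2.
echo len("\\S")
echo len('\\S')
echo len('\S')

" :set removes one level of backslashes, so the doubled backslashes are
" needed here.
setlocal foldexpr=getline(v:lnum)=~'^\\s*$'&&getline(v:lnum+1)=~'\\S'?'<1':1
" g:fold_from_set is a hypothetical name, used only for the comparison below.
let g:fold_from_set = &l:foldexpr

" With :let the value is an ordinary string expression, so no doubling is
" needed, but each embedded single quote is written twice.
let &l:foldexpr = 'getline(v:lnum)=~''^\s*$''&&getline(v:lnum+1)=~''\S''?''<1'':1'

" Both ways are expected to store the same option value.
echo g:fold_from_set ==# &l:foldexpr
```

Running :setlocal foldexpr? afterwards shows the stored value with single backslashes.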
Putting it all together, the expression tests whether line number
v:lnum contains only blank characters and whether the following line
(number v:lnum+1) has any non-blank character (see :help pattern
to grasp the meaning of the patterns). If so, the expression evaluates
to the string '<1', otherwise it evaluates to the number 1.
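For readability, the same logic can be moved into a function. This is a minimal sketch, not part of the original command: the function name BlankLineFold and its argument are made up for illustration. Inside a function the patterns need only single backslashes, because no :set escaping is involved, and =~# is used so the match does not depend on 'ignorecase'.

```vim
" Hypothetical helper with the same logic as the one-line foldexpr.
function! BlankLineFold(lnum) abort
  " Is the current line empty or whitespace only?
  let l:this_blank = getline(a:lnum) =~# '^\s*$'
  " Does the next line contain at least one non-blank character?
  let l:next_nonblank = getline(a:lnum + 1) =~# '\S'
  " '<1' ends a level-1 fold at this line; 1 means fold level 1.
  return (l:this_blank && l:next_nonblank) ? '<1' : 1
endfunction

setlocal foldmethod=expr
setlocal foldexpr=BlankLineFold(v:lnum)
```

Note that foldexpr only takes effect when 'foldmethod' is set to expr.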

Related

vim: replace sub-match with the same number of new strings

My plan is to do a pretty standard search replace, replacing all instances of old_string with new_string. The problem is that I only want to do this for an arbitrary number of old_strings following a specific prefix. So for example:
old_string = "a"
new_string = "b"
prefix = "xxxx"
xxxxaaaaaaaa => xxxxbbbbbbbb
xxxxaaapostfix => xxxxbbbpostfix
xxaaaa => xxaaaa
etc. I'm not sure how to do this. I imagine there's some way to say s/xxxxa*/xxxxb{number of a's}/g or something, but I have no idea what it is.
You can definitely do this! I would use \= in the replacement to evaluate a Vim script expression. From :h s/\=:
Substitute with an expression *sub-replace-expression*
*sub-replace-\=* *s/\=*
When the substitute string starts with "\=" the remainder is interpreted as an
expression.
The special meaning for characters as mentioned at |sub-replace-special| does
not apply except for "<CR>". A <NL> character is used as a line break, you
can get one with a double-quote string: "\n". Prepend a backslash to get a
real <NL> character (which will be a NUL in the file).
Then you can use the repeat and submatch functions to build the right string. For example:
:%s/\(xxxx\)\(a\+\)/\=submatch(1).repeat('b', len(submatch(2)))
I chose to use \+ instead of * because then the pattern will not be found after the substitute command has finished (this affects hlsearch and n).
Of course, if you use the \zs and \ze (start/end of match) atoms, you can use fewer capturing groups, which makes this waaay shorter and clearer.
:%s/xxxx\zsa\+/\=repeat('b', len(submatch(0)))
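The same replacement can be tried on plain strings with the substitute() function, which also accepts a \= expression. This is only a sketch for experimenting outside a buffer; the function name ReplaceAfterPrefix is made up, and the prefix and letters are taken from the question.

```vim
" Hypothetical helper: replace each run of 'a' that directly follows
" 'xxxx' with the same number of 'b' characters.
function! ReplaceAfterPrefix(text) abort
  " \zs starts the match after the prefix, so submatch(0) is only the a's.
  return substitute(a:text, 'xxxx\zsa\+',
        \ '\=repeat("b", len(submatch(0)))', 'g')
endfunction

echo ReplaceAfterPrefix('xxxxaaaaaaaa')
echo ReplaceAfterPrefix('xxxxaaapostfix')
echo ReplaceAfterPrefix('xxaaaa')
```

The three calls use the sample inputs from the question; the last one has no full prefix and is left unchanged.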
If you have perl support, you can use
:%perldo s/xxxx\Ka+/"b" x length($&)/ge
xxxx\Ka+ matches one or more a only if preceded by xxxx
\K acts as a lookbehind: everything matched before it is kept out of the match
/ge: g replaces all occurrences in the line, e allows Perl code in the replacement section
"b" x length($&) is the string b repeated length($&) times
See :h perl for more info

What vim pattern matches a number which ends with dot?

In PDP-11/40 assembly language, a number that ends with a dot is interpreted as a decimal number.
I use the following pattern but fail to match that notation, for example, 8.:
syn match asmpdp11DecNumber /\<[0-9]\+\.\>/
When I replace \. with D the pattern can match 8D without any problem. Could anyone tell me what is wrong with my "end-with-dot" pattern? Thanks.
Your regular expression syntax is fine (well, you can use \d instead of [0-9]), but your 'iskeyword' value does not include the period ., so you cannot match the end-of-word (\>) after it.
It looks like you're writing a syntax for a custom filetype. One option is to
:setlocal iskeyword+=.
in a corresponding ~/.vim/ftplugin/asmpdp11.vim filetype plugin. Do this when the period character is considered a keyword character in your syntax.
Otherwise, drop the \> to make the regular expression match. If you want to ensure that there's no non-whitespace character after the period, you can assert that condition after the match, e.g. like this:
:syn match asmpdp11DecNumber /\<\d\+\.\S\@!/
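The effect of the end-of-word atom and of the negative look-ahead \@! can be checked on plain strings with the =~# operator. This is a sketch under the assumption that 'iskeyword' still has a value without the period (the default); the function name IsDecNumber is made up for illustration.

```vim
" Hypothetical helper: does the text contain digits followed by a period
" that is not followed by a non-whitespace character?
function! IsDecNumber(text) abort
  return a:text =~# '\<\d\+\.\S\@!'
endfunction

" With a period outside 'iskeyword', \> cannot match after the period,
" so this echoes 0.
echo '8.' =~# '\<\d\+\.\>'

" The look-ahead version accepts the number at the end of the text or
" before white space, and rejects it when another character follows.
echo IsDecNumber('8.')
echo IsDecNumber('mov 8. r0')
echo IsDecNumber('8.x')
```

String matching with =~# uses the 'iskeyword' value of the current buffer, so the first result changes once the period is added to that option.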
Note that a word is defined by vim as:
A word consists of a sequence of letters, digits and underscores, or a
sequence of other non-blank characters, separated with white space
(spaces, tabs, <EOL>). This can be changed with the 'iskeyword'
option. An empty line is also considered to be a word.
so your pattern works fine if whitespace follows the number. You may want to skip the \>.
I think the problem is your end-of-word boundary marker. Try this:
syn match asmpdp11DecNumber /\<[0-9]\+\./
Note that I have removed the \> end-of-word boundary. I'm not sure what that was in there for, but it appears to work if you remove it. A . is not considered part of a word, which is why your version fails.

How to repeat a substitution the number of times the search word occurs in a row in a substitution command in Vim?

I would like to use tabs in a code that doesn’t use them. What I did until now to implement tabs was pretty handcrafty:
:%s/^  /\t/g
:%s/^\t  /\t\t/g
. . .
Question: Is there a way to replace two spaces (  ) by tab (\t) the number of times it was found at the beginning of a line?
There are (at least) three substitution techniques relevant to this case.
1. The first one takes advantage of the preceding-atom matching
syntax to naturally define a step of indentation. According to the
question statement, an indent step is a pair of adjacent space
characters preceded with nothing but spaces from the beginning
of line. Following this definition, one can construct the actual
substitution pattern, right to left:
:%s/\%(^ *\)\@<=  /\t/g
Indeed, the pattern designates an occurrence of two literal space
characters, but only when they are preceded by a zero-width match
of the atom just before \@<=, which is the pattern ^ * wrapped in
grouping parentheses \%(, \). These non-capturing parentheses are
used instead of the usual capturing ones, \(, \), since there is no
need to refer later to the matched string of leading spaces. Due
to the g flag, the above :substitute command runs through the
leading spaces pair by pair, and replaces each pair with a single tab
character.
2. The second technique takes a different approach. Instead of
matching separate indent levels, one can break each of the lines
starting with space characters down into two lines: one containing
the indenting spaces of the original line, and another holding the
rest of it. After that, it is straightforward to replace all of the pairs
of spaces on the first line, and concatenate the lines back together:
:g/^ /s/^ \+/&\r/|-s/  /\t/g|j!
3. The third idea is to process leading spaces by means of the Vim
scripting language. A convenient way of doing that is to use the
substitute-with-an-expression feature of the :substitute command
(see :help sub-replace-\=). When the substitute string starts with \=,
each match of the pattern is replaced with the result of evaluating
the expression specified after \=:
:%s#^ \+#\=repeat("\t",len(submatch(0))/2)
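The third technique can be tried on a single string with substitute(), which accepts the same \= expression. This is just a sketch; the function name LeadingPairsToTabs is made up for illustration.

```vim
" Hypothetical helper: turn each pair of leading spaces into one tab.
function! LeadingPairsToTabs(line) abort
  " The whole run of leading spaces is one match; its length divided by
  " two (integer division) gives the number of tabs to insert.
  return substitute(a:line, '^ \+',
        \ '\=repeat("\t", len(submatch(0)) / 2)', '')
endfunction

" Four leading spaces are expected to become two tabs.
echo LeadingPairsToTabs('    foo') ==# "\t\tfoo"
```

One thing to keep in mind with this approach: the whole run of leading spaces is replaced, so with an odd number of spaces the leftover single space is dropped rather than kept.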
If you specifically want to convert spaces into tabs (or vice-versa) at the start of a line, there's the useful :retab command which takes care of that. For example:
:retab! 2 will convert spaces in groups of two to tabs
:set expandtab and then :retab! 2 will convert tabstops (of width 2) back to spaces
See :h :retab (and :h 'ts') for the details.
This is not a general solution for the original problem, but I think it covers the most common use case.
There is no general way of doing this using :s regex's. You can't make the /g modifier look backwards otherwise it'd be unusable, and you can't reliably check that you're at the beginning of the line without looking backwards.
The only way of doing it generally is to loop, like so:
:for i in range(100)
: %s/^\t*\zs  /\t/e
:endfor
Which is ugly, slow and highly unrecommended. Use :retab

Deleting duplicate values using find and replace in a text editor

I messed something up. In my xml, each non preferred term has a preferred term to use:
Something I have done has created some non preferred terms where the preferred term to use has the exact same name as the non preferred term itself.
<term>
<termId>127699289611384833453kNgWuDxZEK37Lo4QVWZ</termId>
<termUpdate>Add</termUpdate>
<termName>Adenosquamous Carcinoma</termName>
<termType>Nd</termType>
<termStatus>Active</termStatus>
<termApproval>Approved</termApproval>
<termCreatedDate>20110704T09:41:31</termCreatedDate>
<termCreatedBy>admin</termCreatedBy>
<termModifiedDate>20110704T09:45:17</termModifiedDate>
<termModifiedBy>admin</termModifiedBy>
<relation>
<relationType>USE</relationType>
<termId>1276992897N1537166632rbr7BISWAI93SarY118G</termId>
<termName>Adenosquamous Carcinoma</termName>
</relation>
Is there a text editor with a find and replace function I can use to tell it that if the <termName> in the <relation> equals the <termName> of the actual term, to just delete the whole <term>? I looked at the related queries and they mentioned regular expressions, but I've spent ages trying to build them and they are beyond me,
thanks!
It is nearly 3 years too late answering this question, but there are Perl regular expressions which can be indeed used for this task.
Finding and deleting a term block containing same termName in relation as defined above for the term itself is possible with UltraEdit for Windows v21.10.0.1032 and most likely also with other text editors supporting Perl regular expression using a case-sensitive Perl regular expression Replace with search string:
^[ \t]*<term>(?:(?!</term>)[\S\s])+<termName>([^\r\n]+?)</termName>(?:(?!</term>)[\S\s])+<relation>(?:(?!</term>)[\S\s])+<termName>\1</termName>(?:(?!</term>)[\S\s])+</term>[ \t\r]*\n
The replace string is an empty string.
Explanation:
^ ... start every search at beginning of a line.
[ \t]* ... there can be 0 or more spaces or tabs at beginning of the line.
<term> ... this string must be found next on the line.
Next the tricky expression follows which is required to match any character up to next string of interest, but with avoiding matching something in next term block if the remaining expression does not return a positive result on current term block.
(?:(?!</term>)[\S\s])+ ... this expression finds any character because of [\S\s] matching any non whitespace character or any whitespace character. There must be at least 1 character before next fixed string because of the +, but it can be also more characters. Additionally the Perl regular expression must make look ahead on every character matched to check if NOT </term> follows. If right of the currently matched character there is the string </term>, the Perl regexp engine must stop matching any character at current position in stream and continue with next part of the search string. So this expression can match any character, but not beyond </term> and therefore only characters between <term> and </term>. Because of ?: nothing is captured/marked for back referencing by this expression.
<termName> ... this fixed string within a term block must be found next.
([^\r\n]+?) ... matches the characters of the name of the term and captures/marks this string for back referencing. Instead of the negative character class expression [^\r\n], it would be also possible to use another class definition, or just . if a dot does not match new line characters. Also possible would be ([^<]+) if it is not possible that an unencoded opening angle bracket is part of the term name. Character < must be encoded with &lt; according to the XML specification within an element's value except within a CDATA block.
</termName> ... this fixed string within a term block must be found next.
(?:(?!</term>)[\S\s])+ ... again any character within a term block up to next fixed string.
<relation> ... this fixed string within a term block must be found next.
(?:(?!</term>)[\S\s])+ ... again any character within a term block up to next fixed string.
<termName> ... this fixed string within a term block must be found next.
\1 ... this expression back references the captured/marked term name and therefore the next string must be the same as the name of the term defined above.
</termName> ... this fixed string within a term block must be found next.
(?:(?!</term>)[\S\s])+ ... again any character within a term block up to next fixed string.
</term> ... this fixed string marking end of a term block must be found next.
[ \t\r]*\n ... matches 0 or more spaces, tabs and carriage returns and next a line-feed. So this expression works for a DOS/Windows (CR+LF) and a Unix (only LF) text file.
Also possible with UltraEdit is:
(?s)^[ \t]*<term>(?:(?!</term>).)+<termName>([^<]+?)</termName>(?:(?!</term>).)+<relation>(?:(?!</term>).)+<termName>\1</termName>(?:(?!</term>).)+</term>[ \t\r]*\n
(?s) ... this expression at beginning of the search string changes the behavior of . from matching any character except line terminators to really any character and therefore . is now like [\S\s].

What [...] in a bash script stands for?

I am reading this tutorial, and I encountered that the bash script uses [...] as a wildcard character. So what exactly does [...] stand for in a bash script?
It's a regex-style character matching syntax; from the Bash Reference Manual, §3.5.8.1 (Pattern Matching):
[...]
Matches any one of the enclosed characters. A pair of characters separated by a hyphen denotes a range expression; any character that sorts between those two characters, inclusive, using the current locale's collating sequence and character set, is matched. If the first character following the ‘[’ is a ‘!’ or a ‘^’ then any character not enclosed is matched. A ‘-’ may be matched by including it as the first or last character in the set. A ‘]’ may be matched by including it as the first character in the set. The sorting order of characters in range expressions is determined by the current locale and the value of the LC_COLLATE shell variable, if set.
For example, in the default C locale, ‘[a-dx-z]’ is equivalent to ‘[abcdxyz]’. Many locales sort characters in dictionary order, and in these locales ‘[a-dx-z]’ is typically not equivalent to ‘[abcdxyz]’; it might be equivalent to ‘[aBbCcDdxXyYz]’, for example. To obtain the traditional interpretation of ranges in bracket expressions, you can force the use of the C locale by setting the LC_COLLATE or LC_ALL environment variable to the value ‘C’.
Within ‘[’ and ‘]’, character classes can be specified using the syntax [:class:], where class is one of the following classes defined in the POSIX standard:
alnum alpha ascii blank cntrl digit graph lower
print punct space upper word xdigit
A character class matches any character belonging to that class. The word character class matches letters, digits, and the character ‘_’.
Within ‘[’ and ‘]’, an equivalence class can be specified using the syntax [=c=], which matches all characters with the same collation weight (as defined by the current locale) as the character c.
Within ‘[’ and ‘]’, the syntax [.symbol.] matches the collating symbol symbol.
(emphasis added to the most common usage patterns)
It is used in the tutorial to speak about regular expressions in addition to globbing ('*' and '?'). For example [a-z] regular expression will match one lowercase character.
Actually, what is a wildcard is [abc] for example. It matches one of the three letters.

Resources