I have text like this (1 or 0 tab + multiple whitespaces at line beginning):
(tab) There are a tab and 4 whitespaces before me. // line 1
(tab) There are a tab and 6 whitespaces before me. // line 2
There are 6 whitespaces before me. // line 3
There are 4 whitespaces before me. // line 4
When I use ^[\t\s]\s*, only lines 1 and 2 are matched; lines 3 and 4 are not. Why?
(When i use ^\s*, line 3 and 4 can be matched.)
Thanks!
It turns out that you cannot use \s to match whitespace within [].
Just use a literal space character to match it within [].
That is interesting. I'm not sure why the \s doesn't work inside of [] brackets. Perhaps it is because [] defines explicit characters and \s is ambiguous (it can stand for multiple characters). In other words \s stands for any whitespace, including a tab(\t). However, if you explicitly specify a space in this case (^[\t ]\s*) it will work.
As noted \s doesn't work within [], alternatively you could use the [:blank:] character class:
^[[:blank:]]\+
I have a file which is as following
!J INCé0001438823
#1 A LIFESAFER HOLDINGS, INC.é0001509607
#1 ARIZONA DISCOUNT PROPERTIES LLCé0001457512
#1 PAINTBALL CORPé0001433777
$ LLCé0001427189
$AVY, INC.é0001655250
& S MEDIA GROUP LLCé0001447162
I just want to keep the last 10 characters of each line so that it becomes as following:-
0001438823
0001509607
0001457512
0001433777
0001427189
0001655250
:%s/.*\(.\{10\}\)/\1
: ex command
% entire file
s/ substitute
.* anything (greedy)
. followed by any character
\{10\} exactly 10 of them
\( \) put them in a match group
/ replace with
\1 said match group
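For comparison outside the editor, here is a minimal Python sketch of the same idea. It uses Python's re module rather than Vim's regex dialect, and the names lines, by_slice and by_regex are made up for illustration. The sample lines are taken from the question.

```python
import re

# A few of the sample lines from the question.
lines = [
    "!J INCé0001438823",
    "#1 A LIFESAFER HOLDINGS, INC.é0001509607",
    "$AVY, INC.é0001655250",
]

# Plain slicing: keep the last 10 characters of each line.
by_slice = [line[-10:] for line in lines]

# Regex analogue of :%s/.*\(.\{10\}\)/\1
# Greedy .* swallows everything except the final 10 characters,
# which are captured in group 1 and used as the replacement.
by_regex = [re.sub(r".*(.{10})", r"\1", line) for line in lines]

print(by_slice)
print(by_regex)
```

Both approaches assume every line has at least 10 characters; shorter lines are left untouched by the regex and returned whole by the slice.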
I would treat this as a shell script problem. Enter the following in vim:
:%! rev|cut -c1-10|rev
The :%! pipes the entire buffer through the shell filter that follows it: rev reverses each line, cut -c1-10 keeps the first 10 characters of the reversed line, and the second rev restores the original order.
for a single line you could use:
$9hd0
$ go to end of line
9h go 9 characters left
d0 delete to beginning of line
Assuming the é character appears only once in a line, and only before your target ten digits, then this would seem to work:
:% s/^.*é//
: command
% all lines
s/ / substitute (i.e., search-and-replace) the stuff between / and /
^ search from beginning of line,
. including any character (wildcard),
* any number of the preceding character,
é finding "é";
// replace with the stuff between / and / (i.e., nothing)
Note that you can type the é character by using ctrl-k e' (control-k, then e, then apostrophe, without spaces). On my system at least, this works in insert mode and when typing the "substitute" command. (To see the list of characters you can invoke with the ctrl-k "digraph" feature, use :dig or :digraph.)
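The same "delete everything up to the é" idea can be sketched in Python. This is an illustration with Python's re module, not Vim's engine, and the helper name strip_prefix is hypothetical. Like the Vim command, it relies on the assumption that é appears only before the target digits.

```python
import re

def strip_prefix(line: str) -> str:
    """Remove everything from the start of the line through the last 'é'."""
    # Greedy .* means the match ends at the LAST é on the line.
    return re.sub(r"^.*é", "", line)

lines = [
    "#1 PAINTBALL CORPé0001433777",
    "$ LLCé0001427189",
]

for line in lines:
    print(strip_prefix(line))

# Without a regex, str.rpartition splits on the last occurrence of é.
no_regex = [line.rpartition("é")[2] for line in lines]
```

A line without any é is returned unchanged by both variants.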
line A
foo bar bar foo bar foo
line B
foo bar bar foo
In line A, there are multiple occurrences of a double space.
I only want to match lines like line B, which have exactly one occurrence of a double space.
I tried
^.*\s{2}.*$
but it will match both.
How may I have the desired output? Thank you.
If you wish to match strings that contain no more than one string of two or more spaces between words you could use following regular expression.
r'^(?!(?:.*(?<! ) {2,}(?! )){2})'
Note that this expression matches
abc    de fgh
where there are four spaces between 'c' and 'd'.
Python's regex engine performs the following operations.
^ : match beginning of string
(?! : begin negative lookahead
(?: : begin non-capture group
.* : match 0+ characters other than line terminators
(?<! ) : negative lookbehind asserts match is not preceded by a space
[ ]{2,} : match 2+ spaces
(?! ) : negative lookahead asserts match is not followed by a space
) : end non-capture group
{2} : execute non-capture group twice
) : end negative lookahead
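A short way to check this behaviour is to run the pattern against a few strings. The sketch below uses the expression exactly as quoted; the helper name at_most_one_gap and the sample strings are invented for illustration.

```python
import re

# The expression from the answer, unchanged.
AT_MOST_ONE_GAP = re.compile(r'^(?!(?:.*(?<! ) {2,}(?! )){2})')

def at_most_one_gap(line: str) -> bool:
    """True when the line contains no more than one run of 2+ spaces."""
    return AT_MOST_ONE_GAP.match(line) is not None

samples = [
    "foo  bar bar foo",   # one run of two spaces
    "foo  bar  bar foo",  # two runs of two spaces
    "abc    de fgh",      # one run of four spaces
    "foo bar",            # no run at all
]

for s in samples:
    print(repr(s), at_most_one_gap(s))
```

Note that a line with no double space at all also passes, because the expression only rejects lines with two or more runs.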
You can do:
^(?!.*[ \t]{2,}.*[ \t]{2,})
# Negative look ahead assertion that states 'only start the match
# on this line IF there are NOT 2 (or potentially more) breaks with
# two (or potentially more) of tabs or spaces'.
If you want to require ONE double space in the line but not more:
^(?=.*[ \t]{2,})(?!.*[ \t]{2,}.*[ \t]{2,})
# Positive look ahead that states 'only start this match if there is
# at least one break with two tabs or spaces'
# BUT
# Negative look ahead assertion that states 'only start the match
# on this line IF there are NOT 2 (or potentially more) breaks with
# two (or potentially more) of tabs or spaces'.
If you want to limit to only two spaces (not tabs and not more than 2 spaces):
^(?=.*[ ]{2})(?!.*[ ]{2}.*[ ]{2})
# Same as above but remove the tabs as part of the assertion
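The three variants can be compared side by side in Python. This is only a sketch: the dictionary keys and the helper name check are made up here, while the patterns are copied from above.

```python
import re

PATTERNS = {
    # Reject lines with two or more breaks of 2+ spaces/tabs.
    "at_most_one": re.compile(r'^(?!.*[ \t]{2,}.*[ \t]{2,})'),
    # Additionally require at least one such break.
    "exactly_one": re.compile(r'^(?=.*[ \t]{2,})(?!.*[ \t]{2,}.*[ \t]{2,})'),
    # Spaces only, and only runs of two.
    "spaces_only": re.compile(r'^(?=.*[ ]{2})(?!.*[ ]{2}.*[ ]{2})'),
}

def check(line: str) -> dict:
    """Report which of the three patterns accept the line."""
    return {name: bool(rx.match(line)) for name, rx in PATTERNS.items()}

for sample in ["foo  bar bar foo", "foo  bar  bar foo", "foo bar", "a    b"]:
    print(repr(sample), check(sample))
```

One design consequence worth knowing: a single run of four spaces can be split into two adjacent runs of two, so all three patterns reject "a    b".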
Note: In your regex you use \s as the class for a space. \s also matches everything in [\r\n\t\f\v ], so it covers both horizontal and vertical whitespace characters.
Note 2:
You can do this without a regex as well (assuming you only want lines that have 1 and only 1 double space in them):
txt='''\
line A
foo bar bar foo bar foo
line B
foo bar bar foo'''
>>> [line for line in txt.splitlines() if len(line.split('  ')) == 2]  # split on two spaces
['foo bar bar foo']
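As a self-contained version of the non-regex approach, here is a small sketch with the double spaces written out explicitly. The sample text is an assumption made up to fit the question's description (line A with two double spaces, line B with one), and the function name is hypothetical.

```python
def has_one_double_space(line: str) -> bool:
    """True when splitting on two spaces yields exactly two pieces."""
    return len(line.split("  ")) == 2

# Hypothetical sample data: double spaces are written out explicitly.
txt = "\n".join([
    "line A",
    "foo  bar bar  foo bar foo",  # two double spaces
    "line B",
    "foo  bar bar foo",           # one double space
])

matches = [line for line in txt.splitlines() if has_one_double_space(line)]
print(matches)
```

Because str.split consumes two spaces at a time, a run of three spaces still counts as one occurrence, while a run of four counts as two.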
You can get the match without lookarounds by starting the match with 1+ non whitespace chars.
Then optionally repeat a single whitespace char followed by non whitespace chars before and after matching a double whitespace char.
The negated character class [^\S\r\n] will match any whitespace chars except a newline or carriage return. If you want to allow matching newlines as well, you could use \s
^\S+(?:[^\S\r\n]\S+)*[^\S\r\n]{2}(?:\S+[^\S\r\n])*\S+$
Explanation
^ Start of string
\S+ Match 1+ non whitespace chars
(?: Non capture group
[^\S\r\n]\S+ Match a whitespace char without a newline, followed by 1+ non whitespace chars
)* Close group and repeat 0+ times
[^\S\r\n]{2} Match the 2 whitespace chars without a newline
(?: Non capture group
\S+[^\S\r\n] Match 1+ non whitespace chars followed by a whitespace char without a newline
)* Close group and repeat 0+ times
\S+ Match 1+ non whitespace chars
$ End of string
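To see how this pattern behaves, it can be run against a handful of strings in Python. The pattern is copied from above; the helper name one_double_space and the sample strings are illustrative only.

```python
import re

# Pattern from the answer: words separated by single whitespace chars,
# with exactly one place where two whitespace chars appear.
ONE_DOUBLE = re.compile(
    r'^\S+(?:[^\S\r\n]\S+)*[^\S\r\n]{2}(?:\S+[^\S\r\n])*\S+$'
)

def one_double_space(line: str) -> bool:
    """True when the line has exactly one double-whitespace separator."""
    return ONE_DOUBLE.match(line) is not None

for sample in [
    "foo  bar bar foo",   # double space near the start
    "foo bar  baz",       # double space near the end
    "foo  bar  bar foo",  # two double spaces
    "foo   bar",          # three spaces in a row
    "foo bar",            # no double space
]:
    print(repr(sample), one_double_space(sample))
```

Unlike the lookahead versions, this pattern also rejects lines that start or end with whitespace, because it requires \S+ at both ends.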
(You'd think this would be easy, but I'm stumped.)
I'm converting an iOS note to a text file, and the note contains "0." and "?" whenever there is a list or bullet.
This was a bulleted list
? item 20
? Item 21
? Item 22
I'm having so much trouble replacing the "?".
I don't want to replace a legitimate question mark at the end of a sentence,
but I want to replace the "?" bullets with "-" (preferably anywhere in the line, not just at the beginning)
I tried these searches - no luck
set line "? item 20"
set index_bullet [string first "(\s|\r|\n)(\?)" $line]
set index_bullet [string first "(!\w)(\?)" $line]
set index_bullet [string first ^\? $line]
This works, but it would match any question mark
set index_bullet [string first \? $line]
Does anyone know what I'm doing wrong?
How do I find and replace only question mark bullets with a "-"?
Thank you very much in advance
If you're really wanting to replace a question mark where you've got a regular expression that describes the rule, the regsub command is the right way. (The string first command finds literal substrings only. The string match command uses globbing rules.) In this case, we'll use the -all option so that every instance is replaced:
set line "? item 20"
set replaced [regsub -all {(\s|^)\?(\s)} $line {\1-\2}]
puts "'$line' --> '$replaced'"
# Prints: '? item 20' --> '- item 20'
The main trick to using regular expressions in Tcl is to keep REs and their replacements in braces as much as possible, so that you can use Tcl metacharacters (e.g., backslash or square brackets) without having to fiddle around a lot.
Also, \s by default will match a newline.
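For readers who want to experiment with the same rule outside Tcl, here is a rough Python analogue of the regsub call. It is a sketch using Python's re module, not Tcl's regex engine, and the function name replace_bullets is invented for illustration.

```python
import re

def replace_bullets(line: str) -> str:
    """Replace every '?' that sits between whitespace (or line start) and whitespace."""
    # Group 1 is the whitespace (or empty start-of-line) before the '?',
    # group 2 is the whitespace after it; both are put back around the '-'.
    return re.sub(r"(\s|^)\?(\s)", r"\1-\2", line)

for line in ["? item 20", "   ? Item 21", "Is this a bullet? No."]:
    print(repr(line), "-->", repr(replace_bullets(line)))
```

A question mark that directly follows a word, as at the end of a sentence, is not preceded by whitespace and is therefore left alone.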
It seems likely that a character used to indicate a list item is the first character on the line or the first character after optional whitespace. To match a question mark at the beginning of a line:
string match {\?*} $line
or
string match \\?* $line
The braces or doubled backslash keep the question mark from being treated as a string match metacharacter.
To find a question mark after optional whitespace:
string match {\?*} [string trimleft $line]
The command returns 1 if it finds a match, and 0 if it doesn't.
To do this with string first, use
if {[string first ? [string trimleft $line]] eq 0} ...
but in that case, keep in mind that the index returned from string first isn't the true location of the question mark in the original line. (Use == instead of eq if you have an older Tcl.)
When you have determined that the line contains a question mark in the first non-whitespace position, a simple
set line [regsub {\?} $line -]
will perform a single substitution regardless of where it is.
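The "first non-whitespace character" check and the single substitution can also be combined into one step. The following is a Python sketch of that idea, not Tcl code, and the function name replace_leading_bullet is hypothetical.

```python
import re

def replace_leading_bullet(line: str) -> str:
    """Replace a '?' only when it is the first non-whitespace character."""
    # ^(\s*) captures any leading whitespace so it can be preserved;
    # count=1 mirrors a single (non -all) substitution.
    return re.sub(r"^(\s*)\?", r"\1-", line, count=1)

for line in ["? item 20", "  ? Item 22", "Why? Because."]:
    print(repr(line), "-->", repr(replace_leading_bullet(line)))
```

This variant never touches a question mark later in the line, so it does not cover bullets that appear mid-line.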
Documentation: regsub, string, Syntax of Tcl regular expressions
I figured it out.
I did it in two steps:
1) First find the "?"
set index_bullet [string first "\?" $line]
2) Then filter out "?" that is not a bullet
set index_question_mark [string first "\w\?" $line]
I have a solution, but please post if you have a better way of doing this.
Thanks!
So, I've seen that you can remove between two characters and remove between two strings but I haven't been able to find a system that works between a string and a character.
I need to remove the numbers between the two brackets in...
provinces= {
923 6862 9794 9904 11751 11846 11882
}
Keep in mind that these files also contain other brackets which are needed. I've looked around for a solution for this but none seem to work :/
Thanks for the help.
This one will do the job:
Ctrl+H
Find what: \b(provinces\s*=\s*\{)[^}]+(\})
Replace with: $1$2
Replace all
Explanation:
\b : a word boundary
( : start group 1
provinces : literally "provinces"
\s* : 0 or more spaces
= : equal sign
\s* : 0 or more spaces
\{ : an open curly bracket, must be escaped because it has special meaning in regex
) : end group 1
[^}]+ : 1 or more characters that are not a close curly bracket
(\}) : group 2, a close curly bracket, escaped.
Replacement:
$1$2 : group 1 then group 2
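The same find-and-replace can be tried in a script. The sketch below uses Python's re module, where the replacement is written \1\2 instead of $1$2. The sample text, including the second block named other, is made up to show that unrelated brackets are left alone.

```python
import re

# Hypothetical file content: one "provinces" block and one unrelated block.
text = (
    "provinces= {\n"
    " 923 6862 9794 9904 11751 11846 11882\n"
    "}\n"
    "other= {\n"
    " 1 2 3\n"
    "}\n"
)

# Keep the opening "provinces = {" (group 1) and the closing "}" (group 2),
# dropping everything in between.
cleaned = re.sub(r"\b(provinces\s*=\s*\{)[^}]+(\})", r"\1\2", text)

print(cleaned)
```

Because [^}]+ also consumes the line breaks inside the block, the result is "provinces= {}" on a single line.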
Say we are given this string in a vimscript:
"/home/Linus Torvalds/.vim/bundle/vim-autoformat/formatters/tidy -q --show-errors 0 --show-warnings 0 --indent auto --indent-spaces 2 --vertical-space yes --tidy-mark no --wrap 68".
How do we extract the filename part? In this case that would be:
"/home/Linus Torvalds/.vim/bundle/vim-autoformat/formatters/tidy".
If you can guarantee that the path itself never contains a space followed by a dash (" -"), I would do it like this:
matchstr(input_string,'^.\{-}\ze -')
Explanation: From the beginning of the string (^) match any character non-greedily (.\{-}) until the first occurrence of a space followed by a dash (\ze -).
Or you could just match until the first dash and then trim any trailing whitespace with a substitute() command, which would be less concise, but might be more readable.
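To illustrate the non-greedy match outside Vim, here is a Python sketch of the same idea. It is an analogue only: Python's .*? with a lookahead stands in for Vim's .\{-} with \ze, and the function name executable_part is invented. The command string is shortened from the one in the question.

```python
import re

def executable_part(command: str) -> str:
    """Return everything before the first ' -', or '' when there is none."""
    # .*? is non-greedy; (?= -) stops the match before the first
    # space-dash pair without including it in the result.
    m = re.match(r".*?(?= -)", command)
    return m.group(0) if m else ""

cmd = ("/home/Linus Torvalds/.vim/bundle/vim-autoformat/formatters/tidy"
       " -q --show-errors 0 --show-warnings 0")

print(executable_part(cmd))
```

The dash in vim-autoformat does not end the match because it is not preceded by a space. Returning an empty string on failure mirrors what matchstr() does when nothing matches.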