How to search/replace special chars? - vim

After a copy-paste from Wikipedia into Vim, I get this:
A

[+] Métier agricole<200e> – 44 P • 2 C
[×] Métier de l'ameublement<200e> – 10 P
[×] Métier de l'animation<200e> – 5 P
[+] Métier en rapport avec l'art<200e> – 11 P • 4 C
[×] Métier en rapport avec l'automobile<200e> – 10 P
[×] Métier de l'aéronautique<200e> – 15 P
The problem is that <200e> is a single character, not the six characters shown.
How can I enter it in a search or a substitution (with / or :)?

Check the help for \%u:
/\%d /\%x /\%o /\%u /\%U E678
\%d123 Matches the character specified with a decimal number. Must be
followed by a non-digit.
\%o40 Matches the character specified with an octal number up to 0377.
Numbers below 040 must be followed by a non-octal digit or a non-digit.
\%x2a Matches the character specified with up to two hexadecimal characters.
\%u20AC Matches the character specified with up to four hexadecimal
characters.
\%U1234abcd Matches the character specified with up to eight hexadecimal
characters.
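These are sequences you can use. <200e> is how Vim displays the single Unicode character U+200E (LEFT-TO-RIGHT MARK), an invisible formatting character used to control text direction, so \%u200e matches it and :%s/\%u200e//g deletes every occurrence.
It is one code point (three bytes in UTF-8), not the two bytes 0x20 (space) and 0x0E (^N), so its appearance does not indicate an encoding problem.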
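To confirm what the character is, and that it is one code point rather than two bytes, here is a short Python check. Python is used only for illustration; the sample line is copied from the question with the invisible mark written as the escape \u200e.

```python
import unicodedata

# One line from the question, with the invisible mark written as an escape.
line = "Métier agricole\u200e – 44 P • 2 C"

mark = "\u200e"
print(unicodedata.name(mark))  # LEFT-TO-RIGHT MARK
print(len(mark))               # 1 (a single code point)
print(mark.encode("utf-8"))    # b'\xe2\x80\x8e' (three bytes in UTF-8)

# Same effect as :%s/\%u200e//g in Vim: drop every occurrence.
cleaned = line.replace(mark, "")
print(cleaned)                 # Métier agricole – 44 P • 2 C
```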

To replace ^#:
:%s/\%x00//g
To replace ^L, enter the ^L by typing Ctrl-V Ctrl-L:
:%s/^L//g
References:
gvim - How to remove this symbol "^#" with vim? - Super User
vim - Deleting form feed ^L characters - Stack Overflow
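Outside Vim, the same two deletions can be done in Python. This is a minimal sketch, assuming the characters in question are NUL (\x00, what \%x00 matches) and form feed (\x0c, displayed as ^L); the sample text is hypothetical.

```python
import re

# Hypothetical sample text containing a form feed (\x0c, shown as ^L in Vim)
# and a NUL byte (\x00).
text = "page one\x0c\npage\x00 two\n"

# Equivalent of :%s/\%x00//g followed by :%s/^L//g
cleaned = re.sub(r"[\x00\x0c]", "", text)
print(repr(cleaned))  # 'page one\npage two\n'
```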

If you want to quickly select this extraneous character everywhere and replace or delete it, you could:
Isolate one of the strange characters by adding a space before and after it, so it becomes a "word".
Use the * command to search for the word under the cursor. If 'hlsearch' is set, you should then see every occurrence of the extraneous character highlighted.
Replace the last searched pattern (an empty pattern reuses it) with something else, globally:
:%s//something else/g

Related

How to avoid the escaping of two consecutive apostrophes in a Python string

Escape Issue
Hello all,
I have the Python string below, and I am basically trying to get rid of the escape characters around ''. My goal is to have two single apostrophes before and after DD, but the raw string does not work as expected, as seen in the image. Is there a way to make them appear as I expect?
r"""hello my name is "John" I cannot remove ''DD'' the escape character in order to show
the two single quotes together"""
Thanks,
I tried replacement functions and other methods I found, and none worked as expected.
chars = r""""John" ''DD''"""
chars # '"John" \'\'DD\'\''
'"John" \'\'DD\'\''
Above you see escaped inner single quotes (apostrophes) because the output is presented as a single-quoted string. In fact, there are no reverse solidi (backslashes) in the chars string:
len(chars) # 13
print( chars) # "John" ''DD''
"John" ''DD''
Another proof (the chars string as a sequence of characters):
import unicodedata
for jj, char in enumerate( chars):
    print( f'{jj:3}', char, unicodedata.name(char,'???'))
0 " QUOTATION MARK
1 J LATIN CAPITAL LETTER J
2 o LATIN SMALL LETTER O
3 h LATIN SMALL LETTER H
4 n LATIN SMALL LETTER N
5 " QUOTATION MARK
6 SPACE
7 ' APOSTROPHE
8 ' APOSTROPHE
9 D LATIN CAPITAL LETTER D
10 D LATIN CAPITAL LETTER D
11 ' APOSTROPHE
12 ' APOSTROPHE
Read more about escaping in the "String and Bytes literals" section of the Python documentation.
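To see both views side by side, here is a short sketch comparing what the REPL echoes (repr(chars)) with the string's actual contents; it reuses the chars literal from the answer above.

```python
# The string from the answer above: a raw, triple-quoted literal.
chars = r""""John" ''DD''"""

# The REPL echoes repr(chars). Because the text contains both " and ',
# repr() picks ' as the delimiter and must escape the inner apostrophes.
print(repr(chars))    # '"John" \'\'DD\'\''

# The backslashes exist only in that representation, not in the data.
print("\\" in chars)  # False
print(chars)          # "John" ''DD''
```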

Vim Find Wildcard Value and Replace With same wildcard with small perturbation

I wish to find an occurrence of a specific symbol inside a large string and prepend a space to that symbol.
NOTE: I also have occurrences in the file with the space already prepended to that symbol.
Input:
6890 4 2.025 12.219883 -80.86158
6891 1 36.45 11.314275-79.050365
6892 1 36.45 14.031098-79.955972
6893 1 2.025 13.12549-78.144757
Output:
6890 4 2.025 12.219883 -80.86158
6891 1 36.45 11.314275 -79.050365
6892 1 36.45 14.031098 -79.955972
6893 1 2.025 13.12549 -78.144757
I first thought that the solution would have the following form:
:%s/*-*/* -*/
but this form does not account for the existing spaces.
:%s/\(\d\)-/\1 -/g
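Explanation: in the entire file (%), substitute (s/) a digit \d, captured with \(\d\), followed by a minus (-), and replace it (/) with the captured digit \1 followed by a space and a minus; the g flag replaces all occurrences on the line. A minus that already has a space before it is not preceded by a digit, so it is left alone.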
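For comparison, the same substitution can be done with Python's re.sub. This is a sketch run on the sample input from the question, not part of the original answer.

```python
import re

lines = [
    "6890 4 2.025 12.219883 -80.86158",
    "6891 1 36.45 11.314275-79.050365",
    "6892 1 36.45 14.031098-79.955972",
    "6893 1 2.025 13.12549-78.144757",
]

# Same idea as :%s/\(\d\)-/\1 -/g : capture the digit before "-" and put it
# back followed by a space. A "-" already preceded by a space does not match,
# so existing spaces are not doubled.
for line in lines:
    print(re.sub(r"(\d)-", r"\1 -", line))
```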

Looking for a regex that finds all number combinations without three zeros in a row, mixed with delimiters

I would like to find all number combinations that do not contain three zeros in a row.
There might be delimiters (at most 2 characters) between the digits.
I'm using python and I would like to perform this search with the regex.
Accepted numbers:
This is number 1234 which should be accepted.
12-45
1 2 0 0 3 4 5
Not accepted numbers:
1
12
123
1000
1000-2000
30000-31000
21 000-32 000-50 000
21 00 03 00 00
The regex I came up with is:
([\s\-]{0,2}\d(?!000)){4,}
My regex finds all the accepted numbers, but it doesn't filter out all the rejected ones.
See the results in regex
Actually this regex is used in python to remove the matched numbers from the text:
See python code
P.S. Delimiters are not only spaces; they should include at least \s and the dash.
P.P.S. The numbers might be in the middle of the string, so I think I cannot use ^ and $ in my regex.
You could assert that there are not three zeros in a row, while matching optional delimiters between the digits.
\b(?![\d\s-]*?0(?:[\s-]*0){2})\d(?:[\s-]*\d){3,}\b
Explanation
\b A word boundary
(?! Negative lookahead, assert that what follows is not
[\d\s-]*? Any digit, whitespace character or -, as few as possible
0(?:[\s-]*0){2} A zero followed by two more zeros, with optional delimiters in between
) Close the lookahead
\d Match a digit
(?:[\s-]*\d){3,} Repeat 3 or more times: a digit, with optional delimiters before it
\b A word boundary
Regex demo
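To check the pattern against the examples from the question, here is a small Python harness (the list names are illustrative). It also shows the removal step the asker describes, using re.sub.

```python
import re

# The answer's pattern, unchanged; Python's re supports the same syntax.
PATTERN = re.compile(r"\b(?![\d\s-]*?0(?:[\s-]*0){2})\d(?:[\s-]*\d){3,}\b")

accepted = [
    "This is number 1234 which should be accepted.",
    "12-45",
    "1 2 0 0 3 4 5",
]
rejected = [
    "1", "12", "123", "1000", "1000-2000", "30000-31000",
    "21 000-32 000-50 000", "21 00 03 00 00",
]

for text in accepted:
    print(f"{text!r:50} -> {PATTERN.search(text).group()!r}")
for text in rejected:
    # search() returns None when no acceptable number occurs in the text
    print(f"{text!r:50} -> {PATTERN.search(text)}")

# Removing the matched numbers, as the question describes:
print(PATTERN.sub("", accepted[0]))
```

Note that [\s-]* allows any number of delimiters between digits, not at most two; tighten it to [\s-]{0,2} if that limit matters.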

Match any character (including whitespace) until the LAST bunch of whitespaces

I've got text like this:
0000 10 [STUFF] Text ("TOTAL,SOME RANDOM TEXT") (558b6a68)
The first two columns are fairly static. The third is optional. The last is optional and, if present, is always enclosed in parentheses.
My issue is with the fourth column, which can contain spaces or actually any character (except a newline, of course).
My current regex looks like this:
^([a-fA-F0-9]{4,})\s+[a-fA-F0-9]+\s+(?:\[[^\]]*\]\s+)?
It matches all until the beginning of the fourth column.
Please note that spaces might appear anywhere; I can't define exact locations such as "always before a parenthesis" or "maybe between quotation marks".
I know for sure that this is the column before the last. So I'd like to capture them like this:
0000 10 [STUFF] Text("TOTAL,SOME RANDOM TEXT") (558b6a68)
^ ^ ^ ^ ^ ^
CAPTURE C A P T U R E C A P T U R E
I'd like to capture the text marked between each pair of ^ characters in the previous code block.
So, I'd like to grab any character UNTIL the last run of whitespace, without including that whitespace in the final match group.
I hope I described it well :) Is this possible with a regex at all?
Here is some more sample text to test on:
0000 10 Text("TOTAL,SOME RANDOM TEXT") (1122aabb)
0010 5 D==1122aabb (1122aabb)
0015 17 Text("AND,SOME,MORE") (00000001)
002c 5 D==1 (1)
0031 1 !D (ccdd3344)
0032 5 D==ccdd3344 (ccdd3344)
0037 2 !1 (1)
0039 0 [AAAA] Fff
0039 1 [BBBB] Aaa
003a 6 N(05, eeff5566) (eeff5566)
0040 1 Qq
0041 2 $ab ([String]:"Unknown")
0043 f Call A/SomeFunc-X
0052 1 cd
I'd also start similarly to your pattern, with something like ^(\w+) +\w+ +(?:\[[^\]]+\] *)?
From there (the start of the 4th column), capture the first non-whitespace character \S followed lazily by any characters .*?, stopping where an optional parenthesized part at the end ($) can be captured. If there is no such part, group 2 consumes the rest of the line.
^(\w+) +\w+ +(?:\[[^\]]+\] *)?(\S.*?)(?: +(\([^)]+\)))?$
See this demo at regex101
Feel free to adjust the parentheses of the third group to capture only what's inside, if needed.
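A quick way to check the pattern outside regex101 is Python's re module, run here on a few of the sample lines from the question (the variable names are illustrative).

```python
import re

# The answer's pattern, unchanged; Python's re accepts the same syntax.
LINE = re.compile(r"^(\w+) +\w+ +(?:\[[^\]]+\] *)?(\S.*?)(?: +(\([^)]+\)))?$")

samples = """\
0000 10 [STUFF] Text ("TOTAL,SOME RANDOM TEXT") (558b6a68)
003a 6 N(05, eeff5566) (eeff5566)
0041 2 $ab ([String]:"Unknown")
0039 0 [AAAA] Fff
0043 f Call A/SomeFunc-X"""

for line in samples.splitlines():
    m = LINE.match(line)
    # group(3) is None when the line has no trailing parenthesized part
    print(m.group(1), "|", m.group(2), "|", m.group(3))
```

Because group 2 is lazy, it stops at the last " (...)" that reaches the end of the line, so parentheses inside the fourth column (as in N(05, eeff5566)) stay in group 2.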

Matching only a <tab> that is between two numbers

How to match a tab only when it is between two numbers?
Sample script:
209.65834 27.23204908
119.37987 15.03317082
74.240635 8.30561924
29.1014 0
931.8861 -100.00000
-16.03784 -8.30562
;
_mirror
l
;
29.1014 0
1028.10 0.00
n
_spline
935.4875 250
924.2026913 269.8820375
912.9178825 277.4506484
890.348265 287.3181854
(In the script above, the numbers are separated by tabs, not spaces. The blank lines are significant; they are empty, but I can't lose them.)
I want a "," between the numbers. I tried :%s/\t/\,/, but that also affects the empty lines and the ends of lines.
Try this:
:%s/\(\d\)\t\(-\?\d\)/\1,\2/
\d matches any digit. -\? means "an optional -". The pairs of (escaped) parentheses capture the matches: \1 refers to the first captured match and \2 to the second.
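The same substitution, sketched with Python's re.sub on a hypothetical excerpt of the script that includes a blank line and a trailing tab (both assumptions for illustration). re.sub replaces every match, which gives the same result as the Vim command here because each line has only one tab between numbers.

```python
import re

# Hypothetical excerpt: tab-separated numbers (one negative), an empty line,
# command lines, and a line ending in a trailing tab.
script = (
    "209.65834\t27.23204908\n"
    "931.8861\t-100.00000\n"
    "\n"
    ";\n"
    "_mirror\n"
    "-16.03784\t-8.30562\n"
    "1028.10\t0.00\t\n"
)

# Mirror of :%s/\(\d\)\t\(-\?\d\)/\1,\2/ : only a tab with a digit before it
# and an optionally negative number after it is replaced.
converted = re.sub(r"(\d)\t(-?\d)", r"\1,\2", script)
print(converted)
```

The empty line and the trailing tab are left untouched, since neither has a digit on both sides.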
Searching for "vim regex" leads to http://vimregex.com/, and from there to:
:%s/\([0-9]\)\t\([0-9]\)/\1,\2/gc
You have two groups of digits ([0-9]) with a tab \t between them. Escape the grouping parentheses and you have the answer.
g replaces every occurrence on a line; c asks for confirmation before each replacement.
\1 and \2 are the matched groups (the digits, in your case).
It's not really hard to find answers to questions like that by yourself.
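Try:
:%s/\([0-9]\)\t\([0-9]\)/\1,\2/g
Explanation: search for the pattern <digit>\t<digit> and remember the parts that match <digit>.
\( ... \) captures and remembers the part that matches.
\1 recalls the first captured digit, \2 the second.
So on 123\t789, the pattern <digit>\t<digit> matches 3\t7,
and 3 and 7 are remembered as \1 and \2, giving 123,789.
Note that [0-9] does not match a minus sign, so a tab followed by a negative number (such as -100.00000) is left as-is; using \(-\?[0-9]\) as the second group covers that case.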
Or:
:g/[0-9]/ s/\t/,/g
Explanation: select all lines containing a digit, then substitute every tab on those lines with a comma.
