How can I search in Vim for invalid characters? - vim

I have a file that contains an invalid character, and I want to learn how to search for that kind of character. In the file, it appears right after "..get Ale".
I checked with a hex editor and the bytes are:
f480 8080 0d
I tried /[^[:alnum:][:punct:][:space:]], but it also matches spaces, dashes, etc.
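To see what those bytes actually are, here is a minimal Python sketch that decodes the hex dump, assuming the file is meant to be UTF-8, and reports each code point with its Unicode general category. The helper name describe_bytes is hypothetical.

```python
import unicodedata


def describe_bytes(hex_dump: str, encoding: str = "utf-8"):
    """Decode a hex dump and return (code point, Unicode category) pairs."""
    raw = bytes.fromhex(hex_dump)  # whitespace between byte pairs is skipped
    text = raw.decode(encoding)
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


print(describe_bytes("f480 8080 0d"))
# [('U+100000', 'Co'), ('U+000D', 'Cc')]
```

Read this way, f4 80 80 80 is U+100000, a private-use character (category Co), followed by a carriage return. Because it lies above U+FFFF, the four-digit \%u atom cannot name it; the \%U atom (up to eight hex digits) should, e.g. /\%U100000.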

I am not quite sure what you mean by "invalid character," but if what you want is to search for non-printable characters, type:
/[^[:print:]]
If you want to search for ASCII hex 0d, type: /[\u000d] or /^Vx0d where "^V" means [Ctrl]-V (type V while holding the "Control" key pressed).
Check this link for more info: How do I replace or find non-printable characters in vim regex?
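As a rough, non-authoritative analogue of /[^[:print:]], Python's str.isprintable() can flag the same kind of characters in a line; Vim's own notion of a printable character is not guaranteed to match it exactly. The function find_nonprintable and the sample line (rebuilt from the hex dump in the question) are hypothetical.

```python
def find_nonprintable(line: str):
    """Return (column, code point) for characters that str.isprintable() rejects.

    This only approximates Vim's /[^[:print:]]: a plain space counts as
    printable in both, but edge cases may differ.
    """
    return [
        (col, f"U+{ord(ch):04X}")
        for col, ch in enumerate(line)
        if not ch.isprintable()
    ]


sample = "..get Ale\U00100000\r"  # hypothetical line reconstructed from the hex dump
print(find_nonprintable(sample))
# [(9, 'U+100000'), (10, 'U+000D')]
```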

Related

How to replace bytes \xe3\x80\x80 with byte \x20 in vim?

First, let's create a target file to operate on.
python3
>>> mfile = open("f:/test.txt","wb")
>>> mfile.write(b'\xe3\x80\x80')
3
>>> mfile.close()
If you now open f:/test.txt with xxd, you will see the three bytes \xe3\x80\x80: the target file is UTF-8 encoded and contains exactly those three bytes.
python3
>>> b'\xe3\x80\x80'.decode('utf-8')
'\u3000'
This means the three UTF-8 bytes in test.txt encode the single Unicode code point U+3000 (IDEOGRAPHIC SPACE).
:s/\%u3000/ /g
:s/\%u3000/ /g replaces the bytes \xe3\x80\x80 with the byte \x20 in Vim.
One issue still remains:
:s/\%u3000/\%u20/g
:s/\%u3000/\%x20/g
:s/\%u3000/\x20/g
None of the three forms above works. Why can \xe3\x80\x80 be expressed as \%u3000 in Vim, while a space cannot be expressed as \%u20, \%x20 or \x20?
A literal space can express \x20, since a space is a printable character. But what if I want to replace the three bytes \xe3\x80\x80 with Latin-1's NBSP?
In Latin-1, NBSP (\xa0) is the non-breaking space, which I cannot type like a normal printable character. How do I write that in the Vim expression?
:s/\%u3000/\%ua0/g
:s/\%u3000/\%xa0/g
:s/\%u3000/\xa0/g
None of them works in this case.
You can type the \xe3\x80\x80 (U+3000) character by pressing Ctrl+V, then u, then the four hex digits of its Unicode value, in your case 3000 (check :help i_CTRL-V_digit). Since it is a blank character you will see nothing but a space; you could type :set list to see the places where you have that character, or in any case add this to your .vimrc:
set listchars=tab:▸\ ,eol:¬,trail:·,extends:#,nbsp:.
Now, in the same way you enter the character in insert mode, you could try to enter it on the command line to replace it; if Ctrl+V is not available there, you could use the command-line window instead (:help cedit).
In command-line mode, after typing :, press Ctrl+F. This opens the command-line window, where you can go into insert mode and type %s/<Ctrl+V>u3000/ /g; when done, press Enter to run the command.
Give it a try before resorting to the command-line window, since Ctrl+V may work directly on the command line, unlike Ctrl+K (http://vim.wikia.com/wiki/Entering_special_characters).
In the image, instead of replacing with a space / /, the command replaces with ---- just to make the changes visible.
1. How to input non-printable characters when editing a file in Vim?
In insert mode:
1. Press Ctrl+V (or Ctrl+Q if Ctrl+V is mapped to paste from a register).
2. Type u.
3. Type the Unicode value of the non-printable character.
4. Press Enter.
2. How to input non-printable characters in a substitute command in Vim's ex mode?
For example, to replace all occurrences of the bytes \xe3\x80\x80 with \xa0, where everything is UTF-8 encoded:
1. Get the characters' Unicode values:
`\xe3\x80\x80`'s Unicode value is `3000`,
`\xa0`'s Unicode value is `a0`.
2. Press `:` to enter ex mode.
3. Type :s/\%u3000/
4. Press Ctrl+V, then type ua0 (do not press Enter here, unlike the insert-mode procedure above).
5. Continue by typing `/g`.
6. Press Enter.
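To see what these steps do at the byte level, here is a small Python sketch of the same substitution; it is an analogy, not Vim itself, and the helper substitute_bytes is hypothetical. Replacing U+3000 with U+00A0 changes the character, but whether it is stored as the single byte a0 depends on the encoding the file is written in.

```python
def substitute_bytes(data: bytes, old: str, new: str, read_enc: str, write_enc: str) -> bytes:
    """Decode, replace every `old` character with `new` (like :%s/old/new/g), re-encode."""
    return data.decode(read_enc).replace(old, new).encode(write_enc)


ideographic_space = b"\xe3\x80\x80"  # U+3000 in UTF-8, as in f:/test.txt

print(substitute_bytes(ideographic_space, "\u3000", " ", "utf-8", "utf-8"))         # b' '
print(substitute_bytes(ideographic_space, "\u3000", "\u00a0", "utf-8", "utf-8"))    # b'\xc2\xa0'
print(substitute_bytes(ideographic_space, "\u3000", "\u00a0", "utf-8", "latin-1"))  # b'\xa0'
```

Written back as UTF-8, the NBSP becomes the two bytes c2 a0; only a Latin-1 file stores it as the single byte a0 (in Vim, presumably via something like :w ++enc=latin1).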

How to insert Unicode character U+2611 in gvim

When I try to enter the Unicode character ☑ (U+2611) in Vim with ^Vu2611 (that is, press Ctrl+V, then type u2611 in insert mode), Vim somehow breaks it into two characters: & (26) and ^Q (11).
There is no problem when I insert other characters like □ (U+a1f5).
It seems that Vim stops parsing immediately after reading 26 (which represents the character '&').
So how can I insert this kind of Unicode character in Vim? (I have tried pasting it into Vim; that doesn't work either.)
In order to process Unicode characters, Vim must use an 'encoding' that is able to represent those characters. With a value of latin1, the mentioned character cannot be encoded (this 8-bit encoding only includes ASCII and several Western European characters).
So, you need to
:set encoding=utf-8
With that, any newly created file will use that encoding, and you should be able to insert Unicode characters and write them (also with another Unicode file encoding, like :w ++enc=ucs-2le; but if you tried to persist as :w ++enc=latin1, you'd get a CONVERSION ERROR).
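The same constraint can be reproduced outside Vim. This Python sketch (the helper encode_or_none is hypothetical) tries to encode ☑ in a few encodings; Python's utf-16-le stands in for Vim's ucs-2le, which produces the same bytes for characters in the Basic Multilingual Plane.

```python
def encode_or_none(ch: str, encodings=("latin-1", "utf-8", "utf-16-le")):
    """Map each encoding to the hex bytes of `ch`, or None if it cannot be encoded."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = ch.encode(enc).hex()
        except UnicodeEncodeError:
            result[enc] = None  # roughly what Vim reports as a CONVERSION ERROR
    return result


print(encode_or_none("\u2611"))  # ☑ BALLOT BOX WITH CHECK
# {'latin-1': None, 'utf-8': 'e29891', 'utf-16-le': '1126'}
```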

Removing hex code ffa3 in Vim

I've got a file with a load of weird characters in it that I need to get rid of.
Using ga on the character reveals it has the following encodings:
<ᆪ> 65443, Hex ffa3, Octal 177643
But I can't seem to find it using :%s/\%xffa3//g. What am I doing wrong?
Look at :help \%x:
\%x2a Matches the character specified with up to two hexadecimal characters.
So Vim is actually matching the three characters ÿa3: \%xff (the character 0xff, ÿ) followed by the literal text a3. Since you have a four-digit hex number, you need to use \%u:
:%s/\%uffa3//g
Alternatives
You can also insert the character directly into the command line via :help i_CTRL-V_digit (i.e. <C-v>uffa3), but if you already have instances of that character in your buffer (and near your cursor!), I'd just yank that char with yl and insert it in the command-line via <C-r>".
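The digit limit is the whole story here. This hypothetical Python helper mimics how many hex digits each atom consumes (two for \%x, four for \%u, eight for \%U, as described in this thread) and treats any leftover digits as literal text; it is a sketch of the rule, not Vim's actual parser.

```python
import re

# Maximum hex digits each atom consumes: \%x -> 2, \%u -> 4, \%U -> 8.
HEX_DIGIT_LIMITS = {"x": 2, "u": 4, "U": 8}


def expand_hex_atoms(pattern: str) -> str:
    r"""Hypothetical helper: replace \%x, \%u and \%U atoms with the characters
    they stand for, leaving any extra hex digits as literal text."""

    def replace(match):
        limit = HEX_DIGIT_LIMITS[match.group(1)]
        digits = match.group(2)
        return chr(int(digits[:limit], 16)) + digits[limit:]

    return re.sub(r"\\%([xuU])([0-9a-fA-F]+)", replace, pattern)


print(ascii(expand_hex_atoms(r"\%xffa3")))  # '\xffa3': three characters, ÿ then "a3"
print(ascii(expand_hex_atoms(r"\%uffa3")))  # '\uffa3': the single character U+FFA3
```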

How to search for a character that displays as "<85>" in Vim

I have a file that was converted from EBCDIC to ASCII. Where there used to be new lines there are now characters that show up as <85> (a symbol representing a single character, not the four characters it appears to be) and the whole file is on one line. I want to search for them and replace them all with new lines again, but I don't know how.
I tried putting the cursor over one and using * to search for the next occurrence, hoping that it might show up in my / search history. That didn't work, it just searched for the word that followed the <85> character.
I searched Google, but didn't see anything obvious.
My goal is to build a search and replace string like:
:%s/<85>/\n/g
Which currently just gives me:
E486: Pattern not found: <85>
I found "Find & Replace non-printable characters in vim" by searching Google. It seems like you should be able to do:
:%s/\%x85/\r/gc
Omit the c to do the replacement without prompting; try with c first to make sure it does what you want.
In Vim, typing :h \%x gives more details. In addition to \%x, you can use \%d, \%o, \%u and \%U for decimal, octal, up to four and up to eight hexadecimal characters.
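For a plausible explanation of where <85> comes from: in Python's cp037 (EBCDIC) codec, the EBCDIC new-line byte 0x15 decodes to U+0085 NEXT LINE. Whether the asker's converter did the same is an assumption. The sketch below (nel_to_newlines is a hypothetical name) shows that mapping and the Python analogue of the substitution.

```python
def nel_to_newlines(text: str) -> str:
    r"""Replace U+0085 NEXT LINE (displayed by Vim as <85>) with a line feed,
    analogous to :%s/\%x85/\r/g in the buffer."""
    return text.replace("\x85", "\n")


# With Python's cp037 codec, the EBCDIC new-line byte 0x15 becomes U+0085.
print(ascii(b"\x15".decode("cp037")))                         # '\x85'
print(nel_to_newlines("REC1\x85REC2\x85REC3").splitlines())  # ['REC1', 'REC2', 'REC3']
```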
To search for special characters, for example win1252 characters shown as <80>, <90>, <9d>, ...,
type:
/\%u80, /\%u90, /\%u9d ...
from the editor.
Similarly for octal, decimal, hex, type: /\%oYourCode, /\%dYourCode, /\%xYourCode.
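To work out "YourCode" for each form, here is a hypothetical Python helper that lists equivalent Vim atoms for one character code. It assumes \%o, like \%x, only covers values up to 0o377 (255), as Vim's help describes, and uses \%u or \%U depending on whether four hex digits suffice.

```python
def vim_atoms(code_point: int) -> list:
    """List Vim pattern atoms that should all match the same character code."""
    atoms = [rf"\%d{code_point}"]  # decimal: no digit limit
    if code_point <= 0o377:
        atoms.append(rf"\%o{code_point:o}")    # octal: up to 0o377
        atoms.append(rf"\%x{code_point:02x}")  # hex: at most two digits
    if code_point <= 0xFFFF:
        atoms.append(rf"\%u{code_point:04x}")  # at most four hex digits
    else:
        atoms.append(rf"\%U{code_point:x}")    # at most eight hex digits
    return atoms


print(vim_atoms(0x85))    # ['\\%d133', '\\%o205', '\\%x85', '\\%u0085']
print(vim_atoms(0xFFA3))  # ['\\%d65443', '\\%uffa3']
```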
Try this: :%s/<85>/^M/g
Note: type the actual character in place of <85> (e.g. with Ctrl-V x85), and to type ^M press Ctrl-V and then Ctrl-M.
Or, if you don't mind using another tool (matching the single byte 0x85, octal 205, rather than the literal text "<85>"):
LC_ALL=C awk '{gsub(/\205/, "\n")}1' file

vim and unrecognized characters

I have a file with some accents, and Vim displays them as "~V" characters. The "od -bc" command tells me the characters are charcode 226. I want to substitute them using Vim, but I can't get it to match the characters. How can I achieve that?
Optional question: how can I have Vim tell me which charset is used to interpret the current file?
You can use the following formats, from vim's manual on patterns and regular expressions:
ordinary atom
magic    nomagic    matches
\%d      \%d        match specified decimal character (eg \%d123)
\%x      \%x        match specified hex character (eg \%x2a)
\%o      \%o        match specified octal character (eg \%o040)
\%u      \%u        match specified multibyte character (eg \%u20ac)
\%U      \%U        match specified large multibyte character (eg \%U12345678)
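Note that od -b prints bytes in octal, so the 226 it reports is octal 226, i.e. decimal 150 or hex 96. So you should be able to do something like this to replace that character with a space globally in the file:
:%s/\%o226/ /g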
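Assuming the value came from od -b (octal output), a small Python sketch can convert it into the equivalent Vim atoms; od_octal_to_vim is a hypothetical helper name.

```python
def od_octal_to_vim(octal_text: str) -> dict:
    """Interpret one byte value as printed by `od -b` (octal) and list Vim atoms for it."""
    value = int(octal_text, 8)
    return {
        "decimal": value,
        "hex": f"{value:02x}",
        "vim_atoms": [rf"\%o{octal_text}", rf"\%x{value:02x}", rf"\%d{value}"],
    }


print(od_octal_to_vim("226"))
# {'decimal': 150, 'hex': '96', 'vim_atoms': ['\\%o226', '\\%x96', '\\%d150']}
```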
As for the latter, if you do:
:set encoding
You'll see output like:
encoding=latin1
One very simple way to deal with such "weird" characters is:
1. Select the offending character(s) visually (v).
2. Yank them into the unnamed register (y).
3. Replace them with :%s/<ctrl-r>"/something-else/g
where <ctrl-r> means pressing Ctrl and R together; followed by ", it inserts the register's contents into the command line, effectively putting the offending characters inside the s/// command.
