I'm used to using vim to modify a file's line endings:
$ file file
file: ASCII text, with CRLF line terminators
$ vim file
:set ff=mac
:wq
$ file file
file: ASCII text, with CR line terminators
Is it possible to use a similar process to change a file's Unicode encoding? I'm trying the following, which doesn't work:
$ file file.xml
file.xml: Unicode text, UTF-16, little-endian
$ vim file.xml
:set encoding=utf-8
:wq
$ file file.xml
file.xml: Unicode text, UTF-16, little-endian
I saw someone say that he could "set fileencoding=utf-8, then update and write the file, and it works," but I seem to be missing something, or else he was confused. I don't know what he meant by "then update."
From the doc:
:write ++enc=utf-8 russian.txt
So you should be able to change the encoding as part of the write command.
Notice that there is a difference between set encoding and set fileencoding.
In the first case, you change the encoding Vim uses internally for buffers and registers, which (unless 'termencoding' is set) is also the encoding used for the terminal. In the second case, you change the encoding of the file that is written.
While using vim to do it is perfectly possible, why don't you simply use iconv? Loading a text editor just to convert an encoding seems like using too big a hammer for too small a nail.
Just:
iconv -f utf-16 -t utf-8 file.xml > file.utf8.xml
And you're done.
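For comparison, here is a rough Python sketch of the same conversion, in case neither iconv nor Vim is handy. It assumes the UTF-16 input starts with a byte-order mark (BOM), which Python's "utf-16" codec uses to pick the byte order and then drops; the function names are only illustrative.

```python
def convert_utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as UTF-8, roughly like iconv -f utf-16 -t utf-8."""
    # The "utf-16" codec reads the BOM to choose little- or big-endian
    # and does not include the BOM in the decoded text.
    text = data.decode("utf-16")
    # Plain "utf-8" writes no BOM.
    return text.encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """File-to-file wrapper: read src_path as UTF-16, write dst_path as UTF-8."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(convert_utf16_to_utf8(data))


# Usage: convert_file("file.xml", "file.utf8.xml")
```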
As in your steps, setting fileencoding should work. However, I'd also add :set bomb, which writes a byte-order mark and helps editors recognize the file as UTF-8.
$ vim file
:set bomb
:set fileencoding=utf-8
:wq
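In terms of bytes, :set bomb together with fileencoding=utf-8 means the written file starts with the UTF-8 byte-order mark EF BB BF. A minimal Python sketch of that result uses the standard "utf-8-sig" codec, which prepends the BOM when encoding and skips it when decoding; the helper name is hypothetical.

```python
import codecs


def encode_utf8_with_bom(text: str) -> bytes:
    """Encode text as UTF-8 preceded by a BOM, the byte layout :set bomb produces."""
    return text.encode("utf-8-sig")


data = encode_utf8_with_bom("<root/>\n")
# The first three bytes are the BOM: codecs.BOM_UTF8 == b"\xef\xbb\xbf".
assert data.startswith(codecs.BOM_UTF8)
# Decoding with "utf-8-sig" strips the BOM again.
assert data.decode("utf-8-sig") == "<root/>\n"
```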
It could be useful to change the encoding just on the command line before the file is read:
rem On Microsoft Windows
vim --cmd "set encoding=utf-8" file.ext
# In *nix shell
vim --cmd 'set encoding=utf-8' file.ext
See :help --cmd (in starting.txt).
auto GUIEnter * set encoding=utf-8
should help
Related
I know the usual way to show line numbers in a file is :set number, but I want to turn it on while opening the file, something like this:
vim filename ******
You can pass arbitrary commands to vim with +.
$ vim somefile.txt '+set number'
See the vim man page.
However, customizations like this usually go in your ~/.vimrc file:
set number
By "edit" I mean the act of opening the file, modifying it, and saving it.
I have some Windows .reg files I want to edit from vim. It looks like I'm stuck at telling vim to use the proper encoding when opening them:
commands like :e ++enc=latin-1 are not working.
What is the proper way to edit .reg files in Windows gVim?
It works for me. My settings are:
filetype on
:echo &filetype &encoding &fileencoding &ff
registry utf-8 utf-16le dos
You either need to explicitly specify the encoding:
:edit ++enc=utf-16le file.reg
or ensure that 'fileencodings' starts with ucs-bom and that you use a Unicode encoding (you'll probably need the latter for the previous method as well, to see all characters correctly):
:set encoding=utf-8
:set fencs^=ucs-bom
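The point of putting ucs-bom first in 'fileencodings' is that Vim looks for a byte-order mark before trying the other encodings in the list. The sketch below models that order in Python; it is a simplification rather than Vim's actual algorithm, and the fallbacks tuple is an assumed stand-in for the rest of 'fileencodings'.

```python
import codecs

# BOMs to check, longest first: the UTF-32-LE BOM (FF FE 00 00) begins
# with the UTF-16-LE BOM (FF FE), so it has to be tested before it.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes, fallbacks=("utf-8", "latin-1")) -> str:
    """BOM first (like ucs-bom), then the first fallback that decodes cleanly.

    A real reader would skip the BOM bytes before decoding with the result.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in fallbacks:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("no encoding in the list could decode the data")
```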
I believe Windows writes .reg files in UTF-16 (little-endian), so try reloading the file with
:e! ++enc=utf-16le
I'm trying to remove all CRLF line endings from a project's git repo. I'm writing a command to grep through the repo recursively to find instances. However, when opened in vim, some of the hits very explicitly show ^M, while others do not display these characters.
However, when running
file <filename without visual ^M>
It says
blah.java ASCII Java program text, with CRLF line terminators
and
od -cx <filename without visual ^M>
returns with \r\n peppered throughout.
I'm just curious why vim sometimes shows them and sometimes doesn't.
EDIT:
I created a test text file and manually added ^M (i.e. Ctrl-V Ctrl-M), and vim displayed those characters. Then I ran:
sed -i '' -e 's/\r//g' controlm.txt
When I opened the file with vim, the visible ^M were gone, but od -cx still showed \r \n. However, I then ran
sed -i '' -e 's/^M//g' controlm.txt
This not only removed the visible ^M in vim; I've also confirmed with od -cx that the \r \n are now just \n.
This question would probably be better asked on Superuser.com, not here, because it's about using vim, not programming. But to answer it:
When opening a file, vim tries to detect whether it's an MS-DOS/Windows or a unix file. If all lines are terminated by \r\n, it's probably a DOS file; if only some of them are, vim assumes unix (with the default 'fileformats'). If the file format is set to DOS, vim strips the \r before each \n when reading the file, and shows [dos] in the status line directly after reading the file.
When writing the file back, it terminates each line with \r\n if the file format is dos, or with \n if the file format is unix. You can set the mode with the command
:se fileformat=unix
or
:se fileformat=dos
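The detection described above can be modelled in a few lines of Python. This is a simplified sketch, assuming 'fileformats' contains dos and unix but not mac, and ignoring other corner cases Vim handles: a file counts as dos only if every \n is preceded by a \r.

```python
from typing import Optional


def detect_fileformat(data: bytes) -> Optional[str]:
    """Simplified model of Vim's detection when 'fileformats' is dos,unix."""
    newlines = data.count(b"\n")
    if newlines == 0:
        return None  # no line break at all: not modelled in this sketch
    # "dos" only if every \n is part of a \r\n pair; one bare \n means "unix".
    if data.count(b"\r\n") == newlines:
        return "dos"
    return "unix"
```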
Try creating a file x.txt in Windows and opening it in vim. Then do :se fileformat=unix and :w y.txt; then :se fileformat=dos and :w z.txt. Test y.txt and z.txt with od -cx. z.txt will have \r\n line endings, y.txt won't.
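The same experiment can be simulated in Python. In this sketch (function and variable names are made up), the buffer is just a list of lines without terminators, and the file format only decides which terminator is appended when writing:

```python
# Terminator written after each line for a given 'fileformat'.
EOL = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def write_lines(lines: list, fileformat: str) -> bytes:
    """Join buffer lines the way :w would for the given 'fileformat'."""
    return b"".join(line + EOL[fileformat] for line in lines)


buffer = [b"first line", b"second line"]
y_txt = write_lines(buffer, "unix")  # :se fileformat=unix, then :w y.txt
z_txt = write_lines(buffer, "dos")   # :se fileformat=dos, then :w z.txt
print(y_txt)  # b'first line\nsecond line\n'
print(z_txt)  # b'first line\r\nsecond line\r\n'
```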
When only some, but not all, lines in the file end in \r, for example if (unix) git added some headers (without \r) to a file that was created on dos/windows, the file format detection sees the headers first, assumes unix, does not remove the \r from the rest of the file when reading, and shows those as ^M.
Guntram Blohm is right, but he left out part of the answer: the 'fileformats' option (short form 'ffs'). If you
:set ffs=unix
:e dosfile.txt
then vim will stubbornly refuse to accept the CRLF line endings and will show all the ^M characters explicitly.
:help 'ffs'
When I try to execute my script, I get "^M is an invalid character", but in Vim I only see $ when I enter :set list.
I tried :%s/^V^M//g, but it says the pattern ^M is not found.
I guess this happens because I used some .vimrc I found here which converts the end-of-line characters to $.
Without figuring this out, my only option would be retyping my script.
It looks like your script consistently has ^M line endings, and therefore got detected as fileformat=dos. :setlocal fileformat? will tell you.
To convert this file to Unix (LF) line endings, just :setlocal fileformat=unix and :write, or combine this in :w ++ff=unix.
If you never want Vim to detect such files (and show the ^M instead), put :set fileformats-=dos into your ~/.vimrc (or edit an existing config).
:set nolist and the dollar signs will disappear.
No matter the OS, you always have line endings in your text.
Line endings are whitespace and always present, but usually just not shown.
Are you using Windows or *nix?
On Windows, you can type Ctrl-Q Ctrl-M instead of Ctrl-V Ctrl-M to input the ^M. On *nix, you can just use dos2unix to convert your script file to unix format.
If you don't see ^M at the end of lines after opening your file, but vim complains about ^M in various places when you try to source it, the only thing you need to do is :w ++ff=unix and reopen the file.
When you open a file, vim detects the line ending format. That's why trying to substitute ^M does not work: all detected line endings are converted into the internal end of string. For example, when the file format is dos, as in your case, the file looks like
set nocompatible\r\nset ignorecase\r\n...
(where \r is the carriage return, sometimes represented as ^M, \n is the line feed character, and the \r\n sequence is the dos line ending). When the file format is unix, it looks like
set nocompatible\nset ignorecase\n...
For mac, it looks like
set nocompatible\rset ignorecase\r...
But if vim correctly detects the line endings, all these files are transformed into
"set nocompatible"
"set ignorecase"
"..."
C strings in the internal structure representing the buffer, each string representing one line. There is no \r and no \n in them.
When you do :w, the buffer is converted back into a sequence of bytes. :w ++ff=unix forces unix line endings. Reopening is needed because the 'fileformat' setting is not changed in this case, so the next :w without ++ff will save with dos line endings again. When you reopen the file, the line endings are redetected and 'fileformat' is reassigned. You can do :set fileformat=unix manually after :w ++ff=unix, but :e is much faster to type.
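The buffer model described above can be sketched as a toy Python class. This is an illustration under simplifying assumptions ('fileformats' of dos,unix, no mac, no encoding handling), not Vim's implementation; it shows why :w ++ff=unix leaves 'fileformat' unchanged until the file is reopened and redetected:

```python
from typing import Optional


class Buffer:
    """Toy Vim buffer: bare lines plus the detected 'fileformat'."""

    EOL = {"unix": b"\n", "dos": b"\r\n"}

    def __init__(self, data: bytes):
        # Simplified detection: dos only if every \n is preceded by \r.
        newlines = data.count(b"\n")
        is_dos = newlines > 0 and data.count(b"\r\n") == newlines
        self.fileformat = "dos" if is_dos else "unix"
        # Split on the detected terminator; for a consistent file,
        # no \r or \n remains in the stored lines.
        self.lines = data.split(self.EOL[self.fileformat])
        if self.lines[-1] == b"":
            self.lines.pop()  # drop the empty piece after the final terminator

    def write(self, ff: Optional[str] = None) -> bytes:
        """Like :w, or :w ++ff=... when ff is given; self.fileformat is not changed."""
        eol = self.EOL[ff or self.fileformat]
        return b"".join(line + eol for line in self.lines)


buf = Buffer(b"set nocompatible\r\nset ignorecase\r\n")
unix_bytes = buf.write(ff="unix")     # :w ++ff=unix
print(buf.fileformat)                 # prints: dos
print(Buffer(unix_bytes).fileformat)  # prints: unix (as after reopening with :e)
```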
Is there a text editor on Linux that allows me to see line breaks and carriage returns? Does Vim support this feature?
To disagree with the accepted answer:
:set list will not show ^M characters (CRs). Supplying the -b option to vi/Vim will work. Or, once Vim is loaded, type :e ++ff=unix.
This assumes your :set listchars=... setting is configured to visualize the characters you are trying to see, in this case carriage return characters (typed with Ctrl-V Ctrl-M); otherwise, as reported in many of the comments on this answer, the ^M character will not show with :set list.
:set list in Vim will show whitespace. End of lines show as '$' and carriage returns usually show as '^M'.
vi shows newlines (LF character, code x0A) by showing the subsequent text on the next line.
Use the -b switch for binary mode. For example, vi -b filename or vim -b filename --.
It will then show CR characters (x0D), which are not normally used in Unix style files, as the characters ^M.
Just to clarify why :set list won't show CRs as ^M without :e ++ff=unix, and why :set list has nothing to do with ^Ms.
Internally when Vim reads a file into its buffer, it replaces all line-ending characters with its own representation (let's call it $'s). To determine what characters should be removed, it firstly detects in what format line endings are stored in a file. If there are only CRLF '\r\n' or only CR '\r' or only LF '\n' line-ending characters, then the 'fileformat' is set to dos, mac and unix respectively.
When the 'list' option is set, Vim displays a $ character wherever a line break occurs, no matter which 'fileformat' has been detected. It uses its own internal representation of line breaks, and that's what it displays.
Now when you write the buffer to disk, Vim inserts line-ending characters according to the 'fileformat' that has been detected, essentially converting all those internal $'s into the appropriate characters. If the fileformat happens to be unix, it simply writes \n in place of each internal line break.
The trick is to force Vim to read a dos-formatted file as a unix one. The net effect is that it removes only the \n's, leaving the \r's untouched and displaying them as ^M in your buffer. Setting :set list additionally shows the internal line endings as $. As a result, you see ^M$ in place of the dos line breaks.
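To make the ^M$ effect concrete, here is a small Python sketch (the function name is hypothetical, and it only approximates what Vim displays): the dos file is split on \n alone, as :e ++ff=unix does, so each line keeps its trailing \r, which is rendered in caret notation as ^M, with $ appended as 'list' would.

```python
def show_as_unix_with_list(data: bytes) -> list:
    """Approximate what :e ++ff=unix followed by :set list displays."""
    # Forcing unix: split only on \n, so a \r before it stays in the line.
    lines = data.split(b"\n")
    if lines[-1] == b"":
        lines.pop()  # nothing after the final \n
    shown = []
    for line in lines:
        # A CR is displayed as ^M; 'list' marks each line end with $.
        text = line.decode("utf-8", errors="replace").replace("\r", "^M")
        shown.append(text + "$")
    return shown


for row in show_as_unix_with_list(b"set nocompatible\r\nset ignorecase\r\n"):
    print(row)
# set nocompatible^M$
# set ignorecase^M$
```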
Also notice that :set list has nothing to do with showing ^M's. You can check this yourself (make sure you have disabled the 'list' option first) by inserting a single CR with CTRL-V followed by Enter in insert mode. After writing the buffer to disk and opening it again, you will see ^M even though 'list' is off.
You can find more about file formats at http://vim.wikia.com/wiki/File_format or by typing :help 'fileformat' in Vim.
Try the following command.
:set binary
In Vim, this should do the same thing as using the "-b" command line option. If you put this in your startup (i.e., .vimrc) file, it will always be in place for you.
On many *nix systems, there is a "dos2unix" or "unix2dos" command that can process the file and correct any suspected line ending issues. If there aren't any problems with the line endings, the files will not be changed.
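The core of what dos2unix and unix2dos do can be sketched in Python as a byte replacement. This is only an illustration: the real tools have many more options and safeguards, and the function names here simply mirror the commands.

```python
def dos2unix(data: bytes) -> bytes:
    """Turn CRLF line endings into LF; lone CRs and plain LFs are left alone."""
    return data.replace(b"\r\n", b"\n")


def unix2dos(data: bytes) -> bytes:
    """Turn LF line endings into CRLF without doubling existing CRLFs."""
    # Normalise first, so a file that already uses CRLF comes out unchanged.
    return dos2unix(data).replace(b"\n", b"\r\n")
```

Both functions leave an already-correct file byte-for-byte identical, which matches the behaviour described above.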
I suggest editing your .vimrc file so that it runs a list of commands.
Edit your .vimrc file, like this:
cat >> ~/.vimrc <<EOF
set ffs=unix
set encoding=utf-8
set fileencoding=utf-8
set listchars=eol:¶
set list
EOF
When you start Vim, the commands in .vimrc are executed, and a line with a CRLF ending then looks like this:
My line with CRLF eol here ^M¶
With cat -A you can see line ends as $, tabs as ^I, and carriage returns as ^M:
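For reference, the marking that cat -A does on plain ASCII text can be approximated in Python. This is a sketch, not GNU cat's exact algorithm: it covers control characters (shown as ^ plus a letter, so TAB becomes ^I and CR becomes ^M), DEL (^?) and the $ at each line end, but not the M- notation cat uses for bytes above 0x7F. The function name is made up.

```python
def show_all(data: bytes) -> str:
    """Approximate cat -A for ASCII input."""
    out = []
    for byte in data:
        if byte == 0x0A:        # LF: mark the line end with $, keep the newline
            out.append("$\n")
        elif byte < 0x20:       # other control chars: TAB -> ^I, CR -> ^M, ...
            out.append("^" + chr(byte + 0x40))
        elif byte == 0x7F:      # DEL
            out.append("^?")
        else:                   # printable ASCII (bytes >= 0x80 not handled specially)
            out.append(chr(byte))
    return "".join(out)


print(show_all(b"key:\tvalue\r\n"), end="")
# key:^Ivalue^M$
```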
cat -A myfile
You can view line breaks using the gedit editor.
First, if you don't have it installed, for Debian/Ubuntu/Mint based distros:
sudo apt-get install gedit
For Fedora/CentOS/RedHat based distros:
sudo dnf install gedit
or
sudo yum install gedit
Now, install gedit plugins:
sudo apt-get install gedit-plugins
or install them manually:
Under Gnome2, user plugins were put into ~/.gnome2/gedit/plugins/
For Gnome3: ~/.local/share/gedit/plugins/
Download the plugins from: https://help.gnome.org/users/gedit/stable/gedit-plugin-guide.html.en#gedit-additional-plugins
Then select the Draw Spaces plugin, open its Preferences, and choose Draw new lines.
Using Visual Studio Code, you can install the Line endings extension.
Sublime Text 3 has a plugin called RawLineEdit that displays line endings and allows the insertion of arbitrary line-ending types.
Press Shift+Ctrl+P, start typing the name of the plugin, and toggle it to show line endings.
Add the following alias to your .bashrc or .bash_aliases:
alias vidos='vi -c ":e ++ff=unix" -c "set list"'
Then you can use vidos to edit the file and see newline as $ and carriage return as ^M.