I create a simple test file like this:
$ cat > test
blah
Now I run vi, then :%!xxd, and edit the first bytes to FFD8 FF:
00000000: ffd8 ffe0 0a blah.
and then I run :%!xxd -r.
file does NOT report a JPEG:
$ file test
test: Unicode text, UTF-8 text
And when I take a hexdump:
$ xxd test
00000000: c3bf c398 c3bf c3a0 0a .........
What am I doing wrong with xxd?
Thank you
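When opening the file with vi, make sure to start it in the C locale:
LC_ALL=C vi test
then edit and save the file using the procedure you showed. In a UTF-8 locale, vi treats the bytes FF D8 FF E0 as the characters ÿØÿà and writes each of them back as a two-byte UTF-8 sequence, which is the c3bf c398 c3bf c3a0 in your dump.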
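The doubled bytes in the question's dump can be reproduced outside vi. A minimal sketch in Python, assuming the editor read the four bytes as Latin-1 characters and wrote them back as UTF-8 (the function name reencode_as_utf8 is made up for illustration):

```python
def reencode_as_utf8(raw: bytes) -> bytes:
    """Interpret each byte as one Latin-1 character, then encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


# The bytes typed into the xxd dump in the question.
jpeg_magic = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# Every byte >= 0x80 becomes a two-byte UTF-8 sequence.
mangled = reencode_as_utf8(jpeg_magic)
print(mangled.hex(" ", 2))  # c3bf c398 c3bf c3a0
```

Plain ASCII such as "blah" passes through unchanged, which is why only the edited bytes look wrong.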
in a nutshell, i would like to be able to type and display characters from iso-8859-1 on my cygwin mintty. unfortunately i haven't figured out how to do this.
my locale :
$ locale
LANG=C.ISO-8859-1
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=C
mintty is configured as an xterm (although it seems to make no difference what terminal emulation i choose), and through options => text, i have configured the 'locale' section as C and the character set as ISO-8859-1.
when i type any accented character from my keyboard, the character does not display on the terminal. however, if i invoke cat, the characters i type display correctly. also, when i edit using vi (well, vim, actually), i am able to type (and display) accented characters without problems. so the problem seems to have something to do with the shell and not with the terminal emulation itself.
furthermore, if i write a little script to make a file named, for example, être.utx, the file displays as ???tre.utx when i ls it. looking at its hex, i get
$ ls *.utx | od -c -tx1
0000000 357 203 252 t r e . u t x \n
ef 83 aa 74 72 65 2e 75 74 78 0a
0000013
so it seems the script i wrote is creating a file whose name begins with the three-byte sequence 0xEF 0x83 0xAA, rather than the single-byte character whose encoding should be 0xEA. i don't know how to interpret this; i know it isn't the utf-8 encoding of ê, which would be 0xC3 0xAA.
it appears there is only one character set in my cygwin configuration that is configured to support 8859-1 : norwegian. [of course, i suppose i could learn norwegian, but i would prefer something a bit less strenuous, if possible...]
in any case, does anyone have an idea what i am doing wrong ?
many thanks in advance.
Just set mintty's locale to something utf8-ish.
In my case:
Window Menu (Alt+Space)
Options… (o)
Text (l.h. panel)
Locale → en_GB
Character Set → UTF-8
[Save]
Quit and restart
$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=
$ echo $'\u2154'
⅔
Nice
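As a side note on the od output in the question: the three bytes 0xEF 0x83 0xAA are well-formed UTF-8, just not the encoding of ê. A small Python check, offered as a sketch: it decodes them to a Private Use Area code point whose low byte is 0xEA. Why Cygwin produced that code point is not established in this thread; that it mapped an unconvertible byte into the U+F0xx range is only an assumption.

```python
# Bytes reported by od at the start of the file name in the question.
name_prefix = bytes([0xEF, 0x83, 0xAA])

# Decoding succeeds, so the sequence is valid UTF-8 for a single character.
ch = name_prefix.decode("utf-8")
code_point = ord(ch)
print(f"U+{code_point:04X}")  # U+F0EA

# U+E000..U+F8FF is the Unicode Private Use Area.
in_private_use_area = 0xE000 <= code_point <= 0xF8FF

# The low byte equals the ISO-8859-1 code for ê.
low_byte = code_point & 0xFF
print(hex(low_byte), "ê".encode("latin-1").hex())  # 0xea ea
```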
I have searched high and low for anyone asking a similar question. It does not seem to be a simple case of :set fileformat=dos or :set fileformat=unix.
Writing the file out with :set fileencoding=latin1 and :set fileformat=dos changed it such that git diff reports every line as having ^M appended.
The code was originally happily existing as:
...
if (v == value32S)
{
...
I made the outrageously radical improvement to (which looks fine on the screen in vim):
...
if (v == value32S ||
v == value33)
{
...
But git diff to check for erroneous changes shows:
diff --git a/csettings.cpp b/csettings.cpp
index 1234..8901 100755
--- a/csettings.cpp
+++ b/csettings.cpp
@@ -2466,7 +2466,8 @@ bool MyClass::settingIsValid(QString s)
#if CONFIG_1 || CONFIG_2
- if (v == value32S)
+ if (v == value32S ||^M
+ v == value33)^M
{
doSomething(new_v);
where the ^M characters are shown in reverse video.
I have tried several means to make the apparently spurious carriage returns go away. First was to be sure there wasn't a hidden character. View with vim :set list:
...
if (v == value32S ||$
v == value33)$
{$
...
Seems fine. Dumping the file (microdetails vary to protect NDA, and I am too lazy to make it a perfect deception):
$ hd csettings.cpp
(...)
0000eae0 xx xx xx xx xx xx xx xx xx 65 33 32 53 20 7c 7c |(v == value32S |||
0000eaf0 0d 0a 20 20 20 20 20 20 20 20 20 20 20 20 76 20 |.. v |
0000eb00 3d 3a 20 xx xx xx xx xx xx 65 33 33 29 0d 0a 20 |== ...value33).. |
All of the other lines also end in "0d 0a", so this looks fine. An interesting suggestion was to use cat -e (which was new to me):
$ cat -e c.cpp
...
if (v == value32S ||^M$
v == value33)^M$
{^M$
...
Another suggestion was to use file for clues:
$ file csettings.cpp
csettings.cpp: C source, UTF-8 Unicode text, with CRLF line terminators
Interestingly, this is the only file in this directory (of header files and cpp code) which isn't ASCII text. Some files have CRLF line terminators and some do not. Also, some show C++ source and others are C source which I assume isn't significant.
Deleting the file and git checkout to get a fresh copy also shows it as UTF-8, which I traced to having the degree symbol in some strings ("°F" and "°C") so UTF-8 doesn't seem to be an issue.
Still, I don't see why using vim to edit only these lines would cause this problem. Or maybe it isn't a problem? Any ideas?
----- Addendum -----
git config --get-regexp core.* shows
core.repositoryformatversion 0
core.filemode true
core.bare false
core.logallrefupdates true
By default, Git assumes that you're using Unix line endings in the repository and highlights carriage returns as trailing whitespace. However, by default, it highlights trailing whitespace only on new lines, since the goal is to avoid introducing new problems.
If you run git diff --ws-error-highlight=all, you'll see that there are also carriage returns on the lines being removed and on the context lines. If you don't want to see this, you can set core.whitespace to cr-at-eol, which will prevent it from being highlighted. There are no ill effects to this; it simply prevents carriage returns from being treated as trailing whitespace.
If you're planning on using this project on non-Windows systems, you should convert the line endings to Unix and use a .gitattributes file to specify the text attribute for text files so the line ending is automatically converted based on the operating system in use. This may be valuable even if your project is only used on Windows, since if someone has core.autocrlf set, you may end up with mixed line endings.
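Before deciding whether to normalize, it can help to know whether a file is uniformly CRLF or mixed. A rough Python sketch (the helper name count_line_endings and the sample bytes are made up for illustration; this is not something Git provides):

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare LFs and bare CRs in a byte string."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf   # LFs not preceded by CR
    cr_only = data.count(b"\r") - crlf   # CRs not followed by LF
    return {"crlf": crlf, "lf": lf_only, "cr": cr_only}


# Two CRLF lines followed by one LF-only line, i.e. a mixed file.
sample = b"if (v == value32S ||\r\n    v == value33)\r\n{\n"
print(count_line_endings(sample))  # {'crlf': 2, 'lf': 1, 'cr': 0}
```

For a real file, pass the result of open(path, "rb").read(). A non-zero count in more than one bucket means the endings are mixed.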
Assuming you are using a Unix-based operating system:
Normally, ^M characters are not visible when you view a file with vi or cat.
You can see them with the cat -v command.
E.g.
cat -v <file_name>
To get rid of these characters, use the dos2unix command.
E.g.
dos2unix <file_name>
This removes the ^M characters and saves the result in the same file, so you don't need a temporary file for the intermediate content.
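If dos2unix is not installed, the same conversion can be approximated in a few lines of Python. This is only a sketch of the idea, not a reimplementation of dos2unix; the function names are made up, and it is assumed that only CR LF pairs should change:

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CR LF pair with a single LF; lone CRs are left alone."""
    return data.replace(b"\r\n", b"\n")


def convert_file_in_place(path: str) -> None:
    """Read the whole file as bytes, convert it, and write it back."""
    p = Path(path)
    p.write_bytes(crlf_to_lf(p.read_bytes()))


print(crlf_to_lf(b"line one\r\nline two\r\n"))  # b'line one\nline two\n'
```

Working on bytes rather than text avoids any encoding guesswork, so non-ASCII characters in the file are passed through untouched.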
Why does this shell script add a return to the filename of the output file?
#!/bin/bash
/usr/bin/tail -n 1 /path/logchanged.csv >> "/path/logcontatenated.csv"
The filename is not called "logcontatenated.csv", but "logcontatenated.csv
"
I really can't find on the internet why this happens.
Could it be that you created that script using Windows? If the line ends in \r\n without trailing spaces the file name is interpreted as logcontatenated.csv\r. Try hd yourscript.sh to display a hexdump of your script. Line breaks should be only a single byte of 0a rather than two bytes of 0d 0a, i.e. make sure the byte before any 0a is NOT 0d. You could use dos2unix yourscript.sh to fix your script. You might need to install dos2unix first.
EDIT: Replaced 0c with 0d.
I was given a trace file in XML format (created on a Windows machine). When I open it in Vim or cat it on the command line (on Mac or Linux), it looks fine. But after an XML parser failed to load the document as I'd expect, I dug a little deeper and found that there are non-printable chars throughout:
h001:logs bill$ xxd trace.xml | head -n 3
0000000: fffe 3c00 3f00 7800 6d00 6c00 2000 7600 ..<.?.x.m.l. .v.
0000010: 6500 7200 7300 6900 6f00 6e00 3d00 2200 e.r.s.i.o.n.=.".
0000020: 3100 2e00 3000 2200 2000 6500 6e00 6300 1...0.". .e.n.c.
I then tried the following with no luck removing these non-printed chars:
:%s/[^[:print:]]//g
:%s/[^[:control:]]//g
:%s/[^[:null:]]//g
I'm figuring this is due to the fact I'm switching from Windows to Linux, but I'm not seeing any of the usual artifacts (e.g. ^M, ^#, etc).
Any thoughts on what's happening here and what would be the right way to remove these from within Vim?
The problem is your XML parser doesn't understand UTF-16.
You can convert it by opening an empty vim session and doing:
:e ++enc=utf-16le file.txt
:w ++enc=utf8
This opens the file as UTF-16 little-endian and then saves it as UTF-8.
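The same conversion can be done outside Vim. A minimal Python sketch, assuming the file starts with a byte-order mark as in the xxd output from the question (the function name and the sample bytes are made up for illustration):

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 (endianness taken from the BOM) and re-encode as UTF-8."""
    # The "utf-16" codec reads the byte-order mark (ff fe here) to pick the
    # endianness and drops it from the decoded text.
    return data.decode("utf-16").encode("utf-8")


# Build a sample that starts with the same bytes as the dump in the question.
sample = codecs.BOM_UTF16_LE + '<?xml version="1.0"'.encode("utf-16-le")
print(sample[:6].hex(" ", 2))  # fffe 3c00 3f00
print(utf16_to_utf8(sample))   # b'<?xml version="1.0"'
```

The 00 bytes between the characters in the dump are the high halves of the 16-bit code units, which is why stripping "non-printable" characters with a substitution does not help.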
Esteemed Meld and Emacs/ESS users,
What I did:
Create a script.r using Emacs/ESS.
Make some modifications to script.r by pulling some lines of code from another_script.r
Reopen another_script.r (or script.r) in Emacs/ESS to continue working.
All the lines in another_script.r which were not pushed to script.r end with ^M
Sometimes it's the other way around: only the line that was pushed/pulled ends with ^M. So far i haven't isolated exactly which action determines where the ^M's are placed. Either way i still end up with ^M's all over the place and i'd like to avoid getting them after using Meld!
FWIW: the directory is being synced by Dropbox; in Meld, Preferences > Encoding tab, "utf8" is entered in the text box; all actions are performed under Linux (Ubuntu 12.04) with Meld v1.5.3, Emacs v23.3.1
Current workaround is running in a terminal: dos2unix /path/to/script.r which strips the ^M's. But this shouldn't be necessary and I'm hoping someone here can tell me how to avoid these.
Cheers.
In a terminal i ran cat script.r | hexdump -C | head and amongst the output returned found a 0d 0a, which is DOS formatting for a new line (carriage return 0d immediately followed by a line feed 0a). I ran the same command on another_script.r i was merging with but only observed 0a, no 0d 0a, indicating Unix formatting.
To check further whether this was the source of the ^M line endings, i converted script.r to unix formatting via dos2unix script.r and verified that 0d 0a was converted to 0a using hexdump -C as above. I then performed a merge using Meld, attempting to replicate the process which had yielded ^M line endings in my scripts. I reopened both files in Emacs/ESS and found no ^M line endings. Short of converting script.r back to dos formatting and repeating the above procedure to see if the ^M line endings reappear, i believe i've solved my ^M issue, which simply is that, unbeknownst to me, one of my files was dos formatted. My take-home message: in a Windows-dominated environment, never assume that one's personal linux environment doesn't contain DOS bits. Or line endings.