A recent problem* left me wondering whether there is a text editor that lets you see every single character of a file, even the invisible ones. Specifically, I'm not looking for hex editing capabilities; I'm interested in a text editor that will show me all of the invisible characters, not just the common whitespace and line-break characters. The BOM marker is just one example; others are mathematical invisibles or possibly unsupported characters.
I'm not looking for a text editor that simply supports a large variety of text encodings or translations between encodings. All text editors I've come across treat the invisible characters "correctly", i.e. leave them invisible (or the characters simply get removed in the translation, as in the case of the BOM marker).
I'm asking this mostly out of academic interest, so I'm not particular about any specific OS. I can easily test Linux and OS X solutions, but if you recommend a Windows editor, I would appreciate it if you included a description of how the editor handles invisibles other than whitespace and line breaks.
EDIT: I'm beginning to be sure that the behavior I want can be implemented in emacs/vim via either custom highlighting or by messing around with the font itself. A solution of this type would also be acceptable.
EDIT2: After looking at several options I found TextMate which at least shows a blank space where an invisible UTF-8 character is in the file. Slightly disappointed with SO's ability to answer my question. Bounty goes to VIM, because that is the direction in which the solution most likely lies.
*The incident that led me to this question: I wrote a Perl script using TextWrangler and managed to change the encoding to UTF-8 BOM, which inserts the BOM marker at the start of the file. Perl (or rather the operating system) promptly misses the #! and mayhem ensues. It then took me the better part of an afternoon to figure this out, since most text editors do not show the BOM marker even with various "show invisibles" options turned on. Now I've learned my lesson and will use less immediately :-).
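The BOM mishap described in the footnote is easy to reproduce from the shell. This is a sketch; the filenames are made up for illustration, and the octal escapes encode the UTF-8 BOM (EF BB BF) and a zero-width space (E2 80 8B):

```shell
# Create a script that starts with a BOM and also contains a zero-width
# space; most editors will show neither character:
printf '\357\273\277#!/usr/bin/perl\nfoo\342\200\213bar\n' > test.pl

# The hex dump reveals what the editor hides:
xxd test.pl | head -n 1

# Strip the 3-byte BOM by copying everything from byte 4 onward:
tail -c +4 test.pl > test-nobom.pl
```

With the BOM in place, the kernel no longer sees `#!` as the first two bytes, which is exactly why the script stopped running.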
vim (in either textual or graphic mode) can show all control characters if you :set list. The BOM is a special case, controlled by the :set bomb or :set nobomb commands.
In Visual Studio's Open File dialog, the Open pushbutton has a down arrow next to it that lets you choose Open With.... One of the options in the resulting dialog is Binary Editor.
I've used this now and then to spot some invisible character or to resolve some line-ending issue.
Notepad++ rocks.
Open the file in EMACS and do a M-X hexl-mode. You'll get a display that looks like this:
87654321 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789abcdef
00000000: 2320 2020 2020 2020 2020 2020 2020 2020 #
00000010: 2020 2020 2020 2020 2020 2020 2020 2020
00000020: 2020 2020 2020 2020 2020 2020 2020 2020
00000030: 2d2a 2d20 4175 746f 636f 6e66 202d 2a2d -*- Autoconf -*-
00000040: 0a23 2050 726f 6365 7373 2074 6869 7320 .# Process this
00000050: 6669 6c65 2077 6974 6820 6175 746f 636f file with autoco
00000060: 6e66 2074 6f20 7072 6f64 7563 6520 6120 nf to produce a
00000070: 636f 6e66 6967 7572 6520 7363 7269 7074 configure script
00000080: 2e0a 2320 4f72 6465 7220 6973 206c 6172 ..# Order is lar
00000090: 6765 6c79 2069 7272 6576 656c 6c61 6e74 gely irrevellant
000000a0: 2c20 616c 7468 6f75 6768 2069 7420 6d75 , although it mu
000000b0: 7374 2073 7461 7274 2077 6974 6820 4143 st start with AC
000000c0: 5f49 4e49 5420 616e 6420 656e 6420 7769 _INIT and end wi
000000d0: 7468 2041 435f 4f55 5450 5554 0a23 2053 th AC_OUTPUT.# S
000000e0: 6565 2068 7474 703a 2f2f 6175 746f 746f ee http://autoto
000000f0: 6f6c 7365 742e 736f 7572 6365 666f 7267 olset.sourceforg
00000100: 652e 6e65 742f 7475 746f 7269 616c 2e68 e.net/tutorial.h
I've encountered the same limitations — my specific issue is the need to display characters like U+200B, the zero-width space, and U+200C, the zero-width non-joiner. (These are used in electronic texts in languages such as Khmer, which otherwise do not separate words with spaces.) Unlike you, for whom platform doesn't matter, I need an editor with Windows and Linux versions, and a Mac version is desirable too.
I haven't found any text editors that will let you display them on-screen, although some (many?) will let you enter them and will properly treat them as characters that can be cut and pasted, and whose presence is indicated via cursor movement. (That is, if the screen shows "if" and there are three ZWSPs between the "i" and the "f", you have to press the arrow key four times to move from "i" to "f".)
TextPad 4.7.3 is otherwise my text editor of choice, but it is very limited in its acceptance of scripts; and TextPad 5 definitely does not show these invisibles.
I have often resorted to opening my files in OpenOffice.org Writer, which will show a gray slash at these characters' location with invisibles turned on, and Microsoft Word, which displays a double-box (box within a box) character for such invisibles. This double-box has width and changes the line-breaks on-screen, which is not trivial and which I haven't seen in any other editor.
You can also use Notepad++ to show them. Here is an example; the black boxes are control characters.
I prefer UltraEdit even though it is not free. It is very capable of showing hidden characters, including a robust HEX viewing mode. (I am not affiliated with the publisher, IDM.)
I am not sure as I haven't used it in a while, but I remember that SciTE was a good one that showed me "too much information" for my needs.
Programmer's Notepad on Windows might work.
TextPad (It's nagware, runs on Windows)
I'm not sure which of these will show the hidden characters out of the box, but they're all made for "nerdy" stuff, so I assume that they would work, at least with a little tweaking. I can verify that Programmer's Notepad does show "hidden" characters.
If you are running a 32-bit version of Windows, you can see BOMs and other invisible characters, such as carriage returns or line feeds (which look like a musical eighth note), in the MS-DOS Editor, which you can open by typing "edit" in the Run box or at a command prompt.
Unfortunately, the MS-DOS Editor is not available on 64-bit systems :(
On a GCP Compute Engine Linux instance, I accidentally did cat filebeat (the binary) instead of cat filebeat.yaml.
After that, my terminal output is garbled: if I type '~', bash prints 'ü', and my .bashrc appears to contain the characters below.
Need help fixing this.
if Ä -f ü/.bash_aliases Å; then
. ü/.bash_aliases
fi
This looks like your terminal was accidentally configured for legacy ISO-646-SE or a variant. Your file is probably fine; it's just that your terminal remaps the display characters according to a scheme from the 1980s.
A quick hex dump should verify that the characters in the file are actually correct. Here's an example of what you should see.
bash$ echo '[\]' | xxd
00000000: 5b5c 5d0a [\].
Even if the characters are displayed as ÄÖÅ, they are correct if you see the hex codes 5B, 5C, and 5D. (If you don't have xxd, try hexdump or od -t x1.)
Probably
bash$ tput reset
can set your terminal back to sane settings. Maybe stty sane might work too (but less likely, in my experience). Else, try logging out and back in.
Back when ASCII was the only game in town, but American (or really any) hardware was exported to places where the character repertoire was insufficient, the local vendor would replace the ROM chips in terminals to remap some slightly less common character codes to be displayed as the missing local glyphs. Over time, this became standardized; the ISO-646 standard was updated to document these local overrides. (The linked Wikipedia page has a number of tables with details.)
Eventually, 8-bit character sets became the norm, and then most locales switched to Latin-1 or some other suitable character set which no longer needed this hack. However, it was still rather prevalent even in the early 1990s. In the early 2000s, Unicode started taking over, and so now this seems like an absurd arrangement.
I'm guessing the file you happened to cat contained some control characters which instructed your terminal to switch to this legacy character set. It's not entirely uncommon (though usually when it happens to me, it switches to some "graphical" character set where some characters display box-drawing characters or mathematical symbols).
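In the common "graphical character set" case, a stray SO control byte from the cat'ed binary is the culprit, and the fix can be typed blind. This is a sketch of the standard escape sequences; whether it helps in the ISO-646 remapping case depends on how the terminal was switched:

```shell
# SO (0x0e) switches the terminal to the alternate character set;
# SI (0x0f) switches it back to the primary one:
printf '\017'

# Stronger fallback: ESC c is the full terminal reset (RIS), which is
# among the things "tput reset" sends:
printf '\033c'
```

If neither helps, logging out and back in, as suggested above, gives you a freshly initialized terminal.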
I've been tinkering with multiple hex editors but nothing really has worked.
What I'm looking for is a way to change a binary in actual binary (not in hex). This is purely for educational purposes and I know it's trivial to convert between both, but I wanted to be able to change the ones and zeroes just like I would do hex.
I've tried using vim with :%!xxd -b, but then the dump won't round-trip with :%!xxd -r. I know how to convert the file to binary, but I'm looking for a way to edit it dynamically in this format and be able to save it.
Better yet would be if I could find a way to actually create a binary by coding purely in actual binary.
Any help would be appreciated :D
vim or gvim should work for you directly, without the xxd filter.
Open the file in (g)vim. Place your cursor on a character and type ga to see its character code in the status line. To insert character NNN, place your cursor where you want it, go into insert mode, and type Ctrl-v followed by the three-digit decimal code. Use Ctrl-v x HH to enter the character by its hexadecimal code.
Make sure your terminal is not set to use UTF-8, because in UTF-8, typing Ctrl-v 128 will in fact insert c2 80, the UTF-8 encoding of character 128, instead of 80.
LC_ALL=C vim binary-file
is the easiest way to make sure you're doing binary character based editing in vim, but that might do weird things if your terminal is utf-8.
LC_ALL=C gvim binary-file
should open a stand-alone window with proper display.
FYI, if you did want to work in utf-8, Ctrl-v u HHHH is how to enter the Unicode character with Hex code point HHHH.
Windows
Open cmd.exe, Notepad++, or whatever editor.
Enable the Num Lock key.
On laptops you need to use the function key or the blue/grey numbers printed above the letter keys (using the numbers on the top row will not work, as they map to different scan codes).
Pressing the Alt key + 255 will insert 0xff.
Pressing the Alt key + 254 will insert 0xfe.
see below for a demo
C:\>copy con rawbin.bin
■²ⁿ√·∙⌂~}─^Z
^Z
1 file(s) copied.
C:\>xxd rawbin.bin
0000000: fffe fdfc fbfa f97f 7e7d c41a 0d0a ........~}....
C:\>
I want/need to edit files with UTF-8 characters in it and I want to use Vim for it.
Before I get accused of asking something that was asked before, I've read the Vim documentation on encoding, fileencoding[s], termencoding and more, googled the subject, and read this question among other texts.
Here is a sentence with a UTF-8 character in it that I use as a test case.
From Japanese 勝 (katsu) meaning "victory"
If I open the (UTF-8) file with Notepad it is displayed correct.
When I open it with Vim, the best thing I get is a black square where the Japanese character for katsu should be.
Changing any of the settings for fileencoding or encoding does not make a difference.
Why is Vim giving me a black square where Notepad is displaying it without problems? If I copy the text from Vim with copy/paste to Notepad it is displayed correctly, indicating that the text is not corrupted but displayed wrong. But what setting(s) have influence on that?
Here is the relevant part of my _vimrc:
if has("multi_byte")
  set encoding=utf-8
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  setglobal fileencoding=utf-8
  set fileencodings=ucs-bom,utf-8,latin1
endif
The actual settings when I open the file are:
encoding=utf-8
fileencoding=utf-8
termencoding=utf-8
My PC is running Windows 10, language is English (United States).
This is what the content of the file looks like after loading it in Vim and converting it to hex:
0000000: efbb bf46 726f 6d20 4a61 7061 6e65 7365 ...From Japanese
0000010: 20e5 8b9d 2028 6b61 7473 7529 206d 6561 ... (katsu) mea
0000020: 6e69 6e67 2022 7669 6374 6f72 7922 0d0a ning "victory"..
The first three bytes are the Microsoft BOM magic; the rest is just like ASCII, except for the second, third, and fourth bytes on the second line, which must represent the non-ASCII character somehow.
There are two steps to make Vim successfully display a UTF-8 character:
File encoding. You've correctly identified that this is controlled by the 'encoding' and 'fileencodings' options. Once you've properly set this up (which you can verify via :setlocal fileencoding?, or the ga command on a known character, or at least by checking that each character is represented by a single cell, not its constituent byte values), there's:
Character display. That is, you need to use a font that contains the needed glyphs. Unicode is large; most fonts don't cover all of it. In my experience, that's less of a problem on Linux, which seems to have some automatic fallbacks built in. But on Windows, you need to have a proper font installed and configured (in gVim: 'guifont').
For example, to properly display Japanese Kanji characters, you need to install the far eastern language support in Windows, and then
:set guifont=MS_Gothic:h12:cSHIFTJIS
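Independent of vim's settings, it can help to confirm that the file itself really is valid UTF-8 before blaming the font. A quick shell check (the filename is an example; the exact wording of file's output varies by version):

```shell
# "file" reports the detected encoding, e.g. "UTF-8 Unicode (with BOM) text":
file test.txt

# iconv exits non-zero if the file contains invalid UTF-8 byte sequences:
iconv -f utf-8 -t utf-8 test.txt > /dev/null && echo "valid UTF-8"
```

If this passes but vim still shows a black square, the problem is step 2 above (the font), not the encoding.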
Questions
In the following example:
Why, in the first line, is a \t inserted but only a 4-column blank displayed? Shouldn't it be 8, according to tabstop?
Why do the <TAB> hits in the first and fifth lines have different results? (One is 09 and the other is 20202020.)
.vimrc
set noexpandtab
set tabstop=8
set shiftwidth=4
set softtabstop=4
Text
I created the following text by first inserting 8 empty lines and then typing at the beginning of each line, so that no shiftwidth-based indent was generated (each 4-column blank below was inserted by hitting <TAB>):
1tab done
2tabs done
3tabs done
4tabs done
1tab
2tabs
3tabs
4tabs
hex representation
Then I did :%!xxd, this is what I got (with some trailing newlines):
0000000: 3174 6162 0964 6f6e 650a 3274 6162 7309  1tab.done.2tabs.
0000010: 2020 2020 646f 6e65 0a33 7461 6273 0909      done.3tabs..
0000020: 646f 6e65 0a34 7461 6273 0909 2020 2020  done.4tabs..
0000030: 646f 6e65 0a20 2020 2031 7461 620a 0932  done.    1tab..2
0000040: 7461 6273 0a09 2020 2020 3374 6162 730a  tabs..    3tabs.
0000050: 0909 3474 6162 730a 0a0a 0a0a 0a0a 0a0a  ..4tabs.........
Related discussion
There are some pretty good answers here but I still don't understand what is going on in this particular case.
softtabstop is meant to be used that way: it makes your tabs appear a certain length even though they really are the length of tabstop.
'softtabstop' 'sts' number (default 0)
...
This is useful to keep the 'ts' setting at its standard value
of 8, while being able to edit like it is set to 'sts'. However,
commands like "x" still work on the actual characters.
As for the weird behaviour of inserting spaces at the beginning, I'm convinced it has to do with smarttab which is on by default and shifts instead of inserting a tab at the beginning of a line.
'smarttab' 'sta' boolean (default on) global
When on, a <Tab> in front of a line inserts blanks according to
'shiftwidth'. 'tabstop' or 'softtabstop' is used in other places. A
<BS> will delete a 'shiftwidth' worth of space at the start of the
line.
IGI-111 answered your first question correctly, albeit maybe not clearly enough.
Why in the first line, a \t is inserted but only 4-col blank is displayed? Shouldn't it be 8 according to tabstop?
Keep in mind that TAB doesn't insert a fixed number of blanks, but advances the position to the next multiple of a certain number (if numbered from zero).
What do you mean by "even though they really are the length of tabstop"?
He must mean that TAB characters, when printed to your terminal, would cause the cursor position to advance to the next multiple of tabstop (8), while pressing Tab in vim advances to the next multiple of softtabstop (4). If this next multiple of softtabstop isn't a multiple of tabstop, vim has to use spaces (0x20).
Why the two <TAB> hits in the first line and fifth lines have different results? (one is 09 and the other is 20202020)
That's because in the first line you hit Tab after you typed 1tab, i.e. you were at position 4 and needed to advance to position 8, which is a multiple of tabstop, so a TAB (0x09) did the job. In the fifth line you hit Tab at position 0 and needed to advance to position 4, which is not a multiple of tabstop, so spaces were needed.
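The column-dependent width of a TAB can be demonstrated outside vim too: expand(1) converts tabs to spaces using the same next-multiple rule (tab stop 8 here, matching the tabstop setting above):

```shell
# TAB typed at column 4: only 4 spaces are needed to reach column 8.
printf '1tab\tdone\n' | expand -t 8

# TAB typed at column 0: a full 8 spaces to reach column 8.
printf '\t2tabs\n' | expand -t 8
```

The number of spaces substituted is exactly what vim's display logic computes when deciding how wide to draw each 0x09 byte.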
I work a lot with files which contain data on fixed positions. Non-delimited "CSV" files if you will... Often, I'd like to highlight a specific column.
I tried
:match ErrorMsg /\%>30v.\+\%<40v/
but this runs extremely slow and only matches the first line. I suppose the file may be too large for this. Mind you, the files are very wide (around 40000 characters) but not very long (around 2000 lines). The data originates from old tools over which I have no control.
Example file:
63082
01089
75518 735301
53473 017146
37217
07
940376
762 2842
88331
40680 8928
645718
0131
03522
47210 27431
93837
8825072 49479415
52084 8940
0591705 205635
525429
65339 300
0397
1983
0
2605768
121991 648
3892
1260
Any ideas?
Are you using Vim 7.3?
Apparently they just recently added a colorcolumn option.
Try:
:set colorcolumn=31,32,33,34,35,36,37,38,39
Note that :help 'colorcolumn' says "Will make screen redrawing slower". I somewhat replicated your scenario, though, by using pure blocks of 1234567890 with the exact repetition count you specified.
The command you mentioned is very slow. colorcolumn isn't.
but this runs extremely slow and only matches the first line
By "first line" do you mean the first displayed line, when word wrapping is enabled? Unfortunately colorcolumn will exhibit the same behavior...
This is off the original topic, but Google leads people here. When trying to fix a horribly indented YAML or any other shiftwidth=2 file, I really struggle to visually recognize what is and isn't in a valid column. A comment by #ib to the accepted answer led me to this gem.
:let &l:colorcolumn = join(range(3,15,2),',')
It basically says: set colorcolumn to the comma-delimited string value of 3 through 15, counted by 2. (In other words: :set colorcolumn=3,5,7,9,11,13,15.) The result looks like this:
You can do a simple :set colorcolumn to see what value results.
To get rid of it do :set colorcolumn=
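The same comma-separated column list can also be generated outside vim, for example with seq, and passed in at launch time. A sketch (the filename is an example):

```shell
# First 3, step 2, last 15 -> the same list the vimscript range() builds:
seq -s, 3 2 15

# Use it when starting vim; --cmd runs before the vimrc is sourced, e.g.:
#   vim --cmd "set colorcolumn=$(seq -s, 3 2 15)" data.yaml
```

This is handy for one-off inspections where you don't want the highlighting in your vimrc at all.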