Unidentified characters in terminal - Linux

I ran into these strange, abnormal characters when I was trying to calculate pi in the terminal over a Beowulf cluster.
How can I convert these characters into something legible?
Interestingly, when I use fewer processes, the result is normal.
Thanks in advance.
Edit:
This was done with MPICH 1 and 1000 processes over a 3-computer cluster.

Because the output has lots of Unicode replacement characters, it looks as if the locale settings on your machine are not set to use UTF-8 encoding.
Of course, it could simply be from attempting to print binary data on the terminal. But locale is a possibility. In either case, the terminal is running with UTF-8 encoding and your output is not valid UTF-8 text.
Resetting the terminal will not be helpful; it is the application (or your use of it) which is the problem.
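A quick way to check is from the same terminal session that shows the garbage (a minimal sketch; en_US.UTF-8 is only an example, use any UTF-8 locale installed on your system):
locale
locale charmap
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
The first two commands show the settings currently in effect (locale charmap should print UTF-8); the exports switch the session to a UTF-8 locale if it is not already using one.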
Further reading:
Overcoming frustration: Correctly using unicode in python2
Avoid printing unicode replacement character in Java

Related

How to treat multibyte characters simply as a sequence of bytes?

I would like to use vim with binary files. I run vim with -b, and I have isprint= and display+=uhex. I am using the following statusline:
%<%f\ %h%m%r%=%o\ (0x%06O)\ \ %3.b\ <%02B>\ %7P
so I get output containing some useful information like the byte offset in the file, the current character in hex, etc. But I'm having trouble with random pieces of data being interpreted as multibyte characters, which prevents me from accessing the inner bytes, combines them with surrounding text (including vim's decorations), or displays them as �.
Of course I have tried opening the files with ++enc=latin1. However, my system's encoding is UTF-8, so what vim supposedly does is convert the file from Latin-1 to UTF-8 internally and display that. This has two problems:
The sequence <c3><ac> displays as Ã¬, rather than ì, but the characters count as two bytes each, so it breaks my %o and counts offsets wrong. This is 2 bytes in the file but apparently 4 bytes in vim's buffer.
I don't know why my isprint is ignored. Neither of these characters is between 32 and 126, so they should display in hex.
I found the following workaround: I set encoding to latin1, but termencoding to utf-8. This achieves what I want, but it breaks other things, such as when vim needs to display status messages ("new file", "changed", etc.) in my language, because it wants to use that encoding for them too and they don't fit. I guess I could run vim with LC_ALL=C, but it feels like I'm resorting to too many dirty tricks already. Is there a better way, i.e., without having to mess around with encoding?
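For reference, the workaround described above can be condensed into a single invocation (just a sketch; file.bin is a placeholder for the file being inspected, and the LC_ALL=C prefix is the optional part that avoids translated status messages):
LC_ALL=C vim -b --cmd 'set encoding=latin1 termencoding=utf-8' file.bin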

How to set character encoding in Accumulo?

I use the Accumulo shell to inspect tables. I find I can insert English, but when I insert Chinese I see garbled text in the shell. How do I deal with this problem? Can I set UTF-8 in Accumulo?
The answer is no. Unfortunately, displaying UTF-8 characters is not currently possible. Accumulo primarily deals with raw bytes, not Strings of Characters. The shell currently (up to 2.0.0-alpha-2, at least) has very limited capabilities for displaying wide Unicode characters. For convenience, the shell's behavior is to show printable 7-bit ASCII characters; the rest are shown in hex-encoded form.
While the current shell capabilities are limited, it is an open source project that would welcome patches to better support any UTF-8 printable characters.

Node.js console.log ASCII symbols

Hi, I am using Node.js for my app, and I want to print ASCII symbols in the terminal. Here is a table of ASCII symbols; please check the Extended ASCII Codes field. I want to print a square or a circle, for example 178 or 219.
Can anyone tell me how to do it? Thank you.
Like several other languages, Javascript suffers from The UTF-16 Curse. Except that Javascript has an even worse form of it, The UCS-2 Curse. Things like charCodeAt and fromCharCode only ever deal with 16-bit quantities, not with real, 21-bit Unicode code points. Therefore, if you want to print out something like 𝒜, U+1D49C, MATHEMATICAL SCRIPT CAPITAL A, you have to specify not one character but two "char units": "\uD835\uDC9C".
Please refer to this link: https://dheeb.files.wordpress.com/2011/07/gbu.pdf
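To illustrate from the shell (a small sketch; on reasonably modern Node you can also use String.fromCodePoint, which accepts the full 21-bit code point directly):
node -e "console.log('\uD835\uDC9C')"
node -e "console.log(String.fromCodePoint(0x1D49C))"
Both lines print the same 𝒜 character; the first spells out the surrogate pair by hand, the second lets the runtime derive it from the real code point.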
Your desired character is not a printable ASCII character. On Linux you can print all the printable ASCII characters by running this command:
for((i=32;i<=127;i++)) do printf \\$(printf '%03o\t' "$i"); done;printf "\n"
or
man ascii
So what you can do is print Unicode characters instead. Here is a list of all the available Unicode characters, and you can select one that looks almost identical to your desired character.
http://unicode-table.com/en/#2764
I've tested this on a Windows terminal and it still does not show the desired character, but it works on Linux. If it's still not working, make sure to set LANGUAGE="en_US.UTF-8" in /etc/rc.conf and LANG="en_US.UTF-8" in /etc/locale.conf.
So printing something like this in the node console:
console.log('\u2592 start typing...');
will output this result:
▒ start typing...
Actually, if you only care about ASCII, that should not be a real problem at all. You only have to escape the characters properly. A good reference for this is https://mathiasbynens.be/notes/javascript-escapes
console.log('\xB2 \xDB')
Works for me with recent-ish node under Windows (cmd shell) and macOS. For ASCII characters you can just convert them to hex and prefix them with \x in your strings. Give it a try with node -e "console.log('\xB2')"
And when you try this answer, and it works, you might want to try:
node -e "console.log('\x07')"

Removing lines containing encoding errors in a text file

I must warn you I'm a beginner. I have a text file in which some lines contain encoding errors. By "error", this is what I get when viewing the file in my Linux console (question marks instead of characters):
I want to remove every line showing those "question marks". I tried to grep -v the problematic character, but it doesn't work. The file itself is UTF-8, and I guess some of the lines come from texts encoded in another format. I know I could find a way to reconvert them properly, but I just want them gone for now.
Do you have any ideas about how I could do this please?
PS: Some lines contain diacritics which are displayed fine. The "strings" command seems to remove too many "good" lines.
When dealing with mojibake in character encodings other than ANSI, you must check two things:
Is the file really encoded in X? (X being UTF-8 WITHOUT BOM in your case. You could be trying to read UTF-8 WITH BOM, UTF-16, latin-1, etc. as UTF-8, and that would be the problem). Try reading in (not converting to) other encodings and see if any of them fits.
Is your locale or text editor set to read the file as UTF-8? If not, that may be the problem. Check for support and figure out how to change the setting. On Linux, use the locale command to check the current settings and set the LANG / LC_ALL environment variables to change them.
I like how Notepad++ for Windows (which also runs perfectly on Linux under Wine) lets you choose any encoding in which to read the file without trying to convert it (of course, if you pick an encoding other than the one the file is actually in, you will only see those weird characters), and it also has a separate option that converts the file from one encoding to another. That has been pretty useful to me.
If you are a beginner you may be interested in this article. It explains briefly and clearly the whats, whys and hows of character encoding.
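The first point can be checked quickly from the command line (a sketch; yourfile.txt is a placeholder and the candidate encodings are only examples):
file -i yourfile.txt
iconv -f UTF-8 -t UTF-8 yourfile.txt > /dev/null && echo "valid UTF-8"
iconv -f WINDOWS-1252 -t UTF-8 yourfile.txt | less
The first command guesses the charset, the second fails with an error if the file contains byte sequences that are not valid UTF-8, and the third shows how the file would look if read as windows-1252.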
[EDIT] If the above fails, even with windows-1252 and similar ANSI encodings, I've just learned here how to remove non-ASCII characters using the tr Unix command, turning the file into plain ASCII (but be aware that the information carried by the extra characters is lost in this output and there is no coming back, so keep the input file around in case you find a better fix):
tr -cd '\11\12\40-\176' < $INPUT_FILE > $OUTPUT_FILE
or, if you want to get rid of the whole line:
grep -v -P "[^\11\12\40-\176]" $INPUT_FILE > $OUTPUT_FILE
[EDIT 2] This answer here gives a pretty good guess of what could be happening if none of the encodings works on your file (unfortunately, the only straightforward solution seems to be removing those problematic characters).
You can use a micro-Perl script like:
perl -pe 's/[^[:ascii:]]+//g;' my_utf8_file.txt
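If the goal is to drop each affected line entirely rather than just strip the non-ASCII bytes (which is what the question actually asks for), a variant of the same one-liner (an untested sketch) would be:
perl -ne 'print unless /[^[:ascii:]]/' my_utf8_file.txt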

Using vim+LaTeX with Scandinavian characters

I want to create a lab write-up with LaTeX in Ubuntu; however, my text includes Scandinavian characters, and at present I have to type them in using \"a and \"o etc. Is it possible to get the LaTeX compiler to read these special characters when they are typed in as-is? Additionally, I would like vim to "read" Finnish: right now, when I open a .tex document containing Scandinavian characters, they are not displayed at all in vim. How can I correct this?
For latex, use the inputenc option:
\usepackage[utf8]{inputenc}
Instead of utf8, you may use whatever else fits you, like latin1, as well.
Now the trick is to make your terminal run the same character encoding. It seems that it runs a character/input encoding that doesn't fit your input right now.
For this, refer to the "Locale" settings of your distribution. You can always check the locale settings in the terminal by issuing locale. These days, UTF-8 locales are preferred, as they work with every character imaginable. If your terminal's environment is set up correctly, vim should happily work with all your special characters without complaint.
To find out which encoding Vim thinks the document is in, try:
:set enc
To set the encoding to UTF-8, try:
:set enc=utf8
I can't help with vim, but for LaTeX I recommend you check out XeTeX, which is an extension of TeX designed to support Unicode input. XeTeX is now part of TeX Live, so if you have TeX installed, chances are you already have it.
I use the UCS unicode support: http://iamleeg.blogspot.com/2007/10/nice-looking-latex-unicode.html

Resources