In layman's terms, my goal is to change how the "dot" key on the numeric keypad behaves. Right now, pressing it produces a "comma". I need it to produce a "dot".
After some research I started toying with the locale. Apparently my locale is set to en_US:
[xxx#xxx ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
I've looked into what I presume is the proper config file for this particular locale:
/usr/share/i18n/locales/en_US
and looked for anything that might be related to "dot", "decimal separator" etc. Found LC_MONETARY and LC_NUMERIC, however mon_decimal_point for monetary and decimal_point for numeric were already set to - which I'm quite sure is a "dot".
Just for giggles I also changed mon_thousands_sep and thousands_sep to and restarted. No help here.
My machine:
RHEL
[xxxx#xxxxx ~]$ uname -a
Linux xxxxxx 2.6.32-642.4.2.el6.x86_64 #1 SMP Mon Aug 15 02:06:41 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Now - this is a corporate computer with some strict security policies in place, so it would not be possible for me to just
yum install some_magic_keyboard_mapping_app
I need to change it the old style.
I have a virtual machine set up, so I can mess it up as much as I want before changing things on my work laptop.
CentOS 7.7
I had the same problem. My locale is set to en_US.UTF-8 but my keyboard is Swedish. The localized decimal separator in Sweden is a comma, so I guess the default is actually correct, but most software is not aware of this, so in practice it is a problem. I found a simple solution. On the command line, just run (root privileges not required):
setxkbmap -option kpdl:dot
To query the keyboard settings do:
setxkbmap -query
To clear all set options do:
setxkbmap -option ''
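For anyone who wants the option applied at every login rather than typed by hand, here is a minimal sketch. It assumes an X11 session. The function name apply_keypad_dot is made up for this example, and where to call it from (for instance a session startup file such as ~/.xprofile) depends on your display manager, so treat that part as an assumption to verify on your system.

```shell
#!/bin/sh
# apply_keypad_dot (hypothetical helper): set the keypad delete key to produce
# a dot, but only when an X display and the setxkbmap tool are both available.
apply_keypad_dot() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "no X display found, skipping keypad option" >&2
        return 1
    fi
    if ! command -v setxkbmap >/dev/null 2>&1; then
        echo "setxkbmap not installed, skipping keypad option" >&2
        return 1
    fi
    # Same option as in the answer above.
    setxkbmap -option kpdl:dot
}
```

Calling apply_keypad_dot from a terminal inside the graphical session should behave the same as typing the setxkbmap command directly. The guards only keep the script from failing noisily on a text console or over SSH.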
See this post for more details:
https://askubuntu.com/questions/209636/how-to-change-decimal-comma-to-decimal-period-in-numpad
I was having the same problem in Ubuntu 18.04 LTS (GNU/Linux). I pressed the . key on the numeric keypad, but the result was as if I had pressed the , key.
I was able to solve my problem by changing the keyboard configuration under System Settings → Region & Language → Input Sources. My country's language is Portuguese, but after I changed the input language to English the problem went away.
I have a strange problem: I cannot type or copy the percent sign in my bash...
I tried to read ~/.bashrc, /etc/profile (and stuff in /etc/profile.d). I also tried "sudo bash", but still not possible to type "%". Percent sign in "sh" works...
Any suggestions?
uname -a
Linux 3.2.0-65-generic #99-Ubuntu SMP Fri Jul 4 21:03:29 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
BTW: Question moved to: https://superuser.com/questions/890645/percent-sign-in-bash-is-not-typeable
A workaround is using the ascii value 37: Press and hold the ALT-key, enter 37 on your numeric keyboard and release the ALT-key.
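If the key mapping cannot be fixed right away, the same character code can also be produced from the shell itself. This is only a sketch of a stopgap: octal 045 is decimal 37, the ASCII code of the percent sign, and the variable name percent is made up for the example.

```shell
#!/bin/sh
# Octal 045 equals decimal 37, the ASCII code of the percent sign.
# The escape is written inside the printf format, so the key itself is never typed.
percent=$(printf '\045')

# Use the variable wherever the character is needed.
echo "Disk usage is at 80${percent}"
```

This does not repair the keyboard mapping. It only lets a script or a one-off command use the character until the real cause is found.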
A solution is checking the keyboard mapping. Hold the shift and try all the numbers. On my keyboard I have
!@#$%^&*()
Do you have an old keyboard somewhere that you could try?
I had a similar problem with the mapping of the backspace key. Instead of deleting the previous character while editing files, it would print "^?". I ran "stty sane" on the command line and that reset it. Maybe that helps.
I am using Cygwin in Console2 with the following PS1
export PS1='\[\e]2;\w\a\e[1;32m\e[40m\n\w\n\d - \# > \[\e[0;00m\]'
The prompt has the correct text content, but all the colors are ignored.
~/wd
Tue Mar 18 - 01:14 PM >
Screenshot showing Console2:
When I use mintty, the colours are perfect.
TERM is set the same in both Console2 and mintty:
Tue Mar 18 - 06:29 PM > env | grep TERM
TERM=cygwin
TERMCAP=SC|screen|VT 100/ANSI X3.64 virtual terminal:\
You have not shown your screenshots, so I'm not sure what you mean.
But I believe it is a Cygwin feature (bug). Cygwin assumes that ANSI is not available in a Windows terminal (that is true for Console2, but of course not if you are using ANSICON or ConEmu). That means Cygwin processes all ANSI sequences internally (it does not send them to the terminal). So if any problem occurs, it is a problem in Cygwin's implementation.
Reading
man locale
I figure that locale displays information about the "current locale" or a list of all available locales.
In addition, running
$ locale
gives...
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
However, neither the man page nor running the command sheds any light on what these environment variables do. Specifically, what are these environment variables needed or used for? (For example, in the context of software running on a Unix/Linux OS that has these environment variables set.)
The Question: What does that mean in the context of a software that is running on the OS with these locales?
Oh, the man page (man 1 locale) does:
LC_CTYPE
Character classification and case conversion.
LC_COLLATE
Collation order.
LC_TIME
Date and time formats.
LC_NUMERIC
Non-monetary numeric formats.
LC_MONETARY
Monetary formats.
LC_MESSAGES
Formats of informative and diagnostic messages and interactive responses.
Perhaps, you had a look for the 'locale' manpage in the wrong section? These are the standard sections (see man man)
0 Header files (usually found in /usr/include)
1 Executable programs or shell commands
2 System calls (functions provided by the kernel)
3 Library calls (functions within program libraries)
4 Special files (usually found in /dev)
5 File formats and conventions eg /etc/passwd
6 Games
7 Miscellaneous (including macro packages and conven-
tions), e.g. man(7), groff(7)
8 System administration commands (usually only for root)
9 Kernel routines [Non standard]
so, for the locale binary, you have to look in section 1: man 1 locale. To fully answer your question, I cite the description part of locale's man page:
DESCRIPTION
The locale utility shall write information about the current locale
environment, or all public locales, to the standard output. For the
purposes of this section, a public locale is one provided by the imple-
mentation that is accessible to the application.
When locale is invoked without any arguments, it shall summarize the
current locale environment for each locale category as determined by
the settings of the environment variables defined in the Base Defini-
tions volume of IEEE Std 1003.1-2001, Chapter 7, Locale.
When invoked with operands, it shall write values that have been
assigned to the keywords in the locale categories, as follows:
* Specifying a keyword name shall select the named keyword and the
category containing that keyword.
* Specifying a category name shall select the named category and all
keywords in that category.
Samples (LC_TIME and LC_MESSAGES):
$ export LC_TIME='fr_FR.UTF-8' #french time
$ date
mar. août 30 18:41:07 CEST 2011
$ export LC_TIME='de_DE.UTF-8' #german time
$ date
Di 30. Aug 18:41:12 CEST 2011
$ export LC_TIME='en_US.UTF-8' #english time
$ date
Tue Aug 30 18:41:17 CEST 2011
$ rm NON-EXIST
rm: cannot remove `NON-EXIST': No such file or directory
$ export LC_TIME='de_DE.UTF-8' #german time, but english MESSAGES
$ rm NON-EXIST
rm: cannot remove `NON-EXIST': No such file or directory
$ export LC_MESSAGES='de_DE.UTF-8' #german messages
$ rm NON-EXIST
rm: cannot remove `NON-EXIST': Datei oder Verzeichnis nicht gefunden
LC_COLLATE controls how text is sorted according to a language's rules. LC_MONETARY is the format for currency (US: $1.24, much of Europe: 1,24 €).
Locale governs a lot of things, such as:
Encoding in use (i.e., en_US.UTF-8, or some other classic encoding)
Translation files to use for the standard library or other applications.
Internationalization (number formatting, currency, dates)
The C locale is the "default" locale. It is generally advisable to be more specific, and run as something UTF-8 enabled on Linux.
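How the variables combine is easier to see in code. The sketch below mirrors the POSIX lookup order for a single category: LC_ALL wins if it is non-empty, then the category's own variable, then LANG. The function name effective_locale is hypothetical, and the final fallback to "C" is an assumption, since POSIX leaves that default to the implementation.

```shell
#!/bin/sh
# effective_locale CATEGORY
# Prints the locale name that applies to CATEGORY (for example LC_NUMERIC),
# following the lookup order LC_ALL, then the category variable, then LANG.
effective_locale() {
    category=$1
    # Indirect lookup of the category variable, e.g. the value of $LC_NUMERIC.
    eval "category_value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: the implementation default is the C locale.
        printf '%s\n' "C"
    fi
}

effective_locale LC_NUMERIC
```

With LANG=en_US.UTF-8, LC_NUMERIC=sv_SE.UTF-8 and an empty LC_ALL, the call prints sv_SE.UTF-8. This is why setting LC_ALL is the blunt instrument: it hides every individual category setting.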
Given a file txt:
ab
a c
a a
When calling sort txt, I obtain:
a a
ab
a c
In other words, it is not proper sorting, it kind of deletes/ignores the whitespaces! I expected this to be the behavior of sort -i but it happens with or without the -i flag.
I would like to obtain "correct" sorting:
a a
a c
ab
How should I do that?
Solved by:
export LC_ALL=C
From the sort(1) documentation:
WARNING: The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
(works for ASCII at least, no idea for UTF8)
Like mentioned before, LC_ALL=C sort does the trick. This is simply because different languages have different rules for sorting characters, which are often laid out by senior linguists instead of CS experts. And these rules, in the case of your locale, seem to say that spaces ought to be ignored in sorting.
By prefixing LC_ALL=C (or, when LC_ALL is unset, LC_COLLATE=C suffices), you explicitly declare language-agnostic sorting (and, with LC_ALL, number-formatting and stuff), which is what you want in this context. If you want to make this your default, export LC_COLLATE in your environment.
The default is chosen in this way to keep consistency with the "normal", real-world sorting schemes (like the white pages), which often ignored spaces.
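The prefix form can be tried on the three-line file from the question. This is a small sketch that rebuilds the file in a temporary directory; the variable names workdir and sorted are made up for the example.

```shell
#!/bin/sh
# Recreate the sample file from the question in a scratch directory.
workdir=$(mktemp -d)
printf 'ab\na c\na a\n' > "$workdir/txt"

# LC_ALL=C applies to this one sort invocation only; the shell's own
# environment is left untouched.
sorted=$(LC_ALL=C sort "$workdir/txt")
printf '%s\n' "$sorted"

rm -r "$workdir"
```

In the C locale the space (byte 0x20) sorts before every letter, so the lines come out as "a a", "a c", "ab", which is the order the question asked for.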
Using the C locale i.e. sorting just by byte values is not a good solution in languages where some letters are outside the range [A-Za-z]. Such letters are represented as multiple bytes in UTF-8 and then the byte value collating order is not what one desires. (Some characters may have two equivalent representations (pre-composed and de-composed)).
Nevertheless, the treatment of spaces is a problem. I tried the following:
$ cat stest
a b
a c
ab
a d
$ sort stest
ab
a b
a c
a d
$ sort -k 1,1 stest
a b
a c
a d
ab
For my needs, the -k 1,1 did the trick. Another but clumsier solution I tried, was to change spaces to some auxiliary character, then sort, then change the auxiliaries back into blanks.
You could use the 'env' program to temporarily change your LC_COLLATE for the duration of the sort; e.g.
/usr/bin/env LC_COLLATE=POSIX /bin/sort file1 file2
It's a little cumbersome on the command line but if you're using it in a script should be transparent.
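In a script, the same idea can be wrapped in a small function so the override does not have to be repeated. The name csort is hypothetical. One detail the sketch makes explicit: a non-empty LC_ALL overrides LC_COLLATE, so the wrapper empties LC_ALL for that one call.

```shell
#!/bin/sh
# csort (hypothetical helper): run sort with byte-order collation for one call.
# LC_ALL is emptied for the call because a non-empty LC_ALL would take
# precedence over LC_COLLATE. Other categories then fall back to their own
# variables or to LANG.
csort() {
    LC_ALL= LC_COLLATE=C sort "$@"
}

printf 'ab\na c\na a\n' | csort
```

Arguments are passed through unchanged, so csort file1 file2 and csort -u file work like the plain command.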
I have been looking at this for a little while, wanting to optimize a shell script I maintain that has a heavy international userbase. (heavy as in percentage, not quantity).
Most of the options I saw around the web and SO seem to recommend what I see here, setting the locale globally (overkill)
export LC_ALL=C
or prefixing it to each individual command, like this example from gnu.org (tedious)
$ echo abcdefghijklmnopqrstuvwxyz | LC_ALL=C /usr/xpg4/bin/tr 'a-z' 'A-Z'
ABCDEFGHIJKLMNOPQRSTUVWXYZ
I wanted to avoid clobbering the user's locale as an unseen side effect of running my program. This turned out to be easily accomplished, just as you would expect, by setting the variable only inside the script. A change made there stays within the script's own process and never reaches the shell that launched it.
I had to set LANG instead of LC_ALL for some reason, but all the individual locales were set which is functionally enough for me.
Here is the test, simple as can be
#!/bin/bash
# locale_checker.sh
# Report the current locale, then set LANG=C to optimize character sort and search.
echo "locale was $LANG"
LANG=C
locale
and output + proof that it is temporary and can be restricted to my script's process.
mateor#:~/snippets$ ./locale_checker.sh
locale was en_US.UTF-8
LANG=C
LANGUAGE=en_US:en
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=
mateor#:~/snippets$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
There you go. You get the optimized locale without clobbering another person's innocent environment as well as avoid the tedium of piping it everywhere you think it may help.
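When only part of a script needs the C locale, the same scoping can be narrowed further with a subshell. A minimal sketch, with made-up variable names before, after and result:

```shell
#!/bin/sh
# Remember what LC_ALL looked like before ("unset" if it was not set at all).
before=${LC_ALL-unset}

# The command substitution runs in a subshell, so the export stays inside it
# and applies to every command in the pipeline.
result=$(
    export LC_ALL=C
    printf 'b\nB\na\n' | sort | tr 'a-z' 'A-Z'
)

after=${LC_ALL-unset}
printf 'sorted and upper-cased:\n%s\n' "$result"
printf 'LC_ALL before: %s, after: %s\n' "$before" "$after"
```

The last line prints the same value twice, which shows that the rest of the script still runs under the user's own locale.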
Weird, works here (cygwin).
Try sort -d txt.
Actually for me
$ cat txt
ab
a c
a a
$ sort txt
a a
a c
ab
I'll bet between your a and c you have a non-breaking space or an en space or an em space or other high-codepoint space!
EDIT
Just ran it on Linux. I should have looked at the tags. Yes I get the same output you do! My first run was on the Mac. Looks like a difference between GNU and BSD. I will investigate further.
EDIT 2:
Linux uses a field-based sort.... still looking for how to suppress it. Tried
sort -t, txt
hoping to trick GNU into thinking the whole line was one field, but it still used the current locale to sort.
EDIT 3:
The OP solved the problem by setting the locale to C with
export LC_ALL=C
There seems to be no other approach. The sort command will use the current locale, and although it often says the C (or its alias POSIX) is the default locale, if you have Linux it has probably been set for you. Enter locale -a to see the available locales. On my system:
$ locale -a
C
POSIX
en_AG
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NG
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
It seems like setting the locale to C (or its alias POSIX) is the only way to break the field-based behavior of sort and treat the whole line as one field. It is rather odd IMHO that this is how to do it. I would think the -t or -k options, or perhaps some new option would be a more sensible way to make this happen.
BTW, it looks like this question has been asked before on SO: unexpected result from gnu sort.
Now I change my gnome-terminal's character encoding to "GBK" (default it is UTF-8), but how can I get the value(character encoding) in my Linux?
The terminal uses environment variables to determine which character set to use, therefore you can determine it by looking at those variables:
echo $LC_CTYPE
or
echo $LANG
The locale command with no arguments will print the values of all of the relevant environment variables except for LANGUAGE.
For current encoding:
locale charmap
For available locales:
locale -a
For available encodings:
locale -m
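If only the environment variables are at hand, the codeset can be read out of the locale name, which conventionally has the form language[_territory][.codeset][@modifier]. The function name codeset_of is hypothetical. Note that this reports what the locale settings claim, which is not necessarily what the terminal emulator's own menu is set to.

```shell
#!/bin/sh
# codeset_of NAME
# Prints the codeset part of a locale name such as en_US.UTF-8 or
# sr_RS.UTF-8@latin. Prints an empty line when the name carries no explicit
# codeset (for example "C" or "pt_BR").
codeset_of() {
    name=$1
    case $name in
        *.*)
            codeset=${name#*.}      # drop everything up to the first dot
            codeset=${codeset%%@*}  # drop an optional @modifier
            printf '%s\n' "$codeset"
            ;;
        *)
            printf '\n'
            ;;
    esac
}

# Same lookup order the C library uses for LC_CTYPE: LC_ALL, LC_CTYPE, LANG.
codeset_of "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
```

For a name without an explicit codeset the function prints nothing useful. In that case locale charmap is the better source, because it asks the C library which character map the locale actually resolves to.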
Check encoding and language:
$ echo $LC_CTYPE
ISO-8859-1
$ echo $LANG
pt_BR
Get all languages:
$ locale -a
Change to pt_PT.utf8:
$ export LC_ALL=pt_PT.utf8
$ export LANG="$LC_ALL"
If you have Python:
python -c "import sys; print(sys.stdout.encoding)"
To my knowledge, no.
Circumstantial indications from $LC_CTYPE, locale and such might seem alluring, but these are completely separated from the encoding the terminal application (actually an emulator) happens to be using when displaying characters on the screen.
The only way to detect the encoding for sure is to output something only present in that encoding, e.g. ä, take a screenshot, analyze that image and check if the output character is correct.
So no, it's not possible, sadly.
To see the current locale information, use the locale command. Below is an example on RHEL 7.8:
[usr#host ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Examination of https://invisible-island.net/xterm/ctlseqs/ctlseqs.html, the xterm control character documentation, shows that it follows the ISO 2022 standard for character set switching. In particular ESC % G selects UTF-8.
So to force the terminal to use UTF-8, this command would need to be sent. I find no way of querying which character set is currently in use, but there are ways of discovering if the terminal supports national replacement character sets.
However, from charsets(7), it doesn't look like GBK (or GB2312) is an encoding supported by ISO 2022 and xterm doesn't support it natively. So your best bet might be to use iconv to convert to UTF-8.
Further reading shows that a (significant) subset of GBK is EUC, which is an ISO 2022 code, so ISO 2022 capable terminals may be able to display GBK natively after all, but I can't find any mention of activating this programmatically, so the terminal's user interface would be the only recourse.