Debian system-wide locale not being followed

contents of /etc/default/locale:
# File generated by update-locale
LANG=en_US.UTF-8
locale command output:
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
Why isn't LANG being set?
Note: I have no .bashrc or .profile

The only clean thing that worked for me was this from https://wiki.debian.org/Locale
Add a line like this to your /etc/profile file:
: "${LANG:=en_US.utf8}"; export LANG
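For readers unfamiliar with that idiom, here is a minimal sketch of what `: "${VAR:=default}"` does. It uses a made-up variable, DEMO_LANG, so that the real LANG is left alone; the point is only that the default is assigned when the variable is unset or empty, and an existing value is kept.

```shell
# DEMO_LANG is a hypothetical stand-in for LANG, used only for illustration.
unset DEMO_LANG

# ":" is the shell no-op; the expansion assigns the default as a side effect
# because DEMO_LANG is unset (an empty value would behave the same way).
: "${DEMO_LANG:=en_US.utf8}"
first=$DEMO_LANG

# A value that is already set is not overwritten.
DEMO_LANG=de_DE.utf8
: "${DEMO_LANG:=en_US.utf8}"
second=$DEMO_LANG

printf '%s\n' "$first" "$second"
# prints:
# en_US.utf8
# de_DE.utf8
```

This is why the line is safe to put in /etc/profile: a user who already has LANG set keeps it, and everyone else gets the fallback.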

Related

iconv not TRANSLIT from UTF-8 to US-ASCII properly

I need to eliminate special characters in a large .xml file. So, I need a file to go from UTF-8 to US-ASCII. I believe I should be able to use iconv to do this with the following command:
iconv -f UTF-8 -t US-ASCII//TRANSLIT//IGNORE sample1.xml -o sample2.xml
Here are a few lines of the input file:
...from regjsparser’s AST...
...returning “symbol” for...
...foo-bar → fooBar...
...André Cruz...
...Kat Marchán...
And here is the output of those snippets:
...from regjsparser's AST... (replaced RIGHT SINGLE QUOTE with APOSTROPHE )
...returning "symbol" for... (replaced LEFT/RIGHT DOUBLE QUOTES with regular QUOTES )
...foo-bar -> fooBar... (replaced RIGHTWARDS ARROW with DASH and GREATER THAN )
...Andr? Cruz... (failed to identify/replace ACUTE E / U+00E9 with regular E )
...Kat March?n... (failed to identify/replace ACUTE A / U+00E1 with regular A )
Clearly the tool is working because it replaces some of the chars, but it can never replace accented letters.
These files are BOM files generated by CycloneDX, so they should just be UTF-8 encoded originally.
The iconv installed on the machine comes from Debian's glibc 2.31.
I have no idea why it is struggling with accented chars.
EDIT: Here is the printout of the locale and locale -a commands. Not sure if these values are relevant to this problem or not.
locale
+ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
locale -a
+ locale -a
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_COLLATE to default locale: No such file or directory
C
C.UTF-8
POSIX
I'm struggling to understand what these LC values mean and how they work.
Fixed this by running
export LC_ALL="C.UTF-8"
iconv -f UTF-8 -t US-ASCII//TRANSLIT//IGNORE sample1.xml -o sample2.xml
It seems that the locale variables default to en_US.UTF-8 even if that locale does not exist on the machine. So you need to run locale -a to see which locales are actually available and choose one that closely fits your needs. It seems that almost anything labeled xx.UTF-8 will work for transliteration purposes.
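The selection step could be sketched roughly as follows. pick_utf8_locale is a hypothetical helper (not part of any tool); it assumes locale names arrive one per line, as locale -a prints them, and simply takes the first one whose name ends in a UTF-8 codeset.

```shell
# Hypothetical helper: read locale names on stdin and print the first one
# whose name ends in ".UTF-8" or ".utf8" (case-insensitive).
pick_utf8_locale() {
    grep -i -E '\.utf-?8$' | head -n 1
}

# Example with the list shown in the question (C, C.UTF-8, POSIX):
printf 'C\nC.UTF-8\nPOSIX\n' | pick_utf8_locale
# prints: C.UTF-8
```

On a real machine this might be used as export LC_ALL="$(locale -a 2>/dev/null | pick_utf8_locale)", after checking that the result is not empty.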
I've read that this exported value is applied only during your current session, and would need to be reset every time you start a new session. If you want to permanently set the locale values, you will need to do something like this:
https://www.tecmint.com/set-system-locales-in-linux/

How to find out what messes up my locale-settings?

Something messes up my locale settings and I can't find out where and why this happens. I am on a Manjaro system, but installed Ubuntu a couple of days ago as a dual-boot option. The problems started then.
/etc/locale.gen
...
#en_BW ISO-8859-1
#en_CA.UTF-8 UTF-8
#en_CA ISO-8859-1
en_DK.UTF-8 UTF-8
#en_DK ISO-8859-1
#en_GB.UTF-8 UTF-8
...
sudo locale-gen was called
/etc/default/locale
LANG=en_DK.utf8
LC_CTYPE="en_DK.utf8"
LC_NUMERIC=en_DK.UTF-8
LC_TIME=en_DK.UTF-8
LC_COLLATE="en_DK.utf8"
LC_MONETARY=en_DK.UTF-8
LC_MESSAGES="en_DK.utf8"
LC_PAPER=en_DK.UTF-8
LC_NAME=en_DK.UTF-8
LC_ADDRESS=en_DK.UTF-8
LC_TELEPHONE=en_DK.UTF-8
LC_MEASUREMENT=en_DK.UTF-8
LC_IDENTIFICATION=en_DK.UTF-8
LC_ALL=
/etc/locale.conf
LANG=en_DK.UTF-8
manjaro-settings-manager:
everything set to en_DK.UTF-8
no further exports in bashrc, zshrc, ~/.profile
output locale:
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_DK.UTF-8
LC_CTYPE="en_DK.UTF-8"
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE="en_DK.UTF-8"
LC_MONETARY=en_GB.UTF-8
LC_MESSAGES="en_DK.UTF-8"
LC_PAPER=en_GB.UTF-8
LC_NAME=en_GB.UTF-8
LC_ADDRESS=en_GB.UTF-8
LC_TELEPHONE=en_GB.UTF-8
LC_MEASUREMENT=en_GB.UTF-8
LC_IDENTIFICATION=en_GB.UTF-8
LC_ALL=
Solved this by using
grep -rnw '.' -e 'en_GB.UTF-8'
to find out that Ubuntu had set locales in ~/.pam_environment, which somehow overrode the system defaults. After replacing those, everything was fine.
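A narrower variant of that search, as a sketch only: instead of grepping a whole directory tree, check a short list of files that commonly set locale variables. find_locale_setters is a hypothetical helper, and the file list in the usage note is an assumption, not an exhaustive list.

```shell
# Hypothetical helper: print the name of every readable file among the
# arguments that contains the given fixed string.
# usage: find_locale_setters STRING FILE...
find_locale_setters() {
    pattern=$1
    shift
    for f in "$@"; do
        if [ -r "$f" ]; then
            # -l prints only the file name, -F treats the pattern literally
            grep -l -F -- "$pattern" "$f" || true
        fi
    done
}
```

For the case above, one might call it as find_locale_setters en_GB.UTF-8 ~/.pam_environment ~/.profile ~/.bashrc /etc/default/locale /etc/locale.conf /etc/environment. Files that do not exist are skipped silently.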
Maybe this will save someone some trouble.

Perl fails to set locale even though it is installed

I am having trouble running Perl on a Linux system (Ubuntu):
user@Box:~$ perl -e exit
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = "en_US:en",
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_DK.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_DK.UTF-8").
Googling showed that this is usually caused by environment variables referring to a missing locale; however, all locales seem to be there:
user@Box:~$ locale -a
locale: Cannot set LC_CTYPE to default locale: No such file or directory
C
en_DK.utf8
en_GB.utf8
en_US
en_US.iso88591
en_US.utf8
POSIX
I have tried installing all of en, but that does not seem to affect anything.
Found the answer while writing the question:
The culprit is LC_CTYPE=UTF-8, which is apparently perfectly valid in macOS (and Perl will accept it there), but not on Linux. To avoid it, one can override LC_CTYPE as follows:
root@Box:~# update-locale LC_CTYPE=en_US.UTF-8
After logging out and back in again, Perl will no longer complain.
user@Box:~# perl -e 'print "Hack the Planet!\n"'
Hack the Planet!
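To spot a value like LC_CTYPE=UTF-8 more quickly, one could compare each locale variable against the installed list. The sketch below is only an approximation: locale_is_installed and normalize_locale are hypothetical helpers, and the normalization (lower-casing and dropping hyphens, so that en_DK.UTF-8 matches en_DK.utf8) is an assumption modelled on how the names in locale -a differ from the ones in the environment.

```shell
# Hypothetical helper: lower-case a locale name and drop hyphens so that
# "en_DK.UTF-8" and "en_DK.utf8" compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-'
}

# Hypothetical helper: succeed if the locale named in $1 appears in the
# list read from stdin (one name per line, as printed by locale -a).
locale_is_installed() {
    wanted=$(normalize_locale "$1")
    while IFS= read -r name; do
        if [ "$(normalize_locale "$name")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}
```

With the list from the question, locale -a | locale_is_installed en_DK.UTF-8 would succeed, while locale -a | locale_is_installed UTF-8 would fail, pointing at LC_CTYPE as the culprit.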

How to change the language of my git?

My git is in German; it says:
Auf Zweig master
instead of
On branch master
with git status.
What's the reason for this?
Your locale is probably German. You can check it by running locale. Try changing it with: export LANG="en_US.UTF-8"
The reason for this is that your command line language is set to German.
So when you do:
echo $LANG
you will see:
de_DE.UTF-8
To change this, do:
echo "export LANG=en_US.UTF-8" >> ~/.bashrc
assuming your standard shell is bash.
Don't forget:
source ~/.bashrc
Sometimes changing the LANG environment variable alone is not good enough.
You may also need to add LC_ALL
export LC_ALL=en_US.UTF-8
According to the IEEE and The Open Group specification (Environment Variables), this is because the LC_* environment variables take precedence over LANG:
The values of locale categories shall be determined by a precedence order; the first condition met below determines the value:
1. If the LC_ALL environment variable is defined and is not null, the value of LC_ALL shall be used.
2. If the LC_* environment variable (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) is defined and is not null, the value of the environment variable shall be used to initialize the category that corresponds to the environment variable.
3. If the LANG environment variable is defined and is not null, the value of the LANG environment variable shall be used.
4. If the LANG environment variable is not set or is set to the empty string, the implementation-defined default locale shall be used.
To change it permanently, add the export line above to your shell configuration file (probably ~/.bashrc or ~/.zshrc).
Then, to apply the modification, do:
$ source ~/.bashrc
or
$ source ~/.zshrc
Otherwise, just open a new terminal.
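The precedence order quoted from the specification can be sketched as a small shell function. effective_locale is a hypothetical helper written for illustration; it covers only the LC_ALL, LC_*, LANG order from the quote and does not model the separate LANGUAGE variable that GNU programs also consult for messages.

```shell
# Hypothetical helper: print the value that decides the given category
# (for example LC_MESSAGES), following the quoted precedence order.
# The argument must be a valid variable name, since it is passed to eval.
effective_locale() {
    category=$1
    # Indirect lookup of the variable named by $category.
    eval "category_value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "implementation-defined default"
    fi
}
```

For example, effective_locale LC_MESSAGES shows which variable's value determines the language of messages in the current shell, as far as these rules go.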
In my case, setting LANG or LC_ALL was not enough. I also had a LANGUAGE environment variable which was set to en_GB:en_US:de. Despite the ordering, which is presumably an order of preference, it resulted in a German language response from git and other command-line programs. When I changed it to en_GB:en_US, git and other programs became English.
As explained in @Tom's comment, it is possible to add an alias. In my case, I added this to my ~/.bash_aliases on Ubuntu:
alias giten='LANGUAGE=en_GB:en_US git'
So if I use git, it is in my language; if I use giten, it is in English.
Note: auto-completion is lost this way.

How to get terminal's Character Encoding

I have changed my gnome-terminal's character encoding to "GBK" (the default is UTF-8). How can I query the current value (the character encoding) on my Linux system?
The terminal uses environment variables to determine which character set to use, therefore you can determine it by looking at those variables:
echo $LC_CTYPE
or
echo $LANG
The locale command with no arguments prints the values of all of the relevant environment variables except for LANGUAGE.
For current encoding:
locale charmap
For available locales:
locale -a
For available encodings:
locale -m
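As a complement to locale charmap, the codeset can also be read off a locale name directly, since it is the part after the dot. The sketch below uses a hypothetical helper, codeset_of; it only parses the name and says nothing about what the terminal emulator is actually doing.

```shell
# Hypothetical helper: print the codeset part of a locale name,
# or an empty line if the name carries none.
codeset_of() {
    name=${1%%@*}    # drop an optional @modifier such as @euro
    case $name in
        *.*) printf '%s\n' "${name#*.}" ;;
        *)   printf '\n' ;;
    esac
}

codeset_of en_US.UTF-8    # prints: UTF-8
codeset_of pt_BR          # prints an empty line (no codeset in the name)
```

When the name has no codeset, as with pt_BR, locale charmap is the way to find out which encoding the locale actually uses.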
Check encoding and language:
$ echo $LC_CTYPE
ISO-8859-1
$ echo $LANG
pt_BR
Get all languages:
$ locale -a
Change to pt_PT.utf8:
$ export LC_ALL=pt_PT.utf8
$ export LANG="$LC_ALL"
If you have Python:
python -c "import sys; print(sys.stdout.encoding)"
To my knowledge, no.
Circumstantial indications from $LC_CTYPE, locale and such might seem alluring, but these are completely separated from the encoding the terminal application (actually an emulator) happens to be using when displaying characters on the screen.
The only way to detect the encoding for sure is to output something only present in that encoding, e.g. ä, take a screenshot, analyze that image and check whether the output character is correct.
So no, it's not possible, sadly.
To see the current locale information, use the locale command. Below is an example on RHEL 7.8:
[usr@host ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Examination of https://invisible-island.net/xterm/ctlseqs/ctlseqs.html, the xterm control character documentation, shows that it follows the ISO 2022 standard for character set switching. In particular ESC % G selects UTF-8.
So to force the terminal to use UTF-8, this command would need to be sent. I find no way of querying which character set is currently in use, but there are ways of discovering if the terminal supports national replacement character sets.
However, from charsets(7), it doesn't look like GBK (or GB2312) is an encoding supported by ISO 2022 and xterm doesn't support it natively. So your best bet might be to use iconv to convert to UTF-8.
Further reading shows that a (significant) subset of GBK is EUC, which is an ISO 2022 code, so ISO 2022-capable terminals may be able to display GBK natively after all. However, I can't find any mention of activating this programmatically, so the terminal's user interface would be the only recourse.
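The iconv route mentioned above might look like the following sketch. gbk_to_utf8 is a hypothetical wrapper name, and it assumes the installed iconv knows the GBK encoding (iconv -l lists what is available).

```shell
# Hypothetical wrapper: convert GBK-encoded text on stdin to UTF-8 on stdout.
# Assumes the local iconv supports GBK; check with: iconv -l
gbk_to_utf8() {
    iconv -f GBK -t UTF-8
}
```

A program that emits GBK could then be piped through it, for example some_command | gbk_to_utf8, while the terminal itself stays in UTF-8. some_command is a placeholder here.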
