What's the default encoding in bash standard input? [duplicate]

I am using Gina Trapani's excellent todo.sh to organize my todo list.
However, being a Dane, it would be nice if the script accepted special Danish characters like ø and æ.
I am an absolute UNIX n00b, so it would be a great help if anybody could tell me how to fix this! :)

Slowly, the Unix world is moving from ASCII and other regional encodings to UTF-8. You need to be running a UTF-8 capable terminal, such as a modern xterm or PuTTY.
In your ~/.bash_profile, set your language to one of the UTF-8 variants.
export LANG=C.UTF-8
or
export LANG=en_AU.UTF-8
etc.
You should then be able to write UTF-8 characters in the terminal, and include them in bash scripts.
#!/bin/bash
echo "UTF-8 is græat ☺"
See also: https://serverfault.com/questions/11015/utf-8-and-shell-scripts
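To confirm the setting took effect, one can ask the system which character map the current locale uses. The following is a minimal sketch, assuming the POSIX `locale charmap` command is available; the helper name `charmap_is_utf8` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given charmap name denotes UTF-8.
charmap_is_utf8() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# `locale charmap` prints the character encoding of the current locale,
# for example UTF-8, or ANSI_X3.4-1968 (plain ASCII) in the C locale on glibc.
current=$(locale charmap 2>/dev/null) || current=""

if charmap_is_utf8 "$current"; then
    echo "Locale uses UTF-8"
else
    echo "Locale uses '$current'; consider setting LANG to a UTF-8 variant"
fi
```

If this reports something other than UTF-8 after editing ~/.bash_profile, a likely cause is that the file has not been re-read yet (log in again) or that the chosen locale is not installed on the system.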

What does this command show?
locale
It should show something like this for you:
LC_CTYPE="da_DK.UTF-8"
LC_NUMERIC="da_DK.UTF-8"
LC_TIME="da_DK.UTF-8"
LC_COLLATE="da_DK.UTF-8"
LC_MONETARY="da_DK.UTF-8"
LC_MESSAGES="da_DK.UTF-8"
LC_PAPER="da_DK.UTF-8"
LC_NAME="da_DK.UTF-8"
LC_ADDRESS="da_DK.UTF-8"
LC_TELEPHONE="da_DK.UTF-8"
LC_MEASUREMENT="da_DK.UTF-8"
LC_IDENTIFICATION="da_DK.UTF-8"
LC_ALL=
If not, you might try doing this before you run your script:
LANG=da_DK.UTF-8
You don't say what happens when you run the script and it encounters these characters. Are they in the todo file? Are they entered at a prompt? Is there an error message? Is something output in place of the expected output?
Try this and see what you get:
read -p "Enter some characters" string
echo "$string"
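If the echoed text looks wrong, it can help to look at the raw bytes the shell actually received. Here is a small sketch using the POSIX `od` tool; `hex_bytes` is a hypothetical helper name. In UTF-8, ø is encoded as the two bytes c3 b8 and æ as c3 a6, whereas a single-byte encoding such as ISO-8859-1 would give one byte each.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of its argument as lowercase hex.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# Assumes this script itself is saved as UTF-8.
hex_bytes "ø"
hex_bytes "æ"
```

Calling `hex_bytes "$string"` after the `read` above shows whether the terminal sent UTF-8 or some other encoding.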

Related

Where and How Bash convert strings to colors

I am working on Bash 5.0 from the GNU repository. I wanted to find the place where Bash reads a string with ANSI color codes and converts it to colors, as in the following case, where it turns "Hello" red:
root#ubuntu:~/Desktop/bash-5.0# ./bash
root#ubuntu:~/Desktop/bash-5.0# echo $BASH_VERSION
5.0.0(8)-release
root#ubuntu:~/Desktop/bash-5.0# ./bash -c 'echo -e "\033[31mHello\e[0m World"'
Hello World
I searched the source code and found two files that seem to be related:
bash-5.0/lib/readline/colors.c
bash-5.0/lib/readline/parse-colors.c
But they are not. They are only used when Bash first loads, and for them to work you need to add the following lines to ~/.inputrc:
set colored-completion-prefix on
set colored-stats on
Any idea where in the code Bash takes a string like "\033[31mHello" and converts it to red?
It's not the shell that's converting anything to colors, it is your terminal. The shell only outputs ANSI escape codes which are then picked up by the terminal.
Depending on your point of view and philosophical interpretation, \033[31mHello already is a colored string (for the shell, at least, it is).
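One way to see this for yourself is to send the same output somewhere other than a terminal. The sketch below uses a hypothetical helper named `red`; the escape sequences are the ones from the question, and `od -c` is used only to make the raw bytes visible.

```shell
#!/bin/sh
# Hypothetical helper: wrap text in the ANSI "red foreground" and "reset"
# escape sequences. The shell just emits these bytes; any coloring is done
# by whatever terminal ends up displaying them.
red() {
    printf '\033[31m%s\033[0m' "$1"
}

# On a terminal that understands ANSI codes, "Hello" appears in red.
red "Hello"; echo " World"

# Piped through od, the escape bytes are shown literally and nothing is colored.
red "Hello" | od -c
```

The second command shows that the shell's output is identical in both cases; only the consumer of the bytes differs.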

Preserving accented letters when running a Perl script from the Linux terminal

I want to get a plain text file from the French Wikipedia dump XML file.
To that end, I am using a Perl script.
I can give the full file if necessary; I only added the line
tr/a-zàâééèëêîôûùç-/ /cs;
to the script here: http://mattmahoney.net/dc/textdata.html
However, when I run this in a Linux terminal:
perl filterwikifr.pl frwiki.xml > frwikiplaintext.txt
the output text file does not show accented letters correctly. For example, I get catÃ©gorie instead of catégorie...
I also tried:
perl -CS filterwikifr.pl frwiki.xml > frwikiplaintext.txt
with no better results (I also tried other variants in place of -CS).
The problem is with the text editor, gedit.
If, instead of opening the file directly, I start gedit, go to "Open", and under "Character encoding" choose UTF-8 instead of "Automatically Detected", then the accents are displayed correctly.
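To tell an editor problem apart from a script problem, it can help to check the file's bytes without any editor involved. The following is a sketch that assumes the `iconv` utility is installed; `is_valid_utf8` is a hypothetical helper name, and frwikiplaintext.txt is the output file from the question.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file decodes cleanly as UTF-8.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Only run the check if the output file from the question is present.
if [ -f frwikiplaintext.txt ]; then
    if is_valid_utf8 frwikiplaintext.txt; then
        echo "valid UTF-8: the viewer is probably guessing the wrong encoding"
    else
        echo "not valid UTF-8: the script output itself needs a closer look"
    fi
fi
```

A file that passes this check but still looks garbled in an editor points to the editor's encoding detection, as turned out to be the case here.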

How to display bash special characters \h, \s, etc

I can't find a way to display Bash special characters. For example, the hostname is \h.
If I do:
echo '\h \\h'
it doesn't work (it displays h \h). How can I make it display my hostname?
Ref: http://www.tldp.org/HOWTO/Bash-Prompt-HOWTO/bash-prompt-escape-sequences.html
Those are only evaluated in the PS1 and PS2 variables. You can test them dynamically like this:
PS1="\h"
Bash will then display the new prompt. If you mess it up, just open a new Bash; the change is not saved.
As the HOWTO says, these are "Prompt Escape Sequences"; they only work when you put them in PS1 or PS2.
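Outside the prompt, the hostname has to come from a command or a variable instead. Here is a minimal sketch using the POSIX `uname -n` command; `short_host` is a hypothetical name, and cutting at the first dot is meant to mirror what \h shows in a prompt. (Bash 4.4 and later also offer the ${var@P} expansion, which expands a variable's value as if it were a prompt string.)

```shell
#!/bin/sh
# Hypothetical helper: print the hostname up to the first '.',
# which is what the \h prompt escape displays.
short_host() {
    uname -n | cut -d. -f1
}

echo "Host: $(short_host)"
```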

Perl color specifiers with redirected output

I have a Perl script that uses Term::ANSIColor. It used to be the case that if I redirect the output to a file > file.txt then the file contains just the text, and not the color codes ^[[0m
Something changed on my machine, Ubuntu 10.04, such that redirected output includes these special characters that specify color.
Any idea how to fix this? Can I detect output redirection from inside the perl script and skip the color part?
Thanks!
You can test whether you're running interactively using the IO::Interactive package:
use IO::Interactive qw(is_interactive);
if (is_interactive())
{
# show some fancy color
}
The rationale behind using IO::Interactive (instead of just testing if STDIN is a tty with the -t operator) is extensively explained by Damian Conway.
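For comparison, the simpler tty test can be sketched in shell, where `[ -t 1 ]` asks whether file descriptor 1 (standard output) is a terminal. This only illustrates the plain `-t` approach, not what IO::Interactive does internally; `say_red` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print a message in red only when stdout is a terminal;
# when output is redirected to a file or a pipe, print plain text instead.
say_red() {
    if [ -t 1 ]; then
        printf '\033[31m%s\033[0m\n' "$1"
    else
        printf '%s\n' "$1"
    fi
}

say_red "warning: something needs attention"
```

Running the script with `> file.txt` leaves no escape codes in the file, because standard output is then not a terminal.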

Cat magic - end of input

When "cat > xx.txt << EOF" is entered on the command line, further input from the command line goes to the file xx.txt until EOF is written. EOF is not a sacred word here: if the command were instead cat > xx.txt << BBB, then input would go to xx.txt until BBB is written. I don't know the rationale behind this (<< end_of_input_sequence) syntax. The cat man page does not explain much.
I have only seen this in scripts etc.
It's a feature of the shell, not cat - that's why you won't find it in the cat manual.
It's known as a "here document"; see the Advanced Bash-Scripting Guide for some documentation.
This is called a here document. I believe it first appeared in shells, but some programming languages such as Perl, Ruby, and PHP also implement this style.
This syntax is called a here document.
It's not specific to any command, cat no more than any other, and it is documented in the shell's manual; for instance, man bash:
3.6.6 Here Documents
This type of redirection instructs the
shell to read input from the current
source until a line containing only
word (with no trailing blanks) is
seen. All of the lines read up to that
point are then used as the standard
input for a command.
(Not a full quote; there is more to read in the manual.)
BTW, it's a syntax that has been reused in some programming languages, like PHP ;-)
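The behavior described in the manual excerpt can be tried with a short sketch like the one below. The function names `expanded` and `literal` and the variable `name` are made up for illustration. It also shows one detail from the same manual section that the excerpt leaves out: quoting the delimiter word turns off variable expansion inside the here document.

```shell
#!/bin/sh
name="world"

# Unquoted delimiter: the shell expands $name before cat sees the text.
# BBB is an arbitrary word; any word works as the delimiter.
expanded() {
    cat << BBB
Hello, $name
BBB
}

# Quoted delimiter: the body is passed to cat literally.
literal() {
    cat << 'BBB'
Hello, $name
BBB
}

expanded
literal
```

In both functions cat simply copies its standard input; the shell is what reads the lines up to BBB and feeds them to the command.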
