Do I need to include -n option when encoding a password using base64? - base64

I read somewhere that when encoding a password in base64, I should use echo -n to prevent the newline from being included in the encoded value. For example,
echo changeme | base64
Y2hhbmdlbWUK
echo -n changeme | base64
Y2hhbmdlbWU=
These two base64 strings are different, but when I decode them they are the same
echo Y2hhbmdlbWUK | base64 -d
changeme
echo Y2hhbmdlbWU= | base64 -d
changeme
So do I really need to add the -n option? https://linux.die.net/man/1/echo says:
-n     do not output the trailing newline
Searching turned up this long-winded example using Python, which I didn't read in full: Simple way to encode a string according to a password?

When you use an online converter to decode it to hex, you'll see that the first string becomes 6368616e67656d650a and has 0a (ASCII Linefeed) on the end, which the second doesn't have.
So the answer is yes, you really need to add the -n option.
If you change your echo to echo -n, you'll see this as well:
echo -n Y2hhbmdlbWU= | base64 -d
changeme$    # no trailing newline, so the next shell prompt appears right after the output
Another way to see this is using the following UNIX command:
echo -n Y2hhbmdlbWUK | base64 -d | od -c
0000000 c h a n g e m e \n
echo -n Y2hhbmdlbWU= | base64 -d | od -c
0000000 c h a n g e m e
Where you can see the first encoding includes the linefeed and the second does not.
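A portable alternative worth noting: echo -n is not guaranteed to suppress the newline in every sh implementation (some echos print "-n" literally), while printf '%s' always emits exactly its argument with no trailing newline. A minimal sketch:

```shell
# printf '%s' writes only the password bytes, no trailing newline,
# so the encoding matches the echo -n result above
printf '%s' changeme | base64
# Y2hhbmdlbWU=
```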


Bash script - decode encoded string to byte array

I am new to bash scripting and I am writing a small script where I need to decode the encoded string and convert it to a byte array.
my_array=(`echo "dGVzdA==" | base64 -d`)
when I print the size of the array it's showing as 1 instead of 4.
echo "${#my_array[*]}"
Question - will not base64 -d convert to a byte array? There is an equivalent java code where it does for me:
byte [] b = Base64.getDecoder().decode("dGVzdA==");
Any thoughts?
to byte array
There are no "byte arrays" in shells. You can basically choose between two types of variables:
a string
an array of strings (a bash extension, not available in all shells)
How you interpret the data is up to you; the shell can only store strings. Note that shell strings are zero-terminated internally, so it's impossible to store a NUL byte in a variable.
Question - will not base64 -d convert to a byte array?
The string dGVzdA== is the base64-encoded string test. So you can convert the string dGVzdA== to the string test:
$ echo "dGVzdA==" | base64 -d
test    # no newline is printed after "test", as there is none in the input
Any thoughts?
You could convert the string test to a string that contains the representation of each byte as hexadecimal number, typically with xxd:
$ echo "dGVzdA==" | base64 -d | xxd -p
74657374
Now you could insert a newline after each two characters in the string and read the output as a newline separated list of strings into an array:
$ readarray -t array < <(echo "dGVzdA==" | base64 -d | xxd -p -c1)
$ declare -p array
declare -a array=([0]="74" [1]="65" [2]="73" [3]="74")
Each element of the array is one byte in the input represented as a string that contains a number in base16. This may be as close as you can get to a "byte array".
#!/bin/bash
read -r -a my_array < <(echo 'dGVzdA==' | base64 -d | sed 's/./& /g')
echo "${#my_array[@]}"
You can also access individual characters in a string variable to do this.
$: v=$( echo "dGVzdA==" | base64 -d )
$: my_array=( $( for(( i=0; i<${#v}; i++ )); do echo "${v:i:1}"; done ) )
$: echo ${my_array[2]}
s
This isn't a one-liner, but aside from the base64 call I think it's all done in the parser.
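One caveat on the unquoted my_array=( $( ... ) ) form above: it relies on word splitting, so any whitespace bytes in the decoded data would be lost. A sketch that appends each character directly to the array instead (assuming bash, for substring expansion and +=):

```shell
# build the array one character at a time; quoting "${v:i:1}"
# preserves whitespace and glob characters in the decoded data
v=$(echo "dGVzdA==" | base64 -d)
my_array=()
for ((i=0; i<${#v}; i++)); do
  my_array+=("${v:i:1}")
done
echo "${#my_array[@]}"   # prints 4
```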
Introduction
Syntax:
encoded='dGVzdA=='
myVar=$(base64 -d <<<"$encoded")
This works fine, but if you plan to work with a lot of encoded strings, defining:
base64Decoder() {
    local -n _result=${2:-B64Decoded}
    _result=$(base64 -d <<<"$1")
}
will be enough:
base64Decoder $encoded decoded
echo $decoded
test
But having to fork base64 -d for each operation could become heavy...
Nearly pure bash: using coproc for only one fork to tr.
As this question asks for
Bash script - decode encoded string to byte array
to be used as a library function, there is a very efficient way: use a dedicated subprocess to convert base64 characters into bash's base-64 digit characters, then a small bash loop to compute the bytes.
With only one fork, this is a lot quicker for small strings!
# start `tr` in background
printf -v _Base64_refstr "%s" {A..Z} {a..z} {0..9} + / =
printf -v _Bash64_refstr "%s" {0..9} {a..z} {A..Z} # _ 0
coproc stdbuf -o0 tr ${_Base64_refstr} ${_Bash64_refstr}
declare -r trB64IN=${COPROC[1]} trB64OUT=$COPROC
unset _Base64_refstr _Bash64_refstr
bashBase64Decoder() {
    local _string _4B _val
    local -n _result=${2:-B64Decoded}
    echo >&$trB64IN $1
    read -u $trB64OUT _string
    _result=()
    while read -n4 _4B; do
        [ "$_4B" ] &&
            printf -v _val '%02X ' \
                $((64#$_4B>>16)) $((64#$_4B>>8&255)) $((64#$_4B&255)) &&
            _result+=($_val)
    done <<<$_string
}
Then
bashBase64Decoder 'dGVzdA==' myVar
declare -p myVar
declare -a myVar=([0]="74" [1]="65" [2]="73" [3]="74" [4]="00" [5]="00")
Or you could replace the printf line and the _result assignment to build the decoded string directly:
bashBase64Decoder() {
    local _string _4B _chrs
    local -n _result=${2:-B64Decoded}
    echo >&$trB64IN $1
    read -u $trB64OUT _string
    _result=()
    while read -n4 _4B; do
        [ "$_4B" ] &&
            printf -v _chrs '\\%02o' $((64#$_4B>>16)) $((64#$_4B>>8&255)) \
                $((64#$_4B&255)) &&
            printf -v _chrs "$_chrs" &&
            _result+="$_chrs"
    done <<<$_string
}
Then
bashBase64Decoder dGVzdA== myVar
declare -p myVar
declare -a myVar=([0]="test")
This can be more than 10 times quicker for small strings:
time for ((i=1000;i--;)) { myVar=$(base64 -d <<<'dGVzdA==');}
real 0m2.009s
time for ((i=1000;i--;)) { bashBase64Decoder dGVzdA== myVar;}
real 0m0.168s

How to translate and remove non-printable characters? [duplicate]

I want to delete all the control characters from my file using Linux bash commands.
There are some control characters, especially EOF (0x1A), which cause problems when I load my file in another piece of software. I want to delete them.
Here is what I have tried so far:
this will list all the control characters:
cat -v -e -t file.txt | head -n 10
^A+^X$
^A1^X$
^D ^_$
^E-^D$
^E-^S$
^E1^V$
^F%^_$
^F-^D$
^F.^_$
^F/^_$
^F4EZ$
^G%$
This will list all the control characters using grep:
$ cat file.txt | head -n 10 | grep '[[:cntrl:]]'
+
1
-
-
1
%
-
.
/
This matches the above output of the cat command.
Now I ran the following command to show all lines not containing control characters, but it still shows the same output as above (lines with control characters):
$ cat file.txt | head -n 10 | grep '[^[:cntrl:]]'
+
1
-
-
1
%
-
.
/
here is the output in hex format:
$ cat file.txt | head -n 10 | grep '[[:cntrl:]]' | od -t x2
0000000 2b01 0a18 3101 0a18 2004 0a1f 2d05 0a04
0000020 2d05 0a13 3105 0a16 2506 0a1f 2d06 0a04
0000040 2e06 0a1f 2f06 0a1f
0000050
as you can see, the hex values, 0x01, 0x18 are control characters.
I tried using the tr command to delete the control characters but got an error:
$ cat file.txt | tr -d "\r\n" "[:cntrl:]" >> test.txt
tr: extra operand `[:cntrl:]'
Only one string may be given when deleting without squeezing repeats.
Try `tr --help' for more information.
If I delete all control characters, I will end up deleting the newline and carriage return as well, which are used as the line-ending sequence on Windows. How do I delete all the control characters while keeping only the required ones like "\r\n"?
Thanks.
Instead of using the predefined [:cntrl:] set, which as you observed includes \n and \r, just list (in octal) the control characters you want to get rid of:
$ tr -d '\000-\011\013\014\016-\037' < file.txt > newfile.txt
Based on this answer on unix.stackexchange, this should do the trick:
$ cat scriptfile.raw | col -b > scriptfile.clean
Try grep, like:
grep -o "[[:print:][:space:]]*" in.txt > out.txt
which keeps only printable characters ([:print:]) plus whitespace ([:space:]) such as tab, newline, vertical tab, form feed, carriage return, and space.
To be less restrictive, and remove only control characters ([:cntrl:]), delete them by:
tr -d "[:cntrl:]"
If you want to keep \n (which is part of [:cntrl:]), then temporarily replace it with something else, e.g.
cat file.txt | tr '\r\n' '\275\276' | tr -d "[:cntrl:]" | tr "\275\276" "\r\n"
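To convince yourself the round trip keeps \r\n while dropping other control characters, you can feed it a small sample and inspect the bytes with od (a sketch; \275 and \276 are arbitrary placeholder bytes assumed not to occur in your data):

```shell
# sample input contains ^A (\001), a CRLF line ending, and ^Z (\032);
# after the round trip only the CR and LF control bytes survive
printf 'a\001b\r\nc\032\n' |
  tr '\r\n' '\275\276' | tr -d '[:cntrl:]' | tr '\275\276' '\r\n' |
  od -c
```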
A little late to the party: cat -v <file>
which I think is the easiest to remember of the lot!

unix - print distinct list of control characters in a file

For example given an input file like below:
sid|storeNo|latitude|longitude
2|1|-28.03õ720000
9|2
10
jgn
352|1|-28.03¿720000
9|2|fd¿kjhn422-405
000¥0543210|gf¿djk39
gfd|f¥d||fd
Output (the characters below can appear in any order):
¿õ¥
Does anyone have a function (awk, bash, perl, etc.) that could scan each line and then output (in octal, hex or ASCII, any is fine) a distinct list of the control characters found (for simplicity, "control characters" being those above ASCII 126)?
Using perl v5.8.8.
To print the bytes in octal:
perl -ne'printf "%03o\n", ord for /[^\x09\x0A\x20-\x7E]/g' file | sort -u
To print the bytes in hex:
perl -ne'printf "%02X\n", ord for /[^\x09\x0A\x20-\x7E]/g' file | sort -u
To print the original bytes:
perl -nE'say for /[^\x09\x0A\x20-\x7E]/g' file | sort -u
This should catch everything over ordinal value 126 without having to explicitly weed out outliers:
#!/bin/bash
while IFS= read -n1 c; do
if (( $(printf "%d" "'$c") > 126)); then
echo "$c"
fi
done < ./infile | sort -u
Output
¥
¿
õ
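If you'd rather not depend on perl or on read handling multibyte characters, a byte-level sketch using only od, tr, grep and sort lists the distinct offending bytes in hex (7f and 80-ff, i.e. everything above ASCII 126):

```shell
# one hex byte per line, keep only values above 0x7e, deduplicate
od -An -v -tx1 infile |
  tr ' ' '\n' |
  grep -E '^(7f|[89a-f][0-9a-f])$' |
  sort -u
```

Note that this prints bytes, not characters, so a multibyte UTF-8 character shows up as its individual bytes.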
To delete everything except the control characters:
tr -d '\0-\176' < input > output
To test:
printf 'foobar\n\377' | tr -d '\0-\176' | od -t c
See tr(1) man page for details.
sed -e 's/[A-Za-z0-9,|]//g' -e 's/-//g' -e 's/./&^M/g' | sort -u
Delete everything you don't want, put everything else on its own line, then sort -u the whole kit.
The "&^M" is "&" followed by Ctrl-V followed by Ctrl-M in Bash.
Unix wins.

How to convert hex to ASCII characters in the Linux shell?

Let's say that I have a string 5a.
This is the hex representation of the ASCII letter Z.
I need to find a Linux shell command which will take a hex string and output the ASCII characters that the hex string represents.
So if I do:
echo 5a | command_im_looking_for
I will see a solitary letter Z:
Z
I used to do this with xxd:
echo -n 5a | xxd -r -p
But then I realised that in Debian/Ubuntu, xxd is part of vim-common and hence might not be present in a minimal system. To also avoid Perl (IMHO also not part of a minimal system), I ended up using sed, xargs, and printf like this:
echo -n 5a | sed 's/\([0-9A-F]\{2\}\)/\\\\\\x\1/gI' | xargs printf
Mostly I only want to convert a few bytes, and it's okay for such tasks. The advantage of this solution over ghostdog74's is that it can convert hex strings of arbitrary length automatically. xargs is used because printf doesn't read from standard input.
echo -n 5a | perl -pe 's/([0-9a-f]{2})/chr hex $1/gie'
Note that this won't skip non-hex characters. If you want just the hex (no whitespace from the original string etc):
echo 5a | perl -ne 's/([0-9a-f]{2})/print chr hex $1/gie'
Also, zsh and bash support this natively in echo:
echo -e '\x5a'
You can do this with echo only, without the other stuff. Don't forget to add "-n" or you will get a linebreak automatically:
echo -n -e "\x5a"
Bash one-liner
echo -n "5a" | while read -N2 code; do printf "\x$code"; done
Some Python 3 one-liners that work with any number of bytes.
Decoding hex
Using strip, so that it's ok to have a newline on stdin.
$ echo 666f6f0a | python3 -c "import sys, binascii; sys.stdout.buffer.write(binascii.unhexlify(input().strip()))"
foo
Encoding hex
$ echo foo | python3 -c "import sys, binascii; print(binascii.hexlify(sys.stdin.buffer.read()).decode())"
666f6f0a
Depending on where you got that "5a", you can just prepend "\x" to it and pass that to printf:
$ a=5a
$ a="\x${a}"
$ printf "$a"
Z
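If you need this to work in shells where printf doesn't understand \x escapes, a POSIX-portable variant first converts the hex value to octal, since %b expansion of \0NNN octal sequences is standardized (variable names here are just illustrative):

```shell
# convert the hex byte to octal, then let %b expand the \0NNN escape
a=5a
oct=$(printf '%o' "0x$a")    # 132
printf '%b' "\\0${oct}"      # prints Z
```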
echo 5a | python -c "import sys; print chr(int(sys.stdin.read(),base=16))"
Here is a pure bash script (as printf is a bash builtin) :
#warning : spaces do matter
die() { echo "$*" >&2; exit 1; }
p=48656c6c6f0a
test $((${#p} & 1)) -eq 0 || die "length is odd"
p2=''; for ((i=0; i<${#p}; i+=2)); do p2=$p2\\x${p:$i:2}; done
printf "$p2"
If bash is already running, this should be faster than any other solution which is launching a new process.
dc can convert between numeric bases:
$ echo 5a | (echo 16i; tr 'a-z' 'A-Z'; echo P) | dc
Z$
(dc's P command prints no trailing newline, so the next prompt appears immediately after the Z.)
There is a simple shell command ascii.
If you use Ubuntu, install it with:
sudo apt install ascii
Then
ascii 0x5a
will output:
ASCII 5/10 is decimal 090, hex 5a, octal 132, bits 01011010: prints as `Z'
Official name: Majuscule Z
Other names: Capital Z, Uppercase Z
As per @Randal's comment, you can use perl, e.g.
$ printf 5a5a5a5a | perl -lne 'print pack "H*", $_'
ZZZZ
and other way round:
$ printf ZZZZ | perl -lne 'print unpack "H*", $_'
5a5a5a5a
Another example with file:
$ printf 5a5a5a5a | perl -lne 'print pack "H*", $_' > file.bin
$ perl -lne 'print unpack "H*", $_' < file.bin
5a5a5a5a
You can use this command (python script) for larger inputs:
echo 58595a | python -c "import sys; import binascii; print(binascii.unhexlify(sys.stdin.read().strip()).decode())"
The result will be:
XYZ
And for more simplicity, define an alias:
alias hexdecoder='python -c "import sys; import binascii; print(binascii.unhexlify(sys.stdin.read().strip()).decode())"'
echo 58595a | hexdecoder
GNU awk 4.1
awk -niord '$0=chr("0x"RT)' RS=.. ORS=
Note that if you pipe echo into this, it will produce an extra null byte:
$ echo 595a | awk -niord '$0=chr("0x"RT)' RS=.. ORS= | od -tx1c
0000000 59 5a 00
Y Z \0
Instead, use printf:
$ printf 595a | awk -niord '$0=chr("0x"RT)' RS=.. ORS= | od -tx1c
0000000 59 5a
Y Z
Also note that GNU awk produces UTF-8 by default
$ printf a1 | awk -niord '$0=chr("0x"RT)' RS=.. ORS= | od -tx1
0000000 c2 a1
If you are dealing with characters outside of ASCII, and you are going to be
Base64 encoding the resultant string, you can disable UTF-8 with -b
echo 5a | sha256sum | awk -bniord 'RT~/\w/,$0=chr("0x"RT)' RS=.. ORS=
Similar to my answer here: Linux shell scripting: hex number to binary string
You can do it with the same tool like this (using ASCII printable characters instead of 5a):
echo -n 616263 | cryptocli dd -decoders hex
will produce the following result:
abc
