Why do base64/openssl use a padding character of 'K' instead of '='?

I've noticed that PHP's base64_encode uses '=' as a padding character. According to Wikipedia, the different Base64 variants use either '=' or no padding at all. However, the CLI base64 command as well as openssl enc -base64 appear to use 'K' as the padding. I am looking for information as to why, and which implementations they use.
echo base64_encode('hello'); // aGVsbG8=
echo hello | base64 -i - // aGVsbG8K
openssl enc -base64 <<< hello // aGVsbG8K

K is not a padding character. It is the encoding of the trailing newline that echo (and the <<< here-string) appends to the input.
echo hello | openssl enc -base64 # aGVsbG8K
echo -n hello | openssl enc -base64 # aGVsbG8=
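A portable way to control the trailing newline is printf, since echo -n is not handled identically by every shell. The following is a minimal sketch; the helper names encode_raw and encode_line are made up for illustration, and it assumes the coreutils base64 command is available:

```shell
# Encode the argument exactly as given (no newline is added).
encode_raw() { printf '%s' "$1" | base64; }

# Encode the argument followed by a newline, as echo would send it.
encode_line() { printf '%s\n' "$1" | base64; }

encode_raw hello     # aGVsbG8=
encode_line hello    # aGVsbG8K
```

The only difference between the two calls is the single newline byte at the end of the input.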
UPDATE:
Technical explanation
Base64 splits the provided bitstream into 6-bit chunks instead of 8-bit chunks. A special table (different from the ASCII table) of 64 printable characters, which gives the encoding its name, is then used to convert each 6-bit chunk to a character.
Let's see it in practice (print-bits and print-b64-bits are imaginary commands).
With newline:
echo hello | print-bits
# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101111 (o) 00001010 (\n)
echo hello | print-b64-bits
# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 111100 (8) 001010 (K)
No newline:
echo -n hello | print-bits
# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101111 (o)
echo -n hello | print-b64-bits
# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 111100 (8)
In the latter case there are 7 output characters: the final chunk holds only 4 bits of input (1111), padded with two zero bits. One = character is appended to make the length 8 (a multiple of 4).
Note: A newline at the end is not always converted to K. The last character could also be o or g, depending on the number of input bytes modulo 3. Consider the case below:
echo helllo | print-bits
# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101100 (l) 01101111 (o) 00001010 (\n)
echo helllo | print-b64-bits
# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 110001 (x) 101111 (v) 000010 (C) 10 (g)
In the case above the last 2 bits will first be padded with zeros, then conversion to printable characters will follow. The last output character is now g.
Since there are 10 output characters, two = characters are appended to make the length 12 (a multiple of 4).
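The three possible endings can be checked directly with the real base64 command. This is a small sketch, assuming coreutils base64; encode_line is a hypothetical helper that appends the newline explicitly:

```shell
# Encode a word followed by a newline, as echo would send it.
encode_line() { printf '%s\n' "$1" | base64; }

# The total input length (word + newline) modulo 3 decides the ending.
for word in hello hell helllo; do
    printf '%-6s -> %s\n' "$word" "$(encode_line "$word")"
done
# hello  -> aGVsbG8K       (6 bytes: newline is the 3rd byte of its group)
# hell   -> aGVsbAo=       (5 bytes: newline is the 2nd byte of its group)
# helllo -> aGVsbGxvCg==   (7 bytes: newline is the 1st byte of its group)
```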

How do I get AWK to rearrange and manipulate text in a file to two output files depending on conditions?

I tried to find an efficient way to split and then recombine the text in one file into two separate files. There is a lot going on: removing the decimal point, reversing the sign (+ becomes - and - becomes +) in the amount fields, and padding. For example:
INPUT file input.txt:
(this first line is there just to give character position more easily instead of counting, it's not present in the input file, the "|" is just there to illustrate position only)
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345
| | | | | | | ("|" shows position)
123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1
987654321YYQQQQQQQQQQ NNNNNN#-97.73 #-97.73 1 1
777777777XXGGGGGGGGGG NNNNNN#115.92 #115.92 0 0
888888888YYHHHHHHHHHH NNNNNN#3.24 #3.24 1 0
Any line that contains a "1" as the 85th character above goes to one file say OutputA.txt rearranged like this:
PPPPPPPPPP~~NNNNNN123456789XX~~~-0000140458-0000000000
QQQQQQQQQQ~~NNNNNN987654321YY~~~+0000009773+0000009773
As well as any line that contains a "0" as the 85th character above goes to another file OutputB.txt rearranged like this:
GGGGGGGGGG~~NNNNNN777777777XX~~~-0000011592-0000011592
HHHHHHHHHH~~NNNNNN888888888YY~~~-0000000324-0000000324
It seems complicated, but I think it would work if I could grab each portion of the input lines as a separate variable, then write them out in a different order, with the amounts right-aligned and padded with 0s, and split them into different files depending on the last column. I am not sure how to put all these things together in one go.
I tried printing each line into a different file depending on whether the 85th character is a 1 or a 0, then creating variables (say, the first 11 characters are varA, the next 10 are varB, etc.), but it gets complex quickly because I need to change + to - and - to +, pad with zeros, and change the spacing. This should be possible with one script, but I just can't put all the pieces together.
I've looked for tutorials, but nothing seems to cover selecting lines based on a condition while at the same time padding, rearranging, and splitting.
Many thanks in advance
split
Use GNU AWK's ability to print to a file. Consider the following simple example:
seq 20 | awk '$1%2==1{print $0 > "fileodd.txt"}$1%2==0{print $0 > "fileeven.txt"}'
which reads the output of seq 20 (the numbers from 1 to 20, inclusive, one per line) and writes the odd numbers to fileodd.txt and the even numbers to fileeven.txt.
recombine text
Use substr and string concatenation for that task. Consider the following simple example: say you have file.txt with MM-DD-YYYY dates like so
01-29-2022
01-30-2022
01-31-2022
but you want YYYY-MM-DD then you could do that by
awk '{print substr($0,7,4) "-" substr($0,1,2) "-" substr($0,4,2)}' file.txt
which gives output
2022-01-29
2022-01-30
2022-01-31
The arguments of substr are the string ($0 is the whole line), the start position, and the length. Placing expressions side by side, separated by a space, concatenates them.
removing the decimal point
Use gsub with the second argument set to an empty string to delete unwanted characters, but keep in mind that . has a special meaning in regular expressions. Consider the following simple example; let the content of file.txt be
100.15
200.30
300.45
then
awk '{gsub(/[.]/,"");print}' file.txt
gives output
10015
20030
30045
Observe that /[.]/ is used rather than /./ (which would match every character), and that gsub modifies $0 in place.
reversing the sign(...)padding
Multiply by -1, then use sprintf with a suitable format. Consider the following example; let the content of file.txt be
1
-10
100
then
awk '{print "Reversed value is " sprintf("%+05d",-1*$1)}' file.txt
gives output
Reversed value is -0001
Reversed value is +0010
Reversed value is -0100
Explanation: % marks the place where the value will be inserted; + prefixes the value with - or +; 05 pads with leading zeros to a width of 5 characters (including the sign); d formats the value as an integer. sprintf returns the formatted string, which can be concatenated with other strings as shown above.
(tested in GNU Awk 5.0.1)
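Putting these pieces together for the sample data might look like the sketch below. It assumes the fields are separated by whitespace exactly as they appear in the listing in the question (the real file is fixed-width, so substr on $0 with the true column positions may be needed instead), and that every amount has exactly two decimal places. The function names split_records and flip are invented for this illustration:

```shell
# Write rearranged records from the file "$1" to OutputA.txt / OutputB.txt.
split_records() {
    awk '
    # Turn "#-97.73" into "+0000009773": flip the sign, drop the point, pad.
    function flip(s,    sign) {
        sub(/^#/, "", s)                      # strip the leading "#", if any
        sign = sub(/^-/, "", s) ? "+" : "-"   # negative becomes +, otherwise -
        gsub(/[.]/, "", s)                    # remove the decimal point
        return sign sprintf("%010d", s)       # zero-pad to 10 digits
    }
    {
        split($2, part, "#")                  # "NNNNNN#1404.58" -> code, amount
        out = ($NF == 1) ? "OutputA.txt" : "OutputB.txt"
        line = substr($1, 12, 10) "~~" part[1] substr($1, 1, 11) "~~~"
        print line flip(part[2]) flip($3) > out
    }' "$1"
}

# Sample data copied from the question.
cat > input.txt <<'EOF'
123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1
987654321YYQQQQQQQQQQ NNNNNN#-97.73 #-97.73 1 1
777777777XXGGGGGGGGGG NNNNNN#115.92 #115.92 0 0
888888888YYHHHHHHHHHH NNNNNN#3.24 #3.24 1 0
EOF
split_records input.txt
```

The sign is flipped on the string rather than by multiplying by -1, so a zero amount comes out as -0000000000, matching the expected output in the question.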
You can use jq for this task:
#!/bin/bash
INPUT='
123456789XXPPPPPPPPPP NNNNNN#1404.58 #0.00 0 1
987654321YYQQQQQQQQQQ NNNNNN#-97.73 #-97.73 1 1
777777777XXGGGGGGGGGG NNNNNN#115.92 #115.92 0 0
888888888YYHHHHHHHHHH NNNNNN#3.24 #3.24 1 0
'
convert() {
jq -rR --arg lineSelector "$1" '
def transformNumber($len):
tonumber | # convert string to number
(if . < 0 then "+" else "-" end) as $sign | # store inverted sign
if . < 0 then 0 - . else . end | # abs(number)
. * 100 | # number * 100
tostring | # convert number back to string
$sign + "0" * ($len - length) + .; # indent with leading zeros
# Main program
split(" ") | # split each line by space
map(select(length > 0)) | # remove empty entries
select(.[4] == $lineSelector) | # keep only lines with the selected value in last column
# generate output # example for first line
.[0][11:21] + # PPPPPPPPPP
"~~" + # ~~
(.[1] | split("#")[0]) + # NNNNNN
.[0][0:11] + # 123456789XX
"~~~" + # ~~~
(.[1] | split("#")[1] | transformNumber(10)) + # -0000140458
(.[2] | split("#")[1] | transformNumber(10)) # -0000000000
' <<< "$2"
}
convert 0 "$INPUT" # or convert 1 "$INPUT"
Output for 0
GGGGGGGGGG~~NNNNNN777777777XX~~~-0000011592-0000011592
HHHHHHHHHH~~NNNNNN888888888YY~~~-0000000324-0000000324
Output for 1
PPPPPPPPPP~~NNNNNN123456789XX~~~-0000140458-0000000000
QQQQQQQQQQ~~NNNNNN987654321YY~~~+0000009773+0000009773

Print random line into different text file

What I want to do is print a random line from text file A into text file B WITHOUT it choosing the same line twice. So if text file B has a line with the number 25 in it, it will not choose that line from text file A
I have figured out how to print a random line from text file A to text file B, however, I am not sure how to make sure it does not choose the same line twice.
echo "$(printf $(cat A.txt | shuf -n 1))" > /home/B.txt
grep -Fxv -f B A | shuf -n 1 >> B
First part (grep) prints difference of A and B to stdout, i.e. lines present in A but absent in B:
-F — Interpret PATTERNS as fixed strings, not regular expressions.
-x — Select only those matches that exactly match the whole line.
-v — Invert the sense of matching.
-f FILE — Obtain patterns from FILE.
Second part (shuf -n 1) prints random line from stdin. Output is appended to B.
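As a quick check that no line is ever chosen twice, the pipeline can be wrapped in a function and called repeatedly. This is a sketch with made-up file names (A.txt, B.txt) and a hypothetical helper pick_unused; it assumes GNU grep and shuf:

```shell
# Append one random line of "$1" that is not yet in "$2" to "$2".
pick_unused() {
    grep -Fxv -f "$2" "$1" | shuf -n 1 >> "$2"
}

printf '%s\n' 10 20 30 > A.txt
: > B.txt                      # start with an empty B

# Four calls on a three-line file: the last call finds nothing left to add.
for i in 1 2 3 4; do
    pick_unused A.txt B.txt
done
sort B.txt                     # the same three lines as A.txt, each exactly once
```

Once every line of A.txt is present in B.txt, grep produces no output and nothing more is appended.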
That's not really "random", then. Never mind.
Please try the following awk solution - I think it does what you're trying to achieve.
$ cat A
11758
1368
26149
2666
27666
11155
31832
11274
21743
25
$ cat B
18518
8933
941
32286
1234
25
1608
5284
23040
19028
$ cat pseudo
BEGIN{
"bash -c 'echo ${RANDOM}'"|getline seed # Generate a random seed
srand(seed) # use random seed, otherwise each repeated run will generate the same random sequence
count=0 # set a counter
}
NR==FNR{ # while on the first file, remember every number; note this will weed out duplicates!
b[$1]=1
}
!($1 in b) { # for numbers we haven't seen yet (so on the second file, ignoring ones present in file B)
a[count]=$1 # remember new numbers in an associative array with an integer index
count++
}
END{
r=(int(rand() * count)) # generate a random number in the range of our secondary array's index values
print a[r] >> "B" # print that randomly chosen element to the last line of file B
}
$ awk -f pseudo B A
$ cat B
18518
8933
941
32286
1234
25
1608
5284
23040
19028
27666
$
$ awk -f pseudo B A
$ cat B
18518
8933
941
32286
1234
25
1608
5284
23040
19028
27666
31832

Convert carriage return (\r) to actual overwrite

Questions
Is there a way to convert the carriage returns to actual overwrite in a string so that 000000000000\r1010 is transformed to 101000000000?
Context
1. Initial objective:
Given a number x (between 0 and 255) in base 10, I want to convert it to base 2, add trailing zeros to get a 12-digit binary representation, generate 12 different numbers (each made of the first n digits of that representation, with n between 1 and 12), and print the base 10 representation of these 12 numbers.
2. Example:
With x = 10
Base 2 is 1010
With trailing zeros 101000000000
Extract the 12 "leading" numbers: 1, 10, 101, 1010, 10100, 101000, ...
Convert to base 10: 1, 2, 5, 10, 20, 40, ...
3. What I have done (it does not work):
x=10
x_base2="$(echo "obase=2;ibase=10;${x}" | bc)"
x_base2_padded="$(printf '%012d\r%s' 0 "${x_base2}")"
for i in {1..12}
do
t=$(echo ${x_base2_padded:0:${i}})
echo "obase=10;ibase=2;${t}" | bc
done
4. Why it does not work
Because the variable x_base2_padded contains the whole sequence 000000000000\r1010. This can be confirmed using hexdump for instance. In the for loop, when I extract the first 12 characters, I only get zeros.
5. Alternatives
I know I can work around this by literally appending zeros to the variable as follows:
x_base2=1010
x_base2_padded="$(printf '%s%0.*d' "${x_base2}" $((12-${#x_base2})) 0)"
Or by padding with zeros using printf and rev
x_base2=1010
x_base2_padded="$(printf '%012s' "$(printf "${x_base2}" | rev)" | rev)"
Although these alternatives solve my problem for now and let me continue my work, they do not really answer my question.
Related issue
The same problem may be observed in different contexts. For instance if one tries to concatenate multiple strings containing carriage returns. The result may be hard to predict.
str=$'bar\rfoo'
echo "${str}"
echo "${str}${str}"
echo "${str}${str}${str}"
echo "${str}${str}${str}${str}"
echo "${str}${str}${str}${str}${str}"
The first echo will display foo. Although you might expect the other echo commands to display foofoofoo..., they all display foobar.
The following function overwrite transforms its argument such that after each carriage return \r the beginning of the string is actually overwritten:
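overwrite() {
    local segment result=
    # Each argument is emitted followed by \r, so read sees every segment.
    while IFS= read -rd $'\r' segment; do
        # The new segment replaces the start of the result; the rest is kept.
        result="$segment${result:${#segment}}"
    done < <(printf '%s\r' "$@")
    printf %s "$result"
}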
Example
$ overwrite $'abcdef\r0123\rxy'
xy23ef
Note that the printed string is actually xy23ef, unlike echo $'abcdef\r0123\rxy' which only seems to print the same string, but still prints \r which is then interpreted by your terminal such that the result looks the same. You can confirm this with hexdump:
$ echo $'abcdef\r0123\rxy' | hexdump -c
0000000 a b c d e f \r 0 1 2 3 \r x y \n
000000f
$ overwrite $'abcdef\r0123\rxy' | hexdump -c
0000000 x y 2 3 e f
0000006
The function overwrite also supports overwriting by arguments instead of \r-delimited segments:
$ overwrite abcdef 0123 xy
xy23ef
To convert a variable in place, use command substitution: myvar=$(overwrite "$myvar")
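Applied to the string from the question, this gives the desired result. The sketch below repeats the function definition so that it runs on its own; Bash is assumed, because of $'...' and process substitution:

```shell
#!/usr/bin/env bash
overwrite() {
    local segment result=
    while IFS= read -rd $'\r' segment; do
        result="$segment${result:${#segment}}"
    done < <(printf '%s\r' "$@")
    printf %s "$result"
}

x_base2=1010
# Twelve zeros, a carriage return, then the binary digits.
x_base2_padded=$(printf '%012d\r%s' 0 "$x_base2")
# Resolve the carriage return so the variable holds the overwritten text.
x_base2_padded=$(overwrite "$x_base2_padded")
echo "$x_base2_padded"    # 101000000000
```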
With awk, you'd set the field delimiter to \r and iterate through fields printing only the visible portions of them.
awk -F'\r' '{
    offset = 1
    for (i = NF; i > 0; i--) {
        if (offset <= length($i)) {
            printf "%s", substr($i, offset)
            offset = length($i) + 1
        }
    }
    print ""
}'
This is too long to put inline in a command substitution, so you had better wrap it in a function and pipe the lines to be resolved to it.
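Such a wrapper could look like the following sketch; the function name resolve_cr is chosen here for illustration only:

```shell
# Resolve carriage returns on every line read from standard input.
resolve_cr() {
    awk -F'\r' '{
        offset = 1
        # Walk the segments from last to first; later segments win.
        for (i = NF; i > 0; i--) {
            if (offset <= length($i)) {
                printf "%s", substr($i, offset)
                offset = length($i) + 1
            }
        }
        print ""
    }'
}

printf '000000000000\r1010\n' | resolve_cr    # 101000000000
```

Because the function reads standard input, it can also be used inside a command substitution, for example var=$(printf '%s\n' "$var" | resolve_cr).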
To answer the specific question, how to convert 000000000000\r1010 to 101000000000, refer to Socowi's answer.
However, I wouldn't introduce the carriage return in the first place and solve the problem like this:
#!/usr/bin/env bash
x=$1
# Start with 12 zeroes
var='000000000000'
# Convert input to binary
binary=$(bc <<< "obase = 2; $x")
# Rightpad with zeroes: ${#binary} is the number of characters in $binary,
# and ${var:x} removes the first x characters from $var
var=$binary${var:${#binary}}
# Print 12 substrings, convert to decimal: ${var:0:i} extracts the first
# i characters from $var, and $((x#$var)) interprets $var in base x
for ((i = 1; i <= ${#var}; ++i)); do
echo "$((2#${var:0:i}))"
done
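If bc is not available, the binary conversion can also be done with shell arithmetic. The following is a sketch under that assumption; the function names to_binary and prefixes are illustrative and not part of the answer above:

```shell
#!/usr/bin/env bash
# Print the binary representation of a non-negative integer.
to_binary() {
    local n=$1 bits=
    while ((n > 0)); do
        bits=$((n % 2))$bits
        n=$((n / 2))
    done
    printf '%s' "${bits:-0}"
}

# Print the decimal value of each of the 12 leading substrings.
prefixes() {
    local binary var='000000000000' i
    binary=$(to_binary "$1")
    var=$binary${var:${#binary}}          # right-pad with zeros to 12 digits
    for ((i = 1; i <= ${#var}; ++i)); do
        echo "$((2#${var:0:i}))"
    done
}

prefixes 10 | paste -sd' ' -    # 1 2 5 10 20 40 80 160 320 640 1280 2560
```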

Shell Script : How to extract fields in variable positions by name from a string

I have lines such as the following:
Mar 21 09:53:41 srv-1 kernel: [846595.861054] m5tomm7: IN=eth0 OUT=eth0 MAC=00:00:00:00:00:00:00:00:00:00:00:00:00:00 SRC=192.168.3.202 DST=192.168.2.99 LEN=52 TOS=0x00 PREC=0x00 TTL=126 ID=8076 DF PROTO=TCP SPT=62956 DPT=5358 WINDOW=8192 RES=0x00 SYN URGP=0
and I want to extract the SRC, DST, PROTO, and DPT fields.
I cannot rely on using field indices, because they vary.
# perl -ne will loop over the input data and run the following program
cat logfile | perl -ne '
my $h = {}; # Declare a hash
# match all KEY=VALUE pairs in the line
while ( m|(\w+)=(\S+)|g ) {
$h->{$1} = $2; # Store ($1 = KEY, $2 = VALUE)
};
print join(",",$h->{SRC},$h->{DST},$h->{PROTO},$h->{DPT}) . "\n";
'
output
192.168.3.202,192.168.2.99,TCP,5358
Combining grep with its -P option with paste allows for a simple solution (requires the GNU implementations):
$ grep -Po '\b(SRC|DST|PROTO|DPT)=\K[^ ]+' file | paste -s -d'   \n'
192.168.3.202 192.168.2.99 TCP 5358
-P enables PCREs (Perl-compatible regular expressions).
-o outputs only the matching part(s) of the line, each match on its own output line.
\K (a feature enabled by -P) drops everything matched so far; omit this, if you want the field names and = included in the output too (e.g., SRC=192.168.3.202).
The paste command then joins every 4 lines with spaces to form a single line, by applying the separator (delimiter) string, '   \n', cyclically. Note that the string is composed of exactly 4 characters (3 spaces and a newline), which matches the number of fields to extract per line.
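Where neither Perl nor GNU grep is available, the same idea can be sketched in plain awk: split every whitespace-separated token at its first = and look the wanted keys up by name. The function name extract_fields is made up for this example:

```shell
# Print SRC, DST, PROTO and DPT for every line of the given file(s) or stdin.
extract_fields() {
    awk '{
        split("", v)                      # clear the map from the previous line
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")           # position of the first "="
            if (eq > 0)
                v[substr($i, 1, eq - 1)] = substr($i, eq + 1)
        }
        print v["SRC"], v["DST"], v["PROTO"], v["DPT"]
    }' "$@"
}

line='Mar 21 09:53:41 srv-1 kernel: [846595.861054] m5tomm7: IN=eth0 OUT=eth0 MAC=00:00:00:00:00:00:00:00:00:00:00:00:00:00 SRC=192.168.3.202 DST=192.168.2.99 LEN=52 TOS=0x00 PREC=0x00 TTL=126 ID=8076 DF PROTO=TCP SPT=62956 DPT=5358 WINDOW=8192 RES=0x00 SYN URGP=0'
printf '%s\n' "$line" | extract_fields    # 192.168.3.202 192.168.2.99 TCP 5358
```

Because the values are looked up by key, the order and position of the fields in the log line do not matter.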

Linux: Fix the output content from a file

I really need your help. I have a file that contains field:value pairs on a single line:
File.A
A:13 B:2 D:5 F:92 G:3 ...
I have created a file that contains the letters A to Z.
File.B
A B C D E F G H I J ...
I am trying to use a bash script to read the content and fix the output by inserting each missing field with a value of 0:
A:13 B:2 C:0 D:5 E:0 F:92 G:3 H:0 ...
I have been thinking about this for two days but have not come up with anything. Is there any way I can solve it?
Let's make brace expansion work: {A..Z} expands as all the list of letters:
$ echo {A..Z}
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Then we can loop through all the letters, grepping for each. If it matches, we print the line; otherwise, we print letter:0.
for letter in {A..Z}
do
grep "^$letter" file || echo "$letter:0"
done
Test
$ for letter in {A..Z}; do grep "^$letter" file || echo "$letter:0"; done
A:13
B:2
C:0
D:5
E:0
F:92
G:3
H:0
I:0
J:0
K:0
L:0
M:0
N:0
O:0
P:0
Q:0
R:0
S:0
T:0
U:0
V:0
W:0
X:0
Y:0
Z:0
Now that you updated the question and the input file contains everything in the same line, you can use this grep to match:
grep -o "$word:[0-9]*" file
and then replace new lines with spaces:
$ for word in {A..Z}; do grep -o "$word:[0-9]*" file || echo "$word:0"; done | tr '\n' ' '
A:13 B:2 C:0 D:5 E:0 F:92 G:3 H:0 I:0 J:0 K:0 L:0 M:0 N:0 O:0 P:0 Q:0 R:0 S:0 T:0 U:0 V:0 W:0 X:0 Y:0 Z:0
If you fancy a bit of awk, you could try this:
awk -F: -vRS=" " '
{ c[$1] = $2 }
END{
for(i=65;i<91;++i){
a=sprintf("%c", i)
printf("%c:%d ",i,c[a])
}
}' A
where A is your file. The first block builds an array of all the values that have been set. Once all of the file has been read, the loop goes through the ascii values of A (65) to Z (90) and prints out the values that have been set in the array. The ones that are missing are printed as 0.
Output:
A:13 B:2 C:0 D:5 E:0 F:92 G:3 H:0 I:0 J:0 K:0 L:0 M:0 N:0 O:0 P:0 Q:0 R:0 S:0 T:0 U:0 V:0 W:0 X:0 Y:0 Z:0
Since everyone clearly can't get enough of my answer, here's another way you could do it, inspired by the {A..Z} range used in @fedorqui's answer:
awk -F: -vRS=" " '
NR==FNR { a[i++] = $1; next }
{ b[$1] = $2 }
END{for(i=0;i<length(a);++i)printf("%c:%d ",a[i],b[a[i]])}' - <<<$(echo {A..Z}) A
The first block reads in all the letters of the alphabet, thus removing the need to know their character codes. The second block builds an array from your file A. Once the file has been read, all the values are printed out, resulting in the same output as above.
Pure Bash, no external processes. Print the match if the letter is found in the line or the letter followed by 0 otherwise.
read content < "$infile"
for letter in {A..Z}; do
if [[ $content =~ ${letter}:[[:digit:]]+ ]] ; then
echo "${BASH_REMATCH[0]}"
else
echo "${letter}:0"
fi
done
or shorter
for x in {A..Z}; do
[[ $content =~ ${x}:[0-9]+ ]] && echo "${BASH_REMATCH[0]}" || echo "${x}:0"
done
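To get everything on a single line, as in the question, the matches can be collected in an array and joined at the end. This is a sketch in pure Bash; the function name fill_missing is hypothetical:

```shell
#!/usr/bin/env bash
# Print every letter A..Z with its value from "$1", or 0 if it is missing.
fill_missing() {
    local content=$1 letter
    local -a out=()
    for letter in {A..Z}; do
        if [[ $content =~ ${letter}:[0-9]+ ]]; then
            out+=("${BASH_REMATCH[0]}")
        else
            out+=("${letter}:0")
        fi
    done
    echo "${out[*]}"                 # join the entries with single spaces
}

fill_missing 'A:13 B:2 D:5 F:92 G:3'
```

To process a file, pass its first line as the argument, for example after read -r content < "$infile".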
