Decode base64 strings into hex strings in a file and overwrite - linux

I have a list of base64 strings in a file (file.txt) that I need to convert into hex. E.g.,
6IwwfX8Cctn85LW+vItMhw==
wIsNfYESR9Nfueo7mg3f7Q==
A+MxnRyu6kotbKPZglQ0Fg==
Jt5jNIphpmfGoFgtgM7/Sg==
sN+Q0Xcu6JHlkqdhJlM/tw==
Command:
echo -n 6IwwfX8Cctn85LW+vItMhw== | base64 -d | od -t x1 -An
This command works for a single string (though it leaves spaces between the hex bytes), but I need to convert every string in the file, which has more than 500 lines.
Basically, I want the above base64 string format to be decoded to the below example hex string format:
30aa268d130fb78a4f8cb6f300e4c760
Is there a way I can process each line in the file (like a for-each loop) and pipe it to the base64 command to convert? Any help is appreciated.

Try:
for b64 in $(cat file.txt); do echo "$b64" | base64 -d | od -t x1 -An | tr -d ' '; done
The tr -d ' ' at the end deletes all spaces.

while read -r input; do echo -n "$input" | base64 -d | od -t x1 -An; done < file.txt

while read -r b64; do
    echo -n "$b64" | base64 -d | od -t x1 -An | sed 's/[\t ]*//g'
done < file.txt

You can use awk for this and run a command on each line's contents using the getline syntax:
awk '{ cmd = "printf '%s' "$1 "| base64 -d | od -t x1 -An" }
{ while ( ( cmd | getline result ) > 0 ) { gsub(/[[:space:]]+/,"",result); $0 = result }; close(cmd) }1' file.txt
Use a temporary file to overwrite the file contents, then move it back over the original file:
tmpfile=$(mktemp)
awk '{ cmd = "printf '%s' "$1 "| base64 -d | od -t x1 -An" }
{ while ( ( cmd | getline result ) > 0 ) { gsub(/[[:space:]]+/,"",result); $0 = result }; close(cmd) }1' file.txt > "$tmpfile" && mv "$tmpfile" file.txt
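If xxd is available, base64 -d | xxd -p emits continuous hex directly, so the od and whitespace-stripping steps can be dropped. A minimal sketch of the whole overwrite, assuming each line of file.txt is one base64 string:
tmpfile=$(mktemp)
while IFS= read -r b64; do
    # xxd -p prints a plain hex dump with no spaces or offsets;
    # -c 256 keeps each decoded value on a single output line
    printf '%s\n' "$b64" | base64 -d | xxd -p -c 256
done < file.txt > "$tmpfile" && mv "$tmpfile" file.txt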

Related

Difficulty to create .txt file from loop in bash

I've this data :
cat >data1.txt <<'EOF'
2020-01-27-06-00;/dev/hd1;100;/
2020-01-27-12-00;/dev/hd1;100;/
2020-01-27-18-00;/dev/hd1;100;/
2020-01-27-06-00;/dev/hd2;200;/usr
2020-01-27-12-00;/dev/hd2;200;/usr
2020-01-27-18-00;/dev/hd2;200;/usr
EOF
cat >data2.txt <<'EOF'
2020-02-27-06-00;/dev/hd1;120;/
2020-02-27-12-00;/dev/hd1;120;/
2020-02-27-18-00;/dev/hd1;120;/
2020-02-27-06-00;/dev/hd2;230;/usr
2020-02-27-12-00;/dev/hd2;230;/usr
2020-02-27-18-00;/dev/hd2;230;/usr
EOF
cat >data3.txt <<'EOF'
2020-03-27-06-00;/dev/hd1;130;/
2020-03-27-12-00;/dev/hd1;130;/
2020-03-27-18-00;/dev/hd1;130;/
2020-03-27-06-00;/dev/hd2;240;/usr
2020-03-27-12-00;/dev/hd2;240;/usr
2020-03-27-18-00;/dev/hd2;240;/usr
EOF
I would like to create a .txt file for each filesystem (so hd1.txt, hd2.txt, hd3.txt and hd4.txt) and put in each .txt file the sum of the values for that FS from each dataX.txt. I have some difficulty explaining in English exactly what I want, so here is an example of the desired result.
Expected content for the output file hd1.txt:
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/
Expected content for the file hd2.txt:
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr
The implementation I've currently tried:
for i in $(cat *.txt | awk -F';' '{print $2}' | cut -d '/' -f3| uniq)
do
cat *.txt | grep -w $i | awk -F';' -v date="$(cat *.txt | awk -F';' '{print $1}' | cut -d'-' -f-2 | uniq )" '{sum+=$3} END {print date";"$2";"sum}' >> $i
done
But it doesn't work...
Can you show me how to do that?
Because the format seems so regular, you can delimit the input with multiple separators and parse it easily in awk:
awk -v FS='[-;/]' '
prev != $9 {
    if (length(output)) {
        print output >> fileoutput
    }
    prev = $9
    sum = 0
}
{
    sum += $9
    output = sprintf("%s-%s;/%s/%s;%d;/%s", $1, $2, $7, $8, sum, $11)
    fileoutput = $8 ".txt"
}
END {
    print output >> fileoutput
}
' *.txt
Tested on repl, this generates:
+ cat hd1.txt
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/
+ cat hd2.txt
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr
Alternatively, you could use -v FS=';' and split() to break the first and second columns apart, extracting the year, the month, and the hdX name.
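For example, a minimal sketch of that split() variant (the array names d and p are illustrative):
awk -F';' '{
    split($1, d, "-")       # d[1]=year, d[2]=month
    n = split($2, p, "/")   # p[n] is the hdX component
    print d[1] "-" d[2], p[n]
}' data1.txt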
If you seek a bash solution, I suggest you invert the loops: first iterate over files, then over the identifiers in the second column.
for file in *.txt; do
    prev=
    output=
    while IFS=';' read -r date dev num path; do
        hd=$(basename "$dev")
        if [[ "$hd" != "${prev:-}" ]]; then
            if ((${#output})); then
                printf "%s\n" "$output" >> "$fileoutput"
            fi
            sum=0
            prev="$hd"
        fi
        sum=$((sum + num))
        output=$(
            printf "%s;%s;%d;%s" \
                "$(cut -d'-' -f1-2 <<<"$date")" \
                "$dev" "$sum" "$path"
        )
        fileoutput="${hd}.txt"
    done < "$file"
    printf "%s\n" "$output" >> "$fileoutput"
done
You could also translate the awk almost 1:1 into bash by using IFS='-;/' in the while read loop, as sketched below.
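For instance, a minimal sketch of that read call (variable names are illustrative; the underscores absorb the fields that aren't needed):
while IFS='-;/' read -r year month _ _ _ _ dev hd val _ path; do
    printf '%s-%s;/%s/%s;%s;/%s\n' "$year" "$month" "$dev" "$hd" "$val" "$path"
done < data1.txt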

Fastest way to compare hundreds of thousands of files, and create output results file in bash

I have the following:
- A values file, values.txt
- Directory structure: ./dataset/label/author/files.txt
- Tens of thousands of files.txt's
- A file called targets.txt, which contains the location of every files.txt
Example targets.txt
./dataset/tallperson/Jabba/awesome.txt
./dataset/fatperson/Detox/toxic.txt
I have a file called values.txt, which contains hundreds of thousands of lines of values. These values are things like "aef", "; i", "jfk", etc.: random 3-character strings.
I also have tens of thousands of files, each of which contains hundreds to thousands of lines, each line again a random 3-character string.
values.txt was created from the values of every files.txt, so there is no value in any file.txt that isn't contained in values.txt. values.txt contains NO repeating values.
Example:
./dataset/weirdperson/Crooked/file1.txt
LOL
hel
lo
how
are
you
on
thi
s f
ine
day
./dataset/awesomeperson/Mild/file2.txt
I a
m v
ery
goo
d.
Tha
nks
LOL
values.txt
are
you
on
thi
s f
ine
day
goo
d.
Tha
hel
lo
how
I a
m v
ery
nks
LOL
The above is just example data. Each file will contain hundreds of lines. And values.txt will contain hundreds of thousands of lines.
My goal here is to make one output file in which each line represents one file. Each line will contain N values, where each value corresponds to a line in values.txt, and the values are separated by commas. Each value is simply the count of how many times the file contains the value on that line of values.txt.
The result should look something like this. With line 1 being file1.txt and line 2 being file2.txt.
Result.txt
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,
Now, the last thing: after getting this result I would like to add a label. The label is the Nth parent directory of the file. For this example, let's say the 2nd parent directory, so the label would be "weirdperson" or "awesomeperson". As a result, the new Results.txt file would look like this.
Results.txt
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
I would like a way to accomplish all of this, but I need it to be fast as I am working with a very large scale dataset.
This is my current code, but it's too slow. The bottleneck is line 2.
Script. Each file located at "./dataset/label/author/file.java"
1 while IFS= read file_name; do
2 cat values.txt | xargs -d '\n' -I {} grep -Fc -- "{}" "$file_name" | xargs printf "%d," >> Results.txt;
3 label=$(echo "$file_name" | cut -d '/' -f 3);
4 printf "$label\n" >> Results.txt;
5 done < targets.txt
------------
To REPLICATE this problem. Do the following:
mkdir -p dataset/{label1,label2}
touch file1.txt; chmod 777 file1.txt
touch file2.txt; chmod 777 file2.txt
echo "Enter anything here" > file1.txt
echo "Enter something here too" > file2.txt
mv file1.txt ./dataset/label1
mv file2.txt ./dataset/label2
find ./dataset/ -type f -name "*.txt" | while IFS= read -r file_name; do cat "$file_name" | sed -e "s/.\{3\}/&\n/g" | sort -u > "$(dirname "$file_name")/modified-$(basename "$file_name")"; done
find ./dataset/ -type f -name "modified-*.txt" | xargs -d '\n' -I {} echo {} >> targets.txt
xargs cat < targets.txt | sort -u > values.txt
With the above UNCHANGED, you should get a values.txt similar to the one below. If there are any lines with fewer or more than 3 characters for some reason, please delete them.
any
e
Ent
er
eth
he
her
ing
ng
re
som
thi
too
You should get a targets.txt file like this:
./dataset/label2/modified-file2.txt
./dataset/label1/modified-file1.txt
From here, the goal is to check every file in targets.txt, count how many times it contains each value in values.txt, and output the results with the label to Results.txt.
The following script will work for this example, but I need it to be way faster for large scale operations.
while IFS= read file_name; do
cat values.txt | xargs -d '\n' -I {} grep -Fc -- "{}" $file_name | xargs printf "%d," >> Results.txt;
label=$(echo "$file_name" | cut -d '/' -f 3);
printf "$label\n" >> Results.txt;
done < targets.txt
Here's another example
Example 2:
./dataset/weirdperson/Crooked/file1.txt
LOL
LOL
HAHA
./dataset/awesomeperson/Mild/file2.txt
LOL
LOL
LOL
values.txt
LOL
HAHA
Result.txt
2,1,weirdperson
3,0,awesomeperson
Here's a solution in Python, using its ordered dictionary datatype.
import os
from collections import OrderedDict

# read samples from values.txt into an OrderedDict:
# each dict key is a line from the file
# (including the trailing newline, but that doesn't matter),
# each dict value is 0
with open('values.txt', 'r') as f:
    samplecount0 = OrderedDict((sample, 0) for sample in f.readlines())

# get the list of filenames from targets.txt
with open('targets.txt', 'r') as f:
    targets = [t.rstrip('\n') for t in f.readlines()]

# for each target:
#   read its lines of samples,
#   increment the corresponding count in samplecount,
#   print out samplecount in a single line separated by commas;
# each line also gets the 2nd-to-last directory component of the target's pathname
for target in targets:
    with open(target, 'r') as f:
        # copy samplecount0 to samplecount so we don't have to re-read values.txt
        samplecount = samplecount0.copy()
        # for each sample in the target file, increment the samplecount dict entry
        for tsample in f.readlines():
            samplecount[tsample] += 1
    output = ','.join(str(v) for v in samplecount.values())
    output += ',' + os.path.basename(os.path.dirname(os.path.dirname(target)))
    print(output)
Output:
$ python3 doit.py
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
Try this:
<targets.txt xargs -n1 -P4 bash -c "
awk 'NR==FNR{a[\$0];next} {if (\$0 in a) {printf \"1,\"} else {printf \"0,\"}}' \"\$1\" values.txt |
sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
" --
The -P4 option lets you run up to four of the jobs from targets.txt in parallel. The short awk script matches lines against values.txt and prints 1 or 0 followed by a comma. Then sed is used to append the 3rd component of the folder path to the end of the line. The sed command looks strange because I used the unprintable character $'\x01' as the separator for the s command.
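As a standalone illustration of the NR==FNR membership idiom used in that awk script (file names taken from the example data):
# the first file (NR==FNR) is loaded as array keys; for each line of the
# second file, print 1 if it was seen in the first file, else 0
awk 'NR==FNR { a[$0]; next } { print ($0 in a ? 1 : 0) }' \
    ./dataset/weirdperson/Crooked/file1.txt values.txt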
Tested with:
mkdir -p ./dataset/weirdperson/Crooked
cat <<EOF >./dataset/weirdperson/Crooked/file1.txt
LOL
hel
lo
how
are
you
on
thi
s f
ine
day
EOF
mkdir -p ./dataset/awesomeperson/Mild/
cat <<EOF >./dataset/awesomeperson/Mild/file2.txt
I a
m v
ery
goo
d.
Tha
nks
LOL
EOF
cat <<EOF >values.txt
are
you
on
thi
s f
ine
day
goo
d.
Tha
hel
lo
how
I a
m v
ery
nks
LOL
EOF
cat <<EOF >targets.txt
./dataset/weirdperson/Crooked/file1.txt
./dataset/awesomeperson/Mild/file2.txt
EOF
measure_start() {
    declare -g ttic_start
    echo "==> Test $* <=="
    ttic_start=$(date +%s.%N)
}
measure_end() {
    local end
    end=$(date +%s.%N)
    local start
    start="$ttic_start"
    ttic_runtime=$(python -c "print(${end} - ${start})")
    echo "Runtime: $ttic_runtime"
    echo
}
measure_start original
while IFS= read file_name; do
cat values.txt | xargs -d '\n' -I {} grep -Fc -- "{}" $file_name | xargs printf "%d,"
label=$(echo "$file_name" | cut -d '/' -f 3);
printf "$label\n"
done < targets.txt
measure_end
measure_start first try with bash
nl -w1 values.txt | sort -k2.2 > values_sorted.txt
< targets.txt xargs -n1 -P0 bash -c "
sort -t$'\t' \"\$1\" |
join -t$'\t' -12 -21 -eEMPTY -a1 -o1.1,2.1 values_sorted.txt - |
sort -s -n -k1.1 |
sed 's/.*\tEMPTY/0/;t;s/.*/1/' |
tr '\n' ',' |
sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
" --
measure_end
measure_start second try with awk
<targets.txt xargs -n1 -P0 bash -c "
awk 'NR==FNR{a[\$0];next} {if (\$0 in a) {printf \"1,\"} else {printf \"0,\"}}' \"\$1\" values.txt |
sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
" --
measure_end
Outputs:
==> Test original <==
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
Runtime: 0.133769512177
==> Test first try with bash <==
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
Runtime: 0.0322473049164
==> Test second try with awk <==
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
Runtime: 0.0180222988129

Error Shell Script

When I try to run this script, this error appears: extra operand '/home/ubuntu/Desktop/Destino/'. I do not know why; can someone help me, please?
#!/bin/bash
input="/home/ubuntu/Desktop/Output/SAIDA.txt"
dt=`date +"%Y%m%d%H%M%S"`
layout='C'
if [ -e "$input" ] ; then
header=$(head -n 1 $input)
export header
tail -n +2 $input | split -l 99 -d --additional-suffix=.txt \ --filter='{ printf %s\\n "$header"; cat; }' >/home/ubuntu/Desktop/Destino/$FILE - NOMENCLATURA_${dt}_
for arquivo in ´Is/home/ubuntu/Desktop/*.txt´
do
NOME= ´cat $arquivo | cut -d "." -f1´
touch/home/ubuntu/Desktop/Destino/$NOME.cfg
echo $dt > $NOME.cfg
echo $layout > $NOME.cfg
done
else
echo "The input file does not exist."
fi
You have some strange quote characters in your script. To substitute the output of a command, wrap it with $() or backticks, not ´ characters.
for arquivo in ´Is/home/ubuntu/Desktop/*.txt´
I guess Is was meant to be ls, but you left out the space after it. But there's no need to parse the output of ls, just use the wildcard directly.
for arquivo in /home/ubuntu/Desktop/*.txt
On this line:
tail -n +2 $input | split -l 99 -d --additional-suffix=.txt \ --filter='{ printf %s\\n "$header"; cat; }' >/home/ubuntu/Desktop/Destino/$FILE - NOMENCLATURA_${dt}_
you need to put the output filename in quotes because of the spaces.
tail -n +2 $input | split -l 99 -d --additional-suffix=.txt \ --filter='{ printf %s\\n "$header"; cat; }' >"/home/ubuntu/Desktop/Destino/$FILE - NOMENCLATURA_${dt}_"
Also, the FILE variable is not set; you need to assign it earlier.
On this line:
NOME= ´cat $arquivo | cut -d "." -f1´
you're again using the wrong type of quotes to capture the output of the command. Also, you must not have a space between = and the value you want to assign. It should be:
NOME=$(cat $arquivo | cut -d "." -f1)
There's no need to do export header. The variable is only being used in this script, not in any child processes.
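Putting those fixes together, a hedged sketch of the corrected loop; it keeps the original's idea of deriving NOME from the file's contents, appends the second echo so it doesn't overwrite the first, and adds the space that was missing after touch:
for arquivo in /home/ubuntu/Desktop/*.txt; do
    NOME=$(cut -d '.' -f1 "$arquivo")   # assumes, as in the original, that NOME comes from the file contents
    cfg="/home/ubuntu/Desktop/Destino/$NOME.cfg"
    touch "$cfg"                        # note the space after touch; touch/home/... would be command-not-found
    echo "$dt" > "$cfg"
    echo "$layout" >> "$cfg"            # >> appends; two > redirects would clobber each other
done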

Translate Chinese to urlencoding in awk

I have a .txt file where each line contains Chinese text, and I want to convert the Chinese to URL encoding.
How can I do that?
txt.file
http://wiki.com/ 中文
http://wiki.com/ 中国
target.file
http://wiki.com/%E4%B8%AD%E6%96%87
http://wiki.com/%E4%B8%AD%E5%9B%BD
I found a shell script way to approach it like this:
echo '中文' | tr -d '\n' | xxd -plain | sed 's/\(..\)/%\1/g' | tr '[a-z]' '[A-Z]'
So, I wanna embed it in awk like this, but I failed:
awk -F'\t' '{
a=system("echo '"$2"'| tr -d '\n' | xxd -plain | \
sed 's/\(..\)/%\1/g' | tr '[a-z]' '[A-Z]");
print $1a
}' txt.file
I have tried another way: writing an external function and calling it in awk, with the code below, but that failed too.
zh2url()
{
echo $1 | tr -d '\n' | xxd -plain | sed 's/\(..\)/%\1/g' | tr '[a-z]' '[A-Z]'
}
export -f zh2url
awk -F'\t' "{a=system(\"zh2url $2\");print $1a}" txt.file
Please implement it with awk, because I actually have something else to handle in the same awk program.
With GNU awk for co-processes, etc.:
$ cat tst.awk
function xlate(old,    cmd, new) {
    cmd = "xxd -plain"
    printf "%s", old |& cmd
    close(cmd, "to")
    if ( (cmd |& getline rslt) > 0 ) {
        new = toupper(gensub(/../, "%&", "g", rslt))
    }
    close(cmd)
    return new
}
BEGIN { FS="\t" }
{ print $1 xlate($2) }
$ awk -f tst.awk txt.file
http://wiki.com/%E4%B8%AD%E6%96%87
http://wiki.com/%E4%B8%AD%E5%9B%BD
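If GNU awk's coprocesses aren't available, a hedged portable sketch that applies the question's own pipeline line by line from the shell instead (assumes the input really is tab-separated, as in txt.file):
while IFS=$'\t' read -r url zh; do
    # hex-dump the Chinese text, join the xxd output lines, then
    # prefix each byte pair with % and uppercase the result
    hex=$(printf '%s' "$zh" | xxd -plain | tr -d '\n' |
          sed 's/\(..\)/%\1/g' | tr '[:lower:]' '[:upper:]')
    printf '%s%s\n' "$url" "$hex"
done < txt.file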

What's the opposite of od(1)?

Say I have 8b1f 0008 0231 49f6 0300 f1f3 75f4 0c72 f775 0850 7676 720c 560d 75f0 02e5 ce00 0861 1302 0000 0000, how can I easily get a binary file from that without copying+pasting into a hex editor?
Use:
% xxd -r -p in.txt out.bin
See xxd.
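Note that the hex in the question looks like od -x output from a little-endian machine: 8b1f is the gzip magic 1f 8b with the bytes of each 16-bit word swapped. If that's the case, one hedged option is to swap the byte pairs back with dd conv=swab after reversing (in.txt is assumed to hold the question's hex):
xxd -r -p in.txt | dd conv=swab of=out.bin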
This version will work with binary format too:
cat /bin/sh \
| od -A n -v -t x1 \
| tr -d '\r' \
| xxd -r -g 1 -p \
| md5sum && md5sum /bin/sh
The tr -d '\r' is only needed if you're dealing with DOS text files...
Processing byte by byte (-t x1) also prevents endianness differences when parts of the pipe run on different systems.
All the present answers refer to the convenient xxd -r approach, but for situations where xxd is not available or convenient, here is a more portable (and more flexible, but more verbose and less efficient) solution, using only POSIX shell syntax (it also compensates for an odd number of digits in the input):
un_od() {
    printf -- "$(
        tr -d '\t\r\n ' | sed -e 's/^\(.\(.\{2\}\)*\)$/0\1/' -e 's/\(.\{2\}\)/\\x\1/g'
    )"
}
By the way: you don't specify whether your input is big-endian or little-endian, or whether you want big/little-endian output. Usually input such as in your question would be big-endian/network-order (e.g., as created by od -t x1 -An -v), and would be expected to transform to big-endian output. I presume xxd just assumes that default if not told otherwise, and this solution does that too. If byte-swapping is needed, how you do the byte-swapping also depends on the word-size of the system (e.g., 32 bit, 64 bit) and very rarely the byte-size (you can almost always assume 8-bit bytes - octets - though).
The below functions use a more complex version of the binary -> od -> binary trick to portably byteswap binary data, conditional on system endianness, and accounting for system word-size. The algorithm works for anything up to 72-bit word size (because seq -s '' 10 -> 12345678910 doesn't work):
if { sed --version 2>/dev/null || :; } | head -n 1 | grep -q 'GNU sed'; then
    _sed() { sed -r "$@"; }
else
    _sed() { sed -E "$@"; }
fi
sys_bigendian() {
    return $(
        printf 'I' | od -t o2 | head -n 1 | \
        _sed -e 's/^[^ \t]+[ \t]+([^ \t]+)[ \t]*$/\1/' | cut -c 6
    )
}
sys_word_size() { expr $(getconf LONG_BIT) / 8; }
byte_swap() {
    _wordsize=$1
    od -An -v -t o1 | _sed -e 's/^[ \t]+//' | tr -s ' ' '\n' | \
    paste -d '\\' $(for _cnt in $(seq $_wordsize); do printf -- '- '; done) | \
    _sed -e 's/^/\\/' -e '$ s/\\+$//' | \
    while read -r _word; do
        _thissize=$(expr $(printf '%s' "$_word" | wc -c) / 4)
        printf '%s' "$(seq -s '' $_thissize)" | tr -d '\n' | \
        tr "$(seq -s '' $_thissize -1 1)" "$_word"
    done
    unset _wordsize _prefix _word _thissize
}
You can use the above to output file contents in big-endian format regardless of system endianness:
if sys_bigendian; then
    cat /bin/sh
else
    cat /bin/sh | byte_swap $(sys_word_size)
fi
Here is a way to reverse od output:
echo "test" | od -A x -t x1 | sed -e 's|^[0-9a-f]*||' | xxd -r -p
test
