bash - hash binary contents of variable without creating a file - linux

I am trying to obtain the hash of the contents stored in a variable created from a curl statement, without writing the curl output to a file.
Basically I am trying to avoid:
curl website.com/file.exe >> tempfile.bin
md5sum tempfile.bin
The above provides the correct md5 hash. My approach (below) seemed to work when testing plain text files; however, when I attempted to download a binary and save it to a variable, the hash was different from the one I got when saving it to a file.
My attempt:
binary=$(curl website.com/file.exe)
echo $binary | md5sum
I think I may be missing a flag, or perhaps echo may not be the best way to do it. The important part of this challenge is not writing a file to disk, yet achieving the same md5 hash as if it were written to disk.

To also skip the step of using a temp variable, you can use process substitution:
md5sum <(curl website.com/file.exe)
or pipe to md5sum directly:
curl website.com/file.exe | md5sum
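If you want just the hex digest in a variable, note that the filename field printed by md5sum differs between the two forms (a /dev/fd path for the process substitution, - for the pipe); a small sketch that strips it, assuming curl's -s flag to hide the progress meter:
hash=$(curl -s website.com/file.exe | md5sum | cut -d ' ' -f1)
echo "$hash"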

The bash shell doesn't handle raw binary data well, as you've experienced: a variable cannot hold NUL bytes, command substitution strips trailing newlines, and an unquoted echo $binary word-splits whatever is left. To accomplish your goal, you will need to encode the file contents into a text format when you read them into the bash variable, and decode them when you write them out.
For instance, if you have the base64 tool, you can use it to re-implement your example like this:
encoded=$(curl website.com/file.exe | base64)
echo "$encoded" | base64 --decode | md5sum
If you later want to save the data to a file named $output, you could do it like this:
echo "$encoded" | base64 --decode -o "$output"

Related

read the first line of a text file with JQ

Trying to see how I can read the first line of a text file using jq
I have a text file with a bunch of ids (newfile.txt )
5584157003
5584158003
5584159003
5584160003
id like to be able to just read the first line with jq.
I tried doing this
cat newfile.txt | jq '.[0]'
But getting an error of
jq: error (at <stdin>:482): Cannot index number with number
I'd like to be able to read line by line so that I can eventually run a loop with that ID and do stuff with it. Any ideas?
Use the -R argument (aka --raw-input) to tell jq that it's receiving input as strings rather than JSON, and use input to read only a single item at a time. Thus:
jq -Rn input <yourfile
...will output:
"5584157003"
If you want to convert it to a number, that's what tonumber is for:
jq -Rn 'input | tonumber' <yourfile
...which will output:
5584157003
Is there a way to retrieve a specific line by number? For example, line 3?
If no transformation need be done, then using sed would probably be the simplest, efficient approach; if a simple transformation is required, then besides sed, awk might be worth considering, but jq might also be worth considering under certain circumstances.
In particular, if efficiency is a consideration, then it would make sense to use jq's nth filter, along the lines of:
jq --argjson n 3 -nR 'nth($n - 1; inputs)' newfile.txt
This approach will avoid reading lines beyond the specified one.
(nth counts from 0.)
You might also want to use jq's -r option.
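For instance, combining -r with the nth approach above prints the bare third line without JSON string quotes (a sketch using the same newfile.txt):
jq -nRr --argjson n 3 'nth($n - 1; inputs)' newfile.txt
which prints:
5584159003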
What I am ultimately trying to do is fetch each line from this file one by one and make an API call using bash.
For a straightforward task like that, you could simply use bash's read, along the lines of
while IFS= read -r line ; do ... done < newfile.txt
If any kind of transformation of the input lines needs to be done, however, jq might be appropriate, e.g. if the lines must be URL-encoded. This could be done using inputs in conjunction with jq's -n and -R command-line options, along the lines of:
while IFS= read -r line ; do
...
done < <(jq -Rrn 'inputs|@uri' newfile.txt)
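Putting it together, a minimal sketch of such a loop (the API endpoint here is hypothetical):
while IFS= read -r id; do
curl -s "https://api.example.com/items/$id"   # hypothetical endpoint; $id comes through already URL-encoded
done < <(jq -Rrn 'inputs|@uri' newfile.txt)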

How to use functions in bash? [duplicate]

This question already has an answer here:
Compute base64 encoded hash from a given hash?
(1 answer)
Closed 5 years ago.
I have a file which has hashes of the files against their filename.
For example,
fb7e0a4408e46fd5573ffb9e73aec021a9dcf426235c0ccfc37d2f5e09a68a23 /path/to/some/file
237e0a4408e46fe3573f239e73aec021a9dcf426235c023fc37d2f5e09a68a12 /path/to/another/file
... and so on...
I need the hash converted to base64 encoded format.
So I used a combination of a bash function and awk.
Here is what I wrote,
#!/bin/sh
base64Encode() {
$1 | openssl base64 -A
}
awk ' { t = base64Encode $1; print t } ' file.txt
But it does not seem to work. I'm using hashdeep to generate the hash-list file and hashdeep does not support base64 encoded output. That is why I'm using openssl.
Any help or tips regarding this would be great!
Edit:
The given answers work but I'm having some other issue it seems.
Usually cat filename | openssl dgst -sha256 | openssl base64 -A gives a base64 encoded output for the file filename, which is absolutely correct,
and the output from hashdeep matches the output from cat filename | openssl dgst -sha256.
So I thought of piping the output from the step above to openssl base64 -A to get base64 output. But I still get values different from the actual result.
Although this might be suited for a separate question perhaps, but still I would appreciate any support on this.
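As an aside on that edit: piping the hex output of openssl dgst -sha256 into openssl base64 encodes the hex text (plus openssl's label and newline) rather than the raw digest bytes, which would explain the mismatch. A minimal sketch of producing a base64 digest directly, assuming openssl's -binary flag is available:
openssl dgst -sha256 -binary < filename | openssl base64 -A   # base64 of the raw digest, not of its hex representation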
Awk only:
$ awk '{ c="echo " $1 "|openssl base64 -A"
c | getline r
print r }' file
ZmI3ZTBhNDQwOGU0NmZkNTU3M2ZmYjllNzNhZWMwMjFhOWRjZjQyNjIzNWMwY2NmYzM3ZDJmNWUwOWE2OGEyMwo=
MjM3ZTBhNDQwOGU0NmZlMzU3M2YyMzllNzNhZWMwMjFhOWRjZjQyNjIzNWMwMjNmYzM3ZDJmNWUwOWE2OGExMgo=
Because you specifically asked how to use functions, I divided the problem into several small functions. That is good practice in all (bigger) bash programs.
The basic rule is: functions behave like any other commands:
you can redirect their input/output
you can call them with arguments
and so on.
The best functions are like common unix executables, i.e. they read from stdin and print to stdout. This allows you to use them in pipelines too.
So, now the rewrite:
# function for create base64 - reads from stdin, writes to stdout
base64Encode() {
openssl base64 -A
}
# function for dealing with your file
# e.g. reads lines "hash path" and prints "base64 path"
convert_hashes() {
while read -r hash path; do
b64=$(base64Encode <<< "$hash")
echo "$b64 $path"
done
}
#the "main" program
convert_hashes < your_file.txt
output
ZmI3ZTBhNDQwOGU0NmZkNTU3M2ZmYjllNzNhZWMwMjFhOWRjZjQyNjIzNWMwY2NmYzM3ZDJmNWUwOWE2OGEyMwo= /path/to/some/file
MjM3ZTBhNDQwOGU0NmZlMzU3M2YyMzllNzNhZWMwMjFhOWRjZjQyNjIzNWMwMjNmYzM3ZDJmNWUwOWE2OGExMgo= /path/to/another/file
Yes, I know, you want only the base64 without the attached path. Of course, you can modify the above convert_hashes and remove the path from the output, e.g. instead of echo "$b64 $path" use echo "$b64" and the output will be just the b64 string - but then you're losing information in the function - which string belongs to which path - in my opinion, not the best practice.
Therefore, you can leave the function as-is and use another tool to extract the first column, and only when needed, e.g. in the "main" program. This way the function is designed in a more universal way for later use.
convert_hashes < your_file.txt | cut -d ' ' -f1
output
ZmI3ZTBhNDQwOGU0NmZkNTU3M2ZmYjllNzNhZWMwMjFhOWRjZjQyNjIzNWMwY2NmYzM3ZDJmNWUwOWE2OGEyMwo=
MjM3ZTBhNDQwOGU0NmZlMzU3M2YyMzllNzNhZWMwMjFhOWRjZjQyNjIzNWMwMjNmYzM3ZDJmNWUwOWE2OGExMgo=
Now imagine that you are extending the script and the input no longer comes from a file but from another program. Let's simulate this with the following get_data function (of course, in a real app it would do something other than just cat):
get_data() {
cat <<EOF
fb7e0a4408e46fd5573ffb9e73aec021a9dcf426235c0ccfc37d2f5e09a68a23 /path/to/some/file
237e0a4408e46fe3573f239e73aec021a9dcf426235c023fc37d2f5e09a68a12 /path/to/another/file
EOF
}
Now you can use all of the above as:
get_data | convert_hashes
The output will be the same as above.
Of course, you can do something with the output too, say:
get_data | convert_hashes | grep another/file | cut -d ' ' -f1
MjM3ZTBhNDQwOGU0NmZlMzU3M2YyMzllNzNhZWMwMjFhOWRjZjQyNjIzNWMwMjNmYzM3ZDJmNWUwOWE2OGExMgo=
Of course, with such a "modular" structure you can easily replace any part without touching the others, say replacing openssl with the base64 command.
base64Encode() {
base64
}
And everything will continue to work without any other changes. Of course, in a real app it is (probably) pointless to have a function that calls only one program - but I did it this way because you asked specifically about functions.
Otherwise, the above could be done simply as:
while read -r hash path; do
openssl base64 -A <<<"$hash"
echo
#or echo $(openssl base64 -A <<<"$hash")
#or printf "%s\n" $(openssl base64 -A <<<"$hash")
done < your_file.txt
or even
cut -d ' ' -f1 your_file.txt | xargs -I% -n1 bash -c 'echo $(openssl base64 -A <<<"%")'
You need the echo or printf because openssl does not print a trailing newline by default. Output:
ZmI3ZTBhNDQwOGU0NmZkNTU3M2ZmYjllNzNhZWMwMjFhOWRjZjQyNjIzNWMwY2NmYzM3ZDJmNWUwOWE2OGEyMwo=
MjM3ZTBhNDQwOGU0NmZlMzU3M2YyMzllNzNhZWMwMjFhOWRjZjQyNjIzNWMwMjNmYzM3ZDJmNWUwOWE2OGExMgo=
PS: to be honest, I do not understand why you need to base64-encode an already hex-encoded hash - but YMMV. :)

How can I run a command on all files in a directory and mv to a different directory the ones whose output contains 'Cannot read TIFF header'?

I'd like to remove all the bad tiffs from a very large directory. The command-line tool "tiffinfo" makes it easy to identify them:
tiffinfo -D *
This will produce output like this:
00074000/74986.TIF: Cannot read TIFF header.
if the tiff file is corrupt. If this happens I'd like to take the file and move it to a different directory: bad_images. I tried using awk on this, but it hasn't worked so far...
Thanks!
Assuming the "Cannot read TIFF header" error comes on standard error, and assuming tiffinfo outputs other data on standard out which you don't want, then:
cd /path/to/tiffs
for file in `tiffinfo -D * 2>&1 >/dev/null | cut -f1 -d:`
do
echo mv $file /path/to/bad_images
done
Remove the echo to actually move the files, once satisfied that the script will work as expected.
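If the filenames may contain spaces, a while-read loop over the same filtered output is a bit safer (a sketch under that assumption; filenames containing colons would still need extra care):
cd /path/to/tiffs
tiffinfo -D * 2>&1 >/dev/null | grep 'Cannot read TIFF header' | cut -f1 -d: |
while IFS= read -r file
do
echo mv "$file" /path/to/bad_images
done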

Retrieve URL components using bash

I have a massive list of URLs in a text file, which I'd like to download using wget. This seems simple enough:
#!/bin/bash
cat list.txt | \
while read CMD; do
wget $CMD; done;
However, wget uses the basename of the file as the download location, which results in renaming schemes, such as file.txt.1, file.txt.2 and so on.
An $URL can look like this:
http://sub.domain.com/some/folder/to/file.txt
Where http://sub.domain.com/some/ is always the same. Now, in JS I would do $URL.split("http://sub.domain.com/some/")[1], but this doesn't quite seem to work in Bash:
IFS="http://sub.domain.com/some/" read -a url <<< "http://sub.domain.com/some/folder/to/file.txt"
echo "${url[1]}"; // always empty.
Use the shell's parameter expansion operator to remove the prefix:
base=${CMD#http://sub.domain.com/some/}
BTW, you should get out of the habit of using all-uppercase variable names in shell scripts. These are conventionally used for environment variables.
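Combining this with the download loop from the question, a sketch (assuming you want to keep the folder/to/file.txt part as the local path so the names don't collide):
while IFS= read -r url; do
name=${url#http://sub.domain.com/some/}   # strip the fixed prefix
mkdir -p "$(dirname "$name")"             # recreate folder/to/ locally
wget -O "$name" "$url"
done < list.txt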
If the length of the prefix is static you could do the following:
#!/bin/bash
prefix="http://sub.domain.com/some/"
while read -r line
do
suffix=${line:${#prefix}}
wget "$line" -O "$suffix"
done < "list.txt"

how to convert filename.bz2.gz file to filename.gz

I have a bunch of files named filename.bz2.gz which I want to convert to filename.gz.
any help ?
thanks
Given that your filenames are *.bz2.gz, I assume the files were created using the following order of compressions:
echo test | bzip2 | gzip -f > file.bz2.gz
Meaning it is a gzipped bzip2 file (for whatever reason). If my assumption is correct, you can change its compression to gzip-only using the following command:
gunzip < file.bz2.gz | bunzip2 | gzip > file.gz
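To apply that to every *.bz2.gz file in the current directory, a short sketch:
for f in *.bz2.gz; do
gunzip < "$f" | bunzip2 | gzip > "${f%.bz2.gz}.gz"
done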
If you just want to rename then do this.
for i in `ls|awk -F. '{print $1}'`
do
mv "$i".bz2.gz "$i".gz
done
I would refine Ajit's solution in this way:
for i in *.bz2.gz; do
i=${i%.bz2.gz}
mv "$i.bz2.gz" "$i.gz"
done
Using a glob rather than command substitution avoids problems with word-splitting for filenames with whitespace. It also avoids the extra ls process, which is marginally more efficient, particularly on platforms like Cygwin with slow process forking. For the same reason, the awk command can be replaced with the ${parameter%[word]} parameter expansion syntax. (Quoting style of "$i".gz vs "$i.gz" makes no difference and is just personal preference.)
