How can I get the md5sum of all files inside a zip without extracting - Linux

Is there any way to get the md5sum of all (or any one) of the files inside a zip without extracting the zip?
I can extract the needed files using unzip <.zip>,
but I need to get the md5sum without extracting the zip.

This may not be exactly what you are looking for, but it will get you closer. You wouldn't be extracting the entire zip, just streaming one file at a time into md5sum to get its checksum. Without reading the contents of the file, md5sum cannot generate a hash.
Let's say you have 3 files with this MD5:
b1946ac92492d2347c6235b4d2611184 a.txt
591785b794601e212b260e25925636fd b.txt
6f5902ac237024bdd0c176cb93063dc4 c.txt
You zip them into a single file using zip final.zip a.txt b.txt c.txt
When you list the files, you see there are 3 files.
unzip -l final.zip
Archive:  final.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        6  2021-08-08 17:20   a.txt
        6  2021-08-08 17:20   b.txt
       12  2021-08-08 17:20   c.txt
---------                     -------
       24                     3 files
To get MD5 of each of the files without extracting the entire zip, you can do this:
unzip -p final.zip a.txt | md5sum
b1946ac92492d2347c6235b4d2611184 -
unzip -p final.zip b.txt | md5sum
591785b794601e212b260e25925636fd -
unzip -p final.zip c.txt | md5sum
6f5902ac237024bdd0c176cb93063dc4 -
Alternative
You can run md5sum *.txt > checksums to hash all of the files and store the result in a checksums file. Add that file to the zip so you know the md5 of each file as it was when it was added to the archive.
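A minimal sketch of that workflow (checksums.md5 and the *.txt names are just placeholders):
md5sum *.txt > checksums.md5          # record a hash for every file
zip final.zip *.txt checksums.md5     # ship the manifest inside the archive
unzip -p final.zip checksums.md5      # later: read the stored hashes without extracting
unzip -p final.zip a.txt | md5sum     # and compare any single file against them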

For all files in a zip you can use this:
File='final.zip' ; unzip -lqq $File | while read L ; do unzip -p $File ${L##*[[:space:]]} | md5sum | sed "s/-/${L##*[[:space:]]}/" ; done
Gives:
b1946ac92492d2347c6235b4d2611184 a.txt
591785b794601e212b260e25925636fd b.txt
6f5902ac237024bdd0c176cb93063dc4 c.txt

Based on @MartinMann's answer, here is a version that works correctly even if the file names contain spaces or special characters:
ZIPFILE="final.zip"; unzip -Z1 "$ZIPFILE" | grep -v '/$' | while IFS= read -r L; do echo "$(unzip -p "$ZIPFILE" "$L" | md5sum | cut -d' ' -f1) $L"; done
Gives:
b1946ac92492d2347c6235b4d2611184 a.txt
591785b794601e212b260e25925636fd b.txt
6f5902ac237024bdd0c176cb93063dc4 path/to/file with spaces.txt

Related

appending to a tar file in a loop

I have a directory that has maybe 6 files.
team1_t444444_jill.csv
team1_t444444_j123.csv
team1_t444444_j444.csv
team1_t999999_jill.csv
team1_t999999_jilx.csv
team1_t111111_jill.csv
team1_t111111_jxx.csv
I want to be able to tar each of the files based on their t number, so t444444 should have its own tar file with all the corresponding csv's, t999999 should then have its own, and so on... a total of three tar files should be created dynamically.
for file in $bad_dir/*.csv; do
    fbname=`basename "$file" | cut -d. -f1`  # takes the path off, only shows xxx_tyyyyy_zzz
    t_name=$(echo "$fbname" | cut -d_ -f2)   # takes the remaining stuff off, only shows tyyyyy
    # now I am stuck on how to create the tar file and send the email
    taredFile = ???  # no idea how to implement
    (cat home/files/hello.txt; uuencode $taredFile $taredFile) | mail -s "Failed Files" $t_name@hotmail.com
done
The simplest edit of your script that should do what you want is likely something like this.
for file in $bad_dir/*.csv; do
    fbname=`basename "$file" | cut -d. -f1`  # takes the path off, only shows xxx_tyyyyy_zzz
    t_name=$(echo "$fbname" | cut -d_ -f2)   # takes the remaining stuff off, only shows tyyyyy
    tarFile=$t_name-combined.tar
    if [ ! -f "$tarFile" ]; then
        tar -cf "$tarFile" *_${t_name}_*.csv
        { cat home/files/hello.txt; uuencode $tarFile $tarFile; } | mail -s "Failed Files" $t_name@hotmail.com
    fi
done
Use a tar file name based on the unique part of the input file names. Then check whether that file already exists before creating it and sending the email (this protects against creating the file and sending the email more than once).
Use the fact that the file names are globbable to get tar to archive them all the first time we see one of them.
You'll also notice that I replaced (commands) with { commands; } in the pipeline. The () force a sub-shell, but so does the pipe itself, so there's no reason (in this case) to force an extra sub-shell manually just for the grouping effect.
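A minimal illustration of the sub-shell point (x here is just a throwaway variable):
( x=1 );  echo "${x:-unset}"    # prints "unset": the assignment ran in a sub-shell
{ x=2; }; echo "${x:-unset}"    # prints "2": the group ran in the current shell
# Each side of a pipeline already runs in its own sub-shell, so { ...; } | mail costs nothing extra.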
This is what you want:
for i in `find . -name '*.csv' | cut -d/ -f2 | cut -d_ -f1,2 | sort | uniq`;
do
    tar -zvcf $i.tgz $i*
    # mail the $i.tgz file
done
Take a look at my run:
$ for i in `find . -name '*.csv' | cut -d/ -f2 | cut -d_ -f1,2 | sort | uniq`; do tar -zvcf $i.tgz $i*; done
team1_t111111_jill.csv
team1_t111111_jxx.csv
team1_t111111.tgz
team1_t444444_j123.csv
team1_t444444_j444.csv
team1_t444444_jill.csv
team1_t444444.tgz
team1_t999999_jill.csv
team1_t999999_jilx.csv
team1_t999999.tgz
ubuntu@ubuntu1504:/tmp/foo$ ls
team1_t111111_jill.csv team1_t111111.tgz team1_t444444_j444.csv team1_t444444.tgz team1_t999999_jilx.csv
team1_t111111_jxx.csv team1_t444444_j123.csv team1_t444444_jill.csv team1_t999999_jill.csv team1_t999999.tgz

Merge sort gzipped files

I have 40 files of 2GB each, stored on an NFS architecture. Each file contains two columns: a numeric id and a text field. Each file is already sorted and gzipped.
How can I merge all of these files so that the resulting output is also sorted?
I know sort -m -k 1 should do the trick for uncompressed files, but I don't know how to do it directly with the compressed ones.
PS: I don't want the simple solution of uncompressing the files onto disk, merging them, and compressing them again, as I don't have sufficient disk space for that.
This is a use case for process substitution. Say you have two files to sort, sorta.gz and sortb.gz. You can give the output of gunzip -c FILE.gz to sort for both of these files using the <(...) shell operator:
sort -m -k1 <(gunzip -c sorta.gz) <(gunzip -c sortb.gz) >sorted
Process substitution substitutes a command with a file name that represents the output of that command, and is typically implemented with either a named pipe or a /dev/fd/... special file.
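You can see the substituted path yourself; bash expands each <(...) to a /dev/fd path (the exact number varies):
echo <(true)        # prints something like /dev/fd/63
cat <(echo hello)   # reads "hello" back through that substituted file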
For 40 files, you will want to create the command with that many process substitutions dynamically, and use eval to execute it:
cmd="sort -m -k1 "
for input in file1.gz file2.gz file3.gz ...; do
cmd="$cmd <(gunzip -c '$input')"
done
eval "$cmd" >sorted # or eval "$cmd" | gzip -c > sorted.gz
#!/bin/bash
FILES=file*.gz # list of your 40 gzip files
# (e.g. file1.gz ... file40.gz)
WORK1="merged.gz" # first temp file and the final file
WORK2="tempfile.gz" # second temp file
> "$WORK1" # create empty final file
> "$WORK2" # create empty temp file
gzip -qc "$WORK2" > "$WORK1" # compress content of empty second
# file to first temp file
for I in $FILES; do
echo current file: "$I"
sort -k 1 -m <(gunzip -c "$I") <(gunzip -c "$WORK1") | gzip -c > "$WORK2"
mv "$WORK2" "$WORK1"
done
Fill $FILES the easiest way with bash globbing (file*.gz) or with an explicit list of the 40 filenames (separated by blanks). Your input files in $FILES stay unchanged.
In the end, the 80 GB of data end up compressed in $WORK1. While this script runs, no uncompressed data are written to disk.
Adding a differently flavoured multi-file merge within a single pipeline - it takes all (pre-sorted) files in $OUT/uniques, sort-merges them, and compresses the output; lz4 is used because of its speed:
find $OUT/uniques -name '*.lz4' |
awk '{print "<( <" $0 " lz4cat )"}' |
tr "\n" " " |
(echo -n sort -m -k3b -k2 " "; cat -; echo) |
bash |
lz4 \
> $OUT/uniques-merged.tsv.lz4
It is true that there are zgrep and other common utilities that can work with compressed files, but in this case you need to sort/merge the uncompressed data and compress the result.
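For instance, zgrep and zcat read the compressed files directly, while the sort/merge itself has to run on the uncompressed stream, as in the answers above (file names here are placeholders):
zgrep -c . part1.gz                                           # count non-empty lines straight from the .gz
zcat part1.gz | sort -c -k1 && echo "part1 is pre-sorted"     # stream through zcat for tools that need plain text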

How to compare the output of two ls in Linux

So here is the task which I can't solve. I have a directory with .h files and a directory with .i files, which have the same names as the .h files. I want, just by typing a command, to list all the .h files that do not exist as .i files. It's not a hard problem, I could do it in some programming language, but I'm just curious what it would look like on the command line :). To be more specific, here is the algorithm:
get file names without extensions from ls *.h
get file names without extensions from ls *.i
compare them
print all names from 1 that do not appear in 2
Good luck!
diff \
<(ls dir.with.h | sed 's/\.h$//') \
<(ls dir.with.i | sed 's/\.i$//') \
| grep '^<' \
| cut -c3-
diff <(ls dir.with.h | sed 's/\.h$//') <(ls dir.with.i | sed 's/\.i$//') executes ls on the two directories, cuts off the extensions, and compares the two lists. Then grep '^<' keeps the names that appear only in the first listing, and cut -c3- cuts off the "< " prefix that diff inserted.
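To see what those last two steps act on, here is a hypothetical raw diff output for the case where beta.h exists but beta.i does not; grep '^<' keeps the "< beta" line and cut -c3- strips the leading "< ":
2d1
< beta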
ls ./dir_h/*.h | sed -r -n 's:.*dir_h/([^.]*).h$:dir_i/\1.i:p' | xargs ls 2>&1 | \
grep "No such file or directory" | awk '{print $4}' | sed -n -r 's:dir_i/([^:]*).*:dir_h/\1:p'
ls -1 dir1/*.hh dir2/*.ii | awk -F"/" '{print $NF}' | awk -F"." '{a[$1]++;b[$0]++}END{for(i in a)if(a[i]==1 && b[i".hh"]) print i}'
Explanation:
ls -1 dir1/*.hh dir2/*.ii
The above lists all the *.hh and *.ii files in both directories.
awk -F"/" '{print $NF}'
The above prints just the file name, excluding the full path of the file.
awk -F"." '{a[$1]++;b[$0]++}END{for(i in a)if(a[i]==1 && b[i".hh"]) print i}'
The above builds two associative arrays: a, keyed by the file name without its extension, and b, keyed by the full file name.
If both the .hh and .ii files exist, the value in a will be 2; if there is only one file, the value will be 1. So we want the entries whose value is 1 and which correspond to a header file (.hh).
That last check uses the associative array b, and is done in the END block.
Assuming bash is your shell:
for file in $( ls dir_with_h/*.h ); do
name=${file%\.h}; # trim trailing ".h" file extension
name=${name#dir_with_h/}; # trim leading folder name
if [ ! -e dir_with_i/${name}.i ]; then
echo ${name};
fi
done
Undoubtedly this can be ported to virtually all other shells. I find it less cryptic than some of the other approaches (although that is surely my own problem), but it is a little wordy. As such, a shell script might help you recall it.

Linux: cat matching files in date order?

I have a few files in a directory with names similar to
_system1.log
_system2.log
_system3.log
other.log
but they are not created in that order.
Is there a simple, non-hardcoded, way to cat the files starting with the underscore in date order?
Quick 'n' dirty:
cat `ls -t _system*.log`
Safer:
ls -1t _system*.log | xargs -d'\n' cat
Use ls:
ls -1t | xargs cat
ls -1 | xargs cat
You can also concatenate the files and store the result in a single file, ordered by modification time (which is what ls -t actually sorts by, newest first), and you can restrict which files are included. I find this very useful. The following command concatenates the files whose names contain the string 'xyz', in that order, and stores them all in outputfile:
cat $(ls -t | grep xyz) > outputfile

wc gzipped files?

I have a directory with both uncompressed and gzipped files and want to run wc -l on this directory. wc will report a line count for the compressed files that is not accurate (since it seems to count newlines in the gzipped version of the file). Is there a way to create a zwc script, similar to zgrep, that will detect the gzipped files and count the uncompressed lines?
Try this zwc script:
#! /bin/bash --
for F in "$@"; do
    echo "$(zcat -f <"$F" | wc -l) $F"
done
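A usage sketch (the log file names are placeholders), assuming the script above is saved as zwc and made executable; since zcat -f passes non-gzipped files through unchanged, you can mix plain and compressed files freely:
chmod +x zwc
./zwc app.log app.log.1.gz app.log.2.gz    # prints an uncompressed line count followed by each file name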
You can use zgrep to count lines as well (or rather the beginnings of lines):
zgrep -c ^ file.txt
I also use "cat file_name | gzip -d | wc -l".
