How to gunzip without deleting? - linux

How do I unzip all the .gz archives in a specified folder without deleting the originals?
I tried: gunzip *.gz /folder

You can simply use the -k option, as specified in the man page:
-k --keep
Keep (don't delete) input files during compression or decompression.
For example, if you use the command
gunzip -k my_file.gz
my_file.gz will not be deleted after decompression.
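Note that -k (added in newer versions of gzip) decompresses next to the original; if the output must land in a different folder, the -c/--stdout redirect keeps the original implicitly. A minimal sketch, assuming my_file.gz exists and /folder is writable:
gunzip -k my_file.gz                    # produces my_file next to my_file.gz
gunzip -c my_file.gz > /folder/my_file  # -c keeps the original and writes into /folder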

This can be done with some bash scripting, using gzip's --stdout option and computing the uncompressed name with bash parameter expansion:
for i in *.gz
do
echo "unzipping $i to /folder/${i%.gz}"
gunzip --stdout "$i" > "/folder/${i%.gz}"
done
Quoting "$i" keeps this safe for filenames containing spaces; ${i%.gz} strips the .gz suffix to form the output name.

find /source/dir/ -type f -name '*.gz' \
-exec bash -c 'f="${0##*/}"; gunzip -c "$0" > "/folder/${f%.gz}"' {} \;
This command finds all *.gz files in the /source/dir/ directory and executes a bash command for each file. The bash command uncompresses the file ($0) to standard output and redirects the result to "/folder/${f%.gz}". The latter is the "/folder/" string concatenated with the bare filename without the .gz extension: ${0##*/} strips the leading directories (the longest match of */ from the front), and ${f%.gz} removes the shortest match of .gz from the back of the string.
The {} sequence is replaced with the path of each gzipped file found, which is passed to the bash command as its first argument ($0).
Replace /source/dir/ with dot (.), if the source directory is the current directory.
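A minor stylistic variant, if you prefer the file to arrive as $1 so that bash keeps a meaningful name in $0 for its error messages, is to pass a placeholder argument first (same behavior, just reindexed):
find /source/dir/ -type f -name '*.gz' \
-exec bash -c 'f="${1##*/}"; gunzip -c "$1" > "/folder/${f%.gz}"' _ {} \;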

Use gzip to decompress to stdout and redirect that to your required file (the -k option doesn't work for me either):
gzip -dc somefile.gz > outputfile.txt

Related

Select and loop over differences in two directories - linux

I have a bash script that loops through files in the raw folder and puts them into the audio folder. This works just fine.
#!/bin/bash
PATH_IN='/nas/data/customers/test2/raw/'
PATH_OUT='/nas/data/customers/test2/audio/'
mkdir -p "$PATH_OUT"
find "$PATH_IN" -type f -name '*.wav' -exec basename {} \; | while IFS= read -r file; do
sox -S "${PATH_IN}${file}" -e signed-integer "${PATH_OUT}${file}"
done
My issue is that, as the folders grow, I do not want to run the script on files that have already been converted, so I would like to loop over only the files that have not been converted yet, i.e. the files in raw but not in audio.
I found the command
diff audio raw
which can do just that, but I cannot find a good way to incorporate it into my bash script. Any help or nudges in the right direction would be highly appreciated.
You could do:
diff <(ls -1a "$PATH_OUT") <(ls -1a "$PATH_IN") | grep '^> ' | sed 's/^> //'
The first part diffs the listings of both folders, the second part filters the output to keep only the additions (lines diff prefixes with >), and the third strips the diff markers to leave just the names.
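One way to fold this into the script is sketched below, assuming flat directories and filenames without newlines (comm needs sorted input, which plain ls already provides):
#!/bin/bash
PATH_IN='/nas/data/customers/test2/raw/'
PATH_OUT='/nas/data/customers/test2/audio/'
mkdir -p "$PATH_OUT"
# comm -13 keeps lines unique to the second listing: files present
# in raw/ but not yet present in audio/.
comm -13 <(ls "$PATH_OUT") <(ls "$PATH_IN") | while IFS= read -r file; do
  case "$file" in
    *.wav) sox -S "${PATH_IN}${file}" -e signed-integer "${PATH_OUT}${file}" ;;
  esac
done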

Decompress .gz files from subfolders (recursively) into root folder

Here is the situation: I have a folder containing a lot of subfolders, some of them containing .gz compressed files (NOT tar, just compressed text files). I want to recursively decompress all these .gz files into the root folder, but I can't figure out the exact way to do it.
I have my folders like that :
/folderX/subfolder1/file.gz
by using
gzip -c -d -r *.gz
I can probably extract all the files at once, but they will remain in their respective subfolders. I want them all in /folderX/
find . -name '*.gz'
gives me the correct list of the files I am looking for, but I have no idea how to combine the 2 commands. Should I combine these commands in a script ? Or is there a functionality of gzip that I have missed allowing to decompress everything in the folder from which you are executing the command ?
Thanks for the help!
You can use a while..done loop that iterates over the input:
find /folderX -name '*.gz' | while IFS= read -r i; do gzip -c -d "$i" > "/folderX/$(basename "$i" .gz)"; done
The per-file redirect is what actually places each decompressed copy in /folderX; gzip -c alone would only print the contents to the terminal.
You can also use xargs, with the additional benefit of dealing with spaces (" ") in file names via the -0 parameter:
find /folderX -name '*.gz' -print0 | xargs -0 -I{} sh -c 'gzip -c -d "$1" > "/folderX/$(basename "$1" .gz)"' _ {}
The -print0 outputs all the files found, separated by the NUL character. The -0 switch of xargs rebuilds the list by splitting on NUL and applies the sh command to each of them. Pay attention to the -I{} parameter, which tells xargs to pass only ONE file at a time; it arrives as the first argument ($1) of the sh command, which decompresses it to standard output and redirects the result into the root folder.

Extract and delete all .gz in a directory- Linux

I have a directory. It has about 500K .gz files.
How can I extract all .gz in that directory and delete the .gz files?
This should do it:
gunzip *.gz
(With ~500K files the shell's argument-list limit may be exceeded; in that case use one of the find-based variants below.)
@techedemic is correct but is missing the '.' to mention the current directory, and this command goes through all subdirectories:
find . -name '*.gz' -exec gunzip '{}' \;
There's more than one way to do this, obviously.
# This will find files recursively (you can limit it by using some 'find' parameters;
# see the man pages). The final backslash is required for the exec example to work.
find . -name '*.gz' -exec gunzip '{}' \;
# This will do it only in the current directory:
for a in *.gz; do gunzip "$a"; done
I'm sure there are other ways as well, but this is probably the simplest.
Note that gunzip removes each .gz original by default; if any are left over, rm *.gz in the applicable directory will clear them.
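With roughly 500K files, the decompression itself can also be parallelized; a sketch assuming GNU xargs and coreutils' nproc are available:
# -P runs one gunzip per CPU core; -n 100 hands each invocation 100 files
# to amortize process-startup cost.
find . -name '*.gz' -print0 | xargs -0 -P "$(nproc)" -n 100 gunzip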
Extract all .gz files in the current directory and its subdirectories (-print0 and -0 keep it safe for spaces in names):
find . -name '*.gz' -print0 | xargs -0 gunzip
If you want to extract a single file, use:
gunzip file.gz
It will extract the file and remove the .gz original.
If the files are actually gzipped tarballs (.tar.gz), extract and remove each one:
for foo in *.gz
do
tar xf "$foo"
rm "$foo"
done
Try:
ls -1 | grep -E "\.tar\.gz$" | xargs -n 1 tar xvfz
Then try:
ls -1 | grep -E "\.tar\.gz$" | xargs -n 1 rm
This will untar all .tar.gz files in the current directory and then delete all the .tar.gz files. If you want an explanation: the "|" takes the stdout of the command before it and uses that as the stdin of the command after it. Use "man command" (without the quotes) to figure out what those commands and arguments do, or research online.

Finding human-readable files on Unix

I'd like to find human-readable files on my Linux machine without any file-extension constraint. Those files should be things a human can read, like text, configuration, HTML, and source-code files. Is there a way to filter and locate them?
Use:
find /dir/to/search -type f | xargs file | grep text
find will give you a list of files.
xargs file will run the file command on each of the lines from the piped input.
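One caveat worth noting: file prints each filename before its description, so a file named gettext would match regardless of its contents. Anchoring the grep past the colon avoids that, and -print0/-0 keeps it safe for spaces (a small refinement of the command above):
find /dir/to/search -type f -print0 | xargs -0 file | grep ':.*text'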
find and file are your friends here:
find /dir/to/search -type f -exec sh -c 'file -b {} | grep text &>/dev/null' \; -print
This will find any files (NOTE: it will not find symlinks, directories, sockets, etc., only regular files) in /dir/to/search and run sh -c 'file -b {} | grep text &>/dev/null' on each, which looks at the type of the file and greps for text in the description. If this returns true (i.e., text is in the line), then it prints the filename.
NOTE: using the -b flag to file means that the filename is not printed and therefore cannot create any issues with the grep. E.g., without the -b flag the binary file gettext would erroneously be detected as a textfile.
For example,
root@osdevel-pete# find /bin -exec sh -c 'file -b {} | grep text &>/dev/null' \; -print
/bin/gunzip
/bin/svnshell.sh
/bin/unicode_stop
/bin/unicode_start
/bin/zcat
/bin/redhat_lsb_init
root@osdevel-pete# find /bin -type f -name '*text*'
/bin/gettext
If you want to look in compressed files use the --uncompress flag to file. For more information and flags to file see man file.
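For example, with the short form -z (notes.txt.gz here is a hypothetical file):
file -z notes.txt.gz   # reports the type of the contents inside the gzip wrapper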
This should work fine, too:
file_info=`file "$file_name"` # First reading the file info string which should have the words "ASCII" or "Unicode" if it's a readable file
if grep -q -i -e "ASCII" -e "Unicode"<<< "$file_info"; then
echo "file is readable"
fi
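To make that snippet self-contained, it can be wrapped around a find loop; a sketch that prints regular files under the current directory whose file(1) description suggests readable text:
#!/bin/bash
find . -type f -print0 | while IFS= read -r -d '' f; do
  # -b omits the filename so only the description is matched
  if file -b "$f" | grep -q -i -e text -e ascii -e unicode; then
    printf '%s\n' "$f"
  fi
done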

uncompressing a large number of files on the fly

I have a script that I need to run on a large number of files with the extension *.tar.gz.
Instead of uncompressing them and then running the script, I want to be able to uncompress them as I run the command and then work on the uncompressed folder, all with a single command.
I think a pipe is a good solution for this, but I haven't used one before. How would I do this?
The -v orders tar to print filenames as it extracts each file:
tar -xzvf file.tar.gz | xargs -I {} -d\\n myscript "{}"
This way the script will contain commands to deal with a single file, passed as a parameter to your script (thanks to xargs); it is $1 in the script context.
Edit: the -I {} -d\\n part makes it work with spaces in filenames.
The following three lines of bash...
for archive in *.tar.gz; do
tar zxvf "${archive}" 2>&1 | sed -e 's!x \([^/]*\)/.*!\1!' | sort -u | xargs some_script.sh
done
...will iterate over each gzipped tarball in the current directory, decompress it, grab the top-most directories of the decompressed contents, and pass those as arguments to some_script.sh. This probably uses more pipes than you were expecting, but it seems to do what you are asking for.
N.B.: tar xf can only take one archive per invocation.
You can use a for loop:
for file in *.tar.gz; do tar -xf "$file"; your commands here; done
Or expanded:
for file in *.tar.gz; do
tar -xf "$file"
# your commands here
done
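If your script needs the directory each tarball creates, tar -t can list the archive first; a sketch assuming each tarball unpacks into a single top-level directory (some_script.sh is the script from above):
for archive in *.tar.gz; do
  topdir=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)  # top-level directory of the first entry
  tar -xzf "$archive"
  some_script.sh "$topdir"
done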
