uncompressing a large number of files on the fly - linux

I have a script that I need to run on a large number of files with the extension .tar.gz.
Instead of uncompressing them and then running the script, I want to be able to uncompress them as I run the command and then work on the uncompressed folder, all with a single command.
I think a pipe is a good solution for this, but I haven't used one before. How would I do this?

The -v option makes tar print each filename as it extracts it:
tar -xzvf file.tar.gz | xargs -I {} -d\\n myscript "{}"
This way the script only needs to handle a single file, which xargs passes to it as a parameter ($1 inside the script).
Edit: the -I {} -d\\n part makes it work with spaces in filenames.
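As a variation on the idea above, here is a minimal sketch that extracts the archive first and then walks the member list, so each file is complete before it is handled. The archive name, the directory name and the handle function are made up for illustration; handle stands in for your own script.

```shell
#!/bin/sh
set -eu

# Build a small sample archive so the sketch runs on its own.
workdir=$(mktemp -d)
cd "$workdir"
mkdir "my data"
echo 'a' > "my data/first file.txt"
echo 'b' > "my data/second.txt"
tar -czf sample.tar.gz "my data"
rm -r "my data"

# Hypothetical stand-in for myscript: receives one path as $1.
handle() {
    printf 'processing: %s\n' "$1"
}

# Extract everything, then list the members into a file and loop over them.
tar -xzf sample.tar.gz
tar -tzf sample.tar.gz > members.txt

handled=0
while IFS= read -r member; do
    case $member in
        */) continue ;;          # skip directory entries
    esac
    handle "$member"             # quoting keeps spaces in names intact
    handled=$((handled + 1))
done < members.txt

echo "handled $handled file(s)"
```

Reading line by line with IFS= read -r copes with spaces in names, though not with newlines inside filenames.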

The following three lines of bash...
for archive in *.tar.gz; do
tar zxvf "${archive}" 2>&1 | sed -e 's!x \([^/]*\)/.*!\1!' | sort -u | xargs some_script.sh
done
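...will iterate over each gzipped tarball in the current directory, decompress it, grab the top-most directories of the decompressed contents and pass those as arguments to some_script.sh. This probably uses more pipes than you were expecting but seems to do what you are asking for. Note that the sed expression assumes tar prefixes each extracted name with "x " (as BSD tar does); GNU tar prints bare names, so the "x " has to be dropped from the pattern there.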
N.B.: tar xf can only extract one archive per invocation.
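If parsing the verbose output feels fragile, one alternative is to ask tar for the member list with -t and keep the first path component. This is only a sketch; the archive and directory names are invented for the example.

```shell
#!/bin/sh
set -eu

# Build a throwaway archive so the sketch is self-contained.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p project/src
echo 'hello' > project/src/a.txt
echo 'world' > project/b.txt
tar -czf project.tar.gz project
rm -r project

# List members without extracting, keep the first path component,
# and de-duplicate to get the top-level entries.
top_level=$(tar -tzf project.tar.gz | cut -d/ -f1 | sort -u)

# Extract, then hand the top-level names to whatever comes next.
tar -xzf project.tar.gz
echo "top-level entries: $top_level"
```

The listing format is the same whether or not the tar implementation prefixes verbose extraction output, which is why -t is used here instead of -v.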

You can use a for loop:
for file in *.tar.gz; do tar -xf "$file"; your commands here; done
Or expanded:
for file in *.tar.gz; do
tar -xf "$file"
# your commands here
done
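To make the loop above concrete, here is a sketch that unpacks each archive into its own scratch directory, runs a command on it, and removes the extracted copy afterwards. The sample archives and the process_dir function are assumptions for the example; replace process_dir with your own commands.

```shell
#!/bin/sh
set -eu

# Sample input (hypothetical names) so the sketch runs on its own.
workdir=$(mktemp -d)
cd "$workdir"
for n in 1 2; do
    mkdir "data$n"
    echo "line $n" > "data$n/log.txt"
    tar -czf "data$n.tar.gz" "data$n"
    rm -r "data$n"
done

# Stand-in for "your commands here": count regular files in the tree.
process_dir() {
    find "$1" -type f | wc -l | tr -d ' '
}

results=""
for archive in *.tar.gz; do
    scratch=$(mktemp -d)                  # private extraction directory
    tar -xzf "$archive" -C "$scratch"     # unpack this archive only
    count=$(process_dir "$scratch")
    results="$results$archive:$count "
    rm -rf "$scratch"                     # discard the extracted copy
done

echo "$results"
```

Extracting into a fresh directory per archive keeps the results of one archive from mixing with the next, and the archives themselves are left untouched.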

Related

How to gunzip without delete?

How do I unzip all .gz archives into a specified folder without deleting the originals?
I tried: gunzip *.gz /folder
You can simply use the -k option described in the man page:
-k --keep
Keep (don't delete) input files during compression or decompression.
For example if you use the command
gunzip -k my_file.gz
my_file.gz will not be deleted after decompression.
Alternatively, this can be done with some bash scripting, using gunzip --stdout and computing the uncompressed name with a bash parameter expansion:
for i in *.gz
do
echo "unzipping $i to /folder/${i%.gz}"
gunzip --stdout "$i" > "/folder/${i%.gz}"
done
Without the double quotes around $i, this fails on filenames containing spaces.
find /source/dir/ -type f -name '*.gz' \
-exec bash -c 'f=${0##*/}; gunzip -c "$0" > "/folder/${f%.gz}"' {} \;
This command finds all *.gz files under /source/dir/ and runs a bash command for each one. The bash command first strips the directory part of the path (${0##*/}), then decompresses the file ($0) to standard output and redirects the result to "/folder/${f%.gz}", that is, "/folder/" followed by the filename without its .gz extension. The ${f%.gz} expansion removes the shortest match of .gz from the end of the string.
find replaces {} with the path of each gzipped file in turn, and bash -c receives it as the first argument after the command string, which is $0.
Replace /source/dir/ with dot (.), if the source directory is the current directory.
Use gzip to decompress to stdout and redirect that to the file you want (the -k option doesn't work for me either):
gzip -dc somefile.gz > outputfile.txt
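Putting the stdout approach together, here is a sketch that decompresses every .gz file from one directory into another while leaving the originals in place. The directory and file names are hypothetical and chosen to include a space.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
src="$workdir/source dir"      # hypothetical source directory (note the space)
dest="$workdir/folder"         # hypothetical destination directory
mkdir -p "$src" "$dest"
echo 'alpha' > "$src/first file.txt"
echo 'beta'  > "$src/second.txt"
gzip "$src/first file.txt" "$src/second.txt"

for gz in "$src"/*.gz; do
    name=$(basename "$gz" .gz)        # drop the directory and the .gz suffix
    gzip -dc "$gz" > "$dest/$name"    # decompress to stdout; the .gz stays put
done

ls "$dest"
```

Because gzip -dc only reads its input, this works even on gzip versions that lack -k.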

Using 'tar' command in for loop

I know this is a basic question, but please help me. I compressed and archived my server log files using the tar command:
for i in server.log.2016-07-05 server.log.2016-07-06 ; do tar -zcvf server2.tar.gz $i; done
The output of the above loop is:
server.log.2016-07-05
server.log.2016-07-06
But while listing the file using tar -tvf server2.tar.gz the output obtained is:
rw-r--r-- root/root 663643914 2016-07-06 23:59 server.log.2016-07-06
That is, I archived two files but only one is listed, which means the archive doesn't contain both files, right? Please help with this.
I only tested with these two files, but my folder has many more. Since I didn't get the expected output, I haven't proceeded with all the files in my folder. The exact loop I am going to use is:
Previousmonth=$(date "+%b" --date '1 month ago')
for i in $(ls -l | awk '/'$Previousmonth'/ && /server.log./ {print $NF}'); do tar -zcvf server2.tar.gz $i; done
I am trying to compress and archive multiple files, but when I list the archive using tar -tvf it doesn't show all the files.
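You don't need a loop here. Each tar -c invocation creates a new archive, so every pass through the loop overwrites server2.tar.gz and only the last file survives. Just list all the files you want to add as command-line parameters: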
tar -zcvf server2.tar.gz server.log.2016-07-05 server.log.2016-07-06
The same goes for your other example too:
tar -zcvf server2.tar.gz $(ls -l | awk '/'$Previousmonth'/ && /server.log./ {print $NF}')
Except that parsing the output of ls -l is fragile and strongly discouraged.
But since the names of the files to back up contain the year and month number,
a much simpler and better solution is to get the year and month using the date command, and then use shell globbing:
prefix=$(date +%Y-%m -d 'last month')
tar -zcvf server2.tar.gz server.log.$prefix-??
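The difference between the loop and the single invocation is easy to see on two small files. The sketch below uses made-up log contents and archive names (looped.tar.gz, single.tar.gz) purely for comparison.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
echo 'day 5' > server.log.2016-07-05
echo 'day 6' > server.log.2016-07-06

# Loop form: every pass recreates the archive, so only the last file remains.
for f in server.log.2016-07-0?; do
    tar -czf looped.tar.gz "$f"
done
looped_count=$(tar -tzf looped.tar.gz | wc -l | tr -d ' ')

# Single invocation: the glob expands to both names and both are archived.
tar -czf single.tar.gz server.log.2016-07-??
single_count=$(tar -tzf single.tar.gz | wc -l | tr -d ' ')

echo "looped: $looped_count member(s), single: $single_count member(s)"
```

The shell expands the glob before tar runs, so tar simply receives the list of matching names.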

Untar multiple *.tar.gz.aa *.tar.gz.ab pattern files

I tarred a folder and split it into tar.gz files of 200mb when zipping. How can I go about unzipping them? Is there a way I can do this in one command or do I have to do each one separately?
You cannot even do it separately, because each piece is only a fragment of a single gzip stream.
Just undo what you did, in reverse order:
first concatenate them
then unzip them
then untar
So you do
cat *.tar.gz.* | zcat | tar xvf -
or, even shorter,
cat *.tar.gz.* | tar xvfz -
You can use the command below:
$ cat *.tar | tar -xvf - -i
cat concatenates the listed .tar files, and tar -xvf - -i extracts the combined stream; -i (--ignore-zeros) makes tar skip the end-of-archive blocks between the individual archives.
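For the split .tar.gz case, the whole round trip can be sketched as follows. The payload, the piece size and the backup.tar.gz. prefix are assumptions for the example; random data is used so the archive is large enough to span several pieces.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir payload
head -c 200000 /dev/urandom > payload/blob.bin
expected=$(cksum < payload/blob.bin)

# Create the compressed archive and cut it into 50000-byte pieces
# named backup.tar.gz.aa, backup.tar.gz.ab, and so on.
tar -czf - payload | split -b 50000 - backup.tar.gz.
rm -r payload
pieces=$(ls backup.tar.gz.* | wc -l | tr -d ' ')

# Reassemble in name order and extract straight from the pipe.
cat backup.tar.gz.* | tar -xzf -
actual=$(cksum < payload/blob.bin)

echo "pieces: $pieces"
```

The glob sorts the suffixes alphabetically, which matches the order in which split wrote them.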

Extract files from tar file using wildcard

Can anyone tell me how to extract tar files using wildcards, for example
$ tar -xvf file1_*.tar dir1/
Thanks in advance
You can execute the following in the same dir as the tars.
for filename in ./file1_*.tar; do tar -xvf "$filename" -C ./dir1/; done
To extract from multiple tar files in a single directory, try the following (from the directory containing the files):
ls file1_*.tar | xargs -I{} tar -xvf {} dir1/
The command lists the tar files matching your pattern in the current directory and pipes them to xargs, which runs tar -xvf {filename} dir1/ once per file. Note that a trailing dir1/ names the member to extract from each archive; to extract into a directory called dir1 instead, use -C dir1/ as in the loop above.
To see exactly what will be performed, modify the above command to
ls file1_*.tar | xargs -I{} echo tar -xvf {} dir1/
xargs is a powerful command-line tool for running a single command on multiple inputs, and learning it will often save you a lot of time.
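Here is a small self-contained sketch of the loop form, extracting every matching archive into dir1. The note_a.txt and note_b.txt members are invented for the example.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
for n in a b; do
    echo "content $n" > "note_$n.txt"
    tar -cf "file1_$n.tar" "note_$n.txt"
    rm "note_$n.txt"
done
mkdir dir1

# The shell expands the wildcard; tar sees exactly one archive per call.
for archive in ./file1_*.tar; do
    tar -xf "$archive" -C dir1
done

ls dir1
```

Looping over the glob directly avoids parsing ls output, so archive names with spaces are handled as long as "$archive" stays quoted.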

how to rename files you put into a tar archive using linux 'tar'

I'm trying to create a tar archive with a couple files, but rename those files in the archive. Right now I have something like this:
tar -czvf file1 /some/path/to/file2 file3 etc
But I'd like to do something like:
tar -czvf file1=file1 /some/path/to/file2=file2 file3=path/to/renamedFile3 etc=etc
Where, when extracted into directory testDir, you would see the files:
testDir/file1
testDir/file2
testDir/path/to/renamedFile3
testDir/etc
How can I do this?
You can modify filenames (among other things) with --transform. For example, to create an archive file.tar from file1, file2, bar, fubar and /dir/* while renaming bar to foo, you can do the following:
tar --transform='flags=r;s|bar|foo|' -cf file.tar file1 file2 bar fubar /dir/*
The result is that bar is added to file.tar as foo. Because the expression is not anchored, it rewrites the first bar in every member name, so fubar is stored as fufoo; use s|^bar$|foo| to rename only bar.
The r flag applies the transformation to regular archive member names only, leaving symlink and hard link targets alone. For more information see the GNU tar documentation.
You can use --transform multiple times, for example:
tar --transform='flags=r;s|foo|bar|' --transform='flags=r;s|baz|woz|' -cf file.tar /some/dir/where/foo/is /some/dir/where/baz/is /other/stuff/* /dir/too
With --transform, there's no need to make a temporary testDir first. To prepend testDir/ to everything in the archive, match the beginning anchor ^:
tar --transform "s|file3|path/to/renamedFile3|" \
--transform "flags=r;s|^|testDir/|" \
-czvf my_archive.tgz file1 /some/path/to/file2 file3 etc
The r flag is critical to keep the transform from breaking any symlink targets in the archive (which also match ^).
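A quick way to check what a set of transforms produces is to build a tiny archive and list it. The sketch below assumes GNU tar (other tar implementations may not support --transform) and uses two throwaway files named after the question's example.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
echo 'one' > file1
echo 'three' > file3

# Assumes GNU tar. The first expression is anchored so it renames only
# file3; the second prepends testDir/ to every member name.
tar --transform 'flags=r;s|^file3$|path/to/renamedFile3|' \
    --transform 'flags=r;s|^|testDir/|' \
    -czf my_archive.tgz file1 file3

# The listing shows the names as they are stored in the archive.
tar -tzf my_archive.tgz
```

The transforms are applied in the order given, so the rename happens before the prefix is added.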
According to man tar, the -O option is a good choice when renaming on extraction, since it writes files to standard output:
-O (x, t modes only) In extract (-x) mode, files will be written to
standard out rather than being extracted to disk. In list (-t)
mode, the file listing will be written to stderr rather than the
usual stdout.
Here are the examples:
# 1. without -O
tar xzf 20170511162930.db.tar.gz
# result: 20170511162930.db
# 2. with -O
tar xzf 20170511162930.db.tar.gz -O > latest.db
# result: latest.db
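The same idea as a runnable sketch: it builds a one-member archive, then extracts that member to stdout under a new name. Naming the member explicitly is a choice made here so that nothing else in a larger archive would end up in the output; the file contents are made up.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
echo 'snapshot data' > 20170511162930.db
tar -czf 20170511162930.db.tar.gz 20170511162930.db
rm 20170511162930.db

# -O writes the member's bytes to stdout, so the redirect picks the new name.
tar -xzOf 20170511162930.db.tar.gz 20170511162930.db > latest.db

cat latest.db
```

With more than one member selected, -O concatenates their contents, so this suits single files best.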
After not liking any solution that I've found, I've just written tarlogs.py, which lets you specify arbitrary names for tar entries. Each tar entry is constructed from one (or several) regular (or gzipped) inputs. You can also add directories, which will be recursed into as with regular tar. So in your case,
tarlogs.py -o file1 -i /some/path/to/file2 -o file2 -i file3 -o path/to/renamedFile3 -o /etc >output.tar
(-o with no -i inputs simply uses the output path as input, with no renaming)
This question has been up for a while, but for anyone who's looking for another suitable solution:
I've created a fork of the original GNU tar utility with additional support for file name mapping.
Usage example:
> touch myfile.txt
> tar cf file.tar ':myfile.txt:dir/inside/tar/newname.txt'
> tar tvf file.tar
-rw-rw-r-- user/user 0 2022-02-12 14:27 dir/inside/tar/newname.txt
The feature is triggered by prefixing file names with a colon (:) as shown above. A second colon functions as a separator between the source file location and the desired file name inside the archive.
:[source file]:[desired name inside the tar]
This feature is compatible with the -T (input list from file) flag.
How to compile it
> git clone https://github.com/leso-kn/tar
> cd tar
> ./bootstrap
> ./configure
> make -j4
# Run it
> src/tar --version

Resources