How to find specific file types and tar them? - Linux

It seems I've got a problem. I have several different file types in my current directory, and I want to tar just the .png files. I started with this:
find -name "*.png" | tar -cvf backupp.tar
It didn't work because I didn't specify which files to archive, so after looking at how others did it, I added xargs:
find -name "*.png" | xargs tar -cvf backupp.tar
It did work this time, and the backupp.tar file was created, but here is the problem: I can't seem to extract it. Whenever I type:
tar -xvf backupp.tar
nothing happens. I've tried chmod and sudo, but nothing helps.
So, did I type the wrong command completely, or is there something I just missed?

tar expects a list of file names as arguments. Your use of xargs can be improved by adding the -print0 option to find and the -0 option to xargs, to ensure that find produces filenames separated by a null character and that xargs consumes the list the same way. This prevents whitespace or other stray characters in the filenames from causing problems, e.g.
find dir -type f -name "*.png" -print0 | xargs -0 tar -cf tarfile.tar
The above will find all regular files in or below dir matching the name "*.png" and provide the null-separated list of filenames to xargs for use by tar. You can list the files contained in the resulting archive with:
tar -tf tarfile.tar
Consider using compression (if wanted) by adding z (gzip), j (bzip2), or J (xz), plus the appropriate extension, to reduce your archive size, e.g.
... | xargs -0 tar -czf tarfile.tar.gz
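As a self-contained sketch of the whole round trip (the directory and file names here are made up for the example):

```shell
# Work in a throwaway directory with a few sample files,
# including one whose name contains a space.
workdir=$(mktemp -d)
cd "$workdir"
mkdir pics
echo png > pics/a.png
echo png > 'pics/b c.png'
echo txt > pics/notes.txt

# Archive only the .png files, using null-separated names end to end.
find pics -type f -name '*.png' -print0 | xargs -0 tar -cf pngs.tar

# List the archive, then extract it into a fresh directory.
tar -tf pngs.tar
mkdir extracted
tar -xf pngs.tar -C extracted
```

The listing should show pics/a.png and pics/b c.png but not pics/notes.txt, since only the .png names were piped to tar.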

Related

script to tar up multiple log files separately

I'm on a Red Hat Linux 6 machine, running Elasticsearch and Logstash. I have a bunch of log files that were rotated daily from back in June until August. I am trying to figure out the best way to tar them up to save some disk space, without manually tarring each one. I'm a bit of a newbie at scripting, so I was wondering if someone could help me out? The files have the name elasticsearch-cluster.log.datestamp. Ideally they would each be in their own tar file, so that it'd be easier to go back and look at a particular day's logs if needed.
You could use a loop:
for file in elasticsearch-cluster.log.*
do
tar zcvf "$file".tar.gz "$file"
done
Or if you prefer a one-liner (this is recursive):
find . -name 'elasticsearch-cluster.log.*' -print0 | xargs -0 -I {} tar zcvf {}.tar.gz {}
or, as @chepner mentions, with the -exec option:
find . -name 'elasticsearch-cluster.log.*' -exec tar zcvf {}.tar.gz {} \;
or if you want to exclude already-zipped files:
find . -name 'elasticsearch-cluster.log.*' -not -name '*.tar.gz' -exec tar zcvf {}.tar.gz {} \;
If you don't mind all the files being in a single tar.gz file, you can do:
tar zcvf backups.tar.gz elasticsearch-cluster.log.*
All these commands leave the original files in place. After you validate the tar.gz files, you can delete the originals manually.
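To see the loop in action under the same assumptions (the log file names below are illustrative):

```shell
# Scratch directory with two fake rotated logs.
workdir=$(mktemp -d)
cd "$workdir"
echo 'day 1' > elasticsearch-cluster.log.2015-06-01
echo 'day 2' > elasticsearch-cluster.log.2015-06-02

# One gzipped tar per log file; the glob is expanded before the loop
# starts, so the archives it creates are not re-processed.
for file in elasticsearch-cluster.log.*; do
    tar zcvf "$file".tar.gz "$file"
done
```

Afterwards each log has a matching .tar.gz sitting next to it, and the originals are untouched.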

Find and tar for each file in Linux

I have a list of files with different modification times: 1_raw, 2_raw, 3_raw... I want to find the files modified more than 10 days ago and zip them to free up disk space. However, the command:
find . -mtime +10 |xargs tar -cvzf backup.tar.gz
will put everything into a single new file, backup.tar.gz.
What I want is to create a tarball for each file, so that I can easily unzip each of them when needed. After the command, my files should become: 1_raw.tar.gz, 2_raw.tar.gz, 3_raw.tar.gz...
Is there any way to do this? Thanks!
Something like this is what you are after:
find . -mtime +10 -type f -print0 | while IFS= read -r -d '' file; do
tar -cvzf "${file}.tar.gz" "$file"
done
The -type f was added so that it doesn't also process directories, just files.
This creates a compressed archive of each file that was modified more than 10 days ago, in all subdirectories, and places each archive next to its unarchived original (in the same folder). I assume this is what you wanted.
If you don't need to handle whitespace in paths, you can simply do:
for f in $(find . -mtime +10 -type f) ; do
tar -cvzf "${f}.tar.gz" "$f"
done
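The null-safe while-read loop above can be tried in a disposable directory (bash is assumed for read -d '', and touch -d for backdating is a GNU coreutils feature; the file names are invented):

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo data > '1 raw'          # a name with a space, the hard case
echo data > 2_raw
# Backdate both files so that -mtime +10 matches them.
touch -d '20 days ago' '1 raw' 2_raw

# Null-separated find output, read one filename at a time.
find . -mtime +10 -type f -print0 | while IFS= read -r -d '' file; do
    tar -czf "${file}.tar.gz" "$file"
done
```

Each old file gets its own archive; the freshly created .tar.gz files have a current mtime, so -mtime +10 never matches them.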
Or simply try this:
$ find . -mtime +10 | xargs -I {} tar czvf {}.tar.gz {}
Here, {} indicates the replace-str:
-I replace-str
Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.
https://linux.die.net/man/1/xargs
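The per-file substitution is easy to see with a toy run (the -mtime test is dropped here so freshly created files match; names are made up):

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch 1_raw 2_raw

# -I {} forces one tar invocation per input line, substituting {}
# everywhere it appears, so each file gets its own archive.
find . -maxdepth 1 -name '*_raw' -type f | xargs -I {} tar czf {}.tar.gz {}
```

Note that -I implies one command per line of input, so this is slower than batching but gives exactly the one-archive-per-file behavior asked for.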

Tar'ing using wildcards where one type may not exist

I have a shell script to automate the creation of separate tar files for several directories, cd'ing into each and calling the command:
tar cf pakage1.tar *.csv *.fmt
Most directories contain both .fmt and .csv files, but I need a solution for when *.csv may not exist while *.fmt does and a tar is therefore still required. I haven't found an 'ignore wildcard if not found' option; does one exist?
Thank you in advance.
Use find in combination with xargs:
find . \( -name '*.csv' -or -name '*.fmt' \) -print0 | xargs -0 tar cf pakage1.tar
-print0 and -0 use null separators instead of whitespace; otherwise the pipeline will choke on filenames with spaces in them.
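A quick way to convince yourself this copes when one pattern has no matches (file names are examples; with GNU xargs you may also want -r so tar isn't run at all when nothing matches):

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo fmt > layout.fmt    # only a .fmt file; deliberately no .csv

# The grouped -name tests match whichever extensions actually exist,
# so a missing *.csv simply contributes nothing to the list.
find . \( -name '*.csv' -or -name '*.fmt' \) -print0 | xargs -0 tar cf pakage1.tar
tar -tf pakage1.tar
```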

Unzipping from a folder of unknown name?

I have a bunch of zip files, and I'm trying to write a bash script to automate unzipping certain files from them.
Thing is, although I know the name of the file I want, I don't know the name of the folder it's in; it is one folder deep inside the archive.
How can I extract these files, preferably discarding the folder?
Here's how to unzip any given file at any depth and junk the folder paths on the way out:
unzip -j somezip.zip *somefile.txt
The -j option junks any folder structure in the zip file, and the asterisk is a wildcard that matches any leading path.
If you're in:
some_directory/
and the zip files are in any number of subdirectories, say:
some_directory/foo
find ./ -name myfile.zip -exec unzip {} -d /directory \;
Edit: as for the second part, removing the directory that contained the zip file, I assume?
find ./ -name myfile.zip -exec unzip {} -d /directory \; -exec echo rm -rf `dirname {}` \;
Notice the echo: that's a sanity check. I always echo first when executing something destructive like rm -rf in a loop or iterative sequence like this. Good luck!
Have you tried unzip somefile.zip "*/blah.txt"?
You can use find to find the file that you need to unzip, and xargs to call unzip:
find /path/to/root/ -name 'zipname.zip' -print0 | xargs -0 unzip
-print0 enables the command to work with files or paths that have whitespace in them; -0 is the option to xargs that makes it accept -print0's output.

Find files and tar them (with spaces)

Alright, so, simple problem here. I'm working on a simple backup script. It works fine except when files have spaces in their names. This is how I'm finding files and adding them to a tar archive:
find . -type f | xargs tar -czvf backup.tar.gz
The problem is when a file has a space in its name, because tar treats it as two separate names. Basically, is there a way I can add quotes around the results from find? Or a different way to fix this?
Use this:
find . -type f -print0 | tar -czvf backup.tar.gz --null -T -
It will:
deal with files that have spaces, newlines, leading dashes, and other funny business
handle an unlimited number of files
not repeatedly overwrite your backup.tar.gz, as using tar -c with xargs will do when you have a large number of files
Also see:
GNU tar manual
How can I build a tar from stdin? (search for "null")
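A self-contained check of the --null/-T - variant (these are GNU tar options; the file names are invented):

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir data
echo a > 'data/name with spaces.txt'
echo b > data/plain.txt

# tar reads the null-separated name list from stdin (-T -),
# so no shell argument-splitting ever happens.
find data -type f -print0 | tar -czf backup.tar.gz --null -T -
tar -tzf backup.tar.gz
```

Both names, including the one with spaces, should appear in the listing.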
There could be another way to achieve what you want. Basically,
Use the find command to output the paths to whatever files you're looking for, and redirect stdout to a filename of your choosing.
Then use tar with the -T option, which allows it to take a list of file locations (the one you just created with find!):
find . -name "*.whatever" > yourListOfFiles
tar -cvf yourfile.tar -T yourListOfFiles
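The two steps, end to end, under assumed names:

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir src
touch src/a.whatever src/b.whatever

# Step 1: write the list of matching paths to a file.
find src -name '*.whatever' > yourListOfFiles

# Step 2: let tar read that list via -T.
tar -cvf yourfile.tar -T yourListOfFiles
```

The intermediate list file is the advantage here: you can inspect it, edit it, or reuse it for other tools before archiving.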
Try running:
find . -type f | xargs -d "\n" tar -czvf backup.tar.gz
Why not:
tar czvf backup.tar.gz *
Sure, it's clever to use find and then xargs, but you're doing it the hard way.
Update: Porges has commented with a find option that I think is a better answer than my answer or the other one: find -print0 ... | xargs -0 ....
If you have multiple files or directories and you want to zip them into independent *.gz files, you can do this. (The -type f and -mtime tests are optional.)
find -name "httpd-log*.txt" -type f -mtime +1 -exec tar -vzcf {}.gz {} \;
This will compress
httpd-log01.txt
httpd-log02.txt
to
httpd-log01.txt.gz
httpd-log02.txt.gz
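A sketch of that -exec form (the -mtime +1 test is dropped so freshly created files match; note that substituting {} inside a larger argument like {}.gz is a GNU find behavior, not guaranteed by POSIX):

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo one > httpd-log01.txt
echo two > httpd-log02.txt

# One gzipped tar per matching file, named after the file itself.
find . -name 'httpd-log*.txt' -type f -exec tar -vzcf {}.gz {} \;
```

The generated .txt.gz names no longer match the 'httpd-log*.txt' pattern, so find won't try to re-archive its own output.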
I would add a comment to @Steve Kehlet's post but need 50 rep (RIP).
For anyone who has found this post through much googling: I found a way to not only find specific files given a time range, but also NOT include the relative paths OR whitespace that would cause tarring errors. (THANK YOU SO MUCH STEVE.)
find . -name "*.pdf" -type f -mtime 0 -printf "%f\0" | tar -czvf /dir/zip.tar.gz --null -T -
. the relative directory to search
-name "*.pdf" look for PDFs (or any file type)
-type f only match regular files
-mtime 0 look for files modified in the last 24 hours
-printf "%f\0" print the bare filename, null-terminated; regular -print0 OR -printf "%f" did NOT work for me. From the man pages:
This quoting is performed in the same way as for GNU ls. This is not the same quoting mechanism as the one used for -ls and -fls. If you are able to decide what format to use for the output of find then it is normally better to use '\0' as a terminator than to use newline, as file names can contain white space and newline characters.
-czvf create an archive, filter it through gzip, verbosely list the files processed, and name the archive
Edit 2019-08-14:
I would like to add that I was also able to achieve essentially the same thing using tar itself:
tar -czvf /archiveDir/test.tar.gz --newer-mtime=0 --ignore-failed-read *.pdf
I needed --ignore-failed-read in case there were no new PDFs for today.
Why not give something like this a try: tar cvf scala.tar `find src -name '*.scala'` (the pattern is quoted so the shell doesn't expand it before find sees it)
Another solution as seen here:
find var/log/ -iname "anaconda.*" -exec tar -cvzf file.tar.gz {} +
The best solution seems to be to create a file list and then archive the files from it, because you can feed the list from other sources and do something else with it as well.
For example, the list can be used to calculate the size of the files being archived:
#!/bin/sh
backupFileName="backup-big-$(date +"%Y%m%d-%H%M")"
backupRoot="/var/www"
backupOutPath=""
archivePath="$backupOutPath$backupFileName.tar.gz"
listOfFilesPath="$backupOutPath$backupFileName.filelist"
#
# Make a list of files/directories to archive
#
: > "$listOfFilesPath"
echo "${backupRoot}/uploads" >> "$listOfFilesPath"
echo "${backupRoot}/extra/user/data" >> "$listOfFilesPath"
find "${backupRoot}/drupal_root/sites/" -name "files" -type d >> "$listOfFilesPath"
#
# Size calculation
#
sizeForProgress=$(
while read -r nextFile; do
    if [ -n "$nextFile" ]; then
        du -sb "$nextFile"
    fi
done < "$listOfFilesPath" | awk '{size+=$1} END {print size}'
)
#
# Archive with progress
#
## simple, with a dump of all files currently archived
#tar -czvf "$archivePath" -T "$listOfFilesPath"
## progress bar
sizeForShow=$((sizeForProgress/1024/1024))
echo -e "\nRunning backup [source files are $sizeForShow MiB]\n"
tar -cPp -T "$listOfFilesPath" | pv -s "$sizeForProgress" | gzip > "$archivePath"
A big warning on several of the solutions (and your own test):
When you do anything | xargs something,
xargs will try to fit "as many arguments as possible" after "something", but you may end up with multiple invocations of "something".
So your attempt find ... | xargs tar czvf file.tgz
may end up overwriting file.tgz on each invocation of tar by xargs, leaving you with only the last one! (The chosen solution uses the GNU-specific -T parameter to avoid the problem, but not everyone has GNU tar available.)
You could do instead:
find . -type f -print0 | xargs -0 tar -rvf backup.tar
gzip backup.tar
Proof of the problem (on Cygwin):
$ mkdir test
$ cd test
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs touch
# create the files
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs tar czvf archive.tgz
# will invoke tar several times, as it can't fit 10000 long filenames into one command line
$ tar tzvf archive.tgz | wc -l
60
# on my machine, I end up with only the last 60 filenames,
# as the last invocation of tar by xargs overwrote the previous one(s)
# proper way to invoke tar: with -r (which appends to an existing tar file, whereas c would overwrite it)
# caveat: you can't have it compressed (you can't append to a compressed archive)
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs tar rvf archive.tar # -r, and without z
$ gzip archive.tar
$ tar tzvf archive.tar.gz | wc -l
10000
# we have all our files, despite xargs making several invocations of the tar command
Note: this behavior of xargs is a well-known difficulty, and it is also why, when someone wants to do:
find ... | xargs grep "regex"
they instead have to write:
find ... | xargs grep "regex" /dev/null
That way, even if the last invocation of grep by xargs receives only one filename, grep still sees at least two (each time it gets /dev/null, where it won't find anything, plus the filename(s) xargs appends after it) and will therefore always display the filename when something matches "regex". Otherwise the last batch of results could show matches without a filename in front.
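The /dev/null trick is easy to verify (the file name is made up):

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo 'needle here' > only.txt

# Only one filename reaches grep: it omits the "filename:" prefix.
find . -name '*.txt' | xargs grep needle
# Adding /dev/null guarantees at least two names, so the prefix appears.
find . -name '*.txt' | xargs grep needle /dev/null
```

The first pipeline prints just the matching line; the second prefixes it with ./only.txt:.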
