Create a ZIP of hundreds of thousands of files modified within the last year on Linux

I have a /folder with over a half million files created in the last 10 years. I'm restructuring the process so that in the future there are subfolders based on the year.
For now, I need to backup all files modified within the last year. I tried
zip -r /backup.zip $(find /folder -type f -mtime -365)
but I get the error: Argument list too long.
Is there any alternative to get the files compressed and archived?

Zip has an option to read the file list from stdin. Below is from the zip man page:
-@ file lists. If a file list is specified as -@ [Not on MacOS],
zip takes the list of input files from standard input instead of
from the command line. For example,
zip -@ foo
will store the files listed one per line on stdin in foo.zip.
This should do what you need
find /folder -type f -mtime -365 | zip -@ /backup.zip
Note that I've removed the -r option because it isn't doing anything here: you are already explicitly selecting regular files with the find command (-type f).
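If the file list is huge or contains unusual names, another option that sidesteps the argument-list limit is to let xargs batch the file names; zip keeps adding to the same archive across repeated invocations. A minimal sketch, assuming GNU find and xargs:
find /folder -type f -mtime -365 -print0 | xargs -0 zip /backup.zip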

You'll have to switch from passing all the files at once to piping the files one at a time to the zip command.
find /folder -type f -mtime -365 | while IFS= read -r FILE; do zip -r /backup.zip "$FILE"; done

You can also work with the -exec parameter in find, like this:
find /folder -type f -mtime -365 -exec zip -r /backup.zip {} \;
(or whatever your command is). For every file found, the given command is executed with the file's path passed as the last parameter.

Find the files and then execute the zip command on as many files as possible using + as opposed to ;
find /folder -type f -mtime -365 -exec zip -r /backup.zip '{}' +

Related

Bash command for finding the size of all files of a particular filetype in a directory in Ubuntu

I have a folder which contains several file types, say .html, .php, .txt, etc., and it also has subfolders. The subfolders may contain all the file types mentioned above.
Question 1: I want to find the size of all the '.html' files in both the root directory and its subdirectories.
Question 2: I want to find the size of all the '.html' files that are only in the root directory, not in the subfolders.
I searched the internet, but all I was able to find were commands like df -h, du -sh, etc.
Are there any bash commands for the above questions? Any bash scripts?
You can use the find command for that.
#### Find the files recursively
find . -type f -iname "*.html"
#### Find the files in the root directory only
find . -maxdepth 1 -type f -iname "*.html"
Then, in order to get their size, you can use the -exec option like this:
find . -type f -iname "*.html" -exec ls -lha {} \;
And if you really only need the file size (I mean, without all the other stuff that ls prints):
find . -type f -iname "*.html" -exec stat -c "%s" {} \;
Explanation:
-iname searches for files without being case sensitive
-maxdepth limits how deep find descends into subdirectories (1 means only the immediate folder)
-exec executes an arbitrary command using the found paths, where {} represents the path of the file
-type indicates the type of file (a directory is also a file in Linux)
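If what you actually want is a single total rather than one size per file, here is a minimal sketch assuming GNU find (for -printf) and awk:
#### Sum the sizes, in bytes, of all .html files found recursively
find . -type f -iname "*.html" -printf '%s\n' | awk '{ total += $1 } END { print total " bytes" }'
Add -maxdepth 1 right after the . to restrict the sum to the root directory only (Question 2).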

Find and tar for each file in Linux

I have a list of files with different modification times: 1_raw, 2_raw, 3_raw, ... I want to find files that were modified more than 10 days ago and zip them to free disk space. However, the command:
find . -mtime +10 | xargs tar -cvzf backup.tar.gz
will create a single new file, backup.tar.gz.
What I want is to create a tarball for each file, so that I can easily unzip each of them when needed. After the command, my files should become: 1_raw.tar.gz, 2_raw.tar.gz, 3_raw.tar.gz...
Is there any way to do this? Thanks!
Something like this is what you are after:
find . -mtime +10 -type f -print0 | while IFS= read -r -d '' file; do
tar -cvzf "${file}.tar.gz" "$file"
done
The -type f was added so that it doesn't also process directories, just files.
This adds a compressed archive of each file that was modified more than 10 days ago, in all subdirectories, and places the compressed archive next to its respective unarchived version (in the same folder). I assume this is what you wanted.
If you don't need to handle whitespace in the paths, you could simply do:
for f in $(find . -mtime +10 -type f) ; do
tar -cvzf "${f}.tar.gz" "$f"
done
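Since the goal is to free disk space, note that neither loop removes the originals. A small variant (a sketch only, not part of the original answer) that deletes each file once its archive has been written successfully:
find . -mtime +10 -type f -print0 | while IFS= read -r -d '' file; do
tar -cvzf "${file}.tar.gz" "$file" && rm -- "$file"
done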
Simply, try this
$ find . -mtime +10 | xargs -I {} tar czvf {}.tar.gz {}
Here, {} indicates replace-str
-I replace-str
Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.
https://linux.die.net/man/1/xargs
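The note above means names containing spaces are handled, but names containing newlines are not. If that matters, a null-delimited sketch, assuming GNU find and xargs:
find . -mtime +10 -type f -print0 | xargs -0 -I {} tar czvf {}.tar.gz {}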

7zip archiving files that are newer than a specific date

I create 7zip files like this from command line in Linux:
# 7za a /backup/files.7z /myfolder
After that I want to create another zip file that includes all files inside /myfolder that are newer than dd-mm-YY.
Is it possible to archive files with respect to the file's last change time?
(I don't want to update the "files.7z" file; I need to create another zip file that includes only the new files.)
The proposal by Gooseman:
# find myfolder -mtime -10 -exec 7za a /backup/newfile.7z {} \;
adds every file of each directory tree that received new files (because the directory itself is also new), and then adds the new files, which were just archived, a second time.
The following includes only new files but does not store the path names in the archive:
# find myfolder -type f -mtime -10 -exec 7za a /backup/newfile.7z {} \;
This stores only new files — with path names:
# find myfolder -type f -mtime -10 > /tmp/list.txt
# tar -cvf /tmp/newfile.tar -T /tmp/list.txt
# 7za a /backup/newfile.7z /tmp/newfile.tar
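A possible shortcut that avoids the temporary list and tar files, assuming GNU tar and a 7za build that supports the -si (read archive data from stdin) switch:
# find myfolder -type f -mtime -10 -print0 | tar --null -T - -cf - | 7za a -si /backup/newfile.7z
The result is a .7z containing a single tar stream, so the path names are preserved inside the tar.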
You could try this command:
find myfolder -mtime -10 -exec 7za a /backup/newfile.7z {} \;
In order to find the number to use with the mtime option, you could use some of these answers:
How to find the difference in days between two dates? In your case it would be the difference between the current date and your custom dd-mm-YY (in my example, dd-mm-YY is 10 days back from now).
From man find:
-mtime n
File's data was last modified n*24 hours ago. See the comments for -atime to understand how rounding affects the interpretation of file modification times.
Numeric arguments can be specified as -n for less than n, +n for greater than n, or n for exactly n.
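As a sketch of that day calculation, assuming GNU date (the cutoff date below is only an example; substitute your own dd-mm-YY converted to a format that date -d understands):
days=$(( ( $(date +%s) - $(date -d "2015-06-01" +%s) ) / 86400 ))
find myfolder -type f -mtime -"$days" -exec 7za a /backup/newfile.7z {} \;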

Zip together all HTML files under current directory

I am looking to zip together *.html files recursively under the current directory.
My current command is:
zip all-html-files.zip *.html
But this doesn't work recursively. Nor does adding the -r option it seems. Can anybody advise? I want to zip all html files under the current directory, including those underneath subdirectories, but zip the HTML files only, not their file folders.
Thanks!
What about this?
find /your/path/ -type f -name "*.html" | xargs zip all_html_files.zip
This looks for all the .html files under the directory /your/path (change it to yours), then pipes the result to xargs, which runs zip to create the zip file.
To junk the paths, add -j option:
find /your/path/ -type f -name "*.html" | xargs zip -j all_html_files.zip
find . -name "*.html" -print | zip all-html-files.zip -#
Try
find . -type f -name "*.html" | xargs zip all-html-files
You can also say
find . -type f -name "*.html" | zip all-html-files -#
If you do not want to preserve the directory structure, specify the -j option:
find . -type f -name "*.html" | zip -j all-html-files -#
man zip says:
-@ file lists. If a file list is specified as -@ [Not on MacOS], zip
takes the list of input files from standard input instead of from the
command line. For example,
zip -@ foo
will store the files listed one per line on stdin in foo.zip.
Under Unix, this option can be used to powerful effect in conjunction
with the find (1) command. For example, to archive all the C source
files in the current directory and its subdirectories:
find . -name "*.[ch]" -print | zip source -#
(note that the pattern must be quoted to keep the shell from expanding
it).
-j
--junk-paths
Store just the name of a saved file (junk the path), and do not
store directory names. By default, zip will store the full path
(relative to the current directory).

grep files based on time stamp

This should be pretty simple, but I am not figuring it out. I have a large code base, more than 4 GB, under Linux. A few header files and xml files are generated during the build (using GNU make). If it matters, the header files are generated from the xml files.
I want to search for a keyword in header files that were last modified after a point in time (my compile start time), and similarly for xml files, but as separate grep queries.
If I run it on all possible header or xml files, it takes a lot of time; I only need the ones that were auto-generated. Further, the search has to be recursive, since there are a lot of directories and sub-directories.
You could use the find command:
find . -mtime 0 -type f
prints a list of all files (-type f) in and below the current directory (.) that were modified in the last 24 hours (-mtime 0; to cover the last 48 h use -mtime -2, the last 72 h -mtime -3, and so on). Try
grep "pattern" $(find . -mtime 0 -type f)
To find 'pattern' in all files newer than some_file in the current directory and its sub-directories recursively:
find -newer some_file -type f -exec grep 'pattern' {} +
You could specify the timestamp directly in date -d format and use other find tests e.g., -name, -mmin.
The file list could also be generated by your build system if find is too slow.
More specific tools such as ack, etags, GCCSense might be used instead of grep.
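For example, a minimal sketch using GNU find's -newermt test with a date -d style timestamp (the timestamp, file pattern, and keyword are placeholders):
find . -type f -name '*.h' -newermt '2013-07-23 09:00' -exec grep -l 'keyword' {} +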
Use this instead, because if find doesn't return a file, grep will keep waiting for input, halting the script.
find . -mtime 0 -type f | xargs grep "pattern"
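With GNU xargs you can also pass -r (--no-run-if-empty) so that grep is not invoked at all when find matches nothing:
find . -mtime 0 -type f | xargs -r grep "pattern"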
