Script to zip complete file structure depending on file age - linux

Alright, so I have a web server running CentOS at work that hosts a few websites, internally only. It's our development server and thus has lots [read: tons] of old junk websites and whatnot.
I was trying to put together a command that would find files that haven't been modified for over 6 months, group them all into a tarball and then delete them. So far I have tried many different kinds of find commands with various arguments. Our structure looks like this:
/var/www/joomla/username/fileshere/temp
/var/www/username/fileshere
So I tried something along the lines of:
find /var/www -mtime -900 ! -mtime -180 | xargs tar -cf test4.tar
Only to end up with a 10 MB tar, when the expected result would be over 50 GB.
I tried using gzip instead, but I ended up zipping MY WHOLE SERVER, making it unusable; I had to transfer the whole filesystem and reinstall a completely new server, with a lot of trouble along the way... you get the idea. So I want to find the right command that won't blow up our server but will find all FILES and DIRECTORIES that haven't been modified for over 6 months.

Be careful with ctime.
ctime relates to changes made to the inode (permissions, owner, etc.).
atime is when a file was last accessed (check whether your file system is mounted with the noatime or relatime options; in that case atime may not behave the way you expect).
mtime is when the data in a file was last modified.
Depending on what you are trying to do, mtime is probably your best option.
You should also look at the -print0 option. From man find:
-print0
True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or
other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
I do not know exactly what you are trying to do, but this command could be useful:
find /var/www -mtime +180 -print0 | xargs -0 tar -czf example.tar.gz
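One caveat with the xargs form here and in the question: if the file list is long enough that xargs splits it across several tar invocations, each invocation overwrites the previous archive, which would explain an unexpectedly small result. A minimal sketch (assuming GNU tar) that avoids xargs by letting tar read the null-delimited list itself:
# --null tells -T to expect null-terminated names, and -T - reads them from stdin
find /var/www -mtime +180 -print0 | tar -czf example.tar.gz --null -T -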

Try this:
find /var/www -ctime +180 | xargs tar cf test.tar
The -ctime test measures the difference between the current time and each file's status change time (when its inode was last changed); using + instead of - selects files older than the given number of days.
Then just pass the list to tar with xargs and you should be set.

Related

How to find and replace an IP address in many archives in linux

Example:
find /tmp/example -type f -print0 | xargs -0 sed -i 's/10.20.1.110/10.10.1.40/g'
I need to replace 10.20.1.110 with 10.10.1.40 in all archives inside /tmp/example.
But this command does not replace anything inside the archives.
The file types are *.xml, *.txt, *.py, *.jy.
These are not archives but ordinary text file extensions, so if the sed command doesn't work for you, there must be another reason. It may be that the command is executed with insufficient privileges: sed -i exits as soon as it cannot rename its temporary output file to the input file (as is the case if the containing directory has the sticky bit t set and you don't own the file or the directory). Pay heed to the error messages.
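For example, a quick way to check for the conditions described above (a sketch; /tmp/example is the directory from the question, and -writable assumes GNU find):
ls -ld /tmp/example                     # a trailing 't' in the mode string means the sticky bit is set
find /tmp/example -type f ! -writable   # lists files your user cannot modify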

Retrieving the sub-directory, which had most recently been modified, in a Linux shell script?

How can I retrieve the sub-directory that was most recently modified within a given directory?
I am using a shell script on a Linux distribution (Ubuntu).
Sounds like you want the ls options
-t sort by modification time, newest first
To show only directories, use something like this answer suggests (Listing only directories using ls in bash: An examination):
ls -d */
If you want each directory listed on its own line (assuming your file/directory names contain no newlines or other odd characters), I'd also add -1. Putting it all together, this lists the directories in the current directory with the most recently modified at the top:
ls -1td */
And only the single newest directory:
ls -1td */ | head -n 1
Or, if you want to compare against a specific time, you can use find and its options such as -cmin, -cnewer, -ctime, -mmin and -mtime; find can also handle awkward names (newlines, spaces, etc.) via null-terminated output options like -print0.
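For example, a minimal sketch (the 60-minute window is an arbitrary example) that lists subdirectories of the current directory modified within the last hour, newest first:
# -mmin -60 selects directories modified less than 60 minutes ago;
# -print0/-0 and -r keep it safe for odd names and an empty result
find . -mindepth 1 -maxdepth 1 -type d -mmin -60 -print0 | xargs -0 -r ls -1dt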
How much the subdirectory is modified is irrelevant. Do you know the name of the subdirectory? Get its content like this:
files=$(ls subdir-name)
for file in ${files}; do
    echo "I see there is a file named ${file}"
done

Zipping and deleting files with certain age

I'm trying to put together a command that will find files that haven't been modified in over 6 months and zip them in one go. Afterwards I want to delete all the files I just archived.
My current command to find the directories with the files is:
find /var/www -type d -mtime -400 ! -mtime -180 | xargs ls -l > testd.txt
This gave me all the directories, including the files, that are older than 6 months.
Now I was wondering if there is a way of zipping all the results and deleting them afterwards. Something along the lines of:
find /var/www -type f -mtime -400 ! -mtime -180 | gzip -c archive.gz
If anyone knows the proper syntax to achieve this, I'd love to know. Thanks!
Edit: after a few tests, this command results in a corrupted file:
find /var/www -mtime -900 ! -mtime -180 | xargs tar -cf test4.tar
Any ideas?
Break this into several distinct steps that you can implement and thoroughly test separately:
1. Build a list of files to be archived and then deleted, saved to a temp file.
2. Use the list from step 1 to add the files to .tar.gz archives. Give the archive file a name following a specific pattern that won't appear in the files to be archived, and put it in a directory outside the hierarchy of files being archived.
3. Read back the files from the .tar.gz and compare them (or their hashes) to the original files to ENSURE that you got them all without corruption.
4. Use the list from step 1 to delete the files. Do not use a wildcard for deletion. Put in some guard code to prevent deletion of any file matching the name pattern of the archive .tar.gz file(s) created in step 2.
When testing a script that can do irreversible damage, always code the dangerous command with a leading echo and leave it that way until you are sure everything works. Only then remove the echo.
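A minimal sketch of those steps (the paths, the 180-day cutoff, and the assumption that no file names contain newlines are all mine, not the question's):
# 1. build the list of candidate files, stored outside /var/www
LIST=/root/archive-list.txt
ARCHIVE=/root/old-files-$(date +%Y%m%d).tar.gz
find /var/www -type f -mtime +180 > "$LIST"

# 2. archive the files named in the list
tar -czf "$ARCHIVE" -T "$LIST"

# 3. compare the archive against the files on disk before touching anything
tar -dzf "$ARCHIVE" || { echo "verification failed, not deleting"; exit 1; }

# 4. delete, guarded by a leading echo until you have tested everything
while IFS= read -r f; do
    echo rm -- "$f"
done < "$LIST"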
Consider zip; it should meet your requirements.
find ... | zip -m@ archive.zip
-m (move) deletes the input directories/files after making the specified zip archive.
-@ takes the list of input files from standard input.
You may find more useful options in the zip manual, e.g.
-r (recurse) travels the directory structure recursively.
-sf (show-files) shows the files that would be operated on, then exits.
-t or --from-date operates on files not modified prior to the specified date.
-tt or --before-date operates on files not modified after or at the specified date.
This could possibly make find expendable:
zip -mr --from-date 2012-09-05 --before-date 2013-04-13 archive /var/www
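It may be prudent to preview the selection first with -sf, which only lists what would be archived and then exits (same example dates as above):
zip -sf -r --from-date 2012-09-05 --before-date 2013-04-13 archive /var/www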

sed not working as expected, but only for directory depth greater than 1

I am trying to find all instances of a string in all files on my system up to a specified directory depth. I then want to replace these with another string and I am using 'find' and 'sed' by piping one into the other.
This works when I use a base path such as cd /home/../.. or any other directory that isn't "/". It also only works if I select a directory depth of 1 (so /test.txt is changed, but /home/test.txt isn't). If I change nothing else and use, say, a depth of 2 or 3, neither /test.txt nor /home/test.txt is changed: in the former case no warnings appear, and in the latter I get the results below (and no strings are replaced in either file).
Worryingly, it did work once out of the blue, but I have no idea how and I can't recreate the results. I should say I know the risks of using these commands as root from the base directory; the specific use of the programs below is intentional, so I am not looking for an alternative approach, just a clue as to why this isn't working and perhaps a suggestion on how to fix it.
cd /;find . -maxdepth 3 -type f -print0 | xargs -0 sed -i 's/teststring123/itworked/gI'
sed: couldn't open temporary file ./sys/kernel/sedoPGqGB: No such file or directory
sed: couldn't open temporary file ./proc/878/sedtqayiq: No such file or directory
As you can see, there are warnings, but nevertheless I would expect it to work; the commands appear fine. Is there anything I am missing, folks?
This should be:
find / -maxdepth 3 -type f -print -exec sed -i -e 's/teststring123/itworked/g' {} \;
Although changing all files below / strikes me as a very bad idea indeed (I hope you're not running as root!).
The "couldn't open temporary file ./[...]" errors are likely to be because sed, running as your user, doesn't have permission to create files in /.
My version runs from your current working directory, I assume your ${HOME}, where you'll be able to create the temporary file, but you're still unlikely to be able to replace those files vital to the continued running of your operating system.
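If you really do want to run against /, a sketch (assuming GNU find) that prunes the virtual /proc and /sys trees, which is where those temporary-file errors came from:
find / -maxdepth 3 \( -path /proc -o -path /sys \) -prune -o -type f -print0 | xargs -0 sed -i 's/teststring123/itworked/gI'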

how do I check that two folders are the same in linux

I have moved a web site from one server to another and copied the files using SCP.
I now wish to check that all the files have been copied OK.
How do I compare the sites?
Count the files in each folder?
Get the total file size for each folder tree?
Or is there a better way to compare the sites?
Paul
Use diff with the recursive -r and quick -q options. It is the best and by far the fastest way to do this.
diff -r -q /path/to/dir1 /path/to/dir2
It won't tell you what the differences are (remove the -q option to see that), but it will very quickly tell you if all the files are the same.
If it shows no output, all the files are the same, otherwise it will list the files that are different.
If you were using scp, you could probably have used rsync.
rsync won't transfer files that are already up to date, so you can use it to verify a copy is current by simply running rsync again.
If you were doing something like this on the old host:
scp -r from/my/dir newhost:/to/new/dir
Then you could do something like
rsync -a --progress from/my/dir newhost:/to/new/dir
The '-a' is short for 'archive' which does a recursive copy and preserves permissions, ownerships etc. Check the man page for more info, as it can do a lot of clever things.
cd website
find . -type f -print | sort | xargs sha1sum
will produce a list of checksums for the files. You can then diff those to see if there are any missing/added/different files.
maybe you can use something similar to this:
find <original root dir> | xargs md5sum > original
find <new root dir> | xargs md5sum > new
diff original new
To add to Sidney's reply: it is not strictly necessary to filter with -type f or to produce hashes at all. In reply to zidarsk8, you may be able to skip the sort, though note that find, unlike ls, does not guarantee alphabetical order, so keeping the sort is the safer choice. This approach works for empty directories as well.
To summarize, the top 3 answers would be (P.S. it's nice to do a dry run with rsync first):
diff -r -q /path/to/dir1 /path/to/dir2
diff <(cd dir1 && find) <(cd dir2 && find)
rsync --dry-run -avh from/my/dir newhost:/to/new/dir
Make checksums for all files, for example using md5sum. If they're all the same for all the files and no file is missing, everything's OK.
If you used scp, you probably can also use rsync over ssh.
rsync -avH --delete-after 1.example.com:/path/to/your/dir 2.example.com:/path/to/your/
rsync can do the checksums for you (add -c to force a checksum comparison).
Be sure to use the -n option to perform a dry run. Check the manual page.
I prefer rsync over scp or even local cp, every time I can use it.
If rsync is not an option, md5sum can generate md5 digests and md5sum --check will verify them.
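For instance, a minimal sketch (the /var/www path and the /tmp/site.md5 file name are assumptions): generate the digests on the old host, copy the digest file to the new host, and verify there:
# on the old host
cd /var/www && find . -type f -print0 | xargs -0 md5sum > /tmp/site.md5
# copy /tmp/site.md5 to the new host, then on the new host:
cd /var/www && md5sum --check --quiet /tmp/site.md5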
Try diffing your directory recursively. You'll get a nice summary if something is different in one of the directories.
I have moved a web site from one server to another and copied the files using SCP.
You could do this with rsync; it is great if you just want to mirror something.
/Johan
Update: Seems like rjack beat me to the rsync answer by 6 seconds :-)
I would add this to Douglas Leeder's or Eineki's answer, but sadly I don't have enough reputation to comment. Anyway, their answers are both great, except that they don't work for file names with spaces. To make that work, do:
find [dir1] -type f -print0 | xargs -0 [preferred hash function] > [file1]
find [dir2] -type f -print0 | xargs -0 [preferred hash function] > [file2]
diff -y [file1] [file2]
Just from experimenting, I also like to use the -W ### argument on diff and send the output to a file; it's easier to parse and understand in the terminal.
...when comparing two folders across a network drive or on separate computers
If comparing two folders on the same computer, diff is fine, as explained by the main answer.
However, if trying to compare two folders on different computers, or across a network, don't do that! If across a network, it will take forever since it has to actually transmit every byte of every file in the folder across the network. So, if you are comparing a 3 GB dir, all 3 GB have to be transferred across the network just to see if the remote dir and local dir are the same.
Instead, use a SHA256 hash. Hash the dir on one computer on that computer, and on the other computer on that computer. Here is how:
(From my answer here: How to hash all files in an entire directory, including the filenames as well as their contents):
# 1. First, cd to the dir in which the dir of interest is found. This is
# important! If you don't do this, then the paths output by find will differ
# between the two computers since the absolute paths to `mydir` differ. We are
# going to hash the paths too, not just the file contents, so this matters.
cd /home/gabriel # example on computer 1
cd /home/gabriel/dev/repos # example on computer 2
# 2. hash all files inside `mydir`, then hash the list of all hashes and their
# respective file paths. This obtains one single final hash. Sorting is
# necessary by piping to `sort` to ensure we get a consistent file order in
# order to ensure a consistent final hash result. Piping to awk extracts
# just the hash.
find mydir -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'
Example run and output:
$ find eclipse-workspace -type f -exec sha256sum {} + | sort | sha256sum | awk '{print $1}'
8f493478e7bb77f1d025cba31068c1f1c8e1eab436f8a3cf79d6e60abe2cd2e4
Do this on each computer, then ensure the hashes are the same to know if the directories are the same.
Note that the above commands ignore empty directories, file permissions, timestamps of when files were last edited, etc. For most cases though that's ok.
You can also use rsync to basically do this same thing for you, even when copying or comparing across a network.
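For example, a rough sketch of that rsync comparison (the host name and paths are placeholders): -c forces a checksum comparison and -n makes it a dry run, so nothing is copied and only differing paths are reported:
rsync -avnc mydir/ user@remotehost:/path/to/mydir/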
