detect if something is modified in directory, and if so, backup - otherwise do nothing - linux

I have a "Data" directory, that I rsync to a remote NAS periodically via a shell script.
However, I'd like to make this more efficient. I'd like to detect if something has changed in "Data" before running rsync. This is so that I don't wake up the drives on the NAS unnecessarily.
I was thinking of modifying the shell script to get the latest modified time of the files in Data (by using a recursive find), and write that to a file every time Data is rsynced.
Before every sync, the shell script can compare the current timestamp of "Data" with the previous timestamp when "Data" was sync'd. If the current timestamp is newer, then rsync, otherwise do nothing.
My question is, is there a more efficient way to figure out if the "Data" directory has been modified since the last rsync? Note that Data has many, many layers of sub-directories.

If I understand correctly, you just want to see if any files have been modified so you can figure out whether to proceed to the rsync portion of your script?
It's a pretty simple task to figure out when the data was last synced, especially if you do this nightly. As soon as you find one file with mtime greater than the time of the last sync, you know you have to proceed to the full rsync.
find has this functionality built in:
# find all files modified in the last 24 hours
find . -mtime -1
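(Note that -mtime -1 means "less than one day old"; a bare -mtime 1 would match only files between 24 and 48 hours old.) A minimal sketch of how that check might gate the sync, assuming the script runs once a day from cron and the NAS destination is a placeholder (GNU find's -print -quit stops at the first match, so the test is cheap):
#!/bin/sh
# Sync only if at least one file under Data changed in the last 24 hours.
if [ -n "$(find Data -mtime -1 -print -quit)" ]; then
    rsync -a Data/ nas:/backup/Data/
fi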

Rsync already does this. There is no on-demand solution that doesn't require checking the mtime and ctime properties of the inodes.
However, you could create a daemon that uses inotify to track changes as they occur, and fire rsync at intervals, or whenever you feel sufficient events have occurred to justify calling rsync.
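For instance, a rough sketch using inotifywait from the inotify-tools package (paths and the rsync destination are placeholders):
#!/bin/sh
# Block until something changes anywhere under Data, then sync.
while inotifywait -r -e modify -e create -e delete -e move Data; do
    rsync -a Data/ nas:/backup/Data/
done
Note that setting up recursive watches on a very deep tree takes time and kernel resources, so this trades startup cost for not having to scan on every run.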

I would use the find command, but do it this way: When the rsync runs, touch a file, like "rsyncranflag". Then you can run
find Data -newer rsyncranflag
That will say definitively whether any files were changed since the last rsync (subject to the accuracy of mtime).
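Put together, the whole check might look something like this (the marker location and rsync destination are placeholders; GNU find's -print -quit stops at the first match):
#!/bin/sh
MARKER=/var/tmp/rsyncranflag
# Sync if the marker doesn't exist yet, or if any file is newer than it.
if [ ! -e "$MARKER" ] || [ -n "$(find Data -newer "$MARKER" -print -quit)" ]; then
    rsync -a Data/ nas:/backup/Data/ && touch "$MARKER"
fi
Touching the marker only after a successful rsync means a failed sync will simply be retried on the next run.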

Related

How to get all changes you made to your config files (since system install) in one shot?

I wonder if there is any way I could retrieve all changes I made to my various configuration files since install (residing in /etc and so on) in one shot?
I imagine some kind of loop that uses 'diff' to compare all those files to a 'standard installation' of Ubuntu. Output should be a single file with information regarding the changes that were made and a timestamp.
Perhaps there is even a way to put all that in a script and let it run regularly to automatically keep track of future config file changes.
If the files are already modified, I guess your only option is to diff your files against a fresh install. Keep in mind some files might be specific to your computer; I'm thinking of files that hold device-specific values, like your MAC address (/etc/udev/rules.d/70-persistent-net.rules), your drives' UUIDs (/etc/fstab), etc.
If you're planning this ahead, there are at least two options you can consider:
use a VCS such as git (a minimal sketch follows after this list).
use a filesystem that keeps a complete history of the changes made.
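As a minimal sketch of the git option (the etckeeper package automates essentially this workflow):
# Put /etc under version control and record a baseline.
cd /etc
sudo git init
sudo git add .
sudo git commit -m "baseline configuration"
# Later, after editing config files:
sudo git status     # list modified files
sudo git diff       # show the changes themselves
sudo git commit -am "describe the change"
(git may ask you to configure user.name and user.email before the first commit.)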

Mysterious find command hogging memory on Linux Mint

I'm running Linux Mint 17 and I notice that every so often my computer slows to a crawl. When I look at top I see "/usr/bin/find / -ignore_readdir_race (..." etc. sucking up most of my memory. It runs for a really long time (several hours) and my guess is that it's an automated indexing process for my hard drive.
I'm working on a project that requires me to have over 6 million audio files on a mounted SSD so another guess is that the filesystem manager is trying to index all these files for quick search. Is that the case? Is there any way to turn it off for the SSD?
The locate command reports data collected for its database by a regular cron task. You can exclude directories from the database, making the task run more quickly. According to updatedb.conf(5):
PRUNEPATHS
A whitespace-separated list of path names of directories which should not be scanned by updatedb(8). Each path name must be exactly in the form in which the directory would be reported by locate(1).
By default, no paths are skipped.
On my Debian machine for instance, /etc/updatedb.conf contains this line:
PRUNEPATHS="/tmp /var/spool /media"
You could modify your /etc/updatedb.conf to add the directories which you want to ignore. Only the top-level directory of a directory tree need be listed; subdirectories are ignored when the parent is ignored.
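In this case, you might append the SSD's mount point (shown here as a hypothetical /mnt/audio):
PRUNEPATHS="/tmp /var/spool /media /mnt/audio"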
Further reading:
Tip of the day: Speed up `locate`
How do I get mlocate to only index certain directories?
It's a daily cron job that updates databases used by the locate command. See updatedb(8) if you want to learn more. Having six million audio files will likely cause this process to eat up a lot of CPU as it's trying to index your local filesystems.
If you don't use locate, I'd recommend simply disabling updatedb, something like this:
# stop the currently running scan
sudo kill -9 <PID>
# keep the daily cron job from starting it again
sudo chmod -x /etc/cron.daily/mlocate
# move the existing database out of the way
sudo mv /var/lib/mlocate/mlocate.db /var/lib/mlocate/mlocate.db.bak
If all else fails, just remove the package.
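On a Debian-based distribution such as Mint, that would be something like:
sudo apt-get remove mlocate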

Linux ~/.bashrc export most recent directory

I have several environment variables in my ~/.bashrc that point to different directories. I am running a program that creates a new folder every time that it runs and puts a time stamp in the directory name. For example, baseline_2015_11_10_15_40_31-model-stride_1-type_1. Is there a way of making a variable that always points to the last created directory?
cd $CURRENT_DIR
Your mileage may vary a lot depending on what exactly you need to accomplish. However, in almost all cases I would advise against doing something as weird and unreliable as what's described below, and instead revising your architecture to avoid hunting for directories.
Method 1
If your program creates a subdirectory inside the current directory, and you always know that nothing else happens in that directory, and you want the most recently modified subdirectory, then you can do something like:
your_complex_program_that_creates_dir
TARGET_DIR=$(ls -t1 --group-directories-first | head -n1)
cd "$TARGET_DIR"
Method 2
If a lot of stuff happens on the system, then you'll end up monitoring what your program does with the filesystem and reacting when it creates a directory. There are two ways to do that, using strace or inotify; both are relatively complex. Here's the way to do it with strace:
strace -o some_temp_file.strace your_complex_program_that_creates_dir
TARGET_DIR=$(sed -ne '/^mkdir(/ { s/^mkdir("\(.*\)", .*).*$/\1/; p }' some_temp_file.strace)
cd "$TARGET_DIR"
This snippet runs your_complex_program_that_creates_dir under the control of strace, which essentially logs every system call your program makes into a file. Afterwards, that file is searched for a line like
mkdir("target_dir", 0777) = 0
and the value "target_dir" is extracted into a variable. Note that:
if your program creates more than one directory (even for temporary purposes, deleting them afterwards) — there's really no way to determine which of them to grab
running a program under strace is much slower than normal due to the huge overhead of logging all the syscalls
it's super non-portable — facilities like strace exist on most modern OSes, but implementations vary a lot
A solution with inotify works in the same way but uses a different mechanism: the OS reports the operations a process performs on the file system, and you react to them (remember the created directory).
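A rough sketch of that variant, using inotifywait from inotify-tools (file names are assumed not to contain spaces, and some inotifywait versions buffer output when it is redirected, so events may lag):
#!/bin/sh
# Watch the current directory for new entries while the program runs.
inotifywait -m -q -e create --format '%e %f' . > events.tmp &
WATCH_PID=$!
your_complex_program_that_creates_dir
kill "$WATCH_PID"
# Directory creations are tagged ISDIR; keep the last one seen.
TARGET_DIR=$(awk '/ISDIR/ { name=$2 } END { print name }' events.tmp)
cd "$TARGET_DIR"
The same caveat applies: if the program creates several directories, you still have to decide which one you actually want.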
However, I repeat, I'd strongly suggest against using any of these solutions beyond research interest.

Symbolic link to latest file in a folder

I have a program which requires the path to various files. The files live in different folders and are constantly updated, at irregular intervals.
When the files are updated, they change name, so, for instance, in the folder dir1 I have fv01 and fv02. Later in the day someone adds fv02_v1; the day after, someone adds fv03, and so on. In other words, I always have an updated file, but with a different name.
I want to create a symbolic link in my "run" folder to these files, such that said link always points to the latest file created.
I can do this in Python or Bash, but I was wondering what is out there, as this is hardly an uncommon problem.
How would you go about it?
Thank you.
Juan
PS. My operating system is Linux. I currently have a simple daemon (Python) that looks every once in a while (refreshes every minute) for the latest file. Seems like overkill to me.
Unless there is some compelling reason that you have left unstated (e.g. thousands of files in the directory) just do it the way you suggest with a script sorting the files by modification time. There is no secret method that I am aware of.
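As a minimal sketch, using dir1 and the run folder from the question (and assuming file names without newlines, since it parses ls output):
#!/bin/sh
# Point run/latest at the most recently modified file in dir1.
latest=$(ls -t dir1 | head -n 1)
ln -sfn "dir1/$latest" run/latest
Run it from cron, or from your existing daemon, as often as the link needs refreshing.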
You could write a daemon using inotify to monitor your directories and immediately set your links but that seems like overkill.
Edit: I just saw your edit. Since you have the daemon already, inotify might not be such a bad idea. It would be somewhat more efficient than constantly querying since the OS will tell you when something in your directories has changed.
I don't know python well enough to point you to anything specific but there must exist a wrapper for inotify.

Can /tmp in Linux ever fill up?

I'm putting some files in /tmp on a web server that are being used by a web application for a limited amount of time. If the files get left in the server's /tmp after the user quits using the application, and this happens repeatedly, should I be concerned about the directory filling up? I read online that rebooting cleans out the /tmp directory, but this box doesn't get rebooted very often.
Tom
Yes, it will fill up. Consider implementing a cron job that will delete old files after a while.
Something like this should do the trick:
/usr/bin/find /tmp/mydata -type f -atime +1 -exec rm -f {} \;
This will delete files whose access time is more than a day old (note the -atime test; use -mtime instead if you want to go by modification time).
Or as a crontab entry:
# run five minutes after midnight, every day
5 0 * * * /usr/bin/find /tmp/mydata -type f -atime +1 -exec rm -f {} \;
where /tmp/mydata is a subdirectory where your application stores its temporary files. (Simply deleting old files under /tmp would be a very bad idea, as someone else pointed out here.)
Look at the crontab and find man pages for the details. Don't go running scripts that delete files on your filesystem without understanding all the details - that's how bad things happen to good servers. :)
Of course, if you can just modify your application to delete temporary files when it's done with them, that would be a far better solution, generally.
You can't just blindly delete everything that hasn't been modified for a certain amount of time. A lot of programs store sockets in there, which never get modified but are still an integral part of the program working. Take for instance mysql from one of my servers:
srwxrwxrwx 1 mysql mysql 0 Sep 11 04:01 mysql.sock=
That's a valid, working "file" in /tmp. It just looks old because mysql hasn't been restarted in a while. Either limit your find with '-type f' or '-atime', or use one of the distro-provided tools others have mentioned.
The only thing you can write to without worrying it will fill up is /dev/null. Everything else will eventually run out of space if you keep dumping things in it.
One simple approach would be to have a cron job clean up all your /tmp files that are older than, say, a few days.
Yep, it will be linked to one of your disks/partitions and can fill up.
It typically gets cleaned out on reboot.
When the user quits the application you should clean the files up after them.
In which language is your web application written? A lot of languages provide temporary-file facilities:
C
python
php
...
Check whether your language has such a feature.
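In shell, for example, mktemp plus an exit trap gives you a temporary file that cleans itself up:
#!/bin/sh
# Create a unique temp file and remove it when the script exits.
tmpfile=$(mktemp /tmp/mydata.XXXXXX)
trap 'rm -f "$tmpfile"' EXIT
# ... use "$tmpfile" here ...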
Just a warning: not all Linux installations clean the /tmp directory on reboot.
Some Linux distros have a package that will clean up old files in /tmp for you. It isn't hard to implement your own, as mentioned above. One thing to look out for is long-running processes that still hold files in /tmp open: if a process has a file open, just deleting the file from /tmp won't actually reclaim its space until the process closes it or exits. Many programs that write log or temporary files are designed to catch a signal (often SIGUSR1) and close and re-open their log or temporary files for exactly that reason.
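To see which processes are still holding deleted files open (and therefore still consuming space), something like this works on most Linux systems:
# List open files with a link count of zero, i.e. deleted but still held.
lsof +L1 | grep /tmp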
Many Linux distributions include something named 'tmpwatch' or similar, which runs via cron and deletes things according to pre-defined age rules. Some are smart enough to go by the owner of the file: stuff that is owned by daemon users gets cleaned out faster than stuff owned by regular users. Check the mailing lists for your distro of choice to find out.
Still, you should have SNMP or some other kind of monitoring watching how much room is available; if /tmp fills up, services like Apache aren't going to be happy. For instance, e-accelerator for PHP needs plenty of room, some mail scanners don't clean up properly, etc.
