Watch a directory for new files, then run a script - Linux

I want to watch a directory in Ubuntu 14.04 and, when a new file is created in it, run a script.
Specifically, I have security cameras that upload captured video via FTP when they detect motion. I want to run a script on this FTP server so that when new files are created, they get mirrored (uploaded) to a cloud storage service immediately, which is done via a script I've already written.
I found iWatch, which lets me do this (http://iwatch.sourceforge.net/index.html). The problem I'm having is that iwatch kicks off the cloud upload script the instant the file is created in the FTP directory, even while the file is still being uploaded. This causes the cloud sync script to upload 0-byte files, which are useless to me.
I could add a 'wait' to the cloud upload script, but that seems hacky, and it's impossible to predict how long to wait since it depends on file size, network conditions, etc.
What's a better way to do this?

Although inotifywait was mentioned in comments, a complete solution might be useful to others. This seems to be working:
inotifywait -m -e close_write /tmp/upload/ | gawk '{print $1$3; fflush()}' | xargs -L 1 yourCommandHere
will run
yourCommandHere /tmp/upload/filename
when a newly uploaded file is closed
Notes:
inotifywait is part of the apt package inotify-tools on Ubuntu. It uses the kernel's inotify service to monitor file or directory events.
The -m option is monitor mode, which outputs one line per event to stdout.
-e close_write selects close events for files that were open for writing. Waiting for the close event avoids acting on incomplete files.
/tmp/upload can be replaced with some other directory to monitor.
The pipe to gawk reformats the inotifywait output lines: it drops the 2nd column (the event name, always the same here since only one event type is requested) and joins the directory name in column 1 with the filename in column 3 into a single path per line. fflush() defeats output buffering so xargs can act immediately.
xargs takes a list of files and runs the given command for each one, appending the filename to the end of the command. -L 1 causes xargs to run the command after each line received on standard input.
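If you'd rather skip the gawk step, inotifywait can emit the full path itself via its --format option; a variant of the same pipeline (not the original answer's exact command, and note that some inotifywait builds buffer their output when piped, so verify events come through promptly):
inotifywait -m -e close_write --format '%w%f' /tmp/upload/ | xargs -L 1 yourCommandHere
Here %w is the watched directory and %f the filename, so each event is printed as a ready-made path.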

You were close to a solution there. You can watch many different events with iwatch; the one that interests you is close_write. Syntax:
iwatch -e close_write <directory_name>
This of course only works if the file is closed once writing is complete, which is a sane assumption but not necessarily a true one (though it often is).
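iwatch can also run the command for you, so no extra glue is needed; from memory of its documentation (treat the exact option as an assumption and check man iwatch):
iwatch -e close_write -c '/path/to/your_upload_script %f' <directory_name>
where %f is iwatch's placeholder for the file that triggered the event.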

Here's another version that reacts to a filesystem event by making a POST request to a given URL.
#!/bin/bash
set -euo pipefail
cd "$(dirname "$0")"

watchRoot=$1
uri=$2

function post()
{
    # inotifywait prints "<dir> <event> <file>" per line; build a JSON body and POST it
    while read -r path action file; do
        echo '{"Directory": "", "File": ""}' |
            jq ".Directory |= \"$path\"" |
            jq ".File |= \"$file\"" |
            curl --data-binary @- -H 'Content-Type: application/json' -X POST "$uri" || continue
    done
}

inotifywait -r -m -e close_write "$watchRoot" | post
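Invocation might look like this (script name, directory, and URL are placeholders):
./watch-and-post.sh /tmp/upload http://localhost:8080/notify
The --data-binary @- tells curl to read the JSON body built by jq from standard input.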

Related

Curl command returning file regardless if it exists or not

I am running a curl command that passes cookies.txt with an authentication string generated in a previous step and attempts to download a file that is generated daily. This works great when the file exists, but the problem I'm running into is when the file is not yet released. I start listening at 5 PM, re-run the script every 5 minutes, and attempt to grab the file. Currently I check whether the file size is above a certain value, but that doesn't work very well.
Is there any way to tell curl to only create a file if the file it's attempting to grab exists? I'd really like to avoid the whole file-size check, since that's unreliable for multiple files of differing sizes.
Curl Command:
curl -b cookies.txt -J -L -v -O https://file_http_example.thespot.com/cleared_product_$(date +"%Y_%m_%d").xlsx
I try comparing the file size by doing the following:
fileSize=500
targetFileSize=$(wc -c "file_name_$(date +"%Y_%m_%d").xlsx" | awk '{print $1}')
if [ "$targetFileSize" -gt "$fileSize" ]; then
    : # keep the file
else
    rm "file_name_$(date +"%Y_%m_%d").xlsx"
fi
UPDATE:
I just ran the script with the below changes and I still saved a file. This is what's saved in the file:
Assuming the server responds with an HTTP error when the file isn't available, you can use curl's -f flag:
-f, --fail Fail silently (no output at all) on HTTP errors
This will avoid creating a file when the server responds with an HTTP 404 or other error code.
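Applied to the command from the question (same placeholder URL), that might look like:
curl -f -b cookies.txt -J -L -O "https://file_http_example.thespot.com/cleared_product_$(date +"%Y_%m_%d").xlsx" || echo "file not available yet"
With --fail, curl exits non-zero on an HTTP error and writes no output file, so the || branch is a convenient place to log or retry.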

List files which have a corresponding "ready" file

I have a service "A" which generates some compressed files from the data it receives in requests. In parallel, there is another service "B" which consumes these compressed files.
The trick is that "B" shouldn't consume any of the files unless they are completely written. Service "B" deduces this by looking for a ".ready" file that service "A" creates once compression is done, with exactly the same name as the generated file plus the ".ready" extension. Service "B" uses Apache Camel to do this filtering.
Now I am writing a shell script which needs the same compressed files, so the same filtering has to be implemented in shell. I need help writing this script. I am aware of the find command, but I am a naive shell user with very limited knowledge.
Example:
Compressed file: sumit_20171118_1.gz
Corresponding ready file: sumit_20171118_1.gz.ready
Another compressed file: sumit_20171118_2.gz
No ready file is present for this one.
Of the above listed files only the first should be picked up as it has a corresponding ready file.
The most obvious way would be to use a busy loop. But if you are on GNU/Linux you can do better than that (from: https://www.gnu.org/software/parallel/man.html#EXAMPLE:-GNU-Parallel-as-dir-processor)
inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |
parallel -uj1 echo Do stuff to file {}
This way you do not even have to wait for the .ready file: The command will only be run when writing to the file is finished and the file is closed.
If, however, the .ready file is only written much later then you can search for that one:
inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |
grep --line-buffered '\.ready$' |
parallel -uj1 echo Do stuff to file {.}
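The {.} replacement string is the input line with its last extension removed, so the command receives sumit_20171118_1.gz rather than the .ready marker itself. If you only need a one-shot listing of the files that already have a ready marker (closer to the find-style filtering asked about), a simple sketch, with the directory as a placeholder, is:
for f in /path/to/dir/*.gz; do
    [ -e "$f.ready" ] && echo "$f"
done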

Listing files while working with them - Shell Linux

I have a database server whose basic job is to import some specific files, do some calculations, and provide the data in a web interface.
A hardware replacement is planned for the next few weeks, which means migrating the database. But there's one problem: the current database is corrupted and shows some errors in the web interface. This is due to the server freezing while importing/calculating, which is why it's being replaced.
So I'm not willing to just dump the DB and restore it on the new server. It doesn't make sense to keep using the corrupted database, and the old server is really slow while dumping. I have a backup of all the files to be imported (currently 551) and I'm working on a script to "re-import" all of them and end up with a clean database again.
The current server takes ~20 minutes to import each new file. Let's say the new server takes 10 per file thanks to its extra power... that's still a long time! And here comes the problem: it receives a new file hourly, so there will be more files by the time the script finishes the job.
Restore script start like this:
for a in $(ls $BACKUP_DIR | grep part_of_filename); do
Question is: will this "ls" pick up new file names when they arrive? File names are timestamp based, so they will be at the end of the list.
Or is this "ls" executed once, with the results going to a temp var?
Thanks.
ls will execute once, at the beginning, and any new files won't show up.
You can rewrite that statement to list the files again at the start of each loop (and, as Trey mentioned, better to use find, not ls):
while all=$(find $BACKUP_DIR/* -type f | grep part_of_filename); do
for a in $all; do
But this has a major problem: it will repeatedly process the same files over and over again.
The script needs to record which files are done. Then it can list the directory again and process any (and only) new files. Here's one way:
touch ~/done.list
cd $BACKUP_DIR
# loop while f=first file not in done list:
# find list the files; more portable and safer than ls in pipes and scripts
# fgrep -v -f ~/done.list pass through only files not in the done list
# head -n1 pass through only the first one
# grep . control the loop (true iff there is something)
while f=`find * -type f | fgrep -v -f ~/done.list | head -n1 | grep .`; do
<process file $f>
echo "$f" >> ~/done.list
done
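Since new files keep arriving hourly, another option, in the spirit of the inotifywait answers earlier on this page, is to drain the backlog once with the loop above and then let close_write events feed the same processing step (a sketch; <process file ...> is the same placeholder used above):
cd $BACKUP_DIR
inotifywait -m -e close_write --format '%f' . |
while read -r f; do
    <process file $f>
    echo "$f" >> ~/done.list
done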

Run additional command when rsync detects a file

I am currently running the following script to make an automatic backup of my Music:
#!/bin/bash
while :; do
rsync -ruv /mnt/hdd1/Music/ /mnt/hdd2/Music/
done
Whenever a new file is added to my music folder, it is detected by rsync and it is copied to my other disk. This script runs fine, but I would also like to convert the detected file to an ogg opus file for putting on my phone.
My question is: How do I run a command on a new file found by rsync -u?
I will also accept answers which work totally differently, but have the same result.
rsync -ruv /mnt/hdd1/Music /mnt/hdd2/ | sed -n 's|^Music/||p' >~/filelist.tmp
while IFS= read -r filename
do
[ -f "$filename" ] || continue
# do something with file
echo "Now processing '$filename'"
done <~/filelist.tmp
With the -v option, rsync prints the names of the files it copies to stdout. I use sed to capture just those filenames, excluding the informational messages, into a file. The filenames in that file can then be processed later as you like.
The sed approach above depends on rsync displaying filenames that start with the final component of the source directory, e.g. "Music/" in my example above, which is then stripped on the assumption that you don't need it. Alternatively, one could take an explicit approach to excluding the noise messages.
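For the Opus conversion mentioned in the question, the "do something with file" step could become a transcode call. A sketch, assuming ffmpeg is built with libopus; the output directory is a placeholder, and $filename is relative to the Music directory because the sed step stripped the "Music/" prefix:
mkdir -p "/mnt/hdd2/Opus/$(dirname "$filename")"
ffmpeg -i "/mnt/hdd1/Music/$filename" -c:a libopus -b:a 96k "/mnt/hdd2/Opus/${filename%.*}.opus"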

inotifywait command not detecting files but folders it does

I'm trying to use inotifywait to detect, in real time, every time a file or folder gets moved into a folder (e.g. /root in this case).
I tried this, which does detect both folders and files, but it fires on created files; I want it for a moved file/folder.
inotifywait --monitor --format %f --event create /root
So then I used this, but it only reports when a folder is moved in; when I move in a file, nothing is shown... :(
inotifywait --monitor --format %f --event moved_to /root
Any idea what's going on?
PS, I'm using Linux, Debian 5 (Lenny).
You can specify many events with inotify. In your case it seems you need something like:
inotifywait --monitor --format %f --event move --event create /root
It should work. If you need more, read the man page carefully:
-e <event>, --event <event>
Listen for specific event(s) only.
The events which can be listened for are listed in the EVENTS section.
This option can be specified more than once.
If omitted, all events are listened for.
[...]
EVENTS
The following events are valid for use with the -e option:
[...]
move A file or directory was moved from or to a watched directory.
Note that this is actually implemented simply by listening for both moved_to
and moved_from, hence all close events received will be output as one or both
of these, not MOVE.
create A file or directory was created within a watched directory.
It works for me with move / touch. Hope it helps ...
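To act on each detected item, the same command can feed a read loop; a sketch building on the command above, with the echo standing in for whatever you want to run:
inotifywait --monitor --format '%w%f' --event move --event create /root |
while read -r path; do
    echo "new item: $path"
done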
