How to ensure only one cron entry with Ansible?

I use the cron module to add a crontab entry to managed hosts.
I realized that some of them already have this entry (it was probably added manually) and these systems now have two identical entries (and two jobs starting at the same time).
Is there a way to ensure that only one entry matching requirements (timing and command) is present?
I had a similar problem with lineinfile and ended up first deleting all matching lines, then recreating a single one (in the meantime I found a post explaining the exact same approach). This works for files, though, and not for crontabs (which are ultimately files, but I doubt I can safely edit them directly).

If you're on a generic Linux machine, you should be able to place a file in /etc/cron.d/. Files there use the system crontab format, so you specify the user as the sixth field and the command as the seventh.
Place one "entry" per file, so you might have /etc/cron.d/cleanup, /etc/cron.d/makeasandwich, and so on. You can use copy or template to generate them.
There's also a cron module, but it shares some of the serious problems that lineinfile has, because it manages individual entries rather than the whole file. It's much better to be confident about what your server configuration is, and owning the entire file gives you that.
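As a rough illustration of the one-file-per-job idea, here is a minimal shell sketch of what ends up on the host. The helper name write_cron_file, its arguments, and the example schedule and script path are made up for illustration; in a playbook, copy or template would write the same single line.

```shell
# Hypothetical helper: write one job to its own file in a cron.d-style directory.
write_cron_file() {
    dir=$1 name=$2 schedule=$3 user=$4 command=$5
    # System crontab format: five time fields, then the user, then the command.
    # Overwriting the whole file keeps exactly one entry, however often this runs.
    printf '%s %s %s\n' "$schedule" "$user" "$command" > "$dir/$name" &&
        chmod 644 "$dir/$name"
}

# Example call (as root; schedule and script path are placeholders):
#   write_cron_file /etc/cron.d cleanup '0 3 * * *' root /usr/local/bin/cleanup.sh
```

Note that this only controls the file in /etc/cron.d/. An entry that was added by hand to a user's crontab would still have to be removed once.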

Related

Want to store a value in local ../usr from shell script

I want to store some values while running a shell script.
Scenario: when I run the script, it performs some operations and stores the results/activity done.
When I run the same script again, it should recognize which steps were already executed and continue from there. How can I do that? Can I use a .lock file, or is there a better way?
.lock files are by convention used to indicate that a service or process is currently running, so I would advise against using one for this.
It just sounds like you want to keep track of your progress.
If you do not mind the data being erased after a reboot, I'd suggest you simply use /tmp for that (on many systems it is held in memory). Keep in mind that very large amounts of data there will drain your available memory.
Without knowing your use case it's hard to tell you what is the best solution.
But I would suggest writing an empty file that just indicates that your script is in progress (very similar to lock behaviour) and a second file that keeps track of which items you have processed.
Then just loop over the items and skip until you hit a 'new' item.
If we are talking very large amounts you should consider using a local database or database server.
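A minimal sketch of the "second file" idea, assuming the work can be split into named items. The function names process_item and run_batch and the state file location are hypothetical; process_item stands in for the real work.

```shell
# Placeholder for the real work on one item.
process_item() {
    echo "processing $1"
}

# Process every item given after the state file, skipping items that an
# earlier run already recorded as done.
run_batch() {
    state_file=$1; shift
    touch "$state_file"
    for item in "$@"; do
        # -F: fixed string, -x: whole line, -q: quiet.
        if grep -Fxq -- "$item" "$state_file"; then
            continue
        fi
        # Record the item only after it was processed successfully.
        process_item "$item" && printf '%s\n' "$item" >> "$state_file"
    done
}
```

Calling run_batch /tmp/myscript.done a b c twice processes the three items on the first run and nothing on the second. A state file under /tmp is lost on reboot, as noted above, so pick another directory if progress must survive one.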

Running script twice at time

I'm writing a simple little script to improve the efficiency of my work team.
The script simply searches for a file that the user gives as a parameter.
./check_file test_file.xml
I used only ls and cp commands and there's no log or temporary files.
My question is: should I create a .lock file to make sure only one instance of the script runs at a time, or can I skip this check?
Usually I create a lock file because my scripts write temporary files, and if two users run the script at the same moment, it breaks.
Thanks!
Generally speaking, no, you don't need a lock file for this. I would recommend avoiding temporary files as much as possible, preferring pipes instead. However, it isn't always possible to avoid temporary files, so when I have to use one, I put $$ (the current process ID, or PID) in the filename. So if you're using /tmp/check_file.tmp as your temporary filename, use /tmp/check_file.$$.tmp instead; then two processes can run at the same time, each with its own PID, without overlapping.
Slightly more advanced is to also use ${TMP:-/tmp} as the temporary directory instead of just /tmp - that way users can specify a different directory for each run, and thereby also avoid any overlaps.
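Put together, the naming scheme might look like the sketch below. The helper name tmp_path is hypothetical, and TMP is simply the variable named above (many tools read TMPDIR instead, so treat the choice as an assumption).

```shell
# Hypothetical helper: build a per-process temporary file path.
tmp_path() {
    printf '%s/%s.%s.tmp\n' "${TMP:-/tmp}" "$1" "$$"
}

tmp_file=$(tmp_path check_file)
# Clean up when the script exits, whether it succeeds or fails.
trap 'rm -f "$tmp_file"' EXIT

printf 'scratch data\n' > "$tmp_file"   # placeholder for the real work
```

A PID-based name is unique among running processes but still predictable. Where that matters, mktemp is a common alternative that creates a uniquely named file for you.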

"find" command cannot detect files added during execution

Stackoverflow has saved my life on countless occasions over the years. Now, it's time for me to post my first question ever, the answer to which I have been unable to find so far.
I have a tool (language/implementation is irrelevant) which accepts a text file as input. This text file (let's call it file_list.txt) contains a long list of file paths, one per line. The tool then iterates over the lines in file_list.txt and does something with every file path. This needs to be done continuously and file_list.txt needs to always contain the latest file paths because users continuously upload or delete files from the share being monitored. To achieve this, I have set up a cron job which calls a script. First the script calls the find utility with the search parameters required and pipes the output to a temporary file. When the file is fully populated, it is moved to file_list.txt. Then, once this is done, the tool is invoked with file_list.txt as an input parameter.
So far, so good. The share being monitored is VERY LARGE (~60 TB) and the find command takes around 5 hours to execute. This is not a problem since we have multiple overlapping find commands running in parallel (triggered once per hour). The entire setup runs on a compute farm, so CPU utilization, etc. is also not an issue.
The problem arises in the lag time for file detection. Ideally, I want a user to add a file and I want one of the already running, overlapping find commands to detect this file within a matter of minutes. However, I have noticed that none of the already-running find commands will detect this file. Only a find command started AFTER this file was added will detect it. This means that generally, I need to wait around 5 hours for a newly added file to be detected. This leads me to believe that the find utility somehow acts on a "cached" version of the share state when it was triggered. Is this true? Can anyone confirm this? And if so, what can I do to improve the detection lag?
Please let me know if further clarification is required. I am happy to provide any further details.
To summarize: you have a gigantic filesystem volume (60 TB) which contains a huge number of files, and you use find(1) to name a large number of those files and put those names into a text file for analysis. You have discovered that files are not listed if they are created after find(1) was started but before it finished.
I think the best solution is to stop thinking of this as a batch job, and do it "online" using inotify(7). You can use the inotify API to be immediately informed of changes to your filesystem, including new files being created. There is of course the original C API, as well as the excellent pyinotify.
With inotify, you can start a watcher program once and leave it running continuously (under a supervisor if needed for restarts). The operating system can then notify you whenever a relevant filesystem event occurs, and you can respond immediately rather than waiting for the next scan.
The one downside for your use case might be that the watcher program does need to run on a machine which has the filesystem mounted locally. But the overall compute resources required are probably much less than your current approach of repeated linear scans.
Executing find commands and piping the output to temporary files might work up to a certain scale, but it is far from optimal. If you want a less resource-intensive, more reactive solution, I would recommend reimplementing your software using the inotify interface:
"The inotify API provides a mechanism for monitoring filesystem events. Inotify can be used to monitor individual files, or to monitor directories. When a directory is monitored, inotify will return events for the directory itself, and for files inside the directory."
So an event is raised each time a file is changed or added.
Note that you can then keep an internal list of files up to date, changing it only when you receive an event.
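To make the "internal list" idea concrete, here is a small shell sketch that applies a stream of events to a list file. It assumes each input line looks like "EVENT path", which is what inotifywait (from the inotify-tools package) prints with the format shown in the comment; the function name update_list is hypothetical.

```shell
# Apply "EVENT path" lines from stdin to the list file given as $1.
# Events could come from, for example:
#   inotifywait -m -r -e create -e delete --format '%e %w%f' /path/to/share
update_list() {
    list=$1
    touch "$list"
    while read -r event path; do
        case $event in
            *,ISDIR)
                ;;  # directory events: nothing to list
            CREATE*)
                # Add the path unless it is already listed.
                grep -Fxq -- "$path" "$list" || printf '%s\n' "$path" >> "$list"
                ;;
            DELETE*)
                # Drop the path; grep -v exits 1 when nothing is left, hence "|| true".
                { grep -Fxv -- "$path" "$list" || true; } > "$list.tmp"
                mv "$list.tmp" "$list"
                ;;
        esac
    done
}
```

This is only a sketch: a real watcher would also need an initial scan, handling for moved files, and enough inotify watches for every directory in the tree, which may be a practical limit on a share of this size.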

Shake: Signal whether anything had to be rebuilt at all

I use shake to build a bunch of static webpages, which I then have to upload to a remote host, using sftp. Currently, the cronjob runs
git pull # get possibly updated sources
./my-shake-system
lftp ... # upload
I’d like to avoid running the final command if shake did not actually rebuild anything. Is there a way to tell shake “Run command foo, after everything else, and only if you changed something!”?
Or alternatively, have shake report whether it did something in the process exit code?
I guess I can add a rule that depends on all possibly generated files, but that seems redundant and error-prone.
Currently there is no direct/simple way to determine if anything built. It's also not as useful a concept as it is for simpler build systems, as certain rules (especially those that define storedValue to return Nothing) will always "rerun", but then very quickly decide they don't need to run the rules that depend on them. To Shake, that is the same as rerunning. I can think of a few approaches; which one is best probably depends on your situation:
Tag the interesting rules
You could tag each interesting rule (one that produces something that needs uploading) with a function that writes to a specific file. If that specific file exists, then you need to upload. This might work slightly better, as if you do multiple Shake runs, and in the first something changes but the second nothing does, the file will still be present. If it makes sense, use an IORef instead of a file.
Use profiling
Shake has quite advanced profiling. If you pass shakeProfile=["output.json"] it will produce a JSON file detailing what built and when. Runs are indexed by an Int, with 0 for the most recent run, and any runs that built nothing are excluded. If you have one rule that always fires (e.g. write to a dummy file with alwaysRerun) then if anything fired at the same time, it rebuilt.
Watch the .shake.database file size
Shake has a database, stored under shakeFiles. Each uninteresting run it will grow by a fairly small amount (~100 bytes) - but a fixed size given your system. If it changes in size by a greater amount, then it did something interesting.
Of these approaches, tagging the interesting rules is probably the simplest and most direct (although it does run the risk of you forgetting to tag something).
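On the cron side, the tagging approach could be wired up roughly as follows. This is a sketch under the assumption that every tagged Shake rule creates a marker file when it runs; the marker name .needs-upload and the function run_if_marked are hypothetical.

```shell
# Run the given command only if the marker file exists.
run_if_marked() {
    marker=$1; shift
    if [ -e "$marker" ]; then
        # Clear the marker only after the command succeeds, so a failed
        # upload is retried on the next run.
        "$@" && rm -f "$marker"
    fi
}

# In the cron job (commands as in the question):
#   git pull
#   ./my-shake-system
#   run_if_marked .needs-upload lftp ...
```

Keeping the marker until the upload succeeds matches the point above about multiple Shake runs: if a later run rebuilds nothing, the file is still present and the upload still happens.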

Linux non-su script indirectly triggering su script?

I'd like to create an auto-testing/grading script for students on a Linux system such that:
Any student user can initiate the script at any time.
A separate script (with root privileges) copies student code to a non-student-accessible file space, using non-student-accessible unit tests, etc.
The user receives limited feedback in the form of a text file generated by the grading script.
In short, I'm looking to create something similar to programming contest submission systems, but allowing richer feedback without revealing all teacher unit testing.
I would imagine that a spooling behavior between one initiating script and one root-permission cron script might be in order. Are there any models/examples of how one might best structure communication between a user-initiated script and a separate root-initiated script for such purposes?
There are many options.
The first things I would mention:
Don't use su; use sudo. There are several reasons, the main one being that su requires the password of the user you want to become, and sudo does not;
Scripts can't be suid; you must use a binary, or just a normal script that is started using sudo (of course, students must have a sudoers entry that allows them to run the script);
Cron may not be as fast as you need, since it runs tasks at most once per minute; consider using inotify instead;
To communicate between components of your system you need something that will react in realtime; there are many opensource components/libraries/frameworks that could help you, but I would recommend you to take a look at ZeroMQ and Redis;
Results of the scripts' executions/tests can be written either to a filesystem (I think it would be better), or to a DBMS.
If you want to stick to shell scripting, the method I suggest for communicating between processes is to have the root script continually read a named pipe for input (i.e. reopen it after each EOF) and send each input through whatever tests must be done. Have part of the input be a 'return address': where to send the result.
This should allow the tests to be performed in a privileged space without exposing any control over the privileged space to the students. The students don't need sudo, and you don't need to pull in libraries. Just have the students pipe their code into a non-privileged script that adds the return address and whatever other markup you may need, which then gives it to the named pipe.
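A minimal sketch of the privileged side, under some assumptions that are mine rather than the answer's: each request is one writer's complete output, its first line is the return address (here a user name), and feedback goes to a file named after that user in a directory the root script controls. The names handle_request and serve are hypothetical, and the "test" is a placeholder.

```shell
# Read one request from stdin: line 1 is the return address, the rest is code.
handle_request() {
    feedback_dir=$1
    IFS= read -r user || return 1
    case $user in
        # Never trust the return address: reject anything that is not a plain name.
        ''|*[!a-z0-9_-]*) return 1 ;;
    esac
    code=$(cat)   # remainder of the request
    # Placeholder for the real tests: only report whether any code arrived.
    if [ -n "$code" ]; then
        echo "submission received"
    else
        echo "empty submission"
    fi > "$feedback_dir/$user.txt"
}

# Serve requests forever from a named pipe.
serve() {
    fifo=$1 feedback_dir=$2
    [ -p "$fifo" ] || mkfifo "$fifo"
    while true; do
        # Opening the pipe blocks until a writer connects; handle_request returns
        # at EOF, and the loop reopens the pipe for the next writer.
        handle_request "$feedback_dir" < "$fifo" || true
    done
}
```

The unprivileged script would then send something like { id -un; cat solution_file; } into the pipe. Two caveats for a real setup: large submissions from concurrent writers can interleave in the pipe, and a self-reported user name can be forged, so the sketch only guards against path tricks in the return address.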
