Running a script twice at a time - linux

I'm making a simple little script to improve the efficiency of my work team.
The script simply searches for a file that the user gives as a parameter:
./check_file test_file.xml
I use only the ls and cp commands, and there are no log or temporary files.
My question is: should I use a .lock file to make sure the script runs only once at a time, or can I skip this check?
Usually I create a lock file, because my scripts write temporary files, and if two users run the script at the same moment it breaks.
Thanks!

Generally speaking, no. I would recommend avoiding temporary files as much as possible, preferring pipes instead. However, I doubt it's always possible to avoid temporary files, so when I have to, I use $$ in the filename (current process ID or PID). So if you're using /tmp/check_file.tmp as your temporary filename, instead use /tmp/check_file.$$.tmp - then two processes can run at a time, each with their own PID, and not overlap.
Slightly more advanced is to also use ${TMP:-/tmp} as the temporary directory instead of just /tmp - that way users can specify a different directory for each run, and thereby also avoid any overlaps.
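As a rough illustration of both points, here is a minimal sketch; the check_file.$$.tmp name comes from the answer above, while the grep is just placeholder work:

#!/bin/sh
# Per-process temporary file: $$ expands to this shell's PID,
# so two simultaneous runs get different filenames.
tmpdir="${TMP:-/tmp}"
tmpfile="$tmpdir/check_file.$$.tmp"
trap 'rm -f "$tmpfile"' EXIT

# Placeholder work: stash intermediate results in the private temp file.
grep "pattern" "$1" > "$tmpfile"
cat "$tmpfile"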

Related

Want to store a value in local ../usr from shell script

I just want to store some values while running a shell script.
Scenario: when I run the shell script it does some operations and stores the results/activity done.
Then, when I run the same script again, it should see that those items were already processed and continue from there. That's roughly what I need. How do I do that? Can I use a .lock file, or are there other, better ways?
I just want to store some values while running a shell script ... how to do that? Can we use a .lock file, or are there other, better ways?
.lock files are by convention used to indicate running services, so I would vote against that.
It just sounds like you want to keep track of your progress.
If you do not mind the data being erased after a reboot, I'd suggest you simply use /tmp for that (on many systems it is kept in memory); do keep in mind that if we are talking about very large amounts of data, this will drain your available memory.
Without knowing your use case it's hard to tell you what the best solution is.
But I would suggest writing an empty file that just indicates that your script is in progress (very similar to lock behaviour) and a second file that keeps track of which items you have processed.
Then just loop over the items and skip until you hit a 'new' item.
If we are talking very large amounts you should consider using a local database or database server.
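A minimal sketch of that progress-file idea (items.txt and processed.txt are hypothetical names):

#!/bin/sh
# processed.txt records every item handled so far, one per line,
# so a re-run can skip work that was already done.
progress=processed.txt
touch "$progress"

while IFS= read -r item; do
    if grep -qxF "$item" "$progress"; then
        continue                     # already handled on a previous run
    fi
    echo "processing $item"          # do the real work here
    echo "$item" >> "$progress"
done < items.txt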

"Just in time" read only filesystem using mkfifo and inotifywait

I am writing some gross middleware - basically, I have some old code that needs to open 100,000 files for reading only, expecting them all to be in one folder. It never writes. It is multiprocess, so it can try to open ~30 files at the same time. The old way, I would have to actually copy the files into that folder (or use links, NFS, etc.). Worth noting that I have no ability to change this old code - it's just a binary.
I have some new, fancy code that can retrieve a file almost instantly. I want to tie these things together, so when the old code tries to open the file, it is actually, in real time, running the new code.
So I thought of mkfifo and inotifywait. Instead of a folder of 100,000 files, I can make a folder of 100,000 named pipes. So far so good. The legacy code goes to open the files, not knowing that they are in fact named pipes. The problem is, I don't know what order the legacy code is going to open the files (nice, right?). So I would like to TRIGGER the named pipe WRITE (from my fancy new code) when the legacy code goes in for the read. I can't spawn 100,000 writes and have them all block. So I thought hey - inotifywait makes sense. Every time the legacy code goes to open the pipe, it triggers a read event, which can then be used to spawn the pipe writer in the background. The problem is, inotifywait doesn't trigger the read event until AFTER the writer has been spawned!
Any ideas on how to solve this? Basically, I want to intercept a file open, block for a couple hundred ms while I retrieve the contents of the file, then return those contents. Ideally I don't have to create a custom FUSE filesystem to do this; it's just a read-only file open. The problem is this needs to run fast and in parallel, and I don't know which files are going to be opened in what order. Gotta be a quick and dirty way!
Thanks in advance for everyone's time.

Is there a way to make a bash script process messages that have been sent to it using the write command

Is there a way to make a bash script process messages that have been sent to it using the "write" command? So for example, if a user wants to activate a feature in my script, could I make it so that they can send the script a command using the write command?
One possible method I thought of was to configure logging for a screen session and then have the bash script parse the text from there, but I'm not sure if there is a simpler or more efficient way to tackle this.
EDIT: As an alternative solution, I was thinking I could use a named pipe. I'm worried that it would break, though, if the /tmp partition gets filled up completely (not sure if this would impact write as well?). I'm going to be running this script on a shared box, and every once in a while someone completely fills up the /tmp partition and then just leaves it like that until people start complaining.
Hmm, you are really trying to bend a poor unix command into doing something it was not designed for. From the man page (emphasis mine):
The write utility allows you to communicate with other users, by copying
lines from your terminal to theirs
That means that write is intended to copy lines directly between terminals. As soon as you say "I will dump the terminal output with screen, and then parse the dump file", you lose the simplicity of write (and you also need disk space, with the problem of removing old lines from a sequential file).
Worse, as your script lives on its own, it could (should?) be a daemon attached to no terminal.
So if I have correctly understood your question, your requirements are:
a script that does some tasks and should be able to respond to asynchronous requests - common mechanisms are named pipes, or network or unix domain sockets; less common are files in a dedicated folder with an optional signal for immediate processing; appending lines to a sequential file, while possible, is uncommon because of access synchronization problems (a minimal named-pipe sketch follows this list)
a simple and user-friendly way for users to pass requests. OK, write is nice for that part, but much too hard to interface with, IMHO
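For the first requirement, a named-pipe approach could be sketched roughly like this (the pipe path and the request names are hypothetical, not something from the question):

#!/bin/sh
# Daemon side: read requests from a named pipe, one line per request.
pipe="$HOME/.myscript.fifo"
[ -p "$pipe" ] || mkfifo "$pipe"

while true; do
    # opening the pipe for reading blocks until some writer connects
    while IFS= read -r request; do
        case "$request" in
            enable-feature) echo "feature enabled" ;;
            *)              echo "unknown request: $request" ;;
        esac
    done < "$pipe"
done

A user (or another script) would then send a request with something like echo enable-feature > "$HOME/.myscript.fifo".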
If you do not want to waste time on that part and prefer standard tools, I would recommend the mail system. It is trivial to alias a mail address to a program that will be called with the mail message as input. But I am not sure it is worth it, because the user could directly call the program with the request as input or as a command line parameter.
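For reference, with a sendmail-compatible mailer such an alias is a single line in /etc/aliases that pipes the message into a program (handle_request is a hypothetical name; run newaliases afterwards):

checkfile: "|/usr/local/bin/handle_request"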
So the client part could simply be a program that (a minimal sketch follows the list):
creates a temporary file in a dedicated folder (mkstemp is your friend in C or C++, or mktemp in shell - but beware of race conditions)
writes the request to that file
optionally sends a signal to a pid - provided the script writes its own PID to a dedicated file on startup
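Here is a minimal shell sketch of that client; the request folder, PID file path, and signal are all assumptions that would have to be agreed on with the script:

#!/bin/sh
# Client side: drop a request file in the agreed folder, then optionally
# poke the daemon so it processes the request right away.
requests=/var/tmp/myscript-requests           # hypothetical request folder
reqfile=$(mktemp "$requests/req.XXXXXX") || exit 1

printf '%s\n' "$*" > "$reqfile"               # the request is the command line

pidfile=/var/tmp/myscript.pid                 # hypothetical PID file the daemon writes
[ -r "$pidfile" ] && kill -USR1 "$(cat "$pidfile")"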

"find" command cannot detect files added during execution

Stack Overflow has saved my life on countless occasions over the years. Now it's time for me to post my first question ever, the answer to which I have been unable to find so far.
I have a tool (language/implementation is irrelevant) which accepts a text file as input. This text file (let's call it file_list.txt) contains a long list of file paths, one per line. The tool then iterates over the lines in file_list.txt and does something with every file path. This needs to be done continuously and file_list.txt needs to always contain the latest file paths because users continuously upload or delete files from the share being monitored. To achieve this, I have set up a cron job which calls a script. First the script calls the find utility with the search parameters required and pipes the output to a temporary file. When the file is fully populated, it is moved to file_list.txt. Then, once this is done, the tool is invoked with file_list.txt as an input parameter.
So far, so good. The share being monitored is VERY LARGE (~60 TB) and the find command takes around 5 hours to execute. This is not a problem since we have multiple overlapping find commands running in parallel (triggered once per hour). The entire setup runs on a compute farm, so CPU utilization, etc. is also not an issue.
The problem arises in the lag time for file detection. Ideally, I want a user to add a file and I want one of the already running, overlapping find commands to detect this file within a matter of minutes. However, I have noticed that none of the already-running find commands will detect this file. Only a find command started AFTER this file was added will detect it. This means that generally, I need to wait around 5 hours for a newly added file to be detected. This leads me to believe that the find utility somehow acts on a "cached" version of the share state when it was triggered. Is this true? Can anyone confirm this? And if so, what can I do to improve the detection lag?
Please let me know if further clarification is required. I am happy to provide any further details.
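A minimal sketch of the kind of cron-driven script described above might look like this (all paths and the find options are placeholders):

#!/bin/sh
# Scan the share, write the paths to a temporary file, then atomically
# swap it in so the tool never reads a half-written list.
share=/path/to/share                  # placeholder
list=/path/to/file_list.txt           # placeholder
tmp=$(mktemp "${list}.XXXXXX") || exit 1

find "$share" -type f > "$tmp"
mv "$tmp" "$list"

/path/to/tool "$list"                 # placeholder tool invocation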
To summarize: you have a gigantic filesystem volume (60 TB) which contains a huge number of files, and you use find(1) to name a large number of those files and put those names into a text file for analysis. You have discovered that files are not listed if they are created after find(1) was started but before it finished.
I think the best solution is to stop thinking of this as a batch job, and do it "online" using inotify(7). You can use the inotify API to be immediately informed of changes to your filesystem, including new files being created. There is of course the original C API, as well as the excellent pyinotify.
With inotify, you can start a watcher program once and leave it running continuously (under a supervisor if needed for restarts). The operating system can then notify you whenever a relevant filesystem event occurs, and you can respond immediately rather than waiting for the next scan.
The one downside for your use case might be that the watcher program does need to run on a machine which has the filesystem mounted locally. But the overall compute resources required are probably much less than your current approach of repeated linear scans.
Executing find commands and piping the output to temporary files might work up to a certain scale, but it is far from optimal. If you want a less resource-intensive, more reactive solution, I would recommend reimplementing your software using the inotify interface:
The inotify API provides a mechanism for monitoring filesystem events.
Inotify can be used to monitor individual files, or to monitor
directories. When a directory is monitored, inotify will return
events for the directory itself, and for files inside the directory.
So an event will be raised for each file change or each file being added.
Note that you can then keep an internal list of files up to date, which only needs to change when you get an event.
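If a shell-level sketch helps, the same idea can be tried with inotifywait from inotify-tools before writing anything custom; the paths here are placeholders, and a recursive watch on a very large tree may require raising fs.inotify.max_user_watches:

#!/bin/sh
# Append each newly created (or moved-in) file under the share to the list
# as soon as the kernel reports the event.
share=/path/to/share                  # placeholder
list=/path/to/file_list.txt           # placeholder

inotifywait -m -r -e create -e moved_to --format '%w%f' "$share" |
while IFS= read -r path; do
    [ -f "$path" ] && printf '%s\n' "$path" >> "$list"
done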

Move file in Linux only when it's not in use by another process

Using the lsof command, I can determine whether a file is in use by some process, but I need to atomically check a file for use and move it only if unused. These files are in use by various other programs over which I have no control, so I can't use advisory locks. The purpose is to stop other processes from modifying that file, so just moving the file while a process has it open is not OK. Any solution?
UPDATE: a solution just occurred to me that I think suits my purposes. The end goal is to process these files in their final state, after the other programs have finished modifying them. If I move the file to another directory, I can then use lsof to check whether it is still in use via its old path; if so, I just check again later until it's no longer in use and then process the file. Moving the file to another directory hides it from users and the program. I don't want users and programs seeing the file in the old directory, because that gives them the opportunity to open the file between the time I run lsof and the time I process the file, which means I'd be processing the file in a modified state.
How about using fuser? Run it on a file and it will display the PIDs of processes using the file. If there are no PIDs, there is nothing using the file. It will also return a non-zero exit code if there is no process using the file.
However, note that you could still have a race condition, because a process could open the file after the fuser command returns and before you mv it.
Some sample code to move a file if not in use:
if ! fuser /my/file
then
    mv /my/file /somewhere/else
fi
You can move a file while something is accessing it, provided you move it within the same file system, using the rename(2) syscall (which mv uses when source and target are on the same file system). You could even remove it with the unlink(2) system call. Moving such a file would indeed prevent future processes from accessing it by that same path.
You could also use the inotify(7) API to be notified when something accesses it.
You might also consider mandatory locking, at least with some file systems, but rumor has it that mandatory locking does not work well and can sometimes be buggy.
