I have a script that writes to a file and then dumps that file to a database. I need this task to run as frequently as possible, but never run more than one instance at the same time (or it would be writing redundant data to the same file).
Currently, the shell script checks whether the file exists and exits if it does. At the end of each run, the script deletes the file.
This works 95% of the time. However, if the server is restarted (which happens semi-frequently), the file that was being written to will remain, and every time the script is called after that it will exit because the file already exists.
What would be a good way around this problem?
You could check whether any processes are using the file with 'fuser'. It prints the PIDs of any processes using the file. If there are no PIDs, you are safe to wipe it and start again.
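The fuser check only helps while the running script actually keeps the file open. A minimal sketch of a related variant, not the fuser approach itself: it assumes a separate lock file that stores the PID of the running instance, so a file left behind by a reboot can be recognized as stale with kill -0. The path in LOCKFILE and the names acquire_lock and release_lock are hypothetical.

```shell
#!/bin/bash
# Sketch: PID-based lock file that tolerates a stale file left by a reboot.
# LOCKFILE is a hypothetical path; adjust it for your script.
LOCKFILE="${LOCKFILE:-/tmp/myjob.lock}"

# Returns 0 if the lock was taken, 1 if a live instance already holds it.
acquire_lock() {
    local pid
    if [ -f "$LOCKFILE" ]; then
        pid=$(cat "$LOCKFILE" 2>/dev/null)
        # kill -0 sends no signal; it only tests whether the PID exists.
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            return 1
        fi
        rm -f "$LOCKFILE"   # stale lock left by a crash or reboot
    fi
    echo "$$" > "$LOCKFILE"
}

# Removes the lock file at the end of a run.
release_lock() {
    rm -f "$LOCKFILE"
}
```

A script would call `acquire_lock || exit 0` at the top and `release_lock` at the end. The check-then-write step is not atomic, so two instances started at the same instant could both proceed; flock(1) from util-linux is the usual tool if that matters. After a reboot, a recorded PID may also have been reused by an unrelated process.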
This is under Ubuntu 20.04.
There's a script that appends to a file via shell redirection.
I want to read that file after the script's process has ended and all data has been written.
I'm using pgrep to check when the script ends (I have carefully checked that this check works).
I have noted that the file may not be fully written even if the process ended.
From what I have read, this can happen because of buffering. A side question would be: can this actually happen, or am I misunderstanding something?
I'm thinking of using lsof, inotifywait, or a loop with fuser to wait for the file to be closed. Is this the right way to handle this situation?
What I don't really understand is: if the process that opened the file exited, who will show as the file "opener" on lsof/inotifywait/fuser output?
If you're worried that the file has not been written to disk because of buffering, and the writing happens in a process whose file descriptor you don't have, you can force the system to write it to disk with the sync <file> command or the sync() function declared in unistd.h.
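A minimal sketch of that suggestion, assuming GNU coreutils 8.24 or later (older versions of sync ignore file arguments) and that the writer's PID is known. The helper name wait_and_sync is hypothetical.

```shell
#!/bin/bash
# wait_and_sync PID FILE: poll until PID is gone, then flush FILE to disk.
# Assumes the caller may signal PID (same user or root); otherwise
# kill -0 fails immediately and the loop ends early.
wait_and_sync() {
    local pid=$1 file=$2
    while kill -0 "$pid" 2>/dev/null; do
        sleep 0.2
    done
    # Ask the kernel to write any cached data for this file to disk.
    sync "$file"
}
```

For example, `wait_and_sync "$(pgrep -o -f writer.sh)" /path/to/output` (both names hypothetical) followed by reading the file. Note that sync only flushes kernel buffers; it cannot help if a child of the script still holds the file open and is writing to it.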
NOTICE: Feedback on how the question can be improved would be great, as I am still learning. I understand there is no code; I have left it out because I am confident it does not need fixing. I have researched online a great deal and cannot find the answer to my question. My script works as it should when I change the parameters to produce fewer outputs, so I know it works. I have debugged the script and got no errors. When the parameters are changed to produce more outputs and the script runs for hours, it stops. My goal for the question below is to determine whether Linux will time out a long-running process (or something related) and, if so, how that can be resolved.
I am running a shell script that has several for loops which do the following:
- Goes through existing files and copies data into a newly saved/named file
- Makes changes to the data in each file
- Submits these files (which number in the thousands) to another system
The script is very basic (beginner here), but as long as I don't give it too much to generate, it works as it should. However, if I want it to loop through all possible cases, which means generating tens of thousands of files, then after a certain amount of time the shell script just stops running.
I have more than enough hard drive storage to support all the files being created. One thing to note however is that during the part where files are being submitted, if the machine they are submitted to is full at that moment in time, the shell script I'm running will have to pause where it is and wait for the other machine to clear. This process works for a certain amount of time but eventually the shell script stops running and won't continue.
Is there a way to make it continue or prevent it from stopping? I pressed Ctrl+Z to suspend the script and then typed fg to resume it, but it still does nothing. I check the status by typing ls -la to see whether the file size is increasing, and it is not, although top/ps says the script is still running.
Assuming that you are using 'Bash' for your script - most likely, you are running out of 'system resources' for your shell session. Also most likely, the manner in which your script works is causing the issue. Without seeing your script it will be difficult to provide additional guidance, however, you can check several items at the 'system level' that may assist you, i.e.
review system logs for errors about your process or about 'system resources'
check your docs: man ulimit (or 'man bash' and search for 'ulimit')
consider removing 'deep nesting' (if present); instead, create work sets where step one builds the 'data' needed for the next step, i.e. if possible, instead of:
step 1 (all files) ## guessing this is what you are doing
step 2 (all files)
step 3 (all files)
Try each step for each file - Something like:
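for MY_FILE in ${FILE_LIST}
do
    # step_1..step_3 are placeholders for your own per-file commands
    step_1 "${MY_FILE}"
    step_2 "${MY_FILE}"
    step_3 "${MY_FILE}"
done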
:)
Dale
I've been troubleshooting this issue for about a week and I am nowhere, so I wanted to reach out for some help.
I have a Perl script that I execute via the command line, usually in the manner of
nohup ./script.pl --param arg --param2 arg2 &
I usually have about ten of these running at once to process the same type of data from different sources (that is specified through parameters). The script works fine and I can see logs for everything in nohup.out and monitor status via ps output. This script also uses a sql database to track status of various tasks, so I can track finishes of certain sources.
However, that was too much work, so I wrote a wrapper script to execute the script automatically and that is where I am running into problems. I want something exactly the same as I have, but automatic.
The getwork.pl script runs ps and parses the output to find out how many other processes are running; if the count is below the configured threshold, it queries the database for the most out-of-date source and kicks off the script.
The problem is that the kicked-off jobs aren't running properly: sometimes they terminate without any error messages, and sometimes they just hang and sit idle until I kill them.
The getwork script gets the entire execution command via SQL concatenation, so in the SQL query I am doing something like CONCAT('nohup ./script.pl --arg ',param1,' --arg2 ',param2,' &') to build the command string.
I've tried everything to get these kicked off. I've tried using system(), but again, some jobs kick off and some don't; sometimes it gets stuck, and sometimes jobs start and then die within a minute. If I take the exact command I used to start the job and run it in bash, it works fine.
I've tried to also open a pipe to the command like
open my $ca, "| $command" or die ($!);
print $ca $command;
close $ca;
That works just about as well as everything else I've tried. The getwork script used to be executed through cron every 30 minutes, but I scrapped that because I needed another shell wrapper script, so now there is an infinite loop in the getwork script that executes a function every 30 minutes.
I've also tried many variations of the execution command, including redirecting output to different files, etc... nothing seems to be consistent. Any help would be much appreciated, because I am truly stuck here....
EDIT:
Also, I've tried to add separate logging within each script, so that each one starts a new log file named with its PID ($$). There was a bunch of weirdness there too: all the log files would get created, but some of the processes would be running and writing to their file, others would just have an empty text file, and some would have only one or two log entries. Sometimes the process would still be running and just not doing anything; other times it would die with nothing in the log. Running the command directly in the shell myself always works, though.
Thanks in advance
You need a kind of job managing framework.
One of the biggest is Gearman: http://www.slideshare.net/andy.sh/gearman-and-perl
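Short of adopting a full framework, one detail that often matters when a wrapper starts background jobs is giving each job its own stdin, stdout, and stderr, so it does not depend on the wrapper's. A minimal sketch under that assumption; the helper name launch_detached and the log paths are hypothetical, and this is not a diagnosis of the problem described above.

```shell
#!/bin/bash
# launch_detached LOGFILE CMD [ARGS...]
# Starts CMD in the background with stdin from /dev/null and both output
# streams sent to LOGFILE, then prints the PID of the new job.
launch_detached() {
    local log=$1
    shift
    # Passing the command as separate words avoids re-parsing a single
    # concatenated string through another shell.
    nohup "$@" >"$log" 2>&1 </dev/null &
    echo "$!"
}
```

Usage might look like `pid=$(launch_detached "/tmp/job-source1.log" ./script.pl --param arg --param2 arg2)`. Because the command is passed as an argument list, the trailing `&` and the `nohup` prefix no longer need to be part of the string built in SQL.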
Is there a way to find out which process wrote to a given file earlier? I am having a problem where multiple processes seem to be writing to a file. I know one of the processes, but I am not sure who else is writing to the file. I am on Linux/Ubuntu. Does the OS maintain a log of which processes have written to a specified file?
Create a small monitoring process which will log periodically who is currently accessing the file.
You can write a small script using fuser. Here is a quick example (to be improved):
#!/bin/bash
# Poll fuser and append every sighting to a log file.
log=~/file-access.log
while true
do
    # fuser prints PIDs on stdout and the file name on stderr; capture both
    fuser your_file >> "$log" 2>&1
    sleep 0.2s
done
But you will have to be lucky: the process writing to the file must hold it open long enough for fuser to catch it.
No, there is nothing by default to keep track of which processes wrote to a file after the fact.
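If you can reproduce the problem at will, inotify or similar can help you monitor writes to the file as they happen. Note that inotify events do not carry the PID of the writer, so you would still need to check which processes have the file open at that moment.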
Using the lsof command, I can determine whether a file is in use by some process, but I need to atomically check a file for use and move it only if unused. These files are in use by various other programs over which I have no control, so I can't use advisory locks. The purpose is to stop other processes from modifying that file, so just moving the file while a process has it open is not OK. Any solution?
UPDATE: a solution just occurred to me that I think suits my purposes. The end goal is to process these files in their final state when other programs finish modifying them. If I move the file to another directory, I can then use lsof to check whether it is still in use via its old path; if so, I just check again later until it's no longer in use and then process the file. By moving the file to another directory, it hides the file from users and the program. I don't want users and programs seeing the file in the old directory because that gives them opportunity to open the file in between the time I use lsof and process the file, which means I'd be processing the file in a modified state.
How about using fuser? Run it on a file and it will display the PIDs of processes using the file. If there are no PIDs, there is nothing using the file. It will also return a non-zero exit code if there is no process using the file.
However, note that you could still have a race condition, because a process could open the file after the fuser command returns and before you mv it.
Some sample code to move a file if not in use:
if ! fuser /my/file
then
mv /my/file /somewhere/else
fi
You can move a file while something is accessing it, provided you move it within the same file system using the rename(2) syscall (which mv uses if source and target are on the same file system). You could even remove it using the unlink(2) system call. Moving such a file would indeed prevent other processes from opening it by that same path in the future.
You could also use the inotify(7) API to be notified when something accesses it.
You might also consider mandatory locking, at least with some file systems,
but rumors are that mandatory locking does not work well and can sometimes be buggy.
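The move-then-wait idea from the question's update can be sketched as follows. This is only an illustration: it assumes fuser (from psmisc) is installed, that source and destination are on the same file system so mv is a plain rename, and the helper names in_use and claim_file are hypothetical.

```shell
#!/bin/bash
# in_use FILE: succeeds if some process has FILE open (relies on fuser).
in_use() {
    fuser "$1" >/dev/null 2>&1
}

# claim_file SRC DESTDIR: rename SRC into DESTDIR so no new process can
# open it by the old path, then wait until existing users have closed it.
# Prints the new path once the file is safe to process.
claim_file() {
    local src=$1 destdir=$2 dest
    dest="$destdir/$(basename "$src")"
    mv -- "$src" "$dest" || return 1
    # Processes that opened the file before the rename keep their
    # descriptors, so poll until none remain.
    while in_use "$dest"; do
        sleep 1
    done
    printf '%s\n' "$dest"
}
```

Renaming first closes the window in which a new process could open the file between the check and the move; the remaining wait only covers processes that already had it open. fuser may need root to see processes owned by other users.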