First, the background to this intriguing challenge. During development and testing, the continuous integration build often fails because of deadlocks, infinite loops, or other issues that result in a never-ending test. When that happens, all the mechanisms for notifying that a build has failed become useless.
The solution is to have the build script time out if nothing is written to the build log for more than 5 minutes. The build routinely writes out the names of unit tests as it proceeds, so a silent log is the best way to identify that it's "frozen".
Okay. Now the nitty gritty...
The build server uses Hudson to run a simple bash script that invokes the more complex build script based on Nant and MSBuild (all on Windows).
So far all solutions around the net involve a timeout on the total run time of the command. But that fails in this case because the tests might hang or freeze in the first 5 minutes, long before any limit on the total run time is reached.
What we've thought of so far:
First, here's the high-level bash command to run the full test suite in Hudson.
build.sh clean free test
That command simply sends all the Nant and MSBuild build logging to stdout.
It's obvious that we need to tee that output to a file:
build.sh clean free test 2>&1 | tee build.out
Then, in parallel, a command needs to sleep, check the modification time of the file, and kill the main process if the file has not changed for more than 5 minutes. A kill -9 will be fine at that point--nothing graceful needed once it has frozen.
That's the part you can help with.
In fact, I made a script like this over 15 years ago to kill the connection on a data phone line to Japan after periods of inactivity, but I can't remember how I did it.
Sincerely,
Wayne
build.sh clean free test 2>&1 | tee build.out &
sleep 300
kill -KILL %1
You may be able to use timeout:
timeout 300 command
Solved this myself by writing a bash script.
It's called iotimeout with one parameter which is the number of seconds.
You use it like this:
build.sh clean dev test | iotimeout 120
iotimeout has 2 loops.
One is a simple while read line loop that echoes each line, but
it also uses the touch command to update the modified time of a
tmp file every time it writes a line. Unfortunately, it wasn't
possible to monitor a build.out file because Windoze doesn't
update the file modified time until you close the file. Oh well.
Another loop runs in the background, that's a forever loop
which sleeps 10 seconds and then checks the modified time
of the temp file. If that ever exceeds 120 seconds old then
that loop forces the entire process group to exit.
The only tricky stuff was returning the exit code of the original
program. Bash gives you a PIPESTATUS array to solve that.
Also, figuring out how to kill the entire process group took
some research, but it turns out to be easy: just kill 0
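The original iotimeout script is not shown, so the following is only a minimal sketch of the two-loop design described above. The variable names, the IOTIMEOUT_POLL override (a hypothetical knob, defaulting to the 10 seconds mentioned), and the use of GNU stat -c %Y to read the modification time are all assumptions.

```shell
#!/bin/bash
# Sketch of an inactivity timeout filter; not the original script.
iotimeout() {
    local limit=$1                      # allowed seconds of silence
    local poll=${IOTIMEOUT_POLL:-10}    # hypothetical knob: seconds between checks
    local stamp line now last
    stamp=$(mktemp) || return 1

    # Loop 2 (background): compare the stamp file's mtime with the clock.
    (
        while sleep "$poll"; do
            now=$(date +%s)
            # stat -c %Y is GNU coreutils; stop quietly once the stamp is gone.
            last=$(stat -c %Y "$stamp" 2>/dev/null) || exit 0
            if (( now - last > limit )); then
                echo "iotimeout: no output for more than ${limit}s" >&2
                kill 0    # signal every process in the current process group
            fi
        done
    ) >/dev/null &
    local watchdog=$!

    # Loop 1: pass each line through and refresh the stamp file.
    while IFS= read -r line; do
        printf '%s\n' "$line"
        touch "$stamp"
    done

    # Input ended normally: stop the watchdog and clean up.
    kill "$watchdog" 2>/dev/null || true
    rm -f "$stamp"
}
```

With the function defined in the calling script, the pipeline would look like build.sh clean dev test 2>&1 | iotimeout 120, and the build's own exit code is then available as ${PIPESTATUS[0]} immediately after the pipeline. Note that kill 0 also signals the calling script, since it belongs to the same process group.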
Related
I have a PHP script that runs the following code:
exec("ls $image_subdir | parallel -j8 tesseract $image_subdir/{} /Processed/OCR/{.} -l eng pdf",$output, $result_code);
The code runs, however, even after I terminate the PHP script and close the browser, it continues to create the pdf files (thousands). It has been 24 hrs and it is still running. When I run a ps command, it only shows the 8 current processes that were created.
How can I find where all the pending ones are running and kill them? I believe I can simply restart Apache/PHP, but I would like to know where these pending processes are and how they can be shut down or controlled. Originally it seemed that the code waited a minute while it executed the above command, then proceeded to the next line of the PHP script. So it appears that it created the jobs somewhere and then moved on.
Is it perhaps something peculiar to the parallel command? Any information is very much appreciated. Thank you.
The jobs appear to have been produced by a perl process:
perl /usr/bin/parallel -j8 tesseract {...basically the code from the exec() function call in the php script}
perl was invoked either by the GNU parallel command or by PHP's exec function. In any event, htop would not kill the process and did not report any error or status, so it may have been a permission problem. Killing it with sudo on the command line worked, and that stopped any further process creation from the original PHP exec() call.
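GNU parallel is itself a Perl script, which is why the dispatcher shows up as perl /usr/bin/parallel. The pending jobs are not processes yet, only input that it has still to start, so ps shows just the running workers. A small sketch of how one might walk from a worker to its dispatcher follows. The helper name parent_of is hypothetical, and a sleep process stands in for a tesseract worker so the snippet runs anywhere.

```shell
# parent_of PID: print the parent PID of a process (POSIX ps options).
parent_of() {
    ps -o ppid= -p "$1" | tr -d ' '
}

# Stand-in worker; in the real case this would be the PID of one of the
# eight tesseract processes shown by ps.
sleep 30 &
worker_pid=$!

dispatcher_pid=$(parent_of "$worker_pid")
echo "worker $worker_pid was started by $dispatcher_pid"

# Clean up the stand-in worker.
kill "$worker_pid"
wait "$worker_pid" 2>/dev/null || true
```

In the real case the dispatcher found this way would be the perl process, and killing it (with sudo if it belongs to another user, such as the web server's account) stops new jobs from being started.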
Multiple scripts are running on my Linux server and generating a huge amount of data. I have realised that they will eat all 500GB of my storage in the next 2-5 days, but the scripts need 10 more days to finish, which means they need more space. So I am most likely going to run out of space and will have to restart the entire process.
Process is like this -
script1.sh content is like below
"calling an api" > /tmp/output1.txt
script2.sh content is like below
"calling an api" > /tmp/output2.txt
Executed like this -
nohup ./script1.sh & ### this creates /tmp/output1.txt
nohup ./script2.sh & ### this creates /tmp/output2.txt
My initial understanding was that the following steps should work --
while the scripts are running with nohup in the background, execute this command -
mv /tmp/output1.txt /tmp/output1.txt_bkp; touch /tmp/output1.txt
Then transfer /tmp/output1.txt_bkp to another server via ftp and remove it afterwards to free space on the server, while the script keeps writing to /tmp/output1.txt.
But this assumption was wrong: the script keeps writing to /tmp/output1.txt_bkp. I think the script writes by inode number, which is why it keeps writing to the old file.
Now the question is: how can I avoid the space issue without killing or restarting the scripts?
Essentially what you're trying to do is pull a file out from under a script that's actively writing into it. I'm not sure how nohup would let you do that.
May I suggest a different approach?
Why don't you move the first x lines from your /tmp/output[x].txt to /tmp/output[x].txt_bkp? You can do so without much trouble while your script is running and dumping stuff into /tmp/output[x].txt. That way you can free up space by shrinking your output[x] files.
Try this as a test. Open 2 terminals (or use screen) to your Linux box. Make sure both are in the same directory. Run this command in one of your terminals:
for line in `seq 1 2000000`; do echo $line >> output1.txt; done
And then run this command in the other before the first one finishes:
head -1000 output1.txt > output1.txt_bkp && sed -i '1,+999d' output1.txt
Here is what's going to happen. The first command will start producing a file that looks like this:
1
2
3
...
2000000
The second command will chop off the first 1000 lines of output1.txt and put them into output1.txt_bkp and it will do so WHILE the file is being generated.
Afterwards, look inside output1.txt and output1.txt_bkp, you will see that the former looks like this:
1001
1002
1003
1004
...
2000000
While the latter will have the first 1000 lines. You can do the same exact thing with your logs.
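A word of caution: sed -i writes a new file and renames it over the original, so this test works because the loop reopens output1.txt on every echo. A script that holds the file open through a single > redirection will keep writing to the old inode, just as it did after the mv. Also, based on your description, your box is under a heavy load from all that dumping, which may negatively impact the process outlined above.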
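The inode behaviour behind the failed mv attempt in the question can be reproduced in a few lines. This is a minimal sketch under assumed names: a subshell with a single > redirection stands in for script1.sh, and the files live in a temporary directory rather than /tmp/output1.txt.

```shell
demo_dir=$(mktemp -d)
out="$demo_dir/output1.txt"

# Stand-in for script1.sh: one redirection, held open for the whole run.
( for i in 1 2 3 4 5 6; do echo "line $i"; sleep 0.2; done ) > "$out" &
writer_pid=$!

sleep 0.5                  # let the writer open the file and emit a few lines
mv "$out" "${out}_bkp"     # rename: same inode, new name
touch "$out"               # brand-new inode under the old name
wait "$writer_pid"

new_bytes=$(wc -c < "$out")
bkp_lines=$(wc -l < "${out}_bkp")
echo "new file: $new_bytes bytes; renamed file: $bkp_lines lines"
rm -rf "$demo_dir"
```

The writer's file descriptor follows the inode, not the name, so every line ends up in the renamed file and the freshly touched file stays empty. Freeing space without a restart therefore has to act on the inode the writer already holds; whether truncating that file in place is safe depends on how the writer opened it, which the question does not show.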
I want to run a particular function from my .bashrc file periodically in the background (it removes exited Docker containers).
I have already looked into cron, but it is not useful for me. Please suggest any other methods to do it.
I also tried writing a while loop with sleep, but that is not efficient because we have to start and stop it every time.
First choice is cron, but you can also use at.
Here is a little example. The script reschedules itself to run once per minute and logs each run into logfile.dat
#!/bin/bash
echo "bash $0" | at now +1 minutes -M
date >> /tmp/logfile.dat
With atq you can see which jobs are waiting for their next run, and with atrm you can stop the cycle.
==> man at
I don't necessarily consider this a great idea either, but to answer the question you asked...
Here's a simple template you should be able to adapt.
chime() {
    local chimeDelay=10             # seconds, adjust to your needs
    echo "bong!"; date;             # code that Does The Thing
    sleep $chimeDelay && chime &    # snooze and Do The Thing again
} >>/tmp/chimelog 2>>/tmp/chime.err # append to logs (a plain > would truncate them on every call), not your console
Once you execute this it should keep spawning as long as you are logged in, but ought to collapse on a HUP, which I assume is what you wanted. If you just wanted a cron substitute, then write and run it as a simplistic daemon with a HUP trap, but you probably should add locks to keep multiple instances from running, etc.
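The "simplistic daemon with a HUP trap" and the lock mentioned above could look roughly like this. It is a sketch under assumptions, not a hardened daemon: the function name run_periodically and its argument order are hypothetical, and a lock directory is used because mkdir is atomic.

```shell
# run_periodically LOCKDIR INTERVAL COMMAND...
# Starts a background loop that runs COMMAND every INTERVAL seconds.
run_periodically() {
    local lockdir=$1 interval=$2
    shift 2
    # mkdir either creates the directory or fails, so it doubles as a lock.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "already running (lock: $lockdir)" >&2
        return 1
    fi
    (
        nap=
        # On HUP or TERM: stop the pending sleep, release the lock, exit.
        trap 'kill "$nap" 2>/dev/null; rmdir "$lockdir"; exit 0' HUP TERM
        while true; do
            "$@"
            sleep "$interval" &
            nap=$!
            wait "$nap"    # wait is interruptible, so the trap fires promptly
        done
    ) &
}
```

For the question's use case the call might be run_periodically /tmp/docker-prune.lock 600 docker container prune -f, where the lock path and the 600 second interval are made up for illustration. The loop's PID is available in $! right after the call, so kill -HUP "$!" stops it and removes the lock.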
I have a perl script that runs and does some checks.
In some cases that script fails and stops processing, and in others it completes.
What I would like to do is check whether the script ran within 1 minute and, if the run was successful, exit.
I thought about saving some file or checking $? as an indication, but I thought there might be a standard, clean approach for this.
I would like a solution that works on both Linux and Mac.
You could check whether your script has ended after a minute by trying something like this:
sleep 60
ps -ae | grep yourScript.name
This has to be executed at the same time as your script(s). If it returns nothing, that means your script isn't running anymore, aka has ended.
For the final result, you could make your perl script write into a specific log file, and check the end of this log file if the ps -ae | grep yourScript.name returned nothing.
Hope it helped !
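Checking $? can be combined with the one-minute limit by starting the script in the background and polling it. The following is a minimal sketch that uses only bash builtins plus sleep, so it should behave the same on Linux and macOS. The function name run_with_deadline and the 124 return code for a timeout are assumptions chosen for illustration.

```shell
# run_with_deadline SECONDS COMMAND...
# Returns COMMAND's exit status, or 124 if it was still running at the deadline.
run_with_deadline() {
    local deadline=$1 waited=0 pid
    shift
    "$@" &
    pid=$!
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if (( waited >= deadline )); then
            kill "$pid" 2>/dev/null || true
            wait "$pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"    # collect the real exit status
}
```

A call such as run_with_deadline 60 perl yourScript.pl (with yourScript.pl standing in for the real script) then leaves 0 in $? only when the script finished within a minute and reported success.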
I have a bash script that I'm running from DVD. This script copies multi-volume tar files from DVD to the local machine. Part-way through the copy, the script prompts the user to insert a second DVD, at which point the remaining files are copied. The script exists on the first DVD but not on the second.
This script is simply stopping after the last file is copied, but prior to starting the tar multi-volume extract operation and subsequent processing. There are no errors or messages reported. I've tried running bash with '-x' but there's nothing suspicious - not even an exit statement. Even more unfortunate is the fact that this behavior is inconsistent. Sometimes the script will stop, but other times it will continue with no problems.
I have run strace on the script. Following the conclusion of the copy operations, I see this:
read(255, "\0\0\0\0\0\0\0\0\0\0"..., 5007) = 1302
read(255, "", 5007) = 0
exit_group(0) = ?
I know that bash reads the script file into memory and executes it from there, but is it possible that it's trying to re-read the script file at some point and failing (since it no longer exists)? The tar files are quite large, and it takes approximately 10-15 minutes from the time the script starts to the time the last file is copied (from the second DVD).
I see you have already found a workaround, so I will just try to uncover what's happening:
bash isn't reading the whole script into memory, it's doing buffered reads on it, only as much as necessary each time (presumably that's for code sharing with terminal input). Before any external commands are launched, bash seeks to the exact position in the script and continues to read from there after the command finishes. You can see this if you edit the script file while it's running:
term1$ cat > test.sh
sleep 8
echo DONE
term1$ bash test.sh
While the sleep is executing, change the script from another terminal:
term2$ cat > test.sh
echo HAHA
Observe how bash becomes confused when the sleep is complete:
test.sh: line 2: A: command not found
It remembers that the position in the input file was 8 before the sleep, so it tries to read from there and is confronted with the last A from the overwritten script.
Now to your case. Normally, having a file open from a dvd locks the drive and prohibits disk change. If you nevertheless manage to change the disk, that should definitely involve an umount which should then invalidate the script fd. That's clearly not happening according to your strace output, which is a little strange. In any case, bash won't be able to read the rest of the script.
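The workaround the asker found is not shown here. One common way to make a script independent of its file, given the incremental reading described above, is to wrap the body in a brace group that ends with exit: bash has to parse the whole compound command before running any of it, and it never reads past the exit. The sketch below demonstrates this on a temporary file; the file names are placeholders.

```shell
demo=$(mktemp)

# The whole body sits inside { ... } and ends with exit.
cat > "$demo" <<'EOF'
{
    sleep 2
    echo DONE
    exit
}
EOF

bash "$demo" > "$demo.out" 2>&1 &
script_pid=$!

sleep 1
echo 'echo HAHA' > "$demo"    # overwrite the script while it is sleeping
wait "$script_pid"

result=$(cat "$demo.out")
echo "$result"
rm -f "$demo" "$demo.out"
```

Unlike the unwrapped test.sh above, the overwritten file should no longer confuse bash, because the commands were already in memory when the file changed.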