pbs: input file not correctly updated

I am using a PBS queuing system and submit a job with the following bash jobscript
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:30:00
./aprogram $input
using qsub -v "input=myinputfile" script.job
This works fine, except that if I run the job, change something in the input file without renaming it, and rerun the job shortly thereafter, the input file that the program aprogram sees is still the old one.
Apparently the old copy is cached somewhere and the file is not reread if the interval is too short (waiting a few minutes does the trick). Does anybody have an idea where I could flush that copy so that the file is read correctly?

I can't speak for all PBS queuing systems, but Torque only makes a local copy of the script; it doesn't parse the script to figure out which files the script uses and make local copies of those as well. In other words, if you are using Torque, it is not Torque doing this. I'm not sure what it could be, although I would try to track down what gets refreshed every few minutes to see if that could be it.
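If the stale copy turns out to be caused by filesystem caching of the unchanged file name on the compute node (an assumption, since the cause isn't confirmed), one workaround is to snapshot the input under a unique name at submission time and pass that copy to the job. A minimal sketch, with example file names:
stamped="myinputfile.$(date +%s)"   # unique per-submission copy (hypothetical name)
cp myinputfile "$stamped"
qsub -v "input=$stamped" script.job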

Related

Need suggestion to move a big live file in Linux

Multiple scripts are running on my Linux server and generating huge amounts of data. I realise they will eat all 500 GB of my storage in the next 2-5 days, but they need about 10 more days to finish, which means they need more space. So most likely I am going to run into a space problem and will have to restart the entire process.
Process is like this -
script1.sh content is like below
"calling an api" > /tmp/output1.txt
script2.sh content is like below
"calling an api" > /tmp/output2.txt
Executed like this -
nohup ./script1.sh & ### this creates /tmp/output1.txt
nohup ./script2.sh & ### this creates /tmp/output2.txt
My initial understanding was that if I followed the steps below, it would work:
While the scripts are running with nohup in the background, execute this command:
mv /tmp/output1.txt /tmp/output1.txt_bkp; touch /tmp/output1.txt
Then transfer /tmp/output1.txt_bkp to another server via FTP and remove it afterwards to free space on the server, while the script keeps writing to /tmp/output1.txt.
But this assumption was wrong: the script keeps writing to /tmp/output1.txt_bkp. I think the script writes based on the inode number, which is why it keeps writing to the old file.
Now the question is: how do I avoid the space issue without killing or restarting the scripts?
Essentially what you're trying to do is pull a file out from under a script that's actively writing into it. I'm not sure how nohup would let you do that.
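A quick side demonstration of that point (not from the question): mv keeps the inode, and an open file descriptor follows the inode rather than the name.
echo hello > /tmp/demo.txt
ls -i /tmp/demo.txt                 # note the inode number
mv /tmp/demo.txt /tmp/demo_bkp.txt
ls -i /tmp/demo_bkp.txt             # same inode, so a process that already has the
                                    # file open keeps writing to it under the new name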
May I suggest a different approach?
Why don't you move an x number of lines from your /tmp/output[x].txt to /tmp/output[x].txt_bkp? You can do so without much trouble while your script is running and dumping stuff into /tmp/output[x].txt. That way you can free up space by shrinking your output[x] files.
Try this as a test. Open 2 terminals (or use screen) to your Linux box. Make sure both are in the same directory. Run this command in one of your terminals:
for line in `seq 1 2000000`; do echo $line >> output1.txt; done
And then run this command in the other before the first one finishes:
head -1000 output1.txt > output1.txt_bkp && sed -i '1,+999d' output1.txt
Here is what's going to happen. The first command will start producing a file that looks like this:
1
2
3
...
2000000
The second command will chop off the first 1000 lines of output1.txt and put them into output1.txt_bkp and it will do so WHILE the file is being generated.
Afterwards, look inside output1.txt and output1.txt_bkp; you will see that the former looks like this:
1001
1002
1003
1004
...
2000000
While the latter will have the first 1000 lines. You can do the same exact thing with your logs.
A word of caution: Based on your description, your box is under a heavy load from all that dumping. This may negatively impact the process outlined above.
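To extend the test into something closer to an ongoing cleanup job, the trimming command can be wrapped in a loop. This is only a sketch: it assumes, as in the test above, that the writer reopens the file by name for each append, and the remote host and path are placeholders:
while true; do
    # peel off the oldest 1000 lines into a backup file
    head -1000 output1.txt > output1.txt_bkp && sed -i '1,+999d' output1.txt
    # ship the backup elsewhere under a unique name, then free the local copy
    scp output1.txt_bkp user@backuphost:/archive/output1.$(date +%s) && rm -f output1.txt_bkp
    sleep 30
done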

Why does my crontab not work?

I am planning to run some bash scripts every minute, and I wrote:
* * * * * bash ~/Dropbox/temp_scripts/run_all_scripts
in crontab.
It was supposed to run every minute, but it did not work. Does anyone have any idea why this happens?
Transferring a comment into an answer.
Add I/O redirection to the command line in the crontab entry:
>/tmp/run_all_scripts.out 2>/tmp/run_all_scripts.err
Review the contents of the files after a minute or two has passed. Consider recording the environment to see if that's part of the problem. And consider using bash -x instead of just bash.
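For example, the whole crontab entry might look like this (a sketch; the /tmp file names are just examples), with a second entry that records the cron environment:
* * * * * bash -x ~/Dropbox/temp_scripts/run_all_scripts >/tmp/run_all_scripts.out 2>/tmp/run_all_scripts.err
* * * * * env >/tmp/cron_env.out 2>&1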
If you still don't get anything (the files in /tmp are not created), then you've got issues with cron; the daemon isn't running, or your user does not have permission to use it (but crontab isn't telling you that), or you've not submitted your crontab to the program (what does crontab -l say?), or … whatever is really wrong.
Note, too, that the output from cron jobs is normally (well, at least sometimes — on Mac OS X for a system I currently use, and Solaris for another that I've used previously) emailed to the person whose job it is. You should review the email on the system.
Thank you! I have already fixed it! The reason it did not work is that I used "ls -a *.sh" in the script, and cron did not find any *.sh files in the directory it was executing in. After modifying it to "ls -a $HOME/Dropbox/temp_scripts/*.sh", everything works! This debugging technique is quite helpful!
It is, in many ways, the most basic of debugging techniques — make sure you see what is actually happening. If you're not sure why a shell script isn't working, make sure you can see that it is executing and what it is producing in the way of output, and (very often) make sure you can see what it is executing with bash -x or equivalent. (AFAIK, all shells support -x to trace the execution.)

Run two shell scripts in parallel and capture their output

I want to have a shell script that configures several things and then calls two other shell scripts. I want these two scripts to run in parallel, and I want to be able to get and print their live output.
Here is my first script which calls the other two
#!/bin/bash
#CONFIGURE SOME STUFF
$path/instance2_commands.sh
$path/instance1_commands.sh
These two processes deploy two different applications, and each of them takes around 5 minutes, so I want to run them in parallel and also see their live output so I know where they are in the deployment. Is this possible?
Running both scripts in parallel can look like this:
#!/bin/bash
#CONFIGURE SOME STUFF
$path/instance2_commands.sh >instance2.out 2>&1 &
$path/instance1_commands.sh >instance1.out 2>&1 &
wait
Notes:
wait pauses until the children, instance1 and instance2, finish
2>&1 on each line redirects error messages to the relevant output file
& at the end of a line causes the main script to continue running after forking, thereby producing a child that is executing that line of the script concurrently with the rest of the main script
each script should send its output to a separate file. Sending both to the same file will be visually messy and impossible to sort out when the instances generate similar output messages.
you may attempt to read the output files while the scripts are running with any reader, e.g. less instance1.out; however, the output may be stuck in a buffer and not up to date. To fix that, the programs would have to open stdout in line-buffered or unbuffered mode, and it is still up to you to refresh the display, for example by following the files with tail -f (see the sketch after this list).
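As a sketch of that, assuming the scripts write through stdio (so stdbuf can influence their buffering) and that GNU stdbuf and tail are available:
#!/bin/bash
#CONFIGURE SOME STUFF
stdbuf -oL $path/instance1_commands.sh >instance1.out 2>&1 &
pid1=$!
stdbuf -oL $path/instance2_commands.sh >instance2.out 2>&1 &
pid2=$!
tail -f instance1.out instance2.out &   # follow both outputs live
tailpid=$!
wait "$pid1" "$pid2"                    # wait only for the two deploy scripts
kill "$tailpid"                         # stop the viewer once both have finished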
Example D from an article on Apache Spark and parallel processing on my blog provides a similar shell script for calculating sums of a series for Pi on all cores, given a C program for calculating the sum on one core. This is a bit beyond the scope of the question, but I mention it in case you'd like to see a deeper example.
It is very possible; change your script to look like this:
#!/bin/bash
#CONFIGURE SOME STUFF
$path/instance2_commands.sh >> script.log &
$path/instance1_commands.sh >> script.log &
They will both output to the same file and you can watch that file by running:
tail -f script.log
If you like, you can output to 2 different files instead. Just change each line to output (>>) to a second file name.
This is how I ended up writing it, using Paul's instructions.
source $path/instance2_commands.sh >instance2.out 2>&1 &
source $path/instance1_commands.sh >instance1.out 2>&1 &
tail -q -f instance1.out -f instance2.out --pid $!
wait
sudo rm instance1.out
sudo rm instance2.out
The logs from my two processes were different, so I didn't mind them being mixed together; that is why I merged them all into one stream.

IO Redirection in Linux Bash shell scripts not recreating moved/deleted file?

I am quite new to shell programming on Linux. In my Linux instance, I am redirecting the stdout and stderr of a program to two files in the following manner and running it in the background:
myprog > run.log 2>> err.log &
This works fine, and I get my desired behavior.
Now there is another background process that monitors run.log and err.log, and moves them to other file names if the log files grow beyond a certain threshold,
e.g. mv err.log err[date-time].log
My expectation is that after this file move happens, err.log will be created again by the output redirection of myprog and new output will be written to that new file. However, after my log-file monitoring process moves the file, err.log and run.log never get created again, although myprog continues to run without any issues.
Is this the normal behavior in Linux? If it is, what should I do to get my expected behavior working?
Yes, it is. Unless your program reopens the files, it will keep writing to the old file, even if you can't access it under the old name anymore. In fact, the space used by a removed file only becomes available once every process has closed it. If reopening is not possible (i.e. you can't change the executable or restart it), then a tool like http://httpd.apache.org/docs/2.4/programs/rotatelogs.html is your best bet.
It can rotate logs based on file size or time, and can even call a custom script after a rotation.
Example usage:
myprog | rotatelogs logname.log 50M
This way the log will be rotated whenever the size reaches 50 megabytes.
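To keep stdout and stderr in separate rotated logs, as in the original redirection, one possibility (a sketch that assumes bash process substitution and rotatelogs' strftime-style file names) is:
myprog > >(rotatelogs run.%Y-%m-%d.log 86400) 2> >(rotatelogs err.%Y-%m-%d.log 86400) &
Here 86400 rotates the files daily instead of by size.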
[EDIT: pointed to a newer version of rotatelogs]
If I had to guess, the logging process is associated with a file descriptor, not a file name. When you rename the file, you only change its name, so the process just keeps logging to the same file. If I were tasked with fixing it, I would stop the logging process and restart it at that point to re-associate it with the right file. Just a guess, though.
Software that supports log rotation has explicit support for it written in. If you look at man logrotate, you'll notice that a typical configuration looks like this:
"/var/log/httpd/access.log" /var/log/httpd/error.log {
rotate 5
mail www#my.org
size 100k
sharedscripts
postrotate
/usr/bin/killall -HUP httpd
endscript
}
...which is to say that it sends a HUP signal to the program whose log has been rotated; that program has a signal handler that reopens its output files.
You can do this in your shell scripts too:
reopen_logs() {
    exec >>run.log 2>>err.log
}
trap reopen_logs HUP
...then, after rotating your logs, run kill -HUP pid_of_yourscript; the next time the script finishes a foreground command (signal handlers only run between foreground executables), it will reopen its output, recreating the log file, without needing to restart.
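A self-contained sketch of that pattern (the work loop and log names below are placeholders, not taken from the question):
#!/bin/bash
# hypothetical long-running script that reopens its logs when sent SIGHUP
exec >>run.log 2>>err.log

reopen_logs() {
    exec >>run.log 2>>err.log   # recreate the files by name after a rotation
}
trap reopen_logs HUP

while true; do
    echo "still working: $(date)"   # stand-in for the real workload
    sleep 60
done
The rotating side would then do something like mv run.log run.$(date +%F).log && kill -HUP <pid of the script above>; once the current sleep finishes, the trap runs and a fresh run.log appears.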

Bash script exits with no error

I have a bash script that I'm running from DVD. This script copies multi-volume tar files from DVD to the local machine. Part-way through the copy, the script prompts the user to insert a second DVD, at which point the remaining files are copied. The script exists on the first DVD but not on the second.
This script simply stops after the last file is copied, but before starting the multi-volume tar extraction and the subsequent processing. No errors or messages are reported. I've tried running bash with '-x', but there's nothing suspicious, not even an exit statement. Even more unfortunately, this behavior is inconsistent: sometimes the script stops, but other times it continues with no problems.
I have run strace on the script. Following the conclusion of the copy operations, I see this:
read(255, "\0\0\0\0\0\0\0\0\0\0"..., 5007) = 1302
read(255, "", 5007) = 0
exit_group(0) = ?
I know that bash reads the script file into memory and executes it from there, but is it possible that it's trying to re-read the script file at some point and failing (since it no longer exists)? The tar files are quite large, and it takes approximately 10-15 minutes from the time the script starts to the time the last file is copied (from the second DVD).
I see you have already found a workaround, so I will just try to uncover what's happening:
bash isn't reading the whole script into memory, it's doing buffered reads on it, only as much as necessary each time (presumably that's for code sharing with terminal input). Before any external commands are launched, bash seeks to the exact position in the script and continues to read from there after the command finishes. You can see this if you edit the script file while it's running:
term1$ cat > test.sh
sleep 8
echo DONE
term1$ bash test.sh
While the sleep is executing, change the script from another terminal:
term2$ cat > test.sh
echo HAHA
Observe how bash becomes confused when the sleep is complete:
test.sh: line 2: A: command not found
It remembers that the position in the input file was 8 before the sleep, so it tries to read from there and is confronted with the last A from the overwritten script.
Now to your case. Normally, having a file open from a DVD locks the drive and prohibits a disc change. If you nevertheless manage to change the disc, that should definitely involve an umount, which should then invalidate the script's fd. That's clearly not happening according to your strace output, which is a little strange. In any case, bash won't be able to read the rest of the script.
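One common defensive pattern for a script whose file may disappear mid-run (an assumption about the kind of workaround used, not what the asker actually did) is to wrap the entire body in a compound command, so bash parses it all before executing any of it; the paths below are placeholders:
#!/bin/bash
# wrapping the body in { ... } makes bash read and parse the whole block
# before running it, so the script file itself is not needed afterwards
{
    cp -r /mnt/dvd/files/. /local/dest/             # copy files from volume 1
    read -rp "Insert the second DVD and press Enter " _
    cp -r /mnt/dvd/files/. /local/dest/             # copy files from volume 2
    tar -xMf /local/dest/backup.tar -C /local/dest  # multi-volume extract
    exit                                            # never read past the block
}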
