We have an hourly.sh script that runs an abc.py script.
1. When I run abc.py on its own, it runs fine.
2. When I run an empty hourly.sh (without abc.py inside), it also runs fine.
But when hourly.sh is run with abc.py inside, it hits memory-related issues ("16214 Segmentation fault (core dumped)"). As an additional data point: no other script runs at the same time that could put extra load on the system.
What could cause a script to fail when triggered via cron?
There's always the possibility that differences in the runtime environment cause problems. Take a look at the process limits (number of open files, etc.), which you can inspect with the "ulimit" command.
Also check the quotas for the user the cron job runs as, and perhaps the PATH environment variable.
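A common way to pin this down is to capture what cron actually sees and diff it against your interactive shell. A minimal sketch (the dump paths are just an example):

```shell
#!/bin/sh
# Run this once from a temporary cron entry
# (e.g. "* * * * * /path/to/dumpenv.sh") and once from an interactive
# shell, then diff the resulting files to spot environment differences.
env | sort > /tmp/env.cron.txt      # environment variables in effect
ulimit -a  > /tmp/limits.cron.txt   # resource limits in effect
```

Differences in PATH, PYTHONPATH, or a lower stack-size limit under cron are frequent causes of a Python script segfaulting only when launched from a cron job.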
I am trying to run a simple piece of code under SGE. However, I get very different results from the same code when running in an interactive session (qrsh) versus via qsub. Often the code fails to run from qsub (without any warning or error).
Is there any way to set up an interactive session within a batch submission (running qrsh within qsub)?
qsub test.sh
-V
-cwd
source MYPATH
qrsh
MYCODEHERE
Not sure if what you ask is possible. I can think of two reasons why you might be observing different results:
1) Environment differences between cluster nodes.
2) Incomplete outputs: maybe the code runs into an edge case (not enough memory, etc.) and exits silently.
Not exactly what you asked for, but just trying to help.
You could submit a parallel job and then use qrsh -inherit <hostname> <command> to run a command under qrsh. Unfortunately, Grid Engine limits the number of times you can call qrsh -inherit to either the number of slots in the job or one less (depending on the job_is_first_task setting of the PE).
However, it is likely that the problems are caused by a difference between the environment qrsh gives you and the one qsub provides by default. If you are selecting the shell to interpret your job script in the traditional Unix way (putting #!/bin/bash or similar as the first line of your job script), you could try adding -l to that line to make it a login shell (#!/bin/bash -l), which is likely closer to what you get with qrsh.
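For example, a job script along these lines (a sketch; the echo is just a placeholder for your real command) runs under a login shell with the submitter's environment exported:

```shell
#!/bin/bash -l
# -l makes bash a login shell, so /etc/profile and ~/.bash_profile are
# read, which is closer to the environment a qrsh session gives you.
#$ -V      # SGE directive: export the submitting shell's environment
#$ -cwd    # SGE directive: run from the submission directory
echo "PATH as the job sees it: $PATH"
```

Comparing that echoed PATH against the one in your qrsh session is a quick way to confirm whether the environment is the culprit.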
NOTICE: Feedback on how this question could be improved would be great, as I am still learning. There is no code here because I am confident it does not need fixing: the script works as it should when I change the parameters to produce fewer outputs, and debugging it produces no errors. When the parameters are changed to produce more outputs, the script runs for hours and then stops. I have researched online a great deal and cannot find an answer. My goal with the question below is to determine whether Linux will time out a long-running process (or something related) and, if so, how that can be resolved.
I am running a shell script that has several for loops which does the following:
- Goes through existing files and copies data into a newly saved/named file
- Makes changes to the data in each file
- Submits these files (which number in the thousands) to another system
The script is very basic (beginner here), but as long as I don't give it too much to generate, it works as it should. However, if I make it loop through all possible cases, which means generating tens of thousands of files, then after a certain amount of time the shell script just stops running.
I have more than enough hard drive space for all the files being created. One thing to note, however: during the part where files are submitted, if the receiving machine is full at that moment, my shell script has to pause where it is and wait for the other machine to clear. This works for a while, but eventually the shell script stops running and won't continue.
Is there a way to make it continue or prevent it from stopping? I typed Ctrl+Z to suspend the script and then fg to resume, but it still does nothing. I check progress with ls -la to see if the file sizes are increasing, and they are not, although top/ps says the script is still running.
Assuming you are using Bash for your script: most likely you are running out of system resources for your shell session, and most likely the manner in which your script works is causing the issue. Without seeing your script it is difficult to provide specific guidance; however, you can check several items at the system level that may help, i.e.
review system logs for errors about your process or about 'system resources'
check your docs: man ulimit (or 'man bash' and search for 'ulimit')
consider removing 'deep nesting' (if present); instead, create work sets where step one builds the 'data' needed for the next step, i.e. if possible, instead of:
step 1 (all files) ## guessing this is what you are doing
step 2 (all files)
step 3 (all files)
Try each step for each file instead. Something like:
for MY_FILE in ${FILE_LIST}
do
    step_1
    step_2
    step_3
done
:)
Dale
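A runnable sketch of the per-file approach above, with dummy step functions standing in for the real copy/edit/submit work:

```shell
#!/bin/sh
# Each step_* is a stand-in; replace the bodies with the real
# copy / edit / submit logic from your script.
step_1() { printf 'copy   %s\n' "$1"; }
step_2() { printf 'edit   %s\n' "$1"; }
step_3() { printf 'submit %s\n' "$1"; }

FILE_LIST="a.dat b.dat c.dat"   # hypothetical file list
for MY_FILE in $FILE_LIST; do
    step_1 "$MY_FILE"
    step_2 "$MY_FILE"
    step_3 "$MY_FILE"   # each file is fully processed before the next
done
```

Processing one file completely before moving on keeps the working set small, so the script's memory and open-file usage stays flat no matter how many files you loop over.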
I've been troubleshooting this issue for about a week and I am nowhere, so I wanted to reach out for some help.
I have a Perl script that I execute via the command line, usually like:
nohup ./script.pl --param arg --param2 arg2 &
I usually have about ten of these running at once to process the same type of data from different sources (that is specified through parameters). The script works fine and I can see logs for everything in nohup.out and monitor status via ps output. This script also uses a sql database to track status of various tasks, so I can track finishes of certain sources.
However, that was too much work, so I wrote a wrapper script to execute the script automatically and that is where I am running into problems. I want something exactly the same as I have, but automatic.
The getwork.pl script runs ps and parses the output to find out how many other processes are running; if the count is below the configured threshold, it queries the database for the most out-of-date source and kicks off the script.
The problem is that the kicked off jobs aren't running properly, sometimes they terminate without any error messages and sometimes they just hang and sit idle until I kill them.
The getwork script queries SQL and gets the entire execution command via SQL concatenation, so in the query I am doing something like CONCAT('nohup ./script.pl --arg ',param1,' --arg2 ',param2,' &') to get the command string.
I've tried everything to get these kicked off. I've tried using system(), but again, some jobs kick off, some don't; sometimes it gets stuck, sometimes jobs start and then die within a minute. If I take the exact command used to start the job and run it in bash, it works fine.
I've tried to also open a pipe to the command like
open my $ca, "| $command" or die ($!);
print $ca $command;
close $ca;
That works about as well as everything else I've tried. The getwork script used to be executed through cron every 30 minutes, but I scrapped that because I needed another shell wrapper script, so now there is an infinite loop in the getwork script that executes a function every 30 minutes.
I've also tried many variations of the execution command, including redirecting output to different files, etc... nothing seems to be consistent. Any help would be much appreciated, because I am truly stuck here....
EDIT:
Also, I've tried to add separate logging within each script: each would start a new log file named with its PID ($$). There was a bunch of weirdness there too. All the log files would get created, but then some of the processes would be running and writing to their file, others would just have an empty text file, and some would have only one or two log entries. Sometimes the process would still be running but not doing anything; other times it would die with nothing in the log. Running the command in a shell directly always works, though.
Thanks in advance
You need some kind of job-managing framework.
One of the biggest is Gearman: http://www.slideshare.net/andy.sh/gearman-and-perl
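Short of a framework, a frequent cause of wrapper-launched jobs that hang or die silently is an inherited terminal: the child blocks reading stdin, or loses its output when the parent exits. A hedged sketch of a fully detached launch (sleep 1 stands in for ./script.pl from the question):

```shell
#!/bin/sh
# Detach stdin/stdout/stderr from the terminal so the child can never
# block on a tty and its output always lands in a log file.
nohup sleep 1 </dev/null >>worker.log 2>&1 &
PID=$!
echo "started PID $PID"
```

Redirecting stdin from /dev/null and all output to a file removes two of the main behavioral differences between "run by hand in bash" and "spawned from a long-lived wrapper process".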
I have set a cron job for a file to run every 6 hours. The file may run for 4 hours.
If I set up a cron job for another file, will it affect the previous one, which may run for 4 hours?
No. If the jobs are not working on the same resources, they won't conflict even if they run simultaneously.
The cron daemon doesn't check whether anything else by the same name is running, if that is what you mean, so cron will not care. However, if your script creates temporary files without using helper tools like "mktemp", for example, they could conflict with each other; so it depends on how well written your script is.
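If you do need to guarantee that two runs never overlap, a lock via flock (from util-linux) is a common pattern; the lock path here is arbitrary:

```shell
#!/bin/sh
# Take an exclusive, non-blocking lock; if a previous run still holds
# it, skip this run instead of letting instances pile up.
flock -n /tmp/sixhourly.lock -c 'echo "lock acquired, running job"' \
  || echo "previous run still active, skipping"
```

Put the flock invocation in the crontab entry itself and both the 6-hourly job and any new job can share or avoid the same lock file as needed.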
I run a project build command which has a long execution period. I run it when I go to sleep. Is there some Linux system mechanism that logs the command duration in bash?
It is the time command you are searching for. Just prepend time to the command to be executed:
time build_cmd
Output looks like:
real 0m0.153s
user 0m0.004s
sys 0m0.000s
where real is the total time the command takes to run, user is the amount of time spent in userland code, and sys is the time spent in kernel code.
If you want to get fancy, you can use Jenkins to do your build. It can build any type of project, and not just Java projects. One of the plugins is the Timestamper which will add a time stamp to each execution line of your build. Plus, Jenkins will always tell you (with or without this plugin) how long the build took.
There are lots of advantages of using a CI server for builds, even if you don't use it for continuous integration. (i.e., you either spawn the builds manually, or at a certain time of day).
You can also use the time command, as pointed out by hek2mgl, or write a tiny shell script that prints a timestamp before and after the build:
date
build_cmd
date
Jenkins, as proposed by @DavidV, is a very complete tool to automate and manage build processes, while the time command, prefixed to the build command, will tell you exactly how much time the build took, as proposed by @hek2mgl.
But suppose you do not want to install Jenkins, you forgot to type time, and you have already started your command; then you can use either of the following tricks:
look at the modification times of the first and latest build targets, or
while the build command is running, type the date command blindly into the terminal, across the output of the build process. It will be buffered and executed when the build terminates. Combine that with $HISTTIMEFORMAT and you get a pretty good idea of how long the process took.
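The first trick can be scripted after the fact. This sketch assumes GNU find and that the build wrote its outputs under ./build; the setup lines only create a tiny demo tree:

```shell
#!/bin/sh
# Demo setup (hypothetical): pretend ./build holds the build artifacts.
mkdir -p build && touch build/first && sleep 1 && touch build/last

# Approximate duration = newest artifact mtime - oldest artifact mtime.
oldest=$(find build -type f -printf '%T@\n' | sort -n | head -1)
newest=$(find build -type f -printf '%T@\n' | sort -n | tail -1)
awk -v a="$oldest" -v b="$newest" \
    'BEGIN { printf "build spanned ~%.0f seconds\n", b - a }'
```

This only measures the span between the first and last files written, so it undercounts any compile time before the first artifact appeared, but it needs nothing set up in advance.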