Duplicate jobs in Sun Grid Engine - qsub

When I run qacct with a job ID after the job has finished, I get two results:
the job I ran and an older job with the same job ID.
How can I delete the qacct history?
Does anyone know how to solve this?
Thanks,
Tsvi

Grid Engine (SGE) has job IDs in the range 0..99999. On busy clusters the counter may roll over quickly, and some people still want statistics for older jobs with the same ID. You can tell your jobs apart if you also know the approximate submit time.
Anyway, if you want to eliminate the duplicate job IDs from qacct, you can rotate the accounting file (//common/accounting) using utilities like logchecker.sh.
Check the man page or this Grid Engine online documentation:
http://gridscheduler.sourceforge.net/howto/rotatelogs.html
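
If rotating the accounting file is not an option, the duplicate records can also be told apart after the fact. The following is a minimal sketch, not an SGE tool: it assumes that `qacct -j <id>` separates records with lines of "=" characters, that each field is printed as a name followed by whitespace and a value, and that records come out oldest first. The function names and the sample data are made up for illustration.

```javascript
// Split `qacct -j <id>` output into one object per accounting record.
// Assumption: records are separated by lines made only of "=" characters
// and every other line is "<field><whitespace><value>".
function parseQacctRecords(text) {
  const records = [];
  let current = null;
  for (const line of text.split('\n')) {
    if (/^=+\s*$/.test(line)) {
      current = {};
      records.push(current);
      continue;
    }
    const match = /^(\S+)\s+(.*\S)\s*$/.exec(line);
    if (match && current) {
      current[match[1]] = match[2];
    }
  }
  return records;
}

// Assumption: records are printed in accounting-file order, oldest first,
// so the last record is the most recent job that used this ID.
function latestRecord(text) {
  const records = parseQacctRecords(text);
  return records.length > 0 ? records[records.length - 1] : null;
}

// Made-up sample output with two jobs that share one job ID.
const sample = [
  '==============================================================',
  'qname        all.q',
  'owner        alice',
  'jobnumber    4242',
  'qsub_time    Mon Mar  3 09:15:00 2014',
  '==============================================================',
  'qname        all.q',
  'owner        tsvi',
  'jobnumber    4242',
  'qsub_time    Thu Jan  8 17:40:12 2015',
].join('\n');

console.log(latestRecord(sample));
```

Comparing the qsub_time field of each record against the approximate submit time you remember is the more robust check if the ordering assumption does not hold on your installation.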

Related

Getting the load for each job

Where can I find the load (used/claimed CPUs) per job? I know how to get it per host using sinfo, but that does not directly show which job is causing a possibly 'incorrect' load, i.e. anything not equal to 1.
(I want to get this for all jobs, i.e. logging in to the node and running top is not my objective.)
You can use
sacct --format='jobid,ReqCPUS,elapsed,AveCPU'
and compare Elapsed with AveCPU. The latter will only be available for job steps, not for the whole job.
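
The comparison can be scripted. This is a rough sketch under stated assumptions: the input is the text produced by `sacct --parsable2 --noheader --format=jobid,ReqCPUS,elapsed,AveCPU` (pipe-separated fields, no header), durations look like `[DD-]HH:MM:SS`, and AveCPU divided by Elapsed is read as the approximate number of CPUs a task kept busy. The function names and sample rows are invented for illustration.

```javascript
// Convert a Slurm duration such as "1-02:03:04", "02:03:04" or "03:04"
// into seconds. Returns null for empty or unparsable input.
function parseDuration(text) {
  if (!text) return null;
  let days = 0;
  let rest = text;
  const dash = text.indexOf('-');
  if (dash !== -1) {
    days = Number(text.slice(0, dash));
    rest = text.slice(dash + 1);
  }
  const parts = rest.split(':').map(Number);
  if (Number.isNaN(days) || parts.length > 3 || parts.some(Number.isNaN)) {
    return null;
  }
  while (parts.length < 3) parts.unshift(0);
  const [hours, minutes, seconds] = parts;
  return ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
}

// Parse pipe-separated sacct rows and attach AveCPU / Elapsed as busyCpus.
// busyCpus is null where AveCPU is missing (whole-job rows) or Elapsed is 0.
function cpuUsage(sacctText) {
  return sacctText
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const [jobId, reqCpus, elapsed, aveCpu] = line.split('|');
      const elapsedSec = parseDuration(elapsed);
      const aveCpuSec = parseDuration(aveCpu);
      const usable = elapsedSec !== null && elapsedSec > 0 && aveCpuSec !== null;
      return {
        jobId,
        reqCpus: Number(reqCpus),
        busyCpus: usable ? aveCpuSec / elapsedSec : null,
      };
    });
}

// Made-up rows: one whole-job line and two job steps.
const rows = cpuUsage(
  '1234|4|01:00:00|\n1234.batch|4|01:00:00|00:58:30\n1234.0|4|00:30:00|01:00:00\n'
);
console.log(rows);
```

A step whose busyCpus is far below its reqCpus is a candidate for having claimed more CPUs than it used.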

OOZIE CRON Scheduler

How can I get the list of upcoming materialization times for an Oozie coordinator whose frequency is scheduled using a cron expression?
In Hue I can see only the next materialization time; I want a list of all the times it will run.
Thank you.
Oozie doesn't give you that list, nor does it provide tools to calculate it.
I usually convert my expression to cron syntax and check it with http://www.cronmaker.com/. This is not a solid answer to your question, but it at least shows the execution frequency.
REST syntax:
curl http://www.cronmaker.com/rest/sampler?expression={expression}
Sample call with the REST API:
http://www.cronmaker.com/rest/sampler?expression=0%200/2%20*%201/1%20*%20?%20*

Best practice beanstalkd (queue) and node.js

I am currently building a service using beanstalkd and node.js.
When a job fails, I would like to retry it n times before giving up on it.
If the job succeeds, I want to run the same job 10 times.
So what is the best practice: store the error and success counts in MongoDB keyed by the job ID, or delete the job and put a new one with the error and success counts in the body?
I don't know if I'm being clear, so tell me. Thanks a lot.
There is a stats-job <id>\r\n command, which should also be available via the API library, that returns, among other things, how many times the specific job has been reserved, released, buried, and so on.
This allows for a number of retries of failed jobs by checking previous reservation/releases.
To run the same job multiple times, I would personally create either one additional job, with a success count that would then be incremented (into another new job) - or, all nine new jobs, with optional delays before they start.
You have a couple of ways to do this:
you can release the job, and obtain from stats the number of reserves
you can put a new job with a retry count, and keep track of history in the data payload
You should do the latter, and you don't need MongoDB as a second dependency.
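
The bookkeeping for the second option can be sketched as a pure function that decides what the replacement job's payload should be. It makes no beanstalkd calls; the field names `failures` and `successes`, the option names, and the choice to count failures cumulatively are assumptions for illustration, not part of any client library.

```javascript
// Given the payload of the job that just ran and whether it succeeded,
// return the payload for the replacement job, or null when no new job
// should be put (retries exhausted, or enough successful runs).
// Assumption: failures are counted over the whole chain of jobs and are
// not reset after a success.
function nextPayload(payload, succeeded, options = {}) {
  const { maxRetries = 3, targetSuccesses = 10 } = options;
  const failures = (payload.failures || 0) + (succeeded ? 0 : 1);
  const successes = (payload.successes || 0) + (succeeded ? 1 : 0);

  if (failures > maxRetries) return null; // give up after n retries
  if (successes >= targetSuccesses) return null; // ran often enough
  return { ...payload, failures, successes };
}

// In a worker, after handling a reserved job (client calls omitted):
//   1. delete the reserved job
//   2. const next = nextPayload(JSON.parse(body), succeeded);
//   3. if (next !== null) put JSON.stringify(next) as a new job,
//      optionally with a delay
console.log(nextPayload({ task: 'resize' }, false));
```

Keeping the counters in the payload means the job history travels with the job, so no second store has to stay in sync with the queue.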

How to know OPC job status using Syncsort or any other method?

My objective is to get the current timestamp using Syncsort if an existing OPC job ran fine in production. In my case I cannot interpret my new job after the existing OPC job. Is there any facility to check that the existing job ran fine in production?
I mean, is there any reference table that holds production job details with the status for each day?
Can anyone please help me move forward?
There are commercial packages that track jobs and job status. CA (Computer Associates) is one such vendor.
However, these packages cost a lot. A simple, home-grown solution is to have a dataset known to both jobs: job1 writes a one-line record into that dataset when it completes, and job2 reads the dataset to know whether job1 ran. It is not exactly clear from your question whether this is what you are trying to do. But any solution along these lines works until management wants to cough up $50K (or whatever) for a commercial package.

How to process scheduled, recurring jobs with Kue?

In my webapp, users can create recurring invoices that need to be generated and sent out at certain dates each month. For example, an invoice might need to be sent out on the 5th of every month.
I am using Kue to process all my background jobs so I want to do it in this case as well.
My current solution is to use setInterval() to create a processRecurringInvoices job every hour. This job will then find all recurring invoices from database and create a separate generateInvoice job for each recurring invoice.
The generateInvoice job will then actually generate the invoice, and if needed, will also in turn create a sendInvoiceToEmail job that will email the invoice.
At the moment this solution looks good to me, because it has a nice separation of concerns, however, I have the following questions:
1. I am not sure if I should wait for all the 'child' jobs to complete before I call done() on the main processRecurringInvoices job?
2. Where should I handle errors? Should I pass them back to the processRecurringInvoices job or should I handle them separately for each job?
3. How can I make sure that if processing takes extra long (more than an hour), and either processRecurringInvoices or any of the child jobs are still running, the processRecurringInvoices job is not created again? Kind of like a unique job, or mutual exclusion?
1. Instead of "processRecurringInvoices" it might be easier to think of it as a job that initiates other, separate invoice-processing jobs. Thinking of it this way, once the invoice-processing jobs have been enqueued, you can safely call done() on the job that kicks them all off.
2. Thinking of the problem in the way described in the answer to question 1, errors should be handled within each of the individual invoice-processing jobs. If an error occurs while finding potential invoice jobs, it would probably be handled in the processRecurringInvoices job.
3. You can use kue.Job.rangeByType() to search for currently active jobs. If a job is active, you can skip kicking it off again.
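
The skip-if-already-running check might look like the sketch below. It is written against injected functions so it runs without Redis: `rangeByType` is expected to have the same shape as kue.Job.rangeByType(type, state, from, to, order, callback) as described in Kue's README, while `enqueueIfIdle` and `createJob` are hypothetical names. Checking the 'inactive' state as well as 'active' is an addition here, so that a job still waiting in the queue also blocks a new one.

```javascript
// Create a job of `type` only if none is queued or running.
// callback(err, created) reports whether a new job was created.
function enqueueIfIdle(rangeByType, createJob, type, callback) {
  const states = ['inactive', 'active'];

  function check(index) {
    if (index === states.length) {
      createJob(type);
      return callback(null, true);
    }
    // One result is enough to know that a job exists in this state.
    rangeByType(type, states[index], 0, 0, 'asc', (err, jobs) => {
      if (err) return callback(err);
      if (jobs && jobs.length > 0) return callback(null, false);
      check(index + 1);
    });
  }

  check(0);
}

// Possible wiring with Kue (assumes `kue` and `queue` are already set up):
//   enqueueIfIdle(
//     (...args) => kue.Job.rangeByType(...args),
//     (type) => queue.create(type, {}).save(),
//     'processRecurringInvoices',
//     (err, created) => { /* log or ignore */ }
//   );

// Stand-in for Kue that pretends one job is currently active.
const fakeRange = (type, state, from, to, order, cb) =>
  cb(null, state === 'active' ? [{ id: 1 }] : []);
enqueueIfIdle(fakeRange, () => {}, 'processRecurringInvoices', (err, created) => {
  console.log('created:', created);
});
```

This is a check followed by an action rather than an atomic operation, so two schedulers running at exactly the same moment could still both create a job; with a single setInterval() timer in one process that window is small.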

Resources