threading the same job to work on separate files

threading the same job to work on separate files - multithreading

I want to run a job on 10 threads to process 100 files. Each thread is supposed to work on separate file. When a thread is done it is supposed to pick the next file.
What I'm doing right now is basically going on a loop, kick off the job and make it run in the background (using &), wait for any process to end if the count of processes is greater than 10 and pick up the next file. It is working but is there a better to way to achieve this?

I don't see any better solution, as long as each file has to be processed separately.

You'd be better off having each thread process its file, then try to pick up the next file itself. Maybe it won't matter in your current application, but the starting and teardown of threads is relatively expensive. Common practice is to keep a thread alive if it is otherwise just going to be replaced by a clone of itself.

Related

Sleep loop in groovy for hour

hey getting used to groovy and i wanted to have a loop such as a do while loop in my groovy script which is ran every hour or 2 for until a certain condition inside the loop is met (variable = something). So I found the sleep step but was wondering if it would be ok to sleep for such a long time. The sleep function will not mess up right?

The sleep function will not mess up. But that isn't your biggest problem.
If all your script is doing is sleeping, it would be better to have a scheduler like Cron launch your script. This way is simpler and more resilient, it reduces the opportunities for the script to be accumulating garbage, leaking memory, having its JVM get killed by another process, or otherwise just falling into a bad state from programming errors. Cron is solid and there is less that can go wrong that way. Starting up a JVM is not speedy but if your timeframe is in hours it shouldn't be a problem.
Another possible issue is that the time your script wakes up may drift. The OS scheduler is not obliged to wake your thread up at exactly the elapsed time. Also the time on the server could be changed while the script is running. Using Cron would make the time your script acts more predictable.
On the other hand, with the scheduler, if a process takes longer than the time to the next run, there is the chance that multiple instances of the process can exist concurrently. You might want to have the script create a lock file and remove it once it's done, checking to see if the file exists already to let it know if another instance is still running.

First of all there's not do {} while() construct in groovy. Secondly it's a better idea to use a scheduler e.g. QuartzScheduler to run a cron task.

Having intercommunicating asynchronous processes in wxPython

I am working on a big project that puts performance as a high priority. I have a little bit of experience using wxPython to create windows and dialog boxes for software, but I have no experience in getting processes to work in parallel during the course of a single program.
So basically, what I want to accomplish is the following:
I want one main class that controls the high level program. It sets up a configuration either from a config file or from user input. This much I have accomplished on my own.
I need PROCESS #1 to read in a file and a list of commands, execute the commands, and then pass the modified file to PROCESS #2 (this requires that PROCESS #2 is ready to accept new input.) Once the file is passed, PROCESS #1 would begin work on the next set of inputs and wait for PROCESS #2 to finish before the cycle repeats.
PROCESS #2 takes input from PROCESS #1 and writes output to a log file. Once the output is complete, it waits for the next set of output from PROCESS #1.
I know how to use wxTimers and the events associated with that, but what I have found is that a timer event will not execute if the program is otherwise occupied (like in the middle of a method.)
I have seen threads about "threading" and "Pool", but the terminology tends to go over my head, and I haven't gotten any of that sort of stuff to work.
If anybody can point me in the right direction, I would be greatly appreciative.

If you use threads, then I think this would be fairly easy to do. Here's what I would suggest:
Create a button (or some other widget) to execute process #1 in a thread. The thread itself will run BOTH processes. Here's some psuedo-code that might help:
# this is in your thread code:
result = self.call_process_1(args)
self.call_process_2(result)
This will allow you to start another process #1/2 with a new set of commands every time you press the button. Since the two processes are encapsulated in the thread, they don't have to wait for process #2 to finish. You will probably need to log to separate logs for the logs to make sense, but you can label the logs with a timestamp and a thread number or a uuid.
Depending on how many of these processes you need to do, you might need to look into setting up a cluster that's driven with celery or some such. But I think this is a good starting place.

wxpython using gauge pulse with threaded long running processes

The program I am developing uses threads to deal with long running processes. I want to be able to use Gauge Pulse to show the user that whilst a long running thread is in progress, something is actually taking place. Otherwise visually nothing will happen for quite some time when processing large files & the user might think that the program is doing nothing.
I have placed a guage within the status bar of the program. My problem is this. I am having problems when trying to call gauge pulse, no matter where I place the code it either runs to fast then halts, or runs at the correct speed for a few seconds then halts.
I've tried placing the one line of code below into the thread itself. I have also tried create another thread from within the long running process thread to call the code below. I still get the same sort of problems.
I do not think that I could use wx.CallAfter as this would defeat the point. Pulse needs to be called whilst process is running, not after the fact. Also tried usin time.sleep(2) which is also not good as it slows the process down, which is something I want to avoid. Even when using time.sleep(2) I still had the same problems.
Any help would be massively appreciated!
progress_bar.Pulse()

You will need to find someway to send update requests to the main GUI from your thread during the long running process. For example, if you were downloading a very large file using a thread, you would download it in chunks and after each chunk is complete, you would send an update to the GUI.
If you are running something that doesn't really allow chunks, such as creating a large PDF with fop, then I suppose you could use a wx.Timer() that just tells the gauge to pulse every so often. Then when the thread finishes, it would send a message to stop the timer object from updating the gauge.
The former is best for showing progress while the latter works if you just want to show the user that your app is doing something. See also
http://wiki.wxpython.org/LongRunningTasks
http://www.blog.pythonlibrary.org/2010/05/22/wxpython-and-threads/
http://www.blog.pythonlibrary.org/2013/09/04/wxpython-how-to-update-a-progress-bar-from-a-thread/

Is is OK to use a non-zero return code for a process that executed successfully?

I'm implementing a simple job scheduler, which spans a new process for every job to run. When a job exits, I'd like it to report the number of actions executed to the scheduler.
The simplest way I could find, is to exit with the number of actions as a return code. The process would for example exit with return code 3 for "3 actions executed".
But the standard (AFAIK) being to use the return code 0 when a process exited successfully, and any other value when there was en error, would this approach risk to create any problem?
Note: the child process is not an executable script, but a fork of the parent, so not accessible from the outside world.

What you are looking for is inter process communication - and there are plenty ways to do it:
Sockets
Shared memory
Pipes
Exclusive file descriptors (to some extend, rather go for something else if you can)
...
Return convention changes are not something a regular programmer should dare to violate.

The only risk is confusing a calling script. What you describe makes sense, since what you want really is the count. As Joe said, use negative values for failures, and you should consider including a --help option that explains the return values ... so you can figure out what this code is doing when you try to use it next month.

I would use logs for it: log the number of actions executed to the scheduler. This way you can also log datetimes and other extra info.
I would not change the return convention...

If the scheduler spans a child and you are writing that you could also open a pipe per child, or a named pipes or maybe unix domain sockets, and use that for inter process communication and writing the processed jobs there.
I would stick with conventions, namely returning 0 for success, expecially if your program is visible/usable around by other people, or anyway document well those decisions.
Anyway apart from conventions there are also standards.

How can I keep a RPG program running in memory?

I coded a monitoring program in RPG that checks if the fax/400 is operational.
And now I want this program to check every 15 minutes.
Instead of placing a job every 15 minutes in the job scheduler (which would be ugly to manage), I made the program wait between checks using DLYJOB.
Now how can I make this program "place itself" in memory so it keeps running?
(I thought of using SBMJOB, but I can't figure in which job queue I could place it.)

A good job queue to use for an endlessly running job would be QSYSNOMAX. That allows unlimited numbers of jobs to be running.
You could submit the job to that queue in your QSTRTUP program and it will simply remain running all the time.

Here what I have done in the past. There are two approaches to this.
Submit a new job every time the program runs with DLYJOB before it runs.
Create a loop and only end given a certain condition.
What I did with a Monitor MSGW program was the following:
PGM
DCL VAR(&TIME) TYPE(*CHAR) LEN(6)
DCL VAR(&STOPTIME) TYPE(*CHAR) LEN(6) +
VALUE('200000')
/* Setup my program (run only once) */
START:
/* Perform my actions */
RTVSYSVAL SYSVAL(QTIME) RTNVAR(&TIME)
IF COND(&TIME *GE &STOPTIME) THEN(GOTO CMDLBL(END))
DLYJOB DLY(180)
GOTO CMDLBL(START)
END:
ENDPGM
This will run continuously until 8:00 pm. Then I add this to the job scheduler to submit every morning.
As far as which jobq. I am using QINTER, but it could really be run anywhere. Make sure you choose a subsystem with enough available running jobs as this will take one.
The negative of running in QINTER if the program starts to hit 100% CPU, that will use up all of your interactive CPU and effectively locks up your system.

i know of 3 ways to that.
1) using Data queue, there is parm to tell it to wait endlessly and at time-interval.
2) using OVRDBF cmd, there is parm there to tell that it should not end or EOF, making your pgm to keep on waiting.
3) easiest to implement, sbmjob to call a pgm that loops forever eg with DOW 1=1, you can insert a code to check for certain time interval before it iterates. You can have your logic inside the loop that checks for fax, process it and then back to waiting.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

threading the same job to work on separate files - multithreading

I don't see any better solution, as long as each file has to be processed separately.

Related

Sleep loop in groovy for hour

Having intercommunicating asynchronous processes in wxPython

wxpython using gauge pulse with threaded long running processes

Is is OK to use a non-zero return code for a process that executed successfully?

How can I keep a RPG program running in memory?

Categories

Resources