I have a question concerning the integration of split() and resequence() together with multithreading. My (naive) routes look like this (abbreviated to explain the problem):
from("file:input")
.process(prioAssign)
.split(body().tokenize("\n")).streaming()
.resequence().simple("${in.header.prio}").allowDuplicates().reverse()
.to("direct:process")
.end()
.process(exportProcessor)
from("direct:process")
.threads(10, 100, "process")
.process(importProcessor) // takes some time for processing
I'd like to accomplish the following things:
The importProcessor work should be distributed over several threads
The items (coming from the splitter) should be processed by priority (resequenced)
The exportProcessor must be triggered when all split items are processed (from one file)
The problem with the code above is that if I include the resequence step, the export is triggered immediately and the resequencing itself doesn't work. It seems I don't understand the threading model behind Camel.
Thanks a lot in advance for all hints!
Couldn't it be that your prioAssign processor doesn't build a body that can be split later, and so the split ends instantly and everything moves to the exportProcessor?
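To illustrate that hypothesis, here is a minimal sketch of a prioAssign processor that keeps the body splittable; the computePriority() helper and the exact priority logic are assumptions, not taken from the question:
import org.apache.camel.Exchange;
import org.apache.camel.Processor;

Processor prioAssign = new Processor() {
    @Override
    public void process(Exchange exchange) throws Exception {
        // Keep the newline-delimited text as the body so tokenize("\n") still has lines to split
        String body = exchange.getIn().getBody(String.class);
        exchange.getIn().setHeader("prio", computePriority(body)); // hypothetical helper
        exchange.getIn().setBody(body);
    }
};
If prioAssign instead replaced the body with something that tokenize("\n") cannot split, the splitter would produce nothing and the route would fall straight through to the exportProcessor.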
I have a shell script that copies files into a location and another one that picks these up for further processing. I want to use multithreading to pick up files in parallel in Scala using a thread pool.
However, if there are two threads and two files, both of them are picking up the same file. I have tried the program a lot of times, and it always ends up like this. I need the threads to pick up different files in parallel.
Can someone help me out? What approaches can I use? If you could point me in the right direction that would be enough.
I think you can use a parallel sequence to do the processing in parallel.
You don't have to handle this logic yourself. For example, the code could look like this:
val newFiles: Seq[String] = listCurrentFilesNames()
newFiles.par.foreach { fileName =>
  processFile(fileName)  // each file is handled by one of the pool's worker threads
}
This code will be executed in parallel, and you can set the number of threads to a specific number as mentioned here: https://stackoverflow.com/a/37725987/2201566
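For example, a minimal sketch of capping the parallel collection at four worker threads (assuming Scala 2.12; in 2.13 the parallel collections come from the separate scala-parallel-collections module; listCurrentFilesNames() and processFile() are the placeholders from above):
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

val newFiles: Seq[String] = listCurrentFilesNames()
val parFiles = newFiles.par
parFiles.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(4)) // at most 4 threads
parFiles.foreach(processFile)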
You can also try using actors; for example, see https://github.com/tsheppard01/akka-parallel-read-csv-file for reference.
I want to search recursively for files with given extensions under a given root directory, and display the number of files found so far in my GUI.
Since this kind of processing may be long, the GUI may be blocked.
I could just wait for the end of the processing and get the file count, but I am learning Qt (PyQt), so I see this as a training exercise.
So I have read the Qt docs: When to Use Alternatives to Threads, and I don't think that's for me.
Then I read Choosing an Appropriate Approach, and I think my case is the first one:
Run a new linear function within another thread, optionally with
progress updates during the run
But in this case Qt provides three different solutions:
Place the function in a reimplementation of QThread::run() and start the QThread. Emit signals to update progress.
Place the function in a reimplementation of QRunnable::run() and add the QRunnable to a QThreadPool. Write to a thread-safe variable to update progress.
Run the function using QtConcurrent::run(). Write to a thread-safe variable to update progress.
Could you tell me how to choose the best one?
I have read some "solutions", but I'd like to understand why you should use one methodology rather than another.
Also, since I am looking for files, I may have a directory in which many files match the search criteria, which would mean a lot of progress updates in a short time. Is there something special to keep in mind regarding this?
Thank you!
From what I know (hopefully others can chime in):
QThread offers support for signal interaction. For example, you'd be able to stop your concurrent function with a signal. I'm not sure how you'd do that with the other options, if at all.
Things to keep in mind: widgets all have to live in the main thread, but they can communicate with other threads via signals & slots.
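As a minimal sketch of the QThread approach (assuming PyQt5; the class name, extensions and signal wiring below are placeholders, not from your code):
import os
from PyQt5.QtCore import QThread, pyqtSignal

class FileSearchThread(QThread):
    count_changed = pyqtSignal(int)   # emitted every time a matching file is found
    search_done = pyqtSignal(int)     # emitted once with the final total

    def __init__(self, root, extensions, parent=None):
        super().__init__(parent)
        self.root = root
        self.extensions = tuple(ext.lower() for ext in extensions)

    def run(self):
        count = 0
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                if name.lower().endswith(self.extensions):
                    count += 1
                    self.count_changed.emit(count)
        self.search_done.emit(count)

# In the GUI (main) thread:
# thread = FileSearchThread("/some/root", [".txt", ".py"])
# thread.count_changed.connect(countLabel.setNum)  # QLabel.setNum is an existing Qt slot
# thread.start()
The signals are emitted in the worker thread but delivered to slots in the main thread via queued connections, so the widgets are only ever touched from the GUI thread.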
Here's another quick thread on the topic with some decent bullet points:
https://qt-project.org/forums/viewthread/50165/
Best of luck on your project, and welcome to Qt!
I would like to have one thread to queue some requests in a request queue and another to serve these requests. The producer should wake up the consumer when there is a new request queued.
Is there anyone who has done this already or knows how to do it?
I have tried several tutorials on the internet and none of them really worked cleanly. They either miss a request, cause a system lockup/instability, or just do not terminate.
Note: My question is in essence similar to this one. However, I won't be as specific as the person who asked that question. Anyone who can help is welcome to throw in their two cents, and maybe we can work something out.
Thanks!
You can use work queues. Work queues are simple; once you set up your work queue, you use something like the following:
DECLARE_WORK(name, void (*function)(void *), void *data);
Your function will be scheduled and called later; take a look at this article.
I also highly recommend this book: Linux Device Drivers.
Edit: I just saw you already linked an SO post where they use work queues. Have you tried it out? Did you run into some issues? I suggest you start with a really simple example, just to check that it works, and implement your core functionality later.
Update:
From the official Documentation:
Some users depend on the strict execution ordering of ST wq. The
combination of #max_active of 1 and WQ_UNBOUND is used to achieve this
behavior. Work items on such wq are always queued to the unbound
worker-pools and only one work item can be active at any given time
thus achieving the same ordering property as ST wq.
That way you will have guaranteed FIFO execution of your work items. But be aware that the work may be executed on different CPUs, so you have to use memory barriers to ensure visibility (e.g. wmb()).
Update:
As #user2009594 mentioned, a single threaded wq can be created using the following macro defined in linux/workqueue.h:
#define create_singlethread_workqueue(name) \
    alloc_workqueue("%s", WQ_UNBOUND | WQ_MEM_RECLAIM, 1, (name))
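Putting the pieces together, here is a minimal sketch of a producer/consumer pair on a single-threaded workqueue (assuming a recent kernel where the work handler takes a struct work_struct *; the request structure and all names are made up for illustration):
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static struct workqueue_struct *req_wq;   /* ordered, single-threaded queue */

struct request_work {
    struct work_struct work;
    int request_id;                       /* placeholder payload */
};

/* Consumer: runs in process context on the workqueue's worker thread. */
static void handle_request(struct work_struct *work)
{
    struct request_work *req = container_of(work, struct request_work, work);

    pr_info("handling request %d\n", req->request_id);
    kfree(req);
}

/* Producer: queue a new request; the worker thread is woken up automatically. */
static int submit_request(int id)
{
    struct request_work *req = kzalloc(sizeof(*req), GFP_KERNEL);

    if (!req)
        return -ENOMEM;
    req->request_id = id;
    INIT_WORK(&req->work, handle_request);
    queue_work(req_wq, &req->work);
    return 0;
}

static int __init req_init(void)
{
    req_wq = create_singlethread_workqueue("req_wq");
    return req_wq ? 0 : -ENOMEM;
}

static void __exit req_exit(void)
{
    destroy_workqueue(req_wq);            /* flushes pending work first */
}

module_init(req_init);
module_exit(req_exit);
MODULE_LICENSE("GPL");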
Multicast Netlink sockets can work very well here. I recently did the same thing; the only difference was that my consumer was in the kernel while the producers were in user space. The same approach can be used in kernel-only space.
I'm working on an application that processes possibly large (one or two million lines) tab-separated text files containing item details. Since the processing time can be long, I want to update a progress bar so the user knows the application didn't just hang, or better, to give an idea of the remaining time.
I've already done some research and I know how to update a simple progress bar, but the examples tend to be simplistic, e.g. calling something like progressBar.setProgress(counter++, 100) from a Timer, or the logic is trivial and written in the same class. I'm also new to the language, having done mostly Java and some JavaScript in the past, among others.
I wrote the logic for processing the file (input validation and creation of the output files). But if I call the processing logic from the main class, the update only happens at the end of processing (flying from 0 to 100 in an instant), no matter whether I update variables and try to dispatch events along the way; the bar won't reflect the processing progress.
Would processing the input in chunks be a valid approach? I'm not sure whether the processing delay of one data chunk would affect the processing of the next chunk and so on, because the timer tick is set to 1 millisecond and the chunk processing time would be longer than that. I'm also unsure whether the order of the input would be affected or the result corrupted in some way. I've read that multithreading is not supported in the language, so should that be a concern?
I already coded the logic described before and it seems to work:
// called by mouse click event
var chunkSize:int = 1000;   // lines per chunk, tune as needed
var index0:int = 0;         // start index of the current chunk

function processInput():void {
    index0 = 0;
    timer = new Timer(1);
    timer.addEventListener(TimerEvent.TIMER, processChunk);
    timer.start();
}

function processChunk(event:TimerEvent):void {
    // take a non-destructive slice so wholeInputArray keeps its full length
    var index1:int = Math.min(index0 + chunkSize, wholeInputArray.length);
    var dataChunk:Array = wholeInputArray.slice(index0, index1);
    processorObj.processChunk(dataChunk);

    progressBar.setProgress(index1, wholeInputArray.length);
    progressBar.label = index1 + " processed items";
    index0 = index1;

    if (index1 >= wholeInputArray.length) { // no more data to process
        timer.stop();
        timer.removeEventListener(TimerEvent.TIMER, processChunk);
        progressBar.setProgress(wholeInputArray.length, wholeInputArray.length);
        progressBar.label = "Processing done";
        // do post processing here: show results, etc.
    }
}
The declaration for the progress bar is as follows:
<mx:ProgressBar id="progressBar" x="23" y="357" width="411" direction="right"
labelPlacement="center" mode="manual" indeterminate="false" />
I tested it with an input of 50,000 lines and it seems to work, generating the same result as the other approach that processes the input all at once. But would that be a valid approach, or is there a better one?
Thanks in advance.
Your solution is good; I use it most of the time.
But multithreading is now supported in AS3 (for desktop and web only, for the moment).
Have a look at the Worker documentation and the Worker example.
Hope that helps :)
May I ask whether this Timer, as shown, is the Timer you actually use? Because if so, you are in for a lot of trouble with your application in the long run, with regard to loading and getting the Timer to stop, close, etc. The event listener setup is incomplete and would certainly cause problems!
I would recommend getting this right before going any further. I know this from experience: in some of my own AIR applications I need several hundred of them running one after another in modules, as well as in some of my web apps, not quite as intensively, yet still a few!
I'm sure smoother execution will be the reward! Regards, aktell
Use Workers. Splitting data into chunks and then processing it is a valid but quite cumbersome approach; with workers you can simply spawn a background worker, do all the parsing there and return the result, all without blocking the GUI. The worker approach should also need less time for the parsing, because there is no need to stop the parser and wait for the next frame.
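A rough sketch of the worker setup (assuming Flash Player 11.4+/AIR 3.4+; here the same SWF is reused as the background worker, progressBar and totalLines stand in for your own variables, and the channel name is made up):
import flash.system.Worker;
import flash.system.WorkerDomain;
import flash.system.MessageChannel;
import flash.events.Event;

if (Worker.current.isPrimordial) {
    // GUI side: spawn a background copy of this SWF and listen for progress messages
    var bgWorker:Worker = WorkerDomain.current.createWorker(this.loaderInfo.bytes);
    var resultChannel:MessageChannel = bgWorker.createMessageChannel(Worker.current);
    bgWorker.setSharedProperty("resultChannel", resultChannel);
    resultChannel.addEventListener(Event.CHANNEL_MESSAGE, function(e:Event):void {
        var linesParsed:int = resultChannel.receive();
        progressBar.setProgress(linesParsed, totalLines);
    });
    bgWorker.start();
} else {
    // Worker side: parse the file and report progress back to the GUI
    var out:MessageChannel = Worker.current.getSharedProperty("resultChannel") as MessageChannel;
    // ... run the parsing loop here and call out.send(linesParsed) every N lines ...
}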
Workers would be an ideal solution, but quite complicated to set up. If you're not up to it right now, here's a PseudoThread solution I use in similar situations which you can probably get up and running in 5 minutes:
Pseudo Threads
It uses EnterFrame events to balance between doing work and letting the UI do its thing, and you can manually update the progress bar within your "thread" code. I think it would be easily adapted to your needs, since your data is easily sliced.
Without using Workers (which it seems you are not yet familiar with), AS3 behaves single-threaded. Your timers will not overlap: if one of your chunks takes longer than the timer period to complete, the next timer event is simply processed when it can be, and further events will not queue up in the meantime (assuming your processing code is blocking).
The previous answers show the "correct" solution to this, but this might get you where you need to be faster.
I am building a simple application to download a set of XML files and parse them into a database using the async module (https://npmjs.org/package/node-async) for flow control. The overall flow is as follows:
1. Download list of datasets from API (single Request call)
2. Download metadata for each dataset to get link to XML file (async.each)
3. Download XML for each dataset (async.parallel)
4. Parse XML for each dataset into JSON objects (async.parallel)
5. Save each JSON object to a database (async.each)
In effect, for each dataset there is a parent process (2) which sets off a series of asynchronous child processes (3, 4, 5). The challenge I am facing is that, because so many parent processes fire before all of the children of a particular process are complete, child processes seem to be getting queued up in the event loop, and it takes a long time for all of the child processes of a particular parent process to resolve and allow garbage collection to clean everything up. The result is that even though the program doesn't appear to have any memory leaks, memory usage is still too high, ultimately crashing the program.
One solution which worked was to make some of the child processes synchronous so that they could be grouped together in the event loop. However, I have also seen an alternative solution discussed here: https://groups.google.com/forum/#!topic/nodejs/Xp4htMTfvYY, which pushes parent processes into a queue and only allows a certain number to be running at once. My question, then, is: does anyone know of a more robust module for handling this type of queueing, or any other viable alternative for handling this kind of flow control? I have been searching but so far no luck.
Thanks.
I decided to post this as an answer:
Don't launch all of the processes at once. Let the callback of one request launch the next one. The overall work is still asynchronous, but each request gets run in series. You can then pool up a certain number of the connections to be running simultaneously to maximize I/O throughput. Look at async.eachLimit and replace each of your async.each examples with it.
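A minimal sketch of that pattern (the URLs, the concurrency limit of 5, and the processDataset() helper are placeholders, not from your code):
var async = require('async');
var request = require('request');

// placeholder list of dataset URLs
var datasetUrls = ['http://example.com/dataset/1', 'http://example.com/dataset/2'];

// run at most 5 requests at a time; the next one starts only when a slot frees up
async.eachLimit(datasetUrls, 5, function (url, done) {
    request(url, function (err, res, body) {
        if (err) { return done(err); }
        processDataset(body, done); // placeholder: parse the XML, save it, then call done()
    });
}, function (err) {
    if (err) { console.error('One of the datasets failed:', err); }
    else { console.log('All datasets processed.'); }
});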
Your async.parallel calls may be causing issues as well.