I'm trying to figure out how, using the SWF Flow framework, I can have my activity worker poll multiple task lists. The use case is having two different priorities for activity tasks that need to be completed.
Bonus points if someone uses glisten and can point out a way to achieve that.
Thanks!
It is not possible for a single ActivityWorker to poll multiple task lists. The reason for this design is that each poll request can take up to a minute due to long polling. If several such polls fed into a single-threaded activity implementation, it is not clear how to deal with the conflicts that arise when tasks are received on multiple task lists at once.
Until SWF natively supports priority task lists, the solution is to instantiate one ActivityWorker per task list (priority) and deal with any conflicts yourself.
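The Flow framework itself is Java, but the one-worker-per-task-list idea can be sketched with the plain SWF polling API. Here is a rough illustration using boto3 rather than Flow; the region, domain, task list names, and the do_work stub are placeholders, not anything from the question:

import threading

import boto3

swf = boto3.client('swf', region_name='us-east-1')  # assumed region
DOMAIN = 'my-domain'                                 # placeholder domain

def do_work(task):
    # Placeholder for the actual activity implementation.
    return 'done'

def worker(task_list):
    # One long-poll loop per task list; each runs in its own thread,
    # so the high- and low-priority lists never block each other.
    while True:
        task = swf.poll_for_activity_task(
            domain=DOMAIN,
            taskList={'name': task_list},
            identity=f'worker-{task_list}',
        )
        if not task.get('taskToken'):
            continue  # long poll timed out with no task; poll again
        swf.respond_activity_task_completed(
            taskToken=task['taskToken'],
            result=do_work(task),
        )

# One worker per priority task list, as described above.
for name in ('high-priority', 'low-priority'):
    threading.Thread(target=worker, args=(name,), daemon=True).start()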
I want to schedule a periodic task with Celery dynamically at the end of another group of tasks.
I know how to create (static) periodic tasks with Celery:
CELERYBEAT_SCHEDULE = {
    'poll_actions': {
        'task': 'tasks.poll_actions',
        'schedule': timedelta(seconds=5)
    }
}
But I want to create periodic jobs dynamically from my tasks, and maybe have a way to stop those periodic jobs when some condition is met (all tasks done).
Something like:
@celery.task
def run(ids):
    group(prepare.s(id) for id in ids) | execute.s(ids) | poll.s(ids, schedule=timedelta(seconds=5))

@celery.task
def prepare(id):
    ...

@celery.task
def execute(id):
    ...

@celery.task
def poll(ids):
    # This task has to be schedulable on demand
    ...
The straightforward solution to this requires being able to add and remove beat scheduler entries on the fly. As of the time this related question was answered:
How to dynamically add / remove periodic tasks to Celery (celerybeat)
this was not possible, and I doubt that has changed in the interim, because ...
You are conflating two concepts here: the notion of "Event Driven Work" and the idea of "Batch Schedule Driven Work" (which is really just the first case, where the event happens on a schedule). If you really consider what you are doing here, you'll find a rather complex set of edge cases. Messages are distributed in nature: what happens when groups spawned from different messages start creating conflicting entries? What do you do when you find yourself under a mountain of previously scheduled cruft?
When working with messaging systems you are really looking to build recursive trees: spindles of work that do something and spawn more messages to do more things. Cycles (intended or otherwise) aside, these ultimately reach their base cases and terminate.
The answer to whatever you are actually trying to achieve lies in re-encoding your problem within the limitations of your messaging system and asynchronous work framework.
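One way to do that re-encoding without touching the beat schedule at all is to let the poll task re-enqueue itself with a countdown until its work is done. A minimal sketch of that idea, assuming a made-up all_done() completion check and broker URL (neither is from the question):

from celery import Celery, group

celery = Celery('tasks', broker='redis://localhost:6379/0')  # assumed broker

@celery.task
def prepare(id):
    ...  # per-id work, as in the question

def all_done(ids):
    # Hypothetical completion check; replace with however progress is tracked.
    return False

@celery.task
def poll(ids):
    if all_done(ids):
        return 'finished'
    # Re-enqueue this same task to run again in 5 seconds, instead of
    # registering a periodic entry with celerybeat.
    poll.apply_async(args=[ids], countdown=5)

@celery.task
def run(ids):
    # Prepare everything, then kick off the self-rescheduling poll loop.
    (group(prepare.s(id) for id in ids) | poll.si(ids)).delay()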
I have a requirement to create a child stream that picks up only specific folders from the mainline (parent) stream. While creating the child stream, I restrict the view using share/isolate/import, and I am able to successfully create child streams that contain only the code I am interested in.
But I have gone through some tutorials on streams and found something about lightweight streams (task streams), which are used to create streams partially from a parent. In my scenario, do I really need to use these lightweight streams? What are the main advantages and limitations of using lightweight streams over the normal approach I mentioned above?
The purpose of task streams is not to create streams "partially" -- you have already done this with your share/import paths. Don't fix what isn't broken!
Task streams are built to be short-lived and easily archive-able once the associated task is complete (via the "unload" command). The limitations of task streams are described in the documentation here:
https://www.perforce.com/perforce/doc.current/manuals/p4v/Content/P4V/streams.task.html
namely that they can't be reparented and they may not have children. If you use task streams as short-lived single-task streams (as the name "task stream" implies, a task stream is for a single task), these limitations won't generally be a problem. If you try to use a task stream as a development mainline, you're going to have problems.
If your development process involves creating a new branch for a short-term task (e.g. an individual hotfix parented to a particular branch), and you have a lot of these tasks, task streams may be useful due to their easy cleanup and low overhead (when a task stream is unloaded it's removed from the db, which means you don't accumulate db cruft over time as you create and abandon them).
If this does not sound like your development process, forget you ever heard about task streams. Do not try to imagine ways that you can use task streams for things that aren't short-term tasks. Hammers are suitable for nails. Do not use them to try to drive screws, especially when you have a perfectly good screwdriver right there and are already using it successfully.
(Can you tell I've seen more than a few instances of people trying to use task streams for absolutely everything because they "sound cool"? Resist the urge!)
I am building a simple application to download a set of XML files and parse them into a database using the async module (https://npmjs.org/package/node-async) for flow control. The overall flow is as follows:
1. Download list of datasets from API (single Request call)
2. Download metadata for each dataset to get link to XML file (async.each)
3. Download XML for each dataset (async.parallel)
4. Parse XML for each dataset into JSON objects (async.parallel)
5. Save each JSON object to a database (async.each)
In effect, for each dataset there is a parent process (2) which sets off a series of asynchronous child processes (3, 4, 5). The challenge I am facing is that, because so many parent processes fire before all of the children of a particular process are complete, child processes seem to get queued up in the event loop, and it takes a long time for all of the child processes of a particular parent to resolve and allow garbage collection to clean everything up. The result is that even though the program doesn't appear to have any memory leaks, memory usage is still too high, ultimately crashing the program.
One solution which worked was to make some of the child processes synchronous so that they can be grouped together in the event loop. However, I have also seen an alternative solution discussed here: https://groups.google.com/forum/#!topic/nodejs/Xp4htMTfvYY, which pushes parent processes into a queue and only allows a certain number to run at once. My question, then, is: does anyone know of a more robust module for handling this type of queueing, or any other viable alternative for handling this kind of flow control? I have been searching but so far no luck.
Thanks.
I decided to post this as an answer:
Don't launch all of the processes at once. Let the callback of one request launch the next one. The overall work is still asynchronous, but each request gets run in series. You can then pool up a certain number of the connections to be running simultaneously to maximize I/O throughput. Look at async.eachLimit and replace each of your async.each examples with it.
Your async.parallel calls may be causing issues as well.
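The library call is specific to Node, but the underlying idea is just bounded concurrency: never have more than N downloads in flight at once. Purely to illustrate that pattern (not the asker's code), here is a minimal sketch with Python's asyncio; the URLs, the limit of 10, and the sleep standing in for the HTTP request are all made up:

import asyncio

URLS = [f'https://example.com/dataset/{i}.xml' for i in range(100)]  # placeholders
LIMIT = 10  # at most 10 requests in flight, analogous to async.eachLimit(items, 10, fn, cb)

async def fetch(url, sem):
    async with sem:               # wait for a free slot before starting work
        await asyncio.sleep(0.1)  # stand-in for the actual download + parse
        return url

async def main():
    sem = asyncio.Semaphore(LIMIT)
    results = await asyncio.gather(*(fetch(u, sem) for u in URLS))
    print(len(results), 'datasets processed')

asyncio.run(main())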
I'm experimenting with the System.Collections.Concurrent namespace but I have a problem implementing my design.
My input queue (ConcurrentQueue) is getting populated fine from a Thread which is doing some I/O at startup to read and parse.
Next I kick off a Parallel.ForEach() on the input queue. I'm doing some I/O bound work on each item.
A log item is created for each item processed in the ForEach() and is dropped into a result queue.
What I would like to do is kick off the logging while I am still reading the input, because I may not be able to fit all of the log items in memory. What is the best way to wait for items to land in the result queue? Are there design patterns or examples that I should be looking at?
I think the pattern you're looking for is the producer/consumer pattern. More specifically, you can have a producer/consumer implementation built around TPL and BlockingCollection.
The main concepts you want to read about are:
Task,
BlockingCollection,
TaskFactory.ContinueWhenAll (will allow you to perform some action when a set of tasks/threads has finished running).
Bounding and Blocking in BlockingCollection. This allows you to set a maximum size for your output collection (for memory reasons) and producer thread(s) will wait for consumers to pick up elements in case the maximum size you specify is reached.
BlockingCollection.CompleteAdding and BlockingCollection.IsCompleted, which can be used to synchronize producers and consumers (the producer can say when it's finished, and the consumer can check for that and keep running until the producer(s) are finished).
A more complete sample is in the second article I linked.
In your case I think you want the consumer to just pick up things from the result queue and dispose of them as soon as possible (write them to a logging store, or similar).
So your final collection, where you dump log items should be a BlockingCollection, not a ConcurrentQueue.
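For illustration only, the same bounded producer/consumer shape sketched in Python: queue.Queue with a maxsize plays the role BlockingCollection plays in .NET, and a sentinel stands in for CompleteAdding. The sizes and the fake log items are assumptions:

import queue
import threading

results = queue.Queue(maxsize=100)  # bounded, like a BlockingCollection with a capacity
DONE = object()                     # sentinel playing the role of CompleteAdding

def producer():
    for i in range(1000):
        results.put(f'log item {i}')  # blocks when the queue is full, so memory stays bounded
    results.put(DONE)

def consumer():
    while True:
        item = results.get()
        if item is DONE:
            break
        print(item)  # stand-in for writing to the logging store

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()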
I have a process that requires approval from any two people out of a group of four or five. I'd like to create a task assigned to each person, and after two of those tasks are complete, delete the remaining tasks and move on with the workflow.
Is there a way to create multiple tasks with a single CreateTask activity? Also, I'm still fairly new to Workflow, so if I store the TaskIDs in an array, can I iterate over them to delete the remaining tasks after the fact?
Or am I going about this entirely the wrong way? I'm open to suggestions.
I am not sure if they apply to SharePoint, but check these posts from Matt Winkler:
Different Execution Patterns with WF (or, Going beyond Sequential and State Machine)
Implementing the N of M Pattern in WF
I think that the second post describes exactly your case.