Making background jobs with Node/ES6 - node.js

I would like to sort things such as sending notification emails out into the far-backend and write a couple of job classes. Like:
import JobBase from "some library...";
class SendEMails extends JobBase {
// ...
}
Now, I did see a couple of Redis-backed job/task queues for Node, but appearently, they all just seem to operate on a single queue in Redis...which is kind of surprising, knowing that ES6 is around since a while now.
What would be a good method of defining job classes and running them when a job is queued - and having that organized in classes?
I really don't want to re-invent the wheel and write my own implementation... :)

Related

Classful Thread Pool

I need to design a thread pool system, in Python in this case, but I'm more interested in the general methodology.
It has to be something along the lines of https://www.metachris.com/2016/04/python-threadpool/, where threads wait idling until some jobs are pushed into the pool. How that works, using condition variables, is clear to me.
I have one additional requirement though: the jobs I'm pushing into the pool cannot run all in parallel. Each of them has a class (i don't mean the object class here, just a simple integer that somehow classifies the job) and only one job per class can be running at the same time. If a job is pushed having the same class of a job that is currently running, it has to wait in the queue until the latter is done.
I have already modified the mentioned class to do this, but what I achieved is pretty messy and I'm not sure it's reliable, so I would ask what modifications would be suggested or whether I should use a totally different approach. Again: I don't need the code, but rather a description.
Thanks.

Trigger periodic celery task after an event [duplicate]

I want to schedule a periodic task with Celery dynamically at the end of another group of task.
I know how to create (static) periodic tasks with Celery:
CELERYBEAT_SCHEDULE = {
'poll_actions': {
'task': 'tasks.poll_actions',
'schedule': timedelta(seconds=5)
}
}
But I want to create periodic jobs dynamically from my tasks (and maybe have a way to stop those periodic jobs when some condition is achieved (all tasks done).
Something like:
#celery.task
def run(ids):
group(prepare.s(id) for id in ids) | execute.s(ids) | poll.s(ids, schedule=timedelta(seconds=5))
#celery.task
def prepare(id):
...
#celery.task
def execute(id):
...
#celery.task
def poll(ids):
# This task has to be schedulable on demand
...
The straightforward solution to this requires that you be able to add/remove beat scheduler entries on the fly. As of the answering of this question...
How to dynamically add / remove periodic tasks to Celery (celerybeat)
This was not possible. I doubt it has become available in the interim because ...
You are conflating two concepts here. The notion of "Event Driven Work" and the idea of "Batch Schedule Driven Work"(which is really just the first case where the event happens on a schedule). If you really consider what you are doing here you'll find that there is a rather complex set of edge cases. Messages are distributed in nature what happens when groups spawned from different messages start creating conflicting entries? What do you do when you find yourself under a mountain of previously scheduled kruft?
When working with messaging systems you are really looking to build recursive trees. Spindles of work that do something and spawn more messages to do more things. Cycles(intended or otherwise) aside these ultimately achieve their base cases and terminate.
The answer to whatever you are actually trying to achieve lies with re-encoding your problem within the limitations of your messaging system and asynchronous work framework.

How to handle serialization when using a worker thread from wicket?

In our wicket application I need to start a long-running operation. It will communicate with an external device and provide a result after some time (up to a few minutes).
Java-wise the long running operation is started by a method where I can provide a callback.
public interface LegacyThingy {
void startLegacyWork(WorkFinished callback);
}
public interface WorkFinished {
public void success(Whatever ...);
// failure never happens
}
On my Wicket Page I plan to add an Ajax Button to invoke startLegacyWork(...) providing an appropriate callback. For the result I'd display a panel that polls for the result using an AbstractTimerBehavior.
What boggles my mind is the following problem:
To keep state Wicket serializes the component tree along with the data, thus the data needs to be wrapped in serializable models (or detachable models).
So to keep the "connection" between the result panel and the WorkFinished callback I'd need some way to create a link between the "we serialize everything" world of Wicket and the "Hey I'm a Java Object and nobody manages my lifetime" world of the legacy interface.
Of course I could store ongoing operations in a kind of global map and use a Wicket detachable model that looks them up by id ... but that feels dirty and I don't assume that's the correct way. (It opens up a whole can of worms regarding lifetime of such things).
Or I'm looking at a completly wrong angle on how to do long running operations from wicket?
I think the approach with the global map is good. Wicket also uses something similar internally - org.apache.wicket.protocol.http.StoredResponsesMap. This is a special map that keeps the generated responses for REDIRECT_TO_BUFFER strategy. It has the logic to keep the entries for at most some pre-configured duration and also can have upper limit of entries.

How to process rows of a CSV file using Groovy/GPars most efficiently?

The question is a simple one and I am surprised it did not pop up immediately when I searched for it.
I have a CSV file, a potentially really large one, that needs to be processed. Each line should be handed to a processor until all rows are processed. For reading the CSV file, I'll be using OpenCSV which essentially provides a readNext() method which gives me the next row. If no more rows are available, all processors should terminate.
For this I created a really simple groovy script, defined a synchronous readNext() method (as the reading of the next line is not really time consuming) and then created a few threads that read the next line and process it. It works fine, but...
Shouldn't there be a built-in solution that I could just use? It's not the gpars collection processing, because that always assumes there is an existing collection in memory. Instead, I cannot afford to read it all into memory and then process it, it would lead to outofmemory exceptions.
So.... anyone having a nice template for processing a CSV file "line by line" using a couple of worker threads?
Concurrently accessing a file might not be a good idea and GPars' fork/join-processing is only meant for in-memory data (collections). My sugesstion would be to read the file sequentially into a list. When the list reaches a certain size, process the entries in the list concurrently using GPars, clear the list and then move on with reading lines.
This might be a good problem for actors. A synchronous reader actor could hand off CSV lines to parallel processor actors. For example:
#Grab(group='org.codehaus.gpars', module='gpars', version='0.12')
import groovyx.gpars.actor.DefaultActor
import groovyx.gpars.actor.Actor
class CsvReader extends DefaultActor {
void act() {
loop {
react {
reply readCsv()
}
}
}
}
class CsvProcessor extends DefaultActor {
Actor reader
void act() {
loop {
reader.send(null)
react {
if (it == null)
terminate()
else
processCsv(it)
}
}
}
}
def N_PROCESSORS = 10
def reader = new CsvReader().start()
(0..<N_PROCESSORS).collect { new CsvProcessor(reader: reader).start() }*.join()
I'm just wrapping up an implementation of a problem just like this in Grails (you don't specify if you're using grails, plain hibernate, plain JDBC or something else).
There isn't anything out of the box that you can get that I'm aware of. You could look at integrating with Spring Batch, but the last time I looked at it, it felt very heavy to me (and not very groovy).
If you're using plain JDBC, doing what Christoph recommends probably is the easiest thing to do (read in N rows and use GPars to spin through those rows concurrently).
If you're using grails, or hibernate, and want your worker threads to have access to the spring context for dependency injection, things get a bit more complicated.
The way I solved it is using the Grails Redis plugin (disclaimer: I'm the author) and the Jesque plugin, which is a java implementation of Resque.
The Jesque plugin lets you create "Job" classes that have a "process" method with arbitrary parameters that are used to process work enqueued on a Jesque queue. You can spin up as many workers as you want.
I have a file upload that an admin user can post a file to, it saves the file to disk and enqueues a job for the ProducerJob that I've created. That ProducerJob spins through the file, for each line, it enqueues a message for a ConsumerJob to pick up. The message is simply a map of the values read from the CSV file.
The ConsumerJob takes those values and creates the appropriate domain object for it's line and saves it to the database.
We already were using Redis in production so using this as a queueing mechanism made sense. We had an old synchronous load that ran through file loads serially. I'm currently using one producer worker and 4 consumer workers and loading things this way is over 100x faster than the old load was (with much better progress feedback to the end user).
I agree with the original question that there is probably room for something like this to be packaged up as this is a relatively common thing.
UPDATE: I put up a blog post with a simple example doing imports with Redis + Jesque.

Creating Dependencies Within An NSOperation

I have a fairly involved download process I want to perform in a background thread. There are some natural dependencies between steps in this process. For example, I need to complete the downloads of both Table A and Table B before setting the relationships between them (I'm using Core Data).
I thought first of putting each dependent step in its own NSOperation, then creating a dependency between the two operations (i.e. download the two tables in one operation, then set the relationship between them in the next, dependent operation). However, each NSOperation requires it's own NSManagedContext, so this is no good. I don't want to save the background context until both tables have been downloaded and their relationships set.
I've therefore concluded this should all occur inside one NSOperation, and that I should use notifications or some other mechanism to call the dependent method when all the conditions for running it have been met.
I'm an iOS beginner, however, so before I venture down this path, I wouldn't mind advice on whether I've reached the right conclusion.
Given your validation requirements, I think it will be easiest inside of one operation, although this could turn into a bit of a hairball as far as code structure goes.
You'll essentially want to make two wire fetches to get the entire dataset you require, then combine the data and parse it at one time into Core Data.
If you're going to use the asynchronous API's this essentially means structuring a class that waits for both operations to complete and then launches another NSOperation or block which does the parse and relationship construction.
Imagine this order of events:
User performs some action (button tap, etc.)
Selector for that action fires two network requests
When both requests have finished (they both notify a common delegate) launch the parse operation
Might look something like this in code:
- (IBAction)someAction:(id)sender {
//fire both network requests
request1.delegate = aDelegate;
request2.delegate = aDelegate;
}
//later, inside the implementation of aDelegate
- (void)requestDidComplete... {
if (request1Finished && request2Finished) {
NSOperation *parse = //init with fetched data
//launch on queue etc.
}
}
There's two major pitfalls that this solution is prone to:
It keeps the entire data set around in memory until both requests are finished
You will have to constantly switch on the specific request that's calling your delegate (for error handling, success, etc.)
Basically, you're implementing operation dependencies on your own, although there might not be a good way around that because of the structure of NSURLConnection.

Resources