Each client (identified by a clientId) can push messages, and a second message from the same client must not be processed until the first one has finished processing. A client can send multiple messages in a row, they are ordered, and multiple clients sending messages should ideally not interfere with each other. And, importantly, a job should never be processed twice.
I thought Redis might let me solve this, so I started with some quick prototyping using the bull library, but I am clearly not doing it well and was hoping someone would know how to proceed.
This is what I tried so far:
Create jobs and add them to the same queue name in one process, using the clientId as the job name.
Consume jobs in 2 separate processes, waiting large random amounts of time between steps.
I tried the default locking provided by the library I am using (bull), but it locks on the jobId, which is unique for each job, not on the clientId.
What I would want to happen:
A consumer can't take a job for a given clientId until the previous job for that clientId is finished processing.
They should, however, be able to take items from different clientIds in parallel (asynchronously) without problems. (I haven't gotten this far; right now I am dealing with only one clientId.)
What I get:
Both consumers consume as many items as they can from the queue, without waiting for the previous item for the same clientId to be completed.
Is Redis even the right tool for this job?
Example code
// ./setup.ts
import Queue from 'bull';
import * as uuid from 'uuid';
// Check that when a message is taken from a place, no other message is taken
// To do that test, have two processes that process messages and one that sets messages, and make the job take a long time
// queue for each room https://stackoverflow.com/questions/54178462/how-does-redis-pubsub-subscribe-mechanism-works/54243792#54243792
// https://groups.google.com/forum/#!topic/redis-db/R09u__3Jzfk
// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353
export async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}

export interface JobData {
  id: string;
  v: number;
}

export const queue = new Queue<JobData>('messages', 'redis://127.0.0.1:6379');

queue.on('error', (err) => {
  console.error('Uncaught error on queue.', err);
  process.exit(1);
});

export function clientId(): string {
  return uuid.v4();
}

export function randomWait(minms: number, maxms: number): Promise<void> {
  const ms = Math.random() * (maxms - minms) + minms;
  return sleep(ms);
}
// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
queue.LOCK_RENEW_TIME = 5 * 60 * 1000;
// ./create.ts
import { queue, randomWait } from './setup';
const MIN_WAIT = 300;
const MAX_WAIT = 1500;
async function createJobs(n = 10): Promise<void> {
  await randomWait(MIN_WAIT, MAX_WAIT);
  // always same Id
  const clientId = Math.random() > 1 ? 'zero' : 'one';
  for (let index = 0; index < n; index++) {
    await randomWait(MIN_WAIT, MAX_WAIT);
    const job = { id: clientId, v: index };
    await queue.add(clientId, job).catch(console.error);
    console.log('Added job', job);
  }
}

export async function create(nIds = 10, nItems = 10): Promise<void> {
  const jobs = [];
  await randomWait(MIN_WAIT, MAX_WAIT);
  for (let index = 0; index < nIds; index++) {
    await randomWait(MIN_WAIT, MAX_WAIT);
    jobs.push(createJobs(nItems));
    await randomWait(MIN_WAIT, MAX_WAIT);
  }
  await randomWait(MIN_WAIT, MAX_WAIT);
  await Promise.all(jobs);
  process.exit();
}

(function mainCreate(): void {
  create().catch((err) => {
    console.error(err);
    process.exit(1);
  });
})();
// ./consume.ts
import { queue, randomWait, clientId } from './setup';
function startProcessor(minWait = 5000, maxWait = 10000): void {
  queue
    .process('*', 100, async (job) => {
      console.log('LOCKING: ', job.lockKey());
      await job.takeLock();
      const name = job.name;
      const processingId = clientId().split('-', 1)[0];
      try {
        console.log('START: ', processingId, '\tjobName:', name);
        await randomWait(minWait, maxWait);
        const data = job.data;
        console.log('PROCESSING: ', processingId, '\tjobName:', name, '\tdata:', data);
        await randomWait(minWait, maxWait);
        console.log('PROCESSED: ', processingId, '\tjobName:', name, '\tdata:', data);
        await randomWait(minWait, maxWait);
        console.log('FINISHED: ', processingId, '\tjobName:', name, '\tdata:', data);
      } catch (err) {
        console.error(err);
      } finally {
        await job.releaseLock();
      }
    })
    .catch(console.error); // Catches initialization errors

}

startProcessor();
This is run using 3 different processes, which you might launch like this (although I use separate tabs for a clearer view of what is happening):
npx ts-node consume.ts &
npx ts-node consume.ts &
npx ts-node create.ts &
I'm not familiar with Node.js, but for Redis I would try this:
Let's say you have client_1 and client_2; they are both publishers of events.
You have three machines: consumer_1, consumer_2, and consumer_3.
Establish a list of tasks in Redis, e.g. JOB_LIST.
Clients put (LPUSH) jobs into this JOB_LIST in a specific form, like "CLIENT_1:[jobcontent]" or "CLIENT_2:[jobcontent]".
Each consumer takes a job off the list (Redis RPOP, or BRPOP to block until one is available) and processes it.
For example, consumer_1 takes out a job whose content is CLIENT_1:[jobcontent]. It parses the content and recognizes it's from CLIENT_1. Then it checks whether some other consumer is already processing CLIENT_1; if not, it locks a key to indicate that it is processing CLIENT_1.
It goes on to set a key "CLIENT_1_PROCESSING", with content "consumer_1", using the Redis SETNX command (set if the key does not exist; in practice SET with the NX and EX options, since plain SETNX cannot attach a timeout), with an appropriate timeout. For example, if the task normally takes one minute to finish, set the key's timeout to five minutes, just in case consumer_1 crashes and holds the lock indefinitely.
If the SETNX returns 0, it means it failed to acquire the lock on CLIENT_1 (someone is already processing a job of client_1). It then returns the job (the value "CLIENT_1:[jobcontent]") to the left side of JOB_LIST using the Redis LPUSH command. Then it might wait a bit (sleep a few seconds) and RPOP another task from the right side of the list. If this time SETNX returns 1, consumer_1 acquires the lock. It goes on to process the job; after it finishes, it deletes the key "CLIENT_1_PROCESSING", releasing the lock. Then it goes on to RPOP another job, and so on.
Some things to consider:
The JOB_LIST is not fair; e.g. earlier jobs might be processed later.
The locking part is a bit rudimentary, but will suffice.
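A rough sketch of those steps in TypeScript, assuming the ioredis client (the description above is language-agnostic); the key names mirror the ones above, and processJob is a placeholder for the real work:

import Redis from 'ioredis';

const redis = new Redis('redis://127.0.0.1:6379');

// Placeholder for the actual work a consumer would do with a job.
async function processJob(payload: string): Promise<void> {
  console.log('processing', payload);
}

async function consumerLoop(consumerName: string): Promise<void> {
  while (true) {
    // Blocking pop from the right side of JOB_LIST (times out after 5 s so the loop can spin again).
    const popped = await redis.brpop('JOB_LIST', 5);
    if (!popped) continue;
    const payload = popped[1]; // e.g. "CLIENT_1:[jobcontent]"
    const client = payload.split(':')[0];
    // Try to take the per-client lock: SET <client>_PROCESSING <consumer> EX 300 NX.
    const locked = await redis.set(`${client}_PROCESSING`, consumerName, 'EX', 300, 'NX');
    if (!locked) {
      // Another consumer holds the lock: push the job back on the left and try a different one.
      await redis.lpush('JOB_LIST', payload);
      await new Promise((resolve) => setTimeout(resolve, 1000));
      continue;
    }
    try {
      await processJob(payload);
    } finally {
      // Release the lock so the next job for this client can be taken.
      await redis.del(`${client}_PROCESSING`);
    }
  }
}

consumerLoop('consumer_1').catch(console.error);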
----------update--------------
I've figured out another way to keep tasks in order.
For each client (producer), build a list, like "client_1_list", and push jobs into the left side of that list.
Save all the client names in a list "client_names_list", with values "client_1", "client_2", etc.
For each consumer (processor), iterate over "client_names_list". For example, consumer_1 gets "client_1" and checks whether the key of client_1 is locked (someone is already processing a task of client_1); if not, it right-pops a value (job) from client_1_list and locks client_1. If client_1 is locked, it (probably after sleeping one second) moves on to the next client, "client_2" for example, and checks the keys in the same way, and so on.
This way, each client's (task producer's) tasks are processed in the order they entered the list.
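Again a rough TypeScript sketch, assuming ioredis; the list and key names mirror the ones described above, and processJob is again a placeholder:

import Redis from 'ioredis';

const redis = new Redis('redis://127.0.0.1:6379');

// Placeholder for the actual work a consumer would do with a job.
async function processJob(client: string, job: string): Promise<void> {
  console.log('processing', client, job);
}

async function consumerLoop(consumerName: string): Promise<void> {
  while (true) {
    // Walk the known clients and try to take one job from each unlocked client.
    const clients = await redis.lrange('client_names_list', 0, -1);
    for (const client of clients) {
      // Skip clients that another consumer is already processing.
      const locked = await redis.set(`${client}_PROCESSING`, consumerName, 'EX', 300, 'NX');
      if (!locked) continue;
      try {
        // Jobs were pushed with LPUSH, so RPOP returns them in insertion order per client.
        const job = await redis.rpop(`${client}_list`);
        if (job) await processJob(client, job);
      } finally {
        await redis.del(`${client}_PROCESSING`);
      }
    }
    // Brief pause between sweeps so an empty system does not busy-loop.
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
}

consumerLoop('consumer_1').catch(console.error);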
EDIT: I found the problem regarding BullJS starting jobs in parallel on one processor: we are using named jobs and were defining many named process functions on one queue/processor. The default concurrency factor for a queue/processor is 1, so the queue should not process any jobs in parallel.
The problem with our setup is that if you define many (named) process handlers on one queue, the concurrency adds up with each process handler function: if you define three named process handlers you get a concurrency factor of 3 for the given queue for all the defined named jobs.
So just define one named job per queue for queues where parallel processing should not happen and all jobs should run sequentially one after the other.
That could be important e.g. when pushing a high number of jobs onto the queue and the processing involves API calls that would give errors if handled in parallel.
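A minimal sketch of that setup with bull (the queue name, job name, and doApiCall helper here are made up for illustration): one queue, a single named handler, concurrency 1, so jobs run strictly one after the other.

import Queue from 'bull';

const sequentialQueue = new Queue('sequential-api-calls', 'redis://127.0.0.1:6379');

// Register only this one named handler on the queue; each additional named
// handler would add to the queue's effective concurrency.
sequentialQueue.process('call-api', 1, async (job) => {
  // Perform the rate-sensitive work for this job, one job at a time.
  return doApiCall(job.data);
});

// Hypothetical API call, used only for illustration.
async function doApiCall(data: unknown): Promise<void> {
  console.log('calling API with', data);
}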
The following text was my first approach at answering the OP's question and describes just a workaround to the problem. So better just go with my edit above :) and configure your queues the right way.
I found an easy solution to the OP's question.
In fact, BullJS processes many jobs in parallel on one worker instance:
Let's say you have one worker instance up and running and push 10 jobs onto the queue; then possibly that worker starts all of them in parallel.
My research on BullJS queues suggested that this is not intended behavior: one worker (also called a processor by BullJS) should only start a new job from the queue when it's in an idle state, i.e. not processing a former job.
Nevertheless BullJS keeps starting jobs in parallel on one worker.
In our implementation that led to big problems during API calls, most likely caused by too many API calls at a time. Tests showed that when starting only one worker the API calls finished just fine and returned status 200.
So how do you process just one job after the other, once the previous one is finished, if BullJS does not do that for us (just what the OP asked)?
We first experimented with delays and other BullJS options, but that's a kind of workaround and not the exact solution to the problem we were looking for. At least we did not manage to stop BullJS from processing more than one job at a time.
So we did it ourselves and started one job after the other.
The solution was rather simple for our use case after looking into BullJS API reference (BullJS API Ref).
We just used a for-loop to start the jobs one after another. The trick was to use BullJS's
job.finished
method to get a Promise that resolves once the job is finished. By using await inside the for-loop, the next job gets started immediately after the job.finished Promise is awaited (resolved). That's the nice thing with for-loops: await works in them!
Here is a small code example showing how to achieve the intended behavior:
for (let i = 0; i < theValues.length; i++) {
  jobCounter++
  const job = await this.processingQueue.add(
    'update-values',
    {
      value: theValues[i],
    },
    {
      // delay: i * 90000,
      // lifo: true,
    }
  )
  this.jobs[job.id] = {
    jobType: 'socket',
    jobSocketId: BackgroundJobTasks.UPDATE_VALUES,
    data: {
      value: theValues[i],
    },
    jobCount: theValues.length,
    jobNumber: jobCounter,
    cumulatedJobId
  }
  await job.finished()
    .then((val) => {
      console.log('job finished:: ', val)
    })
}
The important part is really
await job.finished()
inside the for loop. theValues.length jobs get started, all just one after the other, as intended.
That way horizontally scaling jobs across more than one worker is not possible anymore. Nevertheless this workaround is okay for us at the moment.
I will get in contact with optimalbits - the maker of BullJS to clear things out.
I know I can write a while (true) loop to monitor the queue, but it will cause 100% CPU usage.
I can sleep some seconds inside the while (true) loop, but it's NOT efficient.
In C, I can wait for a semaphore inside the while (true) loop. When a task is added to the queue, the semaphore is released so that the while (true) loop can do its job. After the queue is empty, it can reset the semaphore and wait for it again.
Is there a similar way to do this in Node.js?
Imagine we have this taskQueue:
// Tasks will be added to the array randomly
const tasks = [];
Note: the taskQueue above is something completely different than the internal NodeJS micro/macro task queue, that I'm referring to throughout this post.
A way of constantly monitoring this array would be to schedule a 'micro-task' or 'macro-task' that parses the array.
As an example:
function handleTasks() {
  if (tasks.length) {
    // Alternatively loop and pop all the current tasks in queue
    const task = tasks.pop();
    // Do something with the task
  }
  setImmediate(handleTasks);
}

setImmediate(handleTasks);
The setImmediate function will add a task to the internal macro-task queue.
The JS micro- and macro-tasks do not block the main thread and will only be executed when the event loop picks them off the internal micro/macro task queue.
In NodeJS there are 4 ways of scheduling a function in a non-blocking way. Which way you pick is based on how much priority you'd want to give to the function.
Ordered by highest priority first the ways to do this are:
process.nextTick(handleTask)
new Promise((resolve) => { resolve() }).then(handleTask)
setImmediate(handleTask) / setTimeout(handleTask, 0)
setTimeout(handleTask, 1) // any timeout value bigger than 0
Be aware that executing this function with the highest priority recursively could slow down the rest of your code.
Depending on how important clearing this taskQueue is, I'd generally suggest using setTimeout with a reasonable value (as high as you can afford) to prevent affecting the performance of your application. (The same goes for any other function that schedules itself on the micro/macro task queue.)
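For example, a sketch of that setTimeout variant on the tasks array from above (the 250 ms interval is just an illustrative value to tune to what your application can afford):

function handleTasksWithTimeout() {
  // Drain everything currently queued, then reschedule the next check.
  while (tasks.length) {
    const task = tasks.pop();
    // Do something with the task
    console.log('handled', task);
  }
  setTimeout(handleTasksWithTimeout, 250);
}

setTimeout(handleTasksWithTimeout, 250);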
Questions
I know I can write a while (true) loop to monitor the queue, but it
will cause the CPU 100% problem.
In JavaScript, functions cannot be preempted, meaning that their execution cannot be halted somewhere in the middle.
The consequence is that once a function starts, it has to finish before another line of code (somewhere else) can be executed.
Therefore an infinite while-loop will not work.
I can sleep some seconds inside the while (true) loop, but it's NOT
efficient.
while (true) {
  await timeout(1000);
  // Do sth
}
Is actually syntactic sugar for
timeout(1000).then(() => {
  // Do sth
  timeout(1000).then(() => {
    // Do sth
    // ...etc
  });
});
Using await inside a loop is generally considered bad practice, but it can work here, since it just schedules each next iteration on the micro-task queue.
In C language, I can wait for a semaphore inside the while (true) loop. When a
task added into the queue, release the semaphore so that the while
(true) loop can do its job. After the queue is empty, it can set the
semaphore, and wait for it.
There is no such thing as a semaphore in JS. Something that might achieve a similar effect could be a callback function.
Example:
// Declared somewhere both sides can reach it
let resumeExecution;

function heavyLoadTask() {
  // Do sth
  resumeExecution = () => {
    // What to do when execution is resumed
  };
}

// Somewhere else the execution could be resumed like this:
if (typeof resumeExecution === "function") {
  resumeExecution();
}
Recommended reading
https://javascript.info/event-loop
https://nodejs.dev/learn/understanding-process-nexttick
https://nodejs.dev/learn/understanding-setimmediate
In DBFlow's docs you can read that:
While generally saving data synchronous should be avoided, for small
amounts of data it has little effect.
That's great! ...But, on the other hand there's this notice:
Doing operations on the main thread can block it if you read and write
to the DB on a different thread while accessing DB on the main.
So if I understand this correctly: if I have, for example, an Android service that periodically reads/writes to the DB using async transactions (in a separate thread), and the user clicks a button in an Activity that executes a simple model.save(), then the main thread of my Android application would be blocked and the app would 'freeze' until the DB is unlocked.
Is that right?
If it is, the solution could be to place all DB calls on the async thread's queue, like:
interface AsyncResult {
    fun onResult(success: Boolean)
}

fun MyModel.save(callback: AsyncResult) {
    database<AppDatabase>().beginTransactionAsync { this.save() }
        .success { callback.onResult(true) }
        .error { _, _ -> callback.onResult(false) }
        .build()
        .execute()
}
and execute it from the Activity like:
myModel.save(object : AsyncResult {
    override fun onResult(success: Boolean) {
        if (success) showToast("got it")
    }
})
But aren't the simple calls like save(), insert() wrapped in implicit transactions, so we end up with nested transactions?
Not to mention all the boilerplate code on the model and view layers.
What is the proper way to deal with such a problem?
In my project I run an operation on a background thread using NSBlockOperation:
var operationQueue = NSOperationQueue()
var iop = NSBlockOperation(block: { self.reloadSize() /*calculation...*/ })
operationQueue.addOperation(iop)
Immediately after the calculations in the background thread are completed, I need to call table.reloadData() on an NSTableView. I would do that in the very same thread; however, due to auto layout issues, the table has to be reloaded on the main thread. How can I accomplish this asynchronous relationship across both threads?
Two possible approaches:
Dispatch the reloading of the table from inside the block:
let operationQueue = NSOperationQueue()
let operation = NSBlockOperation() {
    self.reloadSize()
    ...
    dispatch_async(dispatch_get_main_queue()) { // or you can use NSOperationQueue.mainQueue().addOperationWithBlock()
        self.table.reloadData()
    }
}
operationQueue.addOperation(operation)
or just use addOperationWithBlock:
let operationQueue = NSOperationQueue()
operationQueue.addOperationWithBlock() {
    self.reloadSize()
    ...
    dispatch_async(dispatch_get_main_queue()) { // or you can use NSOperationQueue.mainQueue().addOperationWithBlock()
        self.table.reloadData()
    }
}
Create a new operation dependent upon this one:
let operationQueue = NSOperationQueue()
let operation = NSBlockOperation() {
    self.reloadSize()
    ...
}
let completionOperation = NSBlockOperation() {
    self.table.reloadData()
}
completionOperation.addDependency(operation)
operationQueue.addOperation(operation)
NSOperationQueue.mainQueue().addOperation(completionOperation)
Personally, I'd generally lean towards the first approach, though the latter approach can be useful in more complicated scenarios (e.g. the completion operation is dependent upon a number of other operations).
Try calling CFRunLoopRun().
It should run in the current queue.
If your operation ran on the main queue, the current queue would be the main queue and the operation would run on it successfully.
I have a function which calls Concurrency::create_task to perform some work in the background. Inside that task, there is a need to call a connectAsync method on the StreamSocket class in order to connect a socket to a device. Once the device is connected, I need to grab some references to things inside the connected socket (like input and output streams).
Since it is an asynchronous method and will return an IAsyncAction, I need to create another task on the connectAsync function that I can wait on. This works without waiting, but complications arise when I try to wait() on this inner task in order to error check.
Concurrency::create_task( Windows::Devices::Bluetooth::Rfcomm::RfcommDeviceService::FromIdAsync( device_->Id ) )
    .then( [ this ]( Windows::Devices::Bluetooth::Rfcomm::RfcommDeviceService ^device_service_ )
{
    _device_service = device_service_;
    _stream_socket = ref new Windows::Networking::Sockets::StreamSocket();
    // Connect the socket
    auto inner_task = Concurrency::create_task( _stream_socket->ConnectAsync(
        _device_service->ConnectionHostName,
        _device_service->ConnectionServiceName,
        Windows::Networking::Sockets::SocketProtectionLevel::BluetoothEncryptionAllowNullAuthentication ) )
    .then( [ this ]()
    {
        //grab references to streams, other things.
    } ).wait(); //throws exception here, but task executes
} );
Basically, I have figured out that the same thread (presumably the UI) that creates the initial task to connect, also executes that task AND the inner task. Whenever I attempt to call .wait() on the inner task from the outer one, I immediately get an exception. However, the inner task will then finish and connect successfully to the device.
Why are my async chains executing on the UI thread? How can I properly wait on these tasks?
In general you should avoid .wait() and just continue the asynchronous chain. If you need to block for some reason, the only fool-proof mechanism would be to explicitly run your code from a background thread (e.g., the WinRT thread pool).
You could try using the .then() overload that takes a task_options and pass concurrency::task_options(concurrency::task_continuation_context::use_arbitrary()), but that doesn't guarantee the continuation will run on another thread; it just says that it's OK if it does so -- see documentation here.
You could set an event and have the main thread wait for it. I have done this with some IO async operations. Here is a basic example of using the thread pool, using an event to wait on the work:
TEST_METHOD(ThreadpoolEventTestCppCx)
{
    Microsoft::WRL::Wrappers::Event m_logFileCreatedEvent;
    m_logFileCreatedEvent.Attach(CreateEventEx(nullptr, nullptr, CREATE_EVENT_MANUAL_RESET, WRITE_OWNER | EVENT_ALL_ACCESS));

    long x = 10000000;
    auto workItem = ref new WorkItemHandler(
        [&m_logFileCreatedEvent, &x](Windows::Foundation::IAsyncAction^ workItem)
        {
            while (x--);
            SetEvent(m_logFileCreatedEvent.Get());
        });

    auto asyncAction = ThreadPool::RunAsync(workItem);

    WaitForSingleObjectEx(m_logFileCreatedEvent.Get(), INFINITE, FALSE);
    long i = x;
}
Here is a similar example except it includes a bit of Windows Runtime async IO:
TEST_METHOD(AsyncOnThreadPoolUsingEvent)
{
    std::shared_ptr<Concurrency::event> _completed = std::make_shared<Concurrency::event>();
    int i;
    auto workItem = ref new WorkItemHandler(
        [_completed, &i](Windows::Foundation::IAsyncAction^ workItem)
        {
            Windows::Storage::StorageFolder^ _picturesLibrary = Windows::Storage::KnownFolders::PicturesLibrary;
            Concurrency::task<Windows::Storage::StorageFile^> _getFileObjectTask(_picturesLibrary->GetFileAsync(L"art.bmp"));
            auto _task2 = _getFileObjectTask.then([_completed, &i](Windows::Storage::StorageFile^ file)
            {
                i = 90210;
                _completed->set();
            });
        });

    auto asyncAction = ThreadPool::RunAsync(workItem);

    _completed->wait();
    int j = i;
}
I tried using an event to wait on Windows Runtime Async work, but it blocked. That's why I had to use the threadpool.