How do I replace a subscriber with an Observer?

This issue on GitHub pretty much sums it up. I'm using a timer() with a recurring schedule of 1 second to perform a certain task. I pair it with a Subscriber to subscribe to the intervals. When a certain model runs out of data, I unsubscribe and wait for new arrivals. When the data is populated again, I try to subscribe again, but it doesn't work. It turns out that once a Subscriber has been unsubscribed, I can't use it again, so I must replace it with an Observer. I'm a newbie here and have no idea how to do that. I tried looking at examples, but they just confused me further.
How do I replace the following code to function with an Observer instead?
private timer = timer(1000, 1000);
// A timer subscription that keeps sending new images to the observer
timerSubscription = new Subscriber(() => {
  // Check if there is an element in the list
  if (this.head != null) {
    // If the current node at head is a folder, unsubscribe the listener
    if (this.head.data['id'].startsWith('folder')) {
      this.timerSubscription.unsubscribe();
    }
    // Pop a node from the list and pass on to observer
    this.observer.next(this.this$PiFrame.pop());
  } else {
    // If no nodes are left, unsubscribe from the timer
    this.timerSubscription.unsubscribe();
    console.log('No items left on the queue. Deactivating timer subscription.');
  }
}, e => {}, () => {});
And I subscribe like so:
...
// Setup a timer to pop every 1000 ms
this.timer.subscribe(this.this$PiFrame.timerSubscription);
...
// If no nodes are left, unsubscribe from the timer
this.timerSubscription.unsubscribe();
...

Instead of creating the subscription the way you do, let the Observable return the subscription.
Keep your logic in a function, like so:
doWhatever() {
  console.log("tick");
  // Check if there is an element in the list
  if (this.head != null) {
    // If the current node at head is a folder, unsubscribe the listener
    if (this.head.data['id'].startsWith('folder')) {
      this.timerSubscription.unsubscribe();
    }
    // Pop a node from the list and pass on to observer
    this.observer.next(this.this$PiFrame.pop());
  } else {
    // If no nodes are left, unsubscribe from the timer
    this.timerSubscription.unsubscribe();
    console.log('No items left on the queue. Deactivating timer subscription.');
  }
}
Then, when you want to subscribe:
this.timerSubscription = this.timer.subscribe(() => this.doWhatever());
This can be used repeatedly, because each call to subscribe() returns a new Subscription.
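To see the idea in isolation, here is a minimal sketch that runs without RxJS. The makeTimer function is a hypothetical stand-in for timer(1000, 1000): like an RxJS Observable, every subscribe() call starts its own interval and returns a fresh subscription object with unsubscribe() and closed. FrameQueue and its fields are illustrative names too, not part of the original code.

```javascript
// Hypothetical stand-in for an RxJS-style timer: each subscribe() call
// starts its own interval and returns a brand-new subscription object.
function makeTimer(periodMs) {
  return {
    subscribe(next) {
      let count = 0;
      const handle = setInterval(() => next(count++), periodMs);
      return {
        closed: false,
        unsubscribe() {
          clearInterval(handle);
          this.closed = true;
        },
      };
    },
  };
}

// Illustrative queue that pops one item per tick and pauses when empty.
class FrameQueue {
  constructor(timer) {
    this.timer = timer;
    this.items = [];
    this.emitted = [];
    this.subscription = null;
  }

  // Add an item and (re)subscribe if no subscription is currently active.
  push(item) {
    this.items.push(item);
    if (!this.subscription || this.subscription.closed) {
      this.subscription = this.timer.subscribe(() => this.onTick());
    }
  }

  onTick() {
    if (this.items.length > 0) {
      this.emitted.push(this.items.shift());
    } else {
      // Nothing left: drop this subscription; push() creates a new one later.
      this.subscription.unsubscribe();
    }
  }
}
```

With real RxJS, the same shape should work by passing timer(1000, 1000) instead of makeTimer(...), since the old subscription is never reused after unsubscribe().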

Related

Do not process next job until previous job is completed (BullJS/Redis)?

Basically, each of the clients (each has a clientId associated with it) can push messages, and it is important that a second message from the same client isn't processed until the first one has finished processing. The client can send multiple messages in a row, and they are ordered; multiple clients sending messages should ideally not interfere with each other. Importantly, a job shouldn't be processed twice.
I thought I might be able to solve this with Redis, so I started some quick prototyping with the bull library, but I am clearly not doing it right. I was hoping someone would know how to proceed.
This is what I tried so far:
Create jobs and add them to the same queue name for one process, using the clientId as the job name.
Consume jobs on 2 separate processes, each waiting large random amounts of time.
I tried adding the default locking provided by the library that I am using (bull), but it locks on the jobId, which is unique for each job, not on the clientId.
What I would want to happen:
One of the consumers can't take the job from the same clientId until the previous one is finished processing it.
They should be able to, however, get items from different clientIds in parallel without problem (asynchronously). (I haven't gotten this far, I am right now simply dealing with only one clientId)
What I get:
Both consumers consume as many items as they can from the queue without waiting for the previous item for the clientId to be completed.
Is Redis even the right tool for this job?
Example code
// ./setup.ts
import Queue from 'bull';
import * as uuid from 'uuid';
// Check that when a message is taken from a place, no other message is taken
// To do that test, have two processes that process messages and one that sets messages, and make the job take a long time
// queue for each room https://stackoverflow.com/questions/54178462/how-does-redis-pubsub-subscribe-mechanism-works/54243792#54243792
// https://groups.google.com/forum/#!topic/redis-db/R09u__3Jzfk
// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353
export async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}
export interface JobData {
  id: string;
  v: number;
}
export const queue = new Queue<JobData>('messages', 'redis://127.0.0.1:6379');
queue.on('error', (err) => {
  console.error('Uncaught error on queue.', err);
  process.exit(1);
});
export function clientId(): string {
  return uuid.v4();
}
export function randomWait(minms: number, maxms: number): Promise<void> {
  const ms = Math.random() * (maxms - minms) + minms;
  return sleep(ms);
}
// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
queue.LOCK_RENEW_TIME = 5 * 60 * 1000;
// ./create.ts
import { queue, randomWait } from './setup';
const MIN_WAIT = 300;
const MAX_WAIT = 1500;
async function createJobs(n = 10): Promise<void> {
  await randomWait(MIN_WAIT, MAX_WAIT);
  // always same Id
  const clientId = Math.random() > 1 ? 'zero' : 'one';
  for (let index = 0; index < n; index++) {
    await randomWait(MIN_WAIT, MAX_WAIT);
    const job = { id: clientId, v: index };
    await queue.add(clientId, job).catch(console.error);
    console.log('Added job', job);
  }
}
export async function create(nIds = 10, nItems = 10): Promise<void> {
  const jobs = [];
  await randomWait(MIN_WAIT, MAX_WAIT);
  for (let index = 0; index < nIds; index++) {
    await randomWait(MIN_WAIT, MAX_WAIT);
    jobs.push(createJobs(nItems));
    await randomWait(MIN_WAIT, MAX_WAIT);
  }
  await randomWait(MIN_WAIT, MAX_WAIT);
  await Promise.all(jobs);
  process.exit();
}
(function mainCreate(): void {
  create().catch((err) => {
    console.error(err);
    process.exit(1);
  });
})();
// ./consume.ts
import { queue, randomWait, clientId } from './setup';
function startProcessor(minWait = 5000, maxWait = 10000): void {
  queue
    .process('*', 100, async (job) => {
      console.log('LOCKING: ', job.lockKey());
      await job.takeLock();
      const name = job.name;
      const processingId = clientId().split('-', 1)[0];
      try {
        console.log('START: ', processingId, '\tjobName:', name);
        await randomWait(minWait, maxWait);
        const data = job.data;
        console.log('PROCESSING: ', processingId, '\tjobName:', name, '\tdata:', data);
        await randomWait(minWait, maxWait);
        console.log('PROCESSED: ', processingId, '\tjobName:', name, '\tdata:', data);
        await randomWait(minWait, maxWait);
        console.log('FINISHED: ', processingId, '\tjobName:', name, '\tdata:', data);
      } catch (err) {
        console.error(err);
      } finally {
        await job.releaseLock();
      }
    })
    .catch(console.error); // Catches initialization
}
startProcessor();
This is run using 3 different processes, which you might start like this (although I use different tabs for a clearer view of what is happening):
npx ts-node consume.ts &
npx ts-node consume.ts &
npx ts-node create.ts &

I'm not familiar with node.js, but for Redis I would try this.
Let's say you have client_1 and client_2; they are all publishers of events.
You have three machines: consumer_1, consumer_2, consumer_3.
Establish a list of tasks in Redis, e.g. JOB_LIST.
Clients put (LPUSH) jobs into this JOB_LIST in a specific form, like "CLIENT_1:[jobcontent]", "CLIENT_2:[jobcontent]".
Each consumer takes jobs out in a blocking fashion (the BRPOP command of Redis, the blocking variant of RPOP) and processes them.
For example, consumer_1 takes out a job whose content is CLIENT_1:[jobcontent]. It parses the content and recognizes that it is from CLIENT_1. Then it checks whether some other consumer is already processing CLIENT_1; if not, it locks the key to indicate that it is processing CLIENT_1.
It goes on to set a key "CLIENT_1_PROCESSING" with the content "consumer_1", using the Redis SETNX command (set if the key does not exist), with an appropriate timeout (SET with the NX and EX options does both in one step). For example, if the task normally takes one minute to finish, set a timeout of five minutes on the key, in case consumer_1 crashes and would otherwise hold the lock indefinitely.
If SETNX returns 0, it failed to acquire the lock for CLIENT_1 (someone is already processing a job of client_1). Then it returns the job (the value "CLIENT_1:[jobcontent]") to the left side of JOB_LIST using the Redis LPUSH command. Then it might wait a bit (sleep a few seconds) and pop another task from the right side of the list. If this time SETNX returns 1, consumer_1 acquires the lock. It goes on to process the job; after it finishes, it deletes the key "CLIENT_1_PROCESSING", releasing the lock. Then it goes on to pop another job, and so on.
Some things to consider:
The JOB_LIST is not fair, e.g. earlier jobs might be processed later.
The locking part is a bit rudimentary, but will suffice.
----------update--------------
I've figured out another way to keep tasks in order.
For each client (producer), build a list, like "client_1_list", and push jobs onto the left side of the list.
Save all the client names in a list "client_names_list", with values "client_1", "client_2", etc.
Each consumer (processor) iterates over "client_names_list". For example, consumer_1 gets "client_1" and checks whether the key for client_1 is locked (someone is already processing a task of client_1). If not, it right-pops a value (job) from client_1_list and locks client_1. If client_1 is locked, it (probably after sleeping one second) moves on to the next client, "client_2" for example, checks its key, and so on.
This way, each client's (task producer's) tasks are processed in the order in which they were entered.
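The per-client scheme from the update can be sketched in memory as follows. This is only an illustration under stated assumptions: the Maps stand in for the Redis lists and lock keys, tryLock mimics a SET with NX and an expiry, and all function names are hypothetical. A real implementation would issue the corresponding Redis commands instead.

```javascript
// In-memory stand-ins for the Redis structures described above (assumptions):
//   clientLists ~ one Redis list per client ("client_1_list", ...)
//   locks       ~ one lock key per client, set with NX and an expiry
const clientLists = new Map();
const locks = new Map(); // clientName -> { owner, expiresAt }

function enqueue(clientName, job) {
  if (!clientLists.has(clientName)) clientLists.set(clientName, []);
  clientLists.get(clientName).push(job);
}

// Mimics "SET key value NX PX ttl": succeeds only if no live lock exists.
function tryLock(clientName, owner, ttlMs, now = Date.now()) {
  const current = locks.get(clientName);
  if (current && current.expiresAt > now) return false;
  locks.set(clientName, { owner, expiresAt: now + ttlMs });
  return true;
}

// Release the lock, but only if this consumer still owns it.
function unlock(clientName, owner) {
  const current = locks.get(clientName);
  if (current && current.owner === owner) locks.delete(clientName);
}

// One pass of a consumer over all clients: take the oldest job of the first
// client that has work and is not locked, or return null if none qualifies.
function takeNextJob(consumerName, ttlMs = 5 * 60 * 1000) {
  for (const [clientName, jobs] of clientLists) {
    if (jobs.length === 0) continue;
    if (!tryLock(clientName, consumerName, ttlMs)) continue;
    return { clientName, job: jobs.shift() };
  }
  return null;
}
```

A consumer would call takeNextJob(), process the job, and then call unlock() for that client. Because the lock is taken before the pop, only one consumer at a time removes jobs for a given client, which is what keeps that client's jobs in order.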

EDIT: I found the cause of BullJS starting jobs in parallel on one processor: we are using named jobs and were defining many named process functions on one queue/processor. The default concurrency factor for a queue/processor is 1, so the queue should not process any jobs in parallel.
The problem with this setup is that if you define many (named) process handlers on one queue, the concurrency adds up with each process-handler function: if you define three named process handlers, you get a concurrency factor of 3 for that queue, across all the defined named jobs.
So define just one named job per queue for queues where parallel processing should not happen and all jobs should run sequentially, one after the other.
That could be important e.g. when pushing a high number of jobs onto the queue and the processing involves API calls that would give errors if handled in parallel.
The following text is my first attempt at answering the OP's question and describes just a workaround. So better go with my edit :) and configure your queues the right way.
I found an easy solution to the OP's question.
In fact, BullJS is processing many jobs in parallel on one worker instance:
Let's say you have one worker instance up and running and push 10 jobs onto the queue; then that worker possibly starts all of them in parallel.
My research on BullJS queues showed that this is not intended behavior: one worker (also called a processor by BullJS) should only start a new job from the queue when it is idle, i.e. not processing a previous job.
Nevertheless, BullJS keeps starting jobs in parallel on one worker.
In our implementation that led to big problems during API calls, most likely caused by too many API calls at a time. Tests showed that when starting only one worker, the API calls finished just fine and returned status 200.
So how do we process one job after the other, each starting once the previous one has finished, if BullJS does not do that for us (just what the OP asked)?
We first experimented with delays and other BullJS options, but that is a workaround and not the exact solution we were looking for. At least, we did not manage to stop BullJS from processing more than one job at a time.
So we did it ourselves and started one job after the other.
The solution was rather simple for our use case after looking into the BullJS API reference.
We just used a for-loop to start the jobs one after another. The trick was to use BullJS's
job.finished
method to get a Promise that resolves once the job is finished. By using await inside the for-loop, the next job gets started only after the job.finished() Promise has resolved. That's the nice thing about for-loops: await works inside them!
Here is a small code example showing how to achieve the intended behavior:
for (let i = 0; i < theValues.length; i++) {
  jobCounter++
  const job = await this.processingQueue.add(
    'update-values',
    {
      value: theValues[i],
    },
    {
      // delay: i * 90000,
      // lifo: true,
    }
  )
  this.jobs[job.id] = {
    jobType: 'socket',
    jobSocketId: BackgroundJobTasks.UPDATE_VALUES,
    data: {
      value: theValues[i],
    },
    jobCount: theValues.length,
    jobNumber: jobCounter,
    cumulatedJobId
  }
  await job.finished()
    .then((val) => {
      console.log('job finished:: ', val)
    })
}
The important part is really
await job.finished()
inside the for loop. theValues.length jobs get started one after the other, as intended.
That way, scaling jobs horizontally across more than one worker is no longer possible. Nevertheless, this workaround is okay for us at the moment.
I will get in contact with OptimalBits, the maker of BullJS, to clear things up.

async.queue concurrent tasks

I am using async.queue to ensure that certain file copies in a service happen at most n concurrently, but watching the files copy sometimes I see a lot more than what the queue allows. Does anyone see something I may have missed in the below implementation?
createQueue(limit: number) {
  let self = this;
  return async.queue(function(cmdObj, callback) {
    console.log("Beginning copy");
    let cmd = cmdObj.cmd;
    let args = cmdObj.args;
    let request = cmdObj.req;
    request.state = State.IN_PROGRESS;
    self.reportStatus(request.destination);
    const proc = spawn(cmd, args); //uses an rsync command upstream
    proc.on("close", code => {
      if (code !== 0) {
        request.state = State.ERRORED;
        self.reportStatus(request.destination); // these just report to the caller
        statusMap.delete(request.destination);
      } else {
        fs.rename(request.destination + ".part", request.destination);
        request.state = State.COMPLETED;
        self.reportStatus(request.destination); // same here
        statusMap.delete(request.destination);
      }
      callback();
    });
    proc.on("error", err => {
      console.error("COPY ERR: " + err);
    });
  }, limit); // limit here, for example, may be two, but I see four copies concurrently
}
EDIT:
I now believe this is a side effect of the rest of the system...queues being cleared and reinitialized AFTER copies have started...so when new items are added to the reinitialized queues, they kick off immediately, as the system has no idea if something has been handed off to userland and is currently running.

So, this was user error...PEBCAK! Posting the solution more as a cautionary tale:
The queues above were working as designed, but I had an endpoint for the calling server to clear the queues as necessary. The problem was that I was using kill() and re-initializing the queues, losing all track of any jobs in progress and their callbacks. As soon as a new item hit the fresh queue, it would think nothing was happening and spawn a new copy process. I resolved it by using remove() to clear the queues instead of re-initializing them.

Async base-local with MQTT

I need to synchronize a base and a local client with MQTT. If one client publishes, the other one gets the message.
If my MQTT broker is down, I need to stop sending messages, save the messages somewhere, wait for a connection, then continue sending.
If my local or base client is down for a second, I need to save the message which I didn't send, then send it when I turn on my base/local.
I'm working with Node.js and can't figure out how to implement this.
This is my handler when I connect or disconnect with my MQTT server.
client.on('connect',()=>{
  store.state = true;
  run(store).then((value)=>console.log('stop run'));
});
client.on('offline',()=>{
  store.state = false;
  console.log('offline');
});
This is my run function. I use store.state to decide if I should stop this interval. But this code does not seem to be a good way to implement my concept.
function run(store) {
  return new Promise((resolve,reject)=>{
    let interval = setInterval(()=>{
      if (!store.state) {
        clearInterval(interval);
        resolve(true);
      }
      else if (store.queue.length > 0) {
        let data = store.queue.pop();
        let res = client.publish('push',JSON.stringify(data),{qos:2});
      }
    },300)
  });
}
What should I do to implement a function that always sends, stops upon 'disconnect', and then continues sending once reconnected?

"I don't think set interval which 300ms is good."
If you want something that "always runs", at set intervals and in spite of any errors inside the loop, setInterval() makes sense. You are right that queued messages can be cleared faster than "once every 300 ms".
Since MQTT.js has a built-in queue, you could simplify a lot by using it. However, your messages are published to a target called "push", so I guess you want them delivered in the order of the queue. This answer keeps the queue and focuses on sending the next message as soon as the last one is confirmed.
"What if res=client.publish(..) false ?"
Good point! If you want to make sure it arrives, better to remove it once the publish has succeeded. For this, you need to retrieve the value without removing it, and use the callback argument to find out what happened (publish() is asynchronous). If that was the only change, it might look like:
let data = store.queue[store.queue.length - 1];
client.publish('push', JSON.stringify(data), {qos:2}, (err) => {
  if(!err) {
    store.queue.pop();
  }
  // Ready for next publish; call this function again
});
Extending that to include a callback-based run:
function publishFromQueue(data) {
  return new Promise((resolve,reject)=>{
    let res = client.publish('push', JSON.stringify(data), {qos:2}, (err) => {
      resolve(!err);
    });
  });
}
async function run(store) {
  while (store.queue.length > 0 && store.state) {
    let data = store.queue[store.queue.length - 1];
    let res = await publishFromQueue(data);
    if(res) {
      store.queue.pop();
    }
  }
}
This should deliver all the queued messages in order as soon as possible, without blocking. The only drawback is that it does not run constantly. You have two options:
Recur at set intervals, as you have already done. Slower, though you could set a shorter interval.
Only run() when needed, like:
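let isRunning = false; // Global for tracking whether run() is active
function queueMessage(data) {
  // run() publishes from the end of the array, so add new messages
  // at the front to keep first-in, first-out order
  store.queue.unshift(data);
  if(!isRunning) {
    isRunning = true;
    // Reset the flag only once run() has actually finished
    run(store).finally(() => { isRunning = false; });
  }
}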
As long as you can call this instead of pushing to the queue directly, the code should come out at a similar length while being more immediate and efficient.
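Putting the pieces together, here is a minimal self-contained sketch of a send-when-online outbox. It assumes a hypothetical async publish(message) function that resolves to true once the broker has confirmed the message (for MQTT.js this could be a Promise wrapper around client.publish with its callback, as in publishFromQueue). The names createOutbox, send, setOnline and pending are illustrative, not part of any library.

```javascript
// Creates a FIFO outbox. `publish(message)` is a hypothetical async function
// that resolves to true when the message was confirmed, false otherwise.
function createOutbox(publish) {
  const queue = [];
  let online = false;
  let draining = false;

  async function drain() {
    if (draining) return; // only one drain loop at a time
    draining = true;
    try {
      while (online && queue.length > 0) {
        const ok = await publish(queue[0]); // peek; do not remove yet
        if (ok) {
          queue.shift(); // remove only after confirmation
        } else {
          break; // keep it queued; the next send/setOnline retries
        }
      }
    } finally {
      draining = false;
    }
  }

  return {
    // Queue a message and try to send right away.
    send(message) {
      queue.push(message);
      return drain();
    },
    // Call with true from the 'connect' handler, false from 'offline'.
    setOnline(value) {
      online = value;
      return value ? drain() : Promise.resolve();
    },
    pending() {
      return queue.length;
    },
  };
}
```

The 'connect' and 'offline' handlers would call setOnline(true) and setOnline(false), and everything else would call send(). A failed publish is not retried on a timer here; that is a deliberate simplification of this sketch.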

Can I filter an Azure ServiceBusService using node.js SDK?

I have millions of messages in a queue and the first ten million or so are irrelevant. Each message has a sequential ActionId so ideally anything < 10000000 I can just ignore or better yet delete from the queue. What I have so far:
let azure = require("azure");
function processMessage(sb, message) {
  // Deserialize the JSON body into an object representing the ActionRecorded event
  var actionRecorded = JSON.parse(message.body);
  console.log(`processing id: ${actionRecorded.ActionId} from ${actionRecorded.ActionTaken.ActionTakenDate}`);
  if (actionRecorded.ActionId < 10000000) {
    // When done, delete the message from the queue
    console.log(`Deleting message: ${message.brokerProperties.MessageId} with ActionId: ${actionRecorded.ActionId}`);
    sb.deleteMessage(message, function(deleteError, response) {
      if (deleteError) {
        console.log("Error deleting message: " + message.brokerProperties.MessageId);
      }
    });
  }
  // immediately check for another message
  checkForMessages(sb);
}
function checkForMessages(sb) {
  // Checking for messages
  sb.receiveQueueMessage("my-queue-name", { isPeekLock: true }, function(receiveError, message) {
    if (receiveError && receiveError === "No messages to receive") {
      console.log("No messages left in queue");
      return;
    } else if (receiveError) {
      console.log("Receive error: " + receiveError);
    } else {
      processMessage(sb, message);
    }
  });
}
let connectionString = "Endpoint=sb://<myhub>.servicebus.windows.net/;SharedAccessKeyName=KEYNAME;SharedAccessKey=[mykey]"
let serviceBusService = azure.createServiceBusService(connectionString);
checkForMessages(serviceBusService);
I've tried looking at the docs for withFilter but it doesn't seem like that applies to queues.
I don't have access to create or modify the underlying queue aside from the operations mentioned above since the queue is provided by a client.
Can I either
Filter my results that I get from the queue
speed up the queue processing somehow?

"Filter my results that I get from the queue"
As you found, filters as a feature are only applicable to Topics & Subscriptions.
"speed up the queue processing somehow"
If you were to use the @azure/service-bus package, which is the newer, faster library for working with Service Bus, you could receive the messages in ReceiveAndDelete mode until you reach the message with ActionId 9999999, close that receiver, and then create a new receiver in PeekLock mode. For more on these receive modes, see https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-transfers-locks-settlement#settling-receive-operations
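The skip phase could look roughly like the sketch below. It is written against a generic receiver object so that it runs without the SDK: the receiver is assumed to expose receiveMessages(maxCount, options) resolving to an array of messages whose body is the already-parsed event, which is my understanding of the receiver shape in @azure/service-bus v7 and should be checked against its documentation. skipIrrelevant and onRelevant are hypothetical names.

```javascript
// Drains messages whose ActionId is below `firstRelevantId` from a receiver
// that is assumed to be in receive-and-delete mode. Returns how many were skipped.
// ASSUMPTION: `receiver.receiveMessages(maxCount, options)` resolves to an array
// of messages with a parsed `body` (modeled on @azure/service-bus v7).
async function skipIrrelevant(receiver, firstRelevantId, onRelevant, batchSize = 100) {
  let skipped = 0;
  for (;;) {
    const batch = await receiver.receiveMessages(batchSize, { maxWaitTimeInMs: 5000 });
    if (batch.length === 0) return skipped; // queue is empty

    let reachedBoundary = false;
    for (const message of batch) {
      const id = message.body.ActionId;
      if (id < firstRelevantId) {
        skipped++;
      } else {
        // In this mode the message is already gone from the queue,
        // so a relevant one caught in the batch must be handled here.
        await onRelevant(message);
      }
      // The last irrelevant id (or anything after it) marks the boundary.
      if (id >= firstRelevantId - 1) reachedBoundary = true;
    }
    if (reachedBoundary) return skipped;
  }
}
```

Once this returns, the receive-and-delete receiver would be closed and a peek-lock receiver opened for normal processing. Because a batch can run past the boundary, relevant messages that arrive in the final batch are passed to onRelevant rather than lost; a smaller batchSize near the boundary reduces how many are handled that way.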

preventing race conditions with nodejs

I'm writing an application using Node.js 6.3.0 and AWS DynamoDB.
DynamoDB holds statistics that are added from 10 different functions (10 different statistic measures). The interval is set to 10 seconds, which means that every 10 seconds, 10 calls are made to my function to add all the relevant information.
The putItem function:
function putItem(tableName,itemData,callback) {
  var params = {
    TableName: tableName,
    Item: itemData
  };
  docClient.put(params, function(err, data) {
    if (err) {
      logger.error(params,"putItem failed in dynamodb");
      callback(err,null);
    } else {
      callback(null,data);
    }
  });
}
Now... I created a queue.
var queue = require('./dynamoDbQueue').queue;
It implements a simple fixed-size queue that I took from http://www.bennadel.com/blog/2308-creating-a-fixed-length-queue-in-javascript-using-arrays.htm.
The idea is that if there is a network problem, let's say for a minute, I want all the events to be pushed to the queue, and when the problem is resolved, to send the queued information to DynamoDB and empty the queue.
So I modified my original function to the following code:
function putItem(tableName,itemData,callback) {
  var params = {
    TableName: tableName,
    Item: itemData
  };
  if (queue.length>0) {
    queue.push(params);
    callback(null,null);
  } else {
    docClient.put(params, function (err, data) {
      if (err) {
        queue.push(params);
        logger.error(params, "putItem failed in dynamodb");
        handleErroredQueue(); // imaginary function that i need to implement
        callback(err, null);
      } else {
        callback(null, data);
      }
    });
  }
}
But since I have 10 insert functions that run in the same second, there is a chance of race conditions, which means that...
execute1 - one function validates that the queue is empty and is about to execute docClient.put().
execute2 - at the same time, another function returns from docClient.put() with an error and, as a result, adds its first row to the queue.
execute1 - by the time the first function calls docClient.put(), the problem has been resolved and it successfully inserts data into DynamoDB, which leaves the queue holding earlier data that will only be released in the next iteration.
So, for example, if I inserted 4 rows with IDs 1, 2, 3, 4, the order in which the rows are inserted into DynamoDB is 1, 2, 4, 3.
Is there a way to resolve that?
Thanks!

I think you are on the right track, but instead of checking for an error and then adding to the queue, I would suggest adding every operation to the queue first and then always reading the data from the queue.
For instance, in your case you call functions 1, 2, 3, 4 and the result is 1, 2, 4, 3 because you only use the queue at the time of an error or interrupted operation.
Step 1: All your functions make an entry in the queue -> 1, 2, 3, 4
Step 2: Read from your queue and make an insert; on success remove the element,
otherwise redo the operation. This way the inserts happen in the desired sequence.
Another advantage is that because you are using a queue, you don't have to provision very high throughput for the table.
Edit:
I guess you just need to ensure that the next operation starts only on completion of the previous one, and not before.
e.g.: fn 1 -> read from queue (don't delete from the queue yet) -> operation completed? if not, perform it again -> delete from queue -> perform next operation.
You just have to make sure you read from the queue and wait until you get a response from DynamoDB.
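As a rough sketch of those two steps, the writer below queues every item first and lets a single loop write them one at a time, removing an item only after its write succeeded. The putFn parameter is a hypothetical async function that writes one item and rejects on failure (for example a promisified wrapper around docClient.put); createOrderedWriter and retryDelayMs are illustrative names, and the queue here is unbounded, unlike the fixed-length queue in the question.

```javascript
// Queue-first writer: every item goes through the queue, and one loop
// writes items strictly in order. `putFn(item)` is a hypothetical async
// function that rejects when the write fails.
function createOrderedWriter(putFn, retryDelayMs = 1000) {
  const queue = [];
  let running = false;

  async function run() {
    if (running) return; // a loop is already draining the queue
    running = true;
    while (queue.length > 0) {
      try {
        await putFn(queue[0]); // head stays queued until the write succeeds
        queue.shift();
      } catch (err) {
        // Keep the head in place and wait before retrying, so that
        // later items can never overtake it.
        await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
      }
    }
    running = false;
  }

  return {
    // Enqueue an item; the returned promise settles when the loop that
    // this call started (if any) has emptied the queue.
    put(item) {
      queue.push(item);
      return run();
    },
    size() {
      return queue.length;
    },
  };
}
```

Since only one loop ever calls putFn, the check-then-write race from the question cannot occur: a failed item blocks the items behind it instead of being overtaken by them.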
Hope this helps.
