I am using BullMQ, Redis and MySQL in a producer-consumer model to process jobs.
I have a producer that looks like this:
const jobOptions = {
removeOnComplete: true, // remove job if complete
delay: 60000,
attempts: 3
};
const sasWorkerProducerService = ({redisConnection}) => {
const queueLogId = async (jobId, logIds) => {
for(const logId of logIds) {
await redisConnection.add({
jobId: jobId,
scraperPriceLogId: logId
}, jobOptions);
}
}
return {
queueLogId
}
}
module.exports = sasWorkerProducerService;
And I have a worker service that handles the jobs:
const Bull = require('bull');
const connectQueue = (name) => new Bull(name, {
redis: {
port: 6379, host: 'xxxx', password: 'xxxx'
},
settings: {
maxStalledCount: 5
}
})
module.exports = { connectQueue }
const nameQueue = 'sas-worker'
const cases = await connectQueue(nameQueue)
const initJob = () => {
console.info('job is working!');
cases.process('__default__', 300, processJob);
cases.on('failed', handlerFailure);
cases.on('completed', handlerCompleted);
cases.on('stalled', handlerStalled);
}
initJob()
Notice that the producer sends a jobId as part of the payload. This ID is an identifier that I have generated and stored in a MySQL database. A job represents a batch of items that need to be processed. I don't care about what order the jobs for a batch get completed in.
However, how can I determine when all of the jobs for a given jobId have been completed? I need to do some work after all of them have been processed.
I understand that the nature of a producer-consumer model is to do work on an item and then forget about it, but how can I do some final post-processing work for a batch once all of its items have indeed been processed?
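One common way to handle this (a sketch only, not part of the setup above) is to keep a per-batch counter of outstanding items in Redis: set it to the number of items when the batch is queued, decrement it in the worker's completed handler, and run the post-processing when it reaches zero. The ioredis client, the batch:<jobId>:remaining key and the handleBatchFinished hook are all hypothetical names:
const Redis = require('ioredis');
const redis = new Redis();

// Producer side: record how many items the batch contains before queueing them.
const queueBatch = async (queue, jobId, logIds) => {
    await redis.set(`batch:${jobId}:remaining`, logIds.length);
    for (const logId of logIds) {
        await queue.add({ jobId, scraperPriceLogId: logId }, jobOptions);
    }
};

// Worker side: the completion that brings the counter to zero triggers the
// batch-level post-processing.
const handlerCompleted = async (job) => {
    const remaining = await redis.decr(`batch:${job.data.jobId}:remaining`);
    if (remaining === 0) {
        await handleBatchFinished(job.data.jobId); // hypothetical post-processing hook
    }
};
Note that jobs which exhaust their attempts never decrement the counter, so the failed handler would need the same treatment if a batch must always finish.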
Description:
I have created a Firebase app where a user can insert a Firestore document. When this document is created, a timestamp is added so that it can be automatically deleted by a cloud function after x amount of time.
After the document is created, an onCreate cloud function is triggered successfully, and it creates a Cloud Task, which then deletes the document at the scheduled time.
export const onCreatePost = functions
.region(region)
.firestore.document('/boxes/{id}')
.onCreate(async (snapshot) => {
const data = snapshot.data() as ExpirationDocData;
// Box creation timestamp.
const { timestamp } = data;
// The path of the firebase document('/myCollection/{docId}').
const docPath = snapshot.ref.path;
await scheduleCloudTask(timestamp, docPath)
.then(() => {
console.log('onCreate: cloud task created successfully.');
})
.catch((error) => {
console.error(error);
});
});
export const scheduleCloudTask = async (timestamp: number, docPath: string) => {
// Convert timestamp to seconds.
const timestampToSeconds = timestamp / 1000;
// Doc time to live in seconds
const documentLifeTime = 20;
const expirationAtSeconds = timestampToSeconds + documentLifeTime;
// The Firebase project ID.
const project = 'my-project';
// Cloud Tasks -> Firestore time-to-live queue.
const queue = 'my-queue';
const queuePath: string = tasksClient.queuePath(project, region, queue);
// The URL of the callback function that gets invoked by
// Google Cloud Tasks when the deadline is reached.
const url = `https://${region}-${project}.cloudfunctions.net/callbackFn`;
const payload: ExpirationTaskPayload = { docPath };
// Google Cloud IAM & Admin principal (service account).
const serviceAccountEmail = 'myServiceAccount@appspot.gserviceaccount.com';
// Configuration for the Cloud Task
const task = {
httpRequest: {
httpMethod: 'POST',
url,
oidcToken: {
serviceAccountEmail,
},
body: Buffer.from(JSON.stringify(payload)).toString('base64'),
headers: {
'Content-Type': 'application/json',
},
},
scheduleTime: {
seconds: expirationAtSeconds,
},
};
await tasksClient.createTask({
parent: queuePath,
task,
});
};
export const callbackFn = functions
.region(region)
.https.onRequest(async (req, res) => {
const payload = req.body as ExpirationTaskPayload;
try {
await admin.firestore().doc(payload.docPath).delete();
res.sendStatus(200);
} catch (error) {
console.error(error);
res.status(500).send(error);
}
});
Problem:
The user can also extend the time to live for the document. When that happens, the timestamp is successfully updated in the Firestore document, and an onUpdate cloud function runs as expected.
As shown below, I tried to update the cloud task's "time to live" by calling the scheduleCloudTask function again, which obviously does not work and, I guess, just creates another task for the document.
export const onDocTimestampUpdate = functions
.region(region)
.firestore.document('/myCollection/{docId}')
.onUpdate(async (change, context) => {
const before = change.before.data() as ExpirationDocData;
const after = change.after.data() as ExpirationDocData;
if (before.timestamp < after.timestamp) {
const docPath = change.before.ref.path;
await scheduleCloudTask(after.timestamp, docPath)
.then((res) => {
console.log('onUpdate: cloud task created successfully.');
return;
})
.catch((error) => {
console.error(error);
});
} else return;
});
I have not been able to find documentation or examples where an updateTask() or a similar method is used to update an existing task.
Should I use the deleteTask() method and then use the createTask() method to create a new task after the document's timestamp is updated?
Thanks in advance,
Cheers!
Yes, that's how you have to do it. There is no API to update a task.
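For reference, a rough sketch of that delete-and-recreate flow (not from the original post). It assumes the same tasksClient, admin, project, region and queue values used in scheduleCloudTask, a hypothetical buildTask(timestamp, docPath) helper that returns the same task object as above, and a hypothetical taskName field on the Firestore document for remembering the created task:
export const rescheduleCloudTask = async (timestamp, docPath, existingTaskName) => {
    // There is no update API, so remove the previously scheduled task first.
    if (existingTaskName) {
        await tasksClient.deleteTask({ name: existingTaskName });
    }
    // createTask resolves with the created task; persist its name so the next
    // timestamp update can find and delete it again.
    const [task] = await tasksClient.createTask({
        parent: tasksClient.queuePath(project, region, queue),
        task: buildTask(timestamp, docPath),
    });
    await admin.firestore().doc(docPath).update({ taskName: task.name });
};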
I have 2 queries, with corresponding resolver functions, but while writing the resolvers I'm not sure how to store the data from the first function and then reuse it in the second one. Note: I do not want to call the first function again, since it would execute again and it makes an API call internally. I just want to use it like a session or global state in Express.js. Here's the code:
const resolvers = {
  getStudent: async ({ id }) => {
    const resp = await service(id)
    return resp;
  },
  courseDetails: () => {
    console.log(resp) // I want to access the resp object from the function above, but don't want to call getStudent again
  }
}
I tried context, but it didn't work.
You can implement a simple in-memory store.
By storing and returning the Promise, you won't need to worry about multiple requests for the same resource.
const got = require('got');
const assert = require('assert');
function studentServiceFactory(options = {}) {
const TTL = options.ttl || 5 * 60 * 1000; // default TTL: 5 minutes, in milliseconds to match Date.now()
const BASE_API = "https://swapi.dev/api";
const store = {};
return {
get: ({ id }) => {
if(!store[id] || store[id].timestamp + TTL < Date.now()) {
// store the promise
store[id] = {
promise: got(`${BASE_API}/people/${id}`),
timestamp: Date.now(),
};
console.log(`${BASE_API}/people/${id}`);
}
return store[id].promise;
}
}
}
const studentService = studentServiceFactory({ ttl: 1000});
const resolvers = {
studentService: studentService,
};
// test program
(async () => {
const request1 = await resolvers.studentService.get({ id: 1 });
const request2 = await resolvers.studentService.get({ id: 1 });
// Both calls will return the same promise.
assert.equal(request1, request2);
// wait for resources to get stale
setTimeout(async() => {
const request3 = await resolvers.studentService.get({ id: 1 });
assert.notEqual(request1, request3);
}, 3000);
})();
Two requests are independent of each other. The only way to share data between two requests is to persist the data somewhere. It can be a file, database, etc. In your case, you can simply call the service function again in the other resolver.
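Combining that with the memoised store from the previous answer, a hypothetical shape for the resolvers could look like this (the id argument on courseDetails is an assumption, not from the original code):
// Sketch only: both resolvers call the service; thanks to the in-memory store
// above, the second call for the same id reuses the stored promise instead of
// hitting the API again.
const resolvers = {
  getStudent: async ({ id }) => studentService.get({ id }),
  courseDetails: async ({ id }) => {
    const student = await studentService.get({ id }); // served from the store
    console.log(student);
  },
};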
The error is thrown by the batchProcessDocuments line and looks like this:
{
code: 3,
details: 'Request contains an invalid argument.',
metadata: Metadata {
internalRepr: Map { 'grpc-server-stats-bin' => [Array] },
options: {}
},
note: 'Exception occurred in retry method that was not classified as transient'
}
I've tried to copy the example as closely as possible, but without success. Is there a way of finding out more about which input parameters are required? There are very few examples of using Document AI on the web, as it is a new product.
Here is my code sample:
const projectId = "95715XXXXX";
const location = "eu"; // Format is 'us' or 'eu'
const processorId = "a1e1f6a3XXXXXXXX";
const gcsInputUri = "gs://nmm-storage/test.pdf";
const gcsOutputUri = "gs://nmm-storage";
const gcsOutputUriPrefix = "out_";
// Imports the Google Cloud client library
const {
DocumentProcessorServiceClient,
} = require("@google-cloud/documentai").v1beta3;
const { Storage } = require("@google-cloud/storage");
// Instantiates Document AI, Storage clients
const client = new DocumentProcessorServiceClient();
const storage = new Storage();
const { default: PQueue } = require("p-queue");
async function batchProcessDocument() {
const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;
// Configure the batch process request.
const request = {
name,
inputConfigs: [
{
gcsSource: gcsInputUri,
mimeType: "application/pdf",
},
],
outputConfig: {
gcsDestination: `${gcsOutputUri}/${gcsOutputUriPrefix}/`,
},
};
// Batch process document using a long-running operation.
// You can wait for now, or get results later.
// Note: first request to the service takes longer than subsequent
// requests.
const [operation] = await client.batchProcessDocuments(request); //.catch(err => console.log('err', err));
// Wait for operation to complete.
await operation.promise();
console.log("Document processing complete.");
}
batchProcessDocument();
I think this is the solution: https://stackoverflow.com/a/66765483/15461811
(you have to set the apiEndpoint parameter)
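For an eu processor that means constructing the client against the regional endpoint, roughly like this (a sketch; regional Document AI endpoints follow the <region>-documentai.googleapis.com pattern):
// Point the client at the EU regional endpoint to match location = "eu".
const client = new DocumentProcessorServiceClient({
    apiEndpoint: "eu-documentai.googleapis.com",
});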
Is there any way to query a user's total unread message count across all channels from the server?
I can see it is possible to do this from the client using setUser, but that is not appropriate for a server-side scenario. I am using Node.js; any suggestions would be appreciated.
Thanks,
Mark
This is how I got the unread counts for all channels from the server. I wrote it in JS; hopefully it will help you out:
async function listUnreadMessages(user) {
const serverClient = new StreamChat(api_key, stream_secret, options)
await serverClient.setUser(
{
id: `${user.id}`,
name: `${user.full_name}`,
image: user.profile_image
},
user.chat_token
)
const filter = { members: { $in: [`${user.id}`] } }
const sort = { last_message_at: -1 }
const channels = await serverClient.queryChannels(filter, sort, {
watch: true
})
let unreadList = {}
const unreads = await Promise.all(
channels.map((c) => {
unreadList[c.id] = c.countUnread()
return c.countUnread()
})
)
serverClient.disconnect()
return unreadList
}
I'm trying to get started with NodeJS and Kafka. I have one Kafka server running in a docker container with one Zookeeper server.
I can successfully connect a listener to my topic and when my code behaves I see messages passed from Producer to Consumer.
However, more often than not, my code fails with an error. Here are two consecutive attempts to produce a message into the Kafka topic:
2018-01-09T22:14:29.061Z - info: KafkaProducer: client is ready
err: null
data: [object Object]
err: BrokerNotAvailableError: Broker not available
data: undefined
2018-01-09T22:15:03.770Z - info: KafkaProducer: client is ready
The code that I'm using to publish the message is here:
'use strict';
const log = require('./Logger');
const kafka = require('kafka-node');
const Producer = kafka.Producer;
const KeyedMessage = kafka.KeyedMessage;
const Client = kafka.Client;
const Promise = require('bluebird');
const uuid = require('uuid');
const client = new Client('192.168.99.100:2181', 'test', {
sessionTimeout: 300,
spinDelay: 100,
retries: 2
});
const producer = new Producer(client, {requireAcks: 1});
producer.on('ready', () => {
log.info('KafkaProducer: client is ready');
});
producer.on('error', (err) => {
log.error(`KafkaProducer: ${err}`);
});
class KafkaProducer {
constructor() {}
sendRecord(type, userId, sessionId, data, callback) {
if (!userId) {
return callback(new Error('A userId must be provided.'));
}
const event = {
id: uuid.v4(),
timestamp: Date.now(),
userId: userId,
sessionId: sessionId,
type: type,
data: data
};
const buffer = Buffer.from(JSON.stringify(event));
// Create a new payload
const record = [
{
topic: 'datastore',
messages: buffer,
attributes: 1 /* Use GZip compression for the payload */
}
];
//Send record to Kafka and log result/error
producer.send(record, callback);
}
}
module.exports = KafkaProducer;
Not particularly elegant code, but I'm just trying to get a simple use case working before integrating Kafka into my application.
If it is relevant, these are my Dockerfiles for Kafka and Zookeeper:
FROM wurstmeister/kafka:1.0.0
ENV KAFKA_ADVERTISED_HOST_NAME service-kafka
ENV KAFKA_PORT 9092
ENV KAFKA_HOST_NAME service-kafka
ENV KAFKA_ZOOKEEPER_CONNECT service-zookeeper:2181
ENV KAFKA_CREATE_TOPICS "auth:1:1,datastore:1:1,transactions:1:1"
and
FROM zookeeper:latest
This comment in a GitHub issue seems to suggest that it is the application's responsibility to catch connection errors and retry the message, but this happens so frequently that I suspect something else is wrong and I need some guidance. Thanks
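In case it helps while debugging, here is a minimal sketch of the catch-and-retry approach that comment describes, wrapped around the producer.send call from the snippet above (the retry count and delay are arbitrary):
// Sketch only: retry a send when kafka-node reports BrokerNotAvailableError,
// instead of surfacing the very first failure to the caller.
const sendWithRetry = (record, callback, attemptsLeft = 3) => {
    producer.send(record, (err, data) => {
        if (err && err.name === 'BrokerNotAvailableError' && attemptsLeft > 0) {
            // Give the client a moment to refresh broker metadata, then try again.
            setTimeout(() => sendWithRetry(record, callback, attemptsLeft - 1), 1000);
            return;
        }
        callback(err, data);
    });
};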