How to return a generated image with Bull.js queue? - node.js

My use case is this: I want to create screenshots of parts of a page. For technical reasons, it cannot be done on the client-side (see related question below) but needs puppeteer on the server.
As I'm running this on Heroku, I also have to work within a fairly small request timeout window. Heroku therefore recommends implementing a queueing system based on bull.js and using worker processes for longer-running tasks, as explained here.
I have two endpoints (implemented with Express), one that receives a POST request with some configuration JSON, and another one that responds to GET when provided with a job identifier (slightly modified for brevity):
This adds the job to the queue:
router.post('/', async function(req, res, next) {
  let job = await workQueue.add(req.body.chartConfig)
  res.json({ id: job.id })
})
This returns info about the job:
router.get('/:id', async (req, res) => {
  let id = req.params.id;
  let job = await workQueue.getJob(id);
  let state = await job.getState();
  let progress = job._progress;
  let reason = job.failedReason;
  res.json({ id, state, progress, reason });
})
In a different file:
const start = () => {
  let workQueue = new queue('work', REDIS_URL);
  workQueue.process(maxJobsPerWorker, getPNG)
}

const getPNG = async (job) => {
  const { url, width, height, chart: chartConfig, chartUrl } = job.data
  // ... snipped for brevity
  const png = await page.screenshot({
    type: 'png',
    fullPage: true
  })
  await page.close()
  job.progress(100)
  return Promise.resolve({ png })
}
// ...
throng({ count: workers, worker: start })
module.exports.getPNG = getPNG
The throng invocation at the end registers start as the function each worker process runs; start in turn registers getPNG as the handler Bull calls for each job picked from the queue.
My question now is: how do I get the generated image (png)? I guess ideally I'd like to be able to call the GET endpoint above which would return the image, but I don't know how to pass the image object.
As a more complex fall-back solution I could imagine posting the image to an image hosting service like imgur, and then returning the URL upon request of the GET endpoint. But I'd prefer, if possible, to keep things simple.
This question is a follow-up from this one:
Issue with browser-side conversion SVG -> PNG

I've opened a ticket on the GitHub repository of the bull project. The developers said that the preferred practice is to store the binary object somewhere else and add only link metadata to the job's data.
However, they also said that the storage limit of a job object appears to be 512 MB, so it is also quite possible to store a reasonably sized image as a base64-encoded string.
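A minimal sketch of that second approach, assuming the worker returns the screenshot as a base64 string and the web process reads the completed job's return value (Bull exposes it as job.returnvalue); the extra /:id/image route is hypothetical:

// Worker: return the screenshot as base64 so Bull stores it with the job (sketch, not the full worker)
const getPNG = async (job) => {
  // ... puppeteer setup snipped ...
  const png = await page.screenshot({ type: 'png', fullPage: true })
  await page.close()
  job.progress(100)
  return { png: png.toString('base64') }
}

// Web process: stream the image back once the job has completed
router.get('/:id/image', async (req, res) => {
  const job = await workQueue.getJob(req.params.id)
  if (!job) return res.sendStatus(404)
  const state = await job.getState()
  if (state !== 'completed' || !job.returnvalue) {
    return res.status(409).json({ id: job.id, state })
  }
  res.set('Content-Type', 'image/png')
  res.send(Buffer.from(job.returnvalue.png, 'base64'))
})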

Related

How should I go about using Redis for creating notifications with express/nodejs?

Okay so I have a Nodejs/Express app that has an endpoint which allows users to receive notifications by opening up a connection to said endpoint:
// list of all the streams opened by practitioner users to the backend
var practitionerStreams = []

async function notificationEventsHandler(req, res) {
  const headers = {
    'Content-Type': 'text/event-stream',
    'Connection': 'keep-alive',
    'Cache-Control': 'no-cache'
  }
  const practEmail = req.headers.practemail
  console.log("PRACT EMAIL", practEmail)
  const data = await ApptNotificationData.findAll({
    where: {
      practEmail: practEmail
    }
  })
  //console.log("DATA", data)
  res.writeHead(200, headers)
  res.write(`data:${JSON.stringify(data)}\n\n`)
  // create a new stream
  const newPractStream = {
    practEmail: practEmail,
    res
  }
  // add the new stream to list of streams
  practitionerStreams.push(newPractStream)
  req.on('close', () => {
    console.log(`${practEmail} Connection closed`);
    // remove only the stream belonging to the client that disconnected
    practitionerStreams = practitionerStreams.filter(pract => pract.practEmail !== practEmail);
  });
  return res
}
async function sendApptNotification(newNotification, practEmail) {
  // iterate through the array, find the stream that matches the practitioner email,
  // and write the new notification to that stream
  var updatedPractitionerStream = practitionerStreams.map((stream) => {
    if (stream["practEmail"] == practEmail) {
      console.log("IF")
      stream.res.write(`data:${JSON.stringify(newNotification)}\n\n`)
      return stream
    } else {
      // if it doesn't match the stream we want, leave it unchanged
      console.log("ELSE")
      return stream
    }
  })
  practitionerStreams = updatedPractitionerStream
}
Basically, when a user connects, the handler takes the response object (which stays open), stores it in an object along with that user's email, and writes to it later in sendApptNotification.
But obviously this is slow for a full app, so how exactly do I replace this with Redis? Would I still have a response object that I write to, or would that be replaced with a Redis stream that I can subscribe to on the frontend? I also assume I would store all my streams in Redis as well.
edit: from what examples I've seen people are writing events from redis to the response object
Thank you in advance
If you want to use Redis Streams as a notification system, you can follow this official guide:
https://redis.com/blog/how-to-create-notification-services-with-redis-websockets-and-vue-js/
To get this data in real time you need to create a WebSocket connection. I'd rather point you to the official guide than rewrite it here, because of the guide's quality: it explains clearly how to build the system, though you will of course need to adapt it to your own situation.
However, as I said in the comments, I believe it is simpler to expose an API endpoint such as /api/v1/notifications and poll it from your frontend code with setInterval, making a request every 5 seconds for example. If you prefer a real-time notification system, make sure you understand why you need it; you can always change your approach later. Basically, it's a trade-off you have to make.
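For example, a minimal polling sketch for the frontend could look like this (the endpoint path and the 5-second interval are just placeholders):

// Poll the notifications endpoint every 5 seconds (path and interval are placeholders)
setInterval(async () => {
  const response = await fetch('/api/v1/notifications');
  const notifications = await response.json();
  console.log(notifications); // render them in the UI here
}, 5000);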
For my example, imagine two tables in a relational database, one called Users and the other Notifications.
The tables of this example:
UsersTable
id  name
1   andrew
2   mark

NotificationTable
id  message   userId  isRead
1   message1  1       true
2   message2  1       false
3   message3  2       false
The endpoint in this example returns all cached notifications that haven't been read by the user. If the cache doesn't exist, it fetches the data from the database, puts it in the cache and returns it to the user; the next API call will then get the result from the cache. There are a few points left for you to complete in this example: the database query that fetches the notifications, the expiration time of the cache, and, importantly, keeping the cache fresh. If you want the cached notifications to be updated all the time, you need to create a middleware and trigger it in the parts of your code that create notifications for a user; in that case you only update the database and the cache. But I think you can complete these points.
const redis = require('redis');
const redisClient = redis.createClient();
// node-redis v4: the client must be connected before issuing commands
redisClient.connect();

app.get('/notifications', async (request, response) => {
  const userId = request.user.id;
  const cacheResult = await redisClient.get(`user:${userId}:notifications`);
  if (cacheResult) return response.send(JSON.parse(cacheResult));
  const notifications = await getUserNotificationsFromDatabase(userId);
  // cache the serialized result (an expiration, e.g. { EX: 300 }, can be passed as a third argument)
  await redisClient.set(`user:${userId}:notifications`, JSON.stringify(notifications));
  response.send(notifications);
})
Besides that, there is another way: you can simply use only Redis, or only the database, to manage these notifications. A relational database with the correct indexes will return the results as fast as you expect; you mainly need to think about how many notifications you will accumulate.

How to implement Heroku background processes in Node

I'm very new to Heroku and node so have a basic question just about how to implement background processes in a graphql server app I have hosted on Heroku.
I have a working graphql server written in Keystone CMS and hosted on Heroku.
In the database I have a schema called `Item` which basically just takes a URL from the user and then tries to scrape a Hero Image from that URL.
As the URL can be anything, I'm trying to use a headless browser via Playwright in order to get images
This is a memory-intensive process though, and Heroku is OOM'ing with R14 errors. For this they recommend migrating intensive work like this to a background job backed by Redis, implemented with Bull and Throng.
I've never used redis before nor these other libraries so I'm out of my element. I've looked at the Heroku implementation examples "server" and "worker" but haven't been able to translate those into a working implementation. To be honest I just don't understand the flow and design pattern I'm supposed to use with those even after reading the docs and examples.
Here is my code:
Relevant CMS schema where I call the getImageFromURL() function which is memory intensive
// Item.ts
import getImageFromURL from '../lib/imageFromURL'

export const Item = list({
  ...
  fields: {
    url: text({
      validation: { isRequired: false },
    }),
    imageURL: text({
      validation: { isRequired: false },
    }),
    ....
  },
  hooks: {
    resolveInput: async ({ resolvedData }) => {
      if (resolvedData.url) {
        const imageURL: string | undefined = await getImageFromURL(
          // pass the user-provided url to the image scraper
          resolvedData.url
        )
        if (imageURL) {
          return {
            ...resolvedData,
            // if we scraped successfully, return URL to image asset
            imageURL,
          }
        }
        return resolvedData
      }
      return resolvedData
    },
  },
})
Image scraping function getImageFromURL() (where I believe the background job needs to go?), filtered to relevant parts:
// imageFromURL.ts
// set up redis for processing
const Queue = require('bull')
const throng = require('throng')
const REDIS_URL = process.env.REDIS_URL || 'redis://127.0.0.1:6379'
let workers = 2

async function scrapeURL(urlString) {
  // ...
  // scrape images with playwright here
  // ...
  // return url to image asset here
}

// HERE IS WHERE I'M STUCK
// How do I do `scrapeURL` in a background process?
export default async function getImageFromURL(
  urlString: string
): Promise<string | undefined> {
  let workQueue = new Queue('scrape_and_uppload', REDIS_URL)
  // Something like this?
  // const imageURL = await scrapeURL(urlString) ??
  // Or this?
  // This fails with:
  // "TypeError: handler.bind is not a function"
  // but I'm just lost as to how this should even work
  // workQueue.process(2, scrapeURL(urlString))
  return Promise.resolve(imageURL)
}
Then when testing I call this with throng((url) => getImageFromURL(url), { workers }).
I have my local redis db running but I'm not even seeing any log spew when I run this so I don't think I'm even successfully connecting to redis?
Thanks in advance; let me know where I'm unclear or where I can add code examples.
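For context, the server/worker split from Heroku's bull and throng example looks roughly like the sketch below; it reuses the queue name and scrapeURL function above, and the enqueueScrape name is hypothetical. The key points are that workQueue.process takes a handler function rather than the result of calling one, and that the handler runs in a separate worker process while the web process only enqueues jobs.

// worker.js (run by the worker dyno) -- sketch only
const Queue = require('bull')
const throng = require('throng')
const REDIS_URL = process.env.REDIS_URL || 'redis://127.0.0.1:6379'

function start() {
  const workQueue = new Queue('scrape_and_uppload', REDIS_URL)
  // pass the handler function itself; Bull calls it once per job
  workQueue.process(2, async (job) => {
    const imageURL = await scrapeURL(job.data.urlString)
    return { imageURL }
  })
}
throng({ count: workers, worker: start })

// web process: only enqueue the work and return the job id
async function enqueueScrape(urlString) {
  const workQueue = new Queue('scrape_and_uppload', REDIS_URL)
  const job = await workQueue.add({ urlString })
  return job.id
}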

nodejs: create same value in a short period

I have a sample app where users can access some dynamic data via different URLs.
The workflow is like this: when a user requests get_data?id=1234567,
1. it first checks the DB to see whether there is already data for that id
2. if not, it generates a random value
3. if other users request the same URL within a short time (say 10 min), it returns the value that was already generated
4. if one of the users sends a clear request, the value is cleared from the DB.
The bug is: if 2 users request the same URL at the same time, then because the DB query takes time, both requests perform steps 1 and 2 concurrently and end up creating a different value for each user.
How can I make sure that, within a short period, the same value is always generated for all users?
Although Node.js is single-threaded and does not have to synchronize between multiple threads, its asynchronous event model can still require you to implement some kind of locking mechanism to synchronize concurrent async operations in certain situations (like yours).
There are a number of libraries that provide this functionality, e.g. async-mutex. Here's a very basic example of what your code could look like:
const express = require('express');
const app = express();
const Mutex = require('async-mutex').Mutex;
const locks = new Map();

app.get('/get_data', async (req, res) => {
  const queryId = req.query.id;
  if (!queryId) {
    // handle empty queryid ...
  }
  if (!locks.has(queryId)) {
    locks.set(queryId, new Mutex());
  }
  const lockRelease = await locks
    .get(queryId)
    .acquire();
  try {
    // do the rest of your logic here
  } catch (error) {
    // handle error
  } finally {
    // always release the lock
    lockRelease();
  }
});

app.listen(4000, function () {
  console.log("Server is running at port 4000");
});
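To make the skeleton concrete, the critical section could look something like the following; findValue and saveValue are hypothetical stand-ins for the actual DB calls:

  try {
    // under the lock: check the DB first, generate and store a value only if none exists
    let value = await findValue(queryId);            // hypothetical DB lookup
    if (value == null) {
      value = Math.random().toString(36).slice(2);   // generate the random value
      await saveValue(queryId, value);               // hypothetical DB insert (with a 10 min TTL)
    }
    res.send({ id: queryId, value });
  } finally {
    lockRelease();
  }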

Return early from long-running POST in Node/Express

I'm new to Node/Express. I have a long-running series of processes, for example: post to Express endpoint -> save data (can return now) -> handle data -> handle data -> handle data -> another process -> etc.
A typical POST:
app.post("/foo", (req, res) => {
// save data and return
return res.send("200");
// but now I want to do a lot more stuff...
});
If I omit the return then more processing will occur, but even though I'm a newbie to this stack, I can tell that's a bad idea.
All I want is to receive some data, save it and return. Then I want to start processing it, and call into other processes, which call into other processes, etc. I don't want the original POST to wait for all this to complete.
I need to do this in-process, so I can't save to a queue and process it separately afterwards.
Basically I want to DECOUPLE the receipt and processing of the data, in process.
What options are available using Node/Express?
I'd try something like this:
const express = require("express");
const port = 3000;
const app = express();
const uuid = require('uuid');
app.post("/foo", (req, res) => {
const requestId = uuid.v4();
// Send result. Set status to 202: The request has been accepted for processing, but the processing has not been completed. See https://tools.ietf.org/html/rfc7231#section-6.3.3.
res.status(202).json({ status: "Processing data..", requestId: requestId });
// Process request.
processRequest(requestId, request);
});
app.get("/fooStatus", (req, res) => {
// Check the status of the request.
let requestId = req.body.requestId;
});
function processRequest(requestId, request) {
/* Process request here, then perhaps save result to db. */
}
app.listen(port);
console.log(`Serving at http://localhost:${port}`);
Calling this with curl (for example):
curl -v -X POST http://localhost:3000/foo
Would give a response like:
{"status":"Processing data..","requestId":"abbf6a8e-675f-44c1-8cdd-82c500cbbb5e"}
There is absolutely nothing wrong with your approach of removing the return here and ending the request early, so long as you don't have any other code that tries to send data back later on.
I'd recommend returning status code 202 Accepted for these long-running scenarios though; it indicates to the consumer that the server has accepted the request but hasn't finished processing it.
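In its simplest form that can look like the sketch below: respond first, then keep working. saveData, handleData and moreProcessing are hypothetical stand-ins for the actual pipeline steps.

app.post("/foo", async (req, res) => {
  await saveData(req.body);   // hypothetical persistence step
  res.sendStatus(202);        // respond now; nothing may write to res after this
  // continue the pipeline after the response has been sent
  setImmediate(() => {
    handleData(req.body)
      .then(moreProcessing)
      .catch(err => console.error("background processing failed:", err));
  });
});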

Publishing and subscribing to node-redis for an image resizing job after form POST

I've got a form submission that accepts an image. I'm creating thumbnails (resizing and pushing to S3) with the image upon submission, and it's taking a while and blocking, so I've decided I want to push it to a message queue and have that handle it.
I've decided to go with node-redis, since I'm already using redis in my stack. What I'm unclear on is how exactly the implementation would look (in its most basic form).
Consider some pseudocode below as:
var redis = require('redis'),
  client = redis.createClient();

function listenForJob() {
  client.on('message', function(msg) {
    // msg is our temporary path name
    // Kick off resize and push to s3 job
  });
}

// Attached to my route where I POST to (e.g. /submit)
exports.form = function(req, res, next) {
  // Input comes in, and image comes in as req.files
  // and a temporary image is saved in /uploads
  // which I'll save as the image submission for now
  // until the process to resize and push to s3 happens.
  listenForJob();
  // Save to db
  var data = {
    title: req.body.title,
    img: req.files.path // save temp file
  }
  connection.query('INSERT INTO submissions SET ?', data, function(err, rows) {
    // Publish to our redis client?
    client.publish('message', req.files.path);
    connection.release();
    res.redirect('/submissions');
  });
};
Is this implementation even remotely the correct way to approach this? I'm new to taskworkers/message queues so I'm just wondering how to do implement it properly (given my use case).
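For comparison, a minimal sketch of how the pub/sub side usually looks with node-redis is shown below. Note that it uses two clients, because a connection that has called subscribe() cannot issue other commands, and the subscriber must explicitly subscribe to the channel before 'message' events arrive. The channel name 'resize-jobs' and the resizeAndUpload helper are placeholders; connection comes from the question's code.

var redis = require('redis');
var publisher = redis.createClient();   // used by the web process to publish work
var subscriber = redis.createClient();  // dedicated connection for the worker side

// Worker side: subscribe first, then handle messages as they arrive
subscriber.subscribe('resize-jobs');
subscriber.on('message', function (channel, tempPath) {
  // kick off the resize-and-push-to-S3 job for the uploaded file (hypothetical helper)
  resizeAndUpload(tempPath);
});

// Web side: after saving the submission, publish the temp path and respond immediately
exports.form = function (req, res) {
  var data = { title: req.body.title, img: req.files.path };
  connection.query('INSERT INTO submissions SET ?', data, function (err, rows) {
    publisher.publish('resize-jobs', req.files.path);
    connection.release();
    res.redirect('/submissions');
  });
};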

Resources