How to implement Heroku background processes in Node

I'm very new to Heroku and Node, so I have a basic question about how to implement background processes in a GraphQL server app I have hosted on Heroku.
I have a working graphql server written in Keystone CMS and hosted on Heroku.
In the database I have a schema called `Item` which basically just takes a URL from the user and then tries to scrape a Hero Image from that URL.
As the URL can be anything, I'm trying to use a headless browser via Playwright in order to get images.
This is a memory-intensive process, though, and Heroku is OOMing with R14 errors. For cases like this they recommend moving the intensive work to a background job backed by Redis, implemented with Bull and throng.
I've never used Redis or these other libraries before, so I'm out of my element. I've looked at the Heroku implementation examples "server" and "worker" but haven't been able to translate those into a working implementation. To be honest, I just don't understand the flow and design pattern I'm supposed to use with those, even after reading the docs and examples.
Here is my code:
Relevant CMS schema, where I call the memory-intensive getImageFromURL() function:
// Item.ts
import getImageFromURL from '../lib/imageFromURL'

export const Item = list({
  // ...
  fields: {
    url: text({
      validation: { isRequired: false },
    }),
    imageURL: text({
      validation: { isRequired: false },
    }),
    // ...
  },
  hooks: {
    resolveInput: async ({ resolvedData }) => {
      if (resolvedData.url) {
        const imageURL: string | undefined = await getImageFromURL(
          // pass the user-provided url to the image scraper
          resolvedData.url
        )
        if (imageURL) {
          return {
            ...resolvedData,
            // if we scraped successfully, return URL to image asset
            imageURL,
          }
        }
        return resolvedData
      }
      return resolvedData
    },
  },
})
Image-scraping function getImageFromURL() (where I believe the background job needs to go?), filtered to the relevant parts:
// imageFromURL.ts
// set up redis for processing
const Queue = require('bull')
const throng = require('throng')

const REDIS_URL = process.env.REDIS_URL || 'redis://127.0.0.1:6379'
let workers = 2

async function scrapeURL(urlString) {
  // ...
  // scrape images with playwright here
  // ...
  // return url to image asset here
}

// HERE IS WHERE I'M STUCK
// How do I run `scrapeURL` in a background process?
export default async function getImageFromURL(
  urlString: string
): Promise<string | undefined> {
  let workQueue = new Queue('scrape_and_upload', REDIS_URL)
  // Something like this?
  // const imageURL = await scrapeURL(urlString) ??
  // Or this?
  // This fails with:
  // "TypeError: handler.bind is not a function"
  // but I'm just lost as to how this should even work
  // workQueue.process(2, scrapeURL(urlString))
  return Promise.resolve(imageURL)
}
Then when testing I call this with throng((url) => getImageFromURL(url), { workers }).
I have my local Redis instance running, but I'm not seeing any log output when I run this, so I don't think I'm even successfully connecting to Redis.
Thanks in advance; let me know where I'm unclear or where I can add more code examples.
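For what it's worth, the "TypeError: handler.bind is not a function" comes from workQueue.process(2, scrapeURL(urlString)): process() expects to be handed the handler function itself, but that line passes it the promise returned by calling scrapeURL. The pattern Heroku's "server"/"worker" example implies is a split between a web process that only enqueues jobs and a worker process that registers the handler and does the heavy Playwright work. A minimal sketch of that split follows; the file and queue names are illustrative, not taken from the question or the Heroku example:

// lib/queue.ts -- shared between web and worker
import Queue from 'bull'

const REDIS_URL = process.env.REDIS_URL || 'redis://127.0.0.1:6379'
export const scrapeQueue = new Queue('scrape_and_upload', REDIS_URL)

// lib/imageFromURL.ts -- runs on the web dyno: enqueue, then wait
import { scrapeQueue } from './queue'

export default async function getImageFromURL(
  urlString: string
): Promise<string | undefined> {
  // add() writes the job to Redis; a worker dyno picks it up
  const job = await scrapeQueue.add({ url: urlString })
  // finished() resolves with whatever the worker's handler returns
  return job.finished()
}

// worker.ts -- started as its own dyno, e.g. `worker: node worker.js` in the Procfile
import throng from 'throng'
import { scrapeQueue } from './lib/queue'

function start() {
  // pass the handler function itself (not its result); Bull calls it once per job
  scrapeQueue.process(2, async (job) => {
    return scrapeURL(job.data.url) // the Playwright scraper from the question
  })
}

throng({ count: 2, worker: start })

One caveat: awaiting job.finished() inside the Keystone hook still ties the web request to the scrape's duration, so if a scrape can outlast Heroku's 30-second router timeout, the usual variant is to return immediately, store the job ID, and write imageURL back to the item from the worker once the job completes.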

Related

Next.js - Rebuild Only New Pages

I am trying to build a website where only 2 pages will be updated and 1 will be created roughly daily. In my case, that would imply getting the new page created via an API route in a Node.js backend; the updated content would also come from an API, but to update the Redux state.
All other pages would stay completely the same. The problem is that if I rebuild the whole site with Next.js, the build time would grow daily, and this is not a good option.
Is there a way to build only the differences / force some pages to stay the same?
Try the code below. For more, check this link: Incremental Static Regeneration
export async function getStaticProps() {
  const res = await fetch('https://.../newPost')
  const newPost = await res.json()
  return {
    props: {
      newPost,
    },
    // Next.js will attempt to re-generate the page:
    // - When a request comes in
    // - At most once every day
    revalidate: 86400, // seconds in a day
  }
}
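One caveat, since a page is also created daily: revalidate only refreshes pages that already exist. For the brand-new page, the dynamic route also needs getStaticPaths with fallback: 'blocking', so that a path unknown at build time is rendered on its first request and cached from then on. A sketch along the same lines (the fetch URL is a placeholder, as above):

export async function getStaticPaths() {
  const res = await fetch('https://.../posts')
  const posts = await res.json()
  return {
    paths: posts.map((post) => ({ params: { slug: post.slug } })),
    // an unknown slug is server-rendered on first request,
    // then served as a static page afterwards
    fallback: 'blocking',
  }
}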

How to cancel a task enqueued on Firebase Functions?

I'm talking about this: https://firebase.google.com/docs/functions/task-functions
I want to enqueue tasks with the scheduleTime parameter to run in the future, but I must be able to cancel those tasks.
I expected it would be possible to do something like this pseudo code:
const task = await queue.enqueue({ foo: true })
// Then...
await queue.cancel(task.id)
I'm using Node.js. In case it's not possible to cancel a scheduled task with firebase-admin, can I somehow work around it by using @google-cloud/tasks directly?
PS: I've also created a feature request: https://github.com/firebase/firebase-admin-node/issues/1753
The Firebase SDK doesn't currently return the task name/ID, as the code above expects.
If you need this functionality, I'd recommend filing a feature request and meanwhile using Cloud Tasks directly.
You can simply create an HTTP-triggered Cloud Function and then use the Cloud Tasks SDK to create HTTP target tasks that call this function, instead of using onDispatch.
// Instead of onDispatch()
export const handleQueueEvent = functions.https.onRequest((req, res) => {
// ...
});
Adding a Cloud Task:
const { CloudTasksClient } = require('@google-cloud/tasks');
const client = new CloudTasksClient();

async function createHttpTask() {
  const parent = client.queuePath(project, location, queue);
  const task = {
    httpRequest: {
      httpMethod: 'POST', // change method if required
      url, // Use URL of handleQueueEvent function here
    },
  };
  const request = {
    parent: parent,
    task: task,
  };
  const [response] = await client.createTask(request);
  // response.name is the fully qualified task ID;
  // keep it if you want to cancel the task later
  return response.name;
}
Check out the documentation for more information.
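Since you create the task yourself this way, cancelling becomes possible: a Cloud Tasks task is cancelled by deleting it by name before it fires. A minimal sketch, assuming client is the same CloudTasksClient instance and taskName is the value returned by createHttpTask() above (store it somewhere, e.g. Firestore, between requests; you can also set a scheduleTime on the task object when creating it, to run it in the future as the question describes):

// cancelling a scheduled task = deleting it before it runs
async function cancelHttpTask(taskName) {
  await client.deleteTask({ name: taskName });
}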

How can I properly create redirects from an array in Gatsby

I am working with Gatsby and WordPress. I am trying to redirect some URLs using the Gatsby redirect API. I write a query to get an object, then use the map method to create an array of the items we need from that object. I then run a forEach to get the individual data from that array, but it fails when running the development server.
What is the right way to do this?
const { createRedirect } = actions;
const yoastRedirects = graphql(`
  {
    wp {
      seo {
        redirects {
          format
          origin
          target
          type
        }
      }
    }
  }
`)
const redirectOriginUrls = yoastRedirects.wp.seo.redirects.map(redirect => redirect.origin)
const redirectTargetUrls = yoastRedirects.wp.seo.redirects.map(redirect => redirect.target)

redirectOriginUrls.forEach(redirectOriginUrl => (
  redirectTargetUrls.forEach(redirectTargetUrl => (
    createRedirect({
      fromPath: `/${redirectOriginUrl}`,
      toPath: `/${redirectTargetUrl}`,
      isPermanent: true
    })
  ))
))
The createRedirect API needs to receive a structure like:
exports.createPages = ({ graphql, actions }) => {
  const { createRedirect } = actions
  createRedirect({ fromPath: '/old-url', toPath: '/new-url', isPermanent: true })
  createRedirect({ fromPath: '/url', toPath: '/zn-CH/url', Language: 'zn' })
  createRedirect({ fromPath: '/not_so-pretty_url', toPath: '/pretty/url', statusCode: 200 })
  // Create pages
}
In your case, you are not accessing the fetched data correctly. Assuming the loops themselves are properly done, you must do:
let redirectOriginUrls = [];
let redirectTargetUrls = [];

yoastRedirects.data.wp.seo.redirects.map(redirect => {
  return redirectOriginUrls.push(redirect.origin)
});

yoastRedirects.data.wp.seo.redirects.map(redirect => {
  return redirectTargetUrls.push(redirect.target)
})
Instead of:
const redirectOriginUrls = yoastRedirects.wp.seo.redirects.map(redirect => redirect.origin)
const redirectTargetUrls = yoastRedirects.wp.seo.redirects.map(redirect => redirect.target)
Notice the .data addition in the nested object.
In addition, keep in mind that the createRedirect API will only work when there is hosting infrastructure behind it, such as AWS or Netlify; both have plugin integrations with Gatsby. This will generate meta-redirect HTML files for redirecting on any static file host.
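As an aside, the two parallel map calls plus the nested forEach in the question pair every origin with every target, creating N×N redirects. Since origin and target live on the same redirect object, a single pass is enough. A sketch under that assumption, using the same Yoast query (note that graphql returns a promise, so it must be awaited inside createPages):

exports.createPages = async ({ graphql, actions }) => {
  const { createRedirect } = actions

  const yoastRedirects = await graphql(`
    {
      wp {
        seo {
          redirects {
            format
            origin
            target
            type
          }
        }
      }
    }
  `)

  // one redirect per entry keeps each origin with its own target
  yoastRedirects.data.wp.seo.redirects.forEach(({ origin, target }) => {
    createRedirect({
      fromPath: `/${origin}`,
      toPath: `/${target}`,
      isPermanent: true,
    })
  })
}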

How to return a generated image with Bull.js queue?

My use case is this: I want to create screenshots of parts of a page. For technical reasons, it cannot be done on the client-side (see related question below) but needs puppeteer on the server.
As I'm running this on Heroku, I have the additional restriction of a quite small timeout window. Heroku recommends therefore to implement a queueing system based on bull.js and use worker processes for longer-running tasks as explained here.
I have two endpoints (implemented with Express), one that receives a POST request with some configuration JSON, and another one that responds to GET when provided with a job identifier (slightly modified for brevity):
This adds the job to the queue:
router.post('/', async function(req, res, next) {
  let job = await workQueue.add(req.body.chartConfig)
  res.json({ id: job.id })
})
This returns info about the job:
router.get('/:id', async (req, res) => {
  let id = req.params.id;
  let job = await workQueue.getJob(id);
  let state = await job.getState();
  let progress = job._progress;
  let reason = job.failedReason;
  res.json({ id, state, progress, reason });
})
In a different file:
const start = () => {
  let workQueue = new queue('work', REDIS_URL);
  workQueue.process(maxJobsPerWorker, getPNG)
}

const getPNG = async (job) => {
  const { url, width, height, chart: chartConfig, chartUrl } = job.data
  // ... snipped for brevity
  const png = await page.screenshot({
    type: 'png',
    fullPage: true
  })
  await page.close()
  job.progress(100)
  return Promise.resolve({ png })
}

// ...
throng({ count: workers, worker: start })

module.exports.getPNG = getPNG
The throng invocation at the end spawns worker processes that each run the start function; start in turn registers getPNG as the handler to be called when a job is picked from the queue.
My question now is: how do I get the generated image (png)? I guess ideally I'd like to be able to call the GET endpoint above which would return the image, but I don't know how to pass the image object.
As a more complex fall-back solution I could imagine posting the image to an image hosting service like imgur, and then returning the URL upon request of the GET endpoint. But I'd prefer, if possible, to keep things simple.
This question is a follow-up from this one:
Issue with browser-side conversion SVG -> PNG
I've opened a ticket on the GitHub repository of the bull project. The developers said that the preferred practice is to store the binary object somewhere else, and to add only the link metadata to the job's data store.
However, they also said that the storage limit of a job object appears to be 512 MB. So it is also quite possible to store an image of reasonable size as a base64-encoded string.
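Putting that together, a sketch of the simpler variant: the processor returns the screenshot base64-encoded, and a second GET endpoint serves it once the job has completed. The /:id/png route and the 202 response are illustrative choices, not part of the question's code:

// worker: return the PNG as base64 so it fits in the job's return value
const getPNG = async (job) => {
  // ... puppeteer work as before ...
  const png = await page.screenshot({ type: 'png', fullPage: true })
  await page.close()
  job.progress(100)
  return { png: png.toString('base64') }
}

// web: serve the image once the job is done
router.get('/:id/png', async (req, res) => {
  const job = await workQueue.getJob(req.params.id)
  const state = await job.getState()
  if (state !== 'completed') {
    return res.status(202).json({ id: job.id, state })
  }
  // returnvalue holds whatever the processor returned
  res.type('png').send(Buffer.from(job.returnvalue.png, 'base64'))
})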

How can one upload an image to a KeystoneJS GraphQL endpoint?

I'm using TinyMCE in a custom field for the KeystoneJS AdminUI, which is a React app. I'd like to upload images from the React frontend to the KeystoneJS GraphQL backend. I can upload the images using a REST endpoint I added to the Keystone server -- passing TinyMCE an images_upload_handler callback -- but I'd like to take advantage of Keystone's already-built GraphQL endpoint for an Image list/type I've created.
I first tried to use the approach detailed in this article, using axios to upload the image
const getGQL = (theFile) => {
  const query = gql`
    mutation upload($file: Upload!) {
      createImage(file: $file) {
        id
        file {
          path
          filename
        }
      }
    }
  `;
  // The operation contains the mutation itself as "query"
  // and the variables that are associated with the arguments
  // The file variable is null because we can only pass text
  // in operation variables
  const operation = {
    query,
    variables: {
      file: null
    }
  };
  // This map is used to associate the file saved in the body
  // of the request under "0" with the operation variable "variables.file"
  const map = {
    '0': ['variables.file']
  };
  // This is the body of the request
  // the FormData constructor builds a multipart/form-data request body
  // Here we add the operation, map, and file to upload
  const body = new FormData();
  body.append('operations', JSON.stringify(operation));
  body.append('map', JSON.stringify(map));
  body.append('0', theFile);
  // Create the options of our POST request
  const opts = {
    method: 'post',
    url: 'http://localhost:4545/admin/api',
    body
  };
  // @ts-ignore
  return axios(opts);
};
but I'm not sure what to pass as theFile. TinyMCE's images_upload_handler, from which I need to call the image upload, accepts a blobInfo object which contains functions to give me the file name, the blob itself, and a base64 representation.
Neither the file name nor the blob works; both give me server errors (500), and the error message isn't more specific.
I would prefer to use a GraphQL client to upload the image -- another SO article suggests using apollo-upload-client. However, I'm operating within the KeystoneJS environment, and Apollo-upload-client says
Apollo Client can only have 1 “terminating” Apollo Link that sends the
GraphQL requests; if one such as apollo-link-http is already setup,
remove it.
I believe Keystone has already set up Apollo-link-http (it comes up multiple times on search), so I don't think I can use Apollo-upload-client.
The UploadLink is just a drop-in replacement for HttpLink. There's no reason you shouldn't be able to use it. There's a demo KeystoneJS app here that shows the Apollo Client configuration, including using createUploadLink.
Actual usage of the mutation with the Upload scalar is shown here.
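For reference, a minimal client configuration along those lines (a sketch; the /admin/api URI is taken from the axios attempt above):

import { ApolloClient, InMemoryCache } from '@apollo/client'
import { createUploadLink } from 'apollo-upload-client'

// createUploadLink is the terminating link in place of HttpLink and
// handles the multipart/form-data encoding for Upload variables
const apolloClient = new ApolloClient({
  link: createUploadLink({ uri: 'http://localhost:4545/admin/api' }),
  cache: new InMemoryCache(),
})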
Looking at the source code, you should be able to use a custom image handler and call blob on the provided blobInfo object. Something like this:
tinymce.init({
  images_upload_handler: async function (blobInfo, success, failure) {
    const image = blobInfo.blob()
    try {
      // mutate() takes a single options object
      await apolloClient.mutate({
        mutation: gql` mutation($image: Upload!) { ... } `,
        variables: { image }
      })
      success()
    } catch (e) {
      failure(e)
    }
  }
})
I used to have the same problem and solved it with the Apollo upload link. When the app got to the production phase, I realized that Apollo Client took a third of the gzipped built files, so I created a minimal GraphQL client just for Keystone use, with automatic image upload. The package is available on npm: https://www.npmjs.com/package/@sylchi/keystone-graphql-client
Usage example that will upload the GitHub logo to a user's profile, if there is a user with an avatar field set as a file:
import { mutate } from '@sylchi/keystone-graphql-client'

const getFile = () => fetch('https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png',
  {
    mode: "cors",
    cache: "no-cache"
  })
  .then(response => response.blob())
  .then(blob => {
    return new File([blob], "file.png", { type: "image/png" })
  });

getFile().then(file => {
  const options = {
    mutation: `
      mutation($id: ID!, $data: UserUpdateInput!){
        updateUser(id: $id, data: $data){
          id
        }
      }
    `,
    variables: {
      id: "5f5a7f712a64d9db72b30602", //replace with user id
      data: {
        avatar: file
      }
    }
  }
  mutate(options).then(result => console.log(result));
});
The whole package is just 50 LOC with 1 dependency :)
The easiest way for me was to use graphql-request. The advantage is that you don't need to manually set any header prop, and it uses the variables you need from the images_upload_handler, as the docs describe.
I did it this way:
const { request, gql } = require('graphql-request')

const query = gql`
  mutation IMAGE($file: Upload!) {
    createImage(data: {
      file: $file,
    }) {
      id
      file {
        publicUrl
      }
    }
  }
`
images_upload_handler = (blobInfo, success) => {
  //                       ^ variables you get from tinymce
  const variables = {
    file: blobInfo.blob()
  }
  request(GRAPHQL_API_URL, query, variables)
    .then(data => {
      console.log(data)
      success(data.createImage.file.publicUrl)
    })
}
For Keystone 5, editorConfig would strip out functions, so I cloned the field and set the function in the views/Field.js file.
Good luck ( ^_^)/*
