Have a question for those with more experience with worker threads here..
I have been doing some testing with worker threads and have a question on how they work, or maybe how they should work.
I am using a worker thread pool called piscina, this has been setup and appears to be working. 'Appears' to is the key..
Here is my scenario. I have a 'workers.js' file that has a 'longer' run script ( this is for testing and purposefully a long loop)
When running it, it does what it should, the main EL is still open to process other tasks etc, however what I have noticed is that there only appears to be 1 worker.
What I mean by that, is subsequent request to that route seem to get queued up, so the request don't run in parallel, instead, waits until the first worker finished, then goes on to the next.
What I would like to have happen, is that each request fires off a new 'worker' (to a point, then place them in the que) right now we have the app running in containers, so if the CPU or Mem gets too high, it should push traffic to the other container etc.
Now with that being said, we would still like to have the workers spawn out with each request as to not bottleneck the inbound request to that 'page'.
Again, maybe I'm not understanding this properly but anyone with more experience with workers that would help me out would be greatly appreciated.
*** edit with code ***
Route:
*Route contains imports of piscina library
router.get('/:error?', auth("3","edit"),function(req,res){
console.log('running')
let piscina = new Piscina({
filename: path.resolve(__dirname, 'worker.js'),
minThreads:5
});
const result = piscina.run({accountID:req.session.AccountID,cID:req.session.cID,cCode:req.session.cCode}).then(data =>{
console.log(date)
})
});
Worker file
module.exports = async({accountID,cID,cCode}) => {
const n = 10000000000;
for (let i = 1; i <= n; i++) {
}
return 'finished';
})
After running the loop, it simply return back the 'finished' string, as noted that works, however if I hit the page in multiple tabs they do not all finish at the same time, instead, lets say tab 1 takes 7 seconds to finish, tab 2 and 3 tak 14, then 21 etc.
Note: The variables we are passing to the worker have no use RN, its just a loop that we run for testing to verify the flow
Related
I am very new to NodeJS and trying to develop an application which acts as a scheduler that tries to fetch data from ELK and sends the processed data to another ELK. I am able to achieve the expected behaviour but after completing all the processes, scheduler job does not exists and wait for another scheduler job to come up.
Note: This scheduler runs every 3 minutes.
job.js
const self = module.exports = {
async schedule() {
if (process.env.SCHEDULER == "MinuteFrequency") {
var timenow = moment().seconds(0).milliseconds(0).valueOf();
var endtime = timenow - 60000;
var starttime = endtime - 60000 * 3;
//sendData is an async method
reports.sendData(starttime, endtime, "SCHEDULER");
}
}
}
I tried various solutions such Promise.allSettled(....., Promise.resolve(true), etc, but not able to fix this.
As per my requirement, I want the scheduler to complete and process and exit so that I can save some resources as I am planning to deploy the application using Kubernetes cronjobs.
When all your work is done, you can call process.exit() to cause your application to exit.
In this particular code, you may need to know when reports.sendData() is actually done before exiting. We would have to know what that code is and/or see the code to know how to know when it is done. Just because it's an async function doesn't mean it's written properly to return a promise that resolves when it's done. If you want further help, show us the code for sendData() and any code that it calls too.
This an Angular app, and this specific code is inside a webworker in Typescript. I'm still new to webworkers, but both the sleep and the loop execute inside the same thread.
The intent is to poll a service and exit the loop when a the process is completed. My problem is the sleep call below is not sleeping. I need it to delay for at least 9 seconds, and ideally I'd like it to be configurable. But it runs as though the sleep didn't run.
I have two questions:
Why is the sleep not working?
This is an Angular app served by a NodeJS container on cirrus. When the Angular app is requested and served by this NodeJS server, I'd like to pass a secret defined at the NodeJS server along with the Angular app. This secret would be the polling delay. Would a cookie value be returned in the NodeJS Angular App response? Not sure what the best response would be.
Code below:
function sleep(ms: number) {
return new Promise((resolve) => {
log('DEBUG','Sleeping for ' + ms + ' ms');
setTimeout(resolve, ms);
});
}
while (jobIsStillRunning(jobExecutionResult)) {
postMessageWithLog(jobExecutionResult);
log('DEBUG','sendFilePolling() jobExecutionResult=' + JSON.stringify(jobExecutionResult));
sleep(30000); // Sleep function runs but doesn't do anything.
jobExecutionResult = await getAsyncFilesResult(jobExecutionResult);
}
You need to await on sleep function
I think you need to know macro and micro tasks.
Microtasks come solely from our code. They are usually created by promises: an execution of .then/catch/finally handler becomes a microtask
If a microtask recursively queues other microtasks, it might take a long time until the next macrotask is processed. This means, you could end up with a blocked UI, or some finished I/O idling in your application.
example:
macrotasks: setTimeout, setInterval, setImmediate, I/O, UI rendering, etc/
microtasks: process.nextTick, Promises, etc.
code:
console.log('1')
setTimeout(()=>{
console.log('2')
},0)
Promise.resolve().then(()=>{
console.log('3')
})
console.log('4')
output:
1
4
3
2
wait what?!
explain:
console.log('1') it's a normal code and immediately run.
setTimeout it's a macrotask then it will add to macro queue
Promise.resolve it's a microtask then it will add to micro queue
console.log('4') it a normal code and immediately run.
now when normal codes are end and now microtaskqueue and macrotaskqueue need to be dequeue.
inside microtaskqueue we have one task and it's Promise.resolve then immediately run .then function
now microtaskqueue is clear too, inside macrotaskqueue we have one task and it's setTimeout then immediately run timeoutCallback function
if you want to use sleep function without await you need to handle this.
but you can add await before your sleep function and wait on it
Guys Heroku is terminating the req if the response takes more then 30sec to return, so is there any way I can wait for as long as the response would come back?
Well the user is uploading his file and I need to do something with the file in my server and after updates are done I will give a download link to the user. But mostly it takes more then 30 sec for the server to process the file so that the user need to wait for response
From the official Heroku Helpcenter : https://devcenter.heroku.com/articles/request-timeout
The timeout value is not configurable. If your server requires longer than 30 seconds to complete a given request, we recommend moving that work to a background task or worker to periodically ping your server to see if the processing request has been finished. This pattern frees your web processes up to do more work, and decreases overall application response times.
The short answer is : No, you can't change this configuration. I suggest you investigate why your application needs more than 30 seconds to process that request. If it takes longer than 10 seconds your really should consider the steps suggested in the Heroku Help Center 👆
Your Problem
You mention you need this for file processing. I understand that file processing could easily take longer than 30 seconds. Normally what I would do is to just create some sort of task reference and keep it in a database along with a status ("processing", "finished", "failed") - also store the original file and then just end the request of the user. This shouldn't take long. Then process the task ... with another endpoint or websocket connection the user could check if the task has been fullfilled.
Use a Task Queue
The following is just a basic interpretation of a solution - it's not meant for copy & pasting as it depends on so many things.
Routes (Endpoints)
Basically you need to have 3 routes in your backend. One for uploading the file, one for downloading the processed file and one for checking the status of the task.
1. Upload
app.post('/files', /* some middleware e.g. multer */, async (req, res) => {
// This is your upload controller
// I assume at this point the file has been uploaded and
// req.file contains a reference to the uploaded file.
// create new process task and add to queue
const task = await createNewTask(req.file);
queue.push(task);
// now a task has been created, but the user
// doesn't need to wait for it to finish
// so let's end the request here.
return req.status(200).json(task);
});
2. Check Status
app.get('/task/:id', async (req, res) => {
// From uploading a file in the first step, you'll
// get back a task id. Use the task id to check on
// the status.
const task = await getTask(req.params.id);
if (!task) {
return res.status(404).end();
} else {
return res.status(200).json(task);
}
});
The task can include informations like status, progress percentage, original filename, new filename or even a download link to the processed file once it's finished. Status could be something like pending, processing, finished or failed.
3. Download
app.get('/file/:filename', (req, res) => {
return req.status(200)
.sendFile('./path/to/file/' + req.params.filename);
});
Notes
It might be a good idea to rename the incoming files with a random id like a uuid. So it's easier to work with them in the automation process. Also the random id could be used for the task id at the same time.
It's up to you how big you want to go with this. For the task queue there are many different libraries to help you out with it. It could be an in-memory queue or one that's backed with a database.
EDIT
I have noticed the removal of the .end() function appears to solve the issue, but after reading the Nightmare docs on the use of .end() it says: Completes any queue operations, disconnect and close the electron process.
Now while this does solve the problem, am I now just opening more and more electron processes each time the route is called, which will eventually cause the server to run out of memory, or is this a safe way to fix the issue?
ORIGINAL TEXT
Please consider the following problem:
I am developing a Node based service that will allow the user to request screenshot of a particular URL.
For this I am using Nightmare to visit the URL, wait 2 seconds, take a screenshot, which is saved to the disk, convert it to base64, delete the image and then return the base64 string.
console.log('Nightmare starts');
nightmare
.goto(url)
.wait(2000)
.screenshot(filename)
.end()
.then(function (result)
{
fs.exists(filename, function(exists)
{
if (exists)
{
data = fs.readFileSync(filename);
var base64 = data.toString('base64')
fs.unlink(filename);
var output = {'message':'success','map_image':base64};
res.send(output);
}
});
})
.catch(function (error)
{
console.error('Search failed:', error);
});
console.log("Nightmare Finished");
The above code works just fine, the first time it runs. However any subsequent calls to this just consoles "Nightmare starts" and "Nightmare Finished" instantly with the actual code in-between not running. I don't appear to have any errors display, nothing is caught if I wrap it in a try/catch. The node requires a reboot to allow it to happen again.
Something worth noting is that I am running on a headless ubuntu machine, as electron (one of the nightmare dependencies) appears to need a GUI, I am using xvfb to launch the node using the following command:
xvfb-run --auto-servernum --server-num=1 node server.js
I'm assuming this may be an issue with some resource not being released correctly on the first run, but any assistance would be appreciated.
Also open to any constructive criticism of my code, very new to Node and i'm sure i'm not writing in the most optimal way (sync file loading etc)
It appears that you are simply misplacing where you are creating the nightmare instances. Cannot help much without some more code snippet and information.
Way 1
Create nightmare instance every time and close them after you are done with your task. It will require some time to boot up the instance, but it will also lessen the memory load. Not to mention you can have multiple nightmare instances for different users.
Way 2
Don't end and re-use same nightmare instance. Have multiple nightmare instances and queue the call for screenshot. The websites will load fast and it won't take time to boot up an instance, but you will have longer wait time for longer queue.
I'm new in Node JS and i wonder if under mentioned snippets of code has multisession problem.
Consider I have Node JS server (express) and I listen on some POST request:
app.post('/sync/:method', onPostRequest);
var onPostRequest = function(req,res){
// parse request and fetch email list
var emails = [....]; // pseudocode
doJob(emails);
res.status(200).end('OK');
}
function doJob(_emails){
try {
emailsFromFile = fs.readFileSync(FILE_PATH, "utf8") || {};
if(_.isString(oldEmails)){
emailsFromFile = JSON.parse(emailsFromFile);
}
_emails.forEach(function(_email){
if( !emailsFromFile[_email] ){
emailsFromFile[_email] = 0;
}
else{
emailsFromFile[_email] += 1;
}
});
// write object back
fs.writeFileSync(FILE_PATH, JSON.stringify(emailsFromFile));
} catch (e) {
console.error(e);
};
}
So doJob method receives _emails list and I update (counter +1) these emails from object emailsFromFile loaded from file.
Consider I got 2 requests at the same time and it triggers doJob twice. I afraid that when one request loaded emailsFromFile from file, the second request might change file content.
Can anybody spread the light on this issue?
Because the code in the doJob() function is all synchronous, there is no risk of multiple requests causing a concurrency problem.
If you were using async IO in that function, then there would be possible concurrency issues.
To explain, Javascript in node.js is single threaded. So, there is only one thread of Javascript execution running at a time and that thread of execution runs until it returns back to the event loop. So, any sequence of entirely synchronous code like you have in doJob() will run to completion without interruption.
If, on the other hand, you use any asynchronous operations such as fs.readFile() instead of fs.readFileSync(), then that thread of execution will return back to the event loop at the point you call fs.readFileSync() and another request can be run while it is reading the file. If that were the case, then you could end up with two requests conflicting over the same file. In that case, you would have to implement some form of concurrency protection (some sort of flag or queue). This is the type of thing that databases offer lots of features for.
I have a node.js app running on a Raspberry Pi that uses lots of async file I/O and I can have conflicts with that code from multiple requests. I solved it by setting a flag anytime I'm writing to a specific file and any other requests that want to write to that file first check that flag and if it is set, those requests going into my own queue are then served when the prior request finishes its write operation. There are many other ways to solve that too. If this happens in a lot of places, then it's probably worth just getting a database that offers features for this type of write contention.