REST Endpoints moving to pending state in warp - rust

I have a warp server that serves 3 routes: one is a POST while the other two are GETs. I have my server set up in the following manner:
let serve = warp::serve(
    routes::index_route()
        .or(post_routes(data))
        .or(get_routes())
        .recover(move |error: Rejection| error::handle_rejection(error))
        .with(warp::cors().allow_any_origin()),
);
serve.run(([0, 0, 0, 0], 8080)).await;
The issue is that my POST route is a bit heavy on the computation side and each call takes a while to complete. While the POST route is executing its instructions, the GET routes sit in a pending state and I am not able to query them until the POST finishes. Is there a way to prevent this from happening?
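A common way out (not from the original post) is to push the CPU-heavy work onto tokio's dedicated blocking thread pool with tokio::task::spawn_blocking, so the async worker threads stay free to serve the GET routes. A minimal sketch, assuming a tokio runtime, with expensive_computation() standing in for the real work:

use std::convert::Infallible;

// Stand-in for the real CPU-heavy work.
fn expensive_computation() -> u64 {
    (0..1_000_000u64).sum()
}

// Offloads the heavy work so the runtime's async workers keep serving
// the GET routes while this request computes.
async fn heavy_handler() -> Result<impl warp::Reply, Infallible> {
    let result = tokio::task::spawn_blocking(expensive_computation)
        .await
        .expect("blocking task panicked");
    Ok(warp::reply::json(&result))
}

// Wired up roughly as: warp::post().and(warp::path("work")).and_then(heavy_handler)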

Related

NodeJS worker thread pools

I have a question for those with more experience with worker threads.
I have been doing some testing with worker threads and have a question about how they work, or maybe how they should work.
I am using a worker thread pool called piscina; it has been set up and appears to be working. 'Appears' is the key word.
Here is my scenario. I have a 'worker.js' file with a longer-running script (this is for testing and purposefully a long loop).
When running it, it does what it should: the main event loop is still open to process other tasks, etc. However, what I have noticed is that there only appears to be 1 worker.
What I mean by that is subsequent requests to that route seem to get queued up, so the requests don't run in parallel; instead, each waits until the first worker finishes, then goes on to the next.
What I would like to have happen is that each request fires off a new worker (up to a point, then place them in the queue). Right now we have the app running in containers, so if the CPU or memory gets too high, it should push traffic to the other container, etc.
With that being said, we would still like the workers to spawn with each request so as not to bottleneck the inbound requests to that page.
Again, maybe I'm not understanding this properly, but help from anyone with more experience with workers would be greatly appreciated.
*** edit with code ***
Route:
*The route file imports the piscina library
router.get('/:error?', auth("3", "edit"), function (req, res) {
  console.log('running');
  const piscina = new Piscina({
    filename: path.resolve(__dirname, 'worker.js'),
    minThreads: 5,
  });
  piscina.run({
    accountID: req.session.AccountID,
    cID: req.session.cID,
    cCode: req.session.cCode,
  }).then((data) => {
    console.log(data);
  });
});
Worker file
module.exports = async ({ accountID, cID, cCode }) => {
  const n = 10000000000;
  // busy loop purely for testing
  for (let i = 1; i <= n; i++) {
  }
  return 'finished';
};
After running the loop, it simply returns the 'finished' string; as noted, that works. However, if I hit the page in multiple tabs they do not all finish at the same time: say tab 1 takes 7 seconds to finish, tabs 2 and 3 take 14, then 21, etc.
Note: the variables we are passing to the worker have no use right now; it's just a loop that we run for testing to verify the flow.
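For comparison, the pattern in piscina's documentation creates the pool once at module load rather than inside the route handler, so every request shares the same set of threads and piscina can dispatch tasks in parallel up to the pool's limits. A rough sketch reusing the names from the post (untested):

const path = require('path');
const Piscina = require('piscina');

// Create the pool once when the module loads; all requests share it.
const piscina = new Piscina({
  filename: path.resolve(__dirname, 'worker.js'),
  minThreads: 5,
});

router.get('/:error?', auth("3", "edit"), async (req, res) => {
  // Each request submits a task; piscina dispatches it to a free worker
  // thread, spawning new ones up to maxThreads before queueing.
  const data = await piscina.run({
    accountID: req.session.AccountID,
    cID: req.session.cID,
    cCode: req.session.cCode,
  });
  res.send(data); // the worker's 'finished' string
});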

How to have more than a 30 sec response timeout in Heroku

Heroku is terminating the request if the response takes more than 30 seconds to return, so is there any way I can wait for as long as it takes for the response to come back?
The user is uploading a file, I need to do something with the file on my server, and once the updates are done I will give a download link to the user. But it usually takes more than 30 seconds for the server to process the file, so the user needs to wait that long for a response.
From the official Heroku Help Center: https://devcenter.heroku.com/articles/request-timeout
The timeout value is not configurable. If your server requires longer than 30 seconds to complete a given request, we recommend moving that work to a background task or worker to periodically ping your server to see if the processing request has been finished. This pattern frees your web processes up to do more work, and decreases overall application response times.
The short answer is: no, you can't change this configuration. I suggest you investigate why your application needs more than 30 seconds to process that request. If it takes longer than 10 seconds you really should consider the steps suggested in the Heroku Help Center 👆
Your Problem
You mention you need this for file processing. I understand that file processing can easily take longer than 30 seconds. Normally what I would do is create some sort of task reference and keep it in a database along with a status ("processing", "finished", "failed"), also store the original file, and then just end the user's request; this shouldn't take long. Then process the task in the background. With another endpoint or a websocket connection the user can check whether the task has been fulfilled.
Use a Task Queue
The following is just a basic interpretation of a solution - it's not meant for copy & pasting as it depends on so many things.
Routes (Endpoints)
Basically you need to have 3 routes in your backend. One for uploading the file, one for downloading the processed file and one for checking the status of the task.
1. Upload
app.post('/files', /* some middleware e.g. multer */ async (req, res) => {
  // This is your upload controller.
  // I assume at this point the file has been uploaded and
  // req.file contains a reference to the uploaded file.

  // Create a new processing task and add it to the queue.
  const task = await createNewTask(req.file);
  queue.push(task);

  // Now a task has been created, but the user
  // doesn't need to wait for it to finish,
  // so let's end the request here.
  return res.status(200).json(task);
});
2. Check Status
app.get('/task/:id', async (req, res) => {
  // From uploading a file in the first step, you'll
  // get back a task id. Use the task id to check on
  // the status.
  const task = await getTask(req.params.id);
  if (!task) {
    return res.status(404).end();
  } else {
    return res.status(200).json(task);
  }
});
The task can include information like the status, a progress percentage, the original filename, the new filename, or even a download link to the processed file once it's finished. Status could be something like pending, processing, finished or failed.
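For illustration, such a task record might look something like this (field names are hypothetical):

const task = {
  id: 'd9428888-122b-11e1-b85c-61cd3cbb3210', // e.g. a uuid, doubling as the filename
  status: 'processing', // pending | processing | finished | failed
  progress: 42, // percent
  originalFilename: 'report.pdf',
  downloadLink: null, // filled in once status is 'finished'
};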
3. Download
app.get('/file/:filename', (req, res) => {
  // res.sendFile needs an absolute path
  return res.status(200)
    .sendFile(path.resolve('./path/to/file/' + req.params.filename));
});
Notes
It might be a good idea to rename the incoming files with a random id like a uuid, so it's easier to work with them in the automated processing. The random id could also serve as the task id at the same time.
It's up to you how big you want to go with this. For the task queue there are many different libraries to help you out with it. It could be an in-memory queue or one that's backed with a database.
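To make the background side concrete, here is a minimal sketch of the in-memory variant; processFile() and updateTask() are hypothetical helpers in the spirit of the createNewTask()/getTask() used above:

const queue = [];

async function workQueue() {
  while (true) {
    const task = queue.shift();
    if (!task) {
      // Queue is empty: idle briefly before checking again.
      await new Promise((resolve) => setTimeout(resolve, 1000));
      continue;
    }
    await updateTask(task.id, { status: 'processing' });
    try {
      await processFile(task); // the long-running work, free of the 30s limit
      await updateTask(task.id, { status: 'finished' });
    } catch (err) {
      await updateTask(task.id, { status: 'failed' });
    }
  }
}

workQueue(); // start the background loop when the process boots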

Delaying execution of multiple HTTP requests in Google Cloud Function

I've implemented a web scraper with Nodejs, cheerio and request-promise that scrapes an endpoint (a basic html page) and returns certain information. The content of the page I'm crawling differs based on a parameter at the end of the url (http://some-url.com?value=12345 where 12345 is my dynamic value).
I need this crawler to run every x minutes and crawl multiple pages, and to do that I've set up a cronjob using Google Cloud Scheduler. (I'm fetching the dynamic values I need from Firebase.)
There could be more than 50 different values for which I'd need to crawl the specific page, but I would like to ease the load of the requests I'm sending so the server doesn't choke. To accomplish this, I've tried to add a delay:
1) using setTimeout
2) using setInterval
3) using a custom sleep implementation:
const sleep = require('util').promisify(setTimeout);
All 3 of these methods work locally; all of the requests are made with a y-second delay as intended.
But when run on Firebase Cloud Functions with Google Cloud Scheduler:
1) not all of the requests are sent
2) the delay is NOT consistent (some requests fire with the proper delay, then there are no requests made for a while, and other requests are sent with a major delay)
I've tried many things but I wasn't able to solve this problem.
I was wondering if anyone could suggest a different theoretical approach, or a certain library etc. I could use for this scenario, since the one I have now doesn't seem to work as I intended. I'm adding one of the approaches that works locally below.
Cheers!
courseDataRefArray.forEach(async (dataRefObject: CourseDataRef, index: number) => {
  console.log(`Foreach index = ${index} -- Hello StackOverflow`);
  setTimeout(async () => {
    console.log(`Index in setTimeout = ${index} -- Hello StackOverflow`);
    await CourseUtil.initiateJobForCourse(dataRefObject.ref, dataRefObject.data);
  }, 2000 * index);
});
(Note: I can provide more code samples if necessary, but it's mostly following a loop & async/await & setTimeout pattern, and since it works locally I'm assuming that's not the main problem.)
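One avenue worth exploring: Cloud Functions only guarantees execution while the promise returned from the function is pending, and the forEach above returns immediately, leaving the setTimeout callbacks to fire on an instance that may already be throttled or frozen; that would explain the missing and erratically delayed requests. A variant that keeps all the work inside the awaited chain (a sketch reusing the names from the snippet above):

const sleep = require('util').promisify(setTimeout);

// Crawl the courses one after another; the returned promise only
// resolves once every request has completed, so the function is not
// torn down while work is still pending.
async function crawlAll(courseDataRefArray) {
  for (const dataRefObject of courseDataRefArray) {
    await CourseUtil.initiateJobForCourse(dataRefObject.ref, dataRefObject.data);
    await sleep(2000); // pause between requests to ease the target server's load
  }
}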

Rate Limit the Nodejs Module Request

So I'm trying to create a data scraper with Nodejs using the Request module. I'd like to limit the concurrency to 1 domain on a 20 ms cycle to go through 50,000 urls.
When I execute the code, I'm DoS-ing the network with the 40 Gbps bandwidth my system has access to... This creates local problems and remote problems.
5 concurrent scans on a 120 ms cycle for 50k domains would finish the list in ~20 minutes (50,000 / 5 = 10,000 cycles × 120 ms = 1,200 s) and would not create any issues remotely, at least.
The code I'm testing with:
var urls = [/* ... */]; // data from mongodb
urls.forEach(function (url) {
  // pseudocode:
  // request the url
  // process the response
});
The forEach loop executes instantly, "queueing" all the urls and trying to fetch them all at once. It seems impossible to add a delay to each iteration. All google searches seem to show how to rate limit incoming requests to your server/api. The same thing happens with a for loop: it seems impossible to control how fast the loop executes. I'm probably missing something, or the code logic is wrong. Any suggestions?
To simplify your implementation, use async/await and Promises instead of callbacks.
Use a package such as got or axios to run promise-based requests.
Use p-map or a similar approach from promise-fun.
Here is a copy-pasted example:
const pMap = require('p-map');

const urls = [
  'sindresorhus.com',
  'ava.li',
  'github.com',
  …
];

console.log(urls.length);
//=> 100

const mapper = url => {
  return fetchStats(url); //=> Promise
};

pMap(urls, mapper, {concurrency: 5}).then(result => {
  console.log(result);
  //=> [{url: 'sindresorhus.com', stats: {…}}, …]
});
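Adapted to the scraping case it might look something like this (a sketch; assumes got is installed, and processBody() is a hypothetical stand-in for your own handling):

const pMap = require('p-map');
const got = require('got');

// Hypothetical stand-in for whatever you do with each page.
const processBody = (body) => body.length;

async function scrapeAll(urls) {
  const mapper = async (url) => {
    const response = await got(url);
    return processBody(response.body);
  };
  // At most 5 requests are in flight at any moment, instead of all 50k.
  return pMap(urls, mapper, { concurrency: 5 });
}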

Is there any risk to read/write the same file content from different 'sessions' in Node JS?

I'm new to Node JS and I wonder whether the snippets of code below have a multi-session problem.
Consider that I have a Node JS server (express) and I listen for a POST request:
app.post('/sync/:method', onPostRequest);

// A function declaration is hoisted, so it is already defined
// when the route above is registered.
function onPostRequest(req, res) {
  // parse the request and fetch the email list
  var emails = [....]; // pseudocode
  doJob(emails);
  res.status(200).end('OK');
}
var fs = require('fs');
var _ = require('lodash');

function doJob(_emails) {
  try {
    // Fall back to an empty object if the file is empty.
    var emailsFromFile = fs.readFileSync(FILE_PATH, "utf8") || {};
    if (_.isString(emailsFromFile)) {
      emailsFromFile = JSON.parse(emailsFromFile);
    }
    _emails.forEach(function (_email) {
      if (!emailsFromFile[_email]) {
        emailsFromFile[_email] = 0;
      } else {
        emailsFromFile[_email] += 1;
      }
    });
    // write the object back
    fs.writeFileSync(FILE_PATH, JSON.stringify(emailsFromFile));
  } catch (e) {
    console.error(e);
  }
}
So the doJob method receives the _emails list, and I update (counter +1) these emails in the emailsFromFile object loaded from the file.
Consider that I get 2 requests at the same time and they trigger doJob twice. I'm afraid that while one request has loaded emailsFromFile from the file, the second request might change the file content.
Can anybody shed some light on this issue?
Because the code in the doJob() function is all synchronous, there is no risk of multiple requests causing a concurrency problem.
If you were using async IO in that function, then there would be possible concurrency issues.
To explain, Javascript in node.js is single threaded. So, there is only one thread of Javascript execution running at a time and that thread of execution runs until it returns back to the event loop. So, any sequence of entirely synchronous code like you have in doJob() will run to completion without interruption.
If, on the other hand, you use any asynchronous operations such as fs.readFile() instead of fs.readFileSync(), then the thread of execution returns to the event loop at the point you call fs.readFile(), and another request can run while the file is being read. In that case, you could end up with two requests conflicting over the same file, and you would have to implement some form of concurrency protection (some sort of flag or queue). This is the type of thing that databases offer lots of features for.
I have a node.js app running on a Raspberry Pi that uses lots of async file I/O, and I can get conflicts in that code from multiple requests. I solved it by setting a flag any time I'm writing to a specific file; any other request that wants to write to that file first checks that flag, and if it is set, the request goes into my own queue and is served when the prior request finishes its write operation. There are many other ways to solve this too. If it happens in a lot of places, it's probably worth just getting a database that offers features for this type of write contention.
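A minimal sketch of that flag/queue idea, here implemented as a promise chain that serializes access to the file (withFileLock and doJobAsync are hypothetical names, not from the answer above):

// Each caller waits for the previous read-modify-write to finish
// before starting its own; rejections don't break the chain.
let lock = Promise.resolve();

function withFileLock(task) {
  const run = lock.then(() => task());
  lock = run.catch(() => {});
  return run;
}

// Usage: an async rewrite of doJob would be wrapped like
// withFileLock(() => doJobAsync(emails));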
