Should I save data to DB asynchronously - node.js

I'm using Node with Express and Postgres.
I'm not sure if what I'm trying to do is a good practice or a very big mistake.
I want to save data to the database asynchronously, after I have already returned a result to the client.
I tried to demonstrate it with console.log to check whether my server would be blocked during the save.
Here you can see the /status route and the /statusB route.
app.get("/statusB", async (req, res) => {
return res.status(200).send("testAAA");
});
app.get("/status", async (req, res) => {
const userStats = await UserController.getData("id")
const x = test();
return res.status(200).send(userStats);
});
async function test() {
return new Promise(() => {
for (let x = 0; x < 10000; x++) {
setTimeout( () => {
console.log(x)
}, 5000);
}
})
}
What I want to happen is: I send /status and right after that send /statusB.
I expect the output to be:
/status returns the userStats data
/statusB returns 'testAAA'
and the counter runs asynchronously.
But the actual output is:
- /status returns the userStats data
- The counter runs
- /statusB returns 'testAAA' only after the counter has finished
The console.log is only a test; the real goal is to fetch and save data to the database asynchronously instead of logging.

Depends on your business case.
If it's alright for your customer to get a 200 OK status code even if the saving might actually have failed, then sure, you can do it asynchronously after you've responded.
In other cases, you'll want to do the saving within the request and only respond after you're sure everything is safe and sound.

It depends on your logic. If, for example, you want to return the saved resource to the client, you should wait (async/await or a callback) until the data is saved to the database. But if, for example, you just want to log an action without returning anything to the frontend, you can save it asynchronously.

Yes, you should save data to the DB asynchronously, because of the way Node.js works. If you wait for an answer from the DB (synchronously), Node.js blocks the event loop and doesn't handle new requests from clients. BUT if your business logic relies on returning the answer from the DB to the client, you should do it synchronously, and maybe think about workarounds or choose another runtime if that becomes a problem.
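To make the trade-off from the answers above concrete, here is a minimal sketch. The saveStats() helper and the route names are assumptions for illustration, not part of the original code; the only point is where the await sits relative to the response.

const express = require("express");
const app = express();

// Option A: respond only after the save has succeeded.
app.get("/save-then-respond", async (req, res) => {
  const userStats = await UserController.getData("id");
  await saveStats(userStats);        // hypothetical helper returning a Promise
  res.status(200).send(userStats);   // the client only sees 200 if the save worked
});

// Option B: respond first, save in the background (fire-and-forget).
app.get("/respond-then-save", async (req, res) => {
  const userStats = await UserController.getData("id");
  res.status(200).send(userStats);   // the client gets 200 regardless of the save
  saveStats(userStats).catch((err) => {
    // no await: always attach a catch so a failed save is logged
    // instead of becoming an unhandled promise rejection
    console.error("background save failed", err);
  });
});

In both cases the event loop stays free while the database works; the difference is only whether the client's response is tied to the outcome of the write.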

Related

Node.js child process fork return response -- Cannot set headers after they are sent to the client

Situation:
I have a function that does an expensive operation, such as fetching a large query from MongoDB and then performing a lot of parsing and analysis on the response. I have offloaded this expensive operation to a child process fork, and I wait for the worker to finish before sending the response so the main event loop is not blocked.
Current implementation:
I have an API endpoint GET {{backend}}/api/missionHistory/flightSummary?days=90&token={{token}}
api entry point code:
missionReports.js
const cp = require('child_process');
// if reportChild is initialized here, "Cannot set headers after they are sent"
const reportChild = cp.fork('workers/reportWorker.js');

exports.flightSummary = function (req, res) {
  let search_query = req.query;
  // if initialized here, there is no error.
  const reportChild = cp.fork('workers/reportWorker.js');
  logger.debug(search_query);
  let payload = {'task': 'flight_summary', 'search_params': search_query};
  reportChild.send(payload);
  reportChild.on('message', function (msg) {
    logger.info(msg);
    if (msg.worker_status === 'completed') {
      return res.json(msg.data);
    }
  });
};
worker code:
reportWorker.js
process.on('message', function (msg) {
  process.send({'worker_status': 'started'});
  console.log(msg);
  switch (msg.task) {
    case 'flight_summary':
      findFlightHours(msg.search_params, function (response) {
        logger.info('completed');
        process.send({'worker_status': 'completed', 'data': response});
      });
      break;
  }
});
scenario 1: reportChild (fork) is initialized at the beginning of the module definitions. The API call works once and returns the correct data. On the second call it crashes with "Cannot set headers after they are sent". I stepped through the code, and it definitely only sends the response once per API call.
scenario 2: if I initialize reportChild inside the API definition, it works perfectly every time. Why is that? Is the forked child process not killed unless it's redefined? Is this a standard implementation of child processes?
This is my first attempt at threading in Node.js; I am trying to move expensive operations off of the main event loop into different workers. Let me know what best practice is for this situation. Thanks.
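One likely reading of the two scenarios: with the module-level fork, every API call attaches another 'message' listener to the same long-lived child, so when a later request completes, the earlier request's listener fires again and calls res.json() on a response that has already been sent. A minimal sketch of the per-request variant (scenario 2), with the child cleaned up once it has answered; it assumes the same reportWorker.js protocol shown above:

// missionReports.js (sketch)
const cp = require('child_process');

exports.flightSummary = function (req, res) {
  const search_query = req.query;
  const reportChild = cp.fork('workers/reportWorker.js');   // one child per request

  reportChild.send({ task: 'flight_summary', search_params: search_query });

  reportChild.on('message', function (msg) {
    if (msg.worker_status === 'completed') {
      res.json(msg.data);
      reportChild.kill();   // this child (and its listener) will never touch res again
    }
  });

  reportChild.on('error', function (err) {
    res.status(500).json({ error: err.message });
    reportChild.kill();
  });
};

If a single long-lived worker is preferred, the listener would instead need to be removed (or matched to a request id) after each response, so old requests never react to later messages.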

Difficulty processing CSV file, browser timeout

I was asked to import a CSV file from a server daily and parse the respective headers into the appropriate Mongoose fields.
My first idea was to make it to run automatically with a scheduler using the cron module.
const CronJob = require('cron').CronJob;
const fs = require("fs");
const csv = require("fast-csv");

new CronJob('30 2 * * *', async function() {
  await parseCSV();
  this.stop();
}, function() {
  this.start();
}, true);
Next, the parseCSV() function code is as follow:
(I have simplified some of the data)
function parseCSV() {
  let buffer = [];
  let stream = fs.createReadStream("data.csv");
  csv.fromStream(stream, {headers: ["lot", "order", "cwotdt"], trim: true})
    .on("data", async (row) => {
      let data = {"order": row.order, "lot": row.lot, "date": row.cwotdt};
      // Only add products that fulfill the following condition
      if (row.cwotdt !== "000000") {
        let product = {"order": row.order, "lot": row.lot};
        // Check whether the product exists in the database or not
        await db.Product.find(product, function(err, foundProduct) {
          if (foundProduct && foundProduct.length !== 0) {
            console.log("Product exists");
          } else {
            buffer.push(product);
            console.log("Product does not exist");
          }
        });
      }
    })
    .on("end", function() {
      db.Product.find({}, function(err, productAvailable) {
        // Check whether the database is empty or not
        if (productAvailable.length !== 0) {
          // Add subsequent rows
          db.Product.insertMany(buffer);
          buffer = [];
        } else {
          // Add for the first time
          db.Product.insertMany(buffer);
          buffer = [];
        }
      });
    });
}
It is not a problem with just a few rows in the CSV file, but at only around 2k rows I ran into a problem. The culprit is the if-condition check inside the data event handler: it has to query the database for every single row to see whether that row's data is already there.
The reason I'm doing this is that the CSV file will keep having new data added to it, so I need to insert everything the first time (when the database is empty) and afterwards look at every single row and only add the new ones.
The 1st approach (as in the code) was to use async/await to make sure all the data had been read before proceeding to the end handler. This helps, but from time to time (with mongoose.set("debug", true);) I see some data being queried twice, and I have no idea why.
The 2nd approach was not to use async/await. This has a downside: the data was not fully queried before execution proceeded straight to the end handler, which then called insertMany with only whatever had made it into the buffer so far.
If I stick with the current approach it works, but the querying takes 1 to 2 minutes, and even longer as the database keeps growing. During those few minutes the event queue is blocked, so requests to the server time out.
I used stream.pause() and stream.resume() before this code, but I couldn't get it to work; it just jumps straight to the end handler, which leaves the buffer empty every single time since the end handler runs before the data handlers finish.
I can't remember the links I used, but the fundamentals I learned came from this:
Import CSV Using Mongoose Schema
I saw these threads:
Insert a large csv file, 200'000 rows+, into MongoDB in NodeJS
Can't populate big chunk of data to mongodb using Node.js
to be similar to what I need, but they are a bit too complicated for me to understand what is going on. It seems like using a socket or a child process, maybe? Furthermore, I still need to check conditions before adding to the buffer.
Anyone care to guide me on this?
Edit: await is removed from console.log as it is not asynchronous
Forking a child process approach (a rough sketch follows below):
When the web service gets a request with a CSV data file, save it somewhere in the app
Fork a child process -> child process example
Pass the file URL to the child process to run the insert checks
When the child process has finished processing the CSV file, delete the file
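A rough sketch of those four steps; the upload route, file path and processCsv() helper are illustrative assumptions, not taken from the original code:

// server.js
const cp = require("child_process");
const fs = require("fs");

app.post("/import", (req, res) => {
  const filePath = "/tmp/upload-" + Date.now() + ".csv";   // step 1: wherever the file was saved
  const worker = cp.fork("workers/csvWorker.js");          // step 2: fork a child process

  worker.send({ file: filePath });                         // step 3: hand the file to the child
  worker.on("message", (msg) => {
    if (msg.status === "done") {
      fs.unlink(filePath, () => {});                       // step 4: delete the file once processed
      worker.kill();
    }
  });

  res.status(202).send("import started");   // respond right away; the insert checks run in the child
});

// workers/csvWorker.js
process.on("message", async (msg) => {
  await processCsv(msg.file);                // the existing parse + insert-check logic, moved here
  process.send({ status: "done" });
});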
Like what Joe said, indexing the DB would speed up the processing time a lot when there are lots (millions) of tuples.
If you create an index on order and lot, the query should be very fast.
db.Product.createIndex({ order: 1, lot: 1 })
Note: This is a compound index and may not be the ideal solution. Index strategies
Also, your await on console.log is weird; that may be causing your timing issues. console.log is not async. Additionally, the function is not marked async.
// removing await from console.log
let product = {"order": row.order, "lot": row.lot};
// Check whether the product exists in the database or not
await db.Product.find(product, function(err, foundProduct) {
  if (foundProduct && foundProduct.length !== 0) {
    console.log("Product exists");
  } else {
    buffer.push(product);
    console.log("Product does not exist");
  }
});
I would try removing the await on console.log (that may be a red herring if console.log is just a stand-in for Stack Overflow and is hiding the actual async method). However, be sure to mark the function as async if that is the case.
Lastly, if the problem still exists, I would look into a two-tiered approach (rough sketch below):
Insert all lines from the CSV file into a mongo collection.
Process that mongo collection after the CSV has been parsed, removing the CSV from the equation.
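A minimal sketch of that two-tiered idea, reusing the order/lot/cwotdt fields from the question; the StagingRow model is an assumption, and the upsert is a deliberate swap for the per-row find() so MongoDB does the existence check itself:

async function importCsvTwoTiered(rows) {
  // Tier 1: dump every parsed CSV row into a staging collection as-is.
  await db.StagingRow.insertMany(rows);

  // Tier 2: process the staging collection; the CSV is out of the picture now.
  const staged = await db.StagingRow.find({ cwotdt: { $ne: "000000" } });
  for (const row of staged) {
    // upsert: insert the product only if it does not exist yet
    await db.Product.updateOne(
      { order: row.order, lot: row.lot },
      { $setOnInsert: { order: row.order, lot: row.lot } },
      { upsert: true }
    );
  }
  await db.StagingRow.deleteMany({});
}

With the compound index on order and lot suggested above, each upsert becomes a single indexed write instead of a separate existence query plus a batched insert.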

Using redis as cache as REST Api user (in order to save Api requests)

I am an API user and I have only a limited number of requests available for a high-traffic website (~1k concurrent visitors). In order to save API requests I would like to cache the responses for specific requests which are unlikely to change.
However I want to refresh this redis key (the API response) at least every 15 seconds. I wonder what the best approach for this would be?
My ideas:
I thought the TTL field would be handy for this scenario: just set a TTL of 15s for this key. When I query the key and it's not present, I would request it again from the API. The problem: since this is a high-traffic website, I would expect around 20-30 incoming requests before I get a response from the API, and that would lead to 20-30 requests hitting the API within a few ms. So I would need to "pause" all incoming requests until there is an API response.
My second idea was to refresh the key every 15s. I could set up a background task which runs every 15s, or upon page request I could check in my controller whether the key needs a refresh. I would prefer the latter, but for that I would need to keep track of the Redis key's age, which seems expensive and is not a built-in feature?
What would you suggest for this use case?
My controller code:
function players(req, res, next) {
  redisClient.getAsync('leaderboard:players').then((playersLeaderboard) => {
    if (!playersLeaderboard) {
      // We need to get a fresh copy of the playersLeaderboard
    }
    res.set('Cache-Control', 's-maxage=10, max-age=10');
    res.render('leaderboards/players', {playersLeaderboard: playersLeaderboard});
  }).catch((err) => {
    logger.error(err);
  });
}
Simply fetch and cache the data when the Node.js server starts, then set an interval of 15 seconds to fetch fresh data and update the cache. Avoid using the TTL for this use case.
function fetchResultsFromApi(cb) {
  apiFunc((err, result) => {
    // do some error handling
    // cache result in redis without ttl
    cb();
  });
}

fetchResultsFromApi(() => {
  app.listen(port);
  setInterval(() => {
    fetchResultsFromApi(() => {});
  }, 15000);
});
Pros:
Very simple to implement
No queuing of client request required
Super fast response times
Cons:
The cache update might not execute/complete exactly after every 15th second. It might be a few milliseconds here and there. I assume that it won't make a lot of difference for what you are doing and you can always reduce the interval time to update cache before 15 seconds.
I guess this is more of an architecture question than the typical "help, my code doesn't work" kind.
Let me paraphrase your requirements.
Q: I would like to cache the responses of some HTTP requests which are unlikely to change and I would like these cached responses to be refreshed every 15 seconds. Is it possible?
A: Yes it is, and you're going to be thankful that JavaScript is single-threaded, because that makes it quite straightforward.
Some fundamental knowledge here: Node.js is an event-driven framework, which means that at any point in time it executes only one piece of code, all the way until it is done.
If an async call is encountered along the way, it is started and an event is added to the event loop saying "call back when a response is received". When the current code routine is finished, the next event is popped from the queue and run.
Based on this, we can achieve the goal by building a function that fires off only one async call to update the cached responses whenever the cache expires. If an async call is already in flight, the other callers' callbacks are simply put into a queue, so you don't make multiple async calls to fetch the same new result.
I'm not familiar with the async module, so I have provided a pseudo-code example using promises instead.
Pseudo code:
var fetch_queue = [];
var cached_result = {
  "cached_result_1": {
    "result": "test",
    "expiry": 1501477638 // epoch time 15s in the future
  }
};

var get_cached_result = function(lookup_key) {
  if (cached_result.hasOwnProperty(lookup_key)) {
    if (!result_expired(cached_result[lookup_key].expiry)) {
      // Not expired, safe to use the cached result
      return new Promise(function (resolve) {
        resolve(cached_result[lookup_key].result);
      });
    }
    else {
      // Expired, fetch a fresh result
      return update_result();
    }
  }
};

var update_result = function() {
  if (fetch_queue.length === 0) {
    // No other request is retrieving an updated result.
    return new Promise(function (resolve, reject) {
      // Call your API to get the result.
      // When done, call:
      resolve("Your result");
      // Inform the other waiting requests that an updated response is ready.
      fetch_queue.forEach(function(waiter) {
        waiter.resolve("Your result");
      });
      fetch_queue = [];
      // Compute the new expiry epoch time and update cached_result.
    });
  }
  else {
    // A fetch is already in flight: create a promise and park it in the queue.
    return new Promise(function(resolve, reject) {
      fetch_queue.push({
        resolve: resolve,
        reject: reject
      });
    });
  }
};

get_cached_result("cached_result_1").then(function(result) {
  // reply with the result
});
Note: As the name suggests, the code is not an actual working solution, but the concept is there.
Something worth noting: setInterval is one way to go, but it doesn't guarantee that the function will be called exactly at the 15-second mark. The API only makes sure that something will happen after the expected time.
Whereas the proposed solution ensures that, as long as the cached result has expired, the very next request looking it up will make the API request, and the following requests will wait for that initial request to return.

Store setTimeout id from Node.js in MongoDB

I am running a web application using Express and Node.js. I have a route for a particular endpoint in which I use setTimeout to call a particular function repeatedly after varying time intervals.
For example
router.get ("/playback", function(req, res) {
// Define callback here ...
....
var timeoutone = settimeout(callback, 1000);
var timeouttwo = settimeout(callback, 2000);
var timeoutthree = settimeout(callback, 3000);
});
The setTimeout function returns an object with a circular reference. When trying to save this into MongoDB I get a stack overflow error. My aim is to be able to save these objects returned by setTimeout into the database.
I have another endpoint, cancel playback, which when called will retrieve these timeout objects and call clearTimeout, passing them in as an argument. How do I go about saving these timeout objects to the database? Or is there a better way of clearing the timeouts than having to save them to the database? Thanks in advance for any help.
You cannot save live JavaScript objects in the database! Maybe you can store a string or JSON or similar reference to them, but not the actual object, and you cannot reload them later.
Edit: Also, I've just noticed you're using setTimeout for repeating stuff. If you need to repeat it on regular intervals, why not use setInterval instead?
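For example, if the playback really is a fixed-interval repeat, a sketch of that suggestion (illustrative only) looks like this:

// one interval handle covers all repetitions and is cheap to cancel
var playbackInterval = setInterval(callback, 1000);

// ... later, e.g. in the cancel-playback endpoint:
clearInterval(playbackInterval);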
Here is a simple solution that would keep the indexes in memory:
var timeouts = {};
var index = 0;

// route to set the timeout somewhere
router.get('/playback', function(req, res) {
  timeouts['timeout-' + index] = setTimeout(ccb, 1000);
  storeIndexValueSomewhere(index)
    .then(function() {
      res.json({timeoutIndex: index});
      index++;
    });
});

// another route that gets timeout indexes from that mongodb somehow
router.get('/playback/indexes', handler);

// finally a delete route
router.delete('/playback/:index', function(req, res) {
  var index = 'timeout-' + req.params.index;
  if (!timeouts[index]) {
    return res.status(404).json({message: 'No job with that index'});
  } else {
    clearTimeout(timeouts[index]);
    timeouts[index] = undefined;
    return res.json({message: 'Removed job'});
  }
});
But this probably would not scale to many millions of jobs.
A more complex solution, and perhaps more appropriate to your needs (depends on your playback job type) could involve job brokers or message queues, clusters and workers that subscribe to something they can listen to for their own job cancel signals etc.
I hope this helps you a little to clear up your requirements.
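Building on the note above that only a plain string/JSON description can be persisted (never the live timer object), here is a sketch that keeps the live handles in memory but stores a JSON description of each job in MongoDB, so jobs can be cancelled by index and re-armed after a restart. The PlaybackJob model and its fields are assumptions for illustration:

const mongoose = require('mongoose');

const Job = mongoose.model('PlaybackJob', new mongoose.Schema({
  index: Number,
  delayMs: Number,
  firesAt: Date
}));

const liveTimeouts = {};   // live setTimeout handles stay in memory only

async function schedule(index, delayMs, callback) {
  liveTimeouts[index] = setTimeout(callback, delayMs);
  await Job.create({ index: index, delayMs: delayMs, firesAt: new Date(Date.now() + delayMs) });
}

async function cancel(index) {
  clearTimeout(liveTimeouts[index]);   // only works in the process that set the timer
  delete liveTimeouts[index];
  await Job.deleteOne({ index: index });
}

// on startup, re-arm anything that has not fired yet
async function rescheduleOnBoot(callback) {
  const pending = await Job.find({ firesAt: { $gt: new Date() } });
  pending.forEach(function (job) {
    liveTimeouts[job.index] = setTimeout(callback, job.firesAt.getTime() - Date.now());
  });
}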

wait for async to complete before return

Mongoose async code:
userSchema.static('alreadyExists', function(name) {
  var isPresent;
  this.count({alias: name}, function(err, count) {
    isPresent = !!count;
  });
  console.log('Value of flag ' + isPresent);
  return isPresent;
});
I know isPresent is returned before the this.count async callback runs, so its value is undefined. But how do I wait for the callback to change the value of isPresent and then safely return it?
What effect does
(function(){ asynccalls(); asynccall(); })(); have on the async flow?
What happens with var foo = asynccall() or (function(){})()?
Will the above two make the return wait?
Can process.nextTick() help?
I know there are lots of questions like these, but none explain the problem of returning before async completion.
There is no way to do that. You need to change the signature of your function to take a callback rather than returning a value.
Making IO async is one of the main motivations of Node.js, and waiting for an async call to complete defeats the purpose.
If you give me more context on what you are trying to achieve, I can give you pointers on how to implement it with callbacks.
Edit: You need something like the following:
userSchema.static('alreadyExists', function (name, callback) {
  this.count({alias: name}, function (err, count) {
    callback(err, err ? null : !!count);
    console.log('Value of flag ' + !!count);
  });
});
Then, you can use it like:
User.alreadyExists('username', function (err, exists) {
  if (err) {
    // Handle error
    return;
  }
  if (exists) {
    // Pick another username.
  } else {
    // Continue with this username.
  }
});
Had the same problem. I wanted my mocha tests to run very fast (as they originally did), but at the same time to have an anti-DOS layer present and operational in my app. Running those tests as they originally worked was crazy fast, and the ddos module I'm using started responding with a Too Many Requests error, making the tests fail. I didn't want to disable it just for test purposes (I actually wanted automated tests that verify the Too Many Requests cases as well).
I had one place, used by all the tests, that prepared the client for HTTPS requests (filled in the proper headers, authenticated, with cookies, etc.). It looked more or less like this:
var agent = thiz.getAgent();
thiz.log('preReq for user ' + thiz.username);
thiz.log('preReq for ' + req.url + ' for agent ' + agent.mochaname);
if (thiz.headers) {
  Object.keys(thiz.headers).map(function(header) {
    thiz.log('preReq header ' + header);
    req.set(header, thiz.headers[header]);
  });
}
agent.attachCookies(req);
So I wanted to inject a sleep there that would kick in every 5th time this client was asked by a test to perform a request, so the entire suite would still run quickly and every 5th request would wait long enough for the ddos module to consider my requests unpunishable by the Too Many Requests error.
I searched most of the entries here about async and other libraries or practices. All of them required switching to callbacks, which meant I would have had to rewrite a couple of hundred test cases.
Finally I gave up on any elegant solution and fell back to the one that worked for me: adding a for loop that tries to check the status of a non-existing file. It keeps the process busy long enough that I could calibrate it to last around 6500 ms.
for (var i = 0; i < 200000; ++i) {
  try {
    fs.statSync('/path' + i);
  } catch (err) {
    // ignore: the file does not exist, the loop is only here to burn time
  }
}
