Node.js throttle data

I would like to know how to achieve my use case in JavaScript.
My app receives a POST request, increments a memcache key, and then immediately publishes the increased value to users (a mobile app) using a third-party API.
E.g. first request: value becomes 1, publish 1.
Second request: value becomes 2, publish 2, and so on.
It works fine with fewer than 2k requests within 30 seconds.
If the number of requests goes up to 10k, users (the mobile app) may receive too many messages from the publisher (battery consuming).
So I have to throttle the publishing calls: instead of publishing per request, I want to publish the value once per second. In second 1 the value might be 1, so publish 1. In second 2 the value might be 100, so publish 100. That way I save 99 publish calls.
When requests stop coming, I don't want a worker to keep running every second.

Each time the key is incremented, cache the new value in a variable and publish it to clients using setInterval. Here is a simple example:
var key = 0;

// Update the cache to the present
// value on application start
memcache.get('key', updateKey);

// Handle increment request and
// save the new value
app.post('/post', function(req, res) {
  memcache.incr('key', updateKey);
});

// Update the cached key
function updateKey(err, val) {
  key = val;
}

// Publish to clients once
// a second
function publish() {
  clients.emit(key);
}

setInterval(publish, 1000);
Starting and stopping this routine is a little more involved and may depend on how you're serving requests / incrementing the value.
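As a rough sketch (reusing the memcache, app and clients handles assumed above; the 1-second numbers are illustrative), you could start the interval lazily on the first increment and clear it once a full second passes without new requests:
var key = 0;
var lastIncrementAt = 0;
var timer = null;

app.post('/post', function(req, res) {
  lastIncrementAt = Date.now();
  memcache.incr('key', updateKey);

  // lazily start the publish loop on the first request
  if (!timer) {
    timer = setInterval(publish, 1000);
  }
  res.end();
});

function updateKey(err, val) {
  if (!err) key = val;
}

function publish() {
  clients.emit(key);

  // stop the loop once no increments arrived for a full interval;
  // the latest value has already been published at this point
  if (Date.now() - lastIncrementAt > 1000) {
    clearInterval(timer);
    timer = null;
  }
}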

Take a look at node-rate-limiter
You can implement it in a number of ways to solve your problem...
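For example, a minimal sketch with the limiter npm package (this uses the classic callback-style constructor; newer major versions are promise-based, so check the docs for the version you install) that drops publish calls once the per-second budget is used up:
var RateLimiter = require('limiter').RateLimiter;

// token bucket: allow at most 1 publish per second
var limiter = new RateLimiter(1, 'second');

function publishThrottled(value) {
  // tryRemoveTokens() is synchronous and returns false when the
  // rate is exceeded, in which case this publish is simply skipped
  if (limiter.tryRemoveTokens(1)) {
    clients.emit(value);
  }
}
Note that skipping calls this way can leave the very latest value unpublished if requests suddenly stop, so the interval-based approach above may be a better fit for "publish only the most recent value once per second".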

Related

How to have more than 30 sec response timeout in Heroku

Guys, Heroku is terminating the request if the response takes more than 30 seconds to return, so is there any way I can wait for as long as the response takes to come back?
Well, the user is uploading a file, I need to do something with the file on my server, and after the updates are done I will give a download link to the user. But it mostly takes more than 30 seconds for the server to process the file, so the user needs to wait for the response.
From the official Heroku Help Center: https://devcenter.heroku.com/articles/request-timeout
The timeout value is not configurable. If your server requires longer than 30 seconds to complete a given request, we recommend moving that work to a background task or worker to periodically ping your server to see if the processing request has been finished. This pattern frees your web processes up to do more work, and decreases overall application response times.
The short answer is: no, you can't change this configuration. I suggest you investigate why your application needs more than 30 seconds to process that request. If it takes longer than 10 seconds, you really should consider the steps suggested in the Heroku Help Center 👆
Your Problem
You mention you need this for file processing. I understand that file processing can easily take longer than 30 seconds. Normally what I would do is create some sort of task reference and keep it in a database along with a status ("processing", "finished", "failed"), also store the original file, and then just end the user's request. This shouldn't take long. Then process the task ... with another endpoint or a websocket connection the user can check whether the task has been fulfilled.
Use a Task Queue
The following is just a basic interpretation of a solution - it's not meant for copy & pasting as it depends on so many things.
Routes (Endpoints)
Basically you need to have 3 routes in your backend. One for uploading the file, one for downloading the processed file and one for checking the status of the task.
1. Upload
app.post('/files', /* some middleware e.g. multer */ async (req, res) => {
  // This is your upload controller.
  // I assume at this point the file has been uploaded and
  // req.file contains a reference to the uploaded file.

  // create new process task and add it to the queue
  const task = await createNewTask(req.file);
  queue.push(task);

  // now a task has been created, but the user
  // doesn't need to wait for it to finish,
  // so let's end the request here.
  return res.status(200).json(task);
});
2. Check Status
app.get('/task/:id', async (req, res) => {
  // From uploading a file in the first step, you'll
  // get back a task id. Use the task id to check on
  // the status.
  const task = await getTask(req.params.id);

  if (!task) {
    return res.status(404).end();
  } else {
    return res.status(200).json(task);
  }
});
The task can include information like status, progress percentage, original filename, new filename, or even a download link to the processed file once it's finished. The status could be something like pending, processing, finished or failed.
3. Download
app.get('/file/:filename', (req, res) => {
  // sendFile needs an absolute path, or a relative
  // filename together with a root option
  return res.status(200)
    .sendFile(req.params.filename, { root: './path/to/file/' });
});
Notes
It might be a good idea to rename the incoming files with a random id like a uuid, so it's easier to work with them in the automation process. The random id could also be used as the task id at the same time.
It's up to you how big you want to go with this. For the task queue there are many different libraries to help you out. It could be an in-memory queue or one that's backed by a database.
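As a rough illustration of the in-memory variant (createNewTask and getTask match the routes above; processFile and the task shape are hypothetical placeholders, and crypto.randomUUID needs a reasonably recent Node version):
const { randomUUID } = require('crypto');

const queue = [];        // pending tasks, pushed to by the upload route
const tasks = new Map(); // taskId -> task, read by GET /task/:id

async function createNewTask(file) {
  const task = { id: randomUUID(), status: 'pending', file: file.path };
  tasks.set(task.id, task);
  return task;
}

async function getTask(id) {
  return tasks.get(id);
}

// very small worker loop: processes one task at a time
let busy = false;
setInterval(async () => {
  if (busy) return;
  const task = queue.shift();
  if (!task) return;

  busy = true;
  task.status = 'processing';
  try {
    task.result = await processFile(task.file); // your actual processing goes here
    task.status = 'finished';
  } catch (err) {
    task.status = 'failed';
  }
  busy = false;
}, 1000);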

Tracking currently active users in node.js

I am building an application using node.js and socket.io. I would like to create a table of users who are actively browsing the site at any given moment, which will update dynamically.
I am setting a cookie to give each browser a unique ID, and have a mysql database of all users (whether online or not); however, I'm not sure how best to use these two pieces of information to determine who is, and who isn't, actively browsing right now.
The simplest way would seem to be to store the cookie & socket IDs in an array, but I have read that global variables (which presumably this would have to be) are generally bad, and to be avoided.
Alternatively I could create a new database table, where IDs are inserted and deleted when a socket connects/disconnects; but I'm not sure whether this would be overkill.
Is one of these methods any better than the other, or is there a way of tracking this information which I haven't thought of yet?
You can keep track of active users in memory without it being a global variable. It can simply be a module level variable. This is one of the advantages of the nodejs module system.
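For example (file names are illustrative), because Node caches modules, every file that requires this module gets the same Map instance, so nothing has to live on the global object:
// activeUsers.js - module-level state, shared by everyone who require()s it
const userAccessMap = new Map();
module.exports = userAccessMap;

// somewhere else, e.g. in a middleware file:
// const userAccessMap = require('./activeUsers');
// userAccessMap.set(userId, Date.now());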
The reasons to put it in a database instead of memory are:
You have multiple servers so you need a centralized place to put the data
You want the data stored persistently so if the server is restarted (normally or abnormally) you will have the recent data
The reasons for not putting it directly in a database:
It's a significant load of new database operations since you have to update the data on every single incoming request.
You can sometimes get the persistence without directly using a database by logging the access to a log file and then running cron jobs that parse the logs and do bulk additions of data to the database. This has a downside in that it's not as easy to query live data (since the most recent data is still sitting in the log files and hasn't been parsed into the database yet).
For an in-memory store, you could do something like this:
// middleware that keeps track of user access
let userAccessMap = new Map();
app.use((req, res, next) => {
  // get userId from the cookie (substitute your own cookie logic here)
  let id = req.cookies.userID;
  let lastAccess = Date.now();

  // if you want to keep track of more than just lastAccess,
  // you can store an object of data here instead of just the lastAccess time.
  // To update it, you would get the previous object, update some properties
  // in it, and then set it back in the userAccessMap
  userAccessMap.set(id, lastAccess);
  next();
});
// routinely clean up the userAccessMap to remove old access times
// so it doesn't just grow forever
const cleanupFrequency = 30 * 60 * 1000; // run cleanup every 30 minutes
const cleanupTarget = 24 * 60 * 60 * 1000; // clean out users who haven't been here in the last day
setInterval(() => {
  let now = Date.now();
  for (let [id, lastAccess] of userAccessMap.entries()) {
    if (now - lastAccess > cleanupTarget) {
      // delete users who haven't been here in a long time
      userAccessMap.delete(id);
    }
  }
}, cleanupFrequency);

// Then, create some sort of administrative interface (probably with some sort of access protection)
// that gives you access to the user access info.
// This might even be available in a separate web server on a separate port that isn't open to the general public
app.get("/userAccessData", (req, res) => {
  // perhaps convert this to a human readable user name by looking up the user id
  // also may want to sort the data by recentAccess
  res.json(Array.from(userAccessMap));
});

Delaying execution of multiple HTTP requests in Google Cloud Function

I've implemented a web scraper with Node.js, cheerio and request-promise that scrapes an endpoint (a basic HTML page) and returns certain information. The content of the page I'm crawling differs based on a parameter at the end of the URL (http://some-url.com?value=12345 where 12345 is my dynamic value).
I need this crawler to work every x minutes and crawl multiple pages, and to do that I've set a cronjob using Google Cloud Scheduler. (I'm fetching the dynamic values I need from Firebase).
There could be more than 50 different values for which I'd need to crawl the specific page, but I would like to spread out the requests so the server doesn't choke. To accomplish this, I've tried to add a delay:
1) using setTimeout
2) using setInterval
3) using a custom sleep implementation:
const sleep = require('util').promisify(setTimeout);
All 3 of these methods work locally; all of the requests are made with y seconds delay as intended.
But when tried with Firebase Cloud Functions and Google Cloud Scheduler
1) not all of the requests are sent
2) the delay is NOT consistent (some requests fire with the proper delay, then there are no requests made for a while and other requests are sent with a major delay)
I've tried many things but I wasn't able to solve this problem.
I was wondering if anyone could suggest a different theoretical approach, or a certain library etc. I could use for this scenario, since the one I have now doesn't seem to work as I intended. I'm adding one of the approaches that work locally below.
Cheers!
courseDataRefArray.forEach(async (dataRefObject: CourseDataRef, index: number) => {
  console.log(`Foreach index = ${index} -- Hello StackOverflow`);

  setTimeout(async () => {
    console.log(`Index in setTimeout = ${index} -- Hello StackOverflow`);
    await CourseUtil.initiateJobForCourse(dataRefObject.ref, dataRefObject.data);
  }, 2000 * index);
});
(Note: I can provide more code samples if necessary; but it's mostly following a loop & async/await & setTimeout pattern, and since it works locally I'm assuming that's not the main problem.)
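For comparison, here is a sequential variant of the same loop as a sketch (initiateJobForCourse and the data shapes come from the snippet above; the wrapper function name is illustrative). Cloud Functions generally don't keep running background timers after the returned promise resolves, so unlike forEach with setTimeout, a plain for...of loop with an awaited sleep keeps all of the work inside the promise the function returns:
const sleep = require('util').promisify(setTimeout);

async function crawlAllCourses(courseDataRefArray) {
  // each iteration is awaited in order, so nothing is left
  // pending when the Cloud Function's promise resolves
  for (const dataRefObject of courseDataRefArray) {
    await CourseUtil.initiateJobForCourse(dataRefObject.ref, dataRefObject.data);
    await sleep(2000); // spread the requests out by ~2 seconds each
  }
}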

How to implement server side rendering datatable, Using node and mongo db?

So I have one user collection (MongoDB) which contains millions of users.
I'm using Node.js as the backend, AngularJS as the frontend and DataTables for displaying those users.
But DataTables loads all users in one API call, which loads more than 1 million users.
This makes my API response too slow.
I want only the first 50 users, then the next 50, and so on...
Server stack = Node.js + AngularJS + MongoDB
Thanks
If you are using DataTables with a huge amount of data, you should consider using its server-side processing functionality.
Server-side processing for DataTables is described here: https://datatables.net/manual/server-side
But if you feel too lazy to implement this on your server yourself, you could use third-party packages like:
https://github.com/vinicius0026/datatables-query
https://github.com/eherve/mongoose-datatable
Hope this helps.
The way to solve your client fetching users from your server (and DB) and then rendering them into a DataTable is pagination. There are a few ways of handling pagination that I have seen; let's assume you are using REST.
One way of doing this is having your API look like:
/api/users?skip=100&limit=50
Meaning, the client will ask your server for users (using the default sorting), skipping the first 100 results it finds and retrieving the next 50 users.
Another way is to have your API like this (I don't really like this approach):
/api/users?page=5&pageSize=50
Meaning, the client will pass which page and how many results per page it wants to fetch. This results in a server-side calculation, because you would need to translate the page number into an offset and fetch users 250-300.
You can read a lot more about pagination on the web.
Having said that, your next issue is to fetch the desired users from the database. MongoDB has two functions for exactly this, skip and limit, which is why I like the first API better. You can do the query as follows:
users.find().skip(50).limit(50)
You can read more about the limit function here and the skip function here
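A minimal sketch of such an endpoint, assuming Express and a Mongoose-style User model (names are illustrative):
app.get('/api/users', async (req, res) => {
  // e.g. /api/users?skip=100&limit=50
  const skip = parseInt(req.query.skip, 10) || 0;
  const limit = Math.min(parseInt(req.query.limit, 10) || 50, 100); // cap the page size

  const users = await User.find().skip(skip).limit(limit);
  res.json(users);
});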
The first thing you need is to add skip and limit to your Mongo query, like this:
Model.find().skip(offset).limit(limit)
Then the next thing you have to do is enable server-side processing in DataTables.
If you are using the plain JavaScript DataTables, this fiddle will work for you:
http://jsfiddle.net/bababalcksheep/ntcwust8/
For angular-datatables
http://l-lin.github.io/angular-datatables/archives/#/serverSideProcessing
Another way, if you want to send your own parameters:
$scope.dtOptions = DTOptionsBuilder.newOptions()
  .withOption('serverSide', true)
  .withOption('processing', true)
  .withOption('ajax', function (data, callback, settings) {
    // make an ajax request using data.start and data.length
    $http.post(url, {
      draw: data.draw,
      limit: data.length,
      offset: data.start,
      contains: data.search.value
    }).success(function (res) {
      // map your server's response to the DataTables format and pass it to
      // DataTables' callback
      callback({
        recordsTotal: res.meta,
        recordsFiltered: res.meta,
        draw: res.draw,
        data: res.data
      });
    });
  })
You will get the page length and the offset (as the start value) in the data object inside the .withOption('ajax', fn) section, and from there you can pass them to the server as params in a GET request, e.g. /route?offset=data.start&limit=data.length, or via the POST request in the example above.
On hitting the next button in the table, this function will automatically be triggered with limit, start and many other DataTables-related values.
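On the Node side, a matching handler could look roughly like this (a sketch only; the route path, User model and regex filter are illustrative, the field names follow the $http.post body and response mapping above, and express.json() body parsing is assumed):
app.post('/api/users/datatable', async (req, res) => {
  const { draw, limit, offset, contains } = req.body;

  // optional search filter built from data.search.value
  const filter = contains ? { name: new RegExp(contains, 'i') } : {};

  const [data, meta] = await Promise.all([
    User.find(filter).skip(offset).limit(limit),
    User.countDocuments(filter)
  ]);

  // echo draw back so DataTables can match the response to the request
  res.json({ draw, meta, data });
});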
#mahesh
When loading the page, create two variables, say skipVar = 0 and limit. When the user clicks on next, send the skipVar value under the key skip.
var skipVar = 0
On page load: skip=skipVar&limit=limit
On the next button:
skipVar = skipVar + limit
and send the query string as
skip=skipVar&limit=limit

Meteor publish method

I just started with Meteor.js, and I'm struggling with its publish method. Below is one publish method.
//Server side
Meteor.publish('topPostsWithTopComments', function() {
  var topPostsCursor = Posts.find({}, {sort: {score: -1}, limit: 30});
  var userIds = topPostsCursor.map(function(p) { return p.userId });

  return [
    topPostsCursor,
    Meteor.users.find({'_id': {$in: userIds}})
  ];
});
// Client side
Meteor.subscribe('topPostsWithTopComments');
Now I don't understand how I can use the published data on the client. I mean, I want to use the data provided by topPostsWithTopComments.
The problem is detailed below.
When a new post enters the top 30 list, two things need to happen:
The server needs to send the new post to the client.
The server needs to send that post's author to the client.
Meteor is observing the Posts cursor returned by the publication, and so will send the new post down as soon as it's added, ensuring the client will receive the new post straight away.
However, consider the Meteor.users cursor that is also returned. Even if the cursor itself is reactive, it's now using an outdated value for the userIds array (which is a plain old non-reactive variable), which means its result set will be out of date as well.
This is why, as far as that cursor is concerned, there is no need to re-run the query, and Meteor will happily continue to publish the same 30 authors for the original 30 top posts ad infinitum.
So unless the whole code of the publication runs again (to construct a new list of userIds), the cursor is no longer going to return the correct information.
Basically what I need is:
If any change happens in Posts, the publication should have the updated users list, without querying the users collection again. I found some useful mrt modules:
link1 |
link2 |
link3
Please share your views!
-Neelesh
When you publish data on the server you're just publishing what the client is allowed to query. This is for security. After you subscribe to your publication you still need to query what the publication returned.
if(Meteor.isClient) {
  Meteor.subscribe('topPostsWithTopComments');

  // This returns all the records published with topPostsWithTopComments from the Posts collection
  var posts = Posts.find({});
}
If you wanted to only publish posts that the current user owns you would want to filter them out in the publish method on the server and not on the client.
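For example, a publication restricted to the current user's own posts could look like this (a common Meteor pattern; Posts is the collection from the question, and a userId field on each post document is assumed):
// server
Meteor.publish('myPosts', function() {
  // this.userId is the id of the logged-in user, or null if nobody is logged in
  if (!this.userId) {
    return this.ready();
  }
  return Posts.find({ userId: this.userId });
});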
I think #Will Brock already answered your question, but maybe it becomes clearer with an abstract example.
Let's construct two collections named collectiona and collectionb.
// server and client
CollectionA = new Meteor.Collection('collectiona');
CollectionB = new Meteor.Collection('collectionb');
On the server you could now call Meteor.publish with 'collectiona' and 'collectionb' separately to publish both record sets to the client. This way the client could then also separately subscribe to them.
But instead you can also publish multiple record sets in a single call to Meteor.publish by returning multiple cursors in an array. Just like in the standard publishing procedure you can of course define what is being sent down to the client. Like so:
if (Meteor.isServer) {
  Meteor.publish('collectionAandB', function() {
    // constrain records from 'collectiona': limit number of documents to one
    var onlyOneFromCollectionA = CollectionA.find({}, {limit: 1});

    // all cursors in the array are published
    return [
      onlyOneFromCollectionA,
      CollectionB.find()
    ];
  });
}
Now on the client there is no need to subscribe to 'collectiona' and 'collectionb' separately. Instead you can simply subscribe to 'collectionAandB':
if (Meteor.isClient) {
  Meteor.subscribe('collectionAandB', function () {
    // callback to use collection A and B on the client once
    // they are ready

    // only one document of collection A will be available here
    console.log(CollectionA.find().fetch());

    // all documents from collection B will be available here
    console.log(CollectionB.find().fetch());
  });
}
So I think what you need to understand is that there is no array sent to the client that contains the two cursors published in the Meteor.publish call. This is because returning an array of cursors in the function passed as an argument to your call to Meteor.publish merely tells Meteor to publish all cursors contained in the array. You still need to query the individual records using your collection handles on the client (see #Will Brock's answer).
