In Firestore, is there a better way for a connected user to get data than calling a Cloud Function that reads from Cloud Firestore? - node.js

I have a website that stores data in Cloud Firestore. Every minute, the database is updated: I fetch new data from various APIs and store it.
I need to provide the user with this updated data every minute. Currently, every minute the user's browser makes a new Cloud Function call, which then goes out to Cloud Firestore and gets the new data. However, imagine a user leaving their browser open all day: that would result in 1,440 requests.
Cloud Functions only gives me 2,000,000 requests for free, and if I had many users, those requests would get eaten up quite quickly. Is there a better way to give the user this data every minute without eating up my Cloud Functions quota? Perhaps I could make my own socket and have the user connect to that? Though I'd have to see how I could update that socket every minute without adding too much to the quota.
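To put numbers on the concern: assuming the 2,000,000 figure is a monthly allowance (Firebase's free tier counts Cloud Functions invocations per month) and a 30-day month, the sketch below computes how many users could keep a page open around the clock before polling alone uses up the quota. usersWithinQuota is a hypothetical helper written for this illustration.

```javascript
// How many always-open browser tabs can poll a Cloud Function once per
// interval before a monthly invocation quota is used up.
function usersWithinQuota(monthlyQuota, pollIntervalMinutes, daysPerMonth = 30) {
  const callsPerUserPerDay = (24 * 60) / pollIntervalMinutes; // 1,440 at 1 minute
  const callsPerUserPerMonth = callsPerUserPerDay * daysPerMonth;
  return Math.floor(monthlyQuota / callsPerUserPerMonth);
}

console.log(usersWithinQuota(2_000_000, 1)); // one call per minute
console.log(usersWithinQuota(2_000_000, 5)); // one call every 5 minutes
```

With one call per minute that comes to 46 users; even polling every 5 minutes only raises it to 231.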

Firebase allows your clients to connect directly to Cloud Firestore, where they can then listen for realtime updates. This saves them from having to poll for that data, and removes the need for the Cloud Function.
Attaching a realtime listener can be as simple as (from the documentation):
db.collection("cities").doc("SF")
    .onSnapshot(function(doc) {
        console.log("Current data: ", doc.data());
    });
The onSnapshot callback above will trigger whenever the /cities/SF document is updated. Similarly you can attach a listener to the entire collection.
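The same listener can be wrapped so that the page can also stop listening. A minimal sketch, assuming db is a Firestore instance from the namespaced (v8-style) Web SDK used in the documentation snippet; onSnapshot accepts an optional error callback and returns an unsubscribe function, while watchDocument itself is a hypothetical helper name.

```javascript
// Listen to one document and hand its data to onData on every change.
// Returns the unsubscribe function that onSnapshot() gives back.
// `db` is assumed to be a Firestore instance, e.g. firebase.firestore().
function watchDocument(db, collection, id, onData, onError = console.error) {
  return db
    .collection(collection)
    .doc(id)
    .onSnapshot(
      (doc) => {
        // doc.exists is false if the document was deleted or never created.
        onData(doc.exists ? doc.data() : null);
      },
      onError // called if the listener fails, e.g. on a permissions error
    );
}
```

For the question's setup, the every-minute job would write its results to one document (for example a hypothetical latest/summary document), and each open page would call watchDocument(db, "latest", "summary", render) once instead of invoking a Cloud Function every minute.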

Related

Calling an external API only when new data is available

I am serving my users data fetched from an external API. I don't know when this API will have new data, so what would be the best approach to handle that, using Node for example?
I have tried setInterval and node-schedule and got it working, but isn't that expensive for the CPU? For example, over a day I would hit this endpoint to check for new data every minute, but it might only have new data every five minutes or more.
The thing is, this external API isn't run by me. Is hitting it every minute the only way to check for updates? Is there any module that can do that in Node or any approach that fits better?
Use case 1: Call a weather API for every city in the country and only save data to my db when it is going to rain in a given city.
Use case 2: Send a notification to the user at the moment a given Philips Hue lamp is turned on, without having to hit the endpoint to check whether it is on.
I appreciate your time.
If this external API has no means of notifying you when there's new data, then the only thing you can do is to "poll" it to check for new data.
You will have to decide what an "efficient design" for polling is in your specific application and given the type of data and the needs of the client (what is an acceptable latency for new data).
You also need to be sure that your service is not violating any terms of service with your polling scheme or running afoul of rate limiting that may deny you access to the server if you use it "too much".
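One way to stay on the right side of rate limits is to back off when the API starts refusing requests. A sketch of exponential backoff, assuming the caller counts consecutive failed polls (for example HTTP 429 responses); nextPollDelay and the one-hour cap are illustrative choices, not something any particular API prescribes. If the API sends a Retry-After header, honoring it is the better signal.

```javascript
// Delay before the next poll: the normal interval while requests succeed,
// doubled for each consecutive failure, and never longer than maxMs.
function nextPollDelay(intervalMs, consecutiveFailures, maxMs = 60 * 60 * 1000) {
  const delay = intervalMs * 2 ** consecutiveFailures;
  return Math.min(delay, maxMs);
}

console.log(nextPollDelay(60000, 0)); // healthy: poll every minute
console.log(nextPollDelay(60000, 3)); // after 3 failures in a row
```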
Is hitting it every minute the only way to check for updates?
Unless the API offers some notification feature, there is no scheme other than polling at some interval. Polling every minute is fairly frequent. Do your clients really need information that is less than a minute old, or would it make no real difference if the information were as much as 5 minutes old?
For example, in your weather example, a client probably wouldn't need temperature updates more often than every 10-15 minutes.
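For the Philips Hue use case, polling can still feel like a notification if the poller remembers the previous state and only acts when it changes. A minimal sketch: createChangeDetector is a hypothetical helper, values are compared with ===, so it suits primitives such as a lamp's on/off flag as returned by whatever polling call you use.

```javascript
// Remember the last polled value per key and report only real changes.
// The first reading for a key is never reported, since there is nothing
// to compare it with yet.
function createChangeDetector() {
  const last = new Map();
  return function changed(key, value) {
    const seenBefore = last.has(key);
    const previous = last.get(key);
    last.set(key, value);
    return seenBefore && previous !== value;
  };
}

const lampChanged = createChangeDetector();
// Inside each poll: notify only when a lamp flips from off to on.
const isOn = true;
if (lampChanged("lamp-1", isOn) && isOn) console.log("lamp-1 was turned on");
```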
Is there any module that can do that in Node or any approach that fits better?
No. Not really. You'll probably just use some sort of timer (either repeated setTimeout() or setInterval()) in a node.js app to repeatedly carry out your API operations.
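A sketch of the repeated setTimeout() variant. The next request is scheduled only after the previous one has settled, so a slow API response never leads to overlapping requests. startPolling is a hypothetical helper and fetchLatest is a placeholder for whatever function calls the external API.

```javascript
// Poll fetchLatest every intervalMs, starting the next call only after the
// previous one has finished. Returns a function that stops the loop.
function startPolling(fetchLatest, intervalMs, onResult, onError = console.error) {
  let timer = null;
  let stopped = false;

  async function tick() {
    try {
      onResult(await fetchLatest());
    } catch (err) {
      // Errors from the request or from onResult are reported, and the
      // loop keeps going.
      onError(err);
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  tick();
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Usage might look like const stop = startPolling(() => fetch(apiUrl).then((r) => r.json()), 5 * 60 * 1000, saveIfRaining), where apiUrl and saveIfRaining are placeholders and the global fetch assumes Node 18 or later.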
Use case: Call a weather API for every city in the country and only save data to my db when it is going to rain in a given city.
Trying to pre-save every possible piece of data from an external API is probably a losing proposition. You're essentially trying to "scrape" all the data from the external API. That is likely against the terms of service and will likely also run afoul of rate limits. And, it's just not very practical.
Instead, you will probably want to fetch data on demand (when a client requests data for Phoenix, then and only then do you start collecting data for Phoenix). Once demand for a certain type of data (temperatures in a particular city) is established, you might pre-cache that data more regularly so you can notify clients of changes. If, after a while, no clients are asking for data for Phoenix, you stop requesting updates for Phoenix until a client establishes demand again.
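The demand-driven idea can be sketched as a small tracker that records when each city was last requested; a background poller would ask it which cities are still wanted on each round. DemandTracker and the idle window are hypothetical choices for illustration.

```javascript
// Remember when each key (e.g. a city name) was last requested by a client,
// so a background refresher only keeps polling data somebody still wants.
class DemandTracker {
  constructor(idleMs) {
    this.idleMs = idleMs;
    this.lastRequested = new Map();
  }

  // Call this whenever a client asks for data about `key`.
  touch(key, now = Date.now()) {
    this.lastRequested.set(key, now);
  }

  // Keys requested within the last idleMs. Idle keys are forgotten, so the
  // refresher stops polling them until a client asks again.
  activeKeys(now = Date.now()) {
    for (const [key, time] of this.lastRequested) {
      if (now - time > this.idleMs) this.lastRequested.delete(key);
    }
    return [...this.lastRequested.keys()];
  }
}
```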
I have tried setInterval and node-schedule and got it working, but isn't that expensive for the CPU? For example, over a day I would hit this endpoint to check for new data every minute, but it might only have new data every five minutes or more.
Making a remote network request is not a CPU intensive operation, even if you're doing it every minute. node.js uses non-blocking networking so most of the time during a network request, node.js isn't doing anything and isn't using the CPU at all. The only time the CPU would be briefly used is when you first send the API request and then when you receive back the result from the API call and need to process it.
Whether you really need to "poll" every minute depends on the data and the needs of the client. Ask yourself whether your app would work just as well if you checked for new data every 5 minutes.
The method I would use for updating would live outside the code, in a scheduled batch/PowerShell/bash file. In Windows you can schedule tasks based on time of day or on the duration since the last run, so you could run a simple command that stops your application for five minutes, runs npm update, and then restarts your application before closing the shell.
That way you stay out of your API code and keep code to a minimum, and if your code is part of the Node package being updated, it will already be in place when you make major application changes or need to take the server down for maintenance and low-level updates.
This is a lightweight solution, and it's a method I've used once or twice at my workplace. There are lots of options out there, and if this isn't what you're looking for I can keep looking for you.

What is the best way to keep a local copy of a Firebase Database on node.js

I have an app where I need to check people's posts constantly. I am trying to make sure that the server can handle more than 100,000 posts. I have tried to explain the program and list the issues I am worried about.
I am running a simple node.js program in my terminal that runs as Firebase admin and controls the Firebase Database. The program has no connection with clients (users); it just keeps a local copy of the database to check users' posts every 2-3 seconds. I keep the posts in a local hash by using on('child_added') to push each post into a posts hash, and likewise on('child_removed') and on('child_changed') to keep it up to date.
Are these functions able to handle more than 5 requests per second?
Is this the proper way of keeping data locally for faster processing (and not abusing Firebase limits)? I need to check every post on the platform every 2-3 seconds, so I am trying to keep a local copy of the posts data.
That local copy of the posts is looped through every 2-3 seconds.
If there are thousands of posts, will a simple array variable handle that load?
Second part of the program:
I run a for loop over the posts in a function, and I run that function every 2-3 seconds using setInterval(). The program needs not only to check newly added posts; it constantly needs to check all posts in the database.
If (specific condition for a post) => the program changes the state of the post
.on('child_changed') function => sends an API request to a website after that state change
Can this function run asynchronously? When it is called, it should not wait for the previous call to finish, because the old call is sending an API request that might not complete quickly. How can I make sure that .on('child_changed') doesn't miss a single change in the posts data?
The "Listen for Value Events" documentation shows how to observe changes: you use the .on() method.
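The .on() approach the question describes can be sketched as a Map kept in sync by the three child events. It assumes ref is a firebase-admin Realtime Database reference such as admin.database().ref('posts'), using the documented on(eventType, callback) method and the snapshot's key and val(); mirrorChildren is a hypothetical name. child_added fires once for every existing child when the listener is attached and again for each new one, which is what fills the Map initially. A Map with a few hundred thousand entries should be manageable in Node as long as the posts fit in memory.

```javascript
// Keep a local Map of child key -> value in sync with a Realtime Database
// list. `ref` is assumed to be e.g. admin.database().ref("posts").
function mirrorChildren(ref) {
  const posts = new Map();
  ref.on("child_added", (snap) => posts.set(snap.key, snap.val()));
  ref.on("child_changed", (snap) => posts.set(snap.key, snap.val()));
  ref.on("child_removed", (snap) => posts.delete(snap.key));
  return posts; // loop over this every few seconds instead of re-reading
}
```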
As for backing up your Realtime Database, you can export the data manually or, if you are on the paid plan, automate backups.
I don't understand why you would want to reinvent the wheel, so to speak, and have your server ping Firebase for updates. Simply use Firebase observers.
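On the question of whether the child_changed handler can run asynchronously: the listener can start the API request and return without awaiting it, so a slow request does not hold up the handling of later events. A sketch, where sendApiRequest is a hypothetical function returning a promise and ref is the same kind of Realtime Database reference as above; errors are caught per request so a failure does not become an unhandled rejection.

```javascript
// Start one API request per child_changed event without waiting for earlier
// requests to finish. sendApiRequest(key, value) is assumed to return a promise.
function forwardChanges(ref, sendApiRequest, onError = console.error) {
  ref.on("child_changed", (snap) => {
    // Not awaited: the listener returns immediately, so the next event is
    // handled even while this request is still in flight.
    sendApiRequest(snap.key, snap.val()).catch(onError);
  });
}
```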

nodejs - run a function at a specific time

I'm building a website where users enter input, and after a specific amount of time an algorithm has to run: it takes the users' input stored in the database, creates some results for them, and stores those results in the database as well. The problem is that in Node.js I can't figure out where and how to implement this algorithm so that it runs after a specific amount of time, and only once (every few minutes or seconds).
The app is built with Node.js and Express.
For example, let's say that I start the application; after 3 minutes the algorithm should run, take some data from the database and, once it has produced some output, store that in the database again.
What are the typical solutions for this (at least one is enough)? Thank you!
Let's say a user request saves a URL that should be crawled for its listed products.
One of the simplest approaches would be:
1. On each user request, create a record in a "tasks" table in the DB:
userId | urlToCrawl | dateAdded | isProcessing | ....
2. In the main Node app, run setInterval(findAndProcessNewTasks, 60000), so that every minute (or whatever interval you need) it picks up all tasks that are not currently being worked on (where isProcessing is false).
3. findAndProcessNewTasks queries the DB, sets isProcessing to true for every record that has not been processed yet, and runs your algorithm on it.
4. Once the algorithm has finished, it removes the record from tasks (or sets another field, such as "finished", to true).
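The steps above can be sketched with an in-memory array standing in for the tasks table. This is an illustration under assumptions: runAlgorithm is a hypothetical async function holding your algorithm, finished tasks are flagged rather than deleted, and with a real database the claim step (setting isProcessing) should be a single atomic update so two workers cannot pick up the same task.

```javascript
// One pass over the "tasks" table (here an array of task objects).
async function findAndProcessNewTasks(tasks, runAlgorithm) {
  // Pick up tasks nobody is working on that are not finished yet.
  const pending = tasks.filter((t) => !t.isProcessing && !t.finished);
  // Claim them before doing any work so the next interval tick skips them.
  for (const task of pending) task.isProcessing = true;

  const outcomes = await Promise.allSettled(
    pending.map(async (task) => {
      try {
        task.result = await runAlgorithm(task);
        task.finished = true;
      } finally {
        task.isProcessing = false; // a failed task becomes eligible again
      }
    })
  );
  // Number of tasks that completed successfully in this pass.
  return outcomes.filter((o) => o.status === "fulfilled").length;
}
```

It would be wired up as setInterval(() => findAndProcessNewTasks(tasks, runAlgorithm).catch(console.error), 60000). Because tasks are claimed before any work starts, a tick that fires while earlier tasks are still running simply skips them.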
Depending on the load and the number of tasks, it may make sense to run your algorithm in a separate Node app.
Typically you would have a message bus (Kafka, RabbitMQ, etc.), with the main app just sending events and worker node.js apps doing the actual job and inserting products into the DB.
This keeps the main app lightweight and allows the worker apps to scale.
From your question it's not clear whether you want to run the algorithm on the web server (perhaps processing input from multiple users) or on the client (processing the input from a particular user).
If the former, then use setTimeout(), or something similar, in your main javascript file that creates the web server listener. Your server can then be handling inputs from users (via the app listener) and in parallel running algorithms that look at the database.
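For the server-side case, a sketch of "run once after a delay". runOnceAfter is a hypothetical helper around setTimeout that also reports errors from an async job and lets you cancel the run. It could be called once where the server starts listening, for example app.listen(3000, () => runOnceAfter(3 * 60 * 1000, runAlgorithm)), with runAlgorithm standing in for your own function.

```javascript
// Run `job` once, delayMs after this call. Returns a function that cancels
// the run if it has not happened yet.
function runOnceAfter(delayMs, job, onError = console.error) {
  const timer = setTimeout(() => {
    // Wrapping in a promise chain also catches errors from async jobs.
    Promise.resolve().then(job).catch(onError);
  }, delayMs);
  return () => clearTimeout(timer);
}
```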
If the latter, then use setTimeout(), or something similar, in the javascript code that is being loaded into the user's browser.
You may actually need some combination of the above: code running on the server to periodically do some processing on a central database, and code running in each user's browser to periodically refresh the user's display with new data pulled down from the server.
You might also want to implement a WebSocket and JSON-RPC interface between the client and the server. Then, rather than having the client "poll" the server for the results of your algorithm, you can have the client listen for events arriving on the WebSocket.
Hope that helps!
If I understand you correctly, I would just send the data to the client side while rendering the page and store it in some hidden tag (like input type="hidden"). Then I would run a script on the client side with setTimeout to display the data to the user.

Where / When to close a MongoDB connection in a forEach iteration context

I have a node.js app running on Heroku, backed by a MongoDB database, which breaks down like this:
The Node app connects to the DB and stores the db and collection objects in "top-level" variables (not sure if "global" is the right word).
The app iterates through each document in the DB using the forEach() function of the Node MongoDB driver.
Each iteration sends the document id to another function that uses the id to access fields on that document and take actions based on that data. In this case, it's making requests against the Amazon and Walmart APIs to get updated pricing info. This function is also throttled so as not to make too many requests too quickly.
My question is this: how can I know it's safe to close the DB connection? My best idea is to get a count of the documents, multiply it by the number of external API hits per document, increment a variable each time an API transaction finishes, and close the connection once that variable reaches the expected total. This sounds so hackish that there has to be a better way. Any ideas?
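One common alternative to counting completed requests, sketched under assumptions: recent versions of the official Node MongoDB driver let a find() cursor be consumed with for await...of, so the loop can await the work for each document and close the connection in a finally block once the cursor is exhausted. client and collection stand for a connected MongoClient and one of its collections; updatePrices is a hypothetical async function that does the throttled Amazon/Walmart calls for one document and resolves only when they have finished.

```javascript
// Process every document, then close the connection exactly once, whether
// the loop completes or throws.
async function processAllThenClose(client, collection, updatePrices) {
  try {
    // Awaiting inside the loop handles one document at a time, so nothing
    // is still pending when the loop ends.
    for await (const doc of collection.find({})) {
      await updatePrices(doc._id);
    }
  } finally {
    await client.close();
  }
}
```

Handling documents one at a time is slower than firing everything in parallel, but it also acts as a simple throttle and removes the need to know the total number of API calls in advance.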

Is there a way to connect to a database via sockets and socket.io?

I am writing an application whereby some external module/component is updating a SQLite database with new data every few hundred milliseconds or so, and my job is to write an application that queries that data and broadcasts it over sockets every few hundred milliseconds as well.
So currently I'm doing something like this with node, express, and socket.io:
const timer = setInterval(function() {
  db.all('SELECT * FROM cache', function(err, rows) {
    if (err) return console.error(err);
    io.emit('data', rows);
  });
}, 400);
But I feel like there should be a more direct approach to this, whereby I can maintain a socket connection directly to the database, and listen for changes "live", rather than having to do blind queries (even if the data may not have changed), and emit.
Maybe this is not supported by SQLite (which is fine, I think I have some flexibility in the storage system I'm using), but is what I'm asking at all possible?
Note that I don't have control over the database updating process, so I can't just emit the data I'm about to store in the database. That whole process is a black box C program and I ONLY have access to the database itself.
What you're looking for is commonly called pub/sub (short for publish/subscribe). Clients waiting for data connect to a server and subscribe to the kinds of events they want to receive. The data originators also connect to this server and publish events. The event-based RPC that Socket.IO gives you is really similar to this: the clients set up handlers for certain types of events, and the server fires those events with the appropriate data.
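Inside a single Node process, the pattern can be sketched with the built-in EventEmitter: subscribers register for a topic and publishers emit to it without knowing who is listening. PubSub is a hypothetical class for illustration; across processes, a broker such as Redis pub/sub plays the hub's role, and Socket.IO's io.emit is the step that forwards messages on to browsers.

```javascript
const { EventEmitter } = require("node:events");

// A minimal in-process publish/subscribe hub.
class PubSub {
  constructor() {
    this.bus = new EventEmitter();
  }

  // Register a handler for a topic; returns a function that unsubscribes it.
  subscribe(topic, handler) {
    this.bus.on(topic, handler);
    return () => this.bus.off(topic, handler);
  }

  // Deliver payload to every current subscriber of the topic.
  // Returns false when nobody is subscribed.
  publish(topic, payload) {
    return this.bus.emit(topic, payload);
  }
}
```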
The problem is that pub/sub isn't typically implemented in a database (Redis is an exception), and SQLite certainly has no such capability. Since you can't modify the original application and only have access to the database file, there is no way to have changes pushed to you. What you need is to effectively make your server an adapter that polls the database and broadcasts messages.
I do see a couple of problems with your setup, though. The first is that you are querying the database every 400 milliseconds regardless of whether the previous query has finished. Don't do that. What if your query takes 500 milliseconds? Now you have a second query piling up. What if those two queries are now slow because they are both trying to run at the same time? Now you have 3, 4, 5, and then 100 queries piling up. Don't start your next query until the previous one is done. Check out an implementation of throttle for this.
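A minimal sketch of "don't start the next query until the previous one is done" that keeps the question's setInterval but skips a tick while a query is still running. It assumes db exposes the callback-style db.all(sql, callback) used in the question (as the sqlite3 package does); emitWithoutPileUp is a hypothetical name, and emit stands in for something like (rows) => io.emit('data', rows). This skip-if-busy guard is an alternative to the throttle suggestion; either way, at most one query is in flight.

```javascript
// Returns a tick function for setInterval that runs the query only when the
// previous run has finished. tick() returns false when it skipped a run.
function emitWithoutPileUp(db, sql, emit, onError = console.error) {
  let busy = false;
  return function tick() {
    if (busy) return false; // previous query still running: skip this tick
    busy = true;
    db.all(sql, (err, rows) => {
      busy = false;
      if (err) onError(err);
      else emit(rows);
    });
    return true;
  };
}
```

It would replace the original callback as setInterval(emitWithoutPileUp(db, 'SELECT * FROM cache', (rows) => io.emit('data', rows)), 400).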
The next problem is that you are blindly sending out all of the results to the client every time. I don't know what your application does, but I'm guessing that there is a chance for overlap from the previous query. Does your database have columns with timestamps? You could modify your query to use them. Or, modify your application to filter them.
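If the cache table does carry a timestamp, one way to filter on the application side is to remember the newest timestamp already sent and forward only rows newer than it. A sketch assuming a hypothetical updatedAt column; createNewRowFilter is an illustrative name. Rows that appear later with a timestamp equal to or older than the newest one already sent would be skipped, so this suits append-style data. Pushing the same condition into the query (WHERE updatedAt > ?, binding the last timestamp as a parameter) avoids reading the old rows at all.

```javascript
// Forward only rows whose timestamp is newer than anything forwarded before.
function createNewRowFilter(getTimestamp) {
  let highWater = -Infinity; // newest timestamp already sent
  return function newRows(rows) {
    const fresh = rows.filter((row) => getTimestamp(row) > highWater);
    for (const row of fresh) highWater = Math.max(highWater, getTimestamp(row));
    return fresh;
  };
}

const onlyNew = createNewRowFilter((row) => row.updatedAt);
// In the query callback: const fresh = onlyNew(rows); if (fresh.length) io.emit('data', fresh);
```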
