Node.js REST Client: Scaling the Data Collection - node.js

I have a scenario where my Node.js client collects data from a REST API.
Scenario: my API endpoint looks like this: http://url/{project},
where project is a parameter. The project names come from a database table.
Here is my procedure:
I get all the project names from the database into a list.
Using a loop, I call the REST endpoint for every project in the list.
My query: if I have a small number of projects in the database, this procedure works fine, but if I have around 1000 projects to collect, the requests take a long time and sometimes fail with timeout errors.
How can I scale this process so that it finishes collecting the data in a reasonable amount of time?
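For illustration, here is a minimal sketch of the described loop, plus one common way to bound the load by firing only a limited number of requests in parallel. getProjectsFromDb and fetchProject are hypothetical placeholders for the database read and the HTTP call; they are not from the question.

```javascript
// Hypothetical helpers: getProjectsFromDb() returns the project names,
// fetchProject(url) performs the HTTP GET and returns the parsed body.

// The procedure as described: one request at a time, slow for ~1000 projects.
async function collectSequentially() {
  const projects = await getProjectsFromDb();
  const results = [];
  for (const project of projects) {
    results.push(await fetchProject(`http://url/${project}`));
  }
  return results;
}

// One common way to speed this up: process the list in fixed-size batches
// so that at most batchSize requests are in flight at any time.
async function collectInBatches(batchSize = 20) {
  const projects = await getProjectsFromDb();
  const results = [];
  for (let i = 0; i < projects.length; i += batchSize) {
    const batch = projects.slice(i, i + batchSize);
    results.push(...(await Promise.all(
      batch.map(project => fetchProject(`http://url/${project}`))
    )));
  }
  return results;
}
```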

Related

NodeInvocationException: The Node invocation timed out after 60000ms

I have an ASP.NET Core web application with Angular 5. In my repository layer I have a simple LINQ query that gets data from a table. Everything works well until I change the query and join it to another entity to fetch data from two tables.
The join query gets data from the DB quickly and there is no delay.
And now when I run the app I get this error:
NodeInvocationException: The Node invocation timed out after 60000ms.
You can change the timeout duration by setting the
InvocationTimeoutMilliseconds property on NodeServicesOptions
When I run the API alone, it works well and returns JSON data without any problem.
Any help would be appreciated.

What is the best way to keep a local copy of the Firebase Database in Node.js?

I have an app where I need to check people's posts constantly. I am trying to make sure that the server handles more than 100,000 posts. I have tried to explain the program and number the issues I am worried about.
I am running a simple Node.js program in my terminal that runs as a Firebase admin controlling the Firebase Database. The program has no connectivity with clients (users); it just keeps the database locally to check users' posts every 2-3 seconds. I keep the posts in local hash variables by using on('child_added') to simply push each post into a posts hash, and likewise for on('child_removed') and on('child_changed').
Are these functions able to handle more than 5 requests per second?
Is this the proper way of keeping data locally for faster processing (and not abusing Firebase limits)? I need to check every post on the platform every 2-3 seconds, so I am trying to keep a local copy of the posts data.
That local copy of the posts is looped through every 2-3 seconds.
If there are thousands of posts, will a simple array variable handle that load?
Second part of the program:
I run a for loop that iterates through the posts inside a function, and I run that function every 2-3 seconds using setInterval(). The program needs to check not only newly added posts but all posts on the database, constantly.
If (specific condition for a post) => the program changes the state of the post
The on('child_changed') handler => sends an API request to a website after that state change
Can this function run asynchronously? When it is called, the function should not wait for the previous call to finish, because the old call is sending an API request and might not complete quickly. How can I make sure that on('child_changed') doesn't miss a single change to the posts data?
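For reference, here is a minimal sketch with firebase-admin of the setup described above: the three child listeners keep an in-memory copy of the posts, and a setInterval loop scans that copy every 2-3 seconds. The database URL, the post condition, and the state field are placeholders, not taken from the question.

```javascript
const admin = require('firebase-admin');

admin.initializeApp({
  credential: admin.credential.applicationDefault(),
  databaseURL: 'https://<your-project>.firebaseio.com' // placeholder
});

const posts = {};                          // local copy keyed by post id
const ref = admin.database().ref('posts');

// Keep the local copy in sync with the Realtime Database.
ref.on('child_added',   snap => { posts[snap.key] = snap.val(); });
ref.on('child_changed', snap => { posts[snap.key] = snap.val(); });
ref.on('child_removed', snap => { delete posts[snap.key]; });

// Scan the local copy every 2-3 seconds without extra database reads.
setInterval(() => {
  for (const [id, post] of Object.entries(posts)) {
    if (post.someCondition) {              // hypothetical condition
      // Writing the state change triggers child_changed for other listeners.
      ref.child(id).update({ state: 'processed' });
    }
  }
}, 2500);
```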
The Listen for Value Events documentation shows how to observe changes; namely, one uses the .on method.
In terms of backing up your Realtime Database, you simply export the data manually, or if you have a paid plan you can automate it.
I don't understand why you would want to reinvent the wheel, so to speak, and have your server ping Firebase for updates. Simply use Firebase observers.

Fetching bot answers from a database

I'm using Azure Cosmos DB with MongoDB for storing the answers that my Microsoft Bot Framework-based chatbot will give to different dialogs.
My issue is that I don't know if it's best to do a query for each response or do one large query to fetch everything in the DB once the code runs and store it in arrays.
The Azure Cosmos DB pricing uses the unit Request Units per second (RU/s).
In terms of cost and speed, I'm thinking of doing one query whenever the bot service is run (in my case, that would be when app.js is run on my Azure Web App).
This query fetches all the data in my database and stores the results in different arrays in my code. Inside my bot.dialog()s I will use these arrays to fetch the answer that I want the bot to return to the end user.
I would load all the data from the DB into the bot when the app starts up, and if you manipulate the data you can write it back into the DB when the bot shuts down. This would mean that you have one single big query at the beginning of your bot's life and another one at the end. But this also depends on the amount of memory your app has allocated and how big the DB is.
From a Cosmos DB perspective, fewer requests that yield larger datasets will typically be faster/cheaper in terms of RUs than more requests fetching smaller datasets. Round trips are expensive. But it depends on the complexity of the queries too - aggregation pipelines are more expensive than find() with filters. Everything else should be a client-side consideration.
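A minimal sketch of the single-big-query-at-startup approach, using the MongoDB Node.js driver against the Cosmos DB MongoDB API. The database, collection, and field names (botdb, answers, dialogId, text) are illustrative assumptions, not part of the original question.

```javascript
const { MongoClient } = require('mongodb');

// Connection string comes from configuration; names below are placeholders.
const uri = process.env.COSMOS_MONGO_CONNECTION_STRING;

let answersByDialog = {};   // in-memory lookup used by the bot.dialog() handlers

async function loadAnswers() {
  const client = await MongoClient.connect(uri);
  try {
    // One large query at startup: fetch every answer document at once.
    const docs = await client.db('botdb').collection('answers').find({}).toArray();
    // Group the answers by dialog id so each dialog can look them up locally.
    answersByDialog = {};
    for (const doc of docs) {
      (answersByDialog[doc.dialogId] = answersByDialog[doc.dialogId] || []).push(doc.text);
    }
  } finally {
    await client.close();
  }
}
```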

nodejs - run a function at a specific time

I'm building a website where some users enter input, and after a specific amount of time an algorithm has to run in order to take the users' input that is stored in the database and create some results for them, storing the results in the database as well. The problem is that in Node.js I can't figure out where and how I should implement this algorithm so that it runs after a specific amount of time and only once (every few minutes or seconds).
The app is built with Node.js and Express.js.
For example, let's say that I start the application and after 3 minutes the algorithm should run, take some data from the database, and once the algorithm has created some output, store it in the database again.
What are the typical solutions for that (at least one is enough)? Thank you!
Let's say you have a user request that saves a URL to crawl and gets the listed products.
One of the simplest ways would be to:
On each user request, create a record in a DB "tasks" table:
userId | urlToCrawl | dateAdded | isProcessing | ....
Then in your main Node app you have something like setInterval(findAndProcessNewTasks, 60000),
so it will pick up all tasks that are not currently being worked on (where isProcessing is false)
every 1 minute or whatever interval you need.
findAndProcessNewTasks will query the DB and run your algorithm for every record that is not processed yet,
and it will also set isProcessing to true.
Eventually, once the algorithm is finished, it will remove the record from tasks (or mark some other field like "finished" as true).
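A minimal sketch of that polling loop; the db query helper, the SQL placeholders, and runYourAlgorithm stand in for whatever database client and processing logic you actually use.

```javascript
// Hypothetical helpers: db.query(sql, params) is your DB client,
// runYourAlgorithm(task) crawls task.urlToCrawl and stores the products.
async function findAndProcessNewTasks() {
  // Grab tasks that nobody is working on yet.
  const tasks = await db.query(
    'SELECT * FROM tasks WHERE isProcessing = false AND finished = false'
  );
  for (const task of tasks) {
    await db.query('UPDATE tasks SET isProcessing = true WHERE id = ?', [task.id]);
    try {
      await runYourAlgorithm(task);
      await db.query('UPDATE tasks SET finished = true WHERE id = ?', [task.id]);
    } catch (err) {
      // Release the task so a later run can retry it.
      await db.query('UPDATE tasks SET isProcessing = false WHERE id = ?', [task.id]);
    }
  }
}

// Poll for new tasks every minute.
setInterval(findAndProcessNewTasks, 60000);
```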
Depending on the load and the number of tasks, it may make sense to process your algorithm in another Node app.
Typically you would have a message bus (Kafka, RabbitMQ, etc.), with the main app just sending events and worker Node.js apps doing the actual job and inserting products into the DB.
This would make the main app lightweight and allow scaling the worker apps.
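As a rough sketch of that split, here is what the publishing and worker sides could look like with RabbitMQ via amqplib; the queue name and processTask are illustrative placeholders, and Kafka would follow the same event-publishing idea with a different client.

```javascript
const amqp = require('amqplib');

// Main app: publish a task instead of processing it in-process.
async function publishTask(task) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('tasks', { durable: true });
  ch.sendToQueue('tasks', Buffer.from(JSON.stringify(task)), { persistent: true });
  await ch.close();
  await conn.close();
}

// Worker app (separate Node process): consume tasks and do the heavy work.
async function startWorker() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('tasks', { durable: true });
  ch.consume('tasks', async msg => {
    try {
      const task = JSON.parse(msg.content.toString());
      await processTask(task);   // placeholder: crawl the URL, insert products
      ch.ack(msg);               // acknowledge only after successful processing
    } catch (err) {
      ch.nack(msg);              // return the message to the queue on failure
    }
  });
}
```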
From your question it's not clear whether you want to run the algorithm on the web server (perhaps processing input from multiple users) or on the client (processing the input from a particular user).
If the former, then use setTimeout(), or something similar, in your main JavaScript file that creates the web server listener. Your server can then be handling inputs from users (via the app listener) and in parallel running algorithms that look at the database.
If the latter, then use setTimeout(), or something similar, in the JavaScript code that is loaded into the user's browser.
You may actually need some combination of the above: code running on the server to periodically do some processing on a central database, and code running in each user's browser to periodically refresh the user's display with new data pulled down from the server.
You might also want to implement a WebSocket and JSON-RPC interface between the client and the server. Then, rather than having the client poll the server for the results of your algorithm, you can have the client listen for events arriving on the WebSocket.
Hope that helps!
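To illustrate the server-side option, here is a minimal Express sketch that runs the algorithm once, 3 minutes after the server starts; runAlgorithm is a placeholder for the database work described in the question.

```javascript
const express = require('express');
const app = express();

// Placeholder for the real work: read the users' input from the database,
// compute the results, and write them back.
async function runAlgorithm() {
  // ...
}

app.listen(3000, () => {
  console.log('listening on port 3000');
  // Runs once, 3 minutes after startup, without blocking request handling.
  setTimeout(() => {
    runAlgorithm().catch(err => console.error('algorithm failed', err));
  }, 3 * 60 * 1000);
});
```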
If I understand you correctly, I would just send the data to the client side while rendering the page and store it in some hidden tag (like input type="hidden"). Then I would run a script on the server side with setTimeout to display the data to the client.

Azure Logic Apps - Timeout issue

I have created an Azure Logic App to pull data from a REST API and populate an Azure SQL Database to process some data and push the result to Dynamics 365. I have around 6000 rows from the REST API and I have created 2 logic apps: the first pulls the data as pages (each page having 10 records) and uses a Do Until loop to process each set. I'm calling logic app 2 from the Do Until loop and passing it the paged records, which it inserts into the SQL Database.
The issue I'm encountering is that the main logic app times out after 2 minutes. (It processes around 600 rows and then times out.)
I came across this article, which explains various patterns related to managing long-running processes.
https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-create-api-app
What would be the best approach to executing long-running tasks without timeout issues?
Your REST API should follow the async pattern by returning 202 with Retry-After and Location headers; see more at: https://learn.microsoft.com/azure/logic-apps/logic-apps-create-api-app
Or, your REST API can be of the webhook kind, so Logic Apps can provide a callback URL for you to invoke once the processing is completed.
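For the first suggestion, a minimal Express sketch of the 202 + Location/Retry-After polling contract that Logic Apps understands; the route names and the in-memory job store are illustrative assumptions, not part of the original question.

```javascript
const express = require('express');
const crypto = require('crypto');
const app = express();
const jobs = {}; // in-memory job status store (illustrative only)

app.post('/import', (req, res) => {
  const id = crypto.randomUUID();
  jobs[id] = { status: 'running' };
  startLongRunningImport(id);                 // kick off the long-running work
  res.status(202)
     .set('Location', `/import/${id}`)        // where Logic Apps should poll
     .set('Retry-After', '15')                // poll again in 15 seconds
     .end();
});

app.get('/import/:id', (req, res) => {
  const job = jobs[req.params.id];
  if (!job) return res.sendStatus(404);
  if (job.status === 'running') {
    // Still working: answer 202 again so Logic Apps keeps waiting.
    return res.status(202)
              .set('Location', `/import/${req.params.id}`)
              .set('Retry-After', '15')
              .end();
  }
  res.status(200).json(job.result);           // done: return the final result
});

async function startLongRunningImport(id) {
  // Placeholder for the real work (pull REST pages, insert into SQL, push to D365).
  jobs[id] = { status: 'done', result: { processed: 6000 } };
}

app.listen(3000);
```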
