Run a web scraper most efficiently in React / Node - node.js

I'm running a web scraper in my React (MERN stack) web app. I'm using the request-promise (rp) and cheerio libraries to fetch the URL/HTML.
I have this method run in componentWillMount() every time a user goes to page X. The array it fetches is around 80-150 elements long with 4-5 objects. But it doesn't seem very efficient to run it every time a user enters page X, so is there a better way to do it? Sometimes it takes a while before the array loads: anywhere from 5 seconds up to 30-40 seconds at most.
One option I wondered about is a fetch method that runs every 15 minutes or so (for the whole server) and posts the result to my MongoDB, so that the data is retrieved from there when a user enters page X instead. Is that possible in any way? Like an external method that runs without anyone being on the page?
Or is there a script I could run on my desktop every 15 minutes to push the data to the database?

Ended up using Heroku Scheduler to set up a cron job, works great.
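For anyone wondering what such a scheduled job might look like, here is a minimal sketch of a one-shot script that a scheduler could run every 15 minutes. It is not the code from the post: `scrapeItems` and `saveItems` are hypothetical placeholders for the existing rp/cheerio scraping code and the MongoDB write, and the `scrapedAt` field is an assumption added for illustration.

```javascript
// One run of the scrape job. A scheduler invokes this, so no user has to be
// on the page. `scrapeItems` and `saveItems` are hypothetical placeholders.
async function runScrapeJob({ scrapeItems, saveItems, now = () => new Date() }) {
  const startedAt = now();
  const items = await scrapeItems();

  // Skip the write when the scrape returned nothing, so that a failed
  // scrape does not replace previously stored data with an empty result.
  if (!Array.isArray(items) || items.length === 0) {
    return { saved: 0, startedAt };
  }

  // Stamp each item so the page can show how fresh the data is.
  await saveItems(items.map((item) => ({ ...item, scrapedAt: startedAt })));
  return { saved: items.length, startedAt };
}
```

The page then only reads the stored array from the database, which should be much faster than scraping on every visit.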

Related

How do I save data across a node.js server?

I'm using Heroku to host a node.js server that keeps a variable storing the total number of times users of the site have clicked something on it. On each click, the variable is increased by 1. However, Heroku puts the site to sleep after 15 minutes of inactivity, and everything is reset. I tried using node.js to write the value to a file, but the files seem to be reset as well. Does anyone know a way to keep the data saved even after Heroku declares the site inactive?
There is no way around it, since Heroku gets rid of files after inactivity. You need some external storage, such as a MongoDB instance set up somewhere else.

Logic apps - Get response time of a http request

I am trying to use Logic Apps to ping our website every 10 minutes. I would like to know how to get the response time of that call to make sure the website is not slow.
Currently I am doing this:
1. Recurrence (every 10 minutes)
2. Get current time
3. HTTP GET call
4. Get current time 2
5. Difference of (current time 2 - current time)
6. Condition to see if it is greater than the threshold
This does not look like a clean solution. I'm wondering if there is an easier way to get the time / latency of the HTTP call in step 3.
According to the official documentation, it is not possible to get the response time with the connector you're using. You would be better off using Azure Functions for that. More info:
https://learn.microsoft.com/en-us/azure/connectors/connectors-native-http
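As a rough idea of what the function body could do, here is a sketch in plain Node.js that times one call with a monotonic clock and compares the result with a threshold. It leaves out the Azure Functions wiring entirely, and `doRequest` is a hypothetical callback that performs the HTTP GET; none of these names come from the question.

```javascript
// Times a single async call and reports whether it exceeded the threshold.
// `doRequest` is a hypothetical function that performs the HTTP GET.
async function measureLatency(doRequest, thresholdMs) {
  // process.hrtime.bigint() is monotonic, so wall-clock adjustments
  // between the two readings do not affect the measurement.
  const start = process.hrtime.bigint();
  let ok = true;
  try {
    await doRequest();
  } catch (err) {
    // A failed request is still timed, but flagged as not ok.
    ok = false;
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { ok, elapsedMs, slow: elapsedMs > thresholdMs };
}
```

On a Node.js version with a global `fetch` (18 or later), a call might look like `measureLatency(() => fetch("https://example.com"), 2000)`, where the URL and the 2000 ms threshold are placeholders.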
You can use Azure Application Insights for this kind of situation; it's the best solution for it.
https://learn.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview

Run a schedule function for every user in node js

I need to find the best possible way to do the following task.
I have many different users (let's say over 500), and every user has a scheduled function that needs to run twice a day. But if a user's phone is off, that function of course won't run, since its code is written on the client side.
What I want to do now is run the scheduled function in the backend using Node.js, but I don't know how to run it for every user (note: every user has a different schedule). That's why I wrote it on the client side, but since the phone might be switched off, that approach is unreliable.
What should I do in this scenario? Any leads?
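One common pattern is to store each user's next run time on the server and have a single backend timer pick up whoever is due, instead of creating one timer per user. A minimal sketch, assuming each user record has hypothetical `nextRunAt` (epoch milliseconds) and `intervalMs` fields, and that `loadUsers`, `runTask` and `saveNextRun` are placeholders for your own database and task code:

```javascript
// Returns the users whose next scheduled run time has been reached.
function dueUsers(users, now) {
  return users.filter((user) => user.nextRunAt <= now);
}

// One pass of the scheduler: run the task for every due user, then move
// that user's next run forward by their own interval.
// `loadUsers`, `runTask` and `saveNextRun` are hypothetical placeholders.
async function tick({ loadUsers, runTask, saveNextRun, now = Date.now() }) {
  const due = dueUsers(await loadUsers(), now);
  for (const user of due) {
    await runTask(user);
    await saveNextRun(user, user.nextRunAt + user.intervalMs);
  }
  return due.length;
}
```

Calling `tick` from something like `setInterval` once a minute (or from a cron job) would be enough. Because the check is "next run time has been reached" rather than "the clock shows exactly this minute", a late or skipped pass only delays a run instead of losing it.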

Streaming data from node js controller to web page continuously

I have a very large Node.js controller that runs through huge tables and checks data against other huge tables. This takes quite a while to run.
Today my page only shows the final result after 20 minutes.
Is there any way to stream the result continuously from the controller to the web page? Like a real-time scrolling log of what's going on?
(Or is Socket.IO my only option?)
You can try node-cron: run the query repeatedly with different limits to fetch the data in pieces, and append each piece on the front-end side.
I am not sure whether this is a proper way to do it or not.
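Socket.IO is not the only option. Server-Sent Events (SSE) are a one-way alternative built on plain HTTP that suits a scrolling log. The sketch below uses only Node's built-in `http` module; `checkTables` is a hypothetical stand-in for the real controller logic, and the port is a placeholder.

```javascript
const http = require("http");

// Formats one SSE message. JSON keeps the payload on a single line, and
// the blank line after it marks the end of the event.
function sseMessage(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Sends each progress line to the client as soon as the work yields it.
// `work` is an async generator function producing log lines.
async function streamProgress(res, work) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  for await (const line of work()) {
    res.write(sseMessage({ line }));
  }
  res.end();
}

// Hypothetical stand-in for the long-running table checks.
async function* checkTables() {
  for (let i = 1; i <= 3; i++) {
    yield `checked batch ${i}`;
  }
}

// Wiring; listen() is left commented out so the sketch has no side effects.
const server = http.createServer((req, res) => streamProgress(res, checkTables));
// server.listen(3000);
```

In the browser, `new EventSource(url)` with an `onmessage` handler can append each `JSON.parse(event.data).line` to the page. Note that `EventSource` reconnects automatically when the stream ends, so the client should call `close()` once the job reports that it is done.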

ASP.NET Web API MVC service: return to page and keep processing asynchronously (multithreading)?

We have a Web API service running on a Windows ASP.NET MVC solution. There is a load method that takes about 40 minutes to complete and return its status to the calling page. During that time the browser window is tied up. What design options do we have if we want the web page to come back with "submitted" while the process continues to run to completion? I don't care if the page never shows "complete"; we can pull that from another status page.
I've done something similar in the past, even though in my case the delay was shorter - 40-50 seconds of loading of fresh data from multiple backend servers in a VPN. It was also in ASP.NET back then, but I believe that the approach is still feasible and you can get some ideas if I share my experience. I remember an old thread that I had favourited in the past and used the insight from it. You can check it out.
Here are some tips, but in short, because I don't remember the details anymore (excuse my google-assisted memory!):
You should start the task in a new thread and not wait for it in your main thread.
You should also make sure that the task is started only once and cannot be initiated an unlimited number of times by the user via refresh or via the UI. So you had better persist the state in the database, so that on refresh a new thread is created only if the database says that the task has not been executed recently and is not in progress.
Your page will be loaded and show its contents and you can display a .gif representing a progress bar, a loading wheel or something similar to the user.
The task you started will continue on the server. When it completes, you can push an update to the UI via ajax from within the code-behind to make the experience even smoother if you like.
On subsequent requests, you can just retrieve the state of your task from the database in order to display something like update completed at hh:mm:ss.
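The tips above can be sketched as follows. This is written in JavaScript to match the other threads on this page, not in ASP.NET, so treat it only as an illustration of the pattern: the new thread is replaced by a promise that is not awaited, and the database row is replaced by a hypothetical in-memory `taskState` object. It only guards against a task that is already in progress, not against one that ran recently.

```javascript
// In-memory stand-in for the database row that records the task state
// (an assumption; the answer suggests persisting this in the database).
const taskState = { status: "idle", finishedAt: null };

// Starts the long task unless one is already in progress, and returns
// immediately so the page can render "submitted".
function startLoadOnce(longTask) {
  if (taskState.status === "running") {
    return { started: false, status: taskState.status };
  }
  taskState.status = "running";
  taskState.finishedAt = null;

  // Deliberately not awaited: the work continues after the request returns.
  Promise.resolve()
    .then(longTask)
    .then(() => { taskState.status = "completed"; })
    .catch(() => { taskState.status = "failed"; })
    .finally(() => { taskState.finishedAt = new Date(); });

  return { started: true, status: "running" };
}

// What a separate status page would read on subsequent requests.
function getStatus() {
  return { ...taskState };
}
```

A refresh while the task is running gets `started: false` instead of launching a second copy, and the status page can show the completion time from `finishedAt`.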
Hope this helps you and I wish you the best of luck!
