Scrolling in browser causes slow fetch(url) responses - node.js

[context] - this turned out to be irrelevant; the issue at hand is a client-side thing
I'm experiencing some strange response times from my NodeJS/Express app.
According to the logs, the requests complete in 180-220 ms.
But from the web client's perspective I'm seeing very strange numbers: some are in the range of seconds, and the payload is not that big, around 1.5 kB.
I've disabled every feature I could suspect: sessions, AD authentication, etc.
To add to the confusion, the overhead is only there sometimes, maybe for 30% of the requests; the others complete in the same time range the server logs show.
I wish I could give more context but it's a really simple React frontend using Fetch to HTTP GET data from the /contacts endpoint.
Nothing more.
[EDIT 1]
This seems to be a client side thing.
The whole process is part of a virtual/infinite scroll React component.
If I scroll down using the down arrow key, response times stay normal.
If I scroll really fast using the scrollbar or touch, the response time goes up.
It's important to note that the services do not return more data; it is always 20 rows at a time, yet the response time increases.
So, does scrolling in a browser somehow prevent promises or other more native constructs from completing?
[EDIT 2]
Same behavior across Chrome, Safari, Firefox.
Scrolling fast seems to make the request unable to complete in a timely manner.
Upgrading to React 16 seems to have improved the issue, or maybe those were just false positives.
[EDIT 3]
Even when replacing the call to the backend with a static JSON service online, the behavior still persists, so this has nothing to do with Express or Node.js.
There is something odd about scrolling and React+Fetch.
[EDIT 4]
Client side code snippets:
Event listeners
this.scrollHandler = this.checkWindowScroll;
this.resizeHandler = this.checkWindowScroll;
window.addEventListener("scroll", this.scrollHandler, { passive: true });
window.addEventListener("resize", this.resizeHandler, { passive: true });
ScrollHandler
// check if we have scrolled to the bottom of the screen
checkWindowScroll = () => {
  if (this.state.loading) {
    return;
  }
  const trigger = 800;
  const pageBottom =
    window.document.body.getBoundingClientRect().height - trigger;
  const scrollBottom = window.pageYOffset + window.innerHeight;
  if (scrollBottom > pageBottom) {
    this.loadMore();
  }
};
Fetch Next Data
// responsible for fetching another chunk of data from a backend
async loadMore() {
  this.setState({ loading: true, error: undefined }); // begin load
  const items = await this.props.fetchData(
    this.search,
    this.state.items.length
  );
  this.setState(
    {
      loading: false,
      error: undefined,
      items: [...this.state.items, ...items] // append the new chunk
    },
    () => this.checkWindowScroll() // once state is updated, check again if we need more data
  );
}
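A minimal sketch of one way to rule out handler churn: throttle checkWindowScroll with requestAnimationFrame so it runs at most once per frame during fast scrolling (the rafPending flag is an illustrative addition, not something the component already has):
// Throttle the scroll handler to one invocation per animation frame.
this.rafPending = false;
this.scrollHandler = () => {
  if (this.rafPending) return;
  this.rafPending = true;
  window.requestAnimationFrame(() => {
    this.rafPending = false;
    this.checkWindowScroll();
  });
};
window.addEventListener("scroll", this.scrollHandler, { passive: true });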

Related

Progress bar for express / react communicating with backend

I want to make a progress bar that tells the user where in the process of fetching the API my backend is. But it seems like every time I send a response it stops the request. How can I avoid this, and what should I google to learn more? I didn't find anything online.
React:
const { data, error, isError, isLoading } = useQuery('posts', fetchPosts);
if (isLoading) {
  return <p>Loading...</p>;
}
return data && <p>{data}</p>;
Express:
app.get("api/v1/testData", async (req, res) => {
try {
const info = req.query.info
const sortByThis = req.query.sortBy;
if (info) {
let yourMessage = "Getting Data";
res.status(200).send(yourMessage);
const valueArray = await fetchData(info);
yourMessage = "Data retrived, now sorting";
res.status(200).send(yourMessage);
const sortedArray = valueArray.filter((item) => item.value === sortByThis);
yourMessage = "Sorting Done now creating geojson";
res.status(200).send(yourMessage);
createGeoJson(sortedArray)
res.status(200).send(geojson);
}
else { res.status(400) }
} catch (err) { console.log(err) res.status(500).send }
}
You can only send one response to a request in HTTP.
In case you want to have status updates using HTTP, the client needs to poll the server i.e. request status updates from the server. Keep in mind though that every request needs to be processed on the server side and will take resources away which are then not available for other (more important) requests from other clients. So don't poll too frequently.
If you want to support long-running operations over HTTP, have a look at the following API design pattern.
Alternatively, you could use a WebSocket connection to push updates from the server to the client. I assume your computation on the backend will not take minutes and you want to update the client in real time, so WebSockets will probably be the best option for you. A WebSocket connection, once established, has considerably less overhead than repeatedly sending HTTP requests/responses between client and server.
Have a look at this thread, which discusses the abovementioned and other possibilities.
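As a minimal sketch of the polling approach, assuming an in-memory job store (the /start and /status routes, the jobs object, and the status strings are illustrative; fetchData and createGeoJson are the question's functions):
const express = require('express');
const app = express();

// In-memory job store; a real app might use Redis or a database instead.
const jobs = {};

app.get('/api/v1/testData/start', (req, res) => {
  const jobId = Date.now().toString();
  jobs[jobId] = { status: 'Getting Data', result: null };
  // Kick off the long-running work without awaiting it.
  (async () => {
    const valueArray = await fetchData(req.query.info);
    jobs[jobId].status = 'Data retrieved, now sorting';
    const sortedArray = valueArray.filter((item) => item.value === req.query.sortBy);
    jobs[jobId].status = 'Sorting done, now creating geojson';
    jobs[jobId].result = createGeoJson(sortedArray);
    jobs[jobId].status = 'done';
  })().catch(() => { jobs[jobId].status = 'error'; });
  // Respond exactly once, immediately, with the job id.
  res.status(202).send({ jobId });
});

app.get('/api/v1/testData/status/:jobId', (req, res) => {
  const job = jobs[req.params.jobId];
  if (!job) return res.status(404).send();
  res.send(job); // the client polls this until job.status === 'done'
});

app.listen(3000);
The client calls /start once, stores the jobId, and then polls /status/:jobId on an interval (not too frequently, as noted above) until the status is 'done'.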

NodeJS, ExpressJS: Running functions in the background

I'm asking this question because I don't know what to look for right now, and my googling wasn't great so far.
I am making a Node.js/Express/SQL app that scrapes a website. It takes 30 to 120 seconds to scrape a whole category. How do I make that function run in the background without blocking the website? The frontend template engine is EJS. If it's not possible to do with EJS, which framework or library should I use? I imagine it working like this:
User goes to /scrape
Chooses a category and sends it to the server by clicking a button
Some container on /scrape gets greyed out with a rotating spinner, a percentage, or something similar
User can freely leave /scrape and click around the website, or just stay on /scrape waiting for the result
When the user comes back to /scrape the results are there; if he stayed, the results show up with or without reloading the page
A full answer to these questions would be very helpful, but even just keywords for me to look up would help a lot.
Sorry for my bad English.
For your case you could use Redis, or just store the scraped data in a data structure you like directly in Node.js (in my opinion, because of the categories, hashmaps (JS objects) are best here). The process would then look like this:
User goes to /scrape and selects a category
Backend checks if that category was already scraped (e.g. checks for the data in the hashmap, with the category name as key)
If the data exists (the key is defined), send it to the user. If the data isn't stored (key == undefined), send the user a message that the data is being scraped and run the scrape function in the background. The scrape function then scrapes the data and, when it is done, pushes the data into the hashmap under the category key. To avoid the same category being scraped twice at the same time, you can add a "pending" property to the hashmap entry. So when the user accesses the /scrape route, you check the hashmap: if the category key exists and pending is false, send the data; if it exists and pending is true, send a wait alert; if the key doesn't exist, start the scrape function and send a wait alert.
Additionally, to make the whole thing "live", you could use socket.io (https://socket.io/) to implement WebSockets. You could then push the scraped data to the user without the user having to reload the page to check whether the scrape process is done.
I made a little example that doesn't implement actual scraping but should make the whole logic here a little easier to understand. I also added some explanation to the code in the form of comments.
const express = require("express");
const app = express();

// the data hashmap
const data = {};

// scrape function
const scrape = async (id) => {
  // set pending to true to prevent multiple scrapes of the same category
  data[id] = { pending: true, data: {} };
  // this would be your scrape function; I used a promise here that
  // resolves after 5 seconds with a random number, just for simplicity
  const a = await new Promise((res, rej) => {
    setTimeout(() => { res(Math.floor(Math.random() * 1000)); }, 5000);
  });
  // once the data is scraped, set pending to false and add the data
  data[id].pending = false;
  data[id].data = { id: a };
};

// "scrape" route
app.get("/:id", async (req, res) => {
  const { id } = req.params; // id represents the category
  // check if the id (category) is not in the hashmap; if not, then
  // start the scrape process and send a wait alert
  if (data[id] == undefined) {
    scrape(id);
    res.send("scraping...");
  // if the data is already being scraped, send a wait alert;
  // the pending property prevents multiple people triggering
  // the scrape of the same category
  } else if (data[id].pending == true) {
    res.send("still scraping...");
  // lastly, if the data is defined and not pending, just send it
  } else {
    res.send(data[id].data);
  }
});

// to test this, go to the root with any id - string, number,
// whatever (e.g. /1337 or /helloworld) - wait for 5 seconds (or
// leave and come back after 5 seconds), refresh the page and you
// can see the random number. If you go to another route (e.g. /test)
// and come back, you can still see the data; if you wait 5 seconds
// and go back to /test, you can see its data too.
// You can also open multiple tabs at the same time, which shows the
// scraping is asynchronous: you don't have to wait for one category
// to finish scraping before scraping the next
app.listen(5000);
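And a minimal sketch of the socket.io wiring mentioned above, building on the example (the scrapeDone event name is illustrative): replace app.listen(5000) with an HTTP server that socket.io attaches to, and emit an event once a scrape finishes.
const http = require("http");
const { Server } = require("socket.io");

const server = http.createServer(app);
const io = new Server(server);

// at the end of scrape(), once data[id] is filled in, push the result:
//   io.emit("scrapeDone", { id, data: data[id].data });

server.listen(5000);
On the client, a socket.on("scrapeDone", ...) listener can then update the page without a reload.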

Capture requests (XHR, JS, CSS) from embedded iframes using devtool protocol

For context, I am developing a synthetic monitoring tool using Node.js and puppeteer.
For each step of a defined scenario, I capture a screenshot, a waterfall and performance metrics.
My problem is with the waterfall: I previously used puppeteer-har, but this package is not able to capture requests outside of a navigation.
Therefore I use this piece of code to capture all interesting requests:
const {harFromMessages} = require('chrome-har');
// Event types to observe for waterfall saving (probably overkill, I just set all events of Page and Network)
const observe = [
'Page.domContentEventFired',
'Page.fileChooserOpened',
'Page.frameAttached',
'Page.frameDetached',
'Page.frameNavigated',
'Page.interstitialHidden',
'Page.interstitialShown',
'Page.javascriptDialogClosed',
'Page.javascriptDialogOpening',
'Page.lifecycleEvent',
'Page.loadEventFired',
'Page.windowOpen',
'Page.frameClearedScheduledNavigation',
'Page.frameScheduledNavigation',
'Page.compilationCacheProduced',
'Page.downloadProgress',
'Page.downloadWillBegin',
'Page.frameRequestedNavigation',
'Page.frameResized',
'Page.frameStartedLoading',
'Page.frameStoppedLoading',
'Page.navigatedWithinDocument',
'Page.screencastFrame',
'Page.screencastVisibilityChanged',
'Network.dataReceived',
'Network.eventSourceMessageReceived',
'Network.loadingFailed',
'Network.loadingFinished',
'Network.requestServedFromCache',
'Network.requestWillBeSent',
'Network.responseReceived',
'Network.webSocketClosed',
'Network.webSocketCreated',
'Network.webSocketFrameError',
'Network.webSocketFrameReceived',
'Network.webSocketFrameSent',
'Network.webSocketHandshakeResponseReceived',
'Network.webSocketWillSendHandshakeRequest',
'Network.requestWillBeSentExtraInfo',
'Network.resourceChangedPriority',
'Network.responseReceivedExtraInfo',
'Network.signedExchangeReceived',
'Network.requestIntercepted'
];
At the start of the step:
// list of events for converting to HAR
const events = [];
client = await page.target().createCDPSession();
await client.send('Page.enable');
await client.send('Network.enable');
observe.forEach(method => {
  client.on(method, params => {
    events.push({ method, params });
  });
});
At the end of the step:
waterfall = await harFromMessages(events);
It works well for navigation events, and also for navigation inside a web application.
However, the web application I'm trying to monitor has iframes with the main content.
I would like to see the iframes' requests in my waterfall.
So a few questions:
Why doesn't Network.responseReceived (or any other event) capture these requests?
Is it possible to capture such requests?
So far I've read the DevTools protocol documentation and found nothing I could use.
The closest to my problem I found is this question:
How can I receive events for an embedded iframe using Chrome Devtools Protocol?
My guess is that I have to enable the Network domain for each iframe I encounter.
I haven't found any way to do this. If there is a way to do it with the DevTools protocol, I should have no problem implementing it with Node.js and puppeteer.
Thanks for your insights!
EDIT 18/08:
After more searching on the subject, mostly about out-of-process iframes, lots of people on the internet point to this response:
https://bugs.chromium.org/p/chromium/issues/detail?id=924937#c13
The answer in question states:
Note that the easiest workaround is the --disable-features flag.
That said, to work with out-of-process iframes over DevTools protocol,
you need to use Target [1] domain:
Call Target.setAutoAttach with flatten=true;
You'll receive Target.attachedToTarget event with a sessionId for the iframe;
Treat that session as a separate "page" in chrome-remote-interface. Send separate protocol messages with additional sessionId field:
{id: 3, sessionId: "", method: "Runtime.enable", params: {}}
You'll get responses and events with the same "sessionId" field, which means they are coming from that frame. For example:
{sessionId: "", method: "Runtime.consoleAPICalled", params: {...}}
However, I'm still not able to implement it.
I'm trying this, mostly based on puppeteer:
const events = [];
const targets = await browser.targets();
const nbTargets = targets.length;
for (var i = 0; i < nbTargets; i++) {
  console.log(targets[i].type());
  if (targets[i].type() === 'page') {
    client = await targets[i].createCDPSession();
    await client.send("Target.setAutoAttach", {
      autoAttach: true,
      flatten: true,
      windowOpen: true,
      waitForDebuggerOnStart: false // is set to false in pptr
    });
    await client.send('Page.enable');
    await client.send('Network.enable');
    observe.forEach(method => {
      client.on(method, params => {
        events.push({ method, params });
      });
    });
  }
}
But I still don't get my expected output for navigation in a web application inside an iframe.
However, I am able to capture all the requests during the step where the iframe is loaded.
What I miss are requests that happen outside of a proper navigation.
Does anyone have an idea about how to integrate the Chromium response above into puppeteer? Thanks!
I was looking on the wrong side all this time.
The Chrome network events are correctly captured, as I would have seen earlier if I had checked the "events" variable earlier.
The problem comes from the "chrome-har" package that I use in:
waterfall = await harFromMessages(events);
The package expects the page and iframe main events to be present in the same batch of events as the requests. Otherwise the request "can't be mapped to any page at the moment".
Since some steps of my scenario are navigations within the same web application (= no navigation event), I didn't have these events, so chrome-har couldn't map the requests and therefore produced an empty .har.
Hope it can help someone else; I messed up the debugging on this one...
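To make the takeaway concrete, a minimal sketch (the URL and the event list are illustrative): attach the CDP listeners before the first navigation, so the Page.* lifecycle events chrome-har needs end up in the same events array as the Network.* events.
const puppeteer = require('puppeteer');
const { harFromMessages } = require('chrome-har');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const client = await page.target().createCDPSession();
  await client.send('Page.enable');
  await client.send('Network.enable');

  const events = [];
  const observe = [
    'Page.frameStartedLoading', 'Page.frameNavigated', 'Page.loadEventFired',
    'Network.requestWillBeSent', 'Network.responseReceived',
    'Network.loadingFinished', 'Network.loadingFailed', 'Network.dataReceived'
  ];
  observe.forEach(method =>
    client.on(method, params => events.push({ method, params }))
  );

  // Listeners are attached BEFORE the navigation, so the Page.* events
  // that chrome-har uses to map requests to pages are in the same batch
  // as the Network.* events.
  await page.goto('https://example.com');
  const har = harFromMessages(events);
  console.log(Object.keys(har.log));

  await browser.close();
})();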

Chain of endpoints in Node and Express: how to prevent one of them from stalling the whole series?

On one page I have to get information from 8 different endpoints. 2 of them are outside of my application and sometimes they cause a delay in displaying data. The web browser waits until all the data is processed. Since they're outside of my app I can't refactor them to make them faster, but I still need to show the information they provide. In addition, sometimes one of them returns nothing; if so, I show default data to the user. The waiting time hurts the user experience.
I'm using promises to call these endpoints. Below is part of the code snippet that I am using.
The code is working fine. The issue is the delay.
First, here is the array that contains all the services that I need to process:
var requests = [{
  // 0
  url: urlLocalApi + '/endpointURL_1/',
  headers: {
    'headers': 'apitoken'
  },
}, {
  // 1
  url: urlLocalApi + '/endpointURL_2/',
  headers: {
    'headers': 'apitoken'
  },
}];
The code for this array is encapsulated in this method:
const requests = homePageFunctions.createRequest();
Now, here is how the data is processed. I am using both 'request-promise' and 'bluebird', plus a personal logger to check that everything goes fine.
const Promise = require("bluebird");
const request = require('request-promise');

var viewsHelper = {
  getPageData: function (requests) {
    return Promise.map(requests, function (obj) {
      return request(obj).then(function (body) {
        AppLogger.log(`Endpoint parsed`, statusLogger.infodate);
        return JSON.parse(body);
      });
    });
  }
};
module.exports = viewsHelper;
Here is how I call it:
viewsHelper.getPageData(requests)
  .then(results => {
    var output = [];
    for (var i = 0; i < results.length; i++) {
      output.push(results[i]);
    }
    // render data
    res.render('homepage/index', output);
    AppLogger.log(`PageData is rendered`, statusLogger.infodate);
  })
  .catch(err => {
    console.log(err);
  });
Note that each index of the "output" array contains the data returned by one endpoint.
The problem here is:
If any of the endpoints takes long, the entire chain is held up, even for the endpoints that have already completed. The web page waits on a blank screen.
How can I prevent this behavior?
That is an interesting question, but I have questions of my own in order to answer it effectively.
You have a Node server and a client (HTML/JS).
You have 8 endpoints, 2 of which are slow because you don't have control over them.
Is the client (page) aware of the 8 endpoints, i.e. do you make 8 calls every time you reload the page?
OR
Does the page make one request to your Node.js server, which synchronously calls the 8 endpoints?
If it is 1, then lazy loading will work easily for you, since the page is making the requests.
If it is 2, lazy loading will only work on the server side; the client will still be blocked, because it doesn't know (or care) how you load your data. The page made one request and it is blocked waiting for that request.
Obviously each method has pros and cons.
One way you can solve this is to asynchronously call those endpoints on the Node side and cache the results, so when the page makes its one request you already have the data ready.
Again, we know very little about the situation; there are many ways to solve this.
Hope this helps
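Building on the question's own getPageData, a minimal sketch of one way to keep a slow or empty endpoint from holding up the rest: give each call a timeout and a per-endpoint default (timeoutMs and defaults are illustrative parameters; request-promise returns Bluebird promises, which support .timeout):
const Promise = require('bluebird');
const request = require('request-promise');

var viewsHelper = {
  // defaults[i] is the fallback data for requests[i]
  getPageData: function (requests, timeoutMs, defaults) {
    return Promise.map(requests, function (obj, i) {
      return request(obj)
        .timeout(timeoutMs)                          // reject if the endpoint is too slow
        .then(function (body) { return JSON.parse(body); })
        .catch(function () { return defaults[i]; }); // fall back to default data
    });
  }
};
Promise.map then resolves as soon as every endpoint has either answered or timed out, so the render is never blocked indefinitely by the two external services.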

Node.js: Multiple very heavy requests at the same time, single response to all requests

I am sorry that I can't come up with a better title.
I always have this problem (when coding in Node.js and also in Python), but I think my solution is kind of dirty.
I am here to seek a better solution for this problem.
Here is the scenario:
Your server does a very, very heavy task upon a special HTTP request (like generating a browser screenshot for a URL, or generating a game server banner with statistics). Whoever makes an HTTP request to your server will get the same response. The response will be cached for a long time.
For example, in the screenshot-generating HTTP request, your server is expected to spawn a phantomjs instance, capture the screenshot, save it and cache it for a long time, then respond with the captured PNG. HTTP requests after this one should hit the cache.
The pseudo code for this scenario:
server.get(":urlname.png", function(req, res, next) {
var cached = cache.get(req.params_urlname);
if (cached) {
res.send(cached);
return;
}
// This will take very long time
generateScreenshot(req.params_urlname, function(pngData) {
cache.set(req.params_urlname, pngData, LONG_TIME);
res.send(cached);
});
});
Here is the problem:
Imagine that you have a screenshot-generating URL
(http://yourserver.com/generate-screenshot/google.png). The screenshot
is not generated nor cached yet.
You posted the URL in a very popular forum, and now there are 1000 HTTP requests to that URL at the same time! It means that your server will spawn 1000 phantomjs instances, and all of them together will generate a screenshot of google.com at the same time, which is crazy!
In other words, the heavy function should be executed only once to populate the cache.
My current code solution to the problem:
var pendingResponse = {};

server.get(":urlname.png", function(req, res, next) {
  var cached = cache.get(req.params_urlname);
  if (cached) {
    res.send(cached);
    return;
  }
  // The screenshot is currently generating for another request. Let's mark this response as pending.
  if (req.params_urlname in pendingResponse) {
    pendingResponse[req.params_urlname].push(res);
    return;
  }
  // The screenshot needs to be generated now. Let's mark the future responses as pending.
  pendingResponse[req.params_urlname] = [];
  // This will take a very long time
  generateScreenshot(req.params_urlname, function(pngData) {
    cache.set(req.params_urlname, pngData, LONG_TIME);
    res.send(pngData);
    // Let's respond to all the pending responses with the PNG data as well.
    for (var i in pendingResponse[req.params_urlname]) {
      var pRes = pendingResponse[req.params_urlname][i];
      pRes.send(pngData);
    }
    // No longer mark the future responses as pending.
    delete pendingResponse[req.params_urlname];
  });
});
This solution works. However, I consider it dirty, because it is not reusable at all. Also, I think it may cause a resource leak. Is there any better solution / library?
Here's a proof-of-concept server doing this result caching using the memoizee package (it not only removes the need to track in-progress computations yourself, it also lets you remove the separate "cache" altogether):
var express = require('express');
var memoize = require('memoizee');

function longComputation(urlName, cb) {
  console.log('called for ' + urlName);
  setTimeout(function () {
    console.log('done for ' + urlName);
    cb();
  }, 5000);
}

var memoizedLongComputation = memoize(longComputation, { async: true, maxAge: 20000 });

var app = express();
app.get('/hang/:urlname', function (req, res, next) {
  memoizedLongComputation(req.params.urlname, function () {
    res.send('hang over');
  });
});
app.listen(3000);
Here we cache the result for 20 seconds.
When I start the server and then run in the shell
for i in `seq 1 10`; do curl http://localhost:3000/hang/url1; done
(or just open several browser tabs and quickly navigate them all to http://localhost:3000/hang/url1), I see one "called for url1" message and, 5 s later, one "done for url1" message in the console, meaning only one "real" longComputation call was made. If I repeat it shortly after (less than 20 s later), there are no additional messages, and results are returned instantaneously, because they are cached. If I repeat the command later (more than 20 s later), there's again only one call.
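The same deduplication can also be sketched without a library by caching the in-flight promise itself, so concurrent requests for the same URL share one computation (this follows the question's pseudo naming; cache eviction is omitted for brevity):
var inFlight = {};

function generateScreenshotOnce(urlName) {
  if (!inFlight[urlName]) {
    // Store the promise immediately, so requests arriving while the
    // screenshot is being generated all await the same promise.
    inFlight[urlName] = new Promise(function (resolve) {
      generateScreenshot(urlName, resolve);
    });
  }
  return inFlight[urlName];
}

server.get(":urlname.png", function (req, res, next) {
  generateScreenshotOnce(req.params_urlname).then(function (pngData) {
    res.send(pngData);
  });
});
Because the resolved promise stays in the map, it doubles as a permanent cache; a real implementation would evict entries after some maxAge, which is exactly what memoizee handles above.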
