Solr ECONNRESET during load tests with Node.js - node.js

I tried to create some load on our Solr server with a simple Node.js script that executes requests inside a bluebird Promise.map. When I drive this up to about 1000 "parallel" requests, Solr starts to close connections and I get "ECONNRESET" errors in Node.js.
I'm surprised, since I would assume that Solr (or, more precisely, Jetty) should be able to handle this number of requests. I don't see any indication of errors in the Solr log.
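For reference, a minimal sketch of the kind of load script described above (the Solr URL, query, request count, and concurrency value are placeholders, not the actual script):

// load-test.js - rough sketch of a bluebird Promise.map load generator
const Promise = require('bluebird');
const axios = require('axios');

const SOLR_URL = 'http://localhost:8983/solr/mycore/select'; // placeholder core
const TOTAL_REQUESTS = 1000;

Promise.map(
  new Array(TOTAL_REQUESTS).fill(null),
  () => axios.get(SOLR_URL, { params: { q: '*:*' } }),
  { concurrency: 1000 } // at this concurrency the ECONNRESET errors start to appear
)
  .then(responses => console.log(`completed ${responses.length} requests`))
  .catch(err => console.error(err.code, err.message)); // e.g. ECONNRESET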
1.) Should Solr / Jetty be able to handle this?
2.) Would it be expected that "ECONNRESET" errors get more frequent if Solr has to process multiple heavy queries?
3.) If 1. and 2. shouldn't be an issue, are there any suggestions as to why this happens?
Thanks a lot!

Related

Fetch data from an external API and populate the database every minute

I would like to fetch data from an external API with a limited request rate and populate my database. My concern is more about the architecture, language, and tools to use. I would like to get the big picture in terms of performance and good practice.
I made a cron job with Node.js and Express that runs every minute and populates my database, and it works. On the same server I created some routes to be called by the client.
What would be better than running the cron inside Node.js? I know I can also set up a cron under Linux that calls a script, whether it's Python or Node.js. But what would be good practice, especially if I want several cron jobs instead of a single one?
Should I separate my cron into another instance so it doesn't block any requests from clients? If my server is already busy retrieving data from the external API while someone calls a route on the same server, will that increase the latency?
Are there any tools to monitor my tasks instead of using logs?
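For context, a minimal sketch of the in-process scheduler described above (node-cron, the API URL, and the insert helper are illustrative, not the asker's actual code):

// scheduler.js - fetch from the external API every minute and store the result
const cron = require('node-cron');
const axios = require('axios');

async function saveToDatabase(items) {
  // placeholder: replace with the real insert logic (pg, mongoose, ...)
  console.log(`would insert ${items.length} items`);
}

// '* * * * *' fires at the start of every minute
cron.schedule('* * * * *', async () => {
  try {
    const { data } = await axios.get('https://api.example.com/items'); // placeholder API
    await saveToDatabase(data);
  } catch (err) {
    console.error('cron run failed:', err.message);
  }
});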
As far as I know, Node.js handles a large number of requests better than some other servers, but if you are able to change the runtime you could give https://bun.sh/ a chance.
You can also try multithreading in Node.js; it can be more affordable and easy:
https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js
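A minimal sketch of the worker_threads approach that tutorial covers (the fib workload is just a stand-in for CPU-heavy work):

// worker-demo.js - offload blocking work so the event loop stays free for HTTP requests
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // main thread: spawn a worker and wait for its result
  const worker = new Worker(__filename, { workerData: { n: 40 } });
  worker.on('message', result => console.log('fib(40) =', result));
  worker.on('error', err => console.error(err));
} else {
  // worker thread: run the blocking computation here
  const fib = n => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData.n));
}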

Postgresql IPC: MessageQueueSend delaying queries from nodejs backend

I am testing PostgreSQL with a Node.js backend server, using the pg npm module to query the database. The issue I am having is that when I run a particular query directly against the Postgres table using the query tool in pgAdmin 4, the data is fetched within 5 seconds. But when the same query is requested from the backend through my Node.js server, the work is split between parallel workers and a client backend waiting on IPC: MessageQueueSend, and it runs for almost 17 minutes before returning the data.
I can't understand why the same query is fast in the query tool but delayed when it comes from my server. Is there a way to raise the priority of queries coming from the backend so they run the way they do inside pgAdmin? I also noticed in pg_stat_activity that there is an application value for the query when it comes from the query tool, but when the same query comes from the Node.js server the application value is null. I don't understand why; I have been searching every community for an answer for the past 5 days and found nothing. Any help will be appreciated. Thanks in advance.
I tried running the query from the backend, but it is split across IPC processes and the result comes back after 17 minutes; the same query takes only 5 seconds to return a result inside the pgAdmin query tool.
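A minimal sketch of how the two details mentioned above can be exercised from node-postgres: setting an application_name so backend sessions are identifiable in pg_stat_activity, and disabling parallel workers for one session to compare plans. The connection string is a placeholder, and whether application_name is accepted directly in the config depends on the pg version; otherwise it can go into the connection string.

// db.js - tag backend sessions and optionally compare timings without parallel workers
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // placeholder
  application_name: 'node-backend',           // fills pg_stat_activity.application_name
});

async function runQuery(sql, params) {
  const client = await pool.connect();
  try {
    // experiment only: plan this session's queries without parallel workers,
    // to compare against the pgAdmin timing
    await client.query('SET max_parallel_workers_per_gather = 0');
    return await client.query(sql, params);
  } finally {
    client.release();
  }
}

module.exports = { runQuery };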

How many connections could node's mongoose and mongodb themselves handle? Will the server crash?

I was thinking about building some kind of API on Node.js with Mongoose. I read that Mongoose uses one connection per app.
But let's say we have 300,000 users joining a room to answer some questions (in real time): will Mongoose/MongoDB handle it? Will the server itself even handle it?
Thinking on the database side only:
The mongod executable has a parameter (--maxConns) for setting the maximum number of connections; prior to v2.6 there was a hard limit on it. Now, as the docs say: "This setting has no effect if it is higher than your operating system’s configured maximum connection tracking threshold" (see the Linux documentation for that threshold).
Besides that, you MUST consider a sharded cluster for this kind of load.
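On the client side, a minimal sketch (illustrative URI and numbers): Mongoose keeps a pool of connections per mongoose.connect() call, and the pool size is configurable; with the MongoDB 4.x+ driver the option is maxPoolSize (older drivers called it poolSize):

// app.js - one mongoose connection object, backed by a pool of sockets
const mongoose = require('mongoose');

mongoose
  .connect('mongodb://localhost:27017/quiz', { maxPoolSize: 100 })
  .then(() => console.log('connected, pool of up to 100 sockets'))
  .catch(err => console.error('connection failed:', err.message));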

Where to find what queries are hitting the Gremlin server via gremlin-javascript

I am using the gremlin-javascript module for Node.js to query a Titan database. Everything is working fine, but I want to monitor what is actually hitting the Gremlin server, plus anything else I can learn about each query. I already checked the gremlin-server log in the logs folder inside the Titan folder; I can't find anything of use in those logs. Any help in this regard will be extremely useful. Thanks.
For a client-side solution with gremlin-javascript, there is currently no quick and easy way to log outgoing queries or protocol messages sent to Gremlin Server to the console.
You could either:
Implement your own function that wraps the Gremlin client methods you call (typically client.execute()) and logs their arguments. If you're on Node.js v6+, this could be a nice use case for an ES2015 Proxy object (see the sketch at the end of this answer). This is the safest, non-intrusive approach.
Monkey-patch the client.prototype.messageStream method and log its parameters. As of v2.3.2, this low-level method gets called whether you're doing client.execute() or client.stream(). This is riskier and trickier.
Quick and dirty: edit the source code in ./node_modules/gremlin/lib/GremlinClient.js and add this after line 405 (prototype.messageStream definition):
console.log('query:', script);
console.log('params:', bindings);
There's currently an open issue about logging incoming messages, but this could be extended to include outgoing messages as well (queries with parameters, down to protocol messages).
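A rough sketch of the Proxy-based wrapper suggested in the first option above (the createClient arguments and the query are illustrative; the logged keys mirror the quick-and-dirty snippet):

// logging-client.js - wrap client.execute() so every query and its bindings get logged
const gremlin = require('gremlin');

const client = gremlin.createClient(8182, 'localhost');

const loggingClient = new Proxy(client, {
  get(target, prop) {
    if (prop === 'execute') {
      return (script, bindings, ...rest) => {
        console.log('query:', script);
        console.log('params:', bindings);
        return target.execute(script, bindings, ...rest);
      };
    }
    const value = target[prop];
    return typeof value === 'function' ? value.bind(target) : value;
  },
});

// used exactly like the original client:
loggingClient.execute('g.V().has("name", name)', { name: 'alice' }, (err, results) => {
  if (err) return console.error(err);
  console.log(results.length, 'results');
});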

Ektorp querying performance against CouchDB is 4x slower when initiating the request from a remote host

I have a Spring MVC app running under Jetty. It connects to a CouchDB instance on the same host using Ektorp.
In this scenario, once the Web API request comes into Jetty, I don't have any code that connects to anything not on the localhost of where the Jetty instance is running. This is an important point for later.
I have debug statements in Jetty to show me the performance of various components of my app, including the performance of querying my CouchDB database.
Scenario 1: When I initiate the API request from localhost, i.e. I use Chrome to go to http://localhost:8080/, my debug statements indicate a CouchDB performance of X.
Scenario 2: When I initiate the exact same API request from a remote host, i.e. I use Chrome on that host to go to http://<jetty-host>:8080/, my debug statements indicate a CouchDB performance of 4X.
It looks like something is causing the connection to CouchDB to be much slower in scenario 2 than scenario 1. That doesn't seem to make sense, since once the request comes into my app, regardless of where it came from, I don't have any code in the application that establishes the connection to CouchDB differently based on where the initial API request came from. As a matter of fact, I have nothing that establishes the connection to CouchDB differently based on anything.
It's always the same connection (from the application's perspective), and I have been able to reproduce this issue 100% of the time with a Jetty restart in between scenario 1 and 2, so it does not seem to be related to caching either.
I've gone fairly deep into StdCouchDbConnector and StdHttpClient to try to figure out if anything is different in these two scenarios, but cannot see anything different.
I have added timers around the executeRequest(HttpUriRequest request, boolean useBackend) call in StdHttpClient to confirm this is where the delay is happening and it is. The time difference between Scenario 1 and 2 is several fold on client.execute(), which basically uses the Apache HttpClient to connect to CouchDB.
I have also tried always using the "backend" HttpClient in StdHttpClient, just to take Apache HTTP caching out of the equation, and I've gotten the same results as Scenarios 1 and 2.
Has anyone run into this issue before, or does anyone have any idea what may be happening here? I have gone all the way down to org.apache.http.impl.client.DefaultRequestDirector to try to see if anything was different between scenarios 1 and 2, but couldn't find anything ...
A couple of additional notes:
a. I'm currently constrained to a Windows environment in EC2, so instances are virtualized.
b. Scenarios 1 and 2 give the same response time when the underlying instance is not virtualized. But see a - I have to be on AWS.
c. I can also reproduce 4X slower performance, similar to scenario 2, with this third scenario: instead of making the http://localhost:8080/ request with Chrome, I make it with Postman, which is a Chrome application. Using Postman from the Jetty instance itself, I can reproduce the 4X slower times.
The only difference I see in c. above is that the request headers in Chrome's developer tools indicate a Remote Address of [::1]:8080. I don't have any way to set that through Postman, so I don't know if that's the difference maker. And if it were, first I wouldn't understand why. And second, I'm not sure what I could do about it, since I can't control how every single client is going to connect to my API.
All theories, questions, ideas welcome. Thanks in advance!
