EMFILE error on bulk data insert - node.js

I'm developing a LoopBack application that fetches data from Oracle using the oracledb npm module and converts it to JSON to be stored in MongoDB.
MongoDB is accessed using "loopback-connector-mongodb".
The data will be stored in around 100 collections, one for each of roughly 100 Oracle tables.
I'm pushing the data row by row for the entire collection list from my local Node application to another application on a remote machine, sending one HTTP request per row through a remote method call.
During the write operation, the application on the remote machine stops and throws an "EMFILE" error.
I googled it and some references suggested it is caused by hitting the maximum number of open files/sockets. Hence I tried disconnecting the DataSource for each request, but I'm still getting the same error.
Please help me with this!
Thanks.

If you are making an HTTP request for each row of data, and aren't throttling or otherwise controlling the order of those requests, it is possible you are simply making too many requests at once because of Node's async I/O model.
For example, making those calls in a simple for loop would actually result in all of them being made in parallel.
If this is the case, you might want to consider using something like the async module, which includes utilities for throttling the parallelism, as in the sketch below.
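For illustration only, a minimal sketch of that throttling; the row source, the target URL, the payload shape, and the choice of axios as HTTP client are assumptions, not details from the question:

```js
// Hedged sketch: push rows with at most 10 HTTP requests in flight at a time.
const async = require('async');
const axios = require('axios'); // any HTTP client would work here

async function pushRows(rows) {
  // eachLimit caps the number of concurrent iteratee calls at 10
  await async.eachLimit(rows, 10, async (row) => {
    await axios.post('http://remote-host:3000/api/items', row); // placeholder endpoint
  });
  console.log('all rows pushed');
}
```

Capping the concurrency keeps the number of open sockets bounded, which is usually enough to make EMFILE go away without having to raise OS file-descriptor limits.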

Don't forget Oracle Database 12.1.0.2 has JSON support. Maybe you don't need to move the data in the first place?
See JSON in Oracle Database. To quote the manual:
"Oracle Database supports JavaScript Object Notation (JSON) data natively with relational database features, including transactions, indexing, declarative querying, and views."

Related

Postgresql IPC: MessageQueueSend delaying queries from nodejs backend

I am testing PostgreSQL with a Node.js backend server, using the pg npm module to query the database. The issue I am having is that when I run a particular query directly against the table using the query tool in pgAdmin 4, the data is fetched within 5 seconds. But when the same query is requested from the backend through my Node.js server, the work is split between parallel workers and a client backend waiting on IPC: MessageQueueSend, and it runs for almost 17 minutes before returning the data. I can't understand why the same query is fast in the query tool but so slow when it comes from my server. Is there a way to raise the priority of queries coming from the backend so they run the way they do inside pgAdmin? I also noticed in pg_stat_activity that there is an application value for the query when it is run from the query tool, but when the same query comes from the Node.js server the application value is null. I don't understand why that is; I have been searching every community for an answer for the past 5 days and found nothing. Any help will be appreciated. Thanks in advance.
I tried running the query from the backend, but it is split across IPC-waiting processes and the result comes back after 17 minutes, while the same query takes only 5 seconds inside the pgAdmin query tool.
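The question has no answer here, but two small things can be sketched from what it describes: the application value in pg_stat_activity is just application_name, which pgAdmin sets and node-postgres leaves empty unless you configure it, and the plan the server actually chooses can be captured from the backend itself. A hedged sketch, with the connection string and query text as placeholders:

```js
// Hedged sketch: name the backend's connections so they are identifiable in
// pg_stat_activity, and log the plan PostgreSQL uses for the slow query when
// it arrives from Node. Connection string and query are placeholders.
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  application_name: 'node-backend', // shows up in pg_stat_activity.application_name
});

async function explainSlowQuery() {
  const { rows } = await pool.query(
    'EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM my_table WHERE some_column = 42'
  );
  rows.forEach((row) => console.log(row['QUERY PLAN']));
}

explainSlowQuery().catch(console.error);
```

Setting application_name does not change performance by itself, but comparing the EXPLAIN ANALYZE output from Node with the one from pgAdmin usually shows whether the two sessions are really getting the same plan.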

Sequelize | notify my application when an insert occurs in a specific SQL Server table

I'm working on an API that sends messages stored in a SQL Server database.
The problem is that I implemented it so that every 10 seconds the API runs a query for messages not yet sent.
I would like to optimize this by having SQL Server notify my application every time the table receives an insert, so that the application can then query for the messages to be sent.
For this I'm using Node.js with Sequelize.
I also think it's important to mention that the inserts into this table are made by another application.
If your infrastructure has a message broker set up (it probably doesn't, but ask your operations crew), you can use an event notification for this.
Otherwise you're stuck polling your server, as you described in your question. That is a workable strategy, but it's worth the effort to make sure the polling query is very efficient.
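For illustration, a minimal polling sketch with Sequelize; the Message model, its sent column, the batch size, and the connection details are assumptions invented for this example:

```js
// Hedged sketch: poll for unsent messages with a small, index-friendly query.
// The Message model and its `sent` flag are placeholders; an index covering
// (sent, id) keeps each poll cheap.
const { Sequelize, DataTypes } = require('sequelize');

const sequelize = new Sequelize(process.env.MSSQL_URL, { dialect: 'mssql' });

const Message = sequelize.define('Message', {
  body: DataTypes.STRING,
  sent: { type: DataTypes.BOOLEAN, defaultValue: false },
});

async function pollOnce() {
  const pending = await Message.findAll({
    where: { sent: false },
    order: [['id', 'ASC']],
    limit: 100, // bound the batch so each poll stays fast
  });
  for (const msg of pending) {
    // ... send the message ...
    await msg.update({ sent: true });
  }
}

setInterval(() => pollOnce().catch(console.error), 10000);
```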

Alternative to GraphQL long polling on an Express server for a large request?

Objective
I need to show a big table of data in my React web app frontend.
My backend is an Express server with a GraphQL layer and a few "normal" endpoints.
My server gets data from various sources, including an external API, which is the data source for my current task.
My server has a database that I can use freely. I cannot directly access the external API from my front end.
The data all comes from the external API I mentioned. In fact, it comes from multiple similar calls to the same endpoint with many different IDs. Each of those individual calls takes a while to return but doesn't risk timing out.
Current Solution
My naive implementation: I do one GraphQL query in which the resolver makes all the API calls to the external service in parallel and waits for them all to complete using Promise.all(). It then returns one big array containing all the data I need, and my server sends that back to the frontend, roughly as sketched below.
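A hedged reconstruction of that resolver; the field name, the ID argument, and the external endpoint are assumptions, not code from the question:

```js
// Hedged sketch of the naive resolver described above: fetch every ID from the
// external API in parallel and return one big array. fetch() here stands in
// for whatever HTTP client the real server uses.
const resolvers = {
  Query: {
    myBigTableData: async (_parent, { ids }) => {
      const results = await Promise.all(
        ids.map((id) =>
          fetch(`https://external-api.example.com/items/${id}`).then((r) => r.json())
        )
      );
      // The whole GraphQL request hangs until the slowest call finishes,
      // which is what makes the frontend time out.
      return results;
    },
  },
};

module.exports = resolvers;
```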
Problem With Current Solution
Unfortunately, this sometimes leaves my frontend hanging for too long and it times out (takes longer than 2 minutes).
Proposed Solution
Is there a better way than manually implementing long polling in GraphQL?
This is my main plan for a solution at the moment:
Frontend sends a request to my server
Server returns a 200 and starts hitting the external API, and sets a flag in the database
Server stores the result of each API call in the database as it completes
Meanwhile, the frontend shows a loading screen and keeps making the same GraphQL query for an entity like MyBigTableData which will tell me how many of the external API calls have returned
When they've all returned, the next time I ask for MyBigTableData, the server will send back all the data.
Question
Is there a better alternative to GraphQL long polling on an Express server for this large request that I have to do?
An alternative that comes to mind is to not use GraphQL and instead use a standard HTTP endpoint, but I'm not sure that really makes much difference.
I also see that HTTP/2 has multiplexing which could be relevant. My server currently runs HTTP/1.1 and upgrading is something of an unknown to me.
I see here that Keep-Alive, which sounds like it could be relevant, is unusable in Safari, which is bad because many of my users access the frontend with Safari.
I can't use WebSockets because of technical constraints. I don't want to set a ridiculously long timeout on my client either (and I'm not sure that's even possible).
I discovered that Apollo Client has polling built in: https://www.apollographql.com/docs/react/data/queries/#polling
In the end, I made a REST polling system.
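A hedged sketch of that kind of REST polling setup; the routes, the in-memory job store, and the fetchFromExternalApi helper are placeholders invented for this example (the question's version kept progress in a database):

```js
// Hedged sketch: start the slow aggregation with one request, then let the
// frontend poll a status endpoint until all external calls have returned.
const express = require('express');
const crypto = require('crypto');

const app = express();
const jobs = new Map(); // jobId -> { done, total, results }

app.post('/api/big-table-jobs', (req, res) => {
  const jobId = crypto.randomUUID();
  const ids = ['a', 'b', 'c']; // placeholder for the real list of IDs
  const job = { done: 0, total: ids.length, results: [] };
  jobs.set(jobId, job);

  // Fire and forget: record each result as the external call completes.
  ids.forEach(async (id) => {
    const data = await fetchFromExternalApi(id); // hypothetical helper
    job.results.push(data);
    job.done += 1;
  });

  res.status(202).json({ jobId });
});

app.get('/api/big-table-jobs/:jobId', (req, res) => {
  const job = jobs.get(req.params.jobId);
  if (!job) return res.sendStatus(404);
  if (job.done < job.total) {
    return res.json({ status: 'pending', done: job.done, total: job.total });
  }
  res.json({ status: 'complete', rows: job.results });
});

app.listen(4000);
```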

NodeJS sharding architecture with many MongoDB databases approaches

We have an architecture problem on our project. The project requires sharding, since we need almost unlimited scalability for part of the services.
Currently we use Node.js + MongoDB (Mongoose) and MySQL (TypeORM). Data is separated into databases through a simple 'DB Locator'. So a Node process needs connections to a lot of DBs (up to 1000).
Request flow example:
HTTP request from client with a Shop ID;
Get the DB IP address/credentials from the 'DB Locator' service by Shop ID;
Create a connection to the specific database with the shop data;
Perform DB queries.
We tried to implement it in two ways:
Create a connection for each request and close it on response.
Problems:
we can't use the connection after the response (this is the main problem, because sometimes we need to do some asynchronous work afterwards);
it works more slowly;
Keep all connections open.
Problems:
we hit the simultaneous-connection limit or some other limit;
memory leaks.
Which way is better? How can we avoid the described problems? Maybe there is a better solution?
Solution #1 worked perfectly for us in PHP, since PHP runs a single process per request and simply drops connections when the process ends. As we know, Express is pure JS code running in V8 and is not process-based.
It would be great to close unused connections automatically, but I can't find an option to do that.
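For illustration, a minimal sketch of what closing unused connections automatically could look like: a per-shop connection cache with an idle timeout, using the plain mongodb driver and a hypothetical db-locator module standing in for the 'DB Locator' service (every name here is a placeholder):

```js
// Hedged sketch: cache one MongoClient per shop and close clients that have
// been idle for a while. `db-locator` is a hypothetical wrapper around the
// 'DB Locator' service described above.
const { MongoClient } = require('mongodb');
const dbLocator = require('./db-locator'); // hypothetical module

const IDLE_MS = 60000;
const cache = new Map(); // shopId -> { client, dbName, lastUsed }

async function getShopDb(shopId) {
  let entry = cache.get(shopId);
  if (!entry) {
    const { uri, dbName } = await dbLocator.lookup(shopId); // hypothetical lookup
    const client = new MongoClient(uri);
    await client.connect();
    entry = { client, dbName, lastUsed: Date.now() };
    cache.set(shopId, entry);
  }
  entry.lastUsed = Date.now();
  return entry.client.db(entry.dbName);
}

// Periodically close connections nobody has used recently.
setInterval(async () => {
  for (const [shopId, entry] of cache) {
    if (Date.now() - entry.lastUsed > IDLE_MS) {
      cache.delete(shopId);
      await entry.client.close();
    }
  }
}, IDLE_MS).unref();
```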
The short answer: stop using MongoDB with Mongoose 😏
Longer answer:
MongoDB is a document-oriented DBMS. Its main use case is when you have loosely structured data that you need to store but don't query heavily. Lazy indexing, dynamic typing, and a number of other traits keep it from being used like an RDBMS, but it is great as a store for logs or any other serialized data.
The worst part here is Mongoose. It is the library that makes you feel like your trash box is a wonderful world of relations, virtual fields, and many other things that don't belong in a document-oriented DBMS. There is also a lot of legacy code from previous versions that causes trouble with connection management.
You already use TypeORM, which can work instead of Mongoose, with some restrictions, for sure.
It handles connections exactly the same way as its MySQL connection management.
Here is some more documentation: https://github.com/typeorm/typeorm/blob/master/docs/mongodb.md#defining-entities-and-columns
In this case you can use your TypeORM repository as a transparent client that initializes connections and closes them, or keeps them alive, on demand.
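For illustration only, a rough sketch of that idea with TypeORM's MongoDB support; the per-shop lookup, connection options, and entity location are assumptions rather than project specifics:

```js
// Hedged sketch: one TypeORM DataSource per shop database, initialized on
// demand and destroyed when it is no longer needed. Entity definitions are
// assumed to live under ./entity, as in the TypeORM MongoDB docs linked above.
const { DataSource } = require('typeorm');

const sources = new Map(); // shopId -> DataSource

async function getShopDataSource(shopId, mongoUrl) {
  let ds = sources.get(shopId);
  if (!ds) {
    ds = new DataSource({
      type: 'mongodb',
      url: mongoUrl,              // obtained from the DB Locator
      entities: ['entity/*.js'],  // assumed entity location
    });
    await ds.initialize();
    sources.set(shopId, ds);
  }
  return ds;
}

async function closeShopDataSource(shopId) {
  const ds = sources.get(shopId);
  if (ds) {
    sources.delete(shopId);
    await ds.destroy(); // closes the underlying connection pool
  }
}
```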

request advice converting project to node.js

Our company processes machine data. We get machine reports over a socket connection and write the data to MySQL tables. Then other scripts and threads, written in Python and Ruby, pull these records out, process the data, save the results back to the database, and notify our clients about the health of their machines.
I want to convert these processes to Node.js, but I have to do it somewhat incrementally. Instead of doing so much writing back and forth to the database, I would like to hand the data off from our input process to a Node server that will process everything at once.
I suppose I can modify our input scripts to hand off the raw bytes we get from the machines, or I could process them as JSON. I had thought about writing the new processes around a Node HTTP server, but that seems like unnecessary overhead. I have thought WebSockets may be a better solution, but I am looking for ideas. Thanks.
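For illustration, a minimal sketch of one of those options: handing the data off over a plain TCP socket as newline-delimited JSON, which avoids HTTP overhead. The port and the message fields are assumptions for this example:

```js
// Hedged sketch: a bare TCP server that receives newline-delimited JSON
// machine reports from the existing input process. Port and field names
// are placeholders.
const net = require('net');

const server = net.createServer((socket) => {
  let buffer = '';
  socket.on('data', (chunk) => {
    buffer += chunk.toString('utf8');
    let newline;
    while ((newline = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, newline);
      buffer = buffer.slice(newline + 1);
      if (!line.trim()) continue;
      const report = JSON.parse(line);
      // ... process the report and update the machine's health status ...
      console.log('received report for machine', report.machineId);
    }
  });
});

server.listen(5000, () => console.log('listening for machine reports on :5000'));
```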
