Connecting to AWS Elasticsearch from a non-AWS node.js app

I'm puzzling out an infrastructure-ish issue with a project of mine. The service I'm developing is hosted on a transient, containerized platform without a stable IP, only a domain name (api.example.com). I'm using Elasticsearch for search, so requests go to something like /my-search-resource and then use ES to find results to return. It's written in Node and uses the supported elasticsearch driver to connect to ES.
The issue I'm having is in trying to use an AWS Elasticsearch domain. This project is bootstrapped, so I'm taking advantage of the AWS free tier, even though the other services are hosted/deployed on another platform (think: Heroku, GCP, etc., with containerized and transient resources).
Since I can't just whitelist a particular IP, I'm not sure what I should do to give the service access to the Elasticsearch domain. Do I need to sign every request sent to the domain? That isn't ideal, since it would require monkey-patching the ES driver library with that functionality. Ideally, I'd like to just use a username and password to connect to the domain, but I know IAM isn't really oriented toward something like that from an external service. Any ideas? Is this even possible?

In my current project we connect to AWS Elasticsearch by using the normal elasticsearch NPM package, and then use http-aws-es as the connection class so that each request is signed with our AWS credentials.
So for example we have something like this:
const es = require( 'elasticsearch' );
const httpAwsEs = require( 'http-aws-es' );

const esClient = new es.Client( {
  hosts: 'somehostonaws',        // the AWS Elasticsearch domain endpoint
  connectionClass: httpAwsEs,    // signs each request with the AWS credentials below
  awsConfig: {
    region: 'some-aws-region',
    accessKey: 'some-aws-access-key',
    secretKey: 'some-aws-secret-key'
  }
} );
That doesn't require the whole AWS SDK, but it lets you connect to Elasticsearch domains that sit behind AWS request signing. Is that a solution to your issue?
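Once the client is created, you query it like any other Elasticsearch cluster. A rough sketch of a search call (the index name and query here are made up):

esClient.search( {
  index: 'my-search-resource',
  body: {
    query: { match: { title: 'example' } }
  }
} ).then( resp => {
  console.log( resp.hits.hits );
} ).catch( err => {
  console.error( 'search failed', err );
} );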

This is not a solution to the problem, just a few thoughts on how to approach it. We're in the same pickle at the moment: we want to use AWS, but we don't want to tie ourselves to the AWS SDK. As far as I understand it, AWS offers three options:
Open to public (not advisable)
Fixed IP addresses (whitelist)
AWS authentication
Option 1 is not an option.
Option 2 presents us with the problem that we have to teach whatever we use for logging to go through a proxy, so that the requests appear to come from the same IP address. Our setup is on Heroku and we use QuotaGuard for similar problems. However, I checked the modules I was going to use (we're trying to ship logs there, either to Logstash or to Elasticsearch directly using winston transports) and they offer no proxy support. Perhaps this is different in your case.
Option 3 is also not supported in any way by winston transports at this time, which would leave us either using the aws-sdk modules and tying ourselves to AWS, or writing our own.
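If we did write our own, my understanding is that the request signing itself can be done without the full SDK, for example with the small aws4 package. A rough sketch, where the domain, index, region and credentials are all placeholders:

const https = require('https');
const aws4 = require('aws4');

// Hypothetical search request against an AWS Elasticsearch domain
const body = JSON.stringify({ query: { match_all: {} } });
const opts = {
  host: 'search-my-domain.us-east-1.es.amazonaws.com',
  path: '/my-index/_search',
  method: 'POST',
  service: 'es',
  region: 'us-east-1',
  headers: { 'Content-Type': 'application/json' },
  body: body
};

// aws4.sign() adds the Authorization and X-Amz-Date headers in place
aws4.sign(opts, {
  accessKeyId: 'some-access-key',
  secretAccessKey: 'some-secret-key'
});

const req = https.request(opts, res => {
  let data = '';
  res.on('data', chunk => { data += chunk; });
  res.on('end', () => console.log(data));
});
req.end(body);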

Related

Winston for logging in a multi-container application

I am planning to use the DigitalOcean App Platform to host my backend, but I wanted to know whether each container in App Platform would have a different log file (assuming I'm logging to files with Winston), and if that is the case, whether it would even be an issue.
I thought of a couple of solutions in case I need to handle this:
1- Log to the database
2- Make another container that expects to get the logs through HTTP from the other running containers.
(Note: I'm new to dealing with containers, so I might be missing/misunderstanding something.)
Yes, you will get log files from each node.js process. For this scenario Winston supports alternative transports that can centralize logs from many sources.
As you suggested in (2), some of these transport options can also write logs to an RDBMS.
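A minimal sketch of a logger that ships logs off each container using winston 3's built-in HTTP transport (the host, port and path are placeholders for whatever central collector you run):

const winston = require('winston');

// Each container logs to its own console (picked up by the platform) and also
// forwards every entry to a hypothetical central log collector over HTTP
const logger = winston.createLogger({
  transports: [
    new winston.transports.Console(),
    new winston.transports.Http({ host: 'logs.internal.example.com', port: 8080, path: '/logs' })
  ]
});

logger.info('order created', { container: process.env.HOSTNAME });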

Best approach for in-memory storage (multi-region on GCP / Cloud Memorystore)

I'm building a chat app in react-native, with a nodejs backend. I'm using the google cloud platform.
I'm using websockets to create a continuous connection between the app and the backend. Because users can send messages to specific clients, I'm storing the sockets in nodejs:
var sockets = {}

io.on('connection', socket => {
  console.log('user connected')
  let userId = socket.handshake.query.userId
  sockets[userId] = socket

  socket.on('message', msgData => {
    let msg = JSON.parse(msgData)
    sockets[msg.userId].emit('message', JSON.stringify(msg.message))
  })

  socket.on('disconnect', () => {
    console.log('user disconnected')
    delete sockets[userId]
  })
})
Please note that this is a simplified example.
The problem is: I'm planning on having multiple instances in different regions behind a load balancer. When you connect to a specific instance, other instances can't reach the sockets object. So when 2 different users are connected to 2 different instances, they can't chat with each other.
To solve this, I was thinking of storing the sockets in a Redis cache (Cloud Memorystore). However, the Redis instance must be in the same region as the VM instance, and like I said, I have VM instances in multiple regions.
My questions are:
1) Is this solution the best way to go? Or are there any other possibilities, like just storing the sockets in a database?
2) How can I solve the issue of not being able to connect VM instances to a redis instance when they are not in the same region. Should I create a redis instance for each region I use (asia-east1, europe-north1, us-central1), and mirror those 3 redis instances so they all have the same content?
If, on the other hand, you have a totally different approach, please let me know! I'm still learning Node.js and Google Cloud Platform, and I'm open to new input.
Edit: All instances (instance groups) are of course in the same VPC.
Edit 2: What if I create a VM in the same region as the Redis instance and use it as a proxy? Would there be any performance issues?
Edit 3: I got it working by creating a proxy server using HAProxy. The instance is located in the same region as the Redis instance. One question: will there be any performance issues? And is this really the way to go?
Focusing on your first question, I would say that this architecture is not the best way to implement a chat application. Google Cloud Platform provides a very strong messaging service, Pub/Sub. By using this service, all the issues regarding load balancing, concurrency, connections and efficiency would be solved by default.
Here you can find a really nice article about how to create a chat application with Cloud Pub/Sub. It is C#-based, but the idea is the same when using the Node.js client libraries.
The architecture of this app will have the following advantages:
One-to-One (Direct) and One-to-Many messaging functionality
A transmission method that does not require a full server to be developed
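As a rough sketch of that idea with the Node.js client library (@google-cloud/pubsub), where the topic and subscription names are made up and `sockets` is the per-instance object from the question:

const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

// Publish every chat message to a shared topic instead of emitting it
// directly to a socket that may live on another instance
function sendMessage(msg) {
  return pubsub.topic('chat-messages').publish(Buffer.from(JSON.stringify(msg)));
}

// Each instance has its own subscription and forwards incoming messages
// to the sockets it holds locally
const subscription = pubsub.subscription('chat-messages-' + process.env.INSTANCE_ID);
subscription.on('message', message => {
  const msg = JSON.parse(message.data.toString());
  const socket = sockets[msg.userId];
  if (socket) socket.emit('message', JSON.stringify(msg.message));
  message.ack();
});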
In case you do not want to use Pub/Sub, I still think you will need a centralized server application that can communicate with the users, process their messages and send them to the proper destination, and back again.
Regarding your second question, that may work, but I think it could affect performance and, more importantly, the clarity of the system itself. Something like this would be a nightmare to maintain and debug.

Introducing node.js layer between UI and AWS services

I am designing a solution on AWS that utilizes Cognito for user management.
I am using this Quick Start as a starting point:
SAAS QuickStart
With one significant change: I plan to make this serverless. So no ECS containers to host the services. I will host my UI on S3.
My one question lies with the 'auth-manager' used in the existing solution, and found on github:
Auth-Manager using Node.js
Basically, this layer is used by the UI to facilitate interaction with Cognito. However, I don't see an advantage to doing it this way vs. simply moving these Cognito calls into the front-end web application. I know such a Node layer may be advantageous as a caching layer, but I think I could just use ElastiCache (Redis) as a service if I needed that.
Am I missing something? If I simply moved this Node auth-manager piece into my S3-hosted static JavaScript application, would I be losing anything?
Thanks in advance.
It looks like it's pulling some info from
https://github.com/aws-quickstart/saas-identity-cognito/blob/master/app/source/shared-modules/config-helper/config.js
//Configure Environment
const configModule = require('../shared-modules/config-helper/config.js');
var configuration = configModule.configure(process.env.NODE_ENV);
which exposes lots of backend AWS account info that you wouldn't want in a front-end app.
The best option seems to be running this app on a small EC2 instance instead of Fargate, because of the massive cost difference, and having your front end send it requests for authorization.
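If you did fold the Cognito calls into the front end instead, note that the user pool id and app client id (unlike the account details in that config file) are not secrets. A rough sketch with the amazon-cognito-identity-js package, using placeholder ids:

const { CognitoUserPool, CognitoUser, AuthenticationDetails } = require('amazon-cognito-identity-js');

// Pool id and client id are public, placeholder values
const pool = new CognitoUserPool({
  UserPoolId: 'us-east-1_EXAMPLE',
  ClientId: 'exampleclientid123'
});

function signIn(username, password, callback) {
  const user = new CognitoUser({ Username: username, Pool: pool });
  const details = new AuthenticationDetails({ Username: username, Password: password });
  user.authenticateUser(details, {
    onSuccess: session => callback(null, session.getIdToken().getJwtToken()),
    onFailure: err => callback(err)
  });
}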

Conceptual question: How do React Native, Apollo, Node, and GraphQL all work together?

I'm new to GraphQL, Apollo, AWS S3, and Redux. I've read the tutorials for each and I'm familiar with React Native, Node, Heroku, and Mongo. I'm having trouble understanding the following:
how a "GraphQL Server" is hosted for a mobile device using React Native?
can I create the GraphQL server with Node and host it on AWS S3?
how to grab that data by using Apollo/GraphQL in my React Native code and store that data locally using Apollo/Redux?
do I have to use Graphcool as the endpoint from the start instead? All I'm trying to do is pulling data from my database when the app loads (not looking to stream it, so that I am able to use the data offline).
Where should I look to get a better understanding?
I have a couple comments for you in your exploration of new territory.
GraphQL is simply the query language that talks to your database. So you are free to run any type of API (on a server, serverless, etc.) that will take in a GraphQL query/mutation and interact with your database.
GraphCool is a "production-ready backend" basically back-end as a service. So you wouldn't worry about running a server (as I believe they run most everything on serverless infrastructure) or managing where your DB is housed.
You can run an HTTP server on AWS EC2, or go serverless with AWS Lambda (or the same flavor on Google or Azure). Whatever you decide to use to accept requests, your endpoint will accept GraphQL query strings and then do stuff with the DB. AWS S3 is more of a static storage service: you can store files there to be retrieved, or scripts that can be pulled, but S3 probably isn't where you would want any server-like code to run.
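For example, a minimal GraphQL endpoint in Node can be a few lines with the apollo-server package; the schema and resolver below are made up, and a real resolver would read from your Mongo database:

const { ApolloServer, gql } = require('apollo-server');

// Hypothetical schema: a single query returning a list of posts
const typeDefs = gql`
  type Post {
    id: ID!
    title: String
  }
  type Query {
    posts: [Post]
  }
`;

// In a real app this resolver would query MongoDB instead of returning a literal
const resolvers = {
  Query: {
    posts: () => [{ id: '1', title: 'hello world' }]
  }
};

new ApolloServer({ typeDefs, resolvers })
  .listen({ port: 4000 })
  .then(({ url }) => console.log('GraphQL server ready at ' + url));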
Apollo would be a tool to use on your frontend for easily interacting with your GraphQL server; see React-Apollo.
Apollo/Redux may then help you manage the state throughout the app. It sounds like you'll simply be loading the data into the app state on load and then interacting with that state without needing to make any more external calls.
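On the client side, a bare-bones sketch of fetching that data with the Apollo client (the endpoint URL is a placeholder, and @apollo/client is the current home of the React Apollo packages):

const { ApolloClient, InMemoryCache, gql } = require('@apollo/client');

// Placeholder endpoint for wherever the GraphQL server ends up being hosted
const client = new ApolloClient({
  uri: 'https://api.example.com/graphql',
  cache: new InMemoryCache()
});

client.query({ query: gql`{ posts { id title } }` })
  .then(result => console.log(result.data.posts));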
Hopefully this was helpful.

Issues with a Node app deployed through elastic beanstalk

I'm deploying a Node app to an EC2 instance through AWS Elastic Beanstalk. I set up a cron job with the cron Node package that, on tick, will run a Sequelize query, parse the data returned, then send it in the body of an email.
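For context, that kind of job might look roughly like the following; the schedule, query, connection string and mail settings are all placeholders, and nodemailer stands in for whatever mail library is actually used:

const { CronJob } = require('cron');
const { Sequelize } = require('sequelize');
const nodemailer = require('nodemailer');

// Placeholder connection and SMTP settings
const sequelize = new Sequelize(process.env.DATABASE_URL);
const mailer = nodemailer.createTransport({
  host: 'smtp.example.com',
  port: 587,
  auth: { user: 'reports', pass: 'secret' }
});

// Every day at 08:00: run the query and email the parsed result
const job = new CronJob('0 8 * * *', async () => {
  const [rows] = await sequelize.query('SELECT * FROM orders WHERE created_at > NOW() - INTERVAL 1 DAY');
  await mailer.sendMail({
    from: 'reports@example.com',
    to: 'me@example.com',
    subject: 'Daily report',
    text: JSON.stringify(rows, null, 2)
  });
});
job.start();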
When testing locally, it works fine and the email gets sent. When I deploy it using the awsebcli command eb deploy, it says the deploy was successful, but I don't receive any emails.
At first I believed the npm start command wasn't working on the server, but I checked the error logs and it appears Sequelize is throwing a timeout error when trying to connect.
I wrote a configuration for Sequelize to connect to multiple schemas at once. Three of those schemas are hosted on the same RDS instance, and one is on a separate RDS instance.
I've done almost exactly the same thing with another Node app and it worked fine. The only difference is the additional schema on the separate RDS instance, which I can connect to fine from my local machine.
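For reference, a setup like that is usually just several Sequelize instances pointed at different hosts; a sketch with placeholder hosts, schema names and credentials:

const { Sequelize } = require('sequelize');

// Three schemas on the first RDS instance, one on a separate RDS instance
const common = { dialect: 'mysql', logging: false, dialectOptions: { connectTimeout: 10000 } };

const ordersDb  = new Sequelize('orders',  'user', 'pass', { host: 'first-rds.example.com', ...common });
const usersDb   = new Sequelize('users',   'user', 'pass', { host: 'first-rds.example.com', ...common });
const reportsDb = new Sequelize('reports', 'user', 'pass', { host: 'first-rds.example.com', ...common });
const legacyDb  = new Sequelize('legacy',  'user', 'pass', { host: 'separate-rds.example.com', ...common });

// authenticate() surfaces a connection timeout quickly after a deploy
legacyDb.authenticate().catch(err => console.error('cannot reach the separate RDS instance:', err));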
Any thoughts or suggestions would be appreciated.
EDIT: Checked the server logs and found a Sequelize connection error.
Found that the issue was caused by security groups on AWS preventing my instance from connecting to one of the DBs it needed.
Edit:
Specifics have been requested. Since this is a very old post and I no longer have access to AWS, I can only venture a guess at what I did.
If memory serves, the DB I was blocked by was hosted in a different AWS account. Changing the security group on that DB was not an option, as security on that account was firmly maintained. The reason I was able to connect locally was that the facility I was working at had an IP whitelisted in the DB's security groups. I eventually settled on running the script on my local machine, since my machine rarely left that location and it did not matter where the script ran, only that it ran periodically. Ideally, though, I would have been able to change the security group on the DB to allow incoming traffic from my server.
