Using terraform and AWS I've created a Postgres RDS DB within a VPC. During creation, the superuser is created, but it seems like a bad idea to use this user inside my app to run queries.
I'd like to create an additional access-limited DB user during the terraform apply phase after the DB has been created. The only solutions I've found expect the DB to be accessible outside the VPC. For security, it also seems like a bad idea to make the DB accessible outside the VPC.
The first thing that comes to mind is to have terraform launch an EC2 instance that creates the user as part of the userdata script and then promptly terminates. This seems like a pretty heavy solution and terraform would create the instance each time terraform apply is run unless the instance is not terminated at the end of the script.
Another option is to pass the superuser and limited user credentials to the server that runs migrations and have that server create the user as necessary. This, however, would mean that the server would have access to the superuser and could do some nefarious things if someone got access to it.
Are there other common patterns for solving this? Do people just use the superuser for everything or open the DB to the outside world?
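For reference, whichever host ends up running it (a short-lived instance or the migration server), the work itself is only a handful of SQL statements. Here is a minimal Python sketch that just builds them; the role, database and schema names are hypothetical, the privilege list is an assumption about what a typical app needs, and the password would be set separately so it never ends up in a string like this.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def limited_user_statements(role: str, database: str, schema: str = "public") -> List[str]:
    """Return the SQL a superuser would run to create an access-limited role.

    All names are illustrative. The role gets no password here; set one
    separately (for example with ALTER ROLE ... PASSWORD).
    """
    r, d, s = quote_ident(role), quote_ident(database), quote_ident(schema)
    dml = "SELECT, INSERT, UPDATE, DELETE"
    return [
        f"CREATE ROLE {r} LOGIN",
        f"GRANT CONNECT ON DATABASE {d} TO {r}",
        f"GRANT USAGE ON SCHEMA {s} TO {r}",
        # Existing tables in the schema
        f"GRANT {dml} ON ALL TABLES IN SCHEMA {s} TO {r}",
        # Tables created later by the role that runs this statement
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {s} GRANT {dml} ON TABLES TO {r}",
    ]


for statement in limited_user_statements("app_user", "appdb"):
    print(statement + ";")
```

This does not solve the question of where to run it from; it only shows that the superuser credentials are needed once, for these statements, and not by the app afterwards.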
Related
On AWS, I have an API Gateway setup that calls a Lambda function, which in turn accesses a Redshift database. All of these services are within the same VPC and work. The only problem is that every API call takes a minimum of 10 seconds just for spinning up the Lambda function inside a VPC.
From what I've read, if we were to move the Lambda function outside of the VPC it should be able to avoid that 10 second startup. However, is it still possible to connect to the redshift db at that point? The redshift db is publicly accessible but does the lambda function need a VPC in order to access the internet/public redshift db?
As others suggested in the comments, I would look at your Lambda code and check whether its dependencies are complex enough to make initialization take that long.
As far as I understand, it's going to take the same time irrespective of whether it's inside the VPC or outside.
There is something called a "cold start" versus a "warm start" with AWS Lambda. A cold start is the time during which initialization takes place: downloading the code, starting the container, initializing it, and eventually executing the code.
It's nicely explained here:
https://blog.octo.com/en/cold-start-warm-start-with-aws-lambda/
"The initialization time of a Lambda represents a significant part of the total time. After a cold start, the Lambda will remain instantiated for a while (5 minutes) allowing any other call not to have to wait for this initialization to be done each time."
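To make the quoted point concrete, here is a minimal Python sketch of why warm calls are cheaper: anything at module level runs once per container, while the handler body runs on every call. The names `_EXPENSIVE_RESOURCE` and `_invocations` are made up for illustration, and the dictionary only stands in for whatever your real initialization builds (a DB client, parsed config, and so on).

```python
import time

# Module-level code runs once, when the container is first started
# (the cold start). Keep expensive setup here so warm calls can reuse it.
_INIT_STARTED = time.monotonic()
_EXPENSIVE_RESOURCE = {"created_at": _INIT_STARTED}  # stand-in for a real client
_invocations = 0


def handler(event, context):
    """Lambda-style entry point; only this body runs on a warm call."""
    global _invocations
    _invocations += 1
    return {
        "cold_start": _invocations == 1,
        "invocation": _invocations,
        # Same object on every call within one container
        "resource_id": id(_EXPENSIVE_RESOURCE),
    }
```

Logging the `cold_start` flag from your own function is a cheap way to see how much of the 10 seconds is initialization and how much is the actual query.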
Regarding your second question, whether you should move the Lambda outside the VPC: the best-practice guidance is "don't put Lambda inside the VPC unless you have to".
https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
So it turns out I was having a timeout issue with the Lambda connecting to the Redshift DB because the zone in the VPC that the Redshift DB lives in didn't have a route table with an internet gateway (IGW) route associated with it. I fixed that, and then all I had to do was remove the Lambda from its VPC and things just worked.
Long story short: Make sure your redshift db has public internet access.
I have a set of Lambda functions that process messages from an SQS queue. They take data sets, process them and store the results in an RDS MySQL database, which they connect to via VPC. Both the Lambda functions and the RDS database are in the same availability zone.
This has been working for the last couple of months without any issues, but early this morning (2019-01-12) at 01:00 I started seeing lambda timeouts and messages being moved into the dead letter queue.
I've done some troubleshooting and confirmed that the reason for the timeouts is Lambda's inability to establish a connection to the database server.
The RDS server is public, but locked down to allow access only through VPC and 2 public IPs.
I've taken the following steps so far to try and resolve the issue:
Given the lambda service role admin rights to rule out IAM issues
Unassigned the VPC from the Lambda functions and opened up RDS inbound access from 0.0.0.0/0 to rule out VPC issues.
Restarted the RDS hosts, the good ol' off'n'on again.
Used serverless to invoke the lambda functions locally with test data (worked). My local machine connects to the public RDS IP, not through VPC.
Changed the runtime environment from Python 3.6 to 3.7
It doesn't appear to be a code issue, as it's been working flawlessly for the past couple of months and I can invoke locally without issue and my Elastic Beanstalk instance, which sits on the same VPC subnet continues to connect through VPC without issue.
Here's the code I'm using to connect:
import os
from sqlalchemy import create_engine, MetaData
from sqlalchemy.pool import NullPool

connectionString = 'mysql+pymysql://{0}:{1}@{2}/{3}'.format(os.environ['DB_USER'], os.environ['DB_PASSWORD'], os.environ['DB_HOST'], os.environ['DB_SCHEMA'])
engine = create_engine(connectionString, poolclass=NullPool)
with engine.connect() as con:  # <--- breaking here
    meta = MetaData(engine, reflect=True)  # <-- never gets to here
I double checked the connection string & user accounts, both are correct/working locally.
If someone could point me in the right direction, I'd be grateful!
My first guess is that you've hit a connection limit on the RDS database. Because Lambdas can be executed concurrently (this could easily be the case if there were suddenly a lot of messages in your SQS queue), and each execution opens a new connection to your DB, the connection pool can get saturated.
If this is the case, you can set a concurrent execution limit on your Lambda function to prevent this.
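As a rough sketch of what that could look like from Python: boto3's Lambda client has a `put_function_concurrency` call that reserves, and thereby caps, concurrent executions for one function. The function name and the limit of 10 below are hypothetical; pick a number comfortably below your database's connection limit.

```python
def limit_concurrency(lambda_client, function_name, max_concurrent):
    """Cap how many copies of one Lambda function can run at the same time.

    `lambda_client` is expected to be a boto3 Lambda client; it is passed in
    so the helper itself has no AWS dependency.
    """
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=max_concurrent,
    )


# Usage (needs boto3 and AWS credentials; the function name is made up):
#   import boto3
#   limit_concurrency(boto3.client("lambda"), "process-sqs-messages", 10)
```

With a cap in place, messages that cannot be picked up immediately stay in the queue and are processed later, so check that the queue's redrive settings do not send them to the dead letter queue too early.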
A side note - it is not recommended to use a database with a persistent connection in a serverless architecture exactly for this reason. AFAIK, AWS is working on a better solution to use RDS from Lambda, but it's not available yet.
So...
I was changing security groups and it was having no effect on the RDS host. At one point I removed all access and could still connect, which is crazy. At this point I started to think the outage on Friday night had put the underlying RDS host into a weird state. I put the security groups back the way they should be, stopped and started the RDS host (a restart had no effect), and everything started to work again.
Very frustrating, but happy it's finally resolved.
I want a single codebase to connect to a different MongoDB database depending on the subdomain of the request URL.
e.g.
if www.xyz.example.com then the MongoDB database is xyz
if www.abc.example.com then the MongoDB database is abc
if www.efg.example.com then the MongoDB database is efg
If someone hits www.xyz.example.com, the app should automatically connect to the xyz DB. If someone hits www.abc.example.com, it should automatically connect to the abc DB.
But the xyz DB connection should not be dropped when that happens. It should stay open, because there is only a single codebase/project.
Please suggest a solution.
I'm not quite sure about your application's use case, so I can't promise this is the best solution.
One feasible solution is to run 3 Node.js processes on 3 different ports, each connected to a specific DB instance. You can do this by starting the 3 processes with different environment variables, then forwarding the requests for each domain to the corresponding port.
This approach has some advantages:
Ease of configuration, just need to care about deployment setting without if/else hacking in source code.
System availability, if 1 of the 3 DBs is down, only 1 domain affected, the others still work well.
NOTE: This approach only works well with a small number of subdomains. If you have 30 subdomains or dynamic domains, then please reconsider your deployment architecture :). You may need some more advanced techniques to deal with it. A quick (but not the best) way is to maintain a list of mongoose instances inside the application at runtime, each instance being responsible for 1 subdomain. Then use req.get('host') to check the subdomain and use the corresponding mongoose instance to process the DB operations.
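The "list of instances" idea is not specific to mongoose. Here is a language-neutral sketch of it in Python, under a few assumptions: the base domain is example.com as in the question, `connect` is whatever function opens a connection for a database name (hypothetical here), and connections are created lazily and then kept open so hitting abc never closes xyz.

```python
from typing import Any, Callable, Dict


def db_name_for_host(host: str, base_domain: str = "example.com") -> str:
    """Map a Host header such as 'www.xyz.example.com' to the DB name 'xyz'."""
    host = host.split(":")[0].lower()  # drop an optional port
    if host.startswith("www."):
        host = host[len("www."):]
    suffix = "." + base_domain
    if not host.endswith(suffix):
        raise ValueError("no subdomain in host: " + host)
    name = host[: -len(suffix)]
    if not name or "." in name:
        raise ValueError("unexpected subdomain in host: " + host)
    return name


class ConnectionRegistry:
    """Keeps one open connection per subdomain and reuses it on later requests."""

    def __init__(self, connect: Callable[[str], Any]) -> None:
        self._connect = connect
        self._connections: Dict[str, Any] = {}

    def for_host(self, host: str) -> Any:
        name = db_name_for_host(host)
        if name not in self._connections:
            # First request for this subdomain: open and remember the connection
            self._connections[name] = self._connect(name)
        return self._connections[name]
```

In a real app you would probably validate the name against a list of known tenants before connecting, so that an arbitrary Host header cannot make the server open connections to databases that do not exist.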
I have just started using beanstalkd and pheanstalk and I am curious whether the following situation is a security issue (and if not, why not?):
When designing a queue that will contain jobs for an eventual worker script to pick up and perform SQL database queries, I asked a friend what I could do to prevent an online user from connecting to port 11300 of my server and inserting a job into the queue himself, causing a job with malicious code to be executed. I was told that I could include a password inside the job being sent.
After some time passed, though, I realized that someone could perform a few simple commands in a terminal, obtain a job from the queue, find the password, and then create jobs with the password included:
telnet thewebsitesipaddress 11300 //creating a telnet connection
list-tubes //finding which tubes are currently being used
use a_tube_found //using one of the tubes found
peek-ready //see what's inside one of the jobs and find the password
What could be done to make sure this does not happen and my queue doesn't get hacked / controlled?
Thanks in advance!
You can avoid those situations by placing beanstalkd behind a firewall or in a private network.
DigitalOcean (for example) offers such a service where you have a private network IP address which can be accessed only from servers of the same location.
We've been using beanstalkd in our company for more than a year, and we haven't had any of those issues yet.
I see, but what if the producer were a page called index.php that sends a job to the queue whenever someone visits it? In this situation, wouldn't the server have to be on an open network?
The browser has no way to contact the job server; it can only access the resources /you/ allow it to, that is, the view page. Only the back-end is allowed to access the job server. Also, if you build the web application so that the front-end is separated from the back-end, you're going to have even fewer potential security issues.
We're trying to give production server access to more ops people on our team.
The only issue is DB access. For most tasks ops do not need DB access, and only a limited number of people should have it.
Let's say we have two servers:
Application Server:
tomcat (app needs access to DB server)
DB server:
Database
So ultimately we would like to give root access to the "application server" so that ops can do all sorts of maintenance on the server, but not be able to gain access to the DB server. This means I cannot just store the DB password in a configuration file for the app to read, for example.
Are there well-known practices that would solve an issue like that?
First, any credential that the 'Application Server' has for accessing the 'DB Server' should be considered handed over to anyone with root on the Application Server. Since you say that DB access must be limited, you cannot give ops complete root on the Application Server.
But do not lose hope, there is sudo.
sudo can give users or groups access to root power, but only for limited purposes. Unfortunately, setting up sudo correctly is tricky, because you have to prevent subshells and wildcards from being used to get full root, but it is possible.
There are too many permutations for a general answer beyond sudo without additional information about your use case. A great reference for sudo with exactly this use case in mind is 'Sudo Mastery' by Michael W. Lucas.