Connection timed out - using sqlalchemy to access AWS usaspending data - python-3.x

I created an instance of the usaspending.gov database in my AWS RDS. A description of this database can be found here: https://aws.amazon.com/public-datasets/usaspending/
The data are available as a PostgreSQL snapshot, and I would like to access the database using Python's sqlalchemy package within a Jupyter notebook within Amazon SageMaker.
I tried to set up my database connection with the code below, but I'm getting a Connection timed out error. I'm pretty new to AWS and SageMaker, so maybe I messed up my sqlalchemy engine? I think my VPC security settings are OK (it looks like they accept inbound and outbound requests).
Any ideas what I could be missing?
engine = create_engine('postgresql://root:password@[my endpoint]/[DB instance]')

(I'm a member of the SageMaker team.)
Thanks for using Amazon SageMaker!
Can you check the VPC settings on your side? It might be related to an issue another AWS customer saw for Redshift: https://forums.aws.amazon.com/thread.jspa?threadID=270111&tstart=0 . The issue might be that DNS is not resolving to the private IP address of the RDS instance. If following that forum post doesn't resolve your issue, you can start a forum thread with AWS SageMaker and we will help you debug it.
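One way to narrow down a timeout like this from the notebook is to check what the endpoint hostname resolves to and whether the port accepts a TCP connection at all. Below is a minimal sketch using only the standard library. The hostname in the usage comment is a placeholder, and the function names are made up for illustration:

```python
import ipaddress
import socket


def resolve_addresses(hostname):
    """Return the sorted set of IP addresses the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def is_private(address):
    """True if the address is in a private (e.g. VPC-internal) range."""
    return ipaddress.ip_address(address).is_private


def can_connect(host, port, timeout=5.0):
    """Try a plain TCP connection; False on timeout or refusal."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (placeholder endpoint, replace with your own):
# for addr in resolve_addresses("mydb.example.us-east-1.rds.amazonaws.com"):
#     print(addr, "private" if is_private(addr) else "public", can_connect(addr, 5432))
```

If the hostname resolves to a public address from inside the VPC, or the TCP connection itself times out, the problem is likely in networking rather than in the sqlalchemy engine.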

Related

Can not override DynamoDB endpoint for Kinesis Consumer

I cannot set up my local environment with aws-sdk, localstack and aws-kcl. After creating the consumer and trying to run it in my local environment, I get an error that my credentials are incorrect.
So the Kinesis consumer always goes to the real Amazon DynamoDB, and I cannot point it to my localstack DynamoDB. The question is: how can I point it to my local DynamoDB?
I believe there are currently a few issues with connecting the Multi Lang Daemon to the Kinesis Client Library, but the settings you are looking for are buried within kcl.properties. Add these settings:
kinesisEndpoint = http://localhost:4568
dynamoDBEndpoint = http://localhost:4569
It should make the Multi Lang Daemon point to your local instances of kinesis and dynamo.
I've tried this multiple times with DotNet and it seems to be having issues further down the pipeline, but for now I hope this helps!
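Before starting the daemon, it can help to confirm that the properties file really carries both overrides. Below is a small sketch that parses simple `key = value` lines. It only covers that simple form, not the full Java properties syntax, and the helper name and sample text are illustrative:

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Sample content using the two settings from the answer above.
sample = """
# local overrides for localstack
kinesisEndpoint = http://localhost:4568
dynamoDBEndpoint = http://localhost:4569
"""

props = parse_properties(sample)
missing = [k for k in ("kinesisEndpoint", "dynamoDBEndpoint") if k not in props]
print("missing overrides:", missing)
```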

Connecting to public Redshift DB from Lambda with no VPC

On AWS, I have an API Gateway setup that calls a Lambda function, which in turn accesses a Redshift database. All of these services are within the same VPC and work. The only problem is that every API call takes a minimum of 10 seconds just to spin up the Lambda function inside the VPC.
From what I've read, if we were to move the Lambda function outside of the VPC it should be able to avoid that 10 second startup. However, is it still possible to connect to the redshift db at that point? The redshift db is publicly accessible but does the lambda function need a VPC in order to access the internet/public redshift db?
As others suggested in the comments, I would look into your Lambda code and see whether the dependencies are so complex that initialization takes this long.
As far as I understand, it's going to take the same time irrespective of whether it's inside the VPC or outside.
There is something called a "cold start" versus a "warm call" in AWS Lambda: the cold start is the time during which initialization takes place. Initialization involves downloading the code, starting the container, initializing it, and finally executing the code.
It's nicely explained here:
https://blog.octo.com/en/cold-start-warm-start-with-aws-lambda/
"The initialization time of a Lambda represents a significant part of the total time. After a cold start, the Lambda will remain instantiated for a while (5 minutes) allowing any other call not to have to wait for this initialization to be done each time."
Regarding your second question (should you put the Lambda outside the VPC), the best practice is: "don't put Lambda inside the VPC unless you have to".
https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
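A common way to benefit from warm starts is to do expensive setup once per container, behind a module-level cache, rather than on every invocation. Below is a sketch of that pattern. `create_connection` is a stand-in for whatever slow setup your function performs, not a real library call:

```python
import time

_connection = None  # survives between invocations while the container stays warm
_setup_calls = 0


def create_connection():
    """Hypothetical slow setup (e.g. opening a database connection)."""
    global _setup_calls
    _setup_calls += 1
    time.sleep(0.01)  # simulate initialization cost
    return {"created_at": time.time()}


def get_connection():
    """Create the connection on first use, then reuse it."""
    global _connection
    if _connection is None:
        _connection = create_connection()
    return _connection


def handler(event, context):
    # Only the first call in a given container pays the setup cost.
    conn = get_connection()
    return {"setup_calls": _setup_calls, "created_at": conn["created_at"]}
```

Calling `handler` twice in the same process runs the setup only once, which is the effect a warm container gives you.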
So it turns out I was having a timeout issue with the Lambda connecting to the Redshift DB because the zone in the VPC that the Redshift DB lives in didn't have an IGW route table associated with it. I fixed that, and then all I had to do was remove the Lambda from its VPC and things just worked.
Long story short: Make sure your redshift db has public internet access.

Lambda lost connection to RDS at 01:00 2019-01-12 (EU/London)

I have a set of Lambda functions that process messages on an SQS stack. They take data sets, process them and store the results in an RDS MySQL database, which they connect to via VPC. Both the Lambda functions and the RDS database are in the same availability zone.
This has been working for the last couple of months without any issues, but early this morning (2019-01-12) at 01:00 I started seeing lambda timeouts and messages being moved into the dead letter queue.
I've done some troubleshooting and confirmed the reason for the timeouts is the inability for Lambda to establish a connection to the database server.
The RDS server is public, but locked down to allow access only through VPC and 2 public IPs.
I've taken the following steps so far to try and resolve the issue:
Given the lambda service role admin rights to rule out IAM issues
Unassigned VPC from the lambda functions and opened up RDS inbound access from 0.0.0.0/0 to rule out VPC issues.
Restarted the RDS hosts, the good ol' off'n'on again.
Used serverless to invoke the lambda functions locally with test data (worked). My local machine connects to the public RDS IP, not through VPC.
Changed the runtime environment from 3.6 to 3.7
It doesn't appear to be a code issue, as it's been working flawlessly for the past couple of months and I can invoke locally without issue and my Elastic Beanstalk instance, which sits on the same VPC subnet continues to connect through VPC without issue.
Here's the code I'm using to connect:
import os
from sqlalchemy import create_engine, MetaData
from sqlalchemy.pool import NullPool
connectionString = 'mysql+pymysql://{0}:{1}@{2}/{3}'.format(os.environ['DB_USER'], os.environ['DB_PASSWORD'], os.environ['DB_HOST'], os.environ['DB_SCHEMA'])
engine = create_engine(connectionString, poolclass=NullPool)
with engine.connect() as con:  # <--- breaking here
    meta = MetaData(engine, reflect=True)  # <-- never gets to here
I double checked the connection string & user accounts, both are correct/working locally.
If someone could point me in the right direction, I'd be grateful!
My first guess is that you've hit a connection limit on the RDS database. Because Lambdas can be executed concurrently (this could easily be the case if there were suddenly a lot of messages in your SQS queue), and each execution opens a new connection to your DB, the connection pool can get saturated.
If this is the case, you can set a concurrent execution limit on your Lambda function to prevent this.
A side note - it is not recommended to use a database with a persistent connection in a serverless architecture exactly for this reason. AFAIK, AWS is working on a better solution to use RDS from Lambda, but it's not available yet.
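To pick a value for that limit, a rough budget is the database's connection limit minus a reserve for other clients, divided by the connections each invocation opens. Below is a sketch of the arithmetic. The numbers are made up for illustration, and `apply_limit` assumes boto3 is installed and AWS credentials are configured:

```python
def max_safe_concurrency(db_max_connections, reserved_for_others=10, connections_per_invocation=1):
    """Largest Lambda concurrency that stays within the database's connection budget."""
    available = db_max_connections - reserved_for_others
    if available <= 0 or connections_per_invocation <= 0:
        return 0
    return available // connections_per_invocation


def apply_limit(function_name, limit):
    """Set reserved concurrency on a function (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the arithmetic above has no dependencies

    boto3.client("lambda").put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


# Illustrative numbers only: 66 allowed connections, 10 kept for other clients.
print(max_safe_concurrency(66))  # 56
```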
So...
I was changing security groups and it was having no effect on the RDS host. At one point I removed all access and could still connect, which is crazy. At that point I started to think the outage on Friday night had put the underlying RDS host into a weird state. I put the security groups back to the way they should be, stopped and started the RDS host (a restart had no effect), and everything started to work again.
Very frustrating, but happy it's finally resolved.

Connecting Lambda to both RDS and S3

I have a Node.js task that converts values from my database to MP3 files, then uploads them to s3 storage. The code works beautifully when executing it on my laptop. I decided to migrate it to Lambda so I can run it automatically every couple hours. I made a few minor modifications, and again, it works great. But here's the catch: it's only working when my RDS instance is set to allow connections from ANY IP. Obviously, that's an unacceptable security risk.
I put my database and Lambda code in the same VPC and security group, but even so, my code wouldn't connect to S3. Then, I added an endpoint for S3, and it looked like everything was working per my console logs. However, the file in S3 storage is empty (0 bytes).
What do I need to change? I've heard that I might need to configure my VPC to have internet access, but I'm not sure if that's what I need to do. And honestly, those tutorials seem confusing to me.
Can someone point me in the right direction?
It is a known problem (known by users, not really acknowledged by AWS that I've seen). The Lambda VPC docs say:
http://docs.aws.amazon.com/lambda/latest/dg/vpc.html
"When a Lambda function is configured to run within a VPC, it incurs
an additional ENI start-up penalty. This means address resolution may
be delayed when trying to connect to network resources."
And
"If your Lambda function accesses a VPC, you must make sure that your
VPC has sufficient ENI capacity to support the scale requirements of
your Lambda function. "
Source: https://forums.aws.amazon.com/thread.jspa?messageID=767285
This means it has serious drawbacks that make it unworkable:
speed penalty
have to manually set up scaling
have to pay for NAT gateway 0.059 per hour (https://aws.amazon.com/vpc/pricing/)
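For a sense of scale, the hourly NAT gateway rate quoted above works out as follows over a month. This sketch assumes a 30-day month and counts only the hourly charge, not any per-GB data processing:

```python
HOURLY_RATE = 0.059  # rate quoted above; check the pricing page for your region


def monthly_nat_cost(hourly_rate=HOURLY_RATE, days=30, gateways=1):
    """Hourly charge only, for a given number of always-on NAT gateways."""
    return round(hourly_rate * 24 * days * gateways, 2)


print(monthly_nat_cost())  # 42.48
```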

Is AWS Lambda using Elastic IP?

First my question: are AWS Lambda "instances" using EIP?
My background:
I'm using Lambda to offload a certain task from my application (downloading YouTube videos).
In the past I had problems doing this very thing on my EC2 instances when they used an EIP: requests always returned a limit-exceeded message and prompted a human captcha verification. I solved this at the time by using the instances without an EIP, and it worked like a charm.
Now, using Lambda, certain videos throw Error: Code 150: The uploader has not made this video available in your country. I double-checked that the video was not blocked for the US, and it wasn't. So I went back and tested with an instance with an EIP, and that was it: the same message that was being returned in my Lambda function.
It seems to be a change on YouTube's side, because around 3-4 months ago the error when using an EIP was limit exceeded, but now it has turned into a country-blocked issue. So it looks like Lambda uses EIP or a similar service that YouTube doesn't seem to like.
PS: I'm running my Lambda function with Node.js and downloading the videos with ytdl-core.
PS2: I asked this very question in aws forums but no luck so far in a week or so. So I decided to try asking here.
Thanks in advance
AWS Lambda is not the same as an EC2 instance. It runs on containers within the AWS infrastructure. Traffic would "appear" to be coming from certain IP addresses, but there is no way to configure which IP address is used.
It is possible that the range of "IP addresses from which Lambda appears to come" is not correctly updated in the geo-database used by the video service, and it thinks they are located in a different location.
Bottom line: There is nothing you can configure.
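Although the address cannot be configured, AWS publishes its public address ranges as JSON (https://ip-ranges.amazonaws.com/ip-ranges.json), so you can at least check which published range and region an observed outbound address falls into. Below is a sketch with the standard library. The sample entries are made-up documentation addresses shaped like that file's `prefixes` list, not real AWS ranges:

```python
import ipaddress


def matching_prefixes(address, prefixes):
    """Return the entries whose 'ip_prefix' network contains the address."""
    ip = ipaddress.ip_address(address)
    return [p for p in prefixes if ip in ipaddress.ip_network(p["ip_prefix"])]


# Made-up sample entries (documentation address ranges, not real AWS data).
sample_prefixes = [
    {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "EC2"},
]

for entry in matching_prefixes("203.0.113.45", sample_prefixes):
    print(entry["region"], entry["service"])
```

With the real file, you would load the JSON and pass its `prefixes` list in place of `sample_prefixes`.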

Resources