Cannot override DynamoDB endpoint for Kinesis Consumer - node.js

I cannot set up my local environment with aws-sdk, localstack and aws-kcl. After creating the consumer and trying to run it in my local environment, I get an error that my credentials are incorrect.
So the Kinesis consumer always goes to the real Amazon DynamoDB, and I cannot point it at my localstack DynamoDB. The question is: how can I point it to my local DynamoDB?

I believe there are currently a few issues with connecting the MultiLangDaemon to the Kinesis Client Library, but the settings you are looking for are buried within kcl.properties. Add these settings:
kinesisEndpoint = http://localhost:4568
dynamoDBEndpoint = http://localhost:4569
They should make the MultiLangDaemon point to your local instances of Kinesis and DynamoDB.
I've tried this multiple times with .NET and it seems to be having issues further down the pipeline, but for now I hope this helps!
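For reference, a fuller kcl.properties sketch for a localstack setup might look like the following (the executable, stream name and application name are placeholders; the endpoint keys are the ones mentioned above):

# Hypothetical kcl.properties for running a node.js consumer against localstack
executableName = node sample_kcl_app.js
streamName = my-local-stream
applicationName = my-local-app
AWSCredentialsProvider = DefaultAWSCredentialsProviderChain
initialPositionInStream = TRIM_HORIZON
kinesisEndpoint = http://localhost:4568
dynamoDBEndpoint = http://localhost:4569

The application name matters here because the MultiLangDaemon uses it as the name of the DynamoDB lease table, which is the table that should end up in your localstack DynamoDB rather than the real service.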

Related

Stream changes from Amazon RDS for PostgreSQL using Amazon Kinesis Data Streams and AWS Lambda

I am trying to follow this documentation: https://aws.amazon.com/blogs/database/stream-changes-from-amazon-rds-for-postgresql-using-amazon-kinesis-data-streams-and-aws-lambda/
in order to stream changes in a Postgres database (running in AWS RDS) to AWS Kinesis. When I run the given code on EC2 (or on my local system), it works and prints any CRUD operation to the terminal. However, option 2, using Lambda, does not work. Nowhere is it mentioned how the Lambda will be triggered. Also, a Lambda is supposed to run for a maximum of 15 minutes. I am quite confused and would really appreciate some help with this.

Change failover time for aws kcl

AWS recommends increasing the failover time for KCL (Kinesis) for apps with connectivity issues.
https://docs.aws.amazon.com/streams/latest/dev/troubleshooting-consumers.html
But I can't find how the failover time can be changed.
I’m looking for (one or all):
settings in AWS console
settings for the node.js kcl package
settings in Terraform
The failover time is a configuration option for the Kinesis Client Library. It is not a property on the stream. As a result, you cannot change it in the AWS console.
The AWS Kinesis Client Library for Node.js is configured using property files. I assume you already have a property file, otherwise you wouldn't be able to start your consumer application. What you need to do is add this to your property file:
# Fail over time in milliseconds.
failoverTimeMillis = 10000
See this sample property file provided by the library:
https://github.com/awslabs/amazon-kinesis-client-nodejs/blob/master/samples/basic_sample/consumer/sample.properties#L38
Also see this documentation for more detail on how to change the property file:
https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-implementation-app-nodejs.html#kinesis-record-processor-initialization-nodejs

Lambda starts timing out randomly when communicating with DynamoDB

I have a Node.js Lambda code base that talks to a tiny dataset in DynamoDB (items of less than 400 bytes each). Every now and then the function will time out after 5 minutes while doing a get() request to DynamoDB (via dynamoDb = new AWS.DynamoDB.DocumentClient()).
The problem is that it's completely random when this issue occurs, but when it works it takes ~2 seconds from a cold start, so taking over 5 minutes to run makes no sense, and it happens at random points.
It's a dev environment, so I'm the only one using it, and I'm making maybe 10 requests a day
context.callbackWaitsForEmptyEventLoop = false; has been set
Memory allocation never exceeds 45MB (128MB set)
I'm testing directly in Lambda
The code is deployed via Serverless
When testing locally using Serverless it works, while the deployed Lambda fails
I've inherited this project but have a good understanding of the architecture around it, which is fairly simple; however, I've not done much work with Lambda before.
Any ideas on what I should look for, or any known issues, would be a massive help.
It sounds like one (or more) of the VPC subnets the Lambda function is configured to run in doesn't have a route to a NAT Gateway (or an AWS PrivateLink configuration). So whenever that subnet is used by the Lambda function it is unable to access the AWS API.
If the Lambda function doesn't actually need to access any resources in the VPC then it is much better to not configure it to use the VPC at all.
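As a rough way to check this, a diagnostic sketch along the following lines (Python/boto3; the function name is a placeholder and AWS credentials are assumed to be configured) lists the subnets the function runs in and whether each subnet's route table has a NAT gateway route:

import boto3

# Hypothetical function name; replace with the real one.
FUNCTION_NAME = "my-dynamodb-function"

lambda_client = boto3.client("lambda")
ec2 = boto3.client("ec2")

cfg = lambda_client.get_function_configuration(FunctionName=FUNCTION_NAME)
subnet_ids = cfg.get("VpcConfig", {}).get("SubnetIds", [])
print("Subnets:", subnet_ids or "not VPC-attached")

for subnet_id in subnet_ids:
    # Route tables explicitly associated with this subnet (if none are,
    # the VPC's main route table applies, which this sketch doesn't cover).
    tables = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )["RouteTables"]
    has_nat = any(
        "NatGatewayId" in route
        for table in tables
        for route in table.get("Routes", [])
    )
    print(subnet_id, "has a NAT route" if has_nat else "no NAT route found")

If a subnet has no NAT (or VPC endpoint) route, calls to DynamoDB from that subnet will hang until the function times out, which matches the random long-running invocations described above.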

Lambda lost connection to RDS at 01:00 2019-01-12 (EU/London)

I have a set of Lambda functions that processes messages on an SQS stack. They take data sets, process them and store the results in an RDS MySQL database, which they connect to via VPC. Both the Lambda functions and the RDS database are in the same availability zone.
This had been working for the last couple of months without any issues, but early this morning (2019-01-12) at 01:00 I started seeing Lambda timeouts and messages being moved into the dead letter queue.
I've done some troubleshooting and confirmed the reason for the timeouts is the inability for Lambda to establish a connection to the database server.
The RDS server is public, but locked down to allow access only through VPC and 2 public IPs.
I've taken the following steps so far to try and resolve the issue:
Given the lambda service role admin rights to rule out IAM issues
Unassigned the VPC from the Lambda functions and opened up RDS inbound access from 0.0.0.0/0 to rule out VPC issues.
Restarted the RDS hosts, the good ol' off'n'on again.
Used serverless to invoke the lambda functions locally with test data (worked). My local machine connects to the public RDS IP, not through VPC.
Changed the runtime environment from 3.6 to 3.7
It doesn't appear to be a code issue: it's been working flawlessly for the past couple of months, I can invoke it locally without issue, and my Elastic Beanstalk instance, which sits on the same VPC subnet, continues to connect through the VPC without issue.
Here's the code I'm using to connect:
connectionString = 'mysql+pymysql://{0}:{1}@{2}/{3}'.format(os.environ['DB_USER'], os.environ['DB_PASSWORD'], os.environ['DB_HOST'], os.environ['DB_SCHEMA'])
engine = create_engine(connectionString, poolclass=NullPool)
with engine.connect() as con:  # <--- breaking here
    meta = MetaData(engine, reflect=True)  # <-- never gets to here
I double checked the connection string & user accounts, both are correct/working locally.
If someone could point me in the right direction, I'd be grateful!
My first guess is that you've hit a connection limit on the RDS database. Because Lambdas can be executed concurrently (this could easily be the case if there were suddenly a lot of messages in your SQS queue), and each execution opens a new connection to your DB, the connection pool can get saturated.
If this is the case, you can set a concurrent execution limit on your Lambda function to prevent this.
A side note - it is not recommended to use a database with a persistent connection in a serverless architecture exactly for this reason. AFAIK, AWS is working on a better solution to use RDS from Lambda, but it's not available yet.
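If you want to apply that concurrency cap programmatically rather than in the console, a minimal sketch using boto3 (the function name and limit are placeholders):

import boto3

lambda_client = boto3.client("lambda")

# Cap concurrent executions so a burst of SQS messages can't exhaust
# the RDS connection limit (hypothetical function name and value).
lambda_client.put_function_concurrency(
    FunctionName="my-sqs-consumer",
    ReservedConcurrentExecutions=5,
)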
So...
I was changing security groups and it was having no effect on the RDS host; at one point I removed all access and I could still connect, which is crazy. At that point I started to think the outage on Friday night had put the underlying RDS host into a weird state. I put the security groups back the way they should be, stopped and started the RDS host (a restart had no effect), and everything started to work again.
Very frustrating, but happy it's finally resolved.

Connection timed out - using sqlalchemy to access AWS usaspending data

I created an instance of the usaspending.gov database in my AWS RDS. A description of this database can be found here: https://aws.amazon.com/public-datasets/usaspending/
The data are available as a PostgreSQL snapshot, and I would like to access the database using Python's sqlalchemy package within a Jupyter notebook within Amazon SageMaker.
I tried to set up my database connection with the code below, but I'm getting a Connection timed out error. I'm pretty new to AWS and SageMaker, so maybe I messed up my sqlalchemy engine? I think my VPC security settings are OK (it looks like they accept inbound and outbound requests).
Any ideas what I could be missing?
engine = create_engine('postgresql://root:password@[my endpoint]/[DB instance]')
(I'm a member of the SageMaker team.)
Thanks for using Amazon SageMaker!
Can you check the VPC settings on your side? It might be related to an issue another AWS customer saw with Redshift: https://forums.aws.amazon.com/thread.jspa?threadID=270111&tstart=0 . The issue might be that DNS is not resolving to the private IP address of the RDS instance. If following that forum post doesn't resolve your issue, you can start a forum thread with AWS SageMaker and we will help you debug it.
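One quick way to separate a network/DNS problem from a database-level problem is to test raw TCP reachability from the SageMaker notebook before involving SQLAlchemy. A minimal sketch, assuming PostgreSQL on the default port 5432 (the endpoint is a placeholder):

import socket

# Hypothetical RDS endpoint; use your instance's endpoint here.
host, port = "mydb.xxxxxxxx.us-east-1.rds.amazonaws.com", 5432

try:
    socket.create_connection((host, port), timeout=5).close()
    print("TCP connection OK - the timeout is probably not a VPC/routing issue")
except OSError as exc:
    print("Could not reach the database:", exc)

If this also times out, the problem is in the VPC/security group/DNS path rather than in the SQLAlchemy engine.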
