Lambda function gets stuck when calling RDS via SQLalchemy URI

Lambda function gets stuck when calling RDS via SQLalchemy URI - python-3.x

I have a fast API application. Initially, I was passing my DB URI via ngrok tunnel like this in my SAM template. In this setup Lambda will be using my local machine's PSQL DB.
DbConnnectionString:
Type: String
Default: postgresql://<uname>:<pwd>#x.tcp.ngrok.io:PORT/DB
This is how I read the URI in my Python code
# config.py
DATABASE_URL = os.environ.get('DB_URI')
db_engine = create_engine(DATABASE_URL)
db_session = sessionmaker(autocommit=False, autoflush=False,bind=db_engine)
print(f"Configs initialized for {API_V1_STR}")
# app.py
# 3rd party
from fastapi import FastAPI
# Custom
from config.app_config import PROJECT_NAME, db_engine
from models.db_models import Base
print("Creating all database")
Base.metadata.create_all(bind=db_engine)
app = FastAPI(title=PROJECT_NAME)
print("APP created")
In this setup, everything seems to work as expected.
But whenever I replace the DB URL with RDS DB, suddenly the call gets stuck at create all database step as shown in the image below. when this happens the lambda always times out and throws exceptions.
If I run the code locally using uvicorn this error doesn't occur.
Everything works as expected.
When I use sam local invoke even with RDS URL, the API call works without any issues.
This problem occurs only while deployed in AWS Lambda.
I notice that configs are initialized twice in this setup, Once before START request ID and once after.
I have tried reading up on it but not clear what could I do to fix this. Any help would be much appreciated.

It was my bad!. I didn't pay attention to security groups. It was a connection timeout all along. Once I fixed the port access in Security groups, lambda started working as expected.

Related

list_datasets() method does nothing in AWS Lambda

I am trying to get the list of datasets from BigQuery inside the AWS lambda. But, while executing the client.list_datasets() method it does nothing and lambda is timed out.
My code is as follows:
from google.cloud.bigquery import Client
from google.oauth2.service_account import Credentials
credentials = Credentials.from_service_account_info(
service_account_dict)
client = Client(
project=service_account_dict.get("project_id"),
credentials=credentials
)
datasets = client.list_datasets()
print(datasets)
for dataset in datasets:
print("dataset info", dataset.__dict__)
The output of first print statement is:
<google.api_core.page_iterator.HTTPIterator object at 0x7fbae4975550>
But, the second print for dataset.__dict__ is not being printed. Or, looping over the HTTPIterator object is not performed.
BTW, the code works perfectly fine in local machine.

The AWS VPC that I used in lambda function was causing this issue. The VPC blocked requests to the external API (in my case BigQuery API).
Configuring the VPC subnet and NAT Gateway to expose lambda function to the internet (0.0.0.0/0) solved the issue.

How to trigger a function before Gunicorn starts it's main process

I'm setting up a Flask app with Gunicorn in a Docker environment.
When I want to spin up my containers, I want my Flask container to create database tables (based on my models) if my database is empty. I included a function in my wsgi.py file, but that seems to trigger the function each time a worker is initialized. After that I tried to use server hooks in my gunicorn.py config file, like below.
"""gunicorn WSGI server configuration."""
from multiprocessing import cpu_count
from setup import init_database
def on_starting(server):
"""Executes code before the master process is initialized"""
init_database()
def max_workers():
"""Returns an amount of workers based on the number of CPUs in the system"""
return 2 * cpu_count() + 1
bind = '0.0.0.0:8000'
worker_class = 'eventlet'
workers = max_workers()
I expect gunicorn to trigger the on_starting function automatically but the hook never seems to trigger. The app seems to startup normally, but when I try to make a request that wants to insert a database entry it says that the table doesn't exist. How do I trigger the on_starting hook?

I fixed my issue by preloading the app first before creating workers to serve my app. I did this by adding this line to my gunicorn.py config file:
...
preload_app = True
This way the app is already running and can accept commands to create the necessary database tables.

Gunicorn imports a module in order to get at app (or whatever other name you tell Gunicorn the WSGI application object lives at). During that import, which happens before Gunicorn starts directing traffic to the app, code is executing. Put your startup code there, after you've created db (assuming you're using SQLAlchemy), and imported your models (so that SQLAlchemy will know about then and will hence know what tables to create).
Alternatively, populate your container with an pre-created database.

Flask SQLalchemy can't connect to Google Cloud Postgresql database with Unix socket

I am using Flask SQLalchemy in my google app engine standard environment project to try and connect to my GCP Postgresql database..
According to google docs, the url can be created in this format
# postgres+pg8000://<db_user>:<db_pass>#/<db_name>?unix_socket=/cloudsql/<cloud_sql_instance_name>
and below is my code
from flask import Flask, request, jsonify
import constants
app = Flask(__name__)
# Database configuration from GCP postgres+pg8000
DB_URL = 'postgres+pg8000://{user}:{pw}#/{db}?unix_socket=/cloudsql/{instance_name}'.format(user=user,pw=password,db=dbname, instance_name=instance_name)
app.config['SQLALCHEMY_DATABASE_URI'] = DB_URL
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False # silence the
deprecation warning
sqldb = SQLAlchemy(app)
This is the error i keep getting:
File "/env/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 412, in connect return self.dbapi.connect(*cargs, **cparams) TypeError: connect() got an unexpected keyword argument 'unix_socket'

The argument to specify a unix socket varies depending on what driver you use. According to the pg8000 docs, you need to use unix_sock instead of unix_socket.
To see this in the context of an application, you can take a look at this sample application.

It's been more than 1.5 years and no one has posted the solution yet :)
Anyway, just use the below URI
postgres+psycopg2://<db_user>:<db_pass>#<public_ip>/<db_name>?host=/cloudsql/<cloud_sql_instance_name>
And yes, don't forget to add your systems public IP address to the authorized network.

Example of docs
As you can read in the gcloud guides, an examplary connection string is
postgres+pg8000://<db_user>:<db_pass>#/<db_name>?unix_sock=<socket_path>/<cloud_sql_instance_name>/.s.PGSQL.5432
Varying engine and socket part
Be aware that the engine part postgres+pg8000 varies depending on your database and used driver. Also, depending on your database client library, the socket part ?unix_sock=<socket_path>/<cloud_sql_instance_name>/.s.PGSQL.5432 may be needed or can be omitted, as per:
Note: The PostgreSQL standard requires a .s.PGSQL.5432 suffix in the socket path. Some libraries apply this suffix automatically, but others require you to specify the socket path as follows: /cloudsql/INSTANCE_CONNECTION_NAME/.s.PGSQL.5432.
PostgreSQL and flask_sqlalchemy
For instance, I am using PostgreSQL with flask_sqlalchemy as database client and pg8000 as driver and my working connection string is only postgres+pg8000://<db_user>:<db_pass>#/<db_name>.

Pulling image from ECR via docker-py

I have a script that retrieves a login for ECR, authenticates a DockerClient instance with the login credentials (reauth set to True), and then attempts to pull a nominated container image.
The code seems to work perfectly when running on my local machine interacting with docker daemon on an EC2 instance, but when running from the EC2 instance I am constantly getting
404 Client Error: Not Found ("repository XXXXXXXX.dkr.ecr.eu-west-2.amazonaws.com/autohld-runner not found: does not exist or no pull access")
The same repo is being used for both executing the code locally and remotely on the EC2 instance. I have tried setting the access to the image within ECR to allow pull for both everyone and my AWS ID. I have granted the role assigned to the EC2 instance Full Admin access also. All with no joy.
If I perform the same tasks on the EC2 instance via command line with the exact same repo URI (copied from the error), it works with no issue.
Is there something I am missing within docker-py ?
url = "tcp://127.0.0.1:2375"
dockerd = docker.DockerClient(base_url=url, version='auto')
dockerd.login(username=ecr.username, password=ecr.password, email='none', registry=ecr.registry, reauth=True)
dockerd.images.pull(ecr.get_repo(instance.tags['Container']), tag='latest')
get_repo returns the full URI as reported in the error message, the Container element is the name 'autohld-runner'
Thanks

It seems that if the registry has been accessed via the cli then an auth token or something is set and docker remembers this allowing subsequent calls to work. However in this case the instance is starting up completely fresh and using the login method within docker-py.
This doesn't seem to pass the credentials on to the pull, I have found that using the auth_config named argument and passing in a dictionary of auth parameters works.
auth_creds = {'username': ecr.username, 'password': ecr.password}
dockerd.images.pull(ecr.get_repo(instance.tags['Container']), tag='latest', auth_config=auth_creds)
HTH

Read/Read-Write URIs for Amazon Web Services RDS

I am using HAProxy to for AWS RDS (MySQL) load balancing for my app, that is written using Flask.
The HAProxy.cfg file has following configuration for the DB
listen mysql
bind 127.0.0.1:3306
mode tcp
balance roundrobin
option mysql-check user haproxy_check
option log-health-checks
server db01 MASTER_DATABSE_ENDPOINT.rds.amazonaws.com
server db02 READ_REPLICA_ENDPOINT.rds.amazonaws.com
I am using SQLALCHEMY and it's URI is:
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://USER:PASSWORD#127.0.0.1:3306/DATABASE'
but when I am running an API in my test environment, the APIs that are just reading stuff from DB are executing just fine but the APIs that are writing something to DB are giving me errors mostly that:
(pymysql.err.InternalError) (1290, 'The MySQL server is running with the --read-only option so it cannot execute this statement')
I think I need to use 2 URLs now in this scenario, one for read-only operation and one for writes.
How does this work with Flask and SQLALCHEMY with HAProxy?
How do I tell my APP to use one URL for write operations and other HAProxy URL to read-only operations?
I didn't find any help from the documentation of SQLAlchemy.

Binds
Flask-SQLAlchemy can easily connect to multiple databases. To achieve
that it preconfigures SQLAlchemy to support multiple “binds”.
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://USER:PASSWORD#DEFAULT:3306/DATABASE'
SQLALCHEMY_BINDS = {
'master': 'mysql+pymysql://USER:PASSWORD#MASTER_DATABSE_ENDPOINT:3306/DATABASE',
'read': 'mysql+pymysql://USER:PASSWORD#READ_REPLICA_ENDPOINT:3306/DATABASE'
}
Referring to Binds:
db.create_all(bind='read') # from read only
db.create_all(bind='master') # from master

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Lambda function gets stuck when calling RDS via SQLalchemy URI - python-3.x

It was my bad!. I didn't pay attention to security groups. It was a connection timeout all along. Once I fixed the port access in Security groups, lambda started working as expected.

Related

list_datasets() method does nothing in AWS Lambda

How to trigger a function before Gunicorn starts it's main process

Flask SQLalchemy can't connect to Google Cloud Postgresql database with Unix socket

Pulling image from ECR via docker-py

Read/Read-Write URIs for Amazon Web Services RDS

Categories

Resources