AWS Lambda issue connecting to DocumentDB - python-3.x

Okay, I have written an AWS Lambda function which pulls data from an API and inserts the data into a DocumentDB database. When I connect to my cluster from the shell and run my Python script, it works just fine and inserts the data with no problem.
But when I implement the same logic in a Lambda function, it does not work. Below is an example of what works in the shell but not through a Lambda function.
import urllib3
import json
import certifi
import pymongo
from pymongo import MongoClient
# Make our connection to the DocumentDB cluster
# (Here I use the DocumentDB URI)
client = MongoClient('mongodb://admin_name_here:<insertYourPassword>@my_docdb_cluster/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&retryWrites=false')
# Specify the database to use
db = client.my_db
# Specify the collection to use
col = db.my_col
col.insert_one({"name": "abcdefg"})
The above works just fine in the shell but when run in Lambda I get the following error:
[ERROR] ServerSelectionTimeoutError: my_docdb_cluster timed out, Timeout: 30s, Topology Description: <TopologyDescription id: ***********, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription (my_docdb_cluster) server_type: Unknown, rtt: None, error=NetworkTimeout(my_docdb_cluster timed out')>]>
From my understanding, this error is telling me that the replica set has no primary. But that is not true; there definitely is a primary in my replica set. Does anyone know what the problem could be here?
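For reference, the same connection can also be written with the URI options as explicit MongoClient keyword arguments. This is just a sketch, assuming pymongo 3.x and that rds-combined-ca-bundle.pem is packaged with the Lambda deployment; a short serverSelectionTimeoutMS makes a networking misconfiguration fail in seconds instead of after the full 30-second server selection timeout.
import pymongo
from pymongo import MongoClient

# Sketch only: same connection as above, options spelled out as keyword arguments.
# 'my_docdb_cluster' is the placeholder cluster endpoint from the question.
client = MongoClient(
    'mongodb://my_docdb_cluster',
    username='admin_name_here',
    password='<insertYourPassword>',
    ssl=True,
    ssl_ca_certs='rds-combined-ca-bundle.pem',  # the CA bundle must ship inside the Lambda package
    retryWrites=False,
    serverSelectionTimeoutMS=5000,  # fail fast if the Lambda cannot reach the cluster
)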

Related

How to get pyodbc connection on AWS MWAA Airflow DAG?

I tried putting pyodbc=4.0.30 in requirements.txt for MWAA Airflow, and in the code I build the connection string like this:
dbconnection = pyodbc.connect("Driver={ODBC Driver 17 for SQL Server};Server="+Server+";Database="+Database+";UID="+UserID+";PWD="+Password, autocommit=True)
Now the error is Broken DAG: [/usr/local/airflow/dags/test.py] No module named 'pyodbc'
Version of Airflow: 1.10.12
There is hardly any documentation on SQL Server/Postgres-based connections in the MWAA AWS documentation, especially for a pyodbc connection. I earlier hit this issue with Lambda functions and solved it with Lambda layers, but I'm not sure how MWAA works; any suggestions appreciated.
Please don't recommend any other technology like EC2 to host Airflow, as the company insists on using MWAA Airflow.
import pandas as pd
import pymssql

# Using pymssql instead of pyodbc avoids the need for an ODBC driver
conn = pymssql.connect(
    server=server,
    user=username,
    password=password,
    database=database
)

# Read the query results into a pandas DataFrame
query = "select IDpk, name, Remarks from TestTable"
df = pd.read_sql(query, conn)
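For context, a minimal sketch of how the same pymssql query could sit inside an MWAA DAG task, assuming pymssql and pandas are listed in the environment's requirements.txt; the Airflow 1.10-style import path matches the version above, and the connection details are placeholders.
from datetime import datetime
import pandas as pd
import pymssql
from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x import path

# Placeholder connection details; in a real DAG these would come from an
# Airflow Connection or Variable rather than being hard-coded
server, username, password, database = "<server>", "<user>", "<password>", "<db>"

def fetch_test_table():
    conn = pymssql.connect(server=server, user=username, password=password, database=database)
    df = pd.read_sql("select IDpk, name, Remarks from TestTable", conn)
    conn.close()
    print(df.head())

with DAG("mssql_example_dag", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    fetch_task = PythonOperator(task_id="fetch_test_table", python_callable=fetch_test_table)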

Lambda function gets stuck when calling RDS via SQLalchemy URI

I have a FastAPI application. Initially, I was passing my DB URI via an ngrok tunnel in my SAM template, like this; in this setup the Lambda uses the PostgreSQL database on my local machine.
DbConnnectionString:
  Type: String
  Default: postgresql://<uname>:<pwd>@x.tcp.ngrok.io:PORT/DB
This is how I read the URI in my Python code
# config.py
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DATABASE_URL = os.environ.get('DB_URI')
db_engine = create_engine(DATABASE_URL)
db_session = sessionmaker(autocommit=False, autoflush=False, bind=db_engine)
print(f"Configs initialized for {API_V1_STR}")  # API_V1_STR is defined elsewhere in the config
# app.py
# 3rd party
from fastapi import FastAPI
# Custom
from config.app_config import PROJECT_NAME, db_engine
from models.db_models import Base
print("Creating all database")
Base.metadata.create_all(bind=db_engine)
app = FastAPI(title=PROJECT_NAME)
print("APP created")
In this setup, everything works as expected.
But whenever I replace the DB URL with the RDS DB, the call gets stuck at the "Creating all database" step, the Lambda always times out, and exceptions are thrown.
If I run the code locally using uvicorn, this error doesn't occur and everything works as expected.
When I use sam local invoke, even with the RDS URL, the API call works without any issues.
This problem occurs only when deployed in AWS Lambda.
I notice that the configs are initialized twice in this setup, once before START RequestId and once after.
I have tried reading up on it, but it's not clear what I could do to fix this. Any help would be much appreciated.
It was my bad! I didn't pay attention to the security groups. It was a connection timeout all along. Once I fixed the port access in the security groups, the Lambda started working as expected.
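As a side note, a short connect timeout on the engine makes this kind of misconfiguration show up as an explicit connection error instead of the Lambda silently hanging until its own timeout. A minimal sketch, assuming the psycopg2 driver implied by the plain postgresql:// URL:
import os
from sqlalchemy import create_engine

DATABASE_URL = os.environ.get('DB_URI')  # same environment variable as in config.py above
db_engine = create_engine(
    DATABASE_URL,
    connect_args={"connect_timeout": 5},  # seconds; passed through to psycopg2/libpq
    pool_pre_ping=True,                   # re-check pooled connections between invocations
)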

Connecting to MongoDB in Kubernetes pod with kubernetes-client using Python

I have a MongoDB instance running on Kubernetes and I'm trying to connect to it using Python with the Kubernetes library.
I'm connecting to the context on cmd line using:
kubectl config use-context CONTEXTNAME
With Python, I'm using:
from kubernetes import client, config
config.load_kube_config(
    context='CONTEXTNAME'
)
To connect to MongoDB in cmd line:
kubectl port-forward svc/mongo-mongodb 27083:27017 -n production &
I then open a new terminal and use PORT_FORWARD_PID=$! to connect
I'm trying to connect to the MongoDB instance using Python with the kubernetes-client library; any ideas on how to accomplish the above?
Define a Kubernetes Service for MongoDB, and then reference it using a connection string similar to mongodb://<service-name>.default.svc.cluster.local
My understanding is that you need to find out your DB client endpoint.
That can be achieved by following the article MongoDB on K8s; make sure you have the URI for MongoDB, for example:
mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/dbname_?
After that, you can call your DB client in a Python script.
import pymongo
import sys
##Create a MongoDB client
client = pymongo.MongoClient('mongodb://......')
##Specify the database to be used
db = client.test
##Specify the collection to be used
col = db.myTestCollection
##Insert a single document
col.insert_one({'hello':'world'})
##Find the document that was previously written
x = col.find_one({'hello':'world'})
##Print the result to the screen
print(x)
##Close the connection
client.close()
Hope that will give you an idea.
Good luck!
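Another option: if the kubectl port-forward from the question is left running, the client can simply point at the forwarded local port. A minimal sketch, assuming the 27083:27017 forwarding shown above:
import pymongo

# Assumes `kubectl port-forward svc/mongo-mongodb 27083:27017 -n production` is still running
client = pymongo.MongoClient('mongodb://localhost:27083', serverSelectionTimeoutMS=5000)
print(client.server_info())  # fails quickly if the tunnel is not up
client.close()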

list_datasets() method does nothing in AWS Lambda

I am trying to get the list of datasets from BigQuery inside an AWS Lambda. But when executing the client.list_datasets() method, it does nothing and the Lambda times out.
My code is as follows:
from google.cloud.bigquery import Client
from google.oauth2.service_account import Credentials

credentials = Credentials.from_service_account_info(service_account_dict)
client = Client(
    project=service_account_dict.get("project_id"),
    credentials=credentials
)

datasets = client.list_datasets()
print(datasets)
for dataset in datasets:
    print("dataset info", dataset.__dict__)
The output of the first print statement is:
<google.api_core.page_iterator.HTTPIterator object at 0x7fbae4975550>
But the second print of dataset.__dict__ is never reached; looping over the HTTPIterator object never completes.
BTW, the code works perfectly fine on my local machine.
The AWS VPC that I used in the Lambda function was causing this issue. The VPC blocked requests to the external API (in my case, the BigQuery API).
Configuring the VPC subnets and a NAT gateway to give the Lambda function outbound internet access (0.0.0.0/0) solved the issue.
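Worth noting: list_datasets() returns a lazy HTTPIterator, so the actual BigQuery API request is only sent when the iterator is consumed, which is why the first print succeeded while the loop hung. A small sketch (the timeout value is an assumption) that forces the request eagerly so a blocked network path raises an error instead of hanging:
# client is the bigquery Client constructed in the question above
datasets = list(client.list_datasets(timeout=30))  # the API call happens here
for dataset in datasets:
    print("dataset info", dataset.dataset_id)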

Flask SQLalchemy can't connect to Google Cloud Postgresql database with Unix socket

I am using Flask-SQLAlchemy in my Google App Engine standard environment project to try to connect to my GCP PostgreSQL database.
According to the Google docs, the URL can be created in this format:
# postgres+pg8000://<db_user>:<db_pass>@/<db_name>?unix_socket=/cloudsql/<cloud_sql_instance_name>
and below is my code
from flask import Flask, request, jsonify
from flask_sqlalchemy import SQLAlchemy
import constants

app = Flask(__name__)

# Database configuration from GCP postgres+pg8000
DB_URL = 'postgres+pg8000://{user}:{pw}@/{db}?unix_socket=/cloudsql/{instance_name}'.format(
    user=user, pw=password, db=dbname, instance_name=instance_name)
app.config['SQLALCHEMY_DATABASE_URI'] = DB_URL
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False  # silence the deprecation warning
sqldb = SQLAlchemy(app)
This is the error I keep getting:
File "/env/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 412, in connect return self.dbapi.connect(*cargs, **cparams) TypeError: connect() got an unexpected keyword argument 'unix_socket'
The argument to specify a unix socket varies depending on what driver you use. According to the pg8000 docs, you need to use unix_sock instead of unix_socket.
To see this in the context of an application, you can take a look at this sample application.
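For reference, a minimal sketch of building that connection string programmatically with SQLAlchemy's URL helper, assuming SQLAlchemy 1.4+ and the pg8000 driver; the credential variables are placeholders.
import sqlalchemy
from sqlalchemy.engine import URL

# Placeholders for credentials and the instance connection name
db_user, db_pass, db_name = "<db_user>", "<db_pass>", "<db_name>"
instance_connection_name = "<cloud_sql_instance_name>"

db_url = URL.create(
    drivername="postgresql+pg8000",
    username=db_user,
    password=db_pass,
    database=db_name,
    query={"unix_sock": "/cloudsql/{}/.s.PGSQL.5432".format(instance_connection_name)},
)
db_engine = sqlalchemy.create_engine(db_url)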
It's been more than 1.5 years and no one has posted the solution yet :)
Anyway, just use the URI below:
postgres+psycopg2://<db_user>:<db_pass>@<public_ip>/<db_name>?host=/cloudsql/<cloud_sql_instance_name>
And yes, don't forget to add your system's public IP address to the authorized networks.
Example from the docs
As you can read in the gcloud guides, an example connection string is
postgres+pg8000://<db_user>:<db_pass>@/<db_name>?unix_sock=<socket_path>/<cloud_sql_instance_name>/.s.PGSQL.5432
Varying engine and socket part
Be aware that the engine part postgres+pg8000 varies depending on your database and the driver used. Also, depending on your database client library, the socket part ?unix_sock=<socket_path>/<cloud_sql_instance_name>/.s.PGSQL.5432 may be needed or can be omitted, as per:
Note: The PostgreSQL standard requires a .s.PGSQL.5432 suffix in the socket path. Some libraries apply this suffix automatically, but others require you to specify the socket path as follows: /cloudsql/INSTANCE_CONNECTION_NAME/.s.PGSQL.5432.
PostgreSQL and flask_sqlalchemy
For instance, I am using PostgreSQL with flask_sqlalchemy as the database client and pg8000 as the driver, and my working connection string is just postgres+pg8000://<db_user>:<db_pass>@/<db_name>.
