list_datasets() method does nothing in AWS Lambda - python-3.x

I am trying to get the list of datasets from BigQuery inside an AWS Lambda function, but when I call the client.list_datasets() method nothing happens and the Lambda times out.
My code is as follows:
from google.cloud.bigquery import Client
from google.oauth2.service_account import Credentials

credentials = Credentials.from_service_account_info(service_account_dict)
client = Client(
    project=service_account_dict.get("project_id"),
    credentials=credentials
)

datasets = client.list_datasets()
print(datasets)
for dataset in datasets:
    print("dataset info", dataset.__dict__)
The output of the first print statement is:
<google.api_core.page_iterator.HTTPIterator object at 0x7fbae4975550>
But the second print for dataset.__dict__ is never reached; iterating over the HTTPIterator object never completes.
BTW, the code works perfectly fine on my local machine.

The AWS VPC that I used in the Lambda function was causing this issue. The VPC blocked requests to the external API (in my case the BigQuery API).
Configuring the VPC subnet and a NAT Gateway to give the Lambda function outbound internet access (0.0.0.0/0) solved the issue.
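A quick way to confirm whether the VPC is blocking outbound traffic is to try opening a connection to a public endpoint from inside the handler before calling BigQuery. This is only a debugging sketch using the standard library; the host and timeout values are arbitrary:

import socket

def has_internet_access(host="bigquery.googleapis.com", port=443, timeout=3):
    # Try to open a TCP connection to a public endpoint.
    # False suggests the subnet/NAT routing is blocking egress.
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

print("outbound connectivity:", has_internet_access())

If this prints False inside the Lambda but True locally, the problem is routing (subnet, NAT Gateway, security groups) rather than the BigQuery client itself.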

Related

How to increase the AWS lambda to lambda connection timeout or keep the connection alive?

I am using boto3 lambda client to invoke a lambda_S from a lambda_M. My code looks something like
cfg = botocore.config.Config(
    retries={'max_attempts': 0},
    read_timeout=840,
    connect_timeout=600
)  # tried also by including region_name="us-east-1"
lambda_client = boto3.client('lambda', config=cfg)  # even tried without config
invoke_response = lambda_client.invoke(
    FunctionName=lambda_name,
    InvocationType='RequestResponse',
    Payload=json.dumps(request)
)
Lambda_S is supposed to run for about 6 minutes, and I want lambda_M to stay alive to get the response back from lambda_S, but lambda_M times out after logging a CloudWatch message like
"Failed to connect to proxy URL: http://aws-proxy..."
I searched and found something like "configure your HTTP client, SDK, firewall, proxy or operating system to allow for long connections with timeout or keep-alive settings". But the issue is I have no idea how to do any of these with Lambda. Any help is highly appreciated.
I would approach this a bit differently. Lambda charges you by the second, so in general you should avoid waiting inside one. One way to do that is to create an SNS topic and use it as the messenger to trigger another Lambda.
The workflow goes like this:
SNS-A -> triggers lambda-A
SNS-B -> triggers lambda-B
So if your lambda-B wants to send something to lambda-A to process and needs the results back, from lambda-B you publish a message to the SNS-A topic and quit.
SNS-A triggers lambda-A, which does its work and at the end publishes a message to SNS-B.
SNS-B triggers lambda-B again.
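As a rough sketch of what that hand-off looks like in code (the topic ARNs, payload shape and the do_long_running_work helper are hypothetical placeholders, not anything prescribed by AWS):

import json
import boto3

sns = boto3.client('sns')

# Inside lambda-B: hand the work off via SNS-A and return immediately,
# so lambda-B is not billed while the work runs.
def lambda_b_handler(event, context):
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789012:SNS-A',  # placeholder ARN
        Message=json.dumps({'request': event}),
    )
    return {'status': 'work dispatched'}

# Inside lambda-A: do the long-running work, then publish the result to SNS-B,
# which triggers lambda-B again with the result in its event payload.
def lambda_a_handler(event, context):
    payload = json.loads(event['Records'][0]['Sns']['Message'])
    result = do_long_running_work(payload)  # hypothetical worker function
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789012:SNS-B',  # placeholder ARN
        Message=json.dumps(result),
    )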
AWS has example documentation on what policies you should put in place, here is one.
I don't know how you are automating the deployment of native assets like SNS and Lambda; assuming you will use CloudFormation:
you create your AWS::Lambda::Function,
you create your AWS::SNS::Topic,
and in its definition you add a 'Subscription' property and point it to your Lambda.
So in our example, your SNS-A will have a subscription defined for lambda-A.
Lastly, you grant SNS permission to trigger the Lambda: AWS::Lambda::Permission.
When these three are in place, you are all set to send messages to the SNS topic, which will now be able to trigger the Lambda.
You will find SO answers to questions on how to do this in CloudFormation (example), but you can also read up on the AWS CloudFormation documentation.
If you are not worried about automating the stuff and just want to test these manually, then aws-cli is your friend.
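If you'd rather stay in Python than use CloudFormation or the CLI for a quick manual test, the same three pieces (topic, subscription, invoke permission) can be wired up with boto3 roughly like this; the topic and function names are placeholders:

import boto3

sns = boto3.client('sns')
lambda_client = boto3.client('lambda')

# 1. Create the topic (the AWS::SNS::Topic piece).
topic_arn = sns.create_topic(Name='SNS-A')['TopicArn']

# 2. Subscribe the target Lambda to the topic (the 'Subscription' property).
function_arn = lambda_client.get_function(FunctionName='lambda-A')['Configuration']['FunctionArn']
sns.subscribe(TopicArn=topic_arn, Protocol='lambda', Endpoint=function_arn)

# 3. Allow SNS to invoke the Lambda (the AWS::Lambda::Permission piece).
lambda_client.add_permission(
    FunctionName='lambda-A',
    StatementId='AllowInvokeFromSNSA',
    Action='lambda:InvokeFunction',
    Principal='sns.amazonaws.com',
    SourceArn=topic_arn,
)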

How to use AWS Secrets Manager Caching for Python Lambda?

I am referring to the aws-secretsmanager-caching-python documentation and trying to cache a secret retrieved from Secrets Manager; however, for some reason I always get a timeout without any helpful errors to troubleshoot further. I am able to retrieve the secrets properly if I fetch them from Secrets Manager directly (without caching).
My main function in the Lambda looks like this:
import botocore
import botocore.session
from aws_secretsmanager_caching import SecretCache, SecretCacheConfig
from cacheSecret import getCachedSecrets

def lambda_handler(event, context):
    result = getCachedSecrets()
    print(result)
and I have created cacheSecret as follows.
from aws_secretsmanager_caching import SecretCache
from aws_secretsmanager_caching import InjectKeywordedSecretString, InjectSecretString

cache = SecretCache()

@InjectKeywordedSecretString(secret_id='my_secret_name', cache=cache, secretKey1='keyname1', secretKey2='keyname2')
def getCachedSecrets(secretKey1, secretKey2):
    print(secretKey1)
    print(secretKey2)
    return secretKey1
In the above code, my_secret_name is the name of the secret created in Secrets Manager, and secretKey1 and secretKey2 are the secret key names that have string values.
Error:
{
"errorMessage": "2021-03-31T15:29:08.598Z 01f5ded3-7658-4zb5-ae66-6f300098a6e47 Task timed out after 3.00 seconds"
}
Can someone please suggest what needs to be fixed in the above to make this work? Also, I am not sure where to define the secret name and secret key names in case we don't use decorators.
Your Lambda is hitting the default 3-second timeout even though the code needs to run longer. The timeout can be configured in the function config: https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html
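For example, the timeout can be raised programmatically with boto3 (the function name below is a placeholder; the same change can be made in the console or CLI):

import boto3

lambda_client = boto3.client('lambda')
# Raise the function timeout from the 3-second default to 30 seconds.
lambda_client.update_function_configuration(
    FunctionName='my-secrets-caching-function',  # placeholder name
    Timeout=30,
)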

Lambda function gets stuck when calling RDS via SQLalchemy URI

I have a FastAPI application. Initially, I was passing my DB URI via an ngrok tunnel like this in my SAM template, so the Lambda was using my local machine's PostgreSQL DB.
DbConnnectionString:
  Type: String
  Default: postgresql://<uname>:<pwd>@x.tcp.ngrok.io:PORT/DB
This is how I read the URI in my Python code
# config.py
DATABASE_URL = os.environ.get('DB_URI')
db_engine = create_engine(DATABASE_URL)
db_session = sessionmaker(autocommit=False, autoflush=False, bind=db_engine)
print(f"Configs initialized for {API_V1_STR}")

# app.py
# 3rd party
from fastapi import FastAPI

# Custom
from config.app_config import PROJECT_NAME, db_engine
from models.db_models import Base

print("Creating all database")
Base.metadata.create_all(bind=db_engine)

app = FastAPI(title=PROJECT_NAME)
print("APP created")
In this setup, everything seems to work as expected.
But whenever I replace the DB URL with RDS DB, suddenly the call gets stuck at create all database step as shown in the image below. when this happens the lambda always times out and throws exceptions.
If I run the code locally using uvicorn this error doesn't occur.
Everything works as expected.
When I use sam local invoke even with RDS URL, the API call works without any issues.
This problem occurs only while deployed in AWS Lambda.
I notice that configs are initialized twice in this setup, once before START RequestId and once after.
I have tried reading up on it, but it's not clear what I could do to fix this. Any help would be much appreciated.
It was my bad! I didn't pay attention to security groups. It was a connection timeout all along. Once I fixed the port access in the security groups, the Lambda started working as expected.
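One small change that makes this kind of misconfiguration easier to spot is to put a short connect timeout on the engine, so an unreachable database surfaces as an immediate error instead of hanging until the Lambda times out. A sketch, assuming the psycopg2 driver (connect_timeout is a libpq connection parameter) and the DATABASE_URL from config.py above:

from sqlalchemy import create_engine

db_engine = create_engine(
    DATABASE_URL,
    connect_args={"connect_timeout": 5},  # fail fast if the host/port is unreachable
    pool_pre_ping=True,                   # validate pooled connections before reuse
)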

Timeout when writing custom metric data to CloudWatch with AWS lambda

I'm running a vanilla AWS lambda function to count the number of messages in my RabbitMQ task queue:
import boto3
from botocore.vendored import requests

cloudwatch_client = boto3.client('cloudwatch')

def get_queue_count(user="user", password="password", domain="<my domain>/api/queues"):
    url = f"https://{user}:{password}@{domain}"
    res = requests.get(url)
    message_count = 0
    for queue in res.json():
        message_count += queue["messages"]
    return message_count

def lambda_handler(event, context):
    metric_data = [{'MetricName': 'RabbitMQQueueLength', "Unit": "None", 'Value': get_queue_count()}]
    print(metric_data)
    response = cloudwatch_client.put_metric_data(MetricData=metric_data, Namespace="RabbitMQ")
    print(response)
Which returns the following output on a test run:
Response:
{
"errorMessage": "2020-06-30T19:50:50.175Z d3945a14-82e5-42e5-b03d-3fc07d5c5148 Task timed out after 15.02 seconds"
}
Request ID:
"d3945a14-82e5-42e5-b03d-3fc07d5c5148"
Function logs:
START RequestId: d3945a14-82e5-42e5-b03d-3fc07d5c5148 Version: $LATEST
/var/runtime/botocore/vendored/requests/api.py:72: DeprecationWarning: You are using the get() function from 'botocore.vendored.requests'. This dependency was removed from Botocore and will be removed from Lambda after 2021/01/30. https://aws.amazon.com/blogs/developer/removing-the-vendored-version-of-requests-from-botocore/. Install the requests package, 'import requests' directly, and use the requests.get() function instead.
DeprecationWarning
[{'MetricName': 'RabbitMQQueueLength', 'Value': 295}]
END RequestId: d3945a14-82e5-42e5-b03d-3fc07d5c5148
You can see that I'm able to interact with the RabbitMQ API just fine; the function hangs when trying to post the metric.
The lambda function uses the IAM role put-custom-metric, which uses the policies recommended here, as well as CloudWatchFullAccess for good measure.
Resources on my internal load balancer, where my RabbitMQ server lives, are protected by a VPN, so it's necessary for me to associate this function with the proper VPC/security group. Here's how it's set up right now (I know this is working, because otherwise the communication with RabbitMQ would fail).
I read this post where multiple contributors suggest increasing the function memory and timeout settings. I've done both of these, and the timeout persists.
I can run this locally without any issue to create the metric on CloudWatch in less than 5 seconds.
@noxdafox has written a brilliant plugin that got me most of the way there, but at the end of the day I ended up going with a pure Lambda-based solution. It was surprisingly tricky getting the CloudWatch plugin running with Docker, and afterwards I had trouble with the container shutting down its services and stopping processing of the message queue. Additionally, I wanted to be able to normalize queue count by the number of worker services in my ECS cluster, so I was going to need to connect to at least one AWS resource from within my VPC anyhow. I figured it was best to keep everything simple and in the same place.
import os
import boto3
from botocore.vendored import requests

USER = os.getenv("RMQ_USER")
PASSWORD = os.getenv("RMQ_PASSWORD")

cloudwatch_client = boto3.client(
    service_name='cloudwatch',
    endpoint_url="https://MYCLOUDWATCHURL.monitoring.us-east-1.vpce.amazonaws.com"
)
ecs_client = boto3.client(
    service_name='ecs',
    endpoint_url="https://vpce-MYECSURL.ecs.us-east-1.vpce.amazonaws.com"
)

def get_message_count(user=USER, password=PASSWORD, domain="rabbitmq.stockbets.io/api/queues"):
    url = f"https://{user}:{password}@{domain}"
    res = requests.get(url)
    message_count = 0
    for queue in res.json():
        message_count += queue["messages"]
    print(f"message count: {message_count}")
    return message_count

def get_worker_count():
    worker_data = ecs_client.describe_services(cluster="prod", services=["worker"])
    worker_count = worker_data["services"][0]["runningCount"]
    print(f"worker count: {worker_count}")
    return worker_count

def lambda_handler(event, context):
    message_count = get_message_count()
    worker_count = get_worker_count()
    print(f"msgs per worker: {message_count / worker_count}")
    metric_data = [
        {'MetricName': 'MessagesPerWorker', "Unit": "Count", 'Value': message_count / worker_count},
        {'MetricName': 'NTasks', "Unit": "Count", 'Value': worker_count}
    ]
    cloudwatch_client.put_metric_data(MetricData=metric_data, Namespace="RabbitMQ")
Creating the VPC endpoints was easier than I thought it would be. For CloudWatch, you want to search for the "monitoring" VPC endpoint during the creation step (not "cloudwatch" or "logs"). Searching for "ecs" gets you what you need for the ECS connection.
Once your Lambda is up, you need to configure the metric and accompanying alarms, and then relate those to an auto-scaling policy, but that's probably beyond the scope of this post. Leave a comment if you have questions on how I worked that out.
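For the alarm half of that, a rough sketch of how one could be wired up with boto3 follows; the alarm name, period, and threshold below are placeholder values, not the ones I actually use:

import boto3

cloudwatch = boto3.client('cloudwatch')
# Alarm when the average messages-per-worker stays above a threshold.
cloudwatch.put_metric_alarm(
    AlarmName='RabbitMQ-MessagesPerWorker-High',  # placeholder name
    Namespace='RabbitMQ',
    MetricName='MessagesPerWorker',
    Statistic='Average',
    Period=60,                                    # seconds
    EvaluationPeriods=2,
    Threshold=10,                                 # placeholder threshold
    ComparisonOperator='GreaterThanThreshold',
)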
The only reason you might want to use a Lambda function to achieve your goal is if you do not own the RabbitMQ cluster. The fact that your logic hangs during communication suggests a network issue, most likely due to misconfigured security groups.
If you can change the cluster configuration, I'd suggest you install and configure the CloudWatch metrics exporter plugin, which does most of the heavy lifting for you.
If your cluster runs on Docker, I believe the custom Docker file to be the best solution. If you run your Docker instances in AWS via ECS/Fargate, the plugin should be able to automatically infer the credentials from the Task Role through ExAws. Otherwise, just follow the README instructions on how to set the credentials yourself.

boto3 s3 connection error: An error occurred (SignatureDoesNotMatch) when calling the ListBuckets operation

I'm using the boto3 package to connect to S3 from outside the AWS cloud (i.e. the script is currently not being run within AWS, but from my MBP connecting to the relevant account). My code:
s3 = boto3.resource(
    "s3",
    aws_access_key_id=self.settings['CREDENTIALS']['aws_access_key_id'],
    aws_secret_access_key=self.settings['CREDENTIALS']['aws_secret_access_key'],
)
bucket = s3.Bucket(self.settings['S3']['bucket_test'])
for bucket_in_all in boto3.resource('s3').buckets.all():
    if bucket_in_all.name == self.settings['S3']['bucket_test']:
        print("Bucket {} verified".format(self.settings['S3']['bucket_test']))
Now I'm receiving this error message:
botocore.exceptions.ClientError: An error occurred (SignatureDoesNotMatch) when calling the ListBuckets operation
I'm aware of the sequence in which AWS credentials are checked, and I've tried different permutations of my environment variables and ~/.aws/credentials. I know that the credentials set in my .py script should take precedence, yet I'm still seeing this SignatureDoesNotMatch error message. Any ideas where I may be going wrong? I've also tried:
# Create a session
session = boto3.session.Session(
    aws_access_key_id=self.settings['CREDENTIALS']['aws_access_key_id'],
    aws_secret_access_key=self.settings['CREDENTIALS']['aws_secret_access_key'],
    aws_session_token=self.settings['CREDENTIALS']['session_token'],
    region_name=self.settings['CREDENTIALS']['region_name']
)
s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)
...however I also see the same error traceback.
Actually, this was partly answered by @John Rotenstein and @bdcloud; nevertheless I need to be more specific...
The following code in my case was not necessary and was causing the error message:
# Create a session
session = boto3.session.Session(
    aws_access_key_id=self.settings['CREDENTIALS']['aws_access_key_id'],
    aws_secret_access_key=self.settings['CREDENTIALS']['aws_secret_access_key'],
    aws_session_token=self.settings['CREDENTIALS']['session_token'],
    region_name=self.settings['CREDENTIALS']['region_name']
)
The credentials now stored in self.settings mirror ~/.aws/credentials. Weirdly (and like last week, where the reverse happened), I now have access. It could be that a simple reboot of my laptop meant that my new credentials in ~/.aws/credentials (since I updated these again yesterday) were then 'accepted'.
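If this comes up again, one quick way to see which identity boto3 is actually resolving (and whether environment variables are shadowing ~/.aws/credentials) is an STS call. This is just a diagnostic sketch using the default credential chain:

import boto3

session = boto3.session.Session()
creds = session.get_credentials()
# Compare the resolved access key with the one you expect to be used.
print("access key in use:", creds.access_key)
# The ARN tells you which IAM user/role the requests are signed as.
print("caller identity:", boto3.client('sts').get_caller_identity()['Arn'])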
