AWS Lambda fails for torch (python) - python-3.x

All I'm trying to do is create a Flask app for Places365 and deploy it as an AWS Lambda function with an API. While everything works fine on my EC2 instance, Lambda keeps failing with a
"No module named 'torch': ModuleNotFoundError" error.
Initially, when I tried to include torch as part of my virtual environment, Lambda kept failing with a "No space left" error. So I uninstalled torch from my virtual environment, redeployed the function, and added the PyTorch layer (arn:aws:lambda:us-east-1:934676248949:layer:pytorchv1-py36:2) to the function. It still fails with the "No module named 'torch': ModuleNotFoundError" error.
I also used Zappa for the Lambda deployment.
It would be great if someone could share their experience of deploying torch to Lambda.

I was able to fix it. Below is what I did.
ARN of the PyTorch layer that I used:
arn:aws:lambda:us-east-1:934676248949:layer:pytorchv1-py36:2
I added the following code to my Python Lambda function:
import sys
sys.path.insert(1, '/opt')   # the layer's contents are mounted under /opt
import unzip_requirements    # provided by the layer; unpacks the bundled dependencies
import torch
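For context, a minimal handler sketch showing how those pieces fit together, assuming the same layer is attached; the handler name and response shape are illustrative, not from the original post:
import sys
sys.path.insert(1, '/opt')    # make the layer's contents importable
import unzip_requirements     # unpacks the layer's packaged dependencies
import torch

def lambda_handler(event, context):
    # Trivial sanity check that torch is importable and usable inside Lambda.
    tensor = torch.zeros(3)
    return {
        "statusCode": 200,
        "body": f"torch {torch.__version__} loaded: {tensor.tolist()}"
    }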

Related

sonarqube and aws lambda

I recently enabled SonarQube for my Lambda functions.
As we all know, this is the standard signature for any lambda_handler. However, all of my logic is based on event and hardly at all on context.
def lambda_handler(event, context):
After running the SonarQube scan, I'm getting:
Remove the unused function parameter "context".
SonarQube flags this as a MAJOR issue for all of my Lambdas. Any suggestion on how to fix this?
My Lambdas are Python based.
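A hedged sketch of two common ways to satisfy this kind of rule while keeping the signature AWS expects (these are general Python conventions, not from the original thread, and whether SonarQube accepts the underscore prefix can depend on rule configuration):
# Option 1: keep the AWS-required signature but mark the parameter as
# intentionally unused by prefixing it with an underscore.
def lambda_handler(event, _context):
    return {"statusCode": 200, "body": str(event)}

# Option 2 (alternative handler): reference the context object, e.g. log the
# request id, so the parameter is no longer unused. Point the Lambda handler
# setting at whichever function you keep.
def lambda_handler_v2(event, context):
    print(f"Handling request {context.aws_request_id}")
    return {"statusCode": 200, "body": str(event)}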

KeyError: 'PYSPARK_GATEWAY_SECRET' when creating spark context inside aws lambda code

I have deployed a Lambda function, as a Docker container, which uses Spark NLP. For working with Spark NLP I need a Spark context, so my code starts with
sc = pyspark.SparkContext().getOrCreate()
I tested my Lambda locally and it worked fine.
On AWS I got this error:
java gateway process exited before sending its port number
even though JAVA_HOME was properly set.
Looking at the source code
https://github.com/apache/spark/blob/master/python/pyspark/java_gateway.py
the launch_gateway method tries to create a temporary file, and if that file cannot be created it raises the above error (line 105).
Lambda won't allow write access to the file system, so the file cannot be created.
So I am trying to pass the gateway port and secret as environment variables.
I have set PYSPARK_GATEWAY_PORT=25333, which is the default value.
I am not able to figure out how to get the PYSPARK_GATEWAY_SECRET, which is why I'm getting the error:
KeyError: 'PYSPARK_GATEWAY_SECRET'
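For reference, a rough sketch of the code path being described: launch_gateway only skips starting a new JVM when both variables are set in the environment, and the secret is generated by the JVM that owns the gateway, so it has to be supplied by that already-running process (the placeholder value below is illustrative only and will not authenticate):
import os
import pyspark

# launch_gateway() reuses an existing gateway only when BOTH variables are set;
# the secret cannot be invented on the Python side.
os.environ["PYSPARK_GATEWAY_PORT"] = "25333"
os.environ["PYSPARK_GATEWAY_SECRET"] = "<secret produced by the running JVM gateway>"

sc = pyspark.SparkContext.getOrCreate()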

Testing the sagemaker endpoint deployment locally

How do I test an endpoint deployment locally, using a SageMaker notebook instance?
The issue is that if we test the endpoint from a SageMaker Studio notebook, it takes a while for the Docker inference container to spin up, depending on the instance type. This can certainly hamper the development and iteration cycle!
Create a LocalSession and configure it directly:
from sagemaker.local import LocalSession
sagemaker_session = LocalSession()
sagemaker_session.config = {'local': {'local_code': True}}
Now pass this sagemaker_session to your estimator or model, as in the sketch below.
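A minimal sketch of what that could look like for a model, assuming an existing model artifact and container image (the image_uri, model_data, and role values are placeholders, not from the original post):
from sagemaker.local import LocalSession
from sagemaker.model import Model

sagemaker_session = LocalSession()
sagemaker_session.config = {'local': {'local_code': True}}

model = Model(
    image_uri='<ecr-image-uri>',               # placeholder
    model_data='s3://<bucket>/model.tar.gz',   # placeholder
    role='<execution-role-arn>',               # placeholder
    sagemaker_session=sagemaker_session,
)

# instance_type='local' runs the inference container on the notebook instance
# itself instead of provisioning a hosted endpoint.
predictor = model.deploy(initial_instance_count=1, instance_type='local')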

Lambda function gets stuck when calling RDS via SQLalchemy URI

I have a FastAPI application. Initially, I was passing my DB URI via an ngrok tunnel, like this, in my SAM template. In this setup the Lambda uses my local machine's PostgreSQL DB.
DbConnnectionString:
  Type: String
  Default: postgresql://<uname>:<pwd>#x.tcp.ngrok.io:PORT/DB
This is how I read the URI in my Python code:
# config.py
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DATABASE_URL = os.environ.get('DB_URI')
db_engine = create_engine(DATABASE_URL)
db_session = sessionmaker(autocommit=False, autoflush=False, bind=db_engine)
print(f"Configs initialized for {API_V1_STR}")
# app.py
# 3rd party
from fastapi import FastAPI
# Custom
from config.app_config import PROJECT_NAME, db_engine
from models.db_models import Base
print("Creating all database")
Base.metadata.create_all(bind=db_engine)
app = FastAPI(title=PROJECT_NAME)
print("APP created")
In this setup, everything seems to work as expected.
But whenever I replace the DB URI with the RDS one, the call suddenly gets stuck at the create_all step. When this happens, the Lambda always times out and throws exceptions.
If I run the code locally using uvicorn, this error doesn't occur; everything works as expected.
When I use sam local invoke, even with the RDS URL, the API call works without any issues.
The problem occurs only when deployed in AWS Lambda.
I notice that the configs are initialized twice in this setup: once before the START RequestId log line and once after.
I have tried reading up on this but it isn't clear what I could do to fix it. Any help would be much appreciated.
It was my bad! I didn't pay attention to the security groups. It was a connection timeout all along. Once I fixed the port access in the security groups, the Lambda started working as expected.
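As a side note, a hedged sketch of one way to surface this kind of failure faster: passing a short connect_timeout through to the database driver (a standard psycopg2 connect argument, not something from the original answer) makes the Lambda fail quickly with a clear connection error instead of hanging until the function times out.
import os
from sqlalchemy import create_engine

DATABASE_URL = os.environ.get('DB_URI')
# Give up after a few seconds if the database is unreachable (for example,
# a security group blocking the port) rather than blocking until the Lambda timeout.
db_engine = create_engine(DATABASE_URL, connect_args={"connect_timeout": 5})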

Azure ML: how to access logs of a failed Model deployment

I'm deploying a Keras model that is failing with the error below. The exception says that I can retrieve the logs by running print(service.get_logs()), but that gives me empty results. I am deploying the model from my Azure Notebook and I'm using the same service var to retrieve the logs.
Also, how can I retrieve the logs from the container instance? I'm deploying to an AKS compute cluster I created. Sadly, the docs link in the exception also doesn't detail how to retrieve these logs.
More information can be found using '.get_logs()' Error:
{ "code":
"KubernetesDeploymentFailed", "statusCode": 400, "message":
"Kubernetes Deployment failed", "details": [
{
"code": "CrashLoopBackOff",
"message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.\nPlease check
the logs for your container instance: my-model-service. From
the AML SDK, you can run print(service.get_logs()) if you have service
object to fetch the logs. \nYou can also try to run image
mlwks.azurecr.io/azureml/azureml_3c0c34b65cf18c8644e8d745943ab7d2:latest
locally. Please refer to http://aka.ms/debugimage#service-launch-fails
for more information."
} ] }
UPDATE
Here's my code to deploy the model:
from azureml.core import Environment, Model, Webservice
from azureml.core.compute import ComputeTarget
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig
from azureml.exceptions import WebserviceException

environment = Environment('my-environment')
environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["azureml-defaults", "azureml-dataprep[pandas,fuse]", "tensorflow", "keras", "matplotlib"])
service_name = 'my-model-service'

# Remove any existing service under the same name.
try:
    Webservice(ws, service_name).delete()
except WebserviceException:
    pass

inference_config = InferenceConfig(entry_script='score.py', environment=environment)
comp = ComputeTarget(workspace=ws, name="ml-inference-dev")
service = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[model],
                       inference_config=inference_config,
                       deployment_target=comp)
service.wait_for_deployment(show_output=True)
And my score.py:
import joblib
import numpy as np
import os
import keras
from keras.models import load_model

from azureml.core.model import Model
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType

def init():
    global model
    model_path = Model.get_model_path('model.h5')
    model = load_model(model_path)
    model = keras.models.load_model(model_path)

# The run() method is called each time a request is made to the scoring API.
#
# Shown here are the optional input_schema and output_schema decorators
# from the inference-schema pip package. Using these decorators on your
# run() method parses and validates the incoming payload against
# the example input you provide here. This will also generate a Swagger
# API document for your web service.
# @input_schema('data', NumpyParameterType(np.array([[0.1, 1.2, 2.3, 3.4, 4.5, 5.6, 6.7, 7.8, 8.9, 9.0]])))
# @output_schema(NumpyParameterType(np.array([4429.929236457418])))
def run(data):
    return [123]  # test
Update 2:
Here is a screencap of the endpoint page. Is it normal for the CPU to be .1? Also, when I hit the Swagger URL in the browser, I get the error: "No ready replicas for service doc-classify-env-service"
Update 3
After finally getting to the container logs, it turned out that score.py was choking on this error:
ModuleNotFoundError: No module named 'inference_schema'
I then ran a test that commented out the references to input_schema and output_schema and also simplified my pip_packages, and the REST endpoint came up! I was also able to get a prediction out of the model.
pip_packages=["azureml-defaults", "tensorflow", "keras"])
So my question is, how should I set up my pip_packages so the scoring file can use the inference_schema decorators? I'm assuming I need to include the azureml-sdk[automl] pip package, but when I do so, the image creation fails and I see several dependency conflicts.
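One hedged guess at a middle ground, not from the original thread: inference-schema is published as its own pip package (with a numpy-support extra for NumpyParameterType), so adding it explicitly might avoid pulling in the full azureml-sdk:
from azureml.core.conda_dependencies import CondaDependencies

# Assumption: declaring inference-schema directly, with the extra needed for
# NumpyParameterType, instead of the much heavier azureml-sdk package.
environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["azureml-defaults",
                  "tensorflow",
                  "keras",
                  "inference-schema[numpy-support]"])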
Try retrieving your service from the workspace directly:
ws.webservices[service_name].get_logs()
Also, I found deploying an image as an endpoint to be easier than inference config + model deploy (depending on your use case):
from azureml.core.image import Image
from azureml.core.webservice import AksWebservice
my_image = Image(ws, name='test', version='26')
service = AksWebservice.deploy_from_image(ws, "test1", my_image, deployment_config, aks_target)
