get aws lambda to start transcribe job with filename without extension

get aws lambda to start transcribe job with filename without extension - python-3.x

I have a lambda function which will start a transcribe job when an object is put into the s3 bucket. I am having trouble getting setting the transcribe job to be the file name without the extension; also the file is not putting into the correct prefix folder in S3 bucket for some reason, here's what I have:
import json
import boto3
import time
import os
from urllib.request import urlopen
transcribe = boto3.client('transcribe')
def lambda_handler(event, context):
if event:
file_obj = event["Records"][0]
bucket_name = str(file_obj['s3']['bucket']['name'])
file_name = str(file_obj['s3']['object']['key'])
s3_uri = create_uri(bucket_name, file_name)
job_name = filename
print(os.path.splitext(file_name)[0])
transcribe.start_transcription_job(TranscriptionJobName = job_name,
Media = {'MediaFileUri': s3_uri},
MediaFormat = 'mp3',
LanguageCode = "en-US",
OutputBucketName = "sbox-digirepo-transcribe-us-east-1",
Settings={
# 'VocabularyName': 'string',
'ShowSpeakerLabels': True,
'MaxSpeakerLabels': 2,
'ChannelIdentification': False
})
while Ture:
status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
if status["TranscriptionJob"]["TranscriptionJobStatus"] in ["COMPLETED", "FAILED"]:
break
print("Transcription in progress")
time.sleep(5)
s3.put_object(Bucket = bucket_name, Key="output/{}.json".format(job_name), Body=load_)
return {
'statusCode': 200,
'body': json.dumps('Transcription job created!')
}
def create_uri(bucket_name, file_name):
return "s3://"+bucket_name+"/"+file_name
the error i get is
[ERROR] BadRequestException: An error occurred (BadRequestException) when calling the StartTranscriptionJob operation: 1 validation error detected: Value 'input/7800533A.mp3' at 'transcriptionJobName' failed to satisfy constraint: Member must satisfy regular expression pattern: ^[0-9a-zA-Z._-]+
so my desired output should have the TranscriptionJobName value to be 7800533A for this case, and the result OutputBucketName to be in s3bucket/output. any help is appreciated, thanks in advance.

The TranscriptionJobName argument is a friendly name for your job and is pretty limited based on the regex. You're passing it the full object key, which contains the prefix input/, but / is a disallowed character in the job name. You could just split out the file name portion in your code:
job_name = file_name.split('/')[-1]
I put a full example of uploading media and starting an AWS Transcribe job on GitHub that puts all this in context.

Related

How can I scrape PDFs within a Lambda function and save them to an S3 bucket?

I'm trying to develop a simple lambda function that will scrape a pdf and save it to an s3 bucket given the url and the desired filename as input data. I keep receiving the error "Read-only file system,' and I'm not sure if I have to change the bucket permissions or if there is something else I am missing. I am new to S3 and Lambda and would appreciate any help.
This is my code:
import urllib.request
import json
import boto3
def lambda_handler(event, context):
s3 = boto3.client('s3')
url = event['url']
filename = event['filename'] + ".pdf"
response = urllib.request.urlopen(url)
file = open(filename, 'w')
file.write(response.read())
s3.upload_fileobj(response.read(), 'sasbreports', filename)
file.close()
This was my event file:
{
"url": "https://purpose-cms-preprod01.s3.amazonaws.com/wp-content/uploads/2022/03/09205150/FY21-NIKE-Impact-Report_SASB-Summary.pdf",
"filename": "nike"
}
When I tested the function, I received this error:
{
"errorMessage": "[Errno 30] Read-only file system: 'nike.pdf.pdf'",
"errorType": "OSError",
"requestId": "de0b23d3-1e62-482c-bdf8-e27e82251941",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 15, in lambda_handler\n file = open(filename + \".pdf\", 'w')\n"
]
}

AWS Lambda has limited space in /tmp, the sole writable location.
Writing into this space can be dangerous without a proper disk management since this storage is kept alive across multiple executions. It can lead to a saturation or unexpected file share with previous requests.
Instead of saving locally the PDF, write it directly to S3, without involving file system this way:
import urllib.request
import json
import boto3
def lambda_handler(event, context):
s3 = boto3.client('s3')
url = event['url']
filename = event['filename']
response = urllib.request.urlopen(url)
s3.upload_fileobj(response.read(), 'sasbreports', filename)
BTW: The .pdf appending should be removed according your use case.

AWS Lambda functions can only write to the /tmp/ directory. All other directories are Read-Only.
Also, there is a default limit of 512MB for storage in /tmp/, so make sure you delete the files after upload it to S3 for situations where the Lambda environment is re-used for future executions.

Phone book search on AWS Lambda and S3

I want to make a serverless application in AWS Lambda for phone book searches.
What I've done:
Created a bucket and uploaded a CSV file to it.
Created a role with full access to the bucket.
Created a Lambda function
Created API Gateway with GET and POST methods
The Lambda function contains the following code:
import boto3
import json
s3 = boto3.client('s3')
resp = s3.select_object_content(
Bucket='namebbacket',
Key='sample_data.csv',
ExpressionType='SQL',
Expression="SELECT * FROM s3object s where s.\"Name\" = 'Jane'",
InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}, 'CompressionType': 'NONE'},
OutputSerialization = {'CSV': {}},
)
for event in resp['Payload']:
if 'Records' in event:
records = event['Records']['Payload'].decode('utf-8')
print(records)
elif 'Stats' in event:
statsDetails = event['Stats']['Details']
print("Stats details bytesScanned: ")
print(statsDetails['BytesScanned'])
print("Stats details bytesProcessed: ")
print(statsDetails['BytesProcessed'])
print("Stats details bytesReturned: ")
print(statsDetails['BytesReturned'])
When I access the Invoke URL, I get the following error:
{errorMessage = Handler 'lambda_handler' missing on module 'lambda_function', errorType = Runtime.HandlerNotFound}
CSV structure: Name, PhoneNumber, City, Occupation
How to solve this problem?

Please refer to this documentation topic to learn how to write a Lambda function in Python. You are missing the Handler. See: AWS Lambda function handler in Python

Wecome to S.O. #smac2020 links you to the right place AWS Lambda function handler in Python. In short, AWS Lambda needs to know where to find your code, hence the "handler". Though a better way to think about it might be "entry-point."
Here is a close approximation of your function, refactored for use on AWS Lambda:
import json
import boto3
def function_to_be_called(event, context):
# TODO implement
s3 = boto3.client('s3')
resp = s3.select_object_content(
Bucket='stack-exchange',
Key='48836509/dogs.csv',
ExpressionType='SQL',
Expression="SELECT * FROM s3object s where s.\"breen_name\" = 'pug'",
InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}, 'CompressionType': 'NONE'},
OutputSerialization = {'CSV': {}},
)
for event in resp['Payload']:
if 'Records' in event:
records = event['Records']['Payload'].decode('utf-8')
return {
'statusCode': 200,
'body': json.dumps('Hello from Lambda!'),
'pugInfo': records
}
This function produces the following result:
Response
{
"statusCode": 200,
"body": "\"Hello from Lambda!\"",
"currentWorkdingDirectory": "/var/task",
"currentdirlist": [
"lambda_function.py"
],
"pugInfo": "1,pug,toy\r\n"
}
The "entry point" for this function is in a Python file called lambda_function.py and the function function_to_be_called. Together these are the "handler." We can see this in the Console:
or using the API through Boto3
import boto3
awslambda = boto3.client('lambda')
awslambda.get_function_configuration('s3SelectFunction')
Which returns:
{'CodeSha256': 'mFVVlakisUIIsLstQsJUpeBIeww4QhJjl7wJaXqsJ+Q=',
'CodeSize': 565,
'Description': '',
'FunctionArn': 'arn:aws:lambda:us-east-1:***********:function:s3SelectFunction',
'FunctionName': 's3SelectFunction',
'Handler': 'lambda_function.function_to_be_called',
'LastModified': '2021-03-10T00:57:48.651+0000',
'MemorySize': 128,
'ResponseMetadata': ...
'Version': '$LATEST'}

Getting Error While Inserting JSON data to DynamoDB using Python

HI,
I am trying to put the json data into AWS dynamodb table using AWS Lambda , however i am getting an error like below. My json file is uploaded to S3 bucket.
Parameter validation failed:
Invalid type for parameter Item, value:
{
"IPList": [
"10.1.0.36",
"10.1.0.27"
],
"TimeStamp": "2020-04-22 11:43:13",
"IPCount": 2,
"LoadBalancerName": "internal-ALB-1447121364.us-west-2.elb.amazonaws.com"
}
, type: <class 'str'>, valid types: <class 'dict'>: ParamValidationError
Below i my python script:-
import boto3
import json
s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
def lambda_handler(event, context):
bucket = event['Records'][0]['s3']['bucket']['name']
json_file_name = event['Records'][0]['s3']['object']['key']
json_object = s3_client.get_object(Bucket=bucket,Key=json_file_name)
jsonFileReader = json_object['Body'].read()
jsonDict = json.loads(jsonFileReader)
table = dynamodb.Table('test')
table.put_item(Item=jsonDict)
return 'Hello'
Below is my json content
"{\"IPList\": [\"10.1.0.36\", \"10.1.0.27\"], \"TimeStamp\": \"2020-04-22 11:43:13\",
\"IPCount\": 2, \"LoadBalancerName\": \"internal-ALB-1447121364.us-west-2.elb.amazonaws.com\"}"
Can someone help me, how can insert data to dynamodb.

json.loads(jsonFileReader) returns string, but table.put_item() expects dict. Use json.load() instead.

Writing string to S3 with boto3: "'dict' object has no attribute 'put'"

In an AWS lambda, I am using boto3 to put a string into an S3 file:
import boto3
s3 = boto3.client('s3')
data = s3.get_object(Bucket=XXX, Key=YYY)
data.put('Body', 'hello')
I am told this:
[ERROR] AttributeError: 'dict' object has no attribute 'put'
The same happens with data.put('hello') which is the method recommended by the top answers at How to write a file or data to an S3 object using boto3 and with data.put_object: 'dict' object has no attribute 'put_object'.
What am I doing wrong?
On the opposite, reading works great (with data.get('Body').read().decode('utf-8')).

put_object is a method of the s3 object, not the data object.
Here is a full working example with Python 3.7:
import json
import boto3
s3 = boto3.client('s3')
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def lambda_handler(event, context):
bucket = 'mybucket'
key = 'id.txt'
id = None
# Write id to S3
s3.put_object(Body='Hello!', Bucket=bucket, Key=key)
# Read id from S3
data = s3.get_object(Bucket=bucket, Key=key)
id = data.get('Body').read().decode('utf-8')
logger.info("Id:" + id)
return {
'statusCode': 200,
'body': json.dumps('Id:' + id)
}

aws firehose lambda function invocation gives wrong output strcuture format

When i insert a data object to aws firhose stream using a put operation it works fine .As lambda function is enabled on my firehose stream .hence a lambda function is invoked but gives me a output structure response error :
"errorMessage":"Invalid output structure: Please check your function and make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed."
so now i have created my lambda function like this way to make the correct output strcuture :
import base64
import json
print('Loading function')
def lambda_handler(event, context):
output=[]
print('event'+str(event))
for record in event['records']:
payload = base64.b64decode(record['data'])
print('payload'+str(payload))
payload=base64.b64encode(payload)
output_record={
'recordId':record['recordId'],
'result': 'Ok',
'data': base64.b64encode(json.dumps('hello'))
}
output.append(output_record)
return { 'records': output }
Now i am getting the follwing eror on encoding the 'data' field as
"errorMessage": "a bytes-like object is required, not 'str'",
and if i change the 'hello' to bytes like b'hello' then i get the following error :
"errorMessage": "Object of type bytes is not JSON serializable",

import json
import base64
import gzip
import io
import zlib
def lambda_handler(event, context):
output = []
for record in event['records']:
payload = base64.b64decode(record['data']).decode('utf-8')
output_record = {
'recordId': record['recordId'],
'result': 'Ok',
'data': base64.b64encode(payload.encode('utf-8')).decode('utf-8')
}
output.append(output_record)
return {'records': output}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

get aws lambda to start transcribe job with filename without extension - python-3.x

Related

How can I scrape PDFs within a Lambda function and save them to an S3 bucket?

Phone book search on AWS Lambda and S3

Getting Error While Inserting JSON data to DynamoDB using Python

Writing string to S3 with boto3: "'dict' object has no attribute 'put'"

aws firehose lambda function invocation gives wrong output strcuture format

Categories

Resources