python azure blob readinto with requests stream upload - python-3.x

I have videos saved in Azure Blob Storage and I want to upload them to Facebook. The Facebook video upload is a multipart/form-data POST request. The ordinary way of doing this is to download the Azure blob as bytes using the readall() method of the Azure Python SDK and set it in the requests POST data as follows:
# download video from azure blob
video = BlobClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING,
                                          AZURE_CONTAINER_NAME,
                                          f"{folder_id}/{file_name}")
video = video.download_blob().readall()

# upload video to facebook
url = f"{API_VIDEO_URL}/{page_id}/videos"
params = {
    "upload_phase": "transfer",
    "upload_session_id": session_id,
    "start_offset": start_offset,
    "access_token": access_token
}
response = requests.post(url, params=params, files={"video_file_chunk": video})
With this approach the whole file is loaded into memory, which is not good for larger files. The Azure SDK has a readinto(stream) method that downloads the blob into a stream. Is there a way to connect requests' streaming upload with the readinto() method? Or is there another way to upload the file directly from Blob Storage?

Regarding how to upload the video in chunks with a stream, please refer to the following code:
from azure.storage.blob import BlobClient
import io
from requests_toolbelt import MultipartEncoder
import requests

# connect to the source blob (same connection-string pattern as in the question)
blob = BlobClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING,
                                         AZURE_CONTAINER_NAME,
                                         f"{folder_id}/{file_name}")
blob_properties = blob.get_blob_properties()
blob_size = blob_properties.size  # the blob size in bytes

access_token = ''
session_id = '675711696358783'
chunk_size = 1024 * 1024  # the chunk size
bytesRemaining = blob_size

params = {
    "upload_phase": "transfer",
    "upload_session_id": session_id,
    "start_offset": 0,
    "access_token": access_token
}
url = "https://graph-video.facebook.com/v7.0/101073631699517/videos"

bytesToFetch = 0
start = 0  # where to start downloading

while bytesRemaining > 0:
    with io.BytesIO() as f:
        if bytesRemaining < chunk_size:
            bytesToFetch = bytesRemaining
        else:
            bytesToFetch = chunk_size
        print(bytesToFetch)
        print(start)

        # download only the current chunk into the in-memory stream
        downloader = blob.download_blob(start, bytesToFetch)
        b = downloader.readinto(f)
        print(b)
        f.seek(0)  # rewind the stream before handing it to the encoder

        m = MultipartEncoder(fields={'video_file_chunk': ('file', f)})
        r = requests.post(url, params=params,
                          headers={'Content-Type': m.content_type}, data=m)
        s = r.json()
        print(s)

        # Facebook returns the next start_offset; advance and recompute what is left
        start = int(s['start_offset'])
        bytesRemaining = blob_size - start
        params['start_offset'] = start
        print(params)

# end upload
params['upload_phase'] = 'finish'
r = requests.post(url, params=params)
print(r)
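
As a side note, the session_id and the initial start_offset used above come from the "start" phase of Facebook's resumable upload flow. A minimal sketch of that first request, assuming the same Graph API v7.0 video endpoint as in the transfer phase (the page id here is a placeholder, not a value from the question):

import requests

url = "https://graph-video.facebook.com/v7.0/<page_id>/videos"  # same endpoint as the transfer phase
params = {
    "upload_phase": "start",
    "file_size": blob_size,       # total size of the blob, computed above
    "access_token": access_token
}
s = requests.post(url, params=params).json()
session_id = s["upload_session_id"]   # feed this into the transfer-phase params
start = int(s["start_offset"])        # usually 0 for a fresh session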

Related

PYTHON FLASK - request.files displays as file not found even though it exists

I am trying to trigger an external API from Postman by passing uploadFile in the body as form-data. The code below throws the error 'FileNotFoundError: [Errno 2] No such file or directory:'.
Note: In Postman, uploadFile takes a file from my local desktop as input. I have also modified the Postman settings to allow access to files outside the working directory.
Any help would be highly appreciated.
Below is the code:
@app.route('/upload', methods=['POST'])
@auth.login_required
def post_upload():
    payload = {
        'table_name': 'incident', 'table_sys_id': request.form['table_sys_id']
    }
    files = {'file': (request.files['uploadFile'], open(request.files['uploadFile'].filename, 'rb'),
                      'image/jpg', {'Expires': '0'})}
    response = requests.post(url, headers=headers, files=files, data=payload)
    return jsonify("Success- Attachment uploaded successfully ", 200)
The code above throws the error 'FileNotFoundError: [Errno 2] No such file or directory:'.
Have you defined UPLOAD_FOLDER? Please see: https://flask.palletsprojects.com/en/latest/patterns/fileuploads/#a-gentle-introduction
I am passing the attribute (upload file) in the body as form-data; can this be passed as raw JSON?
You cannot upload files with JSON. But one hacky way to achieve this is to base64-encode the file (useful reference) before sending it. This way you do not upload the file itself; instead you send the file content encoded in base64 format.
Server side:
import base64
file_content = base64.b64decode(request.json['file_buffer_b64'])
Client side:
-> Javascript:
const response = await axios.post(uri, {file_buffer_b64: window.btoa(file)})
-> Python:
import base64
with open("path/to/uploadFile", "rb") as myfile:  # the local file to send
    encoded_string = base64.b64encode(myfile.read()).decode("utf-8")
payload = {"file_buffer_b64": encoded_string}
response = requests.post(url, json=payload)
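
As an aside on the original FileNotFoundError: request.files['uploadFile'] is already an open file-like object, so one way to avoid touching the local file system at all is to pass its stream straight to requests. A minimal sketch under that assumption, reusing the url, headers and field names from the question:

from flask import request, jsonify
import requests

@app.route('/upload', methods=['POST'])
@auth.login_required
def post_upload():
    upload = request.files['uploadFile']          # werkzeug FileStorage, already open
    payload = {'table_name': 'incident',
               'table_sys_id': request.form['table_sys_id']}
    files = {'file': (upload.filename, upload.stream, upload.mimetype)}
    response = requests.post(url, headers=headers, files=files, data=payload)
    return jsonify("Success- Attachment uploaded successfully", response.status_code)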

Alternative to Azure Event Hub Capture for sending Event Hub messages to Blob Storage?

Is there any way to send my Event Hub data, which is sent in JSON format via an HTTP POST from Postman, to Blob Storage in Azure?
I've tried using Event Hub's Capture feature, but unfortunately the data is saved in Avro format, and I have a really hard time converting it back to its original JSON format.
Therefore I would like to send my Event Hub data directly to some kind of blob storage that keeps my Event Hub messages in their original JSON format, which I can then retrieve with an Azure Function (HTTP GET trigger) from my SPA via the frontend.
Also, will I have to create a new blob for each message in a container? I don't think I can write them all into one blob, since I won't be able to retrieve my data via the frontend if my HTTP GET function is triggered at the same time.
Are there alternatives to Event Hub Capture? Is using plain Blob Storage the best solution? I've read a few articles on Azure Time Series Insights and Cosmos DB, but I'm not sure whether these are the best ways to handle my problem.
So the issue is that I initially send this as raw data via Postman.
Raw data as JSON sent via Postman:
{
    "id": 1,
    "receiver": "2222222222222",
    "message": {
        "Name": "testing",
        "PersonId": 2,
        "CarId": 2,
        "GUID": "1s3q1d-s546dq1-8e22e",
        "LineId": 2,
        "SvcId": 2,
        "Lat": -64.546547,
        "Lon": -64.546547,
        "TimeStamp": "2021-03-18T08:29:36.758Z",
        "Recorder": "dq65ds4qdezzer",
        "Env": "DEV"
    },
    "operator": 20404,
    "sender": "MSISDN",
    "binary": 1,
    "sent": "2021-03-18T08:29:36.758Z"
}
Once this is caught by Event Hub Capture, it is converted to an Avro file.
I am trying to retrieve the data by using fastavro and converting it to JSON format.
The problem is that I am not getting back the same raw data that was initially sent from Postman, and I can't find a way to convert it back to its original state. Why does Avro also store additional information from Postman?
I probably need a way to convert only the "Body" field, but for some reason it also wraps the payload in "bytes" inside the body.
I am just trying to get my original raw data back that was sent via Postman.
init.py (Azure function)
import logging
import os
import string
import json
import uuid
import avro.schema
import tempfile
import azure.functions as func
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
from fastavro import reader, json_writer

# Because the Apache Python avro package is written in pure Python, it is relatively slow,
# therefore I make use of fastavro.
def avroToJson(avroFile):
    with open("json_file.json", "w") as json_file:
        with open(avroFile, "rb") as avro_file:
            avro_reader = reader(avro_file)
            json_writer(json_file, avro_reader.writer_schema, avro_reader)

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    print('Processor started using path ' + os.getcwd())
    connect_str = "###########"
    container = ContainerClient.from_connection_string(connect_str, container_name="####")
    blob_list = container.list_blobs()  # List the blobs in the container.
    for blob in blob_list:
        # Content_length == 508 is an empty capture file, so process only larger blobs (skip empty files).
        if blob.size > 508:
            print('Downloaded a non empty blob: ' + blob.name)
            # Create a blob client for the blob.
            blob_client = container.get_blob_client(blob=blob.name)
            # Construct a file name based on the blob name.
            cleanName = str.replace(blob.name, '/', '_')
            cleanName = os.path.join(os.getcwd(), cleanName)
            # Download the file.
            with open(cleanName, "wb+") as my_file:  # Open the file to write; create it if it doesn't exist.
                my_file.write(blob_client.download_blob().readall())  # Write blob contents into the file.
            avroToJson(cleanName)
            with open('json_file.json', 'r') as file:
                jsonStr = file.read()
    return func.HttpResponse(jsonStr, status_code=200)
Expected result:
{
    "id": 1,
    "receiver": "2222222222222",
    "message": {
        "Name": "testing",
        "PersonId": 2,
        "CarId": 2,
        "GUID": "1s3q1d-s546dq1-8e22e",
        "LineId": 2,
        "SvcId": 2,
        "Lat": -64.546547,
        "Lon": -64.546547,
        "TimeStamp": "2021-03-18T08:29:36.758Z",
        "Recorder": "dq65ds4qdezzer",
        "Env": "DEV"
    },
    "operator": 20404,
    "sender": "MSISDN",
    "binary": 1,
    "sent": "2021-03-18T08:29:36.758Z"
}
Actual result:
{
    "SequenceNumber": 19,
    "Offset": "10928",
    "EnqueuedTimeUtc": "4/1/2021 8:43:19 AM",
    "SystemProperties": {
        "x-opt-enqueued-time": {
            "long": 1617266599145
        }
    },
    "Properties": {
        "Postman-Token": {
            "string": "37ff4cc6-9124-45e5-ba9d-######e"
        }
    },
    "Body": {
        "bytes": "{\r\n \"id\": 1,\r\n \"receiver\": \"2222222222222\",\r\n \"message\": {\r\n \"Name\": \"testing\",\r\n \"PersonId\": 2,\r\n \"CarId\": 2,\r\n \"GUID\": \"1s3q1d-s546dq1-8e22e\",\r\n \"LineId\": 2,\r\n \"SvcId\": 2,\r\n \"Lat\": -64.546547,\r\n \"Lon\": -64.546547,\r\n \"TimeStamp\": \"2021-03-18T08:29:36.758Z\",\r\n \"Recorder\": \"dq65ds4qdezzer\",\r\n \"Env\": \"DEV\"\r\n },\r\n \"operator\": 20404,\r\n \"sender\": \"MSISDN\",\r\n \"binary\": 1,\r\n \"sent\": \"2021-03-29T08:29:36.758Z\"\r\n}"
    }
}
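
Since the question boils down to getting the original payload back out of the "Body" field, here is a minimal sketch of doing that with fastavro (already used in the function above), under the assumption that the events were sent as UTF-8 JSON; Capture stores each event's raw payload as bytes in the "Body" field of its Avro records:

import json
from fastavro import reader

def capture_bodies(avro_path):
    """Yield the original JSON payloads from an Event Hub Capture Avro file."""
    with open(avro_path, "rb") as avro_file:
        for record in reader(avro_file):
            body = record["Body"]            # Capture stores the raw event payload here
            if isinstance(body, bytes):
                body = body.decode("utf-8")  # bytes -> the original JSON text
            yield json.loads(body)           # back to the dict that was sent via Postman

# example: print each restored message
# for message in capture_bodies("capture_file.avro"):
#     print(json.dumps(message, indent=2))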

Using local image for Read 3.0, Azure Cognitive Service, Computer Vision

I am attempting to use a local image in my text recognition script; the documentation has the following example (https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/python-hand-text):
But when I change image_url to a local file path, it raises HTTPError: 400 Client Error: Bad Request for url. I have tried following other tutorials, but nothing seems to work.
Any help would be greatly appreciated :)
The Cognitive Services API will not be able to locate an image via the URL of a file on your local machine. Instead, you can call the same endpoint with the binary data of your image in the body of the request.
Replace the following lines in the sample Python code:
image_url = "https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/cognitive-services/Computer-vision/Images/readsample.jpg"
headers = {'Ocp-Apim-Subscription-Key': subscription_key}
data = {'url': image_url}
response = requests.post(
    text_recognition_url, headers=headers, json=data)
with
headers = {'Ocp-Apim-Subscription-Key': subscription_key,
           'Content-Type': 'application/octet-stream'}
with open('YOUR_LOCAL_IMAGE_FILE', 'rb') as f:
    data = f.read()
response = requests.post(
    text_recognition_url, headers=headers, data=data)
And replace the following line:
image = Image.open(BytesIO(requests.get(image_url).content))
with
image = Image.open('./YOUR_LOCAL_IMAGE_FILE.png')
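
As a usage note, the Read endpoint replies asynchronously: the POST returns an Operation-Location header whose URL you poll until the analysis finishes. A minimal sketch of that polling loop, assuming the same text_recognition_url, headers and data variables as above:

import time
import requests

response = requests.post(text_recognition_url, headers=headers, data=data)
response.raise_for_status()
operation_url = response.headers["Operation-Location"]  # URL to poll for the result

while True:
    analysis = requests.get(operation_url, headers=headers).json()
    if analysis.get("status") in ("succeeded", "failed"):
        break
    time.sleep(1)
print(analysis)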

How to upload video to s3 using API GW and python?

I'm trying to make an API which will upload video to S3. I already managed to upload the video to S3, but the problem is that the video file doesn't work. I checked the Content-Type of the video file, and it's binary/octet-stream instead of video/mp4. So I set ContentType to "video/mp4" while calling the put_object API, but it still doesn't work.
I use a Lambda function to put the video into S3. Here is my Lambda code:
import json
import base64
import boto3

def lambda_handler(event, context):
    bucket_name = 'ad-live-streaming'
    s3_client = boto3.client('s3')
    file_content = event['content']
    merchantId = event['merchantId']
    catelogId = event['catelogId']
    file_name = event['fileName']
    file_path = '{}/{}/{}.mp4'.format(merchantId, catelogId, file_name)
    s3_response = s3_client.put_object(Bucket=bucket_name, Key=file_path,
                                       Body=file_content, ContentType='video/mp4')
    return {
        'statusCode': 200,
        "merchantId": merchantId,
        "catelogId": catelogId,
        "file_name": file_name,
    }
Any idea how to solve this issue?
Based on the example in Upload binary files to S3 using AWS API Gateway with AWS Lambda | by Omer Hanetz | The Startup | Medium, it appears that you need to decode the file from base64:
file_content = base64.b64decode(event['content'])
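
For context, a minimal sketch of the handler with that decode applied, assuming (as in the linked article) that API Gateway delivers the upload as a base64 string in event['content']; the other field names are taken from the question:

import base64
import boto3

def lambda_handler(event, context):
    s3_client = boto3.client('s3')
    # API Gateway passes the binary upload as base64 text, so decode it before writing to S3
    file_content = base64.b64decode(event['content'])
    file_path = '{}/{}/{}.mp4'.format(event['merchantId'], event['catelogId'], event['fileName'])
    s3_client.put_object(Bucket='ad-live-streaming', Key=file_path,
                         Body=file_content, ContentType='video/mp4')
    return {'statusCode': 200, 'file_path': file_path}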

accessing audio files from Google Cloud Storage when using Google Speech

I have used the bit of code below to successfully parse a .wav file containing speech to text, using Google Speech.
But I want to access a different .wav file, which I have placed on Google Cloud Storage (publicly), instead of on my local hard drive. Why doesn't simply changing
speech_file = 'my/local/system/sample.wav'
to
speech_file = 'https://console.cloud.google.com/storage/browser/speech_proj_files/sample.wav'
work acceptably?
Here is my code:
speech_file = 'https://console.cloud.google.com/storage/browser/speech_proj_files/sample.wav'
DISCOVERY_URL = ('https://{api}.googleapis.com/$discovery/rest?'
                 'version={apiVersion}')

def get_speech_service():
    credentials = GoogleCredentials.get_application_default().create_scoped(
        ['https://www.googleapis.com/auth/cloud-platform'])
    http = httplib2.Http()
    credentials.authorize(http)
    return discovery.build(
        'speech', 'v1beta1', http=http, discoveryServiceUrl=DISCOVERY_URL)

def main(speech_file):
    """Transcribe the given audio file.

    Args:
        speech_file: the name of the audio file.
    """
    with open(speech_file, 'rb') as speech:
        speech_content = base64.b64encode(speech.read())
    service = get_speech_service()
    service_request = service.speech().syncrecognize(
        body={
            'config': {
                'encoding': 'LINEAR16',   # raw 16-bit signed LE samples
                'sampleRate': 44100,      # 44.1 kHz
                'languageCode': 'en-US',  # a BCP-47 language tag
            },
            'audio': {
                'content': speech_content.decode('UTF-8')
            }
        })
    response = service_request.execute()
    return response
I'm not sure why your approach isn't working, but I want to offer a quick suggestion.
Google Cloud Speech API natively supports Google Cloud Storage objects. Instead of downloading the whole object only to upload it back to the Cloud Speech API, just specify the object by swapping out this line:
'audio': {
    # Remove this: 'content': speech_content.decode('UTF-8')
    'uri': 'gs://speech_proj_files/sample.wav'  # Do this!
}
One other suggestion. You may find the google-cloud Python library easier to use. Try this:
from google.cloud import speech

speech_client = speech.Client()
audio_sample = speech_client.sample(
    content=None,
    source_uri='gs://speech_proj_files/sample.wav',
    encoding='LINEAR16',
    sample_rate_hertz=44100)
results_list = audio_sample.sync_recognize(language_code='en-US')
There are some great examples here: https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/speech/cloud-client
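
If you are on a recent release of google-cloud-speech (v2+), the client surface shown above has changed; a minimal sketch of the same GCS-backed request with the current API, keeping the bucket path and audio settings from the answer:

from google.cloud import speech

client = speech.SpeechClient()
audio = speech.RecognitionAudio(uri="gs://speech_proj_files/sample.wav")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=44100,
    language_code="en-US",
)
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)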
