I'm trying to create a blob-trigger Azure Function in Python that automatically splits all sheets in a specific Excel file into separate .csv files in the same Azure Blob container. My __init__.py and function.json files look something like this:
function.json file:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "blobin",
      "type": "blobTrigger",
      "direction": "in",
      "path": "folder/ExcelFile.xlsx",
      "connection": "blobSTORAGE"
    },
    {
      "name": "blobout",
      "type": "blob",
      "direction": "out",
      "path": "folder/{name}",
      "connection": "blobSTORAGE"
    }
  ]
}
__init__.py file:
import logging
from xlrd import open_workbook
import csv
import azure.functions as func

def main(blobin: func.InputStream, blobout: func.Out[bytes]):
    logging.info(f"Python blob trigger function processed blob \n")
    try:
        wb = open_workbook('ExcelFile.xlsx')
        for i in range(0, wb.nsheets):
            sheet = wb.sheet_by_index(i)
            print(sheet.name)
            with open("Sheet_%s.csv" %(sheet.name.replace(" ","")), "w") as file:
                writer = csv.writer(file, delimiter = ",")
                print(sheet, sheet.name, sheet.ncols, sheet.nrows)
                header = [cell.value for cell in sheet.row(0)]
                writer.writerow(header)
                for row_idx in range(1, sheet.nrows):
                    row = [int(cell.value) if isinstance(cell.value, float) else cell.value
                           for cell in sheet.row(row_idx)]
                    writer.writerow(row)
                blobout.set(file.read)
                logging.info(f"Split sheet %s into CSV successfully.\n" %(sheet.name))
    except:
        print("Error")
I tried to run the pure Python code on my PC without the Azure Functions wrapper and it succeeded. However, when I deployed the function to Azure it does not trigger when I upload the Excel file. I am thinking the config I put is wrong but don't know how to confirm or fix it. Any suggestions?
Since you don't have a problem locally, I think the issue is that the blobSTORAGE connection string has not been set as an environment variable on the deployed app.
Locally, environment variables are set in the Values section of local.settings.json. But when you deploy the function app, it reads them from the app's application settings (Configuration > Application settings in the portal).
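For reference, a minimal local.settings.json carrying the blobSTORAGE setting might look like the sketch below (connection strings are placeholders); the same blobSTORAGE key then needs to be added as an application setting on the deployed function app:

{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=[...]",
    "blobSTORAGE": "DefaultEndpointsProtocol=https;AccountName=[...]"
  }
}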
Related
I would like to use Python in an Azure Function to read these Excel files in this blob storage. I would like to know the easiest way to access this folder of blobs and the Python code to read the Excel files.
The easiest way would be to use input bindings.
In your function.json, add a binding:
{
  "name": "inputblob",
  "type": "blob",
  "dataType": "binary",
  "path": "blob/path/filename.csv",
  "connection": "MyStorageConnectionAppSetting",
  "direction": "in"
},
Then in your function:
import logging
import azure.functions as func

# The input binding field inputblob can either be 'bytes' or 'str' depending
# on the dataType in function.json ('binary' or 'string').
def main(queuemsg: func.QueueMessage, inputblob: bytes) -> bytes:
    logging.info(f'Python Queue trigger function processed {len(inputblob)} bytes')
    return inputblob
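To actually parse the Excel content from the binary binding, one option is pandas; this is only a rough sketch, assuming the binding path points at an .xlsx file and that pandas and openpyxl are listed in requirements.txt (the trigger and binding names mirror the sample above):

import io
import logging
import pandas as pd
import azure.functions as func

def main(queuemsg: func.QueueMessage, inputblob: bytes):
    # inputblob holds the raw .xlsx bytes delivered by the input binding;
    # wrap them in a file-like object so pandas can parse the workbook.
    df = pd.read_excel(io.BytesIO(inputblob), engine="openpyxl")
    logging.info(f"Read {len(df)} rows and {len(df.columns)} columns from the Excel blob")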
I have set up a very simple Azure Function that should be triggered whenever an Excel file is dropped into my container. However, nothing is happening. Any suggestions?
Here is my function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "myblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "excel/{name}.xlsx",
      "connection": "jbucket123_STORAGE"
    }
  ]
}
Here is my __init__.py file. Any recommendations?
import logging
import pandas as pd
import azure.functions as func

def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
    df = pd.read_excel(myblob.read())
    logging.info(f"{df}")
Your configuration looks correct; a few things to double-check:
Do you use a "general-purpose" storage account? Blob triggers are not supported for the legacy "BlobStorage" account type.
Does a container "excel" exist in your storage account, and are you putting your file into it? Does your file end with .xlsx (and not e.g. .xls)?
If you test locally, is the storage account connection string stored in local.settings.json like so:
{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=[...]",
    "jbucket123_STORAGE": "DefaultEndpointsProtocol=https;AccountName=[...]"
  }
}
If you run the function in Azure, is the "jbucket123_STORAGE" property set under "Configuration > Application settings"? Did you put all dependencies (e.g. pandas, openpyxl) into your requirements.txt before you published the function?
You might also want to check the "Log stream" of your function to get more details about what it is doing.
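One more note on the pandas call in the question: depending on the pandas version, passing raw bytes to read_excel is deprecated, so wrapping the blob bytes in io.BytesIO is the safer form. A minimal sketch, assuming openpyxl is in requirements.txt:

import io
import logging
import pandas as pd
import azure.functions as func

def main(myblob: func.InputStream):
    logging.info(f"Processing {myblob.name} ({myblob.length} bytes)")
    # Wrap the blob bytes in a file-like object before handing them to pandas.
    df = pd.read_excel(io.BytesIO(myblob.read()), engine="openpyxl")
    logging.info(f"{df}")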
I want to create an Azure Function using Python which will read data from Azure Event Hub.
Fortunately, Visual Studio Code provides a way to create an Azure Functions skeleton that can be edited according to the requirement.
I am able to create a demo HTTP trigger Azure Function with the help of the Microsoft documentation, but I don't know what changes I should make to the function below so that it can read data from the Event Hub and write the same to Azure Blob Storage.
Also, it would be great if someone could suggest a blog to get more details on Azure Functions and standard practice.
UPDATE:
I tried to update my code based on the suggestion of @Stanley, but possibly the code needs further changes.
I have written the following code in my Azure Function.
local.settings.json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "Storage account connection string",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "EventHub_ReceiverConnectionString": "Endpoint Connection String of the EventHubNamespace",
    "Blob_StorageConnectionString": "Storage account connection string"
  }
}
function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "function",
      "type": "eventHubTrigger",
      "direction": "in",
      "name": "event",
      "eventHubName": "pwo-events",
      "connection": "EventHub_ReceiverConnectionString",
      "cardinality": "many",
      "consumerGroup": "$Default",
      "dataType": "binary"
    }
  ]
}
__init__.py
import logging
import azure.functions as func
from azure.storage.blob import BlobClient

storage_connection_string = 'Storage account connection string'
container_name = ''

def main(event: func.EventHubEvent):
    logging.info(f'Function triggered to process a message: {event.get_body().decode()}')
    logging.info(f'  SequenceNumber = {event.sequence_number}')
    logging.info(f'  Offset = {event.offset}')
    blob_client = BlobClient.from_connection_string(
        storage_connection_string, container_name, str(event.sequence_number) + ".txt")
    blob_client.upload_blob(event.get_body().decode())
Following is the screenshot of my blob container:
After executing the above code, something got written to the blob container, but instead of a .txt file it got saved in some other format. Also, if I trigger the Azure Function multiple times, the files get overwritten.
I want to perform an append operation instead of an overwrite.
Also, I want to save my file in a user-defined location, for example: container/Year=/month=/date=
Thanks !!
If you want to read data from Azure Event Hub, using the Event Hub trigger will be much easier. This is my test code (read data and write into storage):
import logging
import azure.functions as func
from azure.storage.blob import BlobClient
import datetime

storage_connection_string = ''
container_name = ''
today = datetime.datetime.today()

def main(event: func.EventHubEvent):
    logging.info(f'Function triggered to process a message: {event.get_body().decode()}')
    logging.info(f'  SequenceNumber = {event.sequence_number}')
    logging.info(f'  Offset = {event.offset}')
    blob_client = BlobClient.from_connection_string(
        storage_connection_string, container_name,
        str(today.year) + "/" + str(today.month) + "/" + str(today.day) + ".txt")
    blob_client.upload_blob(event.get_body().decode(), blob_type="AppendBlob")
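If the container/Year=/month=/date= layout from the question is preferred, the blob name can be built the same way; this is just a sketch of the naming change (the rest of the function stays as above, and the file name "events.txt" is illustrative):

# Hypothetical Year=/month=/date= folder layout built from the current date.
blob_name = f"Year={today.year}/month={today.month}/date={today.day}/events.txt"
blob_client = BlobClient.from_connection_string(storage_connection_string, container_name, blob_name)
blob_client.upload_blob(event.get_body().decode(), blob_type="AppendBlob")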
I use the code below to send events to the event hub:
import asyncio
from azure.eventhub.aio import EventHubProducerClient
from azure.eventhub import EventData

async def run():
    # Create a producer client to send messages to the event hub.
    # Specify a connection string to your event hubs namespace and
    # the event hub name.
    producer = EventHubProducerClient.from_connection_string(conn_str="<conn string>", eventhub_name="<hub name>")
    async with producer:
        # Create a batch.
        event_data_batch = await producer.create_batch()

        # Add events to the batch.
        event_data_batch.add(EventData('First event '))
        event_data_batch.add(EventData('Second event'))
        event_data_batch.add(EventData('Third event'))

        # Send the batch of events to the event hub.
        await producer.send_batch(event_data_batch)

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
My local.settings.json:
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<storage account conn str>",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "testhubname0123_test_EVENTHUB": "<event hub conn str>"
  }
}
My function.json, just as this doc indicated:
{
  "scriptFile": "__init__.py",
  "bindings": [{
    "type": "eventHubTrigger",
    "name": "event",
    "direction": "in",
    "eventHubName": "test01 (this is my hub name, place yours here)",
    "connection": "testhubname0123_test_EVENTHUB"
  }]
}
Result
Run the function and send data to the event hub using the code above:
Data has been saved into storage successfully:
Download the .txt file and check its content; we can see that the content of all 3 events has been written:
I am trying to debug an Azure Function locally using a blob trigger. When I upload an image file to Azure, the trigger is received by my function running locally.
def main(blobin: func.InputStream, blobout: func.Out[bytes], context: func.Context):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {blobin.name}\n"
                 f"Blob Size: {blobin.length} bytes")
    image_file = Image.open(blobin)
My function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "blobin",
      "type": "blobTrigger",
      "direction": "in",
      "path": "uploads/{name}",
      "connection": "STORAGE"
    },
    {
      "name": "blobout",
      "type": "blob",
      "direction": "out",
      "path": "uploads/{blob_name}_resized.jpg",
      "connection": "STORAGE"
    }
  ]
}
The error I get when the Image.open(blobin) line runs is:
System.Private.CoreLib: Exception while executing function:
Functions.ResizePhoto. System.Private.CoreLib: Result: Failure
Exception: UnidentifiedImageError: cannot identify image file
<_io.BytesIO object at 0x0000017FD4FD7F40>
The interesting thing is that the image itself does open in the VS Code watch window, but fails when the code is run. It also gives the same error as above if I add it to the watch again (probably triggering a watch refresh).
If you want to resize an image and then save it from a blob-triggered function, try the code below:
import logging
from PIL import Image
import azure.functions as func
import tempfile
import ntpath
import os

def main(blobin: func.InputStream, blobout: func.Out[func.InputStream], context: func.Context):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {blobin.name}\n"
                 f"Blob Size: {blobin.length} bytes")
    temp_file_path = tempfile.gettempdir() + '/' + ntpath.basename(blobin.name)
    print(temp_file_path)
    image_file = Image.open(blobin)
    image_file.resize((50, 50)).save(temp_file_path)
    blobout.set(open(temp_file_path, "rb").read())
    os.remove(temp_file_path)
function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "blobin",
      "type": "blobTrigger",
      "direction": "in",
      "path": "samples-workitems/{name}",
      "connection": "STORAGE"
    },
    {
      "name": "blobout",
      "type": "blob",
      "direction": "out",
      "path": "resize/{name}",
      "connection": "STORAGE"
    }
  ]
}
Note that you should not store the resized image in the same container, as that would lead to an endless loop (each new image triggers the blob trigger and gets resized again and again). Your issue is due to the newly resized image not being output correctly, so the exception occurs when Image.open(blobin) runs.
Anyway, the code above works for me perfectly, see the result below:
Upload a big image:
resize the image and save it to another container:
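As an aside, the temporary file in the code above could also be avoided by resizing in memory; this is only a sketch of that alternative, assuming the same bindings with blobout typed as func.Out[bytes]:

import io
from PIL import Image
import azure.functions as func

def main(blobin: func.InputStream, blobout: func.Out[bytes], context: func.Context):
    image_file = Image.open(blobin)
    buffer = io.BytesIO()
    # Keep the original format if Pillow detected one, otherwise fall back to PNG.
    image_file.resize((50, 50)).save(buffer, format=image_file.format or "PNG")
    blobout.set(buffer.getvalue())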
It turns out that setting a breakpoint at the Image.open(blobin) line breaks the function somehow. Removing it from there and adding it to the next line no longer prompts the error. Probably Azure doesn't like to wait and times out the stream? Who knows.
I can't quite seem to get the output binding to save a file to blob storage. I have created an Azure Function using Python that uses a Cosmos DB change feed trigger. I need to save that document to blob storage.
I've set up the function.json file as follows:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "cosmosDBTrigger",
      "name": "documents",
      "direction": "in",
      "leaseCollectionName": "leases",
      "connectionStringSetting": "cosmos_dev",
      "databaseName": "MyDatabase",
      "collectionName": "MyCollection",
      "createLeaseCollectionIfNotExists": "true"
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "outputBlob",
      "path": "raw/changefeedOutput/{blobname}",
      "connection": "blobStorageConnection"
    }
  ]
}
So the trigger will get a document like the following:
{
  "id": "documentId-12345",
  ...other sections here...
  "entity": "customer"
}
In the __init__.py file I have this base code:
def main(documents: func.DocumentList) -> func.Document:
    logging.info(f"CosmosDB trigger executed!")
    for doc in documents:
        blobName = doc['id'] + '.json'
        blobFolder = doc['entity']
        blobData = doc.to_json()
I think I need to add something like 'outputBlob: func.Out' to the def, but I'm unsure how to proceed.
Looking at the examples on GitHub
https://github.com/yokawasa/azure-functions-python-samples/tree/master/v2functions/blob-trigger-watermark-blob-out-binding
it looks like I have to call
outputBlob.set(something)
So I'm looking for how to set up the def part and send the blob to the location I've set, using the data from the Cosmos DB document.
I have tried the following:
def main(documents: func.DocumentList, outputBlob: func.Out[str]) -> func.Document:
    logging.info(f"CosmosDB trigger executed!")
    for doc in documents:
        blobName = doc['id'] + '.json'
        outputBlob.set(blobName)
and get the result:
CosmosDB trigger executed!
Executed 'Functions.CosmosTrigger_py' (Failed, Id=XXXXX)
System.Private.CoreLib: Exception while executing function: Functions.CosmosTrigger_py. Microsoft.Azure.WebJobs.Host: No value for named parameter 'blobname'.
I could just get the connection string from os.environ and use the standard create_blob_from_text with the location, name, and blob data,
block_blob_service.create_blob_from_text(blobLocation, blobName, formattedBlob)
Any pointers would be great.
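For what it's worth, the fallback described at the end of the question (reading the connection string from the environment and writing the blob directly, bypassing the output binding) could look roughly like the sketch below. It reuses the blobStorageConnection setting and the raw container from the question's binding, and the v12 azure-storage-blob BlobClient used elsewhere on this page; the path layout is illustrative:

import os
import logging
import azure.functions as func
from azure.storage.blob import BlobClient

def main(documents: func.DocumentList) -> None:
    logging.info("CosmosDB trigger executed!")
    # blobStorageConnection is assumed to be set as an app setting / local.settings.json value.
    conn_str = os.environ["blobStorageConnection"]
    for doc in documents:
        # Build the blob path from the document, e.g. customer/documentId-12345.json
        blob_name = f"{doc['entity']}/{doc['id']}.json"
        blob_client = BlobClient.from_connection_string(conn_str, "raw", blob_name)
        blob_client.upload_blob(doc.to_json())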