How can we prevent an Azure Function from processing an already processed blob?

I have an Azure Function which is bound to blob storage. Once a blob is successfully processed, I rename the file with the suffix '-Processed'.
But my Azure Function picks up the same blob again for processing. I tried putting a {name}.csv filter in the BlobTrigger binding, but that didn't help, as the file is still a csv even after the rename.
I know I can filter blobs that have a particular string in the file name; for example, "original-{name}" will match files starting with "original-".
But is there a way in Azure Functions to filter out blob names that include a particular string, in my case '-Processed'?

Just use two different paths for processed and unprocessed blobs.
Put your new blobs in with a prefix ("notprocessed-", for example), and remove the prefix when renaming. Set "path": "input/notprocessed-{name}" in the trigger binding.
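A minimal sketch of what the trigger binding in function.json could look like with this approach (the connection setting name is an assumption; the path is the one suggested above):

{
  "bindings": [
    {
      "name": "myBlob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "input/notprocessed-{name}",
      "connection": "AzureWebJobsStorage"
    }
  ]
}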

Actually, the blob service only supports filtering by blob name prefix, not by suffix. Your only option would be to list the blobs and then do client-side filtering.
Also, the List Blobs operation has an additional delimiter parameter that enables the caller to traverse the blob namespace by using a user-configured delimiter.
You could refer to this article for more details.
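For illustration, a rough sketch of that client-side filtering with the JavaScript SDK (@azure/storage-blob); the container name is a placeholder, the "original-" prefix and '-Processed' check mirror the question:

const { BlobServiceClient } = require("@azure/storage-blob");

async function listUnprocessedBlobs(connectionString) {
  const service = BlobServiceClient.fromConnectionString(connectionString);
  const container = service.getContainerClient("input");

  const unprocessed = [];
  // The service can only filter by prefix; the suffix check has to happen client-side.
  for await (const blob of container.listBlobsFlat({ prefix: "original-" })) {
    if (!blob.name.includes("-Processed")) {
      unprocessed.push(blob.name);
    }
  }
  return unprocessed;
}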

Related

Extracting Files from Onprem server to Azure Blob storage while filtering files with no data

I am trying to transfer on-premises files to Azure blob storage. However, out of the 5 files that I have, 1 has "no data", so I can't map the schema. Is there a way I can filter out this file while importing it to Azure? Or would I have to import the files into Azure blob storage as-is and then filter them into another blob storage container? If so, how would I do this?
Folder structure:
DataPath
    CompleteFiles
    Nodata
If your on-prem source is your local file system, then first copy the files, with their folder structure, to a temporary blob container using AzCopy with a SAS key. Please refer to this thread to learn about it.
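For reference, a typical AzCopy upload for that step might look something like this (account, container, and SAS token are placeholders):

azcopy copy "C:\local\DataPath\*" "https://<storage-account>.blob.core.windows.net/<temp-container>?<sas-token>" --recursive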
Then use an ADF pipeline to filter out the empty files and store the rest in the final blob container.
These are my files in the blob container, and sample2.csv is an empty file.
First, use a Get Metadata activity to get the list of files in that container.
It will list all the files; give that array to the ForEach as @activity('Get Metadata1').output.childItems.
Inside the ForEach, use a Lookup to get the row count of every file, and if the count != 0, use a Copy activity to copy the file.
Use a dataset parameter to give the file name.
Inside the If Condition, give the below condition:
@not(equals(activity('Lookup1').output.count,0))
Inside the True activities, use a Copy activity.
The Copy sink points to another blob container.
Execute this pipeline and you can see that the empty file is filtered out.
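Roughly, the ForEach part of the pipeline JSON is shaped like the sketch below; the Lookup and Copy bodies (source dataset, sink, and the dataset reference that receives the file name) are trimmed for brevity, and the activity names are just the designer defaults:

{
  "name": "ForEach1",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    },
    "activities": [
      { "name": "Lookup1", "type": "Lookup" },
      {
        "name": "If Condition1",
        "type": "IfCondition",
        "typeProperties": {
          "expression": {
            "value": "@not(equals(activity('Lookup1').output.count,0))",
            "type": "Expression"
          },
          "ifTrueActivities": [
            { "name": "Copy data1", "type": "Copy" }
          ]
        }
      }
    ]
  }
}

In the Lookup and Copy activities, pass @item().name to the dataset's file-name parameter so each iteration works on the current file.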
If your on-prem source is SQL, use a Lookup to get the list of tables and then use a ForEach. Inside the ForEach, follow the same procedure for each individual table.
If your on-prem source is something other than the above, first try to copy all the files to blob storage and then follow the same procedure.

Azure Data Factory, BlobEventsTrigger: configure blob path with scheduledTime value which is dynamic content

I have an Azure Data Factory pipeline with two triggers:
schedule trigger
blob event trigger
I would like the blob event trigger to wait for a marker file in the storage account under a dynamic path, e.g.:
landing/some_data_source/some_dataset/@{formatDateTime(@trigger().scheduledTime, 'yyyyMMdd')}/_SUCCESS
Referring to @trigger().scheduledTime doesn't work.
How can I pass the scheduledTime value from the schedule trigger to the blob event trigger?
If I understand correctly, you are trying to edit the blob event trigger fields Blob path begins with or Blob path ends with using the scheduledTime from the schedule trigger.
Unfortunately, as we can confirm from the official MS doc Create a trigger that runs a pipeline in response to a storage event:
Blob path begins with and ends with are the only pattern matching
allowed in Storage Event Trigger. Other types of wildcard matching
aren't supported for the trigger type.
It takes literal values. A dynamic expression in those fields does not work (it would only match if the file name happened to be exactly the same every time, which is unlikely), whereas a literal path does.
Workaround:
As discussed with @marknorkin earlier, since this is not available out of the box in the BlobEventTrigger, we can try using an Until activity composed of GetMetadata + Wait activities, where the GetMetadata activity checks for the existence of the dynamic path.
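A sketch of that workaround as pipeline JSON; the activity names, timeout, wait interval, and the parameterised dataset (whose file name would be _SUCCESS) are assumptions, not from the original answer:

{
  "name": "WaitForSuccessFile",
  "type": "Until",
  "typeProperties": {
    "expression": {
      "value": "@activity('Check marker file').output.exists",
      "type": "Expression"
    },
    "timeout": "0.02:00:00",
    "activities": [
      {
        "name": "Check marker file",
        "type": "GetMetadata",
        "typeProperties": {
          "fieldList": [ "exists" ],
          "dataset": {
            "referenceName": "MarkerFileDataset",
            "type": "DatasetReference",
            "parameters": {
              "folderPath": {
                "value": "@concat('landing/some_data_source/some_dataset/', formatDateTime(trigger().scheduledTime, 'yyyyMMdd'))",
                "type": "Expression"
              }
            }
          }
        }
      },
      {
        "name": "Wait1",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 60 },
        "dependsOn": [
          { "activity": "Check marker file", "dependencyConditions": [ "Succeeded" ] }
        ]
      }
    ]
  }
}

Since the pipeline is started by the schedule trigger, @trigger().scheduledTime resolves inside the pipeline run, so the dynamic folder path is evaluated there rather than in the blob event trigger.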

Is there a way to only pass the appended blob content to blob trigger Azure Function?

I am trying to make a blob-triggered Azure Function for log files, but the problem is that it passes in the entire blob content whenever a blob is created or updated.
So I am wondering: is there a way to only get the appended blob content?
module.exports = async function main(context, myBlob) {
  // I am using JavaScript; myBlob contains the entire content of a single blob.
  // For logs in an append blob, this results in duplicated logs, which is not ideal.
};
So I am wondering is there a way to only get the appended blob content?
No. Unless you:
maintain the byte index/position per log file up to where you last read, some place persistent (e.g. a file, a DB, or any persistent storage), or use a Durable Function;
on each change notification, find that last byte index/position and read starting from that location using the appropriate SDK/API. Here is the REST API (for ADLS Gen2; find the right one if you're using Gen1 or Blob) and some description of how to read a byte range out of a file in blobs.
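A minimal sketch of the byte-index idea with the JavaScript SDK (@azure/storage-blob); loadOffset and saveOffset are hypothetical helpers for whatever persistent store you choose:

const { BlobServiceClient } = require("@azure/storage-blob");

async function readAppendedContent(connectionString, containerName, blobName) {
  const service = BlobServiceClient.fromConnectionString(connectionString);
  const blobClient = service.getContainerClient(containerName).getBlobClient(blobName);

  const lastOffset = await loadOffset(blobName);            // hypothetical: last processed byte position
  const { contentLength } = await blobClient.getProperties();
  if (contentLength <= lastOffset) {
    return "";                                              // nothing new was appended
  }

  // Download only the newly appended byte range.
  const appended = await blobClient.downloadToBuffer(lastOffset, contentLength - lastOffset);
  await saveOffset(blobName, contentLength);                // hypothetical: persist the new position
  return appended.toString("utf8");
}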

Get list of blobs in container added after some date

I need to read blobs from an Azure container which were added after a particular date.
Basically, I have a Windows service which runs once a day and gets the list of blobs added after the previous run.
I do not see any such option in the CloudBlobContainer.ListBlobsSegmentedAsync function or via the Get Blob REST API call.
I could think of only one option: have a timestamp in the file name and filter by prefix. But I would like to know if there are better options to achieve this.
Unfortunately there's very limited server-side filtering available in Azure Blob Storage, and the only filtering allowed today is by blob name prefix.
One solution to your problem is to list all blobs in the container. Each blob has a property called Created Date/Time which tells you when the blob was first created (there's another property called Last Modified as well).
Once you have the list, you can filter on the client side by this Created Date/Time property to get the desired list of blobs.
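The question uses the older .NET SDK, but as an illustration of the same client-side approach, here is a sketch with the JavaScript SDK (@azure/storage-blob); the cutoff date would be the timestamp of the previous run:

const { BlobServiceClient } = require("@azure/storage-blob");

async function listBlobsCreatedAfter(connectionString, containerName, cutoffDate) {
  const service = BlobServiceClient.fromConnectionString(connectionString);
  const container = service.getContainerClient(containerName);

  const newBlobs = [];
  for await (const blob of container.listBlobsFlat()) {
    // createdOn is the blob's Created Date/Time; lastModified is also available.
    if (blob.properties.createdOn > cutoffDate) {
      newBlobs.push(blob.name);
    }
  }
  return newBlobs;
}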

Can you output an Azure Logic App Variable to file and store on blob storage?

I have searched Google and MSDN, and it's not clear whether you can write a variable to blob storage. Searching the available steps/actions does not yield anything obvious either.
I have constructed an array variable of file names from an SFTP server per the following documentation, but I can't figure out whether it can be stored or saved in any capacity.
Right now it seems these variables are essentially internal to the Logic App and can't be made external. Or is there a way to export them?
https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-create-variables-store-values
If you simply want to save the variable's value in a blob, then you can do so with the Azure Blob Storage "Create blob" action:
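In the Logic App's code view, that action might look roughly like the sketch below; the connection reference, folder path, file name, and the 'FileNames' variable are placeholders, and the exact field names can differ slightly between connector versions:

"Create_blob": {
  "type": "ApiConnection",
  "inputs": {
    "host": {
      "connection": {
        "name": "@parameters('$connections')['azureblob']['connectionId']"
      }
    },
    "method": "post",
    "path": "/datasets/default/files",
    "queries": {
      "folderPath": "/output",
      "name": "filenames.json"
    },
    "body": "@variables('FileNames')"
  },
  "runAfter": {}
}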
