How to filter blobs in Azure SDK for Python - python-3.x

I want to search for blobs in my Azure blob storage according to a specific tag (like .name, .creation_date, .size, ...).
My current approach returns all blobs from the container with MyContainerClient.list_blobs and searches for the corresponding tag afterwards. Since my container stores around 800,000 blobs, this takes around 20 minutes, which is not usable for a live view of the content.
But I also found another ContainerClient function, .find_blobs_by_tags(filter_expression: str), which searches for blobs whose tags match the specified condition.
In the Azure API this filter_expression is specified as: ""yourtagname"='firsttag'" , therefore I specified: ""name"='example.jpg'" or ""creation_date"='2021-07-04 09:35:19+00:00'"
Azure SDK Python - ContainerClient.find_blobs_by_tag
Unfortunately I always get an error:
azure.core.exceptions.HttpResponseError: Error parsing query at or near character position 1: unexpected 'creation_time'
RequestId:63bd850b-401e-005f-745e-400d5a000000
Time:2022-03-25T15:40:22.4156367Z
ErrorCode:InvalidQueryParameterValue
queryparametername:where
queryparametervalue:'creation_time'='0529121f-7676-46c7-8a52-424664774240/0529121f-7676-46c7-8a52-424664774240.json'
reason:This query parameter value is invalid.
Content: <?xml version="1.0" encoding="utf-8"?>
<Error><Code>InvalidQueryParameterValue</Code><Message>Error parsing query at or near character position 1: unexpected &apos;creation_time&apos;
RequestId:63bd850b-401e-005f-745e-400d5a000000
Time:2022-03-25T15:40:22.4156367Z</Message><QueryParameterName>where</QueryParameterName><QueryParameterValue>&apos;creation_time&apos;=&apos;0529121f-7676-46c7-8a52-424664774240/0529121f-7676-46c7-8a52-424664774240.json&apos;</QueryParameterValue><Reason>This query parameter value is invalid.</Reason></Error>
Does anyone have experience with these Azure function calls?

Looking at the GitHub code (in the find_blobs_by_tags function), it says:
:param str filter_expression:
The expression to find blobs whose tags matches the specified condition.
eg. "\"yourtagname\"='firsttag' and \"yourtagname2\"='secondtag'"
Looks like you are missing the escape characters. Can you try including them?
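A minimal sketch of building such a filter expression with the escape characters in place (the tag name and value here are placeholders, and the commented service call assumes the azure-storage-blob package):

```python
def tag_filter(tag_name: str, tag_value: str) -> str:
    # Per the SDK docstring, the tag name must be wrapped in double
    # quotes inside the expression and the value in single quotes.
    return f"\"{tag_name}\"='{tag_value}'"

expression = tag_filter("yourtagname", "firsttag")
print(expression)  # "yourtagname"='firsttag'

# Hypothetical usage against a real container:
# from azure.storage.blob import ContainerClient
# container = ContainerClient.from_connection_string(conn_str, "mycontainer")
# for blob in container.find_blobs_by_tags(expression):
#     print(blob.name)
```

Note that find_blobs_by_tags matches blob index tags that were set on the blobs, so the tag queried for has to exist as an index tag on the blob.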

Related

Azure Cognitive Search - No results using wildcards on content with DOT

I'm using Azure Cognitive Search to build a rich search experience inside a web application. However, I'm facing the following issue: a field of the index contains codes like "Z.A.01.12", "A.A.44.11" and so on. I'm trying to use the wildcard * as a suffix in order to find all the results that start with the value Z.A (just an example).
"Z.A.01.12" -> Z.A* => No results found.
"Z.A.01.12" -> Z.A.* => No results found.
"Z.A.01.12" -> Z\.A* => No results found.
I have tried different analyzers (standard Lucene, en.microsoft, whitespace and keyword), but even when I see that only one token is produced (for example with whitespace) containing the entire content, querying the service with a wildcard still returns "No results found".
I have already set queryType=full and searchMode=any. Furthermore, I also tried to escape the . with \, but the result is always empty. Is there anything I can do to handle these cases?

Get blob content from file using wildcard

Example:
Blob: container/folder/myfile123.txt [where 123 is dynamic]
I am trying to get the content of an Azure blob file by using a wildcard for part of the name, since it can differ, while the leading part (like myfile) and the extension (like .txt) are always the same. I've tested things like myfile*.txt or myfile?.txt, but with no success when specifying the path.
For getting a wildcard file using the Get blob content action in Logic Apps, how can I get a file by a leading name and ending extension with any possible combination in between?
You must use the exact name of the file.
What you can do is get a list of all blobs in the container, then loop over that list to get each individual file.
You can use the "List blobs" connector and then the "Filter array" collections connector to get the wildcard functionality via the "contains" operator. Then just use "Get blob using path" with the expression: body('Filter_array')[0]['name']
Or in code view:
"path": "/my_catalogue/#{body('Filter_array')[0]['name']}"
to get the first filename that matches your wildcard.
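The same filtering step can also be done client-side, for instance with the Python SDK: list the blob names, then apply a wildcard match. A minimal sketch (the sample names are illustrative, and the commented lines assume a ContainerClient from azure-storage-blob):

```python
import fnmatch

def match_blob_names(names, pattern):
    # Pure-Python equivalent of the "Filter array" step; fnmatch
    # supports the * and ? wildcards tried in the question.
    return [n for n in names if fnmatch.fnmatch(n, pattern)]

names = ["folder/myfile123.txt", "folder/myfile9.txt", "folder/other.txt"]
print(match_blob_names(names, "folder/myfile*.txt"))
# ['folder/myfile123.txt', 'folder/myfile9.txt']

# Hypothetical usage against a real container:
# blobs = [b.name for b in container.list_blobs(name_starts_with="folder/myfile")]
# first_match = match_blob_names(blobs, "folder/myfile*.txt")[0]
```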

Get blob contents from last modified folder in Azure container via Azure logic apps

I have an Azure Logic App that gets blob contents from my Azure storage account on a regular basis. However, my blobs are stored in sub-directories.
Eg. MyContainer > Invoice > 20200101 > Invoice1.csv
Every month the third sub-directory, '20200101', changes to '20200201', '20200301' and so forth.
I need my Logic App to return the blob contents of the latest folder that gets created in my container.
Any advice regarding this?
Thanks!!
For this requirement, please refer to my Logic App below:
1. List all of the folders under /mycontainer/Invoice/.
2. Initialize two variables of type Integer, one named maxNum and the other named numberFormatOfName.
3. Use "For each" to loop over the value from the "List blobs" action above. In the "For each" loop, first set numberFormatOfName with the expression int(replace(items('For_each')?['Name'], '/', '')). Then add an "If" condition to judge whether numberFormatOfName is greater than maxNum. If true, set the value of maxNum to numberFormatOfName.
4. After the "For each" loop, use another "List blobs" to list all of the blobs in the latest (max number) folder, using the expression string(variables('maxNum')) for the folder path.
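The maxNum loop in steps 2-3 boils down to "strip the trailing slash, parse as an integer, keep the maximum". A minimal Python sketch of that logic (the folder names are illustrative):

```python
def latest_folder(folder_names):
    # Mirrors the Logic App loop: remove '/' from each folder name,
    # compare as integers, and return the largest as a string.
    return str(max(int(name.replace("/", "")) for name in folder_names))

print(latest_folder(["20200101/", "20200201/", "20200301/"]))  # 20200301
```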
If you do not want to list the blobs but want to get the blob content directly, you can do it in a similar way.
==============================Update==============================
Running the Logic App, I get the following result:
I created three folders, 20200101, 20200202 and 20200303, under /mycontainer/Invoice in my blob storage. The contents of the three csv files are 111,111, 222,222 and 333,333. The Logic App responds with the third csv file's content, 333,333.

Regular expression with if condition activity in Azure

I want to check whether a file name contains a date pattern (dd-MMM-yyyy) using the If Condition activity in Azure Data Factory. For example: a file name like somestring_23-Apr-1984.csv has a date pattern in it.
I get the file name using the Get Metadata activity and pass it to the If Condition activity, where I want to check whether the file name has the date pattern in it and, based on the result, perform different tasks. The only way I know to do this is by using a regex to check whether the pattern is present in the file name string, but Azure does not mention a regex solution in the documentation.
Is there any other way to achieve my requirement in ADF? Your help is much appreciated.
Yes, there is no regex support in expressions. There is another way to do this, but it is rather complex.
First, get the date string (23-Apr-1984) from the output of Get Metadata.
Then split the date string and determine whether each part matches the date pattern.
Below is my test pipeline:
First Set variable:
name: fileName
value: @split(split(activity('MyGetMetadataActivity').output.itemName,'_')[1],'.csv')[0]
Second Set variable:
name: fileArray
value: @split(variables('fileName'),'-')
If Condition:
Expression: @and(contains(variables('DateArray'),variables('fileArray')[0]),contains(variables('MonthArray'),variables('fileArray')[1]))
By the way, I originally wanted to compare the day part against 0 and 30, but greaterOrEquals() doesn't support nested properties, so I use contains() instead.
Hope this can help you.
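For comparison, outside ADF (e.g. in a pre-processing script or an Azure Function) the same check is a one-liner with a real regex engine. A minimal Python sketch, assuming English month abbreviations:

```python
import re

# dd-MMM-yyyy, e.g. 23-Apr-1984 (assumed English month abbreviations).
DATE_PATTERN = re.compile(
    r"\d{2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-\d{4}"
)

def has_date_pattern(file_name: str) -> bool:
    # True if the file name contains a dd-MMM-yyyy substring.
    return bool(DATE_PATTERN.search(file_name))

print(has_date_pattern("somestring_23-Apr-1984.csv"))  # True
print(has_date_pattern("somestring.csv"))              # False
```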

Azure adds timestamp at the beginning logs

I have a problem with log retrieval from my Docker containers in Azure Log Analytics. All logs are retrieved fine, but Azure adds a date at the beginning of each line of the log, which means an entry is created for each line and I can't analyze my logs correctly because they are split up.
For example, in this image I have, in the black rectangle, a date added (by Azure, I think) and, in the red rectangle, the date appearing in my logs.
Also, if there is no date on a line of my logs, a date is still added on every line, even the empty ones.
The problem is that Azure cuts my log file line by line by adding a date on each line, when I would like it to delimit using the dates already present in my log files.
Do you have any solutions?
One solution I can think of is that, when you query the logs, you can use the replace() method to replace the redundant date (replace it with an empty string, etc.). You need to write a suitable regular expression for your purpose.
A dummy query like the one below:
ContainerLog
| extend new_logEntry=replace(@'xxx', @'xxx', LogEntry)
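The same idea, stripping the prepended timestamp with a regular expression, can be prototyped outside KQL. A minimal Python sketch, assuming the prefix is an ISO 8601 timestamp (adjust the pattern to the actual format in your logs):

```python
import re

# Assumed prefix shape: ISO 8601 timestamp with optional fractional
# seconds and trailing 'Z', followed by whitespace.
TS_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?\s*")

def strip_timestamp(line: str) -> str:
    # Remove the prepended timestamp, keeping the original log text.
    return TS_PREFIX.sub("", line)

print(strip_timestamp("2022-03-25T15:40:22.4156367Z my app log line"))
# my app log line
```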
Currently Azure Monitor for containers doesn’t support multi-line logging, but there are workarounds available. You can configure all the services to write in JSON format and then Docker/Moby will write them as a single line.
https://learn.microsoft.com/fr-fr/azure/azure-monitor/insights/container-insights-faq#how-do-i-enable-multi-line-logging