Find a Blob in an Azure Container

I have thousands & thousands of Blobs in a container, something like
A/Temp/A001-1.log
A/Temp/A001-2.log
A/Temp/A001-3.log
B/Tmp/B001-1.log
B/Tmp/B001-2.log
B/Tmp/B002-1.log
Now my problem is that I want to find blobs having A001 in their name. I understand that ListBlobsWithPrefix looks for blobs starting with some text, which is not the case for me. ListBlobs would bring all the blobs into my code, and then I would have to search for the one I need. Is there any way to get just the blobs I am looking for?

There's really no easy way to search a container for a specific blob (or a set of blobs matching a name pattern) aside from brute-force enumeration. And name prefixes, as you've guessed, won't help you in this case either.
What I typically advise folks to do is keep their searchable metadata somewhere else (maybe SQL DB, maybe MongoDB, doesn't really matter as long as it provides the search capability they need), with that data store containing a reference link to the exact blob. The blob name itself can also be stored in the metadata as one of the searchable properties.
Also: Once you get into the "thousands & thousands of blobs in a container," you'll find that pulling the blob names is going to take a while (which, again, I think you're seeing). Containers can certainly hold as many blobs as you want, but in that case, you really want to be accessing them directly, based on some other metadata, and not enumerating through the name list.
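For reference, the brute-force enumeration described above looks roughly like this with the Azure.Storage.Blobs .NET SDK (a minimal sketch; the connection string, container name and the "A001" filter are placeholders):

    using System;
    using Azure.Storage.Blobs;
    using Azure.Storage.Blobs.Models;

    var container = new BlobContainerClient(connectionString, "logs");

    // Enumerates every blob in the container; the filtering happens client-side,
    // so all blob names still travel over the wire.
    await foreach (BlobItem item in container.GetBlobsAsync())
    {
        if (item.Name.Contains("A001"))
            Console.WriteLine(item.Name);
    }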

Instead of searching, construct the blob name if its prefix is known and then try downloading the blob. If the blob doesn't exist, you will get a 404 (Not Found) exception.
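A minimal sketch of that approach with the Azure.Storage.Blobs .NET SDK (the blob name below is just an example taken from the question):

    using System;
    using Azure;
    using Azure.Storage.Blobs;

    var container = new BlobContainerClient(connectionString, "logs");
    BlobClient blob = container.GetBlobClient("A/Temp/A001-1.log");

    try
    {
        // Attempt the download; a missing blob surfaces as a 404.
        var result = await blob.DownloadContentAsync();
        Console.WriteLine($"Found blob, {result.Value.Content.ToArray().Length} bytes");
    }
    catch (RequestFailedException ex) when (ex.Status == 404)
    {
        Console.WriteLine("Blob not found");
    }

If you only need to know whether the blob exists, blob.ExistsAsync() avoids downloading the content at all.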

As of today there's a preview feature, available in a few regions, that allows for Blob Storage indexing (blob index tags):
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-manage-find-blobs?tabs=azure-portal
Hope they make it generally available soon.
Regards
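Once blob index tags are available to you, the search can run server-side. A rough sketch, assuming blobs were tagged at upload time with a hypothetical "logGroup" tag (exact type names may differ across SDK versions):

    using System;
    using Azure.Storage.Blobs;
    using Azure.Storage.Blobs.Models;

    var service = new BlobServiceClient(connectionString);

    // Server-side filtering on blob index tags; only matching blobs are returned.
    await foreach (TaggedBlobItem item in service.FindBlobsByTagsAsync("\"logGroup\" = 'A001'"))
    {
        Console.WriteLine($"{item.BlobContainerName}/{item.BlobName}");
    }

Note that tag filtering is an exact match on tag values, so the tag has to carry the token you want to look up.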

Related

Azure Blob Storage mass delete items with specific names

Let's say that I have a blob container that has files with the following names:
2cfe4d2c-2703-4b0f-bed0-6b239f238206
1a18c31f-bf28-4f64-a796-79237fabc66a
20a300dd-c032-405f-b67d-9c077623c26c
6f4041dd-92da-484a-966d-d5a168a9a2ec
(Let's say there are around 15,000 files.)
I want to delete around 1,200 of them. I have a file with all the names I want to delete. In my case it is JSON, but it does not really matter what format it is in; I know which files I want to delete.
What is the most efficient/safe way to delete these items?
I can think of a few ways. For example, using az storage blob delete-batch or az storage blob delete. I am sure the former is more efficient, but I would not know how to use it here because there is no real pattern, just a big list of GUIDs (names) that I want to delete.
I guess I would have to write some code to iterate over my list of files to delete and then use the CLI or some Azure Storage library to delete them.
I'd prefer to just use some built-in tooling but I can't find a way to do this without having to write code to talk with the API.
What would be the best way to do this?
The tool azcopy would be perfect for that. Using the azcopy remove command, you can specify a path to a text file with the --list-of-files={path} parameter to restrict the operation to specific files. The file should be plain text and line-delimited.
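If you would rather script it with the SDK than use azcopy, a minimal sketch (assuming the GUIDs have been exported from the JSON into a plain text file, one name per line; file and container names are placeholders):

    using System;
    using System.IO;
    using Azure.Storage.Blobs;

    var container = new BlobContainerClient(connectionString, "my-container");

    foreach (string name in File.ReadAllLines("blobs-to-delete.txt"))
    {
        // DeleteBlobIfExistsAsync returns false when the blob is already gone,
        // so a stale entry in the list does not abort the run.
        bool deleted = (await container.DeleteBlobIfExistsAsync(name)).Value;
        Console.WriteLine($"{name}: {(deleted ? "deleted" : "not found")}");
    }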

Invalid blob name length. The blob name must be between 1 and 1024 characters long

I have been working with an Azure Blob Storage account in which I have tons of documents (all in .pdf format). When running Cognitive Search (index, indexer and skillsets) against that storage, it works fine for a while, and then the error in the question title appears. But when I check, the names are shorter than 1024 characters (as far as I can tell).
I would like to know if there's a way to find out how Cognitive Search is acquiring those blobs, so I can understand what is happening and try to change it.
Example: tons of PDFs with ordinary names (letters a-z, numbers and "-"), but after roughly 600 docs the error described above appears.
Curiously, as I was checking which document was related to the error, the same name kept appearing (i.e. NAME SURNAME 1.pdf, and so on for about 9 documents). Once those were deleted, the error appeared again with another, completely different document.
Sorry for the long description.
As far as valid blob names go, see Naming and Referencing Containers, Blobs, and Metadata - Blob names:
A blob name must conform to the following naming rules:
A blob name can contain any combination of characters.
A blob name must be at least one character long and cannot be more than 1,024 characters long, for blobs in Azure Storage.
The Azure Storage emulator supports blob names up to 256 characters long. For more information, see Use the Azure storage emulator for development and testing.
Blob names are case-sensitive.
Reserved URL characters must be properly escaped.
The number of path segments comprising the blob name cannot exceed 254. A path segment is the string between consecutive delimiter characters (e.g., the forward slash '/') that corresponds to the name of a virtual directory.
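If it is unclear which blob actually violates these rules, one option is to scan the container and flag suspicious names. A minimal sketch (the container name is a placeholder):

    using System;
    using Azure.Storage.Blobs;
    using Azure.Storage.Blobs.Models;

    var container = new BlobContainerClient(connectionString, "documents");

    await foreach (BlobItem item in container.GetBlobsAsync())
    {
        int length = item.Name.Length;              // must be between 1 and 1024
        int segments = item.Name.Split('/').Length; // must not exceed 254
        if (length < 1 || length > 1024 || segments > 254)
            Console.WriteLine($"Suspect blob: '{item.Name}' (length {length}, segments {segments})");
    }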

Azure Data Factory - Recording file name when reading all files in folder from Azure Blob Storage

I have a set of CSV files stored in Azure Blob Storage. I am reading the files into a database table using the Copy Data activity. The Source is set to the folder where the files reside, so it grabs each file and loads it into the database. The issue is that I can't seem to map the file name in order to read it into a column. I'm sure there are more complicated ways to do it, for instance first reading the metadata and then reading the files in a loop, but surely the file metadata should be available to use while traversing through the files?
Thanks
This is not possible in a regular Copy activity. Mapping Data Flows has this possibility; it's still in preview, but maybe it can help you out. If you check the documentation, you will find a source option to specify a column to store the file name.
(Screenshot in the original answer: the data flow source options include a "Column to store file name" setting.)

Search keywords in PDF blob - Azure Search

I am trying to search for keywords contained in the metadata of a PDF document. I am unsure if this is possible. Any guidance would be much appreciated!
(Screenshot in the original question: the keywords/tags shown in the PDF's document properties.)
I know it is possible to add fields to the search index, but I am unsure how to map them. What I have tried so far did not work.
Here is how the keywords metadata would work:
Adding keywords metadata inside the PDF file itself will not work, as only selected custom metadata tags are supported for PDFs.
Refer to this document: https://learn.microsoft.com/en-us/azure/search/search-howto-indexing-azure-blob-storage
A workaround to this problem is to add the metadata tag to the PDF blob itself (as blob metadata in storage).
After we create an index in Azure Search over "All Metadata"/storage metadata, this key starts appearing in the list of field names to select (searchable/retrievable/filterable, etc.).
And finally we can now search on the custom keywords.
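A minimal sketch of the workaround's first step: setting metadata on the blob in storage rather than inside the PDF (the "Keywords" key, the example value and the blob/container names are just placeholders):

    using System.Collections.Generic;
    using Azure.Storage.Blobs;

    var blob = new BlobClient(connectionString, "documents", "NAME SURNAME 1.pdf");

    // Blob metadata lives on the storage blob, separate from the PDF's own properties,
    // so the Azure Search blob indexer can surface it once storage metadata is indexed.
    // Note: SetMetadataAsync replaces the blob's existing metadata with this dictionary.
    var metadata = new Dictionary<string, string> { ["Keywords"] = "invoice;2023;finance" };
    await blob.SetMetadataAsync(metadata);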
The Keywords tag is not one of the ones we support through the metadata_ format (the ones that are supported are listed here). If you add a field to the index called "Keywords", does it extract it? Also, if you look at the properties of the PDF in something like Azure Storage Explorer, I assume this keyword metadata is still there and is called "Keywords". If not, this might give some additional insight.
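If you want to try the "add a field called Keywords" suggestion programmatically, a rough sketch with the Azure.Search.Documents SDK (endpoint, key and index name are placeholders; whether the indexer actually populates the field depends on the metadata being extractable from the blob):

    using System;
    using Azure;
    using Azure.Search.Documents.Indexes;
    using Azure.Search.Documents.Indexes.Models;

    var indexClient = new SearchIndexClient(new Uri(searchEndpoint), new AzureKeyCredential(adminKey));

    // Fetch the existing index definition, add a "Keywords" field, and push it back.
    SearchIndex index = (await indexClient.GetIndexAsync("pdf-index")).Value;
    index.Fields.Add(new SearchableField("Keywords") { IsFilterable = true });
    await indexClient.CreateOrUpdateIndexAsync(index);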

Want to set up settings data in Windows Azure Stream Analytics

I need help setting up reference data in Stream Analytics. I want to add my application's settings (default) data to Stream Analytics. I can add the reference data, and via "Upload sample file" I can upload a JSON or CSV file. However, when firing a join query it returns 0 rows, as if the reference data hasn't been stored (so nulls with a left outer join).
I investigated the issue and I think it is due to the Path Pattern, but I do not know much about it.
Based on your description, you seem sure that the issue is caused by the Path Pattern/Path Prefix Pattern, but I cannot give a helpful suggestion without more details, such as a screenshot of your Path Pattern setting.
So I will just list some resources as references for you; hope these help resolve your issue.
Two screenshots of the Path Prefix Pattern/Path Pattern setting, introduced in Link 1 & 2.
The sample "Use Stream Analytics to process exported data from Application Insights" shows how to read stream data from Blob Storage in its section "Create an Azure Stream Analytics instance"; the steps are similar for reference data.
Hope it helps.
The issue was due to an improperly formatted JSON file.