Invalid blob name length. The blob name must be between 1 and 1024 characters long - azure

I have been working with Azure Blob Storage, in which I have tons of documents (all in .pdf format). When running Cognitive Services (index, indexer and skillsets) against this storage, it works fine at first, but then the error in the question title appears. Yet when I check, the blob name length is less than 1024 characters (as far as I can tell).
I would like to know if there's a way to find out how Cognitive Services is acquiring those blobs, so I can see what is happening and try to change it.
Example: tons of PDFs with ordinary names (normal characters (a-z), numbers and "-"), but after around 600 docs the error described above appears.
Curiously, as I was checking which document was related to the error, the same name kept appearing (i.e. NAME SURNAME 1.pdf and so on, for about 9 documents). Once those were deleted, the error appeared again with a completely different document.
Sorry for the long description.

As far as valid blob names go, see Naming and Referencing Containers, Blobs, and Metadata - Blob names (a small validation sketch follows the quoted rules):
A blob name must conform to the following naming rules:
A blob name can contain any combination of characters.
A blob name must be at least one character long and cannot be more than 1,024 characters long, for blobs in Azure Storage.
The Azure Storage emulator supports blob names up to 256 characters long. For more information, see Use the Azure storage emulator for development and testing.
Blob names are case-sensitive.
Reserved URL characters must be properly escaped.
The number of path segments comprising the blob name cannot exceed 254. A path segment is the string between consecutive delimiter characters (e.g., the forward slash '/') that corresponds to the name of a virtual directory.
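To make the two quantitative rules concrete, here is a minimal Python sketch (my own illustration, not part of the linked docs) that checks a candidate name against the length and path-segment limits quoted above:

    def validate_blob_name(name: str, max_length: int = 1024) -> list:
        """Return a list of violations of the quoted naming rules."""
        problems = []
        if not 1 <= len(name) <= max_length:
            problems.append("length %d is outside 1..%d" % (len(name), max_length))
        # A path segment is the text between consecutive '/' delimiters.
        if len(name.split("/")) > 254:
            problems.append("more than 254 path segments")
        return problems

    print(validate_blob_name("A" * 2000))         # flags the length violation
    print(validate_blob_name("docs/report.pdf"))  # [] -> valid

Pass max_length=256 if you are targeting the storage emulator mentioned above.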

Related

How do I set a logic app in Azure to transfer files from blob storage to sharepoint based on the name?

Good morning all,
I'm trying to build a logic app that will upload files from Azure Blob Storage to SharePoint. It was quite easy to do when all the files were supposed to be uploaded to one folder. I was then asked to separate them by name: if a file contains 'dog' in the name, it should go to folder 1, but if a file contains 'cat' it should go to a different folder on SharePoint, from the same blob storage.
I've tried to add a condition to the logic app: if it's 'true' that 'name' contains 'dog', upload it to folder 1; if false, upload it to folder 2 (there is always a file containing either 'dog' or 'cat'). But it still uploaded all of them to the folder for the 'false' result. Basically, when I ran the logic app, all the results were false, so the problem is with the condition itself; but as I'm new to this, I wasn't able to figure out what exactly is failing. Below is the screenshot of the logic app that uploads all the files to one folder. I'm not quite sure where to put the condition (I've tried to place it everywhere, with the same result) or how to configure it properly.
Working solution to upload everything to one folder
If the left-hand side of the condition is the name of the blob, then based on what you've said you want, the right-hand side should literally be the word Library ... no expressions or anything else.
Your condition, in plain English, says ...
If the name of the blob contains the word "Library", do the true side, else, do the false side.
If you want to check for the word Library ignoring case, wrap the blob name in a toLower() expression and set the right-hand side to be all lower case, like this ... library
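For example, assuming the blob name comes through as a Name token inside a For each loop over the List blobs output (the exact token depends on your trigger and the loop's name, so treat these as placeholders), the left-hand side could be the expression:

    toLower(items('For_each')?['Name'])

with contains as the operator and the plain lowercase string library on the right-hand side.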

Azure Blob Storage mass delete items with specific names

Let's say that I have a blob container that has files with the following names:
2cfe4d2c-2703-4b0f-bed0-6b239f238206
1a18c31f-bf28-4f64-a796-79237fabc66a
20a300dd-c032-405f-b67d-9c077623c26c
6f4041dd-92da-484a-966d-d5a168a9a2ec
(Let's say there are around 15,000 files.)
I want to delete around 1200 of them. I have a file with all the names I want to delete. In my case it's in JSON, but it does not really matter what format it is in; I know the files I want to delete.
What is the most efficient/safe way to delete these items?
I can think of a few ways, for example using az storage blob delete-batch or az storage blob delete. I am sure the former is more efficient, but I would not know how to do it because there is not really a pattern, just a big list of GUIDs (names) that I want to delete.
I guess I would have to write some code to iterate over my list of files to delete and then use the CLI or some Azure storage library to delete them.
I'd prefer to just use built-in tooling, but I can't find a way to do this without writing code to talk to the API.
What would be the best way to do this?
The tool azcopy would be perfect for that. Using the command azcopy remove, you can specify a path to a text file with the parameter --list-of-files={path} to filter on specific files. The file should be plain text and line-delimited.
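For example (the account, container and SAS token are placeholders), assuming blobs-to-delete.txt holds one blob name per line, such as the GUIDs from the question:

    azcopy remove "https://<account>.blob.core.windows.net/<container>?<SAS>" --list-of-files="blobs-to-delete.txt"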

Copying file from SFTP to Azure Data Lake Gen2

So my problem is quite stupid, but I cannot find a way to resolve it. I have one 15 GB file on an external SFTP server that I need to copy to my data lake. The thing is that the column delimiter is a comma, and I have some nested lists as well. So when I try to use an ADF copy activity, the result looks like this:
And most of my data is gone (nested structures get cut at the first occurrence of a comma). So maybe I could ignore the delimiter. I tried setting a pipe as the delimiter just to get the whole dataset as one column, but this doesn't work either.
PowerShell? I have tried different scripts that used to work with smaller files, and I get an error every time.
I have even tried to upload it manually via Azure Storage Explorer, but that fails as well after some time. I am not really sure how to make it work at this point.
Thank you for any advice!

Find a Blob in Azure Container

I have thousands & thousands of Blobs in a container, something like
A/Temp/A001-1.log
A/Temp/A001-2.log
A/Temp/A001-3.log
B/Tmp/B001-1.log
B/Tmp/B001-2.log
B/Tmp/B002-1.log
Now my problem is that I want to find blobs having A001 in their name. I understand that ListBlobsWithPrefix looks for blobs starting with some text, which is not the case for me. ListBlobs would bring all the blobs to my code, and then I would have to search for the one. Is there any way I can get just the blobs I am looking for?
There's really no easy way to search a container for a specific blob (or set of blobs with a name pattern) aside from brute-force. And name prefixes, as you've guessed, won't help you either in this case.
What I typically advise folks to do is keep their searchable metadata somewhere else (maybe SQL DB, maybe MongoDB, doesn't really matter as long as it provides the search capability they need), with that data store containing a reference link to the exact blob. The blob name itself can also be stored in the metadata as one of the searchable properties.
Also: Once you get into the "thousands & thousands of blobs in a container," you'll find that pulling the blob names is going to take a while (which, again, I think you're seeing). Containers can certainly hold as many blobs as you want, but in that case, you really want to be accessing them directly, based on some other metadata, and not enumerating through the name list.
Instead of searching, construct the blob name if its prefix is known and then try downloading the blob. If the blob is not found, you will get a 404 Not Found exception.
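A minimal sketch of that approach with the azure-storage-blob Python SDK (the connection string, container and blob name are placeholders):

    from azure.core.exceptions import ResourceNotFoundError
    from azure.storage.blob import BlobClient

    blob = BlobClient.from_connection_string(
        "<connection-string>", container_name="logs", blob_name="A/Temp/A001-1.log")
    try:
        data = blob.download_blob().readall()
    except ResourceNotFoundError:
        # The 404 case described above: the constructed name does not exist.
        print("blob not found")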
As of today there's a preview feature, available in a few regions, that allows for Blob Storage indexing:
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-manage-find-blobs?tabs=azure-portal
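Assuming blob index tags from that preview and the azure-storage-blob Python SDK, a tag-based search could look roughly like this (the tag name and value are made-up examples; you would have to tag the blobs first):

    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection-string>")
    # Find blobs account-wide whose index tag 'batch' equals 'A001'.
    for blob in service.find_blobs_by_tags("\"batch\" = 'A001'"):
        print(blob.container_name, blob.name)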
Hope they make it available soon.
Regards

Issue while creating subfolder in Azure cloud

I am using the Azure API provided by Microsoft for cloud storage... I am facing an unusual bug while creating subfolders.
That is, when I create a subfolder within any container, it is created easily within seconds. But when I try to create another subfolder with a different name, it takes more time compared to the previous one.
When I try again, it is created easily. That means subfolders no. 1, 3, 5, 7 and so on are created easily, while the even-numbered ones, such as 2, 4, etc., are created with a delay.
i.e. "Alternate subfolder creation is taking too much time"
Please let me know if there is any solution for this bug...
Are you creating these in blob storage? Folders, strictly speaking, don't exist. They are simply a '/' character (or whatever delimiter you specify) embedded in the blob names. So you save a file using a path without explicitly creating the path.
In the documentation, they even refer to these as "virtual folders" because they don't actually exist.
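A quick way to see this is with the azure-storage-blob Python SDK (connection string and names are placeholders): there is no mkdir step, the "folder" simply appears because a blob name contains the delimiter:

    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(
        "<connection-string>", "my-container")
    # No folder-creation call anywhere: uploading this blob makes the
    # virtual folder "reports/2024" show up in listings.
    container.upload_blob(name="reports/2024/summary.pdf", data=b"example bytes")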
