Best way to index data in Azure Blob Storage? - azure

I plan on using Azure Blob storage to store images. I will have around 5000 categories for images that I plan on using folders to keep separated. For each of the image files, the file names won't differ a lot across the board and there is the potential to need to change metadata frequently.
My original plan was to use a SQL database to index all of these files and store my metadata there, but I'm second guessing that plan.
Is it feasible to index files in Azure Blob storage using a database, or should I just stick with using blob metadata?
Edit: I guess this question should really be "are there any downsides to indexing Azure Blob storage using a relational database?". I'm much more comfortable working with a DB than I am Azure storage, so my preference is to use a DB.
I'm second guessing whether or not to use a DB after looking at Azure storage more and discovering meta-tags and indexing. Hope this helps.

You can use Azure Search for this task as well, store images in Azure Storage (BLOB) and use Azure Search for crawling. indexing and searching. Using metadata you can enhance your search as well. This way you might not even need to use Folders to separate different categories.

Blob Index is a very feasible option and it can save the in the pricing, time, and overhead in terms of not using SQL.
https://azure.microsoft.com/en-gb/blog/manage-and-find-data-with-blob-index-for-azure-storage-now-in-preview/
If you are looking for more information on this preview feature, I would love hear more and work closer on this issue. Could you please reach me on BlobIndexPreview#microsoft.com.

Related

moving locally stored documented documents to azure

I want to spike whether azure and the cloud is a good fit for us.
We have a website where users upload documents to our currently hosted website.
Every document has an equivalent record in a database.
I am using terraform to create the azure infrastructure.
What is my best way of migrating the documents from the local file path on the server to azure?
Should I be using file storage or blob storage. I am confused about the difference.
Is there anything in terraform that can help with this?
Based on your comments, I would recommend storing them in Blob Storage. This service is suited for storing and serving unstructured data like files and images. There are many other features like redundancy, archiving etc. that you may find useful in your scenario.
File Storage is more suitable in Lift-and-Shift kind of scenarios where you're moving an on-prem application to the cloud and the application writes data to either local or network attached disk.
You may also find this article useful: https://learn.microsoft.com/en-us/azure/storage/common/storage-decide-blobs-files-disks
UPDATE
Regarding uploading files from local computer to Azure Storage, there are actually many options available:
Use a Storage Explorer like Microsoft's Storage Explorer.
Use AzCopy command-line tool.
Use Azure PowerShell Cmdlets.
Use Azure CLI.
Write your own code using any available Storage Client libraries or directly consuming REST API.

Can I store txt files in Azure SQL Database?

Hi is it possible to store .txt files inside azure sql database? if not then which service provides this?
You can write them into an nvarchar(max) column if you'd like. If they are CSV files, you may want to shred them into columns in a table. However, if you have a lot of text file data, you may find it cheaper to use Azure Storage instead. It is generally better for colder data where you don't need to process it if your text data is all want to store or if it is a majority of what you are trying to store.
hope that helps
You can have them first uploaded/stored on an Azure Storage Account and have them automatically processed and uploaded to a table on an Azure SQL Database using Azure Functions as explained here. Azure Functions have triggers to respond to an event like a new file has been uploaded.

Storing Documents on Azure with custom metadata

i am trying to find the best way to implement a small site allowing the user to upload a file and then search on it.
i used azure search with blob storage.
the file is stored on the blob storage and is then gets indexed by azure search indexer - so far so good.
the problem is that i would like to add to each document some custom data like file id and other business data, this data is not part of the document. is there a way to achieve this?
some one, suggested i use cosmos db, though i am not sure its the best way to go when it comes to documents.
Thanks
If you would like to keep using blob storage, you can store metadata with the blobs - just add custom metadata to your blobs, add corresponding fields to the search index, and the blob indexer will pick up the metadata.

Search blob storage data

I have a large amount of diagnostics data stored in an Azure Blob Storage. Is there any way I can get that data searchable from my Azure SQL database? I would like to join on some custom data fields in my blob stored data.
Blob storage doesn't have searchable metadata, per se: You may certainly search containers for given blob names, and you may even enumerate blobs to look at their metadata. But aside from the container/blob URI, there's no built-in search mechanisms.
If you want to search for metadata, you'll need to build your own data store (e.g.e in a searchable database such as SQL Database, the one you mentioned). This would be completely up to your app to do (you'd need to extract specific data you want to search, and store it in your database engine of choice). You'd then need to link your database engine's contents back to blob storage (e.g. store a blob's url alongside its metadata).
If you're talking about full-text search, you'd need to employ an appropriate fts tool. Azure provides Azure Search as a 1st-party full-text-search service, or you may certainly use a 3rd-party tool or service. What you choose is completely up to you.

Fast mechanism for querying Azure blob names

I'm trying to get a list of blob names in Azure and I'm looking for ways to make this operation significantly faster. Within a given sub-folder, the number of blobs can exceed 150,000 elements. The filenames of the blobs are an encoded ID which is what I really need to get at, but I could store that as some sort of metadata if there was a way to query just the metadata or a single field of the metadata.
I'm finding that something as simple as the following:
var blobList = container.ListBlobs(null, false);
can take upwards of 60 seconds to run from my desktop and typically around 15 seconds when running on a VM hosted in Azure. These times are based on a test of 125k blobs in an otherwise empty container and were several hours after they were uploaded, so they've definitely had time to "settle", so to speak.
I've attempted multiple variations and tried using ListBlobsSegmented but it doesn't really help because the function is returning a lot of extra information that I simply don't need. I just need the blob names so I can get at the encoded ID to see what's currently stored and what isn't.
The query for the blob names and extracting the encoded Id is somewhat time sensitive so if I could get it to under 1 second, I'd be happy with it. If I stored the files locally, I can get the entire list of files in a few ms, but I have to use Azure storage for this so that's not an option.
The only thing I can think of to be able to reduce the time it takes to identify the available blobs is to track the names of the blobs being added or removed from a given folder and store it in a separate blob. Then when I need to know the blob names in that folder, I would read the blob with the metadata rather than using ListBlobs. I suppose another would be to use Azure Table storage in a similar way, but it seems like I'm being forced into caching information about a given folder in the container.
Is there a better way of doing this or is this generally what people end up doing when you have hundreds of thousands of blobs in a single folder?
As mentioned, Azure Blob storage is a storage system and doesn't help you in indexing the content. We now have Azure Search Indexer which indexes the content uploaded to Azure Blob storage, refer https://azure.microsoft.com/en-us/documentation/articles/search-howto-indexing-azure-blob-storage/ with this you can perform all the features supported by Azure Search e.g. listing, searching, paging, sorting etc.. Hope this helps.

Resources