I am using Azure to host my project and have chosen Blob storage for all my files (each is several megabytes and there are a great many of them). I need to search within all the files in Blob storage (a kind of full-text search). I tried integrating with Azure Search but had no luck, as the indexes can only be built on SQL. Is there a way to integrate full-text search with Blob storage?
If not, what would be an effective way of storing the documents in Azure while still making them searchable (full-text search), like what SharePoint provides?
I work on Azure Search. We just shipped preview support for indexing documents stored in Azure blob storage, with support for PDF, Office docs, HTML and a few other formats. Please see https://azure.microsoft.com/en-us/documentation/articles/search-howto-indexing-azure-blob-storage/ for more details.
Thanks,
Eugene
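To make the blob indexing setup a little more concrete, here is a minimal sketch of the two JSON definitions an indexer setup typically needs: a data source pointing at a blob container, and an indexer that connects it to an existing index. The service name, resource names, schedule, and the `API_VERSION` value are assumptions for illustration; check the linked article for the versions and options your service supports. The code only builds the payloads and URLs and does not call the service.

```python
# Assumption: placeholder api-version; use whichever version your service supports.
API_VERSION = "2020-06-30"


def blob_datasource(name, connection_string, container):
    """Build a data source definition that points at a blob container."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def blob_indexer(name, datasource_name, index_name, interval="PT2H"):
    """Build an indexer definition that feeds an existing index on a schedule."""
    return {
        "name": name,
        "dataSourceName": datasource_name,
        "targetIndexName": index_name,
        "schedule": {"interval": interval},  # ISO 8601 duration
    }


def endpoint(service, collection):
    """URL to POST a definition to; `collection` is 'datasources' or 'indexers'."""
    return "https://{}.search.windows.net/{}?api-version={}".format(
        service, collection, API_VERSION
    )


# Hypothetical names, for illustration only.
datasource = blob_datasource("docs-blob", "<storage connection string>", "documents")
indexer = blob_indexer("docs-indexer", datasource["name"], "docs-index")
```

Each definition would be sent as the JSON body of a POST to the matching `endpoint(...)` URL, with the service's admin key in the `api-key` header.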
You can try Azure Search, which now supports cognitive search [Preview], including image recognition using OCR. It does a great job with PDFs and many other document types.
It works well even with scanned documents.
Microsoft has an online Azure Search demo that shows this nicely: https://jfk-demo.azurewebsites.net/
Moving from AWS Glue to Azure Purview and I am confused about something
Is it possible to query the Azure Purview data catalog/assets in the same way we can query the AWS Glue data catalog using AWS Athena?
Unfortunately, you cannot query data from Azure Purview.
The Purview search experience is powered by a managed search index. After a data source is registered with Purview, its metadata is indexed by the search service to allow easy discovery. The index provides search relevance capabilities and completes search requests by querying millions of metadata assets. Search helps you to discover, understand, and use the data to get the most value out of it.
The search experience in Purview is a three stage process:
The search box shows the history containing recently used keywords and assets.
When you begin typing, the search suggests matching keywords and assets.
The search result page is shown with assets matching the keyword entered.
For more details, refer to Understand search features in Azure Purview.
I plan on using Azure Blob storage to store images. I will have around 5000 categories for images that I plan on using folders to keep separated. For each of the image files, the file names won't differ a lot across the board and there is the potential to need to change metadata frequently.
My original plan was to use a SQL database to index all of these files and store my metadata there, but I'm second guessing that plan.
Is it feasible to index files in Azure Blob storage using a database, or should I just stick with using blob metadata?
Edit: I guess this question should really be "are there any downsides to indexing Azure Blob storage using a relational database?". I'm much more comfortable working with a DB than I am Azure storage, so my preference is to use a DB.
I'm second guessing whether or not to use a DB after looking at Azure storage more and discovering meta-tags and indexing. Hope this helps.
You can use Azure Search for this task as well: store the images in Azure Blob storage and use Azure Search for crawling, indexing, and searching. You can also use metadata to enhance your search. This way you might not even need folders to separate the categories.
Blob Index is a very feasible option, and it can save cost, time, and overhead because you don't need SQL.
https://azure.microsoft.com/en-gb/blog/manage-and-find-data-with-blob-index-for-azure-storage-now-in-preview/
If you are looking for more information on this preview feature, I would love to hear more and work with you on this. Please reach me at BlobIndexPreview#microsoft.com.
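As a rough illustration of the Blob Index approach, the sketch below builds a tag filter expression of the kind used to find blobs by their index tags. The tag names and values (`category`, `status`) are hypothetical, and the allowed-character check reflects my understanding of the tag rules rather than an authoritative list. In the `azure-storage-blob` Python SDK, an expression like this is what `BlobServiceClient.find_blobs_by_tags` expects; the code here only builds the string.

```python
import re

# Assumption: characters permitted in tag keys/values and container names.
_TAG_TEXT = re.compile(r"^[A-Za-z0-9 +\-./:=_]*$")
_CONTAINER = re.compile(r"^[a-z0-9-]+$")


def tag_filter(tags, container=None):
    """Build a blob index tag filter such as: "category" = 'landscape'."""
    clauses = []
    if container is not None:
        if not _CONTAINER.match(container):
            raise ValueError("unexpected container name: %r" % container)
        clauses.append("@container = '{}'".format(container))
    for key, value in tags.items():
        if not key or not _TAG_TEXT.match(key) or not _TAG_TEXT.match(value):
            raise ValueError("unsupported characters in tag: %r" % key)
        clauses.append("\"{}\" = '{}'".format(key, value))
    if not clauses:
        raise ValueError("at least one condition is required")
    return " AND ".join(clauses)


# Hypothetical tags standing in for the image categories in the question.
expression = tag_filter({"category": "landscape", "status": "approved"}, container="images")
```

With tags like these, the 5000 categories become tag values rather than folders, and changing a category means updating a tag instead of moving the blob.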
Is there a way to use Azure Search against Azure File Shares? I only see Blob storage as an option. We have on-prem servers that sync files to Azure File Shares, and we would like to search inside those files from a web application.
At the moment there's no way to do this unless you manually read the files and push their content to your Azure Cognitive Search index. In the future you may be able to trigger an Azure Function using this type of binding, which would make your life easier. You can follow or vote for this feature at the following link:
https://github.com/Azure/azure-webjobs-sdk-extensions/issues/14
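The manual push route could look roughly like the sketch below. It assumes the file share is mounted locally, that only plain-text files are indexed (binary formats such as PDF would need text extraction first), and that the index has hypothetical fields `id`, `path`, and `content`. It builds the batch body for the documents index endpoint (`POST /indexes/<index>/docs/index`) without sending it.

```python
import base64
from pathlib import Path


def document_key(relative_path):
    """Encode a path as URL-safe base64 so it only uses key-safe characters."""
    return base64.urlsafe_b64encode(relative_path.encode("utf-8")).decode("ascii")


def build_batch(root, pattern="*.txt"):
    """Collect matching text files under `root` into one indexing batch."""
    root = Path(root)
    actions = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        actions.append({
            "@search.action": "mergeOrUpload",  # insert new, update existing
            "id": document_key(relative),       # hypothetical key field
            "path": relative,                   # hypothetical field
            "content": path.read_text(encoding="utf-8", errors="replace"),
        })
    return {"value": actions}
```

Running this on a schedule, or whenever the sync job finishes, and posting the result to the index would keep the search index roughly in step with the share. Large shares would need the batch split into smaller chunks.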
Per the UserVoice page for Azure Search, https://feedback.azure.com/forums/263029-azure-search/suggestions/14274261-indexer-for-azure-file-shares, the Azure File indexer is available in private preview (in fact it has been at this stage for almost 2 years now :)).
You may want to reach out to the Search team if you're interested.
I have the following use case for building a Data Lake (e.g. in Azure):
My organization deals with companies that go into bankruptcy. Once a company goes bankrupt, it needs to hand over all of their data to us, including structured data (e.g. CSVs) as well as semi-structured and unstructured data (e.g. PDFs, Word documents, images, JSON, .txt files etc.). Having a data lake would help here as the volumes of data can be large and unpredictable and Azure Data Lake seems like a relatively low-cost and scalable storage solution.
However, apart from storing all of that data we also need to give business users a tool that will enable them to search through all of that data. I can imagine two search types:
searching for specific files (using file names or part of file names as the search criteria)
searching through all text files (word documents, .txt and PDFs) and identifying those files that meet the search criteria (e.g. a specific phrase being searched for)
Are there any out of the box tools that can use Azure Data Lake as a data source that would enable users to perform such searches?
Unfortunately, there isn't currently a tool that can filter files directly in Data Lake.
Even Azure Storage Explorer only supports search by prefix.
Data Factory supports filtering files, but it is usually used for copying and transferring data. Reference: Data Factory supports wildcard file filters for Copy Activity
Update:
Azure Cognitive Search seems to be a good choice.
Cognitive Search supports importing data from Data Lake, and it provides filters to help narrow down the files being searched.
A filter provides criteria for selecting documents used in an Azure Cognitive Search query. Unfiltered search includes all documents in the index. A filter scopes a search query to a subset of documents.
For reference, see Filters in Azure Cognitive Search.
Hope this helps.
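To show how the two search types from the question might map onto Cognitive Search queries, here is a small sketch that builds request bodies for the search endpoint (`POST /indexes/<index>/docs/search`). The index field names `file_name`, `extension`, and `content` are hypothetical and depend on how the index is defined; the filter also assumes `extension` is marked filterable.

```python
def odata_quote(value):
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def filename_query(name_prefix, top=20):
    """Find files whose name contains a word starting with `name_prefix`."""
    return {
        "search": name_prefix + "*",   # prefix match in the simple query syntax
        "searchFields": "file_name",   # hypothetical field
        "select": "file_name",
        "top": top,
    }


def phrase_query(phrase, extensions=None, top=20):
    """Full-text phrase search, optionally scoped to some file extensions."""
    body = {
        "search": '"{}"'.format(phrase.replace('"', " ")),  # quoted = exact phrase
        "searchFields": "content",     # hypothetical field
        "select": "file_name",
        "top": top,
    }
    if extensions:
        # The filter narrows the candidate documents before the text search applies.
        body["filter"] = " or ".join(
            "extension eq {}".format(odata_quote(ext)) for ext in extensions
        )
    return body
```

For example, `phrase_query("notice of default", [".pdf", ".docx"])` searches only PDF and Word documents for that exact phrase.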
Cognitive Search with Azure Data Lake is definitely an option, and it is what Microsoft recommends. There are several factors to consider:
Price. https://azure.microsoft.com/en-us/pricing/details/search/. Not a cheap option.
The size of your source data and of the index you need.
Your familiarity with other open-source options. ELK is a popular open-source stack for full-text search.
Our data is in SQL Azure, and we have an existing Web API that exposes it through OData. The issue is that the client wants to make calls with substring filters on a few columns, which makes performance really slow. We are debating whether to use a full-text search index or the Azure Search service. Thoughts, please?
Some of the considerations and tradeoffs between hosting search in Azure Search vs. using SQL Server FTS are captured here.
As pointed out above, Azure Search can index in-database data - see Connecting Azure SQL Database to Azure Search using indexers.
You can point Azure Search at your Azure SQL database and it will index it without you having to write code, but Azure Search is a service you pay for on an hourly basis; you can learn more about it here.
Azure Search is recommended for performing searches across various sources and applications.
Azure Search can be used instead of Full-Text Search, but if you need to join search results with other tables, then Full-Text Search is recommended.
Hope this helps.
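For the Full-Text Search route, a minimal sketch of replacing a `LIKE '%term%'` substring filter with a `CONTAINS` predicate might look like this. The table and column names are hypothetical, a full-text index on those columns is assumed to exist already, and the `?` placeholder follows the qmark style used by drivers such as pyodbc. Note that full-text prefix terms match the beginnings of words, not arbitrary substrings, so this is not an exact drop-in for `LIKE '%term%'`.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def fts_prefix_query(table, columns, term):
    """Return (sql, parameter) for a word-prefix full-text search."""
    for identifier in [table, *columns]:
        # Identifiers cannot be bound as parameters, so validate them strictly.
        if not _IDENTIFIER.match(identifier):
            raise ValueError("unexpected identifier: %r" % identifier)
    words = re.findall(r"\w+", term)
    if not words:
        raise ValueError("search term has no words")
    # Each word becomes a quoted prefix term; all of them must match.
    condition = " AND ".join('"{}*"'.format(word) for word in words)
    column_list = ", ".join("[{}]".format(column) for column in columns)
    sql = "SELECT * FROM [{}] WHERE CONTAINS(({}), ?)".format(table, column_list)
    return sql, condition


# Hypothetical table and columns, for illustration only.
sql, parameter = fts_prefix_query("Customers", ["Name", "City"], "acme corp")
```

Because the result is an ordinary table query, it can still be joined with other tables, which is the case where Full-Text Search is recommended over Azure Search above.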