Azure Cognitive Search - Index and Deletes

I set up a demo instance of Azure Search with the web front-end app.
One thing I have noticed is that even after I remove a document from Blob storage and the indexer runs again, the deleted document and its contents are still stored in the index. How can I remove the document’s contents from the index without deleting and recreating the index?
Here is the link to my GitHub repository for the template for this environment… https://github.com/jcbendernh/Azure-Search-Ignite-2018-Demo
Any insight that you can provide is extremely appreciated.

In order for the indexer to remove a document from your index once it is no longer in the data source, you need to define a data deletion detection policy on the indexer's data source.
There are two different approaches:
1. Designate a soft-delete column (for blobs, a metadata property) whose value marks which documents should be removed from the index (SoftDeleteColumnDeletionDetectionPolicy)
2. Or use the native soft delete support in Blob storage (NativeBlobSoftDeleteDeletionDetectionPolicy)
Both of these approaches are documented at https://learn.microsoft.com/en-us/azure/search/search-howto-indexing-azure-blob-storage#incremental-indexing-and-deletion-detection
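For reference, here is a minimal sketch of the first approach using the Azure.Search.Documents .NET SDK; the service endpoint, keys, container name, and the IsDeleted metadata property are placeholders, not values from the demo repository:

using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

var indexerClient = new SearchIndexerClient(
    new Uri("https://<your-service>.search.windows.net"),
    new AzureKeyCredential("<admin-key>"));

// The deletion detection policy lives on the data source definition the indexer reads from.
var dataSource = new SearchIndexerDataSourceConnection(
    name: "blob-datasource",
    type: SearchIndexerDataSourceType.AzureBlob,
    connectionString: "<storage-connection-string>",
    container: new SearchIndexerDataContainer("<container-name>"))
{
    // Option 1: a blob metadata property (here called IsDeleted) marks documents to drop.
    DataDeletionDetectionPolicy = new SoftDeleteColumnDeletionDetectionPolicy
    {
        SoftDeleteColumnName = "IsDeleted",
        SoftDeleteMarkerValue = "true"
    }
    // Option 2: rely on Blob storage's native soft delete instead:
    // DataDeletionDetectionPolicy = new NativeBlobSoftDeleteDeletionDetectionPolicy()
};

indexerClient.CreateOrUpdateDataSourceConnection(dataSource);

With the first option you would set the IsDeleted metadata on the blob (instead of deleting it immediately) so the indexer can see the marker on its next run before you remove the blob for good.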
Thanks,
-Luis Cabrera (Azure Search PM)

Related

How to retrieve latest file from SharePoint to blob using logic app?

I get one new file in my SharePoint site every day. The file is stored in "Shared Document/Data".
Below is my Logic App flow. I used Get Files, selected the document library, and included nested items, since the new data arrives under the "data" folder inside the shared documents.
I used Filter Array with a "last modified within the past 5 minutes" condition so I can pick up the latest file.
I am facing two issues:
1. Get Files returns all the files from SharePoint, and the Filter Array is not working.
2. I have used Create Blob incorrectly.
Can anyone advise me on how to do this?
Follow the workaround below.
You can use the following trigger and connector:
SharePoint (trigger) - When a file is created or modified in a folder
Use this trigger to select the exact directory so only the recently created or modified file is fetched.
Azure Blob Storage (connector) - Create blob (V2)
Use this connector to create the blob.
Result
The modified file is fetched and added to a blob.
Refer here for more information
Updated Answer
The SharePoint trigger shows the list of available directories; select the one that matches your requirement.

Storing Documents on Azure with custom metadata

I am trying to find the best way to implement a small site that lets the user upload a file and then search it.
I used Azure Search with Blob storage.
The file is stored in Blob storage and then gets indexed by the Azure Search indexer - so far so good.
The problem is that I would like to attach some custom data to each document, like a file ID and other business data; this data is not part of the document itself. Is there a way to achieve this?
Someone suggested I use Cosmos DB, though I am not sure it is the best fit when it comes to documents.
Thanks
If you would like to keep using blob storage, you can store metadata with the blobs - just add custom metadata to your blobs, add corresponding fields to the search index, and the blob indexer will pick up the metadata.
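A minimal sketch of that approach with the Azure.Storage.Blobs .NET SDK; the metadata keys (fileId, department), file name, and connection details below are illustrative assumptions, and the index needs fields with the same names for the indexer to populate:

using System.Collections.Generic;
using System.IO;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

var blobClient = new BlobClient(
    "<storage-connection-string>", "<container-name>", "contract-123.pdf");

using var fileStream = File.OpenRead("contract-123.pdf");
blobClient.Upload(fileStream, new BlobUploadOptions
{
    // Custom metadata travels with the blob; the blob indexer maps each key to an
    // index field of the same name, so add fileId/department fields to the index.
    Metadata = new Dictionary<string, string>
    {
        ["fileId"] = "123",
        ["department"] = "legal"
    }
});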

Is it possible to Push (using API) and Pull (using indexer) data into the same Index with Azure Search?

I have an index in Azure Search, let's say called Hotels.
I have a hotels table in Azure SQL whose schema mirrors the hotels index in Azure Search.
My back end pushes to both the Azure SQL table and Azure Search on create/update/delete.
In a scenario where the data was pushed to Azure SQL but failed to be pushed to Azure Search, is it possible to set up an indexer over my Azure SQL Hotels table, such that the indexer could sync to my Azure Search index (hotels) the data that failed to be pushed from my back end?
Yes, you can both mix push and pull as well as have multiple pull indexers targeting the same index. We see this done often when part of the data is in one data source and part in another, where the index is the point where they converge, coordinated by their key.
The pattern you're describing is not as common, but generally speaking it should work. You'd have to account for cases where your direct writes conflict with an indexer write, and make sure the writes you make as changes happen ultimately win. Also, if you go down this path, make sure to configure a change detection policy (and a deletion detection policy if you delete rows) so indexing from SQL is incremental and doesn't read everything on every run.
An alternative approach if you're worried about missing writes is to push all your writes into a queue, and then pull from the queue and into Azure Search. That way you have a single stream of writes instead of two.
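For the push half of that setup, here is a minimal sketch using the Azure.Search.Documents .NET SDK; the Hotel class, field names, service endpoint, and keys are assumptions rather than your actual schema. MergeOrUploadDocuments acts as an upsert on the key field, so replaying the same message from a queue (or retrying a failed push) is safe:

using System;
using Azure;
using Azure.Search.Documents;

var searchClient = new SearchClient(
    new Uri("https://<your-service>.search.windows.net"),
    "hotels",
    new AzureKeyCredential("<admin-key>"));

// Merge-or-upload is idempotent per key, so the same document can be pushed again
// after a failure (or replayed from a queue) without creating duplicates.
searchClient.MergeOrUploadDocuments(new[]
{
    new Hotel { HotelId = "42", Name = "Fancy Stay" }
});

// Assumed shape of the index; only illustrative.
public class Hotel
{
    public string HotelId { get; set; }   // key field in the index
    public string Name { get; set; }
}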

Fast mechanism for querying Azure blob names

I'm trying to get a list of blob names in Azure and I'm looking for ways to make this operation significantly faster. Within a given sub-folder, the number of blobs can exceed 150,000 elements. The filenames of the blobs are an encoded ID which is what I really need to get at, but I could store that as some sort of metadata if there was a way to query just the metadata or a single field of the metadata.
I'm finding that something as simple as the following:
var blobList = container.ListBlobs(null, false);
can take upwards of 60 seconds to run from my desktop and typically around 15 seconds when running on a VM hosted in Azure. These times are based on a test of 125k blobs in an otherwise empty container and were several hours after they were uploaded, so they've definitely had time to "settle", so to speak.
I've attempted multiple variations and tried using ListBlobsSegmented but it doesn't really help because the function is returning a lot of extra information that I simply don't need. I just need the blob names so I can get at the encoded ID to see what's currently stored and what isn't.
The query for the blob names and extracting the encoded ID is somewhat time sensitive, so if I could get it under 1 second, I'd be happy with it. If I stored the files locally, I could get the entire list of files in a few milliseconds, but I have to use Azure storage for this, so that's not an option.
The only thing I can think of to reduce the time it takes to identify the available blobs is to track the names of the blobs being added to or removed from a given folder and store that list in a separate blob. Then, when I need to know the blob names in that folder, I would read that blob rather than calling ListBlobs. I suppose another option would be to use Azure Table storage in a similar way, but either way it seems like I'm being forced into caching information about a given folder in the container.
Is there a better way of doing this or is this generally what people end up doing when you have hundreds of thousands of blobs in a single folder?
As mentioned, Azure Blob storage is a storage system and doesn't index the content for you. There is now an Azure Search indexer that indexes content uploaded to Azure Blob storage (see https://azure.microsoft.com/en-us/documentation/articles/search-howto-indexing-azure-blob-storage/); with it you get all the features supported by Azure Search, e.g. listing, searching, paging, and sorting. Hope this helps.
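If you do put the container behind an index, pulling just the names back out is a small query. Here is a sketch with the Azure.Search.Documents .NET SDK, assuming an index (called blobs-index here) that keeps the blob indexer's metadata_storage_name field retrievable; names and keys are placeholders:

using System;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

var searchClient = new SearchClient(
    new Uri("https://<your-service>.search.windows.net"),
    "blobs-index",
    new AzureKeyCredential("<query-key>"));

// Ask only for the blob name field; Size caps this request at 1,000 results,
// so a larger folder would need paging via Skip or a narrowing filter.
var options = new SearchOptions { Size = 1000 };
options.Select.Add("metadata_storage_name");

foreach (var result in searchClient.Search<SearchDocument>("*", options).Value.GetResults())
{
    Console.WriteLine(result.Document["metadata_storage_name"]);
}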

How do I backup my azure search index?

I am using Azure Search and would like to make sure I can recover from a self-inflicted disaster before I push more docs in there. How do I back up my index?
Is creating Azure Search replicas equivalent to making a backup?
How would one restore that?
Thanks
Microsoft has released a console app on GitHub that can be used to back up and restore Azure Search indexes - it's not perfect, but I use it almost daily for backups and restores from prod to CI/QC/Dev instances:
https://learn.microsoft.com/en-us/samples/azure-samples/azure-search-dotnet-samples/azure-search-backup-restore-index/
Right now you can't do that from the API or the portal, just save a copy of the JSON schema to a .js file, for example. See the Get Index API.
Normally you don't need to touch the index very often, only add, update or remove documents.
You would need to use an indexer from an external source to push the data into Search and be able to create backups at the same time.
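For the schema part, a minimal sketch of saving the Get Index response to a file; the service name, admin key, index name, and api-version below are placeholders to substitute with your own:

using System;
using System.IO;
using System.Net.Http;

using var http = new HttpClient();
http.DefaultRequestHeaders.Add("api-key", "<admin-key>");

// Get Index returns the full index definition as JSON, which can later be used
// to recreate the index on the same or another service.
string json = await http.GetStringAsync(
    "https://<your-service>.search.windows.net/indexes/hotels?api-version=2020-06-30");

await File.WriteAllTextAsync("hotels-index-schema.json", json);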
If it's an Azure SQL database, this may do it for you automatically, depending on your subscription.
Create a table with the same fields as the Azure Search index, add a deleted flag and a last-updated date, then import all of your data into the database. Set the date column to the time you imported the data.
In the Azure Search blade there is an 'Import Data' option. This allows you to connect the data source; when you create the connection you can set up the index to use the last-modified date and the deleted flag.
The wizard will take you through all of the options.
From there, just update the SQL table with your changes and the indexer will automatically push them to Azure Search.
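The same data source the wizard builds can also be defined in code. Here is a sketch with the Azure.Search.Documents .NET SDK; the table and column names (Hotels, LastUpdated, IsDeleted), index name, and schedule are assumptions:

using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

var indexerClient = new SearchIndexerClient(
    new Uri("https://<your-service>.search.windows.net"),
    new AzureKeyCredential("<admin-key>"));

var dataSource = new SearchIndexerDataSourceConnection(
    name: "hotels-sql",
    type: SearchIndexerDataSourceType.AzureSql,
    connectionString: "<azure-sql-connection-string>",
    container: new SearchIndexerDataContainer("Hotels"))
{
    // Only rows whose LastUpdated is newer than the previous run are re-indexed.
    DataChangeDetectionPolicy = new HighWaterMarkChangeDetectionPolicy("LastUpdated"),
    // Rows flagged IsDeleted = 1 are removed from the index on the next run.
    DataDeletionDetectionPolicy = new SoftDeleteColumnDeletionDetectionPolicy
    {
        SoftDeleteColumnName = "IsDeleted",
        SoftDeleteMarkerValue = "1"
    }
};
indexerClient.CreateOrUpdateDataSourceConnection(dataSource);

// The indexer then picks up incremental changes on a schedule.
indexerClient.CreateOrUpdateIndexer(new SearchIndexer(
    name: "hotels-indexer",
    dataSourceName: "hotels-sql",
    targetIndexName: "hotels")
{
    Schedule = new IndexingSchedule(TimeSpan.FromHours(1))
});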
Thank you for the answer about the Get Index API (https://learn.microsoft.com/en-us/rest/api/searchservice/Get-Index).
Sometimes the Azure Search index is the only source from which to restore data.
For example, with Microsoft QnA Maker, if you delete the Azure web app or Azure App Service, you can no longer even export the knowledge base from QnA Maker.
To restore the data from QnA Maker, I used the Azure Search index.
