Azure Search ingestion events

I'd like to know if Azure Search offers any ability to trigger an Azure Function when a document gets indexed or inserted into Azure Search, or if there are any other events I can take advantage of.
I'd like to avoid a timed event which continuously scans Azure Search for new documents.

If you're using an indexer, you can add a skillset with a WebApiSkill to invoke your Azure Function for each inserted document. However, there are no transactional consistency guarantees: a document for which your function is invoked is not guaranteed to end up successfully inserted into the index.
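For reference, a custom WebApiSkill is just an HTTP endpoint that the indexer calls with batches of records and expects results back in the same shape. Below is a rough C# sketch of an HTTP-triggered function that could sit behind such a skill; the function name and the idea of using it purely as a notification hook are illustrative, not from the answer above.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json.Linq;

public static class OnDocumentIndexed
{
    // A WebApiSkill posts a body of the form { "values": [ { "recordId": "...", "data": { ... } } ] }
    // and expects the same shape back, one output record per input record.
    [FunctionName("OnDocumentIndexed")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
        ILogger log)
    {
        var body = JObject.Parse(await new StreamReader(req.Body).ReadToEndAsync());
        var results = new JArray();

        foreach (var record in (JArray)body["values"])
        {
            // React to the document here (e.g. send a notification, drop a message on a queue).
            log.LogInformation("Document seen by indexer: {recordId}", (string)record["recordId"]);

            results.Add(new JObject
            {
                ["recordId"] = record["recordId"],
                ["data"] = new JObject()   // no enrichment output needed for a pure notification skill
            });
        }

        return new OkObjectResult(new JObject { ["values"] = results });
    }
}
```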

Unfortunately, there isn't a great way to do this today. Eugene's suggestion will work, but it isn't very efficient, and it does indeed have the limitation that the document might not actually make it to the index if something else goes wrong later in the indexer. If you are interested in seeing a better-defined option for this scenario, please vote on the following UserVoice item, which relates to implementing triggered events for Azure Cognitive Search: https://feedback.azure.com/forums/263029-azure-search/suggestions/10095111-azure-search-alerts

Related

How to do a transaction with Azure.Data.Tables.TableClient?

I can use the TableClient SDK (for Azure Tables) to create, update, retrieve, delete, etc.
But I'm not sure how to do updates (multiple records) in a transaction.
I don't see any documentation for this anywhere (other than the mere mention of transactions as a possible design pattern when working with Azure Tables).
How to do this?
Reference to the document: https://www.nuget.org/packages/Azure.Data.Tables/
Found the answer on my own.
Use TableTransactionAction.
Then call tableClient.SubmitTransaction(actionsList).
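For anyone looking for the concrete calls, here is a minimal C# sketch. The table name, keys, and property are made up; note that all entities in a single transaction must share the same partition key.

```csharp
using System.Collections.Generic;
using Azure;
using Azure.Data.Tables;

var tableClient = new TableClient("<connection-string>", "MyTable");

// UpsertMerge inserts the entity if it does not exist, otherwise merges the properties.
var actions = new List<TableTransactionAction>
{
    new TableTransactionAction(TableTransactionActionType.UpsertMerge,
        new TableEntity("partition1", "row1") { ["Status"] = "Processed" }),
    new TableTransactionAction(TableTransactionActionType.UpsertMerge,
        new TableEntity("partition1", "row2") { ["Status"] = "Processed" })
};

// Submits the batch atomically; throws TableTransactionFailedException if any action fails.
Response<IReadOnlyList<Response>> response = tableClient.SubmitTransaction(actions);
```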

Bringing a MS Graph Search Custom Connector into working mode

Recently Microsoft published the Microsoft Search API (beta), which makes it possible to index external systems by creating an MS Graph search custom connector.
I created such a connector, and it has been successful so far. I also pushed a few items to the index, and in the MS admin center I created a result type and a vertical. Now I'm able to find the external items in the SharePoint Online modern search center, in a dedicated tab belonging to the search vertical created before. So far so good.
But now I wonder:
How can I make sure the external data is continuously pushed to the MS Search index? (How can this be implemented? Is there any tutorial or sample project? What is the underlying architecture?)
Is there a concept of Full / Incremental / Continuous Crawls for a Search Custom Connector at all? If so, how can I "hook" into a crawl in order to update changed data to the index?
Or do I have to implement it all on my own? And if so, what would be a suitable approach?
Thank you for trying out the connector APIs. I am glad to hear that you are able to get items into the index and see the results.
Regarding your questions, the logic for determining when to push items, and your crawl strategy is something that you need to implement on your own. There is no one best strategy per se, and it will depend on your data source and the type of access you have to that data. For example, do you get notifications every time the data changes? If not, how do you determine what data has changed? If none of that is possible, you might need to do a periodic full recrawl, but you will need to consider the size of your data set for ingestion.
We will look into ways to reduce the amount of code you have to write in the future, but right now, this is something you have to implement on your own.
-James
I recently implemented incremental crawling for Graph connectors using Azure Functions. I created a timer-triggered function that fetches the items updated in the data source since the time of the last function run and then updates the search index with those items.
I also wrote a blog post about this approach, using a SharePoint list as the data source. The entire source code can be found at https://github.com/aakashbhardwaj619/function-search-connector-crawler. Hope it is useful.
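For illustration, a rough C# sketch of such a timer-triggered crawl is below. The data-source helpers (GetChangedItemsAsync, GetGraphTokenAsync), the connection id, and the item shape are hypothetical placeholders, and the Graph call assumes the beta external items endpoint (PUT /external/connections/{id}/items/{itemId}).

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

public static class IncrementalCrawl
{
    private const string ConnectionId = "mycustomconnector";   // hypothetical connection id
    private static readonly HttpClient Http = new HttpClient();

    // Runs every 15 minutes and pushes items changed since the previous run.
    [FunctionName("IncrementalCrawl")]
    public static async Task Run([TimerTrigger("0 */15 * * * *")] TimerInfo timer, ILogger log)
    {
        DateTime since = timer.ScheduleStatus?.Last ?? DateTime.UtcNow.AddMinutes(-15);

        foreach (SourceItem item in await GetChangedItemsAsync(since))
        {
            // Shape of the beta externalItem resource: acl, properties, content.
            var externalItem = new
            {
                acl = new[] { new { type = "everyone", value = "everyone", accessType = "grant" } },
                properties = new { title = item.Title, url = item.Url },
                content = new { value = item.Body, type = "text" }
            };

            var request = new HttpRequestMessage(
                HttpMethod.Put,
                $"https://graph.microsoft.com/beta/external/connections/{ConnectionId}/items/{item.Id}")
            {
                Content = new StringContent(JsonConvert.SerializeObject(externalItem), Encoding.UTF8, "application/json")
            };
            request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", await GetGraphTokenAsync());

            HttpResponseMessage response = await Http.SendAsync(request);
            log.LogInformation("Pushed item {id}: {status}", item.Id, response.StatusCode);
        }
    }

    // Placeholder: query your data source (e.g. a SharePoint list) for items changed since 'since'.
    private static Task<IReadOnlyList<SourceItem>> GetChangedItemsAsync(DateTime since) =>
        Task.FromResult<IReadOnlyList<SourceItem>>(new List<SourceItem>());

    // Placeholder: acquire an app-only Graph token, e.g. with MSAL's ConfidentialClientApplication.
    private static Task<string> GetGraphTokenAsync() => Task.FromResult("<access-token>");

    public class SourceItem
    {
        public string Id { get; set; }
        public string Title { get; set; }
        public string Url { get; set; }
        public string Body { get; set; }
    }
}
```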

How to bulk delete (say, millions of) documents spread across millions of logical partitions in the Cosmos DB SQL API?

MS Azure documentation does not say anything about it. The official bulk executor documentation talks only about insert and update options, not delete. There is a suggested JavaScript server-side program to create a stored procedure, which sounds very good, but it requires us to input the partition key value. That won't make sense if our documents are spread across millions of logical partitions.
This is a very simple business need. While migrating a huge volume of data into a SQL API Cosmos collection, if we insert some wrong data, there seems to be no option to delete it other than restoring to a previous state. I have explored for a few hours now but couldn't find a solution. I even raised a case with MS support; they directed me to some .NET code, which does not look straightforward. What if someone doesn't know .NET?
Can't we easily bulk delete documents spread across several logical partitions in the Cosmos DB SQL API? It's frustrating.
I hope you can provide some accurate details, ideally with simple, straightforward sample code and steps. I hope MS and Cosmos DB experts will share their views as well.
"I even raised a case with MS support; they directed me to some .NET code, which does not look straightforward."
You have clearly already made some effort to find a solution, so here are the two officially supported options:
1. Bulk delete stored procedure: https://github.com/Azure/azure-cosmosdb-js-server/blob/master/samples/stored-procedures/bulkDelete.js
2. Bulk delete executor:
.NET: https://github.com/Azure/azure-cosmosdb-bulkexecutor-dotnet-getting-started/blob/master/BulkDeleteSample/BulkDeleteSample/Program.cs
Java: https://github.com/Azure/azure-cosmosdb-bulkexecutor-java-getting-started/blob/master/samples/bulkexecutor-sample/src/main/java/com/microsoft/azure/cosmosdb/bulkexecutor/bulkdelete/BulkDeleter.java
So far, only the official solutions above are supported. Another workaround is TTL (time to live) in Cosmos DB. I assume you have your own logic to judge which data is correct and which data is wrong and should be deleted. You could set a TTL on the bad documents so that they are removed automatically as soon as they expire.
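As a rough illustration of the query-then-delete idea behind the .NET samples, here is a sketch using the newer Azure Cosmos DB .NET SDK v3 with bulk execution enabled. The database, container, partition key property, and filter are placeholders; the query projects each document's id and partition key so the delete can target the right logical partition.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class BulkDelete
{
    public static async Task DeleteBadDocumentsAsync(string endpoint, string key)
    {
        // AllowBulkExecution batches the individual point operations behind the scenes.
        var client = new CosmosClient(endpoint, key, new CosmosClientOptions { AllowBulkExecution = true });
        Container container = client.GetContainer("mydb", "mycollection");

        // Select only id and partition key; the filter identifies the wrongly inserted documents.
        var query = new QueryDefinition("SELECT c.id, c.pk FROM c WHERE c.importBatch = 'bad-batch-01'");

        var deletes = new List<Task>();
        using FeedIterator<DocRef> iterator = container.GetItemQueryIterator<DocRef>(query);
        while (iterator.HasMoreResults)
        {
            foreach (DocRef doc in await iterator.ReadNextAsync())
            {
                deletes.Add(container.DeleteItemAsync<DocRef>(doc.id, new PartitionKey(doc.pk)));
            }
        }
        await Task.WhenAll(deletes);
    }

    // Minimal projection of the fields needed for the delete call.
    public class DocRef
    {
        public string id { get; set; }
        public string pk { get; set; }
    }
}
```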
Has anyone tried this? It looks like a good solution in Java:
https://github.com/Azure/azure-cosmosdb-bulkexecutor-java-getting-started#bulk-delete-api
You could achieve this by writing a batch job that deletes the documents overnight, driven by some date configuration. Here is an article published on how to do it:
https://medium.com/#vaibhav.medavarapu/bulk-delete-documents-from-azure-cosmos-db-using-asp-net-core-8bc95dd20411

Cosmos DB change feed, leases and Azure Functions

I've recently started working with Azure Cosmos DB and Functions. While reading the documentation https://learn.microsoft.com/pl-pl/azure/cosmos-db/change-feed-processor I found something that is quite hard for me to understand. Is it actually possible to share a change feed between many functions so that they are all triggered by one and the same DB operation? What is the lease collection and what problem does it solve? What is the purpose of a lease? I'd like a basic explanation of these terms. The link I provided says it is possible to share a lease between two functions, but it also says that a lease object has an owner property.
Yes, you can have multiple functions triggered by the same change. However, this requires you to have separate leases for them. They can live in the same lease collection, but they need different prefixes. There is a setting for that; in Azure Functions it's the leaseCollectionPrefix attribute property.
Leases are really just documents like any other in Cosmos DB. They are used to keep track of the consumers of the change feed processor and to save checkpoints, so the consumers know where to continue if your app restarts.
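To make the prefix idea concrete, here is a sketch of two C# functions listening to the same container through the same lease collection but with different prefixes. The attribute names follow the 3.x Cosmos DB Functions extension; the database, collection, and connection setting names are placeholders.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ChangeFeedFunctions
{
    [FunctionName("ProcessorA")]
    public static void ProcessorA(
        [CosmosDBTrigger(
            databaseName: "mydb",
            collectionName: "items",
            ConnectionStringSetting = "CosmosConnection",
            LeaseCollectionName = "leases",
            LeaseCollectionPrefix = "processorA",     // distinct prefix -> independent checkpoints
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> changes,
        ILogger log)
    {
        log.LogInformation("ProcessorA saw {count} changes", changes.Count);
    }

    [FunctionName("ProcessorB")]
    public static void ProcessorB(
        [CosmosDBTrigger(
            databaseName: "mydb",
            collectionName: "items",
            ConnectionStringSetting = "CosmosConnection",
            LeaseCollectionName = "leases",
            LeaseCollectionPrefix = "processorB")] IReadOnlyList<Document> changes,
        ILogger log)
    {
        log.LogInformation("ProcessorB saw {count} changes", changes.Count);
    }
}
```

Because each prefix maintains its own set of lease documents, both functions receive every change independently.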

Monitor Database Calls with Application Insights

So I've been reading through the Application Insights information published by Microsoft, and in particular this article: https://azure.microsoft.com/en-gb/documentation/articles/app-insights-search-diagnostic-logs/
So what I want to ask is, what's the most logical methodology for logging database calls?
In my head, I want to be able to log into application insights, see the most common database calls being made, and see what their average call times are. That way, I can say "wow the lookup to the membership profile table is taking a few seconds today, what's the deal?"
So I have a database name, a stored procedure name, and an execution time, what's the best way for me to take that data and store it in AI? As a metric, an event, something else?
First of all, Application Insights has auto-collection of dependency calls; see the dependency tracking documentation. Secondly, SDK 1.1 is planned for release next week. As part of that release you will get a DependencyTelemetry type that is added specifically for monitoring SQL, HTTP, blob and other external dependencies.
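If you want to report such a call yourself rather than rely on auto-collection, a minimal sketch using TrackDependency is below. The repository class, database name, and stored procedure name are made up; the arguments are (dependency type, dependency name, command data, start time, duration, success).

```csharp
using System;
using System.Diagnostics;
using Microsoft.ApplicationInsights;

public class ProfileRepository
{
    // Newer SDK versions prefer injecting a TelemetryConfiguration into the client.
    private readonly TelemetryClient telemetry = new TelemetryClient();

    public void GetMembershipProfile(int userId)
    {
        var startTime = DateTimeOffset.UtcNow;
        var timer = Stopwatch.StartNew();
        var success = false;
        try
        {
            // ... execute usp_GetMembershipProfile against MembershipDb here ...
            success = true;
        }
        finally
        {
            timer.Stop();
            // Shows up under Dependencies with duration and success rate,
            // so you can chart average call times per stored procedure.
            telemetry.TrackDependency(
                "SQL",
                "MembershipDb",
                "usp_GetMembershipProfile",
                startTime,
                timer.Elapsed,
                success);
        }
    }
}
```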
