Bringing an MS Graph Search Custom Connector into working mode

Microsoft recently published the Microsoft Search API (beta), which makes it possible to index external systems by creating an MS Graph Search custom connector.
I created such a connector and it has worked so far. I also pushed a few items to the index, and in the Microsoft 365 admin center I created a result type and a vertical. Now I'm able to find those external items in the SharePoint Online modern search center, on a dedicated tab belonging to the vertical created earlier. So far so good.
But now I wonder:
How can I ensure that the external data is continuously pushed to the MS Search index? (How can this be implemented? Is there a tutorial or a sample project? What is the underlying architecture?)
Is there a concept of full / incremental / continuous crawls for a search custom connector at all? If so, how can I "hook" into a crawl in order to push changed data to the index?
Or do I have to implement it all on my own? And if so, what would be a suitable approach?

Thank you for trying out the connector APIs. I am glad to hear that you are able to get items into the index and see the results.
Regarding your questions: the logic for determining when to push items, and your crawl strategy, is something you need to implement on your own. There is no single best strategy; it depends on your data source and the type of access you have to it. For example, do you get notifications every time the data changes? If not, how do you determine what data has changed? If none of that is possible, you might need to do a periodic full recrawl, but then you will need to consider the size of your data set for ingestion.
We will look into ways to reduce the amount of code you have to write in the future, but right now, this is something you have to implement on your own.
-James

I recently implemented incremental crawling for Graph connectors using Azure Functions. I created a timer-triggered function that fetches the items updated in the data source since the last function run and then pushes those updated items to the search index.
I also wrote a blog post about this approach, using a SharePoint list as the data source. The full source code can be found at https://github.com/aakashbhardwaj619/function-search-connector-crawler. Hope it is useful.
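To illustrate the shape of that approach (not the exact code from the blog post), here is a minimal TypeScript sketch of such a timer-triggered function. The connection ID, the token acquisition, and the getItemsChangedSince helper are placeholders you would replace with your own data-source logic, and the Graph call assumes the beta external items endpoint plus Node 18+ for the global fetch:

```typescript
import { AzureFunction, Context } from "@azure/functions";

// Hypothetical shape of an item coming from the external data source.
interface SourceItem {
  id: string;
  title: string;
  url: string;
  content: string;
}

// Placeholder: query your data source for items changed since the given time.
// The real implementation depends entirely on the source system.
async function getItemsChangedSince(since: Date): Promise<SourceItem[]> {
  return [];
}

const CONNECTION_ID = process.env.GRAPH_CONNECTION_ID; // your external connection id
const GRAPH_TOKEN = process.env.GRAPH_TOKEN; // app-only Graph token acquired elsewhere (e.g. via MSAL)

const timerTrigger: AzureFunction = async function (context: Context, myTimer: any): Promise<void> {
  // A production function would persist the last successful run time (blob or
  // table storage, for example); here we simply look back one schedule interval.
  const lastRun = new Date(Date.now() - 15 * 60 * 1000);
  const changedItems = await getItemsChangedSince(lastRun);

  for (const item of changedItems) {
    // Upsert each changed item into the external connection's index.
    const response = await fetch(
      `https://graph.microsoft.com/beta/external/connections/${CONNECTION_ID}/items/${item.id}`,
      {
        method: "PUT",
        headers: {
          Authorization: `Bearer ${GRAPH_TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          acl: [{ type: "everyone", value: "everyone", accessType: "grant" }],
          properties: { title: item.title, url: item.url },
          content: { value: item.content, type: "text" },
        }),
      }
    );
    context.log(`Indexed ${item.id}: ${response.status}`);
  }
};

export default timerTrigger;
```

Deletions follow the same pattern: track removed items in the source and issue DELETE requests against the same items endpoint.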

Related

Cognos REST API and scheduling schema loading

I am trying to find out more information about using the REST API to create a schedule for schema loading. Right now I have to reload the particular schemas via my data server connections manually (click on every schema and Load Metadata) and would like to automate this process.
Any pointers will be much appreciated.
Thank you
If the metadata of your data warehouse is so much in flux that you need to reload it frequently enough to want to automate the process, then you need to understand that your data warehouse is in no way ready for use.
So the question becomes: why would you want to frequently reload the metadata of a data source schema? I'm guessing that you are refreshing the data in your database and, because your query cache has not expired, you are not seeing the new data.
So the answer is, you probably don't want to do what you think you need to do unless you can convince me otherwise.
Also, if you enter some obvious search terms you will find the Cognos Analytics REST API documentation without too much difficulty.

Setting up Azure Cognitive Search using @azure/search-documents vs. using the Azure Portal

My team is working on implementing Azure Cognitive Search on one of our websites. We notice that there are two ways to set it up: one way is to use the Azure Portal to import the data, create the index, and expose the APIs, which requires no coding at all; the other is to use the @azure/search-documents library, which requires a lot of coding to make the search happen.
We don't know for sure which way is better. We have noticed the following aspects:
Using the portal: the process of setting up the search is easy and quick.
Using @azure/search-documents: it is a bit more tedious to set up the search, but it gives us flexibility over the index definition and the rules for when to update the index.
Other than the above points, what are the pros/cons of the two approaches?
Any insight on this would be very appreciated!
Thank you!
Which way is 'better' is subjective and depends on the use case, but typically, for minimal business logic and simple data sources, the Portal is a quick way to index and enrich documents.
You can check out the React template we have: once you have an index, it lets you seamlessly display UI elements for searching, filtering, sorting, and faceting documents.
https://github.com/dereklegenzoff/azure-search-react-template
You can also check out the Knowledge Mining Accelerator, which walks step by step through building a Cognitive Search solution.
https://learn.microsoft.com/en-us/samples/azure-samples/azure-search-knowledge-mining/azure-search-knowledge-mining/
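For comparison with the portal wizard, here is a minimal sketch of the code-first route using @azure/search-documents. The service endpoint, admin key, index name, and fields below are purely illustrative:

```typescript
import {
  SearchIndexClient,
  SearchClient,
  AzureKeyCredential,
} from "@azure/search-documents";

// The endpoint and admin key come from your search service in the Azure portal.
const endpoint = "https://<your-service>.search.windows.net";
const credential = new AzureKeyCredential(process.env.SEARCH_ADMIN_KEY ?? "");

async function setUpIndex(): Promise<void> {
  const indexClient = new SearchIndexClient(endpoint, credential);

  // Define the index schema in code instead of through the portal wizard.
  await indexClient.createOrUpdateIndex({
    name: "products",
    fields: [
      { name: "id", type: "Edm.String", key: true },
      { name: "title", type: "Edm.String", searchable: true, sortable: true },
      { name: "description", type: "Edm.String", searchable: true },
      { name: "category", type: "Edm.String", filterable: true, facetable: true },
    ],
  });

  // Push a document into the index; when and how this runs is entirely under
  // your control, which is the flexibility the library route gives you.
  const searchClient = new SearchClient(endpoint, "products", credential);
  await searchClient.uploadDocuments([
    { id: "1", title: "Example", description: "Indexed from code", category: "demo" },
  ]);
}

setUpIndex().catch(console.error);
```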

Azure Search ingestion events

I'd like to know if Azure Search offers any way to trigger an Azure Function when a document gets indexed or inserted into Azure Search, or if there are any other events I can take advantage of.
I'd like to avoid a timed event which continuously scans Azure search for new documents.
If you're using an indexer, you can add a skillset with a WebApiSkill to invoke your Azure Function for each inserted document. However, there are no transactional consistency guarantees - a document for which your function is invoked is not guaranteed to be successfully inserted into the index.
Unfortunately, there isn't a great way to do this today. Eugene's suggestion will work, but it isn't very efficient and does indeed have the limitation that the document might not actually make it into the index if something goes wrong later in the indexer. If you are interested in a better-defined option for this scenario, please vote on the following UserVoice item about triggered events for Azure Cognitive Search: https://feedback.azure.com/forums/263029-azure-search/suggestions/10095111-azure-search-alerts
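As a rough illustration of the WebApiSkill suggestion above, the sketch below creates a skillset whose custom skill calls a hypothetical Azure Function for every document an indexer processes. The service, key, skill names, and function URL are placeholders, and the consistency caveats above still apply:

```typescript
import { SearchIndexerClient, AzureKeyCredential } from "@azure/search-documents";

// Endpoint, key, and the function URL are placeholders for your own resources.
const indexerClient = new SearchIndexerClient(
  "https://<your-service>.search.windows.net",
  new AzureKeyCredential(process.env.SEARCH_ADMIN_KEY ?? "")
);

async function createNotifySkillset(): Promise<void> {
  await indexerClient.createSkillset({
    name: "notify-on-ingest",
    description: "Calls an Azure Function for every document the indexer processes",
    skills: [
      {
        odatatype: "#Microsoft.Skills.Custom.WebApiSkill",
        name: "notify-function",
        context: "/document",
        uri: "https://<your-function-app>.azurewebsites.net/api/onDocumentIndexed",
        httpMethod: "POST",
        // Values sent to the function for each document.
        inputs: [{ name: "id", source: "/document/id" }],
        // The function's JSON response must include at least one declared output.
        outputs: [{ name: "status", targetName: "notifyStatus" }],
      },
    ],
  });
  // Reference this skillset from your indexer (its skillsetName property) so it
  // runs as part of every indexing pass.
}

createNotifySkillset().catch(console.error);
```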

How to bulk delete (say millions of) documents spread across millions of logical partitions in the Cosmos DB SQL API?

The MS Azure documentation does not say anything about this. The official bulk executor documentation talks only about insert and update operations, not delete. There is a suggested JavaScript server-side program for creating a stored procedure, which sounds very good, but it requires us to input the partition key value. That won't work if our documents are spread across millions of logical partitions.
This is a very simple business need. While migrating a huge volume of data into a SQL API Cosmos collection, if we insert some wrong data there seems to be no option to delete it other than restoring to a previous state. I have explored for a few hours now but couldn't find a solution. I even raised a case with MS support; they pointed to some .NET code, which does not look straightforward. What if someone doesn't know .NET?
Can't we easily bulk delete documents spread across several logical partitions in the Cosmos SQL API? It feels very frustrating.
I hope you can provide some accurate details on how to achieve this, with some simple, straightforward sample code and steps. I hope MS and Cosmos DB experts will share their views as well.
I even raised a case with MS support; they pointed to some .NET code, which does not look straightforward.
Obviously you have already made some effort to find a solution, but apart from what you have tried there are the following two options:
1. Bulk delete stored procedure: https://github.com/Azure/azure-cosmosdb-js-server/blob/master/samples/stored-procedures/bulkDelete.js
2. Bulk delete executor:
.NET: https://github.com/Azure/azure-cosmosdb-bulkexecutor-dotnet-getting-started/blob/master/BulkDeleteSample/BulkDeleteSample/Program.cs
Java: https://github.com/Azure/azure-cosmosdb-bulkexecutor-java-getting-started/blob/master/samples/bulkexecutor-sample/src/main/java/com/microsoft/azure/cosmosdb/bulkexecutor/bulkdelete/BulkDeleter.java
So far, only the above official solutions are supported. Another workaround is TTL in Cosmos DB. I believe you have your own logic to judge which part of the data is correct and which part is wrong and should be deleted; you could set a TTL on the bad data so that it is removed automatically as soon as it expires.
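For completeness, a plain cross-partition query-and-delete is also possible without the bulk executor, for example with the @azure/cosmos SDK. It is slow and RU-intensive for millions of documents, and the account details, partition key path (/pk), and the filter used to identify the wrong data below are purely illustrative:

```typescript
import { CosmosClient } from "@azure/cosmos";

// Connection details and the "wrong data" filter are placeholders; the
// container's partition key path is assumed to be /pk.
const client = new CosmosClient({
  endpoint: "https://<your-account>.documents.azure.com:443/",
  key: process.env.COSMOS_KEY ?? "",
});
const container = client.database("mydb").container("mycollection");

async function deleteBadDocuments(): Promise<void> {
  // Cross-partition query; select only the id and partition key to keep RUs low.
  const iterator = container.items.query(
    {
      query: "SELECT c.id, c.pk FROM c WHERE c.migrationBatch = @batch",
      parameters: [{ name: "@batch", value: "bad-load" }],
    },
    { maxItemCount: 100 }
  );

  while (iterator.hasMoreResults()) {
    const { resources } = await iterator.fetchNext();
    for (const doc of resources ?? []) {
      // Point-delete each document using its id and partition key value.
      await container.item(doc.id, doc.pk).delete();
    }
  }
}

deleteBadDocuments().catch(console.error);
```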
Has anyone tried this? It looks like a good solution in Java:
https://github.com/Azure/azure-cosmosdb-bulkexecutor-java-getting-started#bulk-delete-api
You could also write a batch job that deletes documents overnight based on some date configuration. Here is an article on how to do it:
https://medium.com/@vaibhav.medavarapu/bulk-delete-documents-from-azure-cosmos-db-using-asp-net-core-8bc95dd20411

Combining data from Project Server and SharePoint into a single report

I need to combine data from the Project Server reporting database with data from custom lists in SharePoint workspaces. The results need to be displayed within a single report. How should this be done? Options I've thought of:
Extend the reporting database with the custom list data (if this is possible). Use Reporting Services to display the output.
Query the reporting database and the SharePoint workspaces and combine results in memory. Write custom code to display the output.
Any other ideas? I have the skills to develop this but am very open to purchasing a product if it solves the problem.
I've had this sort of problem as well. My approach:
Create a custom reporting DB.
Run regular jobs from SQL Server to query SharePoint (via web services) and store the results in the DB.
I use the GetListItemChangesSinceToken method in Lists.asmx to improve efficiency (see the sketch further down). I also utilise the SiteDataQuery tool set; I wrote a really simple interface onto it so I can call a SiteDataQuery remotely and get a DataTable back.
Use Reporting Services / any other tool to extract and report on the data.
The reasons I opted for a staging DB were:
Performance - the web service calls are pretty slow.
Service continuity - if SharePoint is down or slow for any reason, queries will fail.
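To make the incremental piece concrete, here is a minimal sketch of the GetListItemChangesSinceToken call mentioned above. The site URL, list name, and stored change token are placeholders, and authentication (typically NTLM/Kerberos for classic SharePoint) is assumed to be handled by your HTTP client; Node 18+ is assumed for the global fetch:

```typescript
// Placeholders: point these at your own site, list, and persisted change token.
const siteUrl = "http://projectserver/pwa";
const listName = "Risks";
const lastChangeToken = ""; // empty on the first (full) pull; persist the returned token afterwards

async function fetchChangesSinceToken(): Promise<string> {
  // SOAP envelope for the legacy Lists.asmx web service.
  const soapBody = `<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetListItemChangesSinceToken xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <listName>${listName}</listName>
      <changeToken>${lastChangeToken}</changeToken>
    </GetListItemChangesSinceToken>
  </soap:Body>
</soap:Envelope>`;

  const response = await fetch(`${siteUrl}/_vti_bin/Lists.asmx`, {
    method: "POST",
    headers: {
      "Content-Type": "text/xml; charset=utf-8",
      SOAPAction: '"http://schemas.microsoft.com/sharepoint/soap/GetListItemChangesSinceToken"',
    },
    body: soapBody,
  });

  // The response XML contains the changed rows plus a new change token to
  // store in the staging database for the next run.
  return response.text();
}

fetchChangesSinceToken().then(xml => console.log(xml)).catch(console.error);
```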
Hope this helps.
I also found the tool SharePoint Data Miner, which appears to do the same as DJ's answer.
