Events in Azure Search - azure

Is there a way to attach webhooks or get events from Azure Search?
Specifically we are looking for way to get notified (programmatically) when an indexer completes indexing an index.

Currently, there are no such events. However, you can implement this functionality yourself. There are several scenarios to consider, and two main approaches to adding content: either define a content source and pull from it, or use the API to push content to the index.
The simplest scenario is pushing a single item via the API. You could create a wrapper method that submits your item and then queries the index until that item is found. The wrapper would then either invoke a callback or fire an event. To support updates to an item, you would need a marker on the item that lets you distinguish the new version from the old, such as a timestamp property recording when the item was submitted to the index, or a version number.
A more complex scenario is when you handle batches or large volumes of content. Assuming you start from scratch and your corpus is 100,000 items, you could query until the count reaches 100,000 items before you fire your event. To handle updates, the best approach is again to use a marker. For example, you submit a batch of 100 updates at 2020-08-18 09:58. You could then query the index, filtering for items updated at or after the time you submitted your content. Once the count from your query reaches 100, you can fire your event.
You would also need to handle indexing errors or exceptions when submitting content in these scenarios.
For pull-scenarios your best option is to define a skill that adds a timestamp to items. You could then poll the index with a query, filtering by content with a timestamp after the point indexing started and then fire your event.
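The polling approach described above can be sketched roughly as follows. This is a minimal illustration, not an Azure SDK feature: `wait_until_indexed` and its parameters are hypothetical names, and `count_newer_than` stands in for whatever query you run against the index to count items whose marker is at or after the submission time.

```python
import time
from typing import Callable


def wait_until_indexed(
    count_newer_than: Callable[[str], int],
    submitted_at: str,
    expected: int,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until at least `expected` items carry a marker >= submitted_at.

    `count_newer_than` is a hypothetical callable that queries the index
    and returns how many items match. Returns True once the count is
    reached, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if count_newer_than(submitted_at) >= expected:
            return True  # this is where you would fire your event
        if time.monotonic() >= deadline:
            return False  # give up; treat as an indexing error
        sleep(interval_s)
```

Returning False on timeout gives the caller one place to handle the indexing errors mentioned above, instead of polling forever.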

Related

How to reliably determine when an Azure Cognitive Search index is up to date?

Azure Cognitive Search is eventually consistent - writes to the service return successfully but the writes are not materialized in the search index for a short period of time.
We are using Azure Cognitive Search in an eventually consistent event sourced CQRS architecture, where an Azure Search index is used as a projection of the event stream. We use websockets to notify connected clients when a projection has been updated, so that they can re-query it to fetch the latest data.
This presents a challenge with Azure Search, because when we notify a client that the index has been updated, the client may query the index before it can provide the most up to date data.
Does Azure Cognitive Search provide any built in ability to determine when a given write will be queryable?
If not, what patterns can be used to achieve what we want?
I am not aware of any functionality in Azure Search that allows you to submit content with a callback that confirms that the content is indexed. However, I have used search engines before where this was an option. When submitting content you could choose between different wait options:
Fire and forget (quickest, return immediately)
Wait for confirmed storage (reasonably quick)
Confirmed indexed (potentially very slow depending on overall indexing load)
It seems like you want the last option. You could create a function that submits a batch of content and then queries until that content is available. For this to work, you would need an indicator on each record that you can use to confirm via a query that the new records are in fact indexed. I always include a timestamp property on all my records and it would do the job in this case.
Use case: You have an index with 500 items. You then have a batch of 10 items, of which 5 are updates and 5 are new records. You add a timestamp to all of these records, submit the batch of 10 and then query in a loop. Once the query confirms that you have 10 or more records with a timestamp greater than or equal to the time you submitted your batch, you know the index is updated.
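As a rough sketch of the use case above, the two helpers below stamp a batch and build the matching filter. The field name `indexedAt` is a hypothetical choice, and the code assumes the field is a filterable date-time field and that the filter uses the OData `ge` operator with an unquoted UTC timestamp. Pass timezone-aware datetimes, since naive ones would be interpreted as local time.

```python
from datetime import datetime, timezone


def _utc_stamp(moment: datetime) -> str:
    """Format an aware datetime as a UTC timestamp with second precision."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def stamp_batch(records, submitted_at, field="indexedAt"):
    """Return copies of the records with the submission time set on each."""
    stamp = _utc_stamp(submitted_at)
    return [{**record, field: stamp} for record in records]


def newer_than_filter(submitted_at, field="indexedAt"):
    """Build a filter matching records stamped at or after submitted_at."""
    return f"{field} ge {_utc_stamp(submitted_at)}"


# 10 records in one batch: 5 updates to existing ids and 5 new ids.
submitted = datetime(2020, 8, 18, 9, 58, tzinfo=timezone.utc)
batch = stamp_batch([{"id": str(i)} for i in range(496, 506)], submitted)
query_filter = newer_than_filter(submitted)
```

You would submit `batch`, then count the documents matching `query_filter` in a loop until the count reaches 10.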

Truncate feeds in getStream

I would like to limit the number of feed updates (records) in my GetStream app. I want to keep each feed at a constant length of 500 items.
I make heavy use of the 'to:' field, which results in a lot of feeds of different lengths. I want them all to grow to 500 items, so I would rather not remove items by date.
For what it's worth, I store all the updates in my own database which results in a replica of the network activity.
What would be a good way of keeping my feeds short?
There's no straightforward way to limit your feeds to 500 items. There are two ways to remove activities from Stream:
the removeActivity method, which will remove 1 activity at a time via the foreign_id or activity id (https://getstream.io/docs/js/#removing-activities)
the "Truncate Data" button on the dashboard for your app, which will remove all activities in Stream.
It might be possible to get the behavior you're looking for by keeping track of all activities that you're adding to Stream, then periodically culling the ones that put you over 500.
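A minimal sketch of that bookkeeping, assuming you record every activity you add along with its foreign_id. `FeedTracker` and `cull` are hypothetical names, and `remove_activity` is a callable you supply that wraps the client's single-activity removal call mentioned above.

```python
from collections import defaultdict, deque


class FeedTracker:
    """Remembers, per feed, the foreign_ids added so far (oldest first)."""

    def __init__(self, max_len=500):
        self.max_len = max_len
        self._feeds = defaultdict(deque)

    def record(self, feed_id, foreign_id):
        """Note that an activity was added to a feed."""
        self._feeds[feed_id].append(foreign_id)

    def overflow(self, feed_id):
        """Pop and return the oldest foreign_ids beyond max_len."""
        feed = self._feeds[feed_id]
        removed = []
        while len(feed) > self.max_len:
            removed.append(feed.popleft())
        return removed


def cull(tracker, feed_ids, remove_activity):
    """Periodic job: remove whatever puts each feed over the limit."""
    for feed_id in feed_ids:
        for foreign_id in tracker.overflow(feed_id):
            remove_activity(feed_id, foreign_id)
```

Because this trims by position rather than by date, short feeds keep growing until they reach the limit. If you already store every update in your own database, that database could play the role of the tracker.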
Hopefully this helps!

Can ElasticSearch delete all and insert new documents in a single query?

I'd like to swap out all documents for a specific index's type. I'm thinking about this like a database transaction, where I'd:
Delete all documents inside of the type
Create new documents
Commit
It appears that this is possible with ElasticSearch's bulk API, but is there a more direct way?
Based on the following statement, from the elasticsearch Delete by Query API Documentation:
Note, delete by query bypasses versioning support. Also, it is not recommended to delete "large chunks of the data in an index", many times, it’s better to simply reindex into a new index.
You might want to reconsider removing entire types and recreating them in the same index. As this statement suggests, it is better to simply reindex. In fact, I have a scenario where we have an index of manufacturer products, and when a manufacturer sends an updated list of products, we load the new data into our persistent store and then completely rebuild the entire index. I use Index Aliases to mask the actual index being used. When product changes occur, a process starts rebuilding a new index in the background (this currently takes about 15 minutes). Once the data load is complete, it switches the alias to the new index and deletes the old one. This is completely seamless and does not cause any downtime for our users.
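The alias switch at the end of that process can be expressed as a single request to the aliases endpoint, which applies its actions atomically. The sketch below only builds the request body. The alias and index names are hypothetical, and sending the body (a POST to `/_aliases`) is left to whichever HTTP or Elasticsearch client you use.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body that moves an alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: readers query "products"; the rebuild wrote "products_v2".
body = alias_swap_actions("products", "products_v1", "products_v2")
payload = json.dumps(body)
```

Since both actions go in one request, readers querying the alias see either the old index or the new one, never an empty state in between. The old index can be deleted after the swap.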

Efficient manner to perform multiple lookups in one custom control

I am building a custom control to do a lookup and provide a summary of the status of several items in a database. There are 20 different statuses, and in order to determine the number for each status, I am doing a NotesDatabase.search to count each status.
This was fine when there were only 2 statuses to check, however the business now want all of them displayed. :)
I'm concerned about the time it will take to do the search, and want to do this in the most efficient manner possible.
Things I have taken into account:
The documents are updated regularly, so I can't really have an agent doing the calcs and the custom control running a lookup for those static values. That would mean the data is stale.
The results are dependent on the logged-in user, with counts based on their login ID, so I can't really have separate views per person.
Does anyone have a clean suggested solution?
I am about to start testing the 20 searches and will update this with those results, but am expecting it to be very slow.
Another option: instead of @DbLookup, go into the view and run through it end to end using a navigator. That's pretty fast and should be faster than 20 searches.
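To illustrate why a single pass beats 20 searches, here is a language-neutral sketch written in Python rather than XPages code. It assumes the navigator yields one `(owner, status)` pair per view entry. The function name and the entry shape are hypothetical.

```python
from collections import Counter


def tally_statuses(entries, user_id):
    """Count statuses for one user in a single pass over the view entries."""
    counts = Counter()
    for owner, status in entries:
        if owner == user_id:
            counts[status] += 1
    return counts


# Hypothetical entries as they might come out of a view walk.
entries = [
    ("alice", "Open"),
    ("bob", "Open"),
    ("alice", "Closed"),
    ("alice", "Open"),
]
alice_counts = tally_statuses(entries, "alice")
```

Each entry is read once regardless of how many statuses exist, so going from 2 statuses to 20 does not multiply the work the way separate searches would.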
Of course, you could also update tallies in the QuerySave event and write them into a user-specific in-memory profile.
In your QuerySave you would check which users are loaded in an ApplicationBean and update those. When a user logs in for the first time, a search in the database is run and its results are stored in the ApplicationBean. When a session expires (session listener), the entry in the ApplicationBean is cleared out.
Instead of 20 searches you actually might be better off with ONE Ajax call. Create a view that is categorized by your status and is collapsed. Then make an Ajax call ...statusview?ReadViewEntries&Outputformat=JSON&count=100. This will give you the 100 status summary entries with a childcount property.
Would that work for you?
Would it be possible to add 20 Status documents, and to update one or more of these documents whenever some condition is met? Each time a document is updated, an agent runs to match with those conditions, in order to update the status.
If there are many updates per day, it's not really efficient.

Applying CQRS to Inventory Management

I am still trying to wrap my head around how to apply DDD and, most recently, CQRS to a real production business application. In my case, I am working on an inventory management system. It runs as a server-based application exposed via a REST API to several client applications. My focus has been on the domain layer with the API and clients to follow.
The command side of the domain is used to create a new Order and allows modifications, cancellation, marking an Order as fulfilled and shipped/completed. I, of course, have a query that returns a list of orders in the system (as read-only, lightweight DTOs) from the repository. Another query returns a PickList used by warehouse employees to pull items from the shelves to fulfill specific orders. In order to create the PickList, there are calculations, rules, etc that must be evaluated to determine which orders are ready to be fulfilled. For example, if all order line items are in stock. I need to read the same list of orders, iterate over the list and apply those rules and calculations to determine which items should be included in the PickList.
This is not a simple query, so how does it fit into the model?
UPDATE
While I may be able to maintain (store) a set of PickLists, they really are dynamic until an employee retrieves the next PickList. Consider the following scenario:
The first Order of the day is received. I can raise a domain event that triggers an AssemblePickListCommand which applies all of the rules and logic to create one or more PickLists for that Order.
A second Order is received. The event handler should now REPLACE the original PickLists with one or more new PickLists optimized across both pending Orders.
Likewise after a third Order is received.
Let's assume we now have two PickLists in the 'queue' because the optimization rules split the lists because components are at opposite ends of the warehouse.
Warehouse employee #1 requests a PickList. The first PickList is pulled and printed.
A fourth Order is received. As before, the handler removes the second PickList from the queue (the only one remaining) and regenerates one or more PickLists based on the second PickList and the new Order.
The PickList 'assembler' will repeat this logic whenever a new Order is received.
My issue with this is that a request must either block while the PickList queue is being updated, or I have an eventual consistency issue that goes against the behavior the customer wants. Each time they request a PickList, they want it optimized based on all of the Orders received up to that point in time.
While I may be able to maintain (store) a set of PickLists, they really are dynamic until an employee retrieves the next PickList. Consider the following scenario:
The first Order of the day is received. I can raise a domain event that triggers an AssemblePickListCommand which applies all of the rules and logic to create one or more PickLists for that Order.
A second Order is received. The event handler should now REPLACE the original PickLists with one or more new PickLists optimized across both pending Orders.
This sounds to me like you are getting tangled trying to use a language that doesn't actually match the domain you are working in.
In particular, I don't believe that you would be having these modeling problems if the PickList "queue" was a real thing. I think instead there is an OrderItem collection that lives inside some aggregate, and you issue commands to that aggregate to generate a PickList.
That is, I would expect a flow that looks like
onOrderPlaced(List<OrderItems> items)
    warehouse.reserveItems(List<OrderItems> items)
        // At this point, the items are copied into an unassigned
        // items collection. In other words, the aggregate knows
        // that the items have been ordered, and are not currently
        // assigned to any picklist
        fire(ItemsReserved(items));
onPickListRequested(Id<Employee> employee)
    warehouse.assignPickList(Id<Employee> employee, PickListOptimizer optimizer)
        // PickListOptimizer is your calculation, rules, etc that know how
        // to choose the right items to put into the next pick list from
        // a given collection of unassigned items. This is a stateless
        // *domain service* -- it provides the query that the warehouse aggregate needs
        // to figure out the right change to make, but it *doesn't* change
        // the state of the aggregate -- that's the aggregate's responsibility
        List<OrderItems> pickedItems = optimizer.chooseItems(this.unassignedItems);
        this.unassignedItems.removeAll(pickedItems);
        // This mockup assumes we can consider PickLists to be entities
        // within the warehouse aggregate. You'd need some additional
        // events if you wanted the PickList to have its own aggregate
        Id<PickList> id = PickList.createId(...);
        this.pickLists.put(id, new PickList(id, employee, pickedItems));
        fire(PickListAssigned(id, employee, pickedItems));
onPickListCompleted(Id<PickList> pickList)
    warehouse.closePicklist(Id<PickList> pickList)
        this.pickLists.remove(pickList);
        fire(PickListClosed(pickList));
onPickListAbandoned(Id<PickList> pickList)
    warehouse.reassign(Id<PickList> pickList)
        PickList list = this.pickLists.remove(pickList);
        this.unassignedItems.addAll(list.pickedItems);
        fire(ItemsReassigned(list.pickedItems));
Not great languaging -- I don't speak warehouse. But it covers most of your points: each time a new PickList is generated, it's being built from the latest state of pending items in the warehouse.
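For readers who want something executable, the flow above might look like this as a small in-memory sketch. It is an illustration under assumptions, not the original design. Items are plain strings, events are appended to a list instead of being published, ids come from a simple counter, and `choose_items` is a hypothetical stand-in for the optimizer domain service.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict, List, Tuple


@dataclass
class Warehouse:
    # Stateless domain service: picks items from the unassigned collection.
    choose_items: Callable[[List[str]], List[str]]
    unassigned: List[str] = field(default_factory=list)
    pick_lists: Dict[int, Tuple[str, List[str]]] = field(default_factory=dict)
    events: List[tuple] = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def reserve_items(self, items):
        """Ordered items become unassigned, i.e. not on any pick list yet."""
        self.unassigned.extend(items)
        self.events.append(("ItemsReserved", tuple(items)))

    def assign_pick_list(self, employee):
        """Build the next pick list from the latest unassigned items."""
        picked = self.choose_items(list(self.unassigned))
        for item in picked:
            self.unassigned.remove(item)
        pick_list_id = next(self._ids)
        self.pick_lists[pick_list_id] = (employee, picked)
        self.events.append(
            ("PickListAssigned", pick_list_id, employee, tuple(picked))
        )
        return pick_list_id

    def close_pick_list(self, pick_list_id):
        """The employee finished picking; the list is done."""
        del self.pick_lists[pick_list_id]
        self.events.append(("PickListClosed", pick_list_id))

    def reassign(self, pick_list_id):
        """An abandoned list returns its items to the unassigned pool."""
        _, items = self.pick_lists.pop(pick_list_id)
        self.unassigned.extend(items)
        self.events.append(("ItemsReassigned", tuple(items)))
```

Note that the optimizer receives a copy of the unassigned items and only returns a selection. The aggregate itself performs every state change.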
There's some contention: you can't assign items to a pick list AND change the unassigned items at the same time. Those are two different writes to the same aggregate, and I don't think you are going to get around that as long as the client insists upon a perfectly optimized picklist each time. It might be worthwhile to sit down with the domain experts and explore the real cost to the business if the second-best pick list is assigned from time to time. After all, there's already latency between the placing of the order and its arrival at the warehouse....
I don't really see what your specific question is. But the first thing that comes to mind is that pick list creation is not just a query but a full-blown business concept that should be explicitly modeled. It could then be created with an AssemblePicklist command, for instance.
You seem to have two roles/processes and possibly also two aggregate roots - salesperson works with orders, warehouse worker with picklists.
AssemblePicklistsCommand() is triggered from order processing and recreates all currently unassigned picklists.
The warehouse worker fires an AssignPicklistCommand(userid), which tries to choose the most appropriate unassigned picklist and assign it to him (or does nothing if he already has an active picklist). He could then use GetActivePicklistQuery(userid) to get the picklist, pick items with PickPicklistItemCommand(picklistid, item, quantity) and finally use MarkPicklistCompleteCommand() to signal that he's done.
AssemblePicklist and AssignPicklist should block each other (serial processing, optimistic concurrency?), but the relation between AssignPicklist and GetActivePicklist is clean: either you have a picklist assigned or you don't.
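One way to read the optimistic concurrency option is a version check on every write, with the losing command reloading and retrying. The sketch below is an in-memory illustration only. `VersionedStore`, `ConcurrencyError` and `run_command` are hypothetical names, and a real store would perform the version comparison atomically.

```python
class ConcurrencyError(Exception):
    """Raised when a write was based on a stale version of the state."""


class VersionedStore:
    def __init__(self, state):
        self._state = state
        self._version = 0

    def load(self):
        """Return the current state together with its version."""
        return self._state, self._version

    def save(self, new_state, expected_version):
        """Accept the write only if nobody else saved in the meantime."""
        if expected_version != self._version:
            raise ConcurrencyError("state changed since it was loaded")
        self._state = new_state
        self._version += 1


def run_command(store, command, max_attempts=3):
    """Apply command(state) -> new_state, reloading and retrying on conflict."""
    for _ in range(max_attempts):
        state, version = store.load()
        try:
            store.save(command(state), version)
            return
        except ConcurrencyError:
            continue
    raise ConcurrencyError("gave up after repeated conflicts")
```

With this shape, an assign that raced with an assemble fails and retries against the freshly assembled picklists instead of blocking while they are rebuilt.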
