Background:
We have an eCommerce site that uses Oracle Endeca to search and filter products and return paginated search results. Unfortunately, we have custom backend logic that applies special pricing depending on the customer, and it is only applied to the paginated results that Endeca returns. Our backend system is archaic (if you must know, Progress WDS-II). Also unfortunately, we allow the user to sort the results by (Catalog) Price. So the problem is that the calculated price shown isn't actually sorted.
Question:
Does anyone have any suggestions on how to handle this scenario? Architecturally, are there good designs for how you should apply business logic and calculate fields post-query? Or is it as simple as "Don't do it"? How do other search engines like elasticsearch or Lucene apply business logic to the search results?
Related
I am setting up a large (2000+ records) "task tracking register" using a SharePoint List, and intend to use Powerapps as the UI.
As you would imagine, there are numerous drop-down fields in the list which I would like to use as filters within the Powerapp, but because these are "Complex" fields, they are non-delegatable.
I'm led to believe that I can avoid this by creating additional columns in the SharePoint list and using a Flow to populate them with plain text based on the drop-down selection.
This is a bit of a pain, so I'd like to limit the quantity of these helper columns as much as possible.
Can anyone advise whether a Powerapps Gallery will first filter the results using the delegatable functions and then perform the non-delegatable search functions on those items, or whether the inclusion of a non-delegatable search criterion means that the whole query is performed in a non-delegatable manner?
i.e.
Filter 3000 records down to 800 using delegatable search, then perform the additional filtering of those 800 on the app for the non-delegatable search criteria.
I understand that it may be possible to do this by loading the initial filtered results into a collection within the app and then filtering that collection, but I have read some conflicting information as to the efficacy of this method, so I'm not sure if this is the route I should take.
Delegation can be a challenge. Here are some methods for handling it:
Users rarely need more than a few dozen records at any time in a mobile app. Try to use delegable queries to create a Collection locally. From there, it's lightning fast.
If you MUST pull in all 3k+ of your records, here's my favorite hack. Collect chunks of your data source then combine into a single collection.
If you want the function to scale with the data (at the cost of the user's wait time), you can determine the first and last ID and build the function dynamically.
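The chunking idea can be sketched outside Power Apps. This is a minimal Python illustration, not Power Apps code: it assumes records carry a sequential numeric ID and that each delegable call may return at most 2000 rows. The names `id_ranges`, `collect_all` and `fetch_chunk` are hypothetical helpers for illustration only.

```python
def id_ranges(first_id, last_id, chunk_size=2000):
    """Split [first_id, last_id] into inclusive ID ranges of at most chunk_size IDs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = first_id
    while start <= last_id:
        end = min(start + chunk_size - 1, last_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


def collect_all(fetch_chunk, first_id, last_id, chunk_size=2000):
    """Call fetch_chunk(lo, hi) once per ID range and combine the results.

    fetch_chunk stands in for a delegable query such as "ID >= lo and ID <= hi".
    """
    combined = []
    for lo, hi in id_ranges(first_id, last_id, chunk_size):
        combined.extend(fetch_chunk(lo, hi))
    return combined
```

Because the ranges are derived from the first and last ID, the number of calls grows with the data instead of being hard-coded, which is the trade-off against wait time mentioned above.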
Good luck!
We are interested in using DocumentDb as a data store for a number of data sources and as such we are running a quick POC to establish whether it meets the criteria we are looking for.
One of the areas we are keen to provide is look ahead search capabilities for certain fields. These are traditionally provided using the SQL LIKE syntax which does not appear to be supported at present.
Searching online I have seen people talking about integrating Azure search but this appears to be a very costly mechanism for such a simple use case.
I have also seen people mention the use of UDFs, but this appears to require an entire collection scan, which is not practical from a performance perspective.
Does anyone have any alternative suggestions? One thing I considered was simply using a SQL table and initiating an update each time a document was inserted\updated\deleted?
DocumentDB supports STARTSWITH and range indexes to support prefix/look ahead searching.
You can progressively make queries like the following based on what your user types in a text box:
SELECT TOP 10 * FROM hotel H WHERE STARTSWITH(H.name, "H")
SELECT TOP 10 * FROM hotel H WHERE STARTSWITH(H.name, "Hi")
SELECT TOP 10 * FROM hotel H WHERE STARTSWITH(H.name, "Hil")
SELECT TOP 10 * FROM hotel H WHERE STARTSWITH(H.name, "Hilton")
Note that you must configure the collection, or the path/property you're using for these queries with a range index. You can extend this approach to handle additional cases as well:
To query in a case-insensitive manner, you must store the lower case form of the search property, and use that for querying.
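To illustrate why a range index can serve prefix queries, here is a minimal Python sketch that models the idea in memory. It is not DocumentDB code: `build_index` and `starts_with` are hypothetical helpers, and the upper bound `prefix + "\uffff"` assumes the text stays within the Basic Multilingual Plane. The lower-cased key mirrors the advice above about storing a lower-case form of the search property.

```python
import bisect


def build_index(names):
    """Return a sorted list of (lower-cased name, original name) pairs."""
    return sorted((name.lower(), name) for name in names)


def starts_with(index, prefix, top=10):
    """Return up to `top` original names whose lower-cased form starts with prefix.

    A prefix match is a range scan over the sorted keys:
    prefix <= key < prefix + "\uffff".
    """
    prefix = prefix.lower()
    lo = bisect.bisect_left(index, (prefix,))
    hi = bisect.bisect_left(index, (prefix + "\uffff",))
    return [original for _, original in index[lo:hi][:top]]


hotels = build_index(["Hilton", "Hyatt", "hilltop inn", "Marriott", "Holiday Inn"])
matches = starts_with(hotels, "Hil")
```

Each keystroke only narrows the range, so the lookup never has to scan the whole collection.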
I faced a similar situation, where a fast lookup was required, as a user typed search terms.
My scenario was that potentially thousands of simultaneous users would be performing such lookups; when testing this under load, to avoid saturation and throttling, we found we would have to increase the DocumentDB Request Unit (RU) throughput amount to a point that was not financially viable for us, in our specific circumstances.
We decided that DocumentDB was best used as the persistent store and for 'full' data retrieval - a role it performs exceptionally well - while a small ElasticSearch cluster performed the role it was designed for: text search, faceted search, weighted search, stemming, and, most relevant to your question, autocomplete analyzers and completion suggesters.
The subject of type ahead queries, creation of indexes, autocomplete analyzer and query time 'search as you type' in ElasticSearch can be found here, here and here
The fact that you plan to have several data sources would also potentially make the ElasticSearch cluster approach more attractive, to aggregate search data.
I used the Bitnami template available in the Azure Marketplace to create relatively small instances, and most importantly, this allowed me to place the cluster on the same Virtual Network as my other components, which greatly increased performance.
Cost was lower than Azure Search (which uses ElasticSearch under the hood).
I am building a search page with Azure Search. On my page, I have a search box. I want to provide suggestions to the users. In an attempt to do this, I'm using the Suggestions endpoint on my index. At this time, I have a request that includes the following query string:
search=sta&suggesterName=sites&$top=3
My question is, how does top determine which three results to return? Is it the first three matches it encounters when going through the search index? Or is it something else? Based on the URL structure, I don't think it's using a scoring profile. So, I ruled out relevancy. But then I started reading about the minimumCoverage field and I got confused.
If the suggest endpoint just returns the first [top] matches it encounters, then why is the minimumCoverage field even needed?
In general, $top will give you the top N results based on whatever order the rest of the query specifies. For queries with no $orderby, the sort order is descending by relevance score. This applies to both Suggest and Search.
Note that just because you don't have a scoring profile (such as with Suggest), that doesn't mean Azure Search doesn't calculate relevance scores for each document. Scoring profiles can influence the score, but they do not completely define it.
For queries with an $orderby, the order of results is defined first by the fields in the $orderby, and then by score if there are any ties to be broken.
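The ordering rules described here can be modelled in a few lines. This is only a sketch of the behavior as stated in this answer, not Azure Search's implementation; `top_results`, the `score` key and the sample field names are assumptions for illustration, and only a single ascending order-by field is handled.

```python
def top_results(docs, top, order_by=None):
    """Return the first `top` documents.

    Without order_by: descending relevance score.
    With order_by: ascending by that field, ties broken by descending score.
    """
    if order_by is None:
        def sort_key(doc):
            return -doc["score"]
    else:
        def sort_key(doc):
            return (doc[order_by], -doc["score"])
    return sorted(docs, key=sort_key)[:top]


docs = [
    {"id": "a", "score": 1.0, "name": "x"},
    {"id": "b", "score": 3.0, "name": "y"},
    {"id": "c", "score": 2.0, "name": "x"},
]
by_score = [d["id"] for d in top_results(docs, 2)]
by_name = [d["id"] for d in top_results(docs, 3, order_by="name")]
```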
minimumCoverage has nothing to do with ordering or $top. It has to do with the way search queries are distributed. Every query is executed concurrently against different subsets of the index (this happens regardless of whether or not you have multiple search units). Sometimes one of these subsets fails to execute for whatever reason, usually when your search service is under heavy load. The minimumCoverage parameter provides a way to relax the rule that normally says "X% of the index must successfully execute the query in order to consider the overall query a success" (X is 100 by default for Search and 80 by default for Suggest). This is a way to trade off completeness of search results for higher availability in case of heavy load or partial outages.
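The coverage rule amounts to a simple percentage check. The following Python sketch models it under the assumption that the index is split into equally sized subsets; `query_succeeds` and its parameters are hypothetical names, not part of any Azure SDK.

```python
def query_succeeds(subsets_ok, subsets_total, minimum_coverage=100.0):
    """Return True if enough of the index answered the query.

    coverage is the percentage of index subsets that executed successfully.
    """
    if subsets_total <= 0:
        raise ValueError("subsets_total must be positive")
    coverage = 100.0 * subsets_ok / subsets_total
    return coverage >= minimum_coverage


strict = query_succeeds(9, 10)           # 90% coverage against a 100% requirement
relaxed = query_succeeds(9, 10, 80.0)    # 90% coverage against an 80% requirement
```

With the stricter threshold the whole query is reported as failed when one subset fails; with the relaxed threshold the caller gets results from the subsets that did respond.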
How would one go about setting up Elasticsearch so that it returns personalized results?
For example, I would want results returned to a particular user to rank higher if they clicked on a result previously, or if they "starred" that result in the past. You could also have a "hide" option that pushes results further down the ranking. From what I've seen with elasticsearch so far, it seems difficult to return different rankings to users based on that user's own dynamic data.
The solution would have to scale to thousands of users doing a dozen or so searches per day. Ideally, I would like the ranking to change in real-time, but it's not critical.
Elasticsearch provides a wide variety of scoring options, but to achieve what you describe you will need to do some additional work.
The function score query and the terms filter with document terms lookup would be the tools of choice.
First, create a document per user that lists the links (or link IDs) the user has visited and the links the user has liked. This should be housed in a separate index, and it should be kept up to date from the client side as the user interacts with results.
Now, when a user queries the data index, run a function score query with filter functions that point to these fields.
In this approach, as the filter is cached, you should get decent performance too.
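As a rough sketch of what such a request body could look like, the Python below builds a function score query whose filter functions use terms lookup against the per-user document. The index name `user_prefs`, the fields `title` and `link_id`, the paths `starred`, `clicked` and `hidden`, and the weights are all assumptions for illustration; adjust them to your own mapping.

```python
import json


def personalized_query(text, user_id, prefs_index="user_prefs"):
    """Build a function_score request body that boosts or demotes per user."""

    def lookup(path):
        # Terms lookup: read the list of link IDs from the user's own document.
        return {
            "terms": {
                "link_id": {"index": prefs_index, "id": user_id, "path": path}
            }
        }

    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "functions": [
                    {"filter": lookup("starred"), "weight": 3.0},
                    {"filter": lookup("clicked"), "weight": 1.5},
                    {"filter": lookup("hidden"), "weight": 0.1},
                ],
                # Multiply matching weights together, then into the text score.
                "score_mode": "multiply",
                "boost_mode": "multiply",
            }
        }
    }


body = json.dumps(personalized_query("hotel", "user-42"), indent=2)
```

Because the per-user lists live in their own documents, updating a star or a hide only touches that one small document, and the ranking changes on the next search.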
Take, for instance, an ecommerce store with catalog and price data in different web services. We know that Solr does not allow partial updates to a document field (JIRA bug), so how do you index these two services?
I had three possibilities, but I'm not sure which one is correct:
1. Partial update - not possible.
2. Solr join - have price and catalog in separate indexes and join them in Solr. You can't join them in your client-side code without screwing up pagination and facet counts. I don't know if this is possible in pre-Solr 4.0.
3. Have some sort of intermediate indexing service, which composes an entire document based on the results from both these services and sends it for indexing. However, there are two problems with this approach:
3.1 You can still compose documents partially, and then, when the document is complete, set a flag indicating that it is a complete document. However, to do this, each time a document has to be indexed, the service first has to check whether the document exists in the index, edit it and push it back. So, big performance hit.
3.2 Your intermediate service checks whether a particular id is available from all services; if not, it silently drops it and hopes that when it appears in the other service, the first service will already be populated. This is OK, but it means that an item is not available in search until all fields are available (not always desirable: if you don't have a price, you can simply set the item to out-of-stock and still have it available).
Of all these methods, only #3.2 looks viable to me. Does anyone know how you do this kind of thing with DIH? Because now you have two different entry points (2 different web services) into indexing, and each has to check the other.
The usual way to solve this is close to your 3.2: write code that creates the document you want to index from the different available services. The usual flow would be to fetch all the items from the catalog, then fetch the prices when indexing. Whether you want items from the catalog that don't have prices available to appear in the search depends on your business rules for the service. If you want to speed up the process (fetch product, fetch price, repeat), expand the API to fetch 1000 products and then the prices for all those products at the same time.
There is no reason why you should drop an item from the index if it doesn't have price, unless you don't want items without prices in your index. It's up to you and your particular need what kind of information you need to have available before indexing the document.
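The batch flow described in this answer might look roughly like the following Python sketch. The callables `fetch_prices` and `send_to_index` stand in for your price web service and your Solr client, and the `in_stock` field is a hypothetical way to keep items without a price searchable; none of these names come from Solr itself.

```python
def batches(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def compose_documents(products, prices):
    """Merge catalog items with prices; keep items that have no price yet."""
    docs = []
    for product in products:
        doc = dict(product)
        price = prices.get(product["id"])
        if price is None:
            doc["in_stock"] = False  # no price yet: still indexed, flagged out of stock
        else:
            doc["price"] = price
            doc["in_stock"] = True
        docs.append(doc)
    return docs


def index_all(products, fetch_prices, send_to_index, batch_size=1000):
    """Fetch prices one batch at a time, compose full documents, and index them."""
    total = 0
    for batch in batches(products, batch_size):
        prices = fetch_prices([p["id"] for p in batch])  # one call per batch
        docs = compose_documents(batch, prices)
        send_to_index(docs)
        total += len(docs)
    return total
```

Whether a missing price should flag the item or drop it entirely is a business decision; dropping it would be a one-line change in `compose_documents`.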
As far as I remember 4.0 will probably support partial updates as it moves to the new abstraction layer for the index files, although I'm not sure it'll make your situation that much more flexible.
Approach 3.2 is the most common, though I think about it slightly differently. First, think about what you want in your search results, then create one Solr document for each potential result, with as much information as you can get. If it is OK to have a missing price, then add the document that way.
You may also want to match the documents in Solr, but get the latest data for display from the web services. That gives fresh results and avoids skew between the batch updates to Solr and the live data.
Don't hold your breath for fine-grained updates to be added to Solr and Lucene. It gets a lot of its speed from not having record-level locking and update.