Filtering venues with the Foursquare API

I'm building an app where I want to get all restaurants in a given radius from a coordinate with the Foursquare API, but I want certain filters applied. Specifically, I want to filter by:
- whether the venue is currently open
- the price range of the venue
- keyword searches defined by the user
- one or more categories returned by the categories endpoint
My app is focused on restaurants, so I want to unconditionally filter the results to the "Food" category (ID 4d4b7105d754a06374d81259). The explore endpoint is documented to support openNow, price, and query filters, but it does not list support for a categoryId. It appears to have undocumented support for a categoryId filter, but the parameter only accepts a single ID, and the "or more" part of my last requirement is crucial.
The search endpoint is documented to support filtering by one or more categoryIds and a query, but not by openNow or price the way explore can. Is it possible to achieve the feature set I want with the Foursquare API without having to do my own data processing?
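For concreteness, a minimal sketch of the two calls being compared (Python with requests); the credentials, version date, coordinates, query, and the second category ID are placeholder assumptions, not values from the question:

```python
import requests

API = "https://api.foursquare.com/v2/venues"
AUTH = {"client_id": "CLIENT_ID", "client_secret": "CLIENT_SECRET", "v": "20180323"}
FOOD = "4d4b7105d754a06374d81259"  # top-level "Food" category

# explore: documented openNow/price/query filters; categoryId appears to
# work but only accepts a single ID.
explore = requests.get(f"{API}/explore", params={
    **AUTH, "ll": "40.7,-74.0", "radius": 800,
    "openNow": 1, "price": "1,2", "query": "ramen", "categoryId": FOOD,
}).json()

# search: documented support for a comma-separated categoryId list and a
# query, but no openNow or price filters.
search = requests.get(f"{API}/search", params={
    **AUTH, "ll": "40.7,-74.0", "radius": 800, "intent": "browse",
    "query": "ramen", "categoryId": f"{FOOD},SECOND_CATEGORY_ID",
}).json()
```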
UPDATE: I stumbled across a search recommendations endpoint that seems to meet my needs. However, the documentation for this endpoint says that it's only available to specific whitelisted partners. If it's not possible to get the feature set I want in the current public set of APIs, how would I go about getting on this whitelist?

Related

How to apply business logic to paginated search data?

Background:
We have an eCommerce site that uses Oracle Endeca to search and filter products and return the results paginated. Unfortunately, we have custom backend logic that applies special pricing depending on the customer, and it only gets applied after Endeca returns the paginated results. Our backend system is archaic (if you must know, Progress WDS-II). Also unfortunate is that we allow the user to sort the results by (catalog) price. So the problem is that the calculated price we actually show is not the price the results were sorted by.
Question:
Does anyone have any suggestions on how to handle this scenario? Architecturally, are there good designs for applying business logic and calculating fields post-query? Or is it as simple as "don't do it"? How do other search engines like Elasticsearch or Lucene apply business logic to search results?

Why can't ContinuationToken be used for paging in Azure Search API?

Reading the documentation for the Azure Search .NET SDK, I see that the ContinuationToken property is not supposed to be used for pagination (it is the same as the @odata.nextLink and @search.nextPageParameters properties in the REST API).
From the SDK documentation: "Note that this property is not meant to help you implement paging of search results. You can implement paging using the Top and Skip search parameters."
Why can't I use it for pagination? I have a situation where I want to run a query and then step through a static copy of the results page by page. However, I don't want those query results to change beneath my feet as I navigate through them while new documents are added to the underlying database. In my case, hundreds or thousands of results could be added in the minute or two between submitting the initial query and navigating to another page. How could I accomplish this?
Your question can be addressed in two parts:
1. Why is it not recommended to use ContinuationToken to implement pagination?
2. How can pagination be implemented such that results remain completely stable from page to page?
These are actually unrelated questions, since nothing about ContinuationToken guarantees the stability of the search results. Azure Search makes no consistency guarantees around paging, whether you use $top and $skip or ContinuationToken.
For question #1, the reason ContinuationToken is not recommended for paging is that Azure Search controls when the token is returned, not your application code. If you make assumptions about how and when Azure Search decides to return you a token, there's a chance those assumptions may break with a future service update. The intent of ContinuationToken is to prevent requests for too many documents from overwhelming the service, so you should assume that it is entirely at the service's discretion whether it will return a token.
For question #2, since Azure Search doesn't provide consistency guarantees, you can't completely avoid issues like the same document showing up in multiple pages, missing documents, or documents that are deleted by the time they are seen in results. Even if you wanted to build your own snapshot of the results and page over them in your application code, building a consistent snapshot isn't possible in the first place. However, if your only concern is to avoid showing new documents in the results, you can include a created timestamp field in your index and filter on that in every search request.
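A minimal sketch of that timestamp approach against the REST API (Python with requests); the service URL, index name, API key, query, and the created field are assumptions for illustration:

```python
import requests
from datetime import datetime, timezone

# Hypothetical service, index, and key values.
SERVICE = "https://myservice.search.windows.net"
INDEX = "products"
HEADERS = {"api-key": "QUERY_KEY", "Content-Type": "application/json"}

# Capture the moment the user ran the initial query; every subsequent page
# reuses it, so documents indexed afterwards never appear in the results.
query_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def fetch_page(page, page_size=50):
    body = {
        "search": "laptop",
        # 'created' must be a filterable DateTimeOffset field in the index.
        "filter": f"created lt {query_time}",
        "top": page_size,
        "skip": page * page_size,
        "orderby": "created desc",
    }
    resp = requests.post(
        f"{SERVICE}/indexes/{INDEX}/docs/search?api-version=2020-06-30",
        json=body, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["value"]
```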
Frankly, unless you're trying to export the entire contents of your index, I would question the need for such strong consistency guarantees around paging. Google and Bing make no such guarantees, so arguably user expectations are already set around this. If you are trying to export your data, this is unfortunately not easy with Azure Search today. In that case, please vote on this User Voice item to help the team prioritize this scenario.

Personalized Search Results for Elasticsearch

How would one go about setting up Elasticsearch so that it returns personalized results?
For example, I would want results returned to a particular user to rank higher if they clicked on a result previously, or if they "starred" that result in the past. You could also have a "hide" option that pushes results further down the ranking. From what I've seen of Elasticsearch so far, it seems difficult to return different rankings to users based on each user's own dynamic data.
The solution would have to scale to thousands of users doing a dozen or so searches per day. Ideally, I would like the ranking to change in real-time, but it's not critical.
Elasticsearch provides a wide variety of scoring options, but to achieve what you describe you will need to do some additional work.
The function score query and the terms filter with document terms lookup would be the tools of choice.
First, create a document per user recording the links (or link IDs) they visited and the links they liked. House these in a separate index, maintained from the client side so the user's actions keep the record up to date.
Now, when a user hits the data index, run a function score query with a filter function pointing at those fields.
In this approach, since the filter is cached, you should get decent performance too.
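A minimal sketch of that setup with a recent elasticsearch-py client; the index names, document IDs, field names, and boost weights are all assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One preference document per user, maintained from the client side.
es.index(index="user_prefs", id="user42", document={
    "starred_ids": ["doc1", "doc9"],
    "hidden_ids": ["doc3"],
})

# Function score query: boost results the user starred, bury ones they hid.
# Each terms lookup fetches an ID list from the user's preference document,
# and the fetched filter is cached, so repeated searches stay cheap.
resp = es.search(index="articles", query={
    "function_score": {
        "query": {"match": {"title": "elasticsearch"}},
        "functions": [
            {"filter": {"terms": {"_id": {
                "index": "user_prefs", "id": "user42", "path": "starred_ids"}}},
             "weight": 2.0},
            {"filter": {"terms": {"_id": {
                "index": "user_prefs", "id": "user42", "path": "hidden_ids"}}},
             "weight": 0.1},
        ],
        "score_mode": "multiply",
    },
})
```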

Do we need to call foursquare Venue Categories API at regular interval?

Please let me know whether we have to call the Foursquare venue categories API at a regular interval, or whether we can call it only once, store the category list in our database, and use it when searching for items.
If category IDs never change, the store-once scenario will work for me.
Yes, you should call the categories endpoint at a regular interval, but that interval can be large.
Foursquare makes changes to the categories; we call the endpoint once a month or so (manually, actually) to update the hierarchy that we cache on our side.
We have not seen a category ID change; rather, more categories are added over time, and perhaps some are removed (we're not really sure about removals).
It happens rarely, but we sometimes hit an error when we encounter a category ID that we do not recognize, and then we need to go refresh the categories list and rebuild our cache.
From the API docs (https://developer.foursquare.com/docs/venues/categories):
"...please download this list only once per session, but also avoid caching this data for longer than a week to avoid stale information."
So, you can store the list in your database, but you should refresh this data at least once a week.
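A minimal caching sketch along those lines (Python with requests); the credentials, version date, and file-based storage are placeholder assumptions, and the one-week window follows the docs quoted above:

```python
import json
import os
import time

import requests

CACHE_FILE = "categories.json"
MAX_AGE = 7 * 24 * 3600  # refresh at least once a week, per the docs

def get_categories():
    # Serve from the cache while it is fresh enough.
    if os.path.exists(CACHE_FILE) and time.time() - os.path.getmtime(CACHE_FILE) < MAX_AGE:
        with open(CACHE_FILE) as f:
            return json.load(f)
    # Otherwise re-fetch the full hierarchy and rebuild the cache.
    resp = requests.get(
        "https://api.foursquare.com/v2/venues/categories",
        params={"client_id": "CLIENT_ID", "client_secret": "CLIENT_SECRET",
                "v": "20180323"})
    resp.raise_for_status()
    categories = resp.json()["response"]["categories"]
    with open(CACHE_FILE, "w") as f:
        json.dump(categories, f)
    return categories
```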

How does solr work with data split into different services and therefore not synchronously available?

Take, for instance, an e-commerce store with catalog and price data in different web services. Now, we know that Solr does not allow partial updates to a document field (there is a JIRA issue for this), so how do you index these two services?
I see three possibilities, but I'm not sure which one is correct:
1. Partial update - not possible.
2. Solr join - keep price and catalog in separate indexes and join them in Solr. You can't join them in your client-side code without screwing up pagination and facet counts. I don't know if this was possible pre-Solr 4.0.
3. Have some sort of intermediate indexing service, which composes an entire document based on the results from both these services and sends it for indexing. However, there are two problems with this approach:
3.1 You can compose documents partially and set a flag once the document is complete, indicating that it is a complete document. However, each time a document has to be indexed, the service must first check whether the document already exists in the index, edit it, and push it back. So: big performance hit.
3.2 Your intermediate service checks whether a particular ID is available from all services; if not, it silently drops it and hopes that by the time the item appears in the other service, the first service will already be populated. This is OK, but it means an item is not available in search until all fields are available (not always desirable - if you don't have a price, you could simply mark the item out-of-stock and still have it searchable).
Of all these methods, only #3.2 looks viable to me. Does anyone know how you do this kind of thing with the DataImportHandler (DIH)? Because now you have two different entry points into indexing (two different web services), and each has to check the other.
The usual way to solve this is close to your 3.2: write code that creates the document you want to index from the different available services. The usual flow would be to fetch all the items from the catalog, then fetch the prices while indexing. Whether you want items from the catalog that don't have prices to appear in search depends on your business rules for the service. If you want to speed up the process (fetch product, fetch price, repeat), expand the API to fetch 1000 products and then the prices for all of those products at the same time.
There is no reason to drop an item from the index just because it doesn't have a price, unless you don't want items without prices in your index. It's up to you and your particular needs what information has to be available before indexing the document.
As far as I remember, Solr 4.0 will probably support partial updates as it moves to the new abstraction layer for the index files, although I'm not sure it'll make your situation that much more flexible.
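A minimal sketch of that merge-then-index flow (Python with requests); the two service URLs, their batch APIs, and the field names are hypothetical:

```python
import requests

SOLR_UPDATE = "http://localhost:8983/solr/products/update?commit=true"

def index_batch(product_ids):
    # Fetch catalog data and prices for the whole batch in two calls,
    # then compose one complete Solr document per product.
    catalog = requests.get("http://catalog.internal/api/products",
                           params={"ids": ",".join(product_ids)}).json()
    prices = requests.get("http://pricing.internal/api/prices",
                          params={"ids": ",".join(product_ids)}).json()
    price_by_id = {p["id"]: p["price"] for p in prices}

    docs = []
    for item in catalog:
        doc = {"id": item["id"], "name": item["name"],
               "category": item["category"]}
        # A missing price doesn't drop the item; mark it unavailable instead.
        if item["id"] in price_by_id:
            doc["price"] = price_by_id[item["id"]]
            doc["in_stock"] = True
        else:
            doc["in_stock"] = False
        docs.append(doc)

    # Solr's update handler accepts a JSON array of documents.
    requests.post(SOLR_UPDATE, json=docs).raise_for_status()
```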
Approach 3.2 is the most common, though I think about it slightly differently. First, think about what you want in your search results, then create one Solr document for each potential result, with as much information as you can get. If it is OK to have a missing price, then add the document that way.
You may also want to match the documents in Solr, but get the latest data for display from the web services. That gives fresh results and avoids skew between the batch updates to Solr and the live data.
Don't hold your breath for fine-grained updates to be added to Solr and Lucene; Lucene gets a lot of its speed from not having record-level locking and updates.
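And a sketch of the match-in-Solr, display-from-live-services pattern suggested above; the endpoints and field names are hypothetical:

```python
import requests

def search(query, page=0, rows=20):
    # Match and rank in Solr, fetching only IDs and names.
    solr = requests.get("http://localhost:8983/solr/products/select",
                        params={"q": query, "start": page * rows,
                                "rows": rows, "fl": "id,name"}).json()
    hits = solr["response"]["docs"]
    ids = [h["id"] for h in hits]
    # Overlay live prices so the display never lags the pricing service.
    live = requests.get("http://pricing.internal/api/prices",
                        params={"ids": ",".join(ids)}).json()
    price_by_id = {p["id"]: p["price"] for p in live}
    for h in hits:
        h["price"] = price_by_id.get(h["id"])
    return hits
```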
