I see that the Foursquare API has a 50-result limit, and this concerns me a bit. My problem is this:
"A user would scan an item and it would find all stores that have that item within x km of them".
Now in some categories like groceries I don't think the 50-result limit will be too bad, as I can't really think of an area that has 50 grocery stores, even if I raise the search area to the max (50 km).
However, what if that item is a piece of clothing and you are in, say, a big mall? I would think that in this situation the 50-result limit can be hit.
I would get 50 results back and then scan my database to see if any of those stores carries that clothing. Nothing comes back, but in reality, if I could have had 51 stores returned, that 51st store would have had it.
Anyone have any ideas to prevent this?
Yes, there is a way around this that involves running multiple searches that specify slightly different locations within the same general area.
This has been answered before here: How do I get more locations?
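A minimal sketch of the multiple-search idea, assuming a caller-supplied `search(lat, lng)` function (hypothetical, not part of any Foursquare client) that runs one venue search and returns a list of venue dicts with an "id" key. The grid spacing and the flat km-to-degree conversion are illustrative assumptions only.

```python
import math


def grid_centers(lat, lng, half_width_km, step_km):
    """Return (lat, lng) points on a square grid centred on (lat, lng)."""
    km_per_deg_lat = 111.32  # rough approximation, good enough for small areas
    km_per_deg_lng = 111.32 * math.cos(math.radians(lat))
    steps = int(half_width_km // step_km)
    centers = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            centers.append((lat + i * step_km / km_per_deg_lat,
                            lng + j * step_km / km_per_deg_lng))
    return centers


def collect_venues(centers, search):
    """Run one search per centre and merge the results by venue id."""
    seen = {}
    for lat, lng in centers:
        for venue in search(lat, lng):  # `search` is the assumed wrapper
            seen.setdefault(venue["id"], venue)
    return list(seen.values())
```

Each individual search is still capped at 50 results, so the step should be small enough that no single cell is likely to hit the cap. Overlapping searches return the same venue more than once, which is why results are merged by id.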
Case in point: say we have a search query that returns 2000 results ranging from very relevant to hardly relevant at all. When this is sorted by relevance this is fine, as the most relevant results are listed on the first page.
However, when sorting by another field (e.g. user rating) the results on the first page are full of hardly-relevant results, which is a problem for our client. Somehow we need to only show the 'relevant' results with highest ratings.
I can only think of a few solutions, all of which have problems:
1 - Filter out listings on the Solr side if the relevancy score is under a threshold. I'm not sure how to do this, and from what I've read this isn't a good idea anyway. E.g. if a query returns only 10 listings I would want to display them all instead of filtering any out. It seems impossible to determine a threshold that would work across the board. If anyone can show me otherwise, please show me how!
2 - Filter out listings on the application side based on score. This I can do without a problem, except that now I can't implement pagination, because I have no way to determine the total number of filtered results without returning the whole set, which would affect performance, bandwidth, etc. It also has the same problems as the first point.
3 - Create a sort of 'combined' sort that aggregates a score between relevancy and user rating, which the results will then be sorted on. Firstly I'm not sure if this is even possible, and secondly it would be weird for the user if the results aren't actually listed in order of rating.
How has this been solved before? I'm open to any ideas!
Thanks
If they're not relevant, they should be excluded from the result set. Since you want to order by a dedicated field (i.e. user rating), you'll have to tweak how you decide which documents to include in the result at all.
In any case you'll have to define what "relevant enough" means, since scores aren't really comparable between queries and don't tell you "this was xyz relevant".
You'll have to work out why the documents that are included aren't relevant and exclude them based on those criteria. Then you can either use the review score to boost documents further up (if you want the search to appear organic, i.e. ordered by relevance), or just exclude the irrelevant ones and sort by user score. But remember that user score, as an experience for the user, is usually harder to make relevant than simply ordering by the average of the votes.
Usually the client can choose different ordering options, by relevance or ratings for example. But you are right that ordering by rating is probably not useful enough. What you could do is take into account the rating in the relevance scoring. For example, by multiplying an "organic" score with a rating transformed as a small boost. In Solr you could do this with Function Queries. It is not hard science, and some magic is involved. Much is common sense. And it requires some very good evaluation and testing to see what works best.
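To illustrate the "organic score times a small rating boost" idea, here is a plain Python sketch. It is not Solr syntax; in Solr the same arithmetic would live in a function query. The field names `score` and `rating`, the 0 to 5 rating scale and the `weight` of 0.2 are all assumptions that would need tuning and evaluation.

```python
def boosted_score(organic_score, rating, max_rating=5.0, weight=0.2):
    """Multiply the organic score by a small rating-derived boost.

    The multiplier runs from 1.0 (rating 0) to 1.0 + weight (top rating),
    so the rating nudges the order without overriding relevance.
    """
    return organic_score * (1.0 + weight * (rating / max_rating))


def rerank(docs, weight=0.2):
    """Sort documents by boosted score, highest first."""
    return sorted(
        docs,
        key=lambda d: boosted_score(d["score"], d["rating"], weight=weight),
        reverse=True,
    )


docs = [
    {"id": "a", "score": 10.0, "rating": 1.0},
    {"id": "b", "score": 9.5, "rating": 5.0},
    {"id": "c", "score": 2.0, "rating": 5.0},
]
ranked_ids = [d["id"] for d in rerank(docs)]
```

With these made-up numbers, the well-rated and nearly as relevant "b" overtakes "a", while the highly rated but barely relevant "c" stays last.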
Alternatively, if you do not want to treat it as a retrieval problem, you can apply faceting and let users do filtering of the results by rating. Let users help themselves. But I can imagine this does not work in all domains.
Engineers can define what relevancy is. Content similarity scoring is not the only thing that constitutes relevancy. Many Information Retrieval researchers and engineers agree that contextual information should be used in addition to content similarity. This opens up a plethora of possibilities for defining a retrieval model. For example, Learning to Rank (LTR) approaches have become popular: different features are learnt from search logs to deliver more relevant documents to users given their user profiles and prior search behavior. Solr offers this as a module.
first time posting.
I wanted to ask if anyone knows how I can search on YouTube for, let's say, music videos that have been viewed a number of times within a set range. Like the title says, for example, between 9 and 11 million times.
One reason I want to do this is because I want to find good music that I haven't heard before. The logic I'm working on is that the Got Talent type videos that get viewed millions of times are generally viewed that many times for one of two reasons: 1) they're amazing, or 2) they're embarrassingly horrible.
And though I don't think a song being popular will necessarily mean I'll like it, I'm hoping this method will be successful to some degree.
Another reason is to look for trailers for independent films with a similar logic as above. Though with these movies I think I only hear about them six months to a year after they've been released because they're flying under the radar.
If I were to be able to search for movie trailers with 'x' number of views though.. for example, between 500,000 and a million, maybe I'd be able to find movies that I'll like quicker than via time passing and them getting mentioned to me by a friend.
Any help would be greatly appreciated, as I've wanted to be able to perform this kind of search for a while now.
thanks
You will need to use YouTube API v3.
I haven't written this exact request, but it looks like you can list videos and then filter by chart=mostPopular
https://developers.google.com/youtube/v3/docs/videos/list
Perhaps a bit of background reading on the API would help too...
https://developers.google.com/youtube/v3/
First off, you would need the Youtube Data API. "v3" means nothing because it's simply the current version, like "Windows 10."
The API lets you get a video's view count, but doesn't put it in a range like 9 million to 11 million.
Youtube's own search function is pretty sophisticated. For instance,
https://www.youtube.com/results?search_query=movie+trailer&search_sort=video_view_count&filters=month
This gives all results for "movie trailer" within the last month, sorted by view count. You can customize the URL, e.g. "week" instead of "month" would return only trailers from the last week. Or "year", etc. Essentially this is a "Videos: list: mostPopular" query with a subject filter.
I have a few YouTube API scripts, and I hardly think it's worth the hassle to do it that way when YouTube's advanced search gets you 99% of the way there. If you did, you would need to do a Search: list query for a given subject (e.g. "movie trailer"), limited to a given time frame (e.g. the last month). Then, for each video ID, make a Videos: list query to get its view count, and print them all sorted by views.
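A rough sketch of that two-step approach, assuming you have an API key for the YouTube Data API. The function names here are made up for illustration. To keep it short it reads only the first page of search results (up to 50) and passes all video IDs to a single Videos: list call instead of one call per ID. The `get` parameter exists only so the logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://www.googleapis.com/youtube/v3"


def api_get(resource, params):
    """GET one YouTube Data API resource and decode the JSON response."""
    url = f"{API_ROOT}/{resource}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def videos_in_view_range(query, published_after, low, high, api_key, get=api_get):
    """Return (views, video_id, title) tuples with low <= views <= high."""
    # Step 1: Search: list for the subject, limited to a time frame.
    search = get("search", {
        "part": "snippet", "q": query, "type": "video",
        "publishedAfter": published_after,  # RFC 3339, e.g. 2024-01-01T00:00:00Z
        "maxResults": 50, "key": api_key,
    })
    ids = [item["id"]["videoId"] for item in search.get("items", [])]
    if not ids:
        return []
    # Step 2: Videos: list to fetch the view counts for those IDs.
    details = get("videos", {
        "part": "snippet,statistics", "id": ",".join(ids), "key": api_key,
    })
    hits = []
    for item in details.get("items", []):
        views = int(item["statistics"].get("viewCount", 0))
        if low <= views <= high:
            hits.append((views, item["id"], item["snippet"]["title"]))
    return sorted(hits, reverse=True)  # most viewed first
```

A call such as `videos_in_view_range("movie trailer", "2024-01-01T00:00:00Z", 500_000, 1_000_000, "YOUR_API_KEY")` would then list matching trailers. The range filter can only be applied after the fact, to whatever the search happened to return.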
Is it possible to search venues (via venues/search) in a whole city without passing the "radius" parameter? Because I don't know the radius of each city :) The documentation says "Searches can be done near a point or through a whole city", but how can I do this in venues/search?
Thanks.
I do not think there is a way to tell it 'search the entire city', but I also think it might be a wrong use case.
You need to remember a few things when searching:
Foursquare will return up to 50 results (the limit parameter)
The 50 results are ordered by the most popular places around the center of your search
So if you are searching a city that has more than 50 venues in the Foursquare database, 'searching the entire city' will usually get you the same (up to) 50 results every time.
This is where filters come in handy. In our case, to get better results for our needs, we combine categoryId with radius to get the things we want to show our users. Sometimes we get information from other cities because of a big radius, but for our application that's okay; we actually give our customers more options :) . I would also guess that a lot of apps use the query filter, since they know the name of the place they are looking for.
You just need to experiment with it and discover how to get the data which is right to your application.
In theory, to search an entire city I would take the city's lat/lng from Google, OpenStreetMap or GeoNames and do a 10 km search around that point (intent=browse, radius=10000). This is a guess, but it should return 50 places for over 99% of the cities where people who own smartphones live :)
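As a sketch of what such a request could look like, the helper below only builds the venues/search URL and does not send it. The function name is made up, and the coordinates in the usage example are placeholders; the credentials and version date have to be supplied by you.

```python
from urllib.parse import urlencode


def city_search_url(lat, lng, client_id, client_secret, version,
                    radius_m=10000, category_id=None):
    """Build a venues/search URL that browses a radius around a city centre."""
    params = {
        "ll": f"{lat},{lng}",
        "intent": "browse",
        "radius": radius_m,   # metres
        "limit": 50,          # the maximum number of results per request
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,         # YYYYMMDD
    }
    if category_id:
        params["categoryId"] = category_id  # optional filter, as described above
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


url = city_search_url(1.29, 103.85, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "YYYYMMDD")
```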
You can obtain results within a city as follows:
https://api.foursquare.com/v2/venues/search?near=Singapore,Singapore&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=YYYYMMDD
For more details check the documentation:
https://developer.foursquare.com/docs/venues/explore
Assuming you're talking about requests with a query, I would just set a reasonable value for radius and use the city's default city center. If you want to avoid showing results from neighboring cities, you can post-request filter by the returned venue's "city" string in the location stanza.
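The post-request filter could look roughly like this, assuming each returned venue is a dict whose "location" object may or may not include a "city" string. The helper name and the sample venues are made up for illustration.

```python
def venues_in_city(venues, city):
    """Keep only venues whose location.city matches `city` (case-insensitive)."""
    wanted = city.strip().casefold()
    return [
        v for v in venues
        if v.get("location", {}).get("city", "").strip().casefold() == wanted
    ]


sample = [
    {"name": "Cafe A", "location": {"city": "Singapore"}},
    {"name": "Cafe B", "location": {"city": "Johor Bahru"}},
    {"name": "Cafe C", "location": {}},  # venues without a city are dropped
]
kept = [v["name"] for v in venues_in_city(sample, "singapore")]
```

Venues with no city string are dropped here; whether to keep them instead is a judgment call for your application.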
I'm trying to match existing data to Foursquare Venues. I've tried matching about 100,000 records using intent=match and 30% of them don't return results. Now, sometimes these venues are actually missing, but sometimes the search just isn't finding results that would be obvious to a human. For example:
https://api.foursquare.com/v2/venues/search?intent=match&ll=40.075800000000001,-80.698800000000006&query=19%20TH%20HOLE
That returns no results. However, if I search for "19TH HOLE" I do get a result.
I could just add all these non-matches to Foursquare, but it seems that I'd end up creating a whole lot of duplicates... and I don't want to abuse the system. We're trying to make Foursquare our Venues database, and we can't go and process 300,000 records without matches by hand, either.
I'm open to suggestions on what else I can do.
You can "relax" the search strictness by specifying intent=checkin or intent=browse and using your own criteria to determine if the top result is the one you're looking for.
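One possible "own criteria" is a fuzzy name comparison on normalized names, sketched below with the standard library. It assumes each candidate venue is a dict with a "name" key; the 0.85 threshold is an arbitrary starting point, and this version checks every returned candidate rather than only the top one.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def pick_match(record_name, venues, threshold=0.85):
    """Return the best-matching venue, or None if nothing is close enough."""
    best, best_score = None, 0.0
    for venue in venues:
        score = name_similarity(record_name, venue["name"])
        if score > best_score:
            best, best_score = venue, score
    return best if best_score >= threshold else None
```

Normalizing first means "19 TH HOLE" and "19th Hole" compare as identical. In practice you would probably also check distance or address before accepting a match, to avoid confusing two nearby venues with similar names.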
The company I work for is in the business of sending press releases. We want to make it possible for interested parties to search for press releases based on a number of criteria, the most important being location. For example, someone might search for all news sent to New York City, Massachusetts, or ZIP code 89134, sent from a governmental institution, under the topic of "traffic". Or whatever.
The problem is, we've sent, literally, hundreds of thousands of press releases. Searching is slow and complex. For example, a press release sent to Queens, NY should show up in the search I mentioned above even though it wasn't specifically sent to New York City, because Queens is a subset of New York City. We may also want to implement "and" and "or" and negation and text search to the query to create complex searches. These searches also have to be fast enough to function as dynamic RSS feeds.
I really don't know anything about search theory, or how it's properly done. The way we are getting by right now is using a data mart to store the locations the releases were sent to in a single table. However, because of the subset issue mentioned above, the data mart is gigantic, with millions of rows. And we haven't even implemented cities yet; there are about 50,000 cities in the United States, which will increase the size of the data mart so much that I'm afraid it just won't work anymore.
Anyway, I realize this is not a simple question and there won't be a "do this" answer. However, I'm hoping one of you can point me in the right direction where I can learn about how massive searches are done? Because I really know nothing about it. And such a search engine is turning out to be incredibly difficult to make. Thanks! I know there must be a way because if Google can search the entire internet we must be able to search our own database :-)
Google can search the entire internet, and your data via a Google Appliance!