Google Vision product search indexing

Google Vision product search indexing - python-3.x

I have a question regarding Google Vision product search.
I know the Product Search index of products is updated approximately every 30 minutes. Does indexTime reset to default value "1970-01-01T00:00:00Z" on an unused ProductSet?

That means that it hasn't been indexed yet

Related

Why does Azure Search give higher score to less relevant document?

I have two documents indexed in Azure Search (among many others):
Document A contains only one instance of "BRIG" in the whole document.
Document B contains 40 instances of "BRIG".
When I do a simple search for "BRIG" in the Azure Search Explorer via Azure Portal, I see Document A returned first with "#search.score": 7.93229 and Document B returned second with "#search.score": 4.6097126.
There is a scoring profile on the index that adds a boost of 10 for the "title" field and a boost of 5 for the "summary" field, but this doesn't affect these results as neither have "BRIG" in either of those fields.
There's also a "freshness" scoring function with a boost of 15 over 365 days with a quadratic function profile. Again, this shouldn't apply to either of these documents as both were created over a year ago.
I can't figure out why Document A is scoring higher than Document B.

It's possible that document A is 'newer' than document B and that's the reason why it's being displayed first (has a higher score). Besides Term relevance, freshness can also impact the score.
EDIT:
After some research it looks like that newer created Azure Cognitive Search uses BM25 algorithm by default. (source: https://learn.microsoft.com/en-us/azure/search/index-similarity-and-scoring#scoring-algorithms-in-search)
Document length and field length also play a role in the BM25 algorithm. Longer documents and fields are given less weight in the relevance score calculation. Therefore, a document that contains a single instance of the search term in a shorter field may receive a higher relevance score than a document that contains the search term multiple times in a longer field.

Test your scoring profile configurations. Perhaps try issuing queries without scoring profiles first and see if that meets your needs.
The "searchMode" parameter controls precision and recall. If you want more recall, use the default "any" value, which returns a result if any part of the query string is matched. If you favor precision, where all parts of the string must be matched, change searchMode to "all". Try the above query both ways to see how searchMode changes the outcome. See Simple Query Examples.
If you are using the BM25 algorithm, you also may want to tune your k1 and b values. See Set BM25 Parameters.
Lastly, you may want to explore the new Semantic search preview feature for enhanced relevance.

Paging in Azure search when results have equal scores

I'm using Azure Search on my e-commerce site, and now i faced the problem with paging on my search page. When i reload the search page i can get different order of products. So when i'm using paging i can see same products on different pages, and this is critical.
I started researching what's going wrong, and i've found this info on Microsoft docs https://learn.microsoft.com/en-us/rest/api/searchservice/add-scoring-profiles-to-a-search-index#what-is-default-scoring
Search score values can be repeated throughout a result set. For
example, you might have 10 items with a score of 1.2, 20 items with a
score of 1.0, and 20 items with a score of 0.5. When multiple hits
have the same search score, the ordering of same scored items is not
defined, and is not stable. Run the query again, and you might see
items shift position. Given two items with an identical score, there
is no guarantee which one appears first.
So if i got it correctly, i face this issue because products has same score.
How to fix this?

You got it correctly! Because the products you are getting have the same score, there is no guarantee which one appears first.
In order to avoid it in this stage, you can add to your $orderby parameter a field that has unique values, and that way you guarantee the same order. However, this approach doesn’t take scoring into account. We are currently working on a solution to this problem. We will update this answer once the solution is available (the ETA at this point is weeks, not months).

Please note that you can now use search.score() function to order by score:
From the link below:
https://learn.microsoft.com/en-us/rest/api/searchservice/odata-expression-syntax-for-azure-search.
"You can specify multiple sort criteria. The order of expressions determines the final sort order. For example, to sort descending by score, followed by rating, the syntax would be $orderby=search.score() desc,rating desc."

Azure Search Distance filter with variable distance

Suppose I have the following scenario:
A search UI to allow individuals to find plumbers who are able to service their home location.
When a plumber enters their info into the system, they provide their coordinates and a maximum distance they are willing to travel.
The individual can then enter their home coordinates and should be presented with a list of plumbers who are eligible.
Looking at the Azure Search geo.distance function, I cannot see how to do this. Scenarios where the searcher provides a distance are well covered but not where the distance is different for each search record.
The documentation provides the following example:
$filter=geo.distance(location, geography'POINT(-122.131577 47.678581)') le 10
This works correctly but if I try and change the 10 to the maxDistance field, it fails with
Comparison must be between a field, range variable or function call
and a literal value
My requirement seems fairly basic but am now wondering if this is currently possible with Azure Search?

I found an azure feedback suggestion asking for this feature but no news on if/when it will be implemented. Therefore it is safe to assume that this scenario is not currently supported.

To add to Paul's answer, one possible workaround is to use a conservatively large constant value instead of referencing the maxDistance field in your $filter expression. Then, you can filter the resulting list of plumbers on the client to take each plumber's max distance into account and produce final list of plumbers.

Counts of web search hits [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
I have a set of search queries in the size of approx. 10 millions. The goal is to collect the number of hits returned by a search engine for all of them. For example, Google returns about 47,500,000 for the query "stackoverflow".
The problem is that:
1- Google API is limited to 100 query per day. This is far from being useful to my task since I would have to get lots of counts.
2- I used Bing API but it does not return an accurate number. Accureate in the sense of matching the number of hits shown in Bing UI. Has anyone came across this issue before?
3- Issuing search queries to a search engine and parsing the html is one solution but it results in CAPTCHA and does not scale to this number of queries.
All I care about is that the number of hits and I am open for any suggestion.

Well, I was really hoping that someone would answer this since this is something that I also was interested in finding out but since it doesn't look like anyone will I will throw in these suggestions.
You could set up a series of proxies that change their IP every 100 requests so that you can query google as seemingly different people (seems like a lot of work). Or you can download wikipedia and write something to parse the data there so that when you search a term you can see how many pages it falls in. Of course that is a much smaller dataset than the whole web but it should get you started. Another possible data source is the google n-grams data which you can download and parse to see how many books and pages the search terms fall in. Maybe a combination of these methods could boost the accuracy on any given search term.
Certainly none of these methods are as good as if you could just get the google page counts directly but understandably that is data they don't want to give out for free.

I see this is a very old question but I was trying to do the same thing which brought me here. I'll add some info and my progress to date:
Firstly, the reason you get an estimate that can change wildly is because search engines use probabilistic algorithms to calculate relevance. This means that during a query they do not need to examine all possible matches in order to calculate the top N hits by relevance with a fair degree of confidence. That means that when the search concludes, for a large result set, the search engine actually doesn't know the total number of hits. It has seen a representative sample though, and it can use some statistics about the terms used in your query to set an upper limit on the possible number of hits. That's why you only get an estimate for large result sets. Running the query in such a way that you got an exact count would be much more computationally intensive.
The best I've been able to achieve is to refine the estimate by tricking the search engine into looking at more results. To do this you need to go to page 2 of the results and then modify the 'first' parameter in the URL to go way higher. Doing this may allow you to find the end of the result set (this worked for me last year I'm sure although today it only worked up to the first few thousand). Even if it doesn't allow you to get to the end of the result set you will see that the estimate gets better as the query engine considers more hits.
I found Bing slightly easier to use in the above way - but I was still unable to get an exact count for the site I was considering. Google seems to be actively preventing this use of their engine which isn't that surprising. Bing also seems to hit limits although they looked more like defects.
For my use case I was able to get both search engines to fairly similar estimates (148k for Bing, 149k for Google) using the above technique. The highest hit count I was able to get from Google was 323 whereas Bing went up to 700 - both wildly inaccurate but not surprising since this is not their intended use of the product.
If you want to do it for your own site you can use the search engine's webmaster tools to view indexed page count. For other sites I think you'd need to use the search engine API (at some cost).

Twitter search API: get more results and since a specified date

I’m working on a simple search machine for Twitter where I want to extract all search results of an word (or words) since the dawn of time (or anyway Twitter). Is that possible?
I can only retrieve 100 results ordered by recently added tweets but I want to for example see how many times “facebook” has been twitted last month, last year, and so on.
I’ve tried this URL: http://search.twitter.com/search.atom?lang=en&since=2006-01-01&rpp=1000&q=facebook but it still doesn’t give me more than 100 results. I’ve read that the rpp parameter has a maximum of 100 but is there a way to “scroll” through the list and get all results?

Offsets and pagination is probably your best bet

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string