I'm looking for a search engine solution whereby there are attributes for each document which can be filtered against, but not absolutely - only scored.
doc1 has attributes a, b and c
doc2 has attributes b and c
If a user chooses attribute "a" only, doc2 shouldn't be removed completely; doc1 should just score higher.
Are there any search engines that can do something like that?
You can build something like that with Solr. You wouldn't use the built-in facets; instead, you would expose the attributes as links (perhaps in a facet-like way) and use them to add a scoring weight to that attribute, which can be sent as a parameter in a Solr query on the fly.
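As a rough illustration of that idea, the sketch below builds Solr request parameters in which each selected attribute becomes a boost clause rather than a filter. It assumes the (e)dismax query parser and its `bq` (boost query) parameter; the field name `attributes`, the weight of 5.0 and the helper name are hypothetical and would need to match your own schema.

```python
from urllib.parse import urlencode


def build_boost_params(user_query, selected_attrs, field="attributes", weight=5.0):
    """Return Solr query parameters that boost, but do not filter on, attributes.

    `field` and `weight` are illustrative assumptions, not Solr defaults.
    """
    params = [("defType", "edismax"), ("q", user_query)]
    for attr in selected_attrs:
        # One bq clause per selected attribute: matching documents score
        # higher, non-matching documents still appear in the results.
        params.append(("bq", f'{field}:"{attr}"^{weight}'))
    return params


if __name__ == "__main__":
    # Query string that could be appended to a /select request.
    print(urlencode(build_boost_params("laptop", ["a"])))
```

Because the attribute goes into `bq` instead of `fq`, doc2 from the question would still be returned, only with a lower score than doc1.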
In Azure Search you can boost a query by using first term || secondterm^2 to give more weight to a particular part of the query.
You can also filter for documents that belong to a particular group using search.in(): https://learn.microsoft.com/en-us/azure/search/search-query-odata-search-in-function
$filter=group_ids/any(g: search.in(g, '123, 456, 789'))
What I would like to do is boost a document if it belongs to a group but not restrict to only documents that belong to that group.
Something like:
search=mysearchterm^3&$filter=group_ids/any(g: search.in(g, '123, 456, 789'))^2
But (1) boosting on the filter doesn't seem to work and (2) this would restrict to only documents that belong to group 123,456,789. I would like it to only boost if it belongs, but not restrict to only those groups.
Is this possible?
I've had a look at "Tag Boosting": https://azure.microsoft.com/en-au/blog/personalizing-search-results-announcing-tag-boosting-in-azure-search/
But it doesn't seem relevant: it only seems possible to tag boost a top-level string field, not a string or int field within a collection.
UPDATE:
I think I've figured it out:
search=mysearchterm AND (group_id:123)^2
If you're using a complex collection it might be:
search=mysearchterm AND (groups/id:123)^2
Filters are designed to constrain your result set by filtering out documents that don't match the filter criteria. This process happens before term matching and scoring. Given that this is not what you are trying to do, I don't think filters are the right tools for the job.
You can try to instead craft a Lucene query that includes the logic you described:
mysearchterm OR (mysearchterm AND group_ids:(123 OR 456 OR 789))^2
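A minimal sketch of how such a query string could be assembled programmatically follows. It assumes the request is sent with `queryType` set to `full` so that the Lucene syntax is honored; the function names, the default field name `group_ids` and the boost value are illustrative only.

```python
import json


def build_boosted_search(term, group_ids, field="group_ids", boost=2):
    """Build a Lucene query that boosts, but does not require, group membership."""
    groups = " OR ".join(str(g) for g in group_ids)
    # The first clause matches every document containing the term; the second,
    # boosted clause matches again only for documents in one of the groups.
    return f"{term} OR ({term} AND {field}:({groups}))^{boost}"


def build_request_body(term, group_ids):
    """Request body for a search call; assumes full Lucene query parsing."""
    return {"search": build_boosted_search(term, group_ids), "queryType": "full"}


if __name__ == "__main__":
    print(json.dumps(build_request_body("mysearchterm", [123, 456, 789]), indent=2))
```

Documents outside the listed groups still match through the first clause, so nothing is filtered out; group members simply pick up extra score from the second clause.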
I'm looking into ArangoSearch for the first time, and it looks like pretty good functionality.
However, in all the tutorials, despite the ability to tell it to index all fields, one cannot do a 'blind' search across all fields of the document. Consider the example below:
FOR d in myView SEARCH d.text IN ["quick", "brown"] RETURN d
I don't seem to have the ability to just search d entirely without specifying each individual field that I want to include in my search. Is that correct, and if so, why is that, and are there workarounds? I'm dealing with a lot of different collections with a lot of different fields that can contain a relevant term; it would be a shame to have to tabulate all of them to make an expansive search.
Can someone confirm the behaviour of the Smart search results webpart when using a Smart search filter on a particular field (documentation here), when the index, and the expected results, are composed of multiple page types?
In my scenario I have 2 page types, one is always a child of the other, my hypothetical scenario would be a Folder and File types as an example.
I've configured the index with the Pages type and the Standard analyzer to include all Folder and File types under the path /MyOS/% in the tree.
The search page includes the Smart search results webpart and a Smart search filter: a checkbox for the File's field FileIsHidden.
What I'm trying to ascertain is the possibility for the results to include all folders that have a hidden field, as well as the files?
Client has a v8.2 license and now has a requirement similar to this scenario.
Thanks so much for any help in advance.
First, I would download the latest version of LUKE, a Lucene inspector that allows you to run queries, inspect the data, etc.
https://code.google.com/archive/p/luke/downloads
Your search indexes are in App_Data/Modules/SmartSearch/[SearchName]. I am not sure whether LUKE can query two indexes at the same time, but you can run the same query against both and see whether it's filtering out results one way or another.
If you are querying for a field that must have a certain value, and the other page type does not have that field, those pages are probably filtered out. What you need to do is use the Lucene syntax to say, so to speak, "(classname = 'cms.file' and fileonlyproperty = '' OR classname <> 'cms.file')".
You'll have to test, but assuming the class names are cms.file and cms.folder and the property is FileIsHidden, I think the syntax would be:
+((FileIsHidden:(true) AND classname:('cms.file')) OR (NOT classname:('cms.file')))
But you'll have to test that.
I have the following situation when using Solr. My documents contain "entities", for example "peanut butter", and I have a list of such entities. These are items that go together and are not to be treated as two individual words. During indexing, I want Solr to realize this and treat "peanut butter" as an entity. For example, if someone searches for
"peanut"
then documents that have the word "peanut" should rank higher than documents that have the phrase "peanut butter". However, if someone searches for
"peanut butter"
then the document that has peanut butter should show up higher than ones that have just peanut. Is there a config setting somewhere which can be modified such that the entity list can be specified in a file and Solr would do the needful?
Configure that field to use a StrField type instead of a TextField. TextField is designed to handle tokenization and full-text search on textual content. StrField treats its contents as a single keyword, and so does not tokenize.
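To make the difference concrete, here is a toy model of the two matching behaviours. It is not Solr code and ignores analyzers, scoring and everything else a real index does; the function names are made up for illustration.

```python
def tokenized_match(field_value, query):
    """Toy model of a tokenized text field: every query word must be a token."""
    tokens = field_value.lower().split()
    return all(word in tokens for word in query.lower().split())


def keyword_match(field_value, query):
    """Toy model of an untokenized string field: the whole value must match."""
    return field_value == query


if __name__ == "__main__":
    for query in ("peanut", "peanut butter"):
        print(query,
              tokenized_match("peanut butter", query),
              keyword_match("peanut butter", query))
```

In this model a search for "peanut" hits the tokenized field but not the keyword field, while "peanut butter" hits both, which is the distinction the entity use case relies on.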
When Magento search uses the fulltext search engine with the "like" method, it stores the searchable values in the "data_index" field of the catalogsearch_fulltext table, with each searchable attribute separated by |, e.g.:
3003|Enabled|None||Product name|1.99|yellow|0
Here it stores the SKU, status, tax class, product name, price, color, etc.
It stores the values of all searchable attributes.
Now the issue is that for a configurable product, it also stores the associated products' names, prices and statuses in the same field, like:
3003|Enabled|Enabled|Enabled|Enabled|None|None|None|None|Product name|Product name|associted Product name1|associted Product name2|associted Product name3|1.99|2.00|2.99|3.99|yellow|black|yellow|green|0|0|0|0
So what happens is that if I search for any word from an associated product, the main configurable product is also listed, because it has that word in its "data_index" field.
I need some suggestions on how to avoid associated products being included in data_index, so that I get accurate search results.
thanks
We are looking into our search as well and it has been surprising to see the inefficiencies included in the fulltext table. We have some configurable products as well that have MANY variations and their population in the fulltext search is downright horrendous.
As for solutions, I can only offer my approach to fixing the problem (not completed yet, but in progress).
I am extending Magento with an event listener on the product indexing process (because catalog search indexing is when the fulltext table is populated). I am writing my own module that, once that process has run, removes duplicate entries coming from the associated products and also adds extra search keyword terms populated from a CSV file.
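The duplicate-removal step could look roughly like the sketch below. It is written in Python purely to illustrate the logic and is not Magento code; the function name is hypothetical, and it assumes data_index is a plain string of |-separated values as in the question.

```python
def dedupe_data_index(data_index, sep="|"):
    """Drop repeated values from a separator-delimited index string.

    The first occurrence of each value is kept, in its original position.
    """
    seen = set()
    unique = []
    for value in data_index.split(sep):
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return sep.join(unique)


if __name__ == "__main__":
    row = "3003|Enabled|Enabled|None|None|Product name|Product name|1.99|2.00|yellow|black|yellow|0|0"
    print(dedupe_data_index(row))
```

Note that this treats identical values from different attributes as duplicates too, and it does not by itself remove the associated products' distinct names or prices; that would need knowledge of which values came from which product.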
This should increase search speed dramatically and also return more relevant search results, because as of now configurable products are getting a "search bias" in the results.
This isn't so much an answer as a comment, but it was too lengthy to fit in the comments, and I thought it might be beneficial to you. Once I get my module working, if you would like, I can possibly give you directions on how to implement a similar module yourself.
Hope that helped (if only as moral support in Magento's search struggle).