Solr - Exact and Stem search on same text with highlights - search

I am trying to perform exact and stemmed searches on text and get back "compiled" results.
Currently what I have:
The text is stored in the stem field and copied into the quoted field. Stem queries and exact queries each work on their respective field. When I search for the following (the problem only occurs with and):
"word1 word2" and/or word3
which gives me the query
stemmed:word3 AND/OR quote:"word1 word2"
What I get back is results from each field separately. With or, this is fine, but with and, I get two or more results for the same text, each with different highlighting.
The question is: what is the best way to do stemmed and exact search on the same text (I'm guessing with multiple fields)? And if my approach is right, what is the best way to merge these results, and can Solr do it?
Thanks!!
Edit: I looked at eDisMax but can't see how to use it properly. My results are in the comments on the answer that suggests it...

Take a look at the eDisMax (Extended DisMax) query parser, which lets you list the fields to search and runs the query text against all of them, with a separate boost for each field.
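A minimal sketch of what the eDisMax suggestion looks like as request parameters, written in Python. It assumes the field names from the question (`stemmed` and `quote`), a hypothetical core URL, and illustrative boost values; `defType`, `q`, `qf`, `hl` and `hl.fl` are standard Solr parameters. Note that `qf` applies every clause of the query to every listed field, so whether this keeps the exact-vs-stemmed distinction for the quoted phrase depends on your schema and should be checked against real results.

```python
from urllib.parse import urlencode

# Hypothetical Solr host and core name; substitute your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_edismax_params(user_query: str,
                         stem_boost: float = 1.0,
                         exact_boost: float = 2.0) -> dict:
    """Build eDisMax request parameters that search both fields at once."""
    return {
        "defType": "edismax",        # use the Extended DisMax query parser
        "q": user_query,             # the user's raw query string
        # Query both fields; with the default (illustrative) boosts,
        # matches in the exact "quote" field weigh more than stemmed ones.
        "qf": f"stemmed^{stem_boost} quote^{exact_boost}",
        "hl": "true",                # turn on highlighting
        "hl.fl": "stemmed,quote",    # highlight matches from both fields
    }


def build_select_url(params: dict) -> str:
    """Encode the parameters onto the /select endpoint."""
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


print(build_select_url(build_edismax_params('"word1 word2" AND word3')))
```

Because both fields belong to the same document, Solr returns one hit per document with highlight snippets from whichever fields matched, rather than separate result sets per field.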

Related

Alteryx Analyse the similarity of the words

I am currently building a chart of the top 10 fault types. The user types in what the fault is, e.g. "light bulb fused". Since it is a free-form text box, the same fault may be described with different words. Is there any way to make Alteryx understand that some words mean the same thing, so that I can find the top 10 fault types? Thank you.
There are a couple of ways to do this. You can use the Fuzzy Match tool in the Join category to sort out minor spelling mistakes; there are Alteryx examples of Fuzzy Match on YouTube.
You can also use the Record ID tool followed by Text to Columns (Split to Rows, splitting on spaces) to get a list of single words.
For what you are trying to do, I would advise building up a lookup table. You can then use the Find Replace tool to append the category from the lookup based on the words that are found.
How clean your data is and how distinct the categories are will determine how far down these paths you need to go.
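The lookup-table idea can be prototyped outside Alteryx before wiring up the tools. The Python sketch below splits each description into words (as Text to Columns does), fuzzy-matches each word against a lookup table, and counts the resulting categories for a top-10 chart. The keyword table is hypothetical, and `difflib.get_close_matches` is used here in place of the Fuzzy Match tool; its similarity scoring is not the same as Alteryx's.

```python
import difflib
import re
from collections import Counter

# Hypothetical lookup table: keyword -> fault category.
LOOKUP = {
    "light": "Lighting",
    "bulb": "Lighting",
    "fused": "Electrical",
    "leak": "Plumbing",
    "tap": "Plumbing",
}


def categorize(description: str, cutoff: float = 0.75) -> str:
    """Return the category of the first word that (fuzzily) matches a lookup key."""
    # Split the free text into lowercase words, like Split to Rows on spaces.
    for word in re.findall(r"[a-z]+", description.lower()):
        # Fuzzy-match the word against the lookup keys to absorb small variations.
        match = difflib.get_close_matches(word, LOOKUP.keys(), n=1, cutoff=cutoff)
        if match:
            return LOOKUP[match[0]]
    return "Uncategorized"


def top_faults(descriptions, n: int = 10):
    """Count categories and return the n most common as (category, count) pairs."""
    return Counter(categorize(d) for d in descriptions).most_common(n)


print(top_faults(["light bulb fused", "bulbs keep blowing", "tap leaking", "printer jammed"]))
```

Returning the first matching word keeps the sketch simple; a real lookup might need priorities when a description mentions words from several categories.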

MongoDB: Indexing for a live search

Situation
I need to build a live search with MongoDB, but I don't know which index is better to use: a normal one or a text index. Yesterday I found the main differences between them. I have the following document:
{
title: 'What vitamins are found in blueberries'
//other fields
}
So when the user enters blue, the system must find this document (... blueberries).
Problem
I found these differences in the article about them:
A text index, on the other hand, will tokenize and stem the content of the field. It breaks the string into individual words or tokens and further reduces them to their stems, so that variants of the same word will match ("talk" matching "talks", "talked" and "talking", for example, since "talk" is the stem of all three).
So why are a text index and its subsequent searches faster than a regex on a non-indexed text field? Because text indexes work as a dictionary, a clever one that is capable of discarding words on a per-language basis (it defaults to English). When you run a text search query, you run it against the dictionary, saving the time that would otherwise be spent iterating over the whole collection.
That's what I need, but:
The $text operator can search for words and phrases. The query matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries will match.
Question
I need a fast clever dictionary but I also need searching by substring. How can I join these two methods?
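The behavior described in the quoted passages can be pictured with a toy inverted index in Python. This is not how MongoDB implements either index type: the suffix-stripping stemmer and the stop-word list below are simplified assumptions. It only mirrors the trade-off in the question: whole-term lookups find the document from `blueberry` or `blueberries` but not from `blue`, while matching a prefix means scanning the term dictionary instead of doing a single lookup.

```python
import re
from collections import defaultdict

# Assumed, simplified stop words and stemming rules (not MongoDB's).
STOP_WORDS = {"what", "are", "in", "the", "a"}
SUFFIXES = (("ies", "y"), ("ing", ""), ("ed", ""), ("s", ""))


def toy_stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three characters."""
    for suffix, replacement in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


def tokenize(text: str) -> list:
    """Lowercase, split into words, drop stop words and stem the rest."""
    return [toy_stem(w) for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS]


def build_index(docs: dict) -> dict:
    """Map each stemmed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def text_search(index: dict, query: str) -> set:
    """Whole-term lookup: each stemmed query word is looked up directly."""
    result = set()
    for term in tokenize(query):
        result |= index.get(term, set())
    return result


def prefix_search(index: dict, prefix: str) -> set:
    """Prefix match: every term in the dictionary has to be checked."""
    result = set()
    for term, ids in index.items():
        if term.startswith(prefix.lower()):
            result |= ids
    return result


idx = build_index({1: "What vitamins are found in blueberries"})
print(text_search(idx, "blue"), text_search(idx, "blueberry"), prefix_search(idx, "blue"))
```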

Search for exact term in an Algolia index

I want to filter an index by an exact value of an attribute. I wonder what possibilities Algolia offers for that.
Querying an index always results in a search for substrings, which means a search term abc will match any object whose attribute values contain abc. What I want is a search for abc that finds only abc as the value of an attribute (in this case I have specific attributes to search in).
One possibility I came up with was tagging, which doesn't seem like the best approach.
Edit
I think I could also use facet filters. I have weighed the pros and cons and can't come up with an argument that puts either one above the other.
You're right in your edit that facet filters are the way to go here. You'll get the exact match you're looking for and won't have to create a separate _tags attribute to use tag filtering.
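A minimal sketch of the facet-filter approach as plain Python dicts. `attributesForFaceting` (an index setting) and `facetFilters` (a search parameter in `attribute:value` form) are Algolia parameters; the attribute name `category` is hypothetical. The client calls that send these dicts are left out on purpose, since their method names differ between Algolia client versions.

```python
# Hypothetical attribute name; use the attribute you want to match exactly.
ATTRIBUTE = "category"


def faceting_settings(attributes: list) -> dict:
    """Index settings: an attribute must be declared for faceting before filtering on it."""
    return {"attributesForFaceting": list(attributes)}


def exact_match_params(attribute: str, value: str, query: str = "") -> dict:
    """Search parameters that keep only records whose facet value is `value`."""
    return {
        "query": query,  # an empty query returns every record that passes the filter
        "facetFilters": [f"{attribute}:{value}"],
    }


settings = faceting_settings([ATTRIBUTE])
params = exact_match_params(ATTRIBUTE, "abc")
print(settings, params)
```

A facet filter compares against the whole stored facet value, which is what gives the exact match, whereas the query text itself is matched word by word.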

excel search for multiple items

I am trying to search for multiple items in a cell. If any of the terms I am looking for is present, I want the cell in column D to display "Laptop"; otherwise it should display "Desktop". I can get the following to work with just one term to search for:
=IFERROR(IF(SEARCH("blah",A2),"Laptop",""),"Desktop")
But I want to search for the presence of blah, blah2 and blah3, and I don't know how to get Excel to search for any of these terms. (Not all of them, mind you, just any of them.)
I did see that there is an OR function for the logic:
=OR(first condition, second condition, …, etc.)
I am not sure how to get these two to work together. Any thoughts on how to get them to display "Laptop" if any of the words are present?
This should work:
=IF(SUM(COUNTIF(A2,"*"&{"blah1";"blah2";"blah3"}&"*"))>0,"Laptop","Desktop")
You could use the combination of OR, IFERROR and SEARCH as you suggest, but I think the simpler construct would be ...
=IF(AND(ISERROR(SEARCH("value1",A2)),ISERROR(SEARCH("value2",A2))),"Desktop","Laptop")
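The logic both formulas implement (show "Laptop" if any term appears anywhere in the cell, otherwise "Desktop") can be checked outside Excel with a short Python sketch. Like `SEARCH` and `COUNTIF`, it matches case-insensitively; unlike them, it treats `*`, `?` and `~` as literal characters rather than wildcards. As with `SEARCH`, a short term such as `blah` also matches inside `blah2`.

```python
def laptop_or_desktop(cell_text: str, terms=("blah", "blah2", "blah3")) -> str:
    """Return "Laptop" if any term occurs in cell_text (case-insensitive), else "Desktop"."""
    text = cell_text.lower()
    # any() stops at the first term found, mirroring the OR across SEARCH calls.
    return "Laptop" if any(term.lower() in text for term in terms) else "Desktop"


for cell in ["Dell BLAH2 15in", "Tower unit", ""]:
    print(repr(cell), "->", laptop_or_desktop(cell))
```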

Google Site Search - Filtering with PageMap attributes that contain blanks / special characters

We're using a PageMap to provide structured data with our HTML content. Part of this structured data is a set of keywords that are displayed on the result page. It should also be possible to filter the results by such a keyword.
We do have keywords that contain spaces as well as special characters. So here's an excerpt of a result element returned by the XML API of the Google Site Search:
<PageMap>
  <DataObject type="document">
    <Attribute name="mykeywords">Computer &amp; Hobby</Attribute>
    ...
  </DataObject>
</PageMap>
This works perfectly for displaying the result. However, for filtering we would have to pass a query like that:
more:pagemap:document-mykeywords:computer___hobby
How can we determine the query string from the value in the XML? Simply by lowercasing the value and replacing every non-word character with _? How reliable is this?
Or is it better to provide two distinct attributes in our PageMap, one for the label of the keyword and the other for the id of the keyword?
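The hypothesis from the question can be written down as a small Python sketch: parse the PageMap from the XML result, lowercase each attribute value and replace every non-word character with `_`. The normalization rule is only the asker's guess, not a documented Google rule. One reliability concern shows up in the code itself: Python's `\W` is Unicode-aware, so letters such as `ü` are kept as-is, and nothing here establishes whether Google normalizes them the same way.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<PageMap>
  <DataObject type="document">
    <Attribute name="mykeywords">Computer &amp; Hobby</Attribute>
  </DataObject>
</PageMap>"""


def pagemap_filter_value(label: str) -> str:
    """Hypothesized rule: lowercase, then replace each non-word character with '_'."""
    return re.sub(r"\W", "_", label.lower())


def pagemap_restrict(data_object_type: str, attribute: str, label: str) -> str:
    """Build a more:pagemap: restriction in the form shown in the question."""
    return f"more:pagemap:{data_object_type}-{attribute}:{pagemap_filter_value(label)}"


def restricts_from_pagemap(xml_text: str) -> list:
    """Derive one restriction string per Attribute element in the PageMap."""
    root = ET.fromstring(xml_text)
    restricts = []
    for obj in root.iter("DataObject"):
        for attr in obj.findall("Attribute"):
            restricts.append(pagemap_restrict(obj.get("type"), attr.get("name"), attr.text or ""))
    return restricts


print(restricts_from_pagemap(SAMPLE))
```

Providing a separate id attribute, as the question suggests, would remove the need to reverse-engineer this rule at all.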
