How to configure Solr to exclude certain documents from search results?

To exclude some documents from the search results, I can use NOT or the - (minus) operator to list their ids in a filter query, like this:
select/?q=*:*&fq=-id:86+-id:338
This is fine when only a few ids need to be excluded, but the query becomes very long when thousands of ids need to be excluded. I need to conditionally exclude a list of documents from the search results. How can I do this through a Solr configuration file, for example by creating an exclusion list? For instance, I need to exclude documents from the search results if usstate is NY, CA, or VA and the zip code ends with 3.

You can use the QueryElevationComponent to define a list of documents to exclude for a certain query. It only takes effect when the search matches the query text listed in its configuration file.
If your example means "I don't want any documents with usstate NY, CA, or VA where the zip code ends with 3", you're probably better off using those criteria directly in your query, with a modified version of -(usstate:(NY OR CA OR VA) AND zipcode:*3) to get the behavior you want. See ReversedWildcardFilterFactory for faster leading-wildcard matches.
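A minimal sketch of how a client could assemble that filter query, assuming the field names usstate and zipcode from the question. The helper names build_exclusion_fq and build_select_params are hypothetical, and the code only builds the request string; it does not contact a Solr server.

```python
from urllib.parse import urlencode


def build_exclusion_fq(states, zip_suffix):
    """Return a negative filter query: exclude docs in these states whose zip ends with zip_suffix."""
    state_clause = " OR ".join(states)
    return f"-(usstate:({state_clause}) AND zipcode:*{zip_suffix})"


def build_select_params(states, zip_suffix):
    """URL-encode the q and fq parameters for a /select request."""
    return urlencode({"q": "*:*", "fq": build_exclusion_fq(states, zip_suffix)})


fq = build_exclusion_fq(["NY", "CA", "VA"], "3")
query_string = build_select_params(["NY", "CA", "VA"], "3")
```

The resulting query_string can be appended to select/? in the same way as the id-based example in the question.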

Related

How to do a wildcard search in Data Catalog (Google Cloud Platform)

How can I run a wildcard or regex search in Data Catalog (Google Cloud Platform)?
It would make sense to search metadata across column names and tag attributes (and their values).
The current documentation only lists very strict search behavior,
e.g. for tag:data_gov_template.hasPII(=true)
What I need is a result for "PII", without having to specify the exact template name etc.
e.g. labels:etl
If I search only for etl, there is no result.
(Are metadata attributes and their values not directly searchable?)
From your use case, I understand that you want to search for a particular metadata attribute, such as a Tag field like PII, right?
For tagged assets
If you don't care about the template name, you can use the tag:x search facet.
So if all your templates (data_gov_template, data_curator_template, data_etl_template) contain the same Tag field name, has_pii, you can search using:
tag:has_pii and this will return all assets with that metadata attribute, whatever the template name is.
For columns
You can use the column:x search facet to match a substring of a column name in the schema of the data asset. It does not support nested columns yet.
For labels
You can use the labels:bar search facet to find data assets that have a label (with any value) whose key contains bar as a substring.
You can also search on label values. So yes, the metadata attributes and their values are searchable.
However, this is not a regex search: it is a substring match when the search facet uses a colon (:), as in labels:bar, and an exact match when it uses equals (=), as in type=table.
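The difference between the two operators can be illustrated with a small toy model. This is not the Data Catalog API: facet_matches is a hypothetical helper that only mimics the semantics described above, treating : as a substring match and = as an exact match against a single value.

```python
def facet_matches(expression, value):
    """Toy model: 'facet:term' is a substring match, 'facet=term' is an exact match."""
    colon = expression.find(":")
    equals = expression.find("=")
    # Use whichever operator appears first in the expression.
    if equals != -1 and (colon == -1 or equals < colon):
        return value == expression[equals + 1:]
    if colon != -1:
        return expression[colon + 1:] in value
    raise ValueError("expected 'facet:term' or 'facet=term'")


label_hit = facet_matches("labels:etl", "daily_etl_job")
exact_hit = facet_matches("type=table", "table")
exact_miss = facet_matches("type=table", "table_spec")
```

In this model a search for etl inside a longer label key succeeds with the colon form, while the equals form only accepts the complete value.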

Multi LookUp - Check for unique values

I'm trying to enforce unique values in my PowerApps form. The data is stored in a SharePoint list. I have a column called Watches; each item in this column has a number that has to be unique. People can pick several of those watches in a LookUp field. Before the form is submitted, I need to check whether the picked values already exist in my list and at least display an error message.
I have set up a regular text field and added the following rule to it:
If(LookUp(MyList.Watches;DataCardValue4.SelectedItems.Value in Watches;"OK")<>"OK";"No Error";"Watch already exist")
DataCardValue4 is my LookUp field, where people pick the watches. With this rule I want to check whether an item already exists in my Watches column and have my text field display the error. Somehow the rule doesn't work.
Can you tell me how to compare multiple lookup choices against the entries in my table/column?
The first parameter to the LookUp function should be the table (the SharePoint list) rather than the column, so it should be 'MyList' rather than 'MyList.Watches'. I'm also not sure that the formula in the second parameter will work: it looks for multiple items (DataCardValue4.SelectedItems.Value) within multiple items (Watches). Perhaps you can update your app so that users pick only one watch value before submitting?
One last thing to note: I'm not sure how big you expect your SharePoint list to get, but I would highly recommend keeping your LookUp formula within the bounds that support delegation. SharePoint has different formula requirements than other connectors. For example, you can use '=' in your formula, but not 'in'.
Your new rule might look something like the one below. Please note that it could have syntax errors and might not even be delegable in its current form, since I am providing it without checking. I also switched from LookUp to Filter simply because I'm more familiar with Filter; the two functions are very similar.
If(CountRows(Filter(MyList; DataCardValue4.Selected.Value = Watches)) > 0; "Watch already exist"; "No Error")
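Independent of the PowerApps syntax, the underlying check is whether any selected watch number already appears in the list. The following Python sketch illustrates only that logic; it is not PowerApps code, and the function names and sample numbers are hypothetical.

```python
def find_duplicate_watches(selected, existing):
    """Return the selected watch numbers that already exist, in selection order."""
    taken = set(existing)
    return [watch for watch in selected if watch in taken]


def validation_message(selected, existing):
    """Mirror the rule's two outcomes using the same messages as the formula."""
    duplicates = find_duplicate_watches(selected, existing)
    return "Watch already exist" if duplicates else "No Error"


existing_watches = ["1001", "1002", "1003"]
message_with_clash = validation_message(["1002", "2000"], existing_watches)
message_without_clash = validation_message(["2000", "2001"], existing_watches)
```

Handling several selected values this way amounts to running the single-value comparison once per selection.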

MongoDB: Indexing for a live search

Situation
I need to build a live search with MongoDB, but I don't know which index is better to use: a normal index or a text index. Yesterday I found the main differences between them. I have the following document:
{
title: 'What vitamins are found in blueberries'
//other fields
}
So when a user enters blue, the system must find this document (... blueberries).
Problem
I found these differences described in an article:
A text index on the other hand will tokenize and stem the content of the field. So it will break the string into individual words or tokens, and will further reduce them to their stems so that variants of the same word will match ("talk" matching "talks", "talked" and "talking" for example, as "talk" is a stem of all three).
So why are a text index and its subsequent searches faster than a regex on a non-indexed text field? It's because text indexes work as a dictionary, a clever one that's capable of discarding words on a per-language basis (defaults to English). When you run a text search query, you run it against the dictionary, saving yourself the time that would otherwise be spent iterating over the whole collection.
That's what I need, but:
The $text operator can search for words and phrases. The query matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries will match.
Question
I need a fast, clever dictionary, but I also need substring search. How can I combine the two approaches?
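The mismatch described in the quotes can be reproduced with a toy model. The sketch below is not MongoDB code: it builds a plain word dictionary without stemming or stop-word handling (both simplifying assumptions) and compares whole-word lookup with a substring scan over every document. The second sample document is made up for contrast.

```python
import re

documents = [
    {"_id": 1, "title": "What vitamins are found in blueberries"},
    {"_id": 2, "title": "How to talk to a blue whale"},  # hypothetical extra document
]


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_word_index(docs):
    """Map each whole word to the ids of the documents containing it."""
    index = {}
    for doc in docs:
        for token in tokenize(doc["title"]):
            index.setdefault(token, set()).add(doc["_id"])
    return index


def word_search(index, term):
    """Dictionary lookup: matches complete words only."""
    return sorted(index.get(term.lower(), set()))


def substring_search(docs, term):
    """Scan every document: matches the term anywhere in the title."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [doc["_id"] for doc in docs if pattern.search(doc["title"])]


word_index = build_word_index(documents)
whole_word_hits = word_search(word_index, "blue")
substring_hits = substring_search(documents, "blue")
```

In this model the dictionary lookup for blue finds only the document containing the word blue itself, while the scan also finds blueberries at the cost of visiting every document.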

Search for exact term in an Algolia index

I want to filter an index by an exact value of an attribute. I wonder what possibilities Algolia offers for that.
Querying an index always results in a substring search: a search term abc will match any object whose attribute values contain abc. What I want is a search for abc that finds only objects where abc is the exact value of an attribute (in this case I have specific attributes to search in).
One possibility I came up with was tagging, which doesn't seem like the best approach.
Edit
I think I could also use facet filters. I thought about the pros and cons of each and can't come up with arguments that favor one over the other.
You're right in your edit that facet filters are the way to go here. You'll get the exact match you're looking for and won't have to create a new _tags attribute to use the tag filter.
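As a rough sketch, the search parameters for such a facet filter could be built as follows. It assumes the attribute has been declared in the index's attributesForFaceting setting; the attribute name code and the helper exact_match_params are hypothetical, and the snippet only builds the parameter dictionary without calling any client library.

```python
def exact_match_params(attribute, value):
    """Build search parameters that keep only records whose facet value equals `value`."""
    return {
        "query": "",  # empty query: rely on the filter alone
        "facetFilters": [f"{attribute}:{value}"],
    }


params = exact_match_params("code", "abc")
```

The dictionary would then be passed as the search parameters of whichever API client is in use.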

Lucene number extracting

I have a number-matching problem.
I want to get all matches that don't contain a certain number at a given position,
e.g. 125501874, 125001873
Any number that has 55 at position 2 (the third and fourth digits) should be excluded.
The first digit ranges from 0 to 9 and the second from 1 to 9, so the real range is [01-99]
(the first two digits cannot be 00).
With Lucene I wanted to add NOT field:[01-99]55*
But it doesn't seem to work. Is there an easy way to find ??55* and disregard it in a Search("NOT field:[01-99]55*")?
Thank you Lucene guru
Lucene can do this very efficiently if one creates an "index-only" field with only the third and fourth digits in it. The complete value can be "stored" (or stored and indexed if other queries use the whole number) in the original field.
Update: A followup comment asked, "Is [there] a way to create a temporary index on only the second digit?"
Using a ParallelReader "vertically partitions" the fields of an index. One partition could hold the current index, with its fields, while the other is a temporary index with the new field, possibly stored in a RAMDirectory.
Assuming the number is "stored" in the original index, iterate over each document in the original index, retrieve the stored field, parse out the key digits, and add a Document to the temporary index with the new field. As the ParallelReader documentation states, it is imperative that the document numbers match in both indexes.
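The extraction step can be sketched in plain Python. This is not Lucene code: stored_numbers stands in for the stored field values read in document order, and key_digits is a hypothetical helper. The point it illustrates is that the derived values are produced in the same order, so position i in the new list corresponds to document i in the original index.

```python
def key_digits(number):
    """Return the third and fourth digits of a numeric string."""
    return number[2:4]


def build_parallel_values(stored_numbers):
    """Derive one key-digit value per document, preserving document order."""
    return [key_digits(number) for number in stored_numbers]


stored_numbers = ["125501874", "125001873"]
parallel_values = build_parallel_values(stored_numbers)
```

An exact-match query for 55 against the derived field then replaces the wildcard query on the full number.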
Thank you erickson. Your solution with ParallelReader is probably the best, if only I could use temporary indexes; because we cache the search queries, we would need those indexes later.
But as you said before, it's better to start with an index on the relevant digits straight away.
I have another solution.
NOT field:0?55*
NOT field:1?55*
...
NOT field:9?55*
It is efficient enough for the search I'm doing, and it bypasses the leading-wildcard limitation. I wouldn't use it if there were more digits to check or if they were farther from the start.
I'm now testing this on a million rows, and it's efficient enough for our needs.
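The ten clauses can be generated rather than typed by hand. The sketch below builds them as strings and uses Python's fnmatch module to mimic the ? and * wildcards on sample values; it does not run against Lucene, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def build_not_clauses(field, suffix="55"):
    """One NOT clause per possible first digit, avoiding a leading wildcard."""
    return [f"NOT {field}:{digit}?{suffix}*" for digit in range(10)]


def is_excluded(number, suffix="55"):
    """Check whether any of the ten wildcard patterns matches the number."""
    return any(fnmatchcase(number, f"{digit}?{suffix}*") for digit in range(10))


clauses = build_not_clauses("field")
first_sample_excluded = is_excluded("125501874")
second_sample_excluded = is_excluded("125001873")
```

Of the two sample numbers from the question, only the first has 55 as its third and fourth digits, so only that one is filtered out.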
