Azure Search exact match

I have a table with a lot of data. One field, searchableField, holds strings such as:
row 1: abcdefgdefg1hijklmnopqrstuvw234234
row 2: abcdefgdefg1hijklmnopqrstuvw2dsfds33
row 3: abcdefgdefg1hijklmnopqrstuvw234234
row 4: abcdefgdefg1hijklmnopqrstuvwweewere333wr
row 5: abcdefgdefg2hijklmnopqrstuvw234222aadfff
row 6: abcdefgdefg1hijklmnopqrstuvwdsfdsf
I only want row 5 back, but adding the search term defg2 doesn't work.
In other cases I want only rows 1, 2, 3, 4 and 6 back, but searching on defg1 doesn't work either.
A filter would work for me, but unfortunately there is no "contains" filter. What can I do as a workaround?

Please read the "How full text search works in Azure Search" article. It will help you understand how your documents and query terms are processed and how to customize the behavior of your search index to achieve the results you want.
In your case, you might want to create a custom analyzer that will break up the long terms in your document into smaller ones that are likely to be used as query terms by users of your application.
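To illustrate why such an analyzer helps, here is a minimal plain-Python simulation of what an n-gram tokenizer does at index time: every substring between a minimum and maximum length becomes its own token, so a short fragment like defg2 turns into an exact term lookup. The function names and the 3/6 gram sizes are hypothetical choices for this sketch, not Azure Search API calls; in the service itself you would configure an n-gram tokenizer in a custom analyzer instead.

```python
from collections import defaultdict

# Sample values copied from the question (row number -> searchableField).
ROWS = {
    1: "abcdefgdefg1hijklmnopqrstuvw234234",
    2: "abcdefgdefg1hijklmnopqrstuvw2dsfds33",
    3: "abcdefgdefg1hijklmnopqrstuvw234234",
    4: "abcdefgdefg1hijklmnopqrstuvwweewere333wr",
    5: "abcdefgdefg2hijklmnopqrstuvw234222aadfff",
    6: "abcdefgdefg1hijklmnopqrstuvwdsfdsf",
}


def ngrams(term, min_gram, max_gram):
    """Return every substring of term whose length is between min_gram and max_gram."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(term) - size + 1):
            grams.append(term[start:start + size])
    return grams


def build_index(rows, min_gram=3, max_gram=6):
    """Map each n-gram token to the set of rows that produced it (an inverted index)."""
    index = defaultdict(set)
    for row_id, value in rows.items():
        for gram in ngrams(value.lower(), min_gram, max_gram):
            index[gram].add(row_id)
    return index


index = build_index(ROWS)
# The query term is looked up unchanged, roughly what a search-time analyzer
# that does not split queries into n-grams would do.
print(sorted(index["defg2"]))  # [5]
print(sorted(index["defg1"]))  # [1, 2, 3, 4, 6]
```

The trade-off is index size: every value produces many tokens, and a query fragment longer than max_gram no longer maps to a single token.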
Alternatively, you can issue a wildcard or a regex query using the full Lucene query syntax (queryType=full) to simulate the contains behavior you're looking for. More information here: "Azure search, search by partial terms".

The Lucene regex query below does a like/contains search for the question above (it requires the full Lucene syntax, queryType=full):
searchableField:/.*defg2.*/
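A Lucene regex query has to match an entire indexed term, which is why the fragment needs .* on both sides and no spaces around it. The sketch below mimics that with Python's re.fullmatch, assuming each value is indexed as a single lowercase term; regex_contains is a hypothetical helper, not part of any Azure SDK.

```python
import re

# Sample values copied from the question (row number -> searchableField).
ROWS = {
    1: "abcdefgdefg1hijklmnopqrstuvw234234",
    2: "abcdefgdefg1hijklmnopqrstuvw2dsfds33",
    3: "abcdefgdefg1hijklmnopqrstuvw234234",
    4: "abcdefgdefg1hijklmnopqrstuvwweewere333wr",
    5: "abcdefgdefg2hijklmnopqrstuvw234222aadfff",
    6: "abcdefgdefg1hijklmnopqrstuvwdsfdsf",
}


def regex_contains(rows, fragment):
    """Return the row ids whose whole value matches .*<fragment>.* (anchored, like Lucene)."""
    pattern = re.compile(".*" + re.escape(fragment) + ".*")
    return [row_id for row_id, value in rows.items() if pattern.fullmatch(value)]


print(regex_contains(ROWS, "defg2"))  # [5]
print(regex_contains(ROWS, "defg1"))  # [1, 2, 3, 4, 6]

# With spaces around the fragment the pattern demands literal spaces,
# which these single-term values never contain, so nothing matches.
print([r for r, v in ROWS.items() if re.fullmatch(r".* defg2 .*", v)])  # []
```

Leading .* patterns force a scan over the term dictionary, so on large indexes this is slower than the analyzer approach.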

You can also use the search.ismatch or search.ismatchscoring functions inside a filter.
For example:
"filter": "search.in(metadata_library, 'a3e9838f-3fec-49d8-a1ea-46f361238ffd') and search.ismatch('[exe pixel!][test new tags 102][css monitor]', 'metadata_tags','simple','all')",

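For the question's case, a filter using search.ismatch(search, searchFields, queryType, searchMode) could be assembled as below. This is a sketch: odata_string and ismatch_filter are hypothetical helpers, searchableField is the field name from the question, and the body would still need to be sent to your index's search endpoint. OData string literals are single-quoted, with embedded single quotes doubled.

```python
import json


def odata_string(value):
    """Quote a value as an OData string literal; embedded single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def ismatch_filter(query, fields, query_type="full", search_mode="any"):
    """Build a search.ismatch(search, searchFields, queryType, searchMode) call."""
    args = [query, fields, query_type, search_mode]
    return "search.ismatch(" + ", ".join(odata_string(a) for a in args) + ")"


# Hypothetical request body for the rows in the question.
body = {
    "search": "*",
    "filter": ismatch_filter("/.*defg2.*/", "searchableField"),
}
print(json.dumps(body))
```

Using 'full' as the query type is what lets the regex inside search.ismatch be interpreted as Lucene syntax.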
Related

Microsoft search using multiple criteria (wildcards, exclusions, etc.)

I am using a Microsoft DB application (AXAPTA) that lets us search various fields by typing criteria at the top of the table. For example, we can filter item numbers to those starting with 2 by typing 2*, or exclude items with bell in the description by typing !bell. Quotes are not normally needed. We often combine multiple criteria by separating them with a ",": in 2*,9* the "," acts as an OR. Unfortunately, I cannot figure out how to create a multiple-criteria AND. What I am trying to do is exclude items that have DNU in the description AND also have bell in the description.
My thought would be
!DNU & bell
but that doesn't work. Any ideas? I am sure this is simple, but I am stuck.
You need to use the advanced SQL query syntax, which can also be put in the filter location.
You'll have to play with it a while to get exactly what you want, but see these links below. You'll probably need to use a combination of info from the different links:
https://technet.microsoft.com/en-us/library/aa569937.aspx
http://www.axaptapedia.com/Expressions_in_query_ranges
https://msdn.microsoft.com/en-us/library/aa893981.aspx
https://learn.microsoft.com/en-us/dynamics365/unified-operations/fin-and-ops/get-started/advanced-filtering-query-options

Search for exact term in an Algolia index

I want to filter an index by an exact value of an attribute. I wonder what possibilities Algolia offers for that.
Querying an index always results in a search for substrings; that means a search term abc will always match any object whose attribute values contain abc. What I want to achieve is a search for abc that finds only abc as the value of an attribute (in this case I have specific attributes to search in).
One possibility I came up with was tagging, which doesn't seem like the best approach.
Edit
I think I could also use facet filters. I thought about the pros and cons of each and can't come up with an argument that puts one above the other.
You're right with your edit that facet filters would be the way to go on this one. You'll get the exact match you're looking for and won't have to create a new _tags attribute to use the tag filter.
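A minimal sketch of the difference, assuming a hypothetical attribute called name: the search parameters put the exact value in facetFilters using the "attribute:value" form (the attribute also has to be listed in the index's attributesForFaceting setting), and two local stand-ins contrast substring matching with whole-value equality. No Algolia client call is made here; the stand-ins only model the behaviour described in the question.

```python
# Hypothetical records: "name" stands in for the attribute you want to match exactly.
RECORDS = [
    {"objectID": "1", "name": "abc"},
    {"objectID": "2", "name": "abcd"},
    {"objectID": "3", "name": "xabc"},
]


def facet_filter_params(attribute, value):
    """Search parameters for an exact match via facetFilters ("attribute:value")."""
    return {"query": "", "facetFilters": [f"{attribute}:{value}"]}


def substring_matches(records, attribute, text):
    """Local stand-in for the 'contains' behaviour described in the question."""
    return [r["objectID"] for r in records if text in r.get(attribute, "")]


def facet_matches(records, attribute, value):
    """Local stand-in for a facet filter: whole-value equality (case handling not modelled)."""
    return [r["objectID"] for r in records if r.get(attribute) == value]


print(facet_filter_params("name", "abc"))
print(substring_matches(RECORDS, "name", "abc"))  # ['1', '2', '3']
print(facet_matches(RECORDS, "name", "abc"))      # ['1']
```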

Excel search for multiple items

I am trying to search for multiple items in a cell. If any of the terms I am looking for is present, I want cell D to display "Laptop", otherwise, display "Desktop". I can get the following to work, with just one term to search for:
=IFERROR(IF(SEARCH("blah",A2),"Laptop",""),"Desktop")
But I want to search for the presence of blah, blah2, and blah3. I don't know how to get Excel to search for any of these terms (not all of them, mind you, just any one of them).
I did see that there is an or option for the logic.
=OR(first condition, second condition, …, etc.)
I am not sure how to get these two to work together. Any thoughts on how to get them to display "Laptop" if any of the words are present?
This should work:
=IF(SUM(COUNTIF(A2,"*" &{"blah1";"blah2";"blah3"}& "*"))>0,"laptop","desktop")
You could use the combination of OR, IFERROR and SEARCH as you suggest, but I think the simpler construct would be ...
=IF(AND(ISERROR(SEARCH("value1",A2)),ISERROR(SEARCH("value2",A2))),"Desktop","Laptop")
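The logic of the SUM(COUNTIF(...)) formula can be mirrored in a few lines of Python, which may help when checking which cells should come out as Laptop. This is only a model of the formula, not something Excel runs: COUNTIF with "*term*" counts 1 per term found (case-insensitively, like SEARCH), SUM adds the counts, and any total above 0 means at least one term is present. classify and TERMS are hypothetical names.

```python
def classify(cell, terms):
    """Mirror =IF(SUM(COUNTIF(A2,"*"&{terms}&"*"))>0,"Laptop","Desktop")."""
    text = cell.casefold()  # COUNTIF and SEARCH both ignore case
    hits = sum(1 for term in terms if term.casefold() in text)
    return "Laptop" if hits > 0 else "Desktop"


TERMS = ["blah1", "blah2", "blah3"]
print(classify("old BLAH2 unit", TERMS))  # Laptop
print(classify("tower", TERMS))           # Desktop
```

The array-constant version scales better than nesting one ISERROR(SEARCH(...)) per term, since adding a term only means extending the {...} list.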

Quick SQL question

I'm working in PostgreSQL.
I have a table with a column that contains values of the following format:
Set1/Set2/Set3/...
Seti can be a set of values for each i. They are delimited by '/'.
I would like to show distinct entries of the form Set1/Set2; that is, I would like to trim or truncate the rest of the string in those entries.
That is, I want all distinct options for:
Set1/Set2
A regular expression would work great: I want a substring of the pattern: .*/.*/
to be displayed without the rest of it.
I got as far as:
select distinct column_name from table_name
but I have no idea how to make the trimming itself.
I tried looking at w3schools and other sites, as well as searching Google for SQL trim / SQL truncate, but didn't find what I'm looking for.
Thanks in advance.
mu is too short's answer is fine if the lengths of the strings between the forward slashes are always consistent. Otherwise you'll want to use a regex with the substring function.
For example:
=> select substring('Set1/Set2/Set3/' from '^[^/]+/[^/]+');
substring
-----------
Set1/Set2
(1 row)
=> select substring('Set123/Set24/Set3/' from '^[^/]+/[^/]+');
substring
--------------
Set123/Set24
(1 row)
So your query on the table would become:
select distinct substring(column_name from '^[^/]+/[^/]+') from table_name;
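To check the pattern outside the database, the same regex behaves identically in Python's re module. A small sketch with hypothetical helper names: a value without a slash yields None, matching the NULL that substring returns when there is no match, and the distinct step keeps first-seen order only for readability (SQL guarantees no order without ORDER BY).

```python
import re

# Same pattern as the SQL: from the start, one or more non-slash characters,
# a slash, then one or more non-slash characters.
FIRST_TWO = re.compile(r"^[^/]+/[^/]+")


def first_two_sets(value):
    """Return the Set1/Set2 prefix, or None when there is no match (SQL NULL)."""
    match = FIRST_TWO.search(value)
    return match.group(0) if match else None


def distinct_prefixes(values):
    """Like SELECT DISTINCT substring(...): unique results in first-seen order."""
    return list(dict.fromkeys(first_two_sets(v) for v in values))


column = ["Set1/Set2/Set3/", "Set123/Set24/Set3/", "Set1/Set2/Set9/"]
print(distinct_prefixes(column))  # ['Set1/Set2', 'Set123/Set24']
```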
The relevant docs are http://www.postgresql.org/docs/8.4/static/functions-string.html
and http://www.postgresql.org/docs/8.4/static/functions-matching.html.
Why do you store multiple values in a single record? The preferred solution would be multiple values in multiple records, your problem would not exist anymore.
Another option would be to use an array of values, with the TEXT[] array data type instead of TEXT. You can index an array column with a GIN index.
SUBSTRING() (like mu_is_too_short showed you) can solve the current problem, using an array and the array functions is another option:
SELECT array_to_string(
(string_to_array('Set1/Set2/Set3/', '/'))[1:2], '/' );
This makes it rather flexible: the values don't need a fixed length, because the separator in the array functions does the job. The [1:2] picks the first two elements of the array; [1:3] would pick elements 1 to 3. This makes it easy to change.
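The split/slice/join idea maps directly onto Python, with one detail worth noticing: PostgreSQL array slices are 1-based and inclusive, while Python slices are 0-based and end-exclusive, so [1:2] in SQL corresponds to [0:2] (or [:2]) here. first_n_sets is a hypothetical name for this sketch.

```python
def first_n_sets(value, n=2, sep="/"):
    """Python analogue of array_to_string((string_to_array(value, sep))[1:n], sep)."""
    parts = value.split(sep)    # string_to_array(value, '/')
    # SQL [1:n] (1-based, inclusive) -> Python [0:n] (0-based, end-exclusive)
    return sep.join(parts[:n])  # array_to_string(..., '/')


print(first_n_sets("Set1/Set2/Set3/"))       # Set1/Set2
print(first_n_sets("Set1/Set2/Set3/", n=3))  # Set1/Set2/Set3
```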
If they really are that regular you could use substring; for example:
=> select substring('Set1/Set2/Set3/' from 1 for 9);
substring
-----------
Set1/Set2
(1 row)
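The fixed-length version only works while every segment has the same width. A quick Python check, with substring(value from 1 for 9) modelled as value[:9] and fixed_prefix as a hypothetical helper, shows the cut landing mid-value once the lengths differ:

```python
def fixed_prefix(value, length=9):
    """Python analogue of substring(value from 1 for length)."""
    return value[:length]


print(fixed_prefix("Set1/Set2/Set3/"))     # Set1/Set2
print(fixed_prefix("Set123/Set24/Set3/"))  # Set123/Se
```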
There is also a version of substring that understands POSIX regular expressions if you need a little more flexibility.
The PostgreSQL online documentation is quite good BTW:
http://www.postgresql.org/docs/current/static/index.html
and it even has a usable index and sensible navigation.
If you want to use .*/.* then you'd want something like substring(s from '[^/]+/[^/]+'), for example:
=> select substring('where/is/pancakes/house?' from '[^/]+/[^/]+');
substring
-----------
where/is
(1 row)
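Without the leading ^ the pattern can start matching anywhere in the string. The sketch below uses Python's re.search, which like PostgreSQL's substring returns the first match; a value with a leading slash shows where the anchored and unanchored patterns differ. first_match is a hypothetical helper.

```python
import re

ANCHORED = re.compile(r"^[^/]+/[^/]+")
UNANCHORED = re.compile(r"[^/]+/[^/]+")


def first_match(pattern, value):
    """Return the first match of pattern in value, or None (SQL NULL)."""
    match = pattern.search(value)
    return match.group(0) if match else None


print(first_match(UNANCHORED, "where/is/pancakes/house?"))  # where/is
# With a leading slash the unanchored pattern skips it; the anchored one finds nothing.
print(first_match(UNANCHORED, "/a/b/c"))  # a/b
print(first_match(ANCHORED, "/a/b/c"))    # None
```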

Lucene number extracting

I have this number extracting problem.
I want to get all matches that don't have a certain number in it
e.g. 125501874, 125001873
Every number that has 55 at position 2 is not to be considered.
The first digit ranges from 0 to 9 and the second from 1 to 9, so the real range is [01-99]
(we cannot have 00 as the first two digits).
With Lucene I wanted to add NOT field:[01-99]55*
But it doesn't seem to work. Is there an easy way to find ??55* and disregard it in a Search("NOT field:[01-99]55*")?
Thank you Lucene guru
Lucene can do this very efficiently if one creates an "index-only" field with only the third and fourth digits in it. The complete value can be "stored" (or stored and indexed if other queries use the whole number) in the original field.
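A minimal sketch of that idea in plain Python, not Lucene code: each document keeps the full number and gains an extra field holding only the third and fourth digits, so excluding 55 becomes an exact-term comparison on that field instead of a wildcard. The field name digits34 and the helper names are hypothetical.

```python
def key_digits(number):
    """Third and fourth digits of the number (0-based slice 2:4)."""
    return number[2:4]


def index_document(number):
    # 'number' keeps the complete value (stored); 'digits34' is the extra
    # index-only field holding just the digits the query cares about.
    return {"number": number, "digits34": key_digits(number)}


docs = [index_document(n) for n in ["125501874", "125001873"]]
# Equivalent of excluding the exact term digits34:55; no wildcard is needed.
kept = [d["number"] for d in docs if d["digits34"] != "55"]
print(kept)  # ['125001873']
```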
Update: A followup comment asked, "Is [there] a way to create a temporary index on only the second digit?"
Using a ParallelReader "vertically partitions" the fields of an index. One partition could hold the current index, with its fields, while the other is a temporary index with the new field, possibly stored in a RAMDirectory.
Assuming the number is "stored" in the original index, iterate over each document in the original index, retrieve the stored field, parse out the key digits, and add a Document to the temporary index with the new field. As the ParallelReader documentation states, it is imperative that the document numbers match in both indexes.
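The requirement that document numbers match can be pictured with two Python lists indexed by position, standing in for the two indexes (this models the idea only; it is not the ParallelReader API). The temporary partition is built by walking the original documents in order, and a document with no stored value still gets an empty entry rather than being skipped, because skipping one would shift every later document out of alignment. build_parallel_partition and digits34 are hypothetical names.

```python
def build_parallel_partition(stored_numbers):
    """Build the extra 'digits34' partition in the original document order."""
    partition = []
    for number in stored_numbers:  # iterate in document-number order
        # One entry per original document, even when the value is missing,
        # so position i always refers to the same document in both partitions.
        partition.append({"digits34": number[2:4] if number else ""})
    return partition


original = ["125501874", None, "125001873"]  # stored field per doc number; None = missing
extra = build_parallel_partition(original)
assert len(extra) == len(original)  # document numbers line up

# A combined "parallel" view of each document, taking fields from both partitions.
combined = [{"number": n, **e} for n, e in zip(original, extra)]
print([i for i, d in enumerate(combined) if d["digits34"] != "55"])  # [1, 2]
```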
Thank you, erickson. Your solution is probably the best; I would use ParallelReader if only I could use temporary indexes, but because we cache the search queries, we will need those indexes later.
But as you said before, it's better to start with an index on the relevant digits straightaway.
I have another solution.
NOT field:0?55*
NOT field:1?55*
...
NOT field:9?55*
It is efficient enough for the search I'm doing, and it bypasses the leading-wildcard limitation. I wouldn't use it if there were more digits to check or if they were farther from the start.
Now I'm testing this on a million rows and it's pretty efficient for our needs.
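The ten clauses can be generated rather than typed, and their effect checked locally. In the sketch below, fnmatchcase stands in for Lucene's wildcard semantics (? is one character, * is any run); because each pattern starts with a literal digit, none of the clauses begins with a wildcard. Note that Lucene matches nothing for a query made only of NOT clauses, so in practice these are added to a query that already has at least one positive clause. The field name field is taken from the question.

```python
from fnmatch import fnmatchcase

# One clause per possible first digit, so no clause starts with a wildcard.
patterns = [f"{d}?55*" for d in range(10)]
query = " ".join(f"NOT field:{p}" for p in patterns)


def excluded(number):
    """True if any clause pattern matches (? = one character, * = any run)."""
    return any(fnmatchcase(number, p) for p in patterns)


numbers = ["125501874", "125001873"]
print(query)
print([n for n in numbers if not excluded(n)])  # ['125001873']
```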
