Does CloudSpanner support Fuzzy Search or Wild Card Search? - google-cloud-spanner

I'm doing research on CloudSpanner as part of a spike for work, and comparing it to BigTable/Elastic Search. My team wanted to find out whether or not CloudSpanner supports either FuzzySearch and/or WildCard query. I could find this neither in articles nor by viewing youtube live demos, and i can't access a demo/free trial either. I know that CloudSpanner utilizes NewSQL but I couldn't find anything on NewSQL supporting those either.

Cloud Spanner doesn't support FuzzySearch directly and as far as I know it does not support WildCard Queries directly.
the most similar things that it supports that can work for you are:
Regular Expressions:
With the function REGEXP_CONTAINS with which you can perform queries with a regular expression that matches what you want. This allows to look for [úuü] looks for all the alternatives of u.
LIKE Operator
The like operator will allow you to match a section of a string. to see it documentation you can check it here
If none of this alternatives work for you then i would suggest to do something like this

Yes, Cloud Spanner supports wild cards for some SQL queries (see documentation).
https://cloud.google.com/spanner/docs/query-syntax

Related

How to create a partial search in Meteor that only returns permitted data

I'm trying to create a search feature in Meteor 1.8.1 that does the following:
returns partial matches, e.g. "fish" will find "fish", "fishcake" and "dogfish"
has server-side control of which documents are returned, so search results don't include documents that are not published to the user
is reasonably efficient
returns a limited number of results
This seems like it should be a common requirement, but I'm failing to find any solution.
MongoDB full text search will only return on whole words, so will only find "fish".
Easy search doesn't support server-side permissions, as far as I can tell.
I could try a regex solution but I think it would be expensive?
Thank you for any solutions!
Edit: From discussion it seems that Easy Search does support server-side filtering using a selector, and this would be the best solution. However, I can't get a selector working from the examples and documentation. For clarity, I've created a new question for that issue
The documentation explicitly states that for advanced use cases you may want to use elastic search and offers you a pluggable extension to ease the burden of integration.
https://matteodem.github.io/meteor-easy-search/docs/recipes/#advanced-search
You might wish that a search for cafe returns documents with the text café in them (special character). Or that your search string is split up by whitespace and those terms used to search across multiple fields.
You should consider using a search engine like ElasticSearch for your search if you have these usecases. ElasticSearch allows you to configure precisely how your fields are being searched. One way you can do that is by analyzing your data, so that searching itself is as fast as possible.

In Azure Search, how do you run a "contains" search with multiple terms in searchtext?

I am using Azure Search in full query mode on top of CosmosDB and I want to run a query for any documents with a field that contains the string "azy do". This should match, for example, a document containing "lazy dog".
Reading the Azure Search documentation, it looks like this is impossible due to the term-based indexes it uses.
Rejected solutions
0 matches since it is looking for whole words:"azy do"
Doesn't work since regexes are not allowed to span multiple terms:/.azy do./
This "works", to the extent that it will match "lazy dog", but this does not respect the ordering of the query and will also match "dog lazy", for example /.azy./ AND /.do./
Is there any way of doing this correctly in Azure Search?
If you cannot achieve that via regular expression in the Lucene Query syntax, then is not possible. You may want to vote for supporting contains here.
It should be /.lazy|dog./
So, split the terms based on whitespaces and add a pipe(|) delimiter which stands for OR.
Shortly, Azure Search is not designed to support this scenario. You might be better off using the CONTAINS function in Cosmos DB or its equivalent, depending on what query language you use.
Azure Search is designed for finding terms or phrases that occur in unstructured content (documents) and returning the most relevant documents. The process of extracting and indexing those searchable terms is customizable and described here: How full text search works in Azure Search.

Google Datastore query filter for multiple values for same property

I have a query I wish to run on google Datastore that is intended to retrieve data from multiple devices. However, I couldn't find anywhere in the documentation that would allow me to get data from e.g. device-1 or device-2 or device-3, i.e. only 1 property name can be set. Is this a Datastore limitation? Or am I just missing something that I don't know about?
Based on the NodeJS client library, the query might look something like the below filter criteria:
var query = datastore.createQuery('data')
.filter('device_id', 1)
.filter('device_id', 2)
.filter('device_id', 3);
Otherwise, I might have to run separate queries for the various devices, which doesn't seem like a very elegant solution, especially if there are a lot of devices to simultaneously run queries on.
Any suggestions for the Datastore API or alternative approaches are welcome!
Yes, this would be an OR operation which is one of the Restrictions on queries (emphasis mine):
The nature of the index query mechanism imposes certain restrictions
on what a query can do. Cloud Datastore queries do not support
substring matches, case-insensitive matches, or so-called full-text
search. The NOT, OR, and != operators are not natively
supported, but some client libraries may add support on top of Cloud
Datastore. Additionally:

mongodb approximate string matching

I am trying to implement a search engine for my recipes-website using mongo db.
I am trying to display the search suggestions in type-ahead widget box to the users.
I am even trying to support mis-spelled queries(levenshtein distance).
For example: whenever users type 'pza', type-ahead should display 'pizza' as one of the suggestion.
How can I implement such functionality using mongodb?
Please note, the search should be instantaneous, since the search result will be fetched by type-ahead widget. The collections over which I would run search queries have at-most 1 million entries.
I thought of implementing levenshtein distance algorithm, but this would slow down performance, as collection is huge.
I read FTS(Full Text Search) in mongo 2.6 is quite stable now, but my requirement is Approximate match, not FTS. FTS won't return 'pza' for 'pizza'.
Please recommend me the efficient way.
I am using node js mongodb native driver.
The text search feature in MongoDB (as at 2.6) does not have any built-in features for fuzzy/partial string matching. As you've noted, the use case currently focuses on language & stemming support with basic boolean operators and word/phrase matching.
There are several possible approaches to consider for fuzzy matching depending on your requirements and how you want to qualify "efficient" (speed, storage, developer time, infrastructure required, etc):
Implement support for fuzzy/partial matching in your application logic using some of the readily available soundalike and similarity algorithms. Benefits of this approach include not having to add any extra infrastructure and being able to closely tune matching to your requirements.
For some more detailed examples, see: Efficient Techniques for Fuzzy and Partial matching in MongoDB.
Integrate with an external search tool that provides more advanced search features. This adds some complexity to your deployment and is likely overkill just for typeahead, but you may find other search features you would like to incorporate elsewhere in your application (e.g. "like this", word proximity, faceted search, ..).
For example see: How to Perform Fuzzy-Matching with Mongo Connector and Elastic Search. Note: ElasticSearch's fuzzy query is based on Levenshtein distance.
Use an autocomplete library like Twitter's open source typeahead.js, which includes a suggestion engine and query/caching API. Typeahead is actually complementary to any of the other backend approaches, and its (optional) suggestion engine Bloodhound supports prefetching as well as caching data in local storage.
The best case for it would be using elasticsearch fuzzy query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html
It supports levenshtein distance algorithm out of the box and has additional features which can be useful for your requirements i.e.:
- more like this
- powerful facets / aggregations
- autocomplete

Are there any technologies that help develop website search?

PROBLEM:
I need to write an advanced search functionality for a website. All the data is stored in MySQL and I'm using Zend Framework on top. I know that I can write a script that takes the search page and builds an SQL query out of it, but this becomes extremely slow if there's a lot of hits. Then I would have to get down to the gritty details of optimizing the database tables/fields/etc. which I'm trying to avoid if possible.
Lucene: I gave Lucene a try, but since it's a full-text search engine, it does not allow any mathematical operators!! So if I wanted to get all the records where field_x > 5, there is no way to do it (correct?)
General Practice? I would like to know how large sites deal with this dilemma. Is there a standard way of doing this that I don't know about, or does everyone have to deal with the nasty details of optimizing the database at some point? I was hoping that some fast indexing/searching technology existed (e.g. Lucene) that would address this problem.
ANY OTHER COMMENTS OR SUGGESTION ARE MOST WELCOME!!
Thanks a lot guys!
Ali
You can use Zend Lucene for textual search, and combine it with MySQL for joins.
Please see Mark Krellenstein's Search Engine vs DBMS paper about the choice; Basically, search engines are better for ranked text search; Databases are better for more complex data manipulations, such as joins, using different record structures.
For a simple x>5 type query, you can use a range query inside Lucene.
Use Lucene for your text-based searches, and use SQL for field_x > 5 searches. I say this because text-based search is hard to get right, and you're probably better off leaving that to an expert.
If you need your users to have the capability of building mathematical expression searches, consider writing an expression builder dialog like this example to collect the search phrase. Then use a parameterized SQL query to execute the search.
SqlWhereBuilder ASP.NET Server Control
http://www.codeproject.com/KB/custom-controls/SqlWhereBuilder.aspx
You can use filters in Lucene to carry out a text search of a reduced set of records. So if you query the database first to get all records where field_x > 5, build a filter (a list of lucene document IDs) and pass this into the lucene search method along with the text query. I'm just learning about this, here's a link to a question I asked (it uses Lucene.Net and C# but it may help) - ignore my question, just check out the accepted answer:
How do you implement a custom filter with Lucene.net?

Resources