I want to create a search page in Sails.js that searches a MongoDB database. I know how to accomplish this. However, I was wondering whether there is a way, with Waterline or any other option, to account for typos and alternate spellings. For example, if the MongoDB entry is "Springfield High School", how can I account for "Springfield High-School" or "Spring Field High School", etc.? I'm assuming that if this is possible, it's done with Waterline somehow, but I haven't been able to find any good documentation (findLike()?).
MongoDB supports full-text search through text indexes, including search-string tokenization and simple language-specific stemming. See the MongoDB documentation on text indexes for a full description of the features.
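For example, here is a minimal sketch in the mongo shell (the collection name schools and the field name are assumptions):

// Build a text index over the field(s) that should be searchable.
db.schools.createIndex({ name: "text" })

// $text tokenizes and stems the search string, so "Springfield High-School"
// still matches "Springfield High School"; sort by relevance score:
db.schools.find(
  { $text: { $search: "springfield high school" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

Waterline itself has no findLike() or full-text API, so with the sails-mongo adapter you would drop down to the native driver (e.g. via Model.native()) to run a query like this. Note that text search handles tokenization and stemming, not arbitrary typos; for true fuzzy matching you would need something like Elasticsearch.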
We have many different documentation sites and I would like to search a keyword across all of these sites. How can I do that?
I already thought about implementing a simple web scraper, but this seems like a very ugly solution.
An alternative may be to use Elasticsearch and somehow point it to the different doc repos.
Are there better suggestions?
Algolia is the absolute best solution that I can think of. There's also Typesense and Meilisearch of course.
Algolia is meant specifically for situations like yours, so it even comes with a crawler.
https://www.algolia.com/products/search-and-discovery/crawler/
https://www.algolia.com/
https://typesense.org/
https://www.meilisearch.com/
Here's a fun page comparing them (probably a little biased in Typesense's favor):
https://typesense.org/typesense-vs-algolia-vs-elasticsearch-vs-meilisearch/
Here are some example sites that use Algolia Search:
https://developers.cloudflare.com/
https://getbootstrap.com/docs/5.1/getting-started/introduction/
https://reactjs.org/
https://hn.algolia.com/
If you personally are just trying to search for a keyword, then as long as the sites are indexed by Google, you can always search with the format site:{domain} "keyword".
You can check out Meilisearch for your use case. Meilisearch is an open-source search engine written in Rust.
Meilisearch comes with a document scraper tool ( https://github.com/meilisearch/docs-scraper ) that can scrape content and then index it.
To use it, you define exactly which content you are searching for in the scraper tool's configuration file, and then run the tool using Docker.
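For illustration, a minimal configuration might look like this (a sketch; the index name, start URL, and CSS selectors are placeholders you would adapt to your sites):

{
  "index_uid": "docs",
  "start_urls": ["https://docs.example.com/"],
  "selectors": {
    "lvl0": ".docs-content h1",
    "lvl1": ".docs-content h2",
    "lvl2": ".docs-content h3",
    "text": ".docs-content p, .docs-content li"
  }
}

You then run the scraper container against your Meilisearch instance, passing the host URL and API key as environment variables and mounting this config file, as described in the repository's README.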
I'm using Drupal 8 with Solr 7.4 and the Search API module. I can't find a way to configure the Search API to return all indexed items when searching with "" (nothing, then pressing Enter) or when searching with "*". How can I enable such a search behavior?
Thanks a lot
Tim
If I understand your question correctly, you can create facets and configure the filters by specific taxonomy terms or content types. Before that, check which fields you are indexing into Solr.
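Note also that Solr itself returns every document for the match-all query *:*. As a quick sketch against a plain Solr core (host and core name are placeholders):

http://localhost:8983/solr/my_core/select?q=*:*&rows=20

So another approach is a small custom module that substitutes *:* when the user submits an empty query, altering the Search API query before it is sent to Solr.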
I have this requirement:
We have a journal article, and we wish to have sections whose content is for internal versus external users of the application.
We are able to hide the content from rendering by implementing a custom template on the Web Content Display and using a simple custom field on the user, which helps us classify them.
Having said that, when we search for something as an external user, the search portlet can fetch an article where the search text is part of the internal-user content, and due to the above-mentioned template that content is not visible.
In short, from the user's perspective, the resulting article does not match the searched term.
I am looking for pointers on whether there is a mechanism to ensure that when an external user searches for something, we only search the dynamic elements of the document that match the user type.
We have thousands of such articles, and creating multiple copies of the same article does not seem like a viable solution, so any pointers would be a great help.
Liferay version: 6.2 GA4 CE
Thanks!
AJ
First of all: not finding a search term in a document can be a sign of well-working synonym resolution in the search engine. It's questionable whether this behaviour is always wrong or only in this particular case. Remember Google bombs?
That being said, I believe that this architecture of half-visible documents is flawed from the beginning. Ideally, I'd suggest changing it, for example by splitting the information into two articles, so that you can use the standard permissions to resolve visibility. If you link both, you can determine how and which article or template to use. It's not an ideal solution, but it might be a workaround.
Another workaround might be to change Liferay's indexer component and index two different versions of the article, with two different permissions. Of course, you'll have to change the search side as well, so that you find each article at most once, even if it is now indexed twice in the search engine.
Again - not ideal, but might be the quickest fix that you can get right now without changing the underlying architecture. However, to change the underlying architecture is my actual recommendation.
I am trying to get multi-language stemming working with Solr. I have set up language detection with LangDetectLanguageIdentifierUpdateProcessorFactory as per the official Solr guides. The language is recognized, and now I have a whole bunch of dynamic fields like:
description_en
description_de
description_fr
...
which are properly stemmed.
The question now is: how do I search across so many fields? Building a long query every time that searches across dozens of possible language fields doesn't seem like a smart option. I have tried using copyField like:
<copyField source="description_*" dest="text"/>
but stemming is lost in the text field when I do that.
The text field is defined as solr.TextField with solr.WhitespaceTokenizerFactory. Maybe I am not setting up the text field properly, or how is this supposed to be done?
You have multiple options:
Search over all the fields you mentioned. There will always be some overhead: the more fields you use, the (gradually) slower the search will be. See the sketch after this list.
Try to recognize the query language and search over only the necessary fields: for example, the recognized language plus some default one. There are libraries for this.
Develop a custom solution with multiple languages in one field, which is possible and could work in production, according to Trey Grainger.
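For the first option, Solr's edismax query parser keeps multi-field search manageable: you list the fields once in the qf parameter instead of building a long query string. A sketch of the request parameters (field list abbreviated; you would list all your language fields):

defType=edismax
qf=description_en description_de description_fr
q=springfield

As for why the copyField attempt lost stemming: copyField copies the raw source text before analysis, and the destination field then applies its own analyzer. A field with only a whitespace tokenizer will never stem; the per-language analysis exists only on the description_* fields themselves.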
The question is a bit old, but maybe that answer will help other people.
I would like to use the full Lucene query syntax on an Orchard CMS based Website.
Currently, after enabling indexing and search in Orchard, I can search the website according to the fields I selected on the Orchard search administration page,
but I cannot perform a search on one particular field only (without changing the behavior of the entire search),
I cannot use fuzzy search...
From the logs, I can see that Orchard takes care of that part (providing Lucene with a proper query syntax), but I would like to do it on my own.
For example, when searching for "wel" on the website, Orchard will send Lucene this query: title:wel* body:wel* (if I have the title and body fields activated for search).
I did see some blogs that talk about coding some features to customize search, but I would like to be sure I'm not missing something before switching to developer mode :)
There are so many scenarios that can be done with search that there is no way to provide full coverage out of the box, which is why the API is very simple to use if you need custom searching capabilities.
You should copy-paste the controller from the Search module and use the Parse() method of ISearchBuilder with the escape parameter set to false. This will parse a raw Lucene query. You can also use WithField("body", "value") to do a simpler field search.
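As a rough sketch of the relevant part of such a controller (hypothetical: the _indexProvider field and the exact ISearchBuilder signatures are assumptions and vary between Orchard versions, so check Orchard.Indexing in your release):

// Build a search over the default "Search" index (sketch only).
var searchBuilder = _indexProvider.CreateSearchBuilder("Search");

// escape: false passes raw Lucene syntax through, e.g.
// q = "title:wel~ AND body:school" enables fuzzy and per-field queries.
searchBuilder.Parse(new[] { "title", "body" }, q, false);

// Alternatively, target a single field without full Lucene syntax:
// searchBuilder.WithField("body", "school");

var hits = searchBuilder.Search();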
I don't believe anyone has released any modules that provide additional search functionality, because if you need it, it is so simple to develop ^_^ So yes, you will have to go dev mode to do custom field searches.