Are there any other indexing/searching engines like Sphinx and CLucene?
I would like to split my document into several parts and index each part individually for later look-up.
Is there any lighter (scalable) engine?
Cheers
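To make the "split the document into parts and index each part" idea concrete, here is a minimal sketch of a toy inverted index in Python. It is not how Sphinx or CLucene work internally. The function names and the "N sentences per part" splitting rule are assumptions made up for illustration.

```python
import re
from collections import defaultdict


def split_into_parts(text, part_size=3):
    """Split text into parts of `part_size` sentences each (hypothetical rule)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + part_size])
            for i in range(0, len(sentences), part_size)]


def build_index(parts):
    """Map each lowercase word to the set of part numbers that contain it."""
    index = defaultdict(set)
    for part_id, part in enumerate(parts):
        for word in re.findall(r"\w+", part.lower()):
            index[word].add(part_id)
    return index


def lookup(index, word):
    """Return the sorted part numbers that contain `word` (case-insensitive)."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    doc = "Sphinx indexes text. CLucene is a C++ port. Both are search engines."
    parts = split_into_parts(doc, part_size=1)
    index = build_index(parts)
    for part_id in lookup(index, "search"):
        print(part_id, parts[part_id])
```

A real engine adds ranking, stemming and on-disk storage on top of this, but the per-part look-up is the same idea.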
Related
We have many different documentation sites and I would like to search a keyword across all of these sites. How can I do that?
I already thought about implementing a simple web scraper, but this seems like a very ugly solution.
An alternative may be to use Elasticsearch and somehow point it to the different doc repos.
Are there better suggestions?
Algolia is the absolute best solution that I can think of. There's also Typesense and Meilisearch of course.
Algolia is meant specifically for situations like yours, so it even comes with a crawler.
https://www.algolia.com/products/search-and-discovery/crawler/
https://www.algolia.com/
https://typesense.org/
https://www.meilisearch.com/
Here's a fun page comparing them (probably a little biased in Typesense's favor)
https://typesense.org/typesense-vs-algolia-vs-elasticsearch-vs-meilisearch/
Here are some example sites that use Algolia Search
https://developers.cloudflare.com/
https://getbootstrap.com/docs/5.1/getting-started/introduction/
https://reactjs.org/
https://hn.algolia.com/
If you personally are just trying to search for a keyword, as long as they're indexed by Google, you can always search with the format site:{domain} "keyword"
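If you have several doc sites, a small helper can build one such query per domain. This is only a sketch: the domain list is a hypothetical placeholder, and it just builds the search URLs, it does not fetch or parse any results.

```python
from urllib.parse import urlencode

# Hypothetical documentation hosts; replace with your own.
DOC_SITES = ["docs.example.com", "wiki.example.org"]


def site_search_url(domain, keyword):
    """Build a Google search URL for an exact keyword restricted to one domain."""
    query = f'site:{domain} "{keyword}"'
    return "https://www.google.com/search?" + urlencode({"q": query})


def all_site_search_urls(keyword, domains=DOC_SITES):
    """Return one search URL per documentation domain."""
    return [site_search_url(domain, keyword) for domain in domains]


if __name__ == "__main__":
    for url in all_site_search_urls("rate limit"):
        print(url)
```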
You can check out Meilisearch for your use case. Meilisearch is a Rust-based, open-source search engine.
Meilisearch comes with a document scraper tool ( https://github.com/meilisearch/docs-scraper ) that can scrape content and then also index it.
To use it, you define in the scraper's configuration file exactly which content you want to make searchable. Then you can run the tool using Docker.
I'm having an issue where inline JavaScript is being displayed in Solr search results on my Drupal website. Is there a way to hide parts of my code from being indexed by Solr, similar to how Google uses googleoff:index and googleon:index to keep code from being indexed?
If you use the Solr search module for Drupal, you can tell Solr to index only specific fields of your content:
https://www.drupal.org/project/search_api_solr
That way, your JavaScript will not get indexed.
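If restricting the indexed fields is not enough, another option is to strip script content from the HTML before it reaches the indexer. The sketch below shows that idea in plain Python using only the standard library. It is not Drupal or Solr code, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def visible_text(html):
    """Return the text of `html` without script or style content."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `visible_text()` on a rendered node body before indexing would keep inline JavaScript out of the indexed text.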
I want to create a search page in Sails.js that will search through a MongoDB database. I know how to accomplish this. However, I was wondering if there is a way with Waterline, or any other option, to account for typos and alternate spellings. For example, if the MongoDB entry is "Springfield High School", how can I account for "Springfield High-School" or "Spring Field High School", etc.? I'm assuming that if this is possible it's done with Waterline in some way, but I haven't been able to find any good documentation (findLike()???).
MongoDB supports full text search through text indexes, including search string tokenization and simple language-specific stemming. See the linked page for a full description of features.
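As a rough illustration of the text-index approach, here is a sketch using a pymongo-style collection object rather than Waterline. The field name `name` and the function names are assumptions. Note that text search gives tokenization and stemming, not true typo tolerance, so a variant like "Spring Field" matches only through the other words it shares with the entry.

```python
def ensure_text_index(collection, field="name"):
    """Create a text index on `field`; "text" is the index type MongoDB expects."""
    return collection.create_index([(field, "text")])


def text_search_filter(phrase):
    """Build a $text filter; MongoDB tokenizes and stems the phrase itself."""
    return {"$text": {"$search": phrase}}


def search_schools(collection, phrase, limit=10):
    """Return up to `limit` matching documents, best text score first."""
    cursor = collection.find(
        text_search_filter(phrase),
        {"score": {"$meta": "textScore"}},  # expose the relevance score
    ).sort([("score", {"$meta": "textScore"})]).limit(limit)
    return list(cursor)
```

With a real collection you would call `ensure_text_index(db.schools)` once and then `search_schools(db.schools, "Springfield High-School")`, where `db.schools` is a hypothetical collection.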
I am doing a simple search engine project in C#, using Lucene.Net. I am done with indexing and simple searching. My project has one module that extracts all email IDs on a particular page. For this I need to use some regular expression logic. I searched and searched on the net, but couldn't find any example of how to search the index using RegexQuery or any other regular expression tool. Please help!
I am using lucene.net version 2.9.2
I just started researching the same thing and have come to the conclusion that Regex searches weren't supported in 2.9.2.
On the brighter side it looks like the newest version does do Regex searches.
http://incubator.apache.org/lucene.net/
To quote the site on version 2.9.4: "A couple of new features: Search.Regex, Simple Faceted Search, and simple phrase analysis in the Fast Vector Highlighter"
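If upgrading is not an option, one workaround is to run the regular expression over the page text itself rather than over the index. Here is a minimal sketch of that idea in Python (not Lucene.Net code). The pattern is deliberately simple and will not cover every valid address form.

```python
import re

# Deliberately simple pattern; it will not cover every valid address form.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_text):
    """Return the unique e-mail addresses in page_text, in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_PATTERN.findall(page_text):
        key = match.lower()  # treat addresses as case-insensitive for de-duplication
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result
```

The extracted addresses could then be stored in their own field at indexing time, so that a plain term query finds them later.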
Looking at the MediaWiki API documentation, it only talks of searching for images whose titles begin with the search term. I'd like to have a more general search.
http://www.mediawiki.org/wiki/API:Allimages
This API document does state that you can query the images like this:
http://en.wikipedia.org/w/api.php?action=query&list=allimages&aiprop=url&format=xml&ailimit=10&aifrom=Albert
However the aifrom= parameter seems kind of limited. Is there an alternative parameter to use for wildcard searches?
I don't think there is. Database table indexes can handle a wildcard at the end of a word, but not anywhere else, so more general wildcard searches would lead to performance problems on sites the size of Wikipedia.
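For the prefix-style search that is supported, here is a small sketch that builds the query URL. As far as I can tell from the API:Allimages page, `aiprefix` restricts results to titles beginning with the given value. The HTTPS endpoint and the JSON format are my own choices, not taken from the question, and the function name is made up.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def allimages_prefix_url(prefix, limit=10):
    """Build an allimages query URL for image titles that start with `prefix`."""
    params = {
        "action": "query",
        "list": "allimages",
        "aiprop": "url",
        "aiprefix": prefix,  # prefix match only; no general wildcards
        "ailimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(allimages_prefix_url("Albert"))
```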