I am looking into using ElasticSearch as a search engine for one of the projects I am working on.
There is still one thing I need to find an answer for, and I hope someone here can help.
The customer wants to be able to see some search statistics, like Google Analytics: most searched words, new search words and so on.
Is there a way to easily set up this type of search statistics? My idea is that Elasticsearch stores a history of the search requests made to its REST API. My customer could then use Kibana or some other visualization tool to monitor that search history.
Hope someone can help me with an answer for this.
Regards
Jacob
You could lower the search slow log threshold so that it captures every request. However, this will produce large log files that require maintenance. Alternatively, you could write an application that handles all of your ES requests: it takes the search phrase, indexes it in a separate index (i.e. your search history index), and then processes the actual request as normal, returning the response to the user.
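A minimal sketch of the second approach, assuming Python: a thin wrapper records each search phrase and then forwards the request. The `backend` callable stands in for whatever actually queries Elasticsearch, and the in-memory `history` list stands in for the separate search history index; both are hypothetical placeholders, not part of any Elasticsearch client API.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Callable


class SearchHistoryProxy:
    """Records each search phrase, then forwards the request to the backend.

    `backend` is a hypothetical stand-in for the code that really queries
    Elasticsearch (e.g. a function wrapping an HTTP call to /_search).
    """

    def __init__(self, backend: Callable[[str], dict]):
        self.backend = backend
        # Stand-in for the separate "search history" index.
        self.history: list[dict] = []

    def search(self, phrase: str) -> dict:
        # 1. Log the normalized search phrase with a UTC timestamp.
        self.history.append({
            "phrase": phrase.strip().lower(),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        # 2. Handle the actual request as normal and return its response.
        return self.backend(phrase)

    def most_searched(self, n: int = 10) -> list[tuple[str, int]]:
        # The kind of aggregate a dashboard such as Kibana would display.
        return Counter(r["phrase"] for r in self.history).most_common(n)


if __name__ == "__main__":
    proxy = SearchHistoryProxy(lambda q: {"hits": [], "query": q})
    for q in ["laptop", "Laptop ", "garden chair"]:
        proxy.search(q)
    print(proxy.most_searched(2))  # [('laptop', 2), ('garden chair', 1)]
```

In a real deployment each history record would be indexed as its own document, so "most searched" and "new search words" become aggregations over that index rather than in-process counting.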
I need to know whether it is worth building a crawler on top of the results returned by a search engine.
By that I mean: for a given query, grab N URLs from a search engine and feed them into a crawler to find more pages relevant to the search. Is there any scientific paper or experiment showing that this helps gather more relevant pages than only taking the URLs from the search engine?
If I understood correctly, you would be rebuilding the search engine, because it is the search engine's job to bring the most relevant results first. You did not name your search engine, but I guess it is Google, so I would suggest using the advanced search options before trying anything else. Google provides an API for performing searches, which you can use in your system. If that approach does not suit you, it is possible to crawl Google's results and even perform custom searches (for example, filtering results by site or term), but Google would not be happy with this and would eventually block your calls. I suggest you give its open API a try.
I want to build a search mechanism similar to Google's using NLP (Natural Language Processing) in Java. The algorithm should be able to give auto-suggestions, spellcheck, get the meaning out of the sentence, and display the top relevant results.
For example, if I type "laptop", the relevant results shown should be ["laptop bags", "laptop deals", "laptop prices", "laptop services", "laptop tablet"].
Is it possible to achieve this with NLP and semantics? I would appreciate any reference links or ideas on how to achieve it.
"Get the meaning out of a sentence" - that's a really difficult task. I don't believe even Google does that in its search engine ;) When it comes to search, getting the meaning of the query is not that important... but it really depends on what you mean by "get the meaning". Anyway, you can always buy something like the "Google Search Appliance", which is a private Google search box.
All the other requirements are quite straightforward. I'm from Java land, so I'd suggest you look at:
Apache Lucene - if you are a developer; it's an indexing library built around full-text search.
Elasticsearch - a full-blown, fast, scalable server built around Lucene that can do most of what you are asking.
Solr - another one, equal to Elasticsearch in terms of functionality IMHO.
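The auto-suggestion requirement can be illustrated with a plain prefix lookup over known phrases. This is a minimal sketch in Python, not how Lucene, Elasticsearch or Solr implement their suggesters; the phrase list is assumed to come from your own query logs or catalog, and `build_index`/`suggest` are hypothetical names.

```python
import bisect


def build_index(phrases: list[str]) -> list[str]:
    # Sorted, de-duplicated, lower-cased phrases. Sorting places all
    # phrases that share a prefix next to each other.
    return sorted({p.lower() for p in phrases})


def suggest(index: list[str], prefix: str, limit: int = 5) -> list[str]:
    prefix = prefix.lower()
    # First position at which a phrase >= prefix could appear.
    start = bisect.bisect_left(index, prefix)
    results = []
    for phrase in index[start:]:
        if not phrase.startswith(prefix):
            break  # sorted order: once the prefix stops matching, we are done
        results.append(phrase)
        if len(results) == limit:
            break
    return results


idx = build_index(["laptop bags", "laptop deals", "laptop prices",
                   "laptop services", "laptop tablet", "desktop"])
print(suggest(idx, "laptop", limit=3))  # ['laptop bags', 'laptop deals', 'laptop prices']
```

Real suggesters also rank candidates (for example by popularity) instead of returning them alphabetically, which is where the search engines listed above earn their keep.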
How can I filter activities? For example, if I want to filter activities based on some content, it should list all the activities that contain that content. Is there any such functionality provided by GetStream?
Could you please suggest something?
I'm looking for the same thing. If we assume GetStream is like Facebook: Facebook doesn't really let you search its feed, and in general I think it's a tricky thing to do. What Facebook DOES have is that SLOW search at the top (you can tell it takes a long time, and half the time it doesn't even return good results). If we accept that our search will be slow, we can paginate through all the objects in our feed from GetStream and then search through them on our site. Not a good solution, but I haven't really seen any others.
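The paginate-then-filter workaround could look roughly like this in Python. It is a minimal sketch under stated assumptions: `fetch_page(offset, limit)` is a hypothetical function you would wire to your GetStream client's pagination, and the `"content"` field name is also an assumption about your activity data.

```python
from typing import Callable, Iterator

# Hypothetical page fetcher: (offset, limit) -> list of activity dicts.
FetchPage = Callable[[int, int], list]


def iter_feed(fetch_page: FetchPage, page_size: int = 25) -> Iterator[dict]:
    """Yield every activity in the feed, one page at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # a short or empty page means we hit the end
            return
        offset += page_size


def search_feed(fetch_page: FetchPage, text: str, page_size: int = 25) -> list:
    """Slow, client-side filter: keep activities whose "content" contains text."""
    needle = text.lower()
    return [
        activity
        for activity in iter_feed(fetch_page, page_size)
        if needle in str(activity.get("content", "")).lower()
    ]
```

The cost grows linearly with the size of the feed, which is exactly why this is "not a good solution": every search walks every page.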
I would like Solr to be able to "learn" from my website users' choices. By that I mean that I know which product the user clicks after performing a search. So I collected a list of [term searched => number of clicks] for each product indexed in Solr. But I can't figure out how to have a boost that depends on the user input. Is it possible to index some key/value pairs for a document and retrieve the value with a function usable in the boost parameter?
I'm not sure I'm being clear, so I'll add a concrete example:
Let's say that when a user searches for "garden chair", Solr returns 3 products: "green garden chair", "blue chair", and "hamac for garden".
"green garden chair" ranks first, the hamac ranks last, as expected.
But then all the users searching for "garden chair" end up clicking on the hamac.
I would like to help the hamac rank first for the search "garden chair", WITHOUT altering its rank on other searches. So I would like to be able to perform a key=>value based boost.
Is that possible with Solr?
I'm sure I can't be the first one needing this kind of user-based improvement of search results.
Thanks in advance.
You could use the edismax bq parameter, if you are using edismax (or maybe bf). For this to work, you obviously need to store the info (in a DB, Redis, whatever you fancy):
searched "garden chair":
clicked "hamac for garden": 10
clicked "green garden chair": 4
searched "green table":
...
And so forth. Look this up when a search comes in, and if there is info available for that search, send a bq boosting the documents you want.
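A minimal Python sketch of that lookup, assuming the click data is already loaded into a dict (standing in for the DB or Redis) and that your unique key field is called `id`; the document ids and the `CLICKS`, `build_params` and `build_query_string` names are hypothetical. Each clicked document becomes one optional boosted clause in `bq`.

```python
from urllib.parse import urlencode

# Hypothetical click store: normalized search phrase -> {document id: clicks}.
CLICKS = {
    "garden chair": {"hamac-for-garden": 10, "green-garden-chair": 4},
}


def build_params(user_query: str, id_field: str = "id") -> dict:
    params = {"defType": "edismax", "q": user_query}
    clicks = CLICKS.get(user_query.strip().lower())
    if clicks:
        # One boosted clause per clicked document, most clicked first,
        # e.g. id:"hamac-for-garden"^10 id:"green-garden-chair"^4
        params["bq"] = " ".join(
            f'{id_field}:"{doc_id}"^{count}'
            for doc_id, count in sorted(clicks.items(), key=lambda kv: -kv[1])
        )
    return params


def build_query_string(user_query: str) -> str:
    # URL-encoded parameters to append to /select?
    return urlencode(build_params(user_query))
```

Searches with no click data (such as "green table" here) get no `bq` at all, so their ranking is untouched. Using raw click counts as boost factors is a simplification; you would likely want to scale or cap them.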
Also, check out the QueryElevationComponent. It might serve your purpose (although it is stronger than just boosting). There are two things to consider, though:
Every time the click counts change, you would need to modify the XML and reload, so it would be better to batch this nightly or something like that.
There was a recent JIRA issue to provide similar functionality through request parameters, with no need for XML edits or a reload, so check that out too.
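For the nightly batch, a Python sketch could regenerate the elevation file from the click data. It assumes the standard `elevate.xml` shape (`<elevate>` containing `<query text="...">` elements with `<doc id="..."/>` children) and the same hypothetical click dict as before; where the file lives and how you trigger the reload depend on your setup.

```python
import xml.etree.ElementTree as ET


def build_elevate_xml(clicks: dict[str, dict[str, int]], top_n: int = 1) -> str:
    """Return elevate.xml content pinning the top_n most clicked docs per query."""
    root = ET.Element("elevate")
    for query_text in sorted(clicks):
        # Most clicked first; ties broken by id so the output is stable.
        ranked = sorted(clicks[query_text].items(), key=lambda kv: (-kv[1], kv[0]))
        query_el = ET.SubElement(root, "query", text=query_text)
        for doc_id, _count in ranked[:top_n]:
            ET.SubElement(query_el, "doc", id=doc_id)
    return ET.tostring(root, encoding="unicode")


print(build_elevate_xml({"garden chair": {"hamac-for-garden": 10,
                                          "green-garden-chair": 4}}))
```

Elevation forces documents to the top rather than nudging their score, which is the "stronger than just boosting" caveat above; keeping `top_n` small limits how much you override relevance.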
I'm trying to build a Solr search engine. I've already set up the misspelling system and the suggestions.
However, I can't find out how to retrieve the top 10 most searched words/terms/keywords in Solr/Lucene. How can I get this? I want to display them on my homepage.
Solr does not provide this kind of feature out of the box. There is the StatsComponent, which provides all kinds of statistics, but they are all numeric.
Depending on how you access Solr (directly or via your own app), you could intercept all calls and log the query string. I did this in a recent project, where I logged the queries to a database. If you submit all keywords to another core on your Solr server, you can run facet queries on your search terms as described by Hyque.
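A minimal Python sketch of the "intercept and log to a database" approach, using SQLite from the standard library as a stand-in for whatever database you use. The table and function names are hypothetical, and where your app calls `log_query` before forwarding the request to Solr is up to you.

```python
import sqlite3


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS query_log ("
        " term TEXT NOT NULL,"
        " searched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def log_query(conn: sqlite3.Connection, query_string: str) -> None:
    # Normalize so "Laptop" and "laptop " count as the same term.
    conn.execute("INSERT INTO query_log (term) VALUES (?)",
                 (query_string.strip().lower(),))
    conn.commit()


def top_terms(conn: sqlite3.Connection, n: int = 10) -> list[tuple[str, int]]:
    # The "top 10 most searched" list for the homepage.
    return conn.execute(
        "SELECT term, COUNT(*) AS hits FROM query_log"
        " GROUP BY term ORDER BY hits DESC, term ASC LIMIT ?",
        (n,),
    ).fetchall()
```

Keeping a timestamp per search also makes it easy to restrict the ranking to, say, the last week with a `WHERE searched_at >= ...` clause.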
You could use a facet to retrieve the top X words like this:
http://yourservergoeshere/solr/select?q=*:*&wt=xml&indent=true&facet=true&facet.field=message&facet.limit=10&facet.mincount=1
The value of facet.field is the field whose terms you want to count. With facet.limit you'll (obviously) limit the number of results to 10. You'll find the facet results at the end of the response, under "facet_counts".
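If each logged search is stored as a document in its own core, the same facet request yields the most searched terms. A Python sketch of building the request and reading the counts, assuming the JSON response writer (`wt=json`) with its default flat layout, where each entry in `facet_fields` is a list alternating term and count; `facet_url` and `top_facet_terms` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode


def facet_url(base: str, field: str, limit: int = 10) -> str:
    params = {
        "q": "*:*",           # match all documents
        "rows": 0,            # only the facet counts are needed, not documents
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.mincount": 1,
    }
    return f"{base}/select?{urlencode(params)}"


def top_facet_terms(response_body: str, field: str) -> list[tuple[str, int]]:
    # facet_fields holds a flat list: term, count, term, count, ...
    flat = json.loads(response_body)["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))
```

Note that facet counts are per indexed term, so the field type decides whether you count whole queries (a string field) or individual words (a tokenized text field).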
Edit: I really should go to bed earlier. I didn't see the "most searched" in your question. Sorry for that.
Apache Solr does not provide any such capability as of today. There is a desire for this and a JIRA ticket corresponding to it. You can vote for it if you'd like to see it in Solr some day: https://issues.apache.org/jira/browse/SOLR-10359.
The stats component provides statistics, but they are mostly numeric in nature. You could parse server logs and build a list of Frequently Searched Terms from them (e.g. pump those logs into SiLK or Kibana for visualization).
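A rough Python sketch of the log-parsing route. It assumes request log lines contain a fragment like `path=/select params={q=...&...}`; check your own Solr log format before relying on the regular expression, which is an assumption, and note it will cut off queries that themselves contain `}`.

```python
import re
from collections import Counter
from typing import Iterable
from urllib.parse import parse_qs

# Assumed shape: ... path=/select params={q=garden+chair&wt=json} hits=3 ...
# Caveat: [^}]* stops at the first "}", so queries containing braces
# (e.g. exclusive range queries) would be truncated.
PARAMS_RE = re.compile(r"path=/select params=\{([^}]*)\}")


def frequent_terms(log_lines: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    counts: Counter = Counter()
    for line in log_lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue
        # parse_qs decodes "+" and %-escapes: {"q": ["garden chair"], ...}
        for q in parse_qs(match.group(1)).get("q", []):
            q = q.strip().lower()
            if q and q != "*:*":  # skip match-all queries
                counts[q] += 1
    return counts.most_common(n)
```

The resulting pairs could be written to a file or index and then visualized in whatever tool you prefer.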
If you can change the front end and add some JavaScript to the UI, or can intercept the search request and make async or batch calls to tracking APIs, you can use SearchStax Analytics, which tracks searches, clicks, cart actions, revenue, etc.