Combine search engine and machine learning [closed] - search

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'm pretty new to search engines and a complete newbie at machine learning, but I'd like to know whether there is a way to combine the functionality of search engines like Elasticsearch or Apache Solr with machine learning projects like Apache Mahout, H2O or PredictionIO.
For example, say you work on a travel website where users can search for a destination. A user starts typing "au", so the first suggestions are "AUstria", "AUstralia", "mAUritius", "mAUritania", etc. This is typically what Elasticsearch can do.
But you know that this user has already travelled to Mauritania three times, so you want Mauritania to come first in the suggestions. I guess that's typically what machine learning can do.
Are there bridges between these two types of technology? Can machine learning do the work of a search engine efficiently?
I'm open to all answers, regardless of the technologies used. If you have ever dealt with this type of problem, I'm all ears :-)
Thank you

Your question is very general in nature, so my answer will have to be the same.
Consider a recommender framework such as the correlated co-occurrence (CCO) recommender in Apache Mahout. Unlike the vanilla Spark recommender, this implementation allows for multiple types of input, such as having viewed a web page, having booked a trip there before, demographic information, etc.
You would then calculate the recommendations for each user at whatever interval suits you. The recommendations are based on multiple criteria and on what other people similar to this user have done. Consider your 'items' in this case to be every destination in the world, so we now have every possible destination ranked for each user.
It is then a trivial extension to index Elasticsearch by user, storing the ordered list of that user's recommended destinations.
For example, we have a user who has visited Berlin, looked at several hotels in Vienna, and is from Romania. When the user types in "au", we would expect to see "Austria" come up in the results much higher than "Australia".
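To make the idea concrete, here is a minimal sketch of the re-ranking step in plain Python. It assumes the per-user recommendation scores have already been computed offline (by Mahout or anything else) and simply reorders the substring matches by those scores. The function name `suggest`, the destination list and the score values are all made up for illustration; in a real setup the matching would be done by the search engine itself.

```python
from typing import Dict, List


def suggest(prefix: str, destinations: List[str],
            user_scores: Dict[str, float], limit: int = 5) -> List[str]:
    """Return destinations containing `prefix`, best-scored for this user first."""
    needle = prefix.lower()
    # Step 1: plain text matching (the part a search engine would normally do).
    matches = [d for d in destinations if needle in d.lower()]
    # Step 2: personalised ordering. Destinations without a score count as 0.0;
    # ties are broken alphabetically so the output is deterministic.
    matches.sort(key=lambda d: (-user_scores.get(d, 0.0), d))
    return matches[:limit]


# Hypothetical data: the scores stand in for precomputed recommendations.
destinations = ["Australia", "Austria", "Mauritania", "Mauritius", "Berlin"]
user_scores = {"Austria": 0.9, "Australia": 0.2}

print(suggest("au", destinations, user_scores))
# ['Austria', 'Australia', 'Mauritania', 'Mauritius']
```

Keeping the scores in a separate lookup means the recommender can be retrained on its own schedule without reindexing the destinations.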
Per the comments and downvotes, you probably should have either A) asked a more specific programming question or B) asked this question on another forum such as Data Science Stack Exchange, FYI.

Related

What are some ways to figure out related products/questions/items anything? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
What are some ways, including machine learning, that I can use in my projects to generate things related to another item? For example related apps, related websites, related products, etc.
I've been brainstorming, and these are my strategies so far...
One way I can think of is to show items from the same category, but that would be too broad.
The 2nd way improves upon the previous one: keep track of what people click next and promote that item. Meanwhile, keep the bottom of the list randomized to let other relevant items show up and get clicked.
The 3rd way is to use machine learning: provide training data somehow and use that.
I want something simple but smart, in that it gets better with time.
Collaborative filtering is designed for solving exactly this problem. The problem with this approach is that it produces good results only when you have a lot of data. I mean... A LOT. And it's not really a simple thing to use; then again, no machine learning technique is simple. There are some Node.js packages for CF available, but I have no idea how good they are.
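For a feel of what the simplest version looks like, below is a rough sketch of item co-occurrence counting, which is a very basic item-based relative of collaborative filtering and not a full CF implementation. It counts how often two items show up in the same user session and ranks related items by that count. The session data and the names `build_cooccurrence` and `related` are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Set


def build_cooccurrence(sessions: Iterable[Set[str]]) -> Dict[str, Counter]:
    """Count, for every item, how often each other item appears in the same session."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for items in sessions:
        # Every ordered pair (a, b) with a != b in the session counts once.
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    return counts


def related(item: str, counts: Dict[str, Counter], k: int = 3) -> List[str]:
    """Return up to k items most often seen together with `item`."""
    neighbours = counts.get(item, Counter())
    # Highest count first; ties broken alphabetically for stable output.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical click/purchase sessions.
sessions = [
    {"tent", "stove", "lamp"},
    {"tent", "stove"},
    {"tent", "boots"},
    {"lamp", "stove"},
]
counts = build_cooccurrence(sessions)
print(related("tent", counts))  # ['stove', 'boots', 'lamp']
```

The counts get more useful as more sessions come in, which matches the "gets better with time" requirement, but with little data the ranking is mostly noise.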

The best search solution for a university website which has high-traffic time to time [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
We are building a university website and looking for a search solution for it. The site gets high traffic because the university has an open-university faculty with a very large number of students (approximately 1.5 million). We already use caching to speed up the website. Which search engine do you suggest for our situation?
Note: We are considering Solr, Elasticsearch or Sphinx for now, but it could also be one of the others.
Update: We need a full-text search engine that is fast and extendable, with features like query likening and indicating priority support.
Thanks.
It really depends on your use case, what features you want, and whether you have any experience with any of the technologies. I could paraphrase the arguments, but there's a very good discussion that covers the pros and cons of each here: "ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?"
Edit (in response to the question's edit):
Of these technologies I have only used Solr (and SQL), but I've found it easy to use and would recommend it. It supports sharding and replication natively, which should cover the extensibility issue. It also supports things like joins and field weighting, which I think covers all your needs if I read your requirements correctly.
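As a rough illustration of field weighting, the sketch below builds the query string for a Solr request that uses the eDisMax parser, where the `qf` parameter lists the fields to search and `field^N` boosts a field. The field names `title` and `body` and the helper `build_solr_query` are assumptions for the example; your schema and the core URL you send this to will differ.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode


def build_solr_query(user_query: str, field_weights: Dict[str, int],
                     rows: int = 10) -> str:
    """Build a URL query string for a weighted eDisMax search."""
    # "title^3 body^1" means a match in title counts three times as much.
    qf = " ".join(f"{field}^{weight}" for field, weight in field_weights.items())
    params = {"q": user_query, "defType": "edismax", "qf": qf, "rows": rows}
    return urlencode(params)


# Hypothetical fields: boost the title over the page body.
query_string = build_solr_query("exam schedule", {"title": 3, "body": 1})
print(query_string)
print(parse_qs(query_string)["qf"])  # ['title^3 body^1']
```

The resulting string would be appended to the select handler of whichever core holds the site content.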

Need to build a tool with NO IDEA how to start [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I want to build ... something (website? app? tool of some variety?) that searches other sites -- such as Amazon -- for specific items and then lists whether or not those items exist. Ideally it could also pull prices, but that's secondary.
I'd like to be able to enter a very specific search term (an identification number) into the thing that I build and then have it return whether or not the searched item exists on the sites that it checks (a predetermined list). I'd also like it to take a list of ID numbers and search them all at once.
I have no idea where to begin. Can anyone point me in the right direction? What do I need to learn to make this happen?
You will need to learn a few key languages in order to start working on a program like this.
PHP: you need a server-side language to fetch and scan the sites
JavaScript: for the input on the user's side
HTML: for the page that hosts the JavaScript
Once you learn the basics, search Stack Overflow for specific questions relating to a specific problem.
This is certainly too broad a question, but as the OP asks to be pointed in some direction, here are a few suggestions:
Well, this seems to be a big project. You'll need to find out whether the sites you want to fetch product info from offer an official API. If so, use the API to retrieve the product info; otherwise use web scraping, where you retrieve the data by parsing the page and storing it in your local database.
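The overall shape of such a tool can be sketched without committing to any particular site. In the snippet below each site is represented by a "checker" function that takes an ID and says whether the item exists; in a real tool a checker would call the site's API or scrape a page, but here they are stand-ins backed by in-memory sets. All names and IDs are hypothetical.

```python
from typing import Callable, Dict, Iterable

# A checker answers: does this item ID exist on one particular site?
Checker = Callable[[str], bool]


def check_ids(item_ids: Iterable[str],
              checkers: Dict[str, Checker]) -> Dict[str, Dict[str, bool]]:
    """Run every ID against every site and collect the results per ID."""
    return {
        item_id: {site: check(item_id) for site, check in checkers.items()}
        for item_id in item_ids
    }


# Stand-in catalogs; a real checker would do an API call or a page fetch.
site_a_catalog = {"B000123", "B000456"}
site_b_catalog = {"B000456"}
checkers: Dict[str, Checker] = {
    "site_a": lambda item_id: item_id in site_a_catalog,
    "site_b": lambda item_id: item_id in site_b_catalog,
}

report = check_ids(["B000123", "B000456", "B000999"], checkers)
for item_id, found in report.items():
    print(item_id, found)
```

Keeping each site behind its own function means you can start with one site and add others to the predetermined list later without touching the batch logic.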
Amazon provides EC2 instances, where you can rent a server with a specific configuration by the hour as needed, e.g. Linux with Apache/MySQL/PHP, or Linux with Java.
Amazon has a set of other tools, like S3 storage, where you can host your images/docs/video and link to them from the site.
Hope this helps in some way.

measuring precision and recall [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
We are building a text search solution and want a way to measure the precision and recall of the system every time we add new document types. From reading some of the posts here, it sounds like a machine learning based solution is the way to go. Can an expert comment on this? We will then look to add machine learning folks to our team.
Computing the F1-score requires knowing the correct class and rank of every sample returned by your evaluation queries, and you also need those evaluation queries themselves.
Any machine learning approach will need a large quantity of manual work to provide those samples and/or queries. So large that it won't save you any time.
Another drawback of this kind of evaluation is the intrinsic error of the learned model. It grows with the size of the search engine's index and the number of examples required, so you never get a good evaluation.
Forget machine learning for the evaluation of a search engine.
Build your test queries and samples by hand; over time the set will become big and reliable.
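Once you have hand-built queries with known relevant documents, the measurement itself is simple arithmetic. The sketch below computes precision, recall and F1 for one query using the standard set-based definitions (it ignores rank). The document IDs and the function name are made up for illustration.

```python
from typing import List, Set, Tuple


def precision_recall_f1(returned: List[str],
                        relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for a single evaluation query."""
    returned_set = set(returned)
    hits = len(returned_set & relevant)
    # Precision: share of returned documents that are relevant.
    precision = hits / len(returned_set) if returned_set else 0.0
    # Recall: share of relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    # F1: harmonic mean of the two, defined as 0 when both are 0.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical hand-labelled query: 3 relevant docs, engine returned 4.
relevant_docs = {"doc1", "doc4", "doc7"}
engine_results = ["doc1", "doc2", "doc3", "doc4"]

p, r, f1 = precision_recall_f1(engine_results, relevant_docs)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.500 recall=0.667 f1=0.571
```

Rerunning the same labelled queries after adding a new document type shows whether the numbers moved.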
If you really want machine learning in your system, you should look at query pre-processing. Getting some meta-information about the query by another route (you say SVN, why not?) is generally good for performance, and as long as it doesn't change the results, you can use the same samples for an end-to-end evaluation.
That is what I did a few years ago, but with a naive Bayes classifier on natural language analysis.

Which open-source search engine should be used? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
My aim is to build an aggregator of news feeds and blog feeds so as to make searching/tracking of entities in it easy. I have been looking at many solutions out there like Terrier, Lucene, SWISH-E, etc.
I could find only 2 comparison studies of these engines, and one of them is kinda outdated. I want a search engine for a case where the data size is not too large but indexing is frequent, every 30 minutes or so. I feel Terrier is not a good tool for this case; it works better when the data size is large and the update frequency is low. Can somebody who has worked in the Information Retrieval field offer some advice?
Lucene is well known and supported, so personally, that would be my first choice.
If you are looking for a ready-to-use search engine, check out fastcatsearch.
It has been developed for commercial search and applied to many different sites.
It is faster than Lucene and has a web-based management tool that makes it easy to use.
It is hosted on GitHub, so check it out: https://github.com/fastcatgroup/fastcatsearch
