Software alternatives to Google Search Appliance (GSA)

I am interested in software alternatives to the Google Search Appliance (GSA) for use in a (large) university context. Does anyone have experience of migrating from GSA to an alternative solution? If so, what were the reasons for doing this (technical, financial, staff effort, etc.) and have the experiences been positive?

I would recommend looking up Apache Solr; it is IMHO the best scalable, feature-rich search server out there. It is a F/OSS out-of-the-box solution from the Apache Software Foundation and is used by organizations such as Netflix, AOL, CNet etc. We had used GSA in our company for a year before moving to Solr. The move was relatively painless compared to the benefits accrued.
Since it exposes a RESTful interface, it can be integrated into your platform of choice without language/platform tie-ins. Give it a whirl!

We are currently moving from Google (GSA) to Microsoft FAST (specifically FSIS).
The reason is simple: we are not satisfied with the Google experience from a supportability and manageability perspective. We have chosen FAST because it gives us a platform that can scale as our needs grow over the next few years. It also gives us a very fine level of control. What I mean is that it will give us the ability to define custom fields and then control how these fields are populated.

The company I work for is a Google GSA partner and has developed a solution on top of the GSA. We also have a cloud solution with very similar benefits to the GSA and a host of things that the GSA can't do - like scale geographically, scale with load, upload data and have it in the index in near real-time, have nested records, deal with hierarchy etc...
In our experience, the people who migrated from the GSA to the Cloud solution did so for the following reasons.
Primarily, they did not want to manage hardware.
Most of our customers are ecommerce / media companies, and they had a lot of navigation. The GSA search throughput really struggles when you have a lot of navigations / refinements. For example if you have 20 navigations, the throughput drops from around 50 queries per second to about 12.
Indexing time - the GSA has a minimum of 7 minutes for something to show up in the index, and for ecomm / media these times are unacceptable.
GroupBy has written migration tools to allow the smooth transition from GSA --> Cloud and also the cloud platform accepts the same format that the GSA accepts.
Have the experiences been positive? Well, clearly I'm going to be biased and say yes, but there are hard conversion increases that support the clients' positivity. :-)
More details at: www.groupbyinc.com

Related

Search Engines for Password Protected Sites

Our institution is moving away from Google Search Appliance since it has been discontinued. They are switching over to Google Custom Search Engine for our sites.
We have one site that is password protected so CSE won't work for it. Has anyone had experience with other web search solutions that work with password protected sites? It can be hosted locally on our server or cloud based, provided that data isn't accessible to everyone, like it would be with Google.

I have personally reviewed many of the replacement appliances you can find listed on Gartner's 2017 Magic Quadrant (a popular industry report that covers leading enterprise search engines), so I feel I can give you an unbiased and informed opinion here.
If you are looking for an on-premises solution like GSA, at a similar price, that is capable of working with password protected sites and much more, I would recommend these:
Lucidworks Fusion: excellent choice, well polished admin console, super fast to get search instances running. This will feel like a nice upgrade from your GSA setup. They also have a very good list of existing clients. It is built on top of Lucene/Solr, and Lucidworks is the main driving force behind Solr. They have some good AI under the hood too, so you don't need an army of editors to be continually tuning your engine (though it doesn't hurt). If you would like to test it, you can download a trial version from their site. https://lucidworks.com/
MindBreeze: very similar to GSA, physical server you can install in your data center. Easy to tune. You'll get similar relevancy as per GSA with just the basic tuning. Interface feels a little dated for my liking. https://www.mindbreeze.com/
SearchBlox: a lower cost solution you install on a VM. A little more basic than the others, but cheap and cheerful. They also have a 30 day trial version that you can install on a local VM for testing purposes. https://www.searchblox.com/
If you want something cloud-based, you have plenty of options too. Lucidworks and SearchBlox can both be installed on AWS and other cloud platforms. You also have some of the market leader options too, such as Attivio, Sinequa and Coveo. They are great, but I suspect offer a little more than you need (and can get costly).

User orientated search?

I am currently looking into finding a search device that can facilitate a lot of documents and a few different websites and an LMS.
Where this differs is that we would like relevancy to depend heavily on user roles. Everyone is auto-logged into all of our systems via SSO, much like this site. We want heavy weighting to be put on documents, web site articles/knowledgebase, and classes in the LMS that are for that user's selected role.
I personally have limited knowledge of solr which we use for some full text searches. I have considered looking into elasticsearch, solr, google appliance, and FAST.
Do any of these have any innate features that will help me get to my end goal faster? My worry about elasticsearch and solr is the amount of development time. Our group has done limited search customization, so I am also wondering about the dev time needed for the various solutions.

You must not have done much research yet, because FAST is gone (absorbed by Microsoft) and Google Appliance is not very customizable.
Solr is more customizable than Elasticsearch, so that's my recommendation. For the rest, I would start by having a field for the role and then using that as a boost factor.
If the basic approach does not work, the Solr-Users mailing list may actually be a better place to follow-up as it allows a discussion to fine tune the issue.
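To make the "role field as a boost factor" idea concrete, here is a minimal sketch that builds the request parameters for such a query. It assumes the documents carry a field named `role` (a hypothetical name, as are the example values) and relies on the `bq` boost-query parameter of Solr's eDisMax parser; treat it as a starting point rather than a tuned setup.

```python
from urllib.parse import urlencode


def role_boosted_params(user_query: str, role: str, boost: float = 5.0) -> dict:
    """Build Solr parameters that favour documents tagged with the user's role."""
    # Escape characters that would break out of the quoted phrase.
    safe_role = role.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": user_query,
        "defType": "edismax",                 # bq is an (e)dismax parameter
        "bq": f'role:"{safe_role}"^{boost}',  # 'role' is an assumed field name
    }


# Hypothetical usage: the role would come from the SSO session.
params = role_boosted_params("expense policy", "manager")
query_string = urlencode(params)  # append to the core's /select URL
```

Documents for other roles still match; they just rank lower, which is usually what you want when a user searches outside their own area.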
(Updated)
For more packaged solutions that integrate with Solr and include crawling, you can look at:
Apache Nutch to add crawling specifically
Apache ManifoldCF - has a lot of integrators into other data sources
LucidWorks Fusion (commercial)
Flexile search platform (commercial)
Cloudera - a Big Data solution, but it integrates with Solr and may have a crawler. It does have a nice UI (Hue)
And probably more, if you dig a little bit.

Architecture of the Catalog Search of an Online Store - Valuable Resources

I'm currently working on an online store and I'm curious if there are any "best practices" that I should consider to attain subsecond (or close to) search operations. I'm using Full Text Search in SQL Server 2008, which I'm sure I could optimize in various ways. Right now, searches within Management Studio alone are taking roughly 2-3 seconds. Furthermore, I'm curious if client or server-side caching of some sort could be utilized. The database for the catalog contains millions of records. Does anyone know how Amazon.com or Borders.com return search results so quickly? Are there any books or articles that discuss search optimization and architecture? This isn't to be confused with search-engine optimization. Right now, I don't care about how visible the site is to the public.

Those websites use full text search or IR libraries. Apache Lucene is an open source framework that fits your needs well. These information retrieval (IR) libraries use an inverted index to get better search performance, at the cost of the time spent building the index. Also look into using facets and collaborative filtering (the suggestion list you see on Amazon) using Taste.
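As a rough illustration of why an inverted index makes lookups fast, here is a toy sketch in Python. It is not how Lucene is implemented (no ranking, stemming or on-disk segments), and the catalog data is made up; it only shows the trade of up-front indexing work for cheap queries.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up catalog entries for illustration.
catalog = {1: "red running shoes", 2: "blue running jacket", 3: "red rain jacket"}
index = build_index(catalog)
hits = search(index, "red jacket")
```

A query only touches the posting sets for its own terms, so its cost depends on how many documents match those terms rather than on the size of the whole catalog.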

www.acm.org/dl
computer.org
searchenginewatch
microsoft/enterprisesearch whitepapers
lucidimagination
autonomy
endeca
All of these resources publish consumable information that is useful and usually neither too obscure nor too facile.
You can get the task done with MSSQL 2008, but you need to spend more time than a question on Stack Overflow can get you. IMHO.
Note: It's fine to explore the implementation issues before you architect, but it's not always a good idea to bring those implementation details into the architecture.

Embed Google/ Yahoo search into a web site or build your own

I am looking for an opinion on whether to use Google custom search, Yahoo search builder or build my own for web projects (no more than 100 pages of content). If I should build my own, do you have any fast start kits you could recommend?
Many thanks
Chris

I have had success using OpenSearch for my personal blog.
While working at BigCorp we used dedicated search appliances in yellow boxes, but in your case (around 100 pages) it does not make sense to take such a route.

I would suggest going with either Google Custom Search, or Yahoo Search Builder (as long as they both index your site sufficiently to provide good results).
More often than not, you'll get better results and you don't have to worry about building your own custom engine (or implementing an off the shelf/open source piece of software to do the job for you).

I've used IBM OmniFind Yahoo Edition and had fantastic results with it. You are limited to a single index per implementation, but it's very fast, easy to integrate with, and extensible in terms of search customization. I've used it with an ASP.NET site without issue. One caveat is that it needs to be installed on the server and run as a service, so it is out of the question for most shared hosting. It has the indexing capabilities of general search engines (pdf/html/etc), which is very nice.
Edit:
I forgot to mention that some of the reasons I liked it vs other options are that it is free and doesn't require any additional hardware, just FYI.

The main situation where I see Google/Yahoo being sub-optimal is when your site relies on up-to-the-minute results. You're at the mercy of their crawling policies/speed/etc. If that's okay (and I suspect it will be for most 100ish page sites), use them - the results will be great. If realtime results are important, you may have to bite the bullet and install something locally.

Yahoo BOSS is cheaper and recommended by many people.
I am going to integrate it soon.

search integration

I am working on a website that currently has a number of disparate search functions, for example:
A crawl 'through the front door' of the website
A search that communicates with a web-service
etc...
What would be the best way to tie these together, and provide what appears to be a unified search function?
I found the following list on wikipedia
Free and open source enterprise search software
Lucene and Solr
Xapian
Vendors of proprietary enterprise search software
AskMeNow
Autonomy Corporation
Concept Searching Limited
Coveo
Dieselpoint, Inc.
dtSearch Corp.
Endeca Technologies Inc.
Exalead
Expert System S.p.A.
Funnelback
Google Search Appliance
IBM
ISYS Search Software
Microsoft (includes Microsoft Search Server, Fast Search & Transfer)
Open Text Corporation
Oracle Corporation
Queplix Universal Search Appliance
SAP
TeraText
Vivísimo
X1 Technologies, Inc.
ZyLAB Technologies
Thanks for any advice regarding this.

Solr is an unbelievably flexible solution for search. In the last year alone I coded 2 Solr-based websites and worked on a third existing one, and each worked in a very different way.
Solr simply eats XML requests to add something to an index, and HTTP requests to search for something inside an index. It doesn't do crawling or text extraction for you, but most of the time these are easy to do. There are many existing add-ons for the Solr/Lucene stack, so maybe something for you already exists.
I would avoid proprietary software unless you're sure Solr is insufficient. It's one of the nicest programs I've worked with, very flexible when you need it and at the same time you can start in minutes without reading long manuals.
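For a sense of what the XML "add" requests mentioned above look like, here is a small sketch that builds one with Python's standard library. The field names and values are made up for illustration; the resulting string is what you would POST to a core's update handler with an XML content type, and it still needs a commit before it becomes searchable.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(doc: dict) -> str:
    """Serialize one document as a Solr <add><doc>...</doc></add> message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; 'id' and 'title' must exist in your schema.
payload = solr_add_xml({"id": "42", "title": "Campus map"})
```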

Note that no matter what search solution you use, a search setup is "disparate" by nature.
You will still have an indexer, and a search UI, or the "framework".
You WILL corner yourself by marrying a specific search technology. You actually want to keep the UI as separate from the search backend as possible. The backend may stop scaling, or there may be a better search engine out there tomorrow.
Switching search engines is very common, so never - ever - write your interface with a specific search engine in mind. Always abstract it, so the UI is not aware of the actual search technology used.
Keep it modular, and you will thank yourself later.
By using a standard web services interface, you can also allow 3rd parties to build stuff for you, and they won't have to "learn" whatever search engine you use on the backend.
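One way to picture the abstraction described above is a small interface that the UI codes against, with the engine hidden behind it. The names below (`SearchBackend`, `InMemoryBackend`, `render_results`) are hypothetical, and the in-memory backend is only a stand-in for whatever engine sits behind the interface.

```python
from typing import Dict, List, Protocol


class SearchBackend(Protocol):
    """The only thing the UI is allowed to know about search."""

    def search(self, query: str, limit: int = 10) -> List[str]: ...


class InMemoryBackend:
    """Stand-in engine: substring match over page bodies, returns titles."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages

    def search(self, query: str, limit: int = 10) -> List[str]:
        q = query.lower()
        hits = [title for title, body in self.pages.items() if q in body.lower()]
        return hits[:limit]


def render_results(backend: SearchBackend, query: str) -> str:
    """UI code: works with any object that satisfies SearchBackend."""
    hits = backend.search(query)
    return "\n".join(hits) if hits else "No results"


backend = InMemoryBackend({"Admissions": "How to apply", "Library": "Opening hours"})
output = render_results(backend, "apply")
```

Swapping engines then means writing one new class with the same `search` method; `render_results` and the rest of the UI stay untouched.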

Take a look at these similar questions:
Best text search engine for integrating with custom web app?
How do I implement Search Functionality in a website?
My personal recommendation: Solr.

All these companies offer different features of Universal Search. Smaller companies carved themselves very functional and extremely desired niches. For example Queplix enables any search engine to work with structured data and enterprise applications by extracting the data, business objects, roles and permissions from all indexed applications. It provides enterprise-ranking criteria as well as data-compliance alerts.

Two other solutions that weren't as well-known and/or available around the time the original question was asked:
Google Custom Search - especially since the disable public URL option was recently added
YaCy - you can join the network or download and roll your own independent servers
