Search Engines for Password Protected Sites - search

Our institution is moving away from Google Search Appliance since it has been discontinued. They are switching over to Google Custom Search Engine for our sites.
We have one site that is password protected so CSE won't work for it. Has anyone had experience with other web search solutions that work with password protected sites? It can be hosted locally on our server or cloud based, provided that data isn't accessible to everyone, like it would be with Google.

I have personally reviewed many of the replacement appliances listed in Gartner's 2017 Magic Quadrant (a popular industry report covering the leading enterprise search engines), so I feel I can give you an unbiased and informed opinion here.
If you are looking for an on-premises solution like GSA, at a similar price, that is capable of working with password-protected sites and much more, I would recommend these:
Lucidworks Fusion: excellent choice, well-polished admin console, super fast to get search instances running. This will feel like a nice upgrade from your GSA setup. They also have a very good list of existing clients. It is built on top of Lucene/Solr, and Lucidworks is the main driving force behind Solr. They have some good AI under the hood too, so you don't need an army of editors continually tuning your engine (though it doesn't hurt). If you would like to test it, you can download a trial version from their site. https://lucidworks.com/
MindBreeze: very similar to GSA, a physical server you can install in your data center. Easy to tune. You'll get relevancy similar to GSA's with just basic tuning. The interface feels a little dated for my liking. https://www.mindbreeze.com/
SearchBlox: a lower-cost solution you install on a VM. A little more basic than the others, but cheap and cheerful. They also have a 30-day trial version that you can install on a local VM for testing purposes. https://www.searchblox.com/
If you want something cloud-based, you have plenty of options too. Lucidworks and SearchBlox can both be installed on AWS and other cloud platforms. You also have some of the market leaders, such as Attivio, Sinequa and Coveo. They are great, but I suspect they offer a little more than you need (and can get costly).

Related

Build a web app based on a DMS kernel

I need your help for my question.
I need to build a web-based application that performs some document management tasks. I'm evaluating existing document management solutions, and I need one that exposes an API via REST or another protocol, so that I can interact with it from my application.
I have read about Alfresco, SharePoint and KnowledgeTree, but I find it difficult to understand the pricing for commercial use. Can someone help me with a comparison of features and prices for commercial use?
Alfresco is available in two versions, Alfresco Community Edition and Alfresco Enterprise. Alfresco Community is under the LGPL license. Assuming you want to use it in-house (not distribute it to others), you can use + customise + extend Alfresco Community to your heart's content, without restriction or charge. (LGPL/GPL/etc are distribution licenses, not use licenses, so only kick in when you redistribute). However, Alfresco Community comes with no commercial support, only support provided by the community. For a lot of uses that's good enough, but for other cases you'll want to be able to ring someone for support / get hotfixes backported to your version / etc. In that case...
Alfresco Enterprise is paid for, coming with commercial support (including SLAs, pick up the phone and talk to an expert etc), along with a handful of features that matter in big deployments (clustering being one). Pricing depends on a few things, mostly around size of deployment and SLA, but for small deployments isn't too bad. For big deployments, it can be a huge saving over other systems! Give sales a call, they're very friendly, and only rarely buy me beer ;-)
If you don't want to run your own repo, there's also the Alfresco Cloud version, which comes with a public API. With this, Alfresco themselves run and maintain the instance for you, and you can use the public API to store / retrieve / manage / etc your content. It's much simpler to get going with! But you don't quite get as much control or customisation as with the on-premise versions.
SharePoint might already be covered by your existing Microsoft licensing deal, if you have one. If not, you'd need to decide between licensing on a per-server or per-user basis. See Microsoft pages like this to get an idea of the options, then ring your Microsoft sales rep to get an idea of the pricing. In many cases, you'd need to pay someone else for support, so you'd be back to a similar thing as with Alfresco Community vs Enterprise.
If you're not sure what system to go with, you might be safest and best off implementing your project using CMIS (Content Management Interoperability Services). This provides a common way to talk to content repositories, allowing you to store/retrieve/browse/search/permissions/etc irrespective of what the underlying repo is. Alfresco provide some information on it, and Apache Chemistry provides open source client libraries for most common programming languages, which makes getting started very quick. There's also an excellent book on CMIS which I can very much recommend! And not only because the authors of that have been known to buy me beer too... ;-)
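To give a feel for what talking to a repository through CMIS can look like, here is a minimal sketch that only builds the URL for a query against the CMIS 1.1 browser binding. The repository URL is hypothetical, and the parameter names (`cmisselector=query`, `q`, `maxItems`) are my reading of the browser binding; check them against your repository's documentation before relying on them. In practice you would more likely use one of the Apache Chemistry client libraries than build URLs by hand.

```python
from urllib.parse import urlencode


def build_cmis_query_url(repository_url: str, statement: str, max_items: int = 25) -> str:
    """Build a GET URL that runs a CMIS query via the browser binding.

    Assumption: the repository exposes the CMIS 1.1 browser binding and
    accepts the query statement in the `q` parameter.
    """
    params = {
        "cmisselector": "query",  # ask the repository URL for a query result
        "q": statement,           # CMIS query language statement (SQL-like)
        "maxItems": max_items,    # page size
    }
    return f"{repository_url}?{urlencode(params)}"


# Hypothetical repository URL, for illustration only.
url = build_cmis_query_url(
    "https://dms.example.org/cmis/browser/main",
    "SELECT cmis:name FROM cmis:document",
)
print(url)
```

The same statement should work whether the repository behind the URL is Alfresco, SharePoint or something else, which is the point of coding against CMIS rather than a vendor API.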

Software alternatives to Google Search Appliance (GSA)

I am interested in software alternatives to the Google Search Appliance (GSA) for use in a (large) university context. Has anyone experiences of migrating from GSA to an alternative solution? If so, what were the reasons for doing this (technical, financial, staff effort, etc) and have the experiences been positive?
I would recommend looking at Apache Solr; it is IMHO the best scalable, feature-rich search server out there. It is a F/OSS out-of-the-box solution from the Apache Software Foundation and is used by organizations such as Netflix, AOL, CNet etc. We had used GSA in our company for a year before moving to Solr. The move was relatively painless compared to the benefits accrued.
Since it exposes a RESTful interface, it can be integrated into your platform of choice without language/platform tie-ins. Give it a whirl!
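As a rough illustration of that HTTP interface, here is a minimal sketch that builds a query URL for Solr's standard `/select` handler. The host, port and core name (`campus`) are assumptions for the example, not anything from a real deployment; any language that can issue an HTTP GET could do the same.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url: str, core: str, text: str, rows: int = 10) -> str:
    """Return the URL for a basic keyword search against one Solr core."""
    params = {
        "q": text,     # the user's search terms
        "rows": rows,  # number of results to return
        "wt": "json",  # ask for a JSON response
    }
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Hypothetical local instance and core name.
print(build_solr_query_url("http://localhost:8983", "campus", "exam timetable"))
# prints http://localhost:8983/solr/campus/select?q=exam+timetable&rows=10&wt=json
```

Fetching that URL with any HTTP client returns the matching documents, so the search front end can be written in whatever stack the rest of the site already uses.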
We are currently moving from Google (GSA) to Microsoft FAST (specifically FSIS).
The reason is simple: we are not satisfied with the Google experience from a supportability and manageability perspective. We have chosen FAST because it gives us a platform that can scale as our needs grow over the next few years. It also gives us a very fine level of control. What I mean is that it will give us the ability to define custom fields and then control how these fields are populated.
The company I work for is a Google GSA partner and has developed a solution on top of the GSA. We also have a cloud solution with very similar benefits to the GSA and a host of things that the GSA can't do - like scale geographically, scale with load, upload data and have it in the index in near real-time, have nested records, deal with hierarchy etc...
In our experience, the people who migrated from the GSA to the Cloud solution did so for the following reasons.
Primarily, they did not want to manage hardware.
Most of our customers are ecommerce / media companies, and they had a lot of navigation. The GSA search throughput really struggles when you have a lot of navigations / refinements. For example if you have 20 navigations, the throughput drops from around 50 queries per second to about 12.
Indexing time - the GSA has a minimum of 7 minutes for something to show up in the index, and for ecomm / media these times are unacceptable.
GroupBy has written migration tools to allow the smooth transition from GSA --> Cloud and also the cloud platform accepts the same format that the GSA accepts.
Have the experiences been positive? Well, clearly I'm going to be biased and say yes, but there are hard conversion increases that support the clients' positivity. :-)
More details at: www.groupbyinc.com

Open source alternative to WebEx WebOffice?

I have a client who has been using WebOffice (from WebEx) for a variety of tasks within their small organization. The problem is that they only really need a small subset of the features WebOffice provides (Contact list, Database, and Document Storage).
They've asked me to develop a website focused on these three features with the rationalization that this should be more cost-effective, since they currently aren't using many of the features of WebOffice they pay for.
What are some open-source alternatives that I could implement for them? SharePoint sounds like it would be too bloated and Google Apps may not be as collaborative as they would like.
We looked at SharePoint and went like "meh". Anything interesting you want to do with it requires prohibitive licensing, and if you expose any piece of it to the internet then the cost just blows any budget away.
We are piloting a deployment of Alfresco, with KnowledgeTree also being a very decent option, IMO. As for the main site, something like OpenAtrium looks like a pretty decent and flexible fit without much configuration needed. OpenAtrium is based on Drupal.
SharePoint sounds like a good match? Did whoever told you it was bloated also mention why?
You might only need WSS which is free (if you have Windows Server).
My company hosts LumiPortal (www.lumiportal.com), which is similar to WebOffice but with drive letters for storage. If you have in-house technical expertise, then on the open source side we see Joomla and Drupal, which could be thought of as classic content management systems. In that case, you might look at Drupal and its document management component first.
Call WebOffice customer service and tell them. They will probably adjust your payment options to suit your needs.
There's a good roundup of online collaboration/office suites here although it is a bit dated now.
http://www.readwriteweb.com/archives/web_office_2007_year_in_review.php
Webex WebOffice hasn't been updated in 5 years and has been sunset by Webex with no migration path (confirmed in their forums) so I would get off it ASAP.
With the addition of Wave to Google Apps it would seem to be a much more cost effective and modern replacement.

Embed Google/ Yahoo search into a web site or build your own

I am looking for an opinion on whether to use Google Custom Search, Yahoo Search Builder or build my own for web projects (no more than 100 pages of content). If I should build my own, do you have any fast start kits you could recommend?
Many thanks
Chris
I have had success using OpenSearch for my personal blog.
While working at BigCorp we used dedicated search appliances in yellow boxes, but in your case (around 100 pages) it does not make sense to take such a route.
I would suggest going with either Google Custom Search, or Yahoo Search Builder (as long as they both index your site sufficiently to provide good results).
More often than not, you'll get better results and you don't have to worry about building your own custom engine (or implementing an off the shelf/open source piece of software to do the job for you).
I've used IBM OmniFind Yahoo Edition and had fantastic results with it. You are limited to a single index per implementation, but it's very fast, easy to integrate with, and extensible in terms of search customization. I've used it with an ASP.NET site without issue. One caveat is that it needs to be installed on the server and run as a service, so it is out of the question for most shared hosting. It has the indexing capabilities of general search engines (pdf/html/etc), which is very nice.
Edit:
I forgot to mention that some of the reasons I liked it over other options are that it is free and doesn't require any additional hardware, just FYI.
The main situation I see Google/Yahoo as being sub-optimal is when your site relies on up-to-the-minute results. You're at the mercy of their crawling policies/speed/etc. If that's okay (and I suspect it will be for most 100ish page sites), use them - the results will be great. If realtime results are important, you may have to bite the bullet and install something locally.
Yahoo BOSS is cheaper and is recommended by many people.
I am going to integrate it soon.

Hardware infrastructure for a public web application

I'd like to start a free budget/personal finance site and will need plenty of horsepower and storage. I'm definitely a newbie, so how does one get started in terms of hardware infrastructure? Do I need to get a dedicated IP from my ISP and obtain my own servers? Do I go with Amazon or SQL Server Data Services/Azure or something like that? Are the latter services free, or is a discount available to non-profit/free services such as the budget/personal finance site I'm looking to start?
If you don't mind writing your web application in Python, then I'd suggest using Google App Engine. See: What Is Google App Engine?
What I like to do when I have new ideas for a site is to find an inexpensive hosting solution ($10 per month). This allows me to test the idea and see if the site is going to be successful. If it is a flop, I haven't wasted much money and if it is successful I can upgrade to better hosting (dedicated server).
There are many hosting options available and several of them have great tools such as an online SQL Server management studio. Your other option would be to host it yourself if you are prepared to deal with firewall issues, backups, storage, etc.
Whether it is feasible to DIY varies a lot by country...if you have a decent broadband connection with a fixed IP this can be the cheapest route to play around with first, especially if you need an awful lot of storage.
Note however that many fast broadband connections are only fast for downloads - when you're running a server, the speed your users will see is the upload speed, which is usually a lot less. Also, you'll need to do your own admin and backup etc.
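To put rough numbers on the upload-speed point, here is a back-of-the-envelope sketch. The page size and link speed are made-up figures for illustration, and it assumes the upload link is shared evenly between simultaneous visitors and ignores latency and protocol overhead.

```python
def page_load_seconds(page_kb: float, upload_mbps: float, concurrent_users: int = 1) -> float:
    """Seconds to send one page to each user over a shared upload link."""
    page_megabits = page_kb * 8 / 1000  # kilobytes -> megabits
    # Each user gets an equal slice of the upload bandwidth.
    return page_megabits * concurrent_users / upload_mbps


# Hypothetical: a 500 KB page served over a 1 Mbit/s home upload link.
print(f"{page_load_seconds(500, 1.0):.1f} s for one visitor")
print(f"{page_load_seconds(500, 1.0, concurrent_users=10):.1f} s each for ten visitors")
```

With these example figures a single visitor waits about 4 seconds and ten simultaneous visitors wait about 40 seconds each, even if the same connection downloads far faster.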
Apart from this, most hosting options have a price tag on top, varying from virtual hosts (sharing a real machine), to colocation (your machine in somebody's data center), to cloud services like Amazon et al. (which scale well), and you will need to shop around for the software stack and hardware features you really need.
There are really two ways to answer this question; what differentiates them is budget.
One is to properly design this solution, prototype it, benchmark the prototype, extrapolate anticipated user load, add overhead and scale accordingly. This takes time and money, but gives you a supportable solution that serves your customers well.
The other is to just give something, anything, a go and fix the problems as they come along. This is quicker and cheaper, but might be a headache for a while and might p*** off your customers.
Basically it comes down to budget.
Best of luck.
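For the first approach (benchmark, extrapolate, add overhead), the sizing arithmetic can be sketched as follows. All figures are hypothetical, and the 50% headroom default is just an assumption for the example; plug in the numbers from your own prototype benchmark.

```python
import math


def servers_needed(peak_requests_per_sec: float,
                   benchmarked_rps_per_server: float,
                   headroom: float = 0.5) -> int:
    """Estimate server count from a benchmark plus a safety margin.

    headroom=0.5 means provisioning for 50% more than the expected peak.
    """
    target_rps = peak_requests_per_sec * (1 + headroom)
    # Round up, and never go below one server.
    return max(1, math.ceil(target_rps / benchmarked_rps_per_server))


# Hypothetical: the prototype handled 80 requests/s per server,
# and the anticipated peak load is 200 requests/s.
print(servers_needed(200, 80))  # prints 4
```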

Resources