Looking for Open Source Search Federation solutions. Essentially I have a myriad of data repositories that I am unable to connect directly to SOLR.
I have evaluated SESAT (http://sesat.no/home.html), which is a possibility, but I am curious whether there are other open source federated search solutions. Ideally I'd like to find something that can work alongside SOLR or Lucene.
Recently, Google Custom Search has been shutting down, with the service ending in April 2018 (don't quote me on that).
In light of this, I've been attempting to move our Drupal site's search to a new search engine, namely Apache Solr.
Our Drupal site hosts tons of files, from PDFs and images to JSON and XML files.
I haven't had any trouble indexing these files since they're stored locally on the same machine that hosts the Drupal site, but we have a bunch of external files that I used to have no problem searching with GCSE.
I want to be able to index external files and be able to search/query them with Solr just like I was able to search them with GCSE.
Is this possible? I'm sort of a noobie and have been following step-by-step guides up until now in order to get Solr search up and running on our site.
If anyone has any idea on how to search and query external files with Apache Solr, I'd be grateful.
Yes, it's possible to index external files in Apache Solr, and there are plenty of tutorials on how to do that.
I recommend looking through this reference guide, mainly the material under Indexing and Basic Data Operations. Pay particular attention to Uploading Data with Index Handlers, which covers indexing XML/XSLT, JSON and CSV data, and to Uploading Data with Solr Cell using Apache Tika, which explains how to index PPT, XLS, PDF and other more complex formats.
For querying, start with the guidelines under Searching. If you run into trouble, feel free to ask additional questions here.
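To make the two indexing routes a bit more concrete, here is a minimal sketch in Python that only builds the HTTP requests and does not send them. The host, port and core name (`localhost:8983`, `drupal`) are assumptions, as are the helper function names. `/update`, `/update/extract` and `/select` are the standard Solr handler paths, but the extract handler has to be enabled in your Solr configuration. For external files you would first download the bytes yourself and then post them to Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local Solr instance with a core named "drupal".
SOLR_CORE_URL = "http://localhost:8983/solr/drupal"


def json_update_request(docs):
    """Build a POST for the JSON index handler (/update)."""
    url = f"{SOLR_CORE_URL}/update?{urlencode({'commit': 'true'})}"
    body = json.dumps(docs).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


def extract_request(doc_id, file_bytes, content_type="application/pdf"):
    """Build a POST for Solr Cell (/update/extract), which runs Tika on the raw file."""
    # literal.id sets the document's id field, since the file itself carries no id.
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{SOLR_CORE_URL}/update/extract?{params}"
    headers = {"Content-Type": content_type}
    return Request(url, data=file_bytes, headers=headers, method="POST")


def select_url(query, rows=10):
    """Build a query URL for the standard /select search handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_CORE_URL}/select?{params}"
```

To actually send one of these requests, pass it to `urllib.request.urlopen`, for example `urlopen(extract_request("report-1", pdf_bytes))`, where `pdf_bytes` is the content of the external file you fetched.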
Is Sharepoint my best option to replace an aging network of fileshares? There's approx 1TB of data residing among 3 fileshares (1 DFS, 2 NAS boxes). A document management system is in place for new things - the file shares are now just read-only archives/legacy. Our users would simply need to be able to search for and open the documents.
Users are finding it difficult to locate their documents in the file shares, and Windows search often does not help. Sharepoint was suggested as something which would play nicely with Office documents (99% of the content) and have a good search facility.
Not being a Sharepoint developer or having had any training on it, I'm getting a little lost. I have set up a test server to try it out using SP2013. I have managed to index each of my file shares and have created a search page. However, results aren't consistent with the indexed items. I assume I need to somehow get the relevant metadata from the files, but I have no idea how to go about this.
Could anyone suggest some resources for help on this subject (my searches have mainly turned up paid-for Sharepoint addons or outdated blogs) and any experience of doing something similar? Also happy for any suggestions on ways to achieve this using other software/platforms.
I went with Microsoft Search Server 2010 in the end.
Sharepoint is basically optimized to be a document manager. I don't think you need to buy or download add-ons.
For your problem, metadata is the key! You need to specify the metadata properly.
Here is the theory behind planning document management in SharePoint 2013:
https://technet.microsoft.com/en-us/library/cc263266.aspx
A nice introduction to metadata:
http://fr.slideshare.net/gzelfond/document-management-in-sharepoint-without-folders-introduction-to-metadata
Be careful about starting with the Microsoft documentation. In my experience, it's difficult to begin with because it covers several things at once. There are also good books/ebooks that are easy to find and probably simpler than the MS documentation.
I am using Azure Search, which creates an index on my database tables and shows results as expected.
Now I have a requirement to find out which words or items users have searched for most, and what the peak time for searches was.
Is it possible to get any such reports with Azure Search, either through its portal or using the API or code?
I'm on the Azure Search team, thanks for using the service. Currently it's not possible; however, we understand the importance of this feature and we're working to deliver it. No exact dates yet. For now, you'd have to collect and aggregate the information you need on the client side.
For feature requests like this, feel free to use our User Voice page to help us prioritize work: http://feedback.azure.com/forums/263029-azure-search
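As a rough illustration of collecting and aggregating on the client side, the sketch below assumes your application records a `(timestamp, query)` pair every time it sends a search to the service. The log format and the function names are assumptions made for the example, not part of any Azure Search API.

```python
from collections import Counter
from datetime import datetime


def top_terms(log, n=3):
    """Return the n most frequent queries, ignoring case and surrounding spaces."""
    counts = Counter(query.strip().lower() for _, query in log)
    return counts.most_common(n)


def peak_hour(log):
    """Return the hour of day (0-23) with the most searches, or None for an empty log."""
    hours = Counter(timestamp.hour for timestamp, _ in log)
    if not hours:
        return None
    return hours.most_common(1)[0][0]


# Hypothetical entries your application would write before calling the search API.
search_log = [
    (datetime(2015, 3, 2, 9, 15), "Invoice"),
    (datetime(2015, 3, 2, 9, 40), "invoice "),
    (datetime(2015, 3, 2, 14, 5), "contract"),
    (datetime(2015, 3, 3, 9, 1), "invoice"),
]
```

With this sample log, `top_terms(search_log, 1)` gives the most searched query and `peak_hour(search_log)` gives the busiest hour. In practice you would persist the log in a table and run the aggregation on a schedule.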
We are currently using a number of open source and commercial products to store different types of information (in our internal network). All these products come with their own repositories (usually a database) and their own search capabilities.
Currently the list of products is as follows:
Wordpress
Jira
Confluence
Sharepoint
Dynamics AX
Moodle
The problem we are facing is that when one needs to search for information, one needs to log in to all these different systems and execute a search on each one.
I Googled for "search engine frontend", "meta search engine", etc., but I was not able to find something obvious that solves our problem. At this point, I have to say that we are not interested in building one "central repository" to be searched. Instead, we need a frontend that will accept the query from the user, "package it" into the format that each of the individual search engines understands, receive the response (JSON or XML) and present it to the user.
Any suggestions on how we could solve it?
Your strategy is right: if you are not interested in building a central index, you will need an application that accepts the query from the user, converts it to the format that each of the individual search engines understands, receives the responses and presents them to the user. This is exactly what a meta search engine does. Even if you use a framework (e.g. Carrot2), much work will probably remain to write those query and result transformers. You will probably also see slow response times, because the meta search can never be faster than the slowest of the underlying search engines it queries.
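To show the shape of such a meta search, here is a minimal sketch in Python. It assumes each system gets its own adapter function that translates the query, calls that system's search API and returns results in one common format. The `Hit` type, the adapter signature and the function names are all hypothetical, and the real adapters (the query and result transformers) are the part that remains to be written.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    source: str  # which system the result came from
    title: str
    url: str     # link back to the item in the original system


# An adapter takes the user's query and returns normalised hits.
Backend = Callable[[str], List[Hit]]


def federated_search(query: str, backends: Dict[str, Backend]) -> List[Hit]:
    """Send the query to every backend in parallel and merge the results."""
    results: List[Hit] = []
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                results.extend(future.result())
            except Exception:
                # A failing backend should not break the whole search.
                continue
    return results
```

Running the adapters in parallel means the total response time is roughly that of the slowest backend rather than the sum of all of them, but it still cannot beat the slowest one. Results come back grouped by backend; a combined ranking across systems would need extra logic on top.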
Instead of querying each backend separately, you can put your data into one backend.
You could export your data to an Apache Solr server and use a frontend like CorePages, http://www.corepages.biz . You could add a backlink to your data so you can jump directly from a search result to the original entry, for example a Jira ticket or a wiki article.
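A small sketch of what an exported record with a backlink could look like, in Python. The field names (`source_s`, `title_t`, `body_t`, `url_s`) are assumptions that rely on the `*_s` and `*_t` dynamic fields of Solr's default schema, so adjust them to whatever your schema defines. The helper names are hypothetical too.

```python
import json


def to_solr_doc(source, record_id, title, body, url):
    """Map one record from any system to a common Solr document with a backlink."""
    return {
        "id": f"{source}:{record_id}",  # prefix with the source to keep ids unique
        "source_s": source,
        "title_t": title,
        "body_t": body,
        "url_s": url,  # backlink to the original ticket or article
    }


def to_update_payload(docs):
    """Serialise documents as the JSON array that Solr's /update handler accepts."""
    return json.dumps(docs)
```

The frontend can then render `url_s` as the link for each search result, so users land in Jira or the wiki directly.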
I'm currently working on a project that requires integrating or connecting Sharepoint with TYPO3.
Sharepoint will more or less replace the fileadmin of TYPO3.
What I mean by "integrate" or "connect" is the following:
To display lists of documents from Sharepoint on TYPO3 pages through the TYPO3 BE by using some tag or category. In short, accessing Sharepoint documents in the TYPO3 BE.
To be able to search documents from Sharepoint through TYPO3, to filter them by type or category, and of course to display the results.
I found some references on the web.
The obvious one was the Sharepoint connector SPTools from TYPOTYCOON, but it seems dead, as there is no recent news on the website and no activity on the Twitter account.
I also found two extensions in the TER (WSS/MOSS Reader and WSS/MOSS Writer), last uploaded in December 2010. Surely outdated. Has anyone ever used them? Any feedback?
I also found some references to CMIS and the TYMIS extension, but couldn't find it in the TER.
That's why I come to you, hoping you have some solution, useful feedback or lead at least...
The new File Abstraction Layer (FAL) [1] was introduced in TYPO3 6.0. It lets you separate the file storage from the files used in TYPO3. As a result, fileadmin might contain any number of virtual mount points of any supported storage. Multiple (local, WebDAV) FAL drivers come preinstalled and there is an Amazon S3 driver at [2]. I am not aware of any FAL driver development for Sharepoint, so this might be up to you to resolve, but these hints should get you started.
Links:
[1] http://docs.typo3.org/typo3cms/FileAbstractionLayerReference/
[2] http://git.typo3.org/TYPO3v4/Extensions/fal_amazons3.git