Document Management System Recommendation [closed]

I have an application that generates around 10,000 printed pages per month. Each report (around 2,000 per month) is archived as a PDF on a simple network file share.
I am searching for a Document Management System that meets the following requirements:
watch the archive folder and update the index either on a regular basis or when changes are detected
provide an intranet web page where users can search for documents by filename, time span, and other relevant file attributes
full-text search
can handle large and substantially growing archives
To be clear, I am searching for a pre-built solution here; commercial products are acceptable.

Sounds like Microsoft Search Server 2008 Express would be a good candidate. Free and installs in a couple of minutes.

I can suggest Google Docs. AFAIK it can handle all your requirements.

This is a very vague question and I'm not quite sure how to respond.
It looks like you want a way to index all your files and ensure that the information is kept up to date in the database. What I can suggest is that you look into search servers like:
Sphinx
Solr
These both take some setup, but they handle all your requirements: they can easily be set up to watch a folder and keep your index up to date, they provide great full-text search, they can be accessed from an intranet web page if you set up a page to query your index, and they are used for enormous operations, so large archives shouldn't be a problem.
If you're looking for a pre-built solution, I'm not sure what to mention.
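To make the Solr suggestion above concrete, here is a minimal sketch of indexing a PDF and searching it over Solr's HTTP API. It assumes a local Solr instance with a core named "archive" and the extracting request handler (Solr Cell) enabled; the core name, file path, and document id are just placeholders.

    # Minimal sketch: push a PDF into a local Solr core and run a full-text query.
    # Assumes Solr on localhost:8983 with a core named "archive" and the
    # extracting request handler (Solr Cell) enabled -- adjust names to your setup.
    import requests

    SOLR = "http://localhost:8983/solr/archive"

    def index_pdf(path, doc_id):
        # Solr Cell extracts the PDF text and indexes it under the given id.
        with open(path, "rb") as f:
            requests.post(
                SOLR + "/update/extract",
                params={"literal.id": doc_id, "commit": "true"},
                files={"file": f},
            ).raise_for_status()

    def search(query):
        # Return the ids of matching documents for a full-text query.
        r = requests.get(SOLR + "/select", params={"q": query, "wt": "json"})
        r.raise_for_status()
        return [doc["id"] for doc in r.json()["response"]["docs"]]

    index_pdf("/archive/report-2013-01.pdf", "report-2013-01")
    print(search("quarterly totals"))

A cron job or folder watcher calling index_pdf for new files would cover the "watch the archive folder" requirement.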

Plone could work pretty well for your needs. It has plugins for indexing PDF content, and you can customize the metadata. Also, it has a fantastic web interface with built-in search. The best part is that it's free and easy to use, and if your needs grow, you can pay for support.
My only recommendation (at first glance) is that you store your content on the file system and not in the Zope object database. You should only store your metadata and index data in the database. This is a pretty common way of storing large amounts of content in the document management world.
Hope that helps!
Tom Purl

As Tom said, Plone does what you describe. Its built-in full-text search relies on the command-line program pdftotext being on the path to handle PDFs. There are several extensions you may be interested in:
Reflecto - watches a part of the filesystem and allows searching and displaying it inside Plone: see Reflecto on plone.org/products
TextIndexNG 3 - indexing extensions written for a publishing house: http://www.zopyx.com/projects/TextIndexNG3/textindexng3-the-leading-fulltext-indexing/
or
collective.solr - uses the search engine Solr to drive the catalog: see collective.solr on plone.org/products
(Sorry, missing links due to Stack Overflow's new-user policy.)
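Since the answer above leans on pdftotext for PDF content, here is a minimal sketch of what that extraction step looks like on its own, assuming pdftotext (from poppler-utils or Xpdf) is on the PATH; the file path is just an example.

    # Minimal sketch of the extraction step Plone relies on: call pdftotext
    # and capture the plain text that would then be fed to the indexer.
    # Assumes pdftotext (poppler-utils / Xpdf) is on the PATH.
    import subprocess

    def extract_text(pdf_path):
        # "-" tells pdftotext to write the extracted text to stdout.
        result = subprocess.run(
            ["pdftotext", pdf_path, "-"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    print(extract_text("/archive/report-2013-01.pdf")[:200])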

Related

Are there any decent open-source multi-tenant CMS's out there? [closed]

We're looking for a CMS that we can use as the basis for a new product we're rolling out.
As it's principally a content-based thing, we need to base everything on a CMS, but there are a few things we need:
As we're supporting tens to hundreds of users, we ideally need a multi-tenant CMS (single shared code base) that can support different designs per site
As we're selling in functionality, we need something that will let us deploy a new 'module' and switch it on/off on a per site basis
We prefer stuff that is open source (PHP or Rails, that sort of thing)
Before I consider building something, is there anything out there that's any good?
Now I am biased, but dotCMS 1.9 is a flexible open source solution (Java) that was designed to make running tens or hundreds of sites within a single instance easy. You can create site "templates" and use them again and again as needed. Sites can share content, assets and templates, or not share anything, depending on how you set them up. Users can have access to manage one site or many sites - their views into the management tool are limited by their permissions (as you'd expect). Again, I obviously am biased as I work for the company, but this is exactly the problem that dotCMS 1.9 was designed to solve.
Plone sounds like it'd do what you want.
It's written in Python, on top of Zope, and supports multiple distinct sites (with distinct and/or shared users, groups, styling). Extra functionality is added through 'products'; there are a number of Free extensions and it's quite easy to write your own too.
We use http://www.alfresco.com/ ... it seems to fit your definition. Different designs per site can be achieved with what they call "web scripts". It supports deployment and branching infrastructure that you can leverage for your different clients.
As we're supporting tens to hundreds of users, we ideally need a multi-tenant CMS (single shared code base) that can support different designs per site
My first thought when I read that was WordPress MU (perhaps with BuddyPress if you need groups, etc.?), but it might not be "CMS" enough for your needs... you don't elaborate on which features of a CMS you are looking for (media management, workflows, etc.), so it's a bit hard to recommend one.
DotNetNuke supports multi-tenant operation, and has a fairly active marketplace for add-on modules, skins, etc. It has pretty well-defined module development interfaces as well.
Yanel is a Java/XML/XSLT-based CMS (Apache 2.0 license) designed for multi-tenancy; you can run arbitrarily many sites inside the same Yanel instance. See in particular the documentation on 'realms'.

What is the biggest new feature/improvement in SharePoint 2010? [closed]

This is really a question for the 7400 people (!) at the SharePoint Conference 2009. Of all the new features and improvements in SharePoint 2010, which one (or area or feature set) do you think will have the biggest impact on the world of SharePoint development?
I haven't had a chance to do anything with it yet, but the new version of the BDC (Business Data Catalog), now the BCS (Business Connectivity Services), looks really promising - and something that people may actually use as more than a last resort this time around.
Edit: Now that I have had the time to play with the BCS - I can tell you that it is a HUGE improvement over the BDC in terms of both flexibility and ease of use - it is going to be the center of a ton of big-business custom development work to come.
Development support on Win 7 / WS08R2
You no longer have to do your development on Windows Server. You can use Win 7, Vista, or WS08R2.
It may sound stupid, but I would say sound compatibility with Firefox is the comforting thing to know. It's not because I am a big fan of Firefox, but because it shows a big step by MS towards openness.
For me? The fact that I can now publish my Access applications to the web. Here is a video of me playing with MS Access, and about halfway through this short demo I switch over to running the application in a browser. I tested this in Firefox, and it also runs 100% perfectly...
http://www.youtube.com/watch?v=AU4mH0jPntI
I liked the BDC but was disappointed at the lack of tools to quickly bring an existing SPList from another site or site collection in as an external list. It can be done, but it is a very manual, tedious process. I would have liked to see a point-and-proxy sort of API.
Having lived through some SharePoint application upgrade disasters, I would say that I am very favorable to the new Feature Versioning and Feature Upgrade capabilities. The ability to define an upgrade path for content types and lists, as well as move existing file URLs, is great. With the new FeatureUpgrading method on the SPFeatureReceiver you can do just about anything during an upgrade.
More on the Feature Upgrade...
I have been playing with Business Connectivity Services and I'm very impressed. This is the tool that will make SharePoint the bridge in a business.
Out of the box, global navigation components no longer use tables. I know it's really not way up there on the list of improvements, but I was super excited when I read this.
SharePoint 2010 Changes in Rendering
The two biggest improvements:
1 - Dev tools support - you can throw away WSPBuilder, SharePoint Manager, and all the other hodge-podge tools you used to develop SharePoint solutions.
2 - Taxonomy/metadata - you can add a metadata column to any content type and query that information across farms. It leverages the new service application infrastructure, which gets rid of SSPs.

Document Management [closed]

We're looking for a simple, open-source, web-based document management system for Linux. By document management I mean the ability to store a set of files (minimally doc, xls and pdf) as a document; associate metadata with the document, like owner and version; update and delete documents; index and search content; and provide authentication and the ability to authorize at least read, and possibly write, access. If possible I would like to avoid implementations in Java or PHP, and as we use MySQL already, that would work especially well for metadata storage.
We have used Google Applications in the past but the lack of support for PDF makes it a poor fit. Other downsides include their service losing some of our spreadsheets, no concept of a company owning information as opposed to individual accounts, and the fact that some of our information is sensitive and we prefer keeping it in-house (passwords, contracts, etc.).
MediaWiki was not a good fit either, as our documents really form a set as opposed to structured content (i.e. we're not looking for a content management system), and at least the version we had installed did not deal well with attachments.
Based on review of past questions I plan on looking into KnowledgeTree. Any other projects that we should consider?
I've been using KnowledgeTree now for a few months developing an ASP.Net application and I only have good things to say about it. Our product uses it for PDF storage/retrieval and it really couldn't be easier to deal with. The basic install gives you a simple environment with almost endless amounts of configuration for meta-data, document groups, and various security options. Also, the KnowledgeTree staff have been very helpful and have provided us with sample code when we have run into 'how are we going to do that?' moments.
I'll second the recommendation for KnowledgeTree. Have been using it for a couple of years and have roughly 1K documents indexed. Sometime last year, I wrote a short script which monitors KT's transaction table (in MySQL) and notifies users of new or updated documents via Twitter, Identica, and/or Jabber. The Twitter/Identica feeds can then be monitored with an RSS reader.
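As a rough illustration of the kind of poller that answer describes, here is a minimal sketch that watches a MySQL table for new rows. The table and column names (document_transactions, transaction_id, and so on) are purely illustrative assumptions, not KnowledgeTree's actual schema, and the notification step is left as a simple print.

    # Minimal sketch: poll a MySQL transaction table and report new rows.
    # The schema names below are illustrative, not KnowledgeTree's real ones.
    import time
    import mysql.connector  # pip install mysql-connector-python

    def new_transactions(last_seen_id):
        conn = mysql.connector.connect(
            host="localhost", user="kt", password="secret", database="knowledgetree"
        )
        cur = conn.cursor()
        cur.execute(
            "SELECT transaction_id, filename, comment "
            "FROM document_transactions WHERE transaction_id > %s "
            "ORDER BY transaction_id",
            (last_seen_id,),
        )
        rows = cur.fetchall()
        conn.close()
        return rows

    last_id = 0
    while True:
        for tx_id, filename, comment in new_transactions(last_id):
            print("Document updated:", filename, "-", comment)  # swap in Twitter/Jabber here
            last_id = tx_id
        time.sleep(60)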
Look for something that will index all your document formats and keep them searchable.
I solved this in my office using ColdFusion. It has the Verity search engine built in. This indexes files on your network (doc/xls/pdf, etc.) to make the text in them searchable (like Google).
An instant search engine for all my files, for up to 150,000 or so, is built in for free with ColdFusion, so it suits my purpose. Something like this would allow you to save your files on a network however/wherever you like, and you'd be able to extract the rest of the information, such as owners and modification dates, through libraries available in Java/.NET.
I am sure you could replicate this with another language, but probably with a bit more effort. I am presently wishing I could use the Google Docs API as a WYSIWYG editor in my own in-house wiki... that would solve most of my problems, because everything would be intranet-based.
Try https://www.mayan-edms.com, written in Django, DB agnostic.
You can consider GroupDocs, as they offer storage, conversion and a few more features.

What other web analytics tools are out there? [closed]

What web analytics tools do people use other than Google Analytics?
Webtrends
Webalizer (C)
AWStats (Perl)
Smarterstats
Did you want free ones? Two of those aren't, but Webalizer is quite poor. AWStats is decent and free. Plus, google "web stats" and you'll get hundreds of commercial ones.
Or the open directory links:
Log analysis (Free/open source)
Log analysis (Commercial ones)
If you're in the mood to finish a project, I got 1/3 through writing one in C# :) Unfortunately the most important part - the eye candy reporting - was never finished. But the parsing was done.
I don't have any experience with it, but this seems to be the most "famous" open source solution:
Piwik (Open Source)
These are other commercial solutions:
Woopra (Free during beta)
Clicky Web Analytics
Compete
ClickTale
I have no idea of their usage statistics. I wonder why someone would pay for this kind of service.
Mint is nice, very pretty, lots of features, and not very expensive at all.
If you want to spend a lot of money and time, try out Omniture SiteCatalyst.
If you want an easy-to-use, real-time, high-performance tracking and reporting system which lets you download all of your data (if you want), you can try my company:
ConversionRuler conversion tracking
We track sites which get 10M+ page views per month without a hiccup, and have dozens of agency clients who track 30+ clients using our tools. Many of them use us as an alternative to Omniture and WebTrends to save on their enormous costs.
Not for everyone, but worth a look. We focus on "less is more", and we're not a search engine that you may not want to give your data to, especially if you're paying them for traffic (e.g. Google Analytics, Urchin, and Yahoo! Analytics).
My $0.02.
Reinvigorate - seems like they're always in beta registrations, though. One upcoming feature they do mention is being able to see heat map reports.
Please see my answer to this question with a comparison of metrics. It is meant as a warning against using AWStats and the like.
Just because a tool produces numbers and colorful charts, it does not mean it is any good.
I posted a screen shot of the Woopra live analytics capabilities in my answer to this question.

How do I implement Search Functionality in a website? [closed]

I want to implement search functionality for a website (assume it is similar to SO). I don't want to use Google search or stuff like that.
My question is:
How do I implement this?
There are two methods I am aware of:
Search all the databases in the application when the user gives his query.
Index all the data I have and store it somewhere else and query from there (like what Google does).
Can anyone tell me which way to go? What are the pros and cons?
Better still, are there any other ways to do this?
Use Lucene: http://lucene.apache.org/java/docs/
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
It is available in Java and .NET. It is also available in PHP in the form of a Zend Framework module.
Lucene does what you want (indexing of the searched items); you have to maintain a Lucene index, but it is much better than doing a database search in terms of performance. BTW, SO search is powered by Lucene. :D
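Lucene itself is a Java library, so purely as an illustration of the index-then-query approach in Python, here is a sketch using Whoosh, a Lucene-like pure-Python engine; the field names and sample documents are made up for the example.

    # Illustration of the index-then-query approach using Whoosh, a
    # Lucene-like pure-Python library (Lucene itself is Java).
    import os
    from whoosh.index import create_in
    from whoosh.fields import Schema, ID, TEXT
    from whoosh.qparser import QueryParser

    schema = Schema(url=ID(stored=True), body=TEXT)
    os.makedirs("indexdir", exist_ok=True)
    ix = create_in("indexdir", schema)

    # Index a couple of "pages" -- on a real site this runs on every update.
    writer = ix.writer()
    writer.add_document(url="/questions/1", body="How do I implement search functionality?")
    writer.add_document(url="/questions/2", body="Full-text search with an inverted index")
    writer.commit()

    # Query the index instead of hitting the database with LIKE queries.
    with ix.searcher() as searcher:
        query = QueryParser("body", ix.schema).parse("search")
        for hit in searcher.search(query):
            print(hit["url"])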
It depends on how comprehensive your web site is and how much you want to do yourself.
If you are running a small website without further possibilities to add custom search, let Google do the work (maybe add a sitemap) and use Google Custom Search.
If you run a medium site with an SQL engine, use the search features of your SQL engine.
If you run a heavier software stack like J2EE or .NET, use Lucene, a great, powerful search engine, or its .NET port Lucene.Net.
If you want to abstract your search from your application and be able to query it in a language-neutral way with XML/HTTP and JSON APIs, have a look at Solr. Solr runs Lucene in the background, but adds a nice web interface to it.
You might want to have a look at Xapian and the Omega front end. It's essentially a toolkit on which you can build search functionality.
The best way to approach this will depend on how you construct your pages.
If they're frequently composed from a lot of different records (as I imagine Stack Overflow pages are), the indexing approach is likely to give better results unless you put a lot of work into effectively reconstructing the pages on the database side.
The disadvantage you have with the indexing approach is the turnaround time. There are workarounds (like Google's sitemap stuff), but they're also complex to get right.
If you go down the database path, also be aware that modern search engine systems function much better if they have link data to process, so finding a system which can understand links between 'pages' in the database will have a positive effect.
If you are on a Microsoft platform you could use the Indexing Service. This integrates very easily with IIS websites.
It has all the basic features like full-text search, ranking, and excluding/including certain file types, and you can add your own meta information as well via meta tags in the HTML pages.
Do a google and you'll find tons!
This is somewhat orthogonal to your question, but I highly recommend the idea of a RESTful search. That is, to perform a search that has never been performed, the website POSTs a query to /searches/. To re-run a search, the website GETs /searches/{some id}
There are some good documents to be found regarding this, for example here.
(That said, I like indexing where possible, though it is an optimization, and thus can be premature.)
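As a minimal sketch of the POST-then-GET pattern described above (using Flask purely for illustration; the routes, in-memory store, and placeholder query runner are all assumptions for the example):

    # Sketch of "search as a resource": POST /searches/ creates a search and
    # returns its id; GET /searches/<id> re-runs it later.
    from uuid import uuid4
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    searches = {}  # in-memory store; a real app would persist this

    def run_query(query):
        # Placeholder: plug your index or database lookup in here.
        return ["result for " + query]

    @app.route("/searches/", methods=["POST"])
    def create_search():
        query = request.json["q"]
        search_id = str(uuid4())
        searches[search_id] = query
        return jsonify({"id": search_id, "results": run_query(query)}), 201

    @app.route("/searches/<search_id>", methods=["GET"])
    def get_search(search_id):
        query = searches[search_id]
        return jsonify({"id": search_id, "results": run_query(query)})

    if __name__ == "__main__":
        app.run()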
If your application uses the Java EE stack and you are using Hibernate, you can use the Compass Framework to maintain a searchable index of your database. The Compass Framework uses Lucene under the hood.
The only catch is that you cannot replicate your search index, so you need to use a clustered database to hold the index tables or use the newer grid-based index storage mechanisms that have been added to the Compass Framework 2.x.
