We are looking for a simple, open source, web-based document management system for Linux. By document management I mean the ability to store a set of files (minimally doc, xls and pdf) as a document; associate metadata such as owner and version with the document; update and delete documents; and index and search content. We also need authentication and the ability to authorize at least read, and possibly write, access. If possible I would like to avoid implementations in Java or PHP, and as we already use MySQL, that would work especially well for metadata storage.
We have used Google Applications in the past, but the lack of support for PDF makes it a poor fit. Other downsides include their service losing some of our spreadsheets and no concept of the company owning information as opposed to individual accounts. Some of our information is also sensitive and we prefer keeping it in-house (passwords, contracts, etc.).
MediaWiki was not a good fit either, as our documents are really a set of files as opposed to structured content (i.e. we are not looking for a content management system), and at least the version we had installed did not deal well with attachments.
Based on a review of past questions I plan on looking into KnowledgeTree. Are there any other projects that we should consider?

I've been using KnowledgeTree now for a few months developing an ASP.Net application and I only have good things to say about it. Our product uses it for PDF storage/retrieval and it really couldn't be easier to deal with. The basic install gives you a simple environment with almost endless amounts of configuration for meta-data, document groups, and various security options. Also, the KnowledgeTree staff have been very helpful and have provided us with sample code when we have run into 'how are we going to do that?' moments.

I'll second the recommendation for KnowledgeTree. I have been using it for a couple of years and have roughly 1K documents indexed. Sometime last year, I wrote a short script which monitors KT's transaction table (in MySQL) and notifies users of new or updated documents via Twitter, Identica, and/or Jabber. The Twitter/Identica feeds can then be monitored with an RSS reader.
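A rough sketch of that polling idea, not the original script. It uses an in-memory SQLite table as a stand-in for KnowledgeTree's MySQL transaction table; the table name `document_transactions`, its columns, and the `notify` callback are all assumptions made for illustration, and the real KT schema will differ.

```python
import sqlite3


def fetch_new_transactions(conn, last_seen_id):
    """Return rows with an id greater than last_seen_id, oldest first."""
    cur = conn.execute(
        "SELECT id, document_name, action FROM document_transactions "
        "WHERE id > ? ORDER BY id",
        (last_seen_id,),
    )
    return cur.fetchall()


def poll_once(conn, last_seen_id, notify):
    """Call notify() for each new row and return the new high-water mark."""
    for row_id, name, action in fetch_new_transactions(conn, last_seen_id):
        notify(f"{action}: {name}")
        last_seen_id = row_id
    return last_seen_id


# Stand-in database with a hypothetical schema (not KnowledgeTree's own).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document_transactions ("
    "id INTEGER PRIMARY KEY, document_name TEXT, action TEXT)"
)
conn.executemany(
    "INSERT INTO document_transactions (document_name, action) VALUES (?, ?)",
    [("contract.pdf", "create"), ("budget.xls", "update")],
)

messages = []  # a real script would post to a messaging service instead
last_seen = poll_once(conn, 0, messages.append)
```

Running `poll_once` on a timer and persisting `last_seen` between runs would give the "notify on new or updated documents" behaviour; remembering the highest id seen is what keeps users from being notified twice.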

Look for something that will index all your document formats and keep them searchable.
I solved this in my office using ColdFusion. It has the Verity search engine built in. This indexes files on your network (doc/xls/pdf, etc.) to make the text in them searchable (like Google).
An instant search engine for all my files, up to 150,000 or so, is built in for free with ColdFusion, so it suits my purpose. Something like this would allow you to save your files on a network however and wherever you like, and you'd be able to extract the rest of the information about owners and modification dates through libraries available in Java / .NET.
I am sure you could replicate this with another language, but probably with a bit more effort. I am presently wishing I could use the Google Docs API as a WYSIWYG editor in my own in-house wiki. That would solve most of my problems, because everything would then be intranet based.

Try, written on Django, db agnostic

You can consider GroupDocs, as they offer storage, conversion and a few more features.


Are there methods to trace text files that were web scraped, and to find the user involved, even after the text has been machine translated?

I’m a bit new and want to create a site with these capabilities but don’t know where to start. Please point out if I violated any rules or should write differently.
I’ll be a bit specific here: there is a web novel site where content is hidden behind a subscription.
So you would need to be logged into an account.
The novels are viewed through the site’s viewer, in which text cannot be selected/highlighted and copied.
If you web scrape and download the chapters as .txt files and then machine translate them using something like Google Translate, is there a way to track the uploader of the MTL, or to tell when the MTL file is shared?
Similarly, there are aggregator sites that host non-MTL’d novels, but the translation teams have hidden lines which tell readers to go to their official site. The lines aren’t visible on the official site, though; they only appear once the text has been copied. How is that possible?
I’ve read a little about JWT, and I’m assuming they can identify the user while they’re on the website, but what about in the text itself?
Additional Question (if above is possible, don’t have to answer this just curious)
If it is possible to embed something like an identifying token, is there a way to break it, perhaps by converting the text into an encrypted epub?

Social networking is great, but there is something fundamentally wrong with the way social networking is implemented today in most popular services. I'll put it in this example: Imagine that there is no SMTP, and consequently, it is globally assumed and accepted that you can only send email to addresses on the same domain. The result would be the emergence of a single email service, let's call it, which we all have to subscribe to, if we really want to communicate with the world.
This is what's happening with social networking today. You HAVE to use the same service your friends/colleagues are using to talk to them.
I would like to be able to put up my own social site, invite my friends who trust me, share amongst us, but still be able to share with the world at large.
What are the chances of this scenario happening in the future? What does it take?
There sure is, and not just one! The future you wanted is now here.
At the time of the question, back at the end of 2010, OStatus had already existed for half a year. The year before that there was OpenMicroBlogging (OMB), and at about the same time as OMB, the XMPP extension XEP-0277.
Since then several other protocols have popped up, such as diaspora* just half a year later, and later some smaller players like Friendica's DFRN and HubZilla's Zot.
OStatus never left draft status, but the big buzz[0] these days is about ActivityPub, which has been a W3C Recommendation since January 2018 and came out of the Social WG mentioned by @keithjgrant in his answer. There is a multitude of implementations[1], finding their niches in different use cases like microblogging, blogging, link sharing, picture sharing, video sharing and audio sharing.
There is also the collection of blog-oriented protocols described on
[0] pun intended
[1] Diaspora and GNU Social, although shown at, do not implement ActivityPub. The other applications shown do. There are several other applications not shown there, such as FunkWhale, Plume, WriteFreely, Prismo ... There is no terse and complete overview of all of them, but several are listed at and publishes news and interviews related to all of them.
There are a few. One Social Web uses XMPP which is open and decentralized like SMTP.
Check it out.
I absolutely agree. The good news is, yes, things are happening. Even better, they are happening in the W3C, which means open standards.
The W3C now has a Social Web working group. They are actively working on a handful of standards. The biggest of these seems to be the Social Web Protocol.
Today, they also posted the W3C Recommended spec for Webmention, which is sort of an improved version of the old pingbacks that used to be used on blogs, this time built on HTTP. It allows a post to notify another page on the web when it references it. There are already a number of libraries and services that implement this today.
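As a small illustration of how lightweight the notification is: in the Webmention spec the sender makes an HTTP POST to the receiver's Webmention endpoint with two form-encoded parameters, `source` (the page containing the link) and `target` (the page being linked to). The sketch below only builds that request; the URLs are placeholders, and discovering the endpoint (advertised by the target via a `rel="webmention"` link) is assumed to have been done already.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint, source, target):
    """Build (but do not send) the form-encoded POST a Webmention sender makes."""
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder URLs for illustration only.
req = build_webmention_request(
    "https://example.org/webmention",
    "https://example.com/my-post",
    "https://example.org/their-post",
)
```

Passing `req` to `urllib.request.urlopen` would actually deliver the notification; the receiver is then expected to fetch `source` and verify that it really links to `target`.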
I think you should take a look at OpenSocial. It is a spec developed by Google and other social networking players. It supports interoperability and much more.
OpenSocial is currently being developed by a broad set of members of the web community. The ultimate goal is for any social website to be able to implement the API and host 3rd party social applications. There are many websites that support OpenSocial, including hi5, LinkedIn, MySpace, Netlog, Ning, orkut, and Yahoo!

I have an application that generates around 10000 printed pages per month. Each report (around 2000/month) is archived as PDF on a simple network file share.
I am searching for a Document Management System meeting the following requirements:
watch the archive folder and update the index either on regular basis or when changes are detected
provide an Intranet Webpage where users can search documents based on filenames, timespans and other relevant file attributes
fulltext search
can handle large/substantially growing archives
To be clear, I am searching for a pre-built solution here, commercial products are accepted.
Sounds like Microsoft Search Server 2008 Express would be a good candidate. Free and installs in a couple of minutes.
I can suggest Google Docs. AFAIK it can handle all your requirements.
This is a very vague question and I'm not quite sure how to respond.
It looks like you want a way to index all your files and ensure that the information is kept up to date in the database. What I can suggest is you look into some search servers like:
These both take some setup, but they handle all your requirements: they can easily be set up to watch a folder and keep your index up to date, they provide great fulltext search, they can be accessed via an intranet webpage if you set up a page to search your database, and they are used for enormous operations, so large archives shouldn't be a problem.
If you're looking for a pre-built solution, I'm not sure what to mention.
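To make the "watch a folder and keep the index up to date" idea concrete, here is a toy sketch of what such a search server does internally. It is not how any particular product works: it rescans a directory, re-indexes only files whose modification time changed, and answers AND-queries from an in-memory inverted index. The class name `FolderIndex` is made up, and plain text files stand in for PDFs (a real setup would need a text-extraction step first).

```python
import os
import re
import tempfile
from collections import defaultdict


class FolderIndex:
    """Toy full-text index over a folder, refreshed by comparing mtimes."""

    def __init__(self, root):
        self.root = root
        self.mtimes = {}                # path -> mtime at last indexing
        self.words = defaultdict(set)   # word -> paths containing that word

    def refresh(self):
        """Index new or changed files; return the paths that were (re)indexed."""
        changed = []
        for dirpath, _dirs, files in os.walk(self.root):
            for name in files:
                path = os.path.join(dirpath, name)
                mtime = os.path.getmtime(path)
                if self.mtimes.get(path) == mtime:
                    continue  # unchanged since the last scan
                for paths in self.words.values():
                    paths.discard(path)  # drop stale entries for this file
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for word in set(re.findall(r"\w+", fh.read().lower())):
                        self.words[word].add(path)
                self.mtimes[path] = mtime
                changed.append(path)
        return changed

    def search(self, query):
        """Return the paths containing every word of the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self.words.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.words.get(term, set())
        return result


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "report-001.txt"), "w", encoding="utf-8") as fh:
        fh.write("Monthly sales report for March")
    index = FolderIndex(root)
    first_scan = index.refresh()
    second_scan = index.refresh()
    hits = index.search("sales march")
```

Calling `refresh()` on a schedule covers the "regular basis" requirement. Deleted files are not handled here, and an index held in memory would not scale to a large archive, which is exactly the part a dedicated search server takes care of.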
Plone could work pretty well for your needs. It has plugins for indexing PDF content, and you can customize the metadata. Also, it has a fantastic web interface with built-in search. The best part is that it's free and easy-to-use, and if your needs grow, you can pay for support.
My only recommendation (at first glance) is that you store your content on the file system and not in the Zope OO database. You should only store your metadata and index data in the database. This is a pretty common way of storing large amounts of content in the document management world.
Hope that helps!
Tom Purl
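A minimal sketch of the "content on the file system, metadata in the database" split described above. It uses SQLite rather than Plone's Zope database, and the `documents` table, its columns, and the `register_document` helper are invented for illustration.

```python
import hashlib
import os
import sqlite3
import tempfile


def register_document(conn, path, owner):
    """Store only metadata about the file; the file itself stays on disk."""
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    conn.execute(
        "INSERT INTO documents (path, owner, size_bytes, sha256) "
        "VALUES (?, ?, ?, ?)",
        (path, owner, os.path.getsize(path), digest),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "path TEXT PRIMARY KEY, owner TEXT, size_bytes INTEGER, sha256 TEXT)"
)

with tempfile.TemporaryDirectory() as archive:
    pdf_path = os.path.join(archive, "report-001.pdf")
    with open(pdf_path, "wb") as fh:
        fh.write(b"%PDF-1.4 placeholder")  # stand-in for a real PDF
    register_document(conn, pdf_path, "reporting-app")

row = conn.execute("SELECT owner, size_bytes FROM documents").fetchone()
```

The database row stays small no matter how large the PDF is, which is the reason this layout is usually preferred for large archives.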
As Tom said, Plone does what you describe. It has built-in full text search, which for PDFs relies on the command-line program pdftotext being in the path. There are several extensions you may be interested in:
Reflecto - Watches a part of the filesystem and allows searching and displaying it inside Plone:
See reflecto on the
TextIndexNG 3 - Indexing extensions written for a publishing house
collective.solr - use the search engine "solr" to drive the catalog:
See collective.solr on
(Sorry, missing links due to Stack Overflow's new user policy)
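For reference, the pdftotext dependency mentioned above is easy to check from a script. The sketch below is not how Plone calls it internally; it just shows the general pattern of checking that the binary is on the path and reading the extracted text from standard output (pdftotext writes to stdout when the output file is given as `-`). The helper names are made up.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path):
    """Command line for extracting text; '-' sends the output to stdout."""
    return ["pdftotext", pdf_path, "-"]


def extract_text(pdf_path):
    """Return the text of a PDF, or raise if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if extraction fails
    )
    return result.stdout
```

If `extract_text` raises the `RuntimeError`, PDF indexing would likely come back empty as well, so it is a quick way to diagnose "my PDFs are not searchable".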

I am a programmer, and my father uses Access to collect his patients' information (my father is a doctor).
He wants me to teach him how to use it.
I don't like Access (I'm a Linux guy), and I cannot find any replacement for it. Do you guys know of any? (It must be easy enough for my father to use.)
Maybe you need to be a bit more pragmatic about this.
I'm not a fan of Access either, but if your father already understands it and he already has the system in place, you need to ask the question: why change? If it ain't broke, don't try to fix it.
You may find that a few simple changes to the existing system give your father everything he needs. That will save you a whole lot of time and means you don't need to retrain your father.
What about OpenOffice - Base?
Your father wants you to teach him how to use Access, but you're a Linux guy and don't like Access.
Access isn't the problem here.
I don't think you and your father are a good fit for this.
Get someone else to teach him how to use Access.
Access is not always the monster it is made out to be. A poorly coded database in any application or language is a poorly coded database. Access' dominance of the market at a critical time led to more people coming across a higher ratio of poorly designed databases.
There's a great deal of support out there for Access users and programmers too. I particularly like Access World Forums. As ilivewithian said, if you're not happy telling him about it, get someone else to.
If however you are keen to take on the role of tutor to your dad (and I can see the attraction - a chance to give something back, perhaps), then I would suggest a web-based database interface. Unlike Oli, I have no experience of Django, but I would recommend Dabble or blist. (Blist is particularly good at handling images, Dabble is better at flexible report formats, though neither is as good at reports as Access, IMHO).
I think the natural successor to Access is a simple web-interface database system.
They're simple enough to create in a billion different ways, but I would seriously suggest trying Django (because you'll find its admin area does 90% of the real work for you in this case).
FileMaker Inc. is a subsidiary of Apple. FileMaker runs on Mac OS X as well as Windows (whereas MS Access only runs on Windows). Many people claim FileMaker is easier to use than MS Access. Sounds like FileMaker might be the perfect solution for you! (Although I do agree with ilivewithian.)
There's also Sun's counterpart to MS Access in OpenOffice/StarOffice called BASE (someone already mentioned this), which is also cross-platform compatible.
Rather than develop his own record keeping application he would probably be better off purchasing an already developed system from one of the numerous medical record system vendors. He'll get a better application and have people he can call on for support. Plus there are all of the legal issues about medical record storage and access. A vendor will have worked out those problems already.
That having been said, there are many other file-based database systems out there:
I haven't used any of them so I can't make a recommendation.
Of course, there are always the various enterprise databases (Oracle, MySQL, SQL Server, etc.) as well. Of those, SQL Server is probably the easiest to learn for a newbie. Since there's no 64-bit version of Access, I'm starting to see people replace Access with SQL Server Express (free!) for small applications that need to run on 64-bit Windows.
I have been using Viravis for more than 6 months in a multi-language organization with several projects and I find it very good. It's not only easy to build with (I am a beginner), but they also give very good support!
Gambas is a very good alternative to Access if one used Access as a database frontend and programmed with VBA (Visual Basic for Applications). One can reuse a lot of code written for Access and create forms and reports easily.
So for a VB or VBA programmer who wants to use their existing knowledge under Linux, Gambas is a wonderful solution.
No first hand experience, but you can try out Database. Or, you may teach your Dad to use the MySQL GUI tool.
Getting the database structure right is the toughest part for most. Creating a simple form or report is not that tough either. As for being a user (data entry, reports, etc.), Access is probably easier than most applications. You also have all the searching and sorting capabilities; why reinvent the wheel?
Viravis may be an online alternative to an Access database. You should check whether it fits your needs.
For Windows and simple data, I would use Excel, so I think OpenOffice should be OK. Unless your father has a hospital, it will probably fit. Or you can do some programming: take an embedded database like Firebird and write something on your own, say, in Java?

Currently all our files are stored on a Windows network drive, and with 15 members of staff and 3 external workers, file control is becoming a bit of a nightmare. Even though we have a policy in place, people still seem to save files to their PCs, make changes, and copy them back without notifying anyone, send files via email instead of links to their location, and create folders/structures which only make sense to them.
Consequently on a recent project we found that 3 members of staff were using different versions of the same document and when those 3 people are editors and proof readers, you can probably imagine the problem that ensued in the end.
So we are looking for some nice simple file management apps. MS Sharepoint has been mentioned but we are looking to get away from being tied to a Windows machine, and the cost of setup etc. seems expensive particularly for a non-profit company. Also it seems Sharepoint may be a little over-the-top for our needs.
All we need is something that can fulfill the following:
can be used to store and control files
allow different user access
provide basic versioning
hopefully accessible through a web-browser so our remote workers can access it
We are not keen on SaaS solutions because of the confidential nature of our work, and also because we use these files all day every day and the internet connection does go down from time to time. We want to be able to install in-house.
Ideally the solution will be FOSS, although we will consider buying software if it meets our needs.
You can try Alfresco:
Alfresco is the Open Source Alternative for Enterprise Content Management (ECM) led by John Newton, founder of Documentum, and John Powell, former COO of Business Objects, and is backed by Accel Partners, Mayfield Fund and SAP
Here is a good how-to for installing it on Linux.
The first question you probably need to ask is why the existing Windows file shares aren't working, and people are still saving files to their own computers.
For example, if they're often working outside of the office and can't access the file shares or they need to maintain a working copy, these are problems that can be fixed with SharePoint or other version control/file management software.
However, if they're just not following policy, then it's not going to matter what software you put in its place. Figuring out what problems the users have is going to help you choose the right solution.
Not sure this is the best place for such a question (it's a discussion with no right/wrong answer), but anyway:
Google Apps for business?
Totally easy, low TCO (OSS is not free in a time sense).
You can share docs (read/write or read-only) with external people, or just do the old-fashioned copy/paste of the detail into OpenOffice/Word/iLife or whatever and send a copy to them.
Wouldn't something like a source control system be useful? SVN, for example? Admittedly binary files are a problem here, but if you're using a basic format you could convert to RTF or the new document standards used by Office 2007/OpenOffice.
It's worth noting that SharePoint and other variants are used widely for a reason; they do what you need.
Are you trying to avoid Windows Server completely, or just avoid buying Microsoft SharePoint Server?
If you are willing to purchase a Windows Server license, you will get a basic version of SharePoint Server called SharePoint Services as part of the package. SharePoint Services gives you a powerful document management and collaboration system without having to buy an additional software package. It includes a version control system and you can integrate it with other applications. You can find more information here: Windows SharePoint Services 2.0 Overview.
Another MS-provided solution that can handle file management and version control is Microsoft Groove. You can find more information on it here: Microsoft Groove. A great feature of Groove is that it can act as a front-end for SharePoint (and most likely SharePoint Services) to allow users to interact more easily with the file storage mechanism.
A third, less powerful option would be to use your existing network file shares (through Windows or Samba), map the shares to local drives and/or reconfigure users' My Documents folders to point to the network, and turn on Offline Storage. This will allow the users to interact with their documents as if they were local files even when they are offline. You will experience a few small issues with this route, but it would free you from having to use a pure Microsoft solution.
In answer to some of the above questions.
The main reason it's not working is this: one person will open a document from the shared drive and save a copy to their PC, which they then work on. The changes they make are not on the shared drive, and when they copy the file back (which everyone does) their changes overwrite anything anyone else has done. They also don't inform anyone, so if someone else is working from that document they are now working on an old version. It is a case of getting users into a better frame of mind! But we feel software may help with that, plus our external workers do not have access to the internal drive at present.
We have a number of servers, only one of which is Windows, so we want to get away from using that Windows server and have all Linux servers for ease of management. Any MS product will require that we run a dedicated MS machine!
Mapping local drives is not really a good option, as many people work out of the office and so won't be on the network to contribute, plus the file structure would probably not allow it.
It does seem that an MS solution might be the only one; I was just hoping there were some good alternatives available which were also a little simpler.
A standard SharePoint document library, with versioning turned on and check-in/check-out required, would meet your needs. As previously posted, WSS comes free with Windows Server.
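To show why required check-out plus versioning addresses the overwrite problem described in the question, here is a toy in-memory model. It is not SharePoint's API; the class and method names are made up. The point is only the rule it enforces: one editor at a time, and every check-in adds a version instead of replacing the previous one.

```python
class DocumentLibrary:
    """Toy model of a library with required check-out and version history."""

    def __init__(self):
        self.versions = {}      # name -> list of contents, oldest first
        self.checked_out = {}   # name -> user currently holding the document

    def add(self, name, content):
        self.versions[name] = [content]

    def check_out(self, name, user):
        """Reserve the document for one user and return the latest version."""
        holder = self.checked_out.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is checked out by {holder}")
        self.checked_out[name] = user
        return self.versions[name][-1]

    def check_in(self, name, user, content):
        """Store a new version, release the document, return the version number."""
        if self.checked_out.get(name) != user:
            raise PermissionError(f"{user} has not checked out {name}")
        self.versions[name].append(content)
        del self.checked_out[name]
        return len(self.versions[name])


library = DocumentLibrary()
library.add("newsletter.doc", "draft")
library.check_out("newsletter.doc", "alice")
try:
    library.check_out("newsletter.doc", "bob")
    blocked = False
except PermissionError:
    blocked = True  # bob has to wait until alice checks in
version = library.check_in("newsletter.doc", "alice", "draft with edits")
```

With this rule in place, the second editor cannot start from a stale copy while the first is still working, and the earlier draft remains available in the history after check-in.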
