We are branching out beyond the development team and trying to get other groups within my company to use version control for important documents that need change tracking. One frequent need is for Excel spreadsheets. These are large spreadsheets, modified fairly frequently (weekly or monthly) but with only a small portion of the cells changed each time.
Just sticking the files in Subversion (the particular tool we are using) gives a history of changes and keeps old versions, and the TortoiseSVN client makes it easy for non-technical users. Recent versions of TortoiseSVN even contain a script which can be used to perform nice visual diffs between Excel documents.
My remaining concern is disk space. These are large documents. The diffs between versions are small, but I worry that the version control system will notice that the file is binary and fall back to storing each version separately. Does anyone know of a solution to this? For instance, is there a format we could save in whose diffs would be small, so that only the differences are stored, or a version control system that is specifically aware of Excel files? I have not yet done performance testing, but our version control server is already badly taxed, and if there is a better solution I'd love to know what it is.
Currently SVN cannot efficiently store those types of files. There has been some discussion about it, though:
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=651443
This SO question shows a graph of the results of storing an OpenXML Office document. The results were pretty linear:
Will Subversion efficiently store OpenXML Office documents?
Although your question wasn't specifically about that format it may still apply. You might just need to run a test in SVN and see what kind of storage it takes. SVN is pretty good at storing binary files, so it might not be too terrible. The SO question above also mentions saving the file as a plain text XML 2003 document, which you might investigate also.
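One rough way to run that test is to commit successive versions of the spreadsheet to a scratch repository and note the repository's on-disk size after each commit. The helper below is a minimal sketch of the measuring part only; the name `dir_size_bytes` and the idea of pointing it at the repository directory are my own assumptions for illustration, not something taken from the linked discussions.

```python
import os


def dir_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling it on the repository directory before and after each test commit gives the growth per revision, which can then be compared with the size of the spreadsheet itself.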
One consideration is using Team Foundation Server for source control (if that's an option), which will just store your delta changes, although it may be a bit heavy for what you're looking for.
From my understanding, binary vs. text doesn't have an impact on the storage size in SVN: http://help.collab.net/index.jsp?topic=/faq/svnbinary.html
Related
We have a Product Backlog in an Excel spreadsheet that we also commit to SVN, so everyone can open it, use it, and update to the latest version.
The problem we have is:
How do you enable an Excel spreadsheet to be used simultaneously by many people without them overwriting each other's data? Some kind of data merging? Is it possible at all?
We would like to keep our data in Excel spreadsheet since it provides all the functionality we need. We tried Google spreadsheets that are much better in terms of collaboration, but they don't support cell drop-down values...
EDIT
I've found out that I can also save my Excel file in XML format (XML Spreadsheet 2003). This format preserves all formatting, formulas, conditional styles etc. Only graphs fall out, but I can live with that. I suppose that SVN merge tools more or less support XML file merging when multiple people work on the file.
So I thought of using this feature, but I don't know how much Excel changes the XML document between saves. Maybe it reorders a lot of stuff, making it impossible for merge tools to work with the expected results. Does anyone have any experience with this? It would basically make it possible for multiple people to work on the same file.
EDIT2
I've tried managing Excel spreadsheets in a plain XML file (XML Spreadsheet 2003 file format), which makes it possible for SVN to merge it since it's a text file. I've learned not to use conditional formatting, because it doesn't always work, so you'll have problems reopening the file. Also, graphs won't be preserved.
Testing outcome: I've tried simultaneous work on the same XML, but Excel works with these files in no particular order (looks like it converts it to Excel and back to XML when saving it), so even if you make a small change to data, your XML will look completely different. So this format is a no go for collaborative purposes with merge capabilities.
FYI note: I've moved my operations from Excel to a more collaborative and similarly capable solution: Google Spreadsheets. Simultaneous collaboration just works, and works great (hopefully Microsoft will someday make collaboration work this way across its whole Office range), and it supports versioning as well as all the capabilities I was using in Excel. Conditional formatting is supported, and with some additional script code I can apply conditional formatting to whole rows based on single cells as well, so I can easily background-colour whole rows when a particular story's status is set to "Completed".
Excel data can be merged but it's somewhat hard and error prone.
What I've found useful is to put the svn:needs-lock property on these kinds of hard-to-merge files. The file will be read-only by default to communicate "do not edit this", and acquiring an svn lock makes it a read-write file while preventing other users from acquiring a lock at the same time. It's a communication mechanism, not a 100% foolproof merge conflict prevention mechanism.
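As a rough sketch of that workflow, the helpers below build the two `svn` command lines involved: one to set the property and one to take the lock. The function names and the `backlog.xls` file name are made up for illustration, and `run` assumes the `svn` command-line client is installed and the file lives in a working copy.

```python
import subprocess


def needs_lock_cmd(path):
    """Command that marks a file as requiring a lock before editing."""
    # The property value is not significant; "*" is a common convention.
    return ["svn", "propset", "svn:needs-lock", "*", path]


def lock_cmd(path, message):
    """Command that acquires the lock, with a comment other users can see."""
    return ["svn", "lock", "-m", message, path]


def run(cmd):
    """Run one of the commands above; raises if svn reports an error."""
    subprocess.run(cmd, check=True)
```

For example, `run(needs_lock_cmd("backlog.xls"))` followed by a commit sets the property for everyone; after that, `run(lock_cmd("backlog.xls", "updating sprint 12"))` makes the local copy writable. By default the lock is released again when the change is committed.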
More information in svnbook.
Excel has a built-in share/merge capability (search the help for "share workbook") but it's meant for live copies on shared drives. Still, for this particular problem that might be a better solution than SVN.
Excel doesn't support the scenario you described; you need to use a multi-user database or a proper work-tracking system. There are some hosted ones you can try (Jira, FogBugz), but you're not going to get multi-user with Excel (unless you use the shared spreadsheet option, but as David Moles said, that's meant for shared xls files on a file server).
SVN allows everyone to get the latest version and work on it without a "checking out" / "make read-only" step. This enables multiple users to work on the same file, and I would assume that the merge features only work on text documents (code files).
(This is based on my possibly limited experience with SVN)
It's probably time to upgrade from an Excel system to a more systematic multi-user database system.
I wonder if you're using the right tools for the job. Perhaps you should be investigating some Scrum tools to help you manage your projects.
Excel works well in a simple, non-shared scenario but I would investigate something more powerful to allow your team to manage the scrum process simultaneously.
After hearing about this project on several podcasts, I've started using Zen. It's based on a kanban board.
I'm pretty sure TortoiseSVN has some hooks (merge-docx.js, et al) which enable you to merge complex file types like office documents.
Or perhaps something like this (at a push)
Have you looked at the Greenhopper planning backlog tool?
You could try XLLoop. This lets you write excel functions (UDFs) on an external server. It includes server implementations in many different languages.
So instead of keeping the data on your spreadsheet, you could have it stored in an external database then create a template excel sheet that calls the functions on your server. This will allow the data to be automatically updated and also allow multiple people to use the sheet at the same time.
BTW, I work on the project so let me know if you have any questions.
Looking to develop a server-side application that will process documents. The source documents are mostly MS-Word 2003 and 2007, i.e. the MS version of Docx. Want the server application to be able to run on both Linux and Windows.
Wanting to know what is the best tool or library for reading and writing MS-Word files under linux. Compatibility is the most important consideration. Must preserve source document formatting including tables.
I have seen a kind of similar post here but it was specific to python. I don't care what language or libraries are used as long as they are available for windows and linux.
Must not require MS-Word to read the Word files.
I am aware of Open Office but am looking for a solution which has a high degree of compatibility with MS-Word files.
Also just came across this solution which looks promising. aspose.com
Anyone had any experience using Aspose.Words for Java or similar 3rd party packages? It looks promising but it's pricey at over $2K for an OEM subscription. That said if it delivers as advertised it may still be the best solution out there.
thanks
There have been a couple of suggestions but nothing so far which would fit the bill (or the budget).
Have you considered using b2xtranslator to convert binary .doc to .docx? (On Linux, you'd have to run it in Mono.)
You could then use POI or docx4j to manipulate the docx. Not a solution if you need to save as .doc though (unless you use OO for that bit)
Ok, I'll have another go at an answer ;-)
What about using unoconv?
It can convert any document OpenOffice can read to any document OpenOffice can write. You should be able to use that to convert both to and from MS-Word documents (provided they're not overly complicated, which I've found OpenOffice can't handle very well).
The only caveat is that you need to have an instance of OpenOffice running on the linux server for unoconv to interact with.
Mono has recently acquired support for the system.io.packaging .net class, which allows some degree of manipulation of docx files. If the kind of thing you want to do is add/remove resources and recurse over the text, it's probably the right thing.
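To illustrate what "recursing over the text" of a docx involves, here is a minimal sketch in Python rather than the .NET packaging class mentioned above. A .docx file is a ZIP package whose main body lives in the part `word/document.xml`; the sketch reads that part and joins the text runs of each paragraph. The function names are hypothetical, and `make_sample_docx` builds only a tiny stand-in package for demonstration, not a complete document that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_bytes):
    """Return the text of each paragraph found in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more w:t run elements.
        runs = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


def make_sample_docx():
    """Build a stand-in package holding only the main document part."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("word/document.xml", xml)
    return buf.getvalue()
```

This only covers reading plain text. Preserving formatting and tables on write, which the question asks for, is a much bigger job and is where the dedicated libraries earn their keep.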
I am looking for a good way to keep a design document up to date with the latest decisions.
We are a small team (two developers, game designer, graphic designer, project manager, sales guy). Most of our projects last a couple of months. At the start of the project a design is made but we generally find ourselves making changes or new decisions throughout the project. Most of these changes are improvements, so we want to keep our process like that. (If the changed design results in more time needed this is generally taken care of, so that part is OK)
However, at the moment we have no nice way of capturing the changes to the initial design document and this results in the initial design quickly being abandoned as a source while coding. This is of course a waste of effort.
Currently our documents are OpenOffice/Word, and the best way to track changes in those documents will probably be adding a changelist to the top of the document and making the changes in the text in parallel, which I don't think is ideal.
I've looked at requirements management software, but that looks way too specialized. The documents could be stored in Subversion, but I think that is a bit too low level to give insight into the changes.
Does anyone know a good way to track changes like these and keep the design document a valuable resource throughout the project?
EDIT: At the moment we mostly rely on changes to the original design being put in the bugtracker, that way they are at least somewhere.
EDIT: Related question
Is version control (ie. Subversion) applicable in document tracking?
I've found a wiki with revision logging works well as a step-up from Word documents, provided the number of users is relatively small. Finding one that makes it easy to make quick edits is helpful in ensuring it's kept up to date.
Both OpenOffice and Word include capabilities for showing/hiding edits to your document. Assuming there's resistance to changing, that's your best option: either that, or export to text and put it into any source control software.
Alternatively, maintain a separate (diffable using the appropriate tool) document for change-description text, and save archive versions at appropriate points in time.
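If the document is exported to plain text, showing what changed between two archived versions is straightforward. The sketch below uses Python's standard `difflib` for that; the function name and the `design_v1.txt` / `design_v2.txt` labels are placeholders of mine, and any source control system's own diff would do the same job.

```python
import difflib


def text_diff(old_text, new_text, old_label="design_v1.txt", new_label="design_v2.txt"):
    """Return a unified diff between two plain-text exports of a document."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_label, tofile=new_label)
    )
```

Lines starting with `-` were removed and lines starting with `+` were added, which answers the "so what's new?" question without anyone maintaining a change table by hand.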
This problem has been a long standing issue in our programming shop too. The funny thing is that programmers tend to look at this from the wrong optimization angle: "keep everything in one place". In my opinion, you have two main issues:
The changes' descriptions must be easy to read ("So what's new?")
The process should be optimized for writing of the specification to agree upon, and then get to work already!
Imagine how this problem is solved in another environment: government law making. The lawbook is not rewritten with "track changes" turned on every time the government adds another law, or changes one...
The best way is to never touch a released document. Don't stuff everything into the same file, you'll get the:
dreaded version history table
eternal status "draft",
scattered inconsistencies,
horribly rushed sentences, and
foul smelling blend of authors' styles
Instead, release an addendum, describing only the changes in detail, and possibly replacing full paragraphs/pages of the original.
With the size of our project, this can never work, can it?
In my biggest project so far, I released one base spec, and 5 consecutive addenda. Each of around 5 pages. Worked like a charm!
I don't know any good, free configuration management tools, but why not place your design under source control? Just add it to SVN, CVS, or whatever you are using. This is good because:
1) It is always up to date (if you check it in, of course)
2) It is centralized
3) You can keep track of changes by using the built-in compare feature, available in almost any source control system
It may not be the 'enterprisish' solution you'd want, but you are a small team of developers anyway, so for that situation, it is more than perfect.
EDIT: I see now that you already mentioned a source control system, my mistake. Still, I think it should work well.
Use Google Docs. It's free, web based, multi-user in real time, lets you choose who has access to your documents, and keeps version history. You can also upload all your Word documents and it will convert them for you.
For more information: http://www.google.com/google-d-s/intl/en/tour2.html
We are developing a web application which is available in 3 languages.
There are these key-value pairs to translate everything. At this moment we use Excel (key, german, french, english) for this. But this does not work well ... if there is more than 1 person editing this file, you have no chance to automatically merge the different files.
Is there a good (and free) tool which can handle this job?
--- additional information ---
(This is a STRUTS application.) But the question is how to manage this kind of information in general, or at least in a convenient way that also supports multiple users editing this single file ("mergeable" file types).
Why not use gettext and manage separate .po files? See that blog entry.
If you can store this information in plain text then you will be able to use a version control system like subversion to help you with merging changes. Subversion is free.
The free guide (the "Red Book") to subversion gives a fairly good explanation of how this kind of merging works.
http://svnbook.red-bean.com/en/1.5/svn.basic.vsn-models.html#svn.basic.vsn-models.copy-merge
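As a sketch of what "plain text" could look like here, the function below takes the spreadsheet exported as CSV with the columns from the question (key, german, french, english) and produces one sorted `key=value` text per language, so that two people adding different keys touch different lines. The function name is hypothetical, and the sketch deliberately ignores the escaping and encoding rules that real resource bundle files need.

```python
import csv
import io


def split_by_language(csv_text, key_column="key"):
    """Turn one key/translation table into one sorted 'key=value' text per language."""
    reader = csv.DictReader(io.StringIO(csv_text))
    languages = [name for name in reader.fieldnames if name != key_column]
    entries = {lang: {} for lang in languages}
    for row in reader:
        for lang in languages:
            entries[lang][row[key_column]] = row[lang]
    # Sorting by key keeps the output stable, so diffs only show real changes.
    return {
        lang: "".join(f"{key}={value}\n" for key, value in sorted(pairs.items()))
        for lang, pairs in entries.items()
    }
```

Each resulting text can be written to its own file and committed; the version control system then merges line by line.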
EDIT: Another thought - if you really want to stay using a spreadsheet - Google Docs supports simultaneous editing of a spreadsheet. You could import your existing spreadsheet and get your multi-user merging wishes for free with very little change to how you work.
Good Question.
There are some "best practices", depending on what you actually code in (Java, MS-Windows C#).
I solved this (though I think there must be a better way) by using a SQL db instead of an Excel file, and I wrote a plug-in for VS (VB6,........,..., emacs) that was able to insert new keys into the db without a round trip through version control. The key is the developer's name for the label, i.e. their best guess at what it should say (key => save, sv => "spara", no => "", en => "save").
This db can then be used to generate a module, class, obj, or txt file appropriate to the code (platform),
and it can be accessed in a way that depends on the IDE, so in C#: bt,label = corelang.save;
Someone else can then do all the language stuff, and then we just update the db and rerun the generation to the platform resources.
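A very small sketch of that idea, using an in-memory SQLite database: developers register keys, translators fill in the texts later, and a per-language dictionary is exported for whatever resource format the platform needs. The table layout, column names, and function names are all assumptions for illustration and are not the schema described above.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one row per (key, language) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE translations ("
        " msg_key TEXT NOT NULL,"
        " lang TEXT NOT NULL,"
        " msg_text TEXT NOT NULL DEFAULT '',"
        " PRIMARY KEY (msg_key, lang))"
    )
    return conn


def add_key(conn, key, languages, default_lang="en"):
    """Register a key; the developer's guess fills default_lang, the rest stay empty."""
    for lang in languages:
        text = key if lang == default_lang else ""
        # OR IGNORE keeps existing translations if the key was already added.
        conn.execute(
            "INSERT OR IGNORE INTO translations (msg_key, lang, msg_text) VALUES (?, ?, ?)",
            (key, lang, text),
        )


def set_text(conn, key, lang, text):
    """Store a translator's text for one key and language."""
    conn.execute(
        "UPDATE translations SET msg_text = ? WHERE msg_key = ? AND lang = ?",
        (text, key, lang),
    )


def export_language(conn, lang):
    """Return {key: text} for one language, ready to be written out as resources."""
    rows = conn.execute(
        "SELECT msg_key, msg_text FROM translations WHERE lang = ? ORDER BY msg_key",
        (lang,),
    )
    return dict(rows.fetchall())
```

Because every edit is a single-row insert or update, two people working at the same time do not overwrite each other's work the way they do with a shared spreadsheet file.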
After years of seeing localization done, including localization at large companies like Sony, I can only say the "standard" is Excel :)
There are tons of good ideas around, and probably many better ways to do it, but in real life Excel seems to be the best, most cost-effective solution that doesn't require training or building complex new tools to get the job done.
Found out that IntelliJ IDEA (at least in versions 7 and 8) has an editor for application resources. But it is not free at all, and it does not scale for bigger resource files with more than 1,000 keys.
Another good choice would be to use Google's spreadsheets ... for those who don't know it, it is like an "online Excel web application". It can handle concurrent access from multiple users. Yay! But sadly, it comes from Google. This makes it impossible to use in commercial projects.
So,
still searching...
cheers,
mana
Any good recommendations for a platform agnostic (i.e. Javascript) grid control/plugin that will accept pasted Excel data and can emit Excel-compliant clipboard data during a Copy?
I believe Excel data is formatted as CSV during "normal" clipboard operations.
dhtmlxGrid looks promising, but the online demos don't actually copy contents to my clipboard!
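On the format itself: as far as I know, the plain-text form Excel places on the clipboard is tab-separated with one row per line, rather than comma-separated. Under that assumption, the conversion a grid has to do on paste and copy looks roughly like the sketch below. It is written in Python only to show the logic, which ports directly to JavaScript, and the function names are made up.

```python
import csv
import io


def parse_clipboard_text(text):
    """Split tab-separated clipboard text into a list of rows of cell strings."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    return [row for row in reader]


def format_clipboard_text(rows):
    """Join rows of cells back into tab-separated text with CRLF row endings."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue()
```

Cells that themselves contain tabs or line breaks need quoting, which the `csv` module handles here; a hand-written JavaScript version would have to deal with that case explicitly.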
I'm currently using dhtmlxGrid and we have the Excel copy/paste functionality working. dhtmlXGrid is the most full featured javascript grid package that I've found.
On their website, dhtmlXGrid claims to support Clipboard functionality in the Professional version. (However, I noticed the Sample on their site isn't working on my Firefox. EDIT: It's probably the permissions issue that Nathan mentioned.)
In any case, we had to do some extra work to get the exact Excel copy and paste functionality we wanted. We essentially had to override some of their functionality to get the desired behavior. Their support was pretty good in helping us come up with a solution.
So to answer your question, you should be able to get them to support copy and paste if you purchase the Professional version. I'm just warning you that it may take some additional work to fine tune that behavior.
Overall, I'm happy with dhtmlXGrid. We use a lot of their features. Their support is pretty good. They usually take one day to respond since they are in Europe (I think). And Javascript is by its very nature open source so I can always dive in when I need to.
Not an answer, but a warning: my company bought the 2007 Infragistics ASP.NET controls just for the Grid, and we regret that choice.
The quality of API is horrible (in our opinion at least), making it very hard to program against the grid (for example, inconsistent naming conventions, but this is just an inconvenience, we have complaints about the object model as well).
So I can't say that I know of a better option, I just know I will give a try to something else before paying for Infragistics products again (and the email support we got was horrible as well).
I was wrestling with this problem several years ago (2004 I think). We ran into the problem that Firefox doesn't allow scripts to read the clipboard by default (but you can grant access to the clipboard).
There are other ways of reading the clipboard data as well... Flash, for instance, can read the clipboard. There's a good article on Ajaxian explaining how to do this behind the scenes.
In the end, we couldn't find a web-based Grid that fit the bill, so we had to create our own in a mixture of Actionscript and Javascript.
I'd hate to be Captain Obvious here...but what about a plain old .NET Gridview control? You can copy Excel data into it and out of it...and you can run it on any system with the .NET platform installed.
http://dhtmlx.com/dhxdocs/doku.php?id=dhtmlxgrid:clipboard_operations