I will spend one month updating Excel files. These files are in a language other than English. I thought I could also use Git to manage this work.
The situation (the initial commit)
I have an Excel file that is written in the other language.
I have to perform some work and fill an Excel file with the resulting data.
My plan
After an initial commit, create a branch called toEnglish. Then translate some text in the Excel files into English so that I feel more comfortable. Once I have done this, I will commit.
Then the month of work will start and I will fill in the data in the Excel file, committing periodically.
When the month is over, I will commit again, so I will have the data filled in on an Excel sheet where some labels are in English.
However, the output of that month of work has to have those labels in the original language.
So I have an original branch with the original-language labels but no data,
and the toEnglish branch with the data but English labels.
The question
I cannot simply merge the branches (it would be a fast-forward merge), since that would discard the original-language labels. So how can I merge in a way that produces conflicts (the labels in two different languages) that I can resolve one by one, so that the final result has both the data and the labels in the original language?
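What this plan calls for is not really a merge but undoing the translation commit on top of the toEnglish branch; for text files, git revert <translation-commit> does that and typically reports conflicts where a translated line was edited afterwards. Git cannot do this inside a binary workbook, so here is a minimal cell-level sketch of the same idea. It assumes each version of the sheet has been exported to a grid of strings (for example via CSV) and that no rows or columns were inserted or deleted during the month; the function name revert_translation and the sample labels are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[str]]


def revert_translation(original: Grid, translated: Grid, final: Grid) -> Tuple[Grid, List[Tuple[int, int]]]:
    """Undo the original -> translated edit inside `final`, cell by cell.

    original:   sheet at the initial commit (original-language labels, no data)
    translated: sheet right after the translation commit (English labels)
    final:      sheet at the end of the month (English labels plus data)
    Returns the merged grid and the (row, col) cells that need a manual decision.
    """
    merged = [row[:] for row in final]  # copy so the input grid is left untouched
    conflicts: List[Tuple[int, int]] = []
    for r, row in enumerate(translated):
        for c, english in enumerate(row):
            source = original[r][c]
            if english == source:
                continue  # the translation commit did not touch this cell
            if final[r][c] == english:
                merged[r][c] = source  # label unchanged since translating: restore it
            else:
                conflicts.append((r, c))  # label edited after translating: resolve by hand
    return merged, conflicts


original = [["Nombre", "Cantidad"], ["", ""]]
translated = [["Name", "Quantity"], ["", ""]]
final = [["Name", "Qty"], ["apples", "3"]]
merged, conflicts = revert_translation(original, translated, final)
print(merged)     # [['Nombre', 'Qty'], ['apples', '3']]
print(conflicts)  # [(0, 1)]
```

Cell (0, 1) is reported because "Quantity" was changed to "Qty" after the translation, which is exactly the kind of conflict you would resolve by hand; the data cells are never touched because the translation commit did not change them.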
There is an even bigger problem with versioning Excel files in Git: Excel files (.xls and .xlsx) are binary, and Git doesn't generally handle binary files very well. An .xlsx file is a compressed ZIP archive of XML parts, so even a small edit changes most of its bytes; Git's delta compression gains little, and each commit effectively adds a new copy of the whole file to the repository. In addition, comparing Excel files from two different commits or branches won't give you much insight, because git diff can only report that the binary files differ.
One workaround is to version plain-text CSV exports of your Excel worksheets. CSV files version well with Git, because each row becomes one line that Git can diff and merge. Of course, if the worksheets have lots of rich content on top of the data (formulas, formatting, charts), this option may not work as well, since CSV stores only the cell values.
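With CSV exports, git diff already shows which lines (rows) changed. If you would rather see changes cell by cell, in Excel's A1 notation, a short script can compare two exports. A minimal sketch using Python's csv module; the function names are hypothetical, and it assumes both files use the same delimiter and encoding:

```python
import csv
import io
from itertools import zip_longest
from typing import List, Tuple


def column_letter(index: int) -> str:
    """0-based column index -> Excel column letters (0 -> A, 25 -> Z, 26 -> AA)."""
    letters = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def diff_csv(old_text: str, new_text: str, delimiter: str = ",") -> List[Tuple[str, str, str]]:
    """Return (cell, old value, new value) for every cell that differs between two CSV exports."""
    old_rows = list(csv.reader(io.StringIO(old_text), delimiter=delimiter))
    new_rows = list(csv.reader(io.StringIO(new_text), delimiter=delimiter))
    changes = []
    # zip_longest pads missing rows/cells with empty values, so added or removed cells show up too.
    for r, (old_row, new_row) in enumerate(zip_longest(old_rows, new_rows, fillvalue=[]), start=1):
        for c, (old, new) in enumerate(zip_longest(old_row, new_row, fillvalue="")):
            if old != new:
                changes.append((f"{column_letter(c)}{r}", old, new))
    return changes


for cell, old, new in diff_csv("Name,Qty\napples,3\n", "Name,Qty\napples,4\npears,2\n"):
    print(f"{cell}: {old!r} -> {new!r}")
# B2: '3' -> '4'
# A3: '' -> 'pears'
# B3: '' -> '2'
```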
There is an open-source Git extension that makes Excel workbook files diffable and mergeable: https://github.com/ZoomerAnalytics/git-xltrail (disclaimer: I'm one of the authors)
It installs a custom differ and merger for xls* file types and configures Git accordingly, so that Git handles these workbooks as if they were text files.
For docs and a short video, have a look at https://www.xltrail.com/client
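Git itself offers a lighter, diff-only variant of the same idea: a textconv filter converts a file to text before git diff shows it (it does not affect merging). The following is not how git-xltrail works; it is a minimal stdlib-only sketch that prints each worksheet's cell values, handling shared strings, inline strings and plain values but not formulas or formatting. The script name xlsx2txt.py is hypothetical; it would be registered with the line *.xlsx diff=xlsx in .gitattributes plus git config diff.xlsx.textconv "python xlsx2txt.py".

```python
import re
import sys
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
SHEET_PART = re.compile(r"xl/worksheets/sheet(\d+)\.xml")


def _string_item_text(node) -> str:
    """Text of an <si> or <is> element: a plain <t>, or rich-text runs <r><t>."""
    if node is None:
        return ""
    parts = node.findall("m:t", NS) + node.findall("m:r/m:t", NS)
    return "".join(t.text or "" for t in parts)


def dump_xlsx(path: str) -> str:
    """Render every cell of every worksheet part as 'REF<TAB>value' lines."""
    lines = []
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        shared = []
        if "xl/sharedStrings.xml" in names:
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            shared = [_string_item_text(si) for si in sst.findall("m:si", NS)]
        # Worksheet parts in numeric order (sheet2 before sheet10); part numbers
        # need not match the tab order shown in Excel.
        sheets = sorted((n for n in names if SHEET_PART.fullmatch(n)),
                        key=lambda n: int(SHEET_PART.fullmatch(n).group(1)))
        for name in sheets:
            lines.append(f"== {name}")
            root = ET.fromstring(zf.read(name))
            for cell in root.iter("{%s}c" % NS["m"]):
                kind = cell.get("t")
                v = cell.find("m:v", NS)
                if kind == "s" and v is not None:
                    value = shared[int(v.text)]
                elif kind == "inlineStr":
                    value = _string_item_text(cell.find("m:is", NS))
                else:
                    # Numbers, booleans and cached formula results; formulas themselves are ignored.
                    value = (v.text or "") if v is not None else ""
                lines.append(f"{cell.get('r', '?')}\t{value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dump_xlsx(sys.argv[1]))
```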
Excel files are of little use in Git: whether the workbook is binary (.xlsb) or not (.xlsx), Git will just store a copy of the file as it is. This makes working source control a challenge for VBA developers. In general it is accepted that it does not exist and cannot be done (this is what I usually hear), but there are some workarounds, e.g., if you follow MVC and do not put any business logic in the worksheets.
What you can do is simply save the worksheets as CSV and proceed as if they were normal plain text. In the end, even some "manual" merging with formulas is possible, based on the different worksheets (this is the bonus Excel gives).
Related
Is it possible to insert some code so that we can track all copied Excel files in the future?
The reason: we are creating a template Excel file that people can copy and fill in. The problem is that they regularly have to fill in the same information, so instead of starting from the template they copy an already filled-in one.
If we decide to change the template, we want to change all the files that were copied so that there are not multiple versions going around.
All the files are stored on a server in subfolders, so we can access them all. File titles will vary according to the customer's wishes.
After reading your question, I see that:
Summary:
You have one single template that everybody copies
You store all the filled-in templates on one server, in subfolders
File titles vary according to each customer's needs
Challenges:
For performance's sake, you might need a program other than Excel to manage those files.
Otherwise, it is possible to use Excel VBA, but it is fairly complicated: you would need advanced skills and enough time to write all the code that handles the renamed files across the subfolders if you wish to collect the data in one single Excel workbook.
Suggested Solution:
I recommend a locked-worksheet and locked-workbook Excel template, so that your customers won't be able to edit its structure and all of your templates stay the same.
You had better adopt a standard naming convention for your Excel files, which will help you use the file names later on for searching, filtering and sorting.
You can also add a Reset button to the template that your customers can click to empty all the fields effortlessly.
In short, if you wish to keep track of files being copied, you would need more than Excel VBA, as you would need something like a Windows service to track them.
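Before resorting to a Windows service, one lighter option is to give the template a custom document property (in recent Excel versions: File > Info > Properties > Advanced Properties > Custom). Copying a file normally keeps its properties, so every copy, and every copy of a copy, carries the property whatever its title, and a script run on a schedule can list them. A minimal sketch, assuming Open XML workbooks (.xlsx/.xlsm, not legacy .xls) and a hypothetical property named TemplateVersion; the function names are also hypothetical:

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Optional

CUSTOM = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"


def template_version(path: Path, prop_name: str) -> Optional[str]:
    """Value of a custom document property in an .xlsx/.xlsm, or None if absent."""
    try:
        with zipfile.ZipFile(path) as zf:
            data = zf.read("docProps/custom.xml")
    except (KeyError, zipfile.BadZipFile, OSError):
        return None  # no custom properties, unreadable, or not an Open XML file
    for prop in ET.fromstring(data).iter(CUSTOM + "property"):
        if prop.get("name") == prop_name:
            # The value is a single typed child such as <vt:lpwstr>1.3</vt:lpwstr>.
            return "".join(prop.itertext()).strip()
    return None


def find_template_copies(server_root: str, prop_name: str = "TemplateVersion") -> Dict[str, str]:
    """Map each workbook under server_root that carries prop_name to its value."""
    found: Dict[str, str] = {}
    for path in sorted(Path(server_root).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".xlsx", ".xlsm"}:
            version = template_version(path, prop_name)
            if version is not None:
                found[str(path)] = version
    return found
```

Files whose version is older than the current template are the ones to update; files without the property were not derived from the stamped template.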
Hope this will give you some ideas. All the Best!
We use Drools for our business rules management. We have created decision tables in spreadsheets and use MS Excel for them.
We use Git as our version control system. We are a big team and are facing huge issues with merging the changes made to the decision table Excel sheet.
We coordinate with all developers, make all the changes in one Excel sheet, and then check it into Git.
What is the most efficient way of handling and merging the decision table spreadsheet in Git?
Is there an alternative solution that lets each developer independently check in their changes to the decision rules spreadsheet?
I would try versioning your decision tables as CSV files that can be converted to XLS.
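Once the decision tables are CSV, Git merges them line by line: two developers who change different rules (rows) merge cleanly, but edits to different cells of the same row still conflict. A cell-level three-way merge can resolve those. A minimal sketch, assuming the rows stay aligned (no rules inserted or deleted) and all three versions have the same columns; merge_cells and the sample table are hypothetical. It could be wired in as a custom Git merge driver (merge.<name>.driver, which receives the ancestor, current and other versions as %O, %A and %B), but that wiring is left out here.

```python
from typing import List, Tuple

Grid = List[List[str]]


def merge_cells(base: Grid, ours: Grid, theirs: Grid) -> Tuple[Grid, List[Tuple[int, int]]]:
    """Three-way merge of equally shaped grids; returns (merged grid, conflicting cells)."""
    merged: Grid = []
    conflicts: List[Tuple[int, int]] = []
    for r, (b_row, o_row, t_row) in enumerate(zip(base, ours, theirs)):
        out_row = []
        for c, (b, o, t) in enumerate(zip(b_row, o_row, t_row)):
            if o == t or t == b:
                out_row.append(o)  # same value on both sides, or only we changed it
            elif o == b:
                out_row.append(t)  # only they changed it
            else:
                out_row.append(o)  # both changed it differently: keep ours and flag it
                conflicts.append((r, c))
        merged.append(out_row)
    return merged, conflicts


base = [["Rule", "Priority", "Action"], ["R1", "1", "approve"]]
ours = [["Rule", "Priority", "Action"], ["R1", "2", "approve"]]
theirs = [["Rule", "Priority", "Action"], ["R1", "1", "reject"]]
print(merge_cells(base, ours, theirs))
# ([['Rule', 'Priority', 'Action'], ['R1', '2', 'reject']], [])
```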
I need to work on tabular data with some people who will edit it using Excel 2010, likely set up with a comma as the decimal separator and a semicolon as the list separator.
The data needs to be under version control, and it must be easy to fork/share, merge, and compare different states of the data set. This includes low-barrier ways to contribute for people outside our work group who do not have me around to help them set the system up.
What setup will allow my co-workers to easily open the file in Excel, edit cells, and then commit and compare with very few clicks, perhaps entering a commit message?
In particular, I require that if I open the file in Excel, make no changes, save it again, and then perform whatever the commit step is for this setup, the commit will be empty.
The data contains non-ASCII characters, in particular IPA symbols.
I expected that I would just be able to use Git with CSV files. But while .csv with a comma as separator seems to fulfill these conditions for ASCII data, with more complete encodings I cannot get Excel to keep the data the same on round trips, not even in appearance, let alone byte for byte: it either loses Unicode characters upon saving or does not recognize the format upon reading.
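One possible workaround, offered as an untested sketch rather than a verified recipe for every Excel 2010 setup: Excel's "Unicode Text (*.txt)" save format generally writes UTF-16 with tab separators and keeps characters such as IPA symbols. A small normalizer could turn that file into a canonical UTF-8 CSV before committing (for example from a script run before each commit), so that a save with no changes produces identical bytes as long as Excel writes the same cell text. The function name and the choice of ";" as output delimiter (matching the list separator, so decimal commas need no quoting) are assumptions, and this covers only the Excel-to-repository direction.

```python
import csv
import io


def unicode_text_to_csv(raw: bytes, delimiter: str = ";") -> bytes:
    """Convert Excel 'Unicode Text' bytes (UTF-16, tab-separated) to canonical UTF-8 CSV."""
    text = raw.decode("utf-16")  # the utf-16 codec consumes the byte-order mark
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))
    out = io.StringIO(newline="")
    # Fixed delimiter, minimal quoting and "\n" line endings: identical cells -> identical bytes.
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue().encode("utf-8")
```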
Like many people, I have a spreadsheet that draws data from over 40 text files used as data sources. The text files come from another app and need to be periodically updated into Excel.
The set of data source files and spreadsheet need to be able to be duplicated and run on different systems. This is where the astonishing inability of Excel to support data import from the spreadsheet folder (or relative paths at all) becomes a big problem. This question mentions the issue but has no solution.
I developed a crude workaround for this (IMHO) fundamental flaw in Excel. Map your spreadsheet folder to a drive letter with SUBST. Then import the data from the SUBST drive letter. That drive letter and path will become part of the spreadsheet, buried deep in dialogs, and very inconvenient to update. So instead, whenever you copy or move the spreadsheet, re-create the SUBST to the current folder. Ugly, but effective.
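The "re-create the SUBST" step is easy to script, so it can be run (or double-clicked) after every copy or move. A Windows-only sketch, assuming the script is stored in the spreadsheet folder and that S: is the drive letter the imports use (both hypothetical choices): SUBST drive: /D removes a mapping, and SUBST drive: path creates one.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

DRIVE = "S:"  # hypothetical drive letter; pick one that is free on every machine


def subst_commands(drive: str, folder: str) -> List[List[str]]:
    """Commands that drop any old mapping of `drive`, then map it to `folder`."""
    return [
        ["subst", drive, "/D"],    # delete the previous mapping (fails harmlessly if none)
        ["subst", drive, folder],  # map the drive letter to the spreadsheet folder
    ]


def remap(folder: str) -> None:
    for cmd in subst_commands(DRIVE, folder):
        subprocess.run(cmd, check=False)  # do not abort when /D finds nothing to delete


if __name__ == "__main__" and sys.platform == "win32":
    # Assumes this script lives in the same folder as the spreadsheet.
    remap(str(Path(__file__).resolve().parent))
```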
New Question: Using this technique, when I open the spreadsheet and click Refresh to refresh from the data sources, I have to click "Import" on over 40 dialogs - one for each file. How can I automate that process?
I discovered that under a data range properties, there is a setting for "Prompt for file name on refresh". By unchecking that, it is no longer necessary to click import for every linked file. The properties for each linked data source must be adjusted individually. There doesn't seem to be any ability to multi-select data sources.
I'm considering replacing a (very) large body of Office-automation code with something that works with the Office XML format directly. I'm just starting out, but already I'm worried that it's too big a task.
I'll be dealing with Word, Excel and PowerPoint. So far I've only looked at Word and Excel. It looks like Word documents should be reasonably easy to manipulate, but Excel workbooks look like a nightmare. For example...
In Word, it looks like you could delete a paragraph simply by deleting the corresponding "w:p" tag. However, the supplied code snippet for deleting a row in Excel takes about 150 lines of code(!).
The reason the Excel code is so big is that deleting a row means updating the row indexes of all the subsequent rows, fixing up the "shared strings" table, etc. According to a comment at the top, the code snippet is not even complete, in that it won't deal with a workbook that has tables in it (I can live with that).
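To see where the line count comes from, here is a sketch of only the first step: removing one row from a worksheet's sheetData and shifting the row numbers below it, using Python's standard library on the raw sheet XML. It does nothing about formulas, merged cells, hyperlinks, defined names, tables, charts, pivot tables or the dimension element, each of which would need its own pass; ElementTree may also rewrite namespace prefixes, which real workbooks (for example those using mc:Ignorable) can be sensitive to. delete_row is a hypothetical name, and it assumes every row element carries an r attribute.

```python
import re
import xml.etree.ElementTree as ET

NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
ET.register_namespace("", NS)  # serialise the main namespace without an ns0: prefix
ROW = "{%s}row" % NS
CELL = "{%s}c" % NS
CELL_REF = re.compile(r"([A-Z]+)(\d+)")


def delete_row(sheet_xml: str, row_to_delete: int) -> str:
    """Remove one row from a worksheet part and shift the rows below it up by one.

    Only <row r="..."> and <c r="..."> are renumbered; formulas, merged cells,
    hyperlinks, defined names, tables, charts and <dimension> are NOT updated.
    """
    root = ET.fromstring(sheet_xml)
    sheet_data = root.find("{%s}sheetData" % NS)
    for row in list(sheet_data.findall(ROW)):  # copy the list: rows are removed while looping
        r = int(row.get("r"))
        if r == row_to_delete:
            sheet_data.remove(row)
        elif r > row_to_delete:
            row.set("r", str(r - 1))
            for cell in row.findall(CELL):
                match = CELL_REF.fullmatch(cell.get("r", ""))
                if match:  # the r attribute on <c> is optional in SpreadsheetML
                    cell.set("r", f"{match.group(1)}{r - 1}")
    return ET.tostring(root, encoding="unicode")
```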
What I'm not clear on is whether that's the only restriction that the sample code has. For example, would there also be a problem if the workbook contained a Pivot Table? Or a chart that references data from the same sheet? Or some named ranges? Wouldn't you also have to update the formulae for any cells (etc.) that referenced a row whose row index had changed?
[That's not to mention the "calc chain", which (thankfully) I think you can simply delete since it's only a cache that can be rebuilt.]
And that's my question, woolly though it is. Just how hard do you have to work to do something as simple as deleting a row properly? Is it an insurmountable task?
Also, if there are other, similar issues either with Excel or with Word or PowerPoint, I'd love to hear about them now, before I waste too much time going down a blind alley. Thanks.
Having worked with the Open XML SDK 2.0 for almost two years now, I can say that seemingly trivial tasks can take many hours, and sometimes days, to figure out how to do properly. For example, deleting an Excel row should be fairly straightforward and easy to do, right? Nope, because not only do you need code to delete your row, but then you have to update all the row indices, update any merged cell references, update hyperlink references, etc. Our internal delete method is close to 500 lines of code just to delete a row, and I'm sure we don't have all the cases accounted for either.
The biggest complaint I have is the lack of documentation on how to do the most common tasks. The MSDN section on the Open XML SDK is very limited and whenever you need to do anything complicated you are really on your own. I've had to read the Open XML standard a lot to figure out what certain elements mean and how they should be implemented since I could find very little online.
The other challenging part is that if you insert an element in a spot where it doesn't belong, or put an invalid attribute on an element, you will get a corrupt file when you try to open it. Most of the time you will not get any information on what caused the error, and you will have to look at the Open XML standard spec to see what you did wrong.
If you need a fast turnaround on converting that Office automation code into Open XML and what you are doing is not really basic, then I would say pass. If you have the time and the patience to read up on the Word, Excel and PowerPoint XML structures and get familiar with how they relate, then I say go for it. In my opinion it is really the only way to have very fine control over these Office documents, but there will be a steep learning curve when you start.
Oh and just for fun here is how much code is needed to add a comment to an Excel cell.
Just for completeness, here are some libraries I found for working with Excel XML:
www.extremexml.com - a layer on top of the Open XML SDK classes; focuses on injecting data into an existing spreadsheet; handles many of the cross-reference problems I identified in my question. Open source, but GPL2 rather than LGPL. Code looks nice, and documentation is excellent. Does not appear terribly active on CodePlex though.
ClosedXML - another layer on top of the Open XML SDK - again open source, but with a less restrictive license (MIT). Looks nice, and looks more "active" than the above.
SpreadsheetLight - from what I can tell, a closed-source library sitting atop the Open XML SDK classes. Targeted more at those looking to create a spreadsheet from scratch rather than making changes to existing spreadsheets.
Here is another third party library dedicated to working with OpenXML:
http://www.officewriter.com
In the example of deleting Excel spreadsheet rows cited by amurra above, this is a single method call with this tool. It updates formulas and all the other references that would otherwise seem to require 500 lines of code.
The OpenXML SDK itself is a great tool for very simple things, but you still have to concern yourself with a lot of the internals of the file format and packaging structure to get things really right.
Here are some additional libraries that can manipulate OOXML formats:
- GemBox.Spreadsheet (XLSX)
- GemBox.Document (DOCX)
GemBox has also published some articles that demonstrate how to manipulate OOXML file formats with pure .NET (without using any library); I think you'll find them interesting:
www.codeproject.com/Articles/15593/Read-and-write-Open-XML-files-MS-Office
(Introduction to the SpreadsheetML format and an explanation of how to read and write a worksheet's cell content)
www.codeproject.com/Articles/649064/Show-Word-File-in-WPF
(Introduction to the WordprocessingML format and a demonstration of how to read a document's text)