How to compare Excel files (.xlsx) with Kdiff3?

I added KDiff3 as my external diff tool in SourceTree, as shown in the figure.
But when I select two commits from Master and click External Diff under Actions, KDiff3 shows unreadable text, as shown.

To compare Excel files in SourceTree, I used WinMerge along with a plugin for comparing Excel files, a free tool available from http://freemind.s57.xrea.com/xdocdiffPlugin/en/

It seems like you're trying to compare two Excel files. An .xlsx file is a ZIP archive of XML parts, so on disk it is compressed binary data, and tools designed for comparing text files (such as KDiff3, or WinMerge without a plugin) cannot compare it directly.
To compare two Excel files, use Microsoft's Spreadsheet Compare tool, which ships with some editions of Office: https://support.office.com/en-ca/article/Compare-two-versions-of-a-workbook-by-using-Spreadsheet-Compare-0e1627fd-ce14-4c33-9ab1-8ea82c6a5a7e
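
The unreadable text has a concrete cause: an .xlsx file is a ZIP package of XML parts, so a text diff tool ends up displaying compressed bytes. Here is a minimal sketch, using only Python's standard library, that lists which parts of the package differ between two workbooks. The function names are hypothetical, and this only narrows down where a change sits; it is not a cell-level diff.

```python
import zipfile


def read_parts(workbook):
    """Return {part name: raw bytes} for every entry in an .xlsx package.

    `workbook` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(workbook) as package:
        return {name: package.read(name) for name in package.namelist()}


def changed_parts(old_workbook, new_workbook):
    """Return the sorted names of parts that were added, removed, or modified."""
    old, new = read_parts(old_workbook), read_parts(new_workbook)
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )
```

Calling changed_parts("old.xlsx", "new.xlsx") (placeholder file names) returns part names such as xl/worksheets/sheet1.xml when that sheet differs. Workbook metadata parts may show up as changed as well.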

Related

Upload Microsoft Excel Workbook with Many Sheets into Azure ML Studio

I want to upload my Excel workbook into Azure Machine Learning Studio. The reason is that I have some data that I would like to join with my other .csv files to create a training data set.
When I upload my Excel file, the available file types do not include .xlsx or .xls, only other extensions such as .csv and .txt.
This is how it looks:
I uploaded it anyway, and now I am getting weird characters. How can I upload the Excel workbook and get at its sheets, so that I can join the data and do data preparation? Any suggestions?

You could save the workbook as a (set of) CSV file(s) and upload them separately.
A CSV ('comma-separated values') file is exactly that: a flat file with values separated by commas. If you load an Excel file as CSV, the result is garbled, because an Excel file contains far more information than values separated by commas. Have a look at File -> Save as -> Save as type, where you can select 'CSV (comma delimited) (*.csv)'.
Disclaimer: no, it's not always a comma...
In addition, the term "CSV" also denotes some closely related delimiter-separated formats that use different field delimiters. These include tab-separated values and space-separated values. A delimiter that is not present in the field data (such as tab) keeps the format parsing simple. These alternate delimiter-separated files are often even given a .csv extension despite the use of a non-comma field separator.
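
Since the delimiter is not always a comma, it can help to detect it before parsing. Here is a small sketch using csv.Sniffer from Python's standard library. The helper name read_delimited and the candidate list are assumptions for illustration. The sniffer is a heuristic and raises csv.Error when it cannot decide.

```python
import csv
import io


def read_delimited(text, candidates=",;\t"):
    """Parse delimiter-separated text, guessing the delimiter from `candidates`.

    Returns (delimiter, rows). Raises csv.Error if no delimiter can be determined.
    """
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows
```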
Edit
So apparently Excel files are supported: Supported data sources for Azure Machine Learning data preparation
Excel (.xls/.xlsx)
Read an Excel file one sheet at a time by specifying sheet name or number.
But also, only UTF-8 is supported: Import Data - Technical notes
Azure Machine Learning requires UTF-8 encoding. If the data you are importing uses a different encoding, or was exported from a data source that uses a different default encoding, various problems might appear in the text.
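
If the exported CSV is not UTF-8, it can be re-encoded before uploading. Here is a minimal sketch. The function name is hypothetical, and the default source encoding (cp1252, common for Excel exports on Western European Windows) is only an assumption that has to be adjusted to the real file.

```python
def reencode_to_utf8(source_path, target_path, source_encoding="cp1252"):
    """Rewrite a text file as UTF-8.

    `source_encoding` is an assumption; pass the encoding the file really uses.
    newline="" keeps the original line endings untouched.
    """
    with open(source_path, "r", encoding=source_encoding, newline="") as source:
        text = source.read()
    with open(target_path, "w", encoding="utf-8", newline="") as target:
        target.write(text)
```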

Save versions of Excel file on Git to reconcile differences manually later

I will spend one month updating Excel files. These files are in a language other than English. I thought I could use Git to manage this work.
The situation (the initial commit)
I have an Excel file that is written in the other language.
I have to perform some work and fill an Excel file with data from that.
My plan
After an initial commit, create a branch called toEnglish. Then translate some text on the Excel files to English so that I feel more comfortable. Once I do this I will commit.
Then, the one-month work will start and I will fill the data in the Excel file. I will commit periodically.
After the month is over, I will commit, so I will have the data filled into an Excel sheet where some labels are in English.
However, the output of that month's work has to have those labels in the original language.
So I have an original branch with the original language labels but no data
and the toEnglish branch with the data but English labels.
The question
I cannot simply merge the branches (a fast-forward merge), since that would eliminate the original language labels. How can I merge so that Git produces conflicts (the labels in two different languages) that I can resolve one by one, so that the final merge has both the data and the labels in the original language?

There is an even bigger problem with versioning Excel files in Git: Excel files (xls and xlsx) are binary as far as Git is concerned, and Git generally doesn't handle binary files very well. It cannot show a meaningful diff for them or merge them, and each commit of a changed Excel file will likely store what amounts to a new copy of the entire file. Comparing Excel files from two different commits or branches therefore won't give you much insight.
One workaround which comes to mind would be to version plain text CSV versions of your Excel worksheets. Such CSV files would likely version well with Git. Of course, if the worksheets have lots of rich content on top of the data, then this option might not work as well.
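
With CSV exports, the label problem from the question can also be handled outside of Git's merge. Here is a rough sketch, assuming three exports of the same worksheet: the original, the translated version (labels changed, no data yet), and the final filled-in version. It further assumes that label cells keep their row and column position. All function names are hypothetical.

```python
import csv


def restore_labels(original_rows, translated_rows, filled_rows):
    """Put original-language labels back into a filled-in sheet.

    A cell counts as a label when the translation changed it relative to the
    original and the filled sheet still holds that translated text.
    """
    restored = [list(row) for row in filled_rows]
    for r, (orig_row, trans_row) in enumerate(zip(original_rows, translated_rows)):
        for c, (orig_cell, trans_cell) in enumerate(zip(orig_row, trans_row)):
            if (
                orig_cell != trans_cell
                and r < len(restored)
                and c < len(restored[r])
                and restored[r][c] == trans_cell
            ):
                restored[r][c] = orig_cell
    return restored


def load_csv(path):
    """Read a UTF-8 CSV export of one worksheet into a list of rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

Cells that were edited after translation are left alone, so any label that was changed again during the month still needs a manual check.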

There is an open-source Git extension that makes Excel workbook files diff- and mergeable: https://github.com/ZoomerAnalytics/git-xltrail (disclaimer, I'm one of the authors)
It installs a custom differ and merger for xls* types and configures Git accordingly so that it behaves the same way as if it were a text file.
For docs and a short video, have a look at https://www.xltrail.com/client

Excel is a bit useless in Git. It does not matter whether the file is binary (xlsb) or not (xlsx): Git will just store a copy of the file as it is. Thus, working source control is a bit of a challenge for VBA developers. The general view, or at least what I usually hear, is that it does not exist and cannot be done, but there are some workarounds, e.g. if you follow MVC and do not put any business logic in the worksheets.
What you can do is simply save the worksheets as CSV and proceed as if they were normal plain text. In the end, even a "manual" merge with formulas across the different worksheets is possible (this is the bonus Excel gives you).

Pentaho Data Integration - Multiple Excel File Inputs Loading

I've been using Spoon as a tool to complete a project. One of the requirements is to load multiple Excel files that have the same format (sheets) and send them to a Table Output step.
However, the number of Excel files has to be variable (a requirement), although they are all located in the same folder. Which step(s) can load all the Excel files in a folder?
Thanks.

The Microsoft Excel Input step supports reading all files in a folder, or only those matching a regular expression. It can also read all files including those in subfolders.

Beyond Compare: Export Differences in two CSV files in CSV format

I want to export differences between 2 CSV files and want the export to be present in CSV format.
Using Beyond Compare software:
I can use text compare and export the differences as CSV, but text compare does not let me use keys.
I can use data compare, which lets me use keys (with the CSV format), but then the differences are exported without any separators.
I am using Session > Data compare report > Side-by-Side for exporting differences
Any tips or suggestions?

I exported the differences as an HTML report using Side-by-Side in the data compare report. (I used monochrome since the output is smaller.)
Then I copied the table into Excel and saved the result as CSV.
Not the solution I was expecting, but it solved my problem.
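
If a script is acceptable in place of the report export, a key-based comparison that writes its result as CSV is short to write. Here is a sketch in Python. It is not a Beyond Compare feature, and the output columns (key, status, column, left, right) are an arbitrary layout chosen for illustration.

```python
import csv
import io

FIELDS = ["key", "status", "column", "left", "right"]


def keyed_diff(left_rows, right_rows, key):
    """Compare two lists of dict rows by `key`; return one dict per difference."""
    left = {row[key]: row for row in left_rows}
    right = {row[key]: row for row in right_rows}
    differences = []
    for k in sorted(left.keys() | right.keys()):
        if k not in right:
            differences.append(dict(zip(FIELDS, [k, "left only", "", "", ""])))
        elif k not in left:
            differences.append(dict(zip(FIELDS, [k, "right only", "", "", ""])))
        else:
            for column, value in left[k].items():
                other = right[k].get(column, "")
                if value != other:
                    differences.append(
                        dict(zip(FIELDS, [k, "changed", column, value, other]))
                    )
    return differences


def diff_to_csv(differences):
    """Serialize the differences as comma-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(differences)
    return buffer.getvalue()
```

The input rows can come from csv.DictReader, which uses the header row of each file as the column names.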

Totaling figures in .csv files using Excel

I have 12 .csv files produced by another program. The .csv files contain numeric data, separated by commas.
I need an easy way of totaling the values in certain columns in each of the files and comparing the totals across the various files e.g. compare the total from file 1 to the total from file 5.
The format of each file is the same i.e. 5 values in each record, separated by commas. Each of the 12 .csv files is about 50 Mb in size. Each file has a different number of records.
The environment I work in is 'secure' and I can't run any programs other than what is installed on the PC I use. I have Excel installed and assume I can write VBA code/macros, and I have access to the command line. I can't (for example) load anything from a USB key and cannot install any scripting language, e.g. Python.
I have thought of doing this manually, e.g. opening each .csv file in Excel and totaling the columns using Excel functions such as SUM().
My challenge is that I need to do this many times over the next few weeks as new versions of the .csv files are produced. I now have the first version, and there will be many versions of the 12 files as I conduct testing on the other system. For each new version I need to sum the data and compare across files.
The last thing to say is that I can't change the system that produces the .csv files, e.g. to create a set of totals.
I'm looking for a programming solution that I can use given my limited resources (no ability to use any tools other than what is already on the PC).

You should be able to do this easily using an Excel VBA macro, but it might take quite some time if it needs to load and convert a 50 MB CSV file.
JScript (Microsoft's form of JavaScript) is generally available on Windows machines and runs under Windows Script Host. Just create a file with a .js extension and try to run it with a double click. Or you can use VBScript with a .vbs extension.
I think your easiest solution would be to write an Excel macro (as you will have the IDE for Excel VBA, limited as it is).

PowerShell or a batch script? A CSV is nothing more than a text file with values separated by commas. It should be fairly easy to knock something up.
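
The logic such a script needs is small. The sketch below is in Python purely to show the steps; the asker cannot install Python, so the same loop would have to be written in PowerShell or VBScript. It assumes the files have no header row and that every value in the chosen columns is numeric. The function name is hypothetical.

```python
import csv


def column_totals(path, columns):
    """Stream a headerless numeric CSV and return {column index: total}.

    Reading row by row keeps memory use low even for files of about 50 MB.
    """
    totals = {index: 0.0 for index in columns}
    with open(path, newline="") as handle:
        for record in csv.reader(handle):
            if not record:
                continue  # skip blank lines
            for index in columns:
                totals[index] += float(record[index])
    return totals
```

Running it once per file and comparing the returned dictionaries gives the cross-file comparison of totals.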

ADO can work on CSV files and you could then use SQL statements to sum the appropriate values - see this MSDN article for full details.
If you go to the Visual Basic Editor in Excel and try to add a reference via the Tools menu, you should see several entries for Microsoft ActiveX Data Objects (2.8 being the most recent one). Adding that reference lets you use ADO.
