Append new columns into Excel with MATLAB - excel

I would like to ask how to use MATLAB to append new columns to an existing Excel file without altering the original data in the file. In my case I don't know the original number of columns and rows in the file, and it is impractical to open the files one by one and check. Another difficulty is that the new columns may have a different number of rows from the existing data, so I cannot use the trick of reading in the data, forming a new matrix, and replacing the data with the new matrix.
I have seen many posts teaching people how to add new rows, but adding a new column seems quite different, since columns are named by letters instead of numbers.
Thank you.

You could try reading in the data, using size on the array to determine the number of columns, and then calling xlswrite with the range that you want. Have a look here for a function that turns the column number into the Excel column letter: http://au.mathworks.com/matlabcentral/answers/54153-dynamic-ranges-using-xlswrite
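
The number-to-letter conversion mentioned above is a bijective base-26 encoding: there is no zero digit, so 1 maps to A, 26 to Z, and 27 to AA. A minimal sketch of that logic, written in Python purely for illustration rather than in MATLAB; the names `column_letter` and `start_cell` are hypothetical and are not the function from the linked answer:

```python
def column_letter(col):
    """Convert a 1-based column number to an Excel column label (1 -> 'A')."""
    if col < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while col > 0:
        # Shift to 0-based before dividing, because there is no zero digit.
        col, remainder = divmod(col - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def start_cell(existing_columns, row=1):
    """Address of the first free cell to the right of the existing columns."""
    return f"{column_letter(existing_columns + 1)}{row}"
```

With three existing columns, `start_cell(3)` gives `'D1'`, which plays the role of the range argument passed to xlswrite.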

Finally I solved it with the following code:
% xlcolumnletter is a user-supplied helper (not a MATLAB built-in) that
% converts a column number to its Excel column letter.
if (step==1)
    xlswrite(filename,array,sheetname,'A1'); % create the file
else
    [~,~,Data]=xlsread(filename,sheetname); % read in all the old data
    OriCol=size(Data,2); % number of columns in the old data
    NewCol=OriCol+1; % the new array is placed right next to the original data
    ColLetter=xlcolumnletter(NewCol);
    StartCell=[ColLetter,'1'];
    xlswrite(filename,array,sheetname,StartCell);
end

Related

Generate a multicolumn table using docxtpl

I have a series of data (in 2-dimensional list 'CombinedTable') I need to use to populate a table in an MS Word template. The table has 7 columns so I attempted the following using docxtpl module:
context = {
    'tpl_modules1': CombinedTable[0],
    'tpl_modules2': CombinedTable[2],
    'tpl_modules3': CombinedTable[4],
    'tpl_modules4': CombinedTable[6],
    'tpl_modules5': CombinedTable[8],
    'tpl_modules6': CombinedTable[10],
    'tpl_modules7': CombinedTable[12],
}
tpl.render(context)
tpl.save(FilePath + FileName)
Not the most elegant solution, I know, but I am just trying to get this working. Unfortunately, using this code with the following template results in the tpl_modules7 data being written into all columns, rather than just the 7th.
Does anyone have advice for how to resolve this? I attempted to create a for loop through the columns as well as rows but was unsuccessful in writing anything to the doc (was saved as a blank & empty doc).
The CombinedTable variable is a list of 12 lists (one for each column in template, although only 7 contain data). Each of these 12 lists contains another list with cell data whose length is equal to the number of rows to be written to the table in that column. This means that the number of rows that are written to varies for each column.
EDIT: Looking more closely at the docs, it states that I cannot use %tr multiple times in the same row. I assume I will then have to use a loop through %tc and %tr (which I tried & couldn't get working). Any advice on how to implement this? Especially on the side of the word document. Thanks!
I was able to resolve this satisfactorily for my requirements, although my solution may not suit everyone. I simply set up 7 different tables in a document with 7 columns and adjusted the margins/borders to suit the dimensions I required for the tables. Each of the 7 tables had the same docxtpl syntax as in the image in my question, with the small buffer columns between them replaced by columns in the Word document.
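
Another option, sketched here under assumptions and not tested against the original template, is to keep a single table and loop once over rows (one `{%tr for row in rows %}` ... `{%tr endfor %}` pair, with a cell such as `{{ row[0] }}` in each column). That requires turning the per-column lists into rows and padding the shorter columns with blanks. The sketch assumes each used entry of CombinedTable is a flat list of cell strings; `columns_to_rows` and the context key `rows` are hypothetical names.

```python
from itertools import zip_longest


def columns_to_rows(columns, fill=""):
    """Transpose column lists of unequal length into equal-width rows."""
    return [list(row) for row in zip_longest(*columns, fillvalue=fill)]


# Illustrative data only: three columns with different numbers of rows.
example_columns = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]
rows = columns_to_rows(example_columns)
# rows == [["a1", "b1", "c1"], ["a2", "", "c2"], ["a3", "", ""]]

# Assumed usage with the question's data (every second list holds a column):
# context = {"rows": columns_to_rows(CombinedTable[0:13:2])}
```

Padding means every row has the same width, so the template never indexes past the end of a short column.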

Removing duplicates between multiple large CSV files

I am trying to find the best way to remove duplicates from large CSV files.
I receive CSV files of around 5/6 million rows every month.
I need to adjust these (I only need some of the columns, and I need to add some others).
The files also contain a lot of duplicate, and incomplete rows.
I've come up with a solution in Python where I use a set and check, for each row, whether it is already in the set, changing what needs changing along the way.
Now, I get the second file, and it contains a lot of duplicates that are in the previous file.
I'm trying to find an efficient solution to remove duplicates within the file, and between the different files. In the end I want to have a list (table or csv file) that contains only the new entries for that month.
I would like to use Python, and I was thinking about using an SQLite database for storing the data. But I'm unsure which way would be most efficient.
I would use numpy.unique():
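import numpy as np
# delimiter="," is needed for CSV input; np.loadtxt expects purely numeric data
file1 = np.loadtxt("path/to/file1.csv", delimiter=",")
file2 = np.loadtxt("path/to/file2.csv", delimiter=",")
# stack both arrays on top of each other, creating one giant array
data = np.vstack((file1, file2))
# keep only the unique rows (the result comes back sorted)
data = np.unique(data, axis=0)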
np.unique takes the entire array and returns only the unique elements. Make sure you set axis=0 so that it goes row by row and not cell by cell.
One caveat: This should work, but if there are several million rows, it may take a while. Still better than doing it by hand though! Good luck!
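If the rows contain text, or only the entries that are new this month are wanted, the set-based idea from the question can also be applied as a stream without loading whole files into an array. The following is a minimal sketch under stated assumptions: the whole row is the duplicate key, the first line of each file is a header, and the function names (`row_key`, `load_seen`, `write_new_rows`) are hypothetical. Storing a fixed-size digest per row instead of the row itself keeps the set smaller.

```python
import csv
import hashlib


def row_key(row):
    """Hash a row so the 'seen' set stores 16-byte digests, not full rows."""
    joined = "\x1f".join(field.strip() for field in row)
    return hashlib.blake2b(joined.encode("utf-8"), digest_size=16).digest()


def load_seen(paths):
    """Collect the keys of every data row in the previously processed files."""
    seen = set()
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header line
            for row in reader:
                seen.add(row_key(row))
    return seen


def write_new_rows(current_path, out_path, seen):
    """Copy rows not seen before (in earlier files or earlier in this file)."""
    written = 0
    with open(current_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
        for row in reader:
            key = row_key(row)
            if key in seen:
                continue  # duplicate of an earlier row
            seen.add(key)
            writer.writerow(row)
            written += 1
    return written
```

A typical call would be `write_new_rows("this_month.csv", "new_only.csv", load_seen(["last_month.csv"]))`, with the file names being placeholders. The same keys could be kept in an SQLite table instead of an in-memory set if they need to persist between monthly runs.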

How to read mixed string and number data from csv in matlab and manipulate

I'm looking to write a script for MATLAB that will import data from a csv file which has a first row containing string headers and the data in each of those columns is either string, date or numeric.
I want to then be able to filter the data in MATLAB according to instances of a particular string and number combination.
Any help appreciated!
Cheers!
I would recommend starting with the MATLAB documentation.
[num,txt,raw] = xlsread('myExample.xlsx')
This reads numeric, text, and combined data, so if your data is mixed, you need the cell array raw. After that, you can do whatever you want with the cell array. (No further detail is given because the OP did not specify how the data should be filtered.)
Try using the readtable function in MATLAB.
It correctly imports a csv file with a header and mixed data types.
xlsread imported my mixed csv file very incorrectly, repeating some rows while keeping the same total number of rows.
I got this after searching for a long time:
MATLAB Central Question/Answer

Reading mix between numeric and non-numeric data from excel into Matlab

I have a matrix where the first column contains dates and the first row contains maturities which are alpha/numeric (e.g. 16year).
The rest of the cells contain the rates for each day, which are double precision numbers.
Now I believe xlsread() can only handle numeric data so I think I will need something else or a combination of functions?
I would like to be able to read the table from excel into MATLAB as one array or perhaps a struct() so that I can keep all the data together.
The other problem is that some of the rates are given as '#N/A'. I want the cells where these values are stored to be kept but would like to change the value to blank=" ".
What is the best way to do this? Can it be done as part of the input process?
Well, from looking at matlab reference for xlsread you can use the format
[num,txt,raw] = xlsread(FILENAME)
and then you will have in num a matrix of your data, in txt the unreadable data, i.e. your text headers, and in raw you will have all of your data unprocessed. (including the text headers).
So I guess you could use the raw array, or a combination of the num and txt.
For your other problem, if your rates are 'pulled' from some other source, you can use
=IFERROR(RATE DATA,"")
and then there will be a blank instead of the error code #N/A.
Another solution (only for Windows) would be to use xlsread() format which allows running a function on your imported data,
[num,txt,raw,custom] = xlsread(filename,sheet,xlRange,'',functionHandler)
and let the function replace the NaN values with blank spots. (and you will have your output in the custom array)

How to get the max no. of columns filled in an XLSX file using POI?

I know we can get the max number of columns by iterating over all the rows and calling getLastCellNum on each row object, but this approach requires iterating over all the rows, which I want to avoid since it will take a lot of time for files with a million rows (that’s the kind of files I am expecting to read).
When POI reads an Excel file, it stores the sheet dimensions (first row number, last row number, first column number, last column number) in an object of the DimensionsRecord class. So if I get this object I will get what I need. These objects can be obtained from the Sheet class, which is an internal class of POI. I was able to extract what I need for XLS files, but I have hit a roadblock for XLSX files.
Does POI maintain a DimensionsRecord object for XLSX as well? If yes, has anybody tried to extract it? Or is there some other way this can be done? Please help!
I also wanted to ask whether my approach is correct, i.e. I am using the internal classes of POI (it is getting my work done). Is this acceptable, or should I rely solely on the exposed APIs (too time consuming)?
There's a dimension object on XSSF Sheets too. Try:
CTSheetDimension dimension = sheet.getCTWorksheet().getDimension();
String sheetDimensions = dimension.getRef();
The one issue that springs to mind is I'm not sure if it's required for the dimension (CTDimensions or DimensionsRecord) to always be correct...
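
To turn that dimension string into a column count, the last cell of the range has to be parsed and its letters converted to a number. The sketch below shows the logic in Python for illustration only, assuming the ref is an A1-style string such as `A1:H1000000` (or a single cell such as `D4`); `column_number` and `last_column_from_ref` are hypothetical names, not POI methods. As noted above, the result is only as reliable as the dimension stored in the file.

```python
import re


def column_number(letters):
    """Convert an Excel column label to a 1-based number ('A' -> 1)."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def last_column_from_ref(ref):
    """Return the 1-based last column named in a ref like 'A1:H1000000'."""
    last_cell = ref.split(":")[-1]  # a single-cell ref has no ':'
    match = re.fullmatch(r"\$?([A-Za-z]+)\$?(\d+)", last_cell)
    if match is None:
        raise ValueError(f"unrecognised cell reference: {last_cell!r}")
    return column_number(match.group(1))
```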
