How to read mixed string and number data from CSV in MATLAB and manipulate it

I'm looking to write a MATLAB script that imports data from a CSV file whose first row contains string headers; the data in each column is either string, date or numeric.
I then want to filter the data in MATLAB according to instances of a particular string and number combination.
Any help appreciated!
Cheers!

I would recommend starting with the MATLAB documentation for xlsread.
[num,txt,raw] = xlsread('myExample.xlsx')
This returns the numeric data in num, the text in txt and the combined, unprocessed data in raw, so if your data is mixed you need the cell array raw. After that, you can do whatever you want with the cell array. (I can't be more specific because the question doesn't say how the data should be filtered.)
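As a rough illustration of working with raw, here is a minimal sketch of filtering on a string and number combination. The cell array literal, the column names and the threshold are all hypothetical stand-ins for whatever xlsread returns from your file.

```matlab
% Hypothetical stand-in for the raw output of xlsread: header row plus mixed data.
raw = {'Name','Value'; 'alpha',10; 'beta',20; 'alpha',30};

header = raw(1,:);        % first row holds the column headers
body   = raw(2:end,:);    % remaining rows hold the data

% strcmp on a cell column returns one logical value per row.
isAlpha = strcmp(body(:,1), 'alpha');

% Column 2 is assumed to contain only numeric scalars.
vals = cell2mat(body(:,2));

% Keep rows where the name matches AND the value exceeds the threshold.
keep   = isAlpha & vals > 15;
result = [header; body(keep,:)];
```

If the numeric column can contain text or empty cells, cell2mat will fail, so that column would need cleaning first.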

Try using the readtable function in MATLAB.
It correctly imports a CSV file with a header row and mixed data types.
xlsread imported my mixed CSV file incorrectly, repeating some rows while keeping the total number of rows the same.
I found this after searching for a long time:
MATLAB Central Question/Answer
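A minimal sketch of the readtable approach, assuming a small CSV with hypothetical column names (Name, Date, Value). The sample file is written first only so the example is self-contained.

```matlab
% Build a small sample CSV (hypothetical data) in a temporary location.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Name,Date,Value\n');
fprintf(fid, 'alpha,2016-01-05,10\n');
fprintf(fid, 'beta,2016-01-06,20\n');
fprintf(fid, 'alpha,2016-01-07,30\n');
fclose(fid);

% The header row becomes the table's variable names.
T = readtable(fname);

% Keep rows where Name is 'alpha' AND Value exceeds 15.
% cellstr() lets the comparison work whether Name was imported as
% a cell array of character vectors or as a string array.
rows = strcmp(cellstr(T.Name), 'alpha') & T.Value > 15;
filtered = T(rows, :);

delete(fname);   % remove the temporary sample file
```

How the Date column is typed (text or datetime) can vary between MATLAB releases, so it is left out of the filter here.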

Related

MATLAB xlsread Function to Import Dates

Thanks for taking a look at my question.
I'm having a peculiar issue importing an xlsx file into MATLAB R2016a (Mac OS X), more specifically importing dates.
I am using the below code to import my bank statement history from the Worksheet 'Past' in the xlsx file 'bank_statements.xlsx'. A snippet of column 1 with the dates in dd/mm/yyyy format is also included.
[ndata, text, data] = xlsread('bank_statements.xlsx','Past');
My understanding is that MATLAB uses filters to distinguish between text and numeric data with these being represented in the 'text' and 'data' arrays respectively whilst 'ndata' is a cell array with everything included. Previously, when running the script on MATLAB 2015a (Windows) the dates from column 1 were treated as strings and populated in the 'text' array, whilst on MATLAB 2016a (Mac OS X) column 1 of the text array is blank. I assumed this was because updates had been made to how the xlsread function interprets date information.
Here's the strange part. Whilst inspecting the text array through the Variables window and referencing in the Command Window shows text(2,1) to be empty, performing the datenum function on this "empty" cell successfully gives the date in a numbered format:
Whilst I can solve this issue by using the ndata array (or by ignoring the fact that the above doesn't make sense to me), I'd really like to understand what is happening here and why a seemingly empty cell can actually hold information on which operations can be performed.
Best regards,
Jim
I was able to replicate your problem and although I can't answer the intricacies of what is happening, I could offer a suggestion. I was only able to replicate it when I was converting a string of non-date text, which leads me to believe that there might be an issue with the way the data was imported.
Instead of:
[ndata,text,data] = xlsread('bank_statements.xlsx','Past');
maybe try adding the @convertSpreadsheetDates function handle if you have it, along with the range of values you want to import, i.e.
[ndata,text,data] = xlsread('bank_statements.xlsx','Past','A2:A100','',@convertSpreadsheetDates);
Probably not what you are looking for but it might help!
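Whichever output the dates end up in, they can also be converted explicitly instead of relying on how xlsread classifies them. A minimal sketch, assuming a cell holds either an Excel serial date number or dd/mm/yyyy text; the variable names and sample values are hypothetical, and datetime requires R2014b or later.

```matlab
% Hypothetical values standing in for one imported cell each.
serialFromExcel = 42370;          % date stored as an Excel serial date number
textFromExcel   = '05/01/2016';   % date stored as dd/mm/yyyy text

% Convert an Excel serial date number to a MATLAB datetime.
d1 = datetime(serialFromExcel, 'ConvertFrom', 'excel');

% Parse day/month/year text; note that MM (uppercase) means month here.
d2 = datetime(textFromExcel, 'InputFormat', 'dd/MM/yyyy');
```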

Append new columns into Excel with MATLAB

I would like to ask how to use MATLAB to append new columns to an existing Excel file without altering the original data in the file. In my case I don't know the original number of columns and rows in the file, and it is inefficient in practice to open the files one by one and check. Another difficulty is that the new columns may have a different number of rows from the existing data, so I cannot use the trick of reading in the data, forming a new matrix and replacing the data with the new matrix.
I have seen many posts teaching people how to add new rows, but adding a new column seems quite a different thing since columns are named by letters instead of numbers.
Thank you.
You could try reading in the data, use size on the array to determine the number of columns, and then use xlswrite with the range that you want. Have a look here for a function to turn the column number into the excel format: http://au.mathworks.com/matlabcentral/answers/54153-dynamic-ranges-using-xlswrite
I finally solved it with the following code:
if (step==1)
    xlswrite(filename,array,sheetname,'A1'); % create the file
else
    [~,~,Data]=xlsread(filename,sheetname); % read in all the old data
    OriCol=size(Data,2); % number of columns in the old data
    NewCol=OriCol+1; % the new array is placed right next to the original data
    ColLetter=xlcolumnletter(NewCol); % helper function (not a MATLAB built-in) that converts a column number to Excel letters
    StartCell=[ColLetter,'1'];
    xlswrite(filename,array,sheetname,StartCell);
end
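Since xlcolumnletter is not part of MATLAB itself, here is a minimal sketch of the conversion such a helper is expected to perform (1 -> A, 26 -> Z, 27 -> AA). This is an illustrative reimplementation, not the code of the linked function, and the variable names are hypothetical.

```matlab
n = 28;          % hypothetical 1-based column number to convert
letters = '';    % result is built from the last letter backwards

while n > 0
    r = mod(n - 1, 26);                  % 0..25 index of the current letter
    letters = [char('A' + r), letters];  % prepend the letter
    n = floor((n - 1) / 26);             % move on to the next, more significant letter
end
```

The n - 1 shift is needed because Excel column names have no zero digit, so this is not a plain base-26 conversion.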

Best way to import numeric and non-numeric data (string) from an excel file into MATLAB?

I want to know the best way of importing both number and non-numeric data (which is string in the present case) from an excel file into MATLAB? By best (or better) way, I mean all the data together in a variable (or data structure).
First, I tried the uiopen(filename) function, which opens a wizard from which I can import the data into a MATLAB variable. However, the problem is that it replaces all the non-numeric data with zeros, which is not what I want. I later found that this function calls another function, xlsread(filename), which is the actual way of importing an Excel file.
The second (and last) way I tried, which seems better, is the importdata(filename) function, which imports numeric and non-numeric data into separate fields of a structure.
However, I am wondering if there exists some other way(s) to import everything into a single variable or data structure?
xlsread is the correct way to import data from Excel spreadsheets, both numeric and non-numeric. Check the documentation:
[num,txt,raw] = xlsread(___) additionally returns the text fields in
cell array txt, and the unprocessed data (numbers and text) in cell
array raw using any of the input arguments in the previous syntaxes.
If xlRange is specified, leading blank rows and columns in the
worksheet that precede rows and columns with data are returned in raw.

Reading mix between numeric and non-numeric data from excel into Matlab

I have a matrix where the first column contains dates and the first row contains maturities which are alpha/numeric (e.g. 16year).
The rest of the cells contain the rates for each day, which are double precision numbers.
Now I believe xlsread() can only handle numeric data so I think I will need something else or a combination of functions?
I would like to be able to read the table from excel into MATLAB as one array or perhaps a struct() so that I can keep all the data together.
The other problem is that some of the rates are given as '#N/A'. I want the cells where these values are stored to be kept but would like to change the value to blank=" ".
What is the best way to do this? Can it be done as part of the input process?
Well, from looking at the MATLAB reference for xlsread, you can use the form
[num,txt,raw] = xlsread(FILENAME)
and then num holds a matrix of your numeric data, txt holds the data that could not be read as numbers (i.e. your text headers), and raw holds all of your data unprocessed (including the text headers).
So I guess you could use the raw array, or a combination of the num and txt.
For your other problem, if your rates are 'pulled' from some other source, you can use
=IFERROR(RATE DATA,"")
and then there will be a blank instead of the error code #N/A.
Another solution (Windows only) would be to use the xlsread() syntax that runs a function on your imported data,
[num,txt,raw,custom] = xlsread(filename,sheet,xlRange,'',functionHandler)
and let the function replace the NaN values with blanks (you will then have your output in the custom array).
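The replacement can also be done after import, directly on the raw cell array. A minimal sketch, assuming missing rates show up either as numeric NaN or as the text '#N/A'; how error cells actually appear in raw may differ by platform and release, so check your own import first. The cell array literal is a hypothetical stand-in for the xlsread output.

```matlab
% Hypothetical stand-in for raw: dates in column 1, maturities in row 1.
raw = {'Date',       '1year', '16year';
       '01/02/2015', 1.25,    NaN;
       '02/02/2015', '#N/A',  2.5};

% Flag cells that are a numeric NaN scalar or the literal text '#N/A'.
isMissing = cellfun(@(c) (isnumeric(c) && isscalar(c) && isnan(c)) || ...
                         (ischar(c) && strcmp(c, '#N/A')), raw);

% Overwrite the flagged cells with a blank, leaving everything else in place.
raw(isMissing) = {' '};
```

Keeping everything in one cell array preserves the table layout, at the cost of the rate columns no longer being purely numeric.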

How to get the max no. of columns filled in an XLSX file using POI?

I know we can get the max number of columns by iterating over all the rows and calling getLastCellNum on each row object, but this approach requires iterating over all the rows, which I want to avoid since it will take a lot of time for files with a million rows (that's the kind of file I expect to read).
When POI reads an Excel file, it stores the sheet dimensions (first row number, last row number, first column number, last column number) in an object of the DimensionsRecord class. So if I get this object I will get what I need. These objects can be obtained from the Sheet class, which is an internal class of POI. I was able to extract what I need for XLS files, but I have hit a roadblock for XLSX files.
Does POI maintain a DimensionsRecord object for XLSX as well? If yes, has anybody tried to extract it? Or is there some other way by which this can be done? Please help!
I also wanted to ask whether my approach is correct, i.e. I am using the internal classes of POI (it is getting my work done). Is this acceptable, or should I rely solely on the exposed APIs (too time consuming)?
There's a dimension object on XSSF Sheets too. Try:
CTSheetDimension dimension = sheet.getCTWorksheet().getDimension();
String sheetDimensions = dimension.getRef();
The one issue that springs to mind is I'm not sure if it's required for the dimension (CTDimensions or DimensionsRecord) to always be correct...

Resources