Extract the same column from multiple Excel files using xlsread

I have a directory on C drive containing a number of excel files of the same format. I would like to copy column H from each file into a new file using the following script I found online:
dirs=dir('C:\xxx\*.xlsx');
dircell=struct2cell(dirs);
filenames=dircell(1,:);
range = 'H:H';
n = (numel(filenames));
for i = 1:n;
Newfile(:,i) = xlsread(filenames{i},range);
end
This gives an error message of "Subscripted assignment dimension mismatch." with only one column extracted in the resulting file (Newfile).
I played around with the range and noticed that the error occurs when xlsread reaches the end of the list in the first file and stops at the first empty value. Column H has a different number of filled values in each file (i.e. file 1 has 20, file 2 has 100, file 3 has 3, etc.).
So, my question is whether it is possible to modify this script so that, when it encounters an empty cell, it extracts either an empty cell or a NaN and, most importantly, moves on to the next column.
Thank you so much for your help in advance!

I don't have MATLAB at home, so this is off the top of my head.
Since the column you read, H, has a different number of valid entries in each file, you should not try to force the values into the resulting array Newfile directly. Read them into a temporary variable first and grow Newfile as needed:
folder = 'C:\xxx';
dirs = dir(fullfile(folder, '*.xlsx'));
filenames = {dirs.name};
range = 'H:H';
n = numel(filenames);
Newfile = NaN(1, n);
for nf = 1:n
    % Read into a temporary variable; its length differs from file to file
    tempVar = xlsread(fullfile(folder, filenames{nf}), range);
    r = size(Newfile, 1); % number of rows currently in Newfile
    if numel(tempVar) > r
        % Grow Newfile with NaN rows so that column nf fits
        Newfile = [Newfile; NaN(numel(tempVar) - r, n)];
    end
    % Fill only the rows that were read; the rest of the column stays NaN
    Newfile(1:numel(tempVar), nf) = tempVar;
end
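The padding idea itself does not depend on MATLAB. As a minimal sketch in Python (not the answer's code; the helper name `pad_columns` and the sample lists are made up for illustration), columns of unequal length can be padded with NaN up to the length of the longest one:

```python
import math


def pad_columns(columns):
    """Pad each column with NaN so that all columns share the longest length."""
    height = max((len(col) for col in columns), default=0)
    return [list(col) + [math.nan] * (height - len(col)) for col in columns]


# Hypothetical data standing in for column H of three files
columns = [[1.0, 2.0], [3.0, 4.0, 5.0, 6.0], [7.0]]
padded = pad_columns(columns)
for col in padded:
    print(col)
```

Each padded list corresponds to one column of Newfile: the values that were read come first, and NaN fills the remaining rows.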

Related

Read excel file and assign each column a variable in MATLAB

I have a simple problem reading Excel data that contains strings, long strings, and numbers. I need to turn each column (I have 11) into a separate column-vector variable so that I can plot them against each other, or in combination, in MATLAB.
The problem is reading the file and creating the 11 column vectors: when I assign the variables, the header row comes along too.
Code:
%fid = fopen('Data_Link.xlsx');
[num,txt,raw] = xlsread('Data_Link.xlsx');
%fclose(fid);
% Extract data from readData
A = raw(:,1);
B = raw(:,2);
C = raw(:,6);
So I need the variables without the header.
Data file is truncated and given here.
Can anyone help me?
You can use readtable as ThP suggested. But if you want to use xlsread and you want your data without the header, you just need to drop the first row, as in the example below:
%fid = fopen('Data_Link.xlsx');
[num,txt,raw] = xlsread('Data_Link.xlsx');
%fclose(fid);
% Extract data from readData
A = raw(2:end,1);
B = raw(2:end,2);
C = raw(2:end,6);
Note that each array will receive data from row 2 to last row.
You can use readtable instead of xlsread.
Using
T = readtable('Data_Link.xlsx')
will result in a table with a variable for each column. For example, T.Year would hold the values from the 'Year' column and T.Title would hold the values from the 'Title' column, etc.

Looping in Matlab to batch process Excel files

I know how to read in multiple Excel files, but am struggling to conduct the same analysis on all of those files. The analysis requires I average some values in different columns, then print those average values to a separate Excel sheet. I can do this with one Excel file, but have trouble figuring out how to print each average value in a different row in the output Excel file. Here is the code I have that works for one file (reads it, averages values in column 4, then prints to a separate Excel file):
data = xlsread('test_1.xlsx');
average_values_1 = data(:,4);
a = [average_values_1];
data_cells = num2cell(a);
column_header ={'Average Value 1'};
row_header(1,1) ={'File 1'}
output = [{' '} column_header; row_header data_cells];
xlswrite('Test Averages.xls', output);
How might I do this over and over again while printing values from each file in the output file as its own table? I suspect a nested loop is in my future.
Thanks in advance.
Here is a working example of what you probably want to do with xlswrite [1]:
filename = 'testdata.xlsx'; % Filename to save average values in
for k = 1:10 % Looping for 10 iterations
sheet = 2; % Selecting sheet2
Avg = randi([1 10],1,1); % Generating a random average each time the loop is run
xlRange = char(64+k); % 65 is the ASCII value of A
xlswrite(filename,Avg,sheet,xlRange); % Writing the excel file
end
This code gives the following output [2]:
Fig.1: Values are saved in a single row of the Excel file
If you want the output in a single column, use xlRange = ['A',num2str(k)]; instead. It gives the following output [2]:
Fig.2: Values are saved in a single column of the Excel file
[1]: Read the documentation of xlswrite for more details.
[2]: Output values may vary since random integers are generated.
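Note that char(64+k) only produces a valid column letter for k up to 26; beyond column Z, Excel uses two or more letters (AA, AB, ...). The general conversion can be sketched as follows, here in Python rather than MATLAB, with `column_letters` as a hypothetical helper name:

```python
def column_letters(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Excel columns use a base-26 system without a zero digit,
        # so shift by one before taking the remainder.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


for k in (1, 26, 27, 52, 703):
    print(k, column_letters(k))
```

The same loop translates directly to MATLAB if more than 26 files have to be written side by side.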

Reading and Combining Excel Time Series in Matlab- Maintaining Order

I have the following code to read off time series data (contained in sheets 5 to 19 in an excel workbook). Each worksheet is titled "TS" followed by the number of the time series. The process works fine apart from one thing- when I study the returns I find that all the time series are shifted along by 5. i.e. TS 6 becomes the 11th column in the "returns" data and TS 19 becomes the 5th column, TS 15 becomes the 1st column etc. I need them to be in the same order that they are read- such that TS 1 is in the 1st column, TS 2 in the 2nd etc.
This is a problem because I read off the titles of the worksheets ("AssetList") which maintain their actual order throughout subsequent codes. Therefore when I recombine the titles and the returns I find that they do not match. This complicates further manipulation when, for example column 4 is titled "TS 4" but actually contains the data of TS 18.
Is there something in this code that I have wrong?
XL='TimeSeries.xlsx';
formatIn = 'dd/mm/yyyy';
formatOut = 'mmm-dd-yyyy';
Bounds=3;
[Bounds,~] = xlsread(XL,Bounds);
% Determine the number of worksheets in the xls-file:
FirstSheet=5;
[~,AssetList] = xlsfinfo(XL);
lngth=size(AssetList,2);
AssetList(:,1:FirstSheet-1)=[];
% Loop through the number of sheets and RETRIEVE VALUES
merge_count = 1;
for I=FirstSheet:lngth
[FundValues, ~, FundSheet] = xlsread(XL,I);
% EXTRACT DATES AND DATA AND COMBINE
% (TO REMOVE UNNECCESSARY TEXT IN ROWS 1 TO 4)
Fund_dates_data = FundSheet(4:end,1:2);
FundDates = cellstr(datestr(datevec(Fund_dates_data(:,1),...
formatIn),formatOut));
FundData = cell2mat(Fund_dates_data(:,2));
% CREATE TIME SERIES FOR EACH FUND
Fundts{I}=fints(FundDates,FundData,['Fund',num2str(I)]);
if merge_count == 2
Port = merge(Fundts{I-1},Fundts{I},'DateSetMethod','Intersection');
end
if merge_count > 2
Port = merge(Port,Fundts{I},'DateSetMethod','Intersection');
end
merge_count = merge_count + 1;
end
% ANALYSE PORTFOLIO
Returns=tick2ret(Port);
q = Portfolio;
q = q.estimateAssetMoments(Returns)
[qassetmean, qassetcovar] = q.getAssetMoments
This is probably due to merge. By default, it sorts columns alphabetically. Unfortunately, because your naming pattern is "FundN", Fund10 is sorted before Fund9, for example. So as you loop over I from 5 to 19, you end up with Fund10 through Fund19, followed by Fund5 through Fund9.
One way to solve this is to always use zero padding (Fund01, Fund02, etc.) so that alphabetical order and numerical order are the same. Alternatively, force the columns to stay in the order in which you read and merge the data by setting SortColumns to 0:
Port = merge(Port,Fundts{I},'DateSetMethod','Intersection','SortColumns',0);
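The ordering effect is a property of plain string sorting and can be reproduced outside MATLAB. A small Python sketch (only the FundN names come from the question; everything else is illustrative) shows both the problem and the zero-padding fix:

```python
# Names as generated by the loop in the question: Fund5 ... Fund19
names = [f"Fund{i}" for i in range(5, 20)]
alphabetical = sorted(names)
print(alphabetical)

# Zero-padded names sort alphabetically in the same order as numerically
padded_names = [f"Fund{i:02d}" for i in range(5, 20)]
print(sorted(padded_names) == padded_names)
```

With the unpadded names, the alphabetical order starts at Fund10 and ends at Fund9, which is the kind of column shift described in the question.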

Matlab number of rows in excel file

Is there a MATLAB command to get the number of written rows in an Excel file?
First I fill the first row, and then I want to add more rows to the Excel file.
so this is my excel file:
I tried:
e = actxserver ('Excel.Application');
filename = fullfile(pwd,'example2.xlsx');
ewb = e.Workbooks.Open(filename);
esh = ewb.ActiveSheet;
sheetObj = e.Worksheets.get('Item', 'Sheet1');
num_rows = sheetObj.Range('A1').End('xlDown').Row
But num_rows = 1048576, instead of 1.
please help, thank you!
If the file is empty, or contains data in only one row, then .End('xlDown').Row will move to the very bottom of the sheet (1048576 is the number of rows in an Excel 2007+ sheet).
Test if cell A2 is empty first, and return 0 if it is.
Or use Up from the bottom of the sheet
num_rows = sheetObj.Cells(sheetObj.Rows.Count, 1).End('xlUp').Row
Note: I'm not sure of the Matlab syntax, so this may need some adjusting
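The "search upwards from the bottom" logic can be sketched on a plain list, here in Python. This is not Excel automation code: `last_used_row` is a hypothetical helper, and the list stands in for the cells of column A, with None or an empty string marking an empty cell.

```python
def last_used_row(column):
    """Return the 1-based index of the last non-empty cell, or 0 if all are empty."""
    for row in range(len(column), 0, -1):
        if column[row - 1] not in (None, ""):
            return row
    return 0


print(last_used_row(["header", None, None]))
print(last_used_row(["header", "", "value", None]))
print(last_used_row([]))
```

Unlike End('xlUp'), which lands on row 1 when the whole column is empty, this sketch returns 0 in that case, so an empty sheet and a sheet with one filled row can be told apart.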
You can use MATLAB's xlsread function to read in the spreadsheet. It returns the following outputs:
[numbers strings misc] = xlsread('myfile.xlsx');
If you check the size of strings or misc, you get the number of rows and columns:
[rows columns] = size(strings);
Testing this, I got rows = 1, columns = 10 (assuming nothing else was beyond 'A' in the spreadsheet).

Using VBA, how can I select every other cell in a row range (to be copied and pasted vertically)?

I have a 2200+ page text file. It is delivered to us from a customer through a data exchange, with asterisks to separate values and tildes (~) to denote the end of a row. The file is sent to me as a text file in Word. Most rows are split in two (1 row covers a full line and part of a second line). I transfer segments (10-page chunks) of it at a time into Excel where, unfortunately, any zeroes that occur at the end of a row get discarded in the "text to columns" procedure. So, I eyeball every "long" row to ensure that zeroes were not lost and manually re-enter any that were.
Here is a small bit of sample data:
SDQ EA 92 1551 378 1601 151 1603 157 1604 83
The "SDQ, EA, and 92" are irrelevant (artifacts of data transmission). I want to use Excel and/or VBA to select 1551, 1601, 1603, and 1604 (these are store numbers) so that I can copy those values, and transpose paste them vertically. I will then go back and copy 378, 151, 157, and 83 (sales values) so that I can transpose paste them next to the store numbers. The next two rows of data contain the same store numbers but give the corresponding dollar values. I will only need to copy the dollar values so they can be transpose pasted vertically next to unit values (e.g. 378, 151, 157, and 83).
Just being able to put my cursor on the first cell of interest in the row and run a macro to copy every other cell would speed up my work tremendously. I have tried using ActiveCell and Offset references to select a range to copy, but have not been successful. Does anyone have any suggestions for me? Thanks in advance for the help.
It's hard to give a complete answer without more information about the file.
If your input data is 2200+ pages long, opening it with Excel's default import functions is unlikely to be the way to go, especially since Excel has a maximum number of rows and columns. If the file is a text file (.txt), I would suggest opening it with VBA, reading it one line at a time, and processing the data.
Here's an example to get you started. Keep in mind that it transposes each row of text into a column of data, so you will fill all of Excel's columns long before you run through 2200 pages of text. But it's just an example.
Sub getData()
    dFile = FreeFile
    sFile = "c:\code\test.txt"
    Open sFile For Input As #dFile
    c = 1
    'keep doing this until end of file
    Do While Not EOF(dFile)
        'read a whole line into dataLine
        '(Line Input does not stop at commas the way Input does)
        Line Input #dFile, dataLine
        'break up line into words based on spaces
        j = Split(dataLine, " ")
        jLength = UBound(j)
        If jLength > 2 Then
            r = 1
            'ignore first 3 words
            'and get every other word
            'transpose rows of text into columns
            For word = 3 To jLength Step 2
                Cells(r, c) = j(word)
                r = r + 1
            Next word
        End If
        c = c + 1
    Loop
    'close the same file number that was opened above
    Close #dFile
End Sub
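The "skip three fields, then take every other one" step can also be checked in isolation. Below is a minimal Python sketch using the sample row from the question; `split_pairs` is a hypothetical helper, and it assumes space-separated fields as in that sample. Values are kept as strings so that trailing zeroes are not altered.

```python
def split_pairs(line, skip=3):
    """Drop the first `skip` fields, then pair store numbers with their values."""
    fields = line.split()[skip:]
    stores = fields[0::2]   # 1st, 3rd, 5th, ... remaining field
    values = fields[1::2]   # 2nd, 4th, 6th, ... remaining field
    return list(zip(stores, values))


sample = "SDQ EA 92 1551 378 1601 151 1603 157 1604 83"
pairs = split_pairs(sample)
for store, value in pairs:
    print(store, value)
```

Each printed line holds one store number next to its sales value, which is the vertical layout the question asks for.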
