Merging data from several columns - excel

I have an Excel spreadsheet with three (sometimes more) columns of chlorophyll concentrations, e.g. 1) WholeFluro, 2) WholeHPLC, 3) Net1Fluro, but the three columns are not always all populated. For instance, I often have WholeFluro and WholeHPLC, or just Net1Fluro. To get more data points, I like to merge all the data into one data column. In addition, I have the corresponding locations in the ocean where I collected the data, as well as the depths in the water column from which each sample was drawn.
I have a MATLAB script that I use for this, but it is very long and it takes me a long time to adapt it to the different types of data I want to merge. I was wondering if I could get some help to make it more efficient, as it would also help me learn to write better scripts.
My script is as follows:
% go to the data folder
cd C:\Users\Documents\CHLOROPHYLL_DATA_PROCESSED\
filename = 'Knorr-Working_SizeFracNew.xlsx';
[num,txt,raw] = xlsread(filename);
% Pick out the relevant data that you need from the large spreadsheet
% Use 'raw' only to get header info, because I can see the header info there
% that I cannot see in 'num'
% Use ‘num’ to get actual data
FracsData=num(:,[1:2:7 8:2:17 19:23 24 29 34]);
FracsHeaderInfo=raw(:,[1:2:7 8:2:17 19:23 24 29 34]);
%%
% Choose the various chlorophyll data from the large spreadsheet that I want
% to merge. Here I want to merge the two columns which have Whole, two columns
% with Net1, two columns with Net2 and two columns with Net3
WholeFluro=FracsData(:,6);
WholeHPLC=FracsData(:,10);
Net1Fluro=FracsData(:,7);
Net1HPLC=FracsData(:,15);
Net2Fluro=FracsData(:,8);
Net2HPLC=FracsData(:,16);
Net3Fluro=FracsData(:,9);
Net3HPLC=FracsData(:,17);
%%
% Replace NaNs with -999 or I get an error when NaNs from various columns
% clash
WholeFluro(isnan(WholeFluro))=-999;
WholeHPLC(isnan(WholeHPLC))=-999;
Net1Fluro(isnan(Net1Fluro))=-999;
Net1HPLC(isnan(Net1HPLC))=-999;
Net2Fluro(isnan(Net2Fluro))=-999;
Net2HPLC(isnan(Net2HPLC))=-999;
Net3Fluro(isnan(Net3Fluro))=-999;
Net3HPLC(isnan(Net3HPLC))=-999;
% Here I create 4 variables that will hold my merged data for Whole, N1, N2
% and N3, and that are equal to the original data
MergedDataWhole = WholeFluro;
MergedDataN1 = Net1Fluro;
MergedDataN2 = Net2Fluro;
MergedDataN3 = Net3Fluro;
% Find the empty places and fill them up with the HPLC data at points x.
% I give HPLC data precedence
x=find(MergedDataWhole==-999);
MergedDataWhole(x)=WholeHPLC(x);
x=find(MergedDataN1==-999);
MergedDataN1(x)=Net1HPLC(x);
x=find(MergedDataN2==-999);
MergedDataN2(x)=Net2HPLC(x);
x=find(MergedDataN3==-999);
MergedDataN3(x)=Net3HPLC(x);
MergedDataALL=horzcat(MergedDataWhole,MergedDataN1,MergedDataN2,MergedDataN3);
% Anything still negative (i.e. -999 in both sources) is set to 0
MergedDataALL(MergedDataALL<0)=0;
outputfilename = 'Merged.xlsx';
[suc,msg] = xlswrite(outputfilename,MergedDataALL);
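One possible way to shorten the script, offered as a minimal sketch rather than a tested replacement: keep the Fluro and HPLC columns side by side in two matrices and fill the gaps with logical indexing, so no -999 placeholder is needed. The sample values below are made up and only stand in for the spreadsheet columns. Note one difference from the script: only entries missing in both sources are set to 0 at the end, not every negative value.

```matlab
% Made-up sample data standing in for the spreadsheet (columns: Whole, Net1)
fluro = [1.2 NaN; NaN 0.4; 0.7 NaN; NaN NaN];
hplc  = [1.0 0.3; 2.5 NaN; NaN 0.9; NaN NaN];

merged = fluro;                 % start from the Fluro values
gaps = isnan(merged);           % logical mask of the missing entries
merged(gaps) = hplc(gaps);      % fill every gap from the matching HPLC column
merged(isnan(merged)) = 0;      % entries missing in both sources become 0

disp(merged)
```

With the real spreadsheet, the two matrices could presumably be taken straight from FracsData using the column numbers in the script, e.g. fluro = FracsData(:,6:9) and hplc = FracsData(:,[10 15 16 17]), which handles Whole, Net1, Net2 and Net3 in one pass.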

Related

Looping in Matlab to batch process Excel files

I know how to read in multiple Excel files, but am struggling to conduct the same analysis on all of those files. The analysis requires I average some values in different columns, then print those average values to a separate Excel sheet. I can do this with one Excel file, but have trouble figuring out how to print each average value in a different row in the output Excel file. Here is the code I have that works for one file (reads it, averages values in column 4, then prints to a separate Excel file):
data = xlsread('test_1.xlsx');
average_values_1 = mean(data(:,4)); % average of the values in column 4
data_cells = num2cell(average_values_1);
column_header = {'Average Value 1'};
row_header = {'File 1'};
output = [{' '} column_header; row_header data_cells];
xlswrite('Test Averages.xls', output);
How might I do this over and over again while printing values from each file in the output file as its own table? I suspect a nested loop is in my future.
Thanks in advance.
Here is a working example of what you possibly want to do with xlswrite [1]:
filename = 'testdata.xlsx'; % Filename to save average values in
for k = 1:10 % Looping for 10 iterations
    sheet = 2; % Selecting sheet2
    Avg = randi([1 10],1,1); % Generating a random average each time the loop is run
    xlRange = char(64+k); % 65 is the ASCII value of A
    xlswrite(filename,Avg,sheet,xlRange); % Writing the excel file
end
This code gives the following output [2]:
Fig. 1: Values are saved in a single row of the Excel file
If you want the output in a single column, use xlRange = ['A',num2str(k)]; instead. It will give you the following output [2]:
Fig. 2: Values are saved in a single column of the Excel file
[‍1]: Read the documentation of xlswrite for more details.
[‍2]: Output values may vary since random integers are generated.
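For the original question (one average per file, each on its own row), a sketch along the following lines may be closer to what is wanted. It collects the averages in the loop and builds the output table once. The number of files and the file-name pattern test_%d.xlsx are assumptions, and the xlsread call is replaced by stand-in data so the snippet runs without any files.

```matlab
nFiles = 3;                          % assumed number of input files
averages = zeros(nFiles, 1);         % pre-allocate one average per file
labels = cell(nFiles, 1);            % row headers: 'File 1', 'File 2', ...
for k = 1:nFiles
    % Stand-in for: data = xlsread(sprintf('test_%d.xlsx', k));
    data = magic(4) * k;
    averages(k) = mean(data(:,4));   % average of column 4
    labels{k} = sprintf('File %d', k);
end
% Header row on top, then one row per file
output = [{' '}, {'Average Value 1'}; labels, num2cell(averages)];
disp(output)
% xlswrite('Test Averages.xls', output);   % single write after the loop
```

Writing once after the loop avoids opening the workbook on every iteration.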

matlab multidimensional array to excel

I have a MATLAB script whose output is a three-dimensional array LCOE of size 16-by-12-by-34. This output needs to be written to Excel, so I use xlswrite.
I have tried this:
T = LCOE(:,:,1);
xlswrite('filename', T, 'sheetname', 'B2');
This does what it's supposed to, but only writes one table to Excel, and I would like to write all 34 tables to Excel underneath each other, separated by 2 blank rows.
Then, I tried this:
for y = 1:34
    T = LCOE(:,:,y);
    xlswrite('filename', T, 'sheetname', strcat('B', num2str(2+(y-1)*18)));
end
This works, but is very slow, since matlab writes each table separately to excel. Is there a faster way to do this?
Instead of calling xlswrite again and again, dump all the values of the 3D matrix into a 2D matrix and add rows of NaNs, so that when you write it to the Excel file you get 2 blank rows between tables.
The following code improves the execution time by a factor of more than 10.
LCOE = 100*rand(16,12,34); % Taking random values for LCOE
T = NaN(18*34-2 ,12); % 1. Pre-allocation 2. 16+2 = 18
% Following loop dumps all the values of 3D matrix into a 2-D followed by 2 rows of NaN
% to leave 2 blank rows in excel file.
for k = 1:34
    T(18*(k-1)+[1:16], :) = LCOE(:,:,k);
end
xlswrite('filename', T, 'sheetname', 'B2'); % Writing the Excel file
On my system, my code takes about 1 second to execute while your code takes about 10.5 seconds. So that's a significant difference.
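The same stacked matrix can also be built without a loop. This is only a sketch of an alternative, not part of the answer above: it pads every slice with two NaN rows, then uses permute and reshape to stack the slices vertically. The variable names P, r, c and n are introduced here for illustration.

```matlab
LCOE = 100*rand(16,12,34);                      % random stand-in values, as above
[r, c, n] = size(LCOE);
P = cat(1, LCOE, NaN(2, c, n));                 % append 2 NaN rows below every slice
T = reshape(permute(P, [1 3 2]), (r+2)*n, c);   % stack the padded slices vertically
T(end-1:end, :) = [];                           % drop the 2 trailing blank rows
% xlswrite('filename', T, 'sheetname', 'B2');   % single write, as in the answer
```

The write itself dominates the run time either way, so this mainly saves a few lines rather than seconds.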

xlswrite in case of vectors

I have a .mat file which contains titles={'time','data'} and 2 column vectors:
time=[1;2;3;4;5] and data=[10;20;30;40;50].
I created a new cell array called table={'time','data';time data} and I used:
xlswrite(filename,table);
However, when I open the xlsx file it shows only the titles and not the numbers.
I saw that xlswrite writes an empty cell when I try to export more than one number in a cell.
Is there anything I can do to export the whole vector instead of writing each value into its own cell?
The final result that I am trying to get looks like this:
time data
1 10
2 20
3 30
4 40
5 50
You have a couple options. Usually what I do is break it into two xlswrite calls, one for the header and one for the data.
titles = {'time','data'};
time = [1;2;3;4;5];
data = [10;20;30;40;50];
xlswrite('myfile.xlsx', titles, 'Sheet1', 'A1');
xlswrite('myfile.xlsx', [time, data], 'Sheet1', 'A2');
Alternatively, if you have R2013b or newer you can also use the table builtin, which has its own method for writing out data. With the same sample data:
mytable = table(time, data, 'VariableNames', titles);
writetable(mytable, 'myfile.xlsx');

How can I write looping data matrix into same excel file?

I build a data matrix inside a loop in MATLAB, and I want to write all of them into the same Excel file. If I use xlswrite('name.xls', M), it creates a 'name.xls' file containing only one matrix (the value from the last iteration). How can I write all of the loop's matrices (let's say there are 10 matrices with 13 columns) into one Excel file, so that the file contains all of them (10 rows with 13 columns)? Please help, thanks. -Machmum
In each loop iteration add your newest vector to a single matrix. Then only after the loop, write this matrix to a .xls file:
M = zeros(10,13); %// Pre-allocation like this is essential for speed
for k = 1:10
    ... %Your code
    M(k,:) = ... %//Put your new 1-by-13 vector that you create each iteration here
end
xlswrite(file_name, M)
Although it would be better to create the matrix first and then write it in one go as Dan has suggested, it is possible to specify where xlswrite starts writing, which allows you to append data to existing files. If only specifying a start location, you must also give xlswrite a sheet name. This will be slower than pre-calculating the matrix and then just calling xlswrite once, though.
Simple example:
M = 1:10;
for n = 1:10
    t = sprintf('A%d',n); % starting cell A1 through A10
    xlswrite('testdata.xls',M*n,'Sheet1',t); % writes one row
end

Reading and Combining Excel Time Series in Matlab- Maintaining Order

I have the following code to read off time series data (contained in sheets 5 to 19 in an excel workbook). Each worksheet is titled "TS" followed by the number of the time series. The process works fine apart from one thing- when I study the returns I find that all the time series are shifted along by 5. i.e. TS 6 becomes the 11th column in the "returns" data and TS 19 becomes the 5th column, TS 15 becomes the 1st column etc. I need them to be in the same order that they are read- such that TS 1 is in the 1st column, TS 2 in the 2nd etc.
This is a problem because I read off the titles of the worksheets ("AssetList") which maintain their actual order throughout subsequent codes. Therefore when I recombine the titles and the returns I find that they do not match. This complicates further manipulation when, for example column 4 is titled "TS 4" but actually contains the data of TS 18.
Is there something in this code that I have wrong?
XL='TimeSeries.xlsx';
formatIn = 'dd/mm/yyyy';
formatOut = 'mmm-dd-yyyy';
Bounds=3;
[Bounds,~] = xlsread(XL,Bounds);
% Determine the number of worksheets in the xls-file:
FirstSheet=5;
[~,AssetList] = xlsfinfo(XL);
lngth=size(AssetList,2);
AssetList(:,1:FirstSheet-1)=[];
% Loop through the number of sheets and RETRIEVE VALUES
merge_count = 1;
for I=FirstSheet:lngth
    [FundValues, ~, FundSheet] = xlsread(XL,I);
    % EXTRACT DATES AND DATA AND COMBINE
    % (TO REMOVE UNNECESSARY TEXT IN ROWS 1 TO 4)
    Fund_dates_data = FundSheet(4:end,1:2);
    FundDates = cellstr(datestr(datevec(Fund_dates_data(:,1),...
        formatIn),formatOut));
    FundData = cell2mat(Fund_dates_data(:,2));
    % CREATE TIME SERIES FOR EACH FUND
    Fundts{I}=fints(FundDates,FundData,['Fund',num2str(I)]);
    if merge_count == 2
        Port = merge(Fundts{I-1},Fundts{I},'DateSetMethod','Intersection');
    end
    if merge_count > 2
        Port = merge(Port,Fundts{I},'DateSetMethod','Intersection');
    end
    merge_count = merge_count + 1;
end
% ANALYSE PORTFOLIO
Returns=tick2ret(Port);
q = Portfolio;
q = q.estimateAssetMoments(Returns)
[qassetmean, qassetcovar] = q.getAssetMoments
This is probably due to merge. By default, it sorts columns alphabetically. Unfortunately, as your naming pattern is "FundN", this means that, for example, Fund10 will normally be sorted before Fund9. So as you're looping over I from 5 to 19, you will have Fund10 through Fund19, followed by Fund5 through Fund9.
One way of solving this would be to always use zero padding (Fund01, Fund02, etc.) so that alphabetical order and numerical order are the same. Alternatively, force it to stay in the order you read/merge the data by setting SortColumns to 0:
Port = merge(Port,Fundts{I},'DateSetMethod','Intersection','SortColumns',0);
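The zero-padding option can be illustrated with plain strings. This is a small sketch, independent of fints and merge, that compares how the unpadded and padded names sort. The variable names ids, plain and padded are made up for the example.

```matlab
ids = 5:19;                                          % sheet indices used in the loop
plain  = arrayfun(@(n) sprintf('Fund%d', n),   ids, 'UniformOutput', false);
padded = arrayfun(@(n) sprintf('Fund%02d', n), ids, 'UniformOutput', false);

sortedPlain  = sort(plain);    % alphabetical: Fund10 ... Fund19, then Fund5 ... Fund9
sortedPadded = sort(padded);   % alphabetical order now matches numerical order

disp(sortedPlain(1:3))
disp(sortedPadded(1:3))
```

In the question's loop this would mean building the series name with sprintf('Fund%02d', I) instead of ['Fund',num2str(I)].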
