Table Segmentation, Data Reduction, Sorting and Writing in MATLAB/Excel - excel

I would like to do a data reduction operation on a spreadsheet. Preferably I would like to use MATLAB/(or excel) since I need separate output files for each case.
The link to the spreadsheet is below
Spreadsheet
A screenshot of the spreadsheet is shown below
The output I require in text files looks something like the following
The first sheet in the .xls file is the main input, whereas the following sheets (d**) are my required output. I also need these sheets in separate ASCII files (.dat) to plot them later on. Here is how the algorithm works
Look up the number/string in Column B (FileName)
Extract all data in Columns C and D (Saturation and ETC) with the same FileName value (Column B)
Look up the matching FileName (Column B) value in Column E (ImageIndex)
Copy the value of ImageName (Column F) that corresponds to that value of ImageIndex (Column E)
The result would be three columns (ImageName, Saturation, ETC). ImageName would be the same for each subcase
Sort the columns based on Saturation
Write each sub case as a separate .dat file
I tried using a few recipes using categorical arrays (findgroups and splitapply) in MATLAB. Didn't seem to work out for me. I would be later working on a larger data set so automation is necessary. I think this could be done using macros on excel, but I would prefer using MATLAB since I would use MATLAB to plot the data. Any other alternative suggestions are welcome
Thanks,

Here's a Matlab solution. You could do it with a rather convoluted accumarray call, but readability would be rather bad, so I'm opting for a loop here.
out is a structure which you can use to either write files, or to plot the data.
tbl = readtable('yourFile.xls');
%# get the group indices for the files
%# this assumes that you have cleaned up the dash after the 1
%# so that all of the entries in the FileName column are numeric
idx = tbl.FileName;
%# the uIdx business is to account for the possibility
%# that there are images missing from the sequence
uIdx = unique(idx);
nImages = length(uIdx);
%# preassign output structure
out(1:nImages) = struct('name','','saturation',0,'etc',0);
%# loop to extract relevant information
for iImage = uIdx(:)'
    myIdx = idx==iImage;
    data = tbl(myIdx,{'Saturation','ETC'});
    data = sortrows(data,'Saturation');
    %# ImageIdx must match the header of the image index column in your file
    name = tbl.ImageName{tbl.ImageIdx==iImage};
    out(iImage==uIdx).name = name;
    out(iImage==uIdx).saturation = data.Saturation;
    out(iImage==uIdx).etc = data.ETC;
end
%# plotting
for iImage = 1:nImages
    figure('name',out(iImage).name)
    plot(out(iImage).saturation, out(iImage).etc,'.');
end
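The question also asks for one .dat file per sub-case, which the loop above does not cover. Below is a minimal sketch of that step, assuming the column names given in the question. The sample values, the image names, the temp-folder output location, and the use of findgroups and writetable are illustrative choices, not part of the original answer.

```matlab
% Made-up stand-in for the first sheet; in practice these columns come from readtable.
FileName   = [1; 1; 2; 2; 2];
Saturation = [0.8; 0.2; 0.5; 0.9; 0.1];
ETC        = [1.4; 0.6; 1.0; 1.7; 0.3];
ImageIndex = [1; 2];            % lookup table: index ...
ImageName  = {'d01'; 'd02'};    % ... and matching image name (hypothetical names)

outDir = tempdir;               % assumption: write the files to the temp folder
[grp, ids] = findgroups(FileName);
datFiles = cell(numel(ids), 1);
for g = 1:numel(ids)
    rows = (grp == g);                               % rows of this sub-case
    name = ImageName{ImageIndex == ids(g)};          % image name for this FileName
    sub = table(repmat({name}, nnz(rows), 1), Saturation(rows), ETC(rows), ...
        'VariableNames', {'ImageName', 'Saturation', 'ETC'});
    sub = sortrows(sub, 'Saturation');               % sort by Saturation
    datFiles{g} = fullfile(outDir, [name '.dat']);
    writetable(sub, datFiles{g}, 'Delimiter', '\t'); % tab-delimited ASCII file
end
```

Each file then holds the three columns (ImageName, Saturation, ETC) for one sub-case and can be read back with readtable for plotting.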

Related

For loop through different excel files AND different worksheets (Matlab)

I am trying to loop not only through different Excel sheets (~125), but also through different Excel files (~12). I managed to write code for the sheets, but now I am struggling to scale that up to different Excel files. The Excel files all have the same structure and number/name of sheets.
Can anyone help me? Thanks a lot in advance!!
foldername = 'Raw_data';
cd(foldername);
fnames = dir('*raw.xlsx');
%% extraction of sheet name
[~, sheet_name] = xlsfinfo('Test_raw.xlsx');
%% additional array for merging later
cali=[1; 2; 5; 10; 20; 50; 100; 200; 500; 1000];
%for i=1:length(fnames) %I guess ?
for k=1:numel(sheet_name) %operation for all sheets
%extract data of one excel file, but different sheets
[~,~,raw{k}]=xlsread('Test_raw.xlsx',sheet_name{k},'A5:A14');
x=vertcat(raw{:});
end
B = reshape(x,10,k);
numind = cellfun(@isnumeric, B); %identifies numeric values
B(~numind) = {NaN} %NOT num. values to NaN
b =cell2mat(B);
final_data = [cali b];
%end
You want to loop through all your Excel files, and you have already gathered all the file names in fnames.
Your for-loop is basically set up; the only thing missing is replacing 'Test_raw.xlsx' in xlsread with fnames(i).name.
for i=1:length(fnames)
    for k=1:numel(sheet_name) %operation for all sheets
        %extract data of the i-th excel file, sheet by sheet
        [~,~,raw{k}]=xlsread(fnames(i).name,sheet_name{k},'A5:A14');
        x=vertcat(raw{:});
    end
end
Note that you also have to adapt your final_data variable.
To keep the data from all the files, you could make this variable a cell array containing one element per file. It is good practice to allocate this array before entering the loop
final_data = cell(length(fnames),1);
%% here go the loops
clear B
B = reshape(x,10,k);
numind = cellfun(@isnumeric, B); %identifies numeric values
B(~numind) = {NaN} %NOT num. values to NaN
b =cell2mat(B);
final_data{i} = [cali b];
B, numind and b will be temporary working variables that are being overwritten each loop. Because of this, clearing them before their next use can be good practice.
After the loop, you can access your data with e.g. final_data{5} to access the fifth file.
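Putting the pieces together, the whole procedure might look like the sketch below. It is one possible arrangement, not the answerer's exact code: it builds paths with fullfile instead of calling cd, queries the sheet names of each workbook separately, and fills a NaN matrix directly rather than reshaping a cell array. The folder name 'Raw_data', the file pattern, and the range A5:A14 are taken from the question.

```matlab
% Sketch: read A5:A14 from every sheet of every '*raw.xlsx' workbook.
fnames = dir(fullfile('Raw_data', '*raw.xlsx'));
cali = [1; 2; 5; 10; 20; 50; 100; 200; 500; 1000];

final_data = cell(numel(fnames), 1);           % one entry per workbook
for i = 1:numel(fnames)
    fpath = fullfile('Raw_data', fnames(i).name);
    [~, sheet_name] = xlsfinfo(fpath);         % sheet names of this workbook
    b = NaN(numel(cali), numel(sheet_name));   % non-numeric cells stay NaN
    for k = 1:numel(sheet_name)
        [~, ~, raw] = xlsread(fpath, sheet_name{k}, 'A5:A14');
        isNum = cellfun(@(c) isnumeric(c) && isscalar(c), raw);
        if any(isNum)
            b(isNum, k) = cell2mat(raw(isNum));
        end
    end
    final_data{i} = [cali b];                  % calibration column plus one column per sheet
end
```

If the folder contains no matching files, the loop is simply skipped and final_data stays empty.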

Rename a variable in the workspace through a list of names in a cell - MATLAB

I have some data in an excel file.
At first, I read the file and create a list of names stored in a cell through this command:
[status,sheets] = xlsfinfo(filename);
and I get:
sheets = {'A1','A2','B1','B2','C1'};
(these are the names of excelsheets in the excel file)
and through some process I obtain a matrix for each of these names (excelsheets). The final matrix for each is called:
completeData = [x,v,z,y,s];
Now, I want to:
change the name of "completeData" variable to each of its corresponding excelsheet (from the "sheets" cell).
then save this newly renamed variable (the old "completeData") with the name of its corresponding excelsheet (again from the "sheets" cell).
So far, I have only managed to save each completeData matrix resulting for each excel sheet separately with the name of the sheets [which is point number 2] through this command:
save(sprintf('%s',sheets{excelSheet}),'completeData');
(here I have a loop over "excelsheet")
The problem is that when I have many Excel sheets and save all of them in a folder on my hard disk, whenever I load any of these saved files I get "completeData" in the workspace, which is not what I want. I also want to get the name of the Excel sheet.
How can I do this?
P.S. through this command:
eval(sprintf([sheets{excelsheet} '=completeData;']));
(again another loop over excelsheet)
I have managed to create several matrices with the names of excel sheets. But I do not know how I can save these very good newly created variables through a loop so that I do not do it one by one.
Following up the comments above, I tried to write you a simplified example:
%% Initialise
names = {'name1', 'name2', 'name3'};
data = randn(10, 3);
%% this creates three fields called name1, name2 and name3 in s, one per column of data
for ind=1:size(data, 2)
    s.(names{ind}) = data(:, ind);
end
Hope it helps!
So this is how it worked:
First read information:
[status,sheets] = xlsfinfo(filename);
NamesList = sheets(:); % xlsfinfo returns a 1-by-n cell array, so take all of its elements as a column
Now, apply the primarily collected information to read again with details:
for ind = 1:length(NamesList)
    % Only read those particular sheets
    [num,txt,raw] = xlsread(filename,NamesList{ind,:});
    var.(NamesList{ind})={txt(:,MyColumn),num};
    clear num txt raw
end
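To get back a variable named after the sheet when loading, one option is to combine the dynamic field names above with the '-struct' form of save, which stores a struct field as a variable of the same name. The sketch below uses made-up sheet names and random data, and writes to the temp folder as an assumption. Sheet names must be valid MATLAB identifiers for this to work (matlab.lang.makeValidName can help if they are not).

```matlab
sheets = {'A1', 'A2', 'B1'};        % hypothetical sheet names
outDir = tempdir;                   % assumption: save the .mat files here

s = struct();
for k = 1:numel(sheets)
    completeData = rand(4, 5);      % stand-in for the matrix computed per sheet
    s.(sheets{k}) = completeData;   % store it under the sheet's name
end

% Save each field to its own file; the variable inside the file is named after the sheet.
for k = 1:numel(sheets)
    save(fullfile(outDir, [sheets{k} '.mat']), '-struct', 's', sheets{k});
end

loaded = load(fullfile(outDir, 'A2.mat'));   % contains a variable called A2
```

Calling load without an output argument on A2.mat would then create a workspace variable A2 instead of completeData, and no eval is needed.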

How to loop through excel sheets, perform calculations, and compile results

I have roughly 70,000 sheets that all have to have calculations done, and then all results compiled into a new sheet (which would be 70,000 lines long).
It needs to be sorted by date.
I'm very poor at MATLAB, but I have what I need the script to do for each Excel sheet; I'm just unsure how to make it do that for all of them.
Thank you!!! (I took out some of the not important code)
%Reading in excel sheet
B = xlsread('24259893-008020361800.TorqueData.20160104.034602AM.csv');
%Creating new matrix
[inYdim, inXdim] = size(B);
Ydim = inYdim;
[num,str,raw]=xlsread('24259893-008020361800.TorqueData.20160104.034602AM.csv',strcat('A1:C',num2str(Ydim)));
%Extracting column C
C=raw(:,3);
for k = 1:numel(C)
if isnan(C{k})
C{k} = '';
end
end
%Calculations
TargetT=2000;
AvgT=mean(t12);
TAcc=((AvgT-TargetT)/TargetT)*100 ;
StdDev=std(B(ind1:ind2,2));
ResTime=t4-t3;
FallTime=t6-t5;
DragT=mean(t78);
BreakInT=mean(t910);
BreakInTime=(t10-t9)/1000;
BreakInE=BreakInT*BreakInTime*200*.1047;
%Combining results
Results=[AvgT TAcc StdDev ResTime FallTime DragT BreakInT BreakInTime BreakInE]
I think I need to do something along the lines of:
filenames=dir('*.csv')
and I found this that may be useful:
filenames=dir('*.csv');
for file=filenames'
csv=load(file.name);
with stuff in here
end
You have the right idea, but you need to index your file names in order to be able to step through them in the for loop.
FileDir = 'Your Directory';
FileNames = {'Test1';'Test2';'Test3'};
for k=1:length(FileNames)
    file = fullfile(FileDir, FileNames{k});
    outputdata = xlsread(file); % add the sheet and data range as 2nd and 3rd arguments if needed
    % THE REST OF YOUR LOOP, indexed by k
end
How you choose to get the file names and directory is up to you.
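The question also asks for the compiled results to be sorted by date. A minimal sketch of one way to do that, assuming every file name ends in the date and time pattern seen in the question (.yyyymmdd.HHMMSSAM/PM.csv). The two file names are hypothetical, and the per-file calculations are left as commented placeholders.

```matlab
% Hypothetical file names following the pattern in the question;
% in practice use: files = dir('*.csv'); names = {files.name};
names = {'24259893-008020361800.TorqueData.20160105.011500PM.csv', ...
         '24259893-008020361800.TorqueData.20160104.034602AM.csv'};

nFiles  = numel(names);
stamps  = NaT(nFiles, 1);        % timestamp parsed from each file name
Results = NaN(nFiles, 9);        % one row of nine results per file
for k = 1:nFiles
    tok = regexp(names{k}, '\.(\d{8})\.(\d{6})(AM|PM)\.csv$', 'tokens', 'once');
    d = tok{1};                  % 'yyyymmdd'
    t = tok{2};                  % 'HHMMSS' on a 12-hour clock
    h = mod(str2double(t(1:2)), 12) + 12 * strcmp(tok{3}, 'PM');
    stamps(k) = datetime(str2double(d(1:4)), str2double(d(5:6)), str2double(d(7:8)), ...
        h, str2double(t(3:4)), str2double(t(5:6)));
    % B = xlsread(names{k});     % read the file, then run the per-file calculations
    % Results(k, :) = [AvgT TAcc StdDev ResTime FallTime DragT BreakInT BreakInTime BreakInE];
end

[stamps, order] = sort(stamps);  % oldest first
Results = Results(order, :);
names   = names(order);
```

Preallocating Results keeps the loop fast for a large number of files, and sorting once at the end avoids relying on the order in which dir lists the files.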

Working with Excel sheets in MATLAB

I need to import some Excel files in MATLAB and work on them. My problem is that each Excel file has 15 sheets and I don't know how to "number" each sheet so that I can make a loop or something similar (because I need to find the average on a certain column on each sheet).
I have already tried importing the data and building a loop but MATLAB registers the sheets as chars.
Use xlsfinfo to get the sheet names, then use xlsread in a loop.
[status,sheets,xlFormat] = xlsfinfo(filename);
for sheetindex=1:numel(sheets)
    [num,txt,raw]=xlsread(filename,sheets{sheetindex});
    data{sheetindex}=num; %keep for example the numeric data to process it later outside the loop.
end
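Once the numeric data of each sheet is stored in the cell array, the average of a given column on every sheet can be computed in one call. In this small sketch the random matrices stand in for the sheets read by xlsread, and the column number is a hypothetical choice.

```matlab
% Stand-in for the cell array filled in the loop: one numeric matrix per sheet.
data = {rand(5, 6), rand(8, 6), rand(3, 6)};
col  = 5;                                   % hypothetical column of interest

% Mean of that column on each sheet, ignoring NaN entries (e.g. empty cells).
sheetMeans = cellfun(@(m) mean(m(:, col), 'omitnan'), data);
```

sheetMeans then has one value per sheet, in the same order as the sheet names returned by xlsfinfo.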
I've just remembered that I posted this question almost 2 years ago, and since I figured it out, I thought that posting the answer could prove useful to someone in the future.
So to recap: I needed to import a single column from 4 Excel files, with each file containing 15 worksheets. The columns were of variable lengths. I figured out two ways to do this. The first one is by using the xlsread function with the following syntax.
for count_p = 1:2
    a = sprintf('control_group_%d.xls',count_p);
    [status,sheets,xlFormat] = xlsfinfo(a);
    for sheetindex=1:numel(sheets)
        [num,txt,raw]=xlsread(a,sheets{sheetindex},'','basic');
        data{sheetindex}=num;
        FifthCol{count_p,sheetindex} = (data{sheetindex}(:,5));
    end
end
for count_p = 3:4
    a = sprintf('exercise_group_%d.xls',(count_p-2));
    [status,sheets,xlFormat] = xlsfinfo(a);
    for sheetindex=1:numel(sheets)
        [num,txt,raw]=xlsread(a,sheets{sheetindex},'','basic');
        data{sheetindex}=num;
        FifthCol{count_p,sheetindex} = (data{sheetindex}(:,5));
    end
end
The files were obviously named control_group_1, control_group_2 etc. I used the 'basic' input in xlsread because I only needed the raw data from the files, and it proved to be much faster than using the full functionality of the function.
The second way to import the data, and the one that I ended up using, is starting your own ActiveX server and running a single Excel application on it. xlsread "opens" and "closes" an ActiveX server each time it's called, so it's rather time consuming (using the 'basic' input does not do this, though). The code I used is the following.
Folder=cd(pwd); %getting the working directory
d = dir('*.xls'); %finding the xls files
N_File=numel(d); % Number of files
hexcel = actxserver ('Excel.Application'); %starting the ActiveX server and running an Excel application on it
hexcel.DisplayAlerts = true;
for index = 1:N_File %looping through the workbooks (xls files)
    Wrkbk = hexcel.Workbooks.Open(fullfile(pwd, d(index).name)); %VBA functions & commands
    WorkName = Wrkbk.Name; %getting the workbook name
    display(WorkName)
    Sheets=Wrkbk.Sheets; %sheets handle
    ShCo(index)=Wrkbk.Sheets.Count; %counting them for use in the next loop
    for j = 1:ShCo(index) %looping through each sheet
        itemm = hexcel.Sheets.Item(sprintf('sheet%d',j)); %VBA commands
        itemm.Activate;
        robj = itemm.Columns.End(4); %getting the column I needed
        numrows = robj.row; %counting to the end of the column
        dat_range = ['E1:E' num2str(numrows)]; %data range
        rngObj = hexcel.Range(dat_range);
        xldat{index, j} = cell2mat(rngObj.Value); %getting the data in a cell
    end
end
%invoke(hexcel);
Quit(hexcel);
delete(hexcel);

How to import lots of data into matlab from a spreadsheet?

I have an excel spreadsheet with lots of data that I want to import into matlab.
filename = 'for_matlab.xlsx';
sheet = (13*2)+ 1;
xlRange = 'A1:G6';
all_data = {'one_a', 'one_b', 'two_a', 'two_b', 'three_a', 'three_b', 'four_a', 'four_b', 'five_a', 'five_b', 'six_a', 'six_b', 'seven_a', 'seven_b', 'eight_a', 'eight_b', 'nine_a', 'nine_b', 'ten_a', 'ten_b', 'eleven_a', 'eleven_b', 'twelve_a', 'twelve_b', 'thirteen_a', 'thirteen_b', 'fourteen_a'};
%read data from excel spreadsheet
for i=1:sheet,
all_data{i} = xlsread(filename, sheet, xlRange);
end
Each element of the 'all_data' vector has a corresponding matrix in separate excel sheet. The code above imports the last matrix only into all of the variables. Could somebody tell me how to get it so I can import these matrices into individual matlab variables (without calling the xlsread function 28 times)?
You define a loop using i but then put sheet in the actual xlsread call, which will just make it read repeatedly from the same sheet (the value of the variable sheet is not changing). Also not sure whether you intend to somehow save the contents of all_data, as written there's no point in defining it that way as it will just be overwritten.
There are two ways of specifying the sheet using xlsread.
1) Using a number. If you intended this then:
all_data{i} = xlsread(filename, i, xlRange);
2) Using the name of the sheet. If you intended this and the contents of all_data are the names of sheets, then:
data{i} = xlsread(filename, all_data{i}, xlRange); %avoiding overwriting

Resources