How to loop through excel sheets, perform calculations, and compile results - excel

I have roughly 70,000 sheets that all have to have calculations done, and then all results compiled into a new sheet (which would be 70,000 lines long).
It needs to be sorted by date.
I'm VERY very very poor at matlab, but I've what I need the script to do for each excel sheet, I'm just unsure how to make it do them for all.
Thank you!!! (I took out some of the not important code)
%Reading in excel sheet
B = xlsread('24259893-008020361800.TorqueData.20160104.034602AM.csv');
%Creating new matrix
[inYdim, inXdim] = size(B);
Ydim = inYdim;
[num,str,raw]=xlsread('24259893-008020361800.TorqueData.20160104.034602AM.csv',strcat('A1:C',num2str(Ydim)));
%Extracting column C
C=raw(:,3);
for k = 1:numel(C)
if isnan(C{k})
C{k} = '';
end
end
%Calculations
TargetT=2000;
AvgT=mean(t12);
TAcc=((AvgT-TargetT)/TargetT)*100 ;
StdDev=std(B(ind1:ind2,2));
ResTime=t4-t3;
FallTime=t6-t5;
DragT=mean(t78);
BreakInT=mean(t910);
BreakInTime=(t10-t9)/1000;
BreakInE=BreakInT*BreakInTime*200*.1047;
%Combining results
Results=[AvgT TAcc StdDev ResTime FallTime DragT BreakInT BreakInTime BreakInE]
I think I need to do something along the lines of:
filenames=dir('*.csv')
and I found this that may be useful:
filenames=dir('*.csv');
for file=filenames'
csv=load(file.name);
with stuff in here
end

You have the right idea, but you need to index your file names in order to be able to step through them in the for loop.
FileDir = 'Your Directory';
FileNames = {'Test1';'Test2';'Test3'};
for k=1:length(FileNames)
file=[FileDir,'/',FileNames{k}]);
[outputdata]=xlsread(file,sheet#, data locations);
THE REST OF YOUR LOOP, Indexed by k
end
How you choose to get the file names and directory is up to you.

Related

Clean data in excel that comes in varying formats

I have an excel table that contain values in these formats. The tables span over 30000 entries.
I need to clean this data so that only the numbers directly after V- are left. This would mean that when the value is SV-51140r3_rule, V-4407..., I would only want 4407 to remain and when the value is SV-245744r822811_rule, I would only want 245744 to remain. I have about 10 formulas that can handle these variations, but it requires a lot of manual labor. I've also used the text to column feature of excel to clean this data as well, but it takes about 30 minutes to an hour to go through the whole document. I'm looking for ways that I can streamline this process so that one formula or function can handle all of these different variations. I'm open to using VBA but don't have a whole lot of experience with it and I am unable to use Pandas or any IDE or programming language. Help please!!
I've used text to columns to clean data that way and I've used a variation of this formula
=IFERROR(RIGHT(A631,LEN(A631)-FIND("#",SUBSTITUTE(A631,"-","#",LEN(A631)-LEN(SUBSTITUTE(A631,"-",""))))),A631)
Depending on your version of Excel, either of these should work. If you have the ability to use the Let function, it will improve your performance, as this outstanding article articulates.
If you're on a really old version of excel, you'll need to hit ctl shift enter to make array formula work.
While these look daunting, all these functions are doing is finding the last V (by this function) =SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","") and then looping through each character and only returning numbers.
Obviously the mushroom 🍄 could be any character that one would consider improbable to appear in the actual data.
Old School
=TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9))),1)+0),
MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9))),1),""))
Let Function
(use this if you can)
=LET(zText,SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1)+0),
MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1),"")))
VBA Custom Function
You could also use a VBA custom function to accomplish what you want.
Function getNumbersAfterCharcter(aCell As Range, aCharacter As String) As String
Const errorValue = "#NoValuesInText"
Dim i As Long, theValue As String
For i = Len(aCell.Value) To 1 Step -1
theValue = Mid(aCell.Value, i, 1)
If IsNumeric(theValue) Then
getNumbersAfterCharcter = Mid(aCell.Value, i, 1) & getNumbersAfterCharcter
ElseIf theValue = aCharacter Then
Exit Function
End If
Next i
If getNumbersAfterCharcter = "" Then getNumbersAfterCharcter = errorValue
End Function

For loop through different excel files AND different worksheets (Matlab)

I try to loop not only through different excel sheets(~125), but also through different excel files (~12). I managed to write a code for the sheets, but now I am struggling how to scale that up to different excel files. The excel-files all have the same structure and number/name of sheets.
Can anyone help me? Thanks a lot in advance!!
foldername = 'Raw_data';
cd(foldername);
fnames = dir('*raw.xlsx');
%% extraction of sheet name
[~, sheet_name] = xlsfinfo('Test_raw.xlsx');
%% additional array for merging later
cali=[1; 2; 5; 10; 20; 50; 100; 200; 500; 1000];
%for i=1:length(fnames) %I guess ?
for k=1:numel(sheet_name) %operation for all sheets
%extract data of one excel file, but different sheets
[~,~,raw{k}]=xlsread('Test_raw.xlsx',sheet_name{k},'A5:A14');
x=vertcat(raw{:});
end
B = reshape(x,10,k);
numind = cellfun(#isnumeric, B); %identifies numeric values
B(~numind) = {NaN} %NOT num. values to NaN
b =cell2mat(B);
final_data = [cali b];
%end
You want to loop through all your excel files. You already gathered all the filenames in fnames.
You basically did setup your for-loop, the only thing missing is replacing 'Test_raw.xlsx' in xlsread with fnames(i).name.
for i=1:length(fnames) %I guess ?
for k=1:numel(sheet_name) %operation for all sheets
%extract data of one excel file, but different sheets
[~,~,raw{k}]=xlsread('Test_raw.xlsx',sheet_name{k},'A5:A14');
x=vertcat(raw{:});
end
end
Be careful that you have to adapt your final_data variable.
For just all the data from all the files in it you could use this variable as a cell-array containing an element for each file. It is good practice allocating this array before entering the loop
final_data = cell(length(fnames),1);
%% here go the loops
clear B
B = reshape(x,10,k);
numind = cellfun(#isnumeric, B); %identifies numeric values
B(~numind) = {NaN} %NOT num. values to NaN
b =cell2mat(B);
final_data{i} = [cali b];
B, numind and b will be temporary working variables that are being overwritten each loop. Because of this, clearing them before their next use can be good practice.
After the loop, you can access your data with e.g. final_data{5} to access the fifth file.

Table Segmentation,Data Reduction, Sorting And Writing in MATLAB/EXCEL

I would like to do a data reduction operation on a spreadsheet. Preferably I would like to use MATLAB/(or excel) since I need separate output files for each case.
The link is for the spreadsheet is below
Spreadsheet
A screenshot of the spreadsheet is as below
The output I required in text files is something as below
The first sheet in the .xls file is the main input. Wheras the the following sheets (d**) are my required output. I also need these sheets in a separate ASCII file (.dat) to plot hem later on. Here is how the algorithm works
Lookup the number/string in column B(FileName)
Extract all data in Columns C and D (Saturation and ETC) with same FileName Value(Column B)
Lookup the matching FileName(Column B) value in Column E (ImageIndex).
Copy Value of ImageName(Column F) to the corresponding Value in Image(IndexColumn E)
Result would be three columns (ImageName,Saturation,ETC). ImageName would be same for each subcase
Sort the columns based on Saturation
Write each sub case as a separate .dat file
I tried using a few recipes using categorical arrays (findgroups and splitapply) in MATLAB. Didn't seem to work out for me. I would be later working on a larger data set so automation is necessary. I think this could be done using macros on excel, but I would prefer using MATLAB since I would use MATLAB to plot the data. Any other alternative suggestions are welcome
Thanks,
Here's a Matlab solution. You could do it with a rather convoluted accumarray call, but readability would be rather bad, so I'm opting for a loop here.
out is a structure which you can use to either write files, or to plot the data.
tbl = readtable('yourFile.xls');
%# get the group indices for the files
%# this assumes that you have cleaned up the dash after the 1
%# so that all of the entries in the FileName column are numeric
idx = tbl.FileName;
%# the uIdx business is to account for the possibility
%# that there are images missing from the sequence
uIdx = unique(idx);
nImages = length(uIdx);
%# preassign output structure
out(1:nImages) = struct('name','','saturation',0,'etc',0);
%# loop to extract relevant information
for iImage = uIdx(:)'
myIdx = idx==iImage;
data = tbl(myIdx,{'Saturation','ETC'});
data = sortrows(data,'Saturation');
name = tbl.ImageName{tbl.ImageIdx==iImage};
out(iImage==uIdx).name = name;
out(iImage==uIdx).saturation = data.Saturation;
out(iImage==uIdx).etc= data.ETC;
end
%# plotting
for iImage = 1:nImages
figure('name',out(iImage).name)
plot(out(iImage).saturation, out(iImage).etc,'.');
end

Working with Excel sheets in MATLAB

I need to import some Excel files in MATLAB and work on them. My problem is that each Excel file has 15 sheets and I don't know how to "number" each sheet so that I can make a loop or something similar (because I need to find the average on a certain column on each sheet).
I have already tried importing the data and building a loop but MATLAB registers the sheets as chars.
Use xlsinfo to get the sheet names, then use xlsread in a loop.
[status,sheets,xlFormat] = xlsfinfo(filename);
for sheetindex=1:numel(sheets)
[num,txt,raw]=xlsread(filename,sheets{sheetindex});
data{sheetindex}=num; %keep for example the numeric data to process it later outside the loop.
end
I 've just remembered that i posted this question almost 2 years ago, and since I figured it out, I thought that posting the answer could prove useful to someone in the future.
So to recap; I needed to import a single column from 4 excel files, with each file containing 15 worksheets. The columns were of variable lengths. I figured out two ways to do this. The first one is by using the xlsread function with the following syntax.
for count_p = 1:2
a = sprintf('control_group_%d.xls',count_p);
[status,sheets,xlFormat] = xlsfinfo(a);
for sheetindex=1:numel(sheets)
[num,txt,raw]=xlsread(a,sheets{sheetindex},'','basic');
data{sheetindex}=num;
FifthCol{count_p,sheetindex} = (data{sheetindex}(:,5));
end
end
for count_p = 3:4
a = sprintf('exercise_group_%d.xls',(count_p-2));
[status,sheets,xlFormat] = xlsfinfo(a);
for sheetindex=1:numel(sheets)
[num,txt,raw]=xlsread(a,sheets{sheetindex},'','basic');
data{sheetindex}=num;
FifthCol{count_p,sheetindex} = (data{sheetindex}(:,5));
end
end
The files where obviously named control_group_1, control_group_2 etc. I used the 'basic' input in xlsread, because I only needed the raw data from the files, and it proved to be much faster than using the full functionality of the function.
The second way to import the data, and the one that i ended up using, is building your own activeX server and running a single excelapplication on it. Xlsread "opens" and "closes" an activeX server each time it's called so it's rather time consuming (using the 'basic' input does not though). The code i used is the following.
Folder=cd(pwd); %getting the working directory
d = dir('*.xls'); %finding the xls files
N_File=numel(d); % Number of files
hexcel = actxserver ('Excel.Application'); %starting the activeX server
%and running an Excel
%Application on it
hexcel.DisplayAlerts = true;
for index = 1:N_File %Looping through the workbooks(xls files)
Wrkbk = hexcel.Workbooks.Open(fullfile(pwd, d(index).name)); %VBA
%functions
WorkName = Wrkbk.Name; %getting the workbook name %&commands
display(WorkName)
Sheets=Wrkbk.Sheets; %sheets handle
ShCo(index)=Wrkbk.Sheets.Count; %counting them for use in the next loop
for j = 1:ShCo(index) %looping through each sheet
itemm = hexcel.Sheets.Item(sprintf('sheet%d',j)); %VBA commands
itemm.Activate;
robj = itemm.Columns.End(4); %getting the column i needed
numrows = robj.row; %counting to the end of the column
dat_range = ['E1:E' num2str(numrows)]; %data range
rngObj = hexcel.Range(dat_range);
xldat{index, j} = cell2mat(rngObj.Value); %getting the data in a cell
end;
end
%invoke(hexcel);
Quit(hexcel);
delete(hexcel);

How to import lots of data into matlab from a spreadsheet?

I have an excel spreadsheet with lots of data that I want to import into matlab.
filename = 'for_matlab.xlsx';
sheet = (13*2)+ 1;
xlRange = 'A1:G6';
all_data = {'one_a', 'one_b', 'two_a', 'two_b', 'three_a', 'three_b', 'four_a', 'four_b', 'five_a', 'five_b', 'six_a', 'six_b', 'seven_a', 'seven_b', 'eight_a', 'eight_b', 'nine_a', 'nine_b', 'ten_a', 'ten_b', 'eleven_a', 'eleven_b', 'twelve_a', 'twelve_b', 'thirteen_a', 'thirteen_b', 'fourteen_a'};
%read data from excel spreadsheet
for i=1:sheet,
all_data{i} = xlsread(filename, sheet, xlRange);
end
Each element of the 'all_data' vector has a corresponding matrix in separate excel sheet. The code above imports the last matrix only into all of the variables. Could somebody tell me how to get it so I can import these matrices into individual matlab variables (without calling the xlsread function 28 times)?
You define a loop using i but then put sheet in the actual xlsread call, which will just make it read repeatedly from the same sheet (the value of the variable sheet is not changing). Also not sure whether you intend to somehow save the contents of all_data, as written there's no point in defining it that way as it will just be overwritten.
There are two ways of specifying the sheet using xlsread.
1) Using a number. If you intended this then:
all_data{i} = xlsread(filename, i, xlRange);
2) Using the name of the sheet. If you intended this and the contents of all_data are the names of sheets, then:
data{i} = xlsread(filename, all_data{i}, xlRange); %avoiding overwriting

Resources