Multithreaded programming in MATLAB

I need help with parallel programming in MATLAB.
I have a list of filenames;
for each file I have to run an independent calculation that returns a table row.
All the table rows should be combined into one table.
The order of the rows has no meaning.
How do I process all the files in parallel and insert the rows into one table?
samples=dir('*.txt');
for smpl=samples'
    row=processSamples(smpl,prm1,prm2); % should be parallel
    table=[table;row];
end
Thanks

MATLAB has a very useful and easy-to-use implementation of the parallelized for loop called parfor; see e.g. doc parfor.
The code would be similar to the below, depending on the dimension and type of your "row" variable. The key point is that you have to index into the output by the loop variable for parfor to work.
samples=dir('*.txt');
parfor k=1:length(samples)
    smpl = samples(k);
    row = processSamples(smpl,prm1,prm2); % runs in parallel
    table(k) = row; % indexing by k preserves order
end
When the code hits the parfor line it will spend some time (a few seconds) setting up parallel workers on your local computer. Alternatively, you can call parpool yourself to set up the workers on your local computer or on a cluster, for example.
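For instance, a minimal sketch of opening a pool explicitly (the pool size of 4 is just an illustrative choice):
pool = parpool('local', 4); % start 4 workers on the local profile
% ... run your parfor loops on this pool ...
delete(pool); % release the workers when finished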
To illustrate with a simple example:
n = 10;
y = randi(10,1,n);
squaredy = zeros(n,1); % preallocation is not necessary for parfor to work
parfor k = 1:n
    squaredy(k) = y(k)^2;
end
disp(y)
disp(squaredy)
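Applied to the original question, one way to collect table rows is to store each one in a cell array inside the parfor and concatenate afterwards. This is a sketch assuming processSamples, prm1, and prm2 are defined as in the question:
samples = dir('*.txt');
rows = cell(length(samples), 1); % one slot per file; a sliced variable, so parfor accepts it
parfor k = 1:length(samples)
    rows{k} = processSamples(samples(k), prm1, prm2); % each iteration is independent
end
resultTable = vertcat(rows{:}); % combine all rows into one table; row order does not matter here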

Related

Import Time Data from Excel into MATLAB

I have a machine log with a start time and an end time stored in a spreadsheet.
I want to read these values in MATLAB and save them in HH:MM:SS format. I'm using this command:
filename='Name.csv';
W=xlsread(filename, 'B5:B5');
t = datetime(W,'ConvertFrom','excel');
I want to get a final array that starts with the start time and adds 1 second until it reaches the end time, like this:
time=[3:45:13,3:45:14,3:45:15,3:45:16,........,11:25:06]
Any ideas?
Thanks in advance
So you've gotten this far:
W = xlsread(filename, 'B5:B5');
t = datetime(W,'ConvertFrom','excel');
P = xlsread(filename, 'B6:B6');
t1 = datetime(P,'ConvertFrom','excel');
At that point you have your start time in t and your end time in t1.
Similar to how you can construct ranges of numbers using the notation a:b, you can construct ranges of times using datetimes and the : operator. Use start_time:step_size:end_time to set the step/interval size.
time = t:seconds(1):t1
Then you can use datestr() on the results to convert it to whatever display format you want.
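Putting the pieces together, a minimal sketch (cell addresses B5 and B6 follow the question, and 'HH:MM:SS' is the display format you asked for):
filename = 'Name.csv';
W = xlsread(filename, 'B5:B5');           % Excel serial number for the start time
P = xlsread(filename, 'B6:B6');           % Excel serial number for the end time
t  = datetime(W, 'ConvertFrom', 'excel');
t1 = datetime(P, 'ConvertFrom', 'excel');
time = t:seconds(1):t1;                   % one datetime entry per second
timeStrings = datestr(time, 'HH:MM:SS');  % char array with one 'HH:MM:SS' row per entry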

Extremely slow: adding a table to python-docx from a CSV file

I have to add a table of around 1,500 rows and 9 columns (about 75 pages) to a Word document, using python-docx.
I have tried different approaches, reading the CSV with pandas or opening the CSV file directly; it takes around 150 minutes to finish the job regardless of the approach I choose.
My question is whether this is normal behavior, or whether there is any other way to improve this task.
I'm using this for loop to read several CSV files and render each one as a table:
for toTAB in listBRUTO:
    df = pd.read_csv(toTAB)
    # add a table to the end and create a reference variable
    # extra row is so we can add the header row
    t = doc.add_table(df.shape[0] + 1, df.shape[1])
    t.style = 'LightShading-Accent1'  # border
    # add the header row
    for j in range(df.shape[-1]):
        t.cell(0, j).text = df.columns[j]
    # add the rest of the data frame
    for i in range(df.shape[0]):
        for j in range(df.shape[-1]):
            t.cell(i + 1, j).text = str(df.values[i, j])
    # table formatting
    for row in t.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                for run in paragraph.runs:
                    font = run.font
                    font.name = 'Calibri'
                    font.size = Pt(7)
    doc.add_page_break()
doc.save('blabla.docx')
Thanks in advance
You'll want to minimize the number of calls to table.cell(). Because of the way cell-merging works, these are expensive operations that really add up when performed in a tight loop.
I would start with refactoring this block and see how much improvement that yields:
# --- add the rest of the data frame ---
for i in range(df.shape[0]):
    for j, cell in enumerate(table.rows[i + 1].cells):
        cell.text = str(df.values[i, j])
python-docx walks the whole table every single time you access its .cells property,
so you should call .cell() as little as possible and cache the cells instead.
Here are two examples that access a table of size 3×1500:
code 1: about 150.0s
for row in table.rows:
    print('processing: {0:30s}'.format(row.cells[0].text), end='\r')
code 2: about 1.4s
clls = table._cells
for row_idx in range(len(clls) // table._column_count):
    print('processing: {0:30s}'.format(
        clls[0 + row_idx * table._column_count].text), end='\r')
In code 2, clls = table._cells uses _cells to resolve cell merging once, so clls[column_idx + row_idx * table._column_count].text works just as well as table.rows[row_idx].cells[column_idx].text and does not require the table to be exactly rectangular.
For a rectangular table without merged cells, you can export all cells into a list-of-lists structure and fill them very quickly (less than 0.5 s vs. 15 s for a ~300-row table with 3 columns):
from docx.table import _Cell

def get_cells_grid(table):
    cells = [[]]
    col_count = table._column_count
    for tc in table._tbl.iter_tcs():
        cells[-1].append(_Cell(tc, table))
        if len(cells[-1]) == col_count:
            cells.append([])
    return cells

cells = get_cells_grid(t)
for i in range(df.shape[0]):
    for j in range(df.shape[1]):  # shape[1] is the column count (the original shape[i] was a typo)
        cells[i][j].text = str(df.values[i, j])
Function based on table._cells() code: https://github.com/python-openxml/python-docx/blob/da75fcf01f7f322e846e2ac3e1936aedd766acc8/docx/table.py#L162
Just to add my experience: if you have to create a huge table, create the whole structure first, meaning all the rows and cells you will need, and then cache the cells like so:
table_cells = table._cells (as per @kztopia's answer)
From there you can manipulate the cells as you wish (merging, adding text, etc.) with rather optimized speed, since you make only one call that walks the table.
In my use case, for a table that is in my opinion not so big (~130 rows, 8 cells per row), it used to take 9 s to create the whole thing and now I'm at 0.5 s or so.
Keep in mind that the bigger the table, the longer each call to cell() takes.

Can For loops be dynamic in that the limits change as the code is run?

I'm creating a forecasting model for a fleet of equipment using Excel wholly written with VBA.
While forecasting the utilisation of equipment, some equipment will reach its replacement threshold and a new piece of equipment takes over from there. This will require a new row added to the table for the new equipment.
I would have thought that a For loop would be dynamic, so using a variable for the upper limit would be re-evaluated on every loop, but this seems not to be the case.
I set up a simple scenario to test as per the code below, starting with 2 listrows in the table.
Sub Test1()
    Set Table1 = Sheet1.ListObjects("Table1")
    x = Table1.ListRows.Count
    For i = 1 To x
        Set NewRow = Table1.ListRows.Add
        x = Table1.ListRows.Count
        NewRow.Range(1, 1) = x
    Next i
End Sub
I assumed it would run indefinitely, but it only runs as many times as the initial row count.
Is using a different type of loop (Do While or Do Until) the ONLY way to achieve a genuinely dynamic outcome?
To sum up the comments to your question:
In Visual Basic, the upper bound of a For loop is evaluated once, when the loop is entered; modifying it afterwards has no effect. There are other languages that in principle allow this kind of operation, but it is not a good programming style either way.
The reason is that a for-loop counts over a fixed interval that should not change during the loop's execution.
Instead of using a for-loop here, one may consider using a while-loop:
i = 1
While i <= x
    Set NewRow = Table1.ListRows.Add
    x = Table1.ListRows.Count
    NewRow.Range(1, 1) = x
    i = i + 1
Wend
Caution:
This loop will run forever (or rather, until you exhaust some resource, at which point it will crash): each pass moves the upper bound one unit further away while approaching it by one unit.
The best way to approach what you actually want to achieve is using a buffer list:
Identify the items you want to create a new row for
Insert that row into a second list
Iterate over the second list and append the items to the original list
This way, you avoid re-testing the newly inserted items (which most likely won't have reached their replacement threshold yet).

How to loop through Excel sheets, perform calculations, and compile results

I have roughly 70,000 sheets that all have to have calculations done, and then all results compiled into a new sheet (which would be 70,000 lines long).
It needs to be sorted by date.
I'm very, very poor at MATLAB, but I've worked out what I need the script to do for each Excel sheet; I'm just unsure how to make it do that for all of them.
Thank you!!! (I took out some of the not important code)
%Reading in excel sheet
B = xlsread('24259893-008020361800.TorqueData.20160104.034602AM.csv');
%Creating new matrix
[inYdim, inXdim] = size(B);
Ydim = inYdim;
[num,str,raw]=xlsread('24259893-008020361800.TorqueData.20160104.034602AM.csv',strcat('A1:C',num2str(Ydim)));
%Extracting column C
C=raw(:,3);
for k = 1:numel(C)
    if isnan(C{k})
        C{k} = '';
    end
end
%Calculations
TargetT=2000;
AvgT=mean(t12);
TAcc=((AvgT-TargetT)/TargetT)*100 ;
StdDev=std(B(ind1:ind2,2));
ResTime=t4-t3;
FallTime=t6-t5;
DragT=mean(t78);
BreakInT=mean(t910);
BreakInTime=(t10-t9)/1000;
BreakInE=BreakInT*BreakInTime*200*.1047;
%Combining results
Results=[AvgT TAcc StdDev ResTime FallTime DragT BreakInT BreakInTime BreakInE]
I think I need to do something along the lines of:
filenames=dir('*.csv')
and I found this that may be useful:
filenames = dir('*.csv');
for file = filenames'
    csv = load(file.name);
    % ... with stuff in here ...
end
You have the right idea, but you need to index your file names in order to be able to step through them in the for loop.
FileDir = 'Your Directory';
FileNames = {'Test1';'Test2';'Test3'};
for k = 1:length(FileNames)
    file = [FileDir, '/', FileNames{k}];
    [outputdata] = xlsread(file, sheetIndex, dataRange); % fill in your sheet and cell range
    % THE REST OF YOUR LOOP, indexed by k
end
How you choose to get the file names and directory is up to you.
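To tie this back to dir, here is a sketch under the assumption that each file yields one 9-element results row, as in your script (the output file name Compiled.xlsx and the per-file placeholders are illustrative):
filenames = dir('*.csv');
nFiles = length(filenames);
Results = zeros(nFiles, 9); % one row per file, nine result columns as in the question
for k = 1:nFiles
    [num, str, raw] = xlsread(filenames(k).name);
    % ... the per-file calculations from your script go here ...
    % Results(k,:) = [AvgT TAcc StdDev ResTime FallTime DragT BreakInT BreakInTime BreakInE];
end
xlswrite('Compiled.xlsx', Results); % compile everything into one new sheet
If one of the columns holds the file's date (for example as a datenum), sortrows(Results, thatColumn) will order the compiled rows by date before writing.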

Creating EXCEL File with parfor command in MATLAB for parallel processing

I want to create an Excel file outside the parfor loop (at the start of the code), update the Excel file on each iteration, and finally store the Excel file to a specific location after the loop is complete. But I am getting some errors with the following code:
matlabpool('open',2);
pwd='C:\Users\myPC\Desktop';
fName = fullfile(pwd, 'file.xls');
%# create Excel COM Server
Excel = actxserver('Excel.Application');
Excel.Visible = true;
%# create new XLS file
wb = Excel.Workbooks.Add();
wb.Sheets.Item(1).Activate();%line 10
offset = 0;
C1 = {'NAME', 'Max', 'Min','Average'};
%# calculate cell range to fit matrix (placed below previous one)
cellRange = xlcalcrange('A1', offset,0, size(C1,1),size(C1,2));
offset = offset + size(C1,1);
%# insert matrix in sheet
Excel.Range(cellRange).Select();
Excel.Selection.Value =C1;
parfor i=1:2
    % some code, e.g.:
    MAX = 1;
    MIN = 2;
    AVG = 3;
    name = 'jpg';
    row2 = { name MAX MIN AVG };
    %# calculate cell range to fit matrix (placed below previous one)
    cellRange = xlcalcrange('A1', offset,0, size(row2,1),size(row2,2));
    offset = offset + size(row2,1);
    Excel.Range(cellRange).Select(); % line 32
    Excel.Selection.Value = row2;
end
%# parsave XLS file
wb.SaveAs(fName,1);
wb.Close(false);
%# close Excel
Excel.Quit();
Excel.delete();
matlabpool('close');
The above code produces the following errors:
1. The variable Excel in a parfor cannot be classified.
2. The PARFOR loop cannot run due to the way variable 'offset' is used.
3. The PARFOR loop cannot run due to the way variable 'Excel' is used.
4. Valid indices for 'Excel' are restricted in PARFOR loops.
5. Call was rejected by callee. Error in line 10 (wb.Sheets.Item(1).Activate();).
Please help me fix the above code so that I can create the Excel file, update it inside the PARFOR loop, and save it outside the PARFOR loop.
It is not safe to write to a file simultaneously from different threads. This is why MATLAB will automatically error when you try to do this sort of thing.
The answer to this problem will be one of these two:
Within the parfor loop, write your outputs to some sort of independent buffer, and then outside the parfor loop write the buffers to the file, serially (see the sketch below). Or,
Don't use parfor. Use for instead.
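A minimal sketch of the first option, reusing the values from the question (each iteration fills its own slice of a cell-array buffer, and the single xlswrite call happens serially after the loop):
results = cell(2, 1); % independent buffer, sliced by the loop index
parfor i = 1:2
    % some code, e.g.:
    MAX = 1; MIN = 2; AVG = 3; name = 'jpg';
    results{i} = {name, MAX, MIN, AVG}; % write to the buffer, not to Excel
end
% Serial section: write the header plus all buffered rows in one call
C1 = {'NAME', 'Max', 'Min', 'Average'};
xlswrite(fullfile('C:\Users\myPC\Desktop', 'file.xls'), [C1; vertcat(results{:})]);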
See this: http://blogs.mathworks.com/loren/2009/10/02/using-parfor-loops-getting-up-and-running/ to learn more about the restrictions of the parfor command.
