I have a machine that has a start time and an end time, as shown below:
I want to read these values in MATLAB and save them in HH:MM:SS format. I'm using these commands:
filename='Name.csv';
W=xlsread(filename, 'B5:B5');
t = datetime(W,'ConvertFrom','excel');
I want to get a final array that starts at the start time and adds 1 second until it reaches the end time, like this:
time=[3:45:13,3:45:14,3:45:15,3:45:16,........,11:25:06]
Any ideas?
Thanks in advance
So you've gotten this far:
W = xlsread(filename, 'B5:B5');
t = datetime(W,'ConvertFrom','excel');
P = xlsread(filename, 'B6:B6');
t1 = datetime(P,'ConvertFrom','excel');
At that point you have your start time in t and your end time in t1.
Similar to how you can construct ranges of numbers using the notation a:b, you can construct ranges of times using datetimes and the : operator. Use start_time:step_size:end_time to set the step/interval size.
time = t:seconds(1):t1
Then you can use datestr() on the result to convert it to whatever display format you want.
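For example, to get the HH:MM:SS display asked for, either of these should work (a small sketch; both format strings are standard):
time.Format = 'HH:mm:ss';       % display the datetime array as e.g. 03:45:13
c = datestr(time, 'HH:MM:SS');  % or convert to a char array, one row per timestamp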
I have an Excel table that contains values in these formats. The table spans over 30,000 entries.
I need to clean this data so that only the numbers directly after V- are left. This means that when the value is SV-51140r3_rule, V-4407..., I would only want 4407 to remain, and when the value is SV-245744r822811_rule, I would only want 245744 to remain. I have about 10 formulas that can handle these variations, but it requires a lot of manual labor. I've also used Excel's text-to-columns feature to clean this data, but it takes about 30 minutes to an hour to go through the whole document. I'm looking for a way to streamline this process so that one formula or function can handle all of these variations. I'm open to using VBA, but I don't have a whole lot of experience with it, and I am unable to use pandas or any IDE or programming language. Help please!!
I've used text to columns to clean data that way, and I've used a variation of this formula:
=IFERROR(RIGHT(A631,LEN(A631)-FIND("#",SUBSTITUTE(A631,"-","#",LEN(A631)-LEN(SUBSTITUTE(A631,"-",""))))),A631)
Depending on your version of Excel, either of these should work. If you have the ability to use the LET function, it will improve your performance, as this outstanding article articulates.
If you're on a really old version of Excel, you'll need to hit Ctrl+Shift+Enter to make the array formula work.
While these look daunting, all these functions are doing is isolating the text after the last V (via =SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")) and then looping through each character and returning only the digits.
Obviously the mushroom 🍄 could be any character that one would consider unlikely to appear in the actual data.
Old School
=TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9))),1)+0),
MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9))),1),""))
LET Function
(use this if you can)
=LET(zText,SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1)+0),
MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1),"")))
VBA Custom Function
You could also use a VBA custom function to accomplish what you want.
Function getNumbersAfterCharcter(aCell As Range, aCharacter As String) As String
    Const errorValue = "#NoValuesInText"
    Dim i As Long, theValue As String
    For i = Len(aCell.Value) To 1 Step -1
        theValue = Mid(aCell.Value, i, 1)
        If IsNumeric(theValue) Then
            getNumbersAfterCharcter = Mid(aCell.Value, i, 1) & getNumbersAfterCharcter
        ElseIf theValue = aCharacter Then
            Exit Function
        End If
    Next i
    If getNumbersAfterCharcter = "" Then getNumbersAfterCharcter = errorValue
End Function
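For example, with the function in a standard module, entering =getNumbersAfterCharcter(A2,"-") in a cell returns the digits found between the end of the text and the first "-" met while scanning backwards, e.g. 4407 when the cell ends in V-4407.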
Forgive me, because this is my first post. I am trying to auto-crop my data at a certain point for a plot I am making. The first point is always the same, and I can find the reference for the final point using the INDEX function. The top code is how I defined my values initially, using a large range that encompassed all of the data in columns E and D. When I replaced the "$E$10000" terms with my INDEX function, I could not get the code to work. Any help identifying the error in the second pair of lines shown below would be much appreciated.
Code that works but does not crop data:
ActiveChart.FullSeriesCollection(1).XValues = "='Sheet123'!$E$14:$E$10000"
ActiveChart.FullSeriesCollection(1).Values = "='Sheet123'!$D$14:$D$10000"
Code that should crop the data, but does not currently work:
ActiveChart.FullSeriesCollection(1).XValues = "='Sheet123'!$E$14:INDEX(E:E,MATCH(J19,D:D,0))"
ActiveChart.FullSeriesCollection(1).Values = "='Sheet123'!$D$14:INDEX(D:D,MATCH(J19,D:D,0))"
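One hedged guess at the cause: a chart series formula generally only accepts literal references or defined names, so worksheet functions like INDEX and MATCH inside the assigned string typically won't evaluate. A workaround is to resolve the MATCH in VBA first and assign the resulting ranges directly (a sketch; the sheet name and the J19 lookup cell are taken from the code above):
Dim ws As Worksheet, lastRow As Variant
Set ws = Worksheets("Sheet123")
lastRow = Application.Match(ws.Range("J19").Value, ws.Columns("D"), 0) 'row of the final point
If Not IsError(lastRow) Then
    ActiveChart.FullSeriesCollection(1).XValues = ws.Range("$E$14:$E$" & lastRow)
    ActiveChart.FullSeriesCollection(1).Values = ws.Range("$D$14:$D$" & lastRow)
End If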
I have to add a table from a CSV file of around 1,500 rows and 9 columns (75 pages) to a .docx Word document using python-docx.
I have tried different approaches, reading the CSV with pandas or opening the CSV file directly; it takes around 150 minutes to finish the job regardless of the way I choose.
My question is whether this is normal behavior, or whether there is any other way to improve this task.
I'm using this for loop to read several CSV files and parse them into table format:
import pandas as pd
from docx import Document
from docx.shared import Pt

doc = Document()
for toTAB in listBRUTO:
    df = pd.read_csv(toTAB)
    # add a table to the end and create a reference variable
    # extra row is so we can add the header row
    t = doc.add_table(df.shape[0]+1, df.shape[1])
    t.style = 'LightShading-Accent1' # border
    # add the header rows.
    for j in range(df.shape[-1]):
        t.cell(0,j).text = df.columns[j]
    # add the rest of the data frame
    for i in range(df.shape[0]):
        for j in range(df.shape[-1]):
            t.cell(i+1,j).text = str(df.values[i,j])
    # TABLE format
    for row in t.rows:
        for cell in row.cells:
            paragraphs = cell.paragraphs
            for paragraph in paragraphs:
                for run in paragraph.runs:
                    font = run.font
                    font.name = 'Calibri'
                    font.size = Pt(7)
    doc.add_page_break()
doc.save('blabla.docx')
Thanks in advance
You'll want to minimize the number of calls to table.cell(). Because of the way cell-merging works, these are expensive operations that really add up when performed in a tight loop.
I would start with refactoring this block and see how much improvement that yields:
# --- add the rest of the data frame ---
for i in range(df.shape[0]):
    for j, cell in enumerate(table.rows[i + 1].cells):
        cell.text = str(df.values[i, j])
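The header loop can get the same treatment (a sketch in the same style):
# --- header row: one .cells access instead of one .cell() call per column ---
for j, cell in enumerate(table.rows[0].cells):
    cell.text = df.columns[j]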
python-docx walks the whole table every single time you access its "cells" property,
so you'd better call .cell() as little as possible and use a cache of cells instead.
Here are two examples accessing a table of size 3×1500:
code 1: about 150.0s
for row in table.rows:
    print('processing: {0:30s}'.format(row.cells[0].text), end='\r')
code 2: about 1.4s
clls = table._cells
for row_idx in range(len(clls)//table._column_count):
    print('processing: {0:30s}'.format(
        clls[0 + row_idx*table._column_count].text), end='\r')
clls = table._cells in code 2 uses _cells to handle the cell-merging, so clls[column_idx + row_idx*table._column_count].text works just as well as table.rows[row_idx].cells[column_idx].text, and doesn't require the table to be exactly rectangular.
For a rectangular table without merged cells, you can export all cells into a list-of-lists structure and fill them very quickly (less than 0.5 s vs. 15 s for ~300-row tables with 3 columns):
from docx.table import _Cell

def get_cells_grid(table):
    # walk the underlying XML once and build a row-major grid of _Cell objects
    cells = [[]]
    col_count = table._column_count
    for tc in table._tbl.iter_tcs():
        cells[-1].append(_Cell(tc, table))
        if len(cells[-1]) == col_count:
            cells.append([])
    return cells

cells = get_cells_grid(t)
for i in range(df.shape[0]):
    for j in range(df.shape[1]):   # df.shape[1], not df.shape[i]
        cells[i][j].text = str(df.values[i, j])
Function based on table._cells() code: https://github.com/python-openxml/python-docx/blob/da75fcf01f7f322e846e2ac3e1936aedd766acc8/docx/table.py#L162
Just to add my experience: if you have to create a huge table, create the whole structure first, meaning all the rows and cells you will need, and then store the cells like so:
table_cells = table._cells (as per @kztopia above)
From there you can manipulate the cells as you wish (merging, adding text, etc.) and it is rather fast, since you make only one call to _cells.
In my use case, for a table that is in my opinion not so big (~130 rows, 8 cells per row), it used to take 9 s to create the whole thing, and now I'm at 0.5 s or so.
Keep in mind that the bigger the table, the more time each cell() call takes.
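A minimal sketch of that pattern (the sample data and the file name here are placeholders, not from the original posts):
from docx import Document

data = [['row %d col %d' % (i, j) for j in range(8)] for i in range(130)]  # placeholder data
n_rows, n_cols = len(data), len(data[0])

doc = Document()
table = doc.add_table(rows=n_rows, cols=n_cols)  # create the full structure up front
cells = table._cells                             # one call, one walk of the table XML
for i in range(n_rows):
    for j in range(n_cols):
        cells[i * n_cols + j].text = data[i][j]
doc.save('big_table.docx')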
I would like to do a data reduction operation on a spreadsheet. Preferably I would like to use MATLAB (or Excel), since I need separate output files for each case.
The link for the spreadsheet is below:
Spreadsheet
A screenshot of the spreadsheet is below.
The output I require in text files is something like the below.
The first sheet in the .xls file is the main input, whereas the following sheets (d**) are my required output. I also need these sheets in separate ASCII files (.dat) to plot them later on. Here is how the algorithm works:
Look up the number/string in Column B (FileName).
Extract all data in Columns C and D (Saturation and ETC) with the same FileName value (Column B).
Look up the matching FileName (Column B) value in Column E (ImageIndex).
Copy the value of ImageName (Column F) corresponding to that value in ImageIndex (Column E).
The result would be three columns (ImageName, Saturation, ETC); ImageName would be the same for each subcase.
Sort the columns based on Saturation.
Write each subcase as a separate .dat file.
I tried a few recipes using categorical arrays (findgroups and splitapply) in MATLAB, but they didn't seem to work out for me. I will later be working on a larger data set, so automation is necessary. I think this could be done using macros in Excel, but I would prefer MATLAB, since I will use MATLAB to plot the data. Any other alternative suggestions are welcome.
Thanks,
Here's a MATLAB solution. You could do it with a rather convoluted accumarray call, but readability would be rather bad, so I'm opting for a loop here.
out is a structure which you can use either to write files or to plot the data.
tbl = readtable('yourFile.xls');
%# get the group indices for the files
%# this assumes that you have cleaned up the dash after the 1
%# so that all of the entries in the FileName column are numeric
idx = tbl.FileName;
%# the uIdx business is to account for the possibility
%# that there are images missing from the sequence
uIdx = unique(idx);
nImages = length(uIdx);
%# preassign output structure
out(1:nImages) = struct('name','','saturation',0,'etc',0);
%# loop to extract relevant information
for iImage = uIdx(:)'
    myIdx = idx==iImage;
    data = tbl(myIdx,{'Saturation','ETC'});
    data = sortrows(data,'Saturation');
    name = tbl.ImageName{tbl.ImageIdx==iImage};
    out(iImage==uIdx).name = name;
    out(iImage==uIdx).saturation = data.Saturation;
    out(iImage==uIdx).etc = data.ETC;
end
%# plotting
for iImage = 1:nImages
    figure('name',out(iImage).name)
    plot(out(iImage).saturation, out(iImage).etc,'.');
end
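And a sketch of the file-writing part (writematrix needs R2019a or newer; the naming scheme is my assumption):
%# write each subcase to its own ASCII .dat file
for iImage = 1:nImages
    fname = sprintf('%s.dat', out(iImage).name);
    writematrix([out(iImage).saturation, out(iImage).etc], fname, ...
        'FileType','text', 'Delimiter','\t');
end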
I need to import some Excel files in MATLAB and work on them. My problem is that each Excel file has 15 sheets and I don't know how to "number" each sheet so that I can make a loop or something similar (because I need to find the average of a certain column on each sheet).
I have already tried importing the data and building a loop but MATLAB registers the sheets as chars.
Use xlsfinfo to get the sheet names, then use xlsread in a loop.
[status,sheets,xlFormat] = xlsfinfo(filename);
for sheetindex=1:numel(sheets)
    [num,txt,raw]=xlsread(filename,sheets{sheetindex});
    data{sheetindex}=num; %keep for example the numeric data to process it later outside the loop.
end
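Since the goal was the average of a certain column on each sheet, the post-processing could then look like this (a sketch; column 5 is an assumed index):
colavg = zeros(1,numel(sheets));
for sheetindex=1:numel(sheets)
    colavg(sheetindex) = mean(data{sheetindex}(:,5)); %column index assumed
end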
I've just remembered that I posted this question almost 2 years ago, and since I figured it out, I thought that posting the answer could prove useful to someone in the future.
So to recap: I needed to import a single column from 4 Excel files, with each file containing 15 worksheets. The columns were of variable length. I figured out two ways to do this. The first one is by using the xlsread function with the following syntax.
for count_p = 1:2
    a = sprintf('control_group_%d.xls',count_p);
    [status,sheets,xlFormat] = xlsfinfo(a);
    for sheetindex=1:numel(sheets)
        [num,txt,raw]=xlsread(a,sheets{sheetindex},'','basic');
        data{sheetindex}=num;
        FifthCol{count_p,sheetindex} = (data{sheetindex}(:,5));
    end
end
for count_p = 3:4
    a = sprintf('exercise_group_%d.xls',(count_p-2));
    [status,sheets,xlFormat] = xlsfinfo(a);
    for sheetindex=1:numel(sheets)
        [num,txt,raw]=xlsread(a,sheets{sheetindex},'','basic');
        data{sheetindex}=num;
        FifthCol{count_p,sheetindex} = (data{sheetindex}(:,5));
    end
end
The files were obviously named control_group_1, control_group_2, etc. I used the 'basic' input in xlsread because I only needed the raw data from the files, and it proved to be much faster than using the full functionality of the function.
The second way to import the data, and the one that I ended up using, is to build your own ActiveX server and run a single Excel application on it. xlsread "opens" and "closes" an ActiveX server each time it's called, so it's rather time consuming (using the 'basic' input does not, though). The code I used is the following.
Folder = cd(pwd);                          %getting the working directory
d = dir('*.xls');                          %finding the xls files
N_File = numel(d);                         %number of files
hexcel = actxserver('Excel.Application');  %starting the ActiveX server and
                                           %running an Excel application on it
hexcel.DisplayAlerts = true;
for index = 1:N_File                       %looping through the workbooks (xls files)
    Wrkbk = hexcel.Workbooks.Open(fullfile(pwd, d(index).name)); %VBA functions & commands
    WorkName = Wrkbk.Name;                 %getting the workbook name
    display(WorkName)
    Sheets = Wrkbk.Sheets;                 %sheets handle
    ShCo(index) = Wrkbk.Sheets.Count;      %counting them for use in the next loop
    for j = 1:ShCo(index)                  %looping through each sheet
        itemm = hexcel.Sheets.Item(sprintf('sheet%d',j)); %VBA commands
        itemm.Activate;
        robj = itemm.Columns.End(4);       %getting the column I needed
        numrows = robj.row;                %counting to the end of the column
        dat_range = ['E1:E' num2str(numrows)]; %data range
        rngObj = hexcel.Range(dat_range);
        xldat{index, j} = cell2mat(rngObj.Value); %getting the data in a cell
    end
end
%invoke(hexcel);
Quit(hexcel);
delete(hexcel);