I have several Excel spreadsheets in a folder, and each spreadsheet contains several worksheets. I've written code that loads one specific worksheet, called 'Bass min', from each spreadsheet into MATLAB:
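files = dir('*.xls');
n = numel(files);
[File_Name, num, txt, raw] = deal(cell(1, n));  % preallocate the output cell arrays
% read the 'Bass min' worksheet of each spreadsheet into MATLAB
for i = 1:n
    File_Name{i} = files(i).name;  % extract the file name from the dir listing
    [num{i}, txt{i}, raw{i}] = xlsread(File_Name{i}, 'Bass min');
end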
Is there a faster way of doing this? Because I have many spreadsheets, it takes a long time to read them all. I've heard people mention actxserver as a faster method, but I don't know how it would work!
Many thanks
You could try reading the files in basic mode, in which case MATLAB reads the files directly instead of going through Excel:
[num{i},txt{i},raw{i}] = xlsread(File_Name{i},'Bass min','','basic');
I tried using the XSSFBEventBasedExcelExtractor class, but it reads the data from every sheet in the file.
I have many sheets in a binary Excel (.xlsb) file and I only want to extract one of them. Is there a way to do that? Other approaches are welcome.
Like many folks, I need to read both .xls files (I call them S files, read with xlrd) and .xlsx files (the X files, read with openpyxl); in both cases the files have about 30,000 rows. And in both cases I'm just copying all the Excel data to a .csv file, with no other processing, so it's pure input/output.
But the X file operations are over 200 times slower than the .xls ones: for example, reading a 30,000-row .xlsx file takes 2 minutes, compared to half a second for an .xls file with xlrd. We have thousands of files to process, so the time per file matters.
Is openpyxl that much slower or do I need to do something, like release some resource at the end of each row?
BTW, I have already made several big improvements by using read_only=True and reading a row at a time instead of cell by cell,
as shown in the following code segment. Thanks to blog.davep.org:
https://blog.davep.org/2018/06/02/a_little_speed_issue_with_openpyxl.html
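import openpyxl

wb = openpyxl.load_workbook("excel_file.xlsx", data_only=True, read_only=True)
sheet = wb.active
for row in sheet.rows:  # one row at a time instead of cell by cell
    for cell in row:
        cell_from_excel = cell.value
wb.close()  # read-only workbooks keep the file open until close() is called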
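Since the job is pure I/O, a minimal sketch of going straight from the read-only, row-at-a-time loop to CSV might look like the Python below. It assumes a reasonably recent openpyxl (one whose `iter_rows` accepts `values_only`), copies only the active sheet, and the function names `rows_to_csv` and `xlsx_to_csv` are hypothetical. It is not a measured fix for the 200x gap; the main points are skipping Cell objects and closing each read-only workbook, which otherwise keeps its file handle open, before moving on to the next of thousands of files.

```python
import csv
from typing import Iterable, Sequence, TextIO


def rows_to_csv(rows: Iterable[Sequence[object]], out: TextIO) -> int:
    """Write each row straight to CSV and return how many rows were written."""
    writer = csv.writer(out)
    count = 0
    for row in rows:
        # csv.writer writes None (an empty cell in openpyxl) as an empty field
        writer.writerow(row)
        count += 1
    return count


def xlsx_to_csv(xlsx_path: str, csv_path: str) -> int:
    """Copy the active sheet of an .xlsx file to a CSV file, one row at a time."""
    # Imported here so rows_to_csv can be reused without openpyxl installed.
    import openpyxl

    wb = openpyxl.load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        ws = wb.active
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            # values_only=True yields tuples of plain values instead of cell objects
            return rows_to_csv(ws.iter_rows(values_only=True), f)
    finally:
        # A read-only workbook keeps the file open until close() is called.
        wb.close()
```

Splitting the CSV writing out of `xlsx_to_csv` keeps it independent of the reader, so the same `rows_to_csv` could in principle be fed rows from xlrd for the .xls files as well.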
Here's my desired outcome: I want an Excel workbook (say Master.xls) that I can drop into a directory of other Excel workbooks and Master.xls will extract a given range of cells from all of the hundreds of other workbooks in that directory. I have multiple directories with hundreds of Excel files in each, so I need a Master.xls file that will easily move between directories with different file paths and update based on the files around it in the directory. In my Master.xls file, I can build the file names for all of these other workbooks using text functions like CONCATENATE.
The problem comes when I try to use Excel to reference cells in workbooks that are not currently open:
INDEX can access closed workbooks using hard-coded paths, but (as far as I can tell) it can't accept cell ranges as text.
To pass a cell range as text to a function, you have to use INDIRECT, which doesn't work for closed workbooks.
Basically, INDEX can solve my problem but I can't figure out how to get it to work without hard-coding the paths to the closed workbook into the function call. That's a deal breaker, since I have thousands of workbooks to reference and doing a find-replace to change the file path for each workbook is time-prohibitive and not maintainable.
Other constraints: no Excel add-ins, since this sheet has to be shared with others, and no VBA, because it has to be used by people who are wary of macros. I recognize that Excel is not the right tool for this job. Believe me, if I could use another tool, I would.
Update: sample Excel sheet showing the problem:
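Going straight to the source, Microsoft's Office support documentation for INDIRECT states that if ref_text refers to another workbook (an external reference), that workbook must be open; if it is not, INDIRECT returns the #REF! error. So INDIRECT cannot read from closed workbooks.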
I'm working with Talend, and my current goal is to parse each sheet of an Excel file to do a few different things.
For example, I'm currently working on an Excel file made up of 4 sheets, and I want to replace some values with other values across the sheets. The output file should be the same Excel file, with its 4 sheets and all of their values, including the replaced ones.
I used tFileExcelWorkbook and tFileExcelSheetList to parse my Excel file, then tFlowIterate to create a global variable (the sheet name) and tReplace to do the search/replace.
But now I'm stuck: I don't know how to produce the same Excel file, with the same sheets, using that tReplace component.
Do you know how I could solve this problem and, more generally, how to parse the sheets of an Excel file?
Thanks !
Julien
Julien,
An easier way to parse/process the sheets of an Excel file would be to use tFileInputExcel and, in its "Sheet list", define the names/positions of the sheets that need to be worked on.
Renju MAthews
Is there some way to write a CSV file such that, when opened in MS Excel, it opens in different tabs in the workspace?
The short answer is, NO.
For that matter, the long answer is NO too.
CSV is just a sequence of lines of comma-separated values; the lines don't even have to contain the same number of values. There's no concept of worksheets or separate "areas" in CSV, and Excel can't be cajoled into opening one CSV file as multiple sheets, at least not without writing VBA to parse the file yourself.
The Office XML file format (Office Open XML, the spec behind .xlsx) does support multiple worksheets and, being XML, is still reasonably easy to generate. Do you have to use CSV, or can you switch (at least partway) to XML?
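As a minimal sketch of the XML route, the Python below writes several tables into one file in the older XML Spreadsheet 2003 format (SpreadsheetML). This is a swapped-in choice, not the Office Open XML format mentioned above: a .xlsx file is a zip package of several XML parts, whereas SpreadsheetML is a single plain XML file with one `<Worksheet>` element per tab, which Excel can still open. The function name `build_spreadsheet_xml` is hypothetical, and numbers are typed as Number while everything else is written as String.

```python
from typing import Mapping, Sequence
from xml.sax.saxutils import escape, quoteattr

# Fixed preamble of an XML Spreadsheet 2003 file; the mso-application
# processing instruction marks the file as an Excel workbook.
HEADER = (
    '<?xml version="1.0"?>\n'
    '<?mso-application progid="Excel.Sheet"?>\n'
    '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"\n'
    ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">\n'
)


def _cell(value: object) -> str:
    # ints and floats keep a numeric type; anything else (None included) is text
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return f'<Cell><Data ss:Type="Number">{value}</Data></Cell>'
    text = "" if value is None else str(value)
    return f'<Cell><Data ss:Type="String">{escape(text)}</Data></Cell>'


def build_spreadsheet_xml(sheets: Mapping[str, Sequence[Sequence[object]]]) -> str:
    """Return SpreadsheetML text with one worksheet (Excel tab) per mapping entry."""
    parts = [HEADER]
    for name, rows in sheets.items():
        # quoteattr escapes the sheet name and wraps it in quotes
        parts.append(f" <Worksheet ss:Name={quoteattr(name)}>\n  <Table>\n")
        for row in rows:
            cells = "".join(_cell(v) for v in row)
            parts.append(f"   <Row>{cells}</Row>\n")
        parts.append("  </Table>\n </Worksheet>\n")
    parts.append("</Workbook>\n")
    return "".join(parts)
```

The returned text would be saved with a .xml extension and UTF-8 encoding (the XML declaration carries no encoding, so UTF-8 is assumed). Sheet names still have to satisfy Excel's rules: at most 31 characters and none of `[ ] : * ? / \`.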