Load big Excel (xlsx) file into MATLAB

I use 64-bit Windows with 8GB of RAM and 64-bit MATLAB.
I am trying to load a .xlsx file into MATLAB. The file is around 700MB and contains a sheet with 673928 rows and 43 columns.
First I used the GUI tool 'uiimport'. After choosing the file path and name, the GUI tool needs around 3 minutes to read the .xlsx file and then shows the data in a table. If I choose "cell array", it needs around 10 minutes to import the data into the workspace.
>> whos
  Name                 Size             Bytes        Class    Attributes
  NBPPdataV3YOS1    673928x43       3473588728       cell
It works very well, but I have many .xlsx files to import, and importing each one with the GUI tool is impractical. So I used the GUI tool to generate a function like this:
function data = importfile(workbookFile, sheetName, range)
%% Import the data
[~, ~, data] = xlsread(workbookFile, sheetName, range);
data(cellfun(@(x) ~isempty(x) && isnumeric(x) && isnan(x), data)) = {''};
For simplicity, I have omitted some irrelevant code. However, when I use this function to import the data, it does not work well: the RAM used by MATLAB and Excel grows dramatically until almost all of it is consumed, and the data still has not been imported after 30 minutes.
I also tried it like this:
filename='E:\data.xlsx';
excelObj = actxserver('Excel.Application');
fileObj = excelObj.Workbooks.Open(filename);
sheetObj = fileObj.Worksheets.get('Item', 'sheet2');
%Read in ranges the same way as xlsread!
indata = sheetObj.Range('A1:AQ673928').Value;
The same problem occurs as with xlsread().
My questions are:
1. Does the GUI import tool use xlsread() to read .xlsx files? If so, why does the generated function not work? If not, which interface does it use?
2. Is there an efficient way to load Excel files into MATLAB?
Thanks!

It sounds like you may be keeping the Excel file in memory in MATLAB. I would suggest making sure you close the connection to each Excel file after you have imported its data; see the sketch below.
You may also find that the MATLAB table class is more memory-efficient than the cell class.
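A minimal sketch of both suggestions, reusing the COM code from the question (the path, sheet name and range are taken from there; readtable requires R2013b or newer):
% Read via COM, then release everything so the memory is actually freed
filename = 'E:\data.xlsx';
excelObj = actxserver('Excel.Application');
fileObj  = excelObj.Workbooks.Open(filename);
sheetObj = fileObj.Worksheets.get('Item', 'sheet2');
indata   = sheetObj.Range('A1:AQ673928').Value;
fileObj.Close(false);            % close the workbook without saving
excelObj.Quit();                 % shut down the Excel process
delete(excelObj);                % release the COM handle
clear excelObj fileObj sheetObj  % drop the stale references

% Alternatively, skip COM and read straight into a table (R2013b or newer)
T = readtable(filename, 'Sheet', 'sheet2');
If Excel processes keep piling up in the Task Manager between files, a missing Quit/delete pair is usually the culprit.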
Good luck.

Related

How do I read all the .wav audio files in one folder using information from an Excel sheet in MATLAB?

Hi, I have a MATLAB programming question. I am using Mac OS and have placed all my audio files in the same folder as MATLAB; how do I read all the .wav audio files? I want to automate the process.
Example:
Firstly, I have an Excel sheet with the audio file names and information.
Secondly, I want to extract the audio file names from the Excel sheet (first column) and pass them to the audioread function in MATLAB.
I need to use the following audioread function.
[y,Fs]=audioread('audio1.wav');
I want to read audio1.wav and do some calculations on it. After finishing the calculation, I will proceed to read audio2.wav and do the same calculation for it. Can you teach me how to do this and show me the code for this?
Thank you.
In MATLAB you can read spreadsheet and CSV files with readcell (readmatrix is meant for purely numeric data, so it would turn your file-name column into NaNs). You are probably best off exporting your spreadsheet of audio files to a CSV file first.
With regard to organising the data, it is easiest for the spreadsheet to contain the full pathname of each file (i.e. /path/from/root/to/file.wav)
So, say you had an audio_files.csv of file paths like
/path/to/file1.wav, file1data
/path/to/file2.wav, file2data
/path/to/file3.wav, file3data
You could read each file with something like
filename = 'audio_files.csv';
audio_file_list = readcell(filename);     % readcell keeps the text entries intact
for k = 1:size(audio_file_list, 1)
    audio_file = audio_file_list{k, 1};   % so long as the first column is the file paths
    [y, Fs] = audioread(audio_file);
    % do something to y
end
Of course, the % do something to y will depend entirely on what you want to achieve.

How do I use Python to show data in Excel?

I'm trying to display data from Python in Excel. Ideally, a pandas DataFrame's worth of data would appear in a new, unsaved Excel instance. My search has turned up ways to create Excel files, and lots of ways to 'open' an Excel file to read data from it, but no way to display it. My current approach was to create a file and then figure out how to open it, but I consider that approach second-best.
I found this. Another way would be to export your data to CSV and import it in Excel; a sketch of that route is below.
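A minimal sketch of the CSV route (the DataFrame contents and the path are made up; os.startfile is Windows-only):
import os
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # stand-in data
csv_path = r"C:\Temp\test.csv"                 # hypothetical path
df.to_csv(csv_path, index=False)               # dump the DataFrame to CSV
os.startfile(csv_path)                         # Windows hands it to Excel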
I found you can 'run' files with the os library. As long as your computer knows what to do with it, you can create the xlsx file with whatever method and then run it to display it:
import xlsxwriter
import os

w = xlsxwriter.Workbook(r"C:\Temp\test.xlsx")
s = w.add_worksheet("Sheet1")
s.write(1, 3, "7")                  # row 2, column D (indices are zero-based)
w.close()                           # the file must be closed before it can be opened
os.startfile(r"C:\Temp\test.xlsx")  # opens it with the default app, i.e. Excel
Still not sure whether you can work with an unsaved, open instance of Excel.
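For what it's worth, a hedged sketch of that idea using the pywin32 package (my own assumption, not something from the answers above): COM can push values into a brand-new workbook that is visible but never saved.
import win32com.client as win32
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # stand-in data

excel = win32.Dispatch("Excel.Application")
excel.Visible = True                 # show the live Excel instance
wb = excel.Workbooks.Add()           # new, unsaved workbook
ws = wb.Worksheets(1)
for j, name in enumerate(df.columns, start=1):
    ws.Cells(1, j).Value = name      # header row
for i, row in enumerate(df.itertuples(index=False), start=2):
    for j, val in enumerate(row, start=1):
        ws.Cells(i, j).Value = val   # data rows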

Run a VBA macro in Spotfire using Ironpython

I would have asked over in this thread, IronPython - Run an Excel Macro, but I don't have enough reputation.
Roughly following the code given in the link, I created some code that saves a file to a specific location, then opens a workbook that exists there and calls the macros within it, which perform a small amount of manipulation on the data I downloaded to .xls, to make it more presentable.
Now I've isolated the problem to this particular part of the code (below).
Spotfire is normally not that informative, but it gives me very little to go on here. It seems to be something .NET-related, but that's about all I can tell.
The Error Message
Traceback (most recent call last):
  File "Spotfire.Dxp.Application.ScriptSupport", line unknown, in ExecuteForDebugging
  File "", line unknown, in
StandardError: Exception has been thrown by the target of an invocation.
The Script
from Spotfire.Dxp.Data.Export import DataWriterTypeIdentifiers
from System.IO import File, Directory
import clr
clr.AddReference("Microsoft.Office.Interop.Excel")
import Microsoft.Office.Interop.Excel as Excel
excel = Excel.ApplicationClass()
excel.Visible = True
excel.DisplayAlerts = False
workbook = excel.Workbooks.Open('myfilelocation')
excel.Run('OpenUp')
excel.Run('ActiveWorkbook')
excel.Run('DoStuff')
excel.Quit()
Right, so I'm answering my own question here, but I hope it helps somebody. The above code was, as far as I'm aware, perfectly fine, but it didn't play well with the way my Spotfire environment is configured. However, after going for a much more macro-heavy approach I was able to find a solution.
On the Spotfire end I have two input fields, one giving the file path and the other the file name, plus a button to run the script below. The two are joined together in the script but are crucially kept separate, as the file name needs to be written into a separate file that the main macro reads to find the location of the exported file.
Fundamentally, I created three .xls files to achieve this solution. The first is the main one, containing all of the main VBA code. It looks in a folder recorded in a second file, which this script populates, to find the save location of the file specified in the Spotfire input field. After finding the most recent file using a custom-made function, it runs a simple series of conditional colour operations on the cell values in question.
This script populates two helper files and tells the main macro to run; the code in that macro automatically closes Excel when it is done.
The third file holds a series of colour rules, so users can define their own depending on their dashboard. This gives it some flexibility for customisation. Not strictly necessary, but a potential request from some users, so I decided to beat them to the punch.
Anyway, the code is below.
from Spotfire.Dxp.Data.Export import DataWriterTypeIdentifiers
from System.IO import File, Directory
import clr
clr.AddReference("Microsoft.Office.Interop.Excel")
import Microsoft.Office.Interop.Excel as Excel
from Spotfire.Dxp.Data.Export import *
from Spotfire.Dxp.Application.Visuals import *
from System.IO import *
from System.Diagnostics import Process
# Input field which takes the name of the file you want to save
name = Document.Properties['NameOfDocument']
# Document property that takes the path
location = Document.Properties['FileLocation']
# just to debug, to make sure it parses correctly. Declaring this in the script
# parameters section would mean that the "\" escape characters render in an
# unusable way, whereas doing it here doesn't. Took me a long time to figure that out.
print(location)
# Gives the file the correct extension.
# Couldn't risk leaving it up to the user.
newname = name + ".xls"
#Join the two strings together into a single file path
finalProduct = location + "\\" + newname
# initialises the writer and the filtering schema
writer = Document.Data.CreateDataWriter(DataWriterTypeIdentifiers.ExcelXlsDataWriter)
table = Document.ActiveDataTableReference  # assumption: 'table' was undefined in the original; the active table (or a script parameter) is the likely intent
filtering = Document.ActiveFilteringSelectionReference.GetSelection(table).AsIndexSet()
# Writes to file
stream = File.OpenWrite(finalProduct)
# Here I created a separate xls which would hold the file path. This
# file path would then be used by the invoked macro to find the correct folder.
names = []
for col in table.Columns:
    names.append(col.Name)
writer.Write(stream, table, filtering, names)
stream.Close()
# Location of the macro. As this will be stored centrally
# it will not change so it's okay to hardcode it in.
runMacro = "File location\macro name.xls"
# uses System.Diagnostics to run the macro declared above. The macro looks in the
# folder recorded in the second xls and auto-runs a VBA function to find the most
# up-to-date file
p = Process()
p.StartInfo.FileName = runMacro
p.Start()
Long story short: to run Excel macros from Spotfire, one solution is to use the System.Diagnostics approach I use above and simply have your macro set to auto-run.
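For illustration, a minimal sketch of what "set to auto run" can look like on the VBA side (DoStuff is a hypothetical stand-in for the real formatting routine):
' In the ThisWorkbook module of the macro workbook: fires as soon as
' Process.Start() opens the file.
Private Sub Workbook_Open()
    Call DoStuff              ' hypothetical routine applying the colour rules
    ThisWorkbook.Saved = True ' suppress the save prompt
    Application.Quit          ' close Excel when done, as described above
End Sub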

Filtering SAS datasets created with PROC IMPORT with dbms=xlsx

My client uses SAS 9.3 running on an AIX (IBM Unix) server. The client interface is SAS Enterprise Guide 5.1.
I ran into this really puzzling problem: when using PROC IMPORT in combination with dbms=xlsx, it seems impossible to filter rows based on the value of a character variable (at least, when we look for an exact match).
With an .xls file, the following import works perfectly well; the expected subset of rows is written to myTable:
proc import out=myTable(where=(myString EQ "ABC"))
    datafile="myfile.xls"
    dbms=xls replace;
run;
However, using the same data but this time in an .xlsx file, an empty dataset is created (having the right number of variables and adequate column types).
proc import out=myTable(where=(myString EQ "ABC"))
    datafile="myfile.xlsx"
    dbms=xlsx replace;
run;
Moreover, if we exclude the where from the PROC IMPORT, the data is seemingly imported correctly. However, filtering is still not possible. For instance, this will create an empty dataset:
data myFilteredTable;
    set myTable;
    where myString EQ "ABC";
run;
The following will work, but is obviously not satisfactory:
data myFilteredTable;
    set myTable;
    where myString LIKE "ABC%";
run;
Also note that:
- Using compress or other string functions does not help.
- Filtering using numerical columns works fine for both xls and xlsx files.
- My preferred method to read spreadsheets is to use Excel libnames, but this is technically not possible at this time.
I wonder if this is a known issue, I couldn't find anything about it so far. Any help appreciated.
It sounds like your strings have extra characters on the end that are not being picked up by compress. Try using the countc function on myString to see whether any extra characters exist at the end; once you know what they are, you can figure out what to remove with compress.
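A minimal sketch of that diagnosis, assuming the dataset and variable names from the question (the hex constants are common suspects, not confirmed culprits):
/* Hex-dump a few values: hidden characters show up immediately */
data _null_;
    set myTable(obs=5);
    put myString= $hex40.;
    if countc(myString, '0A0D09A0'x) > 0 then
        put 'Found line feeds, carriage returns, tabs or non-breaking spaces';
run;

/* Keep only printable characters, then the exact match works again */
data myFilteredTable;
    set myTable;
    myString = compress(myString, , 'kw'); /* k = keep, w = printable ("writable") chars */
    if myString = "ABC";
run;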

reading a big xls file into R

I have an Excel file with ~10000 rows and ~250 columns; currently I am using RODBC to do the importing:
channel <- odbcConnectExcel(xls.file="s:/demo.xls")
demo <- sqlFetch(channel,"Sheet_1")
odbcClose(channel)
But this way is a bit slow (it takes a minute or two to import), and the Excel file is originally encrypted; I have to remove the password to work on it, which I would prefer not to do. I wonder if there is a better way (i.e. one that imports faster and can import encrypted Excel files).
Thanks.
I recommend trying the XLConnect package instead of RODBC; a minimal example is below.
http://cran.r-project.org/web/packages/XLConnect/index.html
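A minimal sketch, assuming the sheet name from the question (note that, as far as I know, XLConnect cannot open password-protected workbooks either, so the password would still need to be removed):
# install.packages("XLConnect")  # requires a Java runtime
library(XLConnect)

# Read the sheet straight from the file; no ODBC channel to open or close
demo <- readWorksheetFromFile("s:/demo.xls", sheet = "Sheet_1")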
