Here is the result of a profiled simulation run of my MATLAB program. I need to run this simulation a very large number of times (~100,000 runs).
Thus I need a faster way to read the Excel file.
Specifications: each Excel sheet is 10000x2 cells, and each simulation run reads one such sheet from each of 5 separate Excel files.
UPDATE: I switched xlsread to basic mode and also reduced the number of calls by combining my input into a single file. The next target is xlswrite. Ah, that sinking feeling. :|
NOTE: Although writing to a CSV file using dlmwrite is very fast (around 20 times faster), I need the convenience of the separate sheets that an .xls file provides.
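For reference, this is roughly the comparison I mean (a minimal sketch for one 10000x2 sheet; the exact timings depend on the machine):

    % One sheet's worth of data (placeholder values)
    M = rand(10000, 2);

    tic
    xlswrite('out.xls', M, 'Sheet1');   % goes through ActiveX/Excel
    tXls = toc;

    tic
    dlmwrite('out.csv', M);             % plain text, no Excel involved
    tCsv = toc;

    fprintf('xlswrite: %.2f s, dlmwrite: %.2f s\n', tXls, tCsv);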
I don't think you would be able to wring much out of xlswrite if you need Excel sheets as the output.
How about parallelizing?
Do you have access to the parallel computing toolbox? Or maybe you can run two instances of MATLAB if your box supports it. If so, you could consider two approaches:
Have the first process do the xlsread part and the simulation part, then write to .mat files / plain binary / CSV, whatever is fastest while preserving your data integrity. Have another process convert those .mat/intermediate data files into Excel using xlswrite (a minimal sketch of this split is given below).
Have N MATLAB instances/workers (N depends on your physical machine's capacity). Parallelize the whole read-process-write pipeline across the N workers. Note, I am not sure how well Excel scales when called by N workers! (xlswrite uses ActiveX/MS Excel to write the data.)
As with any parallel approach, your mileage will vary depending on the complexity of the simulation versus the required file I/O and its performance.
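A minimal sketch of the first approach (the file names, sheet layout and runSimulation are placeholders for your actual setup):

    % --- Process 1: read inputs, run the simulation, dump results as fast binary .mat files ---
    inputs = cell(1, 5);
    for sh = 1:5
        inputs{sh} = xlsread('combined_input.xls', sh);   % one 10000x2 input sheet each
    end
    for runIdx = 1:numRuns                                % numRuns: however many runs you need
        result = runSimulation(inputs);                   % placeholder for the actual simulation
        save(sprintf('result_%06d.mat', runIdx), 'result');
    end

    % --- Process 2 (a second MATLAB instance): convert the .mat files to Excel ---
    matFiles = dir('result_*.mat');
    for k = 1:numel(matFiles)
        tmp = load(matFiles(k).name);
        xlswrite('results.xls', tmp.result, sprintf('run%d', k));   % one sheet per result
    end

The point of the split is that process 1 is never blocked by the slow ActiveX round-trips; only the conversion process pays that cost.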
Related
I have big Excel files, thousands of rows and columns, gigabytes in size.
When I work inside these files in Excel, it throttles and lags, and sometimes it just gets stuck and freezes.
When I open Task Manager, I see that Excel isn't even using one full CPU.
RAM usage isn't overloaded either.
How do I make Excel use all my cores?
Excel 2019.
I see you are familiar with Python. Why don't you move your databases to Python? Actually, I don't know its capabilities; I work in R. Some of my data in Excel and even in *.csv form takes 100-200 MB, while in *.Rdata format the same information is less than 5 MB. R functions work faster than heavy Excel.
And yes, how to make Excel use all the processor's cores interests me too.
I developed a vb.net program that uses Excel files to generate some reports.
Because the program takes a long time to generate a report, I usually do other things while it is running. The problem is that sometimes I need to open other Excel files, and the Excel files used by the program are shown to me as well. I want to keep the files being processed hidden even when I open other Excel files. Is this possible? Thanks
The FileSystem.Lock Method controls access by other processes to all or part of a file opened by using the Open function.
The My feature gives you better productivity and performance in file I/O operations than Lock and Unlock. For more information, see FileSystem.
More information here.
I have a class that does some parsing of two large CSV files (~90K rows, 11 columns in the first and ~20K rows, 5 columns in the second). According to the specification I'm working with, the CSV files can be changed externally (rows removed or added; the columns remain constant, as do the paths). Such updates can happen at any time (though it is highly unlikely that updates will arrive less than a couple of minutes apart), and an update of either file has to terminate the current processing of all that data (CSV, XML from an HTTP GET request, UDP telegrams), followed by re-parsing the content of both files (or just one, if only one has changed).
I keep the CSV data in memory (quite reduced, since I apply multiple filters to remove unwanted entries) to speed up working with it and to avoid unnecessary I/O operations (opening, reading, closing files).
Right now I'm looking into the QFileSystemWatcher, which seems to be exactly what I need. However I'm unable to find any information on how it actually works internally.
Since all I need is to monitor 2 files for changes, the number of files shouldn't be an issue. Do I need to run it in a separate thread (since the watcher is part of the same class where the CSV parsing happens), or is it safe to say that it can run without too much fuss (that is, it works asynchronously, like QNetworkAccessManager)? My dev environment for now is a 64-bit Ubuntu VM (VirtualBox) on a relatively powerful host (an HP Z240 workstation); however, the target system is an embedded one. While the whole parsing of the CSV files takes 2-3 seconds at most, I don't know how much performance impact there will be once the application is deployed, so additional overhead is something of a concern for me.
I have a VBA script that extracts information from huge text files and does a lot of data manipulation and calculations on the extracted info. I have about 1000 files, and each one takes an hour to finish.
I would like to run the script on as many computers (among others, EC2 instances) as possible to reduce the time needed to finish the job. But how do I coordinate the work?
I have tried two ways. First, I set up Dropbox as a network drive with one txt file holding the number of the last job started; the VBA reads it, starts the next job and updates the number, but there is apparently too much lag before a change to a file on one computer propagates to the rest for this to be practical. The second was to find a simple "private" counter service online that increments on each visit, so a machine would access the page, read the number, and the page would update the number for the next visit from another computer. But I have found no such service.
Any suggestions on how to coordinate such tasks between different computers in VBA?
First of all, if you can, use a proper programming language, for example C#, for easy parallel processing.
If you must use VBA, then optimize your code first. Can you show us the code?
Edit:
If you must, then you could do the following. First you need some sort of file server that stores all the text files in one folder.
Then, in the macro, for each .txt file in the folder:
try to open it in exclusive mode; if it can be opened, run your code and, once your code is finished, move the file elsewhere; otherwise move on to the next .txt file.
I recently wrote a program in MATLAB which relies heavily on MATLAB's 'importdata' function and the 'lsqcurvefit' function from the Optimization Toolbox. This code takes approximately 15 seconds to execute using MATLAB R2011b on Windows. When I transferred the code to a Linux (CentOS) machine, it took approximately 30 minutes.
Using the Profile tool, I determined that the bulk of the additional computation time was spent on the 'importdata' function and the 'lsqcurvefit' function. I cleared all variables in both environments, and I imported identical data files in both environments using 'importdata'. In Linux, this took ~5 seconds, whereas in Windows, this took ~0.1 seconds.
Is there any way to fix this problem? It's absolutely essential that the code operate rapidly in Linux. Both the memory and the processing speed of the Linux machine far exceed that of the Windows machine. After doing some reading, I tried increasing the Java heap memory, but this had no effect.
importdata itself is really only a wrapper which calls various other functions - for example, xlsread - depending on the type of input data. Since it takes multiple types of file as input (you can even load images with it, although why you would is another matter), it needs to work out what the file is, then call the appropriate function.
dlmread, on the other hand, only takes a specific type of file (ASCII-delimited numeric).
In Short:
Never use one-size-fits-all functions like importdata when you can use more specific functions. You can see a list of file formats and specific read/write functions here.
I don't think this is limited to Linux, either. A thousand repeats, loading a small tab-delimited file on a Windows machine (a rough sketch of one way to reproduce such a timing follows the numbers):
dlmread : 0.756655 seconds
csvread : 1.332842 seconds
importdata : 69.508252 seconds
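For anyone who wants to repeat the comparison, a rough tic/toc sketch along these lines should do (the file name is a placeholder, and the absolute numbers will differ by machine and MATLAB version):

    fname = 'small_tab_delimited.txt';   % any small tab-delimited numeric file
    nReps = 1000;

    tic
    for k = 1:nReps, dlmread(fname, '\t'); end
    tDlm = toc;

    tic
    for k = 1:nReps, importdata(fname); end
    tImport = toc;

    fprintf('dlmread:    %f seconds\nimportdata: %f seconds\n', tDlm, tImport);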