I have an Excel 2010 64-bit model with a single VBA subroutine that runs through 16 combinations of inputs, all processed by the same Excel model calculations, with the results written to tabs in the model. I have access to a high-performance cluster (HPC) and want to run the 16 combinations in parallel on it, instead of the current sequential process. How should I approach this? For example, do I need to put each combination into a separate subroutine and have a main VBA subroutine call each of them? Is there front-end and back-end VBA code that I need to include in order to run the model on the HPC?
Excel VBA does not directly allow multithreading, so unfortunately there is no simple VBA solution for this.
I can see a couple of options here; whether you can use them will depend on your problem.
In Excel 2007 and 2010, worksheet functions can execute in parallel. If your VBA code is a function and not a sub, and if most of your data comes from the worksheet, you could try to take advantage of that.
You could write a DLL that handles multithreading yourself, and call it from Excel. For this, you'd have to port your code to VB 6 or VB.NET (or straight up rewrite it in C/C++), and manually deal with multithreading.
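A third, simpler route that stays in VBA is to split the 16 combinations across separate Excel instances on one multi-core machine. This is only a sketch: the workbook path and macro name (`Model_1.xlsm`, `RunCombination`) are placeholders for your own model, and the result-collection step is elided.

```vba
' Controller: launch one hidden Excel instance per combination (sketch).
' NOTE: Application.Run is a blocking COM call, so calling it directly
' from here would still serialize the work. A common workaround is to
' have each remote workbook schedule its own macro with
' Application.OnTime (e.g. from Workbook_Open), so opening the
' workbook returns immediately while the work runs in the background.
Sub RunCombinationsInParallel()
    Dim apps(1 To 16) As Object
    Dim i As Long
    For i = 1 To 16
        Set apps(i) = CreateObject("Excel.Application") ' new instance
        apps(i).Visible = False
        ' One pre-made copy of the model per combination avoids
        ' file-lock contention; its Workbook_Open schedules the run.
        apps(i).Workbooks.Open "C:\Models\Model_" & i & ".xlsm"
    Next i
    ' Later: poll each instance for completion, copy the results back
    ' into this workbook, then Quit each Application.
End Sub
```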
Related
let's suppose I write the following VBA UDF:
Function TestFunction()
    TestFunction = 0
End Function
and then I use it in the first 100,000 rows of my sheet. It takes several minutes to execute.
If instead I use TODAY() for the same number of rows, it takes just 3-4 seconds.
Can anyone tell me why and if is there a way to speed up UDFs?
Thank you!
Several reasons.
VBA functions run sequentially, on the UI/main thread, and their compiled p-code needs to be interpreted by the VBA runtime.
Native functions are native. They're (presumably - AFAIK they're written in C++) already compiled to machine code that's readily executable and doesn't need to be recompiled and/or interpreted. Some native functions can also leverage multithreaded and "background" computing.
As for speeding up your UDFs, we'd need to see them for that. A function that does nothing other than assigning a literal return value doesn't have much room for optimization, does it?
UDFs are great. But they're not a silver bullet. If I wanted to write the value 0 to A1:A1000000, I'd do Sheet1.Range("A1:A1000000").Value = 0 and that would be near-instant.
Consider looking into macros rather than UDFs if you're going to have hundreds of thousands of them to calculate.
There are a number of different reasons for this.
For one, VBA UDFs are interpreted, whereas native Excel worksheet functions are compiled. You would get a big speed increase by compiling the same code as VB6, for example: VBA and VB6 code are almost exactly the same, so the gain would come purely from the code being compiled rather than interpreted.
VBA code also doesn't produce the same type of worksheet function that Excel does. VBA UDFs lack worksheet IntelliSense, for example, and there is no way to get it through VBA; you can get it through external add-ins, however (e.g. Excel-DNA).
Another reason is that VBA isn't the best API for writing performant UDFs; that would be the C API. But UDFs are harder to write against the C API than in VBA.
There are also a number of other things that could affect speed, like your underlying hardware, or the algorithm you're using in the UDF. It's hard to give you useful suggestions without seeing your code.
Are you sure you need UDFs? The only advantages UDFs have over macros (that I'm aware of, anyway) are that they don't clear the undo stack when called, whereas most macros do, and that they recalculate automatically, whereas you continuously have to rerun macros (unless you're using a worksheet event or something).
If you're doing a ton of calculations on a range of cells, it's probably better to just write the range to an array, manipulate it in VBA, and then just write it back to the range.
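A minimal sketch of that read-to-array pattern (the sheet name, range address, and the doubling operation are all arbitrary stand-ins for your own data and calculation):

```vba
' Read once, compute in memory, write once - instead of touching
' each cell individually from VBA.
Sub DoubleRangeValues()
    Dim data As Variant
    Dim r As Long, c As Long
    data = Sheet1.Range("A1:C10000").Value      ' single read into array
    For r = LBound(data, 1) To UBound(data, 1)
        For c = LBound(data, 2) To UBound(data, 2)
            data(r, c) = data(r, c) * 2         ' any calculation here
        Next c
    Next r
    Sheet1.Range("A1:C10000").Value = data      ' single write back
End Sub
```

The two bulk transfers replace up to 30,000 individual Range calls, which is usually where the time goes.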
I have an Excel "project" that includes a .dll where I have written some complex statistical calculations called through VBA. I have done that for speed reasons. The calculations take about a second each. Since they are called through VBA it stalls Excel for the duration of the calculations and that is acceptable. (The choice of Excel is not mine but a result of the way a third party has chosen to deliver data)
But for the purpose of the project I need to have the results of the calculations turn up after not one second but after ten. I could either expand the calculations for greater accuracy or simply include a pause in the code. But since it is done via VBA it stalls the whole project for all ten seconds and that is not acceptable.
I have looked into Excel-DNA, since it avoids VBA completely and might make it possible to do ALL that is done via VBA with Excel-DNA or existing built-in functions. I have modified this example for testing:
https://grumpyop.wordpress.com/2009/11/25/a-third-way-dna/
and included a simple Thread.Sleep(10000); in the code to simulate the pause. But that ALSO stalls Excel for the duration of the calculation.
Is there a way to include a pause in functions that doesn't make Excel wait for the result, but where the result is "pushed" to the cell / the cell "subscribes" to the result? Can it be done via Excel-DNA, an XLL, or a third solution? I would prefer a solution where I can use C, or very lightly modified C, since all the statistical functions are written in C.
You need to make your function asynchronous.
Excel supports this from Excel 2010.
https://msdn.microsoft.com/en-us/library/office/ff796219%28v=office.14%29.aspx
Excel-DNA also supports asynchronous functions:
https://exceldna.codeplex.com/wikipage?title=Asynchronous%20Functions
But you cannot use a VBA UDF to call an external resource asynchronously: the UDF has to be an XLL.
This is a question that I have always had but never really gave much thought.
What I have at the moment is a worksheet that displays data, and the user refreshes whenever needed. So:
User triggers a VBA Function
VBA Function gathers data and analyses WHILE USER WAITS
VBA Function dumps the result on the spreadsheet
User continues viewing data
Since the data analysis is all done internally in VBA (No use of workbook, only recordsets, arrays, library etc.) I wanted to somehow be able to allow the user to continue viewing the original data, while VBA works on getting and analyzing new data.
I know you can't use the workbook AND run VBA at the same time, but you can have two Excel instances and work in one workbook while the other runs VBA.
So could I somehow have my original excel instance call another excel instance and have it run the VBA while I work on my first instance?
Any Ideas?
(Also, not sure if the tag "Multithreading" is technically correct)
First thing - there is no multithreading for VBA in Excel.
Second thing - since Excel 2007, Excel supports multithreaded recalculation of formulas.
Therefore to approach multithreading calculations in Excel you can do at least 2 things:
Create a second instance of Excel (new Application instance! Not a new workbook within the same Application!) and execute the macro remotely.
Create UDFs (User-Defined Functions) in VBA. Unfortunately a UDF cannot edit other cells, but you can save the results of your computations in a global variable and then print the results.
My recommendation - go with option 2.
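For reference, option 1 looks roughly like this; the workbook path and macro name below are placeholders, not anything from the question:

```vba
Sub RunMacroInSecondInstance()
    Dim xl As Object
    ' CreateObject starts a brand-new Excel Application,
    ' not a new workbook inside the current one.
    Set xl = CreateObject("Excel.Application")
    xl.Visible = True
    xl.Workbooks.Open "C:\Work\Analysis.xlsm"
    ' Caveat: Run blocks this instance until the remote macro
    ' returns; to keep working here, have the remote workbook
    ' schedule its macro with Application.OnTime instead.
    xl.Run "Analysis.xlsm!RefreshData"
End Sub
```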
I haven't been able to try this, but it seems you can launch Excel from VBA using Application.FollowHyperlink. The hyperlink would have to be the local path to the workbook. You might also have to use VBA to make a copy of the workbook first.
Have you thought through the concurrency issues with having two copies?
VBA is known to run on only one processor at a time, so when I run a macro, Excel uses only 50% of the CPU instead of all of it (dual-core).
Is there a workaround to make VBA use both processors?
Thanks.
The answer is no, unless you split the workload across multiple Excel instances and recombine the results at the end. But the vast majority of cases of slow VBA execution can be sped up by orders of magnitude:
switch off screen updating
set calculation to manual
read and write excel data using large ranges and variant arrays rather than cell-by-cell
look closely at the coding in VBA UDFs and bypass the VBE refresh bug
minimise the number of calls to the Excel object model
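The first two items on that list typically bracket a long-running macro like this (a generic skeleton, not specific to any workbook; the error handler makes sure the settings are restored even if the work fails):

```vba
Sub FastMacroSkeleton()
    Dim prevCalc As XlCalculation
    prevCalc = Application.Calculation
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Application.EnableEvents = False
    On Error GoTo CleanUp

    ' ... the actual work, ideally reading and writing the sheet
    '     via large ranges and variant arrays ...

CleanUp:
    Application.EnableEvents = True
    Application.Calculation = prevCalc
    Application.ScreenUpdating = True
End Sub
```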
You could split the problem up across multiple instances of Excel.Applications, that way you could do half in one copy of Excel and half in the other.
I've seen someone doing simulation work in Excel VBA using 8 cores. A bit of a pain to bring the results back together at the end, but it worked.
In Excel 2010
Go to File --> Options --> Advanced.
Scroll down until you see the section for Formulas and you can set the number of processors to be used when calculating formulas.
This may not have an effect on VBA, but it will work for running a formula down an entire column. It's a big time saver when working with large workbooks. Additionally, you can go to Task Manager, right-click on Excel, and set its priority to High so that your machine focuses its resources there.
** This option does not appear to be available for Mac **
Excel's SUMIFS function can't execute if the other workbook is closed. So I wrote a SUMIFS-like function that opens my workbook, iterates via a 'for' loop, and checks whether the value in the column needs to be added to my total variable.
I then wrote another function that removes the 'for' loop and uses WorksheetFunction.SumIfs(...) instead. My new function ran faster than the old one.
What is the magic behind excel functions and VBA iteration?
From: http://msdn.microsoft.com/en-us/library/ff726673.aspx#xlUsingFuncts
(emphasis added)
User-Defined Functions
User-defined functions that are programmed in C or C++ and that use the C API (XLL add-in functions) generally perform faster than user-defined functions that are developed using VBA or Automation (XLA or Automation add-ins). For more information, see Developing Excel 2010 XLLs.

XLM functions can also be fast, because they use the same tightly coupled API as C XLL add-in functions. The performance of VBA user-defined functions is sensitive to how you program and call them.

Faster VBA User-Defined Functions

It is usually faster to use the Excel formula calculations and worksheet functions than to use VBA user-defined functions. This is because there is a small overhead for each user-defined function call and significant overhead transferring information from Excel to the user-defined function. But well-designed and called user-defined functions can be much faster than complex array formulas.
One way to get more insight is to test how the performance ratio changes with different input sizes. For example, if the performance ratio remains about the same when the input size increases 100-fold, then it is probably due to VBA abstraction overhead.
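One rough way to run that experiment is with the Timer function (which has only about 10 ms resolution on Windows, so keep the ranges large); the sheet and range addresses here are arbitrary stand-ins for wherever your UDF lives:

```vba
' Compare recalculation time at two input sizes to see how the
' UDF-vs-native performance ratio scales.
Sub TimeRecalc()
    Dim t As Double
    t = Timer
    Sheet1.Range("A1:A1000").Calculate          ' small input
    Debug.Print "1k rows: "; Timer - t; " s"
    t = Timer
    Sheet1.Range("A1:A100000").Calculate        ' 100x larger input
    Debug.Print "100k rows: "; Timer - t; " s"
End Sub
```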