Clean data in excel that comes in varying formats - excel

I have an excel table that contain values in these formats. The tables span over 30000 entries.
I need to clean this data so that only the numbers directly after V- are left. This would mean that when the value is SV-51140r3_rule, V-4407..., I would only want 4407 to remain and when the value is SV-245744r822811_rule, I would only want 245744 to remain. I have about 10 formulas that can handle these variations, but it requires a lot of manual labor. I've also used the text to column feature of excel to clean this data as well, but it takes about 30 minutes to an hour to go through the whole document. I'm looking for ways that I can streamline this process so that one formula or function can handle all of these different variations. I'm open to using VBA but don't have a whole lot of experience with it and I am unable to use Pandas or any IDE or programming language. Help please!!
I've used text to columns to clean data that way and I've used a variation of this formula
=IFERROR(RIGHT(A631,LEN(A631)-FIND("#",SUBSTITUTE(A631,"-","#",LEN(A631)-LEN(SUBSTITUTE(A631,"-",""))))),A631)

Depending on your version of Excel, either of these should work. If you have the ability to use the Let function, it will improve your performance, as this outstanding article articulates.
If you're on a really old version of excel, you'll need to hit ctl shift enter to make array formula work.
While these look daunting, all these functions are doing is finding the last V (by this function) =SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","") and then looping through each character and only returning numbers.
Obviously the mushroom 🍄 could be any character that one would consider improbable to appear in the actual data.
Old School
=TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9))),1)+0),
MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄","")),9^9))),1),""))
Let Function
(use this if you can)
=LET(zText,SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1)+0),
MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1),"")))
VBA Custom Function
You could also use a VBA custom function to accomplish what you want.
Function getNumbersAfterCharcter(aCell As Range, aCharacter As String) As String
Const errorValue = "#NoValuesInText"
Dim i As Long, theValue As String
For i = Len(aCell.Value) To 1 Step -1
theValue = Mid(aCell.Value, i, 1)
If IsNumeric(theValue) Then
getNumbersAfterCharcter = Mid(aCell.Value, i, 1) & getNumbersAfterCharcter
ElseIf theValue = aCharacter Then
Exit Function
End If
Next i
If getNumbersAfterCharcter = "" Then getNumbersAfterCharcter = errorValue
End Function

Related

Can you access a VBA list with an in-cell Excel formula?

I wrote a VBA script/macro which runs when a change is detected in a specific range (n x m) of cells. Then, it changes the values in another range (n x 1) based on what is detected in the first range.
This bit works perfectly ... but then comes the age old erased undo stack problem. Unfortunately, the ability for the user to undo their last ~10 or so actions is required.
My understanding is that the undo stack is only cleared when VBA directly edits something on the sheet - but it is preserved if the VBA is just running in the back without editing the sheet.
So my question is: Is it possible to use an in cell formula (something like below) to pull values from a VBA array?
'sample of in-cell function in cell A3
=function_to_get_value_from_vba_array(vba_array, index_of_desired_value)
Essentially, VBA would store a 1D array of strings with the values needed for the range. And by using a formula to grab the value from the array: I might be able to get around the issue of the undo stack being erased.
Thanks!
Solution
You need to do something like the following: your argument for the function should be calling the array bulding; I created one dummy function that creates some sample arrays to demonstrate it. In your case, likely you will need to store the changes on the worksheet event in a global array variable instead, and as you stated, do nothing on the worksheet (whenever a change happens, just redim or appended it on your global array as needed). However, a new problem may arise and that is when you close/reopen, or by some reason the array is lost, so you need to keep track of it, I would suggest to catch before close event and then convert the formulas to static values.
Function vba_array(TxtCase As String)
Dim ArrDummy(1) As Variant
Select Case TxtCase
Case "Txt": ArrDummy(0) = "Hi": ArrDummy(1) = "Hey"
Case "Long": ArrDummy(0) = 0: ArrDummy(1) = 1
Case "Boolean": ArrDummy(0) = True: ArrDummy(1) = False
End Select
vba_array = ArrDummy
End Function
In your calling function, do the following
Function get_value_from_vba_array(vba_array() As Variant, index_of_desired_value As Long) As Variant
'when parsing, even with option base 0 it starts at 1, so we need to add 1 up
get_value_from_vba_array = vba_array(index_of_desired_value + 1)
End Function
In your book, your formula should be something like:
=get_value_from_vba_array(vba_array("Txt"),1)
Demo
I did some actions before, so you are able to see that the "undo" works

Filter phone numbers from open text field - Power BI, excel, VBA

I have a text field in a table where I need to substitute phone numbers where applicable.
For example the text field could have:
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
Sometimes a phone number will be in the text but not always and the phone number entered will always be different.
Is there a measure to use to replace the phone numbers with no text.
Ideally the solution would be Power BI, but can also be done in the raw data using excel or VBA
Regular expression in VBA (excel) or Python (Power BI) is a straightforward solution.
I have never used PowerBI with Python before but manage to make following python script.
In PowerBI transformation steps I created a new column that would copy [message] columns and named it [noPhoneNumber], then next step ran this python script
import re
def removePhone(x):
return re.sub('\d{10,11}', "**number removed**", x)
length = len(dataset["noPhoneNumber"])
for iRow in range(length):
dataset["noPhoneNumber"][iRow] = removePhone(dataset["noPhoneNumber"][iRow])
so column "noPhoneNumber"
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
becomes
Call me on **number removed** immediately
Call me on **number removed**
I need assistance please contact me
Good service
In VBA Preferable create UDF (user defined function) and don't create a subroutine, that would be too error prone for this kind of problem.
[Added]
If you need to make a Excel based solution, you can create a UDF function like so:
(remember early binding to import of VBScript_RegExp_55.RegExp in excel)
Function removePhoneNumber(text As String, Optional replacement As String = "**number removed**") As String
Dim regex As New RegExp
regex.Pattern = "\d{10,11}"
removePhoneNumber = regex.Replace(text, replacement)
End Function
...and then use excel function like so:
=removePhoneNumber(A2),
=removePhoneNumber(A3)
and so on...
A simple VBA function alternative
Function removePhone(s As String) As String
Const DELIM As String = " "
Dim i As Long, tokens As Variant
tokens = Split(s, DELIM)
For i = LBound(tokens) To UBound(tokens)
If IsNumeric(tokens(i)) Then
tokens(i) = "*Removed*" ' << change to your needs
Exit For ' assuming a single phone number per string
End If
Next
removePhone = Join(tokens, DELIM)
End Function
You can do this in Power Query. Create a custom column with this below code. I have considered the column name is Comments but please adjust this with your column name.
if Text.Length(Text.Select([comments], {"0".."9"})) = 11
then
Text.Replace(
[comments],
Text.Select([comments], {"0".."9"}),
""
)
else [comments]
Here is the output below. You can also replace phone numbers with other text like #### to make is anonymous.
NOTE
This will only work if there are only 1 number in the string with length 11 (You can adjust the length in code as per requirement).
This will Not work if there are more than one Numbers in the string.
If there are 1 number in the string but length not equal 11, this will keep the whole string as original.

Way too long of a function but I need it - can it be done in VBA?

Because of severe lack of knowledge, I made a ridiculously long function so that I could make my calculation. The problem is that it is too long for Excel, and I tried looking online to see how I could maybe make a new function in VBA that referenced my function. I'm super lost on this one and any help would be awesome. The function would just be too messy to post here (it is 30k characters long).
Ok so here it goes - here's a part of the function:
=+IF(ISERROR(IF(LEFT(C12,FIND(" ",C12,1))=$C$2,SUMPRODUCT(P12:S12,Selection!$B$4:Selection!$E$4),IF(LEFT(C12,FIND(" ",C12,1))=$C$3,SUMPRODUCT(P12:S12,Selection!$B$5:Selection!$E$5),IF(LEFT(C12,FIND(" ",C12,1))=$C$4,SUMPRODUCT(P12:S12,Selection!$B$6:Selection!$E$6),IF(LEFT(C12,FIND(" ",C12,1))=$C$5,SUMPRODUCT(P12:S12,Selection!$B$7:Selection!$E$7),IF(RIGHT(C12,LEN($C$6))=$C$6,SUMPRODUCT(P12:S12,Selection!$B$8:Selection!$E$8),IF(RIGHT(C12,LEN($C$7))=$C$7,SUMPRODUCT(P12:S12,Selection!$B$9:Selection!$E$9),IF(RIGHT(C12,LEN($C$8))=$C$8,SUMPRODUCT(P12:S12,Selection!$B$10:Selection!$E$10),SUMPRODUCT(P12:S12,Selection!$B$11:Selection!$E$11))))))))),1,IF(LEFT(C12,FIND(" ",C12,1))=$C$2,SUMPRODUCT(P12:S12,Selection!$B$4:Selection!$E$4),IF(LEFT(C12,FIND(" ",C12,1))=$C$3,SUMPRODUCT(P12:S12,Selection!$B$5:Selection!$E$5),IF(LEFT(C12,FIND(" ",C12,1))=$C$4,SUMPRODUCT(P12:S12,Selection!$B$6:Selection!$E$6),IF(LEFT(C12,FIND(" ",C12,1))=$C$5,SUMPRODUCT(P12:S12,Selection!$B$7:Selection!$E$7),IF(RIGHT(C12,LEN($C$6))=$C$6,SUMPRODUCT(P12:S12,Selection!$B$8:Selection!$E$8),IF(RIGHT(C12,LEN($C$7))=$C$7,SUMPRODUCT(P12:S12,Selection!$B$9:Selection!$E$9),IF(RIGHT(C12,LEN($C$8))=$C$8,SUMPRODUCT(P12:S12,Selection!$B$10:Selection!$E$10),SUMPRODUCT(P12:S12,Selection!$B$11:Selection!$E$11)))))))))
To answer your question "Can it be done in VBA?", the answer is yes. If you can do it with excel functions you can do it with VBA. There may be functional disadvantages though, and coding the function will have completely different syntax (although it is usually pretty easy to translate using google searches).
One thing to consider before you go to VBA though is could the function be broken up into multiple cells? This might suit your needs and get around the character limit, although if it's 30k characters long this might not be practical or even possible to do this.
I would recommend starting out by literally googling "VBA equivalent of excel function XXXX" for each excel function you use. Then work your way inside out from the middle parentheses to perform operations on the inputs in the same order as your excel function. The main difference between VBA functions and Excel functions is that you can perform operations on the same variable line by line instead of using complicated order of operations.
For example, instead of putting =if(a2>3,b3+5,b3-5)*if(A1>3,B2+3,B2-3), you could put:
Function Your_Function_Name1(Cell_one As Range, Cell_two As Range, _
Cell_three As Range, Cell_four As Range) As Double
If Cell_four > 3 Then
If Cell_three > 3 Then
Your_Function_Name1 = (Cell_one.Value + 5) * (Cell_two.Value + 3)
Else
Your_Function_Name1 = (Cell_one.Value - 5) * (Cell_two.Value + 3)
End If
Else
If Cell_three > 3 Then
Your_Function_Name1 = (Cell_one.Value + 5) * (Cell_two.Value - 3)
Else
Your_Function_Name1 = (Cell_one.Value - 5) * (Cell_two.Value - 3)
End If
End If
End Function
and call by =Your_Function_Name1(B3,B2,A2,A1). But it is also perfectly legitimate and usually easier to do this instead:
Function Your_Function_Name(Cell_one As Range, Cell_two As Range, _
Cell_three As Range, Cell_four As Range) As Double
If Cell_three > 3 Then
Your_Function_Name = Cell_one.Value + 5
Else
Your_Function_Name = Cell_one.Value - 5
End If
If Cell_four > 3 Then
Your_Function_Name = Your_Function_Name * (Cell_two.Value + 3)
Else
Your_Function_Name = Your_Function_Name * (Cell_two.Value - 3)
End If
End Function
Both of these functions would be called the same way and yield the same result.
I think that should be enough to get you started, although you will probably end up be posting another question or two once you get into it and start debugging, but at least you will have specific code to ask about. VBA is hard at first but it is worth the time you put into it.
Good Luck!
The following guideline is a great way both to refactor your existing code, and to write new code in future:
For every block of code that has, or is big enough to have, a descriptive comment, make a subroutine and name it (in PascalCase) with the descriptive comment. Identify all local variables and redeclare them in the new subroutine. Pass in all global values as named parameters.
Rinse and Repeat until all subroutines are less than 40 lines or so.
You can cut your function in half by using IFERROR rather than IF(ISERROR
also, your Selection!$B$4:Selection!$E$4 can be reduced to Selection!$B$4:$E$4
=IFERROR(IF(LEFT(C12,FIND(" ",C12,1))=$C$2,SUMPRODUCT(P12:S12,Selection!$B$4:$E$4),IF(LEFT(C12,FIND(" ",C12,1))=$C$3,SUMPRODUCT(P12:S12,Selection!$B$5:$E$5),IF(LEFT(C12,FIND(" ",C12,1))=$C$4,SUMPRODUCT(P12:S12,Selection!$B$6:$E$6),IF(LEFT(C12,FIND(" ",C12,1))=$C$5,SUMPRODUCT(P12:S12,Selection!$B$7:$E$7),IF(RIGHT(C12,LEN($C$6))=$C$6,SUMPRODUCT(P12:S12,Selection!$B$8:$E$8),IF(RIGHT(C12,LEN($C$7))=$C$7,SUMPRODUCT(P12:S12,Selection!$B$9:$E$9),IF(RIGHT(C12,LEN($C$8))=$C$8,SUMPRODUCT(P12:S12,Selection!$B$10:$E$10),SUMPRODUCT(P12:S12,Selection!$B$11:$E$11)))))))),1)
Now that worked for me, but your test of LEFT(C12,FIND(" ",C12,1))=$C$2 seems suspect. If C12 contain Cat in the House, the left side would evaluate to
"Cat "
with a space on the end. That would be fine in the cells you are testing against contain a space on the end, but I would guess they don't. You might want to make the text
LEFT(C12,FIND(" ",C12,1)-1)=$C$2

VBA subroutine slows down a lot after first execution

I have a subroutine that generates a report of performance of different portfolios within 5 families. The thing is that the portfolios in question are never the same and the amount in each family neither. So, I copy paste a template (that is formated and...) and add the formated row (containing the formula and...) in the right family for each portfolio in the report. Everything works just fine, the code is not optimal and perfect of course, but it works fine for what we need. The problem is not the code itself, it is that when I execute the code the first time, it goes really fast (like 1 second)... but from the second time, the code slows down dramatically (almost 30 second for a basic task identical to the first one). I tried all the manual calculation, not refreshing the screen and ... but it is really not where the problem comes from. It looks like a memory leak to me, but I cannot find where is the problem! Why would the code runs very fast but sooooo much slower right after... Whatever the length of the report and the content of the file, I would need to close excel and reopen it for each report.
**Not sure if I am clear, but it is not because the code makes the excel file larger or something, because after the first (fast) execution, if I save the workbook, close and reopen it, the (new) first execution will again be very fast, but if I would have done the same excat thing without closing and reopening it would have been very slow...^!^!
Dim Family As String
Dim FamilyN As String
Dim FamilyP As String
Dim NumberOfFamily As Integer
Dim i As Integer
Dim zone As Integer
Sheets("RapportTemplate").Cells.Copy Destination:=Sheets("Rapport").Cells
Sheets("Rapport").Activate
i = 3
NumberOfFamily = 0
FamilyP = Sheets("RawDataMV").Cells(i, 4)
While (Sheets("RawDataMV").Cells(i, 3) <> "") And (i < 100)
Family = Sheets("RawDataMV").Cells(i, 4)
FamilyN = Sheets("RawDataMV").Cells(i + 1, 4)
If (Sheets("RawDataMV").Cells(i, 3) <> "TOTAL") And _
(Sheets("RawDataMV").Cells(i, 2) <> "Total") Then
If (Family <> FamilyP) Then
NumberOfFamily = NumberOfFamily + 1
End If
With Sheets("Rapport")
.Rows(i + 8 + (NumberOfFamily * 3)).EntireRow.Insert
.Rows(1).Copy Destination:=Sheets("Rapport").Rows(i + 8 + (NumberOfFamily * 3))
.Cells(i + 8 + (NumberOfFamily * 3), 6).Value = Sheets("RawDataMV").Cells(i, 2).Value
.Cells(i + 8 + (NumberOfFamily * 3), 7).Value = Sheets("RawDataMV").Cells(i, 3).Value
End With
End If
i = i + 1
FamilyP = Family
Wend
For i = 2 To 10
If Sheets("Controle").Cells(16, i).Value = "" Then
Sheets("Rapport").Cells(1, i + 11).EntireColumn.Hidden = True
Else
Sheets("Rapport").Cells(1, i + 11).EntireColumn.Hidden = False
End If
Next i
Sheets("Rapport").Cells(1, 1).EntireRow.Hidden = True
'Define printing area
zone = Sheets("Rapport").Cells(4, 3).End(xlDown).Row
Sheets("Rapport").PageSetup.PrintArea = "$D$4:$Y$" & zone
Sheets("Rapport").Calculate
Sheets("RANK").Calculate
Sheets("SommaireGroupeMV").Calculate
Sheets("SommaireGroupeAlpha").Calculate
Application.CutCopyMode = False
End Sub
I do not have laptop with me at the moment but you may try several things:
use option explicit to make sure you declare all variables before using them;
from what I remember native vba type for numbers is not integer but long, and integers are converted to long, to save the computation time use long instead of integers;
your Family variables are defined as strings but you store in them whole cells and not their values i.e. =cells() instead of =cells().value;
a rule of a thumb is to use cells(rows.count, 4).end(xlup).row
instead of cells(3, 4).end(xldown).row.;
conditional formatting may slow down things a lot;
use for each loop on a range if possible instead of while, or even copy range to variant array and iterate over that (that is the fastest solution);
use early binding rahter of late binding, i.e., define objects in a proper type as soon a possible;
do not show printing area (page breaks etc.);
try to do some pofiling and look for the bottlenecks - see finding excel vba bottlenecks;
paste only values if you do not need formats;
clear clipboard after each copy/paste;
set objects to Nothing after finishing using them;
use Value2 instead of Value - that will ignore formatting and take only numeric value instead of formatted value;
use sheet objects and refer to them, for example
Dim sh_raw As Sheet, sh_rap As Sheet
set sh_raw = Sheets("RawDataMV")
set sh_rap = Sheets("Rapport")
and then use sh_raw instead of Sheets("RawDataMV") everywhere;
I had the same problem, but I finally figured it out. This is going to sound ridiculous, but it has everything to do with print page setup. Apparently Excel recalculates it every time you update a cell and this is what's causing the slowdown.
Try using
Sheets("Rapport").DisplayPageBreaks = False
at the beginning of your routine, before any calculations and
Sheets("Rapport").DisplayPageBreaks = True
at the end of it.
I had the same problem. I am far from expert programer. The above answers helped my program but did not solve the problem. I'm running excel 2013 on a 5 year old lap top. Open the program without running it, go to File>OptionsAdvanced, Scroll down to Data and uncheck "Disable undo for large Pivot table refresh...." and "Disable undo for large data Model operation". You could also try leaving them checked but decreasing their value. One or both of these seem to be creating a ever increase file that slows the macro and eventual grinds it to a stop. I assume closing excel clears the files they create so that's why it runs fast when excel is closed and reopened at least for a while. Someone with more knowledge will have to explain what these changes will do and what the consequences are of unchecking them. It appears these changes will be applied to any new spread sheets you create. Maybe these changes would not be necessary if I had a newer more powerful computer.

Excel UDF calculation should return 'original' value

I have created a VSTO plugin with my own RTD implementation that I am calling from my Excel sheets. To avoid having to use the full-fledged RTD syntax in the cells, I have created a UDF that hides that API from the sheet.
The RTD server I created can be enabled and disabled through a button in a custom Ribbon component.
The behavior I want to achieve is as follows:
If the server is disabled and a reference to my function is entered in a cell, I want the cell to display Disabled.
If the server is disabled, but the function had been entered in a cell when it was enabled (and the cell thus displays a value), I want the cell to keep displaying that value.
If the server is enabled, I want the cell to display Loading.
Sounds easy enough. Here is an example of the - non functional - code:
Public Function RetrieveData(id as Long)
Dim result as String
// This returns either 'Disabled' or 'Loading'
result = Application.Worksheet.Function.RTD("SERVERNAME", "", id)
RetrieveData = result
If(result = "Disabled") Then
// Obviously, this recurses (and fails), so that's not an option
If(Not IsEmpty(Application.Caller.Value2)) Then
// So does this
RetrieveData = Application.Caller.Value2
End If
End If
End Function
The function will be called in thousands of cells, so storing the 'original' values in another data structure would be a major overhead and I would like to avoid it. Also, the RTD server does not know the values, since it also does not keep a history of it, more or less for the same reason.
I was thinking that there might be some way to exit the function which would force it to not change the displayed value, but so far I have been unable to find anything like that.
EDIT:
Due to popular demand, some additional info on why I want to do all this:
As I said, the function will be called in thousands of cells and the RTD server needs to retrieve quite a bit of information. This can be quite hard on both network and CPU. To allow the user to decide for himself whether he wants this load on his machine, they can disable the updates from the server. In that case, they should still be able to calculate the sheets with the values currently in the fields, yet no updates are pushed into them. Once new data is required, the server can be enabled and the fields will be updated.
Again, since we are talking about quite a bit of data here, I would rather not store it somewhere in the sheet. Plus, the data should be usable even if the workbook is closed and loaded again.
Different tack=new answer.
A few things I've discovered the hard way, that you might find useful:
1.
In a UDF, returning the RTD call like this
' excel equivalent: =RTD("GeodesiX.RTD",,"status","Tokyo")
result = excel.WorksheetFunction.rtd( _
"GeodesiX.RTD", _
Nothing, _
"geocode", _
request, _
location)
behaves as if you'd inserted the commented function in the cell, and NOT the value returned by the RTD. In other words, "result" is an object of type "RTD-function-call" and not the RTD's answer. Conversely, doing this:
' excel equivalent: =RTD("GeodesiX.RTD",,"status","Tokyo")
result = excel.WorksheetFunction.rtd( _
"GeodesiX.RTD", _
Nothing, _
"geocode", _
request, _
location).ToDouble ' or ToString or whetever
returns the actual value, equivalent to typing "3.1418" in the cell. This is an important difference; in the first case the cell continues to participate in RTD feeding, in the second case it just gets a constant value. This might be a solution for you.
2.
MS VSTO makes it look as though writing an Office Addin is a piece of cake... until you actually try to build an industrial, distributable solution. Getting all the privileges and authorities right for a Setup is a nightmare, and it gets exponentially worse if you have the bright idea of supporting more than one version of Excel. I've been using Addin Express for some years. It hides all this MS nastiness and let's me focus on coding my addin. Their support is first-rate too, worth a look. (No, I am not affiliated or anything like that).
3.
Be aware that Excel can and will call Connect / RefreshData / RTD at any time, even when you're in the middle of something - there's some subtle multi-tasking going on behind the scenes. You'll need to decorate your code with the appropriate Synclock blocks to protect your data structures.
4.
When you receive data (presumably asynchronously on a separate thread) you absolutely MUST callback Excel on the thread on which you were intially called (by Excel). If you don't, it'll work fine for a while and then you'll start getting mysterious, unsolvable crashes and worse, orphan Excels in the background. Here's an example of the relevant code to do this:
Imports System.Threading
...
Private _Context As SynchronizationContext = Nothing
...
Sub New
_Context = SynchronizationContext.Current
If _Context Is Nothing Then
_Context = New SynchronizationContext ' try valiantly to continue
End If
...
Private Delegate Sub CallBackDelegate(ByVal GeodesicCompleted)
Private Sub GeodesicComplete(ByVal query As Query) _
Handles geodesic.Completed ' Called by asynchronous thread
Dim cbd As New CallBackDelegate(AddressOf GeodesicCompleted)
_Context.Post(Function() cbd.DynamicInvoke(query), Nothing)
End Sub
Private Sub GeodesicCompleted(ByVal query As Query)
SyncLock query
If query.Status = "OK" Then
Select Case query.Type
Case Geodesics.Query.QueryType.Directions
GeodesicCompletedTravel(query)
Case Geodesics.Query.QueryType.Geocode
GeodesicCompletedGeocode(query)
End Select
End If
' If it's not resolved, it stays "queued",
' so as never to enter the queue again in this session
query.Queued = Not query.Resolved
End SyncLock
For Each topic As AddinExpress.RTD.ADXRTDTopic In query.Topics
AddinExpress.RTD.ADXRTDServerModule.CurrentInstance.UpdateTopic(topic)
Next
End Sub
5.
I've done something apparently akin to what you're asking in this addin. There, I asynchronously fetch geocode data from Google and serve it up with an RTD shadowed by a UDF. As the call to GoogleMaps is very expensive, I tried 101 ways and several month's of evenings to keep the value in the cell, like what you're attempting, without success. I haven't timed anything, but my gut feeling is that a call to Excel like "Application.Caller.Value" is an order of magnitude slower than a dictionary lookup.
In the end I created a cache component which saves and re-loads values already obtained from a very-hidden spreadsheet which I create on the fly in Workbook OnSave. The data is stored in a Dictionary(of string, myQuery), where each myQuery holds all the relevant info.
It works well, fulfils the requirement for working offline and even for 20'000+ formulas it appears instantaneous.
HTH.
Edit: Out of curiosity, I tested my hunch that calling Excel is much more expensive than doing a dictionary lookup. It turns out that not only was the hunch correct, but frighteningly so.
Public Sub TimeTest()
Dim sw As New Stopwatch
Dim row As Integer
Dim val As Object
Dim sheet As Microsoft.Office.Interop.Excel.Worksheet
Dim dict As New Dictionary(Of Integer, Integer)
Const iterations As Integer = 100000
Const elements As Integer = 10000
For i = 1 To elements + 1
dict.Add(i, i)
Next
sheet = _ExcelWorkbook.ActiveSheet
sw.Reset()
sw.Start()
For i As Integer = 1 To iterations
row = 1 + Rnd() * elements
Next
sw.Stop()
Debug.WriteLine("Empty loop " & (sw.ElapsedMilliseconds * 1000) / iterations & " uS")
sw.Reset()
sw.Start()
For i As Integer = 1 To iterations
row = 1 + Rnd() * elements
val = sheet.Cells(row, 1).value
Next
sw.Stop()
Debug.WriteLine("Get cell value " & (sw.ElapsedMilliseconds * 1000) / iterations & " uS")
sw.Reset()
sw.Start()
For i As Integer = 1 To iterations
row = 1 + Rnd() * elements
val = dict(row)
Next
sw.Stop()
Debug.WriteLine("Get dict value " & (sw.ElapsedMilliseconds * 1000) / iterations & " uS")
End Sub
Results:
Empty loop 0.07 uS
Get cell value 899.77 uS
Get dict value 0.15 uS
Looking up a value in a 10'000 element Dictionary(Of Integer, Integer) is over 11'000 times faster than fetching a cell value from Excel.
Q.E.D.
Maybe... Try making your UDF wrapper function non-volatile, that way it won't get called unless one of its arguments changes.
This might be a problem when you enable the server, you'll have to trick Excel into calling your UDF again, it depends on what you're trying to do.
Perhaps explain the complete function you're trying to implement?
You could try Application.Caller.Text This has the drawback of returning the formatted value from the rendering layer as text, but seems to avoid the circular reference problem.Note: I have not tested this hack under all possible circumstances ...

Resources