excel macro, need an algorithm to compare two lists - excel

I need to make a macro to compare two columns looking for duplicate cells.
I'm currently using this simple double for loop algorithm
for i = 0 To ColumnASize
Cell1 = Sheet.getCellByPosition(0,i)
for j = 0 to ColumnBSi
Cell2 = Sheet.getCellByPosition(1,j)
' Comparison happens here
Next j
Next i
However, as I have 1000+ items in each column this algorithm is quite slow and inefficient. Does anyone here know/have an idea for a more efficient way to do this?

If you want to ensure that no string in col A is equal to any string in col B, then your existing algorithm is order n^2. You may be able to improve that by the following:
1) Sort col A or a copy of it (order nlogn)
2)Sort col B or a copy of it (order nlogn)
3) Look for duplicates by list traversal, see this previous answer (order n).
That should give you an order nlogn solution and I don't think you can do much better than that.

Related

VBA function for Upside/Downside Capture

apologies for my ignorance, I'm brand new to VBA - I'm sure this is a simple problem...
I'm trying to write a fn. for up/down side capture in VBA. This is the problem:
There are two columns. One has fund performance in % (I've labelled 'returns'). The other has index performance in % (labelled 'index'). Both are same length / same number of rows. I need both to be variables to enter to the fn.
For UpsideCapture fn., for all nos. in the index column >0, I want to find the corresponding number in the returns column (which will be on the same row). Once I have those numbers I can compound them.
I've tried using Offset, assuming the returns column is 15 columns to the left of the index column but it doesn't return anything, and I don't really want to rely on it always being 15 columns apart (it arbitrary).
Many thanks!
One of my rubbish attempts is below. Any help is much appreciated. Its really just a case of finding the correct corresponding row based on the value in the index column...
Function UpsideCapture(returns As Range, index As Range) As Variant
Dim n As Integer
Dim m As Integer
Dim i As Integer
n = returns.Rows.Count
m = index.Rows.Count
For i = 1 To m
If index(i) > 0 Then
Upsidecap = ((1 + Upsidecap) * (1 + Offset(returns(i), -15))) - 1
End If
Next
UpsideCapture = Upsidecap
End Function
example

Excel find remaining columns efficiently

I have a script (thanks to SO for the help with that one!) to allow a user to select a number of discontinuous columns and insert their indexes into an array. What I need to be able to do now is efficiently select the remaining columns i.e. the ones that the user didn't select into another array to perform a separate action on these columns.
For example, the user selects columns A,C,F,G and these indexes are put into the array Usr_col(). The remaining columns (B,D,E) need to be stored in the array rem_col()
All I can think of right now is to test every used column's index against the array of user-selected columns and, if it is not contained in that array, insert it into a new array. Something like this:
For i = 1 to ws.cells(1, columns.count).end(xltoright).column
if isinarray(i, Usr_col()) = false Then
rem_col(n) = i
n = n+1
end if
next
I am just looking for a more efficient solution to this.
I agree with #ScottHoltzman that this site wouldn't normally be the arena to make working code more efficient. However, this question puts a different slant on your previous one, as the most obvious solution would be to assign column numbers to one or other of your arrays in one loop.
The code below gives you a skeleton example. You'd need to check the user's selection for proper columns. Also, it isn't great form to redimension an array within the loop, but if the user selects adjacent columns then you'd need to acquire area count and column count to get the array size. I'll leave that to you if rediming within the loop jars with you:
Dim targetCols As Range, allCols As Range
Dim selColNum() As Long, unselColNum() As Long
Dim selIndex As Long, unselIndex As Long
Set targetCols = Application.InputBox("Select your columns", Type:=8)
For Each allCols In Sheet1.UsedRange.Columns
If Intersect(allCols, targetCols) Is Nothing Then
ReDim Preserve unselColNum(unselIndex)
unselColNum(unselIndex) = allCols.Column
unselIndex = unselIndex + 1
Else
ReDim Preserve selColNum(selIndex)
selColNum(selIndex) = allCols.Column
selIndex = selIndex + 1
End If
Next

Summing a Product in Excel, over the same parameters

I've been struggling with something in excel which is quite easy to do individual cases of using an array, but I want to do in a single cell.
Effectively, in row C I have the multipliers I need, lets call them i_k, for j from 1 to n. The equation I want to calculate in mathematical notation is;
Sigma(from j = 0 to n) (Pi(from k = j to n) (i_k))
But I'm not quite sure how best to go about this. Effectively it should be;
(i_1)^n + (i_2)^(n-1) + (i_3)^(n-2) + ...
In the end. Any help?
I dont know if this is an elegant solution but I think this might solve it for you.
The key is to make a table so that your formulas continue.
The screenshot I have taken explains the formula I have used with its explanation I have used on top of it.
-- The first column will have the intended values of I
-- The second column's first row will have the value of N
-- The third column will have a formula which says:
=IF(ISNUMBER([#[Values of N]]),[#[Values of N]],C2-1)
This will give you the decreasing value of N
Now just multiply N with I on the 4th column using a simple multiplication and add the final result:

Sorting data by range of numbers

I recieve table of data sorted by purchase order PO number in asc.
For further analysis of data, I'm supposed to sort it out by business directions (sectors or departments e.g.). There is a range of order numbers, each of them related to particular direction, like 1000-1999 direction A, 2000-2999 direction B.
Ideally, I need them to be automatically sorted in ascending order with additional sum row underneath and direction name above.
The problem is, that quantity of digits in number is not limited, which means that for some extra orders might be added one digit aside, or even slash and letter, like 2000/A, or 20,000 - and this last one is supposed to be in the range of 2000-2999 POs.
What might be used to solve the problem?
I'm not certain that this will solve your problem. This fucntion will successively shrink a string from the right until it is a number. You can enter this function into a vba module and then use it in your spreadsheet.
Function getnumber(s As String) As Integer
Dim i As Integer
For i = Len(s) To 0 Step -1
If IsNumeric(Left(s, i)) Then
getnumber = Left(s, i)
i = 0
End If
Next i
End Function
If this works, I'm glad. If it doesn't, you'll need to clarify what you're looking for.
The biggest thing that I would suggest is redefining your input to make this less tedious.

SumProduct over sets of cells (not contiguous)

I have a total data set that is for 4 different groupings. One of the values is the average time, the other is count. For the Total I have to multiply these and then divide by the total of the count. Currently I use:
=SUM(D32*D2,D94*D64,D156*D126,D218*D188)/SUM(D32,D94,D156,D218)
I would rather use a SumProduct if I can to make it more readable. I tried to do:
=SUMPRODUCT((D2,D64,D126,D188),(D32,D94,D156,D218))/SUM(D32,94,D156,D218)
But as you can tell by my posting here, that did not work. Is there a way to do SumProduct like I want?
I agree with the comment "It might be possible with masterful excel-fu, but even if it can be done, it's not likely to be more readable than your original solution"
A possible solution is to embed the CHOOSE() function within your SUMPRODUCT (this trick actually is pretty handy for vlookups, finding conditional maximums, etc.).
Example:
Let's say your data has eight observations and is in two columns (columns B and C) but you don't want to include some observations (exclude observations in rows 4 and 5). Then the SUMPRODUCT code looks like this...
=SUMPRODUCT(CHOOSE({1,2},A1:A3,A6:A8),CHOOSE({1,2},B1:B3,B6:B8))
I actually thought of this on the fly, so I don't know the limitations and as you can see it is not that pretty.
Hope this helps! :)
It might be possible with masterful excel-fu, but even if it can be done, it's not likely to be more readable than your original solution. The problem is that even after 20+ years, Excel still borks discontinuous ranges. Naming them won't work, array formulas won't work and as you see with SUMPRODUCT, they don't generally work in tuple-wise array functions. Your best bet here is to come up with a custom function.
UPDATE
You're question got me thinking about how to handle discontinuous ranges. It's not something I've had to deal with much in the past. I didn't have the time to give a better answer when you asked the question but now that I've got a few minutes, I've whipped up a custom function that will do what you want:
Function gvSUMPRODUCT(ParamArray rng() As Variant)
Dim sumProd As Integer
Dim valuesIndex As Integer
Dim values() As Double
For Each r In rng()
For Each c In r.Cells
On Error GoTo VBAIsSuchAPainInTheAssSometimes
valuesIndex = UBound(values) + 1
On Error GoTo 0
ReDim Preserve values(valuesIndex)
values(valuesIndex) = c.Value
Next c
Next r
If valuesIndex Mod 2 = 1 Then
For i = 0 To (valuesIndex - 1) / 2
sumProd = sumProd + values(i) * values(i + (valuesIndex + 1) / 2)
Next i
gvSUMPRODUCT = sumProd
Exit Function
Else
gvSUMPRODUCT = CVErr(xlErrValue)
Exit Function
End If
VBAIsSuchAPainInTheAssSometimes:
valuesIndex = 0
Resume Next
End Function
Some notes:
Excel enumerates ranges by column then row so if you have a continuous range where the data is organized by column, you have to select separate ranges: gvSUMPRODUCT(A1:A10,B1:B10) and not gvSUMPRODUCT(A1:B10).
The function works by pairwise multiplying the first half of cells with the second and then summing those products: gvSUMPRODUCT(A1,C3,L2,B2,G5,F4) = A1*B2 + C3*G5 + L2*F4. I.e. order matters.
You could extend the function to include n-wise multiplication by doing something like gvNSUMPRODUCT(n,ranges).
If there are an odd number of cells (not ranges), it returns the #VALUE error.
Note that sumproduct(a, b) = sumproduct(a1, b1) + sumproduct(a2, b2) where range a is split into ranges a1 and a2 (and similar for b)
It might be helpful to create an intermediate table that summarizes the data that you are using to calculate the sum product. That would also make the calculation easier to follow.

Resources