Unique Count Formula for large dataset - excel

I am having trouble finding a way to enter a 1 or 0 in an adjacent cell to indicate whether or not a value is unique when working with a large dataset. I have read about multiple methods for accomplishing this, but none of them seem efficient enough for my purposes. I am using Excel 2010, so I do not have the Distinct Count feature in PivotTables, and when I try to use PowerPivot it crashes my file due to processing limitations.
In this StackOverflow question: Simple Pivot Table to Count Unique Values there are suggestions to use SUMPRODUCT or COUNTIF, but with 50,000+ rows, as in my case, these cause terrible performance and a file size of ~35 MB instead of ~3 MB. I would like to know whether there is a better solution for a large, dynamic dataset, whether it is a formula or VBA.
An example of what I would like to accomplish is (with the Unique column being the adjacent cell):
Name Week Unique
John 1 1
Sally 1 1
John 1 0
Sally 2 1
I attempted to replicate the COUNTIF behavior in VBA, but with no success:
For Each Cell In ThisWorkbook.Worksheets("Overtime & Type Data").Range("Z2:Z" & DataLastRow)
If Worksheets("Overtime & Type Data").Cells(Cell.Row, 26) <> Worksheets("Overtime & Type Data").Cells(Cell.Row - 1, 26) Then
FirstCell = Cell.Row
End If
If (Worksheets("Overtime & Type Data").Range(Cells(FirstCell, 26), Cells(Cell.Row, 26)) = Worksheets("Overtime & Type Data").Range(Cells(Cell.Row, 26))) = True Then
Cell.Value = 1
Else
Cell.Value = 0
End If
Next Cell

This code ran on over 130,000 rows successfully in less than 3 seconds. Adjust the column letters to fit your dataset.
Sub tgr()
Const colName As String = "A"
Const colWeek As String = "B"
Const colOutput As String = "C"
Dim ws As Worksheet
Dim rngData As Range
Dim DataCell As Range
Dim rngFound As Range
Dim collUniques As Collection
Dim arrResults() As Long
Dim ResultIndex As Long
Dim UnqCount As Long
Set ws = ThisWorkbook.Sheets("Overtime & Type Data")
Set rngData = ws.Range(colName & 2, ws.Cells(ws.Rows.Count, colName).End(xlUp))
Set collUniques = New Collection
ReDim arrResults(1 To rngData.Cells.Count, 1 To 1)
On Error Resume Next
For Each DataCell In rngData.Cells
ResultIndex = ResultIndex + 1
collUniques.Add ws.Cells(DataCell.Row, colName).Value & "|" & ws.Cells(DataCell.Row, colWeek).Value, ws.Cells(DataCell.Row, colName).Value & "|" & ws.Cells(DataCell.Row, colWeek).Value 'the "|" separator keeps pairs such as "A1"+"1" and "A"+"11" distinct
If collUniques.Count > UnqCount Then
UnqCount = collUniques.Count
arrResults(ResultIndex, 1) = 1
Else
arrResults(ResultIndex, 1) = 0
End If
Next DataCell
On Error GoTo 0
ws.Cells(rngData.Row, colOutput).Resize(rngData.Cells.Count).Value = arrResults
End Sub

One approach is to sort by Name and Week. Then you can determine Unique for any row by comparing with the previous row.
If you need to preserve the order, you could first write a column of Index numbers (1, 2, 3, ...) to keep track of order. After calculating Unique, sort by Index to restore the original order.
The whole process could be done manually with relatively few steps, or automated with VBA.
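The sort-then-compare idea can be sketched in VBA as a function that works on an in-memory array, so the sheet is read once and written once. This is a minimal sketch, assuming the rows have already been sorted by Name (column A) and then Week (column B), with headers in row 1; FlagFirstInSortedRuns and MarkUniqueAfterSort are hypothetical names, not part of the answer.

```vba
' Minimal sketch (hypothetical names): flags the first row of each Name/Week
' group in data that has ALREADY been sorted by Name, then by Week.
' data is a 2-D array such as Range("A2:B100").Value (columns 1 and 2).
Public Function FlagFirstInSortedRuns(data As Variant) As Variant
    Dim r As Long, lo As Long, hi As Long
    Dim flags() As Long
    lo = LBound(data, 1)
    hi = UBound(data, 1)
    ReDim flags(1 To hi - lo + 1, 1 To 1)
    For r = lo To hi
        If r = lo Then
            flags(1, 1) = 1 ' the first row always starts a new group
        ElseIf data(r, 1) = data(r - 1, 1) And data(r, 2) = data(r - 1, 2) Then
            flags(r - lo + 1, 1) = 0 ' same pair as the row above: not unique
        Else
            flags(r - lo + 1, 1) = 1 ' the pair changed: first occurrence
        End If
    Next r
    FlagFirstInSortedRuns = flags
End Function

' Example: data sorted by A then B, headers in row 1, at least one data row
Public Sub MarkUniqueAfterSort()
    Dim ws As Worksheet, n As Long
    Set ws = ActiveSheet
    n = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    ' read once, compute in memory, write once
    ws.Range("C2:C" & n).Value = FlagFirstInSortedRuns(ws.Range("A2:B" & n).Value)
End Sub
```

Comparing each row only with the one above is what makes this linear: no row is ever compared with the whole column, unlike COUNTIF over an expanding range.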

I'm not sure how well this will work with 50000 values, but it goes through ~1500 in about a second.
Sub unique()
Dim myColl As New Collection
Dim isDup As Boolean
Dim myValue As String
Dim r As Long
On Error GoTo DuplicateValue
For r = 1 To Sheet1.UsedRange.Rows.Count
isDup = False
'Combine the value of the 2 cells together
' and add that string to our collection
'If it is already in the collection it errors
myValue = Sheet1.Cells(r, 1).Value & "|" & Sheet1.Cells(r, 2).Value
myColl.Add r, myValue
If isDup Then
Sheet1.Cells(r, 3).Value = "0"
Else
Sheet1.Cells(r, 3).Value = "1"
End If
Next
On Error GoTo 0
Exit Sub
DuplicateValue:
'The value is already in the collection so put a 0
isDup = True
Resume Next
End Sub

Just about any bulk operation will beat a loop involving worksheet cells. You might be able to trim the time down a bit by performing all of the calculations in memory and only returning the values back to the worksheet en masse when it is complete.
Sub is_a_dupe()
Dim v As Long, vTMP As Variant, vUNQs As Variant, dUNQs As Object
Debug.Print Timer
On Error GoTo bm_Uh_Oh
Set dUNQs = CreateObject("Scripting.Dictionary")
With Worksheets("Sheet1")
vTMP = .Range(.Cells(2, 1), .Cells(Rows.Count, 2).End(xlUp)).Value2
ReDim vUNQs(1 To UBound(vTMP, 1), 1 To 1)
For v = LBound(vTMP, 1) To UBound(vTMP, 1)
If dUNQs.Exists(Join(Array(vTMP(v, 1), vTMP(v, 2)))) Then
vUNQs(v, 1) = 0
Else
dUNQs.Add Key:=Join(Array(vTMP(v, 1), vTMP(v, 2))), _
Item:=vTMP(v, 2)
vUNQs(v, 1) = 1
End If
Next v
.Cells(2, 3).Resize(UBound(vUNQs, 1), 1) = vUNQs
End With
Debug.Print Timer
bm_Uh_Oh:
dUNQs.RemoveAll
Set dUNQs = Nothing
End Sub
In my experience, the variety of the data (as well as the hardware, etc.) affects the timing, but with my random sample data I measured these elapsed times:
 50K records ..... 0.53 seconds
130K records .... 1.32 seconds
500K records .... 4.92 seconds

Related

Counter is working, but how to make it list counted values?

I have a macro with a counter for unique values that meet specific conditions. As you can see in the image, I have a list of unique values in column F. The macro checks whether a value is listed in column AE (which can contain duplicate lines) and whether there is no "OB" in column AH. It then returns how many values it found in cell K2. I also need this counter to list these values in column AD, but I am struggling to make it happen. I checked many forums and have already managed to crash Excel twice. Any ideas how to achieve it?
Dim myTbl As range, mStr As String, Miss As Long, xCol As Variant
Set myTbl = Sheets("OB").range("AE2") '
xCol = "AH"
mStr = ""
Set myTbl = range(myTbl, myTbl.End(xlDown).Offset(0, 1))
xCol = Cells(1, xCol).Column - myTbl.Cells(1, 1).Column + 1
For i = 1 To myTbl.Rows.count
If myTbl.Cells(i, 1) <> "" Then
If myTbl.Cells(i, xCol) <> "OB" And InStr(1, mStr, "##" & myTbl.Cells(i, 1), vbTextCompare) = 0 Then
mStr = mStr & "##" & myTbl.Cells(i, 1)
Miss = Miss + 1
End If
End If
Next i
If Miss > 0 Then
range("K2") = Miss & " still active"
range("K2").Font.ColorIndex = 46
Else
range("K2") = "None"
range("K2").Font.ColorIndex = 10
End If
Please test the following code. It can also return how many occurrences of each 'Value x' have been found (if there is more than one of each):
Sub ExtractUniqueCondValues()
Dim sh As Worksheet, lastR As Long, arr, i As Long, dict As Object
Set sh = Sheets("OB")
lastR = sh.Range("AE" & sh.rows.count).End(xlUp).row
arr = sh.Range("AE2:AH" & lastR).Value
Set dict = CreateObject("Scripting.Dictionary")
For i = 1 To UBound(arr)
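If arr(i, 1) <> "" And arr(i, 4) <> "OB" Then dict(arr(i, 1)) = dict(arr(i, 1)) + 1 'skip blank values, count the rest
Next i
sh.Range("K2").Value = dict.count
If dict.count = 0 Then Exit Sub 'nothing to list; Resize(0, 1) would raise an error
sh.Range("AD2").Resize(dict.count, 1).Value = Application.Transpose(dict.Keys)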
End Sub
As for the number of occurrences of each 'Value x' element, the code can also return them in an adjacent column, e.g. 'Value 2 | 1' and 'Value 4 | 2' for the case in your picture, if that is relevant for your purpose. The dictionary already holds this data.
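Since the dictionary already holds the counts, one way to return them next to the keys is to copy keys and items into a two-column array and write it in one go. This is a minimal sketch; DictToKeyCountArray is a hypothetical helper name, and AA2 in the usage comment is only an example of a free two-column area (writing two columns starting at AD would overwrite column AE, which holds the source data).

```vba
' Minimal sketch (hypothetical name): copies a Scripting.Dictionary whose
' items are occurrence counts into a (1 To n, 1 To 2) array:
' column 1 = key, column 2 = count. Returns Empty for an empty dictionary.
Public Function DictToKeyCountArray(dict As Object) As Variant
    Dim out() As Variant, k As Variant, r As Long
    If dict.Count = 0 Then Exit Function ' nothing to write
    ReDim out(1 To dict.Count, 1 To 2)
    For Each k In dict.Keys ' keys come back in insertion order
        r = r + 1
        out(r, 1) = k
        out(r, 2) = dict(k)
    Next k
    DictToKeyCountArray = out
End Function

' Example use at the end of ExtractUniqueCondValues:
'   Dim res As Variant
'   res = DictToKeyCountArray(dict)
'   If Not IsEmpty(res) Then sh.Range("AA2").Resize(UBound(res, 1), 2).Value = res
```

Building the 2-D array directly also avoids the need for Application.Transpose on the Keys array.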
Maybe using formulas is an option for you? See column G: the following formula is in G2 and copied down.
=IF(COUNTIFS(AE:AE,F2,AH:AH,"<>OB")>0,F2,"")
Using Count or Countifs may be an option instead of VBA.

How to find a value in another sheet and get min and max values from adjacent columns

I am new to VBA macros and need some expert help with the requirement below.
I have a workbook containing 2 sheets called 'Data' and 'Stats'.
'Data' contains the values as below
'Stats' contains the values as below
When the button is clicked, I would like to:
Get the values in column A of the 'Stats' sheet
Find all the matching rows in the 'Data' sheet
Find the smallest start time and put it in the 'Stats' sheet against the stage value
Find the biggest end time and put it in the 'Stats' sheet against the stage value
The final output would look like the one below.
Note: I do not have the MINIFS or MAXIFS in my installation.
In case you don't have MINIFS and MAXIFS, you can use array formulas like so:
={MIN(IF(Stats!A1=Data!$A$1:$A$1000,Data!$C$1:$C$1000))}
and
={MAX(IF(Stats!A1=Data!$A$1:$A$1000,Data!$B$1:$B$1000))}
The {} indicates that this is an array formula. Don't type the braces yourself: enter the formula and confirm it with Ctrl + Shift + Enter, and Excel adds them.
No VBA needed.
Just use the following formula in your Stats worksheet for Start:
=MINIFS(Data!A:A,Data!C:C,Stats!A:A)
and the following for End:
=MAXIFS(Data!B:B,Data!C:C,Stats!A:A)
Please also test the VBA solution. It should be very fast, because it uses arrays, processes everything in memory and drops the result at once:
Sub BringStats()
Dim shD As Worksheet, shS As Worksheet, lastRD As Long, lastRS As Long
Dim arrD, arrS, i As Long, k As Long, dict As Object, El As Variant
Set shD = Worksheets("Data")
Set shS = Worksheets("Stats")
lastRD = shD.Range("A" & rows.count).End(xlUp).row
lastRS = shS.Range("A" & rows.count).End(xlUp).row
arrD = shD.Range("A2:C" & lastRD).Value
arrS = shS.Range("A2:C" & lastRS).Value
Set dict = CreateObject("Scripting.dictionary")
'load the dictionary with unique keys and all corresponding date in a string, as item
For i = 1 To UBound(arrD)
If Not dict.Exists(arrD(i, 3)) Then
dict.Add arrD(i, 3), CDate(arrD(i, 1)) & ";" & CDate(arrD(i, 2))
Else
dict(arrD(i, 3)) = dict(arrD(i, 3)) & "|" & CDate(arrD(i, 1)) & ";" & CDate(arrD(i, 2))
End If
Next
Dim arr As Variant, minTime As Date, minPos As Long
For i = 1 To UBound(arrS)
If dict.Exists(arrS(i, 1)) Then
arr = Split(dict(arrS(i, 1)), "|") 'extract each pair of time stamps
If UBound(arr) > 0 Then
For Each El In arr 'extract the element containing minimum time
If minTime = 0 Then
minTime = TimeValue(Split(El, ";")(0)): minPos = k
Else
If TimeValue(Split(El, ";")(0)) < minTime Then minTime = TimeValue(Split(El, ";")(0)): minPos = k
End If
k = k + 1
Next
arrS(i, 2) = Split(arr(minPos), ";")(0): arrS(i, 3) = Split(arr(minPos), ";")(1) 'load the array with the minimum time correspondent values
Else
arrS(i, 2) = Split(dict(arrS(i, 1)), ";")(0): arrS(i, 3) = Split(dict(arrS(i, 1)), ";")(1)'loading the array in case of only one occurrence
End If
End If
minPos = 0: minTime = 0: k = 0 'reinitialize the used variables
Next i
'drop the processed array at once
shS.Range("A2").Resize(UBound(arrS), UBound(arrS, 2)).Value = arrS
End Sub
There can be a lot of the same 'stage' occurrences...
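Note that BringStats writes the end time that belongs to the row with the earliest start, which is not necessarily the biggest end time the question asks for. Below is a minimal sketch that tracks both values independently in one pass. It assumes the same layout as BringStats (Data: A = start, B = end, C = stage; Stats: A = stage, B = start, C = end), that the times are real Excel date/time values rather than text, that stage values have the same type on both sheets (the dictionary treats the number 1 and the text "1" as different keys), and that Stats has at least two stages so that .Value returns an array. StageMinMax and BringStatsMinMax are hypothetical names.

```vba
' Minimal sketch (hypothetical names): earliest start and latest end per stage,
' tracked independently. dataArr: Data!A:C as a 2-D array (A = start,
' B = end, C = stage). stages: Stats!A as a 2-D (n, 1) array.
Public Function StageMinMax(dataArr As Variant, stages As Variant) As Variant
    Dim dict As Object, i As Long, pair As Variant, out() As Variant
    Set dict = CreateObject("Scripting.Dictionary")
    ' pass 1: stage -> Array(minStart, maxEnd)
    For i = LBound(dataArr, 1) To UBound(dataArr, 1)
        If dict.Exists(dataArr(i, 3)) Then
            pair = dict(dataArr(i, 3))
            If dataArr(i, 1) < pair(0) Then pair(0) = dataArr(i, 1)
            If dataArr(i, 2) > pair(1) Then pair(1) = dataArr(i, 2)
            dict(dataArr(i, 3)) = pair ' store the updated copy back
        Else
            dict.Add dataArr(i, 3), Array(dataArr(i, 1), dataArr(i, 2))
        End If
    Next i
    ' pass 2: look up each requested stage; unknown stages stay Empty
    ReDim out(LBound(stages, 1) To UBound(stages, 1), 1 To 2)
    For i = LBound(stages, 1) To UBound(stages, 1)
        If dict.Exists(stages(i, 1)) Then
            pair = dict(stages(i, 1))
            out(i, 1) = pair(0)
            out(i, 2) = pair(1)
        End If
    Next i
    StageMinMax = out
End Function

' Example: fills Stats!B:C in one write
Public Sub BringStatsMinMax()
    Dim shD As Worksheet, shS As Worksheet, lastRD As Long, lastRS As Long
    Set shD = Worksheets("Data")
    Set shS = Worksheets("Stats")
    lastRD = shD.Cells(shD.Rows.Count, "A").End(xlUp).Row
    lastRS = shS.Cells(shS.Rows.Count, "A").End(xlUp).Row
    shS.Range("B2:C" & lastRS).Value = _
        StageMinMax(shD.Range("A2:C" & lastRD).Value, shS.Range("A2:A" & lastRS).Value)
End Sub
```

Keeping running minimum/maximum values avoids concatenating and re-splitting strings, and it keeps the date part of the values instead of comparing TimeValue only.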

VBA (RFC) SAP export to excel

I am writing a VB application that connects to an SAP system (using RFC).
Everything works fine, and I do get a connection and the data as well.
However, the code that saves the retrieved data and writes it to an Excel file is really slow.
After connecting, I call RFC_READ_TABLE, which returns a result in <5 secs, which is perfect. Writing to Excel (cell by cell) is pretty slow.
Is there any way to 'export' the whole tblData to Excel without depending on writing cell by cell?
Thanks in advance!
If RFC_READ_TABLE.Call = True Then
MsgBox tblData.RowCount
If tblData.RowCount > 0 Then
' Write table header (Size must be computed before it is used as the loop bound)
Size = UBound(ColumnNames, 1) - LBound(ColumnNames, 1) + 1
For j = 1 To Size
Cells(1, j).Value = ColumnNames(j)
Next j
For i = 1 To tblData.RowCount
DoEvents
Textzeile = tblData(i, "WA")
For j = 1 To Size
Cells(i + 1, j).Value = LTrim(RTrim(getPieceOfTextzeile(Textzeile)))
Next j
Next
Else
MsgBox "No entries found in system " & SYSID, vbInformation
End If
Else
MsgBox "ERROR CALLING SAP REMOTE FUNCTION CALL"
End If
Arrays: Faster Than Ranges
If the data were ready (i.e., did not need to be processed), something like this could be a solution:
Sub Sap()
Const cStrStart As String = "A1" 'First cell of the resulting data
Dim tbldata
Dim arrSap As Variant 'Will become a one-based two dimensional array
Dim oRng As Range
arrSap = tbldata 'Data is in the array.
'Calculate the range: Must be the same size as arrSap
Set oRng = Range(cStrStart).Resize(UBound(arrSap, 1), UBound(arrSap, 2)) 'start cell resized to the array's dimensions
oRng = arrSap 'Paste array into range.
End Sub
Since you need to process your data from tbldata, do that processing not on the range but on an array, which should be much faster:
Sub Sap()
Const cStrStart As String = "A1" 'First cell of the resulting data
Dim arrSap() As Variant
Dim oRng As Range
Dim Size As Integer
If RFC_READ_TABLE.Call = True Then
'-------------------------------------------------------------------------------
MsgBox tbldata.RowCount
If tbldata.RowCount > 0 Then
Size = UBound(ColumnNames, 1) - LBound(ColumnNames, 1) + 1
ReDim arrSap(1 To tbldata.RowCount + 1, 1 To Size) '+ 1 for header
' Write table header
For j = 1 To Size
arrSap(1, j) = ColumnNames(j) 'array elements have no .Value property
Next j
' Write data
For i = 1 + 1 To tbldata.RowCount + 1 '+ 1 for header
DoEvents
'- 1 due to header, don't know what "WA" is
Textzeile = tbldata(i - 1, "WA")
For j = 1 To Size
arrSap(i, j) = _
LTrim(RTrim(getPieceOfTextzeile(Textzeile)))
Next j
Next
'-------------------------------------------------------------------------------
'Calculate the range: Must be the same size as arrSap
Set oRng = Range(Cells(Range(cStrStart).Row, Range(cStrStart).Column), _
Cells(UBound(arrSap) + Range(cStrStart).Row -1, _
UBound(arrSap, 2) + Range(cStrStart).Column -1))
oRng = arrSap
'-------------------------------------------------------------------------------
Else
MsgBox "No entries found in system " & SYSID, vbInformation
End If
Else
MsgBox "ERROR CALLING SAP REMOTE FUNCTION CALL"
End If
End Sub
Now adjust the cStrStart, check the rest of the code and you're good to go.
I haven't built a working example, so I edited this code a few times. Check it carefully so you don't lose data.
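Following the same advice, the parsing itself can also happen entirely in memory. getPieceOfTextzeile is not shown in the question, so the sketch below assumes, hypothetically, that each WA line is fixed-width and that you know each column's 0-based offset and length (as far as I know, RFC_READ_TABLE reports these in its FIELDS table, but check your own call). BuildSapArray is a hypothetical name; collect the WA strings into a 1-D array first, then write the result once.

```vba
' Hypothetical sketch: builds the complete output (header row + data rows)
' in memory from fixed-width text lines, so the sheet is written only once.
' headers: 1-D array of column names; lines: 1-D array of row strings;
' offsets/lengths: 1-D arrays with each column's 0-based start and width.
Public Function BuildSapArray(headers As Variant, lines As Variant, _
                              offsets As Variant, lengths As Variant) As Variant
    Dim nCols As Long, nRows As Long, r As Long, c As Long
    Dim out() As Variant
    nCols = UBound(headers) - LBound(headers) + 1
    nRows = UBound(lines) - LBound(lines) + 1
    ReDim out(1 To nRows + 1, 1 To nCols) ' +1 for the header row
    For c = 1 To nCols
        out(1, c) = headers(LBound(headers) + c - 1)
    Next c
    For r = 1 To nRows
        For c = 1 To nCols
            ' Mid$ is 1-based, hence offset + 1; Trim$ removes the padding
            out(r + 1, c) = Trim$(Mid$(lines(LBound(lines) + r - 1), _
                offsets(LBound(offsets) + c - 1) + 1, lengths(LBound(lengths) + c - 1)))
        Next c
    Next r
    BuildSapArray = out
End Function

' Example: Range("A1").Resize(UBound(out, 1), UBound(out, 2)).Value = out
```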

Array Formula into Regular one

Hi everyone, I am using an array formula to calculate (in the example above)
the count of unique customers who purchased fewer than 5 units of Product 1 and whose area code matches the adjacent cell in column D.
I use the following array formula in E11:
=SUM(IF(FREQUENCY(IF($G$2:$G$7=D11,
IF($I$2:$I$7="Product 1",IF($J$2:$J$7<5,IF($E$2:$E$7<>"",
MATCH($E$2:$E$7,$E$2:$E$7,0))))),ROW($E$2:$E$7)-ROW(G2)+1),1))
This formula works great, but when I use it on a very large database containing tons of rows and columns, Excel takes about 3 minutes to calculate just one cell, which makes it impossible to keep working like that.
Is there any way to convert this array formula into a regular one? Any help will be greatly appreciated. Thanks in advance.
Sorry for the late answer.
I created a UDF that is designed to do the calculation for several areas without scanning the whole range multiple times.
Public Function getCounts(AreaStr As Variant, AreaRng As Range, CustomerRng As Range, ProductRng As Range, SalesRng As Range, Optional ProductName As String = "Product 1", Optional lessThan As Double = 5) As Variant
'make sure AreaStr is an array
If TypeOf AreaStr Is Range Then AreaStr = AreaStr.Value2
If Not IsArray(AreaStr) Then
AreaStr = Array(AreaStr)
ReDim Preserve AreaStr(1 To 1)
End If
'shorten the range (this way you can use whole columns)
If SalesRng(SalesRng.Cells.Count).Formula = "" Then Set SalesRng = SalesRng.Parent.Range(SalesRng.Cells(1), SalesRng(SalesRng.Cells.Count).End(xlUp))
'make sure all ranges have the same size
Set AreaRng = AreaRng.Resize(SalesRng.Rows.Count)
Set CustomerRng = CustomerRng.Resize(SalesRng.Rows.Count)
Set ProductRng = ProductRng.Resize(SalesRng.Rows.Count)
'Load values in variables to increase speed
Dim SalesValues As Variant, UserValues As Variant, ProductValues As Variant
SalesValues = AreaRng
UserValues = CustomerRng
ProductValues = ProductRng
'create temporary arrays to hold the values
Dim buffer() As Variant, expList() As Variant
ReDim buffer(1 To UBound(UserValues))
ReDim expList(1 To UBound(AreaStr), 1 To 1)
Dim i As Long, j As Double, k As Long
For i = 1 To UBound(AreaStr)
expList(i, 1) = buffer
Next
buffer = Array(buffer, buffer)
buffer(0)(1) = 0
For i = 1 To UBound(UserValues)
If ProductValues(i, 1) = ProductName Then 'this customer purchased our product
j = Application.IfError(Application.Match(UserValues(i, 1), buffer(0), 0), 0)
If j = 0 Then 'first time this customer in this calculation
j = i
buffer(0)(j) = UserValues(i, 1) 'remember the customer name (to not calculate him again later)
If Application.SumIfs(SalesRng, CustomerRng, UserValues(i, 1), ProductRng, ProductName) < lessThan Then
buffer(1)(j) = 1 'customer got less than "lessThan" -> remember that
End If
End If
If buffer(1)(j) = 1 Then 'check if we need to count the customer
k = Application.IfError(Application.Match(SalesValues(i, 1), AreaStr, 0), 0) 'check if the area is one of the areas we are looking for
If k Then expList(k, 1)(j) = 1 'it is -> set 1 for this customer/area combo
End If
End If
Next
For i = 1 To UBound(AreaStr) 'sum each area
expList(i, 1) = Application.Sum(expList(i, 1))
Next
getCounts = expList 'output array
End Function
I assume that you will be able to add it as a UDF without my help.
In the sheet you would use (for your example) in E11:E16:
=getCounts(D11:D16,G2:G7,E2:E7,I2:I7,J2:J7)
Simply select E11:E16, enter the formula, and then confirm it with Ctrl + Shift + Enter (CSE).
You could also use =getCounts(D11,$G$2:$G$7,$E$2:$E$7,$I$2:$I$7,$J$2:$J$7) in E11 alone and then copy it down, but that would be much slower.
The trick is that we calculate the product's sales total once for every customer who bought it at least one time, and then store 1 if that total is less than your criterion. This goes into the general array. Every area you are looking for also gets its own array, where we store the 1 at the same position. Because every customer is calculated only once, it doesn't matter how many times they appear.
The formula is used like this:
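The "calculate each customer only once" trick can also be shown without the nested buffer arrays. This is a simplified, single-area sketch that follows getCounts' rule (a customer counts when their total for the product is below the threshold); note that the original array formula instead tests each row's J value against 5, so pick the rule you actually need. It expects arrays (for example Range.Value) rather than Range objects, and CountCustomersBelow is a hypothetical name.

```vba
' Simplified sketch (hypothetical name). All range arguments are 2-D (n, 1)
' arrays of equal length, e.g. Range("G2:G7").Value.
' Pass 1 sums each customer's sales of productName once; pass 2 collects the
' distinct customers in the requested area whose total is below lessThan.
Public Function CountCustomersBelow(area As Variant, areas As Variant, customers As Variant, _
        products As Variant, sales As Variant, _
        Optional productName As String = "Product 1", Optional lessThan As Double = 5) As Long
    Dim totals As Object, seen As Object, i As Long
    Set totals = CreateObject("Scripting.Dictionary")
    Set seen = CreateObject("Scripting.Dictionary")
    For i = LBound(customers, 1) To UBound(customers, 1)
        If products(i, 1) = productName Then
            ' a missing key reads as Empty, so Empty + sales starts the total
            totals(customers(i, 1)) = totals(customers(i, 1)) + sales(i, 1)
        End If
    Next i
    For i = LBound(customers, 1) To UBound(customers, 1)
        If areas(i, 1) = area And products(i, 1) = productName And customers(i, 1) <> "" Then
            If totals(customers(i, 1)) < lessThan Then seen(customers(i, 1)) = True
        End If
    Next i
    CountCustomersBelow = seen.Count ' each customer key is stored only once
End Function

' Example use from VBA with the question's sample ranges:
'   n = CountCustomersBelow(Range("D11").Value, Range("G2:G7").Value, _
'       Range("E2:E7").Value, Range("I2:I7").Value, Range("J2:J7").Value)
```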
getCounts(AreaStr,AreaRng,CustomerRng,ProductRng,SalesRng,[ProductName],[lessThan])
AreaStr: the area code(s) you are looking for; should be a multi-cell range to make the UDF worth using
AreaRng: the range where the area names are stored
CustomerRng: the range where the customer names are stored
ProductRng: the range where the product names are stored
SalesRng: the range where the sale counts are stored
ProductName (optional): the product you are looking for; defaults to "Product 1" if omitted
lessThan (optional): the threshold for a customer's total sales of that product; defaults to 5 if omitted
Most parts should be self-explanatory, but if you still have any questions, just ask ;)
OK, I am not sure if I understood all of the conditions and the accumulation, but here is a VBA function that I think should do it.
First, open the VBA editor from the Excel Developer tab. Then, in the VBA editor, create a new module from the Insert menu (just let it be Module1), and paste the following 2 functions into it.
Public Function AreaUniqueCustomersLessThan(ReportAreaRange, AreaRange, ProductRange, SalesRange, CustomerRange)
On Error GoTo Err1
Dim RptAreas() As Variant
Dim Areas() As Variant, Products() As Variant, Sales() As Variant, Customers As Variant
RptAreas = ArrayFromRange(ReportAreaRange)
Areas = ArrayFromRange(AreaRange)
Products = ArrayFromRange(ProductRange)
Sales = ArrayFromRange(SalesRange)
Customers = ArrayFromRange(CustomerRange)
Dim r As Long, s As Long 'report and source rows indexes
Dim mxr As Long, mxs As Long
mxr = UBound(RptAreas, 1)
mxs = UBound(Areas, 1)
'encode the ReportAreasList into accumulation array indexes
Dim AreaCustomers() As Collection
Dim i As Long, j As Long
Dim colAreas As New Collection
ReDim AreaCustomers(1 To mxr)
For r = 1 To mxr
On Error Resume Next
'Do we have the area already?
j = colAreas(RptAreas(r, 1))
If Err.Number <> 0 Then
'Add a new area to the collection and array
i = i + 1
colAreas.Add i, RptAreas(r, 1)
Set AreaCustomers(i) = New Collection
j = i
End If
Next r
'now scan the source rows, accumulating distinct customers
' for any ReportAreas
For s = 1 To mxs
'is this row's Arera in the report Area list?
i = 0
On Error Resume Next
i = colAreas(Areas(s, 1))
On Error GoTo Err1
If i > 0 Then
'this is a report Area code, so check the conditions
If Products(s, 1) = "Product 1" Then
If Sales(s, 1) < 5 Then
On Error Resume Next 'just ignore any duplicate errors
AreaCustomers(i).Add Customers(s, 1), Customers(s, 1)
On Error GoTo Err1
End If
End If
End If
Next s
'finally, return to the report area codes, returning the distinct count
' of customers
Dim count() As Variant
ReDim count(1 To mxr, 1 To 1)
For r = 1 To mxr
count(r, 1) = AreaCustomers(colAreas(RptAreas(r, 1))).count
Next r
AreaUniqueCustomersLessThan = count ' "foo"
Exit Function
Err1:
AreaUniqueCustomersLessThan = "%ERR(" & Str(Err.Number) & ")%" & Err.Description
Exit Function
Resume
End Function
'handle all of the cases, checking and conversions to convert
' a variant range into an array of Variant(1 to n, 1 to 1)
' (we do this because it makes data access very fast)
Function ArrayFromRange(varRange As Variant)
Dim rng As Range
Dim A() As Variant
Set rng = varRange
'Check for degenerate cases
If rng Is Nothing Then
'do nothing
ElseIf rng.count = 0 Then
'do nothing
ElseIf rng.count = 1 Then
ReDim A(1 To 1, 1 To 1)
A(1, 1) = rng.Value
Else
A = rng.Value
End If
ArrayFromRange = A
End Function
Finally, go to your array formula area and enter the following array formula for the "Sales < 5" list: {=AreaUniqueCustomersLessThan(D$11:D$16, G$2:G$7, I$2:I$7,J$2:J$7,E$2:E$7)} (type it without the braces and confirm with Ctrl + Shift + Enter). Note that the first range must be the same length as the array formula range itself. The other four ranges (the source data ranges) must all be the same length as each other, but they do not have to be the same length as the first range.

VBA-Excel and large data sets causes program to crash

First-time poster and new to programming in general. I have a project in which I have to build a financial model to mine data in Excel. I have succeeded in building the model in VBA, and I have run tests on a 3,000-line dataset successfully. I will briefly explain what it does.
I track a given stock on a given day on multiple exchanges. I download the data (roughly 935,000 lines). The first step is to copy all the data for a given exchange (roughly 290,000 lines) onto a new sheet (this takes roughly 8 min); then I create a new column to log the bid-ask spread (12 s). The next step is what I'm having trouble with: I rank each line of data twice, in one column for bid size and in another for ask size. I created a function that uses Excel's Percentile function and ranks each line by where its bid and ask size land. As of right now, I have been running the macro for the last 35 min and it has yet to finish. I can't attempt the other macros, since each macro depends on the previous one.
So my basic issue is that, because my dataset is large, my model keeps crashing. The code seems to be fine with the test data, and it doesn't throw any errors when I run the program, but with the larger dataset it just crashes. Does anyone have any suggestions? Is this normal with such large amounts of data?
Thanks in advance.
Sham
Here are the sub and function that are giving me trouble. The sub gathers the inputs the function needs and writes the result to the assigned cell. The code is supposed to repeat the process for three separate sheets; for now I'd like it to work on one sheet, so I commented out the loop.
Sub Bucketting()
Dim firstRow As Long
Dim lastRow As Long
Dim counter As Long
Dim bidRange As Range
Dim offerRange As Range
Dim bidScroll As Range
Dim offerScroll As Range
Dim Ex As String
Dim i As Integer
'For i = 1 To 1 Step 1 'Sheet Selection Process
' If i = 1 Then
' Ex = "Z"
' ElseIf i = 2 Then
' Ex = "P"
' Else
' Ex = "T"
' End If
Sheets("Z").Select 'Sheet selected
With ActiveSheet
firstRow = .UsedRange.Cells(1).Row + 1
lastRow = .UsedRange.Rows.Count
Set bidRange = .Range("F2:F" & lastRow)
Set offerRange = .Range("G2:G" & lastRow)
For counter = lastRow To firstRow Step -1
Set bidScroll = .Range("F" & counter)
Set offerScroll = .Range("G" & counter)
With .Cells(counter, "J")
.Value = DECILE_RANK(bidRange, bidScroll)
End With
With .Cells(counter, "K")
.Value = DECILE_RANK(offerRange, offerScroll)
End With
Next counter
End With
Range("J1").Select
ActiveCell = "Bid Rank"
ActiveCell.Offset(0, 1) = "Offer Rank"
'Next i
End Sub
Function DECILE_RANK(DataRange, RefCell)
'Credit: BJRaid
'DECILE_RANK(The Range of data)
'Declares the function that can be called in the spreadsheet cell - enter '=DECILE_RANK(A5:A50,A5)
'Using the percentile worksheet function calculate where the 10th, 20th etc percentile of the reference range are
DEC1 = Application.WorksheetFunction.Percentile(DataRange, 0.1)
DEC2 = Application.WorksheetFunction.Percentile(DataRange, 0.2)
DEC3 = Application.WorksheetFunction.Percentile(DataRange, 0.3)
DEC4 = Application.WorksheetFunction.Percentile(DataRange, 0.4)
DEC5 = Application.WorksheetFunction.Percentile(DataRange, 0.5)
DEC6 = Application.WorksheetFunction.Percentile(DataRange, 0.6)
DEC7 = Application.WorksheetFunction.Percentile(DataRange, 0.7)
DEC8 = Application.WorksheetFunction.Percentile(DataRange, 0.8)
DEC9 = Application.WorksheetFunction.Percentile(DataRange, 0.9)
' Calculate the Decile rank that the reference cell value sits within
If (RefCell <= DEC1) Then DECILE_RANK = 1
If (RefCell > DEC1) And (RefCell <= DEC2) Then DECILE_RANK = 2
If (RefCell > DEC2) And (RefCell <= DEC3) Then DECILE_RANK = 3
If (RefCell > DEC3) And (RefCell <= DEC4) Then DECILE_RANK = 4
If (RefCell > DEC4) And (RefCell <= DEC5) Then DECILE_RANK = 5
If (RefCell > DEC5) And (RefCell <= DEC6) Then DECILE_RANK = 6
If (RefCell > DEC6) And (RefCell <= DEC7) Then DECILE_RANK = 7
If (RefCell > DEC7) And (RefCell <= DEC8) Then DECILE_RANK = 8
If (RefCell > DEC8) And (RefCell <= DEC9) Then DECILE_RANK = 9
If (RefCell > DEC9) Then DECILE_RANK = 10
End Function
935,000 lines is a lot for Excel. Like, really a lot. Short of telling you to use a real database: if your application is literally putting a =PERCENTILE(...) formula in each cell, I would recommend trying another tool for that, perhaps something within VBA itself. More generally, do the calculation outside the cell and then store only the resulting value in the cell. There is a lot of overhead in maintaining formulas that each depend on 935k rows of data.
The problem is that you're looping through each row individually, and every call to DECILE_RANK recomputes all nine Percentile thresholds over the whole range, so the total work grows at least with the square of the row count. The Excel way is to work with whole ranges at once whenever possible. I would load the ranges into arrays, then modify your DECILE_RANK code to work with the items in the array.
Note that variant arrays that read ranges in are 2-D.
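As an illustration of that idea, here is a minimal sketch of a ranking function that computes the nine thresholds once and then classifies every value in memory, returning a 2-D array that can be written back in one assignment. It assumes the column contains only numbers (no blanks or text); DecileRanks is a hypothetical name, and it is an alternative sketch, not the answer's code.

```vba
' Minimal sketch (hypothetical name): computes the 10th..90th percentile
' thresholds once, then ranks every value of a 2-D (n, 1) array in memory.
' Same buckets as the If-chain in DECILE_RANK: <= 10th percentile -> 1,
' above the 90th percentile -> 10.
Public Function DecileRanks(values As Variant) As Variant
    Dim dec(1 To 9) As Double, out() As Long
    Dim i As Long, j As Long
    For j = 1 To 9
        dec(j) = Application.WorksheetFunction.Percentile(values, j / 10)
    Next j
    ReDim out(LBound(values, 1) To UBound(values, 1), 1 To 1)
    For i = LBound(values, 1) To UBound(values, 1)
        out(i, 1) = 10 ' default: above the 90th percentile
        For j = 1 To 9 ' thresholds are ascending, so the first match wins
            If values(i, 1) <= dec(j) Then
                out(i, 1) = j
                Exit For
            End If
        Next j
    Next i
    DecileRanks = out
End Function

' Example: .Range("J2:J" & lastRow).Value = DecileRanks(.Range("F2:F" & lastRow).Value)
```

With the thresholds computed once, the per-row work is at most nine comparisons instead of nine Percentile calls over the full column.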
Here is the fully functioning code including my custom VBA array slicer. Note that it was only tested on a small dataset:
Sub Bucketting()
Dim lastRow As Long
Dim bidArray As Variant
Dim offerArray As Variant
Sheets("Sheet1").Select 'Sheet selected
With ActiveSheet
lastRow = .Cells(.Rows.Count, "F").End(xlUp).Row 'last filled row in column F; UsedRange.Rows.Count + 1 overshoots by one row
bidArray = .Range("F2:F" & lastRow)
offerArray = .Range("G2:G" & lastRow)
Range("J2:J" & lastRow).Value = GetArraySlice2D(DECILE_RANK(bidArray), "column", 1, 1, 0)
Range("K2:K" & lastRow).Value = GetArraySlice2D(DECILE_RANK(offerArray), "column", 1, 1, 0)
End With
Range("J1").Select
ActiveCell = "Bid Rank"
ActiveCell.Offset(0, 1) = "Offer Rank"
End Sub
Function DECILE_RANK(DataRange As Variant) As Variant
' Credit: BJRaid
' DECILE_RANK(The Range of data)
' Takes a 2-D (n, 1) array (e.g. read from a range) and returns it with each value replaced by its decile rank (1-10)
Dim DEC(0 To 10) As Variant
Dim i As Long, j As Long 'Long, not Integer: an Integer counter overflows above 32,767 rows
'Using the percentile worksheet function calculate where the 10th, 20th etc percentile of the reference range are
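DEC(0) = Application.WorksheetFunction.Min(DataRange) - 1 'below the minimum, so the smallest values (even 0 or negatives) still get rank 1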
For i = 1 To 9
DEC(i) = Application.WorksheetFunction.Percentile(DataRange, 0.1 * i)
Next i
DEC(10) = Application.WorksheetFunction.Max(DataRange)
' Calculate the Decile rank that the reference cell value sits within
For i = 1 To UBound(DataRange, 1)
For j = 1 To 10
If ((DataRange(i, 1) > DEC(j - 1)) And (DataRange(i, 1) <= DEC(j))) Then
DataRange(i, 1) = j
Exit For
End If
Next j
Next i
DECILE_RANK = DataRange
End Function
Public Function GetArraySlice2D(Sarray As Variant, Stype As String, Sindex As Integer, Sstart As Integer, Sfinish As Integer) As Variant
' this function returns a slice of an array, Stype is either row or column
' Sstart is beginning of slice, Sfinish is end of slice (Sfinish = 0 means entire
' row or column is taken), Sindex is the row or column to be sliced (NOTE:
' 1 is always the first row or first column)
' an Sindex value of 0 means that the array is one dimensional 3/20/09 Lance Roberts
Dim vtemp() As Variant
Dim i As Long
On Error GoTo ErrHandler
Select Case Sindex
Case 0
If Sfinish - Sstart = UBound(Sarray) - LBound(Sarray) Then
vtemp = Sarray
Else
ReDim vtemp(1 To Sfinish - Sstart + 1)
For i = 1 To Sfinish - Sstart + 1
vtemp(i) = Sarray(i + Sstart - 1)
Next i
End If
Case Else
Select Case Stype
Case "row"
If Sfinish = 0 Or (Sstart = LBound(Sarray, 2) And Sfinish = UBound(Sarray, 2)) Then
vtemp = Application.WorksheetFunction.Index(Sarray, Sindex, 0)
Else
ReDim vtemp(1 To Sfinish - Sstart + 1)
For i = 1 To Sfinish - Sstart + 1
vtemp(i) = Sarray(Sindex, i + Sstart - 1)
Next i
End If
Case "column"
If Sfinish = 0 Or (Sstart = LBound(Sarray, 1) And Sfinish = UBound(Sarray, 1)) Then
vtemp = Application.WorksheetFunction.Index(Sarray, 0, Sindex)
Else
ReDim vtemp(1 To Sfinish - Sstart + 1)
For i = 1 To Sfinish - Sstart + 1
vtemp(i) = Sarray(i + Sstart - 1, Sindex)
Next i
End If
End Select
End Select
GetArraySlice2D = vtemp
Exit Function
ErrHandler:
Dim M As Integer
M = MsgBox("Bad Array Input", vbOKOnly, "GetArraySlice2D")
End Function
I'm not sure if this will directly address your problem, but have you considered using Application.ScreenUpdating = False? Don't forget to set it back to True once your data has been processed.
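A minimal sketch of that suggestion, extended to manual calculation and disabled events (both are standard Application properties), with the previous settings remembered and restored afterwards. BeginFastMode and EndFastMode are hypothetical helper names; the declarations must sit at the top of a standard module.

```vba
' Minimal sketch (hypothetical names). Put these declarations at the top
' of a standard module, above any procedure.
Private prevScreen As Boolean
Private prevCalc As XlCalculation
Private prevEvents As Boolean

' Remember the current settings, then switch off redraws, automatic
' recalculation and event handlers for the duration of a long macro.
Public Sub BeginFastMode()
    prevScreen = Application.ScreenUpdating
    prevCalc = Application.Calculation
    prevEvents = Application.EnableEvents
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Application.EnableEvents = False
End Sub

' Restore whatever was active before BeginFastMode ran.
Public Sub EndFastMode()
    Application.ScreenUpdating = prevScreen
    Application.Calculation = prevCalc
    Application.EnableEvents = prevEvents
End Sub

' Usage around the question's macro (Bucketting must exist in the project):
'   BeginFastMode
'   On Error GoTo CleanUp
'   Bucketting
' CleanUp:
'   EndFastMode
```

Routing errors to the CleanUp label is what guarantees the settings are restored even if the long-running macro fails halfway.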
