Performing MATCH/INDEX on large dataset - excel

I have a dataset of 55.000 rows with 35.000 email-addresses of which 31.000 are unique, so a couple of users occupy multiple rows. I need to find the rows of these users and add them to a class object.
Loading the email column into an array and performing a MATCH/INDEX lookup took 200 seconds. It's acceptable for now, but definitely not fast enough for it's intended use of 200-500K datasets.
Dim StartTime As Double
Dim SecondsElapsed As Double
StartTime = Timer
Dim dict As Dictionary
Set dict = CreateObject("scripting.dictionary")
LastRow = Cells(Rows.Count, 1).End(xlUp).Row
Set UserRange = Range(Cells(2, 11), Cells(LastRow, 11))
For Each cell In UserRange
dict(cell.value) = dict(cell.value) + 1
Next
Debug.Print "Number of users: " & dict.Count
UserArray = Range(Cells(2, 11), Cells(LastRow, 11))
UserArray = WorksheetFunction.Transpose(WorksheetFunction.Transpose(WorksheetFunction.Transpose(UserArray)))
For Each User In dict
Dim UserIndex() As Variant
ReDim UserIndex(1 To dict(User))
For i = 1 To dict(User)
Row = WorksheetFunction.Match(User, UserArray, 0)
UserIndex(i) = Row
UserArray(Row) = Empty
Next
For i = LBound(UserIndex) To UBound(UserIndex)
Debug.Print User, UserIndex(i)
Next
Next
SecondsElapsed = Round(Timer - StartTime, 2)
Debug.Print "This code ran successfully in " & SecondsElapsed & " seconds"
I could probably build a block-based index based on source (every imported file +-10.000 entries has a start- and endrow) and speed it up by looking only in the appropriate block. But maybe there is another way?

Here's a different approach which is fairly fast:
Sub Lister()
Dim t, i, m, arr, rng, dict As Object, dictDupes As Object, usr, v
Set dict = CreateObject("scripting.dictionary")
Set dictDupes = CreateObject("scripting.dictionary")
Set rng = Range("A1:A500000")
'create some dummy data (0.5M rows)
With rng
.Formula = "=""USER_"" & ROUND(RAND()*5000,0) & ""_"" & ROUND(RAND()*3000,0)"
.Value = .Value
End With
t = Timer
arr = rng.Value
For i = 1 To UBound(arr, 1)
usr = arr(i, 1)
If Not dict.exists(usr) Then
dict.Add usr, i
Else
If Not dictDupes.exists(usr) Then dictDupes.Add usr, dict(usr)
dictDupes(usr) = dictDupes(usr) & "|" & i
End If
Next i
For Each usr In dictDupes
v = dictDupes(usr)
'Debug.Print "----" & usr & "---"
'Debug.Print Join(Split(v, "|"), ", ")
Next usr
Debug.Print dict.Count, dictDupes.Count
Debug.Print "Done in", Timer - t
End Sub
Completes in about 20-25 sec
Another note:
If you want to use Match then it's significantly faster to leave your data on the worksheet instead of running Match against an array.
Sub TestMatch()
Dim t, i, m, arr, rng
Set rng = Range("A1:A50000")
With rng
.Formula = "=ROUND(RAND()*30000,0)"
.Value = .Value
End With
t = Timer
For i = 1 To 10000
m = Application.Match(i, rng, 0)
Next i
Debug.Print "sheet", Timer - t
arr = rng.Value
t = Timer
For i = 1 To 10000
arr = rng.Value
m = Application.Match(i, arr, 0)
Next i
Debug.Print "array", Timer - t
End Sub
Output:
sheet 3.644531
array 131.9453
So the array is about 35x slower.

Thanks to Tim I got a solution going:
Dim dict As Dictionary
Set dict = CreateObject("scripting.dictionary")
LastRow = Cells(Rows.Count, 1).End(xlUp).Row
Set UserRange = Range(Cells(2, 11), Cells(LastRow, 11))
For Each cell In UserRange
dict(cell.value) = dict(cell.value) + 1
Next
Debug.Print "Number of users: " & dict.Count
t = Timer
For Each User In dict
Set Profile = New UserProfile
Profile.Count = dict(User)
Dim UserIndex() As Variant
ReDim UserIndex(1 To dict(User))
For i = 1 To dict(User)
Row = WorksheetFunction.Match(User, UserRange, 0)
UserIndex(i) = Row
Next
For i = LBound(UserIndex) To UBound(UserIndex)
Dim Purchase() As Variant
ReDim Purchase(1 To LastCol) As Variant
Purchase = Range(Cells(UserIndex(i) + 1, 1), Cells(UserIndex(i) + 1, LastCol))
Profile.Add Purchase
Next
Next
Debug.Print "Match/Index loop completed in ", Timer - t
Turns out that matching on a range instead of an array is much faster. And so is reading from a range instead of performing an WorksheetFunction.Index on an array. These results were both unexpected to me, as I thought reading/writing to the workbook generally slows things down. I also added a (1 , to thePurchase array readouts in my UserProfile class in order to ditch the Transpose.
Profiling for the whole 55K dataset completed in only 23 seconds!

Related

VBA: Adding an identifier to values in column if there are duplicates

I am trying to assign a identifier to the back of the string if there are duplicate values.
I considered a for loop with a counter but it simply gave me a sequence of numbers in each cells.
Is there another way I can approach this matter?
Sub Macro1()
For i = 1 To 10
For N = 1 To 10
If Worksheets("sheet1").Range("A" & i) = Worksheets("sheet1").Range("A" & N) Then
Worksheets("sheet1").Range("A" & i) = Worksheets("sheet1").Range("A" & i) & counter
counter = counter + 1
End If
Next N
Next i
End Sub
There might be other solutions, but this is what I actually came up with.
Don't forget to add this reference to your project:
Tools -> References -> Microsoft Scripting Runtime -> tick checkbox
Sub Macro1()
' Add reference to project:
' Tools -> References -> Microsoft Scripting Runtime -> tick checkbox
Dim dDict As New Scripting.Dictionary
Dim rngInput As Range
Dim sOutputCol As String
Dim lRow As Long, lCount As Long
With ThisWorkbook.Worksheets("SHEET_NAME") ' SHEET_NAME: the name of the sheet where input is
Set rngInput = .Range("A1:A10") ' A1:A10: the range on the sheet where input is
sOutputCol = "C" ' sOutputCol: output column's letter
For lRow = rngInput.Rows(1).Row To rngInput.Rows(rngInput.Rows.Count).Row
If dDict.Exists(rngInput.Cells(lRow, 1).Value2) Then
lCount = dDict(rngInput.Cells(lRow, 1).Value2)
dDict.Remove rngInput.Cells(lRow, 1).Value2
dDict.Add rngInput.Cells(lRow, 1).Value2, lCount + 1
Else
dDict.Add rngInput.Cells(lRow, 1).Value2, 1
End If
.Cells(lRow, sOutputCol).Value2 = rngInput.Cells(lRow, 1).Value2 & "_" & dDict(rngInput.Cells(lRow, 1).Value2)
Next lRow
End With
End Sub
Here's a slightly different Dictionary approach, which gives you an array of the final values.
Sub Tester()
Dim arr, r As Long, dict As Object, rng As Range, v, tmp
Set dict = CreateObject("scripting.dictionary")
Set rng = ActiveSheet.Range("B1:B20")
arr = rng.Value 'get the input values in an array
For r = 1 To UBound(arr, 1) 'loop over all the values
v = arr(r, 1)
If Not dict.exists(v) Then 'new value?
dict.Add v, Array(r, 1) 'store row for the first occurence and start the count
Else
tmp = dict(v) 'not new value: pull array from dict
tmp(1) = tmp(1) + 1 'increment count for this value
If tmp(1) = 2 Then arr(tmp(0), 1) = arr(tmp(0), 1) & "_1" 'second instance? Flag the first occurence also...
arr(r, 1) = arr(r, 1) & "_" & tmp(1) 'flag the n'th instance
dict(v) = tmp 're-store the array
End If
Next r
rng.Offset(0, 2).Value = arr 'output the (updated) array
End Sub

Excel VBA Passing Variables

I need to pass the variables max, min, and their respective locations to another sub where it will format each max and min in their respective column. I am trying to create an array that will store the locations and the values but its not working.
I was told to first identify the number of columns used and the number of rows, which is the beginning.
Rows = wsData.UsedRange.Rows.Count
Columns = wsData.UsedRange.Col.Count
j = 1
ReDim Min(j)
With wsData.Range("A3:A19")
For j = 1 To 19 'colum
Min(j) = WorksheetFunction.Min(Range(.Offset(1, j), .Offset(Row, j)))
Max = WorksheetFunction.Max(Range(.Offset(1, j), .Offset(Row, j)))
Min(j) = Min
j = j + 1
ReDim Preserve Min(j) 'saves variables
Next 'next column
End With
The code below uses the ActiveSheet which you need to change to reference the worksheet for your data. Additionally, it assumes that your data starts with Row 1. The code looks at each column in the range and stores the minimum/maximum (it does not account for multiple cells which may share the min or max value) value found in the column as well as the cell's address, in an array and then passes the array to two different subs, one which simply displays the information in a message and one which formats the the background color of the cells. This code does not perform any kind of error handling, but should get you where you want to go.
the line Option Explicit requires that all of the variables be defined using a Dim statement
the line Option Base 1 makes the default starting point for arrays 1 instead of 0
Option Explicit
Option Base 1
Sub GatherData()
Dim iRows As Long
Dim iCols As Long
Dim j As Long
Dim iMin() As Variant
Dim iMax() As Variant
Dim R As Range
iRows = ActiveSheet.UsedRange.Rows.Count
iCols = ActiveSheet.UsedRange.Columns.Count
ReDim iMin(iCols, 2)
ReDim iMax(iCols, 2)
For j = 1 To iCols
Set R = Range(Cells(1, j), Cells(iRows, j)).Find(WorksheetFunction.Min(Range(Cells(1, j), Cells(iRows, j))), LookIn:=xlValues)
iMin(j, 1) = R.Value
iMin(j, 2) = R.Address
Set R = Range(Cells(1, j), Cells(iRows, j)).Find(WorksheetFunction.Max(Range(Cells(1, j), Cells(iRows, j))), LookIn:=xlValues)
iMax(j, 1) = R.Value
iMax(j, 2) = R.Address
Next j
ListMinMax iMax(), True
ListMinMax iMin(), False
FormatMinMax iMax, "green"
FormatMinMax iMin, "yellow"
Set R = Nothing
End Sub
Sub ListMinMax(ByRef Arr() As Variant, ByVal MinMax As Boolean)
Dim strOutput As String
Dim i As Long
If MinMax = True Then
strOutput = "Maximums:" & vbCrLf & vbCrLf
Else
strOutput = "Minimums:" & vbCrLf & vbCrLf
End If
For i = 1 To UBound(Arr, 1)
strOutput = strOutput & "Cell: " & Arr(i, 2) & " = " & Arr(i, 1) & vbCrLf
Next i
MsgBox strOutput, vbOKOnly
End Sub
Sub FormatMinMax(ByRef Arr() As Variant, ByVal BGColor As String)
Dim i As Long
Select Case UCase(BGColor)
Case "GREEN"
For i = 1 To UBound(Arr, 1)
ActiveSheet.Range(Arr(i, 2)).Interior.Color = vbGreen
Next i
Case "YELLOW"
For i = 1 To UBound(Arr, 1)
ActiveSheet.Range(Arr(i, 2)).Interior.Color = vbYellow
Next i
Case Else
MsgBox "Invalid Option", vbCritical
End Select
End Sub
======================================================================
The code below does away with the need for the arrays and formats the color of the min/max values as it finds them
Sub GatherData2()
Dim iRows As Long
Dim iCols As Long
Dim j As Long
Dim R As Range
iRows = ActiveSheet.UsedRange.Rows.Count
iCols = ActiveSheet.UsedRange.Columns.Count
For j = 1 To iCols
Set R = Range(Cells(1, j), Cells(iRows, j)).Find(WorksheetFunction.Min(Range(Cells(1, j), Cells(iRows, j))), LookIn:=xlValues)
R.Interior.Color = vbYellow
Set R = Range(Cells(1, j), Cells(iRows, j)).Find(WorksheetFunction.Max(Range(Cells(1, j), Cells(iRows, j))), LookIn:=xlValues)
R.Interior.Color = vbGreen
Next j
Set R = Nothing
End Sub

Multiple Criteria Evaluate Match Function Prohibitively Slow?

The following code successfully executes for small data sets:
Option Explicit
Option Base 1
Sub Left()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim ws1, _
ws2 As Worksheet, _
wb As Workbook
Set wb = ThisWorkbook
Set ws1 = wb.Worksheets("Adj")
Set ws2 = wb.Worksheets("Deleted")
Dim a, _
b, _
i, _
j, _
k As Long
a = 957
b = 290150
Dim Item1, _
Item2, _
Arr() As Variant
With ws2
For i = 2 To a
.Cells(i, 6) = Left(.Cells(i, 1), 11)
.Cells(i, 7) = Right(.Cells(i, 1), 4)
Next i
End With
With ws1
For j = 2 To b
ReDim Preserve Arr(j - 1)
Item1 = Chr(34) & .Cells(j, 7) & Chr(34)
Item2 = Chr(34) & .Cells(j, 9) & Chr(34)
On Error Resume Next
k = Evaluate("=MATCH(1,('Deleted'!F:F = " & Item1 & ")*('Deleted'!G:G = " & Item2 & "),0)")
If Err.Number = 13 Then
Arr(j - 1) = ""
Else: Arr(j - 1) = k
End If
On Error GoTo 0
Next j
.Range(.Cells(2, 15), .Cells(b, 15)) = WorksheetFunction.Transpose(Arr())
End With
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
However, for large data sets - such as 290,150 rows - the macro spins its wheels. It's known that Evaluate is expensive to run and I have tried running for sample sizes of 30 (success) and 1,000 (unsuccessful) and debugged carefully. Obviously in-cell array formulation drag-and-drop is not a practical alternative. So, the problem reduces to resolving endless spinning for the given multiple criteria match function required.
How do I bypass this constraint?
Try this approach using a dictionary as a lookup:
Sub Left()
Dim wsAdj As Worksheet, wsDel As Worksheet, wb As Workbook
Dim lrDel As Long, lrAdj As Long, r As Long
Dim dict, t, arr, arrG, arrI, arrRes, k
Set wb = ThisWorkbook
Set wsAdj = wb.Worksheets("Adj")
Set wsDel = wb.Worksheets("Deleted")
lrAdj = 290150
lrDel = 957
t = Timer
'load a dictionary with lookup values constructed from wsDel ColA
Set dict = CreateObject("scripting.dictionary")
arr = wsDel.Range("A2:A" & lrDel).Value
For r = 1 To UBound(arr, 1)
k = Left(arr(r, 1), 11) & Chr(0) & Right(arr(r, 1), 4)
dict(k) = r + 1 '+1 to adjust for starting at row 2
Next r
arrG = wsAdj.Range("G2:G" & lrAdj).Value 'get the match columns as arrays
arrI = wsAdj.Range("I2:I" & lrAdj).Value
ReDim arrRes(1 To UBound(arrG, 1), 1 To 1) 'resize the "result" array
'loop the values from wsAdj
For r = 1 To UBound(arrG, 1)
k = arrG(r, 1) & Chr(0) & arrI(r, 1) 'build the "key"
If dict.exists(k) Then
arrRes(r, 1) = dict(k) 'get the matched row
End If
Next r
wsAdj.Cells(2, 15).Resize(UBound(arrRes, 1), 1).Value = arrRes 'put the array on the sheet
Debug.Print "done", Timer - t ' <1 sec
End Sub
Stating Ranges instead of Columns and removing ReDim on loop helped.

Updating the column based on Unique value in one col & max repeated values in another col

I am trying to convert the data based on the max repeated values.
I have truck numbers in col A and "Truck types" in column in B col.
For each unique truck number, the truck type should be same.(This is the expected result)
This can be achieved, by counting the maximum no. of truck types for the unique "truck no", and that cell to be updated with the Max. repeated "Truck type".
If there is equal no. of "Truck types" are available, It should be updated with the first available truck type.
Like this, there are thousands of rows to be updated. This can be
better understand by seeing the attached image.
I have attached the image & expected result is in the column C.
I have googled a lot, but I was unable to find the relevant solution.
Please help.
You do not say anything...
Please, test the next code. It works with assumption that the columns are sorted as we can see in the picture. It is very fast, since the result is put in an array and dropped on the sheet at once:
Sub findMaxCountVehType_Array()
Dim sh As Worksheet, lastRow As Long, rngVeh As Range, rngTemp As Range, arrFin As Variant
Dim i As Long, j As Long, w As Long, count As Long, maxCount As Long, ar As Long, maxStr As String
Set sh = ActiveSheet 'use here your sheet
lastRow = sh.Range("A" & Rows.count).End(xlUp).row
Set rngVeh = sh.Range("A2:C" & lastRow)
ReDim arrFin(1 To lastRow, 1 To 1)
arrFin(1, 1) = "Result": ar = 1
For i = 2 To lastRow
If sh.Range("A" & i).Value = sh.Range("A" & i + 1).Value Then
For j = i To j + 1000 'create a range of type cells for the same vehicle no
If sh.Range("A" & j).Value = sh.Range("A" & i).Value Then
If rngTemp Is Nothing Then
Set rngTemp = sh.Range("B" & j)
Else
Set rngTemp = Union(rngTemp, sh.Range("B" & j))
End If
Else
Exit For
End If
Next j
If rngTemp Is Nothing Then
ar = ar + 1: arrFin(ar, 1) = sh.Range("B" & i)
Else
For w = 1 To rngTemp.Cells.count 'determine the max occurrences string
count = WorksheetFunction.CountIf(rngTemp, rngTemp.Cells(w, 1).Value)
If count > maxCount Then maxCount = count: maxStr = rngTemp.Cells(w, 1).Value
Next
For w = 1 To rngTemp.Cells.count
ar = ar + 1: arrFin(ar, 1) = maxStr 'fill the max count in the array
Next
End If
Set rngTemp = Nothing: maxCount = 0: count = 0 'reinitialize variables
i = i + w - 2 'move the iteration to the following vehicle
Else
ar = ar + 1: arrFin(ar, 1) = sh.Range("B" & i)
End If
Next i
'drop the result array at once
sh.Range("C1").Resize(UBound(arrFin, 1), UBound(arrFin, 2)).Value = arrFin
End Sub
Here is a VBA routine that uses:
A class object which has
key:= Vehicle number
item:= dictionary of associated vehicle types
key:= vehicle type
item:= count of the vehicle types
After collecting the information, we merely need to cycle through the dictionary and extract, for any given vehicle ID, the vehicle type that has the largest count.
This routine, since it works entirely with VBA arrays, should run pretty fast, even with large amounts of data.
Also, with this method, no sorting is required.
ASSUMES the data starts in cell A1 (could be changed if necessary)
ASSUMES results are as you show with Desired Result in column C
Be sure to set a reference (Tools/References) to Microsoft Scripting Runtime
Class Module (rename this module cVehicle)
Option Explicit
Private pVehicleType As String
Private pVehicleTypes As Dictionary
Public Property Get VehicleType() As String
VehicleType = pVehicleType
End Property
Public Property Let VehicleType(Value As String)
pVehicleType = Value
End Property
Public Property Get VehicleTypes() As Dictionary
Set VehicleTypes = pVehicleTypes
End Property
Public Function addVehicleTypesItem(Value)
If pVehicleTypes.Exists(Value) Then
pVehicleTypes(Value) = pVehicleTypes(Value) + 1
Else
pVehicleTypes.Add Key:=Value, Item:=1
End If
End Function
Private Sub Class_Initialize()
Set pVehicleTypes = New Dictionary
pVehicleTypes.CompareMode = TextCompare
End Sub
Regular Module
'Set Reference to Microsoft Scripting Runtime
Option Explicit
Sub vehicle()
Dim dV As Dictionary, cV As cVehicle
Dim wsData As Worksheet, vData As Variant, rRes As Range
Dim V As Variant, I As Long, sKey As String, cKey As String, Cnt As Long
'set data worksheet
'read data into vba array
Set wsData = Worksheets("Sheet3")
With wsData
'add extra column for the "desired results"
vData = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(columnsize:=3)
Set rRes = .Cells(1, 1)
End With
'loop through the data and count the types
'no sorting necessary
Set dV = New Dictionary
For I = 2 To UBound(vData, 1)
Set cV = New cVehicle
With cV
sKey = vData(I, 1)
.VehicleType = vData(I, 2)
If Not dV.Exists(sKey) Then
.addVehicleTypesItem .VehicleType
dV.Add sKey, cV
Else
dV(sKey).addVehicleTypesItem .VehicleType
End If
End With
Next I
'Output the data
I = 1
'Header
vData(I, 3) = "Desired Result"
'Data
For I = 2 To UBound(vData, 1)
sKey = vData(I, 1)
With dV(sKey)
'which type has the highest count
Cnt = 0
For Each V In .VehicleTypes.Keys
If .VehicleTypes(V) > Cnt Then
Cnt = .VehicleTypes(V)
cKey = V
End If
Next V
vData(I, 3) = cKey
End With
Next I
'write the results
Set rRes = rRes.Resize(UBound(vData, 1), UBound(vData, 2))
rRes = vData
End Sub

Count string within string using VBA

I have product codes: (they are in C column of active sheet)
DO-001
DO-002
DO-003
DO-004
And I have big list of data: (they are in C column of "Sheet1")
41300100_DO-001_14215171
41300104_DO-001_14215173
K1_ISK_41300661_DO-002_13190369
NP_41533258_DO-003_14910884
DO-003_DD_44_ddd
And I want to count how many times do the product codes appear in the list of data. So the result for this case'd be: (result is H column of active sheet)
DO-001 2
DO-002 1
DO-003 2
DO-004
I have done this with this code:
Sub CountcodesPLC()
Dim i, j As Integer, icount As Integer
Dim ldata, lcodes As Long
icount = 0
lcodes = Cells(Rows.Count, 3).End(xlUp).Row
ldata = Worksheets("Sheet1").Cells(Rows.Count, 3).End(xlUp).Row
For i = 10 To lcodes
For j = 2 To ldata
If InStr(Worksheets("Sheet1").Range("C" & j), Range("C" & i)) <> 0 Then
icount = icount + 1
End If
Next j
If icount <> 0 Then
Range("H" & i).Value = icount
End If
icount = 0
Next i
End Sub
But I want to change it, so if the list of data contains some key words like "NP", "ISK", then not to count them, or if the first part of the data is the code then also not to count them, so the result for this example would be:
DO-001 2
DO-002
DO-003
DO-004
Also, I'll have around 1.000 product codes, and around 60.000 strings of data.
Will my code crash?
Seems your code is OK. But if you want to match only the first part of string (a'ka StartsWith), i'd change only this line:
If InStr(Worksheets("Sheet1").Range("C" & j), Range("C" & i)) <> 0 Then
to:
If Worksheets("Sheet1").Range("C" & j) Like Range("C" & i) & "*" Then
For further details, please see: Wildcard Characters used in String Comparisons
Use Dictionnary
Dim Dict As Scripting.Dictionary
Set Dict = New Scripting.Dictionary
Arr = Split("refer your text here", "_")
For I = LBound(Arr) To UBound(Arr)
If Dict.Exists(Arr(I)) Then
Dict(Arr(I)) = Dict(Arr(I)) + 1 'increment
Else
Dict.Add Arr(I), 1
End If
Next I
This may be OTT for the requirement but should work quite quickly.
Public Sub Sample()
Dim WkSht As Worksheet
Dim LngRow As Long
Dim AryLookup() As String
Dim VntItem As Variant
'We put what we want to search into an array, this makes it a lot quicker to search
Set WkSht = ThisWorkbook.Worksheets("Sheet1")
ReDim AryLookup(0)
LngRow = 1
Do Until WkSht.Range("A" & LngRow) = ""
If AryLookup(UBound(AryLookup, 1)) <> "" Then ReDim Preserve AryLookup(UBound(AryLookup, 1) + 1)
AryLookup(UBound(AryLookup, 1)) = Trim(UCase(WkSht.Range("A" & LngRow)))
LngRow = LngRow + 1
Loop
Set WkSht = Nothing
'Then we go down the list and check the array against each item
Set WkSht = ActiveSheet
LngRow = 1
Do Until WkSht.Range("A" & LngRow) = ""
WkSht.Range("B" & LngRow) = 0
For Each VntItem In AryLookup()
'This looks for the match without any of the exclusion items
If (InStr(1, VntItem, Trim(UCase(WkSht.Range("A" & LngRow)))) <> 0) And _
(InStr(1, VntItem, "NP") = 0) And _
(InStr(1, VntItem, "ISK") = 0) Then
WkSht.Range("B" & LngRow) = WkSht.Range("B" & LngRow) + 1
End If
Next
LngRow = LngRow + 1
Loop
Set WkSht = Nothing
MsgBox "Done"
End Sub
Basically, the 60,000 data strings will go into an array in memory, then the array will be searched against the 1,000 products. Searching in memory should be quick.
One thing I would raise is the exclusion method may produce false positives.
For example, excluding NP will exclude: -
NP_41533258_DO-003_14910884
NPA_41533258_DO-003_14910884
41533258_ANP_DO-003_14910884
You may want to think about the method overall.
Have you considered an array formula, not sure how it will perform vs code, but, you could do something along these lines, where list is in A and prod numbers in B
=SUM(IF(NOT(ISERR(SEARCH(B1 & "_",$A$1:$A$5,1)))*(ISERR(SEARCH("NP_",$A$1:$A$5,1))),1,0))
Where "NP" would be replaced by a range containing the exclusions, I've left as NP to show what's happening.
The code would be like this. But I don't know the speed.
Sub test()
Dim vDB, vLook, vSum(), Sum As Long
Dim Ws As Worksheet, dbWs As Worksheet
Dim s As String, sF As String, sCode As String
Dim i As Long, j As Long, n As Long
Set dbWs = Sheets("Sheet1")
Set Ws = ActiveSheet
With Ws
vLook = .Range("a1", .Range("a" & Rows.Count).End(xlUp))
End With
With dbWs
vDB = .Range("c1", .Range("c" & Rows.Count).End(xlUp))
End With
n = UBound(vLook, 1)
ReDim vSum(1 To n, 1 To 1)
For i = 1 To n
sF = Split(vLook(i, 1), "-")(0)
sCode = Replace(vLook(i, 1), sF, "")
Sum = 0
For j = 1 To UBound(vDB, 1)
s = vDB(j, 1)
If Left(s, Len(sF)) = sF Or InStr(s, "NP") Or InStr(s, "ISK") Then
Else
If InStr(s, sCode) Then
Sum = Sum + 1
End If
End If
Next j
If Sum > 0 Then
vSum(i, 1) = Sum
End If
Next i
Ws.Range("h1").Resize(n) = vSum
End Sub

Resources