I have an array of ~5,000 unique IDs loaded from a CSV file:
Dim wb As Workbook
Dim idRng As Variant
Set wb = Workbooks.Open(Filename:=ThisWorkbook.path & "\DataSource\ID.csv")
With wb.Sheets(1)
idRng = .Range("A2:A" & .Range("A" & .Rows.Count).End(xlUp).Row).Value2
End With
wb.Close
Alongside this, I also load in ~100,000 rows of data, which contains non-unique IDs with numerous possible duplicates. My aim is to loop through the 100,000 rows and check if the corresponding rows ID is contained within the smaller array, and if so, add the rows data to a collection. Both IDs are stored as Longs. I have completed this using the below:
Dim dataRng As Variant
Set wb = Workbooks.Open(Filename:=ThisWorkbook.path & "\DataSource\data.csv")
With wb.Sheets(1)
dataRng = .Range("A2:H" & .Range("A" & .Rows.Count).End(xlUp).Row).Value2
For i = LBound(dataRng) To UBound(dataRng)
If mUtil.IsInArray(dataRng(i, 1), idRng) Then
'Add object to collection
End If
Next
End With
'mUtil
Public Function IsInArray(v As Variant, arr As Variant) As Boolean
For i = LBound(arr) To UBound(arr)
If arr(i, 1) = v Then
IsInArray = True
Exit Function
End If
Next
IsInArray = False
End Function
Despite this working, as you can imagine iterating through the 5,000 unique IDs 100,000 times can take a fair amount of time, alongside this, the larger file can end up being much bigger.
Is there a more efficient way of performing this task, with the ultimate aim to reduce the run time?
I'd suggest throwing your 5,000 records into a dictionary and then use the Exists method to check to see if it does in fact exist.
Public Sub DictionaryTest()
Dim lngKey As Long, objDict As Object
Set objDict = CreateObject("Scripting.Dictionary")
lngKey = 123456
objDict.Add lngKey, 0
Debug.Print objDict.Exists(lngKey)
End Sub
It absolves you from having to loop over the 5,000 each time AND the power of the search within the dictionary should speed up the process 10 fold.
You can try something as simple as the following. Instead of looping twice, just loop one of them and Match if the item is found in the other array. I just tested with random numbers and just looped the unique values. This would work only if you want the first match. If you want all the matches you need to simply reverse it and loop the 100k non-unique array to the unique one.
What we do is create MatchArr as a Variant and then use that variable for our Application.Match function. If the function finds a match, it returns the row it found it on. If it doesn't find a match it will error, but because we made it a variant it won't stop the code. We simply check if it's an error or not and if it is then we simply move to the next line.
This is what I tried (Change as needed):
EDIT: I've updated to do the loop of the bigger array that needs to be refined.
Sub FindValues()
Dim Arr1, Arr2, MatchArr, i As Long, Col As New Collection
Arr1 = Sheet1.Range("A1:A50").Value
Arr2 = Sheet1.Range("C1:C1000").Value
For i = LBound(Arr2, 1) To UBound(Arr2, 1)
MatchArr = Application.Match(Arr2(i, 1), Arr1, 0)
If Not IsError(MatchArr) Then
Col.Add Arr2(i, 1)
End If
Next i
For i = 1 To Col.Count
Sheet1.Range("E" & i).Value = Col(i)
Next i
End Sub
Related
Looking for a more appropriate approach. I have a working solution, but it seems there should be a built-in or more elegant method.
I am comparing two sheets from separate workbooks, documenting the differences on a sheet in current workbook. Every time a difference is found, I'm generating a row of output data. As I'm unaware of the total number of differences I will find, the row of output data is appended to an ArrayList.
I have a working bit of code, but the effective method is:
Create a row as an arraylist.
Convert the row to an array.
Add the row to an arraylist for output
TWICE Transpose the output arraylist while converting to an array
Output the array to worksheet.
With all the benefit of using ArrayLists, it seems that there should be a direct method for outputting a 2D "ArrayList of ArrayLists" or something along those lines.
Here is the current code:
Sub findUnmatchingCells()
Dim oWB_v1 As Workbook, oWB_v2 As Workbook, oRange_v1 As Range, oRange_v2 As Range
On Error GoTo endofsub
With Me
.Cells.Clear
.Cells(1, 1) = "Row"
.Cells(1, 2) = "Column"
.Cells(1, 3) = "v1"
.Cells(1, 4) = "v2"
End With
Dim missing_items As Object
Dim output_row(), output(), missing_row As Object
Set oWB_v1 = Workbooks("foo.xls")
Set oWB_v2 = Workbooks("bar.xls")
Set oRange_v1 = oWB_v1.Sheets(1).Range("A1:AD102")
Set oRange_v2 = oWB_v2.Sheets(1).Range("A1:AD102")
Set missing_items = CreateObject("System.Collections.ArrayList")
For rRow = 1 To oRange_v1.Rows.Count
For cCol = 1 To oRange_v1.Columns.Count
If oRange_v1.Cells(rRow, cCol) <> oRange_v2.Cells(rRow, cCol) Then
Set missing_row = CreateObject("System.Collections.ArrayList")
missing_row.Add rRow
missing_row.Add cCol
missing_row.Add oRange_v1.Cells(rRow, cCol).Value2
missing_row.Add oRange_v2.Cells(rRow, cCol).Value2
output_row = missing_row.toarray
missing_items.Add output_row
End If
Next cCol
Next rRow
output = Application.WorksheetFunction.Transpose(Application.WorksheetFunction.Transpose(missing_items.toarray))
'my own output routine
If Not outputArrayToRange(output, Me.Range("A2")) Then Stop
Exit Sub
endofsub:
Debug.Print rRow, cCol, missing_items.Count, missing_row.Count, Error
Stop
End Sub
Seems like a lot of extra work here with ArrayList when you are not really using anything useful from them. As you know the mismatch count cannot be more than the number of start elements, and the columns will be 4 at end, you can do all of this just with a single array. Pre-size the array and in your loop populate it.
Simplified example:
As you are using Me this code would be in "Sheet1".
Now it would get more complicated if you wanted to ReDim to actual number of mismatches to avoid over-writing something, but generally it is wise to plan developments to avoid such risks. You would need the double transpose to be able to ReDim the rows as columns then back to rows.
With the ranges you mention I don't think the Transpose limit would be an issue, but that is a concern in other cases which needs to be resolved with additional looping.
The efficient way is to use arrays the whole time. Read the two ranges into arrays, loop one and compare against the other, write out changes to pre-sized array, write array to sheet
If this is just about is there nicer functionality for this within ArrayLists, no. What you have done is short and effective but incurs more overhead than is necessary.
Option Explicit
Public Sub findUnmatchingCells()
Dim oWB As ThisWorkbook, oRange_v1 As Range, oRange_v2 As Range
With Me
.Cells.Clear
.Cells(1, 1) = "Row"
.Cells(1, 2) = "Column"
.Cells(1, 3) = "v1"
.Cells(1, 4) = "v2"
End With
Dim rRow As Long, cCol As Long
Set oWB = ThisWorkbook
Set oRange_v1 = oWB.Worksheets("Sheet2").Range("A1:D3") 'would be faster to read this into array and later loop that
Set oRange_v2 = oWB.Worksheets("Sheet3").Range("A1:D3") 'would be faster to read this into array and later loop that
Dim totalElements As Long, output()
totalElements = oRange_v1.Rows.Count * oRange_v1.Rows.Count
ReDim output(1 To totalElements, 1 To 4)
For rRow = 1 To oRange_v1.Rows.Count 'would be faster to loop arrays than sheet
For cCol = 1 To oRange_v1.Columns.Count
If oRange_v1.Cells(rRow, cCol) <> oRange_v2.Cells(rRow, cCol) Then
output(rRow, 1) = rRow
output(rRow, 2) = cCol
output(rRow, 3) = oRange_v1.Cells(rRow, cCol).Value2
output(rRow, 4) = oRange_v2.Cells(rRow, cCol).Value2
End If
Next cCol
Next rRow
oWB.Worksheets("Sheet1").Cells(2, 1).Resize(UBound(output, 1), UBound(output, 2)) = output
End Sub
Other thoughts:
You can have early bound if adding references is not a concern:
From: https://www.snb-vba.eu/VBA_Arraylist_en.html
ThisWorkbook.VBProject.References.AddFromFile "C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\mscorlib.tlb"
or
ThisWorkbook.VBProject.References.AddFromguid "{BED7F4EA-1A96-11D2-8F08-00A0C9A6186D}", 2, 4
You are wasting an already created object by continually re-creating your missing_row ArrayList within loop. Create it once, before the loop, and just before you loop round again call the .Clear method.
Excel VBA
I will try my best to have this make sense.
I have set my array for 29 items (equipment ID T0001-T0028).
Essentially I have a report that has maintenance write-ups for these pieces of equipment. I am trying to create a loop that will go through the file and find each time the equid ID is listed and then i will use the left/right/mid functions to extract data from the file. However each equip ID will be listed multiple times so they way i am picturing this happening is for equip id T0001 the procedure will go through the entire file finding and extracting each time "T0001" is listed and then go to "T0002" and go through the entire file and so...
I know it will be a loop of some sort but I am so confused on whether to loop the file or loop the array. Can anyone help.
Sub EquipArray()
Dim sampleArr() As Variant
Dim i As Integer
Dim rng As Range, cell As Range
i = 1
Set rng = Range("A2:A29")
sampleArr = rng
End Sub
this is a loop that finds duplicates. alter to your needs
Private Sub this()
Dim rng As Range
Dim rCell As Range
Dim this As String
Dim arr(9)
Set rng = ThisWorkbook.Sheets("Sheet1").Range("a1:a10")
For Each rCell In rng.Cells
this = rCell.Value
For x = LBound(arr, 1) To UBound(arr, 1)
If this = arr(x) Then
rCell.Interior.ColorIndex = 7
Exit For
ElseIf this <> arr(x) And arr(x) = vbNullString Then
arr(x) = this
Exit For
End If
Next x
Next rCell
End Sub
If I understood it right, it will make no difference, since you'll have to loop it [number of files] * [number of items] (or [number of items] * [number of files], wich is the same).
Couldn't you use a formula to classify each file beforehand, so you don't have to test it against each item?
Is there a faster way to do this?
Set data = ws.UsedRange
Set unique = CreateObject("Scripting.Dictionary")
On Error Resume Next
For x = 1 To data.Rows.Count
unique.Add data(x, some_column_number).Value, 1
Next x
On Error GoTo 0
At this point unique.keys gets what I need, but the loop itself seems to be very slow for files that have tens of thousands of records (whereas this wouldn't be a problem at all in a language like Python or C++ especially).
Use Excel's AdvancedFilter function to do this.
Using Excels inbuilt C++ is the fastest way with smaller datasets, using the dictionary is faster for larger datasets. For example:
Copy values in Column A and insert the unique values in column B:
Range("A1:A6").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("B1"), Unique:=True
It works with multiple columns too:
Range("A1:B4").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("D1:E1"), Unique:=True
Be careful with multiple columns as it doesn't always work as expected. In those cases I resort to removing duplicates which works by choosing a selection of columns to base uniqueness. Ref: MSDN - Find and remove duplicates
Here I remove duplicate columns based on the third column:
Range("A1:C4").RemoveDuplicates Columns:=3, Header:=xlNo
Here I remove duplicate columns based on the second and third column:
Range("A1:C4").RemoveDuplicates Columns:=Array(2, 3), Header:=xlNo
Loading the values in an array would be much faster:
Dim data(), dict As Object, r As Long
Set dict = CreateObject("Scripting.Dictionary")
data = ActiveSheet.UsedRange.Columns(1).Value
For r = 1 To UBound(data)
dict(data(r, some_column_number)) = Empty
Next
data = WorksheetFunction.Transpose(dict.keys())
You should also consider early binding for the Scripting.Dictionary:
Dim dict As New Scripting.Dictionary ' requires `Microsoft Scripting Runtime` '
Note that using a dictionary is way faster than Range.AdvancedFilter on large data sets.
As a bonus, here's a procedure similare to Range.RemoveDuplicates to remove duplicates from a 2D array:
Public Sub RemoveDuplicates(data, ParamArray columns())
Dim ret(), indexes(), ids(), r As Long, c As Long
Dim dict As New Scripting.Dictionary ' requires `Microsoft Scripting Runtime` '
If VarType(data) And vbArray Then Else Err.Raise 5, , "Argument data is not an array"
ReDim ids(LBound(columns) To UBound(columns))
For r = LBound(data) To UBound(data) ' each row '
For c = LBound(columns) To UBound(columns) ' each column '
ids(c) = data(r, columns(c)) ' build id for the row
Next
dict(Join$(ids, ChrW(-1))) = r ' associate the row index to the id '
Next
indexes = dict.Items()
ReDim ret(LBound(data) To LBound(data) + dict.Count - 1, LBound(data, 2) To UBound(data, 2))
For c = LBound(ret, 2) To UBound(ret, 2) ' each column '
For r = LBound(ret) To UBound(ret) ' each row / unique id '
ret(r, c) = data(indexes(r - 1), c) ' copy the value at index '
Next
Next
data = ret
End Sub
PowerShell is a very powerful and efficient tool. This is cheating a little, but shelling PowerShell via VBA opens up lots of options
The bulk of the code below is simply to save the current sheet as a csv file. The output is another csv file with just the unique values
Sub AnotherWay()
Dim strPath As String
Dim strPath2 As String
Application.DisplayAlerts = False
strPath = "C:\Temp\test.csv"
strPath2 = "C:\Temp\testout.csv"
ActiveWorkbook.SaveAs strPath, xlCSV
x = Shell("powershell.exe $csv = import-csv -Path """ & strPath & """ -Header A | Select-Object -Unique A | Export-Csv """ & strPath2 & """ -NoTypeInformation", 0)
Application.DisplayAlerts = True
End Sub
it's funny because i've had to read these instructions over and over again, but it think i worked out a much faster way to do this:
Set data = ws.UsedRange
dim unique as variant
unique = WorksheetFunction.Unique(data)
And then you can do whatever you want with the unique array such as iterating it:
For i = LBound(unique) To UBound(unique)
Range("Q" & i) = indexes(i, 1)
Next
Try this
Option Explicit
Sub UniqueValues()
Dim ws As Worksheet
Dim uniqueRng As Range
Dim myCol As Long
myCol = 5 '<== set it as per your needs
Set ws = ThisWorkbook.Worksheets("unique") '<== set it as per your needs
Set uniqueRng = GetUniqueValues(ws, myCol)
End Sub
Function GetUniqueValues(ws As Worksheet, col As Long) As Range
Dim firstRow As Long
With ws
.Columns(col).RemoveDuplicates Columns:=Array(1), header:=xlNo
firstRow = 1
If IsEmpty(.Cells(1, col)) Then firstRow = .Cells(1, col).End(xlDown).row
Set GetUniqueValues = Range(.Cells(firstRow, col), .Cells(.Rows.Count, col).End(xlUp))
End With
End Function
it should be quite fast and without the drawback NeepNeepNeep told about
I have a worksheet with data in columns A and B.
I am looking for a convenient way to take these columns and convert to dictionary where the cell in column A is the key and column B is the value, something like :
Dim dict as Dictionary
Set dict = CreateDictFromColumns("SheetName", "A", "B")
NOTE: I am already referencing the scripting dll.
You would need to loop, E.g.
Function CreateDictFromColumns(sheet As String, keyCol As String, valCol As String) As Dictionary
Set CreateDictFromColumns = New Dictionary
Dim rng As Range: Set rng = Sheets(sheet).Range(keyCol & ":" & valCol)
Dim i As Long
Dim lastCol As Long '// for non-adjacent ("A:ZZ")
lastCol = rng.Columns.Count
For i = 1 To rng.Rows.Count
If (rng(i, 1).Value = "") Then Exit Function
CreateDictFromColumns.Add rng(i, 1).Value, rng(i, lastCol).Value
Next
End Function
This breaks on the first empty key value cell.
I think it'd be best form to pass two ranges to a create dictionary function. This allows for the ranges to be completely separate, even different workbooks. It also allows for a 1D range to be mapped to a 2D range as demonstrated below.
Alternatively, you could also pass two arrays of range values. That may be cleaner for 1D ranges, but would result in slightly more code for 2D mapping. Notice that range elements can be looped through left to right top to bottom by index. You can use Application.Transpose(Range("A1:A5")) to effectively run top to bottom left to right.
Jagged Mapping
Sub Test()
RangeToDict Sheets(1).Range("A1:A5"), Sheets(2).Range("C1:E2")
End Sub
Function RangeToDict(ByVal KeyRng As Range, ByVal ValRng As Range) As Dictionary
Set RangeToDict = New Dictionary
For Each r In KeyRng
vi = vi + 1
'It may not be advisable to handle empty key values this way
'The handling of empty values and #N/A/Error values
'Depends on your exact usage
If r.Value2 <> "" Then
RangeToDict.Add r.Value2, ValRng(vi)
Debug.Print r.Value2 & ", " & ValRng(vi)
End If
Next
End Function
Side-By-Side (As Range)
If your target range is a single 2 column range side by side, you can simplify to passing a single range as shown below. Consequently, this also works for mapping every other element in a 1 dimensional range.
Sub Test()
RangeToDict2 Range("A1:B5")
End Sub
Function RangeToDict2(ByVal R As Range) As Dictionary
Set RangeToDict2 = New Dictionary
i = 1
Do Until i >= (R.Rows.Count * R.Columns.Count)
RangeToDict2.Add R(i), R(i + 1)
Debug.Print R(i) & ", " & R(i + 1)
i = i + 2
Loop
End Function
Two Columns (As Array)
Lastly, as an example of passing arrays as arguments, you could do something like the following. However, the following code will only work given the OP's specific scenario of mapping two columns. As is, it won't handle mapping rows or alternating elements.
Sub Test()
Dim Keys() As Variant: Keys = Range("E1:I1").Value2
Dim Values() As Variant: Values = Range("E3:I3").Value2
RangeToDict Keys, Values
End Sub
Function RangeToDict(Keys() As Variant, Values() As Variant) As Dictionary
Set RangeToDict = New Dictionary
For i = 1 To UBound(Keys)
RangeToDict.Add Keys(i, 1), Values(i, 1)
Debug.Print Keys(i, 1) & ", " & Values(i, 1)
Next
End Function
Use of Named Ranges
It may be convenient to used named ranges, in which case you can pass a Range as an argument likes this...
Sub Test()
RangeToDict Names("Keys").RefersToRange, Names("Values").RefersToRange
End Sub
The best approach to take, is to populate a variant array with the data from the worksheet. You can then loop through the array, assigning the elements of the first array column as the dictionary key; the elements of the second array column can then be used as the value.
The lrow function is used to find the last populated row from column A - allowing the code to create a dynamically sized array and dictionary.
To enable use of dictionaries within VBA, you will need to go to Tools -> References and then enable Microsoft Scripting Runtime.
Sub createDictionary()
Dim dict As Scripting.Dictionary
Dim arrData() As Variant
Dim i as Long
arrData = Range("A1", Cells(lrow(1), 2))
set dict = new Scripting.Dictionary
For i = LBound(arrData, 1) To UBound(arrData, 1)
dict(arrData(i, 1)) = arrData(i, 2)
Next i
End Sub
Function lrow(ByVal colNum As Long) As Long
lrow = Cells(Rows.Count, 1).End(xlUp).Row
End Function
This should do the trick :
Public Function test_leora(SheetName As String, _
KeyColumn As String, _
ValColumn As String) _
As Variant
Dim Dic, _
Val As String, _
Key As String, _
Ws As Worksheet, _
LastRow As Long
Set Ws = ThisWorkbook.Sheets(SheetName)
Set Dic = CreateObject("Scripting.Dictionary")
With Ws
LastRow = .Range(KeyColumn & .Rows.Count).End(xlUp).Row
For i = 1 To LastRow
Val = .Cells(i, ValColumn)
Key = .Cells(i, KeyColumn)
If Dic.exists(Key) Then
Else
Dic.Add Key, Val
End If
Next i
End With
test_leora = Dic
End Function
I do not know the size of the array.
The array is created from a column L on sheet "INPUT_MASTERDATA".
This is what I have so far:
With Worksheets("INPUT_MASTERDATA")
arrInputUniqueItems = .Range("L2", .Range("L" & Rows.Count).End(xlUp))
End With
I would like to include the value "x" in the above array.
Sample data from Range L on Worksheet "INPUT_MASTERDATA"
R83711850
1210221340
1210223342
R83711181
R83711931
These all goes into the array just fine. Now I would like to add the value "x". So the array afterwards looks like this:
R83711850
1210221340
1210223342
R83711181
R83711931
x
Any ideas or help is highly appreciated! Thank you in advance.
If you are going to be dynamically adding items to your collection you should consider a better suitable data type like Collection. Array's are not supposed by resized after their dimensions are specified.
So... consider creating an array from range in one go, then loading that into a Collection and then you can add more items to the collection. The reason you want to use the array is to load the entire range into memory is going to be faster than iterating a Range and adding each cell to the collection directly.
Sub Main()
[L1] = "header"
[L2] = "R83711850"
[L3] = "1210221340"
[L4] = "1210223342"
[L5] = "R83711181"
[L6] = "R83711931"
Dim lastRow As Long
lastRow = Range("L" & Rows.Count).End(xlUp).Row
Dim v As Variant
Dim c As New Collection
Dim arr As Variant
arr = Range("L2:L" & lastRow).Value
For Each v In arr
c.Add v
Next
' then if you ever need to add more just add it to the collection
c.Add "new value"
' print to confirm
For Each v In c
Debug.Print v
Next
End Sub
prints
R83711850
1210221340
1210223342
R83711181
R83711931
new value
in the Immediate Window CTRL+G
You can do this easily enough without resorting to further loops and/or other methods.
Simply use a range 1 cell longer than the end of the range of interest, or
Use ReDim on a 1D array (of course you can and should resize arrays)
code 1
Sub Method1_2D()
Dim arrInputUniqueItems
With Worksheets("INPUT_MASTERDATA")
arrInputUniqueItems = .Range("L2", .Range("L" & Rows.Count).End(xlUp).Offset(1, 0))
arrInputUniqueItems(UBound(arrInputUniqueItems), 1) = "X"
End With
End Sub
code 2
Sub Method2_1D()
Dim arrInputUniqueItems
With Worksheets("INPUT_MASTERDATA")
arrInputUniqueItems = Application.Transpose(.Range("L2", .Range("L" & Rows.Count).End(xlUp)))
ReDim Preserve arrInputUniqueItems(UBound(arrInputUniqueItems))
arrInputUniqueItems(UBound(arrInputUniqueItems)) = "X"
End With
End Sub