VBA for Duplicates (specific need) - excel

I've been scouring Stack Overflow and haven't found what I'm looking for, which is frustrating because I know I can't be the only one who has come across this.
I have an Excel spreadsheet with 26k rows, and I need to purge rows where the combination of values in columns D and E is duplicated, except that I want to keep up to 10 rows of each combination and purge the rest. In some instances there will only be 3 duplicate rows, so those can stay.
Here is an example of my spreadsheet:
+------+-------+--------+---------+---------+
| Code | Local | Number | Place A | Place B |
+------+-------+--------+---------+---------+
| A    | 558   | 25     | DEW     | ABE     |
| A    | 485   | 14     | DEW     | FXD     |
| A    | 658   | 85     | DEW     | ABE     |
| A    | 225   | 68     | ABE     | FXD     |
| A    | 1     | 56     | ABE     | FXD     |
| A    | 47    | 412    | DEW     | CDE     |
+------+-------+--------+---------+---------+
For example, if I had 15 rows where Place A and Place B were DEW and ABE, I would want to delete 5 of them. I don't care which 5; I just need to be left with 10.

Tim Williams' comment (not reproduced here) will do exactly what you're after and is far simpler than a VBA solution. If you do want VBA, the following will work:
Public Sub FilterRange(ByRef TargetTable As Range, ByVal TargetColumns As Variant, Optional ByVal MaxDuplicateCount As Long = 10, _
        Optional ByVal IsCaseSensitive As Boolean = False, Optional ByVal Delimiter As String = "^&")
    Dim Temp As Variant, x As Long, y As Long
    'Error checking
    If Not IsArray(TargetColumns) Then
        MsgBox "Target columns must be provided as a one dimensional array i.e. ""Array(1, 4, 5)"" ", vbCritical
        Exit Sub
    End If
    'More error checking (LBound also copes with Option Base 1)
    For x = LBound(TargetColumns, 1) To UBound(TargetColumns, 1)
        If Not IsNumeric(TargetColumns(x)) Then
            MsgBox "Invalid column number supplied: " & TargetColumns(x), vbCritical
            Exit Sub
        ElseIf TargetColumns(x) < 1 Then
            MsgBox "Invalid column number supplied: " & TargetColumns(x), vbCritical
            Exit Sub
        ElseIf TargetColumns(x) > TargetTable.Columns.Count Then
            MsgBox "Invalid column number supplied: " & TargetColumns(x), vbCritical
            Exit Sub
        End If
    Next x
    'Create Dictionary object
    Dim DuplicateCounter As Object, ThisRowVal As Variant
    Set DuplicateCounter = CreateObject("Scripting.Dictionary")
    'Set Dictionary case sensitivity (must happen before any item is added)
    If IsCaseSensitive Then
        DuplicateCounter.CompareMode = 0 'vbBinaryCompare
    Else
        DuplicateCounter.CompareMode = 1 'vbTextCompare
    End If
    'Pull table into an array
    Temp = TargetTable.Value
    'Check each row in the array
    For x = 1 To UBound(Temp, 1)
        'Build this row's key from the supplied columns, e.g. "DEW^&ABE^&"
        ThisRowVal = Empty
        For y = LBound(TargetColumns, 1) To UBound(TargetColumns, 1)
            ThisRowVal = ThisRowVal & Temp(x, TargetColumns(y)) & Delimiter
        Next y
        'Check for duplicates
        If DuplicateCounter.Exists(ThisRowVal) Then
            If DuplicateCounter(ThisRowVal) >= MaxDuplicateCount Then
                'Too many with this key: clear the excess row's data (the row itself is not deleted)
                For y = 1 To UBound(Temp, 2)
                    Temp(x, y) = Empty
                Next y
            Else
                'We haven't reached the max row count: increment the counter
                DuplicateCounter(ThisRowVal) = DuplicateCounter(ThisRowVal) + 1
            End If
        Else
            'This key is new: add it to the dictionary with a count of 1
            DuplicateCounter.Add ThisRowVal, 1
        End If
    Next x
    'Write the output data back to the table range
    TargetTable.Value = Temp
End Sub
Put the above code into a standard module; you can then call it from a command button, or type the following into the Immediate window:
FilterRange Sheets("Sheet1").Range("A1:E26000"), Array(4, 5)
Because this pulls the data into an array, it runs quickly, but it overwrites the table range with values (formulas will be lost), and the excess rows are cleared rather than deleted. I've added some self-describing optional parameters that let you change the code's behaviour.
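The heart of FilterRange is a per-key counter: the first MaxDuplicateCount rows for each key are kept and every later one is dropped. As a language-neutral illustration (plain Python rather than VBA, with rows modelled as lists and a hypothetical helper named `keep_at_most`), the same rule can be sketched like this. Unlike the VBA, which blanks excess rows in place, this sketch simply leaves them out of the returned list; columns D and E are indexes 3 and 4 here because Python counts from 0.

```python
from collections import Counter


def keep_at_most(rows, key_cols, max_count=10, case_sensitive=False):
    """Keep at most `max_count` rows for each combination of values in
    `key_cols` (0-based column indexes); later excess rows are dropped."""
    seen = Counter()
    kept = []
    for row in rows:
        key = tuple(str(row[c]) for c in key_cols)
        if not case_sensitive:
            # Rough analogue of Dictionary.CompareMode = 1 (text compare)
            key = tuple(k.lower() for k in key)
        if seen[key] < max_count:
            seen[key] += 1
            kept.append(row)
    return kept


# 15 rows whose Place A / Place B pair is DEW / ABE, plus one other row
rows = [["A", n, n * 2, "DEW", "ABE"] for n in range(15)]
rows.append(["A", 47, 412, "DEW", "CDE"])
result = keep_at_most(rows, key_cols=(3, 4), max_count=10)
print(len(result))  # 11: ten DEW/ABE rows plus the DEW/CDE row
```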

Related

Macro stops prematurely

I need a macro that keeps moving down to the next cell until the name changes and, for each group of matching names, subtracts each Amount from the Amount in the group's bottom-most row.
Essentially my data looks like this (there is only one Buy for each name, and it is always the bottom-most row of the group):
Name | Transaction.Type | Amount | Remaining (what macro needs to do)
Name1 | Sell | 5 | 15 (20-5)
Name1 | Sell | 10 | 10 (20-10)
Name1 | Sell | 15 | 5 (20-15)
Name1 | Buy | 20 |
Name2 | Sell | 25 | 5
Name2 | Buy | 30 |
So far my macro looks like this:
Dim sline As Integer
Dim eline As Integer
Dim rng As Range
Dim lastrow(1 To 3) As Long
Application.DisplayAlerts = False
With Worksheets("Testing Data 2")
lastrow(1) = .Cells(Rows.Count, "A").End(xlUp).Row
End With
For i = 2 To 4151
If Worksheets("Testing Data 2").Range("A" & i) <> Worksheets("Testing Data 2").Range("A" & i).Offset(1, 0) Then
eline = i
Worksheets("Testing Data 2").Range(":C" & eline)
'struggling to go from here
End If
Next i
Application.DisplayAlerts = True
You can do this without VBA, given that each Name has only one Buy row. Enter the following in D2 and drag it down as needed:
=SUMIFS(C:C,A:A,A2,B:B,"Buy")-C2
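The formula works because each name has exactly one Buy row: SUMIFS picks up that Buy amount for the row's name, and the row's own amount is subtracted from it. The same calculation as a small plain-Python sketch (the function name `remaining_after_sells` is hypothetical, and Buy rows are left blank as in the question's expected output):

```python
def remaining_after_sells(rows):
    """rows: list of (name, transaction_type, amount) tuples.
    Returns one value per row: the Buy amount for that name minus the
    row's own amount, or None for the Buy rows themselves."""
    # Equivalent of SUMIFS(C:C, A:A, name, B:B, "Buy") for every name
    buy_total = {}
    for name, kind, amount in rows:
        if kind == "Buy":
            buy_total[name] = buy_total.get(name, 0) + amount
    return [
        None if kind == "Buy" else buy_total.get(name, 0) - amount
        for name, kind, amount in rows
    ]


data = [
    ("Name1", "Sell", 5),
    ("Name1", "Sell", 10),
    ("Name1", "Sell", 15),
    ("Name1", "Buy", 20),
    ("Name2", "Sell", 25),
    ("Name2", "Buy", 30),
]
print(remaining_after_sells(data))  # [15, 10, 5, None, 5, None]
```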

Split a delimited list based on variable ending characters into specific columns

I have got a data set to the point where I have a list of items delimited with a "|" symbol. I am now trying to separate each item in the list into its corresponding column; however, the column is identified by a variable-length piece of text at the end of each value.
Example Data (all in one column):
Column A
40.00A|24.00QS|8.00J[a]
40.00A|12.00J|8.00J[a]
20.00A|4.00V
30.00A|12.00CS|8.00QS
Desired Outcome:
+-------+-------+------+-------+-------+------+
| A | QS | J[a] | J | CS | V |
+-------+-------+------+-------+-------+------+
| 40.00 | 24.00 | 8.00 |       |       |      |
| 40.00 | | 8.00 | 12.00 | | |
| 20.00 | | | | | 4.00 |
| 30.00 | 8.00 | | | 12.00 | |
+-------+-------+------+-------+-------+------+
The set of trailing identifiers that define the columns is fixed at 6 (A, QS, J[a], J, CS and V), so I know in advance how many columns I will need.
I have some ideas on how to do it directly through formulas, but that would require splitting the items into individual columns by the delimiter and then using some sort of IF statement in additional columns. I would prefer to avoid helper columns. I also looked at the following link, but it doesn't solve the problem, as it assumes the value matches the column heading (I can correct for that, but I feel like there is a faster VBA solution here):
How to split single column (with unequal values) to multiple columns sorted according to values from the original single column?
I have been reading about regular expressions, and I suspect there is a solution there, but I can't quite figure out how to sort the results into the right columns.
Once I have this data set up, it is a small task to unpivot it and get the data into a proper tabular format using Power Query.
Thanks in advance!
Since the headers are fixed, you can simply try something like this (change the rows and columns of the source and destination data to suit your requirements):
Option Explicit
Sub test()
    Dim Ws As Worksheet, SrcLastrow As Long, TrgRow As Long, Rw As Long
    Dim Headers As Variant, xLine As Variant
    Dim i As Long, j As Long
    Set Ws = ThisWorkbook.ActiveSheet
    'Column A assumed to hold the texts from row 1 (start Rw at 2 if row 1 is a header)
    SrcLastrow = Ws.Range("A" & Ws.Rows.Count).End(xlUp).Row
    TrgRow = 2
    Headers = Array("A", "QS", "J[a]", "J", "CS", "V")
    For Rw = 1 To SrcLastrow
        xLine = Split(Ws.Cells(Rw, 1).Value, "|")
        For i = 0 To UBound(xLine)
            xLine(i) = Trim(xLine(i))
            For j = 0 To UBound(Headers)
                If Right(xLine(i), Len(Headers(j))) = Headers(j) Then
                    ' The output data table is assumed to start at column D
                    Ws.Range("D" & TrgRow).Offset(0, j).Value = Replace(xLine(i), Headers(j), "")
                    Exit For 'stop at the first matching suffix
                End If
            Next j
        Next i
        TrgRow = TrgRow + 1
    Next Rw
End Sub
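On the regular-expression idea from the question: each item is a number followed by a suffix, so a pattern such as `^([\d.]+)(.+)$` separates the two parts, and the suffix can then be looked up in the fixed header list by exact match, which avoids any risk of one header being a trailing piece of another. A plain-Python sketch of that approach (the name `split_by_suffix` is hypothetical; this is an alternative to, not a transcription of, the VBA answer above):

```python
import re

HEADERS = ["A", "QS", "J[a]", "J", "CS", "V"]
# Leading number (digits and dots); everything after it is the suffix
ITEM = re.compile(r"^([\d.]+)(.+)$")


def split_by_suffix(line, headers=HEADERS):
    """Turn '40.00A|24.00QS|8.00J[a]' into one cell per header
    (an empty string where that header does not occur)."""
    row = dict.fromkeys(headers, "")
    for item in line.split("|"):
        match = ITEM.match(item.strip())
        if match and match.group(2) in row:
            row[match.group(2)] = match.group(1)
    return [row[h] for h in headers]


for text in ["40.00A|24.00QS|8.00J[a]", "20.00A|4.00V"]:
    print(split_by_suffix(text))
# ['40.00', '24.00', '8.00', '', '', '']
# ['20.00', '', '', '', '', '4.00']
```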

VBA - Avoiding for loop

I have the following data in an excel worksheet, in columns A, B and C respectively.
+-----------+--------------+----------------+
| RangeName | Clear? | Value if Clear |
+-----------+--------------+----------------+
| Name1 | DO NOT CLEAR | |
| Name2 | | 6 |
| Name3 | | 7 |
| Name4 | DO NOT CLEAR | |
| Name5 | DO NOT CLEAR | |
| Name6 | DO NOT CLEAR | |
| Name7 | DO NOT CLEAR | |
| Name8 | DO NOT CLEAR | |
| Name9 | | 5 |
| Name10 | | 9 |
+-----------+--------------+----------------+
There's a "clear" macro which checks each Excel range name: if column B says "DO NOT CLEAR", it skips it and does nothing; if column B is blank, it resets the named range to the value in column C. The code is as follows:
For i = 1 To MaxRowCount
Select Case Range("RngeTbl").Cells(i, 2).Value
Case "DO NOT CLEAR" 'do nothing
Case Else 'set to default value
Range(Range("RngeTbl").Cells(i, 1).Value).Value = Range("RngeTbl").Cells(i, 3).Value
End Select
Next i
However, the number of range names is increasing massively, and right now I have 32571 range names.
Is there a way I can speed this macro up? I've been trying to put the column into an array and check it that way, but I'm having no luck.
Any help would be appreciated!
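The following code should be faster, because it reads the whole table into an array with one call instead of reading the worksheet cell by cell (run it with Application.ScreenUpdating = False, etc.):
Dim A As Variant, i As Long
A = Range("RngeTbl").Value 'no Set: .Value returns an array, not an object
For i = 1 To UBound(A)
    If A(i, 2) <> "DO NOT CLEAR" Then Range(A(i, 1)).Value = A(i, 3)
Next i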
If MaxRowCount is smaller than the number of rows in the range, then of course you could use that rather than UBound(A) in the loop.
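The gain comes from reading RngeTbl once and looping over the in-memory array; each named range still has to be written individually. Modelled in plain Python (the helper name `ranges_to_reset` is hypothetical, and the data is a subset of the question's table), the filtering step amounts to:

```python
def ranges_to_reset(table, skip_marker="DO NOT CLEAR"):
    """table: rows of (range_name, clear_flag, default_value), as read
    in one go from RngeTbl. Returns the (name, value) pairs to write."""
    return [(name, value) for name, flag, value in table if flag != skip_marker]


table = [("Name1", "DO NOT CLEAR", None), ("Name2", "", 6), ("Name3", "", 7),
         ("Name4", "DO NOT CLEAR", None), ("Name9", "", 5), ("Name10", "", 9)]
print(ranges_to_reset(table))
# [('Name2', 6), ('Name3', 7), ('Name9', 5), ('Name10', 9)]
```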
This code will Sort your RngeTbl range on the "Clear?" column, then count how many non-Blank cells are in the "Clear?" column, and start the loop at the next row.
This will mean that the loop skips all of the "DO NOT CLEAR" ranges - if all ranges are to be cleared then the code will run slightly slower. If there are no ranges to be cleared then the code will only take about as long as the Sort does.
Dim lStart As Long, i As Long
'Sort the range, without header (Excel always sorts blank cells to the bottom)
[RngeTbl].Sort [RngeTbl].Cells(1, 2), xlAscending, Header:=xlNo
'Since Calculation should be Manual for speed, we recalculate the sorted Range...
[RngeTbl].Calculate
'Count the Non-Blank cells in the "Clear?" column, to find the first blank cell
lStart = 1 + WorksheetFunction.CountA([RngeTbl].Columns(2))
'If there ARE any blank (to be cleared) cells
If lStart <= MaxRowCount Then
    'Skip all of the "DO NOT CLEAR" cells
    For i = lStart To MaxRowCount
        Range(Range("RngeTbl").Cells(i, 1).Value).Value = Range("RngeTbl").Cells(i, 3).Value
    Next i
End If
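The sort-based answer avoids testing every row: once sorted, all non-blank "Clear?" rows sit at the top, so their count is also the offset of the first row that needs resetting. A plain-Python sketch of that idea, assuming (as the VBA does via COUNTA) that "DO NOT CLEAR" is the only non-blank flag; `ranges_to_reset_sorted` is a hypothetical name:

```python
def ranges_to_reset_sorted(table):
    """table: rows of (range_name, clear_flag, default_value).
    Assumes any non-blank flag means 'do not clear'."""
    # Mirror the Excel sort: non-blank flags first, blanks last
    ordered = sorted(table, key=lambda row: row[1] == "")
    # Equivalent of COUNTA on the flag column: where the blanks start
    start = sum(1 for row in ordered if row[1] != "")
    return [(name, value) for name, _flag, value in ordered[start:]]


table = [("Name1", "DO NOT CLEAR", None), ("Name2", "", 6), ("Name3", "", 7),
         ("Name4", "DO NOT CLEAR", None), ("Name9", "", 5), ("Name10", "", 9)]
print(ranges_to_reset_sorted(table))
# [('Name2', 6), ('Name3', 7), ('Name9', 5), ('Name10', 9)]
```

Python's `sorted` is stable, so the blank rows keep their relative order here; note that the Excel version physically reorders RngeTbl on the sheet, which matters if the table order is meaningful.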

Move to new row every 4th column

I have data stored across the columns of a single row in an Excel sheet.
Is there a way to move data to next row every 4th column? Is there a built in function for this?
For example:
Data:
Actinium | Ac | 89 | 227.0278* | Aluminum | Al | 13 | 26,981539 ...
Output:
Actinium |Ac | 89 | 227.0278*
Aluminum |Al | 13 | 26,981539
I made a macro:
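Sub dela_igen()
    Dim i As Long, j As Long, cur_column As Long, lastCol As Long
    'Last used column in row 1 (instead of a hard-coded 100 records)
    lastCol = Cells(1, Columns.Count).End(xlToLeft).Column
    cur_column = 1
    For i = 1 To (lastCol + 3) \ 4
        For j = 1 To 4
            Cells(i, j).Value = Cells(1, cur_column).Value
            cur_column = cur_column + 1
        Next j
    Next i
    'Clear the original values left to the right of column 4 in row 1
    If lastCol > 4 Then Range(Cells(1, 5), Cells(1, lastCol)).ClearContents
End Sub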
Worked like a charm!
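The macro's cur_column counter is simply (i - 1) * 4 + j, the source column for output cell (i, j). The same reshaping, sketched in plain Python on the question's sample row (`wrap_every` is a hypothetical helper name):

```python
def wrap_every(values, width=4):
    """Split one long row into consecutive rows of `width` cells;
    the last row is shorter if the length is not a multiple of width."""
    return [values[i:i + width] for i in range(0, len(values), width)]


row = ["Actinium", "Ac", 89, "227.0278*", "Aluminum", "Al", 13, "26,981539"]
for record in wrap_every(row):
    print(record)
# ['Actinium', 'Ac', 89, '227.0278*']
# ['Aluminum', 'Al', 13, '26,981539']
```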

How to compare two (pivot) tables data on (different) worksheets?

There are 2 worksheets in 1 excel file with the following identical column structure:
BuildIndex | Phase | Module | Duration
The column BuildIndex is used as primary key.
Assume the following example data:
Worksheet 1:
1 | Phase 1 | Module 1 | 5
1 | Phase 2 | Module 1 | 3
1 | Phase 3 | Module 1 | 10
1 | Phase 1 | Module 2 | 6
1 | Phase 2 | Module 2 | 2
1 | Phase 3 | Module 2 | 5
Worksheet 2:
2 | Phase 1 | Module 1 | 3
2 | Phase 2 | Module 1 | 7
2 | Phase 3 | Module 1 | 9
2 | Phase 1 | Module 2 | 2
2 | Phase 2 | Module 2 | 10
2 | Phase 3 | Module 2 | 4
For now I create different pivot tables and diagrams and analyze the differences "by hand" to reach conclusions like:
for build index 1, module 2 is built X seconds faster than in build index 2
for build index 2, phase 3 (sum of all modules) is built Y seconds faster than in build index 1
What I want to do:
Because there are many phases and the number of modules is increasing continuously, the above procedure takes too much time, and I think there must be an automatic way to perform analyses like these.
So, do you have any idea if there's a way to realize my intention? Feel free to provide hints for excel formulas or pivot tables or vba or or or :-)
I solved it using VBA. Never worked before with it, so my code could be improved ;-)
Call AllInOne for the phases (every variable used is declared as String):
Option Explicit
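Sub ExtractUniquePhasesAndModules()
'Option Explicit requires these; they must be String to match AllInOne's ByRef String parameters
Dim srcSheet As String, destSheet As String, destPkColumn As String, destColumn As String
Dim calculateColumn As String, srcPkCell As String, srcColumn As String, sumValuesColumn As String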
'--------------------------------------
'| Perform calculations for TEST DATA |
'--------------------------------------
srcSheet = "CompareData"
destSheet = "CompareResults"
destPkColumn = "A"
destColumn = "B"
calculateColumn = "C"
'Phases 1
srcPkCell = "A2"
srcColumn = "B"
sumValuesColumn = "D"
AllInOne srcSheet, srcColumn, destSheet, destColumn, calculateColumn, sumValuesColumn, srcPkCell, destPkColumn
'Phases 2
srcPkCell = "F2"
srcColumn = "G"
sumValuesColumn = "I"
AllInOne srcSheet, srcColumn, destSheet, destColumn, calculateColumn, sumValuesColumn, srcPkCell, destPkColumn
End Sub
And this is the problem solving function:
Private Sub AllInOne(srcSheetName As String, srcColumnName As String, destSheetName As String, _
destColumnName As String, calculateColumnName As String, sumValuesColumnName As String, _
srcPkCellName As String, destPkColumnName As String)
Dim srcSheet As Worksheet
Dim destSheet As Worksheet
Dim srcColumn As Range
Dim destColumn As Range
Dim srcPkCell As Range
Dim destPkColumn As Range
Dim sumValuesColumn As Range
Dim wsf As WorksheetFunction
Set srcSheet = Worksheets(srcSheetName)
Set srcColumn = srcSheet.Range(srcColumnName + ":" + srcColumnName)
Set destSheet = Worksheets(destSheetName)
Set destColumn = destSheet.Range(destColumnName + ":" + destColumnName)
Set srcPkCell = srcSheet.Range(srcPkCellName)
Set destPkColumn = destSheet.Range(destPkColumnName + ":" + destPkColumnName)
Set sumValuesColumn = srcSheet.Range(sumValuesColumnName + ":" + sumValuesColumnName)
Set wsf = WorksheetFunction
'-----------------------
'Copy all unique values|
'-----------------------
destSheet.Select
Dim ctr As Range
'find the first empty cell
For Each ctr In destColumn.Cells
If ctr.Value = "0" Then
'do nothing
ElseIf ctr.Value = Empty Then
Exit For
End If
Next
'start copying
srcColumn.AdvancedFilter Action:=xlFilterCopy, CopyToRange:=ctr, Unique:=True
'set destination range to only the new cells
Set destColumn = destSheet.Range(ctr.Address + ":" + destColumnName & destColumn.Count)
Dim cell As Range
Dim calcCell As Range
Dim destPkCell As Range
For Each cell In destColumn.Cells
'end of list reached?
If cell.Value = Empty Then
Exit For
End If
'Fill in primary key
Set destPkCell = destSheet.Range(destPkColumnName & cell.Row)
destPkCell.Value = srcPkCell.Value
'Perform the sum-calculation and show the result
Set calcCell = destSheet.Range(calculateColumnName & cell.Row)
calcCell.Value = wsf.SumProduct(wsf.SumIf(srcColumn, "=" & cell.Value, sumValuesColumn))
Next
End Sub
First it iterates over the destination column to find the first empty cell, which is then used as the CopyToRange argument of the AdvancedFilter method.
Then it inserts the primary key (BuildIndex in my case) and the result of SumProduct for every row.
The result for the phases, using the question's data, is:
1 | Phase | 0
1 | Phase 1 | 11
1 | Phase 2 | 5
1 | Phase 3 | 15
2 | Phase | 0
2 | Phase 1 | 5
2 | Phase 2 | 17
2 | Phase 3 | 13
Now I'm able to create diagrams just like I want :-)
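The answer stops at per-phase totals; the comparison the question asks for ("phase 3 is Y seconds faster") is then one subtraction per phase. A plain-Python sketch of both steps on the question's example data, with hypothetical helper names `phase_totals` and `phase_differences`; a positive difference means build 1 took longer. Grouping by module instead of phase only means swapping which tuple field forms the key.

```python
from collections import defaultdict


def phase_totals(rows):
    """rows: (build_index, phase, module, duration) tuples.
    Returns {(build_index, phase): total duration over all modules}."""
    totals = defaultdict(int)
    for build, phase, _module, duration in rows:
        totals[(build, phase)] += duration
    return dict(totals)


def phase_differences(totals, build_a, build_b):
    """Duration of build_a minus build_b for every phase both builds have."""
    phases_a = {phase for build, phase in totals if build == build_a}
    phases_b = {phase for build, phase in totals if build == build_b}
    return {
        phase: totals[(build_a, phase)] - totals[(build_b, phase)]
        for phase in sorted(phases_a & phases_b)
    }


sheet1 = [(1, "Phase 1", "Module 1", 5), (1, "Phase 2", "Module 1", 3),
          (1, "Phase 3", "Module 1", 10), (1, "Phase 1", "Module 2", 6),
          (1, "Phase 2", "Module 2", 2), (1, "Phase 3", "Module 2", 5)]
sheet2 = [(2, "Phase 1", "Module 1", 3), (2, "Phase 2", "Module 1", 7),
          (2, "Phase 3", "Module 1", 9), (2, "Phase 1", "Module 2", 2),
          (2, "Phase 2", "Module 2", 10), (2, "Phase 3", "Module 2", 4)]
totals = phase_totals(sheet1 + sheet2)
print(phase_differences(totals, 1, 2))
# {'Phase 1': 6, 'Phase 2': -12, 'Phase 3': 2}
```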
