I have a spreadsheet that contains over 100k rows in a single column (I know crazy) and I need to find an efficient way to highlight partial duplicates and remove them. All the records are all in the same format, but may have an additional letter attached at the end. I would like to keep the first instance of the partial duplicate, and remove all instances after.
So from this:
1234 W
1234 T
9456 S
1234 T
To This:
1234 W
9456 S
I was going to use the formula below to conditionally highlight the partial dupes, but i receive an error "You may not use reference operators (such as unions....) or array constants for Conditional Formatting criteria" and use VBA to remove those highlighted cells.
=if(A1<>"",Countif(A$1:A,left(A1,4)& "*") > 1)
Any thoughts? I know conditional formatting is memory intensive, so if there's any way to perform this using VBA I'm open to suggestion.
Here is one way to remove the duplicates quickly:
Text to Columns, using space delimiter.
Remove Duplicates referring to duplicates in the first column only.
Merge the content of each row with =Concatenate(A1, B1).
If the "unique identifier" of each value is just its first 4 characters, then maybe the code below will be okay for you.
I recommend making a copy of your file before running any code, as code tries to overwrite the contents of column A. (The procedure to run is PreprocessAndRemoveDuplicates.)
You may need to change the name of the sheet (in the code). I assumed "Sheet1".
Code assumes data is only in column A.
Option Explicit
Private Sub PreprocessAndRemoveDuplicates()
Dim targetSheet As Worksheet
Set targetSheet = ThisWorkbook.Worksheets("Sheet1") ' Change to whatever yours is called. You could use code name instead too.
Dim lastCell As Range
Set lastCell = targetSheet.Cells(targetSheet.Rows.Count, "A").End(xlUp)
Debug.Assert lastCell.Row > 1
Dim inputArray() As Variant
inputArray = targetSheet.Range("A1", lastCell) ' Assumes data starts from A1.
Dim uniqueValues As Scripting.Dictionary
Set uniqueValues = New Scripting.Dictionary
Dim rowIndex As Long
For rowIndex = LBound(inputArray, 1) To UBound(inputArray, 1)
Dim currentKey As String
currentKey = GetKeyFromValue(CStr(inputArray(rowIndex, 1)))
If Not uniqueValues.Exists(currentKey) Then ' Only first instance added.
uniqueValues.Add currentKey, inputArray(rowIndex, 1)
End If
Next rowIndex
WriteDictionaryItemsToSheet uniqueValues, targetSheet.Cells(1, lastCell.Column)
End Sub
Private Function GetKeyFromValue(ByVal someText As String, Optional charactersToExtract As Long = 4) As String
' If below logic is not correct/appropriate for your scenario, replace with whatever it should be.
' Presently this just gets the first N characters of the string, where N is 4 by default.
GetKeyFromValue = Left$(someText, charactersToExtract)
End Function
Private Sub WriteDictionaryItemsToSheet(ByVal someDictionary As Scripting.Dictionary, ByVal firstCell As Range)
Dim initialArray() As Variant
initialArray = someDictionary.Items()
Dim arrayToWriteToSheet() As Variant
arrayToWriteToSheet = StandardiseArray(initialArray)
With firstCell
.EntireColumn.ClearContents
.Resize(UBound(arrayToWriteToSheet, 1), UBound(arrayToWriteToSheet, 2)).Value = arrayToWriteToSheet
End With
End Sub
Private Function StandardiseArray(ByRef someArray() As Variant) As Variant()
' Application.Transpose might be limited to ~65k
Dim baseDifference As Long
baseDifference = 1 - LBound(someArray)
Dim rowCount As Long ' 1 based
rowCount = UBound(someArray) - LBound(someArray) + 1
Dim outputArray() As Variant
ReDim outputArray(1 To rowCount, 1 To 1)
Dim readIndex As Long
Dim writeIndex As Long
For readIndex = LBound(someArray) To UBound(someArray)
writeIndex = writeIndex + 1
outputArray(writeIndex, 1) = someArray(readIndex)
Next readIndex
StandardiseArray = outputArray
End Function
Processed 1 million values (A1:A1000000) in under 3 seconds on my machine, but performance on your machine may differ.
Related
I am using an IF statement in Excel to search for portions of text in the previous column in order to assign a supplier and category to the expense.
Supplier Column
=IF(ISNUMBER(SEARCH("tit",[#Description])),"TITAN",IF(ISNUMBER(SEARCH("Sol",[#Description])),"Soltrack",IF(ISNUMBER(SEARCH("coin",[#Description])),"Coin",IF(ISNUMBER(SEARCH("gree",[#Description])),"Green Dream Projects",IF(ISNUMBER(SEARCH("sars V",[#Description])),"SARS VAT",IF(ISNUMBER(SEARCH("sars p",[#Description])),"SARS PAYE",IF(ISNUMBER(SEARCH("acb",[#Description])),"Debit Order","")))))))
Category Column
the next column then has the following to get the category of the supplier
=IF(ISNUMBER(SEARCH("TITAN",[#Payee])),"Direct Operating Cost",IF(ISNUMBER(SEARCH("Soltrack",[#Payee])),"Direct Operating Cost",IF(ISNUMBER(SEARCH("Coin",[#Payee])),"Direct Operating Cost",IF(ISNUMBER(SEARCH("Green Dream Projects",[#Payee])),"Direct Operating Cost",IF(ISNUMBER(SEARCH("SARS VAT",[#Payee])),"VAT",IF(ISNUMBER(SEARCH("SARS PAYE",[#Payee])),"PAYE",IF(ISNUMBER(SEARCH("Debit Order",[#Payee])),"Debit Order","")))))))
this is working great, but seems i have reached the limit (7) of IF statements I can use in one formula?
I have created the below function to search for text "tit" and if it matches it updates the Payee column.
'excel if range of cells contains specific text vba
Sub MacroToCheckIfRangeOfCellsContainsSpecificText_vba()
Set Rng = Range("B2:B572") ' You can change this
specificText = "*tit*" ' You can change this
For Each Cell In Rng.Cells
If UCase(Cell.Value) Like "*" & UCase(specificText) & "*" Then
Cell.Offset(0, 1) = "Titan"
Else
Cell.Offset(0, 1) = ""
End If
Next
End Sub
Would I need to create a new specificText = "*tit*" for each of the keywords and also a whole section for each of the "For Each" functions?
Dictionary Solution
The first idea is to use a dictionary Replacements and add all the serach/replace pairs there. This has one huge disadvantage. It is against the good practice to not mix logic (code) and data. Good practice would be to put the data not into the code but into a worksheet (see next solution).
Option Explicit
Public Sub MacroToCheckIfRangeOfCellsContainsSpecificText_vba()
Dim RngToCheck As Range
Set RngToCheck = ThisWorkbook.Worksheets("Sheet1").Range("B2:B572") ' specify in which workbook and worksheet
Dim Replacements As Object
Set Replacements = CreateObject("Scripting.Dictionary")
With Replacements
.Add "tit", "Titan"
.Add "sol", "Soltrack"
'add more here
End With
Dim InputValues() As Variant
InputValues = RngToCheck.Value 'read input values into array
Dim OutputValues() As Variant 'create an output array (same size as RngToCheck)
ReDim OutputValues(1 To RngToCheck.Rows.Count, 1 To 1)
Dim iRow As Long
For iRow = 1 To UBound(OutputValues, 1)
Dim Key As Variant
For Each Key In Replacements.Keys
If UCase(InputValues(iRow, 1)) Like "*" & UCase(Key) & "*" Then
OutputValues(iRow, 1) = Replacements(Key)
Exit For 'we don't need to test for the others if we found a key
End If
Next Key
Next iRow
'write output values from array next to input values in the cells
RngToCheck.Offset(ColumnOffset:=1).Value = OutputValues
End Sub
Worksheet Solution
The better solution would be to create a new worksheet Replacements as below:
This can easily be edited by anyone and you don't need to fiddle with the code later if you want to delete or add pairs.
Public Sub ImprovedCheckUsingWorksheet()
Dim RngToCheck As Range
Set RngToCheck = ThisWorkbook.Worksheets("Sheet1").Range("B2:B572") ' specify in which workbook and worksheet
Dim Replacements() As Variant 'read replacements from worksheet
Replacements = ThisWorkbook.Worksheets("Replacements").Range("A2", ThisWorkbook.Worksheets("Replacements").Cells(Rows.Count, "B").End(xlUp)).Value 'read input values into array
Dim InputValues() As Variant
InputValues = RngToCheck.Value 'read input values into array
Dim OutputValues() As Variant 'create an output array (same size as RngToCheck)
ReDim OutputValues(1 To RngToCheck.Rows.Count, 1 To 1)
Dim iRow As Long
For iRow = 1 To UBound(OutputValues, 1)
Dim rRow As Long
For rRow = 1 To UBound(Replacements, 1)
If UCase(InputValues(iRow, 1)) Like "*" & UCase(Replacements(rRow, 1)) & "*" Then
OutputValues(iRow, 1) = Replacements(rRow, 2)
Exit For 'we don't need to test for the others if we found a key
End If
Next rRow
Next iRow
'write output values from array next to input values in the cells
RngToCheck.Offset(ColumnOffset:=1).Value = OutputValues
End Sub
For a 3ʳᵈ column in your replacements worksheet you would need to adjust the following line to be until column "C":
Replacements = ThisWorkbook.Worksheets("Replacements").Range("A2", ThisWorkbook.Worksheets("Replacements").Cells(Rows.Count, "C").End(xlUp)).Value 'read input values into array
and the output values need another column too (second parameter needs to go 1 To 2):
ReDim OutputValues(1 To RngToCheck.Rows.Count, 1 To UBound(Replacements, 2) - 1) 'this works for any amount of columns as it reads the column count from the `Replacements`
the data needs to be written
OutputValues(iRow, 1) = Replacements(rRow, 2) 'first output column
OutputValues(iRow, 2) = Replacements(rRow, 3) 'second output column
and writing the output values needs to be adjusted too:
RngToCheck.Offset(ColumnOffset:=1).Resize(ColumnSize:=UBound(OutputValues, 2)).Value = OutputValues 'this works for any amount of columns as it reads the column count from `OutputValues`
Formula Solution
But if you have your data in a worksheet Replacements like above, and you don't rely an a partial match. Then you don't need VBA and can easily use a formula instead to look it up:
=IFERROR(INDEX(Replacements!B:B,MATCH(B:B,Replacements!A:A,0)),"")
Using VBA, I would like to AND across each row in a 2D array and star the result in separate 1D array without ANDing a single pair the ANDing the result with the next item in that row.
FYI This is my first time using 2D arrays so sorry if there is an obvious solution.
For example if the data in my sheet looked like this (the actual range is much larger):
I would like to do the equlavant of an excel formula: =AND(B2:D2) then =AND(B3:D3), etc...
I have code that sets everything up but I don't know how to proceed except to loop across each element of a row, store the result then loop across the next, etc, etc. I'm hoping the there is a much better (more efficient) way to proceed.
Here is my code so far
Sub Exceptions()
' Setup worksheet
Dim wks As Worksheet
Set wks = cnTest
' Find last row of range
Dim LastRow As Long
LastRow = Find_LastRow(wks) 'Functionthat returns last row
' load range into array
Dim MyArray As Variant
MyArray = wks.Range("B2:D8")
' Setup 1D Result array
Dim Results As Variant
Results = wks.Range("A2:A8")
Dim i As Long
For i = 1 To LastRow
' Perform AND function on each row of the array
' then place result in 1D array (Results())
' If this were a formul: =AND(B2:D2)
'
' Is there way to "AND" across a row in and array or
' must I "AND" MyArray(1,1) with MyArray(1,2) then AND
' that result with MyArray(1,3)
Next i
End Sub
Thank you
Try this.
Sub Exceptions()
' Setup worksheet
' load range into array
Dim MyArray As Variant
MyArray = ActiveSheet.Range("B2:D8")
' Setup 1D Result array
Dim Results As Variant
Results = ActiveSheet.Range("A2:A8")
Dim i As Long
Dim X As Long
For i = 1 To UBound(MyArray, 1)
Results(i, 1) = "True"
For X = 1 To UBound(MyArray, 2)
If MyArray(i, X) = False Then
Results(i, 1) = "False"
Exit For
End If
Next X
Next i
End Sub
Try,
Sub test()
Dim vR()
Dim rngDB As Range, rng As Range
Dim i As Long, r As Long
Set rngDB = Range("b2:b8")
r = rngDB.Rows.Count
ReDim vR(1 To r)
For Each rng In rngDB
i = i + 1
vR(i) = WorksheetFunction.And(rng.Resize(1, 3))
Next rng
Range("a2").Resize(r) = WorksheetFunction.Transpose(vR)
End Sub
In the formula bar, type:
=IF(-PRODUCT(IF(A1,-1,0),IF(C1,-1,0)),TRUE,FALSE)
(if the data is in columns A and C), and drag down.
Because, as everyone knows, A AND B = AB if A and B are Boolean variables (and watch the minus in front of the PRODUCT).
I have a range consisting of two columns that the user would define thru Application.Inputbox method. I would store that as rng in the VBA to be copied then pasted later to some cells in Excel sheet. Before pasting, I would like to swap these two columns in rng. Is there a way to do that without a loop and without having to swap the actual original columns in the excel sheet?
So what I mean is something like this:
rng_swapped.Columns(1).Value = rng.Columns(2).Value
rng_swapped.Columns(2).Value = rng.Columns(1).Value
rng = rng_swapped
Use a variant array as an intermediate temporary storage so you can overwrite the original.
dim arr as variant
arr = rng_swapped.Columns(1).value
rng_swapped.Columns(1) = rng_swapped.Columns(2).Value
rng_swapped.Columns(2) = arr
from your narrative my understanding is that the range to paste to is different from the range to copy from.
so just go like this
Dim rng As Range
Set rng = Application.InputBox("Please select a range:", "Range Selection", , , , , , 8)
Dim rngToPaste As Range
Set rngToPaste = rng.Offset(, 20) ' just a guess...
rngToPaste.Columns(1).Value = rng.Columns(2).Value
rngToPaste.Columns(2).Value = rng.Columns(1).Value
How to use Jeeped's code
While playing around with the code... my curiosity fires away:
Why not:?
arr1 = oRng.Columns(1)
arr2 = oRng.Columns(2)
oRng.Columns(1) = arr2
oRng.Columns(2) = arr1
It turns out something (probably) the extra line makes the code slower (by about 10%).
I have a similar scenario and I know the range address. How should I use the code?
Sub SwapColumnsRange()
'Description
'In a specified range, swaps the first two columns i.e. the values of
'column(1) become the values of column(2) and the values of column(2) become
'the values of column(1).
'Arguments as constants
'cStrRange
'A string containing the Address of the range to be processed.
Const cStrRange As String = "A1:B50000" 'Your range address here.
Dim arr As Variant
Dim oRng As Range
Set oRng = Range(cStrRange)
If oRng.Areas.Count > 1 Then Exit Sub
If oRng.Columns.Count < 2 Then Exit Sub
'Slightly modified Jeeped's code
arr = oRng.Columns(1) '.Value
oRng.Columns(1) = oRng.Columns(2).Value
oRng.Columns(2) = arr
End Sub
I forgot to mention that I have more than two columns to be swapped!?
Sub ShiftColumnsRangeLeft()
'Description
'In a specified range with columns from 1 to 'n', shifts columns to the left
'i.e. the values of column(1) become the values of column(n), the values of
'column(2) become the values of column(1)... ...the values of column(n), the
'last column, become the values of column(n-1).
'Arguments as constants
'cStrRange
'A string containing the Address of the range to be processed.
Const cStrRange As String = "A1:I50000" 'Your range address here.
Dim arr As Variant
Dim oRng As Range
Dim i As Integer
Set oRng = Range(cStrRange)
If oRng.Areas.Count > 1 Then Exit Sub
If oRng.Columns.Count < 2 Then Exit Sub
For i = 1 To oRng.Columns.Count - 1 'ShiftColumnsRangeRight Difference
'Slightly modified Jeeped's code
arr = oRng.Columns(i) '.Value
oRng.Columns(i) = oRng.Columns(i + 1).Value
oRng.Columns(i + 1) = arr
Next
End Sub
You're a little off topic here, aren't you?
But not to this side, to the other side, please!?
Sub ShiftColumnsRangeRight()
'Description
'In a specified range with columns from 1 to 'n', shifts columns to the right
'i.e. the values of column(1) become the values of column(2), the values of
'column(2) become the values of column(3)... ...the values of column(n), the
'last column, become the values of column(1).
'Arguments as constants
'cStrRange
'A string containing the Address of the range to be processed.
Const cStrRange As String = "A1:I50000" 'Your range address here.
Dim arr As Variant
Dim oRng As Range
Dim i As Integer
Set oRng = Range(cStrRange)
If oRng.Areas.Count > 1 Then Exit Sub
If oRng.Columns.Count < 2 Then Exit Sub
For i = oRng.Columns.Count - 1 To 1 Step -1 'ShiftColumnsRangeLeft Difference
'Slightly modified Jeeped's code
arr = oRng.Columns(i) '.Value
oRng.Columns(i) = oRng.Columns(i + 1).Value
oRng.Columns(i + 1) = arr
Next
End Sub
I've changed my mind, I want to select a range and then run the macro to shift the columns!?
Sub ShiftColumnsSelectionRight()
'Description
'In a selection with columns from 1 to 'n', shifts columns to the right
'i.e. the values of column(1) become the values of column(2), the values of
'column(2) become the values of column(3)... ...the values of column(n), the
'last column, become the values of column(1).
Dim arr As Variant
Dim oRng As Range
Dim i As Integer
Set oRng = Selection
If oRng.Areas.Count > 1 Then Exit Sub
If oRng.Columns.Count < 2 Then Exit Sub
For i = oRng.Columns.Count - 1 To 1 Step -1 'ShiftColumnsRangeLeft Difference
'Slightly modified Jeeped's code
arr = oRng.Columns(i) '.Value
oRng.Columns(i) = oRng.Columns(i + 1).Value
oRng.Columns(i + 1) = arr
Next
End Sub
I've had it! Do the other two versions (Swap & ShiftLeft) yourself!
Remarks
These examples demonstrate how by making some simple modifications, the code can be used in different scenarios.
50000 is used to emphasize that the handling of the initial problem by looping through the range instead of using an array gets much, much slower as more rows are in the range.
The first If Statement ensures that the range is contiguous, and the second one ensures that there are at least two columns in the range.
Issues
I'm not completely sure that the '.value' part in the first line is not needed, but the code worked fine so far. On the other hand the '.value' part in the second line is needed or empty cells will be transferred.
When there are formulas in the range, they will be lost i.e. values will be transferred instead.
I'm trying to resize a table located in the middle of my sheet. Code snippet - most integers you're seeing are actually variables, but there's no sense in having a huge amount of extra code.
Sub StackOverFlowTest()
destSheet = Thisworkbook.Sheets("Test")
Set DestTb = destSheet.ListObjects("CTROutputTable")
DestTb.Resize DestTb.Range.Resize(100+1,5)
End Sub
I have a table, DestTb, of N rows and 5 columns. I'd like it to turn into 100 rows + a header column. It's located at an unknown (Read: Dynamic) location in my sheet. Moving data into it directly isn't auto-expanding the table, so I need to resize the table first.
How can I easily resize the number of rows in a table?
Full code so far, if you're really interested:
https://gist.github.com/OlivierHJ/5b039a8c5da05d137f5c8d00f6108309
This is an updated answer because OP has a dynamic table. That means the table won't be always in same range, so we need to get the address of ListObject everytime.
For this code, I needed two extra functions to locate the where is the table every time. 1 UDF to extract the text part of an address and 1 UDF to extract the number part of Address.
Sub RESIZZE_TABLE()
Dim DestTb As ListObject
Set DestTb = Sheets("Hoja1").ListObjects("Tabla1")
Dim TotalRows As Long
Dim MyFirstCell As String
Dim MyLastCol As String
Dim MyLastRow As String
Dim ColCount As Integer
ColCount = 5 'number of columns in your table
MyFirstCell = Range(DestTb).Cells(0, 1).Address
TotalRows = (DestTb.DataBodyRange.Count / ColCount) 'how many rows got table
MyLastCol = TextOnly(Range(DestTb).Cells(TotalRows, ColCount).Address) 'we get letter of last column of table
MyLastRow = onlyDigits(Range(DestTb).Cells(TotalRows, ColCount).Address) 'we get number of last row of table
DestTb.Resize Range(MyFirstCell & ":" & MyLastCol & (MyLastRow + 2)) 'change 2 by number of rows you want to increase
End Sub
Private Function TextOnly(ByVal xValue As String) As String
'source: https://www.extendoffice.com/documents/excel/1625-excel-extract-text-from-alphanumeric-string.html
'adapted for SO
Dim OutValue As String
Dim xIndex As Byte
For xIndex = 1 To Len(xValue)
If Not IsNumeric(Mid(xValue, xIndex, 1)) Then
OutValue = OutValue & Mid(xValue, xIndex, 1)
End If
Next
TextOnly = OutValue
End Function
Private Function onlyDigits(s As String) As String
' Variables needed (remember to use "option explicit"). '
Dim retval As String ' This is the return string. '
Dim i As Integer ' Counter for character position. '
' Initialise return string to empty '
retval = ""
' For every character in input string, copy digits to '
' return string. '
For i = 1 To Len(s)
If Mid(s, i, 1) >= "0" And Mid(s, i, 1) <= "9" Then
retval = retval + Mid(s, i, 1)
End If
Next
' Then return the return string. '
onlyDigits = retval
End Function
Tested with a table moving around a worksheet. It did not matter where it was the table, the code always resized the table with no problems.
Hope this works now!
I would imagine that if you are reducing the number of rows you will wish to clear the cells that no longer belong to the table.
Dim iTBLrws As Long, rng As Range, rngOLDBDY As Range
iTBLrws = 100
With Thisworkbook.Sheets("Test").ListObjects("CTROutputTable")
Set rngOLDBDY = .DataBodyRange
.Resize .Range.Cells(1, 1).Resize(iTBLrws, .DataBodyRange.Columns.Count)
If rngOLDBDY.Rows.Count > .DataBodyRange.Rows.Count Then
For Each rng In rngOLDBDY
If Intersect(rng, .DataBodyRange) Is Nothing Then
rng.Clear
End If
Next rng
End If
End With
Can someone tell me how I can resize a column reference to the first row in a worksheet?
I get errors (for different reasons on both) which is understandable but still frustrating:
Attempt 1:
Dim Columns_To_Export as String
Columns_To_Export ="$B:$C,$E:$E"
Range(Columns_To_Export).Resize(1).Select
Attempt 2:
Dim Columns_To_Export as String
Columns_To_Export ="$B:$C,$E:$E"
Colulumns(Columns_To_Export).Resize(1).Select
I'm afraid I wasn't very attentive with my above response. Here is what I believe is a better try.
Private Sub SelectCells()
On Error Resume Next
Debug.Print CellsForExport("$B:$C, $E:$E, G").Address
End Sub
Function CellsForExport(ByVal Desc As String) As Range
' 15 Apr 2017
Dim Fun As Range, Rng As Range
Dim Spi() As String, Spj() As String
Dim i As Integer, j As Integer
Dim Cstart As Long, Cend As Long
Desc = Replace(Desc, "$", "")
Spi = Split(Desc, ",")
For i = 0 To UBound(Spi)
Spj = Split(Trim(Spi(i)), ":")
Cstart = Columns(Trim(Spj(0))).Column
Cend = Cstart
If UBound(Spj) Then Cend = Columns(Trim(Spj(1))).Column
With ActiveSheet.Rows(1)
Set Rng = .Range(.Cells(Cstart), .Cells(Cend))
End With
If Fun Is Nothing Then
Set Fun = Rng
Else
Set Fun = Application.Union(Fun, Rng)
End If
Next i
Set CellsForExport = Fun
End Function
The function CellsForExport returns a range as specified by the calling procedure SelectCells. The calling procedure in this case doesn't select the range. Instead it prints its address. That is for testing purposes. CellsForExport("$B:$C, $E:$E, G").Select will select the cells. You could also paste this range somewhere or manipulate it in any other way you can manipulate a range.
Note that you can omit the $ signs when specifying the columns. You also don't have to specify E:E to define a single column, but if someone does all that the macro will sort it out. Blank spaces don't matter, commas are of the essence.
Basically, Column is a range, not a string. This code will do the job.
Dim Rng As Range
With ActiveSheet
Set Rng = Application.Union(.Columns("B"), .Columns("C"), .Columns("E"))
End With
Rng.Select