My array fails to replace and offset values VBA - excel

MyFile = Dir(MyFolder)
Do While MyFile <> ""
Application.StatusBar = "Opening" & MyFile
Set wbk = Workbooks.Open(MyFolder & MyFile, True, True)
bFound = False
For Each ws In wbk.Sheets
If ws.Name = "Sheet 1" Then
Range("B2").Select 'This gives us the first cell
Do Until IsEmpty(ActiveCell)
For i = LBound(arr, 1) To UBound(arr, 1)
For j = LBound(arr, 2) To UBound(arr, 2)
If arr(i, j) <> "" And arr(i, j) <> ActiveCell.Value Then
For k = UBound(arr, 2) To j + 1 Step -1
arr(k, i) = arr(k - 1, i)
Next k
arr(i, j) = ActiveCell.Value
End If
If arr(i, j) = "" Then
arr(i, j) = ActiveCell.Value
ActiveCell.Offset(1, 0).Select
End If
If arr(i, j) = ActiveCell.Value Then
ActiveCell.Offset(1, 0).Select
End If
If ActiveCell.Value = "" Then
Exit For
End If
Next j
If ActiveCell.Value = "" Then
Exit For
End If
Next i
Loop
End If
Next
Loop
As you can see in the image, I have an array of 4 elements. I got these elements into the array by iterating through the second column of a worksheet called "Sheet 1" in a workbook called "food.xlsx", located in a folder that the user chooses. After every element from column 2 of "Sheet 1" has been placed into column 1 of the array, the array looks like the following image...
We then move on to the next workbook, "food2.xlsx", in the same folder, and look at its column 2. Column 2 of food2.xlsx has exactly the same values in exactly the same rows as column 2 of food.xlsx, except that row 3 contains "vanilla" instead of "chocolate". What I am trying to do is place "vanilla" at the location in the array where "chocolate" currently is, which would be arr(1,3), and then push "chocolate" and every value under it down one spot. So the array should end up like this..
The part of the code that is NOT doing its job is the if statement that starts with "If arr(i, j) <> "" And arr(i, j) <> ActiveCell.Value Then"
IMPORTANT: I need this to work for any new encountered value, not just vanilla

Unrelated note: I prefer the FileSystemObject API over the VBA Dir function. You can use it by adding a reference (Tools -> References...) to the Microsoft Scripting Runtime library.
I would suggest using a disconnected ADO recordset. Recordsets are commonly associated with pulling data from databases or other data sources, but we can construct and fill our own and use the recordset's built-in sorting. This frees us from shifting elements back and forth within the array, and even from working out the proper position at which to insert a new element.
Add a reference (Tools -> References...) to Microsoft ActiveX Data Objects; choose the latest version -- usually 6.1.
Then, you could have code like the following:
' Define the shape of the recordset
Dim rs As New ADODB.Recordset
rs.CursorLocation = adUseClient   ' client-side cursor, so Sort can be used
rs.Fields.Append "Entry", adVarChar, 100
rs.Fields.Append "FileIndex", adInteger   ' adTinyInt is a signed byte and overflows past 127
rs.Fields.Append "RowIndex", adInteger
rs.Open

' Loop over the files, and populate the recordset
Dim fileIndex As Integer
Dim data As Variant
Dim rowIndex As Long
Dim entry As String
Dim found As Boolean
MyFile = Dir(MyFolder)
Do While MyFile <> ""
    Application.StatusBar = "Opening " & MyFile
    Set wbk = Workbooks.Open(MyFolder & MyFile, True, True)
    data = wbk.Worksheets("Sheet 1").UsedRange.Columns(2).Value
    For rowIndex = 1 To UBound(data, 1)
        entry = data(rowIndex, 1)
        ' Find needs a current record, so only search a non-empty recordset
        found = False
        If Not (rs.BOF And rs.EOF) Then
            rs.MoveFirst
            rs.Find "Entry='" & entry & "'"
            found = Not rs.EOF   ' Find moves to EOF when nothing matches
        End If
        If Not found Then
            rs.AddNew _
                Array("Entry", "RowIndex", "FileIndex"), _
                Array(entry, rowIndex, fileIndex)
            rs.Update
        End If
    Next
    wbk.Close SaveChanges:=False
    MyFile = Dir
    fileIndex = fileIndex + 1
Loop
Application.StatusBar = False

' Specify the sort order, first by the row within the file, then by the order in which
' the file was processed
rs.Sort = "RowIndex,FileIndex"

' Iterate over the data, and print it to the Immediate pane
rs.MoveFirst
Do Until rs.EOF
    Debug.Print rs("Entry")
    rs.MoveNext   ' advance, otherwise the loop never ends
Loop
Note that the elements are sorted first by the order in which they appear in their respective files, then by the order in which the file was processed.
Links
Excel
Workbook object — Worksheets property and collection, Close method
Worksheet object — UsedRange property
Range object — Columns and Value properties
VBA
Dir, UBound functions
ADO
Recordset object
Fields property and collection, Fields.Append method
BOF / EOF properties
Find, AddNew, Update, MoveFirst methods
Sort property

Since you're determined to use an array, here's an example of how to shift "rows" down to create an empty slot at a specified index:
Sub Tester()
Dim arr
arr = Range("A1:E10").Value 'get some data
ShuffleDown arr, 3 'insert a blank row at index 3
Range("G1:K10").Value = arr 'show the modified content
End Sub
'Create a blank row at InsertRowIndex in a 2-D array, by shifting content down
'Does not warn about content being lost from the last "row"
' of the array if there's already content there !
Sub ShuffleDown(ByRef arr, InsertRowIndex As Long)
Dim rw As Long, col As Long
For rw = UBound(arr, 1) To InsertRowIndex Step -1
For col = LBound(arr, 2) To UBound(arr, 2)
If rw > InsertRowIndex Then
arr(rw, col) = arr(rw - 1, col)
Else
arr(rw, col) = ""
End If
Next col
Next rw
End Sub
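ShuffleDown keeps the array size fixed, so whatever sits in the last "row" is lost. If the list should grow each time a new value is encountered, a 1-D variant that enlarges the array before shifting may fit better. The sketch below is a hypothetical helper (InsertAndPushDown is not from the question or the answer above) and assumes a 1-D array and an index between LBound and UBound + 1. For the question's 2-D arr you would first copy the column into a 1-D array, because ReDim Preserve can only resize the last dimension.

```vba
' Hypothetical helper: inserts newValue at position idx of a 1-D array and pushes the
' element at idx, and everything after it, down one slot. The array grows by one, so
' (unlike ShuffleDown) nothing falls off the end.
' Assumes LBound(arr) <= idx <= UBound(arr) + 1.
Sub InsertAndPushDown(ByRef arr As Variant, ByVal idx As Long, ByVal newValue As Variant)
    Dim k As Long
    ReDim Preserve arr(LBound(arr) To UBound(arr) + 1)   ' add one empty slot at the end
    For k = UBound(arr) To idx + 1 Step -1
        arr(k) = arr(k - 1)                              ' shift from the bottom up
    Next k
    arr(idx) = newValue
End Sub

Sub DemoInsertAndPushDown()
    Dim foods As Variant
    foods = Array("apple", "banana", "chocolate", "donut")   ' 0-based with the default Option Base 0
    InsertAndPushDown foods, 2, "vanilla"
    Debug.Print Join(foods, ", ")   ' apple, banana, vanilla, chocolate, donut
End Sub
```

Shifting from the bottom up means each element is copied before the slot above it is overwritten, so no temporary copy of the array is needed.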

Related

Reading medium large .dat file with Excel VBA

Your support is really appreciated!
I am receiving a .dat file from a measuring tool, which I find hard to get into Excel.
I would also like to do it without Power Query.
I do this in steps:
Step 1: convert the .dat file to "csv/txt" by removing duplicate spaces, replacing the remaining spaces with ";", and replacing "." with ",".
I would like to keep this format, as several other tools tend to use a similar format.
From there I thought it would be fairly easy to import it, however...
First row of the 11000 rows in the .dat file:
1 1 -0.4200 -0.0550 0.1420 173 174 181 56.3 55.5 59.3 87 84 95 0.778 0 0 0
First row of the converted file; all the rows below look good as well:
1;1;-0,4260;-0,1500;0,0990;171;168;176;55,5;53,8;57,6;96;83;82;4,794;0;0;0
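Step 1 could be sketched as a small line-conversion function. DatLineToCsv is a hypothetical name, and the sketch assumes fields are separated by spaces or tabs and never contain spaces themselves; you would call it once per line read from the .dat file.

```vba
' Hypothetical helper for Step 1: turns one whitespace-separated .dat line into the
' semicolon-separated, comma-decimal form shown above.
Function DatLineToCsv(ByVal datLine As String) As String
    Dim s As String
    s = Trim$(Replace(datLine, vbTab, " "))   ' treat tabs as spaces, drop outer spaces
    Do While InStr(s, "  ") > 0               ' collapse runs of spaces to a single space
        s = Replace(s, "  ", " ")
    Loop
    s = Replace(s, " ", ";")                  ' space -> field separator
    s = Replace(s, ".", ",")                  ' point -> decimal comma
    DatLineToCsv = s
End Function
```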
If I import this file with Power Query, it seems OK.
Step 2:
When importing it with the code below, the following occurs on line 660:
From the txt file:
1;660;-1,0210;-0,0340;0,0470;169;164;176;54,6;51,2;57,2;15;96;63;0,782;0;0;0
From Excel:
Debugging shows the following:
From the file:
The format of the cell is "Number" and not "General" as for the other numbers.
This seems to occur now and then, typically when the number goes beyond -1,xx.
I found the code online, and it is fairly quick.
I suspect that something happens when putting the two-dimensional Variant array into the sheet:
Dim Data As Variant 'Array for the file values
.
.
.
.
With Sheets(parSheetName)
'Delete any old content
.cells.ClearContents
'A range gets the same dimensions as the array
'and the array values are inserted in one operation.
.cells(4, 1).Resize(UBound(Data, 1), UBound(Data, 2)) = Data
End With
End If
Option Explicit
'**************************************************************
' Imports CSV to sheet, following the generated numbers will be placed in a table.
'**************************************************************
Public Sub copyDataFromCsvFileToSheet(parFileName As String, _
parDelimiter As String, parSheetName As String)
Dim Data As Variant 'Array for the file values
Dim I As Long
Dim J As Long
Dim prt As String
'Function call - the file is read into the array
Data = getDataFromFile(parFileName, parDelimiter)
'If the array isn't empty it is inserted into
'the sheet in one swift operation.
If Not isArrayEmpty(Data) Then
'If you want to operate directly on the array,
'you can leave out the following lines.
ActiveWorkbook.Worksheets.Add(After:=Sheets(Sheets.Count)).Name = parSheetName
'For I = 1 To 1000 'UBound(Data, 1)
'For J = 1 To 18 'UBound(Data, 2)
''prt = Data(I, J)
''Debug.Print prt
''ThisWorkbook.Worksheets(parSheetName).cells(I, J) = Data(I, J)
'Next J
'Next I
'Debug.Print "done"
'End If
With Sheets(parSheetName)
'Delete any old content
.cells.ClearContents
'A range gets the same dimensions as the array
'and the array values are inserted in one operation.
.cells(4, 1).Resize(UBound(Data, 1), UBound(Data, 2)) = Data
End With
End If
'Call sbCreatTable(parSheetName)
End Sub
'**************************************************************
Private Function isArrayEmpty(parArray As Variant) As Boolean
'Returns True if parArray is not an array, or is a dynamic array
'that hasn't been initialised (ReDim) or has been
'deleted (Erase).
If IsArray(parArray) = False Then isArrayEmpty = True: Exit Function
On Error Resume Next
If UBound(parArray) < LBound(parArray) Then
isArrayEmpty = True
Exit Function
Else
isArrayEmpty = False
End If
End Function
Private Function getDataFromFile(parFileName As String, _
parDelimiter As String, _
Optional parExcludeCharacter As String = "") As Variant
'parFileName is the delimited file (csv, txt ...)
'parDelimiter is the separator, e.g. semicolon.
'The function returns an empty array, if the file
'is empty or cannot be opened.
'Number of columns is based on the line with most
'columns and not the first line.
'parExcludeCharacter: Some csv files have strings in
'quotations marks ("ABC"), and if parExcludeCharacter = """"
'quotation marks are removed.
Dim locLinesList() As Variant 'Array
Dim locData As Variant 'Array
Dim I As Long 'Counter
Dim J As Long 'Counter
Dim locNumRows As Long 'Nb of rows
Dim locNumCols As Long 'Nb of columns
Dim fso As Variant 'File system object
Dim ts As Variant 'File variable
Const REDIM_STEP = 10000 'Constant
'Late binding via CreateObject, so no reference to Microsoft Scripting Runtime is needed.
'(Add that reference under Tools > References only if you switch to early binding.)
Set fso = CreateObject("Scripting.FileSystemObject")
On Error GoTo error_open_file
'Sets ts = the file
Set ts = fso.OpenTextFile(parFileName)
On Error GoTo unhandled_error
'Initialise the array
ReDim locLinesList(1 To 1) As Variant
I = 0
'Loops through the file, counts the number of lines (rows)
'and finds the highest number of columns.
Do While Not ts.AtEndOfStream
'If the row number Mod 10000 = 0
'we redimension the array.
If I Mod REDIM_STEP = 0 Then
ReDim Preserve locLinesList _
(1 To UBound(locLinesList, 1) + REDIM_STEP) As Variant
End If
locLinesList(I + 1) = Split(ts.ReadLine, parDelimiter)
J = UBound(locLinesList(I + 1), 1) 'Nb of columns in present row
'If the number of columns is then highest so far.
'the new number is saved.
If locNumCols < J Then locNumCols = J
I = I + 1
Loop
ts.Close 'Close file
locNumRows = I
'If number of rows is zero
If locNumRows = 0 Then Exit Function
ReDim locData(1 To locNumRows, 1 To locNumCols + 1) As Variant
'Copies the file values into an array.
'If parExcludeCharacter has a value,
'the characters are removed.
If parExcludeCharacter <> "" Then
For I = 1 To locNumRows
For J = 0 To UBound(locLinesList(I), 1)
If Left(locLinesList(I)(J), 1) = parExcludeCharacter Then
If Right(locLinesList(I)(J), 1) = parExcludeCharacter Then
locLinesList(I)(J) = _
Mid(locLinesList(I)(J), 2, Len(locLinesList(I)(J)) - 2)
Else
locLinesList(I)(J) = _
Right(locLinesList(I)(J), Len(locLinesList(I)(J)) - 1)
End If
ElseIf Right(locLinesList(I)(J), 1) = parExcludeCharacter Then
locLinesList(I)(J) = _
Left(locLinesList(I)(J), Len(locLinesList(I)(J)) - 1)
End If
locData(I, J + 1) = locLinesList(I)(J)
Next J
Next I
Else
For I = 1 To locNumRows
For J = 0 To UBound(locLinesList(I), 1)
locData(I, J + 1) = locLinesList(I)(J)
Next J
Next I
End If
getDataFromFile = locData
Exit Function
error_open_file: 'Returns empty Variant
unhandled_error: 'Returns empty Variant
End Function
Because of the several measuring tools mentioned, Power Query is unsuited, and I have better control doing it the old way.
Solution:
Convert each value to Decimal when building the array, i.e. wrap the value assigned to locData in CDec:
CDec(locLinesList(I)(J))
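Wrapping every field in CDec raises a type-mismatch error if a field is ever text (a header line, for example). A slightly more defensive sketch, with the hypothetical name ToNumberIfNumeric, converts only the fields that VBA's locale-aware IsNumeric accepts, and uses CDbl because Excel stores cell numbers as doubles anyway. It assumes the system decimal separator matches the file (a comma here); it would be called where locData is filled, e.g. `locData(I, J + 1) = ToNumberIfNumeric(locLinesList(I)(J))`.

```vba
' Hypothetical helper: converts a field to a Double when VBA recognises it as a number
' under the current regional settings (so "-1,0210" becomes a number when the decimal
' separator is a comma), and leaves any other text unchanged.
Function ToNumberIfNumeric(ByVal field As Variant) As Variant
    If Len(field) > 0 And IsNumeric(field) Then
        ToNumberIfNumeric = CDbl(field)      ' locale-aware conversion to a real number
    Else
        ToNumberIfNumeric = field            ' keep text and empty fields as they are
    End If
End Function
```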
Thanks for your response!

Sort names based on last name within excel and paste in 3-column table in a newly generated word document

I currently have an excel document where I want to copy over the names that have been "checked" via a checkbox, alphabetize on the last names and copy over these alphabetized names to a newly generated word document and paste them in a 3 column table. The number of rows is dynamic, based on the number of checkboxes the user has checked. Here is the code I currently have:
Sub ExportCheckedNames() 'hypothetical name: the original procedure header was not shown
Dim cl As Range, txt As String, temp1 As Variant, temp2 As String
For Each cl In ThisWorkbook.Worksheets(1).Range("D6:D122")
If cl Then
temp1 = VBA.Split(cl.offset(0,-2), " ") ' the names are a couple columns before the corresponding checkbox
temp2 = temp1(1) & ", " & temp1(0)
txt = txt & vbLf & temp2
End If
Next cl
If Len(txt) > 0 Then ' show Word if you have something to output
Dim wdApp As New Word.Application ' declare and create object at once
With wdApp
.Documents.Add ' the added document automatically becomes active
.Selection.TypeText Mid(txt, 2) 'remove extra (lead) vbLf and output text to Word
.ActiveDocument.Range.Sort
.Visible = True 'show Word after processing to improve performance
End With
End Sub
The problem I have is that I don't want to paste "last name, first name" in the word document, I want to paste "first name last name" which has been sorted by the last name in a word table with dynamic/undefined number of rows.
You could sort the names first using an array. The example below is based on a range with lastname in the first column and firstname in the second. Depending on your needs you might have to tweak it a bit, but the principle should still work.
Sub RangeToArr()
Dim wApp As New Word.Application
Dim wDoc As Document, wTbl As Table
Dim i As Long, Arr As Variant
Arr = Sheet1.Range("A2:B7").Value
Arr = SortArr(Arr)
''create document
Set wDoc = wApp.Documents.Add
''create table
With wDoc
Set wTbl = .Tables.Add(.Paragraphs(.Paragraphs.Count).Range, UBound(Arr), 3)
End With
''fill names into table
With wTbl
For i = LBound(Arr) To UBound(Arr)
.cell(i, 1).Range.InsertAfter Arr(i, 2) & " " & Arr(i, 1) ''names to column 1
'' .cell(i, 2).Range.InsertAfter ''Optional input column 2
'' .cell(i, 3).Range.InsertAfter ''Optional input column 3
Next i
.Columns.AutoFit
End With
wApp.Visible = True ''show Word
End Sub
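RangeToArr puts every name in column 1 of the 3-column table. If the names should instead fill all three columns, left to right and then top to bottom, the index arithmetic could look like this. GridPosition and RowsNeeded are hypothetical helpers, and the sketch assumes 1-based name indices, which is what Range.Value produces.

```vba
' Hypothetical helper: maps the k-th name (1-based) to a (row, column) cell in a table
' that is nCols wide, filling left to right, then top to bottom.
Sub GridPosition(ByVal k As Long, ByVal nCols As Long, ByRef rowOut As Long, ByRef colOut As Long)
    rowOut = (k - 1) \ nCols + 1     ' integer division picks the row
    colOut = (k - 1) Mod nCols + 1   ' remainder picks the column
End Sub

' Hypothetical helper: number of table rows needed for nItems names in nCols columns,
' rounded up so a partly filled last row still gets created.
Function RowsNeeded(ByVal nItems As Long, ByVal nCols As Long) As Long
    RowsNeeded = (nItems + nCols - 1) \ nCols
End Function
```

In RangeToArr you would then create the table with `RowsNeeded(UBound(Arr), 3)` rows and, inside the loop, call `GridPosition i, 3, r, c` before `.Cell(r, c).Range.InsertAfter Arr(i, 2) & " " & Arr(i, 1)`, with r and c declared As Long.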
This is the sort function:
Function SortArr(Arr As Variant)
Dim i As Long, j As Long, Temp1, Temp2
For i = LBound(Arr) To UBound(Arr) - 1
For j = i + 1 To UBound(Arr)
If UCase(Arr(i, 1)) > UCase(Arr(j, 1)) Then
Temp1 = Arr(j, 1)
Temp2 = Arr(j, 2)
Arr(j, 1) = Arr(i, 1)
Arr(j, 2) = Arr(i, 2)
Arr(i, 1) = Temp1
Arr(i, 2) = Temp2
End If
Next j
Next i
SortArr = Arr
End Function
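SortArr expects last names in column 1 and first names in column 2. If the names arrive as single "First Last" strings, as in the question, a hypothetical helper like SplitNames could build that array. It treats everything after the last space as the last name, which is an assumption: a compound surname such as "Jan van Dyke" would be sorted under "Dyke".

```vba
' Hypothetical helper: turns a 1-D array of "First Last" strings into a 1-based,
' two-column array (column 1 = last name, column 2 = first name) that SortArr can sort.
Function SplitNames(fullNames As Variant) As Variant
    Dim out() As Variant
    Dim i As Long, n As Long, p As Long, s As String
    n = UBound(fullNames) - LBound(fullNames) + 1
    ReDim out(1 To n, 1 To 2)
    For i = 1 To n
        s = Trim$(CStr(fullNames(LBound(fullNames) + i - 1)))
        p = InStrRev(s, " ")                 ' position of the last space, 0 if none
        If p > 0 Then
            out(i, 1) = Mid$(s, p + 1)       ' last name (sort key)
            out(i, 2) = Left$(s, p - 1)      ' first and any middle names
        Else
            out(i, 1) = s                    ' single word: use it as the sort key
            out(i, 2) = ""
        End If
    Next i
    SplitNames = out
End Function
```

Usage would be along the lines of `Arr = SortArr(SplitNames(names))`, after which `Arr(i, 2) & " " & Arr(i, 1)` gives "first name last name" in last-name order, as RangeToArr already writes it.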

Find and extract data from Excel sheet and paste it into related columns using VBA

I have the following objective:
Loop through a huge excel sheet (200,000+ rows)
Find some data based on matching parameters (the original file is an xml file, so structured data... but I am using a Mac, where the XML parser is not supported)
Copy the data between double quote related to each variable
Paste the value in the relative column
Additional constraints I have to face:
Every value to copy is between double quotes (this is "good news", helping me to identify the right data to copy and paste)
Imagine the text data as a list of data objects (=> it's a sequence of purchases, with related info, made by customers). The macro should be able to loop through the list and copy and paste the data, starting a new row every time a new purchase ID comes up. The good news is that every purchase is marked by a unique ID.
I’m providing below an example of input and output. I would really be grateful if someone could help me on this.
//INPUT
<SequenceNumber="1">
<PurchaseSegment DayDateTime="2020-02-29T06:45:00" ArrivalDateTime="2020-02-29T09:40:00" StopQuantity="0" PurchaseNumber="229" ElapsedTime="115">"
<DayPoS LocationCode="AAA" DockID="4" />"
<ArrivalPoS LocationCode="CCC" />"
</SequenceNumber>
<SequenceNumber="2">
<PurchaseSegment DayDateTime="2019-09-28T06:41:00" ArrivalDateTime="2020-02-29T09:40:00" StopQuantity="1" PurchaseNumber="123" ElapsedTime="115">"
<DayPoS LocationCode="AAA" DockID="3" />"
<ArrivalPoS LocationCode="QQC" />"
</SequenceNumber>
//EXPECTED OUTPUT (by running the VBA macro)
Here is also my VBA attempt. I adapted some VBA code I found, but didn't succeed.
Public Sub TextDataToColumn()
Dim val As Variant, c As Range, firstAddress As String
val = "PurchaseSegment DayDateTime" ' it would be great to have a list of parameters here...
Set c = Cells.Find(val, LookIn:=xlValues, LookAt:=xlPart, MatchCase:=False)
If Not c Is Nothing Then
firstAddress = c.Address 'remember where the search started
Do
MsgBox "Value of val is found at " & c.Address & vbCrLf & c.Offset(0, 1).Value & vbCrLf & c.Offset(0, 2).Value
Set c = Cells.FindNext(c)
Loop While c.Address <> firstAddress 'FindNext wraps around, so stop back at the start
End If
End Sub
You can parse your text using VBA text functions.
As I mentioned in my comments, what you posted is NOT a valid XML document.
I adapted the approach I would normally take with a Dictionary and Classes to use a Collection and an Array instead (although there is a Mac add-in that enables the Dictionary object).
After pre-processing the text lines to make them easier to parse, we loop through all the lines and store the relevant items at fixed positions in the array.
We collect each row of items into the Collection object, and then output them onto a worksheet.
It works for the sample data you posted, but if your data, in addition to being invalid XML, also has irregularities in the naming and formatting of the different nodes, you'll need a more sophisticated parsing method.
Option Explicit
Option Compare Text
Sub splitSeq()
Dim cS As Collection
Dim WB As Workbook, wsSrc As Worksheet, wsRes As Worksheet
Dim rRes As Range
Dim vSrc As Variant, vRes As Variant
Dim I As Long, v, w, x
'Set workbook, worksheet and range parameters
Set WB = ThisWorkbook
With WB
Set wsSrc = .Worksheets("Sheet4") 'or wherever the data exists
On Error Resume Next 'add a sheet if not present
Set wsRes = .Worksheets("Results")
Select Case Err.Number
Case 9 'need to add a sheet
Set wsRes = WB.Worksheets.Add
wsRes.Name = "Results"
Case Is <> 0 'Something else went wrong
MsgBox "Error number " & Err.Number & vbLf & Err.Description
Err.Clear
End Select
On Error GoTo 0 'stop suppressing errors once the sheet lookup is done
End With
'set results range
Set rRes = wsRes.Cells(1, 1)
'read data into array for processing speed
'assuming all data is in column A
With wsSrc
vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With
'loop through data and save the Sequence objects
'Each starts with <Sequence and ends with </Sequence
Set cS = New Collection
For I = 1 To UBound(vSrc, 1)
If vSrc(I, 1) Like "<Sequence*" Then
ReDim vRes(1 To 8) 're-initialize array
ElseIf Not IsEmpty(vRes) Then 'ignore any lines before the first <Sequence
'Remove confusing spaces in node names and <> in attributes
'(vbTextCompare so that "PoS" in the data matches "Pos")
vSrc(I, 1) = Replace(vSrc(I, 1), "Segment DayDate", "SegmentDayDate", 1, -1, vbTextCompare)
vSrc(I, 1) = Replace(vSrc(I, 1), "Pos Location", "PosLocation", 1, -1, vbTextCompare)
vSrc(I, 1) = Replace(vSrc(I, 1), "<", "")
vSrc(I, 1) = Replace(vSrc(I, 1), ">""", "")
vSrc(I, 1) = Replace(vSrc(I, 1), ">", "")
v = Split(vSrc(I, 1))
For Each w In v
x = Split(w, "=")
'Remove leading and trailing double quotes (tokens such as "/" have no value part)
If UBound(x) >= 1 Then
If Left(x(1), 1) = """" And Right(x(1), 1) = """" Then
x(1) = Mid(x(1), 2)
x(1) = Left(x(1), Len(x(1)) - 1)
End If
End If
Select Case x(0)
Case "PurchaseSegmentDayDateTime"
vRes(1) = x(1)
Case "ArrivalDateTime"
vRes(2) = x(1)
Case "StopQuantity"
vRes(3) = x(1)
Case "PurchaseNumber"
vRes(4) = x(1)
Case "ElapsedTime"
vRes(5) = x(1)
Case "DayPosLocationCode"
vRes(6) = x(1)
Case "ArrivalPosLocationCode"
vRes(8) = x(1)
Case "DockID"
vRes(7) = x(1)
Case "/SequenceNumber"
cS.Add vRes
End Select
Next w
End If
Next I
'set up results array
ReDim vRes(0 To cS.Count, 1 To 8)
'Headers
vRes(0, 1) = "PurchaseSegment DayDateTime"
vRes(0, 2) = "ArrivalDateTime"
vRes(0, 3) = "StopQuantity"
vRes(0, 4) = "PurchaseNumber"
vRes(0, 5) = "ElapsedTime"
vRes(0, 6) = "DayPoS LocationCode"
vRes(0, 7) = "DockID"
vRes(0, 8) = "ArrivalPoS LocationCode"
'fill in the data
I = 0
For Each v In cS
I = I + 1
With v
vRes(I, 1) = v(1)
vRes(I, 2) = v(2)
vRes(I, 3) = v(3)
vRes(I, 4) = v(4)
vRes(I, 5) = v(5)
vRes(I, 6) = v(6)
vRes(I, 7) = v(7)
vRes(I, 8) = v(8)
End With
Next v
'Set Results range
Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
'Write and format results
With rRes
.EntireColumn.Clear
.Value2 = vRes
.Style = "Output"
.EntireColumn.AutoFit
End With
End Sub
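For lines that keep their original shape, an alternative to splitting tokens is to look up each attribute by name and take the text between the quotes that follow it. GetQuotedAttr is a hypothetical helper, sketched under two assumptions: values contain no embedded double quotes, and no attribute name you search for is the tail of another name on the same line (searching for "DateTime" would also hit "DayDateTime").

```vba
' Hypothetical helper: returns the value between double quotes that follows
' attrName=" in a line of text, or "" if the attribute is not on that line.
Function GetQuotedAttr(ByVal textLine As String, ByVal attrName As String) As String
    Dim startPos As Long, endPos As Long
    Dim marker As String
    marker = attrName & "=" & Chr$(34)              ' e.g. PurchaseNumber="
    startPos = InStr(1, textLine, marker, vbTextCompare)
    If startPos = 0 Then Exit Function              ' attribute not on this line
    startPos = startPos + Len(marker)               ' first character of the value
    endPos = InStr(startPos, textLine, Chr$(34))    ' closing quote
    If endPos = 0 Then Exit Function                ' unterminated value
    GetQuotedAttr = Mid$(textLine, startPos, endPos - startPos)
End Function
```

Looping over the source lines, you would call it once per wanted attribute and write the non-empty results into the current output row, starting a new row at each sequence start line.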

Dynamically Named range in excel vba

I am trying to copy a named range in Excel from one sheet to another, and this works fine when I use a static name. However, I would now like to get the name of the range from a UserForm list box, and I am unsure how to do this. My copy procedure takes the row number, and I need to find this row number based on the string coming from the list box. If the list box says Bolts, the named range would be _OutputBolts, which refers to A123.
Call copyRows(ws, ThisWorkbook.Sheets("Templates").[_DrawingInputs].Row)
Call copyRows(ws, ThisWorkbook.Sheets("Templates").[_GeneralInputs].Row)
Call copyRows(ws, ThisWorkbook.Sheets("Templates").[_MaterialData].Row)
If GUI.ListBox_AdditionalComponents.ListCount > 0 Then
For i = 0 To GUI.ListBox_AdditionalComponents.ListCount - 1
namedRange = "[_Output" & GUI.ListBox_AdditionalComponents.List(i) & "]"
Call copyRows(ws, ThisWorkbook.Sheets("Templates").namedRange.Row)
Next i
End If
The copy procedure
Public Sub copyRows(ByRef shNew As Worksheet, startRow As Long)
Dim i As Long, j As Long 'Dim i, j As Integer would leave i as a Variant
Dim wsTemplates As Worksheet
Dim temp As Variant
Dim lastCol As Long
On Error Resume Next
Set wsTemplates = ThisWorkbook.Sheets("Templates")
i = startRow ' Where to copy from in templates
j = getLastRow(shNew, 1) 'Where to copy to, i.e append
If j > 2 Then
j = j + 2
End If
Do While wsTemplates.Cells(i, 1) <> ""
'copy the old range
wsTemplates.Rows(i).EntireRow.Copy
'paste it
shNew.Rows(j).EntireRow.Select
shNew.Paste
'format height
temp = wsTemplates.Rows(i).Height
shNew.Rows(j).RowHeight = CInt(temp)
' fill in the value from the GUI
temp = ""
temp = GUI.Controls("TextBox_" & Replace(shNew.Cells(j, 1).value, " ", "")).value
If temp = "" Then
temp = GUI.Controls("ComboBox_" & Replace(shNew.Cells(j, 1).value, " ", "")).value
End If
If temp <> "" Then
shNew.Cells(j, 4).value = temp
End If
'hyperlink drawing
If shNew.Cells(j, 1).value = "Drawing Name" Then
Call createHyperLink(shNew, j, 4, shNew.Cells(j, 4).value, GetFileNameWithOutExtension(getFilenameFromPath(shNew.Cells(j, 4).value)), shNew.Cells(j, 4).value)
End If
'update counters
i = i + 1
j = j + 1
Loop
' Format column widths. Width is in points while ColumnWidth is in character units,
' so copy ColumnWidth itself. (Could be moved out, since it only needs doing once.)
lastCol = getLastColumn(wsTemplates, 1)
For i = 1 To lastCol
shNew.Columns(i).ColumnWidth = wsTemplates.Columns(i).ColumnWidth
Next i
End Sub
Solved by using Range(address), see comment
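With the string-based Range lookup, a name that does not exist raises a run-time error instead of returning a row. A guarded sketch, with the hypothetical name OutputRangeRow, assuming each _Output name refers to cells on the Templates sheet:

```vba
' Hypothetical helper: returns the row of the named range "_Output" & component on the
' Templates sheet, or 0 when that name (or the sheet) does not exist.
Function OutputRangeRow(ByVal component As String) As Long
    Dim rng As Range
    On Error Resume Next                 ' a missing sheet or name leaves rng as Nothing
    Set rng = ThisWorkbook.Worksheets("Templates").Range("_Output" & component)
    On Error GoTo 0
    If Not rng Is Nothing Then OutputRangeRow = rng.Row
End Function
```

Inside the list-box loop, something like `r = OutputRangeRow(GUI.ListBox_AdditionalComponents.List(i))` followed by `If r > 0 Then copyRows ws, r` would skip components that have no matching name.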

Removing duplicates based on their occurrence

I would like to check a certain column (W) for duplicates (the number of occurrences is stored in another column (AZ)) and then delete rows as follows:
Value is found two times in the column - delete only one row containing the value.
Value is found more times in the column - delete all the rows with the values.
My code works quite well, but sometimes it doesn't delete all the duplicates as it should. Any idea for improvement?
EDIT: The updated code works really well, except that it always misses one duplicate and leaves it undeleted.
fin = ws.UsedRange.Rows.count
For i = 2 To fin
ws.Range("AZ" & i).value = Application.WorksheetFunction.CountIf(ws.Range("W2:W" & fin), ws.Range("W" & i))
Next i
For j = fin To 2 Step -1
If ws.Range("AZ" & j).value > 2 Then
ws.Range("AZ" & j).EntireRow.Delete
fin = ws.UsedRange.Rows.count
ElseIf ws.Range("AZ" & j).value = 2 Then
Set rng = Range("W:W").Find(Range("W" & j).value, , xlValues, xlWhole, , xlNext)
rngRow = rng.Row
If rngRow <> j Then
ws.Range("AZ" & rngRow) = "1"
ws.Range("AZ" & j).EntireRow.Delete
fin = ws.UsedRange.Rows.count
Else
MsgBox "Error at row " & rngRow
End If
End If
Next j
If speed is an issue, here is a method that should be faster, as it creates a collection of rows to be deleted, then deletes them. Since everything, except for the actual row deletion, is done in VBA, there are far fewer calls back and forth to the worksheet.
The routine could be sped up as noted in the inline comments.
If it is still too slow, depending on the size of the worksheet, it might be feasible to read the entire worksheet into a VBA Array; test for duplicates; write back the results to a new array and write that out to the worksheet. (If your worksheet is too large, this method might run out of memory, though).
In any event, we need both a Class Module which YOU must rename cPhrases, as well as a Regular Module
Class Module
Option Explicit
Private pPhrase As String
Private pCount As Long
Private pRowNums As Collection
Public Property Get Phrase() As String
Phrase = pPhrase
End Property
Public Property Let Phrase(Value As String)
pPhrase = Value
End Property
Public Property Get Count() As Long
Count = pCount
End Property
Public Property Let Count(Value As Long)
pCount = Value
End Property
Public Property Get RowNums() As Collection
Set RowNums = pRowNums
End Property
Public Function ADDRowNum(Value As Long)
pRowNums.Add Value
End Function
Private Sub Class_Initialize()
Set pRowNums = New Collection
End Sub
Regular Module
Option Explicit
Sub RemoveDuplicateRows()
Dim wsSrc As Worksheet
Dim vSrc As Variant
Dim CP As cPhrases, colP As Collection, colRowNums As Collection
Dim I As Long, K As Long
Dim R As Range
'Data worksheet
Set wsSrc = Worksheets("sheet1")
'Read original data into VBA array
With wsSrc
vSrc = .Range(.Cells(1, "W"), .Cells(.Rows.Count, "W").End(xlUp))
End With
'Collect list of items, counts and row numbers to delete
'Collection object will --> error when trying to add
' duplicate key. Use that error to increment the count
Set colP = New Collection
On Error Resume Next
For I = 2 To UBound(vSrc, 1)
Set CP = New cPhrases
With CP
.Phrase = vSrc(I, 1)
.Count = 1
.ADDRowNum I
colP.Add CP, CStr(.Phrase)
Select Case Err.Number
Case 457 'duplicate
With colP(CStr(.Phrase))
.Count = .Count + 1
.ADDRowNum I
End With
Err.Clear
Case Is <> 0 'some other error. Stop to debug
Debug.Print "Error: " & Err.Number, Err.Description
Stop
End Select
End With
Next I
On Error GoTo 0
'Rows to be deleted
Set colRowNums = New Collection
For I = 1 To colP.Count
With colP(I)
Select Case .Count
Case 2
colRowNums.Add .RowNums(2)
Case Is > 2
For K = 1 To .RowNums.Count
colRowNums.Add .RowNums(K)
Next K
End Select
End With
Next I
'Reverse sort the collection of row numbers
'For speed, if necessary, could use
' faster sort routine
RevCollBubbleSort colRowNums
'Delete Rows
'For speed, could create Unions of up to 30 rows at a time
Application.ScreenUpdating = False
With wsSrc
For I = 1 To colRowNums.Count
.Rows(colRowNums(I)).Delete
Next I
End With
Application.ScreenUpdating = True
End Sub
'Could use faster sort routine if necessary
Sub RevCollBubbleSort(TempCol As Collection)
Dim I As Long
Dim NoExchanges As Boolean
' Loop until no more "exchanges" are made.
Do
NoExchanges = True
' Loop through each element in the array.
For I = 1 To TempCol.Count - 1
' If the element is less than the element
' following it, exchange the two elements.
If TempCol(I) < TempCol(I + 1) Then
NoExchanges = False
TempCol.Add TempCol(I), after:=I + 1
TempCol.Remove I
End If
Next I
Loop While Not (NoExchanges)
End Sub
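Because every row number is unique and bounded by the last data row, the bubble sort could be replaced by flagging the rows in a Boolean array and reading them back from the bottom up, which runs in linear time. SortRowsDescending is a hypothetical replacement for RevCollBubbleSort and assumes every item is a whole number between 1 and maxRow.

```vba
' Hypothetical replacement for RevCollBubbleSort: returns the row numbers in rowNums
' in descending order by flagging them in a Boolean array and reading it bottom-up.
' Assumes every item is a whole number from 1 to maxRow (duplicates collapse to one).
Function SortRowsDescending(rowNums As Collection, ByVal maxRow As Long) As Collection
    Dim marks() As Boolean
    Dim result As Collection
    Dim i As Long
    Set result = New Collection
    If maxRow < 1 Then
        Set SortRowsDescending = result
        Exit Function
    End If
    ReDim marks(1 To maxRow)
    For i = 1 To rowNums.Count
        marks(rowNums(i)) = True          ' flag each row that must be deleted
    Next i
    For i = maxRow To 1 Step -1
        If marks(i) Then result.Add i     ' bottom-up, so deleting in this order is safe
    Next i
    Set SortRowsDescending = result
End Function
```

In RemoveDuplicateRows, the call to RevCollBubbleSort would then become `Set colRowNums = SortRowsDescending(colRowNums, UBound(vSrc, 1))`.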
There is no need for that inefficient second loop in the second section; just use a live count like so:
fin = ws.UsedRange.Rows.count
For i = 2 To fin
ws.Range("AZ" & i).value = Application.WorksheetFunction.CountIf(ws.Range("W2:W" & fin), ws.Range("W" & i))
Next i
For j = fin To 2 Step -1
If ws.Range("AZ" & j).value > 2 OR Application.WorksheetFunction.CountIf(ws.Range("W2:W" & fin), ws.Range("W" & j)) = 2 Then
ws.Range("AZ" & j).EntireRow.Delete
End If
Next j
While your logic is basically sound, the method is not the most efficient. The AutoFilter method can quickly remove all rows with counts greater than 2, and the Range.RemoveDuplicates¹ method can subsequently make quick work of removing one of the rows that still contain duplicate values in column W.
Dim r As Long, c As Long
With ws
If .AutoFilterMode Then .AutoFilterMode = False
r = .Cells.SpecialCells(xlLastCell).Row
c = Application.Max(52, .Cells.SpecialCells(xlLastCell).Column)
With .Range("A1", .Cells(r, c)) '.UsedRange
With .Columns(52)
If IsEmpty(.Cells(1, 1)) Then .Cells(1, 1) = "count"
With .Resize(.Rows.Count - 1, 1).Offset(1, 0)
.Cells.FormulaR1C1 = "=COUNTIF(C[-29], RC[-29])"
.Cells = .Cells.Value
End With
.AutoFilter field:=1, Criteria1:=">2"
With .Resize(.Rows.Count - 1, 1).Offset(1, 0)
If CBool(Application.Subtotal(103, .Cells)) Then
.SpecialCells(xlCellTypeVisible).EntireRow.Delete
End If
End With
.AutoFilter
End With
.RemoveDuplicates Columns:=23, Header:=xlYes
End With
End With
When you rewrite the count values in column AZ, you are likely going to rewrite 3 counts to 2, etc.
¹ The Range.RemoveDuplicates method removes duplicate rows from the bottom up.
