Optimizing Excel CountIfs - can I make it go faster?

I have some large files whose data I need to validate. Most of the process is automated: my code inputs the formulas I need, which helps eliminate copy-and-paste errors on large files. The problem is with the latest validation.
This validation involves counting the number of rows that match on 3 columns. The 3 columns are in Sheet 2 and the rows to count are in Sheet 1. The count is then compared with an expected number based on Sheet 2. It is easy enough to do with CountIfs, but these are large files and it can take up to an hour on some of them. I am trying to find something faster.
Even on a smaller file with only about 1800 rows, it still takes about 1 minute.
I have something like this:
In Check1 I am using: =COUNTIFS(Sheet1!A:A,A2,Sheet1!B:B,B2,Sheet1!C:C,C2)
My code puts that formula in the active cell. Is there a better way to do this?
Is there any way - using VB or anything else - to improve the performance?
When the rows start getting into the tens of thousands, it is time to start this and go to lunch, and then hope it is done when I get back to my desk!
Thanks.

COUNTIFS has to scan all rows of each of the three columns for every formula, and this is expensive. You might be able to split this into two tasks:
Merge columns A-C into one value in a helper column (say D) on each sheet, with a delimiter so that values such as "ab","c" and "a","bc" do not collide: =CONCAT(A2,"|",B2,"|",C2)
Then do only a single COUNTIF against the helper column: =COUNTIF(Sheet1!D:D,D2)
That way you get rid of two (time-)expensive criteria at the cost of the new CONCAT.

You should narrow the ranges COUNTIFS acts on from entire columns to the actual used range.
Your code could also write the result of the formula instead of the formula itself, as follows:
Dim sheet1Rng As Range, cell As Range
With Sheet1
    Set sheet1Rng = Intersect(.UsedRange, .Range("A:C"))
End With
With Sheet2
    For Each cell In Intersect(.UsedRange, .Range("A:A"))
        ' Offset(, n) moves n columns to the right on the same row
        cell.Offset(, 3).Value = WorksheetFunction.CountIfs( _
            sheet1Rng.Columns(1), cell.Value, _
            sheet1Rng.Columns(2), cell.Offset(, 1).Value, _
            sheet1Rng.Columns(3), cell.Offset(, 2).Value)
    Next cell
End With

I set up a mock sheet, using a layout similar to what you show, with 10,000 rows, and manually filled it with the COUNTIFS formula you show. Changing a single item in the data triggered a recalculation which took about ten seconds or so to execute.
I then tried the following macro, which executed in well under one second. All of the counting is done within the VBA macro. So this Dictionary method may be an answer to your speed problems.
Before running this, you may want to set the Calculation state to Manual (or do it in the code) if you have COUNTIFS on the worksheet.
Option Explicit
'set reference to Microsoft Scripting Runtime
Sub CountCol123()
Dim DCT As Dictionary
Dim V As Variant
Dim WS As Worksheet, R As Range
Dim I As Long
Dim sKey As String
Set WS = Worksheets("sheet2")
'read the info into an array
With WS
Set R = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(columnsize:=4)
V = R
End With
'Get count of the matches
Set DCT = New Dictionary
For I = 2 To UBound(V, 1)
sKey = V(I, 1) & "|" & V(I, 2) & "|" & V(I, 3)
If DCT.Exists(sKey) Then
DCT(sKey) = DCT(sKey) + 1
Else
DCT.Add Key:=sKey, Item:=1
End If
Next I
'Get the results and write them out
For I = 2 To UBound(V, 1)
sKey = V(I, 1) & "|" & V(I, 2) & "|" & V(I, 3)
V(I, 4) = DCT(sKey)
Next I
'If you have COUNTIFS on the worksheet when testing this,
' or any other formulas that will be triggered,
' then uncomment the next line
'Application.Calculation = xlCalculationManual
With R
.EntireColumn.Clear
.Value = V
End With
End Sub
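Note that the macro above counts repeated keys within a single sheet. As a minimal sketch of how the same Dictionary idea might be adapted to the layout in the question (keys on Sheet2, rows to count on Sheet1), assuming both sheets have a header row, the three key columns are A:C, and the count goes into column D of Sheet2. These are all assumptions, as are the names RowKey and CountSheet1MatchesIntoSheet2:

```vba
Option Explicit

' Hypothetical helper: builds "a|b|c" from columns 1-3 of one row of a 2-D array
Private Function RowKey(ByRef v As Variant, ByVal r As Long) As String
    RowKey = v(r, 1) & "|" & v(r, 2) & "|" & v(r, 3)
End Function

Sub CountSheet1MatchesIntoSheet2()
    Dim dct As Object                     ' late bound, so no reference is needed
    Dim src As Variant, tgt As Variant
    Dim result() As Variant
    Dim lastRow As Long, i As Long
    Dim sKey As String

    Set dct = CreateObject("Scripting.Dictionary")
    dct.CompareMode = vbTextCompare       ' case-insensitive keys, like COUNTIFS

    ' Pass 1: count every A|B|C combination on Sheet1 (row 1 assumed to be a header)
    With Worksheets("Sheet1")
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        If lastRow < 2 Then Exit Sub
        src = .Range("A1:C" & lastRow).Value
    End With
    For i = 2 To UBound(src, 1)
        sKey = RowKey(src, i)
        dct(sKey) = dct(sKey) + 1         ' a missing key starts out as Empty (0)
    Next i

    ' Pass 2: look up each Sheet2 key and write all counts in one go
    With Worksheets("Sheet2")
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        If lastRow < 2 Then Exit Sub
        tgt = .Range("A1:C" & lastRow).Value
        ReDim result(1 To UBound(tgt, 1) - 1, 1 To 1)
        For i = 2 To UBound(tgt, 1)
            sKey = RowKey(tgt, i)
            If dct.Exists(sKey) Then
                result(i - 1, 1) = dct(sKey)
            Else
                result(i - 1, 1) = 0
            End If
        Next i
        .Range("D2").Resize(UBound(result, 1), 1).Value = result
    End With
End Sub
```

Unlike COUNTIFS, this compares the keys as plain text and does no wildcard matching, and it will stop with a type mismatch if a key cell contains an error value such as #N/A.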

The Excel alternative named Cell in Hancom Office 2020 is insanely fast at countifs. Not sure why. On my i7-5775C, Excel 2019 takes 90 seconds for a countifs with two criteria for populating 10,000 rows with the results. Using Cell, the exact same operation completes in less than 28 seconds. I have verified that the results match those generated by Excel 2019.

Related

Excel Table - Convert Range-Based Formula to Field-Based

I have inherited a very large spreadsheet and am trying to migrate it to a database. The table has over 300 columns, many of which reference other columns.
By converting it to a table (ListObject) in Excel, I thought it would be easier to deconstruct the logic... basically turn the formula:
=CJ6-CY6
into
=[@[Sale Price]]-[@[Standard Cost]]
Converting it to a table worked great... unfortunately it didn't change any of the embedded formulas. They still reference the ranges.
I think I may notionally understand why -- if a formula references a value in another row, then it's no longer a primitive calculation. But for formulas that are all on the same row, I'm wondering if there is any way to convert them without manually going into each of these 300+ columns and re-writing them. Some of them are beastly. No joke, this is an example:
=IF(IF(IF(HD6="",0,IF(HD6=24,0,IF(HD6="U",((FI6-(ES6*12))*$I6),($I6*FI6)*HS6)))<0,0,IF(HD6="",0,IF(HD6=24,0,IF(HD6="U",((FI6-(ES6*12))*$I6),($I6*FI6)*HS6))))>GO6,GO6,IF(IF(HD6="",0,IF(HD6=24,0,IF(HD6="U",((FI6-(ES6*12))*$I6),($I6*FI6)*HS6)))<0,0,IF(HD6="",0,IF(HD6=24,0,IF(HD6="U",((FI6-(ES6*12))*$I6),($I6*FI6)*HS6)))))
And it's not the worst one.
If anyone has ideas, I'd welcome them. I'm open to anything. VBA included.
I would never use this to teach computer science, but this is the hack that did the trick. To keep things simple, I transposed header names and the corresponding column into A17:
And then this VBA code successfully transformed each range into the corresponding column property.
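Sub FooBomb()
    Dim ws As Worksheet
    ' Each variable needs its own type; "Dim r, rw As Range" would leave r a Variant
    Dim r As Range, rw As Range
    Dim col As Long, row As Long
    Dim find As String, anchored As String, repl As String
    Set ws = ActiveWorkbook.ActiveSheet
    Set rw = ws.Rows(6)
    ' A17:B363 holds the column letters (column A) and the header names (column B)
    For col = 12 To 347
        Set r = rw.Cells(1, col)
        ' Bottom-up, so that (with the lookup in sheet column order) multi-letter
        ' columns such as FI are replaced before single-letter ones such as I
        For row = 363 To 17 Step -1
            find = ws.Cells(row, 1).Value2 & "6"
            anchored = "$" & find
            repl = "[@[" & ws.Cells(row, 2).Value2 & "]]"
            r.Formula = VBA.Replace(r.Formula, anchored, repl)
            r.Formula = VBA.Replace(r.Formula, find, repl)
        Next row
    Next col
End Sub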
Hard-coded and not scalable, but I'm not looking to repeat this ever again.
-- EDIT --
A word to the wise on performance, especially with as many columns and formulas as there are in this spreadsheet:
Set formula calculation to manual before running.
Check that the field exists before doing a replacement -- skipping happens more often than not.
The program ran in a few seconds (it took minutes before) after these changes:
If InStr(r.Formula, anchored) > 0 Then
r.Formula = VBA.Replace(r.Formula, anchored, repl)
End If
If InStr(r.Formula, find) > 0 Then
r.Formula = VBA.Replace(r.Formula, find, repl)
End If
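The advice about manual calculation can be sketched as a small wrapper that remembers the current calculation mode, switches to manual, runs the macro, and restores the mode afterwards. The name RunWithManualCalculation is made up for illustration; Application.Run is used so that it can call FooBomb (or any other macro) by name.

```vba
Option Explicit

' Hypothetical wrapper: runs a macro by name with calculation set to manual
Sub RunWithManualCalculation(ByVal macroName As String)
    Dim prevCalc As XlCalculation

    prevCalc = Application.Calculation          ' remember the user's setting
    Application.Calculation = xlCalculationManual

    On Error GoTo CleanUp
    Application.Run macroName                   ' e.g. "FooBomb"

CleanUp:
    If Err.Number <> 0 Then Debug.Print macroName & " failed: " & Err.Description
    Application.Calculation = prevCalc          ' always restore the setting
End Sub
```

It would be called as RunWithManualCalculation "FooBomb". Restoring the previous mode, rather than forcing automatic, leaves the workbook the way the user had it.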

How to cycle through merged cells (and populate values from a one-dimensional array)?

I would like to find a way how to cycle through merged cells, e.g. using a For...Next loop.
I could only manage to make it work like this:
Cells(1,1).Select
For i=1 to 6
Selection.Value = arrData(i)
Selection.Offset(0,1).Select
Next i
I hate using .Select - but if I use .Offset(0,i) it won't move from merged cell to merged cell, but just the number of columns from the original cell.
For more detail - I am copying values from a csv-like format into a nicer formatted output sheet, that is then supposed to be exported with bunch of merged cells.
There are multiple sections to the sheet but within each section there is a known amount of cells per row.
My only working solution without .Select is to use .Cells
Example:
For row = 0 To 12
    With rng.Offset(1, 0)
        .Cells(row + 1, 1) = arrdata(1 + (row * 6))
        .Cells(row + 1, 3) = arrdata(2 + (row * 6))
        .Cells(row + 1, 7) = arrdata(3 + (row * 6))
        .Cells(row + 1, 9) = arrdata(4 + (row * 6))
        .Cells(row + 1, 14) = arrdata(5 + (row * 6))
        .Cells(row + 1, 16) = arrdata(6 + (row * 6))
    End With
Next row
but this is pretty arduous.
EDIT: Here is a screenshot:
target area
The idea is that the amount of rows is completely flexible, depending on the transaction. So sometimes there is only one row, but can be anything really.
My code generates this section using relative references based on named ranges.
And then from the ugly sheet (where all information is stored in a single row) the values are fed into a one-dimensional array, then the array should be fed into the nice looking sheet.
If the sheet had no merged cells, the code would look quite simple:
Dim i as integer, j as integer, k as integer
Dim ws as worksheet: set ws = Worksheets("Printable")
'data array has already been filled with info in a linear order beforehand
k=1
For i=1 to item_qt 'number of cost items lines
For j=1 to item_col 'number of detail columns (in this section)
ws.Range("item_title").Offset(1,0).Cells(i,j).Value=data(k)
k=k+1
Next j
Next i
But because of the nature of this sheet - supposed to be printable and nicer on the eyes - I can't do that and have to find a way how to switch between the merged cells.
Hope this Edit cleared some things up.
I am also looking into the suggestions now to see if I can apply those somehow, but if anybody knows of something better, I am open for everything.
If you're stepping through merged columns, you could use something like
For i = startColumn To endColumn
    'Test the current column i, not the fixed start column
    If Cells(row, i).MergeArea.Columns.Count > 1 Then
        'Do Stuff
        i = i + Cells(row, i).MergeArea.Columns.Count - 1
    End If
    Debug.Print i
Next i
This will test for merged columns and then jump to the next column after the merge.
EDIT:
Seeing your data structure added in your edit, you could incorporate the MergeArea.Columns.Count method into your For j-Next j loop like
k=1
For i=1 to item_qt 'number of cost items lines
For j=1 to item_col 'number of detail columns (in this section) <-this will need to
'be the total number of columns, not just the number of
'detail fields
ws.Range("item_title").Offset(1,0).Cells(i,j).Value=data(k)
j = j + ws.Range("item_title").Offset(1,0).Cells(i,j).MergeArea.Columns.Count - 1
k=k+1
Next j
Next i
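An alternative that avoids both .Select and changing the loop counter is to keep a Range variable and move it past each merged block. A rough sketch, assuming every target cell in the row is either a single cell or a horizontally merged block, with no spacer columns in between (FillMergedRow and its parameters are names made up for this example):

```vba
Option Explicit

' Hypothetical helper: writes the items of a one-dimensional array into
' consecutive (possibly merged) cells, starting at startCell and moving right
Sub FillMergedRow(ByVal startCell As Range, ByRef arrData As Variant)
    Dim c As Range
    Dim i As Long

    ' Start at the top-left cell of the block, which is the cell that holds the value
    Set c = startCell.MergeArea.Cells(1, 1)

    For i = LBound(arrData) To UBound(arrData)
        c.Value = arrData(i)
        ' MergeArea is the cell itself when it is not merged, so this steps
        ' 1 column for a plain cell and the full width for a merged block
        Set c = c.Offset(0, c.MergeArea.Columns.Count)
    Next i
End Sub
```

For example, FillMergedRow ws.Range("item_title").Offset(1, 0), arrData would fill one row; calling it once per row with a slice of the data would cover a whole section.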
By searching for "excel find merged cells vba" Google comes up with:
How To Identify And Select All Merged Cells In Excel?
https://www.extendoffice.com/documents/excel/962-excel-select-merged-cells.html
Sub FindMergedcells()
'updateby Extendoffice 20160106
Dim x As Range
For Each x In ActiveSheet.UsedRange
If x.MergeCells Then
x.Interior.ColorIndex = 8
End If
Next
End Sub
and
2 Practical Methods to Find Merged Cells in Your Excel
https://www.datanumen.com/blogs/2-practical-methods-find-merged-cells-excel/
Sub FindMerge()
Dim cel As Range
For Each cel In ActiveSheet.Range("A1:G13")
If cel.MergeCells = True Then
'change the color to make it different
cel.Interior.Color = vbYellow
End If
Next cel
End Sub

Highlighting duplicate rows

I would like to highlight duplicate rows in Excel VBA. Assume I have the following exemplary table with columns A, B, C and D for testing:
A B C D (Strings)
1 1 1 dsf
2 3 5 dgdgdgdg
1 1 1 dsf
2 2 2 xxx
6 3 4 adsdadad
2 2 2 xxx
The duplicate rows should be highlighted in any colour, e.g. grey. I am looking ideally for fast performing code, as it will be used for rather big tables.
Note there are solutions available for highlighting duplicate cells (but not duplicate rows). I don't know how to identify if rows are duplicates and at the same time how to do that fast, i.e. without nested looping. The solution should be in VBA (not Excel).
What is the best/fastest way to achieve that?
Add a conditional formatting rule with the following SUMPRODUCT formula (or a COUNTIFS):
=SUMPRODUCT(($A$1:$A$6&$B$1:$B$6&$C$1:$C$6=$A1&$B1&$C1)*1)>1
Explanation:
SUMPRODUCT is handy for working with ranges which you need to manipulate prior to checking a condition. In this case I concatenate the A, B and C columns across the range and compare the result with the concatenation of the current row. I then convert the TRUE/FALSE array to a 1/0 array by multiplying by 1, and the SUM part of SUMPRODUCT sums the rows where the condition is true, giving me the number of duplicate rows (all occurrences). If you have a small range, you can clearly see how this works by using formula evaluation.
It's a quick fix and performance is not ideal, but I use it a lot for detecting duplicates or generating sequential numbers.
Solution suggested by ponydeer in the comments (higher performance):
It is based on sorting and requires adding a key column, turning on AutoFilter and sorting on the key, then applying the conditional format on the key column:
I have tested 3 different approaches on the sample file link from OP's comment. Probably the VBA implementations were not optimal, but below are the results with average time of 100 passes:
1) Conditional formatting using:
a) SUMPRODUCT concatenating columns - 3s
b) COUNTIFS with full column reference - 1.9s
c) COUNTIFS referencing used ranges - 0.2s
2) Sorting the range on all columns, comparing row by row, sorting back - 0.3s
3) Using advanced filter - 3.5s
Here is the code for the fastest method:
Sub CF1()
Application.ScreenUpdating = False
Dim sFormula As String
Dim rRng As Range
Dim nCol As Integer, i As Integer
Set rRng = Range("A1").CurrentRegion
nCol = rRng.Columns.Count
'build the formula
sFormula = "=COUNTIFS("
For i = 1 To nCol
sFormula = sFormula & rRng.Columns(i).Address & "," & _
rRng.Cells(1, i).Address(False, True)
If i < nCol Then sFormula = sFormula & ","
Next
sFormula = sFormula & ")>1"
'write the formula in helper cell to get its local version
rRng.Cells(1, nCol + 1).Formula = sFormula
rRng.FormatConditions.Delete
With rRng.FormatConditions.Add(Type:=xlExpression, _
Formula1:=rRng.Cells(1, nCol + 1).FormulaLocal)
.Interior.ThemeColor = xlThemeColorAccent3
End With
rRng.Cells(1, nCol + 1).Clear
Application.ScreenUpdating = True
End Sub
Sort your range first regarding all columns
Workbooks(1).Sheets(1).Range("A:C").Sort Key1:=Workbooks(1).Sheets(1).Range("A:A"), Order1:=xlAscending, Key2:=Workbooks(1).Sheets(1).Range("B:B"), Order2:=xlAscending, Key3:=Workbooks(1).Sheets(1).Range("C:C"), Order3:=xlAscending, Orientation:=xlSortRows
Then loop through all rows and compare them with the one above them
Dim a As Application
Dim i As Long
Set a = Application
For i = 1 To 1000 ' here you need to set the number of rows you have
    ' Compare row i with row i + 1 on the same sheet
    If Join(a.Transpose(a.Transpose(Sheets(1).Rows(i).Value)), Chr(0)) = _
       Join(a.Transpose(a.Transpose(Sheets(1).Rows(i + 1).Value)), Chr(0)) Then
        Sheets(1).Rows(i + 1).Interior.Color = 49407
    End If
Next i
The comparison of two rows is based on this thread: How to compare two entire rows in a sheet
Please insert the names of your Workbook, Sheet and set your range and the limits in the code yourself.
I think fastest/best will depend upon the proportion of duplicates – only one row should be quicker than 50% as in the example – and on the actual size of the array (how many columns from which to create a key, etc).
Given that it is rarely possible to beat inbuilt functions with ‘pure’ VBA I suspect using the UI, within VBA if desired, will be faster in some circumstances. Eg:
Add an index column (series fill would serve), copy the entire sheet (say to Sheet2), apply Remove Duplicates to all but the index column, then apply a CF formula rule of this kind to the relevant range of the original sheet:
=$A1=MATCH($A1,Sheet2!$A$1:$A$3000,0)>0
Assuming the start point is like so:
and a ColumnA inserted with numeric series fill starting 1, Sheet2 should look so after Remove Duplicates:
I have assumed ColumnE is to be ignored as far as duplication is concerned.
In the source sheet, select the array (from A1: - see!), eg A1:I6 and HOME > Styles - Conditional Formatting, New Rule..., Use a formula to determine which cells to format, Format values where this formula is true:
=$A1=MATCH($A1,Sheet2!$A:$A,0)>0
Format..., Fill, grey, OK, OK.
For me results in:
Sub HighlightDuplicateRows_2()
    'This one is more modifiable and can handle multiple columns of data
    'Just add another range/criteria pair to CountIfs for each extra column
    Dim LastRow As Long, r As Long
    LastRow = Cells(Rows.Count, "A").End(xlUp).Row
    For r = 2 To LastRow
        'CountIfs requires all columns to match on the same row; separate CountIf
        'calls per column would also flag rows that only share individual values
        If WorksheetFunction.CountIfs(Range("A2:A" & LastRow), Cells(r, 1).Value, _
                                      Range("B2:B" & LastRow), Cells(r, 2).Value) > 1 Then
            Rows(r).Interior.ColorIndex = 6
        End If
    Next r
End Sub
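For large tables, another option that avoids both nested loops and per-row worksheet functions is to build one key per row and count the keys in a Dictionary. This is only a sketch under a few assumptions: the data is the CurrentRegion around A1, no cell holds an error value, and the comparison is case-sensitive (the Dictionary default). HighlightDuplicateRowsDict and RowKeyAll are made-up names.

```vba
Option Explicit

' Hypothetical helper: joins all columns of one array row into a single key
Private Function RowKeyAll(ByRef data As Variant, ByVal r As Long) As String
    Dim c As Long, s As String
    For c = 1 To UBound(data, 2)
        s = s & "|" & CStr(data(r, c))
    Next c
    RowKeyAll = s
End Function

Sub HighlightDuplicateRowsDict()
    Dim dct As Object                 ' late bound Scripting.Dictionary
    Dim rng As Range
    Dim data As Variant
    Dim r As Long
    Dim sKey As String

    Set rng = ActiveSheet.Range("A1").CurrentRegion
    If rng.Cells.Count < 2 Then Exit Sub    ' a single cell does not return an array
    data = rng.Value                        ' read everything in one go
    Set dct = CreateObject("Scripting.Dictionary")

    ' Pass 1: count how often each row key occurs
    For r = 1 To UBound(data, 1)
        sKey = RowKeyAll(data, r)
        dct(sKey) = dct(sKey) + 1
    Next r

    ' Pass 2: grey every row whose key occurs more than once
    Application.ScreenUpdating = False
    For r = 1 To UBound(data, 1)
        If dct(RowKeyAll(data, r)) > 1 Then
            rng.Rows(r).Interior.Color = RGB(191, 191, 191)
        End If
    Next r
    Application.ScreenUpdating = True
End Sub
```

Both passes are linear in the number of rows, so the run time should grow roughly in proportion to the table size rather than with its square.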

VBA copy and paste values in location based off other values

I'm just learning how to do VBA in Excel and I need some help. I have been searching this site but have not found an example that I could tweak to solve my needs (or at least one that I could understand). I am trying to make a button that archives data. I have 2 sheets: one is for user input and the other is the archive location. I would like Excel to take the value in column C and paste it in the matching location in sheet 2, based on the values in sheet 1's columns A and B.
Sheet 1
A _______ B______C (user-input value)
Item 1 ___Date ___ 5
Item 2 ___Date ___ 8
Item 3 ___Date ___ 2
Sheet 2 (archive sheet)
A ______ B _________ C _______ D
_______Item 1 ___ Item 2 ____ Item 3
Date
Date
Date
I was using a method of just copying the sheet 1 data to a 3rd sheet and running a VLOOKUP, but if the user archived the same date twice it would only get the value of the most recent archive. I'm not sure how loops work, but from what I found in other people's requests I think something like that may be helpful.
Any insight would be most appreciated.
If you do not know how loops work, you must learn the basics of Excel VBA. You cannot hope to stitch together bits of code gathered from the internet without some understanding of VBA.
Search for "Excel VBA Tutorial". You will get many hits of which many will be for free online tutorials. These tutorials differ in approach so try a few to see which best matches your learning style. Alternatively, visit a good bookshop or library where you will find a selection of Excel VBA Primers. I suggest a library so you can take a few books home for a try before purchasing your favourite.
There are many holes in your specification. Perhaps you have a complete specification which you have not documented here. If you have a complete specification, please do not add it to your question. For sites like this, you need small focused questions.
Two design questions I have spotted are:
Why fill the Date column with =TODAY()? If the archive macro is not run at the end of the day, Excel will have changed the date when the macro is run the next day. Either fill the column with the date value or use the nearest VBA equivalent function, which is Date (Now() also includes the time of day).
You imply the user might enter a count for Item A and then enter another count later in the day. The archive sheet is to hold the total of those two counts. How is this handled? You could have two or more rows for Item A. The user could run the archive macro before entering a new value in the Item A row. You could use a Worksheet Change event to automatically archive the value after the user has entered it.
You need to fully specify what the macro is going to do and how it is going to be used before trying to code it. Below I have provided two alternative macros that achieve what I believe is the first step of your requirement: locate valid rows in the data entry worksheet and extract the values ready for archiving.
I suggest you study basic Excel VBA first. That should give you enough knowledge to understand my macros even though the second macro uses non-basic statements. Come back with questions as necessary but please run and try to understand the macros before asking these questions.
Demo1
I created a worksheet "Data Entry" and filled it with data that matches my understanding of your worksheet "Sheet1". Please do not use the default worksheet names because it gets very confusing. Replace my name with whatever you choose.
The macro Demo1 outputs the values from valid rows to the Immediate Window. Writing to the Immediate Window is a convenient way of testing small blocks of code as they are written.
I have documented what the code does but not the VBA statements. Once you know a statement exists, it is usually easy to look it up.
Option Explicit
Sub Demo1()
Dim CountCrnt As Long
Dim DateCrnt As Date
Dim ItemCrnt As String
Dim RowCrnt As Long
Dim RowLast As Long
With Worksheets("Data Entry")
' This sets RowLast to the last used row in column "C" or sets it to 1 if no
' row is used. It is the VBA equivalent of positioning the cursor to the
' bottom of column C and pressing Ctrl+Up
RowLast = .Cells(Rows.Count, "C").End(xlUp).Row
' I have assumed the first data row is 2
For RowCrnt = 2 To RowLast
' I have allowed for column C being empty. I assume such rows are
' to be ignored. I also ignore rows with invalid values in columns
' B or C.
If .Cells(RowCrnt, "C").Value <> "" And _
IsNumeric(.Cells(RowCrnt, "C").Value) And _
IsDate(.Cells(RowCrnt, "B").Value) Then
' Extract the validated values to variables ready for the next stage
' of processing.
ItemCrnt = .Cells(RowCrnt, "A").Value
DateCrnt = .Cells(RowCrnt, "B").Value
CountCrnt = .Cells(RowCrnt, "C").Value
' Output row values to Immediate Window
Debug.Print RowCrnt & " " & ItemCrnt & " " & _
Format(DateCrnt, "dmmmyy") & " " & CountCrnt
End If
Next
End With
End Sub
Demo2
Macro Demo2 achieves the same as macro Demo1 but in a different way.
Demo1 accessed the cells within the worksheet individually. Demo2 copies the entire worksheet to a Variant which can then be accessed as a 2D array. This is much faster than individual cell access and is usually more convenient if you only want the cell values.
Demo1 output values to the Immediate Window. This is very convenient for small volumes of output but early lines will be lost for larger volumes. Demo2 creates a file within the same folder as the workbook and writes the output to that file so nothing will be lost.
Sub Demo2()
Dim CountCrnt As Long
Dim DateCrnt As Date
Dim FileOutNum As Long
Dim ItemCrnt As String
Dim RowCrnt As Long
Dim RowLast As Long
Dim SheetValue As Variant
FileOutNum = FreeFile
Open ActiveWorkbook.Path & "\Demo2.txt" For Output As #FileOutNum
With Worksheets("Data Entry")
' This statement converts Variant SheetValue to an appropriately sized
' two-dimensional array and copies the values from the entire used
' range of the worksheet to it.
SheetValue = .UsedRange.Value
' Standard practice for 2D arrays is to have the first dimension for
' columns and the second for rows. For arrays copied from or to
' worksheets, the first dimension is for rows and the second is for
' columns. This can be confusing but means that array elements are
' accessed as SheetValue(Row, Column) which matches Cells(Row, Column).
' Note that the lower bounds for both dimensions are always one. If the
' range copied from the worksheet starts at Cell A1, row and column
' numbers for the array will match those of the worksheet.
End With
For RowCrnt = 2 To UBound(SheetValue, 1)
' I have allowed for column 3 (= "C") being empty. I assume such rows
' are to be ignored. I also ignore rows with invalid values in columns
' 2 (= "B") or 3.
If SheetValue(RowCrnt, 3) <> "" And _
IsNumeric(SheetValue(RowCrnt, 3)) And _
IsDate(SheetValue(RowCrnt, 2)) Then
ItemCrnt = SheetValue(RowCrnt, 1)
DateCrnt = SheetValue(RowCrnt, 2)
CountCrnt = SheetValue(RowCrnt, 3)
' Output row values to file
Print #FileOutNum, RowCrnt & " " & ItemCrnt & " " & _
Format(DateCrnt, "dmmmyy") & " " & CountCrnt
End If
Next
Close #FileOutNum
End Sub
Edit New section in response to supplementary question.
As you have discovered there is no way of "printing" to a worksheet but it is easy to write to individual cells. I have used a diagnostic worksheet but I normally consider this technique more trouble than it is worth. Output to a file is easier to add and easier to delete and it does not interfere with the code.
The code below is in the correct order but I have added explanations between blocks.
Dim RowDiagCrnt As Long
The above statement is not within a subroutine, which makes it a global (module-level) variable that can be accessed from any routine in the module. If there are several routines that need to output diagnostic information, it is easier to use a global variable for the row number than to pass it as a parameter from the parent routine.
I have a system for naming variables. "Row" means this is a row. "Diag" identifies the worksheet. "Crnt" identifies this as the current row number. In Demo1, I had RowCrnt because I only had one worksheet. You may not like my system. Fine, develop your own. Having a system means I can look at a macro I wrote years ago and know what all the variables are. This makes maintenance much, much easier.
Sub Demo3()
Dim CountCrnt As Long
Dim DateCrnt As Date
Dim ItemCrnt As String
' RowDiagCrnt is declared at module level above; redeclaring it here would hide that variable
Dim RowEntryCrnt As Long
Dim RowEntryLast As Long
Dim ValidRow As Boolean
Dim WkshtDiag As Worksheet
Dim WkshtEntry As Worksheet
I now have two worksheets and I will have to switch between them. I do not like multiple uses of Worksheets("Xxxxx") because I might change "Xxxxx". A reference avoids multiple uses of the name and is faster.
Set WkshtEntry = Worksheets("Data Entry")
Set WkshtDiag = Worksheets("Diagnostics")
' Delete existing contents of diagnostic worksheet and create header row
With WkshtDiag
.Cells.EntireRow.Delete
.Cells(1, "A").Value = "Row"
.Cells(1, "B").Value = "Item"
.Cells(1, "C").Value = "Date"
.Cells(1, "D").Value = "Count"
End With
RowDiagCrnt = 2
With WkshtEntry
RowEntryLast = .Cells(Rows.Count, "C").End(xlUp).Row
End With
For RowEntryCrnt = 2 To RowEntryLast
I must keep the access to the two worksheets separate if I want to use With statements. I have used a boolean to handle this problem.
With WkshtEntry
If .Cells(RowEntryCrnt, "C").Value <> "" And _
IsNumeric(.Cells(RowEntryCrnt, "C").Value) And _
IsDate(.Cells(RowEntryCrnt, "B").Value) Then
ItemCrnt = .Cells(RowEntryCrnt, "A").Value
DateCrnt = .Cells(RowEntryCrnt, "B").Value
CountCrnt = .Cells(RowEntryCrnt, "C").Value
ValidRow = True
Else
ValidRow = False
End If
End With
If ValidRow Then
With WkshtDiag
' Output row values to Diagnostic worksheet
.Cells(RowDiagCrnt, "A").Value = RowEntryCrnt
.Cells(RowDiagCrnt, "B").Value = ItemCrnt
With .Cells(RowDiagCrnt, "C")
.Value = DateCrnt
.NumberFormat = "dmmmyy"
End With
.Cells(RowDiagCrnt, "D").Value = CountCrnt
RowDiagCrnt = RowDiagCrnt + 1
End With
End If
Next
' Set columns to appropriate width for contents
With WkshtDiag
.Columns.AutoFit
End With
End Sub
I hope you can see the reasons for all the changes I made to Demo1 to create Demo3. Having a second worksheet that is not required for the final solution adds complexity that I normally prefer to avoid.
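For what it is worth, the next step (placing one extracted value on the archive sheet) might look something like the sketch below. It assumes the archive layout from the question: item names in row 1 starting at column B, dates in column A. It also assumes that a repeated date/item pair adds to the existing value rather than overwriting it, and that a date not yet present gets a new row at the bottom. ArchiveValue and its parameters are made-up names.

```vba
Option Explicit

' Hypothetical helper: adds countValue to the cell at (entryDate row, itemName column)
Sub ArchiveValue(ByVal wsArchive As Worksheet, ByVal itemName As String, _
                 ByVal entryDate As Date, ByVal countValue As Long)
    Dim colMatch As Variant, rowMatch As Variant
    Dim rowTarget As Long
    Dim dayOnly As Date

    dayOnly = Int(entryDate)                    ' drop any time-of-day part

    ' Find the item's column in the header row; give up if it is not there
    colMatch = Application.Match(itemName, wsArchive.Rows(1), 0)
    If IsError(colMatch) Then Exit Sub

    ' Find the date's row; Match is given the date's serial number
    rowMatch = Application.Match(CLng(dayOnly), wsArchive.Columns(1), 0)
    If IsError(rowMatch) Then
        ' Date not archived yet: start a new row below the last used one
        rowTarget = wsArchive.Cells(wsArchive.Rows.Count, 1).End(xlUp).Row + 1
        wsArchive.Cells(rowTarget, 1).Value = dayOnly
    Else
        rowTarget = rowMatch
    End If

    ' Assumption: repeated archives of the same date are summed
    With wsArchive.Cells(rowTarget, CLng(colMatch))
        .Value = .Value + countValue
    End With
End Sub
```

Inside the validated branch of Demo1 or Demo3 it could be called as ArchiveValue Worksheets("Archive"), ItemCrnt, DateCrnt, CountCrnt, where "Archive" stands in for whatever the archive sheet is called.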

Speeding up a search and delete macro

I have a list containing three columns. The first column contains Names and the other two columns have numbers. The macro takes the first name(A1) and then searches down column A for another occurrence.
When it finds one, it deletes the entire row. It then goes to A2 and does the same thing again. It works OK for about 500 entries, but with 3000 entries it slows down considerably. Is there a way to speed up this code?
Sub Button1_DeleteRow()
Dim i As Integer
Dim j As Integer
Dim Value As Variant
Dim toCompare As Variant
For i = 1 To 3000
Value = Cells(i, 1)
For j = (i + 1) To 3000
toCompare = Cells(j, 1)
If (StrComp(Value, toCompare, vbTextCompare) = 0) Then
Rows(j).EntireRow.Delete
End If
Next j
Next i
End Sub
If you are running Excel 2007/2010 then you can do this in a single line with Remove Duplicates. If you are running Excel 2003 then a solution with AutoFilter will be the most efficient (I can provide this if you are on the older version).
Remove Duplicates
Manually
Select column A
Data .... Remove Duplicates
Expand selection
Select only column A to find duplicates on
Code
ActiveSheet.Range("$A$1:$A$3000").EntireRow.RemoveDuplicates Columns:=1, Header:=xlNo
To supplement @brettdj's answer, if you are running Excel 2003, you can do this using AdvancedFilter as follows:
Range("A1:A11").AdvancedFilter Action:=xlFilterInPlace, Unique:=True
Note: AdvancedFilter assumes that the first row of your range (row 1 in this example) contains column headers and will not include that row in the filtering.
To do this manually: Data > Filter > Advanced Filter... > Unique records only
Using Brett's technique is a good answer, but to answer your question about why it takes so long:
- Your macro is getting a value from over 4 million cells one by one. This is very slow.
- I don't see that your macro has switched off screenupdating and automatic calculation: every time a row is deleted the screen will refresh and Excel will recalculate. If you have not switched these off it is very slow.
This code should run a lot faster
Option Explicit
Sub Button1_DeleteRow()
Dim i As Long
Dim j As Long
Dim vArr As Variant
Dim iComp As Long
Dim Deletes(1 To 3000) As Boolean
Application.ScreenUpdating = False
iComp = Application.Calculation
Application.Calculation = xlCalculationManual
vArr = Range("a1:A3000")
For i = 1 To 3000
For j = (i + 1) To 3000
If (StrComp(vArr(i, 1), vArr(j, 1), vbTextCompare) = 0) Then
Deletes(j) = True
End If
Next j
Next i
For j = 3000 To 1 Step -1
If Deletes(j) Then Rows(j).EntireRow.Delete
Next j
Application.ScreenUpdating = True
Application.Calculation = iComp
End Sub
Sorting the data on column A would then make it trivial to identify and remove the duplicates in a single pass.
In response to the comment below, I'll explain why sorting is a useful technique.
By sorting column A into order, duplicate removal simply becomes a matter of comparing adjacent entries in column A. You can then either delete the duplicate rows as you find them or flag them for later deletion.
The process should actually be a lot less tedious as you only have to sort the list (and sorting, being built-in, tends to be very fast) and then do one pass (instead of 4498500) through the list deleting/flagging as you go (obviously you need a subsequent clean-up pass if you go for flagging).
On the issue of changing the order of the list, start by adding an extra column (e.g. column D) and have D2 contain the value 2 (i.e. just the row number). A quick fill-down later and every row is numbered. After sorting and deleting/flagging, restoring the original order is just a matter of re-sorting on column D which could then be deleted.
I use this method when I have to perform some operation or other on the duplicates. In other words, column A has duplicate values but the values in columns B and C are meaningful (for example, I might want to sum these values from all of the entries relating to the specific value of column A). In many cases, however, it would be easier just to use SQL to achieve the same result.
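The single pass over sorted data can be sketched as follows. The function assumes it is given a 2-D array as returned by Range.Value (for example Range("A1:A3000").Value after sorting on column A) and returns one flag per row; FlagAdjacentDuplicates is a hypothetical name.

```vba
Option Explicit

' Hypothetical helper: flags(i) is True when row i repeats the name in row i - 1
' sortedNames must already be sorted on its first column
Function FlagAdjacentDuplicates(ByRef sortedNames As Variant) As Boolean()
    Dim flags() As Boolean
    Dim i As Long

    ReDim flags(LBound(sortedNames, 1) To UBound(sortedNames, 1))

    ' The first row can never be a duplicate, so start at the second one
    For i = LBound(sortedNames, 1) + 1 To UBound(sortedNames, 1)
        flags(i) = (StrComp(CStr(sortedNames(i, 1)), _
                            CStr(sortedNames(i - 1, 1)), vbTextCompare) = 0)
    Next i

    FlagAdjacentDuplicates = flags
End Function
```

The flagged rows can then be deleted from the bottom up, as in the macro with the Deletes array, before re-sorting on the row-number column to restore the original order.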

Resources