I have a list containing three columns. The first column contains names and the other two columns contain numbers. The macro takes the first name (A1) and then searches down column A for another occurrence.
When it finds one, it deletes the entire row. It then goes to A2 and does the same thing again. It works OK for about 500 entries, but with 3000 entries it slows down considerably. Is there a way to speed up this code?
Sub Button1_DeleteRow()
Dim i As Integer
Dim j As Integer
Dim Value As Variant
Dim toCompare As Variant
For i = 1 To 3000
Value = Cells(i, 1)
For j = (i + 1) To 3000
toCompare = Cells(j, 1)
If (StrComp(Value, toCompare, vbTextCompare) = 0) Then
Rows(j).EntireRow.Delete
End If
Next j
Next i
End Sub
If you are running xl07/10 then you can do this with a single line with Remove Duplicates. If you are running 03 then a solution with AutoFilter will be most efficient (I can provide this if you are on the older version)
Remove Duplicates
Manually
Select column A
Data .... Remove Duplicates
Expand selection
Select only column A to find duplicates on
Code
ActiveSheet.Range("$A$1:$A$3000").EntireRow.RemoveDuplicates Columns:=1, Header:=xlNo
To supplement @brettdj's answer, if you are running Excel 2003, you can do this using AdvancedFilter as follows:
Range("A1:A11").AdvancedFilter Action:=xlFilterInPlace, Unique:=True
Note: AdvancedFilter assumes that the first row of your range (row 1 in this example) contains column headers and will not include that row in the filtering.
To do this manually: Data > Filter > Advanced Filter... > Unique records only
Using Brett's technique is a good answer, but to answer your question about why it takes so long:
- Your macro is getting a value from over 4 million cells one by one. This is very slow.
- I don't see that your macro has switched off screenupdating and automatic calculation: every time a row is deleted the screen will refresh and Excel will recalculate. If you have not switched these off it is very slow.
This code should run a lot faster
Option Explicit
Sub Button1_DeleteRow()
Dim i As Long
Dim j As Long
Dim vArr As Variant
Dim iComp As Long
Dim Deletes(1 To 3000) As Boolean
Application.ScreenUpdating = False
iComp = Application.Calculation
Application.Calculation = xlCalculationManual
vArr = Range("a1:A3000")
For i = 1 To 3000
For j = (i + 1) To 3000
If (StrComp(vArr(i, 1), vArr(j, 1), vbTextCompare) = 0) Then
Deletes(j) = True
End If
Next j
Next i
For j = 3000 To 1 Step -1
If Deletes(j) Then Rows(j).EntireRow.Delete
Next j
Application.ScreenUpdating = True
Application.Calculation = iComp
End Sub
Sorting the data on column A would then make it trivial to identify and remove the duplicates in a single pass
In response to the comment below, I'll explain why sorting is a useful technique.
By sorting column A into order, duplicate removal simply becomes a matter of comparing adjacent entries in column A. You can then either delete the duplicate rows as you find them or flag them for later deletion.
The process should actually be a lot less tedious as you only have to sort the list (and sorting, being built-in, tends to be very fast) and then do one pass (instead of 4498500) through the list deleting/flagging as you go (obviously you need a subsequent clean-up pass if you go for flagging).
On the issue of changing the order of the list, start by adding an extra column (e.g. column D) and have D2 contain the value 2 (i.e. just the row number). A quick fill-down later and every row is numbered. After sorting and deleting/flagging, restoring the original order is just a matter of re-sorting on column D which could then be deleted.
I use this method when I have to perform some operation or other on the duplicates. In other words, column A has duplicate values but the values in columns B and C are meaningful (for example, I might want to sum these values from all of the entries relating to the specific value of column A). In many cases, however, it would be easier just to use SQL to achieve the same result
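The sort-and-compare idea above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the data sits in A1:C(last row) of the active sheet with no header row, column D is free to hold the original row order, and column A contains no error values. The Sub name is made up.

```vba
Option Explicit

' Sketch only. Assumes: data in columns A:C of the active sheet, no header row,
' column D free as a scratch column, no error values in column A.
Sub RemoveDuplicatesBySorting()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim i As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    If lastRow < 2 Then Exit Sub

    Application.ScreenUpdating = False

    ' Number every row in column D so the original order can be restored later.
    For i = 1 To lastRow
        ws.Cells(i, 4).Value = i
    Next i

    ' Sort by name, then by original row number, so the first occurrence
    ' of each name ends up at the top of its group.
    ws.Range(ws.Cells(1, 1), ws.Cells(lastRow, 4)).Sort _
        Key1:=ws.Cells(1, 1), Order1:=xlAscending, _
        Key2:=ws.Cells(1, 4), Order2:=xlAscending, Header:=xlNo

    ' Single pass, bottom up: a row is a duplicate if it equals the row above it.
    For i = lastRow To 2 Step -1
        If StrComp(CStr(ws.Cells(i, 1).Value), CStr(ws.Cells(i - 1, 1).Value), vbTextCompare) = 0 Then
            ws.Rows(i).Delete
        End If
    Next i

    ' Restore the original order, then remove the scratch column contents.
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    ws.Range(ws.Cells(1, 1), ws.Cells(lastRow, 4)).Sort _
        Key1:=ws.Cells(1, 4), Order1:=xlAscending, Header:=xlNo
    ws.Columns(4).ClearContents

    Application.ScreenUpdating = True
End Sub
```

Going bottom up means a deletion never shifts a row that has not been checked yet. Flagging instead of deleting would only change the body of the If.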
Related
I am trying to make a VBA validation sheet in Excel that finds all the cells that do not match a predefined pattern and copies them to another sheet
My pattern is "4 numbers/5 numbers"
Ex: 1234/12345 is accepted
2062/67943 is accepted
372/13333 is not accepted
1234/1234 is not accepted etc...
I tried to put the following in the conditions sheet: <>****/***** and <>????/????? and neither worked (I am not sure the approach is correct, as I am still a beginner in VBA)
For the code itself, this is what I wrote :
Sub GuaranteeElig()
Sheets.Add After:=ActiveSheet
ActiveSheet.Name = SheetName
Sheets("MainSheet").UsedRange.AdvancedFilter Action:= _
xlFilterCopy, _
CriteriaRange:=Sheets("ConditionsSheet").Range("B1:B2"), _
CopyToRange:=Range("A1"), Unique:=False
End Sub
Any tips on how I can do it ?
Thanks in advance :)
As long as the values of the numbers are independent and do not matter, and it is only the Length of the numerical strings that count, you could use a for loop on the cells from the "search" sheet (I assume this is the MainSheet as shown in your code?) where your values are contained.
From there, I'll give you a couple ways to place the data in the validation sheet (assuming this is your ConditionsSheet as shown in your code?) where you are trying to pinpoint the values.
(You may need to change part of your approach depending on how you want the incorrect set of values laid out on your secondary sheet - but this should get you started.) I added a TON of comments as you say you're new to VBA - these will help you understand what is being done.
Sub GuaranteeElig()
'Adding this to help with performance:
Application.ScreenUpdating = False
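'Assuming you are adding a sheet here to work with your found criteria.
Sheets.Add After:=ActiveSheet
ActiveSheet.Name = "ConditionsSheet"
'Sheets.Add makes the new (empty) sheet active, and the unqualified Cells(...) calls
'below read from whichever sheet is active, so switch back to MainSheet first.
Sheets("MainSheet").Activate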
'Using the naming bits below I am assuming the data you are searching for is on MainSheet
'Get used range (most accurate and efficient way I have found yet, others on S.O.
'may have better ways for this - research it if this does not work for you)
'I have had problems using the Sheets().UsedRange method.
Dim c as Long 'This may not be necessary for you if you are looping through only column "A"
Dim r as Long
'Cells(y,x) method uses numerical values for each row (y) or column (x).
c = Cells(1, Columns.Count).End(xlToLeft).Column 'May not be necessary depending on your needs.
'Using this because you have "UsedRange" in your
'code.
'.End(xlToLeft) signifies we are going to the end of the available cell range of
'Row 1 and then performing a "Ctrl+Left Arrow" to skip all blank cells until we hit
'the first non-blank cell.
r = Cells(Rows.Count, 1).End(xlUp).Row
'.End(xlUp) method is similar - we go to the end of the available cell range for the
'column ("A" in this case), then performing a "Ctrl+Up Arrow" to skip all blank cells.
'If you have a header row which spans across the sheet, this is your best option,
'unless you have 'helper' cells which extend beyond the final column of this header
'row. I am assuming Row 1 is a header in this case - change to your needs.
'For your Rows - choose the column which contains congruent data to the bottom of
'your used range - I will assume column 1 in this case - change to suit your needs.
Dim i as long
Dim j as integer
Dim cel as Range
Dim working_Str() as String 'String Array to use later
Dim string1 as String
Dim string2 as String
Dim badString as Boolean
For i = 2 to r Step 1 'Step down from row 2 to the end of data 1 Row at a time
'Row 1 is header.
Set cel = Cells(i, 1) 'Sets the cell to check - assuming data is in Column "A"
'i increases on each pass of the For loop, so 'cel' moves from "A2"
'to "A3" to "A4" etc.
working_Str = Split(cel.Value, "/", -1) 'Splits the value based on "/" inside of cel
If UBound(working_Str) <> 1 Then 'No "/" (or more than one) in the value, so it cannot match
badString = True
Else
string1 = working_Str(0) 'what we hope will always be 4 digits
string2 = working_Str(1) 'what we hope will always be 5 digits
'Note: only the lengths are checked here, not whether the characters are digits.
If Len(string1) <> 4 Then 'string1 (the part before "/") is not 4 characters long
badString = True
ElseIf Len(string2) <> 5 Then 'string2 (the part after "/") is not 5 characters long
badString = True
End If
End If
If badString Then 'If either strings above were not correct length, then
'We will copy cell value over to the new sheet "ConditionsSheet"
'Comment the next 2 commands to change from going to one row at a time to
'Matching same row/Cell on the 2nd sheet. Change to suit your needs.
j = j + 1 'Counter to move through the cells as you go, only moving one cell
'at a time as you find incorrect values.
Sheets("ConditionsSheet").Range("A" & j).Value=cel.Value 'sets the value on other sheet
'UNComment the next command to change from going to one row at a time to
'matching same row/cell on the 2nd sheet. Change to suit your needs.
'Sheets("ConditionsSheet").Range("A" & i).Value=cel.Value
End if
badString = False 'resets your boolean so it will not fail next check if strings are correct
Next i
'Returning ScreenUpdating back to True to prevent Excel from suppressing screen updates
Application.ScreenUpdating = True
End Sub
UPDATE
Check the beginning and ending lines I just added into the subroutine. Application.ScreenUpdating will suppress or show the changes as they happen - suppressing them makes it go MUCH quicker. You also do not want to leave this setting disabled, as it will prevent Excel from showing updates as you try to work in the sheet (like editing cell values, scrolling etc.). Learned the hard way...
Also, if you have a lot of records in the given row, you could try putting the data into an array first. There is a great example here at this StackOverflow Article.
Reading the values of a range one cell at a time carries a LOT of overhead, because every read is a separate call into the worksheet, so porting the range into an array first will make this go much quicker, but it still may take a bit. Additionally, how you access the array information will be a little different, but it'll make sense as you research it a little more.
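As a rough illustration of the array approach, here is a minimal sketch. It assumes the values are in column A of "MainSheet" from row 2 down and that a sheet named "ConditionsSheet" already exists to receive the mismatches; the Sub name is made up. Instead of Split and Len it uses VBA's Like operator, where # matches exactly one digit, so it also rejects non-digit characters.

```vba
Option Explicit

' Sketch only. Assumes: values in column A of "MainSheet" starting at row 2,
' and an existing sheet named "ConditionsSheet" for the output.
Sub ListBadValuesFromArray()
    Dim src As Worksheet
    Dim lastRow As Long
    Dim vals As Variant
    Dim bad() As Variant
    Dim i As Long
    Dim n As Long
    Dim isBad As Boolean

    Set src = ThisWorkbook.Worksheets("MainSheet")
    lastRow = src.Cells(src.Rows.Count, 1).End(xlUp).Row
    ' With fewer than two data rows, .Value would not return a 2-D array.
    If lastRow < 3 Then Exit Sub

    ' One read for the whole column instead of one read per cell.
    vals = src.Range(src.Cells(2, 1), src.Cells(lastRow, 1)).Value
    ReDim bad(1 To UBound(vals, 1), 1 To 1)

    For i = 1 To UBound(vals, 1)
        If IsError(vals(i, 1)) Then
            isBad = True    ' error values such as #N/A cannot match
        Else
            ' "#" matches a single digit: 4 digits, "/", 5 digits.
            isBad = Not (CStr(vals(i, 1)) Like "####/#####")
        End If
        If isBad Then
            n = n + 1
            bad(n, 1) = vals(i, 1)
        End If
    Next i

    ' One write for all mismatches; only the first n rows of the array are used.
    If n > 0 Then
        ThisWorkbook.Worksheets("ConditionsSheet").Range("A1").Resize(n, 1).Value = bad
    End If
End Sub
```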
Alternative To VBA
If you want to try using a formula instead, you can use this - just modify for the range you are looking to search. This will potentially take longer depending on processing speed. I am entering the formula on 'Sheet2' and accessing 'Sheet1'
=IF(COUNTIF(Sheet1!A1,"????/?????"),1,0)
You are spot on with the search pattern you want to use, you just need to use a function which uses wildcard characters within an "if" function. What you do with the "If value is true" vs "If value is false" bits are up to you. COUNTIF will parse wildcards, so if it is able to "count" the cell matching this string combination, it will result in a "True" value for your if statement.
Here is a regex method. It dumps the mismatched values into a worksheet named Result; change the input range and worksheet name accordingly.
In my testing, 72k cells in UsedRange took about 4 seconds:
Option Explicit
Sub GuaranteeElig()
Const outputSheetName As String = "Result"
Dim testValues As Variant
testValues = ThisWorkbook.Worksheets("MainSheet").UsedRange.Value 'Input Range, change accordingly
Const numPattern As String = "^\d{4}/\d{5}$" 'Anchored with ^ and $ so the whole value must match, not just part of it
Dim regex As Object
Set regex = CreateObject("VBScript.Regexp")
regex.Pattern = numPattern
Dim i As Long
Dim n As Long
Dim failValues As Collection
Set failValues = New Collection
'Loop through all the values and test if it fits the regex pattern - 4 digits + / + 5 digits
'Add the value to failValues collection if it fails the test.
For i = LBound(testValues, 1) To UBound(testValues, 1)
For n = LBound(testValues, 2) To UBound(testValues, 2)
If Not regex.Test(testValues(i, n)) Then failValues.Add testValues(i, n)
Next n
Next i
Erase testValues
Set regex = Nothing
If failValues.Count <> 0 Then
'If there are mismatched value(s) found
'Tranfer the values to an array for easy output later
Dim outputArr() As String
ReDim outputArr(1 To failValues.Count, 1 To 1) As String
For i = 1 To failValues.Count
outputArr(i, 1) = failValues(i)
Next i
'Test if output worksheet exist
Dim outputWS As Worksheet
On Error Resume Next
Set outputWS = ThisWorkbook.Worksheets(outputSheetName)
On Error GoTo 0
'If output worksheet doesn't exist, create a new sheet else clear the first column for array dump
If outputWS Is Nothing Then
Set outputWS = ThisWorkbook.Worksheets.Add
outputWS.Name = outputSheetName
Else
outputWS.Columns(1).Clear
End If
'Dump the array starting from cell A1
outputWS.Cells(1, 1).Resize(UBound(outputArr, 1)).Value = outputArr
Else
MsgBox "No mismatched value found in range"
End If
Set failValues = Nothing
End Sub
If you do not need duplicate values in the list of mismatches (i.e. you want unique values only), say so in the comments.
I have some larger files I need to validate the data in. I have most of it automated to input the formulas I need automatically. This helps eliminate errors of copy and paste on large files. The problem is with this latest validation.
One of the latest validations involves counting the number of rows that match 3 columns. The 3 columns are in Sheet 2 and the rows to count are in Sheet 1. Then compare this count with an expected number based on Sheet 2. It is easy enough to do with CountIFs, but there are large files and it can take up to an hour on some of them. I am trying to find something faster.
I am using a smaller file and it is still taking about 1 minute. There are only about 1800 rows.
I have something like this:
In Check1 I am using: =COUNTIFS(Sheet1!A:A,A2,Sheet1!B:B,B2,Sheet1!C:C,C2)
My code puts that formula in the active cell. Is there a better way to do this?
Is there any way - using VB or anything - to improve the performance?
When the rows start getting into the tens of thousands it is time to start this and get lunch. And then hope it is done when I get back to my desk!
Thanks.
You basically have to iterate over all rows once per column, which is expensive. You might be able to split this into two tasks:
Merge your columns A-C into one helper value on both sheets: =CONCAT(A2,"|",B2,"|",C2) (the "|" delimiter stops "ab"+"c" from matching "a"+"bc")
Then do only a single COUNTIF on that helper column: =COUNTIF(Sheet1!D:D,D2)
That way you get rid of two (time) expensive criteria at the cost of the new concat.
You should narrow the range COUNTIFS acts on from entire columns to the actual used range.
Your code could also write the result of the formula instead of the formula itself.
As follows:
Dim sheet1Rng As Range
Dim cell As Range
With Sheet1
Set sheet1Rng = Intersect(.UsedRange, .Range("A:C"))
End With
With Sheet2
For Each cell In Intersect(.UsedRange, .Range("A:A"))
'Offset(, n) moves n columns to the right: B and C hold the 2nd and 3rd criteria, D gets the count
cell.Offset(, 3).Value = WorksheetFunction.CountIfs(sheet1Rng.Columns(1), cell.Value, sheet1Rng.Columns(2), cell.Offset(, 1).Value, sheet1Rng.Columns(3), cell.Offset(, 2).Value)
Next cell
End With
I set up a mock sheet, using a layout similar to what you show, with 10,000 rows, and manually filled it with the COUNTIFS formula you show. Changing a single item in the data triggered a recalculation which took about ten seconds or so to execute.
I then tried the following macro, which executed in well under one second. All of the counting is done within the VBA macro. So this Dictionary method may be an answer to your speed problems.
Before running this, you may want to set the Calculation state to Manual (or do it in the code) if you have COUNTIFS on the worksheet.
Option Explicit
'set reference to Microsoft Scripting Runtime
Sub CountCol123()
Dim DCT As Dictionary
Dim V As Variant
Dim WS As Worksheet, R As Range
Dim I As Long
Dim sKey As String
Set WS = Worksheets("sheet2")
'read the info into an array
With WS
Set R = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(columnsize:=4)
V = R
End With
'Get count of the matches
Set DCT = New Dictionary
For I = 2 To UBound(V, 1)
sKey = V(I, 1) & "|" & V(I, 2) & "|" & V(I, 3)
If DCT.Exists(sKey) Then
DCT(sKey) = DCT(sKey) + 1
Else
DCT.Add Key:=sKey, Item:=1
End If
Next I
'Get the results and write them out
For I = 2 To UBound(V, 1)
sKey = V(I, 1) & "|" & V(I, 2) & "|" & V(I, 3)
V(I, 4) = DCT(sKey)
Next I
'If you have COUNTIFS on the worksheet when testing this,
' or any other formulas that will be triggered,
' then uncomment the next line
'Application.Calculation = xlCalculationManual
With R
.EntireColumn.Clear
.Value = V
End With
End Sub
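The macro above counts within a single sheet. If the counts have to come from Sheet1 while the keys live in Sheet2, as in the question, the same Dictionary idea might be adapted along these lines. This is a sketch under assumptions: the sheets are named "Sheet1" and "Sheet2", both have headers in row 1 and key columns A:C, the key cells contain no error values, and the results go to column D of Sheet2. It uses late binding, so no reference to Microsoft Scripting Runtime is needed.

```vba
Option Explicit

' Sketch only. Assumes: "Sheet1" holds the rows to count, "Sheet2" holds the keys,
' headers in row 1 on both, key columns A:C, results written to Sheet2 column D.
Sub CountSheet1MatchesForSheet2()
    Dim dct As Object
    Dim src As Variant
    Dim keys As Variant
    Dim result() As Variant
    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim last1 As Long
    Dim last2 As Long
    Dim i As Long
    Dim sKey As String

    Set ws1 = ThisWorkbook.Worksheets("Sheet1")
    Set ws2 = ThisWorkbook.Worksheets("Sheet2")
    last1 = ws1.Cells(ws1.Rows.Count, 1).End(xlUp).Row
    last2 = ws2.Cells(ws2.Rows.Count, 1).End(xlUp).Row
    If last1 < 2 Or last2 < 2 Then Exit Sub

    ' Read both key areas into arrays in one go.
    src = ws1.Range("A1:C" & last1).Value
    keys = ws2.Range("A1:C" & last2).Value

    Set dct = CreateObject("Scripting.Dictionary")
    dct.CompareMode = vbTextCompare   ' COUNTIFS ignores case, so do the same here

    ' Pass 1: count every A|B|C combination found in Sheet1.
    For i = 2 To UBound(src, 1)
        sKey = src(i, 1) & "|" & src(i, 2) & "|" & src(i, 3)
        dct(sKey) = dct(sKey) + 1     ' a missing key starts as Empty, which counts as 0
    Next i

    ' Pass 2: look up the count for each Sheet2 row.
    ReDim result(1 To UBound(keys, 1) - 1, 1 To 1)
    For i = 2 To UBound(keys, 1)
        sKey = keys(i, 1) & "|" & keys(i, 2) & "|" & keys(i, 3)
        If dct.Exists(sKey) Then
            result(i - 1, 1) = dct(sKey)
        Else
            result(i - 1, 1) = 0
        End If
    Next i

    ws2.Range("D2").Resize(UBound(result, 1), 1).Value = result
End Sub
```

Each sheet is read once and each row is touched once, so the work grows with the sum of the two row counts rather than their product.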
The Excel alternative named Cell in Hancom Office 2020 is insanely fast at countifs. Not sure why. On my i7-5775C, Excel 2019 takes 90 seconds for a countifs with two criteria for populating 10,000 rows with the results. Using Cell, the exact same operation completes in less than 28 seconds. I have verified that the results match those generated by Excel 2019.
I have 3 columns in Excel 2010 with email addresses, I need to be able to narrow all 3 columns to only have unique values. I don't necessarily need to merge the remaining values into a single column, but I definitely need to eliminate duplicates. I found another post that had a VB with it, but it didn't seem to work. It removed only a few duplicates:
Sub removeDuplicates()
Dim lastCol As Integer
lastCol = 5 'col 5 is column E
Dim wks As Worksheet
Set wks = Worksheets("Sheet1")
Dim searchRange As Range
Set searchRange = wks.Range("A1:A" & wks.Cells(Rows.Count, "A").End(xlUp).Row)
Dim compareArray As Variant
Dim searchArray As Variant
'Get all values from Col A to search against
compareArray = searchRange.Value
For col = lastCol - 1 To 1 Step -1
'Set values to search for matches
searchArray = searchRange.Offset(0, col - 1).Value
'Set values to last column to compare against
compareArray = searchRange.Offset(0, col).Value
For i = 1 To UBound(compareArray)
If compareArray(i, 1) = searchArray(i, 1) Then
'Match found, delete and shift left
Cells(i, col).Delete Shift:=xlToLeft
End If
Next i
Next col
End Sub
Thanks!
Here is how I would propose doing this if it is a one-off task that you don't have to do very often.
Rather than typing out the entire process in detail, I have done a screencast of how I did this (and the entire process barely took me a minute to do).
The quick overview:
You will need to add a few temporary helper columns for unique values from each email list (one for each list), a 'merged list' column and then a final column. Filter for the unique emails using the 'Advanced' filter option one column at a time. Paste those values into the temporary column for that email list and then clear the filter. Repeat until you have gone through each column and each temporary column has the unique values in it from each list. Once you have the uniques from each list, paste these one at a time into the 'merged list' column (stacking the results in one long list) and then do a unique filter on that. Copy/paste the uniques from that list into your final column, clear the filter, and you're done.
Screencast is below:
http://screencast.com/t/zL8VmUut
Cheers!
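If this turns into something you do regularly, the filter-and-stack steps described above could probably be scripted. A minimal sketch, assuming the emails are in columns A:C of "Sheet1" with no header row and that column E is free for the merged unique list (both are assumptions, and the Sub name is made up):

```vba
Option Explicit

' Sketch only. Assumes: emails in columns A:C of "Sheet1", no header row,
' column E free for the merged list of unique addresses.
Sub ListUniqueEmails()
    Dim ws As Worksheet
    Dim data As Variant
    Dim out() As Variant
    Dim seen As Object
    Dim k As Variant
    Dim r As Long
    Dim c As Long
    Dim lastRow As Long
    Dim email As String

    Set ws = ThisWorkbook.Worksheets("Sheet1")

    ' The three lists may differ in length, so take the longest one.
    lastRow = 1
    For c = 1 To 3
        r = ws.Cells(ws.Rows.Count, c).End(xlUp).Row
        If r > lastRow Then lastRow = r
    Next c

    data = ws.Range(ws.Cells(1, 1), ws.Cells(lastRow, 3)).Value

    Set seen = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare   ' treat A@x.com and a@x.com as the same address

    For c = 1 To 3
        For r = 1 To lastRow
            If Not IsError(data(r, c)) Then
                email = Trim$(CStr(data(r, c)))
                If Len(email) > 0 Then
                    If Not seen.Exists(email) Then seen.Add email, True
                End If
            End If
        Next r
    Next c

    If seen.Count = 0 Then Exit Sub

    ' Copy the keys into a one-column array and write it out in one step.
    ReDim out(1 To seen.Count, 1 To 1)
    r = 0
    For Each k In seen.Keys
        r = r + 1
        out(r, 1) = k
    Next k
    ws.Range("E1").Resize(seen.Count, 1).Value = out
End Sub
```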
Since the first column holds the ones you have already contacted, swap the first column with the second, and in the 3rd column write a "Yes" or "Not Contacted" value depending on whether the email was found in the second column (the ones you already contacted).
Formula.
=IF(ISERROR(VLOOKUP(A2,$B$2:$B$11,1,FALSE)),"Not Contacted","Yes")
As you can see, the one with Yes status is on the contacted list, you just filter the Not Contacted and you will have a new pending list in column A.
Simple.
I want to remove rows based on duplicate cells in a column from large sheet, without leaving duplicate sample (like "Remove Duplicates" Excel command does). So if I have:
1
2
2
3
I want, as a result:
1
3
This can be accomplished with conditional formatting, then filtering or sorting the duplicates and deleting the filtered data, but the process is very slow for a large sheet.
Conditional formatting takes a second, but just clicking on the filter takes around 5 minutes to display the filter context menu and an additional 20-30 minutes to do the actual filtering based on color. I tried this process on different PCs with 4 cores and plenty of RAM, on a sheet with 100,000 rows.
I then thought to write VBA that iterates over the column cells and deletes the entire row if the cell is colored (this is possible in Excel 2010 with Cells().DisplayFormat), but the processing takes even more time.
Can someone suggest a faster way to remove duplicates on large sheet?
EDIT: Note that I have used 2 procedures. Of these, test is a Sub that checks whether the function works (you will have to modify it for your scenario).
Also, I filled cell A1 to A100000 with test values. Please modify it per your needs.
Option Explicit
Function GetUniqueItems(ByVal src As Range) As Variant
Dim returnValue
Dim dictOfItemsWith1Value
Dim dictOfItemsWithMoreThan1Value
Dim countOfCells As Long
Dim counter As Long
Dim srcValues As Variant
Dim currentValue
Dim cell As Range
srcValues = src.Value
countOfCells = src.Cells.Count
Set dictOfItemsWith1Value = CreateObject("Scripting.Dictionary")
Set dictOfItemsWithMoreThan1Value = CreateObject("Scripting.Dictionary")
For counter = 1 To countOfCells
currentValue = srcValues(counter, 1)
If dictOfItemsWithMoreThan1Value.exists(currentValue) Then
dictOfItemsWithMoreThan1Value(currentValue) = dictOfItemsWithMoreThan1Value(currentValue) + 1
Else
If Not dictOfItemsWith1Value.exists(currentValue) Then
dictOfItemsWith1Value.Add currentValue, 1
Else
dictOfItemsWith1Value.Remove currentValue
dictOfItemsWithMoreThan1Value.Add currentValue, 1
End If
End If
Next
ReDim returnValue(1 To dictOfItemsWith1Value.Count, 1 To 1)
Dim key
counter = 1
For Each key In dictOfItemsWith1Value.keys
returnValue(counter, 1) = key
counter = counter + 1
Next
GetUniqueItems = returnValue
End Function
Sub test()
Debug.Print Now
Dim uniqueValues
uniqueValues = GetUniqueItems(Range("A1:A100000"))
Range("A1:A100000").ClearContents
Range("A1").Resize(UBound(uniqueValues, 1)) = uniqueValues
Debug.Print Now
End Sub
My way to deal with large excel files where I have to remove large chunks of data:
1. After the last column, use a COUNTIF() (much like KazJaw and DanM's countif)
=COUNTIF($A$1:$A$100000,A1)
$A$1:$A$100000 contains your ids. Change accordingly.
2. Drag the formula to the bottom (Fill Down, or, if column B is where you put the helper column, select the range $B$1:$B$100000 and press Ctrl+D)
3. Copy the column and paste values in place to remove the formula. This will prevent any recalculations during/after any filtering.
4. Sort by the column with the counts. This makes deleting the large portion of rows much faster later on.
5. Look for where you start to get counts of 2 and delete all rows till the bottom.
6. Delete the helper column.
Now, if you want to restore the original order, put yet another column after the count, after step 3 above, and after step 5, sort this new column by ascending order before deleting it in step 6.
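For comparison, the same "keep only the values that occur exactly once" result could be sketched in VBA with a Dictionary instead of a helper column. The assumptions here are mine: the active sheet holds plain values (no formulas or per-row formatting that must survive), the ids are in column A with no header row and no error values, and the rows are rewritten in place rather than deleted.

```vba
Option Explicit

' Sketch only. Assumes: plain values on the active sheet, ids in column A,
' no header row, no error values in column A.
Sub KeepOnlySingletons()
    Dim ws As Worksheet
    Dim rng As Range
    Dim data As Variant
    Dim kept() As Variant
    Dim counts As Object
    Dim r As Long
    Dim c As Long
    Dim n As Long
    Dim lastRow As Long
    Dim lastCol As Long
    Dim sKey As String

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    If lastRow < 2 Then Exit Sub
    lastCol = ws.UsedRange.Column + ws.UsedRange.Columns.Count - 1

    Set rng = ws.Range(ws.Cells(1, 1), ws.Cells(lastRow, lastCol))
    data = rng.Value

    ' Pass 1: count how often each id occurs.
    Set counts = CreateObject("Scripting.Dictionary")
    counts.CompareMode = vbTextCompare   ' same case-insensitive matching as COUNTIF
    For r = 1 To lastRow
        sKey = CStr(data(r, 1))
        counts(sKey) = counts(sKey) + 1
    Next r

    ' Pass 2: copy only the rows whose id occurred exactly once, keeping their order.
    ReDim kept(1 To lastRow, 1 To lastCol)
    For r = 1 To lastRow
        If counts(CStr(data(r, 1))) = 1 Then
            n = n + 1
            For c = 1 To lastCol
                kept(n, c) = data(r, c)
            Next c
        End If
    Next r

    ' Replace the old contents with the surviving rows in one write.
    rng.ClearContents
    If n > 0 Then rng.Resize(n, lastCol).Value = kept
End Sub
```

With the 1, 2, 2, 3 example from the question this leaves 1 and 3. Because nothing is deleted row by row, the original order is preserved without an extra sort.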
If your data is located in column A, this formula should do what you need fairly efficiently:
=COUNTIF(A$1:A$100000,A1)
This formula counts how many times the value in A1 appears in the range A1:A100000. (The dollar signs keep that range from moving down as your drag your formula down.)
Place this in B1 and drag down* to B100000 (assuming you have 100,000 rows).
Then just do a filter on column B to show only 1. (More than 1 means you have duplicates and shouldn't show it.)
*A short cut for dragging down is to just select B1, then press Ctrl-End, then hold down shift and click B100000. Then do Ctrl-D (which is a shortcut for Fill Down).
I have this piece of code which finds the excel row of an item from a list and deletes the items from a list. What I want... is to delete the Excel row as well.
The code is here
Private Sub imperecheaza_Click()
Dim ws As Worksheet
Dim Rand As Long
Set ws = Worksheets("BD_IR")
Rand = 3
Do While ws.Cells(Rand, 4).Value <> "" And Rand < 65000
If ws.Cells(Rand, 4).Value = gksluri.Value * 1 And ws.Cells(Rand, 5).Value = gksluri.List(gksluri.ListIndex, 1) * 1 Then
ws.Range(Rand, 1).EntireRow.Delete '(here I want to delete the entire row that meets the criteria from the If statement)
gksluri.RemoveItem gksluri.ListIndex
Exit Do
End If
Rand = Rand + 1
Loop
End Sub
Where I added ws.Range(Rand,1).EntireRow.Delete is where I want to delete the entire row but I don't know how to do it. What I want... if it finds the same value in a cell like in some selected item of my list to be able to remove both the entire row in excel and the item from the listbox. It works to remove the item from the listbox but I don't know how to remove the row as well
Chris Nielsen's solution is simple and will work well. A slightly shorter option would be...
ws.Rows(Rand).Delete
...note there is no need to specify a Shift when deleting a row as, by definition, it's not possible to shift left
Incidentally, my preferred method for deleting rows is to use...
ws.Rows(Rand) = ""
...in the initial loop. I then use a Sort function to push these rows to the bottom of the data. The main reason for this is because deleting single rows can be a very slow procedure (if you are deleting >100). It also ensures nothing gets missed as per Robert Ilbrink's comment
You can learn the code for sorting by recording a macro and reducing the code as demonstrated in this expert Excel video. I have a suspicion that the neatest method (Range("A1:Z10").Sort Key1:=Range("A1"), Order1:=xlAscending (or xlDescending), Header:=xlYes (or xlNo)) can only be discovered on pre-2007 versions of Excel...but you can always reduce the 2007/2010 equivalent code
Couple more points...if your list is not already sorted by a column and you wish to retain the order, you can stick the row number 'Rand' in a spare column to the right of each row as you loop through. You would then sort by that column and eliminate it
If your data rows contain formatting, you may wish to find the end of the new data range and delete the rows that you cleared earlier. That's to keep the file size down. Note that a single large delete at the end of the procedure will not impair your code's performance in the same way that deleting single rows does
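The clear-then-sort approach described above might look something like this. It is a sketch with made-up specifics: data in columns A:C of the active sheet with a header in row 1, column D free as a scratch column, and a hypothetical rule that a row goes when column A equals the target text.

```vba
Option Explicit

' Sketch only. Assumes: data in columns A:C of the active sheet, header in row 1,
' column D free as a scratch column. The matching rule is hypothetical.
Sub ClearThenSortRows(ByVal target As String)
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    If lastRow < 2 Then Exit Sub

    Application.ScreenUpdating = False

    For r = 2 To lastRow
        If StrComp(CStr(ws.Cells(r, 1).Value), target, vbTextCompare) = 0 Then
            ws.Rows(r).ClearContents      ' blank the row instead of deleting it
        Else
            ws.Cells(r, 4).Value = r      ' remember the original position
        End If
    Next r

    ' Blank cells sort to the bottom, so the cleared rows sink below the data
    ' while the remaining rows keep their original order.
    ws.Range(ws.Cells(2, 1), ws.Cells(lastRow, 4)).Sort _
        Key1:=ws.Cells(2, 4), Order1:=xlAscending, Header:=xlNo

    ws.Range(ws.Cells(2, 4), ws.Cells(lastRow, 4)).ClearContents
    Application.ScreenUpdating = True
End Sub
```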
Change your line
ws.Range(Rand, 1).EntireRow.Delete
to
ws.Cells(Rand, 1).EntireRow.Delete
Better yet, use union to grab all the rows you want to delete, then delete them all at once. The rows need not be continuous.
Dim rng As Range
Dim rDel As Range
For Each rng In {the range you're searching}
If {Conditions to be met} = True Then
If Not rDel Is Nothing Then
Set rDel = Union(rng, rDel)
Else
Set rDel = rng
End If
End If
Next
'rDel is still Nothing when no cell met the conditions, so check before deleting
If Not rDel Is Nothing Then rDel.EntireRow.Delete
That way you don't have to worry about sorting or things being at the bottom.
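To make the Union pattern concrete, here is one possible filled-in version. The sheet name "BD_IR", the start row 3 and column 4 come from the question; the Sub name and the target parameter are made up for illustration, and unlike the original loop this sketch removes every matching row rather than stopping at the first.

```vba
Option Explicit

' Sketch only. "BD_IR", row 3 and column 4 are taken from the question;
' the Sub name and the target parameter are hypothetical.
Sub DeleteMatchingRows(ByVal target As Double)
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long
    Dim v As Variant
    Dim rDel As Range

    Set ws = ThisWorkbook.Worksheets("BD_IR")
    lastRow = ws.Cells(ws.Rows.Count, 4).End(xlUp).Row

    For r = 3 To lastRow
        v = ws.Cells(r, 4).Value
        ' Nested Ifs because VBA evaluates both sides of And; CDbl would fail on text.
        If IsNumeric(v) Then
            If CDbl(v) = target Then
                If rDel Is Nothing Then
                    Set rDel = ws.Cells(r, 4)
                Else
                    Set rDel = Union(rDel, ws.Cells(r, 4))
                End If
            End If
        End If
    Next r

    ' One delete for all collected rows; skip it when nothing matched.
    If Not rDel Is Nothing Then rDel.EntireRow.Delete
End Sub
```

Since the rows are only collected during the loop and deleted afterwards, the row counter never has to be adjusted for shifting rows.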
Something like this will do it:
Rows("12:12").Select
Selection.Delete
So in your code it would look like something like this:
Rows(CStr(rand) & ":" & CStr(rand)).Select
Selection.Delete