Highlight Full Duplicate Rows (Not Individual Values) - excel

I have looked at several other Stack Overflow questions, but I haven't found the answer I am looking for.
I have an Excel sheet with 25,000 rows in the following format:
userid | taskid | taskcode
1 | 3 | K
1 | 4 | O
1 | 4 | L
1 | 5 | O
2 | 3 | O
What I want to do is identify any rows that are duplicated on both userid and taskid. In the above example I would expect the second and third rows to be highlighted.
I'm not sure how feasible it is, but ideally I'd like to identify the duplicate row containing the taskcode O rather than L.
My end goal is to remove all duplicate userid / taskid rows with the taskcode O, regardless of how it is achieved (highlighting then sorting by highlights, or using VBA).

OK, solved it after messing around. I'll leave this answer here for future reference unless a more efficient way is identified.
Step 1
Create a combined cell holding both the userid and taskid values using =A2&"|"&B2 in column D. (The separator keeps pairs such as 1/14 and 11/4 from producing the same combined value, which plain =A2&B2 would.)
Step 2
Use the built-in conditional formatting for highlighting duplicate values on column D.
Step 3
Custom sort with the red highlighted duplicate cells at the top, then sort the taskcode in a custom order (O, K, L).
Step 4
Simply select all the rows (which are now in blocks) that you wish to delete.

If you want to try using VBA, the macro below deletes every row whose value in column B, or in column C, occurs more than once. It checks each column separately rather than the userid/taskid pair, so run it on a copy of the data first.
Sub duplicates_separation()
    Dim shtIn As Worksheet
    Dim rngCol As Range, cell As Range, delRows As Range
    Dim col As Variant, lastRow As Long

    Set shtIn = ThisWorkbook.Sheets("Sheet1")
    lastRow = shtIn.Cells(shtIn.Rows.Count, "B").End(xlUp).Row

    'search for duplicates in the 2nd column, then in the 3rd column
    For Each col In Array("B", "C")
        Set rngCol = shtIn.Range(col & "2:" & col & lastRow) 'set your range here
        For Each cell In rngCol.Cells
            If Not IsEmpty(cell.Value) Then
                If Application.CountIf(rngCol, cell.Value) > 1 Then
                    'collect the row instead of deleting it inside the loop
                    If delRows Is Nothing Then
                        Set delRows = cell.EntireRow
                    Else
                        Set delRows = Union(delRows, cell.EntireRow)
                    End If
                End If
            End If
        Next cell
    Next col

    'delete all flagged rows in one go
    If Not delRows Is Nothing Then delRows.Delete
End Sub
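For the goal stated in the question (dropping the O row when a userid/taskid pair occurs more than once), a minimal sketch might look like the following. It assumes the data sits in columns A:C of the active sheet with headers in row 1. The macro name DeleteDuplicateO and the "|" separator are illustrative choices, not part of the original posts. Try it on a copy of the workbook first, since row deletions made by a macro cannot be undone.

```vba
Sub DeleteDuplicateO()
    ' Sketch only. Assumptions: userid, taskid, taskcode are in columns A:C
    ' of the active sheet, and row 1 holds the headers.
    Dim ws As Worksheet
    Dim counts As Object            ' late-bound Scripting.Dictionary
    Dim data As Variant
    Dim lastRow As Long, r As Long
    Dim key As String

    Set ws = ActiveSheet
    Set counts = CreateObject("Scripting.Dictionary")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow < 3 Then Exit Sub    ' fewer than two data rows: nothing to compare
    data = ws.Range("A2:C" & lastRow).Value

    ' Pass 1: count how often each userid/taskid pair occurs
    For r = 1 To UBound(data, 1)
        key = data(r, 1) & "|" & data(r, 2)
        counts(key) = counts(key) + 1
    Next r

    ' Pass 2: walk bottom-up and delete "O" rows whose pair occurs more than once
    For r = UBound(data, 1) To 1 Step -1
        key = data(r, 1) & "|" & data(r, 2)
        If counts(key) > 1 And data(r, 3) = "O" Then
            ws.Rows(r + 1).Delete   ' array row r is sheet row r + 1
            counts(key) = counts(key) - 1
        End If
    Next r
End Sub
```

Decrementing the count after each deletion means that if every row of a pair carries the taskcode O, one of them is kept. The comparison with "O" is case-sensitive under the default Option Compare Binary.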

Related

VBA Word table to Excel Paragraph and For Error

I have two problems:
I need to copy a table from a .docx file that has paragraph numbering in column A. The first row of the table is always merged (A-C). The table can have any number of rows but always follows the same format.
.docx table Ex:
A B C
|'title...'|
|1.| T | F |
|2.| F | T |
|3.| T | T |
I know this code looks at (2, 1), but it does not return the table numbering '1.'. It just returns (2, 1) as a blank cell.
Ideally it would return the value '1' (without the period).
When I run the full code it passes through to 'Next iCol' the first time and then errors at 'Cells(resultRow, iCol)...' with: "The requested member of the collection does not exist". I think it has something to do with the first row being merged, so Cell(1, 2) does not exist, but I am not sure of the solution.
CODE IN QUESTION:
ElseIf .Found = True Then
    For iRow = 1 To wrdDoc.Tables(3).Rows.Count
        For iCol = 1 To wrdDoc.Tables(3).Columns.Count
            Cells(resultRow, iCol) = WorksheetFunction.Clean(wrdDoc.Tables(3).Cell(iRow, iCol).Range.Text)
        Next iCol
        resultRow = resultRow + 1
    Next iRow
    resultRow = resultRow + 1
End If
The final results in Excel should match the .docx table without the column A period. If it is easier, the 'title' can just be placed in A1 with the rest of the table to follow.
A B C
|'title..'|
|1| T | F |
|2| F | T |
|3| T | T |
OR
A B C
|title| | |
|1 | T | F |
|2 | F | T |
|3 | T | T |
Thank you for your help and time.
If the first cell in each row is formatted as a numbered list, then you can read the value like this:
Dim r As Long
With wrdDoc.Tables(3)
    For r = 2 To .Rows.Count
        Debug.Print .Cell(r, 1).Range.ListFormat.ListValue
    Next r
End With
Or use ListFormat.ListString if the list uses (e.g.) A, B, C, ...
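Since the question asks for the number without its trailing period, a small helper along these lines could wrap ListString. This is a sketch: the function name ListNumberText is made up for illustration, and the parameter is declared As Object so it also works with late binding, on the assumption that a Word table cell is passed in.

```vba
Function ListNumberText(ByVal wordCell As Object) As String
    ' wordCell is assumed to be a Word table cell, e.g. wrdDoc.Tables(3).Cell(2, 1)
    Dim s As String
    s = wordCell.Range.ListFormat.ListString    ' e.g. "1." or "A."
    ' Drop a single trailing period if there is one
    If Right$(s, 1) = "." Then s = Left$(s, Len(s) - 1)
    ListNumberText = s
End Function
```

It could then be called from the copy loop, for example Cells(resultRow, 1) = ListNumberText(wrdDoc.Tables(3).Cell(iRow, 1)). For a cell that is not part of a list, ListString is empty, so the helper returns an empty string.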
Try something along the lines of:
Dim i As Long
With wrdDoc.Tables(3).Range
    For i = 1 To .Cells.Count
        If .Cells(i).RowIndex > iRow Then resultRow = resultRow + 1
        iRow = .Cells(i).RowIndex: iCol = .Cells(i).ColumnIndex
        ActiveSheet.Cells(resultRow, iCol) = WorksheetFunction.Clean(.Cells(i).Range.Text)
    Next
End With
Note the inclusion of a worksheet reference in the code - you may need to define that differently.
The code as posted works fine for me - The first output cell in Excel contains the first cell's text from the Word table. If that cell contains automatic numbering, though, the automatic number is not output. That is consistent with your own code.
The problem with trying to capture content that is included with automatic numbering is that such numbering is usually followed by a tab, which Excel will at best interpret as a column separator. For example:
wrdDoc.Tables(3).Range.Copy
xlWkSht.Cells(resultRow, 1).PasteSpecial xlPasteValues
resultRow = resultRow + wrdDoc.Tables(3).Range.Rows.Count

Excel How To Count Dynamic values for Duplicates

I'm having trouble finding a solution.
I want to count duplicates in a row that has 100 columns. I just want to count how many duplicates there are across the row.
For example,
1,2,3,1,4,9,2,9,1,4
I just want to see how many times the same set of numbers show up.
1 = 3
2 = 2
3 = 0
4 = 2
9 = 2
For example, 3 + 2 + 0 + 2 + 2 = 9
This row has 9 duplications, i.e. the same value is displayed more than once. However, the values are dynamic.
The VBA function below is a UDF, meaning it works like a normal Excel worksheet function but is designed to do precisely what you want. Install it in a standard code module.
Function CountDuplicates(Rng As Range) As Integer
    ' set a Reference to "Microsoft Scripting Runtime"
    Dim Fun As Integer                      ' function return value
    Dim Uniques As Scripting.Dictionary     ' list of occurrences
    Dim Arr As Variant                      ' array of all values
    Dim C As Long
    Set Uniques = CreateObject("Scripting.Dictionary")
    Arr = Rng.Value
    With Uniques
        For C = 1 To UBound(Arr, 2)
            If Not IsEmpty(Arr(1, C)) Then
                If .Exists(Arr(1, C)) Then
                    .Item(Arr(1, C)) = .Item(Arr(1, C)) + 1
                Else
                    .Add Arr(1, C), 0
                End If
            End If
        Next C
        For C = 0 To .Count - 1
            Fun = Fun + .Items(C)
        Next C
    End With
    CountDuplicates = Fun
End Function
A standard code module is one that you must add to your project. Its default name will be something like Module1, but you can change it to anything you like (names with invalid syntax will be rejected). Call the function from the worksheet by entering it in any cell, for example:
=CountDuplicates(A2:DD2)
This function will return the number of all duplicates counted in the defined range, excluding unique values. Look at the code. When an item is found for the first time a value of 0 is recorded against it. Thereafter, each time it is found again 1 is added to the number of recurrences already found. In the end all values will be added up to return the total count. This method ensures that all first occurrences will be counted as 0 (meaning not counted). Only repeats are included in the returned total.
As with other Excel functions, the result will appear in the cell containing the formula. You can copy that formula down as you do with any other, meaning the original above must be in row 2. If you paste it elsewhere consider the use of absolute addressing to define the action range.
If you have O365 with the UNIQUE function, you can use:
=COUNT(A1:J1)-COUNT(UNIQUE(A1:J1,TRUE,TRUE))
Another way
=COUNT(A1:J1)-SUMPRODUCT(--(FREQUENCY(A1:J1,A1:J1)=1))
or
=SUMPRODUCT(--(COUNTIF(A1:J1,A1:J1)>1))
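If a UDF is preferred over a formula, the COUNTIF idea can be sketched in VBA as below. It counts every cell whose value occurs more than once in the range, so for the sample row 1,2,3,1,4,9,2,9,1,4 it should return 9. The function name CountRepeatedCells is illustrative, and the sketch assumes plain numeric values (COUNTIF treats text containing wildcards or comparison operators specially).

```vba
Function CountRepeatedCells(Rng As Range) As Long
    ' Counts the cells in Rng whose value appears more than once in Rng.
    ' Assumption: cells hold plain numbers, as in the question's example.
    Dim cell As Range
    For Each cell In Rng.Cells
        If Not IsEmpty(cell.Value) Then
            If Application.WorksheetFunction.CountIf(Rng, cell.Value) > 1 Then
                CountRepeatedCells = CountRepeatedCells + 1
            End If
        End If
    Next cell
End Function
```

Called from the sheet as =CountRepeatedCells(A1:J1). Each cell triggers one COUNTIF over the range, which is fine for 100 columns but would be slow on very large ranges.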

Application or User Defined Error

I have the following code in one of my workbooks. Basically, there are two sheets, one of which holds a matrix where the top row is names and the left column is dates. As of now, there are 735 rows (or dates) in the "attendance" spreadsheet, and around 80 names of individuals. It's supposed to track people's attendance.
I need to find out how much each person works per week. The code below attempts to do the following:
For each individual, scan the first period of 7 days (or 7 rows). Sum up that period, and place the value in a dictionary at an incremented item. So for instance, for 700 days, the dictionary will hold 100 items. Do this only if the value to be added to the dictionary is greater than 0 (i.e. they worked that week).
Take this dictionary, and sum up the value of all the items in the dictionary, then divide it by the count of that dictionary, to get an average work week during a 7 day block.
Then, place the value of that dictionary into another dictionary, where the individuals name (top row in attendance sheet) is the key, and the value of the previous dictionary is the item.
In the "Summary Sheet", place the item associated with each key in the 9th column. For instance -
Individual | Weekly Hours
John | 20
Jane | 15
Joe | 12
The hope is to be able to derive the amount of work each person puts in per week (not per 7-unit block) directly from attendance data that is entered into the spreadsheet.
The error occurs at the following line:
For k = 2 To attendanceSheet.Range("a1").End(xlRight).Row
Excel says "Application or user defined error: #1004"
Also, any assistance with optimization would be appreciated as this seems to be rather bulky code.
Public Sub calculateAverageWeek()
    Dim i As Long
    Dim attendanceSheet As Worksheet
    Set attendanceSheet = ActiveWorkbook.Worksheets("Attendance")
    'calculate week block
    Dim lastRow As Long
    lastRow = attendanceSheet.Range("a1").End(xlDown).Row
    Dim indivName As Dictionary
    Set indivName = New Dictionary
    Dim k As Long
    For k = 2 To attendanceSheet.Range("a1").End(xlRight).Row
        Dim total As Long
        Dim v As Variant
        Dim totalWeeklyHours As Dictionary
        Set totalWeeklyHours = New Dictionary
        Dim j As Long
        j = 1
        Dim curTotal As Double
        curTotal = 0
        'scan attendance worksheet
        For i = 2 To lastRow
            curTotal = curTotal + attendanceSheet.Cells(i, 2)
            If (i - 1) Mod 7 = 0 Then
                If curTotal > 0 Then
                    totalWeeklyHours.Add j, curTotal
                    j = j + 1
                    curTotal = 0
                Else
                End If
            End If
            If i = lastRow Then
                For Each v In totalWeeklyHours
                    total = total + totalWeeklyHours.Item(v)
                Next
                ' Worksheets("Summary").Cells(2, 9) = CLng(total / totalWeeklyHours.Count)
                indivName.Add attendanceSheet.Cells(k, 1), attendanceSheet.Cells(k, CLng(total / totalWeeklyHours.Count))
            End If
        Next i
    Next k
    For i = 2 To Worksheets("Summary").Range("A2").End(xlDown).Row
        Worksheets("Summary").Cells(i, 9) = indivName.Item(Cells(i, 1))
    Next i
End Sub
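One likely cause of the 1004 error is the argument passed to End: Range.End expects an XlDirection value (xlUp, xlDown, xlToLeft, xlToRight), and xlRight is a different constant. Since names run across the top row, the loop bound would also need .Column rather than .Row. The sketch below is one possible simplification, not a confirmed fix from the thread. It assumes dates in column A from row 2 and names in row 1 from column B, counts only complete 7-row blocks as the original does, and prints to the Immediate window instead of writing to the Summary sheet.

```vba
Public Sub CalculateAverageWeekSketch()
    ' Assumed layout: dates in column A (row 2 down), names in row 1 (column B across)
    Dim ws As Worksheet
    Dim lastRow As Long, lastCol As Long
    Dim col As Long, r As Long
    Dim v As Variant
    Dim weekTotal As Double, grandTotal As Double
    Dim weeksWorked As Long

    Set ws = ActiveWorkbook.Worksheets("Attendance")
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    lastCol = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column

    For col = 2 To lastCol                  ' one pass per person
        weekTotal = 0: grandTotal = 0: weeksWorked = 0
        For r = 2 To lastRow
            v = ws.Cells(r, col).Value
            If IsNumeric(v) Then weekTotal = weekTotal + v
            If (r - 1) Mod 7 = 0 Then       ' end of a 7-row block
                If weekTotal > 0 Then       ' only count weeks that were worked
                    grandTotal = grandTotal + weekTotal
                    weeksWorked = weeksWorked + 1
                End If
                weekTotal = 0
            End If
        Next r
        If weeksWorked > 0 Then
            Debug.Print ws.Cells(1, col).Value, grandTotal / weeksWorked
        End If
    Next col
End Sub
```

Running totals replace the two dictionaries here because only the sum and the count of worked weeks are needed for the average.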

Collect and subtotal duplicate rows in a Variant 2D array

I have a dataset in Excel consisting of 20 columns and a varying number of rows, ranging from 20,000 - 50,000.
Each row is a collection of items with one column denoting the quantity of items in the collection and another column denoting the total combined weight of the collection.
Some of these rows are completely identical in all columns, while some are identical in all but quantity and weight.
I want to create a macro that runs through the dataset and "stacks" rows that are duplicate on all other parameters than quantity and weight, and sums these two up.
In other words, a macro that converts this:
|Param1|Param2|...|Param18|Quantity|Weight|
| A | 1 |...| C | 5 | 12.5 |
| A | 1 |...| C | 2 | 5.0 |
| A | 1 |...| C | 3 | 7.5 |
| B | 2 |...| C | 1 | 2.3 |
| B | 2 |...| C | 2 | 4.6 |
To this:
|Param1|Param2|...|Param18|Quantity|Weight|
| A | 1 |...| C | 10 | 25.0 |
| B | 2 |...| C | 3 | 6.9 |
I know this is possible to do in a simple pivot table, but for a number of reasons this is not viable in this case.
Since I'm dealing with a large dataset, I want to load it all into memory at once rather than reading and writing line by line to speed up performance (as suggested in tip #13 in this great article http://www.databison.com/how-to-speed-up-calculation-and-improve-performance-of-excel-and-vba/). However I'm stuck as to how to make the row operations on the data stored in memory.
So far my code looks like this:
Dim r, c, LastRow As Integer
Dim temp_range As Variant
LastRow = Cells(65536, 2).End(xlUp).Row
'Load the data set into memory
temp_range = Sheets("1.1").Range(Sheets("1.1").Cells(2, 1), Sheets("1.1").Cells(LastRow, 20)).Value
'Run through the data set from bottom to top and bulk identical rows together
For r = UBound(temp_range) To LBound(temp_range)
    For i = r - 1 To LBound(temp_range)
        'PSEUDO CODE START HERE
        If row temp_range(r) = row temp_range(i) Then
            temp_range(i,19) = temp_range(r,19) + temp_range(i,19)
            temp_range(i,20) = temp_range(r,19) + temp_range(i,20)
            Delete row temp_range(r)
            Exit For
        End if
        'PSEUDO CODE END HERE
    Next i
Next r
I'm stuck at the pseudo-code section highlighted in the code. I simply don't know how to compare the rows, copy quantity and weight from one row to another, and then delete the duplicate row in the variant holding the range in memory.
Use Microsoft Query (SQL) in Excel:
Such data manipulation operations are ideal for SQL queries. There is no need to go through the data line by line using VBA:
SELECT S1.Param1, S1.Param2, S1.Param18, SUM(S1.Quantity), SUM(S1.Weight)
FROM [Sheet1$] AS S1 GROUP BY Param1, Param2,Param18
This should run quickly and efficiently via OLE DB. Equivalent VBA code will usually be less efficient.
To refresh the query at any time simply run the following code from VBA:
Set ws = ActiveSheet
ws.QueryTables(1).Refresh BackgroundQuery:=False
Here ws is the worksheet that contains the query table.
Either use the Microsoft Query from Data->From other sources->From Microsoft Query or feel free to use my Add-In: http://www.analystcave.com/excel-tools/excel-sql-add-in-free/
See if you can F8 through this. You will need to go into the VBE's Tools ► References and add Microsoft Scripting Runtime for the use of the Scripting.Dictionary object.
Sub Stack_Dupes()
    Dim r As Long, c As Long, v As Long, k As Long
    Dim vKEYs As Variant, vITMs As Variant, vTMP1 As Variant, vTMP2 As Variant
    Dim sKEY As String, sITM As String
    Dim dITMs As New Scripting.dictionary
    With ActiveSheet
        With .Cells(1, 1).CurrentRegion
            .Rows(1).Cells.Copy _
                Destination:=.Cells(.Rows.Count + 3, 1)
            With .Offset(1, 0).Resize(.Rows.Count - 1, .Columns.Count)
                vKEYs = .Cells.Resize(.Rows.Count, .Columns.Count - 2).Value2
                vITMs = .Cells.Offset(0, .Columns.Count - 2).Resize(.Rows.Count, 2).Value2
                For r = LBound(vKEYs, 1) To UBound(vKEYs, 1)
                    sKEY = Join(Application.Index(vKEYs, r, 0), ChrW(8203))
                    sITM = Join(Application.Index(vITMs, r, 0), ChrW(8203))
                    If dITMs.Exists(sKEY) Then
                        vTMP1 = Split(dITMs.Item(sKEY), ChrW(8203))
                        vTMP2 = Split(sITM, ChrW(8203))
                        vTMP1(0) = CDbl(vTMP1(0)) + CDbl(vTMP2(0))
                        vTMP1(1) = CDbl(vTMP1(1)) + CDbl(vTMP2(1))
                        sITM = Join(vTMP1, ChrW(8203))
                        dITMs.Item(sKEY) = sITM
                    Else
                        dITMs.Add Key:=sKEY, Item:=sITM
                    End If
                Next r
                ReDim vKEYs(1 To (.Columns.Count - 2), 1 To dITMs.Count)
                ReDim vITMs(1 To 2, 1 To dITMs.Count)
                For Each vTMP1 In dITMs.Keys
                    k = k + 1
                    vTMP2 = Split(vTMP1, ChrW(8203))
                    For v = LBound(vTMP2) To UBound(vTMP2)
                        vKEYs(v + 1, k) = vTMP2(v)
                    Next v
                    vTMP2 = Split(dITMs.Item(vTMP1), ChrW(8203))
                    For v = LBound(vTMP2) To UBound(vTMP2)
                        vITMs(v + 1, k) = vTMP2(v)
                    Next v
                Next vTMP1
                .Cells.Offset(.Rows.Count + 3, 0).Resize(UBound(vKEYs, 2), .Columns.Count - 2) = Application.Transpose(vKEYs)
                .Cells.Offset(.Rows.Count + 3, .Columns.Count - 2).Resize(UBound(vITMs, 2), 2) = Application.Transpose(vITMs)
            End With
        End With
    End With
    dITMs.RemoveAll: Set dITMs = Nothing
End Sub
Results are written underneath the sample data.
I would be interested in the timed results on larger data sets if you find time to post them back here.
I found this article (http://sitestory.dk/excel_vba/arrays-and-ranges.htm, go to middle of the page) on how to delete identical rows that I built some code on. It doesn't solve my original question 100% since it doesn't compare entire rows in the array but rather each column separately in each row, but it turned out to have quite good performance nonetheless.
Here's the code:
'I couldn't get the final step of pasting the output array into the new sheet working properly without declaring this option; otherwise the data would be pasted one cell to the right and below where I wanted it.
Option Base 1
'Long rather than Integer, since the data set can exceed 32,767 rows
Dim r As Long, i As Long, c As Long, LastRow As Long, DeletedRows As Long
Dim input_array As Variant, output_array As Variant
Dim identical As Boolean
Dim s As Worksheet
Dim NewRange As Range
LastRow = Cells(65536, 2).End(xlUp).Row
'Load the data set into memory, consisting of 20 columns of data and a 21st column with no data that is used for marking rows for deletion.
input_array = Sheets("1.1").Range(Sheets("1.1").Cells(2, 1), Sheets("1.1").Cells(LastRow, 21)).Value
DeletedRows = 0
'Run through the data set from bottom to top comparing rows one at a time, copy Quantity and Weight values and mark rows for deletion
For r = UBound(input_array) To 2 Step -1
    For i = r - 1 To 2 Step -1
        'Assume row r and i are identical
        identical = True
        'Run through columns of r and i, flag if non-identical value is found and stop the loop (col 18 is ignored, since this is the "Quantity" column, while col 20 is the "Weight" column)
        For c = 1 To 18
            If input_array(r, c) <> input_array(i, c) And c <> 18 Then
                identical = False
                Exit For
            End If
        Next c
        'If no non-identical columns were found, add "Quantity" and "Weight" from row r to row i and mark row r for deletion
        If identical Then
            input_array(i, 18) = input_array(i, 18) + input_array(r, 18)
            input_array(i, 20) = input_array(i, 20) + input_array(r, 20)
            input_array(r, 21) = "_DELETE_"
            DeletedRows = DeletedRows + 1
            Exit For
        End If
    Next i
Next r
'Resize the new array to the size of the old array minus the number of deleted rows
ReDim output_array(UBound(input_array) - DeletedRows, 20)
'Copy rows not marked as deleted from old array to new array
i = 1
For r = 1 To UBound(input_array)
    If input_array(r, 21) <> "_DELETE_" Then
        For c = 1 To 20
            output_array(i, c) = input_array(r, c)
        Next c
        i = i + 1
    End If
Next r
'Create a new sheet and write the output array to it
Set s = Sheets.Add
Set NewRange = s.Range("A2").Resize(UBound(output_array), 20)
NewRange = output_array
The macro takes about 30 seconds to reduce 20,000 rows to 3,000 on my computer, which I think is quite good considering the large amounts of data.
This can be done in milliseconds, not seconds. I'm also using arrays and a dictionary object, but in a simpler implementation. It's faster than running SQL against the sheet. It can compare any number of columns; just make sure they are all included in the KeyIn variable as a concatenated string. I also simplify the function by assuming the value to sum is in column 4; you can adjust the code for multiple values in other columns.
I'm writing from one array to another (InAy to OutAy), and the dictionary determines whether a row already exists. The magic happens in the dictionary's Item property: when a new OutAy row is written, I store its row number (r) as the item. When the key already exists, I retrieve that row number with d.Item(KeyIn) and update OutAy(r, 4) with the sum of the existing value and the new value, KeyVal.
This solves the same problem as the SQL aggregate query "Select a, b, c, sum(d) from data group by a, b, c".
Note: add a reference to Microsoft Scripting Runtime (Tools -> References).
Sub somesub()
    ...
    data = Range("WhereYourDataIs") 'create data array
    Range("WhereYourDataIs").Clear 'assumes you'll output to same location
    data = RemoveDupes(data) 'remove dupes and sum values
    Range("A2").Resize(UBound(data), UBound(data, 2)) = data 'A2 assumes your data headers begin in row 1, column 1
    ...
End Sub
Function RemoveDupes(InAy As Variant) As Variant
    Dim d As Scripting.Dictionary
    Dim OutAy() As Variant
    Dim r As Long, i As Long, c As Long
    Dim KeyIn As String, KeyVal As Variant
    Set d = New Scripting.Dictionary
    ReDim OutAy(1 To UBound(InAy), 1 To 4)
    r = 1
    For i = 1 To UBound(InAy)
        KeyIn = ""
        KeyVal = InAy(i, 4) 'the value field to sum/aggregate if exists
        For c = 1 To 3 'a, b, c metadata to roll up
            KeyIn = KeyIn & InAy(i, c) & "|" 'separator stops "ab" & "c" colliding with "a" & "bc"
        Next c
        If d.Exists(KeyIn) Then
            OutAy(d.Item(KeyIn), 4) = OutAy(d.Item(KeyIn), 4) + KeyVal 'd.Item(KeyIn) is r, set when OutAy row was added. Same as OutAy(r,4)=OutAy(r,4) + KeyVal
        Else
            d.Add KeyIn, r 'r is set as the item value referencing the row of the OutAy when it was first added. The reference is used when .Exists is true
            For c = 1 To 4
                OutAy(r, c) = InAy(i, c)
            Next c
            r = r + 1
        End If
    Next i
    RemoveDupes = OutAy
End Function
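To apply the same dictionary idea to the layout in the question (18 key columns followed by Quantity and Weight), the function can be generalized roughly as follows. This is a sketch under assumptions: the name StackRows and the nKeyCols parameter are illustrative, the key columns are taken to be the leftmost ones, every remaining column is summed, and the dictionary is late bound so no reference is needed.

```vba
Function StackRows(InAy As Variant, ByVal nKeyCols As Long) As Variant
    ' Columns 1..nKeyCols form the grouping key; all later columns are summed.
    Dim d As Object                     ' late-bound Scripting.Dictionary
    Dim OutAy() As Variant
    Dim i As Long, c As Long, r As Long
    Dim nRows As Long, nCols As Long
    Dim key As String

    Set d = CreateObject("Scripting.Dictionary")
    nCols = UBound(InAy, 2)
    ReDim OutAy(1 To UBound(InAy, 1), 1 To nCols)

    For i = 1 To UBound(InAy, 1)
        ' Build the key with a separator that is unlikely to occur in the data
        key = ""
        For c = 1 To nKeyCols
            key = key & InAy(i, c) & vbNullChar
        Next c
        If d.Exists(key) Then
            r = d(key)                  ' output row where this key was first written
            For c = nKeyCols + 1 To nCols
                OutAy(r, c) = OutAy(r, c) + InAy(i, c)
            Next c
        Else
            nRows = nRows + 1
            d.Add key, nRows
            For c = 1 To nCols
                OutAy(nRows, c) = InAy(i, c)
            Next c
        End If
    Next i
    StackRows = OutAy
End Function
```

For the 20-column data set this would be called as data = StackRows(data, 18). The returned array keeps the input's row count, so the rows after the last unique key are empty.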

Split Values In Column & Put On New Row With Same Number In Another Column

I have a spreadsheet with two columns: PRODUCT ID and CATEGORIES. There is only one number in the PRODUCT ID column but in the CATEGORIES column there can be multiple numbers separated by commas. What I need is for the numbers separated by commas to go on to new rows but keep the same PRODUCT ID. Here's an example;
What it looks like now:
PRODUCT ID | CATEGORIES
1 | 76,81
2 | 76,78
3 | 76,80
What I need:
PRODUCT ID | CATEGORIES
1 | 76
1 | 81
2 | 76
2 | 78
3 | 76
3 | 80
How can I do this in Excel? I need a formula to do this. I have about 6,000 products, so doing this manually is not an option.
Your help would be appreciated.
You can use this:
Sub test()
    Dim c As Range
    Dim ID As Variant
    Dim arrCategories() As String
    Dim d As Range
    Set d = Range("D1")
    Dim i As Long
    For Each c In Range(Range("A1"), Range("A1").End(xlDown))
        ID = c.Value
        arrCategories() = Split(c.Offset(0, 1).Text, ",")
        For i = LBound(arrCategories) To UBound(arrCategories)
            d.Value = ID
            d.Offset(0, 1).Value = arrCategories(i)
            Set d = d.Offset(1, 0)
        Next i
    Next c
End Sub
The input data must be in columns A:B starting in row 1.
The output will be written in columns D:E starting in row 1.
Best regards,
Simon
