Keeping first two instances in a column of duplicates in Excel

I have a long list of items in which one column holds identification numbers, some of which are duplicated. The records are not duplicates across the whole spreadsheet. For each duplicated number, I want to extract the first two rows when the data is sorted by a different value (time/date).
I've seen topics on keeping the first instance of duplicate items, but not on keeping the first two instances. I'm looking for a formula or VBA.
Thanks

First sort your records so that the ones you want to keep are higher up the column.
Add a column where you'll put the formula (I'm assuming the first ID number is in cell A1):
=COUNTIF($A$1:A1, A1)
Fill the formula down to the bottom of the table, then copy and paste as values in place to remove the formulas.
Add a filter and show only the results 1 and 2 to get the first two instances of each ID number. Copy the filtered rows to a fresh sheet to keep only those.
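
The running count that the COUNTIF formula builds can also be sketched outside Excel. The Python below is only an illustration of the logic, not part of the spreadsheet solution: the function name keep_first_n and the sample rows are hypothetical, and it assumes the rows are already sorted so that the ones to keep come first.

```python
from collections import defaultdict

def keep_first_n(rows, key_index=0, n=2):
    """Return the rows whose key has been seen at most n times so far."""
    seen = defaultdict(int)  # running count per ID, like COUNTIF($A$1:A1, A1)
    kept = []
    for row in rows:
        seen[row[key_index]] += 1
        if seen[row[key_index]] <= n:
            kept.append(row)
    return kept

# Hypothetical sample data: (ID, date), already sorted by date within each ID
rows = [
    ("A17", "2020-01-01"),
    ("A17", "2020-01-02"),
    ("A17", "2020-01-03"),
    ("B42", "2020-01-01"),
]
first_two = keep_first_n(rows)
```

As with the filter on 1 and 2, an ID that occurs only once is kept as well, because its running count never exceeds 1.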

Here is a subroutine that should do what you are asking. You will need to adapt it to your data: it assumes that columns A to G hold the data you want to extract, that column A holds the duplicated values, that column B holds the value you want to sort by, and that column A has no empty cells within the data.
Sub SortAndExtract()
    Dim wsInputWorksheet As Worksheet
    Dim wsOutputWorksheet As Worksheet
    Dim lInputRowNumber As Long
    Dim lOutputRowNumber As Long
    Dim sLastExtract As Variant 'A variant as I don't know what type of value you are looking for
    Dim iColumnCounter As Integer
    Set wsInputWorksheet = ThisWorkbook.ActiveSheet
    'Sort the worksheet, assumes that the columns are in the range A:G and that you
    'want to sort according to column A and then column B
    wsInputWorksheet.Range("A:G").Sort Key1:=wsInputWorksheet.Range("A1"), Order1:=xlAscending, _
        Key2:=wsInputWorksheet.Range("B1"), Order2:=xlAscending, Header:=xlGuess, _
        OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
        DataOption1:=xlSortNormal
    Set wsOutputWorksheet = ThisWorkbook.Worksheets.Add
    lInputRowNumber = 1
    lOutputRowNumber = 1
    'Until an empty cell is found check for duplicate values in column A
    'Assumes that you don't have empty cells in column A within your data
    'and that the duplicate values are in column A
    Do While wsInputWorksheet.Cells(lInputRowNumber, 1).Value <> Empty
        'Skip values whose first two instances have already been extracted
        If wsInputWorksheet.Cells(lInputRowNumber, 1).Value <> sLastExtract Then
            If wsInputWorksheet.Cells(lInputRowNumber, 1).Value = wsInputWorksheet.Cells(lInputRowNumber + 1, 1).Value Then
                For iColumnCounter = 1 To 7 'Assuming again that column G (column 7) is the last column
                    'copy cells to output worksheet
                    wsOutputWorksheet.Cells(lOutputRowNumber, iColumnCounter).Value = _
                        wsInputWorksheet.Cells(lInputRowNumber, iColumnCounter).Value
                    wsOutputWorksheet.Cells(lOutputRowNumber + 1, iColumnCounter).Value = _
                        wsInputWorksheet.Cells(lInputRowNumber + 1, iColumnCounter).Value
                Next iColumnCounter
                'Remember the extracted value so a third or later instance is not copied
                sLastExtract = wsInputWorksheet.Cells(lInputRowNumber, 1).Value
                lInputRowNumber = lInputRowNumber + 1 'Will be incremented again later
                lOutputRowNumber = lOutputRowNumber + 2
            End If
        End If
        lInputRowNumber = lInputRowNumber + 1
    Loop
End Sub

Related

Sorting places my data with empty cells above it

I have written a bunch of VBA macros to get my data formatted how I need it, and the last step is to sort by a new column I have generated, in ascending order. However, when I sort by the new column, all the empty cells end up above my newly generated data. I think the empty cells are read as 0 and therefore sort above any alphanumeric data. This happens because of the UDF I use for sorting the data. I need to fill the new column with the UDF for each row of data, but I don't know how to define the range in the new column.
I am close to solving this but would love some help.
Essentially what I have tried for placing the data in a new column works, but the way I have set the range is placing it in a bad spot and it can easily be sorted in the wrong order now. I include all of my code, but the issue is in the last portion of it where I am setting a range to place the new data.
I think what is happening is when I set my range from C3-C2000 and populate it, the remaining empty cells are now included in my sort and give me "lower" numbers when I sort it ascending. Thus all the empty cells are ranked higher up in the column.
Option Explicit
Sub ContractilityData()
    Dim varMyItem As Variant
    Dim lngMyOffset As Long, _
        lngStartRow As Long, _
        lngEndRow As Long
    Dim strMyCol As String
    Dim rngCell As Range
    Columns("B:B").Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove 'make new column for the data to go
    lngStartRow = 3 'Starting row number for the data. Change to suit
    strMyCol = "A" 'Column containing the data. Change to suit.
    Application.ScreenUpdating = False
    For Each rngCell In Range(strMyCol & lngStartRow & ":" & strMyCol & Cells(Rows.Count, strMyCol).End(xlUp).Row)
        lngMyOffset = 0
        For Each varMyItem In Split(rngCell.Value, "_") 'put delimiter you want in ""
            If lngMyOffset = 2 Then 'Picks which chunk you want printed out (each chunk is set by a _ currently)
                rngCell.Offset(0, 1).Value = varMyItem
            End If
            lngMyOffset = lngMyOffset + 1
        Next varMyItem
    Next rngCell
    Application.ScreenUpdating = True
    'Here is where my problem arises
    Range("C:C").EntireColumn.Insert
    Dim sel As Range
    Set sel = Range("C3:C2000")
    sel.Formula = "=PadNums(B3,3)"
    MsgBox "Data Cleaned"
End Sub
What I would like instead is a way to insert a new column, then have my UDF "PadNums" populate each cell up to the last cell of the previous column, essentially re-naming all my data from the previous column. I can then sort by the new column in ascending order and my data is in the correct order.
I think perhaps what I should do is copy column B into my newly inserted column C, then use some sort of last row function to apply the formula in all cells. That would give me the appropriate range always based on my original column?

I solved this! What I did was use range and xlDown to last row on column B, then pasted it to C, then inserted my UDF into C using the xlDown range!

Validate cell value in one column based on cell value in another column

I want to validate three columns in Excel.
If I enter a value in one column and leave the other two empty, it should raise an error for those two columns.
If all three columns are left empty, it should not raise any error.
Here is my code:
rowToValidate = ActiveCell.Row
colToValidate = ActiveCell.Column + 1
Dim celAdd As String
celAdd = rowToValidate + colToValidate

There are different ways to do this, but here's one approach, assuming the columns are side by side. You can adjust the Offset as necessary.
Sub Test()
    If Not IsEmpty(ActiveCell.Value) And _
        IsEmpty(ActiveCell.Offset(, 1).Value) And _
        IsEmpty(ActiveCell.Offset(, 2).Value) Then
        ' Do whatever you wanted to do
    End If
End Sub

Highlighting differences between duplicates in VBA

Hi, I have a spreadsheet with the following columns:
Transaction_ID counter State File_Date Date_of_Service Claim_Status NDC_9 Drug_Name Manufacturer Quantity Original_Patient_Pay_Amount Patient_Out_of_Pocket eVoucher_Amount WAC_per_Unit__most_recent_ RelayHealth_Admin_Fee Total_Voucher_Charge Raw_File_Name
There are duplicate transaction IDs here. Is there VBA that would highlight the differences between two rows? Some rows may share a Transaction ID but differ in other fields, so they aren't truly duplicates, and I would like to see which information is different.
thanks!

Excel's duplicate values conditional format should suffice for this. The problem is that it only works well on a single column.
"So there may be data with the same Transaction ID but I want to highlight where they may have other fields that are different, therefore they aren't truly duplicates"
So instead of tracking duplicates in the Transaction ID column alone, you can try adding a new column and, in that new column, concatenating all the columns whose combined values should be unique. Then run Excel's duplicate values conditional format on that column.
For example, if the combination of [Transaction_ID], [File_Date] and [NDC_9] should be unique, make a new column that combines the [Transaction_ID], [File_Date] and [NDC_9] values. Assuming your data is in an actual table, you could have a table formula like so:
=[@Transaction_ID]&[@File_Date]&[@NDC_9]
"and would like to see what information is different."
You can then filter the dupes in that column and, looking at the other columns, see how they differ. It's not really possible to be any more specific than that with the way you've worded your question...
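
The combined-key idea can be sketched outside Excel as well. The Python below is an illustration only: the function name flag_duplicates and the sample records are hypothetical. It uses a tuple as the key instead of a concatenated string, which avoids false matches such as "1" & "23" versus "12" & "3".

```python
from collections import Counter

def flag_duplicates(records, key_fields):
    """Return one boolean per record: True if its composite key occurs more than once."""
    keys = [tuple(record[field] for field in key_fields) for record in records]
    counts = Counter(keys)
    return [counts[key] > 1 for key in keys]

# Hypothetical sample records using three of the column names from the question
records = [
    {"Transaction_ID": "T1", "File_Date": "2019-05-01", "NDC_9": "000010001"},
    {"Transaction_ID": "T1", "File_Date": "2019-05-01", "NDC_9": "000010001"},
    {"Transaction_ID": "T1", "File_Date": "2019-06-01", "NDC_9": "000010001"},
]
flags = flag_duplicates(records, ["Transaction_ID", "File_Date", "NDC_9"])
```

In this sample the third record shares the Transaction_ID but has a different File_Date, so it is not flagged as a true duplicate.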

Assuming:
It's an unsorted dataset
column 1 contains the repeatable ID
the first row contains headers
...the following code (in the sheet's module) will turn yellow any cell whose value is unique for the ID that appears in the leftmost column...
Option Explicit
Public Sub HighlightUniqueValues()
    Dim r As Long, c As Long 'row and column counters
    Dim LastCol As Long, LastRow As Long 'right-most and bottom-most column and row
    Dim ColLetter As String
    Dim RepeatValues As Long
    'get right-most used column
    LastCol = Me.Cells(1, Me.Columns.Count).End(xlToLeft).Column
    'get bottom-most used row
    LastRow = Me.Cells(Me.Rows.Count, "A").End(xlUp).Row
    'assume first column has the main ID
    For r = 2 To LastRow 'skip the top row, which presumably holds the column headers
        For c = 2 To LastCol 'skip the left-most column, which should contain the ID
            'Get column letter
            ColLetter = Split(Cells(1, c).Address(True, False), "$")(0)
            'Count the number of repeat values in the current
            'column associated with the same value in the
            'left-most column
            RepeatValues = WorksheetFunction.CountIfs(Range("A:A"), Range("A" & r), Range(ColLetter & ":" & ColLetter), Range(ColLetter & r))
            'If there is only one instance, then it's a lone
            'value (unique for that ID) and should be highlighted
            If RepeatValues = 1 Then
                Range(ColLetter & r).Interior.ColorIndex = 6 'yellow background
            Else
                Range(ColLetter & r).Interior.ColorIndex = 0 'white background
            End If
        Next c
    Next r
End Sub

Macro to insert blank cells below if value >1 and copy/paste values from cell above

This site already has something similar: Copy and insert rows based off of values in a column
but the code doesn't take me quite where I need to go, and I haven't been able to tweak it to make it work for me.
My user has a worksheet with 4 columns, A-D. Column A contains specific contract numbers, column B is blank, column C has part numbers, and column D has the entire range of contract numbers. My user wants to count how many times each contract number appears in the entire range, so I entered the formula =countif($D$2:$D$100000,A2) in cell E2 and copied it down, giving me the number of times the specific contract in column A appears in column D. The counts range from 1 to 11 in this workbook, but they may be higher in other workbooks this method will be used in.
The next thing I need to do is insert blank cells below every value in column E that is greater than 1, very much like the example in the previously asked question. I then also need to copy the cell in the same row of column A and insert the copies in the matching rows below it. Example: cell E21 has the number 5, so I need to shift cells in column E only so that there are 4 blank cells directly below it. In column A, I need to copy cell A21 and insert the copies in the four rows directly below.
Just trying to get the blank cells to insert has been a trial, using the code as given in the previous question.
Dim sh As Worksheet
Dim lo As ListObject
Dim rColumn As Range
Dim i As Long
Dim rws As Long
Set sh = ActiveSheet
Set lo = sh.ListObjects("Count")
Set rColumn = lo.ListColumns("Count").DataBodyRange
vTable = rColumn.Value
For i = rColumn.Rows.Count To 1 Step -1
    If rColumn.Cells(i, 1) > 1 Then
        rws = rColumn.Cells(i, 1) - 1
        With rColumn.Rows(i)
            .Offset(1, 0).Resize(rws, 1).Cells.Insert
            .EntireRow.Copy .Offset(1, 0).Resize(rws, 1).Cells
            .Offset(1, 0).Resize(rws, 1).EntireRow.Font.Strikethrough = True
        End With
    End If
Next
I would be very grateful for any help as I have been fighting with this monster for a week.

While this is indeed possible to do, it might be a good idea to look into moving the list of all contract numbers from column D to a different sheet. Even though it is quite simple to loop through a range and insert rows based on cell values - it'll also create holes in columns D and E.
Here's code for simply adding the rows and copying the values as you specified.
Sub Main()
    '---Variables---
    Dim source As Worksheet
    Dim startRow As Integer
    Dim num As Integer
    Dim val As String
    Dim i As Long
    '---Customize---
    Set source = ThisWorkbook.Sheets(1) 'The sheet with the data
    startRow = 2 'The first row containing data
    '---Logic---
    i = startRow 'i acts as a row counter
    Do While i <= source.Range("E" & source.Rows.Count).End(xlUp).Row
        'looping until we hit the last row with a value in column E
        num = source.Range("E" & i).Value 'Get number of appearances
        val = source.Range("A" & i).Value 'Get the value
        If num > 1 Then 'Number of appearances > 1
            Do While num > 1 'Create rows
                source.Range("A" & i + 1).EntireRow.Insert 'Insert row
                source.Range("A" & i + 1) = val 'Set value
                num = num - 1
                i = i + 1 'Next row
            Loop
        End If
        i = i + 1 'Next row
    Loop
End Sub
Of course you could also remove the holes from column D after inserting the new rows and modify the formula in column E so that it remains copyable and doesn't calculate for the copied rows.
Generally it makes things easier if a single row can be thought of as a single object, as creating or deleting a row only affects that one single object. Here we have one row represent both a specific contract and a contract in the all contracts list - this could end up causing trouble later on (or it could be totally fine!)
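
The row expansion the macro performs can be summarised in a few lines of Python. This is a sketch of the logic only, under the assumption that each input pair is (contract number from column A, count from column E). The function name expand_by_count and the sample contract numbers are hypothetical.

```python
def expand_by_count(pairs):
    """Repeat each value `count` times; values with a count of 1 or less stay as one row."""
    expanded = []
    for value, count in pairs:
        # The macro only adds rows when the count exceeds 1, so every value keeps at least one row
        expanded.extend([value] * max(count, 1))
    return expanded

# Hypothetical sample: contract number and the number of times it appears in the full list
pairs = [("C-100", 1), ("C-200", 3), ("C-300", 0)]
column_a = expand_by_count(pairs)
```

Building the expanded list separately, rather than inserting rows into the sheet that also holds the full contract list, sidesteps the holes in columns D and E mentioned above.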

Excel Macro. Remove Non-Duplicate Rows Based on Column

Trying to run a macro in Excel to remove non dupes so dupes can be examined easily.
Step through each cell in column "B", starting at B2 (B1 is header)
During the run, if the current cell in column B has a match anywhere in column B, leave it; if it's unique, remove the entire row
The code below is executing with inconsistent results.
Looking for some insight
Sub RemoveNonDupes()
    Selection.Copy
    Range("B2").Select
    ActiveSheet.Paste
    Application.CutCopyMode = False
    Range("B2:B5000").AdvancedFilter Action:=xlFilterInPlace, CriteriaRange:=Range("B2"), Unique:=True
    Range("B2:B5000").SpecialCells(xlCellTypeVisible).EntireRow.Delete
    ActiveSheet.ShowAllData
End Sub

Not the most direct route, but you could have the macro insert a column between B and C, then put a formula in that column that counts.
Something like =countifs(B:B,B:B). That will give you a count of how many times a record shows up; then you can set the script to loop, deleting any row where that value is 1.
Something like
Sub Duplicates()
    Columns("B:B").Insert Shift:=xlToRight ' inserts a column after b
    count = Sheet1.Range("B:B").Cells.SpecialCells(xlCellTypeConstants).count ' counts how many records you have
    crange = "C1:C" & count ' this defines the range your formula's go in if your data doesn't start in b1, change the c1 above to match the row your data starts
    Sheet1.Range(crange).Formula = "=countifs(B:B,B:B)" ' This applies the same forumla to the range
    ct = 0
    ct2 = 0 'This section will go cell by cell and delete the entire row if the count value is 1
    Do While ct2 < Sheet1.Range("C:C").Cells.SpecialCells(xlCellTypeConstants).count
        For ct = 0 To Sheet1.Range("C:C").Cells.SpecialCells(xlCellTypeConstants).count
            If Sheet1.Range("C1").Offset(ct, 0).Value > 1 Then
                Sheet1.Range("C1").Offset(ct, 0).EntireRow.Delete
            End If
        Next
        ct2 = ct2 + 1
    Loop
    Sheet1.Columns("B:B").EntireColumn.Delete
End Sub
Code isn't pretty, but it should do the job.
**Updated code per comments
Sub Duplicates()
    Dim lastRow As Long
    Dim r As Long
    With ActiveSheet
        .Columns("C:C").Insert Shift:=xlToRight ' inserts a helper column between B and C
        lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row ' last row with data in column B
        ' One count per row; the relative reference B1 adjusts as the formula fills down.
        ' If your data doesn't start in row 1, change C1 and B1 to match the row your data starts in
        .Range("C1:C" & lastRow).Formula = "=COUNTIF(B:B,B1)"
        ' Go row by row from the bottom up (so deletions don't shift rows still to be checked)
        ' and delete the entire row if the count value is 1. Row 1 is skipped as the header.
        For r = lastRow To 2 Step -1
            If .Cells(r, "C").Value = 1 Then
                .Rows(r).Delete
            End If
        Next r
        .Columns("C:C").Delete ' remove the helper column
    End With
End Sub
You can try that updated code. The loop at the end is what deletes each row; I fixed it to delete any row where the count is 1.
Based on what I understand, your data should be in column B and the counts should be in column C. If that isn't correct, update the formulas to match.
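
The count-then-delete logic can be checked on a small sample outside Excel. The Python below is a sketch only: the function name keep_duplicated_rows and the sample rows are hypothetical, and it assumes the key sits in the second field of each row (the equivalent of column B).

```python
from collections import Counter

def keep_duplicated_rows(rows, key_index=1):
    """Keep only the rows whose key value occurs more than once in the data."""
    counts = Counter(row[key_index] for row in rows)  # same role as the COUNTIF helper column
    return [row for row in rows if counts[row[key_index]] > 1]

# Hypothetical sample rows: (column A, column B)
rows = [("r1", "X"), ("r2", "Y"), ("r3", "X"), ("r4", "Z")]
dupes_only = keep_duplicated_rows(rows)
```

Counting first and filtering afterwards means no row is removed while the counts are still being computed, which is the same reason the macro fills the helper column before it starts deleting.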

Chris, to examine the unique values in a given range of data, I suggest using Excel's Advanced Filter in a slightly different way:
Range("RangeWithDupes").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("TargetRange"), Unique:=True
The operation gives you a list of the unique values from 'RangeWithDupes', placed at 'TargetRange'. You can then use the resulting range to manipulate the source data in many ways. Hope this helps.
