Highlighting differences between duplicates in VBA - excel

Hi, I have a spreadsheet with the following columns:
Transaction_ID counter State File_Date Date_of_Service Claim_Status NDC_9 Drug_Name Manufacturer Quantity Original_Patient_Pay_Amount Patient_Out_of_Pocket eVoucher_Amount WAC_per_Unit__most_recent_ RelayHealth_Admin_Fee Total_Voucher_Charge Raw_File_Name
There are duplicate Transaction IDs here. Is there VBA that would highlight where there are differences between two rows? So there may be data with the same Transaction ID but I want to highlight where they may have other fields that are different, therefore they aren't truly duplicates and would like to see what information is different.
thanks!

Excel's find duplicates conditional format should suffice for this. The problem is that it only works well off one column.
"So there may be data with the same Transaction ID but I want to highlight where they may have other fields that are different, therefore they aren't truly duplicates"
So instead of tracking duplicates in the Transaction ID column alone, you can try adding a new column and, in that new column, concatenate all the columns for which the combined values should be unique - and then run Excel's find duplicates conditional format on that column.
For example if the combination of [Transaction_ID], [File_Date] and [NDC_9] should be unique, make a new column that combines [Transaction_ID], [File_Date] and [NDC_9] column values - assuming your data is in an actual table you could have a table formula like so:
=[@Transaction_ID]&[@File_Date]&[@NDC_9]
"...and would like to see what information is different."
You can then filter the dupes in that column, and then, looking at the other columns you can see how they are different. It's not really possible to be any more specific than that with the way you've worded your question...
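If you would rather build the helper column with a macro than a table formula, a minimal sketch might look like the following. It assumes the columns sit in the order listed in the question (Transaction_ID in A, File_Date in D, NDC_9 in G), that column R is free for the helper values, that headers are in row 1, and that none of the key cells hold error values. The "|" separator is an addition of mine so that values such as "12" & "3" and "1" & "23" do not produce the same key.

```vba
Option Explicit

' Sketch only: writes a combined key into helper column R (assumed to be free).
' Column letters A, D and G are assumptions based on the column order in the question.
Public Sub BuildCompositeKey()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ws.Range("R1").Value = "Composite_Key"
    For r = 2 To lastRow 'row 1 is assumed to hold the headers
        ws.Cells(r, "R").Value = ws.Cells(r, "A").Value2 & "|" & _
                                 ws.Cells(r, "D").Value2 & "|" & _
                                 ws.Cells(r, "G").Value2
    Next r
End Sub
```

After running it, apply the duplicate-values conditional format to column R instead of column A.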

Assuming:
It's an unsorted dataset
column 1 contains the repeatable ID
the first row contains headers
...the following code (in the sheet's module) will turn yellow any cell whose value is unique for the ID that appears in the leftmost column...
Option Explicit
Public Sub HighlightUniqueValues()
    Dim r As Long, c As Long 'row and column counters
    Dim LastCol As Long, LastRow As Long 'right-most and bottom-most column and row
    Dim ColLetter As String
    Dim RepeatValues As Long
    'get right-most used column
    LastCol = Me.Cells(1, Me.Columns.Count).End(xlToLeft).Column
    'get bottom-most used row
    LastRow = Me.Cells(Me.Rows.Count, "A").End(xlUp).Row
    'assume first column has the main ID
    For r = 2 To LastRow 'skip the top row, which presumably holds the column headers
        For c = 2 To LastCol 'skip the left-most column, which should contain the ID
            'Get column letter
            ColLetter = Split(Me.Cells(1, c).Address(True, False), "$")(0)
            'Count the number of repeat values in the current
            'column associated with the same value in the
            'left-most column
            RepeatValues = WorksheetFunction.CountIfs(Me.Range("A:A"), Me.Range("A" & r), Me.Range(ColLetter & ":" & ColLetter), Me.Range(ColLetter & r))
            'If there is only one instance, then it's a lone
            'value (unique for that ID) and should be highlighted
            If RepeatValues = 1 Then
                Me.Range(ColLetter & r).Interior.ColorIndex = 6 'yellow background
            Else
                Me.Range(ColLetter & r).Interior.ColorIndex = xlColorIndexNone 'clear any fill
            End If
        Next c
    Next r
End Sub
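A different take on the same problem, offered as a sketch rather than a drop-in replacement: compare each repeated row against the first row that carries the same ID, and highlight only the cells that differ. This version goes in a standard module and works on the active sheet. It assumes the ID is in column A, headers are in row 1, no cell holds an error value, and the Scripting.Dictionary object is available (Windows only). The procedure name is hypothetical.

```vba
Option Explicit

' Sketch only: highlights cells that differ from the first row with the same ID.
Public Sub HighlightDiffsFromFirstOccurrence()
    Dim ws As Worksheet
    Dim firstSeen As Object 'maps ID -> row number of its first occurrence
    Dim lastRow As Long, lastCol As Long
    Dim r As Long, c As Long, baseRow As Long
    Dim id As String

    Set ws = ActiveSheet
    Set firstSeen = CreateObject("Scripting.Dictionary")
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    lastCol = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column

    For r = 2 To lastRow
        id = CStr(ws.Cells(r, 1).Value2)
        If Not firstSeen.Exists(id) Then
            firstSeen.Add id, r 'remember where this ID first appears
        Else
            baseRow = firstSeen(id)
            For c = 2 To lastCol
                If ws.Cells(r, c).Value2 <> ws.Cells(baseRow, c).Value2 Then
                    'mark both the repeat and the row it was compared against
                    ws.Cells(r, c).Interior.ColorIndex = 6
                    ws.Cells(baseRow, c).Interior.ColorIndex = 6
                End If
            Next c
        End If
    Next r
End Sub
```

Unlike the CountIfs approach, rows whose ID appears only once are left untouched, because there is nothing to compare them against.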

Related

Sorting places my data with empty cells above it

I have written a bunch of VBA macros to get my data formatted how I need it, and the last step is to sort by this new column I have generated in ascending order. However, when I hit sort by the new column, the code now places all the empty cells above my newly generated column as I think it is reading the empty as a 0 and sorts it above any alphanumeric data. This is happening because of the UDF I have for sorting the data. I need to insert the new column with the UDF for each new cell that I insert, but I don't know how to define the range in the new column.
I am close to solving this but would love some help.
Essentially what I have tried for placing the data in a new column works, but the way I have set the range is placing it in a bad spot and it can easily be sorted in the wrong order now. I include all of my code, but the issue is in the last portion of it where I am setting a range to place the new data.
I think what is happening is when I set my range from C3-C2000 and populate it, the remaining empty cells are now included in my sort and give me "lower" numbers when I sort it ascending. Thus all the empty cells are ranked higher up in the column.
Option Explicit
Sub ContractilityData()
    Dim varMyItem As Variant
    Dim lngMyOffset As Long, _
        lngStartRow As Long, _
        lngEndRow As Long
    Dim strMyCol As String
    Dim rngCell As Range
    Columns("B:B").Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove 'make new column for the data to go
    lngStartRow = 3 'Starting row number for the data. Change to suit
    strMyCol = "A" 'Column containing the data. Change to suit.
    Application.ScreenUpdating = False
    For Each rngCell In Range(strMyCol & lngStartRow & ":" & strMyCol & Cells(Rows.Count, strMyCol).End(xlUp).Row)
        lngMyOffset = 0
        For Each varMyItem In Split(rngCell.Value, "_") 'put delimiter you want in ""
            If lngMyOffset = 2 Then 'Picks which chunk you want printed out (each chunk is set by a _ currently)
                rngCell.Offset(0, 1).Value = varMyItem
            End If
            lngMyOffset = lngMyOffset + 1
        Next varMyItem
    Next rngCell
    Application.ScreenUpdating = True
    'Here is where my problem arises
    Range("C:C").EntireColumn.Insert
    Dim sel As Range
    Set sel = Range("C3:C2000")
    sel.Formula = "=PadNums(B3,3)"
    MsgBox "Data Cleaned"
End Sub
What I would like instead is a way to insert a new column, then have my UDF "PadNums" populate each cell up to the last cell of the previous column, essentially re-naming all my data from the previous column. I can then sort by the new column in ascending order and my data is in the correct order.
I think perhaps what I should do is copy column B into my newly inserted column C, then use some sort of last row function to apply the formula in all cells. That would give me the appropriate range always based on my original column?
I solved this! What I did was use range and xlDown to last row on column B, then pasted it to C, then inserted my UDF into C using the xlDown range!
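For anyone landing here later, the fix described above could be sketched roughly as follows. This is not the asker's exact code: it looks up from the bottom of column B with xlUp instead of using xlDown, which avoids stopping early if column B has gaps. It assumes the data starts in row 3 of the active sheet and that the PadNums UDF from the question exists in the workbook. The procedure name is hypothetical.

```vba
Option Explicit

' Sketch only: fill the PadNums formula in a new column C, down to the last used row of column B.
Sub FillPadNumsToLastRow()
    Dim lastRow As Long

    Columns("C:C").Insert Shift:=xlToRight 'new column for the padded values
    lastRow = Cells(Rows.Count, "B").End(xlUp).Row 'last used row in column B

    If lastRow >= 3 Then 'data is assumed to start in row 3
        Range("C3:C" & lastRow).Formula = "=PadNums(B3,3)"
    End If
End Sub
```

Because the formula range ends at the last row of column B, no empty formula cells are left over to float to the top of an ascending sort.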

Excel VBA set print area to last row with data

I have an Excel table with a sheet "CR" where columns A to K are filled, with the first row being a header row.
Rows 1-1000 are formatted (borders) and column A contains a formula to autonumber the rows when data in column F is entered.
Sheet "CR" is protected to prevent users from entering data in column A (locked).
Using the Workbook_BeforePrint function, I'm trying to set the print area to columns A to K and to the last row of column A that contains a number.
My code (in object 'ThisWorkbook') is as follows:
Private Sub Workbook_BeforePrint(Cancel As Boolean)
    Dim ws As Worksheet
    Dim lastRow As Long
    Set ws = ThisWorkbook.Sheets("CR")
    With ws
        ' find the last row with data in column A
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        .PageSetup.PrintArea = .Range("A1:K" & lastRow).Address
    End With
End Sub
However, when I click File -> Print, the range of columns A to K up to row 1000 is displayed instead of only the rows that have a number in column A. What am I doing wrong?
Change:
lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
To:
lastRow = [LOOKUP(2,1/(A1:A65536<>""),ROW(A1:A65536))]
.End(...) acts like Ctrl + arrow key: if a cell contains a formula, even one whose result looks empty, it will still stop there. Another way (if you do not want to loop) is to use Evaluate, like this:
lastRow = .Evaluate("MAX(IFERROR(MATCH(1E+100,A:A,1),0),IFERROR(MATCH(""zzz"",A:A,1),0))")
This will give the last row (which has a value in column A).
Also check whether there are hidden values (cells that look empty because of a number format, or that contain characters you can't see directly). Select a cell below row 1000 in column A and press Ctrl+Up to see where it stops (and why).
EDIT:
(regarding your comment)
"" still leads to a "stop" for the .End(...) command. So either use my formula, translate the formula into VBA, or loop over the cells to get the last value. Find is also a good tool (as long as you are not calling it over and over, countless times).
lastRow = .Cells.Find("*", .Range("B1"), xlValues, , xlByColumns, xlPrevious).Row
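The looping option mentioned above might look something like this. It is a sketch that would replace the existing Workbook_BeforePrint handler in ThisWorkbook, and it assumes column A contains no error values. It starts where End(xlUp) stops and then steps upward past any cells whose formula returns an empty string.

```vba
Option Explicit

' Sketch only: set the print area to the last row of column A that shows a value.
Private Sub Workbook_BeforePrint(Cancel As Boolean)
    Dim ws As Worksheet
    Dim r As Long

    Set ws = ThisWorkbook.Sheets("CR")
    r = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row 'last cell with a value or formula

    'walk upward while the cell looks empty (e.g. a formula returning "")
    Do While r > 1 And Len(ws.Cells(r, 1).Value) = 0
        r = r - 1
    Loop

    ws.PageSetup.PrintArea = ws.Range("A1:K" & r).Address
End Sub
```

With about 1000 formatted rows the loop is cheap; for much larger sheets the Evaluate or Find variants are likely to be faster.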

Excel VBA defining standard user input field

I have 7 columns representing 7 different (let's call them) sources of input.
Row 1 of each column has the names of each source.
Row 2 of each column is a sum of all rows from the 4th down. Ex: A2 = SUM(A4:A1048576)
Since I am supposed to make random entries to each column, I want Row 3 of each column to be a standard user input field, so that any value entered in the 3rd row of a column is appended to the first empty cell in that column, triggered by some event (keypress, button press, sheet update?). That is, the first entry to column "A" in "A3" will be put in "A4", the second entry to "A3" should be put in "A5", and so on. The same goes for each column, independently. Also, if possible, I want the cells in Row 3 to be cleared at the end.
How do I do this?
Please, answer with full tutorial explanation or a heavily sourced one, because my experience with EXCEL and VBA is close to none.
For anyone else finding this answer, understand that each column is independent of the others so it doesn't matter if the row counts vary by column.
Paste the following code into the Worksheet you plan to use. This will monitor changes ONLY in row 3 (but all columns - you can alter to only monitor the seven columns you want to watch).
When a change is detected in row 3, the column is determined, then the last used ROW is found for that column. The value entered is moved to the first available row, the value entered on row 3 is erased, and the SUM in row 2 is updated to reflect the new sum range.
Option Explicit
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim ColName As String
    Dim lLastRow As Long ' Saves Last Row for any column
    Dim lColumn As Long
    ' This will monitor changes made in row 3 only.
    ' - move entered value to end of column
    ' - erase row 3 value entered
    ' - change 'SUM' in row 2 to reflect new row
    On Error GoTo Enable_Events ' Need this! If error, need to enable events!!
    If Target.Row <> 3 Then Exit Sub ' Only track row 3 changes
    Application.EnableEvents = False ' Turn off event tracking because we make changes
    lColumn = Target.Column ' Get column that's being used.
    With Me 'Find the last used row in a Column (Me is the sheet this code lives in)
        lLastRow = .Cells(.Rows.Count, Target.Column).End(xlUp).Row
    End With
    Cells(lLastRow + 1, Target.Column).Value = Target.Value ' Move value to end
    Target.Value = "" ' Clear value entered
    ' Get Column name (A, B, C...) then create new SUM
    ColName = Left(Cells(1, lColumn).Address(False, False), 1 - (lColumn > 26))
    Cells(2, lColumn).Formula = "=SUM(" & ColName & "4:" & ColName & lLastRow + 1 & ")"
Enable_Events:
    Application.EnableEvents = True
End Sub
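To restrict the handler to the seven input columns, as the answer suggests, one option is a small guard function in the same worksheet module. This is a sketch: the function name is hypothetical, and it assumes the seven sources occupy columns A to G. It also rejects multi-cell changes (such as a pasted block), which the handler above does not expect.

```vba
' Sketch only: True when the change is a single cell in row 3, columns A to G (assumed layout).
Private Function IsInputCell(ByVal Target As Range) As Boolean
    IsInputCell = (Target.CountLarge = 1) And (Target.Row = 3) _
                  And (Target.Column >= 1) And (Target.Column <= 7)
End Function
```

In the handler, the row test would then become: If Not IsInputCell(Target) Then Exit Sub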

Comparing multiple columns in Excel and remove dups

I have 3 columns in Excel 2010 with email addresses, I need to be able to narrow all 3 columns to only have unique values. I don't necessarily need to merge the remaining values into a single column, but I definitely need to eliminate duplicates. I found another post that had a VB with it, but it didn't seem to work. It removed only a few duplicates:
Sub removeDuplicates()
    Dim lastCol As Integer
    lastCol = 5 'col 5 is column E
    Dim wks As Worksheet
    Set wks = Worksheets("Sheet1")
    Dim searchRange As Range
    Set searchRange = wks.Range("A1:A" & wks.Cells(Rows.Count, "A").End(xlUp).Row)
    Dim compareArray As Variant
    Dim searchArray As Variant
    'Get all values from Col A to search against
    compareArray = searchRange.Value
    For col = lastCol - 1 To 1 Step -1
        'Set values to search for matches
        searchArray = searchRange.Offset(0, col - 1).Value
        'Set values to last column to compare against
        compareArray = searchRange.Offset(0, col).Value
        For i = 1 To UBound(compareArray)
            If compareArray(i, 1) = searchArray(i, 1) Then
                'Match found, delete and shift left
                Cells(i, col).Delete Shift:=xlToLeft
            End If
        Next i
    Next col
End Sub
Thanks!
Here is how I would propose doing this if it is a one-off task that you don't have to do very often.
Rather than typing out the entire process in detail, I have done a screencast of how I did this (and the entire process barely took me a minute to do).
The quick overview:
You will need to add a few temporary helper columns for unique values from each email list (one for each list), a 'merged list' column and then a final column. Filter for the unique emails using the 'Advanced' filter option one column at a time. Paste those values into the temporary column for that email list and then clear the filter. Repeat until you have gone through each column and each temporary column has the unique values in it from each list. Once you have the uniques from each list, paste these one at a time into the 'merged list' column (stacking the results in one long list) and then do a unique filter on that. Copy/paste the uniques from that list into your final column, clear the filter, and you're done.
Screencast is below:
http://screencast.com/t/zL8VmUut
Cheers!
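If this needs to be repeated, the same idea can be sketched in VBA with a dictionary instead of manual filtering. The sketch below assumes the three email lists are in columns A to C of "Sheet1" (the sheet name used in the question), that column E is free for the merged result, and that Scripting.Dictionary is available (Windows only). The procedure name is hypothetical, and it writes a merged list rather than deleting anything from the original columns.

```vba
Option Explicit

' Sketch only: collect the unique, non-blank emails from columns A:C into column E.
Sub CollectUniqueEmails()
    Dim ws As Worksheet
    Dim seen As Object
    Dim c As Long, r As Long, lastRow As Long, outRow As Long
    Dim email As String

    Set ws = Worksheets("Sheet1")
    Set seen = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare 'treat A@x.com and a@x.com as the same address

    outRow = 1
    For c = 1 To 3 'columns A, B and C
        lastRow = ws.Cells(ws.Rows.Count, c).End(xlUp).Row
        For r = 1 To lastRow
            email = Trim$(CStr(ws.Cells(r, c).Value))
            If Len(email) > 0 Then
                If Not seen.Exists(email) Then
                    seen.Add email, True
                    ws.Cells(outRow, 5).Value = email 'column E, assumed free
                    outRow = outRow + 1
                End If
            End If
        Next r
    Next c
End Sub
```

If the lists have header rows, start the inner loop at row 2 instead.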
Since the first column holds the addresses you have already contacted, swap the first column with the second, and in the third column write a status showing whether each email was found in the second column (the ones you already contacted).
Formula.
=IF(ISERROR(VLOOKUP(A2,$B$2:$B$11,1,FALSE)),"Not Contacted","Yes")
Any address with a Yes status is on the contacted list; filter for Not Contacted and you have a new pending list in column A.
Simple.

Macro to insert blank cells below if value >1 and copy/paste values from cell above

This site already has something similar: Copy and insert rows based off of values in a column
but the code doesn't take me quite where I need to go, and I haven't been able to tweak it to make it work for me.
My user has a worksheet with 4 columns, A-D. Column A contains specific contract numbers, column B is blank, column C has part numbers, and column D has the entire range of contract numbers. My user wants to count how many times each contract number appears in the entire range, so I entered the formula =COUNTIF($D$2:$D$100000,A2) in cell E2 and copied it down, giving me the number of times the specific contract in column A appears in column D. The numbers range from 1 to 11 in this workbook, but they may be higher in other workbooks this method will be used in.
The next thing I need to do is insert blank cells below every value in column E that is greater than 1, very much like the example in the previously asked question. I then also need to copy the cell in the same row of column A and insert the copies into the matching rows below it. Example: Cell E21 has the number 5, so I need to shift cells in column E only so that there are 4 blank cells directly below it. In column A, I need to copy cell A21 and insert the copied cells in the four rows directly below.
Just trying to get the blank cells to insert has been a trial, using the code as given in the previous question.
Dim sh As Worksheet
Dim lo As ListObject
Dim rColumn As Range
Dim i As Long
Dim rws As Long
Set sh = ActiveSheet
Set lo = sh.ListObjects("Count")
Set rColumn = lo.ListColumns("Count").DataBodyRange
vTable = rColumn.Value
For i = rColumn.Rows.Count To 1 Step -1
    If rColumn.Cells(i, 1) > 1 Then
        rws = rColumn.Cells(i, 1) - 1
        With rColumn.Rows(i)
            .Offset(1, 0).Resize(rws, 1).Cells.Insert
            .EntireRow.Copy .Offset(1, 0).Resize(rws, 1).Cells
            .Offset(1, 0).Resize(rws, 1).EntireRow.Font.Strikethrough = True
        End With
    End If
Next
I would be very grateful for any help as I have been fighting with this monster for a week.
While this is indeed possible to do, it might be a good idea to look into moving the list of all contract numbers from column D to a different sheet. Looping through a range and inserting rows based on cell values is quite simple, but it will also create holes in columns D and E.
Here's code for simply adding the rows and copying the values as you specified.
Sub Main()
    '---Variables---
    Dim source As Worksheet
    Dim startRow As Integer
    Dim num As Integer
    Dim val As String
    Dim i As Long
    '---Customize---
    Set source = ThisWorkbook.Sheets(1) 'The sheet with the data
    startRow = 2 'The first row containing data
    '---Logic---
    i = startRow 'i acts as a row counter
    Do While i <= source.Range("E" & source.Rows.Count).End(xlUp).Row
        'looping until we hit the last row with a value in column E
        num = source.Range("E" & i).Value 'Get number of appearances
        val = source.Range("A" & i).Value 'Get the value
        If num > 1 Then 'Number of appearances > 1
            Do While num > 1 'Create rows
                source.Range("A" & i + 1).EntireRow.Insert 'Insert row
                source.Range("A" & i + 1) = val 'Set value
                num = num - 1
                i = i + 1 'Next row
            Loop
        End If
        i = i + 1 'Next row
    Loop
End Sub
Of course you could also remove the holes from column D after inserting the new rows and modify the formula in column E so that it remains copyable and doesn't calculate for the copied rows.
Generally it makes things easier if a single row can be thought of as a single object, as creating or deleting a row only affects that one single object. Here we have one row represent both a specific contract and a contract in the all contracts list - this could end up causing trouble later on (or it could be totally fine!)
