Remove duplicate rows based on value of cell in column - excel

I have the following script that works well in google docs --> sheets. It doesn't work well with a lot of rows. I am guessing because of the array that keeps getting bigger that tracks the values.
I need a script I can run in MS EXCEL that will remove rows that have a duplicate value in a column. (Unless the column is "")
Google docs script that works for small files:
function removeDuplicates()
{
var s = SpreadsheetApp.getActiveSheet();
var c = Browser.inputBox("Please", "Type in the column name (e.g.: A, B, etc.)", Browser.Buttons.OK_CANCEL);
var r, v;
var aValues = [];
try
{
if(c != "cancel")
{
r = 2; // first row is row two
while (r <= s.getLastRow())
{
v = s.getRange(c + r).getValue();
if(v != "")
{
if(aValues.indexOf(v) == -1)
{
aValues.push(v);
}
else
{
s.deleteRow(r);
continue;
}
}
r++;
}
Browser.msgBox("Duplicates removed!");
}
} catch (e) {Browser.msgBox("Error Alert:", e.message, Browser.Buttons.OK);}
}
Any help would be appreciated.

Here is something that seems to fit the bill.
Sub dedupe_ignore_blanks()
Dim r As Long, v As Long, vVALs As Variant, sCOL As String
Application.ScreenUpdating = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
With ActiveSheet.Cells(1, 1).CurrentRegion
sCOL = "B"
sCOL = Application.InputBox("Type in the column name (e.g.: A, B, etc.)", _
"Please", sCOL, 250, 75, "", , 2)
If CBool(Len(sCOL)) And sCOL <> "False" Then
For r = .Rows.Count To 2 Step -1
If Application.CountIf(.Columns(sCOL), .Cells(r, sCOL).Value) > 1 Then _
.Rows(r).EntireRow.Delete
Next r
End If
End With
FallThrough:
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
Application.ScreenUpdating = True
End Sub
I gathered from your code snippet that you had a header row in the data row 1. The Application.CountIF does not count blank cells.

Related

How to lookup multiple cells based on multiple criteria in VBA

I'm extremely new to VBA and have tried Googling to find what I need, but have fallen short.
I have a sheet (Sheet1) containing a list of companies that currently have, or at some point have had, a subscription. The list contains the City (Col A), the Company (Col B), the Category (Col C) and a Cancellation Date (Col D) (if applicable). What I want to do is fill in the current company for that city/category on a different sheet. I want those headers to be City (Col D), Category 1 (Col E), Category 2 (Col F), and Category 3 (Col G).
Here are images of the two sheets of test data:
Sheet 1
Sheet 2
There can only be one company per category per city. For example: in my test data, company D was under Category 1 in San Antonio, but cancelled on 11/12/2021. Then, company N took that spot in San Antonio. So, in my table on Sheet 2, I want company N to be populated. The data set I'm using this for is very large and constantly changing, so I would like an automated way to do this.
Here is a copy of the code I pieced together:
Sub CompanyLookup()
With Application
.Calculation = xlCalculationManual
.EnableEvents = False
.ScreenUpdating = False
End With
Dim lastRowInCity, lastRowOutCity, i, k, m As Long
Dim lookFor, j, inArray, outArray, findArray As Variant
Dim inWks, outWks As Worksheet
Set inWks = ThisWorkbook.Sheets(1)
Set outWks = ThisWorkbook.Sheets(2)
lastRowInCity = inWks.Cells(Rows.Count, "A").End(xlUp).Row
lastRowOutCity = outWks.Cells(Rows.Count, "D").End(xlUp).Row
lastRowCategory = inWks.Cells(Rows.Count, "C").End(xlUp).Row
lastRowDate = inWks.Cells(Rows.Count, "D").End(xlUp).Row
lastColCategory = outWks.Cells(Columns.Count, "D").End(xlToLeft).Column
inArray = Range(inWks.Cells(1, 1), inWks.Cells(lastRowInCity, 3))
findArray = Range(outWks.Cells(1, 4), outWks.Cells(lastRowOutCity, 4))
outArray = Range(outWks.Cells(1, 5), outWks.Cells(lastRowOutCity, 5))
On Error Resume Next
For i = 2 To lastRowOutCity
For j = 2 To lastRowInCity
For k = 2 To lastRowCategory
For m = 2 To lastRowDate
lookFor = findArray(i, 1)
If inArray(j, 1) = lookFor And inArray(m, 4) < 1 And inArray(k, 3) = outArray(lastColCategory, 1) Then
outArray(i, 1) = inArray(j, 2)
Exit For
End If
Next j
Next m
Next k
Next i
Range(outWks.Cells(1, 5), outWks.Cells(lastRowOutCity, 5)) = outArray
With Application
.Calculation = xlCalculationAutomatic
.EnableEvents = True
.ScreenUpdating = True
End With
End Sub
Assuming your data looks exactly as your screenshots:
Sub CompanyLookup()
Dim sourceData, resultData, rngSource As Range, rngResult As Range
Dim r As Long, c As Long, city As String, cat As String, rSrc As Long
Set rngSource = ThisWorkbook.Sheets(1).Range("A1").CurrentRegion
Set rngResult = ThisWorkbook.Sheets(2).Range("D1").CurrentRegion
sourceData = rngSource.Value
resultData = rngResult.Value
'scan through the results array
For r = 2 To UBound(resultData, 1)
city = resultData(r, 1) 'city
For c = 2 To UBound(resultData, 2)
cat = resultData(1, c) 'category
'Scan the source data for a city+category match,
' ignoring lines with a cancellation date
For rSrc = 2 To UBound(sourceData, 1)
If Len(sourceData(rSrc, 4)) = 0 Then 'no cancellation date
If sourceData(rSrc, 1) = city And sourceData(rSrc, 3) = cat Then
resultData(r, c) = sourceData(rSrc, 2) 'populate the company
Exit For 'done searching
End If
End If
Next rSrc
Next c
Next r
rngResult.Value = resultData 'populate the results
End Sub
I had exact same issue this week, and from what i read online, the fact that you cannot use vlookup or find function for multiple criteria. Mostly people prefer using .find fuction and when you find it, you can use loop to find second criteria. It was what i used.

Is there a way to progressively subtract the quantity of same item on different cells and keep track of it?

I try to better explain the problem using this screenshot as example:
As you can see from the screenshot, what's going here is the following:
When an item is received, it is put on column G with the actual quantity received. Also an OrderID is associated to the item.
Everytime an item is shipped, it is put in column A.
What I would like to achieve?
Everytime I ship an item, I would like to progressively subtract the quantity in column B to the first non-zero quantity in column H (corresponding to the same item I just put).
If I would be able to create a list ( as in C++) the pseudo code would be the following:
item = $A2;
While(item =/= blank){
If(QuantityReceived > 0 && item == ItemReceived)
QuantityReceived--; ' here I just decrement by 1, because default quantity shipped is 1
else {
ItemReceived = ItemReceived -> next;
QuantityReceived = QuantityReceived -> next;
}
ItemReceived = $G2;
QuantityReceived = $H2;
item = item -> next;
}
I wrote this code to explain what I would like to achieve.
Do you have any tips/solution/ideas?
Hope I explained the problem well.
Thanks.
put this in the code for the sheet (not a module)
Private Sub Worksheet_Change(ByVal Target As Range)
'
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Reference")
Application.EnableEvents = True
'restrict to users entering data in column B and only one cell
If Target.Cells.Count = 1 And Target.Cells.Column = 2 Then
' get the item name depending on removed or add
If Target.Value = 1 Then
itemName = Target.Offset(0, -1).Value
amt = -1
ElseIf Target.Value = 0 Then
itemName = ws.Range("A" & Target.Row).Value
amt = 1
Else
End
End If
' set up rng then look through all of the items in column G
Dim rng As Range
For Each rng In Range("G1:G" & Range("J" & Rows.Count).End(xlUp).Row)
' look for the item and a whats left of more than 0
If rng.Value = itemName And rng.Offset(0, 3) > 0 Then
rng.Offset(0, 3) = rng.Offset(0, 3) + amt
ws.Columns("A:B").Clear
lastrow = Range("A" & Range("A" & Rows.Count).End(xlUp).Row).Row
ws.Range("A1:B" & lastrow).Value = Range("A1:B" & lastrow).Value
Application.EnableEvents = True
End
End If
Next rng
' message if item with positive left not found
MsgBox ("no item remaining found")
End If
End Sub
test and let me know how you get on / accept the answer if it works well for what you want

Get all columns from Excel except selected

I have excel files converted to txt. In some files, some columns are skipped. That is controlled by database:
file | remove_column
=======+===============
file1 | CASE NOTE
-------+---------------
file2 | Description
-------+---------------
file3 | Item | Address
Remove_Column has the header (1st row). If several columns should be skipped, they are delimited with '|'
I have to compare converted txt file with original excel file if they match. How can I read all columns except those showed in DB table?
I am using UFT 12.5. Reading Excel through Excel.Application or ADO.
Thnx)
UPD: Code I use:
I have columns hard-coded:
Select Case OrigFileName 'file names come from database
Case "Fees mm-yy.xls"
ColumnNames = Split("1,2,3,4,5,6,7,8,9,10,11,12,13", ",")
Case "Exp mm-yy.xls"
ColumnNames = Split("1,2,3,4,5,6,7,8,9,12,13,14,15,16,19,20", ",")
End Select
But there are 50 files, and the business might ask to remove or to add back any columns; also, new files are coming...(((
Dim fsox : Set fsox = CreateObject("Scripting.FileSystemObject")
Dim TargFileRead : Set TargFileRead = fsox.OpenTextFile(targetFile)
Dim OrgExcel : Set OrgExcel = CreateObject("Excel.Application")
OrgExcel.Workbooks.Open(originalfile)
Set vSheet = OrgExcel.WorkSheets(TabUse) 'excel sheet name, comes from database
print vSheet.UsedRange.Rows.Count
For rc = 1 To vSheet.UsedRange.Rows.Count
For coc = 0 To UBound(ColumnNames) 'column names hard-coded
cc = cInt(ColumnNames(coc))
vtext = vSheet.cells(rc,cc)
If NOT(vtext=ChrW(9)) Then
If vstring="" Then
vstring=vtext
Else
vstring = vstring&vbTab&vtext
End If
End If
If len(vstring)>0 Then
TargFileText = TargFileRead.ReadLine
Do
If Left(TargFileText, 1)=ChrW(9) Then
TargFileText = MID(TargFileText, 2)
Else
Exit Do
End If
Loop
Do
If RIGHT(TargFileText, 1)=ChrW(9) Then
TargFileText= mid(TargFileText,1,len(TargFileText)-1)
Else
Exit Do
End If
Loop
TargFileStr = Trim(TargFileText)
If trim(vstring) = trim(TargFileStr) Then
' print "match"
Else
print "-=Not Match=-"&VBNewLine&"txt:::"&trim(TargFileStr)&VBNewLine&"xls:::"&trim(vstring)
End If
End If
Next
I would suggest to replace the Switch statement with a function call that gives you the relevant columns for the sheet as an array. The logic which column is allowed is then put in another function. That should make the logic more flexible than fixed columns.
Function getColumns(OrigFileName as String) As String()
Dim lastCol As Integer
Dim ColumnNumbers As String
lastCol = Sheets(OrigFileName).UsedRange.Columns.Count
For col = 1 To lastCol
If isColumnAllowed(OrigFileName, Sheets(OrigFileName).Cells(1, col)) Then
ColumnNumbers = ColumnNumbers & IIf(Len(ColumnNumbers) = 0, "", ",") & col
End If
Next
getColumns = Split(ColumnNumbers, ",")
End Function
Function isColumnAllowed(ByVal OrigFileName As String, columnName As String) As Boolean
Select Case OrigFileName
Case "file1"
Forbidden = Split("CASE NOTE", "/")
Case "file2"
Forbidden = Split("Description", "/")
Case "file3"
Forbidden = Split("Item/ Address", "/")
End Select
isColumnAllowed = (UBound(Filter(Forbidden, columnName)) = -1)
End Function
This is what I have now and is working:
If LEN(ColumnToRemove)>0 Then
ColumnToRemoveCol = split(ColumnToRemove, "|") 'set collection of header strings to skip column
For L = 1 To vSheet.UsedRange.Columns.Count
For x = 0 to UBound(ColumnToRemoveCol)
AddCol = 0 'ColumnToRemoveCol can have more than 1 item, that may cause any column to be added more than once; we will use the true/false logic via 0 and 1 to avoid that doubling
If vSheet.cells(1, l)=ColumnToRemoveCol(x) Then
AddCol = AddCol + 1
End If
Next
If AddCol =0 Then ColumnNumbers = ColumnNumbers&","&L
Next
Else
For L = 1 To vSheet.UsedRange.Columns.Count
ColumnNumbers = ColumnNumbers&","&L
Next
End If
If LEFT(ColumnNumbers, 1)="," Then ColumnNumbers=MID(ColumnNumbers, 2)
If RIGHT(ColumnNumbers, 1)="," Then ColumnNumbers=MID(ColumnNumbers, 1, LEN(ColumnNumbers)-1)
Printing the columns for first excel file in my case gives the next line:
ColumnNumbers: 1,2,3,4,5,6,7,8,10,11,12,15,16,17
Further usage:
getColumns = Split(ColumnNumbers, ",")
For rc = 1 To vSheet.UsedRange.Rows.Count
For coc = 0 To UBound(getColumns)
cc = cInt(getColumns(coc))
vtext = vSheet.cells(rc,cc)
.....
Next
Next

VBA excel , improve performance without loops

I have two identical sheets that i want to take the rows of , that are identical in multiple columns (the sheets are 63 columns always and 504 rows and increasing) , i am using two for loops to increase the row in one and then comparing all the rows in the other with that row then increase the row again and compare all the rows of the other with that row ect. till the last row , then an if loop to see if they match my conditions . The problem is that it is taking too much time (about 8 mins) , i tried to use the lookup functions but it failed because it can only take one value . I added the false screenupdating , calculation , and enableevents and even changed the statusbar to something very basic to improve performance but non of them gave me the result I wanted .
How can i improve performance in any way possible , a new function or anything ??
PS some times some of the conditions are not important and it depends on the true or fasle values on some of the cells .
For Row_S = 2 To MAX_Row_S
SourceMonth = Worksheets(NBG_SourceRegionDataWorksheetName).Cells(Row_S, SOP).Value
SourceMonth = DatePart("m", SourceMonth)
SourceYear = Worksheets(NBG_SourceRegionDataWorksheetName).Cells(Row_S, SOP).Value
SourceYear = DatePart("yyyy", SourceYear)
SourceCarmaker = Worksheets(NBG_SourceRegionDataWorksheetName).Cells(Row_S, carmaker).Value
SourceProject = Worksheets(NBG_SourceRegionDataWorksheetName).Cells(Row_S, Project).Value
SourceFamily = Worksheets(NBG_SourceRegionDataWorksheetName).Cells(Row_S, Family).Value
SourceStatus = Worksheets(NBG_SourceRegionDataWorksheetName).Cells(Row_S, Status).Value
SourceShare = Worksheets(NBG_SourceRegionDataWorksheetName).Cells(Row_S, Share).Value
SourceCst = Worksheets(NBG_SourceRegionDataWorksheetName).Cells(Row_S, "A").Value
SourcePID = Worksheets(NBG_SourceRegionDataWorksheetName).Cells(Row_S, ProjectID).Value
' Take the data from NBG_Data_Region sheet to be Compared with each row of the NBG_Data_Source_Region sheet
For Row_T = 2 To MAX_Row_T
If Row_T >= MAX_Row_T Then
Exit For
End If
NBGMonth = Worksheets(NBG_RegionaDataWorksheetName).Cells(Row_T, SOP).Value
NBGMonth = DatePart("m", NBGMonth)
NBGYear = Worksheets(NBG_RegionaDataWorksheetName).Cells(Row_T, SOP).Value
NBGYear = DatePart("yyyy", NBGYear)
NBGCarmaker = Worksheets(NBG_RegionaDataWorksheetName).Cells(Row_T, carmaker).Value
NBGProject = Worksheets(NBG_RegionaDataWorksheetName).Cells(Row_T, Project).Value
NBGFamily = Worksheets(NBG_RegionaDataWorksheetName).Cells(Row_T, Family).Value
NBGStatus = Worksheets(NBG_RegionaDataWorksheetName).Cells(Row_T, Status).Value
NBGShare = Worksheets(NBG_RegionaDataWorksheetName).Cells(Row_T, Share).Value
NBGCst = Worksheets(NBG_RegionaDataWorksheetName).Cells(Row_T, "A").Value
NBGPID = Worksheets(NBG_RegionaDataWorksheetName).Cells(Row_T, ProjectID).Value
' StatusBar Show
Application.StatusBar = "Running"
'Application.StatusBar = "VerifyMultipleCustomerProjects. Progress: " & Row_S & " of " & MAX_Row_S
' Check if any project in the NBG_Data_Region have multiple customers and add it ti the sheet Issue_MultipleCustomerProjects
' NAF 20161208
'Test with Source of YEAR and MONTH
If ((NBGMonth = SourceMonth Or Worksheets(Issue_MultipleCustomerProjectsWorksheetName).Range("C21") = True) And _
(NBGYear = SourceYear Or Worksheets(Issue_MultipleCustomerProjectsWorksheetName).Range("C25") = True) And _
(SourceCarmaker = NBGCarmaker Or Worksheets(Issue_MultipleCustomerProjectsWorksheetName).Range("G25") = True) And _
(SourceProject = NBGProject Or Worksheets(Issue_MultipleCustomerProjectsWorksheetName).Range("F25") = True) And _
(SourceFamily = NBGFamily Or Worksheets(Issue_MultipleCustomerProjectsWorksheetName).Range("E25") = True) And _
(SourceShare + NBGShare <> 1 Or Worksheets(Issue_MultipleCustomerProjectsWorksheetName).Range("H25") = True) And NBGCst <> SourceCst) Then
Have you tried adding
Application.ScreenUpdating = False
Application.EnableEvents = False
Application.DisplayAlerts = False
at the beginning of your code, and
Application.ScreenUpdating = True
Application.EnableEvents = True
Application.DisplayAlerts = True
at the end of your code?
This will turn off screen updating, events, and alerts causing faster run-time.
Also, loading and unloading arrays are the fastest way if you decide to take that route.
An example of loading an array:
Dim arr() As Variant ' let brackets empty, not Dim arr(1) As Variant !
For Each a In Range.Cells
' change / adjust the size of array
ReDim Preserve arr(1 To UBound(arr) + 1) As Variant
' add value on the end of the array
arr(UBound(arr)) = a.Value
Next
An example of iterating through the array to pull your data:
For Each element In arr 'Each array element
do_something (element)
Next element

How to optimize exporting data to excel in Excel Interop?

I am using Excel interop to create excel workbooks from my query results. When there are thousands of records it takes a long time for the workbook to be generated. The below code is a sample of how I am populating the cells.
RowNo = 1
For i = 1 To 4000
ColNo = 1
For j = 1 To 5
Dim cell As excel.Range = ws.Cells(RowNo, ColNo)
Dim value = j
cell.Value = value
ColNo = ColNo + 1
Next
RowNo = RowNo + 1
Next
For the above code to run it takes more than a minute. How can I optimize it?
Found the answer. You can write data to an array and then write the array to the excel range rather than writing the data cell by cell. See http://www.clear-lines.com/blog/post/Write-data-to-an-Excel-worksheet-with-C-fast.aspx
private static void WriteArray(int rows, int columns, Worksheet worksheet)
{
var data = new object[rows, columns];
for (var row = 1; row <= rows; row++)
{
for (var column = 1; column <= columns; column++)
{
data[row - 1, column - 1] = "Test";
}
}
var startCell = (Range)worksheet.Cells[1, 1];
var endCell = (Range)worksheet.Cells[rows, columns];
var writeRange = worksheet.Range[startCell, endCell];
writeRange.Value2 = data;
}

Resources