VBA Range.End(xlDown) stops at last visible row - excel

I am writing a simple VBA script in Microsoft Excel that iterates over a list of cars and a list of information about when the cars were refuelled, to give an overview of how many kilometers each car drives each month.
I use the Range.End property to calculate the number of rows with data and then loop through the indices.
Set Data = Worksheets("Tankninger") ' Danish for refuellings
NumRows = Data.Range("A1", Data.Range("A1").End(xlDown)).Rows.Count
For x = 1 To NumRows
' Process data
Next
Everything seemed to be working fine. However, I found that if someone applied a filter to the sheet with refuelling data (e.g. showing only the data for car A), NumRows would be assigned the index of the last visible row.
Example: if the refuelling sheet contains 100 records and the records for car A are on rows 50-60, NumRows would be assigned the value 60, so my script would ignore the last 40 records.
Is there a way to make the Range.End property ignore any filter applied to the sheet, or will I have to change the implementation to use a While loop instead?

I ended up replacing the for-loop with a while-loop. This allowed me to access every cell regardless of any filtering applied to the sheets.
Set Data = Worksheets("Tankninger") ' Danish for refuellings
r = 2
While Not IsEmpty(Data.Cells(r, 1).Value)
' Process data
r = r + 1 ' move to the next row, otherwise the loop never ends
Wend
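The While loop works. If the list in column A has no gaps (the same assumption End(xlDown) already makes), another filter-proof option is to count the non-empty cells with COUNTA. Unlike Range.End, COUNTA also counts cells in rows hidden by a filter. This is a minimal sketch, and the function name CountDataRows is hypothetical:

```vba
Option Explicit

' Hypothetical helper: number of non-empty cells in column A of ws.
' COUNTA includes cells in rows hidden by a filter, so the result does not
' change when someone filters the sheet. It equals the last row only if
' column A is contiguous from A1 with no blank cells.
Public Function CountDataRows(ByVal ws As Worksheet) As Long
    CountDataRows = Application.WorksheetFunction.CountA(ws.Columns(1))
End Function
```

Usage: NumRows = CountDataRows(Worksheets("Tankninger")). If column A can contain blanks, the While loop above is the safer choice.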

You can add the following to your code to remove any filters before you find the last row with data:
'Remove all filters (ShowAllData raises an error when no filter is active, so check FilterMode first)
With Worksheets("Sheet1")
If .FilterMode Then .ShowAllData
End With


Copy & Pasting values from one Table to another using VBA and ListObjects

I am trying to compare spending data from two sources: a curated manual input from users and an automated extract, for different business units. The common data from both sources is the ID of the spending.
The idea is to aggregate both data sources (Excel Tables) into one Table where the first two columns are the ID of the spending, the next column is the spending data from users related to that ID and the last one is the spending data from the automated extract.
In this table, I'll have "double" the total spending for each ID, but then I can do a pivot table where I'll clearly compare the users input with the automated extract for each ID.
I highlighted the important fields I need to copy and paste.
[Image: PGIvsManual]
My code is the following:
Sub PGIvsManualInput()
With Application
.ScreenUpdating = False
.EnableEvents = False
End With
Set PGIvsManualTable = Worksheets("PGI vs Dépenses (Auto)").ListObjects("PGIvsManualInputAuto")
Set PGITable = Worksheets("PGI Clean").ListObjects("PGIExtract")
Set ManualInputTable = Worksheets("Dépenses").ListObjects("Dépenses")
'Cleaning the table
With Worksheets("PGI vs Dépenses (Auto)").Range("PGIvsManualInputAuto")
.ClearContents
.Borders(xlInsideHorizontal).LineStyle = xlNone
End With
With PGIvsManualTable
If .ListRows.Count >= 1 Then
.DataBodyRange.Rows.Delete
End If
End With
'Copy the data
PGITable.ListColumns(1).DataBodyRange.Resize(, 2).Copy Destination:= _
PGIvsManualTable
And that's where it gets messy: I can't even get the first batch of data to import properly! I am trying to copy the first two columns from PGITable and paste them into the first two columns of PGIvsManualTable. This worked in my first example without defining any destination column, even though the source and destination Tables didn't have the same number of columns.
But in this case, the paste extends to all columns of my destination table! I don't understand this behavior, as it doesn't happen in my previous example with basically the exact same code!
I tried to set the destination as follows but always got errors:
PGIvsManualTable.ListColumns(1).DataBodyRange.Resize(, 2) ==> Error 91
PGIvsManualTable.DataBodyRange(1,1) ==> Error 438
PGIvsManualTable.ListColumns(1).Resize(, 2) ==> Error 438
And a few others, but it never worked properly.
I expect the selected columns to be copied and pasted into my destination columns, based on the coordinates I provide via ListObject.DataBodyRange.
I guess that if I manage to make this first import work, all the others will work on the same template, but in the meantime, my code seems to work on the previous example.
Deleting the DataBodyRange rows causes a problem if you then try to paste into the DataBodyRange. Once a table has no data rows left, its DataBodyRange is Nothing, which is what triggers Error 91.
As a workaround, you could delete all rows after the first, something like this example:
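Sub Test()
Dim firstTbl As ListObject, secondTbl As ListObject
Set firstTbl = Sheet1.ListObjects("Table1")
Set secondTbl = Sheet1.ListObjects("Table2")
With secondTbl
'Clear contents and formats but keep the rows (DataBodyRange is Nothing if the table is empty)
If Not .DataBodyRange Is Nothing Then .DataBodyRange.Clear
If .ListRows.Count > 1 Then
'Delete every data row except the first, so DataBodyRange still exists
.DataBodyRange.Offset(1).Resize(.ListRows.Count - 1).Rows.Delete
End If
End With
'Paste starting at the first data cell of the destination table
firstTbl.ListColumns(1).DataBodyRange.Resize(, 2).Copy secondTbl.DataBodyRange(1, 1)
End Sub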
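If you only need the values, you can avoid the clipboard. The variation below is a sketch that has not been tested against your workbook. It clears the destination table, resizes it to the source's row count with ListObject.Resize, and assigns the values in one step. The procedure name is hypothetical. It assumes the destination table has no Total row and nothing directly below it:

```vba
Option Explicit

' Sketch: copy the values of the first two columns of srcTbl into the first
' two columns of dstTbl without the clipboard. Assumes dstTbl has no Total
' row and no data directly below it that should stay outside the table.
Public Sub CopyFirstTwoColumnsByValue(ByVal srcTbl As ListObject, _
                                      ByVal dstTbl As ListObject)
    If srcTbl.DataBodyRange Is Nothing Then Exit Sub   ' source table is empty

    Dim src As Range
    Set src = srcTbl.ListColumns(1).DataBodyRange.Resize(, 2)

    ' Clear old values first, so nothing survives if the table shrinks
    If Not dstTbl.DataBodyRange Is Nothing Then dstTbl.DataBodyRange.ClearContents

    ' Header row + one data row per source row; Resize does not insert or move cells
    dstTbl.Resize dstTbl.Range.Resize(src.Rows.Count + 1)

    ' One bulk assignment instead of Copy/Paste
    dstTbl.DataBodyRange.Resize(, 2).Value = src.Value
End Sub
```

Usage: CopyFirstTwoColumnsByValue Sheet1.ListObjects("Table1"), Sheet1.ListObjects("Table2"). Because only values are written, the destination keeps its own formats.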

Sum of a specific range that changes on each iteration of a loop

I have a sheet in which the values of a range change each time I change a specific cell. Let's say that cell C8 holds the identity of a person and column H holds the scheduled monthly repayments. I need the aggregate monthly repayments. For each possible value of C8 (that is, for every person, since the different values of C8 correspond to different people), I need the aggregate of cell Hi. So, keeping row i constant and changing cell C8, I always need to sum Hi. In other words, for a fixed row i I need sum(Hi(c8)) over all values of c8. If c8 takes values from 1 to 200, that is Hi(1) + Hi(2) + ... + Hi(200). Hi(c8) is just notation to show that Hi depends on the value of c8. The actual formula in cell H10 is INDEX('Sheet2'!R:R,MATCH('Sheet1'!$C$8,'Sheet2'!F:F,0)). H11 onwards have the same formula with slight tweaks, because the repayments are not always equal, but the INDEX function remains the same.
Then the total of H10 over all possible values of C8 is pasted in C17, the total of H11 in C18, and so on. [Screenshots of the sheet]
I have the following code for that purpose. Note that the example above was just to explain the background; the actual cells and the range that changes are different.
sub sumloop()
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Sheets("Sheet1").Range("C8").Value = 1
Dim i, k As Integer
i = 1
k = Sheets("Sheet1").Range("C9").Value
Dim LR As Long
LR = Sheets("Sheet1").Range("C" & _
Sheets("Sheet1").Rows.Count).End(xlUp).row
Sheets("Sheet1").Range("C17:C" & LR).ClearContents
Do While i <= k
If (Sheets("Sheet1").Range("J9").Value = "") Then
Sheets("Sheet1").Range("h10:h200").Copy
Sheets("Sheet1").Range("c17").PasteSpecial _
Paste:=xlValues, Operation:=xlAdd, SkipBlanks:= _
False, Transpose:=False
Else
Sheets("Sheet1").Range("h9:h200").Copy
Sheets("Sheet1").Range("c17").PasteSpecial _
Paste:=xlValues, Operation:=xlAdd, SkipBlanks:= _
False, Transpose:=False
End If
Sheets("Sheet1").Range("C8").Value = Sheets("Sheet1").Range("C8").Value+1
i = i + 1
Loop
Sheets("Sheet1").Range("C8").Value = 1
Application.ScreenUpdating = True
Application.DisplayStatusBar = True
End Sub
The If inside the loop is needed because the location of the first value of the range depends on criteria unrelated to the code. k denotes the maximum number of possible values; what I need is approximately 250.
While the code works, it takes approximately 40 seconds to run for 84 values of cell C8 and approximately 1.5 minutes for 250. I tried a few things. I changed Do While to For, with no significant difference. I also used variable ranges instead of fixed ones like h10:h100, similar to what I do with Sheet1.Range("C17:C" & LR), and again saw no significant change. As I am very new to VBA, I don't know if 1.5 minutes is a lot for such simple code, but it seems a lot to me. This analysis is needed for 10 different combinations of 250 values of cell C8, which means approximately 15 minutes.
I would appreciate it if anyone could suggest something faster.
Thank you very much in advance.
Here is a complete solution, with explanations in comments.
Because we do not have your source spreadsheet, I could not run any tests on this.
Option Explicit 'This forces you to declare all your variables correctly. It may seem annoying at first glance, but it will quickly save you time in the future.
Sub sumloop()
Application.ScreenUpdating = False
'Application.DisplayStatusBar = False -> This does not noticeably slow down your code as long as you do not refresh the StatusBar value more than, say, 5-10 times per second.
'Save the existing Calculation Mode to restore it at the end of the Macro
Dim xlPreviousCalcMode As XlCalculation
xlPreviousCalcMode = Application.Calculation
Application.Calculation = xlCalculationManual
'Conveniently store the Sheet into a variable. You might want to do the same with your cells, for example: MyCellWhichCounts = MySheet.Range("c17")
Dim MySheet As Worksheet
Set MySheet = ActiveWorkbook.Sheets("Sheet1") 'Object variables must be assigned with Set
MySheet.Range("C8").Value2 = 1 'It is recommended to use .Value2 instead of .Value (notably if your data type is Currency or Date, but it is good practice to use it all the time)
Dim LR As Long
LR = MySheet.Range("C" & MySheet.Rows.Count).End(xlUp).Row 'MySheet.Rows.Count is the sheet's last row; End(xlUp) then jumps up to the last non-empty cell in column C (formatting alone does not stop it)
MySheet.Range("C17:C" & LR).Value2 = vbNullString 'It is recommended to use vbNullString instead of ""; although I agree it makes it more difficult to read.
Dim i As Long, k As Long 'Integer would work here (it goes up to 32,767), but Long is the usual choice in VBA
k = MySheet.Range("C9").Value2
For i = 1 To k 'Use a For whenever you can, it is easier to maintain (i.e. avoid errors and also for you to remember when you go back to it years later)
'Little extra so you can track progress of your calcs
Dim z As Integer
z = 10 'This can have any value > 0. A low value refreshes the display often but slows the macro down. A high value does not affect performance, but Excel may look frozen and the StatusBar will be updated less often than you might like. As a rule of thumb, I aim to refresh around 5 times per second, which is enough for the end user not to notice anything.
If i Mod z = 0 Then 'Each time i is a multiple of z
Application.StatusBar = "Calculating i = " & i & " of " & k 'We refresh the Statusbar
DoEvents 'Keeps Excel responsive, so it does not show messages like: The application is not responding.
End If
'Set the range
Dim MyResultRange As Range
If (MySheet.Range("J9").Value2 = vbNullString) Then
Set MyResultRange = MySheet.Range("h10:h200")
Else
Set MyResultRange = MySheet.Range("h9:h200")
End If
'# Extract Result Data
MyResultRange.Calculate 'Refresh the Range values
Dim MyResultData As Variant
MyResultData = MyResultRange.Value2 'Store the values in VBA all at once
'# Extract Original Data
Dim MyOriginalRange As Range
Set MyOriginalRange = MySheet.Range("c17").Resize(MyResultRange.Rows.Count, MyResultRange.Columns.Count) 'This produces a Range of the same size as MyResultRange
MyOriginalRange.Calculate 'The range must be Set before it can be used
Dim MyOriginalData as Variant
MyOriginalData = MyOriginalRange.Value2
'# Sum Both Data Arrays
Dim MySumData() as Variant
ReDim MySumData(LBound(MyResultData, 1) To UBound(MyResultData, 1), LBound(MyResultData, 2) To UBound(MyResultData, 2)) 'Size the result like the data array, not the Range object
Dim j as long
For j = lbound(MySumData,1) to ubound(MySumData,1)
MySumData(j,1)= MyResultData(j,1) + MyOriginalData(j,1)
Next j
MyOriginalRange.Value2 = MySumData 'Write the updated totals back to the sheet in one go
MySheet.Range("C8").Value2 = MySheet.Range("C8").Value2 + 1
Next i
MySheet.Range("C8").Value2 = 1
Application.ScreenUpdating = True
Application.StatusBar = False 'Give back the status bar control to the Excel App
Application.Calculation = xlPreviousCalcMode 'Do not forget to restore the Calculation Mode to its previous state
End Sub
Added by OP (see comments):
[Image 1: code from the original question]
[Image 2: code above]
OK, a few things.
Firstly, Dim i, k As Integer doesn't do what you think it does. Only k is declared as Integer; i becomes a Variant. You need to write Dim i As Integer, k As Integer.
Secondly, don't use Integer in VBA; use Long instead, so Dim i As Long, k As Long.
Thirdly, the recalculations are killing you. Turn them off with Application.Calculation = xlCalculationManual at the start of your code and back on with Application.Calculation = xlCalculationAutomatic at the end.
Now the code is really fast, but it no longer updates on each iteration, which you need it to do. You can calculate just a range, like so: Sheets("Sheet1").Range("h10:h200").Calculate. Put that in just before you copy the range.
There will be an even faster way to do this but I just can't seem to wrap my head around your requirements so I am unable to assist further.
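The first point about Dim is easy to check for yourself. In this small sketch (the function name is hypothetical), TypeName shows that only k gets the Integer type, while i is an uninitialised Variant:

```vba
Option Explicit

' Hypothetical demo of the Dim pitfall described above.
Public Function DimPitfallTypes() As String
    Dim i, k As Integer          ' only k is Integer; i is a Variant
    Dim m As Long, n As Long     ' each variable needs its own As clause
    DimPitfallTypes = TypeName(i) & "," & TypeName(k) & "," & _
                      TypeName(m) & "," & TypeName(n)
End Function
```

Running Debug.Print DimPitfallTypes() in the Immediate window prints Empty,Integer,Long,Long.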
Welcome to StackOverflow.
I must admit I got a bit confused by your description, as I did not fully understand whether you are computing sum(a,b,c) or sum(sum(a,b,c), sum(d,e,f), ...).
In any case, a trick that will dramatically speed up your script is to use arrays.
Performing calculations in VBA is not slow, but retrieving data from Excel (communicating with the application) IS slow. The cost depends mostly on the number of requests rather than on the quantity of data requested.
You can use an array to read the data from a range all at once, instead of requesting the value of each cell separately.
Dim Arr() As Variant
Arr = Range("A1:E999")
It is as simple as this.
Give it a try and if you are still struggling let us know.
BONUS
If you are new to arrays, keep in mind you can have a two-dimensional array (note that a variable name cannot start with a digit):
Dim Arr2D(0 To 10, 0 To 50) As Variant
Or a stacked array (an array of arrays), held in a Variant array:
Dim StackedArray() As Variant
ReDim StackedArray(1 To 3)
StackedArray(1) = Range("A1:E999").Value 'each element can hold a whole 2D array
You will need a 2D array to extract the data from a range, but I feel you may need an array of 2D arrays for your sum of sums.
Some recommended reading: https://excelmacromastery.com/excel-vba-array/
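To make the array idea concrete for the loop in the question, here is a sketch that keeps the running totals in a VBA array. It reads H10:H200 once per iteration and writes the totals once at the end. The procedure name is hypothetical. It assumes the layout described in the question (C8 as the driver cell, C9 holding k, results in H10:H200, totals written from C17) and ignores the J9 branch:

```vba
Option Explicit

' Sketch under assumptions: C8 drives the model, C9 holds the number of
' values k, the per-row results are in H10:H200 (the J9 branch of the
' question is ignored), and totals are written from C17 downwards.
Public Sub SumLoopWithArray(ByVal ws As Worksheet)
    Dim prevCalc As XlCalculation
    prevCalc = Application.Calculation
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False

    Dim resultRange As Range
    Set resultRange = ws.Range("H10:H200")

    Dim totals() As Double                      ' running totals, kept in memory
    ReDim totals(1 To resultRange.Rows.Count, 1 To 1)

    Dim k As Long, i As Long, j As Long, rowValues As Variant
    k = ws.Range("C9").Value2
    For i = 1 To k
        ws.Range("C8").Value2 = i
        resultRange.Calculate                   ' recalculate only the result cells
        rowValues = resultRange.Value2          ' one read per iteration
        For j = 1 To UBound(rowValues, 1)
            ' skip text and error values; empty cells count as 0
            If IsNumeric(rowValues(j, 1)) Then totals(j, 1) = totals(j, 1) + rowValues(j, 1)
        Next j
    Next i

    ws.Range("C17").Resize(UBound(totals, 1), 1).Value2 = totals   ' one write at the end
    ws.Range("C8").Value2 = 1
    Application.ScreenUpdating = True
    Application.Calculation = prevCalc
End Sub
```

If the H formulas depend on C8 through intermediate cells outside H10:H200, Range.Calculate may not refresh those cells. Calling ws.Calculate instead is slower but safer.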
How to achieve the same through pivot charts (no VBA)
Step 1
First, you must organize your data in a specific way, where each column is a field and each row is a data entry. If you are not familiar with databases, this is the trickiest point, as you may be used to arranging your data differently.
Long story short, we will take an example with 3 customers and 4 dates.
That gives 12 data entries, providing the repayment value for each combination of customer ID and date.
Step 2
Select that data and insert a PivotChart.
Note: you could insert a PivotTable alone, or a PivotChart alone. I recommend the option where you insert both, as managing your data is more intuitive when working on the chart. The table is updated at the same time as the chart.
Step 3
Make sure all your data is selected, including the top row, which provides the name of each field (the name of each column).
Step 4
A new sheet has just been created, and you can see where both your PivotTable and PivotChart will appear. Select the chart.
Step 5
A menu to the right will appear (it might have already been there, so make sure you selected the Chart and not the Table, as that menu would be slightly different).
Step 6
Drag and drop the field names into the categories as shown.
What you are doing here is telling Excel what data you want to see (Values) and how you want to break it down (per date, and per customer).
Step 7
By default, the date data is grouped by quarter and year. To see every date we have data for, click the [+] next to the dates in the Table: this shows more detail in both the table and the chart.
Step 8
But we want to get rid of the quarters and years completely. To do this, right-click any value in the date column of the Table and choose "Ungroup".
Step 9
Your data now looks like this.
Note the time axis is not to scale. For example, if you have monthly data and a month is missing, there will be no gap. This is one of the difficulties with pivot data. It can be overcome, but that is off topic here.
Step 10
Now we want a cumulative view of the data, so we will change the way Excel processes the values.
Select the chart, then in the right panel: right click on the "Sum of Repayment" field, and select "Value Field Settings".
Step 11
In the "Show Values As" tab, select "Running Total In" from the "Show values as" list.
Then choose "Date".
Here we are telling Excel that the value to display should be a cumulative total, cumulated according to the "Date" field.
Press OK.
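Steps 10 and 11 can also be applied from VBA through the data field's Calculation and BaseField properties. This is a minimal sketch. It assumes the data field is captioned "Sum of Repayment" and the date field is named "Date", as in this example. The procedure name is hypothetical:

```vba
Option Explicit

' Hypothetical helper: show "Sum of Repayment" as a running total over "Date"
' (the VBA equivalent of Value Field Settings > Show Values As > Running Total In).
Public Sub ShowRepaymentAsRunningTotal(ByVal pt As PivotTable)
    With pt.DataFields("Sum of Repayment")
        .Calculation = xlRunningTotal
        .BaseField = "Date"
    End With
End Sub
```

Usage: ShowRepaymentAsRunningTotal ActiveSheet.PivotTables(1). Setting .Calculation back to xlNoAdditionalCalculation restores the plain sum.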
Step 12
You now have what you are looking for. If you look in the Table, you have one column per Customer ID, and one row per date. For a given Date, you have the cumulative repayment made by a given Customer ID. At the very right, you have the Grand Total, which is, for a given date, the sum of all the Customer ID values.
Step 13
The chart still shows the cumulative payment per Customer ID, and we cannot see the grand total.
To show it, simply remove the "Customer ID" field from the "Legend (Series)" area in the Fields panel, as shown. (You can untick the Customer ID box, or drag it from the category area back to the main field list.)
Step 14
Now we only have the Grand total in the chart. But why?
If you open the "Value Field Settings" of "Sum of Repayment" (Step 10), the first tab, "Summarize Values By", tells Excel what to do when several values share the same Legend and Axis values.
Now that we removed the Customer ID field from the Legend area, for each date, we have 3 repayment values (one for each Customer ID). In the field settings, we tell Excel to use a "Sum". So it returns the sum of the 3 values.
But you could play around and return the Average, or even use "Count", which will show you how many records you have (it will return 3).
That is why pivot charts are so powerful: with only a few clicks and/or drag and drop, you can display a myriad of different graphics for your data.
For future reference, look up Filters and "Insert Slicer". A slicer is equivalent to filtering, but it adds buttons directly to your chart, which is great when showing the data to colleagues and switching from one setting to another.
Hope this helped!

RemoveDuplicates is not working as expected

I have a large data set that is exported from a website. I use a macro in my main ‘filter’ workbook to find the file and copy the data from Sheet1 of the exported file into Sheet1 of the filter workbook.
Once the data is copied into Sheet1 of the filter workbook, I use VBA to copy columns A/B/D/F/H/Z/AA/etc from Sheet 1 of the filter workbook into Sheet2 of the filter workbook AND also at the same time, I use this code here to attempt to delete any duplicate rows:
Worksheets("Sheet2").Range("A:DZ").RemoveDuplicates Columns:=15, Header:=xlYes
I am finding though that the RemoveDuplicates is not working as expected.
As an example, Sheet1 in the filter workbook (and in the export workbook) has 3344 rows. When I use conditional formatting to highlight duplicates, I find 314 rows flagged as duplicates. That means 157 genuine rows of data and 157 rows that duplicate them; I haven't found any value duplicated more than once. So on Sheet2 I was expecting to see 3344 - 157 = 3187 rows of real data. I don't see that, or even 3030 rows (3344 - 314). Instead, I get 1897 rows pasted into Sheet2, a difference of 1447 rows (1290 fewer rows than expected).
On top of that, when I check the data manually with Ctrl+F in the column, I find that in some instances both of the duplicated items are missing from Sheet2, rather than just the one duplicate row being deleted.
The duplicate data is not on sequential rows; it is scattered throughout the column in Sheet2. Sorting it before I call RemoveDuplicates does not seem to change the result or make it correct.
I have also tried calling RemoveDuplicates at different points in the code, but the results are always off by the same amount (1447 missing rows instead of 157).
I found only a few articles on this site. This one was the closest, but not quite my issue: Delete Rows With Duplicate Data VBA
Other help sites/forums mention a bug in Office 2007 that prevents this from working (I am using 2013).
Does anyone know where I can find a workaround, if one exists? Or is this still a real bug, or just a problem with the line of code above?
Adding bits of code I use in this example in case it is something within these that is causing the problem…
Part of the copy code:
wsFromSheet.Cells.Copy
wsToFile.Range("A1").PasteSpecial xlPasteValuesAndNumberFormats
Application.CutCopyMode = False
wbFromFile.Close True
Part of the ‘paste column code’:
Sheets("Sheet2").Rows(1 & ":" & Sheets("Sheet2").Columns.Count).ClearContents 'Clear from row 1 down
LastRowFromSiteTracker = xSht.Cells.SpecialCells(xlCellTypeLastCell).Row 'original report has 128 columns
xSht.Range("B1:B" & LastRowFromSiteTracker).Copy ySht.Cells(Rows.Count, "A").End(xlUp) 'customer name
' repeat a bunch of times, then…
Application.CutCopyMode = False 'do I need this line?
Worksheets("Sheet2").Range("A:DZ").RemoveDuplicates Columns:=15, Header:=xlYes
End Sub
Example/sample of data:
Row Source Data Expected Data Actual Data
1 1000474608 1000474608 1000474608 (Dup missing from sheet2)
2 1000474608 1000487672 1000487672
3 1000487672 1000487674 1000487674
4 1000487674 1000487676 1000487676 (missing from sheet2, wasn’t a dup)
5 1000487676 1000487678 1000487678
6 1000487678 1000487680 1000487680
7 1000487680 1000487682 1000487682 (Dup missing from sheet2)
8 1000487682 1000520278 1000520278
9 1000487682 1000520280 1000520280
10 1000520278 1000520282 1000520282 (Is there)
11 1000520280 1000520286 1000520286
12 1000520282 1000520336 1000520336 (Is there)
13 1000520282 1000520338 1000520338
14 1000520286 1000520392 1000520392
15 1000520286 1000520394 1000520394
16 1000520336 1000530333 1000530333
17 1000520338
18 1000520392
19 1000520394
20 1000530333
EDIT:
So I've tried to do some more manual testing, and tried two separate things with the same set of data, getting two different results. I used the Conditional Formatting - Highlight Duplicates from the main Home ribbon and the Remove Duplicates from the Data ribbon.
The Remove Duplicates button finds and removes 163 items in Column P and leaves 3181 rows.
However, the Highlight Duplicates conditional format finds 314 items that are duplicated within Column P, leaving 3030 non-duplicates.
It does not make sense to me that these two numbers do not match. I thought it might have something to do with the duplicates themselves. Most duplicated items have only one duplicate (123123 shows up in two rows), but a small handful are duplicated multiple times (234234 shows up in 4 or 6 rows).
So instead of the manual way, I tried suggestions I found online, and these also give different results when run:
3344 Base records
1897 left after scrub of duplicates (1446 removed)
Dim tmpAddress As String
tmpAddress = "A2:BZ" & Worksheets("ColScrub").UsedRange.Rows.Count
Worksheets("ColScrub").Range(tmpAddress).RemoveDuplicates Columns:=15, Header:=xlNo
3181 left after scrub of duplicates (162 removed)
Cells.Select
ActiveSheet.Range("$A$1:$EI$3345").RemoveDuplicates Columns:=31, Header:=xlYes
My further experience now shows that UsedRange is unreliable for finding the extent of your data. It can include empty cells that were once formatted or used. Also, UsedRange.Rows.Count is a count of rows, not a row number, so it gives the wrong last row whenever the used range does not start in row 1. I have found a better way to get the last row and the last column. These functions rely on 2 basic assumptions, which should hold true for most of your spreadsheets.
For LastRow, there is a "key" column, i.e. a column where there MUST be data, for example an ID column
For LastCol, there should be a header row (or a row where you can guarantee the last column is filled)
With this in mind, I have created the following 2 functions to retrieve the last values accurately, every time... well, almost (my complete function also handles footer rows with merged cells).
For the last row
Public Function Excel_GetLastRow(xlSheet As Excel.Worksheet, _
ByVal KeyColumn As Long) As Long
' This could be adjusted for exact max rows Excel allows
Const MAX_XL_ROWS As Long = 1048000
Excel_GetLastRow = xlSheet.Cells(MAX_XL_ROWS, KeyColumn).End(xlUp).row
End Function
And for last column
Public Function Excel_GetLastCol(xlSheet As Excel.Worksheet, _
ByVal HeaderRow As Long) As Long
' This could be adjusted for exact max columns Excel allows
Const MAX_XL_COLS As Long = 16000
Excel_GetLastCol = xlSheet.Cells(HeaderRow, MAX_XL_COLS).End(xlToLeft).Column ' Cells(row, column): start at the far right of the header row
End Function
Using these values you can now set your complete data range successfully.
top left = Cells(HeaderRow + 1, 1)
bottom right = Cells(LastRow, LastCol)
My complete functions include error handling and allowances for possible merged cells in the footer row and last header column, but you get the idea.
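Putting the two assumptions together, a duplicate-removal routine can build its data range from a key column and a header row, as in the sketch below. The procedure name is hypothetical. The last-row and last-column logic is written inline so the sketch is self-contained:

```vba
Option Explicit

' Hypothetical routine: remove rows that repeat a value in keyColumn.
' Assumes the data starts in column A, headerRow holds a header in every
' used column, and keyColumn is filled on every data row.
Public Sub RemoveDuplicatesOnKey(ByVal ws As Worksheet, _
                                 ByVal headerRow As Long, _
                                 ByVal keyColumn As Long)
    Dim lastRow As Long, lastCol As Long
    lastRow = ws.Cells(ws.Rows.Count, keyColumn).End(xlUp).Row
    lastCol = ws.Cells(headerRow, ws.Columns.Count).End(xlToLeft).Column
    If lastRow <= headerRow Then Exit Sub          ' no data rows

    ' The range starts at the header row, so Header:=xlYes skips it.
    ' Columns:= is relative to the range; since it starts in column A,
    ' keyColumn is also the sheet column number.
    ws.Range(ws.Cells(headerRow, 1), ws.Cells(lastRow, lastCol)) _
        .RemoveDuplicates Columns:=keyColumn, Header:=xlYes
End Sub
```

Usage: RemoveDuplicatesOnKey Worksheets("ColScrub"), 1, 16 removes duplicates on column P. RemoveDuplicates keeps the first occurrence of each key.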
Art
1) You are only clearing as many rows as you have columns, not rows.
Also, you may not be clearing everything, so use UsedRange.Rows for a proper row count.
This line ...
Sheets("Sheet2").Rows(1 & ":" & Sheets("Sheet2").Columns.Count).ClearContents
Should read ...
Sheets("Sheet2").Rows(1 & ":" & Sheets("Sheet2").UsedRange.Rows.Count).ClearContents
Without properly clearing the old data, unpredictable results may occur.
2) Excel VBA seems rather quirky in that many things won't work correctly without specifically "selecting" the object(s) in question AND specifying complete (not columnar) ranges
3) I also prefer to leave out the header row (note the "A2") and pass Header:=xlNo
4) Unless you have more than 78 columns, BZ should be far enough
So add this to your code ...
Dim tmpAddress as String
tmpAddress = "A2:BZ" & Worksheets("Sheet2").UsedRange.Rows.Count
Worksheets("Sheet2").Activate
Worksheets("Sheet2").Range(tmpAddress).RemoveDuplicates Columns:=15, Header:=xlNo
Hope this helps :)
I don't know why (or if/how) this is any different, but this seems to be the closest I can get to true removal of duplicates. I wanted to add it here as an answer for others in similar situations.
Dim lastrow As Long
With ThisWorkbook.Worksheets("ColScrub")
If Application.WorksheetFunction.CountA(.Cells) <> 0 Then
lastrow = .Cells.Find(What:="*", _
After:=.Range("A1"), _
Lookat:=xlPart, _
LookIn:=xlFormulas, _
SearchOrder:=xlByRows, _
SearchDirection:=xlPrevious, _
MatchCase:=False).Row 'Change P1 back to A1 if needed
Else
lastrow = 1
End If
.Range("A1:AZ" & lastrow).RemoveDuplicates Columns:=Array(16), Header:=xlYes
End With
I think I will have to go through each row visually to prove this works and to make sure it isn't deleting things that should not be deleted. But it seems to get rid of the 'double duplicates' (items that appear on 4-6 lines instead of 2 like the other duplicates).
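One way to check the result without inspecting every row relies on a simple fact. When duplicates are removed on a single key column, the number of remaining data rows should equal the number of distinct keys counted beforehand. Below is a minimal sketch of such a counter using Scripting.Dictionary (Windows only). The function name is hypothetical. Treating text case-insensitively is an assumption meant to match Remove Duplicates:

```vba
Option Explicit

' Hypothetical helper: number of distinct values in the first column of a
' 2-D array such as the one returned by a multi-cell Range.Value2.
Public Function CountDistinctValues(ByVal arr As Variant) As Long
    Dim seen As Object, r As Long, c As Long, key As String
    Set seen = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare       ' assumption: ignore text case
    c = LBound(arr, 2)
    For r = LBound(arr, 1) To UBound(arr, 1)
        If IsError(arr(r, c)) Then
            key = "#ERROR"
        Else
            ' prefix the type so the number 1 and the text "1" stay distinct
            key = TypeName(arr(r, c)) & "|" & CStr(arr(r, c))
        End If
        If Not seen.Exists(key) Then seen.Add key, True
    Next r
    CountDistinctValues = seen.Count
End Function
```

Usage: run Debug.Print CountDistinctValues(Worksheets("ColScrub").Range("P2:P" & lastrow).Value2) before removing duplicates, then compare the result with the number of data rows left afterwards. A single-cell range returns a scalar rather than an array, so the range must span at least two rows.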

How can I add a 1 to the most recent, repeated row in Excel?

I have a dataset in Excel with 60,000+ rows and about 20 columns. The ID column sometimes repeats a value, and I want to add a column that returns 1 only in the most recent row for an ID, and only IF that ID repeats.
Here is the example. I have…
ID DATE ColumnX
AS1 Jan-2013 DATA
AS2 Feb-2013 DATA
AS3 Jan-2013 DATA
AS4 Dec-2013 DATA
AS2 Dec-2013 DATA
I want…
ID DATE ColumnX New Column
AS1 Jan-2013 DATA 1
AS2 Feb-2013 DATA 0
AS3 Jan-2013 DATA 1
AS4 Dec-2013 DATA 1
AS2 Dec-2013 DATA 1
I've been trying with a combination of sort and nested if's, but it depends on my data being always in the same order (so that it looks up the ID in the previous row).
Bonus points: my dataset is fairly large for Excel, so the most efficient code, one that won't eat up the processor, would be appreciated!
An approach you could use is to point MSQuery at your table and use SQL to apply the business rules. On the positive side, this runs very quickly (a couple seconds in my tests against 64k rows). A huge minus is the query engine does not seem to support Excel tables exceeding 64k rows, but there might be ways to work around this. Regardless, I offer the solution in case it gives you some ideas.
To set up, first give your data set a named range (I called it MYTABLE) and save. Next, select a cell to the right of your table in row 1, and click through Data | From Other Sources | From Microsoft Query. Choose Excel Files* | OK, and browse for your file. The Query Wizard should open, showing MYTABLE as available; add all the columns. Click Cancel (really), and click Yes when asked whether you want to continue editing.
The MSQuery interface should open, click the SQL button and replace the code with the following. You will need to edit some specifics, such as the file path. (Also, note I used different column names. This was sheer paranoia on my part. The Jet engine is very finicky and I wanted to rule out conflicts with reserved words as I built this.)
SELECT
MYTABLE.ID_X,
MYTABLE.DATE_X,
MYTABLE.COLUMN_X,
IIF(MAXDATES.ID_x IS NULL,0,1) * IIF(DUPTABLE.ID_X IS NULL,0,1) AS NEW_DATA
FROM ((`C:\Users\andy3h\Desktop\SOTEST1.xlsx`.MYTABLE MYTABLE
LEFT OUTER JOIN (
SELECT MYTABLE1.ID_X, MAX(MYTABLE1.DATE_X) AS MAXDATE
FROM `C:\Users\andy3h\Desktop\SOTEST1.xlsx`.MYTABLE MYTABLE1
GROUP BY MYTABLE1.ID_X
) AS MAXDATES
ON MYTABLE.ID_X = MAXDATES.ID_X
AND MYTABLE.DATE_X = MAXDATES.MAXDATE)
LEFT OUTER JOIN (
SELECT MYTABLE2.ID_X
FROM `C:\Users\andy3h\Desktop\SOTEST1.xlsx`.MYTABLE MYTABLE2
GROUP BY MYTABLE2.ID_X
HAVING COUNT(1) > 1
) AS DUPTABLE
ON MYTABLE.ID_X = DUPTABLE.ID_X)
With the code in place MSQuery will complain the query can't be represented graphically. It's OK. The query will execute -- it might take longer than expected to run at this stage. I'm not sure why, but it should run much faster on subsequent refreshes. Once results return, File | Return data to Excel. Accept the defaults on the Import Data dialog.
That's the technique. To refresh the query against new data simply Data | Refresh. If you need to tweak the query you can get back to it though Excel via Data | Connections | Properties | Definition tab.
The code I provided returns your original data plus the NEW_DATA column, which has value 1 if the ID is duplicated and the date is the maximum date for that ID, otherwise 0. This code will not sort out ties if an ID's maximum date is on several rows. All such rows will be tagged 1.
Edit: The code is easily modified to ignore the duplication logic and show most recent row for all IDs. Simply change the last bit of the SELECT clause to read
IIF(MAXDATES.ID_x IS NULL,0,1) AS NEW_DATA
In that case, you could also remove the final LEFT JOIN with alias DUPTABLE.
Sort by ID, then by DATE (ascending). Define entries in new column to be 1 if previous row has the same ID and next row has a different ID or is empty (for last row), 0 otherwise.
It could be done in VBA. I'd be interested to know if this is possible just using formulas, I had to do something similar once before.
Sub Macro1()
Dim rowCount As Long
Sheets("Sheet1").Activate
rowCount = Cells(Rows.Count, 1).End(xlUp).Row
Columns("A:D").Select
Selection.AutoFilter
Range("D2:D" & rowCount).Select
Selection.ClearContents
Columns("A:D").Select
With ActiveWorkbook.Worksheets("Sheet1").AutoFilter.Sort
.SortFields.Clear 'start from a clean sort definition on every run
.SortFields.Add Key:=Range("A1:A" & rowCount), SortOn:=xlSortOnValues 'first key: ID
.SortFields.Add Key:=Range("B1:B" & rowCount), SortOn:=xlSortOnValues 'second key: date (ascending)
.Apply
End With
Dim counter As Long 'Integer would overflow past row 32,767
For counter = 2 To rowCount
Cells(counter, 4) = 1
If Cells(counter, 1) = Cells(counter + 1, 1) Then Cells(counter, 4) = 0
Next counter
End Sub
So you activate the sheet and get the count of rows.
Then turn on the AutoFilter and clear out column D, which holds the 0s and 1s. Next, sort by ID and then by date, as mbroshi suggested (and as you say you're already doing). Finally, loop over every record, setting the value to 1 and then back to 0 if the next row has the same ID, so only the most recent row of each ID keeps a 1. Note that this also flags IDs that appear only once.
Depending on your processor, I don't think this would take more than a minute or two to run. If you do find something using formulas, I'd be interested to see it!

Turning an Excel formula into a VBA function

I'm a bit new to programming and was originally just trying to improve a spreadsheet, but it has gone beyond what a basic Excel function can do. I have a table that my function looks at to find a building number in the first column, and then it checks the start and finish dates in two other columns to decide whether it should populate specific blocks on a calendar worksheet. The problem is that the same building number may appear multiple times with different dates, and I need to find the entry that matches the correct dates.
I was able to create a working, though complicated, formula to find the first instance, and I learned I can nest the same formula again in the false branch of the IF with a slight change. I can keep doing that, but it becomes very large and cumbersome. I'm trying to find a way to make a function for the formula with a variable in it that tracks how many times it has already been used, so it keeps searching down the table for an answer that fits the parameters.
This is currently my formula:
=IFERROR(IF(AND(DATE('IF SHEET (2)'!$F$7,MATCH('IF SHEET (2)'!$C$2,'IF SHEET (2)'!$C$2:'IF SHEET (2)'!$N$2,0),'IF SHEET (2)'!C$4)>=VLOOKUP("2D11"&1,A2:F6,4,0),DATE('IF SHEET (2)'!$F$7,MATCH('IF SHEET (2)'!$C$2,'IF SHEET (2)'!$C$2:'IF SHEET (2)'!$N$2,0),'IF SHEET (2)'!C$4)<=VLOOKUP("2D11"&1,A2:F6,4,0)),IF(VLOOKUP("2D11"&1,A2:F6,3,0)="2D11",VLOOKUP("2D11"&1,A2:F6,6,FALSE)),"NO ANSWER"),"ERROR")
Where you see "2D11"&1, the 1 is where I need the variable, so it would be "number of times it's been used in the function + 1". Then I could just loop it so it keeps checking until it runs out of 2D11s or finds one that matches. I haven't posted before and I'm doing this through a lot of trial and error, so if you need more info, please say so and I'll try to provide it.
So rather than have someone try to make sense of the ridiculous formula I posted, I thought I would try to make it simpler by just stating what I need to accomplish and seeing how to turn that into a VBA function. So I'm kinda looking at a few steps:
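The "number of times it's been used + 1" idea amounts to finding the n-th occurrence of a building name. A minimal sketch in VBA, assuming the building names are passed as a 1-D array (for a single-column range, `Application.Transpose(Range("A2:A6").Value)` produces one); NthMatch is a hypothetical name, and it returns a 1-based position like MATCH does.

```vba
' Hypothetical helper: returns the 1-based position of the n-th occurrence
' of key in the 1-D array values, or 0 if key occurs fewer than n times.
Public Function NthMatch(key As Variant, values As Variant, n As Long) As Long
    Dim i As Long, seen As Long
    For i = LBound(values) To UBound(values)
        If values(i) = key Then
            seen = seen + 1
            If seen = n Then
                ' Convert the array index to a position counted from 1
                NthMatch = i - LBound(values) + 1
                Exit Function
            End If
        End If
    Next i
    NthMatch = 0   ' fewer than n occurrences
End Function
```

Calling it with n = 1, 2, 3, ... until it returns 0 walks through every 2D11 row in turn, which is the loop the nested formula was trying to emulate.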
1. Match the first instance of the building name in column A with the building name for the row of the output cell.
2. Is the date connected with the output cell >= the start date of that entry (user-entered in column D)?
3. Is the date connected with the output cell <= the end date of that entry (user-entered in column E)?
4. If Parts 1, 2, and 3 are all true, enter the unit name (located in column F) from that row.
5. If Part 1, 2, or 3 is false, loop to the next instance of the building name down column A.
Hopefully this makes things clearer than the formula so I'm able to get help as I'm still pretty stuck due to low knowledge of VBA.
Here is a simple solution...
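Public Function FirstInstance(buildingName As Variant, targetDate As Date) As Variant
    Dim lastRow As Long
    Dim i As Long

    ' Returned when no row matches the building name and date
    FirstInstance = "NO ANSWER"

    ' Last used row in column A (unqualified Cells refers to the active sheet)
    lastRow = Cells(Rows.Count, 1).End(xlUp).Row
    For i = 1 To lastRow
        If Cells(i, 1).Value = buildingName Then
            ' Start date in column D, end date in column E
            If targetDate >= Cells(i, 4).Value And targetDate <= Cells(i, 5).Value Then
                ' Unit name in column F; Exit For keeps the FIRST match
                FirstInstance = Cells(i, 6).Value
                Exit For
            End If
        End If
    Next i
End Function
If the table has no matching instance, the function returns "NO ANSWER", the same fallback text the original formula used.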
If I understand correctly, you have a Table T1 made of 3 columns: T1.building, T1.start date, T1.end date.
Then you have 3 parameters: P1=building, P2=start date, P3=end date.
You need to find the first entry in table T1 that "fits" within the input parameters dates, that is:
P1=T1.building
P2<=T1.start date
P3>=T1.end date
If so, you can define a custom function like this
Public Function MyLookup(Key As Variant, DateMin As Variant, DateMax As Variant, LookUpTable As Range, ResultColumn As Long) As Range
    Dim iIndx As Long
    Dim KeyValue As Variant
    Dim Found As Boolean
    On Error GoTo ErrHandler
    Found = False
    iIndx = 1
    ' Scan the table top-down and stop at the first row that fits
    Do While (Not Found) And (iIndx <= LookUpTable.Rows.Count)
        KeyValue = LookUpTable.Cells(iIndx, 1)
        ' Column 1 = building, column 2 = start date, column 3 = end date
        If (KeyValue = Key) And _
           (DateMin <= LookUpTable.Cells(iIndx, 2)) And _
           (DateMax >= LookUpTable.Cells(iIndx, 3)) Then
            Set MyLookup = LookUpTable.Cells(iIndx, ResultColumn)
            Found = True
        End If
        iIndx = iIndx + 1
    Loop
    Exit Function
ErrHandler:
    MsgBox "Error in MyLookup: " & Err.Description
End Function
That may not be the most performant piece of code in the world, but I think it's explanatory.
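One design point if you call MyLookup from a worksheet cell: when no row fits, the function returns an unset Range, which typically shows up as #VALUE!. A minimal variant, assuming you would rather get #N/A (so IFERROR can catch it) and a plain value instead of a Range; MyLookupOrNA is a hypothetical name, and the column layout is the same as in MyLookup.

```vba
' Hypothetical variant of MyLookup: returns the cell value of the first
' fitting row, or #N/A when no row fits (instead of a Range left as Nothing).
Public Function MyLookupOrNA(Key As Variant, DateMin As Variant, DateMax As Variant, _
                             LookUpTable As Range, ResultColumn As Long) As Variant
    Dim iIndx As Long
    For iIndx = 1 To LookUpTable.Rows.Count
        ' Column 1 = building, column 2 = start date, column 3 = end date
        If LookUpTable.Cells(iIndx, 1).Value = Key And _
           DateMin <= LookUpTable.Cells(iIndx, 2).Value And _
           DateMax >= LookUpTable.Cells(iIndx, 3).Value Then
            MyLookupOrNA = LookUpTable.Cells(iIndx, ResultColumn).Value
            Exit Function
        End If
    Next iIndx
    ' No row fitted: return a worksheet #N/A error value
    MyLookupOrNA = CVErr(xlErrNA)
End Function
```

In a cell this could be used as, for example, `=IFERROR(MyLookupOrNA("2D11", D1, E1, A2:F6, 6), "NO ANSWER")`.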