How to select the last cell in a range in Excel

I've got a spreadsheet that updates with data throughout the day. I need to be able to grab the last cell in a column, but for certain date ranges, not just the last cell in the column.
Column C contains the data I need, column A and B update with the date and time, (some cells in column A could be blank too). Column D I can change to make column E display the latest data for the selected date.
Here's what I've got so far to put in column E:
=VLOOKUP(D1, $A:$C,3,FALSE)
I've managed to get data from my formula but only the first entry. For example if I enter the date 17/05/2016 it will return '5'. Whereas I need the more recent data '28'.
Example sheet:
A | B | C | D | E
16/05/2016 | 08:00:00 | 3 | date | data
16/05/2016 | 12:00:00 | 7
16/05/2016 | 18:00:00 | 15
16/05/2016 | 22:00:00 | 27
17/05/2016 | 08:00:00 | 5
17/05/2016 | 12:00:00 | 11
17/05/2016 | 18:00:00 | 21
17/05/2016 | 22:00:00 | 28
18/05/2016 | 08:00:00 | 4
18/05/2016 | 12:00:00 | 13
18/05/2016 | 18:00:00 | 19
18/05/2016 | 22:00:00 | 30
I've only just started getting my head around excel formulas so any help would be greatly appreciated!

=INDEX(C2:C13,MATCH(D3,A2:A13,1))
INDEX/MATCH is a very powerful combination. It can do the same job as VLOOKUP and then a bit more. VLOOKUP is restricted to searching the first column and returning information from a column to its right. With MATCH you can search any column, and with INDEX you can return information from any column (even one to the left, which VLOOKUP can't do).
Start reading with the MATCH function: it searches for the value in D3 within the range A2:A13 and returns an integer giving the position within that range where D3 was found. The 1 at the end tells MATCH to find the largest value that is less than or equal to D3; when the date occurs several times, it lands on the last occurrence. This means that column A needs to be sorted in ASCENDING order.
INDEX takes the integer from MATCH and goes down that many rows in the specified range, so if MATCH returned 1, it would read C2.
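To see why the approximate match lands on the last entry for a date, here is a minimal Python sketch of the same lookup. It is not Excel code: the function name last_value_on_or_before is made up for illustration, and it assumes the dates are already sorted ascending, just as MATCH with 1 requires.

```python
from bisect import bisect_right
from datetime import date


def last_value_on_or_before(dates, values, lookup):
    """Rough analogue of INDEX(values, MATCH(lookup, dates, 1)) on ascending dates."""
    # bisect_right counts the entries <= lookup, which is the 1-based
    # position an approximate match reports on sorted data.
    pos = bisect_right(dates, lookup)
    if pos == 0:
        return None  # lookup is earlier than every date (Excel would show #N/A)
    return values[pos - 1]


# Sample data copied from the question's example sheet.
dates = [date(2016, 5, 16)] * 4 + [date(2016, 5, 17)] * 4 + [date(2016, 5, 18)] * 4
values = [3, 7, 15, 27, 5, 11, 21, 28, 4, 13, 19, 30]

print(last_value_on_or_before(dates, values, date(2016, 5, 17)))
```

One consequence worth knowing: if the date you type is not in the list at all, this kind of lookup falls back to the last entry of the nearest earlier date instead of reporting an error.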

Related

Excel: Difference in hours between duplicates

I am having a problem, hope you can help.
I need to have the difference in hours between duplicates. Example:
Date Time | SESSION_ID | Column I need
24/01/2020 10:00 | 100 | NaN
24/01/2020 11:00 | 100 | 1
14/03/2020 12:00 | 290 | NaN
16/03/2020 13:00 | 254 | NaN
16/03/2020 14:00 | 100 | 1251
In session_ID column, there are 3 duplicates with value 100.
I need to know the difference in hours between those sessions, which would be 1 hour between the first and the second, and 1 251 hours between the second and the third.
Does anyone have any clue how this could be done?
If one has the Dynamic Array formula XLOOKUP, put this in C2 and copy down:
=IF(COUNTIF($B$1:B1,B2),A2-XLOOKUP(B2,$B$1:B1,$A$1:A1,,0,-1),"NaN")
Then format the column: [h]
If not then use INDEX/AGGREGATE in its place:
=IF(COUNTIF($B$1:B1,B2),A2-INDEX(A:A,AGGREGATE(14,7,ROW($B$1:B1)/($B$1:B1=B2),1)),"NaN")
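Both formulas do the same thing: for each row they look upwards for the most recent earlier row with the same SESSION_ID and subtract its timestamp. A small Python sketch of that logic, using the sample rows from the question (the function name hours_since_previous is only illustrative, and None stands in for "NaN"):

```python
from datetime import datetime


def hours_since_previous(rows):
    """rows: list of (timestamp, session_id) in sheet order."""
    last_seen = {}  # session_id -> timestamp of its most recent earlier row
    result = []
    for stamp, session in rows:
        previous = last_seen.get(session)
        if previous is None:
            result.append(None)  # first time this session appears
        else:
            result.append((stamp - previous).total_seconds() / 3600)
        last_seen[session] = stamp
    return result


rows = [
    (datetime(2020, 1, 24, 10, 0), 100),
    (datetime(2020, 1, 24, 11, 0), 100),
    (datetime(2020, 3, 14, 12, 0), 290),
    (datetime(2020, 3, 16, 13, 0), 254),
    (datetime(2020, 3, 16, 14, 0), 100),
]
print(hours_since_previous(rows))
```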

Excel: Remove Duplicates based on time condition

I'm looking to remove duplicates from a 250,000 row excel sheet based on a 3 month rolling time condition.
We have a lot of user IDs and the dates on which they visited, but a lot of these visits are very far apart (sometimes over a year) and a lot of them are within the same day or couple-of-days period.
The best way to explain what I want to do is with an example. So if they first visited on 1st Jan, 1st Jan, 3rd Jan, 8th Feb, 4th June, 5th June, 1st Dec, 1st Dec, 2nd Dec, I would want to grab that first date of 1st Jan, 4th June and 1st Dec.
If they visited 1st Jan, 1st Jan, 3rd Jan, 8th Feb, 9th Apr then 1st August, 1st Sept, I would want 1st Jan and 1st August.
So we want to grab the first date, then see how often they visit within 3 months of each visit and if they leave for more than a 3 month period, grab the first date that they return. Sometimes they come back 4 or 5 times after 3 months and the data can span several years.
Is there a way for me to achieve this? It would be great to get some help as this is driving me mad.
Cheers
If the UserID is in column A and the VisitDate is in B with the headings in row 1 and then a blank row in 2 and the data starting in row 3 then try this (explanation below):
Array Formula version:
sort the rows ascending by VisitDate
in B2 put 1/1/1900 so it won't match anything (but it has to be a date)
in C3 put this array formula (press control-shift-enter instead of just enter):
=SUM((B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3))=SUM((A$2:A2=A3)*1)
Copy the formula in C3 down to every row of data
Filter on Unique = TRUE
if you want to re-sort, you will first need to copy column C and paste it back as values
New non-array formula version:
sort the rows ascending by VisitDate
in B2 put 1/1/1900 so it won't match anything (but it has to be a date)
in C3 put this normal formula (just press enter):
=COUNTIFS(B$2:B2,"<"&DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)),A$2:A2,A3)=COUNTIF(A$2:A2,A3)
Copy the formula in C3 down to every row of data
Filter on Unique = TRUE
if you want to re-sort, you will first need to copy column C and paste it back as values
This produces the following with my sample data (array formulas may take a very long time to calculate for lots of rows):
| A | B | C
---+--------+------------+--------
1 | UserID | VisitDate | Unique
2 | | 1/01/1900 |
3 | a | 1/01/2017 | TRUE
4 | a | 1/01/2017 | FALSE
5 | b | 2/01/2017 | TRUE
6 | b | 2/01/2017 | FALSE
7 | a | 3/01/2017 | FALSE
8 | c | 3/01/2017 | TRUE
9 | c | 3/01/2017 | FALSE
10 | b | 4/01/2017 | FALSE
11 | c | 5/01/2017 | FALSE
12 | a | 8/02/2017 | FALSE
13 | b | 9/02/2017 | FALSE
14 | c | 10/02/2017 | FALSE
15 | a | 4/06/2017 | TRUE
16 | a | 5/06/2017 | FALSE
17 | b | 5/06/2017 | TRUE
18 | b | 6/06/2017 | FALSE
19 | c | 6/06/2017 | TRUE
20 | c | 7/06/2017 | FALSE
21 | a | 1/12/2017 | TRUE
22 | a | 1/12/2017 | FALSE
23 | a | 2/12/2017 | FALSE
24 | b | 2/12/2017 | TRUE
25 | b | 2/12/2017 | FALSE
26 | b | 3/12/2017 | FALSE
27 | c | 3/12/2017 | TRUE
28 | c | 3/12/2017 | FALSE
29 | c | 4/12/2017 | FALSE
Because the formula compares the current row with all the rows above looking for rows with dates in the past the data needs to be sorted with the oldest dates first.
How the array formula works:
=SUM((B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3))=SUM((A$2:A2=A3)*1)
DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)) is 3 months ago (even if it is 92 days)
(B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3))) is an array of TRUE/FALSE values which has a TRUE for every row above that is older than 3 months ago
(A$2:A2=A3) is an array of TRUE/FALSE values which has a TRUE for every row above that matches the user ID
(B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3) does an AND of the arrays so 1 is returned (TRUE*TRUE=1) for each row above that has the same name and a date that is older than 3 months ago
SUM((B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3)) adds all the TRUE rows above that have the same name and a date that is older than 3 months ago
SUM((A$2:A2=A3)*1) adds the number of rows above that have the same name (TRUE*1=1)
=SUM((B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3))=SUM((A$2:A2=A3)*1) compares the two sums and returns TRUE if all the rows above that have the same name are all older than 3 months ago
Methodology:
I originally just played with a column of dates - no userID. I wanted to find a way to know if the date on a particular row was more than 3 months after all the dates before it (I implicitly assumed that the dates were sorted). I reasoned that if a count of the dates before the current row matched a count of the dates before the current row that were older than 3 months in the past then I would have the answer I wanted. So I originally put this formula in C3 and copied it down:
=COUNTIF(B$2:B2,"<"&(B3-90))=COUNTA(B$2:B2)
Then change it to 3 months instead of 90 days:
=COUNTIF(B$2:B2,"<"&DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))=COUNTA(B$2:B2)
And then to add the userID we need a way to compare multiple criteria - this is where COUNTIFS comes in (if you have Excel 2007 or better):
=COUNTIFS(B$2:B2,"<"&DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)),A$2:A2,A3)=COUNTIF(A$2:A2,A3)
And then I converted it to this array formula:
=SUM((B$2:B2<DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)))*(A$2:A2=A3))=SUM((A$2:A2=A3)*1)
In retrospect I don't know if giving the array formula was a good idea or not: I don't know whether the array formula would be better/faster than COUNTIFS or not. So use whichever you prefer.
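If you want to sanity-check the logic outside Excel, here is a rough Python sketch of the same test. The helper names three_months_before and flag_unique are made up for this example; the first one tries to imitate how DATE(YEAR(B3),MONTH(B3)-3,DAY(B3)) rolls over when the day does not exist in the target month. Like the formulas, it assumes the rows are sorted by visit date, oldest first.

```python
from datetime import date, timedelta


def three_months_before(d):
    """Imitate DATE(YEAR(d), MONTH(d)-3, DAY(d)), including day overflow."""
    month_index = d.year * 12 + (d.month - 1) - 3
    year, month0 = divmod(month_index, 12)
    # Start at the 1st of the target month and add the days, so that
    # e.g. "31 Feb" rolls forward into March the way Excel's DATE does.
    return date(year, month0 + 1, 1) + timedelta(days=d.day - 1)


def flag_unique(visits):
    """visits: list of (user_id, visit_date) sorted ascending by date."""
    flags = []
    for i, (user, visit) in enumerate(visits):
        cutoff = three_months_before(visit)
        earlier = [d for u, d in visits[:i] if u == user]
        # TRUE when every earlier visit by this user is older than the cutoff
        # (also TRUE when there are no earlier visits at all).
        flags.append(all(d < cutoff for d in earlier))
    return flags


# User "a" from the sample table above.
visits_a = [("a", date(2017, m, d)) for d, m in
            [(1, 1), (1, 1), (3, 1), (8, 2), (4, 6), (5, 6), (1, 12), (1, 12), (2, 12)]]
print(flag_unique(visits_a))
```

Like the copied-down formula, this rescans all earlier rows for every row, so it also gets slow on very large inputs.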

Sum only values which fall Monday to Friday

I receive a statement (as a .xls) each month which lists a bunch of billable items with an associated date. I want to create a formula (using either =sum() or =sumifs()) to total the billable items, but only those which fall Monday to Friday (i.e., not weekends). Is that possible?
A B
------+--------------+-------------
1 | 05/12/2016 | $10.00
2 | 06/12/2016 | $10.00
3 | 07/12/2016 | $10.00
4 | 08/12/2016 | $10.00 dates are formatted as
5 | 09/12/2016 | $10.00 dd/mm/yyyy
6 | 10/12/2016 | $10.00
7 | 11/12/2016 | $10.00
8 | 12/12/2016 | $10.00
------+--------------+-------------
| Sum | $80.00
------+--------------+-------------
| Sum |
| (no weekends)| $60.00
------+--------------+-------------
EDIT:
I've just looked closer at the excel doc, and it's actually a datetime field, e.g. 31/10/2016 12:44:00 pm (displayed as 31/10/16 12:44).
I'm also not looking for a formula which works line by line, I'd like something which I can just copy and paste into a single cell at the bottom of the doc each month which examines A:A.
You need to use this formula:
=SUMPRODUCT(B1:B8,--(WEEKDAY(A1:A8,2)<6))
This is a hack which behaves like SUMIF but lets you use a function in your criteria. Otherwise, you would need to create an auxiliary column with WEEKDAY (in C for example) and then use =SUMIF(C1:C8,"<6",B1:B8).
WEEKDAY by default returns 1-7 for SUN-SAT. As this doesn't help, you can change the return type to type 2 with the optional second parameter to make the function return 1-7 for MON-SUN, which lets you do the easy <6 comparison. You can also use type 3, which returns 0-6 for MON-SUN, and then obviously use <5 instead.
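About the -- hack: the double unary minus turns the TRUE/FALSE results of the comparison into 1/0, so SUMPRODUCT can multiply them by the values in column B.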
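As a cross-check of what the formula computes, here is a small Python sketch with the sample data from the question. The name weekday_total is just for illustration; isoweekday() numbers the days 1-7 for MON-SUN, the same as WEEKDAY with return type 2, and it works on datetime values too, so the time part does not matter.

```python
from datetime import datetime


def weekday_total(rows):
    """Sum the amounts whose timestamp falls Monday to Friday."""
    # isoweekday() gives 1-7 for Mon-Sun, so < 6 keeps Monday to Friday.
    return sum(amount for stamp, amount in rows if stamp.isoweekday() < 6)


# 05/12/2016 to 12/12/2016 (dd/mm/yyyy), $10.00 each, as in the question.
rows = [(datetime(2016, 12, day, 12, 44), 10.0) for day in range(5, 13)]
print(weekday_total(rows))
```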

Calculate Average based on multiple condition in Excel

I am back with my new excel question.
Lets say I have table like this.
| A | B
------------------------------------------
1 | ENV | Value
------------------------------------------
2 | ABC - 10/1/2014 1:38:32 PM | 4
3 | XYZ - 10/1/2014 1:38:32 PM | 6
4 | ABC - 9/1/2014 1:38:32 PM | 1
5 | XYZ - 10/1/2014 1:38:32 PM | 10
6 | ABC - 10/1/2014 1:38:32 PM | 7
7 | XYZ - 9/1/2014 1:38:32 PM | 1
8 | ABC - 9/1/2014 1:38:32 PM | 10
9 | ABC - 10/1/2014 1:38:32 PM | 7
10 | XYZ - 10/1/2014 1:38:32 PM | 7
Now, in Cell C2, I've selected ABC.
So in cell D2, I want the average (from col B) of all the "ABC" (col A) where Month = 10 (col A) and in cell E2, Max (from col B) of all the "ABC" where Month = 10 (col A).
So, my result in cells D2 and E2 would be 6 and 7 respectively.
I hope my question and example make sense.
UPDATE:
Thank you all for all your help.
Now let's say I am not sure how many rows I'll have on this spreadsheet, so I came up with this formula, but it's not working; it gives me a #DIV/0! error.
*Note: I am using formula to get "ABC" and "10" from cell C2.
=AVERAGEIFS(
(OFFSET($A$1,1,1,COUNTA($B:$B)-1,1)),
OFFSET($A$1,1,0,COUNTA($A:$A)-1,1), (MID(C2,1,(FIND("-",C2))-2)),
OFFSET($A$1,1,0,COUNTA($A:$A)-1,1), (MID(C2,(FIND("-",C2)+1),(FIND("/",C2))-(FIND("-",C2)+1))))
Even tried this, but same error:
=SUMPRODUCT(((MID(A2:A10,1,(FIND("-",A2:A10))-1))=(MID(C2,(FIND("-",C2)+1),(FIND("/",C2))-(FIND("-",C2)+1))))*
(MONTH(DATEVALUE(MID(A2:A10,7,99)))=(MID(C2,(FIND("-",C2)+1),(FIND("/",C2))-(FIND("-",C2)+1))))*
(B2:B10))/SUMPRODUCT(((MID(A2:A10,1,(FIND("-",A2:A10))-1))=(MID(C2,(FIND("-",C2)+1),(FIND("/",C2))-(FIND("-",C2)+1))))*
(MONTH(DATEVALUE(MID(A2:A10,7,99)))=(MID(C2,(FIND("-",C2)+1),(FIND("/",C2))-(FIND("-",C2)+1)))))
Can you help me with this...?
Solution with Intermediary Values
To solve the issue (I tested the average only) I first used 2 intermediary values: this solution is not optimal and there will be many smarter ways to address the issue (e.g. pivot tables).
ENV Value Intermediary 1 Intermediary 2
ABC - 10/1/2014 1:38:32 PM 4 ABC 10
XYZ - 10/1/2014 1:38:32 PM 6 XYZ 10
ABC - 9/1/2014 1:38:32 PM 1 ABC 9
XYZ - 10/1/2014 1:38:32 PM 10 XYZ 10
The first intermediary column contains the first 3 chars of ENV column (=LEFT(A9,3)), while the second intermediary column contains the month (=MID(A9,7,2)). This works only if your ENV records are fixed size and homogeneous (e.g. your env name has exactly 3 chars).
With this layout, you can compute the average putting in any cell the following formula:
=AVERAGEIFS(D9:D12, F9:F12,"=ABC", G9:G12, "=10")
Where D9:D12 is the values interval, F9:F12 is the 1st intermediary column and G9:G12 the second intermediary column.
One Shot Compact Solution (Arrays)
A more compact solution relies on arrays. For instance, to calculate the average and the max of an interval based on 2 "vectorial" conditions you can write these one-liners:
= MAX(IF((LEFT(A9:A12,3)="ABC")*(MID(A9:A12,7,2)="10"),D9:D12))
= AVERAGE(IF((LEFT(A9:A12,3)="ABC")*(MID(A9:A12,7,2)="10"),D9:D12))
With A9:A12 your original records, and D9:D12 is the values interval.
The advantages of this solution are that you don't need any intermediary column and that you can extend this approach to all the other formulas that don't have 'xxxxxIFS' (it's the case for MAX).
NOTE: you have to confirm this formula with CTRL + SHIFT + RETURN or your formula will fail with #VALUE error.
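For comparison, the same two-condition filter can be sketched in Python using the full table from the question. This is only an illustration (the name month_stats is made up), and it assumes the date part is written month-first. Because it splits on " - " and "/" instead of taking fixed character positions, it also copes with one-digit months such as 9.

```python
def month_stats(rows, env, month):
    """rows: list of (label, value), label like 'ABC - 10/1/2014 1:38:32 PM'."""
    picked = []
    for label, value in rows:
        name, stamp = label.split(" - ", 1)
        # Assumption: the date is month-first (m/d/yyyy).
        if name == env and int(stamp.split("/", 1)[0]) == month:
            picked.append(value)
    if not picked:
        return None, None  # nothing matched, so there is no average or max
    return sum(picked) / len(picked), max(picked)


rows = [
    ("ABC - 10/1/2014 1:38:32 PM", 4),
    ("XYZ - 10/1/2014 1:38:32 PM", 6),
    ("ABC - 9/1/2014 1:38:32 PM", 1),
    ("XYZ - 10/1/2014 1:38:32 PM", 10),
    ("ABC - 10/1/2014 1:38:32 PM", 7),
    ("XYZ - 9/1/2014 1:38:32 PM", 1),
    ("ABC - 9/1/2014 1:38:32 PM", 10),
    ("ABC - 10/1/2014 1:38:32 PM", 7),
    ("XYZ - 10/1/2014 1:38:32 PM", 7),
]
print(month_stats(rows, "ABC", 10))
```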
You can start by splitting column A into letters and a date using Data > Text to Columns with the delimiter " - ".
After you have the two new columns (let's say F and G), you can use AVERAGEIFS with conditions that check whether the value of the cell in F is ABC and whether MONTH(cell in G) = 10. AVERAGEIFS cannot call MONTH itself, so the month needs its own helper column.
As for the max, you can do the same with MAX(IF....) for column E.
SUMPRODUCT will allow you to parse the left-most and date characters from your combined string. A pseudo-MAXIF() can be similarly constructed using MAX() and INDEX().
In D2 use =SUMPRODUCT((LEFT(A2:A10,3)="ABC")*(MONTH(DATEVALUE(MID(A2:A10,7,99)))=10)*(B2:B10))/SUMPRODUCT((LEFT(A2:A10,3)="ABC")*(MONTH(DATEVALUE(MID(A2:A10,7,99)))=10))
In E2 use =MAX(INDEX((LEFT(A2:A10,3)="ABC")*(MONTH(DATEVALUE(MID(A2:A10,7,99)))=10)*(B2:B10),,))
Both SUMPRODUCT and INDEX like to choke on anything remotely resembling an error when parsing text so keep the cell range references to what your actual data is and avoid blanks.

How to automatically delete rows in Excel

Consider the following (partial) Excel worksheet:
A | B | C | D
---+-------------+-------+-------
id | date | var_a | var_b
1 | 2011-03-12 | 200 | 34.22
1 | 2011-03-13 | 203 | 35.13
1 | 2011-03-14 | 205 | 34.14
1 | 2011-03-15 | 207 | 54.88
1 | 2011-03-16 | 208 | 12.01
1 | 2011-03-18 | 203 | 76.10
1 | 2011-03-19 | 210 | 14.86
1 | 2011-03-20 | 200 | 25.45
. | . | . | .
. | . | . | .
2 | 2011-03-12 | 200 | 34.22
2 | 2011-03-13 | 203 | 35.13
2 | 2011-03-14 | 205 | 34.14
2 | 2011-03-15 | 207 | 54.88
2 | 2011-03-16 | 208 | 12.01
2 | 2011-03-18 | 203 | 76.10
2 | 2011-03-19 | 210 | 14.86
2 | 2011-03-20 | 200 | 25.45
. | . | . | .
. | . | . | .
In reality, there are over 5,000 rows. I need to delete all rows whose date falls on a Saturday or Sunday. In the example, March 12 and 13 (2011-03-12/13) and March 19 and 20 are Saturdays and Sundays. I cannot just delete every nth row, since there might be days missing in the list (as is the case here with 2011-03-17).
Is this possible to do with either a formula or VBScript? I have never written a VBScript macro before (I have never had a use for it) so I would appreciate some help.
If you only need to do this once, this is what I would do. This should preserve the order, but if you're really worried about it, read the very end of the post:
Add a new column, call it "Is Weekend". In it, put =if(WEEKDAY(B2, 2) > 5, 1, 0). Drag that formula down for the entire table.
Filter the columns. To do that, select the entire table (click on any table cell then hit Ctrl-A), then
On Excel 2007+, go to Data-> click "Filter"
On Excel 2003, go to Data->Filter->Auto Filter.
Sort everything by last column (Is Weekend) in descending order. This should put all weekend rows together without altering the order among the other rows.
Delete all rows with 1 in the "Is Weekend" column. Delete that column.
If you're really worried about preserving order, before you do the above, you can do the following:
Add a new column called "Position". Put 1 in the first row, 2 in the second row, select them and drag it down to the bottom so every row has its own position number in increasing order.
Perform the filtering as above.
After you're done, sort everything in ascending order by "Position" column.
The trick is that you don't need to delete those rows, you need to replace their values for C and D with 0. This is easiest done with IF() and WEEKDAY() within two new columns C' and D' referencing C and D. Feel free to then just delete C and D.
You can do this in one go using an array formula. In cell E2, enter the following formula (on one line), and confirm with Ctrl-Shift-Enter (as opposed to the regular Enter)
=INDEX($A$2:$D$5000, SMALL(IF(WEEKDAY($B$2:$B$5000,2)>5, "",
ROW($B$2:$B$5000)-MIN(ROW($B$2:$B$5000))+1), ROW(A1)),COLUMN(A1))
5000 indicates the number of rows in your spreadsheet. After this, the formula should have curly braces around it to indicate it is an array formula. E2 should have the value 1. Then select cell E2 and drag the lower-right corner of the cell to the right until 4 cells are covered. Then drag the lower-right corner of the 4-cell-selection all the way down. At the bottom you will see rows containing #NUM!, one for each deleted row. You can delete those in the regular way.
Instead of starting off in cell E2, you could start off in cell A2 of a new sheet. In that case, you need to prepend the original sheet name to each reference in the formula, as in OriginalSheet!$A$2
This formula is an adaptation of the one given in Excel: Remove blank cells
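The idea behind the array formula is: collect the positions of all rows that are not weekends, then pick the k-th smallest position for the k-th output row. A minimal Python sketch of that idea (the name kth_weekday_row is invented for this example, and None stands in for the #NUM! error):

```python
from datetime import date


def kth_weekday_row(rows, k, date_col=1):
    """Return the k-th (1-based) row whose date is Mon-Fri, or None if there are fewer."""
    # Positions of the rows to keep, in order; this plays the role of the
    # IF(...) array, and picking element k-1 plays the role of SMALL(..., k).
    positions = [i for i, row in enumerate(rows) if row[date_col].isoweekday() <= 5]
    if k > len(positions):
        return None  # corresponds to #NUM! in the sheet
    return rows[positions[k - 1]]


# First block of the example worksheet: (id, date, var_a, var_b).
rows = [
    (1, date(2011, 3, 12), 200, 34.22),
    (1, date(2011, 3, 13), 203, 35.13),
    (1, date(2011, 3, 14), 205, 34.14),
    (1, date(2011, 3, 15), 207, 54.88),
    (1, date(2011, 3, 16), 208, 12.01),
    (1, date(2011, 3, 18), 203, 76.10),
    (1, date(2011, 3, 19), 210, 14.86),
    (1, date(2011, 3, 20), 200, 25.45),
]
print(kth_weekday_row(rows, 1))
```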
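In case you decide to delete the rows, make sure the VBA code runs from the last row to the first row. Here is a piece of code, written from memory, to show the idea of running from the bottom to the top.
Dim i As Long
' Loop backwards so a deleted row does not shift rows still to be checked
For i = Selection.Rows.Count To 1 Step -1
    ' Column 2 of the selection holds the date; vbMonday makes Sat/Sun return 6/7
    If Weekday(Selection.Cells(i, 2).Value, vbMonday) > 5 Then
        Selection.Rows(i).EntireRow.Delete
    End If
Next i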
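The reason for going bottom to top is that deleting a row moves everything below it up by one, so a top-to-bottom loop skips the row right after each deletion. The same thing happens with any list you delete from while looping over indexes. A short Python sketch of the backward loop (illustrative only; delete_weekends_in_place is a made-up name):

```python
from datetime import date


def delete_weekends_in_place(rows, date_col=1):
    """Remove Saturday/Sunday rows from the list, keeping the rest in order."""
    # Walk from the last index to the first so that removing an item never
    # shifts the position of an item that still has to be checked.
    for i in range(len(rows) - 1, -1, -1):
        if rows[i][date_col].isoweekday() > 5:
            del rows[i]
    return rows


rows = [(1, date(2011, 3, day)) for day in (12, 13, 14, 15, 16, 18, 19, 20)]
delete_weekends_in_place(rows)
print([d.isoformat() for _, d in rows])
```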

Resources