Excel - What is the easiest way to calculate incidence plus prevalence over time? - excel

Say I have the dataset below, what is the most efficient formula to fill the cells in column D, where the number of patients alive are calculated?
Example data set in excel
The way it should calculate is:
month 1: 8*100% = 8
Month 2: 8*80%+6*100% = 12.4
Month 3: 8*75%+6*80%+9*100% = 19.8
...
Month 10: etc.
The problem that I have is that which each row, the formula becomes longer. It is feasible to just manually enter the formulas for small datasets, but as datasets become larger, this task becomes unfeasible.
I have been able to use VBA to code the survival of the number of new patients column (C). But then I would have to rerun the VBA code as soon as I change a single value in that column.
I have a feeling it should be possible with some combination of the INDEX function in excel, I just haven't been able to figure it out.
Who can help me out here?
Kind regards,
Sander

If moving the data a bit is allowed at least for the calculation, you could do something like this:
=SUMPRODUCT($F$11:$F$20,B2:B11)
It uses a reversed list of your current list of new patients. That list is created with (formula obtained from this site):
=INDEX($C$11:$C$20,COUNTA($C$11:$C$20)+ROW($C$11:$C$20)-ROW())
Result:
The added space is necessary for the formula to work (so that it gets 0% for patients not present yet).
Or one where you don't have to leave spaces (everything from above is reversed however):
=SUMPRODUCT($C$2:$C$11,G11:G20)

Related

Excel Group rows based on time interval

I have the following excel result:
I want to group the above result in groups based on sessions i.e. if the time gap between two successive timestamps is greater than 5 minutes, it must be a new row.
For example :
I need some formula to achieve this. As I'm fairly new to Excel this is causing to be a major headache for me. Please help me, if anyone knows how to do it or at least point me in a direction.
Thanks a ton !!!
Judging by your screenshot, it appears your timestamps are actually text values. Text by default is usually left aligned where as numbers are right aligned. You seem to have a space at the end of your time stamp suggesting that it is probably left aligned and therefore text. You can test it with the following formula which will return TRUE if its text.
=ISTEXT(P2)
where P2 is one of your time stamps.
CONVERT TIMESTAMPS TO TIME
There are a variety of ways to do this. Some will depend on system settings. Take a look at the following functions as each might be useable depending on your system. The first two are a guarantee, the last two are more dependent on system settings.
DATE
TIME
DATEVALUE
TIMEVALUE
Something important to remember here is that in excel dates are integers counting the days since 1900/01/01 with that date being 1. Time is stored as a decimal and represents fraction/percentage of a day. 24:00:00 is not a valid time in excel though some functions may work with it.
So in order to convert your time stamp in P2 I used the following formula to pull out the date:
=DATE(LEFT(P2,4),MID(P2,FIND("-",P2)+1,2),MID(P2,FIND(" ",P2)-2,2))
Basically it goes into the text and strips out the individual numbers for Year, Month and Day.
To pull out the time, I could have done the same procedure but elected to demonstrate the TIMEVALUE method which is a little more robust than DATEVALUE and not a subjective to system settings as much. With the following formula I stripped out the whole time code (MINUS"UTC"):
=TIMEVALUE(TRIM(MID(P2,FIND(" ",P2)+1,FIND("UTC",P2)-FIND(" ",P2)-1)))
I also made an assumption that you are not mixing and matching UTC with other time zones which means it can be ignored. Now to get DATE and TIME all in one cell, you just need to add the two formulas together to get:
=DATE(LEFT(P2,4),MID(P2,FIND("-",P2)+1,2),MID(P2,FIND(" ",P2)-2,2))+TIMEVALUE(TRIM(MID(P2,FIND(" ",P2)+1,FIND("UTC",P2)-FIND(" ",P2)-1)))
In the example at the end, I placed that formula in Q2 and copied down
DELTA TIME
Since you want to break your groups out based on a time difference between individual entries, I used a helper column to store the time difference. In my example at the end I stored this difference in Column S. The first entry is blank as there is no time before it. I used the following formula in S3 and copied it downward.
=Q3-Q2
I applied the custom formatting of [h]:mm:ss to the cell to get it to display as shown.
FIND GROUP BREAK POINTS
In my example I am using helper column T to hold breakpoint flags. At a minimum, you will have two break points. Your first time entry and your last time entry. To make like simple I simply hard coded my first breakpoint flag in T2 as 1. Stating in T3, Three checks need to be made. If any of them are TRUE then the next flag needs to be added with a value increase by one. the three checks are:
Is this the last entry
Is the next time delta greater than 5 minutes (means end of a group)
Is this time delta greater than 5 minutes (means start of a group)
Based on those three checks I placed the following formula in T3 and copied down:
=IF(OR(S4="",S4>TIME(0,5,0),S3>TIME(0,5,0)),MAX($T$2:T2)+1,"")
Note the $ on the first part of the range for the MAX function. This will lock the start of the range while the formula gets copied down while the end of the range increases accordingly.
Also the row after the last time entry must be blank. IF it is not blank and has a set value in it, change the S4="" to S4="set value".
GENERATE TABLE
There are multiple ways to reference the flags and pull the corresponding times. a couple of formulas you can look into are:
INDEX / MATCH
LOOKUP
In this example I elected to use LOOKUP though I believe INDEX and MATCH are more appropriate and robust. For starters we want to generate a list of ODD number and EVEN numbers. These represent the start and end of the groups and correspond to the flags set in column T. One way to generate ODD and EVEN numbers as you copy down is:
=ROW(A1)*2-1 (ODD)
=ROW(A1)*2 (EVEN)
The next step is to find the generated number in Column T and then pull its corresponding timestamp in Column Q. I did this with the following formula in V2 and copied down.
=LOOKUP(ROW(A1)*2-1,T:T,Q:Q)
And in W2
=LOOKUP(ROW(A1)*2,T:T,Q:Q)

How to reference one value to various cells, depending on a date?

Please have a look at the picture I attached, it'll make understanding my problem easier because it's hard to describe.
In the first table, I have capacity data for a product. The capacity changes by the date indicated in the column, i.e. from July 2017 the capacity would be 56, from December 2018 78, and from October 2019 99. The reason why I don't write down the capacity for every month is that I want to save columns.
In the second table, I have every month. I want to reference the correct capacity for each month, e.g. it would be 56 for every month until December 2018.
I have been considering an =INDEX function, but it seems to complex for that. Is there a way to reference like this without using VBA? Would the VBA solution be simple? Or am I forced to write a column for every month's capacity in the first table? Thank you!
https://i.imgur.com/mRoBtTo.png
You can simply use several IF statements to compare the month in your column with the months given in your first table, and put the value of the correponding month.
Let's admit your first row is 1 and first column is A, it should give something like:
= IF(D7>=$F$2; $F$3; IF(D7 >= $E$2; $E$3; IF(D7 >= $D$2; $D$3; "")))
I dont see you columns and rows so i hope you will change them correctly on this formula:
=HLOOKUP(C111,$C$106:$P$107,2,TRUE)
C111 is the cell above your red row.
$C$106:$P$107 is the tableof capacities, i know it is bigger then the current one so you see you can add more columns.
2 is the row number from the capacity table.
true is becouse you dont want it to be the exact value it will take the previews in hte order of items
Both previous answers work perfectly, but I would go this way-
You don't actually need an if to find the previous capacity. you can simply use the approximate match (similar to the hlookup answer) in an index formula
=+INDEX($B$4:$E$5,MATCH($B$9,$B$4:$B$5,0),MATCH(C9,$B$4:$E$4,1))
The product $B$9 matches exact (0), but the date C9 is bigger than or equal (1).
$B$4:$E$5 is the source of capacity and
C9:AF9 the date timeline
Final advantage would be that you can have several products to index, not only a single one.
Could you please try the below formula and provide feedback please?
=IF(AND(D8>=D2,D8<E2),"56",IF(AND(D8>=E2,D8<F2),"78",IF(D8>=F2,"99")))

Distribution of time values randomly in a table Excel - Modeling Power Grid

I am working on a model of charging load of electric vehicle. I am attaching a link to an excel workbook for your better understanding.
Column B contains random time values
Column G to P represents houses and each house can have 1 car. So the each time values needs to be distributed in one column. Now when a car is plugged in, its load stays constant for 3 cells.
I want excel to randomly distribute these cars e.g. 4 cars to 4 houses and leave others blank.
what i can think of is, to assign each time a random house then use IF formula with AND function to match random times with time series and second condition to match random houses with columns 1-10.
the problem i am facing is, the formula gives a value error and only works in the rows with has random generated time in front of them screenshot. I know there is a very small thing that i am missing. please help me find it
Regards
workbook
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(AND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))>=$F6,INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))<=$F6+TIME(0,30,0)),11,""))
The two elements in the AND find the house number in column C and return the corresponding time in column B.
The first element compares the time in F to that time. The second element compares the time + 30 minutes to F (three cells). If it's between those two times, it gets an 11.
The ISNA makes sure that the house in question is on the list. You could also use an IFERROR, but I prefer the precision of ISNA.
Update
If you want the values to wrap around, you need to OR compare to the next day.
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5))),11,""))
That formula structure looks like
=If(isna(),"",if(or(and(today,today),and(tomorrow,tomorrow)),11,"")
This formulas already getting too big. If you triple it for your three voltages, it will be huge. You should consider writing a UDF in VBA. It won't be as quick to calculate, but will probably be more maintainable.
If you want to stick with a formula, you could put the wattage in row 4 above the house number. Then in another table, list the wattages and minutes to charge. So in, say, B12:C14 you have
3.7 120
11 30
22 15
Now where you have 11 in your formula, you'd have G$4 and the two placed you have TIME(0,30,0), you'd have TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0). I re-arranged some stuff to make it more 'readable' (but it's still pretty tough) and here's the final formula
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5))),G$4,""))

Excel Rolling Mean of 3 Similar Consecutive Observations

I'm trying to find the rolling mean of time series while ignoring values that do not follow the trend.
x
869
1570
946
0
1136
So, what I would want the result to look like is...
x | y
869 | 0
1570 | 0
946 | 1128.33
3 | 0
1136 | 1217.33 ([1136+1570+946]/3)
900 | 2982 ([946+1136+900]/3)
860 | 2896
The tough part here is if the row I'm on is a trending value I want to take the 3 previous trending values and find them mean of them, but if it's a non-trending value I want it to just zero out. Sometimes I might have to skip 2 or 3 previous lines to get 3 trending values to take the average as well.
So far I've been using array, RC formulas in a VBA macro form, but I'm not sure I could use RC here or if it has to be something else completely. Any help would be greatly appreciated.
I believe I can help you with your problem. First three notes:
1) It appears to me that you are trying to do DCA on smoothed production profiles, ignoring months without a complete record or no data. I'm making this assumption since you mentioned this was time series data but didn't give a sample rate. 2) I've added some extra 'data' for the sake of demo-ing. 3) In your example you shared, the last two values in your 'Y' column it looks like you may have summed but have forgotten to divide.
The solution I came up with has three parts: 1) create a metric to identify 'outliers'; 2) flag 'outliers'; 3) smooth non-flagged data. Let's establish some worksheet infrastructure and say that your production values are in column B and the associated time is in column A as follows:
Part 1) In column 'C', estimate a rough data value based on a trend approximated from two points on either side of your current time step. Subtract the actual value from this approximation. The result will always be positive and quite large for a timestep with little or no production.
=(INTERCEPT(B1:B6,A1:A6)+(A4*SLOPE(B1:B6,A1:A6)))-B4
Part 2) In column 'D', add a condition for when the value computed above is larger than the actual data point. Have it use '0' to identify a point that shouldn't be included in your average. Copy this down to the end of your data as well.
=IF(C4>B4,0,1)
Our sheet now looks like this:
3) Your three element average can now be computed. In the last cell of column 'E', enter the following array formula. You have to accept this formula by pressing ctrl + shift + enter. Once that is done fill the column from bottom to top:
=IFERROR(IF(D17=1,AVERAGE(INDEX(B12:B17,MATCH(2,1/(FIND(1,D12:D17)))),INDEX(B12:B16,MATCH(2,1/(FIND(1,D12:D16)))-COUNTIF(D17,"=0")),INDEX(B12:B15,MATCH(2,1/(FIND(1,D12:D15)))-COUNTIF(D16:D17,"=0"))),0),"")
This takes averages the most recent three values and allows for a skip of up to three time steps of outlier data per your problem statement. For an idea of how the completed sheet looks:
This was a fun challenge, I have some ideas for a more efficient formula but this should get the job done. Please let me know how this works for you!
Cheers
[EDIT]
An alternative approach which allows the user to specify the number of previous entries to include is detailed below. This is a more general (preferred alternative) and picks up in place of the previously described step 3.
3Alt) In cell G2 enter a number of previous values to average, for this example I am sticking with 3. In cell E4 enter the following array expression (ctrl+shift+enter) and drag to the end of column E:
=IFERROR(IF(D4=1,SUM(INDEX(D:D,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):D4 * INDEX(B:B,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):B4)/$G$2,0),"")
This uses the LARGE function to find the 'nth' largest value, where n is the number of preceding values from the current time-step to average. Then it builds a range that extends from the found cell to the current time step. Then it multiplies the flags (0's and 1's) by each month's production value, sums them and divides by n. In this way months flagged as bad are set to 0 and not included in the sum.
This is a much cleaner way to achieve the desired result and has the flexibility to average different periods of time. See example of the final value below.

Excel: Allocating Streams to Students based on GPA and Choice Preference

I have a total of 64 students to be allocated to 19 different streams on the basis of their GPA. Students have also indicated their preference for the streams. Only 5 students can be allocated to first three streams (A-C), 4 students to Stream D and the remaining fifteen streams get 3 students each. I have added the data to an excel sheet but have no idea how to process that to achieve the above results. The data is organized like this. I'm happy to share the original file but couldn't find way to upload that.
This could be solved automatically in a vba macro but the slightly faster version is doing it a bit fast and dirty with formulas. It starts like you have already, ranking by gpa.
Insert 2 rows for each student. In this row you will calculate the "filling" of the streams. If you have a bazillion students, you should write a small macro to copy paste so you get every other line. A shame that excel does not accept pasting empty values as no-change-values.
Either change the rankings by search/replace or create a new sheet where the "desired allocation" value is HIGHEST for the MOST WANTED stream.
Cell C2 should be like this:
=20-'originalsheet'!C2
We are now working on the new sheet
So you have these rows: (student - allocation - availability - student - allocation - availability)
In the allocation line, you will fill the allocation of the student above, in the availability row you will fill the current available spaces.
The first student gets allocated to his or hers choice.
The allocation row gets a formula that investigates if the current cell has the lowest possible value. So. Row2: first student, Row3: Allocation, Row4: new availability, Row5: Students wishes, Row 6: We use this formula in cell C6
=if(sumproduct(max(($C5:$U5)*(($C4:$U4)>0)))=C6;1;0)
Then the next line, new availability is simply
=C4-C6
copy paste the formulas downwards. You may get trouble with students on exactly similar GPAs but your ranking from top to bottom is what sets the priority in this scheme. I don't have the original file and I don't have time to build it from an image but it looked like it worked for my tiny test-setup.
Slightly easier than doing it by hand, but again; if you have a bazillion students and are doing this allocation often - ask in the vba section.
Summary:
Rank by GPA
Change so highest wish is highest number
Insert 2 rows for each student
Calculate manually for first student, should not be difficult...
Use formulas and make certain the cell references are not bungled up.

Resources