In Excel: How to built a graph with time as X, names as Y with multiples series? - excel

I am looking for a way to make a specific graph in Excel and I can't find a solution in Excel or on the web.
I have data about an online training with people completing parts of a course at a certain time:
FullName
Course
TIME
Name-A
Part 1
23/03/2022 10:38
Name-A
Part 2
23/03/2022 12:07
Name-A
Part 3
23/03/2022 16:55
Name-B
Part 1
11/03/2022 15:14
Name-B
Part 2
22/03/2022 12:08
Name-B
Part 3
28/03/2022 16:06
Name-B
Part 4
30/03/2022 14:55
Name-B
Part 5
18/04/2022 08:13
Name-C
Part 1
11/04/2022 15:25
Name-C
Part 2
20/04/2022 13:50
I would like to have a specific graph of this data:
On the vertical axis: one row for each user' name: Name-A, Name-B and Name-C.
On the horizontal axis: continuous time (say, in days) From the minimum time in the table (or less) to the maximum (or more)
Series of plots for the data: Each part of the course (from Part 1 to Part 5 here) would be a series of dots of a specific color, placed on the right row (for a learner's name) above the corresponding time on the horizontal axis.
Do you have any idea on how it could be achieved?
All the best, R.S.
Edit: The table does not appear as in the preview so i try to add a screenshot:
Screenshot of the table

So one way to visualise this as mentioned in the comments is to create a separate series for each person and show passing each part of the course as a vertical step:
It's based very loosely on this but I've set each day in the date range as the x-coordinates and used a lookup to transform the data in H2
=RIGHT(XLOOKUP($G2+TIME(23,59,59),FILTER($C$2:$C$11,$A$2:$A$11=H$1),FILTER($B$2:$B$11,$A$2:$A$11=H$1),0,-1))+(COLUMN()-COLUMN($G$1))*10
pulled down and across to give
Explanation
The data for the graph has dates spanning the times in the raw data for its x-coordinates (column G). I generated it manually but could have used Sequence in Excel 365.
There are three columns of y-values, H to J, generating a separate series for each person. The three lines are initially spaced out by 10 units based on the column number. In the formula above, the raw data is filtered by the person's name so the headers in columns H, I or J match the names in column A in the raw data. Xlookup is used with 'next smallest' match so where the date in column G is greater or equal to the date/time in column C it will return the corresponding course from column B. Because column C actually contains date/times, I have added almost 24 hours when matching the date in column G to make sure that a match is found if the day is the same, regardless of time. In a case like Name-A, where three courses are completed in the same day, this will automatically select the last one (Part 3). Then I take the right-hand character of the course name (which is a digit in the sample data) and add it to the relative column number multiplied by 10. If there is no match, Xlookup returns zero so you just get the initial value for each series (10, 20 or 30), otherwise the result will be an increase by one unit each time a course is passed. If you couldn't assume the last character of the course name was a digit, you would need a lookup to assign a number to each course name.
The data is then plotted on a scatter graph with points joined by straight lines. I had to adjust the x-axis manually to make the range correct and the labelling clearer.
This could be done without Excel 365, probably using Aggregate to get the highest row number with a condition on the name and date.
EDIT
I could have achieved the same result much more easily using Countifs to find how many courses had been passed by a certain person by a certain date:
=COUNTIFS($A$2:$A$11,H$1,$C$2:$C$11,"<="&$G2+TIME(23,59,59))+(COLUMN()-COLUMN($G$1))*10
This wouldn't have needed Excel 365. If you needed to give different courses different weightings, you could do this with a sumproduct and a lookup, also fairly straightforward.

Related

Excel formula to split an amount per year depending on expenditure days within a date range

I am in the process of building a formula to split a total cost (in column J) based on start and end expenditure periods that can vary from 2021 to 2031. Based on the days between the expenditure period dates (column M), I managed to work out to split the cost using the formulas below up to 2023 but it is not consistent and at times incorrect.
In cell P5 I have the following formula. For year 2021, I seem to get the correct split result.
=IF($K5>AS5,0,$J5/$M5*(AS5-$K5))
In cell Q5, I have the following formula. For year 2022, I seem to get the correct spit as well
=MIN(IF(SUM($N5:P5)>=$J5,0,IF($L5>=AS5,$J5/$M5*(AS5-AR5),$J5/$M5*($L5-MAX(AR5,$K5)))),K5)
However, I don't get the right result in cell Q6 which has the same formula but different dates
=MIN(IF(SUM($N6:P6)>=$J6,0,IF($L6>=AS6,$J6/$M6*(AS6-AR6),$J6/$M6*($L6-MAX(AR6,$K6)))),K6)
Cell R6 shouldn't return any result because it is out of date range. This is where things get mixed up.
Note that from column AR to BC, it is all year end dates from 2020 to 2031 as shown below.
Is there a better way to tackle this sort of formula as I seem to get dragged into a long and unreliable way of doing this.
Here single function(♣) that will create a series of pro-rata multipliers (of appropriate length) for any given start/end date:
EDIT: see end of soln for extended version per OP comment to original soln...
SINGLE FUNCTION
=J11*(LET(dates,EDATE(DATE(YEAR($K11),1,1),12*(SEQUENCE(1,YEAR($L11)-YEAR($K11)+2,1))),IF(dates<K11,K11,IF(dates<L11,dates,L11)))-LET(dates,EDATE(DATE(YEAR($K11)-1,1,1),12*(SEQUENCE(1,YEAR($L11)-YEAR($K11)+2,1))),IF(dates<K11,K11,IF(dates<L11,dates,L11))))/(L11-K11)
It may appear somewhat unwieldy in length, but it is far more robust (and concise) compared to the combination of steps/series you have created. What's more, it returns the precise answer RE: pro-rata payments and is guarenteed to never over/under-run RE: total payment (by design).
BREAK-DOWN
Comprises 3 distinct parts (some of which are similar in pattern/formation):
1] First part - create a series (array) of years spanning start-end dates:
=LET(dates,EDATE(DATE(YEAR($K5)-1,1,1),12*(SEQUENCE(1,YEAR($L5)-YEAR($K5)+2,1))),IF(dates<K5,K5,IF(dates<L5,dates,L5)))
Thanks to the lovely Spill functionality the new Office 365 variant Excel boasts, you never have to worry about how many years are required -- so long as you have the space to the right of this workbook (would be unusual otherwise - assuming you start in column O and clear any content to the right of this, you'd need an end date beyond the year 2557 (26th century) to run out of columns! ☺
2] Second part is merely a replica of the firs series, albeit shifted to the right 'once' (so starts with the 2nd element in the 1st series):
=LET(dates,EDATE(DATE(YEAR($K5),1,1),12*(SEQUENCE(1,YEAR($L5)-YEAR($K5)+2,1))),IF(dates<K5,K5,IF(dates<L5,dates,L5)))
3] Third part - you have the basic ingredients from parts 1 and 2 to complete the required task easily: simply deduct series 2 from 1 (giving days between successive dates in series 1 - i.e. days for each year between start and end dates), divide by total days (to yield pro-rata multipliers), and then multiply these by the total £amount and voila - you have your series!
=J5*(O6#-O5#)/(M5)
♣ Caveat(s) - assuming you have Office 365 compatible version of Excel (which is quite common nowadays)
*EDIT - EXTENDED VERSION
Given above, the following extends this to align monetary results (1st table, o11:w21) within respective calendar period columns (spanning the entire period in question).
This soln:
Determines header row based upon the number of columns & corresponding calender periods (financial yrs commencing 1/1) as an array function (i.e. dynamic range)
Utilises a modified version of the eq. provided for dates arrage (refer: "First Part", original soln)
Comment - same caveats as before - i.e. Office 365 etc.
Screenshot(s)/here refers:
DATES (HEADER) - Y10 (array)
=LET(y_,MIN(K11:K21),x_,MAX(L11:L21),EDATE(DATE(YEAR(y_)-1,1,1),12*(SEQUENCE(1,YEAR(x_)-YEAR(y_)+2,1))))
Comment - enter once within single cell Y10 - i.e. as an array function with Spill to right
ALIGNED/SHIFTED FINANCIALS - Y11:Y21 (each cell in col is an array)
=IFERROR(IF(Y$10#<EDATE(K11,-12),"",IF(Y$10#>EDATE(L11,12),"",INDEX(O11#,1,MATCH(Y$10#,EDATE(DATE(YEAR($K11)-1,1,1),12*(SEQUENCE(1,YEAR($L11)-YEAR($K11)+2,1))),0)))),"")
Comment - enter this as an array fn. (#SPILL! to the right) in each cell within column Y (can drag this function down Y11:Y21 as required)

Find row which has one cell similar and other cell different than in another row

Let's say I have this:
A B
1 10 20
2 12 30
3 25 15
4 40 30
How do I find the row which have same value in column B and different value for column A when compared to all the rows above or below ?
I want to find this cell:
A2:B2
Update: NO revision necessary
Following feedback I have tested this equation (below) with 20k rows (link below) - happy to report back results as expected/all still in order. No changes necessary/warranted. This function works just fine/as expected. Beaut!
Explanation:
When testing large samples of data of type 'integer' (say) that range a common order of magnitude/size (i.e. have material probability of re-occurring), the probability of obtaining a unique value for field A (col B, below screenshot) reduces, due to the law of large numbers (variance is what leads to unique values, and this reduces as the sample size increases).
As a consequence, one may encounter results = !Calc# which simply means 'no unique values could be found in col A (or they could but only for when col C was also unique - although the probability of this is remote, it's mainly due to numerous other cells in respective columns containing identical data. Throw a negative 100 in column A (assuming all other values are positive real/integer number plane), and you should see my eqn. below return '-100' and whatever the corresponding 'col-C' data is (assuming that is not unique too, as I have mentioned)...
NOW - back to the solution already! :)
ORIGINAL SOLN:
This will give you back every such combination (besides {12,30} there is also {40,30}):
=FILTER(B2:B5&"-"&C2:C5,(COUNTIFS(B2:B5,"="&$B$2:$B$5)=1)*(COUNTIFS(C2:C5,"="&$C$2:$C$5)>1))
OneDrive excel-linked spreadsheet for your convenience here, taking careful note of restrictions per 1st comment to this proposed soln.
Screenshot
Notes
Assumes you have Office 365 version of Excel

Distribution of time values randomly in a table Excel - Modeling Power Grid

I am working on a model of charging load of electric vehicle. I am attaching a link to an excel workbook for your better understanding.
Column B contains random time values
Column G to P represents houses and each house can have 1 car. So the each time values needs to be distributed in one column. Now when a car is plugged in, its load stays constant for 3 cells.
I want excel to randomly distribute these cars e.g. 4 cars to 4 houses and leave others blank.
what i can think of is, to assign each time a random house then use IF formula with AND function to match random times with time series and second condition to match random houses with columns 1-10.
the problem i am facing is, the formula gives a value error and only works in the rows with has random generated time in front of them screenshot. I know there is a very small thing that i am missing. please help me find it
Regards
workbook
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(AND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))>=$F6,INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))<=$F6+TIME(0,30,0)),11,""))
The two elements in the AND find the house number in column C and return the corresponding time in column B.
The first element compares the time in F to that time. The second element compares the time + 30 minutes to F (three cells). If it's between those two times, it gets an 11.
The ISNA makes sure that the house in question is on the list. You could also use an IFERROR, but I prefer the precision of ISNA.
Update
If you want the values to wrap around, you need to OR compare to the next day.
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5))),11,""))
That formula structure looks like
=If(isna(),"",if(or(and(today,today),and(tomorrow,tomorrow)),11,"")
This formulas already getting too big. If you triple it for your three voltages, it will be huge. You should consider writing a UDF in VBA. It won't be as quick to calculate, but will probably be more maintainable.
If you want to stick with a formula, you could put the wattage in row 4 above the house number. Then in another table, list the wattages and minutes to charge. So in, say, B12:C14 you have
3.7 120
11 30
22 15
Now where you have 11 in your formula, you'd have G$4 and the two placed you have TIME(0,30,0), you'd have TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0). I re-arranged some stuff to make it more 'readable' (but it's still pretty tough) and here's the final formula
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5))),G$4,""))

Spotfire: calculating difference between two rows in same column based on attributes in different columns

I am new to Spotfire and need help in getting the right expression for a calculated column.
My Data contains different subjects grouped in column ID. For every ID, Bodyweight was measured on different days. Days are given in column Day and stated as 1,2,3...
The last day is denoted by Last and Bodyweight measurements given in another column. Another column is present which is called Baseline. The Body weight measured is considered as baseline if the column contains a Y for that row.
I need to insert a calculated column, which will contain the difference between Body measurement measured on Day denoted Last and Body measurement marked by Y in column Baseline.
This should be done for every new ID. I am not able to figure this out. Could someone advise me on how to go about it?
Here is an example attached
So, the calculated column for Rita will give -4 (body weight at Last=56 and BodyWeight at baseline=56, so 52-56 =-4)
the sample data you provided is a little weird, particularly the [Day] column. if it's within your control, I suggest to use actual dates rather than a number/string here.
barring that, I was able to get your desired results, but it required two calculated columns: the first one will consolidate the [Day] and [Baseline] columns into a single column, and the second one contains your desired info.
column 1, which I called Day (int):
CASE
WHEN [Day]="Last" THEN 1000000
WHEN [Baseline]="Y" THEN -1000000
WHEN [Day]!="Last" THEN Integer([Day])
END
I picked a random high and low max to establish a chronological order. this will put 1000000 in place of "Last" (if you have any programs that are longer than one million days, you'll need to increase this number). the same for the [Baseline] column, but that value will be -1000000, which is presumably the lowest value you will ever see in this column. both of these are assumptions and may not work for your implementation. finally, in all other cases, the day number will be used.
column 2, which I called Diff:
Last([Weight]) OVER (Intersect([Name],LastNode([Day (int)]))) -
First([Weight]) OVER (Intersect([Name],FirstNode([Day (int)])))
the first line uses what's called an OVER expression to retrieve the first value for [Weight], ordered by [Day (int)], per [Name]. the second line gives the reverse of that, and so the difference is calculated as -4 (or whatever is appropriate).

AverageIf and Multiple data strings

I'm involved with a youth football tournament on the referee side, with assessing/coaching the referees. I've just taken over doing the data entry for the referees assessment scores which we then use to determine who gets finals etc and am looking to extract more usable information from the data to help us identify trends.
I've got (up to) 200 referees, each receiving from none to two assessment scores each day for 5 days. The scores are entered as both the raw mark and the weighted mark based on match difficulty (along with a host of other data about the match that isn't relevant to this issue.
I can extract the average mark (raw and weighted) across all referees without issues and have done so using the below formula, which is the raw average mark:
=AVERAGE(Working!AK4:AK200,Working!BK4:BK200,Working!CL4:CL200,Working!DL4:DL200,Working!EM4:EM200,Working!FM4:FM200,Working!GN4:GN200,Working!HN4:HN200,Working!IO4:IO200,Working!JO4:JO200)
But I also want to extract the average mark (raw and weighted) across two subsets - Academy and non academy referees, to help plot trends and determine where resources need to be utilised.
I've attempted to use an AVERAGEIF formula, but am getting a #VALUE! return. This is the formula that I've attempted to use to return the average raw mark for those referees in the academy:
=AVERAGEIF(Working!G4:G200,Working!G4:G200="Yes",(Working!AK4:AK200,Working!BK4:BK200,Working!CL4:CL200,Working!DL4:DL200,Working!EM4:EM200,Working!FM4:FM200,Working!GN4:GN200,Working!HN4:HN200,Working!IO4:IO200,Working!JO4:JO200))
If I do the same formula as above, but without the brackets around the [average_range], I get a 'you've used too many arguments, and it highlights BK200.
From what I've been able to find so far online, it seems that the formula I'm trying to use would only work if ALL the cells in (Working!G4:G200) returned "Yes". However if there are only 50 academy referees as indicated by "Yes" in G column, then I want those specific scores to be averaged, and the inverse for the non-academy referees.
I thought about having another sheet, which would simply contain populate from Column G (a simple =G4 and then populated down to =G200 next to all of the scores), consolidated into a block of raw marks columned under Assessment 1, 2, 3, 4.... and then the same for all of the weighted marks which would populate from the equivalent cell on the working sheet, but there's a lot of filtering, and re-sorting that goes on on the working sheet, and I'm not 100% certain that that wouldn't cause issues.
Any feedback on how to work through this problem, so that I can display the overall average mark for academy and non-academy referees in both raw and weighted form would be much appreciated, and I apologize if this post is rather convoluted.
I don't think there is a neat solution if the scores are in several columns which are not consecutive.
My suggestion is:-
(1) Work out the sum for each column separately and total them up
(2) Work out the count for each column separately and total them up
(3) Divide Sum by Count to get Average.
In my small example below with 3 referees and 3 columns:-
(1) In K2:-
=SUMIF(H2:H4,"Yes",B2:B4)+SUMIF(H2:H4,"Yes",D2:D4)+SUMIF(H2:H4,"Yes",F2:F4)
(2) In K3:-
=COUNTIFS(B2:B4,">=0",H2:H4,"Yes")+COUNTIFS(D2:D4,">=0",H2:H4,"Yes")+COUNTIFS(F2:F4,">=0",H2:H4,"Yes")
(3) In K4:
=K2/K3
This would include any zero scores (if this is possible) but exclude any blanks.
You can then scale it up to your data.
Beyond this, you would have to change the data structure either
(1) Add a row to label the columns that you want to average e.g.
Score 1 Score 2 Score 3
3 0 3
so you could pick up only the columns labelled 3 say
Here's how it would be in my small example:-
In K3:-
=SUM((B$2:F$2=3)*($H3:$H5="Yes")*B3:F5)
Which is an array formula and must be entered with Ctrl-Shift-Enter
In K4:-
=SUM((B$2:F$2=3)*($H3:$H5="Yes")*(B3:F5<>""))
another array formula
In K5:-
=K3/K4
This is how the columns you want are labelled with a 3 in row 2, so it ignores the other columns:-
(2) Consolidate them into another sheet as you suggest.

Resources