Find Confidence via applying excel formula - excel

I'm trying to find out the affinity among two products based on 16 transactions happened. First I calculated the support, which means no. of times the product bought in all 16 transactions.
Next one is confidence, which is nothing but the occurrence of another product provided the first one has already occurred.
Ex:- Confidence (11 when 7 occurred)=P(11intersection 7)= no. of (11 intersection 7)/no.of occurance of 7= (1/16)/(1/16)=100%
Similarly, for confidence (11 when 3 occurred)=(1/16)/(4/16)=25%
I want to find the confidence of all the products vs other products by applying excel formula. I'm not able to apply the formula to find the confidence among the products. The sheet is attached here. Confidence (expected Output) is the table where I want to find the formula to find confidence automatically.
Sample Data
enter image description here
enter image description here

P(X) and P(X|Y) could be done by countif() and countifs(). To identify the correct column, use offset() and match()
with the location of data as
the formula in cell J3 of the Confidence matrix is:
=COUNTIFS(OFFSET($A$3:$A$18,0,MATCH($I3,$B$2:$G$2,0)),"=TRUE",OFFSET($A$3:$A$18,0,MATCH(J$2,$B$2:$G$2,0)),"=TRUE")/COUNTIF(OFFSET($A$3:$A$18,0,MATCH(J$2,$B$2:$G$2,0)),TRUE)
which you can use to fill the rest of the matrix.

Related

Facing difficulty in Excel when creating a logic for moving average of different variables

So, the point is, in my dataset I have to create a variable "Moving Avg. Amt paid per sq. ft." and the formula or the logic I need is to calculate the last five values as per most recent transactions. i.e. most recent sales by date. but this average should only return value in case it matches the same building and same area variable.
This is what my data looks like
Area ID has three categories. Building number has 5 categories. Date is sorted in ascending order. Now my variable moving average should calculate last 5 averages w.r.t date but for the same building in the same area. e.g. there are buildings 1 and 2 in area 102. I need my Mov Avg. variable to calculate using conditions when it matches criteria of building 1 in 102 for past five sales and when it finds building 2 in the building number variable, it should calculate average of last 5 sales of that building in area 102.
So my approach to this issue was (which is flawed at the moment):
I calculate average of amount paid per sq. foot w.r.t area & building based on dates using the formula
=AVERAGEIFS($N$2:$N$6547,$D$2:$D$6547,D14,$C$2:$C$6547,C14,$B$2:$B$6547,B14)
but I cannot make this formula work, to calculate moving average whenever it meets the criteria. I tried the offset the point as well by 5 but the logic is not right and hence its not working and returning #value in the cells. The formula I used to offset the above condition is
=AVERAGEIFS((OFFSET(N13,5,,5)),$D$2:$D$6547,D13,$C$2:$C$6547,C13,$B$2:$B$6547,B13)
(These formulae are used in column Q of my data)
Need a support from the community as I am badly stuck in making this data useful and I am out of any ideas to make this work.
Edit 1: I am not sure how I can attach my excel file here so you may review the dataset. I have uploaded it on a third party site, for which the link is shared below, so you can view the file in detail.
https://file.io/hlciAHJOHzWA
Expected result is as I have mentioned the instruction said
"Create a variable called "mov. avg amt. paid per sq ft". For each row, this variable should calculate average amt paid per sq ft for the most recent past five sales (by date) for the same building in the same area."
And my approach to build a logic or formula to make this variable calculate moving average w.r.t date for same building in the same area doesn't seem to work because there might be some flaws.
In Office 365 you could use:
=LET(f,FILTER($N$1:N13,($B$1:B13=B14)*($C$1:C13=C14),""),
c,COUNTA(f),
s,SEQUENCE(5,,c-5),
IFERROR(IF(c<5,SUM(f)/c,SUM(INDEX(f,s))/5),""))
If there's less than 5 matches prior to the current sales it'll calculate the average of the count. If 5 or more matches it'll calculate the average of the last 5 prior to the current sale.

Excel Sumif, Sumifs with partial strings in multiple columns?

So this is the simplified question I broke down from a former question I had here: Excel help on combination of Index - match and sumifs? .
For this one, I have Table1 (the black-gray one) with two or more columns for adjustments for various order numbers. See this image below:
What I want to achieve is to have total adjustments for those order numbers that contain the numbers in Total Adjustment column in the blue table, each of which will depend on the cell beside it.
Example: Order number 17051 has two products: 17051A (Apple) and 17051B (Orange).
Now what I want to achieve in cell C10 is the sum of adjustment for both 17051A and 17051B, which will be: Apple Adjustment (5000) + Orange Adjustment (4500) = 9500.
The formula I used below (and in the image) kept giving me error messages, and this happens even before I add the adjustment for Orange.
=SUMIF(Text(LEFT(Table1[Order Number],5),"00000"),text(B10,"00000"),Table1[Apple Adjustment])
I have spent the whole day looking for a solution for this and didn’t even come close to find any. Any suggestion is appreciated.
Assuming your headers always have the text "adjustment" in them, you could use:
=SUMPRODUCT((LEFT($B$4:$B$7,5)=B10&"")*(RIGHT($C$3:$F$3,10)="adjustment")*$C$4:$F$7)
In C10 you could add two sumproducts. This assumes that products are always 5 numbers long at the start. If not swop the 5 to use the length of the product reference part you are matching on.
=SUMPRODUCT(--(1*LEFT($B$4:$B$7,5)=$B10),$D$4:$D$7)+SUMPRODUCT(--(1*LEFT($B$4:$B$7,5)=$B10),$F$4:$F$7)
Which with table syntax is:
=SUMPRODUCT(--(1*LEFT(Table1[Order Number],5)=$B10),Table1[Apple Adjustment])+SUMPRODUCT(--(1*LEFT(Table1[Order Number],5)=$B10),Table1[Orange Adjustment])
Using LEN
=SUMPRODUCT(--(1*LEFT(Table1[Order Number],LEN($B10))=$B10),Table1[Apple Adjustment])+SUMPRODUCT(--(1*LEFT(Table1[Order Number],LEN($B10))=$B10),Table1[Orange Adjustment])
I am multiplying by 1 to ensure Left, 5 becomes numeric.

Excel incorrectly calculating the average of numbers when using division

I am trying to work out the average of scores for the male artists. When I tried using the AVERAGEIFs function it returned a #VALUE error and therefore I have created a table to work out the average manually. The expected score is 3.9 however when I use my table which adds up all the scores and then the number of scores which are not blank and divides the total score by number of scores the answer is 3.4.
Could you please advise on why this would be and if there is a way to calculate the average of male artists?
Here is an image of my spreadsheet:
To my knowledge, AVERAGEIFS() works on a single column / single row.
Thus, you need a helping row, showing you the AVERAGE per person, and then taking the average, with conditions such as Gender and Job.
I have 6.5 with this formula:
=AVERAGEIFS(F2:F5 F2:F5,B2:B5,B2,A2:A5,A2)
It simply gives the average of 2 and 11, as they are the only 2 male artists in my sample.
MSDN AVERAGEIFS
If you do not like the idea of additional column, then making a PIVOT table would be a good solution.

AverageIf and Multiple data strings

I'm involved with a youth football tournament on the referee side, with assessing/coaching the referees. I've just taken over doing the data entry for the referees assessment scores which we then use to determine who gets finals etc and am looking to extract more usable information from the data to help us identify trends.
I've got (up to) 200 referees, each receiving from none to two assessment scores each day for 5 days. The scores are entered as both the raw mark and the weighted mark based on match difficulty (along with a host of other data about the match that isn't relevant to this issue.
I can extract the average mark (raw and weighted) across all referees without issues and have done so using the below formula, which is the raw average mark:
=AVERAGE(Working!AK4:AK200,Working!BK4:BK200,Working!CL4:CL200,Working!DL4:DL200,Working!EM4:EM200,Working!FM4:FM200,Working!GN4:GN200,Working!HN4:HN200,Working!IO4:IO200,Working!JO4:JO200)
But I also want to extract the average mark (raw and weighted) across two subsets - Academy and non academy referees, to help plot trends and determine where resources need to be utilised.
I've attempted to use an AVERAGEIF formula, but am getting a #VALUE! return. This is the formula that I've attempted to use to return the average raw mark for those referees in the academy:
=AVERAGEIF(Working!G4:G200,Working!G4:G200="Yes",(Working!AK4:AK200,Working!BK4:BK200,Working!CL4:CL200,Working!DL4:DL200,Working!EM4:EM200,Working!FM4:FM200,Working!GN4:GN200,Working!HN4:HN200,Working!IO4:IO200,Working!JO4:JO200))
If I do the same formula as above, but without the brackets around the [average_range], I get a 'you've used too many arguments, and it highlights BK200.
From what I've been able to find so far online, it seems that the formula I'm trying to use would only work if ALL the cells in (Working!G4:G200) returned "Yes". However if there are only 50 academy referees as indicated by "Yes" in G column, then I want those specific scores to be averaged, and the inverse for the non-academy referees.
I thought about having another sheet, which would simply contain populate from Column G (a simple =G4 and then populated down to =G200 next to all of the scores), consolidated into a block of raw marks columned under Assessment 1, 2, 3, 4.... and then the same for all of the weighted marks which would populate from the equivalent cell on the working sheet, but there's a lot of filtering, and re-sorting that goes on on the working sheet, and I'm not 100% certain that that wouldn't cause issues.
Any feedback on how to work through this problem, so that I can display the overall average mark for academy and non-academy referees in both raw and weighted form would be much appreciated, and I apologize if this post is rather convoluted.
I don't think there is a neat solution if the scores are in several columns which are not consecutive.
My suggestion is:-
(1) Work out the sum for each column separately and total them up
(2) Work out the count for each column separately and total them up
(3) Divide Sum by Count to get Average.
In my small example below with 3 referees and 3 columns:-
(1) In K2:-
=SUMIF(H2:H4,"Yes",B2:B4)+SUMIF(H2:H4,"Yes",D2:D4)+SUMIF(H2:H4,"Yes",F2:F4)
(2) In K3:-
=COUNTIFS(B2:B4,">=0",H2:H4,"Yes")+COUNTIFS(D2:D4,">=0",H2:H4,"Yes")+COUNTIFS(F2:F4,">=0",H2:H4,"Yes")
(3) In K4:
=K2/K3
This would include any zero scores (if this is possible) but exclude any blanks.
You can then scale it up to your data.
Beyond this, you would have to change the data structure either
(1) Add a row to label the columns that you want to average e.g.
Score 1 Score 2 Score 3
3 0 3
so you could pick up only the columns labelled 3 say
Here's how it would be in my small example:-
In K3:-
=SUM((B$2:F$2=3)*($H3:$H5="Yes")*B3:F5)
Which is an array formula and must be entered with Ctrl-Shift-Enter
In K4:-
=SUM((B$2:F$2=3)*($H3:$H5="Yes")*(B3:F5<>""))
another array formula
In K5:-
=K3/K4
This is how the columns you want are labelled with a 3 in row 2, so it ignores the other columns:-
(2) Consolidate them into another sheet as you suggest.

Picking top 5 scores from a range

I run a small golf eclectic with excel. One of the things we have is a points system. I would like to get the 5 highest points scored over the season and have them ranked from 1 (being the highest points scored) to 5.
My knowledge of excel "sums" goes only a wee bit further than add and subtract.
Thanks!
If you don't want to change the order that they are presently in you can use the LARGE function. It returns the kth largest value.
Below is a great formula, if you drag it down it will automatically get the second, third and nth largest value from a table of data (in this example the data is between A1 to A10).
=LARGE(A1:A10,ROW(A1)-ROW($A$1)+1)
You can then match the values with names or corresponding data from the tables using the MATCH and INDEX functions. The example below would fetch the name for each value from the second column.
=INDEX($A$1:$B$10,MATCH(cell reference with score or value,$A$1:$B$10,2))
Play around with these formulas, they are very convenient for data m
If you have a column containing the scores, you could add a filter (Data->Filter I think) and sort descending.
Though, if you just have rows that are something like [Date][Person][Score] you'll need to go to another sheet and SUM the scores for each person then sort that... Unfortunately my Excel skills aren't up to par to pull a score for each person like that.
Given a list of numbers in A1 to A10, you can work out their 'Rank' relative to each other by using 'RANK'.
e.g.
RANK(A1,A1:A6,0)
RANK(cell, list of cells to check against, order)
For order, 0 = descending.
From there you can work out which one is first pragmatically.
If you have Excel 2007,
Check that your data is continuous, with no blank rows or columns. Click on your scores and then select 'Data - Filter'
Using the dropdown that the filter creates at the top of your scores column and select 'Number filters - Top ten'
A 'Top ten Autofilter' dialog will be displayed, reduce the show 10 to 5 and then click on OK.
For earlier versions of Excel add a RANK formula in a new column. Be careful as the scores need to be sorted, usually into descending order. If there are any ties, they will be given the same ranking number and the subsequent rank number will be incremented by the number of ties. (E.g. If there are two scores of 2, ranked as 5. The next score will be ranked as 7, not 6)
If you want to use the LARGE Function as described above, make sure you put the same range in the list for each of the LARGE functions. That is, change =LARGE(A1:A10,ROW(A1)-ROW($A$1)+1) to =LARGE(A$1:A$10,ROW(A1)-ROW($A$1)+1) or you will get some strange incorrect results

Resources