I have the following problem:
A report is composed of 3 different metrics, and each metric can be obtained from different places.
Example:
A Report is composed of 3 Metrics: Metric1, Metric2, Metric3.
Each metric is obtained from a different C. For example, Metric1 is obtained from C1 and C2, Metric2 is obtained from C1, and Metric3 is obtained from C2.
I came up with these 2 solutions to the problem. I wonder which one is more correct, if any, or whether there is another solution:
I'm trying to understand the example presented in Appendix C here
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6481149/
Equation C1 is clear to me.
But in Equation C2 they use the mean values.
Such mean values are clear to me in the case of categorical variables for example 1.548 is the mean value of the Sex variable (as shown in the Table 3). Please correct me if I'm wrong.
But for numerical variables I don't understand which mean values they are using. For example, for the Age variable they use 3.768. If I understand correctly, that value should be the log of the mean age, log(44.15)=1.64. Instead, the value used is 3.768.
Could anybody clarify where this value comes from?
In statistics log often means the natural logarithm, sometimes denoted ln. The four values they take the logarithms of are:
Variable     Reported Mean   ln(Mean)   Reported value
Age          44.15           3.788      3.768
BMI          25.61           3.243      3.230
BP Syst      138.6           4.932      4.913
Pulse Rate   75.61           4.326      4.311
The calculated values are not exactly equal to the reported values. But it looks close enough that this is probably the calculation they used. Without the data and/or code they used it's hard to say why the results are different. The study mentions excluding 130 participants because of ethics protections. So, perhaps one table was calculated using a slightly different group of participants than the other table?
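The comparison in the table can be reproduced with a few lines of Python. This is only a quick check: the means and reported values are copied from the table above, and the names `rows` and `ln_of_means` are my own.

```python
import math

# (reported mean, reported value), copied from the table above
rows = {
    "Age": (44.15, 3.768),
    "BMI": (25.61, 3.230),
    "BP Syst": (138.6, 4.913),
    "Pulse Rate": (75.61, 4.311),
}

def ln_of_means(table):
    """Return {variable: natural log of the mean, rounded to 3 decimals}."""
    return {name: round(math.log(mean), 3) for name, (mean, _) in table.items()}

computed = ln_of_means(rows)
for name, (mean, reported) in rows.items():
    print(f"{name:<10} ln({mean}) = {computed[name]:.3f}   reported = {reported:.3f}")

# The base-10 logarithm gives the 1.64 from the question instead.
log10_age = round(math.log10(44.15), 2)
```

The base-10 result matches the 1.64 in the question, which suggests the mismatch there came from using log10 rather than ln.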
I have a group of 915 players, each with a different engagement score, and I am trying to split them into two evenly distributed groups based on those scores. I tried using Solver in Excel to set constraints, indices, etc., but unfortunately Solver can only handle 200 variables, and I have 915. Another approach I researched is to give the first group the best player and the worst player, give the second group the second-best player and the second-worst player, and so on. The problem is that I am not an Excel wiz and need some help writing this as a formula in Excel, so that the group A and group B columns show a 1 for the agents that should be selected for each group in the screenshot below (the screenshot is a small sample of the entire data set, FYI):
Screenshot Here
As you mentioned, you can combine the best and worst players.
Your data is already sorted by index in descending order. Say the data is in columns A, B and C. Just put A in D2 and B in D3.
Select D2 and D3, and once the + cursor appears at the bottom right of the selection, double-click to fill the pattern down.
Then filter column D on A for group A and on B for group B.
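Note that the fill-down above alternates A and B down the sorted list. For comparison, here is a minimal Python sketch of the best-with-worst pairing described in the question. The function name `split_best_worst` and the sample scores are made up for illustration.

```python
def split_best_worst(scores):
    """Assign each player to group "A" or "B" by pairing best with worst.

    scores: dict of player -> engagement score.
    Pair 0 (best + worst) goes to A, pair 1 (2nd best + 2nd worst) to B,
    and so on. With an odd number of players the middle one is placed alone.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    groups = {}
    lo, hi, pair = 0, len(ranked) - 1, 0
    while lo <= hi:
        label = "A" if pair % 2 == 0 else "B"
        groups[ranked[lo]] = label
        groups[ranked[hi]] = label  # same player when lo == hi
        lo, hi, pair = lo + 1, hi - 1, pair + 1
    return groups

# Hypothetical sample data, not the real 915 players
sample = {"p1": 90, "p2": 80, "p3": 70, "p4": 60,
          "p5": 50, "p6": 40, "p7": 30, "p8": 20}
groups = split_best_worst(sample)
total_a = sum(s for p, s in sample.items() if groups[p] == "A")
total_b = sum(s for p, s in sample.items() if groups[p] == "B")
print(total_a, total_b)
```

On evenly spaced scores like the sample, both groups end up with the same total. Real data will only balance approximately.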
I'm trying to find the affinity between two products based on 16 transactions. First I calculated the support, which is the number of times the product was bought across all 16 transactions.
Next is the confidence, which is the probability that another product occurs given that the first one has already occurred.
Ex: Confidence (11 when 7 occurred) = P(11 | 7) = P(11 intersection 7) / P(7) = no. of (11 intersection 7) / no. of occurrences of 7 = (1/16)/(1/16) = 100%
Similarly, confidence (11 when 3 occurred) = (1/16)/(4/16) = 25%
I want to find the confidence of every product against every other product with an Excel formula, but I have not been able to write one that works. The sheet is attached here. Confidence (expected output) is the table that I want the formula to fill in automatically.
Sample Data
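P(X) and P(X|Y) can be computed with COUNTIF() and COUNTIFS(). To identify the correct column, use OFFSET() and MATCH().
With the product headers in B2:G2 and the TRUE/FALSE transaction flags in rows 3 to 18 below them (as the references in the formula assume), the formula in cell J3 of the Confidence matrix is: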
=COUNTIFS(OFFSET($A$3:$A$18,0,MATCH($I3,$B$2:$G$2,0)),"=TRUE",OFFSET($A$3:$A$18,0,MATCH(J$2,$B$2:$G$2,0)),"=TRUE")/COUNTIF(OFFSET($A$3:$A$18,0,MATCH(J$2,$B$2:$G$2,0)),TRUE)
which you can use to fill the rest of the matrix.
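To sanity-check the matrix outside Excel, the same ratio can be sketched in Python. The transaction list below is invented so that it matches the two worked examples in the question (product 7 bought once, together with 11; product 3 bought four times, once with 11). The function name `confidence` is my own.

```python
def confidence(transactions, x, y):
    """P(x | y): share of the transactions containing y that also contain x.

    Returns None when y never occurs, to avoid dividing by zero.
    """
    with_y = [t for t in transactions if y in t]
    if not with_y:
        return None
    return sum(1 for t in with_y if x in t) / len(with_y)

# 16 hypothetical transactions, each a set of product ids
transactions = [{3, 7, 11}, {3}, {3, 5}, {3}] + [{5}] * 12

conf_11_given_7 = confidence(transactions, 11, 7)
conf_11_given_3 = confidence(transactions, 11, 3)
print(conf_11_given_7, conf_11_given_3)

# Full matrix: rows are x, columns are y, as in the Confidence table
products = sorted(set().union(*transactions))
matrix = {x: {y: confidence(transactions, x, y) for y in products}
          for x in products}
```

As in the Excel formula, the denominator counts the column product and the numerator counts rows where both products are present.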
I have researched thoroughly before posting and still can't figure this out.
I currently have a file that uses INDIRECT logic to gather information from one worksheet into a summary worksheet. The file works perfectly as is, but I'm trying to make it more efficient, because right now calculation times are huge due to the volatile functions.
In the data worksheet I have a lot of columns, each with a concept and an account number assigned to it.
So, for example:
                5014255   5324232   5566544
Name   Store    Salary    Taxes     Other Benefit   P1 Weight   P2 Weight
John   Main     222222    50000     30000           0.4         0.6
Jane   Annex    222224    50002     30004           0.3         0.7
Then, in the summary page, I use a SUMPRODUCT to gather the information per store and per account, separated by period, like this:
Store   Account   Concept         P1        P2
Main    5014255   Salary          88888.8   133333.2
Main    5324232   Taxes           20000.0   30000.0
Main    5566544   Other Benefit   12000.0   18000.0
Annex   5014255   Salary          66667.2   155556.8
Annex   5324232   Taxes           15000.6   35001.4
Annex   5566544   Other Benefit   9001.2    21002.8
This of course is just a basic example. The real file has several more validations and several categories of weights that are used for different accounts. In order to pull the information I have the following formula.
=IFNA(SUMPRODUCT(
INDIRECT(INDEX("'Sheet1'!"&ADDRESS(6,COLUMN($A$5)+MATCH($N5,Sheet1!$1:$1,0)-1,4)&":"&SUBSTITUTE(ADDRESS(1,COLUMN($A$5)+MATCH($N5,Sheet1!$1:$1,0)-1,4),"1","")&$AU$1,1,1)),
--(Sheet1!$C$6:$C$1754=2018),
--(Sheet1!$J$6:$J$1754=$I5),
IF($G5=1,Sheet1!$BN$6:$BN$1754,IF($F5=1,Sheet1!$BA$6:$BA$1754,Sheet1!$AN$6:$AN$1754))
),0)
The important piece of code is:
INDIRECT(INDEX("'Sheet1'!"&ADDRESS(6,COLUMN($A$5)+MATCH($N5,Sheet1!$1:$1,0)-1,4)&":"&SUBSTITUTE(ADDRESS(1,COLUMN($A$5)+MATCH($N5,Sheet1!$1:$1,0)-1,4),"1","")&$AU$1,1,1))
This uses string manipulation to generate the column range according to the account number. So for example, the account number 5014255 would generate Sheet1!$C$6:$C$1754, so that the values in that column can be gathered according to weight and validations.
So, I'm trying to change the logic to remove volatiles. But so far I haven't been able to do it.
I basically need my summary to find the column where the account number is, so it can do the sumproduct calculations with that column.
Any ideas or suggestions on how to work around this?
To get the full column without volatiles use:
INDEX(Sheet1!$C$6:$E$1754,0,MATCH($N5,Sheet1!$C$1:$E$1,0))
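This returns the matching column to the SUMPRODUCT formula, in place of the INDIRECT(...) expression. A row argument of 0 tells INDEX to return the whole column; widen the $C:$E range so it covers all of your account columns.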
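The idea behind the INDEX/MATCH replacement (look up the column position by account number, then do the weighted sum over that column) can be illustrated with a small Python sketch. It uses the example data from the question; the layout as lists and the function name `weighted_total` are assumptions made for illustration, not a translation of the real workbook.

```python
# Account-number row and label row, as in the example data sheet
accounts = [None, None, 5014255, 5324232, 5566544, None, None]
labels = ["Name", "Store", "Salary", "Taxes", "Other Benefit",
          "P1 Weight", "P2 Weight"]
rows = [
    ["John", "Main", 222222, 50000, 30000, 0.4, 0.6],
    ["Jane", "Annex", 222224, 50002, 30004, 0.3, 0.7],
]

def weighted_total(store, account, weight_label):
    """Sum value * weight for one store and one account column."""
    col = accounts.index(account)       # plays the role of MATCH on the header row
    wcol = labels.index(weight_label)   # column holding the period weight
    # Plays the role of SUMPRODUCT with a store filter
    return sum(r[col] * r[wcol] for r in rows if r[1] == store)

main_salary_p1 = weighted_total("Main", 5014255, "P1 Weight")
annex_taxes_p2 = weighted_total("Annex", 5324232, "P2 Weight")
print(round(main_salary_p1, 1), round(annex_taxes_p2, 1))
```

The column is found by position rather than by building an address string, which is the reason the INDEX/MATCH version needs no volatile functions.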
I know this will require an advanced custom expression - I have 2 relevant columns, commodity and sub-commodity. A single event can have multiple commodities and sub-commodities. However, some commodities will naturally have more data about them. I don't want this to skew representation of the data.
I want to avoid that by counting values by their unique sub-commodity. In the below example, I want the bar chart to represent two instances of food for Event 1 (1 for meat, 1 for beverages) and one instance of garden for Event 2. How do I accomplish this?
EventID | Commodity | Sub-Commodity
1       | Food      | Meat
1       | Food      | Meat
1       | Food      | Beverages
2       | Garden    | Lawn Mower
2       | Garden    | Lawn Mower
Thanks for the sample data. It helps in providing an answer.
You should be able to do this by using the UniqueCount() function. In this case the Y axis would be the expression UniqueCount([Sub-Commodity]). You can use custom expressions to input this or use the built-in drop down and select UniqueCount from the aggregation drop-down.
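As a cross-check outside Spotfire, here is a plain Python sketch of what a unique count per event and commodity should give on the sample data. It does not use any Spotfire API, and the function name `unique_subcommodities` is made up.

```python
# Sample rows from the question: (EventID, Commodity, Sub-Commodity)
records = [
    (1, "Food", "Meat"),
    (1, "Food", "Meat"),
    (1, "Food", "Beverages"),
    (2, "Garden", "Lawn Mower"),
    (2, "Garden", "Lawn Mower"),
]

def unique_subcommodities(rows):
    """Count distinct sub-commodities for each (event, commodity) pair."""
    seen = {}
    for event_id, commodity, sub in rows:
        seen.setdefault((event_id, commodity), set()).add(sub)
    return {key: len(subs) for key, subs in seen.items()}

counts = unique_subcommodities(records)
print(counts)
```

The duplicate Meat and Lawn Mower rows are counted once each, giving 2 for Food in Event 1 and 1 for Garden in Event 2, which is the result asked for in the question.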
Take a look at the below image showing my implementation with a custom expression. Coloring is optional but I did so to show how it included your multiple sub-commodities and how they were counted (1 for Food=>Meat).
See below for an example with the drop-down:
Let me know if you have any questions or need further clarification. Thanks!