Average values in a column by Class - excel

I need to create something that is used to track and report class ratings. The class evaluation form has eight questions and the ratings are from 1-4. We need to show the average rating for each question per class. It will be an ongoing list so needs to have functionality that allows users to continue to add new class ratings without re-formatting the spreadsheet.
Example, Class PD100 has a total of 5 for Q1. I need to show the average score of 1.6 for that question. Then the same for all the other questions grouped by Class.
Class # Q1 Q2 Q3 Q4
PD100 1 2 3 1
PD100 3 2 3 4
PD100 1 2 3 1
PD200 2 1 2 2
PD200 1 2 3 4
PD200 1 4 1 4
PD300 1 4 4 1

This is a bit unwieldy as the data is already laid out in a pivot, but the total average columns give you what you're looking for in creation of a pivot table.
if the data was laid out class question answer the pivot could be cleaner.
.

Related

Using apply to modify different dataframe in pandas

I'm running into some issues regarding the use of apply in Pandas.
I have a dataframe, where there is multiple measurements made on certain days, on different measurement sites. To give an example, Site 1 has 2 measurements to make, every 7 or so days.
We know it has to be 2 measurements. So what I'm trying to do now is to check where on which days there were not enough measurements made.
site measurement expected date
0 1 2 1 01-01-2020
1 2 3 2 01-01-2020
2 3 4 2 01-01-2020
3 3 5 2 01-01-2020
4 2 1 2 08-01-2020
5 2 4 2 08-01-2020
I've made a sorted and aggregated DataFrame, that has aggregated the measurements, to basically be able to iterate over the measurements as to not go over the same days twice when there are multiple measurements.
site measurement expected date
0 1 2 1 01-01-2020
1 2 3 2 01-01-2020
2 3 9 2 01-01-2020
3 2 5 2 08-01-2020
For a function, I'm now using the function filter_amount(df_sorted, df, group).
group is to count the amount of measurements made df.groupby(["site_id", "date"]).count()
site measurement date
0 1 2 01-01-2020
1 2 1 01-01-2020
2 3 2 01-01-2020
3 2 2 08-01-2020
The current function basically goes like this:
def filter_amount(df_sorted, df, group):
for i in df_sorted.index:
"Locate amount of measurements for this day and site actually made, in group"
Check how many measurements are expected.
If not enough measurements:
find all measurements in normal df and drop them
So in this example, the measurements from site 2 on 1-1-20 have to be dropped, because there are not enough measurements. The ones from 8-1-20 are valid, because they expect 2, and 2 happen.
The problem is that this is extremely slow with over 500k rows.
The variables I need, I get by using .at[], but I'm trying to make it faster by using apply so I can parallelize the operations, but can't figure it out. I'm doing the apply on df_sorted and passing the arguments needed in, but it's not actually dropping the measurements from the original df.
I have a feeling that it's possible to do it with some sort of groupby on the original df to save operations..
I hope it's clear enough, happy to elaborate any questions.

Operation over several columns

I am wondering , if I can write a formula which would operate over several columns, e.g. I want to calculate the amount of males in the school and I have a table:
A B C
Class Sex Number
1 male 3
2 male 4
1 female 6
2 female 5
Right now I have to break the operations into parts:
=(B2="Male")*C2 - additional column and then
=SUMME(D2:D5)
I want to do it at once. It seems like a trivial functionality, but I can not figure it out, how I can do it in one formula.

Compare multiple data from rows

I'm looking for a way to compare multiple rows with data to each other, trying to find the best possible match. Each number in every column must be an approximately match the other numbers in the same column.
Example:
Customer #1: 1 5 10 9 7 7 8 2 3
Customer #2: 10 5 9 3 5 7 4 3 2
Customer #3: 1 4 10 9 8 7 6 2 2
Customer #4: 9 5 6 7 2 1 10 5 6
In this example customer #1 and #3 is quite similar, and I need to find a way to highlight or sort the rows so I can easily find the best match.
I've tried using conditional formatting to highlight the numbers that are the similar, but that is quite confusing, because the amount of data is quite big.
Any ideas of how I could solve this?
Thanks!
The following formula entered in (say) L1 and pulled down gives the best match with the current row based on the sum of the absolute differences between corresponding cells:-
=MIN(IF(ROW($C$1:$K$4)<>ROW(),(MMULT(ABS($C1:$K1-$C$1:$K$4),TRANSPOSE(COLUMN($C$1:$K$4))^0))))
It is an array formula and must be entered with CtrlShiftEnter.
You can then sort on column L to bring the customers with lowest similarity scores to the top or use conditional formatting to highlight rows with a certain similarity value.
EDIT
If you wanted to penalise large differences in individual columns more heavily than small differences to try and avoid pairs of customers which are fairly similar except for having some columns very different, you could try something like the square of the differences:-
=MIN(IF(ROW($C$1:$K$4)<>ROW(),(MMULT(($C1:$K1-$C$1:$K$4)^2,TRANSPOSE(COLUMN($C$1:$K$4))^0))))
then the scores for your test data would come out as 7,127,7,127.
I'm assuming you want to compare customers 2-4 with customer 1 and that you are comparing only within each column. In this case, you could implement a 'scoring system' using multiple IFs. For example,:
A B C D E
1 Customer 1 1 1 2
2 Customer 2 1 2 2
3 Customer 3 0 1 0
you could use in E2
=if(B2=$B$1,1,0)+if(C2=$C$1,1,0)+if(D2=$D$1,1,0)
This will return a 'score' of 1 when you have a match and a 'score' of 0 when you don't. It then adds up the scores and your highest value will be your best match. Copying down would then give
A B C D E
1 Customer 1 1 1 2
2 Customer 2 1 2 2 2
3 Customer 3 0 1 0 1
so customer 2 is the best match.

Removing Excel Pivot-table columns but retaining grand total

I'm using Excel pivot-tables to produce a report. The pivot-table connects to a SSAS cube. I have 2 measures- measure 1 is a 'real' measure, measure 2 is calculated based upon measure 1. Measure 1 must be shown broken out by dimB members across the columns. WIth Measure 2 I just want the totals column.
I've hidden the measure 2 columns as a workaround but this is less than ideal as when users expand or contract the dimension B members the pivot-table moves relative to the hidden columns and the report becomes a mess. It's also returning extra data which can't help performance.
Here is what I have:
Measure 1 Measure 2 Measure 1 Total Measure 2 Total
a b c a b c
DimA- member1 2 3 4 2 3 4 9 9
DimA- member2 1 4 5 1 2 5 10 8
This is what I want:
Measure 1 Measure 1 Total Measure 2 Total
a b c
DimA- member1 2 3 4 9 9
DimA- member2 1 4 5 10 8
Is there a way to achieve the second option? Either with perhaps some mdx on the calculated measure (scope/custom rollup etc) or with the pivottable itself?
Basically I want the total without the dimension B breakdown for measure 2.
I don't know that technique, and afraid, that it's impossible.
What if to NULLify all members in scope at least for the lowest levels?
Something like this:
SCOPE ([Measures].[Measure 2 Hide]);
THIS = IIF(Axis(1).Item(0).Item(0).Hierarchy.Level.Ordinal=1
,[Measures].[Measure 2],null);
END SCOPE;
Or <2 instead of =1. Measure 2 will be at least empty for users.
Here is my Excel example (measure is called Measure 2 Hide):
CREATE MEMBER CURRENTCUBE.[Measures].[Measure 2 Hide]
AS
[Measures].[Count]-1,
VISIBLE = 1;
I understand that it's not a solution, but maybe will help somehow.

Count and average data inside categories under two types of conditions

I'm working on Excel with a lot of data and I'm having difficulty with knowing how to sort through it to get some important numbers. I have minimal Excel experience.
Right now I'm struggling with knowing how to get the average in the difference between two columns. The trick is that I have to get the average in difference when column A is less that column B and then, the same when it's more. And all that within a category.
So for example let's say I have 3 categories: Football, Soccer, and Basketball (these are just made up ones).
So in column A, I have: Soccer, Football, Basketball. Then, in column B and C, I have the scores for John and Adam for the last 3 months, respectively. Lastly, in column D, I have the differences between their scores.
So, for example:
Category John Adam Differences
Soccer 5 3 2
Soccer 6 2 4
Soccer 3 5 2
Soccer 4 0 4
I want to create a table for within each category I have a table like below:
NÂș of cases Avg. Difference between John and Adam
When John's score is >
When John's score is <
When they are equal
Is there some type of formula where I can say something like this:
If the category is Soccer (the category being in column A), take the difference between John's score (column B) and Adam's score (column C) when John's score is larger than Adam's score, then calculate the average of those differences? Then, I would use the same formula but tweak it when John's score is smaller.
Additionally, would there be a formula where I can also, calculate within the category Soccer, how many times John's score is bigger than Adam's?
My data is much larger and I can't do this manually.
A B C D
1 Sport John Adam Differences
2 Soccer 5 3 2
3 Soccer 6 2 4
4 Soccer 3 5 -2
5 Soccer 4 0 4
6 Basketball 20 15 5
7 Basketball 7 13 -6
8 Basketball 26 10 16
9 Basketball 8 11 -3
Type in D1:
=B1-C1
Drag the formula in Column D to all rows which there are values in columns A, B and C.
Create the PivotTable.
Drag Sport to "Row Label" field. Drag Differences to "Row Label" field under Sport.
Drag Differences to "Values" field as: Count of Differences (same way the previous question)
Drag Differences to "Values" field (below Count of Differences), and set the mathematical operation as "Average" of Differences (left-mouse click Differences, choose "Values fields settings" and select "Average").
Give a right-click mouse in cell A5 (see picture bellow) and select "Group" option.
Set "Starting at" = 0; "Ending at" = 1000; "By" = 1000 (as in the picture below). Click ok.
You will have in each Sport, the count (frequency) and average Differences values for two groups:
When the Difference B1-C1 is negative; and
When the Difference B1-C1 is zero or positive.
The average of Differences when the score is equal will be always zero.

Resources