Bar Chart of student performance - Tableau - graphics

I have this dashboard. In sheet two, the bar chart just counts the number of marks in each range specified on the x-axis. What I actually want is a bar chart over the same ranges, but it should count each student's average mark. In sheet 3, the bar chart looks similar to what I expect, but if you take a look, it just stacks each student's average one on top of another.
So, how can I make a bar chart showing the frequency of students' average marks? The ranges should be: [0, 5>, [5, 10>, [10, 15>, [15, 20].

One solution is to create a custom SQL data connection that first calculates the average NOTA for each student, as below:
select NOMBRES, avg(NOTA) as avg_nota from YOUR_TABLE group by NOMBRES
Then you can create a histogram for avg_nota, either with Show Me or manually.
Here is a link to an example based on your original
The SQL above weighs each score equally, which is fine if each course has exactly the same number of grades. But if the number of records varies between courses, you should adjust the approach to make sure each course is weighted the same (e.g. so that a course with 10 small tests does not get weighted twice as much as a course with 5 larger tests). The solution in that case might involve repeating the above step in a nested subquery or view, grouping by both NOMBRES and CURSO. Still, this simple approach should give you the basic idea.
The solution above works, but I think there ought to be a way to get the same effect using table calculations without resorting to custom SQL.
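The two-step logic in the first answer (average per student, then count students per range) can be sketched outside Tableau as follows. This is only an illustration in Python: the sample rows are made up, and the names `student_averages` and `histogram` are hypothetical helpers, not part of Tableau or of the original workbook. The `weight_courses_equally` flag mirrors the nested grouping by NOMBRES and CURSO mentioned above.

```python
from collections import defaultdict

# Hypothetical sample rows: (NOMBRES, CURSO, NOTA). Not taken from the original data.
ROWS = [
    ("Ana", "Math", 18), ("Ana", "Math", 16), ("Ana", "History", 9),
    ("Luis", "Math", 4), ("Luis", "History", 5),
    ("Rosa", "Math", 20), ("Rosa", "History", 20),
]

BIN_LABELS = ["[0,5>", "[5,10>", "[10,15>", "[15,20]"]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def student_averages(rows, weight_courses_equally=False):
    """Average NOTA per student; optionally average within each course first."""
    per_student = defaultdict(list)
    if weight_courses_equally:
        # First level: one average per (student, course) pair.
        per_course = defaultdict(list)
        for name, course, mark in rows:
            per_course[(name, course)].append(mark)
        # Second level: each course contributes a single value per student.
        for (name, _course), marks in per_course.items():
            per_student[name].append(mean(marks))
    else:
        for name, _course, mark in rows:
            per_student[name].append(mark)
    return {name: mean(values) for name, values in per_student.items()}


def histogram(averages):
    """Count students per range; assumes averages lie between 0 and 20."""
    counts = dict.fromkeys(BIN_LABELS, 0)
    for avg in averages.values():
        # The last range is closed, so an average of exactly 20 goes in [15,20].
        index = min(int(avg // 5), len(BIN_LABELS) - 1)
        counts[BIN_LABELS[index]] += 1
    return counts
```

Calling `histogram(student_averages(ROWS))` gives one count per range, which is what the bar heights should show.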

Related

Using Sumproduct to calculate two tables using horizontal (table headers) and vertical references

Hopefully the title makes some sense, because I'm trying to wrap my head around the logic and I'm not quite sure how to phrase the question. I'll try to give a brief explanation of the end goal without overcomplicating it with unnecessary details.
I have a table of survey score averages for every month per person and a corresponding table with the number of surveys each person received for each month. The logic is essentially to multiply the score for each month by the number of surveys, sum the results, and divide by the total number of surveys within that time period to get the true average. Where things get a little complicated is that I have to include the ability to set a custom date range and return the value. So sometimes I might be looking at the average for Jan-Apr, other times I might just be looking at Feb-Mar, etc.
I think SUMPRODUCT is going to get what I need done, but I'm running into issues trying to write it out. I've written it several different ways and none of them worked, so here's the one that best conveys what I'm trying to do:
=SUMPRODUCT(--(F7:I7,L7:O7>=C2),--(F7:I7,L7:O7<=C3),--(E8:E12,K8:K12=B9),tbl_average[[Jan-20]:[Apr-20]],tbl_surveys[[Jan-20]:[Apr-20]])
I super appreciate any assistance I can get on this. I'm hoping the end result is not nearly as difficult as I'm making it out to be.
Some additional information:
I'm going to be using this same process to calculate multiple metrics across multiple worksheets. In the test example, each of the tables will most likely be on different sheets. The dashboard with the calculated results will contain everyone's names and will be filtered and rearranged frequently, so I need to make sure we're always matching directly to their names and not just the relative rows. Basically, in my example I show that Agent 1 is always lined up on row 8, but that's not always going to be the case. Agent 1 could be in Row 8 on Sheet 1, Row 10 on Sheet 2, and Row 12 on Sheet 3, and I need all the correct values to multiply and sum against one another.
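The calculation described in the question (a survey-weighted average over a chosen date range, matched by name rather than by row position) can be written out as a small Python sketch. The data values, the dictionary layout, and the function name `weighted_average` are all assumptions made for illustration; this is not a SUMPRODUCT formula, only the logic such a formula would need to reproduce.

```python
from datetime import date

# Hypothetical data keyed by agent name, then by month (first day of the month).
# Keying by name means row order on any sheet does not matter.
AVERAGES = {
    "Agent 1": {date(2020, 1, 1): 4.0, date(2020, 2, 1): 5.0, date(2020, 3, 1): 3.0},
    "Agent 2": {date(2020, 1, 1): 2.0, date(2020, 2, 1): 4.0, date(2020, 3, 1): 4.5},
}
SURVEYS = {
    "Agent 1": {date(2020, 1, 1): 10, date(2020, 2, 1): 30, date(2020, 3, 1): 0},
    "Agent 2": {date(2020, 1, 1): 5, date(2020, 2, 1): 5, date(2020, 3, 1): 10},
}


def weighted_average(agent, start, end, averages=AVERAGES, surveys=SURVEYS):
    """Survey-weighted average score for one agent over months in [start, end]."""
    total_score = 0.0
    total_surveys = 0
    for month, score in averages[agent].items():
        if start <= month <= end:
            count = surveys[agent].get(month, 0)
            total_score += score * count  # monthly average times survey count
            total_surveys += count
    if total_surveys == 0:
        return None  # no surveys in the range, so the average is undefined
    return total_score / total_surveys
```

For example, `weighted_average("Agent 1", date(2020, 1, 1), date(2020, 2, 1))` weights January by 10 surveys and February by 30, so the months do not count equally.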

PowerBI DAX: logic to use aggregated table as parameter in functions or another workaround to calculate dataset KPI filtered by any field?

In PowerBI, I need to create a Performance Indicator (KPI) measure which evaluates dataset values in a scale from 0 to 1, with target (1) being the MAX value in a 20 years history. It's a national airport trip records open database. The formula is basically [value]/[max value].
My dataset has a lot of fields and I wish I could filter it by any of these fields, with a line chart showing the 0-1 indicator for each month based on the filters.
This is my workaround test solution:
Table 1 - Original dataset: if I filter something here, the tables below also update (there are more fields to the left, including YEAR and MONTH)
Table 2 - Reference to original dataset, aggregating YEAR-MONTH by the sum of "take-offs" (decolagens)
Table 3 - Reference to above (sum) table, aggregating MONTH by the max of "take-offs" (decolagens)
Table 4 - 'Sum table' merged to 'Max table' by MONTH as new table: then do [Value]/[Max] and we've got the indicator
So if I filter the original dataset by any field, all other tables update accordingly and the indicator always stays between 0 and 1. Works like a charm.
TL;DR
The problem is: I need to create a dashboard of this on Power Bi. So I need this calculation to be in a measure or another workaround.
My possible solution: use pure DAX code in the measure field to produce Tables 2 and 3, so I can divide the month sum values by their month max value (both of which will be produced according to the Power BI dashboard slicers) and get the indicator produced dynamically.
I'm stuck at: I don't understand how I can reference a sum/max aggregate table in DAX code. Something like = SUM (dataset[take-offs]) / MAX (SUM (dataset[take-offs])). Of course these functions do not work like that, but I hope I made my point clear: how can I produce this four-table effect with a single measure?
Other solutions are welcome.
Link to the original dataset: https://www.anac.gov.br/assuntos/dados-e-estatisticas/dados-estatisticos/arquivos/DadosEstatsticos.csv
It's an open dataset, so I guess there's no problem sharing it. Please help! :)
EDIT: please download the dataset and try to solve this. Personally, I think it's a good statistics question that will eventually help others. The calculation works; it only needs to be ported to a Power BI measure.
Add the ALL formula:
Measure = SUMX(ALL('Table'),[Valor])/SUM('Table'[Max])
Example
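To make the target of the measure concrete, the four-table logic from the question (sum per year-month, max of those sums per month, then the ratio) can be sketched in Python. This is not a translation of the DAX measure above; the sample rows, the `airline` field, and the function name `monthly_indicator` are assumptions for illustration, and the sketch assumes positive take-off counts.

```python
from collections import defaultdict

# Hypothetical rows: (year, month, take_offs, airline). Not from the ANAC dataset.
ROWS = [
    (2018, 1, 100, "A"), (2018, 1, 50, "B"),
    (2019, 1, 120, "A"), (2019, 1, 80, "B"),
    (2018, 2, 90, "A"), (2019, 2, 60, "A"),
]


def monthly_indicator(rows, keep=lambda row: True):
    """Return {(year, month): sum / max sum for that month across years}."""
    # Table 2: sum of take-offs per (year, month), after applying the filter.
    sums = defaultdict(int)
    for row in rows:
        if keep(row):
            year, month, take_offs, _airline = row
            sums[(year, month)] += take_offs
    # Table 3: for each calendar month, the largest yearly sum.
    max_by_month = defaultdict(int)
    for (_year, month), total in sums.items():
        max_by_month[month] = max(max_by_month[month], total)
    # Table 4: ratio of each month's sum to that month's maximum.
    return {key: total / max_by_month[key[1]] for key, total in sums.items()}
```

Passing a different `keep` function, such as `lambda row: row[3] == "A"`, plays the role of a slicer: both the sums and the maxima are recomputed on the filtered rows, so the indicator stays between 0 and 1.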

Two different numbers after an average inside Google Big Query and Data Studio

I am averaging a number, grouped by week, inside Google Data Studio, and I am averaging the same numbers, grouped by week, inside BigQuery. However, the output is slightly different.
Overall Score
AVG(table.score) OVER (PARTITION BY Weeknum) as OverallScore
The datasource is a list of scores, along with a date. I am averaging this inside DS using the aggregate function within the metric, and using the Time dimension ISO Year Week.
The purpose of this is to have one set of numbers hard coded, whilst the other line is used to filter to different departments, keeping the original "overall" score present to be used as a benchmark.
Exporting my table into Excel, I can average it filtered by week 3 (see below), and it returns 19.59 as well. This means the AVG aggregate function inside Data Studio matches Excel. Also, I can query the table using the query below, which rules out an averaging difference inside BigQuery. However, when I place the overall score into the graph below, I get slightly different numbers for the overall score.
SELECT avg(overallscore) FROM `dbo.table` where weeknum = '2018 3'
Output = 19.59
Does anyone have an idea what may be causing this?
When you open the report, you should be able to see the query it runs in your query history in BigQuery. Check that it's using the same formula, as sometimes it uses approximate aggregates.

Creating independent variable Stata in Panel Data model

Data and description of variables
Picture 1 and Sample unbalanced paneldata
Picture 1 shows a balanced panel data set that I created from the unbalanced one shown as a sample in the same image, where I had multiple products (ID) observed for different numbers of years (YEAR). For each product, a different number of shops offered the given product (ID). As stated, this is a balanced set, created by keeping only the same years, the same products (ID), and the same shops (marked by the orange area in the sample unbalanced panel data). This is an important assumption that might affect the perception of the issue stated below. The following is therefore a description of the table shown in Picture 1:
Years indicates the periods in which a given product (ID) is observed
Shop 1, Shop 2, Shop 3 indicate different prices for a given product (ID) by different firms
The minimum and second minimum value show which shops have the lowest and second lowest price for a given year and product (ID). This is needed to calculate the Price difference, which is (Second minimum value - Minimum value) / (Minimum value)
An example of this, is given for row 5 (Year 01.01.1995 - ID 101) where Price difference would be (3999-3790)/3790 = 5,51% (In Picture 1)
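The Price difference definition above can be checked with a short Python sketch. The function name `price_difference` is a hypothetical helper, and the third shop price in the usage example is made up; only 3999 and 3790 come from the row quoted above. Missing prices are represented as `None`.

```python
def price_difference(prices):
    """(second minimum - minimum) / minimum over the non-missing shop prices."""
    offered = sorted(p for p in prices if p is not None)
    if len(offered) < 2:
        return None  # with fewer than two shops there is no second minimum
    lowest, second_lowest = offered[0], offered[1]
    return (second_lowest - lowest) / lowest
```

With shop prices of 3999, 3790 and a hypothetical third price of 4100, `round(price_difference([3999, 3790, 4100]) * 100, 2)` gives 5.51, matching the 5,51% worked out by hand.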
Issue
In my balanced panel data (Picture 1), I want to run a fixed effect regression in Stata using the xtreg command, where the dependent variable is the Price difference and the number of shops selling a product are the independent variables. This is so I can say how Price difference, as the dependent variable, is affected when there is one shop selling, when there are two shops selling, and when there are three shops selling.
Another problem: is my assumption of creating a balanced panel valid at all? Is it correct to create a balanced panel from the unbalanced panel data, or must I use the unbalanced panel to create such a variable?
So my main issue is how to create independent variables that measure the number of shops offering products. To clarify what I mean, I have included an example of a sample fixed effect regression that may explain the structure I am after, in Picture 2 below:
NOTE (In picture 2 expected cell mean to the right is the same as Price difference in Picture 1, and is used as dependent variable. They are regressed on number of firms/shops as independent variables, and these I have an issue creating)
Picture 2
What I have tried
I have tried, using dummy variables, on shops, but they ended up getting dropped. The dataset provided in picture 1 is a balanced data set as mentioned, which is needed to run (I assume) a fixed effect regression on a paneldata.
End remark
I asked this question earlier in a much less precise manner, and I apologise for any inconvenience. The problem, I think, might be that I have set it up wrong in Excel, hence the dummies are dropped, or something of that nature. It might also be that I have to use the unbalanced set in order to create this independent variable, so the problem may be that I am attempting to use a balanced set instead of the unbalanced one.
In your unbalanced sample (as we discussed in the comments, the balanced sample will not make sense), we first need to create a variable for the number of shops offering each ID. Let us say we have the same data as in the top portion of your Picture 1:
egen number_of_firms = rownonmiss(Shop*)
xtset ID year // to use xtreg, we must tell Stata the data are panel
xtreg Price_difference i.number_of_firms, fe // fe requests fixed effects; xtreg defaults to random effects
The xtreg is the regression shown in your Picture 2.
If you want the number of firms variable to be formatted a bit more like Picture 2, you can do something like this:
qui levelsof number_of_firms, local(num)
foreach n in `num' {
local lab_def `lab_def' `n' "`n' Firms"
}
label def num_firms `lab_def'
label values number_of_firms num_firms
label var number_of_firms "Number of Firms"
Then run the regression, and the output will be formatted with the number of firms labels.
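For readers unfamiliar with `egen ... rownonmiss()`, the idea behind the first Stata line is simply a per-row count of non-missing shop prices, and the labelling loop maps each distinct count to a readable label. A rough Python equivalent, with hypothetical function names and `None` standing in for a missing price, might look like this:

```python
def number_of_firms(shop_prices):
    """Count the shops with a non-missing price in one row of the panel."""
    return sum(1 for price in shop_prices if price is not None)


def firm_labels(counts):
    """Map each distinct count to a label, mirroring the `n' Firms labels."""
    return {n: f"{n} Firms" for n in sorted(set(counts))}
```

For instance, `number_of_firms([3999, None, 3790])` returns 2, and `firm_labels([2, 3, 2, 1])` returns a label for each of the counts 1, 2 and 3.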

Displaying multiple items in Excel graph and few calculation issues

I've done some Googling for each of my issues but haven't found exactly the results I wanted. What I need probably doesn't require any macros/VBA skills, just basic knowledge of Excel.
Now to my spreadsheet. I'm a Dota 2 player and I like statistics. I like them so much that I'd like to keep track of my achievements and results. The only problem is that the game tracker sucks, and to get good information on the web you have to pay for it, so I decided it's time to create my own spreadsheet to track my skills.
I don't know which place is best for sharing my spreadsheet, but I uploaded it to an Estonian file host; the link is here. I will also provide pictures so you don't have to download anything.
This is what it looks like in general:
Problem number 1: The left table, or column, has 1000 rows. In web design it's possible to make elements fixed regardless of the scroll position, and I'd like to use a similar feature here. If the table gets scrolled down, the right table (the area with games, bonus and graph) should scroll down with it.
Problem number 2: Average MMR. I'd like to show the average MMR after each entry, counting from the first entry. Right now there's an average MMR for J4:J8. The calculation for J8 looks like this: =AVERAGE(C4:C8). For J7 it looks like this: =AVERAGE(C4:C7). I'd like to do this for all my 1000 rows, but I don't want to type it out. If I try to drag down from the corner, it continues with C5:C8, C6:C9, etc. (so it changes the starting point).
Problem number 3: Under longestGame there are currently Date and Hero. These should show the Date and Hero of the game that lasted longest. I tried to do this with the LOOKUP function, but it required the table to be in ascending order, which I don't want. For the current value, 44,22, they should be Storm Spirit and 14.06.2015.
Problem number 4: Graph. I'd like to display three series on the graph: MMR, average MMR and game length (time). The problem is that MMR and average MMR will be in the range 3000-7000, but the game length will probably only be in the range 20:00-120:00. Maybe it's possible to add two sets of values to the Y axis, or maybe set the Time series maximum to 200:00 and minimum to 0:00 and create the graph according to this. I'm really bad at making graphs and I haven't figured out a clever way yet.
Problem number 5: Graph again. Right now I have to set the series for the graph by hand. I've currently set it to C4:C54 (so 50 rows). I'd like it to move along, by which I mean that if there happens to be a game in C55, the graph would start from C5:C55 and so on (so it always covers the last 50 games).
I'm in a benevolent mood, so rather than downvoting your question because it is not really suitable for this forum, I'm going to give you some hints and guidance. The numbers below correspond to the problems in your question.
1. Excel permits more than one window to be used on the same workbook, so one window can show the data and one the summary.
2. Find out about absolute and relative cell addressing. It's a valuable bit of knowledge for anyone serious about Excel, and it will be of use in solving your problem.
3. Find out about the MATCH function. You can use this to find out which row of your table contains the longest game, shortest game, max MMR or min MMR by matching an element from the summary on the right (cols M onward) against the appropriate column of the table on the left. Then find out about the INDEX function. This can be used to pull the values in the columns for Hero and Date which correspond to a specific row (such as the row containing the longest game, shortest game, etc.). Search for INDEX MATCH and find out why using these two functions in combination is often preferred to using the VLOOKUP function.
4. Persevere. There are graph options available to do what you want, and the only way to really learn is to go through the pain of trying them out, failing and working at it until you succeed.
5. Set up an area of the worksheet to hold the 50*3 table of data for your graphs. Find out about the COUNT function and think about how it might be of use in determining which rows of the data table map to the 50 rows of graph data. Then think about how to populate the graph data table using one of the functions mentioned above. Incidentally, C4:C54 is actually 51 rows, not 50.
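The logic behind hints 2, 3 and 5 can be illustrated outside Excel. The Python sketch below is not a worksheet solution; the function names and the sample game rows are hypothetical (only the Storm Spirit game of 44,22 on 14.06.2015 comes from the question). In worksheet terms, the running average corresponds to fixing the first row of the range, for example =AVERAGE(C$4:C4) filled down, so that only the end of the range moves.

```python
def running_averages(values):
    """Average of the first 1, 2, 3, ... values (the start of the range stays fixed)."""
    averages, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


def longest_game(games):
    """Find the row with the longest game, then pull Date and Hero from it."""
    lengths = [game["length"] for game in games]
    row = lengths.index(max(lengths))              # the MATCH step: which row?
    return games[row]["date"], games[row]["hero"]  # the INDEX step: read that row


def last_n(values, n=50):
    """The most recent n entries, for a chart that follows the newest games."""
    return values[-n:]


# Hypothetical sample rows; lengths are (minutes, seconds) tuples.
GAMES = [
    {"date": "12.06.2015", "hero": "Lina", "length": (31, 5)},
    {"date": "14.06.2015", "hero": "Storm Spirit", "length": (44, 22)},
    {"date": "15.06.2015", "hero": "Pudge", "length": (28, 40)},
]
```

Here `longest_game(GAMES)` returns the date and hero without requiring the table to be sorted, and `last_n(list(range(4, 55)))` drops the first of the 51 row numbers 4 to 54, leaving the 50 most recent.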

Resources