Sorry if the title is vague; my mind is completely blanking on what to do.
I have a data set that holds broadcast stations, counties, and an Error-Ticket. I want to visualize the ticket count in proportion to the number of stations.
For example, if I have 50 tickets in NYC where the station count is 200, and 5 tickets in a county in Montana with only 1 station, then Montana's error rate is statistically higher.
I figure it would be a Count([Error-Ticket]) by StationID, but of course that isn't it.
I was also thinking it would be an OVER All type of function, but I have not gotten the desired results; in fact, no results, only syntax errors.
Columns are StationID, String (KTWR, KNOE, WRTV 22.2 ); County, String (Fayette, LEE, Clark); Error-Ticket, String (ABC123, ABC124, ABC125)
Any help would be appreciated.
For the error, I think you want something proportional - so it'd be something like:
(COUNT([Error-Ticket]) Over [County]) / (UNIQUECOUNT([StationID]) Over [County])
This would tell you the number of errors happening per county in proportion to how many stations are in that county.
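Outside the expression language above, the same calculation is easy to sanity-check in pandas - a minimal sketch, assuming a DataFrame with the three columns from the question (the sample rows are made up):
import pandas as pd

# Hypothetical sample in the shape described in the question
df = pd.DataFrame({
    "StationID":    ["KTWR", "KNOE", "WRTV 22.2", "KTWR"],
    "County":       ["Fayette", "LEE", "Clark", "Fayette"],
    "Error-Ticket": ["ABC123", "ABC124", "ABC125", "ABC126"],
})

per_county = df.groupby("County").agg(
    tickets=("Error-Ticket", "count"),    # COUNT([Error-Ticket]) per county
    stations=("StationID", "nunique"),    # UNIQUECOUNT([StationID]) per county
)
per_county["errors_per_station"] = per_county["tickets"] / per_county["stations"]
print(per_county)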
The conditions are as follows:
I have a budget of 500 EUR.
There are three bikes at different prices.
I need to buy a bike at the best price
If I set the budget to 0, I get the else print, but if I change the budget to 1, I get a successful if print even though my budget is less than the price of the bike.
Maybe you can help and explain what I'm doing wrong?
CODE:
budget = 500
bikeUnivega = 375
bikeKalkhoff = 449
bikeFocus = 462
bikeCatalogList = (bikeUnivega, bikeKalkhoff, bikeFocus)
lowestBikePrice = bikeCatalogList.index(min(bikeCatalogList))
if budget > lowestBikePrice:
    print(f'I Will buy the bike for the lowest price of {bikeCatalogList[lowestBikePrice]} EUR.')
else:
    print('I dont have enough money for the bike. I will save money until next year.')
I tried to find the information on google, but I was unsuccessful.
There are a couple of things I've noticed in your code:
For starters, bikeCatalogList uses (), which makes it a tuple, not a list with []. That on its own won't break anything here, but it's probably not what you intended.
Next, lowestBikePrice is not holding the price stored in the variable, but its index: min(bikeCatalogList) finds the lowest price, and .index() then returns its position.
So lowestBikePrice ends up as 0, because the lowest value is in position 0.
If you were to use max instead, it would return 2.
Lastly, your if statement compares the budget against that index, so budget > lowestBikePrice is really budget > 0 - which is why a budget of 1 already takes the if branch. The f-string lookup bikeCatalogList[lowestBikePrice] actually works, because it uses the index to fetch the price; the comparison is the part that needs the price itself.
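Putting those fixes together, a minimal corrected sketch (keeping the original variable names, but storing the lowest price itself and comparing the budget against it):
budget = 500
bikeUnivega = 375
bikeKalkhoff = 449
bikeFocus = 462

bikeCatalogList = [bikeUnivega, bikeKalkhoff, bikeFocus]  # a list; a tuple would also work here

lowestBikePrice = min(bikeCatalogList)  # the lowest price itself, not its index

if budget >= lowestBikePrice:  # compare the budget against the price
    print(f'I will buy the bike for the lowest price of {lowestBikePrice} EUR.')
else:
    print("I don't have enough money for the bike. I will save money until next year.")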
I know this is an old question. You will probably conclude that Average of the Average is always wrong. Consider the following example:
You want to understand the purchasing behaviour in a supermarket by looking at the share% of the basket. For each order, you have a share% across product categories. The dataset can look like this:
order_id, grocery%, tobacco%, cloth%, etc. The share% is based on the order amount. Each row is a unique order_id.
If you sum up all grocery amounts and divide by the total order amount, you do indeed get the average grocery share. But add some context: say the VIPs in this supermarket account for 10% of orders and each of their orders can be 1 million (just an assumption). Then it is quite possible that the result ends up close to the VIP behaviour.
If I am more interested in the average customer's behaviour, it seems better to use the average-of-the-average metric, which is: (grocery% + grocery% + ...) / number of orders.
Any thoughts?
So let me try to answer your question with an example.
Let us say, there were only three purchases made in the supermarket.
Purchase 1
Grocery Amount = 30$ (60%)
Cloth Amount = 20$ (40%)
Purchase 2
Grocery Amount = 10$ (50%)
Cloth Amount = 10$ (50%)
Purchase 3
Grocery Amount = 5$ (25%)
Cloth Amount = 15$ (75%)
Now let us calculate our metrics:
Approach "Average of Average"
Final Answer = (25% + 50% + 60%)/3 = 45%
Approach "Average"
Final Answer = (30$ + 10$ + 5$)*100 / (50$ + 20$ + 20$) = 45$*100/90$ = 50%
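Not part of the original answer, but as a quick cross-check, here are the two metrics computed from the three purchases above in a few lines of Python:
# (grocery $, cloth $) for the three purchases above
purchases = [(30, 20), (10, 10), (5, 15)]

# "Average of Average": mean of the per-purchase grocery shares
avg_of_avg = sum(g / (g + c) for g, c in purchases) / len(purchases)

# "Average": total grocery spend divided by total spend
weighted_avg = sum(g for g, _ in purchases) / sum(g + c for g, c in purchases)

print(f"{avg_of_avg:.2%}")    # 45.00%
print(f"{weighted_avg:.2%}")  # 50.00%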
Conclusion
Given the example above, the "average" approach weights each purchase by its amount, so it reflects the overall spend more accurately. But depending on your use case, either metric can be the right one.
Hope this helps!
I have different water tanks and 2 employees who measure the water tanks. Sometimes they measure the volume of a tank on the same day and sometimes not. I want to see how much their measurements differed. I understand the dates are sometimes not the same, so I would like to look up the volume of the exact same tank at the date closest to each of Bob's reading dates.
Bob's Readings

Water_Tank_Name   Date         Volume
Red               15/02/2021   300
Blue              15/02/2021   145
Red               21/02/2021   280
Red               04/03/2021   339
Blue              05/03/2021   170
Sarah's Readings

Water_Tank_Name   Date         Volume
Blue              15/02/2021   148
Blue              19/02/2021   190
Red               23/02/2021   294
Blue              01/03/2021   140
I used XLOOKUP, but that only returns a value when both the exact Water_Tank_Name and the exact Date match. However, I would like to exactly match the Water_Tank_Name and match to the closest Date.
=XLOOKUP(Bob!A2 & Bob!B2, Sarah!A:A & Sarah!B:B, Sarah!C:C)
You could use (with Excel 365):
=LET( tf, Bob!A2, df, Bob!B2,
tS, Sarah!A:A, dS,Sarah!B:B, dV, Sarah!C:C,
L, tS & dS,
S, SIGN(ABS(IFERROR(XLOOKUP(tf & df, L, dS,,-1)-df, 999)) - ABS(IFERROR(XLOOKUP(tf & df, L, dS,,1)-df, 999))),
XLOOKUP(tf & df, L, dV,,S) )
Where tf is the tank name you want to use for the search and df is the date value you want to search for. This finds the nearest date, determines whether it is smaller or larger than df, and then tells the XLOOKUP to search for the next larger or smaller value (S is either 1 or -1), which arrives at the nearest date. It might be possible to replace the two XLOOKUPs used for S with FILTERs, but I am not sure whether that would be faster. The use of whole columns for Sarah should be replaced with Excel table columns - otherwise it will run slowly.
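Not part of the original Excel answer, but if you ever do this outside Excel, the same "exact tank, closest date" match can be sanity-checked with pandas merge_asof - a minimal sketch built from the sample readings above:
import pandas as pd

bob = pd.DataFrame({
    "Water_Tank_Name": ["Red", "Blue", "Red", "Red", "Blue"],
    "Date": pd.to_datetime(["15/02/2021", "15/02/2021", "21/02/2021",
                            "04/03/2021", "05/03/2021"], dayfirst=True),
    "Volume": [300, 145, 280, 339, 170],
})
sarah = pd.DataFrame({
    "Water_Tank_Name": ["Blue", "Blue", "Red", "Blue"],
    "Date": pd.to_datetime(["15/02/2021", "19/02/2021", "23/02/2021",
                            "01/03/2021"], dayfirst=True),
    "Volume": [148, 190, 294, 140],
})

# merge_asof needs both frames sorted by the "on" key
matched = pd.merge_asof(
    bob.sort_values("Date"),
    sarah.sort_values("Date"),
    on="Date",
    by="Water_Tank_Name",      # exact match on the tank
    direction="nearest",       # closest date, earlier or later
    suffixes=("_bob", "_sarah"),
)
matched["difference"] = matched["Volume_bob"] - matched["Volume_sarah"]
print(matched)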
I am almost positive there is something that is less verbose than this monstrosity, but it works and makes sense if you take it apart piece by piece.
=IFERROR(FILTER($EG$12:$EG$15,($EE$12:$EE$15=$EE5)*($EF$12:$EF$15=$EF5+MIN(ABS($EF5-FILTER($EF$12:$EF$15,$EE$12:$EE$15=$EE5))))),FILTER($EG$12:$EG$15,($EE$12:$EE$15=$EE5)*($EF$12:$EF$15=$EF5-MIN(ABS($EF5-FILTER($EF$12:$EF$15,$EE$12:$EE$15=$EE5))))))
Hope you are doing well and can help me solve this puzzle in DAX for Power BI and Power Pivot.
I'm having trouble with my measure at the subtotal and grand total levels. My scenario is the following:
I have 3 tables (I share links below with a test file so you can see it and work on it):
1) "Data" (where every register is a sold ticket from a bus company);
2) "Km" (where I have every possible track that the bus can do with their respective kilometer). Related to "Data";
3) and a "Calendar". Related to "Data".
In "Data" I have all the tickets sold from a period with their price, the track that the passenger bought and the departure time of that track.
Each track can have more than 1 departure time (we can call it a service) but only one specific length in kilometers (specified in the "Km" table).
Basically what I need is to calculate the revenue per kilometer for each service in a period (year, month, day).
The calculation should be, basically:
Sum of [Price] (each ticket sold in the period) / Sum of [Km] (of the period, considering the services with their respective kilometers)
I managed to calculate it for the day granularity with the following logic and measures:
Revenue = SUM(Data[Price])
Unique dates = DISTINCTCOUNT(Data[Date])
Revenue/Km = DIVIDE([Revenue]; SUM(Km[Km])*[Unique dates]; 0)
I created [Unique dates] because I was trying to handle the subtotals at track granularity, taking into account that you can have more than 1 day with services within the period. For example:
For "Track 1" we have registered:
1 service on monday (lunes) at 5:00am.
Revenue = $1.140.
Km = 115.
Tickets = 6.
Revenue/Km = 1.140/115 = 9,91.
1 service on tuesday (martes) at 5:00am.
Revenue = $67.
Km = 115.
Tickets = 2.
Revenue/Km = 67/115 = 0,58.
"Subtotal Track 1" should be:
Revenue = 1.140 + 67 = 1.207.
Km = 115 + 115 = 230.
Tickets = 6 + 2 = 8.
Revenue/Km = 1.207/230 = 5,25.
At that point it might look like my formula works, but the problem shows up when there is more than 1 service per day, for example for Track 3. It also affects the grand total for March (marzo).
I understand that the problem is calculating the correct kilometers for each track in each period. If you check the "Sum of Km" column, it is also wrong.
Here is a table (excel file to download - tab "Goal") with the values that should appear:
[goal] https://drive.google.com/file/d/1PMrc-IUnTz0354Ko6q3ZvkxEcnns1RFM/view?usp=sharing
[pbix sample file] https://drive.google.com/file/d/14NBM9a_Frib55fvL-2ybVMhxGXN5Vkf-/view?usp=sharing
Hope you can understand my problem. If you need more details please let me know.
Thank you very much in advance!!!
Andy.-
Delete "Sum of Km" - you should always write DAX measures instead.
Create a new measure for the km traveled:
Total Km =
SUMX (
SUMMARIZE (
Data,
Data[Track],
Data[Date],
Data[Time],
"Total_km", DISTINCT ( Data[Kilometers Column] )
),
[Total_km]
)
Then, change [Revenue/Km] measure:
Revenue/Km = DIVIDE([Revenue], [Total Km])
Result:
The measure correctly calculates km on both subtotal and total levels.
The way it works:
First, we use SUMMARIZE to group records by trips (where trip is a unique combination of track, date and time). Then, we add a column to the summary that contains km for each trip. Finally, we use SUMX to iterate the summary record by record, and sum up trip distances.
The solution should work, although I would recommend giving more thought to the data model design. You need to build a better star schema, or DAX will continue to be challenging. For example, I'd consider adding something like "Trip Id" to each record - it would be much easier to iterate over such ids than to group records all the time. Also, more descriptive names can help keep the DAX clean (names like km[km] look a bit strange :)
By reducing I mean some form of caching where you can replace 100 rows with 1 row (accumulated counters etc.).
I want to be able to answer queries like "how many people are from %EACH_COUNTRY", so basically return an array of pairs / a map of (Country, COUNT). I have a huge number of people (think 50 * 10^8), so I can't allocate 1 row per person; I'd like to cache the results somehow to keep PeopleTable under roughly 10^6 entries (and then merge the cached results with a fast read from CacheTable). By caching I mean: count the number of people with country = %SPECIFIC_COUNTRY, write (%SPECIFIC_COUNTRY, COUNT(*)) to CacheTable (to be precise, increment the count for %SPECIFIC_COUNTRY), and then remove these rows from PeopleTable:
personId, country
1132312312, Russia
2344333333, the USA
1344111112, France
1133555555, Russia
1132666666, Russia
3334124124, Russia
....
and then
CacheTable
country, count
Russia, 4
France, 1
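A rough sketch of that compaction scheme in Python with SQLite (PeopleTable, CacheTable and the column names are just the hypothetical ones from the example): fold the per-person rows into per-country counters inside one transaction, then answer queries by merging the cache with whatever has not been folded yet.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PeopleTable (personId INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE CacheTable  (country TEXT PRIMARY KEY, count INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO PeopleTable VALUES (?, ?)", [
    (1132312312, "Russia"), (2344333333, "the USA"), (1344111112, "France"),
    (1133555555, "Russia"), (1132666666, "Russia"), (3334124124, "Russia"),
])

def compact(conn):
    # Fold per-person rows into per-country counters, then drop the folded rows.
    with conn:  # one transaction, so counters and deletions stay consistent
        conn.execute("""
            INSERT INTO CacheTable (country, count)
            SELECT country, COUNT(*) FROM PeopleTable GROUP BY country
            ON CONFLICT(country) DO UPDATE SET count = count + excluded.count
        """)
        conn.execute("DELETE FROM PeopleTable")

def counts_by_country(conn):
    # Merge cached counters with rows that have not been compacted yet.
    return dict(conn.execute("""
        SELECT country, SUM(count) FROM (
            SELECT country, count FROM CacheTable
            UNION ALL
            SELECT country, COUNT(*) FROM PeopleTable GROUP BY country
        ) GROUP BY country
    """))

compact(conn)
print(counts_by_country(conn))  # {'France': 1, 'Russia': 4, 'the USA': 1}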