Power BI: Creating a visual using two dates from the same table

I have a table of bugs that I want to create a line graph visual on:
| Id | Created Date | Closed Date |
|----|--------------|-------------|
| 1 | 01/01/2020 | 01/02/2020 |
| 2 | 01/01/2020 | 01/03/2020 |
| 3 | 02/01/2020 | |
I want to create a line chart that shows, per day, how many bugs were created and how many were closed, each as a cumulative running total, using two lines.
Is it possible to create this from the one table (using two y-axes)? Or do I need another table for the dates, and if so, what is the best way to create the relationship?

This would be a great use case for a simple measure.
Running Total MEASURE =
CALCULATE (
    COUNT ( 'Table'[ID] ),
    FILTER (
        ALL ( 'Table' ),
        'Table'[Created Date] <= MAX ( 'Table'[Created Date] )
    )
)
In the above DAX expression, simply plug in your created date column and the bug ID column where appropriate. It counts every ID whose created date falls on or before each date on the axis, which gives the running total.
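For the second line (bugs closed), a companion measure along the same lines could filter on the closed date instead. This is a sketch, assuming both measures are plotted against Created Date on the axis; the NOT ISBLANK guard keeps still-open bugs (blank Closed Date) out of the running total:
Closed Running Total MEASURE =
CALCULATE (
    COUNT ( 'Table'[ID] ),
    FILTER (
        ALL ( 'Table' ),
        // a blank Closed Date means the bug is still open, so exclude it
        NOT ISBLANK ( 'Table'[Closed Date] )
            && 'Table'[Closed Date] <= MAX ( 'Table'[Created Date] )
    )
)
With both measures in the Values well and Created Date on the axis, you should get the two cumulative lines from the single table, with no separate date table required.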
Let me know if this helps

Related

Optimizing Theta Joins in Spark SQL

I have just two tables. I need the records from the first table (a big table, ~10M rows) whose transaction date is less than or equal to the effective date in the second table (a small table with one row); this result set is then consumed by downstream queries.
Table Transact:
tran_id | cust_id | tran_amt | tran_dt
1234 | XYZ | 12.55 | 10/01/2020
5678 | MNP | 25.99 | 25/02/2020
5561 | XYZ | 32.45 | 30/04/2020
9812 | STR | 10.32 | 15/08/2020
Table REF:
eff_dt
30/07/2020
So, per the logic, I should get back the first three rows and discard the last record, since its date is greater than the reference date in the REF table.
Hence, I have used a non-equi (Cartesian) join between these tables:
select /*+ MAPJOIN(b) */
    a.tran_id,
    a.cust_id,
    a.tran_amt,
    a.tran_dt
from transact a
inner join ref b
    on a.tran_dt <= b.eff_dt
However, this SQL takes forever to complete because of the cross join against the transact table, even with the broadcast hint.
So, is there a smarter way to implement the same logic that would be more efficient? In other words, is it possible to optimize the theta join in this query?
Thanks in advance.
Referring to https://databricks.com/session/optimizing-apache-spark-sql-joins :
Can you try bucketing on tran_dt (bucketed by year/month only) and splitting the work into two queries?
First query: tran_dt (year/month) < eff_dt (year/month). This lets the engine prune whole buckets that fall before 2020/07, rather than checking each and every record's tran_dt.
Second query: tran_dt (year/month) = eff_dt (year/month) and tran_dt (day) <= eff_dt (day), as sketched below.
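A minimal sketch of that two-query split, assuming transact is partitioned or bucketed on a derived tran_ym (year/month) column and the single eff_dt value from REF is inlined (both the derived column and the inlined literal are assumptions, not part of the original schema):
-- buckets strictly before the effective month: no per-row date check needed
select tran_id, cust_id, tran_amt, tran_dt
from transact
where tran_ym < '2020-07'

union all

-- only the boundary month needs the row-level day comparison
select tran_id, cust_id, tran_amt, tran_dt
from transact
where tran_ym = '2020-07'
  and tran_dt <= to_date('30/07/2020', 'dd/MM/yyyy')
Because the first branch filters purely on the bucket column, the engine can skip whole groups of data; only the boundary month pays for the per-row comparison.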

Making the SWITCH function return a column in a table and not a measure (Power BI DAX)

What I'm trying to do is change measures using slicers in Power BI Desktop. I have found some examples of people who have done that (for instance, this one).
What they do is create a table with measure IDs and measure names. The 'Measure Name' column of this table is used as the field value in the visualization's filter. Then they create a SWITCH measure that, given the value selected in the filter, switches to the corresponding measure.
You can see an example below:
Measure Value = SWITCH (
    MIN ( 'Dynamic'[Measure ID] ),
    1, [Max Temp],
    2, [Min Temp],
    3, [Air Pressure],
    4, [Rainfall],
    5, [Wind Speed],
    6, [Humidity]
)
where 'Dynamic' is a table containing a Measure ID and a Measure Name:
Dynamic:
Measure ID | Measure Name
1 | Max Temp
2 | Min Temp
3 | Air Pressure
4 | Rainfall
5 | Wind Speed
6 | Humidity
All of the 'Measure Name' entries are measures as well.
My problem is that I have too many columns (400!) and I cannot turn them into measures one by one; it would take days. I was thinking that maybe I could use the SWITCH function so that it returns the column in the table and NOT the corresponding measure. However, I cannot just insert
'Name of the table'['Name of the column'] in the SWITCH function as a result parameter.
Does anyone know how to make SWITCH return a column and not a measure? (Or any other suggestion?)
DAX doesn't work well with lots of columns like this, so I'd suggest reshaping your data (in the query editor) by unpivoting all the columns you want to work with. Instead of a table that looks like this:
ID | Max Temp | Min Temp | Air Pressure | Rainfall | Wind Speed | Humidity
---+----------+----------+--------------+----------+------------+----------
1 | | | | | |
...
you'd unpivot all those data columns so it looks more like this:
ID | ColumnName | Value
---+--------------+-------
1 | Max Temp |
1 | Min Temp |
1 | Air Pressure |
1 | Rainfall |
1 | Wind Speed |
1 | Humidity |
...
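In the query editor, that reshaping is a single unpivot step. A sketch in Power Query M, assuming the wide table is already loaded as Source and ID is the only key column (both names are placeholders):
let
    // keep ID as-is; turn every other column into (ColumnName, Value) pairs
    Unpivoted = Table.UnpivotOtherColumns(Source, {"ID"}, "ColumnName", "Value")
in
    Unpivoted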
Then you can create a calculated table, Dynamic, to use as your slicer:
Dynamic = DISTINCT ( Unpivoted[ColumnName] )
Now you can write a switching measure like this:
SwitchingMeasure =
VAR ColName = SELECTEDVALUE ( Dynamic[ColumnName] )
RETURN
    CALCULATE ( [BaseMeasure], Unpivoted[ColumnName] = ColName )
where [BaseMeasure] is whatever aggregation you're after, e.g., SUM ( TableName[Value] ).
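For concreteness, under the unpivoted schema above the base measure could be just:
BaseMeasure = SUM ( Unpivoted[Value] )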

Azure Application Insights Query - How to calculate percentage of total

I'm trying to add a column to an output table that calculates each item's percentage of the total items.
Something like this:
ITEM | COUNT | PERCENTAGE
item 1 | 4 | 80
item 2 | 1 | 20
I can easily get a table with rows of ITEM and COUNT, but I can't figure out how to get the total (5 in this case) as a number so I can calculate the percentage column:
someTable
| where name == "Some Name"
| summarize COUNT = count() by ITEM = tostring(customDimensions["SomePar"])
| project ITEM, COUNT, PERCENTAGE = (COUNT / ?) * 100
Any ideas? Thank you.
It's a bit messy to create a query like that. I've done it based on the customEvents table in Application Insights, so take a look and see if you can adapt it to your specific situation.
You have to create a table that contains the total count of records, and then join that table. Since you can join only on a common column, you need a column that always has the same value; I chose appName for that.
So the whole query looks like:
let totalEvents = customEvents
// | where name contains "Opened form"
| summarize count() by appName
| project appName, count_;
customEvents
// | where name contains "Opened form"
| join kind=leftouter totalEvents on appName
| summarize count() by name, count_
| project name, totalCount = count_, itemCount = count_1, percentage = (todouble(count_1) * 100 / todouble(count_))
If you need a filter you have to apply it to both tables.
It is not even necessary to do a join or create a table containing your totals. Just calculate your total and save it in a let, like so:
let totalEvents = toscalar(customEvents
    | where timestamp > "someDate" and name == "someEvent"
    | summarize count());
Then you can simply add a column to the next table, where you need the percentage calculation, by doing:
| extend total = totalEvents
This will add a new column to your table filled with the total you calculated.
After that you can calculate the percentage as described in the other answers:
| extend percentage = todouble(count_) * 100 / todouble(total)
where count_ is the column created by the summarize count() you presumably run before adding the percentages.
Hope this also helps someone.
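Putting that together for the table in the question (a sketch; someTable, "Some Name", and SomePar are the asker's placeholders):
let total = toscalar(
    someTable
    | where name == "Some Name"
    | summarize count()
);
someTable
| where name == "Some Name"
| summarize COUNT = count() by ITEM = tostring(customDimensions["SomePar"])
| extend PERCENTAGE = todouble(COUNT) * 100 / total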
I think the following is more intuitive: just extend the set with a dummy property and join on that.
requests
| summarize count()
| extend a = "b"
| join (
    requests
    | summarize count() by name
    | extend a = "b"
) on a
| project name, percentage = (todouble(count_1) * 100 / todouble(count_))
This might work too:
someTable
| summarize count() by item
| as T
| extend percent = 100.0 * count_ / toscalar(T | summarize sum(count_))
| sort by percent desc
| extend row_cumsum(percent)

Recreating a non-straightforward Excel 'vlookup'

I'm looking for some thoughts on how you might recreate a 'vlookup' that I currently do in Excel.
I have two tables: Data contains a list of datetime values; DateConverter contains a list of calendar dates and their associated "network dates." Imagine a business where not every day is a workday: when I calculate differences between dates, I'm most interested in the number of work days that elapsed between my two dates.
Here is what the data might look like:
Data Table DateConverter Table
================= ===================
| Datetime      | | Calendar date | Network date |
| ------------- | | ------------- | ------------ |
| 6-1-15 8:00a | | 6-1-15 | 1000 |
| 6-2-15 1:00p | | 6-2-15 | 1001 |
| 6-3-15 7:00a | | 6-3-15 | 1002 |
| 6-10-15 3:00p | | 6-4-15 | 1003 |
| 6-15-15 1:00p | | 6-5-15 | 1004 |
| 6-12-15 2:00a | | 6-8-15 | 1005 | // Skips the weekend
| ... | | ... | ... |
In Excel, I can easily map in the network date for each value in the Datetime field with a variant of a vlookup:
// Assume that Datetime values are in Column A, Calendar date values in
// Column C, Network date values in Column D - this formula fills Column B
// Headers are in row 1 - first values are in row 2
B2=OFFSET($D$1,COUNTIFS($C:$C,"<"&A2),)
The formula counts the calendar dates that are less than the lookup value (using COUNTIFS because the values in the search array are dates while the search value is a datetime) and returns the associated network date. For example, for 6-2-15 1:00p, two calendar dates (6-1-15 and 6-2-15) fall before the lookup value, so OFFSET moves two rows down from D1 and returns 1001.
Is there a way to do this in Tableau? Will it require a calculated field or can I do this with some kind of join?
Thanks in advance for the help! Let me know if there is anything I can clarify.
If the tables are on the same data server, you have the option to use joins, which is usually the most efficient way to combine information from different tables. If the tables are on different servers or platforms, then you can't use a single query to join them.
In either case, you can use Tableau data blending, which is sort of like a client-side join of aggregated results from multiple queries. It's a pretty useful technique, but a little more complex and restricted, and usually less efficient, than a server-side join.
So if you have the option to put both tables on the same server, start with that. It will be simpler and likely faster.
Note that if you are going to use a date as a join key, you probably want to define it as a date and not a datetime.
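As a sketch of what that server-side join could look like in SQL (table and column names follow the question, with spaces replaced by underscores; the datetime is cast down to a date so the keys match, per the note above):
select
    d.Datetime,
    c.Network_date
from Data d
join DateConverter c
    on cast(d.Datetime as date) = c.Calendar_date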
#alex-blakemore's response would normally be adequate, but if you can change the schema, you could simply add the network date to the Data table. The hourly granularity should not cause excessive growth, and you don't need to navigate the joining.
Then, instead of counting rows and requiring a sorted table, simply subtract the two network dates and add 1. For example, 6-1-15 to 6-3-15 gives 1002 - 1000 + 1 = 3 work days.

Excel: max() of count() with column grouping in a pivot table

I have a pivot table fed from a MySQL view. Each returned row is basically an instantiation of "a person, with a role, at a venue, on a date". Each cell then shows a count of persons (let's call the field person_id).
When you pivot this in excel, you get a nice table of the form:
| Dates -->
--------------------------
Venue |
Role | -count of person-
This makes a lot of sense, and the end user likes this format, BUT the requirement has changed: the columns (dates) must now be grouped into weeks.
When you group them in the normal way, the count is applied across the grouped columns as well. This is, of course, logical behaviour, but what I actually want is the max() of the original count().
So the question: does anyone know how to have the cells count(), but the grouping perform a max()?
To illustrate this, imagine the columns for a week. Then imagine the max() grouped as a week, giving:
Old:
| M | T | W | T | F | S | S ||
--------------------------------------- .... for several weeks
Venue X |
Role Y| 1 | 1 | 2 | 1 | 2 | 3 | 1 ||
New (grouped by week)
| Week 1 | ...
---------------------------
Venue X |
Role Y| 3 | ...
I'm not at my PC, but the steps below should be broadly correct:
Right-click on the date field in the pivot table and select Group.
Then highlight Weeks; you may have to select Years also.
Lastly, right-click on the count data you already have, expand 'Summarize Values By', and select Max.
