Excel: max() of count() with column grouping in a pivot table

I have a pivot table fed from a MySQL view. Each returned row is basically an instantiation of "a person, with a role, at a venue, on a date". Each cell then shows a count of person (let's call it person_id).
When you pivot this in Excel, you get a nice table of the form:
        | Dates -->
--------------------------
Venue   |
  Role  | -count of person-
This makes a lot of sense, and the end user likes this format, BUT the requirement has changed to group the columns (dates) into weeks.
When you group them in the normal way, the count is then applied across the grouped columns as well. This is, of course, logical behaviour, but what I actually want is the max() of the original count().
So the question: does anyone know how to have the cells count(), but the grouping perform a max()?
To illustrate this, imagine the columns for a week, then imagine the max() taken over that week, giving:
Old:
        | M | T | W | T | F | S | S ||
---------------------------------------- ... for several weeks
Venue X |
 Role Y | 1 | 1 | 2 | 1 | 2 | 3 | 1 ||
New (grouped by week):
        | Week 1 | ...
-----------------------
Venue X |
 Role Y | 3      | ...

I'm not on my PC, but the steps below should be broadly correct:
You should be able to right-click on the date field in the pivot table and select Group.
Then highlight Days and set the number of days to 7 (the Group dialog has no explicit Week option); you may have to select Years also.
Lastly, right-click on the count data you already have, expand "Summarize Values By", and select Max.
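If you ever need to script those same steps, a rough VBA sketch might look like the following. The pivot table name "PivotTable1" and the field names "Date" and "Count of person_id" are hypothetical; adjust them to your workbook.
Sub GroupDatesByWeekAndMax()
    ' Hypothetical pivot table and field names - adjust to your workbook.
    Dim pt As PivotTable
    Set pt = ActiveSheet.PivotTables("PivotTable1")

    ' Group the date field into 7-day blocks. The Periods array is
    ' (Seconds, Minutes, Hours, Days, Months, Quarters, Years); Excel has
    ' no Week option, so weeks are expressed as Days with By:=7.
    pt.PivotFields("Date").DataRange.Cells(1).Group _
        Start:=True, End:=True, By:=7, _
        Periods:=Array(False, False, False, True, False, False, False)

    ' Switch the data field's aggregation from Count to Max.
    pt.PivotFields("Count of person_id").Function = xlMax
End Sub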

Related

Dual criteria data validation in Excel

Unlike the other questions posted on this topic, my criteria are not simple comparators. I want a dropdown list that includes all values in one named table, excluding those values that meet another criterion. For instance, a table includes employee names in one column and vacation dates in another column. I want the data validation to allow a list of employees who are not on vacation on a variable date drawn from another cell. The general method seems to be to create additional tables where the secondary criterion (in this case the date) is the column header, populated by items from the first list that satisfy some criterion. It seems impractical to create 365 tables, one named for each date and populated by rows of employees from the first table who have not requested that date off. Is there another way to accomplish this?
Sample Data:
| Employee | Vacation Dates   |     | Work on 1/26/20 |
|----------|------------------|     |-----------------|
| Bob      | 1/26/20, 1/27/20 |     | <allow only     |
| Mike     | 2/20/20, 2/21/20 |     |  Mike or Cindy> |
| Cindy    | 2/20/20, 1/28/20 |
Had to transpose my thinking. Rather than a table for each date, I can have a vacation column for each employee. The validation formula has to be a custom validation rather than a list, so no drop-down selection list is available, but it will work. The error message also cannot discriminate which criterion is being violated -- name not on the employee list versus name from the employee list who is on vacation. It would be great if validation worked like conditional formatting, with different rules applied in sequence.
| Employee | Bob     | Mike    | Cindy   |     | 1/26/20 |
|----------|---------|---------|---------|     |---------|
| Bob      | 1/26/20 | 2/20/20 | 2/20/20 |     |         |
| Mike     | 1/27/20 | 2/21/20 | 1/28/20 |     |         |
| Cindy    |         |         |         |
The validation formula for the "1/26/20" column (F in the scheme above) would be (note that both MATCH calls look up the entered name, F2):
=AND(COUNTIF($A$2:$A$4,F2)>0,COUNTIF(INDIRECT(ADDRESS(2,MATCH(F2,$B$1:$D$1,0)+1)):INDIRECT(ADDRESS(3,MATCH(F2,$B$1:$D$1,0)+1)),F1)<1)
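For what it's worth, the same check can be written a bit more readably with INDEX instead of the INDIRECT/ADDRESS round trip. This assumes the same layout: names in A2:A4, per-employee date columns B:D with headers in row 1 and dates in rows 2:3, and the date being scheduled in F1.
=AND(COUNTIF($A$2:$A$4,F2)>0,COUNTIF(INDEX($B$2:$D$3,0,MATCH(F2,$B$1:$D$1,0)),F$1)=0)
INDEX with a row argument of 0 returns the whole matched column, so the second COUNTIF checks that the candidate's vacation column does not contain the date in F1.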

Making Switch function return a column in a table and not a measure (PowerBI DAX)

What I'm trying to do is change measures using slicers in Power BI Desktop. I have found some examples of people who have done that (for instance, this one).
What they do is create a table with measure IDs and measure names. The 'Measure Name' column of this table is used as the field value in the filter of the visualization. Then they create a SWITCH function that, given a certain value in the filter, switches that value to the corresponding measure.
You can see an example below:
Measure Value = SWITCH(
    MIN('Dynamic'[Measure ID]),
    1, [Max Temp],
    2, [Min Temp],
    3, [Air Pressure],
    4, [Rainfall],
    5, [Wind Speed],
    6, [Humidity]
)
where 'Dynamic' is a table containing a Measure ID and a Measure Name:
Dynamic:
Measure ID | Measure Name
-----------+--------------
1          | Max Temp
2          | Min Temp
3          | Air Pressure
4          | Rainfall
5          | Wind Speed
6          | Humidity
All of the 'Measure Name' entries are measures as well.
My problem is: I have too many columns (400!) and I cannot turn them into measures one by one; it would take days. I was thinking that maybe I could use the SWITCH function so that it returns a column in the table and NOT the corresponding measure. However, I cannot just insert
'Name of the table'['Name of the column'] in the SWITCH function as the result parameter.
Does anyone know how to make the SWITCH function return a column and not a measure? (Or any other suggestion?)
DAX doesn't work well with lots of columns like this, so I'd suggest reshaping your data (in the query editor) by unpivoting all the columns you want to work with. Instead of a table that looks like this
ID | Max Temp | Min Temp | Air Pressure | Rainfall | Wind Speed | Humidity
---+----------+----------+--------------+----------+------------+----------
1 | | | | | |
...
you'd unpivot all those data columns so it looks more like this:
ID | ColumnName | Value
---+--------------+-------
1 | Max Temp |
1 | Min Temp |
1 | Air Pressure |
1 | Rainfall |
1 | Wind Speed |
1 | Humidity |
...
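In the query editor this is usually just a matter of selecting the ID column, right-clicking, and choosing "Unpivot Other Columns". The M step it generates looks roughly like this sketch (the step names Source and Unpivoted are hypothetical):
// Unpivot every column except ID into ColumnName/Value pairs
Unpivoted = Table.UnpivotOtherColumns(Source, {"ID"}, "ColumnName", "Value")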
Then you can create a calculated table, Dynamic, to use as your slicer:
Dynamic = DISTINCT ( Unpivoted[ColumnName] )
Now you can write a switching measure like this:
SwitchingMeasure =
VAR ColName = SELECTEDVALUE ( Dynamic[ColumnName] )
RETURN
CALCULATE ( [BaseMeasure], Unpivoted[ColumnName] = ColName )
where [BaseMeasure] is whatever aggregation you're after, e.g., SUM ( TableName[Value] ).
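As a concrete sketch, using the table and column names assumed above, the base measure could be as simple as:
// Sum of the single Value column produced by the unpivot
BaseMeasure = SUM ( Unpivoted[Value] )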

Azure Application Insights Query - How to calculate percentage of total

I'm trying to create a column in an output table that would show each item's percentage of the total:
Something like this:
ITEM | COUNT | PERCENTAGE
item 1 | 4 | 80
item 2 | 1 | 20
I can easily get a table with rows of ITEM and COUNT, but I can't figure out how to get the total (5 in this case) as a number so I can calculate the percentage in the PERCENTAGE column.
someTable
| where name == "Some Name"
| summarize COUNT = count() by ITEM = tostring( customDimensions.["SomePar"])
| project ITEM, COUNT, PERCENTAGE = (C/?)*100
Any ideas? Thank you.
It's a bit messy to create a query like that.
I've done it based on the customEvents table in Application Insights, so take a look and see if you can adapt it to your specific situation.
You have to create a table that contains the total count of records, and then join this table to the original. Since you can join only on a common column, you need a column that always has the same value; I chose appName for that.
So the whole query looks like:
let totalEvents = customEvents
// | where name contains "Opened form"
| summarize count() by appName
| project appName, count_ ;
customEvents
// | where name contains "Opened form"
| join kind=leftouter totalEvents on appName
| summarize count() by name, count_
| project name, totalCount = count_ , itemCount = count_1, percentage = (todouble(count_1) * 100 / todouble(count_))
If you need a filter you have to apply it to both tables.
This outputs a row per event name with the item count, the total count, and the percentage.
It is not even necessary to do a join or create a table containing your totals.
Just calculate your total and save it in a let statement, like so:
let totalEvents = toscalar(customEvents
| where timestamp > "someDate"
and name == "someEvent"
| summarize count());
Then you can simply add a column to the table where you need the percentage calculation by doing:
| extend total = totalEvents
This will add a new column to your table filled with the total you calculated.
After that you can calculate the percentages as described in the other two answers.
| extend percentages = todouble(count_)*100/todouble(total)
where count_ is the column created by your summarize count() which you presumably do before adding the percentages.
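Putting those pieces together, the whole query reads roughly like this sketch (a concrete datetime literal stands in for the "someDate" placeholder above):
let totalEvents = toscalar(customEvents
    | where timestamp > datetime(2020-01-01)
        and name == "someEvent"
    | summarize count());
customEvents
| where timestamp > datetime(2020-01-01)
    and name == "someEvent"
| summarize count() by name
| extend total = totalEvents
| extend percentage = todouble(count_) * 100 / todouble(total)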
Hope this also helps someone.
I think the following is more intuitive. Just extend the set with a dummy property and do a join on that...
requests
| summarize count()
| extend a="b"
| join (
requests
| summarize count() by name
| extend a="b"
) on a
| project name, percentage = (todouble(count_1) * 100 / todouble(count_))
This might work too:
someTable
| summarize count() by item
| as T
| extend percent = 100.0*count_/toscalar(T | summarize sum(count_))
| sort by percent desc
| extend row_cumsum(percent)

Recreating a non-straightforward Excel 'vlookup'

I'm looking for some thoughts on how you might recreate a 'vlookup' that I currently do in Excel.
I have two tables: Data contains a list of datetime values; DateConverter contains a list of calendar dates and their associated "network dates." Imagine a business where not every day is a workday: if I want to calculate differences between dates, I'm most interested in the number of work days that elapsed between my two dates.
Here is what the data might look like:
Data Table              DateConverter Table
=================       ================================
| Datetime      |       | Calendar date | Network date |
| ------------- |       | ------------- | ------------ |
| 6-1-15 8:00a  |       | 6-1-15        | 1000         |
| 6-2-15 1:00p  |       | 6-2-15        | 1001         |
| 6-3-15 7:00a  |       | 6-3-15        | 1002         |
| 6-10-15 3:00p |       | 6-4-15        | 1003         |
| 6-15-15 1:00p |       | 6-5-15        | 1004         |
| 6-12-15 2:00a |       | 6-8-15        | 1005         | // Skips the weekend
| ...           |       | ...           | ...          |
In Excel, I can easily map in the network date for each date in the Datetime field with a variant of vlookup:
// Assume that Datetime values are in Column A, Calendar date values in
// Column C, Network date values in Column D - this formula fills Column B
// Headers are in row 1 - first values are in row 2
B2=OFFSET($D$1,COUNTIFS($C:$C,"<"&A2),)
The formula counts the dates that are less than the lookup value (using COUNTIFS because the values in the search array are dates and the search value is a datetime) and returns the associated network date.
Is there a way to do this in Tableau? Will it require a calculated field or can I do this with some kind of join?
Thanks in advance for the help! Let me know if there is anything I can clarify.
If the tables are on the same data server, you have the option to use joins, which is usually the most efficient way to combine information from different tables. If the tables are on different servers or platforms, then you can't use a single query to join them.
In either case, you can use Tableau data blending, which is sort of like a client-side join of aggregated results from multiple queries. It's a pretty useful technique, but a little more complex and restricted, and usually less efficient than a server-side join.
So if you have the option to have both tables on the same server, start with that. It will be simpler and likely faster.
Note that if you are going to use a date as a join key, you probably want to define it as a date and not a datetime.
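If both tables do end up on the same server, the join itself is simple. A sketch in generic SQL, with hypothetical table and column names standing in for your schema:
-- Map each datetime to its network date by joining on the calendar day
SELECT d.datetime_value,
       c.network_date
FROM data_table AS d
JOIN date_converter AS c
  ON CAST(d.datetime_value AS DATE) = c.calendar_date;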
Alex Blakemore's response would normally be adequate, but if you can change the schema, you could simply add the network date to the Data table. The hourly granularity should not cause excessive growth, and you don't need to navigate the join.
Then, instead of counting rows and requiring a sorted table, simply subtract the network dates from each other and add 1.

Data model for inconsistent data on Cassandra

I am pretty new to NoSQL and Cassandra, but I was told by my architecture committee to use it. I just want to understand how to convert an RDBMS model to NoSQL.
I have a database where the user needs to import data from an Excel or CSV file. This file may have different columns each time.
For example, in the Excel file the data might look something like this:
Name | AName    | Industry | Interest | Pint | Start Date | End date
x    | 111-121  | IT       | 2        | 1/1/2011 | 1/2/2011
x    | 111-122  | hotel    | 1        | ""       | ""
y    | 111-1000 | IT       | 2        | 1/1/2011 | 1/2/2011
After we upload this, the next Excel file might look like:
Name | AName    | Industry | Interest | Pint | Start Date | isTrue | isNegative
x    | 111-121  | IT       | 2        | 1/1/2011 | 1/2/2011 | yes   | no
x    | 111-122  | hotel    | 1        | ""       | no       | no
y    | 111-1000 | health   | 2        | 1/1/2010 | yes      | ""
I would not know in advance what columns I am going to create when importing the data. I am totally confused with NoSQL and unable to understand how to handle this -- how to import data when I don't know the table structure in advance.
Start with the basic fact that a column family (Cassandra for "table") is made up of rows. Each row has a row key and some number of key/value pairs (called columns). For a particular column in a row, the name of the column is the key for the pair and the value of the column is the value of the pair. Just because you have a column by some name in one row does not necessarily mean you'll have a column by that name in any other row.
Internally, row keys, column names and column values are stored as byte arrays and you'll need to use serializers to convert program data to the byte arrays and back again.
It's up to you as to how you define the row key, column name and column value.
One approach would be to have a row in the CF correspond to a row from Excel. You'd have to identify the one Excel column that will provide a unique id and store that in the row key. The remainder of the Excel columns can get stored in Cassandra columns, one-to-one. This lets you be very flexible on most column names, but you have to have a unique key value somewhere. The unique key requirement will hold for any storage scheme you use.
There are other storage schemes, but they all boil down to you defining in the Excel what your row key is and how you break the Excel data into key/value pairs.
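For instance, if you're on CQL rather than the raw Thrift interface, one hypothetical way to express the "row key plus arbitrary key/value pairs" scheme is a map column (all table and column names here are made up):
-- One row per Excel row; whatever columns that upload happens to have
-- go into the map as name -> value text pairs.
CREATE TABLE imported_rows (
    row_key    text PRIMARY KEY,   -- the one Excel column guaranteed unique
    attributes map<text, text>
);

INSERT INTO imported_rows (row_key, attributes)
VALUES ('111-121', {'Name': 'x', 'Industry': 'IT', 'Interest': '2',
                    'Start Date': '1/1/2011', 'End Date': '1/2/2011'});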
Check out some NoSQL patterns; I highly suggest reading "Building on Quicksand" by Pat Helland.
Some good patterns (with or without using PlayOrm):
http://buffalosw.com/wiki/Patterns-Page/