I am looking to create a YTD total, but the year-end date needs to change depending on the value in another column. In other words, the fiscal year for group 1 runs from 11-1 (Nov-1) to 10-31 (Oct-31), while the fiscal year for group 2 runs from 7-1 (Jul-1) to 6-30 (Jun-30). When calculating the fiscal year, I need the calculation to differ depending on which group the line item is in. So 2015 for group 1 would be 2014-11-01 to 2015-10-31, while 2015 for group 2 would be 2014-07-01 to 2015-06-30. Please see an example table here (note that I do have a date table related to this one in order to use date functions):
**Table 1**
-------------------------
Group | Date | Qty
1 | 2014-10-01 | 1
1 | 2014-11-01 | 1
1 | 2015-01-01 | 2
1 | 2015-05-01 | 1
1 | 2015-10-31 | 2
1 | 2015-11-01 | 1
2 | 2014-06-01 | 1
2 | 2014-07-01 | 1
2 | 2014-12-25 | 2
2 | 2015-01-01 | 1
2 | 2015-06-30 | 2
2 | 2015-07-01 | 1
With this information in mind, I need to create a TOTALYTD measure that dynamically changes the year_end_date parameter depending on which group the line item is in. I thought of using an IF statement, but realized that it wouldn't work in a measure. Something like this:
Total $ Sold YTD = TOTALYTD([TOTAL $ Sold],directSQLDateTable[date],ALL(directSQLDateTable[date]), IIF([GROUP = "A","10/31","6/30"))
In the end, I would like to create an output similar to this (The "Group A YTD" and "Group B YTD" columns really are not needed, just wanted to add to demonstrate my example):
Year-Month | Total_Qty | Group A YTD | Group B YTD
--------------------------------------------------
2014-07 | 1 | 0 | 1
2014-08 | 1 | 0 | 1
2014-09 | 1 | 0 | 1
2014-10 | 2 | 1 | 1
2014-11 | 2 | 1 | 1
2014-12 | 4 | 1 | 3
2015-01 | 7 | 3 | 4
2015-02 | 7 | 3 | 4
2015-03 | 7 | 3 | 4
2015-04 | 7 | 3 | 4
2015-05 | 8 | 4 | 4
2015-06 | 10 | 4 | 6
2015-07 | 5 | 4 | 1
2015-08 | 5 | 4 | 1
2015-09 | 5 | 4 | 1
2015-10 | 7 | 6 | 1
2015-11 | 2 | 1 | 1
2015-12 | 2 | 1 | 1
Please let me know if you have any questions. My apologies ahead of time if I didn't do a great job explaining this or left out a piece of info.
Thanks for any advice/help in advance! You guys on here are the best!
TOTALYTD() includes everything you need for this.
TotalQty:= SUM(Table1[Qty])

QtyYTDGroup1:=
TOTALYTD(
    [TotalQty]
    ,DimDate[Date]
    ,Table1[Group] = 1
    ,"10/31"
)

QtyYTDGroup2:=
TOTALYTD(
    [TotalQty]
    ,DimDate[Date]
    ,Table1[Group] = 2
    ,"6/30"
)

TotalQtyYTD:= [QtyYTDGroup1] + [QtyYTDGroup2]
I can provide a detailed explanation if you want, but I think the function definition pretty much covers it.
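If it helps to check the measures against the expected output in the question, here is a rough Python sketch of the same logic outside of DAX. It is only an illustration under my own assumptions: the names `ROWS`, `FY_END_MONTH`, `fiscal_year`, `ytd` and `total_ytd` are made up for this example, and a fiscal year is labelled by the calendar year in which it ends.

```python
from datetime import date

# Sample rows from Table 1 in the question: (group, date, qty)
ROWS = [
    (1, date(2014, 10, 1), 1), (1, date(2014, 11, 1), 1),
    (1, date(2015, 1, 1), 2), (1, date(2015, 5, 1), 1),
    (1, date(2015, 10, 31), 2), (1, date(2015, 11, 1), 1),
    (2, date(2014, 6, 1), 1), (2, date(2014, 7, 1), 1),
    (2, date(2014, 12, 25), 2), (2, date(2015, 1, 1), 1),
    (2, date(2015, 6, 30), 2), (2, date(2015, 7, 1), 1),
]

# Last month of the fiscal year per group ("10/31" and "6/30" in the measures).
FY_END_MONTH = {1: 10, 2: 6}


def fiscal_year(d, end_month):
    """Label a date with the calendar year in which its fiscal year ends."""
    return d.year + 1 if d.month > end_month else d.year


def ytd(group, year, month):
    """Fiscal-year-to-date quantity for one group, as of the end of a month."""
    end_month = FY_END_MONTH[group]
    target_fy = fiscal_year(date(year, month, 1), end_month)
    return sum(
        qty
        for g, d, qty in ROWS
        if g == group
        and fiscal_year(d, end_month) == target_fy
        and (d.year, d.month) <= (year, month)
    )


def total_ytd(year, month):
    """Sum of the per-group YTD values, like [QtyYTDGroup1] + [QtyYTDGroup2]."""
    return sum(ytd(g, year, month) for g in FY_END_MONTH)
```

For example, `total_ytd(2015, 6)` adds group 1's running total (4) to group 2's (6), and `total_ytd(2015, 7)` drops because group 2 has just started a new fiscal year.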
I'm doing a case-control study about ovarian cancer. I want to do stratified analyses for the different histotypes but haven't found a good way of doing it in SPSS. I was thinking about copying the information about the diagnoses from the cases to the controls, but I don't know the proper syntax to do it.
So - what I want to do is to find the diagnosis within the case-control pair, copy it, and paste it into the same variable for all the controls within that pair. Does anyone know a good way to do this?
ID = unique ID for the individual, casecontrol = 1 for case, 0 for control, caseset = stratum, ID for each matched group of individuals.
My dataset looks like this:
ID | casecontrol | caseset | diagnosis
1 | 1 | 1 | 1
2 | 0 | 1 | 0
3 | 0 | 1 | 0
4 | 0 | 1 | 0
5 | 1 | 2 | 3
6 | 0 | 2 | 0
7 | 0 | 2 | 0
8 | 0 | 2 | 0
And I want it to look like this:
ID | casecontrol | caseset | diagnosis
1 | 1 | 1 | 1
2 | 0 | 1 | 1
3 | 0 | 1 | 1
4 | 0 | 1 | 1
5 | 1 | 2 | 3
6 | 0 | 2 | 3
7 | 0 | 2 | 3
8 | 0 | 2 | 3
Thank you very much.
According to your example, in each value of caseset you have one line with diagnosis equals some positive number, and in the rest of the lines diagnosis equals zero (or is missing?).
If this is true, all you need to do is this:
aggregate out=* mode=add overwrite=yes /break=caseset /diagnosis=max(diagnosis).
The above command will overwrite the original data, so make sure you have that data backed up, or use a different name for the aggregated variable (e.g. /FullDiagnosis=max(diagnosis)).
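To show what the aggregate step does, here is a small Python sketch of the same idea: take the maximum diagnosis within each caseset and write it back to every row of that caseset. The function name `spread_diagnosis` and the tuple layout are my own assumptions for illustration, and it relies on the same assumption as above (controls are coded 0 and diagnoses are positive numbers).

```python
from collections import defaultdict

# (ID, casecontrol, caseset, diagnosis) rows from the question
ROWS = [
    (1, 1, 1, 1), (2, 0, 1, 0), (3, 0, 1, 0), (4, 0, 1, 0),
    (5, 1, 2, 3), (6, 0, 2, 0), (7, 0, 2, 0), (8, 0, 2, 0),
]


def spread_diagnosis(rows):
    """Copy the case's diagnosis to every member of its caseset."""
    # Assumes diagnosis codes are positive and controls are 0,
    # so the maximum within a caseset is the case's diagnosis.
    max_by_set = defaultdict(int)
    for _id, _casecontrol, caseset, diagnosis in rows:
        max_by_set[caseset] = max(max_by_set[caseset], diagnosis)
    return [(i, cc, cs, max_by_set[cs]) for i, cc, cs, _ in rows]
```

If controls can have missing rather than zero diagnoses, the missing values would need to be skipped before taking the maximum.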
So I've looked at some other posts, but they didn't quite help. I'm not new to Python, but I'm relatively new to pandas, and this has me stumped as to how to accomplish it in any manner that's not horribly inefficient. The data sets I've got are a little bit large and have some extraneous columns of data that I don't need. I've got them loaded as dataframes, and they basically look like this:
+---------+---------+--------+-------+
| Subject | Week | Test | Value |
+---------+---------+--------+-------+
| 1 | Week 4 | Test 1 | 4 |
| 1 | Week 8 | Test 1 | 7 |
| 1 | Week 12 | Test 1 | 3 |
| 1 | Week 4 | Test 2 | 6 |
| 1 | Week 8 | Test 2 | 3 |
| 1 | Week 12 | Test 2 | 9 |
| 2 | Week 4 | Test 1 | 1 |
| 2 | Week 8 | Test 1 | 4 |
| 2 | Week 12 | Test 1 | 2 |
| 2 | Week 4 | Test 2 | 8 |
| 2 | Week 8 | Test 2 | 1 |
| 2 | Week 12 | Test 2 | 3 |
+---------+---------+--------+-------+
I want to rearrange the dataframes so that they look like this:
+---------+---------+--------+--------+
| Subject | Week | Test 1 | Test 2 |
+---------+---------+--------+--------+
| 1 | Week 4 | 4 | 6 |
| 1 | Week 8 | 7 | 3 |
| 1 | Week 12 | 3 | 9 |
| 2 | Week 4 | 1 | 8 |
| 2 | Week 8 | 4 | 1 |
| 2 | Week 12 | 2 | 3 |
+---------+---------+--------+--------+
If anyone has any ideas on how I can make this happen, I'd greatly appreciate it, and thank you in advance for your time!
Edit: After trying the solution provided by @HarvIpan, this is the output I'm getting:
+-----------------------------------------------+
| Subject Week Test_Test 1 Test_Test 2 |
+-----------------------------------------------+
| 0 1 Week 12 5 0 |
| 1 1 Week 4 5 0 |
| 2 1 Week 8 11 0 |
| 3 2 Week 12 0 12 |
| 4 2 Week 4 0 14 |
| 5 2 Week 8 0 4 |
+-----------------------------------------------+
Try using df.pivot_table.
You should be able to get the desired outcome with:
df.pivot_table(index=['Subject','Week'], columns='Test', values='Value')
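For intuition about what the pivot is doing, here is a plain-Python sketch (no pandas) of the same long-to-wide reshaping. The names `RECORDS` and `pivot` are just assumptions for this example. Note one difference: pivot_table aggregates duplicate (Subject, Week, Test) combinations with its default aggfunc (the mean), whereas this sketch simply keeps the last value seen.

```python
# (Subject, Week, Test, Value) rows from the question
RECORDS = [
    (1, "Week 4", "Test 1", 4), (1, "Week 8", "Test 1", 7),
    (1, "Week 12", "Test 1", 3), (1, "Week 4", "Test 2", 6),
    (1, "Week 8", "Test 2", 3), (1, "Week 12", "Test 2", 9),
    (2, "Week 4", "Test 1", 1), (2, "Week 8", "Test 1", 4),
    (2, "Week 12", "Test 1", 2), (2, "Week 4", "Test 2", 8),
    (2, "Week 8", "Test 2", 1), (2, "Week 12", "Test 2", 3),
]


def pivot(records):
    """Turn long rows into one dict of {test: value} per (subject, week)."""
    wide = {}
    for subject, week, test, value in records:
        # Each distinct Test becomes a "column" keyed under the row's index.
        wide.setdefault((subject, week), {})[test] = value
    return wide
```

Each key of the result corresponds to one row of the desired table, and each inner dict holds the "Test 1" and "Test 2" columns for that row.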
You need to create dummy variables for the Test column with pd.get_dummies(df[['Test']], prefix='Test'), multiply them by Value with .mul(df['Value'], axis=0), and concatenate the result back onto the remaining columns of your original df. Then group by Subject and Week and sum.
pd.concat([df.drop(columns=['Test', 'Value']), pd.get_dummies(df[['Test']], prefix='Test').mul(df['Value'], axis=0)], axis=1).groupby(['Subject', 'Week']).sum().reset_index()
Output:
Subject Week Test_ Test 1 Test_ Test 2
0 1 Week 12 3 9
1 1 Week 4 4 6
2 1 Week 8 7 3
3 2 Week 12 2 3
4 2 Week 4 1 8
5 2 Week 8 4 1
I am trying to build a dataset from an online questionnaire. In this questionnaire, participants were asked to name 6 items. These items are represented by the numbers 1 to 6 (order of mention does not matter). Afterwards, participants were asked to rank those items from most important to least important (order matters here). Right now I have three columns: "Order of named items", "Items ranked" and "Rank". The last column holds the position at which each item was ranked. The idea is to take the number in the first column ("Order of named items"), find its position in the second column ("Items ranked"), and return that position in the corresponding row of the third column.
Since the numbers go from 1 to 6, every six rows the process has to start again on the 7th row. I have a total of 186 participants, which means there's a total of 1116 items. What would be the most efficient way of doing this and preventing human error?
Here is an example of what the sheet looks like when done manually:
+----------------------+-----------------------------+------+
| Order of named items | Items ranked (# = Identity) | Rank |
+----------------------+-----------------------------+------+
| 1 | 2 | 4 |
| 2 | 5 | 1 |
| 3 | 6 | 6 |
| 4 | 1 | 5 |
| 5 | 4 | 2 |
| 6 | 3 | 3 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
| 1 | 5 | 3 |
| 2 | 6 | 4 |
| 3 | 1 | 5 |
| 4 | 2 | 6 |
| 5 | 3 | 1 |
| 6 | 4 | 2 |
| 1 | 2 | 2 |
| 2 | 1 | 1 |
| 3 | 6 | 4 |
| 4 | 3 | 5 |
| 5 | 4 | 6 |
| 6 | 5 | 3 |
+----------------------+-----------------------------+------+
You can use this non-volatile formula:
=MATCH(A2,INDEX(B:B,INT((ROW(1:1)-1)/6)*6+2):INDEX(B:B,INT((ROW(1:1)-1)/6)*6+7),0)
Assuming the first column starts at A2 and the second column at B2, use this formula in C2 and copy it down:
=MATCH(A2,OFFSET(B$2,6*INT((ROWS(C$2:C2)-1)/6),0,6),0)
OFFSET returns the 6-cell range for the current participant, and MATCH finds the position of the relevant item within that range.
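The block-by-block lookup can be sketched in Python as follows. This is only an illustration of the idea, not a replacement for the formula; the function name `ranks` and the `block` parameter are assumptions of mine.

```python
def ranks(named, ranked, block=6):
    """For each named item, find its 1-based position within the
    matching block of the ranked column (like MATCH over a 6-cell range)."""
    out = []
    for start in range(0, len(named), block):
        # The 6 ranked cells that belong to this participant.
        window = ranked[start:start + block]
        for item in named[start:start + block]:
            out.append(window.index(item) + 1)  # +1 because MATCH is 1-based
    return out


# First and fourth participants from the example sheet.
named = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
ranked = [2, 5, 6, 1, 4, 3, 5, 6, 1, 2, 3, 4]
rank_column = ranks(named, ranked)
```

Like MATCH with an exact-match lookup, `list.index` fails if an item is missing from its block, which is a useful check against data entry errors.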
See screenshot below
I have the following tables:
Orders:
OrderID|Cost|Quarter|User
-------------------------
1 | 10 | 1 | 1
2 | 15 | 1 | 2
3 | 3 | 2 | 1
4 | 5 | 3 | 3
5 | 8 | 4 | 2
6 | 9 | 2 | 3
7 | 6 | 3 | 3
Goals:
UserID|Goal|Quarter
-------------------
1 | 20 | 1
1 | 15 | 2
2 | 12 | 2
2 | 15 | 3
3 | 5 | 3
3 | 7 | 4
Users:
UserID|Name
-----------
1 | John
2 | Bob
3 | Homer
What I'm trying to do is sum up all orders that one user had and divide that by the sum of his goals; then sum up all orders across all users, divide that by the sum of all goals, and add this overall result to each user's own result.
The result should be:
UserID|Name |Goal|CostSum|Percentage|Sum all
---------------------------------------------------
1 |John | 35 | 13 | 0.37 |
2 |Bob | 27 | 23 | 0.85 |
3 |Homer| 12 | 20 | 1.67 |
The calculation is as follows:
CostSum: 10+3=13
Goal: 20+15=35
Percentage: CostSum/Goal=13/35=0.37
Sum all: 10+15+3+5+8+9+6=56
Goal all: 20+15+12+15+5+7=74
percentage all= Sum_all/Goal_all=56/74=0.76
Result: percentage+percentage_all=0.37+0.76=1.13 for John
1.61 for Bob
2.43 for Homer
My main problem is the last step. I can't get it to add the overall percentage: the result is always filtered by user, which makes it wrong.
To do this you're going to need to create some measures.
(I will assume you've already set your pivot table to be in tabular layout with subtotals switched off - this allows you to set UserID and Name next to each other in the row labels section.)
This is what our output will look like.
First let's be sure you've set up your relationships correctly (Users[UserID] related to both Orders[User] and Goals[UserID]) - it should be like this:
I believe you already have the first 5 columns set up in your pivot table, so we need to create measures for CostSumAll, GoalSumAll, PercentageAll and Result.
The key to making this work is to ensure PowerPivot ignores the row label filter for your CostSumAll and GoalSumAll measures. The ALL() function acts as an override filter when used in CALCULATE() - you just have to specify which filters you want to ignore. In this case, UserID and Name.
CostSumAll:
=CALCULATE(SUM(Orders[Cost]),ALL(Users[UserID]),ALL(Users[Name]))
GoalSumAll:
=CALCULATE(SUM(Goals[Goal]),ALL(Users[UserID]),ALL(Users[Name]))
PercentageAll:
=Orders[CostSumAll]/Orders[GoalSumAll]
Result:
=Orders[Percentage]+Orders[PercentageAll]
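To sanity-check the measures against the numbers in the question, here is a rough Python sketch of the same arithmetic. The names `ORDERS`, `GOALS`, `USERS` and `results` are assumptions for illustration only. The point is that the overall totals are computed once across all users (what ALL() achieves inside CALCULATE()), while the per-user sums stay filtered by user.

```python
# (OrderID, Cost, Quarter, User)
ORDERS = [
    (1, 10, 1, 1), (2, 15, 1, 2), (3, 3, 2, 1), (4, 5, 3, 3),
    (5, 8, 4, 2), (6, 9, 2, 3), (7, 6, 3, 3),
]
# (UserID, Goal, Quarter)
GOALS = [(1, 20, 1), (1, 15, 2), (2, 12, 2), (2, 15, 3), (3, 5, 3), (3, 7, 4)]
USERS = {1: "John", 2: "Bob", 3: "Homer"}


def results(orders, goals, users):
    """Per-user percentage plus the overall percentage across all users."""
    # Unfiltered totals, the equivalent of CostSumAll and GoalSumAll.
    cost_all = sum(cost for _, cost, _, _ in orders)
    goal_all = sum(goal for _, goal, _ in goals)
    percentage_all = cost_all / goal_all

    out = {}
    for user_id, name in users.items():
        # Totals filtered to one user, the equivalent of CostSum and Goal.
        cost = sum(c for _, c, _, u in orders if u == user_id)
        goal = sum(g for u, g, _ in goals if u == user_id)
        out[name] = cost / goal + percentage_all
    return out
```

With the sample data this gives roughly 1.13 for John, 1.61 for Bob and 2.42 for Homer; the 2.43 in the question comes from adding the already-rounded 1.67 and 0.76.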
Download - Example file available for download here. (Don't actually read it in Google Docs - it won't be able to handle the PowerPivot stuff. Save locally to view.)
I'm trying to find a solution without macros in Excel for the following problem:
There is a table containing the rankings of students for different time periods.
The ranking of the student with ID=1 was 1 from January to April and 3 from May to June.
Two other students had a constant ranking (6 and 9) from January to June.
| A | B | C |D |
---| ----|------------|------------|-------|
1 | ID | START | END |RANKING|
2 | 1 | 01.01.2014 | 30.04.2014 | 1 |
3 | 1 | 01.05.2014 | 30.06.2014 | 3 |
4 | 2 | 01.01.2014 | 30.06.2014 | 6 |
5 | 3 | 01.01.2014 | 30.06.2014 | 9 |
The next table contains IDs (y axis) and months (x axis):
| F | G | H | I | J | K | L |
---| ----|--------|--------|--------|--------|--------|--------|
1 | ID | 201401 | 201402 | 201403 | 201404 | 201405 | 201406 |
2 | 1 | | | | | | |
3 | 2 | | | | | | |
4 | 3 | | | | | | |
And I wish to fill this second table like this:
| ID | 201401 | 201402 | 201403 | 201404 | 201405 | 201406 |
| ----|--------|--------|--------|--------|--------|--------|
| 1 | 1 | 1 | 1 | 1 | 3 | 3 |
| 2 | 6 | 6 | 6 | 6 | 6 | 6 |
| 3 | 9 | 9 | 9 | 9 | 9 | 9 |
I tried to use INDEX and MATCH, but without any good results because I haven't found a possibility to use IF (if (
Could anybody help?
You can get what you're looking for with SUMPRODUCT.
Given the layout you provided, this formula should work when put in G2 and filled down and across (assuming the headers in G1:L1 are real dates, e.g. the first day of each month):
=SUMPRODUCT(--($A:$A=$F2),--($B:$B<=G$1),--($C:$C>G$1),$D:$D)
That looks in column A for an ID matching F2, then for every matching row:
It checks that the start date in column B is on or before the date in G1
It checks that the end date in column C is after the date in G1
If all criteria match, it returns the value in column D
This assumes you only have one entry for each period, otherwise it will sum them.
Also, you can use SUMIFS, it's a little less easy to read but I think it's slightly more efficient than SUMPRODUCT (I'm not positive, just anecdotal evidence from usage)
=SUMIFS($D:$D,$A:$A,"="&$F2,$B:$B,"<="&G$1,$C:$C,">"&G$1)
It does the exact same thing, just with different syntax.
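The criteria both formulas apply can be sketched in Python like this. It is only an illustration: the names `PERIODS`, `ranking_on` and `grid` are assumptions of mine, and I'm assuming each month header stands for the first day of that month.

```python
from datetime import date

# (ID, START, END, RANKING) rows from the first table
PERIODS = [
    (1, date(2014, 1, 1), date(2014, 4, 30), 1),
    (1, date(2014, 5, 1), date(2014, 6, 30), 3),
    (2, date(2014, 1, 1), date(2014, 6, 30), 6),
    (3, date(2014, 1, 1), date(2014, 6, 30), 9),
]


def ranking_on(student_id, day, periods):
    """Sum the rankings of all periods for this ID with START <= day < END."""
    return sum(
        ranking
        for sid, start, end, ranking in periods
        if sid == student_id and start <= day < end
    )


# One header date per month, 201401 to 201406.
months = [date(2014, m, 1) for m in range(1, 7)]
grid = {sid: [ranking_on(sid, day, PERIODS) for day in months] for sid in (1, 2, 3)}
```

As with the formulas, overlapping periods for the same ID would be summed, and a month with no matching period gives 0.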