excel formula in one column based on variable entries in another column - excel

My goal is to populate column G for each row that has a CPT code in column B. If I were to fill it in by hand, the formulas would simply be: F2/F5, F3/F5, F4/F5, F6/F15, F7/F15, etc. Column A contains the names of staff. Each month, staff code for different types of procedures (columns B and C) and a variable quantity of each (column E). The rows with Total in column B mark the end of a staff person's monthly list of codes. We are hoping to provide monthly updates to our service chiefs, and I am hoping someone can help me devise a formula that I can copy down column G. We are looking at roughly 3000-4000 rows of codes per month for about 200 different clinical providers.
CPTCode  CPTName                       Work RVU  FY16 Total Qty  FY16 Total RVU  CPT's % of FY RVUs
96119    NEUROPSYCH TESTING BY TECH         0.6              76            41.8
99212    OFFICE/OUTPATIENT VISIT EST        0.5               2             1.0
T1016    CASE MANAGEMENT                    0.5               1             0.5
Total                                                        79            43.3
H0038    SELF-HELP/PEER SVC PER 15MIN       0.0             727             0.0
90853    GROUP PSYCHOTHERAPY                0.6             236           139.2
99212    OFFICE/OUTPATIENT VISIT EST        0.5             153            73.4
S9446    PT EDUCATION NOC GROUP             0.4             105            42.0
99211    OFFICE/OUTPATIENT VISIT EST        0.2              44             7.9
90785    PSYTX COMPLEX INTERACTIVE          0.3              10             3.3
99202    OFFICE/OUTPATIENT VISIT NEW        0.9               1             0.9
99213    OFFICE/OUTPATIENT VISIT EST        1.0               1             1.0
H0031    MH HEALTH ASSESS BY NON-MD         0.6               1             0.6
Total                                                      1278           268.4
H0038    SELF-HELP/PEER SVC PER 15MIN       0.0             452             0.0
98967    HC PRO PHONE CALL 11-20 MIN        0.5               1             0.5
Total                                                       453             0.5
[1]: http://i.stack.imgur.com/dw44F.jpg
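The calculation that column G needs can be sketched in Python: each code row's RVU divided by the RVU on the next Total row. This is only an illustration of the logic, not an Excel formula; the function name `share_of_block_total` and the `(code, rvu)` row layout are assumptions made for the example.

```python
def share_of_block_total(rows):
    """Return each code row's share of the next 'Total' row's RVU.

    rows is a list of (cpt_code, total_rvu) pairs in sheet order
    (columns B and F); a row whose code is 'Total' closes the block.
    """
    shares = [None] * len(rows)  # Total rows keep None
    block_start = 0
    for i, (code, rvu) in enumerate(rows):
        if code == "Total":
            for j in range(block_start, i):
                # Guard against a provider whose total RVU is zero.
                shares[j] = rows[j][1] / rvu if rvu else 0.0
            block_start = i + 1
    return shares


# First and last provider blocks from the sample data above.
rows = [("96119", 41.8), ("99212", 1.0), ("T1016", 0.5), ("Total", 43.3),
        ("H0038", 0.0), ("98967", 0.5), ("Total", 0.5)]
shares = share_of_block_total(rows)
```

Each block is resolved only when its Total row is reached, which mirrors the hand-written pattern F2/F5, F3/F5, F4/F5.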

Related

how to get quartiles and classify a value according to this quartile range

I have this df:
import pandas as pd

d = pd.DataFrame({'Name': ['Andres', 'Lars', 'Paul', 'Mike'],
                  'target': ['A', 'A', 'B', 'C'],
                  'number': [10, 12.3, 11, 6]})
I want to classify each number into a quartile. I am doing this:
(d.groupby(['Name', 'target', 'number'])['number']
  .quantile([0.25, 0.5, 0.75, 1]).unstack()
  .reset_index()
  .rename(columns={0.25: "1Q", 0.5: "2Q", 0.75: "3Q", 1: "4Q"})
)
But as you can see, the 4 quartiles are all equal: the code above calculates them per row, so with only one number per row all quartiles come out the same.
If I run this instead:
d['number'].quantile([0.25,0.5,0.75,1])
Then I have the 4 quartiles I am looking for:
0.25 9.000
0.50 10.500
0.75 11.325
1.00 12.300
What I need as output (showing only the first 2 rows):
Name target number 1Q 2Q 3Q 4Q Rank
0 Andres A 10.0 9.0 10.5 11.325 12.30 1
1 Lars A 12.3 9.0 10.5 11.325 12.30 4
As you can see, every row carries the quartiles computed over all values in the number column. In addition, there is now a column named Rank that classifies the number according to its quartile, e.g. in the first row, 10 is within the 1st quartile.
Here's one way that builds on the quantiles you've already computed: turn them into a DataFrame and join it to d. It also assigns a "Rank" column using the rank method:
out = (d.join(d['number'].quantile([0.25, 0.5, 0.75, 1])
              .set_axis([f'{i}Q' for i in range(1, 5)], axis=0)
              .to_frame().T
              .pipe(lambda x: x.loc[x.index.repeat(len(d))])
              .reset_index(drop=True))
        .assign(Rank=d['number'].rank(method='dense')))
Output:
Name target number 1Q 2Q 3Q 4Q Rank
0 Andres A 10.0 9.0 10.5 11.325 12.3 2.0
1 Lars A 12.3 9.0 10.5 11.325 12.3 4.0
2 Paul B 11.0 9.0 10.5 11.325 12.3 3.0
3 Mike C 6.0 9.0 10.5 11.325 12.3 1.0
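Note that `rank(method='dense')` orders the values rather than placing them into quartile bins; the two happen to agree on this four-row sample. As a hedged alternative, the sketch below assigns a quartile number directly using only the standard library. It assumes the "inclusive" quantile method (which gives the 9.0, 10.5 and 11.325 cut points shown above) and that a value equal to a cut point belongs to the lower quartile; `quartile_of` is a hypothetical helper name.

```python
from bisect import bisect_left
from statistics import quantiles

numbers = [10, 12.3, 11, 6]

# Three cut points (Q1, Q2, Q3), interpolated within the observed data range.
cuts = quantiles(numbers, n=4, method="inclusive")


def quartile_of(value, cuts):
    """Return 1-4: the quartile bin that value falls into."""
    return bisect_left(cuts, value) + 1


ranks = [quartile_of(x, cuts) for x in numbers]
```

With more rows the difference becomes visible: several values can share one quartile bin, whereas a dense rank keeps counting upwards.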

how to update rows based on previous row of dataframe python

I have a time series data given below:
date product price amount
11/01/2019 A 10 20
11/02/2019 A 10 20
11/03/2019 A 25 15
11/04/2019 C 40 50
11/05/2019 C 50 60
My data is high-dimensional; I have included only a simplified version with two columns {price, amount}. I am trying to transform it into relative changes along the time index, as illustrated below:
date product price amount
11/01/2019 A NaN NaN
11/02/2019 A 0 0
11/03/2019 A 15 -5
11/04/2019 C NaN NaN
11/05/2019 C 10 10
I am trying to get the relative changes for each product over time. If there is no previous date for a given product, I fill in "NaN".
Is there a function that does this?
Group by product and use .diff()
df[["price", "amount"]] = df.groupby("product")[["price", "amount"]].diff()
Output:
date product price amount
0 2019-11-01 A NaN NaN
1 2019-11-02 A 0.0 0.0
2 2019-11-03 A 15.0 -5.0
3 2019-11-04 C NaN NaN
4 2019-11-05 C 10.0 10.0
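To make explicit what the grouped `.diff()` computes, here is a plain-Python sketch of the same idea: each row minus the previous row of the same product, with `None` where no previous row exists. It assumes the rows are already sorted by date; `grouped_diff` is a hypothetical helper, not a pandas function.

```python
def grouped_diff(rows, key, fields):
    """Subtract the previous row with the same key; None when there is none."""
    previous = {}  # last row seen for each key value
    out = []
    for row in rows:
        prev = previous.get(row[key])
        diff = {f: (row[f] - prev[f]) if prev is not None else None
                for f in fields}
        out.append({**row, **diff})
        previous[row[key]] = row
    return out


rows = [
    {"date": "11/01/2019", "product": "A", "price": 10, "amount": 20},
    {"date": "11/02/2019", "product": "A", "price": 10, "amount": 20},
    {"date": "11/03/2019", "product": "A", "price": 25, "amount": 15},
    {"date": "11/04/2019", "product": "C", "price": 40, "amount": 50},
    {"date": "11/05/2019", "product": "C", "price": 50, "amount": 60},
]
result = grouped_diff(rows, "product", ["price", "amount"])
```

The first row of each product has nothing to subtract from, which is why pandas shows NaN there.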

Excel and selecting variables conditionally

I have a data set that contains information by country. For example, Australia_F is the observation for Australia and Australia_Weight is the weight of Australia. Each period represents a specific year.
Period Australia_F Canada_F Denmark_F Japan_F Australia_Weight Canada_Weight Denmark_Weight Japan_weight
1985 0.05 -0.02 0.02 0.03 0.10 0.30 0.45 0.15
1986 -0.04 -0.03 0.02 0.01 0.15 0.30 0.30 0.25
The user can enter any value in the following cell. For example, I have entered 3:
Weight_Modification = 3
The goal is to include only countries where the variable XXXXX_F is positive,
and to use those with the highest values such that the total weight of the countries selected is not greater than 1.
The problem is complicated by the fact that the Weight_Modification variable multiplies each individual country weight by whatever its value is. For example, the weight for Australia in 1985 would be 0.10 * 3 = 0.3.
Total weights can be less than 1.00 but can't be greater than 1.00.
Taking the above data as an example, the results for 1985 would be:
Australia_weight  Canada_weight  Denmark_weight  Japan_weight  Total_weight
0.3                                              0.45          0.75
This is because in 1985 Australia has the highest value (Australia_F = 0.05), followed by Japan (Japan_F = 0.03).
Each country's weight is multiplied by 3.
Denmark is not selected even though Denmark_F is positive, because including Denmark would push the total weight above 1.
In the actual file there are many more countries (12 in total) and many years.
Any help with how to put this together in excel is greatly appreciated.
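The selection rule described above can be sketched in Python to pin down the logic before translating it into Excel. This is a minimal sketch under stated assumptions: the function name `select_weights` and the dictionary layout are illustrative, and the loop stops at the first country that would push the total above 1 (the question does not say whether smaller, lower-ranked countries may still be added afterwards).

```python
def select_weights(f_values, weights, modification):
    """Pick positive-F countries in descending F order while the total stays <= 1."""
    ranked = sorted((c for c in f_values if f_values[c] > 0),
                    key=lambda c: f_values[c], reverse=True)
    selected, total = {}, 0.0
    for country in ranked:
        scaled = weights[country] * modification
        if total + scaled > 1.0 + 1e-9:  # small tolerance for float rounding
            break  # assumption: stop at the first country that does not fit
        selected[country] = scaled
        total += scaled
    return selected, total


# 1985 row from the sample data, with Weight_Modification = 3.
f_1985 = {"Australia": 0.05, "Canada": -0.02, "Denmark": 0.02, "Japan": 0.03}
w_1985 = {"Australia": 0.10, "Canada": 0.30, "Denmark": 0.45, "Japan": 0.15}
selected, total = select_weights(f_1985, w_1985, 3)
```

For 1985 this selects Australia and Japan and leaves Denmark out, matching the expected result in the question.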

Convert/unpack pandas dataframe of tuples into a list to use as column headers without ( ,) syntax

I have trimmed strings within a column to isolate key words and create a dataframe (totalarea_cols) which I can then use to label headers of a second dataframe (totalarea_p).
However, it appears that keywords are created as tuples and when used to label columns in second dataframe, the tuples syntax is included (see sample below; totalarea_p.head())
Here is a sample of the code:
totalarea_code = df_meta_p2.loc[df_meta_p2['Label English'].str.contains('Total area under dry season '), 'Code']
totalarea_cols = df_meta_p2['Label English'].str.extractall('Total area under dry season (.*)').reset_index(drop=True)
totalarea_p = df_data_p2.loc[:, totalarea_code]
totalarea_p.columns = totalarea_cols
Sample of metadata from which I would like to extract keyword from string:
In[33]: df_meta_p2['Label English']
Out[33]:
0 District code
1 Province code
2 Province name in English
3 District name in English
4 Province name in Lao
5 Total area under dry season groundnut (peanut)
6 Total number of households growing dry season ...
7 Total number of households growing dry season ...
8 Total number of households growing dry season ...
9 Total number of households growing dry season ...
10 Total number of households growing dry season ...
11 Total number of households growing dry season yam
12 Total number of households growing dry season ...
13 Total number of households growing dry season ...
14 Total number of households growing dry season ...
15 Total number of households growing dry season ...
16 Total number of households growing dry season ...
17 Total number of households growing dry season ...
18 Total number of households growing dry season ...
19 Total number of households growing dry season ...
Name: Label English, dtype: object
Sample of DataFrame output using str.extractall:
In [34]: totalarea_cols
Out[34]:
0
0 groundnut (peanut)
1 lowland rice/irrigation rice
2 upland rice
3 potato
4 sweet potato
5 cassava
6 yam
7 taro
8 other tuber, root and bulk crops
9 mungbeans
10 cowpea
11 sugar cane
12 soybean
13 sesame
14 cotton
15 tobacco
16 vegetable not specified
17 cabbage
Sample of column headers when substituted into the second DataFrame, totalarea_p:
In [36]: totalarea_p.head()
Out[36]:
(groundnut (peanut),) (lowland rice/irrigation rice,) (upland rice,) \
0 0.0 0.00 0
1 0.0 0.00 0
2 0.0 0.00 0
3 0.0 0.30 0
4 0.0 1.01 0
(potato,) (sweet potato,) (cassava,) (yam,) (taro,) \
0 0.0 0.00 0.0 0.0 0
1 0.0 0.00 0.0 0.0 0
2 0.0 0.52 0.0 0.0 0
3 0.0 0.01 0.0 0.0 0
4 0.0 0.00 0.0 0.0 0
I have spent the better part of a day searching for an answer but, other than the post found here, am coming up blank. Any ideas??
You need to select column 0 to get a Series, so change the code to:
totalarea_p.columns = totalarea_cols[0]
Or select by position with iloc:
totalarea_p.columns = totalarea_cols.iloc[:, 0]

Excel need to sum distinct id's value

I am struggling to find the sum of distinct IDs' values. An example is given below.
Week TID Ano Points
1 111 ANo1 1
1 112 ANo1 1
2 221 ANo2 0.25
2 222 ANo2 0.25
2 223 ANo2 0.25
2 331 ANo3 1
2 332 ANo3 1
2 333 ANo3 1
2 999 Ano9 0.25
2 998 Ano9 0.25
3 421 ANo4 0.25
3 422 ANo4 0.25
3 423 ANo4 0.25
3 531 ANo5 0.5
3 532 ANo5 0.5
3 533 ANo5 0.5
From the above data I need to produce the result below. Could anyone help with an Excel formula?
Week Points_Sum
1 1
2 1.50
3 0.75
You say "sum of distinct id's value"? All the IDs are different so I'm assuming you want to sum for each different "Ano" within the week?
=SUM(IF(FREQUENCY(IF(A$2:A$17=F2,MATCH(C$2:C$17,C$2:C$17,0)),ROW(A$2:A$17)-ROW(A$2)+1),D$2:D$17))
confirmed with CTRL+SHIFT+ENTER
where F2 contains a specific week number
Assumes that each "Ano" will always have the same points value
Probably not the most efficient solution... but this array formula works:
= SUMPRODUCT(IF($A$2:$A$15=$F2,$D$2:$D$15),1/MMULT((IF($A$2:$A$15=$F2,$D$2:$D$15)=
TRANSPOSE(IF($A$2:$A$15=$F2,$D$2:$D$15)))+0,(ROW($A$2:$A$15)>0)+0))
Note this is an array formula, so you have to press Ctrl+Shift+Enter after typing this formula instead of just Enter.
This formula goes in cell G2 and is dragged down.
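As a cross-check on the expected totals, the same logic can be sketched in Python: count each "Ano" once per week and add up its points. Like the first formula above, this assumes each "Ano" always has the same points value within a week; `sum_distinct_points` is a hypothetical helper name.

```python
from collections import defaultdict


def sum_distinct_points(rows):
    """Sum Points once per distinct (Week, Ano) pair."""
    seen = set()
    totals = defaultdict(float)
    for week, _tid, ano, points in rows:
        if (week, ano) not in seen:
            seen.add((week, ano))
            totals[week] += points
    return dict(totals)


rows = [
    (1, 111, "ANo1", 1), (1, 112, "ANo1", 1),
    (2, 221, "ANo2", 0.25), (2, 222, "ANo2", 0.25), (2, 223, "ANo2", 0.25),
    (2, 331, "ANo3", 1), (2, 332, "ANo3", 1), (2, 333, "ANo3", 1),
    (2, 999, "Ano9", 0.25), (2, 998, "Ano9", 0.25),
    (3, 421, "ANo4", 0.25), (3, 422, "ANo4", 0.25), (3, 423, "ANo4", 0.25),
    (3, 531, "ANo5", 0.5), (3, 532, "ANo5", 0.5), (3, 533, "ANo5", 0.5),
]
totals = sum_distinct_points(rows)
```

Keying on the (Week, Ano) pair rather than on the points value matters for week 2, where ANo2 and Ano9 both carry 0.25 and must each be counted.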

Resources