Excel and selecting variables conditionally - excel

I have a data set which contains information by country. For example, Australia_F is the observation for Australia and Australia_Weight is the weight of Australia. Each period represents a specific year.
Period Australia_F Canada_F Denmark_F Japan_F Australia_Weight Canada_Weight Denmark_Weight Japan_weight
1985 0.05 -0.02 0.02 0.03 0.10 0.30 0.45 0.15
1986 -0.04 -0.03 0.02 0.01 0.15 0.30 0.30 0.25
The user can enter any value in the following cell. For example, I have entered 3:
Weight_Modification = 3
The goal is to include only countries where the variable XXXXX_F is positive,
and to use those with the highest values such that the total weight of the countries selected is not greater than 1.
The problem is complicated by the fact that the Weight_Modification variable multiplies each individual country's weight by whatever value is entered. For example, the weight for Australia would be 0.10 * 3 = 0.3 in 1985.
Total weights can be less than 1.00 but can't be greater than 1.00.
So taking the above data as an example, the results for 1985 would be:
Australia_weight  Canada_weight  Denmark_weight  Japan_weight  Total_weight
0.3               -              -               0.45          0.75
This is because in 1985 Australia has the highest value (Australia_F = 0.05), followed by Japan (Japan_F = 0.03).
Each country's weight is multiplied by 3.
Denmark is not selected even though Denmark_F is positive, because including Denmark (0.45 * 3 = 1.35) would push the total weight above 1.
In the actual file there are many more countries (12 in total) and many years.
Any help with how to put this together in Excel is greatly appreciated.
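The selection rule described above can be sketched in Python to make the logic explicit before translating it to Excel. This is only an illustration under stated assumptions: the function name `select_weights` is hypothetical, and it assumes selection stops at the first ranked country whose scaled weight would push the total above 1 (rather than skipping it and trying lower-ranked countries).

```python
def select_weights(f_values, weights, weight_modification):
    """Greedy pick: rank countries with positive F from highest to lowest,
    scale each weight, and stop before the running total would exceed 1."""
    ranked = sorted(
        (c for c in f_values if f_values[c] > 0),
        key=lambda c: f_values[c],
        reverse=True,
    )
    selected = {}
    total = 0.0
    for country in ranked:
        scaled = weights[country] * weight_modification
        # Small tolerance so floating-point noise does not reject a total of exactly 1.
        if total + scaled > 1.0 + 1e-9:
            break  # assumption: stop at the first country that does not fit
        selected[country] = scaled
        total += scaled
    return selected, total


# 1985 row from the question, with Weight_Modification = 3
f_1985 = {"Australia": 0.05, "Canada": -0.02, "Denmark": 0.02, "Japan": 0.03}
w_1985 = {"Australia": 0.10, "Canada": 0.30, "Denmark": 0.45, "Japan": 0.15}
selected, total = select_weights(f_1985, w_1985, 3)
print({c: round(w, 2) for c, w in selected.items()}, round(total, 2))
```

For the 1985 row this picks Australia and Japan with a total of about 0.75, matching the expected result in the question.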

Related

How to get the column name of a dataframe from values in a numpy array

I have a df with 15 columns:
df.columns:
0 class
1 name
2 location
3 income
4 edu_level
--
14 marital_status
After some transformations I got a numpy.ndarray with shape (15,3) named loads:
0.52 0.33 0.09
0.20 0.53 0.23
0.60 0.28 0.23
0.13 0.45 0.41
0.49 0.9
so on so on so on
So, 3 columns with 15 values.
What I need to do:
I want to get the df column names for the values in the first column of loads that are greater than 0.50.
For this example, the columns of df related to the first column of loads with values higher than 0.5 should be:
0 class
2 location
Same for the second column of loads, should return:
1 name
3 income
4 edu_level
and the same logic to the 3rd column of loads.
I managed to get the NumPy array loads the way I need it, but I am having a hard time with this last part. I know I can simply pick the columns manually, but this will be a hard task when df has more than 15 features.
Can anyone help me, please?
Given your threshold, you can create a boolean array in order to filter df.columns:
threshold = 0.5
# For each column j of loads, print the df column names whose value exceeds the threshold
for j in range(loads.shape[1]):
    print(df.columns[loads[:, j] > threshold])

Calculating regional contribution to national GDP growth

Is there a simple way in R or Stata to calculate the regional contribution to national GDP growth?
For instance if I have the following, how do I calculate the contribution of the regions' growth to the overall national growth?
Region/country  % change  weight
Region 1        0.3       0.25
Region 2        0.1       0.25
Region 3        0.25      0.25
Region 4        0.15      0.25
Country         0.2       1
To get the contribution of each region, you just need to multiply its % change by its weight. For Region 1 that is 0.3 * 0.25 = 0.075.
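As a quick check of that rule, here is a minimal Python sketch using the figures from the question (the question asked about R or Stata; Python is used here only for illustration, and the variable names are made up). The regional contributions should add up to the national % change.

```python
# (% change, weight) per region, taken from the table in the question
regions = {
    "Region 1": (0.3, 0.25),
    "Region 2": (0.1, 0.25),
    "Region 3": (0.25, 0.25),
    "Region 4": (0.15, 0.25),
}

# Contribution of a region = its % change multiplied by its weight
contributions = {name: change * weight for name, (change, weight) in regions.items()}
national_growth = sum(contributions.values())

for name, value in contributions.items():
    print(name, round(value, 4))
print("Country", round(national_growth, 4))
```

The sum comes to 0.2, which agrees with the Country row in the table.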

In Python: How to convert 1/8th of space to 1/6th of space?

I have a dataframe at store-product level, as shown in the sample below:
Store Product Space Min Max Total_table Carton_Size
11 Apple 0.25 0.0625 0.75 2 6
11 Orange 0.5 0.125 0.5 2 null
11 Tomato 0.75 0.0625 0.75 2 6
11 Potato 0.375 0.0625 0.75 2 6
11 Melon 0.125 0.0625 0.5 2 null
Scenario: All products here have space expressed in units of 1/8. But if a product has a Carton_Size other than null, that product's space has to be converted to units of 1/(Carton_Size), respecting Min (Space shouldn't be less than Min) and Max (Space shouldn't be greater than Max). Space can be taken from non-carton products, but in the end the sum of the 'Space' column must be less than or equal to the 'Total_table' value. Also, these 1/8 and 1/6 values are relative to 'Total_table'; the Total_table value is split into Space for each product.
Example: In the dataframe above, three products have a carton size, so we can take 1/8 of space from a non-carton product, selecting from the top, and split it into 1/24 parts (1/24 + 1/24 + 1/24 = 1/8), which can be added to the three carton products to make it 1/6. This gives the expected output shown below, respecting the Min and Max values. If a product doesn't satisfy the Min or Max condition, leave that product unchanged (e.g., Tomato).
Roughly Expected Output:
Store Product Space Min Max Total_table Carton_Size
11 Apple 0.292 0.0625 0.75 2 6
11 Orange 0.375 0.125 0.5 2 null
11 Tomato 0.75 0.0625 0.75 2 6
11 Potato 0.417 0.0625 0.75 2 6
11 Melon 0.125 0.0625 0.5 2 null
Need solution in Python.
Thanks in Advance!
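One possible reading of the example can be sketched in plain Python (lists of dicts rather than a dataframe, to keep it dependency-free). This is not a full solution: the function name `reallocate` is hypothetical, and it assumes that exactly one 1/8 unit is taken from the first non-carton product that can give it up without dropping below Min, that the unit is split equally across all carton products, and that a carton product whose Space would exceed Max is simply left unchanged.

```python
EIGHTH = 1 / 8


def reallocate(products):
    """Move one 1/8 unit from the first eligible non-carton product to the carton products."""
    rows = [dict(p) for p in products]  # work on copies, leave the input untouched
    carton = [r for r in rows if r["Carton_Size"] is not None]
    # Donor: first non-carton product (from the top) that stays at or above Min
    donor = next(
        (r for r in rows
         if r["Carton_Size"] is None and r["Space"] - EIGHTH >= r["Min"]),
        None,
    )
    if donor is None or not carton:
        return rows
    share = EIGHTH / len(carton)  # e.g. 1/24 each when there are three carton products
    donor["Space"] -= EIGHTH
    for r in carton:
        # Skip products that would break their Max (Tomato in the sample)
        if r["Space"] + share <= r["Max"]:
            r["Space"] += share
    return rows


sample = [
    {"Product": "Apple", "Space": 0.25, "Min": 0.0625, "Max": 0.75, "Carton_Size": 6},
    {"Product": "Orange", "Space": 0.5, "Min": 0.125, "Max": 0.5, "Carton_Size": None},
    {"Product": "Tomato", "Space": 0.75, "Min": 0.0625, "Max": 0.75, "Carton_Size": 6},
    {"Product": "Potato", "Space": 0.375, "Min": 0.0625, "Max": 0.75, "Carton_Size": 6},
    {"Product": "Melon", "Space": 0.125, "Min": 0.0625, "Max": 0.5, "Carton_Size": None},
]
result = reallocate(sample)
for r in result:
    print(r["Product"], round(r["Space"], 3))
```

On the sample data this reproduces the rough expected output, and the Space column still sums to less than Total_table (2). Other readings of the rules would need a different allocation step.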

Linear calculation of time in excel

I have a column containing time (s) in Excel. The problem is that there are duplicate time values, and a given time can be repeated "n" times. What I'm trying to achieve is to divide the time step linearly. As you can see below, 0.02 is repeated 3 times (i.e. n=3), so ideally I would find the difference between 0.02 and 0.01 and divide that by n. The first time value after 0.01 would then be 0.01333, which can be worked out as 0.01 + (0.02-0.01)/n.
The problem is n is not constant and could have any value between 2 and 10.
Please find a sample of the data below.
time (s)
0.00
0.01
0.02
0.02
0.02
0.03
0.03
0.03
0.03
0.03
0.03
0.04
0.04
0.04
0.04
Assuming your list starts in cell A1, put this in cell B2:
=IF(COUNTIF(A:A,A2)=1,A2,B1+(A2-AGGREGATE(14,6,($A$2:A2)/($A$2:A2<>A2),1))/COUNTIF(A:A,A2))
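For comparison, the interpolation described in the question can be sketched in Python. This is an illustration of the idea, not a translation of the formula above: the function name `spread_duplicates` is made up, and it assumes the times are sorted, that duplicates are adjacent, and that the first value has no earlier time to interpolate from and is left as it is.

```python
from itertools import groupby


def spread_duplicates(times):
    """Replace each run of n equal times with n evenly spaced steps
    from the previous distinct time up to the repeated value."""
    result = []
    previous = None
    for value, group in groupby(times):
        n = len(list(group))
        if previous is None or n == 1:
            # Nothing to spread: first entry, or a value that is not repeated
            result.extend([value] * n)
        else:
            step = (value - previous) / n
            result.extend(previous + step * k for k in range(1, n + 1))
        previous = value
    return result


times = [0.00, 0.01, 0.02, 0.02, 0.02]
print([round(t, 5) for t in spread_duplicates(times)])
```

With the first five sample values, the three 0.02 entries become roughly 0.01333, 0.01667 and 0.02.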

excel formula in one column based on variable entries in another column

My goal is to populate column G for each row with a CPTCode (column B). If I were to copy and paste by hand this would simply be: f2/f5, f3/f5, f4/f5, f6/f15, f7/f15, etc. Column A contains the names of staff. Each month, staff code for different types of procedures (columns B and C) and a variable quantity of each (column E). The rows with Total in column B mark the end of a staff person's monthly list of codes. We are hoping to provide monthly updates to our service chiefs. I am hoping someone might be able to help me devise a formula I could copy down column G. We are looking at roughly 3000-4000 rows of codes per month for about 200 different clinical providers.
CPTCode CPTName Work RVU FY16 Total Qty FY16 Total RVU CPT's % of FY RVUs
96119 NEUROPSYCH TESTING BY TECH 0.6 76 41.8
99212 OFFICE/OUTPATIENT VISIT EST 0.5 2 1.0
T1016 CASE MANAGEMENT 0.5 1 0.5
Total 79 43.3
H0038 SELF-HELP/PEER SVC PER 15MIN 0.0 727 0.0
90853 GROUP PSYCHOTHERAPY 0.6 236 139.2
99212 OFFICE/OUTPATIENT VISIT EST 0.5 153 73.4
S9446 PT EDUCATION NOC GROUP 0.4 105 42.0
99211 OFFICE/OUTPATIENT VISIT EST 0.2 44 7.9
90785 PSYTX COMPLEX INTERACTIVE 0.3 10 3.3
99202 OFFICE/OUTPATIENT VISIT NEW 0.9 1 0.9
99213 OFFICE/OUTPATIENT VISIT EST 1.0 1 1.0
H0031 MH HEALTH ASSESS BY NON-MD 0.6 1 0.6
Total 1278 268.4
H0038 SELF-HELP/PEER SVC PER 15MIN 0.0 452 0.0
98967 HC PRO PHONE CALL 11-20 MIN 0.5 1 0.5
Total 453 0.5
[1]: http://i.stack.imgur.com/dw44F.jpg
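The calculation being asked for (each row's Total RVU divided by the Total row that closes its block) can be sketched in Python to pin down the logic. This is only an illustration and not an Excel formula: the function name `rvu_shares` is hypothetical, and it assumes every staff block ends with a row whose code is "Total".

```python
def rvu_shares(rows):
    """rows: (cpt_code, total_rvu) pairs in sheet order.
    Returns one share per row; Total rows (and any rows after the last Total) get None."""
    shares = []
    block = []  # RVU values collected since the previous Total row
    for code, rvu in rows:
        if code == "Total":
            # Guard against a zero total to avoid division by zero
            shares.extend(value / rvu if rvu else 0.0 for value in block)
            shares.append(None)
            block = []
        else:
            block.append(rvu)
    shares.extend(None for _ in block)  # unfinished block without a Total row
    return shares


rows = [
    ("96119", 41.8), ("99212", 1.0), ("T1016", 0.5), ("Total", 43.3),
    ("H0038", 0.0), ("98967", 0.5), ("Total", 0.5),
]
shares = rvu_shares(rows)
print([None if s is None else round(s, 4) for s in shares])
```

The sample rows are the first and last blocks from the table above; the first block's shares come out at roughly 0.9654, 0.0231 and 0.0115.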
