Find Rank of a Variable in my Dataframe within For Loop - python-3.x

I know how to add a new column that shows the rank of each value, but I want instead to find the rank of an arbitrary value (a variable) relative to that column.
list_of_values = [1,14,125,23,12]
df['price'] contains all 500 of my prices, and I'd like to see how 1 compares to these 500, or how 125 ranks. Ties should take the lowest position: e.g. if there are two values of price=1, the ranking should be 500/500 for both.
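One way to read this requirement is sketched below. It assumes a descending rank (the highest price is rank 1) in which ties take the worst position, so the rank of a value is simply the number of prices greater than or equal to it. The helper name `rank_against` and the six sample prices are made up for illustration; a value above every price would get rank 0 under this definition.

```python
import pandas as pd


def rank_against(prices: pd.Series, value: float) -> int:
    """Descending rank of `value` among `prices`; ties take the worst position.

    Hypothetical helper: counts how many prices are >= value.
    """
    return int((prices >= value).sum())


# Small stand-in for the 500-row df['price'] column (sample data, not from the question).
df = pd.DataFrame({"price": [1, 1, 12, 14, 23, 125]})
list_of_values = [1, 14, 125, 23, 12]

# Rank every value of interest in one pass over the list.
ranks = {value: rank_against(df["price"], value) for value in list_of_values}
print(ranks)
```

With the real data, both rows with price=1 would report 500 because all 500 prices are at least 1.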

Related

Excel - getting a value based on the max value off another row in a Table

I'm looking for a solution to a problem I'm facing in Excel. This is my table, simplified:
Every sale has a unique ID, but more than one person can have contributed to a sale. The columns "Name" and "Share of sales(%)" show who contributed and what their percentage was.
Sale_ID  Name      Share of sales(%)
1        Person A  100
2        Person B  100
3        Person A  30
3        Person C  70
Now I want to add a column to my table that shows the name of the person with the highest share of sales percentage per Sale_ID. Like this:
Sale_ID  Name      Share of sales(%)  Highest sales
1        Person A  100                Person A
2        Person B  100                Person B
3        Person A  30                 Person C
3        Person C  70                 Person C
So when multiple people have contributed, the new column shows only the one with the highest value.
I hope someone can help me, thanks in advance!
You can try this in cell D2:
=LET(maxSales, MAXIFS(C2:C5,A2:A5,A2:A5),
INDEX(B2:B5, XMATCH(A2:A5&maxSales,A2:A5&C2:C5)))
or, removing the LET since maxSales is used only once:
=INDEX(B2:B5, XMATCH(A2:A5&MAXIFS(C2:C5,A2:A5,A2:A5),A2:A5&C2:C5))
In cell E2 I provided another solution via MAP/XLOOKUP:
=LET(maxSales, MAXIFS(C2:C5,A2:A5,A2:A5),
MAP(A2:A5, maxSales, LAMBDA(a,b, XLOOKUP(a&b, A2:A5&C2:C5, B2:B5))))
and similarly without LET:
=MAP(A2:A5, MAXIFS(C2:C5,A2:A5,A2:A5),
LAMBDA(a,b, XLOOKUP(a&b, A2:A5&C2:C5, B2:B5)))
Both formulas spill the expected Highest sales column (Person A, Person B, Person C, Person C).
Explanation
The trick is to identify the maximum share of sales for each group, which can be done via MAXIFS(max_range, criteria_range1, criteria1, [criteria_range2, criteria2], ...). The max_range and criteria_rangeN arguments must have the same size and shape.
MAXIFS(C2:C5,A2:A5,A2:A5)
It produces the following output:
maxSales
100
100
70
70
MAXIFS returns an array of the same size as the criteria1 argument, so for each row it gives the maximum share of sales for that row's Sale_ID.
It is the array equivalent of filling the following formula down the column:
MAXIFS($C$2:$C$5,$A$2:$A$5,A2)
INDEX/XMATCH Solution
With the array of maximum shares of sales, we only need to find the row position via XMATCH and return the corresponding B2:B5 cell via INDEX. Concatenation (&) lets XMATCH match on more than one criterion at once: the lookup value is the Sale_ID joined with its group maximum, and the lookup array is the Sale_ID joined with the share.
MAP/XLOOKUP Solution
MAP passes each pair of values (a,b), taken row by row from its first two arguments (the Sale_ID and that group's maximum), to the LAMBDA, which finds where that maximum occurs within the group and returns the corresponding Name. To look up on the additional criterion, the first two XLOOKUP arguments are concatenated (&).
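For readers who think in dataframes, the same two steps (a per-group maximum, then a lookup of the matching name) can be sketched in pandas. This is only an analogue of the formulas, not part of the Excel answer; the column name `Share` and the variable names are shortened stand-ins chosen here.

```python
import pandas as pd

# The example table from the question.
sales = pd.DataFrame({
    "Sale_ID": [1, 2, 3, 3],
    "Name": ["Person A", "Person B", "Person A", "Person C"],
    "Share": [100, 100, 30, 70],
})

# Per-row group maximum, the analogue of MAXIFS(C2:C5, A2:A5, A2:A5).
max_sales = sales.groupby("Sale_ID")["Share"].transform("max")

# One row per Sale_ID: the row that holds the group's maximum share.
top_rows = sales.loc[sales.groupby("Sale_ID")["Share"].idxmax()]
top_name_by_sale = top_rows.set_index("Sale_ID")["Name"]

# Broadcast the winning name back to every row of the group (the INDEX/XMATCH step).
sales["Highest sales"] = sales["Sale_ID"].map(top_name_by_sale)
print(sales)
```

If two people tie for the maximum within a sale, `idxmax` keeps the first one, which mirrors XMATCH returning the first match.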

Arithmetic operations for groups within a dataframe

I have loaded multiple CSV files (time series) to create one dataframe. This dataframe contains data for multiple stocks. Now I want to calculate the 1-month return for all the data points.
There are 172 data points for each stock, i.e. from index 0 to 171. The time series for the next stock starts from index 0 again.
When I calculate the 1-month return, it is calculated correctly for all data points except index 0 of each new stock, because that row takes the difference with index 171 of the previous stock.
I want the return to be calculated per stock name, so I tried a for loop, but it doesn't seem to work.
E.g. in the attached image (highlighted), the 1-month return for the first SHREECEM row is calculated against the last row of company name ITC. I expect the first value of 1Mreturn for SHREECEM to be NaN.
Using groupby instead of a for loop you can get the result you want:
Mreturn_function = lambda df: df['mean_price'].diff(periods=1)/df['mean_price'].shift(1)*100
gw_stocks.groupby('CompanyName').apply(Mreturn_function)
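A self-contained variant of the same idea is sketched below. The five sample prices are invented, and the result is written straight into the `1Mreturn` column by calling `diff` and `shift` on the grouped column rather than going through `apply`, so the output stays aligned with the original rows.

```python
import pandas as pd

# Tiny stand-in for the real data: two stocks stacked in one dataframe (made-up prices).
gw_stocks = pd.DataFrame({
    "CompanyName": ["ITC", "ITC", "ITC", "SHREECEM", "SHREECEM"],
    "mean_price": [100.0, 110.0, 99.0, 200.0, 250.0],
})

# diff() and shift() on a grouped column never cross a group boundary,
# so the first row of every company has no previous value and becomes NaN.
prices = gw_stocks.groupby("CompanyName")["mean_price"]
gw_stocks["1Mreturn"] = prices.diff(periods=1) / prices.shift(1) * 100
print(gw_stocks)
```

The first SHREECEM row is NaN here instead of being compared with the last ITC row.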

How to find row and column names of the n-th highest values in a 2D array in Excel?

I want to find the row and column names of the n-th highest values in a 2D array in Excel.
My array has a header row (the Coins) and a header column (the Markets). The data itself shows whether a coin is supported on the market and, if so, what the approximate return on investment (ROI) will be in percent.
Example
An example of the array could look like this:
ROI       Coin A  Coin B  Coin C
Market 1  N/A     7.8%    5.7%
Market 2  0.4%    6.8%    N/A
Market 3  0.45%   7.6%    12.3%
Pay attention: some values are set to N/A. (Or is there a better way to show that a market doesn't support a specific coin? I don't want to enter 0% because it makes it harder to spot whether a coin is supported by the market. I also don't want to leave the field blank because then I don't know if I have already checked that market for that coin.)
Preferred output
The output for the example table from above with n=3 should then look like this (from high ROI to low):
Coin  Market  ROI
C     3       12.3%
B     1       7.8%
A     3       0.45%
Requirements
Each coin must only be shown once. So, for example, Coin B must not be listed twice in the Top3 output (once for Market 1: 7.8% and once for Market 3: 7.6%)
What I tried
So I thought about how to split the problem into smaller parts. I think it comes down to these main parts:
-find header/row name: here I found something to find the column name for the highest value per row, but I wasn't able to adapt it to a working solution for a 2D array
-find max in 2D array: here they describe how to find the max value in a 2D array, but not how to find the n-th highest values
-find n-th highest values: here is a good explanation of how to find the highest n values of a 1D array, but not how to apply that to a 2D array
-only include each coin once
So I really tried to solve this myself but I struggle with adding these different parts together.
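To make the target logic concrete, here is a sketch of the four parts combined, written in pandas rather than as an Excel formula. It assumes N/A cells are stored as missing values and that every coin is supported on at least one market; the function name `top_n_coins` is invented for this illustration.

```python
import pandas as pd

# The example grid; None marks a coin that the market does not support (N/A).
roi = pd.DataFrame(
    {
        "Coin A": [None, 0.4, 0.45],
        "Coin B": [7.8, 6.8, 7.6],
        "Coin C": [5.7, None, 12.3],
    },
    index=["Market 1", "Market 2", "Market 3"],
)


def top_n_coins(grid: pd.DataFrame, n: int) -> pd.DataFrame:
    """Best market per coin, then the n coins with the highest ROI (hypothetical helper)."""
    best_roi = grid.max()        # highest ROI per coin, missing values skipped
    best_market = grid.idxmax()  # market (row label) where that ROI occurs
    per_coin = pd.DataFrame({
        "Coin": best_roi.index,
        "Market": best_market.to_numpy(),
        "ROI": best_roi.to_numpy(),
    })
    # Reducing to one row per coin first guarantees each coin appears only once.
    return per_coin.sort_values("ROI", ascending=False).head(n).reset_index(drop=True)


top3 = top_n_coins(roi, 3)
print(top3)
```

Taking the per-coin maximum before sorting is what keeps Coin B from being listed twice.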

Excel IF OR Statement

I am having trouble determining the correct way to calculate a final rank order across four categories. Each of the four metrics makes up part of a higher-level group, and a Top 10 for each category is applied to the respective product for risk analysis.
CURRENT LOGIC - Assignment of 25% max per category.
Columns - Y4
Parts
0.25
25
=IF(L9=1,$Y$4,IF(L9=2,$Y$4*0.9, IF(L9=3,$Y$4*0.8, IF(L9=4,$Y$4*0.7, IF(L9=5,$Y$4*0.6, IF(L9=6,$Y$4*0.5, IF(L9=7,$Y$4*0.4, IF(L9=8,$Y$4*0.3, IF(L9=9,$Y$4*0.2, IF(L9=10,$Y$4*0.1,0))))))))))
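The nested IF follows a simple pattern: rank 1 earns the full category weight, each later rank earns 10% less, and anything outside the Top 10 earns nothing. A sketch of that rule, assuming the weight in $Y$4 is 25 (the function name and the default are illustrative only):

```python
def category_points(rank, weight=25.0):
    """Points for one category, mirroring the nested IF.

    Rank 1 gets the full weight, rank 2 gets 90% of it, ..., rank 10 gets 10%.
    Ranks outside 1..10 (or a missing rank) get 0. `weight` stands in for $Y$4.
    """
    if rank is None or not 1 <= rank <= 10:
        return 0.0
    return weight * (11 - rank) / 10


for rank in (1, 2, 10, 11):
    print(rank, category_points(rank))
```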
DESIRED...
I would like to use a statement to determine three criteria in order to apply a score (1=100, 2=90, 3=80, etc..).
SUM the rank positions of each of the four categories-apply product rank ascending (not including NULL since it's not in the Top 10)
IF a product is identified in more than one metric-apply a significant contribution weight of (*.75),
IF a product has the number 1 rank in any of the four metrics-apply a score of (100).
Data - UPDATED EXAMPLE
(Product) Parts Labor Overhead External Final Score
"XYZ" 3 1 7 7 100
"ABC" NULL 6 NULL 2 100
"LMN" 4 NULL NULL NULL 70
This is way beyond my capability. ANY assistance is appreciated greatly!!!
Jim
I figured this is a good start and I can alter the weight as needed to reflect the reality of the situation.
=AVERAGE(G28:I28)+SUM(G28:I28)*0.25
However, I couldn't figure out how to put a cap on the score of no more than 100 points.
I am still unclear on exactly what you are attempting and whether this will work, but how about this simple matrix using an array formula and some conditional formatting?
Array formula in F2 (make sure to press Ctrl+Shift+Enter when exiting formula edit mode):
=MIN(100,SUM(IF(B2:E2<>"NULL",CHOOSE(B2:E2,100,90,80,70,60,50,40,30,20,10))))
Conditional Formatting defined as shown below.
Red = 100 value where it comes from a 1
Yellow = 100 value where it comes from more than 1 factor, but without a 1.
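The array formula can be read as: convert each non-NULL rank to a score (1=100, 2=90, ..., 10=10), add the scores, and cap the total at 100. A sketch of that reading, with NULL represented as None and ranks assumed to lie between 1 and 10 (the names `SCORES` and `final_score` are invented here):

```python
# Score per rank position, the same list passed to CHOOSE in the formula.
SCORES = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]


def final_score(ranks):
    """Sum the scores of all ranked categories, capped at 100 like MIN(100, ...)."""
    total = sum(SCORES[rank - 1] for rank in ranks if rank is not None)
    return min(100, total)


# Rows from the updated example: Parts, Labor, Overhead, External.
print(final_score([3, 1, 7, 7]))           # "XYZ"
print(final_score([None, 6, None, 2]))     # "ABC"
print(final_score([4, None, None, None]))  # "LMN"
```

These three rows come out as 100, 100 and 70, matching the Final Score column in the question.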

Find the top n values in a range while keeping the sum of values in another range under x value

I'd like to accomplish the following task. There are three columns of data. Column A represents price, where the sum needs to be kept under $100,000. Column B represents a value. Column C represents a name tied to columns A & B.
Out of >100 rows of data, I need to find the highest 8 values in column B while keeping the sum of the prices in column A under $100,000. And then return the 8 names from column C.
Can this be accomplished?
EDIT:
I attempted the Solver solution w/ no luck. 200 rows looks to be the max w/ Solver, and that is what I'm using now. Here are the steps I've taken:
Create a column called rank RANK(B2,$B$2:$B$200) (used column D -- what is the purpose of this?)
Create a column called flag just put in zeroes (used column E)
Create 3 total cells total_price (=SUM(A2:A200)), total_value (=SUM(B2:B200)) and total_flag (=(E2:E200))
Use solver to minimize total_value (shouldn't this be maximize??)
Add constraints -Total_price<=100000 -Total_flag=8 -Flag cells are binary
Using Simplex LP, it simply changes the flags for the first 8 values. However, the total price for the first 8 values is >$100,000 ($140k). I've tried changing some options in the Solver Parameters as well as using different solving methods to no avail. I'd like to post an image of the parameter settings, but don't have enough "reputation".
EDIT #2:
The first 5 rows look like this; price goes down to ~$6k at the bottom of the table.
Price Value Name Rank Flag
$22,538 42.81905675 Blow, Joe 1 0
$22,427 37.36240932 Doe, Jane 2 0
$17,158 34.12127693 Hall, Cliff 3 0
$16,625 33.97654031 Povich, John 4 0
$15,631 33.58212402 Cow, Holy 5 0
I'll give you the Solver solution as a starting point. It involves creating some extra columns and total cells. Note that Solver is limited in the number of cells it can handle, but it will work with 100 anyway.
Create a column called rank RANK(B2,$B$2:$B$100)
Create a column called flag just put in zeroes
Create 3 total cells total_price, total_value and total_flag
Use solver to minimize total_value
Add constraints
-Total_price<=100000
-Total_flag=8
-Flag cells are binary
This will flag the rows you want and you can grab the names however you want.
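The model being described is a 0/1 selection: pick exactly 8 rows, keep the summed price within the budget, and make the summed value as large as possible. Outside Excel, one way to sketch it is a small dynamic program over (rows picked, money spent). Everything below is an illustration rather than the Solver setup itself: the function name `pick_rows` is invented, prices are assumed to be whole dollars, and the demo uses the five rows from the question with a reduced target of 2 rows and a $40,000 budget so the result is easy to check by hand.

```python
def pick_rows(rows, k, budget):
    """Choose exactly k rows with total price <= budget and maximum total value.

    rows: list of (price, value, name) tuples.
    Returns (total_value, names), or None if no selection fits the budget.
    """
    # best[(count, spent)] = (total_value, names) for the best selection found so far
    best = {(0, 0): (0.0, ())}
    for price, value, name in rows:
        # Iterate over a snapshot so each row is added to a selection at most once.
        for (count, spent), (total, names) in list(best.items()):
            if count == k or spent + price > budget:
                continue
            key = (count + 1, spent + price)
            candidate = (total + value, names + (name,))
            if key not in best or candidate[0] > best[key][0]:
                best[key] = candidate
    complete = [entry for (count, _), entry in best.items() if count == k]
    return max(complete, key=lambda entry: entry[0]) if complete else None


# First five rows from the question (price, value, name).
rows = [
    (22538, 42.81905675, "Blow, Joe"),
    (22427, 37.36240932, "Doe, Jane"),
    (17158, 34.12127693, "Hall, Cliff"),
    (16625, 33.97654031, "Povich, John"),
    (15631, 33.58212402, "Cow, Holy"),
]

result = pick_rows(rows, k=2, budget=40000)
print(result)
```

For the real task the call would use k=8 and budget=100000 on the full table. The two highest values cannot be combined in this small demo because their prices add up to more than $40,000, which is the same effect that made the first 8 rows infeasible in the Solver attempt.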
