How to write a maximize or minimize function in J

For example, suppose I want to maximize the expected return E[r] = w1*r1 + w2*r2 and solve for the optimal weights w1 and w2.

The only constraint that you have really given is that w1+w2=1
w1 =.0.25
(, -.)w1
0.25 0.75
That takes care of both w1 and w2 given the value of w1.
(r1,r2) +/@:* (w1,w2) calculates r1*w1 + r2*w2
r1 =. 5
r2 =.10
(r1,r2) (+/@:* (,-.))w1
8.75
(r1,r2) (+/@:* (,-.))0.9
5.5
(r1,r2) (+/@:* (,-.))0.01
9.95
If you really wanted to maximize, you would need to add equations for the values of r1 and r2 and take those into account as well, but perhaps I don't understand your question?
Responding to the comment below: if the constraint w1+w2=1 is still in play, then the matter just becomes summing the values in r1 and r2; whichever is bigger should get the w value of 1 and the other gets the w value of 0.
r1=.2 4 6 3 2
r2=.2.1 4 6 3 2
r3=.2 4 6 3 2.3
r1 (,-.)@:>/@:(+/@:,.) r2
0 1
r2 (,-.)@:>/@:(+/@:,.) r1
1 0
r3 (,-.)@:>/@:(+/@:,.) r2
1 0
'w1 w2'=.r3 (,-.)@:>/@:(+/@:,.) r2
w1
1
w2
0
'w1 w2'=.r1 (,-.)@:>/@:(+/@:,.) r2
w1
0
w2
1
(r1,.r2) +/@:,@:(+ . *) (0 1) NB. w1=.0 w2=.1
17.1
(r1,.r2) +/@:,@:(+ . *) (1 0) NB. w1=.1 w2=.0
17
(r1,.r2) +/@:,@:(+ . *) (0.5 0.5) NB. w1=.0.5 w2=.0.5
17.05
Based on the follow-up comment below I would approach it in one of two ways: I could dig up all my linear programming texts from the 1980s and come up with the definitive mathematical solution (including degenerate cases and local maxima/minima), or I could use the same technique as above but for a larger case than n=2. I'm going with the second option.
Let's look first at the r matrix which will be a set of constants. For this example I am taking a random 5 X 10 matrix with values from 1 to 10.
r=. >: ? 5 10 $ 10
r
4 4 8 1 4 3 6 9 6 2
2 6 5 4 4 7 5 10 4 6
2 4 9 10 1 1 9 8 2 7
5 6 5 4 7 9 2 6 10 6
10 3 6 2 10 2 7 10 4 2
Now the trick I am going to use is that I want the column with the highest average to be multiplied by the largest value of w. That is easy to do in J using (+/ % #):
(+/ % #) r
4.6 4.6 6.6 4.2 5.2 4.4 5.8 8.6 5.2 4.6
Then find the ranking of the list so that the columns of the original r matrix can be reordered. The leading 7 means that column 7 of r has the largest average, and so on.
\:@:(+/ % #) r
7 2 6 4 8 0 1 9 5 3
I use this in turn to reorder the columns of the matrix r using {"1 since I am working on columns. The result is that the columns of r are reordered so that the column with the largest average is on the left and the one with the smallest is on the right.
(\:@:(+/ % #) {"1 ]) r
9 8 6 4 6 4 4 2 3 1
10 5 5 4 4 2 6 6 7 4
8 9 9 1 2 2 4 7 1 10
6 5 2 7 10 5 6 6 9 4
10 6 7 10 4 10 3 2 2 2
Once I have that, the next step is to develop the w vector. Since all the largest column averages are now on the left, I make the leading values of w as large as possible within the noted constraints.
w=. 0.2 0.2 0.2 0.2 0.15 0.01 0.01 0.01 0.01 0.01
#w NB. w1 through w10
10
+/w NB. sum of the values in w
1
>./w NB. largest value in w
0.2
<./w NB. smallest value in w
0.01
Because the r matrix has been reordered, the + . * product gives the values w1*r1, w2*r2, w3*r3 ... w10*r10, i.e. each reordered column of r scaled by its weight.
(({"1~ \:@: (+/ % #))r) + . * w
1.8 1.6 1.2 0.8 0.9 0.04 0.04 0.02 0.03 0.01
2 1 1 0.8 0.6 0.02 0.06 0.06 0.07 0.04
1.6 1.8 1.8 0.2 0.3 0.02 0.04 0.07 0.01 0.1
1.2 1 0.4 1.4 1.5 0.05 0.06 0.06 0.09 0.04
2 1.2 1.4 2 0.6 0.1 0.03 0.02 0.02 0.02
To get the overall weighted value of the matrix, ravel all the values and then sum:
+/ , (({"1~ \:@: (+/ % #))r) + . * w
31.22
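For readers who don't use J, the same greedy weighting can be sketched in NumPy (a rough equivalent of the steps above, not a translation of the exact J verbs):
import numpy as np

rng = np.random.default_rng()
r = rng.integers(1, 11, size=(5, 10))         # random 5 x 10 matrix with values from 1 to 10
w = np.array([0.2, 0.2, 0.2, 0.2, 0.15,
              0.01, 0.01, 0.01, 0.01, 0.01])  # sums to 1, largest weights first

order = np.argsort(r.mean(axis=0))[::-1]      # column indices ranked by average, best first
weighted = r[:, order] * w                    # scale each reordered column by its weight
objective = weighted.sum()                    # ravel and sum, like +/ , in the J answer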

Related

Split value of a row with equal values based on the unique entry

I have a dataframe which looks like this:
Name T1 T2 alpha
A 10 3 30
A 11 5 Nan
A 13 5 Nan
B 5 2 7
B 3 1 Nan
However, I need to divide the alpha value into equal parts to replace the Nan values for each unique name in the Name column, like this: e.g. 30 for A becomes 10 for each row where A is present.
Name T1 T2 alpha
A 10 3 10
A 11 5 10
A 13 5 10
B 5 2 3.5
B 3 1 3.5
I tried using explode but it is not giving the output I want; any idea here would help.
With groupby and transform
df['alpha'] = pd.to_numeric(df['alpha'],errors = 'coerce').fillna(0).groupby(df['Name']).transform('mean')
df
Out[50]:
Name T1 T2 alpha
0 A 10 3 10.0
1 A 11 5 10.0
2 A 13 5 10.0
3 B 5 2 3.5
4 B 3 1 3.5
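The mean works here because fillna(0) makes each group's sum equal its single alpha value; a more explicit sketch of the same idea (assuming each Name has exactly one non-missing alpha) is:
alpha = pd.to_numeric(df['alpha'], errors='coerce')   # the 'Nan' strings become real NaN
df['alpha'] = alpha.groupby(df['Name']).transform(lambda s: s.sum() / len(s))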
This'll get the job done:
df['alpha'] = (df['alpha'] / df['T2']).ffill()
Output:
Name T1 T2 alpha
0 A 10 3 10.0
1 A 11 5 10.0
2 A 13 5 10.0
3 B 5 2 3.5
4 B 3 1 3.5

Create Multiple rows for each value in given column in pandas df

I have a dataframe with points given in two columns x and y.
Thing x y length_x length_y
0 A 1 3 1 2
1 B 2 3 2 1
These (x,y) points are situated at the middle of one of the sides of a rectangle with side lengths length_x and length_y. What I wish to do is, for each of these points, give the coordinates of the rectangle they are on. That is, the coordinates for Thing A would be:
(1+1*0.5, 3), (1-1*0.5,3), (1+1*0.5,3-2*0.5), (1-1*0.5, 3-2*0.5)
The half comes from the fact that the given point is the midpoint of a side, so half the length is the distance from that point to the corner of the rectangle.
Hence my desired output is:
Thing x y Corner_x Corner_y length_x length_y
0 A 1 3 1.5 2.0 1 2
1 A 1 3 1.5 1.0 1 2
2 A 1 3 0.5 2.0 1 2
3 A 1 3 0.5 1.0 1 2
4 A 1 3 1.5 2.0 1 2
5 B 2 3 3.0 3.0 2 1
6 B 2 3 3.0 2.5 2 1
7 B 2 3 1.0 3.0 2 1
8 B 2 3 1.0 2.5 2 1
9 B 2 3 3.0 3.0 2 1
I tried to do this by defining a lambda returning two values but failed. I even tried to create multiple columns and then stack them, but it's really dirty.
bb = []
for thing in list_of_things:
    new_df = df[df['Thing']=='{}'.format(thing)]
    df = df.sort_values('x', ascending=False)
    df['corner 1_x'] = df['x'] + df['length_x']/2
    df['corner 1_y'] = df['y']
    df['corner 2_x'] = df['x'] - df['length_x']/2
    df['corner 2_y'] = df['y'] - df['length_y']/2
    .........
Note also that the first corner's coordinates need to be repeated, as I later want to use geopandas to transform each of these sets of coordinates into a POLYGON.
What I am looking for is a way to generate these rows in a fast and clean way.
You can use apply to create your corners as lists and explode them into the four rows per group.
Finally, join the output to the original dataframe:
df.join(df.apply(lambda r: pd.Series({'corner_x': [r['x']+r['length_x']/2, r['x']-r['length_x']/2],
                                      'corner_y': [r['y']+r['length_y']/2, r['y']-r['length_y']/2],
                                      }), axis=1).explode('corner_x').explode('corner_y'),
        how='right')
output:
Thing x y length_x length_y corner_x corner_y
0 A 1 3 1 2 1.5 4
0 A 1 3 1 2 1.5 2
0 A 1 3 1 2 0.5 4
0 A 1 3 1 2 0.5 2
1 B 2 3 2 1 3 3.5
1 B 2 3 2 1 3 2.5
1 B 2 3 2 1 1 3.5
1 B 2 3 2 1 1 2.5
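Since the question mentions turning each set of corners into a POLYGON with geopandas, one possible follow-up is sketched below; it assumes shapely and geopandas are available and that the joined result above has been assigned to a variable named out (a name introduced here for illustration). The corners are reordered so the ring does not self-intersect.
import geopandas as gpd
from shapely.geometry import Polygon

def to_polygon(g):
    # after the double explode the corner order per Thing is (+x,+y), (+x,-y), (-x,+y), (-x,-y)
    pts = list(zip(g['corner_x'], g['corner_y']))
    return Polygon([pts[0], pts[1], pts[3], pts[2]])  # walk around the rectangle; shapely closes the ring

polys = out.groupby('Thing').apply(to_polygon)
gdf = gpd.GeoDataFrame({'Thing': polys.index, 'geometry': polys.values})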

Replace a column value with number using pandas

For the following dataset, I can replace column 1 with the numeric value easily.
df['1'].replace(['A', 'B', 'C', 'D'], [0, 1, 2, 3], inplace=True)
But if I have 3600 or more different values in a column, how can I replace them with numeric values without writing out every value of the column?
Please let me know. I don't understand how to do that. If anybody has any solution please share with me.
Thanks in advance.
import pandas as pd
df = pd.DataFrame({1:['A','B','C','C','D','A'],
                   2:[0.6,0.9,5,4,7,1],
                   3:[0.3,1,0.7,8,2,4]})
print(df)
1 2 3
0 A 0.6 0.3
1 B 0.9 1.0
2 C 5.0 0.7
3 C 4.0 8.0
4 D 7.0 2.0
5 A 1.0 4.0
np.where makes it easy.
import numpy as np
df[1] = np.where(df[1]=="A", "0",
                 np.where(df[1]=="B", "1",
                          np.where(df[1]=="C", "2",
                                   np.where(df[1]=="D", "3", np.nan))))
print(df)
1 2 3
0 0 0.6 0.3
1 1 0.9 1.0
2 2 5.0 0.7
3 2 4.0 8.0
4 3 7.0 2.0
5 0 1.0 4.0
But if you have a lot of categories, you might want to think about other ways.
import string
upper=list(string.ascii_uppercase)
a=pd.DataFrame({'Alp':upper})
print(a)
Alp
0 A
1 B
2 C
3 D
4 E
5 F
6 G
7 H
8 I
9 J
.
.
19 T
20 U
21 V
22 W
23 X
24 Y
25 Z
for k in np.arange(0,26):
    a = a.replace(to_replace=upper[k], value=k)
print(a)
Alp
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
.
.
.
21 21
22 22
23 23
24 24
25 25
If there are many values to replace you can use factorize:
df[1] = pd.factorize(df[1])[0] + 1
print (df)
1 2 3
0 1 0.6 0.3
1 2 0.9 1.0
2 3 5.0 0.7
3 3 4.0 8.0
4 4 7.0 2.0
5 1 1.0 4.0
You could do something like
df.loc[df['1'] == 'A','1'] = 0
df.loc[df['1'] == 'B','1'] = 1
### Or
keys = df['1'].unique().tolist()
i = 0
for key in keys:
    df.loc[df['1'] == key, '1'] = i
    i = i + 1
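Two other common idioms, sketched here against the sample frame above (where the column key is the integer 1), are building the mapping from the unique values or letting pandas assign categorical codes:
mapping = {v: i for i, v in enumerate(df[1].unique())}   # each distinct value gets its first-seen position
df[1] = df[1].map(mapping)

# or, for unordered labels (codes follow lexical order for plain object columns):
# df[1] = df[1].astype('category').cat.codes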

Year On Year Growth Using Pandas - Traverse N rows Back

I have a lot of parameters on which I have to calculate the year on year growth.
Type 2006-Q1 2006-Q2 2006-Q3 2006-Q4 2007-Q1 2007-Q2 2007-Q3 2007-Q4 2008-Q1 2008-Q2 2008-Q3 2008-Q4
MonMkt_IntRt 3.44 3.60 3.99 4.40 4.61 4.73 5.11 4.97 4.92 4.89 5.29 4.51
RtlVol 97.08 97.94 98.25 99.15 99.63 100.29 100.71 101.18 102.04 101.56 101.05 99.49
IntRt 4.44 5.60 6.99 7.40 8.61 9.73 9.11 9.97 9.92 9.89 7.29 9.51
GMR 9.08 9.94 9.25 9.15 9.63 10.29 10.71 10.18 10.04 10.56 10.05 9.49
I need to calculate the growth, i.e. in column 2007-Q1 I need to find the growth from 2006-Q1. The formula is (2007-Q1 / 2006-Q1) - 1.
I have gone through the link below and tried the following code:
Calculating year over year growth by group in Pandas
df = pd.read_csv('c:/Econometric/EconoModel.csv')
df.set_index('Type',inplace=True)
df.sort_index(axis=1, inplace=True)
df_t = df.T
df_output = (df_t / df_t.shift(4)) - 1
The output is as below
Type          2006-Q1  2006-Q2  2006-Q3  2006-Q4  2007-Q1  2007-Q2  2007-Q3  2007-Q4  2008-Q1  2008-Q2  2008-Q3  2008-Q4
MonMkt_IntRt      NaN      NaN      NaN      NaN   0.3398   0.3159   0.2806   0.1285   0.0661   0.0340   0.0363  -0.0912
RtlVol            NaN      NaN      NaN      NaN   0.0261   0.0240   0.0249   0.0204   0.0242   0.0126   0.0033  -0.0166
IntRt             NaN      NaN      NaN      NaN   0.6666   0.5375   0.3919   0.2310   0.1579   0.0195   0.0856  -0.2688
GMR               NaN      NaN      NaN      NaN   0.0077  -0.0310   0.1124   0.1704   0.0571  -0.0240  -0.0140  -0.0127
Use iloc to shift data slices. See an example on a test df:
df= pd.DataFrame({i:[0+i,1+i,2+i] for i in range(0,12)})
print(df)
0 1 2 3 4 5 6 7 8 9 10 11
0 0 1 2 3 4 5 6 7 8 9 10 11
1 1 2 3 4 5 6 7 8 9 10 11 12
2 2 3 4 5 6 7 8 9 10 11 12 13
df.iloc[:,list(range(3,12))] = df.iloc[:,list(range(3,12))].values/ df.iloc[:,list(range(0,9))].values - 1
print(df)
   0  1  2    3    4     5     6     7         8         9        10        11
0  0  1  2  inf  3.0  1.50  1.00  0.75  0.600000  0.500000  0.428571  0.375000
1  1  2  3  3.0  1.5  1.00  0.75  0.60  0.500000  0.428571  0.375000  0.333333
2  2  3  4  1.5  1.0  0.75  0.60  0.50  0.428571  0.375000  0.333333  0.300000
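Applied to the question's frame, the same slice-against-slice idea would look roughly like this (a sketch, assuming df has 'Type' as its index and the twelve quarter columns sorted chronologically):
vals = df.values                          # shape (4, 12): one row per Type, one column per quarter
growth = vals[:, 4:] / vals[:, :-4] - 1   # each quarter divided by the same quarter one year earlier
df_growth = pd.DataFrame(growth, index=df.index, columns=df.columns[4:])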
I could not find any issue with your code.
I simply added axis=1 to the dataframe.shift() method, as you are trying to do the column comparison.
I have executed the following code and it gives the result you expected.
def getSampleDataframe():
    df_economy_model = pd.DataFrame(
        {
            'Type': ['MonMkt_IntRt', 'RtlVol', 'IntRt', 'GMR'],
            '2006-Q1': [3.44, 97.08, 4.44, 9.08],
            '2006-Q2': [3.6, 97.94, 5.6, 9.94],
            '2006-Q3': [3.99, 98.25, 6.99, 9.25],
            '2006-Q4': [4.4, 99.15, 7.4, 9.15],
            '2007-Q1': [4.61, 99.63, 8.61, 9.63],
            '2007-Q2': [4.73, 100.29, 9.73, 10.29],
            '2007-Q3': [5.11, 100.71, 9.11, 10.71],
            '2007-Q4': [4.97, 101.18, 9.97, 10.18],
            '2008-Q1': [4.92, 102.04, 9.92, 10.04],
            '2008-Q2': [4.89, 101.56, 9.89, 10.56],
            '2008-Q3': [5.29, 101.05, 7.29, 10.05],
            '2008-Q4': [4.51, 99.49, 9.51, 9.49]
        })  # Your data
    return df_economy_model
df_cd_americas = getSampleDataframe()
df_cd_americas.set_index('Type', inplace=True)
df_yearly_growth = (df_cd_americas / df_cd_americas.shift(4, axis=1)) - 1
print (df_cd_americas)
print (df_yearly_growth)
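pandas also has this shift-and-divide pattern built in as pct_change; a one-line sketch of the same year-on-year calculation (assuming the quarter columns are in chronological order):
df_yearly_growth = df_cd_americas.pct_change(periods=4, axis=1)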

How to expand Python Pandas Dataframe in linearly spaced increments

Beginner question:
I have a pandas dataframe that looks like this:
x1 y1 x2 y2
0 0 2 2
10 10 12 12
and I want to expand that dataframe by half units along the x and y coordinates to look like this:
x1 y1 x2 y2 Interpolated_X Interpolated_Y
0 0 2 2 0 0
0 0 2 2 0.5 0.5
0 0 2 2 1 1
0 0 2 2 1.5 1.5
0 0 2 2 2 2
10 10 12 12 10 10
10 10 12 12 10.5 10.5
10 10 12 12 11 11
10 10 12 12 11.5 11.5
10 10 12 12 12 12
Any help would be much appreciated.
The cleanest way I know how to expand rows like this is through groupby.apply. It may be faster to use something like itertuples, but that will be a little more complicated code (keep that in mind if your data set is larger); a rough sketch of that route is at the end of this answer.
I groupby the index, which will send each row to my apply function (your index has to be unique for each row; if it's not, just run reset_index). I can return a DataFrame from my apply, therefore we can expand from one row to multiple rows.
Caveat: your x2-x1 and y2-y1 distances must be the same or this won't work.
import pandas as pd
import numpy as np
def expand(row):
    row = row.iloc[0]  # apply passes a dataframe, so this gets a reference to the first and only row
    xdistance = (row.x2 - row.x1)
    ydistance = (row.y2 - row.y1)
    xsteps = np.arange(row.x1, row.x2 + .5, .5)  # create steps arrays
    ysteps = np.arange(row.y1, row.y2 + .5, .5)
    return (pd.DataFrame([row] * len(xsteps))  # you can expand lists in python by multiplying like this: [val] * 3 = [val, val, val]
            .assign(int_x=xsteps, int_y=ysteps))

(df.groupby(df.index)                 # "group" on each row
   .apply(expand)                     # send row to expand function
   .reset_index(level=1, drop=True))  # groupby gives us an extra index we don't want
starting df
x1 y1 x2 y2
0 0 2 2
10 10 12 12
ending df
x1 y1 x2 y2 int_x int_y
0 0 0 2 2 0.0 0.0
0 0 0 2 2 0.5 0.5
0 0 0 2 2 1.0 1.0
0 0 0 2 2 1.5 1.5
0 0 0 2 2 2.0 2.0
1 10 10 12 12 10.0 10.0
1 10 10 12 12 10.5 10.5
1 10 10 12 12 11.0 11.0
1 10 10 12 12 11.5 11.5
1 10 10 12 12 12.0 12.0
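The itertuples route mentioned at the top of the answer might look roughly like this (a sketch with the same caveat that the x and y spans must be equal):
import numpy as np
import pandas as pd

rows = []
for row in df.itertuples(index=False):
    steps = np.arange(0, row.x2 - row.x1 + 0.5, 0.5)   # 0, 0.5, ..., full span
    for s in steps:
        rows.append({**row._asdict(), 'int_x': row.x1 + s, 'int_y': row.y1 + s})
expanded = pd.DataFrame(rows)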
