Look at the gld_weight column of figure 1. It is throwing off completely wrong values. The btc_weight + gld_weight should always adds up to 1. But why is the gld_weight column not corresponding to the returned row values when I used the describe function?
Figure 1:
Figure 2:
Figure 3:
This is my source code:
import numpy as np
import pandas as pd
from pandas_datareader import data as wb
import matplotlib.pyplot as plt
assets = ['BTC-USD', 'GLD']
mydata = pd.DataFrame()
for asset in assets:
mydata[asset] = wb.DataReader(asset, data_source='yahoo', start='2015-1-1')['Close']
cleandata = mydata.dropna()
log_returns = np.log(cleandata/cleandata.shift(1))
annual_log_returns = log_returns.mean() * 252 * 100
annual_log_returns
annual_cov = log_returns.cov() * 252
annual_cov
pfolio_returns = []
pfolio_volatility = []
btc_weight = []
gld_weight = []
for x in range(1000):
weights = np.random.random(2)
weights[0] = weights[0]/np.sum(weights)
weights[1] = weights[1]/np.sum(weights)
weights /= np.sum(weights)
btc_weight.append(weights[0])
gld_weight.append(weights[1])
pfolio_returns.append(np.dot(annual_log_returns, weights))
pfolio_volatility.append(np.sqrt(np.dot(weights.T, np.dot(annual_cov, weights))))
pfolio_returns
pfolio_volatility
npfolio_returns = np.array(pfolio_returns)
npfolio_volatility = np.array(pfolio_volatility)
new_portfolio = pd.DataFrame({
'Returns': npfolio_returns,
'Volatility': npfolio_volatility,
'btc_weight': btc_weight,
'gld_weight': gld_weight
})
I'am not 100% sure i got your question correctly, but an issue might be, that you are not reassigning the output to new variable, therefore not saving it.
Try to adjust your code in this matter:
new_portfolio = new_portfolio.sort_values(by="Returns")
Or turn inplace parameter to True - link
Short answer :
The issue at hand was found in the for-loop were the initial weight value normalization was done. How its fixed: see update 1 below in the answer.
Background to getting the solution:
At first glance the code of OP seemed to be in order and values in the arrays were fitted as expected by the requests OP made via the written codes. From testing it appeared that with range(1000) was asking for trouble because value-outcome oversight was lost due to the vast amount of "randomness" results. Especially as the question was written as a transformation issue. So x/y axis values mixing or some other kind of transformation error was hard to study.
To tackle this I used static values as can be seen for annual_log_returns and annual_cov.
Then I've locked all outputs for print so the values become locked in place and can't be changed further down the processing. .. it was possible that the prints of code changed during run-time because the arrays were not locked (also suggested by Pavel Klammert in his answer).
After commented feedback I've figured out what OP meant with "the values are wrong. I then focused on the method how the used values, to fill the arrays, were created.
The issue of "throwing wrong values was found :
The use of weights[0] = weights[0]/np.sum(weights) replaces the original list weights[0] value for new weights[0] which then serves as new input for weights[1] = weights[1]/np.sum(weights) and therefore sum = 1 is never reached.
The variable names weights[0] and weights[1] were then changed into 'a' and 'b' at two places directly after the creation of weights [0] and [1] values to prevent overwriting the initial weights values. Then the outcome is as "planned".
Problem solved.
import numpy as np
import pandas as pd
pfolio_returns = []
pfolio_volatility = []
btc_weight = []
gld_weight = []
annual_log_returns = [0.69, 0.71]
annual_cov = 0.73
ranger = 5
for x in range(ranger):
weights = np.random.random(2)
weights[0] = weights[0]/np.sum(weights)
weights[1] = weights[1]/np.sum(weights)
weights /= np.sum(weights)
btc_weight.append(weights[0])
gld_weight.append(weights[1])
pfolio_returns.append(np.dot(annual_log_returns, weights))
pfolio_volatility.append(np.sqrt(np.dot(weights.T, np.dot(annual_cov, weights))))
print (weights[0])
print (weights[1])
print (weights)
#print (pfolio_returns)
#print (pfolio_volatility)
npfolio_returns = np.array(pfolio_returns)
npfolio_volatility = np.array(pfolio_volatility)
#df = pd.DataFrame(array, index = row_names, columns=colomn_names, dtype = dtype)
new_portfolio = pd.DataFrame({'Returns': npfolio_returns, 'Volatility': npfolio_volatility, 'btc_weight': btc_weight, 'gld_weight': gld_weight})
print (new_portfolio, '\n')
sort = new_portfolio.sort_values(by='Returns')
sort_max_gld_weight = sort.loc[ranger-1, 'gld_weight']
print ('Sort:\n', sort, '\n')
print ('sort max_gld_weight : "%s"\n' % sort_max_gld_weight) # if "999" contains the highest gld_weight... but most cases its not!
sort_max_gld_weight = sort.max(axis=0)[3] # this returns colomn 4 'gld_weight' value.
print ('sort max_gld_weight : "%s"\n' % sort_max_gld_weight) # this returns colomn 4 'gld_weight' value.
desc = new_portfolio.describe()
desc_max_gld_weight =desc.loc['max', 'gld_weight']
print ('Describe:\n', desc, '\n')
print ('desc max_gld_weight : "%s"\n' % desc_max_gld_weight)
max_val_gld = new_portfolio.loc[new_portfolio['gld_weight'] == sort_max_gld_weight]
print('max val gld:\n', max_val_gld, '\n')
locations = new_portfolio.loc[new_portfolio['gld_weight'] > 0.99]
print ('location:\n', locations)
Result can be for example:
0.9779586087178525
0.02204139128214753
[0.97795861 0.02204139]
Returns Volatility btc_weight gld_weight
0 0.702820 0.627707 0.359024 0.640976
1 0.709807 0.846179 0.009670 0.990330
2 0.708724 0.801756 0.063786 0.936214
3 0.702010 0.616237 0.399496 0.600504
4 0.690441 0.835780 0.977959 0.022041
Sort:
Returns Volatility btc_weight gld_weight
4 0.690441 0.835780 0.977959 0.022041
3 0.702010 0.616237 0.399496 0.600504
0 0.702820 0.627707 0.359024 0.640976
2 0.708724 0.801756 0.063786 0.936214
1 0.709807 0.846179 0.009670 0.990330
sort max_gld_weight : "0.02204139128214753"
sort max_gld_weight : "0.9903300366638084"
Describe:
Returns Volatility btc_weight gld_weight
count 5.000000 5.000000 5.000000 5.000000
mean 0.702760 0.745532 0.361987 0.638013
std 0.007706 0.114057 0.385321 0.385321
min 0.690441 0.616237 0.009670 0.022041
25% 0.702010 0.627707 0.063786 0.600504
50% 0.702820 0.801756 0.359024 0.640976
75% 0.708724 0.835780 0.399496 0.936214
max 0.709807 0.846179 0.977959 0.990330
desc max_gld_weight : "0.9903300366638084"
max val gld:
Returns Volatility btc_weight gld_weight
1 0.709807 0.846179 0.00967 0.99033
loacation:
Returns Volatility btc_weight gld_weight
1 0.709807 0.846179 0.00967 0.99033
Update 1 :
for x in range(ranger):
weights = np.random.random(2)
print (weights)
a = weights[0]/np.sum(weights) # see comments below.
print (weights[0])
b = weights[1]/np.sum(weights) # see comments below.
print (weights[1])
print ('w0 + w1=', weights[0] + weights[1])
weights /= np.sum(weights)
btc_weight.append(a)
gld_weight.append(b)
print('a=', a, 'b=',b , 'a+b=', a+b)
The new output becomes for example:
[0.37710183 0.72933416]
0.3771018292953062
0.7293341569809412
w0 + w1= 1.1064359862762474
a= 0.34082570882790686 b= 0.6591742911720931 a+b= 1.0
[0.09301326 0.05296838]
0.09301326441107827
0.05296838430180717
w0 + w1= 0.14598164871288544
a= 0.637157240181712 b= 0.3628427598182879 a+b= 1.0
[0.48501305 0.56078073]
0.48501305100305336
0.5607807281299131
w0 + w1= 1.0457937791329663
a= 0.46377503928658087 b= 0.5362249607134192 a+b= 1.0
[0.41271663 0.89734662]
0.4127166254704412
0.8973466186511199
w0 + w1= 1.3100632441215612
a= 0.31503564986069105 b= 0.6849643501393089 a+b= 1.0
[0.11854074 0.57862593]
0.11854073835784273
0.5786259314340823
w0 + w1= 0.697166669791925
a= 0.1700321364950252 b= 0.8299678635049749 a+b= 1.0
Results printed outside the for-loop:
0.1700321364950252
0.8299678635049749
[0.17003214 0.82996786]
I would like to round number in a code but in a way that it adapts to each values.
For example i would like a rounding algorithm to return :
0.999999 it should return 1
0.0749999 it should return 0.075
0.006599 it should return 0.0066
and so on ...
I don't know in advance the number of digits (which is kinda my problem)
I was thinking to use strings to find where are the 9s (or count the 0s) but it is quite a lot of effort for that i was thinking ?
If you know any way to do that (if possible without advanced libraries) i would appreciate.
Thanks.
It's some complicated. but, It works. Please make sure the result is what you want. I think you can understand how to round the number from code.
def clear9(numstr):
liststr = list(numstr)
for index in range(len(liststr)-1,-1,-1):
if liststr[index] == '.': continue
if liststr[index] == '9':
liststr[index] = '0'
if index == 0:
liststr.insert(0, '1')
else:
if index != len(liststr)-1:
liststr[index] = str(int(liststr[index])+1)
break
numstr = ''
for item in liststr:
numstr += item
return numstr
def myround(num):
numstr = str(num)
numstr = clear9(numstr)
return float(numstr)
print (myround(9.05))
print (myround(9.999999))
print (myround(0.999999))
print (myround(0.0749999))
print (myround(0.006599))
print (myround(0.00659923))
print (myround(0.09659923))
print (myround(-0.00659923))
9.05
10.0
1.0
0.075
0.0066
0.00659923
0.09659923
-0.00659923
import math
def round_(number):
dist = int(math.log10(abs(number))) #number of zeros after decimal point
return (round(number, abs(dist) + 2) if dist != 0 else round(number))
print(round_(0.999999))
print(round_(0.0749999))
print(round_(0.006599))
print(round_(-0.00043565))
output:
1
0.075
0.0066
-0.00044
Dealing with floating point numbers is tricky. You want to do a kind of round-off in base 10, but floating point numbers are base 2.
So I propose to use the decimal module, which can represent real numbers exactly, as opposed to base-2 floating point.:
from decimal import Decimal
def myround(num):
dec = Decimal(num)
adj = abs(dec.adjusted())+1
return round(num, adj)
Look at the documentation for Decimal.adjusted() to understand how this works.
A test:
In [1]: from decimal import Decimal
In [2]: def myround(num):
...: dec = Decimal(num)
...: adj = abs(dec.adjusted())+1
...: return round(num, adj)
...:
In [3]: myround(0.999999)
Out[3]: 1.0
In [4]: myround(0.006599)
Out[4]: 0.0066
In [5]: myround(0.0749999)
Out[5]: 0.075
I am trying to find conditional mutual information between three discrete random variable using pyitlib package for python with the help of the formula:
I(X;Y|Z)=H(X|Z)+H(Y|Z)-H(X,Y|Z)
The expected Conditional Mutual information value is= 0.011
My 1st code:
import numpy as np
from pyitlib import discrete_random_variable as drv
X=[0,1,1,0,1,0,1,0,0,1,0,0]
Y=[0,1,1,0,0,0,1,0,0,1,1,0]
Z=[1,0,0,1,1,0,0,1,1,0,0,1]
a=drv.entropy_conditional(X,Z)
##print(a)
b=drv.entropy_conditional(Y,Z)
##print(b)
c=drv.entropy_conditional(X,Y,Z)
##print(c)
p=a+b-c
print(p)
The answer i am getting here is=0.4632245116328402
My 2nd code:
import numpy as np
from pyitlib import discrete_random_variable as drv
X=[0,1,1,0,1,0,1,0,0,1,0,0]
Y=[0,1,1,0,0,0,1,0,0,1,1,0]
Z=[1,0,0,1,1,0,0,1,1,0,0,1]
a=drv.information_mutual_conditional(X,Y,Z)
print(a)
The answer i am getting here is=0.1583445441575102
While the expected result is=0.011
Can anybody help? I am in big trouble right now. Any kind of help will be appreciable.
Thanks in advance.
I think that the library function entropy_conditional(x,y,z) has some errors. I also test my samples, the same problem happens.
however, the function entropy_conditional with two variables is ok.
So I code my entropy_conditional(x,y,z) as entropy(x,y,z), the results is correct.
the code may be not beautiful.
def gen_dict(x):
dict_z = {}
for key in x:
dict_z[key] = dict_z.get(key, 0) + 1
return dict_z
def entropy(x,y,z):
x = np.array([x,y,z]).T
x = x[x[:,-1].argsort()] # sorted by the last column
w = x[:,-3]
y = x[:,-2]
z = x[:,-1]
# dict_w = gen_dict(w)
# dict_y = gen_dict(y)
dict_z = gen_dict(z)
list_z = [dict_z[i] for i in set(z)]
p_z = np.array(list_z)/sum(list_z)
pos = 0
ent = 0
for i in range(len(list_z)):
w = x[pos:pos+list_z[i],-3]
y = x[pos:pos+list_z[i],-2]
z = x[pos:pos+list_z[i],-1]
pos += list_z[i]
list_wy = np.zeros((len(set(w)),len(set(y))), dtype = float , order ="C")
list_w = list(set(w))
list_y = list(set(y))
for j in range(len(w)):
pos_w = list_w.index(w[j])
pos_y = list_y.index(y[j])
list_wy[pos_w,pos_y] += 1
#print(pos_w)
#print(pos_y)
list_p = list_wy.flatten()
list_p = np.array([k for k in list_p if k>0]/sum(list_p))
ent_t = 0
for j in list_p:
ent_t += -j * math.log2(j)
#print(ent_t)
ent += p_z[i]* ent_t
return ent
X=[0,1,1,0,1,0,1,0,0,1,0,0]
Y=[0,1,1,0,0,0,1,0,0,1,1,0]
Z=[1,0,0,1,1,0,0,1,1,0,0,1]
a=drv.entropy_conditional(X,Z)
##print(a)
b=drv.entropy_conditional(Y,Z)
c = entropy(X, Y, Z)
p=a+b-c
print(p)
0.15834454415751043
Based on the definitions of conditional entropy, calculating in bits (i.e. base 2) I obtain H(X|Z)=0.784159, H(Y|Z)=0.325011, H(X,Y|Z) = 0.950826. Based on the definition of conditional mutual information you provide above, I obtain I(X;Y|Z)=H(X|Z)+H(Y|Z)-H(X,Y|Z)= 0.158344. Noting that pyitlib uses base 2 by default, drv.information_mutual_conditional(X,Y,Z) appears to be computing the correct result.
Note that your use of drv.entropy_conditional(X,Y,Z) in your first example to compute conditional entropy is incorrect, you can however use drv.entropy_conditional(XY,Z), where XY is a 1D array representing the joint observations about X and Y, for example XY = [2*xy[0] + xy[1] for xy in zip(X,Y)].
I am using scipy.optimize.nnls to compute non-negative least square fit with a coefficients sum to 1 :
#! /usr/bin/env python3
import numpy as np
import scipy.optimize as soptimize
if __name__ == '__main__':
C = np.array([[112.771820, 174.429720, 312.175750, 97.348620],
[112.857010, 174.208300, 312.185270, 93.467580],
[114.897210, 175.661850, 314.275100, 99.015480]
]);
d = np.array([[112.7718, 174.4297, 312.1758, 97.3486]]);
for line in d:
ret , _= soptimize.nnls(C.T, line)
print(ret)
And I get :
[9.99992794e-01 7.27824399e-06 0.00000000e+00]
Is it possible to set some result column to a specific value for the nnls algorithm and to force it to generate the remaining result columns ?
For instance if my result is : [0.3 0.3 0.4]. I want to force the first column to 0.9 and the nnls should generate the other columns, like this :
[0.9 0.06 0.04]
Any help will be appreciated !
I was trying to replicate this paper (which is about to the Heston Model) using QuantLib tool (python 3.5).
Following the Python Quantlib Cookbook I was able to setup the parameters of page 12 from the paper. Quantlib´s result is 0.0497495 which is slightly different from paper´s result (0.049521147).
So, my question is what is the cause of this difference? Is it possible that day account have something to do here?
Code following Cookbook with papers´s parameters:
from QuantLib import *
import numpy as np
import math
#parameters
strike_price = 2
payoff = PlainVanillaPayoff(Option.Call, strike_price)
#option data
maturity_date = Date(16, 4, 2028)
spot_price = 1
strike_price = 2
volatility = 0.16 # the historical vols for a year
dividend_rate = 0.000
option_type = Option.Call
risk_free_rate = 0.000
day_count = Actual365Fixed()
calendar = UnitedStates()
calculation_date = Date(16, 4, 2018)
Settings.instance().evaluationDate = calculation_date
# construct the European Option
payoff = PlainVanillaPayoff(option_type, strike_price)
exercise = EuropeanExercise(maturity_date)
european_option = VanillaOption(payoff, exercise)
# construct the Heston process
v0 = 0.16 #volatility*volatility # spot variance
kappa = 1
theta = 0.16
sigma = 2
rho = -0.8
spot_handle = QuoteHandle(SimpleQuote(spot_price))
flat_ts = YieldTermStructureHandle(FlatForward(calculation_date,
risk_free_rate, day_count))
dividend_yield = YieldTermStructureHandle(FlatForward(calculation_date,
dividend_rate, day_count))
heston_process = HestonProcess(flat_ts, dividend_yield,spot_handle,
v0, kappa,theta, sigma, rho)
engine = AnalyticHestonEngine(HestonModel(heston_process),0.01, 1000)
european_option.setPricingEngine(engine)
h_price = european_option.NPV()
print("The Heston model price is",h_price)
PD: I used QuantLib engine to double check my code (I must say I have no experience using QuantLib). I get the paper´s result using my code.
The difference is partly, but not entirely due to the day counter.
If you use day_count = SimpleDayCounter(), leaving all else the same the QuantLib result becomes 0.04964543.
The rest of the difference is because you set the "relative tolerance" in the AnalyticHestonEngine to 0.01. If you set it to a smaller value, e.g. to 0.001, you get an answer of 0.04951948, which is consistent with the answer obtained in the paper of 0.0495.