I am having difficulty reading the open-source MLlib code for SGD with L2 regularization.
The code is:
class SquaredL2Updater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    // add up both updates from the gradient of the loss (= step) as well as
    // the gradient of the regularizer (= regParam * weightsOld)
    // w' = w - thisIterStepSize * (gradient + regParam * w)
    // w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
    val thisIterStepSize = stepSize / math.sqrt(iter)
    val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector
    brzWeights :*= (1.0 - thisIterStepSize * regParam)
    brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
    val norm = brzNorm(brzWeights, 2.0)
    (Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm)
  }
}
The part I am having trouble with is
brzWeights :*= (1.0 - thisIterStepSize * regParam)
The Breeze library has documentation that explains the :*= operator:
/** Mutates this by element-wise multiplication of b into this. */
final def :*=[TT >: This, B](b: B)(implicit op: OpMulScalar.InPlaceImpl2[TT, B]): This = {
op(repr, b)
repr
}
It looks like it's just multiplication of a vector by a scalar.
The formula I found for gradient in case of L2 regularization is
How does the code represent this gradient in this update? Can someone help, please?
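OK, I figured it out. The updater equation (as given in the code comments) is
w' = w - thisIterStepSize * (gradient + regParam * w)
Rearranging terms gives
w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
and recognizing that the last term contains just the gradient of the loss,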
This is equivalent to the code which has
brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
breaking that out
brzWeights = brzWeights + -thisIterStepSize * gradient.toBreeze
in the previous line, brzWeights :*= (1.0 - thisIterStepSize * regParam)
which means
brzWeights = brzWeights * (1.0 - thisIterStepSize * regParam)
so, finally
brzWeights = brzWeights * (1.0 - thisIterStepSize * regParam) + (-thisIterStepSize) * gradient.toBreeze
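Now the code and equation match, up to a normalization factor in how the gradient is averaged. The following line, val norm = brzNorm(brzWeights, 2.0), is not that normalization: it computes the L2 norm of the new weights, which is used to return the regularization value 0.5 * regParam * norm * norm.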
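To double check the algebra above, here is a minimal sketch in plain Python (not the MLlib code itself). The function names squared_l2_update and direct_update and the sample numbers are made up for illustration. It applies the update in two steps, as the Scala code does, and compares the result with the single-expression form from the code comment.

```python
import math

def squared_l2_update(weights, gradient, step_size, iteration, reg_param):
    """Two-step update mirroring the Scala code (hypothetical helper)."""
    this_iter_step = step_size / math.sqrt(iteration)
    # Step 1 (brzWeights :*= ...): shrink every weight by (1 - step * regParam).
    shrunk = [w * (1.0 - this_iter_step * reg_param) for w in weights]
    # Step 2 (brzAxpy): add -step * gradient to the shrunk weights.
    new_weights = [w - this_iter_step * g for w, g in zip(shrunk, gradient)]
    # Regularization value returned with the weights: 0.5 * regParam * ||w'||^2.
    reg_value = 0.5 * reg_param * sum(w * w for w in new_weights)
    return new_weights, reg_value

def direct_update(weights, gradient, step_size, iteration, reg_param):
    """w' = w - step * (gradient + regParam * w) as a single expression."""
    this_iter_step = step_size / math.sqrt(iteration)
    return [w - this_iter_step * (g + reg_param * w)
            for w, g in zip(weights, gradient)]

# Arbitrary sample values, chosen only for the comparison.
w = [1.0, -2.0, 0.5]
g = [0.3, 0.1, -0.4]
two_step, reg_value = squared_l2_update(w, g, step_size=0.1, iteration=4, reg_param=0.01)
one_step = direct_update(w, g, step_size=0.1, iteration=4, reg_param=0.01)
print(two_step)
print(one_step)
```

Both lists should agree up to floating-point rounding, which is the point of the rearrangement: scaling first and then adding the gradient step is the same update.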
I want to get a piecewise function like this for tensors in PyTorch, but I don't know how to define it. I tried a very naive method, but it does not seem to work in my code.
def trapezoid(self, X):
    Y = torch.zeros(X.shape)
    Y[X % (2 * pi) < (0.5 * pi)] = (X[X % (2 * pi) < (0.5 * pi)] % (2 * pi)) * 2 / pi
    Y[(X % (2 * pi) >= (0.5 * pi)) & (X % (2 * pi) < 1.5 * pi)] = 1.0
    Y[X % (2 * pi) >= (1.5 * pi)] = (X[X % (2 * pi) >= (1.5 * pi)] % (2 * pi)) * (-2 / pi) + 4
    return Y
Could you help me find out how to design the function trapezoid, so that for a tensor X, I can get the result directly using trapezoid(X)?
Since your function has period 2π we can focus on [0,2π]. Since it's piecewise linear, it's possible to express it as a mini ReLU network on [0,2π] given by:
trapezoid(x) = 1 - relu(x-1.5π)/0.5π - relu(0.5π-x)/0.5π
Thus, we can code the whole function in Pytorch like so:
import torch
import torch.nn.functional as F
from torch import tensor
from math import pi
def trapezoid(X):
    # Left corner position, right corner position, height
    a, b, h = tensor(0.5*pi), tensor(1.5*pi), tensor(1.0)
    # Take remainder mod 2*pi for periodicity
    X = torch.remainder(X, 2*pi)
    return h - F.relu(X-b)/a - F.relu(a-X)/a
Plotting to double check produces the correct picture:
import matplotlib.pyplot as plt
X = torch.linspace(-10,10,1000)
Y = trapezoid(X)
plt.plot(X,Y)
plt.title('Pytorch Trapezoid Function')
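As a dependency-free sanity check of the ReLU formula, the sketch below evaluates the same expression on plain Python floats and compares a few points with the piecewise definition from the question. The names relu and trapezoid_scalar are hypothetical helpers, not part of PyTorch.

```python
from math import pi

def relu(v):
    """Scalar ReLU: max(v, 0)."""
    return max(v, 0.0)

def trapezoid_scalar(x):
    """Scalar version of 1 - relu(x - 1.5*pi)/(0.5*pi) - relu(0.5*pi - x)/(0.5*pi)."""
    a, b, h = 0.5 * pi, 1.5 * pi, 1.0
    # Python's % returns a value in [0, 2*pi) for a positive modulus,
    # which plays the role of torch.remainder here.
    x = x % (2 * pi)
    return h - relu(x - b) / a - relu(a - x) / a

# Rising edge, plateau, falling edge, and a negative input.
for sample in (0.0, 0.25 * pi, pi, 1.75 * pi, -0.25 * pi):
    print(sample, trapezoid_scalar(sample))
```

The rising edge matches x * 2 / pi, the plateau is 1, and the falling edge matches x * (-2 / pi) + 4, as in the original masked assignments.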
I have a dataset from Kaggle of 45,253 rows and a single column for temperature in Kelvin for the city of Detroit. Its mean = 282.97, std = 11, min = 243.48, max = 308.05.
This is the result when plotted as a histogram of 100 bins with density=True:
I am expected to write the following two functions and see which one approximates the histogram most closely:
Like this one here using scipy.stats.norm.pdf:
I generated the above image using:
x = np.linspace(dataset.Detroit.min(), dataset.Detroit.max(), 1001)
P_norm = norm.pdf(x, dataset.Detroit.mean(), dataset.Detroit.std())
plot_pdf_single(x, P_norm)
However, whenever I try to implement any of the two approximation functions all of my values for P_norm result in 0s or infs.
This is what I tried:
P_norm = [(1.0/(np.sqrt(2.0*pi*(std*std))))*np.exp(((-x_i-mu)*(-x_i-mu))/(2.0*(std*std))) for x_i in x]
I also broke it down into parts for a single x_i:
part1 = ((-x[0] - mu)*(-x[0] - mu)) / (2.0*(std * std))
part2 = np.exp(part1)
part3 = 1.0 / (np.sqrt(2.0 * pi * (std*std)))
total = part3*part2
I got the following values:
1145.3913234604413
inf
0.036267480036493875
inf
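The exponent in your attempt has the wrong sign: (-x_i-mu)*(-x_i-mu) equals (x_i+mu)^2, a large positive number (1145.39 in your breakdown), so np.exp overflows to inf. The minus sign belongs in front of the whole squared term (x_i-mu)^2. Since both of the approximations use the same formula, define it once:
def pdf_approximation(x_i, mu, std):
    return (1.0 / (np.sqrt(2.0 * pi * (std*std)))) * np.exp((-(x_i-mu)*(x_i-mu)) / (2.0 * (std*std)))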
The code for the first approximation is:
mu = 283
std = 11
P_norm = np.array([pdf_approximation(x_i, mu, std) for x_i in x])
plot_pdf_single(x, P_norm)
The code for the second approximation is:
mu1 = 276
std1 = 6
mu2 = 293
std2 = 6.5
P_norm = np.array([(pdf_approximation(x_i, mu1, std1) * 0.5) + (pdf_approximation(x_i, mu2, std2) * 0.5) for x_i in x])
plot_pdf_single(x, P_norm)
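One quick way to check that both approximations are valid densities is to confirm that each integrates to roughly 1. The sketch below uses only the standard library. The helper names normal_pdf, mixture_pdf and trapezoid_area, and the integration range 200 to 370 K, are assumptions made for illustration; the parameters are the ones used above.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, std):
    """Normal density with the minus sign applied to the whole squared term."""
    return exp(-((x - mu) ** 2) / (2.0 * std ** 2)) / sqrt(2.0 * pi * std ** 2)

def mixture_pdf(x, mu1, std1, mu2, std2, weight=0.5):
    """Weighted sum of two normal densities; the weights add up to 1."""
    return weight * normal_pdf(x, mu1, std1) + (1.0 - weight) * normal_pdf(x, mu2, std2)

def trapezoid_area(f, lo, hi, n=2000):
    """Composite trapezoid rule on n equal sub-intervals."""
    step = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * step)
    return total * step

# Assumed range: wide enough to cover both densities almost entirely.
area_single = trapezoid_area(lambda x: normal_pdf(x, 283, 11), 200.0, 370.0)
area_mixture = trapezoid_area(lambda x: mixture_pdf(x, 276, 6, 293, 6.5), 200.0, 370.0)
print(area_single, area_mixture)
```

The peak value normal_pdf(283, 283, 11) also reproduces the 0.0362... prefactor from the question's breakdown, which is a useful cross-check that only the exponent was wrong.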
I am trying to build a model that requires a definite integral. The code is shown below:
import tensorflow as tf
from numpy import pi, inf
from tensorflow import log, sqrt, exp, pow
from scipy.integrate import quad # for integration
def risk_neutral_pdf(phi, a, S, K, r, sigma, Mt, p_dict):
    phii = tf.complex(0., phi)
    A = tf.cast(0., tf.complex64)
    B = tf.cast(0., tf.complex64)
    p_dict['gamma'] = p_dict['gamma'] + p_dict['lamda'] + .5
    p_dict['lamda'] = -.5
    for t in range(Mt-1, -1, -1):
        temp = 1. - 2. * p_dict['alpha'] * B
        A = A + (phii + a) * r + p_dict['omega'] * B - .5 * log(temp)
        B = B * p_dict['beta'] + (phii + a) * (p_dict['lamda'] + p_dict['gamma']) - \
            .5 * p_dict['gamma']**2. + (.5*((phii + a) - p_dict['gamma'])**2. / temp)
    return tf.real(S**a * (S/K)**phii * exp(A + B * sigma**2.) / phii)
p_dict={'lamda': 0.205, 'omega': 5.02e-6, 'beta': 0.589, 'gamma': 421.39, 'alpha': 1.32e-6}
S = 100.
K = 100.
r = 0.
Mt = 0
sq_ht = sqrt(.15**2/252.)
sigma = sq_ht
P1 = tf.py_func(lambda z: quad(risk_neutral_pdf, z, inf, args=(1., S, K, r, sigma, Mt, p_dict))[0],
                [0.], tf.float64)
with tf.Session() as sess:
    res = sess.run(P1)
    print(res)
Running it raises "InvalidArgumentError (see above for traceback): ValueError: Tensor("pow:0", shape=(), dtype=float32) must be from the same graph as Tensor("Cast_2:0", shape=(), dtype=complex64)." No matter how I change the code or follow the solutions in "ValueError: Tensor A must be from the same graph as Tensor B", it does not work. I am wondering whether I did something wrong by putting tf.reset_default_graph() at the top, or whether the code needs other changes.
Thank you. (TensorFlow version: 1.6.0)
Update:
I found that sigma is square-rooted before being passed into risk_neutral_pdf and then squared again in the return statement, which is unnecessary. So I changed the return statement to return tf.real(S**a * (S/K)**phii * exp(A + B * sigma) / phii) and sq_ht to .15**2/252. (no square root). The error then changes to "TypeError: a float is required", which I think is caused by mixing quad with Tensors. Any ideas on how to solve this?
Many thanks.
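One possible direction, offered only as a sketch and not as a verified fix: scipy's quad calls the integrand with a plain Python float and expects a plain float back, so the integrand could be written with the standard cmath module instead of TensorFlow ops. The function name risk_neutral_integrand is hypothetical, the recursion is copied from the code above (using the original sigma**2 form), and the financial correctness of the model itself is not checked here. The sketch also keeps gamma and lamda in local variables, because the original function overwrites p_dict on every call and quad calls the integrand many times.

```python
import cmath
from math import sqrt

def risk_neutral_integrand(phi, a, S, K, r, sigma, Mt, params):
    """Plain-Python integrand (hypothetical rewrite); returns a float."""
    phii = complex(0.0, phi)
    A = 0j
    B = 0j
    # Local copies, so repeated calls do not modify the caller's dictionary.
    gamma = params['gamma'] + params['lamda'] + 0.5
    lamda = -0.5
    for _ in range(Mt - 1, -1, -1):
        temp = 1.0 - 2.0 * params['alpha'] * B
        A = A + (phii + a) * r + params['omega'] * B - 0.5 * cmath.log(temp)
        B = (B * params['beta'] + (phii + a) * (lamda + gamma)
             - 0.5 * gamma ** 2 + 0.5 * ((phii + a) - gamma) ** 2 / temp)
    value = S ** a * (S / K) ** phii * cmath.exp(A + B * sigma ** 2) / phii
    return value.real

params = {'lamda': 0.205, 'omega': 5.02e-6, 'beta': 0.589,
          'gamma': 421.39, 'alpha': 1.32e-6}
sigma = sqrt(0.15 ** 2 / 252.0)
# Single evaluation at phi = 1 with one time step (Mt = 1 is an arbitrary choice).
value = risk_neutral_integrand(1.0, 1.0, 100.0, 100.0, 0.0, sigma, 1, params)
print(value)
```

A function of this shape could then be passed to quad with the same args tuple as in the original call, without tf.py_func or a session.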
I want to put the red polygon in place of the empty one, but it first goes above it before returning to it. Can someone help me with that?
Why does the red polygon move out of place and then return to its specified position?
def Rotating(Rotating_angle, polygon_points): # Drawing the rotated figure
my_points = (re.findall("\(\-?\d*\.?\d*\,\-?\d*\.?\d*\)", polygon_points))
sleep_time = .5
my_new_points = [] # Scale_points
for point in my_points:
new_point = str(point).replace(")", "").replace("(", "").split(",")
# creating a list with all coordinates components
all_coordinates_components = []
all_coordinates_components.append(abs(eval(new_point[0])))
all_coordinates_components.append(abs(eval(new_point[1])))
point = (scale * eval(new_point[0]), scale * eval(new_point[1]))
my_new_points.append(point)
rotated_points = []
for point in my_new_points:
new_point = str(point).replace(")", "").replace("(", "").split(",")
theta = Rotating_angle
X = (eval(new_point[0]) * cos(theta * pi / 180)) - (eval(new_point[1]) * sin(theta * pi / 180))
Y = (eval(new_point[0]) * sin(theta * pi / 180)) + (eval(new_point[1]) * cos(theta * pi / 180))
# length = sqrt((X) ** 2 + (Y) ** 2)
point = (X, Y)
rotated_points.append(point)
# draw steps
time.sleep(sleep_time)
draw_rotation_steps(my_new_points, theta) # draw steps ((((( 3 )))))
# drawing rotated polygon
draw_polygon(rotated_points) # draw rotated polygon ((((( 4 )))))
s = Shape('compound')
poly1 = (my_new_points)
s.addcomponent(poly1, fill="red")
register_shape('myshape', s)
shape('myshape')
polygon = Turtle(visible=False)
polygon.setheading(90)
polygon.speed('slowest')
polygon.penup()
polygon.shape('myshape')
polygon.st()
polygon.circle(0, theta)
pen_dot = Turtle(visible=False)
pen_dot.speed('fastest')
for point in rotated_points:
pen_dot.penup()
pen_dot.goto(point)
pen_dot.pendown()
pen_dot.dot(5, 'blue violet')
I can't reproduce the behaviour you describe. But your code is riddled with issues that should be addressed so perhaps fixing those might also fix the positioning issue:
These loops are nested, but they shouldn't be:
my_new_points = []
for point in my_points:
    ...
    rotated_points = []
    for point in my_new_points:
They both should be at the same level.
You shouldn't use eval(). In this situation, use float():
point = (scale * eval(new_point[0]), scale * eval(new_point[1]))
Here, you've already converted the points but you turn them back into strings and reconvert them:
new_point = str(point).replace(")", "").replace("(", "").split(",")
X = (eval(new_point[0]) * cos(theta * pi / 180)) - (eval(new_point[1]) * sin(theta * pi / 180))
when you can simply do:
X = point[0] * cos(theta * pi / 180) - point[1] * sin(theta * pi / 180)
You don't have to put the pen down for the .dot() method to work:
pen_dot.goto(point)
pen_dot.pendown()
pen_dot.dot(5, 'blue violet')
so you can move the penup() out of the loop.
Below is my rework of your example code. I've added just enough code to make it runnable and removed anything that had nothing to do with the problem. See if this gives you any ideas of how to simplify and fix your own code:
import re
from math import sin, cos, pi
from turtle import *

scale = 20

def draw_rotation_steps(points, theta):
    ''' Not supplied by OP '''
    pass

def draw_polygon(rotated_points):
    ''' Simple replacement since not supplied by OP '''
    hideturtle()
    penup()
    goto(rotated_points[-1])
    pendown()
    for point in rotated_points:
        goto(point)

def Rotating(theta, polygon_points):  # Drawing the rotated figure
    my_points = re.findall(r"\(\-?\d*\.?\d*\,\-?\d*\.?\d*\)", polygon_points)
    my_new_points = []  # Scaled points
    for point in my_points:
        X, Y = point.replace(")", "").replace("(", "").split(",")
        point = (scale * float(X), scale * float(Y))
        my_new_points.append(point)
    rotated_points = []
    for point in my_new_points:
        X = point[0] * cos(theta * pi / 180) - point[1] * sin(theta * pi / 180)
        Y = point[0] * sin(theta * pi / 180) + point[1] * cos(theta * pi / 180)
        point = (X, Y)
        rotated_points.append(point)
    # draw steps
    draw_rotation_steps(my_new_points, theta)  # draw steps
    # drawing rotated polygon
    draw_polygon(rotated_points)  # draw rotated polygon
    s = Shape('compound')
    s.addcomponent(my_new_points, fill="red")
    register_shape('myshape', s)
    polygon = Turtle('myshape', visible=False)
    polygon.setheading(90)
    polygon.showturtle()
    polygon.circle(0, theta, steps=100)  # added steps to visually slow it down
    pen_dot = Turtle(visible=False)
    pen_dot.penup()
    for point in rotated_points:
        pen_dot.goto(point)
        pen_dot.dot(5, 'blue violet')

Rotating(180, "(-8,-6) (-6,-3) (-3,-4)")
mainloop()
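The parsing and rotation steps can also be tested without opening a turtle window. The sketch below is a minimal illustration: parse_points, rotate_points and the stricter POINT_PATTERN are hypothetical names and choices (the pattern uses capture groups so that no string replacement or eval() is needed), not part of the code above.

```python
import re
from math import cos, sin, radians

# Assumed pattern: "(x,y)" with optional minus sign and optional decimals.
POINT_PATTERN = re.compile(r"\((-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)\)")

def parse_points(text, scale=1.0):
    """Extract (x, y) pairs with float() instead of eval(), then scale them."""
    return [(scale * float(x), scale * float(y))
            for x, y in POINT_PATTERN.findall(text)]

def rotate_points(points, theta_degrees):
    """Rotate each point counter-clockwise about the origin."""
    t = radians(theta_degrees)
    return [(x * cos(t) - y * sin(t), x * sin(t) + y * cos(t))
            for x, y in points]

scaled = parse_points("(-8,-6) (-6,-3) (-3,-4)", scale=20)
rotated = rotate_points(scaled, 180)
print(scaled)
print(rotated)
```

A rotation by 180 degrees should negate both coordinates of every point (up to rounding), which gives an easy check that the two loops no longer interfere with each other.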
I use the following equation to get a nice color gradient from colorA to colorB, but I have no idea how to do the same for 3 colors, so the gradient goes from colorA to colorB to colorC
colorT = colorA * p + colorB * (1.0 - p); where "p" is a fraction from 0.0 to 1.0
Thanks
Thanks for the formula. But I had to make some modifications to it, as it didn't interpolate between the 3 colors properly (there were jumps in the color change).
Here is the fix for that:
if (p < 0.5)
{
COLORx = (COLORb * p * 2.0) + COLORa * (0.5 - p) * 2.0;
}
else
{
COLORx = COLORc * (p - 0.5) * 2.0 + COLORb * (1.0 - p) * 2.0;
}
Well, for 3 colors, you can just do the same with p = 0.0 to 2.0:
if p <= 1.0
    colorT = colorA * (1.0 - p) + colorB * p;
else
    colorT = colorB * (2.0 - p) + colorC * (p - 1.0);
Or scale it so you can still use p = 0.0 to 1.0:
if p <= 0.5
    colorT = colorA * (0.5 - p) * 2.0 + colorB * p * 2.0;
else
    colorT = colorB * (1.0 - p) * 2.0 + colorC * (p - 0.5) * 2.0;
It might be possible to construct a single expression for that, but the simplest is to use a condition to pick a different expression depending on whether you are in the A - B part or the B - C part of the range:
colorT =
    p < 0.5
    ? colorA * (1.0 - p * 2.0) + colorB * p * 2.0
    : colorB * (1.0 - (p - 0.5) * 2.0) + colorC * (p - 0.5) * 2.0;
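As a concrete illustration of the two-segment idea, here is a minimal Python sketch. It assumes colors are RGB tuples and that p = 0 should give colorA, p = 0.5 colorB and p = 1 colorC; the names lerp and three_color_gradient are made up for the example.

```python
def lerp(color_a, color_b, t):
    """Linear blend: t = 0 gives color_a, t = 1 gives color_b."""
    return tuple(a * (1.0 - t) + b * t for a, b in zip(color_a, color_b))

def three_color_gradient(color_a, color_b, color_c, p):
    """A -> B on the first half of [0, 1], B -> C on the second half."""
    if p < 0.5:
        return lerp(color_a, color_b, p * 2.0)
    return lerp(color_b, color_c, (p - 0.5) * 2.0)

red, green, blue = (255.0, 0.0, 0.0), (0.0, 255.0, 0.0), (0.0, 0.0, 255.0)
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, three_color_gradient(red, green, blue, p))
```

Because both branches return colorB at p = 0.5, there is no jump where the segments meet.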
One possible solution is to interpolate with a Bézier curve:
http://en.wikipedia.org/wiki/B%C3%A9zier_curve
If you look at the special case of the quadratic Bézier curve, you can see a formula that interpolates between 3 points, or colors in your case:
colorT = (1-p)*(1-p)*Color0 + 2*(1-p)*p*Color1 + (p*p)*Color2 , 0<=p<=1
This is a generalization of your linear formula.
EDIT:
On second thought, this method doesn't give the result you want, as the intermediate point is never touched.
To get a smooth curve that touches all of your points (colors), you have to use a spline: http://en.wikipedia.org/wiki/Spline_interpolation
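To see why the quadratic Bézier blend never reaches the middle color, the short sketch below evaluates the standard quadratic Bézier formula on RGB tuples. The function name quadratic_bezier and the sample colors are assumptions for illustration.

```python
def quadratic_bezier(c0, c1, c2, p):
    """(1-p)^2 * c0 + 2*(1-p)*p * c1 + p^2 * c2, applied per channel."""
    q = 1.0 - p
    return tuple(q * q * a + 2.0 * q * p * b + p * p * c
                 for a, b, c in zip(c0, c1, c2))

red, green, blue = (255.0, 0.0, 0.0), (0.0, 255.0, 0.0), (0.0, 0.0, 255.0)
start = quadratic_bezier(red, green, blue, 0.0)
middle = quadratic_bezier(red, green, blue, 0.5)
end = quadratic_bezier(red, green, blue, 1.0)
print(start, middle, end)
```

At p = 0.5 the weights are 0.25, 0.5 and 0.25, so the result is a mix of all three colors rather than pure green, which is the behavior described in the edit above.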
Do you want a three-color gradient with equally sized segments? The approach is exactly the same: once you are done with the first gradient, start a new one in which colorA is the previous colorB and colorB is the new color. Append the results and you're done:
colorA ---- colorB colorB ---- colorC
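The chaining idea extends to any number of colors. A minimal sketch, assuming equally sized segments and RGB tuples (the name multi_stop_gradient is hypothetical):

```python
def multi_stop_gradient(colors, p):
    """Blend through a list of colors; p in [0, 1] spans the whole chain."""
    segments = len(colors) - 1
    # Clamp p, then map it onto [0, segments].
    scaled = min(max(p, 0.0), 1.0) * segments
    # Index of the segment's start color; the last segment also covers p = 1.
    index = min(int(scaled), segments - 1)
    t = scaled - index
    start, end = colors[index], colors[index + 1]
    return tuple(a * (1.0 - t) + b * t for a, b in zip(start, end))

stops = [(255.0, 0.0, 0.0), (0.0, 255.0, 0.0), (0.0, 0.0, 255.0)]
for p in (0.0, 0.25, 0.5, 1.0):
    print(p, multi_stop_gradient(stops, p))
```

Each segment starts at the color where the previous one ended, so appending the segments gives a continuous gradient.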
Good luck!