Modifying variables used in givens - theano

This is the definition of the train_model function.
train_model = theano.function(
[index],
cost,
updates=updates,
givens={
x: train_set_x[index * batch_size: (index + 1) * batch_size],
y: train_set_y[index * batch_size: (index + 1) * batch_size]
}
)
Say I modify training_set_x after this definition, which value will reflect in givens x? The old value or the new modified value?
In other words, when when a theano function is compiled, are the variables used in the expressions copied or just a reference is given?
Please do correct me if my question is wrong. Thank you.

My experiments have yielded that such variables are passed by value while compiling the theano function ie any change in the value of the variable after the function is compiled doesn't have any effect on the behaviour of the function.
Say I define my function like this
x=T.lscalar('x')
y=T.lscalar('y')
z=x+y
i=T.lscalar('i')
a=theano.shared(np.asarray([1,2,3,4]))
f=function([i],z,givens={x:a[i],y:a[i]})
Then f's behaviour is as follows
>>> f(0)
array(2)
>>> f(1)
array(4)
>>> f(2)
array(6)
>>> f(3)
array(8)
Now if I modify a as follows
a=theano.shared(np.asarray([0,0,0,0]))
Then the behaviour of the function remains the same.

Related

Reshaping with asterisk operator `*` in PyTorch

While reading this annotated implementation of Diffusion Probabilistic models in PyTorch, I got stuck at understanding this function
def extract(a, t, x_shape):
batch_size = t.shape[0]
out = a.gather(-1, t.cpu())
return out.reshape(batch_size, *((1,) * (len(x_shape) - 1))).to(t.device)
What it's not clear it's the final return statement, what does the *((1,) mean into reshape function? Does that asterisk correspond to the unpacking operator? And if yes, how is it used here?
(1,) * (len(x_shape) - 1))
means to create a tuple with length len(x_shape) - 1 filled with just 1s
*(...)
means to spread the tuple into arguments
So it ends up being (say len(x_shape) == 5)
return out.reshape(batch_size, 1, 1, 1, 1).to(t.device)

How to define a function in python containing a definite integral whose free parameter is running between two fixed values?

I would like to define a python function that has three parameters: C, psi, and m but it actually internally has an integral that is running based on another variable called eta.
def func(x: np.float64, C: np.float64, psi: np.float64, m: np.float64) -> np.float64:
return C * (1-psi) * quad(np.power(1-eta, psi) * np.power(m/x, C/eta+1)/eta, 0, np.inf, args=(C, psi, m, x))[0]
Since I would like to fit this function to a dataset, I cannot choose eta to be a free parameter for fitting and hence it should not appear as a parameter in the function signature. But, python doesn't know the internal eta as a defined variable.
File "c:\Users\username\Desktop\Project\my_script.py", line 277, in func
np.power(1 - eta, psi) * np.power(m / x, C/eta + 1) / eta, 0, np.inf,
NameError: name 'eta' is not defined
How can I let the function know that it is not expecting to be fed with a parameter but the inside integral takes care of that through integration.
---Thanks,

How to evaluate a function for a certain output list?

I am trying to find the output of a langrange interpolation function and predict the interpolated values from the equation post fitting the curve.
I have got the code for the function from a website. However I presume it just stores the equation in a format whereas I expect a result of values for the list supplied.
def langrange_polynomial(X, Y):
def L(i):
return lambda x: np.prod([(x-X[j])/(X[i]-X[j]) for j in range(len(X)) if i != j]) * Y[i]
Sx = [L(i) for i in range(len(X))] # summands
return lambda x: np.sum([s(x) for s in Sx])
Expectation is for the given function evaluate or predict the function at a certain value or list i.e.if I pass a list of numbers [2,3,4,5], i should get the corresponding output value f(x) where f(x) is my lagrange equation
As you said, the langrange_polynomial function returns a special kind of function in python, called a "lambda function". You can read about lambdas here: https://www.w3schools.com/python/python_lambda.asp
To use your code just assign the output of the langrange_polynomial to a variable (which will be of type function) and then use that variable as a usual function luke such:
f = langrange_polynomial(np.arange(10),np.arange(10))
f([3,60,30])
# output
3.115663712769497e+21

Theano multiplying by zero

Can anybody explain to me what is the meaning behind these two lines of code from here: https://github.com/Newmu/Theano-Tutorials/blob/master/4_modern_net.py
acc = theano.shared(p.get_value() * 0.)
acc_new = rho * acc + (1 - rho) * g ** 2
Is it a mistake? Why do we instantiate acc to zero and then multiply it by rho in next line? It looks like it will not achieve anything this way and remain zero. Will there be any difference if we replace "rho * acc" by just "acc"?
The full function is given below:
def RMSprop(cost, params, lr=0.001, rho=0.9, epsilon=1e-6):
grads = T.grad(cost=cost, wrt=params)
updates = []
for p, g in zip(params, grads):
acc = theano.shared(p.get_value() * 0.)
acc_new = rho * acc + (1 - rho) * g ** 2
gradient_scaling = T.sqrt(acc_new + epsilon)
g = g / gradient_scaling
updates.append((acc, acc_new))
updates.append((p, p - lr * g))
return updates
This is just a way to tell Theano "create a shared variable and initialize its value to be zero in the same shape as p."
This RMSprop method is a symbolic method. It does not actually compute the RmsProp parameter updates, it only tells Theano how parameter updates should be computed when the eventual Theano function is executed.
If you look further down the tutorial code you linked to you'll see the symbolic execution graph for the parameter updates are constructed by RMSprop via a call on line 67. These updates are then compiled into a Theano function called train in Python on line 69 and the train function is executed many times on line 74 within the for loops of lines 72 and 73. The Python function RMSprop will be called only once, irrespective of how many times the train function is called within the for loops on lines 72 and 73.
Within RMSprop, we are telling Theano that, for each parameter p, we need a new Theano variable whose initial value has the same shape as p and is 0. throughout. We then go on to tell Theano how it should update both this new variable (unnamed as far as Theano is concerned but named acc in Python) and how to update the parameter p itself. These commands do not alter either p or acc, they just tell Theano how p and acc should be updated later, once the function has been compiled (line 69) each time it is executed (line 74).
The function executions on line 74 will not call the RMSprop Python function, they execute a compiled version of RMSprop. There will be no initialization inside the compiled version because that already happened in the Python version of RMSprop. Each train execution of the line acc_new = rho * acc + (1 - rho) * g ** 2 will use the current value of acc not its initial value.

matrices are not aligned Error: Python SciPy fmin_bfgs

Problem Synopsis:
When attempting to use the scipy.optimize.fmin_bfgs minimization (optimization) function, the function throws a
derphi0 = np.dot(gfk, pk)
ValueError: matrices are not aligned
error. According to my error checking this occurs at the very end of the first iteration through fmin_bfgs--just before any values are returned or any calls to callback.
Configuration:
Windows Vista
Python 3.2.2
SciPy 0.10
IDE = Eclipse with PyDev
Detailed Description:
I am using the scipy.optimize.fmin_bfgs to minimize the cost of a simple logistic regression implementation (converting from Octave to Python/SciPy). Basically, the cost function is named cost_arr function and the gradient descent is in gradient_descent_arr function.
I have manually tested and fully verified that *cost_arr* and *gradient_descent_arr* work properly and return all values properly. I also tested to verify that the proper parameters are passed to the *fmin_bfgs* function. Nevertheless, when run, I get the ValueError: matrices are not aligned. According to the source review, the exact error occurs in the
def line_search_wolfe1
function in # Minpack's Wolfe line and scalar searches as supplied by the scipy packages.
Notably, if I use scipy.optimize.fmin instead, the fmin function runs to completion.
Exact Error:
File
"D:\Users\Shannon\Programming\Eclipse\workspace\SBML\sbml\LogisticRegression.py",
line 395, in fminunc_opt
optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, callback=self.callback_fmin_bfgs, retall=True)
File
"C:\Python32x32\lib\site-packages\scipy\optimize\optimize.py", line
533, in fmin_bfgs old_fval,old_old_fval)
File "C:\Python32x32\lib\site-packages\scipy\optimize\linesearch.py", line
76, in line_search_wolfe1
derphi0 = np.dot(gfk, pk)
ValueError: matrices are not aligned
I call the optimization function with:
optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, callback=self.callback_fmin_bfgs, retall=True)
I have spent a few days trying to fix this and cannot seem to determine what is causing the matrices are not aligned error.
ADDENDUM: 2012-01-08
I worked with this a lot more and seem to have narrowed the issues (but am baffled on how to fix them). First, fmin (using just fmin) works using these functions--cost, gradient. Second, the cost and the gradient functions both accurately return expected values when tested in a single iteration in a manual implementation (NOT using fmin_bfgs). Third, I added error code to optimize.linsearch and the error seems to be thrown at def line_search_wolfe1 in line: derphi0 = np.dot(gfk, pk).
Here, according to my tests, scipy.optimize.optimize pk = [[ 12.00921659]
[ 11.26284221]]pk type = and scipy.optimize.optimizegfk = [[-12.00921659] [-11.26284221]]gfk type =
Note: according to my tests, the error is thrown on the very first iteration through fmin_bfgs (i.e., fmin_bfgs never even completes a single iteration or update).
I appreciate ANY guidance or insights.
My Code Below (logging, documentation removed):
Assume theta = 2x1 ndarray (Actual: theta Info Size=(2, 1) Type = )
Assume X = 100x2 ndarray (Actual: X Info Size=(2, 100) Type = )
Assume y = 100x1 ndarray (Actual: y Info Size=(100, 1) Type = )
def cost_arr(self, theta, X, y):
theta = scipy.resize(theta,(2,1))
m = scipy.shape(X)
m = 1 / m[1] # Use m[1] because this is the length of X
logging.info(__name__ + "cost_arr reports m = " + str(m))
z = scipy.dot(theta.T, X) # Must transpose the vector theta
hypthetax = self.sigmoid(z)
yones = scipy.ones(scipy.shape(y))
hypthetaxones = scipy.ones(scipy.shape(hypthetax))
costright = scipy.dot((yones - y).T, ((scipy.log(hypthetaxones - hypthetax)).T))
costleft = scipy.dot((-1 * y).T, ((scipy.log(hypthetax)).T))
def gradient_descent_arr(self, theta, X, y):
theta = scipy.resize(theta,(2,1))
m = scipy.shape(X)
m = 1 / m[1] # Use m[1] because this is the length of X
x = scipy.dot(theta.T, X) # Must transpose the vector theta
sig = self.sigmoid(x)
sig = sig.T - y
grad = scipy.dot(X,sig)
grad = m * grad
return grad
def fminunc_opt_bfgs(self, initialtheta, X, y, maxnumit):
myargs= (X,y)
optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, retall=True, full_output=True)
return optcost
In case anyone else encounters this problem ....
1) ERROR 1: As noted in the comments, I incorrectly returned the value from my gradient as a multidimensional array (m,n) or (m,1). fmin_bfgs seems to require a 1d array output from the gradient (that is, you must return a (m,) array and NOT a (m,1) array. Use scipy.shape(myarray) to check the dimensions if you are unsure of the return value.
The fix involved adding:
grad = numpy.ndarray.flatten(grad)
just before returning the gradient from your gradient function. This "flattens" the array from (m,1) to (m,). fmin_bfgs can take this as input.
2) ERROR 2: Remember, the fmin_bfgs seems to work with NONlinear functions. In my case, the sample that I was initially working with was a LINEAR function. This appears to explain some of the anomalous results even after the flatten fix mentioned above. For LINEAR functions, fmin, rather than fmin_bfgs, may work better.
QED
As of current scipy version you need not pass fprime argument. It will compute the gradient for you without any issues. You can also use 'minimize' fn and pass method as 'bfgs' instead without providing gradient as argument.

Resources