How to reuse a Theano function with different shared variables without rebuilding the graph?

I have a Theano function that is called several times, each time with different shared variables. The way it is implemented now, the Theano function gets redefined every time it is run. I assume this makes the whole program slow, because every time the Theano function gets defined, the graph is rebuilt.
def sumprod_shared(T_shared_array1, T_shared_array2):
    f = theano.function([], (T_shared_array1 * T_shared_array2).sum(axis=0))
    return f()

for factor in range(10):
    m1 = theano.shared(factor * array([[1, 2, 4], [5, 6, 7]]))
    m2 = theano.shared(factor * array([[1, 2, 4], [5, 6, 7]]))
    print sumprod_shared(m1, m2)
For non-shared (normal) variables I can define the function once and then call it with different variables without redefining it.
def sumprod_init():
    T_matrix1 = T.lmatrix('T_matrix1')
    T_matrix2 = T.lmatrix('T_matrix2')
    return theano.function([T_matrix1, T_matrix2], (T_matrix1 * T_matrix2).sum(axis=0))

sumprod = sumprod_init()
for factor in range(10):
    np_array1 = factor * array([[1, 2, 4], [5, 6, 7]])
    np_array2 = factor * array([[1, 2, 4], [5, 6, 7]])
    print sumprod(np_array1, np_array2)
Is this possible also for shared variables?

You can use the givens keyword in theano.function for that. Basically, you do the following.
m1 = theano.shared(name='m1', value=np.zeros((3, 2)))
m2 = theano.shared(name='m2', value=np.zeros((3, 2)))
x1 = theano.tensor.dmatrix('x1')
x2 = theano.tensor.dmatrix('x2')
y = (x1 * x2).sum(axis=0)
f = theano.function([], y, givens=[(x1, m1), (x2, m2)], on_unused_input='ignore')
Then, to loop through values, you just set the shared variables to the values you'd like. You have to set on_unused_input to 'ignore' to use functions with no arguments in Theano, by the way. Like this:
array1 = array([[1, 2, 3], [4, 5, 6]])
array2 = array([[2, 4, 6], [8, 10, 12]])
for i in range(10):
    m1.set_value(i * array1)
    m2.set_value(i * array2)
    print f()
It should work, at least that's how I've been working around it.

Currently it is not easily possible to reuse a Theano function with different shared variables.
But you have alternatives:
Is it really a bottleneck? In the example it is, but I suppose that is a simplified case. The only way to know is to profile it.
Compile one Theano function with the first shared variables. Then you can call get_value/set_value on those shared variables before calling the Theano function. This way, you won't need to recompile the Theano function.
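A minimal sketch of that pattern (names and values are illustrative, reusing the arrays from the question):
import numpy as np
import theano

# Compile once against the initial shared variables...
m1 = theano.shared(np.zeros((2, 3)))
m2 = theano.shared(np.zeros((2, 3)))
f = theano.function([], (m1 * m2).sum(axis=0))

# ...then swap the data in place; the function is never recompiled.
for factor in range(10):
    m1.set_value(factor * np.array([[1., 2., 4.], [5., 6., 7.]]))
    m2.set_value(factor * np.array([[1., 2., 4.], [5., 6., 7.]]))
    print f()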

Related

Updated list does not get used in Python

import math
array = [16, 5, 3, 4, 11, 9, 13]
for x in array[0:len(array) - 1]:
    key = x
    index = array.index(x)
    posj = index
    for y in array[index + 1:len(array)]:
        if y < key:
            key = y
            posj = array.index(y)
    if index != posj:
        hold = array[index]
        array[index] = key
        array[posj] = hold
print(array)
I'm trying to implement insertion sort.
After using the debugger, it appears that in every loop iteration the code is using the original array [16, 5, 3, 4, 11, 9, 13] instead of the updated array that results from the previous iteration.
How can I make x be the updated element at the given index?
Instead of
for x in array[0:len(array)-1]:
try
for x in array:
Output
[3, 4, 5, 9, 11, 13, 16]
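The reason this matters: slicing a list creates a new list, so for x in array[0:len(array)-1] iterates over a snapshot of the original elements and never sees the swaps made inside the loop. A small demonstration of that copy behavior:
a = [3, 1, 2]
snapshot = a[0:len(a) - 1]  # a copy of [3, 1]
a[0] = 99
print(snapshot)  # [3, 1] -- the copy is unchanged
print(a)         # [99, 1, 2]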

Why is taking a slice of a list which is assigned to another list not changing the original?

I have a class that is a representation of a mathematical tensor. The tensor in the class, is stored as a single list, not lists inside another list. That means [[1, 2, 3], [4, 5, 6]] would be stored as [1, 2, 3, 4, 5, 6].
I've made a __setitem__() function and a function to handle taking slices of this tensor while it's in single list format. For example slice(1, None, None) would become slice(3, None, None) for the list mentioned above. However when I assign this slice a new value, the original tensor isn't updated.
Here is what the simplified code looks like:
class Tensor:
    def __init__(self, tensor):
        self.tensor = tensor  # Here I would flatten it, but for now imagine it's already flattened.

    def __setitem__(self, slices, value):
        slices = [slices]
        temp_tensor = self.tensor  # any changes to temp_tensor should also change self.tensor.
        for s in slices:  # Here I would call self.slices_to_index(), but this is to keep the code simple.
            temp_tensor = temp_tensor[s]
        temp_tensor = value  # In my mind, this should have also changed self.tensor, but it hasn't.
Maybe I'm just being stupid and can't see why this isn't working. Maybe my actual question isn't just 'why doesn't this work?' but also 'is there a better way to do this?'. Thanks for any help you can give me.
NOTES:
Each 'dimension' of the list must have the same shape, so [[1, 2, 3], [4, 5]] isn't allowed.
This code is massively simplified, as there are many other helper functions and things like that.
In __init__() I would flatten the list, but as I just said, to keep things simple I left that out, along with self.slice_to_index().
You should not think of Python variables as you would in C++ or Java. Think of them as labels you place on values. Check this example:
>>> l = []
>>> l.append
<built-in method append of list object at 0x7fbb0d40cf88>
>>> l.append(10)
>>> l
[10]
>>> ll = l
>>> ll.append(10)
>>> l
[10, 10]
>>> ll
[10, 10]
>>> ll = ["foo"]
>>> l
[10, 10]
As you can see, the ll variable first points to the same list as l, but later we make it point to another list. Modifying ll after that won't modify the original list pointed to by l.
So, in your case if you want self.tensor to point to a new value, just do it:
class Tensor:
    def __init__(self, tensor):
        self.tensor = tensor  # Here I would flatten it, but for now imagine it's already flattened.

    def __setitem__(self, slices, value):
        slices = [slices]
        temp_tensor = self.tensor  # any changes to the list pointed to by temp_tensor will be reflected in self.tensor, since it is the same list
        for s in slices:
            temp_tensor = temp_tensor[s]
        self.tensor = value
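If instead you want the change to be visible through the original list object (so every other reference to it also sees the update), mutate the list in place rather than rebinding the name; slice assignment is one way to do that. A minimal sketch with illustrative values:
t = Tensor([1, 2, 3, 4, 5, 6])
t.tensor[3:] = [40, 50, 60]  # slice assignment mutates the same list object in place
print(t.tensor)  # [1, 2, 3, 40, 50, 60]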

Tensorflow map_fn Out of Memory Issues

I am having issues with my code running out of memory on large data sets. I attempted to chunk the data to feed it into the calculation graph, but I eventually get an out-of-memory error. Would setting it up to use the feed_dict functionality get around this problem? (A sketch of that pattern appears after the code below.)
My code is set up like the following, with a nested map_fn call due to the result of the tf_itertools_product_2D_nest function.
tf_itertools_product_2D_nest function is from Cartesian Product in Tensorflow
I also tried a variation where I made a list of tensor-lists which was significantly slower than doing it purely in tensorflow so I'd prefer to avoid that method.
import tensorflow as tf
import numpy as np

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.9
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
tensorboard_log_dir = "../log/"

def tf_itertools_product_2D_nest(a, b):  # does not work on nested tensors
    a, b = a[None, :, None], b[:, None, None]
    #print(sess.run(tf.shape(a)))
    #print(sess.run(tf.shape(b)))
    n_feat_dimension_in_common = tf.shape(a)[-1]
    c = tf.concat([a + tf.zeros_like(b), tf.zeros_like(a) + b], axis=2)
    return c

def do_calc(arr_pair):
    arr_1 = arr_pair[0]
    arr_binary = arr_pair[1]
    return tf.reduce_max(tf.cumsum(arr_1 * arr_binary))

def calc_row_wrapper(row):
    return tf.map_fn(do_calc, row)

for i in range(0, 10):
    a = tf.constant(np.random.random((7, 10)) * 10, tf.float64)
    b = tf.constant(np.random.randint(2, size=(3, 10)), tf.float64)
    a_b_itertools_product = tf_itertools_product_2D_nest(a, b)
    '''Creates an array like this:
    [ [[arr_a0,arr_b0], [arr_a1,arr_b0],...],
      [[arr_a0,arr_b1], [arr_a1,arr_b1],...],
      [[arr_a0,arr_b2], [arr_a1,arr_b2],...],
      ...]
    '''
    with tf.summary.FileWriter(tensorboard_log_dir, sess.graph) as writer:
        result_array = sess.run(tf.map_fn(calc_row_wrapper, a_b_itertools_product),
                                options=run_options, run_metadata=run_metadata)
        writer.add_run_metadata(run_metadata, "iteration {}".format(i))
        print(result_array.shape)
        print(result_array)
        print("")
# result_array should be an array with 3 rows (1 for each binary vector in b) and 7 columns (1 for each row in a)
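Regarding the feed_dict question above, here is a minimal sketch of that pattern (placeholder shapes are illustrative; it reuses tf_itertools_product_2D_nest, calc_row_wrapper, and sess from the code above). The graph is built once, and each iteration only feeds new NumPy arrays instead of adding fresh constant and map_fn nodes:
a_ph = tf.placeholder(tf.float64, shape=(7, 10))
b_ph = tf.placeholder(tf.float64, shape=(3, 10))
product_op = tf_itertools_product_2D_nest(a_ph, b_ph)
result_op = tf.map_fn(calc_row_wrapper, product_op)

for i in range(10):
    a_val = np.random.random((7, 10)) * 10
    b_val = np.random.randint(2, size=(3, 10)).astype(np.float64)
    # Only the data changes between iterations; the graph stays fixed.
    result_array = sess.run(result_op, feed_dict={a_ph: a_val, b_ph: b_val})
    print(result_array.shape)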
I can imagine that this is unnecessarily consuming memory due to the extra dimension added. Is there a way to mimic the outcome of the standard itertools.product() function and output one long list of every possible combination of items in the two input iterables? Like the result of:
itertools.product([[1,2],[3,4]],[[5,6],[7,8]])
# [([1, 2], [5, 6]), ([1, 2], [7, 8]), ([3, 4], [5, 6]), ([3, 4], [7, 8])]
That would eliminate the need to call map_fn twice.
When map_fn is called within a loop, as my code shows, will it keep spawning graphs for every iteration? There appears to be a big "map_" node for every iteration cycle in this code's TensorBoard graph.
Tensorboard Default View (not enough reputation yet)
When I select a particular iteration based on the tag in TensorBoard, only the map node corresponding to that iteration is highlighted, with all the others grayed out. Does that mean that only the map node for that cycle is present (and the others, from previous cycles, no longer exist in memory)?
Tensorboard 1 iteration view

Integer Linear Programming with CVXPY in python3

I'm trying to solve an integer linear programming problem using CVXPY, but I'm struggling with some syntax: I cannot figure out how to force the variable I'm solving for to take values of either 0 or 1. I thought that setting it to be boolean in the Variable object was the solution, but for some reason I'm not getting what I want.
I installed the cvxpy library and tried to run it on a small example. The input for my problem is a binary matrix M of size (I, J) that only has values of 0 or 1. The variable I want to solve for is a boolean (binary) vector P of size J. The objective function is to minimize the sum of the values of the vector P (i.e. minimize the number of 1s inside that vector), subject to the constraint that the sum of each row of the matrix M times the vector P is greater than or equal to 1; i.e. the summation over j of M_ij * P_j >= 1, for all i.
I wrote the following code to do that; however, I'm struggling to find what I did wrong in it.
import numpy as np
import cvxpy as cp

M = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]])
variable = cp.Variable(M.shape[1], value=1, boolean=True)
one_vec = np.ones(M.shape[1])
obj = cp.Minimize(sum(np.dot(variable, one_vec)))
constraints = []
for i in range(len(M)):
    constraints.append(np.sum(np.dot(M[i], variable)) >= 1)
problem = cp.Problem(obj, constraints=constraints)
problem.solve()
So, as an answer to this simple example given by the matrix M in my code, the variable vector's value should be [1, 0, 1, 0], since multiplying the vector [1, 0, 1, 0] with the matrix
[[1, 0, 0, 0]
 [1, 0, 0, 0]
 [0, 1, 1, 0]
 [1, 0, 0, 0]
 [0, 0, 1, 1]
 [0, 0, 1, 0]]
would give a value of at least 1 for each row.
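For reference, a quick NumPy check of that claim:
import numpy as np

M = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]])
P = np.array([1, 0, 1, 0])
print(M.dot(P))  # [1 1 1 1 1 1] -- every row is covered at least once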
But if I run the code I have written, I get a float as my answer, so I'm doing something wrong that I cannot figure out. I guess I do not know how to phrase this question programmatically so that the solver would solve it. Any help would be well appreciated. Thanks.
UPDATE! I think I figured it out
I modified the code to this:
import numpy as np
import cvxpy as cp

M = np.array([[1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1], [0, 0, 1, 1]])
selection = cp.Variable(M.shape[1], boolean=True)
ones_vec = np.ones(M.shape[1])
constraints = []
for i in range(len(M)):
    constraints.append(M[i] * selection >= 1)
total_genomes = ones_vec * selection
problem = cp.Problem(cp.Minimize(total_genomes), constraints)
problem.solve()
and now it's working. I used the * operator instead of the NumPy dot product; I think cvxpy has overloaded that operator to perform vector multiplications.
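As a small follow-up sketch, once problem.solve() returns, the solution can be inspected through cvxpy's standard attributes:
print(problem.status)   # e.g. 'optimal'
print(problem.value)    # objective value: the minimal number of 1s in the vector
print(selection.value)  # the 0/1 assignment for each entry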

Using Theano.scan with multidimensional arrays

To speed up my code I am converting a multidimensional sumproduct function from Python to Theano. My Theano code reaches the same result, but only calculates the result for one dimension at a time, so that I have to use a Python for-loop to get the end result. I assume that would make the code slow, because Theano cannot optimize memory usage and transfer (for the gpu) between multiple function calls. Or is this a wrong assumption?
So how can I change the Theano code, so that the sumprod is calculated in one function call?
The original Python function:
def sumprod(a1, a2):
    """Sum the element-wise products of `a1` and `a2`."""
    result = numpy.zeros_like(a1[0])
    for i, j in zip(a1, a2):
        result += i * j
    return result
For the following input
a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])
the output would be [ 26. 40. 65.], that is 1*1 + 5*5, 2*2 + 6*6, and 4*4 + 7*7.
The Theano version of the code:
import theano
import theano.tensor as T
import numpy

a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])
# wanted result: [ 26. 40. 65.]
# that is 1*1 + 5*5, 2*2 + 6*6 and 4*4 + 7*7
Tk = T.iscalar('Tk')
Ta1_shared = theano.shared(numpy.array(a1).T)
Ta2_shared = theano.shared(numpy.array(a2).T)
outputs_info = T.as_tensor_variable(numpy.asarray(0, 'float64'))
Tsumprod_result, updates = theano.scan(
    fn=lambda Ta1_shared, Ta2_shared, prior_value:
        prior_value + Ta1_shared * Ta2_shared,
    outputs_info=outputs_info,
    sequences=[Ta1_shared[Tk], Ta2_shared[Tk]])
Tsumprod_result = Tsumprod_result[-1]
Tsumprod = theano.function([Tk], outputs=Tsumprod_result)
result = numpy.zeros_like(a1[0])
for i in range(len(a1[0])):
    result[i] = Tsumprod(i)
print result
First, there are more people who will answer your questions on the theano mailing list than on Stack Overflow. But I'm here :)
Second, your function isn't a good fit for the GPU. Even if everything were well optimized, transferring the input to the GPU just to multiply and sum the result would take more time than the Python version.
Your Python code is slow; here is a version that should be faster:
def sumprod(a1, a2):
    """Sum the element-wise products of `a1` and `a2`."""
    a1 = numpy.asarray(a1)
    a2 = numpy.asarray(a2)
    result = (a1 * a2).sum(axis=0)
    return result
For the Theano code, here is the equivalent of this faster Python version (no need for scan):
m1 = theano.tensor.matrix()
m2 = theano.tensor.matrix()
f = theano.function([m1, m2], (m1 * m2).sum(axis=0))
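A quick usage sketch with the inputs from the question (the arguments must be float arrays, since theano.tensor.matrix() defaults to the floatX dtype):
import numpy
print f(numpy.array(a1, dtype='float64'), numpy.array(a2, dtype='float64'))
# -> [ 26.  40.  65.]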
The thing to remember from this is that you need to "vectorize" your code. "Vectorize" here is used in the NumPy sense: work with numpy.ndarray and use functions that operate on the full tensor at a time. This is always faster than doing it with a loop (a Python loop or theano scan). Also, Theano optimizes some of those cases by moving the computation outside the scan, but it doesn't always do it.
