Multiplying two sub-matrices in NumPy without copying the matrices - python-3.x

I have two matrices A and B and I'm interested in multiplying a sub-matrix of A (defined by some of its rows) and a sub-matrix of B (defined by some of its columns) as shown in the example below:
import numpy as np
# create two matrices
A = np.ones((5, 3))
B = np.ones((3, 5))
# define sub-matrices
rows_idx = [0, 2, 3]
cols_idx = [1, 2, 4]
# print sub-matrices
print(A[rows_idx])
print(B[:, cols_idx])
I'm aware that this can be achieved directly through
A[rows_idx, :] # B[:, cols_idx]
However, the latter comes with the drawback of making copies of the rows of A and columns of B since rows_idx and cols_idx are lists, as mentioned here.
This is less efficient in terms of time and memory.
Also, I refrained from creating a Python function to perform the multiplication as Python loops are slow. Numpy array operations are much faster. Here are some timing comparisons.
Is there way to compute A[rows_idx, :] # B[:, cols_idx] without copying?

Related

Expand the tensor by several dimensions

In PyTorch, given a tensor of size=[3], how to expand it by several dimensions to the size=[3,2,5,5] such that the added dimensions have the corresponding values from the original tensor. For example, making size=[3] vector=[1,2,3] such that the first tensor of size [2,5,5] has values 1, the second one has all values 2, and the third one all values 3.
In addition, how to expand the vector of size [3,2] to [3,2,5,5]?
One way to do it I can think is by means of creating a vector of the same size with ones-Like and then einsum but I think there should be an easier way.
You can first unsqueeze the appropriate number of singleton dimensions, then expand to a view at the target shape with torch.Tensor.expand:
>>> x = torch.rand(3)
>>> target = [3,2,5,5]
>>> x[:, None, None, None].expand(target)
A nice workaround is to use torch.Tensor.reshape or torch.Tensor.view to do perform multiple unsqueezing:
>>> x.view(-1, 1, 1, 1).expand(target)
This allows for a more general approach to handle any arbitrary target shape:
>>> x.view(len(x), *(1,)*(len(target)-1)).expand(target)
For an even more general implementation, where x can be multi-dimensional:
>>> x = torch.rand(3, 2)
# just to make sure the target shape is valid w.r.t to x
>>> assert list(x.shape) == list(target[:x.ndim])
>>> x.view(*x.shape, *(1,)*(len(target)-x.ndim)).expand(target)

Using a subset of classes in ImageNet

I'm aware that subsets of ImageNet exist, however they don't fulfill my requirement. I want 50 classes at their native ImageNet resolutions.
To this end, I used torch.utils.data.dataset.Subset to select specific classes from ImageNet. However, it turns out, class labels/indices must be greater than 0 and less than num_classes.
Since ImageNet contains 1000 classes, the idx of my selected classes quickly goes over 50. How can I reassign the class indices and do so in a way that allows for evaluation later down the road as well?
Is there a way more elegant way to select a subset?
I am not sure I understood your conclusions about labels being greater than zero and less than num_classes. The torch.utils.data.Subset helper takes in a torch.utils.data.Dataset and a sequence of indices, they correspond to indices of data points from the Dataset you would like to keep in the subset. These indices have nothing to do with the classes they belong to.
Here's how I would approach this:
Load your dataset through torchvision.datasets (custom datasets would work the same way). Here I will demonstrate it with FashionMNIST since ImageNet's data is not made available directly through torchvision's API.
>>> ds = torchvision.datasets.FashionMNIST('.')
>>> len(ds)
60000
Define the classes you want to select for the subset dataset. And retrieve all indices from the main dataset which correspond to these classes:
>>> targets = [1, 3, 5, 9]
>>> indices = [i for i, label in enumerate(ds.targets) if label in targets]
You have your subset:
>>> ds_subset = Subset(ds, indices)
>>> len(ds_subset)
24000
At this point, you can use a dictionnary to remap your labels using targets:
>>> remap = {i:x for i, x in enumerate(targets)}
{0: 1, 1: 3, 2: 5, 3: 9}
For example:
>>> x, y = ds_subset[10]
>>> y, remap[y] # old_label, new_label
1, 3

Tensorflow map_fn Out of Memory Issues

I am having issues with my code running out of memory on large data sets. I attempted to chunk the data to feed it into the calculation graph but I eventually get an out of memory error. Would setting it up to use the feed_dict functionality get around this problem?
My code is set up like the following, with a nested map_fn function due to a result of the tf_itertools_product_2D_nest function.
tf_itertools_product_2D_nest function is from Cartesian Product in Tensorflow
I also tried a variation where I made a list of tensor-lists which was significantly slower than doing it purely in tensorflow so I'd prefer to avoid that method.
import tensorflow as tf
import numpy as np
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.9
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
tensorboard_log_dir = "../log/"
def tf_itertools_product_2D_nest(a,b): #does not work on nested tensors
a, b = a[ None, :, None ], b[ :, None, None ]
#print(sess.run(tf.shape(a)))
#print(sess.run(tf.shape(b)))
n_feat_dimension_in_common = tf.shape(a)[-1]
c = tf.concat( [ a + tf.zeros_like( b ), tf.zeros_like( a ) + b ], axis = 2 )
return c
def do_calc(arr_pair):
arr_1 = arr_pair[0]
arr_binary = arr_pair[1]
return tf.reduce_max(tf.cumsum(arr_1*arr_binary))
def calc_row_wrapper(row):
return tf.map_fn(do_calc,row)
for i in range(0,10):
a = tf.constant(np.random.random((7,10))*10,tf.float64)
b = tf.constant(np.random.randint(2, size=(3,10)),tf.float64)
a_b_itertools_product = tf_itertools_product_2D_nest(a,b)
'''Creates array like this:
[ [[arr_a0,arr_b0], [arr_a1,arr_b0],...],
[[arr_a0,arr_b1], [arr_a1,arr_b1],...],
[[arr_a0,arr_b2], [arr_a1,arr_b2],...],
...]
'''
with tf.summary.FileWriter(tensorboard_log_dir, sess.graph) as writer:
result_array = sess.run(tf.map_fn(calc_row_wrapper,a_b_itertools_product),
options=run_options,run_metadata=run_metadata)
writer.add_run_metadata(run_metadata,"iteration {}".format(i))
print(result_array.shape)
print(result_array)
print("")
# result_array should be an array with 3 rows (1 for each binary vector in b) and 7 columns (1 for each row in a)
I can imagine that is unnecessarily consuming memory due to the extra dimension added. Is there a way to mimic the outcome of the standard itertools.product() function to output 1 long list of every possible combination of items in the 2 input iterables? Like the result of:
itertools.product([[1,2],[3,4]],[[5,6],[7,8]])
# [([1, 2], [5, 6]), ([1, 2], [7, 8]), ([3, 4], [5, 6]), ([3, 4], [7, 8])]
That would eliminate the need to call map_fn twice.
When map_fn is called within a loop as my code shows, will it keep spawning graphs for every iteration? There appears to be a big "map_" node for every iteration cycle in this code's Tensorboardgraph.
Tensorboard Default View (not enough reputation yet)
When I select a particular iteration based on the tag in Tensorboard, only the map node corresponding to the iteration is highlighted with all the others grayed out. Does that mean that for that cycle only the map node for that cycle is present (and the others no longer, if from a previous cycle , exist in memory)?
Tensorboard 1 iteration view

python3: find most k nearest vectors from a list?

Say I have a vector v1 and a list of vector l1. I want to find k vectors from l1 that are most closed (similar) to v1 in descending order.
I have a function sim_score(v1,v2) that will return a similarity score between 0 and 1 for any two input vectors.
Indeed, a naive way is to write a for loop over l1, calculate distance and store them into another list, then sort the output list. But is there a Pythonic way to do the task?
Thanks
import numpy as np
np.sort([np.sqrt(np.sum(( l-v1)*(l-v1))) For l in l1])[:3]
Consider using scipy.spatial.distance module for distance computations. It supports the most common metrics.
import numpy as np
from scipy.spatial import distance
v1 = [[1, 2, 3]]
l1 = [[11, 3, 5],
[ 2, 1, 9],
[.1, 3, 2]]
# compute distances
dists = distance.cdist(v1, l1, metric='euclidean')
# sorted distances
sd = np.sort(dists)
Note that each parameter to cdist must be two-dimensional. Hence, v1 must be a nested list, or a 2d numpy array.
You may also use your homegrown metric like:
def my_metric(a, b, **kwargs):
# some logic
dists = distance.cdist(v1, l1, metric=my_metric)

Using Theano.scan with multidimensional arrays

To speed up my code I am converting a multidimensional sumproduct function from Python to Theano. My Theano code reaches the same result, but only calculates the result for one dimension at a time, so that I have to use a Python for-loop to get the end result. I assume that would make the code slow, because Theano cannot optimize memory usage and transfer (for the gpu) between multiple function calls. Or is this a wrong assumption?
So how can I change the Theano code, so that the sumprod is calculated in one function call?
The original Python function:
def sumprod(a1, a2):
"""Sum the element-wise products of the `a1` and `a2`."""
result = numpy.zeros_like(a1[0])
for i, j in zip(a1, a2):
result += i*j
return result
For the following input
a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])
the output would be: [ 26. 40. 65.] that is 1*1 + 5*5, 2*2 + 6*6 and 4*4 + 7*7
The Theano version of the code:
import theano
import theano.tensor as T
import numpy
a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])
# wanted result: [ 26. 40. 65.]
# that is 1*1 + 5*5, 2*2 + 6*6 and 4*4 + 7*7
Tk = T.iscalar('Tk')
Ta1_shared = theano.shared(numpy.array(a1).T)
Ta2_shared = theano.shared(numpy.array(a2).T)
outputs_info = T.as_tensor_variable(numpy.asarray(0, 'float64'))
Tsumprod_result, updates = theano.scan(fn=lambda Ta1_shared, Ta2_shared, prior_value:
prior_value + Ta1_shared * Ta2_shared,
outputs_info=outputs_info,
sequences=[Ta1_shared[Tk], Ta2_shared[Tk]])
Tsumprod_result = Tsumprod_result[-1]
Tsumprod = theano.function([Tk], outputs=Tsumprod_result)
result = numpy.zeros_like(a1[0])
for i in range(len(a1[0])):
result[i] = Tsumprod(i)
print result
First, there is more people that will answer your questions on theano mailing list then on stackoverflow. But I'm here:)
First, your function isn't a good fit for GPU. Even if everything was well optimized, the transfer of the input to the gpu just to add and sum the result will take more time to run then the python version.
Your python code is slow, here is a version that should be faster:
def sumprod(a1, a2):
"""Sum the element-wise products of the `a1` and `a2`."""
a1 = numpy.asarray(a1)
a2 = numpy.asarray(a2)
result (a1 * a2).sum(axis=0)
return result
For the theano code, here is the equivalent of this faster python version(no need of scan)
m1 = theano.tensor.matrix()
m2 = theano.tensor.matrix()
f = theano.function([m1, m2], (m1 * m2).sum(axis=0))
The think to remember from this is that you need to "vectorize" your code. The "vectorize" is used in the NumPy context and it mean to use numpy.ndarray and use function that work on the full tensor at a time. This is always faster then doing it with loop (python loop or theano scan). Also, Theano optimize some of thoses cases by moving the computation outside the scan, but it don't always do it.

Resources