Updated list does not get used in Python - python-3.x

import math

array = [16, 5, 3, 4, 11, 9, 13]
for x in array[0:len(array) - 1]:
    key = x
    index = array.index(x)
    posj = index
    for y in array[index + 1:len(array)]:
        if y < key:
            key = y
            posj = array.index(y)
    if index != posj:
        hold = array[index]
        array[index] = key
        array[posj] = hold
print(array)
I'm trying to implement insertion sort.
After stepping through with the debugger, it appears that every loop iteration uses the original array [16,5,3,4,11,9,13] instead of the updated array produced by the previous iteration.
How can I make x be the updated element at the given index?

The slice array[0:len(array)-1] builds a new list, so the loop iterates over a stale copy of the original values rather than the list you are mutating. Iterate over the list itself instead. So instead of
for x in array[0:len(array)-1]:
try
for x in array:
Output
[3, 4, 5, 9, 11, 13, 16]
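For reference, here is a sketch of the full loop with only that change applied (note that array.index relies on the values being unique):
array = [16, 5, 3, 4, 11, 9, 13]
for x in array:  # iterate over the live list, not a stale slice
    key = x
    index = array.index(x)  # first occurrence; assumes unique values
    posj = index
    for y in array[index + 1:]:
        if y < key:
            key = y
            posj = array.index(y)
    if index != posj:
        hold = array[index]
        array[index] = key
        array[posj] = hold
print(array)  # [3, 4, 5, 9, 11, 13, 16]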

numpy array shows same id after changing values

I have created 2 numpy arrays: original and cloned, where cloned is a copy of the original array. I changed an element in the cloned array. I am checking the identity of the original and cloned elements using the is keyword. It returns False. But when I print the id of both elements, it is the same.
I know about peephole optimization techniques and that numbers from -5 to 256 are stored at the same address in Python. But here I have changed the value to 400 (> 256), and it still shows the same id. Why?
Please correct me if I am wrong. I am new to numpy arrays.
import numpy as np
original = np.array([
[1, 2, 3, 4, 5],
[6, 7, 9, 10, 11]
])
# Copying array "original" to "cloned"
cloned = original.copy()
# Changing first element of Cloned Array
cloned[0, 1] = 400
print(id(cloned[0, 1]))
print(id(original[0, 1]))
print(id(cloned[0, 1]) is id(original[0, 1]))
Output:
140132171232408
140132171232408
False
The printed ids are the same, yet is returns False.
The is keyword is used to test whether two variables refer to the same object, not whether they are equal. Use == instead, as follows:
print(id(cloned[0, 1]) == id(original[0, 1]))
# Returns True
The two id() calls return the same value here, but id(...) is id(...) compares two freshly created int objects, which are distinct even when their values are equal, so the is operator does not work.
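Note also that indexing a numpy array builds a fresh scalar object on each access, so the equal id values above are just a reused memory address. A minimal sketch:
import numpy as np

a = np.array([[1, 2], [3, 4]])
x = a[0, 1]  # each indexing creates a new numpy scalar object
y = a[0, 1]
print(x == y)  # True  -- equal values
print(x is y)  # False -- distinct objects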

Choose n random elements from every row of list of list

I have a list of lists L:
[
[1,2,3,4,5,6],
[10,20,30,40,50,60],
[11,12,113,4,15,6],
]
The inner lists are all the same size.
I want to choose n random elements from every row of L and return the result as a list of lists of the same shape.
I tried the following code:
import random
import math

len_f = len(L)
index = [i for i in range(len_f)]
RANDOM_INDEX = random.sample(index, 5)
I am stuck at this point: how can I use the random indices to build the output from L?
For example, the output for 2 random elements would be:
[
[1,6],
[10,60],
[11,6],
]
if the random function chose the first and sixth positions as indices.
random.sample can be leveraged here. Adapt the sample size k according to your needs.
In: import random
In: [random.sample(ls, k=3) for ls in L]
Out: [[1, 2, 6], [60, 10, 30], [4, 12, 15]]
It assumes the order of the picked elements doesn't matter.
Doc for random.sample for convenience: https://docs.python.org/3/library/random.html#random.sample
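If the same positions should be kept in every row, as the expected output in the question suggests, one option is to sample the column indices once and reuse them. A sketch, with n as the number of elements to pick:
import random

L = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [11, 12, 113, 4, 15, 6],
]
n = 2
cols = random.sample(range(len(L[0])), k=n)  # one index set for all rows
picked = [[row[c] for c in cols] for row in L]
print(picked)  # e.g. [[1, 6], [10, 60], [11, 6]] when cols == [0, 5]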

Convert map object into numpy ndarray in python3

The following code works well in Python 2, but after migrating to Python 3 it no longer works.
How should I change this code for Python 3?
for i, idx in enumerate(indices):
    user_id, item_id = idx
    feature_seq = np.array(map(lambda x: user_id, item_id))
    X[i, :len(item_id), :] = feature_seq  # ---- error here ----
error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'map'
Thank you.
In Python 3, map returns an iterator, not a list.
You can also try numpy.fromiter to get an array from a map object, but only for 1-D data.
Example:
a = map(lambda x: x, range(10))
b = np.fromiter(a, dtype=int)
b
Output:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
For multidimensional arrays, refer to Reconcile np.fromiter and multidimensional arrays in Python
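As a rough sketch of that workaround (assuming the row width is known up front), you can flatten the iterator and reshape the 1-D result:
import itertools
import numpy as np

# np.fromiter only builds 1-D arrays, so flatten the iterator of rows
# first and reshape afterwards (assumes 4 rows of width 3).
rows = map(lambda i: (i, i * 2, i * 3), range(4))
flat = np.fromiter(itertools.chain.from_iterable(rows), dtype=int)
arr = flat.reshape(4, 3)
print(arr)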
In Python 3, map is like a generator. You need to wrap it in list() to produce a list that np.array can use, e.g.
np.array(list(map(...)))
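Applied to the line from the question, a minimal sketch (user_id and item_id are made-up sample values here, since the originals aren't shown):
import numpy as np

user_id = 42
item_id = [3, 7, 9]
# Wrapping the map object in list() gives np.array a concrete sequence.
feature_seq = np.array(list(map(lambda x: user_id, item_id)))
print(feature_seq)  # [42 42 42]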

How to reuse Theano function with different shared variables without rebuilding graph?

I have a Theano function that is called several times, each time with different shared variables. The way it is implemented now, the Theano function gets redefined every time it is run. I assume this makes the whole program slow, because the graph is rebuilt every time the Theano function is defined.
import theano
from numpy import array

def sumprod_shared(T_shared_array1, T_shared_array2):
    f = theano.function([], (T_shared_array1 * T_shared_array2).sum(axis=0))
    return f()

for factor in range(10):
    m1 = theano.shared(factor * array([[1, 2, 4], [5, 6, 7]]))
    m2 = theano.shared(factor * array([[1, 2, 4], [5, 6, 7]]))
    print(sumprod_shared(m1, m2))
For non-shared (normal) variables I can define the function once and then call it with different values without redefining it.
import theano.tensor as T

def sumprod_init():
    T_matrix1 = T.lmatrix('T_matrix1')
    T_matrix2 = T.lmatrix('T_matrix2')
    return theano.function([T_matrix1, T_matrix2], (T_matrix1 * T_matrix2).sum(axis=0))

sumprod = sumprod_init()
for factor in range(10):
    np_array1 = factor * array([[1, 2, 4], [5, 6, 7]])
    np_array2 = factor * array([[1, 2, 4], [5, 6, 7]])
    print(sumprod(np_array1, np_array2))
Is this possible also for shared variables?
You can use the givens keyword of theano.function for that. Basically, you do the following:
import numpy as np
import theano
import theano.tensor

m1 = theano.shared(name='m1', value=np.zeros((3, 2)))
m2 = theano.shared(name='m2', value=np.zeros((3, 2)))
x1 = theano.tensor.dmatrix('x1')
x2 = theano.tensor.dmatrix('x2')
y = (x1 * x2).sum(axis=0)
f = theano.function([], y, givens=[(x1, m1), (x2, m2)], on_unused_input='ignore')
Then, to loop through values, you just set the value of the shared variables to the values you'd like. By the way, you have to set on_unused_input to 'ignore' to use functions with no arguments in theano. Like this:
array1 = np.array([[1., 2., 3.], [4., 5., 6.]])  # float64, so the dtype matches the shared variables
array2 = np.array([[2., 4., 6.], [8., 10., 12.]])
for i in range(10):
    m1.set_value(i * array1)
    m2.set_value(i * array2)
    print(f())
It should work, at least that's how I've been working around it.
Currently it is not easily possible to reuse a Theano function with different shared variables.
But you have alternatives:
Is it really a bottleneck? In the example it is, but I suppose it is a simplified case. The only way to know is to profile it.
Compile one Theano function with the first set of shared variables. Then call get_value/set_value on those shared variables before calling the Theano function. That way you won't need to recompile the Theano function, as shown in the sketch below.
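A minimal sketch of that second alternative (compile once, then swap values with set_value; the variable names are made up):
import numpy as np
import theano

# Compile one function against fixed shared variables...
m1 = theano.shared(np.zeros((2, 3)), name='m1')
m2 = theano.shared(np.zeros((2, 3)), name='m2')
f = theano.function([], (m1 * m2).sum(axis=0))

# ...then overwrite their contents before each call; no recompilation.
base = np.array([[1., 2., 4.], [5., 6., 7.]])
for factor in range(10):
    m1.set_value(factor * base)
    m2.set_value(factor * base)
    print(f())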

Using Theano.scan with multidimensional arrays

To speed up my code I am converting a multidimensional sumproduct function from Python to Theano. My Theano code reaches the same result, but only calculates the result for one dimension at a time, so I have to use a Python for-loop to get the end result. I assume that makes the code slow, because Theano cannot optimize memory usage and transfers (for the GPU) across multiple function calls. Or is this a wrong assumption?
So how can I change the Theano code so that the sumprod is calculated in one function call?
The original Python function:
def sumprod(a1, a2):
    """Sum the element-wise products of `a1` and `a2`."""
    result = numpy.zeros_like(a1[0])
    for i, j in zip(a1, a2):
        result += i * j
    return result
For the following input
a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])
the output would be [ 26. 40. 65.], that is, 1*1 + 5*5, 2*2 + 6*6 and 4*4 + 7*7.
The Theano version of the code:
import theano
import theano.tensor as T
import numpy

a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])
# wanted result: [ 26. 40. 65.]
# that is 1*1 + 5*5, 2*2 + 6*6 and 4*4 + 7*7
Tk = T.iscalar('Tk')
Ta1_shared = theano.shared(numpy.array(a1).T)
Ta2_shared = theano.shared(numpy.array(a2).T)
outputs_info = T.as_tensor_variable(numpy.asarray(0, 'float64'))
Tsumprod_result, updates = theano.scan(
    fn=lambda Ta1_shared, Ta2_shared, prior_value:
        prior_value + Ta1_shared * Ta2_shared,
    outputs_info=outputs_info,
    sequences=[Ta1_shared[Tk], Ta2_shared[Tk]])
Tsumprod_result = Tsumprod_result[-1]
Tsumprod = theano.function([Tk], outputs=Tsumprod_result)

result = numpy.zeros_like(a1[0])
for i in range(len(a1[0])):
    result[i] = Tsumprod(i)
print(result)
First, more people will answer your questions on the theano mailing list than on Stack Overflow. But I'm here :)
First, your function isn't a good fit for the GPU. Even if everything were well optimized, transferring the input to the GPU just to multiply and sum the result would take more time than the Python version.
Your Python code is slow; here is a version that should be faster:
def sumprod(a1, a2):
    """Sum the element-wise products of `a1` and `a2`."""
    a1 = numpy.asarray(a1)
    a2 = numpy.asarray(a2)
    result = (a1 * a2).sum(axis=0)
    return result
For the Theano code, here is the equivalent of this faster Python version (no need for scan):
m1 = theano.tensor.matrix()
m2 = theano.tensor.matrix()
f = theano.function([m1, m2], (m1 * m2).sum(axis=0))
The thing to remember from this is that you need to "vectorize" your code. "Vectorize" here is used in the NumPy sense: work on numpy.ndarray and use functions that operate on the full tensor at a time. This is always faster than doing it with a loop (a Python loop or theano scan). Also, Theano optimizes some of those cases by moving the computation outside the scan, but it doesn't always do it.
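For instance, a small sketch running that vectorized function on the sample data from the question:
import numpy
import theano
import theano.tensor as T

m1 = T.matrix()
m2 = T.matrix()
# One compiled function; no scan and no per-column Python loop.
f = theano.function([m1, m2], (m1 * m2).sum(axis=0))

a1 = numpy.array([[1., 2., 4.], [5., 6., 7.]])
a2 = numpy.array([[1., 2., 4.], [5., 6., 7.]])
print(f(a1, a2))  # [ 26.  40.  65.]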
