Is the order of batches guaranteed in Keras' OrderedEnqueuer? - keras

I have a custom keras.utils.Sequence which generates batches in a specific (and critical) order.
However, I need to parallelise batch generation across multiple cores. Does the name 'OrderedEnqueuer' imply that the order of batches in the resulting queue is guaranteed to be the same as the order of the original keras.utils.Sequence?
My reasons for thinking that this order is not guaranteed:
OrderedEnqueuer uses python multiprocessing's apply_async internally.
Keras' docs explicitly say that OrderedEnqueuer is guaranteed not to duplicate batches - but not that the order is guaranteed.
My reasons for thinking that it is:
The name!
I understand that keras.utils.Sequence objects are indexable.
I found test scripts on Keras' github which appear to be designed to verify order - although I could not find any documentation about whether these were passed, or whether they are truly conclusive.
If the order here is not guaranteed, I would welcome any suggestions on how to parallelise batch preparation while maintaining a guaranteed order, with the proviso that it must be able to parallelise arbitrary Python code - I believe e.g. the tf.data.Dataset API does not allow this (tf.py_function calls back to the original Python process).

Yes, it's ordered.
Check it yourself with the following test.
First, let's create a dummy Sequence that returns just the batch index after waiting a random time (the random sleep is there to ensure that the batches will not finish in order):
import time, random, datetime
import numpy as np
import tensorflow as tf

class DataLoader(tf.keras.utils.Sequence):
    def __len__(self):
        return 10

    def __getitem__(self, i):
        time.sleep(random.randint(1, 2))
        # you could add a print here to see that it's out of order
        return i
Now let's create a test function that creates the enqueuer and uses it.
The function takes the number of workers and prints the time taken as well as the results as returned.
def test(workers):
    enq = tf.keras.utils.OrderedEnqueuer(DataLoader())
    enq.start(workers=workers)
    gen = enq.get()
    results = []
    start = datetime.datetime.now()
    for i in range(30):
        results.append(next(gen))
    enq.stop()
    print('test with', workers, 'workers took', datetime.datetime.now() - start)
    print("results:", results)
Results:
test(1)
test(8)
test with 1 workers took 0:00:45.093122
results: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
test with 8 workers took 0:00:09.127771
results: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Notice that:
8 workers is way faster than 1 worker -> it is parallelizing correctly
the results are ordered in both cases
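As an aside, since the question also asked how to parallelise arbitrary Python code while keeping order outside of Keras: multiprocessing.Pool.imap yields results in the order the tasks were submitted, even when workers finish out of order. A minimal sketch (prepare_batch is just an illustrative stand-in for arbitrary batch-preparation work):

import time, random
from multiprocessing import Pool

def prepare_batch(i):
    # stand-in for arbitrary Python work; the random sleep makes workers finish out of order
    time.sleep(random.random())
    return i

if __name__ == '__main__':
    with Pool(processes=8) as pool:
        # imap yields results in submission order: 0, 1, ..., 9
        for batch in pool.imap(prepare_batch, range(10)):
            print(batch)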

Related

Repetitive sequence (optimization)

I am trying to solve this problem:
initial list = [0, 1, 2, 2]
You get this sequence of numbers [0, 1, 2, 2] and each time you need to append the next natural number (so 3, 4, 5, etc.) n times, where n is the element at that number's index. For example, the next number to add is 3, and list[3] is 2, so you append 3 two times. The new list will be: [0, 1, 2, 2, 3, 3]. Then the element at index 4 is 3, so you append 4 three times. The list becomes [0, 1, 2, 2, 3, 3, 4, 4, 4], and so on. ([0, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10])
In order to solve this, I tried various approaches. I used recursion, but a recursive approach is very slow in this case. I also tried the mathematical formula from OEIS (A055086): a(n) = ceiling(2*sqrt(n+1)) - 2. The problem with the formula is that beyond 2**20 it becomes too imprecise.
So, my next idea was to use memoization:
lst = [0, 1, 2, 2]
from itertools import repeat

def find(n):
    global lst
    print(lst[-1], n, flush=True)
    if len(lst) > n:
        return lst[n]
    for number in range(lst[-1]+1, n+1):
        lst += list(repeat(number, lst[number]))
        if len(lst) > n:
            return lst[n]
Now, this approach works up to 2**37, but after that it just times out. The site where I am trying to implement my algorithm is https://www.codewars.com/kata/5f134651bc9687000f8022c4/train/python. I am not asking for a solution, just for any hint on how to optimize my code.
I googled some similar problems and found that in this case I could use the total sum of the list, but it is not yet clear to me how this could help.
Any help is welcome!
You can answer it iteratively like so:
def find(n):
    lst = [0, 1, 2, 2]
    if n < 4:
        return lst[n]
    to_add = 3
    while n >= len(lst):
        for i in range(lst[to_add]):
            lst.append(to_add)
        to_add += 1
    return lst[n]
You could optimise for large n by breaking out of the for loop early, and by keeping track of the list length yourself rather than calling len repeatedly; see the sketch below.
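A sketch of those two optimisations combined, assuming it is fine to return as soon as index n exists (this only trims constant factors; for very large n you would still want the running-total idea mentioned in the question):

def find(n):
    lst = [0, 1, 2, 2]
    if n < 4:
        return lst[n]
    length = 4                  # track the length manually instead of calling len()
    to_add = 3
    while True:
        for _ in range(lst[to_add]):
            lst.append(to_add)
            length += 1
            if length > n:      # break out as soon as index n exists
                return lst[n]
        to_add += 1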

Does Scipy recognize the special structure of this matrix to decompose it faster?

I have a matrix many of whose rows are already in upper triangular form. I would like to ask whether the command scipy.linalg.lu recognizes this special structure to decompose it faster. If I decompose this matrix on paper, I only apply Gaussian elimination to those rows that are not yet in upper triangular form. For example, I would only make transformations on the last row of matrix B.
import numpy as np
A = np.array([[2, 5, 8, 7, 8],
              [5, 2, 2, 8, 9],
              [7, 5, 6, 6, 10],
              [5, 4, 4, 8, 10]])
B = np.array([[2, 5, 8, 7, 8],
              [0, 2, 2, 8, 9],
              [0, 0, 6, 6, 10],
              [5, 4, 4, 8, 10]])
Because my square matrix is of very large dimension and this procedure is repeated thousands of times, I would like to make use of this special structure to reduce the computational cost.
Thank you so much for your elaboration!
Not automatically.
You'll need to exploit the structure yourself if you want to. Whether you can make it faster than the built-in implementation depends on many factors (the number of zeros, etc.).
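For reference, scipy.linalg.lu takes no structural hints at all, so the baseline is simply the full decomposition; timing it on your own matrices is the quickest way to judge whether a hand-rolled scheme that skips the already-triangular rows is worth the effort:

import numpy as np
from scipy.linalg import lu
from timeit import timeit

B = np.array([[2, 5, 8, 7, 8],
              [0, 2, 2, 8, 9],
              [0, 0, 6, 6, 10],
              [5, 4, 4, 8, 10]], dtype=float)

P, L, U = lu(B)                             # full LU with partial pivoting, structure ignored
print(timeit(lambda: lu(B), number=1000))   # baseline cost to beat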

Why are two square brackets required inside a numpy array?

I am learning Python, and I recently came across the module NumPy. With the help of NumPy, one can convert lists to arrays and perform operations much faster.
Let's say we create an array with the following values:
import numpy as np
np_array = np.array([1,2,3,4,5])
So we need one pair of square brackets if we want to store one list as an array. Now if I want to create a 2D array, why must it be defined like this:
np_array = np.array([[1,2,3,4,5],[6,7,8,9,10]])
and not like this:
np_array = np.array([1,2,3,4,5],[6,7,8,9,10])
I apologize if this question is a duplicate, but I couldn't find any answer.
Many Thanks
The array function has the following signature:
np.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
If you use
np_array = np.array([1,2,3,4,5],[6,7,8,9,10])
the call passes [1,2,3,4,5] to object and [6,7,8,9,10] to dtype, which won't make any sense.
This actually has little to do with numpy. You are essentially asking what is the difference between foo(a, b) and foo([a, b]).
arbitrary_function([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]) passes two lists as separate arguments to arbitrary_function (one argument is [1, 2, 3, 4, 5] and the second is [6, 7, 8, 9, 10]).
arbitrary_function([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) passes a list of lists ([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) to arbitrary_function.
Now, the numpy creators could have chosen to allow array([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]), but it would have made little to no sense to do so.
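To make the difference concrete, here is the single-argument, list-of-lists call and its resulting shape (passing two separate lists instead raises a TypeError because the second list is interpreted as dtype; the exact message varies by numpy version):

import numpy as np

a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])   # one argument: a list of two lists
print(a.shape)                                      # (2, 5) -> a 2D array

# np.array([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])       # TypeError: the second list is taken as dtype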

variable within a function seems not to be local

I am writing a program that takes an initial sequence, performs 3 functions on it, and then prints the 3 answers, but I want to keep the original variable intact so it can be reused. From other answers on this forum I concluded that a variable within a function should be local, but it appears to be acting globally.
from collections import deque
from sys import exit
initial_state = (1,2,3,4,5,6,7,8)
initial_state = deque(initial_state)
def row_exchange(t):
    t.reverse()
    return t

def rt_circ_shift(t):
    t.appendleft(t[3])
    del t[4]
    t.append(t[4])
    del t[4]
    return t

def md_clk_rot(t):
    t.insert(1, t[6])
    del t[7]
    t.insert(3, t[4])
    del t[5]
    t.insert(4, t[5])
    del t[6]
    return t
print(row_exchange(initial_state))
print(initial_state)
print(rt_circ_shift(initial_state))
print(md_clk_rot(initial_state))
I would expect to get:
deque([8, 7, 6, 5, 4, 3, 2, 1])
deque([1, 2, 3, 4, 5, 6, 7, 8])
deque([4, 1, 2, 3, 6, 7, 8, 5])
deque([1, 7, 2, 4, 5, 3, 6, 8])
but instead I get:
deque([8, 7, 6, 5, 4, 3, 2, 1])
deque([8, 7, 6, 5, 4, 3, 2, 1])
deque([5, 8, 7, 6, 3, 2, 1, 4])
deque([5, 1, 8, 6, 3, 7, 2, 4])
So why isn't my variable local within the function?
Is there a way I can rename the output within the function so that it isn't using the same identifier initial_state?
I'm pretty new to programming, so over-explanation would be appreciated.
Per the docs for deque.reverse:
Reverse the elements of the deque in-place and then return None.
(my emphasis). Therefore
def row_exchange(t):
    t.reverse()
    return t
row_exchange(initial_state)
modifies initial_state. Note that append, appendleft and insert also modify the deque in-place.
To reverse without modifying t in place, you could use:
def row_exchange(t):
    return deque(reversed(t))
In each of the functions, t is a local variable. The effect you are seeing is
not because t is somehow global -- it is not. Indeed, you would get a
NameError if you tried to reference t in the global scope.
For more on why modifying a local variable can affect a value outside the local scope, see Ned Batchelder's Facts and myths about Python names and values. In particular, look at the discussion of the function augment_twice.
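To answer the second part of the question (keeping initial_state intact without renaming anything), one option is to copy the deque at the top of each function so the in-place methods operate on the copy; a minimal sketch for row_exchange, which applies the same way to the other two functions:

from collections import deque

def row_exchange(t):
    t = deque(t)          # work on a copy; the caller's deque is left untouched
    t.reverse()
    return t

initial_state = deque((1, 2, 3, 4, 5, 6, 7, 8))
print(row_exchange(initial_state))   # deque([8, 7, 6, 5, 4, 3, 2, 1])
print(initial_state)                 # deque([1, 2, 3, 4, 5, 6, 7, 8]) -- unchanged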

How to use iter and next in a function to return a list of peaks?

Problem:
I need to define a peaks function which is passed an iterable as its parameter; by consuming this iterable, the function should return a list of peaks. The only data structure I can create is the list being returned; I may use no intermediate data structures to help the computation: e.g., I cannot create a list with all the values in the iterable. Note that I also cannot assume the argument is indexable, nor can I compute len for it: it is just iterable.
For example:
peaks([0,1,-1,3,8,4,3,5,4,3,8]) returns [1, 8, 5].
This result means the values 1, 8, and 5 are strictly bigger than the values immediately preceding and following them.
def peaks(iterable):
    l = []
    i = iter(iterable)
    v = next(i)
    try:
        while True:
            if v > next(iter(iterable)):
                l.append(v)
            v = next(i)
    except StopIteration:
        pass
    return l
Calling these should give me:
peaks([0,1,-1,3,8,4,3,5,4,3,8]) --> [1,8,5]
peaks([5,2,4,9,6,1,3,8,0,7]) -->[9,8]
But I am getting:
peaks([0,1,-1,3,8,4,3,5,4,3,8]) --> [1, 3, 8, 4, 3, 5, 4, 3, 8]
peaks([5,2,4,9,6,1,3,8,0,7]) --> [9, 6, 8, 7]
Please help me with this problem; I have spent so much time on it and am making no progress. I don't know how to write the if statement to check the values immediately preceding and following. Any help would be great! Actual code would be really appreciated since my English is bad.
import itertools

def peaks(L):
    answer = []
    a, b, c = itertools.tee(L, 3)
    next(b)
    next(c)
    next(c)
    for first, second, third in zip(a, b, c):
        if first <= second >= third:
            answer.append(second)
    return answer
In [61]: peaks([0,1,-1,3,8,4,3,5,4,3,8])
Out[61]: [1, 8, 5]
In [62]: peaks([5,2,4,9,6,1,3,8,0,7])
Out[62]: [9, 8]
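If you would rather stay with the plain iter/next style the exercise asks for (no itertools, strict comparisons, and only the previous and current values kept around), a sketch along those lines:

def peaks(iterable):
    result = []
    it = iter(iterable)
    try:
        prev = next(it)
        curr = next(it)
    except StopIteration:
        return result            # fewer than two values: no peaks possible
    for nxt in it:
        if prev < curr > nxt:    # strictly bigger than both neighbours
            result.append(curr)
        prev, curr = curr, nxt
    return result

print(peaks([0, 1, -1, 3, 8, 4, 3, 5, 4, 3, 8]))   # [1, 8, 5]
print(peaks([5, 2, 4, 9, 6, 1, 3, 8, 0, 7]))       # [9, 8]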
