I am trying to improve a part of code that is slowing down the whole script significantly, right to the point of making it unfeasible. In particular the piece of code is:
for vectors1 in EC1:
    for vectors2 in EC2:
        r = np.add(vectors1, vectors2)
        for vectors3 in CDC:
            result = np.add(r, vectors3).tolist()
            if result not in states:  # This is what makes it very slow
                states.append(result)
EC1, EC2 and CDC are lists whose elements are themselves lists of lists. As an example, one iteration gives:
vectors1: [[2, 0, 0], [0, 0, 0], [0, 0, 0], [2, 0, 0], [0, 0, 0], [0, 0, 0], [2, 0, 0], [2, 0, 0], [0, 0, 0]]
vectors2: [[0, 0, 0], [2, 0, 0], [0, 0, 0], [0, 0, 0], [2, 0, 0], [2, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
vectors3: [[0, 0, 0], [0, 0, 0], [2, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [2, 1, 0], [2, 1, 0]]
result: [[2, 0, 0], [2, 0, 0], [2, 1, 0], [2, 0, 0], [2, 0, 0], [2, 0, 0], [2, 0, 0], [4, 1, 0], [2, 1, 0]]
Notice that vectors1, vectors2 and vectors3 are each one element of EC1, EC2 and CDC respectively, and that 'result' is their element-wise sum. The vectors therefore cannot be altered or sorted in any way, because that would change the value of 'result'.
The first two loops sum each item in EC1 with each item in EC2; the innermost loop then adds each item in CDC to that partial sum ('r'). I use numpy.add() for both sums and finally convert 'result' back to a list. So basically I am handling lists of lists as the elements of EC1, EC2 and CDC.
The problem is that I must deal with hundreds of thousands (close to 1M) of results, and checking whether each result already exists in the states list slows things down drastically, especially since states grows as more results are processed.
I've tried to keep inside the numpy world by managing everything as numpy arrays. First declaring states as:
states = np.empty([9, 3], int)
Then concatenating the result array to the states array, after first checking whether it already exists in states:
for vectors1 in EC1:
    for vectors2 in EC2:
        r = np.add(vectors1, vectors2)
        for vectors3 in CDC:
            result = np.add(r, vectors3)
            if not np.isin(states, result).any():
                np.concatenate(states, result, axis=0)
But I am definitely doing something wrong, because result is not being concatenated to states. I've also tried, without success:
np.append(states, result, axis=0)
Could this be parallelized in some way?
You can do the sums solely in numpy by using broadcasting
res = ((EC1[:,None,:] + EC2).reshape(-1, 1, 3) + CDC).reshape(-1, 3)
given that EC1, EC2 and CDC are arrays.
Afterwards you can filter out the duplicates with
np.unique(res, axis=0)
But like Lucas, I would strongly advise you to filter the arrays beforehand. For your example arrays that would shrink the number of rows in res from 729 to 8.
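For what it's worth, here is a minimal sketch of that combination, assuming EC1, EC2 and CDC are each 2-D arrays with one 3-vector per row (as in the example arrays); unique_sums is a name made up for illustration. It removes duplicate rows first, broadcasts the three-way sum, then removes duplicate results:

```python
import numpy as np

def unique_sums(ec1, ec2, cdc):
    """Return the distinct rows of ec1[i] + ec2[j] + cdc[k] over all i, j, k."""
    # Drop duplicate rows first so the broadcast stays small.
    a = np.unique(np.asarray(ec1), axis=0)
    b = np.unique(np.asarray(ec2), axis=0)
    c = np.unique(np.asarray(cdc), axis=0)
    # Shapes (A,1,1,3) + (1,B,1,3) + (1,1,C,3) broadcast to (A,B,C,3).
    sums = a[:, None, None, :] + b[None, :, None, :] + c[None, None, :, :]
    # Flatten to one row per combination and drop duplicate results.
    return np.unique(sums.reshape(-1, a.shape[1]), axis=0)

# Shortened versions of the example arrays (assumed 2-D, one vector per row).
EC1 = np.array([[2, 0, 0], [0, 0, 0], [2, 0, 0]])
EC2 = np.array([[0, 0, 0], [2, 0, 0], [0, 0, 0]])
CDC = np.array([[0, 0, 0], [2, 1, 0], [2, 1, 0]])
states = unique_sums(EC1, EC2, CDC)
print(states.tolist())
```

np.unique returns the rows in sorted order, so the order of first appearance is lost. Also note that np.concatenate and np.append return a new array rather than modifying states in place, and np.concatenate expects its arrays as a single sequence, e.g. np.concatenate((states, result), axis=0).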
I'm not sure how large the data are that you are working with but this may speed things up somewhat:
import numpy as np

EC1 = [[2, 0, 0], [0, 0, 0], [0, 0, 0], [2, 0, 0], [0, 0, 0], [0, 0, 0], [2, 0, 0], [2, 0, 0], [0, 0, 0]]
EC2 = [[0, 0, 0], [2, 0, 0], [0, 0, 0], [0, 0, 0], [2, 0, 0], [2, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
CDC = [[0, 0, 0], [0, 0, 0], [2, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [2, 1, 0], [2, 1, 0]]
EC1.sort()
EC2.sort()
CDC.sort()
unique_triples = dict()
for v1 in EC1:
    for v2 in EC2:
        for v3 in CDC:
            key = str(v1) + str(v2) + str(v3)  # lists are not hashable, but strings are
            if key not in unique_triples:
                # .tolist() yields plain Python ints rather than NumPy scalars
                unique_triples[key] = np.add(np.add(v1, v2), v3).tolist()
The basic idea is to remove duplicate (EC1, EC2, CDC) triples and do the additions only on the unique ones. The lists are sorted first so that they are in lexicographic order.
A dictionary has O(1) lookups on average, so these membership checks are (maybe) faster.
Whether this is actually faster may depend on how large the data are and on how many unique triples they contain.
The 3-vector sums are the values of the dictionary, e.g. list(unique_triples.values()) for me gives:
>>> list(unique_triples.values())
[[0, 0, 0], [2, 1, 0], [2, 0, 0], [4, 1, 0], [2, 0, 0], [4, 1, 0], [4, 0, 0], [6, 1, 0]]
I did not remove the duplicates in the original lists of lists here. If the application you are looking at allows, it is also likely beneficial to remove these duplicates in EC1, EC2, and CDC before iterating over the values.
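If the elements really are blocks of several vectors, as in the question, another option is to keep the original loops but track the results already seen in a set. This is only a sketch, assuming every result has the same shape; collect_states is a hypothetical name. Lists are not hashable, so each result is flattened into a tuple to serve as the key:

```python
import numpy as np

def collect_states(ec1, ec2, cdc):
    """Collect distinct sums v1 + v2 + v3, in order of first appearance."""
    seen = set()    # hashable keys, average O(1) membership test
    states = []     # results as nested lists, like the original code
    for v1 in ec1:
        for v2 in ec2:
            r = np.add(v1, v2)
            for v3 in cdc:
                result = np.add(r, v3)
                # Flatten to a tuple of ints so the result can go in a set.
                key = tuple(result.ravel().tolist())
                if key not in seen:
                    seen.add(key)
                    states.append(result.tolist())
    return states

# Tiny made-up inputs: each element is a 2x3 block.
EC1 = [[[2, 0, 0], [0, 0, 0]], [[2, 0, 0], [0, 0, 0]]]  # duplicate on purpose
EC2 = [[[0, 0, 0], [2, 0, 0]]]
CDC = [[[0, 0, 0], [0, 0, 0]], [[2, 1, 0], [0, 0, 0]]]
states = collect_states(EC1, EC2, CDC)
print(states)
```

Unlike `result not in states` on a list, the set lookup does not get slower as more results accumulate.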
Expected Output:
identity_matrix(3)
[[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Actual Output with Error:
identity_matrix(3)
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
def identity_matrix(n):
    list_template = [[]]
    list_n = list_template*n
    for sub_l in list_n:
        sub_l.append(0)
    for val in range(n):
        # I have the feeling that the problem lies somewhere around here.
        list_n[val][val]=1
    return(list_n)
list_template*n does not create n independent copies; all n entries refer to the same single list. For example:
a = [[0,0,0]]*2
# Now, let's change the first element of the first sublist in `a`.
a[0][0] = 1
print(a)
# Since both sublists refer to the same list, both appear changed.
Output:
[[1, 0, 0], [1, 0, 0]]
Fix for your code
def identity_matrix(n):
    list_n = [[0]*n for i in range(n)]
    for val in range(n):
        list_n[val][val]=1
    return list_n

print(identity_matrix(5))
Output:
[[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1]]
No, the problem lies here:
list_template = [[]]
list_n = list_template*n
After this, try doing:
list_n[0].append(1) # let's change the first element
The result:
[[1], [1], [1], [1], [1]]
is probably not what you expect.
Briefly, the problem is that after its construction, your list consists of multiple references to the same list. A detailed explanation is at the link given by @saint-jaeger: List of lists changes reflected across sublists unexpectedly
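A quick way to see the sharing for yourself is to compare the inner lists with `is`. This is just an illustrative sketch (the variable names are made up); it contrasts list multiplication with a list comprehension:

```python
n = 3
shared = [[]] * n                      # one inner list, referenced n times
independent = [[] for _ in range(n)]   # n distinct inner lists

shared[0].append(1)
independent[0].append(1)

print(shared)                          # every entry shows the appended value
print(independent)                     # only the first entry changed
print(shared[0] is shared[1])          # True: same object
print(independent[0] is independent[1])  # False: different objects
```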
Finally, the numpy library is your friend for creating identity matrices and other N-dimensional arrays.
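For instance, a minimal sketch using NumPy's built-in constructors (the dtype=int argument is only there to get integers instead of the default floats):

```python
import numpy as np

eye = np.identity(3, dtype=int)   # 3x3 identity matrix as a NumPy array
same = np.eye(3, dtype=int)       # np.eye builds the same matrix here
as_lists = eye.tolist()           # convert to nested lists if needed
print(as_lists)
```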
I have a list of lists:
[[0, 0], [0, 0], [0, 0], [0, 1, 0], [0, 0]]
I want to split it into what comes before the list [0,1,0] and what comes after like so:
[[0, 0], [0, 0], [0, 0]], [[0, 0]]
If I had a list:
[[0, 0], [0, 0], [0, 0], [0, 1, 0], [0, 0], [0, 1, 0], [0, 0]]
I would want to split it into a list like this:
[[0, 0], [0, 0], [0, 0]], [[0, 0]], [[0, 0]]
I am really stuck with this while loop, which does not seem to reset the temporary list at the right place:
def count_normal_jumps(jumps):
    _temp1 = []
    normal_jumps = []
    jump_index = 0
    while jump_index <= len(jumps) - 1:
        if jumps[jump_index] == [0,0]:
            _temp1.append(jumps[jump_index])
        else:
            normal_jumps.append(_temp1)
            _temp1[:] = []
        jump_index += 1
    return normal_jumps
Why does this not work and is there a better approach?
You can use a for loop to append the sublists in the list to the last sublist in a list of lists, and append a new sublist to the list of lists when the input sublist is equal to [0, 1, 0]:
def split(lst):
    output = [[]]
    for l in lst:
        if l == [0, 1, 0]:
            output.append([])
        else:
            output[-1].append(l)
    return output
or you can use itertools.groupby:
from itertools import groupby
def split(lst):
    return [list(g) for k, g in groupby(lst, key=[0, 1, 0].__ne__) if k]
so that:
print(split([[0, 0], [0, 0], [0, 0], [0, 1, 0], [0, 0]]))
print(split([[0, 0], [0, 0], [0, 0], [0, 1, 0], [0, 0], [0, 1, 0], [0, 0]]))
outputs:
[[[0, 0], [0, 0], [0, 0]], [[0, 0]]]
[[[0, 0], [0, 0], [0, 0]], [[0, 0]], [[0, 0]]]
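As for why the loop in the question does not work: normal_jumps.append(_temp1) stores a reference to _temp1, not a copy, and _temp1[:] = [] then empties that same list in place, so the group that was just stored is emptied too. The function also never appends the final group after the loop ends. A small sketch of the difference between clearing in place and rebinding the name (the variable names are only illustrative):

```python
# Clearing in place: the stored group is emptied as well.
groups = []
temp = [[0, 0], [0, 0]]
groups.append(temp)
temp[:] = []            # empties the one list that groups also refers to
print(groups)

# Rebinding: the stored group is left alone.
groups2 = []
temp2 = [[0, 0], [0, 0]]
groups2.append(temp2)
temp2 = []              # temp2 now names a new list; the old one is untouched
print(groups2)
```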
You can do something like this:
myList = [[0, 0], [0, 0], [0, 0], [0, 1, 0], [0, 0]]
toMatch = [0, 1, 0]
allMatches = []
currentMatches = []
for lst in myList:
    if lst == toMatch:
        allMatches.append(currentMatches)
        currentMatches = []
    else:
        currentMatches.append(lst)
# push leftovers when end is reached
if currentMatches:
    allMatches.append(currentMatches)
print(allMatches)
Can someone please explain how this works?
For example, I take the value (3, 3), so colNum is 3 and rowNum is 3.
multilist = [[0 for col in range(colNum)]for row in range(rowNum)]
creates a list of lists filled with 0, with the specified number of rows and columns.
For example, if colNum is 4 and rowNum is 4, you get a multilist like the one below.
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
Here, the syntax of the list comprehension is
[ [ output_expression for (columns to iterate) ] for (rows to iterate) ]
and 0 is your output expression.
Now another example, in which the output expression adds the row index and the column index:
multilist = [[col+row for col in range(4)]for row in range(4)]
and the output is
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
def print_multilist(rowNum, colNum):
    multilist = [[0 for col in range(colNum)]for row in range(rowNum)]
    print(multilist)

print_multilist(3,3) #prints [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print_multilist(2,2) #prints [[0, 0], [0, 0]]
The line that builds multilist is a list comprehension. It is equivalent to the following code, which does not use list comprehensions:
def print_multilist(rowNum, colNum):
    multilist = []
    for row in range(rowNum):
        multilist.append([])
        for col in range(colNum):
            multilist[row].append(0)
    print(multilist)

print_multilist(3,3) #prints [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print_multilist(2,2) #prints [[0, 0], [0, 0]]
List comprehensions are very useful in Python, as they shorten code like the above.
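One practical consequence worth noting: the inner comprehension is evaluated once per row, so each row is a separate list and can be changed without affecting the others. A small sketch (make_grid is just an illustrative name):

```python
def make_grid(row_num, col_num):
    # The inner comprehension runs once for every row, building a new list each time.
    return [[0 for col in range(col_num)] for row in range(row_num)]

grid = make_grid(2, 3)
grid[0][0] = 7      # only the first row changes
print(grid)
```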
I'm trying to construct permutations of a list with the i-th bit flipped every time.
For example:
With input:
[1,0,0,0]
To get:
[0,0,0,0]
[1,1,0,0]
[1,0,1,0]
[1,0,0,1]
I wrote a method which given a list returns the same list with the bit at position p changed:
def flipBit(l,p):
    l[p] = ~l[p]&1
    return l
And I'm trying to apply it using a map(), but I can't even get a basic example working:
p=list(map(flipBit, [[1,0,0]]*3,range(3))))
This is what it returns:
[[0, 1, 1], [0, 1, 1], [0, 1, 1]]
while expecting:
[[0, 0, 0], [1, 1, 0], [1, 0, 1]]
What am I doing wrong? (If anyone can suggest even shorter code for this, maybe without a flipBit method, I'd appreciate it, as I won't use flipBit for anything else and I want to keep the code concise and clean.)
The issue is that [[1,0,0]]*3 creates a list containing three references to the same sublist. When you change one sublist, they all change.
Here is one way to fix this:
>>> list(map(flipBit, [[1,0,0] for _ in range(3)], range(3)))
↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
[[0, 0, 0], [1, 1, 0], [1, 0, 1]]
And here is a way to implement this functionality without using a helper function:
>>> l = [1, 0, 0]
>>> [l[:i] + [1-l[i]] + l[i+1:] for i in range(len(l))]
[[0, 0, 0], [1, 1, 0], [1, 0, 1]]
The code you posted was not valid at all, but I presume you need this:
>>> p=list(map(lambda x: flipBit([1, 0, 0], x), range(3)))
>>> p
[[0, 0, 0], [1, 1, 0], [1, 0, 1]]
Basically, you map with a lambda function that fixes a fresh [1, 0, 0] as l and passes each element of range(3) as p.
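Another way to sidestep the shared-reference problem is to make the helper return a modified copy instead of mutating its argument. This is only a sketch; flip_bit is a hypothetical variant of the flipBit function from the question:

```python
def flip_bit(bits, p):
    """Return a copy of bits with the bit at position p flipped."""
    flipped = list(bits)   # shallow copy, so the caller's list is untouched
    flipped[p] ^= 1        # XOR with 1 turns 0 into 1 and 1 into 0
    return flipped

base = [1, 0, 0, 0]
variants = [flip_bit(base, i) for i in range(len(base))]
print(variants)

# Because nothing is mutated, even the aliased input from the question works.
aliased = list(map(flip_bit, [[1, 0, 0]] * 3, range(3)))
print(aliased)
```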