Creating a 2D Array with all elements initially set to 'None' - python-3.x

I found the answer to this question very helpful, but I have never seen the keyword None used in such a way before and cannot understand what its function is in the below block of code:
def get_matrix(self, n, m):
    num = 1
    matrix = [[None for j in range(m)] for i in range(n)]  # <-- the line in question
    for i in range(len(matrix)):
        for j in range(len(matrix[i])):
            matrix[i][j] = num
            num += 1
    return matrix
If anyone is able to clarify, thank you in advance and I will rename the question to more accurately reflect the topic involved.

It's creating a 2D array of None before populating it. None is Python's null value, a placeholder that means "no value here yet". The actual placeholder value doesn't really matter, since every element is reassigned in the loops that follow, but None is the conventional choice for "not set yet", and since None is a singleton, every initial slot is just a reference to the same object.
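A minimal sketch of why the nested comprehension is used rather than the shorter-looking `[[None] * m] * n` (the variable names here are my own):

```python
# The nested comprehension creates n independent inner lists:
n, m = 2, 3
matrix = [[None for j in range(m)] for i in range(n)]
matrix[0][0] = 1
print(matrix)  # [[1, None, None], [None, None, None]]

# The naive [[None] * m] * n repeats ONE inner list n times,
# so writing to one row appears to change every row:
shared = [[None] * m] * n
shared[0][0] = 1
print(shared)  # [[1, None, None], [1, None, None]]
```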
Related - How to define a two-dimensional array?

Related

How can I avoid an “out of index” error with 2D arrays in Python?

The task is to iterate over all elements within a two-dimensional list and do some specific calculations on each element and its nearest neighbors:
count = 0
for i in range(0, len(arr)):
    for j in range(0, len(arr)):
        if arr[i][j] == 7 and is_perfect_cube(arr[i-1][j] + arr[i+1][j] + arr[i][j-1] + arr[i][j+1]):
            count += 1
Unfortunately, I keep getting an index out of range error. Based on what I’ve managed to debug so far, the error occurs for the first and last elements of the collection. I know I could use float[int], but I’m not sure how to apply it to my particular implementation. I couldn’t find any similar questions.
Your code explicitly uses elements that are out of the array's bounds. In this call,
is_perfect_cube(arr[i-1][j] + arr[i+1][j] + arr[i][j-1] + arr[i][j+1])
you are asking for arr[i-1][j]: for i=0, i-1 equals -1, which does not raise an error but fetches the last row (this wrap-around may even be desirable in some cases). You also ask for arr[i+1][j], which for the last row, i.e. i equal to len(arr)-1, is clearly out of bounds and causes the error. The same holds for arr[i][j+1], just in terms of columns rather than rows.
To handle this, you need to either ignore the end points (loop from 1 to size-2 in both dimensions) or modify your algorithm for the end points. The choice depends on the problem you are trying to solve: most established algorithms specify how the end points should be treated, so check what the standard treatment is for yours.
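A sketch of the first option (skipping the border cells so every neighbour index is guaranteed to be in bounds), assuming a rectangular grid; is_perfect_cube here is a stand-in for the asker's helper, whose real implementation isn't shown:

```python
# Stand-in for the asker's helper: integer cube-root check.
def is_perfect_cube(x):
    r = round(x ** (1 / 3)) if x >= 0 else -round((-x) ** (1 / 3))
    return r ** 3 == x

def count_sevens(arr):
    count = 0
    for i in range(1, len(arr) - 1):          # skip first and last row
        for j in range(1, len(arr[i]) - 1):   # skip first and last column
            if arr[i][j] == 7 and is_perfect_cube(
                arr[i - 1][j] + arr[i + 1][j] + arr[i][j - 1] + arr[i][j + 1]
            ):
                count += 1
    return count

# Neighbours of the centre 7 sum to 8, a perfect cube:
print(count_sevens([[0, 2, 0], [2, 7, 2], [0, 2, 0]]))  # 1
```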
count = 0
for i in range(0, len(arr)):
    for j in range(0, len(arr)):
        if arr[i][j] == 7:
            # caveat: arr[-1][j] is valid Python (it wraps to the last
            # row), so the except clause never fires for i == 0 or j == 0
            try:
                up = arr[i-1][j]
            except IndexError:
                up = 0
            try:
                down = arr[i+1][j]
            except IndexError:
                down = 0
            try:
                left = arr[i][j-1]
            except IndexError:
                left = 0
            try:
                right = arr[i][j+1]
            except IndexError:
                right = 0
            if is_perfect_cube(up + down + left + right):
                count += 1
This is my solution to my own question. I don't know if this is the correct way to solve out-of-bounds problems. I am still open to advice.
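One caveat with the try/except approach: arr[i-1] with i == 0 is arr[-1], which is valid Python and silently returns the last row instead of raising IndexError. A sketch of an explicit bounds check that treats missing neighbours as 0 like the code above intends (is_perfect_cube is again a stand-in for the asker's helper):

```python
# Stand-in for the asker's helper: integer cube-root check.
def is_perfect_cube(x):
    r = round(x ** (1 / 3)) if x >= 0 else -round((-x) ** (1 / 3))
    return r ** 3 == x

def count_sevens(arr):
    rows, cols = len(arr), len(arr[0])
    count = 0
    for i in range(rows):
        for j in range(cols):
            if arr[i][j] != 7:
                continue
            # Out-of-bounds neighbours count as 0; no negative-index
            # wrap-around, unlike the try/except version.
            up    = arr[i - 1][j] if i > 0 else 0
            down  = arr[i + 1][j] if i < rows - 1 else 0
            left  = arr[i][j - 1] if j > 0 else 0
            right = arr[i][j + 1] if j < cols - 1 else 0
            if is_perfect_cube(up + down + left + right):
                count += 1
    return count

# The corner 7's only nonzero neighbour is 8, a perfect cube:
print(count_sevens([[7, 8], [0, 0]]))  # 1
```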

Python pair sum problem in O(n) clarification

Given an integer array, output all of the unique pairs that sum up to a specific value k.
Here is the standard solution, in O(n) time:
if len(arr) < 2:
    return
seen = set()
output = set()
for num in arr:
    target = k - num
    if target not in seen:
        seen.add(num)
    else:
        output.add((min(num, target), max(num, target)))
return print('\n'.join(map(str, list(output))))
I have a few questions regarding this solution:
1) Why do we use a set to store the seen values of the array? Why not a list? Does this change anything?
2) Why the min(num, target), max(num, target)? Is this just for consistent formatting, or is there a deeper reason? At first I thought it was to deal with duplicate cases such as (1,3) and (3,1), but I don't think this solution runs into that case?
1) A set is used because membership tests (target not in seen) take O(1) on average for a set but O(n) for a list, so using a list would make the whole algorithm O(n²). A set also avoids storing duplicates, which are not needed here.
2) (min(num, target), max(num, target)) stores each pair in a canonical (smaller, larger) order, and that is exactly what handles the duplicate case you suspected: without it, the same pair could be added to the output set as both (1, 3) and (3, 1), which are distinct tuples. It also makes the final print more uniform.
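The snippet wrapped into a complete function to make both points concrete (the name pair_sum, and returning the set rather than printing it, are my choices):

```python
def pair_sum(arr, k):
    """Return the unique pairs from arr that sum to k."""
    if len(arr) < 2:
        return set()
    seen = set()     # O(1) membership tests, unlike a list's O(n)
    output = set()   # (min, max) ordering dedupes (1, 3) vs (3, 1)
    for num in arr:
        target = k - num
        if target not in seen:
            seen.add(num)
        else:
            output.add((min(num, target), max(num, target)))
    return output

# (1, 3) is found twice but stored once thanks to the canonical order:
print(pair_sum([1, 3, 2, 2, 3, 1], 4))  # {(1, 3), (2, 2)}
```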

Find unique rows in a numpy array

I know this question has been answered. But I do not understand something in the code. I am trying to find unique rows in a numpy array. The solution using structured arrays given as follows:
x is your input array:
y = np.ascontiguousarray(x).view(np.dtype((np.void, x.dtype.itemsize * x.shape[1])))
_, idx = np.unique(y, return_index=True)
unique_result = x[idx]
My question is that why we need this line:
y = np.ascontiguousarray(x).view(np.dtype((np.void, x.dtype.itemsize * x.shape[1])))
why cannot we use only:
_, idx = np.unique(x, return_index=True)
unique_result = x[idx]
You are asking a couple of questions. I am not sure where you found that solution, but I will explain why it was probably written that way. This:
_, idx = np.unique(x, return_index=True)
unique_result = x[idx]
does not work to find unique rows in an np.array because np.unique will flatten the given array if no axis is given. The y = np.ascontiguousarray(x).view(np.dtype((np.void, x.dtype.itemsize * x.shape[1]))) line was presumably added so that, even after flattening, each row being compared is represented as a single item (of type void). Running np.unique on that view then does return unique rows.
However, I do not see why any of that is necessary. You can just use unique directly while passing the axis you are interested in:
unique_result = np.unique(x, axis=0)
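A quick demonstration of the difference (the axis keyword was added in NumPy 1.13):

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [1, 2]])

# With axis=0, whole rows are compared, so the void-view trick is unnecessary:
print(np.unique(x, axis=0))
# [[1 2]
#  [3 4]]

# Without an axis, np.unique flattens first: unique *scalars*, not rows.
print(np.unique(x))  # [1 2 3 4]
```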

comparing two arrays and get the values which are not common

I am doing a problem a friend gave me: given two arrays, say a = [1,2,3,4] and b = [8,7,9,2,1], you have to find the elements that are not common to both.
The expected output is [3,4,8,7,9]. Code below.
def disjoint(e, f):
    c = e[:]
    d = f[:]
    for i in range(len(e)):
        for j in range(len(f)):
            if e[i] == f[j]:
                c.remove(e[i])
                d.remove(d[j])
    final = c + d
    print(final)

print(disjoint(a, b))
I tried nested loops, creating copies of the given arrays so I could modify them and then add them together, but...
def disjoint(e, f):
    c = e[:]  # list copies
    d = f[:]
    for i in range(len(e)):
        for j in range(len(f)):
            if e[i] == f[j]:
                c.remove(c[i])  # edited this line
                d.remove(d[j])
    final = c + d
    print(final)

print(disjoint(a, b))
when I try removing the common elements from the list copies this way, I get a different output: [2,4,8,7,9]. Why?
This is my first question in this website. I'll be thankful if anyone can clear my doubts.
Using sets you can do:
a = [1,2,3,4]
b = [8,7,9,2,1]
diff = (set(a) | set(b)) - (set(a) & set(b))
set(a) | set(b) is the union, set(a) & set(b) is the intersection, and the - takes the difference between the two sets.
Your bug comes from the lines c.remove(c[i]) and d.remove(d[j]): the common elements are e[i] and f[j], while c and d are the lists you are updating, so once they have shrunk, index i no longer points at the matching element.
To fix the bug, you only need to change these lines to c.remove(e[i]) and d.remove(f[j]).
Note also that your method to delete items in both lists will not work if a list may contain duplicates.
Consider for instance the case a = [1,1,2,3,4] and b = [8,7,9,2,1].
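For that duplicate case, a multiset-aware variant using collections.Counter is one option (my addition, not part of this answer's set-based approach): Counter subtraction keeps multiplicity, so the extra copy of a common value survives instead of being collapsed the way sets collapse it.

```python
from collections import Counter

def disjoint_multiset(e, f):
    ce, cf = Counter(e), Counter(f)
    only_e = ce - cf   # elements of e beyond their count in f
    only_f = cf - ce   # elements of f beyond their count in e
    return list(only_e.elements()) + list(only_f.elements())

# The second 1 in the first list is kept, since the other list has only one:
print(disjoint_multiset([1, 1, 2, 3, 4], [8, 7, 9, 2, 1]))  # [1, 3, 4, 8, 7, 9]
```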
You can simplify your code to make it work:
def disjoint(e, f):
    c = e.copy()  # [:] works also, but I think this is clearer
    d = f.copy()
    for i in e:       # no need for indices, just walk the items
        for j in f:
            if i == j:  # if there is a match, remove it from both copies
                c.remove(i)
                d.remove(j)
    return c + d

print(disjoint([1,2,3,4], [8,7,9,2,1]))
There are many more efficient ways to achieve this. Check this Stack Overflow question to discover them: Get difference between two lists. My favorite way is to use a set (as in #newbie's answer). What is a set? Let's check the documentation:
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built-in dict, list, and tuple classes, and the collections module.)
emphasis mine
Symmetric difference is perfect for our need!
Returns a new set with elements in either the set or the specified iterable but not both.
Ok here how to use it in your case:
def disjoint(e, f):
    return list(set(e).symmetric_difference(set(f)))

print(disjoint([1,2,3,4], [8,7,9,2,1]))

Efficiently Perform Nested Dictionary Lookups and List Appending Using Numpy Nonzero Indices

I have working code to perform a nested dictionary lookup and append results of another lookup to each key's list using the results of numpy's nonzero lookup function. Basically, I need a list of strings appended to a dictionary. These strings and the dictionary's keys are hashed at one point to integers and kept track of using separate dictionaries with the integer hash as the key and the string as the value. I need to look up these hashed values and store the string results in the dictionary. It's confusing so hopefully looking at the code helps. Here's a simplified version of code:
for key in ResultDictionary:
    ResultDictionary[key] = []

true_indices = np.nonzero(numpy_array_of_booleans)
for idx in range(0, len(true_indices[0])):
    ResultDictionary.get(HashDictA.get(true_indices[0][idx])).append(HashDictB.get(true_indices[1][idx]))
This code works for me, but I am hoping there's a way to improve the efficiency. I am not sure if I'm limited due to the nested lookup. The speed is also dependent on the number of true results returned by the nonzero function. Any thoughts on this? Appreciate any suggestions.
Here are two suggestions:
1) Since your hash dicts are keyed with ints, it might help to transform them into arrays, or even lists, for faster lookup, if that is an option:
k, v = list(HashDictB.keys()), list(HashDictB.values())
mxk, mxv = max(k), max(map(len, v))  # largest key, length of the longest value
lookupB = np.empty((mxk + 1,), dtype=f'U{mxv}')
lookupB[k] = v
2) you probably can save a number of lookups in ResultDictionary and HashDictA by processing your numpy_array_of_booleans row-wise:
i, j = np.where(numpy_array_of_booleans)
bnds, = np.where(np.r_[True, i[:-1] != i[1:], True])
ResultDict = {HashDictA[i[l]]: [HashDictB[jj] for jj in j[l:r]] for l, r in zip(bnds[:-1], bnds[1:])}
2b) If for some reason you need to incrementally add associations, you could do something like the following (with shortened variable names):
from operator import itemgetter

res = {}
def add_batch(data, res, hA, hB):
    i, j = np.where(data)
    bnds, = np.where(np.r_[True, i[:-1] != i[1:], True])
    for l, r in zip(bnds[:-1], bnds[1:]):
        if l + 1 == r:
            res.setdefault(hA[i[l]], set()).add(hB[j[l]])
        else:
            res.setdefault(hA[i[l]], set()).update(itemgetter(*j[l:r])(hB))
You can't do much about the dictionary lookups - you have to do those one at a time.
You can clean up the array indexing a bit:
idxes = np.argwhere(numpy_array_of_booleans)
for i, j in idxes:
    ResultDictionary.get(HashDictA.get(i)).append(HashDictB.get(j))
argwhere is transpose(nonzero(...)), turning the tuple of arrays into a (n,2) array of index pairs. I don't think this makes a difference in speed, but the code is cleaner.
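A toy run of the argwhere form, with small stand-ins for the question's HashDictA/HashDictB and boolean array (the values here are made up for illustration):

```python
import numpy as np

bools = np.array([[True, False],
                  [False, True]])
HashDictA = {0: 'row0', 1: 'row1'}   # hypothetical hash -> string maps
HashDictB = {0: 'colA', 1: 'colB'}

ResultDictionary = {v: [] for v in HashDictA.values()}
for i, j in np.argwhere(bools):      # (n, 2) array of True positions
    ResultDictionary[HashDictA[i]].append(HashDictB[j])

print(ResultDictionary)  # {'row0': ['colA'], 'row1': ['colB']}
```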
