setting an array element with a list - python-3.x

I'd like to create a numpy array with 3 columns (not really), the last of which will be a list of variable lengths (really).
N = 2
A = numpy.empty((N, 3))
for i in range(N):
a = random.uniform(0, 1/2)
b = random.uniform(1/2, 1)
c = []
A[i,] = [a, b, c]
Over the course of execution I will then append or remove items from the lists. I used numpy.empty to initialize the array since this is supposed to give an object type, even so I'm getting the 'setting an array with a sequence error'. I know I am, that's what I want to do.
Previous questions on this topic seem to be about avoiding the error; I need to circumvent the error. The real array has 1M+ rows, otherwise I'd consider a dictionary. Ideas?

Initialize A with
A = numpy.empty((N, 3), dtype=object)
per numpy.empty docs. This is more logical than A = numpy.empty((N, 3)).astype(object) which first creates an array of floats (default data type) and only then casts it to object type.

Related

Conditionally use parts of a nested for loop

I've searched for this answer extensively, but can't seem to find an answer. Therefore, for the first time, I am posting a question here.
I have a function that uses many parameters to perform a calculation. Based on user input, I want to iterate through possible values for some (or all) of the parameters. If I wanted to iterate through all of the parameters, I might do something like this:
for i in range(low1,high1):
for j in range(low2,high2):
for k in range(low3,high3):
for m in range(low4,high4):
doFunction(i, j, k, m)
If I only wanted to iterate the 1st and 4th parameter, I might do this:
for i in range(low1,high1):
for m in range(low4,high4):
doFunction(i, user_input_j, user_input_k, m)
My actual code has almost 15 nested for-loops with 15 different parameters - each of which could be iterable (or not). So, it isn't scalable for me to use what I have and code a unique block of for-loops for each combination of a parameter being iterable or not. If I did that, I'd have 2^15 different blocks of code.
I could do something like this:
if use_static_j == True:
low2 = -999
high2 = -1000
for i in range(low1,high1):
for j in range(low2,high2):
for k in range(low3,high3):
for m in range(low4,high4):
j1 = j if use_static_j==False else user_input_j
doFunction(i, j1, k, m)
I'd just like to know if there is a better way. Perhaps using filter(), map(), or list comprehension... (which I don't have a clear enough understanding of yet)
As suggested in the comments, you could build an array of the parameters and then call the function with each of the values in the array. The easiest way to build the array is using recursion over a list defining the ranges for each parameter. In this code I've assumed a list of tuples consisting of start, stop and scale parameters (so for example the third element in the list produces [3, 2.8, 2.6, 2.4, 2.2]). To use a static value you would use a tuple (static, static+1, 1).
def build_param_array(ranges):
r = ranges[0]
if len(ranges) == 1:
return [[p * r[2]] for p in range(r[0], r[1], -1 if r[1] < r[0] else 1)]
res = []
for p in range(r[0], r[1], -1 if r[1] < r[0] else 1):
pa = build_param_array(ranges[1:])
for a in pa:
res.append([p * r[2]] + a)
return res
# range = (start, stop, scale)
ranges = [(1, 5, 1),
(0, 10, .1),
(15, 10, .2)
]
params = build_param_array(ranges)
for p in params:
doFunction(*p)

Finding multiple indexes but the array always has a length of 1

This seems trivial (again) but has me stumped.
I need to find the indexes of multiple values in a numpy array. I can do this with where and isin but the resulting answer always has a length of 1 regardless of how many indexes are found. Example
import numpy as np
a = [1,3,5,7,9,11,13,15]
b = [1,7,13]
x = np.where(np.isin(a,b))
print(x)
print(len(x))
this returns
(array([0, 3, 6]),)
1
I think its because the array is a single item inside a tuple. How do I return just the array?
Just use
x = np.where(np.isin(a,b))[0]
to get what you expect.
As hpaulj points out in the comments where returns a tuple with one array for each input array dimension, in this case there is only one, which is why x is a tuple of length one.

comparing two arrays and get the values which are not common

I am doing this problem a friend gave me where you are given 2 arrays say (a[1,2,3,4] and b[8,7,9,2,1]) and you have to find not common elements.
Expected output is [3,4,8,7,9]. Code below.
def disjoint(e,f):
c = e[:]
d = f[:]
for i in range(len(e)):
for j in range(len(f)):
if e[i] == f[j]:
c.remove(e[i])
d.remove(d[j])
final = c + d
print(final)
print(disjoint(a,b))
I tried with nested loops and creating copies of given arrays to modify them then add them but...
def disjoint(e,f):
c = e[:] # list copies
d = f[:]
for i in range(len(e)):
for j in range(len(f)):
if e[i] == f[j]:
c.remove(c[i]) # edited this line
d.remove(d[j])
final = c + d
print(final)
print(disjoint(a,b))
when I try removing common element from list copies, I get different output [2,4,8,7,9]. why ??
This is my first question in this website. I'll be thankful if anyone can clear my doubts.
Using sets you can do:
a = [1,2,3,4]
b = [8,7,9,2,1]
diff = (set(a) | set(b)) - (set(a) & set(b))
(set(a) | set(b)) is the union, set(a) & set(b) is the intersection and finally you do the difference between the two sets using -.
Your bug comes when you remove the elements in the lines c.remove(c[i]) and d.remove(d[j]). Indeed, the common elements are e[i]and f[j] while c and d are the lists you are updating.
To fix your bug you only need to change these lines to c.remove(e[i]) and d.remove(f[j]).
Note also that your method to delete items in both lists will not work if a list may contain duplicates.
Consider for instance the case a = [1,1,2,3,4] and b = [8,7,9,2,1].
You can simplify your code to make it works:
def disjoint(e,f):
c = e.copy() # [:] works also, but I think this is clearer
d = f.copy()
for i in e: # no need for index. just walk each items in the array
for j in f:
if i == j: # if there is a match, remove the match.
c.remove(i)
d.remove(j)
return c + d
print(disjoint([1,2,3,4],[8,7,9,2,1]))
Try it online!
There are a lot of more effecient way to achieve this. Check this stack overflow question to discover them: Get difference between two lists. My favorite way is to use set (like in #newbie's answer). What is a set? Lets check the documentation:
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built-in dict, list, and tuple classes, and the collections module.)
emphasis mine
Symmetric difference is perfect for our need!
Returns a new set with elements in either the set or the specified iterable but not both.
Ok here how to use it in your case:
def disjoint(e,f):
return list(set(e).symmetric_difference(set(f)))
print(disjoint([1,2,3,4],[8,7,9,2,1]))
Try it online!

Efficiently Perform Nested Dictionary Lookups and List Appending Using Numpy Nonzero Indices

I have working code to perform a nested dictionary lookup and append results of another lookup to each key's list using the results of numpy's nonzero lookup function. Basically, I need a list of strings appended to a dictionary. These strings and the dictionary's keys are hashed at one point to integers and kept track of using separate dictionaries with the integer hash as the key and the string as the value. I need to look up these hashed values and store the string results in the dictionary. It's confusing so hopefully looking at the code helps. Here's a simplified version of code:
for key in ResultDictionary:
ResultDictionary[key] = []
true_indices = np.nonzero(numpy_array_of_booleans)
for idx in range(0, len(true_indices[0])):
ResultDictionary.get(HashDictA.get(true_indices[0][idx])).append(HashDictB.get(true_indices[1][idx]))
This code works for me, but I am hoping there's a way to improve the efficiency. I am not sure if I'm limited due to the nested lookup. The speed is also dependent on the number of true results returned by the nonzero function. Any thoughts on this? Appreciate any suggestions.
Here are two suggestions:
1) since your hash dicts are keyed with ints it might help to transform them into arrays or even lists for faster lookup if that is an option.
k, v = map(list, (HashDictB.keys(), HashDictB.values())
mxk, mxv = max(k), max(v, key=len)
lookupB = np.empty((mxk+1,), dtype=f'U{mxv}')
lookupB[k] = v
2) you probably can save a number of lookups in ResultDictionary and HashDictA by processing your numpy_array_of_booleans row-wise:
i, j = np.where(numpy_array_of_indices)
bnds, = np.where(np.r_[True, i[:-1] != i[1:], True])
ResultDict = {HashDictA[i[l]]: [HashDictB[jj] for jj in j[l:r]] for l, r in zip(bnds[:-1], bnds[1:])}
2b) if for some reason you need to incrementally add associations you could do something like (I'll shorten variable names for that)
from operator import itemgetter
res = {}
def add_batch(data, res, hA, hB):
i, j = np.where(data)
bnds, = np.where(np.r_[True, i[:-1] != i[1:], True])
for l, r in zip(bnds[:-1], bnds[1:]):
if l+1 == r:
res.setdefault(hA[i[l]], set()).add(hB[j[l]])
else:
res.setdefault(hA[i[l]], set()).update(itemgetter(*j[l:r])(hB))
You can't do much about the dictionary lookups - you have to do those one at a time.
You can clean up the array indexing a bit:
idxes = np.argwhere(numpy_array_of_booleans)
for i,j in idxes:
ResultDictionary.get(HashDictA.get(i)).append(HashDictB.get(j)
argwhere is transpose(nonzero(...)), turning the tuple of arrays into a (n,2) array of index pairs. I don't think this makes a difference in speed, but the code is cleaner.

Switching positions of two strings within a list

I have another question that I'd like input on, of course no direct answers just something to point me in the right direction!
I have a string of numbers ex. 1234567890 and I want 1 & 0 to change places (0 and 9) and for '2345' & '6789' to change places. For a final result of '0678923451'.
First things I did was convert the string into a list with:
ex. original = '1234567890'
original = list(original)
original = ['0', '1', '2' etc....]
Now, I get you need to pull the first and last out, so I assigned
x = original[0]
and
y = original[9]
So: x, y = y, x (which gets me the result I'm looking for)
But how do I input that back into the original list?
Thanks!
The fact that you 'pulled' the data from the list in variables x and y doesn't help at all, since those variables have no connection anymore with the items from the list. But why don't you swap them directly:
original[0], original[9] = original[9], original[0]
You can use the slicing operator in a similar manner to swap the inner parts of the list.
But, there is no need to create a list from the original string. Instead, you can use the slicing operator to achieve the result you want. Note that you cannot swap the string elements as you did with lists, since in Python strings are immutable. However, you can do the following:
>>> a = "1234567890"
>>> a[9] + a[5:9] + a[1:5] + a[0]
'0678923451'
>>>

Resources