check if a number has its equivalent negative number in a list - python-3.x

I need to traverse a list and check whether its members also have their negative counterparts in the same list.
Note: the list will not contain duplicates, and performance matters.
from collections import defaultdict

l = [2, -3, 1, 3, 5, 0, -5, -2]
d = defaultdict(list)
for num in l:
    # Split the numbers by sign (0 ends up in 'neg')
    if num > 0:
        d['pos'].append(num)
    else:
        d['neg'].append(num)
print(d)
I'm not sure how to proceed further. Could you help, please?
l = [2, -3, 1, 3, 5, 0, -5, -2]
output: [[2,-2],[-3,3],[5,-5]]

Since the list cannot contain duplicates, you do not need to worry about using a number more than once, so a set is enough:
numbers = [2, -3, 1, 3, 5, 0, -5, -2]
unique_numbers = set(numbers)
paired = [[number, -number] for number in numbers
          if number > 0 and -number in unique_numbers]
print(paired)
This prints [[2, -2], [3, -3], [5, -5]].
Sets support O(1) membership checks on average. Constructing one from an existing list costs O(n), where n is the number of elements in the list. Iterating over the list to compute the pairs is again O(n), so the whole thing runs in O(n) total, at the cost of some extra memory for the set (again around n elements).
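The output in the question lists each pair starting with whichever sign appears first in the list ([-3, 3] rather than [3, -3]). If that ordering matters, the same set-based idea can be adapted as in the sketch below. The function name signed_pairs and the extra emitted set are illustrative assumptions, not part of the answer above.

```python
def signed_pairs(numbers):
    """Pair each number with its negative, in order of first appearance.

    Assumes the list has no duplicates, as stated in the question.
    """
    present = set(numbers)   # O(1) average membership checks
    emitted = set()          # values already placed in a pair
    pairs = []
    for number in numbers:
        # 0 is its own negative, so it is skipped explicitly
        if number != 0 and -number in present and number not in emitted:
            pairs.append([number, -number])
            emitted.update((number, -number))
    return pairs


print(signed_pairs([2, -3, 1, 3, 5, 0, -5, -2]))
# [[2, -2], [-3, 3], [5, -5]]
```

This stays O(n) in time, with a second set as the only extra cost.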

If you don't care much about performance, a plain list comprehension works too. It is O(n^2), because each `-x in l` check scans the list:
[[x, -x] for x in l if x > 0 and -x in l]


Is it possible to perform row-wise tensor operations if two PyTorch tensors do not have the same size without using list comprehensions?

This is a pretty specific usage case, but I'm hoping someone out there is more familiar with PyTorch tensors than I am and can help me speed this up.
I'm working on implementing a custom similarity metric for a neural network and have successfully gotten it to work, but it is incredibly slow to calculate. Each epoch takes about a minute to run, which simply isn't going to work with how I wanted to compare it with other metrics. So, I've been trying to utilize PyTorch tensors more effectively to speed things up, but haven't had much success.
Basically, I need to sum up the integers in the 'counts' tensor between the min and max indices specified in the 'min' and 'max' tensors for each sample and cluster combination.
As mentioned, my original implementation using loops took about a minute per epoch to run, but I did manage to reduce that to about 18-20 seconds using list comprehensions:
# counts has size (16, 100), max and min have size (2708, 7, 16)
data_mass = torch.sum(torch.tensor([[[torch.pow(torch.sum(counts[k][min[i][j][k]:max[i][j][k]+1]) / divisor, 2) for k in range(len(counts))] for j in range(len(min[i]))] for i in range(len(min))]), 2)
This feels super janky, and I've seen some clever things done with PyTorch functions, but I haven't been able to find anything yet that addresses quite what I want to do. Thanks in advance! I'm happy to clarify anything that may not be clear, I understand the use case is a bit convoluted.
EDIT: I'll try and break down the code snippet above and provide a minimal example. Examples of minimal inputs might look like the following:
'min' and 'max' are both 3-dimensional tensors of shape (num_samples, num_clusters, num_features), such as this one of size (2, 3, 4)
min = tensor([[[1, 2, 3, 1],
               [2, 1, 1, 2],
               [1, 2, 2, 1]],
              [[2, 3, 2, 1],
               [3, 3, 1, 2],
               [1, 0, 2, 1]]])
max = tensor([[[3, 3, 4, 4],
               [3, 2, 3, 4],
               [2, 4, 3, 2]],
              [[4, 4, 3, 3],
               [4, 4, 2, 3],
               [2, 1, 3, 2]]])
'counts' is a 2-dimensional tensor of size (num_features, num_bins), so for this example we'll say size (4, 5):
counts = tensor([[1, 2, 3, 4, 5],
                 [2, 5, 3, 1, 1],
                 [1, 2, 3, 4, 5],
                 [2, 5, 3, 1, 1]])
The core of the code snippet above is summing each row of the counts tensor between the indices given by the min and max tensors, once for every (sample, cluster, feature) position in min/max. For the first sample/cluster combo above:
mins = [1, 2, 3, 1]
maxes = [3, 3, 4, 4]
# Starting with feature #1 (leftmost element of min/max, top row of counts),
# we sum the values in counts between the indices specified by min and max:
min_value = mins[0] = 1
max_value = maxes[0] = 3
counts[0] = [1, 2, 3, 4, 5]
subset = counts[0][mins[0]:maxes[0]+1] = [2, 3, 4]
torch.sum(subset) = 9
#Second feature
min_value = mins[1] = 2
max_value = maxes[1] = 3
counts[1] = [2, 5, 3, 1, 1]
subset = counts[1][mins[1]:maxes[1]+1] = [3, 1]
torch.sum(subset) = 4
In my code snippet, I perform a few additional operations, but if we ignore those and just sum all the index pairs, the output will have the form
pre_sum_output = tensor([[[9, 4, 9, 10],
                          [7, 8, 9, 5],
                          [5, 5, 7, 8]],
                         [[12, 2, 7, 9],
                          [9, 2, 5, 4],
                          [5, 7, 7, 8]]])
Finally, I sum the output one final time along the third dimension:
data_mass = torch.sum(pre_sum_output, 2) = torch.tensor([[32, 29, 25],
                                                         [30, 20, 27]])
I then need to repeat this for every pair of mins and maxes in 'min' and 'max' (each [i][j][k]), hence the list comprehension above iterating through i and j to get each sample and cluster respectively.
By noticing that torch.sum(counts[0][mins[0]:maxes[0]+1]) is equal to cumsum[maxes[0]] - cumsum[mins[0]-1], where cumsum = torch.cumsum(counts[0], dim=0) (and the subtracted term is 0 when mins[0] is 0), you can get rid of the loops like so:
# Dim of sample, clusters, etc.
S, C, F, B = range(4)
# Copy min and max over bins
min = min.unsqueeze(B)
max = max.unsqueeze(B)
# Copy counts over samples and clusters
counts = counts.reshape(
    1,  # S
    1,  # C
    *counts.shape  # F x B
)
# Number of samples, clusters, etc.
ns, nc, nf, nb = min.size(S), min.size(C), min.size(F), counts.size(B)
# Calculate cumulative sum and copy over samples and clusters
cum_counts = counts.cumsum(dim=B).expand(ns, nc, nf, nb)
# Prevent index error when min index is 0
is_zero = min == 0
lo = (min - 1).masked_fill(is_zero, 0)
# Compute the contiguous sum from min to max (inclusive)
lo_sum = cum_counts.gather(dim=B, index=lo)
hi_sum = cum_counts.gather(dim=B, index=max)
sum_counts = torch.where(is_zero, hi_sum, hi_sum - lo_sum)
pre_sum_output = sum_counts.squeeze(B)
You can then sum pre_sum_output over dim 2 (the feature dimension) to get data_mass.
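The cumulative-sum identity this answer relies on can be checked without PyTorch. The sketch below uses plain Python lists and the first sample/cluster combination from the question. The helper name range_sum is an illustrative assumption, and this is only a check of the identity, not a replacement for the vectorised code.

```python
from itertools import accumulate


def range_sum(prefix, lo, hi):
    """Sum of row[lo..hi] inclusive, where prefix = list(accumulate(row))."""
    # When lo is 0 there is nothing to subtract (the is_zero case above)
    return prefix[hi] - (prefix[lo - 1] if lo > 0 else 0)


counts = [[1, 2, 3, 4, 5],
          [2, 5, 3, 1, 1],
          [1, 2, 3, 4, 5],
          [2, 5, 3, 1, 1]]
prefix = [list(accumulate(row)) for row in counts]

mins = [1, 2, 3, 1]
maxes = [3, 3, 4, 4]

via_prefix = [range_sum(prefix[k], mins[k], maxes[k]) for k in range(len(counts))]
via_slices = [sum(counts[k][mins[k]:maxes[k] + 1]) for k in range(len(counts))]

print(via_prefix)       # [9, 4, 9, 10]
print(sum(via_prefix))  # 32
```

Each range sum then costs two lookups instead of a slice plus a sum, which is what lets gather do the work for every (sample, cluster, feature) position at once.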

LIS on two arrays

I feel lost on how to approach this question.
Given two integer arrays of sizes n and m, I want to merge them into one array such that the order of the elements within each array doesn't change and the length of the merged array's Longest Increasing Subsequence (LIS) is as large as possible.
Once we choose an element of A or B, we cannot choose an earlier element of that sequence.
My goal is to find the maximum possible length of the longest increasing subsequence.
This is what I have so far:
def sequences(a, b, start_index=0, min_val=None):
    limits = a[start_index], b[start_index]
    lower = min(limits)
    higher = max(limits)
    if min_val is not None and min_val > lower:
        lower = min_val
    options = range(lower, higher + 1)
    is_last = start_index == len(a) - 1
    for val in options:
        if is_last:
            yield [val]
        else:
            for seq in sequences(a, b, start_index+1, min_val=val+1):
                yield [val, *seq]

for seq in sequences([1,3,1,6], [6,5,4,4]):
    print(seq)
However, this results in: [1, 3, 4, 5], [1, 3, 4, 6], [2, 3, 4, 5], [2, 3, 4, 6].
The expected output should be:
array1: [1,3,1,6]
array2: [6,5,4,4]
We take 1 (from array1), 3 (from array1), 4 (from array2), 6 (from array1),
giving us the LIS [1,3,4,6].
We got this by not choosing an earlier element from a sequence once we are at a certain value.
How do I stop it from unwanted recursion?

understanding the working principle of sorted function python [duplicate]

I have the following Python dict, shown here as a list of its (key, value) items:
[(2, [3, 4, 5]), (3, [1, 0, 0, 0, 1]), (4, [-1]), (10, [1, 2, 3])]
Now I want to sort the items by the sum of each value list, so for the first key the sum is 3+4+5=12.
I have written the following code that does the job:
# Python 2 code: the print statement, cmp() and sort(comparator) are not available in Python 3
def myComparator(a,b):
    print "Values(a,b): ",(a,b)
    sum_a=sum(a[1])
    sum_b=sum(b[1])
    print sum_a,sum_b
    print "Comparision Returns:",cmp(sum_a,sum_b)
    return cmp(sum_a,sum_b)

items.sort(myComparator)
print items
This is the output I get after running the code above:
Values(a,b): ((3, [1, 0, 0, 0, 1]), (2, [3, 4, 5]))
2 12
Comparision Returns: -1
Values(a,b): ((4, [-1]), (3, [1, 0, 0, 0, 1]))
-1 2
Comparision Returns: -1
Values(a,b): ((10, [1, 2, 3]), (4, [-1]))
6 -1
Comparision Returns: 1
Values(a,b): ((10, [1, 2, 3]), (3, [1, 0, 0, 0, 1]))
6 2
Comparision Returns: 1
Values(a,b): ((10, [1, 2, 3]), (2, [3, 4, 5]))
6 12
Comparision Returns: -1
[(4, [-1]), (3, [1, 0, 0, 0, 1]), (10, [1, 2, 3]), (2, [3, 4, 5])]
Now I am unable to understand how the comparator works: which two values are passed to it, and how many such comparisons happen? Is it creating a sorted list of keys internally where it keeps track of each comparison made? Also, the behavior seems to be very random. I am confused; any help would be appreciated.
The number of comparisons, and which ones are made, is not documented and can in fact change freely between implementations. The only guarantee is that, if the comparison function is consistent, the method will sort the list.
CPython uses the Timsort algorithm to sort lists, so what you see is the order in which that algorithm performs the comparisons (if I'm not mistaken, for very short lists Timsort just uses insertion sort).
Python is not keeping track of "keys". It just calls your comparison function every time a comparison is made. So your function can be called many more than len(items) times.
If you want to use keys you should use the key argument. In fact you could do:
items.sort(key=lambda x: sum(x[1]))
This will create the keys and then sort using the usual comparison operator on the keys. This is guaranteed to call the function passed by key only len(items) times.
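To make the difference in call counts concrete, here is a small Python 3 sketch. It counts how often a key function and a comparison function are invoked on the list from the question. The names sum_key, sum_cmp and calls are illustrative, and functools.cmp_to_key stands in for Python 2's sort(comparator), which no longer exists in Python 3.

```python
from functools import cmp_to_key

items = [(2, [3, 4, 5]), (3, [1, 0, 0, 0, 1]), (4, [-1]), (10, [1, 2, 3])]
calls = {"key": 0, "cmp": 0}


def sum_key(pair):
    # Called exactly once per element
    calls["key"] += 1
    return sum(pair[1])


def sum_cmp(a, b):
    # Called once per comparison the sort decides to make
    calls["cmp"] += 1
    sum_a, sum_b = sum(a[1]), sum(b[1])
    return (sum_a > sum_b) - (sum_a < sum_b)  # same contract as Python 2's cmp()


by_key = sorted(items, key=sum_key)
by_cmp = sorted(items, key=cmp_to_key(sum_cmp))

print(by_key == by_cmp)  # True
print(calls["key"])      # 4, one call per element
print(calls["cmp"])      # implementation-dependent, at least len(items) - 1
```

The exact number of comparison calls is deliberately left out of the comments, since it depends on the sorting implementation.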
Given that your list is:
[a,b,c,d]
The sequence of comparisons you are seeing is:
b < a # -1 true --> [b, a, c, d]
c < b # -1 true --> [c, b, a, d]
d < c # 1 false
d < b # 1 false
d < a # -1 true --> [c, b, d, a]
"how the comparator is working"
This is well documented:
Compare the two objects x and y and return an integer according to the outcome. The return value is negative if x < y, zero if x == y and strictly positive if x > y.
Instead of calling the cmp function you could have written:
sum_a=sum(a[1])
sum_b=sum(b[1])
if sum_a < sum_b:
    return -1
elif sum_a == sum_b:
    return 0
else:
    return 1
"which two values are being passed"
From your print statements you can see the two values that are passed. Let's look at the first iteration:
((3, [1, 0, 0, 0, 1]), (2, [3, 4, 5]))
What you are printing here is a tuple (a, b), so the actual values passed into your comparison functions are
a = (3, [1, 0, 0, 0, 1])
b = (2, [3, 4, 5])
By means of your function, you then compare the sum of the two lists in each tuple, which you denote sum_a and sum_b in your code.
"and how many such comparisons would happen?"
I guess what you are really asking: How does the sort work, by just calling a single function?
The short answer is: it uses the Timsort algorithm, which calls the comparison function O(n * log n) times in the worst case (that is, at most c * n * log n calls for some constant c > 0).
To understand what is happening, picture yourself sorting a list of values, say v = [4,2,6,3]. If you go about this systematically, you might do this:
1. start at the first value, at index i = 0
2. compare v[i] with v[i+1]
3. if v[i+1] < v[i], swap them
4. increase i and repeat from 2 until the pair at i == len(v) - 2 has been compared
5. start again at 1 until a full pass makes no further swaps
So you get, i =
0: 2 < 4 => [2, 4, 6, 3] (swap)
1: 6 < 4 => [2, 4, 6, 3] (no swap)
2: 3 < 6 => [2, 4, 3, 6] (swap)
Start again:
0: 4 < 2 => [2, 4, 3, 6] (no swap)
1: 3 < 4 => [2, 3, 4, 6] (swap)
2: 6 < 4 => [2, 3, 4, 6] (no swap)
Start again - there will be no further swaps, so stop. Your list is sorted. In this example we have run through the list 3 times, and there were 3 * 3 = 9 comparisons.
Obviously this is not very efficient -- the sort() method only calls your comparator function 5 times. The reason is that it employs a more efficient sort algorithm than the simple one explained above.
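The simple procedure walked through above can be written down and instrumented to count comparisons. This is a sketch of that naive pass-and-swap scheme only (the function name bubble_sort_counting is an assumption), not of what sort() does internally.

```python
def bubble_sort_counting(values):
    """Sort a copy of values with repeated adjacent swaps; count comparisons."""
    v = list(values)
    comparisons = 0
    swapped = True
    while swapped:                      # keep passing until a pass makes no swap
        swapped = False
        for i in range(len(v) - 1):     # i runs from 0 to len(v) - 2
            comparisons += 1
            if v[i + 1] < v[i]:
                v[i], v[i + 1] = v[i + 1], v[i]
                swapped = True
    return v, comparisons


print(bubble_sort_counting([4, 2, 6, 3]))
# ([2, 3, 4, 6], 9)
```

Three passes of three comparisons each give the 9 comparisons counted in the walkthrough.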
"Also the behavior seems to be very random."
Note that the sequence of values passed to your comparator function is not, in general, defined. However, the sort function does all the necessary comparisons between any two values of the iterable it receives.
"Is it creating a sorted list of keys internally where it keeps track of each comparison made?"
No, it is not keeping a list of keys internally. Rather the sorting algorithm essentially iterates over the list you give it. In fact it builds subsets of lists to avoid doing too many comparisons - there is a nice visualization of how the sorting algorithm works at Visualising Sorting Algorithms: Python's timsort by Aldo Cortesi
Basically, for a simple list such as [2, 4, 6, 3, 1] and the more complex list you provided, the sorting algorithm is the same.
The only differences are the complexity of the elements in the list and the scheme used to compare any two elements (e.g. the myComparator you provided).
There is a good description for Python Sorting: https://wiki.python.org/moin/HowTo/Sorting
First, the cmp() function:
cmp(...)
    cmp(x, y) -> integer

    Return negative if x<y, zero if x==y, positive if x>y.
You are using this line: items.sort(myComparator). Note that this passes the function itself, not its result: sort() calls myComparator(a, b) for each pair it needs to compare and uses the returned -1, 0 or 1 to order them.
Since you want to sort based on the sum of each tuple's list, you could do this:
mylist = [(2, [3, 4, 5]), (3, [1, 0, 0, 0, 1]), (4, [-1]), (10, [1, 2, 3])]
sorted(mylist, key=lambda pair: sum(pair[1]))
This does, I think, exactly what you wanted: it sorts mylist based on the sum() of each tuple's list. Note that sorted() returns a new list rather than sorting in place.

Reducing time complexity in comparing contiguous subarrays?

So say I have a list of sequences such as the one below.
I want to remove all sequences whose total sum equals N and/or that contain a contiguous subarray with sum equal to N.
For example, if N = 4, then (1,1,2) is not valid since its total is 4. (1,1,3) is also not valid since its subarray (1,3) sums to 4. (1,3,1) is not valid for the same reason.
lst = [
    (1,1,1), (1,1,2), (1,1,3),
    (1,2,1), (1,2,2), (1,2,3),
    (1,3,1), (1,3,2), (1,3,3),
    (2,1,1), (2,1,2), (2,1,3),
    (2,2,1), (2,2,2), (2,2,3),
    (2,3,1), (2,3,2), (2,3,3),
    (3,1,1), (3,1,2), (3,1,3),
    (3,2,1), (3,2,2), (3,2,3),
    (3,3,1), (3,3,2), (3,3,3)
]
E.g.
Input: 4 3
Output: 2 1 2
So what I have right now is
from itertools import combinations, product
lst = [t for t in list(product(range(1,n),repeat=n-1)) if not any((sum(t[l:h+1]) % n == 0) for l, h in combinations(range(len(t)), 2))]
Currently it is O(n^2), if I'm not mistaken. What would be a better way to do this?
If you can use numpy, you can concatenate the total sum of each tuple with the sums of adjacent elements, then check whether any of the resulting values equals 4 (for tuples of length 3, the adjacent pairs and the total cover every contiguous subarray of more than one element):
import numpy as np

arr = np.array(lst)
arr[~(np.concatenate((np.sum(arr,axis=1).reshape(-1,1),
                      (arr[:,:-1]+ arr[:,1:])),axis=1) == 4).any(1)]
# or:
arr[(np.concatenate((np.sum(arr,axis=1).reshape(-1,1),
                     (arr[:,:-1]+ arr[:,1:])),axis=1) != 4).all(1)]
Returning:
array([[1, 1, 1],
[1, 2, 3],
[2, 1, 2],
[2, 3, 2],
[2, 3, 3],
[3, 2, 1],
[3, 2, 3],
[3, 3, 2],
[3, 3, 3]])
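For tuples longer than three elements, adjacent pairs no longer cover every contiguous subarray. A common alternative, sketched here in plain Python under the assumption that "contiguous subarray" means any window of one or more elements, is to track running prefix sums in a set: a window sums to the target exactly when two prefix sums differ by it. The function name has_window_with_sum is illustrative and not taken from the answer above.

```python
from itertools import product


def has_window_with_sum(seq, target):
    """Return True if any contiguous window of seq sums to target."""
    seen = {0}        # prefix sums seen so far (0 = empty prefix)
    running = 0
    for value in seq:
        running += value
        # A window ending here sums to target iff running - target
        # was an earlier prefix sum
        if running - target in seen:
            return True
        seen.add(running)
    return False


lst = list(product(range(1, 4), repeat=3))
kept = [t for t in lst if not has_window_with_sum(t, 4)]
print(kept)
# [(1, 1, 1), (1, 2, 3), (2, 1, 2), (2, 3, 2), (2, 3, 3),
#  (3, 2, 1), (3, 2, 3), (3, 3, 2), (3, 3, 3)]
```

Each tuple is checked in time linear in its length, instead of summing every (l, h) slice separately. Single-element windows are included here, which makes no difference for this data because no element equals 4.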

How to find all numbers in a list that are not part of a pair - using python 3

I am trying to write a python 3 function that finds all numbers in a list (unspecified length) that are not part of a pair.
For example, given the list [1, 2, 1, 3, 2], the function will return 3; and given the list [0, 1, 1, 7, 8, 3, 9, 3, 9], the function will return 0, 7, and 8.
Thanks for your help!
You can use the following function :
>>> def find(l):
...     return (i for i in l if l.count(i)==1)
>>> l= [0, 1, 1, 7, 8, 3, 9, 3, 9]
>>> list(find(l))
[0, 7, 8]
This function returns a generator that yields the elements of the list whose count is equal to 1. Note that l.count(i) scans the whole list for every element, so this is O(n^2).
I can tell you how I would do it. What does "a pair" mean?
You should rather say: find all the numbers that occur an odd number of times in the array.
First plan (more efficient):
Sort the list; a single loop through the sorted list is then enough to count how many times each number occurs, and you can build the list to return as you go.
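A minimal sketch of this first plan, assuming "unpaired" means "occurs an odd number of times" (the function name unpaired_sorted is made up for illustration):

```python
def unpaired_sorted(numbers):
    """Return the values that occur an odd number of times, in sorted order."""
    ordered = sorted(numbers)          # O(n log n)
    result = []
    i = 0
    while i < len(ordered):
        # Advance j to the end of the run of equal values starting at i
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        if (j - i) % 2 == 1:           # odd run length: one value is left over
            result.append(ordered[i])
        i = j
    return result


print(unpaired_sorted([1, 2, 1, 3, 2]))              # [3]
print(unpaired_sorted([0, 1, 1, 7, 8, 3, 9, 3, 9]))  # [0, 7, 8]
```

After sorting, the loop visits every element once, so the sort dominates the running time.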
Second plan (nicer in Python, but more expensive because of the number of passes through the whole list):
Try Kasra's solution. The count method of list keeps the code short but does not help efficiency: it counts how many times the value i appears in the list l, scanning the whole list on every call.
Do the pairs need to be "closed pairs"? I mean: if you have three 1s, do you have one pair and one single 1, or are all the 1s considered paired? In the second case, Kasra's solution is OK. Otherwise you should compare:
if l.count(i) % 2 == 1
This can be easily and efficiently done in 3 lines with collections.Counter.
from collections import Counter

def unpaired(numbers):
    for key, count in Counter(numbers).items():
        if count % 2:
            yield key

print(list(unpaired([1, 2, 1, 3, 2])))
# [3]
print(list(unpaired([0, 1, 1, 7, 8, 3, 9, 3, 9])))
# [0, 7, 8]
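My answer handles the case where a number appears three times, i.e. one pair plus one single number without a pair.
def name(array):
    o = sorted(array)
    c = []
    d = []
    # Collect every occurrence of a value that appears an odd number of times
    for i in o:
        if o.count(i) % 2 == 1:
            c.append(i)
    # Remove the repeats while keeping the sorted order
    for j in c:
        if j not in d:
            d.append(j)
    return d
Alternatively, drop the for j in c loop and return directly (note that a set does not guarantee any order):
    return list(set(c))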
for example:
array = [0, 1, 1, 7, 8, 3, 9, 3, 9, 9]
output: [0, 7, 8, 9]
