Wrong set output for intervals - python-3.x

I came across an interview question and decided to try it.
The problem is as follows: Given a set of closed intervals, find the smallest set of numbers that covers all the intervals. If there are multiple smallest sets, return any of them.
For example, given the intervals [0, 3], [2, 6], [3, 4], [6, 9], one set of numbers that covers all these intervals is {3, 6}.
My code is below:
def findIntersection(intervals):
    """
    Find the intersection of a list of intervals.
    """
    # First interval
    l = intervals[0][0]  # lower component
    r = intervals[0][1]  # higher component
    # Check the rest of the intervals
    # and find the intersection
    for i in range(1, len(intervals)):
        interval = [l, r]
        # If no intersection exists
        if intersecting(intervals[i], interval):
            print()
        # Else update the intersection
        else:
            l = max(l, intervals[i][0])
            r = min(r, intervals[i][1])
            interval = [l, r]
    return [l, r]

def intersecting(x, y):
    """
    Return True if the two intervals x and y do not overlap.
    """
    return y[0] > x[1] or x[0] > y[1]

l = [[0, 3], [2, 6], [3, 4], [6, 9]]
print(findIntersection(l))  # this does not work
intervals = [[1, 6], [2, 8], [3, 10], [5, 8]]
print(findIntersection(intervals))  # this works
For input l: the output is [3,9] which is not the answer. The output with intervals is [5,6] which is expected.

Isn't it easier to work with the intersection function of Python's built-in set type? For example:
s1 = set([1, 2, 3])
s2 = set([2, 3, 4, 5])
print(s1.intersection(s2))
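A minimal sketch (not from either post) of why set intersection alone does not answer the original task: intersecting the integer ranges gives the points common to every interval, which is empty for the first input. Finding a smallest covering set is the classic interval point-cover (stabbing) problem; a standard greedy method, swapped in here for illustration, sorts the intervals by right endpoint and picks the right end of every interval not yet hit by the last chosen point. The helper names `common_points` and `min_point_cover` are hypothetical, and the set version assumes integer endpoints.

```python
def common_points(intervals):
    """Integers shared by every closed interval (the set-intersection idea)."""
    sets = [set(range(lo, hi + 1)) for lo, hi in intervals]
    return set.intersection(*sets)

def min_point_cover(intervals):
    """Greedy stabbing set: a smallest list of points hitting every interval.

    Process intervals by right endpoint; if the last chosen point lies left of
    the current interval, that interval is not covered yet, so choose its
    right endpoint.
    """
    points = []
    for lo, hi in sorted(intervals, key=lambda iv: iv[1]):
        if not points or points[-1] < lo:
            points.append(hi)
    return points

l = [[0, 3], [2, 6], [3, 4], [6, 9]]
intervals = [[1, 6], [2, 8], [3, 10], [5, 8]]
print(common_points(l), min_point_cover(l))                  # set() [3, 9]
print(common_points(intervals), min_point_cover(intervals))  # {5, 6} [6]
```

For the first input the greedy method returns [3, 9]; that has the same size as {3, 6}, so it is also a valid answer, since the task accepts any smallest set.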

Related

Is it possible to perform row-wise tensor operations if two PyTorch tensors do not have the same size without using list comprehensions?

This is a pretty specific usage case, but I'm hoping someone out there is more familiar with PyTorch tensors than I am and can help me speed this up.
I'm working on implementing a custom similarity metric for a neural network and have successfully gotten it to work, but it is incredibly slow to calculate. Each epoch takes about a minute to run, which simply isn't going to work with how I wanted to compare it with other metrics. So, I've been trying to utilize PyTorch tensors more effectively to speed things up, but haven't had much success.
Basically, I need to sum up the integers in the 'counts' tensor between the min and max indices specified in the 'min' and 'max' tensors for each sample and cluster combination.
As mentioned, my original implementation using loops took about a minute per epoch to run, but I did manage to reduce that to about 18-20 seconds using list comprehensions:
# counts has size (16, 100), max and min have size (2708, 7, 16)
data_mass = torch.sum(
    torch.tensor([[[torch.pow(torch.sum(counts[k][min[i][j][k]:max[i][j][k]+1]) / divisor, 2)
                    for k in range(len(counts))]
                   for j in range(len(min[i]))]
                  for i in range(len(min))]),
    2)
This feels super janky, and I've seen some clever things done with PyTorch functions, but I haven't been able to find anything yet that addresses quite what I want to do. Thanks in advance! I'm happy to clarify anything that may not be clear, I understand the use case is a bit convoluted.
EDIT: I'll try and break down the code snippet above and provide a minimal example. Examples of minimal inputs might look like the following:
'min' and 'max' are both 3-dimensional tensors of shape (num_samples, num_clusters, num_features), such as these of size (2, 3, 4):
min = tensor([[[1, 2, 3, 1],
               [2, 1, 1, 2],
               [1, 2, 2, 1]],
              [[2, 3, 2, 1],
               [3, 3, 1, 2],
               [1, 0, 2, 1]]])
max = tensor([[[3, 3, 4, 4],
               [3, 2, 3, 4],
               [2, 4, 3, 2]],
              [[4, 4, 3, 3],
               [4, 4, 2, 3],
               [2, 1, 3, 2]]])
'counts' is a 2-dimensional tensor of size (num_features, num_bins), so for this example we'll say size (4, 5):
counts = tensor([[1, 2, 3, 4, 5],
                 [2, 5, 3, 1, 1],
                 [1, 2, 3, 4, 5],
                 [2, 5, 3, 1, 1]])
The core part of the code snippet given above is the summation of the counts tensor between the values given by the min and max tensors for each pair of indices given at each index in max/min. For the first sample/cluster combo above:
mins = [1, 2, 3, 1]
maxes = [3, 3, 4, 4]
# Starting with feature #1 (leftmost element of min/max, top row of counts),
# we sum the values in counts between the indices specified by min and max:
min_value = mins[0] = 1
max_value = maxes[0] = 3
counts[0] = [1, 2, 3, 4, 5]
subset = counts[0][mins[0]:maxes[0]+1] = [2, 3, 4]
torch.sum(subset) = 9
# Second feature
min_value = mins[1] = 2
max_value = maxes[1] = 3
counts[1] = [2, 5, 3, 1, 1]
subset = counts[1][mins[1]:maxes[1]+1] = [3, 1]
torch.sum(subset) = 4
In my code snippet, I perform a few additional operations, but if we ignore those and just sum over all the index pairs, the output has the form:
pre_sum_output = tensor([[[9, 4, 9, 10],
                          [7, 8, 9, 5],
                          [5, 5, 7, 8]],
                         [[12, 2, 7, 9],
                          [9, 2, 5, 4],
                          [5, 7, 7, 8]]])
Finally, I sum the output one last time along the third dimension:
data_mass = torch.sum(pre_sum_output, 2) = torch.tensor([[32, 29, 25],
                                                         [30, 20, 27]])
I then need to repeat this for every pair of mins and maxes in 'min' and 'max' (each [i][j][k]), hence the list comprehension above iterating through i and j to get each sample and cluster respectively.
By noticing that torch.sum(counts[0][mins[0]:maxes[0]+1]) is equal to cumsum[maxes[0]] - cumsum[mins[0]-1], where cumsum = torch.cumsum(counts[0], dim=0) (or simply cumsum[maxes[0]] when mins[0] is 0), you can get rid of the loops like so:
# Dim of sample, clusters, etc.
S, C, F, B = range(4)
# Copy min and max over bins
min = min.unsqueeze(B)
max = max.unsqueeze(B)
# Copy counts over samples and clusters
counts = counts.reshape(
    1,              # S
    1,              # C
    *counts.shape   # F x B
)
# Number of samples, clusters, etc.
ns, nc, nf, nb = min.size(S), min.size(C), min.size(F), counts.size(B)
# Calculate cumulative sum and copy over samples and clusters
cum_counts = counts.cumsum(dim=B).expand(ns, nc, nf, nb)
# Prevent index error when min index is 0
is_zero = min == 0
lo = (min - 1).masked_fill(is_zero, 0)
# Compute the contiguous sum from min to max (inclusive)
lo_sum = cum_counts.gather(dim=B, index=lo)
hi_sum = cum_counts.gather(dim=B, index=max)
sum_counts = torch.where(is_zero, hi_sum, hi_sum - lo_sum)
pre_sum_output = sum_counts.squeeze(B)
You can then sum over dim 2 (the feature dimension), e.g. pre_sum_output.sum(dim=2), to get data_mass.
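To sanity-check the prefix-sum identity without PyTorch, here is a plain-Python sketch on the example data from the question (the extra divisor/pow step is left out). It pads each prefix-sum row with a leading 0, so `prefix[f][hi + 1] - prefix[f][lo]` is the inclusive sum even when `lo == 0`; the names `prefix`, `range_sum` and `maxs` are hypothetical.

```python
from itertools import accumulate

# Example data from the question, as plain nested lists (no PyTorch needed)
mins = [[[1, 2, 3, 1], [2, 1, 1, 2], [1, 2, 2, 1]],
        [[2, 3, 2, 1], [3, 3, 1, 2], [1, 0, 2, 1]]]
maxs = [[[3, 3, 4, 4], [3, 2, 3, 4], [2, 4, 3, 2]],
        [[4, 4, 3, 3], [4, 4, 2, 3], [2, 1, 3, 2]]]
counts = [[1, 2, 3, 4, 5],
          [2, 5, 3, 1, 1],
          [1, 2, 3, 4, 5],
          [2, 5, 3, 1, 1]]

# prefix[f][k] == sum(counts[f][:k]); the leading 0 handles lo == 0 without a mask
prefix = [[0, *accumulate(row)] for row in counts]

def range_sum(f, lo, hi):
    """Inclusive sum counts[f][lo..hi] from two prefix-sum lookups."""
    return prefix[f][hi + 1] - prefix[f][lo]

# Sum over features for every (sample, cluster) pair
data_mass = [[sum(range_sum(f, lo, hi) for f, (lo, hi) in enumerate(zip(lo_row, hi_row)))
              for lo_row, hi_row in zip(sample_lo, sample_hi)]
             for sample_lo, sample_hi in zip(mins, maxs)]

# Cross-check against direct slicing, as in the question's loops
direct = [[sum(sum(counts[f][lo:hi + 1]) for f, (lo, hi) in enumerate(zip(lo_row, hi_row)))
           for lo_row, hi_row in zip(sample_lo, sample_hi)]
          for sample_lo, sample_hi in zip(mins, maxs)]

print(data_mass)  # [[32, 29, 25], [30, 20, 27]]
```

In the tensor version, prepending a zero column to the cumulative sum should likewise make the `is_zero` mask unnecessary; that variant is an untested suggestion.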

Python return specific items from list in same line

I have a list of items from which I need to separate items with a specific "key". Let's say I need all items that follow "X". The list may look like this: Y1 1-2 X1 3-5 Z1 6-8, Y2 3-5 X2 5-7 Z2 5-9, so I need to take the X "values", which are 3-5 and 5-7. These should be returned as 3 4 5 and 5 6 7, each on its own line, so that they can be used in other functions.
I have also tried putting the "X"s into their own dictionary, but the problem is the same. I also know about end="", but it does not help me with this.
def get_x_values(list_parameter):
    list_of_items = []
    list_of_x = []
    for i in list_parameter:
        i = i.split(' ')
        for item in i:
            if item != '':
                list_of_items.append(item)
    for item, next_item in zip(list_of_items, list_of_items[1:]):
        if item == 'X':
            list_of_x.append(next_item)
    for x in list_of_x:
        for i in range(int(x[0]), int(x[-1]) + 1):
            yield i
When I loop through the yielded values, I get the X values like this:
3
4
5
5
6
7
But I need them this way:
3 4 5
5 6 7
Any help appreciated.
I modified your code so that it works:
def get_x_values(list_parameter):
    list_of_items = []
    for i in list_parameter:
        i = i.split(' ')
        for item in i:
            if item != '':
                list_of_items.append(item)
    for item, next_item in zip(list_of_items, list_of_items[1:]):
        if item == 'X':
            # "3-5" -> 3, 5 (splitting on "-" also handles multi-digit numbers)
            start, end = map(int, next_item.split('-'))
            range_list = list(range(start, end + 1))
            yield " ".join(str(number) for number in range_list)

lst = ["Y 1-2 X 3-5 Z 6-8", "Y 3-5 X 5-7 Z 5-9"]
result = get_x_values(lst)
for x in result:
    print(x)
This is not the most elegant solution, but I guess it's easier to understand because it's close to your own attempt.
I hope it helps. Let me know if you have any questions left. Have a nice day!
You need to:
1. split your list (you got it)
2. put key and value together (you use zip, I use a dict comprehension for that)
3. split your values into numbers and convert them to int
4. create a range from your int-converted values to fill in the missing numbers
For example, like so:
# there is a pesky , in your string, we strip it out
inp = "Y1 1-2 X1 3-5 Z1 6-8, Y2 3-5 X2 5-7 Z2 5-9"
formatted_input = [a.rstrip(",") for a in inp.split(" ")]
print(formatted_input)
# put keys and values together and convert values to int-list
as_dict = {formatted_input[a]: list(map(int, formatted_input[a + 1].split("-")))
           for a in range(0, len(formatted_input), 2)}
print(as_dict)
# create correct ranges from int-list
as_dict_ranges = {key: list(range(a, b + 1)) for key, (a, b) in as_dict.items()}
print(as_dict_ranges)
# you could put all the above in a function and yield the dict items from there:
#     yield from as_dict_ranges.items()
# and filter them for key = X... outside
# filter for keys that start with X
for k, v in as_dict_ranges.items():
    if k.startswith("X"):
        print(*v, sep=" ")  # unpack the values and print them separated on one line
Outputs:
# formatted_input
['Y1', '1-2', 'X1', '3-5', 'Z1', '6-8', 'Y2', '3-5', 'X2', '5-7', 'Z2', '5-9']
# as_dict
{'Y1': [1, 2], 'X1': [3, 5], 'Z1': [6, 8],
'Y2': [3, 5], 'X2': [5, 7], 'Z2': [5, 9]}
# as_dict_ranges
{'Y1': [1, 2], 'X1': [3, 4, 5], 'Z1': [6, 7, 8],
'Y2': [3, 4, 5], 'X2': [5, 6, 7], 'Z2': [5, 6, 7, 8, 9]}
# output for keys X...
3 4 5
5 6 7
You can omit one list conversion if you do not need to print the map(int, ...) values:
as_dict = {formatted_input[a]: map(int, formatted_input[a + 1].split("-"))
           for a in range(0, len(formatted_input), 2)}
Documentation:
range()
map
str.split()
str.rstrip()
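Combining both answers, here is a sketch of a generator that yields each X range as a list of ints, so the caller can print it on one line or pass it on to other functions. `x_ranges` is a hypothetical name, and the sketch assumes the input alternates key and value tokens as in the example string.

```python
def x_ranges(text, prefix="X"):
    """Yield the expanded integer range for every key that starts with prefix."""
    # split on whitespace and drop the stray comma after "6-8,"
    tokens = [tok.rstrip(",") for tok in text.split()]
    # tokens alternate key, value, key, value, ...
    for key, value in zip(tokens[::2], tokens[1::2]):
        if key.startswith(prefix):
            start, end = map(int, value.split("-"))
            yield list(range(start, end + 1))

inp = "Y1 1-2 X1 3-5 Z1 6-8, Y2 3-5 X2 5-7 Z2 5-9"
for numbers in x_ranges(inp):
    print(*numbers)  # prints "3 4 5", then "5 6 7"
```

Yielding lists rather than pre-joined strings keeps the numbers usable as ints; joining is left to whoever prints them.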

Reducing time complexity in comparing contiguous subarrays?

Say I have a list of sequences such as the one below.
I want to remove all sequences whose total sum is N and/or that contain a contiguous subarray with sum N.
For example, if N = 4, then (1,1,2) is not valid since its total is 4. (1,1,3) is also not valid since its subarray (1,3) also sums to 4. (1,3,1) is not valid for the same reason.
lst = [
(1,1,1), (1,1,2), (1,1,3),
(1,2,1), (1,2,2), (1,2,3),
(1,3,1), (1,3,2), (1,3,3),
(2,1,1), (2,1,2), (2,1,3),
(2,2,1), (2,2,2), (2,2,3),
(2,3,1), (2,3,2), (2,3,3),
(3,1,1), (3,1,2), (3,1,3),
(3,2,1), (3,2,2), (3,2,3),
(3,3,1), (3,3,2), (3,3,3)
]
E.g.
Input: 4 3
Output: 2 1 2
So what I have right now is:
from itertools import combinations, product

lst = [t for t in product(range(1, n), repeat=n - 1)
       if not any(sum(t[l:h + 1]) % n == 0 for l, h in combinations(range(len(t)), 2))]
Currently it is O(n^2) if I'm not mistaken. What would be a better way to do this?
If you can use numpy, you can concatenate the total sum of each tuple with the sums of adjacent pairs, then check whether any of the resulting elements equals 4. For tuples of length 3, those three sums cover every contiguous subarray of length 2 or more:
import numpy as np

arr = np.array(lst)
# column 0: total of each tuple; columns 1 and 2: sums of adjacent pairs
sums = np.concatenate((np.sum(arr, axis=1).reshape(-1, 1),
                       arr[:, :-1] + arr[:, 1:]), axis=1)
arr[~(sums == 4).any(1)]
# or:
arr[(sums != 4).all(1)]
Returning:
array([[1, 1, 1],
[1, 2, 3],
[2, 1, 2],
[2, 3, 2],
[2, 3, 3],
[3, 2, 1],
[3, 2, 3],
[3, 3, 2],
[3, 3, 3]])
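For longer tuples, checking only the total and adjacent pairs is no longer enough. A common alternative, swapped in here for illustration and not part of the answer above, is a prefix-sum check: a contiguous slice sums to N exactly when two prefix sums differ by N, so each tuple can be tested in one pass with a set of the prefix sums seen so far. `has_subarray_sum` is a hypothetical name; it also counts single elements, which makes no difference here because every element is smaller than N. (The question's own code tests divisibility, `% n == 0`; the analogous check looks for two prefix sums with the same remainder mod n.)

```python
from itertools import product

def has_subarray_sum(seq, target):
    """Return True if some contiguous, non-empty slice of seq sums to target."""
    seen = {0}      # prefix sums seen so far, starting with the empty prefix
    running = 0
    for value in seq:
        running += value
        # seq[i:j] sums to target  <=>  prefix[j] - prefix[i] == target
        if running - target in seen:
            return True
        seen.add(running)
    return False

N = 4
lst = list(product(range(1, N), repeat=3))  # same 27 tuples as in the question
valid = [t for t in lst if not has_subarray_sum(t, N)]
print(valid)
```

This keeps the same nine tuples as the numpy version above, and the work per tuple grows linearly with its length instead of with the number of slice pairs.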

Building a list of random multiplication examples

I have two 100-element lists filled with random numbers between 1 and 10.
I want to make a list of multiplications of randomly selected numbers that proceeds until a product greater than 50 is generated.
How can I obtain such a list where each element is a product and its two factors?
Here is the code I tried. I think it has a lot of problems.
import random
list1 = []
for i in range(0, 1000):
    x = random.randint(1, 10)
    list1.append(x)
list2 = []
for i in range(0, 1000):
    y = random.randint(1, 10)
    list2.append(y)
m = random.sample(list1, 1)
n = random.sample(list2, 1)
list3 = []
while list3[-1][-1] < 50:
    c = [m * n]
    list3.append(m)
    list3.append(n)
    list3.append(c)
print(list3)
The output I want:
[[5.12154, 4.94359, 25.3188], [1.96322, 3.46708, 6.80663], [9.40574, 2.28941, 21.5336], [4.61705, 9.40964, 43.4448], [9.84915, 3.0071, 29.6174], [8.44413, 9.50134, 80.2305]]
To be more descriptive:
[[randomfactor, randomfactor, product], ..., [randomfactor, randomfactor, greater than 50]]
Preparing two lists of 1000 numbers each, filled with numbers from 1 to 10, is a waste of memory. If that is just a simplification and you want to draw from lists you obtained elsewhere, simply replace
a, b = random.choices(range(1, 11), k=2)
by
a, b = random.choice(list1), random.choice(list2)
import random
result = []
while True:
    a, b = random.choices(range(1, 11), k=2)  # replace this one as described above
    c = a * b
    result.append([a, b, c])
    if c > 50:
        break
print(result)
Output:
[[9, 3, 27], [3, 5, 15], [8, 5, 40], [5, 9, 45], [9, 3, 27], [8, 5, 40], [8, 8, 64]]
If you need 1000 random ints between 1 and 10, do:
random_nums = random.choices(range(1, 11), k=1000)
This is much faster than looping and appending single integers 1000 times.
Documentation:
random.choices(population, k=num_to_draw)
random.choice(seq)
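The desired output in the question shows floating-point factors rather than integers. If that is the goal (an assumption about the intent; the answer above draws integers), a sketch using `random.uniform` could look like this. It accepts an optional `random.Random` instance so a run can be seeded; `multiplication_chain` is a hypothetical name.

```python
import random

def multiplication_chain(limit=50, low=1.0, high=10.0, rng=None):
    """Return [a, b, a*b] rows drawn at random until a product exceeds limit."""
    rng = rng or random.Random()
    rows = []
    while True:
        a = rng.uniform(low, high)  # random float factor between low and high
        b = rng.uniform(low, high)
        rows.append([a, b, a * b])
        if a * b > limit:           # the last row is the first product above limit
            return rows

rows = multiplication_chain(rng=random.Random(42))
print([[round(x, 5) for x in row] for row in rows])
```

Rounding is applied only when printing, so the stored product always equals the stored factors multiplied together.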

Returning the N largest values' indices in a multidimensional array (can find solutions for one dimension but not multi-dimension)

I have a numpy array X, and I'd like to return another array Y whose entries are the indices of the n largest values of X i.e. suppose I have:
a = np.array([[1, 3, 5], [4, 5, 6], [9, 1, 7]])
then, if I want the indices of the first 5 maxima (here 9, 7, 6, 5, 5), they are:
b = np.array([[2, 0], [2, 2], [1, 2], [1, 1], [0, 2]])
I've been able to find some solutions and make this work for a one-dimensional array such as c = np.array([1, 2, 3, 4, 5, 6]):
def f(a, N):
    return np.argsort(a)[::-1][:N]
But have not been able to generate something that works in more than one dimension. Thanks!
Approach #1
Get the argsort indices on its flattened version and select the last N indices. Then, get the corresponding row and column indices -
N = 5
idx = np.argsort(a.ravel())[-N:][::-1]  # single slicing: `[:-N-1:-1]`
topN_val = a.ravel()[idx]
row_col = np.c_[np.unravel_index(idx, a.shape)]
Sample run -
# Input array
In [39]: a = np.array([[1,3,5],[4,5,6],[9,1,7]])
In [40]: N = 5
...: idx = np.argsort(a.ravel())[-N:][::-1]
...: topN_val = a.ravel()[idx]
...: row_col = np.c_[np.unravel_index(idx, a.shape)]
...:
In [41]: topN_val
Out[41]: array([9, 7, 6, 5, 5])
In [42]: row_col
Out[42]:
array([[2, 0],
[2, 2],
[1, 2],
[1, 1],
[0, 2]])
Approach #2
For performance, we can use np.argpartition to get top N indices without keeping sorted order, like so -
idx0 = np.argpartition(a.ravel(), -N)[-N:]
To get the sorted order, we need one more round of argsort -
idx = idx0[a.ravel()[idx0].argsort()][::-1]
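Without NumPy (for example, when the data is still a nested list), a plain-Python sketch with `heapq.nlargest` gives the same positions. It is an alternative technique, not part of the answer above; `top_n_indices` is a hypothetical name. Ties come out in row-major order, so the two 5s appear as (0, 2) then (1, 1), the reverse of the NumPy run above.

```python
import heapq

def top_n_indices(matrix, n):
    """Return (row, col) pairs of the n largest entries of a 2-D nested list."""
    cells = ((value, (r, c))
             for r, row in enumerate(matrix)
             for c, value in enumerate(row))
    # compare by value only; nlargest keeps equal values in first-seen order
    return [pos for _, pos in heapq.nlargest(n, cells, key=lambda cell: cell[0])]

a = [[1, 3, 5], [4, 5, 6], [9, 1, 7]]
print(top_n_indices(a, 5))  # [(2, 0), (2, 2), (1, 2), (0, 2), (1, 1)]
```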
