Say I have a list of keys that are floats.
keys = [0.999999, 1.999999]
Say I have another list of values.
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 1.0, 2.0]
I want to find the total number of times each key occurs in vals, measuring equality with np.isclose(). In the example above, the answer would be 5. The following snippet returns this answer, but it is extremely slow when keys and vals are large (10^6 and 10^7 elements, respectively).
import numpy as np

def count_float_keys(keys, vals):
    count = 0
    for key in keys:
        # indices of all entries of vals that are close to this key
        present = np.where(np.isclose(vals, key))[0]
        count += len(present)
    return count
Is there a faster and cleaner alternative to do this?
Edit: 0.99999 is only used as a simplifying example. My data has random float values like 0.035014 that I am not allowed to round further.
Here you go:
# generate random vals
vals = np.random.randint(0,2,(10,10)) + np.random.uniform(0,1,(10,10))
keys = [0.999999, 1.999999]
# check how often each value is in the tolerance of each key
res = [np.sum(np.isclose(vals,k, rtol=0.1, atol=0.1)) for k in keys]
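For the sizes mentioned in the question, one possible way to avoid scanning vals once per key is to sort vals once and binary-search a tolerance window around each key. The sketch below assumes the default np.isclose tolerances (rtol=1e-05, atol=1e-08) with the key as the reference value; the function name count_close_sorted is made up for illustration, and values lying exactly on the window edge may be classified differently from np.isclose because of rounding.

```python
import numpy as np

def count_close_sorted(keys, vals, rtol=1e-05, atol=1e-08):
    """Count, for each key, the values v with |v - key| <= atol + rtol * |key|."""
    keys = np.asarray(keys, dtype=float)
    sorted_vals = np.sort(np.asarray(vals, dtype=float).ravel())
    tol = atol + rtol * np.abs(keys)  # same window np.isclose(vals, key) uses
    left = np.searchsorted(sorted_vals, keys - tol, side="left")
    right = np.searchsorted(sorted_vals, keys + tol, side="right")
    return right - left  # one count per key

keys = [0.999999, 1.999999]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 1.0, 2.0]
per_key = count_close_sorted(keys, vals)
total = int(per_key.sum())
print(per_key, total)
```

Sorting costs O(m log m) once, and each key then costs O(log m), instead of O(m) per key in the loop over np.isclose.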
I am trying to find the indices of the n smallest values in a list of tensors in pytorch. Since these tensors might contain many non-unique values, I cannot simply compute percentiles to obtain the indices. The ordering of non-unique values does not matter however.
I came up with the following solution but am wondering if there is a more elegant way of doing it:
import torch
n = 10
tensor_list = [torch.randn(10, 10), torch.zeros(20, 20), torch.ones(30, 10)]
all_sorted, all_sorted_idx = torch.sort(torch.cat([t.view(-1) for t in tensor_list]))
cum_num_elements = torch.cumsum(torch.tensor([t.numel() for t in tensor_list]), dim=0)
cum_num_elements = torch.cat([torch.tensor([0]), cum_num_elements])
split_indeces_lt = [all_sorted_idx[:n] < cum_num_elements[i + 1] for i, _ in enumerate(cum_num_elements[1:])]
split_indeces_ge = [all_sorted_idx[:n] >= cum_num_elements[i] for i, _ in enumerate(cum_num_elements[:-1])]
split_indeces = [all_sorted_idx[:n][torch.logical_and(lt, ge)] - c for lt, ge, c in zip(split_indeces_lt, split_indeces_ge, cum_num_elements[:-1])]
n_smallest = [t.view(-1)[idx] for t, idx in zip(tensor_list, split_indeces)]
Ideally a solution would pick a random subset of the non-unique values instead of picking the entries of the first tensor of the list.
PyTorch does provide a (I think) more elegant way to do it, with torch.unique_consecutive.
I'm going to work on a single tensor rather than a list of tensors because, as you did yourself, it only takes a cat to get there. Unraveling the indices afterward is not hard either.
# We want to find the n=3 min values and positions in t
n = 3
t = torch.tensor([1,2,3,2,0,1,4,3,2])
# To get a random occurrence, we create a random permutation
randomizer = torch.randperm(len(t))
# first, we sort t, and get the indices
sorted_t, idx_t = t[randomizer].sort()
# small util function to extract only the n smallest values and positions
head = lambda v,w : (v[:n], w[:n])
# use unique_consecutive to remove duplicates
uniques_t, counts_t = head(*torch.unique_consecutive(sorted_t, return_counts=True))
# counts_t.cumsum gives us the position of the unique values in sorted_t
uniq_idx_t = torch.cat([torch.tensor([0]), counts_t.cumsum(0)[:-1]], 0)
# And now, we have the positions of uniques_t values in t :
final_idx_t = randomizer[idx_t[uniq_idx_t]]
print(uniques_t, final_idx_t)
#>>> tensor([0,1,2]), tensor([4,0,1])
#>>> tensor([0,1,2]), tensor([4,5,8])
#>>> tensor([0,1,2]), tensor([4,0,8])
EDIT : I think the added permutation solves your need-random-occurrence problem
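The unraveling step mentioned in this answer (mapping positions in the concatenated tensor back to a tensor in the list and an offset inside it) could be sketched as follows. The helper name unravel_flat_indices and the sample indices are hypothetical; the tensor shapes are the ones from the question.

```python
import torch

def unravel_flat_indices(flat_idx, tensor_list):
    """Map indices into the concatenated tensor to (tensor id, local flat index)."""
    sizes = torch.tensor([t.numel() for t in tensor_list])
    ends = sizes.cumsum(0)   # exclusive end offset of each tensor
    starts = ends - sizes    # start offset of each tensor
    # right=True selects i such that ends[i-1] <= idx < ends[i]
    tensor_id = torch.bucketize(flat_idx, ends, right=True)
    local_idx = flat_idx - starts[tensor_id]
    return tensor_id, local_idx

tensor_list = [torch.randn(10, 10), torch.zeros(20, 20), torch.ones(30, 10)]
flat_idx = torch.tensor([0, 99, 100, 499, 500, 799])  # illustrative positions
tensor_id, local_idx = unravel_flat_indices(flat_idx, tensor_list)
print(tensor_id, local_idx)
```

Each value can then be read back with tensor_list[i].view(-1)[j] for a pair (i, j) taken from tensor_id and local_idx.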
I am stuck at this question where I am required to update all largest and smallest numbers in the list with the average value as a way to remove the extreme values in the given list.
For example:
remove_extreme([[0, 4], [1, 4], [-1, 2]]) would return [[0, 1.5], [1, 1.5], [1.5, 2]].
The function needs to modify the elements using an average. In this case the smallest and largest values across all the lists are -1 and 4 respectively. Every occurrence of the largest or smallest value must then be replaced by the mean of these two values, which is (4 + (-1))/2 = 1.5.
Here's my code:
def remove_extreme(datalist):
    for numlist in datalist:
        for index, number in enumerate(numlist):
            largest = max(number)
            smallest = min(number)
            average = (largest - smallest)/2
            if number == largest or smallest:
                num_list[index] = average
    return datalist
May I know what went wrong? I keep getting 'int' object is not iterable.
What you asked about
To answer your immediate question, the built-in functions max and min return the maximum and minimum item of an iterable.
https://docs.python.org/3/library/functions.html#max
So it throws a TypeError when you pass it an integer. Run it on a list/iterable instead.
But your code has more issues than just that.
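Your if statement, though syntactically correct, is probably not what you want. Python evaluates number == largest or smallest as (number == largest) or smallest, so the condition is true whenever smallest is non-zero. More than likely you wanted to do this: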
if number == largest or number == smallest:
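A small sketch of the difference between the two conditions, using made-up values for number, largest and smallest:

```python
number, largest, smallest = 2, 4, -1

# Parsed as (number == largest) or smallest: the comparison is False,
# so the expression evaluates to smallest itself, which is truthy here.
buggy = number == largest or smallest

# Two explicit comparisons joined by `or`.
fixed = number == largest or number == smallest

print(buggy, bool(buggy), fixed)
```

With these values the first form would enter the if branch even though 2 is neither the largest nor the smallest value.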
Like Tomerikoo pointed out, you want to put your max and min outside the loop. As an aside, you do not need to return the list as lists are mutable and you are modifying it freely inside the function.
def remove_extreme(datalist):
    for numlist in datalist:
        # compute the extremes once per inner list, outside the element loop
        largest = max(numlist)
        smallest = min(numlist)
        average = (largest + smallest)/2  # mean of the two extremes
        for index, number in enumerate(numlist):
            if number == largest or number == smallest:
                numlist[index] = average
    return datalist
What your problem is actually asking you
Looking at your original question, I think you're a little off from the correct answer if your lists need to look like your given answer. The first hint is that your given answer shows only one value changed per inner list, and the replacement is not the average of that inner list. Take [0, 4] for instance: 1.5 is not the average of 0 and 4, yet that is what you say it should return. It seems that you really want to change the most extreme number in each inner list based on the average of all the lists. Averaging the numbers of all the inner lists yields about 1.67, so I'm not sure of this, but I think one of your numbers might be off by 1 (I think so because 10/6 yields about 1.67 while 9/6 yields 1.5).
If the above assumptions are all correct you will want to calculate the average (which is usually the sum/number of elements) and then find the most extreme element within each list.
Your function should look a bit more like this:
def remove_extreme(datalist):
    # sum len of each list to get total number of elements
    num_elements = sum([len(numlist) for numlist in datalist])
    # sum the sum of each list to get total
    sum_elements = sum([sum(numlist) for numlist in datalist])
    # calculate average
    average = sum_elements/num_elements
    # find the most extreme element in each list and perform substitution
    for numlist in datalist:
        smallest = min(numlist)
        largest = max(numlist)
        large_diff = abs(largest - average)
        small_diff = abs(average - smallest)
        num_to_change = largest if large_diff > small_diff else smallest
        for index, number in enumerate(numlist):
            if number == num_to_change:  # Just look at the most extreme number
                numlist[index] = average
    return datalist  # list will be modified, but returning it doesn't hurt either
Running this function after changing your -1 to -2:
my_list = [
[0,4],
[1,4],
[-2,2]
]
print("Before: ", my_list)
remove_extreme(my_list)
print("After: ", my_list)
Output:
$ python remove_extreme.py
Before: [[0, 4], [1, 4], [-2, 2]]
After: [[0, 1.5], [1, 1.5], [1.5, 2]]
After further clarification
After clarifying what the question was really asking you, the answer is even simpler:
def remove_extreme(datalist):
    flattened = [i for numlist in datalist for i in numlist]  # flatten list for convenience
    largest = max(flattened)
    smallest = min(flattened)
    average = (largest + smallest)/2
    # replace every occurrence of the overall smallest or largest value
    for numlist in datalist:
        for index, number in enumerate(numlist):
            if number == smallest or number == largest:
                numlist[index] = average
    return datalist  # list will be modified, but returning it doesn't hurt either
Personally I feel like this makes less sense, but that seems to be what you're asking for.
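As a quick check, here is a self-contained sketch of this last version run on the original example from the question (with -1, not -2). The function body mirrors the code above; only the variable name result is new.

```python
def remove_extreme(datalist):
    # Flatten to find the overall smallest and largest values.
    flattened = [x for numlist in datalist for x in numlist]
    largest, smallest = max(flattened), min(flattened)
    average = (largest + smallest) / 2
    # Replace every occurrence of either extreme with their mean, in place.
    for numlist in datalist:
        for index, number in enumerate(numlist):
            if number == smallest or number == largest:
                numlist[index] = average
    return datalist

result = remove_extreme([[0, 4], [1, 4], [-1, 2]])
print(result)  # [[0, 1.5], [1, 1.5], [1.5, 2]]
```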
Also, when writing a question, it's helpful to include the stack trace or point to the specific line where the issue is occurring. Just a helpful tip!
You are trying to get the max and min of an element, not of the list. Note also that calling max or min on the outer list compares the inner lists lexicographically:
>>> data = [[0, 4], [1, 4], [-1, 2]]
>>> max(data)
[1, 4]
>>> min(data)
[-1, 2]
I have a list of strings. I need to parse and convert the string into floats and use that for a calculation.
After multiple attempts, I figured out the easiest way to do this.
lines = ["1x+1y+0", "1x-1y+0", "1x+0y-3", "0x+1y-0.5"]
I need to extract the numerical coefficients of x and y.
I first tried:
for coef in re.split('x|y', line):
    float(coef)
This did not serve the purpose, and then I found out that
for line in lines:
    a, b, c = [float(coef) for coef in re.split('x|y', line)]
works.
If I do
a = [float(coef) for coef in re.split('x|y', line)]
then a is a list with the coefficients of the line:
[1.0, 1.0, 0.0]
[1.0, -1.0, 0.0]
[1.0, 0.0, -3.0]
[0.0, 1.0, -0.5]
However, I am struggling to understand the logic. Here we used a list comprehension. How can we assign multiple variables from a list comprehension? Does it work as follows?
For each string in the list, it splits the string and converts the pieces to floats, and then assigns the three resulting numbers to three variables.
But how is it that if we assign to only one variable we get a list, whereas if we assign to multiple variables the type changes?
I am sorry if the question is too basic. I am new to Python, hence the doubt.
a, b, c = x is called sequence unpacking. It is (almost) equivalent to:
a = x[0]
b = x[1]
c = x[2]
So a,b,c = [float(coef) for coef in re.split('x|y', line)] actually means:
x = [float(coef) for coef in re.split('x|y', line)]
a = x[0]
b = x[1]
c = x[2]
But a = x is not unpacking. It's just normal assignment: it makes a refer to the same object as x. The difference: in the first case you assign a list to three variables, and each "gets" one item of the list. In the second case, you assign a list to one variable and that variable "gets" the whole list. Assigning a list of three numbers to two variables (a, b = [1, 2, 3]) is invalid: you get an error message saying that there are too many values to unpack.
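Here is a short sketch of both cases on one of the strings from the question. The names parts, coefs and msg are only for illustration.

```python
import re

line = "1x-1y+0"
parts = re.split('x|y', line)        # ['1', '-1', '+0']
coefs = [float(p) for p in parts]    # plain assignment: coefs is the whole list

a, b, c = coefs                      # unpacking: one item per variable

try:
    a2, b2 = coefs                   # three items, only two targets
except ValueError as exc:
    msg = str(exc)

print(coefs, a, b, c, msg)
```

The list comprehension always produces a list. What changes is only how the assignment on the left-hand side distributes that list.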
In the following program I have a NumPy array t. I take the reciprocal of each element (1/t) to get the J values. Now, after inverting again, I want to get back the same array t. I tried to do this in the following code, but when I print t and T_list they show different values. I want exactly the same values of t as in the beginning.
import numpy as np

t = np.linspace(1, 4, 10)
print(t)
J_values = []
for i in range(len(t)):
    j = 1.0 / t[i]
    J_values.append(j)
print(J_values)
T_list = []
for i in range(len(J_values)):
    T = 1.0 / J_values[i]
    T_list.append(T)
print(T_list)
One of the great things about NumPy arrays is that you rarely have to use loops to manipulate them. Inverting the values of t can be done like this:
t = np.linspace(1, 4, 10)
J_values = 1.0 / t
Getting the original values back is then:
T_list = 1.0 / J_values
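Because each division is rounded to the nearest representable float, the round trip 1/(1/t) is not guaranteed to reproduce t bit for bit, so a tolerance-based comparison is usually the safer check. A minimal sketch, where the names T_values, exactly_equal, close and max_abs_error are illustrative:

```python
import numpy as np

t = np.linspace(1, 4, 10)
J_values = 1.0 / t
T_values = 1.0 / J_values

# Bitwise equality may or may not hold, depending on rounding.
exactly_equal = np.array_equal(t, T_values)
# Equality up to a small tolerance is the more robust test.
close = np.allclose(t, T_values)
max_abs_error = np.max(np.abs(t - T_values))

print(exactly_equal, close, max_abs_error)
```

If the exact original values are needed later, keeping t itself around avoids relying on the round trip at all.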
I have a 3-D array of random numbers of size [channels = 3, height = 10, width = 10].
Then I sorted it along the columns using PyTorch's sort and obtained the indices as well.
Now, I would like to return to the original matrix using these indices. I currently use for loops to do this (without considering the batches). The code is:
import torch
torch.manual_seed(1)
ch = 3
h = 10
w = 10
inp_unf = torch.randn(ch,h,w)
inp_sort, indices = torch.sort(inp_unf,1)
resort = torch.zeros(inp_sort.shape)
for i in range(ch):
    for j in range(inp_sort.shape[1]):
        for k in range(inp_sort.shape[2]):
            temp = inp_sort[i,j,k]
            resort[i,indices[i,j,k],k] = temp
I would like this to be vectorized and to handle batches as well, i.e. an input of size [batch, channel, height, width].
Using Tensor.scatter_()
You can directly scatter the sorted tensor back into its original state using the indices provided by sort():
torch.zeros(ch,h,w).scatter_(dim=1, index=indices, src=inp_sort)
The intuition is based on the previous answer below. As scatter() is basically the reverse of gather(), inp_reunf = inp_sort.gather(dim=1, index=reverse_indices) is the same as inp_reunf.scatter_(dim=1, index=indices, src=inp_sort).
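For the batched case asked about in the question, the same scatter_ call should carry over once the sorted dimension is shifted by one. The sketch below assumes an input of shape [batch, channel, height, width] sorted along the height dimension (dim=2); the batch size of 2 and the names restored and ok are arbitrary.

```python
import torch

torch.manual_seed(1)
b, ch, h, w = 2, 3, 10, 10
inp_unf = torch.randn(b, ch, h, w)

# Sort along the height dimension, which is dim=2 in the 4-D layout.
inp_sort, indices = torch.sort(inp_unf, dim=2)

# Scatter each sorted value back to the position it came from.
restored = torch.empty_like(inp_sort).scatter_(dim=2, index=indices, src=inp_sort)

ok = torch.equal(restored, inp_unf)
print(ok)
```

Since indices is a permutation along dim=2, every position of the output is written exactly once, so the initial contents of the target tensor do not matter.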
Previous answer
Note: while correct, this is probably less performant, as it calls sort() a second time.
You need to obtain the sorting "reverse indices", which can be done by "sorting the indices returned by sort()".
In other words, given x_sort, indices = x.sort(), you have x[indices] -> x_sort ; while what you want is reverse_indices such that x_sort[reverse_indices] -> x.
This can be obtained as follows: _, reverse_indices = indices.sort().
import torch
torch.manual_seed(1)
ch, h, w = 3, 10, 10
inp_unf = torch.randn(ch,h,w)
inp_sort, indices = inp_unf.sort(dim=1)
_, reverse_indices = indices.sort(dim=1)
inp_reunf = inp_sort.gather(dim=1, index=reverse_indices)
print(torch.equal(inp_unf, inp_reunf))
# True