I am stuck at this question where I am required to update all largest and smallest numbers in the list with the average value as a way to remove the extreme values in the given list.
For example:
remove_extreme([[0,4], [1,4], [-1,2]]) would return [[0,1.5], [1,1.5], [1.5,2]].
The function needs to use the average value to modify the elements in the lists. In this case the smallest and largest values are -1 and 4 respectively. Then all the largest and smallest values present in the list need to be changed to the average (mean) of these two values, which is (4 + -1)/2 = 1.5.
Here's my code:
def remove_extreme(datalist):
    for numlist in datalist:
        for index, number in enumerate(numlist):
            largest = max(number)
            smallest = min(number)
            average = (largest - smallest)/2
            if number == largest or smallest:
                num_list[index] = average
    return datalist
May I know what went wrong? I keep getting 'int' object is not iterable.
What you asked about
To answer your immediate question: the built-in functions max and min return the largest and smallest item of an iterable.
https://docs.python.org/3/library/functions.html#max
So Python raises a TypeError when you pass either of them a single integer. Call them on a list (or another iterable) instead.
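As a small illustration of that error (a sketch only; the helper name describe_max is made up for this example), max accepts a list but raises TypeError when it is handed a bare integer:

```python
def describe_max(value):
    """Return max(value), or the error message if value is not iterable."""
    try:
        return max(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(describe_max([0, 4]))  # works: the argument is a list
print(describe_max(4))       # fails: an int cannot be iterated over
```

The second call is what happens inside the original inner loop, where number is a single element of numlist rather than the list itself.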
But your code has more issues than just that.
Your if statement, though syntactically correct, is probably not what you want. More than likely you wanted to do this:
if number == largest or number == smallest:
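To see why the original condition misbehaves, here is a minimal sketch with made-up values. Python parses number == largest or smallest as (number == largest) or smallest, so the result depends on whether smallest is truthy, not on whether number equals it:

```python
number, largest, smallest = 2, 4, -1

# Parsed as (number == largest) or smallest, i.e. False or -1
buggy = number == largest or smallest

# Each side of "or" is a full comparison
fixed = number == largest or number == smallest

print(bool(buggy))  # True, although 2 is neither 4 nor -1
print(fixed)        # False
```

With the buggy form, every element would be replaced whenever smallest is non-zero.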
As Tomerikoo pointed out, you want to compute max and min outside the inner loop. As an aside, you do not strictly need to return the list: lists are mutable, so the caller's list is modified in place inside the function.
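def remove_extreme(datalist):
    for numlist in datalist:
        # compute the extremes once per inner list, outside the inner loop
        largest = max(numlist)
        smallest = min(numlist)
        # the mean of two values is their sum (not their difference) over 2
        average = (largest + smallest)/2
        for index, number in enumerate(numlist):
            if number == largest or number == smallest:
                numlist[index] = average
    return datalist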
What your problem is actually asking you
Looking at your original question, I think this is still a little off from the correct answer if your lists need to look like your expected output. The first hint is that the expected output shows only one value changed in each inner list, and the replacement is not the average of that inner list. Take [0, 4] for instance: 1.5 is not the average of 0 and 4, yet that is what you say it should return. It seems that what you really want is to change the most extreme number in each inner list based on the average of all the lists. Averaging the numbers of all the inner lists yields about 1.67, so I'm not sure of this, but I think one of your numbers might be off by 1 (10/6 is about 1.67, while 9/6 is 1.5).
If the above assumptions are all correct, you will want to calculate the overall average (the sum of all elements divided by the number of elements) and then find the most extreme element within each list.
Your function should look a bit more like this:
def remove_extreme(datalist):
    # sum len of each list to get total number of elements
    num_elements = sum([len(numlist) for numlist in datalist])
    # sum the sum of each list to get total
    sum_elements = sum([sum(numlist) for numlist in datalist])
    # calculate average
    average = sum_elements/num_elements
    # find the most extreme element in each list and perform substitution
    for numlist in datalist:
        smallest = min(numlist)
        largest = max(numlist)
        large_diff = abs(largest - average)
        small_diff = abs(average - smallest)
        num_to_change = largest if large_diff > small_diff else smallest
        for index, number in enumerate(numlist):
            if number == num_to_change: # Just look at the most extreme number
                numlist[index] = average
    return datalist # list will be modified, but returning it doesn't hurt either
Running this function after changing your -1 to -2:
my_list = [
[0,4],
[1,4],
[-2,2]
]
print("Before: ", my_list)
remove_extreme(my_list)
print("After: ", my_list)
Output:
$ python remove_extreme.py
Before: [[0, 4], [1, 4], [-2, 2]]
After: [[0, 1.5], [1, 1.5], [1.5, 2]]
After further clarification
After clarifying what the question was really asking you, the answer is even simpler:
def remove_extreme(datalist):
    flattened = [i for numlist in datalist for i in numlist] # flatten list for convenience
    largest = max(flattened)
    smallest = min(flattened)
    average = (largest + smallest)/2
    # replace every occurrence of the overall smallest or largest value
    for numlist in datalist:
        for index, number in enumerate(numlist):
            if number == smallest or number == largest:
                numlist[index] = average
    return datalist # list will be modified, but returning it doesn't hurt either
Personally I feel like this makes less sense, but that seems to be what you're asking for.
Also, when writing a question, it's helpful to include the stack trace or point to the specific line where the issue is occurring. Just a helpful tip!
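As a variation on the final version above, here is a sketch that leaves the caller's list untouched and returns a new one. The name remove_extreme_copy is hypothetical, and the sketch assumes the input holds at least one number (max and min raise ValueError on an empty sequence):

```python
def remove_extreme_copy(datalist):
    """Return a new nested list with the overall min and max replaced by their mean."""
    flattened = [n for numlist in datalist for n in numlist]
    largest, smallest = max(flattened), min(flattened)
    average = (largest + smallest) / 2
    # Build fresh inner lists instead of assigning into the originals
    return [
        [average if n in (largest, smallest) else n for n in numlist]
        for numlist in datalist
    ]


original = [[0, 4], [1, 4], [-1, 2]]
result = remove_extreme_copy(original)
print(result)    # [[0, 1.5], [1, 1.5], [1.5, 2]]
print(original)  # [[0, 4], [1, 4], [-1, 2]]
```

Returning a copy avoids the question of whether the caller expects its data to change, at the cost of allocating a second nested list.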
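You are calling max and min on each element rather than on a list. Note also that calling them on the outer list compares the inner lists with each other, so you get back a whole inner list:
>>> data = [[0,4], [1,4], [-1,2]]
>>> max(data)
[1, 4]
>>> min(data)
[-1, 2]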
Consider that we have:
size of array = 5
pairs= 3
array= 1 2 3 4 5
We need to divide it into all possible contiguous sublists, such as:
[(1,2,3),(4),(5)]
[(1),(2,3,4),(5)]
[(1),(2),(3,4,5)]
[(1,2),(3,4),(5)]
[(1),(2,3),(4,5)]
Or suppose:
size of array = 5
pairs= 2
array= 1 2 3 4 5
We need to divide it into all possible contiguous sublists, such as:
[(1,2,3,4),(5)]
[(1),(2,3,4,5)]
[(1,2),(3,4,5)]
[(1,2,3),(4,5)]
The code I have tried:
from itertools import permutations
l1 = [1,2,3,4,5]
l2 = permutations(l1)
l3 = [[sum([x[0], x[1]]), sum([x[2], x[3]]), x[4]] for x in l2]
max_arr = []
for arr in l3:
    max_arr.append(max(arr))
print(min(max_arr))
To generate all list partitions, you can make a list of parts-1 ones and size-parts zeros: size-1 flags in total, one for each gap between adjacent items. (Note that I used parts rather than pairs as a more suitable name.)
Then generate the distinct permutations of this list (for example, with itertools), and for every permutation split the initial list at the gaps marked with a 1. (Note that there are C(size-1, parts-1) such permutations.)
For example, the result [0,1,1,0] corresponds to the partition [(1,2),(3),(4,5)] (divide the list after the 2nd and 3rd items, i.e. after 0-based indices 1 and 2).
This is an application of the "stars and bars" principle.
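The flag idea above can be sketched as follows. The function name partitions_by_flags is made up for this example. One deliberate deviation: instead of itertools.permutations, which would repeat identical flag arrangements because it treats equal elements as distinct, the sketch picks the positions of the ones with itertools.combinations, so each arrangement appears once. It assumes a non-empty list and 1 <= parts <= len(items).

```python
from itertools import combinations


def partitions_by_flags(items, parts):
    """List every split of items into `parts` contiguous, non-empty groups."""
    n_gaps = len(items) - 1
    result = []
    for ones in combinations(range(n_gaps), parts - 1):
        # flags[i] == 1 means "cut after the item at 0-based index i"
        flags = [1 if i in ones else 0 for i in range(n_gaps)]
        partition, current = [], [items[0]]
        for item, flag in zip(items[1:], flags):
            if flag:
                partition.append(tuple(current))
                current = [item]
            else:
                current.append(item)
        partition.append(tuple(current))
        result.append(partition)
    return result


for p in partitions_by_flags([1, 2, 3, 4, 5], 3):
    print(p)
```

For five items and three parts this prints C(4, 2) = 6 partitions, including the [(1, 2), (3,), (4, 5)] case that corresponds to the flags [0, 1, 1, 0].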
MBo's good solution may be improved a little by generating combinations of list splitting indexes instead of permutations of separation flags. Then if we're lazy we can pass those indexes directly to numpy.split.
import itertools
import numpy
l = [1, 2, 3, 4, 5]
parts = 3
for comb in itertools.combinations(range(1, len(l)), parts-1):
    print(numpy.split(l, comb))
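If NumPy is not available, or plain lists are preferred over arrays, the same combinations-of-cut-points idea can be written with list slicing. This is a sketch only; split_at and contiguous_partitions are hypothetical helper names.

```python
from itertools import combinations


def split_at(items, cut_points):
    """Slice items at the given sorted cut indices."""
    bounds = [0, *cut_points, len(items)]
    return [items[a:b] for a, b in zip(bounds, bounds[1:])]


def contiguous_partitions(items, parts):
    """All ways to cut items into `parts` contiguous, non-empty sublists."""
    return [
        split_at(items, comb)
        for comb in combinations(range(1, len(items)), parts - 1)
    ]


for p in contiguous_partitions([1, 2, 3, 4, 5], 2):
    print(p)
```

Cut indices start at 1 and stop before len(items), which is what keeps every sublist non-empty.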
I have used Python for a long time but I don't know how objects and the memory really work.
Until a few days ago, I thought that alpha = gamma made a variable whose name was alpha and saved in it a copy of gamma, without linking the variables to each other. However, I have recently noticed that that doesn't happen. Both variables actually point to the same object. Nevertheless, the variables become independent when you change the data in one of them (depending on the variables).
There are many other cases in which variables don't behave like you would expect. This is an example I came upon:
>>> grid1=[[0]*4]*4
>>> grid2=[[0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0]]
>>> grid1 == grid2
True
>>> grid1[2][3]+=1
>>> grid2[2][3]+=1
>>> grid1
[[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1]]
>>> grid2
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
I have tried to find more information about how = and other commands treat variables and found some threads, but I have many questions whose answer I don't know yet:
Why did the behavior shown above with the lists take place?
What should be done in order to avoid it (make grid1 behave like grid2)?
Does this have anything to do with the modifiability of the variables? What does it really mean for a variable to be modifiable?
When are variables the same object or different ones? How do you know if a command creates a separate variable or not (x+=y vs x = x + y, append vs +)?
Is there an == which would have returned false in the example above (is wouldn't work, those two variables were created in different steps and independently so they won't be in the same place in the memory) because in grid1 all lists were in the same place in the memory while in grid2 they were independent?
I haven't been able to find the answers to those questions anywhere, could anyone give a brief answer to the questions or provide a reference which explained these concepts? Thanks.
Why did the behavior shown above with the lists take place?
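Because assignment and list multiplication never copy objects; they only create new references. [[0]*4]*4 builds one inner list and then an outer list holding four references to that same inner list, so a change made through one row shows up in all four.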
What should be done in order to avoid it (make grid1 behave like grid2)?
grid1 = [[0 for _ in range(4)] for _ in range(4)] would make it work as you want. This is because the comprehension builds a new inner list on each iteration instead of repeating a reference to one list (as [[0]*4]*4 does).
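A short sketch of the difference, using made-up names shared and independent for the two grids:

```python
shared = [[0] * 4] * 4                      # four references to ONE inner list
independent = [[0] * 4 for _ in range(4)]   # four separate inner lists

shared[2][3] += 1
independent[2][3] += 1

print(shared)       # the change appears in every row
print(independent)  # the change appears only in row 2

print(shared[0] is shared[1])            # True: same object
print(independent[0] is independent[1])  # False: distinct objects
```

The inner [0] * 4 is safe in both cases because integers are immutable; only the repetition of the outer list creates the aliasing.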
Does this have anything to do with the modifiability of the variables? What does it really mean for a variable to be modifiable?
Strings are immutable, so when you do a = "hi"; b = a; b += "!", b first refers to the same string as a, and then b += "!" builds a new string "hi!" and rebinds b to it; a is left unchanged.
Lists are mutable, so when you do a = []; b = a; b.append(1), b refers to the same list as a, and append modifies that list in place, so the change is visible through a as well.
When are variables the same object or different ones? How do you know if a command creates a separate variable or not (x+=y vs x = x + y, append vs +)?
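It depends on both the type and the operation. On a list, methods such as append, and also +=, modify the existing object in place, while x = x + y builds a new list and rebinds x. An immutable object cannot be modified, so for those types += also rebinds the name to a new result.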
Mutable types: list, set, dict.
Immutable types: tuple, frozenset, string.
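To make the x += y versus x = x + y distinction concrete, here is a minimal sketch; the alias variables exist only to observe whether the original object was changed:

```python
# += on a list extends the existing object in place
x = [1]
alias = x
x += [2]
print(alias, x is alias)    # [1, 2] True

# x = x + y builds a new list and rebinds x
x = [1]
alias = x
x = x + [2]
print(alias, x is alias)    # [1] False

# tuples are immutable, so += has to rebind to a new tuple
t = (1,)
alias_t = t
t += (2,)
print(alias_t, t is alias_t)  # (1,) False
```

Checking with is against a saved alias is a quick way to find out whether any given operation mutated the object or created a new one.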
Is there an == which would have returned false in the example above (is wouldn't work, those two variables were created in different steps and independently so they won't be in the same place in the memory) because in grid1 all lists were in the same place in the memory while in grid2 they were independent?
== compares values (i.e. whether both contain the same data), while is checks whether both are the same object. Try testing == and is on two equal lists: in the first case a = [1]; b = [1], and in the second case a = [1]; b = a.
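There is no built-in comparison operator that also takes the internal sharing into account, but it can be checked by hand. The sketch below uses a hypothetical helper, has_shared_rows, that compares the identities of the rows:

```python
def has_shared_rows(grid):
    """True if at least two rows of grid are the very same list object."""
    return len({id(row) for row in grid}) < len(grid)


grid1 = [[0] * 4] * 4
grid2 = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

print(grid1 == grid2)          # True: same values
print(grid1 is grid2)          # False: two different outer lists
print(has_shared_rows(grid1))  # True: all rows are one object
print(has_shared_rows(grid2))  # False: every row is its own list
```

This only inspects the top level of nesting, which is enough for the two grids in the question.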
So I had this statistics homework and I wanted to do it with Python and NumPy.
The question started with generating 1000 random samples that follow a normal distribution.
random_sample=np.random.randn(1000)
Then it asked to divide these numbers into subgroups. For example, suppose we divide them into five subgroups: the first subgroup is the random numbers in the range (-5,-3), and so on up to the last subgroup (3,5).
Is there any way to do it using NumPy (or anything else)?
And if possible, I want it to work when the number of subgroups changes.
You can get subgroup indices using numpy.digitize:
import numpy as np
random_sample = 5 * np.random.randn(10)
random_sample
# -> array([-3.99645573, 0.44242061, 8.65191515, -1.62643622, 1.40187879,
# 5.31503683, -4.73614766, 2.00544974, -6.35537813, -7.2970433 ])
indices = np.digitize(random_sample, (-3,-1,1,3))
indices
# -> array([0, 2, 4, 1, 3, 4, 0, 3, 0, 0])
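Building on the indices returned by numpy.digitize, the values can be collected into subgroups with boolean masks. The sketch below is one possible way to make the number of subgroups adjustable: n_groups and the overall range of -5 to 5 are assumptions taken from the question's example, and a fixed seed is used only to make the run repeatable. Values that fall outside the overall range end up in the first or last group.

```python
import numpy as np

rng = np.random.default_rng(0)
random_sample = rng.standard_normal(1000)

n_groups = 5                                 # change this to get more or fewer subgroups
edges = np.linspace(-5, 5, n_groups + 1)     # -5, -3, -1, 1, 3, 5 for five groups

# Only the inner edges are passed, so the indices run from 0 to n_groups - 1
indices = np.digitize(random_sample, edges[1:-1])
groups = [random_sample[indices == k] for k in range(n_groups)]

for k, group in enumerate(groups):
    print(f"[{edges[k]:.0f}, {edges[k + 1]:.0f}): {len(group)} values")
```

Each element of groups is an array holding the samples of one subgroup, in their original order.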
If you sort your random_sample, you can divide the array by finding the indices of the "breakpoint" values: the sample values closest to the range boundaries you define, such as -5 and -3. The code would be something like:
import numpy as np
my_range = [-5,-3,-1,1,3,5] # example of ranges
random_sample = np.random.randn(1000)
hist = np.sort(random_sample)
# argmin() will find index where absolute difference is closest to zero
idx = [np.abs(hist-i).argmin() for i in my_range]
groups=[hist[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
Now groups is a list where each element is an array with all random values within your defined ranges.
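Because argmin picks the sample nearest to each boundary, the cut can land one element to either side of it. As an alternative sketch (a swapped-in technique, not the code above), numpy.searchsorted on the sorted array returns the exact insertion index of each boundary, so every group contains precisely the values in its half-open range:

```python
import numpy as np

my_range = [-5, -3, -1, 1, 3, 5]   # example of ranges
rng = np.random.default_rng(0)     # fixed seed only to make the run repeatable
hist = np.sort(rng.standard_normal(1000))

# idx[i] is the index of the first sorted value >= my_range[i]
idx = np.searchsorted(hist, my_range)
groups = [hist[idx[i]:idx[i + 1]] for i in range(len(idx) - 1)]

for lo, hi, group in zip(my_range, my_range[1:], groups):
    print(f"[{lo}, {hi}): {len(group)} values")
```

Samples below the first boundary or at or above the last one are left out of every group with this approach.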
I'm aware of:
id,value = max(enumerate(trans_p), key=operator.itemgetter(1))
I'm trying to find something equivalent for matrices, where I'm looking for the value and row index of the max for each column of the matrix.
So the function could take in any matrix, such as:
np.array([[0,0,1],[2,0,0],[5,0,0]])
and return two vectors: a vector of row numbers where the max is found, and the max values themselves, one entry per column. I'm trying to avoid a for-loop! Ideally the function returns two values, like this:
rowIdVect, maxVect = ...........
where the values for the example matrix above would be:
[2,0,0] #rowIdVect
[5,0,1] #maxVect
I can do this in two steps:
idVect = np.argmax(myMat, axis=0)
maxVect = np.max(myMat, axis=0)
But is there a syntax that would perform both at the same time? Note: I'm trying to improve run times.
You can use the index to find the corresponding values:
In [201]: arr=np.array([[0,0,1],[2,0,0],[5,0,0]])
In [202]: idx=np.argmax(arr, axis=0)
In [203]: np.max(arr, axis=0)
Out[203]: array([5, 0, 1])
In [204]: arr[idx,np.arange(3)]
Out[204]: array([5, 0, 1])
Is this worth it? I doubt that the use of argmax and/or max is a bottleneck in your calculations, but feel free to time it with realistic data.
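Wrapped up as a function that returns both vectors at once, the indexing trick above might look like the sketch below. The name argmax_and_max is hypothetical, and the function assumes a 2-D array:

```python
import numpy as np


def argmax_and_max(mat):
    """Return (row index of the max, max value) for each column of a 2-D array."""
    row_ids = np.argmax(mat, axis=0)
    # Pair each column index with its winning row index to pull out the maxima
    max_vals = mat[row_ids, np.arange(mat.shape[1])]
    return row_ids, max_vals


arr = np.array([[0, 0, 1], [2, 0, 0], [5, 0, 0]])
rowIdVect, maxVect = argmax_and_max(arr)
print(rowIdVect)  # [2 0 0]
print(maxVect)    # [5 0 1]
```

This scans the matrix once for the indices and then does a cheap gather, rather than scanning it a second time with np.max.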