I have an array that I would like to place into 7 bins, and then calculate the mean and the standard error for each bin so that I can plot both the histogram and its error bars. While the numpy histogram readily outputs the mean values (heights) of the bins, it is not meant to produce error bars (unless I am wrong). This is why I want to use the physt Python package to extract the heights and errors for each bin directly. However, I just noticed that the two methods do not agree with each other in the first place: they do not even produce the same heights. Now I am confused, and I would truly appreciate your help.
import numpy as np
from physt import h1
import matplotlib.pyplot as plt
x_arr = np.array([
0, 32, 28, 15, 19, 22, 18, 16, 13, 35, 21, 32, 23, 11, 17, 3, 17, 3, 21, 43, 32, 15, 16, 18,
28, 9, 33, 16, 20, 19, 35, 37, 32, 26, 30, 30, 28, 30, 22, 25, 21, 26, 41, 41, 12, 3, 5, 6, 5,
17, 16, 16, 16, 7, 2, 15, 16, 15, 15, 15, 7, 5
])
bins = np.array([0, 2, 3, 5, 9, 17, 33, 65])
ax = plt.axes()
heights, bins, patches = ax.hist(x_arr, bins, density=True)
print('numpy: \n', heights)
hist = h1(x_arr, bins, density=True)
print('physt: \n', hist.frequencies / sum(hist.frequencies))
And here are the outputs which are interestingly different:
numpy:
[0.00806452 0.01612903 0.02419355 0.02419355 0.03427419 0.02721774
0.00352823]
physt:
[0.01612903 0.01612903 0.0483871 0.09677419 0.27419355 0.43548387
0.11290323]
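The two outputs above appear to differ only by the bin widths: density=True divides each count by the total number of samples and by the width of its bin, while frequencies / sum(frequencies) divides by the total only. The following is a minimal sketch of that relationship using numpy alone (physt is not called here). The sqrt(N) error bar is an assumption (Poisson counting statistics), not something stated in the question.

```python
import numpy as np

x_arr = np.array([
    0, 32, 28, 15, 19, 22, 18, 16, 13, 35, 21, 32, 23, 11, 17, 3, 17, 3, 21, 43, 32, 15, 16, 18,
    28, 9, 33, 16, 20, 19, 35, 37, 32, 26, 30, 30, 28, 30, 22, 25, 21, 26, 41, 41, 12, 3, 5, 6, 5,
    17, 16, 16, 16, 7, 2, 15, 16, 15, 15, 15, 7, 5
])
bins = np.array([0, 2, 3, 5, 9, 17, 33, 65])

counts, edges = np.histogram(x_arr, bins)             # raw counts per bin
density, _ = np.histogram(x_arr, bins, density=True)  # same heights as ax.hist(..., density=True)

widths = np.diff(edges)                 # the bins are not equally wide
fractions = counts / counts.sum()       # fraction of samples per bin (the second output)
by_hand = fractions / widths            # fraction per unit of x (the first output)

# Assumption: Poisson counting errors, sqrt(N) per bin, scaled like the heights.
density_err = np.sqrt(counts) / (counts.sum() * widths)

print(np.allclose(density, by_hand))
```

If this reading is right, the heights and the error bars only need the same normalisation, for example by passing yerr=density_err to ax.errorbar at the bin centres.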
I have this function:
import arrow
def function(start_date_arrow=None, end_date_arrow=None, date_concept=None):
    return [getattr(date, date_concept) for date in arrow.Arrow.range(date_concept, start_date_arrow, end_date_arrow)]
This function works well when iterating over date_concept='month' and date_concept='day'. On the other hand, date_concept='year' only returns a list of one item.
For example:
start_date_arrow= arrow.get('2021-11-05')
end_date_arrow= arrow.get('2022-02-05')
year_list=function(start_date_arrow=start_date_arrow,end_date_arrow=end_date_arrow, date_concept='year')
year_list is [2021]
month_list=function(start_date_arrow=start_date_arrow,end_date_arrow=end_date_arrow, date_concept='month')
month_list is [11, 12, 1, 2]
day_list=function(start_date_arrow=start_date_arrow,end_date_arrow=end_date_arrow, date_concept='day')
day_list is [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
The second and third calls are fine, but the first one should return [2021, 2022] instead of [2021].
Any idea what is happening in the year call?
Found the issue.
If you use:
start_date_arrow= arrow.get('2021-11-05')
end_date_arrow= arrow.get('2022-02-05')
The two dates are less than one full year apart. Arrow.range steps forward from the start date in whole years, and the next step (2022-11-05) is already past the end date, so only the first year is returned. For 2022 to appear in the list, end_date_arrow would have to be at least arrow.get('2022-11-05').
So I added an if statement that pushes the end date forward by one year, which forces both years to be returned.
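If only the calendar years are needed, one alternative to adjusting the end date is to build the list from the .year attributes directly. This is a minimal sketch: years_between is a hypothetical helper name, and it uses datetime so that it runs without arrow installed. Arrow objects also expose .year, so they could be passed in the same way.

```python
from datetime import datetime

def years_between(start, end):
    """Return every calendar year touched by the interval [start, end]."""
    # Only the .year attribute is read, so any date-like object works.
    return list(range(start.year, end.year + 1))

print(years_between(datetime(2021, 11, 5), datetime(2022, 2, 5)))
```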
Below is a function that checks for and prints every number common to list_1, list_2 and list_3. Is there a way to use enumerate, or any other function, to shorten the middle part of the code?
The part that needs shortening:
for elem in l1:  # loop over the elements of l1
    if elem in l2:  # check for the element in l2
        if elem in l3:  # check for the element in l3
Full Code:
def intersect(l1, l2, l3):
    for elem in l1:  # loop over the elements of l1
        if elem in l2:  # check for the element in l2
            if elem in l3:  # check for the element in l3
                print(elem)  # display the common element
list_1 =[27, 20, 22, 21, 17, 12, 24, 23, 19, 14, 11, 26, 25, 13, 15, 21, 18, 28, 29, 10]
list_2 = [14, 25, 26, 21, 22, 17, 11, 23, 27, 18, 24, 28, 12, 29, 16, 19, 13, 10, 20, 15]
list_3 = [19, 21, 11, 24, 16, 17, 18, 22, 26, 10, 23, 29, 27, 13, 25, 14, 15, 20, 28, 12]
intersect(list_1, list_2, list_3) #calling function
You can use numpy's intersect1d function, combined with functools.reduce, to find the values common to all of the lists or arrays:
def intersect(l1, l2, l3):
    print(reduce(np.intersect1d, (l1, l2, l3)))
Result:
[10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29]
Code:
import numpy as np
from functools import reduce
def intersect(l1, l2, l3):
    print(reduce(np.intersect1d, (l1, l2, l3)))
list_1 = [27, 20, 22, 21, 17, 12, 24, 23, 19, 14, 11, 26, 25, 13, 15, 21, 18, 28, 29, 10]
list_2 = [14, 25, 26, 21, 22, 17, 11, 23, 27, 18, 24, 28, 12, 29, 16, 19, 13, 10, 20, 15]
list_3 = [19, 21, 11, 24, 16, 17, 18, 22, 26, 10, 23, 29, 27, 13, 25, 14, 15, 20, 28, 12]
intersect(list_1, list_2, list_3)  # calling function
You could use set objects instead; set.intersection accepts several sets at once.
set_1 = set([27, 20, 22, 21, 17, 12, 24, 23, 19, 14, 11, 26, 25, 13, 15, 21, 18, 28, 29, 10])
set_2 = set([14, 25, 26, 21, 22, 17, 11, 23, 27, 18, 24, 28, 12, 29, 16, 19, 13, 10, 20, 15])
set_3 = set([19, 21, 11, 24, 16, 17, 18, 22, 26, 10, 23, 29, 27, 13, 25, 14, 15, 20, 28, 12])
set_1.intersection(set_2, set_3)
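The set approach can also be wrapped in a function that takes the original lists, which replaces the three nested checks from the question. This is a minimal sketch. Sorting the result is a choice made here for readable output, not something the question asked for, and duplicates (such as the repeated 21 in list_1) collapse to a single value.

```python
def intersect(l1, l2, l3):
    """Return the values present in all three lists, sorted ascending."""
    # set.intersection() accepts any iterables, so l2 and l3 need no conversion.
    common = set(l1).intersection(l2, l3)
    return sorted(common)

list_1 = [27, 20, 22, 21, 17, 12, 24, 23, 19, 14, 11, 26, 25, 13, 15, 21, 18, 28, 29, 10]
list_2 = [14, 25, 26, 21, 22, 17, 11, 23, 27, 18, 24, 28, 12, 29, 16, 19, 13, 10, 20, 15]
list_3 = [19, 21, 11, 24, 16, 17, 18, 22, 26, 10, 23, 29, 27, 13, 25, 14, 15, 20, 28, 12]
print(intersect(list_1, list_2, list_3))
```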
@tony selcuk - It seems you are looping over the three lists to find matching numbers. If so, you could try the snippet below. It uses zip() together with enumerate() to walk all three lists in parallel, producing an (index, num) tuple for each list, and compares the tuples to detect a match. Once it works as expected, you can easily turn it into a function. Note that this approach only finds numbers that appear in all three lists at the SAME position (index).
for i, j, k in zip(enumerate(list_1), enumerate(list_2), enumerate(list_3)):
    # print(i, j, k)
    if i == j == k:
        print("number:{} order:{}".format(i[1], j[0]))
I use list_.index() to get the position of a number within list_, for example list_[0] = 14. I want a function that goes through list_ and prints the positions of the numbers that are greater than 20. So the answer would be numbers = 1,2,3,4,5,7,8,10,11,13,18 within list_[] that are greater than 20.
list_ = [14, 25, 26, 21, 22, 17, 11, 23, 27, 18, 24, 28, 12, 29, 16, 19, 13, 10, 20, 15]
list_ = [14, 25, 26, 21, 22, 17, 11, 23, 27, 18, 24, 28, 12, 29, 16, 19, 13, 10, 20, 15]
for index, i in enumerate(list_):
    if i >= 20:
        print(index)
If you want it as a list:
x = [index for index,i in enumerate(list_) if i >= 20]
print(x)
>>> [1, 2, 3, 4, 7, 8, 10, 11, 13, 18]
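The same positions can also be obtained with numpy, as a minimal sketch. np.flatnonzero returns the indices where a boolean mask is True. Note that >= 20 includes the element equal to 20 (index 18), while a strict > 20 would leave it out.

```python
import numpy as np

list_ = [14, 25, 26, 21, 22, 17, 11, 23, 27, 18, 24, 28, 12, 29, 16, 19, 13, 10, 20, 15]
values = np.array(list_)

at_least_20 = np.flatnonzero(values >= 20).tolist()  # includes the value 20 itself
above_20 = np.flatnonzero(values > 20).tolist()      # strictly greater than 20

print(at_least_20)
print(above_20)
```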
I have the list below.
33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18
All I am trying to do is find the 25th percentile.
I used a simple numpy program to find it.
import numpy as np
arr = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18]
np.percentile(arr,25)
Output: 19.75
But if we calculate it manually or use Excel, the 25th percentile comes out as 19.25.
I expected 19.25, but the actual output from numpy is 19.75. Can someone please explain what is wrong here?
Excel has two percentile functions, PERCENTILE.EXC and PERCENTILE.INC. The difference is that "in the Percentile.Inc function the value of k is within the range 0 to 1 inclusive, and in the Percentile.Exc function, the value of k is within the range 0 to 1 exclusive." (source)
Numpy's percentile function computes the k-th percentile, where k must be between 0 and 100 inclusive (docs), so its default result corresponds to the inclusive variant.
Let's check that on the sorted data:
arr = [18, 18, 18, 18, 19, 20, 20, 20, 21, 22, 22, 23, 24, 26, 27, 32, 33, 49, 52, 56]
np.percentile(arr,25)
19.75
Hope that helps
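To make the difference concrete, here is a minimal pure-Python sketch of the two conventions. percentile_inc and percentile_exc are hypothetical helper names, written on the assumption that the inclusive variant interpolates at rank (n - 1) * p + 1 and the exclusive variant at rank (n + 1) * p (both 1-based), which is how these two Excel functions are usually described. They are not calls into Excel or numpy.

```python
def _interpolate(sorted_values, h):
    """Linearly interpolate at the 0-based fractional position h."""
    lo = int(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

def percentile_inc(values, p):
    """Inclusive convention: p in [0, 1], position (n - 1) * p."""
    s = sorted(values)
    return _interpolate(s, (len(s) - 1) * p)

def percentile_exc(values, p):
    """Exclusive convention: position (n + 1) * p - 1, undefined near 0 and 1."""
    s = sorted(values)
    h = (len(s) + 1) * p - 1
    if h < 0 or h > len(s) - 1:
        raise ValueError("p is outside the range supported by the exclusive convention")
    return _interpolate(s, h)

arr = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18]
print(percentile_inc(arr, 0.25))  # 19.75, the numpy default
print(percentile_exc(arr, 0.25))  # 19.25, the value from the question
```

Under these assumptions the 19.25 from the question matches the exclusive convention, while numpy's default matches the inclusive one.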
Check your input values and look up which method Excel uses, since these are the interpolation options in numpy:
t = ['linear', 'lower', 'higher', 'nearest', 'midpoint']
arr = np.array([33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18])
for i in t:
    # NumPy 1.22 and later name this keyword `method`; `interpolation` is the older spelling
    v = np.percentile(arr, 25., interpolation=i)
    print("type: {} value: {}".format(i, v))
type: linear value: 19.75
type: lower value: 19
type: higher value: 20
type: nearest value: 20
type: midpoint value: 19.5