I have an array that I would like to place into 7 bins, and then calculate the mean and the standard error corresponding to each bin so that I can plot both the histogram and the error bars. While the NumPy histogram readily outputs the mean values of the bins, it is not meant to produce error bars (unless I am wrong). This is why I want to use the physt Python package to directly extract the mean and error for each bin. However, I just noticed that the two methods do not agree with each other in the first place: they do not even produce the same mean values (heights). Now I am rather confused. I would truly appreciate your help.
import numpy as np
from physt import h1
import matplotlib.pyplot as plt
x_arr = np.array([
0, 32, 28, 15, 19, 22, 18, 16, 13, 35, 21, 32, 23, 11, 17, 3, 17, 3, 21, 43, 32, 15, 16, 18,
28, 9, 33, 16, 20, 19, 35, 37, 32, 26, 30, 30, 28, 30, 22, 25, 21, 26, 41, 41, 12, 3, 5, 6, 5,
17, 16, 16, 16, 7, 2, 15, 16, 15, 15, 15, 7, 5
])
bins = np.array([0, 2, 3, 5, 9, 17, 33, 65])
ax = plt.axes()
heights, bins, patches = ax.hist(x_arr, bins, density=True)
print('numpy: \n', heights)
hist = h1(x_arr, bins, density=True)
print('physt: \n', hist.frequencies / sum(hist.frequencies))
And here are the outputs which are interestingly different:
numpy:
[0.00806452 0.01612903 0.02419355 0.02419355 0.03427419 0.02721774
0.00352823]
physt:
[0.01612903 0.01612903 0.0483871 0.09677419 0.27419355 0.43548387
0.11290323]
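The two outputs above appear to differ only in normalization, not in binning: with density=True each count is divided by the total count times the bin width, whereas frequencies / sum(frequencies) is just the fraction of samples in each bin. Here is a minimal NumPy-only sketch of that relationship. The sqrt(count) error bars are an assumption (Poisson counting statistics), not something taken from physt, and the variable names are illustrative.

```python
import numpy as np

x_arr = np.array([
    0, 32, 28, 15, 19, 22, 18, 16, 13, 35, 21, 32, 23, 11, 17, 3, 17, 3, 21, 43, 32, 15, 16, 18,
    28, 9, 33, 16, 20, 19, 35, 37, 32, 26, 30, 30, 28, 30, 22, 25, 21, 26, 41, 41, 12, 3, 5, 6, 5,
    17, 16, 16, 16, 7, 2, 15, 16, 15, 15, 15, 7, 5
])
bins = np.array([0, 2, 3, 5, 9, 17, 33, 65])

# Raw counts per bin and the width of each (unequal) bin
counts, edges = np.histogram(x_arr, bins)
widths = np.diff(edges)
n_total = counts.sum()

# What density=True computes: count / (N * bin width)
density = counts / (n_total * widths)
# What frequencies / sum(frequencies) computes: count / N
fraction = counts / n_total

# Assumption: Poisson counting errors, sqrt(count), scaled like the heights
density_err = np.sqrt(counts) / (n_total * widths)

print('density: \n', density)
print('fraction: \n', fraction)
print('density error: \n', density_err)
```

Multiplying the density heights by the bin widths recovers the fractions, so the two results are consistent once the same normalization is used. The error array could then be passed to something like ax.errorbar at the bin centers.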
The code snippet below displays the plot perfectly in the PyCharm window, but the same image does not appear properly when it is saved to a file.
How can I save the image properly?
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
sns.set_context('paper')
report_id = ['Report_1', 'Report_2', 'Report_3', 'Report_4', 'Report_5', 'Report_6', 'Report_7', 'Report_8', 'Report_9',
'Report_10', 'Report_11', 'Report_12', 'Report_13', 'Report_14', 'Report_15', 'Report_16', 'Report_17',
'Report_18', 'Report_19', 'Report_20', 'Report_21', 'Report_22', 'Report_23', 'Report_24', 'Report_25',
'Report_26', 'Report_27', 'Report_28', 'Report_29', 'Report_30', 'Report_31', 'Report_32', 'Report_33',
'Report_34', 'Report_35', 'Report_36', 'Report_37', 'Report_38', 'Report_39', 'Report_40', 'Report_41',
'Report_42', 'Report_43', 'Report_44', 'Report_45', 'Report_46', 'Report_47', 'Report_48', 'Report_49',
'Report_50', 'Report_51', 'Report_52', 'Report_53', 'Report_54', 'Report_55', 'Report_56', 'Report_57',
'Report_58', 'Report_59', 'Report_60']
report_value = [1300, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60]
df = pd.DataFrame({'report_id': report_id, 'report_value': report_value})
sns.set(rc={'figure.figsize': (15, 100)})
ax = sns.barplot(y="report_id", x="report_value", data=df, palette="GnBu_d")
ax.tick_params(labelsize=3)
initialx = 0
for p in ax.patches:
    ax.text(p.get_width(), initialx + p.get_height() / 10, "{:1.0f}".format(p.get_width()), fontsize=5)
    initialx += 1
plt.savefig(r"C:\Program\Anaconda3\venvs\PlotGraph\Bar_Graph.png")
plt.show()
Pycharm Image:
Saved Image of Same plot:
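No answer is recorded here, so the following is only a guess at a workaround: the figure is set to 15 x 100 inches with 3 to 5 pt text, which can look very different once written to disk at the default resolution. A minimal sketch, assuming that an explicit figure size, a higher dpi and bbox_inches="tight" are enough. It uses plain matplotlib instead of seaborn, and the output path is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

report_id = [f"Report_{i}" for i in range(1, 61)]
report_value = [1300] + list(range(2, 61))

# Size the figure explicitly (in inches) instead of through global rc settings
fig, ax = plt.subplots(figsize=(8, 12))
bars = ax.barh(report_id, report_value)
ax.invert_yaxis()  # keep Report_1 at the top
ax.tick_params(labelsize=7)

# Write each value at the end of its bar
for bar in bars:
    ax.text(bar.get_width(), bar.get_y() + bar.get_height() / 2,
            f"{bar.get_width():.0f}", va="center", fontsize=7)

# Hypothetical output location; dpi and bbox_inches control the saved file only
out_path = os.path.join(tempfile.gettempdir(), "Bar_Graph.png")
fig.savefig(out_path, dpi=200, bbox_inches="tight")
plt.close(fig)
```

Saving through the figure object before any call to plt.show() avoids writing an already-closed, empty figure.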
I want to update the value of a key in a dictionary. This is a snippet of a list that contains over 300 dictionaries:
chats = [
{'hour': 10, 'operator': 'john_doe', 'duration': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6, 7, 8, 9], 'date': '2019-09-09'}
]
Script: I am getting an error with the script below. I loop over chat_list to check whether a matching dict is already in it, so that I can update its duration.
chat_list = list()
for chat in chats:
    hour = chat.get('hour')
    operator = chat.get("operator")
    if len(chat_list) == 0:
        chat_list.append(chat)
    else:
        found = False
        for i in chat_list:
            hour2 = chat.get('hour')
            operator2 = chat.get("operator")
            if (hour2 == hour) and (operator == operator2):
                found = True
                #concat both dictionary
                i['duration'] = i.get('duration') + chat.get("duration")
        if found == True:
            found = False
        else:
            chat_list.append(chat)
My expected output is
chat_list = [
{'hour': 10, 'operator': 'john_doe', 'duration': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6, 7, 8, 9], 'date': '2019-09-09'}
]
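One thing that stands out in the loop from the question: hour2 and operator2 are read from chat rather than from i, so the comparison is always true and every later chat gets merged into every stored entry. A minimal sketch of the same loop with that comparison fixed; the duration lists are shortened here purely for illustration, and the name existing is mine.

```python
chats = [
    {'hour': 10, 'operator': 'john_doe', 'duration': [22, 23], 'date': '2019-09-09'},
    {'hour': 10, 'operator': 'john_doe', 'duration': [0, 1], 'date': '2019-09-09'},
    {'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2], 'date': '2019-09-09'},
    {'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6], 'date': '2019-09-09'},
]

chat_list = []
for chat in chats:
    for existing in chat_list:
        # compare against the entry already stored, not against chat itself
        if existing['hour'] == chat['hour'] and existing['operator'] == chat['operator']:
            # build a new list so the input dictionaries are not modified
            existing['duration'] = existing['duration'] + chat['duration']
            break
    else:
        # for/else: runs only when no stored entry matched
        chat_list.append(dict(chat))

print(chat_list)
```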
or
df = pd.DataFrame(chat_list)
df['duration'] = df['duration'].apply(lambda x: list(set(x)))
To be honest, I didn't test your algorithm. Instead I took it as a small challenge and wrote the following algorithm, which doesn't need to copy the chats into a new list.
It finds the first occurrence of a "similar" chat and concatenates the duration arrays. Then it deletes the "duplicated" chat. Further explanation is in the code itself:
chats = [
{'hour': 10, 'operator': 'john_doe', 'duration': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6, 7, 8, 9], 'date': '2019-09-09'}
]
index = 0
# visit every chat, including the last one
while index < len(chats):
    chat = chats[index]
    # find the first "similar" chat in the list
    # (this is the current chat itself when there is no earlier match)
    first_index = next(
        i for i, first_chat in enumerate(chats)
        if chat.get('hour') == first_chat.get('hour') and chat.get('operator') == first_chat.get('operator')
    )
    # if the first index found is not this one:
    # - concat `duration` arrays
    # - delete this (duplicated) chat
    if index != first_index:
        chats[first_index]['duration'] += chat['duration']
        del chats[index]
    # otherwise continue and increment the index
    else:
        index += 1
print(chats)
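Since the list holds a few hundred dictionaries, a single pass with a dict keyed on (hour, operator) may be simpler than rescanning the list for every chat. This is only a sketch of that alternative, not the approach above; merge_chats is a hypothetical helper name and the sample data is shortened for illustration.

```python
def merge_chats(chats):
    """Merge chats that share (hour, operator), keeping first-seen order."""
    merged = {}
    for chat in chats:
        key = (chat['hour'], chat['operator'])
        if key not in merged:
            # copy the dict and its duration list so the input stays untouched
            merged[key] = {**chat, 'duration': list(chat['duration'])}
        else:
            merged[key]['duration'].extend(chat['duration'])
    # dicts preserve insertion order (Python 3.7+)
    return list(merged.values())


sample = [
    {'hour': 10, 'operator': 'john_doe', 'duration': [22, 23], 'date': '2019-09-09'},
    {'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2], 'date': '2019-09-09'},
    {'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6], 'date': '2019-09-09'},
    {'hour': 10, 'operator': 'john_doe', 'duration': [18, 19], 'date': '2019-09-09'},
]
result = merge_chats(sample)
print(result)
```

Note that a duplicate in the last position is merged too, which is the case worth checking in any variant of this loop.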
import numpy as np
arr = np.array(range(60)).reshape(6,10)
arr
> array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
> [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
> [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
> [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
> [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
> [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])
What I need:
select_random_windows(arr, number_of_windows=3, window_size=3)
> array([[[ 1,  2,  3],
>         [11, 12, 13],
>         [21, 22, 23]],
>
>        [[37, 38, 39],
>         [47, 48, 49],
>         [57, 58, 59]],
>
>        [[31, 32, 33],
>         [41, 42, 43],
>         [51, 52, 53]]])
In this hypothetical case I'm selecting 3 windows of 3x3 within the main array (arr).
My actual array is a raster, and I basically need a bunch (in the thousands) of little 3x3 windows.
Any help or even a hint would be much appreciated.
I haven't found a practical solution yet, even after many hours of searching.
Thanks!
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, to get sliding windows.
import numpy as np
from skimage.util.shape import view_as_windows

def select_random_windows(arr, number_of_windows, window_size):
    # Get sliding windows
    w = view_as_windows(arr, window_size)
    # Store shape info
    m, n = w.shape[:2]
    # Get random row, col indices for indexing into windows array
    # If duplicate windows are allowed, use replace=True or np.random.randint
    lidx = np.random.choice(m*n, number_of_windows, replace=False)
    r, c = np.unravel_index(lidx, (m, n))
    # Finally index into windows and return output
    return w[r, c]
Sample run -
In [209]: arr
Out[209]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])
In [210]: np.random.seed(0)
In [211]: select_random_windows(arr, number_of_windows=3, window_size=(2,4))
Out[211]:
array([[[41, 42, 43, 44],
[51, 52, 53, 54]],
[[26, 27, 28, 29],
[36, 37, 38, 39]],
[[22, 23, 24, 25],
[32, 33, 34, 35]]])
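If installing scikit-image is not an option, the same idea can probably be written with NumPy alone. This is a sketch assuming NumPy 1.20 or newer, where numpy.lib.stride_tricks.sliding_window_view is available. The seed parameter is my own addition for reproducibility, and window_size is taken as a single integer for square windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def select_random_windows(arr, number_of_windows, window_size, seed=None):
    # seed is a hypothetical extra argument, used only to make runs repeatable
    rng = np.random.default_rng(seed)
    # Read-only view of every window; shape is (m, n, window_size, window_size)
    w = sliding_window_view(arr, (window_size, window_size))
    m, n = w.shape[:2]
    # Draw distinct flat window indices, then convert them to (row, col)
    lidx = rng.choice(m * n, size=number_of_windows, replace=False)
    r, c = np.unravel_index(lidx, (m, n))
    # Advanced indexing copies the selected windows out of the view
    return w[r, c]


arr = np.arange(60).reshape(6, 10)
windows = select_random_windows(arr, number_of_windows=3, window_size=3, seed=0)
print(windows.shape)
```

The view itself costs no extra memory, so drawing thousands of windows from a large raster only allocates the selected 3x3 blocks.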
You can try numpy.random.choice(). It takes a 1-D array (or an int) and draws a random sample from it. You can also pass the size of the array you want as the output.
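Random sampling alone does not return 2-D windows, so one way to read this suggestion is to sample the top-left corner of each window and then slice. A rough sketch under that assumption, using the Generator.choice variant; the names k and size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = np.arange(60).reshape(6, 10)
k, size = 3, 3  # number of windows and window side length

# Sample top-left corners so that every window fits inside the array.
# Rows and columns are drawn independently, so duplicate windows are possible.
rows = rng.choice(arr.shape[0] - size + 1, size=k)
cols = rng.choice(arr.shape[1] - size + 1, size=k)

# Slice one window per sampled corner and stack them into shape (k, size, size)
windows = np.stack([arr[r:r + size, c:c + size] for r, c in zip(rows, cols)])
print(windows.shape)
```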
my code:
def originalList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
def newList = originalList.percent(0.05, 0.95) // I have no idea what I'm doing here
println newList
I have an original list of numbers from 1 to 100, and I want to make a new list from it. However, the new list must only contain the data that belongs to the 5%-95% sub-range of the original list,
so the new list must look like [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18....95]. How do I do that? I know my newList code is wrong.
You mean like:
originalList[ 4..94 ] // zero starting pos
Or do you need percentages?
You could do:
originalList[ (originalList.size() * 0.05 - 1)..<(originalList.size() * 0.95) ]
You could also use the metaClass:
List.metaClass.percent { double lower, double upper ->
    int d = lower * delegate.size() - 1
    int t = upper * delegate.size()
    delegate.take( t ).drop( d )
}
originalList.percent( 0.05, 0.95 )