Numpy choice based on another array - python-3.x

I want to select some elements from an array main_array whose indexes correspond to a True value in another array. For example, y should contain [14, 15, 16] in arbitrary order:
import numpy as np
main_array = np.array([11,12,13,14,15,16])
selector = np.array([0,1,2,3,3,3])
x = np.random.choice(main_array, 3, replace=False) # This works
y = np.random.choice(main_array, 3, replace=False, p=np.where(selector>2)) # This fails
However, I get ValueError: 'p' must be 1-dimensional
What is the correct way to limit selection to indexes based on another array?

One way is to do it in two steps: first filter the array with a boolean mask, then sample from the result:
import numpy as np
main_array = np.array([11, 12, 13, 14, 15, 16])
selector = np.array([0, 1, 2, 3, 3, 3])
x = np.random.choice(main_array, 3, replace=False)
z = main_array[selector > 2]
y = np.random.choice(z, len(z), replace=False)
print(f"x={x}")
print(f"z={z}")
print(f"y={y}")
The output is
x=[16 14 13]
z=[14 15 16]
y=[16 15 14]
Another way is to set the probabilities to zero wherever the mask doesn't apply:
import numpy as np
main_array = np.array([11, 12, 13, 14, 15, 16])
selector = np.array([0, 1, 2, 3, 3, 3])
x = np.random.choice(main_array, 3, replace=False)
p = 1 * (selector > 2)
y = np.random.choice(main_array, 3, replace=False, p=p / np.sum(p))
print(y)
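The ValueError in the question arises because np.where called with only a condition returns a tuple of index arrays rather than a probability vector. As a minimal sketch of the mask-based approach, the following uses NumPy's Generator interface (np.random.default_rng); the seed value and the names mask and rng are illustrative assumptions, not part of the original question.

```python
import numpy as np

main_array = np.array([11, 12, 13, 14, 15, 16])
selector = np.array([0, 1, 2, 3, 3, 3])

# np.where with a single argument returns a tuple of index arrays,
# which is why it cannot serve as the probability vector p.
indices = np.where(selector > 2)
print(type(indices), indices[0])

# A boolean mask selects the eligible elements directly.
mask = selector > 2
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability
y = rng.choice(main_array[mask], size=3, replace=False)
print(sorted(y.tolist()))  # [14, 15, 16]
```

Since exactly three elements pass the mask and three are drawn without replacement, only the order of y varies between runs.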

Related

Python: Plot histograms with customized bins

I am using matplotlib.pyplot to make a histogram. Due to the distribution of the data, I want to set up the bins manually. The details are as follows:
Any value = 0 in one bin;
Any value > 60 in the last bin;
Any value > 0 and <= 60 goes between the two bins described above, with a bin size of 5.
Could you please give me some help? Thank you.
I'm not sure what you mean by "the bin size is 5". You can either plot a histogram by specifying the bins with a sequence:
import matplotlib.pyplot as plt
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5] # your data here
plt.hist(data, bins=[0, 0.5, 60, max(data)])
plt.show()
But the bar widths will match the corresponding intervals, meaning, in this example, that the "0-case" will be barely visible.
(Note that 60 falls into the last bin when the bins are specified as a sequence, because every bin except the last is half-open on the right. For integer data, changing the sequence to [0, 0.5, 60.5, max(data)] would fix that.)
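To see how values on a bin edge are assigned, the counts can be inspected with np.histogram, which plt.hist uses for the binning. This is a small sketch under the assumption that the data is integer-valued; the shifted edge 60.5 is one possible choice, not the only one.

```python
import numpy as np

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5]

# Every bin is half-open [left, right) except the last one,
# which also includes its right edge. Values outside the edges are ignored.
counts, edges = np.histogram(data, bins=[0, 0.5, 60, max(data)])
print(counts.tolist())  # [2, 7, 3]: the value 60 is counted in the last bin

# Assumption: integer data, so an edge at 60.5 separates 60 from 61.
counts_shifted, _ = np.histogram(data, bins=[0, 0.5, 60.5, max(data)])
print(counts_shifted.tolist())  # [2, 8, 2]: 60 now falls in the middle bin
```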
What you (probably) need is first to categorize your data and then plot a bar chart of the categories:
import matplotlib.pyplot as plt
import pandas as pd
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5] # your data here
df = pd.DataFrame()
df['data'] = data
def find_cat(x):
    if x == 0:
        return "0"
    elif x > 60:
        return "> 60"
    elif x > 0:
        return "> 0 and <= 60"
df['category'] = df['data'].apply(find_cat)
df.groupby('category', as_index=False).count().plot.bar(x='category', y='data', rot=0, width=0.8)
plt.show()
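The same categorization can probably be expressed with pd.cut instead of a hand-written function. The sketch below assumes integer data (so that only the value 0 lands in the first interval); the lower edge -0.5 and the name labels are illustrative choices.

```python
import numpy as np
import pandas as pd

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5]
labels = ["0", "> 0 and <= 60", "> 60"]

# Intervals are right-closed by default: (-0.5, 0], (0, 60], (60, inf].
# Assumption: integer data, so only the value 0 lands in the first interval.
categories = pd.cut(pd.Series(data), bins=[-0.5, 0, 60, np.inf], labels=labels)

# Values outside all intervals (here -5) become NaN and are not counted.
counts = {label: int((categories == label).sum()) for label in labels}
print(counts)  # {'0': 2, '> 0 and <= 60': 8, '> 60': 2}
```

The resulting counts could then be drawn as a bar chart, for example with pd.Series(counts).plot.bar(rot=0).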
Building off Tranbi's answer, you could specify the bin edges as detailed in the link they shared.
import matplotlib.pyplot as plt
import pandas as pd
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -6] # your data here
df = pd.DataFrame()
df['data'] = data
bin_edges = [-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
bin_edges_offset = [x+0.000001 for x in bin_edges]
plt.figure()
plt.hist(df['data'], bins=bin_edges_offset)
plt.show()
IIUC you want a classic histogram for values between 0 (not included) and 60 (included), with two extra bins for 0 and >60 on the sides.
In that case I would recommend plotting the 3 regions separately:
import matplotlib.pyplot as plt
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -3] # your data here
fig, axes = plt.subplots(1,3, sharey=True, width_ratios=[1, 12, 1])
fig.subplots_adjust(wspace=0)
# counting 0 values and drawing a bar between -5 and 0
axes[0].bar(-5, data.count(0), width=5, align='edge')
axes[0].xaxis.set_visible(False)
axes[0].spines['right'].set_visible(False)
axes[0].set_xlim((-5, 0))
# histogram between (0, 60]
axes[1].hist(data, bins=12, range=(0.0001, 60.0001))
axes[1].yaxis.set_visible(False)
axes[1].spines['left'].set_visible(False)
axes[1].spines['right'].set_visible(False)
axes[1].set_xlim((0, 60))
# counting values > 60 and drawing a bar between 60 and 65
axes[2].bar(60, len([x for x in data if x > 60]), width=5, align='edge')
axes[2].xaxis.set_visible(False)
axes[2].yaxis.set_visible(False)
axes[2].spines['left'].set_visible(False)
axes[2].set_xlim((60, 65))
plt.show()
Edit: If you wanna plot probability density, I would edit the data and simply use hist:
import matplotlib.pyplot as plt
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -3] # your data here
data2 = []
for el in data:
    if el < 0:
        pass
    elif el > 60:
        data2.append(61)
    else:
        data2.append(el)
plt.hist(data2, bins=14, density=True, range=(-4.99,65.01))
plt.show()
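As a rough check of the density approach, the same filtering and capping can be done with NumPy, and the bin counts and total area inspected without plotting. This is a sketch assuming integer data; the names values and total_area are illustrative.

```python
import numpy as np

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -3]

values = np.array(data)
values = values[values >= 0]        # drop negative values
values = np.minimum(values, 61)     # cap everything above 60 at 61 (integer data assumed)

# 14 bins of width 5 between -4.99 and 65.01, as in the hist call above
counts, edges = np.histogram(values, bins=14, range=(-4.99, 65.01))
density, _ = np.histogram(values, bins=14, range=(-4.99, 65.01), density=True)

print(counts[0], counts[-1])        # 2 2: two zeros, two values above 60

# With density=True the bar areas sum to 1
total_area = float(np.sum(density * np.diff(edges)))
```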

How to concatenate list to int in Python?

When using a list, I saw that I cannot add to or subtract from the sample I took from the list. For example:
import random
x = random.sample ((1 ,2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13), k=1 )
print(x + 1)
Why can't I add to the list I created, and how can I get around this issue?
If you want to increase the value of every item in a list, you can do it like this:
import random
x = random.sample ((1 ,2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13), k=3 )
print(x)
for index in range(len(x)):
    x[index] = x[index] +1
print(x)
In your case, if k is always 1, you can simply do:
import random
x = random.sample ((1 ,2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13), k=1 )
print(x)
x[0] = x[0] + 1
print(x)
The reason x + 1 fails is that random.sample returns a list (here of size k=1), and Python cannot add an int to a list. If you want to draw a single element of your sequence and add to it, use random.choice instead. It should read something along the lines of:
import random
x = random.choice((1,2,3,4,5,6,7,8,9,10,11,12,13))
print(x+1)
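The difference between the two functions can be made visible by comparing their return types. The following is a small sketch; the variable names sampled and picked are illustrative, and no particular random values are assumed.

```python
import random

values = range(1, 14)  # the numbers 1 to 13

sampled = random.sample(values, k=1)   # a list holding one element
picked = random.choice(values)         # a single int

try:
    sampled + 1                        # list + int is not defined
except TypeError as error:
    print(f"TypeError: {error}")

print(picked + 1)                      # works: int + int
print(sampled[0] + 1)                  # works: index into the list first

# Add 1 to every element of a larger sample
print([value + 1 for value in random.sample(values, k=3)])
```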

How do I get the x and y labels to appear when displaying more than one histogram using pandas hist() function with the by argument?

I am trying to create a series of graphs that share x and y labels. I can get the graphs to each have a label (explained well here!), but this is not what I am looking for.
I want one label that covers the y axis of both graphs, and same for the x axis.
I've been looking at the matplotlib and pandas documentation and was unable to find anything that addresses this issue when using the by argument.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
                   'B': [1, 7, 2, 4, 1, 4, 8, 3],
                   'C': [1, 4, 8, 3, 1, 7, 3, 4],
                   'D': [1, 2, 6, 5, 8, 3, 1, 7]},
                  index=[0, 1, 2, 3, 5, 6, 7, 8])
histo = df.hist(by=df['A'], sharey=True, sharex=True)
plt.ylabel('ylabel') # I assume the label is created on the 4th graph and then deleted?
plt.xlabel('xlabel') # Creates a label on the 4th graph.
plt.tight_layout()
plt.show()
The output looks like this.
Is there any way that I can create a Y label that goes across the entire left side of the image (not each graph individually), and the same for the X label?
As you can see, the x label only appears on the last graph created, and there is no y label.
Help?
This is one way to do it indirectly using the x- and y-labels as texts. I am not aware of a direct way using plt.xlabel or plt.ylabel. When passing an axis object to df.hist, the sharex and sharey arguments have to be passed in plt.subplots(). Here you can manually control/specify the position where you want to put the labels. For example, if you think the x-label is too close to the ticks, you can use 0.5, -0.02, 'X-label' to shift it slightly below.
import matplotlib.pyplot as plt
import pandas as pd
f, ax = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
                   'B': [1, 7, 2, 4, 1, 4, 8, 3],
                   'C': [1, 4, 8, 3, 1, 7, 3, 4],
                   'D': [1, 2, 6, 5, 8, 3, 1, 7]},
                  index=[0, 1, 2, 3, 5, 6, 7, 8])
histo = df.hist(by=df['A'], ax=ax)
f.text(0, 0.5, 'Y-label', ha='center', va='center', fontsize=20, rotation='vertical')
f.text(0.5, 0, 'X-label', ha='center', va='center', fontsize=20)
plt.tight_layout()
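Matplotlib 3.4 and later also offer figure-level labels through Figure.supxlabel and Figure.supylabel, which may be a more direct alternative to positioning text by hand. A minimal sketch, assuming such a Matplotlib version is installed; the Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
                   'B': [1, 7, 2, 4, 1, 4, 8, 3],
                   'C': [1, 4, 8, 3, 1, 7, 3, 4],
                   'D': [1, 2, 6, 5, 8, 3, 1, 7]},
                  index=[0, 1, 2, 3, 5, 6, 7, 8])

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
df.hist(by=df['A'], ax=axes)

# Figure-level labels, centred on the whole figure rather than on one subplot
x_text = fig.supxlabel('X-label', fontsize=20)
y_text = fig.supylabel('Y-label', fontsize=20)

fig.tight_layout()
```

Both calls return the created Text object, so the label can still be restyled or repositioned afterwards.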
I fixed the issue with the variable number of sub-plots using something like this:
cols = 3
n = len(set(df['A']))
rows = int(n / cols) + (0 if n % cols == 0 else 1)
fig, axes = plt.subplots(rows, cols)
extra = rows * cols - n
if extra:
    newaxes = []
    count = 0
    for row in range(rows):
        for col in range(cols):
            if count < n:
                newaxes.append(axes[row][col])
            else:
                axes[row][col].axis('off')
            count += 1
else:
    newaxes = axes
hist = df.hist(by=df['A'], ax=newaxes)
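The same idea can be written more compactly by flattening the axes array. This is a sketch rather than a drop-in replacement: it assumes the df from the question, uses squeeze=False so that indexing also works when there is only one row, and the names used_axes and unused_axes are illustrative.

```python
import math

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
                   'B': [1, 7, 2, 4, 1, 4, 8, 3],
                   'C': [1, 4, 8, 3, 1, 7, 3, 4],
                   'D': [1, 2, 6, 5, 8, 3, 1, 7]},
                  index=[0, 1, 2, 3, 5, 6, 7, 8])

cols = 3
n = df['A'].nunique()        # number of groups, one subplot each
rows = math.ceil(n / cols)   # enough rows to hold all groups

# squeeze=False always returns a 2-D array, even when rows == 1
fig, axes = plt.subplots(rows, cols, squeeze=False)
flat_axes = list(axes.ravel())
used_axes, unused_axes = flat_axes[:n], flat_axes[n:]

for ax in unused_axes:
    ax.axis('off')           # hide the empty trailing subplots

hist = df.hist(by=df['A'], ax=used_axes)
```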

Draw different graphs at the same position/co-ordinates in python using networkX and matplotlib

Graph 1:
Adjacency list:
2: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14]
3: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14]
5: [2, 3, 4, 5, 6, 7, 8, 9]
Plot:
import networkx as nx
G = nx.Graph()
G1 = nx.Graph()
import matplotlib.pyplot as plt
for i, j in adj_list.items():
    for k in j:
        G.add_edge(i, k)
pos = nx.spring_layout(G)
nx.draw(G, with_labels=True, node_size = 1000, font_size=20)
plt.draw()
plt.figure() # To plot the next graph in a new figure
plt.show()
In graph 2, I eliminate a few edges and replot the graph, but the positions of the nodes change. How can I store the node positions for the next graph?
You need to re-use your pos variable while plotting the graph. nx.spring_layout returns a dictionary where the node id is the key and the values are the x,y co-ordinates of the node to be plotted. Just reuse the same pos variable and pass it as the pos argument to the nx.draw function, like this:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G1 = nx.Graph()
adj_list = {2: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14],
            3: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14],
            5: [2, 3, 4, 5, 6, 7, 8, 9]}
for i, j in adj_list.items():
    for k in j:
        G.add_edge(i, k)
pos = nx.spring_layout(G) #<<<<<<<<<< Initialize this only once
nx.draw(G,pos=pos, with_labels=True, node_size = 1000, font_size=20) #<<<<<<<<< pass the pos variable
plt.draw()
plt.figure() # To plot the next graph in a new figure
plt.show()
Now I will create a new graph and add only half the edges
cnt = 0
G = nx.Graph()
for i, j in adj_list.items():
    for k in j:
        cnt+=1
        if cnt%2 == 0:
            continue
        G.add_edge(i, k)
nx.draw(G,pos=pos, with_labels=True, node_size = 1000, font_size=20) #<-- same pos variable is used
plt.draw()
plt.figure() # To plot the next graph in a new figure
plt.show()
As you can see only half the edges are added and the node positions still remain the same.
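A compact variant of the same idea is sketched below. It fixes the layout with a seed so that it is also reproducible across runs, and copies all nodes into the second graph so that nodes left without edges are still drawn. The seed value, the name H, and the Agg backend line are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import networkx as nx

adj_list = {2: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14],
            3: [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14],
            5: [2, 3, 4, 5, 6, 7, 8, 9]}

G = nx.Graph()
for node, neighbours in adj_list.items():
    for neighbour in neighbours:
        G.add_edge(node, neighbour)

# Compute the layout once; the seed makes it reproducible across runs.
pos = nx.spring_layout(G, seed=42)

# Second graph: same nodes, every other edge dropped.
H = nx.Graph()
H.add_nodes_from(G.nodes)
H.add_edges_from(list(G.edges)[::2])

# Draw both graphs in separate figures with the same node positions.
for graph in (G, H):
    plt.figure()
    nx.draw(graph, pos=pos, with_labels=True, node_size=1000, font_size=20)

# Every node of the second graph has a stored position.
missing = [node for node in H.nodes if node not in pos]
print(missing)  # []
```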

Numpy arrays: can I multiply only a few elements in the array and not all of them?

I am using Python3 and numpy with matplotlib on a project to get Jupiter's Mass from observational telescope astrometry. I want to take an array of numbers, say from 1 to 10, and multiply only a few of them in order, say 1 to 4, by -1.
So 1 to 4 would now be negative and 5 to 10 would still be positive. I imagine output like this:
L = [1,2,3,4,5,6,7,8,9,10]
array_L = np.array(L)
>>>array_L
array([1,2,3,4,5,6,7,8,9,10])
neg = array_L[0:4]
>>>neg
array([1,2,3,4])
Neg = neg * -1
>>>Neg
array([-1,-2,-3,-4])
Now I need a way of combining neg and array_L into a new final array that would output like this:
# pseudo code: Neg + array_L(minus elements 0 to 4) = New_L
New_L = array([-1,-2,-3,-4, 5, 6, 7, 8, 9, 10])
Also I know it may be possible to do a limited element iteration over just the elements I want and not the whole array. I can do some of these operations on the list vs the array if it would make it easier.
You are almost there! Try this:
import numpy as np
L = np.array([1,2,3,4,5,6,7,8,9,10])
L[0:4] *= -1
print(L)
As with regular Python lists, you can assign to a slice of a NumPy array, and in-place operators such as *= applied to a slice modify the selected elements of the original array:
>>> import numpy
>>> L = [1,2,3,4,5,6,7,8,9,10]
>>> array_L = numpy.array(L)
>>> array_L
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> array_L[0:4] *= -1
>>> array_L
array([-1, -2, -3, -4, 5, 6, 7, 8, 9, 10])
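If the original array should stay unchanged, the new array can be built from a copy or assembled from pieces, which is closer to the pseudocode in the question. A minimal sketch; the names new_L, combined and masked are illustrative.

```python
import numpy as np

array_L = np.arange(1, 11)   # the integers 1 to 10

# In-place multiplication on a copy, leaving the original untouched
new_L = array_L.copy()
new_L[:4] *= -1

# Equivalent results built without modifying anything in place
combined = np.concatenate([array_L[:4] * -1, array_L[4:]])
masked = np.where(array_L <= 4, -array_L, array_L)

print(new_L.tolist())  # [-1, -2, -3, -4, 5, 6, 7, 8, 9, 10]
```

The np.where variant selects elements by value rather than by position, so it only matches the slice-based versions here because the first four elements happen to be the values 1 to 4.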
