Given a list of search terms and a pandas dataframe, what is the most pythonic way to print whether the search term is present in the target dataframe?
search_terms = ["red", "blue", "green", "orange"]
input_df looks like...
    color  count
0     red     15
1    blue     39
2  yellow     40
3   green     21
I want to see...
red = true
blue = true
green = true
orange = false
I know how to filter the input_df to include only search_terms. This doesn't alert me to the fact that "orange" was not located in the input_df. The search_terms could contain hundreds or thousands of strings.
import pandas as pd
color = ['red', 'blue', 'yellow', 'green']
count = [15,39,40,21]
input_dict = dict(color=color, count=count)
input_df = pd.DataFrame(data=input_dict)
found_df = input_df[input_df['color'].isin(search_terms)]
You can try:
out = dict(zip(search_terms, pd.Series(search_terms).isin(input_df['color'])))
Or:
out = dict(zip(search_terms, np.isin(search_terms, input_df['color'])))
Output:
{'red': True, 'blue': True, 'green': True, 'orange': False}
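To get the exact "term = true/false" report from the question, a plain set lookup is another option. This is only a sketch using the sample data from the question; the names present, report and missing are made up for illustration.

```python
import pandas as pd

search_terms = ["red", "blue", "green", "orange"]
input_df = pd.DataFrame(
    {"color": ["red", "blue", "yellow", "green"], "count": [15, 39, 40, 21]}
)

# Build the set once so each lookup is cheap, even with thousands of terms.
present = set(input_df["color"])
report = {term: term in present for term in search_terms}

# Print one line per search term in the format asked for in the question.
for term, found in report.items():
    print(f"{term} = {str(found).lower()}")

# Terms that were not located in the dataframe.
missing = [term for term, found in report.items() if not found]
```

The missing list is what the isin filter on input_df cannot tell you directly.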
I have recently started using Plotly to make 3D plots in Python, and I wanted to create an animation of what happens to the column vectors of a 3 by 3 matrix when applying Gaussian elimination.
I wrote a function to get the row echelon form and the history of the matrix obtained at each step.
Then I wanted to plot the column vectors at each step of the algorithm.
At first I was able to get an animation of the evolution of the three vectors by adapting this code: https://plotly.com/python/visualizing-mri-volume-slices/
But then I wanted to show on each frame the three column vectors of a given step and the three column vectors from the matrix of the previous step with opacity 0.2.
When I added that part of the code I got strange behavior from Plotly: it only showed the first three vectors given to the frame, not all of them.
Here is the code I have so far:
import numpy as np
import numpy.linalg as la
import plotly.graph_objects as go
v1 = np.array([5,2,1])
v2 = np.array([2,3,2])
v3 = np.array([3,-1,1])
A = np.transpose(np.vstack([v1,v2,v3]))
# G, H = pivot_Gauss(A)
H = [np.array([[ 5, 2, 3],[ 2, 3, -1],[ 1, 2, 1]]), np.array([[ 1, 0, 0],[ 2, 3, -1],[ 1, 2, 1]]),
np.array([[ 1, 0, 0],[ 0, 3, -1],[ 1, 2, 1]]), np.array([[ 1, 0, 0],[ 0, 3, -1],[ 0, 2, 1]]),
np.array([[1, 0, 0],[0, 1, 0],[0, 2, 1]]), np.array([[1, 0, 0],[0, 1, 0],[0, 0, 1]]),
np.array([[1, 0, 0],[0, 1, 0],[0, 0, 1]]) ]
G = np.array([[1,0,0],[0,1,0],[0,0,1]]) # results obtained using the function pivot_Gauss(A)
nb_frames = len(H)
frames = []
v_norm = 5
colors = ["blue","red","green"]
for k in range(nb_frames): # go.Frame(data,name=str(k))
    dat = []
    for j in range(np.shape(A)[1]):
        v = H[k][:,j]
        if la.norm(v) != 0 :
            d1 = go.Scatter3d( x=[0,v[0]],y=[0,v[1]],z=[0,v[2]],name="v"+str(k+j+1),hoverinfo='name',
                               marker=dict(size=0), line=dict(color=colors[j], width=10 ))
            dat.append(d1)
            d2 = go.Cone(x=[v[0]],y=[v[1]],z=[v[2]],
                         u=[v[0]/v_norm],v=[v[1]/v_norm],w=[v[2]/v_norm],sizeref=1,
                         sizemode="scaled",anchor="cm",name="v"+str(k+j+1),hoverinfo='x+y+z+name',
                         colorscale=[[0, colors[j]], [1,colors[j]]],showscale=False)
            dat.append(d2)
        if k>0 : # add column vectors of previous Gaussian elimination step (causes some trouble;
            # if this if section is commented out I get an animation of the three column vectors of the current step)
            vk = H[k-1][:,j]
            if la.norm(vk) != 0 : # test the previous-step vector that is actually drawn
                d3 = go.Scatter3d( x=[0,vk[0]],y=[0,vk[1]],z=[0,vk[2]],name="v"+str(k+j+1),hoverinfo='name',
                                   marker=dict(size=0), line=dict(color=colors[j], width=10), opacity = 0.2 )
                dat.append(d3)
                d4 = go.Cone(x=[vk[0]],y=[vk[1]],z=[vk[2]],
                             u=[vk[0]/v_norm],v=[vk[1]/v_norm],w=[vk[2]/v_norm],sizeref=1,
                             sizemode="scaled",anchor="cm",name="v"+str(k+j+1),hoverinfo='x+y+z+name',
                             colorscale=[[0, colors[j]], [1,colors[j]]],showscale=False,opacity=0.2)
                dat.append(d4)
    frames.append(go.Frame(data=dat,name=str(k)))
fig = go.Figure(frames=frames)
# Add data to be displayed before animation starts
for j in range(A.shape[1]):
    v = A[:,j]
    if la.norm(v) != 0 :
        # name the traces with the column index j (k is only a leftover from the frame loop)
        fig.add_trace( go.Scatter3d( x=[0,v[0]],y=[0,v[1]],z=[0,v[2]],name="v"+str(j+1),hoverinfo='name',
                                     marker=dict(size=0), line=dict(color=colors[j], width=10 )) )
        fig.add_trace( go.Cone(x=[v[0]],y=[v[1]],z=[v[2]],
                               u=[v[0]/v_norm],v=[v[1]/v_norm],w=[v[2]/v_norm],sizeref=1,
                               sizemode="scaled",anchor="cm",name="v"+str(j+1),hoverinfo='x+y+z+name',
                               colorscale=[[0, colors[j]], [1,colors[j]]],showscale=False) )
### This remained almost exactly as the Plotly example
def frame_args(duration):
    return {
        "frame": {"duration": duration},
        "mode": "immediate",
        "fromcurrent": True,
        "transition": {"duration": duration, "easing": "linear"},
    }
sliders = [
    {
        "pad": {"b": 10, "t": 60},
        "len": 0.9,
        "x": 0.1,
        "y": 0,
        "steps": [
            {
                "args": [[f.name], frame_args(0)],
                "label": str(k),
                "method": "animate",
            }
            for k, f in enumerate(fig.frames)
        ],
    }
]
matrix_but = [
{"buttons: [{},{},{},{},{},{}]"}
]
# Layout
fig.update_layout(
    title='Pivot de Gauss',
    width=600,
    height=400,
    scene=dict(xaxis=dict(autorange=True),
               yaxis=dict(autorange=True),
               zaxis=dict(autorange=True),
               aspectratio=dict(x=1, y=1, z=1),
               ),
    updatemenus = [
        {
            "buttons": [
                {
                    "args": [None, frame_args(200)],
                    "label": "▶", # play symbol
                    "method": "animate",
                },
                {
                    "args": [[None], frame_args(0)],
                    "label": "◼", # pause symbol
                    "method": "animate",
                },
            ],
            "direction": "left",
            "pad": {"r": 10, "t": 70},
            "type": "buttons",
            "x": 0.1,
            "y": 0,
        }
    ],
    sliders=sliders
)
fig.show()
You will notice that for each vector I first draw a 3D line and then use a cone to make it arrow-shaped. It might not be the best way to do it, but I do not want to use a cone alone, as its appearance does not fit what I would like.
I stumbled across a (I think) similar question here: https://community.plotly.com/t/only-one-trace-showing-per-frame-in-animated-plot/25803
But I understood neither the answer nor the example.
From what I can tell, only the first six elements of the data contained in each frame are taken into account, but I do not understand why, and I would like to show everything.
If someone has some insight (and a solution) on the subject, it would be warmly welcomed.
I can clarify things if needed.
Image of the two first column vectors of matrix from current step and first column vector of matrix from previous step
Image of the three column vectors of current matrix when part below if k>0 is commented
From what I can tell, only the first six elements of the data contained in each frame are taken into account, but I do not understand why, and I would like to show everything.
There's this paragraph under the heading 'Current Animation Limitations and Caveats' in the Plotly animation documentation:
Animations are designed to work well when each row of input is present across all animation frames, and when categorical values mapped to symbol, color and facet are constant across frames. Animations may be misleading or inconsistent if these constraints are not met.
Your first frame (and the data displayed before the animation starts) contains only three vectors, i.e. three lines plus three cone heads, while the following frames contain six vectors, which violates the above constraint. To work around this restriction, we could insert the three vectors twice in the first frame and in the initial data, i.e. to the
if k>0 : # add column vectors of previous Gaussian elimination step (causes some trouble;
block add an
else:
    dat.append(d1)
    dat.append(d2)
block, and in the
if la.norm(v) != 0 :
block that adds the initial traces, duplicate the two fig.add_trace calls.
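A minimal sketch of the same idea, assuming a history list like H from the question: pair each step's columns with the previous step's columns, and for step 0 reuse the current columns, so that every frame describes the same number of vectors. The helper name frame_vectors is made up for illustration, and building the Plotly traces is left out.

```python
import numpy as np

def frame_vectors(history):
    """For each step, return the current columns followed by the previous ones.

    Step 0 has no previous matrix, so its own columns are repeated; every
    frame then holds the same number of vectors.
    """
    per_frame = []
    for k, current in enumerate(history):
        previous = history[k - 1] if k > 0 else current
        vectors = [current[:, j] for j in range(current.shape[1])]
        vectors += [previous[:, j] for j in range(previous.shape[1])]
        per_frame.append(vectors)
    return per_frame

# Two steps of a shortened, illustrative history.
H_demo = [
    np.array([[5, 2, 3], [2, 3, -1], [1, 2, 1]]),
    np.array([[1, 0, 0], [2, 3, -1], [1, 2, 1]]),
]
per_frame = frame_vectors(H_demo)
print([len(vectors) for vectors in per_frame])
```

Each vector would then become one Scatter3d line plus one Cone, so every frame carries twelve traces, and the data shown before the animation starts would need the same twelve traces in the same order. Skipping zero-norm vectors, as the original loop does, would change the count again, so a zero vector may need a placeholder trace instead.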
The following example gives different results obtained with eigenvector_centrality and eigenvector_centrality_numpy. Is there a way to make such calculation more robust? I'm using networkx 2.4, numpy 1.18.5 and scipy 1.5.0.
import numpy as np
import networkx as nx
AdjacencyMatrix = {
    0: {
        1: 0.6,
    },
    1: {
        2: 0,
        3: 0,
    },
    2: {
        4: 0.5,
        5: 0.5,
    },
    3: {
        6: 0.5,
        7: 0.5,
        8: 0.5,
    },
    4: {},
    5: {},
    6: {},
    7: {},
    8: {},
}
G = nx.DiGraph()
for nodeID in AdjacencyMatrix.keys():
    G.add_node(nodeID)
for k1 in AdjacencyMatrix.keys():
    for k2 in AdjacencyMatrix[k1]:
        weight = AdjacencyMatrix[k1][k2]
        split_factor = len(AdjacencyMatrix[k1])
        G.add_edge(k1, k2, weight=weight / split_factor, reciprocal=1.0 / (split_factor * weight) if weight != 0 else np.inf)
eigenvector_centrality = {v[0]: v[1] for v in sorted(nx.eigenvector_centrality(G.reverse() if G.is_directed() else G, max_iter=10000, weight="weight").items(), key=lambda x: x[1], reverse=True)}
print(eigenvector_centrality)
eigenvector_centrality_numpy = {v[0]: v[1] for v in sorted(nx.eigenvector_centrality_numpy(G.reverse() if G.is_directed() else G, max_iter=10000, weight="weight").items(), key=lambda x: x[1], reverse=True)}
print(eigenvector_centrality_numpy)
Here's my output:
{0: 0.6468489798823026, 3: 0.5392481399595738, 2: 0.5392481399595732, 1: 0.0012439403459275048, 4: 0.0012439403459275048, 5: 0.0012439403459275048, 6: 0.0012439403459275048, 7: 0.0012439403459275048, 8: 0.0012439403459275048}
{3: 0.9637027924175013, 0: 0.0031436862826891288, 6: 9.593026373266866e-11, 8: 3.5132785569658154e-11, 4: 1.2627565659784068e-11, 1: 9.433263632036004e-14, 7: -2.6958851817582286e-11, 5: -3.185304797703736e-11, 2: -0.26695888283266833}
edit - see the response by dshult. He's one of the main people who maintains/updates networkx.
I think this may be a bug, but not the way you think. This graph is directed and acyclic. So for this graph, I don't think there is a nonzero eigenvalue.
It looks like the algorithm seems to implicitly assume an undirected graph, or at least that if it's directed it has cycles. And I would expect the algorithm to break if there's no cycle.
I'm going to encourage the networkx people to look at this in more detail.
I'm actually surprised that it converges for the non-numpy version.
Joel is right to say that eigenvector_centrality isn't a useful measure for directed acyclic graphs. See this nice description of centrality. This should be useless for both the numpy and non-numpy versions of the code.
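The claim about eigenvalues can be checked with plain numpy. The sketch below uses the edge structure from the question and ignores the weights, which is an assumption made to keep the arithmetic exact. In a directed acyclic graph with n nodes no walk has n edges, so the n-th power of the adjacency matrix is zero, and such a matrix has only zero eigenvalues.

```python
import numpy as np

# Edges of the graph from the question (weights ignored, structure only).
edges = [(0, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (3, 8)]
n = 9
A = np.zeros((n, n), dtype=int)
for u, v in edges:
    A[u, v] = 1

# Entry (i, j) of A**k counts the walks of length k from i to j.
# A DAG on n nodes has no walk of length n, so A**n is the zero matrix.
is_nilpotent = not np.linalg.matrix_power(A, n).any()

# A nilpotent matrix has only zero eigenvalues.
eigenvalues = np.linalg.eigvals(A)
print(is_nilpotent)
print(np.abs(eigenvalues).max())
```

With no nonzero eigenvalue there is no dominant eigenvector to find, which would fit with the two functions returning unrelated numbers here.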
I am trying to insert spacing between two specific bars but cannot find an easy way to do this. I can manually add a dummy row with 0 height to create an empty space, but that doesn't give me control over how wide the space should be. Is there a more programmatic method I can use to control the spacing between bars at any position?
Example Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
mydict = {
'Event': ['Running', 'Swimming', 'Biking', '', 'Hiking', 'Jogging'],
'Completed': [2, 4, 3, 0, 7, 9],
'Participants': [10, 20, 35, 0, 10, 20]}
df = pd.DataFrame(mydict).set_index('Event')
df = df.assign(Completion=(df.Completed / df.Participants) * 100)
plt.subplots(figsize=(5, 4))
print(df.index)
ax = sns.barplot(x=df.Completion, y=df.index, color="orange", orient='h')
plt.xticks(rotation=60)
plt.tight_layout()
plt.show()
Example DataFrame Output:
          Completed  Participants  Completion
Event
Running           2            10   20.000000
Swimming          4            20   20.000000
Biking            3            35    8.571429
                  0             0         NaN
Hiking            7            10   70.000000
Jogging           9            20   45.000000
Example output (blue arrows added outside of code to show where empty row was added.):
I think you can access the positions of the boxes and the names of the labels, then modify them. You may find a more general way depending on your use case, but this works for the given example.
import numpy as np

# define a function to add space starting at a specific label
def add_space_after(ax, label_shift='', extra_space=0):
    bool_space = False
    # get position of current ticks
    ticks_position = np.array(ax.get_yticks()).astype(float)
    # iterate over the boxes/labels
    for i, (patch, label) in enumerate(zip(ax.patches, ax.get_yticklabels())):
        # if the label to start the shift is found
        if label.get_text()==label_shift: bool_space = True
        # reposition the boxes and the labels afterward
        if bool_space:
            patch.set_y(patch.get_y() + extra_space)
            ticks_position[i] += extra_space
    # in the case where the spacing is needed
    if bool_space:
        ax.set_yticks(ticks_position)
        ax.set_ylim([ax.get_ylim()[0]+extra_space, ax.get_ylim()[1]])
#note: no more blank row
mydict = {
'Event': ['Running', 'Swimming', 'Biking', 'Hiking', 'Jogging'],
'Completed': [2, 4, 3, 7, 9],
'Participants': [10, 20, 35, 10, 20]}
df = pd.DataFrame(mydict).set_index('Event')
df = df.assign(Completion=(df.Completed / df.Participants) * 100)
ax = sns.barplot(x=df.Completion, y=df.index, color="orange", orient='h')
plt.xticks(rotation=60)
plt.tight_layout()
#use the function
add_space_after(ax, 'Hiking', 0.6)
plt.show()
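Another way to think about it, as a rough sketch rather than a drop-in replacement for the function above: compute the bar positions yourself and hand them to matplotlib directly. The helper bar_positions and its gaps argument are hypothetical names used only for illustration.

```python
def bar_positions(n_bars, gaps):
    """Return one position per bar.

    gaps maps a bar index to the extra space inserted just before that bar;
    every later bar is shifted by the same amount.
    """
    positions = []
    offset = 0.0
    for i in range(n_bars):
        offset += gaps.get(i, 0.0)
        positions.append(i + offset)
    return positions

# Five bars with 0.6 extra units of space before the fourth one (index 3).
positions = bar_positions(5, {3: 0.6})
print(positions)
```

These positions could then be passed to ax.barh(positions, df.Completion), followed by ax.set_yticks(positions) and ax.set_yticklabels(df.index). Calling ax.invert_yaxis() would keep the first event at the top, as in the seaborn plot.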
Hi people, I'm trying to plot a network graph using the networkx module, but I am getting results I was not expecting, and I am starting to ask myself whether it is a module issue!
I have this code inside a class:
def plotGraph(self):
    conn = []
    nodeLabel = {}
    for node_idx in self.operatorNodes:
        print("i = ", node_idx)
        print(self.node[node_idx].childs)
        for child in self.node[node_idx].childs:
            conn.append((child.idx, node_idx))
    for i in range(self.nn):
        nodeLabel[i] = str(i) + ": " + self.node[i].opString
    node_color = ['blue'] * self.nn
    #for i in range(self.nOutputs):
    #    node_color[i] = 'red'
    node_color[0] = 'red'
    print('Graph Conn = ', conn)
    print('Graph Color = ', node_color)
    # you may name your edge labels
    labels = map(chr, range(65, 65 + len(conn)))
    print('nodeLabel = ', nodeLabel)
    draw_graph(conn, nodeLabel, node_color=node_color, labels=labels)
From the prints I can see what is being passed to draw_graph (the draw_graph code is based on https://www.udacity.com/wiki/creating-network-graphs-with-python):
Graph Conn = [(2, 0), (3, 0), (4, 1), (5, 1), (6, 2), (7, 2), (8, 5), (9, 5)]
Graph Color = ['red', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue']
nodeLabel = {0: '0: mul', 1: '1: mul', 2: '2: mul', 3: '3: cte', 4: '4: cte', 5: '5: sum', 6: '6: cte', 7: '7: cte', 8: '8: cte', 9: '9: cte'}
Yet the plot is the following
draw_graph code is:
import networkx as nx

def draw_graph(graph, nodeLabel, node_color, labels=None, graph_layout='shell',
               node_size=1600, node_alpha=0.3,
               node_text_size=12,
               edge_color='blue', edge_alpha=0.3, edge_tickness=1,
               edge_text_pos=0.3,
               text_font='sans-serif'):
    # create networkx graph
    G=nx.DiGraph()
    # add edges
    for edge in graph:
        G.add_edge(edge[0], edge[1])
    # these are different layouts for the network you may try
    # shell seems to work best
    if graph_layout == 'spring':
        graph_pos = nx.spring_layout(G)
    elif graph_layout == 'spectral':
        graph_pos = nx.spectral_layout(G)
    elif graph_layout == 'random':
        graph_pos = nx.random_layout(G)
    else:
        graph_pos = nx.shell_layout(G)
    # draw graph
    nx.draw_networkx_edges(G, graph_pos, width=edge_tickness, alpha=edge_alpha, edge_color=edge_color)
    nx.draw_networkx_labels(G, graph_pos, labels=nodeLabel, font_size=node_text_size, font_family=text_font)
    if labels is None:
        labels = range(len(graph))
    edge_labels = dict(zip(graph, labels))
    nx.draw_networkx_edge_labels(G, graph_pos, edge_labels=edge_labels, label_pos=edge_text_pos)
    nx.draw(G, graph_pos, node_size=node_size, alpha=node_alpha, node_color=node_color)
As can be seen, Graph Color has red in position 0 and the rest should be blue, yet the plot colors the third node red! There is also no way for me to access node 1; apparently, the nodes are misplaced. The node colors end up in the following positions: [2, 0, 3, 4, 5, ...].
When you use nx.draw and pass it an (optional) list of colors, it will assign those colors to the nodes in the same order as the (optional) nodelist. But you didn't define nodelist. So it will default to whatever order comes out of G.nodes().
Since the underlying data structure for a networkx graph is a dictionary, you have to deal with the fact that you cannot count on the nodes to have any specified order.
Try passing nodelist into the nx.draw command in the order you want.
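A small sketch of that suggestion, using the conn list printed in the question. The color_of dictionary is a name made up for illustration; the point is to key colors by node id and then build the color list in the same order as the nodelist you pass.

```python
import networkx as nx

conn = [(2, 0), (3, 0), (4, 1), (5, 1), (6, 2), (7, 2), (8, 5), (9, 5)]
G = nx.DiGraph()
G.add_edges_from(conn)

# Desired color per node id, independent of any ordering.
color_of = {node: "blue" for node in G.nodes()}
color_of[0] = "red"

# Fix the node order explicitly and build the color list in that same order.
nodelist = sorted(G.nodes())
node_color = [color_of[node] for node in nodelist]
print(list(zip(nodelist, node_color)))
```

Inside draw_graph the last call would then become something like nx.draw(G, graph_pos, nodelist=nodelist, node_color=node_color, node_size=node_size, alpha=node_alpha), so the colors no longer depend on whatever order G.nodes() happens to return.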