I have stored node positions with networkx, but matplotlib renders them in the wrong place.
It is very important to get the same picture each time I launch the script, so respecting the node coordinates is fundamental.
Also, the view is too compact, which forces me to zoom in manually, and the graph appears in Figure 2 instead of Figure 1.
Finally, matplotlib works on a small float scale, [-1, 1], whereas I would prefer screen dimensions: x in [0, 1280] and y in [0, 704].
I have tried many code samples, but none of them does the job properly:
import matplotlib.pyplot as plt
import networkx as nx
zero_zero = 'zero_zero'
zero_one = 'zero_one'
one_zero = 'one_zero'
one_one = 'one_one'
foo_1 = 'foo_1'
foo_2 = 'foo_2'
foo_3 = 'foo_3'
bar_1 = 'bar_1'
bar_2 = 'bar_2'
bar_3 = 'bar_3'
stuff_1 = 'stuff_1'
stuff_2 = 'stuff_2'
stuff_3 = 'stuff_3'
waow = 'waow'
fig = plt.figure(figsize=(100,100))
fig, ax = plt.subplots()
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5 , 1.5)
G = nx.Graph()
starpos={zero_zero:(0,0), zero_one:(0,1), one_zero:(1,0), one_one:(1,1), foo_1:(1,0),foo_2:(0.1,0.1),foo_3:(0.2,0.3),bar_1:(0.3,0.2),bar_2:(0.76,.80),bar_3:(0,0.2),stuff_1:(0.8,0.6),stuff_2:(0.3,0.9),stuff_3:(0.7,0.7),waow:(0.4,0.6)}
for k,v in starpos.items():
    G.add_node(k,pos=v)
G.nodes(data=True)
G.add_edge(foo_1, foo_2)
G.add_edge(foo_3, bar_3)
G.add_edge(bar_1, foo_3)
G.add_edge(bar_1, bar_2)
G.add_edge(bar_3, bar_2)
G.add_edge(stuff_1, stuff_3)
G.add_edge(waow, bar_3)
G.add_edge(bar_2, stuff_3)
G.add_edge(zero_zero, zero_one)
G.add_edge(zero_one, one_zero)
G.add_edge(one_zero, one_one)
G.add_edge(one_one, zero_zero)
pos = nx.spring_layout(G)
nx.draw(G, pos, font_size=16, with_labels=False)
for p in pos: # raise text positions
    pos[p][1] += 0.07
nx.draw_networkx_labels(G, pos)
plt.show()
[Image: networkx matplotlib picture]
Let's first deal with a misconception: even though you've assigned an attribute 'pos' to each node in the graph, the drawing commands don't use that at all.
When you do:
nx.draw(G, pos)
The argument pos is a dictionary whose keys are the nodes of G and whose values are the positions you want them drawn at. In your case, you've defined pos with pos = nx.spring_layout(G). That function gives each node a random initial position, then treats the nodes as masses connected by springs and computes where they would settle. So every run produces a different arrangement, and, importantly, it ignores the node attributes you've defined.
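If a force-directed layout is still wanted for some other graph, it can be made repeatable: spring_layout accepts a seed parameter that fixes the random starting positions, so the same graph gets the same coordinates on every run. A minimal sketch; the helper name repeatable_spring_layout is hypothetical:

```python
import networkx as nx


def repeatable_spring_layout(G, seed=42):
    """Force-directed layout whose random starting positions come from a fixed seed,
    so the same graph yields the same coordinates every time the script runs."""
    return nx.spring_layout(G, seed=seed)


G = nx.cycle_graph(6)
first = repeatable_spring_layout(G)
second = repeatable_spring_layout(G)
```

For the question itself this is not needed, since the positions are already known.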
In your case, you've already defined the dictionary starpos, which I believe holds the desired position of each node. So there's no need to assign an attribute when you create the nodes (networkx doesn't use the attributes to assign positions). When you draw, pass the dictionary you've already created instead of using spring_layout to create a new one. Try:
nx.draw(G, starpos, font_size=16, with_labels=False)
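The remaining points of the question (screen-sized coordinates, a single figure, a less compact view) can be handled in the same script. Below is a minimal sketch that reads the stored positions back with nx.get_node_attributes, scales them from the unit square to the 1280 x 704 space the question asks for, and draws into one figure with fixed axis limits. The helper names to_screen and draw_fixed, the padding and the label offset are my own choices, not networkx API, and the graph is a trimmed-down version of the question's starpos. Note that the question's label loop (pos[p][1] += 0.07) would raise a TypeError on starpos, because its values are tuples, so the sketch builds a separate dict for label positions.

```python
import matplotlib.pyplot as plt
import networkx as nx

WIDTH, HEIGHT = 1280, 704  # screen-like coordinate space from the question


def to_screen(unit_pos, width=WIDTH, height=HEIGHT):
    """Scale positions given in the unit square (0..1) to 0..width and 0..height."""
    return {node: (x * width, y * height) for node, (x, y) in unit_pos.items()}


def draw_fixed(G, pos, width=WIDTH, height=HEIGHT, pad=40, label_offset=20):
    """Draw G exactly at pos inside a single figure with fixed axis limits."""
    # Create exactly one figure; calling plt.figure() first would leave an empty extra one.
    fig, ax = plt.subplots(figsize=(width / 100, height / 100), dpi=100)
    nx.draw(G, pos, ax=ax, node_size=100, with_labels=False)
    # Tuples can't be modified in place, so build a separate dict for the label positions.
    label_pos = {node: (x, y + label_offset) for node, (x, y) in pos.items()}
    nx.draw_networkx_labels(G, label_pos, ax=ax, font_size=10)
    # Fixed limits (plus a small pad so edge nodes aren't clipped) keep the view identical every run.
    ax.set_xlim(-pad, width + pad)
    ax.set_ylim(-pad, height + pad)
    return fig, ax


G = nx.Graph()
starpos = {"zero_zero": (0, 0), "zero_one": (0, 1), "one_one": (1, 1), "waow": (0.4, 0.6)}
for node, xy in starpos.items():
    G.add_node(node, pos=xy)
G.add_edge("zero_zero", "zero_one")
G.add_edge("zero_one", "one_one")
G.add_edge("waow", "zero_zero")

# Read back the positions stored as node attributes and convert them to screen units.
screen_pos = to_screen(nx.get_node_attributes(G, "pos"))
fig, ax = draw_fixed(G, screen_pos)
plt.show()
```

If you prefer screen-style coordinates where y grows downwards, calling ax.invert_yaxis() after the limits are set flips the y axis.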
Related
So I'm trying to write a sorting algorithm visualizer (code below). I am basically using matplotlib to plot the figure. My problem is that I also want to highlight the current elements in the array being accessed, compared, and swapped; all of my attempts at this have failed. Please also let me know if there is a better way of writing a visualizer in Python. I have seen some tutorials using pygame but wanted to stick to basics. Also, when the program runs to the end and everything is sorted, the plot goes blank. Is this because of the plt.clf() command, and is there a way to keep the sorted plot from disappearing? Thanks!
from matplotlib import pyplot as plt
import numpy as np
# generate a pseudo-random list of numbers
lst = np.random.randint(0, 100, 20)
# x values for the bar plot
x = range(0, len(lst))
def insertion_sort(lst):
    # loop through the list
    # incrementally check which index to the left should i be placed in
    for i in range(1, len(lst)):
        while lst[i-1] > lst[i] and i>0:
            lst[i], lst[i-1] = lst[i-1], lst[i]
            i = i-1
            # plot
            plt.bar(x,lst)
            plt.pause(0.1)
            plt.clf()
    plt.show()
    return lst
print(lst)
print(insertion_sort(lst))
The solution I came up with was to create a second list containing the current i and i-1 indexes and plot a second bar chart over the main one in a different color. It was a bad solution and indeed it failed. Another idea I tried was passing a conditional argument for the color parameter of plt.bar():
colors = ['red' if lst[i-1]>lst[i] else for element in lst 'blue']
plt.bar(x, lst, color=colors)
This did not work either. I don't know if I am on the right track and just need to keep at it, or if this whole setup is futile to begin with. Thank you for your time!
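One possible approach, sketched under my own assumptions rather than taken from an answer: turn the sort into a generator that yields a snapshot plus the pair of indices being compared before each swap, build the color list with a valid comprehension (the attempt above puts 'blue' in the wrong place, which is a syntax error), and update the existing bars instead of calling plt.clf(), so the final sorted chart stays on screen. The names insertion_sort_steps, bar_colors and animate are hypothetical. The sketch also checks j > 0 before reading a[j - 1], so the loop never compares against a[-1].

```python
import matplotlib.pyplot as plt


def insertion_sort_steps(values):
    """Yield (snapshot, active_indices) before every swap, then the final sorted list."""
    a = list(values)
    for i in range(1, len(a)):
        j = i
        # Test j > 0 first so a[j - 1] never wraps around to a[-1].
        while j > 0 and a[j - 1] > a[j]:
            yield list(a), (j - 1, j)
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    yield list(a), ()  # sorted state, nothing highlighted


def bar_colors(n, active, normal="tab:blue", highlight="tab:red"):
    """One color per bar: highlight the indices being compared/swapped."""
    return [highlight if k in active else normal for k in range(n)]


def animate(values, pause=0.1):
    fig, ax = plt.subplots()
    bars = ax.bar(range(len(values)), values)
    for snapshot, active in insertion_sort_steps(values):
        # Update the existing bars instead of clearing the figure with plt.clf().
        for bar, height, color in zip(bars, snapshot, bar_colors(len(snapshot), active)):
            bar.set_height(height)
            bar.set_color(color)
        plt.pause(pause)
    plt.show()  # the last drawn state (the sorted list) stays on screen
```

Usage would be something like animate(np.random.randint(0, 100, 20)). Separating the sorting steps from the drawing also makes it easy to swap in another algorithm that yields the same (snapshot, indices) pairs.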
I am trying to construct a grouped vertical bar chart in Bokeh from a pandas dataframe. I'm struggling with understanding the use of factor_cmap and how the color mapping works with this function. There's an example in the documentation (https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html#pandas) that was helpful to follow, here:
from bokeh.io import output_file, show
from bokeh.palettes import Spectral5
from bokeh.plotting import figure
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap
output_file("bar_pandas_groupby_nested.html")
df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)
group = df.groupby(by=['cyl', 'mfr'])
index_cmap = factor_cmap('cyl_mfr', palette=Spectral5, factors=sorted(df.cyl.unique()), end=1)
p = figure(plot_width=800, plot_height=300, title="Mean MPG by # Cylinders and Manufacturer",
x_range=group, toolbar_location=None, tooltips=[("MPG", "@mpg_mean"), ("Cyl, Mfr", "@cyl_mfr")])
p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=group,
line_color="white", fill_color=index_cmap, )
p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
show(p)
This yields the following (again, a screenshot from the documentation):
[Image: Grouped Vbar output]
I understand how factor_cmap is working here, I think. The index for the dataframe has multiple factors, and we're only taking the first by slicing (as seen with end=1). But when I instead try to color by the second index level, mfr (setting start=1, end=2), the index mapping breaks and I get this. I based this change on my assumption that the factors were hierarchical and that I needed to slice them to get the second level.
I think I must be thinking about the indexing with these categorical factors wrong, but I'm not sure what I'm doing wrong. How do I get a categorical mapper to color by the second level of the factor? I assumed the format of the factors was ('cyl', 'mfr') but maybe that assumption is wrong?
Here's the documentation for factor_cmap, although it wasn't very helpful: https://docs.bokeh.org/en/latest/docs/reference/transform.html#bokeh.transform.factor_cmap .
If you mean you are trying this:
index_cmap = factor_cmap('cyl_mfr',
palette=Spectral5,
factors=sorted(df.cyl.unique()),
start=1, end=2)
Then there are at least two issues:
First, end=2 is out of bounds for the length of the list of sub-factors ('cyl', 'mfr'). You just want start=1, leaving end at its default value of None (which means to the end of the list, as usual for any Python slice).
Second, in this specific case start=1 means "colormap based on the mfr sub-factors of the values", but you are still configuring the colormapper with the cylinders as the factors for the map:
factors=sorted(df.cyl.unique())
When the colormapper looks up a value with mfr="mazda" in the mapping, it does not find anything (because you only put cylinder values in the mapping), so the bar is shaded the default color, grey, as expected.
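To make the failed lookup concrete, here is a deliberately simplified pure-Python model of the behaviour described above. It is not Bokeh's implementation: the function name lookup_color is made up, the palette entries are placeholder strings standing in for Spectral5, and the branch that returns grey when the palette is shorter than the factor list is an assumption on my part.

```python
def lookup_color(value, factors, palette, start=0, end=None, nan_color="gray"):
    """Simplified model (not Bokeh's code) of a categorical color lookup on a nested factor."""
    sub = value[start:end]                          # e.g. ('8', 'mazda')[1:] -> ('mazda',)
    key = sub[0] if len(sub) == 1 else tuple(sub)   # a single remaining level is used on its own
    if key not in factors:
        return nan_color                            # the case described above: no match, grey
    index = factors.index(key)
    if index >= len(palette):
        return nan_color                            # assumption: a too-short palette also gives grey
    return palette[index]


cylinders = ["3", "4", "5", "6", "8"]
spectral5 = ["c0", "c1", "c2", "c3", "c4"]          # placeholder names standing in for Spectral5
```

With the cylinder factors, ('8', 'mazda') sliced with end=1 finds '8', but sliced with start=1 it looks for 'mazda' among the cylinder values and falls back to grey.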
So you could do something like this:
index_cmap = factor_cmap('cyl_mfr',
palette=Spectral5,
factors=sorted(df.mfr.unique()),
start=1)
This "works", except that there are far more manufacturer values than there are colors in the Spectral5 palette:
In a real situation you'll need to make sure you use a palette at least as large as the number of (sub-)factors you configure.
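One way to guarantee a large enough palette is to generate it from the number of factors. The sketch below uses bokeh.palettes.viridis(n), which samples n colors from the 256-color Viridis map; the manufacturer list is a hypothetical stand-in for sorted(df.mfr.unique()) from the autompg sample data.

```python
from bokeh.palettes import viridis
from bokeh.transform import factor_cmap

# Hypothetical manufacturer list; in the question's script this would be
# sorted(df.mfr.unique()).
mfrs = sorted(["ford", "chevrolet", "mazda", "toyota", "honda", "dodge", "amc"])

# One color per manufacturer, evenly spaced along the Viridis map.
palette = viridis(len(mfrs))

# Color by the second level of the ('cyl', 'mfr') factors.
index_cmap = factor_cmap('cyl_mfr', palette=palette, factors=mfrs, start=1)
```

With dozens of manufacturers, adjacent Viridis colors become hard to tell apart, so coloring by the second level may be more readable with fewer factors.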
I would like to specify the color of particular observations using seaborn catplot. In a made-up example:
import numpy as np
import pandas as pd
import seaborn as sns
import random as r
name_list=['pepe','Fabrice','jim','Michael']
country_list=['spain','France','uk','Uruguay']
favourite_color=['green','blue','red','white']
df=pd.DataFrame({'name':[r.choice(name_list) for n in range(100)],
'country':[r.choice(country_list) for n in range(100)],
'fav_color':[r.choice(favourite_color) for n in range(100)],
'score':np.random.rand(100),
})
sns.catplot(x='fav_color',
y='score',
col='country',
col_wrap=2,
data=df,
kind='swarm')
I would like to colour (or mark in another distinctive way, for example with the marker) all the observations with the name 'pepe'. How could I do that? I don't mind the other colours; it would be better if they were all the same.
You can get the result you want by adding a boolean column to the dataframe and using it as the hue parameter of the catplot() call. This gives two colours: one for the pepe observations and one for the rest. The result can be seen here:
Also, set legend=False, since otherwise a legend for is_pepe appears at the side.
The code would be as below:
df['is_pepe'] = df['name'] == 'pepe'
g = sns.catplot(x='fav_color',
y='score',
col='country',
col_wrap=2,
data=df,
kind='swarm',
hue='is_pepe',
legend=False)
Furthermore, you can specify the two colours you want for the two kinds of observations (pepe and not-pepe) using the parameter palette and the top-level function sns.color_palette() as shown below:
g = sns.catplot(x='fav_color',
y='score',
col='country',
col_wrap=2,
data=df,
kind='swarm',
hue='is_pepe',
legend=False,
palette=sns.color_palette(['green', 'blue']))
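A variation on the same idea, which I have not taken from the answer: seaborn also accepts a dict for palette, mapping each hue level to a colour, so there is no need to rely on the order of the levels. The sketch below uses a string column named highlight (a hypothetical name) instead of the boolean is_pepe, and fixed random seeds so the made-up data is repeatable.

```python
import random as r

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

r.seed(0)                        # fixed seeds so the made-up data is repeatable
rng = np.random.default_rng(0)

name_list = ['pepe', 'Fabrice', 'jim', 'Michael']
country_list = ['spain', 'France', 'uk', 'Uruguay']
favourite_color = ['green', 'blue', 'red', 'white']
df = pd.DataFrame({
    'name': [r.choice(name_list) for _ in range(100)],
    'country': [r.choice(country_list) for _ in range(100)],
    'fav_color': [r.choice(favourite_color) for _ in range(100)],
    'score': rng.random(100),
})

# Two explicit levels, each mapped to its own colour by the dict palette below.
df['highlight'] = np.where(df['name'] == 'pepe', 'pepe', 'other')

g = sns.catplot(x='fav_color', y='score', col='country', col_wrap=2,
                data=df, kind='swarm', hue='highlight', legend=False,
                palette={'pepe': 'red', 'other': 'lightgray'})
plt.show()
```

A light grey for everything else keeps the other points visible while making pepe stand out.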
This gives the following result:
I have a dataset with 80 variables. I want to create a function that automates the creation of a 20 x 4 GridSpec in Matplotlib. Each subplot would contain either a histogram or a bar plot for one of the 80 variables in the data. As a first step, I successfully created two functions (I call them 'counts' and 'histogram') that contain the layout of the plot I want. Both of them work when tested on individual variables. As a next step, I attempted to create a function that takes the column names, loops through a conditional to test whether the data type is object or otherwise, and calls the right function for the data type as a new subplot. Here is the code I have so far:
# Create the list of coordinates we will need for the subplot specification
A = np.arange(21)
B = np.arange(4)
coords = []
for i in A:
    for j in B:
        coords.append([A[i], B[j]])
#Create the gridspec and layout the figure
import matplotlib.gridspec as gridspec
fig = plt.figure(figsize=(12,6))
gs = gridspec.GridSpec(2,4)
#Function that relies on what we've done above:
def grid(cols=['MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley']):
    for i in cols:
        for vals in coords:
            if str(train[i].dtype) == 'object':
                plt.subplot('gs'+str(vals))
                counts(cols)
            else:
                plt.subplot('gs'+str(vals))
                histogram(cols)
When attempted, this code returns an error:
ValueError: Single argument to subplot must be a 3-digit integer
To help you visualize what I am hoping to achieve, I attach the screenshot below, which was produced by the line-by-line code (using my helper functions) that I am trying to avoid:
Can anyone help me figure out where I am going wrong? I would appreciate any advice. Thank you!
The line plt.subplot('gs'+str(vals)) cannot work, which is also what the error tells you.
As can be seen from the matplotlib GridSpec tutorial, it needs to be
ax = plt.subplot(gs[0, 0])
So in your case you may use the values from the list as
ax = plt.subplot(gs[vals[0], vals[1]])
Mind that you also need to make sure the coords list has n*m elements if the gridspec is defined as gs = gridspec.GridSpec(n,m).
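A minimal sketch of the whole loop, assuming the asker's helpers can be changed to take a Series and an Axes. Here counts and histogram are hypothetical stand-ins, because the real helpers' signatures are not shown, and the sample train frame is made up. Instead of a precomputed coords list, divmod turns the column index into a (row, col) cell, and the grid gets as many rows as the columns need, so grid size and coordinates can't drift apart (the question's GridSpec(2,4) only has 8 cells). Note also that the question's loop passes the whole cols list to the helpers rather than the current column.

```python
import math

import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
import pandas as pd


def counts(series, ax):  # hypothetical stand-in for the asker's bar-plot helper
    series.value_counts().plot.bar(ax=ax)
    ax.set_title(series.name)


def histogram(series, ax):  # hypothetical stand-in for the asker's histogram helper
    ax.hist(series.dropna(), bins=20)
    ax.set_title(series.name)


def grid(df, cols, ncols=4):
    nrows = math.ceil(len(cols) / ncols)  # e.g. 80 columns -> 20 rows of 4
    fig = plt.figure(figsize=(4 * ncols, 3 * nrows))
    gs = gridspec.GridSpec(nrows, ncols, figure=fig)
    for k, col in enumerate(cols):
        row, c = divmod(k, ncols)         # k-th column goes into cell (row, c)
        ax = fig.add_subplot(gs[row, c])
        if pd.api.types.is_object_dtype(df[col]):
            counts(df[col], ax)
        else:
            histogram(df[col], ax)
    fig.tight_layout()
    return fig


# Made-up sample with the column names from the question.
train = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL", "RL"],
    "LotFrontage": [65.0, 80.0, None, 68.0],
    "LotArea": [8450, 9600, 11250, 9550],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
    "Alley": [None, "Grvl", None, "Pave"],
})
fig = grid(train, ["MSZoning", "LotFrontage", "LotArea", "Street", "Alley"])
```

For the full dataset, grid(train, list(train.columns)) would lay out all 80 variables in a 20 x 4 grid.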
Background:
I'm working on a program to show a 2D cross section of 3D data. The data is stored in a simple text CSV file in the format x, y, z1, z2, z3, etc. I take a start and an end point and flick through the dataset (~110,000 lines) to create a line of points between these two locations, and dump them into an array. This works fine and fairly quickly (it takes about 0.3 seconds). To display this line, I've been creating a matplotlib stacked bar chart. However, the total run time of the program is about 5.5 seconds, and I've narrowed the bulk of it (3 seconds' worth) down to the code below.
'values' is an array with the x, y and z values plus a leading identifier, which isn't used in this part of the code. The first plt.bar is plotting the bar sections, and the second is used to create an arbitrary floor of -2000. In order to generate a continuous looking section, I'm using an interval between each bar of zero.
import matplotlib.pyplot as plt
for values in crossSection:
    prevNum = None
    layerColour = None
    if values != None:
        for i in range(3, len(values)):
            if values[i] != 'n':
                num = float(values[i].strip())
                if prevNum != None:
                    plt.bar(spacing, prevNum-num, width=interval, \
                            bottom=num, color=layerColour, \
                            edgecolor=None, linewidth=0)
                prevNum = num
                layerColour = layerParams[i].strip()
        if prevNum != None:
            plt.bar(spacing, prevNum+2000, width=interval, bottom=-2000, \
                    color=layerColour, linewidth=0)
    spacing += interval
I'm sure there's a more efficient way to do this, but I'm new to Matplotlib and still unfamiliar with its capabilities. The other main use of time in the code is:
plt.savefig('output.png')
which takes about a second, but I figure this is to be expected to save the file and I can't do anything about it.
Question:
Is there a faster way of generating the same output (a stacked bar chart or something that looks like one) by using plt.bar() better, or a different Matplotlib function?
EDIT:
I forgot to mention in the original post that I'm using Python 3.2.3 and Matplotlib 1.2.0
Leaving this here in case someone runs into the same problem...
While not exactly the same as using bar(), with a sufficiently large dataset (large enough that bar() takes a few seconds) the output of stackplot() is visually indistinguishable. If I sort the data into layers using the method given by tcaswell and feed it into stackplot(), the chart is created in 0.2 seconds rather than 3 seconds.
EDIT
Code provided by tcaswell to turn the data into layers:
import numpy as np
accum_values = []
for values in crossSection:
    accum_values.append([float(v.strip()) for v in values[3:]])
accum_values = np.vstack(accum_values).T
# layerParams is indexed like the columns of values, so skip its first 3 entries too
layer_params = [l.strip() for l in layerParams[3:]]
bottom = np.zeros(accum_values[0].shape)
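The list comprehension above calls float() on every column, so it would raise a ValueError on the 'n' placeholders that the question's loop skips. A minimal sketch of a more tolerant conversion, assuming every row has the same number of columns and that a missing value should become NaN; rows_to_layers is a hypothetical helper name, and unlike the question's loop it does not handle None rows.

```python
import numpy as np


def rows_to_layers(rows, first_col=3):
    """Convert rows like ['id', x, y, ' 10.5', 'n', ...] into an array of shape
    (n_layers, n_points), turning the 'n' placeholders into NaN."""
    parsed = [
        [np.nan if v.strip() == "n" else float(v) for v in row[first_col:]]
        for row in rows
    ]
    return np.array(parsed, dtype=float).T  # one row per layer, one column per point


layers = rows_to_layers([
    ["a", "0", "0", " 10", "5", "n"],
    ["b", "1", "0", "11", " 6", "1"],
])
```

Each row of the result is one surface along the section line, which is the shape both the bar-per-layer and stackplot approaches expect.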
It looks like you are drawing each bar, you can pass sequences to bar (see this example)
I think something like:
import numpy as np
import matplotlib.pyplot as plt

accum_values = []
for values in crossSection:
    accum_values.append([float(v.strip()) for v in values[3:]])
accum_values = np.vstack(accum_values).T
# layerParams is indexed like the columns of values, so skip its first 3 entries too
layer_params = [l.strip() for l in layerParams[3:]]
bottom = np.zeros(accum_values[0].shape)
ax = plt.gca()
spacing = interval*np.arange(len(accum_values[0]))
# one bar() call (one BarContainer) per layer instead of one per bar
for data, color in zip(accum_values, layer_params):
    ax.bar(spacing, data, bottom=bottom, color=color, linewidth=0, width=interval)
    bottom += data
This should be faster: each call to bar creates one BarContainer, and I suspect the source of your slowdown is that you were creating one for each bar instead of one for each layer.
I don't really understand what you are doing with the bars that have tops below their bottoms, so I didn't try to implement that, so you will have to adapt this a bit.
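Since the question's values are absolute surface elevations (each bar spans from one surface down to the next, and the last one down to -2000), another option, which neither post uses, is Axes.fill_between: it takes the upper and lower edges of a band directly, so no thickness conversion is needed, and it is still one call per layer. A minimal sketch, assuming surfaces are ordered top to bottom and that colours[k] fills the band below surfaces[k] as in the question's loop; draw_section is a hypothetical name, and step="mid" only approximates zero-gap bars (the first and last steps are half as wide).

```python
import matplotlib.pyplot as plt
import numpy as np


def draw_section(surfaces, colours, interval=1.0, floor=-2000.0, ax=None):
    """Fill the band below each surface with one fill_between call per layer.

    surfaces: array-like, shape (n_surfaces, n_points), ordered top to bottom.
    colours:  one colour per surface; colours[k] fills the band below surfaces[k],
              and the last band runs down to `floor`.
    """
    surfaces = np.asarray(surfaces, dtype=float)
    if ax is None:
        ax = plt.gca()
    x = interval * np.arange(surfaces.shape[1])
    # Lower edge of each band: the next surface down, or the floor for the last band.
    lowers = np.vstack([surfaces[1:], np.full(surfaces.shape[1], floor)])
    for upper, lower, colour in zip(surfaces, lowers, colours):
        # step="mid" draws flat steps centred on each x, similar to zero-gap bars.
        ax.fill_between(x, lower, upper, step="mid", color=colour, linewidth=0)
    return ax


fig, ax = plt.subplots()
draw_section([[10, 12, 11], [5, 6, 4], [0, 1, -1]], ["tan", "olive", "grey"], ax=ax)
```

With the question's data, accum_values (one row per surface) could be passed as surfaces. NaN entries leave gaps in the adjacent bands rather than reproducing the question's skip-and-carry behaviour for 'n' values.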