I'm trying to build an application in which I need to extract the x, y values of a Bokeh line. I'm able to do this for a Bokeh circle (see below, where the x value of the circle is tmp1.glyph.x = 2), but the same syntax doesn't work for a line between two points (tmp.glyph.x = "x"). I would hope to see [-3, 3]. I would be grateful for any advice.
from bokeh.plotting import figure, show
fig = figure(x_range=(-5,5),y_range=(-5, 5))
tmp1=fig.circle(x=2, y=-3, size=5)
tmp=fig.line(x = [-3,3], y = [4,-4])
print(tmp1.glyph.x)
# output: 2
print(tmp.glyph.x)
# output: x
show(fig)
For the line glyph, a ColumnDataSource object is created. To print the data of this ColumnDataSource, use tmp.data_source.data['x'] in your example.
To explain this behavior in more detail: if you pass only a single value for x and y to a glyph, this value is stored directly as a value (inside the object it looks like this: x = {'value': 2}). If you pass a list to the glyph, the property instead holds a reference to the name of the column in the ColumnDataSource (inside it looks like this: x = {'field': 'x'}). The circle glyph behaves the same way; you can try it out by passing the single value as a list.
Therefore, a general solution to print the values could look like the code below:
value = tmp.glyph.x
if isinstance(value, str):
    # a string is the name of a column in the ColumnDataSource
    value = tmp.data_source.data[value]
print(value)
Here we check whether the value in tmp.glyph.x is a string. If it is a string, it is a reference to a column of the ColumnDataSource.
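The same check can be sketched without Bokeh installed, using plain dictionaries as stand-ins for the glyph property and for ColumnDataSource.data. The helper name resolve_spec and the sample data are hypothetical; the {'value': ...} and {'field': ...} forms are the ones described above.

```python
def resolve_spec(spec, data):
    """Return the data behind a glyph property.

    `spec` stands in for a property such as tmp.glyph.x and `data` stands in
    for tmp.data_source.data (both are plain Python objects in this sketch).
    """
    if isinstance(spec, dict):
        if "field" in spec:
            # reference to a column name
            return data[spec["field"]]
        if "value" in spec:
            # directly stored single value
            return spec["value"]
    if isinstance(spec, str):
        # a bare string is treated as a column name
        return data[spec]
    # anything else is taken to be the value itself
    return spec


line_data = {"x": [-3, 3], "y": [4, -4]}
print(resolve_spec("x", line_data))
print(resolve_spec({"field": "y"}, line_data))
print(resolve_spec({"value": 2}, line_data))
print(resolve_spec(2, {}))
```

With a real renderer, the second argument would be the renderer's data_source.data.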
I am trying to construct a grouped vertical bar chart in Bokeh from a pandas dataframe. I'm struggling with understanding the use of factor_cmap and how the color mapping works with this function. There's an example in the documentation (https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html#pandas) that was helpful to follow, here:
from bokeh.io import output_file, show
from bokeh.palettes import Spectral5
from bokeh.plotting import figure
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap
output_file("bar_pandas_groupby_nested.html")
df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)
group = df.groupby(by=['cyl', 'mfr'])
index_cmap = factor_cmap('cyl_mfr', palette=Spectral5, factors=sorted(df.cyl.unique()), end=1)
p = figure(plot_width=800, plot_height=300, title="Mean MPG by # Cylinders and Manufacturer",
           x_range=group, toolbar_location=None, tooltips=[("MPG", "@mpg_mean"), ("Cyl, Mfr", "@cyl_mfr")])
p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=group,
       line_color="white", fill_color=index_cmap, )
p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
show(p)
This yields the following (again, a screen shot from the documentation):
[Image: grouped vbar output]
I think I understand how factor_cmap is working here. The index of the dataframe has multiple factors, and we're only taking the first by slicing (as seen with end=1). But when I instead try to set the coloring based on the second index level, mfr (setting start=1, end=2), the index mapping breaks and I get this. I based this change on my assumption that the factors were hierarchical and I needed to slice them to get the second level.
I think I must be thinking about the indexing with these categorical factors wrong, but I'm not sure what I'm doing wrong. How do I get a categorical mapper to color by the second level of the factor? I assumed the format of the factors was ('cyl', 'mfr') but maybe that assumption is wrong?
Here's the documentation for factor_cmap, although it wasn't very helpful: https://docs.bokeh.org/en/latest/docs/reference/transform.html#bokeh.transform.factor_cmap .
If you mean you are trying this:
index_cmap = factor_cmap('cyl_mfr',
                         palette=Spectral5,
                         factors=sorted(df.cyl.unique()),
                         start=1, end=2)
Then there are at least two issues:
1. 2 is out of bounds for the length of the list of sub-factors ('cyl', 'mfr'). You would just want start=1 and leave end with its default value of None (which means to the end of the list, as usual for any Python slice).
2. In this specific case, start=1 means "colormap based on the mfr sub-factors of the values", but you are still configuring the colormapper with the cylinders as the factors for the map:
factors=sorted(df.cyl.unique())
When the colormapper goes to look up a value with mfr="mazda" in the mapping, it does not find anything (because you only put cylinder values in the mapping) so it gets shaded the default color grey (as expected).
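The lookup described above can be imitated in a few lines of plain Python. This is only a sketch of the behaviour as this answer describes it, not Bokeh's actual implementation; the function name lookup_color, the placeholder palette, and the sample factors are made up for illustration.

```python
def lookup_color(factor, factors, palette, start=0, end=None, nan_color="gray"):
    """Pick a color for a tuple factor such as ('4', 'mazda')."""
    key = factor[start:end]      # select sub-factors like an ordinary Python slice
    if len(key) == 1:
        key = key[0]             # a single sub-factor is compared as a plain string
    if key in factors:
        return palette[factors.index(key)]
    return nan_color             # nothing found: fall back to the default color


palette = ["color0", "color1", "color2"]
cyl_factors = ["4", "6", "8"]
mfr_factors = ["ford", "mazda", "toyota"]
bar = ("4", "mazda")

print(lookup_color(bar, cyl_factors, palette, end=1))    # color by cylinders
print(lookup_color(bar, cyl_factors, palette, start=1))  # mfr looked up among cylinders
print(lookup_color(bar, mfr_factors, palette, start=1))  # mfr looked up among manufacturers
```

The second call is the failing case from the question: the manufacturer sub-factor is looked up among cylinder values, so it falls back to the default color.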
So you could do something like this:
index_cmap = factor_cmap('cyl_mfr',
                         palette=Spectral5,
                         factors=sorted(df.mfr.unique()),
                         start=1)
Which "works" modulo the fact that there are way more manufacturer values than there are colors in the Spectral5 palette:
In the real situation you'll need to make sure you use a palette at least as big as the number of (sub-)factors that you configure.
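One way to guarantee that, sketched here in plain Python under the assumption that repeating colors is acceptable, is to cycle a short palette until it covers every factor. The helper name palette_for and the color strings are placeholders; choosing a larger palette would avoid the repeats.

```python
from itertools import cycle, islice


def palette_for(factors, base_palette):
    """Return one color per factor, repeating base_palette when it is too short."""
    return list(islice(cycle(base_palette), len(factors)))


base = ["c0", "c1", "c2", "c3", "c4"]   # stand-in for a five-color palette
mfrs = ["amc", "audi", "bmw", "buick", "ford", "mazda", "toyota"]

colors = palette_for(mfrs, base)
print(len(colors) == len(mfrs))
print(colors)
```

The resulting list could be passed as the palette argument in place of Spectral5.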
I have a table with three columns: the first column holds the name of each point, the second column holds numerical data (the mean), and the last column holds the second column plus a fixed number. The following is an example of what the data looks like:
I want to plot this table so that I get the following figure:
If it is possible, how can I plot it using Microsoft Excel, Python, or R (Bokeh)?
Alright, I only know how to do it in ggplot2, so I will answer regarding R here.
This method only works if the data frame is in the format you provided above.
I renamed your columns to Name.of.Method, Mean, Mean.2.2.
Preparation
Loading csv data into R
df <- read.csv('yourdata.csv', sep = ',')
Change the column names (do this if you don't want to change the code below; otherwise you will need to go through each parameter to match your column names):
names(df) <- c("Name.of.Method", "Mean", "Mean.2.2")
Method 1 - Using geom_segment()
library(ggplot2)
ggplot() +
  geom_segment(data = df, aes(x = Mean,
                              y = Name.of.Method,
                              xend = Mean.2.2,
                              yend = Name.of.Method))
So as you can see, geom_segment allows us to specify the end position of the line (Hence, xend and yend)
However, it does not look similar to the image you have above.
The line shape seems to represent an error bar, and ggplot2 provides an error bar function for exactly that.
Method 2 - Using geom_errorbarh()
ggplot(df, aes(y = Name.of.Method, x = Mean)) +
  geom_errorbarh(aes(xmin = Mean, xmax = Mean.2.2), linetype = 1, height = .2)
Usually we don't use this method just to draw a line. However, its functionality fits your requirement. You can see that we use xmin and xmax to specify the head and the tail of the line.
The height input adjusts the height of the bars at both ends of the line.
I would use hbar for this:
from bokeh.io import show, output_file
from bokeh.plotting import figure
output_file("intervals.html")
names = ["SMB", "DB", "SB", "TB"]
p = figure(y_range=names, plot_height=350)
p.hbar(y=names, left=[4,3,2,1], right=[6.2, 5.2, 4.2, 3.2], height=0.3)
show(p)
However, Whisker would also be an option if you really want whiskers instead of interval bars.
I have a dataset with 80 variables. I am interested in creating a function that will automate the creation of a 20 X 4 GridSpec in Matplotlib. Each subplot would either contain a histogram or a barplot for each of the 80 variables in the data. As a first step, I successfully created two functions (I call them 'counts' and 'histogram') that contain the layout of the plot that I want. Both of them work when tested on individual variables. As a next step, I attempted to create a function that would take the column names, loop through a conditional to test whether the data type is an object or otherwise and call the right function based on the datatype as a new subplot. Here is the code that I have so far:
Creates list of coordinates we will need for subplot specification:
A = np.arange(21)
B = np.arange(4)
coords = []
for i in A:
    for j in B:
        coords.append([A[i], B[j]])
#Create the gridspec and layout the figure
import matplotlib.gridspec as gridspec
fig = plt.figure(figsize=(12,6))
gs = gridspec.GridSpec(2,4)
#Function that relies on what we've done above:
def grid(cols=['MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley']):
    for i in cols:
        for vals in coords:
            if str(train[i].dtype) == 'object':
                plt.subplot('gs'+str(vals))
                counts(cols)
            else:
                plt.subplot('gs'+str(vals))
                histogram(cols)
When attempted, this code returns an error:
ValueError: Single argument to subplot must be a 3-digit integer
For purposes of helping you visualize, what I am hoping to achieve, I attach the screen shot below, which was produced by the line by line coding (with my created helper functions) I am trying to avoid:
Can anyone help me figure out where I am going wrong? I would appreciate any advice. Thank you!
The line plt.subplot('gs'+str(vals)) cannot work, which is also what the error tells you.
As can be seen from the matplotlib GridSpec tutorial, it needs to be
ax = plt.subplot(gs[0, 0])
So in your case you may use the values from the list as
ax = plt.subplot(gs[vals[0], vals[1]])
Mind that you also need to make sure that the coords list has exactly n*m elements if the gridspec is defined as gs = gridspec.GridSpec(n,m).
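A minimal sketch of building a coordinate list that always matches the grid, using only the standard library. The grid size and column names are taken from the question; pairing each column with exactly one cell through zip is an assumption about the intended layout.

```python
from itertools import product

n_rows, n_cols = 2, 4   # must match gridspec.GridSpec(n_rows, n_cols)

# every (row, column) cell of the grid, in row-major order
coords = list(product(range(n_rows), range(n_cols)))
print(len(coords) == n_rows * n_cols)

cols = ['MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley']

# one cell per column name, instead of looping over all cells for every column
for name, (row, col) in zip(cols, coords):
    # with matplotlib one would call plt.subplot(gs[row, col]) here
    # and then draw the plot for `name`
    print(name, row, col)
```

Because zip stops at the shorter sequence, any columns beyond n_rows * n_cols would be silently skipped, so the grid has to be large enough for all of them.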
Hi, I've got a script I'm working on and it's not working out as well as I want it to.
This is what I've got so far:
import bpy
def Key_Frame_Points(): #Gets the key-frame values as an array.
    fcurves = bpy.context.active_object.animation_data.action.fcurves
    for curve in fcurves:
        keyframePoints = fcurves[4].keyframe_points # selects Action channel's axis / attribute
        for keyframe in keyframePoints:
            print('KEY FRAME POINTS ARE #T ',keyframe.co[0])
            KEYFRAME_POINTS_ARRAY = keyframe.co[0]
    print(KEYFRAME_POINTS_ARRAY)
Key_Frame_Points()
When I run this, it prints out all the keyframes on the selected object as I wanted. But the problem is that for some reason I can't get the values it's printing into a variable. If you run it and check the system console, it acts oddly: it prints out the values of the keyframed object, but when I ask it to get those values as an array, it just prints out the last frame.
Here is briefly what it looks like:
I think what you want to do is add each keyframe.co[1] to an array, which means you want to use KEYFRAME_POINTS_ARRAY.append(keyframe.co[1]). For that to work you will need to define it as an empty array outside the loop with KEYFRAME_POINTS_ARRAY = [].
Note that keyframe.co[0] is the frame that is keyed while keyframe.co[1] is the keyed value at that frame.
Also of note is that you are looping through fcurves but not using each curve.
for curve in fcurves:
    keyframePoints = fcurves[4].keyframe_points
By using fcurves[4] here you are reading the same fcurve every time; you probably meant to use keyframePoints = curve.keyframe_points.
So I expect you want to have -
import bpy
def Key_Frame_Points(): #Gets the key-frame values as an array.
    KEYFRAME_POINTS_ARRAY = []
    fcurves = bpy.context.active_object.animation_data.action.fcurves
    for curve in fcurves:
        keyframePoints = curve.keyframe_points
        for keyframe in keyframePoints:
            print('KEY FRAME POINTS ARE frame:{} value:{}'.format(keyframe.co[0],keyframe.co[1]))
            KEYFRAME_POINTS_ARRAY.append(keyframe.co[1])
    return KEYFRAME_POINTS_ARRAY
print(Key_Frame_Points())
You may also be interested in using fcurves.find(data_path) to find a specific curve by its path.
There is also fcurve.evaluate(frame), which will give you the curve value at any frame, not just the keyed values.
Background:
I'm working on a program to show a 2d cross section of 3d data. The data is stored in a simple text csv file in the format x, y, z1, z2, z3, etc. I take a start and end point and flick through the dataset (~110,000 lines) to create a line of points between these two locations, and dump them into an array. This works fine, and fairly quickly (takes about 0.3 seconds). To then display this line, I've been creating a matplotlib stacked bar chart. However, the total run time of the program is about 5.5 seconds. I've narrowed the bulk of it (3 seconds worth) down to the code below.
'values' is an array with the x, y and z values plus a leading identifier, which isn't used in this part of the code. The first plt.bar is plotting the bar sections, and the second is used to create an arbitrary floor of -2000. In order to generate a continuous looking section, I'm using an interval between each bar of zero.
import matplotlib.pyplot as plt
for values in crossSection:
    prevNum = None
    layerColour = None
    if values != None:
        for i in range(3, len(values)):
            if values[i] != 'n':
                num = float(values[i].strip())
                if prevNum != None:
                    plt.bar(spacing, prevNum-num, width=interval, \
                            bottom=num, color=layerColour, \
                            edgecolor=None, linewidth=0)
                prevNum = num
                layerColour = layerParams[i].strip()
        if prevNum != None:
            plt.bar(spacing, prevNum+2000, width=interval, bottom=-2000, \
                    color=layerColour, linewidth=0)
    spacing += interval
I'm sure there's a more efficient way to do this, but I'm new to Matplotlib and still unfamiliar with its capabilities. The other main use of time in the code is:
plt.savefig('output.png')
which takes about a second, but I figure this is to be expected to save the file and I can't do anything about it.
Question:
Is there a faster way of generating the same output (a stacked bar chart or something that looks like one) by using plt.bar() better, or a different Matplotlib function?
EDIT:
I forgot to mention in the original post that I'm using Python 3.2.3 and Matplotlib 1.2.0
Leaving this here in case someone runs into the same problem...
While not exactly the same as using bar(), with a sufficiently large dataset (large enough that using bar() takes a few seconds) the results from stackplot() are indistinguishable. If I sort the data into layers using the method given by tcaswell and feed it into stackplot(), the chart is created in 0.2 seconds rather than 3 seconds.
EDIT
Code provided by tcaswell to turn the data into layers:
import numpy as np
accum_values = []
for values in crossSection:
    accum_values.append([float(v.strip()) for v in values[3:]])
accum_values = np.vstack(accum_values).T
layer_params = [l.strip() for l in layerParams]
bottom = np.zeros(accum_values[0].shape)
It looks like you are drawing each bar individually; you can pass sequences to bar (see this example).
I think something like:
import numpy as np
import matplotlib.pyplot as plt
accum_values = []
for values in crossSection:
    accum_values.append([float(v.strip()) for v in values[3:]])
accum_values = np.vstack(accum_values).T
layer_params = [l.strip() for l in layerParams]
bottom = np.zeros(accum_values[0].shape)
ax = plt.gca()
spacing = interval*np.arange(len(accum_values[0]))
# one bar() call per layer instead of one per bar
for data,color in zip(accum_values,layer_params):
    ax.bar(spacing,data,bottom=bottom,color=color,linewidth=0,width=interval)
    bottom += data
will be faster (because each call to bar creates one BarContainer and I suspect the source of your issues is you were creating one for each bar, instead of one for each layer).
I don't really understand what you are doing with the bars that have tops below their bottoms, so I didn't try to implement that; you will have to adapt this a bit.
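For the case where each row holds layer surface elevations from top to bottom, as in the question, one possible adaptation is to turn each pair of neighbouring surfaces into a bottom and a height, then transpose so that each layer becomes one sequence. This is a sketch in plain Python under the assumption that no values are missing (no 'n' entries); the function name layers_from_rows and the sample numbers are illustrative.

```python
def layers_from_rows(rows):
    """rows: one list of surface elevations (top first) per cross-section point.

    Returns one (bottoms, heights) pair per layer, each as long as `rows`.
    """
    # for every point, each layer spans from the lower surface up to the upper one
    per_point = [
        [(lower, upper - lower) for upper, lower in zip(row, row[1:])]
        for row in rows
    ]
    layers = []
    for layer in zip(*per_point):          # transpose: per point -> per layer
        bottoms = [bottom for bottom, _ in layer]
        heights = [height for _, height in layer]
        layers.append((bottoms, heights))
    return layers


rows = [[100.0, 60.0, 10.0],
        [90.0, 50.0, 20.0]]

for bottoms, heights in layers_from_rows(rows):
    print(bottoms, heights)
```

Each pair could then go into a single ax.bar(spacing, heights, bottom=bottoms, ...) call, so the number of calls equals the number of layers rather than the number of bars.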