Display figures and names with tooltips and mark_text with Altair - python-3.x

Here are three issues I have with tooltips and labels that I want to display on my Altair graph. All the issues are more or less linked.
First, I would like to modify the name of the information I display with the tooltip:
Year instead of properties.annee
Region instead of properties.region
Bioenergy instead of properties.bioenerie...
Second, I would like to round the values displayed in the tooltip.
"11.2" instead of "11.1687087653"
The code I wrote does what I want for the labels I put in the regions but it is not working for the tooltip.
Third, I would like to display the unit in the labels and in the tooltip but I don't find the correct syntax in the documentation.
Below is my code.
Thanks in advance for your answers.
Bertrand
Current result of my code
def gen_map(data: gpd.geodataframe.GeoDataFrame, title: str, abs_values: bool):
    data_json = json.loads(data.to_json())
    choro_data = alt.Data(values=data_json['features'])
    # Absolute values or relative values
    if abs_values:
        column = data.columns[0]
        units = 'MW'
        form = '.0f'
    else:
        column = data.columns[1]
        units = '%'
        form = '.1f'
    # Base layer
    layer = alt.Chart(choro_data, title=title).mark_geoshape(
        stroke='white',
        strokeWidth=1
    ).encode(
        alt.Color(f'properties.{column}:Q',
                  type='quantitative',
                  title=f'Installed Capacity in {units}'),
        tooltip=[f'properties.annee:Q',
                 f'properties.region:O',
                 f'properties.{column}:Q',
                 alt.Text(f'properties.{column}:Q', format=form)]
    ).transform_lookup(
        lookup='region',
        from_=alt.LookupData(choro_data, 'region')
    ).properties(
        width=600,
        height=500
    )
    # Label layer
    labels = alt.Chart(choro_data).mark_text(baseline='top'
    ).properties(
        width=600,
        height=500
    ).encode(
        longitude='properties.centroid_lon:Q',
        latitude='properties.centroid_lat:Q',
        text=alt.Text(f'properties.{column}:Q', format=form),
        size=alt.value(14),
        opacity=alt.value(1)
    )
    return layer + labels

gen_map(bioenergies_2019, 'Bioenergy in France in 2019', False)

Instead of a list of strings, use a list of alt.Tooltip objects:
tooltip=[alt.Tooltip('properties.annee:Q', title='Annee'),
alt.Tooltip('properties.region:O', title='Region'),
alt.Tooltip(f'properties.{column}:Q', title=f'{column}')]
You can additionally pass the format argument to specify how the value is displayed: use d3-format codes for numbers (for example, format='.1f') and d3-time-format codes for dates and times.
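As a rough preview of what the simple format codes produce: d3-format is modeled on Python's format-specification mini-language, so for plain fixed-point codes such as '.0f' and '.1f', Python's built-in format() gives the same kind of output. The sketch below is plain Python, not Altair; the helper name preview and the way the unit is appended are illustrative assumptions.

```python
# Plain-Python preview of simple fixed-point format codes.
# Assumption: only basic codes like '.0f' / '.1f' are previewed here;
# d3-format also has types (e.g. 'r' for significant digits) that
# Python's format() does not support.

def preview(value, code, unit=""):
    """Format value with a fixed-point code and an optional unit suffix."""
    text = format(value, code)
    return f"{text} {unit}" if unit else text

print(preview(11.1687087653, ".1f"))       # 11.2
print(preview(11.1687087653, ".1f", "%"))  # 11.2 %
print(preview(1523.4, ".0f", "MW"))        # 1523 MW
```

In the chart itself, the same code string is what gets passed as the format argument of alt.Tooltip or alt.Text.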

Related

Is there a way to specify what the legend shows in Altair?

I have the following graph in Altair:
The code used to generate it is as follows:
data = pd.read_csv(data_csv)
display(data)
display(set(data['algo_score_raw']))
# First generate base graph
base = alt.Chart(data).mark_circle(opacity=1, stroke='#4c78a8').encode(
    x=alt.X('Paragraph:N', axis=None),
    y=alt.Y('Section:N', sort=list(OrderedDict.fromkeys(data['Section']))),
    size=alt.Size('algo_score_raw:Q', title="Number of Matches"),
).properties(
    width=900,
    height=500
)
# Next generate the overlying graph with the lines
lines = alt.Chart(data).mark_rule(stroke='#4c78a8').encode(
    x=alt.X('Paragraph:N', axis=alt.Axis(labelAngle=0)),
    y=alt.Y('Section:N', sort=list(OrderedDict.fromkeys(data['Section'])))
).properties(
    width=900,
    height=500
)
if max(data['algo_score_raw']) == 0:
    return lines  # no circles if no matches
else:
    return base + lines
However, I don't want the decimal values in my legend; I only want 1.0, 2.0, and 3.0, because those are the only values that are actually present in my data. However, Altair seems to default to what you see above.
The legend is generated based on how you specify your encoding. It sounds like your data are better represented as ordered categories than as a continuous quantitative scale. You can specify this by changing the encoding type to ordinal:
size=alt.Size('algo_score_raw:O')
You can read more about encoding types at https://altair-viz.github.io/user_guide/encoding.html
You can use alt.Legend(tickCount=2) (labelExpr could also be helpful; see the docs for more):
import altair as alt
from vega_datasets import data
source = data.cars()
source['Acceleration'] = source['Acceleration'] / 10
chart = alt.Chart(source).mark_circle(size=60).encode(
x='Horsepower',
y='Miles_per_Gallon',
size='Acceleration',
)
chart
chart.encode(size=alt.Size('Acceleration', legend=alt.Legend(tickCount=2)))

How to control the number of stacked bars through single select widget in python bokeh

I have created a vertical stacked bar chart using python bokeh on an input dataset df using the following code -
print(df.head())
YearMonth A B C D E
0 Jan'18 1587.816 1586.544 856.000 1136.464 1615.360
1 Feb'18 2083.024 1847.808 1036.000 1284.016 2037.872
2 Mar'18 2193.420 1850.524 1180.000 1376.028 2076.464
3 Apr'18 2083.812 1811.636 1192.028 1412.028 2104.588
4 May'18 2379.976 2091.536 1452.000 1464.432 2400.876
Stacked Bar Chart Code -
products = ['python', 'pypy', 'jython']
customers = ['Cust 1', 'Cust 2']
colours = ['red', 'blue']
data = {
    'products': products,
    'Cust 1': [200, 850, 400],
    'Cust 2': [600, 620, 550],
    'Retail 1': [100, 200, 300],
    'Retail 2': [400, 500, 600]
}
source = ColumnDataSource(data)
# Set up widgets
select = Select(options=['customers', 'retailers'], value='customers')
def make_plot():
    # Categorical y-range so the product names can be used as bar positions
    p = figure(y_range=products)
    #p.title.text=select.value
    if select.value == 'customers':
        # Names must match the keys in data exactly (case-sensitive)
        customers = ['Cust 1', 'Cust 2']
    else:
        customers = ['Retail 1', 'Retail 2']
    p.hbar_stack(customers, y='products', height=0.5, source=source, color=colours)
    return p
layout = column(select, make_plot())
# Set up callbacks
def update_data(attrname, old, new):
    p = make_plot()  # make a new plot
    layout.children[1] = p
select.on_change('value', update_data)
# Set up layouts and add to document
curdoc().add_root(layout)
Now I want to limit the number of segments (i.e., stacked bars) using a widget (preferably a single Select widget). Can anyone please guide me on how to achieve this using the bokeh serve functionality? I don't want to use a JavaScript callback function.
This would take some non-trivial work to make happen. The vbar_stack method is a convenience function that actually creates multiple glyph renderers, one for each "row" in the initial stacking. What's more the renderers are all inter-related to one another, via the Stack transform that stacks all the previous renderers at each step. So there is not really any simple way to change the number of rows that are stacked after the fact. So much so that I would suggest simply deleting and re-creating the entire plot in each callback. (I would not normally recommend this approach, but this situation is one of the few exceptions.)
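To see why the stacked renderers depend on one another, here is a minimal plain-Python sketch of the cumulative arithmetic behind stacking. It is not Bokeh's implementation; the function name stack_edges is hypothetical, and the numbers are taken from the question's data.

```python
# Plain-Python illustration of cumulative stacking (not Bokeh's own code).
# Each layer starts where the sum of all previous layers ends, which is why
# a layer cannot be added or removed independently of the others.

def stack_edges(data, stackers):
    """Return {stacker: (starts, ends)} for bars stacked in the given order."""
    n = len(data[stackers[0]])
    running = [0] * n
    edges = {}
    for name in stackers:
        starts = list(running)
        running = [s + v for s, v in zip(starts, data[name])]
        edges[name] = (starts, list(running))
    return edges

data = {"Cust 1": [200, 850, 400], "Cust 2": [600, 620, 550]}
edges = stack_edges(data, ["Cust 1", "Cust 2"])
print(edges["Cust 1"])  # ([0, 0, 0], [200, 850, 400])
print(edges["Cust 2"])  # ([200, 850, 400], [800, 1470, 950])
```

Dropping "Cust 1" would shift every edge of "Cust 2", so rebuilding the whole plot is simpler than patching individual renderers.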
Since you have not given complete code or even mentioned what widget you want to use, all I can provide is a high level sketch of the code. Here is a complete example that updates a plot based on select widget:
from bokeh.layouts import column
from bokeh.models import Select
from bokeh.plotting import curdoc, figure

select = Select(options=["1", "2", "3", "4"], value="1")

def make_plot():
    p = figure()
    p.circle(x=[0, 2], y=[0, 5], size=15)
    p.circle(x=1, y=float(select.value), color="red", size=15)
    return p

layout = column(select, make_plot())

def update(attr, old, new):
    p = make_plot()  # make a new plot
    layout.children[1] = p  # replace the old plot

select.on_change('value', update)
curdoc().add_root(layout)
Note I have changed your show call to curdoc().add_root since it is never useful to call show in a Bokeh server application. You might want to refer to and study the User Guide chapter Running a Bokeh Server for background information, if necessary.

Format labels on bar charts in Altair

[Image: chart showing numbers without correct formatting]
I need to format the label on these bars, so that they are rounded to nearest whole number. I have the following code:
def chart_tender_response_times(dataframe=None):
    chart = (
        alt.Chart(dataframe, title="Median time to respond to a tender")
        .mark_bar()
        .encode(
            alt.X("year(date):O"
            ),
            alt.Y("mean(median_duration):Q",
                ## This is our units section, only describe the units of measurement here.
                axis=alt.Axis(title="Unit: days.")
            ),
            alt.Tooltip(["mean(median_duration):Q"], format=",.2r", title="Days to respond to a tender")
        )
    )
    text = (
        chart.mark_text(align="center", baseline="bottom")
        .encode(text='mean(median_duration):Q')
    )
    return chart+text
I've tried variations of the following...
text = (
    chart.mark_text(align="center", baseline="bottom")
    .encode(text='mean(median_duration):Q', format='.,2r')
)
but this returns the following schema validation error:
SchemaValidationError: Invalid specification
altair.vegalite.v3.api.Chart, validating 'required'
'data' is a required property
My hunch is that I have to somehow call and format the value, before adding it to the chart, but I can't see how to do this from either the documentation or the examples.
You need to wrap the format in alt.Text, as in encode(text=alt.Text('mean(median_duration):Q', format=',.2r')).
Also, I think format=',.0f' is more robust for rounding to the nearest integer: for example, 256.4 is rounded to 256, whereas format=',.2r' rounds to two significant digits and gives 260.
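The difference between the two codes can be checked with a small plain-Python sketch. Python's format() has no 'r' type, so the helper round_significant below is a hypothetical stand-in that emulates rounding to significant digits; it is not d3-format's implementation.

```python
import math

def round_significant(value, digits):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)

value = 256.4
# Two significant digits, as ',.2r' is described above
print(format(round_significant(value, 2), ",.0f"))  # 260
# Fixed-point with zero decimals, as ',.0f' does
print(format(value, ",.0f"))                        # 256
```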
Below is an example with a function a bit modified to fit another dataset (as you did not provide one):
import altair as alt
from vega_datasets import data

cars = data("cars")

def chart_tender_response_times(dataframe=None):
    chart = (
        alt.Chart(dataframe, title="Median time to respond to a tender")
        .mark_bar()
        .encode(
            alt.X("year(Year):O"),
            alt.Y(
                "mean(Displacement):Q",
                ## This is our units section, only describe the units of measurement here.
                axis=alt.Axis(title="Unit: days."),
            ),
            alt.Tooltip(
                ["mean(Displacement):Q"],
                format=",.0f",
                title="Days to respond to a tender",
            ),
        )
    )
    text = chart.mark_text(align="center", baseline="bottom").encode(
        text=alt.Text("mean(Displacement):Q", format=",.0f")
    )
    return chart + text

chart_tender_response_times(cars)

How to label line chart with column from pandas dataframe (from 3rd column values)?

I have a data set I filtered to the following (sample data):
Name Time l
1 1.129 1G-d
1 0.113 1G-a
1 3.374 1B-b
1 3.367 1B-c
1 3.374 1B-d
2 3.355 1B-e
2 3.361 1B-a
3 1.129 1G-a
I got this data after filtering the data frame and converting it to CSV file:
# Assigns the new data frame to "df" with the data from only three columns
header = ['Names','Time','l']
df = pd.DataFrame(df_2, columns = header)
# Sorts the data frame by column "Names" as integers
df.Names = df.Names.astype(int)
df = df.sort_values(by=['Names'])
# Changes the data to match format after converting it to int
df.Time=df.Time.astype(int)
df.Time = df.Time/1000
csv_file = df.to_csv(index=False, columns=header, sep=" " )
Now, I am trying to graph lines for each label column data/items with markers.
I want the column l as my line names (labels) - each as a new line, Time as my Y-axis values and Names as my X-axis values.
So, in this case, I would have 7 different lines in the graph with these labels: 1G-d, 1G-a, 1B-b, 1B-c, 1B-d, 1B-e, 1B-a.
I have done the following so far which is the additional settings, but I am not sure how to graph the lines.
plt.xlim(0, 60)
plt.ylim(0, 18)
plt.legend(loc='best')
plt.show()
I used sns.lineplot which comes with hue and I do not want to have name for the label box. Also, in that case, I cannot have the markers without adding new column for style.
I also tried plt.plot, but in that case I am not sure how to draw more than one line. I can only give x and y values, which creates a single line.
If there's any other source, please let me know below.
Thanks
The final graph I want to have is like the following but with markers:
You can apply a few tweaks to seaborn's lineplot. Using some created data since your sample isn't really long enough to demonstrate:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Create data
np.random.seed(2019)
categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
df = pd.DataFrame({'Name': np.repeat(range(1, 11), 10),
                   'Time': np.random.randn(100).cumsum(),
                   'l': np.random.choice(categories, 100)
                   })
# Plot
sns.lineplot(data=df, x='Name', y='Time', hue='l', style='l', dashes=False,
             markers=True, ci=None, err_style=None)
# Temporarily removing limits based on sample data
#plt.xlim(0, 60)
#plt.ylim(0, 18)
# Remove seaborn legend title & set new title (if desired)
ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:], title='New Title', loc='best')
plt.show()
To apply markers, you have to specify a style variable. This can be the same as hue.
You likely want to remove dashes, ci, and err_style
To remove the seaborn legend title, you can get the handles and labels, then re-add the legend without the first handle and label. You can also specify the location here and set a new title if desired (or just remove title=... for no title).
Edits per comments:
Filtering your data to only a subset of level categories can be done fairly easily via:
categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
df = df.loc[df['l'].isin(categories)]
markers=True will fail if there are too many levels. If you only want to mark points for aesthetic purposes, you can repeat a single marker once per category of interest (using the list you already created to filter your data): markers=['o'] * len(categories).
Alternatively, you can specify a custom dictionary to pass to the markers argument:
points = ['o', '*', 'v', '^']
mult = len(categories) // len(points) + (len(categories) % len(points) > 0)
markers = {key:value for (key, value)
in zip(categories, points * mult)}
This will return a dictionary of category-point combinations, cycling over the marker points specified until each item in categories has a point style.
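An equivalent way to build the same mapping, sketched here with itertools.cycle from the standard library instead of repeating the list a computed number of times (a swapped-in technique, not the code above):

```python
from itertools import cycle

categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
points = ['o', '*', 'v', '^']

# zip stops at the shorter input, so cycle(points) is consumed only as far
# as needed to give every category a marker.
markers = dict(zip(categories, cycle(points)))
print(markers)
# {'1G-d': 'o', '1G-a': '*', '1B-b': 'v', '1B-c': '^',
#  '1B-d': 'o', '1B-e': '*', '1B-a': 'v'}
```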

Changing width of heatmap in Seaborn to compensate for font size reduction

I have a sentence like say
Hey I am feeling pretty boring today and the day is dull too
I pass it through the OpenAI sentiment code, which gives me some neuron weights whose count can be equal to or a little greater than the number of words.
Neuron weights are
[ 0.01258736, 0.03544582, 0.08490616, 0.09010842, 0.07180552,
0.07271874, 0.08906463, 0.09690772, 0.10281454, 0.08131664,
0.08315734, 0.0790544 , 0.07770097, 0.07302617, 0.07329235,
0.06856266, 0.07642639, 0.08199468, 0.09079508, 0.09539193,
0.09061056, 0.07109602, 0.02138061, 0.02364372, 0.00322057,
0.01517018, 0.01150052, 0.00627739, 0.00445003, 0.00061127,
0.0228037 , -0.29226044, -0.40493113, -0.4069235 , -0.39796737,
-0.39871565, -0.39242673, -0.3537892 , -0.3779315 , -0.36448184,
-0.36063945, -0.3506464 , -0.36719123, -0.37997353, -0.35103855,
-0.34472692, -0.36256564, -0.35900915, -0.3619383 , -0.3532831 ,
-0.35352525, -0.33328298, -0.32929575, -0.33149993, -0.32934144,
-0.3261477 , -0.32421976, -0.3032671 , -0.47205922, -0.46902984,
-0.45346943, -0.4518705 , -0.50997925, -0.50997925]
Now what I want to do is plot a heatmap in which positive values show positive sentiment and negative values show negative sentiment. I am plotting the heatmap, but it isn't rendering as it should.
When the sentence gets longer, the text gets smaller and smaller until it can't be read. What changes should I make so that it displays better?
Here is my plotting function:
def plot_neuron_heatmap(text, values, savename=None, negate=False, cell_height=.112, cell_width=.92):
    #n_limit = 832
    cell_height=.325
    cell_width=.15
    n_limit = count
    num_chars = len(text)
    text = list(map(lambda x: x.replace('\n', '\\n'), text))
    num_chars = len(text)
    total_chars = math.ceil(num_chars/float(n_limit))*n_limit
    mask = np.array([0]*num_chars + [1]*(total_chars-num_chars))
    text = np.array(text+[' ']*(total_chars-num_chars))
    values = np.array((values+[0])*(total_chars-num_chars))
    values = values.reshape(-1, n_limit)
    text = text.reshape(-1, n_limit)
    mask = mask.reshape(-1, n_limit)
    num_rows = len(values)
    plt.figure(figsize=(cell_width*n_limit, cell_height*num_rows))
    hmap=sns.heatmap(values, annot=text, mask=mask, fmt='', vmin=-5, vmax=5, cmap='RdYlGn',xticklabels=False, yticklabels=False, cbar=False)
    plt.subplots_adjust()
    #plt.tight_layout()
    plt.savefig('fig1.png')
    #plt.show()
This is how it shows the lengthy text as
What I want it to show
Here is a link to the full notebook: https://github.com/yashkumaratri/testrepo/blob/master/heatmap.ipynb
Mad Physicist , Your code does this
and what really it should do is
The shrinkage of the font you are seeing is to be expected. As you add more characters horizontally, the font shrinks to fit everything in. There are a couple of solutions for this. The simplest would be to break your text into smaller chunks and display them as you show in your desired output. You can also save your figure with a different DPI than the one used on the screen, so that the letters look fine in the image file.
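As a rough guide to how DPI affects the saved image, the pixel dimensions are approximately the figure size in inches multiplied by the DPI. A minimal arithmetic sketch follows; the helper name saved_size_px is hypothetical, and the cell sizes are the ones from the question.

```python
def saved_size_px(width_in, height_in, dpi):
    """Approximate pixel size of a figure saved at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

# 80 cells of 0.15 in each, 4 rows of 0.325 in each
print(saved_size_px(0.15 * 80, 0.325 * 4, 100))  # (1200, 130)
print(saved_size_px(0.15 * 80, 0.325 * 4, 200))  # (2400, 260)
```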
You should consider cleaning up your function along the way:
count appears to be a global that is never used.
You redefine variables without ever using the original value (e.g. num_chars and the input parameters).
You have a whole bunch of variables you don't really use.
You recompute a lot of quantities multiple times.
The expression list(map(lambda x: x.replace('\n', '\\n'), text)) is total overkill: list(text.replace('\n', '\\n')) does the same thing.
Given that len(values) != len(text) for most cases, the line values = np.array((values+[0])*(total_chars-num_chars)) is nonsense and needs cleanup.
You are constructing numpy arrays by doing padding operations on lists, instead of using the power of numpy.
You have the entire infrastructure for properly reshaping your arrays already in place, but you don't use it.
The updated version below fixes the minor issues and adds n_limit as a parameter, which determines how many characters you are willing to have in a row of the heat map. As I mentioned in the last item, you already have all the necessary code to reshape your arrays properly, and even mask out the extra tail you end up with sometimes. The only thing that is wrong is the -1 in the shape, which always resolves to one row because of the remainder of the shape. Additionally, the figure is always saved at 100dpi, so the results should come out consistent for a given width, no matter how many rows you end up with. The DPI affects PNG because it increases or decreases the total number of pixels in the image, and PNG does not actually understand DPI:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def plot_neuron_heatmap(text, values, n_limit=80, savename='fig1.png',
                        cell_height=0.325, cell_width=0.15, dpi=100):
    text = text.replace('\n', '\\n')
    # Pad the text with spaces up to a multiple of n_limit
    text = np.array(list(text + ' ' * (-len(text) % n_limit)))
    if len(values) > text.size:
        values = np.array(values[:text.size])
    else:
        t = values
        # Float dtype so fractional weights are not truncated to 0
        # (the np.int alias has also been removed from recent NumPy releases)
        values = np.zeros(text.shape, dtype=float)
        values[:len(t)] = t
    text = text.reshape(-1, n_limit)
    values = values.reshape(-1, n_limit)
    # mask = np.zeros(values.shape, dtype=bool)
    # mask.ravel()[values.size:] = True
    plt.figure(figsize=(cell_width * n_limit, cell_height * len(text)))
    hmap = sns.heatmap(values, annot=text, fmt='', vmin=-5, vmax=5, cmap='RdYlGn', xticklabels=False, yticklabels=False, cbar=False)
    plt.subplots_adjust()
    plt.savefig(savename if savename else 'fig1.png', dpi=dpi)
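The padding and row-splitting step can be checked in isolation without NumPy or plotting. The sketch below is a plain-Python approximation of that step only; the function name pad_to_grid and the placeholder weights are assumptions made for illustration.

```python
def pad_to_grid(text, values, n_limit):
    """Pad text and values to a multiple of n_limit and split into rows."""
    pad = -len(text) % n_limit  # characters needed to fill the last row
    chars = list(text) + [' '] * pad
    # Truncate extra weights, or pad missing ones with 0.0
    vals = list(values[:len(chars)])
    vals += [0.0] * (len(chars) - len(vals))
    starts = range(0, len(chars), n_limit)
    return ([chars[i:i + n_limit] for i in starts],
            [vals[i:i + n_limit] for i in starts])

text = 'Hey I am feeling pretty boring today and the day is dull too'
values = [0.1] * 64  # placeholder weights, not the real ones
text_rows, value_rows = pad_to_grid(text, values, 7)
print(len(text_rows), len(text_rows[0]))  # 9 7
```

The expression -len(text) % n_limit is the number of spaces needed to reach the next multiple of n_limit, and it is 0 when the text already fills the last row.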
Here are a couple of sample runs of the function:
text = 'Hey I am feeling pretty boring today and the day is dull too'
values = [...] # The stuff in your question
plot_neuron_heatmap(text, values)
plot_neuron_heatmap(text, values, 20)
plot_neuron_heatmap(text, values, 7)
results in the following three figures:
