Altair: Remove title from layered faceted graphs

I tried layering faceted graphs and it failed, so I moved to the method suggested here (https://stackoverflow.com/a/52882510/20390480), which layers the graphs and then calls .facet(column). With this method I am unable to remove the facet title.
I tried .facet(column, title=None), which throws the error shown below.
import altair as alt
from vega_datasets import data
cars = data.cars()
horse = alt.Chart().mark_point().encode(
    x='Weight_in_lbs',
    y='Horsepower'
)
miles = alt.Chart().mark_point(color='red').encode(
    x='Weight_in_lbs',
    y='Miles_per_Gallon'
)
alt.layer(horse, miles, data=cars).facet(column='Origin', title=None)
SchemaValidationError: Invalid specification
altair.vegalite.v4.api.Chart, validating 'required'
'data' is a required property
alt.FacetChart(...)

Try:
alt.layer(horse, miles, data=cars).facet(column=alt.Column('Origin', title=None))

Related

Altair dropdown for linear or log scale

I'd like to be able to toggle between log and linear scale in my Altair plot. I'd also like to avoid multiple columns of transformed data if possible. I've tried this, but I get the error AttributeError: 'Scale' object has no attribute 'selection'
import altair as alt
from vega_datasets import data
cars_data = data.cars()
input_dropdown = alt.binding_select(options=['linear','log'], name='Scale')
selection = alt.selection_single(fields=['Miles_per_Gallon'], bind=input_dropdown)
scale = alt.condition(selection, alt.Scale(type = 'linear'), alt.Scale(type = 'log'))
alt.Chart(cars_data).mark_point().encode(
    x='Horsepower:Q',
    y=alt.Y('Miles_per_Gallon:Q',
            scale=scale),
    tooltip='Name:N'
).add_selection(
    scale
)
I've tried a variety of different things but can't seem to make it work. Any suggestions are greatly appreciated.

how to avoid overlapping of barchart using matplotlib

I am new to data science using Python. While plotting two different bar charts I ran into a problem. This is my code:
def compare_groups(field):
    if field in less_equal_150_cal.columns:
        less_equal_150_cal[field].plot.bar(color='blue', alpha=0.4, title=field)
        more_150_cal[field].plot.bar(color='red', alpha=0.4)
    else:
        raise ValueError(f"{field} not found")
The resulting bar graphs overlap each other. I want two separate bar graphs.
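One way to get two separate bar charts is to give each plot call its own axes. The following is only a minimal sketch, assuming that less_equal_150_cal and more_150_cal are pandas DataFrames. The toy data, the column name "sugars" and the panel titles are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the two DataFrames in the question.
less_equal_150_cal = pd.DataFrame({"sugars": [3, 6, 2]})
more_150_cal = pd.DataFrame({"sugars": [12, 9, 14]})


def compare_groups(field):
    """Draw the two groups side by side instead of on the same axes."""
    if field not in less_equal_150_cal.columns:
        raise ValueError(f"{field} not found")
    # One figure with two axes; sharey keeps the bar heights comparable.
    fig, (ax_low, ax_high) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    less_equal_150_cal[field].plot.bar(
        ax=ax_low, color="blue", alpha=0.4, title=f"{field} (<= 150 cal)"
    )
    more_150_cal[field].plot.bar(
        ax=ax_high, color="red", alpha=0.4, title=f"{field} (> 150 cal)"
    )
    return fig


fig = compare_groups("sugars")
```

Passing ax= tells pandas which axes to draw on. Without it, both calls draw on the current axes, which is why the bars in the question end up on top of each other.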

How do I plot the groupby in altair?

I grouped my data by genre and am trying to plot the result with Altair, but I get the error below.
disney_revenue = disney_movies.assign(inflation_adjusted_gross = disney_movies['inflation_adjusted_gross'].str.strip('$').str.replace(',','').astype(float))
disney_total_revenue = disney_revenue.assign(total_gross = disney_revenue['total_gross'].str.strip('$').str.replace(',','').astype(float))
disney_group = disney_total_revenue.groupby(by='genre')
chart2 = alt.Chart(disney_group, width=500, height=300).mark_circle().encode(
    x='movie_title:N',
    y='inflation_adjusted_gross:Q'
).properties(title='Total Adjusted Gross per Genre')
chart2
---------------------------------------------------------------------------
SchemaValidationError: Invalid specification
altair.vegalite.v4.api.Chart->0, validating 'type'
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f0611f2ac10> is not of type 'object'
You cannot pass a pandas groupby object to alt.Chart – you must pass a dataframe. But if you want to visualize grouped data, you can do that via the Altair encoding syntax. For example, here is a version of the chart you were trying to create, faceted by genre:
alt.Chart(disney_total_revenue).mark_circle().encode(
    x='movie_title:N',
    y='inflation_adjusted_gross:Q',
    facet='genre:N',
).properties(
    title='Total Adjusted Gross per Genre'
)
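If the goal is one value per genre, as the chart title suggests, the groupby can also be aggregated back into a regular DataFrame before plotting. This is a minimal sketch with made-up rows; the movie titles and numbers are placeholders, not the real Disney data.

```python
import pandas as pd

# Hypothetical rows standing in for disney_total_revenue.
disney_total_revenue = pd.DataFrame({
    "movie_title": ["A", "B", "C"],
    "genre": ["Comedy", "Drama", "Comedy"],
    "inflation_adjusted_gross": [100.0, 250.0, 50.0],
})

# groupby() alone returns a DataFrameGroupBy, which alt.Chart rejects.
grouped = disney_total_revenue.groupby(by="genre")

# Aggregating and resetting the index yields a plain DataFrame again,
# with one row per genre.
gross_per_genre = grouped["inflation_adjusted_gross"].sum().reset_index()
print(gross_per_genre)
```

The resulting gross_per_genre is an ordinary DataFrame, so it can be passed to alt.Chart, for example with x='genre:N' and y='inflation_adjusted_gross:Q'.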

Hide the grid in a specific Altair plot within a set of vstacked plots

I am trying to create a plot composed of two charts stacked vertically: a time series chart showing data and, below it, a time series chart showing text labels for events on the time axis. I want the data chart to have a grid, but the mark_text chart below it should show neither an outer line nor a grid. I use chart.configure_axis(grid=False) to hide the grid, but I get the following error: Objects with "config" attribute cannot be used within LayerChart. Consider defining the config attribute in the LayerChart object instead.
I can't figure out where to apply configure_axis(grid=False) so that it only applies to the bottom plot. Any help would be greatly appreciated, as would any suggestion for implementing the label plot in a different way.
Here is my code:
import altair as alt
import pandas as pd
import locale
from altair_saver import save
from datetime import datetime
file = '.\lagebericht.csv'
df = pd.read_csv(file, sep=';')
source = df
locale.setlocale(locale.LC_ALL, "de_CH")
min_date = '2020-02-29'
domain_pd = pd.to_datetime([min_date, '2020-12-1']).astype(int) / 10 ** 6
base = alt.Chart(source, title='Neumeldungen BS').encode(
    alt.X('test_datum:T', axis=alt.Axis(title="", format="%b %y"), scale=alt.Scale(domain=list(domain_pd)))
)
bar = base.mark_bar(width=1).encode(
    alt.Y('faelle_bs:Q', axis=alt.Axis(title="Anzahl Fälle"), scale=alt.Scale(domain=(0, 120)))
)
line = base.mark_line(color='blue').encode(
    y='faelle_Total:Q')
chart1 = (bar + line).properties(width=600)
events = pd.DataFrame({
    'datum': [datetime(2020, 7, 1), datetime(2020, 5, 15)],
    'const': [1, 1],
    'label': ['allgememeiner Lockdown', 'Gruppen > 50 verboten'],
})
base = alt.Chart(events).encode(
    alt.X('datum:T', axis=alt.Axis(title="", format="%b %y"), scale=alt.Scale(domain=list(domain_pd)))
)
points = base.mark_rule(color='blue').encode(
    y=alt.Y('const:Q', axis=alt.Axis(title="", ticks=False, domain=False, labels=False), scale=alt.Scale(domain=(0, 10)))
)
text = base.mark_text(
    align='right',
    baseline='bottom',
    angle=20,
    dx=0,  # Nudges text to right so it doesn't appear on top of the bar
    dy=20,
).encode(text='label:O').configure_axis(grid=False)
chart2 = (points + text).properties(width=600, height=50)
save(chart1 & chart2, r"images\figs.html")
This is what it looks like without the grid=False option:
[image]
The configure() method should be thought of as a way to specify a global chart theme; you cannot have different configurations within a single Chart (See https://altair-viz.github.io/user_guide/customization.html#global-config-vs-local-config-vs-encoding for a discussion of this).
The way to do what you want is not via global configuration, but via axis settings. For example, you can pass grid=False to alt.Axis:
points = alt.Chart(events).mark_rule(color='blue').encode(
    x=alt.X('datum:T', axis=alt.Axis(title="", format="%b %y"), scale=alt.Scale(domain=list(domain_pd))),
    y=alt.Y('const:Q', axis=alt.Axis(title="", ticks=False, domain=False, labels=False), scale=alt.Scale(domain=(0, 10)))
)
text = alt.Chart(events).mark_text().encode(
    x=alt.X('datum:T', axis=alt.Axis(title="", grid=False, format="%b %y"), scale=alt.Scale(domain=list(domain_pd))),
    text='label:O'
)

Bokeh plot returned in function not rendering

I was writing a function to simplify my plotting. It does not give any error, yet when I call
show(plt)
on the return value, nothing happens. I'm working in a Jupyter notebook. I've already made a call to:
output_notebook()
Here is the function code:
def plot_dist(x, h, title, xl="X axis", yl="Y axis", categories=None, width=0.5, bottom=0, color="#DC143C", xmlo=None, ymlo=None, xlo=-18, ylo=5):
    total = np.sum(h)
    source = ColumnDataSource(data={
        "x":x,
        "h":h,
        "percentages":[str(round((x*100)/total, 2)) + "%" for x in h]
    })
    plt = figure(
        title=title,
        x_axis_label=xl,
        y_axis_label=yl
    )
    plt.vbar(
        x="x",
        width=width,
        bottom=bottom,
        top="h",
        source=source,
        color=color
    )
    if xmlo is None:
        if categories is None:
            raise ValueError("If no categories are provided xaxis.major_label_overrides must be defined")
        plt.xaxis.major_label_overrides = {
            int(x):("(" + str(c.left) + "-" + str(c.right) + "]") for x,c in enumerate(categories)
        }
    else:
        plt.xaxis.major_label_overrides = xmlo
    if ymlo is None:
        plt.yaxis.major_label_overrides = { int(x):(str(int(x)/1000)+"k") for x in range(0, h.max(), math.ceil((h.max()/len(h))) )}
    else:
        plt.yaxis.major_label_overrides = ymlo
    labels = LabelSet(
        x=str(x), y=str(h), text="percentages", level="glyph",
        x_offset=xlo, y_offset=ylo, source=source, render_mode="canvas"
    )
    plt.add_layout(labels)
    return plt
And this is how it is invoked :
X = [x for x in range(0, len(grps.index))]
H = grps.to_numpy()
plt = plot_dist(X, H, "Test", "xtest", "ytest", grps.index.categories)
X is just a list and grps is the result of a call to pandas' DataFrame.groupby.
As I said, it does not give any error, so I think the problem is with the ColumnDataSource object; I must be creating it wrong. Any help is appreciated, thanks!
Edit 1: Apparently removing the following line solved the problem:
plt.add_layout(labels)
The plot now renders correctly, yet I still need to add the labels. Any ideas?
Edit 2: OK, I've solved the problem. Inspecting the web console while running the code shows the following error:
Error: attempted to retrieve property array for nonexistent field
The problem was in the following lines:
labels = LabelSet(
    x=str(x), y=str(h), text="percentages", level="glyph",
    x_offset=xlo, y_offset=ylo, source=source, render_mode="canvas"
)
In particular, the problem was assigning x=str(x) and y=str(h). Changing them to simply x="x" and y="h" solved it.
The problem with the code is with the labels declaration :
labels = LabelSet(
    x=str(x), y=str(h), text="percentages", level="glyph",
    x_offset=xlo, y_offset=ylo, source=source, render_mode="canvas"
)
It was discovered by inspecting the browser's web console, which gave the following error :
Error: attempted to retrieve property array for nonexistent field
The parameters x and y must refer to the column names in the ColumnDataSource object passed to the glyph method used to draw on the plot.
I was mistakenly passing str(x) and str(h), which are the string representations of the contents of those variables. I had wrongly assumed they would give the names of the variables.
To solve the problem, it is sufficient to pass the dictionary keys used in the ColumnDataSource constructor as the x and y parameters of the LabelSet constructor:
labels = LabelSet(
    x="x", y="h", text="percentages", level="glyph",
    x_offset=xlo, y_offset=ylo, source=source, render_mode="canvas"
)
In addition, if the ColumnDataSource was constructed from a DataFrame, the strings will be the column names. If any of the data used in the plot refers to the index, the string is the name of the index object, or "index" if the index has no explicit name.
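The difference between str(x) and "x" can be seen without Bokeh at all. The sketch below uses plain Python lists as stand-ins for the x and h arguments (in the question h is a NumPy array, so its string form would look slightly different) and a plain dict in place of the ColumnDataSource data:

```python
# Toy values standing in for the x and h arguments of plot_dist.
x = [0, 1, 2]
h = [10, 20, 30]

# The same mapping the question passes to ColumnDataSource(data=...).
data = {"x": x, "h": h}

# str(x) is the text of the list's contents, not the variable's name.
wrong_key = str(x)
print(repr(wrong_key))    # '[0, 1, 2]'
print(wrong_key in data)  # False: no column has that name
print("x" in data)        # True: this is the actual column name
```

This is consistent with the "nonexistent field" error in the browser console: the label set was asked for a column named after the list's contents, and no such column exists in the source.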
Thanks a lot to bigreddot for helping me with the problem and answer.
