Adjusting properties (e.g. width and height) on a layered faceted chart produces a "'data' is a required property" error - altair

I'm trying to adjust the width and height of a faceted layered chart. I have these two charts:
bar_chart = alt.Chart().mark_bar().encode(x='x', y='mean(y)')
text_overlay = bar_chart.mark_text().encode(text='mean(y)')
If I try to adjust the width after layering the charts with:
alt.layer(bar_chart, text_overlay, data=df).facet('z').properties(width=100)
I get a 'data' is a required property error.
I can change the width and height by adjusting one of the original charts with:
bar_chart = alt.Chart().mark_bar().encode(x='x', y='mean(y)').properties(width=100, height=200)
but I'm trying to return this chart as the output of a function, so I'd like to allow the user to adjust the properties outside of the function.
Is there any way around this error that doesn't require applying the properties to the original charts?
Thank you.

I think you can only use .properties when using facet as an encoding, but that is not compatible with layering. You could use the object oriented syntax to set the property after creation of the faceted layered chart:
import altair as alt
import pandas as pd
chart = alt.Chart(pd.DataFrame({'x': [1, 2], 'y': ['b', 'a']})).mark_point().encode(x='x', y='y')
chart_layered = (chart + chart).facet(facet='y')
chart_layered.spec.width = 100
chart_layered
To figure out what these attributes are, you could create a faceted chart using .properties the right way and study its dictionary or JSON output:
chart = alt.Chart(pd.DataFrame({'x': [1, 2], 'y': ['b', 'a']})).mark_point().encode(x='x', y='y').properties(width=100).facet(facet='y')
chart.to_dict()
{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
'data': {'name': 'data-5f40ae3874157bbf64df213f9a844d59'},
'facet': {'type': 'nominal', 'field': 'y'},
'spec': {'mark': 'point',
'encoding': {'x': {'type': 'quantitative', 'field': 'x'},
'y': {'type': 'nominal', 'field': 'y'}},
'width': 100},
'$schema': 'https://vega.github.io/schema/vega-lite/v4.8.1.json',
'datasets': {'data-5f40ae3874157bbf64df213f9a844d59': [{'x': 1, 'y': 'b'},
{'x': 2, 'y': 'a'}]}}
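As the dictionary output shows, width sits on the inner spec rather than on the top-level facet chart. Here is a minimal sketch of the same idea on a plain dictionary, assuming a structure like the to_dict() output above. set_inner_size is a hypothetical helper name, not part of Altair:

```python
import copy


def set_inner_size(chart_dict, width=None, height=None):
    """Return a copy of a facet chart dictionary with size set on its inner 'spec'.

    Hypothetical helper (not an Altair function); it assumes the facet
    layout shown above, where 'width'/'height' belong to chart_dict['spec'].
    """
    result = copy.deepcopy(chart_dict)  # leave the caller's dictionary untouched
    inner = result.setdefault("spec", {})
    if width is not None:
        inner["width"] = width
    if height is not None:
        inner["height"] = height
    return result


# Trimmed-down version of the dictionary printed above.
facet_chart = {
    "facet": {"type": "nominal", "field": "y"},
    "spec": {"mark": "point"},
}
sized = set_inner_size(facet_chart, width=100, height=200)
print(sized["spec"])
```

This mirrors what chart_layered.spec.width = 100 does on the chart object: the size is attached to the inner specification, so a function can return the chart and the caller can still resize it afterwards.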

Related

How to customize image mark width and height to act as barchart bars in altair?

I am exploring the image mark in Altair. I tried to make a bar chart with images as bars:
source = pd.DataFrame({
'a': ['A', 'B', 'C'],
'b': [28, 12, 77],
'url': ['https://vega.github.io/vega-datasets/data/7zip.png',
'https://vega.github.io/vega-datasets/data/gimp.png',
'https://vega.github.io/vega-datasets/data/ffox.png']
})
init = alt.Chart(source).mark_image(
# width= 50,
).encode(
x='a',
y='b',
url = 'url',
size=alt.Size('b:N', scale=None),
# color = 'a'
).properties(
width=512,
height=512
).configure_axis(
grid=False
)
My current result is like this:
but I want the height of each image to correspond to its y value while keeping the same width, like this:
Can I achieve this with Altair? Thanks!
The documentation mentions an aspect parameter which will do what you are looking for, but I couldn't find an example.
Vega-lite, on which Altair is based, allows changing the image aspect ratio as seen here. You can try looking at the source of the example there and figure out how to make it work in Altair.

Plotting a Pie chart from a dictionary?

I am attempting to construct a pie chart of the weighting of certain sectors in an index,
given this sample data frame.
Data = { 'Company': ['Google', 'Nike', 'Goldman', 'Tesla'], 'Ticker': ['GOOG', 'NKE', 'GGG', 'TSA'], 'Sector': ['Tech', 'Consumer', 'Financial', 'Auto'], 'Weighting': ['10', '20', '40', '30']}
df = pd.DataFrame(Data)
I have tried to create a dictionary of weights by sector.
Weight_Sector = df.groupby('Sector')['Weighting'].apply(list).to_dict()
I checked the dictionary and all was good. However, when I went on to plot the pie chart with the following code, I got the error 'ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.'
plt.pie(Weight_Sector.values(), startangle=90, autopct='%.1f%%')
In my actual data frame and dictionary there are many more values, and most sectors in the dictionary have more than one weight, corresponding to different companies.
Any help is appreciated!!
There are two problems:
the values need to be numeric, now they are strings
the individual values shouldn't be put into lists, they need to be just one number, e.g. taking the sum of all values belonging to the same category
As the original data only has one entry per category, the following example adds an extra 'Tech'.
from matplotlib import pyplot as plt
import pandas as pd
Data = {'Company': ['Microsoft', 'Google', 'Nike', 'Goldman', 'Tesla'], 'Ticker': ['MSFT', 'GOOG', 'NKE', 'GGG', 'TSA'],
'Sector': ['Tech', 'Tech', 'Consumer', 'Financial', 'Auto'], 'Weighting': ['1', '10', '20', '40', '30']}
df = pd.DataFrame(Data)
df['Weighting'] = df['Weighting'].astype(float) # make numeric
Weight_Sector = df.groupby('Sector')['Weighting'].sum().to_dict()
plt.pie(Weight_Sector.values(), labels=Weight_Sector.keys(),
startangle=90, autopct='%.1f%%', colors=plt.cm.Set2.colors)
plt.show()
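The two fixes (convert to numbers, then sum per category) can also be sketched without pandas. This is only an illustration on the sample data above; sum_by_sector is a hypothetical helper name:

```python
from collections import defaultdict


def sum_by_sector(rows):
    """Sum string weights per sector, returning one float per sector."""
    totals = defaultdict(float)
    for sector, weight in rows:
        totals[sector] += float(weight)  # strings such as '10' become 10.0
    return dict(totals)


# (Sector, Weighting) pairs from the example above, weights still as strings.
rows = [
    ("Tech", "1"),
    ("Tech", "10"),
    ("Consumer", "20"),
    ("Financial", "40"),
    ("Auto", "30"),
]
weight_sector = sum_by_sector(rows)
print(weight_sector)
```

Each sector now maps to a single number, which is the shape plt.pie expects for its values.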

How to apply a linestyle to a specific line in Seaborn lineplot?

I have a dataframe where the Products column contains many different items; here are only a few:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = np.array([[1, 1, 570], [2, 1, 650], [1, 2, 27], [2, 2, 64], [1, 3, 125], [2, 3, 216],
[1, 'item_1', 343], [2, 'item_1', 340], [1, 'item_2', 343], [2, 'item_2', 345]])
df = pd.DataFrame(data=data, columns=["Flag", "Products", "Value"])
I'm using Seaborn to get the following lineplot:
sns.set_theme()
sns.set_style("ticks")
sns.set_context("paper")
fig1, ax1 = plt.subplots()
sns.lineplot(data=df, x="Flag", y="Value",
hue="Products", style="Products", ax=ax1)
plt.legend(bbox_to_anchor=(1.02, 1),borderaxespad=0)
fig1.tight_layout()
In this way all the lines have a style "chosen" by Seaborn, but I need to set a specific color and a line style (not dashed) for the products named 'item_1' and 'item_2'.
Up to now I have found the following solution:
palette = {c:'red' if c=='item_1' else 'blue' for c in df.Products.unique()}
sns.lineplot(data=df, x="Flag", y="Value",
hue="Products", style="Products", palette=palette, ax=ax1)
So, I can set the red color for the item_1 only, but all the other lines are blue, while I'd like to:
set the red color and not-dashed lines for both item_1 and item_2
set another color palette (e.g. bright) for all the other lines
Is it possible to do that?
palette= and dashes= can be passed a dictionary mapping levels of the column used to different colors/styles.
You can generate these dictionaries by hand or programmatically (depending on how many levels you have).
For instance, the color palette:
#color palette
cmap = sns.color_palette("bright")
palette = {key:value for key,value in zip(data[hue_col].unique(), cmap)}
palette['item_1'] = 'red'
palette['item_2'] = 'red'
output:
{'1': (0.00784313725490196, 0.24313725490196078, 1.0),
'2': (1.0, 0.48627450980392156, 0.0),
'3': (0.10196078431372549, 0.788235294117647, 0.2196078431372549),
'item_1': 'red',
'item_2': 'red'}
We give each level a different color from the "bright" palette, and we can fix some values by hand if needed (keep in mind that the bright palette already contains a color very similar to red, which could cause confusion).
The same can be done for the dash style:
#style palette
dash_list = sns._core.unique_dashes(data[style_col].unique().size+1)
style = {key:value for key,value in zip(data[style_col].unique(), dash_list[1:])}
style['item_1'] = '' # empty string means solid
style['item_2'] = ''
output:
{'1': (4, 1.5),
'2': (1, 1),
'3': (3, 1.25, 1.5, 1.25),
'item_1': '',
'item_2': ''}
Here I use one of seaborn's private functions (use at your own risk, could change at any time), to generate a list of dash styles, and then manually set the particular levels I want to have a solid line. I request one too many items in dash_list because the first element is always a solid line, and I want to reserve solid lines for item_1 and item_2.
Full code:
data = df
x_col = 'Flag'
y_col = "Value"
hue_col = "Products"
style_col = "Products"
#color palette
cmap = sns.color_palette("bright")
palette = {key:value for key,value in zip(data[hue_col].unique(), cmap)}
palette['item_1'] = 'red'
palette['item_2'] = 'red'
#style palette
dash_list = sns._core.unique_dashes(data[style_col].unique().size+1)
style = {key:value for key,value in zip(data[style_col].unique(), dash_list[1:])}
style['item_1'] = '' # empty string means solid
style['item_2'] = ''
sns.set_theme()
sns.set_style("ticks")
sns.set_context("paper")
fig1, ax1 = plt.subplots()
sns.lineplot(data=df, x=x_col, y=y_col,
hue=hue_col, palette=palette,
style=style_col, dashes=style,
ax=ax1)
plt.legend(bbox_to_anchor=(1.02, 1),borderaxespad=0)
fig1.tight_layout()
Thanks a lot @Diziet Asahi, this does work on the example dataframe! But in my full dataframe, which contains many more items, I get the error message:
The palette dictionary is missing keys: {'9', '20', ...}
I guess that this is due to the fact that the default color palette in seaborn is a qualitative palette with ten distinct hues only.
To handle this error I set the following color palette:
cmap = sns.color_palette("hls", 75)
This also works with the "husl" color space.
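If a fixed-size palette has fewer colors than there are levels, another option is to cycle through the colors so that every level gets a key. This is a rough sketch in plain Python, assuming repeated colors are acceptable. build_palette is a hypothetical helper, and the placeholder color names stand in for the tuples returned by sns.color_palette:

```python
from itertools import cycle


def build_palette(levels, colors, overrides=None):
    """Map every level to a color, reusing colors when levels outnumber them."""
    # zip stops at the end of `levels`, so cycling `colors` is safe here.
    palette = dict(zip(levels, cycle(colors)))
    palette.update(overrides or {})  # fixed colors for selected levels
    return palette


# 14 levels but only 10 colors (placeholder names, not real color values).
levels = [str(i) for i in range(1, 13)] + ["item_1", "item_2"]
colors = [f"color_{i}" for i in range(10)]
palette = build_palette(levels, colors, {"item_1": "red", "item_2": "red"})
print(len(palette))
```

No key is missing from the resulting dictionary, but with many levels the repeated colors may make lines hard to tell apart, so a larger palette such as the "hls" one may still be preferable.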

How to display normalized categories in Altair stacked bar chart tooltip

I'm creating a stacked bar chart using the count of a categorical field in a DataFrame column.
chart = alt.Chart(df2).mark_bar().encode(
x="take__take:O",
y=alt.Y('count(name)', stack="normalize", axis=alt.Axis(title="Percent", format="%")),
color=alt.Color('name', sort=alt.EncodingSortField('value', order='descending')),
order=alt.Order(
'value',
sort="ascending"
),
tooltip=[
alt.Tooltip('count(name)', title="Total Students")
]
)
How would I go about getting the normalized count in the tooltip?
Up until now your chart uses encoding shorthands to compute various aggregates; for more complicated operations (like displaying normalized values in tooltips) you will need to use transforms directly.
Here is an example of displaying per-group percentages in a tooltip, using a chart similar to what you showed above:
import altair as alt
import numpy as np
import pandas as pd
np.random.seed(0)
df2 = pd.DataFrame({
'name': np.random.choice(['A', 'B', 'C', 'D'], size=100),
'value': np.random.randint(0, 20, 100),
'take__take': np.random.randint(0, 5, 100)
})
alt.Chart(df2).transform_aggregate(
count='count()',
groupby=['name', 'take__take']
).transform_joinaggregate(
total='sum(count)',
groupby=['take__take']
).transform_calculate(
frac=alt.datum.count / alt.datum.total
).mark_bar().encode(
x="take__take:O",
y=alt.Y('count:Q', stack="normalize", axis=alt.Axis(title="Percent", format="%")),
color='name:N',
tooltip=[
alt.Tooltip('count:Q', title="Total Students"),
alt.Tooltip('frac:Q', title="Percentage of Students", format='.0%')
]
)
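To see what the three transforms compute, here is the same arithmetic in plain Python on a tiny hand-made dataset (the records are made up for illustration, not taken from df2):

```python
from collections import Counter

# (name, take__take) pairs; hypothetical sample records.
records = [("A", 0), ("A", 0), ("B", 0), ("A", 1), ("C", 1), ("C", 1), ("C", 1)]

# transform_aggregate: count rows per (name, take__take) group.
count = Counter(records)

# transform_joinaggregate: total count per take__take, shared by its groups.
total = Counter()
for (name, take), n in count.items():
    total[take] += n

# transform_calculate: fraction of each group within its take__take.
frac = {key: n / total[key[1]] for key, n in count.items()}

for (name, take), value in sorted(frac.items()):
    print(f"take={take} name={name} count={count[(name, take)]} frac={value:.0%}")
```

The joinaggregate step is what keeps one row per group while attaching the per-bar total, which is why the fraction can be shown in the tooltip.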

A gauge chart using XlsxWriter?

I tried to find a gauge chart based on XlsxWriter but I haven't found anything on the web. I don't want to reinvent the wheel. Has someone already built this chart, or does anyone know where I can find a Python script?
Here is an example of how to create a gauge chart by combining a doughnut chart and a pie chart, based on this tutorial.
Note, this requires XlsxWriter >= 1.0.8:
import xlsxwriter
workbook = xlsxwriter.Workbook('chart_gauge.xlsx')
worksheet = workbook.add_worksheet()
chart_doughnut = workbook.add_chart({'type': 'doughnut'})
chart_pie = workbook.add_chart({'type': 'pie'})
# Add some data for the Doughnut and Pie charts. This is set up so the
# gauge goes from 0-100. It is initially set at 75%.
worksheet.write_column('H2', ['Donut', 25, 50, 25, 100])
worksheet.write_column('I2', ['Pie', 75, 1, '=200-I4-I3'])
# Configure the doughnut chart as the background for the gauge.
chart_doughnut.add_series({
'name': '=Sheet1!$H$2',
'values': '=Sheet1!$H$3:$H$6',
'points': [
{'fill': {'color': 'green'}},
{'fill': {'color': 'yellow'}},
{'fill': {'color': 'red'}},
{'fill': {'none': True}}],
})
# Rotate chart so the gauge parts are above the horizontal.
chart_doughnut.set_rotation(270)
# Turn off the chart legend.
chart_doughnut.set_legend({'none': True})
# Turn off the chart fill and border.
chart_doughnut.set_chartarea({
'border': {'none': True},
'fill': {'none': True},
})
# Configure the pie chart as the needle for the gauge.
chart_pie.add_series({
'name': '=Sheet1!$I$2',
'values': '=Sheet1!$I$3:$I$6',
'points': [
{'fill': {'none': True}},
{'fill': {'color': 'black'}},
{'fill': {'none': True}}],
})
# Rotate the pie chart/needle to align with the doughnut/gauge.
chart_pie.set_rotation(270)
# Combine the pie and doughnut charts.
chart_doughnut.combine(chart_pie)
# Insert the chart into the worksheet.
worksheet.insert_chart('A1', chart_doughnut)
workbook.close()
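The needle position comes from the pie series: the first slice is the gauge value, the second is the needle width, and the third fills the rest of the circle, mirroring the '=200-I4-I3' formula. Here is a small sketch of that arithmetic, assuming the doughnut values sum to 200 as in the example. needle_segments is a hypothetical helper, not part of XlsxWriter:

```python
def needle_segments(value, needle_width=1, gauge_max=100):
    """Return the three pie values [before needle, needle, remainder]."""
    if not 0 <= value <= gauge_max:
        raise ValueError(f"value must be between 0 and {gauge_max}")
    total = 2 * gauge_max  # the visible half of the doughnut covers 0..gauge_max
    return [value, needle_width, total - value - needle_width]


print(needle_segments(75))
```

The returned list could be written to the worksheet with worksheet.write_column('I3', ...) in place of the hard-coded values and formula, if computing the gauge position in Python is preferred.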