Trying to set Altair color to white at zero value - altair

I'm trying to create an Altair heat map where the color is white when my count is zero.
Here is my data frame:
df11 = pd.DataFrame({
    'lineid': [0,1,2,0,1,2],
    'term': ['data', 'data', 'data', 'explore', 'explore', 'explore'],
    'count': [3,2,1,1,0,2],
    'title': ['weather', 'weather','weather','weather','weather','weather',]
})
alt.Chart(df11).mark_rect().encode(
    x=alt.X('lineid:O', title=None, axis=alt.Axis(ticks=False, labels=False)),
    y=alt.Y('term:O', title=None),
    color=alt.Color('count:O', legend=None)
)
The encoding gives me the chart I intended, but I'm trying to get the zero count (2nd row, 2nd column) to be white, not light blue.

You can use a conditional encoding, as in the answer to "How to make Altair display NaN points with a quantitative color scale?":
import altair as alt
import pandas as pd
df = pd.DataFrame({
'lineid': [0,1,2,0,1,2],
'term': ['data', 'data', 'data', 'explore', 'explore', 'explore'],
'count': [3,2,1,1,0,2],
'title': ['weather', 'weather','weather','weather','weather','weather',]
})
alt.Chart(df).mark_rect().encode(
x=alt.X('lineid:O', title=None, axis=alt.Axis(ticks=False, labels=False)),
y=alt.Y('term:O', title=None),
color=alt.condition(alt.datum.count == 0, alt.value('white'), 'count:O', legend=None)
)

Related

Plotting a Pie chart from a dictionary?

I am attempting to construct a pie chart of the weighting of certain sectors in an index, given this sample data frame:
Data = {'Company': ['Google', 'Nike', 'Goldman', 'Tesla'],
        'Ticker': ['GOOG', 'NKE', 'GGG', 'TSA'],
        'Sector': ['Tech', 'Consumer', 'Financial', 'Auto'],
        'Weighting': ['10', '20', '40', '30']}
df = pd.DataFrame(Data)
I have tried to create a dictionary of weights by sector.
Weight_Sector = df.groupby('Sector')['Weighting'].apply(list).to_dict()
I checked the dictionary and all was good. However, when I went on to plot the pie chart with the following code, I got the error 'ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.'
plt.pie(Weight_Sector.values(), startangle=90, autopct='%.1f%%')
In my actual data frame and dictionary there are many more values, and most sectors in the dictionary have more than one weight, each corresponding to a different company.
Any help is appreciated!!
There are two problems:
1. The values need to be numeric; at the moment they are strings.
2. The individual values shouldn't be put into lists; each category needs just one number, e.g. the sum of all values belonging to that category.
As the original data only has one entry per category, the following example adds an extra 'Tech'.
from matplotlib import pyplot as plt
import pandas as pd
Data = {'Company': ['Microsoft', 'Google', 'Nike', 'Goldman', 'Tesla'], 'Ticker': ['MSFT', 'GOOG', 'NKE', 'GGG', 'TSA'],
'Sector': ['Tech', 'Tech', 'Consumer', 'Financial', 'Auto'], 'Weighting': ['1', '10', '20', '40', '30']}
df = pd.DataFrame(Data)
df['Weighting'] = df['Weighting'].astype(float) # make numeric
Weight_Sector = df.groupby('Sector')['Weighting'].sum().to_dict()
plt.pie(Weight_Sector.values(), labels=Weight_Sector.keys(),
startangle=90, autopct='%.1f%%', colors=plt.cm.Set2.colors)
plt.show()
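To see the difference between the two dictionaries without plotting anything, here is a minimal pandas-only sketch. It uses the same made-up weights as above (with the extra 'Tech' row) and `pd.to_numeric` as one possible way to convert the strings.

```python
import pandas as pd

df = pd.DataFrame({
    'Sector': ['Tech', 'Tech', 'Consumer', 'Financial', 'Auto'],
    'Weighting': ['1', '10', '20', '40', '30'],  # strings, as in the question
})

# apply(list) yields one list of strings per sector, with differing lengths.
as_lists = df.groupby('Sector')['Weighting'].apply(list).to_dict()

# Converting to numbers and summing yields a single number per sector.
weights = pd.to_numeric(df['Weighting'])
as_sums = weights.groupby(df['Sector']).sum().to_dict()

print(as_lists)
print(as_sums)
```

Only the second dictionary has one scalar per key, which is the shape `plt.pie` needs for its wedge sizes.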

How to make one line bold in multiline plot in matplotlib or seaborn? [duplicate]

I am trying to plot a multi line plot using sns but only keeping the US line in red while the other countries are in grey
This is what I have so far:
df = px.data.gapminder()
sns.lineplot(x = 'year', y = 'pop', data = df, hue = 'country', color = 'grey', dashes = False, legend = False)
But this does not change the lines to grey. I was thinking that after this, I could add the US line by itself in red.
You can use pandas groupby to plot:
fig,ax=plt.subplots()
for c,d in df.groupby('country'):
    # the gapminder data labels the US as 'United States'
    color = 'red' if c=='United States' else 'grey'
    d.plot(x='year',y='pop', ax=ax, color=color)
ax.legend().remove()
Or you can define a specific palette as a dictionary:
# the gapminder data labels the US as 'United States'
palette = {c:'red' if c=='United States' else 'grey' for c in df.country.unique()}
sns.lineplot(x='year', y='pop', data=df, hue='country',
             palette=palette, legend=False)
You can use the palette parameter to pass custom colors for the lines to sns.lineplot, for example:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'year': [2018, 2019, 2020, 2018, 2019, 2020, 2018, 2019, 2020, ],
'pop': [325, 328, 332, 125, 127, 132, 36, 37, 38],
'country': ['USA', 'USA', 'USA', 'Mexico', 'Mexico', 'Mexico',
'Canada', 'Canada', 'Canada']})
colors = ['red', 'grey', 'grey']
sns.lineplot(x='year', y='pop', data=df, hue='country',
palette=colors, legend=False)
plt.ylim(0, 350)
plt.xticks([2018, 2019, 2020]);
It could still be useful to have a legend though, so you may also want to consider tinkering with the alpha values (the last values in the tuples below) to highlight the USA.
red = (1, 0, 0, 1)
green = (0, 0.5, 0, 0.2)
blue = (0, 0, 1, 0.2)
colors = [red, green, blue]
sns.lineplot(x='year', y='pop', data=df, hue='country',
palette=colors)
plt.ylim(0, 350)
plt.xticks([2018, 2019, 2020]);
Easily scalable solution:
1. Split the dataframe in two, based on the lines to be highlighted:
lines_to_highlight = ['USA']
hue_column = 'country'
a. Get the data to be grayed out:
df_gray = df.loc[~df[hue_column].isin(lines_to_highlight)].reset_index(drop=True)
Generate a custom color palette for the grayed-out lines (gray hex code #808080):
gray_palette = {val:'#808080' for val in df_gray[hue_column].values}
b. Get the data to be highlighted:
df_highlight = df.loc[df[hue_column].isin(lines_to_highlight)].reset_index(drop=True)
2. Plot the two data frames on the same figure:
a. Plot the grayed-out data:
ax = sns.lineplot(data=df_gray,x='year',y='pop',hue=hue_column,palette=gray_palette)
b. Plot the highlighted data:
sns.lineplot(data=df_highlight,x='year',y='pop',hue=hue_column,ax=ax)
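The dictionary-palette idea from the earlier answer can also be made reusable for any number of highlighted lines. The sketch below is one possible way to do that; `highlight_palette` and its parameters are hypothetical names introduced here for illustration, not part of seaborn.

```python
import pandas as pd

def highlight_palette(categories, highlight,
                      highlight_color='red', base_color='grey'):
    """Hypothetical helper: map each distinct category to a color.

    Categories listed in `highlight` get `highlight_color`,
    all others get `base_color`.
    """
    highlight = set(highlight)
    return {c: highlight_color if c in highlight else base_color
            for c in pd.Series(categories).unique()}

df = pd.DataFrame({
    'year': [2018, 2019, 2020] * 3,
    'pop': [325, 328, 332, 125, 127, 132, 36, 37, 38],
    'country': ['USA'] * 3 + ['Mexico'] * 3 + ['Canada'] * 3,
})

palette = highlight_palette(df['country'], ['USA'])
print(palette)
```

The resulting dictionary can be passed as `palette=palette` to `sns.lineplot` together with `hue='country'`, which avoids splitting the dataframe.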

How to display normalized categories in Altair stacked bar chart tooltip

I'm creating a stacked bar chart using the count of a categorical field in a DataFrame column:
chart = alt.Chart(df2).mark_bar().encode(
x="take__take:O",
y=alt.Y('count(name)', stack="normalize", axis=alt.Axis(title="Percent", format="%")),
color=alt.Color('name', sort=alt.EncodingSortField('value', order='descending')),
order=alt.Order(
'value',
sort="ascending"
),
tooltip=[
alt.Tooltip('count(name)', title="Total Students")
]
)
How would I go about getting the normalized count in the tooltip?
Up until now your chart uses encoding shorthands to compute various aggregates; for more complicated operations (like displaying normalized values in tooltips) you will need to use transforms directly.
Here is an example of displaying per-group percentages in a tooltip, using a chart similar to what you showed above:
import altair as alt
import numpy as np
import pandas as pd
np.random.seed(0)
df2 = pd.DataFrame({
'name': np.random.choice(['A', 'B', 'C', 'D'], size=100),
'value': np.random.randint(0, 20, 100),
'take__take': np.random.randint(0, 5, 100)
})
alt.Chart(df2).transform_aggregate(
count='count()',
groupby=['name', 'take__take']
).transform_joinaggregate(
total='sum(count)',
groupby=['take__take']
).transform_calculate(
frac=alt.datum.count / alt.datum.total
).mark_bar().encode(
x="take__take:O",
y=alt.Y('count:Q', stack="normalize", axis=alt.Axis(title="Percent", format="%")),
color='name:N',
tooltip=[
alt.Tooltip('count:Q', title="Total Students"),
alt.Tooltip('frac:Q', title="Percentage of Students", format='.0%')
]
)
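To sanity-check the numbers the tooltip should show, the same aggregate, joinaggregate, and calculate steps can be mirrored in pandas. This is only a rough equivalent of the transforms for verification, using the same random example data as above.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df2 = pd.DataFrame({
    'name': np.random.choice(['A', 'B', 'C', 'D'], size=100),
    'value': np.random.randint(0, 20, 100),
    'take__take': np.random.randint(0, 5, 100),
})

# Like transform_aggregate: one row per (name, take__take) with its row count.
agg = df2.groupby(['name', 'take__take']).size().reset_index(name='count')

# Like transform_joinaggregate: attach the per-take__take total to every row.
agg['total'] = agg.groupby('take__take')['count'].transform('sum')

# Like transform_calculate: share of each name within its take__take bar.
agg['frac'] = agg['count'] / agg['total']

print(agg.head())
```

Within each `take__take` group the `frac` values add up to 1, which matches the normalized stack on the y axis.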

Plot markers indicating the net value for a grouped, stacked bar chart in Altair

I have a grouped, stacked bar chart, made in Python with the Altair package, that has positive and negative values. How can I plot a marker to indicate the net value for each bar? An example plot uses a red line marker, but a diamond or point would be fine.
df1=pd.DataFrame(10*np.random.rand(4,3),index=["A","B","C","D"],columns=["I","J","K"])
print(df1)
df2=pd.DataFrame(-10*np.random.rand(4,3),index=["A","B","C","D"],columns=["I","J","K"])
df3=pd.DataFrame(10*np.random.rand(4,3),index=["A","B","C","D"],columns=["I","J","K"])
def prep_df(df, name):
    df = df.stack().reset_index()
    df.columns = ['c1', 'c2', 'values']
    df['DF'] = name
    return df
df1 = prep_df(df1, 'DF1')
df2 = prep_df(df2, 'DF2')
df3 = prep_df(df3, 'DF3')
df = pd.concat([df1, df2, df3])
alt.Chart(df).mark_bar().encode(
    # tell Altair which field to group columns on
    x=alt.X('c2:N', title=None),
    # tell Altair which field to use as Y values and how to calculate
    y=alt.Y('sum(values):Q',
            axis=alt.Axis(
                grid=False,
                title=None)),
    # tell Altair which field to use as the set of columns to be represented in each group
    column=alt.Column('c1:N', title=None),
    # tell Altair which field to use for color segmentation
    color=alt.Color('DF:N',
                    scale=alt.Scale(
                        # make it look pretty with an enjoyable color palette
                        range=['#96ceb4', '#ffcc5c','#ff6f69'],
                    ),
                    ))\
    .configure_view(
        # remove grid lines around column clusters
        strokeOpacity=0
    )
I've tried computing "Net" in a separate df, then doing something like:
tick = alt.Chart(source).mark_tick(
color='red',
thickness=2,
size=40 * 0.9, # controls width of tick.
).encode(
x=alt.X('c2:N', title=None),
y=alt.Y('Net')
)
but the error is: 'Net' is not defined
No need to precompute the sum; Altair can do that directly. The trick here is that faceted charts cannot be layered, so you have to facet the layered chart instead:
base = alt.Chart(df).encode(
x=alt.X('c2:N', title=None),
y=alt.Y('sum(values):Q',
axis=alt.Axis(
grid=False,
title=None)),
)
bars = base.mark_bar().encode(
color=alt.Color('DF:N',
scale=alt.Scale(
range=['#96ceb4', '#ffcc5c','#ff6f69'],
),
)
)
net = base.mark_tick(
color='red',
thickness=2,
size=18,
)
alt.layer(bars, net).facet(
column=alt.Column('c1:N', title=None)
).configure_view(
strokeOpacity=0
)
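Because the tick layer has no color encoding, its `sum(values)` is aggregated per `c2` value within each `c1` facet. To check the tick positions outside Altair, a pandas sketch of that same sum follows; the data is random and only shaped like the question's, and the `Net` column name is taken from the question's attempt.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Long-form data shaped like the question's: DF2 holds the negative values.
rows = [(c1, c2, name, sign * 10 * rng.random())
        for name, sign in [('DF1', 1), ('DF2', -1), ('DF3', 1)]
        for c1 in ['A', 'B', 'C', 'D']
        for c2 in ['I', 'J', 'K']]
df = pd.DataFrame(rows, columns=['c1', 'c2', 'DF', 'values'])

# One net value per bar: the sum over the DF1/DF2/DF3 segments of each (c1, c2).
net = (df.groupby(['c1', 'c2'], as_index=False)['values']
         .sum()
         .rename(columns={'values': 'Net'}))

print(net)
```

Each row of `net` corresponds to one red tick in the faceted chart.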

Plot graph with the data showing respective colors

color = []
for key,value in ms.iterrows():
    if(value['Color']=='Blue'):
        color.append('b')
    elif(value['Color']=='Green'):
        color.append('g')
    elif(value['Color']=='Red'):
        color.append('r')
    elif(value['Color']=='Yellow'):
        color.append('y')
    elif(value['Color']=='Orange'):
        color.append('orange')  # 'o' is not a valid matplotlib color code
    else:
        color.append('k')
ax = ms[['Height','Color']].plot(x='Color', kind='bar', title="Correlation",
                                 figsize=(15,10), color=color, legend=True, fontsize=12)
ax.set_xlabel("Colors", fontsize=12)
ax.set_ylabel("Height", fontsize=12)
My intention is to plot a bar graph that shows Color against Height, and I managed to do that. However, I would like each bar to be drawn in its respective color: in accordance with the data set, the 1st bar should be red, and so on. I tried adding the color argument, but the chart still shows only one color.
The trick is to create a multicolumn dataframe and use the stacked=True option.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({"Height" : [5,3,6,4],
"Color" : ["Blue", "Green", "Red", "Yellow"]})
color = []
for key,value in df.iterrows():
    if(value['Color']=='Blue'):
        color.append('b')
    elif(value['Color']=='Green'):
        color.append('g')
    elif(value['Color']=='Red'):
        color.append('r')
    elif(value['Color']=='Yellow'):
        color.append('y')
    elif(value['Color']=='Orange'):
        color.append('orange')  # 'o' is not a valid matplotlib color code
    else:
        color.append('k')
df2 = pd.DataFrame(np.diag(df["Height"]), columns=df["Color"], index=df["Color"])
ax = df2.plot(kind='bar', title="Correlation", color=color, legend=True,
fontsize=12, stacked=True)
ax.set_xlabel("Colors", fontsize=12)
ax.set_ylabel("Height", fontsize=12)
plt.show()
You do not have to create if/else conditions:
import pandas as pd
df = pd.DataFrame({"Height" : [5,3,6,4],
"Color" : ["Blue", "Green", "Red", "Yellow"]})
df.set_index('Color').Height.plot(kind='bar',color=df.Color)
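When the labels in the data are not themselves valid color names, a middle ground between the if/elif chain and passing the column directly is a lookup dictionary with `Series.map`. This is a minimal sketch; `color_map` is a hypothetical table made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Height": [5, 3, 6, 4],
                   "Color": ["Blue", "Green", "Red", "Yellow"]})

# Hypothetical lookup table from data labels to matplotlib color specs.
color_map = {"Blue": "b", "Green": "g", "Red": "r",
             "Yellow": "y", "Orange": "orange"}

# Labels missing from the table fall back to black ('k'),
# like the else branch of the if/elif chain.
colors = df["Color"].map(color_map).fillna("k").tolist()
print(colors)
```

The list can then be passed as the `color` argument, for example `df.set_index('Color').Height.plot(kind='bar', color=colors)`.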
