Altair show overlapping images

I want to show overlapping images using Altair.
Here is some demo code:
import altair as alt
import pandas as pd
source = pd.DataFrame([{"x": [0,0,0], "y": [0,0,0],
"img": ["https://vega.github.io/vega-datasets/data/gimp.png",
"https://vega.github.io/vega-datasets/data/7zip.png",
"https://vega.github.io/vega-datasets/data/ffox.png"]},])
chart=alt.Chart(source).mark_image(width=100,height=100,).encode(x='x',y='y',url='img')
chart
The output is not what I expected.
What is the issue here?

You probably meant to construct your dataframe this way:
source = pd.DataFrame({"x": [0,0,0], "y": [0,0,0],
"img": ["https://vega.github.io/vega-datasets/data/gimp.png",
"https://vega.github.io/vega-datasets/data/7zip.png",
"https://vega.github.io/vega-datasets/data/ffox.png"]})
i.e. it should have three rows rather than one row. If you do this, the chart works, and the last image appears on top of the others, as expected.
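The difference between the two constructions is easy to check by inspecting each frame's shape. A minimal sketch using the same URLs as the question; the names `one_row` and `three_rows` are only illustrative:

```python
import pandas as pd

urls = [
    "https://vega.github.io/vega-datasets/data/gimp.png",
    "https://vega.github.io/vega-datasets/data/7zip.png",
    "https://vega.github.io/vega-datasets/data/ffox.png",
]

# A list containing one dict: a single row whose cells hold whole lists.
one_row = pd.DataFrame([{"x": [0, 0, 0], "y": [0, 0, 0], "img": urls}])

# A dict of lists: one row per list element.
three_rows = pd.DataFrame({"x": [0, 0, 0], "y": [0, 0, 0], "img": urls})

print(one_row.shape)     # (1, 3)
print(three_rows.shape)  # (3, 3)
```

In the one-row version each cell is a Python list rather than a single value, so the chart has no usable x, y, or url values for the encodings.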

Related

Plotting graphs with Altair from a Pandas Dataframe

I am trying to read table values from a spreadsheet and plot different charts using Altair.
The spreadsheet can be found here
import pandas as pd
xls_file = pd.ExcelFile('PET_PRI_SPT_S1_D.xls')
xls_file
crude_df = xls_file.parse('Data 1')
crude_df
I am setting the second row values as column headers of the data frame.
crude_df.columns = crude_df.iloc[1]
crude_df.columns
Index(['Date', 'Cushing, OK WTI Spot Price FOB (Dollars per Barrel)',
'Europe Brent Spot Price FOB (Dollars per Barrel)'],
dtype='object', name=1)
The following is a modified version of Altair code taken from the documentation examples:
crude_df_header = crude_df.head(100)
import altair as alt
alt.Chart(crude_df_header).mark_circle().encode(
# Mapping the WTI column to y-axis
y='Cushing, OK WTI Spot Price FOB (Dollars per Barrel)'
)
This does not work.
The error shown is:
TypeError: Object of type datetime is not JSON serializable
How can I make 2D plots with this data?
Also, how can I make plots in Altair when the number of values exceeds 5000? That results in errors as well.
Your error is due to the way you parsed the file. You set the column names but did not remove the first two rows, including the row that now serves as the column names. The presence of these string values resulted in the error.
The proper way to achieve what you are looking for is as follows:
import pandas as pd
import altair as alt
crude_df = pd.read_excel(open('PET_PRI_SPT_S1_D.xls', 'rb'),
sheet_name='Data 1',index_col=None, header=2)
alt.Chart(crude_df.head(100)).mark_circle().encode(
x ='Date',
y='Cushing, OK WTI Spot Price FOB (Dollars per Barrel)'
)
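If you would rather keep the original parse() call, the same cleanup can be sketched by hand: promote the header row, drop the leftover rows, and convert the column types. The rows and the short column names `WTI` and `Brent` below are made-up placeholders, not values from the spreadsheet:

```python
import pandas as pd

# Hypothetical stand-in for what xls_file.parse('Data 1') returns:
# two leftover header rows followed by the actual data rows.
raw = pd.DataFrame(
    [
        ["Sourcekey", "RWTC", "RBRTE"],
        ["Date", "WTI", "Brent"],
        ["2020-01-01", 10.0, 12.0],
        ["2020-01-02", 11.5, 13.5],
    ]
)

raw.columns = raw.iloc[1]                    # promote the second row to column names
clean = raw.iloc[2:].reset_index(drop=True)  # drop the two leftover rows

# The columns still carry the mixed types of the leftover rows, so convert them.
clean["Date"] = pd.to_datetime(clean["Date"])
clean["WTI"] = pd.to_numeric(clean["WTI"])
clean["Brent"] = pd.to_numeric(clean["Brent"])

print(clean.dtypes)
```

Passing header=2 to read_excel, as above, avoids these manual steps because pandas infers the column types from the data rows only.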
For the max rows issue, you can use the following
alt.data_transformers.disable_max_rows()
But be mindful of the official warning
If you choose this route, please be careful: if you are making multiple plots with the dataset in a particular notebook, the notebook will grow very large and performance may suffer.
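An alternative that is not part of the answer above is to thin the data before charting, so the notebook never embeds every row. A minimal sketch, assuming a random sample is acceptable for the plot; `big` is a made-up frame and 5000 mirrors Altair's default row limit:

```python
import numpy as np
import pandas as pd

# Made-up frame with more rows than Altair's default limit of 5000.
big = pd.DataFrame({"x": np.arange(20_000), "y": np.arange(20_000) % 7})

MAX_ROWS = 5000
# Draw a reproducible random sample, then restore the original row order.
small = big.sample(n=MAX_ROWS, random_state=0).sort_index()

print(len(big), len(small))  # 20000 5000
```

The sampled frame `small` can then be passed to alt.Chart in place of the full data.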

Altair repeated chart, add different subplot/chart title

I can't believe I haven't been able to google the answer for this... In the documented example of repeated charts, how would I add different sub-chart titles?
import altair as alt
from vega_datasets import data
iris = data.iris.url
alt.Chart(iris).mark_point().encode(
alt.X(alt.repeat("column"), type='quantitative'),
alt.Y(alt.repeat("row"), type='quantitative'),
color='species:N'
).properties(
width=200,
height=200,
title="Chart title",
).repeat(
row=['petalLength', 'petalWidth'],
column=['sepalLength', 'sepalWidth']
).interactive()
adds the same title to each sub-chart. Can I pass in a list of titles here? The figure in this question shows that the same chart title would show up in all columns. The same seems to be the case for my data/code:
I don't think it is possible to change the title for individual repeated charts, but depending on your application you might be able to work around this by using transform_fold plus faceting instead:
import altair as alt
from vega_datasets import data
iris = data.iris.url
alt.Chart(iris).mark_point().encode(
alt.X('species:N'),
alt.Y('value:Q'),
color='species:N'
).transform_fold(
['petalLength', 'petalWidth', 'sepalLength', 'sepalWidth']
).facet(
'key:N'
)
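To see what the fold step produces, here is a rough pandas analogue using melt. It is only a sketch: the two-row frame is made up, and unlike transform_fold, melt drops the folded columns from the result.

```python
import pandas as pd

# Made-up miniature of the iris data with two of the four measurements.
iris_like = pd.DataFrame(
    {
        "species": ["setosa", "virginica"],
        "petalLength": [1.4, 6.0],
        "petalWidth": [0.2, 2.5],
    }
)

# Wide to long: each measurement column becomes rows of (key, value),
# which is the shape that the 'key:N' facet and 'value:Q' encoding rely on.
long_form = iris_like.melt(
    id_vars="species",
    value_vars=["petalLength", "petalWidth"],
    var_name="key",
    value_name="value",
)

print(long_form)
```

Each distinct value of `key` becomes one facet, and the facet header shows that value, which is what gives every sub-chart its own title.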

How do I set the domain of an axis to a value that isn't a multiple of five in Altair?

I'm trying to set the x-axis domain to 0-36, as some data I'm processing was collected in 6-week increments. Following the documentation, I used scale=alt.Scale(domain=[0,36]). However, the chart still extends to 40.
df = pd.DataFrame({'x':[0,6,12,18,24,30,36],'y':[0,3,1,4,2,5,3]})
alt.Chart(df).mark_line(point=True).encode(
x=alt.X('x:Q',
axis=alt.Axis(values=[0,6,12,18,24,30,36]),
scale=alt.Scale(domain=[0,36])),
y=alt.Y('y:Q'),
)
Changing the above code to cut off between 30 and 35 i.e., scale=alt.Scale(domain=[0,31]) generates this behavior, where the chart axis gets truncated at 30 (but shows the data after 30, appropriately since the data hasn't been clipped).
But why can't I cut off the graph at values that aren't multiples of 5?
I'm using Altair v4.0.1
The Vega-Lite renderer defaults to choosing "nice" values for the scale. If you want to disable this behavior, you can pass nice=False:
import pandas as pd
import altair as alt
df = pd.DataFrame({'x':[0,6,12,18,24,30,36],'y':[0,3,1,4,2,5,3]})
alt.Chart(df).mark_line(point=True).encode(
x=alt.X('x:Q',
axis=alt.Axis(values=[0,6,12,18,24,30,36]),
scale=alt.Scale(domain=[0,36], nice=False)),
y=alt.Y('y:Q'),
)

Coloring bar chart by category of the values

I need to color my bar plot by the values' category. Is that possible using Matplotlib?
Example:
Normally I use two lists to create a bar chart:
values = [5, 2, 1, 7, 8, 12]
xticks = ["John", "Nina", "Darren", "Peter", "Joe", "Kendra"]
Is it possible to add an extra category list, and color those bars based on those categories?
Like:
category = ["male", "female", "male", "male", "male", "female"]
Thanks in advance!
If you want, you could structure your data as a dictionary. Then it could be solved like this, where each entry has a property "gender", which is also the lookup key in the dictionary color_map:
import matplotlib.pyplot as plt
fig,ax=plt.subplots(1,1)
data={
0:{"name":"John", "value":5, "gender":"male"},
1:{"name":"Nina", "value":2, "gender":"female"},
2:{"name":"Darren", "value":1, "gender":"male"},
3:{"name":"Peter", "value":7, "gender":"male"},
4:{"name":"Joe", "value":8, "gender":"male"},
5:{"name":"Kendra", "value":12,"gender":"female"},
}
color_map={"male":"b","female":"r"}
xs=list(data.keys())  # a plain list is safest for ax.bar and ax.set_xticks
ys=[v["value"] for v in data.values()]
names=[v["name"] for v in data.values()]
colors=[color_map[v["gender"]] for v in data.values()]
ax.bar(xs,ys,color=colors)
ax.set_xticks(xs)
ax.set_xticklabels(names)
plt.show()
There are multiple ways to get a valid "male"/"female" legend here; one would be to create a fully custom legend. Apart from that you could, for example, separate the dict into one for male and one for female data, and then create two separate bar plots like ax.bar(… , label="male"), which is a rather bloated approach, however.
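The fully custom legend could be sketched with matplotlib's Patch handles, one per category. This is a minimal example under assumptions: the Agg backend is chosen only so it runs without a display, and the lists mirror the ones in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

values = [5, 2, 1, 7, 8, 12]
names = ["John", "Nina", "Darren", "Peter", "Joe", "Kendra"]
category = ["male", "female", "male", "male", "male", "female"]
color_map = {"male": "b", "female": "r"}

fig, ax = plt.subplots(1, 1)
# One color per bar, looked up from the bar's category.
ax.bar(names, values, color=[color_map[c] for c in category])

# Build one legend entry per category instead of one per bar.
handles = [Patch(color=color, label=label) for label, color in color_map.items()]
legend = ax.legend(handles=handles)
```

This keeps a single ax.bar call and leaves the legend independent of how the bars were drawn.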
I'd suggest using the pandas library instead, changing my answer to:
import matplotlib.pyplot as plt
import pandas as pd
fig,ax=plt.subplots(1,1)
data={
0:{"name":"John", "value":5, "gender":"male"},
1:{"name":"Nina", "value":2, "gender":"female"},
2:{"name":"Darren", "value":1, "gender":"male"},
3:{"name":"Peter", "value":7, "gender":"male"},
4:{"name":"Joe", "value":8, "gender":"male"},
5:{"name":"Kendra", "value":12,"gender":"female"},
}
df=pd.DataFrame.from_dict(data,orient='index')
color_map={"male":"b","female":"r"}
df["colors"]=df["gender"].map(color_map)
for g in ["male","female"]:
    xs=df.index[df["gender"]==g]
    ys=df["value"][df["gender"]==g]
    color=df["colors"][df["gender"]==g]
    ## or, perhaps easier in this specific case:
    # color=color_map[g]
    ax.bar(xs,ys,color=color,label=g)
ax.legend()
ax.set_xticks(df.index)
ax.set_xticklabels(df["name"])
plt.show()
Here, the key is that we can filter the dataframe df (think of it perhaps like an Excel worksheet) on various conditions, e.g. df["value"][df["gender"]=="male"]. This way we can create the two separate bar plots easily.
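The filtering step on its own can be tried without any plotting. A short sketch with the same data as above; the name `is_male` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["John", "Nina", "Darren", "Peter", "Joe", "Kendra"],
        "value": [5, 2, 1, 7, 8, 12],
        "gender": ["male", "female", "male", "male", "male", "female"],
    }
)

# A boolean mask: True for the rows that satisfy the condition.
is_male = df["gender"] == "male"

# The mask selects matching rows; .loc picks rows and column in one step.
male_values = df.loc[is_male, "value"]

print(male_values.tolist())        # [5, 1, 7, 8]
print(df.index[is_male].tolist())  # [0, 2, 3, 4]
```

The filtered index keeps the original row positions, which is why the bars of both groups land at the correct x locations.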

How to make an altair plot within an IF statement?

The situation seems to be quite simple: I am working in a Jupyter Lab file with several Altair plots, which eventually make the file too large to run and to save. Since I don't need to see these plots every single time, I figured I could avoid this by specifying something like plotAltair = True at the beginning of the script and then nesting each Altair plot in if statements. As simple as this may sound, for some reason it doesn't appear to work. Am I missing out on something obvious? [edit: turns out I was]
For instance:
import altair as alt
import os
import pandas as pd
import numpy as np
lengths = np.random.randint(0,100,200)
lengths_list = lengths.tolist()
labels = [str(i) for i in lengths_list]
peak_lengths = pd.DataFrame.from_dict({'coords': labels,
'lengths': lengths_list},
orient='columns')
What works:
alt.Chart(peak_lengths).mark_bar().encode(
x = alt.X('lengths:Q', bin=True),
y='count(*):Q'
)
What doesn't work:
plotAltair = True
if plotAltair:
    alt.Chart(peak_lengths).mark_bar().encode(
        x = alt.X('lengths:Q', bin=True),
        y='count(*):Q'
    )
Note: I have already attempted to use alt.data_transformers.enable('json') as a way of reducing file size and it is also not working, but let's please not focus on this but rather on the simpler question.
Short answer: use chart.display()
Long answer: Jupyter notebooks in general will only display things if you tell them to. For example, this code will not result in any output:
if x:
    x + 1
You are telling the notebook to evaluate x + 1, but not to do anything with it. What you need to do is tell the notebook to print the result, either implicitly by putting it as the last line in the main block of the cell, or explicitly by asking for it to be printed when the statement appears anywhere else:
if x:
    print(x + 1)
It is similar for Altair charts, which are just normal Python objects. If you put the chart at the end of the cell, you are implicitly asking for the result to be displayed, and Jupyter will display it as it will any variable. If you want it to be displayed from any other location in the cell, you need to explicitly ask that it be displayed using the IPython.display.display() function:
from IPython.display import display
if plotChart:
    chart = alt.Chart(data).mark_point().encode(x='x', y='y')
    display(chart)
Because this extra import is a bit verbose, Altair provides a .display() method as a convenience function to do the same thing:
if plotChart:
    chart = alt.Chart(data).mark_point().encode(x='x', y='y')
    chart.display()
Note that calling .display() on multiple charts is the way that you can display multiple charts in a single cell.
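The implicit-display rule can be illustrated with Python's ast module. This is only a rough sketch, assuming IPython's default behavior of displaying a cell's result when its last top-level statement is a bare expression; the helper `last_node_is_expression` is hypothetical and not part of Jupyter or Altair:

```python
import ast


def last_node_is_expression(cell_source: str) -> bool:
    """Return True if the cell's last top-level statement is a bare expression."""
    tree = ast.parse(cell_source)
    return bool(tree.body) and isinstance(tree.body[-1], ast.Expr)


bare_cell = "x = 1\nx + 1\n"
nested_cell = "x = 1\nif x:\n    x + 1\n"

print(last_node_is_expression(bare_cell))    # True
print(last_node_is_expression(nested_cell))  # False
```

In the second cell the last top-level statement is the if block, so the expression nested inside it is evaluated but never displayed unless display() is called explicitly.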
