Hide the grid in a specific Altair plot within a set of vstacked plots - altair

I am trying to create a plot composed of two charts stacked vertically: a time series chart showing the data and, below it, a time series chart showing text labels that mark events on the time axis. I want the data chart to have a grid, but the mark_text chart below it should show neither an outer line nor a grid. I use chart.configure_axis(grid=False) to hide the grid but get the following error: Objects with "config" attribute cannot be used within LayerChart. Consider defining the config attribute in the LayerChart object instead.
I can't figure out where to apply configure_axis(grid=False) so that it only applies to the bottom plot. Any help on this would be greatly appreciated, as would any suggestion for implementing the label plot in a different way.
Here is my code:
import altair as alt
import pandas as pd
import locale
from altair_saver import save
from datetime import datetime
file = r'.\lagebericht.csv'
df = pd.read_csv(file, sep=';')
source = df
locale.setlocale(locale.LC_ALL, "de_CH")
min_date = '2020-02-29'
domain_pd = pd.to_datetime([min_date, '2020-12-1']).astype(int) / 10 ** 6
base = alt.Chart(source, title='Neumeldungen BS').encode(
alt.X('test_datum:T', axis=alt.Axis(title="",format="%b %y"), scale = alt.Scale(domain=list(domain_pd) ))
)
bar = base.mark_bar(width = 1).encode(
alt.Y('faelle_bs:Q', axis=alt.Axis(title="Anzahl Fälle"), scale = alt.Scale(domain=(0, 120)))
)
line = base.mark_line(color='blue').encode(
y='faelle_Total:Q')
chart1 = (bar + line).properties(width=600)
events= pd.DataFrame({
'datum': [datetime(2020,7,1), datetime(2020,5,15)],
'const': [1,1],
'label': ['allgememeiner Lockdown', 'Gruppen > 50 verboten'],
})
base = alt.Chart(events).encode(
alt.X('datum:T', axis=alt.Axis(title="", format="%b %y"), scale = alt.Scale(domain=list(domain_pd) ))
)
points = base.mark_rule(color='blue').encode(
y=alt.Y('const:Q', axis=alt.Axis(title="",ticks=False, domain=False, labels=False), scale = alt.Scale(domain=(0, 10)))
)
text = base.mark_text(
align='right',
baseline='bottom',
angle = 20,
dx=0, # Nudges text to right so it doesn't appear on top of the bar
dy=20,
).encode(text='label:O').configure_axis(grid=False)
chart2 = (points + text).properties(width=600, height = 50)
save(chart1 & chart2, r"images\figs.html")
This is what it looks like without the grid=False option:

The configure() method should be thought of as a way to specify a global chart theme; you cannot have different configurations within a single chart (see https://altair-viz.github.io/user_guide/customization.html#global-config-vs-local-config-vs-encoding for a discussion of this).
The way to do what you want is not via global configuration, but via axis settings. For example, you can pass grid=False to alt.Axis:
points = alt.Chart(events).mark_rule(color='blue').encode(
x=alt.X('datum:T', axis=alt.Axis(title="", format="%b %y"), scale = alt.Scale(domain=list(domain_pd) )),
y=alt.Y('const:Q', axis=alt.Axis(title="",ticks=False, domain=False, labels=False), scale = alt.Scale(domain=(0, 10)))
)
text = alt.Chart(events).mark_text().encode(
x=alt.X('datum:T', axis=alt.Axis(title="", grid=False, format="%b %y"), scale = alt.Scale(domain=list(domain_pd) )),
text='label:O'
)

Related

Altair dropdown for linear or log scale

I'd like to be able to toggle between log and linear scale in my Altair plot. I'd also like to avoid multiple columns of transformed data if possible. I've tried this but get the error AttributeError: 'Scale' object has no attribute 'selection'
import altair as alt
from vega_datasets import data
cars_data = data.cars()
input_dropdown = alt.binding_select(options=['linear','log'], name='Scale')
selection = alt.selection_single(fields=['Miles_per_Gallon'], bind=input_dropdown)
scale = alt.condition(selection, alt.Scale(type = 'linear'), alt.Scale(type = 'log'))
alt.Chart(cars_data).mark_point().encode(
x='Horsepower:Q',
y = alt.Y('Miles_per_Gallon:Q',
scale=scale),
tooltip='Name:N'
).add_selection(
scale
)
I've tried a variety of different things but can't seem to make it work. Any suggestions are greatly appreciated.

Altair: Remove title from layered faceted graphs

I tried layering faceted graphs and it failed, so I moved to the method suggested here - https://stackoverflow.com/a/52882510/20390480 which basically layers the graphs and then calls .facet(column). With this method I am unable to remove the facet title.
I tried .facet(column, title=None), which throws the following error:
import altair as alt
from vega_datasets import data
cars = data.cars()
horse = alt.Chart().mark_point().encode(
x = 'Weight_in_lbs',
y = 'Horsepower'
)
miles = alt.Chart().mark_point(color='red').encode(
x = 'Weight_in_lbs',
y = 'Miles_per_Gallon'
)
alt.layer(horse, miles, data=cars).facet(column='Origin', title=None)
SchemaValidationError: Invalid specification
altair.vegalite.v4.api.Chart, validating 'required'
'data' is a required property
alt.FacetChart(...)
Try:
alt.layer(horse, miles, data=cars).facet(column=alt.Column('Origin', title=None))

Is there a way to specify what the legend shows in Altair?

I have the following graph in Altair:
The code used to generate it is as follows:
data = pd.read_csv(data_csv)
display(data)
display(set(data['algo_score_raw']))
# First generate base graph
base = alt.Chart(data).mark_circle(opacity=1, stroke='#4c78a8').encode(
x=alt.X('Paragraph:N', axis=None),
y=alt.Y('Section:N', sort=list(OrderedDict.fromkeys(data['Section']))),
size=alt.Size('algo_score_raw:Q', title="Number of Matches"),
).properties(
width=900,
height=500
)
# Next generate the overlying graph with the lines
lines = alt.Chart(data).mark_rule(stroke='#4c78a8').encode(
x=alt.X('Paragraph:N', axis=alt.Axis(labelAngle=0)),
y=alt.Y('Section:N', sort=list(OrderedDict.fromkeys(data['Section'])))
).properties(
width=900,
height=500
)
if max(data['algo_score_raw']) == 0:
    return lines # no circles if no matches
else:
    return base + lines
However, I don't want the decimal values in my legend; I only want 1.0, 2.0, and 3.0, because those are the only values that are actually present in my data. However, Altair seems to default to what you see above.
The legend is generated based on how you specify your encoding. It sounds like your data are better represented as ordered categories than as a continuous quantitative scale. You can specify this by changing the encoding type to ordinal:
size=alt.Size('algo_score_raw:O')
You can read more about encoding types at https://altair-viz.github.io/user_guide/encoding.html
You can use alt.Legend(tickCount=2) (labelExpr could also be helpful, see the docs for more):
import altair as alt
from vega_datasets import data
source = data.cars()
source['Acceleration'] = source['Acceleration'] / 10
chart = alt.Chart(source).mark_circle(size=60).encode(
x='Horsepower',
y='Miles_per_Gallon',
size='Acceleration',
)
chart
chart.encode(size=alt.Size('Acceleration', legend=alt.Legend(tickCount=2)))

Drag & drop Nodes in network graph with Bokeh

I am starting with Bokeh. I plot a network graph. It works.
I want to drag and drop nodes to move them around the plot, so that the relations between nodes are easier to see.
So far I have the following (just important lines are written):
df = pd.read_csv('data.csv', sep=" ", header=None)
G = nx.from_pandas_edgelist(df, 0, 1)
plot = Plot(background_fill_color="lightgrey",
plot_width=800, plot_height=600,
x_range=Range1d(-0.5, 0.5), y_range=Range1d(-0.5, 0.5))
graph_renderer = from_networkx(
G, nx.spring_layout, scale=1, center=(0, 0))
# here is the issue:
plot.add_tools(PointDrawTool(
renderers=[graph_renderer], empty_value='black'))
plot.renderers.append(graph_renderer)
...
PointDrawTool is the tool that enables drag and drop. According to the documentation, it expects a renderer (I assume graph_renderer), but I get the error AttributeError: 'GraphRenderer' object has no attribute 'glyph'
Some guidance appreciated.
Everything works fine in Bokeh v1.1.0 when you replace the
plot.add_tools(PointDrawTool(renderers = [graph_renderer], empty_value='black'))
with:
plot.add_tools(PointDrawTool(renderers = [graph_renderer.node_renderer], empty_value = 'black'))

Bokeh Mapping Counties

I am attempting to modify this example with county data for Michigan. In short, it's working, but it seems to be adding some extra shapes here and there in the process of drawing the counties. I'm guessing that in some instances (where there are counties with islands), the island part needs to be listed as a separate "county", but I'm not sure about the other case, such as with Wayne county in the lower right part of the state.
Here's a picture of what I currently have:
Here's what I did so far:
Get county data from Bokeh's sample county data just to get the state abbreviation per state number (my second, main data source only has state numbers). For this example, I'll simplify it by just filtering for state number 26.
Get state coordinates ('500k' file) by county from the U.S. Census site.
Use the following code to generate an 'interactive' map of Michigan.
Note: To pip install shapefile (really pyshp), I think I had to download the .whl file from here and then do pip install [path to .whl file].
import pandas as pd
import numpy as np
import shapefile
from bokeh.models import HoverTool, ColumnDataSource
from bokeh.palettes import Viridis6
from bokeh.plotting import figure, show, output_notebook
shpfile=r'Path\500K_US_Counties\cb_2015_us_county_500k.shp'
sf = shapefile.Reader(shpfile)
shapes = sf.shapes()
#Here are the rows from the shape file (plus lat/long coordinates)
rows=[]
lenrow=[]
for i,j in zip(sf.shapeRecords(),sf.shapes()):
    rows.append(i.record+[j.points])
    if len(i.record+[j.points])!=10:
        print("Found record with irregular number of columns")
fields1=sf.fields[1:] #Ignore first field as it is not used (maybe it's a meta field?)
fields=[seq[0] for seq in fields1]+['Long_Lat']#Take the first element in each tuple of the list
c=pd.DataFrame(rows,columns=fields)
try:
    c['STATEFP']=c['STATEFP'].astype(int)
except:
    pass
#cns=pd.read_csv(r'Path\US_Counties.csv')
#cns=cns[['State Abbr.','STATE num']]
#cns=cns.drop_duplicates('State Abbr.',keep='first')
#c=pd.merge(c,cns,how='left',left_on='STATEFP',right_on='STATE num')
c['Lat']=c['Long_Lat'].apply(lambda x: [e[0] for e in x])
c['Long']=c['Long_Lat'].apply(lambda x: [e[1] for e in x])
#c=c.loc[c['State Abbr.']=='MI']
c=c.loc[c['STATEFP']==26]
#latitude=x, longitude=y
county_xs = c['Lat']
county_ys = c['Long']
county_names = c['NAME']
county_colors = [Viridis6[np.random.randint(1,6, size=1).tolist()[0]] for l in aland]
randns=np.random.randint(1,6, size=1).tolist()[0]
#county_colors = [Viridis6[e] for e in randns]
#county_colors = 'b'
source = ColumnDataSource(data=dict(
x=county_xs,
y=county_ys,
color=county_colors,
name=county_names,
#rate=county_rates,
))
output_notebook()
TOOLS="pan,wheel_zoom,box_zoom,reset,hover,save"
p = figure(title="Title", tools=TOOLS,
x_axis_location=None, y_axis_location=None)
p.grid.grid_line_color = None
p.patches('x', 'y', source=source,
fill_color='color', fill_alpha=0.7,
line_color="white", line_width=0.5)
hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
("Name", "@name"),
#("Unemployment rate)", "@rate%"),
("(Long, Lat)", "($x, $y)"),
]
show(p)
I'm looking for a way to avoid the extra lines and shapes.
Thanks in advance!
I have a solution to this problem, and I think I might even know why it is correct. First, let me show a quote from Bryan Van de Ven in a Google Groups Bokeh discussion:
there is no built-in support for dealing with shapefiles. You will have to convert the data to the simple format that Bokeh understands. (As an aside: it would be great to have a contribution that made dealing with various GIS formats easier).
The format that Bokeh expects for patches is a "list of lists" of points. So something like:
xs = [ [patch0 x-coords], [patch1 x-coords], ... ]
ys = [ [patch0 y-coords], [patch1 y-coords], ... ]
Note that if a patch is comprised of multiple polygons, this is currently expressed by putting NaN values in the sublists. So, the task is basically to convert whatever form of polygon data you have to this format, and then Bokeh can display it.
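To make the quoted format concrete, here is a small sketch in plain Python. The function name join_parts_with_nan and the sample points are made up for illustration; parts is assumed to hold the start index of each polygon within the flat point list, which is how pyshp reports it:

```python
import math

def join_parts_with_nan(points, parts):
    """Flatten (x, y) points into x and y lists, separating polygon parts with NaN."""
    xs, ys = [], []
    # Each part runs from its start index to the start of the next part
    # (or to the end of the point list for the last part).
    bounds = list(parts) + [len(points)]
    for start, end in zip(bounds, bounds[1:]):
        ring = points[start:end]
        xs.extend(x for x, _ in ring)
        ys.extend(y for _, y in ring)
        xs.append(float("nan"))
        ys.append(float("nan"))
    # Drop the trailing NaN so the lists end on a real coordinate.
    return xs[:-1], ys[:-1]

# Two made-up triangles (a "mainland" and an "island") stored as one shape.
points = [(0, 0), (1, 0), (1, 1), (0, 0), (5, 5), (6, 5), (6, 6), (5, 5)]
parts = [0, 4]
xs, ys = join_parts_with_nan(points, parts)
```

Without the NaN separator, the last point of the first polygon is connected directly to the first point of the second one, which is the kind of stray line and extra shape described in the question.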
So it seems like somehow you are ignoring NaNs or otherwise not handling multiple polygons properly. Here is some code that will download US census data, unzip it, read it properly for Bokeh, and make a data frame of lat, long, state, and county.
import itertools
import os

import pandas as pd
import requests
import shapefile

def get_map_data(shape_data_file, local_file_path):
    url = "http://www2.census.gov/geo/tiger/GENZ2015/shp/" + \
        shape_data_file + ".zip"
    zfile = local_file_path + shape_data_file + ".zip"
    sfile = local_file_path + shape_data_file + ".shp"
    dfile = local_file_path + shape_data_file + ".dbf"
    # Download the archive only if it is not already present locally
    if not os.path.exists(zfile):
        print("Getting file: ", url)
        response = requests.get(url)
        with open(zfile, "wb") as code:
            code.write(response.content)
    # Unzip only if the shapefile has not been extracted yet
    if not os.path.exists(sfile):
        uz_cmd = 'unzip ' + zfile + " -d " + local_file_path
        print("Executing command: " + uz_cmd)
        os.system(uz_cmd)
    shp = open(sfile, "rb")
    dbf = open(dfile, "rb")
    sf = shapefile.Reader(shp=shp, dbf=dbf)
    lats = []
    lons = []
    ct_name = []
    st_id = []
    for shprec in sf.shapeRecords():
        st_id.append(int(shprec.record[0]))
        ct_name.append(shprec.record[5])
        # Shapefile points are (x, y) pairs, so "lat" holds the x values here
        lat, lon = map(list, zip(*shprec.shape.points))
        # Start index of each polygon part within the flat point list
        indices = shprec.shape.parts.tolist()
        # Slice out each part and end it with NaN so Bokeh draws separate polygons
        lat = [lat[i:j] + [float('NaN')] for i, j in zip(indices, indices[1:]+[None])]
        lon = [lon[i:j] + [float('NaN')] for i, j in zip(indices, indices[1:]+[None])]
        lat = list(itertools.chain.from_iterable(lat))
        lon = list(itertools.chain.from_iterable(lon))
        lats.append(lat)
        lons.append(lon)
    map_data = pd.DataFrame({'x': lats, 'y': lons, 'state': st_id, 'county_name': ct_name})
    return map_data
The inputs to this function are the name of the shape file and a local directory where you want to download the map data. I know there are at least two available maps from the url in the function above that you could call:
map_low_res = "cb_2015_us_county_20m"
map_high_res = "cb_2015_us_county_500k"
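If the US Census changes their url, which they certainly will one day, then you will need to change the input file name and the url variable. So, you can call the function above like this (note the trailing slash: the function builds its file paths by plain string concatenation, so "." alone would produce a wrong path):
map_output = get_map_data(map_low_res, "./")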
Then you could plot it just as the code in the original question does. Add a color data column first ("county_colors" in the original question), and then set it to the source like this:
source = ColumnDataSource(map_output)
To make this all work you will need to import libraries such as requests, os, itertools, shapefile, bokeh.models.ColumnDataSource, etc...
One solution:
Use the 1:20,000,000 shape file instead of the 1:500,000 file.
It loses some detail around the shape of each county but does not have any extra shapes (and just a couple of extra lines).

Resources