Can't get transform_filter to work in Altair - altair

For my teaching notes I am trying to implement this vega-lite example in Altair:
{
"data": {"url": "data/seattle-weather.csv"},
"layer": [{
"params": [{
"name": "brush",
"select": {"type": "interval", "encodings": ["x"]}
}],
"mark": "bar",
"encoding": {
"x": {
"timeUnit": "month",
"field": "date",
"type": "ordinal"
},
"y": {
"aggregate": "mean",
"field": "precipitation",
"type": "quantitative"
},
"opacity": {
"condition": {
"param": "brush", "value": 1
},
"value": 0.7
}
}
}, {
"transform": [{
"filter": {"param": "brush"}
}],
"mark": "rule",
"encoding": {
"y": {
"aggregate": "mean",
"field": "precipitation",
"type": "quantitative"
},
"color": {"value": "firebrick"},
"size": {"value": 3}
}
}]
}
I getting the separate charts (bar and rule to work) was easy, but I run into issues in making mark_rule interactive.
import altair as alt
from vega_datasets import data
df = data.seattle_weather()
selection = alt.selection_interval(encodings=['x'])
base = alt.Chart(df).add_selection(selection)
bar_i = base.mark_bar().encode(
x="month(date):T",
y="mean(precipitation):Q",
opacity=alt.condition(selection, alt.value(1.0), alt.value(0.7)))
rule_i = base.mark_rule().transform_filter(selection).encode(y="mean(precipitation):Q")
(bar_i + rule_i).properties(width=600)
The error reads
Javascript Error: Duplicate signal name: "selector013_scale_trigger"
This usually means there's a typo in your chart specification. See the javascript console for the full traceback.

It looks like the chart you're interested in creating is part of Altair's example gallery: https://altair-viz.github.io/gallery/selection_layer_bar_month.html
import altair as alt
from vega_datasets import data
source = data.seattle_weather()
brush = alt.selection(type='interval', encodings=['x'])
bars = alt.Chart(source).mark_bar().encode(
x='month(date):O',
y='mean(precipitation):Q',
opacity=alt.condition(brush, alt.OpacityValue(1), alt.OpacityValue(0.7)),
).add_selection(
brush
)
line = alt.Chart(source).mark_rule(color='firebrick').encode(
y='mean(precipitation):Q',
size=alt.SizeValue(3)
).transform_filter(
brush
)
bars + line
The error you're seeing comes from the fact that base includes the selection, and both layers are derived from base, so the same selection is declared twice within the single chart.

Related

Why do groups not respect the parent's padding, except at the top level?

What exactly is the behavior of padding for group marks in Vega? At the top-most level the children groups respect the top-level padding, but this doesn't seem the case for the children's children, they don't respect their parent's padding.
For example, here I would expect to get a rectangle centered in a rectangle centered in another rectangle:
Open the Chart in the Vega Editor
Instead each rectangle seems to be anchored at the origin of the top-level coordinate system.
Note that replacing "padding": {"signal": "level_2_padding"} with "padding": {"value": 0} doesn't seem to have any effect, so I'm not even sure if inner groups can have padding?
How can I best implement nested groups that respect the parent's padding?
There is no padding property on a Group mark. Instead, you can access group properties using Field Values. Something like the following should work.
Editor
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"autosize": "none",
"config": {"group": {"stroke": "black"}},
"signals": [
{"name": "target_height", "value": 400},
{"name": "target_width", "value": 300},
{"name": "level_0_padding", "value": 64},
{"name": "level_1_padding", "update": "1/2 * level_0_padding"},
{"name": "level_2_padding", "update": "1/4 * level_0_padding"},
{"name": "level_0_height", "update": "target_height - 2*level_0_padding"},
{"name": "level_0_width", "update": "target_width - 2*level_0_padding"},
{"name": "level_1_width", "update": "level_0_width - 2*level_1_padding"},
{"name": "level_1_height", "update": "level_0_height - 2*level_1_padding"}
],
"width": {"signal": "level_0_width"},
"height": {"signal": "level_0_height"},
"padding": {"signal": "level_0_padding"},
"marks": [
{
"type": "group",
"signals": [
{
"name": "level_2_width",
"update": "level_1_width - 2*level_2_padding"
},
{
"name": "level_2_height",
"update": "level_1_height - 2*level_2_padding"
}
],
"encode": {
"update": {
"width": {"signal": "level_1_width"},
"height": {"signal": "level_1_height"},
"x": {"signal": "level_0_width-level_1_width - level_1_padding"},
"y": {"signal": "level_0_height-level_1_height - level_1_padding"},
"stroke": {"value": "red"},
"strokeOpacity": {"value": 0.5}
}
},
"marks": [
{
"type": "group",
"encode": {
"update": {
"width": {"signal": "level_2_width"},
"height": {"signal": "level_2_height"},
"x": {
"field": {"group": "width"},
"mult": 0.5,
"offset": {"signal": "-level_2_width/2"}
},
"y": {
"field": {"group": "height"},
"mult": 0.5,
"offset": {"signal": "-level_2_height/2"}
},
"stroke": {"value": "blue"},
"strokeOpacity": {"value": 0.5}
}
}
}
]
}
]
}
I'll accept David's answer, but also post my own to complement David's.
Here's an alternative solution specification, like David's spec it uses the "x" and "y" group properties, but I think it's a bit simpler and closer to what I need: Open the Chart in the Vega Editor
An important point that I have to mention is that using layout prevents x and y from working, that is: groups directly contained in a layout/grid may not be offset using x or y.

Data is getting embedded via a local json file

I'm trying to plot some data, that data is in a pandas dataframe cdfs:
alt.Chart(cdfs).mark_line().encode(
x = alt.X('latency:Q', scale=alt.Scale(type='log'), axis=alt.Axis(format="", title='Response_time (ms)')),
y = alt.Y('percentile:Q', axis=alt.Axis(format="", title='Cumulative Fraction')),
color='write_size:N',
)
The issue is that when viewing the source of the resultant plot there is just a url to a json file. That json file can't be found and hence the plots are appearing to be blank (no data).
{
"config": {"view": {"continuousWidth": 400, "continuousHeight": 300}},
"data": {
"url": "altair-data-78b044f23db74f7d4408fba9f31b9ea9.json",
"format": {"type": "json"}
},
"mark": "line",
"encoding": {
"color": {"type": "nominal", "field": "write_size"},
"x": {
"type": "quantitative",
"axis": {"format": "", "title": "Response_time (ms)"},
"field": "latency",
"scale": {"type": "log"}
},
"y": {
"type": "quantitative",
"axis": {"format": "", "title": "Cumulative Fraction"},
"field": "percentile"
}
},
"$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json"
}
This code was previously working (displaying the data on the chart) however I restarted the jupyterlab server its running on between now and then.
Hence I'm wondering why the data is getting embedded via a url rather than directly all of a sudden?
At some point in your session, you must have run
alt.data_transformers.enable('json')
If you want to restore the default data transformer which embeds data directly into the chart, run
alt.data_transformers.enable('default')

Get a workable URL from Python Get request

I'm scraping a JS loaded website using requests. In order to do so, I go to inspect website, network console and look for the XHR calls to know where is the website calling for the data and how. Process would be as follows
Go to the link https://www.888sport.es/futbol/#/event/1006276426 in Chrome. Once that is loaded, you can click on many items with an unique ID. After doing so, a pop up window with information appears. In the XHR call I mentioned above you get a direct link to get this information as follows:
import requests
url='https://eu-offering.kambicdn.org/offering/v2018/888es/betoffer/outcome.json?lang=es_ES&market=ES&client_id=2&channel_id=1&ncid=1586874367958&id=2740660278'
#ncid is the date in timestamp format, and id is the unique id of the node clicked
response=requests.get(url=url,headers=headers)
Problem is, this isn't user friendly and require python. If I put this last url in the Chrome driver, I get the information but in plain text, and I can't interact with it. Is there any way to get a workable link from the request so that manually inserting it in a Chrome driver it loads that pop up window directly, as a regular website?
You've to make the requests as .json() so you receive a json dict, which you can access it with keys.
import requests
import json
def main(url):
r = requests.get(url).json()
print(r.keys())
hview = json.dumps(r, indent=4)
print(hview) # here to see it in nice view.
main("https://eu-offering.kambicdn.org/offering/v2018/888es/betoffer/outcome.json?lang=es_ES&market=ES&client_id=2&channel_id=1&ncid=1586874367958&id=2740660278")
Output:
dict_keys(['betOffers', 'events', 'prePacks'])
{
"betOffers": [
{
"id": 2210856430,
"closed": "2020-04-17T14:30:00Z",
"criterion": {
"id": 1001159858,
"label": "Final del partido",
"englishLabel": "Full Time",
"order": [],
"occurrenceType": "GOALS",
"lifetime": "FULL_TIME"
},
"betOfferType": {
"id": 2,
"name": "Partido",
"englishName": "Match"
},
"eventId": 1006276426,
"outcomes": [
{
"id": 2740660278,
"label": "1",
"englishLabel": "1",
"odds": 1150,
"participant": "FC Lokomotiv Gomel",
"type": "OT_ONE",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"participantId": 1003789012,
"oddsFractional": "1/7",
"oddsAmerican": "-670",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2740660284,
"label": "X",
"englishLabel": "X",
"odds": 6750,
"type": "OT_CROSS",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"oddsFractional": "23/4",
"oddsAmerican": "575",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2740660286,
"label": "2",
"englishLabel": "2",
"odds": 11000,
"participant": "Khimik Svetlogorsk",
"type": "OT_TWO",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"participantId": 1001024009,
"oddsFractional": "10/1",
"oddsAmerican": "1000",
"status": "OPEN",
"cashOutStatus": "ENABLED"
}
],
"tags": [
"OFFERED_PREMATCH",
"MAIN"
],
"cashOutStatus": "ENABLED"
}
],
"events": [
{
"id": 1006276426,
"name": "FC Lokomotiv Gomel - Khimik Svetlogorsk",
"nameDelimiter": "-",
"englishName": "FC Lokomotiv Gomel - Khimik Svetlogorsk",
"homeName": "FC Lokomotiv Gomel",
"awayName": "Khimik Svetlogorsk",
"start": "2020-04-17T14:30:00Z",
"group": "1\u00aa Divisi\u00f3n",
"groupId": 2000053499,
"path": [
{
"id": 1000093190,
"name": "F\u00fatbol",
"englishName": "Football",
"termKey": "football"
},
{
"id": 2000051379,
"name": "Bielorrusa",
"englishName": "Belarus",
"termKey": "belarus"
},
{
"id": 2000053499,
"name": "1\u00aa Divisi\u00f3n",
"englishName": "1st Division",
"termKey": "1st_division"
}
],
"nonLiveBoCount": 6,
"sport": "FOOTBALL",
"tags": [
"MATCH"
],
"state": "NOT_STARTED",
"groupSortOrder": 1999999000000000000
}
],
"prePacks": []
}

Creating wordclouds with Altair

How do I create a wordcloud with Altair?
Vega and vega-lite provide wordcloud functionality which I have used succesfully in the past.
Therefore it should be possible to access it from Altair if I understand correctly and
I would prefer to prefer to express the visualizations in Python rather than embedded JSON.
All the examples for Altair I have seen involve standard chart types like
scatter plots and bar graphs.
I have not seen any involving wordclouds, networks, treemaps, etc.
More specifically how would I express or at least approximate the following Vega visualization in Altair?
def wc(pages, width=2**10.5, height=2**9.5):
return {
"$schema": "https://vega.github.io/schema/vega/v3.json",
"name": "wordcloud",
"width": width,
"height": height,
"padding": 0,
"data" : [
{
'name' : 'table',
'values' : [{'text': pg.title, 'definition': pg.defn, 'count': pg.count} for pg in pages)]
}
],
"scales": [
{
"name": "color",
"type": "ordinal",
"range": ["#d5a928", "#652c90", "#939597"]
}
],
"marks": [
{
"type": "text",
"from": {"data": "table"},
"encode": {
"enter": {
"text": {"field": "text"},
"align": {"value": "center"},
"baseline": {"value": "alphabetic"},
"fill": {"scale": "color", "field": "text"},
"tooltip": {"field": "definition", "type": "nominal", 'fontSize': 32}
},
"update": {
"fillOpacity": {"value": 1}
},
},
"transform": [
{
"type": "wordcloud",
"size": [width, height],
"text": {"field": "text"},
#"rotate": {"field": "datum.angle"},
"font": "Helvetica Neue, Arial",
"fontSize": {"field": "datum.count"},
#"fontWeight": {"field": "datum.weight"},
"fontSizeRange": [2**4, 2**6],
"padding": 2**4
}
]
}
],
}
Vega(wc(pages))
Altair's API is built on the Vega-Lite grammar, which includes only a subset of the plot types available in Vega. Word clouds cannot be created in Vega-Lite, so they cannot be created in Altair.
With mad respect to #jakevdp, you can construct a word cloud (or something word cloud-like) in altair by recognizing that the elements of a word cloud chart involve:
a dataset of words and their respective quantities
text_marks encoded with each word, and optionally size and or color based on quantity
"randomly" distributing the text_marks in 2d space.
One simple option to distribute marks is to add an additional 'x' and 'y' column to data, each element being a random sample from the range of your chosen x and y domain:
import random
def shuffled_range(n): return random.sample(range(n), k=n)
n = len(words_and_counts) # words_and_counts: a pandas data frame
x = shuffled_range(n)
y = shuffled_range(n)
data = words_and_counts.assign(x=x, y=y)
This isn't perfect as it doesn't explicitly prevent word overlap, but you can play with n and do a few runs of random number generation until you find a layout that's pleasing.
Having thus prepared your data you may specify the word cloud elements like so:
base = alt.Chart(data).encode(
x=alt.X('x:O', axis=None),
y=alt.Y('y:O', axis=None)
).configure_view(strokeWidth=0) # remove border
word_cloud = base.mark_text(baseline='middle').encode(
text='word:N',
color=alt.Color('count:Q', scale=alt.Scale(scheme='goldred')),
size=alt.Size('count:Q', legend=None)
)
Here's the result applied to the same dataset used in the Vega docs:

Vega lite use one data field for the axis and another for the label

In Vega Lite, is it possible to use one field of the data values as the axis value, and another field as the label?
If this is my vega lite spec, then the graph works correctly, but shows the dates on the x-axis. How can I show the day names on the x-axis instead?
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"description": "basic line graph",
"data": {
"values": [
{"date":"2017-08-15", "dayName":"Tue","item":"foo","count":"0"},
{"date":"2017-08-16", "dayName":"Wed","item":"foo","count":"11"},
{"date":"2017-08-17", "dayName":"Thu","item":"foo","count":"7"},
{"date":"2017-08-18", "dayName":"Fri","item":"foo","count":"28"},
{"date":"2017-08-19", "dayName":"Sat","item":"foo","count":"0"},
{"date":"2017-08-20", "dayName":"Sun","item":"foo","count":"0"},
{"date":"2017-08-21", "dayName":"Mon","item":"foo","count":"0"}
]},
"mark": {
"type": "line",
"interpolate": "monotone"
},
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "count", "type": "quantitative"}
}
}
It shows the date field, August 16, August 17 on the x-axis. How can I make it show the dayName field instead? It should show Tue, Wed, and so on.
You can use timeUnit.
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"description": "basic line graph",
"data": {
"values": [
{"date":"2017-08-15", "dayName":"Tue","item":"foo","count":"0"},
{"date":"2017-08-16", "dayName":"Wed","item":"foo","count":"11"},
{"date":"2017-08-17", "dayName":"Thu","item":"foo","count":"7"},
{"date":"2017-08-18", "dayName":"Fri","item":"foo","count":"28"},
{"date":"2017-08-19", "dayName":"Sat","item":"foo","count":"0"},
{"date":"2017-08-20", "dayName":"Sun","item":"foo","count":"0"},
{"date":"2017-08-21", "dayName":"Mon","item":"foo","count":"0"}
]},
"mark": {
"type": "line",
"interpolate": "monotone"
},
"encoding": {
"x": {
"timeUnit": "day",
"field": "date",
"type": "temporal"
},
"y": {"field": "count", "type": "quantitative"}
}
}
If you want to customize the label format, you can add axis format, as well

Resources