Circles of different radii based on other values in d3 - node.js

I am trying to create nodes whose radius depends on the amount values in the links array. The data structure is shown below:
{ "nodes": [{"id": "site01", "x": 317.5, "y": 282.5},
{"id": "site02", "x": 112, "y": 47},
{"id": "site03", "x": 69.5,"y": 287},
{"id": "site04", "x": 424.5, "y": 99.5},
],
"links": [
{"node01": "site01", "node02": "site04", "amount": 10},
{"node01": "site03", "node02": "site02", "amount": 120},
{"node01": "site01", "node02": "site03", "amount": 50},
{"node01": "site04", "node02": "site02", "amount": 80}]
}
The radius must be proportional to the total amount passing through a node, summed over every link that references it. For example, site01 appears in links with amounts 10 and 50, so its radius must be 60, while site02 appears in links with amounts 120 and 80, so its radius must be 200. I've tried a few approaches, but none of them work.
var nodeRadius = {};
data.links.forEach(function (n) {
  nodeRadius[n.id] = n;
});
nodes.attr("r", function (d) { return nodeRadius[d.node01].amount; });
Please suggest working code.
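One way this could work is sketched below. It is a minimal sketch, not tested against the full code: it assumes the nodes selection is already data-bound to data.nodes, that each node datum exposes its id, and that the raw summed amount is acceptable as a pixel radius (a scale such as d3.scaleSqrt could be layered on top if the sums get large).
// Accumulate the total amount for every node id, counting both endpoints of each link.
var totals = {};
data.links.forEach(function (link) {
  totals[link.node01] = (totals[link.node01] || 0) + link.amount;
  totals[link.node02] = (totals[link.node02] || 0) + link.amount;
});
// site01 -> 10 + 50 = 60, site02 -> 120 + 80 = 200, etc.
nodes.attr("r", function (d) { return totals[d.id] || 0; });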

Related

Azure Cosmos DB aggregation over nested structure

I have an Azure Cosmos DB with documents like the ones below
{
  "id": "id_1",
  "location": "location_1",
  "colorCounts": {
    "red": 1,
    "blue": 0,
    "yellow": 1
  }
},
{
  "id": "id_2",
  "location": "location_1",
  "colorCounts": {
    "red": 0,
    "blue": 0,
    "yellow": 0
  }
}
and I want to make a query that groups the results by location while averaging all the values in colorCounts. My result would look like this:
{
  "location": "location_1",
  "colorCounts": {
    "red": 0.5,
    "blue": 0,
    "yellow": 0.5
  }
}
When I try to average over colorCounts:
SELECT c.id, c.location, AVG(c.colorCounts) FROM c GROUP BY c.location
I do not get any color counts. I can average over single colors, but I do not know how to average over the nested object colorCounts.
Script:
SELECT a.location,
       {"red": a.red, "blue": a.blue, "yellow": a.yellow} AS colorCounts
FROM (
    SELECT c.location,
           AVG(c.colorCounts.red) AS red,
           AVG(c.colorCounts.blue) AS blue,
           AVG(c.colorCounts.yellow) AS yellow
    FROM c
    GROUP BY c.location
) a
I tried to reproduce this with the same sample input and got the required output.
Output:
[
  {
    "location": "location_1",
    "colorCounts": {
      "red": 0.5,
      "blue": 0,
      "yellow": 0.5
    }
  }
]

Why do groups not respect the parent's padding, except at the top level?

What exactly is the behavior of padding for group marks in Vega? At the top-most level the child groups respect the top-level padding, but this doesn't seem to be the case for the children's children: they don't respect their parent's padding.
For example, here I would expect to get a rectangle centered in a rectangle centered in another rectangle:
Open the Chart in the Vega Editor
Instead each rectangle seems to be anchored at the origin of the top-level coordinate system.
Note that replacing "padding": {"signal": "level_2_padding"} with "padding": {"value": 0} doesn't seem to have any effect, so I'm not even sure whether inner groups can have padding at all.
How can I best implement nested groups that respect the parent's padding?
There is no padding property on a Group mark. Instead, you can access group properties using Field Values. Something like the following should work.
Editor
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "autosize": "none",
  "config": {"group": {"stroke": "black"}},
  "signals": [
    {"name": "target_height", "value": 400},
    {"name": "target_width", "value": 300},
    {"name": "level_0_padding", "value": 64},
    {"name": "level_1_padding", "update": "1/2 * level_0_padding"},
    {"name": "level_2_padding", "update": "1/4 * level_0_padding"},
    {"name": "level_0_height", "update": "target_height - 2*level_0_padding"},
    {"name": "level_0_width", "update": "target_width - 2*level_0_padding"},
    {"name": "level_1_width", "update": "level_0_width - 2*level_1_padding"},
    {"name": "level_1_height", "update": "level_0_height - 2*level_1_padding"}
  ],
  "width": {"signal": "level_0_width"},
  "height": {"signal": "level_0_height"},
  "padding": {"signal": "level_0_padding"},
  "marks": [
    {
      "type": "group",
      "signals": [
        {"name": "level_2_width", "update": "level_1_width - 2*level_2_padding"},
        {"name": "level_2_height", "update": "level_1_height - 2*level_2_padding"}
      ],
      "encode": {
        "update": {
          "width": {"signal": "level_1_width"},
          "height": {"signal": "level_1_height"},
          "x": {"signal": "level_0_width - level_1_width - level_1_padding"},
          "y": {"signal": "level_0_height - level_1_height - level_1_padding"},
          "stroke": {"value": "red"},
          "strokeOpacity": {"value": 0.5}
        }
      },
      "marks": [
        {
          "type": "group",
          "encode": {
            "update": {
              "width": {"signal": "level_2_width"},
              "height": {"signal": "level_2_height"},
              "x": {
                "field": {"group": "width"},
                "mult": 0.5,
                "offset": {"signal": "-level_2_width/2"}
              },
              "y": {
                "field": {"group": "height"},
                "mult": 0.5,
                "offset": {"signal": "-level_2_height/2"}
              },
              "stroke": {"value": "blue"},
              "strokeOpacity": {"value": 0.5}
            }
          }
        }
      ]
    }
  ]
}
I'll accept David's answer, but I'll also post my own to complement it.
Here's an alternative solution specification. Like David's spec it uses the "x" and "y" group properties, but I think it's a bit simpler and closer to what I need: Open the Chart in the Vega Editor
An important point to mention is that using layout prevents x and y from working; that is, groups directly contained in a layout/grid may not be offset using x or y.

Azure Gremlin edge traversal suspiciously high (Out() step) RU cost

I have a weird issue where doing an out() operation across a few edges causes my RU cost to triple. I hope someone can help shed light on why, and on what I can do to mitigate it.
I have a graph in Cosmos DB with two vertex labels: "Profile" and "Score". Each profile has 0 or 1 score vertices attached via a "ProfileHasAggregatedScore" edge. The partition key is the ID of the Profile.
If I make the following queries, the RU cost currently is:
g.V().hasLabel('Profile').out('ProfileHasAggregatedScore')
> 78 RU (8 scores found)
And for reference, the cost of getting all vertices of a type is:
g.V().hasLabel('Profile')
> 28 RU (110 profiles found)
g.E().hasLabel('ProfileHasAggregatedScore')
> 11 RU (8 edges found)
g.V().hasLabel('AggregatedRating')
> 11 RU (8 scores found)
And the cost of fetching a single vertex or edge is:
g.V('aProfileId').hasLabel('Profile')
> 4 RU (1 found)
g.E('anEdgeId')
> 7 RU
g.V('aRatingId')
> 3.5 RU
Can someone please help me understand why a traversal that only touches a few extra vertices along the way (see the execution profile at the bottom) is more expensive than fetching everything? And is there something I can do to prevent it? Adding a has-filter with the partition key does not seem to help (an example of the kind of filter I mean is sketched below). It seems odd that traversing/finding 16 more elements (8 edges and 8 vertices) after finding 110 vertices triples the cost of the operation.
(NB: with 1000 profiles, the cost of doing one traversal along an edge to the score node is 2200 RU. This seems high, considering the emphasis the Azure team puts on it being scalable.)
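To be concrete about what I tried, the partition-key filter was roughly of the following shape; the 'partitionKey' property name here is illustrative and stands in for whatever the container's actual partition key path is:
// Hypothetical sketch: restrict to one partition before traversing the edge.
g.V().has('Profile', 'partitionKey', 'aProfileId').out('ProfileHasAggregatedScore')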
The execution profile of the traversal, in case it helps (it seems most of the time is spent finding the edges in the out() step):
[
{
"gremlin": "g.V().hasLabel('Profile').out('ProfileHasAggregatedScore').executionProfile()",
"totalTime": 46,
"metrics": [
{
"name": "GetVertices",
"time": 13,
"annotations": {
"percentTime": 28.26
},
"counts": {
"resultCount": 110
},
"storeOps": [
{
"fanoutFactor": 1,
"count": 110,
"size": 124649,
"time": 2.47
}
]
},
{
"name": "GetEdges",
"time": 26,
"annotations": {
"percentTime": 56.52
},
"counts": {
"resultCount": 8
},
"storeOps": [
{
"fanoutFactor": 1,
"count": 8,
"size": 5200,
"time": 6.22
},
{
"fanoutFactor": 1,
"count": 0,
"size": 49,
"time": 0.88
}
]
},
{
"name": "GetNeighborVertices",
"time": 7,
"annotations": {
"percentTime": 15.22
},
"counts": {
"resultCount": 8
},
"storeOps": [
{
"fanoutFactor": 1,
"count": 8,
"size": 6303,
"time": 1.18
}
]
},
{
"name": "ProjectOperator",
"time": 0,
"annotations": {
"percentTime": 0
},
"counts": {
"resultCount": 8
}
}
]
}
]

How to calculate Heating/Cooling Degree Days using a weather API in Python

I am trying to calculate heating/cooling degree days using the (Tbase - Ta) formula, where Tbase is usually 65F and Ta = (high_temp + low_temp)/2.
For example:
high_temp = 96.5F, low_temp = 65.21F, then
mean = (high_temp + low_temp)/2
result = mean - 65
where 65 is the average room temperature.
If the result is positive it counts as cooling degree days (CDD), otherwise as heating degree days (HDD).
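A quick worked sketch of that formula with the example numbers above, assuming the usual convention that degree days are never negative:
# Worked example of the daily degree-day formula (values from the question).
t_base = 65.0
high_temp = 96.5
low_temp = 65.21
mean = (high_temp + low_temp) / 2   # 80.855
cdd = max(mean - t_base, 0)         # 15.855 -> cooling degree days
hdd = max(t_base - mean, 0)         # 0.0    -> heating degree days
print(cdd, hdd)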
I get weather data from two APIs:
weatherbit
darksky
Weatherbit provides both CDD and HDD data, but for darksky we need to calculate them with the formula above (Tbase - Ta).
My problem is that the two APIs show different results. For example, the darksky JSON response for the day is:
{
"latitude": 47.552758,
"longitude": -122.150589,
"timezone": "America/Los_Angeles",
"daily": {
"data": [
{
"time": 1560927600,
"summary": "Light rain in the morning and overnight.",
"icon": "rain",
"sunriseTime": 1560946325,
"sunsetTime": 1561003835,
"moonPhase": 0.59,
"precipIntensity": 0.0057,
"precipIntensityMax": 0.0506,
"precipIntensityMaxTime": 1561010400,
"precipProbability": 0.62,
"precipType": "rain",
"temperatureHigh": 62.44,
"temperatureHighTime": 1560981600,
"temperatureLow": 48,
"temperatureLowTime": 1561028400,
"apparentTemperatureHigh": 62.44,
"apparentTemperatureHighTime": 1560981600,
"apparentTemperatureLow": 46.48,
"apparentTemperatureLowTime": 1561028400,
"dewPoint": 46.61,
"humidity": 0.75,
"pressure": 1021.81,
"windSpeed": 5.05,
"windGust": 8.36,
"windGustTime": 1560988800,
"windBearing": 149,
"cloudCover": 0.95,
"uvIndex": 4,
"uvIndexTime": 1560978000,
"visibility": 4.147,
"ozone": 380.8,
"temperatureMin": 49.42,
"temperatureMinTime": 1561010400,
"temperatureMax": 62.44,
"temperatureMaxTime": 1560981600,
"apparentTemperatureMin": 47.5,
"apparentTemperatureMinTime": 1561014000,
"apparentTemperatureMax": 62.44,
"apparentTemperatureMaxTime": 1560981600
}
]
},
"offset": -7
}
Python calculation:
response = result.get("daily").get("data")[0]
low_temp = response.get("temperatureMin")
hi_temp = response.get("temperatureMax")
mean = (hi_temp + low_temp)/2
#65 is normal room temp
print(65-mean)
Here mean is 6.509999999999998 and 65 - mean = 58.49, so HDD is 58.49 and CDD is 0.
The weatherbit JSON response for the same date is:
{
"threshold_units": "F",
"timezone": "America/Los_Angeles",
"threshold_value": 65,
"state_code": "WA",
"country_code": "US",
"city_name": "Newcastle",
"data": [
{
"rh": 68,
"wind_spd": 5.6,
"timestamp_utc": null,
"t_ghi": 8568.9,
"max_wind_spd": 11.4,
"cdd": 0.4,
"dewpt": 46.9,
"snow": 0,
"hdd": 6.7,
"timestamp_local": null,
"precip": 0.154,
"t_dni": 11290.6,
"temp_wetbulb": 53.1,
"t_dhi": 1413.9,
"date": "2019-06-20",
"temp": 58.6,
"sun_hours": 7.6,
"clouds": 58,
"wind_dir": 186
}
],
"end_date": "2019-06-21",
"station_id": "727934-94248",
"count": 1,
"start_date": "2019-06-20",
"city_id": 5804676
}
Here HDD is 6.7 and CDD is 0.4.
Can you explain how they get these results?
You need to use hourly data to calculate the HDD and CDD, and then average them to get the daily value.
More details here: https://www.weatherbit.io/blog/post/heating-and-cooling-degree-days-weather-api-release
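A minimal sketch of that hourly approach, assuming hourly_temps is a list of 24 hourly temperatures in °F pulled from the API (equivalent to summing degree hours and dividing by 24):
# Hypothetical sketch: daily degree days from hourly temperatures (base 65F).
T_BASE = 65.0

def degree_days(hourly_temps):
    # Accumulate heating/cooling degree hours, then average over the 24 hours.
    hdd_hours = sum(max(T_BASE - t, 0) for t in hourly_temps)
    cdd_hours = sum(max(t - T_BASE, 0) for t in hourly_temps)
    return hdd_hours / 24, cdd_hours / 24

# hdd, cdd = degree_days(hourly_temps_from_api)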

Creating wordclouds with Altair

How do I create a wordcloud with Altair?
Vega and Vega-Lite provide wordcloud functionality which I have used successfully in the past. Therefore it should be possible to access it from Altair, if I understand correctly, and I would prefer to express the visualizations in Python rather than embedded JSON.
All the examples for Altair I have seen involve standard chart types like scatter plots and bar graphs. I have not seen any involving wordclouds, networks, treemaps, etc.
More specifically, how would I express or at least approximate the following Vega visualization in Altair?
def wc(pages, width=2**10.5, height=2**9.5):
    return {
        "$schema": "https://vega.github.io/schema/vega/v3.json",
        "name": "wordcloud",
        "width": width,
        "height": height,
        "padding": 0,
        "data": [
            {
                'name': 'table',
                'values': [{'text': pg.title, 'definition': pg.defn, 'count': pg.count} for pg in pages]
            }
        ],
        "scales": [
            {
                "name": "color",
                "type": "ordinal",
                "range": ["#d5a928", "#652c90", "#939597"]
            }
        ],
        "marks": [
            {
                "type": "text",
                "from": {"data": "table"},
                "encode": {
                    "enter": {
                        "text": {"field": "text"},
                        "align": {"value": "center"},
                        "baseline": {"value": "alphabetic"},
                        "fill": {"scale": "color", "field": "text"},
                        "tooltip": {"field": "definition", "type": "nominal", "fontSize": 32}
                    },
                    "update": {
                        "fillOpacity": {"value": 1}
                    },
                },
                "transform": [
                    {
                        "type": "wordcloud",
                        "size": [width, height],
                        "text": {"field": "text"},
                        # "rotate": {"field": "datum.angle"},
                        "font": "Helvetica Neue, Arial",
                        "fontSize": {"field": "datum.count"},
                        # "fontWeight": {"field": "datum.weight"},
                        "fontSizeRange": [2**4, 2**6],
                        "padding": 2**4
                    }
                ]
            }
        ],
    }
Vega(wc(pages))
Altair's API is built on the Vega-Lite grammar, which includes only a subset of the plot types available in Vega. Word clouds cannot be created in Vega-Lite, so they cannot be created in Altair.
With mad respect to #jakevdp, you can construct a word cloud (or something word cloud-like) in altair by recognizing that the elements of a word cloud chart involve:
a dataset of words and their respective quantities
text_marks encoded with each word, and optionally size and/or color based on quantity
"randomly" distributing the text_marks in 2d space.
One simple option to distribute marks is to add an additional 'x' and 'y' column to data, each element being a random sample from the range of your chosen x and y domain:
import random
def shuffled_range(n): return random.sample(range(n), k=n)
n = len(words_and_counts) # words_and_counts: a pandas data frame
x = shuffled_range(n)
y = shuffled_range(n)
data = words_and_counts.assign(x=x, y=y)
This isn't perfect as it doesn't explicitly prevent word overlap, but you can play with n and do a few runs of random number generation until you find a layout that's pleasing.
Having thus prepared your data you may specify the word cloud elements like so:
import altair as alt

base = alt.Chart(data).encode(
    x=alt.X('x:O', axis=None),
    y=alt.Y('y:O', axis=None)
).configure_view(strokeWidth=0)  # remove border

word_cloud = base.mark_text(baseline='middle').encode(
    text='word:N',
    color=alt.Color('count:Q', scale=alt.Scale(scheme='goldred')),
    size=alt.Size('count:Q', legend=None)
)
Here's the result applied to the same dataset used in the Vega docs:
