Substring filtering in Altair / using "params" - altair

I am using Altair and would like to filter data using a substring search. Here is an example of doing it in Vega-Lite:
{
"config": {"view": {"continuousWidth": 400, "continuousHeight": 300}},
"data": {"name": "d"},
"mark": "point",
"encoding": {
"x": {"type": "quantitative", "field": "xval", "scale":{"domain": [0,4]}},
"y": {"type": "quantitative", "field": "yval", "scale":{"domain": [1,10]}}
},
"params": [{"name": "Letter", "value": "A",
"bind": {"input": "select", "options": ["A", "B", "C", "D", "E", "F"]}
}],
"transform": [
{"filter": "indexof(datum.info, Letter)>-1"}
],
"datasets": {
"d": [
{"xval": 1, "yval": 7, "info": "A;B;D;E"},
{"xval": 2, "yval": 2, "info": "A;C;E;F"},
{"xval": 3, "yval": 9, "info": "A;B;D"}
]
}
}
This lets me keep only the rows whose info column contains "A", "B", "C", etc., but it relies on "params", which is not available in Altair yet. Is there any other way of achieving this kind of "substring" filtering in Altair as of now? This is meant to be a minimal example; in my actual use case I have a large number of "options" (many gene names), so adding a column for each to the original data wouldn't be feasible.
I am trying to do this in Altair because it is for an executable research article, which I believe allows Altair but not Vega-Lite.

Edit: I realized that indexing with infoSel.info[0] gives the string selected in the dropdown. The filter still worked with infoSel.info (no index), but that was just luck; in expressions like this, infoSel.info[0] is more correct.
Got it! This is possible with an expression in transform_filter, which I had tried before but got wrong (I was using the name of the dropdown, not the name of the selection object):
import altair as alt
import pandas as pd

d = pd.DataFrame({'xval': [1, 2, 3],
                  'yval': [7, 2, 9],
                  'info': ['A;B;D;E', 'A;C;E;F', 'B;D']})
# Dropdown widget bound to the selection below
info_dropdown = alt.binding_select(options=['A', 'B', 'C', 'D', 'E', 'F'], name='Letter')
# The filter expression refers to the selection name ('infoSel'), not the dropdown name
info_sel = alt.selection_single(name='infoSel', fields=['info'], bind=info_dropdown, init={'info': 'A'})
alt.Chart(d).mark_circle().encode(
    x='xval', y='yval'
).add_selection(info_sel).transform_filter('indexof(datum.info, infoSel.info[0])>-1')
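For intuition, the expression indexof(datum.info, x) > -1 is a plain substring test. Below is a minimal Python sketch of the same check on the example rows, with no Altair involved. The helper names contains_substring and contains_token are hypothetical. The token-based variant is an assumption about what may be wanted when one option name can be a substring of another (as can happen with gene names); it is not what the chart above does.

```python
rows = [
    {"xval": 1, "yval": 7, "info": "A;B;D;E"},
    {"xval": 2, "yval": 2, "info": "A;C;E;F"},
    {"xval": 3, "yval": 9, "info": "B;D"},
]

def contains_substring(info, choice):
    # Same test as indexof(datum.info, choice) > -1:
    # str.find returns -1 when the substring is absent.
    return info.find(choice) > -1

def contains_token(info, choice, sep=";"):
    # Hypothetical stricter variant: match whole ';'-separated entries only.
    return choice in info.split(sep)

kept = [r["xval"] for r in rows if contains_substring(r["info"], "B")]
print(kept)  # [1, 3]
```

With the substring test, an option such as "ABC" would also match a row containing only "ABC1", whereas the token test would not.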

Related

Search for a dictionary based on a property value

I am trying to get the dictionaries in a list whose value for a specific property is in a given list of values. Any suggestions?
list_of_persons = [
{"id": 2, "name": "name_2", "age": 23},
{"id": 3, "name": "name_3", "age": 43},
{"id": 4, "name": "name_4", "age": 35},
{"id": 5, "name": "name_5", "age": 59}
]
ids_search_list = [2, 4]
I'd like to get the following list
result_list = [
{"id": 2, "name": "name_2", "age": 23},
{"id": 4, "name": "name_4", "age": 35}
]
Looping could be the simplest solution, but there should be a better one in Python.
You can do it like this:
list_of_persons = [
{"id": 2, "name": "name_2", "age": 23},
{"id": 3, "name": "name_3", "age": 43},
{"id": 4, "name": "name_4", "age": 35},
{"id": 5, "name": "name_5", "age": 59}
]
ids_search_list = [2, 4]
result = []
for person in list_of_persons:
    if person["id"] in ids_search_list:
        result.append(person)
print(result)
You can use a list comprehension:
result_list = [person for person in list_of_persons if person["id"] in ids_search_list]
If you want some reading material about it: https://realpython.com/list-comprehension-python/
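If the list of ids to search for is long, one possible refinement is to convert it to a set first, since membership tests on a set do not scan every element. This is a minimal sketch using the data from the question; the name ids_search_set is introduced here for illustration.

```python
list_of_persons = [
    {"id": 2, "name": "name_2", "age": 23},
    {"id": 3, "name": "name_3", "age": 43},
    {"id": 4, "name": "name_4", "age": 35},
    {"id": 5, "name": "name_5", "age": 59},
]
ids_search_list = [2, 4]

# Build the set once, then test membership against it for each person.
ids_search_set = set(ids_search_list)
result_list = [person for person in list_of_persons if person["id"] in ids_search_set]
print(result_list)
```

The result keeps the order of list_of_persons, not the order of ids_search_list.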

Get dict inside a list with value without for loop

I have this dict:
data_flights = {
"prices": [
{ "city": "Paris", "iataCode": "AAA", "lowestPrice": 54, "id": 2 },
{ "city": "Berlin", "iataCode": "BBB", "lowestPrice": 42, "id": 3 },
{ "city": "Tokyo", "iataCode": "CCC", "lowestPrice": 485, "id": 4 },
{ "city": "Sydney", "iataCode": "DDD", "lowestPrice": 551, "id": 5 },
],
"date": "31/03/2022"
}
Can I access one of the dicts by a key's value, without using a for loop?
Something like this:
data_flights["prices"]["city" == "Berlin"]
You can achieve this with either a comprehension or the built-in filter.
Comprehension:
[e for e in data_flights['prices'] if e['city'] == 'Berlin']
Filter:
list(filter(lambda e: e['city'] == 'Berlin', data_flights['prices']))
Both would result in:
[{'city': 'Berlin', 'iataCode': 'BBB', 'lowestPrice': 42, 'id': 3}]
You can use a list comprehension:
x = [a for a in data_flights["prices"] if a["city"] == "Berlin"]
>>> x
[{'city': 'Berlin', 'iataCode': 'BBB', 'lowestPrice': 42, 'id': 3}]
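When only a single matching dict is needed, two other options may be worth considering: next() with a generator expression, which stops at the first match, or a dictionary keyed by city for repeated lookups. The sketch below assumes city names are unique; the names berlin and by_city are introduced here for illustration. Both still iterate internally, just without an explicit for statement.

```python
data_flights = {
    "prices": [
        {"city": "Paris", "iataCode": "AAA", "lowestPrice": 54, "id": 2},
        {"city": "Berlin", "iataCode": "BBB", "lowestPrice": 42, "id": 3},
        {"city": "Tokyo", "iataCode": "CCC", "lowestPrice": 485, "id": 4},
        {"city": "Sydney", "iataCode": "DDD", "lowestPrice": 551, "id": 5},
    ],
    "date": "31/03/2022",
}

# First match only; returns None if no city matches.
berlin = next((p for p in data_flights["prices"] if p["city"] == "Berlin"), None)
print(berlin["iataCode"])  # BBB

# Index built once, assuming each city appears at most once.
by_city = {p["city"]: p for p in data_flights["prices"]}
print(by_city["Tokyo"]["lowestPrice"])  # 485
```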

How do I fold by "{" in Sublime Text 3 if I have an array of JSON objects?

If I have the following array of JSON objects in Sublime, it will collapse at any "[". Can this be changed to fold at any "{" instead?
Taking this JSON array
[{"a":[{"b":1,"c":2},{"d":3},{"x":99,"y":100}],"b":4},{"a":[{"b":5,"c":6}],"b":7}]
and pretty formatting it within the same file
[
{
"a": [
{
"b": 1,
"c": 2
},
{
"d": 3
},
{
"x": 99,
"y": 100
}
],
"b": 4
},
{
"a": [
{
"b": 5,
"c": 6
}
],
"b": 7
}
]
As an example of it working mostly the way I am looking for, with the JSON below, it will collapse at any "{" with more than one element, so {"d":3} does not collapse:
{
"a": [
{
"b": 1,
"c": 2
},
{
"d": 3
},
{
"x": 99,
"y": 100
}
],
"b": 4
}
Edit - it looks like if I take the formatted array into a new file, it will recognize all potential folding points.

Spatial Indexes with DocumentDB

I’m trying to do a spatial query against DocumentDB that looks like this:
SELECT * FROM root r WHERE
ST_WITHIN({'type':'Point','coordinates':[-122.02625, 37.4718]}, r.boundingBox)
to match a document that looks like this in the collection:
{
"userId": "747941cfb829",
"id": "747941cfb829_1453640096710",
"boundingBox": {
"type": "Polygon",
"coordinates": [
[-122.0263, 37.9718],
[-122.0262, 37.9718],
[-122.0262, 36.9718],
[-122.0263, 36.9718],
[-122.0263, 37.9718]
]
},
"distance": 0,
"duration": 1
}
I’ve turned on spatial indexes ala https://azure.microsoft.com/en-us/documentation/articles/documentdb-geospatial/ but I’m not getting a match back from DocumentDB.
Any ideas?
NOTE: Corrected GeoJson coordinate order.
A correctly specified GeoJSON polygon has one more array around the coordinates than you show: "coordinates" is an array of rings, which allows for holes. So it would look like this:
{
"type": "Polygon",
"coordinates": [
[
[0, 0], [10, 10], [10, 0], [0, 0]
]
]
}
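As an illustration of the nesting, here is a minimal Python sketch that wraps a single ring of positions in the extra array. The helper name as_polygon is hypothetical. It only addresses the array nesting and the requirement that the ring is closed; it does not check anything else about the geometry.

```python
def as_polygon(ring):
    """Build a GeoJSON Polygon dict from one exterior ring of [lon, lat] positions."""
    # A linear ring has at least 4 positions, and the first equals the last.
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("ring needs at least 4 positions and must be closed")
    # "coordinates" is a list of rings, so the single ring is wrapped in a list.
    return {"type": "Polygon", "coordinates": [ring]}

# The ring from the question's boundingBox.
ring = [
    [-122.0263, 37.9718],
    [-122.0262, 37.9718],
    [-122.0262, 36.9718],
    [-122.0263, 36.9718],
    [-122.0263, 37.9718],
]
bounding_box = as_polygon(ring)
print(len(bounding_box["coordinates"]))     # 1 ring
print(len(bounding_box["coordinates"][0]))  # 5 positions
```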

MongoDB update query for nest array

I have a collection Measurement with documents such as the one shown below:
{
"Data" : [ [-5, [[1, 1023.0], [2, 694.0]]], [-1, [[1, 0.0], [2, 20.0]]], [-3, [[1, 30.75], [2, 30.75]]] ]
}
It reflects the C# structure Dictionary<int, Dictionary<int, double>>. I need to write an update script that adds 5 to all of the outer dictionary keys. How could this be done with a mongo update script? The object should end up looking as follows:
{
"Data" : [ [0, [[1, 1023.0], [2, 694.0]]], [4, [[1, 0.0], [2, 20.0]]], [2, [[1, 30.75], [2, 30.75]]] ]
}
The only way to do this is programmatically, i.e., by looping over the Data array and updating each element individually.
This is probably not the structure you really want if you need to update things in this way. The problem is matching elements in a nested array: the current limitation is that an update can match only the first matching position and reference only that index.
We can't tell much about your purpose from what you have presented, but what you probably need is something like this:
{
"Data" : [
{
"pos": 0,
"ref": -5,
"A": { "x": 1, "y": 1023.0 },
"B": { "x": 2, "y": 694.0 }
},
{
"pos": 1,
"ref": -1,
"A": { "x": 1, "y": 0.0},
"B": { "x": 2, "y": 20.0 }
},
{
"pos": 2,
"ref": -3,
"A": { "x": 1, "y": 30.75 },
"B": { "x": 2, "y": 30.75 }
}
]
}
Yet even that does not allow you to update everything in a single query. You can do it with one update per element, though:
db.collection.update({"_id": id, "Data.pos": 0}, {"$inc":{"Data.$.ref": 5}});
db.collection.update({"_id": id, "Data.pos": 1}, {"$inc":{"Data.$.ref": 5}});
db.collection.update({"_id": id, "Data.pos": 2}, {"$inc":{"Data.$.ref": 5}});
Your current schema would not allow even that. With the restructured documents, at least every element can be addressed this way, which was not possible before.
In any case, updating all of the array elements at once is not possible other than in a loop:
db.collection.find({ "_id": id }).forEach(function(doc) {
    doc.Data.forEach(function(data) {
        data.ref += 5;
    });
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "Data": doc.Data } }
    );
})
Or some variant that might even do something like the first example rather than just replacing the whole array as this does. Your current structure would rely on looping through several nested arrays to do the same thing.
Of course, if you regularly have to update all elements in this way, then consider something other than an array. Or live with how you have to update, according to what your data access needs are.
Read the documentation on how things can be handled and make your decisions from there.
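For the original nested-array layout, the client-side part of such a loop can be sketched in plain Python as below. This only shows the in-memory transformation of one document; reading the document and writing the modified Data array back (for example with $set, as in the shell loop) is left out. The function name shift_outer_keys is hypothetical.

```python
# One document in the original layout: a list of [outer_key, inner_pairs] entries.
doc = {
    "Data": [
        [-5, [[1, 1023.0], [2, 694.0]]],
        [-1, [[1, 0.0], [2, 20.0]]],
        [-3, [[1, 30.75], [2, 30.75]]],
    ]
}

def shift_outer_keys(data, offset):
    # Only the outer key changes; the inner [key, value] pairs are kept as they are.
    return [[key + offset, inner] for key, inner in data]

doc["Data"] = shift_outer_keys(doc["Data"], 5)
print([entry[0] for entry in doc["Data"]])  # [0, 4, 2]
```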
