Search for a dictionary based on a property value - python-3.x

I am trying to get a list of dictionaries from a larger list, matching on a list of values for a specific property. Any suggestions?
list_of_persons = [
    {"id": 2, "name": "name_2", "age": 23},
    {"id": 3, "name": "name_3", "age": 43},
    {"id": 4, "name": "name_4", "age": 35},
    {"id": 5, "name": "name_5", "age": 59}
]
ids_search_list = [2, 4]
I'd like to get the following list:
result_list = [
    {"id": 2, "name": "name_2", "age": 23},
    {"id": 4, "name": "name_4", "age": 35}
]
Looping would be the simplest solution, but I suspect there is a more idiomatic way in Python.

You can do it like this:
list_of_persons = [
    {"id": 2, "name": "name_2", "age": 23},
    {"id": 3, "name": "name_3", "age": 43},
    {"id": 4, "name": "name_4", "age": 35},
    {"id": 5, "name": "name_5", "age": 59}
]
ids_search_list = [2, 4]

result = []
for person in list_of_persons:
    if person["id"] in ids_search_list:
        result.append(person)
print(result)

You can use a list comprehension:
result_list = [person for person in list_of_persons if person["id"] in ids_search_list]
If you want some reading material about it: https://realpython.com/list-comprehension-python/
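A small aside (my addition, not part of the original answers): membership tests against a list scan it linearly, so if ids_search_list grows large, converting it to a set makes each lookup constant-time on average:

ids_search_set = set(ids_search_list)  # hash the ids once
result_list = [person for person in list_of_persons if person["id"] in ids_search_set]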

Related

Get a dict inside a list by value, without a for loop

I have this dict:
data_flights = {
    "prices": [
        {"city": "Paris", "iataCode": "AAA", "lowestPrice": 54, "id": 2},
        {"city": "Berlin", "iataCode": "BBB", "lowestPrice": 42, "id": 3},
        {"city": "Tokyo", "iataCode": "CCC", "lowestPrice": 485, "id": 4},
        {"city": "Sydney", "iataCode": "DDD", "lowestPrice": 551, "id": 5},
    ],
    "date": "31/03/2022"
}
Can I access a dict in the list by the value of one of its keys, without using a for loop? Something like this:
data_flights["prices"]["city" == "Berlin"]
You can achieve this with either a comprehension or the built-in filter.
comprehension:
[e for e in data_flights['prices'] if e['city'] == 'Berlin']
filter:
list(filter(lambda e: e['city'] == 'Berlin', data_flights['prices']))
Both would result in:
[{'city': 'Berlin', 'iataCode': 'BBB', 'lowestPrice': 42, 'id': 3}]
You can use a list comprehension:
x = [a for a in data_flights["prices"] if a["city"] == "Berlin"]
>>> x
[{'city': 'Berlin', 'iataCode': 'BBB', 'lowestPrice': 42, 'id': 3}]
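If you expect a single match and want the dict itself rather than a one-element list, next() with a generator expression stops at the first hit (my addition; the None default is an assumption about how a missing city should be handled):

berlin = next((e for e in data_flights["prices"] if e["city"] == "Berlin"), None)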

Substring filtering in Altair / using "params"

I am using Altair and would like to filter data using a substring search. Here is an example of doing it in Vega-Lite:
{
  "config": {"view": {"continuousWidth": 400, "continuousHeight": 300}},
  "data": {"name": "d"},
  "mark": "point",
  "encoding": {
    "x": {"type": "quantitative", "field": "xval", "scale": {"domain": [0, 4]}},
    "y": {"type": "quantitative", "field": "yval", "scale": {"domain": [1, 10]}}
  },
  "params": [{
    "name": "Letter", "value": "A",
    "bind": {"input": "select", "options": ["A", "B", "C", "D", "E", "F"]}
  }],
  "transform": [
    {"filter": "indexof(datum.info, Letter) > -1"}
  ],
  "datasets": {
    "d": [
      {"xval": 1, "yval": 7, "info": "A;B;D;E"},
      {"xval": 2, "yval": 2, "info": "A;C;E;F"},
      {"xval": 3, "yval": 9, "info": "A;B;D"}
    ]
  }
}
This lets me filter rows whose info column contains "A", "B", "C", etc., but it relies on "params", which is not available in Altair yet. Is there another way of achieving this kind of substring filtering in Altair as of now? This is meant to be a minimal example; in my actual use case I have a large number of options (many gene names), so adding a column to the original data for each option wouldn't be feasible.
I'm trying to do this in Altair because it is for an executable research article, which I believe allows Altair but not raw Vega-Lite.
Edit: I realized that indexing like infoSel.info[0] gives the string selected in the dropdown. The filter still worked with infoSel.info (no index), but that was just lucky; in expressions like this, infoSel.info[0] is the more correct form.
Got it! This is possible with an expression in transform_filter, which I had previously tried but written incorrectly (I was using the name of the dropdown rather than the name of the selection object):
import altair as alt
import pandas as pd

d = pd.DataFrame({'xval': [1, 2, 3],
                  'yval': [7, 2, 9],
                  'info': ['A;B;D;E', 'A;C;E;F', 'B;D']})

info_dropdown = alt.binding_select(options=['A', 'B', 'C', 'D', 'E', 'F'], name='Letter')
info_sel = alt.selection_single(name='infoSel', fields=['info'], bind=info_dropdown, init={'info': 'A'})

alt.Chart(d).mark_circle().encode(
    x='xval', y='yval'
).add_selection(info_sel).transform_filter('indexof(datum.info, infoSel.info[0]) > -1')
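For what it's worth, later Altair releases (5.x) added a params API, so if the article toolchain allows upgrading, the Vega-Lite spec above translates more directly (a sketch assuming Altair 5; untested here):

letter = alt.param(name='Letter', value='A',
                   bind=alt.binding_select(options=['A', 'B', 'C', 'D', 'E', 'F']))

alt.Chart(d).mark_circle().encode(
    x='xval', y='yval'
).add_params(letter).transform_filter('indexof(datum.info, Letter) > -1')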

How to find common struct for all documents in collection?

I have an array of documents that have more or less the same structure, but I need to find the fields that are present in all documents. Something like:
{
    "name": "Jow",
    "salary": 7000,
    "age": 25,
    "city": "Mumbai"
},
{
    "name": "Mike",
    "backname": "Brown",
    "sex": "male",
    "city": "Minks",
    "age": 30
},
{
    "name": "Piter",
    "hobby": "footbol",
    "age": 25,
    "location": "USA"
},
{
    "name": "Maria",
    "age": 22,
    "city": "Paris"
}
All docs have name and age. How can I find these common fields with ArangoDB?
You could do the following:
Retrieve the attribute names of each document
Get the intersection of those attributes
i.e.
LET attrs = (FOR item IN test RETURN ATTRIBUTES(item, true))
RETURN APPLY("INTERSECTION", attrs)
APPLY is necessary so each list of attributes in attrs can be passed as a separate parameter to INTERSECTION.
Documentation:
ATTRIBUTES: https://www.arangodb.com/docs/stable/aql/functions-document.html#attributes
INTERSECTION: https://www.arangodb.com/docs/stable/aql/functions-array.html#intersection
APPLY: https://www.arangodb.com/docs/stable/aql/functions-miscellaneous.html#apply
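If you ever need the same check client-side, the intersection is easy to reproduce in plain Python (my sketch, assuming the documents have already been fetched as dicts):

docs = [
    {"name": "Jow", "salary": 7000, "age": 25, "city": "Mumbai"},
    {"name": "Mike", "backname": "Brown", "sex": "male", "city": "Minks", "age": 30},
    {"name": "Piter", "hobby": "footbol", "age": 25, "location": "USA"},
    {"name": "Maria", "age": 22, "city": "Paris"},
]
common = set(docs[0])       # keys of the first document
for doc in docs[1:]:
    common &= set(doc)      # keep only keys present in every document
print(common)               # {'name', 'age'} (set order may vary)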

PySpark: How to create a nested JSON from spark data frame?

I am trying to create a nested JSON from my Spark dataframe, which has data in the following structure. The code below creates a flat JSON with just keys and values. Could you please help?
df.coalesce(1).write.format('json').mode('overwrite').save(data_output_file + "createjson.json")
Update 1:
As per @MaxU's answer, I converted the Spark dataframe to pandas and used groupby. It puts the last two fields in a nested array. How could I first put the category and its count in a nested array, and then put the subcategory and its count inside that array?
Sample text data:
Vendor_Name,count,Categories,Category_Count,Subcategory,Subcategory_Count
Vendor1,10,Category 1,4,Sub Category 1,1
Vendor1,10,Category 1,4,Sub Category 2,2
Vendor1,10,Category 1,4,Sub Category 3,3
Vendor1,10,Category 1,4,Sub Category 4,4
j = (data_pd.groupby(['Vendor_Name', 'count', 'Categories', 'Category_Count'], as_index=False)
            .apply(lambda x: x[['Subcategory', 'Subcategory_Count']].to_dict('records'))
            .reset_index()
            .rename(columns={0: 'subcategories'})
            .to_json(orient='records'))
[{
    "vendor_name": "Vendor 1",
    "count": 10,
    "categories": [{
        "name": "Category 1",
        "count": 4,
        "subCategories": [
            {"name": "Sub Category 1", "count": 1},
            {"name": "Sub Category 2", "count": 2},
            {"name": "Sub Category 3", "count": 3},
            {"name": "Sub Category 4", "count": 4}
        ]
    }]
}]
You need to restructure the whole dataframe for that.
"subCategories" is a struct type.
from pyspark.sql import functions as F

df = df.withColumn(
    "subCategory",
    F.struct(
        F.col("Subcategory").alias("name"),
        F.col("Subcategory_Count").alias("count")
    )
)
Then groupBy and use F.collect_list to create the array.
At the end, you need only one record per vendor in your dataframe to get the result you expect.
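A sketch of those two steps, with column names taken from the sample CSV (the names and the two-level grouping are my assumptions, not spelled out in the original answer):

# collect subcategory structs under each category
cats = (df.groupBy("Vendor_Name", "count", "Categories", "Category_Count")
          .agg(F.collect_list("subCategory").alias("subCategories"))
          .withColumn("category", F.struct(
              F.col("Categories").alias("name"),
              F.col("Category_Count").alias("count"),
              F.col("subCategories"))))

# collect category structs under each vendor -> one row per vendor
result = (cats.groupBy("Vendor_Name", "count")
              .agg(F.collect_list("category").alias("categories")))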
The easiest way to do this in Python/pandas would be a series of nested generators using groupby, I think:
def split_df(df):
    for (vendor, count), df_vendor in df.groupby(["Vendor_Name", "count"]):
        yield {
            "vendor_name": vendor,
            "count": count,
            "categories": list(split_category(df_vendor)),
        }

def split_category(df_vendor):
    for (category, count), df_category in df_vendor.groupby(
        ["Categories", "Category_Count"]
    ):
        yield {
            "name": category,
            "count": count,
            "subCategories": list(split_subcategory(df_category)),
        }

def split_subcategory(df_category):
    # iterate over the group itself, not the full dataframe
    for row in df_category.itertuples():
        yield {"name": row.Subcategory, "count": row.Subcategory_Count}

list(split_df(df))
[
    {
        "vendor_name": "Vendor1",
        "count": 10,
        "categories": [
            {
                "name": "Category 1",
                "count": 4,
                "subCategories": [
                    {"name": "Sub Category 1", "count": 1},
                    {"name": "Sub Category 2", "count": 2},
                    {"name": "Sub Category 3", "count": 3},
                    {"name": "Sub Category 4", "count": 4},
                ],
            }
        ],
    }
]
To export this to JSON, you'll also need a way to serialize the np.int64 values that pandas produces.
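A minimal way to handle that (my sketch, using json's default hook; the helper name is mine):

import json
import numpy as np

def np_default(o):
    # json.dumps calls this for objects it cannot serialize natively
    if isinstance(o, np.integer):
        return int(o)
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

print(json.dumps(list(split_df(df)), default=np_default, indent=2))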

Python yaml parse error

import pandas as pd
import yaml as y

Movies = pd.read_csv('tmdb_5000_movies.csv', encoding="ISO-8859-1")
company = pd.DataFrame(Movies[['original_title', 'production_companies']])

for idn in range(10000):
    for index in range(len(company['original_title'])):
        akm = y.load(company.loc[index, 'production_companies'])
        for i in range(len(akm)):
            if akm[i]['id'] == idn:
                if str(idn) not in keyword.columns:
                    keyword[str(idn)] = " "
                    keyword.loc[index, str(idn)] = 1
                elif str(idn) in keyword.columns:
                    keyword.loc[index, str(idn)] = 1
                # check if akm == idn
                # akm length
keyword = keyword.fillna(0)
My data:
[{"id": 416, "name": "miami"},
{"id": 529, "name": "ku klux klan"},
{"id": 701, "name": "cuba"},
{"id": 1568, "name": "undercover"},
{"id": 1666, "name": "mexican standoff"},
{"id": 1941, "name": "ecstasy"},
{"id": 7963, "name": "guant\u00e1namo"},
{"id": 10089, "name": "slaughter"},
{"id": 10950, "name": "shootout"},
{"id": 12371, "name": "gunfight"},
{"id": 12648, "name": "bromance"},
{"id": 13142, "name": "gangster"},
{"id": 14819, "name": "violence"},
{"id": 14967, "name": "foot chase"},
{"id": 15271, "name": "interrogation"},
{"id": 15483, "name": "car chase"},
{"id": 18026, "name": "drug lord"},
{"id": 18067, "name": "exploding house"},
{"id": 155799, "name": "narcotics cop"},
{"id": 156117, "name": "illegal drugs"},
{"id": 156805, "name": "dea agent"},
{"id": 167316, "name": "buddy cop"},
{"id": 179093, "name": "criminal underworld"},
{"id": 219404, "name": "action hero"},
{"id": 226380, "name": "haitian gang"},
{"id": 226381, "name": "minefield"}]
Error message (copied from the comments):
ParserError: while parsing a flow mapping
  in "<unicode string>", line 1, column 2:
    {""name"": ""Dune Entertainment""
     ^
expected ',' or '}', but got '<scalar>'
  in "<unicode string>", line 1, column 5:
    {""name"": ""Dune Entertainment""
        ^
