extract dictionary elements from nested list in python - list-comprehension

I have a question.
I have a nested list that looks like this.
x = [[{'screen_name': 'BreitbartNews',
       'name': 'Breitbart News',
       'id': 457984599,
       'id_str': '457984599',
       'indices': [126, 140]}],
     [],
     [],
     [{'screen_name': 'BreitbartNews',
       'name': 'Breitbart News',
       'id': 457984599,
       'id_str': '457984599',
       'indices': [98, 112]}],
     [{'screen_name': 'BreitbartNews',
       'name': 'Breitbart News',
       'id': 457984599,
       'id_str': '457984599',
       'indices': [82, 96]}]]
There are some empty lists inside the main list.
What I am trying to do is extract each screen_name and append it to a new list, including entries for the empty sublists (maybe noting them as 'null').
y = []
for i in x:
    for j in i:
        if len(j) == 0:
            n = 'null'
        else:
            n = j['screen_name']
        y.append(n)
I don't know why the code above outputs a list,
['BreitbartNews',
 'BreitbartNews',
 'BreitbartNews']
which doesn't reflect the empty sublists.
Can anyone help me refine my code to make it right?

You are checking the lengths of the wrong lists. The empty lists are the i values, not the j values.
The correct code would be
y = []
for i in x:
    if len(i) == 0:
        n = 'null'
    else:
        n = i[0]['screen_name']
    y.append(n)
It may help to print(i) in each iteration to better understand what is actually happening.
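Since the question mentions list comprehensions, the same fix can also be written as a one-liner; a minimal sketch using the x from the question:
y = [i[0]['screen_name'] if i else 'null' for i in x]  # empty sublists become 'null'
print(y)  # ['BreitbartNews', 'null', 'null', 'BreitbartNews', 'BreitbartNews']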

Related

python3 - check element is actually in list

For example, I have an Excel header list like this:
excel_headers = [
    'Name',
    'Age',
    'Sex',
]
and I have another collection (a dict, in this case) to check against it:
headers = {'Name' : 1, 'Age': 2, 'Sex': 3, 'Whatever': 4}
I don't care if headers has extra elements; I only care that every element of excel_headers is present in headers.
What I've tried:
lst = all(headers[idx][0] == header for idx, header in enumerate(excel_headers))
print(lst)
However, it always returns False.
Any help, please?
Another way to do it would be to use set difference:
excel_headers = ['Name', 'Age', 'Sex']
headers = {'Name' : 1, 'Age': 2, 'Sex': 3, 'Whatever': 4}
diff = set(excel_headers) - set(headers)
hasAll = len(diff) == 0 # len 0 means every value in excel_headers is present in headers
print(diff) #this will give you unmatched elements
Just sort your lists; the results show you a before and after:
excel_headers = [
    'Name',
    'Age',
    'Sex',
]
headers = ['Age', 'Name', 'Sex']
if excel_headers == headers: print("YES!")
else: print("NO!")
excel_headers.sort()
headers.sort()
if excel_headers == headers: print("YES!")
else: print("NO!")
Output:
NO!
YES!
Tip: this is a good use case for a set, since you're looking up elements by value to see if they exist. However, for small lists (<100 elements) the difference in performance isn't really noticeable, and using a list is fine.
excel_headers = ['Name', 'Age', 'Sex']
headers = {'Name' : 1, 'Age': 2, 'Sex': 3, 'Whatever': 4}
result = all(element in headers for element in excel_headers)
print(result) # --> True
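If you want the set-based version of that same membership check, a minimal sketch (same excel_headers and headers as above):
result = set(excel_headers).issubset(headers)  # iterating a dict yields its keys
print(result)  # --> True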

How to get a dict from a list of dicts according to some threshold

I have a list of dicts like the one below:
list_dict = [{'Name': 'Andres', 'score': 0.17669814825057983},
             {'Name': 'Paul', 'score': 0.14028045535087585},
             {'Name': 'Feder', 'score': 0.1379694938659668},
             {'Name': 'James', 'score': 0.1348174512386322}]
I want to output another list of dicts, but only with the entries whose score is higher than a threshold = 0.15.
Expected output: [{'name':'Andres', 'score' : 0.1766..}]
I did this, but the code is terrible and the output is wrongly formatted:
l = []
for i in range(len(list_dict)):
    for k in list_dict[i]['name']:
        if list_dict[i]['score'] > 0.15:
            print(k)
Maybe this is what you're looking for?
Actually you're very close... you're just missing a bit of syntax.
Each item in list_dict is a dictionary, so you can access its score directly; you don't need an index to get at the interesting part.
new_dc = list()
for item in list_dict:  # each item is a dictionary
    if item['score'] > 0.15:  # it's better to use a meaningful variable
        new_dc.append(item)
print(new_dc)  # [{'Name': 'Andres', 'score': 0.17669814825057983}]
Alternatively you can use List Comprehension:
output = [item for item in list_dict if item['score'] > 0.15]
assert new_dc == output  # Silence means they're the same
1st approach, using a loop:
final_list = []
for each in list_dict:  # simply iterate through each dict in the list and compare the score
    if each['score'] > 0.15:
        final_list.append(each)
print(final_list)
2nd approach, using a list comprehension:
final_list = [item for item in list_dict if item['score']>0.15]
print(final_list)
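If you also want the keys lowercased to match the 'name' spelling in the expected output (an assumption based on that output), a small variation:
final_list = [{k.lower(): v for k, v in item.items()} for item in list_dict if item['score'] > 0.15]
print(final_list)  # [{'name': 'Andres', 'score': 0.17669814825057983}]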

create new dictionary based on keys and split the dictionary values

I am relatively new to Python programming. I was trying some challenges online to sharpen my programming skills. I got stuck with the code below. Can someone please help here?
ress = {'product': ['Mountain Dew Spark', 'pepsi'], 'quantity': ['7', '5']}
prods_list = []
prods_dict = {}
for k, v in ress.items():
    if "product" in k:
        if len(ress['product']) > 1:
            entity_names = {}
            entity_list = []
            for i in range(len(ress['product'])):
                prod = "product_" + str(i)
                entity_names['product'] = ress['product'][i]
                entity_names['quantity'] = ress['quantity'][i]
                entity_list.append(entity_names)
                prods_dict[prod] = entity_list
            prods_list.append(prods_dict)
print(prods_list)
I am expecting output as below.
Expected output:
[{"product_0":
{"quantity" : "7",
"product" : "mountain dew spark"}
},
{"product_1" : {
"quantity" : "5",
"product" : "pepsi"
}}]
Actual output:
[{'product_0': [{'product': 'pepsi', 'quantity': '5'},
{'product': 'pepsi', 'quantity': '5'}],
'product_1': [{'product': 'pepsi', 'quantity': '5'},
{'product': 'pepsi', 'quantity': '5'}]}]
Please note I want my code to work for single values as well, like ress = {'product': ['Mountain Dew Spark'], 'quantity': ['7']}.
This is one way you can achieve it with regular loops:
ress = {'product': ['Mountain Dew Spark', 'pepsi'], 'quantity': ['7', '5']}
prods_list = []
for key, value in ress.items():
    for ind, el in enumerate(value):
        prod_num = 'product_' + str(ind)
        # If this element is already present
        if len(prods_list) >= ind + 1:
            # Add to the existing dict
            prods_list[ind][prod_num][key] = el
        else:
            # Otherwise - create a new dict
            prods_list.append({prod_num: {key: el}})
print(prods_list)
The first loop goes through the input dictionary, the second one through each of its lists. The code then determines whether a dictionary for that product is already in the output list by checking the output list's length. If it is, the code simply adds the new key to that product's inner dict. If it is not, the code creates an outer dict for that product and an inner one for this particular value.
Maybe using a list comprehension along with enumerate and zip might be easier:
>>> res = {'product': ['Mountain Dew Spark', 'pepsi'], 'quantity': ['7', '5']}
>>> prods_list = [
... {f'product_{i}': {'quantity': int(q), 'product': p.lower()}}
... for i, (q, p) in enumerate(zip(res['quantity'], res['product']))
... ]
>>> prods_list
[{'product_0': {'quantity': 7, 'product': 'mountain dew spark'}}, {'product_1': {'quantity': 5, 'product': 'pepsi'}}]
This assumes that there will be no duplicate product entries; if there are, you would need to use a traditional for loop.
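If the quantities should stay strings, as in the expected output, a minimal variation of the same comprehension (just drop the int() conversion):
prods_list = [
    {f'product_{i}': {'quantity': q, 'product': p.lower()}}
    for i, (q, p) in enumerate(zip(res['quantity'], res['product']))
]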

Auto-extracting columns from nested dictionaries in pandas

So I have multiple nested dictionaries in a jsonl file column, as below:
`df['referenced_tweets'][0]`
producing (shortened output)
'id': '1392893055112400898',
'public_metrics': {'retweet_count': 0,
'reply_count': 1,
'like_count': 2,
'quote_count': 0},
'conversation_id': '1392893055112400898',
'created_at': '2021-05-13T17:22:37.000Z',
'reply_settings': 'everyone',
'entities': {'annotations': [{'start': 65,
'end': 77,
'probability': 0.9719000000000001,
'type': 'Person',
'normalized_text': 'Jill McMillan'}],
'mentions': [{'start': 23,
'end': 36,
'username': 'usasklibrary',
'protected': False,
'description': 'The official account of the University Library at USask.',
'created_at': '2019-06-04T17:19:12.000Z',
'entities': {'url': {'urls': [{'start': 0,
'end': 23,
'url': '*removed*',
'expanded_url': 'http://library.usask.ca',
'display_url': 'library.usask.ca'}]}},
'name': 'University Library',
'url': '....',
'profile_image_url': 'https://pbs.twimg.com/profile_images/1278828446026629120/G1w7t-HK_normal.jpg',
'verified': False,
'id': '1135959197902921728',
'public_metrics': {'followers_count': 365,
'following_count': 119,
'tweet_count': 556,
'listed_count': 9}}]},
'text': 'Wonderful session with #usasklibrary Graduate Writing Specialist Jill McMillan who is walking SURE students through the process of organizing/analyzing a literature review! So grateful to the library -- our largest SURE: Student Undergraduate Research Experience partner!',
...
My intention is to create a function that would auto-extract specific columns (e.g. text, type) across the entire dataframe (not just a row). So I wrote the function:
### x = df['referenced_tweets']
def extract_TextType(x):
    dic = {}
    for i in x:
        if i != " ":
            new_df = pd.DataFrame.from_dict(i)
            dic['refd_text'] = new_df['text']
            dic['refd_type'] = new_df['type']
        else:
            print('none')
    return dic
However running the function:
df['referenced_tweets'].apply(extract_TextType)
produces an error:
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
The whole point is to extract these two nested columns (texts & type) from the original "referenced tweets" column and match them to the original rows.
What am I doing wrong, please?
P.S.
The original df is shown in a screen grab (not reproduced here).
A couple of things to consider here. referenced_tweets holds a list, so this line, new_df = pd.DataFrame.from_dict(i), is most likely not parsing it correctly the way you are calling it.
Also, because it's possible there are multiple tweets in that list, you are correctly iterating over it, but you don't need to put each one into a df to do so. Using .apply() will also create a new dictionary in each cell; if that's what you want, that is OK. If you really just want a new dataframe, you can adapt the following. I don't have access to referenced_tweets, so I'm using entities as an example.
Here's my example:
ents = df[df.entities.notnull()]['entities']
dict_hold_list = []
for ent in ents:
    # print(ent['hashtags'])
    for htag in ent['hashtags']:
        # print(htag['text'])
        # print(htag['indices'])
        dict_hold_list.append({'text': htag['text'], 'indices': htag['indices']})
df_hashtags = pd.DataFrame(dict_hold_list)
Because you have not provided a working JSON sample or dataframe, I can't test this, but your solution could look like this:
refs = df[df.referenced_tweets.notnull()]['referenced_tweets']
dict_hold_list = []
for ref in refs:
    # print(ref)
    for r in ref:
        # print(r['text'])
        # print(r['type'])
        dict_hold_list.append({'text': r['text'], 'type': r['type']})
df_ref_tweets = pd.DataFrame(dict_hold_list)
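If you prefer to stay within pandas, here is a minimal sketch of an alternative, assuming each non-null cell in referenced_tweets holds a list of dicts that contain 'text' and 'type' keys:
import pandas as pd

refs = df['referenced_tweets'].dropna().explode().dropna()  # one referenced tweet (a dict) per row
df_ref_tweets = pd.json_normalize(refs.tolist())[['text', 'type']]  # flatten the dicts, keep the two columns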

How to add lots of list items to a different list in Python

I was creating a Python file of lists. There I created a lot of lists, and I add new items and lists every day. I want to create a list that will automatically contain all the items of the previous lists. What should I do?
list_1=['1','one','first',etc...]
list_2=['2','two', 'second', '2nd', etc]
.
.
list_x=['x', 'cross']
all_list=list_1+list_2+....+list_x+... #this will update automatically
How to do it?
This problem can actually be solved by a better-suited choice of data structure. If some items are related, they should be stored together inside a container such as a dict or a list of lists. Doing so will both make them easier to access and keep your scope clean.
all_lists = {
    '1': ['1', 'one', 'first', ...],
    '2': ['2', 'two', 'second', ...],
    ...: ...,
    'x': ['x', 'cross']
}
You can now access a specific list...
list_1 = all_lists['1']
... check if an item is inside the lists.
if any(item in lst for lst in all_lists.values()):
    print('The item is in all_lists')
... or iterate over all lists with a nested loop.
for lst in all_lists.values():
    for item in lst:
        print(item)
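... or, if you still want a single flat all_list containing every item, here is a small sketch using itertools.chain on the same all_lists dict:
from itertools import chain

all_list = list(chain.from_iterable(all_lists.values()))  # flattens every stored list into one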
If you really do want to collect all the list_* variables automatically, this code can do the job:
def generate_list():
    l = [globals()[name] for name in globals().keys() if name.startswith('list_')]
    return [item for sublist in l for item in sublist]

list_1 = [1]
list_2 = [2]
list_3 = [3]
print(generate_list())
result: [1, 2, 3]
