How to transform a list of dicts - python-3.x

I have a list of dicts:
[{'prox', 'SIN'},
{'SIN', 'nerps'},
{'SIN', 'malzon'},
{'SIN', 'oportun'},
{'ANAT', 'head'},
{'ANAT', 'eyes'}]
How can I transform this list to get this as output:
[{'SIN': ['prox', 'nerps', 'malzon', 'oportun'], 'ANAT': ['head', 'eyes']}]
I tried this approach, but it's not working:
d = {}
for dictionary in l:
    for key, (k, v) in dictionary.items:
        if key not in d:
            d[key] = {}
        d[key][k] = v
But it fails because a set has no items method.
When I iterate over the dicts inside the list, each one turns out to be a set, and I don't know how to use it.

[{'prox', 'SIN'},
{'SIN', 'nerps'},
{'SIN', 'malzon'},
{'SIN', 'oportun'},
{'ANAT', 'head'},
{'ANAT', 'eyes'}]
This is not a list of dicts; it is a list of sets. That changes everything: you can't use dictionary.items, and even if you could, you would need the parentheses: dictionary.items().
If you know that you will always have an uppercase and a lowercase word, you can just sort them and add them to a dict.
from collections import defaultdict
d = defaultdict(list)
# l is the list of sets from the question; sorting puts the uppercase word first
for i, j in (sorted(e) for e in l):
    d[i].append(j)
result = [dict(d)]
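A self-contained variant, using the data from the question. Instead of relying on sort order, it picks the key with str.isupper(); this assumes, as above, that each set holds exactly one all-uppercase word (the key) and one word that is not all uppercase (the value).

```python
from collections import defaultdict

# The input from the question: each element is a set, not a dict.
pairs = [{'prox', 'SIN'},
         {'SIN', 'nerps'},
         {'SIN', 'malzon'},
         {'SIN', 'oportun'},
         {'ANAT', 'head'},
         {'ANAT', 'eyes'}]

grouped = defaultdict(list)
for pair in pairs:
    # Assumption: the all-uppercase word is the key, the other one the value.
    # Choosing by isupper() does not depend on how the strings happen to sort.
    key = next(w for w in pair if w.isupper())
    value = next(w for w in pair if not w.isupper())
    grouped[key].append(value)

result = [dict(grouped)]
print(result)
# [{'SIN': ['prox', 'nerps', 'malzon', 'oportun'], 'ANAT': ['head', 'eyes']}]
```

The order of words inside each list follows the order of the sets in the input list, since a set's own iteration order is never used.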

Related

Can we get column names sorted in the order of their tf-idf values (if any) for each document?

I'm using sklearn's TfidfVectorizer. For each document, I'm trying to get the column names as a list, ordered by their tf-idf values in decreasing order. If a document consists only of stop words, we don't need any column names.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
msg = ["My name is Venkatesh",
"Trying to get the significant words for each vector",
"I want to get the list of words name in the decreasing order of their tf-idf values for each vector",
"is to my"]
stopwords=['is','to','my','the','for','in','of','i','their']
tfidf_vect = TfidfVectorizer(stop_words=stopwords)
tfidf_matrix=tfidf_vect.fit_transform(msg)
pd.DataFrame(tfidf_matrix.toarray(),
columns=tfidf_vect.get_feature_names_out())
I want to generate a column containing, for each document, the list of words in decreasing order of their tf-idf values, so the column would look like this:
['venkatesh','name']
['significant','trying','vector','words','each','get']
['decreasing','idf','list','order','tf','values','want','each','get','name','vector','words']
[] # empty list Since the document consists only stopwords
The list above is the primary result I'm looking for. It would also be great to get, for each document, a dict sorted by tf-idf value, with the tf-idf values as keys and the list of words sharing that value as the values.
So the result would look like this:
{'0.785288':['venkatesh'],'0.619130':['name']}
{'0.47212':['significant','trying'],'0.372225':['vector','words','each','get']}
{'0.314534':['decreasing','idf','list','order','tf','values','want'],'0.247983':['each','get','name','vector','words']}
{} # empty dict Since the document consists only stopwords
I think this code does what you want and avoids using pandas:
from itertools import groupby
sort_func = lambda v: v[0]  # sort by first value in tuple
all_dicts = []
for row in tfidf_matrix.toarray():
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    sorted_vals = sorted(zip(row, tfidf_vect.get_feature_names_out()), key=sort_func, reverse=True)
    # groupby only merges adjacent equal keys, which the sort above guarantees
    all_dicts.append({val: [g[1] for g in group] for val, group in groupby(sorted_vals, key=sort_func) if val != 0})
You could make it even less readable and put it all in a single comprehension! :-)
Combining the following function with the DataFrame's to_dict() method gives the desired output.
def ret_dict(_dict):
    # Get a list of unique values
    list_keys = list(set(_dict.values()))
    processed_dict = {key: [] for key in list_keys}
    # Prepare dictionary: group column names under their tf-idf value
    for key, value in _dict.items():
        processed_dict[value].append(str(key))
    # Sort the keys (tf-idf values) in descending order and drop zeros
    sorted_keys = sorted(processed_dict, reverse=True)
    sorted_keys = [keys for keys in sorted_keys if keys > 0]
    # Return the dictionary with sorted keys
    sorted_dict = {k: processed_dict[k] for k in sorted_keys}
    return sorted_dict
Then:
res = pd.DataFrame(tfidf_matrix.toarray(), columns=tfidf_vect.get_feature_names_out())
list_dict = res.to_dict('records')
processed_list = []
for _dict in list_dict:
    processed_list.append(ret_dict(_dict))
processed_list then contains the desired output. For instance, processed_list[1] is:
{0.47212002654617047: ['significant', 'trying'], 0.3722248517590162: ['each', 'get', 'vector', 'words']}
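For the primary result (a plain list of words per document, highest score first), a minimal sketch in plain Python. It assumes you pass one dense row of scores at a time (for example a row of tfidf_matrix.toarray()) together with the matching feature names; words_by_score is a hypothetical helper name, and the toy matrix below is made up rather than real tf-idf output. A document made only of stop words has an all-zero row and so gets an empty list.

```python
def words_by_score(row, feature_names):
    """Return the words with a non-zero score, highest score first.

    row: tf-idf scores for one document; feature_names: the matching
    column names. Ties are broken alphabetically for a deterministic order.
    """
    scored = [(score, name) for score, name in zip(row, feature_names) if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Toy stand-ins for get_feature_names_out() and tfidf_matrix.toarray();
# the numbers are invented for illustration.
names = ['alpha', 'beta', 'gamma']
matrix = [[0.2, 0.7, 0.0],
          [0.0, 0.0, 0.0]]

column = [words_by_score(row, names) for row in matrix]
print(column)
# [['beta', 'alpha'], []]
```

Each inner list can then be stored in a DataFrame column, for example by wrapping the list of lists in a pd.Series with the DataFrame's index.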

Best way to exchange keys with values in a dictionary, where values are in a list?

My dict (cpc_docs) has a structure like
{
sym1:[app1, app2, app3],
sym2:[app1, app6, app56, app89],
sym3:[app3, app887]
}
My dict has 15K keys and they are unique strings. Values for each key are a list of app numbers and they can appear as values for more than one key.
I've looked at "Python: Best Way to Exchange Keys with Values in a Dictionary?", but since my values are lists, I get the error unhashable type: 'list'.
I've tried the following methods:
res = dict((v,k) for k,v in cpc_docs.items())
for x, y in cpc_docs.items():
    res.setdefault(y, []).append(x)
new_dict = dict (zip(cpc_docs.values(),cpc_docs.keys()))
None of these work of course since my values are lists.
I want each unique element of the value lists to map to a list of all the keys it appears under.
Something like this:
{
app1:[sym1, sym2],
app2:[sym1],
app3:[sym1, sym3],
app6:[sym2],
app56:[sym2],
app89:[sym2],
app887:[sym3]
}
A bonus would be to order the new dict by the length of each value list, longest first, like this:
{
app1:[sym1, sym2],
app3:[sym1, sym3],
app2:[sym1],
app6:[sym2],
app56:[sym2],
app89:[sym2],
app887:[sym3]
}
Your setdefault code is almost there, you just need an extra loop over the lists of values:
res = {}
for k, lst in cpc_docs.items():
    for v in lst:
        res.setdefault(v, []).append(k)
First create a list of (key, value) tuples:
new_list = []
for k, v in cpc_docs.items():
    for item in v:
        new_list.append((k, item))
Then, for each tuple in the list, add the key to the set stored under its value (defaultdict creates an empty set the first time a value is seen):
from collections import defaultdict
doc_cpc = defaultdict(set)
for tup in new_list:
    doc_cpc[tup[1]].add(tup[0])
There are probably better ways, but this works.
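A sketch that combines the inversion with the bonus ordering, on placeholder data shaped like the question's. It relies on two guarantees: dicts keep insertion order (Python 3.7+), and sorted() is stable even with reverse=True, so apps with equally long lists stay in the order they were first seen.

```python
from collections import defaultdict

# Placeholder data in the shape described in the question.
cpc_docs = {
    'sym1': ['app1', 'app2', 'app3'],
    'sym2': ['app1', 'app6', 'app56', 'app89'],
    'sym3': ['app3', 'app887'],
}

# Invert: each app number maps to the list of symbols it appears under.
inverted = defaultdict(list)
for sym, apps in cpc_docs.items():
    for app in apps:
        inverted[app].append(sym)

# Bonus: order by list length, longest first; ties keep first-seen order.
by_length = dict(sorted(inverted.items(), key=lambda item: len(item[1]), reverse=True))
print(by_length)
# {'app1': ['sym1', 'sym2'], 'app3': ['sym1', 'sym3'], 'app2': ['sym1'], 'app6': ['sym2'], 'app56': ['sym2'], 'app89': ['sym2'], 'app887': ['sym3']}
```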

Remove values from dictionary

I have a large dictionary and I am trying to remove the values that start with certain prefixes. Below is a small example of the dictionary.
a_data = {'78567908': {'26.01.17', '02.03.24', '26.01.12', '04.03.03', '01.01.13', '02.03.01', '01.01.10', '26.01.21'}, '85789070': {'26.01.02', '09.01.04', '02.05.04', '02.03.17', '02.05.01'}, '87140110': {'03.15.25', '03.15.24', '03.15.19'}, '87142218': {'26.17.13', '02.11.01', '02.03.22'}, '87006826': {'28.01.03'}}
After I read in the dictionary, I want to remove, from every key, the values that start with '26.' or '02.'. This may leave a key with no values (an empty set).
I do have code that works:
exclude = ('26.', '02.')
f_a_data = {}
for k, v in a_data.items():
    f_a_data.setdefault(k, [])
    for code in v:
        print(k, code, not code.startswith(exclude))
        if not code.startswith(exclude):
            f_a_data[k].append(code)
print('Filtered dict:')
print(f_a_data)
This returns a filtered dict:
Filtered dict:
{'78567908': ['04.03.03', '01.01.13', '01.01.10'], '85789070': ['09.01.04'], '87140110': ['03.15.25', '03.15.24', '03.15.19'], '87142218': [], '87006826': ['28.01.03']}
Question 1: Is this the best way to filter a dictionary?
Question 2: How could I modify the above snippet to return the values as a set, like the original dict?
Your code is fine in terms of complexity, but it can be made more Pythonic and still remain readable.
My proposal: rebuild the dictionary with nested comprehensions, using str.startswith to test whether each value should be kept:
a_data = {'78567908': {'26.01.17', '02.03.24', '26.01.12', '04.03.03', '01.01.13', '02.03.01', '01.01.10', '26.01.21'}, '85789070': {'26.01.02', '09.01.04', '02.05.04', '02.03.17', '02.05.01'}, '87140110': {'03.15.25', '03.15.24', '03.15.19'}, '87142218': {'26.17.13', '02.11.01', '02.03.22'}, '87006826': {'28.01.03'}}
exclude = ('26.', '02.')
new_data = {k: {x for x in v if not x.startswith(exclude)} for k, v in a_data.items()}
result:
>>> new_data
{'78567908': {'01.01.10', '01.01.13', '04.03.03'},
'85789070': {'09.01.04'},
'87006826': {'28.01.03'},
'87140110': {'03.15.19', '03.15.25', '03.15.24'},
'87142218': set()}
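(Here a dictionary comprehension embeds a set comprehension, since you need a set. str.startswith accepts a tuple of prefixes, so no extra loop over exclude is needed; a substring test such as s not in x would also reject codes that merely contain '02.' later on, such as '01.02.03'.)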
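For Question 2, the original loop only needs set() and add() in place of [] and append(). A minimal sketch on a trimmed copy of the question's data (the debug print is left out):

```python
a_data = {'78567908': {'26.01.17', '02.03.24', '04.03.03', '01.01.13'},
          '87142218': {'26.17.13', '02.11.01', '02.03.22'},
          '87006826': {'28.01.03'}}
exclude = ('26.', '02.')

f_a_data = {}
for k, v in a_data.items():
    # Start each key with an empty set instead of an empty list.
    codes = f_a_data.setdefault(k, set())
    for code in v:
        # str.startswith accepts a tuple and matches any of its prefixes.
        if not code.startswith(exclude):
            codes.add(code)

print(f_a_data)
```

Keys whose codes are all excluded end up with set(), matching the comprehension's result.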

Making a dictionary from a list and a dictionary

I am trying to create a dictionary of codes that I can use for queries and selections. Let's say I have a dictionary of state names and corresponding FIPS codes:
statedict ={'Alabama': '01', 'Alaska':'02', 'Arizona': '04',... 'Wyoming': '56'}
And then I have a list of FIPS codes that I have pulled in from a Map Server request:
fipslist = ['02121', '01034', '56139', '04187', '02003', '04023', '02118']
I want to combine the keys from the dictionary with the list items, matching on the first two characters of each code (e.g. all codes beginning with 01 belong to 'Alabama', etc.). My end goal is something like this:
fipsdict ={'Alabama': ['01034'], 'Alaska':['02121', '02003','02118'], 'Arizona': ['04187', '04023'],... 'Wyoming': ['56139']}
I tried to set it up like this, but it's not working correctly. Any suggestions?
fipsdict = {}
tempList = []
for items in fipslist:
    for k, v in statedict:
        if item[:2] == v in statedict:
            fipsdict[k] = statedict[v]
            fipsdict[v] = tempList.extend(item)
A one-liner with nested comprehensions:
>>> {k:[n for n in fipslist if n[:2]==v] for k,v in statedict.items()}
{'Alabama': ['01034'],
'Alaska': ['02121', '02003', '02118'],
'Arizona': ['04187', '04023'],
'Wyoming': ['56139']}
You need a new list to hold the matching FIPS codes for each state. This should work for your case:
state_to_matching_fips_map = {}
for state, two_digit_fips in statedict.items():
    matching_fips = []
    for fips in fipslist:
        if fips[:2] == two_digit_fips:
            matching_fips.append(fips)
    state_to_matching_fips_map[state] = matching_fips
>>> print(state_to_matching_fips_map)
{'Alabama': ['01034'], 'Arizona': ['04187', '04023'], 'Alaska': ['02121', '02003', '02118'], 'Wyoming': ['56139']}
For both proposed solutions I need a reversed state dictionary (I assume that each state has exactly one 2-digit code):
reverse_state_dict = {v: k for k,v in statedict.items()}
An approach based on defaultdict:
from collections import defaultdict
fipsdict = defaultdict(list)
for f in fipslist:
    fipsdict[reverse_state_dict[f[:2]]].append(f)
An approach based on groupby and dictionary comprehension:
from itertools import groupby
{reverse_state_dict[k]: list(v) for k,v
in (groupby(sorted(fipslist), key=lambda x:x[:2]))}
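An end-to-end run of the defaultdict approach, using only the four states visible in the question. It differs from the answer above in one assumption: codes whose two-digit prefix matches no state are skipped via dict.get() instead of raising a KeyError. Unlike the comprehension-based answers, states with no matching codes get no entry at all.

```python
from collections import defaultdict

# Only the four states visible in the question; the real dict has more.
statedict = {'Alabama': '01', 'Alaska': '02', 'Arizona': '04', 'Wyoming': '56'}
fipslist = ['02121', '01034', '56139', '04187', '02003', '04023', '02118']

reverse_state_dict = {v: k for k, v in statedict.items()}

fipsdict = defaultdict(list)
for f in fipslist:
    state = reverse_state_dict.get(f[:2])
    if state is None:
        # Assumption: codes whose prefix matches no state are ignored.
        continue
    fipsdict[state].append(f)

print(dict(fipsdict))
# {'Alaska': ['02121', '02003', '02118'], 'Alabama': ['01034'], 'Wyoming': ['56139'], 'Arizona': ['04187', '04023']}
```

Each code is visited once, so the cost grows with the length of fipslist rather than with states times codes.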

Return only a few keys in a dictionary? Python

I have a dictionary and I am able to get all of its keys. From searching around, a lot of people suggest putting the keys in a list; however, I need the values attached to those keys, and the values must stay the same.
mydict = {'Car':'BMW','Speed':'kph','Range':33}
for keys in mydict:
    print(keys)
What I am after is any two of the keys and their values to be printed out.
I don't fully understand what you are looking for.
If you want to print the values as well:
mydict = {'Car':'BMW','Speed':'kph','Range':33}
for keys in mydict:
    print(keys, ":", mydict[keys])
If you want to print just two of them:
mydict = {'Car':'BMW','Speed':'kph','Range':33}
from itertools import islice
def take(n, iterable):
    return list(islice(iterable, n))
n_items = take(2, mydict.items())  # iteritems() existed only in Python 2
print(n_items)
itertools is part of the standard library, so there is nothing to install with pip.
Well, even if you put them in a list, you can still get the values:
mydict = {'Car':'BMW','Speed':'kph','Range':33}
keys = list(mydict)
for key in keys:
    print(mydict[key])
If you want only two keys you can do:
keys = keys[:2]
And if you want a new dictionary using only those two keys:
mynewdict = {k:v for k,v in mydict.items() if k in keys}
And probably the shortest:
for key in list(mydict)[:2]:
    print(key, mydict[key])
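If the goal is a new dictionary holding the first two items, itertools.islice can be applied directly to items(). A minimal sketch; "first" means insertion order, which dicts guarantee since Python 3.7.

```python
from itertools import islice

mydict = {'Car': 'BMW', 'Speed': 'kph', 'Range': 33}

# islice stops after two (key, value) pairs without copying the whole dict,
# and dict() turns those pairs back into a dictionary.
first_two = dict(islice(mydict.items(), 2))

for key, value in first_two.items():
    print(key, value)
# Car BMW
# Speed kph
```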
