Making a dictionary from a list and a dictionary - python-3.x

I am trying to create a dictionary of codes that I can use for queries and selections. Let's say I have a dictionary of state names and corresponding FIPS codes:
statedict ={'Alabama': '01', 'Alaska':'02', 'Arizona': '04',... 'Wyoming': '56'}
And then I have a list of FIPS codes that I have pulled in from a Map Server request:
fipslist = ['02121', '01034', '56139', '04187', '02003', '04023', '02118']
I want to combine the keys of the dictionary with the list items, matching on the first 2 characters: each state's 2-digit code is compared with the first 2 characters of each list item (e.g. all codes beginning with 01 belong to 'Alabama', and so on). My end goal is something like this:
fipsdict ={'Alabama': ['01034'], 'Alaska':['02121', '02003','02118'], 'Arizona': ['04187', '04023'],... 'Wyoming': ['56139']}
I would try to set it up similar to this, but it's not working quite correctly. Any suggestions?
fipsdict = {}
tempList = []
for items in fipslist:
    for k, v in statedict:
        if item[:2] == v in statedict:
            fipsdict[k] = statedict[v]
            fipsdict[v] = tempList.extend(item)

A one-liner with nested comprehensions:
>>> {k:[n for n in fipslist if n[:2]==v] for k,v in statedict.items()}
{'Alabama': ['01034'],
'Alaska': ['02121', '02003', '02118'],
'Arizona': ['04187', '04023'],
'Wyoming': ['56139']}

You need to create a new list to hold the matching FIPS codes for each state. The code below should work for your case.
state_to_matching_fips_map = {}
for state, two_digit_fips in statedict.items():
    matching_fips = []
    for fips in fipslist:
        if fips[:2] == two_digit_fips:
            matching_fips.append(fips)
    state_to_matching_fips_map[state] = matching_fips
>>> print(state_to_matching_fips_map)
{'Alabama': ['01034'], 'Arizona': ['04187', '04023'], 'Alaska': ['02121', '02003', '02118'], 'Wyoming': ['56139']}

For both proposed solutions I need a reversed state dictionary (I assume that each state has exactly one 2-digit code):
reverse_state_dict = {v: k for k,v in statedict.items()}
An approach based on defaultdict:
from collections import defaultdict
fipsdict = defaultdict(list)
for f in fipslist:
    fipsdict[reverse_state_dict[f[:2]]].append(f)
An approach based on groupby and dictionary comprehension:
from itertools import groupby
{reverse_state_dict[k]: list(v) for k,v
in (groupby(sorted(fipslist), key=lambda x:x[:2]))}
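One caveat with the reverse lookup above is that a code whose 2-digit prefix is missing from statedict raises a KeyError. Here is a minimal sketch of a variant that skips such codes, assuming that is the desired behavior. The function name group_fips_by_state and the sample code '99001' are made up for illustration:

```python
from collections import defaultdict

def group_fips_by_state(statedict, fipslist):
    """Group FIPS codes under the state whose 2-digit code matches their prefix."""
    # Reverse lookup: 2-digit code -> state name (assumes codes are unique).
    code_to_state = {code: state for state, code in statedict.items()}
    grouped = defaultdict(list)
    for fips in fipslist:
        state = code_to_state.get(fips[:2])
        if state is not None:  # silently skip unknown prefixes (an assumption)
            grouped[state].append(fips)
    return dict(grouped)

statedict = {'Alabama': '01', 'Alaska': '02', 'Arizona': '04', 'Wyoming': '56'}
# '99001' is a hypothetical code with no matching state.
fipslist = ['02121', '01034', '56139', '04187', '02003', '04023', '02118', '99001']
fipsdict = group_fips_by_state(statedict, fipslist)
print(fipsdict)
```

States with no matching codes are simply absent from the result. If they should appear with an empty list, the nested comprehension over statedict.items() is the simpler choice.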

Related

How to transform a list of dicts

I have a list of dicts:
[{'prox', 'SIN'},
{'SIN', 'nerps'},
{'SIN', 'malzon'},
{'SIN', 'oportun'},
{'ANAT', 'head'},
{'ANAT', 'eyes'}]
How can I transform this list to get this as output:
[{'SIN':['prox','nerps','malzon','oportun'],'ANAT':['head','eyes']}]
I tried this approach, but it's not working:
d = {}
for dictionary in l:
    for key, (k, v) in dictionary.items:
        if key not in d:
            d[key] = {}
        d[key][k] = v
But it fails because a set has no items method.
When I iterate over the list, each element turns out to be a set, and I don't know how to use it.
[{'prox', 'SIN'},
{'SIN', 'nerps'},
{'SIN', 'malzon'},
{'SIN', 'oportun'},
{'ANAT', 'head'},
{'ANAT', 'eyes'}]
This is not a list of dicts; it is a list of sets. That changes everything: a set has no items method, and even if it had one, you would need the parentheses to call it: dictionary.items().
If you know that each set always contains one uppercase word and one lowercase word, you can sort each pair (uppercase letters sort before lowercase ones, so the uppercase word comes first) and add the result to a dict.
from collections import defaultdict
d = defaultdict(list)
for i, j in (sorted(e) for e in l):  # l is the list of sets from the question
    d[i].append(j)
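If relying on sort order feels fragile, the key can be picked explicitly with str.isupper. This is a sketch that assumes every set holds exactly one all-uppercase word and one other word; the helper name sets_to_dict is hypothetical:

```python
from collections import defaultdict

def sets_to_dict(pairs):
    """Map each all-uppercase word to the list of words paired with it."""
    result = defaultdict(list)
    for pair in pairs:
        # Split the two-element set into the uppercase key and the other word.
        key = next(w for w in pair if w.isupper())
        value = next(w for w in pair if not w.isupper())
        result[key].append(value)
    return dict(result)

l = [{'prox', 'SIN'}, {'SIN', 'nerps'}, {'SIN', 'malzon'},
     {'SIN', 'oportun'}, {'ANAT', 'head'}, {'ANAT', 'eyes'}]
d = sets_to_dict(l)
print(d)
```

A set that breaks the assumption (for example two lowercase words) makes next raise StopIteration, which at least fails loudly instead of silently mis-grouping.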

Can we get columns names sorted in the order of their tf-idf values (if exists) for each document?

I'm using sklearn's TfidfVectorizer. For each document, I'm trying to get a list of the column names sorted by their tf-idf values in decreasing order. If a document consists only of stop words, its list should be empty.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
msg = ["My name is Venkatesh",
       "Trying to get the significant words for each vector",
       "I want to get the list of words name in the decreasing order of their tf-idf values for each vector",
       "is to my"]
stopwords=['is','to','my','the','for','in','of','i','their']
tfidf_vect = TfidfVectorizer(stop_words=stopwords)
tfidf_matrix=tfidf_vect.fit_transform(msg)
pd.DataFrame(tfidf_matrix.toarray(),
columns=tfidf_vect.get_feature_names_out())
I want to generate a column containing the list of word names in decreasing order of their tf-idf values.
The column would look like this:
['venkatesh','name']
['significant','trying','vector','words','each','get']
['decreasing','idf','list','order','tf','values','want','each','get','name','vector','words']
[] # empty list, since the document consists only of stop words
That is the primary result I'm looking for. It would be great to also get, for each document, a sorted dict with the tf-idf values as keys and the list of words associated with each value as values.
The result would look like this:
{'0.785288':['venkatesh'],'0.619130':['name']}
{'0.47212':['significant','trying'],'0.372225':['vector','words','each','get']}
{'0.314534':['decreasing','idf','list','order','tf','values','want'],'0.247983':['each','get','name','vector','words']}
{} # empty dict, since the document consists only of stop words
I think this code does what you want and avoids using pandas:
from itertools import groupby
sort_func = lambda v: v[0] # sort by first value in tuple
all_dicts = []
for row in tfidf_matrix.toarray():
    # get_feature_names() was removed in recent scikit-learn releases; use get_feature_names_out()
    sorted_vals = sorted(zip(row, tfidf_vect.get_feature_names_out()), key=sort_func, reverse=True)
    all_dicts.append({val: [g[1] for g in group] for val, group in groupby(sorted_vals, key=sort_func) if val != 0})
You could make it even less readable and put it all in a single comprehension! :-)
The combination of the following function and to_dict() method on dataframe can give you the desired output.
def ret_dict(_dict):
    # Get a list of unique values
    list_keys = list(set(_dict.values()))
    processed_dict = {key: [] for key in list_keys}
    # Prepare dictionary
    for key, value in _dict.items():
        processed_dict[value].append(str(key))
    # Sort the keys (as you want)
    sorted_keys = sorted(processed_dict, key=lambda x: x, reverse=True)
    sorted_keys = [keys for keys in sorted_keys if keys > 0]
    # Return the dictionary with sorted keys
    sorted_dict = {k: processed_dict[k] for k in sorted_keys}
    return sorted_dict
Then:
res = pd.DataFrame(tfidf_matrix.toarray(), columns=tfidf_vect.get_feature_names_out())
list_dict = res.to_dict('records')
processed_list = []
for _dict in list_dict:
    processed_list.append(ret_dict(_dict))
processed_list now contains the desired output. For instance, processed_list[1] would be:
{0.47212002654617047: ['significant', 'trying'], 0.3722248517590162: ['each', 'get', 'vector', 'words']}
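The grouping step can also be tried in isolation, without sklearn or pandas. Below is a minimal sketch, assuming one row of scores and a parallel list of feature names. The function name words_by_score and the sample numbers are invented for illustration and are not real tf-idf output:

```python
from collections import defaultdict

def words_by_score(scores, names):
    """Return {score: [words]} with scores in decreasing order, skipping zeros."""
    grouped = defaultdict(list)
    for score, name in zip(scores, names):
        if score > 0:
            grouped[score].append(name)
    # Dicts keep insertion order, so insert the keys from highest to lowest.
    return {score: grouped[score] for score in sorted(grouped, reverse=True)}

# Hypothetical row: two words share 0.5, two share 0.25, one is absent (0.0).
names = ['each', 'get', 'name', 'significant', 'trying']
scores = [0.25, 0.25, 0.0, 0.5, 0.5]
result = words_by_score(scores, names)
print(result)

# Flattening the dict gives the plain list of words in decreasing score order.
ordered_words = [w for words in result.values() for w in words]
print(ordered_words)
```

Grouping on exact float equality works here because identical tf-idf values within one row come from the same computation; values that differ only by rounding noise would land in separate groups.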

Remove values from dictionary

I have a large dictionary and I am trying to remove values from keys if they start with certain values. Below is a small example of the dictionary.
a_data = {'78567908': {'26.01.17', '02.03.24', '26.01.12', '04.03.03', '01.01.13', '02.03.01', '01.01.10', '26.01.21'}, '85789070': {'26.01.02', '09.01.04', '02.05.04', '02.03.17', '02.05.01'}, '87140110': {'03.15.25', '03.15.24', '03.15.19'}, '87142218': {'26.17.13', '02.11.01', '02.03.22'}, '87006826': {'28.01.03'}}
After I read in the dictionary, I want to remove, from every key, the values that start with '26.' or '02.'. This may leave a key with no values (an empty set).
I do have code that works:
exclude = ('26.', '02.')
f_a_data = {}
for k, v in a_data.items():
    f_a_data.setdefault(k,[])
    for code in v:
        print (k, code, not code.startswith(exclude))
        if not code.startswith(exclude):
            f_a_data[k].append(code)
print('Filtered dict:')
print(f_a_data)
This returns a filtered dict:
Filtered dict:
{'78567908': ['04.03.03', '01.01.13', '01.01.10'], '85789070': ['09.01.04'], '87140110': ['03.15.25', '03.15.24', '03.15.19'], '87142218': [], '87006826': ['28.01.03']}
Question 1: Is this the best way to filter a dictionary?
Question 2: How could I modify the above snippet to return the values in a set, like the original dict?
Your code is quite all right in complexity terms but can be "pythonized" a little and still remain readable.
My proposal: you can rebuild the dictionary using nested comprehensions, with all to test whether each value should be kept:
a_data = {'78567908': {'26.01.17', '02.03.24', '26.01.12', '04.03.03', '01.01.13', '02.03.01', '01.01.10', '26.01.21'}, '85789070': {'26.01.02', '09.01.04', '02.05.04', '02.03.17', '02.05.01'}, '87140110': {'03.15.25', '03.15.24', '03.15.19'}, '87142218': {'26.17.13', '02.11.01', '02.03.22'}, '87006826': {'28.01.03'}}
exclude = ('26.', '02.')
new_data = {k: {x for x in v if all(not x.startswith(s) for s in exclude)} for k, v in a_data.items()}
result:
>>> new_data
{'78567908': {'01.01.10', '01.01.13', '04.03.03'},
'85789070': {'09.01.04'},
'87006826': {'28.01.03'},
'87140110': {'03.15.19', '03.15.25', '03.15.24'},
'87142218': set()}
(This is a dictionary comprehension that embeds a set comprehension, since you need a set, which in turn uses a generator expression inside all.)
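Since str.startswith also accepts a tuple of prefixes, the prefix test can be written without all. Here is a minimal sketch on a shortened copy of the data. The helper name drop_prefixed and the key '99999999' are made up for illustration:

```python
def drop_prefixed(data, prefixes):
    """Return a new dict of sets without the values that start with any prefix."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return {k: {x for x in v if not x.startswith(prefixes)} for k, v in data.items()}

a_data = {
    '85789070': {'26.01.02', '09.01.04', '02.05.04'},
    '87142218': {'26.17.13', '02.11.01'},
    # Hypothetical entry: contains '02.' but does not start with it, so it is kept.
    '99999999': {'01.02.26'},
}
new_data = drop_prefixed(a_data, ('26.', '02.'))
print(new_data)
```

The last entry shows why a prefix test is safer than a substring test (s not in x): a substring test would also drop values that merely contain '02.' somewhere in the middle.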

List of dictionaries set comprehension calculation

My data structure is a list of dicts. I would like to run a function over the values of certain keys, and then output only a certain number of dictionaries as the result.
from datetime import datetime
from dateutil.parser import parse
today = '05/17/18'
adict = [{'taskid':1,'desc':'task1','complexity':5,'dl':'05/28/18'},{'taskid':2,'desc':'task2','complexity':3,'dl':'05/20/18'},
{'taskid':3,'desc':'task3','complexity':1,'dl':'05/25/18'}]
def conv_tm(t):
    return datetime.strptime(t,'%m/%d/%y')
def days(obj):
    day = conv_tm(today)
    dl = conv_tm(obj)
    dur = (dl-day).days
    if dur <0:
        dur = 1
    return dur
I found the easiest way to process the dates for the 'dl' key was to run this list comprehension:
vals = [days(i['dl']) for i in adict]
#this also worked, but I didn't like it as much
vals = list(map(lambda x: days(x['dl']), adict))
Now I need to do 2 things: 1) zip this list back up to the 'dl' key, and 2) return or print a (random) set of 2 dicts without altering the original dict, perhaps like so:
{'taskid':1,'desc':'task1','dl':11,'complexity':5}
{'taskid':3,'desc':'task3','dl':8,'complexity':1}
Cheers
You could produce the new dicts directly like this:
new_dicts = [{**d, 'dl': days(d['dl'])} for d in adict]
If you need vals separately, you can use it to do this as well:
new_dicts = [{**d, 'dl': v} for d, v in zip(adict, vals)]
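For the second part of the question, picking 2 of the converted dicts at random, random.sample is one option. This is a sketch under the assumption that days is simplified to take today as a parameter. Because the list comprehension builds new dicts, adict itself is left untouched:

```python
import random
from datetime import datetime

def days(deadline, today='05/17/18'):
    """Days from today until the deadline; past deadlines count as 1."""
    fmt = '%m/%d/%y'
    dur = (datetime.strptime(deadline, fmt) - datetime.strptime(today, fmt)).days
    return 1 if dur < 0 else dur

adict = [
    {'taskid': 1, 'desc': 'task1', 'complexity': 5, 'dl': '05/28/18'},
    {'taskid': 2, 'desc': 'task2', 'complexity': 3, 'dl': '05/20/18'},
    {'taskid': 3, 'desc': 'task3', 'complexity': 1, 'dl': '05/25/18'},
]

# Build converted copies; {**d, ...} creates a new dict for every task.
new_dicts = [{**d, 'dl': days(d['dl'])} for d in adict]

# Pick 2 distinct dicts at random (the selection differs between runs).
picked = random.sample(new_dicts, 2)
for d in picked:
    print(d)
```

random.sample never returns the same element twice, which is why it fits better here than calling random.choice two times.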

Return only a few keys in a dictionary? Python

I have a dictionary with values attached. I am able to get all the keys. I have searched around, and a lot of people say to put the keys in a list. However, I need the values attached to each key, and the values must stay the same.
mydict = {'Car':'BMW','Speed':'kph','Range':33}
for keys in mydict:
    print(keys)
What I am after is any two of the keys and their values to be printed out.
I don't fully understand what you are looking for.
Do you want to print the values as well? Then use:
mydict = {'Car':'BMW','Speed':'kph','Range':33}
for keys in mydict:
    print(keys,":",mydict[keys])
Do you want to print just 2 of them?
mydict = {'Car':'BMW','Speed':'kph','Range':33}
from itertools import islice
def take(n, iterable):
    return list(islice(iterable, n))
n_items = take(2, mydict.items())  # items() in Python 3; iteritems() exists only in Python 2
print(n_items)
itertools is part of the standard library, so there is nothing to install with pip.
Well, even if you put them in a list, you can still get the values:
mydict = {'Car':'BMW','Speed':'kph','Range':33}
keys = list(mydict)
for key in keys:
    print(mydict[key])
If you want only two keys you can do:
keys = keys[:2]
And if you want a new dictionary using only those two keys:
mynewdict = {k:v for k,v in mydict.items() if k in keys}
And probably the shortest:
for key in list(mydict)[:2]:
    print(key, mydict[key])
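Combining the ideas above, itertools.islice can take the first two key/value pairs directly, without building a full list of keys. A small sketch follows; dicts keep insertion order in Python 3.7+, so "first" here means first inserted:

```python
from itertools import islice

mydict = {'Car': 'BMW', 'Speed': 'kph', 'Range': 33}

# Take the first 2 (key, value) pairs lazily and build a new dict from them.
first_two = dict(islice(mydict.items(), 2))

for key, value in first_two.items():
    print(key, ":", value)
```

The original mydict is not modified, and the values in first_two are the same objects as in mydict.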
