Bracket for values in dictionary python 3 - python-3.x

I am wondering how can I manage the bracket[] for the values of that key. For instance, a dictionary named "diamond" which contains = {'a':'Roger', 'c':'Rafael', 'b':'Roger'}. I am supposed to re-organize the data in diamond so that it would become rediamond = {'Rafael':['c'], 'Roger': '['a','b']'}.
My Code
def group_by_owners(files):
store = dict()
for key,value in files.items():
if value in store:
store[value]=(store[value], [key])
else:
store[value]=[key]
return store
files = {
'a': 'Rafael',
'b': 'Roger',
'c': 'Roger'
}
print(group_by_owners(files))
My Output
{'Rafael': (['a']), 'Roger': ['b'],['c']}
Expected Output
{{'Rafael': (['a']), 'Roger': ['b','c']}
So if there is to be 3 values for Roger, it should organize like ['','',''] .

You should use a defaultdict:
from collections import defaultdict
diamond = {'a':'Roger', 'c':'Rafael', 'b':'Roger'}
rediamond = defaultdict(list)
for k, v in diamond.items():
rediamond[v].append(k)

Related

column comprehension robust to missing values

I have only been able to create a two column data frame from a defaultdict (termed output):
df_mydata = pd.DataFrame([(k, v) for k, v in output.items()],
columns=['id', 'value'])
What I would like to be able to do is using this basic format also initiate the dataframe with three columns: 'id', 'id2' and 'value'. I have a separate defined dict that contains the necessary look up info, called id_lookup.
So I tried:
df_mydata = pd.DataFrame([(k, id_lookup[k], v) for k, v in output.items()],
columns=['id', 'id2','value'])
I think I'm doing it right, but I get key errors. I will only know if id_lookup is exhaustive for all possible encounters in hindsight. For my purposes, simply putting it all together and placing 'N/A` or something for those types of errors will be acceptable.
Would the above be appropriate for calculating a new column of data using a defaultdict and a simple lookup dict, and how might I make it robust to key errors?
Here is an example of how you could do this:
import pandas as pd
from collections import defaultdict
df = pd.DataFrame({'id': [1, 2, 3, 4],
'value': [10, 20, 30, 40]})
id_lookup = {1: 'A', 2: 'B', 3: 'C'}
new_column = defaultdict(str)
# Loop through the df and populate the defaultdict
for index, row in df.iterrows():
try:
new_column[index] = id_lookup[row['id']]
except KeyError:
new_column[index] = 'N/A'
# Convert the defaultdict to a Series and add it as a new column in the df
df['id2'] = pd.Series(new_column)
# Print the updated DataFrame
print(df)
which gives:
id value id2
0 1 10 A
1 2 20 B
2 3 30 C
3 4 40 N/A
​

split dataframe into a dictionary of dictionaries

I have a dataframe containing 4 columns. I want to use 2 of the columns as keys for a dictionary of dictionaries, where the values inside are the remaining 2 columns (so a dataframe)
birdies = pd.DataFrame({'Habitat' : ['Captive', 'Wild', 'Captive', 'Wild'],
'Animal': ['Falcon', 'Falcon','Parrot', 'Parrot'],
'Max Speed': [380., 370., 24., 26.],
'Color': ["white", "grey", "green", "blue"]})
#this should ouput speed and color
birdies_dict["Falcon"]["Wild"]
#this should contain a dictionary, which the keys are 'Captive','Wild'
birdies_dict["Falcon"]
I have found a way to generate a dictionary of dataframes with a single column as a key, but not with 2 columns as keys:
birdies_dict = {k:table for k,table in birdies.groupby("Animal")}
I suggest to use defaultdict for this, a solution for the 2 column problem is:
from collections import defaultdict
d = defaultdict(dict)
for (hab, ani), _df in df.groupby(['Habitat', 'Animal']):
d[hab][ani] = _df
This breaks with 2 columns, if you want it with a higher depth, you can just define a recursive defaultdict:
from collections import defaultdict
recursive_dict = lambda: defaultdict(recursive_dict)
dct = recursive_dict()
dct[1][2][3] = ...
Pass to_dict to the inside:
birdies_dict = {k:d.to_dict() for k,d in birdies.groupby('Animal')}
birdies_dict['Falcon']['Habitat']
Output:
{0: 'Captive', 1: 'Wild'}
Or do you mean:
out = birdies.set_index(['Animal','Habitat'])
out.loc[('Falcon','Captive')]
which gives:
Max Speed 380
Color white
Name: (Falcon, Captive), dtype: object
IIUC:
birdies_dict = {k:{habitat: table[['Max Speed', 'Color']].to_numpy() for habitat in table['Habitat'].to_numpy()} for k,table in birdies.groupby("Animal")}
OR
birdies_dict = {k:{habitat: table[['Max Speed', 'Color']] for habitat in table['Habitat'].to_numpy()} for k,table in birdies.groupby("Animal")}
#In this case inner key will have a dataframe
OR
birdies_dict = {k:{inner_key: inner_table for inner_key, inner_table in table.groupby('Habitat')} for k,table in birdies.groupby("Animal")}

Appending value to a list based on dictionary key

I started writing Python scripts for my research this past summer, and have been picking up the language as I go. For my current work, I have a dictionary of lists, sample_range_dict, that is initialized with descriptor_cols as the keys and empty lists for values. Sample code is below:
import numpy as np
import pandas as pd
def rangeFunc(arr):
return (np.max(arr) - np.min(arr))
df_sample = pd.DataFrame(np.random.rand(2000, 4), columns=list("ABCD")) #random dataframe for testing
col_list = df_sample.columns
sample_range_dict = dict.fromkeys(col_list, []) #creates dictionary where each key pairs with an empty list
rand_df = df_sample.sample(n=20) #make a new dataframe with 20 random rows of df_sample
I want to go through each column from rand_df and calculate the range of values, putting each range in the list with the specified column name (e.g. sample_range_dict["A"] = [range in column A]). The following is the code I initially thought to use for this:
for d in col_list:
sample_range_dict[d].append(rangeFunc(rand_df[d].tolist()))
However, instead of each key having one item in the list, printing sample_range_dict shows each key having an identical list of 4 values:
{'A': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'B': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'C': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'D': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744]}
I've determined that the first value is the range for "A", second value is the range for "B", and so on. My question is about why this is happening, and how I could rewrite the code in order to get one item in the list for each key.
P.S. I'm looking to make this an iterative process, hence using lists instead of single numbers.
The issue is this line:
sample_range_dict = dict.fromkeys(col_list, [])
You only created one list. You don't have four lists with the same elements; you have one list, and four references to it. When you add to it via one reference, the element is visible through the other references, because it's the same list:
>>> a = dict.fromkeys(['x', 'y', 'z'], [])
>>> a['x'] is a['y']
True
>>> a['x'].append(5)
>>> a['y']
[5]
If you want each key to have a different list, either create a new list for each key:
>>> a = { k: [] for k in ['x', 'y', 'z'] }
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
Or use a defaultdict which will do it for you:
>>> from collections import defaultdict
>>> a = defaultdict(list)
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]

return dictionary of file names as keys and word lists with words unique to file as values

I am trying to write a function to extract only words unique to each key and list them in a dictionary output like {"key1": "unique words", "key2": "unique words", ... }. I start out with a dictionary. To test with I created a simple dictionary:
d = {1:["one", "two", "three"], 2:["two", "four",
"five"], 3:["one","four", "six"]}
My output should be:
{1:"three",
2:"five",
3:"six"}
I am thinking maybe split in to separate lists
def return_unique(dct):
Klist = list(dct.keys())
Vlist = list(dct.values())
aList = []
for i in range(len(Vlist)):
for j in Vlist[i]:
if
What I'm stuck on is how do I tell Python to do this: if Vlist[i][j] is not in the rest of Vlist then aList.append(Vlist[i][j]).
Thank you.
You can try something like this:
def return_unique(data):
all_values = []
for i in data.values(): # Get all values
all_values = all_values + i
unique_values = set([x for x in all_values if all_values.count(x) == 1]) # Values which are not duplicated
for key, value in data.items(): # For Python 3.x ( For Python 2.x -> data.iteritems())
for item in value: # Comparing values of two lists
for item1 in unique_values:
if item == item1:
data[key] = item
return data
d = {1:["one", "two", "three"], 2:["two", "four", "five"], 3:["one","four", "six"]}
print (return_unique(d))
result >> {1: 'three', 2: 'five', 3: 'six'}
Since a key may have more than one unique word associated with it, it makes sense for the values in the new dictionary to be a container type object to hold the unique words.
The set difference operator returns the difference between 2 sets:
>>> a = set([1, 2, 3])
>>> b = set([2, 4, 6])
>>> a - b
{1, 3}
We can use this to get the values unique to each key. Packaging these into a simple function yields:
def unique_words_dict(data):
res = {}
values = []
for k in data:
for g in data:
if g != k:
values += data[g]
res[k] = set(data[k]) - set(values)
values = []
return res
>>> d = {1:["one", "two", "three"],
2:["two", "four", "five"],
3:["one","four", "six"]}
>>> unique_words_dict(d)
{1: {'three'}, 2: {'five'}, 3: {'six'}}
If you only had to do this once, then you might be interested in the less efficeint but more consice dictionary comprehension:
>>> from functools import reduce
>>> {k: set(d[k]) - set(reduce(lambda a, b: a+b, [d[g] for g in d if g!=k], [])) for k in d}
{1: {'three'}, 2: {'five'}, 3: {'six'}}

How do I create a default dictionary of dictionaries

I am trying to write some code that involves creating a default dictionary of dictionaries. However, I have no idea how to initialise/create such a thing. My current attempt looks something like this:
from collections import defaultdict
inner_dict = {}
dict_of_dicts = defaultdict(inner_dict(int))
The use of this default dict of dictionaries is to for each pair of words that I produce from a file I open (e.g. [['M UH M', 'm oo m']] ), to set each segment of the first word delimited by empty space as a key in the outer dictionary, and then for each segment in the second word delimited by empty space count the frequency of that segment.
For example
[['M UH M', 'm oo m']]
(<class 'dict'>, {'M': {'m': 2}, 'UH': {'oo': 1}})
Having just run this now it doesn't seem to have output any errors, however I was just wondering if something like this will actually produce a default dictionary of dictionaries.
Apologies if this is a duplicate, however previous answers to these questions have been confusing and in a different context.
To initialise a defaultdict that creates dictionaries as its default value you would use:
d = defaultdict(dict)
For this particular problem, a collections.Counter would be more suitable
>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)
>>> for a, b in zip(*[x.split() for x in ['M UH M', 'm oo m']]):
... d[a][b] += 1
>>> print(d)
defaultdict(collections.Counter,
{'M': Counter({'m': 2}), 'UH': Counter({'oo': 1})})
Edit
You expressed interest in a comment about the equivalent without a Counter. Here is the equivalent using a plain dict
>>> from collections import defaultdict
>>> d = defaultdict(dict)
>>> for a, b in zip(*[x.split() for x in ['M UH M', 'm oo m']]):
... d[a][b] = d[a].get(b, 0) + 1
>>> print(d)
defaultdict(dict, {'M': {'m': 2}, 'UH': {'oo': 1}})
You also could a use a normal dictionary and its setdefault method.
my_dict.setdefault(key, default) will look up my_dict[key] and ...
... if the key already exists, return its current value without modifying it, or ...
... assign the default value (my_dict[key] = default) and then return that.
So you can call my_dict.setdefault(key, {}) always when you want to get a value from your outer dictionary instead of the normal my_dict[key] to retrieve either the real value assigned with this key if it#s present, or to get a new empty dictionary as default value which gets automatically stored into your outer dictionary as well.
Example:
outer_dict = {"M": {"m": 2}}
inner_dict = d.setdefault("UH", {})
# outer_dict = {"M": {"m": 2}, "UH": {}}
# inner_dict = {}
inner_dict["oo"] = 1
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1}}
# inner_dict = {"oo": 1}
inner_dict = d.setdefault("UH", {})
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1}}
# inner_dict = {"oo": 1}
inner_dict["xy"] = 3
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1, "xy": 3}}
# inner_dict = {"oo": 1, "xy": 3}
This way you always get a valid inner_dict, either an empty default one or the one that's already present for the given key. As dictionaries are mutable data types, modifying the returned inner_dict will also modify the dictionary inside outer_dict.
The other answers propose alternative solutions or show you can make a default dictionary of dictionaries using d = defaultdict(dict)
but the question asked how to make a default dictionary of default dictionaries, my navie first attempt was this:
from collections import defaultdict
my_dict = defaultdict(defaultdict(list))
however this throw an error: *** TypeError: first argument must be callable or None
so my second attempt which works is to make a callable using the lambda key word to make an anonymous function:
from collections import defaultdict
my_dict = defaultdict(lambda: defaultdict(list))
which is more concise than the alternative method using a regular function:
from collections import defaultdict
def default_dict_maker():
return defaultdict(list)
my_dict = defaultdict(default_dict_maker)
you can check it works by assigning:
my_dict[2][3] = 5
my_dict[2][3]
>>> 5
or by trying to return a value:
my_dict[0][0]
>>> []
my_dict[5]
>>> defaultdict(<class 'list'>, {})
tl;dr
this is your oneline answer my_dict = defaultdict(lambda: defaultdict(list))

Resources