sort values of lists inside dictionary based on length of characters - python-3.x

d = {'A': ['A11117',
'33465'
'17160144',
'A11-33465',
'3040',
'A11-33465 W1',
'nor'], 'B': ['maD', 'vern', 'first', 'A2lRights']}
I have a dictionary d and I would like to sort the values in each list by string length, longest first. For instance, for key A the value 'A11-33465 W1' would come first because it contains 12 characters, followed by 'A11-33465' because it contains 9 characters, and so on. I would like this output:
d = {'A': ['A11-33465 W1',
'A11-33465',
'17160144',
'A11117',
'33465',
'3040',
'nor'],
'B': ['A2lRights',
'first',
'vern',
'maD']}
(I understand that dictionaries cannot be sorted, but the answers I found below appear to produce a sorted dictionary; they did not work for me.)
I have tried the following
python sorting dictionary by length of values
print(' '.join(sorted(d, key=lambda k: len(d[k]), reverse=True)))
Sort a dictionary by length of the value
sorted_items = sorted(d.items(), key = lambda item : len(item[1]))
newd = dict(sorted_items[-2:])
How do I sort a dictionary by value?
import operator
sorted_x = sorted(d.items(), key=operator.itemgetter(1))
But none of them gives me what I am looking for.
How do I get my desired output?

You are not sorting the dict, you are sorting the lists inside it. The simplest approach is a loop that sorts each list in place:
for k, lst in d.items():
    lst.sort(key=len, reverse=True)
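This will turn d into the following. Note that the literal in the question is missing a comma after '33465', so Python concatenates it with '17160144' into the single string '3346517160144':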
{'A': ['3346517160144', 'A11-33465 W1', 'A11-33465', 'A11117', '3040', 'nor'],
'B': ['A2lRights', 'first', 'vern', 'maD']}
If you want to keep the original data intact, use a comprehension like:
sorted_d = {k: sorted(lst, key=len, reverse=True) for k, lst in d.items()}
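
Python's sort is stable, and reverse=True preserves that, so strings of equal length keep their original relative order. A minimal sketch using the question's data with the missing comma restored; the ties list and the alphabetical tie-break are illustrative assumptions that go beyond the question:

```python
# Data from the question, with the comma after '33465' restored.
d = {
    'A': ['A11117', '33465', '17160144', 'A11-33465', '3040',
          'A11-33465 W1', 'nor'],
    'B': ['maD', 'vern', 'first', 'A2lRights'],
}

# Longest string first; the original lists are left untouched.
sorted_d = {k: sorted(lst, key=len, reverse=True) for k, lst in d.items()}
print(sorted_d)

# Hypothetical list with two strings of equal length.
ties = ['bb', 'aa', 'c']

# Stable sort: 'bb' stays ahead of 'aa' because it came first.
stable_order = sorted(ties, key=len, reverse=True)

# Tuple key: longest first, then alphabetical among equal lengths.
alpha_order = sorted(ties, key=lambda s: (-len(s), s))

print(stable_order)
print(alpha_order)
```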

Related

column comprehension robust to missing values

I have only been able to create a two column data frame from a defaultdict (termed output):
df_mydata = pd.DataFrame([(k, v) for k, v in output.items()],
columns=['id', 'value'])
What I would like to do, using this same basic format, is initialize the dataframe with three columns: 'id', 'id2' and 'value'. I have a separately defined dict, called id_lookup, that contains the necessary lookup info.
So I tried:
df_mydata = pd.DataFrame([(k, id_lookup[k], v) for k, v in output.items()],
columns=['id', 'id2','value'])
I think I'm doing it right, but I get key errors. I will only know in hindsight whether id_lookup is exhaustive for all possible encounters. For my purposes, simply putting it all together and placing 'N/A' or something similar for those errors will be acceptable.
Would the above be appropriate for calculating a new column of data using a defaultdict and a simple lookup dict, and how might I make it robust to key errors?
Here is an example of how you could do this:
import pandas as pd
from collections import defaultdict
df = pd.DataFrame({'id': [1, 2, 3, 4],
'value': [10, 20, 30, 40]})
id_lookup = {1: 'A', 2: 'B', 3: 'C'}
new_column = defaultdict(str)
# Loop through the df and populate the defaultdict
for index, row in df.iterrows():
    try:
        new_column[index] = id_lookup[row['id']]
    except KeyError:
        new_column[index] = 'N/A'
# Convert the defaultdict to a Series and add it as a new column in the df
df['id2'] = pd.Series(new_column)
# Print the updated DataFrame
print(df)
which gives:
   id  value  id2
0   1     10    A
1   2     20    B
2   3     30    C
3   4     40  N/A
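
A shorter alternative may be to keep the comprehension from the question and swap the indexing for dict.get, which returns a fallback instead of raising KeyError. The sketch below assumes output and id_lookup behave like ordinary dicts; the sample values are made up for illustration, and pandas is left out so the snippet runs on its own:

```python
from collections import defaultdict

# Hypothetical stand-ins for the objects described in the question.
output = defaultdict(int, {1: 10, 2: 20, 3: 30, 4: 40})
id_lookup = {1: 'A', 2: 'B', 3: 'C'}

# dict.get(key, default) returns the default when the key is missing.
rows = [(k, id_lookup.get(k, 'N/A'), v) for k, v in output.items()]

for row in rows:
    print(row)
```

The resulting rows can then be passed to pd.DataFrame(rows, columns=['id', 'id2', 'value']) in the same way as in the question.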

Appending value to a list based on dictionary key

I started writing Python scripts for my research this past summer, and have been picking up the language as I go. For my current work, I have a dictionary of lists, sample_range_dict, that is initialized with descriptor_cols as the keys and empty lists for values. Sample code is below:
import numpy as np
import pandas as pd
def rangeFunc(arr):
    return (np.max(arr) - np.min(arr))
df_sample = pd.DataFrame(np.random.rand(2000, 4), columns=list("ABCD")) #random dataframe for testing
col_list = df_sample.columns
sample_range_dict = dict.fromkeys(col_list, []) #creates dictionary where each key pairs with an empty list
rand_df = df_sample.sample(n=20) #make a new dataframe with 20 random rows of df_sample
I want to go through each column from rand_df and calculate the range of values, putting each range in the list with the specified column name (e.g. sample_range_dict["A"] = [range in column A]). The following is the code I initially thought to use for this:
for d in col_list:
    sample_range_dict[d].append(rangeFunc(rand_df[d].tolist()))
However, instead of each key having one item in the list, printing sample_range_dict shows each key having an identical list of 4 values:
{'A': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'B': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'C': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744],
'D': [0.8404352070810013,
0.9766398946246098,
0.9364714925930782,
0.9801082480908744]}
I've determined that the first value is the range for "A", second value is the range for "B", and so on. My question is about why this is happening, and how I could rewrite the code in order to get one item in the list for each key.
P.S. I'm looking to make this an iterative process, hence using lists instead of single numbers.
The issue is this line:
sample_range_dict = dict.fromkeys(col_list, [])
You only created one list. You don't have four lists with the same elements; you have one list, and four references to it. When you add to it via one reference, the element is visible through the other references, because it's the same list:
>>> a = dict.fromkeys(['x', 'y', 'z'], [])
>>> a['x'] is a['y']
True
>>> a['x'].append(5)
>>> a['y']
[5]
If you want each key to have a different list, either create a new list for each key:
>>> a = { k: [] for k in ['x', 'y', 'z'] }
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
Or use a defaultdict which will do it for you:
>>> from collections import defaultdict
>>> a = defaultdict(list)
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
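
Applied to the loop from the question, the fix might look like the sketch below. To stay free of third-party dependencies it uses plain lists in place of the DataFrame columns; the sample numbers and the names columns and range_func are illustrative assumptions, not the asker's code:

```python
# Hypothetical sample columns standing in for rand_df.
columns = {
    "A": [1, 9, 5],
    "B": [2, 3, 7],
}

def range_func(values):
    """Return the spread (max minus min) of a sequence of numbers."""
    return max(values) - min(values)

# One fresh list per key, unlike dict.fromkeys(keys, []).
sample_range_dict = {name: [] for name in columns}

# Each pass appends only to the list that belongs to that column.
for name, values in columns.items():
    sample_range_dict[name].append(range_func(values))

print(sample_range_dict)
```

Running the loop again on a new sample appends a second range to each list, which matches the iterative use described in the question.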

python return list of sorted dictionary keys

I'm sure this has been asked and answered, but I can't find it. I have this dictionary:
{'22775': 15.9,
'22778': 29.2,
'22776': 20.25,
'22773': 9.65,
'22777': 22.9,
'22774': 12.45}
Each key is a string and each value is a float.
I want to list the key strings in a tk listbox to allow the user to select one and then use the corresponding float in a calculation to determine a delay factor in an event.
I have this code:
def dic_entry(line):
    #Create key:value pairs from string
    key, sep, value = line.strip().partition(":")
    return key, float(value)
with open(filename1) as f_obj:
    s = dict(dic_entry(line) for line in f_obj)
print (s) #for testing only
s_ord = sorted(s.items(),key=lambda x: x[1])
print (s_ord)
The first print gets me
{'22775': 15.9,
'22778': 29.2,
'22776': 20.25,
'22773': 9.65,
'22777': 22.9,
'22774': 12.45}
as expected. The second, which I hoped would give me an ordered list of keys gets me
[('22773', 9.65),
('22774', 12.45),
('22775', 15.9),
('22776', 20.25),
('22777', 22.9),
('22778', 29.2)].
I have tried using sorteddictionary from the collections module and it gives me a sorted dictionary, but I'm having trouble extracting a list of keys.
s_ord2 = []
for keys in s.items():
    s_ord2.append (keys)
print (s_ord2)
gives me a list of key value pairs:
[('22776', 20.25),
('22777', 22.9),
('22774', 12.45),
('22773', 9.65),
('22778', 29.2),
('22775', 15.9)]
I'm sure I'm doing something dumb, I just don't know what it is.
You're using items when you want to use keys:
In [1]: d = {'z': 3, 'b': 4, 'a': 9}
In [2]: sorted(d.keys())
Out[2]: ['a', 'b', 'z']
In [3]: sorted(d.items())
Out[3]: [('a', 9), ('b', 4), ('z', 3)]
d.items() gives you tuples of (key, value); d.keys() gives you just the keys.
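
If the aim is the keys ordered by their float values rather than by the key strings, the dictionary's own lookup can serve as the sort key. A minimal sketch using the data from the question; the variable names are illustrative:

```python
s = {'22775': 15.9, '22778': 29.2, '22776': 20.25,
     '22773': 9.65, '22777': 22.9, '22774': 12.45}

# Iterating a dict yields its keys, so sorted(s) sorts the key strings.
keys_by_name = sorted(s)

# key=s.get sorts the keys by the value each one maps to.
keys_by_value = sorted(s, key=s.get)

print(keys_by_name)
print(keys_by_value)
```

For this particular data the two orderings coincide, because the values happen to increase with the keys. Either list can be inserted into the listbox, and s[selected_key] then returns the matching float.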

testing if the values of a dictionary are non zero with all() function

I use Python 3
I want to check if all of my tested values in the nested dictionary are non 0.
So here is the simplified example dict:
d = {'a': {'1990': 10, '1991': 0, '1992': 30},
'b': {'1990': 15, '1991': 40, '1992': 0}}
and I want to test if for both dicts 'a' and 'b' the values of the keys '1990' and '1991' are not zero
for i in d:
    for k in range(2):
        year = 1990
        year = year + k
        if all((d[i][str(year)]) != 0):
            print(d[i])
so it should only return b, because a['1991']=0
but this is the first time I have worked with the all() function, and I get the error: TypeError: 'bool' object is not iterable
the error is in the if all() line
thank you very much!
This can be done a bit more generally with a list comprehension where you iterate over the items in dict d. A simple comprehension to iterate over the keys and values in our dictionary looks like this:
>>> [k for k, v in d.items()]
['a', 'b']
In the above, k holds each key and v each value. A comprehension can also have an if clause, which lets you filter out the items you don't want. So we define years = ('1990', '1991'). Now we can write another comprehension to test our year values.
To iterate over only 'a', we could do this:
>>> [d['a'][y] for y in years]
[10, 0]
>>> all([d['a'][y] for y in years])
False
Gluing the whole thing together:
>>> d={'a' :{ '1990': 10, '1991':0, '1992':30},'b':{ '1990':15, '1991':40, '1992':0}}
>>> years = ('1990', '1991')
>>> [k for k, v in d.items() if all([v[y] for y in years])]
['b']
See the Python docs for more information on list comprehensions.
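
As for the TypeError in the question: all() expects an iterable, but d[i][str(year)] != 0 evaluates to a single bool. A sketch of the original loop with the comparison moved into a generator expression, so that all() receives one bool per year; the names matches, name and by_year are illustrative:

```python
d = {'a': {'1990': 10, '1991': 0, '1992': 30},
     'b': {'1990': 15, '1991': 40, '1992': 0}}
years = ('1990', '1991')

matches = []
for name, by_year in d.items():
    # The generator yields one comparison result per year.
    if all(by_year[y] != 0 for y in years):
        matches.append(name)

print(matches)
```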

Translating for loop into list comprehension

I can get this loop to work properly:
for x in range(0,len(l)):
    for k in d:
        if l[x] in d[k]:
            l[x] = k
This looks through a list, checks whether each value is in any of the dictionary's items, and sets it equal to the dictionary key it is found within (the dictionary contains lists).
However, I want to convert to a list comprehension or other single line statement for use in a pandas dataframe - to populate a field based on whether or not another field's value is in the labeled dictionary keys and assign it the dictionary key value.
Here is my best attempt, but it does not work:
l = [ k for x in range(0,len(l)) if l[x] in d[k] for k in d ]
Thanks
Assuming I understand what you're after (example data that can be copied and pasted is always appreciated), I'd do something like this:
>>> l = ["a", "b", "c", "d"]
>>> d = {1: ["a"], 3: ["d", "c"]}
>>> l2 = [next((k for k,v in d.items() if lx in v), lx) for lx in l]
>>> l2
[1, 'b', 3, 3]
Don't forget to think about what behaviour you want if an entry in l is found in multiple lists in d, of course, although that may not be an issue with your data.
You can't do it with a list comprehension, because you have an assignment:
l[x] = k
which is a statement, and a list comprehension can't contain statements.
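
When d is large or the replacement runs over many rows, scanning every list for each element can get slow. One option is to invert the dictionary once and then look each element up directly. A sketch, assuming the list members are hashable and that the first matching key should win; the name reverse is illustrative:

```python
l = ["a", "b", "c", "d"]
d = {1: ["a"], 3: ["d", "c"]}

# Map each list member back to the key whose list contains it.
reverse = {}
for key, members in d.items():
    for member in members:
        # setdefault keeps the first key seen for a member.
        reverse.setdefault(member, key)

# Fall back to the original element when it appears in no list.
l2 = [reverse.get(x, x) for x in l]
print(l2)
```

For the pandas use case, the same reverse dictionary could be passed to Series.map; values without a match come back as NaN there, so the fallback would need separate handling.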
