I have 5 lists of words, which act as the values in a dictionary whose keys are the document IDs.
For each document, I would like to apply some calculations and store the words and the results of the calculation in a nested dictionary.
So far so good: I managed to do everything else, but I am failing at the easiest part.
The resulting nested dictionary only contains the last element of each of the 5 lists, instead of all the elements.
Could anybody explain to me where I am going wrong?
This is the original dictionary data_docs:
{'doc01': ['simpl', 'hello', 'world', 'test', 'python', 'code'],
'doc02': ['today', 'wonder', 'day'],
'doc03': ['studi', 'pac', 'today'],
'doc04': ['write', 'need', 'cup', 'coffe'],
'doc05': ['finish', 'pac', 'use', 'python']}
This is the result I am getting (for example, 'simpl', 'hello', 'world', 'test' and 'python' are missing from doc01):
{'doc01': {'code': 0.6989700043360189},
'doc02': {'day': 0.6989700043360189},
'doc03': {'today': 0.3979400086720376},
'doc04': {'coffe': 0.6989700043360189},
'doc05': {'python': 0.3979400086720376}}
And this is the code:
def tfidf(data, idf_score):  # function, 2 dictionaries as parameters
    tfidf = {}  # dict for output
    for word, val in data.items():  # for each word and value in data_docs (first dict)
        for v in val:  # for each value in each list
            a = val.count(v)  # count the number of times that appears in that list
            scores = {v: a * idf_score[v]}  # dictionary that will act as value in the nested
        tfidf[word] = scores  # final dictionary, the key is doc01, doc02... and the value the above dict
    return tfidf
tfidf(data_docs, idf_score)
Thanks,
Did you mean to do this?
def tfidf(data, idf_score):  # function, 2 dictionaries as parameters
    tfidf = {}  # dict for output
    for word, val in data.items():  # for each word and value in data_docs (first dict)
        scores = {}  # <---- a new dict for each outer iteration
        for v in val:  # for each value in each list
            a = val.count(v)  # count the number of times that appears in that list
            scores[v] = a * idf_score[v]  # <---- keep adding items to the dictionary
        tfidf[word] = scores  # final dictionary, the key is doc01, doc02... and the value the above dict
    return tfidf
... see my changes with <----- arrow :)
Returns:
{'doc01': {'simpl': 1,
'hello': 1,
'world': 1,
'test': 1,
'python': 1,
'code': 1},
'doc02': {'today': 1, 'wonder': 1, 'day': 1},
'doc03': {'studi': 1, 'pac': 1, 'today': 1},
'doc04': {'write': 1, 'need': 1, 'cup': 1, 'coffe': 1},
'doc05': {'finish': 1, 'pac': 1, 'use': 1, 'python': 1}}
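As a hedged aside, the same nested dictionary can be sketched with collections.Counter, which counts every word in one pass instead of calling val.count(v) for each element. The name tfidf_counter and the idf values below are made up for illustration only; they are not from the original post.

```python
from collections import Counter

def tfidf_counter(data, idf_score):
    """Build {doc_id: {word: term_count * idf}} for every document."""
    return {
        doc_id: {word: count * idf_score[word]
                 for word, count in Counter(words).items()}
        for doc_id, words in data.items()
    }

# Illustrative input; these idf values are assumptions, not real scores.
sample_docs = {'doc02': ['today', 'wonder', 'day'],
               'doc03': ['studi', 'pac', 'today', 'today']}
sample_idf = {'today': 0.5, 'wonder': 1.0, 'day': 1.0, 'studi': 1.0, 'pac': 0.5}

result = tfidf_counter(sample_docs, sample_idf)
print(result)
```

Every word of each document ends up in the inner dictionary because the inner comprehension builds the whole dictionary at once rather than rebinding it on each iteration.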
Related
I have a 2d list with arbitrary strings like this:
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
I want to create a dictionary out of this:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
How do I do this? This answer handles a 1D list with non-repeated values, but I have a 2D list and values can repeat. Is there a generic way of doing this?
Maybe you could use two for-loops:
lst = [['a', 'xyz', 'tps'], ['rtr', 'xyz']]
d = {}
overall_idx = 0
for sub_lst in lst:
    for word in sub_lst:
        if word not in d:
            d[word] = overall_idx
            # Increment overall_idx below if you want to only increment if word is not previously seen
            # overall_idx += 1
        overall_idx += 1
print(d)
Output:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
You could first convert the list of lists to a list using a 'double' list comprehension.
Next, get rid of all the duplicates using a dictionary comprehension. We could use a set for that, but we would lose the order.
Finally use another dictionary comprehension to get the desired result.
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
# flatten list of lists to a list
flat_list = [item for sublist in lst for item in sublist]
# remove duplicates
ordered_set = {x:0 for x in flat_list}.keys()
# create required output
the_dictionary = {v:i for i, v in enumerate(ordered_set)}
print(the_dictionary)
""" OUTPUT
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
"""
also, with collections and itertools:
import itertools
from collections import OrderedDict
lstdict={}
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
lstkeys = list(OrderedDict(zip(itertools.chain(*lst), itertools.repeat(None))))
lstdict = {lstkeys[i]: i for i in range(0, len(lstkeys))}
lstdict
output:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
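Another possible variant, sketched here under the assumption that the index should only advance when a new word is seen, relies on dict.setdefault and the current size of the dictionary. The name index_by_word is just an illustrative choice.

```python
lst = [['a', 'xyz', 'tps'], ['rtr', 'xyz']]

index_by_word = {}
for sub_lst in lst:
    for word in sub_lst:
        # len(index_by_word) is evaluated before the call, so a new word
        # gets the next free index and a repeated word keeps its old one.
        index_by_word.setdefault(word, len(index_by_word))

print(index_by_word)
```

This avoids a separate counter variable, at the cost of being slightly less explicit about where the index comes from.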
I'm trying to create a dictionary using keys and values from other dictionaries. The issue I'm having is that the dictionary I want to return comes up empty.
Note:
USER_RATING_DICT_SMALL = {1: {68735: 3.5, 302156: 4.0}, 2: {68735: 1.0, 124057: 1.5, 293660: 4.5}}
MOVIE_USER_DICT_SMALL = {293660: [2], 68735: [1, 2], 302156: [1], 124057: [2]}
def movies_to_users(user_ratings):
    """Return a dictionary of movie ids to list of users who rated the movie,
    using information from the user_ratings dictionary of users to movie
    ratings dictionaries.
    >>> result = movies_to_users(USER_RATING_DICT_SMALL)
    >>> result == MOVIE_USER_DICT_SMALL
    True
    """
    user_list = []
    movie_to_users = {}
    for items in user_ratings.items():
        user_id = items[0]
        user_list.append(user_id)
    for items in user_ratings.items():
        movie_id = items[1]
        if movie_id in user_list:
            movie_to_users[movie_id] = [user_list]
    return movie_to_users
I created an empty dictionary for all the values and keys to accumulate to but it is not accumulating; it returns an empty dictionary instead. I want the output to be == MOVIE_USER_DICT_SMALL
A pretty short solution might be
MOVIE_USER_DICT_SMALL = {}
for user, data in USER_RATING_DICT_SMALL.items():
    for movie in data.keys():
        if movie not in MOVIE_USER_DICT_SMALL:
            # add entry for each movie
            # will be an empty list if no user rated.
            MOVIE_USER_DICT_SMALL[movie] = []
        # for user in current iteration, there MUST be a rating...
        # otherwise, movie would not be in data.keys()
        MOVIE_USER_DICT_SMALL[movie].append(user)
MOVIE_USER_DICT_SMALL
# {68735: [1, 2], 302156: [1], 124057: [2], 293660: [2]}
The trick is to use key and value as returned by dict.items()
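For the function form that the docstring asks for, a minimal sketch could use collections.defaultdict(list) so that the "create an empty list first" step disappears. This is one possible way to write it, not necessarily what the exercise expects.

```python
from collections import defaultdict

USER_RATING_DICT_SMALL = {1: {68735: 3.5, 302156: 4.0},
                          2: {68735: 1.0, 124057: 1.5, 293660: 4.5}}

def movies_to_users(user_ratings):
    """Map each movie id to the list of users who rated it."""
    movie_to_users = defaultdict(list)
    for user_id, ratings in user_ratings.items():
        # ratings is the inner dict {movie_id: rating}; iterate its keys.
        for movie_id in ratings:
            movie_to_users[movie_id].append(user_id)
    # Convert back to a plain dict so missing keys raise KeyError again.
    return dict(movie_to_users)

result = movies_to_users(USER_RATING_DICT_SMALL)
print(result)
```

Dictionary equality ignores insertion order, so the result compares equal to MOVIE_USER_DICT_SMALL even though the keys appear in a different order.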
I need to make a dictionary using the string list as keys and their distinct characters as values.
I have tried some functions and ended up with the following code but I cannot seem to add the string key into it
value = ["check", "look", "try", "pop"]
print(value)
def distinct_characters(x):
    for i in x:
        yield dict(i=len(set(i)))
print(list(distinct_characters(value)))
I would like to get
{ "check" : 4, "look" : 3, "try" : 3, "pop" : 2}
but I keep getting
{ "i" : 4, "i" : 3, "i" : 3, "i" : 2}
Well, dict(i=...) treats i as a literal keyword name, so every key ends up being the string "i". Instead of calling list on a generator of dicts, yield (key, value) tuples and call dict on them, like below.
value = ["check", "look", "try", "pop"]
print(value)
def distinct_characters(x):
    for i in x:
        yield (i, len(set(i)))
print(dict(distinct_characters(value)))
Output:
{'check': 4, 'look': 3, 'try': 3, 'pop': 2}
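To see why the original attempt produced "i" as the key, here is a small sketch contrasting the keyword form of dict() with the pair form. The variable names by_keyword and by_pair are only illustrative.

```python
word = "check"

# Keyword form: the key is the literal name "word", not the variable's value.
by_keyword = dict(word=len(set(word)))

# Pair form: the key is whatever string the variable currently holds.
by_pair = dict([(word, len(set(word)))])

print(by_keyword)
print(by_pair)
```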
Consider the simple dictionary comprehension:
value = ["check", "look", "try", "pop"]
result = {key: len(set(key)) for key in value}
print(result)
Thanks for the replies
I needed to answer it as a function for a class exercise so I ended up using this code:
value = ["check", "look", "try", "pop"]
print(value)
def distinct_characters(x):
    for i in x:
        yield (i, len(set(i)))
print(dict(distinct_characters(value)))
Thanks again
I have a standard list of objects, where each object is defined as
class MyRecord(object):
    def __init__(self, name, date, category, memo):
        self.name = name
        self.date = date
        self.category = category
        self.memo = memo.strip().split()
When I create an object, the input memo is usually a long sentence, for example: "Hello world this is a new funny-memo", which the init function then turns into a list ['Hello', 'world', 'this', 'is', 'a', 'new', 'funny-memo'].
Given, say, 10000 such records in the list (with different memos), I want to group them (as fast as possible) in the following way:
'Hello' : [all the records, which memo contains word 'Hello']
'world' : [all the records, which memo contains word 'world']
'is' : [all the records, which memo contains word 'is']
I know how to use group-by to group the records by, for example, name, date, or category (since each is a single value), but I'm having trouble grouping them in the way described above.
If you want to group them really fast, then you should do it once and never recalculate. To achieve this, you can borrow an approach used for caching: group the objects as they are created.
class MyRecord():
    __groups = dict()
    def __init__(self, name, date, category, memo):
        self.name = name
        self.date = date
        self.category = category
        self.memo = memo.strip().split()
        for word in self.memo:
            self.__groups.setdefault(word, set()).add(self)
    @classmethod
    def get_groups(cls):
        return cls.__groups
records = list()
for line in [
        'Hello world this is a new funny-memo',
        'Hello world this was a new funny-memo',
        'Hey world this is a new funny-memo']:
    records.append(MyRecord(1, 1, 1, line))
print({key: len(val) for key, val in MyRecord.get_groups().items()})
Output:
{'Hello': 2, 'world': 3, 'this': 3, 'is': 2, 'a': 3, 'new': 3, 'funny-memo': 3, 'was': 1, 'Hey': 1}
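If keeping the groups as shared class state is not desirable, a hedged alternative is to build the index in a single pass over an existing list of records. The function name group_by_memo_word is hypothetical, and the record class is repeated from the question so the sketch is self-contained.

```python
from collections import defaultdict

class MyRecord:
    def __init__(self, name, date, category, memo):
        self.name = name
        self.date = date
        self.category = category
        self.memo = memo.strip().split()

def group_by_memo_word(records):
    """Return {word: [records whose memo contains that word]}."""
    groups = defaultdict(list)
    for record in records:
        # set() ensures a record is listed once per word even if the
        # word is repeated inside the same memo.
        for word in set(record.memo):
            groups[word].append(record)
    return groups

records = [MyRecord(1, 1, 1, line) for line in (
    'Hello world this is a new funny-memo',
    'Hello world this was a new funny-memo',
    'Hey world this is a new funny-memo')]

groups = group_by_memo_word(records)
print({word: len(recs) for word, recs in groups.items()})
```

The work is proportional to the total number of words across all memos, and the index has to be rebuilt (or updated by hand) if records are added later.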
I am trying to write some code that involves creating a default dictionary of dictionaries. However, I have no idea how to initialise/create such a thing. My current attempt looks something like this:
from collections import defaultdict
inner_dict = {}
dict_of_dicts = defaultdict(inner_dict(int))
I want to use this default dict of dictionaries as follows: for each pair of words that I produce from a file I open (e.g. [['M UH M', 'm oo m']]), each space-delimited segment of the first word becomes a key in the outer dictionary, and for each space-delimited segment of the second word I count the frequency of that segment.
For example
[['M UH M', 'm oo m']]
(<class 'dict'>, {'M': {'m': 2}, 'UH': {'oo': 1}})
Having just run this now it doesn't seem to have output any errors, however I was just wondering if something like this will actually produce a default dictionary of dictionaries.
Apologies if this is a duplicate, however previous answers to these questions have been confusing and in a different context.
To initialise a defaultdict that creates dictionaries as its default value you would use:
d = defaultdict(dict)
For this particular problem, a collections.Counter would be more suitable
>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)
>>> for a, b in zip(*[x.split() for x in ['M UH M', 'm oo m']]):
...     d[a][b] += 1
>>> print(d)
defaultdict(<class 'collections.Counter'>, {'M': Counter({'m': 2}), 'UH': Counter({'oo': 1})})
Edit
You expressed interest in a comment about the equivalent without a Counter. Here is the equivalent using a plain dict
>>> from collections import defaultdict
>>> d = defaultdict(dict)
>>> for a, b in zip(*[x.split() for x in ['M UH M', 'm oo m']]):
...     d[a][b] = d[a].get(b, 0) + 1
>>> print(d)
defaultdict(<class 'dict'>, {'M': {'m': 2}, 'UH': {'oo': 1}})
You could also use a normal dictionary and its setdefault method.
my_dict.setdefault(key, default) will look up my_dict[key] and ...
... if the key already exists, return its current value without modifying it, or ...
... assign the default value (my_dict[key] = default) and then return that.
So whenever you want to get a value from your outer dictionary, you can call my_dict.setdefault(key, {}) instead of the normal my_dict[key]. This returns either the real value assigned to this key if it's present, or a new empty dictionary as the default value, which is automatically stored in your outer dictionary as well.
Example:
outer_dict = {"M": {"m": 2}}
inner_dict = outer_dict.setdefault("UH", {})
# outer_dict = {"M": {"m": 2}, "UH": {}}
# inner_dict = {}
inner_dict["oo"] = 1
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1}}
# inner_dict = {"oo": 1}
inner_dict = outer_dict.setdefault("UH", {})
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1}}
# inner_dict = {"oo": 1}
inner_dict["xy"] = 3
# outer_dict = {"M": {"m": 2}, "UH": {"oo": 1, "xy": 3}}
# inner_dict = {"oo": 1, "xy": 3}
This way you always get a valid inner_dict, either an empty default one or the one that's already present for the given key. As dictionaries are mutable data types, modifying the returned inner_dict will also modify the dictionary inside outer_dict.
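Applied to the word-pair counting from the question, the setdefault approach might look like the sketch below. The names pairs and counts are illustrative, and the input is the single example pair given in the question.

```python
pairs = [['M UH M', 'm oo m']]

counts = {}
for first, second in pairs:
    # Pair up the space-delimited segments of both words position by position.
    for a, b in zip(first.split(), second.split()):
        inner = counts.setdefault(a, {})   # existing inner dict, or a new empty one
        inner[b] = inner.get(b, 0) + 1     # count this segment of the second word

print(counts)
```

The result is a plain dict of plain dicts, so looking up a key that was never seen raises KeyError instead of silently creating an entry.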
The other answers propose alternative solutions or show that you can make a default dictionary of dictionaries using d = defaultdict(dict),
but the question asked how to make a default dictionary of default dictionaries. My naive first attempt was this:
from collections import defaultdict
my_dict = defaultdict(defaultdict(list))
However, this throws an error: *** TypeError: first argument must be callable or None
So my second attempt, which works, is to make a callable using the lambda keyword to create an anonymous function:
from collections import defaultdict
my_dict = defaultdict(lambda: defaultdict(list))
which is more concise than the alternative method using a regular function:
from collections import defaultdict
def default_dict_maker():
    return defaultdict(list)
my_dict = defaultdict(default_dict_maker)
you can check it works by assigning:
my_dict[2][3] = 5
my_dict[2][3]
>>> 5
or by trying to return a value:
my_dict[0][0]
>>> []
my_dict[5]
>>> defaultdict(<class 'list'>, {})
tl;dr
The one-line answer is my_dict = defaultdict(lambda: defaultdict(list))
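One more variant worth sketching, as an assumption-laden aside: functools.partial also produces a callable factory, and unlike a lambda it can be serialized with the standard pickle module. Whether that matters depends on your use case; the keys "a" and "b" below are arbitrary.

```python
import pickle
from collections import defaultdict
from functools import partial

# partial(defaultdict, list) is a callable that returns defaultdict(list).
my_dict = defaultdict(partial(defaultdict, list))
my_dict["a"]["b"].append(1)

# Round-trip through pickle; a lambda-based factory would fail at dumps().
restored = pickle.loads(pickle.dumps(my_dict))
print(restored["a"]["b"])
```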