I have a text file which contains duplicate car registration numbers with different values, like so:
EDF768, Bill Meyer, 2456, Vet_Parking
TY5678, Jane Miller, 8987, AgHort_Parking
GEF123, Jill Black, 3456, Creche_Parking
ABC234, Fred Greenside, 2345, AgHort_Parking
GH7682, Clara Hill, 7689, AgHort_Parking
JU9807, Jacky Blair, 7867, Vet_Parking
KLOI98, Martha Miller, 4563, Vet_Parking
ADF645, Cloe Freckle, 6789, Vet_Parking
DF7800, Jacko Frizzle, 4532, Creche_Parking
WER546, Olga Grey, 9898, Creche_Parking
HUY768, Wilbur Matty, 8912, Creche_Parking
EDF768, Jenny Meyer, 9987, Vet_Parking
TY5678, Jo King, 8987, AgHort_Parking
JU9807, Mike Green, 3212, Vet_Parking
I want to create a dictionary from this data, which uses the registration numbers (first column) as keys and the data from the rest of the line for values.
I wrote this code:
data_dict = {}
data_list = []

def createDictionaryModified(filename):
    # raw string, so the backslashes are not read as escape sequences
    path = r"C:\Users\user\Desktop"
    basename = "ParkingData_Part3.txt"
    filename = path + "//" + basename
    file = open(filename)
    contents = file.read()
    print(contents,"\n")
    data_list = [lines.split(",") for lines in contents.split("\n")]
    for line in data_list:
        regNumber = line[0]
        name = line[1]
        phoneExtn = line[2]
        carpark = line[3].strip()
        details = (name,phoneExtn,carpark)
        data_dict[regNumber] = details
    print(data_dict,"\n")
    print(data_dict.items(),"\n")
    print(data_dict.values())
The problem is that the data file contains duplicate values for the registration numbers. When I try to store them in the same dictionary with data_dict[regNumber] = details, the old value is overwritten.
How do I make a dictionary with duplicate keys?
Sometimes people want to "combine" or "merge" multiple existing dictionaries by just putting all the items into a single dict, and are surprised or annoyed that duplicate keys are overwritten. See the related question How to merge dicts, collecting values from matching keys? for dealing with this problem.
Python dictionaries don't support duplicate keys. One way around this is to store lists or sets inside the dictionary.
One easy way to achieve this is by using defaultdict:
from collections import defaultdict
data_dict = defaultdict(list)
All you have to do is replace
data_dict[regNumber] = details
with
data_dict[regNumber].append(details)
and you'll get a dictionary of lists.
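Here is a minimal sketch of the whole approach applied to a few lines of the sample data from the question. The lines list is only a stand-in for the file's contents, assumed here so the example runs without the file:

```python
from collections import defaultdict

# Stand-in for the file contents (three lines from the question's sample data).
lines = [
    "EDF768, Bill Meyer, 2456, Vet_Parking",
    "TY5678, Jane Miller, 8987, AgHort_Parking",
    "EDF768, Jenny Meyer, 9987, Vet_Parking",
]

data_dict = defaultdict(list)
for line in lines:
    # Split on commas and strip the surrounding spaces from every field.
    reg_number, name, phone_extn, carpark = [field.strip() for field in line.split(",")]
    # A missing key is created with an empty list automatically.
    data_dict[reg_number].append((name, phone_extn, carpark))

print(dict(data_dict))
```

Each registration number now maps to a list of tuples, one per line in which it appeared.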
You can change the behavior of the built-in types in Python by subclassing them. For your case it's really easy to create a dict subclass that will store duplicated values in lists under the same key automatically:
class Dictlist(dict):
    def __setitem__(self, key, value):
        try:
            self[key]
        except KeyError:
            super(Dictlist, self).__setitem__(key, [])
        self[key].append(value)
Output example:
>>> d = dictlist.Dictlist()
>>> d['test'] = 1
>>> d['test'] = 2
>>> d['test'] = 3
>>> d
{'test': [1, 2, 3]}
>>> d['other'] = 100
>>> d
{'test': [1, 2, 3], 'other': [100]}
Rather than using a defaultdict or messing around with membership tests or manual exception handling, use the setdefault method to add new empty lists to the dictionary when they're needed:
results = {}  # use a normal dictionary for our output
for k, v in some_data:  # the keys may be duplicates
    results.setdefault(k, []).append(v)  # magic happens here!
setdefault checks to see if the first argument (the key) is already in the dictionary. If it doesn't find anything, it assigns the second argument (the default value, an empty list in this case) as a new value for the key. If the key does exist, nothing special is done (the default goes unused). In either case though, the value (whether old or new) gets returned, so we can unconditionally call append on it (knowing it should always be a list).
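To make that concrete, here is a small runnable sketch. The contents of some_data are made up for illustration; any iterable of (key, value) pairs would do:

```python
# Hypothetical input: (key, value) pairs in which keys may repeat.
some_data = [
    ("EDF768", "Bill Meyer"),
    ("TY5678", "Jane Miller"),
    ("EDF768", "Jenny Meyer"),
]

results = {}
for k, v in some_data:
    # setdefault returns the existing list, or stores and returns a new empty one.
    results.setdefault(k, []).append(v)

print(results)
```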
You can't have a dict with duplicate keys, by definition!
Instead you can use a single key and, as the value, a list of elements that had that key.
So you can follow these steps:
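1. Check whether the current element's key (from your initial data) is already in the final dict. If it is, go to step 3.
2. Add the key to the dict with an empty list as its value.
3. Append the new value to the dict[key] list.
4. Repeat steps 1-3 for each remaining element.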
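The steps above can be sketched roughly like this. The function name group_by_key and the sample pairs are hypothetical, chosen only for the example:

```python
def group_by_key(pairs):
    """Collect the values of repeated keys into lists (hypothetical helper)."""
    grouped = {}
    for key, value in pairs:          # step 4: the loop repeats steps 1-3
        if key not in grouped:        # step 1: is the key already in the dict?
            grouped[key] = []         # step 2: if not, add it with an empty list
        grouped[key].append(value)    # step 3: append the value to that key's list
    return grouped

pairs = [("JU9807", "Jacky Blair"), ("JU9807", "Mike Green"), ("GEF123", "Jill Black")]
print(group_by_key(pairs))
```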
If you want to have lists only when they are necessary, and values in any other cases, then you can do this:
class DictList(dict):
    def __setitem__(self, key, value):
        try:
            # Assumes there is a list on the key
            self[key].append(value)
        except KeyError:  # If it fails, because there is no key
            super(DictList, self).__setitem__(key, value)
        except AttributeError:  # If it fails because it is not a list
            super(DictList, self).__setitem__(key, [self[key], value])
You can then do the following:
dl = DictList()
dl['a'] = 1
dl['b'] = 2
dl['b'] = 3
Which will store the following {'a': 1, 'b': [2, 3]}.
I tend to use this implementation when I want to have reverse/inverse dictionaries, in which case I simply do:
my_dict = {1: 'a', 2: 'b', 3: 'b'}
rev = DictList()
for k, v in my_dict.items():
    rev[v] = k
Which will generate the same output as above: {'a': 1, 'b': [2, 3]}.
CAVEAT: This implementation relies on the non-existence of the append method (in the values you are storing). This might produce unexpected results if the values you are storing are lists. For example,
dl = DictList()
dl['a'] = 1
dl['b'] = [2]
dl['b'] = 3
would produce the same result as before {'a': 1, 'b': [2, 3]}, but one might expect the following: {'a': 1, 'b': [[2], 3]}.
You can refer to the following article:
http://www.wellho.net/mouth/3934_Multiple-identical-keys-in-a-Python-dict-yes-you-can-.html
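In a dict, if the keys are distinct objects of a class that does not define __eq__ or __hash__, they are compared by identity, so two objects with the same name do not collide.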
For example:
class p(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
    def __str__(self):
        return self.name

d = {p('k'): 1, p('k'): 2}
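A short sketch of what this means in practice. The class is renamed P and the variables first and second are introduced here only for illustration. Because lookups go by object identity, you need to keep a reference to the original key object to get its value back:

```python
class P:
    """No __eq__ or __hash__ defined, so instances are compared by identity."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

first, second = P('k'), P('k')
d = {first: 1, second: 2}

print(len(d))        # two entries, although both keys print as k
print(d[first])      # a lookup needs the original key object
print(P('k') in d)   # a fresh object with the same name is a different key
```

That is the trade-off of this approach: the dict accepts the "duplicates", but you can no longer look a value up from the name alone.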
You can't have duplicated keys in a dictionary. Use a dict of lists:
for line in data_list:
    regNumber = line[0]
    name = line[1]
    phoneExtn = line[2]
    carpark = line[3].strip()
    details = (name,phoneExtn,carpark)
    # dict.has_key() was removed in Python 3; the in operator works in both versions
    if regNumber not in data_dict:
        data_dict[regNumber] = [details]
    else:
        data_dict[regNumber].append(details)
It's a pretty old question, but maybe my solution will help someone.
By overriding the __hash__ magic method, you can store equal objects as separate keys in a dict.
Example:
from random import choices

class DictStr(str):
    """
    This class behaves exactly like the str class but
    can be duplicated in a dict
    """
    def __new__(cls, value='', custom_id='', id_length=64):
        # If you want to know why I use __new__ instead of __init__
        # SEE: https://stackoverflow.com/a/2673863/9917276
        obj = str.__new__(cls, value)
        if custom_id:
            obj.id = custom_id
        else:
            # Make a random string of id_length characters (64 by default)
            choice_str = "abcdefghijklmopqrstuvwxyzABCDEFJHIJKLMNOPQRSTUVWXYZ1234567890"
            obj.id = ''.join(choices(choice_str, k=id_length))
        return obj

    def __hash__(self) -> int:
        return self.id.__hash__()
Now let's create a dict:
>>> a_1 = DictStr('a')
>>> a_2 = DictStr('a')
>>> a_3 = 'a'
>>> a_1
'a'
>>> a_2
'a'
>>> a_1 == a_2 == a_3
True
>>> d = dict()
>>> d[a_1] = 'some_data'
>>> d[a_2] = 'other'
>>> print(d)
{'a': 'some_data', 'a': 'other'}
NOTE: This solution can be applied to any basic data type (int, float, ...).
EXPLANATION:
We can use almost any object as a key in a dict (known as a HashMap or hash table in other languages), but there has to be a way to distinguish between keys, because the dict knows nothing about the objects themselves.
For this purpose, an object that is to be used as a dictionary key has to provide an identifying number for itself (I call it uniq_id; it is a number produced by a hash algorithm).
Because dictionaries are used so widely, most programming languages hide the generation of this uniq_id inside a built-in method named hash, which the dict calls when it searches for a key.
So if you override the __hash__ method of your class, you can change how its instances behave as dictionary keys.
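As a rough illustration of how the hash and equality interact, here is a sketch with a hypothetical Key class (not part of the solution above) whose hash value is supplied by hand. A dict treats two keys as the same only when their hashes match and they also compare equal:

```python
class Key:
    """Hypothetical key type: equality by name, hash supplied by the caller."""
    def __init__(self, name, hash_value):
        self.name = name
        self.hash_value = hash_value

    def __eq__(self, other):
        return isinstance(other, Key) and self.name == other.name

    def __hash__(self):
        return self.hash_value

d = {}
d[Key("a", 1)] = "first"
d[Key("a", 2)] = "second"   # equal name, different hash: stored as a separate entry
d[Key("a", 1)] = "third"    # equal name and equal hash: replaces "first"

print(len(d))
print(d[Key("a", 1)], d[Key("a", 2)])
```

This is why giving equal strings different hashes lets them coexist in one dict. It also breaks the usual rule that equal objects should hash equally, so it is worth keeping to small, controlled cases.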
A dictionary does not support duplicate keys; instead you can use a defaultdict of lists.
Below is an example of how to use defaultdict in Python 3.x to solve your problem:
from collections import defaultdict

sdict = defaultdict(list)
# contents is the text of the data file, read as in the question
data_list = [lines.split(",") for lines in contents.split("\n")]
for data in data_list:
    key = data.pop(0)  # the registration number
    detail = data      # the remaining fields
    # defaultdict creates the empty list on first use, so no membership check is needed
    sdict[key].append(detail)
print("\n", dict(sdict))
The above code produces the following output:
{'EDF768': [[' Bill Meyer', ' 2456', ' Vet_Parking'], [' Jenny Meyer', ' 9987', ' Vet_Parking']], 'TY5678': [[' Jane Miller', ' 8987', ' AgHort_Parking'], [' Jo King', ' 8987', ' AgHort_Parking']], 'GEF123': [[' Jill Black', ' 3456', ' Creche_Parking']], 'ABC234': [[' Fred Greenside', ' 2345', ' AgHort_Parking']], 'GH7682': [[' Clara Hill', ' 7689', ' AgHort_Parking']], 'JU9807': [[' Jacky Blair', ' 7867', ' Vet_Parking'], [' Mike Green', ' 3212', ' Vet_Parking']], 'KLOI98': [[' Martha Miller', ' 4563', ' Vet_Parking']], 'ADF645': [[' Cloe Freckle', ' 6789', ' Vet_Parking']], 'DF7800': [[' Jacko Frizzle', ' 4532', ' Creche_Parking']], 'WER546': [[' Olga Grey', ' 9898', ' Creche_Parking']], 'HUY768': [[' Wilbur Matty', ' 8912', ' Creche_Parking']]}
I have a tuple with tuples inside like this:
tup = ((1,2,3,'Joe'),(3,4,5,'Kevin'),(6,7,8,'Joe'),(10,11,12,'Donald'))
This goes on and on, and the numbers don't matter here. The only data that matters are the names. What I need is to count how many times a given name occurs in the tuple and return a list where each item is a list holding a name and the number of times it occurs, like this:
list_that_i_want = [['Joe',2],['Kevin',1],['Donald',1]]
I don't want to use any modules or collections like Counter. I want to hard code this.
I actually wanted to hardcode the full solution and not even use the '.count()' method.
So far what I got is this:
def create_list(tuples):
    new_list= list()
    cont = 0
    for tup in tuples:
        for name in tup:
            name = tup[3]
            cont = tup.count(name)
        if name not in new_list:
            new_list.append(name)
            new_list.append(cont)
    return new_list

list_that_i_want = create_list(tup)
print(list_that_i_want)
And the output I am getting is:
['Joe',1,'Kevin',1,'Donald',1]
Any help? Python newbie here.
You could create a dictionary first and find the counts, then convert the dictionary to a list of lists.
tup = ((1,2,3,'Joe'),(3,4,5,'Kevin'),(6,7,8,'Joe'),(10,11,12,'Donald'))
dx = {}
for _,_,_,nm in tup:
    if nm in dx: dx[nm] +=1
    else: dx[nm] = 1
list_i_want = [[k,v] for k,v in dx.items()]
print (list_i_want)
You can replace the for loop and the if statement with this one line:
for _,_,_,nm in tup: dx[nm] = dx.get(nm, 0) + 1
The output will be
[['Joe', 2], ['Kevin', 1], ['Donald', 1]]
The updated code will be:
tup = ((1,2,3,'Joe'),(3,4,5,'Kevin'),(6,7,8,'Joe'),(10,11,12,'Donald'))
dx = {}
for _,_,_,nm in tup: dx[nm] = dx.get(nm, 0) + 1
list_i_want = [[k,v] for k,v in dx.items()]
print (list_i_want)
Output:
[['Joe', 2], ['Kevin', 1], ['Donald', 1]]
Using an intermediary dict:
def create_list(tuple_of_tuples):
    results = {}
    for tup in tuple_of_tuples:
        name = tup[3]
        if name not in results:
            results[name] = 0
        results[name] += 1
    return list(results.items())
Of course, using defaultdict, or even Counter, would be the more Pythonic solution.
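For readers who are not bound by the question's no-modules constraint, the Counter version might look like the sketch below. It relies on dictionaries (and therefore Counter) keeping insertion order, which holds from Python 3.7 on:

```python
from collections import Counter

tup = ((1, 2, 3, 'Joe'), (3, 4, 5, 'Kevin'), (6, 7, 8, 'Joe'), (10, 11, 12, 'Donald'))

# Count the last element of each inner tuple.
counts = Counter(t[-1] for t in tup)

# Convert to the list-of-lists shape the question asks for.
list_i_want = [[name, n] for name, n in counts.items()]
print(list_i_want)
```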
You can try with this approach:
tuples = ((1,2,3,'Joe'),(3,4,5,'Kevin'),(6,7,8,'Joe'),(10,11,12,'Donald'))
results = {}
for tup in tuples:
    if tup[-1] not in results:
        results[tup[-1]] = 1
    else:
        results[tup[-1]] += 1
new_list = [[key,val] for key,val in results.items()]
Here, a no-counter solution:
results = {}
for t in tup:
    results[t[-1]] = results[t[-1]]+1 if (t[-1] in results) else 1
results.items()
#dict_items([('Joe', 2), ('Kevin', 1), ('Donald', 1)])
I have the following dictionaries of lists:
dict1 = {'SourceName': ['PUICUI'], 'EventType': ['XYX'], 'TableName': ['XYX__ct'], 'KeyIndex': ['XYX', 'ZXX']}
dict2 = {'SourceName': ['PUICI2'], 'EventType': ['XYX'], 'TableName': ['ZXX__ct1']}
And the following piece of code works just as expected.
def combineDictList(*args):
    result = {}
    for dic in args:
        for key in (result.keys() | dic.keys()):
            if key in dic:
                result.setdefault(key, []).extend(dic[key])
    return result

print(combineDictList(dict1, dict2))
which gives me
{'TableName': ['XYX__ct', 'ZXX__ct1'], 'SourceName': ['PUICUI', 'PUICI2'], 'KeyIndex': ['XYX', 'ZXX'], 'EventType': ['XYX', 'XYX']}
But my question is how to make the final result contain only unique values, e.g. here EventType has the same value twice.
So I would expect the final result to be
{'TableName': ['XYX__ct', 'ZXX__ct1'], 'SourceName': ['PUICUI', 'PUICI2'], 'KeyIndex': ['XYX', 'ZXX'], 'EventType': ['XYX']}
Is there any way I can achieve this?
Try this
def combineDictList(*args):
    result = {}
    for dic in args:
        for key in (result.keys() | dic.keys()):
            if key in dic:
                result.setdefault(key, []).extend(dic[key])
                result[key] = list(set(result[key]))
    return result

print(combineDictList(dict1, dict2))
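One thing to keep in mind is that sets are unordered, so list(set(...)) may return the values in a different order than they were added. If the order matters, a possible variant is sketched below. The function name combine_dict_list_ordered is made up for this example, and it relies on dict.fromkeys keeping insertion order (Python 3.7+):

```python
def combine_dict_list_ordered(*dicts):
    """Hypothetical variant of combineDictList that keeps first-seen order."""
    result = {}
    for dic in dicts:
        for key, values in dic.items():
            merged = result.get(key, []) + list(values)
            # dict.fromkeys drops duplicates and keeps the first occurrence of each value.
            result[key] = list(dict.fromkeys(merged))
    return result

dict1 = {'SourceName': ['PUICUI'], 'EventType': ['XYX'], 'TableName': ['XYX__ct'], 'KeyIndex': ['XYX', 'ZXX']}
dict2 = {'SourceName': ['PUICI2'], 'EventType': ['XYX'], 'TableName': ['ZXX__ct1']}

print(combine_dict_list_ordered(dict1, dict2))
```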
Use set
Ex:
dict1 = {'SourceName': ['PUICUI'], 'EventType': ['XYX'], 'TableName': ['XYX__ct'], 'KeyIndex': ['XYX', 'ZXX']}
dict2 = {'SourceName': ['PUICI2'], 'EventType': ['XYX'], 'TableName': ['ZXX__ct1']}
def combineDictList(*args):
    result = {}
    for dic in args:
        for k, v in dic.items():
            result.setdefault(k, set()).update(v)
    # If you need values as list
    # result = {k: list(v) for k, v in result.items()}
    return result

print(combineDictList(dict1, dict2))
Output:
{'EventType': {'XYX'},
'KeyIndex': {'ZXX', 'XYX'},
'SourceName': {'PUICI2', 'PUICUI'},
'TableName': {'ZXX__ct1', 'XYX__ct'}}
I am learning Python. I tried to get the keys of a dictionary, but I only get the last key. As I understand it, the keys() method is used to get all keys in the dictionary.
My questions are:
1. Why can't I get all the keys?
2. If I have a dictionary, how can I get the value if I know the key? e.g. dict = {'Ben':8, 'Joe':7, 'Mary' : 9}. How can I input the key = "Ben", so the program can output the value 8? The tutorial shows that the key must be immutable. This constraint is very inconvenient when trying to get a value with a given key.
Any suggestions would be highly appreciated.
Here is my code.
import os, tarfile, urllib
work_path = os.getcwd()
input_control_file = "input_control"
input_control = work_path + "/" + input_control_file

#open control file if file exist
#read setting info
try:
    #if the file does not exist,
    #then it would throw an IOError
    f = open(input_control, 'r')
    #define dictionary/hash table
    for LINE in f:
        LINE = LINE.strip() #remove leading and trailing whitespace
        lst = LINE.split() #split string into lists
        lst[0] = lst[0].split(":")[0]
        dic = {lst[0].strip():lst[1].strip()}
except IOError:
    # print(os.error) will print <class 'OSError'>
    print("Reading file error. File " + input_control + " does not exist.")

#get keys
def getkeys(dict):
    return list(dict.keys())

print("l39")
print(getkeys(dic))
print("end")
Below are the outputs.
l39
['source_type']
end
The reason is that you are reassigning the variable dic on every iteration of the for loop. You are not updating or adding to the dictionary; you are replacing it with a new one-entry dictionary each time, so dic ends up holding only the last entry. You can change your for loop to:
dic = {}
for LINE in f:
    LINE = LINE.strip() #remove leading and trailing whitespace
    lst = LINE.split() #split string into lists
    lst[0] = lst[0].split(":")[0]
    dic.update({lst[0].strip():lst[1].strip()}) # update the dictionary with new values.
For your other question, if you have the dictionary dic = {'Ben':8, 'Joe':7, 'Mary' : 9}, then you can get the value by: dic['Ben']. It will return the value 8 or will raise KeyError if key Ben is not found in the dictionary. To avoid KeyError, you can use the get() method of dictionary. It will return None if provided key is not found in the dictionary.
val = dic['Ben'] # returns 8
val = dic['Hen'] # will raise KeyError
val = dic.get('Hen') # will return None
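As a small addition, get() also accepts a second argument that is returned instead of None when the key is missing, and the in operator tests whether a key exists. A quick sketch using the dictionary from the question:

```python
dic = {'Ben': 8, 'Joe': 7, 'Mary': 9}

missing = dic.get('Hen', 0)   # key not found: the supplied default is returned
present = dic.get('Ben', 0)   # key found: the stored value is returned
has_ben = 'Ben' in dic        # membership test on the keys

print(missing, present, has_ben)
```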
In your for loop, you are re-initializing the dictionary value, while you need to update the dictionary, i.e., append the key-value pair to the pre-existing dictionary. For this, use
dic.update({lst[0].strip() : lst[1].strip()})
This adds the key-value pair to the dictionary. Now, when you use dic.keys(), you will get all the keys of dic (in Python 3 this is a view object; wrap it in list() if you need a list).
As for your second question, access the dictionary, just like accessing a list, except that list is accessed with indices, and dictionary will be accessed by keys. Say, you have a list and a dictionary as
lst = [1, 2, 3, 4, 5]
dic = {'a' : 1, 'b' : 2, 'c' : 3, 'd' : 4, 'e' : 5}
To get value 2 from list, you do lst[1], i.e., value at index 1. Similarly, if you want to get the value 2 from dictionary, do dic['b'], i.e., value of key 'b'. It is as simple as that.
Python 3
Hello guys, I'm a Python beginner studying dictionaries now.
Below is what I have learned so far: how to save a list into a file
and count items in a list.
class item:
    name = None
    score = None

def save_list_in_file(file_name:str, L:[item]):
    f = open(file_name,'w')
    for it in L:
        f.write(it.name + "" + str(it.score))
    f.close()

def count_item_in_list(L:[item])->int:
    n = 0
    for it in L:
        if it.score >= 72:
            n += 1
    return n
I'm not sure whether a dictionary is used the same way as a list,
for example:
def save_dict_to_file(file_name:str, D:{item}):
    f = open(file_name,'w')
    for it in D:
        f.write(it.name + "" + str(it.score))
    f.close()

def count_item_in_dict(D:{item})->int:
    n = 0
    for it in D:
        if it.score <= 72:
            n += 1
    return n
Will this be correct? I thought using a dict would be different from using a list.
Thanks for any comments!
You can't use a dictionary the same way as using a list.
A list is defined as a sequence of elements. So when you have the list:
L=['D','a','v','i','d']
You can loop it like this:
for it in L:
    print(it)
And it will print:
D
a
v
i
d
A dictionary, on the other hand, is a collection of key-value pairs, where the first element of each pair is the key and the second is the value. So for example you have a dictionary like this:
D = {'firstletter' : 'D', 'secondletter': 'a', 'thirdletter' : 'v' }
And when you loop over it like a list:
for it in D:
    print(it)
it will print only the keys:
firstletter
secondletter
thirdletter
so in order to obtain the values you have to print it like this:
for it in D:
    print(D[it])
that will display this result:
D
a
v
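Applied to the question, a sketch could look like the following. It assumes the dictionary maps each item's name to an item object (the question does not say how the dict is laid out), uses a small Item class as a stand-in for the question's item, and takes the >= 72 threshold from the list version. The values() and items() methods let you loop over the values, or over keys and values together, without indexing:

```python
class Item:
    """Hypothetical stand-in for the question's item class."""
    def __init__(self, name, score):
        self.name = name
        self.score = score

# Assumed layout: item name -> Item object.
D = {'ann': Item('ann', 80), 'bob': Item('bob', 60), 'cy': Item('cy', 72)}

def count_item_in_dict(D):
    n = 0
    for it in D.values():      # iterate over the Item objects, not the keys
        if it.score >= 72:
            n += 1
    return n

for key, it in D.items():      # key and value together
    print(key, it.score)

print(count_item_in_dict(D))
```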
If you need more information, you can check the documentation for dictionaries :)
Python 3 documentation of Data Structures