Find the common characters among n strings inside a single multidimensional list using Python - python-3.x

def bibek():
    test_list = [[]]
    x = int(input("Enter the length of String elements using enter -: "))
    for i in range(0, x):
        a = str(input())
        a = list(a)
        test_list.append(a)
    del test_list[0]  # remove the initial empty list

    def filt(b):
        d = ['b', 'i', 'b']
        if b in d:
            return True
        else:
            return False

    for t in test_list:
        x = filter(filt, t)
        for i in x:
            print(i)

bibek()
Suppose test_list = [['b','i','b'], ['s','i','b'], ['r','i','b']].
The output should be ib, since i and b are common to all the sublists.

An option is to use set and its methods:
test_list = [['b','i','b'], ['s','i','b'], ['r','i','b']]
common = set(test_list[0])
for item in test_list[1:]:
    common.intersection_update(item)
print(common)  # {'i', 'b'}
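The same intersection can also be written in a single expression, since set.intersection accepts any number of iterables (a small sketch, not from the original answer):
test_list = [['b','i','b'], ['s','i','b'], ['r','i','b']]

# set.intersection takes any number of iterables as arguments,
# so unpacking the remaining rows performs the whole reduction at once.
common = set(test_list[0]).intersection(*test_list[1:])
print(common)  # {'i', 'b'}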
UPDATE: now that you have clarified your question, I would do this:
from difflib import SequenceMatcher

test_list = [['b','i','b','b'], ['s','i','b','b'], ['r','i','b','b']]

# convert the inner lists to plain strings
strgs = [''.join(item) for item in test_list]

common = strgs[0]
for item in strgs[1:]:
    sm = SequenceMatcher(isjunk=None, a=item, b=common)
    match = sm.find_longest_match(0, len(item), 0, len(common))
    common = common[match.b:match.b + match.size]
print(common)  # 'ibb'
The trick here is to use difflib.SequenceMatcher in order to get the longest common contiguous match.
One more update after clarification of your question, this time using collections.Counter:
from collections import Counter

strgs = 'app', 'bapp', 'sardipp', 'ppa'

common = Counter(strgs[0])
print(common)
for item in strgs[1:]:
    c = Counter(item)
    for key, number in common.items():
        common[key] = min(number, c.get(key, 0))
print(common)                     # Counter({'p': 2, 'a': 1})
print(sorted(common.elements()))  # ['a', 'p', 'p']
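Worth noting: Counter already implements multiset intersection as the & operator (minimum of each count), so the explicit inner loop can be collapsed; a sketch with the same strgs:
from collections import Counter
from functools import reduce

strgs = 'app', 'bapp', 'sardipp', 'ppa'

# Counter & Counter keeps, for every key, the minimum of the two counts,
# which is exactly the "common characters with multiplicity" operation.
common = reduce(lambda a, b: a & b, (Counter(s) for s in strgs))
print(common)                     # Counter({'p': 2, 'a': 1})
print(sorted(common.elements()))  # ['a', 'p', 'p']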

Related

Counting: How do I add a zero if a word does not occur in a list?

I would like to find keywords from a list, but return a zero if the word does not exist (in this case: part). In this example, collabor occurs 4 times and part 0 times.
My current output is
[['collabor', 4]]
But what I would like to have is
[['collabor', 4], ['part', 0]]
str1 = ["collabor", "part"]
x10 = []
for y in wordlist:
for string in str1:
if y.find(string) != -1:
x10.append(y)
from collections import Counter
x11 = Counter(x10)
your_list = [list(i) for i in x11.items()]
rowssorted = sorted(your_list, key=lambda x: x[0])
print(rowssorted)
Although you have not clearly written your problem and requirements, I think I understood the task.
I assume that you have a set of words that may or may not occur in a given list, and you want to print the count of each of those words based on its occurrences in the given list.
Code:
constants=["part","collabor"]
wordlist = ["collabor", "collabor"]
d={}
for const in constants:
d[const]=0
for word in wordlist:
if word in d:
d[word]+=1
else:
d[word]=0
from collections import Counter
x11 = Counter(d)
your_list = [list(i) for i in x11.items()]
rowssorted = sorted(your_list, key=lambda x: x[0])
print(rowssorted)
output:
[['collabor', 2], ['part', 0]]
This approach gives the required output.
In Python, a dictionary is the usual structure for counting occurrences.
Hope it helps!
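For reference, the same zero-filling can also be done by seeding a Counter with the keywords before counting; a minimal sketch along the same lines (not part of the original answer):
from collections import Counter

constants = ["part", "collabor"]
wordlist = ["collabor", "collabor"]

# Seed every keyword with 0, then add the real counts on top;
# keywords that never occur keep their 0 entry.
counts = Counter({const: 0 for const in constants})
counts.update(w for w in wordlist if w in counts)
print(sorted([k, v] for k, v in counts.items()))  # [['collabor', 2], ['part', 0]]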

Need to fetch 1st value from the dictionary from all the preferred keys in python [duplicate]

What is an efficient way to find the most common element in a Python list?
My list items may not be hashable, so I can't use a dictionary.
Also, in case of draws the item with the lowest index should be returned. Example:
>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'
A simpler one-liner:
def most_common(lst):
    return max(set(lst), key=lst.count)
Borrowing from here, this can be used with Python 2.7:
from collections import Counter

def Most_Common(lst):
    data = Counter(lst)
    return data.most_common(1)[0][0]
Works around 4-6 times faster than Alex's solutions, and is 50 times faster than the one-liner proposed by newacct.
On CPython 3.6+ (any Python 3.7+) the above will select the first seen element in case of ties. If you're running on older Python, to retrieve the element that occurs first in the list in case of ties, you need to do two passes to preserve order:
# Only needed pre-3.6!
def most_common(lst):
    data = Counter(lst)
    return max(lst, key=data.get)
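A quick self-contained check of the tie-breaking behaviour, using the examples from the question:
from collections import Counter

def most_common(lst):
    data = Counter(lst)
    # max scans lst in order, so on equal counts the earliest element wins
    return max(lst, key=data.get)

print(most_common(['duck', 'duck', 'goose']))           # duck
print(most_common(['goose', 'duck', 'duck', 'goose']))  # goose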
With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements): itertools.groupby. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools
import operator

def most_common(L):
    # get an iterable of (item, index) pairs
    SL = sorted((x, i) for i, x in enumerate(L))
    # print('SL:', SL)
    groups = itertools.groupby(SL, key=operator.itemgetter(0))
    # auxiliary function to get "quality" for an item
    def _auxfun(g):
        item, iterable = g
        count = 0
        min_index = len(L)
        for _, where in iterable:
            count += 1
            min_index = min(min_index, where)
        # print('item %r, count %r, minind %r' % (item, count, min_index))
        return count, -min_index
    # pick the highest-count/earliest item
    return max(groups, key=_auxfun)[0]
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print calls can be uncommented to better see the machinery in action; for example, with prints uncommented:
print(most_common(['goose', 'duck', 'duck', 'goose']))
emits:
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) (the items of SL).
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable and the minimum original index; it returns those as a combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L):
    groups = itertools.groupby(sorted(L))
    def _auxfun(g):
        item, iterable = g
        return len(list(iterable)), -L.index(item)
    return max(groups, key=_auxfun)[0]
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g
def most_common_oneliner(L):
    return max(g(sorted(L)), key=lambda gr: (len(list(gr[1])), -L.index(gr[0])))[0]
What you want is known in statistics as mode, and Python of course has a built-in function to do exactly that for you:
>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3
Note that if there is no "most common element", such as cases where the top two are tied, this will raise StatisticsError on Python <= 3.7; on 3.8 onwards it will return the first one encountered.
Without the requirement about the lowest index, you can use collections.Counter for this:
from collections import Counter

a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1))  # the one most common element... 2 would mean the 2 most common
# [(9216, 2)] -- a list containing one (element, count) tuple
If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.
def most_common(lst):
    cur_length = 0
    max_length = 0
    cur_i = 0
    max_i = 0
    cur_item = None
    max_item = None
    for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
        if cur_item is None or cur_item != item:
            if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
                max_length = cur_length
                max_i = cur_i
                max_item = cur_item
            cur_length = 1
            cur_i = i
            cur_item = item
        else:
            cur_length += 1
    if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
        return cur_item
    return max_item
This is an O(n) solution.
mydict = {}
cnt, itm = 0, ''
for item in reversed(lst):
    mydict[item] = mydict.get(item, 0) + 1
    if mydict[item] >= cnt:
        cnt, itm = mydict[item], item
print(itm)
(reversed is used to make sure that it returns the lowest-index item)
Sort a copy of the list and find the longest run. You can decorate the list before sorting it with the index of each element, and then choose the run that starts with the lowest index in the case of a tie.
A one-liner:
def most_common(lst):
    return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]
I am doing this using the scipy stats module and a lambda:
import scipy.stats

lst = [1, 2, 3, 4, 5, 6, 7, 5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))
Result:
5
# use Decorate, Sort, Undecorate to solve the problem

def most_common(iterable):
    # Make a list with tuples: (item, index)
    # The index will be used later to break ties for most common item.
    lst = [(x, i) for i, x in enumerate(iterable)]
    lst.sort()

    # lst_final will also be a list of tuples: (count, index, item)
    # Sorting on this list will find us the most common item, and the index
    # will break ties so the one listed first wins.  Count is negative so
    # largest count will have lowest value and sort first.
    lst_final = []

    # Get an iterator for our new list...
    itr = iter(lst)

    # ...and pop the first tuple off.  Setup current state vars for loop.
    count = 1
    tup = next(itr)
    x_cur, i_cur = tup

    # Loop over sorted list of tuples, counting occurrences of item.
    for tup in itr:
        # Same item again?
        if x_cur == tup[0]:
            # Yes, same item; increment count
            count += 1
        else:
            # No, new item, so write previous current item to lst_final...
            t = (-count, i_cur, x_cur)
            lst_final.append(t)
            # ...and reset current state vars for loop.
            x_cur, i_cur = tup
            count = 1

    # Write final item after loop ends
    t = (-count, i_cur, x_cur)
    lst_final.append(t)

    lst_final.sort()
    answer = lst_final[0][2]
    return answer

print(most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e']))  # prints 'e'
print(most_common(['goose', 'duck', 'duck', 'goose']))   # prints 'goose'
Simple one-line solution:
moc = max((lst.count(item), item) for item in set(lst))
It will return the most frequent element with its frequency. (The loop variable is renamed from chr to item to avoid shadowing the built-in.)
You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)
itemList = ['hi', 'hi', 'hello', 'bye']

counter = {}
maxItemCount = 0
for item in itemList:
    try:
        # Referencing this will cause a KeyError exception
        # if it doesn't already exist
        counter[item]
        # ... meaning if we get this far it didn't happen so
        # we'll increment
        counter[item] += 1
    except KeyError:
        # If we got a KeyError we need to create the
        # dictionary key
        counter[item] = 1

    # Keep overwriting maxItemCount with the latest number,
    # if it's higher than the existing itemCount
    if counter[item] > maxItemCount:
        maxItemCount = counter[item]
        mostPopularItem = item

print(mostPopularItem)
Building on Luiz's answer, but satisfying the "in case of draws the item with the lowest index should be returned" condition:
from statistics import mode, StatisticsError

def most_common(l):
    try:
        return mode(l)
    except StatisticsError as e:
        # will only return the first element if no unique mode found
        if 'no unique mode' in e.args[0]:
            return l[0]
        # this is for "StatisticsError: no mode for empty data"
        # after calling mode([])
        raise
Example:
>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data
ans = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)                 # {4: 1, 2: 0}
max_key = max(all_ans.keys())  # 4
print(all_ans[max_key])        # 1
import numpy as np

# This will return the list sorted by frequency:
def orderByFrequency(list):
    listUniqueValues = np.unique(list)
    listQty = []
    listOrderedByFrequency = []
    for i in range(len(listUniqueValues)):
        listQty.append(list.count(listUniqueValues[i]))
    for i in range(len(listQty)):
        index_bigger = np.argmax(listQty)
        for j in range(listQty[index_bigger]):
            listOrderedByFrequency.append(listUniqueValues[index_bigger])
        listQty[index_bigger] = -1
    return listOrderedByFrequency

# And this will return a list with the most frequent values in a list:
def getMostFrequentValues(list):
    if (len(list) <= 1):
        return list
    list_most_frequent = []
    list_ordered_by_frequency = orderByFrequency(list)
    list_most_frequent.append(list_ordered_by_frequency[0])
    frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
    index = 0
    while(index < len(list_ordered_by_frequency)):
        index = index + frequency
        if(index < len(list_ordered_by_frequency)):
            testValue = list_ordered_by_frequency[index]
            testValueFrequency = list_ordered_by_frequency.count(testValue)
            if (testValueFrequency == frequency):
                list_most_frequent.append(testValue)
            else:
                break
    return list_most_frequent

# tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1, 1]))
print(getMostFrequentValues([2, 1]))
print(getMostFrequentValues([2, 2, 1]))
print(getMostFrequentValues([1, 2, 1, 2]))
print(getMostFrequentValues([1, 2, 1, 2, 2]))
print(getMostFrequentValues([3, 2, 3, 5, 6, 3, 2, 2]))
print(getMostFrequentValues([1, 2, 2, 60, 50, 3, 3, 50, 3, 4, 50, 4, 4, 60, 60]))
Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]
Here:
def most_common(l):
    max = 0
    maxitem = None
    for x in set(l):
        count = l.count(x)
        if count > max:
            max = count
            maxitem = x
    return maxitem
I have a vague feeling there is a method somewhere in the standard library that will give you the count of each element, but I can't find it.
This is the obvious slow solution (O(n^2)) if neither sorting nor hashing is feasible, but equality comparison (==) is available:
def most_common(items):
    if not items:
        raise ValueError
    fitems = []  # entries are [item, count, insertion index]
    best_idx = 0
    for item in items:
        item_missing = True
        i = 0
        for fitem in fitems:
            if fitem[0] == item:
                fitem[1] += 1
                d = fitem[1] - fitems[best_idx][1]
                if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
                    best_idx = i
                item_missing = False
                break
            i += 1
        if item_missing:
            fitems.append([item, 1, i])
    return fitems[best_idx][0]
But making your items hashable or sortable (as recommended by other answers) would almost always make finding the most common element faster if the length of your list (n) is large. O(n) on average with hashing, and O(n*log(n)) at worst for sorting.
>>> li = ['goose', 'duck', 'duck']
>>> def foo(li):
st = set(li)
mx = -1
for each in st:
temp = li.count(each):
if mx < temp:
mx = temp
h = each
return h
>>> foo(li)
'duck'
I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with.
def mostPopular(l):
    mpEl = None
    mpIndex = 0
    mpCount = 0
    curEl = None
    curCount = 0
    for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
        curCount = curCount + 1 if el == curEl else 1
        curEl = el
        if curCount > mpCount \
           or (curCount == mpCount and i < mpIndex):
            mpEl = curEl
            mpIndex = i
            mpCount = curCount
    return mpEl, mpCount, mpIndex
I timed it against Alex's solution and it's about 10-15% faster for short lists, but once you go over 100 elements or more (tested up to 200000) it's about 20% slower.
def most_frequent(List):
    counter = 0
    num = List[0]
    for i in List:
        curr_frequency = List.count(i)
        if curr_frequency > counter:
            counter = curr_frequency
            num = i
    return num

List = [2, 1, 2, 2, 1, 3]
print(most_frequent(List))
Hi, this is a very simple solution, though note that calling list.count inside the loop makes it quadratic rather than linear in time:
L = ['goose', 'duck', 'duck']

def most_common(L):
    current_winner = 0
    max_repeated = None
    for i in L:
        amount_times = L.count(i)
        if amount_times > current_winner:
            current_winner = amount_times
            max_repeated = i
    return max_repeated

print(most_common(L))  # 'duck'
Here max_repeat_num is the element in the list that repeats most of the time:
numbers = [1, 3, 7, 4, 3, 0, 3, 6, 3]
max_repeat_num = max(numbers, key=numbers.count)  # which number occurs most frequently
max_repeat = numbers.count(max_repeat_num)        # how many times
print(f"the number {max_repeat_num} is repeated {max_repeat} times")
def mostCommonElement(list):
    count = {}     # dict holder
    max = 0        # keep track of the count by key
    result = None  # holder when count is greater than max
    for i in list:
        if i not in count:
            count[i] = 1
        else:
            count[i] += 1
        if count[i] > max:
            max = count[i]
            result = i
    return result

mostCommonElement(["a", "b", "a", "c"])  # -> "a"
A majority element is one which appears more than N/2 times in the array, where N is len(array). The technique below finds it in O(n) time; note that Counter itself uses auxiliary space proportional to the number of distinct values, not O(1).
from collections import Counter

def majorityElement(arr):
    majority_elem = Counter(arr)
    size = len(arr)
    for key, val in majority_elem.items():
        if val > size / 2:
            return key
    return -1
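If O(1) auxiliary space is genuinely required, the Boyer-Moore voting algorithm (a different technique, sketched here rather than taken from the answer above) finds a majority candidate in one pass and verifies it in a second:
def majority_element(arr):
    # Boyer-Moore voting: keep one candidate and a counter; a genuine
    # majority element (> n/2 occurrences) always survives the cancellation.
    candidate, count = None, 0
    for x in arr:
        if count == 0:
            candidate = x
        count += 1 if x == candidate else -1
    # Verification pass: the candidate is only valid if it truly is a majority.
    if arr and arr.count(candidate) > len(arr) / 2:
        return candidate
    return -1

print(majority_element([3, 2, 3, 3, 1, 3, 3]))  # 3
print(majority_element([1, 2, 3]))              # -1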
def most_common(lst):
    if max([lst.count(i) for i in lst]) == 1:
        return False
    else:
        return max(set(lst), key=lst.count)
def popular(L):
    C = {}
    for a in L:
        C[a] = L.count(a)
    for b in C.keys():
        if C[b] == max(C.values()):
            return b

L = [2, 3, 5, 3, 6, 3, 6, 3, 6, 3, 7, 467, 4, 7, 4]
print(popular(L))

Return unique Python lists of chars, ignoring order

Problem:
Consider a python list of lists that contains a sequence of chars:
[['A', 'B'],['A','B','C'],['B','A'],['C','A','B'],['D'],['D'],['Ao','B']]
The goal is to return the unique lists, regardless of order:
[['A','B'],['A','B','C'],['D'],['Ao','B']]
Attempt:
I'm able to achieve my goal using many if/else statements with try/exceptions. What would be the most pythonic (faster) way to approach this problem? Thanks!
def check_duplicates(x, list_):
    for li in list_:
        if compare(x, li):
            return True

def compare(s, t):
    t = list(t)  # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)
    except ValueError:
        return False
    return not t

vars_list = [['A', 'B'], ['A','B','C'], ['B','A'], ['C','A','B'], ['D'], ['D'], ['Ao','B']]
second_list = []
for i in vars_list:
    if check_duplicates(i, second_list):
        continue
    else:
        second_list.append(i)
        print(i)
Assuming that the elements of the nested lists are hashable, you can isolate the unique collections by constructing a set of frozensets from the nested list:
unique_sets = {frozenset(l) for l in vars_list}
# {frozenset({'D'}),
# frozenset({'A', 'B'}),
# frozenset({'A', 'B', 'C'}),
# frozenset({'Ao', 'B'})}
If you need a list-of-lists as the output, you can obtain one trivially with [list(s) for s in unique_sets].
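If the original sublists should be kept as-is (first occurrence wins) instead of being converted to sets, a seen-set sketch works too:
vars_list = [['A', 'B'], ['A', 'B', 'C'], ['B', 'A'], ['C', 'A', 'B'], ['D'], ['D'], ['Ao', 'B']]

# Keep the first sublist seen for each distinct character set, in original order.
seen = set()
unique_lists = []
for l in vars_list:
    key = frozenset(l)
    if key not in seen:
        seen.add(key)
        unique_lists.append(l)
print(unique_lists)  # [['A', 'B'], ['A', 'B', 'C'], ['D'], ['Ao', 'B']]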

Data Structure Option

I'm wondering what data structure is appropriate for storing information about chemical elements that I have in a text file. My program should
read and process input from the user. If the user enters an integer, the program
should display the symbol and name of the element with the number of protons
entered. If the user enters a string, my program should display the number
of protons for the element with that name or symbol.
The text file is formatted as below
# element.txt
1,H,Hydrogen
2,He,Helium
3,Li,Lithium
4,Be,Beryllium
...
I thought of a dictionary, but figured that mapping a string to a list can be tricky, as my program must respond based on whether the user provides an integer or a string.
You shouldn't be worried about the "performance" of looking for an element:
There are no more than 200 elements, which is a small number for a computer;
Since the program interacts with a human user, the human will be orders of magnitude slower than the computer anyway.
Option 1: pandas.DataFrame
Hence I suggest a simple pandas DataFrame:
import pandas as pd

# header=None: the file has no header row, so supply the column names here
df = pd.read_csv('element.txt', header=None, names=['Number', 'Symbol', 'Name'])

def get_column_and_key(s):
    s = s.strip()
    try:
        k = int(s)
        return 'Number', k
    except ValueError:
        if len(s) <= 2:
            return 'Symbol', s
        else:
            return 'Name', s

def find_element(s):
    column, key = get_column_and_key(s)
    return df[df[column] == key]

def play():
    keep_going = True
    while keep_going:
        s = input('>>>> ')
        if s[0] == 'q':
            keep_going = False
        else:
            print(find_element(s))

if __name__ == '__main__':
    play()
See also:
Finding elements in a pandas dataframe
Option 2: three redundant dicts
One of Python's most used data structures is dict. Here we have three different possible keys, so we'll use three dicts.
import csv

with open('element.txt', 'r') as f:
    data = csv.reader(f)
    elements_by_num = {}
    elements_by_symbol = {}
    elements_by_name = {}
    for row in data:
        num, symbol, name = int(row[0]), row[1], row[2]
        elements_by_num[num] = num, symbol, name
        elements_by_symbol[symbol] = num, symbol, name
        elements_by_name[name] = num, symbol, name

def get_dict_and_key(s):
    s = s.strip()
    try:
        k = int(s)
        return elements_by_num, k
    except ValueError:
        if len(s) <= 2:
            return elements_by_symbol, s
        else:
            return elements_by_name, s

def find_element(s):
    d, key = get_dict_and_key(s)
    return d[key]

def play():
    keep_going = True
    while keep_going:
        s = input('>>>> ')
        if s[0] == 'q':
            keep_going = False
        else:
            print(find_element(s))

if __name__ == '__main__':
    play()
You are right that it is tricky. However, I suggest you just make three dictionaries. You certainly can store the data in a 2D list instead, but that would be much harder to build and access than three dicts. If you wish, you can join the three dicts into one. I personally wouldn't, but the final choice is always up to you.
weight = {1: ("H", "Hydrogen"), 2: ...}
symbol = {"H": (1, "Hydrogen"), "He": ...}
name = {"Hydrogen": (1, "H"), "Helium": ...}
If you want to get into databases and query languages, I suggest looking into sqlite3. It's a classic, thus it's well documented.
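For completeness, a minimal sqlite3 sketch of the same lookups; the table and column names here are my own choice, and element.txt is assumed to contain plain number,symbol,name lines as shown in the question:
import sqlite3

conn = sqlite3.connect(':memory:')  # or 'elements.db' for a persistent file
conn.execute('CREATE TABLE elements (number INTEGER PRIMARY KEY, symbol TEXT, name TEXT)')

with open('element.txt') as f:
    rows = [line.strip().split(',') for line in f if line.strip()]
conn.executemany('INSERT INTO elements VALUES (?, ?, ?)',
                 [(int(num), symbol, name) for num, symbol, name in rows])

def find_element(s):
    s = s.strip()
    if s.isdigit():  # integer input: look up by proton number
        return conn.execute('SELECT * FROM elements WHERE number = ?',
                            (int(s),)).fetchone()
    # string input: match either the symbol or the full name
    return conn.execute('SELECT * FROM elements WHERE symbol = ? OR name = ?',
                        (s, s)).fetchone()

print(find_element('2'))       # (2, 'He', 'Helium')
print(find_element('Helium'))  # (2, 'He', 'Helium')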

How to add data with same key in dictionary [duplicate]

I have a text file which contains duplicate car registration numbers with different values, like so:
EDF768, Bill Meyer, 2456, Vet_Parking
TY5678, Jane Miller, 8987, AgHort_Parking
GEF123, Jill Black, 3456, Creche_Parking
ABC234, Fred Greenside, 2345, AgHort_Parking
GH7682, Clara Hill, 7689, AgHort_Parking
JU9807, Jacky Blair, 7867, Vet_Parking
KLOI98, Martha Miller, 4563, Vet_Parking
ADF645, Cloe Freckle, 6789, Vet_Parking
DF7800, Jacko Frizzle, 4532, Creche_Parking
WER546, Olga Grey, 9898, Creche_Parking
HUY768, Wilbur Matty, 8912, Creche_Parking
EDF768, Jenny Meyer, 9987, Vet_Parking
TY5678, Jo King, 8987, AgHort_Parking
JU9807, Mike Green, 3212, Vet_Parking
I want to create a dictionary from this data, which uses the registration numbers (first column) as keys and the data from the rest of the line for values.
I wrote this code:
data_dict = {}
data_list = []

def createDictionaryModified(filename):
    path = r"C:\Users\user\Desktop"  # raw string: \U would be an invalid escape
    basename = "ParkingData_Part3.txt"
    filename = path + "//" + basename
    file = open(filename)
    contents = file.read()
    print(contents, "\n")
    data_list = [lines.split(",") for lines in contents.split("\n")]
    for line in data_list:
        regNumber = line[0]
        name = line[1]
        phoneExtn = line[2]
        carpark = line[3].strip()
        details = (name, phoneExtn, carpark)
        data_dict[regNumber] = details
    print(data_dict, "\n")
    print(data_dict.items(), "\n")
    print(data_dict.values())
The problem is that the data file contains duplicate values for the registration numbers. When I try to store them in the same dictionary with data_dict[regNumber] = details, the old value is overwritten.
How do I make a dictionary with duplicate keys?
Sometimes people want to "combine" or "merge" multiple existing dictionaries by just putting all the items into a single dict, and are surprised or annoyed that duplicate keys are overwritten. See the related question How to merge dicts, collecting values from matching keys? for dealing with this problem.
Python dictionaries don't support duplicate keys. One way around this is to store lists or sets inside the dictionary.
One easy way to achieve this is by using defaultdict:
from collections import defaultdict
data_dict = defaultdict(list)
All you have to do is replace
data_dict[regNumber] = details
with
data_dict[regNumber].append(details)
and you'll get a dictionary of lists.
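Applied to the parking data from the question, a minimal sketch (file name as in the question):
from collections import defaultdict

data_dict = defaultdict(list)
with open("ParkingData_Part3.txt") as file:
    for line in file:
        regNumber, name, phoneExtn, carpark = [field.strip() for field in line.split(",")]
        # duplicate registration numbers accumulate instead of overwriting
        data_dict[regNumber].append((name, phoneExtn, carpark))

print(data_dict["EDF768"])
# [('Bill Meyer', '2456', 'Vet_Parking'), ('Jenny Meyer', '9987', 'Vet_Parking')]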
You can change the behavior of the built-in types in Python. For your case it's really easy to create a dict subclass that will store duplicated values in lists under the same key automatically:
class Dictlist(dict):
    def __setitem__(self, key, value):
        try:
            self[key]
        except KeyError:
            super(Dictlist, self).__setitem__(key, [])
        self[key].append(value)
Output example:
>>> d = dictlist.Dictlist()
>>> d['test'] = 1
>>> d['test'] = 2
>>> d['test'] = 3
>>> d
{'test': [1, 2, 3]}
>>> d['other'] = 100
>>> d
{'test': [1, 2, 3], 'other': [100]}
Rather than using a defaultdict or messing around with membership tests or manual exception handling, use the setdefault method to add new empty lists to the dictionary when they're needed:
results = {}  # use a normal dictionary for our output
for k, v in some_data:  # the keys may be duplicates
    results.setdefault(k, []).append(v)  # magic happens here!
setdefault checks to see if the first argument (the key) is already in the dictionary. If it doesn't find anything, it assigns the second argument (the default value, an empty list in this case) as a new value for the key. If the key does exist, nothing special is done (the default goes unused). In either case though, the value (whether old or new) gets returned, so we can unconditionally call append on it (knowing it should always be a list).
You can't have a dict with duplicate keys by definition!
Instead you can use a single key and, as the value, a list of elements that had that key.
So you can follow these steps:
1. See if the current element's key (of your initial set) is in the final dict. If it is, go to step 3.
2. Update the dict with the key.
3. Append the new value to the dict[key] list.
4. Repeat steps 1-3.
If you want to have lists only when they are necessary, and values in any other cases, then you can do this:
class DictList(dict):
    def __setitem__(self, key, value):
        try:
            # Assumes there is a list on the key
            self[key].append(value)
        except KeyError:  # If it fails, because there is no key
            super(DictList, self).__setitem__(key, value)
        except AttributeError:  # If it fails because it is not a list
            super(DictList, self).__setitem__(key, [self[key], value])
You can then do the following:
dl = DictList()
dl['a'] = 1
dl['b'] = 2
dl['b'] = 3
Which will store the following {'a': 1, 'b': [2, 3]}.
I tend to use this implementation when I want to have reverse/inverse dictionaries, in which case I simply do:
my_dict = {1: 'a', 2: 'b', 3: 'b'}
rev = DictList()
for k, v in my_dict.items():
    rev[v] = k
Which will generate the same output as above: {'a': 1, 'b': [2, 3]}.
CAVEAT: This implementation relies on the non-existence of the append method (in the values you are storing). This might produce unexpected results if the values you are storing are lists. For example,
dl = DictList()
dl['a'] = 1
dl['b'] = [2]
dl['b'] = 3
would produce the same result as before, {'a': 1, 'b': [2, 3]}, but one might have expected the following: {'a': 1, 'b': [[2], 3]}.
You can refer to the following article:
http://www.wellho.net/mouth/3934_Multiple-identical-keys-in-a-Python-dict-yes-you-can-.html
In a dict, if a key is an object, there are no duplicate problems.
For example:
class p(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
    def __str__(self):
        return self.name

d = {p('k'): 1, p('k'): 2}
You can't have duplicated keys in a dictionary. Use a dict of lists:
for line in data_list:
    regNumber = line[0]
    name = line[1]
    phoneExtn = line[2]
    carpark = line[3].strip()
    details = (name, phoneExtn, carpark)
    if regNumber not in data_dict:  # dict.has_key was removed in Python 3
        data_dict[regNumber] = [details]
    else:
        data_dict[regNumber].append(details)
It's a pretty old question, but maybe my solution will help someone.
By overriding the __hash__ magic method, you can save equal-looking objects under separate keys in a dict.
Example:
from random import choices

class DictStr(str):
    """
    This class behaves exactly like the str class but
    can be duplicated as a dict key
    """
    def __new__(cls, value='', custom_id='', id_length=64):
        # If you want to know why I use __new__ instead of __init__,
        # SEE: https://stackoverflow.com/a/2673863/9917276
        obj = str.__new__(cls, value)
        if custom_id:
            obj.id = custom_id
        else:
            # Make a random string with length of 64
            choice_str = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
            obj.id = ''.join(choices(choice_str, k=id_length))
        return obj

    def __hash__(self) -> int:
        return self.id.__hash__()
Now let's create a dict:
>>> a_1 = DictStr('a')
>>> a_2 = DictStr('a')
>>> a_3 = 'a'
>>> a_1
a
>>> a_2
a
>>> a_1 == a_2 == a_3
True
>>> d = dict()
>>> d[a_1] = 'some_data'
>>> d[a_2] = 'other'
>>> print(d)
{'a': 'some_data', 'a': 'other'}
NOTE: This solution can be applied to any basic data type (int, float, ...).
EXPLANATION: We can use almost any object as a key in the dict class (mostly known as HashMap or HashTable in other languages), but there has to be a way to distinguish between keys, because dict has no idea about the objects themselves.
For this purpose, objects that are to be added to a dictionary as keys have to provide a unique identifier (I name it uniq_id; it's actually a number created with a hash algorithm).
Because the dictionary structure is widely used in most solutions, most programming languages hide the generation of the object's uniq_id inside a built-in hash method that dict calls during key lookup.
So if you manipulate the hash method of your class, you can change how your class behaves as a dictionary key.
A dictionary does not support duplicate keys; instead, you can use a defaultdict.
Below is an example of how to use defaultdict in Python 3.x to solve your problem:
from collections import defaultdict

sdict = defaultdict(list)
data_list = [lines.split(",") for lines in contents.split("\n")]  # contents as read in the question
for data in data_list:
    key = data.pop(0)
    detail = data
    sdict[key].append(detail)

print("\n", dict(sdict))
The code above would produce output as follows:
{'EDF768': [[' Bill Meyer', ' 2456', ' Vet_Parking'], [' Jenny Meyer', ' 9987', ' Vet_Parking']], 'TY5678': [[' Jane Miller', ' 8987', ' AgHort_Parking'], [' Jo King', ' 8987', ' AgHort_Parking']], 'GEF123': [[' Jill Black', ' 3456', ' Creche_Parking']], 'ABC234': [[' Fred Greenside', ' 2345', ' AgHort_Parking']], 'GH7682': [[' Clara Hill', ' 7689', ' AgHort_Parking']], 'JU9807': [[' Jacky Blair', ' 7867', ' Vet_Parking'], [' Mike Green', ' 3212', ' Vet_Parking']], 'KLOI98': [[' Martha Miller', ' 4563', ' Vet_Parking']], 'ADF645': [[' Cloe Freckle', ' 6789', ' Vet_Parking']], 'DF7800': [[' Jacko Frizzle', ' 4532', ' Creche_Parking']], 'WER546': [[' Olga Grey', ' 9898', ' Creche_Parking']], 'HUY768': [[' Wilbur Matty', ' 8912', ' Creche_Parking']]}
