Finding commonly occurring words across multiple lists

Finding commonly occurring words across multiple lists - python-3.x

I have 5 lists of words. I need to find all words occurring in more than 2 lists. Any word can occur multiple times in a list.
I have used collections.Counter but it only returns the frequencies of all the words in individual lists.
a = ['wood', 'tree', 'bark', 'log']
b = ['branch', 'mill', 'boat', 'boat', 'house']
c = ['log', 'tree', 'water', 'boat']
d = ['water', 'log', 'branch', 'water']
e = ['branch', 'rock', 'log']
For example, the output from these lists should be {'log': 4, 'branch': 3}, as 'log' is present in 4 lists and 'branch' in 3.

Without Counter:
a = ['wood', 'tree', 'bark', 'log']
b = ['branch', 'mill', 'boat', 'boat', 'house']
c = ['log', 'tree', 'water', 'boat']
d = ['water', 'log', 'branch', 'water']
e = ['branch', 'rock', 'log']
all_lists = [a, b, c, d, e]
all_words = set().union(w for l in all_lists for w in l)
out = {}
for word in all_words:
    s = sum(word in l for l in all_lists)
    if s > 2:
        out[word] = s
print(out)
Prints:
{'branch': 3, 'log': 4}
Edit (to print the names of lists):
a = ['wood', 'tree', 'bark', 'log']
b = ['branch', 'mill', 'boat', 'boat', 'house']
c = ['log', 'tree', 'water', 'boat']
d = ['water', 'log', 'branch', 'water']
e = ['branch', 'rock', 'log']
all_lists = {'a':a, 'b':b, 'c':c, 'd':d, 'e':e}
all_words = set().union(w for l in all_lists.values() for w in l)
out = {}
for word in all_words:
    s = sum(word in l for l in all_lists.values())
    if s > 2:
        out[word] = s
for k, v in out.items():
    print('Word : {}'.format(k))
    print('Count: {}'.format(v))
    print('Lists: {}'.format(', '.join(kk for kk, vv in all_lists.items() if k in vv)))
    print()
Prints:
Word : log
Count: 4
Lists: a, c, d, e
Word : branch
Count: 3
Lists: b, d, e

You can sum the counters, starting with an empty Counter():
from collections import Counter
lists = [a, b, c, d, e]
total = sum((Counter(set(lst)) for lst in lists), Counter())
# Counter({'log': 4, 'branch': 3, 'tree': 2, 'boat': 2, 'water': 2,
# 'wood': 1, 'bark': 1, 'house': 1, 'mill': 1, 'rock': 1})
res = {word: occ for word, occ in total.items() if occ > 2}
# {'log': 4, 'branch': 3}
Note that I convert each list to a set first, so that a word appearing more than once in the same list is counted only once.
If you need to know which lists each word came from, you could try this:
lists = {"a": a, "b": b, "c": c, "d": d, "e": e}
total = sum((Counter(set(lst)) for lst in lists.values()), Counter())
# Counter({'log': 4, 'branch': 3, 'tree': 2, 'boat': 2, 'water': 2,
# 'wood': 1, 'bark': 1, 'house': 1, 'mill': 1, 'rock': 1})
res = {word: occ for word, occ in total.items() if occ > 2}
# {'log': 4, 'branch': 3}
word_appears_in = {
    word: [key for key, value in lists.items() if word in value] for word in res
}
# {'log': ['a', 'c', 'd', 'e'], 'branch': ['b', 'd', 'e']}
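The two steps above (count, then look up which lists hold each word) can also be folded into a single pass. The following is only a sketch of that idea; the helper name `lists_containing_each_word` and the variable `named_lists` are made up for illustration and are not part of the answers above.

```python
from collections import defaultdict


def lists_containing_each_word(named_lists):
    """Map every word to the names of the lists it appears in."""
    membership = defaultdict(list)
    for name, words in named_lists.items():
        # set(words) makes sure a word is recorded once per list,
        # even if it is repeated inside that list.
        for word in set(words):
            membership[word].append(name)
    return membership


named_lists = {
    "a": ['wood', 'tree', 'bark', 'log'],
    "b": ['branch', 'mill', 'boat', 'boat', 'house'],
    "c": ['log', 'tree', 'water', 'boat'],
    "d": ['water', 'log', 'branch', 'water'],
    "e": ['branch', 'rock', 'log'],
}

membership = lists_containing_each_word(named_lists)
# Keep only the words found in more than 2 lists.
common = {word: names for word, names in membership.items() if len(names) > 2}
print(common)
# contains 'log': ['a', 'c', 'd', 'e'] and 'branch': ['b', 'd', 'e'] (key order may vary)
```

The number of lists per word is then just len(names), so no separate Counter is needed in this variant.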

Related

How to check the values in two dictionaries have the same type?

For example, I have two dictionaries having the same keys:
a = {"a": 1, "b": 2, "c":4.5, "d":[1,2], "e":"string", "f":{"f1":0.0, "f2":1.5}}
b = {"a": 10, "b": 20, "c":3.5, "d":[0,2,4], "e":"q", "f":{"f1":1.0, "f2":0.0}}
and I want to compare the types. My code is something like this:
if type(a["a"]) == type(b["a"]) and type(a["b"]) == type(b["b"]) and type(a["c"]) == type(b["c"]) and type(a["d"]) == type(b["d"]) and type(a["e"]) == type(b["e"]) and type(a["f"]) == type(b["f"]) and type(a["f"]["f1"]) == type(b["f"]["f1"]) and type(a["f"]["f2"]) == type(b["f"]["f2"]):
    first_type = type(b["d"][0])
    if all((type(x) is first_type) for x in a["d"]):
        # do something
        pass
Is there a better way to do it?

You can build a set of the keys the two dicts have in common:
common_keys = a.keys() & b.keys()
and then iterate over them to check the types:
for k in common_keys:
    if type(a[k]) == type(b[k]):
        print("Yes, same type! " + k, a[k], b[k])
    else:
        print("Nope! " + k, a[k], b[k])
And if you want to go deeper, check whether any of the items are dicts, then rinse and repeat:
for k in common_keys:
    if type(a[k]) == type(b[k]):
        print("Yes, same type! " + k, type(a[k]), type(b[k]))
        if isinstance(a[k], dict):
            ck = a[k].keys() & b[k].keys()
            for key in ck:
                if type(a[k][key]) == type(b[k][key]):
                    print("Yes, same type! " + key, type(a[k][key]), type(b[k][key]))
                else:
                    print("Nope!")
    else:
        print("Nope! " + k, type(a[k]), type(b[k]))

You can use a for loop to iterate through the dicts:
same_types = True
for key in a.keys():
    if type(a[key]) != type(b[key]):
        same_types = False
        break
    # if the value is a dict, check nested value types
    if type(a[key]) == dict:
        for nest_key in a[key].keys():
            if type(a[key][nest_key]) != type(b[key][nest_key]):
                same_types = False
                break
    # if the value is a list, check all list elements
    # I just simply concat two lists together, you can also refer to
    # https://stackoverflow.com/q/35554208/19322223
    elif type(a[key]) == list:
        # compare against the type of the first element, not the element itself
        first_type = type(a[key][0])
        for elem in a[key] + b[key]:
            if type(elem) != first_type:
                same_types = False
                break
    if not same_types:
        break
if same_types:
    # do something
    pass

With the following helper function:
def get_types(obj, items=None):
    """Function that recursively traverses 'obj' and returns
    a list of all values and nested values types
    """
    # Test against None: an empty list is falsy, so "if not items" would
    # discard the shared list whenever the first value is a container.
    if items is None:
        items = []
    if isinstance(obj, dict):
        for value in obj.values():
            if not isinstance(value, (dict, list, set, tuple)):
                items.append(value)
            else:
                get_types(value, items)
    elif isinstance(obj, (list, set, tuple)):
        for value in obj:
            get_types(value, items)
    else:
        items.append(obj)
    return [type(x) for x in items]
You can compare the value types of two dictionaries, however deeply nested they are, like this:
if get_types(a) == get_types(b):
    print("Each a and b values are of same types")
In your example, the list under the d key has two elements in a ([1, 2]) but three in b ([0, 2, 4]), so the two type lists differ in length and nothing will be printed.
Let's take another example where both dictionaries have the same shape this time, but one value has a different type (the value nested under c2 in a and under f2 in b):
a = {"a": 1, "b": [[1, 2], [3, [4]]], "c": {"c1": 0.0, "c2": {"x": "9"}}}
b = {"d": 7, "e": [[2, 1], [5, [7]]], "f": {"f1": 8.9, "f2": {"y": 9}}}
if get_types(a) == get_types(b):
    print("Each a and b values are of same types")
Then again, nothing will be printed.
But if you replace 9 by "9" in b["f"]["f2"]["y"]:
a = {"a": 1, "b": [[1, 2], [3, [4]]], "c": {"c1": 0.0, "c2": {"x": "9"}}}
b = {"d": 7, "e": [[2, 1], [5, [7]]], "f": {"f1": 8.9, "f2": {"y": "9"}}}
if get_types(a) == get_types(b):
    print("Each a and b values are of same types")
# Output
# Each a and b values are of same types
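Note that flattening the values into one list of types ignores the keys and the nesting itself. If the structure should match as well, a pairwise recursive comparison is one alternative. This is only a sketch under stated assumptions: the name `same_value_types` is hypothetical, and it assumes that both dicts must have identical keys and that lists or tuples must have equal length, which is stricter than the question may require (the question's own d lists, [1, 2] and [0, 2, 4], would fail this check).

```python
def same_value_types(x, y):
    """Return True if x and y have the same type at every level of nesting."""
    if type(x) is not type(y):
        return False
    if isinstance(x, dict):
        # Assumption: both dicts must have identical keys; values are compared key by key.
        return x.keys() == y.keys() and all(same_value_types(x[k], y[k]) for k in x)
    if isinstance(x, (list, tuple)):
        # Assumption: sequences must have equal length; elements are compared by position.
        return len(x) == len(y) and all(same_value_types(i, j) for i, j in zip(x, y))
    return True


first = {"a": 1, "c": 4.5, "e": "string", "f": {"f1": 0.0, "f2": 1.5}}
second = {"a": 10, "c": 3.5, "e": "q", "f": {"f1": 1.0, "f2": 0.0}}
third = {"a": 10, "c": 3.5, "e": "q", "f": {"f1": 1.0, "f2": "0.0"}}

print(same_value_types(first, second))  # True
print(same_value_types(first, third))   # False: f2 is a str in third
```

Because it returns as soon as one pair of values disagrees, this version also avoids building the full list of types.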

Creating new dictionaries using keys from main dict[BEGINNER]

I'm trying to check whether each key ends with ":F" or ":M", so I can create new dictionaries called male and female.
d = {'Jane:F': 3, 'Tom:M': 2, 'Jeff:M': 5, 'Mary:F': 3} #initial dict
male = {'Tom': 2, 'Jeff': 5} #required output
female = {'Jane': 3, 'Mary': 3} #required output

You can split the key by : and then check for M and/or F:
d = {'Jane:F': 3, 'Tom:M': 2, 'Jeff:M': 5, 'Mary:F': 3}
male, female = {}, {}
for k, v in d.items():
    k, gender = k.split(':')
    if gender == 'M':
        male[k] = v
    else:
        female[k] = v
print(male)
print(female)
Prints:
{'Tom': 2, 'Jeff': 5}
{'Jane': 3, 'Mary': 3}
Another version using dict-comprehensions:
male = {k.split(':')[0]: v for k, v in d.items() if k.endswith(':M')}
female = {k.split(':')[0]: v for k, v in d.items() if k.endswith(':F')}
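If more suffixes than :M and :F could appear, the same split can be done generically by grouping on whatever follows the last colon. This is a sketch only; the name `groups` is made up for illustration, and it assumes every key contains a colon (a key without one would end up in a group named after the whole key, under an empty name).

```python
from collections import defaultdict

d = {'Jane:F': 3, 'Tom:M': 2, 'Jeff:M': 5, 'Mary:F': 3}

# groups maps each suffix ('M', 'F', ...) to its own {name: value} dict.
groups = defaultdict(dict)
for key, value in d.items():
    # rpartition splits on the last ':' and returns (name, ':', suffix).
    name, _, suffix = key.rpartition(':')
    groups[suffix][name] = value

male, female = groups['M'], groups['F']
print(male)    # {'Tom': 2, 'Jeff': 5}
print(female)  # {'Jane': 3, 'Mary': 3}
```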

how to generate the shortest string from two string

I would like to write a function that returns the shortest string C such that both A and B are substrings of C. The catch is that A does not have to be longer than B.
Examples:
A: 'abcd', B: 'cde' => C: 'abcde' # c,d is duplicated
A: 'abcd', B: 'ecd' => C: 'abcdecd' # no character duplicated so C is A + B
A: 'abc', B: 'cdeab' => C: 'cdeabc'
A: 'bce', B: 'eabc' => C: 'eabce' # length of eabce is 5, length of bceabc is 6
A: '', B: 'abc' => C: 'abc'
A: 'abc', B: '' => C: 'abc'
I have the following function, but it does not seem to be correct:
def checksubstring(A, B):
    if not A or not B: return A if not B else B
    index, string = 0, ''
    for i, c in enumerate(A):
        index = index + 1 if c == B[index] else 0
        string += c
    return string + B[index:]

You can back up from the end looking for a match like:
Code:
def shortest_substring(a, b):
    def checksubstring(a, b):
        if not a or not b:
            return b or a
        for i in range(1, len(b)):
            if a[len(a) - i:] == b[:i]:
                return a + b[i:]
        return a + b
    x = checksubstring(a, b)
    y = checksubstring(b, a)
    return x if len(x) <= len(y) else y
Test Code:
results = [
    {'A': 'abcd', 'B': 'cde', 'C': 'abcde'},
    {'A': 'abcd', 'B': 'ecd', 'C': 'abcdecd'},
    {'A': 'abc', 'B': 'cdeab', 'C': 'cdeabc'},
    {'A': 'bce', 'B': 'eabc', 'C': 'eabce'},
    {'A': '', 'B': 'abc', 'C': 'abc'},
    {'A': 'abc', 'B': '', 'C': 'abc'},
]
for result in results:
    assert result['C'] == shortest_substring(result['A'], result['B'])
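One caveat, as far as I can tell: the loop above tries the shortest overlap first and never tests an overlap of len(b), so an input such as ('aa', 'aab') comes back as 'aaab' rather than 'aab', and a string fully contained in the other (for example 'bc' in 'abcd') is still appended. The sketch below tries the longest overlap first and checks containment up front. The function names `merge_with_overlap` and `shortest_common_superstring` are made up for illustration.

```python
def merge_with_overlap(left, right):
    """Append right to left, reusing the longest suffix of left that is a prefix of right."""
    if right in left:
        # right is already a substring of left (this also covers right == '').
        return left
    # Try the longest possible overlap first, then shrink.
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return left + right


def shortest_common_superstring(a, b):
    """Return the shorter of the two merge orders (ties go to a-then-b)."""
    first = merge_with_overlap(a, b)
    second = merge_with_overlap(b, a)
    return first if len(first) <= len(second) else second


print(shortest_common_superstring('abcd', 'cde'))  # abcde
print(shortest_common_superstring('aa', 'aab'))    # aab
print(shortest_common_superstring('abcd', 'bc'))   # abcd
```

It gives the same results as the test cases listed above; it differs only on inputs with several possible overlaps or with one string inside the other.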

You must check both orders, (A, B) and (B, A), and then return the shorter of the two results:
def checksubstring(A, B):
    if not A or not B: return A if not B else B
    index, string = 0, ''
    for i, c in enumerate(A):
        index = index + 1 if c == B[index] else 0
        string += c
    return string + B[index:]
def test(A, B):
    s1 = checksubstring(A, B)
    s2 = checksubstring(B, A)
    if len(s1) > len(s2):
        return s2
    else:
        return s1
print(test('abcd', 'cde')) # => C: 'abcde' # c,d is duplicated
print(test('abcd', 'ecd')) # => C: 'abcdecd' # no character duplicated so C is A + B
print(test('abc', 'cdeab')) # => C: 'cdeabc'
print(test('bce', 'eabc')) # => C: 'eabce' # length of eabce is 5, length of bceabc is 6
print(test('', 'abc')) # => C: 'abc'
print(test('abc', '')) # => C: 'abc'

Python Box and Pointers in lists

I'm studying for my final semester written paper. I was taught to trace the mutations of lists using box and pointer diagrams. However there was one question I came across where my method didn't work.
#Main code 1
a = [1, 2]
b = [a, a]
c = a.copy()
c[0], a[1] = b[1], c[0] #replace this
#code A
##c[0]=b[1]
##a[1]=c[0]
#code B
##a[1]=c[0]
##c[0]=b[1]
print(a)
print(b)
print(c)
##Normal Output/Code B
##[1, 1]
##[[1, 1], [1, 1]]
##[[1, 1], 2]
##Code A output
##[1, [...]]
##[[1, [...]], [1, [...]]]
##[[1, [...]], 2]
The main code is written as given, and the expected answer is the one shown under Normal Output (confirmed in Python 3 IDLE). When I traced it on paper, I thought the code would produce the Code A output instead.
Is there something I am missing here?
Here are some similar mutation questions where my tracing method worked, but I can't see what makes the first example behave differently.
#Code2 where switching line 3 like code 1 doesn't matter
a = [['a', 'b'], ['c'], 'd']
b = a[:-1]
a[1], b[0][1] = b[0], a[2] #switch this
print(a)
print(b)
##[['a', 'd'], ['a', 'd'], 'd']
##[['a', 'd'], ['c']]
#Code3 that has the same output even with a 1 liner replacement
a = [1,2,3]
b = [a,3,4]
a[2] = b
b[0][0] = b
a[0][1] = 99
a[2][0] = 5
#a[2],b[0][0],a[0][1],a[2][0]=b,b,99,5 #1 liner replacement
print(a)
#[[5, 99, 4], 2, [5, 99, 4]]

An assignment statement has two parts, a left side and a right side. The right side is evaluated first, in full. Only then are the targets on the left side assigned, one at a time from left to right, through tuple unpacking if necessary.
So c[0], a[1] = b[1], c[0] becomes c[0], a[1] = (a, 1). Then c[0] = a, then a[1] = 1.
So you end up with the final result c == [[1, 1], 2] and a == [1, 1], with b == [a, a] still.
A key thing to realize here is that 1, 2, 3 is the syntax for a tuple.
a, b = 1, 2 and a, b = (1, 2) are exactly the same. It's just that in most contexts there are operators around that require you to parenthesize the tuple elements.
The second example (Code2) is easier to follow if we give a[0] its own variable name. So
x = ['a', 'b']
a = [x, ['c'], 'd']
b = a[:-1]
Then a[1], b[0][1] = b[0], a[2] -> a[1], b[0][1] = x, 'd' -> a[1] = x and x[1] = 'd'.
So we end up with x = ['a', 'd'], a = [x, x, 'd'] and b = [x, ['c']]. The ['c'] isn't an x because the second list in a wasn't mutated, it was replaced in a but not in b.
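The difference between the one-line tuple assignment and Code A from the question can be checked directly with identity tests. This is a small sketch; the names a2, b2 and c2 are just fresh copies introduced here so that both variants can run in the same script.

```python
# Variant 1: the tuple assignment from the question.
a = [1, 2]
b = [a, a]
c = a.copy()
# The right side is evaluated first, giving the tuple (a, 1).
# Only then do the assignments c[0] = a and a[1] = 1 happen.
c[0], a[1] = b[1], c[0]
print(a, b, c)           # [1, 1] [[1, 1], [1, 1]] [[1, 1], 2]
print(c[0] is a)         # True: c[0] refers to the same list object as a

# Variant 2: "Code A", two separate statements on fresh lists.
a2 = [1, 2]
b2 = [a2, a2]
c2 = a2.copy()
c2[0] = b2[1]            # c2[0] now refers to a2
a2[1] = c2[0]            # reads the NEW c2[0], so a2 now contains itself
print(a2[1] is a2)       # True: this is what Python prints as [1, [...]]
```

In the first variant the old value of c[0] (the integer 1) is captured before c[0] is overwritten, which is why a does not end up containing itself.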

Python: Take each item and iterate through all items in another list

I have two lists.
A=[1,2,3,4,5,6]
B=['a','b','c']
How do I iterate through all the elements of B for each element of list A.

I guess you want itertools.product:
from itertools import product
A = [1, 2, 3, 4, 5, 6]
B = ['a', 'b', 'c']
for a, b in product(A, B):
    print(a, b)
which produces
1 a
1 b
1 c
2 a
2 b
2 c
3 a
...

You can do it with a simple list comprehension and string formatting:
>>> ["{}{}".format(i,k) for i in A for k in B]
['1a', '1b', '1c', '2a', '2b', '2c', '3a', '3b', '3c', '4a', '4b', '4c', '5a', '5b', '5c', '6a', '6b', '6c']

You can do it by using nested for loops.
A=[1,2,3,4,5,6]
B=['a','b','c']
c=[]
for i in A:
    for j in B:
        c.append(str(i)+j)
print(c)
output:
['1a', '1b', '1c', '2a', '2b', '2c', '3a', '3b', '3c', '4a', '4b', '4c', '5a', '5b', '5c', '6a', '6b', '6c']
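The nested loops and itertools.product walk the pairs in the same order, so the two approaches can be combined. A minimal sketch, where the name `labels` is made up for illustration:

```python
from itertools import product

A = [1, 2, 3, 4, 5, 6]
B = ['a', 'b', 'c']

# product(A, B) yields (1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ...
# which is the same order as "for i in A: for j in B:".
labels = [f"{number}{letter}" for number, letter in product(A, B)]
print(labels[:4])   # ['1a', '1b', '1c', '2a']
print(len(labels))  # 18
```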
