As a beginner in Python, I want the simplest way, without complicated built-in modules or a very complex program, to merge the sublists that share common elements. From that merged list I then want to build a dictionary in which the smallest element of each group is the key and the remaining elements of the group are the value.
For example,
list = [[a, o], [c, r], [d, m], [l, p], [g, t], [e, r], [e, s], [e, o]]
and my merged list should be,
merged list = [[a, e, r, c, s, o], [l, p], [d, m], [g, t]]
and my dictionary output should be,
pair dictionary = {'a': [c, e, r, s, o], 'l': 'p', 'd': 'm', 'g': 't'}
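One possible approach, sketched with plain loops only: for each pair, absorb every existing group that shares an element with it, then sort each final group and use its first element as the key. The names `pairs`, `merged` and `pair_dict` are illustrative, the elements are assumed to be strings, and every value is kept as a list (so 'l' maps to ['p'] rather than 'p').

```python
pairs = [['a', 'o'], ['c', 'r'], ['d', 'm'], ['l', 'p'],
         ['g', 't'], ['e', 'r'], ['e', 's'], ['e', 'o']]

merged = []
for pair in pairs:
    group = list(pair)
    remaining = []
    for other in merged:
        if any(x in other for x in group):
            # This existing group overlaps the new one: absorb it,
            # skipping elements that are already present.
            for x in other:
                if x not in group:
                    group.append(x)
        else:
            remaining.append(other)
    remaining.append(group)
    merged = remaining

pair_dict = {}
for group in merged:
    ordered = sorted(group)
    # Smallest element is the key, the rest of the group is the value.
    pair_dict[ordered[0]] = ordered[1:]

print(merged)
print(pair_dict)
```

The order of elements inside each merged group depends on the order in which pairs are processed, so it may differ from the order shown in the question.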
Suppose I have four one-dimensional numpy arrays A, B, C, D, and I want to create a matrix M such that each entry M[i, j, k, l] of the matrix is the tuple (a, b, c, d)
where a = A[i], b = B[j], c = C[k] and d = D[l].
How can I go about constructing it efficiently without loops?
You can create an empty array M with the correct shape (note the 4 in the last dimension -- that's your tuple), and use broadcasting to assign entire rows/columns in M afterwards.
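import numpy as np
# A, B, C, D are the four 1-D arrays from the question.
# np.empty defaults to float64; pass dtype=... to keep another type.
M = np.empty((len(A), len(B), len(C), len(D), 4))
# Each input is broadcast along the three axes it does not index.
M[..., 0] = A[:, None, None, None]
M[..., 1] = B[None, :, None, None]
M[..., 2] = C[None, None, :, None]
M[..., 3] = D[None, None, None, :]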
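As a cross-check, the same array can presumably be built with np.meshgrid and np.stack. This is a sketch of an alternative technique, not the broadcasting assignment above, and the small sample arrays are made up for illustration.

```python
import numpy as np

# Hypothetical sample inputs of different lengths.
A = np.array([1, 2])
B = np.array([10, 20, 30])
C = np.array([100, 200])
D = np.array([1000, 2000, 3000, 4000])

# indexing="ij" keeps the axis order (i, j, k, l) instead of swapping the first two.
grids = np.meshgrid(A, B, C, D, indexing="ij")
M = np.stack(grids, axis=-1)

print(M.shape)        # (2, 3, 2, 4, 4)
print(M[1, 2, 0, 3])  # [   2   30  100 4000]
```

The last axis holds the tuple, so M[i, j, k, l] is the length-4 array (A[i], B[j], C[k], D[l]).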
I have a dictionary grade_lists that has the student name as the key and a list of grades as the value:
{'Dave':[100,95,95]}
I need to assign assignment names for the grade values in a new dictionary; in this case, ['Exam 1', 'Exam 2', 'Exam 3']
so that
new_dict['Dave']['Exam 2'] == 95
You could do a nested dict comprehension, using a zip to match up the assignment names and grade values.
grade_lists = {'Dave': [100, 95, 95]}
names = ['Exam 1', 'Exam 2', 'Exam 3']
new_dict = {k: {n: g for n, g in zip(names, v)} for k, v in grade_lists.items()}
print(new_dict)
Output:
{'Dave': {'Exam 1': 100, 'Exam 2': 95, 'Exam 3': 95}}
A short variant of @wjandrea's answer:
new_d = {k: {x: v for x, v in zip(lst, d[k])} for k in d}
where d is your existing dictionary and lst is the list of exam names. Each student's own list d[k] is zipped with the names, so this also works when d holds more than one student.
Example:
d = {'Dave':[100,95,95]}
lst = ['Exam 1', 'Exam 2', 'Exam 3']
new_d = {k: {x: v for x, v in zip(lst, d[k])} for k in d}
# {'Dave': {'Exam 1': 100, 'Exam 2': 95, 'Exam 3': 95}}
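With more than one student, each student's own grade list has to be paired with the exam names. A minimal sketch, where the second student 'Ann' and her grades are hypothetical and added only for illustration:

```python
grade_lists = {'Dave': [100, 95, 95], 'Ann': [88, 92, 79]}  # 'Ann' is hypothetical
names = ['Exam 1', 'Exam 2', 'Exam 3']

# dict(zip(...)) builds the inner {exam name: grade} mapping for one student.
new_dict = {
    student: dict(zip(names, grades))
    for student, grades in grade_lists.items()
}

print(new_dict['Dave']['Exam 2'])  # 95
print(new_dict['Ann']['Exam 3'])   # 79
```

Note that zip stops at the shorter input, so a student with fewer grades than exam names would simply get fewer entries.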
I have 5 lists of words. I need to find all words occurring in more than 2 lists. Any word can occur multiple times in a list.
I have used collections.Counter but it only returns the frequencies of all the words in individual lists.
a = ['wood', 'tree', 'bark', 'log']
b = ['branch', 'mill', 'boat', 'boat', 'house']
c = ['log', 'tree', 'water', 'boat']
d = ['water', 'log', 'branch', 'water']
e = ['branch', 'rock', 'log']
For example, the output from these lists should be {'log': 4, 'branch': 3}, as 'log' is present in 4 lists and 'branch' in 3.
Without Counter:
a = ['wood', 'tree', 'bark', 'log']
b = ['branch', 'mill', 'boat', 'boat', 'house']
c = ['log', 'tree', 'water', 'boat']
d = ['water', 'log', 'branch', 'water']
e = ['branch', 'rock', 'log']
all_lists = [a, b, c, d, e]
all_words = set().union(w for l in all_lists for w in l)
out = {}
for word in all_words:
    s = sum(word in l for l in all_lists)
    if s > 2:
        out[word] = s
print(out)
Prints:
{'branch': 3, 'log': 4}
Edit (to print the names of lists):
a = ['wood', 'tree', 'bark', 'log']
b = ['branch', 'mill', 'boat', 'boat', 'house']
c = ['log', 'tree', 'water', 'boat']
d = ['water', 'log', 'branch', 'water']
e = ['branch', 'rock', 'log']
all_lists = {'a':a, 'b':b, 'c':c, 'd':d, 'e':e}
all_words = set().union(w for l in all_lists.values() for w in l)
out = {}
for word in all_words:
    s = sum(word in l for l in all_lists.values())
    if s > 2:
        out[word] = s
for k, v in out.items():
    print('Word : {}'.format(k))
    print('Count: {}'.format(v))
    print('Lists: {}'.format(', '.join(kk for kk, vv in all_lists.items() if k in vv)))
    print()
Prints:
Word : log
Count: 4
Lists: a, c, d, e
Word : branch
Count: 3
Lists: b, d, e
You can sum the counters, starting with an empty Counter():
from collections import Counter
lists = [a, b, c, d, e]
total = sum((Counter(set(lst)) for lst in lists), Counter())
# Counter({'log': 4, 'branch': 3, 'tree': 2, 'boat': 2, 'water': 2,
# 'wood': 1, 'bark': 1, 'house': 1, 'mill': 1, 'rock': 1})
res = {word: occ for word, occ in total.items() if occ > 2}
# {'log': 4, 'branch': 3}
Note that I convert each list to a set first to avoid double-counting words that appear more than once in the same list.
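To illustrate why the set conversion matters, here is a small check using list b from the question, where 'boat' appears twice:

```python
from collections import Counter

b = ['branch', 'mill', 'boat', 'boat', 'house']

# Counting the raw list counts occurrences of each word.
print(Counter(b)['boat'])       # 2

# Counting the set counts each word at most once per list.
print(Counter(set(b))['boat'])  # 1
```

Without the set, 'boat' would reach a total of 3 across the five lists and be reported even though it appears in only two of them.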
If you need to know which lists the words came from, you could try this:
lists = {"a": a, "b": b, "c": c, "d": d, "e": e}
total = sum((Counter(set(lst)) for lst in lists.values()), Counter())
# Counter({'log': 4, 'branch': 3, 'tree': 2, 'boat': 2, 'water': 2,
# 'wood': 1, 'bark': 1, 'house': 1, 'mill': 1, 'rock': 1})
res = {word: occ for word, occ in total.items() if occ > 2}
# {'log': 4, 'branch': 3}
word_appears_in = {
    word: [key for key, value in lists.items() if word in value]
    for word in res
}
# {'log': ['a', 'c', 'd', 'e'], 'branch': ['b', 'd', 'e']}
I'm trying to understand the normalized squared Euclidean distance formula from the Wolfram documentation:
1/2*Norm[(u-Mean[u])-(v-Mean[v])]^2/(Norm[u-Mean[u]]^2+Norm[v-Mean[v]]^2)
I searched around for this formula on the web but couldn't find it. Can someone explain how this formula is derived?
The meaning of this formula is the following:
Distance between two vectors where their lengths have been scaled to
have unit norm. This is helpful when the direction of the vector is
meaningful but the magnitude is not.
https://stats.stackexchange.com/questions/136232/definition-of-normalized-euclidean-distance
Further to Luca's comment, here is an example showing the "distance between two vectors where their lengths have been scaled to have unit norm". It doesn't equal the normalised square Euclidean distance. The former is coloured blue in the graphic below. The standard Euclidean distance is coloured red.
(* Leave this unevaluated to see symbolic expressions *)
{{a, b, c}, {d, e, f}} = {{1, 2, 3}, {3, 5, 10}};
N[EuclideanDistance[{a, b, c}, {d, e, f}]]
7.87401
Norm[{a, b, c} - {d, e, f}]
SquaredEuclideanDistance[{a, b, c}, {d, e, f}]
Norm[{a, b, c} - {d, e, f}]^2
N[NormalizedSquaredEuclideanDistance[{a, b, c}, {d, e, f}]]
0.25
(1/2 Norm[({a, b, c} - Mean[{a, b, c}]) - ({d, e, f} - Mean[{d, e, f}])]^2)/
(Norm[{a, b, c} - Mean[{a, b, c}]]^2 + Norm[{d, e, f} - Mean[{d, e, f}]]^2)
1/2 Variance[{a, b, c} - {d, e, f}]/(Variance[{a, b, c}] + Variance[{d, e, f}])
{a2, b2, c2} = Normalize[{a, b, c}];
{d2, e2, f2} = Normalize[{d, e, f}];
N[EuclideanDistance[{a2, b2, c2}, {d2, e2, f2}]]
0.120185
Graphics3D[{Line[{{0, 0, 0}, {1, 2, 3}}],
Line[{{0, 0, 0}, {3, 5, 10}}],
Red, Thick, Line[{{1, 2, 3}, {3, 5, 10}}],
Blue, Line[{{a2, b2, c2}, {d2, e2, f2}}]},
Axes -> True, AspectRatio -> 1,
PlotRange -> {{0, 10}, {0, 10}, {0, 10}},
AxesLabel -> Map[Style[#, Bold, 16] &, {"x", "y", "z"}],
AxesEdge -> {{-1, -1}, {-1, -1}, {-1, -1}},
ViewPoint -> {1.275, -2.433, -1.975},
ViewVertical -> {0.551, -0.778, 0.302}]
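For readers without Mathematica, the two quantities compared above can be reproduced in plain Python. This is a sketch that follows the formula quoted in the question; the helper names `centered`, `sq_norm` and `normalize` are made up for this example.

```python
import math

u = [1, 2, 3]
v = [3, 5, 10]

def centered(x):
    """Subtract the mean from every component."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]

def sq_norm(x):
    """Squared Euclidean norm."""
    return sum(xi * xi for xi in x)

def normalize(x):
    """Scale x to unit Euclidean norm."""
    n = math.sqrt(sq_norm(x))
    return [xi / n for xi in x]

# Formula from the question: vectors are mean-centered, not unit-scaled.
cu, cv = centered(u), centered(v)
diff = [a - b for a, b in zip(cu, cv)]
nsed = 0.5 * sq_norm(diff) / (sq_norm(cu) + sq_norm(cv))
print(nsed)  # 0.25

# Euclidean distance between the unit-norm versions of u and v.
nu, nv = normalize(u), normalize(v)
unit_dist = math.sqrt(sq_norm([a - b for a, b in zip(nu, nv)]))
print(round(unit_dist, 6))  # 0.120185
```

The two results differ, which matches the point made above: the formula subtracts each vector's mean and divides by the sum of the centered squared norms, rather than scaling each vector to unit length.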