How to assign values to a dictionary - python-3.x

I am creating a function which is supposed to return a dictionary with keys and values from different lists. But I am having problems getting the mean of a list of numbers as the values of the dictionary. I think I am getting the keys right, however.
This is what I have so far:
def exp(magnitudes, measures):
    """return for each magnitude the associated mean of numbers from a list"""
    dict_expe = {}
    for mag in magnitudes:
        dict_expe[mag] = 0
        for mea in measures:
            summ = 0
            for n in mea:
                summ += n
            dict_expe[mag] = summ/len(mea)
    return dict_expe
print(exp(['mag1', 'mag2', 'mag3'], [[1,2,3],[3,4],[5]]))
The output should be:
{'mag1': 2, 'mag2': 3.5, 'mag3': 5}
But what I am getting is 5 as the value for every key. I thought about zip(), but I am trying to avoid it because I thought it requires both lists to have the same length.

The average of a sequence is sum(sequence) / len(sequence), so you need to iterate over magnitudes and measures together, calculate these means (arithmetic averages), and store them in a dictionary.
There are more Pythonic ways to achieve this. All of the examples below produce {'mag1': 2.0, 'mag2': 3.5, 'mag3': 5.0} as the result.
Using a for i in range() loop:
def exp(magnitudes, measures):
    means = {}
    for i in range(len(magnitudes)):
        means[magnitudes[i]] = sum(measures[i]) / len(measures[i])
    return means

print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
But if you need both the indices and the values of a list, the for i, val in enumerate(sequence) approach is more suitable here:
def exp(magnitudes, measures):
    means = {}
    for i, mag in enumerate(magnitudes):
        means[mag] = sum(measures[i]) / len(measures[i])
    return means

print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
Another problem hides here: the index i belongs to magnitudes, but we also use it to fetch values from measures. That is not a big deal in your case, where magnitudes and measures have the same length, but if magnitudes is longer you will get an IndexError. So using the zip function seems like the best choice here; note that zip does not require the two lists to have the same length, it simply stops at the end of the shorter one:
def exp(magnitudes, measures):
    means = {}
    for mag, mes in zip(magnitudes, measures):
        means[mag] = sum(mes) / len(mes)
    return means

print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
So feel free to use whichever example suits your requirements, and don't forget to add a docstring.
You may not need anything this compact, but the function can be even shorter when a dictionary comprehension comes into play:
def exp(magnitudes, measures):
    return {mag: sum(mes) / len(mes) for mag, mes in zip(magnitudes, measures)}

print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
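As a side note, the standard library can compute the mean for you; here is a minimal sketch of the same function using statistics.mean (available since Python 3.4). Note that statistics.mean keeps exact integers where possible, so it returns 2 rather than 2.0 here:
from statistics import mean

def exp(magnitudes, measures):
    """Return a dict mapping each magnitude to the mean of its measures."""
    return {mag: mean(mes) for mag, mes in zip(magnitudes, measures)}

print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
# {'mag1': 2, 'mag2': 3.5, 'mag3': 5}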

Related

How to get duplicates in sorted list with O(n) python

Here I have a list, and I have to get the duplicates from it. So I can use a solution like this:
arr = [1, 1, 2, 3, 4, 4, 5]

def get_duplicates(arr):
    duplicates = []
    for index in range(len(arr)-1):
        if arr[index] == arr[index+1]:
            duplicates.append(arr[index])
    return duplicates

print(*get_duplicates(arr))
OK, but what if I have three or more duplicates in my list? I did something like this:
arr = [1, 1, 1, 2, 3, 4, 4, 4, 4, 5]

def get_duplicates(arr):
    duplicates = []
    for index in range(len(arr)-1):
        if arr[index] == arr[index+1]:
            duplicates.append(arr[index])
    return duplicates

print(*set(get_duplicates(arr)))
Does my code run in O(n) in both cases? I don't know the time complexity of the set() constructor in Python, but the first for loop takes O(n), and if set() also takes O(n) it doesn't matter, because O(2n) = O(n). (Building a set from n items is indeed O(n) on average.)
Did I solve the task correctly, or is my code inefficient? If you know the right way to do it, please explain it to me.
Here is a version that is clearly O(n):
def get_duplicates(arr):
    last_duplicate = None
    duplicates = []
    # enumerate(arr[1:]) yields i = 0 for arr[1], so arr[i] is the previous element
    for i, v in enumerate(arr[1:]):
        if v == arr[i] and v != last_duplicate:
            duplicates.append(v)
            last_duplicate = v
    return duplicates
Note that this assumes, as your original code does, that duplicates will be adjacent to one another. It also assumes that the first duplicate is not None.
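As an aside, if the duplicates were not guaranteed to be adjacent (i.e. the list were unsorted), collections.Counter would still give an O(n) average-time solution; a minimal sketch:
from collections import Counter

def get_duplicates(arr):
    # one O(n) counting pass; keep the values seen more than once
    return [value for value, count in Counter(arr).items() if count > 1]

print(get_duplicates([1, 1, 1, 2, 3, 4, 4, 4, 4, 5]))  # [1, 4]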

Combination of elements in a list leading to a sum K

I have been struggling with a programming problem lately, and I would appreciate any help that I could get.
Basically, the input is a list of numbers (both positive and negative; also note that the numbers can repeat within the list), and I want to find the combinations of the numbers that add up to a sum K.
For example,
List - [1, 2, 3, 6, 5, -2]
Required sum - 8
Output - many combinations, like [5, 3], [5, 2, 1], [6, 3, 1, -2]... and so on
I do understand that there are solutions to this problem using module functions like itertools.combinations, or using recursion (subset-sum, which works efficiently only for positive numbers in the list), but I am looking for an efficient solution in python3 for lists of up to 100 numbers.
Any and every help would be appreciated.
You are trying to compute the sum over all subsets, not combinations.
This will never be efficient for 100 numbers (at least in no way I know of, whether in C++ or in Python), because you need to actually compute all the sums and then filter.
Computing all the subset sums takes O(n * 2^n) time using a method called Dynamic Programming - Sum over Subsets (DP-SOS):
a, k = [1, 2, 3, 6, 5, -2], 8

# dp[mask] will hold the sum of the elements selected by the bits of mask
dp = [0] * (1 << len(a))
for idx, val in enumerate(a):
    dp[1 << idx] = val
for i in range(len(a)):
    for mask in range(len(dp)):
        if mask & (1 << i):
            dp[mask] += dp[mask ^ (1 << i)]

answer = []
for mask, subset_sum in enumerate(dp):
    if subset_sum == k:
        answer.append([val for idx, val in enumerate(a)
                       if (1 << idx) & mask])
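A quick sanity check on the answer list built above (this reuses a, k and answer from the snippet; the order of the subsets depends on the mask enumeration):
for subset in answer:
    assert sum(subset) == k  # every collected subset really sums to k
print(answer)  # contains e.g. [2, 6], [3, 5] and [1, 2, 5] for this input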
Using itertools seems unnecessary to me, though you might get an answer in almost the same (or slightly longer) time, because the bitwise operators are pretty fast even in Python.
Nevertheless, using itertools:
from itertools import chain, combinations

a, k = [1, 2, 3, 6, 5, -2], 8

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

answer = list(filter(lambda x: sum(x) == k, powerset(a)))
If you want to scale to 100 numbers, there might not be an exact solution; you would need heuristics rather than computing the answer explicitly, because if the array is [8, 0, 0, ... (100 times)] then there are 2^99 matching subsets, which you could never compute or store explicitly.
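If exact answers are still needed for somewhat larger lists, one standard technique worth naming (not used in the answers above) is meet-in-the-middle: split the list into two halves and combine subset sums from each half, which reduces the exponent from n to n/2 (feasible up to roughly 40 numbers, though 100 remains out of reach, as noted above). A minimal sketch that counts the matching subsets:
from itertools import combinations
from collections import Counter

def count_subsets_with_sum(a, k):
    # Split the list and precompute all subset sums of each half
    half = len(a) // 2
    left, right = a[:half], a[half:]

    def subset_sums(part):
        sums = []
        for r in range(len(part) + 1):
            sums.extend(sum(c) for c in combinations(part, r))
        return sums

    right_counts = Counter(subset_sums(right))
    # Each left-half sum s pairs with every right-half subset summing to k - s
    return sum(right_counts[k - s] for s in subset_sums(left))

print(count_subsets_with_sum([1, 2, 3, 6, 5, -2], 8))  # 5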

Roll of different amount along a single axis in a 3D matrix [duplicate]

I have a matrix (2d numpy ndarray, to be precise):
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
And I want to roll each row of A independently, according to roll values in another array:
r = np.array([2, 0, -1])
That is, I want to do this:
print(np.array([np.roll(row, x) for row, x in zip(A, r)]))
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]
Is there a way to do this efficiently? Perhaps using fancy indexing tricks?
Sure, you can do it using advanced indexing; whether it is the fastest way probably depends on your array size (if your rows are large it may not be):
rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]

# Always use a negative shift, so that column_indices are valid.
# (could also use a modulo operation)
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:, np.newaxis]

result = A[rows, column_indices]
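Putting that together with the arrays from the question, a minimal self-contained check (note that the r[r < 0] += ... line modifies r in place, so pass a copy if you still need the original shifts):
import numpy as np

A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:, np.newaxis]
print(A[rows, column_indices])
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]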
numpy.lib.stride_tricks.as_strided stricks (abbrev pun intended) again!
Speaking of fancy indexing tricks, there's the infamous - np.lib.stride_tricks.as_strided. The idea/trick would be to get a sliced portion starting from the first column until the second last one and concatenate at the end. This ensures that we can stride in the forward direction as needed to leverage np.lib.stride_tricks.as_strided and thus avoid the need of actually rolling back. That's the whole idea!
Now, in terms of actual implementation we would use scikit-image's view_as_windows to elegantly use np.lib.stride_tricks.as_strided under the hood. Thus, the final implementation would be -
from skimage.util.shape import view_as_windows as viewW

def strided_indexing_roll(a, r):
    # Concatenate with sliced version to cover all rolls
    a_ext = np.concatenate((a, a[:, :-1]), axis=1)
    # Get sliding windows; use advanced indexing to select appropriate ones
    n = a.shape[1]
    return viewW(a_ext, (1, n))[np.arange(len(r)), (n - r) % n, 0]
Here's a sample run -
In [327]: A = np.array([[4, 0, 0],
     ...:               [1, 2, 3],
     ...:               [0, 0, 5]])

In [328]: r = np.array([2, 0, -1])

In [329]: strided_indexing_roll(A, r)
Out[329]:
array([[0, 0, 4],
       [1, 2, 3],
       [0, 5, 0]])
Benchmarking
# @seberg's solution
def advindexing_roll(A, r):
    rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
    r[r < 0] += A.shape[1]
    column_indices = column_indices - r[:, np.newaxis]
    return A[rows, column_indices]
Let's do some benchmarking on an array with a large number of rows and columns -
In [324]: np.random.seed(0)
     ...: a = np.random.rand(10000, 1000)
     ...: r = np.random.randint(-1000, 1000, (10000))

# @seberg's solution
In [325]: %timeit advindexing_roll(a, r)
10 loops, best of 3: 71.3 ms per loop

# Solution from this post
In [326]: %timeit strided_indexing_roll(a, r)
10 loops, best of 3: 44 ms per loop
In case you want a more general solution (dealing with any shape and any axis), I modified @seberg's solution:
def indep_roll(arr, shifts, axis=1):
    """Apply an independent roll for each dimension of a single axis.

    Parameters
    ----------
    arr : np.ndarray
        Array of any shape.
    shifts : np.ndarray
        How many places to shift each dimension. Shape: `(arr.shape[axis],)`.
    axis : int
        Axis along which elements are shifted.
    """
    arr = np.swapaxes(arr, axis, -1)
    all_idcs = np.ogrid[[slice(0, n) for n in arr.shape]]

    # Convert to a positive shift
    shifts[shifts < 0] += arr.shape[-1]
    all_idcs[-1] = all_idcs[-1] - shifts[:, np.newaxis]

    result = arr[tuple(all_idcs)]
    arr = np.swapaxes(result, -1, axis)
    return arr
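For instance, applied to the 2-D example from the question (indep_roll also modifies shifts in place, so pass a copy if you need r afterwards):
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])
print(indep_roll(A, r.copy(), axis=1))
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]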
I implemented a pure numpy.lib.stride_tricks.as_strided solution as follows:
from numpy.lib.stride_tricks import as_strided

def custom_roll(arr, r_tup):
    m = np.asarray(r_tup)
    arr_roll = arr[:, [*range(arr.shape[1]), *range(arr.shape[1]-1)]].copy()  # need `copy`
    strd_0, strd_1 = arr_roll.strides
    n = arr.shape[1]
    result = as_strided(arr_roll, (*arr.shape, n), (strd_0, strd_1, strd_1))
    return result[np.arange(arr.shape[0]), (n-m) % n]

A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

out = custom_roll(A, r)
# array([[0, 0, 4],
#        [1, 2, 3],
#        [0, 5, 0]])
By using a fast Fourier transform we can apply a transformation in the frequency domain and then use the inverse fast Fourier transform to obtain the row shift.
So this is a pure numpy solution that takes only one line:
import numpy as np
from numpy.fft import fft, ifft

# The row-shift function using the fast Fourier transform:
# rshift(A, r), where A is a 2D array and r is the row-shift vector
def rshift(A, r):
    return np.real(ifft(fft(A, axis=1) * np.exp(2*1j*np.pi/A.shape[1] * r[:, None] * np.r_[0:A.shape[1]][None, :]), axis=1).round())
This applies a left shift, but we can simply negate the exponent of the exponential to turn it into a right-shift function:
ifft(fft(...)*np.exp(-2*1j...)
It can be used like this:
# Example:
A = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4]])
r = np.array([1, -1, 3])
print(rshift(A, r))
Building on Divakar's excellent answer, you can apply this logic to a 3D array easily (which was the problem that brought me here in the first place). Here's an example - basically flatten your data, roll it, and reshape it afterwards:
def applyroll_30(cube, threshold=25, offset=500):
    flattened_cube = cube.copy().reshape(cube.shape[0]*cube.shape[1], cube.shape[2])
    roll_matrix = calc_roll_matrix_flattened(flattened_cube, threshold, offset)
    rolled_cube = strided_indexing_roll(flattened_cube, roll_matrix, cube_shape=cube.shape)
    rolled_cube = rolled_cube.reshape(cube.shape[0], cube.shape[1], cube.shape[2])
    return rolled_cube
def calc_roll_matrix_flattened(cube_flattened, threshold, offset):
    """Calculates how many positions along the time axis we need to shift
    elements in order to trigger the data.
    Returns a 1D numpy array with X*Y elements (one shift per flattened row).
    """
    # argmax(...) finds the position in the cube (3d) where we are above threshold
    roll_matrix = np.argmax(cube_flattened > threshold, axis=1) + offset
    # ensure we don't have an index out of bounds
    roll_matrix[roll_matrix > cube_flattened.shape[1]] = cube_flattened.shape[1]
    return roll_matrix
def strided_indexing_roll(cube_flattened, roll_matrix_flattened, cube_shape):
    # negate the shifts, otherwise we shift in the wrong direction for my application
    roll_matrix_flattened = -1 * roll_matrix_flattened
    # Concatenate with sliced version to cover all rolls
    a_ext = np.concatenate((cube_flattened, cube_flattened[:, :-1]), axis=1)
    # Get sliding windows; use advanced indexing to select appropriate ones
    # (viewW is skimage's view_as_windows, imported earlier)
    n = cube_flattened.shape[1]
    result = viewW(a_ext, (1, n))[np.arange(len(roll_matrix_flattened)), (n - roll_matrix_flattened) % n, 0]
    result = result.reshape(cube_shape)
    return result
Divakar's answer doesn't do justice to how much more efficient this is on a large cube of data. I've timed it on 400x400x2000 data formatted as int8. An equivalent for loop takes ~5.5 seconds, Seberg's answer ~3.0 seconds, and strided_indexing_roll ~0.5 seconds.

Why does shallow copy behave as deep copy for a simple list

I was trying to understand the shallow copy and deep copy concepts in Python. I observe that most posts/blogs/SO answers explain these concepts using nested lists.
import copy
lst = [[1,2,3],[4,5,6]]
b = copy.copy(lst)
c = copy.deepcopy(lst)
# Shallow copy demo
b[0][0] = 9
print(b)
# >>> [[9, 2, 3], [4, 5, 6]]
print(lst)
# >>> [[9, 2, 3], [4, 5, 6]]
# Deepcopy demo
c[0][0] = 10
print(c)
# >>> [[10, 2, 3], [4, 5, 6]]
print(lst)
# >>> [[9, 2, 3], [4, 5, 6]]
I understood the shallow and deep copy concepts from the example above. But when I apply them to a simple (one-dimensional) list, I observe that the shallow copy behaves like a deep copy.
import copy
lst = [1,2,3]
b = copy.copy(lst)
c = copy.deepcopy(lst)
# Shallow copy demo
b[0] = 0
print(b)
# >>> [0, 2, 3]
print(lst)
# >>> [1,2,3]
# Deepcopy demo
c[0] = 9
print(c)
# >>> [9,2,3]
print(lst)
# >>> [1,2,3]
This suggests that copy.copy(lst) behaves differently and does a deep copy instead of a shallow copy.
I would like to understand why the behavior of copy.copy() differs for a nested list versus a simple list. Also, if I have to get a shallow copy working for a simple list, how can I achieve it?
The results that you're getting are not directly related to the "level of depth"; the most important thing to keep in mind here is the concept of mutability.
Lists are mutable, while numeric values are not. That means you can add or modify items in a list, and those operations don't create or destroy the list, they only change it. You can verify that using the built-in function id(), which gives you the identity (in CPython, the memory address) of an object:
lst = [1, 2, 3]
print(id(lst))  # the number printed by this...
lst.append(4)
lst[1] = 0
print(id(lst))  # ...should be the same as this one. That tells us that
                # the variable 'lst' keeps referencing the same object,
                # although the object has changed in form (mutated)
Numbers are totally different, and it makes sense, since a numeric variable can only store a single numeric value:
a = 5
print(id(a))  # the number printed by this...
a = 6
print(id(a))  # ...should be different from this one, meaning that a new numeric
              # value was created and stored at a different memory address
On the line
b[0][0] = 9
of your first example, the list at b[0] is being mutated, but it remains the same object; and since b[0] is nothing more than a reference to the same list as lst[0] (because b is a shallow copy), when we print lst we see that it changed too.
In your second example, when you assign b[0] = 0, Python rebinds b[0] to a different object (the integer 0), replacing the reference that b[0] previously shared with lst[0]; lst is untouched (that's the natural behavior of immutable numeric types).
As I said, this doesn't have to do with the level of nesting of compound data structures, since some of them are immutable (for example, the tuple), and the same thing that happened in your second example would happen with those immutable data structures.
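To see this directly, compare object identities after a shallow copy of a flat list; a minimal sketch:
import copy

lst = [1, 2, 3]
b = copy.copy(lst)
print(b is lst)        # False: the shallow copy is a new, distinct list object
print(b[0] is lst[0])  # True: but its elements are the very same objects
b[0] = 0               # rebinding b[0] replaces only b's reference
print(lst)             # [1, 2, 3] -- lst is untouched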
You can read some more about the id() built-in function here, and more about mutable and immutable types here.
Hope this answer helps you!

I want to check the occurrence of a particular item in all other items (even be it a sub string)

I want to check the occurrence of a particular item in all the other items (even if it appears as a substring).
n_a = ['28', '4663', '66', '66']
occ_arr = [[0,0]]*len(n_a)
for i in range(len(n_a)):
    count = 0
    for j in range(len(n_a)):
        if n_a[i] in n_a[j]:
            count += 1
    occ_arr[i][0] = n_a[i]
    occ_arr[i][1] = count
print(occ_arr)
This is my piece of code.
The result is
[['66', 3], ['66', 3], ['66', 3], ['66', 3]]
but the desired output is
[['28', 1], ['4663', 1], ['66', 3], ['66', 3]].
Please help me to figure out what is wrong with the code.
All the sub-lists in occ_arr reference the same list, because the * operator copies the reference to one list rather than creating new ones, so any change to one sub-list is reflected in all the others. You should instead use a list comprehension to create distinct sub-lists.
Change:
occ_arr = [[0,0]]*len(n_a)
to:
occ_arr = [[0,0] for _ in range(len(n_a))]
Changing:
occ_arr = [[0,0]]*len(n_a)
to:
occ_arr = []
for i in range(len(n_a)):
    occ_arr.append([0,0])
will fix the bug in the program. If you want to make this a one-line statement, use the following list comprehension:
occ_arr = [[0,0] for _ in n_a]
# Add the list [0,0] for each item in the list n_a
All together, the program turns into (using the one-line solution):
n_a = ['28', '4663', '66', '66']
occ_arr = [[0,0] for _ in n_a]
for i in range(len(n_a)):
    count = 0
    for j in range(len(n_a)):
        if n_a[i] in n_a[j]:
            count += 1
    occ_arr[i][0] = n_a[i]
    occ_arr[i][1] = count
print(occ_arr)
Explanation of the bug
The reason the bug occurs is the way lists are stored. Rather than being stored as literal data (like ints, floats, etc.), they are stored as objects, with memory addresses and ids. The line:
occ_arr = [[0,0]]*len(n_a)
creates a list with its own unique id and then copies it (shallowly, i.e. copying just the memory address rather than the data) four times. This can be shown through the following example:
>>> x = [[0,0]] * 4
>>> for item in x:
...     print(id(item))
...
4500701640
4500701640
4500701640
4500701640
Note that the output will be different for you.
Hence, when you change one sub-list, you change the underlying object, which changes all the other shallow copies. That is why your program was outputting [['66', 3], ['66', 3], ['66', 3], ['66', 3]] rather than [['28', 1], ['4663', 1], ['66', 3], ['66', 3]].
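As a side note, once each row is a distinct list, the whole counting program can also be collapsed into a single comprehension; a sketch equivalent to the program above for this input:
n_a = ['28', '4663', '66', '66']
occ_arr = [[s, sum(s in t for t in n_a)] for s in n_a]
print(occ_arr)  # [['28', 1], ['4663', 1], ['66', 3], ['66', 3]]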
