List of dicts - Partial shuffle - python-3.x

Let's suppose I have this:
my_list = [{'id': '2', 'value': '4'},
           {'id': '6', 'value': '3'},
           {'id': '4', 'value': '5'},
           {'id': '9', 'value': '10'},
           {'id': '0', 'value': '9'}]
and I want to shuffle the list, but only partly: I do not want to shuffle all of the elements, only a percentage of them (e.g. 40%).
For example like this:
my_list = [{'id': '4', 'value': '5'},
           {'id': '6', 'value': '3'},
           {'id': '2', 'value': '4'},
           {'id': '9', 'value': '10'},
           {'id': '0', 'value': '9'}]
How can this be done efficiently?

random.shuffle does not allow you to shuffle only part of a list; it always shuffles the entire list.
A good trade-off between effort, speed, and memory footprint is to slice out the part of the list you want to shuffle, shuffle it, and then assign it back to that slice:
>>> from random import shuffle
>>> x = list(range(10))
>>> y = x[:5]
>>> shuffle(y)
>>> x[:5] = y
>>> x
[2, 1, 4, 3, 0, 5, 6, 7, 8, 9]

My solution is the following:
from random import sample
shuffle_percentage = 0.4
x = sample(range(len(my_list)), int(len(my_list) * shuffle_percentage))
for index in range(0, len(x) - 1, 2):
    my_list[x[index]], my_list[x[index + 1]] = my_list[x[index + 1]], my_list[x[index]]
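An alternative sketch, not from the answers above: sample the positions to disturb, then shuffle only the values at those positions (the helper name partial_shuffle is ours):
from random import sample, shuffle

def partial_shuffle(lst, fraction=0.4):
    # Pick a random subset of positions, then shuffle only their values.
    k = int(len(lst) * fraction)
    positions = sample(range(len(lst)), k)
    values = [lst[i] for i in positions]
    shuffle(values)
    for i, v in zip(positions, values):
        lst[i] = v

partial_shuffle(my_list)  # shuffles ~40% of the elements in place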

Related

What is the best possible way to find the first AND the last occurrences of an element in a list in Python?

The basic approach I usually use is list.index(element) and reversed_list.index(element), but this fails when I need to search for many elements and the list is very large, say 10^5 or 10^6 elements or even more. What is the best (fastest) way to do this?
You can build auxiliary lookup structures:
lst = [1, 2, 3, 1, 2, 3]  # stand-in for the super long list
# In a dict comprehension, later duplicates overwrite earlier ones, so a
# forward pass keeps the last index and a reversed pass keeps the first.
last = {n: i for i, n in enumerate(lst)}
first = {n: i for i, n in reversed(list(enumerate(lst)))}
last[3]
# 5
first[3]
# 2
The construction of the lookup dicts takes linear time, but the lookups themselves are then constant time.
Whereas calls to list.index() take linear time, and repeatedly doing so is quadratic (given that the number of lookups you make depends on the size of the list).
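If you want to verify the difference yourself, here is a quick sketch using timeit (timings will vary by machine; the target is placed at the end of the list to show the worst case for .index()):
import timeit

lst = list(range(10**6))  # target sits at the very end: worst case for .index()
target = 10**6 - 1

# Repeated list.index() scans: O(n) per lookup.
print(timeit.timeit(lambda: lst.index(target), number=100))

# One O(n) pass to build the dict, then each lookup is O(1).
last = {n: i for i, n in enumerate(lst)}
print(timeit.timeit(lambda: last[target], number=100))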
You could also build a single structure in one iteration:
from collections import defaultdict
lookup = defaultdict(lambda: [None, None])  # [first_index, last_index]
for i, n in enumerate(lst):
    lookup[n][1] = i  # always update the last index
    if lookup[n][0] is None:
        lookup[n][0] = i  # set the first index only once
lookup[3]
# [2, 5]
lookup[2]
# [1, 4]
Well, someone needs to do the work of finding the element, and in a large list this can take time! Without more information or a code example it is difficult to help, but the usual go-to answer is another data structure: for example, if you keep your elements in a dictionary instead of a list, with the element as key and an array of its indices as value, lookups will be much quicker, as sketched below.
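A minimal sketch of that idea (the name positions is ours):
from collections import defaultdict

# Map each element to the list of all its indices.
positions = defaultdict(list)
for i, x in enumerate([1, 2, 3, 1, 2, 3]):
    positions[x].append(i)

print(positions[3][0], positions[3][-1])  # 2 5 -> first and last occurrence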
You can just remember first and last index for every element in the list:
In [9]: l = [random.randint(1, 10) for _ in range(100)]
In [10]: first_index = {}
In [11]: last_index = {}
In [12]: for idx, x in enumerate(l):
    ...:     if x not in first_index:
    ...:         first_index[x] = idx
    ...:     last_index[x] = idx
    ...:
In [13]: [(x, first_index.get(x), last_index.get(x)) for x in range(1, 11)]
Out[13]:
[(1, 3, 88),
(2, 23, 90),
(3, 10, 91),
(4, 13, 98),
(5, 11, 57),
(6, 4, 99),
(7, 9, 92),
(8, 19, 95),
(9, 0, 77),
(10, 2, 87)]
In [14]: l[0]
Out[14]: 9
Your approach sounds good. I did some testing:
import numpy as np
long_list = list(np.random.randint(0, 100_000, 100_000_000))
# This takes 10ms on my machine
long_list.index(999)
# This takes 1,100ms on my machine
long_list[::-1].index(999)
# This takes 1,300ms on my machine
list(reversed(long_list)).index(999)
# This takes 200ms on my machine
long_list.reverse()
long_list.index(999)
long_list.reverse()
But at the end of the day, a Python list does not seem like the best data structure for this.
As others have suggested, you can build a dict:
indexes = {}
for i, val in enumerate(long_list):
    if val in indexes:
        indexes[val].append(i)
    else:
        indexes[val] = [i]
This is memory expensive, but it solves your problem (depending on how often you modify the original list).
You can then do:
# This takes 0.02ms on my machine
ix = indexes.get(999)
ix[0], ix[-1]

Python: how can I create subsets from an integer list based on a range?

I am looking for a way to get subsets of an integer array based on a certain range.
For example
Input
array1=[3,5,4,12,34,54]
# Now getting a subset for every 3 elements
Output
subset= [(3,5,4), (12,34,54)]
I know it is probably simple, but I didn't find the right way to get this output.
Appreciate the help.
Thanks
Consider using a list comprehension:
>>> array1 = [3, 5, 4, 12, 34, 54]
>>> subset = [tuple(array1[i:i+3]) for i in range(0, len(array1), 3)]
>>> subset
[(3, 5, 4), (12, 34, 54)]
Links to other relevant documentation:
tuples
ranges
arr = [1,2,3,4,5,6]
sets = [tuple(arr[i:i+3]) for i in range(0, len(arr), 3)]
print(sets)
We take a range of values from the array and turn it into a tuple. The range is determined by the for loop, which iterates with a step of three so that a tuple is only created for every 3 items.
You can use this code:
from itertools import zip_longest

input_list = [3, 5, 4, 12, 34, 54]
# Three references to the same iterator, so zip_longest pulls 3 items per tuple.
iterables = [iter(input_list)] * 3
slices = zip_longest(*iterables, fillvalue=None)
output_list = []
for chunk in slices:  # renamed from `slice` to avoid shadowing the builtin
    output_list.append(chunk)
print(output_list)  # [(3, 5, 4), (12, 34, 54)]
You could use the zip_longest function from itertools
https://docs.python.org/3.0/library/itertools.html#itertools.zip_longest
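One caveat with this approach: when the list length is not a multiple of 3, zip_longest pads the final tuple with the fillvalue. A small sketch of stripping that padding (the extra element 7 is ours, added for illustration):
from itertools import zip_longest

input_list = [3, 5, 4, 12, 34, 54, 7]  # length is not a multiple of 3
iterables = [iter(input_list)] * 3
padded = list(zip_longest(*iterables, fillvalue=None))
# padded == [(3, 5, 4), (12, 34, 54), (7, None, None)]
trimmed = [tuple(x for x in chunk if x is not None) for chunk in padded]
# trimmed == [(3, 5, 4), (12, 34, 54), (7,)]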

Roll of different amount along a single axis in a 3D matrix [duplicate]

I have a matrix (2d numpy ndarray, to be precise):
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
And I want to roll each row of A independently, according to roll values in another array:
r = np.array([2, 0, -1])
That is, I want to do this:
print(np.array([np.roll(row, x) for row, x in zip(A, r)]))
[[0 0 4]
 [1 2 3]
 [0 5 0]]
Is there a way to do this efficiently? Perhaps using fancy indexing tricks?
Sure you can do it using advanced indexing; whether it is the fastest way probably depends on your array size (if your rows are large it may not be):
rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
# Always use a non-negative shift, so that column_indices are valid.
# (Could also use a modulo operation.)
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:, np.newaxis]
result = A[rows, column_indices]
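For reference, here is the snippet above assembled into a self-contained, runnable example using the arrays from the question:
import numpy as np

A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
r[r < 0] += A.shape[1]  # make all shifts non-negative
column_indices = column_indices - r[:, np.newaxis]
print(A[rows, column_indices])
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]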
numpy.lib.stride_tricks.as_strided stricks (abbrev pun intended) again!
Speaking of fancy indexing tricks, there's the infamous - np.lib.stride_tricks.as_strided. The idea/trick would be to get a sliced portion starting from the first column until the second last one and concatenate at the end. This ensures that we can stride in the forward direction as needed to leverage np.lib.stride_tricks.as_strided and thus avoid the need of actually rolling back. That's the whole idea!
Now, in terms of actual implementation we would use scikit-image's view_as_windows to elegantly use np.lib.stride_tricks.as_strided under the hood. Thus, the final implementation would be:
import numpy as np
from skimage.util.shape import view_as_windows as viewW

def strided_indexing_roll(a, r):
    # Concatenate with sliced version to cover all rolls
    a_ext = np.concatenate((a, a[:, :-1]), axis=1)
    # Get sliding windows; use advanced indexing to select the appropriate ones
    n = a.shape[1]
    return viewW(a_ext, (1, n))[np.arange(len(r)), (n - r) % n, 0]
Here's a sample run -
In [327]: A = np.array([[4, 0, 0],
     ...:               [1, 2, 3],
     ...:               [0, 0, 5]])
In [328]: r = np.array([2, 0, -1])
In [329]: strided_indexing_roll(A, r)
Out[329]:
array([[0, 0, 4],
       [1, 2, 3],
       [0, 5, 0]])
Benchmarking
# @seberg's solution
def advindexing_roll(A, r):
    rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
    r[r < 0] += A.shape[1]
    column_indices = column_indices - r[:, np.newaxis]
    return A[rows, column_indices]
Let's do some benchmarking on an array with a large number of rows and columns:
In [324]: np.random.seed(0)
...: a = np.random.rand(10000,1000)
...: r = np.random.randint(-1000,1000,(10000))
# @seberg's solution
In [325]: %timeit advindexing_roll(a, r)
10 loops, best of 3: 71.3 ms per loop
# Solution from this post
In [326]: %timeit strided_indexing_roll(a, r)
10 loops, best of 3: 44 ms per loop
In case you want a more general solution (dealing with any shape and any axis), I modified @seberg's solution:
def indep_roll(arr, shifts, axis=1):
    """Apply an independent roll for each dimension of a single axis.

    Parameters
    ----------
    arr : np.ndarray
        Array of any shape.
    shifts : np.ndarray
        How far to shift each 1-D slice along `axis`. For a 2-D array
        rolled along axis=1, this has shape `(arr.shape[0],)`.
    axis : int
        Axis along which elements are shifted.
    """
    arr = np.swapaxes(arr, axis, -1)
    all_idcs = np.ogrid[[slice(0, n) for n in arr.shape]]
    # Convert to a positive shift (note: modifies `shifts` in place)
    shifts[shifts < 0] += arr.shape[-1]
    all_idcs[-1] = all_idcs[-1] - shifts[:, np.newaxis]
    result = arr[tuple(all_idcs)]
    arr = np.swapaxes(result, -1, axis)
    return arr
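A quick check that this reproduces the example from the question (note that shifts is modified in place, so pass a copy if you want to keep r intact):
import numpy as np

A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])
print(indep_roll(A, r.copy(), axis=1))
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]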
I implemented a pure numpy.lib.stride_tricks.as_strided solution as follows:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def custom_roll(arr, r_tup):
    m = np.asarray(r_tup)
    arr_roll = arr[:, [*range(arr.shape[1]), *range(arr.shape[1] - 1)]].copy()  # need `copy`
    strd_0, strd_1 = arr_roll.strides
    n = arr.shape[1]
    result = as_strided(arr_roll, (*arr.shape, n), (strd_0, strd_1, strd_1))
    return result[np.arange(arr.shape[0]), (n - m) % n]

A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])
out = custom_roll(A, r)
Out[789]:
array([[0, 0, 4],
       [1, 2, 3],
       [0, 5, 0]])
By using a fast Fourier transform we can apply a transformation in the frequency domain and then use the inverse fast Fourier transform to obtain the row shift.
So this is a pure numpy solution that takes only one line:
import numpy as np
from numpy.fft import fft, ifft

# The row shift function using the fast Fourier transform:
# rshift(A, r), where A is a 2D array and r is the row shift vector
def rshift(A, r):
    return np.real(ifft(fft(A, axis=1) * np.exp(2 * 1j * np.pi / A.shape[1] * r[:, None] * np.r_[0:A.shape[1]][None, :]), axis=1).round())
This will apply a left shift, but we can simply negate the exponent of the exponential to turn the function into a right shift function:
ifft(fft(...) * np.exp(-2*1j...))
It can be used like that:
# Example:
A = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4]])
r = np.array([1, -1, 3])
print(rshift(A, r))
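For reference, each row is shifted left by its entry in r, so the example above should print the following (values come back as floats because of the inverse FFT):
# [[2. 3. 4. 1.]
#  [4. 1. 2. 3.]
#  [4. 1. 2. 3.]]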
Building on Divakar's excellent answer, you can apply this logic to a 3D array easily (which was the problem that brought me here in the first place). Here's an example: basically, flatten your data, roll it, and reshape it after:
def applyroll_30(cube, threshold=25, offset=500):
    flattened_cube = cube.copy().reshape(cube.shape[0] * cube.shape[1], cube.shape[2])
    roll_matrix = calc_roll_matrix_flattened(flattened_cube, threshold, offset)
    # strided_indexing_roll below already reshapes the result back to cube.shape
    rolled_cube = strided_indexing_roll(flattened_cube, roll_matrix, cube_shape=cube.shape)
    return rolled_cube

def calc_roll_matrix_flattened(cube_flattened, threshold, offset):
    """Calculate the number of positions along the time axis we need to shift
    elements in order to trig the data.
    Returns a 1D numpy array of (X*Y,) elements.
    """
    # argmax(...) finds the position in the cube (3d) where we are above threshold
    roll_matrix = np.argmax(cube_flattened > threshold, axis=1) + offset
    # ensure we don't have an index out of bounds
    roll_matrix[roll_matrix > cube_flattened.shape[1]] = cube_flattened.shape[1]
    return roll_matrix

def strided_indexing_roll(cube_flattened, roll_matrix_flattened, cube_shape):
    # Negate the shifts, otherwise we shift in the wrong direction for my application
    roll_matrix_flattened = -1 * roll_matrix_flattened
    # Concatenate with sliced version to cover all rolls
    a_ext = np.concatenate((cube_flattened, cube_flattened[:, :-1]), axis=1)
    # Get sliding windows; use advanced indexing to select the appropriate ones
    n = cube_flattened.shape[1]
    result = viewW(a_ext, (1, n))[np.arange(len(roll_matrix_flattened)), (n - roll_matrix_flattened) % n, 0]
    result = result.reshape(cube_shape)
    return result
Divakar's answer doesn't do justice to how much more efficient this is on a large cube of data. I've timed it on 400x400x2000 data formatted as int8. An equivalent for loop takes ~5.5 seconds, Seberg's answer ~3.0 seconds, and strided_indexing_roll ~0.5 seconds.

Fastest way to find all the indexes of maximum value in a list - Python

I have a list as follows:
input_list= [2, 3, 5, 2, 5, 1, 5]
I want to get all the indexes of the maximum value and I need an efficient solution. The output will be as follows:
output = [2, 4, 6]  (5 is the maximum value in the list above)
I have tried by using below code
m = max(input_list)
output = [i for i, j in enumerate(input_list) if j == m]
Is there a more optimal solution?
from collections import defaultdict

dic = defaultdict(list)
input_list = [2, 3, 5, 2, 5, 1, 5]
for i in range(len(input_list)):
    dic[input_list[i]] += [i]
max_value = max(input_list)
sol = dic[max_value]
You can use numpy (numpy arrays are very fast):
import numpy as np
input_list= np.array([2, 3, 5, 2, 5, 1, 5])
i, = np.where(input_list == np.max(input_list))
print(i)
Output:
[2 4 6]
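As a minor variant (not from the original answer), np.flatnonzero condenses the same idea into one call:
import numpy as np

input_list = np.array([2, 3, 5, 2, 5, 1, 5])
# Indices of the True entries of the boolean mask.
print(np.flatnonzero(input_list == input_list.max()))  # [2 4 6]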
Here's the approach described in the comments. Even if you use some library, fundamentally you need to traverse the list at least once to solve this problem (considering the input list is unsorted), so even the lower bound for the algorithm is Omega(size_of_list). If the list were sorted, we could leverage binary search to solve the problem.
def max_indexes(l):
    try:
        assert l != []
        max_element = l[0]
        indexes = [0]
        for index, element in enumerate(l[1:]):
            if element > max_element:
                max_element = element
                indexes = [index + 1]
            elif element == max_element:
                indexes.append(index + 1)
        return indexes
    except AssertionError:
        print('input_list is empty')
Use a for loop for an O(n) solution that iterates over the list just once:
from itertools import islice

input_list = [2, 3, 5, 2, 5, 1, 5]

def max_indexes(l):
    max_item = l[0]  # was input_list[0]; use the parameter instead
    indexes = [0]
    for i, item in enumerate(islice(l, 1, None), 1):
        if item < max_item:
            continue
        elif item > max_item:
            max_item = item
            indexes = [i]
        elif item == max_item:
            indexes.append(i)
    return indexes
Think of it this way: unless you iterate through the whole list once, which is O(n), n being the length of the list, you won't be able to compare the maximum with all values in the list, so the best you can do is O(n), which you already seem to be doing in your example.
So I am not sure you can do it faster than O(n) with the list approach.

How to assign values to a dictionary

I am creating a function which is supposed to return a dictionary with keys and values from different lists, but I am having problems getting the mean of a list of numbers as the values of the dictionary. However, I think I am getting the keys properly.
This is what I get so far:
def exp(magnitudes, measures):
    """return for each magnitude the associated mean of numbers from a list"""
    dict_expe = {}
    for mag in magnitudes:
        dict_expe[mag] = 0
        for mea in measures:
            summ = 0
            for n in mea:
                summ += n
            dict_expe[mag] = summ / len(mea)
    return dict_expe
print(exp(['mag1', 'mag2', 'mag3'], [[1,2,3],[3,4],[5]]))
The output should be:
{'mag1': 2, 'mag2': 3.5, 'mag3': 5}
But what I am getting is always 5 as the value of every key. I thought about the zip() method, but I'm trying to avoid it because I thought it requires the same length in both lists.
An average of a sequence is sum(sequence) / len(sequence), so you need to iterate through both magnitudes and measures, calculate these means (arithmetic averages), and store them in a dictionary.
There are much more pythonic ways you can achieve this. All of these examples produce {'mag1': 2.0, 'mag2': 3.5, 'mag3': 5.0} as result.
Using for i in range() loop:
def exp(magnitudes, measures):
    means = {}
    for i in range(len(magnitudes)):
        means[magnitudes[i]] = sum(measures[i]) / len(measures[i])
    return means

print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
But if you need both indices and values of a list you can use for i, val in enumerate(sequence) approach which is much more suitable in this case:
def exp(magnitudes, measures):
    means = {}
    for i, mag in enumerate(magnitudes):
        means[mag] = sum(measures[i]) / len(measures[i])
    return means

print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
Another problem hides here: the i index belongs to magnitudes, but we are also using it to get values from measures. This is not a big deal in your case, where magnitudes and measures have the same length, but if magnitudes were longer you would get an IndexError. So it seems to me that using the zip function is the best choice here (note that zip does not require the two lists to be the same length; it simply uses the length of the shortest one for the result):
def exp(magnitudes, measures):
    means = {}
    for mag, mes in zip(magnitudes, measures):
        means[mag] = sum(mes) / len(mes)
    return means

print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
So feel free to use whichever example suits your requirements, and don't forget to add a docstring.
You probably don't need anything this pythonic, but it can be even shorter when a dictionary comprehension comes into play:
def exp(magnitudes, measures):
    return {mag: sum(mes) / len(mes) for mag, mes in zip(magnitudes, measures)}

print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
