Python: how can I create subsets of an integer list based on a range? - python-3.x

I am looking for a way to split an integer array into subsets based on a certain range.
For example:
Input
array1=[3,5,4,12,34,54]
#Now getting a subset for every 3 elements
Output
subset= [(3,5,4), (12,34,54)]
I know it is probably simple, but I couldn't find the right way to get this output.
Any help is appreciated.
Thanks

Consider using a list comprehension:
>>> array1 = [3, 5, 4, 12, 34, 54]
>>> subset = [tuple(array1[i:i+3]) for i in range(0, len(array1), 3)]
>>> subset
[(3, 5, 4), (12, 34, 54)]
Links to other relevant documentation:
tuples
ranges

arr = [1,2,3,4,5,6]
sets = [tuple(arr[i:i+3]) for i in range(0, len(arr), 3)]
print(sets)
We take a slice of the array and turn it into a tuple. The start of each slice is set by the for loop, which steps by three, so a new tuple is created for every 3 items.

You can use this code:
from itertools import zip_longest
input_list = [3,5,4,12,34,54]
# the same iterator repeated 3 times, so each tuple takes 3 consecutive items
iterables = [iter(input_list)] * 3
slices = zip_longest(*iterables, fillvalue=None)
output_list = []
for chunk in slices:
    output_list.append(chunk)
print(output_list)
You could use the zip_longest function from itertools
https://docs.python.org/3.0/library/itertools.html#itertools.zip_longest
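The two approaches above agree when the list length is a multiple of 3, but they differ on a leftover tail. A minimal sketch of that difference, assuming a seven-element list (the extra value 7 is made up for illustration):

```python
from itertools import zip_longest

data = [3, 5, 4, 12, 34, 54, 7]  # seven items: not a multiple of 3

# Slicing leaves the final chunk short.
by_slicing = [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]

# zip_longest pads the final chunk with the fill value.
by_zip = list(zip_longest(*[iter(data)] * 3, fillvalue=None))

print(by_slicing)  # [(3, 5, 4), (12, 34, 54), (7,)]
print(by_zip)      # [(3, 5, 4), (12, 34, 54), (7, None, None)]
```

Which behaviour is preferable depends on whether downstream code expects every tuple to have exactly three entries.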

Related

What is the best possible way to find the first AND the last occurrences of an element in a list in Python?

The basic way I usually do it is with list.index(element) and reversed_list.index(element), but this becomes too slow when I need to search for many elements and the list is very long, say 10^5 or 10^6 elements or even more. What is the fastest way to do the same thing?
You can build auxiliary lookup structures:
lst = [1,2,3,1,2,3] # super long list
last = {n: i for i, n in enumerate(lst)}
first = {n: i for i, n in reversed(list(enumerate(lst)))}
last[3]
# 5
first[3]
# 2
The construction of the lookup dicts takes linear time, but then the lookup itself is constant.
Whereas each call to list.index() takes linear time, so doing it repeatedly is quadratic (given that the number of lookups you make depends on the size of the list).
You could also build a single structure in one iteration:
from collections import defaultdict
lookup = defaultdict(lambda: [None, None])
for i, n in enumerate(lst):
    lookup[n][1] = i
    if lookup[n][0] is None:
        lookup[n][0] = i
lookup[3]
# [2, 5]
lookup[2]
# [1, 4]
Well, someone needs to do the work in finding the element, and in a large list this can take time! Without more information or a code example, it'll be difficult to help you, but usually the go-to answer is to use another data structure- for example, if you can keep your elements in a dictionary instead of a list with the key being the element and the value being an array of indices, you'll be much quicker.
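The dictionary idea described above (element as key, list of indices as value) could be sketched roughly like this. The helper name build_positions is made up for illustration and is not part of any library:

```python
from collections import defaultdict

def build_positions(items):
    """Map each element to the list of indexes where it occurs (one pass)."""
    positions = defaultdict(list)
    for index, item in enumerate(items):
        positions[item].append(index)
    return positions

lst = [1, 2, 3, 1, 2, 3]
positions = build_positions(lst)

# Indexes are appended in increasing order, so the ends are first and last.
first, last = positions[3][0], positions[3][-1]
print(first, last)  # 2 5
```

Building the mapping is a single linear pass; after that each first/last query is a dictionary lookup. Use positions.get(x) for elements that may be absent, since indexing a defaultdict with a missing key inserts an empty list.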
You can just remember first and last index for every element in the list:
In [9]: l = [random.randint(1, 10) for _ in range(100)]
In [10]: first_index = {}
In [11]: last_index = {}
In [12]: for idx, x in enumerate(l):
    ...:     if x not in first_index:
    ...:         first_index[x] = idx
    ...:     last_index[x] = idx
    ...:
In [13]: [(x, first_index.get(x), last_index.get(x)) for x in range(1, 11)]
Out[13]:
[(1, 3, 88),
(2, 23, 90),
(3, 10, 91),
(4, 13, 98),
(5, 11, 57),
(6, 4, 99),
(7, 9, 92),
(8, 19, 95),
(9, 0, 77),
(10, 2, 87)]
In [14]: l[0]
Out[14]: 9
Your approach sounds good, I did some testing and:
import numpy as np
long_list = list(np.random.randint(0, 100_000, 100_000_000))
# This takes 10ms on my machine
long_list.index(999)
# This takes 1,100ms on my machine
long_list[::-1].index(999)
# This takes 1,300ms on my machine
list(reversed(long_list)).index(999)
# This takes 200ms on my machine
long_list.reverse()
long_list.index(999)
long_list.reverse()
But at the end of the day, a Python list does not seem like the best data structure for this.
As others have suggested, you can build a dict:
indexes = {}
for i, val in enumerate(long_list):
    if val in indexes:
        indexes[val].append(i)
    else:
        indexes[val] = [i]
This costs extra memory, but it solves your problem (how well depends on how often you modify the original list, since the dict has to be kept in sync with it).
You can then do:
# This takes 0.02ms on my machine
ix = indexes.get(999)
ix[0], ix[-1]

List of dicts - Partial shuffle

Let's suppose I have this:
my_list = [{'id':'2','value':'4'},
{'id':'6','value':'3'},
{'id':'4','value':'5'},
{'id':'9','value':'10'},
{'id':'0','value':'9'}]
and I want to shuffle the list but I want to do it partly - by this I mean that I do not want to shuffle all the elements but only a percentage of them (eg 40%).
For example like this:
my_list = [{'id':'4','value':'5'},
{'id':'6','value':'3'},
{'id':'2','value':'4'},
{'id':'9','value':'10'},
{'id':'0','value':'9'}]
How can this be efficiently done?
random.shuffle does not let you specify only part of a list; it always shuffles the entire list.
A trade-off between effort, speed, and memory footprint would be to slice out the part of the list you want to shuffle, do it, and then assign it back to that slice:
>>> from random import shuffle
>>> x = list(range(10))
>>> y = x[:5]
>>> shuffle(y)
>>> x[:5] = y
>>> x
[2, 1, 4, 3, 0, 5, 6, 7, 8, 9]
My solution is the following:
from random import sample
shuffle_percentage = 0.4
x = sample(range(len(my_list)), int(len(my_list) * shuffle_percentage))
for index in range(0, len(x)-1, 2):
    my_list[x[index]], my_list[x[index+1]] = my_list[x[index+1]], my_list[x[index]]
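A variation on the sampling idea above: instead of swapping the sampled positions in pairs, the values at those positions can be shuffled among themselves. This is only a sketch; the function name partial_shuffle and its rng parameter are assumptions for illustration.

```python
import random

def partial_shuffle(items, fraction, rng=random):
    """Shuffle the values at a random `fraction` of positions, in place.

    Positions that are not sampled keep their original values.
    """
    k = int(len(items) * fraction)
    chosen = rng.sample(range(len(items)), k)  # positions allowed to change
    values = [items[i] for i in chosen]
    rng.shuffle(values)                        # permute only the chosen values
    for i, value in zip(chosen, values):
        items[i] = value
    return items

my_list = [{'id': '2', 'value': '4'}, {'id': '6', 'value': '3'},
           {'id': '4', 'value': '5'}, {'id': '9', 'value': '10'},
           {'id': '0', 'value': '9'}]
partial_shuffle(my_list, 0.4)
```

Note that a shuffled value can land back in its original position, so the fraction is an upper bound on how many elements actually move.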

Effective ways to group things into list

I am doing a K-means project and I have to do it by hand, which is why I am trying to figure out the best way to group things into a list or a dictionary according to their last values. Here is what I am talking about:
list_of_tuples = [('honey',1),('bee',2),('tree',5),('flower',2),('computer',5),('key',1)]
Now my ultimate goal is to sort out the list and have 3 different lists, each with its respective elements:
"""This is the goal"""
list_1 = ['honey','key']
list_2 = ['bee','flower']
list_3 = ['tree','computer']
I can use a lot of if statements and a for loop, but is there a more efficient way to do it?
If you're not opposed to using something like pandas, you could do something along these lines:
import pandas as pd
list_1, list_2, list_3 = pd.DataFrame(list_of_tuples).groupby(1)[0].apply(list).values
Result:
In [19]: list_1
Out[19]: ['honey', 'key']
In [20]: list_2
Out[20]: ['bee', 'flower']
In [21]: list_3
Out[21]: ['tree', 'computer']
Explanation:
pd.DataFrame(list_of_tuples).groupby(1) groups your list of tuples by the value at index 1, then you extract the values as lists of index 0 with [0].apply(list).values. This gives you an array of lists as below:
array([list(['honey', 'key']), list(['bee', 'flower']),
list(['tree', 'computer'])], dtype=object)
Something to that effect can be achieved with a dictionary and a for loop, using the second element of each tuple as the key.
list_of_tuples = [("honey",1),("bee",2),("tree",5),("flower",2),("computer",5),("key",1)]
dict_list = {}
for t in list_of_tuples:
    # create key and a single element list if key doesn't exist yet
    # append to existing list otherwise
    if t[1] not in dict_list:
        dict_list[t[1]] = [t[0]]
    else:
        dict_list[t[1]].append(t[0])
list_1, list_2, list_3 = dict_list.values()
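The same dictionary grouping can be written a little more compactly with collections.defaultdict, which creates the empty list for a new key automatically. A minimal sketch using the data from the question:

```python
from collections import defaultdict

list_of_tuples = [("honey", 1), ("bee", 2), ("tree", 5),
                  ("flower", 2), ("computer", 5), ("key", 1)]

groups = defaultdict(list)
for name, label in list_of_tuples:
    groups[label].append(name)  # a missing label starts as an empty list

print(dict(groups))
# {1: ['honey', 'key'], 2: ['bee', 'flower'], 5: ['tree', 'computer']}
```

Keeping the result as a dictionary keyed by cluster label avoids having to know in advance how many groups there are, which unpacking into list_1, list_2, list_3 requires.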

Fastest way to find all the indexes of maximum value in a list - Python

I have a list as follows:
input_list= [2, 3, 5, 2, 5, 1, 5]
I want to get all the indexes of the maximum value, and I need an efficient solution. The output should be as follows:
output = [2,4,6] (5 is the maximum value in the list above)
I have tried the code below:
m = max(input_list)
output = [i for i, j in enumerate(input_list) if j == m]
Is there a more optimal solution?
from collections import defaultdict
dic = defaultdict(list)
input_list = [2, 3, 5, 2, 5, 1, 5]
# map each value to the list of indexes where it occurs
for i in range(len(input_list)):
    dic[input_list[i]] += [i]
max_value = max(input_list)
Sol = dic[max_value]
You can use numpy (numpy arrays are very fast):
import numpy as np
input_list= np.array([2, 3, 5, 2, 5, 1, 5])
i, = np.where(input_list == np.max(input_list))
print(i)
Output:
[2 4 6]
Here's the approach described in the comments. Even if you use a library, you fundamentally need to traverse the list at least once to solve this problem (assuming the input list is unsorted), so the lower bound for the algorithm is Omega(size_of_list). If the list is sorted, we can leverage binary search to solve the problem.
def max_indexes(l):
    try:
        assert l != []
        max_element = l[0]
        indexes = [0]
        for index, element in enumerate(l[1:]):
            if element > max_element:
                max_element = element
                indexes = [index + 1]
            elif element == max_element:
                indexes.append(index + 1)
        return indexes
    except AssertionError:
        print('input_list is empty')
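For the sorted case mentioned above, one possible sketch uses the standard bisect module. It assumes the list is sorted in ascending order, so the maximum is the last element; the function name max_indexes_sorted is made up for illustration.

```python
from bisect import bisect_left

def max_indexes_sorted(sorted_list):
    """Indexes of the maximum in an ascending sorted list."""
    if not sorted_list:
        return []
    # The maximum is the last element; binary-search its first occurrence.
    first = bisect_left(sorted_list, sorted_list[-1])
    return list(range(first, len(sorted_list)))

print(max_indexes_sorted([1, 2, 2, 3, 5, 5, 5]))  # [4, 5, 6]
```

The search itself is O(log n); materialising the index list still costs time proportional to the number of occurrences of the maximum.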
Use a for loop for an O(n) solution that iterates over the list just once:
from itertools import islice
input_list= [2, 3, 5, 2, 5, 1, 5]
def max_indexes(l):
    max_item = l[0]
    indexes = [0]
    for i, item in enumerate(islice(l, 1, None), 1):
        if item < max_item:
            continue
        elif item > max_item:
            max_item = item
            indexes = [i]
        elif item == max_item:
            indexes.append(i)
    return indexes
Think of it this way: unless you iterate through the whole list once, which is O(n) with n being the length of the list, you can't compare the maximum with all values in the list. So the best you can do is O(n), which you already seem to be doing in your example.
So I am not sure you can do it faster than O(n) with the list approach.

Python: How to find the average on each array in the list?

Let's say I have a list with three arrays as follows:
[(1,2,0),(2,9,6),(2,3,6)]
Is it possible to get the average of each "slot" across the arrays in the list?
For example:
(1+2+2)/3, (2+9+3)/3, (0+6+6)/3
and make it a new array list with only 3 integers.
You can use zip to associate all of the elements in each of the interior tuples by index
tups = [(1,2,0),(2,9,6),(2,3,6)]
print([sum(x)/len(x) for x in zip(*tups)])
# [1.6666666666666667, 4.666666666666667, 4.0]
You can also do something like sum(x)//len(x) or round(sum(x)/len(x)) inside the list comprehension to get an integer.
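To make the zip step concrete, this sketch shows the intermediate "columns" that zip(*tups) produces before averaging, along with the two integer variants mentioned above. The variable names are chosen for illustration.

```python
tups = [(1, 2, 0), (2, 9, 6), (2, 3, 6)]

# zip(*tups) regroups the tuples by position: one tuple per "slot".
columns = list(zip(*tups))
print(columns)  # [(1, 2, 2), (2, 9, 3), (0, 6, 6)]

floored = [sum(col) // len(col) for col in columns]       # floor division
rounded = [round(sum(col) / len(col)) for col in columns]  # nearest integer
print(floored)  # [1, 4, 4]
print(rounded)  # [2, 5, 4]
```

Floor division and rounding give different integers here, so the choice depends on which behaviour the question actually needs.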
Here are a couple of ways you can do it.
data = [(1,2,0),(2,9,6),(2,3,6)]
avg_array = []
for tu in data:
    avg_array.append(sum(tu)/len(tu))
print(avg_array)
Using a list comprehension:
data = [(1,2,0),(2,9,6),(2,3,6)]
comp = [ sum(i)/len(i) for i in data]
print(comp)
This can be achieved by doing something like the following.
Create an empty array. Loop through your current array and use the sum and len functions to calculate averages. Then append each average to your new array.
array = [(1,2,0),(2,9,6),(2,3,6)]
arraynew = []
for i in range(0,len(array)):
    arraynew.append(sum(array[i]) / len(array[i]))
print(arraynew)
As you were told in the comments, with sum and len it's pretty easy.
But in Python I would do something like this, assuming you want to maintain decimal precision:
data = [(1, 2, 0), (2, 9, 6), (2, 3, 6)]
res = list(map(lambda l: round(sum(l) / len(l), 2), data))
Output:
[1.0, 5.67, 3.67]
But since you said in your question that you wanted 3 ints, it would be like this:
res = list(map(lambda l: sum(l) // len(l), data))
Output:
[1, 5, 3]
Edit:
To average the same index of each tuple, the most elegant method is the solution provided by @PatrickHaugh.
On the other hand, if you are not fond of list comprehensions and built-in functions such as zip, here's a slightly longer and less elegant version using a for loop:
arr = []
for i in range(len(data[0])):
    arr.append(sum(l[i] for l in data) // len(data))
print(arr)
Output:
[1, 4, 4]
