I'd like to take a vector and get an array of vectors in which the i-th elements of the vectors are the i-th element of the original vector and its k neighbors on each side. I'm also looking for the fastest way to do so.
I've already done that in MATLAB:
a = zeros(k, length(v));
I = cell(1, k);
a(1,:) = v;
for j = 2:k
    a(j,:) = [a(j-1,2:end), a(j-1,1)];
end
aux1 = [a(:,(end-r+1):end), a(:,1:(end-r))];
for j = 1:k
    I{j} = aux1(j,:);
end
For example, with v = [1, 2, 3, 4, 5] and k = 1, I want to get:
M = [[5, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 1]]
so that the first elements of the vectors are [5; 1; 2], i.e. element 1 and its neighbors.
Hope it makes sense. Thanks for reading :)
You could use the numpy roll function:
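import numpy as np

def get_neighbors(v, k):
    N = len(v)
    # One row per offset from -k to +k
    M = np.zeros((k*2+1, N), dtype=int)
    for i in range(-k, k+1):
        # Rolling by -i puts the i-th neighbor of each element in its column
        M[i+k, :] = np.roll(v, -i)
    return M
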
v = np.array([1, 2, 3, 4, 5])
k = 1
M = get_neighbors(v, k)
print(M)
Output:
[[5 1 2 3 4]
[1 2 3 4 5]
[2 3 4 5 1]]
Using sliding_window_view on a repetition of your array does it in a "vectorized" way:
import numpy as np

# Example array
a = np.arange(1,16)
k = 2 # Window of neighbors
# My solution
np.lib.stride_tricks.sliding_window_view(np.hstack([a,a,a]), (len(a),))[len(a)-k:len(a)+k+1]
Returns
array([[14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1],
[ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2]])
Note that sliding_window_view creates only a view; it doesn't create new data. That is why I don't hesitate to create (in this example) 31 rows (3*15-15+1) and then keep only 5 of them: they are never really materialized.
So the only real cost of this solution, both CPU-wise and memory-wise, is in hstack.
That subset, by the way, was taken to abide strictly by what you asked. Depending on what you intend to do, you may drop it. The important point is that if
T=np.lib.stride_tricks.sliding_window_view(np.hstack([a,a,a]), (len(a),))
then T[len(a)+k] is the row made of the k-th neighbors, whether k is positive, negative or 0 (the original row).
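As a quick sanity check of that claim, the sketch below compares each row T[len(a)+j] with np.roll(a, -j) for every offset j from -k to k. The names n, tripled, offsets and rows_match are only illustrative, and sliding_window_view assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(1, 16)
n = len(a)
k = 2

# Repeat the array three times so every window of length n exists as a slice
tripled = np.hstack([a, a, a])
T = sliding_window_view(tripled, (n,))  # a view, no rows are copied

# Row n + j should hold the j-th neighbor of every element (cyclically)
offsets = range(-k, k + 1)
rows_match = all(np.array_equal(T[n + j], np.roll(a, -j)) for j in offsets)
print(rows_match)
```

Because T is a view, indexing a single row such as T[n + 1] only slices the tripled array; nothing beyond the hstack result is allocated.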
Here are the timings, since speed matters to you:
sizes           This method   Roll method
len=15/k=2      51 μs         132 μs
len=15/k=7      51 μs         383 μs
len=1000/k=7    52 μs         422 μs
len=1M/k=7      6 ms          160 ms
len=1M/k=100    6 ms          2.2 s
The cost of the roll method is clearly proportional to the size of the window (O(k): it performs one roll per row of output), whereas sliding_window_view is just a view and does not really create rows, so it is O(1) as far as k is concerned. Both methods are equally affected by the length of the data (O(n), although it shows only for n big enough).
So, all together, this method is O(n) while the roll method is O(kn).
I'm trying to solve this problem:
initial list = [0, 1, 2, 2]
You get the sequence of numbers [0, 1, 2, 2] and you need to keep appending the next natural number (3, 4, 5, etc.) n times, where n is the list element at the index equal to that number. For example, the next number to add is 3, and list[3] is 2, so you append 3 two times. The new list will be [0, 1, 2, 2, 3, 3]. Then list[4] is 3, so you have to append 4 three times. The list will be [0, 1, 2, 2, 3, 3, 4, 4, 4] and so on. ([0, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10])
To solve this, I tried various approaches. I used recursion, but a recursive approach is very slow in this case. I also tried the mathematical formula from OEIS (A055086) => a(n) = ceiling(2*sqrt(n+1)) - 2. The problem with the formula is that beyond 2 ** 20 it is too imprecise.
So, my next idea was to use memoization:
lst = [0, 1, 2, 2]
from itertools import repeat

def find(n):
    global lst
    print(lst[-1], n, flush = True)
    if len(lst) > n:
        return lst[n]
    for number in range(lst[-1]+1, n+1):
        lst += list(repeat(number, lst[number]))
        if len(lst) > n:
            return lst[n]
Now, this approach works up to 2 ** 37, but beyond that it just times out. The site where I'm trying to implement my algorithm is (https://www.codewars.com/kata/5f134651bc9687000f8022c4/train/python). I'm not asking for a solution, only for a hint on how to optimize my code.
I googled some similar problems and found that in this case I could use the total sum of the list, but it is not yet clear to me how that could help.
Any help is welcome!
You can answer it iteratively like so:
def find(n):
    lst = [0,1,2,2]
    if n < 4:
        return lst[n]
    to_add = 3
    while n >= len(lst):
        # Append to_add as many times as the list says at index to_add
        for i in range(lst[to_add]):
            lst.append(to_add)
        to_add += 1
    return lst[n]
You could optimise for large n by breaking out of the for loop early, and by keeping track of the list length separately rather than calling len.
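A minimal sketch of those two tweaks, assuming a hypothetical name find_fast: the length is tracked in a plain integer, and the function returns as soon as the run containing index n is reached, without appending that run. The list still grows with n, so this only illustrates the hint and is not a solution for inputs as large as 2 ** 37.

```python
def find_fast(n):
    lst = [0, 1, 2, 2]
    if n < 4:
        return lst[n]
    length = 4      # tracked separately instead of calling len(lst)
    to_add = 3
    while True:
        count = lst[to_add]        # how many times to_add appears
        if length + count > n:     # index n falls inside this run
            return to_add          # early exit, nothing appended
        lst.extend([to_add] * count)
        length += count
        to_add += 1


print([find_fast(i) for i in range(10)])
```

The early return works because every index from length to length + count - 1 holds the same value, to_add, so the run never has to be stored to answer the query.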
I'm trying to write a program that takes a list as input and prints only the multiples of three...can't figure it out, hoping to find some help. The list I've defined is [3, 1, 6, 2, 3, 9, 7, 9, 5, 4, 5, 12, 13, 15].
def multiples_of_three(input_list):
    # Keep only the values that leave no remainder when divided by 3
    return [y for y in input_list if y % 3 == 0]

x = [3, 1, 6, 2, 3, 9, 7, 9, 5, 4, 5, 12, 13, 15]
print(multiples_of_three(x))
% is the modulo operator in Python; y is a multiple of 3 if and only if y % 3 == 0.
I have the following function which makes use of a dictionary of cycle_times to generate lists and dictionaries containing elements whose values are greater than a certain threshold.
def anamolous_cycle_time_index_and_lengths(cycle_time_dict, anamoly_threshold):
    for meter,cycle_time_list in cycle_time_dict.items():
        anamoly_dict = {cycle_time_list.index(x):x for x in cycle_time_list if x > anamoly_threshold}
        anamoly_list = [x for x in cycle_time_list if x > anamoly_threshold]
        print(meter,len(anamoly_dict))
        print([value for key,value in anamoly_dict.items()])
        print(anamoly_list)
Suppose I give the inputs as
new_dict = {104:[2,3,4,5,6,7,3,2,5,6,7], 101:[2,45,4,2,5,2,34,2,5,6,7], 106:[2,23,4,5,65,7,3,23,5,6,7]}
anamoly_threshold = 3
The outputs I get are
104 4
[4, 5, 6, 7]
[4, 5, 6, 7, 5, 6, 7]
101 6
[45, 4, 5, 34, 6, 7]
[45, 4, 5, 34, 5, 6, 7]
106 6
[23, 4, 5, 65, 7, 6]
[23, 4, 5, 65, 7, 23, 5, 6, 7]
Shouldn't the list and dictionary give me the same output? I have run a comprehension for both data structures on the same data.
Your problem is the use of .index(x). It returns the index of the first occurrence of x, and since dictionary keys are unique, each repeated value maps to the same key and you see only one entry per distinct value in your dict comprehension.
There are several ways to overcome this problem. The easiest is to use enumerate:
anamoly_dict = {index: x for index, x in enumerate(cycle_time_list) if x > anamoly_threshold}
Now the output for both methods is the same.
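To make the difference concrete, here is a small sketch using the list for meter 104 from the question. The names by_index and by_enumerate are only illustrative.

```python
cycle_time_list = [2, 3, 4, 5, 6, 7, 3, 2, 5, 6, 7]
anamoly_threshold = 3

# .index(x) always returns the first position of x, so repeats collide on one key
by_index = {cycle_time_list.index(x): x
            for x in cycle_time_list if x > anamoly_threshold}

# enumerate yields the actual position of every element, so no key is reused
by_enumerate = {i: x
                for i, x in enumerate(cycle_time_list) if x > anamoly_threshold}

anamoly_list = [x for x in cycle_time_list if x > anamoly_threshold]

print(len(by_index), len(by_enumerate), len(anamoly_list))  # 4 7 7
```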
The program below is meant to create a list of 100 numbers chosen randomly between 1 and 10. I need help summing that list and then averaging it.
I have no idea how to begin, and since I'm learning from online videos I have no one to turn to. I'm very new to this, so I may be missing entire ideas. I doubt the gap is in the basics, though, because the videos I paid for go step by step from knowing nothing to knowing something.
Edit: I was informed that the program overwrites a variable rather than building a list. So how do I sum my output in this example?
This is all I have to go on:
Code:
import random
x=0
while x < 100:
    mylist = (random.randrange(1,10))
    print(mylist)
    x = x+1
I think the shortest and most Pythonic way to do this is:
import random
# List comprehension; note that randrange(1,10) yields 1..9, use randrange(1,11) to include 10
x = [random.randrange(1,10) for i in range(100)]
summed = sum(x) # Sum of all integers in x
avg = summed / len(x) # Average of the numbers in x
In this case it shouldn't make a big difference, but avoid a while loop with a manual counter when you know in advance how many iterations you need; prefer for whenever possible. It's more efficient and makes it clearer what the code does.
# Named list_sum so that it does not shadow the built-in sum
def list_sum(numbers):
    sm = 0
    for i in numbers:
        sm += i
    return sm
Just run list_sum(mylist) to get the sum of all elements.
Or you can use
import random
x=0
mylist = []
sm = 0
while x < 100:
    mylist.append(random.randrange(1,10))
    sm += mylist[x]
    x += 1
Then sm will be the sum of the list.
The code is not correct: it does not create a list but generates a single number on every iteration. Use the code below to get the desired result.
import random
mylist = []
for x in range(100):
    mylist.append(random.randrange(1,10))
print(mylist)
print(sum(mylist))
OR
import random
mylist = [random.randrange(1,10) for value in range(100)]
print(mylist)
print(sum(mylist))
Output:
[3, 9, 3, 1, 3, 5, 8, 8, 3, 3, 1, 2, 5, 1, 2, 1, 4, 8, 9, 1, 2, 2, 4,
6, 9, 7, 9, 5, 4, 5, 7, 7, 9, 2, 5, 8, 2, 4, 3, 8, 2, 1, 3, 4, 2, 2,
2, 1, 6, 8, 3, 2, 1, 9, 6, 5, 8, 7, 7, 9, 9, 9, 8, 5, 7, 9, 4, 9, 8,
7, 5, 9, 2, 6, 8, 8, 3, 4, 8, 4, 7, 9, 9, 4, 2, 9, 9, 6, 3, 4, 9, 5,
3, 8, 4, 1, 1, 3, 2, 6]
512
I need a parallel algorithm (cost optimal) to check if a given sequence of n numbers is sorted.
For m threads, give each thread a chunk of n/m consecutive numbers with an overlap of 1 number. In each thread, check that the sequence it is assigned is in sorted order. If all subsequences are sorted, then the entire sequence is sorted.
Examples:
[1, 4, 5, 6, 11, 42] => [1, 4, 5, 6*] and [6, 11, 42] with 2 threads
[1, 4, 5, 6, 11, 42] => [1, 4, 5*], [5, 6, 11*] and [11, 42] with 3 threads
* this is the overlap of 1.
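This solution takes O(n/m) time per thread, so the total work across the m threads is O(n), which matches the sequential check and makes it cost optimal.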
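The chunking scheme can be sketched in Python as follows. The function names is_sorted_parallel and chunk_is_sorted are hypothetical, and the chunk size is assumed to be ceil(n/m). Note that in CPython the global interpreter lock prevents pure-Python threads from running this check truly in parallel, so the sketch shows the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_is_sorted(seq, start, stop):
    # Check adjacent pairs inside seq[start:stop]
    return all(seq[i] <= seq[i + 1] for i in range(start, stop - 1))


def is_sorted_parallel(seq, m):
    n = len(seq)
    if n < 2:
        return True
    size = -(-n // m)  # ceil(n / m) elements per chunk
    # Each chunk takes one extra element so that the pair spanning
    # two neighbouring chunks is also compared (the overlap of 1)
    bounds = [(s, min(s + size + 1, n)) for s in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=m) as pool:
        results = pool.map(lambda b: chunk_is_sorted(seq, *b), bounds)
        return all(results)


print(is_sorted_parallel([1, 4, 5, 6, 11, 42], 2))
print(is_sorted_parallel([1, 2, 3, 1, 2, 3], 2))
```

With 2 threads the first call checks [1, 4, 5, 6] and [6, 11, 42], as in the example above. The second call shows why the overlap matters: both halves [1, 2, 3] are sorted on their own, and only the pair that crosses the boundary reveals that the whole sequence is not.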