Easiest way to get Pandas rolling window of values


I have a dataset. I want a window of 5 values. Does pandas have a native function that will give me a rolling window of 5 values until there are no longer 5 values that it can use? I want these to be rows.
I also want the new label to be the middle of the 5 values.
Input DataFrame:
   first  label
0      1      0
1      2      1
2      3      2
3      4      3
4      5      4
5      6      5
Output DataFrame desired:
             first  label
0  [1, 2, 3, 4, 5]      2
1  [2, 3, 4, 5, 6]      3
I have tried using the .rolling function and haven't been successful.

You can use strides to build the windows. For the label, get the position of the middle value of each window and select it with NumPy indexing:
import numpy as np
import pandas as pd
def rolling_window(a, window):
    # view of `a` with one row per run of `window` consecutive values (no copy is made)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
a = rolling_window(df['first'].to_numpy(), 5)
print(a)
[[1 2 3 4 5]
 [2 3 4 5 6]]
# get positions of the middle value of each window
i = rolling_window(np.arange(len(df)), 5)[:, 2]
print(i)
[2 3]
df = pd.DataFrame({'first': a.tolist(),
                   'label': df['label'].to_numpy()[i]})
print(df)
             first  label
0  [1, 2, 3, 4, 5]      2
1  [2, 3, 4, 5, 6]      3
You can optimize the code further by building the strided view only once, on the positions, and reusing it for both columns:
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
# get the positions covered by each window
idx = rolling_window(np.arange(len(df)), 5)
print(idx)
[[0 1 2 3 4]
 [1 2 3 4 5]]
df = pd.DataFrame({'first': df['first'].to_numpy()[idx].tolist(),
                   'label': df['label'].to_numpy()[idx][:, 2]})
print(df)
             first  label
0  [1, 2, 3, 4, 5]      2
1  [2, 3, 4, 5, 6]      3
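As a hedged alternative to the hand-written strides above: NumPy 1.20 and later ship numpy.lib.stride_tricks.sliding_window_view, which builds the same kind of windowed view. The sketch below assumes the question's column names (first, label) and an odd window size; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Sample data from the question.
df = pd.DataFrame({"first": [1, 2, 3, 4, 5, 6], "label": [0, 1, 2, 3, 4, 5]})

window = 5
# Each row of `windows` is one run of `window` consecutive values.
windows = sliding_window_view(df["first"].to_numpy(), window)
# The label sitting at the middle position of each window.
half = window // 2
labels = df["label"].to_numpy()[half : half + len(windows)]

out = pd.DataFrame({"first": windows.tolist(), "label": labels})
print(out)
```

The view is read-only by default, which avoids the accidental out-of-bounds writes that as_strided allows.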

An alternative that is more of a hack, since I don't think pandas has a native function for what you want.
Convert the dataframe to a NumPy array, transpose it, and pull out the windows and their labels with a list comprehension:
M = df.to_numpy().T
outcome = [(M[0, i:5+i],       # window of 5 values starting at position i
            M[1][i + 5//2])    # label at the middle of that window
           for i in range(M.shape[1] - 5 + 1)]
print(outcome)
[(array([1, 2, 3, 4, 5]), 2), (array([2, 3, 4, 5, 6]), 3)]
pd.DataFrame(outcome, columns=['first','label'])
             first  label
0  [1, 2, 3, 4, 5]      2
1  [2, 3, 4, 5, 6]      3
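The same idea can be wrapped in a small function that works for any odd window size. This is only a sketch: centered_windows and its parameter names are made up for illustration, and it uses plain list slicing rather than any pandas-native rolling feature.

```python
import pandas as pd

def centered_windows(df, window, value_col="first", label_col="label"):
    """Hypothetical helper: one row per full window, labelled by its middle row."""
    values = df[value_col].tolist()
    labels = df[label_col].tolist()
    half = window // 2
    rows = [
        (values[start:start + window], labels[start + half])
        for start in range(len(values) - window + 1)  # stops when no full window is left
    ]
    return pd.DataFrame(rows, columns=[value_col, label_col])

df = pd.DataFrame({"first": [1, 2, 3, 4, 5, 6], "label": [0, 1, 2, 3, 4, 5]})
print(centered_windows(df, 5))
```

If the frame has fewer rows than the window size, the result is simply an empty DataFrame.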

Related

How to append item to match the length of two list in Python

I am working on a Python script that is connected to a server. Every x minutes, the server returns two lists, but the lengths of these lists are not always the same. For example:
a = [8, 10, 1, 34]
b = [4, 6, 8]
As you can see above, a has length 4 and b has length 3. Similarly, sometimes it returns
a = [3, 6, 4, 5]
b = [8, 3, 5, 2, 9, 3]
I have to write logic that checks whether the lengths of the two lists differ and, if so, appends 0s to the end of the shorter list until both have the same length. For example, if the input is:
a = [3, 6, 4, 5]
b = [8, 3, 5, 2, 9, 3]
then output will be:
a = [3, 6, 4, 5, 0, 0]
b = [8, 3, 5, 2, 9, 3]
What can I try to achieve this?

def pad(list1, list2):
    # make copies of the existing lists so that original lists remain intact
    list1_copy = list1.copy()
    list2_copy = list2.copy()
    len_list1 = len(list1_copy)
    len_list2 = len(list2_copy)
    # find the difference in the element count between the two lists
    diff = abs(len_list1 - len_list2)
    # add `diff` number of elements to the end of the shorter list
    if len_list1 < len_list2:
        list1_copy += [0] * diff
    elif len_list1 > len_list2:
        list2_copy += [0] * diff
    return list1_copy, list2_copy
a = [3, 6, 4, 5]
b = [8, 3, 5, 2, 9, 3]
# prints: ([3, 6, 4, 5, 0, 0], [8, 3, 5, 2, 9, 3])
print(pad(a, b))
a = [8, 10, 1, 34]
b = [4, 6, 8]
# prints: ([8, 10, 1, 34], [4, 6, 8, 0])
print(pad(a, b))

For now, I can suggest this solution:
a = [3, 6, 4, 5]
b = [8, 3, 5, 2, 9, 3]
# Gets the size of a and b.
sizeA, sizeB = len(a), len(b)
# Constructs the zeros...
zeros = [0] * abs(sizeA - sizeB)
# Determines whether a or b needs to be appended with 0,0,0,0...
if sizeA < sizeB:
    a += zeros
else:
    b += zeros
print(a, b)

You should use extend instead of append. This is the way to add a list to another list in Python. The list here is the list of zeros.
a = [3, 6, 4, 5, 9, 3]
b = [8, 3, 5, 2]
lenA, lenB = len(a), len(b)
diff = abs(lenA - lenB)
if lenA < lenB:
    a.extend([0] * diff)
else:
    b.extend([0] * diff)
print(a)
print(b)

You could also try the padded() function from more_itertools.
It is arguably more elegant and easier to adapt to future use cases.
Note: you need to run pip install more_itertools first.
# simple example to demo it:
from more_itertools import padded
print(list(padded([1, 2, 3], 0, 5)))  # 0 is the fill value; 5 is the target total length, so 2 zeros are added
# [1, 2, 3, 0, 0]
# more examples:
>>> L = [1, 2, 3]
>>> K = [3, 4, 5, 6, 8, 9]
>>> gap = len(K) - len(L)
>>> gap
3
>>> # the shorter list is L
>>> list(padded(L, 0, len(L) + gap))
[1, 2, 3, 0, 0, 0]
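If installing a third-party package is not an option, the standard library's itertools.zip_longest can do the same padding. A minimal sketch, assuming both inputs are plain lists; pad_to_same_length is a hypothetical name, not something from the answers above.

```python
from itertools import zip_longest

def pad_to_same_length(list1, list2, fill=0):
    """Hypothetical helper: return padded copies; the inputs are left untouched."""
    # zip_longest pairs items up and substitutes `fill` once the shorter list runs out.
    pairs = list(zip_longest(list1, list2, fillvalue=fill))
    if not pairs:  # both inputs empty
        return [], []
    padded1, padded2 = zip(*pairs)
    return list(padded1), list(padded2)

a = [3, 6, 4, 5]
b = [8, 3, 5, 2, 9, 3]
print(pad_to_same_length(a, b))  # ([3, 6, 4, 5, 0, 0], [8, 3, 5, 2, 9, 3])
```

Unlike the in-place versions, this leaves a and b unchanged, which may or may not be what the calling code expects.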

How to convert list like value in each row into pure value in python dataframe?

I have a dataframe with a column colA whose values look like lists. I would like to turn each of them into a plain value, as in the desired column below:
colA       desired
[0]        0
[3, 1]     3,1
[3, 1, 2]  3,1,2
[3, 1]     3,1
The type for colA is object.
Is there a way to do it?
Thanks

Without the lambda, as that will be slower for larger data sets, you can simply cast the lists to strings and then strip the unwanted characters. Note that the result keeps the space after each comma.
import pandas as pd
df = pd.DataFrame(data={'colA':[[0], [3,1],[3,1,2],[3,1]]})
# pandas 2.0 and later treat the pattern as a literal string unless regex=True is passed
df['desired'] = df.colA.astype(str).str.replace(r"\[|\]|'", '', regex=True)
df
Output:
        colA  desired
0        [0]        0
1     [3, 1]     3, 1
2  [3, 1, 2]  3, 1, 2
3     [3, 1]     3, 1

Try:
df = pd.DataFrame(data={'colA':[[0], [3,1],[3,1,2],[3,1], [4]]})
df['desired'] = df['colA'].apply(lambda x: ','.join(map(str, x)))
OUTPUT:
colA desired
0 [0] 0
1 [3, 1] 3,1
2 [3, 1, 2] 3,1,2
3 [3, 1] 3,1
4 [4] 4
If colA holds the lists as strings (for example "[3,1]"), parse them first with literal_eval:
from ast import literal_eval
df = pd.DataFrame(data={'colA':["[0]", "[3,1]","[3,1,2]","[3,1]", "[4]"]})
df['desired'] = df['colA'].apply(lambda x: ','.join(map(str, literal_eval(x))))

If colA holds strings rather than actual lists, you can use str.replace to strip the brackets:
df['desired'] = df['colA'].str.replace(r'[][]', '', regex=True)
Prints:
colA desired
0 [0] 0
1 [3, 1] 3, 1
2 [3, 1, 2] 3, 1, 2
3 [3, 1] 3, 1
You can use the regex demo to play with it.
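Since the answers differ on whether colA holds real lists or their string form, here is a small sketch that accepts either. join_values is a hypothetical helper name, and the mixed sample column is made up for illustration.

```python
from ast import literal_eval

import pandas as pd

def join_values(cell, sep=","):
    """Hypothetical helper: accept a list or the string form of a list."""
    if isinstance(cell, str):
        cell = literal_eval(cell)  # "[3, 1]" -> [3, 1]
    return sep.join(str(v) for v in cell)

# Made-up sample mixing real lists and string representations.
df = pd.DataFrame({"colA": [[0], [3, 1], "[3, 1, 2]", "[3,1]"]})
df["desired"] = [join_values(c) for c in df["colA"]]
print(df["desired"].tolist())  # ['0', '3,1', '3,1,2', '3,1']
```

literal_eval only parses Python literals, so it is a safer choice than eval for strings that come from a file or a database.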

Compare neighbouring cells in a 2d array

Suppose this is a 3X3 matrix and I need to find the number of elements that are greater than their neighbours.
[[1 2 7],
[4 5 6],
[3 8 9]]
Neighbours are those cells whose corners touch each other.
1 has neighbours 2,4,5.
2 has neighbours 1,7,4,5,6.
7 has 2,5,6.
5 has 1,2,7,4,6,3,8,9 and so on.

This problem can be solved in two steps/functions: 1) get_neighbors(matrix, r, c), and 2) compare_neighbors(matrix). In the second function, compare_neighbors, we just call get_neighbors for every coordinate, which we generate with itertools.product.
# code snippet:
from itertools import product
def get_neighbors(matrix, r, c):
    # flatten the (up to) 3x3 block around (r, c); slicing clips at the right and bottom borders
    vals = sum((row[c - (c > 0): c + 2]
                for row in matrix[r - (r > 0): r + 2]), [])
    vals.remove(matrix[r][c])  # remove the cell itself
    return set(vals)           # keep distinct numbers only
def compare_neighbors(matrix):
    ROW, COL = len(matrix), len(matrix[0])
    result = []
    for x, y in product(range(ROW), range(COL)):
        current = matrix[x][y]
        all_nums = get_neighbors(matrix, x, y)
        if all(n < current for n in all_nums):
            result.append(current)
    return result
Program running:
grid = [[1, 5, 4, 9],
[2, 6, 3, 2],
[8, 3, 6, 3],
[5, 4, 7, 1]]
matrix = [[1, 2, 7],
[4, 5, 6],
[3, 8, 9]]
print(f' {compare_neighbors(matrix)} ') # [7, 9]
print(f' {compare_neighbors(grid) } ') # [9, 8, 7]
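For larger matrices, the same check can be vectorized with NumPy. The sketch below is an alternative technique, not the method used above: it pads the matrix with -inf so border cells need no special casing, then compares the matrix against each of its eight shifted copies. strict_local_maxima is a hypothetical name, and the approach assumes numeric data that does not itself contain -inf.

```python
import numpy as np

def strict_local_maxima(matrix):
    """Hypothetical helper: values strictly greater than all 8-connected neighbours."""
    a = np.asarray(matrix, dtype=float)
    rows, cols = a.shape
    # Pad with -inf so cells outside the matrix never beat a real value.
    padded = np.pad(a, 1, mode="constant", constant_values=-np.inf)
    is_max = np.ones(a.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            # The neighbour in direction (dr, dc) for every cell at once.
            neighbour = padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
            is_max &= a > neighbour
    return a[is_max].tolist()  # row-major order, values converted to float

matrix = [[1, 2, 7],
          [4, 5, 6],
          [3, 8, 9]]
print(strict_local_maxima(matrix))  # [7.0, 9.0]
```

The loop runs a fixed eight times regardless of matrix size, so the per-cell work happens inside NumPy rather than in Python.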

You need to tackle several different problems:
ensure your data fits the problem statement: a matrix of strings, for example, does not work, nor does a non-square input
get the actual neighbouring indexes based on a given neighbourhood
check that every neighbour is strictly smaller than the value at the tested index
You can get the dimensions directly from the provided data (after asserting that it conforms to some base rules) and provide a generic check based on them, like so:
# different neighbourhood tuples
neigh8 = tuple((a,b) for a in range(-1,2) for b in range(-1,2))
# 4-neighbourhood: only cells sharing an edge (no diagonals, not the cell itself)
neigh4 = tuple((a,b) for (a,b) in neigh8 if abs(a) + abs(b) == 1)
def assertions(data):
    """Check if data is list of lists and all dims are same.
    Check if all elements are either ints or floats.
    Exit with exception if not."""
    assert isinstance(data, list)
    outer_dim = len(data)
    for inner in data:
        assert isinstance(inner, list), f"Inner element not list: {inner}"
        assert outer_dim == len(inner), f"Inner element not of len {outer_dim}: {inner}"
        allNumbers = all(isinstance(i, (int, float)) for i in inner)
        assert allNumbers, f"Not all elements ints or floats: {inner}"
    return outer_dim
def test_surrounded_by_lower_numbers(data, idx, n_idx):
    """Test one element at 'idx' in 'data' for a given neighbourhood 'n_idx'
    and return True if surrounded only by smaller numbers."""
    def get_idx(data, idx, n_idx):
        """Get all indexes that conform to the given neighbourhood and are
        not identical to idx nor out of bounds."""
        n = []
        for (a,b) in n_idx:
            # identical to input idx
            if (idx[0]+a, idx[1]+b) == idx:
                continue
            # out of bounds
            if idx[0]+a < 0 or idx[1]+b < 0:
                continue
            if idx[0]+a >= len(data) or idx[1]+b >= len(data):
                continue
            n.append((idx[0]+a, idx[1]+b))
        return n
    value = data[idx[0]][idx[1]]
    n = get_idx(data, idx, n_idx)
    # check if all are smaller than the current value
    return all(data[a][b] < value for a,b in n)
def test_matrix(matrix, n_idx = neigh8):
    """Check all matrix values for given neighbourhood. Output all values that are
    surrounded only by strictly smaller values."""
    print()
    for i in matrix:
        for n in i:
            print(f"{float(n):>5.2f} ".replace(".00","   "), end=" ")
        print()
    print()
    dim = assertions(matrix)
    for (a,b) in ((a,b) for a in range(dim) for b in range(dim)):
        # pass the chosen neighbourhood on to the per-cell test
        if test_surrounded_by_lower_numbers(matrix, (a,b), n_idx):
            print(f"{(a,b)} = {matrix[a][b]} is biggest.")
Program:
# 3 x 3
test_matrix( [[1, 2, 7], [4, 5, 6], [3, 8, 9]] )
# 5 x 5
test_matrix([[1, 2, 7, 11, 9],
[4, 5, 6, -2, .5],
[9.1, 3,99.99, 8, 9.7],
[1,2,3,4,5],
[40,50,60,70,80]])
Output for 3x3 testcase:
1 2 7
4 5 6
3 8 9
(0, 2) = 7 is biggest.
(2, 2) = 9 is biggest.
Output for 5x5 testcase:
1 2 7 11 9
4 5 6 -2 0.50
9.10 3 99.99 8 9.70
1 2 3 4 5
40 50 60 70 80
(0, 3) = 11 is biggest.
(2, 0) = 9.1 is biggest.
(2, 2) = 99.99 is biggest.
(2, 4) = 9.7 is biggest.
(4, 4) = 80 is biggest.

Counting contiguous numbers within a list

I am completely new to programming but interested.
I am coding in Python 3.x and have a question about my latest topic:
We have a list containing a few tens of thousands of randomly generated integers between 1 and 7.
import random
list_of_states = []
n = int(input('Enter number of elements:'))
for i in range(n):
    list_of_states.append(random.randint(1,7))
print(list_of_states)
Afterwards, I would like to count the runs of contiguous equal numbers in this list and put them into a numpy.array.
Example: [1, 2, 3, 4, 4, 4, 7, 3, 1, 1, 1]
1 1
2 1
3 1
4 3
7 1
3 1
1 3
I would like to know whether someone has a hint or an idea of how I could do this.
This is a small part of a Markov chain, for which I need the frequency of each number.
Thanks for sharing
Nadim

Below is a crude way of doing this. I am creating a list of lists and then converting it to a numpy array. Please use this only as guidance and improve on it.
import numpy as np
num_list = [1,1,1,1,2,2,2,3,4,5,6,6,6,6,7,7,7,7,1,1,1,1,3,3,3]
temp_dict = {}  # holds only the value of the current run and its count
two_dim_list = []
for x in num_list:
    if x in temp_dict:
        temp_dict[x] += 1
    else:
        # a new value starts: store the finished run, then reset
        if temp_dict:
            for k,v in temp_dict.items():
                two_dim_list.append([k,v])
            temp_dict = {}
        temp_dict[x] = 1
# store the last run
for k,v in temp_dict.items():
    two_dim_list.append([k,v])
print ("List of List = %s" %(two_dim_list))
two_dim_arr = np.array(two_dim_list)
print ("2D Array = %s" %(two_dim_arr))
Output:
List of List = [[1, 4], [2, 3], [3, 1], [4, 1], [5, 1], [6, 4], [7, 4], [1, 4], [3, 3]]
2D Array = [[1 4]
[2 3]
[3 1]
[4 1]
[5 1]
[6 4]
[7 4]
[1 4]
[3 3]]
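A more compact way to get the same run lengths is itertools.groupby from the standard library, which groups consecutive equal items. This is a sketch of that alternative, using the example list from the question; the variable names are only for illustration.

```python
from itertools import groupby

import numpy as np

states = [1, 2, 3, 4, 4, 4, 7, 3, 1, 1, 1]
# groupby yields (value, iterator over the run of consecutive equal values).
runs = np.array([(value, sum(1 for _ in group)) for value, group in groupby(states)])
print(runs.tolist())
# [[1, 1], [2, 1], [3, 1], [4, 3], [7, 1], [3, 1], [1, 3]]
```

Each row is (value, run length), in the order the runs appear in the list.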

Returning the N largest values' indices in a multidimensional array (can find solutions for one dimension but not multi-dimension)

I have a numpy array X, and I'd like to return another array Y whose entries are the indices of the n largest values of X. For example, suppose I have:
a = np.array([[1, 3, 5], [4, 5, 6], [9, 1, 7]])
Then, if I want the indices of the first 5 maxima (here 9, 7, 6, 5, 5 are the maxima), the result would be:
b = np.array([[2, 0], [2, 2], [1, 2], [1, 1], [0, 2]])
I've been able to find some solutions and make this work for a one-dimensional array like
c = np.array([1, 2, 3, 4, 5, 6]):
def f(a, N):
    return np.argsort(a)[::-1][:N]
But I have not been able to come up with something that works in more than one dimension. Thanks!

Approach #1
Get the argsort indices on its flattened version and select the last N indices. Then, get the corresponding row and column indices -
N = 5
idx = np.argsort(a.ravel())[-N:][::-1] # single slicing: `[:-N-1:-1]`
topN_val = a.ravel()[idx]
row_col = np.c_[np.unravel_index(idx, a.shape)]
Sample run -
# Input array
In [39]: a = np.array([[1,3,5],[4,5,6],[9,1,7]])
In [40]: N = 5
...: idx = np.argsort(a.ravel())[-N:][::-1]
...: topN_val = a.ravel()[idx]
...: row_col = np.c_[np.unravel_index(idx, a.shape)]
...:
In [41]: topN_val
Out[41]: array([9, 7, 6, 5, 5])
In [42]: row_col
Out[42]:
array([[2, 0],
[2, 2],
[1, 2],
[1, 1],
[0, 2]])
Approach #2
For performance, we can use np.argpartition to get top N indices without keeping sorted order, like so -
idx0 = np.argpartition(a.ravel(), -N)[-N:]
To get the sorted order, we need one more round of argsort -
idx = idx0[a.ravel()[idx0].argsort()][::-1]
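Putting the two steps of Approach #2 together, a reusable function might look like the sketch below. top_n_indices is a hypothetical name, and the function assumes 1 <= n <= a.size. The relative order of tied values (the two 5s here) is not guaranteed.

```python
import numpy as np

def top_n_indices(a, n):
    """Hypothetical helper: (row, col, ...) indices of the n largest values, largest first."""
    flat = a.ravel()
    # Unordered positions of the n largest values (cheaper than a full sort).
    candidates = np.argpartition(flat, -n)[-n:]
    # Sort only those n candidates, in descending order of value.
    ordered = candidates[np.argsort(flat[candidates])[::-1]]
    # Convert flat positions back to one index column per dimension.
    return np.column_stack(np.unravel_index(ordered, a.shape))

a = np.array([[1, 3, 5], [4, 5, 6], [9, 1, 7]])
idx = top_n_indices(a, 5)
print(idx)
print(a[tuple(idx.T)])  # the values at those indices, largest first
```

Because it relies on ravel and unravel_index, the same function should also work for arrays with more than two dimensions.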
