Python 3: True and False as indices into an ndarray

I saw this question on a forum.
import numpy as np
a = np.arange(16).reshape(4,4)
print(a)
print('-'*20)
print(a[[True,True,False,False]])
print('-'*20)
print(a[:,[True,True,False,False]])
print('-'*20)
print(a[[True,True,False,False],[True,True,False,False]])
The result is:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
--------------------
[[0 1 2 3]
[4 5 6 7]]
--------------------
[[ 0 1]
[ 4 5]
[ 8 9]
[12 13]]
--------------------
[0 5]
He asked why the result of the line "print(a[[True,True,False,False],[True,True,False,False]])" wasn't
[
[0,1],
[4,5]
]
I thought about it and couldn't come up with an explanation either.
No one has answered him yet, so I came here for help.
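For what it's worth, here is a small sketch of what seems to be going on. When both indices are boolean lists, NumPy treats each one like the integer positions of its True entries and then pairs those positions element by element, so it picks a[0, 0] and a[1, 1] rather than the 2x2 block. The variable names below (rows, cols, paired, block) are only for illustration; np.ix_ is one way to ask for every row/column combination instead.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
rows = [True, True, False, False]
cols = [True, True, False, False]

# Each boolean list acts like the integer positions of its True entries.
row_idx = np.flatnonzero(rows)   # array([0, 1])
col_idx = np.flatnonzero(cols)   # array([0, 1])

# Two index arrays are paired element by element: a[0, 0] and a[1, 1].
paired = a[rows, cols]
print(paired)                    # [0 5]

# np.ix_ builds an open mesh, so every selected row meets every selected column.
block = a[np.ix_(rows, cols)]
print(block)                     # [[0 1]
                                 #  [4 5]]
```

Chaining the two selections, a[rows][:, cols], gives the same 2x2 block.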

Related

Reconcile with np.fromiter and multidimensional arrays in Python

I am trying to build a multi-dimensional array that produces the following result in a Jupyter notebook.
I have tried several approaches, but I cannot produce the fourth column with the numbers 30 to 35. The closest I have come is this code:
import numpy as np
from itertools import chain
def fun(i):
    return tuple(4*i + j for j in range(4))
a = np.fromiter(chain.from_iterable(fun(i) for i in range(6)), 'i', 6 * 4)
a.shape = 6, 4
print(repr(a))
I am expecting the following results:
array([[ 1, 2, 3, 30],
[ 4, 5, 6, 31],
[ 7, 8, 9, 32],
[10, 11, 12, 33],
[13, 14, 15, 34],
[20, 21, 22, 35]])
You can create a flat array with all your consecutive numbers like this:
import numpy as np
a = np.arange(1, 16)
print(a)
# output:
[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Then you reshape it:
a = np.reshape(a, (5, 3))
print(a)
# output
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]
[13 14 15]]
Then you add a new row:
a = np.vstack([a, np.arange(20, 23)])
print(a)
# output:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]
[13 14 15]
[20 21 22]]
You create the column to add:
col = np.arange(30, 36).reshape(-1, 1)
print(col)
# output:
[[30]
[31]
[32]
[33]
[34]
[35]]
You add it:
a = np.concatenate((a, col), axis=1)
print(a)
# output:
[[ 1 2 3 30]
[ 4 5 6 31]
[ 7 8 9 32]
[10 11 12 33]
[13 14 15 34]
[20 21 22 35]]
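The same steps can also be written more compactly. This is only a sketch of the approach above; the names body, last_row and col are made up for the example, and np.column_stack is used here because it accepts a 1-D array as a new column without an explicit reshape.

```python
import numpy as np

body = np.arange(1, 16).reshape(5, 3)   # rows 1..15, three per row
last_row = np.arange(20, 23)            # the extra row [20, 21, 22]
col = np.arange(30, 36)                 # the fourth column, 30..35

# Stack the extra row under the body, then append col as a new column.
a = np.column_stack((np.vstack((body, last_row)), col))
print(repr(a))
```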

How to aggregate n previous rows as list in Pandas DataFrame?

As the title says:
a = pd.DataFrame([1,2,3,4,5,6,7,8,9,10])
Given a DataFrame with 10 values, we want to aggregate, say, the last 5 rows at each position and put them as a list into a new column:
>>> a new_col
0
0 1
1 2
2 3
3 4
4 5 [1,2,3,4,5]
5 6 [2,3,4,5,6]
6 7 [3,4,5,6,7]
7 8 [4,5,6,7,8]
8 9 [5,6,7,8,9]
9 10 [6,7,8,9,10]
How?
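Rolling aggregations in pandas have to return a single value per window, so you cannot get lists out of rolling(...).apply directly. You can still reach the desired result by iterating over the windows and storing each full window as a list. The snippet assumes the DataFrame is called df and its column is named "column"; iterating over a rolling object needs pandas 1.1 or later, as far as I know: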
>>> new_col_values = [
...     window.to_list() if len(window) == 5 else None
...     for window in df["column"].rolling(5)
... ]
>>> df["new_col"] = new_col_values
>>> df
column new_col
0 1 None
1 2 None
2 3 None
3 4 None
4 5 [1, 2, 3, 4, 5]
5 6 [2, 3, 4, 5, 6]
6 7 [3, 4, 5, 6, 7]
7 8 [4, 5, 6, 7, 8]
8 9 [5, 6, 7, 8, 9]
9 10 [6, 7, 8, 9, 10]
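If iterating over the rolling object is not an option, plain list slicing gives the same column. This is a different technique from the rolling-based answer, shown as a self-contained sketch; the column name "column" and the variable n are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({"column": range(1, 11)})
n = 5  # window length

values = df["column"].tolist()
# For each position i, take the n values ending at i; None until a full window exists.
df["new_col"] = [
    values[i - n + 1 : i + 1] if i >= n - 1 else None
    for i in range(len(values))
]
print(df)
```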

Is there a function that shuffles certain portion of both row and column of a numpy matrix in python?

I have a 2D matrix. I want to shuffle the last few columns, and also the rows associated with those columns.
I tried using np.random.shuffle, but it only shuffles the columns, not the rows.
import numpy as np
def randomize_the_data(original_matrix, reordering_sz):
    new_matrix = np.transpose(original_matrix)
    np.random.shuffle(new_matrix[reordering_sz:])
    shuffled_matrix = np.transpose(new_matrix)
    print(shuffled_matrix)
a = np.arange(20).reshape(4, 5)
print(a)
print()
randomize_the_data(a, 2)
My original matrix is this:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
I am getting this.
[[ 0 1 3 4 2]
[ 5 6 8 9 7]
[10 11 13 14 12]
[15 16 18 19 17]]
But I want something like this.
[[ 0 1 3 2 4]
[ 5 6 7 8 9]
[10 11 14 12 13]
[15 16 17 18 19]]
Another example would be:
Original =
-1.3702 0.3341 -1.2926 -1.4690 -0.0843
0.0170 0.0332 -0.1189 -0.0234 -0.0398
-0.1755 0.2182 -0.0563 -0.1633 0.1081
-0.0423 -0.0611 -0.8568 0.0184 -0.8866
Randomized =
-1.3702 0.3341 -0.0843 -1.2926 -1.4690
0.0170 0.0332 -0.0398 -0.0234 -0.1189
-0.1755 0.2182 -0.0563 0.1081 -0.1633
-0.0423 -0.0611 0.0184 -0.8866 -0.8568
np.random.shuffle only shuffles along the first axis, so shuffling the transposed matrix moves whole columns and every row ends up with the same permutation. To shuffle the last elements of each row independently, loop over the rows and shuffle the tail of each one; each row then gets its own permutation. Note that this shuffles original_matrix in place:
import numpy as np
def randomize_the_data(original_matrix, reordering_sz):
    for ln in original_matrix:
        np.random.shuffle(ln[reordering_sz:])
    print(original_matrix)
a = np.arange(20).reshape(4, 5)
print(a)
print()
randomize_the_data(a, 2)
OUTPUT:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
[[ 0 1 4 2 3]
[ 5 6 8 7 9]
[10 11 13 14 12]
[15 16 17 18 19]]
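If you would rather leave the input untouched and get reproducible results, a variant along these lines might work. It is a sketch, not the code from the answer: the function name shuffle_row_tails and the seed parameter are made up for illustration, and it uses NumPy's Generator API (np.random.default_rng) instead of np.random.shuffle.

```python
import numpy as np

def shuffle_row_tails(matrix, start, seed=None):
    """Return a copy of matrix with columns start.. shuffled independently per row."""
    rng = np.random.default_rng(seed)
    out = matrix.copy()            # work on a copy so the caller's array is unchanged
    for row in out:                # each row is a view into out
        rng.shuffle(row[start:])   # in-place shuffle of this row's tail only
    return out

a = np.arange(20).reshape(4, 5)
b = shuffle_row_tails(a, 2, seed=0)
print(a)
print()
print(b)
```

The first two columns of b always match a, and each row of b keeps the same set of values in its last three positions, just in a possibly different order.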

Why is stratifiedkfold generating the same splits in spite of using different random_state values?

I am trying to generate different stratified splits of my data set using StratifiedKFold.split and the random_state parameter. However, when I use different random_state values, I still get the same splits. My understanding is that different random_state values should produce different splits. Please let me know what I am doing wrong. Here is the code:
import numpy as np
X_train=np.ones(10)
Y_train=np.ones(10)
from sklearn.model_selection import StratifiedKFold
skf = StratifiedKFold(n_splits=5,random_state=0)
skf1 = StratifiedKFold(n_splits=5,random_state=100)
trn1=[]
cv1=[]
for train, cv in skf.split(X_train, Y_train):
    trn1=trn1+[train]
    cv1=cv1+[cv]
trn2=[]
cv2=[]
for train, cv in skf1.split(X_train, Y_train):
    trn2=trn2+[train]
    cv2=cv2+[cv]
for c in list(range(0,5)):
    print('Fold:'+str(c+1))
    print(trn1[c])
    print(trn2[c])
    print(cv1[c])
    print(cv2[c])
Here is the output
Fold:1
[2 3 4 5 6 7 8 9]
[2 3 4 5 6 7 8 9]
[0 1]
[0 1]
Fold:2
[0 1 4 5 6 7 8 9]
[0 1 4 5 6 7 8 9]
[2 3]
[2 3]
Fold:3
[0 1 2 3 6 7 8 9]
[0 1 2 3 6 7 8 9]
[4 5]
[4 5]
Fold:4
[0 1 2 3 4 5 8 9]
[0 1 2 3 4 5 8 9]
[6 7]
[6 7]
Fold:5
[0 1 2 3 4 5 6 7]
[0 1 2 3 4 5 6 7]
[8 9]
[8 9]
As stated in the documentation:
random_state : int, RandomState instance or None, optional, default=None
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when shuffle == True.
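random_state is only used when shuffle is True, and shuffle defaults to False, so both of your splitters produce the same deterministic folds. Simply add shuffle=True to your StratifiedKFold calls (recent scikit-learn versions raise a ValueError if you set random_state without it). For example: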
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
skf1 = StratifiedKFold(n_splits=5, shuffle=True, random_state=100)
Output:
Fold:1
[0 1 3 4 5 6 7 9]
[0 1 2 3 4 5 8 9]
[2 8]
[6 7]
Fold:2
[0 1 2 3 5 6 7 8]
[0 2 3 4 6 7 8 9]
[4 9]
[1 5]
Fold:3
[0 2 3 4 5 7 8 9]
[0 1 3 5 6 7 8 9]
[1 6]
[2 4]
Fold:4
[0 1 2 4 5 6 8 9]
[1 2 4 5 6 7 8 9]
[3 7]
[0 3]
Fold:5
[1 2 3 4 6 7 8 9]
[0 1 2 3 4 5 6 7]
[0 5]
[8 9]

Three-dimensional array processing

I want to turn
arr = np.array([[[1,2,3],[4,5,6],[7,8,9],[10,11,12]], [[2,2,2],[4,5,6],[7,8,9],[10,11,12]], [[3,3,3],[4,5,6],[7,8,9],[10,11,12]]])
into
arr = np.array([[[1,2,3],[7,8,9],[10,11,12]], [[2,2,2],[7,8,9],[10,11,12]], [[3,3,3],[7,8,9],[10,11,12]]])
Below is the code:
a = 0
b = 0
NewArr = []
while a < 3:
    c = arr[a, :, :]
    d = arr[a]
    print(d)
    if c[1, 2] == 6:
        c = np.delete(c, [1], axis=0)
    a += 1
    b += 1
    c = np.concatenate((d, c), axis=1)
    print(c)
But after deleting the row containing the number 6, I cannot stitch the array back together. Can someone help me?
Thank you very much for your help.
If you want a more automatic way of processing your input data, here is an answer using numpy functions:
arr[np.newaxis,~np.any(arr==6,axis=2)].reshape((3,-1,3))
np.any(arr==6,axis=2) returns a boolean array of shape (3, 4) that is True for every row containing the value 6. We invert it with ~ because those are the rows we want to drop, then use the result as a boolean mask on arr. The mask covers the first two axes, so the selection flattens them into a single list of kept rows; the np.newaxis only adds a leading axis of length 1, and the expression gives the same result without it.
Finally, the output is reshaped into a (3, x, 3) array, where x depends on how many rows were removed (hence the -1 in reshape). This only works if the same number of rows is removed from each of the 3 blocks.
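To make the shapes at each step visible, here is the same idea split into named steps. It is a sketch using the array from the question; the intermediate names (contains_six, keep, kept_rows) are made up for the example, and the np.newaxis is left out since the reshape produces the same result without it.

```python
import numpy as np

arr = np.array([
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    [[2, 2, 2], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    [[3, 3, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
])

contains_six = np.any(arr == 6, axis=2)   # shape (3, 4): True where a row holds a 6
keep = ~contains_six                      # rows to keep
kept_rows = arr[keep]                     # shape (9, 3): mask flattens the first two axes
result = kept_rows.reshape(arr.shape[0], -1, arr.shape[2])

print(keep.shape, kept_rows.shape, result.shape)   # (3, 4) (9, 3) (3, 3, 3)
print(result)
```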
Based on the input / output you provide, a simpler solution would be to just use index selection and slices:
import numpy as np
arr = np.array([[[1,2,3],[4,5,6],[7,8,9],[10,11,12]], [[2,2,2],[4,5,6],[7,8,9],[10,11,12]], [[3,3,3],[4,5,6],[7,8,9],[10,11,12]]])
print("arr=")
print(arr)
expected_result = np.array([[[1,2,3],[7,8,9],[10,11,12]], [[2,2,2],[7,8,9],[10,11,12]], [[3,3,3],[7,8,9],[10,11,12]]])
# select indices 0, 2 and 3 along axis 1 (the second dimension)
a = np.copy(arr[:,[0,2,3],:])
print("a=")
print(a)
print(np.array_equal(a, expected_result))
Output:
arr=
[[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
[[ 2 2 2]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
[[ 3 3 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]]
a=
[[[ 1 2 3]
[ 7 8 9]
[10 11 12]]
[[ 2 2 2]
[ 7 8 9]
[10 11 12]]
[[ 3 3 3]
[ 7 8 9]
[10 11 12]]]
True
