Is there a function that shuffles a certain portion of both rows and columns of a numpy matrix in Python?

I have a 2D matrix. I want to shuffle the last few columns and the rows associated with those columns.
I tried using np.random.shuffle, but it only reorders whole columns, so every row ends up permuted in the same way.
def randomize_the_data(original_matrix, reordering_sz):
    new_matrix = np.transpose(original_matrix)
    np.random.shuffle(new_matrix[reordering_sz:])
    shuffled_matrix = np.transpose(new_matrix)
    print(shuffled_matrix)
a = np.arange(20).reshape(4, 5)
print(a)
print()
randomize_the_data(a, 2)
My original matrix is this:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
I am getting this.
[[ 0 1 3 4 2]
[ 5 6 8 9 7]
[10 11 13 14 12]
[15 16 18 19 17]]
But I want something like this.
[[ 0 1 3 2 4]
[ 5 6 7 8 9]
[10 11 14 12 13]
[15 16 17 18 19]]
Another example would be:
Original =
-1.3702 0.3341 -1.2926 -1.4690 -0.0843
0.0170 0.0332 -0.1189 -0.0234 -0.0398
-0.1755 0.2182 -0.0563 -0.1633 0.1081
-0.0423 -0.0611 -0.8568 0.0184 -0.8866
Randomized =
-1.3702 0.3341 -0.0843 -1.2926 -1.4690
0.0170 0.0332 -0.0398 -0.0234 -0.1189
-0.1755 0.2182 -0.0563 0.1081 -0.1633
-0.0423 -0.0611 0.0184 -0.8866 -0.8568

To shuffle the last elements of each row, go through the rows independently and shuffle the last few numbers of each one. Because the shuffle is done per row, each row is shuffled differently, unlike before, where every row was shuffled the same way.
import numpy as np

def randomize_the_data(original_matrix, reordering_sz):
    for ln in original_matrix:
        np.random.shuffle(ln[reordering_sz:])
    print(original_matrix)
a = np.arange(20).reshape(4, 5)
print(a)
print()
randomize_the_data(a, 2)
OUTPUT:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
[[ 0 1 4 2 3]
[ 5 6 8 7 9]
[10 11 13 14 12]
[15 16 17 18 19]]
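If you would rather avoid the explicit Python loop, one possible vectorized variant (a sketch, not part of the original answer) is to draw random sort keys for the tail of every row and reorder with np.take_along_axis:
import numpy as np

def randomize_the_data_vectorized(original_matrix, reordering_sz):
    # One random key per element in the tail columns; argsorting the keys row by
    # row gives each row its own independent permutation.
    tail = original_matrix[:, reordering_sz:]
    order = np.argsort(np.random.rand(*tail.shape), axis=1)
    original_matrix[:, reordering_sz:] = np.take_along_axis(tail, order, axis=1)

a = np.arange(20).reshape(4, 5)
randomize_the_data_vectorized(a, 2)
print(a)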

Related

How to show ranges of repeated values in a column in Python Pandas?

Does anyone know how to find ranges of repeated categorical values in a column?
I mean, it's something like this:
[Floor] [Height]
1 A 10
2 A 11
3 A 12
4 B 13
5 B 14
6 C 15
7 C 16
8 A 17
9 A 18
10 C 19
11 C 20
12 B 21
13 B 22
14 B 23
What I'm trying to achieve is to determine the Height ranges for each Floor, as shown below:
Floor Height
A [10 - 12]
B [13 - 14]
C [15 - 16]
A [17 - 18]
C [19 - 20]
B [21 - 23]
I was trying with pandas.cut() but I can't find a way to set the intervals for repeated values.
Another way
(df.update((df.astype(str)).groupby((df.Floor!=df.Floor.shift())\
.cumsum())["Height"].transform(lambda x: x.iloc[0]+'-'+x.iloc[-1])))
df=df.drop_duplicates()
print(df)
Floor Height
1 A 10-12
4 B 13-14
6 C 15-16
8 A 17-18
10 C 19-20
12 B 21-23
How it works
(df.Floor!=df.Floor.shift())  # gives a boolean selection where Floor is not equal to the immediately preceding value
1 True
2 False
3 False
4 True
5 False
6 True
7 False
8 True
9 False
10 True
11 False
12 True
13 False
14 False
(df.Floor!=df.Floor.shift()).cumsum()  # gives a new grouping key by cumulatively summing the booleans. Remember True is 1 and False is 0, hence the counter increases by 1 at every change
1 1
2 1
3 1
4 2
5 2
6 3
7 3
8 4
9 4
10 5
11 5
12 6
13 6
14 6
(df.astype(str)).groupby((df.Floor!=df.Floor.shift()).cumsum())  # Instead of using Floor to classify, I group by the key derived above. Notice I force the df to string dtype because I want to concatenate the heights, which cannot happen unless they are strings
(df.astype(str)).groupby((df.Floor!=df.Floor.shift())\
.cumsum())["Height"].transform(lambda x: x.iloc[0]+'-'+x.iloc[-1])  # I use a lambda in transform to concatenate the heights; strings are concatenated with +, and the '-' between the two heights is just string + '-' + string
1 10-12
2 10-12
3 10-12
4 13-14
5 13-14
6 15-16
7 15-16
8 17-18
9 17-18
10 19-20
11 19-20
12 21-23
13 21-23
14 21-23
# You notice transform broadcasts the value to each row, hence I have to drop duplicates later.
# Before dropping duplicates, I have to write the new frame above back into the original.
df.update(newframe above)  # overwrites Height with the concatenated heights
df = df.drop_duplicates()  # then drop the duplicate rows
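For reference, a minimal self-contained sketch of the whole pipeline; the sample frame below is reconstructed from the question, so its exact layout is an assumption:
import pandas as pd

# Reconstructed sample data from the question (assumed layout)
df = pd.DataFrame({'Floor': list('AAABBCCAACCBBB'),
                   'Height': range(10, 24)},
                  index=range(1, 15))
# Overwrite Height with the 'first-last' string of each consecutive block of Floor
df.update(df.astype(str)
            .groupby((df.Floor != df.Floor.shift()).cumsum())['Height']
            .transform(lambda x: x.iloc[0] + '-' + x.iloc[-1]))
df = df.drop_duplicates()
print(df)  # reproduces the output shown above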
Try:
(df.groupby(['Floor',
             (df['Floor'] != df['Floor'].shift()).cumsum().rename('index')])['Height']
   .agg(lambda x: f'{x.min()} - {x.max()}')
   .reset_index(level=0)
   .sort_index())
Output:
Floor Height
index
1 A 10 - 12
2 B 13 - 14
3 C 15 - 16
4 A 17 - 18
5 C 19 - 20
6 B 21 - 23
For those interested, based on @Scott Boston's and @wwnde's answers.
Just in case you need the ranges collected in the same row, add the following to either of the answers above:
df = df[['Floor','Height']]
pd.DataFrame(df.groupby('Floor')['Height'].unique())
The output will be:
Floor Height
A [10 - 12, 17 - 18]
B [13 - 14, 21 - 23]
C [15 - 16, 19 - 20]
Thank you for your help, and special thanks to @wwnde for that nice explanation.
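For completeness, a minimal end-to-end sketch of this combined approach; the sample frame is again reconstructed from the question, so its exact layout is an assumption:
import pandas as pd

df = pd.DataFrame({'Floor': list('AAABBCCAACCBBB'),
                   'Height': range(10, 24)})
blocks = (df['Floor'] != df['Floor'].shift()).cumsum()
# Per-block ranges, as in the groupby/agg answer above
ranges = (df.groupby(['Floor', blocks.rename('index')])['Height']
            .agg(lambda x: f'{x.min()} - {x.max()}')
            .reset_index(level=0)
            .sort_index())
# Collect every block range of each floor into a single row
print(pd.DataFrame(ranges.groupby('Floor')['Height'].unique()))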

How to generate an nd-array where values are greater than 1? [duplicate]

This question already has answers here:
Generate random array of floats between a range
(10 answers)
Closed 2 years ago.
Is it possible to generate random numbers in an nd-array such that the elements of the array are between 1 and 2 (the interval should be between 1 and some number greater than 1)? This is what I did.
input_array = np.random.rand(3,10,10)
But the values in the nd-array are between 0 and 1.
Please let me know if that is possible. Any help and suggestions will be highly appreciated.
You can try scaling:
min_val, max_val = 1, 2
input_array = np.random.rand(3,10,10) * (max_val-min_val) + min_val
or use uniform:
input_array = np.random.uniform(min_val, max_val, (3,10,10))
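As a side note, with NumPy's newer Generator API (NumPy >= 1.17) the same draw would look roughly like this:
import numpy as np

rng = np.random.default_rng()
input_array = rng.uniform(1, 2, size=(3, 10, 10))  # floats in the half-open interval [1, 2)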
You can use np.random.randint() to generate an nd-array of random integers:
import numpy as np
rand_arr=np.random.randint(low = 1, high = 10, size = (10,10))
print(rand_arr)
# It changes randomly
#Output:
[[6 9 3 4 9 2 6 2 9 7]
[7 1 7 1 6 2 4 1 8 6]
[9 5 8 3 5 9 9 7 8 4]
[7 3 6 9 9 4 7 2 8 5]
[7 7 7 4 6 6 6 7 2 5]
[3 3 8 5 8 3 4 5 4 3]
[7 8 9 3 5 8 3 5 7 9]
[3 9 7 1 3 6 3 1 4 6]
[2 9 3 9 3 6 8 2 4 8]
[6 3 9 4 9 5 5 6 3 7]]

pd.Series(pred).value_counts(): how to get the first column in a dataframe?

I apply pd.Series(pred).value_counts() and get this output:
0 2084
-1 15
1 13
3 10
4 7
6 4
11 3
8 3
2 3
9 2
7 2
5 2
10 2
dtype: int64
When I create a list, I get only the second column:
c_list = list(pd.Series(pred).value_counts())
Out:
[2084, 15, 13, 10, 7, 4, 3, 3, 3, 2, 2, 2, 2]
How do I ultimately get a dataframe that looks like this, including a new column for size as a % of total size?
df =
 class  size  relative_size
     0  2084  x%
    -1    15  y%
     1    13  etc.
     3    10
     4     7
     6     4
    11     3
     8     3
     2     3
     9     2
     7     2
     5     2
    10     2
You are very nearly there. Typing this blind, as you didn't provide a sample input:
df = pd.Series(pred).value_counts().to_frame().reset_index()
df.columns = ['class', 'size']
df['relative_size'] = df['size'] / df['size'].sum()
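A quick self-test of the snippet above with a made-up pred (the values are purely illustrative); multiply relative_size by 100 if you want it as a percentage:
import pandas as pd

pred = [0, 0, 0, 0, 1, 1, -1]  # dummy predictions, just for illustration
df = pd.Series(pred).value_counts().to_frame().reset_index()
df.columns = ['class', 'size']
df['relative_size'] = df['size'] / df['size'].sum()
print(df)
#    class  size  relative_size
# 0      0     4       0.571429
# 1      1     2       0.285714
# 2     -1     1       0.142857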

Concatenate 3D Numpy Array

Is there a way to achieve the same output as the following code without using a for loop?
A:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
Output:
[[ 0 1 2 3 4 5 6 7 8 9 10 11]
[12 13 14 15 16 17 18 19 20 21 22 23]]
Code:
A = np.arange(24).reshape(2, 3, 4)
v = []
for i in range(A.shape[0]):
    v.append(np.concatenate(A[i]))
Just do another reshape!
v = A.reshape(A.shape[0], -1)
It might be helpful to use the Python REPL to experiment with this in the future.
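A quick sanity check, assuming A is defined as in the question, that the reshape reproduces the loop result:
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
v = [np.concatenate(A[i]) for i in range(A.shape[0])]
print(np.array_equal(np.array(v), A.reshape(A.shape[0], -1)))  # True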

Compare two matrices and create a matrix of their common values [duplicate]

This question already has an answer here:
Numpy intersect1d with array with matrix as elements
(1 answer)
Closed 5 years ago.
I'm currently trying to compare two matrices and return the matching rows in an "intersection matrix" via Python. Both matrices contain numerical data, and I'm trying to return the rows of their common entries (I have also tried just creating a matrix of matching positional entries along the first column and then creating an accompanying tuple). These matrices are not necessarily the same in dimensionality.
Let's say I have two matrices with the same number of columns but arbitrary (possibly very large and different) numbers of rows:
23 3 4 5 23 3 4 5
12 6 7 8 45 7 8 9
45 7 8 9 34 5 6 7
67 4 5 6 3 5 6 7
I'd like to create a matrix with the "intersection" being for this low dimensional example
23 3 4 5
45 7 8 9
perhaps it looks like this though:
1 2 3 4 2 4 6 7
2 4 6 7 4 10 6 9
4 6 7 8 5 6 7 8
5 6 7 8
in which case we only want:
2 4 6 7
5 6 7 8
I've tried things of this nature:
def compare(x):
    # This is a matrix I created with another function - purely numerical data of arbitrary size with fixed column length D
    y = n_c(data_cleaner(x))
    # This is a second matrix that I'd like to compare it to. Note that the sizes are probably not the same, but the column lengths are
    z = data_cleaner(x)
    # I initialized an array that would hold the matching values
    compare = []
    # Nested for loop that checks a single index in one matrix against all entries in the second matrix on each iteration
    for i in range(len(y)):
        for j in range(len(z)):
            if y[0][i] == z[0][i]:
                # I want the row or the n-tuple (shown here) of those columns with the matching first indexes as shown above
                c_vec = ([0][i], [15][i], [24][i], [0][25], [0][26])
                compare.append(c_vec)
            else:
                pass
    return compare

compare(c_i_w)
Sadly, I'm running into some errors. Specifically, it seems that I'm telling Python to reference values improperly.
Consider the arrays a and b
a = np.array([
[23, 3, 4, 5],
[12, 6, 7, 8],
[45, 7, 8, 9],
[67, 4, 5, 6]
])
b = np.array([
[23, 3, 4, 5],
[45, 7, 8, 9],
[34, 5, 6, 7],
[ 3, 5, 6, 7]
])
print(a)
[[23 3 4 5]
[12 6 7 8]
[45 7 8 9]
[67 4 5 6]]
print(b)
[[23 3 4 5]
[45 7 8 9]
[34 5 6 7]
[ 3 5 6 7]]
Then we can broadcast and get an array of equal rows with
x = (a[:, None] == b).all(-1)
print(x)
[[ True False False False]
[False False False False]
[False True False False]
[False False False False]]
Using np.where we can identify the indices
i, j = np.where(x)
Show which rows of a
print(a[i])
[[23 3 4 5]
[45 7 8 9]]
And which rows of b
print(b[j])
[[23 3 4 5]
[45 7 8 9]]
They are the same! That's good. That's what we wanted.
We can put the results into a pandas DataFrame with a MultiIndex whose first level is the row number from a and whose second level is the row number from b.
pd.DataFrame(a[i], [i, j])
0 1 2 3
0 0 23 3 4 5
2 1 45 7 8 9
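If you only need the intersection matrix itself, the same boolean array can be collapsed with any along axis 1 (a compact restatement of the approach above):
mask = (a[:, None] == b).all(-1).any(1)   # rows of a that appear somewhere in b
print(a[mask])
# [[23  3  4  5]
#  [45  7  8  9]]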
