Concatenate 3D NumPy Array - python-3.x

Is there a way to achieve the same output as the following code without using a for loop?
A:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
Output:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]
Code:
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
v = []
for i in range(A.shape[0]):
    v.append(np.concatenate(A[i]))

Just do another reshape!
v = A.reshape(A.shape[0], -1)
It might be helpful to use the Python REPL to experiment with this in the future.
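A quick check confirming the loop and the reshape produce the same result:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)

# Loop version from the question
v = np.array([np.concatenate(A[i]) for i in range(A.shape[0])])

# Reshape version: flatten everything after the first axis
v2 = A.reshape(A.shape[0], -1)

assert np.array_equal(v, v2)
print(v2)
```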

Related

How to show ranges of repeated values in a column in Python Pandas?

Does anyone know how to find ranges of repeated categorical values in a column?
It looks something like this:
    [Floor]  [Height]
1         A        10
2         A        11
3         A        12
4         B        13
5         B        14
6         C        15
7         C        16
8         A        17
9         A        18
10        C        19
11        C        20
12        B        21
13        B        22
14        B        23
What I'm trying to achieve is to determine the Height ranges for each Floor, as shown below:
Floor Height
A [10 - 12]
B [13 - 14]
C [15 - 16]
A [17 - 18]
C [19 - 20]
B [21 - 23]
I was trying with pandas.cut() but I can't find the way to set the intervals for repeated values.
Another way
(df.update((df.astype(str)).groupby((df.Floor != df.Floor.shift())
    .cumsum())["Height"].transform(lambda x: x.iloc[0] + '-' + x.iloc[-1])))
df = df.drop_duplicates()
print(df)
Floor Height
1 A 10-12
4 B 13-14
6 C 15-16
8 A 17-18
10 C 19-20
12 B 21-23
How it works
(df.Floor != df.Floor.shift())  # gives a boolean Series that is True wherever Floor differs from the immediately preceding row
1 True
2 False
3 False
4 True
5 False
6 True
7 False
8 True
9 False
10 True
11 False
12 True
13 False
14 False
(df.Floor != df.Floor.shift()).cumsum()  # forms group labels by cumulatively summing the booleans. Remember True is 1 and False is 0, so each change in Floor starts a new group
1 1
2 1
3 1
4 2
5 2
6 3
7 3
8 4
9 4
10 5
11 5
12 6
13 6
14 6
(df.astype(str)).groupby((df.Floor != df.Floor.shift()).cumsum())  # instead of classifying by Floor, group by the run labels derived above. Notice the frame is cast to string: the heights are concatenated below, and that cannot happen unless they are strings
(df.astype(str)).groupby((df.Floor != df.Floor.shift())
    .cumsum())["Height"].transform(lambda x: x.iloc[0] + '-' + x.iloc[-1])  # transform with a lambda concatenates the heights. Strings are joined with +, and '-' is inserted between the first and last height of each run
1 10-12
2 10-12
3 10-12
4 13-14
5 13-14
6 15-16
7 15-16
8 17-18
9 17-18
10 19-20
11 19-20
12 21-23
13 21-23
14 21-23
# Notice transform broadcasts the result to every row, hence the duplicates have to be dropped later.
# Before dropping duplicates, the new frame above has to be written back into the original.
df.update(new frame above)  # overwrites Height with the concatenated heights
df = df.drop_duplicates()  # then remove the now-duplicate rows
Try:
(df.groupby(['Floor',
             (df['Floor'] != df['Floor'].shift()).cumsum().rename('index')])['Height']
   .agg(lambda x: f'{x.min()} - {x.max()}').reset_index(level=0).sort_index())
Output:
Floor Height
index
1 A 10 - 12
2 B 13 - 14
3 C 15 - 16
4 A 17 - 18
5 C 19 - 20
6 B 21 - 23
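Combining the pieces into a self-contained sketch (data reconstructed from the question; the run labels are renamed here to avoid clashing with the Floor column):

```python
import pandas as pd

df = pd.DataFrame({
    'Floor':  list('AAABBCCAACCBBB'),
    'Height': range(10, 24),
})

# A new run starts wherever Floor differs from the previous row
runs = (df['Floor'] != df['Floor'].shift()).cumsum().rename('run')

out = (df.groupby(['Floor', runs], sort=False)['Height']
         .agg(lambda x: f'{x.min()} - {x.max()}')
         .reset_index(level=0))
print(out)
```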
For those interested, based on @Scott Boston's and @wwnde's answers:
in case you need the ranges in the same row, add the following to either solution:
df = df[['Floor', 'Height']]
pd.DataFrame(df.groupby('Floor')['Height'].unique())
The output will be:
Floor Height
A [10 - 12, 17 - 18]
B [13 - 14, 21 - 23]
C [15 - 16, 19 - 20]
Thank you for your help, and special thanks to @wwnde for that nice explanation.

How to generate an nd-array where values are greater than 1? [duplicate]

This question already has answers here:
Generate random array of floats between a range
(10 answers)
Closed 2 years ago.
Is it possible to generate random numbers in an nd-array such that the elements are between 1 and 2 (the interval should be between 1 and some number greater than 1)? This is what I did:
input_array = np.random.rand(3,10,10)
But the values in the nd-array are between 0 and 1.
Please let me know if that is possible. Any help and suggestions will be highly appreciated.
You can try scaling:
min_val, max_val = 1, 2
input_array = np.random.rand(3, 10, 10) * (max_val - min_val) + min_val
or use uniform:
input_array = np.random.uniform(min_val, max_val, (3,10,10))
You can use np.random.randint() to generate an nd-array with random integers:
import numpy as np
rand_arr = np.random.randint(low=1, high=10, size=(10, 10))
print(rand_arr)
# It changes randomly
#Output:
[[6 9 3 4 9 2 6 2 9 7]
[7 1 7 1 6 2 4 1 8 6]
[9 5 8 3 5 9 9 7 8 4]
[7 3 6 9 9 4 7 2 8 5]
[7 7 7 4 6 6 6 7 2 5]
[3 3 8 5 8 3 4 5 4 3]
[7 8 9 3 5 8 3 5 7 9]
[3 9 7 1 3 6 3 1 4 6]
[2 9 3 9 3 6 8 2 4 8]
[6 3 9 4 9 5 5 6 3 7]]
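Both answers use the legacy np.random functions; newer NumPy code generally goes through the Generator API instead. A sketch for the original request, floats in [1, 2):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded here only for reproducibility
arr = rng.uniform(1.0, 2.0, size=(3, 10, 10))

print(arr.min(), arr.max())  # every value lies in [1.0, 2.0)
```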

pd.Series(pred).value_counts() how to get the first column in dataframe?

I apply pd.Series(pred).value_counts() and get this output:
0 2084
-1 15
1 13
3 10
4 7
6 4
11 3
8 3
2 3
9 2
7 2
5 2
10 2
dtype: int64
When I create a list I get only the second column:
c_list = list(pd.Series(pred).value_counts())
Out:
[2084, 15, 13, 10, 7, 4, 3, 3, 3, 2, 2, 2, 2]
How do I get ultimately a dataframe that looks like this including a new column for size% of total size?
df=
class   size   relative_size
0       2084   x%
-1        15   y%
1         13   etc.
3 10
4 7
6 4
11 3
8 3
2 3
9 2
7 2
5 2
10 2
You are very nearly there. Typing this blind, as you didn't provide a sample input:
df = pd.Series(pred).value_counts().to_frame().reset_index()
df.columns = ['class', 'size']
df['relative_size'] = df['size'] / df['size'].sum()  # multiply by 100 for a percentage
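A self-contained version of the same idea (pred here is made-up sample data; rename_axis plus reset_index(name=...) replaces the separate column-renaming step):

```python
import pandas as pd

pred = [0, 0, 0, 1, 1, -1]  # hypothetical predictions
counts = pd.Series(pred).value_counts()

df = counts.rename_axis('class').reset_index(name='size')
df['relative_size'] = df['size'] / df['size'].sum() * 100  # percent of total
print(df)
```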

Is there a function that shuffles certain portion of both row and column of a numpy matrix in python?

I have a 2D matrix. I want to shuffle last few columns and rows associated with those columns.
I tried using np.random.shuffle but it only shuffles the column, not rows.
def randomize_the_data(original_matrix, reordering_sz):
    new_matrix = np.transpose(original_matrix)
    np.random.shuffle(new_matrix[reordering_sz:])
    shuffled_matrix = np.transpose(new_matrix)
    print(shuffled_matrix)

a = np.arange(20).reshape(4, 5)
print(a)
print()
randomize_the_data(a, 2)
My original matrix is this:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
I am getting this.
[[ 0 1 3 4 2]
[ 5 6 8 9 7]
[10 11 13 14 12]
[15 16 18 19 17]]
But I want something like this.
[[ 0 1 3 2 4]
[ 5 6 7 8 9]
[10 11 14 12 13]
[15 16 17 18 19]]
Another example would be:
Original =
-1.3702 0.3341 -1.2926 -1.4690 -0.0843
0.0170 0.0332 -0.1189 -0.0234 -0.0398
-0.1755 0.2182 -0.0563 -0.1633 0.1081
-0.0423 -0.0611 -0.8568 0.0184 -0.8866
Randomized =
-1.3702 0.3341 -0.0843 -1.2926 -1.4690
0.0170 0.0332 -0.0398 -0.0234 -0.1189
-0.1755 0.2182 -0.0563 0.1081 -0.1633
-0.0423 -0.0611 0.0184 -0.8866 -0.8568
To shuffle the last elements of each row independently, iterate over the rows and shuffle each row's tail separately. That way every row is shuffled differently, unlike your version, where each row was shuffled the same way.
import numpy as np

def randomize_the_data(original_matrix, reordering_sz):
    for ln in original_matrix:
        np.random.shuffle(ln[reordering_sz:])
    print(original_matrix)

a = np.arange(20).reshape(4, 5)
print(a)
print()
randomize_the_data(a, 2)
OUTPUT:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
[[ 0 1 4 2 3]
[ 5 6 8 7 9]
[10 11 13 14 12]
[15 16 17 18 19]]
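The explicit loop can also be avoided: rng.permuted (Generator API, NumPy >= 1.20) draws an independent permutation per row, and np.take_along_axis applies it. A sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
a = np.arange(20).reshape(4, 5)
k = 2  # keep the first k columns of every row fixed

# One independent permutation of the tail column indices per row
tail = np.tile(np.arange(k, a.shape[1]), (a.shape[0], 1))
perm = rng.permuted(tail, axis=1)

# Reorder only the tail of each row
a[:, k:] = np.take_along_axis(a[:, k:], perm - k, axis=1)
print(a)
```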

Expanding/Duplicating dataframe rows based on condition

I am an R user who has recently started using Python 3 for data management. I am struggling with a way to expand/duplicate data frame rows based on a condition. I also need to be able to expand rows in a variable way. I'll illustrate with this example.
I have this data:
df = pd.DataFrame([[1, 10], [1,15], [2,10], [2, 15], [2, 20], [3, 10], [3, 15]], columns = ['id', 'var'])
df
Out[6]:
id var
0 1 10
1 1 15
2 2 10
3 2 15
4 2 20
5 3 10
6 3 15
I would like to expand rows for both ID == 1 and ID == 3. I would also like to expand each ID == 1 row by 1 duplicate each, and I would like to expand each ID == 3 row by 2 duplicates each. The result would look like this:
df2
Out[8]:
id var
0 1 10
1 1 10
2 1 15
3 1 15
4 2 10
5 2 15
6 2 20
7 3 10
8 3 10
9 3 10
10 3 15
11 3 15
12 3 15
13 3 15
I've been trying to use np.repeat, but I am failing to think of a way that I can use both ID condition and variable duplication numbers at the same time. Index ordering does not matter here, only that the rows are duplicated appropriately. I apologize in advance if this is an easy question. Thanks in advance for any help and feel free to ask clarifying questions.
This should do it:
dup = {1: 1, 3: 2}  # which id to duplicate and how many copies to add
res = df.copy()
for k, v in dup.items():
    for i in range(v):
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        res = pd.concat([res, df.loc[df['id'] == k]], ignore_index=True)
res.sort_values(['id', 'var'], inplace=True)
res.reset_index(inplace=True, drop=True)
res
# id var
#0 1 10
#1 1 10
#2 1 15
#3 1 15
#4 2 10
#5 2 15
#6 2 20
#7 3 10
#8 3 10
#9 3 10
#10 3 15
#11 3 15
#12 3 15
P.S. your desired solution had 7 values for id 3 while your description implies 6 values.
I think the code below gets your job done:
df_1 = df.loc[df.id == 1]
df_3 = df.loc[df.id == 3]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df1 = pd.concat([df] + [df_1] * 1, ignore_index=True)
pd.concat([df1] + [df_3] * 2, ignore_index=True).sort_values(by='id')
id var
0 1 10
1 1 15
7 1 10
8 1 15
2 2 10
3 2 15
4 2 20
5 3 10
6 3 15
9 3 10
10 3 15
11 3 10
12 3 15
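Since the question mentions np.repeat, here is a vectorized sketch along those lines: map each id to a per-row repeat count and repeat the index accordingly (the extra dict is the same duplication spec as above):

```python
import pandas as pd

df = pd.DataFrame([[1, 10], [1, 15], [2, 10], [2, 15], [2, 20], [3, 10], [3, 15]],
                  columns=['id', 'var'])

# 1 extra copy for id 1, 2 extra copies for id 3, none otherwise
extra = {1: 1, 3: 2}
counts = df['id'].map(extra).fillna(0).astype(int) + 1

# Repeat each row label by its count, then realign
df2 = df.loc[df.index.repeat(counts)].reset_index(drop=True)
print(df2)
```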
