Reconcile with np.fromiter and multidimensional arrays in Python - python-3.x

I am working on building a multi-dimensional array that produces the following result in a Jupyter notebook.
I have tried several pieces of code, but I cannot seem to produce the fourth column with the number range 30 - 35. The closest I have come is with this code:
import numpy as np
from itertools import chain
def fun(i):
    return tuple(4*i + j for j in range(4))
a = np.fromiter(chain.from_iterable(fun(i) for i in range(6)), 'i', 6 * 4)
a.shape = 6, 4
print(repr(a))
I am expecting the following results:
array([[ 1, 2, 3, 30],
[ 4, 5, 6, 31],
[ 7, 8, 9, 32],
[10, 11, 12, 33],
[13, 14, 15, 34],
[20, 21, 22, 35]])

You can create a flat array with all your consecutive numbers like this:
import numpy as np
a = np.arange(1, 16)
print(a)
# output:
[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Then you reshape it:
a = np.reshape(a, (5, 3))
print(a)
# output
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]
[13 14 15]]
Then you add a new row:
a = np.vstack([a, np.arange(20, 23)])
print(a)
# output:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]
[13 14 15]
[20 21 22]]
You create the column to add:
col = np.arange(30, 36).reshape(-1, 1)
print(col)
# output:
[[30]
[31]
[32]
[33]
[34]
[35]]
You add it:
a = np.concatenate((a, col), axis=1)
print(a)
# output:
[[ 1 2 3 30]
[ 4 5 6 31]
[ 7 8 9 32]
[10 11 12 33]
[13 14 15 34]
[20 21 22 35]]
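If you prefer a single expression, the same array can also be assembled in one go with np.column_stack (a minimal sketch, equivalent to the steps above):
import numpy as np

# first three columns: 1-15 as a 5x3 block, plus the extra row 20-22
left = np.vstack([np.arange(1, 16).reshape(5, 3), np.arange(20, 23)])

# append the fourth column 30-35
result = np.column_stack([left, np.arange(30, 36)])
print(result)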

Related

Python3, True and False of element in ndarray

I saw this question on a forum.
import numpy as np
a = np.arange(16).reshape(4,4)
print(a)
print('-'*20)
print(a[[True,True,False,False]])
print('-'*20)
print(a[:,[True,True,False,False]])
print('-'*20)
print(a[[True,True,False,False],[True,True,False,False]])
the result is
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
--------------------
[[0 1 2 3]
[4 5 6 7]]
--------------------
[[ 0 1]
[ 4 5]
[ 8 9]
[12 13]]
--------------------
[0 5]
He asked why the result of the line "print(a[[True,True,False,False],[True,True,False,False]])" wasn't
[
[0,1],
[4,5]
]
I thought about it and couldn't come up with an explanation either.
No one has answered him yet, so I thought I would come here for help.
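As far as I understand (this is my own sketch, not an answer from that forum), NumPy treats each boolean array in a multi-array index as if it were replaced by the result of np.nonzero, and the resulting integer arrays are then paired element-wise. So the last expression picks the elements at positions (0, 0) and (1, 1):
import numpy as np

a = np.arange(16).reshape(4, 4)
mask = [True, True, False, False]

# a boolean index behaves like the integer indices returned by np.nonzero
rows = np.nonzero(mask)[0]   # array([0, 1])
cols = np.nonzero(mask)[0]   # array([0, 1])

# the two integer arrays are paired element-wise: (0, 0) and (1, 1)
print(a[rows, cols])         # [0 5], same as a[mask, mask]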

How to aggregate n previous rows as list in Pandas DataFrame?

As the title says:
a = pd.DataFrame([1,2,3,4,5,6,7,8,9,10])
Having a dataframe with 10 values, we want to aggregate, say, the last 5 rows and put them as a list into a new column:
>>> a
    0       new_col
0   1
1   2
2   3
3   4
4   5    [1,2,3,4,5]
5   6    [2,3,4,5,6]
6   7    [3,4,5,6,7]
7   8    [4,5,6,7,8]
8   9    [5,6,7,8,9]
9  10    [6,7,8,9,10]
How?
Due to how rolling windows are implemented, you won't be able to aggregate the results as you expect, but we can still reach your desired result by iterating over each window and storing its values as a list:
>>> new_col_values = [
...     window.to_list() if len(window) == 5 else None
...     for window in df["column"].rolling(5)
... ]
>>> df["new_col"] = new_col_values
>>> df
column new_col
0 1 None
1 2 None
2 3 None
3 4 None
4 5 [1, 2, 3, 4, 5]
5 6 [2, 3, 4, 5, 6]
6 7 [3, 4, 5, 6, 7]
7 8 [4, 5, 6, 7, 8]
8 9 [5, 6, 7, 8, 9]
9 10 [6, 7, 8, 9, 10]
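If you are on NumPy 1.20 or newer, a similar result can also be obtained with sliding_window_view; this is just a sketch for the same 10-row frame, not part of the original answer:
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({"column": range(1, 11)})

# one length-5 window ends at each row from index 4 onward
windows = sliding_window_view(df["column"].to_numpy(), 5)
df["new_col"] = [None] * 4 + [list(w) for w in windows]
print(df)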

Creating a TXT file and seeking a position in Python

I have the following variables:
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
I want to create a .TXT file which would look like this with tab separated values:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
500 1000 -5 8 9 0
1000 1500 -6 7 11 19
1500 2000 1 4 5 12
2000 2500 -5 8 9 0
2500 3000 -6 7 11 19
3000 3500 1 4 5 12
3500 4000 -5 8 9 0
4000 4500 -6 7 11 19
I have written the following code:
#!/usr/bin/env python3
import os
from datetime import datetime
import time
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
filename = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{signal1}_results.TXT"
with open(filename, 'w') as f:
    # write the bin1 range
    f.write('\n\n\n')
    f.write('\t\t\t\t')
    f.write(signal1 + '>=')
    for bin in bins1[:-1]:
        f.write('\t' + str(bin))
    f.write('\n')
    f.write('\t\t\t\t')
    f.write(signal1 + '<')
    for bin in bins1[1:]:
        f.write('\t' + str(bin))
    f.write('\n')
    # write the bin2 range
    f.write('\t\t')
    f.write(signal2 + '>=' + '\t' + signal2 + '<' + '\n')
    f.write('\t\t')
    # store the cursor position from where hist result will be written line by line
    track_cursor_pos = []
    curr = bins2[0]
    for next in bins2[1:]:
        f.write(str(curr) + '\t' + str(next))
        track_cursor_pos.append(f.tell())
        f.write('\n\t\t')
        curr = next
    f.write('\n')
    print(track_cursor_pos)
    i = 0
    # Everything is fine until here
    # Code below doesn't work as expected!?
    for result in hist_result:
        f.seek(track_cursor_pos[i], os.SEEK_SET)
        for r in result:
            f.write('\t' + str(r))
        f.write('\n')
        i += 1
But this produces a TXT file whose contents look like this:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
0 -5 8 9 0
00 -6 7 11 19
1 4 5 12
00 -5 8 9 0
00 -6 7 11 19
1 4 5 12
00 -5 8 9 0
00 -6 7 11 19
I think I am not using f.seek() properly. Any suggestions would be appreciated. Thanks in advance.
You don't have to seek inside the file to print your data:
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
with open('data.txt', 'w') as f_out:
    print('\t{signal1}>=\t{bins}'.format(signal1=signal1, bins='\t'.join(map(str, bins1[:-1]))), file=f_out)
    print('\t{signal1}<\t{bins}'.format(signal1=signal1, bins='\t'.join(map(str, bins1[1:]))), file=f_out)
    print('{signal2}>=\t{signal2}<'.format(signal2=signal2), file=f_out)
    for a, b, data in zip(bins2[:-1], bins2[1:], hist_result):
        print(a, b, *data, sep='\t', file=f_out)
Creates data.txt:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
500 1000 -5 8 9 0
1000 1500 -6 7 11 19
1500 2000 1 4 5 12
2000 2500 -5 8 9 0
2500 3000 -6 7 11 19
3000 3500 1 4 5 12
3500 4000 -5 8 9 0
4000 4500 -6 7 11 19
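As a side note on the original attempt: seeking backwards in a file opened for writing and then writing again does not insert text, it overwrites the bytes already at that position, which is why fragments such as "00" appeared in the broken output. A tiny standalone demonstration (hypothetical file name, not from the original code):
with open('demo.txt', 'w') as f:
    f.write('0123456789')
    f.seek(2)        # move back inside the data already written
    f.write('XX')    # overwrites positions 2 and 3, does not insert

print(open('demo.txt').read())   # 01XX456789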

Three-dimensional array processing

I want to turn
arr = np.array([[[1,2,3],[4,5,6],[7,8,9],[10,11,12]], [[2,2,2],[4,5,6],[7,8,9],[10,11,12]], [[3,3,3],[4,5,6],[7,8,9],[10,11,12]]])
into
arr = np.array([[[1,2,3],[7,8,9],[10,11,12]], [[2,2,2],[7,8,9],[10,11,12]], [[3,3,3],[7,8,9],[10,11,12]]])
Below is the code:
a = 0
b = 0
NewArr = []
while a < 3:
    c = arr[a, :, :]
    d = arr[a]
    print(d)
    if c[1, 2] == 6:
        c = np.delete(c, [1], axis=0)
    a += 1
    b += 1
    c = np.concatenate((d, c), axis=1)
    print(c)
But after deleting the row containing the number 6, I cannot stitch the array back together. Can someone help me?
Thank you very much for your help.
If you want a more automatic way of processing your input data, here is an answer using numpy functions:
arr[np.newaxis,~np.any(arr==6,axis=2)].reshape((3,-1,3))
np.any(arr==6, axis=2) outputs an array that has True at rows which contain the value 6. We take the inverse of those booleans since we want to remove those rows. The result is then used as an index selection into arr, with a np.newaxis because the output of np.any has one axis less than the original array.
Finally, the output is reshaped into a (3, x, 3) array, where x depends on the number of rows that were removed (hence the -1 in reshape).
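To make that explanation concrete, here is roughly what the intermediate mask looks like for the arr defined in the question (a sketch of the steps, written without the np.newaxis for readability):
import numpy as np

arr = np.array([[[1,2,3],[4,5,6],[7,8,9],[10,11,12]],
                [[2,2,2],[4,5,6],[7,8,9],[10,11,12]],
                [[3,3,3],[4,5,6],[7,8,9],[10,11,12]]])

mask = np.any(arr == 6, axis=2)   # True for rows that contain a 6
print(mask)
# [[False  True False False]
#  [False  True False False]
#  [False  True False False]]

result = arr[~mask].reshape(3, -1, 3)   # keep the other rows, restore the 3D shape
print(result.shape)                     # (3, 3, 3)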
Based on the input / output you provide, a simpler solution would be to just use index selection and slices:
import numpy as np
arr = np.array([[[1,2,3],[4,5,6],[7,8,9],[10,11,12]], [[2,2,2],[4,5,6],[7,8,9],[10,11,12]], [[3,3,3],[4,5,6],[7,8,9],[10,11,12]]])
print("arr=")
print(arr)
expected_result = np.array([[[1,2,3],[7,8,9],[10,11,12]], [[2,2,2],[7,8,9],[10,11,12]], [[3,3,3],[7,8,9],[10,11,12]]])
# select indices 0, 2 and 3 along axis 1 (the second dimension)
a = np.copy(arr[:,[0,2,3],:])
print("a=")
print(a)
print(np.array_equal(a, expected_result))
Output:
arr=
[[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
[[ 2 2 2]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
[[ 3 3 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]]
a=
[[[ 1 2 3]
[ 7 8 9]
[10 11 12]]
[[ 2 2 2]
[ 7 8 9]
[10 11 12]]
[[ 3 3 3]
[ 7 8 9]
[10 11 12]]]
True
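Since the row to drop sits at the same index in every block, np.delete along axis 1 would be another concise option (just a suggestion, equivalent to the slice-based selection above):
import numpy as np

arr = np.array([[[1,2,3],[4,5,6],[7,8,9],[10,11,12]],
                [[2,2,2],[4,5,6],[7,8,9],[10,11,12]],
                [[3,3,3],[4,5,6],[7,8,9],[10,11,12]]])

# remove index 1 along the second axis of every block
a = np.delete(arr, 1, axis=1)
print(a.shape)   # (3, 3, 3)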

How to shuffle data in python keeping some n number of rows intact

I want to shuffle my data in such a manner that each group of 4 rows remains intact. For example, if I have 16 rows, then the first 4 rows can go to the end, the second 4 rows may go third, and so on, in any particular order. I am trying to do this in Python.
Reshape to split the first axis into two, with the latter of length equal to the group length (4), giving us a 3D array, and then use np.random.shuffle, which shuffles along the first axis. Since the reshaped version is a view into the original array, the results are assigned back into it directly. Being in-situ, this should be pretty efficient (both memory-wise and performance-wise).
Hence, the implementation would be as simple as this -
def array_shuffle(a, n=4):
    a3D = a.reshape(a.shape[0]//n, n, -1)  # a is the input array
    np.random.shuffle(a3D)
Another variant would be to generate a random permutation covering the length of the 3D array, index into it with that, and finally reshape back to 2D. This makes a copy, but it seems more performant than the in-situ edits shown in the previous method.
The implementation would be -
def array_permuted_indexing(a, n=4):
    m = a.shape[0]//n
    a3D = a.reshape(m, n, -1)
    return a3D[np.random.permutation(m)].reshape(-1, a3D.shape[-1])
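For completeness, a quick usage sketch of this second function (it relies on array_permuted_indexing as defined above; the seed is arbitrary):
import numpy as np

np.random.seed(0)
a = np.random.randint(11, 99, (16, 3))

shuffled = array_permuted_indexing(a, n=4)   # groups of 4 rows stay together
print(shuffled.shape)                        # (16, 3)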
Step-by-step run of the shuffling method -
1] Set up a random input array and split it into a 3D version:
In [2]: np.random.seed(0)
In [3]: a = np.random.randint(11,99,(16,3))
In [4]: a3D = a.reshape(a.shape[0]//4,4,-1)
In [5]: a
Out[5]:
array([[55, 58, 75],
[78, 78, 20],
[94, 32, 47],
[98, 81, 23],
[69, 76, 50],
[98, 57, 92],
[48, 36, 88],
[83, 20, 31],
[91, 80, 90],
[58, 75, 93],
[60, 40, 30],
[30, 25, 50],
[43, 76, 20],
[68, 43, 42],
[85, 34, 46],
[86, 66, 39]])
2] Check the 3D array:
In [6]: a3D
Out[6]:
array([[[55, 58, 75],
[78, 78, 20],
[94, 32, 47],
[98, 81, 23]],
[[69, 76, 50],
[98, 57, 92],
[48, 36, 88],
[83, 20, 31]],
[[91, 80, 90],
[58, 75, 93],
[60, 40, 30],
[30, 25, 50]],
[[43, 76, 20],
[68, 43, 42],
[85, 34, 46],
[86, 66, 39]]])
3] Shuffle it along the first axis (in-situ):
In [7]: np.random.shuffle(a3D)
In [8]: a3D
Out[8]:
array([[[69, 76, 50],
[98, 57, 92],
[48, 36, 88],
[83, 20, 31]],
[[43, 76, 20],
[68, 43, 42],
[85, 34, 46],
[86, 66, 39]],
[[55, 58, 75],
[78, 78, 20],
[94, 32, 47],
[98, 81, 23]],
[[91, 80, 90],
[58, 75, 93],
[60, 40, 30],
[30, 25, 50]]])
4] Verify the changes back in the original array:
In [9]: a
Out[9]:
array([[69, 76, 50],
[98, 57, 92],
[48, 36, 88],
[83, 20, 31],
[43, 76, 20],
[68, 43, 42],
[85, 34, 46],
[86, 66, 39],
[55, 58, 75],
[78, 78, 20],
[94, 32, 47],
[98, 81, 23],
[91, 80, 90],
[58, 75, 93],
[60, 40, 30],
[30, 25, 50]])
Runtime test
In [102]: a = np.random.randint(11,99,(16000,3))
In [103]: df = pd.DataFrame(a)
# @piRSquared's soln1
In [106]: %timeit df.iloc[np.random.permutation(np.arange(df.shape[0]).reshape(-1, 4)).ravel()]
100 loops, best of 3: 2.88 ms per loop
# @piRSquared's soln2
In [107]: %%timeit
...: d = df.set_index(np.arange(len(df)) // 4, append=True).swaplevel(0, 1)
...: pd.concat([d.xs(i) for i in np.random.permutation(range(4))])
100 loops, best of 3: 3.48 ms per loop
# Array based soln-1
In [108]: %timeit array_shuffle(a, n=4)
100 loops, best of 3: 3.38 ms per loop
# Array based soln-2
In [109]: %timeit array_permuted_indexing(a, n=4)
10000 loops, best of 3: 125 µs per loop
Setup
Consider the dataframe df
df = pd.DataFrame(np.random.randint(10, size=(16, 4)), columns=list('WXYZ'))
df
W X Y Z
0 9 8 6 2
1 0 9 5 5
2 7 5 9 4
3 7 1 1 8
4 7 7 2 2
5 5 5 0 2
6 9 3 2 7
7 5 7 2 9
8 6 6 2 8
9 0 7 0 8
10 7 5 5 2
11 6 0 9 5
12 9 2 2 2
13 8 8 2 5
14 4 1 5 6
15 1 2 3 9
Option 1
Inspired by @B.M. and @Divakar
I'm using np.random.permutation because it returns a copy that is a permuted version of what was passed. This means I can then pass that directly to iloc and return what I need.
df.iloc[np.random.permutation(np.arange(16).reshape(-1, 4)).ravel()]
W X Y Z
12 9 2 2 2
13 8 8 2 5
14 4 1 5 6
15 1 2 3 9
0 9 8 6 2
1 0 9 5 5
2 7 5 9 4
3 7 1 1 8
8 6 6 2 8
9 0 7 0 8
10 7 5 5 2
11 6 0 9 5
4 7 7 2 2
5 5 5 0 2
6 9 3 2 7
7 5 7 2 9
Option 2
I'd add a level to the index that we can call on when shuffling
d = df.set_index(np.arange(len(df)) // 4, append=True).swaplevel(0, 1)
d
W X Y Z
0 0 9 8 6 2
1 0 9 5 5
2 7 5 9 4
3 7 1 1 8
1 4 7 7 2 2
5 5 5 0 2
6 9 3 2 7
7 5 7 2 9
2 8 6 6 2 8
9 0 7 0 8
10 7 5 5 2
11 6 0 9 5
3 12 9 2 2 2
13 8 8 2 5
14 4 1 5 6
15 1 2 3 9
Then we can shuffle
pd.concat([d.xs(i) for i in np.random.permutation(range(4))])
W X Y Z
12 9 2 2 2
13 8 8 2 5
14 4 1 5 6
15 1 2 3 9
4 7 7 2 2
5 5 5 0 2
6 9 3 2 7
7 5 7 2 9
0 9 8 6 2
1 0 9 5 5
2 7 5 9 4
3 7 1 1 8
8 6 6 2 8
9 0 7 0 8
10 7 5 5 2
11 6 0 9 5
The code below in Python does the magic:
from random import shuffle
import numpy as np
from math import ceil

# creating a sample dataset
d = [[i*4 + j for i in range(5)] for j in range(25)]
a = np.array(d, int)
print('--------------Input--------------')
print(a)

gl = 4  # group length, i.e. the number of rows that need to stay intact
parts = ceil(1.0*len(a)/gl)  # number of partitions based on group length for the given dataset

# create the partition list and shuffle it for later use
x = [i for i in range(int(parts))]
shuffle(x)

# create the new dataset based on the shuffled partition list
fg = x.pop(0)
f = a[gl*fg:gl*(fg+1)]
for i in x:
    t = a[gl*i:(i+1)*gl]
    f = np.concatenate((f, t), axis=0)
print('--------------Output--------------')
print(f)
