I have the following variables:
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
I want to create a .TXT file with tab-separated values that looks like this:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
500 1000 -5 8 9 0
1000 1500 -6 7 11 19
1500 2000 1 4 5 12
2000 2500 -5 8 9 0
2500 3000 -6 7 11 19
3000 3500 1 4 5 12
3500 4000 -5 8 9 0
4000 4500 -6 7 11 19
I have written the following code:
#!/usr/bin/env python3
import os
from datetime import datetime
import time
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
filename = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{signal1}_results.TXT"
with open(filename, 'w') as f:
    # write the bin1 range
    f.write('\n\n\n')
    f.write('\t\t\t\t')
    f.write(signal1 + '>=')
    for bin in bins1[:-1]:
        f.write('\t' + str(bin))
    f.write('\n')
    f.write('\t\t\t\t')
    f.write(signal1 + '<')
    for bin in bins1[1:]:
        f.write('\t' + str(bin))
    f.write('\n')
    # write the bin2 range
    f.write('\t\t')
    f.write(signal2 + '>=' + '\t' + signal2 + '<' + '\n')
    f.write('\t\t')
    # store the cursor position from where hist result will be written line by line
    track_cursor_pos = []
    curr = bins2[0]
    for next in bins2[1:]:
        f.write(str(curr) + '\t' + str(next))
        track_cursor_pos.append(f.tell())
        f.write('\n\t\t')
        curr = next
    f.write('\n')
    print(track_cursor_pos)
    i = 0
    # Everything is fine until here
    # Code below doesn't work as expected!?
    for result in hist_result:
        f.seek(track_cursor_pos[i], os.SEEK_SET)
        for r in result:
            f.write('\t' + str(r))
        f.write('\n')
        i += 1
But this produces a TXT file whose contents look like this:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
0 -5 8 9 0
00 -6 7 11 19
1 4 5 12
00 -5 8 9 0
00 -6 7 11 19
1 4 5 12
00 -5 8 9 0
00 -6 7 11 19
I think I am not using f.seek() properly. Any suggestions would be appreciated. Thanks in advance.
You don't have to seek inside the file at all. Seeking back and then writing overwrites the bytes that are already there instead of inserting new text, which is why your rows come out mangled. Just write each line in order:
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
with open('data.txt', 'w') as f_out:
    print('\t{signal1}>=\t{bins}'.format(signal1=signal1, bins='\t'.join(map(str, bins1[:-1]))), file=f_out)
    print('\t{signal1}<\t{bins}'.format(signal1=signal1, bins='\t'.join(map(str, bins1[1:]))), file=f_out)
    print('{signal2}>=\t{signal2}<'.format(signal2=signal2), file=f_out)
    for a, b, data in zip(bins2[:-1], bins2[1:], hist_result):
        print(a, b, *data, sep='\t', file=f_out)
Creates data.txt:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
500 1000 -5 8 9 0
1000 1500 -6 7 11 19
1500 2000 1 4 5 12
2000 2500 -5 8 9 0
2500 3000 -6 7 11 19
3000 3500 1 4 5 12
3500 4000 -5 8 9 0
4000 4500 -6 7 11 19
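As a side note, here is a minimal sketch (with a hypothetical file name) of why the seek-based approach garbles the file: writing at a seeked position overwrites the bytes that are already there instead of inserting new text.

with open('demo.txt', 'w') as f:
    f.write('0\t500')
    pos = f.tell()                 # position right after "0\t500", as track_cursor_pos recorded
    f.write('\n\t\t500\t1000\n')   # the next bin row, written afterwards as in the original code
    f.seek(pos)
    f.write('\t1\t4\t5\t12')       # overwrites the newline and most of the next bin row

with open('demo.txt') as f:
    print(repr(f.read()))
# prints something like '0\t500\t1\t4\t5\t1200\n' (on a POSIX system): the '500\t1000' row
# has been partly clobbered, leaving the stray '00' fragments seen in the question's output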
I am working on building a multi-dimensional array in order to produce the following result in a Jupyter notebook.
I have tried several pieces of code, but I cannot seem to produce the fourth column with the number range 30-35. The closest I have come is this code:
import numpy as np
from itertools import chain
def fun(i):
    return tuple(4*i + j for j in range(4))

a = np.fromiter(chain.from_iterable(fun(i) for i in range(6)), 'i', 6 * 4)
a.shape = 6, 4
print(repr(a))
I am expecting the following results:
array([[ 1, 2, 3, 30],
[ 4, 5, 6, 31],
[ 7, 8, 9, 32],
[10, 11, 12, 33],
[13, 14, 15, 34],
[20, 21, 22, 35]])
You can create a flat array with all your consecutive numbers like this:
import numpy as np
a = np.arange(1, 16)
print(a)
# output:
[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Then you reshape it:
a = np.reshape(a, (5, 3))
print(a)
# output
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]
[13 14 15]]
Then you add a new row:
a = np.vstack([a, np.arange(20, 23)])
print(a)
# output:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]
[13 14 15]
[20 21 22]]
You create the column to add:
col = np.arange(30, 36).reshape(-1, 1)
print(col)
# output:
[[30]
[31]
[32]
[33]
[34]
[35]]
You add it:
a = np.concatenate((a, col), axis=1)
print(a)
# output:
[[ 1 2 3 30]
[ 4 5 6 31]
[ 7 8 9 32]
[10 11 12 33]
[13 14 15 34]
[20 21 22 35]]
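Putting the steps together, the whole construction can be written compactly (same values as above):

import numpy as np

a = np.arange(1, 16).reshape(5, 3)                                  # rows 1..15 as a 5x3 block
a = np.vstack([a, np.arange(20, 23)])                               # append the 20 21 22 row
a = np.concatenate((a, np.arange(30, 36).reshape(-1, 1)), axis=1)   # add the 30..35 column
print(a)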
DataFrame
import pandas as pd

df = pd.DataFrame({'occurance': [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],
                   'value': [45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3]})
df
Expected output
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3],'group':[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 4, 100, 5, 5, 5, 5]})
df
I need to transform the dataframe into the expected output. I am after a rule that treats a 1 as the start of a new group, where a group consists of exactly one 1 followed by n zeroes. If the group criteria are not met, the rows should be grouped as 100.
I tried something along these lines:
bs=df[df.occurance.eq(1).any(1)&df.occurance.shift(-1).eq(0).any(1)].squeeze()
bs
Even when broken down, this could only boolean-select the group starts and nothing more.
Any help?
Create a mask that flags rows where a 1 is immediately followed by another 1, filter occurance to exclude those rows, build the group numbers with a cumulative sum via Series.cumsum, and finally restore the excluded rows as 100 with Series.reindex:
m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)
df['group'] = df.loc[~m, 'occurance'].cumsum().reindex(df.index, fill_value=100)
print (df)
occurance value group
0 1 45 1
1 0 3 1
2 0 2 1
3 0 12 1
4 1 14 2
5 0 32 2
6 0 1 2
7 0 1 2
8 0 6 2
9 0 4 2
10 1 9 3
11 0 32 3
12 1 78 100
13 1 96 4
14 0 12 4
15 0 6 4
16 0 3 4
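For clarity, here is a step-by-step view of the intermediates, using the question's input df:

# rows where a 1 is immediately followed by another 1 -- these violate the
# "one 1 followed by zeroes" rule and end up in group 100
m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)
print(m[m].index.tolist())       # [12]

# the cumulative sum over the remaining rows numbers each valid group;
# reindex restores the excluded row(s), filling them with 100
groups = df.loc[~m, 'occurance'].cumsum()
print(groups.tail(6).tolist())   # [3, 3, 4, 4, 4, 4] -- row 12 is missing here
df['group'] = groups.reindex(df.index, fill_value=100)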
I am trying to create a vector of the previous 10 values from a pandas column and insert it back into the pandas data frame as a list in a cell.
The code below works, but I need to do this for a dataframe of over 30 million rows, so a loop will take too long.
Can someone please help me convert this to a numpy function that I can apply? I would also like to be able to apply this function within a groupby.
import pandas as pd

df = pd.DataFrame(list(range(1, 20)), columns=['A'])
df.insert(0, 'Vector', '')
df['Vector'] = df['Vector'].astype(object)
for index, row in df.iterrows():
    df['Vector'].iloc[index] = list(df['A'].iloc[(index - 10):index])
I have tried in multiple ways but have not been able to get it to work. Any help would be appreciated.
IIUC
df['New']=[df.A.tolist()[max(0,x-10):x] for x in range(len(df))]
df
Out[123]:
A New
0 1 []
1 2 [1]
2 3 [1, 2]
3 4 [1, 2, 3]
4 5 [1, 2, 3, 4]
5 6 [1, 2, 3, 4, 5]
6 7 [1, 2, 3, 4, 5, 6]
7 8 [1, 2, 3, 4, 5, 6, 7]
8 9 [1, 2, 3, 4, 5, 6, 7, 8]
9 10 [1, 2, 3, 4, 5, 6, 7, 8, 9]
10 11 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
11 12 [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
12 13 [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
13 14 [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
14 15 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
15 16 [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
16 17 [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
17 18 [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
18 19 [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
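Since the question also asks about applying this within a groupby, here is a hedged sketch (the 'key' column and the window of 3 are made up for illustration) that builds the trailing-window lists separately inside each group:

import pandas as pd

def trailing_lists(s, window=10):
    # for each position, collect up to the previous `window` values within the group
    vals = s.tolist()
    return pd.Series([vals[max(0, i - window):i] for i in range(len(vals))], index=s.index)

df2 = pd.DataFrame({'key': ['a'] * 6 + ['b'] * 6, 'A': range(1, 13)})
df2['New'] = pd.concat(trailing_lists(g, window=3) for _, g in df2.groupby('key')['A'])
print(df2)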
I have a numpy array:
[[12 13 12 5 6 5 14 4 6 11 11 10 8 11 8 11 7 8 0 0 0]
[ 5 14 4 6 11 11 10 8 11 8 11 8 11 8 11 7 8 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 8 11 8 11 8 11 7 8 0 0]
[ 5 14 4 6 11 11 10 7 8 0 0 0 0 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 11 10 8 11 8 11 8 11 8 11 8 11 8 11 7 8]
[ 5 14 4 6 11 10 8 11 10 8 11 10 8 11 10 7 8 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 7 8 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 11 10 1 11 1 11 7 8 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 10 10 1 11 1 11 1 11 7 8 0 0 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 8 11 7 8 0 0 0 0 0 0]
[ 5 14 4 6 11 10 8 11 10 8 11 10 8 11 10 8 11 7 7 0 0]]
And a colors dictionary:
{0: (0, 0, 0), 1: (17, 17, 17), 2: (34, 34, 34), 3: (51, 51, 51), 4: (68, 68, 68), 5: (85, 85, 85), 6: (102, 102, 102), 7: (119, 119, 119), 8: (136, 136, 136), 9: (153, 153, 153), 10: (170, 170, 170), 11: (187, 187, 187), 12: (204, 204, 204), 13: (221, 221, 221), 14: (238, 238, 238)}
And I'm trying to pass the array through the dictionary and then write those colors in 10x10 blocks to a .png file. So far I have:
rows = []
for row in arr:
    for j in range(10):
        for col in row:
            for i in range(10):
                rows.extend(colors[col])
rows = bytes(rows)
img = Image.frombytes('RGB', (110, 120), rows)
img.save("generated.png")
But this produces an image that has lines instead of the 10x10 blocks I was trying to write. It seems as though the blocks are shifted somehow, but I can't figure out how to un-shift them. Why is this happening?
I believe you only need to change the size parameter to obtain the result you want. Replacing this line should correct the error:
# img = Image.frombytes('RGB', (110, 120), rows)
img = Image.frombytes('RGB', (210, 110), rows)
Size should be a 2-tuple of the width and height of the image in pixels. The rows list you are building represents an image that is (210, 110) pixels, but you are drawing it into an image that is (110, 120) pixels. This causes the data to wrap to a new row every 110 pixels.
Here is a working example:
from PIL import Image
array = [
[12, 13, 12, 5, 6, 5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 7, 8, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8],
[5, 14, 4, 6, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 10, 7, 8, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 1, 11, 1, 11, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 1, 11, 1, 11, 1, 11, 7, 8, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 7, 7, 0, 0],
]
colors = {
0: (0, 0, 0),
1: (17, 17, 17),
2: (34, 34, 34),
3: (51, 51, 51),
4: (68, 68, 68),
5: (85, 85, 85),
6: (102, 102, 102),
7: (119, 119, 119),
8: (136, 136, 136),
9: (153, 153, 153),
10: (170, 170, 170),
11: (187, 187, 187),
12: (204, 204, 204),
13: (221, 221, 221),
14: (238, 238, 238)
}
rows = []
for row in array:
for _ in range(10):
for col in row:
for _ in range(10):
rows.extend(colors[col])
rows = bytes(rows)
img = Image.frombytes('RGB', (210, 110), rows)
img.save("generated.png")
I want to shuffle my data in such a manner that each group of 4 rows remains intact. For example, if I have 16 rows, the first 4 rows can go to the end, the second 4 rows can move to the third position, and so on, in any order. I am trying to do this in Python.
Reshape the array by splitting the first axis into two, with the latter of length equal to the group length (4), giving us a 3D array, and then use np.random.shuffle, which shuffles along the first axis. Since the reshaped version is a view into the original array, the results are assigned directly back into it. Being in-place, this should be pretty efficient (both memory-wise and performance-wise).
Hence, the implementation would be as simple as this -
def array_shuffle(a, n=4):
    a3D = a.reshape(a.shape[0]//n, n, -1)  # a is the input array
    np.random.shuffle(a3D)
Another variant would be to generate a random permutation covering the length of the 3D array, index into it with that, and finally reshape back to 2D. This makes a copy, but seems more performant than the in-place edits of the previous method.
The implementation would be -
def array_permuted_indexing(a, n=4):
    m = a.shape[0]//n
    a3D = a.reshape(m, n, -1)
    return a3D[np.random.permutation(m)].reshape(-1, a3D.shape[-1])
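A quick usage sketch (with a made-up 8x2 array) of the permuted-indexing variant; unlike array_shuffle, it returns a shuffled copy and leaves the input untouched:

import numpy as np

b = np.arange(16).reshape(8, 2)
out = array_permuted_indexing(b, n=4)
print(out)   # the two 4-row blocks appear in random order; b itself is unchanged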
Step-by-step run on shuffling method -
1] Setup random input array and split into a 3D version :
In [2]: np.random.seed(0)
In [3]: a = np.random.randint(11,99,(16,3))
In [4]: a3D = a.reshape(a.shape[0]//4,4,-1)
In [5]: a
Out[5]:
array([[55, 58, 75],
[78, 78, 20],
[94, 32, 47],
[98, 81, 23],
[69, 76, 50],
[98, 57, 92],
[48, 36, 88],
[83, 20, 31],
[91, 80, 90],
[58, 75, 93],
[60, 40, 30],
[30, 25, 50],
[43, 76, 20],
[68, 43, 42],
[85, 34, 46],
[86, 66, 39]])
2] Check the 3D array :
In [6]: a3D
Out[6]:
array([[[55, 58, 75],
[78, 78, 20],
[94, 32, 47],
[98, 81, 23]],
[[69, 76, 50],
[98, 57, 92],
[48, 36, 88],
[83, 20, 31]],
[[91, 80, 90],
[58, 75, 93],
[60, 40, 30],
[30, 25, 50]],
[[43, 76, 20],
[68, 43, 42],
[85, 34, 46],
[86, 66, 39]]])
3] Shuffle it along the first axis (in-situ) :
In [7]: np.random.shuffle(a3D)
In [8]: a3D
Out[8]:
array([[[69, 76, 50],
[98, 57, 92],
[48, 36, 88],
[83, 20, 31]],
[[43, 76, 20],
[68, 43, 42],
[85, 34, 46],
[86, 66, 39]],
[[55, 58, 75],
[78, 78, 20],
[94, 32, 47],
[98, 81, 23]],
[[91, 80, 90],
[58, 75, 93],
[60, 40, 30],
[30, 25, 50]]])
4] Verify the changes back in the original array :
In [9]: a
Out[9]:
array([[69, 76, 50],
[98, 57, 92],
[48, 36, 88],
[83, 20, 31],
[43, 76, 20],
[68, 43, 42],
[85, 34, 46],
[86, 66, 39],
[55, 58, 75],
[78, 78, 20],
[94, 32, 47],
[98, 81, 23],
[91, 80, 90],
[58, 75, 93],
[60, 40, 30],
[30, 25, 50]])
Runtime test
In [102]: a = np.random.randint(11,99,(16000,3))
In [103]: df = pd.DataFrame(a)
# #piRSquared's soln1
In [106]: %timeit df.iloc[np.random.permutation(np.arange(df.shape[0]).reshape(-1, 4)).ravel()]
100 loops, best of 3: 2.88 ms per loop
# #piRSquared's soln2
In [107]: %%timeit
...: d = df.set_index(np.arange(len(df)) // 4, append=True).swaplevel(0, 1)
...: pd.concat([d.xs(i) for i in np.random.permutation(range(4))])
100 loops, best of 3: 3.48 ms per loop
# Array based soln-1
In [108]: %timeit array_shuffle(a, n=4)
100 loops, best of 3: 3.38 ms per loop
# Array based soln-2
In [109]: %timeit array_permuted_indexing(a, n=4)
10000 loops, best of 3: 125 µs per loop
Setup
Consider the dataframe df
df = pd.DataFrame(np.random.randint(10, size=(16, 4)), columns=list('WXYZ'))
df
W X Y Z
0 9 8 6 2
1 0 9 5 5
2 7 5 9 4
3 7 1 1 8
4 7 7 2 2
5 5 5 0 2
6 9 3 2 7
7 5 7 2 9
8 6 6 2 8
9 0 7 0 8
10 7 5 5 2
11 6 0 9 5
12 9 2 2 2
13 8 8 2 5
14 4 1 5 6
15 1 2 3 9
Option 1
Inspired by #B.M. and #Divakar
I'm using np.random.permutation because it returns a copy that is a permuted version of what was passed. This means I can then pass that directly to iloc and return what I need.
df.iloc[np.random.permutation(np.arange(16).reshape(-1, 4)).ravel()]
W X Y Z
12 9 2 2 2
13 8 8 2 5
14 4 1 5 6
15 1 2 3 9
0 9 8 6 2
1 0 9 5 5
2 7 5 9 4
3 7 1 1 8
8 6 6 2 8
9 0 7 0 8
10 7 5 5 2
11 6 0 9 5
4 7 7 2 2
5 5 5 0 2
6 9 3 2 7
7 5 7 2 9
Option 2
I'd add a level to the index that we can call on when shuffling
d = df.set_index(np.arange(len(df)) // 4, append=True).swaplevel(0, 1)
d
W X Y Z
0 0 9 8 6 2
1 0 9 5 5
2 7 5 9 4
3 7 1 1 8
1 4 7 7 2 2
5 5 5 0 2
6 9 3 2 7
7 5 7 2 9
2 8 6 6 2 8
9 0 7 0 8
10 7 5 5 2
11 6 0 9 5
3 12 9 2 2 2
13 8 8 2 5
14 4 1 5 6
15 1 2 3 9
Then we can shuffle
pd.concat([d.xs(i) for i in np.random.permutation(range(4))])
W X Y Z
12 9 2 2 2
13 8 8 2 5
14 4 1 5 6
15 1 2 3 9
4 7 7 2 2
5 5 5 0 2
6 9 3 2 7
7 5 7 2 9
0 9 8 6 2
1 0 9 5 5
2 7 5 9 4
3 7 1 1 8
8 6 6 2 8
9 0 7 0 8
10 7 5 5 2
11 6 0 9 5
The code below in Python does the magic:
from random import shuffle
import numpy as np
from math import ceil

# creating a sample dataset
d = [[i*4 + j for i in range(5)] for j in range(25)]
a = np.array(d, int)
print('--------------Input--------------')
print(a)

gl = 4  # group length, i.e. the number of rows that need to stay intact
parts = ceil(1.0*len(a)/gl)  # number of partitions based on group length for the given dataset

# creating a partition list and shuffling it to use later
x = [i for i in range(int(parts))]
shuffle(x)

# create the new dataset based on the shuffled partition list
fg = x.pop(0)
f = a[gl*fg:gl*(fg+1)]
for i in x:
    t = a[gl*i:(i+1)*gl]
    f = np.concatenate((f, t), axis=0)
print('--------------Output--------------')
print(f)