Creating a vector containing the next 10 row-column values for each pandas row - python-3.x

I am trying to build, for each row, a vector of the previous 10 values from a pandas column and store it back in the data frame as a list in a cell.
The code below works, but I need to do this for a data frame of over 30 million rows, so a loop will take too long.
Can someone please help me convert this to a numpy function that I can apply? I would also like to be able to apply this function in a groupby.
import pandas as pd
df = pd.DataFrame(list(range(1,20)),columns = ['A'])
df.insert(0,'Vector','')
df['Vector'] = df['Vector'].astype(object)
for index, row in df.iterrows():
    df['Vector'].iloc[index] = list(df['A'].iloc[(index-10):index])
I have tried in multiple ways but have not been able to get it to work. Any help would be appreciated.

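IIUC, you can slice a plain list of the column values inside a comprehension. Converting the column once, outside the comprehension, avoids rebuilding the list for every row:
vals = df.A.tolist()
df['New'] = [vals[max(0, x-10):x] for x in range(len(df))]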
df
Out[123]:
A New
0 1 []
1 2 [1]
2 3 [1, 2]
3 4 [1, 2, 3]
4 5 [1, 2, 3, 4]
5 6 [1, 2, 3, 4, 5]
6 7 [1, 2, 3, 4, 5, 6]
7 8 [1, 2, 3, 4, 5, 6, 7]
8 9 [1, 2, 3, 4, 5, 6, 7, 8]
9 10 [1, 2, 3, 4, 5, 6, 7, 8, 9]
10 11 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
11 12 [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
12 13 [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
13 14 [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
14 15 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
15 16 [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
16 17 [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
17 18 [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
18 19 [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
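Since the question asks for a numpy version, here is a minimal sketch of one possible approach, not part of the answer above. It assumes NumPy 1.20 or later (for `sliding_window_view`) and pads the front with NaN so every row gets a fixed-length window; the function name `previous_windows` is hypothetical, and the values are converted to float because NaN has no integer representation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def previous_windows(values, size=10):
    """Return a 2-D array whose row i holds the `size` values before position i.

    Positions with fewer than `size` predecessors are left-padded with NaN.
    """
    values = np.asarray(values, dtype=float)
    # Prepend `size` NaNs so that window i covers values[i-size:i].
    padded = np.concatenate([np.full(size, np.nan), values])
    # The view has len(values) + 1 windows; the last one would include
    # the final value itself, so it is dropped.
    return sliding_window_view(padded, size)[:len(values)]


windows = previous_windows(range(1, 20))
```

Keeping the result as one 2-D array is usually much lighter than 30 million Python lists. If a list per cell is really needed, something like `df['Vector'] = list(previous_windows(df['A']))` should work, and the function can be called once per group inside a groupby loop.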

Related

print all possible routes of a conditional binary tree by python3

I want to print a conditioned binary tree.
Take an example of five different lists a b c d e:
1
2, 4
3, 5, 7
6, 8
9
The condition is that the following number must be larger than the previous number, so printing 1, 4, 3, 6, 9 is wrong.
The desired result is:
1, 2, 3, 6, 9
1, 2, 5, 6, 9
1, 4, 5, 8, 9
1, 4, 7, 8, 9
How can I get those lists in Python 3?
Thank you very much.

Creating a TXT file and seeking a position in Python

I am given the following variables:
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
I want to create a .TXT file which would look like this with tab separated values:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
500 1000 -5 8 9 0
1000 1500 -6 7 11 19
1500 2000 1 4 5 12
2000 2500 -5 8 9 0
2500 3000 -6 7 11 19
3000 3500 1 4 5 12
3500 4000 -5 8 9 0
4000 4500 -6 7 11 19
I have written the following code:
#!/usr/bin/env python3
import os
from datetime import datetime
import time
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
filename = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{signal1}_results.TXT"
with open(filename, 'w') as f:
    # write the bin1 range
    f.write('\n\n\n')
    f.write('\t\t\t\t')
    f.write(signal1 + '>=')
    for bin in bins1[:-1]:
        f.write('\t' + str(bin))
    f.write('\n')
    f.write('\t\t\t\t')
    f.write(signal1 + '<')
    for bin in bins1[1:]:
        f.write('\t' + str(bin))
    f.write('\n')
    # write the bin2 range
    f.write('\t\t')
    f.write(signal2 + '>=' + '\t' + signal2 + '<' + '\n')
    f.write('\t\t')
    # store the cursor position from where hist result will be written line by line
    track_cursor_pos = []
    curr = bins2[0]
    for next in bins2[1:]:
        f.write(str(curr) + '\t' + str(next))
        track_cursor_pos.append(f.tell())
        f.write('\n\t\t')
        curr = next
    f.write('\n')
    print(track_cursor_pos)
    i = 0
    # Everything is fine until here
    # Code below doesn't work as expected!?
    for result in hist_result:
        f.seek(track_cursor_pos[i], os.SEEK_SET)
        for r in result:
            f.write('\t' + str(r))
        f.write('\n')
        i += 1
But, this is producing the TXT file whose contents look like this:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
0 -5 8 9 0
00 -6 7 11 19
1 4 5 12
00 -5 8 9 0
00 -6 7 11 19
1 4 5 12
00 -5 8 9 0
00 -6 7 11 19
I think I am not using the f.seek() properly. Any suggestion would be appreciated. Thanks in advance.
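You don't have to seek inside the file to print your data. Writing after f.seek() overwrites the characters already stored at that position instead of inserting new ones, which is why the later rows get clobbered. Build each line completely and write it once: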
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
with open('data.txt', 'w') as f_out:
    print('\t{signal1}>=\t{bins}'.format(signal1=signal1, bins='\t'.join(map(str,bins1[:-1]))), file=f_out)
    print('\t{signal1}<\t{bins}'.format(signal1=signal1, bins='\t'.join(map(str,bins1[1:]))), file=f_out)
    # file=f_out is needed here too, otherwise this header goes to stdout
    print('{signal2}>=\t{signal2}<'.format(signal2=signal2), file=f_out)
    for a, b, data in zip(bins2[:-1], bins2[1:], hist_result):
        print(a, b, *data, sep='\t', file=f_out)
Creates data.txt:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
500 1000 -5 8 9 0
1000 1500 -6 7 11 19
1500 2000 1 4 5 12
2000 2500 -5 8 9 0
2500 3000 -6 7 11 19
3000 3500 1 4 5 12
3500 4000 -5 8 9 0
4000 4500 -6 7 11 19
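To see why the seek-based attempt in the question mangled the rows, here is a small illustrative sketch. It uses `io.StringIO` as a stand-in for the text file (an assumption made only to keep the demo self-contained); the variable names are hypothetical.

```python
import io

buf = io.StringIO()
buf.write("0\t500")
pos = buf.tell()              # position right after the first bin pair
buf.write("\n500\t1000\n")    # the next row is written straight after it

# Going back and writing does not insert text: it replaces the
# characters that are already stored from `pos` onwards.
buf.seek(pos)
buf.write("\t1\t4")

result = buf.getvalue()
```

The four characters written after the seek replace the newline and the `500` that started the second row, so the rows run together. Writing each line once, as in the answer above, avoids the problem entirely.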

Groupby arrays in a pandas dataframe

Consider a dataframe with numpy arrays as entries for lat/lon:
lat lon min max
[1, 2, 3] [4, 5, 6] 10 90
[1, 2, 3] [4, 5, 6] 80 120
[7, 8, 9] [4, 5, 6] 10 20
[7, 8, 9] [4, 5, 6] 30 40
How can I group the dataset by unique lat/lon combinations when the entries are numpy arrays? The goal is to check if min/max ranges intersect for unique lat/lon combinations and then combine them to a single row with new min/max. The result should look like this:
lat lon min max
[1, 2, 3] [4, 5, 6] 10 120
[7, 8, 9] [4, 5, 6] 10 20
[7, 8, 9] [4, 5, 6] 30 40
What I have tried so far is:
grouped = sectors.groupby(['lat', 'lon'])
But I cannot access the groups in grouped. The following results in an error (TypeError: unhashable type: 'numpy.ndarray'):
for name, group in grouped:
    print(name)
    print(group)
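The error arises because group keys must be hashable and numpy arrays are not. One possible workaround, sketched here in plain Python under the assumption that each row can be read as a `(lat, lon, min, max)` record, is to convert the arrays to tuples for the key and then merge overlapping ranges within each key. The function name `merge_ranges_by_location` and the variable `sector_rows` are hypothetical.

```python
from collections import defaultdict


def merge_ranges_by_location(rows):
    """Merge overlapping (lo, hi) ranges that share the same lat/lon.

    rows: iterable of (lat, lon, lo, hi); lat and lon may be lists or arrays.
    """
    groups = defaultdict(list)
    for lat, lon, lo, hi in rows:
        # Tuples are hashable, unlike lists or numpy arrays.
        groups[(tuple(lat), tuple(lon))].append((lo, hi))

    merged = []
    for (lat, lon), ranges in groups.items():
        ranges.sort()
        current_lo, current_hi = ranges[0]
        for lo, hi in ranges[1:]:
            if lo <= current_hi:
                # Overlaps (or touches) the current range: extend it.
                current_hi = max(current_hi, hi)
            else:
                merged.append((lat, lon, current_lo, current_hi))
                current_lo, current_hi = lo, hi
        merged.append((lat, lon, current_lo, current_hi))
    return merged


sector_rows = [
    ([1, 2, 3], [4, 5, 6], 10, 90),
    ([1, 2, 3], [4, 5, 6], 80, 120),
    ([7, 8, 9], [4, 5, 6], 10, 20),
    ([7, 8, 9], [4, 5, 6], 30, 40),
]
result = merge_ranges_by_location(sector_rows)
```

With a DataFrame, the same key conversion can be done up front, for example by replacing the array columns with `sectors['lat'].apply(tuple)` and `sectors['lon'].apply(tuple)` before calling groupby.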

How to calculate minimum number of swap to make the median of two sorted arrays equal?

A = [1, 2, 3, 3, 5, 6, 7]
B = [4, 6, 8, 8, 9, 9, 9]
We just need two swap operations.
First swap operation:
Take 1 from A and 9 from B and swap them.
Now the arrays look like this: A = [2, 3, 3, 5, 6, 7, 9] and B = [1, 4, 6, 8, 8, 9, 9].
Second swap operation:
Take 2 from A and 9 from B and swap them.
Now the arrays look like this: A = [3, 3, 5, 6, 7, 9, 9] and B = [1, 2, 4, 6, 8, 8, 9].
Now the median of both arrays is 6.
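The two swaps above can be replayed with a few lines of Python to confirm the medians. This only verifies the worked example; it is not an algorithm for finding the minimum number of swaps, and the helper name `swap_values` is hypothetical.

```python
from statistics import median


def swap_values(a, b, x, y):
    """Move one occurrence of x from a to b and one of y from b to a."""
    a, b = list(a), list(b)
    a.remove(x)
    b.remove(y)
    a.append(y)
    b.append(x)
    # Keep both arrays sorted, as in the example.
    return sorted(a), sorted(b)


A = [1, 2, 3, 3, 5, 6, 7]
B = [4, 6, 8, 8, 9, 9, 9]

A, B = swap_values(A, B, 1, 9)   # first swap
A, B = swap_values(A, B, 2, 9)   # second swap

median_a, median_b = median(A), median(B)
```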

Image.frombytes not writing squares

I have a numpy array:
[[12 13 12 5 6 5 14 4 6 11 11 10 8 11 8 11 7 8 0 0 0]
[ 5 14 4 6 11 11 10 8 11 8 11 8 11 8 11 7 8 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 8 11 8 11 8 11 7 8 0 0]
[ 5 14 4 6 11 11 10 7 8 0 0 0 0 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 11 10 8 11 8 11 8 11 8 11 8 11 8 11 7 8]
[ 5 14 4 6 11 10 8 11 10 8 11 10 8 11 10 7 8 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 7 8 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 11 10 1 11 1 11 7 8 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 10 10 1 11 1 11 1 11 7 8 0 0 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 8 11 7 8 0 0 0 0 0 0]
[ 5 14 4 6 11 10 8 11 10 8 11 10 8 11 10 8 11 7 7 0 0]]
And a colors dictionary:
{0: (0, 0, 0), 1: (17, 17, 17), 2: (34, 34, 34), 3: (51, 51, 51), 4: (68, 68, 68), 5: (85, 85, 85), 6: (102, 102, 102), 7: (119, 119, 119), 8: (136, 136, 136), 9: (153, 153, 153), 10: (170, 170, 170), 11: (187, 187, 187), 12: (204, 204, 204), 13: (221, 221, 221), 14: (238, 238, 238)}
I'm trying to pass the array through the dictionary, then write those colors in 10x10 blocks to a .png file. So far I have:
rows = []
for row in arr:
    for j in range(10):
        for col in row:
            for i in range(10):
                rows.extend(colors[col])
rows = bytes(rows)
img = Image.frombytes('RGB', (110, 120), rows)
img.save("generated.png")
But this writes an image that has lines instead of the 10x10 blocks I was trying to write. It seems to me as though the blocks are shifted somehow, but I can't figure out how to un-shift them. Why is this behavior happening?
I believe you only need to change the size parameter to obtain the result you want. Replacing this line should correct the error:
# img = Image.frombytes('RGB', (110, 120), rows)
img = Image.frombytes('RGB', (210, 110), rows)
Size should be a 2-tuple of the width and height of the image in pixels. The rows buffer you are building describes an image that is 210 pixels wide (21 columns x 10) and 110 pixels high (11 rows x 10), but you are reading it into an image declared as 110 pixels wide. This makes the image wrap to a new row every 110 pixels instead of every 210, which produces the shifted lines.
Here is a working example:
from PIL import Image
array = [
[12, 13, 12, 5, 6, 5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 7, 8, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8],
[5, 14, 4, 6, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 10, 7, 8, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 1, 11, 1, 11, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 1, 11, 1, 11, 1, 11, 7, 8, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 7, 7, 0, 0],
]
colors = {
0: (0, 0, 0),
1: (17, 17, 17),
2: (34, 34, 34),
3: (51, 51, 51),
4: (68, 68, 68),
5: (85, 85, 85),
6: (102, 102, 102),
7: (119, 119, 119),
8: (136, 136, 136),
9: (153, 153, 153),
10: (170, 170, 170),
11: (187, 187, 187),
12: (204, 204, 204),
13: (221, 221, 221),
14: (238, 238, 238)
}
rows = []
for row in array:
    for _ in range(10):
        for col in row:
            for _ in range(10):
                rows.extend(colors[col])
rows = bytes(rows)
img = Image.frombytes('RGB', (210, 110), rows)
img.save("generated.png")
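To avoid hard-coding the size at all, the width and height can be derived from the array itself. The following is a minimal sketch of that idea, assuming a rectangular array; the helper name `build_block_image` and its `block` parameter are hypothetical and not part of Pillow.

```python
def build_block_image(array, colors, block=10):
    """Return ((width, height), raw RGB bytes) for an array drawn in blocks."""
    height = len(array) * block
    width = len(array[0]) * block      # assumes every row has the same length
    data = bytearray()
    for row in array:
        # One scanline: each cell's colour repeated `block` times.
        line = bytearray()
        for cell in row:
            line.extend(bytes(colors[cell]) * block)
        # Repeat the scanline `block` times to get the block's height.
        data.extend(line * block)
    return (width, height), bytes(data)


size, data = build_block_image(
    [[0, 1], [1, 0]],
    {0: (0, 0, 0), 1: (17, 17, 17)},
    block=2,
)
```

The returned pair can then be passed on as `Image.frombytes('RGB', size, data)`, so the declared size always matches the buffer.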
