I need help from the group. I am trying to implement logic based on the conditions below:
There are 4 input columns (l1, l2, l3 and mon) and two output columns (output_val and output_col) that are derived from the input columns.
If mon is less than 5: output_val takes the value from l1 and output_col is 'l1'. If l1 is NaN, preference is given to l3 and then to l2.
From mon 5 to 7 (inclusive), the value transitions from l1 to l2:
1. In mon 5: output_val: 0.60*l1 + 0.40*l2; output_col: '0.60*l1+0.40*l2'
2. In mon 6: output_val: 0.50*l1 + 0.50*l2; output_col: '0.50*(l1+l2)' (if either of them is NaN, 100% is given to the available one; e.g. if l2 is NaN, the result is 1*l1)
3. In mon 7: output_val: 0.40*l1 + 0.60*l2; output_col: '0.40*l1+0.60*l2'
From mon 8 to 10: output_val = l2 and output_col = 'l2', but if l2 is NaN, preference is given to l1 and then to l3.
From mon 11 to 13: output_val transitions from l2 to l3 (similar logic to mon 5 to 7):
1. In mon 11: output_val: 0.60*l2 + 0.40*l3; output_col: '0.60*l2+0.40*l3'
2. In mon 12: output_val: 0.50*l2 + 0.50*l3; output_col: '0.50*(l2+l3)' (if either of them is NaN, 100% is given to the available one)
3. In mon 13: output_val: 0.40*l2 + 0.60*l3; output_col: '0.40*l2+0.60*l3'
From mon 14 onward: output_val = l3 and output_col = 'l3'. If l3 is not available, preference is given first to l2 and then, if l2 is also missing, to l1.
If all of l1, l2 and l3 are NaN, then output_val = 0 and output_col = ' '.
The input DataFrame looks like this:
import numpy as np
import pandas as pd
data = {'l1': [2,3,np.nan,3,4,1,23,5,np.nan, 100, 101, 200, 121, 431, 341],
'l2': [12,13,np.nan,13,14,np.nan,123,15,np.nan, 200, 87, 65, 23, 54, np.nan],
'l3': [np.nan,333,111,np.nan,334,111,123,5,np.nan, np.nan, 65, 154, 341, np.nan, np.nan],
'mon':[1,2,3,4,5,6,7,8,9,10, 11, 12, 13, 14, 15]}
data = pd.DataFrame(data)
data
The output DataFrame, with two extra columns 'output_val' and 'output_col', should look like this:
output_data = {'l1': [2,3,1,3,4,1,23,5,np.nan, 100, 101, 200, 121, 431, 341],
'l2': [12,13,np.nan,13,14,np.nan,123,15,np.nan, 200, 87, 65, 23, 54, np.nan],
'l3': [np.nan,333,111,np.nan,334,111,123,5,np.nan, np.nan, 65, 154, 341, np.nan, np.nan],
'mon':[1,2,3,4,5,6,7,8,9,10, 11, 12, 13, 14, 15],
'output_val': [2,3,111, 3, 13.44, 1, 678.96, 15, 0, 200, 1357, 2502, 1882, 54, 341],
'output_col':['l1', 'l1', 'l3', 'l1', '0.60*l1+0.40*l2', '1*l1', '0.40*l1+0.60*l2', 'l2', ' ', 'l2', '0.60*l2+0.40*l3', '0.50*(l2+l3)', '0.40*l2+0.60*l3', 'l2', 'l1']}
output_data = pd.DataFrame(output_data)
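The rules above can be sketched row by row in plain Python. The names `WEIGHTS`, `first_available`, `blend` and `compute` are hypothetical helpers, not from the question. Two assumptions are built in: the "give 100% to the available column" rule stated for mon 6 and 12 is also applied in mon 5, 7, 11 and 13, and a transition month where both blend columns are NaN falls back to the third column (the question does not cover that case). The output_col labels this produces match the expected table exactly, but the blended output_val values follow the stated formulas; for example, mon 5 gives 0.60*4 + 0.40*14 = 8.0 rather than the 13.44 listed above.

```python
import numpy as np
import pandas as pd

# Blend weights per transition month, taken from the rules in the question.
WEIGHTS = {5: (0.60, 0.40), 6: (0.50, 0.50), 7: (0.40, 0.60),
           11: (0.60, 0.40), 12: (0.50, 0.50), 13: (0.40, 0.60)}


def first_available(vals, order):
    """Return (value, label) for the first non-NaN column in `order`, else (0, ' ')."""
    for col in order:
        if pd.notna(vals[col]):
            return vals[col], col
    return 0, ' '


def blend(vals, a, b, mon, other):
    """Weighted mix of columns a and b; if one side is NaN, the other gets 100%."""
    wa, wb = WEIGHTS[mon]
    va, vb = vals[a], vals[b]
    if pd.notna(va) and pd.notna(vb):
        if wa == wb:
            label = f'{wa:.2f}*({a}+{b})'
        else:
            label = f'{wa:.2f}*{a}+{wb:.2f}*{b}'
        return wa * va + wb * vb, label
    if pd.notna(va):
        return va, f'1*{a}'
    if pd.notna(vb):
        return vb, f'1*{b}'
    # Assumption: both blend columns missing -> fall back to the third column.
    return first_available(vals, [other])


def compute(vals, mon):
    """Apply the month-based rules to one row; returns (output_val, output_col)."""
    if mon < 5:
        return first_available(vals, ['l1', 'l3', 'l2'])
    if mon <= 7:
        return blend(vals, 'l1', 'l2', mon, 'l3')
    if mon <= 10:
        return first_available(vals, ['l2', 'l1', 'l3'])
    if mon <= 13:
        return blend(vals, 'l2', 'l3', mon, 'l1')
    return first_available(vals, ['l3', 'l2', 'l1'])


data = pd.DataFrame({
    'l1': [2, 3, np.nan, 3, 4, 1, 23, 5, np.nan, 100, 101, 200, 121, 431, 341],
    'l2': [12, 13, np.nan, 13, 14, np.nan, 123, 15, np.nan, 200, 87, 65, 23, 54, np.nan],
    'l3': [np.nan, 333, 111, np.nan, 334, 111, 123, 5, np.nan, np.nan, 65, 154, 341, np.nan, np.nan],
    'mon': list(range(1, 16)),
})

# Evaluate the rules for every row, then split the (value, label) pairs into two columns.
results = [compute({'l1': x1, 'l2': x2, 'l3': x3}, m)
           for x1, x2, x3, m in zip(data['l1'], data['l2'], data['l3'], data['mon'])]
data['output_val'] = [val for val, _ in results]
data['output_col'] = [col for _, col in results]
print(data)
```

A row-wise function keeps each rule readable and easy to check against the specification; for very large frames, the same conditions could be expressed with vectorized masks instead.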
So I have a chess board represented as an array of size 64, with the top-left square being 0 and the bottom-right square being 63. I have this function, which gives all possible moves of the king:
import numpy as np
def king_moves(i):
    return np.array([i-9, i-8, i-7, i-1, i+1, i+7, i+8, i+9])
# ...
if selected_position in king_moves(i):
    move_king()
Here i is the number of the square the king is currently on.
This works if the king is not on an edge of the board.
But if the king is on the bottom-right square (63), the function returns the bottom-left square (56) as a valid move.
Is there an efficient way to detect that the king would wrap around to the other edge, so that the move is not valid?
I'm having the same problem with almost all my pieces: the function lets a piece wrap around to the other side of the board. I figured the king's movement was the simplest case to ask about.
A 1D list is generally faster than a 2D 8x8 list, so I like that you are using this approach.
The usual way to handle this is a 10x12 board, with two extra rows at the top and bottom and an extra column on the left and right.
Then, in your move-generation function, you simply check whether the square you are looking at is on the board. If it is not, you skip to the next square in your loop.
Please read more about it on https://www.chessprogramming.org/10x12_Board. It is also a great site for information about chess programming in general.
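A minimal sketch of that idea for the king, assuming the row-major numbering from the question (0 = top-left, 63 = bottom-right). The names `mailbox120`, `mailbox64` and the sentinel `OFF` are illustrative. On the 10-wide board the king offsets become ±1, ±9, ±10 and ±11, and any step that would cross an edge lands on a border square instead of wrapping.

```python
OFF = -1  # sentinel for border (off-board) squares

# mailbox120[idx] holds the 0..63 square number, or OFF on the border.
mailbox120 = [OFF] * 120
# mailbox64[sq] maps a 0..63 square to its index in the 10x12 array.
mailbox64 = [0] * 64
for sq in range(64):
    rank, file = divmod(sq, 8)
    idx = (rank + 2) * 10 + (file + 1)  # skip 2 border rows and 1 border column
    mailbox64[sq] = idx
    mailbox120[idx] = sq

# King steps expressed on the 10-wide board.
KING_OFFSETS = [-11, -10, -9, -1, 1, 9, 10, 11]


def king_moves(sq):
    """Return the 0..63 squares a king on `sq` can step to, without wrap-around."""
    moves = []
    for off in KING_OFFSETS:
        target = mailbox120[mailbox64[sq] + off]
        if target != OFF:
            moves.append(target)
    return moves


print(king_moves(63))  # [54, 55, 62]
print(king_moves(0))   # [1, 8, 9]
```

The two border rows at the top and bottom matter for knights (whose offsets on this board include ±19 and ±21); for the king alone, a single border row would be enough.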
Here is one approach using table lookup.
Code
piece_offsets = {
    'n': [-17, -15, -10, -6, 6, 10, 15, 17],
    'b': [ -9, -7, 9, 7],
    'r': [ -8, -1, 8, 1],
    'q': [ -9, -8, -7, -1, 9, 8, 7, 1],
    'k': [ -9, -8, -7, -1, 9, 8, 7, 1]
}
sqdist = [[0 for x in range(64)] for y in range(64)]
pseudo_legal = {
    'n': [[] for y in range(64)],
    'b': [[] for y in range(64)],
    'r': [[] for y in range(64)],
    'q': [[] for y in range(64)],
    'k': [[] for y in range(64)],
}
def distance(sq1, sq2):
    file1 = sq1 & 7
    file2 = sq2 & 7
    rank1 = sq1 >> 3
    rank2 = sq2 >> 3
    rank_distance = abs(rank2 - rank1)
    file_distance = abs(file2 - file1)
    return max(rank_distance, file_distance)
def print_board():
    for i in range(64):
        print(f'{i:02d} ', end='')
        if (i+1)%8 == 0:
            print()
def on_board(s):
    return s >= 0 and s < 64
def init_board():
    for sq1 in range(64):
        for sq2 in range(64):
            sqdist[sq1][sq2] = distance(sq1, sq2)
    for pt in ['n', 'b', 'r', 'q', 'k']:
        for s in range(64):
            for offset in piece_offsets[pt]:
                to = s + offset
                if pt in ['k', 'n']:
                    if on_board(to) and sqdist[s][to] < 4:
                        pseudo_legal[pt][s].append(to)
                else: # sliders
                    s1 = s
                    while True:
                        to1 = s1 + offset
                        if on_board(to1) and sqdist[s1][to1] < 4:
                            pseudo_legal[pt][s].append(to1)
                            s1 = to1
                        else:
                            break
def main():
    init_board() # build sqdist and pseudo_legal_to tables
    print_board()
    print()
    for pt in ['n', 'b', 'r', 'q', 'k']:
        for s in [0, 63, 36]:
            print(f'pt: {pt}, from: {s}: to: {pseudo_legal[pt][s]}')
        print()
    # pseudo_legal_sq = pseudo_legal['b'][61]
    # print(pseudo_legal_sq)
main()
Output
00 01 02 03 04 05 06 07
08 09 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
pt: n, from: 0: to: [10, 17]
pt: n, from: 63: to: [46, 53]
pt: n, from: 36: to: [19, 21, 26, 30, 42, 46, 51, 53]
pt: b, from: 0: to: [9, 18, 27, 36, 45, 54, 63]
pt: b, from: 63: to: [54, 45, 36, 27, 18, 9, 0]
pt: b, from: 36: to: [27, 18, 9, 0, 29, 22, 15, 45, 54, 63, 43, 50, 57]
pt: r, from: 0: to: [8, 16, 24, 32, 40, 48, 56, 1, 2, 3, 4, 5, 6, 7]
pt: r, from: 63: to: [55, 47, 39, 31, 23, 15, 7, 62, 61, 60, 59, 58, 57, 56]
pt: r, from: 36: to: [28, 20, 12, 4, 35, 34, 33, 32, 44, 52, 60, 37, 38, 39]
pt: q, from: 0: to: [9, 18, 27, 36, 45, 54, 63, 8, 16, 24, 32, 40, 48, 56, 1, 2, 3, 4, 5, 6, 7]
pt: q, from: 63: to: [54, 45, 36, 27, 18, 9, 0, 55, 47, 39, 31, 23, 15, 7, 62, 61, 60, 59, 58, 57, 56]
pt: q, from: 36: to: [27, 18, 9, 0, 28, 20, 12, 4, 29, 22, 15, 35, 34, 33, 32, 45, 54, 63, 44, 52, 60, 43, 50, 57, 37, 38, 39]
pt: k, from: 0: to: [9, 8, 1]
pt: k, from: 63: to: [54, 55, 62]
pt: k, from: 36: to: [27, 28, 29, 35, 45, 44, 43, 37]
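The same distance check can be applied directly to the king function from the question: keep a candidate only if it is on the board and its Chebyshev distance (the larger of the rank and file differences) from the starting square is exactly 1. A minimal sketch; the names `chebyshev` and `king_moves` are chosen for illustration:

```python
def chebyshev(sq1, sq2):
    """Larger of the rank and file differences between two squares (0 = top-left)."""
    r1, f1 = divmod(sq1, 8)
    r2, f2 = divmod(sq2, 8)
    return max(abs(r1 - r2), abs(f1 - f2))


def king_moves(i):
    """King targets from square i, dropping off-board and wrapped-around squares."""
    candidates = [i - 9, i - 8, i - 7, i - 1, i + 1, i + 7, i + 8, i + 9]
    # The bounds check runs first, so chebyshev() only ever sees 0..63.
    return [t for t in candidates if 0 <= t < 64 and chebyshev(i, t) == 1]


print(king_moves(63))  # [54, 55, 62]
print(king_moves(56))  # [48, 49, 57]
```

For sliders, the test has to be applied to each single step along the ray, stopping the ray as soon as it fails, which is what init_board above does.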
I'm trying to build a batch generator that takes a large Pandas DataFrame as input and outputs a given number of rows (batch_size) at a time. I practiced on a smaller DataFrame with 10 rows to get it working. The for loop below works well on the practice DataFrame and prints batches of the designated size:
for i in range(0, len(df), 3):
    lower = i
    upper = i+3
    print(df.iloc[lower:upper])
However, trying to build this into a generator function is proving difficult:
def Generator(batch_size, seed = None):
    num_items = len(df)
    x = df.sample(frac = 1, replace = False, random_state = seed)
    for offset in range(0, num_items, batch_size):
        lower_limit = offset
        upper_limit = offset+batch_size
        batch = x.iloc[lower_limit:upper_limit]
        yield batch
Unfortunately,
next(Generator(1))  # e.g. batch_size = 1
returns the same row over and over again.
I'm fairly new to working with this, and I feel I must be missing something, but I can't spot what.
If anyone could point out what the issue might be, I would very much appreciate it.
Edit:
The DataFrame is predefined:
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Sarah', 'Gueniva', 'Know', 'Sara', 'Cat'],
'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Mornig', 'Jaker', 'Alom', 'Ormon', 'Koozer'],
'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24],
'preTestScore': [4, 24, 31, 2, 3, 13, 52, 72, 26, 26],
'postTestScore': [25, 94, 57, 62, 70, 82, 52, 56, 234, 254]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
df
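Call Generator once, keep the resulting generator object (it is already an iterator, so wrapping it in iter() is optional), and call next() on that same object. Otherwise, each call to Generator creates a brand-new generator with its own shuffle, and you only ever see its first batch; if you pass a seed, that first batch is the same every time.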
After fixing the indentation problems it works as it should:
import pandas as pd
# I dislike variable scope bleeding into the function, provide df explicitly
def Generator(df, batch_size, seed = None):
    num_items = len(df)
    x = df.sample(frac = 1, replace = False, random_state = seed)
    for offset in range(0, num_items, batch_size):
        lower_limit = offset
        upper_limit = offset+batch_size
        batch = x.iloc[lower_limit:upper_limit]
        yield batch
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Sarah',
'Gueniva', 'Know', 'Sara', 'Cat'],
'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Mornig',
'Jaker', 'Alom', 'Ormon', 'Koozer'],
'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24],
'preTestScore': [4, 24, 31, 2, 3, 13, 52, 72, 26, 26],
'postTestScore': [25, 94, 57, 62, 70, 82, 52, 56, 234, 254]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age',
'preTestScore', 'postTestScore'])
# capture a "state" for the generator function
i = iter(Generator(df, 2))
# get the next states from the iterator and print
print(next(i))
print(next(i))
print(next(i))
Output:
first_name last_name age preTestScore postTestScore
8 Sara Ormon 73 26 234
6 Gueniva Jaker 26 52 52
first_name last_name age preTestScore postTestScore
5 Sarah Mornig 53 13 82
9 Cat Koozer 24 26 254
first_name last_name age preTestScore postTestScore
1 Molly Jacobson 52 24 94
2 Tina Ali 36 31 57
Alternatively, you can do:
k = Generator(df, 1)
print(next(k))
print(next(k))
print(next(k))
which works as well.
If you do
print(next(Generator(df, 2)))
print(next(Generator(df, 2)))
print(next(Generator(df, 2)))
you create three separate shuffled DataFrames. You only ever print the first batch of each one before it is discarded, so you may be shown the same rows more than once.
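Both points can be seen in a minimal sketch (the function name `batches` and the one-column DataFrame are illustrative; the body mirrors the Generator above): a for loop consumes a single generator, so every row appears exactly once, while two fresh generators created with the same seed always start with the same batch.

```python
import pandas as pd


def batches(df, batch_size, seed=None):
    """Yield shuffled, non-overlapping slices of df with at most batch_size rows."""
    shuffled = df.sample(frac=1, replace=False, random_state=seed)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled.iloc[start:start + batch_size]


df = pd.DataFrame({'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24]})

# One generator consumed by a for loop: 10 rows in batches of 3.
sizes = [len(b) for b in batches(df, 3, seed=1)]
print(sizes)  # [3, 3, 3, 1]

# Two fresh generators with the same seed: identical first batches.
first_a = next(batches(df, 3, seed=1))
first_b = next(batches(df, 3, seed=1))
print(first_a.index.equals(first_b.index))  # True
```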
This question is about challenge 6 in set 1 of "the cryptopals crypto challenges".
The challenge is:
There's a file here. It's been base64'd after being encrypted with repeating-key XOR.
Decrypt it.
After that, the challenge describes the steps to decrypt the file; there are 8 steps in total. You can find them on the site.
I have been trying to solve this challenge for a while, and I am struggling with the final two steps, even though I've solved challenge 3, which contains the solution for these steps.
Note: it is, of course, possible that there is a mistake in the first 6 steps, but they seem to work well, judging by the output I printed after every step.
My code:
Written in Python 3.6.
To avoid dealing with web requests, which are not the purpose of this challenge, I just copied the content of the file into a string at the beginning. You can do the same before running the code.
import base64
# Decoding the file from base64 to binary
file = base64.b64decode("""HUIfTQsP...JwwRTWM=""")
print(file)
print()
# Step 1 - guess key size
KEYSIZE = 4
# Step 2 - find hamming distance - number of differing bits
def hamming2(s1, s2):
    """Calculate the Hamming distance between two bit strings"""
    assert len(s1) == len(s2)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))
def distance(a, b): # Hamming distance
    calc = 0
    for ca, cb in [(a[i], b[i]) for i in range(len(a))]:
        bina = '{:08b}'.format(int(ca))
        binb = '{:08b}'.format(int(cb))
        calc += hamming2(bina, binb)
    return calc
# Test step 2
print("distance: 'this is a test' and 'wokka wokka!!!' =", distance([ord(c) for c in "this is a test"], [ord(c) for c in "wokka wokka!!!"])) # 37 - Working
print()
# Step 3
key_sizes = []
# For each key size
for KEYSIZE in range(2, 41):
    # take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes -
    # file[0:KEYSIZE], file[KEYSIZE:2*KEYSIZE]
    # and find the edit distance between them
    # Normalize this result by dividing by KEYSIZE
    key_sizes.append((distance(file[0:KEYSIZE], file[KEYSIZE:2*KEYSIZE]) / KEYSIZE, KEYSIZE))
key_sizes.sort(key=lambda a: a[0])
# Step 4
for val, key in key_sizes:
    print(key, ":", val)
KEYSIZE = key_sizes[0][1]
print()
# Step 5 + 6
# Each line is a list of all the bytes in that index
splited_file = [[] for i in range(KEYSIZE)]
counter = 0
for char in file:
    splited_file[counter].append(char)
    counter += 1
    counter %= KEYSIZE
for line in splited_file:
    print(line)
print()
# Step 7
# Code from another level
# Gets a string and a single char
# Doing a single-byte XOR over it
def single_char_string(a, b):
    final = ""
    for c in a:
        final += chr(c ^ b)
    return final
# Try all 256 byte values and rank the XOR results by their number of spaces
def find_single_byte(in_string):
    helper_list = []
    for num in range(256):
        helper_list.append((single_char_string(in_string, num), num))
    helper_list.sort(key=lambda a: a[0].count(' '), reverse=True)
    return helper_list[0]
# Step 8
final_key = ""
key_list = []
for line in splited_file:
    result = find_single_byte(line)
    print(result)
    final_key += chr(result[1])
    key_list.append(result[1])
print(final_key)
print(key_list)
Output:
b'\x1dB\x1fM\x0b\x0f\x02\x1fO\x13N<\x1aie\x1fI...\x08VA;R\x1d\x06\x06TT\x0e\x10N\x05\x16I\x1e\x10\'\x0c\x11Mc'
distance: 'this is a test' and 'wokka wokka!!!' = 37
5 : 1.2
3 : 2.0
2 : 2.5
.
.
.
26 : 3.5
28 : 3.5357142857142856
9 : 3.5555555555555554
22 : 3.727272727272727
6 : 4.0
[29, 15, 78, 31, 19, 27, 0, 32, ... 17, 26, 78, 38, 28, 2, 1, 65, 6, 78, 16, 99]
[66, 2, 60, 73, 1, 1, 30, 3, 13, ... 26, 14, 0, 26, 79, 99, 8, 79, 11, 4, 82, 59, 84, 5, 39]
[31, 31, 19, 26, 79, 47, 17, 28, ... 71, 89, 12, 1, 16, 45, 78, 3, 120, 11, 42, 82, 84, 22, 12]
[77, 79, 105, 14, 7, 69, 73, 29, 101, ... 54, 70, 78, 55, 7, 79, 31, 88, 10, 69, 65, 8, 29, 14, 73, 17]
[11, 19, 101, 78, 78, 54, 100, 67, 82, ... 1, 76, 26, 1, 2, 73, 21, 72, 73, 49, 27, 86, 6, 16, 30, 77]
('=/n?3; \x00\x13&-,>1...r1:n\x06<"!a&n0C', 32)
('b"\x1ci!!>ts es(ogg ...5i<% tc:. :oC(o+$r\x1bt%\x07', 32)
('??:<+6!=ngm2i4\x0byD...&h9&2:-)sm.a)u\x06&=\x0ct&~n +=&*4X:<(3:o\x0f1<mE gy,!0\rn#X+\nrt6,', 32)
('moI.\'ei=Et\'\x1c:l ...6k=\x1b m~t*\x155\x1ei+=+ts/e*9$sgl0\'\x02\x16fn\x17\'o?x*ea(=.i1', 32)
('+3Enn\x16Dcr<$,)\x01...i5\x01,hi\x11;v&0>m', 32)
[32, 32, 32, 32, 32]
Notice that when the key is printed as a string you cannot see it, but there are 5 characters in it (each one is chr(32), a space).
It is not the correct answer: in the fourth part of the output, after the XOR, the results do not look like words. The problem is probably in the last two functions, but I couldn't figure it out.
I've also tried some other key lengths, and that does not seem to be the problem.
So I'm not asking you to fix my code; I want to solve this challenge by myself :). I would like you to tell me where I am wrong, why, and how I should continue.
Thank you for your help.
After a lot of thinking and checking, I concluded that the problem was in step 3: the result was not reliable enough because I only compared the first two blocks.
I fixed the code so that it calculates KEYSIZE from all of the blocks, averaging the normalized distance between each pair of consecutive blocks.
The code for step 3 now looks like this:
# Step 3
key_sizes = []
# For each key size
for KEYSIZE in range(2, 41):
    running_sum = []
    for i in range(0, int(len(file) / KEYSIZE) - 1):
        running_sum.append(distance(file[i * KEYSIZE:(i + 1) * KEYSIZE],
                                    file[(i + 1) * KEYSIZE:(i + 2) * KEYSIZE]) / KEYSIZE)
    key_sizes.append((sum(running_sum)/ len(running_sum), KEYSIZE))
key_sizes.sort(key=lambda a: a[0])
Thanks to everyone who tried to help.
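A more compact version of steps 2 and 3 as described in this answer: iterating over bytes in Python 3 already yields ints, so the Hamming distance is the number of set bits in the XOR of each byte pair, and the key-size score averages the normalized distance over all consecutive block pairs. The names `hamming` and `keysize_score` are illustrative, and whether the lowest score picks the right key size still depends on having enough ciphertext.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    assert len(a) == len(b)
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))


def keysize_score(data: bytes, keysize: int) -> float:
    """Average normalized Hamming distance over consecutive full blocks.

    Needs at least two full blocks of `keysize` bytes.
    """
    blocks = [data[i:i + keysize] for i in range(0, len(data) - keysize + 1, keysize)]
    dists = [hamming(b1, b2) / keysize for b1, b2 in zip(blocks, blocks[1:])]
    return sum(dists) / len(dists)


print(hamming(b'this is a test', b'wokka wokka!!!'))  # 37
```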
I'm looking for a way to get lists of partial rows (the x, y and r values), grouped by Name.
Name x y r
a 9 81 63
a 98 5 89
b 51 50 73
b 41 22 14
c 6 18 1
c 1 93 55
d 57 2 90
d 58 24 20
So I'm trying to get the following dictionary:
di = {'a': {0: [9, 81, 63], 1: [98, 5, 89]},
      'b': {0: [51, 50, 73], 1: [41, 22, 14]},
      'c': {0: [6, 18, 1], 1: [1, 93, 55]},
      'd': {0: [57, 2, 90], 1: [58, 24, 20]}}
Use groupby with a custom function that numbers each group's lists, then convert the output Series with to_dict:
di = (df.groupby('Name')[['x','y','r']]
        .apply(lambda x: dict(zip(range(len(x)),x.values.tolist())))
        .to_dict())
print (di)
{'b': {0: [51, 50, 73], 1: [41, 22, 14]},
'a': {0: [9, 81, 63], 1: [98, 5, 89]},
'c': {0: [6, 18, 1], 1: [1, 93, 55]},
'd': {0: [57, 2, 90], 1: [58, 24, 20]}}
Detail:
print (df.groupby('Name')[['x','y','r']]
         .apply(lambda x: dict(zip(range(len(x)),x.values.tolist()))))
Name
a {0: [9, 81, 63], 1: [98, 5, 89]}
b {0: [51, 50, 73], 1: [41, 22, 14]}
c {0: [6, 18, 1], 1: [1, 93, 55]}
d {0: [57, 2, 90], 1: [58, 24, 20]}
dtype: object
Thanks to volcano for the suggestion to use enumerate:
di = (df.groupby('Name')[['x','y','r']]
        .apply(lambda x: dict(enumerate(x.values.tolist())))
        .to_dict())
For easier debugging, you can use a named custom function:
def f(x):
    #print (x)
    a = range(len(x))
    b = x.values.tolist()
    print (a)
    print (b)
    return dict(zip(a,b))
di = df.groupby('Name')[['x','y','r']].apply(f).to_dict()
print (di)
range(0, 2)
[[9, 81, 63], [98, 5, 89]]
range(0, 2)
[[9, 81, 63], [98, 5, 89]]
range(0, 2)
[[51, 50, 73], [41, 22, 14]]
range(0, 2)
[[6, 18, 1], [1, 93, 55]]
range(0, 2)
[[57, 2, 90], [58, 24, 20]]
Sometimes it is best to minimize the footprint and overhead.
Using itertools.count and collections.defaultdict:
from itertools import count
from collections import defaultdict
counts = {k: count(0) for k in df.Name.unique()}
d = defaultdict(dict)
for k, *v in df.values.tolist():
    d[k][next(counts[k])] = v
dict(d)
{'a': {0: [9, 81, 63], 1: [98, 5, 89]},
'b': {0: [51, 50, 73], 1: [41, 22, 14]},
'c': {0: [6, 18, 1], 1: [1, 93, 55]},
'd': {0: [57, 2, 90], 1: [58, 24, 20]}}
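To try these answers, the table from the question can be rebuilt as a DataFrame. The sketch below does that and shows a variant that skips apply entirely by iterating over the groups in a dict comprehension. It selects the columns with a list (`[['x', 'y', 'r']]`), since selecting several columns with a bare tuple after groupby is deprecated and rejected by recent pandas releases.

```python
import pandas as pd

# The example table from the question.
df = pd.DataFrame({'Name': list('aabbccdd'),
                   'x': [9, 98, 51, 41, 6, 1, 57, 58],
                   'y': [81, 5, 50, 22, 18, 93, 2, 24],
                   'r': [63, 89, 73, 14, 1, 55, 90, 20]})

# Iterating a groupby yields (group name, sub-DataFrame) pairs;
# enumerate numbers the rows within each group from 0.
di = {name: dict(enumerate(g[['x', 'y', 'r']].values.tolist()))
      for name, g in df.groupby('Name')}
print(di)
```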
I'm trying to merge 5 lists into one 2D matrix in Python. The lists are named a0 ... a4 (all of the same length).
i = 0
while i < len(a0):
    k = 0
    while k < 5:
        matrix[i][k] = ...  # here I want to assign a0[i], a1[i], ..., a4[i]
        k += 1
    i += 1
Is there a way to make this work or do I have to go with something like:
while i < len(a0):
    matrix[i][0] = a0[i]
    matrix[i][1] = a1[i]
    # ...
If a0 through a4 are already lists, you just need to put all of them into one big list.
Let me know if this works for you:
a0 = [str(x) for x in range(10)]
a1 = [str(x) for x in range(10, 20)]
a2 = [str(x) for x in range(20, 30)]
a3 = [str(x) for x in range(30, 40)]
a4 = [str(x) for x in range(40, 50)]
print("a0: {}".format(", ".join(a0)))
print("a1: {}".format(", ".join(a1)))
print("a2: {}".format(", ".join(a2)))
print("a3: {}".format(", ".join(a3)))
print("a4: {}".format(", ".join(a4)))
matrix = [
a0,
a1,
a2,
a3,
a4
]
# Below is another way:
# matrix = []
# matrix.append(a0)
# matrix.append(a1)
# matrix.append(a2)
# matrix.append(a3)
# matrix.append(a4)
print("matrix[3][4]: {}".format(matrix[3][4]))
Output:
a0: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
a1: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
a2: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
a3: 30, 31, 32, 33, 34, 35, 36, 37, 38, 39
a4: 40, 41, 42, 43, 44, 45, 46, 47, 48, 49
matrix[3][4]: 34
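Note that this answer builds one row per list, so matrix[3][4] is a3[4]. The question's loop instead writes matrix[i][k] = ak[i], with each list as a column, which is the transpose. If that orientation is needed, the built-in zip pairs up the i-th elements of all lists in one step. A minimal sketch with integer lists holding the same values as above:

```python
a0 = list(range(0, 10))
a1 = list(range(10, 20))
a2 = list(range(20, 30))
a3 = list(range(30, 40))
a4 = list(range(40, 50))

# zip yields tuples (a0[i], a1[i], a2[i], a3[i], a4[i]); turn each into a row list.
matrix = [list(row) for row in zip(a0, a1, a2, a3, a4)]

print(matrix[4])     # [4, 14, 24, 34, 44]
print(matrix[4][3])  # 34, i.e. a3[4]
```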