How to break repeating-key XOR Challenge using Single-byte XOR cipher - python-3.x

This question is about challenge 6 in set 1 of "the cryptopals crypto challenges".
The challenge is:
There's a file here. It's been base64'd after being encrypted with repeating-key XOR.
Decrypt it.
After that there is a description of the steps needed to decrypt the file. There are 8 steps in total; you can find them on the site.
I have been trying to solve this challenge for a while and I am struggling with the final two steps, even though I've already solved challenge 3, which contains the solution for these steps.
Note: It is, of course, possible that there is a mistake in the first 6 steps, but they seem to work well judging by the output printed after every step.
My code:
Written in Python 3.6.
To avoid dealing with web requests, which are not the purpose of this challenge, I just copied the content of the file into a string at the beginning. You can do the same before running the code.
import base64
# Decoding the file from base64 to binary
file = base64.b64decode("""HUIfTQsP...JwwRTWM=""")
print(file)
print()
# Step 1 - guess key size
KEYSIZE = 4
# Step 2 - find hamming distance - number of differing bits
def hamming2(s1, s2):
    """Calculate the Hamming distance between two bit strings"""
    assert len(s1) == len(s2)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))
def distance(a, b): # Hamming distance
    calc = 0
    for ca, cb in [(a[i], b[i]) for i in range(len(a))]:
        bina = '{:08b}'.format(int(ca))
        binb = '{:08b}'.format(int(cb))
        calc += hamming2(bina, binb)
    return calc
# Test step 2
print("distance: 'this is a test' and 'wokka wokka!!!' =", distance([ord(c) for c in "this is a test"], [ord(c) for c in "wokka wokka!!!"])) # 37 - Working
print()
# Step 3
key_sizes = []
# For each key size
for KEYSIZE in range(2, 41):
    # take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes -
    # file[0:KEYSIZE], file[KEYSIZE:2*KEYSIZE]
    # and find the edit distance between them
    # Normalize this result by dividing by KEYSIZE
    key_sizes.append((distance(file[0:KEYSIZE], file[KEYSIZE:2*KEYSIZE]) / KEYSIZE, KEYSIZE))
key_sizes.sort(key=lambda a: a[0])
# Step 4
for val, key in key_sizes:
    print(key, ":", val)
KEYSIZE = key_sizes[0][1]
print()
# Step 5 + 6
# Each line is a list of all the bytes in that index
splited_file = [[] for i in range(KEYSIZE)]
counter = 0
for char in file:
    splited_file[counter].append(char)
    counter += 1
    counter %= KEYSIZE
for line in splited_file:
    print(line)
print()
# Step 7
# Code from another level
# Gets a string and a single char
# Doing a single-byte XOR over it
def single_char_string(a, b):
    final = ""
    for c in a:
        final += chr(c ^ b)
    return final
# Going over all the bytes and listing the result after the XOR by number of bytes
def find_single_byte(in_string):
    helper_list = []
    for num in range(256):
        helper_list.append((single_char_string(in_string, num), num))
    helper_list.sort(key=lambda a: a[0].count(' '), reverse=True)
    return helper_list[0]
# Step 8
final_key = ""
key_list = []
for line in splited_file:
    result = find_single_byte(line)
    print(result)
    final_key += chr(result[1])
    key_list.append(result[1])
print(final_key)
print(key_list)
Output:
b'\x1dB\x1fM\x0b\x0f\x02\x1fO\x13N<\x1aie\x1fI...\x08VA;R\x1d\x06\x06TT\x0e\x10N\x05\x16I\x1e\x10\'\x0c\x11Mc'
distance: 'this is a test' and 'wokka wokka!!!' = 37
5 : 1.2
3 : 2.0
2 : 2.5
.
.
.
26 : 3.5
28 : 3.5357142857142856
9 : 3.5555555555555554
22 : 3.727272727272727
6 : 4.0
[29, 15, 78, 31, 19, 27, 0, 32, ... 17, 26, 78, 38, 28, 2, 1, 65, 6, 78, 16, 99]
[66, 2, 60, 73, 1, 1, 30, 3, 13, ... 26, 14, 0, 26, 79, 99, 8, 79, 11, 4, 82, 59, 84, 5, 39]
[31, 31, 19, 26, 79, 47, 17, 28, ... 71, 89, 12, 1, 16, 45, 78, 3, 120, 11, 42, 82, 84, 22, 12]
[77, 79, 105, 14, 7, 69, 73, 29, 101, ... 54, 70, 78, 55, 7, 79, 31, 88, 10, 69, 65, 8, 29, 14, 73, 17]
[11, 19, 101, 78, 78, 54, 100, 67, 82, ... 1, 76, 26, 1, 2, 73, 21, 72, 73, 49, 27, 86, 6, 16, 30, 77]
('=/n?3; \x00\x13&-,>1...r1:n\x06<"!a&n0C', 32)
('b"\x1ci!!>ts es(ogg ...5i<% tc:. :oC(o+$r\x1bt%\x07', 32)
('??:<+6!=ngm2i4\x0byD...&h9&2:-)sm.a)u\x06&=\x0ct&~n +=&*4X:<(3:o\x0f1<mE gy,!0\rn#X+\nrt6,', 32)
('moI.\'ei=Et\'\x1c:l ...6k=\x1b m~t*\x155\x1ei+=+ts/e*9$sgl0\'\x02\x16fn\x17\'o?x*ea(=.i1', 32)
('+3Enn\x16Dcr<$,)\x01...i5\x01,hi\x11;v&0>m', 32)
[32, 32, 32, 32, 32]
Notice that when the key is printed as a string you cannot see it, but there are 5 characters in it.
This is not the correct answer: in the fourth part, the results after the XOR do not look like words. The problem is probably in the last two functions, but I couldn't figure it out.
I've also tried some other key lengths, and that does not seem to be the problem.
I'm not asking you to fix my code; I want to solve this challenge by myself :). I would like you to tell me where I am wrong, why, and how I should continue.
Thank you for your help.

After a lot of thinking and checking, I concluded that the problem is in step 3. The result was not good enough because I looked only at the first two blocks.
I fixed the code so that it calculates the KEYSIZE from all of the blocks.
Step 3 now looks like this:
# Step 3
key_sizes = []
# For each key size
for KEYSIZE in range(2, 41):
    running_sum = []
    for i in range(0, int(len(file) / KEYSIZE) - 1):
        running_sum.append(distance(file[i * KEYSIZE:(i + 1) * KEYSIZE],
                                    file[(i + 1) * KEYSIZE:(i + 2) * KEYSIZE]) / KEYSIZE)
    key_sizes.append((sum(running_sum)/ len(running_sum), KEYSIZE))
key_sizes.sort(key=lambda a: a[0])
Thanks to anyone who tried to help.
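For readers who want a more compact form of the same idea, here is a minimal sketch that counts differing bits by XOR-ing bytes directly and averages the normalized distance over every pair of adjacent blocks. The names hamming, average_distance and rank_keysizes are made up for this illustration, and the sketch assumes the data is at least twice as long as the largest key size tried.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    assert len(a) == len(b)
    # XOR leaves a 1 bit wherever the two bytes differ.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def average_distance(data: bytes, keysize: int) -> float:
    """Mean normalized Hamming distance over all adjacent full keysize-blocks."""
    blocks = [data[i:i + keysize]
              for i in range(0, len(data) - keysize + 1, keysize)]
    pairs = list(zip(blocks, blocks[1:]))
    return sum(hamming(x, y) / keysize for x, y in pairs) / len(pairs)


def rank_keysizes(data: bytes, low: int = 2, high: int = 40) -> list:
    """Return candidate key sizes, smallest average distance first."""
    return sorted(range(low, high + 1),
                  key=lambda k: average_distance(data, k))


print(hamming(b"this is a test", b"wokka wokka!!!"))  # 37, as stated in the challenge
```

Calling rank_keysizes(file)[0] on the decoded file would then play the role of key_sizes[0][1] in the code above.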

Related

Pytorch transformation for just certain batch

Hi, is there any method to apply a transformation to a certain batch only?
That is, I want to apply a transformation to just the last batch in every epoch.
Here is what I tried:
import torch
class test(torch.utils.data.Dataset):
    def __init__(self):
        self.source = [i for i in range(10)]
    def __len__(self):
        return len(self.source)
    def __getitem__(self, idx):
        print(idx)
        return self.source[idx]
ds = test()
dl = torch.utils.data.DataLoader(dataset = ds, batch_size = 3,
                                 shuffle = False, num_workers = 5)
for i in dl:
    print(i)
I thought that if I could get the idx number, it would be possible to apply the transformation to certain batches.
However, when using num_workers, the output is:
0
1
2
3
964
57
8
tensor([0, 1, 2])
tensor([3, 4, 5])
tensor([6, 7, 8])
tensor([9])
which is not what I expected.
Without num_workers:
0
1
2
tensor([0, 1, 2])
3
4
5
tensor([3, 4, 5])
6
7
8
tensor([6, 7, 8])
9
tensor([9])
So the questions are:
Why does idx behave this way with num_workers?
How can I apply a transform to certain batches (or certain idx values)?
When you have num_workers > 1, you have multiple subprocesses doing data loading in parallel. So what is likely happening is that there is a race condition for the print step, and the order you see in the output depends on which subprocess goes first each time.
For most transforms, you can apply them to a specific batch simply by calling the transform after the batch has been loaded. To do this just for the last batch, you could do something like the following (transform stands for whatever callable you want to apply):
for batch_idx, batch_data in enumerate(dl):
    # check if batch is the last batch
    if (batch_idx + 1) * dl.batch_size >= len(ds):
        batch_data = transform(batch_data)
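Another way to single out the last batch, without doing arithmetic on the batch size, is to compare the index from enumerate against len(loader) - 1. The sketch below is only an illustration: transform_last_batch is a made-up helper, the transform is a placeholder, and a plain list of lists stands in for the DataLoader so that the example runs without PyTorch. It assumes the loader supports len(), which a DataLoader over a map-style dataset does.

```python
def transform_last_batch(loader, transform):
    """Yield every batch from loader, applying transform only to the last one.

    loader is assumed to support len() and iteration; transform is any
    callable that takes a batch and returns a batch.
    """
    last_idx = len(loader) - 1
    for batch_idx, batch in enumerate(loader):
        yield transform(batch) if batch_idx == last_idx else batch


# Stand-in for a DataLoader: range(10) split into batches of size 3.
batches = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
result = list(transform_last_batch(batches, lambda b: [x * 10 for x in b]))
print(result)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [90]]
```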
I found that the batches themselves still come out in order when __getitem__ returns idx:
class test_dataset(torch.utils.data.Dataset):
    def __init__(self):
        self.a = [i for i in range(100)]
    def __len__(self):
        return len(self.a)
    def __getitem__(self, idx):
        a = torch.tensor(self.a[idx])
        #print(idx)
        return idx
a = torch.utils.data.DataLoader(
    test_dataset(), batch_size = 10, shuffle = False,
    num_workers = 10, pin_memory = True)
for i in a:
    print(i)
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
tensor([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
tensor([30, 31, 32, 33, 34, 35, 36, 37, 38, 39])
tensor([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
tensor([50, 51, 52, 53, 54, 55, 56, 57, 58, 59])
tensor([60, 61, 62, 63, 64, 65, 66, 67, 68, 69])
tensor([70, 71, 72, 73, 74, 75, 76, 77, 78, 79])
tensor([80, 81, 82, 83, 84, 85, 86, 87, 88, 89])
tensor([90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

Return range of integer list based on input number

I came across this question in my test today. I have been trying to find the correct answer but keep failing.
The question is:
Imagine we have a range of page numbers, let's say 0 to 100. When we click on a page, let's say 15, we only want to show 10 pages on the UI, i.e. pages 11 to 20.
Another example: input: 50, output: returns the list
[46,47,48,49,50,51,52,53,54,55]
input: 15
output: returns the list
[11,12,13,14,15,16,17,18,19,20]
The list should also include the first page and the last page, i.e. 0 and 100,
so the actual output for the first example would be
[0,46,47,48,49,50,51,52,53,54,55,100]
Below is what I have tried:
def get_thread_page_num(num, max_page_num):
    # Returns 10 numbers dynamically
    new_lst =[1,50]
    # default list
    # defult_lst = [1,2,3,4,5,6,7,8,9,10]
    num -4 > 0
    num+5 <max_page_num
    i = 10
    m = 4
    p = 5
    while i != 0:
        if num-1 >0 and m !=0:
            new_lst.append(num-m)
            i=i-1
            m = m-1
        elif num+1<max_page_num and p != 0:
            new_lst.append(num+p)
            i=i-1
            p = p-1
    print(sorted(new_lst))
get_thread_page_num(9, 50)
In your code, m and p start with the values 4 and 5 respectively. In every iteration, one of them decreases by 1. So after 9 iterations both of them are 0 and 9 elements have been appended to new_lst. At that point i is 10 - 9 = 1.
From then on neither branch runs, so i never becomes 0 and the loop becomes infinite.
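The stuck counter can be seen by replaying only the counter logic of the question's loop with a step cap. This is a small illustrative sketch: trace_counters and max_steps are names introduced here, and the cap exists only so that the demonstration terminates.

```python
def trace_counters(num, max_page_num, max_steps=20):
    """Replay the question's loop with a step cap; return the final (i, m, p)."""
    i, m, p = 10, 4, 5
    for _ in range(max_steps):  # cap added so the demonstration terminates
        if i == 0:
            break
        if num - 1 > 0 and m != 0:
            i, m = i - 1, m - 1
        elif num + 1 < max_page_num and p != 0:
            i, p = i - 1, p - 1
        # otherwise neither branch runs, so i is left unchanged
    return i, m, p


print(trace_counters(9, 50))  # (1, 0, 0): i is stuck at 1
```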
You can try the code below instead. Please refer to the comments.
def get_thread_page_num(num, max_page_num):
    # low and high denotes the low and high end of the list
    # where middle element is num
    low = max(0, num - 4)
    high = min(num + 5, max_page_num)
    lst = []
    if max_page_num < 9:
        # 10 element list is not possible
        return lst
    # In case high is same as max, just make the list as
    # high-9, high -8, ..., high
    if high == max_page_num:
        lst = list(range(max(0, high - 9), high + 1))
    else:
        # Just create a list starting from low like -
        # low, low + 1, ..., low + 9
        lst = list(range(low, low+10))
    # Add 0 and max if not already present
    if 0 not in lst:
        lst.append(0)
    if max_page_num not in lst:
        lst.append(max_page_num)
    # return sorted lst
    return sorted(lst)
Call to get_thread_page_num():
print(get_thread_page_num(15, 50))
print(get_thread_page_num(0, 50))
print(get_thread_page_num(2, 50))
print(get_thread_page_num(50, 50))
print(get_thread_page_num(43, 50))
Output:
[0, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 50]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
[0, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
[0, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50]

How to write a loop that adds all the numbers from the list into a variable

I'm a total beginner and self-learner, and I'm trying to solve problem 5 from How to Think Like a Computer Scientist: Learning with Python 3. The problem looks like this:
xs = [12, 10, 32, 3, 66, 17, 42, 99, 20]
Write a loop that adds all the numbers from the list into a variable called total. You should set the total variable to have the value 0 before you start adding them up, and print the value in total after the loop has completed.
Here is what I tried to do:
for xs in [12, 10, 32, 3, 66, 17, 42, 99, 20]:
    xs = [12, 10, 32, 3, 66, 17, 42, 99, 20]
    total = 0
    total = sum(xs)
    print(total)
Should I use a for loop at all? Or should I use a sum function?
There is no need for a for loop here; you can simply use sum:
xs = [12, 10, 32, 3, 66, 17, 42, 99, 20]
total = sum(xs)
print(total)
If you really want to use a loop:
total = 0
xs = [12, 10, 32, 3, 66, 17, 42, 99, 20]
for i in xs:
    total += i
print(total)

rounding off to next multiple of 5 in python

When we round 67 up to the next multiple of 5, the answer is 70,
whereas when we round 64 the answer should be 65, but it comes out as 70.
I looked for the logic in C++,
where I found 5*((grades[i]+4)/5) for calculating the next multiple of 5.
My code implemented in Python is:
ground=[]
grades=[73,64,67,38,33]
for i in range(len(grades)):
    r=5*(round((grades[i]+4)/5))
    ground.append(r)
print(ground)
expected output: [75,65,70,40,35]
but
actual output: [75,70,70,40,35]
Try this, which can be an optimised solution:
ground=[]
grades=[73,64,67,38,33]
for i in range(len(grades)):
    # the outer % 5 keeps exact multiples of 5 unchanged
    temp = (5 - grades[i] % 5) % 5
    ground.append(grades[i] + temp)
print(ground)
You should use the // operator (or int) instead of round:
ground = []
grades = [73, 64, 67, 38, 33]
for i in range(len(grades)):
    r = 5 * ((grades[i] + 4) // 5)
    ground.append(r)
print(ground)
Output:
[75, 65, 70, 40, 35]
Use modulo arithmetic on negative numbers to your advantage for this.
def round_up(n, base):
    return n + (-n % base)
grades = [73, 64, 67, 38, 33]
rgrades = [round_up(n, 5) for n in grades] # [75, 65, 70, 40, 35]
This works on float and other number types.
round_up(70.5, 5) # 75.0
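To see why the round-based formula from the question overshoots, it can help to put the variants side by side. The sketch below is only an illustration: it reuses the grades from the question, and the math.ceil variant is an extra alternative not taken from the answers above.

```python
import math

grades = [73, 64, 67, 38, 33]

# round() goes to the nearest integer, so (64 + 4) / 5 = 13.6 becomes 14.
with_round = [5 * round((g + 4) / 5) for g in grades]

# Floor division truncates 13.6 to 13, like integer division in C++.
with_floor = [5 * ((g + 4) // 5) for g in grades]

# math.ceil states the intent directly: the smallest multiple of 5 >= g.
with_ceil = [5 * math.ceil(g / 5) for g in grades]

print(with_round)  # [75, 70, 70, 40, 35]
print(with_floor)  # [75, 65, 70, 40, 35]
print(with_ceil)   # [75, 65, 70, 40, 35]
```

The + 4 trick only gives the right answer when the division truncates, which is why round has to be replaced by // or int.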

Sorting of lists in number ranges

list = [1,2,3,4,5,6,1,2,56,78,45,90,34]
range = ["0-25","25-50","50-75","75-100"]
I am coding in Python. I want to sort a list of integers into number ranges and store them in different lists. How can I do it?
I have specified my ranges in the range list.
Create a dictionary with max-value of each bin as key. Iterate through your numbers and append them to the list that's the value of each bin-key:
l = [1,2,3,4,5,6,1,2,56,78,45,90,34]
# your ranges cover 25 apiece and share start/end values.
# I presume half-open ranges [0, 25)
def inRanges(data,maxValues):
    """Sorts elements of data into bins that have a max-value. Max-values are
    given by the list maxValues which holds the exclusive upper bound of the bins."""
    d = {k:[] for k in maxValues} # init all keys to empty lists
    for n in data:
        key = min(x for x in maxValues if x>n) # get key
        d[key].append(n) # add number
    return d
sortEm = inRanges(l,[25,50,75,100])
print(sortEm)
print([ x for x in sortEm.values()])
Output:
{25: [1, 2, 3, 4, 5, 6, 1, 2], 50: [45, 34],
75: [56], 100: [78, 90]}
[[1, 2, 3, 4, 5, 6, 1, 2], [45, 34], [56], [78, 90]]
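If the list of upper bounds is sorted, the key lookup can also be done with a binary search from the standard bisect module instead of scanning every bound. This is a sketch under that assumption: bin_by_upper_bounds is a name chosen for this example, and, like the min-based lookup, it fails (here with an IndexError) for values at or above the last bound.

```python
from bisect import bisect_right


def bin_by_upper_bounds(data, upper_bounds):
    """Group data into bins keyed by their exclusive upper bound.

    upper_bounds must be sorted ascending, and every value in data
    must be smaller than the last bound.
    """
    bins = {bound: [] for bound in upper_bounds}
    for n in data:
        # Index of the first bound strictly greater than n.
        idx = bisect_right(upper_bounds, n)
        bins[upper_bounds[idx]].append(n)
    return bins


numbers = [1, 2, 3, 4, 5, 6, 1, 2, 56, 78, 45, 90, 34]
print(bin_by_upper_bounds(numbers, [25, 50, 75, 100]))
# {25: [1, 2, 3, 4, 5, 6, 1, 2], 50: [45, 34], 75: [56], 100: [78, 90]}
```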
Another stable binning approach for your special case (regularly spaced bins) is to use a calculated key, which gets rid of the key search in each step.
Stable here means that the order of the numbers within each bin is the same as in the input data:
def inRegularIntervals(data, interval):
    """Sorts elements of data into bins of regular sizes.
    The size of each bin is given by 'interval'."""
    # init dict so keys are ordered - collections.defaultdict(list)
    # would be faster - but this works for lists of a couple of
    # thousand numbers if you have a quarter up to one second ...
    # if random key order is ok, shorten this to d = {}
    d = {k:[] for k in range(0, max(data), interval)}
    for n in data:
        key = n // interval # get key
        key *= interval
        d.setdefault(key, [])
        d[key].append(n) # add number
    return d
Use on random data:
from random import choices
data = choices(range(100), k = 50)
data.append(135) # add a bigger value to see the gapped keys
binned = inRegularIntervals(data, 25)
print(binned)
Output (\n and spaces added):
{ 0: [19, 9, 1, 0, 15, 22, 4, 9, 12, 7, 12, 9, 16, 2, 7],
25: [25, 31, 37, 45, 30, 48, 44, 44, 31, 39, 27, 36],
50: [50, 50, 58, 60, 70, 69, 53, 53, 67, 59, 52, 64],
75: [86, 93, 78, 93, 99, 98, 95, 75, 88, 82, 79],
100: [],
125: [135], }
To sort the binned lists in place, use
for k in binned:
    binned[k].sort()
to get:
{ 0: [0, 1, 2, 4, 7, 7, 9, 9, 9, 12, 12, 15, 16, 19, 22],
25: [25, 27, 30, 31, 31, 36, 37, 39, 44, 44, 45, 48],
50: [50, 50, 52, 53, 53, 58, 59, 60, 64, 67, 69, 70],
75: [75, 78, 79, 82, 86, 88, 93, 93, 95, 98, 99],
100: [],
125: [135]}
