Find end period of maximum drawdown? - python-3.x

I looked at this answer where they have a really smart solution to find the point where the maximum drawdown starts and where it has its lowest point. However, according to Wikipedia, that lowest point is not the end of the period (as is claimed in the answer). The end of the period is when you reach the same peak value you had before the drawdown period began.
This implies that the drawdown period does not have an end for the graph given in the answer I linked to. I'm trying to write code that handles both cases: 1) the period has an end, and 2) the period has no end.
If the period never ends, I just want it to return the last index of the array (so basically the length of the array), and if the period does indeed end, I want the correct index. Here is a simplified example where I've tried to solve the first case, when it has an end period:
import numpy as np
an_array = [21, 22, 23, 40, 19, 35, 37, 45, 42, 39, 28]
running_maximum = np.maximum.accumulate(an_array)
# running_maximum = [21, 22, 23, 40, 40, 40, 40, 45, 45, 45, 45]
bottom_index = np.argmax(running_maximum - an_array)
start_index = np.argmax(an_array[:bottom_index])
# bottom_index = 4, start_index = 3
difference = running_maximum - an_array
# difference = [0, 0, 0, 0, 21, 5, 3, 0, 3, 6, 17]
The reason I compute difference is that it makes it easy to see that end_index = 7. This is because the maximum drawdown is 21 at index 4, and since difference = 0 again at index 7, I have gone past (or just reached) my peak again at that point. I tried writing np.argmin(difference[bottom_index:]) to get the index, but because I slice the vector difference, this gives me 3 instead of 7, which is incorrect.
Any tips on how to solve this, and how to make it return the last index in cases where there is no end period, would be amazing.

I think this solves it:
import numpy as np
an_array = [21, 22, 23, 40, 19, 35, 37, 45, 42, 39, 28]
running_maximum = np.maximum.accumulate(an_array)
difference = running_maximum - an_array
bottom_index = np.argmax(difference)
start_index = np.argmax(an_array[:bottom_index])
if difference[bottom_index:].__contains__(0):
    end_index = len(difference[:bottom_index]) + np.argmin(difference[bottom_index:])
else:
    end_index = len(difference)
With the given example array, I get end_index = 7. If I change an_array[4] = 39, the maximum drawdown no longer has an end, and I get end_index = 11. Not sure if __contains__(0) is efficient though.
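On the efficiency question: one alternative sketch (not the only way to do it) is to replace the membership test with a vectorized comparison and locate the first recovery point with np.flatnonzero, so the slice is scanned only once:

import numpy as np

an_array = [21, 22, 23, 40, 19, 35, 37, 45, 42, 39, 28]
running_maximum = np.maximum.accumulate(an_array)
difference = running_maximum - an_array
bottom_index = np.argmax(difference)

# positions (relative to bottom_index) where the drawdown has fully recovered
recovered = np.flatnonzero(difference[bottom_index:] == 0)
# first recovery point if there is one, otherwise the length of the array
end_index = bottom_index + recovered[0] if recovered.size else len(difference)
# end_index = 7 for the example array above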

Algorithm to find the number of elements in a list that are less than or equal to the current element, with complexity O(n log n)

Thanks for visiting my post!
First, a brief description of the task I'm working on:
Given a list_A of integers and a list_B, for every element in list_A, insert at the same index in list_B the count of elements in list_A that are less than or equal to the current element.
Practical case:
list_A = [111, 192, 171, 391, 91, 142, 31, 373, 493, 468]
list_B = [2, 5, 4, 7, 1, 3, 0, 6, 9, 8]
Difficulty:
The problem I'm facing is writing this algorithm in O(n log n); do you have any ideas? Below is my current attempt, which is O(n^2):
def count_min_equal_inlist(value, list_to_check):
    counter = 0
    for element in list_to_check:
        if element <= value:
            counter += 1
    return counter - 1  # minus 1 because the element itself is not to be counted

vettore_A = [1,2,3,4,5,6,7,8,9,0]
vettore_B = [0]*len(vettore_A)
for i in range(len(vettore_A)):
    vettore_B[i] = count_min_equal_inlist(vettore_A[i], vettore_A)
...the count of elements in list_A that are less than or equal to the current element.
Yet your code reduces that count by 1 (counter - 1), and the expected output shows a 0, so this description is not telling the whole story. I will assume that we need the count among all other elements that are less than or equal to the current element.
Then you can use this logic:
create a sorted version of the input list
use binary search to find the position after the last occurrence of each value in that sorted version
Implementation:
from bisect import bisect

def solve(list_to_check):
    sorted_list = sorted(list_to_check)
    return [bisect(sorted_list, val) - 1 for val in list_to_check]
list_A = [111, 192, 171, 391, 91, 142, 31, 373, 493, 468]
print(solve(list_A))
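This prints [2, 5, 4, 7, 1, 3, 0, 6, 9, 8], which matches the expected list_B above. Sorting costs O(n log n) and each of the n bisect calls costs O(log n), so the whole solution is O(n log n).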

Return range of integer list based on input number

I found this question in my test today; I have been trying to find the correct answer for it but failing to do so.
Question is:
Imagine we have a range of page numbers, let's say 0 to 100. When we click on a page, let's say 15, we only want to show 10 pages in the UI, i.e. roughly pages 10 to 20.
More examples:
input: 50
output: [46, 47, 48, 49, 50, 51, 52, 53, 54, 55]
input: 15
output: [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
The list should also include the first page and last page, i.e. 0 and 100, so the actual output for the first example would be:
[0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 100]
Below is what I have tried:
def get_thread_page_num(num, max_page_num):
    # Returns 10 numbers dynamically
    new_lst = [1, 50]
    # default list
    # defult_lst = [1,2,3,4,5,6,7,8,9,10]
    num - 4 > 0
    num + 5 < max_page_num
    i = 10
    m = 4
    p = 5
    while i != 0:
        if num - 1 > 0 and m != 0:
            new_lst.append(num - m)
            i = i - 1
            m = m - 1
        elif num + 1 < max_page_num and p != 0:
            new_lst.append(num + p)
            i = i - 1
            p = p - 1
    print(sorted(new_lst))

get_thread_page_num(9, 50)
In your code, m and p start with values 4 and 5 respectively. In every iteration, one of them decreases by 1, so after 9 iterations both are 0 and new_lst contains 9 appended elements, while i has become 10 - 9 = 1.
After that, neither branch can execute, so i never reaches 0 and the loop becomes infinite.
You can try below code instead. Please refer to the comments.
def get_thread_page_num(num, max_page_num):
    # low and high denote the low and high ends of the list,
    # where the middle element is num
    low = max(0, num - 4)
    high = min(num + 5, max_page_num)
    lst = []
    if max_page_num < 9:
        # a 10-element list is not possible
        return lst
    # In case high is the same as max, just make the list
    # high - 9, high - 8, ..., high
    if high == max_page_num:
        lst = list(range(max(0, high - 9), high + 1))
    else:
        # Just create a list starting from low:
        # low, low + 1, ..., low + 9
        lst = list(range(low, low + 10))
    # Add 0 and max if not already present
    if 0 not in lst:
        lst.append(0)
    if max_page_num not in lst:
        lst.append(max_page_num)
    # return sorted lst
    return sorted(lst)
Calls to get_thread_page_num():
print(get_thread_page_num(15, 50))
print(get_thread_page_num(0, 50))
print(get_thread_page_num(2, 50))
print(get_thread_page_num(50, 50))
print(get_thread_page_num(43, 50))
Output:
[0, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 50]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
[0, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
[0, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50]
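For comparison, here is a more compact sketch of the same idea (a hypothetical page_window helper, under the same assumed behaviour: a 10-page window clamped to the valid range, plus the first and last page):

def page_window(num, max_page_num, size=10):
    # clamp the window start so a full window always fits in [0, max_page_num]
    low = max(0, min(num - 4, max_page_num - size + 1))
    window = range(low, min(low + size, max_page_num + 1))
    # add the first and last page and drop duplicates
    return sorted({0, max_page_num, *window})

print(page_window(15, 50))  # [0, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 50]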

How to break repeating-key XOR Challenge using Single-byte XOR cipher

This question is about challenge number 6 in set number 1 of "the cryptopals crypto challenges".
The challenge is:
There's a file here. It's been base64'd after being encrypted with repeating-key XOR.
Decrypt it.
After that there's a description of the steps to decrypt the file; there are 8 steps in total. You can find them on the site.
I have been trying to solve this challenge for a while and I am struggling with the final two steps, even though I've solved challenge number 3, which contains the solution for these steps.
Note: it is, of course, possible that there is a mistake in the first 6 steps, but they seem to work well judging by the print after every step.
My code:
Written in Python 3.6.
In order not to deal with web requests, since that is not the purpose of this challenge, I just copied the content of the file into a string at the beginning. You can do this as well before running the code.
import base64
# Decoding the file from base64 to binary
file = base64.b64decode("""HUIfTQsP...JwwRTWM=""")
print(file)
print()
# Step 1 - guess key size
KEYSIZE = 4
# Step 2 - find hamming distance - number of differing bits
def hamming2(s1, s2):
    """Calculate the Hamming distance between two bit strings"""
    assert len(s1) == len(s2)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def distance(a, b):  # Hamming distance
    calc = 0
    for ca, cb in [(a[i], b[i]) for i in range(len(a))]:
        bina = '{:08b}'.format(int(ca))
        binb = '{:08b}'.format(int(cb))
        calc += hamming2(bina, binb)
    return calc
# Test step 2
print("distance: 'this is a test' and 'wokka wokka!!!' =", distance([ord(c) for c in "this is a test"], [ord(c) for c in "wokka wokka!!!"])) # 37 - Working
print()
# Step 3
key_sizes = []
# For each key size
for KEYSIZE in range(2, 41):
    # take the first KEYSIZE worth of bytes and the second KEYSIZE worth of bytes -
    # file[0:KEYSIZE], file[KEYSIZE:2*KEYSIZE] -
    # and find the edit distance between them.
    # Normalize this result by dividing by KEYSIZE.
    key_sizes.append((distance(file[0:KEYSIZE], file[KEYSIZE:2*KEYSIZE]) / KEYSIZE, KEYSIZE))
key_sizes.sort(key=lambda a: a[0])
# Step 4
for val, key in key_sizes:
    print(key, ":", val)
KEYSIZE = key_sizes[0][1]
print()
# Step 5 + 6
# Each line is a list of all the bytes at that index
splited_file = [[] for i in range(KEYSIZE)]
counter = 0
for char in file:
    splited_file[counter].append(char)
    counter += 1
    counter %= KEYSIZE

for line in splited_file:
    print(line)
print()
# Step 7
# Code from another level
# Gets a string and a single char
# Doing a single-byte XOR over it
def single_char_string(a, b):
    final = ""
    for c in a:
        final += chr(c ^ b)
    return final

# Trying every possible byte value and ranking the XOR results by their number of spaces
def find_single_byte(in_string):
    helper_list = []
    for num in range(256):
        helper_list.append((single_char_string(in_string, num), num))
    helper_list.sort(key=lambda a: a[0].count(' '), reverse=True)
    return helper_list[0]
# Step 8
final_key = ""
key_list = []
for line in splited_file:
    result = find_single_byte(line)
    print(result)
    final_key += chr(result[1])
    key_list.append(result[1])
print(final_key)
print(key_list)
Output:
b'\x1dB\x1fM\x0b\x0f\x02\x1fO\x13N<\x1aie\x1fI...\x08VA;R\x1d\x06\x06TT\x0e\x10N\x05\x16I\x1e\x10\'\x0c\x11Mc'
distance: 'this is a test' and 'wokka wokka!!!' = 37
5 : 1.2
3 : 2.0
2 : 2.5
.
.
.
26 : 3.5
28 : 3.5357142857142856
9 : 3.5555555555555554
22 : 3.727272727272727
6 : 4.0
[29, 15, 78, 31, 19, 27, 0, 32, ... 17, 26, 78, 38, 28, 2, 1, 65, 6, 78, 16, 99]
[66, 2, 60, 73, 1, 1, 30, 3, 13, ... 26, 14, 0, 26, 79, 99, 8, 79, 11, 4, 82, 59, 84, 5, 39]
[31, 31, 19, 26, 79, 47, 17, 28, ... 71, 89, 12, 1, 16, 45, 78, 3, 120, 11, 42, 82, 84, 22, 12]
[77, 79, 105, 14, 7, 69, 73, 29, 101, ... 54, 70, 78, 55, 7, 79, 31, 88, 10, 69, 65, 8, 29, 14, 73, 17]
[11, 19, 101, 78, 78, 54, 100, 67, 82, ... 1, 76, 26, 1, 2, 73, 21, 72, 73, 49, 27, 86, 6, 16, 30, 77]
('=/n?3; \x00\x13&-,>1...r1:n\x06<"!a&n0C', 32)
('b"\x1ci!!>ts es(ogg ...5i<% tc:. :oC(o+$r\x1bt%\x07', 32)
('??:<+6!=ngm2i4\x0byD...&h9&2:-)sm.a)u\x06&=\x0ct&~n +=&*4X:<(3:o\x0f1<mE gy,!0\rn#X+\nrt6,', 32)
('moI.\'ei=Et\'\x1c:l ...6k=\x1b m~t*\x155\x1ei+=+ts/e*9$sgl0\'\x02\x16fn\x17\'o?x*ea(=.i1', 32)
('+3Enn\x16Dcr<$,)\x01...i5\x01,hi\x11;v&0>m', 32)
[32, 32, 32, 32, 32]
Notice that when the key is printed as a string you cannot see it, but there are 5 chars in there (five spaces).
This is not the correct answer, since you can see in the fourth print (the results after the XOR) that they do not look like words... Probably a problem in the last two functions, but I couldn't figure it out.
I've also tried some other key lengths, and that does not seem to be the problem.
So what I'm asking for is not a fix to my code; I want to solve this challenge by myself :). I would like you to tell me where I am wrong, why, and how I should continue.
Thank you for your help.
After a lot of thinking and checking, my conclusion was that the problem is in step number 3. The result was not good enough because I looked only at the first two blocks.
I fixed the code so it calculates the KEYSIZE score from all of the blocks.
The code of step 3 now looks like this:
# Step 3
key_sizes = []
# For each key size
for KEYSIZE in range(2, 41):
    running_sum = []
    for i in range(0, int(len(file) / KEYSIZE) - 1):
        running_sum.append(distance(file[i * KEYSIZE:(i + 1) * KEYSIZE],
                                    file[(i + 1) * KEYSIZE:(i + 2) * KEYSIZE]) / KEYSIZE)
    key_sizes.append((sum(running_sum) / len(running_sum), KEYSIZE))
key_sizes.sort(key=lambda a: a[0])
Thanks to anyone who tried to help.
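As a side note (an optimization sketch, not part of the fix above): since file is already a bytes object, the Hamming distance can be computed directly by XORing byte pairs and counting set bits, without building intermediate bit strings:

def hamming_bytes(a, b):
    """Hamming distance between two equal-length byte sequences."""
    assert len(a) == len(b)
    # XOR each byte pair; the number of 1 bits in the result is
    # the number of differing bits for that pair
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# same sanity check as step 2; the expected distance is 37
print(hamming_bytes(b"this is a test", b"wokka wokka!!!"))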

Python for every sequence of random sample generated also include an individual ID

I am trying to program a Lotto simulator, where the code generates 6 unique random numbers out of 45 for about 1000 players, where each player has a unique ID. I want to place them into an array that looks like this:
lotto[0...n-1][0...5]
Where [0...n-1] contains the players ID, and [0...5] their unique 6 game numbers.
So it should look something like this when printed
lotto[1][32, 34, 24, 13, 20, 8]
lotto[2][1, 27, 4, 41, 33, 17]
...
lotto[1000][6, 12, 39, 16, 45, 3]
What is the best way of doing something like this without actually merging the two arrays together?
Later on I want to use a merge-sort algorithm to order the game numbers for each player numerically, so it would look something like this, without the player IDs interfering with the game numbers:
lotto[1][8, 13, 20, 24, 32, 34]
lotto[2][1, 4, 17, 27, 33, 41]
So far I've got:
playerID = list(range(1, 1001))
playerNum = random.sample(range(1, 45), 6)
print(playerID + playerNum)
But that just prints and joins:
[1, 2, 3, ..., 1000, 32, 5, 19, 27, 6, 22]
Thanks for the help.
import random

n_players = 1000
# range(1, 46) gives the numbers 1..45 inclusive
lotto = [random.sample(range(1, 46), 6) for _ in range(n_players)]
OR
import random

n_players = 1000
tup = tuple(range(1, 46))  # 1..45 inclusive
lotto = []
for _ in range(n_players):
    lotto.append(random.sample(tup, 6))
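Either way, the player ID stays implicit as the list index plus one, so nothing needs to be merged. To get the sorted view described above, each player's numbers can be sorted in place, for example:

# sort each player's six numbers; the IDs are untouched because
# they are just the list indices (plus 1)
for numbers in lotto:
    numbers.sort()

print(1, lotto[0])       # player 1's sorted numbers
print(1000, lotto[999])  # player 1000's sorted numbers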

How to obtain the indices of all maximum values in array A that correspond to unique values in array B?

Suppose one has an array of observation times ts, each of which corresponds to some observed value in vs. The observation times are taken to be the number of elapsed hours (starting from zero) and can contain duplicates. I would like to find the indices that correspond to the maximum observed value per unique observation time. I am asking for the indices, as opposed to the values, unlike in a similar question I asked several months ago; this way, I can apply the same indices to various arrays. Below is a sample dataset, which I would like to use to develop code that I can adapt to a much larger dataset.
import numpy as np
ts = np.array([0, 0, 1, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 8, 9, 10])
vs = np.array([500, 600, 550, 700, 500, 500, 450, 800, 900, 700, 600, 850, 850, 900, 900, 900])
My current approach is to split the array of values at every point where the observation time changes.
condition = np.where(np.diff(ts) != 0)[0]+1
ts_spl = np.split(ts, condition)
vs_spl = np.split(vs, condition)
print(ts_spl)
>> [array([0, 0]), array([1]), array([2]), array([3, 3, 3]), array([4, 4]), array([5]), array([6]), array([7]), array([8, 8]), array([9]), array([10])]
print(vs_spl)
>> [array([500, 600]), array([550]), array([700]), array([500, 500, 450]), array([800, 900]), array([700]), array([600]), array([850]), array([850, 900]), array([900]), array([900])]
In this case, duplicate max values at any duplicate times should be counted. Given this example, the returned indices would be:
[1, 2, 3, 4, 5, 8, 9, 10, 11, 13, 14, 15]
# indices = 4,5,6 correspond to values = 500, 500, 450 ==> count indices 4,5
# I might modify this part of the algorithm to return either 4 or 5 instead of 4,5 at some future time
Though I have not yet been able to adapt this algorithm for my purpose, I think it must be possible to exploit the size of each previously-split array in vs_spl to keep an index counter (sketched below). Is this approach feasible for a large dataset (10,000 elements per array before padding; 70,000 elements per array after padding)? If so, how can I adapt it? If not, what other approaches may be useful here?
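For concreteness, the index-counter idea I have in mind would look something like this sketch (using vs_spl from the code above): walk the chunks while keeping a running offset, so that within-chunk positions become global indices:

# running offset turns within-chunk positions into global indices
indices = []
offset = 0
for chunk in vs_spl:
    m = chunk.max()
    indices.extend(offset + i for i, v in enumerate(chunk) if v == m)
    offset += len(chunk)
print(indices)  # [1, 2, 3, 4, 5, 8, 9, 10, 11, 13, 14, 15]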
70,000 isn't that insanely large, so yes, it should be feasible. It is, however, faster to avoid the splitting and use the .reduceat method of the relevant ufuncs. reduceat is like reduce applied to chunks, but you don't have to provide the chunks; you just tell reduceat where you would have cut to get them. For example, like so:
import numpy as np
N = 10**6
ts = np.cumsum(np.random.rand(N) < 0.1)
vs = 50*np.random.randint(10, 20, (N,))
#ts = np.array([0, 0, 1, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 8, 9, 10])
#vs = np.array([500, 600, 550, 700, 500, 500, 450, 800, 900, 700, 600, 850, 850, 900, 900, 900])
# flatnonzero is a bit faster than where
condition = np.r_[0, np.flatnonzero(np.diff(ts)) + 1, len(ts)]
sizes = np.diff(condition)
maxima = np.repeat(np.maximum.reduceat(vs, condition[:-1]), sizes)
maxat = maxima == vs
indices = np.flatnonzero(maxat)
# if you want to know how many maxima at each hour
nmax = np.add.reduceat(maxat, condition[:-1])
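Uncommenting the small sample ts and vs instead, indices comes out as [1, 2, 3, 4, 5, 8, 9, 10, 11, 13, 14, 15], matching the expected result in the question, and nmax reports 2 for hour 3 (the two tied 500s) and 1 for every other hour.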