Calculate all compound growth rates from a list of values - python-3.x

I have a list of growth rates and would like to calculate all available compounded growth rates:
l = [0.3, 0.2, 0.1]
Output (as a list):
o = [0.56, 0.716]
Calculation details for the compounded growth rates:
0.56 = (1 + 0.3) * (1 + 0.2) - 1
0.716 = (1 + 0.3) * (1 + 0.2) * (1 + 0.1) - 1
The function should be flexible to the length of the input list.

You could express the computation with a generator expression, using itertools.accumulate to handle the compounding:
import itertools as IT
import operator
def compound_growth_rates(l):
    result = [xy-1 for xy in
              IT.islice(IT.accumulate((1+x for x in l), operator.mul), 1, None)]
    return result
l = [0.3, 0.2, 0.1]
print(compound_growth_rates(l))
prints
[0.56, 0.7160000000000002]
Or, equivalently, you could write this with list-comprehensions and a for-loop:
def compound_growth_rates(l):
    add_one = [1+x for x in l]
    products = [add_one[0]]
    for x1 in add_one[1:]:
        products.append(x1*products[-1])
    result = [p-1 for p in products[1:]]
    return result
I think the advantage of using itertools.accumulate is that it expresses the
intent of the code better than the for-loop. But the for-loop may be more
readable in the sense that it uses more commonly known syntax.
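If NumPy is available, the same compounding can also be sketched with numpy.cumprod. This is an alternative that the answer above does not use, and the function name compound_growth_rates_np is made up for illustration:

```python
import numpy as np

def compound_growth_rates_np(rates):
    # Cumulative product of the growth factors (1 + r).
    factors = np.cumprod(1.0 + np.asarray(rates, dtype=float))
    # Drop the first entry (a single rate is not compounded) and
    # convert the remaining factors back into rates.
    return (factors[1:] - 1.0).tolist()

print(compound_growth_rates_np([0.3, 0.2, 0.1]))
```

As with the itertools version, the results are subject to floating-point rounding, so compare them with a tolerance rather than with ==.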

Related

How can I input my binary string into the linear complexity test?

I am hoping to gain help to understand how and where I would insert my own binary string that I generated in order to test it for randomness through the linear complexity test. I am very new to coding and would appreciate any help I could get. I uploaded a picture of the code I was using as I was unsuccessful in running the test.
Thanks in advance!
from copy import copy as copy
from numpy import dot as dot
from numpy import histogram as histogram
from numpy import zeros as zeros
from scipy.special import gammainc as gammainc
class ComplexityTest:
    @staticmethod
    def linear_complexity_test(my_binary_string:str, verbose=False, block_size=4):
        """
        Note that this description is taken from the NIST documentation [1]
        [1] http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf
        The focus of this test is the length of a linear feedback shift register (LFSR). The purpose of this test is to
        determine whether or not the sequence is complex enough to be considered random. Random sequences are
        characterized by longer LFSRs. An LFSR that is too short implies non-randomness.
        :param my_binary_string: a binary string
        :param verbose True to display the debug message, False to turn off debug message
        :param block_size: Size of the block
        :return: (p_value, bool) A tuple which contain the p_value and result of frequency_test(True or False)
        """
        my_binary_string = '0010101100010010100010011110110101111010011111110001111001101101'
        length_of_my_binary_string = len(my_binary_string)
        # The number of degrees of freedom;
        # K = 6 has been hard coded into the test.
        degree_of_freedom = 6
        # π0 = 0.010417, π1 = 0.03125, π2 = 0.125, π3 = 0.5, π4 = 0.25, π5 = 0.0625, π6 = 0.020833
        # are the probabilities computed by the equations in Section 3.10
        pi = [0.01047, 0.03125, 0.125, 0.5, 0.25, 0.0625, 0.020833]
        t2 = (block_size / 3.0 + 2.0 / 9) / 2 ** block_size
        mean = 0.5 * block_size + (1.0 / 36) * (9 + (-1) ** (block_size + 1)) - t2
        number_of_block = int(length_of_my_binary_string / block_size)
        if number_of_block > 1:
            block_end = block_size
            block_start = 0
            blocks = []
            for i in range(number_of_block):
                blocks.append(binary_data[block_start:block_end])
                block_start += block_size
                block_end += block_size
            complexities = []
            for block in blocks:
                complexities.append(ComplexityTest.berlekamp_massey_algorithm(block))
            t = ([-1.0 * (((-1) ** block_size) * (chunk - mean) + 2.0 / 9) for chunk in complexities])
            vg = histogram(t, bins=[-9999999999, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 9999999999])[0][::-1]
            im = ([((vg[ii] - number_of_block * pi[ii]) ** 2) / (number_of_block * pi[ii]) for ii in range(7)])
            xObs = 0.0
            for i in range(len(pi)):
                xObs += im[i]
            # P-Value = igamc(K/2, xObs/2)
            p_value = gammainc(degree_of_freedom / 2.0, xObs / 2.0)
            if verbose:
                print('Linear Complexity Test DEBUG BEGIN:')
                print("\tLength of input:\t", length_of_binary_data)
                print('\tLength in bits of a block:\t', )
                print("\tDegree of Freedom:\t\t", degree_of_freedom)
                print('\tNumber of Blocks:\t', number_of_block)
                print('\tValue of Vs:\t\t', vg)
                print('\txObs:\t\t\t\t', xObs)
                print('\tP-Value:\t\t\t', p_value)
                print('DEBUG END.')
            return (p_value, (p_value >= 0.01))
        else:
            return (-1.0, False)
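The listing calls ComplexityTest.berlekamp_massey_algorithm, which is not shown in the question. As an assumption about what that helper does, here is a minimal sketch of the Berlekamp-Massey algorithm over GF(2), which returns the length of the shortest LFSR that generates a bit string. The standalone function name berlekamp_massey is chosen for illustration only:

```python
def berlekamp_massey(bits):
    """Return the linear complexity (shortest LFSR length) of a '0'/'1' string."""
    s = [int(ch) for ch in bits]
    n = len(s)
    if n == 0:
        return 0
    c = [0] * n  # current connection polynomial
    b = [0] * n  # connection polynomial before the last length change
    c[0] = b[0] = 1
    length, m = 0, -1
    for i in range(n):
        # Discrepancy between the actual bit and the LFSR prediction.
        d = s[i]
        for j in range(1, length + 1):
            d ^= c[j] & s[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length = i + 1 - length
                m = i
                b = previous
    return length

# The string is passed in as an argument rather than hard-coded in the function.
print(berlekamp_massey('1101011110001'))
```

The 13-bit string used here satisfies s[i] = s[i-4] XOR s[i-3], so an LFSR of length 4 reproduces it.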

Manually implementing approximation functions

I have a dataset from kaggle of 45,253 rows and a single column for temperature in Kelvin for the city of Detroit. Its mean = 282.97, std = 11, min = 243.48, max = 308.05.
This is the result when plotted as a histogram of 100 bins with density=True:
I am expected to write the following two functions and see whichever one approximates the closest to the histogram:
Like this one here using scipy.stats.norm.pdf:
I generated the above image using:
x = np.linspace(dataset.Detroit.min(), dataset.Detroit.max(), 1001)
P_norm = norm.pdf(x, dataset.Detroit.mean(), dataset.Detroit.std())
plot_pdf_single(x, P_norm)
However, whenever I try to implement any of the two approximation functions all of my values for P_norm result in 0s or infs.
This is what I tried:
P_norm = [(1.0/(np.sqrt(2.0*pi*(std*std))))*np.exp(((-x_i-mu)*(-x_i-mu))/(2.0*(std*std))) for x_i in x]
I also broke it down into parts for a single x_i:
part1 = ((-x[0] - mu)*(-x[0] - mu)) / (2.0*(std * std))
part2 = np.exp(part1)
part3 = 1.0 / (np.sqrt(2.0 * pi * (std*std)))
total = part3*part2
I got the following values:
1145.3913234604413
inf
0.036267480036493875
inf
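The exponent in your attempt has the wrong sign: (-x_i-mu)*(-x_i-mu) equals (x_i+mu)**2, a large positive number (1145.39 in your breakdown), so np.exp overflows to inf. The exponent should be -(x_i-mu)**2 / (2*std**2). Since both of the approximations use the same formula, it can be written once as a function:
def pdf_approximation(x_i, mu, std):
    return (1.0 / (np.sqrt(2.0 * pi * (std*std)))) * np.exp((-(x_i-mu)*(x_i-mu)) / (2.0 * (std * std)))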
The code for the first approximation is:
mu = 283
std = 11
P_norm = np.array([pdf_approximation(x_i, mu, std) for x_i in x])
plot_pdf_single(x, P_norm)
The code for the second approximation is:
mu1 = 276
std1 = 6
mu2 = 293
std2 = 6.5
P_norm = np.array([(pdf_approximation(x_i, mu1, std1) * 0.5) + (pdf_approximation(x_i, mu2, std2) * 0.5) for x_i in x])
plot_pdf_single(x, P_norm)
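The same density can also be evaluated on the whole array at once instead of in a list comprehension. The following is a sketch only: the names normal_pdf and mixture_pdf are made up for illustration, and the x range is taken from the min and max quoted in the question rather than from the actual dataset.

```python
import numpy as np

def normal_pdf(x, mu, std):
    """Normal density evaluated elementwise on an array."""
    x = np.asarray(x, dtype=float)
    z = (x - mu) / std
    # The exponent is always <= 0, so np.exp cannot overflow here.
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, components):
    """Weighted sum of normal densities; components holds (weight, mu, std)."""
    return sum(w * normal_pdf(x, mu, std) for w, mu, std in components)

x = np.linspace(243.48, 308.05, 1001)
single = normal_pdf(x, 283, 11)
double = mixture_pdf(x, [(0.5, 276, 6), (0.5, 293, 6.5)])
print(single.max(), double.max())
```

Because the weights of the mixture sum to 1, the second curve still integrates to 1 and can be compared directly with a density=True histogram.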

Numpy tensor implementation slower than loop

I have two functions that compute the same metric. One ends up using a list comprehension to cycle through a calculation, the other uses only numpy tensor operations. The functions take in a (N, 3) array, where N is the number of points in 3D space. When N <~ 3000 the tensor function is faster; when N >~ 3000 the list comprehension is faster. Both seem to have linear time complexity in N, i.e. the two time-vs-N lines cross at N =~ 3000.
def approximate_area_loop(section, num_area_divisions):
    n_a_d = num_area_divisions
    interp_vectors = get_section_interp_(section)
    a1 = section[:-1]
    b1 = section[1:]
    a2 = interp_vectors[:-1]
    b2 = interp_vectors[1:]
    c = lambda u: (1 - u) * a1 + u * a2
    d = lambda u: (1 - u) * b1 + u * b2
    x = lambda u, v: (1 - v) * c(u) + v * d(u)
    area = np.sum([np.linalg.norm(np.cross((x((i + 1)/n_a_d, j/n_a_d) - x(i/n_a_d, j/n_a_d)),\
                                           (x(i/n_a_d, (j +1)/n_a_d) - x(i/n_a_d, j/n_a_d))), axis = 1)\
                   for i in range(n_a_d) for j in range(n_a_d)])
    Dt = section[-1, 0] - section[0, 0]
    return area, Dt
def approximate_area_tensor(section, num_area_divisions):
    divisors = np.linspace(0, 1, num_area_divisions + 1)
    interp_vectors = get_section_interp_(section)
    a1 = section[:-1]
    b1 = section[1:]
    a2 = interp_vectors[:-1]
    b2 = interp_vectors[1:]
    c = np.multiply.outer(a1, (1 - divisors)) + np.multiply.outer(a2, divisors) # c_areas_vecs_divs
    d = np.multiply.outer(b1, (1 - divisors)) + np.multiply.outer(b2, divisors) # d_areas_vecs_divs
    x = np.multiply.outer(c, (1 - divisors)) + np.multiply.outer(d, divisors) # x_areas_vecs_Divs_divs
    u = x[:, :, 1:, :-1] - x[:, :, :-1, :-1] # u_areas_vecs_Divs_divs
    v = x[:, :, :-1, 1:] - x[:, :, :-1, :-1] # v_areas_vecs_Divs_divs
    sub_area_norm_vecs = np.cross(u, v, axis = 1) # areas_crosses_Divs_divs
    sub_areas = np.linalg.norm(sub_area_norm_vecs, axis = 1) # areas_Divs_divs (values are now sub areas)
    area = np.sum(sub_areas)
    Dt = section[-1, 0] - section[0, 0]
    return area, Dt
Why does the list comprehension version work faster at large N? Surely the tensor version should be faster? I'm wondering if it's something to do with the size of the calculations meaning it's too big to be done in cache? Please ask if I haven't included enough information, I'd really like to get to the bottom of this.
The bottleneck in the fully vectorized function was indeed np.linalg.norm, as @hpaulj's comment suggested.
Norm was used only to get the magnitude of all the vectors contained in axis 1. A much simpler and faster method was to compute it directly:
sub_areas = np.sqrt((sub_area_norm_vecs*sub_area_norm_vecs).sum(axis = 1))
This gives exactly the same results and made the code up to 25 times faster than the loop implementation (even when the loop doesn't use linalg.norm either).
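The equivalence of the two magnitude computations can be checked on random data. This is a sketch: the array cross_vecs is a made-up stand-in for sub_area_norm_vecs with the vector components on axis 1, and the np.einsum line is an extra variant that the answer does not mention.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for sub_area_norm_vecs: (areas, 3, divs, divs).
cross_vecs = rng.normal(size=(100, 3, 8, 8))

# Magnitude along axis 1, three ways.
via_norm = np.linalg.norm(cross_vecs, axis=1)
via_sqrt = np.sqrt((cross_vecs * cross_vecs).sum(axis=1))
via_einsum = np.sqrt(np.einsum('ijkl,ijkl->ikl', cross_vecs, cross_vecs))

print(np.allclose(via_norm, via_sqrt), np.allclose(via_norm, via_einsum))
```

Which variant is fastest depends on the array shape and the NumPy build, so it is worth timing them on the real data (for example with timeit) before settling on one.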

finding optimum lambda and features for polynomial regression

I am new to Data Mining/ML. I've been trying to solve a polynomial regression problem of predicting the price from given input parameters (already normalized within the range [0, 1]).
I'm quite close, as my output is in proportion to the correct one, but it seems a bit suppressed. I believe my algorithm is correct; I just don't know how to arrive at an appropriate lambda (the regularization parameter) or how to decide how many polynomial features to generate, given that the problem says: "The prices per square foot, are (approximately) a polynomial function of the features. This polynomial always has an order less than 4".
Is there a way to visualize the data to find optimum values for these parameters, the way we find the optimal alpha (step size) and number of iterations by visualizing the cost function in linear regression with gradient descent?
Here is my code : http://ideone.com/6ctDFh
from numpy import *
def mapFeature(X1, X2):
    degree = 2
    out = ones((shape(X1)[0], 1))
    for i in range(1, degree+1):
        for j in range(0, i+1):
            term1 = X1**(i-j)
            term2 = X2 ** (j)
            term = (term1 * term2).reshape( shape(term1)[0], 1 )
            """note that here 'out[i]' represents mappedfeatures of X1[i], X2[i], .......... out is made to store features of one set in out[i] horizontally """
            out = hstack(( out, term ))
    return out
def solve():
    n, m = input().split()
    m = int(m)
    n = int(n)
    data = zeros((m, n+1))
    for i in range(0, m):
        ausi = input().split()
        for k in range(0, n+1):
            data[i, k] = float(ausi[k])
    X = data[:, 0 : n]
    y = data[:, n]
    theta = zeros((6, 1))
    X = mapFeature(X[:, 0], X[:, 1])
    ausi = computeCostVect(X, y, theta)
    # print(X)
    print("Results using BFGS : ")
    lamda = 2
    theta, cost = findMinTheta(theta, X, y, lamda)
    test = [0.05, 0.54, 0.91, 0.91, 0.31, 0.76, 0.51, 0.31]
    print("prediction for 0.31 , 0.76 (using BFGS) : ")
    for i in range(0, 7, 2):
        print(mapFeature(array([test[i]]), array([test[i+1]])).dot( theta ))
    # pyplot.plot(X[:, 1], y, 'rx', markersize = 5)
    # fig = pyplot.figure()
    # ax = fig.add_subplot(1,1,1)
    # ax.scatter(X[:, 1],X[:, 2], s=y) # Added third variable income as size of the bubble
    # pyplot.show()
The current output is:
183.43478288
349.10716957
236.94627602
208.61071682
The correct output should be:
180.38
1312.07
440.13
343.72
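One common way to choose both the polynomial degree and lambda is to hold out part of the data and compare the validation error over a small grid. The following is a sketch under stated assumptions, not the asker's code: the data is synthetic, the helper names are made up, and a closed-form ridge solution stands in for the BFGS-based findMinTheta, which is not shown in the question.

```python
import numpy as np

def poly_features(x1, x2, degree):
    # Same term ordering as mapFeature: x1**(i-j) * x2**j for i = 1..degree.
    cols = [np.ones_like(x1)]
    for i in range(1, degree + 1):
        for j in range(i + 1):
            cols.append(x1 ** (i - j) * x2 ** j)
    return np.column_stack(cols)

def ridge_fit(X, y, lam):
    # Closed-form regularized least squares; the intercept is not penalized.
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def validation_error(x1, x2, y, degree, lam, n_train):
    X_train = poly_features(x1[:n_train], x2[:n_train], degree)
    X_val = poly_features(x1[n_train:], x2[n_train:], degree)
    theta = ridge_fit(X_train, y[:n_train], lam)
    return np.mean((X_val @ theta - y[n_train:]) ** 2)

# Synthetic data (an assumption): a cubic in x1 plus an interaction term.
rng = np.random.default_rng(0)
x1, x2 = rng.random(200), rng.random(200)
y = 100 + 300 * x1 ** 3 + 200 * x1 * x2 + rng.normal(scale=1.0, size=200)

grid = [(d, lam) for d in (1, 2, 3) for lam in (0.0, 0.01, 0.1, 1.0, 10.0)]
errors = {(d, lam): validation_error(x1, x2, y, d, lam, 150) for d, lam in grid}
best_degree, best_lam = min(errors, key=errors.get)
print(best_degree, best_lam)
```

Plotting the values in errors against lambda, one curve per degree, gives the kind of picture the question asks for: the minimum of the validation curve marks a reasonable choice.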

Runtime Warning Using power with Numpy

I'm using the power function from numpy and I'm getting a warning message. This is the code:
import numpy as np
def f(x, n):
    factor = n / (1. + n)
    exponent = 1. + (1. / n)
    f1_x = factor * np.power(0.5, exponent) - np.power(0.5 - x, exponent)
    f2_x = factor * np.power(0.5, exponent) - np.power(x - 0.5, exponent)
    return np.where((0 <= x) & (x <= 0.5), f1_x, f2_x)
fv = np.vectorize(f, otypes='f')
x = np.linspace(0., 1., 20)
print(fv(x, 0.23))
And this is the warning message:
E:\ProgramasPython3\LibroCientifico\partesvectorizada.py:8: RuntimeWarning: invalid value encountered in power
  f2_x = factor * np.power(0.5, exponent) - np.power(x - 0.5, exponent)
E:\ProgramasPython3\LibroCientifico\partesvectorizada.py:7: RuntimeWarning: invalid value encountered in power
  f1_x = factor * np.power(0.5, exponent) - np.power(0.5 - x, exponent)
[-0.0199636  -0.00895462 -0.0023446   0.00136486  0.003271    0.00414007
  0.00447386  0.00457215  0.00459036  0.00459162  0.00459162  0.00459036
  0.00457215  0.00447386  0.00414007  0.003271    0.00136486 -0.0023446
 -0.00895462 -0.0199636 ]
I don't know what the invalid value is, and I don't know how to tell numpy's where function that f2_x is only valid for values greater than 0.5 and up to 1.0.
Thanks
This happens because you are trying to take a non-integer power of a negative number. Apparently this doesn't work in earlier versions of Python/Numpy unless you explicitly cast the value to complex. So you will have to do something like
np.power(complex(0.5 - x), exponent).real
EDIT: Since your values will be truly complex (not some real number plus a tiny imaginary part), I think you would want to either use the complex value (but then the <= comparison later on gets difficult), or catch the case where the base is negative in some other way.
Ok, thanks a lot everyone, here's the solution using numpy's piecewise function instead of where, together with np.complex128 as mentioned by @Saullo:
import numpy as np
def f(x, n):
    factor = n / (1. + n)
    exponent = (n + 1.) / n
    f1_x = lambda x: factor * \
        np.power(2., -exponent) - np.power((1. - 2. * x) / 2., exponent)
    f2_x = lambda x: factor * \
        np.power(2., -exponent) - np.power(-(1. - 2. * x) / 2., exponent)
    conditions = [(0. <= x) & (x <= 0.5), (0.5 < x) & (x <= 1.)]
    functions = [f1_x, f2_x]
    result = np.piecewise(x, conditions, functions)
    return np.real(result)
x = np.linspace(0., 1., 20)
x = x.astype(np.complex128)
print(f(x, 0.23))
The problem is that np.power cannot handle a negative base with a non-integer exponent on real inputs: it returns nan and emits the warning. I hope this is useful for everyone.
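In the original code, np.where receives f1_x and f2_x already computed, so each branch is also evaluated at the points where its base is negative, and that is where the warning comes from. Since each branch is only selected where its base is non-negative, both bases equal |x - 0.5| on their own interval, so a real-valued alternative is possible without complex numbers. This is a sketch of that idea, and the name f_real is made up for illustration:

```python
import numpy as np

def f_real(x, n):
    x = np.asarray(x, dtype=float)
    factor = n / (1.0 + n)
    exponent = 1.0 + 1.0 / n
    # 0.5 - x on [0, 0.5] and x - 0.5 on (0.5, 1] are both |x - 0.5|,
    # so the base passed to np.power is never negative.
    base = np.abs(x - 0.5)
    return factor * np.power(0.5, exponent) - np.power(base, exponent)

x = np.linspace(0.0, 1.0, 20)
print(f_real(x, 0.23))
```

This works on whole arrays directly, so np.vectorize is not needed. It assumes, as in the question, that x stays within [0, 1].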
