Calculating the Cumulative Mean in Python - python-3.x

i am new on programming and python. I made a simulation mm1 queue. I ran it properly. I took the results. I have an 5000 output. But now i should calculate the cumulative mean of average delays for every 100 period(1 to 100, 1 to 200... until 1 to 5000).
#data 4 (delay time) set assign to list of numpy array
npdelaytime = np.array(data[4][0:5000])
#reshape the list of delay time 100 customer in each sample
npdelayreshape100 = np.reshape(npdelaytime, (-1,100))
#mean of this reshape matrix
meandelayreshape100 = np.mean(npdelayreshape100, axis=1)
cumsummdr100 = np.cumsum(meandelayreshape100)
a = range(1,51)
meancsmdr100 = cumsummdr100 / a
I can figure this out like this. First reshape the 5000 sample point into to 100*50. Then taking the means of these matrix. Lastly cumsum of these means.
My Question : Is there a easy way to do this ?

What about replacing range by np.arange ?
Try:
meancsmdr100 = cumsummdr100 / np.arange(1,51)

def cum_mean(arr):
cum_sum = np.cumsum(arr, axis=0)
for i in range(cum_sum.shape[0]):
if i == 0:
continue
print(cum_sum[i] / (i + 1))
cum_sum[i] = cum_sum[i] / (i + 1)
return cum_sum

Related

Problem in the function of my program code python

I tried to make a program to do the below things but apparently, the function doesn't work. I want my function to take two or more arguments and give me the average and median and the maximum number of those arguments.
example input:
calc([2, 20])
example output : (11.0, 11.0, 20)
def calc():
total = 0
calc = sorted(calc)
for x in range(len(calc)):
total += int(calc[x])
average = total / len(calc)
sorted(calc)
maximum = calc[len(calc) - 1]
if len(calc) % 2 != 0:
median = calc[(len(calc) // 2) + 1]
else:
median = (float(calc[(len(calc) // 2) - 1]) + float(calc[(len(calc) // 2)])) / 2
return (average, median, maximum)
There are some things I'm going to fix as I go since I can't help myself.
First, you main problem is arguments.
If you hand a function arguments
calc([2, 20])
It needs to accept arguments.
def calc(some_argument):
This will fix your main problem but another thing is you shouldn't have identical names for your variables.
calc is your function name so it should not also be the name of your list within your function.
# changed the arg name to lst
def calc(lst):
lst = sorted(lst)
# I'm going to just set these as variables since
# you're doing the calculations more than once
# it adds a lot of noise to your lines
size = len(lst)
mid = size // 2
total = 0
# in python we can just iterate over a list directly
# without indexing into it
# and python will unpack the variable into x
for x in lst:
total += int(x)
average = total / size
# we can get the last element in a list like so
maximum = lst[-1]
if size % 2 != 0:
# this was a logical error
# the actual element you want is mid
# since indexes start at 0
median = lst[mid]
else:
# here there is no reason to explicity cast to float
# since python division does that automatically
median = (lst[mid - 1] + lst[mid]) / 2
return (average, median, maximum)
print(calc([11.0, 11.0, 20]))
Output:
(14.0, 11.0, 20)
Because you are passing arguments into a function that doesn't accept any, you are getting an error. You could fix this just by making the first line of your program:
def calc(calc):
But it would be better to accept inputs into your function as something like "mylist". To do so you would just have to change your function like so:
def calc(mylist):
calc=sorted(mylist)

Writing user defined function to evaluate the Saha equation with for-loops for an expected output

I am trying to create a function that evaluates the Saha function for certain values of temperature and electron pressure. The question is a little in depth so I will provide as much detail as possible about past code used before this section.
Previous sections code
Evaluating the partition function (part 1):
k= 8.617333262145179e-05
T=10000.
g=1.0
Ca_ion_energies = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34]) #in eV
Ca_partition_values= []
def partfunc_E(chiI,T):
for chiI in Ca_ion_energies:
elem = 0
for i in np.arange(chiI):
elem = elem + (g*np.exp(-(i/(k*T))))
Ca_partition_values.append(elem)
return Ca_partition_values
print(partfunc_E(Ca_ion_energies,T))
Output:
[1.455902590894594, 1.45633321917395, 1.4563345239240013, 1.4563345239240013, 1.4563345239240013]
Evaluating the Boltzmann equation (part 2):
chiI = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34]) #in eV
k= 8.617333262145179e-05
T=10000.
def boltz_E(chiI,T,I,i):
Z_1 = partfunc_E(chiI,T)
ratio = np.exp(-i/(k*T)) / Z_1
return ratio [I-1]
print(Ca_ion_energies)
print("i Fraction in level i for I=1 (neutral)")
print("- -------------------------------------")
for n in range(0,10):
print(n,boltz_E(chiI,10000,1,n))
Output:
[ 6.1131554 11.871719 50.91316 67.2732 84.34 ]
i Fraction in level i for I=1 (neutral)
- -------------------------------------
0 0.6868591389658425
1 0.21522358567610525
2 0.06743914320048579
3 0.021131689732463026
4 0.006621500359539954
5 0.002074811222693332
6 0.0006501308428703751
7 0.0002037149733085943
8 6.383298193775377e-05
9 2.0001718660577703e-05
Question I need help with (and my code so far):
Evaluating the Saha equation (part 3):
The instructions for this section are as follows:
The simplest way to get this ratio is to set 𝑁_𝐼=1 (i.e. the neutral atom) to some value (e.g. unity), evaluate the next ionisation-stage populations successively from the Saha equation in a for loop, and at the end divide them by the sum of all the 𝑁 on the same scale. You will find the numpy np.sum function useful to get the total over all stages. We want temperature T to be 5000K and electron pressure Pe to be 100.0 N/m^2.
FYI: I is the ionisation stage, Z_1 is the partition function from part 1, Z_I is the partition function for stage I+1, Pe is the electron pressure, chiI are the ionisation energies (for Calcium in my code), T is temperature and the function that "fraction" is set equal to is the Saha equation.
It should start something like:
def saha_E(chiI,T,Pe,I):
compute Saha population fraction N_I/N
input: ionisation energies, temperature, electron pressure, ion stage
Compute the partition functions
Loop over each ionisation stage that you have an energy for, computing the fraction via the saha equation. Note that the first stage should be set to 1.
Divide each stage by the total
Return the fraction of the requested stage
My code attempt:
k= 8.617333262145179e-05
T=10000.
g=1.0
Ca_ion_energies = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34])
N_I = 1
h = 6.626e-34
m = 9.11e-31
fractions = []
fraction_sum = []
def saha_E(chiI,T,Pe,I):
Z_1 = partfunc_E(chiI,T)
Z_I = partfunc_E(chiI+1,T)
for I in Ca_ion_energies:
fraction = (N_I*(Z_I/Z_1)*((2*k*T)/((h**3)*Pe))*((2*np.pi*m*k*T)**(3/2))*np.exp(-I/(k*T)))
fractions.append(fraction)
fraction_sum.append(np.sum(fractions))
for i in fractions:
i/fraction_sum
return fraction
print("For ionisation energies (in eV) of:",chiI)
print()
print("I Fraction in stage I")
print("- -------------------")
for I in range(0,6):
print(I,saha_E(chiI,5000,100.0,I))
I am instructed also that the output should be something similar to:
For ionisation energies (in eV) of: [ 6.11 11.87 50.91 67.27 84.34]
I Fraction in stage I
- -------------------
1 0.999998720736
2 1.27926351211e-06
3 7.29993420039e-52
4 1.3474665329e-113
5 1.54848994685e-192
Firstly, I don't think my code is correct but it is the best I can do which is why I need some help, but also, this code is giving me the following error:
TypeError: unsupported operand type(s) for /: 'list' and 'list'
If my code is totally wrong please tell me as I have spent so much time trying to figure this out already.
Edit
This question is still not completely answered, please keep commenting!
If I understood your problem well, my approach is to calculate the "fractions" and "fractions sums" in a single loop on the various energies, and normalize only once we are outside the loop.
Also, careful with the scope of your code. I pushed some variables you declared outside of the function inside of it because there is no reason to keep them alive outside of the function's scope.
Careful also not to use the same variable twice. Your function takes a I argument but then has a I variable in a for loop.
As said in the chat, you want to write dosctrings and comments so that you know where you are going even before touching any code. Here is a base to complete:
import numpy as np
# Constants.
k = 8.617333262145179e-05
g = 1.0
h = 6.626e-34
m = 9.11e-31
Ca_ion_energies = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34]) # in eV.
# Partition function.
def partfunc_E(chiI, T):
"""This function returns the partition of blablabla.
args:
------
:chiI: (array or list) the energy levels of a chosen ion.
:T: (float) the temperature at which kT will be calculated."""
Ca_partition_values = []
for energy_level in chiI: # For each energy level.
elem = 0
for i in np.arange(energy_level): # From 0 to current energy level.
elem += g*np.exp(-(i/(k*T)))
Ca_partition_values.append(elem)
return np.array(Ca_partition_values) # Conversion to numpy array to support operations later.
print(partfunc_E(Ca_ion_energies, T=10000))
# Boltzmann equation.
def boltz_E(chiI, T, I, i):
Z_1 = partfunc_E(chiI, T)
ratio = np.exp(-i/(k*T)) / Z_1
return ratio[I-1]
print(Ca_ion_energies)
print("i Fraction in level i for I=1 (neutral)")
print("- -------------------------------------")
for n in range(0,10):
print(n, boltz_E(Ca_ion_energies, T=10000, I=1, i=n))
# Saha equation.
def saha_E(chiI, T, Pe, i):
p = partfunc_E(chiI, T)
Z_ratios = np.array([p[n]/p[0] for n in range(len(chiI))])
fractions = []
fractions_sum = []
for n, I in enumerate(chiI):
fraction = Z_ratios[n]*((2*k*T)/((h**3)*Pe))*((2*np.pi*m*k*T)**(3/2))*np.exp(-I/(k*T))
fractions.append(fraction)
fractions_sum.append(np.sum(fractions))
# Let's normalize the array before returning it.
fractions = np.divide(fractions, fractions_sum)
return fractions[i]
print("For ionisation energies (in eV) of:", Ca_ion_energies)
print()
print("I Fraction in stage n")
print("- -------------------")
for n in range(0, 4):
print(n, saha_E(Ca_ion_energies, T=5000, Pe=100.0, i=n))

How to vectorize a function of two matrices in numpy?

Say, I have a binary (adjacency) matrix A of dimensions nxn and another matrix U of dimensions nxl. I use the following piece of code to compute a new matrix that I need.
import numpy as np
from numpy import linalg as LA
new_U = np.zeros_like(U)
for idx, a in np.ndenumerate(A):
diff = U[idx[0], :] - U[idx[1], :]
if a == 1.0:
new_U[idx[0], :] += 2 * diff
elif a == 0.0:
norm_diff = LA.norm(U[idx[0], :] - U[idx[1], :])
new_U[idx[0], :] += -2 * diff * np.exp(-norm_diff**2)
return new_U
This takes quite a lot of time to run even when n and l are small. Is there a better way to rewrite (vectorize) this code to reduce the runtime?
Edit 1: Sample input and output.
A = np.array([[0,1,0], [1,0,1], [0,1,0]], dtype='float64')
U = np.array([[2,3], [4,5], [6,7]], dtype='float64')
new_U = np.array([[-4.,-4.], [0,0],[4,4]], dtype='float64')
Edit 2: In mathematical notation, I am trying to compute the following:
where u_ik = U[i, k],u_jk = U[j, k], and u_i = U[i, :]. Also, (i,j) \in E corresponds to a == 1.0 in the code.
Leveraging broadcasting and np.einsum for the sum-reductions -
# Get pair-wise differences between rows for all rows in a vectorized manner
Ud = U[:,None,:]-U
# Compute norm L1 values with those differences
L = LA.norm(Ud,axis=2)
# Compute 2 * diff values for all rows and mask it with ==0 condition
# and sum along axis=1 to simulate the accumulating behaviour
p1 = np.einsum('ijk,ij->ik',2*Ud,A==1.0)
# Similarly, compute for ==1 condition and finally sum those two parts
p2 = np.einsum('ijk,ij,ij->ik',-2*Ud,np.exp(-L**2),A==0.0)
out = p1+p2
Alternatively, use einsum for computing squared-norm values and using those to get p2 -
Lsq = np.einsum('ijk,ijk->ij',Ud,Ud)
p2 = np.einsum('ijk,ij,ij->ik',-2*Ud,np.exp(-Lsq),A==0.0)

Weighted moving average in python with different width in different regions

I was trying to take a oscillation avarage of a highly oscillating data. The oscillations are not uniform, it has less oscillations in the initial regions.
x = np.linspace(0, 1000, 1000001)
y = some oscillating data say, sin(x^2)
(The original data file is huge, so I can't upload it)
I want to take a weighted moving avarage of the function and plot it. Initially the period of the function is larger, so I want to take avarage over a large time interval. While I can do with smaller time interval latter.
I have found a possible elegant solution in following post:
Weighted moving average in python
However, I want to have different width in different regions of x. Say when x is between (0,100) I want the width=0.6, while when x is between (101, 300) width=0.2 and so on.
This is what I have tried to implement( with my limited knowledge in programing!)
def weighted_moving_average(x,y,step_size=0.05):#change the width to control average
bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
bin_avg = np.zeros(len(bin_centers))
#We're going to weight with a Gaussian function
def gaussian(x,amp=1,mean=0,sigma=1):
return amp*np.exp(-(x-mean)**2/(2*sigma**2))
if x.any() < 100:
for index in range(0,len(bin_centers)):
bin_center = bin_centers[index]
weights = gaussian(x,mean=bin_center,sigma=0.6)
bin_avg[index] = np.average(y,weights=weights)
else:
for index in range(0,len(bin_centers)):
bin_center = bin_centers[index]
weights = gaussian(x,mean=bin_center,sigma=0.1)
bin_avg[index] = np.average(y,weights=weights)
return (bin_centers,bin_avg)
It is needless to say that this is not working! I am getting the plot with the first value of sigma. Please help...
The following snippet should do more or less what you tried to do. You have mainly a logical problem in your code, x.any() < 100 will always be True, so you'll never execute the second part.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 1000)
y = np.sin(x**2)
def gaussian(x,amp=1,mean=0,sigma=1):
return amp*np.exp(-(x-mean)**2/(2*sigma**2))
def weighted_average(x,y,step_size=0.3):
weights = np.zeros_like(x)
bin_centers = np.arange(np.min(x),np.max(x)-.5*step_size,step_size)+.5*step_size
bin_avg = np.zeros_like(bin_centers)
for i, center in enumerate(bin_centers):
# Select the indices that should count to that bin
idx = ((x >= center-.5*step_size) & (x <= center+.5*step_size))
weights = gaussian(x[idx], mean=center, sigma=step_size)
bin_avg[i] = np.average(y[idx], weights=weights)
return (bin_centers,bin_avg)
idx = x <= 4
plt.plot(*weighted_average(x[idx],y[idx], step_size=0.6))
idx = x >= 3
plt.plot(*weighted_average(x[idx],y[idx], step_size=0.1))
plt.plot(x,y)
plt.legend(['0.6', '0.1', 'y'])
plt.show()
However, depending on the usage, you could also implement moving average directly:
x = np.linspace(0, 60, 1000)
y = np.sin(x**2)
z = np.zeros_like(x)
z[0] = x[0]
for i, t in enumerate(x[1:]):
a=.2
z[i+1] = a*y[i+1] + (1-a)*z[i]
plt.plot(x,y)
plt.plot(x,z)
plt.legend(['data', 'moving average'])
plt.show()
Of course you could then change a adaptively, e.g. depending of the local variance. Also note that this has apriori a small bias depending on a and the step size in x.

Efficient way of loading minibatches < gpu memory

I have the following scenario:
My dataset >> gpu memory
My minibatches < gpu memory ... such that depending on size I can fit up to 10 in memory at once while still training no problem.
The size of my dataset means I won't revisit datapoints, so I guess no point in making them shared? Or is there? I was thinking that maybe it would be beneficial to have up to 10 shared initialised variables of size=mini-batch, such that I can I swap 10 in at once instead of just one at a a time. Also, is it possible to preload mini-batches in parallel?
If you're not revisiting datapoints then there probably isn't any value in using shared variables.
The following code could be modified and used to evaluate the different methods of getting data into your specific computation.
The "input" method is the one that will probably be best when you have no need to revisit data. The "shared_all" method may outperform everything else but only if you can fit the entire dataset in GPU memory. The "shared_batched" allows you to evaluate whether hierarchically batching your data could help.
In the "shared_batched" method, the dataset is divided into many macro batches and each macro batch is divided into many micro batches. A single shared variable is used to hold a single macro batch. The code evaluates all the micro batches within the current macro batch. Once a complete macro batch has been processed the next macro batch is loaded into the shared variable and the code iterates over the micro batches within it again.
In general, it might be expected that small numbers of large memory transfers will operate faster than larger numbers of smaller transfers (where the total transfered is the same for each). But this needs to be tested (e.g. with the code below) before it can be known for sure; YMMV.
The use of the "borrow" parameter may also have a significant impact on the performance, but be aware of the implications before using it.
import math
import timeit
import numpy
import theano
import theano.tensor as tt
def test_input(data, batch_size):
assert data.shape[0] % batch_size == 0
batch_count = data.shape[0] / batch_size
x = tt.tensor4()
f = theano.function([x], outputs=x.sum())
total = 0.
start = timeit.default_timer()
for batch_index in xrange(batch_count):
total += f(data[batch_index * batch_size: (batch_index + 1) * batch_size])
print 'IN\tNA\t%s\t%s\t%s\t%s' % (batch_size, batch_size, timeit.default_timer() - start, total)
def test_shared_all(data, batch_size):
batch_count = data.shape[0] / batch_size
for borrow in (True, False):
start = timeit.default_timer()
all = theano.shared(data, borrow=borrow)
load_time = timeit.default_timer() - start
x = tt.tensor4()
i = tt.lscalar()
f = theano.function([i], outputs=x.sum(), givens={x: all[i * batch_size:(i + 1) * batch_size]})
total = 0.
start = timeit.default_timer()
for batch_index in xrange(batch_count):
total += f(batch_index)
print 'SA\t%s\t%s\t%s\t%s\t%s' % (
borrow, batch_size, batch_size, load_time + timeit.default_timer() - start, total)
def test_shared_batched(data, macro_batch_size, micro_batch_size):
assert data.shape[0] % macro_batch_size == 0
assert macro_batch_size % micro_batch_size == 0
macro_batch_count = data.shape[0] / macro_batch_size
micro_batch_count = macro_batch_size / micro_batch_size
macro_batch = theano.shared(numpy.empty((macro_batch_size,) + data.shape[1:], dtype=theano.config.floatX),
borrow=True)
x = tt.tensor4()
i = tt.lscalar()
f = theano.function([i], outputs=x.sum(), givens={x: macro_batch[i * micro_batch_size:(i + 1) * micro_batch_size]})
for borrow in (True, False):
total = 0.
start = timeit.default_timer()
for macro_batch_index in xrange(macro_batch_count):
macro_batch.set_value(
data[macro_batch_index * macro_batch_size: (macro_batch_index + 1) * macro_batch_size], borrow=borrow)
for micro_batch_index in xrange(micro_batch_count):
total += f(micro_batch_index)
print 'SB\t%s\t%s\t%s\t%s\t%s' % (
borrow, macro_batch_size, micro_batch_size, timeit.default_timer() - start, total)
def main():
numpy.random.seed(1)
shape = (20000, 3, 32, 32)
print 'Creating random data with shape', shape
data = numpy.random.standard_normal(size=shape).astype(theano.config.floatX)
print 'Running tests'
for macro_batch_size in (shape[0] / pow(10, i) for i in xrange(int(math.log(shape[0], 10)))):
test_shared_all(data, macro_batch_size)
test_input(data, macro_batch_size)
for micro_batch_size in (macro_batch_size / pow(10, i) for i in
xrange(int(math.log(macro_batch_size, 10)) + 1)):
test_shared_batched(data, macro_batch_size, micro_batch_size)
main()

Resources