Efficient implementation of the following code snippet - python-3.x

I have a function which calculates the mean depth of a 3-D volume. Is there a way to make the code more efficient in terms of execution time? The volume has the following shape:
volume = np.zeros((100, 240, 180))
The volume can contain the number 1 at different voxels, and the objective is to find the mean depth (mean Z coordinate) as a weighted average over all occupied cells in the volume.
def calc_mean_depth(volume):
    '''
    Calculate the mean depth of the volume. Only voxels which contain a value are considered for the mean depth
    Parameters:
    -----------
    volume: (100x240x180) numpy array
        Input 3-D volume which may contain value of 1 in its voxels
    Return:
    -------
    mean_depth : <float>
        mean depth calculated
    '''
    depth_weight = 0
    tot = 0
    for z in range(volume.shape[0]):
        vol_slice = volume[z, :, :]  # take one x-y plane
        weight = vol_slice[vol_slice>0].size  # get number of values greater than zero
        tot += weight  # This counter is used to serve as the denominator
        depth_weight += weight * z  # the depth plane into number of cells in it greater than 0.
    if tot==0:
        return 0
    else:
        mean_depth = depth_weight/tot
        return mean_depth

This should work. Use np.count_nonzero to count the occupied voxels in every x-y plane at once, and do the averaging at the end.
def calc_mean_depth(volume):
    w = np.count_nonzero(volume, axis = (1,2))  # occupied voxels per depth plane
    if w.sum() == 0:
        return 0
    else:
        return (np.arange(w.size) * w).sum() / w.sum()
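As a quick sanity check, the loop from the question and the count_nonzero version can be run side by side. This is only a sketch: the names calc_mean_depth_loop and calc_mean_depth_vec, the smaller random volume, and the 1% fill rate are assumptions made for the comparison, not part of the original posts.

```python
import numpy as np


def calc_mean_depth_loop(volume):
    """Plane-by-plane version, equivalent to the loop in the question."""
    depth_weight = 0
    tot = 0
    for z in range(volume.shape[0]):
        weight = np.count_nonzero(volume[z] > 0)  # occupied voxels in plane z
        tot += weight
        depth_weight += weight * z
    return depth_weight / tot if tot else 0


def calc_mean_depth_vec(volume):
    """Vectorised version: count occupied voxels per plane in one call."""
    w = np.count_nonzero(volume, axis=(1, 2))
    total = w.sum()
    if total == 0:
        return 0
    return (np.arange(w.size) * w).sum() / total


# Hypothetical test volume: about 1% of the voxels are set to 1.
rng = np.random.default_rng(0)
volume = (rng.random((20, 30, 40)) < 0.01).astype(float)

print(calc_mean_depth_loop(volume), calc_mean_depth_vec(volume))
```

Note that count_nonzero counts every non-zero voxel, while the loop in the question counts voxels greater than zero. For a volume that only holds 0 and 1 the two agree.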

Related

Problem in the function of my program code python

I tried to write a program that does the following, but apparently the function doesn't work. I want my function to take a list of two or more numbers and give me the average, the median, and the maximum of those numbers.
example input:
calc([2, 20])
example output : (11.0, 11.0, 20)
def calc():
    total = 0
    calc = sorted(calc)
    for x in range(len(calc)):
        total += int(calc[x])
    average = total / len(calc)
    sorted(calc)
    maximum = calc[len(calc) - 1]
    if len(calc) % 2 != 0:
        median = calc[(len(calc) // 2) + 1]
    else:
        median = (float(calc[(len(calc) // 2) - 1]) + float(calc[(len(calc) // 2)])) / 2
    return (average, median, maximum)
There are some things I'm going to fix as I go since I can't help myself.
First, your main problem is the arguments.
If you hand a function arguments
calc([2, 20])
it needs to accept arguments.
def calc(some_argument):
This will fix your main problem, but another thing is that you shouldn't use identical names for different things.
calc is your function name, so it should not also be the name of the list within your function.
# changed the arg name to lst
def calc(lst):
    lst = sorted(lst)
    # I'm going to just set these as variables since
    # you're doing the calculations more than once
    # it adds a lot of noise to your lines
    size = len(lst)
    mid = size // 2
    total = 0
    # in python we can just iterate over a list directly
    # without indexing into it
    # and python will unpack the variable into x
    for x in lst:
        total += int(x)
    average = total / size
    # we can get the last element in a list like so
    maximum = lst[-1]
    if size % 2 != 0:
        # this was a logical error
        # the actual element you want is mid
        # since indexes start at 0
        median = lst[mid]
    else:
        # here there is no reason to explicitly cast to float
        # since python division does that automatically
        median = (lst[mid - 1] + lst[mid]) / 2
    return (average, median, maximum)
print(calc([11.0, 11.0, 20]))
Output:
(14.0, 11.0, 20)
Because you are passing arguments into a function that doesn't accept any, you are getting an error. You could fix this just by making the first line of your program:
def calc(calc):
But it would be better to give the parameter its own name, something like "mylist". To do so you would just have to change the start of your function like so:
def calc(mylist):
    calc=sorted(mylist)
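For comparison, the same three results can also be obtained from the standard library instead of a hand-written loop. This is a minimal sketch, assuming Python 3.8 or later (for statistics.fmean) and a non-empty list of numbers; the parameter name numbers and the empty-list check are additions that do not appear in the answers above.

```python
import statistics


def calc(numbers):
    """Return (average, median, maximum) of a non-empty list of numbers."""
    if not numbers:
        raise ValueError("calc() needs at least one number")
    average = statistics.fmean(numbers)   # always returns a float
    median = statistics.median(numbers)   # averages the two middle values for even lengths
    return (average, median, max(numbers))


print(calc([2, 20]))  # (11.0, 11.0, 20)
```

statistics.median sorts the data internally, so the off-by-one index from the question cannot occur here.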

Writing user defined function to evaluate the Saha equation with for-loops for an expected output

I am trying to create a function that evaluates the Saha function for certain values of temperature and electron pressure. The question is a little in depth so I will provide as much detail as possible about past code used before this section.
Previous sections code
Evaluating the partition function (part 1):
k= 8.617333262145179e-05
T=10000.
g=1.0
Ca_ion_energies = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34]) #in eV
Ca_partition_values= []
def partfunc_E(chiI,T):
    for chiI in Ca_ion_energies:
        elem = 0
        for i in np.arange(chiI):
            elem = elem + (g*np.exp(-(i/(k*T))))
        Ca_partition_values.append(elem)
    return Ca_partition_values
print(partfunc_E(Ca_ion_energies,T))
Output:
[1.455902590894594, 1.45633321917395, 1.4563345239240013, 1.4563345239240013, 1.4563345239240013]
Evaluating the Boltzmann equation (part 2):
chiI = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34]) #in eV
k= 8.617333262145179e-05
T=10000.
def boltz_E(chiI,T,I,i):
    Z_1 = partfunc_E(chiI,T)
    ratio = np.exp(-i/(k*T)) / Z_1
    return ratio [I-1]
print(Ca_ion_energies)
print("i Fraction in level i for I=1 (neutral)")
print("- -------------------------------------")
for n in range(0,10):
    print(n,boltz_E(chiI,10000,1,n))
Output:
[ 6.1131554 11.871719 50.91316 67.2732 84.34 ]
i Fraction in level i for I=1 (neutral)
- -------------------------------------
0 0.6868591389658425
1 0.21522358567610525
2 0.06743914320048579
3 0.021131689732463026
4 0.006621500359539954
5 0.002074811222693332
6 0.0006501308428703751
7 0.0002037149733085943
8 6.383298193775377e-05
9 2.0001718660577703e-05
Question I need help with (and my code so far):
Evaluating the Saha equation (part 3):
The instructions for this section are as follows:
The simplest way to get this ratio is to set 𝑁_𝐼=1 (i.e. the neutral atom) to some value (e.g. unity), evaluate the next ionisation-stage populations successively from the Saha equation in a for loop, and at the end divide them by the sum of all the 𝑁 on the same scale. You will find the numpy np.sum function useful to get the total over all stages. We want temperature T to be 5000K and electron pressure Pe to be 100.0 N/m^2.
FYI: I is the ionisation stage, Z_1 is the partition function from part 1, Z_I is the partition function for stage I+1, Pe is the electron pressure, chiI are the ionisation energies (for Calcium in my code), T is temperature and the function that "fraction" is set equal to is the Saha equation.
It should start something like:
def saha_E(chiI,T,Pe,I):
    # compute Saha population fraction N_I/N
    # input: ionisation energies, temperature, electron pressure, ion stage
    # Compute the partition functions
    # Loop over each ionisation stage that you have an energy for, computing the fraction via the saha equation. Note that the first stage should be set to 1.
    # Divide each stage by the total
    # Return the fraction of the requested stage
My code attempt:
k= 8.617333262145179e-05
T=10000.
g=1.0
Ca_ion_energies = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34])
N_I = 1
h = 6.626e-34
m = 9.11e-31
fractions = []
fraction_sum = []
def saha_E(chiI,T,Pe,I):
    Z_1 = partfunc_E(chiI,T)
    Z_I = partfunc_E(chiI+1,T)
    for I in Ca_ion_energies:
        fraction = (N_I*(Z_I/Z_1)*((2*k*T)/((h**3)*Pe))*((2*np.pi*m*k*T)**(3/2))*np.exp(-I/(k*T)))
        fractions.append(fraction)
        fraction_sum.append(np.sum(fractions))
    for i in fractions:
        i/fraction_sum
    return fraction
print("For ionisation energies (in eV) of:",chiI)
print()
print("I Fraction in stage I")
print("- -------------------")
for I in range(0,6):
    print(I,saha_E(chiI,5000,100.0,I))
I am instructed also that the output should be something similar to:
For ionisation energies (in eV) of: [ 6.11 11.87 50.91 67.27 84.34]
I Fraction in stage I
- -------------------
1 0.999998720736
2 1.27926351211e-06
3 7.29993420039e-52
4 1.3474665329e-113
5 1.54848994685e-192
Firstly, I don't think my code is correct but it is the best I can do which is why I need some help, but also, this code is giving me the following error:
TypeError: unsupported operand type(s) for /: 'list' and 'list'
If my code is totally wrong please tell me as I have spent so much time trying to figure this out already.
Edit
This question is still not completely answered, please keep commenting!
If I understood your problem correctly, my approach is to calculate the "fractions" and "fractions sums" in a single loop over the various energies, and to normalize only once we are outside the loop.
Also, be careful with the scope of your code. I moved some variables you declared outside of the function into it, because there is no reason to keep them alive outside of the function's scope.
Be careful also not to use the same name for two different things. Your function takes an I argument but then also uses I as the loop variable of a for loop.
As said in the chat, you want to write docstrings and comments so that you know where you are going even before touching any code. Here is a base to complete:
import numpy as np
# Constants.
k = 8.617333262145179e-05
g = 1.0
h = 6.626e-34
m = 9.11e-31
Ca_ion_energies = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34]) # in eV.
# Partition function.
def partfunc_E(chiI, T):
    """This function returns the partition of blablabla.
    args:
    ------
    :chiI: (array or list) the energy levels of a chosen ion.
    :T: (float) the temperature at which kT will be calculated."""
    Ca_partition_values = []
    for energy_level in chiI: # For each energy level.
        elem = 0
        for i in np.arange(energy_level): # From 0 to current energy level.
            elem += g*np.exp(-(i/(k*T)))
        Ca_partition_values.append(elem)
    return np.array(Ca_partition_values) # Conversion to numpy array to support operations later.
print(partfunc_E(Ca_ion_energies, T=10000))
# Boltzmann equation.
def boltz_E(chiI, T, I, i):
    Z_1 = partfunc_E(chiI, T)
    ratio = np.exp(-i/(k*T)) / Z_1
    return ratio[I-1]
print(Ca_ion_energies)
print("i Fraction in level i for I=1 (neutral)")
print("- -------------------------------------")
for n in range(0,10):
    print(n, boltz_E(Ca_ion_energies, T=10000, I=1, i=n))
# Saha equation.
def saha_E(chiI, T, Pe, i):
    p = partfunc_E(chiI, T)
    Z_ratios = np.array([p[n]/p[0] for n in range(len(chiI))])
    fractions = []
    fractions_sum = []
    for n, I in enumerate(chiI):
        fraction = Z_ratios[n]*((2*k*T)/((h**3)*Pe))*((2*np.pi*m*k*T)**(3/2))*np.exp(-I/(k*T))
        fractions.append(fraction)
        fractions_sum.append(np.sum(fractions))
    # Let's normalize the array before returning it.
    fractions = np.divide(fractions, fractions_sum)
    return fractions[i]
print("For ionisation energies (in eV) of:", Ca_ion_energies)
print()
print("I Fraction in stage n")
print("- -------------------")
for n in range(0, 4):
    print(n, saha_E(Ca_ion_energies, T=5000, Pe=100.0, i=n))
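One way to read the assignment text ("set the neutral stage to 1, evaluate the next stages successively, divide by the sum at the end") is as a recursion in which each stage is built from the previous one and the normalization uses the single total over all stages. The sketch below follows that reading and uses the textbook form of the Saha equation, N_(r+1)/N_r = (1/n_e) * 2*U_(r+1)/U_r * (2*pi*m_e*k*T/h^2)^(3/2) * exp(-chi_r/kT). The SI constants, the conversion n_e = Pe/(kT), the vectorised partition function and the helper name saha_fractions are assumptions for illustration only; the sketch has not been checked against the expected table in the question.

```python
import numpy as np

K_EV = 8.617333262145179e-05   # Boltzmann constant in eV/K (value used in the thread)
K_SI = 1.380649e-23            # Boltzmann constant in J/K (assumed SI value)
H_SI = 6.62607015e-34          # Planck constant in J s (assumed SI value)
M_E = 9.1093837015e-31         # electron mass in kg (assumed SI value)

CA_ION_ENERGIES = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34])  # in eV


def partfunc_E(chiI, T, g=1.0):
    """Partition sums as defined in the thread: levels 0, 1, 2, ... eV below each energy."""
    return np.array([np.sum(g * np.exp(-np.arange(chi) / (K_EV * T))) for chi in chiI])


def saha_fractions(chiI, T, Pe):
    """Return N_r/N for stages r = 1 .. len(chiI), with Pe in N/m^2."""
    chiI = np.asarray(chiI, dtype=float)
    u = partfunc_E(chiI, T)
    n_e = Pe / (K_SI * T)  # electron density in m^-3 from the ideal gas law
    saha_const = (2.0 / n_e) * (2.0 * np.pi * M_E * K_SI * T / H_SI**2) ** 1.5
    n = np.zeros(chiI.size)
    n[0] = 1.0  # neutral stage set to unity
    for r in range(chiI.size - 1):
        # Each stage is built from the previous one.
        n[r + 1] = n[r] * saha_const * (u[r + 1] / u[r]) * np.exp(-chiI[r] / (K_EV * T))
    return n / n.sum()  # normalise once, by the total over all stages


def saha_E(chiI, T, Pe, I):
    """Fraction in ionisation stage I, where I = 1 is the neutral atom."""
    return saha_fractions(chiI, T, Pe)[I - 1]


for stage in range(1, 6):
    print(stage, saha_E(CA_ION_ENERGIES, 5000.0, 100.0, stage))
```

Dividing by the single total n.sum() is what keeps the returned fractions summing to one.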

Distance between 2 user defined georeferenced grids in km

I have two variables, 'Root zone' and 'Tree cover', both geolocated (NetCDF), i.e. grids in which each cell has a specific value. The values in TC vary from 0 to 100. Each grid cell is 0.25 degrees in size (which might be helpful in understanding the distance).
My problem: for every cell whose TC value lies between 70-100 or 30-70 (so every TC value greater than 30 at each lat and lon), I want to calculate the distance to the nearest cell where TC lies between 0-30 (less than 30).
What I want to do is create a 2-dimensional scatter plot with the X-axis denoting the 'distance in km of 70-100 TC (and 30-70 TC) from 0-30 values' and the Y-axis denoting the 'RZS of those 70-100 TC points (and 30-70 TC)'.
#I read the files using xarray
deficit_annual = xr.open_dataset('Rootzone_CHIRPS_era5_2000-2015_annual_SA_masked.nc')
tc = xr.open_dataset('Treecover_MODIS_2000-2015_annual_SA_masked.nc')
fig, ax = plt.subplots(figsize = (8,8))
## year I am interested in
year = 2000
i = year - 2000
# Select the indices of the low- and high-valued points
# This will results in warnings here because of NaNs;
# the NaNs should be filtered out in the indices, since they will
# compare to False in all the comparisons, and thus not be
# indexed by 'low' and 'high'
low = (tc[i,:,:] <= 30) # Savanna
moderate = (tc[i,:,:] > 30) & (tc[i,:,:] < 70) #Transitional forest
high = (tc[i,:,:] >= 70) #Forest
# Get the coordinates for the low- and high-valued points,
# combine and transpose them to be in the correct format
y, x = np.where(low)
low_coords = np.array([x, y]).T
y, x = np.where(high)
high_coords = np.array([x, y]).T
y, x = np.where(moderate)
moderate_coords = np.array([x, y]).T
# We now calculate the distances between *all* low-valued points, and *all* high-valued points.
# This calculation scales as O^2, as does the memory cost (of the output),
# so be wary when using it with large input sizes.
from scipy.spatial.distance import cdist, pdist
distances = cdist(low_coords, moderate_coords, 'euclidean')
# Now find the minimum distance along the axis of the high-valued coords,
# which here is the second axis.
# Since we also want to find values corresponding to those minimum distances,
# we should use the `argmin` function instead of a normal `min` function.
indices = distances.argmin(axis=1)
mindistances = distances[np.arange(distances.shape[0]), indices]
minrzs = np.array(deficit_annual[i,:,:]).flatten()[indices]
plt.scatter(mindistances*25, minrzs, s = 60, alpha = 0.5, color = 'goldenrod', label = 'Trasitional Forest')
distances = cdist(low_coords, high_coords, 'euclidean')
# Now find the minimum distance along the axis of the high-valued coords,
# which here is the second axis.
# Since we also want to find values corresponding to those minimum distances,
# we should use the `argmin` function instead of a normal `min` function.
indices = distances.argmin(axis=1)
mindistances = distances[np.arange(distances.shape[0]), indices]
minrzs = np.array(deficit_annual[i,:,:]).flatten()[indices]
plt.scatter(mindistances*25, minrzs, s = 60, alpha = 1, color = 'green', label = 'Forest')
plt.xlabel('Distance from Savanna (km)', fontsize = '14')
plt.xticks(fontsize = '14')
plt.yticks(fontsize = '14')
plt.ylabel('Rootzone storage capacity (mm/year)', fontsize = '14')
plt.legend(fontsize = '14')
#plt.ylim((-10, 1100))
#plt.xlim((0, 30))
What I want to know is whether the code contains an error. It runs as written, but it no longer seems to work when I raise the threshold in 'high = (tc[i,:,:] >= 70)' to 80 for the year 2000, which makes me wonder whether the code is correct.
Secondly, is it possible to define a 20 km buffer region for 'low = (tc[i,:,:] <= 30)'? What I mean is that 'low' should be defined only where a cluster of tree cover values is below 30, not by an individual pixel.
Some netCDF files are attached in the link below:
https://www.dropbox.com/sh/unm96q7sfto8y53/AAA7e12bs07XtpMiVFdML_PIa?dl=0
The graph I want is something like this (derived from the code above).
Thank you for your help.
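Reading the posted code closely, distances has shape (number of low cells, number of high cells), so argmin(axis=1) yields one nearest high cell per low cell, and those indices are positions in high_coords rather than positions in the flattened grid. If the goal is one distance per high (or moderate) cell, the minimum has to run over the low cells, and the root zone value has to be read at the grid position of that high cell. The sketch below shows this with plain NumPy on a tiny made-up grid. The function name nearest_low_distance, the toy arrays, and the 25 km cell size (taken from the `*25` in the question) are assumptions; 0.25 degrees is closer to 28 km in the north-south direction, so a fixed factor is only a rough conversion.

```python
import numpy as np


def nearest_low_distance(low_mask, target_mask, values, cell_km=25.0):
    """For every cell in target_mask, return (distance to the nearest low cell in km,
    value at that target cell). Distances are Euclidean in grid-cell units."""
    low_rows, low_cols = np.where(low_mask)
    tgt_rows, tgt_cols = np.where(target_mask)
    if low_rows.size == 0 or tgt_rows.size == 0:
        return np.array([]), np.array([])
    # Pairwise offsets, shape (n_target, n_low); memory grows with n_target * n_low.
    d_rows = tgt_rows[:, None] - low_rows[None, :]
    d_cols = tgt_cols[:, None] - low_cols[None, :]
    dist = np.hypot(d_rows, d_cols)
    nearest = dist.min(axis=1)  # one distance per target cell
    return nearest * cell_km, values[tgt_rows, tgt_cols]


# Hypothetical 2 x 3 grid of tree cover (%) and root zone storage.
tc = np.array([[10.0, 50.0, 80.0],
               [20.0, 75.0, 90.0]])
rzs = np.array([[0.0, 100.0, 200.0],
                [300.0, 400.0, 500.0]])

low = tc <= 30
high = tc >= 70
dist_km, rzs_high = nearest_low_distance(low, high, rzs)
print(dist_km)   # distances of the three forest cells from the nearest savanna cell
print(rzs_high)  # root zone values at those same forest cells
```

NaN cells compare as False in both masks, so they drop out of either set without extra handling.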

Calculating the Cumulative Mean in Python

I am new to programming and Python. I wrote a simulation of an M/M/1 queue, ran it successfully, and collected the results: 5000 output values. Now I need to calculate the cumulative mean of the average delays for every 100 periods (1 to 100, 1 to 200, ... up to 1 to 5000).
#data 4 (delay time) set assign to list of numpy array
npdelaytime = np.array(data[4][0:5000])
#reshape the list of delay time 100 customer in each sample
npdelayreshape100 = np.reshape(npdelaytime, (-1,100))
#mean of this reshape matrix
meandelayreshape100 = np.mean(npdelayreshape100, axis=1)
cumsummdr100 = np.cumsum(meandelayreshape100)
a = range(1,51)
meancsmdr100 = cumsummdr100 / a
This is how I worked it out: first reshape the 5000 sample points into a 50x100 matrix, then take the mean of each row, and lastly take the cumulative sum of these means and divide by the number of rows so far.
My question: is there an easier way to do this?
What about replacing range with np.arange?
Try:
meancsmdr100 = cumsummdr100 / np.arange(1,51)
def cum_mean(arr):
    # dtype=float keeps the division below from being truncated for integer input
    cum_sum = np.cumsum(arr, axis=0, dtype=float)
    for i in range(cum_sum.shape[0]):
        if i == 0:
            continue
        print(cum_sum[i] / (i + 1))
        cum_sum[i] = cum_sum[i] / (i + 1)
    return cum_sum
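Because every block holds the same number of samples, the mean of the first n block means equals the mean of the first n*100 raw samples. That allows a shorter route: take one cumulative sum over the raw delays and read it off at positions 100, 200, and so on. The sketch below assumes a 1-D array of delays; the function name cumulative_block_means and the random test data are made up for illustration.

```python
import numpy as np


def cumulative_block_means(values, block=100):
    """Mean of samples 1..block, 1..2*block, ... (any incomplete last block is ignored)."""
    values = np.asarray(values, dtype=float)
    n_blocks = values.size // block
    ends = np.arange(1, n_blocks + 1) * block   # 100, 200, ..., n_blocks * block
    return np.cumsum(values)[ends - 1] / ends


# Hypothetical delay times standing in for data[4][0:5000].
rng = np.random.default_rng(1)
delays = rng.exponential(scale=2.0, size=5000)

direct = cumulative_block_means(delays)

# The reshape-based route from the question, for comparison.
block_means = delays.reshape(-1, 100).mean(axis=1)
via_reshape = np.cumsum(block_means) / np.arange(1, 51)

print(np.allclose(direct, via_reshape))
```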

Weighted moving average in python with different width in different regions

I am trying to take an oscillation average of highly oscillating data. The oscillations are not uniform: there are fewer oscillations in the initial region.
x = np.linspace(0, 1000, 1000001)
y = np.sin(x**2)  # stand-in for some oscillating data
(The original data file is huge, so I can't upload it)
I want to take a weighted moving average of the function and plot it. Initially the period of the function is larger, so I want to average over a large time interval, while a smaller time interval is enough later on.
I have found a possible elegant solution in the following post:
Weighted moving average in python
However, I want to have a different width in different regions of x. Say when x is between (0,100) I want width=0.6, while when x is between (101, 300) I want width=0.2, and so on.
This is what I have tried to implement (with my limited knowledge of programming!)
def weighted_moving_average(x,y,step_size=0.05):#change the width to control average
    bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
    bin_avg = np.zeros(len(bin_centers))
    #We're going to weight with a Gaussian function
    def gaussian(x,amp=1,mean=0,sigma=1):
        return amp*np.exp(-(x-mean)**2/(2*sigma**2))
    if x.any() < 100:
        for index in range(0,len(bin_centers)):
            bin_center = bin_centers[index]
            weights = gaussian(x,mean=bin_center,sigma=0.6)
            bin_avg[index] = np.average(y,weights=weights)
    else:
        for index in range(0,len(bin_centers)):
            bin_center = bin_centers[index]
            weights = gaussian(x,mean=bin_center,sigma=0.1)
            bin_avg[index] = np.average(y,weights=weights)
    return (bin_centers,bin_avg)
It is needless to say that this is not working! I am getting the plot with the first value of sigma. Please help...
The following snippet should do more or less what you tried to do. You mainly have a logical problem in your code: x.any() returns a single boolean, so x.any() < 100 compares True or False with 100 and is always True. You therefore never execute the second part.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 1000)
y = np.sin(x**2)
def gaussian(x,amp=1,mean=0,sigma=1):
    return amp*np.exp(-(x-mean)**2/(2*sigma**2))
def weighted_average(x,y,step_size=0.3):
    weights = np.zeros_like(x)
    bin_centers = np.arange(np.min(x),np.max(x)-.5*step_size,step_size)+.5*step_size
    bin_avg = np.zeros_like(bin_centers)
    for i, center in enumerate(bin_centers):
        # Select the indices that should count to that bin
        idx = ((x >= center-.5*step_size) & (x <= center+.5*step_size))
        weights = gaussian(x[idx], mean=center, sigma=step_size)
        bin_avg[i] = np.average(y[idx], weights=weights)
    return (bin_centers,bin_avg)
idx = x <= 4
plt.plot(*weighted_average(x[idx],y[idx], step_size=0.6))
idx = x >= 3
plt.plot(*weighted_average(x[idx],y[idx], step_size=0.1))
plt.plot(x,y)
plt.legend(['0.6', '0.1', 'y'])
plt.show()
However, depending on the usage, you could also implement moving average directly:
x = np.linspace(0, 60, 1000)
y = np.sin(x**2)
z = np.zeros_like(x)
z[0] = y[0]  # start the average from the first data value
for i, t in enumerate(x[1:]):
    a=.2
    z[i+1] = a*y[i+1] + (1-a)*z[i]
plt.plot(x,y)
plt.plot(x,z)
plt.legend(['data', 'moving average'])
plt.show()
Of course you could then change a adaptively, e.g. depending on the local variance. Also note that this has a priori a small bias depending on a and the step size in x.
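If the width should depend on the position along x, as the question asks, one option is to choose sigma per bin centre rather than once for the whole array. The sketch below does that with a small lookup function. The names sigma_for and variable_width_average, the plus/minus 5 sigma window, and the shortened x range with a threshold at x = 10 are all assumptions chosen to keep the example small; x must be sorted for the searchsorted call to be valid.

```python
import numpy as np


def gaussian(x, mean=0.0, sigma=1.0):
    return np.exp(-(x - mean) ** 2 / (2 * sigma ** 2))


def sigma_for(center):
    """Hypothetical width schedule: wide averaging early on, narrower later."""
    return 0.6 if center <= 10 else 0.2


def variable_width_average(x, y, step_size=0.05, sigma_of=sigma_for):
    """Gaussian-weighted average per bin, with sigma chosen from the bin centre."""
    bin_centers = np.arange(np.min(x), np.max(x) - 0.5 * step_size, step_size) + 0.5 * step_size
    bin_avg = np.empty_like(bin_centers)
    for i, center in enumerate(bin_centers):
        sigma = sigma_of(center)
        # Only points within 5 sigma carry noticeable weight; x is assumed sorted.
        lo, hi = np.searchsorted(x, [center - 5 * sigma, center + 5 * sigma])
        weights = gaussian(x[lo:hi], mean=center, sigma=sigma)
        bin_avg[i] = np.average(y[lo:hi], weights=weights)
    return bin_centers, bin_avg


x = np.linspace(0, 20, 2001)
y = np.sin(x ** 2)
centers, averaged = variable_width_average(x, y)
print(centers.shape, averaged.shape)
```

Restricting each bin to a local window also avoids weighting the full array for every bin, which matters for an input with a million points.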

Resources