I have a CSV loaded into a dataframe as follows:
filename width height class xmin ymin xmax ymax
0 1.jpg 2048 1251 1 706 513 743 562
1 10.jpg 1600 980 1 715 157 733 181
2 11.jpg 2828 1828 1 460 1530 482 1557
3 12.jpg 1276 1754 1 846 517 878 563
....
19 10.jpg 1600 980 1 428 83 483 145
I would like to get the masks for every image. I've succeeded in getting them when there is only one box per image, but some images have multiple bounding boxes (for example 10.jpg). How can I add those extra bounding boxes to the mask?
So far my code is as follows (it works well if the image has a single row):
for idimage in annotations['filename']:
    img = cv2.imread('images/'+idimage)
    x1 = annotations[annotations['filename'] == idimage]['xmin'][0]
    y1 = annotations[annotations['filename'] == idimage]['ymin'][0]
    x2 = annotations[annotations['filename'] == idimage]['xmax'][0]
    y2 = annotations[annotations['filename'] == idimage]['ymax'][0]
    mask = np.zeros((img.shape[0], img.shape[1])).astype('uint8')
    mask[y1:y2, x1:x2] = 1
    cv2.imwrite('mask/'+idimage, mask)
Thank you!
Actually, this is not quite right:
"I've succeeded in getting them when there is only one box per image"
Your code works only for the first row, because you always request index 0. Every other row fails, because a filtered dataframe remembers its original index, so label 0 only exists in the group that contains the very first row.
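A tiny hypothetical example (not the original data) shows the effect:

import pandas as pd

annotations = pd.DataFrame({'filename': ['1.jpg', '10.jpg', '10.jpg'],
                            'xmin': [706, 715, 428]})
sub = annotations[annotations['filename'] == '10.jpg']
print(sub.index.tolist())  # [1, 2] -- the original labels survive the filter
sub['xmin'][0]             # raises KeyError: label 0 does not exist in this group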
In this case, groupby does the trick.
for fn, subdf in annotations.groupby('filename'):
    img = cv2.imread('images/'+fn)
    mask = np.zeros((img.shape[0], img.shape[1])).astype('uint8')
    for _, row in subdf.iterrows():
        mask[row['ymin']:row['ymax'], row['xmin']:row['xmax']] = 1
    cv2.imwrite('mask/'+fn, mask)
Here groupby lets you iterate over a series of sub-dataframes, one per distinct 'filename'.
In the nested loop, iterrows is used to iterate over the rows of each sub-dataframe, extract the values, and build the mask.
As you can see, the mask is built once per iteration of the outer loop, and the inner loop "paints" one rectangle on it for each row of the sub-dataframe.
EDIT
A similar but slightly faster alternative to iterrows for the inner loop is:
for x1, y1, x2, y2 in zip(subdf['xmin'], subdf['ymin'], subdf['xmax'], subdf['ymax']):
    mask[y1:y2, x1:x2] = 1
This may be useful if you have a large number of rows.
Related
I'm running some tests to check whether certain choices in my sampling algorithm perform better with different values.
The tests had been running without a hitch, but when I tried to run a couple more to gather extra results, I got a MemoryError.
MemoryError Traceback (most recent call last)
<ipython-input-66-1ab060bc6067> in <module>
22 for g in range(0,10000):
23 # sample
---> 24 sample_df = stratified_sample(df,test,size=38, keep_index=False)
25 pathaux = "C://Users//Pedro//Desktop//EscolhasAlgoritmos//Stratified//Stratified_Tests//"
26 example = "exampleFCUL"
<ipython-input-10-7aba847839db> in stratified_sample(df, strata, size, seed, keep_index)
79 # final dataframe
80 if first:
---> 81 stratified_df = df.query(qry).sample(n=n, random_state=seed).reset_index(drop=(not keep_index))
82 first = False
83 else:
D:\Anaconda\lib\site-packages\pandas\core\frame.py in query(self, expr, inplace, **kwargs)
3182 kwargs["level"] = kwargs.pop("level", 0) + 1
3183 kwargs["target"] = None
-> 3184 res = self.eval(expr, **kwargs)
3185
3186 try:
D:\Anaconda\lib\site-packages\pandas\core\frame.py in eval(self, expr, inplace, **kwargs)
3298 kwargs["target"] = self
3299 kwargs["resolvers"] = kwargs.get("resolvers", ()) + tuple(resolvers)
-> 3300 return _eval(expr, inplace=inplace, **kwargs)
3301
3302 def select_dtypes(self, include=None, exclude=None):
D:\Anaconda\lib\site-packages\pandas\core\computation\eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
325 eng = _engines[engine]
326 eng_inst = eng(parsed_expr)
--> 327 ret = eng_inst.evaluate()
328
329 if parsed_expr.assigner is None:
D:\Anaconda\lib\site-packages\pandas\core\computation\engines.py in evaluate(self)
68
69 # make sure no names in resolvers and locals/globals clash
---> 70 res = self._evaluate()
71 return _reconstruct_object(
72 self.result_type, res, self.aligned_axes, self.expr.terms.return_type
D:\Anaconda\lib\site-packages\pandas\core\computation\engines.py in _evaluate(self)
117 truediv = scope["truediv"]
118 _check_ne_builtin_clash(self.expr)
--> 119 return ne.evaluate(s, local_dict=scope, truediv=truediv)
120 except KeyError as e:
121 # python 3 compat kludge
D:\Anaconda\lib\site-packages\numexpr\necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, **kwargs)
814 expr_key = (ex, tuple(sorted(context.items())))
815 if expr_key not in _names_cache:
--> 816 _names_cache[expr_key] = getExprNames(ex, context)
817 names, ex_uses_vml = _names_cache[expr_key]
818 arguments = getArguments(names, local_dict, global_dict)
D:\Anaconda\lib\site-packages\numexpr\necompiler.py in getExprNames(text, context)
705
706 def getExprNames(text, context):
--> 707 ex = stringToExpression(text, {}, context)
708 ast = expressionToAST(ex)
709 input_order = getInputOrder(ast, None)
D:\Anaconda\lib\site-packages\numexpr\necompiler.py in stringToExpression(s, types, context)
282 else:
283 flags = 0
--> 284 c = compile(s, '<expr>', 'eval', flags)
285 # make VariableNode's for the names
286 names = {}
MemoryError:
My question is: what is the best way to solve this memory error without changing the number of parameters? With all the searching I've done here and on Google, I have found no clear answer.
Code:
def transform(multilevelDict):
    return {"t"+'_'+str(key): (transform(value) if isinstance(value, dict) else value) for key, value in multilevelDict.items()}

df = pd.read_csv('testingwebsitedata6.csv', sep=';')
df['Element_Count'] = df['Element_Count'].apply((json.loads))
df['Tag_Count'] = df['Tag_Count'].apply((json.loads))

for i in range(len(df['Tag_Count'])):
    df['Tag_Count'][i] = transform(df['Tag_Count'][i])

df1 = pd.DataFrame(df['Element_Count'].values.tolist())
df2 = pd.DataFrame(df['Tag_Count'].values.tolist())
df = pd.concat([df.drop('Element_Count', axis=1), df1], axis=1)
df = pd.concat([df.drop('Tag_Count', axis=1), df2], axis=1)
df = df.fillna(0)
df[df.select_dtypes(include=['float64']).columns] = df.select_dtypes(include=['float64']).astype(int)
df
test= ['link', 'document', 'heading', 'form', 'textbox', 'button', 'list', 'listitem', 'img', 'navigation', 'banner', 'main', 'article', 'contentinfo', 'checkbox', 'table', 'rowgroup', 'row', 'cell', 'listbox', 'presentation', 'figure', 'columnheader', 'separator', 'group', 'region', 't_html', 't_head', 't_title', 't_meta', 't_link', 't_script', 't_style', 't_body', 't_a', 't_div', 't_h1', 't_form', 't_label', 't_input', 't_ul', 't_li', 't_i', 't_img', 't_nav', 't_header', 't_span', 't_article', 't_p', 't_footer', 't_h3', 't_br', 't_noscript', 't_em', 't_strong', 't_button', 't_h2', 't_ol', 't_time', 't_center', 't_table', 't_tbody', 't_tr', 't_td', 't_font', 't_select', 't_option', 't_b', 't_figure', 't_figcaption', 't_u', 't_iframe', 't_caption', 't_thead', 't_th', 't_h5', 't_sup', 't_map', 't_area', 't_hr', 't_h4', 't_blockquote', 't_sub', 't_fieldset', 't_legend', 't_pre', 't_main', 't_section', 't_small', 't_tfoot', 't_textarea', 't_inserir', 't_s']
print('test1')
print('\n')

for g in range(0, 10000):
    # sample
    sample_df = stratified_sample(df, test, size=38, keep_index=False)
    pathaux = "C://Users//Pedro//Desktop//EscolhasAlgoritmos//Stratified//Stratified_Tests//"
    example = "exampleFCUL"
    randomnumber = g + 1
    csv = ".csv"
    path = pathaux + '26' + '//' + example + str(randomnumber) + csv
    chosencolumns = ["Uri"]
    sample_df.to_csv(path, sep=';', index=False, columns=chosencolumns, header=False)
Stratified sampling function used:
def stratified_sample(df, strata, size=None, seed=None, keep_index=True):
    '''
    It samples data from a pandas dataframe using strata. These functions use
    proportionate stratification:
        n1 = (N1/N) * n
    where:
        - n1 is the sample size of stratum 1
        - N1 is the population size of stratum 1
        - N is the total population size
        - n is the sampling size
    Parameters
    ----------
    :df: pandas dataframe from which data will be sampled.
    :strata: list containing columns that will be used in the stratified sampling.
    :size: sampling size. If not informed, a sampling size will be calculated
        using Cochran's adjusted sampling formula:
            cochran_n = (Z**2 * p * q) / e**2
        where:
            - Z is the z-value. In this case we use 1.96, representing 95%
            - p is the estimated proportion of the population which has an
              attribute. In this case we use 0.5
            - q is 1-p
            - e is the margin of error
        This formula is adjusted as follows:
            adjusted_cochran = cochran_n / (1 + ((cochran_n - 1) / N))
        where:
            - cochran_n is the result of the previous formula
            - N is the population size
    :seed: sampling seed
    :keep_index: if True, it keeps a column with the original population index indicator
    Returns
    -------
    A sampled pandas dataframe based on a set of strata.
    Examples
    --------
    >> df.head()
       id  sex  age  city
    0  123  M    20   XYZ
    1  456  M    25   XYZ
    2  789  M    21   YZX
    3  987  F    40   ZXY
    4  654  M    45   ZXY
    ...
    # This returns a sample stratified by sex and city containing 30% of the size of
    # the original data
    >> stratified = stratified_sample(df=df, strata=['sex', 'city'], size=0.3)
    Requirements
    ------------
    - pandas
    - numpy
    '''
    population = len(df)
    size = __smpl_size(population, size)
    tmp = df[strata]
    tmp['size'] = 1
    tmp_grpd = tmp.groupby(strata).count().reset_index()
    tmp_grpd['samp_size'] = round(size/population * tmp_grpd['size']).astype(int)

    # controlling variable to create the dataframe or append to it
    first = True
    for i in range(len(tmp_grpd)):
        # query generator for each iteration
        qry = ''
        for s in range(len(strata)):
            stratum = strata[s]
            value = tmp_grpd.iloc[i][stratum]
            n = tmp_grpd.iloc[i]['samp_size']
            if type(value) == str:
                value = "'" + str(value) + "'"
            if s != len(strata)-1:
                qry = qry + stratum + ' == ' + str(value) + ' & '
            else:
                qry = qry + stratum + ' == ' + str(value)
        # final dataframe
        if first:
            stratified_df = df.query(qry).sample(n=n, random_state=seed).reset_index(drop=(not keep_index))
            first = False
        else:
            tmp_df = df.query(qry).sample(n=n, random_state=seed).reset_index(drop=(not keep_index))
            stratified_df = stratified_df.append(tmp_df, ignore_index=True)
    return stratified_df
def stratified_sample_report(df, strata, size=None):
    '''
    Generates a dataframe reporting the counts in each stratum and the counts
    for the final sampled dataframe.
    Parameters
    ----------
    :df: pandas dataframe from which data will be sampled.
    :strata: list containing columns that will be used in the stratified sampling.
    :size: sampling size. If not informed, a sampling size will be calculated
        using Cochran's adjusted sampling formula:
            cochran_n = (Z**2 * p * q) / e**2
        where:
            - Z is the z-value. In this case we use 1.96, representing 95%
            - p is the estimated proportion of the population which has an
              attribute. In this case we use 0.5
            - q is 1-p
            - e is the margin of error
        This formula is adjusted as follows:
            adjusted_cochran = cochran_n / (1 + ((cochran_n - 1) / N))
        where:
            - cochran_n is the result of the previous formula
            - N is the population size
    Returns
    -------
    A dataframe reporting the counts in each stratum and the counts
    for the final sampled dataframe.
    '''
    population = len(df)
    size = __smpl_size(population, size)
    tmp = df[strata]
    tmp['size'] = 1
    tmp_grpd = tmp.groupby(strata).count().reset_index()
    tmp_grpd['samp_size'] = round(size/population * tmp_grpd['size']).astype(int)
    return tmp_grpd
def __smpl_size(population, size):
    '''
    A function to compute the sample size. If not informed, a sampling
    size will be calculated using Cochran's adjusted sampling formula:
        cochran_n = (Z**2 * p * q) / e**2
    where:
        - Z is the z-value. In this case we use 1.96, representing 95%
        - p is the estimated proportion of the population which has an
          attribute. In this case we use 0.5
        - q is 1-p
        - e is the margin of error
    This formula is adjusted as follows:
        adjusted_cochran = cochran_n / (1 + ((cochran_n - 1) / N))
    where:
        - cochran_n is the result of the previous formula
        - N is the population size
    Parameters
    ----------
    :population: population size
    :size: sample size (default = None)
    Returns
    -------
    Calculated sample size to be used in the functions:
        - stratified_sample
        - stratified_sample_report
    '''
    if size is None:
        cochran_n = round(((1.96)**2 * 0.5 * 0.5) / 0.02**2)
        n = round(cochran_n / (1 + ((cochran_n - 1) / population)))
    elif size >= 0 and size < 1:
        n = round(population * size)
    elif size < 0:
        raise ValueError('Parameter "size" must be an integer or a proportion between 0 and 0.99.')
    elif size >= 1:
        n = size
    return n
(If I have forgotten to mention anything that you feel is important for understanding the problem, please say so and I will edit it in.)
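For what it's worth, the traceback bottoms out in numexpr's expression compiler, which pandas routes df.query through by default. A hedged sketch of one thing to try (an assumption on my part, not a confirmed fix): replace the query-string machinery inside stratified_sample with plain boolean masks, so no expression string has to be compiled on any iteration. Passing engine='python' to df.query would sidestep numexpr in a similar way.

# Sketch: stands in for the qry-building loop and the df.query(qry) calls,
# reusing the names (strata, tmp_grpd, i, seed, keep_index) from the function above.
mask = pd.Series(True, index=df.index)
for stratum in strata:
    mask &= (df[stratum] == tmp_grpd.iloc[i][stratum])
n = tmp_grpd.iloc[i]['samp_size']
selected = df[mask].sample(n=n, random_state=seed).reset_index(drop=(not keep_index))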
My question is why Python doesn't execute the print statement in the exception-handling code below. I am trying to calculate the log of the volumes for a bunch of stocks; each stock has 1259 volume values. Python generates a RuntimeWarning: "divide by zero encountered in log", so I tried to use exception handling to locate where the log input is zero, but Python doesn't execute the print statement under except. The print statement is supposed to print the name of the stock and the index in the array where the volume is zero. Why?
Here is the code:
for i, stock in enumerate(df.columns):
    volumes = df[stock].to_numpy()
    for r in range(len(volumes)):  # len(volumes) = 1259
        try:
            v = np.log(volumes[r])
        except:
            print(stock, r)
Here is the error that follows the RuntimeWarning.
LinAlgError Traceback (most recent call last)
<ipython-input-6-6aa283671e2c> in <module>
13 closes = df_close[stock].to_numpy()
14 volumes = df_vol[stock].to_numpy()
---> 15 indicator_values_all_stocks[i] = indicator.price_volume_fit(volumes, closes, histLength)
16
17 indicator_values_all_stocks_no_NaN = indicator_values_all_stocks[:, ~np.isnan(indicator_values_all_stocks).any(axis=0)]
~\Desktop\Python Projects Organized\Finance\Indicator Statistics\B.57. Price Volume Fit\indicator.py in price_volume_fit(volumes, closes, histLength)
1259 x = log_volumes[i - histLength:i]
1260 y = log_prices[i - histLength:i]
-> 1261 model = np.polyfit(x, y, 1, full = True)
1262 slope[i] = model[0][0]
1263
<__array_function__ internals> in polyfit(*args, **kwargs)
c:\users\donald seger\miniconda3\envs\tensorflow\lib\site-packages\numpy\lib\polynomial.py in polyfit(x, y, deg, rcond, full, w, cov)
629 scale = NX.sqrt((lhs*lhs).sum(axis=0))
630 lhs /= scale
--> 631 c, resids, rank, s = lstsq(lhs, rhs, rcond)
632 c = (c.T/scale).T # broadcast scale coefficients
633
<__array_function__ internals> in lstsq(*args, **kwargs)
c:\users\donald seger\miniconda3\envs\tensorflow\lib\site-packages\numpy\linalg\linalg.py in lstsq(a, b, rcond)
2257 # lapack can't handle n_rhs = 0 - so allocate the array one larger in that axis
2258 b = zeros(b.shape[:-2] + (m, n_rhs + 1), dtype=b.dtype)
-> 2259 x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
2260 if m == 0:
2261 x[...] = 0
c:\users\donald seger\miniconda3\envs\tensorflow\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_lstsq(err, flag)
107
108 def _raise_linalgerror_lstsq(err, flag):
--> 109 raise LinAlgError("SVD did not converge in Linear Least Squares")
110
111 def get_linalg_error_extobj(callback):
LinAlgError: SVD did not converge in Linear Least Squares
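As for the "Why?": np.log(0) does not raise an exception at all. It emits a RuntimeWarning, returns -inf, and carries on, so the except branch never runs; try/except only catches exceptions, not warnings. A minimal sketch (assuming the goal is simply to locate the zero volumes) that turns the warning into a catchable error with numpy's errstate:

import numpy as np

for i, stock in enumerate(df.columns):
    volumes = df[stock].to_numpy()
    with np.errstate(divide='raise'):  # divide-by-zero now raises FloatingPointError
        for r in range(len(volumes)):
            try:
                v = np.log(volumes[r])
            except FloatingPointError:
                print(stock, r)

A loop-free alternative would be np.where(volumes == 0) on each stock's array.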
I'm having trouble understanding why my code doesn't work the way I think it should.
This function is supposed to fill a two-dimensional list with 1 instead of 0, given some parameters:
x = where to start filling on the x axis
y = where to start filling on the y axis
length = length of the rectangle
width = width of the rectangle
rotation = boolean telling whether the rectangle should be drawn
vertically (if True) or horizontally (if False)
So I call drawLoop(), which calls drawVertical(), which in turn calls drawHorizon().
The function is meant to receive multiple rectangles eventually, but my problem occurs when the first one is added.
Here is the code:
# Create a 2D list of 120 * 60
thisBox = [["0"] * 120] * 60

# Filler function
def drawLoop(x, y, length, width, rotation):

    def drawHorizon(x, y, length, width, rotation, row):
        drawIndexH = 0
        while drawIndexH < len(row):
            if rotation == False and drawIndexH >= x and drawIndexH < x + length:
                row[drawIndexH] = "1"
                drawIndexH += 1
            elif rotation == True and drawIndexH >= x and drawIndexH < x + width:
                row[drawIndexH] = "1"
                drawIndexH += 1
            else:
                return

    def drawVertical(x, y, length, width, rotation):
        drawIndexV = 0
        while drawIndexV < len(thisBox):
            if rotation == False and drawIndexV >= y and drawIndexV < y + width:
                drawHorizon(x, y, length, width, rotation, thisBox[drawIndexV])
                drawIndexV += 1
            elif rotation == True and drawIndexV >= y and drawIndexV < y + length:
                drawHorizon(x, y, length, width, rotation, thisBox[drawIndexV])
                drawIndexV += 1
            else:
                drawIndexV += 1

    # Launch vertical drawing
    drawVertical(x, y, length, width, rotation)

# Launch main function
drawLoop(0, 0, 70, 50, False)
As of now, the 120 * 60 space is empty, so by calling the main function with drawLoop(0, 0, 70, 50, False) on the last line, I expect to see a 70 * 50 rectangle drawn at position (0, 0). So out of the 7200 zeros (120 * 60) I should see only 3700 left (7200 - (70 * 50)).
The function is split into two helpers: drawVertical(x, y, length, width, rotation), which draws vertically, and drawHorizon(x, y, length, width, rotation, row), which draws horizontally first.
But somehow, on the first iteration of drawVertical(...), all the rows get filled at once instead of stopping at the exit condition: it should stop at y = 50 (the width), because drawIndexV < y + width in the condition if rotation == False and drawIndexV >= y and drawIndexV < y + width: should become false at 50. But it does not, and I have no clue why.
Even if I tell drawVertical(...) to stop the loop after the first iteration with while drawIndexV < 2:, all the rows are still filled.
So horizontally I get the expected result, but vertically I never do. Can anybody spot my mistake? Many thanks in advance!
Ben.
The problem is indeed that
thisBox = [["0"] * 120] * 60
creates a list containing the same element 60 times: a single list of 120 "0" values, as the following code snippet shows:
for r in range(60):
    print(id(thisBox[r]))
for which a sample output is:
140706579889408
140706579889408
140706579889408
140706579889408
140706579889408
140706579889408
...
Updating any element on any row will update the same element on every row, since every row is the same unique list object.
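A three-line demonstration makes the aliasing visible:

thisBox = [["0"] * 120] * 60
thisBox[0][0] = "1"
print(thisBox[1][0], thisBox[59][0])  # prints: 1 1 -- the write shows up in every row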
To avoid the issue, each of the enclosed lists (the 60 lists, each containing 120 "0") needs to be a separate list object, distinct from all the others, i.e., with its own id.
If you are familiar with numpy (numpy.zeros), and depending on the exact requirements, numpy arrays could be a solution. If using numpy here feels like shooting a bird with a cannonball, an alternative is a list comprehension to initialise the list:
thisBox = [["0" for c in range(120)] for r in range(60)]
Running the same code as before confirms that each list of 120 "0" now has its own id, i.e., each row is a separate list:
for r in range(60):
    print(id(thisBox[r]))
140185522518784
140185522481600
140185522519680
140185522482560
140185503364672
...
I have found my problem (thanks, Philippe!!!)
When I do this:
# Create a 2D list of 120 * 60
thisBox = [["0"] * 120] * 60
I was actually creating a list of the same element 60 times. They all share the same memory allocation, so if I modify one, I modify them all. That's why one iteration of drawVertical(...) modified all 60 rows.
I have two variables, 'Root zone' (RZS) and 'Tree cover' (TC), both geolocated (NetCDF), which are basically grids in which each cell holds a specific value. The TC values vary from 0 to 100. Each grid cell is 0.25 degrees (which might be helpful for judging distances).
My problem: I want to calculate the distance of each TC value in the ranges 70-100 and 30-70 (so every TC value greater than 30, at each lat and lon) from the nearest point where TC is below 30 (the 0-30 range).
What I want to produce is a 2-dimensional scatter plot with the X-axis denoting the distance in km of the 70-100 TC (and 30-70 TC) points from the 0-30 points, and the Y-axis denoting the RZS of those 70-100 TC (and 30-70 TC) points.
#I read the files using xarray
deficit_annual = xr.open_dataset('Rootzone_CHIRPS_era5_2000-2015_annual_SA_masked.nc')
tc = xr.open_dataset('Treecover_MODIS_2000-2015_annual_SA_masked.nc')
fig, ax = plt.subplots(figsize = (8,8))
## year I am interested in
year = 2000
i = year - 2000
# Select the indices of the low- and high-valued points
# This will results in warnings here because of NaNs;
# the NaNs should be filtered out in the indices, since they will
# compare to False in all the comparisons, and thus not be
# indexed by 'low' and 'high'
low = (tc[i,:,:] <= 30) # Savanna
moderate = (tc[i,:,:] > 30) & (tc[i,:,:] < 70) #Transitional forest
high = (tc[i,:,:] >= 70) #Forest
# Get the coordinates for the low- and high-valued points,
# combine and transpose them to be in the correct format
y, x = np.where(low)
low_coords = np.array([x, y]).T
y, x = np.where(high)
high_coords = np.array([x, y]).T
y, x = np.where(moderate)
moderate_coords = np.array([x, y]).T
# We now calculate the distances between *all* low-valued points, and *all* high-valued points.
# This calculation scales as O^2, as does the memory cost (of the output),
# so be wary when using it with large input sizes.
from scipy.spatial.distance import cdist, pdist
distances = cdist(low_coords, moderate_coords, 'euclidean')
# Now find the minimum distance along the axis of the high-valued coords,
# which here is the second axis.
# Since we also want to find values corresponding to those minimum distances,
# we should use the `argmin` function instead of a normal `min` function.
indices = distances.argmin(axis=1)
mindistances = distances[np.arange(distances.shape[0]), indices]
minrzs = np.array(deficit_annual[i,:,:]).flatten()[indices]
plt.scatter(mindistances*25, minrzs, s = 60, alpha = 0.5, color = 'goldenrod', label = 'Transitional Forest')
distances = cdist(low_coords, high_coords, 'euclidean')
# Now find the minimum distance along the axis of the high-valued coords,
# which here is the second axis.
# Since we also want to find values corresponding to those minimum distances,
# we should use the `argmin` function instead of a normal `min` function.
indices = distances.argmin(axis=1)
mindistances = distances[np.arange(distances.shape[0]), indices]
minrzs = np.array(deficit_annual[i,:,:]).flatten()[indices]
plt.scatter(mindistances*25, minrzs, s = 60, alpha = 1, color = 'green', label = 'Forest')
plt.xlabel('Distance from Savanna (km)', fontsize = '14')
plt.xticks(fontsize = '14')
plt.yticks(fontsize = '14')
plt.ylabel('Rootzone storage capacity (mm/year)', fontsize = '14')
plt.legend(fontsize = '14')
#plt.ylim((-10, 1100))
#plt.xlim((0, 30))
What I want to know is whether the code has an error; it works now, but it stops working when I increase high = (tc[i,:,:] >= 70) to 80 for the year 2000. This makes me wonder whether the code is correct.
Secondly, is it possible to define a 20 km buffer region for low = (tc[i,:,:] <= 30)? What I mean is that 'low' would be defined only where a cluster of tree-cover values is below 30, not by an individual pixel.
Some netCDF files are attached in the link below:
https://www.dropbox.com/sh/unm96q7sfto8y53/AAA7e12bs07XtpMiVFdML_PIa?dl=0
The graph I want is something like this (derived from the code above).
Thank you for your help.
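One detail that may be worth double-checking in the code above (an observation, not a verified diagnosis): indices comes from argmin over the moderate/high axis, yet the following line applies it to the flattened full grid. If the intent is "for each moderate (or high) point, the distance to its nearest low point, paired with the RZS at that same moderate point", a sketch of that pairing could look like:

# Sketch: per-moderate-point nearest distance to any low point,
# paired with the RZS value at the moderate point itself.
distances = cdist(moderate_coords, low_coords, 'euclidean')
mindistances = distances.min(axis=1)
rz = np.array(deficit_annual[i, :, :])
minrzs = rz[moderate_coords[:, 1], moderate_coords[:, 0]]  # coords rows are (x, y); the grid is indexed [y, x]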
I have a function which calculates the mean depth of a 3-D volume. Is there a way to make the code more efficient in terms of execution time? The volume has the following shape.
volume = np.zeros((100, 240, 180))
The volume can contain the number 1 at different voxels, and the objective is to find the mean depth (the mean z coordinate) as a weighted average over all occupied cells in the volume.
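In other words, the quantity to compute is the occupancy-weighted average of the slice index:

mean_depth = sum_z (z * N(z)) / sum_z N(z)

where N(z) is the number of non-zero voxels in the x-y plane at depth z.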
def calc_mean_depth(volume):
    '''
    Calculate the mean depth of the volume. Only voxels which contain a value
    are considered for the mean depth.
    Parameters
    ----------
    volume: (100x240x180) numpy array
        Input 3-D volume which may contain a value of 1 in its voxels
    Return
    ------
    mean_depth: <float>
        the calculated mean depth
    '''
    depth_weight = 0
    tot = 0
    for z in range(volume.shape[0]):
        vol_slice = volume[z, :, :]             # take one x-y plane
        weight = vol_slice[vol_slice > 0].size  # number of occupied cells in this plane
        tot += weight                           # running denominator: total occupied cells
        depth_weight += weight * z              # depth index weighted by the plane's occupied-cell count
    if tot == 0:
        return 0
    else:
        mean_depth = depth_weight / tot
        return mean_depth
This should work. Use count_nonzero to do the summing and do the averaging at the end.
def calc_mean_depth(volume):
    # number of occupied voxels in each x-y slice
    w = np.count_nonzero(volume, axis=(1, 2))
    if w.sum() == 0:
        return 0
    else:
        return (np.arange(w.size) * w).sum() / w.sum()
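A quick sanity check on random data (a hypothetical snippet; it assumes the original loop version above has been kept around under another name, here calc_mean_depth_loop):

import numpy as np

rng = np.random.default_rng(0)
volume = (rng.random((100, 240, 180)) > 0.99).astype(np.uint8)
print(calc_mean_depth_loop(volume))  # original loop implementation
print(calc_mean_depth(volume))       # vectorized version, same value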