Distance between 2 user defined georeferenced grids in km - python-3.x

I have 2 variables 'Root zone' and 'Tree cover' both are geolocated (NetCDF) (which are basically grids with each grid having a specific value). The values in TC varies from 0 to 100. Each grid size is 0.25 degrees (might be helpful in understanding the distance).
My problem is "I want to calculate the distance of each TC value ranging between 70-100 and 30-70 (so each value of TC value greater than 30 at each lat and lon) from the points where nearest TC ranges between 0-30 (less than 30)."
What I want to do is create a 2-dimensional scatter plot with X-axis denoting the 'distance in km of 70-100 TC (and 30-70 TC) from 0-30 values', Y-axis denoting 'RZS of those 70-100 TC points (and 30-70 TC)'
#I read the files using xarray
deficit_annual = xr.open_dataset('Rootzone_CHIRPS_era5_2000-2015_annual_SA_masked.nc')
tc = xr.open_dataset('Treecover_MODIS_2000-2015_annual_SA_masked.nc')
fig, ax = plt.subplots(figsize = (8,8))
## year I am interested in
year = 2000
i = year - 2000
# Select the indices of the low- and high-valued points
# This will results in warnings here because of NaNs;
# the NaNs should be filtered out in the indices, since they will
# compare to False in all the comparisons, and thus not be
# indexed by 'low' and 'high'
low = (tc[i,:,:] <= 30) # Savanna
moderate = (tc[i,:,:] > 30) & (tc[i,:,:] < 70) #Transitional forest
high = (tc[i,:,:] >= 70) #Forest
# Get the coordinates for the low- and high-valued points,
# combine and transpose them to be in the correct format
y, x = np.where(low)
low_coords = np.array([x, y]).T
y, x = np.where(high)
high_coords = np.array([x, y]).T
y, x = np.where(moderate)
moderate_coords = np.array([x, y]).T
# We now calculate the distances between *all* low-valued points, and *all* high-valued points.
# This calculation scales as O^2, as does the memory cost (of the output),
# so be wary when using it with large input sizes.
from scipy.spatial.distance import cdist, pdist
distances = cdist(low_coords, moderate_coords, 'euclidean')
# Now find the minimum distance along the axis of the high-valued coords,
# which here is the second axis.
# Since we also want to find values corresponding to those minimum distances,
# we should use the `argmin` function instead of a normal `min` function.
indices = distances.argmin(axis=1)
mindistances = distances[np.arange(distances.shape[0]), indices]
minrzs = np.array(deficit_annual[i,:,:]).flatten()[indices]
plt.scatter(mindistances*25, minrzs, s = 60, alpha = 0.5, color = 'goldenrod', label = 'Trasitional Forest')
distances = cdist(low_coords, high_coords, 'euclidean')
# Now find the minimum distance along the axis of the high-valued coords,
# which here is the second axis.
# Since we also want to find values corresponding to those minimum distances,
# we should use the `argmin` function instead of a normal `min` function.
indices = distances.argmin(axis=1)
mindistances = distances[np.arange(distances.shape[0]), indices]
minrzs = np.array(deficit_annual[i,:,:]).flatten()[indices]
plt.scatter(mindistances*25, minrzs, s = 60, alpha = 1, color = 'green', label = 'Forest')
plt.xlabel('Distance from Savanna (km)', fontsize = '14')
plt.xticks(fontsize = '14')
plt.yticks(fontsize = '14')
plt.ylabel('Rootzone storage capacity (mm/year)', fontsize = '14')
plt.legend(fontsize = '14')
#plt.ylim((-10, 1100))
#plt.xlim((0, 30))
What I want is to know whether the code seems to have an error (as it is working now, but doesn't seem to work when I increase the 'high = (tc[i,:,:] >= 70 ` to 80 for year 2000. This makes me wonder if the code is correct or not.
Secondly, is it possible to define a 20 km buffer region of 'low = (tc[i,:,:] <= 30)'. What I mean is that the 'low' is defined only when a cluster of Tree cover values are below 30 and not by an individual pixel.
Some netCDF files are attached in the link below:
https://www.dropbox.com/sh/unm96q7sfto8y53/AAA7e12bs07XtpMiVFdML_PIa?dl=0
The graph I want is something like this (derived from the code above).
Thank you for your help.

Related

How to increase size of plot using 'ax' and ensure that 'y'-axis ticks are actual values instead of 'le11'

I am trying to plot a bar graph using a dataframe, and I used the below code:
def add_line(ax, xpos, ypos):
line = plt.Line2D([xpos, xpos], [ypos + .1, ypos],
transform=ax.transAxes, color='gray')
line.set_clip_on(False)
ax.add_line(line)
def label_len(my_index,level):
labels = my_index.get_level_values(level)
return [(k, sum(1 for i in g)) for k,g in groupby(labels)]
def label_group_bar_table(ax, df):
ypos = -.1
scale = 1./df.index.size
for level in range(df.index.nlevels)[::-1]:
pos = 0
for label, rpos in label_len(df.index,level):
lxpos = (pos + .5 * rpos)*scale
ax.text(lxpos, ypos, label, ha='center', transform=ax.transAxes)
add_line(ax, pos*scale, ypos)
pos += rpos
add_line(ax, pos*scale , ypos)
ypos -= .1
from matplotlib.pyplot import figure
ax = my_df.plot(kind='bar')
ax.set_xticklabels('State')
ax.set_xlabel('Electricity consumed by every resource')
ax.plot([1,2,3])
#plt.xticks(rotation=90)
label_group_bar_table(ax, my_df)
My question is: How do I change the size of the plot and how can I make sure that the ticks are displayed vertically on the x-axis and ensure that the title of the x-axis and the ticks on the x-axis don't overlap?
While using 'figure', I know that the 'rotation' parameter can be changed to 90 to ensure that x ticks are vertical. I also understand that the 'figsize' can be used to set the size while using figure. But I am not sure how we should work with 'ax'.
Why are my y-axis ticks in decimal and what is that 'le11'? My data contains numbers that are 7 digit or 8 digits. Is there a way to ensure the y-axis also contains 7 or 8 digit numbers instead?
My graph looks like:

Efficient implementation of the following code snippet

I have a function which calculates the mean depth of a 3-D volume. Is there a way to make the code more efficient in terms of execution time. The volume is of the following shape.
volume = np.zeros((100, 240, 180))
The volume can contain the number 1 at different voxels and the objective is to find the mean depth (mean Z co-ord) using weighted average of all occupied cells in the volume.
def calc_mean_depth(volume):
'''
Calculate the mean depth of the volume. Only voxels which contain a value are considered for the mean depth
Parameters:
-----------
volume: (100x240x180) numpy array
Input 3-D volume which may contain value of 1 in its voxels
Return:
-------
mean_depth :<float>
mean depth calculated
'''
depth_weight = 0
tot = 0
for z in range(volume.shape[0]):
vol_slice = volume[z, :, :] # take one x-y plane
weight = vol_slice[vol_slice>0].size # get number of values greater than zero
tot += weight # This counter is used to serve as the denominator
depth_weight += weight * z # the depth plane into number of cells in it greater than 0.
if tot==0:
return 0
else:
mean_depth = depth_weight/tot
return mean_depth
This should work. Use count_nonzero to do the summing and do the averaging at the end.
def calc_mean_depth(volume):
w = np.count_nonzero(volume, axis = (1,2))
if w.sum() == 0:
return 0
else
return (np.arange(w.size) * w).sum() / w.sum()

Weighted moving average in python with different width in different regions

I was trying to take a oscillation avarage of a highly oscillating data. The oscillations are not uniform, it has less oscillations in the initial regions.
x = np.linspace(0, 1000, 1000001)
y = some oscillating data say, sin(x^2)
(The original data file is huge, so I can't upload it)
I want to take a weighted moving avarage of the function and plot it. Initially the period of the function is larger, so I want to take avarage over a large time interval. While I can do with smaller time interval latter.
I have found a possible elegant solution in following post:
Weighted moving average in python
However, I want to have different width in different regions of x. Say when x is between (0,100) I want the width=0.6, while when x is between (101, 300) width=0.2 and so on.
This is what I have tried to implement( with my limited knowledge in programing!)
def weighted_moving_average(x,y,step_size=0.05):#change the width to control average
bin_centers = np.arange(np.min(x),np.max(x)-0.5*step_size,step_size)+0.5*step_size
bin_avg = np.zeros(len(bin_centers))
#We're going to weight with a Gaussian function
def gaussian(x,amp=1,mean=0,sigma=1):
return amp*np.exp(-(x-mean)**2/(2*sigma**2))
if x.any() < 100:
for index in range(0,len(bin_centers)):
bin_center = bin_centers[index]
weights = gaussian(x,mean=bin_center,sigma=0.6)
bin_avg[index] = np.average(y,weights=weights)
else:
for index in range(0,len(bin_centers)):
bin_center = bin_centers[index]
weights = gaussian(x,mean=bin_center,sigma=0.1)
bin_avg[index] = np.average(y,weights=weights)
return (bin_centers,bin_avg)
It is needless to say that this is not working! I am getting the plot with the first value of sigma. Please help...
The following snippet should do more or less what you tried to do. You have mainly a logical problem in your code, x.any() < 100 will always be True, so you'll never execute the second part.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 1000)
y = np.sin(x**2)
def gaussian(x,amp=1,mean=0,sigma=1):
return amp*np.exp(-(x-mean)**2/(2*sigma**2))
def weighted_average(x,y,step_size=0.3):
weights = np.zeros_like(x)
bin_centers = np.arange(np.min(x),np.max(x)-.5*step_size,step_size)+.5*step_size
bin_avg = np.zeros_like(bin_centers)
for i, center in enumerate(bin_centers):
# Select the indices that should count to that bin
idx = ((x >= center-.5*step_size) & (x <= center+.5*step_size))
weights = gaussian(x[idx], mean=center, sigma=step_size)
bin_avg[i] = np.average(y[idx], weights=weights)
return (bin_centers,bin_avg)
idx = x <= 4
plt.plot(*weighted_average(x[idx],y[idx], step_size=0.6))
idx = x >= 3
plt.plot(*weighted_average(x[idx],y[idx], step_size=0.1))
plt.plot(x,y)
plt.legend(['0.6', '0.1', 'y'])
plt.show()
However, depending on the usage, you could also implement moving average directly:
x = np.linspace(0, 60, 1000)
y = np.sin(x**2)
z = np.zeros_like(x)
z[0] = x[0]
for i, t in enumerate(x[1:]):
a=.2
z[i+1] = a*y[i+1] + (1-a)*z[i]
plt.plot(x,y)
plt.plot(x,z)
plt.legend(['data', 'moving average'])
plt.show()
Of course you could then change a adaptively, e.g. depending of the local variance. Also note that this has apriori a small bias depending on a and the step size in x.

Colormapping the Mandelbrot set by iterations in python

I am using np.ogrid to create the x and y grid from which I am drawing my values. I have tried a number of different ways to color the scheme according to the iterations required for |z| >= 2 but nothing seems to work. Even when iterating 10,000 times just to be sure that I have a clear picture when zooming, I cannot figure out how to color the set according to iteration ranges. Here is the code I am using, some of the structure was borrowed from a tutorial. Any suggestions?
#I found this function and searched in numpy for best usage for this type of density plot
x_val, y_val = np.ogrid[-2:2:2000j, -2:2:2000j]
#Creating the values to work with during the iterations
c = x_val + 1j*y_val
z = 0
iter_num = int(input("Please enter the number of iterations:"))
for n in range(iter_num):
z = z**2 + c
if n%10 == 0:
print("Iterations left: ",iter_num - n)
#Creates the mask to filter out values of |z| > 2
z_mask = abs(z) < 2
proper_z_mask = z_mask - 255 #switches current black/white pallette
#Creating the figure and sizing for optimal viewing on a small laptop screen
plt.figure(1, figsize=(8,8))
plt.imshow(z_mask.T, extent=[-2, 2, -2, 2])
plt.gray()
plt.show()

Selecting colors that are furthest apart

I'm working on a project that requires me to select "unique" colors for each item. At times there could be upwards of 400 items. Is there some way out there of selecting the 400 colors that differ the most? Is it as simple as just changing the RGB values by a fixed increment?
You could come up with an equal distribution of 400 colours by incrementing red, green and blue in turn by 34.
That is:
You know you have three colour channels: red, green and blue
You need 400 distinct combinations of R, G and B
So on each channel the number of increments you need is the cube root of 400, i.e. about 7.36
To span the range 0..255 with 7.36 increments, each increment must be about 255/7.36, i.e. about 34
Probably HSL or HSV would be a better representations than RGB for this task.
You may find that changing the hue gives better variability perception to the eye, so adjust your increments in a way that for every X units changed in S and L you change Y (with Y < X) units of hue, and adjust X and Y so you cover the spectrum with your desired amount of samples.
Here is my final code. Hopefully it helps someone down the road.
from PIL import Image, ImageDraw
import math, colorsys, os.path
# number of color circles needed
qty = 400
# the lowest value (V in HSV) can go
vmin = 30
# calculate how much to increment value by
vrange = 100 - vmin
if (qty >= 72):
vdiff = math.floor(vrange / (qty / 72))
else:
vdiff = 0
# set options
sizes = [16, 24, 32]
border_color = '000000'
border_size = 3
# initialize variables
hval = 0
sval = 50
vval = vmin
count = 0
while count < qty:
im = Image.new('RGBA', (100, 100), (0, 0, 0, 0))
draw = ImageDraw.Draw(im)
draw.ellipse((5, 5, 95, 95), fill='#'+border_color)
r, g, b = colorsys.hsv_to_rgb(hval/360.0, sval/100.0, vval/100.0)
r = int(r*255)
g = int(g*255)
b = int(b*255)
draw.ellipse((5+border_size, 5+border_size, 95-border_size, 95-border_size), fill=(r, g, b))
del draw
hexval = '%02x%02x%02x' % (r, g, b)
for size in sizes:
result = im.resize((size, size), Image.ANTIALIAS)
result.save(str(qty)+'/'+hexval+'_'+str(size)+'.png', 'PNG')
if hval + 10 < 360:
hval += 10
else:
if sval == 50:
hval = 0
sval = 100
else:
hval = 0
sval = 50
vval += vdiff
count += 1
Hey I came across this problem a few times in my projects where I wanted to display, say, clusters of points. I found that the best way to go was to use the colormaps from matplotlib (https://matplotlib.org/stable/tutorials/colors/colormaps.html) and
colors = plt.get_cmap("hsv")[np.linspace(0, 1, n_colors)]
this will output rgba colors so you can get the rgb with just
rgb = colors[:,:3]

Resources