Hyperbolic CORDIC in rotation mode (Z -> 0) to calculate sinh and cosh?

I implemented both the circular and the hyperbolic CORDIC algorithm in rotation mode (Z -> 0).
For sin and cos, which use the circular implementation, the results are accurate. For sinh and cosh, which use the hyperbolic algorithm, they are not.
The output of the code below (*_calc is the CORDIC version, *_good is the math.* version) is the following:
sin_good(20): 0.3420201433256687
sin_calc(20): 0.34202014332566866
sinh_good(20): 242582597.70489514
sinh_calc(20): 0.3555015499407712
cos_good(20): 0.9396926207859084
cos_calc(20): 0.9396926207859082
cosh_good(20): 242582597.70489514
cosh_calc(20): 1.0594692478629741
What am I doing wrong?
import math

def lookup_circular(iteration):
    return math.degrees(math.atan(2 ** -iteration))

def lookup_linear(iteration):
    return 2 ** -iteration

def lookup_hyperbolic(iteration):
    return math.degrees(math.atanh(2 ** -iteration))

def sin(angle):
    x, y, z = cordic_circular_rotation_zto0(
        x=1 / circular_scaling_factor(),
        y=0,
        z=angle,
    )
    return y

def cos(angle):
    x, y, z = cordic_circular_rotation_zto0(
        x=1 / circular_scaling_factor(),
        y=0,
        z=angle,
    )
    return x

def sinh(angle):
    x, y, z = cordic_hyperbolic_rotation_zto0(
        x=1 / hyperbolic_scaling_factor(),
        y=0,
        z=angle,
    )
    return y

def cosh(angle):
    x, y, z = cordic_hyperbolic_rotation_zto0(
        x=1 / hyperbolic_scaling_factor(),
        y=0,
        z=angle,
    )
    return x

def cordic_circular_rotation_zto0(x, y, z, n=64):
    i = 0
    while i <= n:
        if z < 0:
            newx = x + (y * 2.0 ** (-i))
            newy = y - (x * 2.0 ** (-i))
            z = z + lookup_circular(i)
        else:
            newx = x - (y * 2.0 ** (-i))
            newy = y + (x * 2.0 ** (-i))
            z = z - lookup_circular(i)
        x = newx
        y = newy
        i += 1
    return x, y, z

def cordic_hyperbolic_rotation_zto0(x, y, z, n=64):
    i = 1
    repeat = 4
    while i <= n:
        if z < 0:
            newx = x - (y * 2.0 ** (-i))
            newy = y - (x * 2.0 ** (-i))
            z = z + lookup_hyperbolic(i)
        else:
            newx = x + (y * 2.0 ** (-i))
            newy = y + (x * 2.0 ** (-i))
            z = z - lookup_hyperbolic(i)
        x = newx
        y = newy
        # iterations 4, 13, 40, ... are performed twice
        if i == repeat:
            repeat = (i * 3) + 1
        else:
            i += 1
    return x, y, z

def circular_scaling_factor(n=64):
    e = 1
    for i in range(0, n):
        e = e * math.sqrt(1 + 2 ** (-2 * i))
    return e

def hyperbolic_scaling_factor(n=64):
    e = 1
    for i in range(1, n):
        e = e * math.sqrt(1 - 2 ** (-2 * i))
    return e

if __name__ == '__main__':
    angle = 20
    sin_res = sin(angle)
    print("sin_good({}): {}".format(angle, math.sin(math.radians(angle))))
    print("sin_calc({}): {}".format(angle, sin_res))
    sinh_res = sinh(angle)
    print("sinh_good({}): {}".format(angle, math.sinh(angle)))
    print("sinh_calc({}): {}".format(angle, sinh_res))
    cos_res = cos(angle)
    print("cos_good({}): {}".format(angle, math.cos(math.radians(angle))))
    print("cos_calc({}): {}".format(angle, cos_res))
    cosh_res = cosh(angle)
    print("cosh_good({}): {}".format(angle, math.cosh(angle)))
    print("cosh_calc({}): {}".format(angle, cosh_res))

If I remove the math.degrees call from lookup_hyperbolic, so that the angle table is in radians like the argument of math.cosh, I find the following result for cosh:
both functions match until x ≈ 1.1, and then the CORDIC function stays constant.
This agrees with the convergence limit given in Digital Arithmetic (Ercegovac and Lang, 2003), chapter 11:
max angle = 1.11817
sinh behaves the same way.
There is an extended CORDIC algorithm that you could try to implement:
X. Hu, R. Harber, S. Bass, "Expanding the Range of Convergence of the CORDIC Algorithm", IEEE Trans. Computers, 1991.
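To make the convergence limit concrete, here is a minimal sketch of hyperbolic rotation-mode CORDIC working entirely in radians. The names hyperbolic_schedule, max_hyperbolic_angle and cordic_sinh_cosh are made up for this example, and it assumes the usual convention that iterations 4, 13, 40, ... are performed twice and that the repeats are included in the gain. It covers only the basic algorithm, not the range extension from the paper above.

```python
import math


def hyperbolic_schedule(n):
    """Iteration indices 1..n, with 4, 13, 40, ... each performed twice."""
    schedule, repeat = [], 4
    for i in range(1, n + 1):
        schedule.append(i)
        if i == repeat:
            schedule.append(i)          # repeated step needed for convergence
            repeat = 3 * repeat + 1
    return schedule


def max_hyperbolic_angle(n=40):
    """Approximate largest |z| in radians that the schedule can drive to zero."""
    return sum(math.atanh(2.0 ** -i) for i in hyperbolic_schedule(n))


def cordic_sinh_cosh(t, n=40):
    """Return (sinh(t), cosh(t)) for t in radians, valid for |t| up to about 1.118."""
    schedule = hyperbolic_schedule(n)
    # The gain must include the repeated iterations as well.
    gain = math.prod(math.sqrt(1.0 - 4.0 ** -i) for i in schedule)
    x, y, z = 1.0 / gain, 0.0, t
    for i in schedule:
        d = 1.0 if z >= 0 else -1.0
        x, y = x + d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atanh(2.0 ** -i)
    return y, x


s, c = cordic_sinh_cosh(1.0)
print(s, math.sinh(1.0))
print(c, math.cosh(1.0))
print(max_hyperbolic_angle())
```

An argument such as 20 is far outside this range, so it would have to be reduced first (or handled with the extended algorithm) before a routine like this can be used.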


Cannot optimize the bias parameter in linear regression

I am trying to train a very basic linear regression model to predict a linear equation Y = m*X + c
The Weight parameter is optimized to 5 but the Bias parameter is stuck at 0. Am I doing something wrong?
import numpy as np

X = np.array(range(1,1000))
Y = 5 * X + 7

def forward(W, X, b):
    return W * X + b

def getcost(Y, y):
    return np.sum((Y-y)**2) / 1000

def backward(W, b, X, Y, y, lr):
    dW = -2 * np.dot((Y-y).T, X) / 1000
    db = -2 * np.sum(Y-y) / 1000
    W -= lr * dW
    b -= lr * db
    return W, b

W = 0.0
b = 0.0
for i in range(80):
    y = forward(W, X, b)
    cost = getcost(Y, y)
    W, b = backward(W, b, X, Y, y, lr=0.000001)
    print(int(cost), W, b)
The range of X is too large. Since X and Y have a linear relationship, the model can be trained on a much smaller range of values. The learning rate is also very small, so with such a large input range it takes much longer to converge. If you really want to use the same data, you can normalize X.
import numpy as np

X = np.array(range(1,30))
Y = 5 * X + 7
# Normalize the X values
#X = (X - np.mean(X)) / np.std(X)
N = len(Y)
learning_rate = 0.001
# Initialize m and b to zero
m, b = 0.0, 0.0
errors = []
for p in range(8000):
    hyp = m * X + b
    error = Y - hyp
    m_gradient = -(2/N) * np.sum(X * error)
    b_gradient = -(2/N) * np.sum(error)
    m = m - learning_rate * m_gradient
    b = b - learning_rate * b_gradient
    errors.append(np.mean(error ** 2))
    if p % 400 == 0:
        print(f'm={m} b={b} ')
# prediction for x = 200; y should be 5*200 + 7 = 1007
print(m*200 + b)
I agree with @Ahsan Nawaz.
The only changes I made to your code are:
- Scaled your features (otherwise, increasing the learning rate gave NaNs)
- Increased the learning rate
- Increased the number of epochs
Here is your code, modified:
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array(range(1,1000))
scaler = StandardScaler()
# fit_transform, not transform: the scaler has to be fitted before it can scale
X = scaler.fit_transform(X.reshape(-1,1)).reshape(-1)
Y = 5 * X + 7

def forward(W, X, b):
    return W * X + b

def getcost(Y, y):
    return np.sum((Y-y)**2) / 1000

def backward(W, b, X, Y, y, lr):
    dW = -2 * np.dot((Y-y).T, X) / 1000
    db = -2 * np.sum(Y-y) / 1000
    W -= lr * dW
    b -= lr * db
    return W, b

W = 0.0
b = 0.0
for i in range(8000):
    y = forward(W, X, b)
    cost = getcost(Y, y)
    W, b = backward(W, b, X, Y, y, lr=0.001)
    print(int(cost), W, b)
Here is the final output -
0 4.999999437318114 6.999999212245364
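For comparison, here is a minimal NumPy-only sketch of the same idea that keeps Y defined on the original X, fits on a standardized copy of X, and then converts the fitted parameters back to the original scale. The function name fit_line and the values of lr and epochs are assumptions chosen for illustration; they are not part of the code above.

```python
import numpy as np


def fit_line(X, Y, lr=0.1, epochs=500):
    """Gradient descent on standardized X; returns (W, b) on the original scale."""
    mean, std = X.mean(), X.std()
    Xs = (X - mean) / std               # zero mean, unit variance
    w, b = 0.0, 0.0
    n = len(Xs)
    for _ in range(epochs):
        err = w * Xs + b - Y            # prediction minus target
        w -= lr * 2.0 / n * np.dot(err, Xs)
        b -= lr * 2.0 / n * err.sum()
    # Undo the scaling: Y = w * (X - mean) / std + b
    return w / std, b - w * mean / std


X = np.arange(1, 1000, dtype=float)
Y = 5 * X + 7
W, b = fit_line(X, Y)
print(W, b)
```

After standardizing, the gradients for the weight and the bias have comparable magnitudes, so a single learning rate works for both and the bias no longer lags behind.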

Where is my code hanging (in an infinite loop)?

I am new to Python and trying to get this script to run, but it seems to be hanging in an infinite loop. When I use ctrl+c to stop it, it is always on line 103.
vs = 20.05 * np.sqrt(Tb + Lb * (y - y0)) # m/s speed of sound as a function of temperature
I am used to MATLAB (from school) and its editor. I ran into issues earlier with the encoding for this code. Any suggestions for a (free) editor? I am currently using jEdit and/or Notepad.
Here is the full script:
#!/usr/bin/env python
# -*- coding: ANSI -*-
import numpy as np
from math import *
from astropy.table import Table
import matplotlib.pyplot as plt
from hanging_threads import start_monitoring#test for code hanging
start_monitoring(seconds_frozen=10, test_interval=100)
"""Initial Conditions and Inputs"""
d = 154.71/1000 # diameter of bullet (in meters)
m = 46.7 # mass of bullet ( in kg)
K3 = 0.87*0.3735 # drag coefficient at supersonic speed
Cd1 = 0.87*0.108 #drag coefficient at subsonic speed
v0 = 802 # muzzle velocity in m/sec
dt = 0.01 # timestep in seconds
"""coriolis inputs"""
L = 90*np.pi/180 # radians - latitude of firing site
AZ = 90*np.pi/180 # radians - azimuth angle of fire measured clockwise from North
omega = 0.0000727 #rad/s rotation of the earth
"""wind inputs"""
wx = 0 # m/s
wz = 0 # m/s
"""initializing variables"""
vx = 0 #initial x velocity
vy = 0 #initial y velocity
vy0 = 0
y_max = 0 #apogee
v = 0
t = 0
x = 0
"""Variable Atmospheric Pressure"""
rho0 = 1.2041 # density of air at sea-level (kg/m^3)
T = 20 #temperature at sea level in celcius
Tb = T + 273.15 # temperature at sea level in Kelvin
Lb = -2/304.8 # temperature lapse rate in K/m (-2degrees/1000ft)- not valid above 36000ft
y = 0 # current altitude
y0 = 0 # initial altitude
g = 9.81 # acceleration due to gravity in m/s/s
M = 0.0289644 #kg/mol # molar mass of air
R = 8.3144598 # J/molK - universal gas constant
# air density as a function of altitude and temperature
rho = rho0 * ((Tb/(Tb+Lb*(y-y0)))**(1+(g*M/(R*Lb))))
"""Variable Speed of Sound"""
vs = 20.05*np.sqrt(Tb +Lb*(y-y0)) # m/s speed of sound as a function of temperature
Area = pi*(d/2)**2 # computing the reference area
phi_incr = 5 #phi0 increment (degrees)
N = 12 # length of table
"""Range table"""
dtype = [('phi0', 'f8'), ('phi_impact', 'f8'), ('x', 'f8'), ('z', 'f8'),('y', 'f8'), ('vx', 'f8'), ('vz', 'f8'), ('vy', 'f8'), ('v', 'f8'),('M', 'f8'), ('t', 'f8')]
table = Table(data=np.zeros(N, dtype=dtype))
"""Calculates entire trajectory for each specified angle"""
for i in range(N):
phi0 = (i + 1) * phi_incr
"""list of initial variables used in while loop"""
t = 0
y = 0
y_max = y
x = 0
z = 0
vx = v0*np.cos(radians(phi0))
vy = v0*np.sin(radians(phi0))
vx_w = 0
vz_w = 0
vz = 0
v = v0
ay = 0
ax = 0
wx = wx
wz = wz
rho = rho0 * ((Tb / (Tb + Lb * (y - y0))) ** (1 + (g * M / (R * Lb))))
vs = 20.05 * np.sqrt(Tb + Lb * (y - y0)) # m/s speed of sound as a function of temperature
ax_c = -2 * omega * ((vz * sin(L)) + vy * cos(L) * sin(AZ))
ay_c = 2 * omega * ((vz * cos(L) * cos(AZ)) + vx_w * cos(L) * sin(AZ))
az_c = -2 * omega * ((vy * cos(L) * cos(AZ)) - vx_w * sin(L))
Mach = v/vs
""" initializing variables for plots"""
t_list = [t]
x_list = [x]
y_list = [y]
vy_list = [vy]
v_list = [v]
phi0_list = [phi0]
Mach_list = [Mach]
while y >= 0:
phi0 = phi0
"""drag calculation with variable density, Temp and sound speed"""
rho = rho0 * ((Tb / (Tb + Lb * (y - y0))) ** (1 + (g * M / (R *Lb))))
vs = 20.05 * np.sqrt(Tb + Lb * (y - y0)) # m/s speed of sound as a function of temperature
Cd3 = K3 / sqrt(v / vs)
Mach = v/vs
"""Determining drag regime"""
if v > 1.2 * vs: #supersonic
Cd = Cd3
elif v < 0.8 * vs: #subsonic
Cd = Cd1
else: #transonic
Cd = ((Cd3 - Cd1)*(v/vs - 0.8)/(0.4)) + Cd1
"""Acceleration due to Coriolis"""
ax_c = -2*omega*((vz_w*sin(L))+ vy*cos(L)*sin(AZ))
ay_c = 2*omega*((vz_w*cos(L)*cos(AZ))+ vx_w*cos(L)*sin(AZ))
az_c = -2*omega*((vy*cos(L)*cos(AZ))- vx_w*sin(L))
"""Total acceleration calcs"""
if vx > 0:
ax = -0.5*rho*((vx-wx)**2)*Cd*Area/m + ax_c
ax = 0
""" Vy before and after peak"""
if vy > 0:
ay = (-0.5 * rho * (vy ** 2) * Cd * Area / m) - g + ay_c
ay = (0.5 * rho * (vy ** 2) * Cd * Area / m) - g + ay_c
az = az_c
vx = vx + ax*dt # vx without wind
# vx_w = vx with drag and no wind + wind
vx_w = vx + 2*wx*(1-(vx/v0*np.cos(radians(phi0))))
vy = vy + ay*dt
vz = vz + az*dt
vz_w = vz + wz*(1-(vx/v0*np.cos(radians(phi0))))
"""projectile velocity"""
v = sqrt(vx_w**2 + vy**2 + vz**2)
"""new x, y, z positions"""
x = x + vx_w*dt
y = y + vy*dt
z = z + vz_w*dt
if y_max <= y:
y_max = y
phi_impact = degrees(atan(vy/vx)) #impact angle in degrees
""" appends selected data for ability to plot"""
if y < 0:
t += dt
"""Range table output"""
table[i] = ('%.f' % phi0, '%.3f' % phi_impact, '%.1f' % x,'%.2f' % z, '%.1f' % y_max, '%.1f' % vx_w,'%.1f' % vz,'%.1f' % vy,'%.1f' % v,'%.2f' %Mach, '%.1f' % t)
""" Plot"""
plt.plot(x_list, y_list, label='%d°' % phi0)#plt.plot(x_list, y_list, label='%d°' % phi0)
plt.title('Altitude versus Range')
plt.ylabel('Altitude (m)')
plt.xlabel('Range (m)')
plt.axis([0, 30000, 0, 15000])
legend = plt.legend(title="Firing Angle",loc=0, fontsize='small', fancybox=True)
Thank you in advance
Which Editor Should I Use?
Personally, I prefer VSCode, but Sublime is also pretty popular. If you really want to go barebones, try Vim. VSCode and Vim are completely free; Sublime Text can be evaluated for free but asks for a license for continued use.
Code Errors
After scanning your code snippet, it appears that you are caught in an infinite loop, which you enter with the statement while y >= 0. The reason you always get line 103 when you hit Ctrl+C is likely because that takes the longest, making it more likely to land there at any given time.
Note that currently, you can only escape your while loop through this branch:
if y_max <= y:
y_max= y
phi_impact = degrees(atan(vy/vx)) #impact angle in degrees
""" appends selected data for ability to plot"""
if y < 0:
t += dt
This means that if y_max never drops below y, or y never drops below zero, then you will loop forever. Granted, I haven't looked at your code in any great depth, but on the surface it appears that y_max is never decremented (meaning it will always be at least equal to y). Furthermore, y is only updated by y = y + vy*dt, which can only increase y as long as vy >= 0 (I assume dt is always positive).
As @Giacomo Catenazzi suggested, try printing y and y_max at the top of the while loop and see how they change as your code runs. I suspect they are not decreasing as you expected.
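One way to turn a silent hang into a readable error is to cap the number of iterations. The sketch below is a deliberately simplified, drag-free vertical flight, not your model; the function name simulate and the max_steps parameter are assumptions made for this example.

```python
def simulate(vy0, dt=0.01, g=9.81, max_steps=1_000_000):
    """Integrate a vertical flight until impact; fail loudly instead of hanging."""
    y, vy = 0.0, vy0
    y_max = 0.0
    steps = 0
    while y >= 0:
        vy -= g * dt                # gravity only, no drag in this sketch
        y += vy * dt
        y_max = max(y_max, y)
        steps += 1
        if steps >= max_steps:      # guard against a loop that never terminates
            raise RuntimeError(
                f"no impact after {steps} steps: y={y:.1f}, vy={vy:.1f}"
            )
    return steps * dt, y_max


flight_time, apogee = simulate(100.0)
print(flight_time, apogee)
```

If the guard fires, the values in the error message show at a glance whether y is still rising, stuck, or has become nan.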

How do I use multithreading on this function for a np.meshgrid of values?

The following code generates 2D NumPy arrays of r and E values for the specified intervals.
r = np.linspace(3, 14, 10)
E = np.linspace(0.05, 0.75, 10)
r, E = np.meshgrid(r, E)
I am then using the following nested loop to generate output from the function ionisationGamma for each r and E interval value.
Z = []
for ridx in trange(len(r)):
    z = []
    for cidx in range(len(r[ridx])):
        z.append(ionisationGamma(r[ridx][cidx], E[ridx][cidx]))
    Z.append(z)
Z = np.array(Z)
This loop gives me a 2D numpy array Z, which is my output and I am using it for a 3D graph. The problem with it is: it is taking ~6 hours to generate the output for all these intervals as there are so many values due to np.meshgrid. I have just discovered multi-threading in Python and wanted to know how I can implement this by using it. Any help is appreciated.
See below code for ionisationGamma
def ionisationGamma(r, E):
I = complex(0.1, 1.0)
a_soft = 1.0
omega = 0.057
beta = 0.0
dt = 0.1
steps = 10000
Nintervals = 60
N = 3000
xmin = float(-300)
xmax = -xmin
x = [0.0]*N
dx = (xmax - xmin) / (N - 1)
L = dx * N
dk = 2 * M_PI / L
propagator = None
in_, out_, psi0 = None, None, None
in_ = [complex(0.,0.)] * N
psi0 = [complex(0.,0.)] * N
out_ = [[complex(0.,0.)]*N for i in range(steps+1)]
overlap = exp(-r) * (1 + r + (1 / 3) * pow(r, 2))
normC = 1 / (sqrt(2 * (1 + overlap)))
gammai = 0.5
qi = 0.0 + (r / 2)
pi = 0.0
gammai1 = 0.5
gammai2 = 0.5
qi1 = 0.0 - (r / 2)
qi2 = 0.0 + (r / 2)
pi1 = 0.0
pi2 = 0.0
# split initial wavepacket
for i in range(N):
x[i] = xmin + i * dx
out_[0][i] = (normC) * ((pow(gammai1 / M_PI, 1. / 4.) * exp(complex(-(gammai1 / 2.) * pow(x[i] - qi1, 2.), pi1 * (x[i] - qi1)))) + (pow(gammai2 / M_PI, 1. / 4.) * exp(complex(-(gammai2 / 2.) * pow(x[i] - qi2, 2.), pi2 * (x[i] - qi2)))))
in_[i] = (normC) * ((pow(gammai1 / M_PI, 1. / 4.) * exp(complex(-(gammai1 / 2.) * pow(x[i] - qi1, 2.), pi1 * (x[i] - qi1)))) + (pow(gammai2 / M_PI, 1. / 4.) * exp(complex(-(gammai2 / 2.) * pow(x[i] - qi2, 2.), pi2 * (x[i] - qi2)))))
psi0[i] = in_[i]
for l in range(1, steps+1):
for i in range(N):
propagator = exp(complex(0, -potential(x[i], omega, beta, a_soft, r, E, dt, l) * dt / 2.))
in_[i] = propagator * in_[i];
in_ = np.fft.fft(in_, N)
for i in range(N):
k = dk * float(i if i < N / 2 else i - N)
propagator = exp(complex(0, -dt * pow(k, 2) / (2.)))
in_[i] = propagator * in_[i]
in_ = np.fft.ifft(in_, N)
for i in range(N):
propagator = exp(complex(0, -potential(x[i], omega, beta, a_soft, r, E, dt, l) * dt / 2.))
in_[i] = propagator * in_[i]
out_[l][i] = in_[i]
initialGammaCentre = 0.0
finalGammaCentre = 0.0
for i in range(500, 2500 +1):
initialGammaCentre += pow(abs(out_[0][i]), 2) * dx
finalGammaCentre += pow(abs(out_[steps][i]), 2) * dx
ionisationGamma = finalGammaCentre / initialGammaCentre
return ionisationGamma
def potential(x, omega, beta, a_soft, r, E, dt, l):
V = (-1. / sqrt((x - (r / 2)) * (x - (r / 2)) + a_soft * a_soft)) + ((-1. / sqrt((x + (r / 2)) * (x + (r / 2)) + a_soft * a_soft))) + E * x
return V
Since the question is about how to use multiprocessing, the following code will work:
import multiprocessing as mp

if __name__ == '__main__':
    with mp.Pool(processes=16) as pool:
        Z = pool.starmap(ionisationGamma, arguments)
    # starmap returns a flat list; restore the 2D grid shape
    Z = np.array(Z).reshape(r.shape)
where arguments is built beforehand as:
arguments = list()
for ridx in range(len(r)):
    for cidx in range(len(r[ridx])):
        arguments.append((r[ridx][cidx], E[ridx][cidx]))
I am using starmap instead of map because each call takes multiple arguments that need to be unpacked. The pool divides the arguments iterable over multiple cores, calls ionisationGamma on each tuple, and returns the results in the same order as the inputs.
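The argument building and the final reshape can be wrapped in one helper. This is only a sketch: evaluate_on_grid, its starmap_fn parameter and the cheap toy_model stand-in are names invented for this example. It defaults to the serial itertools.starmap so that it can be checked quickly without starting any processes.

```python
from itertools import starmap

import numpy as np


def evaluate_on_grid(func, r, E, starmap_fn=starmap):
    """Apply func(r_ij, E_ij) to every grid point and keep the grid shape."""
    arguments = list(zip(r.ravel(), E.ravel()))   # row-major order
    flat = list(starmap_fn(func, arguments))      # results come back in that order
    return np.array(flat).reshape(r.shape)


def toy_model(r, E):
    """Hypothetical cheap stand-in for ionisationGamma."""
    return r * E


r, E = np.meshgrid(np.linspace(3, 14, 10), np.linspace(0.05, 0.75, 10))
Z = evaluate_on_grid(toy_model, r, E)
print(Z.shape)

# With a process pool, pass the pool's starmap instead (inside the usual
# `if __name__ == '__main__':` guard):
#     with multiprocessing.Pool() as pool:
#         Z = evaluate_on_grid(ionisationGamma, r, E, starmap_fn=pool.starmap)
```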
However, I do feel the need to say that the real fix is not multiprocessing but the function itself. ionisationGamma runs several slow Python for loops over the grid points, and your code would benefit a lot if you vectorized those operations.
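As an illustration of what vectorizing could look like, here is a sketch of the kinetic (momentum-space) part of one time step written with NumPy array operations instead of a loop over i. The function names kinetic_propagator and kinetic_step are assumptions for this example, and only this one piece of ionisationGamma is covered.

```python
import numpy as np


def kinetic_propagator(N, dx, dt):
    """exp(-i * dt * k**2 / 2) for every FFT wavenumber, without a Python loop."""
    # fftfreq returns the frequencies in the same order as np.fft.fft output,
    # which matches dk * (i if i < N / 2 else i - N) with dk = 2 * pi / (N * dx).
    k = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)
    return np.exp(-0.5j * dt * k ** 2)


def kinetic_step(psi, dx, dt):
    """Apply the kinetic part of one split-operator step to the wavefunction psi."""
    psi = np.asarray(psi, dtype=complex)
    return np.fft.ifft(kinetic_propagator(psi.size, dx, dt) * np.fft.fft(psi))


x = np.linspace(-5.0, 5.0, 64)
psi = np.exp(-x ** 2)
print(np.linalg.norm(psi), np.linalg.norm(kinetic_step(psi, x[1] - x[0], 0.1)))
```

Since the propagator depends only on N, dx and dt, it could also be computed once outside the time loop rather than on every step.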
A second observation is that the function runs many of those loops one after another. It would help to split that one big function into several smaller ones, so that you can time each one individually and speed up those that are too slow.

How can I fix the wrong results produced by my line-of-sight check?

I'm using a line-of-sight algorithm for a path planning problem and am checking that it works, but as the picture below shows, the check gives wrong results. The black tiles are the cells checked by the line of sight; the function should also check the tile at (1,1), but it does not. How can I fix these errors?
The python code
import numpy as np
import matplotlib.pyplot as plt
grid = [[0,0,0,0,0,0,0,0],
start, goal = (0,0), (3,2)
def lineOfSight(p, s):
x0 = p[0]
y0 = p[1]
x1 = s[0]
y1 = s[1]
dy = y1 - y0
dx = x1 - x0
f = 0
if dy < 0:
dy = -dy
sy = -1
sy = 1
if dx < 0:
dx = -dx
sx = -1
sx = 1
result = []
if dx >= dy:
while x0 != x1:
f = f + dy
i1 = x0 + int((sx-1)/2)
i2 = y0 + int((sy-1)/2)
if f >= dx:
y0 = y0 + sy
f = f - dx
if f != 0:
if dy == 0:
result.append((i1, y0))
result.append((i1, y0-1))
x0 = x0 + sx
while y0 != y1:
f = f + dx
i1 = x0 + int((sx-1)/2)
i2 = y0 + int((sy-1)/2)
if f >= dy:
result.append((i1, i2))
x0 = x0 + sx
f = f - dy
if f != 0:
print((i1, i2))
if dx == 0:
result.append((x0, i2))
result.append((x0-1, i2))
y0 = y0 + sy
return result
check = lineOfSight(start,goal)
def getSquare(point):
x = [point[1], point[1]+1]
y0 = [-point[0], -point[0]]
y1 = [-(point[0]+1), -(point[0]+1)]
return (x,y0,y1)
checked = []
for xy in check:
plt.plot(start[1], -start[0], 'sb', label="Start Point")
plt.plot(goal[1], -goal[0], 'sg', label="Goal Point")
maxRow = len(grid)
maxCol = len(grid[0])
listRow = [-i for i in range(maxRow+1)]
listCol = [i for i in range(maxCol+1)]
xHorizontal, yHorizontal = np.meshgrid(listCol, [[0],[-maxRow]])
yVertical, xVertical = np.meshgrid(listRow, [[0],[maxCol]])
plt.plot(xHorizontal, yHorizontal, color='black')
plt.plot(xVertical, yVertical, color='black')
for square in checked:
x, y0, y1 = square
plt.fill_between(x,y0,y1, color='black')
plt.plot([start[1], goal[1]], [-start[0], -goal[0]], '-g')

Implementing self attention

I am trying to implement self-attention in PyTorch.
I need to calculate the following expressions: a similarity matrix S (2-dimensional), an attention matrix P (2-dimensional), and the attended output C', where x1 is the input.
S[i][j] = W1 * x1[i] + W2 * x1[j] + W3 * x1[i] * x1[j]
P[i][j] = exp(S[i][j]) / sum over k of exp(S[i][k])
In other words, P is a row-wise softmax of S.
C'[i] = sum over j of P[i][j] * x1[j]
I tried the following code using for loops
for i in range(self.dim):
    for j in range(self.dim):
        S[i][j] = self.W1 * x1[i] + self.W2 * x1[j] + self.W3 * x1[i] * x1[j]
for i in range(self.dim):
    for j in range(self.dim):
        P[i][j] = torch.exp(S[i][j]) / torch.sum(torch.exp(S[i]))
# attend
for i in range(self.dim):
    out[i] = 0
    for j in range(self.dim):
        out[i] += P[i][j] * x1[j]
Is there any faster way to implement this in Pytorch?
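One possible way to drop the loops is broadcasting. The sketch below assumes that x1 is a 1-D tensor and that W1, W2 and W3 are scalars (or scalar parameters); the function name attend is made up for this example.

```python
import torch


def attend(x1, W1, W2, W3):
    """Vectorized S, P and C' for a 1-D input x1 and scalar weights."""
    xi = x1.unsqueeze(1)                    # shape (n, 1): x1[i] down the rows
    xj = x1.unsqueeze(0)                    # shape (1, n): x1[j] across the columns
    S = W1 * xi + W2 * xj + W3 * xi * xj    # broadcasts to shape (n, n)
    P = torch.softmax(S, dim=1)             # normalize each row over j
    return P @ x1                           # C'[i] = sum_j P[i][j] * x1[j]


x1 = torch.tensor([0.5, -1.0, 2.0])
print(attend(x1, 0.3, -0.2, 0.1))
```

torch.softmax also avoids the overflow that torch.exp(S) can hit when entries of S are large.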
Here is an example of self-attention that I implemented for dual attention on HSI imagery:
import torch
from torch.nn import Module, Conv2d, Parameter, Softmax

class PAM_Module(Module):
    """ Position attention module https://github.com/junfu1115/DANet/blob/master/encoding/nn/attention.py"""
    # Ref from SAGAN
    def __init__(self, in_dim):
        super(PAM_Module, self).__init__()
        self.chanel_in = in_dim
        self.query_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
        self.key_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
        self.value_conv = Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
        self.gamma = Parameter(torch.zeros(1))
        self.softmax = Softmax(dim=-1)

    def forward(self, x):
        """
        inputs :
            x : input feature maps (B x C x H x W)
        returns :
            out : attention value + input feature
            attention : B x (HxW) x (HxW)
        """
        m_batchsize, C, height, width = x.size()
        # queries: B x (HxW) x C', keys: B x C' x (HxW)
        proj_query = self.query_conv(x).view(m_batchsize, -1, width*height).permute(0, 2, 1)
        proj_key = self.key_conv(x).view(m_batchsize, -1, width*height)
        energy = torch.bmm(proj_query, proj_key)
        attention = self.softmax(energy)
        proj_value = self.value_conv(x).view(m_batchsize, -1, width*height)
        out = torch.bmm(proj_value, attention.permute(0, 2, 1))
        out = out.view(m_batchsize, C, height, width)
        out = self.gamma*out + x
        #out = F.avg_pool2d(out, out.size()[2:4])
        return out
