Converting OpenMP to TBB - multithreading

I am having difficulty converting OpenMP code to TBB. Can someone help me?
I have the following OpenMP code, which performs well:
# pragma omp parallel \
shared ( b, count, count_max, g, r, x_max, x_min, y_max, y_min ) \
private ( i, j, k, x, x1, x2, y, y1, y2 )
{
# pragma omp for
for ( i = 0; i < m; i++ )
{
for ( j = 0; j < n; j++ )
{
//cout << omp_get_thread_num() << " thread\n";
x = ( ( double ) ( j - 1 ) * x_max
+ ( double ) ( m - j ) * x_min )
/ ( double ) ( m - 1 );
y = ( ( double ) ( i - 1 ) * y_max
+ ( double ) ( n - i ) * y_min )
/ ( double ) ( n - 1 );
count[i][j] = 0;
x1 = x;
y1 = y;
for ( k = 1; k <= count_max; k++ )
{
x2 = x1 * x1 - y1 * y1 + x;
y2 = 2 * x1 * y1 + y;
if ( x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2 )
{
count[i][j] = k;
break;
}
x1 = x2;
y1 = y2;
}
if ( ( count[i][j] % 2 ) == 1 )
{
r[i][j] = 255;
g[i][j] = 255;
b[i][j] = 255;
}
else
{
c = ( int ) ( 255.0 * sqrt ( sqrt ( sqrt (
( ( double ) ( count[i][j] ) / ( double ) ( count_max ) ) ) ) ) );
r[i][j] = 3 * c / 5;
g[i][j] = 3 * c / 5;
b[i][j] = c;
}
}
}
}
The TBB version is 10 times slower than the OpenMP one.
The TBB code is:
tbb::parallel_for ( int(0), m, [&](int i)
{
for ( j = 0; j < n; j++)
{
x = ( ( double ) ( j - 1 ) * x_max
+ ( double ) ( m - j ) * x_min )
/ ( double ) ( m - 1 );
y = ( ( double ) ( i - 1 ) * y_max
+ ( double ) ( n - i ) * y_min )
/ ( double ) ( n - 1 );
count[i][j] = 0;
x1 = x;
y1 = y;
for ( k = 1; k <= count_max; k++ )
{
x2 = x1 * x1 - y1 * y1 + x;
y2 = 2 * x1 * y1 + y;
if ( x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2 )
{
count[i][j] = k;
break;
}
x1 = x2;
y1 = y2;
}
if ( ( count[i][j] % 2 ) == 1 )
{
r[i][j] = 255;
g[i][j] = 255;
b[i][j] = 255;
}
else
{
c = ( int ) ( 255.0 * sqrt ( sqrt ( sqrt (
( ( double ) ( count[i][j] ) / ( double ) ( count_max ) ) ) ) ) );
r[i][j] = 3 * c / 5;
g[i][j] = 3 * c / 5;
b[i][j] = c;
}
}
});

Pay attention to the private ( i, j, k, x, x1, x2, y, y1, y2 ) clause in the OpenMP version of the code. This list specifies the variables that are private/local to each thread inside the parallel loop body. In the TBB version, however, many of these variables are captured by the lambda by reference ([&]), so the code is incorrect. It has data races and, in my opinion, the slowdown is caused by multiple threads accessing these variables (cache coherence overhead and corrupted loop indices). So, if you want to fix the code, make these variables local, e.g.
tbb::parallel_for ( int(0), m, [&](int i)
{
double x, y, x1, x2, y1, y2; // !!!!
int j, k, c; // !!!! (c is assigned in the body too; it is assumed to be an int here)
for ( j = 0; j < n; j++)
{
x = ( ( double ) ( j - 1 ) * x_max
+ ( double ) ( m - j ) * x_min )
/ ( double ) ( m - 1 );
...
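The same point can be illustrated outside of TBB. The following is a minimal Python sketch, not the poster's code: every per-pixel quantity is a local variable of the worker function, so concurrent row tasks cannot overwrite each other's state. The names escape_count, row_counts and mandelbrot_counts, the plane bounds, and the simplified pixel mapping are assumptions made for illustration. Python threads demonstrate the scoping only, not the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def escape_count(x, y, count_max):
    """Return the first k at which the orbit leaves [-2, 2]^2, or 0 if it never does."""
    x1, y1 = x, y  # locals: every call works on its own copies
    for k in range(1, count_max + 1):
        x2 = x1 * x1 - y1 * y1 + x
        y2 = 2 * x1 * y1 + y
        if x2 < -2.0 or 2.0 < x2 or y2 < -2.0 or 2.0 < y2:
            return k
        x1, y1 = x2, y2
    return 0

def row_counts(i, m, n, x_min, x_max, y_min, y_max, count_max):
    # Hypothetical pixel-to-plane mapping, simplified from the question's formula.
    y = y_min + (y_max - y_min) * i / (m - 1)
    return [escape_count(x_min + (x_max - x_min) * j / (n - 1), y, count_max)
            for j in range(n)]

def mandelbrot_counts(m, n, count_max, workers=4):
    # Only read-only inputs are shared between the row tasks.
    args = (m, n, -2.25, 1.25, -1.75, 1.75, count_max)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: row_counts(i, *args), range(m)))
```

Because nothing mutable is shared, the parallel result is identical to the one obtained by calling row_counts serially for each row.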

Fitting a function f(x,y,z) with a quadratic polynomial

I'm trying to fit a function f(x,y,z) with the following quadratic polynomial:
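[image: 3d polynomial; as implemented in func below, f(x,y,z) = x^2/(2 mxx) + x*y/mxy + x*z/mxz + y^2/(2 myy) + y*z/myz + z^2/(2 mzz) + f(0,0,0)]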
Some distorted spherical surface in three dimensions. The problem is related to the calculation of effective masses in solid state physics.
Here is a picture of the data to show that it indeed falls off parabolically in all directions, even though the curvature in the z-direction is rather low:
[image: 3d parabolas]
I'm interested in the coefficients, which correspond to effective masses. I've got an array of xyz coordinates, which is regular and centered on the maximum:
[[ 0. 0. 0. ]
[ 0. 0. 0.01282017]
[ 0. 0. 0.02564034]
...
[-0.05026321 -0.05026321 -0.03846052]
[-0.05026321 -0.05026321 -0.02564034]
[-0.05026321 -0.05026321 -0.01282017]]
And a corresponding 1D array of scalar values, one for each point. The number of data points around this maximum can range from 100 to 1000.
This is the code I'm currently trying to use for fitting:
def func(data, mxx, mxy, mxz, myy, myz, mzz):
    x = data[:, 0]
    y = data[:, 1]
    z = data[:, 2]
    return (
        (1 / (2 * mxx)) * (x ** 2)
        + (1 / (1 * mxy)) * (x * y)
        + (1 / (1 * mxz)) * (x * z)
        + (1 / (2 * myy)) * (y ** 2)
        + (1 / (1 * myz)) * (y * z)
        + (1 / (2 * mzz)) * (z ** 2)
    ) + f(0, 0, 0)
energy = data[:, 3]
guess = (mxx, mxy, mxz, myy, myz, mzz)
params, pcov = scipy.optimize.curve_fit(
    func, data, energy, p0=guess, method="trf"
)
Where f(0,0,0) is the value of the function at (0, 0, 0), which I retrieve with the scipy.interpolate.griddata function.
For this problem, the masses should be negative and have values between -0.2 and -2, roughly speaking. I'm creating guess values through a finite difference differentiation.
However, I don't get any sensible results from scipy.optimize.curve_fit: typically the coefficients end up as huge numbers (like 1e9). I'm completely lost at this point.
What am I doing wrong :( ?
One of the problems is that you fit m directly, although it enters the model as 1/m. While this is correct from a physics point of view, it is bad from the algorithm's point of view. If the fitting algorithm needs to change the sign of m and passes values near zero, the coefficients diverge. Consequently, it is better to fit mI = 1/m and do the corresponding error propagation afterwards. Here I use leastsq, which requires some additional calculations for the covariance matrix (as it returns the reduced form). I do the fit with g() and the inverse masses, but you can immediately reproduce your problems by introducing f() and fitting the ms directly.
A second point is that the data has an offset, i.e. at x = y = z = 0 the data value is v = -0.0195. This needs to be introduced into the model.
Finally, I'd say that you already have non-parabolic behaviour in your data.
Nevertheless, here is what it looks like:
import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(linewidth=300)
from scipy.optimize import leastsq
from scipy.optimize import curve_fit
data = np.loadtxt( "silicon.csv", delimiter=',' )
# model written in terms of the masses (diverges when a mass crosses zero)
def f( x, y, z, mxx, mxy, mxz, myy, myz, mzz, offI ):
    out = 1./(2 * mxx) * x * x
    out += 1./( mxy ) * x * y
    out += 1./( mxz ) * x * z
    out += 1./( 2 * myy ) * y * y
    out += 1./( myz ) * y * z
    out += 1./( 2 * mzz ) * z * z
    out += 1./offI
    return out
# same model written in terms of the inverse masses (linear in every parameter)
def g( x, y, z, mxxI, mxyI, mxzI, myyI, myzI, mzzI, off ):
    out = mxxI / 2 * x * x
    out += mxyI * x * y
    out += mxzI * x * z
    out += myyI / 2 * y * y
    out += myzI * y * z
    out += mzzI / 2 * z * z
    out += off
    return out
def residuals( params, indata ):
    out = list()
    for x, y, z, v in indata:
        out.append( v - g( x,y, z, *params ) )
    return out
sol, cov, info, msg, ier = leastsq( residuals, 7*[0], args=( data, ), full_output=True)
# leastsq returns the reduced covariance; scale it by the residual variance
s_sq = sum( [x**2 for x in residuals( sol, data) ] )/ (len( data ) - len( sol ) )
print("solution")
print(sol)
masses = [1/x for x in sol]
print("masses:")
print(masses)
print("covariance matrix:")
covMX = cov * s_sq
print(covMX)
print("sum of residuals")
print(sum( residuals( sol, data) ))
### plotting the cuts
fig = plt.figure('cuts')
ax = dict()
for i in range( 1, 10 ):
    ax[i] = fig.add_subplot( 3, 3, i )
dl = np.linspace( -.2, .2, 25)
# np.float was removed from recent NumPy releases; the builtin float is equivalent here
#### xx
xdata = [ [ x, v ] for x,y,z,v in data if ( abs(y)<1e-3 and abs(z) < 1e-3 ) ]
vl = np.fromiter( ( f( x, 0, 0, *masses ) for x in dl ), float )
ax[1].plot( *zip(*sorted( xdata ) ), ls='', marker='o')
ax[1].plot( dl, vl )
#### xy
xydata = [ [ x, v ] for x, y, z, v in data if ( abs( x - y )<1e-2 and abs(z) < 1e-3 ) ]
vl = np.fromiter( ( f( xy, xy, 0, *masses ) for xy in dl ), float )
ax[2].plot( *zip(*sorted( xydata ) ), ls='', marker='o')
ax[2].plot( dl, vl )
#### xz
xzdata = [ [ x, v ] for x, y, z, v in data if ( abs( x - z )<1e-2 and abs(y) < 1e-3 ) ]
vl = np.fromiter( ( f( xz, 0, xz, *masses ) for xz in dl ), float )
ax[3].plot( *zip(*sorted( xzdata ) ), ls='', marker='o')
ax[3].plot( dl, vl )
#### yy
ydata = [ [ y, v ] for x, y, z, v in data if ( abs(x)<1e-3 and abs(z) < 1e-3 ) ]
vl = np.fromiter( ( f( 0, y, 0, *masses ) for y in dl ), float )
ax[5].plot( *zip(*sorted( ydata ) ), ls='', marker='o' )
ax[5].plot( dl, vl )
#### yz
yzdata = [ [ y, v ] for x, y, z, v in data if ( abs( y - z )<1e-2 and abs(x) < 1e-3 ) ]
vl = np.fromiter( ( f( 0, yz, yz, *masses ) for yz in dl ), float )
ax[6].plot( *zip(*sorted( yzdata ) ), ls='', marker='o')
ax[6].plot( dl, vl )
#### zz
zdata = [ [ z, v ] for x, y, z, v in data if ( abs(x)<1e-3 and abs(y) < 1e-3 ) ]
vl = np.fromiter( ( f( 0, 0, z, *masses ) for z in dl ), float )
ax[9].plot( *zip(*sorted( zdata ) ), ls='', marker='o' )
ax[9].plot( dl, vl )
#### some diag
ddata = [ [ z, v ] for x, y, z, v in data if ( abs(x - y)<1e-3 and abs(x - z) < 1e-3 ) ]
vl = np.fromiter( ( f( d, d, d, *masses ) for d in dl ), float )
ax[7].plot( *zip(*sorted( ddata ) ), ls='', marker='o' )
ax[7].plot( dl, vl )
#### some other diag
ddata = [ [ z, v ] for x, y, z, v in data if ( abs(x - y)<1e-3 and abs(x + z) < 1e-3 ) ]
vl = np.fromiter( ( f( d, d, -d, *masses ) for d in dl ), float )
ax[8].plot( *zip(*sorted( ddata ) ), ls='', marker='o' )
ax[8].plot( dl, vl )
plt.show()
This gives the following output:
solution
[-1.46528595 0.25090717 0.25090717 -1.46528595 0.25090717 -1.46528595 -0.01993436]
masses:
[-0.6824606499739905, 3.985537743156507, 3.9855376943660676, -0.6824606473928339, 3.9855377322848344, -0.6824606467055248, -50.16463861555409]
covariance matrix:
[
[ 4.76417852e-03 -1.46907683e-12 -8.57639600e-12 -2.21281938e-12 -2.38444957e-12 8.42981521e-12 -2.70034183e-05]
[-1.46907683e-12 9.17104397e-04 -7.10573582e-13 1.32125214e-11 7.44553140e-12 1.29909935e-11 -1.11259046e-13]
[-8.57639600e-12 -7.10573582e-13 9.17104389e-04 -8.60004172e-12 -6.14797647e-12 8.27070243e-12 3.11127064e-14]
[-2.21281914e-12 1.32125214e-11 -8.60004172e-12 4.76417860e-03 -4.20477032e-12 9.20893224e-12 -2.70034186e-05]
[-2.38444957e-12 7.44553140e-12 -6.14797647e-12 -4.20477032e-12 9.17104395e-04 1.50963408e-11 -7.28889534e-14]
[ 8.42981530e-12 1.29909935e-11 8.27070243e-12 9.20893175e-12 1.50963408e-11 4.76417849e-03 -2.70034182e-05]
[-2.70034183e-05 -1.11259046e-13 3.11127064e-14 -2.70034186e-05 -7.28889534e-14 -2.70034182e-05 5.77019926e-07]
]
sum of residuals
4.352727352163743e-09
...and here are some 1D cuts that show significant deviation from parabolic behaviour away from the main axes.
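Because the model is linear in the inverse masses and the offset, the fit is an ordinary linear least-squares problem. The following is a minimal one-dimensional sketch of that idea in plain Python, assuming the reduced model v = mI/2 * x^2 + off; the function name fit_inverse_mass and the reduction to one dimension are illustrative assumptions, not part of the code above.

```python
def fit_inverse_mass(xs, vs):
    """Least-squares fit of v = m_inv / 2 * x**2 + off; returns (m_inv, off)."""
    a = [0.5 * x * x for x in xs]          # regressor that multiplies m_inv
    n = len(xs)
    s_a = sum(a)
    s_aa = sum(ai * ai for ai in a)
    s_v = sum(vs)
    s_av = sum(ai * vi for ai, vi in zip(a, vs))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_aa * n - s_a * s_a
    m_inv = (s_av * n - s_a * s_v) / det
    off = (s_aa * s_v - s_a * s_av) / det
    return m_inv, off

# Synthetic data for illustration only (not the silicon data set).
xs = [-0.05 + 0.01 * i for i in range(11)]
vs = [0.5 * (-1.465) * x * x - 0.02 for x in xs]
m_inv, off = fit_inverse_mass(xs, vs)
mass = 1.0 / m_inv                          # invert only after the fit
```

No starting guess is needed, and no parameter has to cross a pole during the fit; the mass is obtained by inverting the fitted coefficient afterwards.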

Processing: Distance of intersection between line and circle

Now, I know similar questions have been asked. But none of the answers has helped me to find the result I need.
Following situation:
We have a line with a point of origin (PO), given as lx, ly. We also have the angle at which the line leaves PO, where 0° means horizontally to the right and positive angles mean clockwise. The angle is in [0;360[. Additionally, since the line is not infinitely long, we have its length, given as len.
There is also a circle with the given center-point (CP), given as cx, cy. The radius is given as cr.
I now need a function that takes these numbers as parameters and returns the distance from the PO to the closest intersection between line and circle, or -1 if no intersection occurs.
My current approach is as follows:
float getDistance(float lx, float ly, float angle, float len, float cx, float cy, float cr) {
float nlx = lx - cx;
float nly = ly - cy;
float m = tan(angle);
float b = (-lx) * m;
// a = m^2 + 1
// b = 2 * m * b
// c = b^2 - cr^2
float[] x_12 = quadraticFormula(sq(m) + 1, 2*m*b, sq(b) - sq(cr));
// if no intersections
if (Float.isNaN(x_12[0]) && Float.isNaN(x_12[1]))
return -1;
float distance;
if (Float.isNaN(x_12[0])) {
distance = (x_12[1] - nlx) / cos(angle);
} else {
distance = (x_12[0] - nlx) / cos(angle);
}
if (distance <= len) {
return distance;
}
return -1;
}
// solves for x
float[] quadraticFormula(float a, float b, float c) {
float[] results = new float[2];
results[0] = (-b + sqrt(sq(b) - 4 * a * c)) / (2*a);
results[1] = (-b - sqrt(sq(b) - 4 * a * c)) / (2*a);
return results;
}
But the result is not what I want. Sometimes a distance is returned, but it is rarely correct, and often there isn't even an intersection. Most of the time, though, no intersection is reported although there should be one.
Any help would be much appreciated.
EDIT:
I managed to find the solution thanks to MBo's answer. Here is the body of my finished getDistance(...) function; maybe it helps somebody:
float nlx = lx - cx;
float nly = ly - cy;
float dx = cos(angle);
float dy = sin(angle);
float[] results = quadraticFormula(1, 2*(nlx*dx + nly*dy), sq(nlx)+sq(nly)-sq(cr));
float dist = -1;
if (results[0] >= 0 && results[0] <= len)
dist = results[0];
if (results[1] >= 0 && results[1] <= len && results[1] < results[0])
dist = results[1];
return dist;
Using your nlx, nly, we can build parametric equation of line segment
dx = Cos(angle)
dy = Sin(Angle)
x = nlx + t * dx
y = nly + t * dy
Condition of intersection with circumference:
(nlx + t * dx)^2 + (nly + t * dy)^2 = cr^2
t^2 * (dx^2 + dy^2) + t * (2*nlx*dx + 2*nly*dy) + nlx^2+nly^2-cr^2 = 0
so we have quadratic equation for unknown parameter t with
a = 1
b = 2*(nlx*dx + nly*dy)
c = nlx^2+nly^2-cr^2
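Solve the quadratic equation and check whether a root t lies in the range 0..len. Because (dx, dy) is a unit vector, t is the distance from PO along the line, so the smallest root in that range is the distance to the closest intersection.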
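The derivation above can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the angle is given in degrees and converted to radians, the function name closest_hit_distance is made up for this example, and -1 is returned when no root lies within the segment.

```python
import math

def closest_hit_distance(lx, ly, angle_deg, length, cx, cy, cr):
    """Distance from (lx, ly) to the nearest circle intersection, or -1.0."""
    nlx, nly = lx - cx, ly - cy              # origin relative to the circle centre
    dx = math.cos(math.radians(angle_deg))   # unit direction vector
    dy = math.sin(math.radians(angle_deg))
    b = 2.0 * (nlx * dx + nly * dy)
    c = nlx * nlx + nly * nly - cr * cr
    disc = b * b - 4.0 * c                   # a = 1 because (dx, dy) has length 1
    if disc < 0.0:
        return -1.0                          # the infinite line misses the circle
    root = math.sqrt(disc)
    # Try the smaller root first so the closest valid hit is returned.
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if 0.0 <= t <= length:
            return t
    return -1.0
```

Checking the discriminant before taking the square root avoids relying on NaN comparisons, and testing the roots in ascending order also covers the case where PO lies inside the circle and only the larger root is non-negative.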
// https://openprocessing.org/sketch/8009#
// by https://openprocessing.org/user/54?view=sketches
float circleX = 200;
float circleY = 200;
float circleRadius = 100;
float lineX1 = 350;
float lineY1 = 350;
float lineX2, lineY2;
void setup() {
size(400, 400);
ellipseMode(RADIUS);
smooth();
}
void draw() {
background(204);
lineX2 = mouseX;
lineY2 = mouseY;
if (circleLineIntersect(lineX1, lineY1, lineX2, lineY2, circleX, circleY, circleRadius) == true) {
noFill();
}
else {
fill(255);
}
ellipse(circleX, circleY, circleRadius, circleRadius);
line(lineX1, lineY1, lineX2, lineY2);
}
// Code adapted from Paul Bourke:
// http://local.wasp.uwa.edu.au/~pbourke/geometry/sphereline/raysphere.c
boolean circleLineIntersect(float x1, float y1, float x2, float y2, float cx, float cy, float cr ) {
float dx = x2 - x1;
float dy = y2 - y1;
float a = dx * dx + dy * dy;
float b = 2 * (dx * (x1 - cx) + dy * (y1 - cy));
float c = cx * cx + cy * cy;
c += x1 * x1 + y1 * y1;
c -= 2 * (cx * x1 + cy * y1);
c -= cr * cr;
float bb4ac = b * b - 4 * a * c;
//println(bb4ac);
if (bb4ac < 0) { // Not intersecting
return false;
}
else {
float mu = (-b + sqrt( b*b - 4*a*c )) / (2*a);
float ix1 = x1 + mu*(dx);
float iy1 = y1 + mu*(dy);
mu = (-b - sqrt(b*b - 4*a*c )) / (2*a);
float ix2 = x1 + mu*(dx);
float iy2 = y1 + mu*(dy);
// The intersection points
ellipse(ix1, iy1, 10, 10);
ellipse(ix2, iy2, 10, 10);
float testX;
float testY;
// Figure out which point is closer to the circle
if (dist(x1, y1, cx, cy) < dist(x2, y2, cx, cy)) {
testX = x2;
testY = y2;
} else {
testX = x1;
testY = y1;
}
if (dist(testX, testY, ix1, iy1) < dist(x1, y1, x2, y2) || dist(testX, testY, ix2, iy2) < dist(x1, y1, x2, y2)) {
return true;
} else {
return false;
}
}
}

Error using dde23 (line 224) Derivative and history vectors have different lengths

I am trying to solve a coupled system of delay differential equations using dde23. When running the following code, I get the annoying error "Derivative and history vectors have different lengths".
function sol = prob1
clf
global Lembda alpha u1 u2 p q c d k a T b zeta1 zeta2 A1 A2
Lembda=2; b=0.07; d=0.0123; a=0.6; k=50; q=13; c=40; p=30; alpha = 0.4476; T=1; B=0.4; A1 =200; A2=100; zeta1=10; zeta2=30;
lags = [ 10; 0.2; 2; 10; 0.2; 10; 0.2; 2; 10; 0.2; 15; 0.9; 0.17; 0.01; 0.5; 0.000010; 0.00002];
sol = dde23(@prob2f,T,lags,[0,10], u1, u2);
function yp = prob2f(t,y,Z,B)
global Lembda alpha p b d c q T a k zeta1 zeta2 A1 A2
x2 = y(1);
y2 = y(2);
z2 = y(3);
v = y(4);
w = y(5);
xlag = Z(1,1);
vlag = Z(2,1);
%%%%%%%%%%%%%%%%
x1 = y(6);
y1 = y(7);
z1 = y(8);
v1 = y(9);
w1 = y(10);
x1lag = Z(1,1);
v1lag = Z(2,1);
%%%%%%%%%%%%%%%%%%%
lambda1 = y(11);
lambda2 = y(12);
lambda3 = y(13);
lambda4 = y(14);
lambda5 = y(15);
u1 = y(16);
u2= y(17);
lambda1lag = Z(1,1);
lambda4lag = Z(2,1);
%%%%%%%%%
dxdt=Lembda-d*x2-B*x2*v;
dydt=B*exp(-a*T)*xlag*vlag-a*y2 - alpha*y2*w;
dzdt=alpha*y2*w - b*z2;
dvdt=k*y2-p*v;
dwdt=c*z2-q*w;
%%%%%%%%%
dx1dt=Lembda-d*x1-(1-u1)*B*x1*v1;
dy1dt=(1-u1)*B*exp(-a*T)*x1lag*v1lag-a*y1 - alpha*y1*w1;
dz1dt=alpha*y1*w1 - b*z1;
dv1dt=(1-u2)*k*y1-p*v1;
dw1dt=c*z1-q*w1;
%%%%%%%%%%
dlambda1dt= A1+lambda1*d+(1-u1)*lambda1*B*v1-(1-u1)*lambda2*B*v1lag*exp(-a*T)*lambda2*(T);
dlambda2dt= a*lambda2+(lambda2-lambda3)*alpha*w1-lambda4*k*(u2-1);
dlambda3dt= b*lambda3-c*lambda5;
dlambda4dt= A2+(1-u1)*lambda1*B*x1+lambda4*p+lambda4*(T)*lambda2*x1lag*(u2-1)*exp(-a*T);
dlambda5dt=alpha*lambda2*z1-alpha*lambda3*z1+lambda5*q;
du1dt = ( lambda2*x1lag*v1lag - lambda1*x1*v1)*(B/zeta1);
du2dt =(lambda4*k*y2)/zeta2;
yp = [ dxdt; dydt; dzdt; dvdt;dwdt; dx1dt; dy1dt; dz1dt; dv1dt;dw1dt; dlambda1dt; dlambda2dt; dlambda3dt ;dlambda4dt ;dlambda5dt; du1dt; du2dt ];
Can anyone guide me, to be able to resolve this issue?
Thanks
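The error occurs because the vector yp returned by prob2f is not the same size as the history vector. In your call the third argument of dde23 is the history, so the variable lags is what dde23 treats as the history.
The lags vector has length 17, but the yp vector comes out with length 10. Even though you list 17 entries in yp, many of them are empty ([]):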
yp = [ dxdt; dydt; dzdt; dvdt;dwdt; dx1dt; dy1dt; dz1dt; dv1dt;dw1dt;
dlambda1dt; dlambda2dt; dlambda3dt ;dlambda4dt ;dlambda5dt; du1dt; du2dt ];
K>> dxdt
dxdt =
[]
K>> length(yp)
10
lags = [ 10; 0.2; 2; 10; 0.2; 10; 0.2; 2; 10; 0.2; 15; 0.9; 0.17; 0.01;
0.5; 0.000010; 0.00002];
sol = dde23(@prob2f,T,lags,[0,10], u1, u2);
K>> length(lags)
17
The return value of your prob2f() should have the same length as the history vector (here, the variable lags). This is the check in dde23 that raises the error:
f0 = feval(ddefun,t0,y0,Z0,varargin{:});
nfevals = nfevals + 1;
[m,n] = size(f0);
if n > 1
error(message('MATLAB:dde23:DDEOutputNotCol'))
elseif m ~= neq
error(message('MATLAB:dde23:DDELengthMismatchHistory')); <========
end
You need to check your prob2f function and make sure yp has the same length as the history vector you pass to dde23 (lags in your call).

Function to calculate a value inside a Verilog generate loop

I am trying to create a parametrized circuit for the multiplication stage of a BCD Wallace Tree Multiplier, which I implemented in Orcad. The trouble I'm having is that I need to calculate the bit positions that each two digits that result from BCD multiplication will inhabit.
Here is my code:
module bcd_mult_1_n #(parameter N = 8)
(input [N * 4 - 1:0] num1, num2, output reg [2 * 4 * N * N - 1:0] partProds);
genvar i, j;
generate
for(i = 0; i < N; i = i + 1) begin : dig1
for(j = 0; j < N; j = j + 1) begin : dig2
localparam lsd = posLSD(i, j);
localparam msd = posMSD(i, j);
bcd_mult_1 bcd_mult(num1[i * 4 + 3:i * 4], num2[j * 4 + 3:j * 4],
partProds[msd * 4 + 3:msd * 4], partProds[lsd * 4 + 3: lsd * 4]);
end
end
endgenerate
In the above code, numPrev(i + j) needs to return a value calculated something like this
int numPrev(int x) {
int acc = 0;
for(int i = x; i > 0; i--) acc = acc + 2 * i;
return acc;
}
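Assuming the loop is meant to count down from x to 1, the accumulation is 2 * (1 + 2 + ... + x), which has the closed form x * (x + 1). The following Python sketch (the function names are illustrative) compares the loop with the closed form; a closed form like this can be written directly as a constant expression on the genvars, without a function call.

```python
def num_prev_loop(x):
    """Sum 2*i for i = x down to 1, as in the pseudocode."""
    acc = 0
    i = x
    while i > 0:
        acc += 2 * i
        i -= 1
    return acc

def num_prev_closed(x):
    """Closed form of the same sum: 2 * x * (x + 1) / 2 = x * (x + 1)."""
    return x * (x + 1) if x > 0 else 0

# The two agree for every digit weight that can occur with N = 8 (0 .. 2N - 2).
agree = all(num_prev_loop(x) == num_prev_closed(x) for x in range(15))
```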
Thanks to help from @Morgan, I have created the following function; the logic is meant to count up and down a sort of triangle of values which rise from 1 to N and back down to 1.
function integer posLSD;
input integer x, y;
integer weight;
integer acc;
integer num;
integer i;
weight = x + y;
acc = 0;
if(weight >= N) num = N - 1;
else num = weight;
for(i = num; i > 0; i = i - 1)
acc = acc + 2 * i;
if(weight >= N) begin
for(i = 2 * N - weight; i <= N; i = i + 1) begin
acc = acc + 2 * i;
end
acc = acc + N - weight + y - 1;
end
else
acc = acc + y;
posLSD = acc;
endfunction
function integer posMSD;
input integer x, y;
integer acc;
integer weight;
acc = posLSD(x, y);
weight = x + y;
if(weight < N) acc = acc + weight + 1;
else acc = acc + 2 * N - weight - 1;
posMSD = acc;
endfunction
How could I achieve this functionality? If needed, I could use SystemVerilog constructs.
When I switch to using a function, I get the error "Packed dimension must specify a range". I think you need to reconsider the width of partProds and how it is connected.
Using a function:
module bcd_mult_1_n #(
parameter N = 8
) (
input [N * 4 - 1:0] num1,
input [N * 4 - 1:0] num2,
output reg [2 * 4 * N * N] partProds
);
integer prev = 1;
genvar i, j;
generate
for(i = 0; i < N; i = i + 1) begin : dig1
for(j = 0; j < N; j = j + 1) begin : dig2
bcd_mult_1
bcd_mult(
num1[i * 4 + 3:i * 4],
num2[j * 4 + 3:j * 4],
partProds[numPrev(i+j) + 2*j + i + 1],
partProds[numPrev(i+j) + j]
);
end
end
endgenerate
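function integer numPrev; // integer return type; the default would be a single bit
input integer x ;
integer acc;
begin
acc = 0;
// count down from x to 1; incrementing ij would never reach the exit condition
for(int ij = x; ij > 0; ij--) begin
acc = acc + 2 * ij;
end
numPrev = acc;
end
endfunction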
endmodule
module bcd_mult_1(
input [3:0]a,
input [3:0]b,
input c,
input d
);
endmodule
Example on EDA Playground.

(computer graphics) radial image distortion

I need to create an effect that radially distorts a bitmap by stretching or shrinking its "layers of pixels" radially (as shown in the image):
http://i.stack.imgur.com/V6Voo.png
The colored circles (their thickness) show the transform that is applied to the image.
What approach should I take? I have a bitmap (an array of pixels) and another bitmap that should hold the result of applying such a filter (the result should look like a round water ripple on the bitmap).
Where could I read about creating such effects?
Thank you.
Try looking here:
http://www.jhlabs.com/ip/blurring.html
Zoom and Spin Blur
It is Java, but it could nevertheless be adapted to your needs.
Well, the most accurate results would come from mapping the Cartesian coordinates to a polar matrix. Then you can very easily stretch them out. Then just translate them back to a Cartesian representation and save. I'll write and edit with some code in a second.
Alright, I got a bit carried away, but here's my code. It takes a bitmap, converts it to and from polar coordinates, and saves it. Now radial distortion should be a breeze.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<math.h>
#define PI 3.141592654
#define C_R 1000
#define C_S 1000
#define C_M 2000
typedef struct{ int r,g,b; } color;
typedef struct{ int t; color* data; int w, h; } bitmap;
typedef struct{ int t; color* data; int r, s, w, h; } r_bitmap;
bitmap* bmp_load_from_file( const char* fname ){
FILE* b = fopen( fname, "rb" );
if( b == NULL ) return 0;
int num;
fscanf( b, "BM%n", &num );
if( num < 2 ) return 0;
struct{ int size, reserved, offset;
int hsize, wid, hig, planes:16, bpp:16, comp, bmpsize, hres, vres, colors, important; } head;
fread( &head, 13, 4, b );
bitmap* bmp = malloc( sizeof( bitmap ) );
bmp->data = malloc( head.wid * head.hig * sizeof( color ) );
bmp->w = head.wid;
bmp->h = head.hig;
for( int y = head.hig - 1; y >= 0; --y ){
int x;
for( x = 0; x < head.wid; ++x ){
color t;
t.r = fgetc( b );
t.g = fgetc( b );
t.b = fgetc( b );
bmp->data[x+y*bmp->w] = t;
}
x*=3;
while( x%4 != 0 ){
++x;
fgetc( b );
}
}
bmp->t = 0;
fclose( b );
return bmp;
}
void bmp_save( const char* fname, bitmap* bmp ){
FILE* b = fopen( fname, "wb" );
if( b == NULL ) return;
struct{ int size, reserved, offset;
int hsize, wid, hig, planes:16, bpp:16, comp, bmpsize, hres, vres, colors, important; } head;
fprintf( b, "BM" );
head.size = 3 * (bmp->w+4)/4*4 * bmp->h + 54;
head.offset = 54;
head.hsize = 40;
head.wid = bmp->w;
head.hig = bmp->h;
head.planes = 1;
head.bpp = 24;
head.comp = 0;
head.bmpsize = 3 * (bmp->w+4)/4*4 * bmp->h;
head.hres = 72;
head.vres = 72;
head.colors = 0;
head.important = 0;
fwrite( &head, 13, 4, b );
for( int y = bmp->h - 1; y >= 0; --y ){
int x;
for( x = 0; x < bmp->w; ++x ){
fputc( bmp->data[x + y * bmp->w].r, b );
fputc( bmp->data[x + y * bmp->w].g, b );
fputc( bmp->data[x + y * bmp->w].b, b );
}
x*=3;
while( x % 4 != 0 ){
++x;
fputc(0, b);
}
}
fclose( b );
}
color color_mix( color a, color b, int offset ){ /*offset is a value between 0 and 255 to determine the weight. the lower it is the more color a gets*/
//if( offset > 255 || offset < 0)
//printf("%i\t", offset);
a.r += ( b.r - a.r ) * offset / 255;
a.g += ( b.g - a.g ) * offset / 255;
a.b += ( b.b - a.b ) * offset / 255;
return a;
}
r_bitmap* bmp_to_r( bitmap* b ){
r_bitmap* r = malloc( sizeof( r_bitmap ) );
r->t = 1;
int radius = sqrt( b->w * b->w + b->h * b->h ) / 2 * C_R / C_M + 2;
int step = C_S * ( b->w + b->h ) / C_M;
r->data = malloc( radius * step * sizeof( color ) );
r->r = radius;
r->s = step;
r->w = b->w;
r->h = b->h;
color black = {0, 0, 0};
for( double i = 0; i < radius; ++ i ){
for( double j = 0; j < step; ++j ){
double x = i * C_M * cos( 2 * PI * j / step ) / C_R + b->w / 2;
double y = i * C_M * sin( 2 * PI * j / step ) / C_R + b->h / 2;
int ix = x;
int iy = y;
if( x < 0 || x >= b->w || y < 0 || y >= b->h )
r->data[(int)(j + i * step)] = black;
else{
color tmp = b->data[ix + iy * b->w];
if( iy < b->h - 1 ){
int off = 255 * (y - iy);
tmp = color_mix( tmp, b->data[ix + (iy+1) * b->w], off );
}
if( ix < b->w - 1 ){
int off = 255 * ( x - ix );
tmp = color_mix( tmp, b->data[ix +1 + iy * b->w], off );
}
r->data[(int)(j + i * step)] = tmp;
}
}
}
return r;
}
bitmap* bmp_from_r( r_bitmap* r ){
bitmap* b = malloc( sizeof( bitmap ) );
b->t = 0;
b->data = malloc( r->w * r->h * sizeof( color ) );
b->w = r->w;
b->h = r->h;
for( int y = 0; y < b->h; ++y ){
for( int x = 0; x < b->w; ++x ){
int tx = x - b->w/2;
int ty = y - b->h/2;
double rad = sqrt( tx*tx+ty*ty ) * C_R / C_M;
double s = atan2( ty, tx );
if( s < 0 ) s += 2 * PI;
s *= r->s / ( 2 * PI );
int is = s;
int irad = rad;
color tmp = r->data[(int)(is + irad * r->s)];
/*if( x > 0 && x < r->w - 1 && y > 0 && y < r->h - 1 ){
tmp = color_mix(tmp, r->data[((int)(is+1)%r->s + irad * r->s)], abs(255* rad - floor(rad)));
tmp = color_mix(tmp, r->data[(is + (irad + 1) * r->s)], abs(255* s - floor(s)));
}*/
b->data[x+y*b->w] = tmp;
}
}
return b;
}
int main( ) {
bitmap* b = bmp_load_from_file( "foo.bmp" );
r_bitmap* r = bmp_to_r( b );
bitmap* c = bmp_from_r( r );
bmp_save( "lol.bmp", c );
}
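The code above only converts to polar coordinates and back; the distortion itself is not shown. As a rough Python sketch of the missing step, the radius can be remapped per output pixel with an inverse mapping and nearest-neighbour sampling, which avoids the intermediate polar bitmap. The function radial_remap, its radius_map parameter and the ripple formula are assumptions made for illustration, not part of the C code.

```python
import math

def radial_remap(src, width, height, radius_map):
    """Return a new row-major pixel list; radius_map maps output radius to source radius."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0   # image centre
    out = []
    for y in range(height):
        for x in range(width):
            tx, ty = x - cx, y - cy
            r = math.hypot(tx, ty)                   # polar radius of this pixel
            theta = math.atan2(ty, tx)               # polar angle of this pixel
            rs = radius_map(r)                       # stretched or shrunk radius
            sx = int(round(cx + rs * math.cos(theta)))
            sy = int(round(cy + rs * math.sin(theta)))
            if 0 <= sx < width and 0 <= sy < height:
                out.append(src[sx + sy * width])
            else:
                out.append(0)                        # outside the source: black
    return out

# Example: a gentle ripple whose strength oscillates with the radius.
image = list(range(25))                              # 5x5 test image
rippled = radial_remap(image, 5, 5, lambda r: r + 0.5 * math.sin(r))
```

Passing the identity function as radius_map reproduces the input, which is a convenient sanity check before trying a real distortion profile.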
