What does the CUDA kernel name 'volta_sgemm_128x32_nn' mean?

I am studying the nvidia torch matmul function.
import torch
### variable creation
a = torch.randn(size=(1,128,3),dtype=torch.float32).to('cuda')
b = torch.randn(size=(1,3,32),dtype=torch.float32).to('cuda')
### execution
c = torch.matmul(a,b)
I profiled this code using pyprof and this gives me the result below.
I cannot understand many things in there.
What does sgemm_128_32 mean?
I see that the 's' in sgemm stands for single precision and 'gemm' means general matrix multiplication, but I don't know what 128_32 means. My output matrix dimension is 128 by 32, but I know that CUTLASS optimizes sgemm using outer products (see ref 1, which I could not fully follow).
(1) Does 128_32 simply mean the output matrix's dimensions?
(2) Is there any way to find out how my output matrix (c in my code) is actually calculated?
(For example: there are 128*32 threads in total, and each thread computes one output element as an inner product.)
Why do the Grid and Block have 3 dimensions each, and how are the grid and block used for sgemm_128_32?
The Grid consists of x, y, z, and the Block consists of x, y, z. (1) Why are 3 dimensions needed? I see (in the picture above) that block X has 256 threads. (2) Is this true? Grid Y is 4, so this means there are 4 blocks along grid Y. (3) Is this true?
Using that pyprof result, can I figure out how many SMs are used, and how many warps are active on each SM?
Thank you.
ref 1 : https://developer.nvidia.com/blog/cutlass-linear-algebra-cuda/
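For the grid/block arithmetic asked about above, a minimal sketch may help. It only multiplies out a launch configuration; it does not inspect the real kernel. The values block X = 256 and grid Y = 4 are taken from the question, while every other dimension is assumed to be 1 because the profiler output is not reproduced here. The function name launch_summary is hypothetical.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def launch_summary(grid, block):
    """Multiply out a CUDA launch configuration given as (x, y, z) tuples."""
    blocks = grid[0] * grid[1] * grid[2]
    threads_per_block = block[0] * block[1] * block[2]
    # A partially filled warp still occupies a whole warp, hence the ceiling division.
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    return {
        "blocks": blocks,
        "threads_per_block": threads_per_block,
        "warps_per_block": warps_per_block,
        "total_threads": blocks * threads_per_block,
        "total_warps": blocks * warps_per_block,
    }


# Assumption: grid X, grid Z, block Y and block Z are all 1 (not shown in the question).
summary = launch_summary(grid=(1, 4, 1), block=(256, 1, 1))
print(summary)
```

Because each block runs on a single SM, the number of blocks is only an upper bound on the number of SMs the kernel occupies; the exact count is not recoverable from the launch dimensions alone.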

Find a solution that fulfils a subset of output parameters

I am new to OpenMDAO, so I assume this should be an easy question.
I have a complex group (with explicit components and cycles) that I can solve for a set of input variables:
y1, y2,..., yi = Group(x1, x2,..., xj)
What I am trying to do now is match two outputs to targets (y1_target and y2_target) by changing two of the group's inputs (x1, x2), i.e., adding two equations outside the group such as,
y1_target - y1 = 0
y2_target - y2 = 0
I understand that this should be done with two implicit components, but how do I force the solver to change only x1 and x2?
Thanks,
This can be done with a single BalanceComp on a group outside of the group you mention there. A simple diagram of the system, as I understand it, is below.
Here a BalanceComp is added that handles your two residual equations. BalanceComp is an ImplicitComponent in OpenMDAO that handles simple "set X equal to Y" situations like the one in your case. The documentation for it is here.
In your case, the outer group (called g_outer in the code below) would have a balance comp that is set to satisfy two residual equations. Subsystem "group" refers to your existing group.
import openmdao.api as om
bal = g_outer.add_subsystem('balance', om.BalanceComp())
bal.add_balance('x1', lhs_name='y1_target', rhs_name='y1')
bal.add_balance('x2', lhs_name='y2_target', rhs_name='y2')
g_outer.connect('balance.x1', 'group.x1')
g_outer.connect('balance.x2', 'group.x2')
g_outer.connect('group.y1', 'balance.y1')
g_outer.connect('group.y2', 'balance.y2')
Another absolutely critical step is setting the nonlinear and linear solvers of g_outer. The default solvers only work for explicit systems. This implicit system requires a NewtonSolver for the nonlinear solver, and a linear solver that can handle the coupled system. DirectSolver often works fine unless the system is very large.
g_outer.nonlinear_solver = om.NewtonSolver(solve_subsystems=True)
g_outer.linear_solver = om.DirectSolver()
Left out of the above snippet is the code that either connects a value to balance.y1_target and balance.y2_target, or sets them after setup.
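To make the role of the two residual equations concrete, here is a plain-Python sketch of what a Newton solve on them amounts to. It is not OpenMDAO code and does not reflect how OpenMDAO computes derivatives: the function group is a hypothetical stand-in for the existing group, the Jacobian is approximated by finite differences, and solve_balance is an illustrative name.

```python
def group(x1, x2):
    """Hypothetical stand-in for the existing group: maps inputs to outputs."""
    y1 = x1 ** 2 + x2
    y2 = x1 + 3.0 * x2
    return y1, y2


def solve_balance(y1_target, y2_target, x1=1.0, x2=1.0, tol=1e-10, max_iter=50):
    """Newton iteration on r = y_target - y(x), changing only x1 and x2."""
    h = 1e-7  # finite-difference step (assumption; OpenMDAO uses its own derivatives)
    for _ in range(max_iter):
        y1, y2 = group(x1, x2)
        r1, r2 = y1_target - y1, y2_target - y2
        if max(abs(r1), abs(r2)) < tol:
            break
        # Approximate the Jacobian dy/dx column by column.
        y1a, y2a = group(x1 + h, x2)
        y1b, y2b = group(x1, x2 + h)
        j11, j12 = (y1a - y1) / h, (y1b - y1) / h
        j21, j22 = (y2a - y1 * 0 - y2) / h, (y2b - y2) / h
        det = j11 * j22 - j12 * j21
        # Solve J * dx = r for the 2x2 system with Cramer's rule.
        x1 += (r1 * j22 - j12 * r2) / det
        x2 += (j11 * r2 - j21 * r1) / det
    return x1, x2


x1_sol, x2_sol = solve_balance(y1_target=5.0, y2_target=5.0)
print(x1_sol, x2_sol, group(x1_sol, x2_sol))
```

The balance component plays the role of the residual definition here, and the NewtonSolver on the outer group plays the role of the loop.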

Determining the Distance between two matrices using numpy

I am developing my own Architecture Search algorithm using Python's NumPy. Currently I am trying to work out a cost function that measures the distance between two matrices, X and Y.
I'd like to reduce the difference between the two, to a meaningful scalar value.
Ideally between 0 and 1, so that if both sets of elements within the matrices are the same numerically and positionally, a 0 is returned.
In the example below, I have the output of my algorithm X. Both X and Y are the same shape. I tried to sum the difference between the two matrices; however I'm not sure that using summation will work in all conditions. I also tried returning the mean. I don't think that either approach will work though. Aside from looping through both matrices and comparing elements directly, is there a way to capture the degree of difference in a scalar?
import numpy as np

Y = np.arange(25).reshape(5, 5)
for i in range(1000):
    X = algorithm(Y)  # algorithm is my search routine (not shown)
    # I try to reduce the difference between the two matrices to a scalar value
    cost = np.sum(X - Y)
There are many ways to calculate a scalar "difference" between two matrices. Here are just two examples.
The root mean square error:
((m1 - m2) ** 2).mean() ** 0.5
The max absolute error:
np.abs(m1 - m2).max()
The choice of the metric depends on your problem.
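As a rough illustration, the sketch below shows why a plain signed sum can report zero for matrices that differ, and wraps the two metrics above as functions. The third function, normalized_distance, is an extra suggestion that is not part of the answer: it divides the Frobenius norm of the difference by the sum of the two norms, which by the triangle inequality stays between 0 and 1. All function names are illustrative.

```python
import numpy as np


def rmse(m1, m2):
    """Root mean square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((m1 - m2) ** 2)))


def max_abs_error(m1, m2):
    """Largest absolute element-wise difference."""
    return float(np.max(np.abs(m1 - m2)))


def normalized_distance(m1, m2):
    """||m1 - m2|| / (||m1|| + ||m2||), which lies in [0, 1]; 0 means identical."""
    denom = np.linalg.norm(m1) + np.linalg.norm(m2)
    if denom == 0.0:
        return 0.0  # both matrices are all zeros
    return float(np.linalg.norm(m1 - m2) / denom)


Y = np.arange(25, dtype=float).reshape(5, 5)
X = Y.copy()
X[0, 0] += 1.0  # one element too high
X[0, 1] -= 1.0  # one element too low

# The signed differences cancel, so the sum hides the mismatch.
signed_sum = float(np.sum(X - Y))
print(signed_sum, rmse(X, Y), max_abs_error(X, Y), normalized_distance(X, Y))
```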

Why is stft(istft(x)) ≠ x?

Using PyTorch, I have computed the short-time Fourier transform of the inverse short-time Fourier transform of a tensor.
I have done so as shown below given a tensor x.
For x, the real and imaginary parts are equal, or the imaginary part is set to zero; both produce the same problem.
torch.stft(torchaudio.functional.istft(x, n_fft), n_fft)
As shown in the image, only a single one of the stripes in the tensor remains after applying stft(istft(x)) -- all other stripes vanish.
If stft(istft(x)) (bottom) was equal to x (top), both images would look similar.
Why are they so different?
It seems like stft(istft(x)) can pick up only certain frequencies of x.
x (top) and stft of istft of x (bottom)
I have also tried the same with scipy.signal.istft and scipy.signal.stft which causes the same issue.
Moreover, I have tried it with a wide range of tensors x, e.g., different randomized distributions, images, and other stripes.
Also, I have tried a variety of hyper-parameters for stft/istft.
Only for x generated by a short-time Fourier transform from a sound wave, it works.
A short-time Fourier transform produces more data than there is in the original signal. If a signal has N real samples, its STFT might have 4N complex samples, which is 8 times as much data.
It follows that the ISTFT operation must discard 7/8 of the data you provide it.
Most of the data in a STFT is redundant, and if you just make up values for all of the data, it is unlikely to correspond to a real signal.
In that case, an implementation of ISTFT will probably use a least-squares fit or other method of producing a signal with an STFT that matches your data as closely as possible, but it won't always be close.
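The redundancy argument can be checked numerically. The sketch below uses scipy.signal.stft and scipy.signal.istft (the SciPy functions mentioned in the question) with an assumed segment length of 256 and SciPy's default Hann window and 50% overlap. The random inputs and variable names are illustrative only.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
nperseg = 256  # assumed segment length

# Case 1: start from a real signal. istft(stft(s)) recovers s.
s = rng.standard_normal(4096)
_, _, Z = stft(s, nperseg=nperseg)
_, s_rec = istft(Z, nperseg=nperseg)
signal_error = float(np.max(np.abs(s_rec[: len(s)] - s)))

# Case 2: start from a made-up complex array of the same shape.
# It is almost surely not the STFT of any signal, so the round trip changes it.
Z_fake = rng.standard_normal(Z.shape) + 1j * rng.standard_normal(Z.shape)
_, x = istft(Z_fake, nperseg=nperseg)
_, _, Z_back = stft(x, nperseg=nperseg)
n = min(Z_fake.shape[1], Z_back.shape[1])  # compare the frames both arrays share
relative_error = float(
    np.linalg.norm(Z_back[:, :n] - Z_fake[:, :n]) / np.linalg.norm(Z_fake[:, :n])
)

print("signal round trip, max abs error:", signal_error)
print("made-up STFT round trip, relative error:", relative_error)
```

The first error should be at the level of floating-point rounding, while the second should be a substantial fraction of the input's norm, which matches the observation that only arrays produced by an actual STFT survive the round trip.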

intensity of point process - weights with covariate - spatstat

I am trying spatstat for a specific case. In my shapefile of roads, I have the attributes speed and % of heavy vehicles for each road. It has been observed that severe accidents are more likely on roads with high speeds and more heavy vehicles (because the road is not properly access controlled and pedestrians cross it). We know the rate at which accidents occur (per 5 km stretch).
I would like to generate a random Poisson point pattern with that rate, but weighted so that points occur more often on roads with high speed (or a high % of trucks),
and if possible also to include the second variable, % of trucks.
What is the best way to model the two aspects to make a small proof of concept? I have read (portions of) the spatstat book and section on influence of covariates on intensity, but this is still unclear to me.
Thanks
The spatstat function rpoislpp generates a Poisson random point pattern on the network with a given intensity. In this case, you want a spatially-varying intensity, which can be specified by a function of spatial location. That is, you want something like rpoislpp(f, L) where L is the linear network and f is the intensity function.
I assume you have obtained values of the covariate (like speed limit and fraction of trucks) for each road. Then you need to build a function that looks up these values at any spatial location on the network. Once you have this, you can write the intensity function in terms of it.
To start, suppose you have a network L (object of class linnet). The segments of the network can be indexed in the original order given when you specified them: or you can extract these segments by S <- as.psp(L). We need a vector z giving the covariate values for each of these segments (so this will be a numeric vector of length n=nsegments(S)). Then z[i] is the covariate value along segment i. (Note: if you have covariate values for each road, where a road consists of multiple segments of L, then you first need to figure out which segments of L belong to each road, and construct z.)
Next do the following:
Zfun <- linfun(function(x,y,seg,tp) { z[seg] }, L)
This creates a function on the linear network (class linfun) that evaluates the covariate at any spatial location on L. To check it's built correctly, type plot(Zfun).
Now suppose you want the point process intensity to be lambda = exp(3*Z+2). Then do
lam <- function(x,y,seg,tp) { exp(3 * z[seg] + 2) }
lambda <- linfun(lam, L)
(Needless to say, you can write any mathematical expression in the braces; and you can have more than one covariate, etc.)
Finally generate the random points:
X <- rpoislpp(lambda, L)
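For readers who want to see the idea outside R, here is a small standard-library Python sketch of the same construction. It is not spatstat and makes simplifying assumptions: each segment carries constant covariate values, the intensity is log-linear in two covariates (speed and truck fraction), and points are placed by accumulating exponential gaps along each segment, which yields a Poisson process with that segment's rate. The coefficients, field names and the function simulate_points are all hypothetical.

```python
import math
import random


def simulate_points(segments, b0, b_speed, b_truck, seed=0):
    """Return (segment index, position in km) pairs from a Poisson process.

    The intensity per km on a segment is exp(b0 + b_speed*speed + b_truck*truck_frac).
    """
    rng = random.Random(seed)
    points = []
    for i, seg in enumerate(segments):
        lam = math.exp(b0 + b_speed * seg["speed"] + b_truck * seg["truck_frac"])
        # Gaps between consecutive points of a Poisson process are exponential.
        pos = rng.expovariate(lam)
        while pos < seg["length_km"]:
            points.append((i, pos))
            pos += rng.expovariate(lam)
    return points


# Hypothetical network: one slow road and one fast road of equal length.
segments = [
    {"length_km": 50.0, "speed": 40.0, "truck_frac": 0.05},
    {"length_km": 50.0, "speed": 120.0, "truck_frac": 0.30},
]
points = simulate_points(segments, b0=-3.0, b_speed=0.05, b_truck=2.0)
counts = [sum(1 for seg_id, _ in points if seg_id == i) for i in range(len(segments))]
print(counts)
```

With these made-up coefficients the fast, truck-heavy road has an expected count roughly ninety times that of the slow road, so it should receive far more points.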

How to find a regression line for a closed set of data with 4 parameters in matlab or excel?

I have a set of data I have acquired from simulations. There are 3 parameters that go into my simulations and I get one result out.
I can graph the data from the small subset I have and see the trends for each input, but I need to be able to extrapolate this and get some form of regression equation, since the simulation takes a long time.
In MATLAB or Excel, is it possible to list the inputs and outputs to obtain a 4-parameter regression line for a given set of information?
Before this gets flagged as a duplicate: I understand polyfit will give me an equation of best fit and will be as accurate as I want it, but I need the equation to correspond to the inputs, not just a regression line.
In other words, if I have 20 simulations with inputs a, b, c and output y, is there a way to obtain a "best fit":
y=B0+B1*a+B2*b+B3*c
using the data?
My usual recommendation for higher-dimensional curve fitting is to pose the problem as a minimization problem (that may be unneeded here with the nice linear model you've proposed, but I'm a hammer-nail guy sometimes).
It starts by creating a correlation function (the functional form you think maps your inputs to the output) given a vector of fit parameters p and input data xData:
correl = @(p,xData) p(1) + p(2)*xData(:,1) + p(3)*xData(:,2) + p(4)*xData(:,3);
Then you need to define a function to minimize given the parameter vector, which I call the objective; this is typically your correlation minus your output data.
The details of this function are determined by the solver you'll use (see below).
All of the methods need a starting vector pGuess, which depends on the trends you see.
For a nonlinear correlation function, finding a good pGuess can be a trial, but it is necessary for a good solution.
fminsearch
To use fminsearch, the data must be collapsed to a scalar value using some norm (2 here):
x = [a,b,c]; % your input data as columns of x
objective = @(p) norm(correl(p,x) - y,2);
p = fminsearch(objective,pGuess); % you need to define a good pGuess
lsqnonlin
To use lsqnonlin (which solves the same problem as above in different ways), the norm-ing of the objective is not needed:
objective = @(p) correl(p,x) - y ;
p = lsqnonlin(objective,pGuess); % you need to define a good pGuess
(You can also specify lower and upper bounds on the parameter solution, which is nice.)
lsqcurvefit
To use lsqcurvefit (which is simply a wrapper for lsqnonlin), only the correlation function is needed along with the data:
p = lsqcurvefit(correl,pGuess,x,y); % you need to define a good pGuess
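Since the proposed model y=B0+B1*a+B2*b+B3*c is linear in its coefficients, it can also be fitted directly by linear least squares, with no starting guess. The sketch below shows this in Python with NumPy rather than MATLAB; the data are synthetic and generated from made-up coefficients purely so the result can be checked. In MATLAB the analogous step would be the backslash operator applied to the same design matrix.

```python
import numpy as np

# Hypothetical data: 20 simulations with inputs a, b, c and output y.
rng = np.random.default_rng(0)
a, b, c = rng.uniform(0.0, 10.0, size=(3, 20))
true_B = np.array([1.5, 2.0, -0.5, 0.25])  # made-up coefficients B0..B3
y = true_B[0] + true_B[1] * a + true_B[2] * b + true_B[3] * c

# Design matrix: a column of ones for B0, then one column per input.
A = np.column_stack([np.ones_like(a), a, b, c])

# Least-squares solution of A @ B = y.
B, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print("B0..B3 =", B)

# Predict the output for a new, unsimulated input combination.
y_new = B @ np.array([1.0, 2.0, 3.0, 4.0])
print("prediction at a=2, b=3, c=4:", y_new)
```

With real, noisy simulation output the recovered coefficients will not be exact, and extrapolating beyond the simulated input range should be treated with caution.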
