How to modify a 2d tensor to 3d in libtorch?

I have a 2D array stored as std::vector<std::vector<float>> and have converted it to a tensor. How can I change the tensor from 2D to 3D?
// 80 rows of 434 values, matching the loop bounds below
std::vector<std::vector<float>> voice(80, std::vector<float>(434));
std::ifstream fp("data.txt");
if (!fp) {
    std::cout << "Error, file couldn't be opened" << std::endl;
    return 1;
}
for (int i = 0; i < 80; i++) {
    for (int j = 0; j < 434; j++) {
        if (!fp) {
            std::cout << "read error" << std::endl;
            break;
        }
        fp >> voice[i][j];
    }
}
// voice holds float values, so the dtype must be kFloat for from_blob to read them correctly
auto options = torch::TensorOptions().dtype(torch::kFloat);
auto tensor = torch::zeros({80, 434}, options);
for (int i = 0; i < 80; i++)
    tensor.slice(0, i, i + 1) = torch::from_blob(voice[i].data(), {434}, options);
The tensor is now 80 x 434. How can I add one dimension so that it becomes 1 x 80 x 434?

After creating the tensor with
auto tensor = torch::zeros({80,434}, options);
you can call view to get the 3D shape. Declaring tensor a second time in the same scope, as in
auto tensor = tensor.view({1, 80, 434});
will not compile, so I would recommend a new variable for the result:
auto transformed_tensor = tensor.view({1, 80, 434});
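
Why view can add a leading dimension of size 1 without copying anything can be illustrated with a small sketch. It is plain Python rather than libtorch, and it assumes a contiguous row-major buffer; `row_major_offset` is a hypothetical helper written for this illustration, not a libtorch function.

```python
def row_major_offset(index, shape):
    """Flat offset of a multi-dimensional index in a contiguous row-major buffer."""
    offset = 0
    for i, dim in zip(index, shape):
        if not 0 <= i < dim:
            raise IndexError("index out of range for shape")
        offset = offset * dim + i
    return offset

# Element [3][7] of the 80 x 434 layout and element [0][3][7] of the
# 1 x 80 x 434 layout map to the same position in the buffer.
print(row_major_offset((3, 7), (80, 434)))
print(row_major_offset((0, 3, 7), (1, 80, 434)))
```

Both calls print the same offset, which is why only the shape metadata needs to change and the data stays where it is.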

Related

Problem with Serial.read() and Struct.pack / serial communication between Arduino and Python (3.x)

I have a problem sending values from Python 3.x to an Arduino over serial communication.
It works fine when the value is 255 or smaller, but when it is greater than 255 the result is wrong.
I'm using struct.pack in Python and Serial.read() on the Arduino.
Python code:
import cv2
import numpy as np
from serial import Serial
import struct
arduinoData = Serial('com6', 9600)
cap = cv2.VideoCapture(0)
hand_cascade = cv2.CascadeClassifier('hand.xml')
while(True):
    ret, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    handdetect = hand_cascade.detectMultiScale(gray, 1.6, 3)
    for (x, y, w, h) in handdetect:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (127, 127, 0), 2)
        xcenter = int(x + w/2)
        ycenter = int(y + h/2)
        #This is where i send values to serial port
        arduinoData.write(struct.pack('>II',xcenter,ycenter))
    cv2.imshow('Webcam', frame)
    k = cv2.waitKey(25) & 0xff
    if k == 27:
        break
cap.release()
cv2.destroyAllWindows()
Arduino code:
int SerialData[8];
const int led = 7;
int xcenter;
int ycenter;
void setup(){
  Serial.begin(9600);
  pinMode(led, OUTPUT);
}
void loop(){
  if (Serial.available() >= 8){
    for (int i = 0; i < 8; i++){
      SerialData[i] = Serial.read();
    }
    xcenter = (SerialData[0]*1000) + (SerialData[1]*100) + (SerialData[2]*10) + SerialData[3];
    ycenter = (SerialData[4]*1000) + (SerialData[5]*100) + (SerialData[6]*10) + SerialData[7];
    if (xcenter <= 200){
      digitalWrite(led, LOW);
    }
    else if(xcenter > 200){
      digitalWrite(led, HIGH);
    }
    //Serial.flush();
  }
}
As I said at the beginning, when xcenter > 200 && xcenter <= 255 the LED turns ON, which means the code works.
But when xcenter > 255, the LED stays OFF, so something is wrong.
I think I read all 8 bytes in the Arduino code and used unsigned int (>II) in struct.pack, so where is my mistake?
I appreciate any help. Thank you!
EDIT: fixed.
"It doesn't pack int into digits (0-9), it packs them into bytes (0-255)"
So this was the mistake:
xcenter = (SerialData[0]*1000) + (SerialData[1]*100) + (SerialData[2]*10) + SerialData[3];
ycenter = (SerialData[4]*1000) + (SerialData[5]*100) + (SerialData[6]*10) + SerialData[7];
I changed it to this (for large values):
long result = (long)(((unsigned long)(unsigned char)SerialData[0] << 24) | ((unsigned long)(unsigned char)SerialData[1] << 16)
| ((unsigned long)(unsigned char)SerialData[2] << 8) | (unsigned char)SerialData[3]);
Or to this (for small values):
xcenter = (SerialData[2]*256) + SerialData[3];
ycenter = (SerialData[6]*256) + SerialData[7];
Or to this (also for small values):
int result = (int)(((unsigned int)(unsigned char)SerialData[2] << 8) | (unsigned char)SerialData[3]);
With any of these changes the code works.

This code will not work correctly for large values:
int xcenter = (SerialData[0]*256*256*256) + (SerialData[1]*256*256) + (SerialData[2]*256) + SerialData[3];
The problem is that the input is 4 bytes wide, which is a long integer, while an int on AVR-based Arduino boards such as the Uno is only 2 bytes wide. In 16-bit arithmetic 256 * 256 = 0x10000 overflows, and 0x10000 & 0xFFFF = 0.
To avoid problems with values wider than 2 bytes, cast to a wide enough type and use shift operations.
This gives:
long result = (long)(((unsigned long)(unsigned char)SerialData[0] << 24) | ((unsigned long)(unsigned char)SerialData[1] << 16)
| ((unsigned long)(unsigned char)SerialData[2] << 8) | (unsigned char)SerialData[3]);
Alternatively, if you do not expect large values, use only two bytes of the input. Make sure to do the calculation with unsigned values, or you may run into sign-extension problems:
int result = (int)(((unsigned int)(unsigned char)SerialData[2] << 8) | (unsigned char)SerialData[3]);
This is very verbose, but it's safe for all input types. For example, the solution presented by the OP would not work if SerialData[] was a char array, which it should be, to avoid wasting memory.
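
To see what the Arduino actually receives, here is a minimal Python sketch of the byte layout produced by struct.pack('>II', ...). The helpers `decode_be32` and `decode_digits` are hypothetical names introduced for this illustration. They mirror the shift-based decoding and the original digit-based decoding, and the value 300 is only an example.

```python
import struct

def decode_be32(data: bytes) -> int:
    """Rebuild an unsigned 32-bit big-endian value with shifts."""
    return (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3]

def decode_digits(data: bytes) -> int:
    """The original (wrong) decoding that treats each byte as a decimal digit."""
    return data[0] * 1000 + data[1] * 100 + data[2] * 10 + data[3]

packet = struct.pack('>II', 300, 120)   # 8 bytes: two unsigned 32-bit big-endian ints
x_bytes, y_bytes = packet[:4], packet[4:]

print(list(x_bytes))            # [0, 0, 1, 44] because 300 == 1*256 + 44
print(decode_be32(x_bytes))     # 300
print(decode_digits(x_bytes))   # 54, which is <= 200, so the LED stays off

# 16-bit arithmetic wraps around: 256*256 no longer fits in two bytes
print((256 * 256) & 0xFFFF)     # 0

# If the values always fit in 16 bits, '>HH' would send 4 bytes instead of 8
print(len(struct.pack('>HH', 300, 120)))  # 4
```

Switching to '>HH' is a suggestion rather than something from the posts above. The Arduino side would then have to wait for 4 bytes instead of 8.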

Given a square matrix, calculate the absolute difference between the sums of its diagonals

import numpy as np
n = int(input())
R = n
C = n
p, s = 0, 0
print("Enter the entries in a single line (separated by space): ")
entries = list(map(int, input().split()))
matrix = np.array(entries).reshape(R, C)
print(matrix)
for i in range(R):
    for j in range(C):
        if i == j:
            p = p + matrix[i][j]
        if i + j == n - 1:
            s = s + matrix[i][j]
s1 = abs(p - s)  # the task asks for the absolute difference
print(s1)

# python3, using nested lists
def diagonal_difference(arr):
    r_sum = 0
    l_sum = 0
    for i in range(len(arr)):
        l_sum = l_sum + arr[i][i]
        r_sum = r_sum + arr[i][(len(arr)-1)-i]
    return abs(l_sum - r_sum)

Maybe this helps:
import numpy as np
c = np.array([[1,2,3],[4,5,6],[7,8,9]])
i,j = np.indices(c.shape)
sum1 = c[i==j].sum()
sum2 = c[i+j == len(c)-1].sum()
print(abs(sum1-sum2))

function absoluteDifference(arr){
  var sumDiagonalOne=0
  var sumDiagonalTwo=0
  // the inner loop stops after its first pass (j = i), so it adds arr[i][i]
  for(var i=0; i<arr.length; i++){
    for(var j=i; j<arr.length; j++){
      sumDiagonalOne+=arr[i][j]
      break
    }
  }
  var checkArray=[]
  // reverse a copy of each row so the caller's matrix is not modified
  arr.forEach(array=>checkArray.push(array.slice().reverse()))
  for(var i=0; i<checkArray.length; i++){
    for(var j=i; j<checkArray.length; j++){
      sumDiagonalTwo+=checkArray[i][j]
      break
    }
  }
  return Math.abs(sumDiagonalOne - sumDiagonalTwo)
}

#include <iostream>
#include <cstdlib>
using namespace std;
int main() {
    int n;
    cin >> n;
    int arr[n][n];
    long long int d1=0; //First Diagonal
    long long int d2=0; //Second Diagonal
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            cin >> arr[i][j];
            if (i == j) d1 += arr[i][j];
            if (i == n - j - 1) d2 += arr[i][j];
        }
    }
    cout << abs(d1 - d2) << endl; //Absolute difference of the sums across the diagonals
    return 0;
}

#!/bin/ruby
n = gets.strip.to_i
a = Array.new(n)
(0..n-1).each do |i|
a[i] = gets.strip.split(' ').map(&:to_i)
end
d1 = 0
d2 = 0
(0..n-1).each do |i|
d1 = d1 + a[i][i]
d2 = d2 + a[-i-1][i]
end
print (d1-d2).abs

Javascript in O(n)
function diagonalDifference(arr) {
const size = arr.length;
let lsum = 0;
let rsum = 0;
for(let i = 0; i < size; i ++){
lsum += arr[i][i];
rsum += arr[i][Math.abs(size - 1 - i)];
}
return Math.abs(lsum - rsum);
}

//sample array matrix 4x4
const arr=[ [ 11, 2, 4 ,5], [ 4, 5, 6,4 ], [ 10, 8, -12,6 ],[ 10, 8, -12,6 ] ];
function findMedian(arr) {
const matrixType=arr.length
const flat=arr.flat()
let sumDiag1=0
let sumDiag2=0
for(let i=0;i<matrixType;i++)
{
sumDiag1+=flat[i*(matrixType+1)]
sumDiag2+=flat[(i+1)*(matrixType-1)]
}
const diff=Math.abs(sumDiag1-sumDiag2)
return diff
}
console.log(findMedian(arr))
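
The flat-index formulas in the last answer are not obvious at first sight. In a row-major flattening of an n x n matrix, element (i, i) sits at i*n + i = i*(n+1), and element (i, n-1-i) sits at i*n + n-1-i = (i+1)*(n-1). The following Python sketch checks this against direct indexing, using the sample matrix from that answer. The function names are hypothetical and were chosen for this illustration.

```python
def diagonal_difference_flat(matrix):
    """Diagonal difference using indices into the flattened matrix."""
    n = len(matrix)
    flat = [value for row in matrix for value in row]
    primary = sum(flat[i * (n + 1)] for i in range(n))
    secondary = sum(flat[(i + 1) * (n - 1)] for i in range(n))
    return abs(primary - secondary)

def diagonal_difference_direct(matrix):
    """Diagonal difference using ordinary row/column indexing."""
    n = len(matrix)
    primary = sum(matrix[i][i] for i in range(n))
    secondary = sum(matrix[i][n - 1 - i] for i in range(n))
    return abs(primary - secondary)

arr = [[11, 2, 4, 5], [4, 5, 6, 4], [10, 8, -12, 6], [10, 8, -12, 6]]
print(diagonal_difference_flat(arr))    # 19
print(diagonal_difference_direct(arr))  # 19
```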

How to detect the fuzzy edge of a raindrop?

I want to extract the edge of a raindrop from a photo.
I divide the picture into 8*8 blocks and extract the edges using Sobel and Canny. With this I can get a rough edge.
However, I can't get the fuzzy part of the raindrop's edge.
//sobel
Mat SobelProcess(Mat src)
{
Mat Output;
Mat grad_x, grad_y, abs_grad_x, abs_grad_y, SobelImage;
Sobel(src, grad_x, CV_16S, 1, 0, CV_SCHARR, 1, 1, BORDER_DEFAULT);
Sobel(src, grad_y, CV_16S, 0, 1, CV_SCHARR, 1, 1, BORDER_DEFAULT);
convertScaleAbs(grad_x, abs_grad_x);
convertScaleAbs(grad_y, abs_grad_y);
addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0, Output);
//subtract(grad_x, grad_y, SobelImage);
//convertScaleAbs(SobelImage, Output);
return Output;
}
int main()
{
Mat Src;
Src = imread("rain.bmp",0);
imshow("src", Src);
Mat Gauss;
GaussianBlur(Src, Src, Size(5, 5), 0.5);
imshow("Gauss", Src);
//M * N = 8 * 8
int OtsuThresh[M * N];
vector<Mat>tempThresh = ImageSegment(Src);
for (int i = 0; i < M * N; i++)
{
OtsuThresh[i] = Otsu(tempThresh[i]); //get Otsu Threshold
}
vector<Mat>temp;
temp = ImageSegment(Src);//ImageSegment() is a function to divide the picture into 8*8 blocks
for (int i = 0; i < M * N; i++)
{
temp[i] = SobelProcess(temp[i]);
GaussianBlur(temp[i], temp[i], Size(3, 3), 0.5);
Canny(temp[i], temp[i], OtsuThresh[i] / 3, OtsuThresh[i]);
}
Mat Tem;
Tem = ImageMerge(temp);//ImageMerge() is a function to merge the blocks
imshow("Tem", Tem);
}
Then I tried watershed, but I can't get a good result with it either.

PyOpenCl Kernel in Loop Crashes GPU

I am writing a brute-force neighbor lookup routine using pyopencl. Later it will fit into my smoothed particle hydrodynamics code. Brute force is certainly not efficient, but it's simple and it's a starting point. I have been testing my lookup kernel and I find that it crashes when I run it in a loop. I don't get any error messages in Python, but the screen flickers off, then comes back on with a note that the graphics driver failed but has been recovered. The odd thing is that if the number of particles searched over is small (~1000 or less) it does just fine. If I increase the count (~10k) it crashes. I tried adding barriers, wait commands, and a finish command, to no avail. I checked for an array overrun but cannot find one. I am including the relevant code and apologize up front for its size, but I wanted to include everything so people can look at it. I am hoping someone can run this and reproduce the error, or tell me where I am going wrong. My setup is Python 3.5 using Spyder with pyopencl 2016.1 installed.
Thanks,
Seth
First, the main file:
import numpy as np
import gpuParameters as gpuParameters
import pyopencl as cl
import pyopencl.array as ar
from BruteForceSearch import BruteForceSearch
import time as time
dim = 3 # dimensions of the problem
n = 15000 # number of particles
nbs = 50 # number of neighbors
x = np.random.rand(n) # randomly choose some x
y = np.random.rand(n) # randomly choose some y
z = np.random.rand(n) # randomly choose some z
h = np.ones(n) # smoothing parameter for the b spline
# setup gpu context
gpu = gpuParameters.gpuParameters()
# neighbor list
nlist = -1*np.ones(n*nbs, dtype=np.int32)
# data to gpu
xg = ar.to_device(gpu.queue, x) # x pos on gpu
yg = ar.to_device(gpu.queue, y) # y pos on gpu
zg = ar.to_device(gpu.queue, z) # z pos on gpu
hg = ar.to_device(gpu.queue, h) # h pos on gpu
num_p = ar.to_device(gpu.queue, np.array(n, dtype=np.int32)) # num of particles
nb = ar.to_device(gpu.queue, np.array(nbs, dtype=np.int32)) # num of neighbors
nlst = ar.to_device(gpu.queue, nlist) # neighbor list on gpu
dg = ar.to_device(gpu.queue, np.array(dim, dtype=np.int32)) # dimension on gpu
out = ar.zeros(gpu.queue, n, np.float64) # debug parameter
# call the Brute force neighbor search and h parameter set class
srch = BruteForceSearch(gpu) # instatiate
s = time.time() # timer start
for ii in range(100):
    # set a marker; I really didn't think this would be necessary
    mark = cl.enqueue_marker(gpu.queue) # set a marker for kernel complete
    srch.search.search(gpu.queue, x.shape, None,
                       num_p.data, nb.data, dg.data, xg.data, yg.data, zg.data,
                       hg.data, nlst.data, out.data) # run the kernel
    cl.Event.wait(mark) # wait for complete run of kernel before next iteration
    # gpu.queue.finish()
    print('iteration: ', ii) # print iteration to show me it's running
e = time.time() # end the timer
cs = time.time() # clock the time it takes to return the array
nlist = nlst.get()
ce = time.time()
# output the times
print('time to calculate: ', e-s)
print('time to copy back: ', ce - cs)
GPU context class:
import pyopencl as cl
class gpuParameters:
    def __init__(self, dType = []):
        # will set up the proper context based on the given device preference;
        # if no device preference is given, defaults to the first platform
        if dType == []:
            pltfrms = cl.get_platforms()[0]
            devices = pltfrms.get_devices(cl.device_type.GPU)
            context = cl.Context(devices) # create a device context
            print(context)
            print(devices)
            self.cntxt = context # keep this context around
            self.queue = cl.CommandQueue(self.cntxt) # create a command queue for this context
            self.mF = cl.mem_flags
Neighbor lookup:
import numpy as np
import pyopencl as cl
import gpu_sph_assistance_functions as gsaf
class BruteForceSearch:
    def __init__(self, gpu):
        # instantiation of the search routine, primarily for precompiling
        # the function
        self.gpu = gpu # save the gpu context
        # set up and compile the search
        self.bruteSearch()
    def bruteSearch(self):
        W = gsaf.gpu_sph_kernel()
        self.search = cl.Program(
            self.gpu.cntxt,
W + '''__kernel void search(__global int *nP, __global int *nN,
__global int *dim,
__global double *x, __global double *y,
__global double *z, __global double *h,
__global int *nlist, __global double *out)
{
// indices
int gid = get_global_id(0); // current particle
int idv = 0; // unrolled array id
int count = 0; // count
int dm = *dim; // problem dimension
int itr = 0; // start iteration
int mxitr = 25; // max number of iterations
// calculate variables
double dms = 1.0/(*dim); // 1 over dimension for pow
double xi = x[gid]; // current x position
double yi = y[gid]; // current y position
double zi = z[gid]; // current z position
double dx = 0; // difference in x
double dy = 0; // difference in y
double dz = 0; // difference in z
double r = 0; // radius
double hg = h[gid]; // smoothing parametre
double Wsum = 0; // sum of weights
double W = 0; // current weight
double dwdx = 0; // derivative of weight in x direction
double dwdy = 0; // derivative of weight in y direction
double dwdz = 0; // derivative of weight in z direction
double dwdr = 0; // derivative of weight in r direction
double V = 0; // Volume of particle
double hn = 0; // holding value for comparison
double err = 10; // error
double tol = 1e-7; // tolerance
double diff = 0; // difference
// first clean the array of neighbors
for (int ii = 0; ii < *nN; ii++) // length of num of neighbors
{
idv = *nN*gid + ii; // unrolled index
nlist[idv] = -1; // this is a trigger for excluding values
}
// Next calculate the h parameter
while (err > tol)
{
Wsum = 0; // clean summation
for (int jj = 0; jj < *nP; jj++) // loop over all particles
{
dx = xi - x[jj];
dy = yi - y[jj];
dz = zi - z[jj];
// spline for weights
quintic_spline(dm, hg, dx, dy, dz, &W,
&dwdx, &dwdy, &dwdz, &dwdr);
Wsum += W; // add to store
}
V = 1.0/Wsum; /// volume
hn = pow(V, dms); // new h parameter
diff = hn - hg; // difference
err = fabs(diff); // error
out[gid] = err; // store error for debug
hg = hn; // reset h
itr ++; // update iter
if (itr > mxitr) // break out
{ break; }
}
h[gid] = hg; // store h
/* // get all neighbors in vicinity of particle not
// currently assessed
for(int ii = 0; ii < *nP; ii++)
{
dx = xi - x[ii];
dy = yi - y[ii];
dz = zi - z[ii];
r = sqrt(dx*dx + dy*dy + dz*dz);
if (r < 3.25*hg & count < *nN)
{
idv = *nN*gid + count;
nlist[idv] = ii;
count++;
}
}
*/
}
''').build()
The Spline function for weighting
W = '''void quintic_spline(
int dim, double h, double dx, double dy, double dz, double *W,
double *dWdx, double *dWdy, double *dWdz, double *dWdrO)
{
double pi = 3.141592654; // pi
double m3q = 0; // prefix values
double m2q = 0; // prefix values
double m1q = 0; // prefix values
double T1 = 0; // prefix values
double T2 = 0; // prefix values
double T3 = 0; // prefix values
double D1 = 0; // prefix values
double D2 = 0; // prefix values
double D3 = 0; // prefix values
double Ch = 0; // normalizing parameter for kernel
double C = 0; // normalizing prior to h
double r = sqrt(dx*dx + dy*dy + dz*dz);
double q = r/h; // normalized radius
double dqdr = 1.0/h; // intermediate derivative
double dWdq = 0; // intermediate derivative
double dWdr = 0; // intermediate derivative
double drdx = dx/r; // intermediate derivative
double drdy = dy/r; // intermediate derivative
double drdz = dz/r; // intermediate derivative
if (dim == 1)
{
C = 1.0/120.0;
}
else if (dim == 2)
{
C = 7.0/(pi*478.0);
}
else if (dim == 3)
{
C = 1.0/(120.0*pi);
}
Ch = C/pow(h, dim);
if (r <= 0)
{
drdx = 0.0;
drdy = 0.0;
drdz = 0.0;
}
// local prefix constants
m1q = 1.0 - q;
m2q = 2.0 - q;
m3q = 3.0 - q;
// smoothing parameter constants
T1 = Ch*pow(m3q, 5);
T2 = -6.0*Ch*pow(m2q, 5);
T3 = 15.0*Ch*pow(m1q, 5);
//derivative of spline coefficients
D1 = -5.0*Ch*pow(m3q,4);
D2 = 30.0*Ch*pow(m2q,4);
D3 = -75.0*Ch*pow(m1q,4);
// W calculation
if (q < 1.0)
{
*W = T1 + T2 + T3;
dWdq = D1 + D2 + D3;
}
else if (q >= 1.0 && q < 2.0)
{
*W = T1 + T2;
dWdq = D1 + D2;
}
else if (q >= 2.0 && q < 3.0)
{
*W = T1;
dWdq = D1;
}
else
{
*W = 0.0;
dWdq = 0.0;
}
dWdr = dWdq*dqdr;
// assign the derivatives
*dWdx = dWdr*drdx;
*dWdy = dWdr*drdy;
*dWdz = dWdr*drdz;
*dWdrO = dWdr;
}'''
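
For checking GPU output on small inputs, a CPU reference can help. The sketch below is a plain-Python version of the neighbor search that appears in the commented-out block of the kernel (radius 3.25*h, at most nN neighbors, -1 as the empty marker). The function name `brute_force_neighbors` and the three test points are made up for illustration. It is only a sketch for validating results and says nothing about the cause of the crash.

```python
import math

def brute_force_neighbors(points, h, max_neighbors, radius_factor=3.25):
    """Return, for each particle, up to max_neighbors indices within radius_factor * h[i]."""
    neighbors = []
    for i, (xi, yi, zi) in enumerate(points):
        found = []
        for j, (xj, yj, zj) in enumerate(points):
            r = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
            if r < radius_factor * h[i] and len(found) < max_neighbors:
                found.append(j)  # like the kernel, this includes the particle itself (r == 0)
        # pad with -1, the "no neighbor" marker used for nlist
        found += [-1] * (max_neighbors - len(found))
        neighbors.append(found)
    return neighbors

points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 1.0, 1.0)]
h = [0.05, 0.05, 0.05]
print(brute_force_neighbors(points, h, max_neighbors=3))
```

The work grows with the square of the particle count, so this is only practical for small test cases.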

I tested the code on an Intel i7-4790K CPU with AMD Accelerated Parallel Processing. It does not crash at n=150000 (I only ran one iteration). The only odd thing I noticed while quickly looking through the code was that the kernel both reads and writes the array h. This should not be a problem, but I usually try to avoid it.

Two-bit branch prediction should give higher percentage

I have a small program that is supposed to calculate the percentage of successful predictions of a 2-bit branch predictor. It is all done, but the output isn't what I expected: the percentage levels off at about 91% instead of the 98% or 99% I think it should reach. I think the problem might be in how I apply the mask to the address. Can someone look at my code and verify whether that is the issue?
The program iterates through a file that holds the branch history of a run of the gcc compiler: about 1792 addresses and a single-digit column with 1 for a branch taken and 0 for a branch not taken.
static void twoBitPredictor_v1(StreamWriter sw)
{
uint hxZero = 0x000000000;
uint uMask1 = 0x00000000;
int nCorrectPrediction = 0;
uint uSize2;
int nSize;
int nTotalReads;
int nTableMin = 2;
int nTableMax = 16;
int nTaken = 0;
uint[] uArrBt1;
sw.WriteLine("\n\nTwo-Bit Predictor Results Ver. 1\n");
sw.WriteLine("-------------------------\n");
sw.WriteLine("Total" + "\t" + "Correct");
sw.WriteLine("Reads" + "\t" + "Prediction" + "\t" + "Percentage");
System.Console.WriteLine("\n\nTwo-Bit Predictor Results Ver. 1\n");
System.Console.WriteLine("-------------------------\n");
System.Console.WriteLine("Total" + "\t" + "Correct");
System.Console.WriteLine("Reads" + "\t" + "Prediction" + "\t" + "Percentage");
for (int _i = nTableMin; _i <= nTableMax; _i++)
{
StreamReader sr2 = new StreamReader(@"C:\Temp\gccHist.txt");
nSize = _i;
uSize2 = (uint)(Math.Pow(2, nSize));
uArrBt1 = new uint[2 * uSize2];
for (int i = 0; i < uSize2; i++)
uArrBt1[i] = hxZero;
nCorrectPrediction = 0;
nTotalReads = 0;
while (!sr2.EndOfStream)
{
String[] strLineRead = sr2.ReadLine().Split(',');
uint uBRAddress = Convert.ToUInt32(strLineRead[0], 16);
uint bBranchTaken = Convert.ToUInt32(strLineRead[2]);
// >>>>> I think the problem lies in the line below, but I am not sure how to correct it.
uMask1 = uBRAddress & (0xffffffff >> 32 - nSize);
int _mask = Convert.ToInt32(uMask1);
nTaken = Convert.ToInt32(uArrBt1[2 * _mask]);
switch (Convert.ToInt32(uArrBt1[_mask]))
{
case 0:
if (bBranchTaken == 0) // Branch Not Taken
nCorrectPrediction++;
else
uArrBt1[_mask] = 1;
break;
case 1:
if (bBranchTaken == 0)
{
uArrBt1[_mask] = 0;
nCorrectPrediction++;
}
else
uArrBt1[_mask] = 3;
break;
case 2:
if (bBranchTaken == 0)
{
uArrBt1[_mask] = 3;
nCorrectPrediction++;
}
else
uArrBt1[_mask] = 0;
break;
case 3:
if (bBranchTaken == 0)
uArrBt1[_mask] = 2;
else
nCorrectPrediction++;
break;
}
nTotalReads++;
}
sr2.Close();
double percentage = ((double)nCorrectPrediction / (double)nTotalReads) * 100;
sw.WriteLine(nTotalReads + "\t" + nCorrectPrediction + "\t\t" + Math.Round(percentage, 2) + "%");
System.Console.WriteLine(nTotalReads + "\t" + nCorrectPrediction + "\t\t" + Math.Round(percentage, 2) + "%");
}
}
Here is the output:
Two-Bit Predictor Results Ver. 1
-------------------------
Total Correct
Reads Prediction Percentage
1792 997 55.64%
1792 997 55.64%
1792 1520 84.82%
1792 1522 84.93%
1792 1521 84.88%
1792 1639 91.46%
1792 1651 92.13%
1792 1649 92.02%
1792 1649 92.02%
1792 1648 91.96%
1792 1646 91.85%
1792 1646 91.85%
1792 1646 91.85%
1792 1646 91.85%
1792 1646 91.85%
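
For comparison, here is a minimal Python sketch of the textbook 2-bit saturating-counter predictor. It is not taken from the question and may not be the exact scheme the program is meant to implement. The function name `simulate_two_bit` and the trace are made up for illustration. The mask (1 << n) - 1 selects the same low n bits as 0xffffffff >> (32 - n) for the table sizes used above, so the state transitions may be worth comparing before the mask.

```python
def simulate_two_bit(trace, index_bits):
    """Return the number of correct predictions for a list of (address, taken) pairs."""
    mask = (1 << index_bits) - 1      # keeps the low index_bits bits of the address
    table = [0] * (1 << index_bits)   # one counter per entry, 0 = strongly not taken
    correct = 0
    for address, taken in trace:
        idx = address & mask
        predicted_taken = table[idx] >= 2   # states 2 and 3 predict "taken"
        if predicted_taken == bool(taken):
            correct += 1
        # saturating update: move toward 3 on taken, toward 0 on not taken
        if taken:
            table[idx] = min(table[idx] + 1, 3)
        else:
            table[idx] = max(table[idx] - 1, 0)
    return correct

# Made-up trace: one loop branch taken nine times, then falling through once.
trace = [(0x400A10, 1)] * 9 + [(0x400A10, 0)]
print(simulate_two_bit(trace, 4))  # 7 of 10 predictions are correct
```

In this scheme a counter has to be wrong twice in a row before its prediction flips, which is what keeps a single loop exit from disturbing the prediction for the next run of the loop.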
