constantBuffer get wrong value (feat memcopy) - resources

I am trying to implement the constant buffer for the array.
So I am using MEMCOPY macro and mapping for writing into constant buffer.
First, write data into a constant buffer
struct TestStruct {
Vector2 a[10];
Vector2 b[10];
TestStruct() {}
};
TestStruct* vs = new TestStruct();
Vector2* a = new Vector2[10];
Vector2* b = new Vector2[10];
for (int i = 0; i < 10; ++i)
{
a[i] = Vector2(i);
b[1] = Vector2(i);
}
CopyMemory(vs->a, a, sizeof(Vector2)*10);
CopyMemory(vs->b, b, sizeof(Vector2)*10);
D3D11_MAPPED_SUBRESOURCE mappedData;
dContext->Map(buffer.Get(), 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedData);
CopyMemory(mappedData.pData, data, sizeof(T));
dContext->Unmap(buffer.Get(), 0);
dContext->PSSetConstantBuffers(0, 1, buffer.GetAddressOf());
and read the buffer for checking data in it
dContext->CopyResource(stagingBuffer.Get(), buffer);
D3D11_MAPPED_SUBRESOURCE mappedResource;
ZeroMemory(&mappedResource, sizeof(D3D11_MAPPED_SUBRESOURCE));
dContext->Map(stagingBuffer.Get(), 0, D3D11_MAP_READ, 0, &mappedResource);
ZeroMemory(allocatedData, bufferDesc.ByteWidth);
CopyMemory(allocatedData, mappedResource.pData, bufferDesc.ByteWidth);
dContext->Unmap(buffer, 0);
the data that I can read from 'allocatedData' is
a[0] = (0,0)
a[1] = (1,1)
a[2] = (2,2)
a[3] = (3,3)
a[4] = (4,4)
a[5] = (5,5)
a[6] = (6,6)
a[7] = (7,7)
a[8] = (8,8)
a[9] = (9,9)
b[0] = (0,0)
b[1] = (9,9)
b[2] = (0,0)
b[3] = (0,0)
b[4] = (0,0)
b[5] = (0,0)
b[6] = (0,0)
b[7] = (0,0)
b[8] = (0,0)
b[9] = (0,0)
'a' has the right value intended
but 'b' has wrong value.
i just guess there might be problem on how to use memcopy or allocating datas.

Related

Calculating number of minimum swaps to sort array (selection sort is too slow) [duplicate]

I'm working on sorting an integer sequence with no identical numbers (without loss of generality, let's assume the sequence is a permutation of 1,2,...,n) into its natural increasing order (i.e. 1,2,...,n). I was thinking about directly swapping the elements (regardless of the positions of elements; in other words, a swap is valid for any two elements) with minimal number of swaps (the following may be a feasible solution):
Swap two elements with the constraint that either one or both of them should be swapped into the correct position(s). Until every element is put in its correct position.
But I don't know how to mathematically prove if the above solution is optimal. Anyone can help?
I was able to prove this with graph-theory. Might want to add that tag in :)
Create a graph with n vertices. Create an edge from node n_i to n_j if the element in position i should be in position j in the correct ordering. You will now have a graph consisting of several non-intersecting cycles. I argue that the minimum number of swaps needed to order the graph correctly is
M = sum (c in cycles) size(c) - 1
Take a second to convince yourself of that...if two items are in a cycle, one swap can just take care of them. If three items are in a cycle, you can swap a pair to put one in the right spot, and a two-cycle remains, etc. If n items are in a cycle, you need n-1 swaps. (This is always true even if you don't swap with immediate neighbors.)
Given that, you may now be able to see why your algorithm is optimal. If you do a swap and at least one item is in the right position, then it will always reduce the value of M by 1. For any cycle of length n, consider swapping an element into the correct spot, occupied by its neighbor. You now have a correctly ordered element, and a cycle of length n-1.
Since M is the minimum number of swaps, and your algorithm always reduces M by 1 for each swap, it must be optimal.
All the cycle counting is very difficult to keep in your head. There is a way that is much simpler to memorize.
First, let's go through a sample case manually.
Sequence: [7, 1, 3, 2, 4, 5, 6]
Enumerate it: [(0, 7), (1, 1), (2, 3), (3, 2), (4, 4), (5, 5), (6, 6)]
Sort the enumeration by value: [(1, 1), (3, 2), (2, 3), (4, 4), (5, 5), (6, 6), (0, 7)]
Start from the beginning. While the index is different from the enumerated index keep on swapping the elements defined by index and enumerated index. Remember: swap(0,2);swap(0,3) is the same as swap(2,3);swap(0,2)
swap(0, 1) => [(3, 2), (1, 1), (2, 3), (4, 4), (5, 5), (6, 6), (0, 7)]
swap(0, 3) => [(4, 4), (1, 1), (2, 3), (3, 2), (5, 5), (6, 6), (0, 7)]
swap(0, 4) => [(5, 5), (1, 1), (2, 3), (3, 2), (4, 4), (6, 6), (0, 7)]
swap(0, 5) => [(6, 6), (1, 1), (2, 3), (3, 2), (4, 4), (5, 5), (0, 7)]
swap(0, 6) => [(0, 7), (1, 1), (2, 3), (3, 2), (4, 4), (5, 5), (6, 6)]
I.e. semantically you sort the elements and then figure out how to put them to the initial state via swapping through the leftmost item that is out of place.
Python algorithm is as simple as this:
def swap(arr, i, j):
arr[i], arr[j] = arr[j], arr[i]
def minimum_swaps(arr):
annotated = [*enumerate(arr)]
annotated.sort(key = lambda it: it[1])
count = 0
i = 0
while i < len(arr):
if annotated[i][0] == i:
i += 1
continue
swap(annotated, i, annotated[i][0])
count += 1
return count
Thus, you don't need to memorize visited nodes or compute some cycle length.
For your reference, here is an algorithm that I wrote, to generate the minimum number of swaps needed to sort the array. It finds the cycles as described by #Andrew Mao.
/**
* Finds the minimum number of swaps to sort given array in increasing order.
* #param ar array of <strong>non-negative distinct</strong> integers.
* input array will be overwritten during the call!
* #return min no of swaps
*/
public int findMinSwapsToSort(int[] ar) {
int n = ar.length;
Map<Integer, Integer> m = new HashMap<>();
for (int i = 0; i < n; i++) {
m.put(ar[i], i);
}
Arrays.sort(ar);
for (int i = 0; i < n; i++) {
ar[i] = m.get(ar[i]);
}
m = null;
int swaps = 0;
for (int i = 0; i < n; i++) {
int val = ar[i];
if (val < 0) continue;
while (val != i) {
int new_val = ar[val];
ar[val] = -1;
val = new_val;
swaps++;
}
ar[i] = -1;
}
return swaps;
}
We do not need to swap the actual elements, just find how many elements are not in the right index (Cycle).
The min swaps will be Cycle - 1;
Here is the code...
static int minimumSwaps(int[] arr) {
int swap=0;
boolean visited[]=new boolean[arr.length];
for(int i=0;i<arr.length;i++){
int j=i,cycle=0;
while(!visited[j]){
visited[j]=true;
j=arr[j]-1;
cycle++;
}
if(cycle!=0)
swap+=cycle-1;
}
return swap;
}
#Archibald, I like your solution, and such was my initial assumptions that sorting the array would be the simplest solution, but I don't see the need to go through the effort of the reverse-traverse as I've dubbed it, ie enumerating then sorting the array and then computing the swaps for the enums.
I find it simpler to subtract 1 from each element in the array and then to compute the swaps required to sort that list
here is my tweak/solution:
def swap(arr, i, j):
tmp = arr[i]
arr[i] = arr[j]
arr[j] = tmp
def minimum_swaps(arr):
a = [x - 1 for x in arr]
swaps = 0
i = 0
while i < len(a):
if a[i] == i:
i += 1
continue
swap(a, i, a[i])
swaps += 1
return swaps
As for proving optimality, I think #arax has a good point.
// Assuming that we are dealing with only sequence started with zero
function minimumSwaps(arr) {
var len = arr.length
var visitedarr = []
var i, start, j, swap = 0
for (i = 0; i < len; i++) {
if (!visitedarr[i]) {
start = j = i
var cycleNode = 1
while (arr[j] != start) {
j = arr[j]
visitedarr[j] = true
cycleNode++
}
swap += cycleNode - 1
}
}
return swap
}
I really liked the solution of #Ieuan Uys in Python.
What I improved on his solution;
While loop is iterated one less to increase speed; while i < len(a) - 1
Swap function is de-capsulated to make one, single function.
Extensive code comments are added to increase readability.
My code in python.
def minimumSwaps(arr):
#make array values starting from zero to match index values.
a = [x - 1 for x in arr]
#initialize number of swaps and iterator.
swaps = 0
i = 0
while i < len(a)-1:
if a[i] == i:
i += 1
continue
#swap.
tmp = a[i] #create temp variable assign it to a[i]
a[i] = a[tmp] #assign value of a[i] with a[tmp]
a[tmp] = tmp #assign value of a[tmp] with tmp (or initial a[i])
#calculate number of swaps.
swaps += 1
return swaps
Detailed explanation on what code does on an array with size n;
We check every value except last one (n-1 iterations) in the array one by one. If the value does not match with array index, then we send this value to its place where index value is equal to its value. For instance, if at a[0] = 3. Then this value should swap with a[3]. a[0] and a[3] is swapped. Value 3 will be at a[3] where it is supposed to be. One value is sent to its place. We have n-2 iteration left. I am not interested what is now a[0]. If it is not 0 at that location, it will be swapped by another value latter. Because that another value also exists in a wrong place, this will be recognized by while loop latter.
Real Example
a[4, 2, 1, 0, 3]
#iteration 0, check a[0]. 4 should be located at a[4] where the value is 3. Swap them.
a[3, 2, 1, 0, 4] #we sent 4 to the right location now.
#iteration 1, check a[1]. 2 should be located at a[2] where the value is 1. Swap them.
a[3, 1, 2, 0, 4] #we sent 2 to the right location now.
#iteration 2, check a[2]. 2 is already located at a[2]. Don't do anything, continue.
a[3, 1, 2, 0, 4]
#iteration 3, check a[3]. 0 should be located at a[0] where the value is 3. Swap them.
a[0, 1, 2, 3, 4] #we sent 0 to the right location now.
# There is no need to check final value of array. Since all swaps are done.
Nicely done solution by #bekce. If using C#, the initial code of setting up the modified array ar can be succinctly expressed as:
var origIndexes = Enumerable.Range(0, n).ToArray();
Array.Sort(ar, origIndexes);
then use origIndexes instead of ar in the rest of the code.
Swift 4 version:
func minimumSwaps(arr: [Int]) -> Int {
struct Pair {
let index: Int
let value: Int
}
var positions = arr.enumerated().map { Pair(index: $0, value: $1) }
positions.sort { $0.value < $1.value }
var indexes = positions.map { $0.index }
var swaps = 0
for i in 0 ..< indexes.count {
var val = indexes[i]
if val < 0 {
continue // Already visited.
}
while val != i {
let new_val = indexes[val]
indexes[val] = -1
val = new_val
swaps += 1
}
indexes[i] = -1
}
return swaps
}
This is the sample code in C++ that finds the minimum number of swaps to sort a permutation of the sequence of (1,2,3,4,5,.......n-2,n-1,n)
#include<bits/stdc++.h>
using namespace std;
int main()
{
int n,i,j,k,num = 0;
cin >> n;
int arr[n+1];
for(i = 1;i <= n;++i)cin >> arr[i];
for(i = 1;i <= n;++i)
{
if(i != arr[i])// condition to check if an element is in a cycle r nt
{
j = arr[i];
arr[i] = 0;
while(j != 0)// Here i am traversing a cycle as mentioned in
{ // first answer
k = arr[j];
arr[j] = j;
j = k;
num++;// reducing cycle by one node each time
}
num--;
}
}
for(i = 1;i <= n;++i)cout << arr[i] << " ";cout << endl;
cout << num << endl;
return 0;
}
Solution using Javascript.
First I set all the elements with their current index that need to be ordered, and then I iterate over the map to order only the elements that need to be swapped.
function minimumSwaps(arr) {
const mapUnorderedPositions = new Map()
for (let i = 0; i < arr.length; i++) {
if (arr[i] !== i+1) {
mapUnorderedPositions.set(arr[i], i)
}
}
let minSwaps = 0
while (mapUnorderedPositions.size > 1) {
const currentElement = mapUnorderedPositions.entries().next().value
const x = currentElement[0]
const y = currentElement[1]
// Skip element in map if its already ordered
if (x-1 !== y) {
// Update unordered position index of swapped element
mapUnorderedPositions.set(arr[x-1], y)
// swap in array
arr[y] = arr[x-1]
arr[x-1] = x
// Increment swaps
minSwaps++
}
mapUnorderedPositions.delete(x)
}
return minSwaps
}
If you have an input like 7 2 4 3 5 6 1, this is how the debugging will go:
Map { 7 => 0, 4 => 2, 3 => 3, 1 => 6 }
currentElement [ 7, 0 ]
swapping 1 with 7
[ 1, 2, 4, 3, 5, 6, 7 ]
currentElement [ 4, 2 ]
swapping 3 with 4
[ 1, 2, 3, 4, 5, 6, 7 ]
currentElement [ 3, 2 ]
skipped
minSwaps = 2
Finding the minimum number of swaps required to put a permutation of 1..N in order.
We can use that the we know what the sort result would be: 1..N, which means we don't actually have to do swaps just count them.
The shuffling of 1..N is called a permutation, and is composed of disjoint cyclic permutations, for example, this permutation of 1..6:
1 2 3 4 5 6
6 4 2 3 5 1
Is composed of the cyclic permutations (1,6)(2,4,3)(5)
1->6(->1) cycle: 1 swap
2->4->3(->2) cycle: 2 swaps
5(->5) cycle: 0 swaps
So a cycle of k elements requires k-1 swaps to put in order.
Since we know where each element "belongs" (i.e. value k belongs at position k-1) we can easily traverse the cycle. Start at 0, we get 6, which belongs at 5,
and there we find 1, which belongs at 0 and we're back where we started.
To avoid re-counting a cycle later, we track which elements were visited - alternatively you could perform the swaps so that the elements are in the right place when you visit them later.
The resulting code:
def minimumSwaps(arr):
visited = [False] * len(arr)
numswaps = 0
for i in range(len(arr)):
if not visited[i]:
visited[i] = True
j = arr[i]-1
while not visited[j]:
numswaps += 1
visited[j] = True
j = arr[j]-1
return numswaps
An implementation on integers with primitive types in Java (and tests).
import java.util.Arrays;
public class MinSwaps {
public static int computate(int[] unordered) {
int size = unordered.length;
int[] ordered = order(unordered);
int[] realPositions = realPositions(ordered, unordered);
boolean[] touchs = new boolean[size];
Arrays.fill(touchs, false);
int i;
int landing;
int swaps = 0;
for(i = 0; i < size; i++) {
if(!touchs[i]) {
landing = realPositions[i];
while(!touchs[landing]) {
touchs[landing] = true;
landing = realPositions[landing];
if(!touchs[landing]) { swaps++; }
}
}
}
return swaps;
}
private static int[] realPositions(int[] ordered, int[] unordered) {
int i;
int[] positions = new int[unordered.length];
for(i = 0; i < unordered.length; i++) {
positions[i] = position(ordered, unordered[i]);
}
return positions;
}
private static int position(int[] ordered, int value) {
int i;
for(i = 0; i < ordered.length; i++) {
if(ordered[i] == value) {
return i;
}
}
return -1;
}
private static int[] order(int[] unordered) {
int[] ordered = unordered.clone();
Arrays.sort(ordered);
return ordered;
}
}
Tests
import org.junit.Test;
import static org.junit.Assert.assertEquals;
public class MinimumSwapsSpec {
#Test
public void example() {
// setup
int[] unordered = new int[] { 40, 23, 1, 7, 52, 31 };
// run
int minSwaps = MinSwaps.computate(unordered);
// verify
assertEquals(5, minSwaps);
}
#Test
public void example2() {
// setup
int[] unordered = new int[] { 4, 3, 2, 1 };
// run
int minSwaps = MinSwaps.computate(unordered);
// verify
assertEquals(2, minSwaps);
}
#Test
public void example3() {
// setup
int[] unordered = new int[] {1, 5, 4, 3, 2};
// run
int minSwaps = MinSwaps.computate(unordered);
// verify
assertEquals(2, minSwaps);
}
}
Swift 4.2:
func minimumSwaps(arr: [Int]) -> Int {
let sortedValueIdx = arr.sorted().enumerated()
.reduce(into: [Int: Int](), { $0[$1.element] = $1.offset })
var checked = Array(repeating: false, count: arr.count)
var swaps = 0
for idx in 0 ..< arr.count {
if checked[idx] { continue }
var edges = 1
var cursorIdx = idx
while true {
let cursorEl = arr[cursorIdx]
let targetIdx = sortedValueIdx[cursorEl]!
if targetIdx == idx {
break
} else {
cursorIdx = targetIdx
edges += 1
}
checked[targetIdx] = true
}
swaps += edges - 1
}
return swaps
}
Python code
A = [4,3,2,1]
count = 0
for i in range (len(A)):
min_idx = i
for j in range (i+1,len(A)):
if A[min_idx] > A[j]:
min_idx = j
if min_idx > i:
A[i],A[min_idx] = A[min_idx],A[i]
count = count + 1
print "Swap required : %d" %count
In Javascript
If the count of the array starts with 1
function minimumSwaps(arr) {
var len = arr.length
var visitedarr = []
var i, start, j, swap = 0
for (i = 0; i < len; i++) {
if (!visitedarr[i]) {
start = j = i
var cycleNode = 1
while (arr[j] != start + 1) {
j = arr[j] - 1
visitedarr[j] = true
cycleNode++
}
swap += cycleNode - 1
}
}
return swap
}
else for input starting with 0
function minimumSwaps(arr) {
var len = arr.length
var visitedarr = []
var i, start, j, swap = 0
for (i = 0; i < len; i++) {
if (!visitedarr[i]) {
start = j = i
var cycleNode = 1
while (arr[j] != start) {
j = arr[j]
visitedarr[j] = true
cycleNode++
}
swap += cycleNode - 1
}
}
return swap
}
Just extending Darshan Puttaswamy code for current HackerEarth inputs
Here's a solution in Java for what #Archibald has already explained.
static int minimumSwaps(int[] arr){
int swaps = 0;
int[] arrCopy = arr.clone();
HashMap<Integer, Integer> originalPositionMap
= new HashMap<>();
for(int i = 0 ; i < arr.length ; i++){
originalPositionMap.put(arr[i], i);
}
Arrays.sort(arr);
for(int i = 0 ; i < arr.length ; i++){
while(arr[i] != arrCopy[i]){
//swap
int temp = arr[i];
arr[i] = arr[originalPositionMap.get(temp)];
arr[originalPositionMap.get(temp)] = temp;
swaps += 1;
}
}
return swaps;
}
def swap_sort(arr)
changes = 0
loop do
# Find a number that is out-of-place
_, i = arr.each_with_index.find { |val, index| val != (index + 1) }
if i != nil
# If such a number is found, then `j` is the position that the out-of-place number points to.
j = arr[i] - 1
# Swap the out-of-place number with number from position `j`.
arr[i], arr[j] = arr[j], arr[i]
# Increase swap counter.
changes += 1
else
# If there are no out-of-place number, it means the array is sorted, and we're done.
return changes
end
end
end
Apple Swift version 5.2.4
func minimumSwaps(arr: [Int]) -> Int {
var swapCount = 0
var arrayPositionValue = [(Int, Int)]()
var visitedDictionary = [Int: Bool]()
for (index, number) in arr.enumerated() {
arrayPositionValue.append((index, number))
visitedDictionary[index] = false
}
arrayPositionValue = arrayPositionValue.sorted{ $0.1 < $1.1 }
for i in 0..<arr.count {
var cycleSize = 0
var visitedIndex = i
while !visitedDictionary[visitedIndex]! {
visitedDictionary[visitedIndex] = true
visitedIndex = arrayPositionValue[visitedIndex].0
cycleSize += 1
}
if cycleSize > 0 {
swapCount += cycleSize - 1
}
}
return swapCount
}
Go version 1.17:
func minimumSwaps(arr []int32) int32 {
var swap int32
for i := 0; i < len(arr) - 1; i++{
for j := 0; j < len(arr); j++ {
if arr[j] > arr[i] {
arr[i], arr[j] = arr[j], arr[i]
swap++
}else {
continue
}
}
}
return swap
}

How to detect the fuzzy edge of a raindrop?

I want to extract the edge of the raindrop.
This is raindrop's photo.
I divide the picture into 8*8 blocks and extract the edges using sobel and canny. Now I can get a rough edge.
This is the edge I got.
I can't get the fuzzy edge of the raindrop.
This fuzzy edge I can't get
//sobel
Mat SobelProcess(Mat src)
{
Mat Output;
Mat grad_x, grad_y, abs_grad_x, abs_grad_y, SobelImage;
Sobel(src, grad_x, CV_16S, 1, 0, CV_SCHARR, 1, 1, BORDER_DEFAULT);
Sobel(src, grad_y, CV_16S, 0, 1, CV_SCHARR, 1, 1, BORDER_DEFAULT);
convertScaleAbs(grad_x, abs_grad_x);
convertScaleAbs(grad_y, abs_grad_y);
addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0, Output);
//subtract(grad_x, grad_y, SobelImage);
//convertScaleAbs(SobelImage, Output);
return Output;
}
int main()
{
Mat Src;
Src = imread("rain.bmp",0)
imshow("src", Src);
Mat Gauss;
GaussianBlur(Src, Src, Size(5, 5), 0.5);
imshow("Gauss", Src);
//M * N = 8 * 8
int OtsuThresh[M * N];
vector<Mat>tempThresh = ImageSegment(Src);
for (int i = 0; i < M * N; i++)
{
OtsuThresh[i] = Otsu(tempThresh[i]); //get Otsu Threshold
}
vector<Mat>temp;
temp = ImageSegment(Src);//ImageSegment() is a function to divide the picture into 8*8 blocks
for (int i = 0; i < M * N; i++)
{
temp[i] = SobelProcess(temp[i]);
GaussianBlur(temp[i], temp[i], Size(3, 3), 0.5);
Canny(temp[i], temp[i], OtsuThresh[i] / 3, OtsuThresh[i]);
}
Mat Tem;
Tem = ImageMerge(temp);//ImageMerge() is a function to merge the blocks
imshow("Tem", Tem);
}
Then I use watershed. But I can't use it get an ideal result.

How to get the average color of a specific area in a webcam feed (Processing/JavaScript)?

I'm using Processing to get a webcam feed from my laptop. In the top left corner, I have drawn a rectangle over the displayed feed. I'm trying to get the average color of the webcam, but only in the region contained by that rectangle.
I keep getting color (0, 0, 0), black, as the result.
Thank you all!
PS sorry if my code seems messy..I'm new at Processing and so I don't know if this might be hard to read or contain bad practices. Thank you.
import processing.video.*;
Capture webcam;
Capture cap;
PImage bg_img;
color bgColor = color(0, 0, 0);
int rMargin = 50;
int rWidth = 100;
color input = color(0, 0, 0);
color background = color(255, 255, 255);
color current;
int bgTolerance = 5;
void setup() {
size(1280,720);
// start the webcam
String[] inputs = Capture.list();
if (inputs.length == 0) {
println("Couldn't detect any webcams connected!");
exit();
}
webcam = new Capture(this, inputs[0]);
webcam.start();
}
void draw() {
if (webcam.available()) {
// read from the webcam
webcam.read();
image(webcam, 0,0);
webcam.loadPixels();
noFill();
strokeWeight(2);
stroke(255,255, 255);
rect(rMargin, rMargin, rWidth, rWidth);
int yCenter = (rWidth/2) + rMargin;
int xCenter = (rWidth/2) + rMargin;
// rectMode(CENTER);
int rectCenterIndex = (width* yCenter) + xCenter;
int r = 0, g = 0, b = 0;
//for whole image:
//for (int i=0; i<bg_img.pixels.length; i++) {
// color c = bg_img.pixels[i];
// r += c>>16&0xFF;
// g += c>>8&0xFF;
// b += c&0xFF;
//}
//r /= bg_img.pixels.length;
//g /= bg_img.pixels.length;
//b /= bg_img.pixels.length;
//CALCULATE AVG COLOR:
int i;
for(int x = 50; x <= 150; x++){
for(int y = 50; y <= 150; y++){
i = (width*y) + x;
color c = webcam.pixels[i];
r += c>>16&0xFF;
g += c>>8&0xFF;
b += c&0xFF;
}
}
r /= webcam.pixels.length;
g /= webcam.pixels.length;
b /= webcam.pixels.length;
println(r + " " + g + " " + b);
}
}
You're so close, but missing out one important aspect: the number of pixels you're sampling.
Notice in the example code that is commented out for a full image you're dividing by the full number of pixels (pixels.length).
However, in your adapted version you want to compute the average colour of only a subsection of the full image which means a smaller number of pixels.
You're only sampling an area that is 100x100 pixels meaning you need to divide by 10000 instead of webcam.pixels.length (1920x1000). That is why you get 0 as it's integer division.
This is what I mean in code:
int totalSampledPixels = rWidth * rWidth;
r /= totalSampledPixels;
g /= totalSampledPixels;
b /= totalSampledPixels;
Full tweaked sketch:
import processing.video.*;
Capture webcam;
Capture cap;
PImage bg_img;
color bgColor = color(0, 0, 0);
int rMargin = 50;
int rWidth = 100;
int rHeight = 100;
color input = color(0, 0, 0);
color background = color(255, 255, 255);
color current;
int bgTolerance = 5;
void setup() {
size(1280,720);
// start the webcam
String[] inputs = Capture.list();
if (inputs.length == 0) {
println("Couldn't detect any webcams connected!");
exit();
}
webcam = new Capture(this, inputs[0]);
webcam.start();
}
void draw() {
if (webcam.available()) {
// read from the webcam
webcam.read();
image(webcam, 0,0);
webcam.loadPixels();
noFill();
strokeWeight(2);
stroke(255,255, 255);
rect(rMargin, rMargin, rWidth, rHeight);
int yCenter = (rWidth/2) + rMargin;
int xCenter = (rWidth/2) + rMargin;
// rectMode(CENTER);
int rectCenterIndex = (width* yCenter) + xCenter;
int r = 0, g = 0, b = 0;
//for whole image:
//for (int i=0; i<bg_img.pixels.length; i++) {
// color c = bg_img.pixels[i];
// r += c>>16&0xFF;
// g += c>>8&0xFF;
// b += c&0xFF;
//}
//r /= bg_img.pixels.length;
//g /= bg_img.pixels.length;
//b /= bg_img.pixels.length;
//CALCULATE AVG COLOR:
int i;
for(int x = 0; x <= width; x++){
for(int y = 0; y <= height; y++){
if (x >= rMargin && x <= rMargin + rWidth && y >= rMargin && y <= rMargin + rHeight){
i = (width*y) + x;
color c = webcam.pixels[i];
r += c>>16&0xFF;
g += c>>8&0xFF;
b += c&0xFF;
}
}
}
//divide by just the area sampled (x >= 50 && x <= 150 && y >= 50 && y <= 150 is a 100x100 px area)
int totalSampledPixels = rWidth * rHeight;
r /= totalSampledPixels;
g /= totalSampledPixels;
b /= totalSampledPixels;
fill(r,g,b);
rect(rMargin + rWidth, rMargin, rWidth, rHeight);
println(r + " " + g + " " + b);
}
}
Bare in mind this is averaging in the RGB colour space which is not the same as perceptual colour space. For example, if you average red and yellow you'd expect orange, but in RGB, a bit of red and green makes yellow.
Hopefully the RGB average is good enough for what you need, otherwise you may need to convert from RGB to CIE XYZ colour space then to Lab colour space to compute the perceptual average (then convert back to XYZ and RGB to display on screen). If that is something you're interested in trying, you can find an older answer demonstrating this in openFrameworks (which you'll notice can be similar to Processing in simple scenarios).

PyOpenCL how to modify a matrix locally within the kernel function

I am trying to modify a matrix (Pbis) locally within a pyOpenCL kernel function and when filling up this matrix with 0 it alters the result matrix R. When executing this code we obtain weird values in the R matrix. It is probably due to memory allocation but we cannot figure out how to fix it. Normally R should be exclusively composed of the init value.
program = cl.Program(context, """
__kernel void generate_paths(__global float *P, ushort const n,
ushort N, ushort init, __global float *R){
int i = get_global_id(0);
__private float* Pbis;
for (int k=0; k<n; k++){
Pbis[k] = 0;
}
for (int j=0; j<n; j++)
{
R[i*(n+1) + j] = init;
}
R[i*(n+1) + n] = init;
}
""").build()
The parameters for the generation are:
program.generate_paths(queue, res_np.shape, None, P_buf, np.uint16(n), np.uint16(N), np.uint16(init), res_buf)
Here is the entire code for reproducibility:
import numpy as np
import pyopencl as cl
import numpy.linalg as la
import os
os.environ['PYOPENCL_COMPILER_OUTPUT'] = '1'
os.environ['PYOPENCL_CTX'] = '1'
(n, N) = (3,6)
U = np.random.uniform(0,1, size=(n+1)*N)
U = U.astype(np.float32)
P = np.matrix([[0, 1/3, 1/3, 1/3], [1/3, 0, 1/3, 1/3], [1/3, 1/3, 0, 1/3], [1/3, 1/3, 1/3, 0]])
P = P.astype(np.float32)
res_np = np.zeros((N, n+1),dtype = np.float32)
platform = cl.get_platforms()[0]
device = platform.get_devices()[0]
context = cl.Context([device])
queue = cl.CommandQueue(context)
mf = cl.mem_flags
U_buf = cl.Buffer(context, mf.COPY_HOST_PTR | mf.COPY_HOST_PTR, hostbuf=U)
P_buf = cl.Buffer(context, mf.COPY_HOST_PTR | mf.COPY_HOST_PTR, hostbuf=P)
res_buf = cl.Buffer(context, mf.WRITE_ONLY, res_np.nbytes)
init = 0
program = cl.Program(context, """
__kernel void generate_paths(__global const float *U, __global float *P, ushort const n,
ushort N, ushort init, __global float *R){
int i = get_global_id(0);
int current = init;
__private float* Pbis;
for (int k=0; k<n; k++){
Pbis[k] = 0;
}
for (int j=0; j<n; j++)
{
R[i*(n+1) + j] = current;
}
R[i*(n+1) + n] = init;
}
""").build()
#prg.multiply(queue, c.shape, None,
# np.uint16(n), np.uint16(m), np.uint16(p),
# a_buf, b_buf, c_buf)
# a_mul_b = np.empty_like(c)
# cl.enqueue_copy(queue, a_mul_b, c_buf)
program.generate_paths(queue, res_np.shape, None, U_buf, P_buf, np.uint16(n), np.uint16(N), np.uint16(init), res_buf)
chem_gen = np.empty_like(res_np)
cl.enqueue_copy(queue, chem_gen, res_buf)
print("Platform Selected = %s" %platform.name)
print("Device Selected = %s" %device.name)
print("Generated Paths:")
print (chem_gen)

PyOpenCl Kernel in Loop Crashes GPU

I am writing a neighbor look up routine that is brute force using pypopencl. Later on it will fit into my smoothed particle hydro code. Brute force certainly is not efficient but its simple and its a starting point. I have been testing my look up kernel and I find that when I run it in a loop it crashes. I don't get any error messages in python but the screen flickers off, then comes back on with a note that the graphics drivers failed but have been recovered. The odd thing is that if the number of particles that are searched over are small (~1000 or less) its does just fine. If I increase the count (~10k) it crashes. I tried adding in barriers and wait commands, and a finish command, to no avail. I checked to see if I have an array overrun but I cannot find it. I am including the relevant code and apologize upfront for the size of it but wanted to give it out everything so people can look at it. I am hoping some one can run this and recreate the error, or tell me where I am going wrong. My setup is python 3.5 using spyder and installed pyopencl 2016.1.
Thanks,
Seth
First The main file
import numpy as np
import gpuParameters as gpuParameters
import pyopencl as cl
import pyopencl.array as ar
from BruteForceSearch import BruteForceSearch
import time as time
dim = 3 # dimensions of the problem
n = 15000 # number of particles
nbs = 50 # number of neighbors
x = np.random.rand(n) # randomly choose some x
y = np.random.rand(n) # randomly choose some y
z = np.random.rand(n) # randomly choose some z
h = np.ones(n) # smoothing parameter for the b spline
# setup gpu context
gpu = gpuParameters.gpuParameters()
# neighbor list
nlist = -1*np.ones(n*nbs, dtype=np.int32)
# data to gpu
xg = ar.to_device(gpu.queue, x) # x pos on gpu
yg = ar.to_device(gpu.queue, y) # y pos on gpu
zg = ar.to_device(gpu.queue, z) # z pos on gpu
hg = ar.to_device(gpu.queue, h) # h pos on gpu
num_p = ar.to_device(gpu.queue, np.array(n, dtype=np.int32)) # num of particles
nb = ar.to_device(gpu.queue, np.array(nbs, dtype=np.int32)) # num of neighbors
nlst = ar.to_device(gpu.queue, nlist) # neighbor list on gpu
dg = ar.to_device(gpu.queue, np.array(dim, dtype=np.int32)) # dimension on gpu
out = ar.zeros(gpu.queue, n, np.float64) # debug parameter
# call the Brute force neighbor search and h parameter set class
srch = BruteForceSearch(gpu) # instatiate
s = time.time() # timer start
for ii in range(100):
# set a marker I really didn't think this would be necessary
mark = cl.enqueue_marker(gpu.queue) # set a marker for kernel complete
srch.search.search(gpu.queue, x.shape, None,
num_p.data, nb.data, dg.data, xg.data, yg.data, zg.data,
hg.data, nlst.data, out.data) # run the kernel
cl.Event.wait(mark) # wait for complete run of kernel before next iteration
# gpu.queue.finish()
print('iteration: ', ii) # print iteration time to show me its running
e = time.time() # end the timer
cs = time.time() # clock the time it takes to return the array
nlist = nlst.get()
ce = time.time()
# output the times
print('time to calculate: ', e-s)
print('time to copy back: ', ce - cs)
GPU Context Class
import pyopencl as cl
class gpuParameters:
def __init__(self, dType = []):
#will setup the proper context based on given device preference
#if no device perference given will default to first value
if dType == []:
pltfrms = cl.get_platforms()[0]
devices = pltfrms.get_devices(cl.device_type.GPU)
context = cl.Context(devices) #create a device context
print(context)
print(devices)
self.cntxt = context#keep this context in motion
self.queue = cl.CommandQueue(self.cntxt) #create a command que for this context
self.mF = cl.mem_flags
Neighbor Loop up
import numpy as np
import pyopencl as cl
import gpu_sph_assistance_functions as gsaf
class BruteForceSearch:
def __init__(self, gpu):
# instantiation of the search routine primarilly for pre compiling of
# the function
self.gpu = gpu # save the gpu context
# setup and compile the search
self.bruteSearch()
def bruteSearch(self):
W = gsaf.gpu_sph_kernel()
self.search = cl.Program(
self.gpu.cntxt,
W + '''__kernel void search(__global int *nP, __global int *nN,
__global int *dim,
__global double *x, __global double *y,
__global double *z, __global double *h,
__global int *nlist, __global double *out)
{
// indices
int gid = get_global_id(0); // current particle
int idv = 0; // unrolled array id
int count = 0; // count
int dm = *dim; // problem dimension
int itr = 0; // start iteration
int mxitr = 25; // max number of iterations
// calculate variables
double dms = 1.0/(*dim); // 1 over dimension for pow
double xi = x[gid]; // current x position
double yi = y[gid]; // current y position
double zi = z[gid]; // current z position
double dx = 0; // difference in x
double dy = 0; // difference in y
double dz = 0; // difference in z
double r = 0; // radius
double hg = h[gid]; // smoothing parametre
double Wsum = 0; // sum of weights
double W = 0; // current weight
double dwdx = 0; // derivative of weight in x direction
double dwdy = 0; // derivative of weight in y direction
double dwdz = 0; // derivative of weight in z direction
double dwdr = 0; // derivative of weight in r direction
double V = 0; // Volume of particle
double hn = 0; // holding value for comparison
double err = 10; // error
double tol = 1e-7; // tolerance
double diff = 0; // difference
// first clean the array of neighbors
for (int ii = 0; ii < *nN; ii++) // length of num of neighbors
{
idv = *nN*gid + ii; // unrolled index
nlist[idv] = -1; // this is a trigger for excluding values
}
// Next calculate the h parameter
while (err > tol)
{
Wsum = 0; // clean summation
for (int jj = 0; jj < *nP; jj++) // loop over all particles
{
dx = xi - x[jj];
dy = yi - y[jj];
dz = zi - z[jj];
// spline for weights
quintic_spline(dm, hg, dx, dy, dz, &W,
&dwdx, &dwdy, &dwdz, &dwdr);
Wsum += W; // add to store
}
V = 1.0/Wsum; /// volume
hn = pow(V, dms); // new h parameter
diff = hn - hg; // difference
err = fabs(diff); // error
out[gid] = err; // store error for debug
hg = hn; // reset h
itr ++; // update iter
if (itr > mxitr) // break out
{ break; }
}
h[gid] = hg; // store h
/* // get all neighbors in vicinity of particle not
// currently assessed
for(int ii = 0; ii < *nP; ii++)
{
dx = xi - x[ii];
dy = yi - y[ii];
dz = zi - z[ii];
r = sqrt(dx*dx + dy*dy + dz*dz);
if (r < 3.25*hg & count < *nN)
{
idv = *nN*gid + count;
nlist[idv] = ii;
count++;
}
}
*/
}
''').build()
The Spline function for weighting
W = '''void quintic_spline(
int dim, double h, double dx, double dy, double dz, double *W,
double *dWdx, double *dWdy, double *dWdz, double *dWdrO)
{
double pi = 3.141592654; // pi
double m3q = 0; // prefix values
double m2q = 0; // prefix values
double m1q = 0; // prefix values
double T1 = 0; // prefix values
double T2 = 0; // prefix values
double T3 = 0; // prefix values
double D1 = 0; // prefix values
double D2 = 0; // prefix values
double D3 = 0; // prefix values
double Ch = 0; // normalizing parameter for kernel
double C = 0; // normalizing prior to h
double r = sqrt(dx*dx + dy*dy + dz*dz);
double q = r/h; // normalized radius
double dqdr = 1.0/h; // intermediate derivative
double dWdq = 0; // intermediate derivative
double dWdr = 0; // intermediate derivative
double drdx = dx/r; // intermediate derivative
double drdy = dy/r; // intermediate derivative
double drdz = dz/r; // intermediate derivative
if (dim == 1)
{
C = 1.0/120.0;
}
else if (dim == 2)
{
C = 7.0/(pi*478.0);
}
else if (dim == 3)
{
C = 1.0/(120.0*pi);
}
Ch = C/pow(h, dim);
if (r <= 0)
{
drdx = 0.0;
drdy = 0.0;
drdz = 0.0;
}
// local prefix constants
m1q = 1.0 - q;
m2q = 2.0 - q;
m3q = 3.0 - q;
// smoothing parameter constants
T1 = Ch*pow(m3q, 5);
T2 = -6.0*Ch*pow(m2q, 5);
T3 = 15.0*Ch*pow(m1q, 5);
//derivative of spline coefficients
D1 = -5.0*Ch*pow(m3q,4);
D2 = 30.0*Ch*pow(m2q,4);
D3 = -75.0*Ch*pow(m1q,4);
// W calculation
if (q < 1.0)
{
*W = T1 + T2 + T3;
dWdq = D1 + D2 + D3;
}
else if (q >= 1.0 && q < 2.0)
{
*W = T1 + T2;
dWdq = D1 + D2;
}
else if (q >= 2.0 && q < 3.0)
{
*W = T1;
dWdq = D1;
}
else
{
*W = 0.0;
dWdq = 0.0;
}
dWdr = dWdq*dqdr;
// assign the derivatives
*dWdx = dWdr*drdx;
*dWdy = dWdr*drdy;
*dWdz = dWdr*drdz;
*dWdrO = dWdr;
}'''
I tested the code on a Intel i7-4790K CPU with AMD Accelerated Parallel Processing. It does not crash at n=150000 (I only run one iteration). The only odd thing I discovered while quickly looking into the code, was that the kernel is reading and writing in the array h. This should not be a problem, but still I usually try to avoid this.

Resources