My goal is to implement a method that tracks persons in a single camera. For that, I'm using Scaled-YOLOv4 to detect persons in the scene. I then generate points inside their bounding boxes using cv2.goodFeaturesToTrack and track them with Lucas-Kanade optical flow (cv2.calcOpticalFlowPyrLK).
The problem is that the points sometimes make huge jumps, and I can't tell why. The following video shows the problem: at 0:02, the green dots jump in a weird manner, which makes my method detect that person as a new person.
https://www.veed.io/view/37f98715-40c5-4c07-aa97-8c2242d7806c?sharingWidget=true
My question is: is this a limitation of LK optical flow, or am I doing something wrong? Is there a recommended optical flow method for tracking, or an example implementation of single-camera multi-person tracking using optical flow? I couldn't find much literature or code about it.
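One common guard against jumps like this, whatever their cause, is to drop points whose frame-to-frame displacement disagrees with the other points in the same bounding box. The following is a minimal sketch in plain Python, assuming the points are (x, y) tuples as they would look after reshaping the output of cv2.calcOpticalFlowPyrLK. The function name filter_jumps and the max_dev threshold are hypothetical and would need tuning.

```python
from statistics import median

def filter_jumps(prev_pts, next_pts, max_dev=5.0):
    """Return indices of points whose displacement stays within max_dev
    pixels (per axis) of the median displacement of the whole group."""
    if not prev_pts:
        return []
    dxs = [n[0] - p[0] for p, n in zip(prev_pts, next_pts)]
    dys = [n[1] - p[1] for p, n in zip(prev_pts, next_pts)]
    mdx, mdy = median(dxs), median(dys)
    return [i for i, (dx, dy) in enumerate(zip(dxs, dys))
            if abs(dx - mdx) <= max_dev and abs(dy - mdy) <= max_dev]

# Hypothetical data: four points in one box, the third one "jumps".
prev_pts = [(10.0, 10.0), (12.0, 11.0), (14.0, 9.0), (11.0, 13.0)]
next_pts = [(11.0, 10.5), (13.0, 11.5), (60.0, 40.0), (12.0, 13.5)]
kept = filter_jumps(prev_pts, next_pts)
print(kept)  # [0, 1, 3]
```

The status and err outputs of cv2.calcOpticalFlowPyrLK can serve as a first pass before a filter like this. A forward-backward check (tracking the new points back to the previous frame and comparing them with the originals) is another commonly used test.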
The raw position and rotation data from the IMixedRealityPointerHandler output isn't consistent across users making the same hand gesture. What strategies can we use to make this more consistent across users? Does the device adapt to a specific user's eye calibration over time, i.e. does more time in the device mean better eye/hand tracking for that user?
The raw position and rotation data of the IMixedRealityPointerHandler output isn't consistent across all users with the same hand gesture
I'm not sure my understanding is correct, but it sounds like you are saying that HoloLens does not recognize the same gesture consistently across different users.
When users make any gesture on HoloLens, they need to keep their hands within the "gesture frame", the range in which the gesture-sensing cameras can see them properly.
For this reason, users need to be trained on this recognition area, both for the success of the action and for their own comfort.
For best results, three things mentioned in this document are essential to improve the consistency of gesture recognition:
Users need to be aware of the gesture frame's existence.
Users should be notified within the application when their gestures are nearing or breaking the gesture frame boundaries.
Consequences of breaking the gesture frame boundaries should be minimized.
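The second point can be sketched roughly as follows. This is only an illustration in Python. It assumes the hand position has already been mapped to normalized coordinates (0 to 1 on each axis) inside a rectangular frame, which is a simplification and not how the device itself describes the gesture frame. The function name and the margin value are hypothetical.

```python
def gesture_frame_state(x, y, margin=0.1):
    """Classify a normalized hand position (0..1 on each axis).

    'outside'   : the hand has left the frame
    'near_edge' : the hand is within `margin` of a boundary (warn the user)
    'inside'    : the hand is safely within the frame
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return "outside"
    if min(x, 1.0 - x, y, 1.0 - y) < margin:
        return "near_edge"
    return "inside"

for pos in [(0.5, 0.5), (0.95, 0.5), (1.2, 0.5)]:
    print(pos, gesture_frame_state(*pos))
```

An application could show a visual or audio cue in the 'near_edge' state and pause, rather than cancel, the current interaction in the 'outside' state, so that leaving the frame has little consequence.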
Besides, we highly recommend running Calibration each time a different person uses the device (navigate to Settings -> Utilities -> Calibration).
I have a generative art application which starts with a small set of points, grows them outwards, and checks the growth to make sure it doesn't intersect with anything. My first naive implementation was to do it all on the main UI thread with the expected consequences. As the size grows, there are more points to check, it slows down and eventually blocks the UI.
I did the obvious thing and moved the calculations to another thread so the UI could stay responsive. This helped, but only a little. I accomplished this by having an NSBitmapImageRep that I wrap an NSGraphicsContext around so I can draw into it. But I needed to ensure that I'm not trying to draw it to the screen on the main UI thread while I'm also drawing to it on the background thread. So I introduced a lock. The drawing can take a long time as the data gets larger, too, so even this was problematic.
My latest revision has 2 NSBitmapImageReps. One holds the most recently drawn version and is drawn to the screen whenever the view needs updating. The other is drawn to on the background thread. When the drawing on the background thread is done, it's copied to the other one. I do the copy by getting the base address of each and simply calling memcpy() to actually move the pixels from one to the other. (I tried swapping them rather than copying, but even though the drawing ends with a call to [-NSGraphicsContext flushContext], I was getting partially-drawn results drawn to the window.)
The calculation thread looks like this:
BOOL done = NO;
while (!done)
{
    self->model->lockBranches();
    self->model->iterate();
    done = (!self->model->moreToDivide()) || (!self->keepIterating);
    self->model->unlockBranches();
    [self drawIntoOffscreen];
    dispatch_async(dispatch_get_main_queue(), ^{
        self.needsDisplay = YES;
    });
}
This works well enough for keeping the UI responsive. However, every time I copy the drawn image into the blitting image, I call -[NSBitmapImageRep bitmapData] to get the base address. Looking at a memory profile in Instruments, each call to that method causes a CGImage to be created. Furthermore, that CGImage isn't released until the calculations finish, which can take several minutes. This causes memory to grow pretty large. I'm seeing around 3-4 GB of CGImages in my process, even though I never need more than 2 of them. After the calculations finish and the cache is emptied, my app's memory goes down to only 350-500 MB. I hadn't thought to use an autorelease pool in the calculation loop for this, but will give it a try.
It appears that the OS is caching the images it creates. However, it doesn't clear out the cache until the calculations are finished, so it grows without bound until then. Is there any way to keep this from happening?
Don't use -bitmapData and memcpy() to copy the image. Draw the one image into the other.
I often recommend that developers read the section "NSBitmapImageRep: CoreGraphics impedance matching and performance notes" from the 10.6 AppKit release notes:
NSBitmapImageRep: CoreGraphics impedance matching and performance notes
Release notes above detail core changes at the NSImage level for
SnowLeopard. There are also substantial changes at the
NSBitmapImageRep level, also for performance and to improve impedance
matching with CoreGraphics.
NSImage is a fairly abstract representation of an image. It's pretty
much just a thing-that-can-draw, though it's less abstract than NSView
in that it should not behave differently based on aspects of the context
it's drawn into except for quality decisions. That's kind of an opaque
statement, but it can be illustrated with an example: If you draw a
button into a 100x22 region vs a 22x22 region, you can expect the
button to stretch its middle but not its end caps. An image should not
behave that way (and if you try it, you'll probably break!). An image
should always linearly and uniformly scale to fill the rect in which
it's drawn, though it may choose representations and such to optimize
quality for that region. Similarly, all the image representations in
an NSImage should represent the same drawing. Don't pack some totally
different image in as a rep.
That digression past us, an NSBitmapImageRep is a much more concrete
object. An NSImage does not have pixels, an NSBitmapImageRep does. An
NSBitmapImageRep is a chunk of data together with pixel format
information and colorspace information that allows us to interpret the
data as a rectangular array of color values.
That's the same, pretty much, as a CGImage. In SnowLeopard an
NSBitmapImageRep is natively backed by a CGImageRef, as opposed to
directly a chunk of data. The CGImageRef really has the chunk of data.
While in Leopard an NSBitmapImageRep instantiated from a CGImage would
unpack and possibly process the data (which happens when reading from
a bitmap file format), in SnowLeopard we try hard to just hang onto
the original CGImage.
This has some performance consequences. Most are good! You should see
less encoding and decoding of bitmap data as CGImages. If you
initialize a NSImage from a JPEG file, then draw it in a PDF, you
should get a PDF of the same file size as the original JPEG. In
Leopard you'd see a PDF the size of the decompressed image. To take
another example, CoreGraphics caches, including uploads to the
graphics card, are tied to CGImage instances, so the more the same
instance can be used the better.
However: To some extent, the operations that are fast with
NSBitmapImageRep have changed. CGImages are not mutable,
NSBitmapImageRep is. If you modify an NSBitmapImageRep, internally it
will likely have to copy the data out of a CGImage, incorporate your
changes, and repack it as a new CGImage. So, basically, drawing
NSBitmapImageRep is fast, looking at or modifying its pixel data is
not. This was true in Leopard, but it's more true now.
The above steps do happen lazily: If you do something that causes
NSBitmapImageRep to copy data out of its backing CGImageRef (like call
bitmapData), the bitmap will not repack the data as a CGImageRef until
it is drawn or until it needs a CGImage for some other reason. So,
certainly accessing the data is not the end of the world, and is the
right thing to do in some circumstances, but in general you should be
thinking about drawing instead. If you think you want to work with
pixels, take a look at CoreImage instead - that's the API in our
system that is truly intended for pixel processing.
This coincides with safety. A problem we've seen with our SnowLeopard
changes is that apps are rather fond of hardcoding bitmap formats. An
NSBitmapImageRep could be 8, 32, or 128 bits per pixel, it could be
floating point or not, it could be premultiplied or not, it might or
might not have an alpha channel, etc. These aspects are specified with
bitmap properties, like -bitmapFormat. Unfortunately, if someone wants
to extract the bitmapData from an NSBitmapImageRep instance, they
typically just call bitmapData, treat the data as (say) premultiplied
32 bit per pixel RGBA, and if it seems to work, call it a day.
Now that NSBitmapImageRep is not processing data as much as it used
to, random bitmap image reps you may get ahold of may have different
formats than they used to. Some of those hardcoded formats might be
wrong.
The solution is not to try to handle the complete range of formats
that NSBitmapImageRep's data might be in, that's way too hard.
Instead, draw the bitmap into something whose format you know, then
look at that.
That looks like this:
NSBitmapImageRep *bitmapIGotFromAPIThatDidNotSpecifyFormat;
NSBitmapImageRep *bitmapWhoseFormatIKnow = [[NSBitmapImageRep alloc] initWithBitmapDataPlanes:NULL pixelsWide:width pixelsHigh:height
    bitsPerSample:bps samplesPerPixel:spp hasAlpha:alpha isPlanar:isPlanar
    colorSpaceName:colorSpaceName bitmapFormat:bitmapFormat bytesPerRow:rowBytes
    bitsPerPixel:pixelBits];
[NSGraphicsContext saveGraphicsState];
[NSGraphicsContext setContext:[NSGraphicsContext graphicsContextWithBitmapImageRep:bitmapWhoseFormatIKnow]];
[bitmapIGotFromAPIThatDidNotSpecifyFormat draw];
[NSGraphicsContext restoreGraphicsState];
unsigned char *bitmapDataIUnderstand = [bitmapWhoseFormatIKnow bitmapData];
This produces no more copies of the data than just accessing
bitmapData of bitmapIGotFromAPIThatDidNotSpecifyFormat, since that
data would need to be copied out of a backing CGImage anyway. Also
note that this doesn't depend on the source drawing being a bitmap.
This is a way to get pixels in a known format for any drawing, or just
to get a bitmap. This is a much better way to get a bitmap than
calling -TIFFRepresentation, for example. It's also better than
locking focus on an NSImage and using -[NSBitmapImageRep
initWithFocusedViewRect:].
So, to sum up: (1) Drawing is fast. Playing with pixels is not. (2) If
you think you need to play with pixels, (a) consider if there's a way
to do it with drawing or (b) look into CoreImage. (3) If you still
want to get at the pixels, draw into a bitmap whose format you know
and look at those pixels.
In fact, it's a good idea to start at the earlier section with a similar title, "NSImage, CGImage, and CoreGraphics impedance matching", and read through to the later section.
By the way, there's a good chance that swapping the image reps would work, but you just weren't synchronizing them properly. You would have to show the code where both reps were used for us to know for sure.
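To illustrate what a properly synchronized swap might look like, here is a language-neutral sketch in Python rather than AppKit code. It assumes one worker thread and one display thread. The DoubleBuffer class and its method names are hypothetical, and plain byte buffers stand in for the two image reps. The key points are that the worker only ever draws into the back buffer, the swap happens under a lock only after drawing has completely finished, and the display side only touches the front buffer while holding that same lock.

```python
import threading

class DoubleBuffer:
    def __init__(self, size):
        self._front = bytearray(size)  # read by the display side
        self._back = bytearray(size)   # written by the worker thread
        self._lock = threading.Lock()

    def draw(self, fill_value):
        # Worker thread only: render into the back buffer, no lock needed.
        for i in range(len(self._back)):
            self._back[i] = fill_value

    def publish(self):
        # Worker thread only: swap references once drawing is finished.
        with self._lock:
            self._front, self._back = self._back, self._front

    def snapshot(self):
        # Display side: read the front buffer while holding the lock,
        # so a swap cannot happen in the middle of the read.
        with self._lock:
            return bytes(self._front)

buf = DoubleBuffer(1024)

def worker():
    for value in range(1, 6):
        buf.draw(value)
        buf.publish()

t = threading.Thread(target=worker)
t.start()
frames = [buf.snapshot() for _ in range(50)]
t.join()
final = buf.snapshot()
print(all(len(set(f)) == 1 for f in frames))  # True: no partially drawn frame
```

If the display side read the front buffer without the lock, or the worker swapped before its drawing was complete, partially drawn frames like the ones described in the question would be possible.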
I have an application which renders many filled polygons with OpenGL, in 2D. Filling is done by tessellation, but performance is not optimal: 1900 polygons made up of 122000 vertices (that is, about 64 vertices per polygon) are displayed in about 3 seconds.
Apparently the CPU is not the bottleneck: if I replace the calls to gluTessVertex with calls to glColor, just to test where the bottleneck is, performance doubles.
I have the same problem with loading many small textures.
Now, what are the options for improving performance? It seems that most of the time is spent in the geometry subsystem. Rendering is fast enough.
I already have a worker thread which does the load (so tesselation, texture binding) in one context, and another thread which does the draw in another context. The two contexts share objects via wglShareLists and it works like a charm.
Can I have a third thread in a third context which would handle also tesselation for half of the polygons? Anyone tried that? Is it safe? Any example of sharing objects between three contexts?
I forgot to say: I have an ATI Radeon HD 4550 graphics card, and I suppose it can handle more than 39 kB/s of data.
Increase Performance
Sounds like you're using the old fixed-function pipeline.
If you're unsure of what that is, well, the following functions are a part of the fixed-function pipeline.
glBegin()
glEnd()
glVertex*()
glTexCoord*()
glNormal*()
glColor*()
etc.
Those functions are old and render geometry immediately. That means that each time you call the above functions, that geometry gets sent to the GPU. By doing that many times, you can easily make the FPS drop well below 60 just by rendering simple things.
Instead you need to use buffers, or to be more precise, VAOs with/or VBOs (and IBOs).
A VBO, or Vertex Buffer Object, is a buffer which stores vertices that you can then render. This is much faster and better than using glBegin() and glEnd(). When you create a VBO you supply it with vertices, and they only need to be sent to the GPU once. That's basically why VBOs are fast: the vertices are already on the GPU and only require a single draw call instead of many.
The reason I said "with/or" is that in newer versions you need to create a VAO which then uses a VBO, whereas before you could simply render the VBOs directly.
Tessellation
There are multiple ways to do tessellation and things which look like/would give the effect of tessellation.
For instance, you could simply render different models according to the required LOD (Level of Detail). When you're up close to an object, you render the model with all its details, which probably has a high vertex count. The further you are from the model, the simpler the version you render, with fewer vertices and therefore less detail. You can't really do that for something like terrain, though, and you definitely shouldn't do it for dynamic and/or procedurally generated terrain.
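As a rough illustration of the LOD idea, picking a model version usually comes down to comparing the camera distance against a few thresholds. This is a minimal Python sketch. The function name and the threshold values are hypothetical and depend entirely on the scene.

```python
def select_lod(distance, thresholds=(10.0, 50.0, 200.0)):
    """Return the LOD index: 0 is the most detailed model, and the index
    grows by one each time `distance` passes a threshold."""
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)

for d in (5.0, 10.0, 120.0, 300.0):
    print(d, select_lod(d))
```

Each index would map to a separate, pre-built version of the model, with index 0 being the full-detail one.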
You can also do actual geometry tessellation, which you would do in a shader. Since tessellation is a really huge topic, I will give you 2 URLs which both explain it and include code.
Both of these articles use modern OpenGL (4.0 and later).
http://prideout.net/blog/?p=48
http://antongerdelan.net/opengl/tessellation.html
Texturing
Generating and binding textures are still the same.
Instead of using gluBuild2DMipmaps() you can use glGenerateMipmap(GL_TEXTURE_2D); it became core in OpenGL 3.0, if I remember correctly.
Again, you can (and should) replace all your glBegin() - glEnd() calls (and everything in between) with VAOs and VBOs. You can store everything you want inside a buffer: vertices, texture coordinates, normals, colors, etc. You can store these in separate buffers, or you can store them inside a single buffer, usually called an Interleaved Buffer or Interleaved VBO.
You no longer need glEnable(GL_TEXTURE_2D) and glDisable(GL_TEXTURE_2D), because that is handled within a shader: you bind textures and use them in the shader, and since you create the shader program yourself, you can make it act however you want.
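To make the interleaved layout concrete, here is a small sketch in Python that packs per-vertex attributes back to back into one byte buffer, using only the standard struct module. The attribute layout (2D position, texture coordinate, RGBA color) and all names are assumptions chosen for illustration. In a real application, the resulting bytes are what you would upload once with glBufferData, and the stride and offsets are the values you would pass to glVertexAttribPointer.

```python
import struct

# Hypothetical per-vertex layout: position (x, y), texcoord (u, v), color (r, g, b, a)
VERTEX_FORMAT = "<2f2f4B"                # little-endian: 4 floats + 4 unsigned bytes
STRIDE = struct.calcsize(VERTEX_FORMAT)  # bytes from one vertex to the next (20)
POSITION_OFFSET = 0                      # byte offset of each attribute in a vertex
TEXCOORD_OFFSET = 8
COLOR_OFFSET = 16

def pack_vertices(vertices):
    """Pack ((x, y), (u, v), (r, g, b, a)) tuples into one interleaved buffer."""
    buf = bytearray()
    for (x, y), (u, v), (r, g, b, a) in vertices:
        buf += struct.pack(VERTEX_FORMAT, x, y, u, v, r, g, b, a)
    return bytes(buf)

triangle = [
    ((0.0, 0.0), (0.0, 0.0), (255, 0, 0, 255)),
    ((1.0, 0.0), (1.0, 0.0), (0, 255, 0, 255)),
    ((0.5, 1.0), (0.5, 1.0), (0, 0, 255, 255)),
]
data = pack_vertices(triangle)
print(STRIDE, len(data))  # 20 60

# Read back the texcoord of the second vertex using stride and offset.
print(struct.unpack_from("<2f", data, 1 * STRIDE + TEXCOORD_OFFSET))  # (1.0, 0.0)
```

Storing the same attributes in separate buffers works too. The interleaved form simply keeps everything belonging to one vertex next to each other in memory.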
When using the PhotoCamera, one must create an instance of the PhotoCamera as well as a VideoBrush, and then assign that PhotoCamera instance as the source of the VideoBrush instance before the camera can be initialized. Example:
PhotoCamera camera;
VideoBrush brush;
camera = new PhotoCamera();
camera.Initialized += CameraInitialized;
brush = new VideoBrush();
brush.SetSource(camera);
The VideoBrush is clearly useful in scenarios where the developer wishes to create a viewfinder for the camera video stream by associating the VideoBrush instance with the brush of a visual object like a Canvas.Background or Rectangle.Fill. However, when that is not the case, requiring the developer to still go through the motions of creating a VideoBrush seems somewhat random at first glance.
So, two questions. Why does the PhotoCamera always need to be associated with a VideoBrush?
What is the performance impact associated with attaching the PhotoCamera to a VideoBrush? Specifically, how are calls to GetPreviewBuffer(Argb|Y|YCbCr) impacted by the associated VideoBrush?
Thanks!
PS. Hopefully this doesn't come off as pointed in any way. I'd just like to have a better understanding of why this requirement exists and how it impacts performance.
PPS. The improvements in the WP7 SDK for Mango are amazing. I'm looking forward to seeing what people come up with now that access to the sensors has been opened up.
In Mango you simply have two options. Either do as you suggested above and use a video frame in your app to take pictures, essentially grabbing a single frame from the VideoBrush.
Or you can use the old NoDo method of using the PhotoChooserTask, which will launch the framework camera app separately and return an image.
Obviously there are pros and cons to both methods, so just choose the one that suits you.