Consistency for MRTK2's IMixedRealityPointerHandler - hololens

The raw position and rotation data output by IMixedRealityPointerHandler isn't consistent across users performing the same hand gesture. What strategies can we use to make this more consistent across users? Does the device tune itself to a specific user's eye calibration over time, i.e. does more time in the device mean better eye/hand tracking for that user?

"The raw position and rotation data of the IMixedRealityPointerHandler output isn't consistent across all users with the same hand gesture"
I'm not sure my understanding is correct, but it sounds like you are saying that HoloLens does not recognize the same gesture consistently across different users.
When users make any gesture on HoloLens, they need to keep their hands within the "gesture frame", the range in which the gesture-sensing cameras can see them properly.
For this reason, users need to be trained in this area of recognition, both for the success of the action and for their own comfort.
For best results, three things mentioned in this document are essential to improve the consistency of gesture recognition:
Users need to be aware that the gesture frame exists.
The application should notify users when their gestures are nearing or breaking the gesture frame boundaries.
The consequences of breaking the gesture frame boundaries should be minimized.
In addition, we highly recommend running Calibration each time a different person uses the device (go to Settings -> Utilities -> Calibration).
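As a rough illustration of the "notify users near the boundary" point, here is a minimal sketch in plain Python. It is not MRTK code: the half-angles, the warning margin and the function name are all placeholders I made up, and the hand position is assumed to be already expressed relative to the head. In a real app the same check would live in your pointer or hand-tracking handler.

```python
import math

# Hypothetical half-angles of the gesture frame, in degrees. These are
# placeholders, not published HoloLens values; tune them for your device.
FRAME_HALF_ANGLE_H = 30.0
FRAME_HALF_ANGLE_V = 25.0
WARNING_MARGIN = 0.8  # warn once the hand is past 80% of the way to an edge


def gesture_frame_status(hand_pos_head_space):
    """Classify a hand position given in head-relative coordinates.

    hand_pos_head_space is (x, y, z) in metres with +x right, +y up and
    +z forward (an assumed convention for this sketch).
    """
    x, y, z = hand_pos_head_space
    if z <= 0:
        return "outside"  # hand is behind the sensor plane
    # Angular offset from the forward axis, as a fraction of the half-angle.
    h = abs(math.degrees(math.atan2(x, z))) / FRAME_HALF_ANGLE_H
    v = abs(math.degrees(math.atan2(y, z))) / FRAME_HALF_ANGLE_V
    worst = max(h, v)
    if worst > 1.0:
        return "outside"
    if worst > WARNING_MARGIN:
        return "near_edge"
    return "inside"
```

An app could show a visual hint on "near_edge" and pause, rather than cancel, an ongoing manipulation on "outside", which is one way to keep the consequences of leaving the frame small.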

Related

Tracking using Lucas Kanade Optical Flow, shows weird behavior, points are jumping

My goal is to implement a method that tracks people in a single camera view. For that, I'm using Scaled YOLOv4 to detect people in the scene, then I generate points inside their bounding boxes using cv2.goodFeaturesToTrack and track them using Lucas-Kanade optical flow (cv2.calcOpticalFlowPyrLK).
The problem is that the points sometimes make huge jumps, and I can't tell why. The following video shows the problem: at 0:02 the green dots jump erratically, which makes my method detect that person as a new person.
https://www.veed.io/view/37f98715-40c5-4c07-aa97-8c2242d7806c?sharingWidget=true
My question is: is this a limitation of LK optical flow, or am I doing something wrong? And is there a recommended optical flow method for tracking, or an example implementation of single-camera multi-person tracking using optical flow? I couldn't find much literature or code about it.
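Whatever the root cause turns out to be, one common sanity check for jumps like this is to compare each point's displacement with the median displacement of the points in the same bounding box and drop the outliers. Below is a minimal sketch in plain Python; it assumes the old and new point lists are what you get from cv2.calcOpticalFlowPyrLK for one person, and the function name and the threshold are hypothetical.

```python
from statistics import median


def filter_jumps(old_pts, new_pts, max_dev=10.0):
    """Keep point pairs whose motion is close to the median motion.

    old_pts and new_pts are equal-length lists of (x, y) tuples for the
    points of one bounding box. max_dev is in pixels (hypothetical value).
    """
    if not old_pts:
        return []
    dxs = [n[0] - o[0] for o, n in zip(old_pts, new_pts)]
    dys = [n[1] - o[1] for o, n in zip(old_pts, new_pts)]
    mdx, mdy = median(dxs), median(dys)
    kept = []
    for o, n, dx, dy in zip(old_pts, new_pts, dxs, dys):
        # Distance between this point's motion and the median motion.
        if ((dx - mdx) ** 2 + (dy - mdy) ** 2) ** 0.5 <= max_dev:
            kept.append((o, n))
    return kept
```

Points that get dropped can be re-seeded inside the box on the next detection, so a single bad match does not make the whole person look new.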

Detecting damaged car parts

I am trying to build a system that, given an image of a car, can assess its damage percentage and also find out which parts of the car are damaged.
Is there any way to do this using Python and OpenCV or TensorFlow?
The GitHub repositories I found that were relevant to my work are these
https://github.com/VakhoQ/damage-car-detector/tree/master/DamageCarDetector
https://github.com/neokt/car-damage-detective
But what they provide is a qualitative output (they say the car damage is high or low). I want a quantitative output (a percentage of damage) along with the names of the individual parts that are damaged.
Is this possible?
If so, please help me out.
Thank you.
To extend the good answers given by #yves-daoust: this is not a trivial task, and you should not try to solve it all at once with a single approach.
Ask yourself how a human with a comparable task, say an expert who inspects cars at the end of a leasing contract, would proceed. Then formulate requirements and also restrictions for your system.
For instance, an expert first checks for any visible issues and rates them. They may then check for technical issues that can well be hidden from optical sensors: if the car is drivable, they drive a round and judge whether the engine runs smoothly, whether the steering geometry is aligned (i.e. whether the car stays in line), whether there are minor vibrations that should not be there, and so on. They may also apply force, for example shaking the wheels by hand to check whether the bearings are OK.
If you restrict your measurement system to a normal camera sensor, you limit what your system is able to deliver.
If you just want to spot cosmetic damage, e.g. classify scratches in paint and rims, I'd say a state-of-the-art machine vision application should be able to help you to some extent:
First you'd need to detect the scratches. Bear in mind that seeing scratches, especially in the field under changing conditions (sunlight), may be very hard or even impossible for a cheap sensor. For example, to cope with reflections a system might need polarizing filters, and special-effect paints may interfere with your optical system to the point that you cannot spot anything.
Secondly, after you detect the position and dimensions of these scratches in camera coordinates, you need to transform them into real-world coordinates to get their real dimensions. It would also be of great use to know the exact location of the scratch on the car, which would require a digital twin of the car, and that is no longer trivial to do.
After determining the extent of the scratch and its position on the car, you need to apply a cost model, because some car parts are easy to fix and others are not. A scratch in the bumper just means respraying the bumper, but a scratch in the C-pillar can easily mean repainting the whole rear quarter if it should no longer be noticeable.
The same goes for bigger scratches and cracks: the optical detection model needs to be able to distinguish between scratches and cracks (which is very hard to do just by looking), and then the cost model can infer the cost, e.g. whether a bumper just needs a respray or needs complete replacement (because it is cracked and not just scratched). This cost model may seem easy, but bear in mind that it needs to be adapted to every car you "scan", because cheap damage on one car body might be very hard to fix on a different car body. I'd say this might even be harder than spotting the initial scratches, because you'd need to obtain the construction plans or repair part lists of any vehicle you want to quote (the repair handbooks and repair part lists are mostly accessible if you are a registered mechanic, but they might cost licensing fees).
You see, this is a very complex problem composed of multiple hard sub-problems. The easiest and probably the best way to do this is a bottom-up approach, i.e. start with a simple "scratch detector" that just spots scratches in paint. Then go from there and you will quickly see what is possible and what is not.
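To make the "pixels to real dimensions, then cost model" steps a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the part areas, the repair rules, the single flat cm-per-pixel scale (which ignores perspective), and the use of the bounding box as the damaged area (which overestimates thin scratches). The detections themselves would come from whatever detector you build first.

```python
from dataclasses import dataclass

# Hypothetical per-part data; real values would come from repair handbooks.
PART_AREA_CM2 = {"bumper": 12000.0, "c_pillar": 3000.0}
REPAIR_RULES = {
    # part: (action for a scratch, action for a crack)
    "bumper": ("respray bumper", "replace bumper"),
    "c_pillar": ("repaint rear quarter", "body shop inspection"),
}


@dataclass
class Damage:
    part: str
    kind: str  # "scratch" or "crack"
    width_px: float
    height_px: float


def assess(damages, cm_per_px):
    """Return {part: (damage percentage, set of repair actions)}."""
    report = {}
    for d in damages:
        # Bounding-box area converted with one flat scale factor.
        area_cm2 = (d.width_px * cm_per_px) * (d.height_px * cm_per_px)
        scratch_action, crack_action = REPAIR_RULES[d.part]
        action = crack_action if d.kind == "crack" else scratch_action
        pct, actions = report.get(d.part, (0.0, set()))
        pct = min(100.0, pct + 100.0 * area_cm2 / PART_AREA_CM2[d.part])
        report[d.part] = (pct, actions | {action})
    return report
```

For example, `assess([Damage("bumper", "scratch", 200, 100)], cm_per_px=0.1)` reports a little under 2% of the bumper affected with the action "respray bumper". The percentage is only as good as the scale and the part areas you feed in.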

Detecting Handedness from Device Use

Is there any body of evidence that we could reference to help determine whether a person is using a device (smartphone/tablet) with their left hand or right hand?
My hunch is that you may be able to use accelerometer data to detect a slight tilt, perhaps only while the user is manipulating some sort of on screen input.
The answer I'm looking for would state something like, "research shows that 90% of right handed users that utilize an input mechanism tilt their phone an average of 5° while inputting data, while 90% of left handed users utilizing an input mechanism have their phone tilted an average of -5°".
Having this data, one would be able to read accelerometer data and be able to make informed decisions regarding placement of on screen items that might otherwise be in the way for left handed users or right handed users.
You can definitely do this, but if it were me, I'd try a less complicated approach. First you need to recognize that no specific approach will yield 100% accurate results. They will be guesses, but hopefully highly probable ones. With that said, I'd explore the simple-to-capture data points of basic touch events. You can leverage these events and read the x/y coordinates at the start and end of a touch:
touchStart: Triggers when the user makes contact with the touch surface and creates a touch point inside the element the event is bound to.
touchEnd: Triggers when the user removes a touch point from the surface.
Here's one way to do it: it could be reasoned that a left-handed user scrolls up and down the page with their left thumb. Because of the way the thumb rotates, swiping up naturally causes the arc of the swipe to move outwards. In terms of touch events, if the touchStart X is greater than the touchEnd X on a swipe up, you could deduce they are left-handed. The opposite holds for a right-handed person: for a swipe up, if the touchStart X is less than the touchEnd X, you could deduce they are right-handed.
Here's one reference on getting started with touch events. Good luck!
http://www.javascriptkit.com/javatutors/touchevents.shtml
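The swipe heuristic above could be sketched like this. It is written in Python just to show the logic; in a browser you would collect the same start/end coordinates from the touch events and apply the same rule. The function name, the minimum horizontal drift and the majority vote over several swipes are my own additions, and the result is still only a guess.

```python
def guess_handedness(swipes, min_dx=5.0):
    """Guess handedness from upward swipes.

    swipes is a list of ((start_x, start_y), (end_x, end_y)) tuples in
    screen coordinates, where y grows downwards as in browser touch events.
    min_dx (hypothetical) ignores swipes that are too straight to judge.
    """
    left_votes = right_votes = 0
    for (sx, sy), (ex, ey) in swipes:
        if ey >= sy:
            continue  # not an upward swipe
        dx = ex - sx
        if abs(dx) < min_dx:
            continue  # too straight to say anything
        if dx < 0:
            left_votes += 1  # start X greater than end X
        else:
            right_votes += 1  # start X less than end X
    if left_votes == right_votes:
        return "unknown"
    return "left" if left_votes > right_votes else "right"
```

Voting over many swipes rather than deciding on a single one keeps an occasional odd gesture from flipping the layout back and forth.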
There are multiple approaches and papers discussing this topic. However, most of them were written between 2012 and 2016. After doing some research myself, I came across a fairly new article that makes use of deep learning.
What sparked my interest is the fact that they do not rely on a swipe direction, speed or position but rather on the capacitive image each finger creates during a touch.
Highly recommend reading the full paper: http://huyle.de/wp-content/papercite-data/pdf/le2019investigating.pdf
What's even better, the data set, together with Python 3.6 scripts to preprocess the data and to train and test the model described in the paper, is released under the MIT license. They also provide the trained models and the software to run the models on Android.
Git repo: https://github.com/interactionlab/CapFingerId

Questions about updating my node.js game

I am making a little game using node.js for the server and a .js file embedded in an HTML5 canvas for clients. The players each have an object they can move around with the arrow keys.
I have made 2 different ways of updating the game. The first was sending the new position of the player every time it changes. It worked, but my server had to process around 60 x/y pairs a second (the update rate of the client is 30/sec and there were 2 players moving non-stop).
The second method was to send the new position and speed/direction of the player's object only when they change direction or speed, so on the other clients the movement of the player was extrapolated using the direction/speed from the last update. My server only had to process very few x/y/speed/direction packets. However, my clients experienced a little lag when the packets arrived, since the extrapolated position was often a little bit away from the actual position written in the packet.
My question is: which method would you recommend? And how should I do lag compensation for either method?
If you have low latency, interpolate from the position at which the object is currently drawn to the new position. With low latency this makes little visible difference.
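A minimal sketch of that idea, in Python only to show the logic (the client here would of course be JavaScript): keep extrapolating with the last known speed, and when a packet arrives, spread the correction over a few frames instead of snapping to the new position. The class name, method names and the blend time are hypothetical.

```python
class RemotePlayer:
    """Client-side view of another player's object."""

    def __init__(self, x, y):
        self.x, self.y = x, y  # position currently drawn
        self.vx = self.vy = 0.0  # last known velocity, units per second
        self.err_x = self.err_y = 0.0  # correction still to be applied

    def on_update(self, x, y, vx, vy):
        """Handle a server packet: adopt the velocity, queue the position error."""
        self.vx, self.vy = vx, vy
        self.err_x, self.err_y = x - self.x, y - self.y

    def tick(self, dt, blend_time=0.2):
        """Advance one frame: extrapolate, then apply part of the remaining error."""
        self.x += self.vx * dt
        self.y += self.vy * dt
        # Fraction of the remaining error to remove this frame, so the
        # correction decays smoothly instead of causing a visible jump.
        frac = min(1.0, dt / blend_time)
        self.x += self.err_x * frac
        self.y += self.err_y * frac
        self.err_x -= self.err_x * frac
        self.err_y -= self.err_y * frac
```

With this the second update method (send only on direction or speed changes) stays cheap for the server, and the small mismatch on packet arrival is hidden rather than shown as a jump.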
If you have high latency, you can implement something like EPIC (Entity Position Interpolation Code):
http://www.mindcontrol.org/~hplus/epic/
You can also check how it is done in BrowserQuest.
https://github.com/mozilla/BrowserQuest
Good luck!

Really Basic Graphics in C# 2.0 Tutorials

I work for a ticketing agency and we print out tickets on our own ticket printer. I have been hand-coding the ticket designs and storing the templates in a database. If we need a new field added to a ticket, I add it manually and use the arcane coordinate system to estimate where the fields should go and how much the other fields need to move to accommodate the new info.
We always planned to automate this system with a simple (I stress the word simple) graphical editor. Basically we don't foresee tickets changing radically in shape any time soon. We have one size of ticket, and the ticket printer firmware is super simple because it's more of an industrial machine: it has about 10 fonts and some really basic sizing interactions.
I need to make this editor display a rectangle of the dimensions by pixel of the tickets (can even be actual size) and have a resizable grid which can toggle between superimposition and invisibility on top of the ticket rectangle and represented by dots rather than lines.
Then I want to be able to represent fields by drawing rectangles filled with the letter "x" that show the maximum size of the field (to prevent overlaps). These fields should be selectable, draggable and droppable in a snap to grid fashion.
I've worked out the maths of it but I have no idea how to draw rectangles and then draw grids in layers and then put further rectangles full of 'x'es on top of those. I also don't really know much about changing drawn positions in accordance with mouse events. It's simply not something I've ever had to do.
All the tutorials I've seen so far presume that you already know a lot about using the draw objects and are seeking to extend a basic knowledge of these things. I just need pointing in the direction of a good tutorial in manipulating floating objects in a picturebox in the first place.
Any ideas?
For those of you in need of a guide to this unusual field (unusual at least for those of us with a BIS background), I would heartily endorse:
https://web.archive.org/web/20141230145656/http://bobpowell.net/faqmain.aspx
I am now happily drawing graphical interfaces and getting them to respond to control inputs with not too much hassle.
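For anyone landing here for the selecting and snap-to-grid part rather than the drawing calls, the logic is small and independent of the graphics API. Below is a sketch in Python just to show the arithmetic; the names are made up, and in the C# editor the same steps would sit in the mouse-down, mouse-move and paint handlers of the control.

```python
import math


def snap(value, grid):
    """Round a coordinate to the nearest grid line (halves round up)."""
    return math.floor(value / grid + 0.5) * grid


class Field:
    """A ticket field drawn as a rectangle, in pixels."""

    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h

    def contains(self, px, py):
        """Hit test used on mouse-down to decide which field is selected."""
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def drop(field, mouse_x, mouse_y, offset_x, offset_y, grid):
    """Move a field to follow the mouse, snapped to the grid.

    (offset_x, offset_y) is the mouse-down position minus the field origin;
    keeping that offset stops the field from jumping to the cursor.
    """
    field.x = snap(mouse_x - offset_x, grid)
    field.y = snap(mouse_y - offset_y, grid)
```

Drawing then happens in layers in a fixed order on every repaint: ticket rectangle first, grid dots second if the grid is switched on, fields last.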

Resources