I am looking for a real time image processing to measure the velocity and output it to another control system.
I have attached an image of a yellow stripe. This has markings on the surface that I would like to automatically detect and use for calculation. In the first step the material moves only in one direction. Here for example to the right. For me only the horizontal part of the movement is of interest, so quasi only the velocity along an x-axis. But the material moves relatively fast. At maximum speed in 28 ms the current mark (spike) is at the position of the one in front of it.
The idea is to use a Raspberry Pi 4 with a camera at the maximum of 120 fps. So every 8.3 ms a picture is generated and should make it possible to clearly detect the movement of the marks.
My questions are:
Is it possible to process the images and detaction that fast to get the velocity in nearly real time? And which algorithem should I use for this configuration? Because it would be best if I could use two or three markers per image and average the velocity of them.
And I would like to use the velocity as an input signal for another system. What is the easiest and fastest way to send the information directly to another control system?
Related
Question is, I want to calculate the speed of my arm for Slap detection. So I am using openpose to get the body points (here total points: 25) using body_25 model and using this along with the time I want to deduce the speed of my arm, i googled through openpose, stackoverflow, github.But could not succeed?
Velocity = Distance / Time = dx/dt
dx = frame3_bodypoints - frame_1_bodypoints;
dt = ?
I don't know how to find this from the openpose, is there a way I can find this? Any thoughts, would be great help!
I've never used OpenPose. But Newtonian physics would indicate that a slap corresponds to a sudden change in velocity of the hand.
I think it's a reasonable first approximation to assume that the Δt between frames is constant. Instantaneous variation in frame rate is called jitter. I would expect jitter to be small for modern recording devices. In any case, I don't know how to get instantaneous frame rate with the tools (OpenCV, PIL) that I am familiar with. I couldn't find any references to frame rate or time in the OpenPose docs.
For calculating velocity and delta-velocity, you have choices. Straight up linear velocity of the hand might be the easiest. For position changes use the geometric mean of positions (Δs = sqrt((x2-x1)^2 + (y2-y1)^2).
You could also calculate an angular velocity between the hand and the elbow, but that would be a little more involved and prone to noise.
I'm working on developing software for an electron microscope.This works by focusing a beam of electrons at a specific part of the sample and then recording the image using a sensor.
The scan data is saved as a 4D array where the first 2 dimensions are the location (x and y) at which the beam was focused and the other 2 dimensions are the raw sensor output which is a 2D image.
while analyzing the data, I realized that there are some stuck pixels which I would ideally be able to repair automatically via software.
Here is an example:
As you can see, the data shape is 256,256,256,256 which means we scanned 256x256 points and the sensor data is a 256x256 image.
On the right data browser window (called nav), you can see the scan location which is 0,0 (also marked on the left window by scanY and scanX). I drew circles around a few of the stuck pixels. here is another scan location for reference:
I can automatically detect these pixels by unraveling the sensor data and checking for locations where the scan values are always the same, but I'm not sure how to repair these.
My first guess was reading the data from all the pixels that are next to this pixel and averaging them, then storing the average value instead of the stuck value, but I'm not sure if this is a good approach.
How do professional software such as Photoshop "repair" or "hide" a defect in the picture? Are there any known algorithms for this issue? I did a bit of searching but didn't manage to find much.
I want to know how this: https://youtu.be/aFXcQvdAe08 was done.
Any ideas how this was created?
In really high level terms, I think it could happen like this:
The entire song is divided up into windows of some fraction of a second.
For each window, a Fourier transform or some equivalent transform the audio signal from the time to frequency domain.
The transformed signal is mirrored radially around the origin, where the "fatness" of each circle could correspond to "how tall the respective hump is" in the frequency graph.
So, to make the animation "smooth" one would probably have at least 10 windows a second. Hope that helps. All images are own work.
I have this fun idea of a project i'd like to do, but i'm not really sure about the math part of it. Here is the idea:
Make a plastic card that would simulate a 9 finger multitouch gesture when it is held against a capacitive screen
Based on the "9 finger" placement, determine some sort of a unique string and use it as an encryption/decryption key for an app
This way i could just open an app, touch the screen with the card and it would get authorized.
But here's the problem:
It shouldn't matter where you place the card on a screen, because the card would be pretty small to fit various screen sizes
The rectangle in which we can randomly position the 9 "fingers" would optimally be 4.5cm x 3cm
The "finger" itself is only recognized as a touch if it is about a 6mm circle (not sure if this can be made smaller)
I figured we could find the left-top "finger" and get every other "finger's" X and Y difference from it. Then concatenate the resulting numbers into a string and use it as a decryption/encryption key. So basically:
key = concat(X2 - X1, Y2 - Y1, X3 - X1, Y3 - Y1, ...)
But i think such an approach would have very few possible combinations (given a relatively small card size and a relatively big "finger") and one could easily write a program to generate all possible combinations and break the key in no time. Am i right about this? If so, how could i improve this?
Thanks for your thoughts
UPDATE 1: actually tried it out on iOS. The result is not promising, since the "fingers" get detected differently each time. The distance between them varies significantly (by as much as 40 pixels!). So i guess this is not as easy as i expected, since the OS seems to detect the touch differently each time for the same two circles.
Your question is lacking some relevant information: how far apart need the circles be so that the system can still distinguish them? What resolution can you realistically expect for the circle centers? And by “6mm circle”, do you mean 6mm diameter or radius (or even circumference)?
Lacking details, I'll make some pretty rough approximations. I'll start by requiring that two of the circles will be placed in opposite corners of the card. That way, you can find them by looking for a pair with maximal distance, and from that compute the orientation and size of the card and correct for that. This leaves 7 fingers to be placed randomly. I'll assume 1mm resolution, and restrict myself to a 45×30mm area. Which means 39×24=936 positions per circle, for a total of 9367≈6,3×1020≈269 combinations. OK, this does not exclude overlapping circles. But since the card is still rather sparsely covered, that shouldn't amount to too much. I'd say 64 bit of entropy (i.e. 264 possible combinations) should be reasonable even if you enforce non-overlapping circles. If you can really detect the circle centers with the required resolution, that is. This should be sufficient security for most applications. Far better than 8-letter passwords, but worse than the symmetric keys usually used for e.g. AES.
Since all of this depends very much on the resolution, it might be worthwhile to investigate that aspect first. Usually you'll get pixel coordinates for your finger positions, but it would be expecting too much to assume that you'd always get the pixel coordinate closest to the center of your circle. So you might start by writing a small application which draws a 6mm circle and records coordinates it receives. Then place a 6mm artificial circle in that drawn one a large number of times. Look how far the recorded positions differ from the center of circle. Take the maximum of those differences, perhaps after removing outliers. I'd add a pixel or two to that, to account for rounding errors due to the rotation of the card. Then turn that pixel count back into a metric length. This is the resolution you can expect. You might have to do this for several devices. If you do perform these experiments, let me know what you find and I'll update my answer accordingly.
I will be start working on a robotics project which involves a mobile robot that has mounted 2 cameras (1.3 MP) fixed at a distance of 0.5m in between.I also have a few ultrasonic sensors, but they have only a 10 metter range and my enviroment is rather large (as an example, take a large warehouse with many pillars, boxes, walls .etc) .My main task is to identify obstacles and also find a roughly "best" route that the robot must take in order to navigate in a "rough" enviroment (the ground floor is not smooth at all). All the image processing is not made on the robot, but on a computer with NVIDIA GT425 2Gb Ram.
My questions are :
Should I mount the cameras on a rotative suport, so that they take pictures on a wider angle?
It is posible creating a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree I can use this for obstacle avoidance and a best route construction?
If a roughly accurate 3D representation of the enviroment can be made, how can it be used as creating a map of the enviroment? (Consider the following example: the robot must sweep an fairly large area and it would be energy efficient if it would not go through the same place (or course) twice;however when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction )
I have found this response on a similar question , but I am still concerned with the accuracy of 3D reconstruction (for example a couple of boxes situated at 100m considering the small resolution and distance between the cameras).
I am just starting gathering information for this project, so if you haved worked on something similar please give me some guidelines (and some links:D) on how should I approach this specific task.
Thanks in advance,
Tamash
If you want to do obstacle avoidance, it is probably easiest to use the ultrasonic sensors. If the robot is moving at speeds suitable for a human environment then their range of 10m gives you ample time to stop the robot. Keep in mind that no system will guarantee that you don't accidentally hit something.
(2) It is posible creating a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree I can use this for obstacle avoidance and a best route construction?
Yes, this is possible. Have a look at ROS and their vSLAM. http://www.ros.org/wiki/vslam and http://www.ros.org/wiki/slam_gmapping would be two of many possible resources.
however when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction
Well, you are trying to find your position given a measurement and a map. That should be possible, and it wouldn't matter from which direction the map was created. However, there is the loop closure problem. Because you are creating a 3D map at the same time as you are trying to find your way around, you don't know whether you are at a new place or at a place you have seen before.
CONCLUSION
This is a difficult task!
Actually, it's more than one. First you have simple obstacle avoidance (i.e. Don't drive into things.). Then you want to do simultaneous localisation and mapping (SLAM, read Wikipedia on that) and finally you want to do path planning (i.e. sweeping the floor without covering area twice).
I hope that helps?
I'd say no if you mean each eye rotating independently. You won't get the accuracy you need to do the stereo correspondence and make calibration a nightmare. But if you want the whole "head" of the robot to pivot, then that may be doable. But you should have some good encoders on the joints.
If you use ROS, there are some tools which help you turn the two stereo images into a 3d point cloud. http://www.ros.org/wiki/stereo_image_proc. There is a tradeoff between your baseline (the distance between the cameras) and your resolution at different ranges. large baseline = greater resolution at large distances, but it also has a large minimum distance. I don't think i would expect more than a few centimeters of accuracy from a static stereo rig. and this accuracy only gets worse when you compound there robot's location uncertainty.
2.5. for mapping and obstacle avoidance the first thing i would try to do is segment out the ground plane. the ground plane goes to mapping, and everything above is an obstacle. check out PCL for some point cloud operating functions: http://pointclouds.org/
if you can't simply put a planar laser on the robot like a SICK or Hokuyo, then i might try to convert the 3d point cloud into a pseudo-laser-scan then use some off the shelf SLAM instead of trying to do visual slam. i think you'll have better results.
Other thoughts:
now that the Microsoft Kinect has been released, it is usually easier (and cheaper) to simply use that to get a 3d point cloud instead of doing actual stereo.
This project sounds a lot like the DARPA LAGR program. (learning applied to ground robots). That program is over, but you may be able to track down papers published from it.