I am trying to do camera-lidar calibration. For that i took some lidar data of a checkerboard. For the extrinsic calibration I selected the corner points of checkerboard in image and in pcd. While selecting the pcd points i noticed that the points for ckecker board is not on the same depth. (I have attached the top view and front view of checkerboard below). Since the points are in different depths, the ectrinsic calibration values I am getting are wrong.
This is the top view..
This is the front view..
Is there any preprocessing step that can be done for reducing this kind of spreading in clusters?
Lidar used here : Benewake CE30D
Related
I aim to make an object detection model and I labelled data with a square box
If I label the images with polygon, will it be better than square?
(labelling on image of people wearing safety helmet or not)
I did try label with polygon shape on a few images and after export txt file for YOLO
why it has only 4 points in the text file as same as labelled with a square shape
how those points will represent an area that I label accurately?
1 0.573748 0.018953 0.045332 0.036101
1 0.944520 0.098375 0.108931 0.167870
You have labeled your object in a polygonial format, but when you had made a conversion to YOLO-format the information in the labelings has reduced. The picture below shows how I suppose has happend;
...where you have done polygon shape annotation (black shape). But, the conversion has "searched" the smallest x-value from the polygonial coordinate points and smallest y-value from corresponding polygonial coordinate points. And, those are the "first two" values of your YOLO-format. The same logic has happend with the "width" and "heigth" -parameters.
A good description about the idea behind the labelling and dataset is shown in https://www.youtube.com/watch?v=h6s61a_pqfM.
In short; for your purpose (for efficiency) I propose you make fast & convenient annotation using rectangles only - no time consuming polygon annotation.
The YOLO you are using very likely only has square annotation support.
See this video showing square vs polygon quality of results for detection, and the problem of annotation time required to create custom data sets.
To use polygonal masks can I suggest switching to use YOLOv3-Polygon or YOLOv5-Polygon
I'm working on developing software for an electron microscope.This works by focusing a beam of electrons at a specific part of the sample and then recording the image using a sensor.
The scan data is saved as a 4D array where the first 2 dimensions are the location (x and y) at which the beam was focused and the other 2 dimensions are the raw sensor output which is a 2D image.
while analyzing the data, I realized that there are some stuck pixels which I would ideally be able to repair automatically via software.
Here is an example:
As you can see, the data shape is 256,256,256,256 which means we scanned 256x256 points and the sensor data is a 256x256 image.
On the right data browser window (called nav), you can see the scan location which is 0,0 (also marked on the left window by scanY and scanX). I drew circles around a few of the stuck pixels. here is another scan location for reference:
I can automatically detect these pixels by unraveling the sensor data and checking for locations where the scan values are always the same, but I'm not sure how to repair these.
My first guess was reading the data from all the pixels that are next to this pixel and averaging them, then storing the average value instead of the stuck value, but I'm not sure if this is a good approach.
How do professional software such as Photoshop "repair" or "hide" a defect in the picture? Are there any known algorithms for this issue? I did a bit of searching but didn't manage to find much.
Back story: I'm creating a Three.js based 3D graphing library. Similar to sigma.js, but 3D. It's called graphosaurus and the source can be found here. I'm using Three.js and using a single particle representing a single node in the graph.
This was the first task I had to deal with: given an arbitrary set of points (that each contain X,Y,Z coordinates), determine the optimal camera position (X,Y,Z) that can view all the points in the graph.
My initial solution (which we'll call Solution 1) involved calculating the bounding sphere of all the points and then scale the sphere to be a sphere of radius 5 around the point 0,0,0. Since the points will be guaranteed to always fall in that area, I can set a static position for the camera (assuming the FOV is static) and the data will always be visible. This works well, but it either requires changing the point coordinates the user specified, or duplicating all the points, neither of which are great.
My new solution (which we'll call Solution 2) involves not touching the coordinates of the inputted data, but instead just positioning the camera to match the data. I encountered a problem with this solution. For some reason, when dealing with really large data, the particles seem to flicker when positioned in front/behind of other particles.
Here are examples of both solutions. Make sure to move the graph around to see the effects:
Solution 1
Solution 2
You can see the diff for the code here
Let me know if you have any insight on how to get rid of the flickering. Thanks!
It turns out that my near value for the camera was too low and the far value was too high, resulting in "z-fighting". By narrowing these values on my dataset, the problem went away. Since my dataset is user dependent, I need to determine an algorithm to generate these values dynamically.
I noticed that in the sol#2 the flickering only occurs when the camera is moving. One possible reason can be that, when the camera position is changing rapidly, different transforms get applied to different particles. So if a camera moves from X to X + DELTAX during a time step, one set of particles get the camera transform for X while the others get the transform for X + DELTAX.
If you separate your rendering from the user interaction, that should fix the issue, assuming this is the issue. That means that you should apply the same transform to all the particles and the edges connecting them, by locking (not updating ) the transform matrix until the rendering loop is done.
I am currently working on a program to detect coordinates of pool balls in an image of a pool table taken from an arbitrary point.
I first calculated the table corners and warped the perspective of the image to obtain a bird's eye view. Unfortunately, this made the spherical balls appear to be slightly elliptical as shown below.
In an attempt to detect the ellipses, I extracted all but the green felt area and used a Hough transform algorithm (HoughCircles) on the resulting image shown below. Unfortunately, none of the ellipses were detected (I can only assume because they are not circles).
Is there any better method of detecting the balls in this image? I am technically using JavaCV, but OpenCV solutions should be suitable. Thank you so much for reading.
The extracted BW image is good but it needs some morphological filters to eliminate noises then you can extract external contours of each object (by cvFindContours) and fit best ellipse to them (by cvFitEllipse2).
I'm currently studying for an exam (which is in two days) for computer vision but am getting pretty confused with the epipolar geometry stuff. I'm going through some past exam papers and I'm stuck on these two questions. Any help/explanations would be appreciated.
1) How are two cameras placed one with respect to another if epipoles in both the images coincide with the principle points (traces of optical axes)?
2) How are two cameras placed one with respect to another if epipoles in both the images seat infinitely far along the Y-axis of the world co-ordinate frame and have the same x-coordinate?
For the second one, my first guess was the cameras sit on top of one another and face the same direction. I'm not sure if it's right though.
The epipole is the position of one camera's centre of projection from the point of view of the other. So:
1) The two cameras are directly facing each other. Think of taking a photo, taking a few steps forwards, turning around 180 degrees and taking another picture. The line joining the centre of projections in the two images is normal to both the image planes.
2) Think of taking a photo, crouching down a few inches and taking another one. If the epipole is at infinity it means that the line joining the centre of projections of the two viewpoints is parallel to the image plane. (The other camera can still be rotated relative to the current viewpoint though)