Live camera-to-shape distance calculation based on computer vision (Python 3)

The goal is to detect the walls live and export the distance to each wall. The setup: a closed room with four walls, each carrying one unique, ideal shape (triangle, square, ...). A robot with a camera roams inside the walls and uses computer vision; it should detect the shape and export the distance between the camera and the wall (or that shape).
I have implemented this with OpenCV: shape detection via cv2.approxPolyDP, and distance calculation via perimeter measurement and edge counting, then conversion of pixel length to real distance.
It works perfectly at a 90-degree viewing angle, but is not effective at other angles.
Is there a better way of doing this?
Thanks
import cv2

# `contours` comes from e.g. cv2.findContours() on a thresholded frame
for cnt in contours[1:]:
    # start from index 1: in practice the whole frame is often detected as a contour
    area = cv2.contourArea(cnt)            # area of the detected contour
    # approximate the connected pixel contour with a polygon (the shape)
    approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
    # first approximated vertex, used to place the shape-type label text
    x = approx.ravel()[0]
    y = approx.ravel()[1]
    perimeter = cv2.arcLength(cnt, True)   # perimeter of the contour
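For context, here is a minimal sketch of the pixel-length-to-real-distance step described above, based on triangle similarity; the calibration values (KNOWN_PERIMETER, KNOWN_DISTANCE, reference_pixel_perimeter) are assumed placeholders, not values from the original project:

# one-time calibration: a shape of known real perimeter, photographed at a known distance
KNOWN_PERIMETER = 0.80              # real perimeter of the reference shape in metres (assumed)
KNOWN_DISTANCE = 1.0                # distance at which the reference image was taken (assumed)
reference_pixel_perimeter = 600.0   # cv2.arcLength() of the shape in that reference image
focal_length = (reference_pixel_perimeter * KNOWN_DISTANCE) / KNOWN_PERIMETER

def distance_to_shape(pixel_perimeter):
    # triangle similarity: distance = (real size * focal length) / pixel size
    return (KNOWN_PERIMETER * focal_length) / pixel_perimeter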

At other angles you are seeing a perspective view of the shapes.
You need to use geometric transformations to neutralize the perspective effect (using an object of known shape, or the known angle of the camera).
Also note that working on rectified images is highly recommended; see camera calibration.
Edit:
Let's assume you have a square on the wall. When the camera captures an image from anything other than a 90-degree, straight-on view of the object, the square is not aligned and looks distorted, and this causes measurement error.
But you can use cv2.getPerspectiveTransform(): the function calculates the 3x3 matrix M of a perspective transform from four point correspondences.
After that, use warped = cv2.warpPerspective(img, M, (w, h)) to apply the perspective transformation to the image. In the warped image the square looks like a 90-degree, straight-on view, and your current code works well on that output image.
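A minimal sketch of that workflow, assuming the four corners of the square were already found (e.g. via cv2.approxPolyDP), that they are ordered top-left, top-right, bottom-right, bottom-left, and that img is the captured frame; the coordinates and output size are illustrative:

import cv2
import numpy as np

# four detected corners of the square in the camera image (illustrative values)
src = np.float32([[120, 80], [380, 95], [400, 330], [110, 315]])

# where those corners should land in a straight-on (fronto-parallel) view
side = 300                       # output square size in pixels (assumed)
dst = np.float32([[0, 0], [side, 0], [side, side], [0, side]])

M = cv2.getPerspectiveTransform(src, dst)        # 3x3 perspective matrix
warped = cv2.warpPerspective(img, M, (side, side))
# run the existing contour / perimeter measurement on `warped`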
Excuse my brief explanation; maybe these blog posts can help you:
4 Point OpenCV getPerspective Transform Example
Find distance from camera to object/marker using Python and OpenCV

Related

Raytracing and Computer Graphics. Color perception functions

Summary
This is a question about how to map light intensity values, as calculated in a ray tracing model, to color values perceived by humans. I have built a ray tracing model and found that including the inverse square law in the calculation of light intensities produces graphical results which I believe are unintuitive. I think this is partly due to the limited range of brightness values available in 8-bit color images, but more likely that I should not be using a linear map between light intensity and pixel color.
Background
I recently developed an interest in creating computer graphics with ray tracing techniques.
A basic ray tracing model might work something like this:
Calculate ray vectors from the center of the camera (eye) in the direction of each screen pixel to be rendered
Perform vector collision tests with all objects in the world
If there is a collision, record the color of the object at the point where the collision occurs
Create a new vector from the collision point to the nearest light
Multiply the color of the light by the color of the object
This creates reasonable but flat-looking images, even when surface normals are included in the calculation.
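For illustration only, a minimal Python sketch of that loop structure; the scene.intersect() helper, the hit record, and the light objects are hypothetical placeholders, not part of the original model:

import numpy as np

def render_pixel(eye, pixel_pos, scene, lights):
    # ray from the eye through this screen pixel
    direction = pixel_pos - eye
    direction = direction / np.linalg.norm(direction)

    hit = scene.intersect(eye, direction)        # hypothetical collision test
    if hit is None:
        return np.zeros(3)                       # background color

    # record the color of the object at the collision point
    surface_color = hit.object.color

    # vector from the collision point to the nearest light
    light = min(lights, key=lambda l: np.linalg.norm(l.position - hit.point))

    # multiply the color of the light by the color of the object
    return surface_color * light.color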
Model Extensions
My interest was in trying to extend this model by including distance in the light calculations.
If an object is lit by a light at distance d, then if the object is moved a distance 2d from the light source the illumination intensity is reduced by a factor of 4. This is the inverse square law.
It doesn't apply to all light models. (For example light arriving from an infinite distance has intensity independent of the position of an object.)
From playing around with my code I have found that this inverse square law doesn't produce the realistic lighting I was hoping for.
For example, I built some initial objects for a model of a room/scene, to test things out.
There are some objects at a distance of 3-5 from the camera.
There are walls which form a boundary for the room, and I have placed them at distances of order 10 to 100 from the camera.
There are some lights, distance of order 10 from the camera.
What I have found is this:
If the boundary of the room is more than distance 10 from the camera, the color values are very dim.
If the boundary of the room is a distance 100 from the camera, it is completely invisible.
This doesn't match up with what I would expect intuitively. It makes sense mathematically, as I am using a linear function to translate between color intensity and RGB pixel values.
Discussion
Moving an object from a distance 10 to a distance 100 reduces the color intensity by a factor of (100/10)^2 = 100. Since pixel RGB colors are in the range of 0 - 255, clearly a factor of 100 is significant and would explain why an object at distance 10 moved to distance 100 becomes completely invisible.
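To make that arithmetic concrete, a tiny illustrative sketch of a linear intensity-to-pixel mapping (the normalization constant is an assumption):

def to_pixel(intensity, max_intensity):
    # linear map from light intensity to an 8-bit pixel value
    return int(255 * min(intensity / max_intensity, 1.0))

base = 1.0                             # intensity at unit distance (assumed)
print(to_pixel(base / 10**2, base))    # inverse-square falloff at distance 10  -> 2 (very dim)
print(to_pixel(base / 100**2, base))   # inverse-square falloff at distance 100 -> 0 (invisible)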
However, I suspect that the human perception of color is non-linear in some way, and I assume this is a problem which has already been solved in computer graphics. (Otherwise raytracing engines wouldn't work.)
My guess would be there is some kind of color perception function which describes how absolute light intensities should be mapped to human perception of light intensity / color.
Does anyone know anything about this problem or can point me in the right direction?
If an object is lit by a light at distance d, then if the object is moved a distance 2d from the light source the illumination intensity is reduced by a factor of 4. This is the inverse square law.
The physical quantity you're describing here is not intensity, but radiant flux. For a discussion of radiometric concepts in the context of ray tracing, see Chapter 5.4 of Physically Based Rendering.
If the boundary of the room is more than distance 10 from the camera, the color values are very dim.
If the boundary of the room is a distance 100 from the camera, it is completely invisible.
The inverse square law can be a useful first approximation for point lights in a ray tracer (before more accurate lighting models are implemented). The key point is that the law - radiant flux falling off by the square of the distance - applies only to the light from the point source to a surface, not to the light that's then reflected from the surface to the camera.
In other words, moving the camera back from the scene shouldn't reduce the brightness of the objects in the rendered image; it should only reduce their size.
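As a hedged sketch of how that plays out in a shading calculation for a point light (the helper names and the light_power constant are illustrative, not from the question's code):

import numpy as np

def shade(hit_point, surface_color, light_pos, light_color, light_power=100.0):
    # the inverse square falloff applies to the light -> surface path only
    d = np.linalg.norm(light_pos - hit_point)
    incoming = light_color * light_power / (d * d)

    # the reflected color is NOT attenuated again by the surface -> camera distance;
    # moving the camera back only changes how many pixels the object covers
    return surface_color * incoming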

Is labelling images with polygon better than square?

I aim to make an object detection model, and I labelled the data with square (bounding box) annotations.
If I label the images with polygons, will it be better than squares?
(labelling images of people wearing safety helmets or not)
I did try labelling a few images with a polygon shape, and after exporting the txt file for YOLO:
why does it have only 4 values per object in the text file, the same as when labelled with a square shape?
How can those values accurately represent the area that I labelled?
1 0.573748 0.018953 0.045332 0.036101
1 0.944520 0.098375 0.108931 0.167870
You have labeled your object in a polygon format, but when the conversion to YOLO format was made, the information in the labels was reduced. The picture below shows what I suppose has happened;
...where you have done a polygon-shape annotation (black shape). The conversion has taken the smallest and largest x- and y-values from the polygon's coordinate points, i.e. its axis-aligned bounding box. The first two values of your YOLO format are the normalized center x and center y of that box, and the last two are its normalized width and height.
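A minimal sketch of that reduction, assuming the polygon vertices are already normalized (x, y) pairs; the function name is illustrative:

def polygon_to_yolo_box(points, class_id):
    # points: list of (x, y) polygon vertices, normalized to the 0..1 range
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # YOLO box line: class, center x, center y, width, height (all normalized)
    x_center = (x_min + x_max) / 2
    y_center = (y_min + y_max) / 2
    width = x_max - x_min
    height = y_max - y_min
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"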
A good description of the idea behind the labelling and the dataset is shown in https://www.youtube.com/watch?v=h6s61a_pqfM.
In short: for your purpose (and for efficiency) I propose you do fast and convenient annotation using rectangles only, with no time-consuming polygon annotation.
The YOLO version you are using very likely only supports box annotations.
See this video showing square vs polygon quality of results for detection, and the problem of the annotation time required to create custom datasets.
To use polygonal masks, I can suggest switching to YOLOv3-Polygon or YOLOv5-Polygon.

skimage project an image's 3D plane to fronto-parallel view

I'm working on implementing Ankush Gupta's synthetic data generation dataset (http://www.robots.ox.ac.uk/~vgg/data/scenetext/gupta16.pdf). In his work, he used a convolutional neural network to extract a point cloud from a 2-dimensional scenery image, segmented the point cloud to isolate different planes, used RANSAC to fit a 3D plane to each point cloud segment, and then warped the pixels of the segment, given the 3D plane, to a fronto-parallel view.
I'm stuck on this last part: warping my extracted 3D plane to a fronto-parallel view. I have X, Y, and Z vectors as well as a normal vector. I'm thinking what I need to do is perform some type of perspective transform or rotation that would bring all the pixels on the plane to a Z value of 0 while X and Y remain the same. I could be wrong about this; it's been a long time since I've had any formal training in geometry or linear algebra.
It looks like skimage's perspective transform requires me to know the dimensions of the final segment coordinates in 2D space. It looks like AffineTransform requires me to know the rotation. All I have at this point is my X, Y, Z and normal vector, and the suspicion that I may know my destination plane by just setting the Z axis to all zeros. I'm not sure if my assumption is correct, but I need to be able to warp all the pixels in the segment of interest to fronto-parallel, fit a bounding box, place text inside of it, then warp the final segment back to the original perspective in 3D space.
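One way to think about the rotation part, as a hedged sketch (not the paper's implementation): build a rotation that maps the plane's normal onto the Z axis, apply it to the 3D points, and then drop the now roughly constant Z coordinate; plane_normal and points are assumed inputs:

import numpy as np

def rotation_aligning(normal, target=np.array([0.0, 0.0, 1.0])):
    # Rodrigues-style rotation matrix taking `normal` onto `target`
    n = normal / np.linalg.norm(normal)
    v = np.cross(n, target)
    c = np.dot(n, target)
    if np.isclose(c, -1.0):                 # normal points exactly away from target
        return np.diag([1.0, -1.0, -1.0])
    K = np.array([[0, -v[2], v[1]],
                  [v[2], 0, -v[0]],
                  [-v[1], v[0], 0]])
    return np.eye(3) + K + (K @ K) / (1.0 + c)

# points: (N, 3) array of X, Y, Z samples lying on the fitted plane
# R = rotation_aligning(plane_normal)
# flattened = points @ R.T     # Z is now (approximately) constant
# xy = flattened[:, :2]        # fronto-parallel 2D coordinates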
Any help with how to think about this or implement it would be massively useful.

Generating density map for tree growth rings

I was just wondering if someone knows of any papers or resources on generating synthetic images of growth rings in trees. I'm thinking of 2D scalar fields or some other data representation which can then be used to render growth-ring-like images :)
Thanks!
I have never done or heard about this...
If you need a simulation, then search biology/botany sites instead.
If you just need visually close results, then I would:
make a polygon covering the cut (a circle/oval-like shape)
start with a circle and, once everything works, try adding some random distortion or use an ellipse
create a 1D texture with the density
it will be used to fill the polygon via a triangle fan. So first find an image of the type of tree you want to generate.
Analyze the color and intensity as a function of radius, so extract a pie-like piece (or a thin rectangle)
and plot a graph of the R, G, B values to see how the rings are shaped,
then create a function that approximates that (or use piecewise interpolation) and create your own texture as a function of tree age. In this way you can interpolate both the color and the density of the rings.
My example shows that for this tree the color is the same, so only its intensity changes. In that case you do not need to approximate all 3 functions. The bumps are a bit noisy due to another texture layer (ignore this at the start). You can use:
intensity = A * |cos(pi*t)| as a start
A is brightness
t is age in years/cycles (and also the x coordinate, scaled, in your 1D texture)
so take the base color R, G, B, multiply it by A for each t, and fill the texture pixel with this color (a small code sketch of this follows below). You can add some randomness to the ring period (pi*t), and the scale can also be matched more closely. This is linear growth, so you can use exponential growth instead, or interpolate to match bumps per length affected by age (distance from t=0)...
now just render the polygon
The mid point is the t=0 coordinate in the texture and each vertex of the polygon is the t=full_age coordinate in the texture. So render the triangle fan with these texture coordinates. If you need a closer match (the rings are not the same thickness along the perimeter), then you can convert this to a 2D texture.
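As mentioned in the texture step above, here is a minimal sketch of generating such a 1D ring texture from the suggested intensity = A*|cos(pi*t)| model; the base color, sampling resolution, and omission of randomness are assumptions for illustration:

import numpy as np

def ring_texture(age_years, samples_per_year=64, base_color=(170, 120, 60), A=1.0):
    # t runs from 0 (pith) to age_years (bark); one |cos(pi*t)| bump per year
    t = np.linspace(0.0, age_years, int(age_years * samples_per_year))
    # randomness in the ring period could be added to t here
    intensity = A * np.abs(np.cos(np.pi * t))
    # take the base color R, G, B and multiply it by the intensity for each t
    texture = np.outer(intensity, base_color).clip(0, 255).astype(np.uint8)
    return texture        # shape (N, 3): one RGB texel per radial sample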
[Notes]
You can also do this incrementally, drawing just one ring per iteration. The next ring polygon is the last one enlarged or scaled by a factor scale > 1, plus some randomness, but this needs to be rendered as a QUAD STRIP. You can have a static texture for a single ring and interpolate just the density and overall brightness:
radius(i)=radius(i-1)+ring_width=radius(i-1)*scale
so:
scale=(radius(i-1)+ring_width)/radius(i-1)

Fast vertex computation for near skeletal animation

I am building a 3D flower mesh through a series of extrusions (working in a Unity 4 environment). For each flower there are splines that define each branch, petal, leaf, etc. Around each spline there is a shape that is extruded along that spline, and a thickness curve that defines, at each point on the spline, how thick the shape should be.
I am trying to animate this mesh using parameters (i.e. look towards the sun, bend with the wind).
I have been working on this for nearly two months, and my algorithm boiled down to basically reconstructing the flower every frame by iterating over the shape for each spline point, transforming it into the current space, and calculating new vertex positions from it, using a version of parallel transport frames.
for each vertex:
v = p + q * e * w
where
p = point on the path
e = vertex position in the local space of shape being extruded
q = quaternion to transform into the local space of p (oriented by its direction towards the next path point)
w = width at point p
I believe this is as small of a step as it gets to extrude the model. I need to do this once for each vertex in the model basically.
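For reference, a hedged sketch of that per-vertex step applied to a whole cross-section at once with numpy-style vectorization (the use of scipy's Rotation in place of the original Unity quaternion is an assumption); batching per cross-section is roughly where skinning-style approaches save work:

import numpy as np
from scipy.spatial.transform import Rotation

def extrude_section(p, q, e, w):
    # p: (3,) spline point, q: Rotation for that point's frame,
    # e: (k, 3) cross-section vertices in the shape's local space, w: width at p
    # v = p + q * e * w, applied to every vertex of the cross-section at once
    return p + q.apply(e * w)

# usage sketch: one call per spline point instead of one computation per vertex
# vertices = np.vstack([extrude_section(p, q, shape, w)
#                       for p, q, w in zip(points, frames, widths)])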
I hit the point where this is too slow for my current scene. I need to have 5-6 flowers with a total of around 60k vertices, and I concluded that this is the bottleneck.
It is not a direct question, but I'm wondering if someone can elaborate on whether I can borrow some techniques from skeletal animation to speed up the process. From my current perspective, I just don't see how I can avoid at least one calculation per vertex, and in that case I just choose to rebuild the mesh every frame.
