Is there a library that allows to merge bounding boxes with IoU greater than a threshold? - pytorch

I am doing an object detection as a part of my studies, and my supervisor tells me that I should use a method that merges the bounding boxes returned by the net (I use torchvision.models.detection.FasterRCNN), if their IoU is greater than some threshold. They keep calling it nms, but after reading about nms I don't think that nms is supposed to do that.
I already use torchvision.ops.nms, and it removes extra bounding boxes, but I wonder if I'm losing any useful information by discarding them, so it would be great if I could merge some of them.
Does anyone knows a library that provides this method, or a tutorial, that can be helpful?
Thanks in advance.

Related

3D bounding box - is there a simple method for making one?

Is there a simple/quick way to find out the dimensions of a 3D axis aligned bounding box for an item with rotations in the <x,y,z> axes? I know that in math there are different methods/equations to come up with the same answer for a problem and I've previously in the past explained with bare bones information about a method that worked for me, that I probably didn't explain well.
I also know that there are (downloadable)functions available for some languages like java and python to help solve/tackle parts of the math. Though that might not help those that want to understand those different methods better or might not have access to those particular functions for the language they are using.
So if a person has the following information:
The item center point coordinate in the x,y,z
The item x,y,z size
Along with the x,y,z rotation(s)
What would be the easiest approach with a person knowing that information, while either using or not using the item center point, matrix/matrices or cos and sin. Though an approach that would also be compact enough to take up the least amount of memory usage.

Finding all geohashes within two bounding coordinates

I have coordinates, which are assigned a corresponding geohash in my database. Now I want to retrieve all of the coordinates within two bounding coordinates (top right and top left corner). How can I do that properly?
I tried getting the geohash that fits both of those bounding coordinates, but this solution does not work when they are in completely different regions of the world (so they are not sharing anything in common).
Is there a better way to do that?
Thanks for your help
Unfortunately, this isn't something you can do out-of-the-box with datastore / App engine. (There are no built in spatial queries.)
For early prototyping, etc., you can do it the hard way - retrieve all the rows, and discard the ones not meeting your query in code. Obviously, probably not viable with real production data.
See related question Query for Entities Nearby with Geopt for some possible production solutions.

Why is it usually easier to perform selection tests in object space?

I'm taking an introductory graphics course, and while I intuitively understand that converting a click or touch into object coordinates will make the math much cleaner, reduce the chances for human error, and potentially make debugging easier, none of these are actually a very good explanation, conceptually, of why object coordinate spaces are used in selection tests, as opposed to simply using world coordinates for the test - rather, they're just observations of what tends to happen when object coordinates are used. So I ask: why?
A selection test involves comparing the click coordinates, which you get in window coordinates, against lots and lots of object features, which are represented in object coordinates.
You need to transform them into the same coordinate system in order to do the checks, so you can EITHER transform the one simple click point OR you can transform all the various object features.
Transforming one point or line is just a lot easier that transforming a whole bunch of object features of various types.
There are cases where the location of a specific object or point may not be known within a world coordinate system, but is known relative to some other coordinate system.
To summarize an example from my course text, consider the idea of two different towns, one using a grid system for its layout, and the other using what I can only describe as the New England we-made-cow-trails-into-roads method. A government employee is tasked with creating a layout of the area which includes them, and in doing so has to convert the two coordinate systems into a third, which encompasses the other two.
Sometimes, using a world atlas just isn't practical to get across the street, and so something much more local (and relevant) is used instead, as it provides much more detail over a much smaller area.
The text also explains that it may be more than simply impractical to use a given coordinate system - it may yield results that are improbable or just plain wrong. This is evidenced in the evolution of the geocentric and heliocentric models of the universe - the distance of the stars from us was calculated with very different results using the two models.
Thinking of my own example, the best that comes to mind would be something like your own internal organs - from the outside, you don't know for sure exactly the shape, size, and structure of each of them, but your own body does. In order to be able to access that information, you need to look inside the body (ideally in a way that doesn't kill you). It's not something that is plainly observable from outside.

How to check if two pictures "touching" each others?

I'm writing a game in wich the user is having a spaceship and need to "kill" some enemeis that wiil try to kill him back.
I have a "Texture 2d" for the user's spaceship picture,a bullet picture and an enemy picture.
I would like to know,after the user has shoot the bullet to the enemy,how can I check that the bullet has hurts the enemy?
In other words - what function checks that one picture is "covering" (even partial) another one?
Thnx!
:-)
Please have a look into the topic "2D Collision Detection". As you are using XNA the following site should give you a good start:
http://www.progware.org/Blog/post/XNA-2D-Basic-Collision-Detection.aspx
Basically you need to detect when two non-transparent pixels are overlapping, but to prevent unnecessary calculations, you first check if the bounding box for your ship and the enemy ships is even overlapping (since the pixels won't overlap if the bounding boxes don't).
Riemers.net has a good tutorial. Here's a good sample project on per-pixel collision detection from the app hub.
I'm unaware of any pre-existing API functions that do this, but implementing it yourself will be a good exercise.
You should know the x/y coordinates of each of your picture's origins. You should also know the dimensions of each picture.
You can calculate the bounding box of a picture, and whether there exist any points in common.

What is the best approach to compute efficiently the first intersection between a viewing ray and a set of objects?

For instance:
An approach to compute efficiently the first intersection between a viewing ray and a set of three objects: one sphere, one cone and one cylinder (other 3D primitives).
What you're looking for is a spatial partitioning scheme. There are a lot of options for dealing with this, and lots of research spent in this area as well. A good read would be Christer Ericsson's Real-Time Collision Detection.
One easy approach covered in that book would be to define a grid, assign all objects to all cells it intersects, and walk along the grid cells intersecting the line, front to back, intersecting with each object associated with that grid cell. Keep in mind that an object might be associated with more grid-cells, so the intersection point computed might actually not be in the current cell, but actually later on.
The next question would be how you define that grid. Unfortunately, there's no one good answer, and you need to consider what approach might fit your scenario best.
Other partitioning schemes of interest are different tree structures, such as kd-, Oct- and BSP-trees. You could even consider using trees combined with a grid.
EDIT
As pointed out, if your set is actually these three objects, you're definately better of just intersecting each one, and just pick the earliest one. If you're looking for ray-sphere, ray-cylinder, etc, intersection tests, these are not really hard and a quick google should supply all the math you might possibly need. :)
"computationally efficient" depends on how large the set is.
For a trivial set of three, just test each of them in turn, it's really not worth trying to optimise.
For larger sets, look at data structures which divide space (e.g. KD-Trees). Whole chapters (and indeed whole books) are dedicated to this problem. My favourite reference book is An Introduction to Ray Tracing (ed. Andrew. S. Glassner)
Alternatively, if I've misread your question and you're actually asking for algorithms for ray-object intersections for specific types of object, see the same book!
Well, it depends on what you're really trying to do. If you'd like to produce a solution that is correct for almost every pixel in a simple scene, an extremely quick method is to pre-calculate "what's in front" for each pixel by pre-rendering all of the objects with a unique identifying color into a background item buffer using scan conversion (aka the z-buffer). This is sometimes referred to as an item buffer.
Using that pre-computation, you then know what will be visible for almost all rays that you'll be shooting into the scene. As a result, your ray-environment intersection problem is greatly simplified: each ray hits one specific object.
When I was doing this many years ago, I was producing real-time raytraced images of admittedly simple scenes. I haven't revisited that code in quite a while but I suspect that with modern compilers and graphics hardware, performance would be orders of magnitude better than I was seeing then.
PS: I first read about the item buffer idea when I was doing my literature search in the early 90s. I originally found it mentioned in (I believe) an ACM paper from the late 70s. Sadly, I don't have the source reference available but, in short, it's a very old idea and one that works really well on scan conversion hardware.
I assume you have a ray d = (dx,dy,dz), starting at o = (ox,oy,oz) and you are finding the parameter t such that the point of intersection p = o+d*t. (Like this page, which describes ray-plane intersection using P2-P1 for d, P1 for o and u for t)
The first question I would ask is "Do these objects intersect"?
If not then you can cheat a little and check for ray collisions in order. Since you have three objects that may or may not move per frame it pays to pre-calculate their distance from the camera (e.g. from their centre points). Test against each object in turn, by distance from the camera, from smallest to largest. Although the empty space is the most expensive part of the render now, this is more effective than just testing against all three and taking a minimum value. If your image is high res then this is especially efficient since you amortise the cost across the number of pixels.
Otherwise, test against all three and take a minimum value...
In other situations you may want to make a hybrid of the two methods. If you can test two of the objects in order then do so (e.g. a sphere and a cube moving down a cylindrical tunnel), but test the third and take a minimum value to find the final object.

Resources