I'm developing a web component using WebGL2 (and three.js) with
OES_element_index_uint enabled. I'm drawing a geometry using indexed
vertices and I'm seeing the following anomaly -- look for the lines going
across the middle of the figure:
If I redraw the figure I get something slightly different even if I don't change
the inputs (I think).
I've made debug code to double check the buffers I'm passing in to the geometry
and I don't find any problem. I also haven't noticed anything like this using the
same implementation with smaller data sets (yet).
I looked at this webgl2 report site https://alteredqualia.com/tools/webgl-features/
and I notice the lines:
Max elements vertices: 1048575
Max elements indices: 150000
Could this be the cause of the anomaly because my buffers are bigger than that?
Where are these numbers explained?
I'm working on a Mac Laptop with the GPU reported as
Unmasked renderer: AMD Radeon Pro 560X OpenGL Engine
Unmasked vendor: ATI Technologies Inc.
Any clues as to what is going on or explanations about max elements indices/vertices would be much appreciated. Thanks in advance.
I'm rendering in WebGL1.0 and I've found the 64k vertex limitation to be "a thing". I get features like those long triangles like that appearing from nowhere when I hit this "thing".
The way to resolve it is to split your model up into several models so that no sub-model has more than 64k vertices.
I haven't tested the boundary on this myself but by some reports, you should maybe consider 65000 to be the boundary of max vertices per model as you may experience some issues if you use 65535.
Related
I am trying to make a normal template matching search more effizient by first doing the search on downscaled representations of the image. Basically I do a double pyrDown -> quarter resolution.
For most images and templates this works beautifully, but for some others I get really bad matching results. It seems to be especially bad for thin fonts or small contrast.
Look at this example:
And this template:
At 100% resolution I get a matching probability of 99,9%
At 50% resolution I get 90%
At 25% resolution I get 87%
I don't really know why its so bad for some images/templates. I tried to recreate and test in photoshop by hiding/showing the 25% downscaled template on top of the 25% downscaled image, and as you can see, it's not 100% congruent:
https://giphy.com/gifs/coWDjcvHysKgn95IFa
I need a way to get more probability for those matchings at low resolution because it needs to be fast.
Any ideas on how to improve my algorithm?
Here are the original files:
https://www.dropbox.com/s/llbdj9bx5eprxbk/images.zip?dl=0
This is not unusual and those scores seem perfectly fine. However here are some ideas that might help you improve the situation:
You mentioned that it seems to be especially bad for thin fonts. This could be happening because some of the pixels in the lines are being smoothed out or distorted with the Gaussian filter that is applied on pyrDown. It could also be an indication that you have reduced the resolution too much. Unfortunately I think the pyrDown function in OpenCV reduces the resolution by a factor of 2 so it does not give you the ability to fine tune it by other scale factors. Another thing you could try is the instruction resize() with interpolation set to INTER_LINEAR or INTER_CUBIC. The resize() function will allow you to resize the image using any scale factor so you might have more control of performance vs accuracy.
Use multiple templates of the same objects. If you come to a scene and can only achieve an 87% score, create a template out of that scene. Then add it to a database of templates that are to be utilized. Obviously as the amount of templates increases so does the time it takes to complete the search.
The best way to deal with this scenario is to perform an exhaustive match on the highest level of the pyramid then track it down to the lowest level using a reduced search space on lower levels. By exhaustive I mean you will search all rows and all columns across the entire top pyramid level image. You will keep track of the locations (row, col) of the highest matches on the highest level (you are already probably doing that). Then you will multiply those locations by a factor of 2 and perform a restricted search on the next lowest level (ex. 5 x 5 shift centered on the rough location). You keep doing this until you are at the bottom level. This will give you the best overall accuracy and performance. This is also the way most industrial computer vision packages do it.
Question: How do you calculate what the result of trilinear filtering should be?
I am working on getting a video driver working and am running into an issue where I am not passing a test that looks at mipmap levels. A 32x32 texture is created and the mipmap levels are maunally populated as seen below.
texsize_____mip lvl_____R____G_____B
32__________0________ff____00____00
16__________1________00____ff____00
8___________2________00____00____ff
4___________3________ff_____ff____00
2___________4________ff_____00____ff
1___________5________00_____ff____ff
Now I pass my test when using GL_NEAREST_MIPMAP_NEAREST/GL_LINEAR_MIPMAP_NEAREST but when I use GL_NEAREST_MIPMAP_LINEAR/GL_LINEAR_MIPMAP_LINEAR (which from what I can gather both use trilinear filtering) I fail the test which compares my results against an expected of the data above +/- 8 plus an additional small amount of error. When I draw I am drawing using a texture size of 32/16/8/4/2/1 I am getting
texsize_____mip lvl_____R____G_____B
32__________0________ff____00____00
16__________1________01____fe____00
8___________2________00____00____ff
4___________3________f9_____f9____06
2___________4________ff_____0c____f3
1___________5________00_____ff____ff
Where I am failing is at tex size 2 since the error is +/- 12. It also seems like I may be missing somethign since I would expect all of the levels to fail or none of the levels to fail, anyone have any insight into if it is normal for different levels to be filtered differently?
My question is given the results and inputs above what would one expect the results to be? From what I understand if I choose a texture size that correlates to a given level, it should interpolate between that level (lowest) and the level above it (highest). Is there a weighting that also comes into play that I am missing?
Is there any relation (preferably an equation) between the number of polygons in a 3D object and the rendering workload? I want to see how much the rendering workload would be increased if for instance the number of polygons doubles.
There is no clear connection between the arbitrary number of polygons and the mythical "workload".
See the following samples:
You render a cube with 6 faces composed of 12 triangles. You get, say, 1000fps (without vsync). When you tesselate the cube into 120 triangles, most likely the fps counter remains 1000.
You render a single fullscreen-sized quad with a heavy fragment shader with a lot of calculation. You get 0.5fps (or more, but I hope you get the point).
Another extreme. You are rendering a thousand of similar cubes, each with different texture. The rendering state change will take most of the time, not the actual rendering.
So, polygons may have different screen area and they may be rendered not within a single primitive. If you're talking about one big vertex array with a large number of polygons, then for some certain scenarios the performance change must be something like linear. "Something" because the videocard and the drivers are clipping the invisible polys and perfrom the early-out tests for each pixel being rendered.
Could you define 'workload'? – Erno yesterday
Well, I mean working
calculations. I want to see how much overhead (for GPU, CPU,
memory,...) would be increased. Actually I want to conclude the energy
usage of the device – user1196937 2 hours ago
If that is the actual question, a comparison of energy usage:
You will have to pick specific configurations and test those. Energy usage is very different from GPU to GPU and machine to machine.
Some GPU manufactures give very detailed information on the performance of their processors but when you want to compare those you will need an actual machine.
As a brief background, I have been slowly chugging away at the core framework of a game I've been wanting to make for some time now. It has gotten to the point where I want to start really fleshing it out with some graphics assets other than colored boxes. And this brings me to the heart of my question:
What is the best method for creating graphics assets that appear the same quality independent of the device they are drawn on?
My game is styled after Pokemon, so I want to capture the 16-bit feel while still remaining crisp regardless of the device resolution. Does this mean I just create a ton of duplicate sprite sheets? i.e. a 16x16 32x32 48x48 64x64 version of each asset? Or should I be making vector art and rendering it out specifically for each device? Or is there some other alternative I haven't considered?
Thanks!
If by 16-bit feel you mean a classic old-school "pixelated" style (but with crisp edges). Then you can just draw them in the minimal dimension and upscale by whatever factor you need using a Pixel Art Scaling Algorithm, the simplest being nearest neighbour. There are of course many algos that produce much nicer results than NN like the 2xSaI and hqx family of algorithms, and RotSprite if you need rotation.
If you want clean antialiased edges you might want to check out this Microsoft Research paper: Depixelizing Pixel Art
You can then use these algos as a loading pre-pass for your game.
Alternatively, you could shift them "earlier" into your art pipeline to help speed up generation of multiple (resolution/transform) variants, which you could further touch up. This choice largely depends on your level of labor resources and perfectionism. Note also that this loses the "purity" of the solution since it violates DRY because updates will require changes in all variants of a sprite.
I would suggest to first try out some of these upscaling filters and see if you are happy with the results. If you are, you can get away with a loading prepass, which is by far the most desirable outcome because it reduces work and maintenance by a large factor.
I will be start working on a robotics project which involves a mobile robot that has mounted 2 cameras (1.3 MP) fixed at a distance of 0.5m in between.I also have a few ultrasonic sensors, but they have only a 10 metter range and my enviroment is rather large (as an example, take a large warehouse with many pillars, boxes, walls .etc) .My main task is to identify obstacles and also find a roughly "best" route that the robot must take in order to navigate in a "rough" enviroment (the ground floor is not smooth at all). All the image processing is not made on the robot, but on a computer with NVIDIA GT425 2Gb Ram.
My questions are :
Should I mount the cameras on a rotative suport, so that they take pictures on a wider angle?
It is posible creating a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree I can use this for obstacle avoidance and a best route construction?
If a roughly accurate 3D representation of the enviroment can be made, how can it be used as creating a map of the enviroment? (Consider the following example: the robot must sweep an fairly large area and it would be energy efficient if it would not go through the same place (or course) twice;however when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction )
I have found this response on a similar question , but I am still concerned with the accuracy of 3D reconstruction (for example a couple of boxes situated at 100m considering the small resolution and distance between the cameras).
I am just starting gathering information for this project, so if you haved worked on something similar please give me some guidelines (and some links:D) on how should I approach this specific task.
Thanks in advance,
Tamash
If you want to do obstacle avoidance, it is probably easiest to use the ultrasonic sensors. If the robot is moving at speeds suitable for a human environment then their range of 10m gives you ample time to stop the robot. Keep in mind that no system will guarantee that you don't accidentally hit something.
(2) It is posible creating a reasonable 3D reconstruction based on only 2 views at such a small distance in between? If so, to what degree I can use this for obstacle avoidance and a best route construction?
Yes, this is possible. Have a look at ROS and their vSLAM. http://www.ros.org/wiki/vslam and http://www.ros.org/wiki/slam_gmapping would be two of many possible resources.
however when a 3D reconstruction is made from one direction, how can it tell if it has already been there if it comes from the opposite direction
Well, you are trying to find your position given a measurement and a map. That should be possible, and it wouldn't matter from which direction the map was created. However, there is the loop closure problem. Because you are creating a 3D map at the same time as you are trying to find your way around, you don't know whether you are at a new place or at a place you have seen before.
CONCLUSION
This is a difficult task!
Actually, it's more than one. First you have simple obstacle avoidance (i.e. Don't drive into things.). Then you want to do simultaneous localisation and mapping (SLAM, read Wikipedia on that) and finally you want to do path planning (i.e. sweeping the floor without covering area twice).
I hope that helps?
I'd say no if you mean each eye rotating independently. You won't get the accuracy you need to do the stereo correspondence and make calibration a nightmare. But if you want the whole "head" of the robot to pivot, then that may be doable. But you should have some good encoders on the joints.
If you use ROS, there are some tools which help you turn the two stereo images into a 3d point cloud. http://www.ros.org/wiki/stereo_image_proc. There is a tradeoff between your baseline (the distance between the cameras) and your resolution at different ranges. large baseline = greater resolution at large distances, but it also has a large minimum distance. I don't think i would expect more than a few centimeters of accuracy from a static stereo rig. and this accuracy only gets worse when you compound there robot's location uncertainty.
2.5. for mapping and obstacle avoidance the first thing i would try to do is segment out the ground plane. the ground plane goes to mapping, and everything above is an obstacle. check out PCL for some point cloud operating functions: http://pointclouds.org/
if you can't simply put a planar laser on the robot like a SICK or Hokuyo, then i might try to convert the 3d point cloud into a pseudo-laser-scan then use some off the shelf SLAM instead of trying to do visual slam. i think you'll have better results.
Other thoughts:
now that the Microsoft Kinect has been released, it is usually easier (and cheaper) to simply use that to get a 3d point cloud instead of doing actual stereo.
This project sounds a lot like the DARPA LAGR program. (learning applied to ground robots). That program is over, but you may be able to track down papers published from it.