How can I implement Verilog code that evaluates an exponential equation whose numbers must be represented in fixed point?
For example, I have this equation in C++ and wish to convert it to Verilog or VHDL:
double y = 0.1+0.75*(1.0/(1.0+exp((x[i]+40.5)/6.0)));
Here 'y' and 'x' must be fixed-point numbers, and 'x' is also a vector.
I have looked for modules and libraries that support fixed point, but none of them provide an exponential.
Verilog has a real data type that provides simulation-time support for floating-point numbers. It also has an exponentiation operator, e.g., a ** b computes a to the power of b.
However, code written using the real data type is generally not synthesizable. Instead, in real hardware designs, support for fixed- and floating-point numbers is generally achieved by building arithmetic units that implement, e.g., the IEEE floating-point standard.
Most of the time, such a design will require at least a couple of cycles even for basic operations like addition and multiplication. More complex operations like division, sine, cosine, etc., are generally implemented using polynomial approximations.
If you really want to understand how to represent and manipulate fixed point and floating point numbers, you should probably get a textbook for a mathematics course such as Numerical Methods, or an EE course on Computer Arithmetic.
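To make that concrete, below is a minimal bit-accurate C++ model of one common hardware approach to the equation from the question: precompute the sigmoid term into a lookup-table ROM and do the remaining multiply and add in fixed point. The Q8.8 format, the [-64, 0) input range, and the 256-entry table are illustrative assumptions, not requirements:

```cpp
#include <cstdint>
#include <cmath>
#include <cstdio>

// Illustrative signed Q8.8 fixed-point format: a value v is stored as
// the integer round(v * 256).
constexpr int     FRAC_BITS = 8;
constexpr int32_t ONE       = 1 << FRAC_BITS;

int32_t to_fixed(double v)   { return (int32_t)std::lround(v * ONE); }
double  to_double(int32_t f) { return (double)f / ONE; }

// Hardware designs usually replace exp() with a precomputed table (a ROM).
// This table maps x in [-64, 0) to 1/(1+exp((x+40.5)/6)) in steps of 0.25,
// so the ROM index is just the upper bits of the fixed-point input.
constexpr int     TABLE_SIZE = 256;
constexpr int32_t X_MIN_FIX  = -64 * ONE;     // -64.0 in Q8.8
constexpr int     STEP_SHIFT = FRAC_BITS - 2; // step 0.25 = 1 << 6 in Q8.8
int32_t sigmoid_rom[TABLE_SIZE];

void build_rom() {
    for (int i = 0; i < TABLE_SIZE; ++i) {
        double x = -64.0 + 0.25 * i;
        sigmoid_rom[i] = to_fixed(1.0 / (1.0 + std::exp((x + 40.5) / 6.0)));
    }
}

// Fixed-point y = 0.1 + 0.75*sigmoid(x): one ROM read, one multiply (with
// a shift to restore the binary point), one addition.
int32_t eval(int32_t x_fix) {
    int idx = (int)((x_fix - X_MIN_FIX) >> STEP_SHIFT);
    if (idx < 0) idx = 0;
    if (idx >= TABLE_SIZE) idx = TABLE_SIZE - 1;
    // Q8.8 * Q8.8 = Q16.16; shift right by FRAC_BITS to return to Q8.8.
    int64_t prod = (int64_t)to_fixed(0.75) * sigmoid_rom[idx];
    return to_fixed(0.1) + (int32_t)(prod >> FRAC_BITS);
}

int main() {
    build_rom();
    const double xs[] = {-60.0, -40.5, -20.0};
    for (double x : xs) {
        double ref = 0.1 + 0.75 * (1.0 / (1.0 + std::exp((x + 40.5) / 6.0)));
        printf("x=%6.1f  fixed y=%.4f  double y=%.4f\n",
               x, to_double(eval(to_fixed(x))), ref);
    }
}
```

A Verilog version would implement the table as a ROM and the multiply/shift/add with ordinary integer operators; a C model like this lets you choose word widths and table size before committing them to RTL.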
Related
Bfloat16 is a 16-bit floating point format that has the same 8-bit exponent as single precision, but only 7 (plus 1 implied) bits of significand. Surprisingly, this turns out to be adequate precision for many machine learning applications, so a lot of resources are being put into making arithmetic in this format run fast.
Given that, it would seem to make sense to also try to use it for graphics. Using it for RGB components during calculation, for example, would allow a much wider dynamic range of light sources to be rendered, compared to just trying to calculate with 8-bit integers. At the same time, it could potentially be faster than using single precision floating point for RGB components.
Are any existing graphics rendering systems actually using it for such purposes?
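As an aside, the format itself is easy to experiment with in software, since a bfloat16 value is simply the upper 16 bits of an IEEE 754 binary32 value. A minimal C++ sketch of the conversion (round-to-nearest-even; NaN and overflow handling omitted, and the function names are mine):

```cpp
#include <cstdint>
#include <cstring>
#include <cstdio>

// bfloat16 keeps the sign bit, the full 8-bit exponent, and the top 7
// significand bits of an IEEE 754 binary32 value: its upper 16 bits.
uint16_t float_to_bfloat16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);              // bit-level view of the float
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1); // round to nearest even
    return (uint16_t)((bits + rounding) >> 16);
}

float bfloat16_to_float(uint16_t b) {
    uint32_t bits = (uint32_t)b << 16;                // lost low bits become zero
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

int main() {
    const float vals[] = {1.0f, 3.14159265f, 1000.5f, 1e-30f};
    for (float v : vals) {
        printf("%.9g -> %.9g\n", v, bfloat16_to_float(float_to_bfloat16(v)));
    }
}
```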
I'm using Python 3's decimal module. Is the underlying arithmetic done using the processor's floating-point types, or does it use integers? The notion that the results are 'exact' and of arbitrary precision suggests to me that integer maths is used below the surface.
Indeed, it is integer math, not float math. Roughly speaking, every decimal number is stored as two integer parts: a coefficient and an exponent that records where the decimal point goes. Thanks to that, the calculations are done using integer arithmetic and are not rounded, so they stay exact even if you sum a very large value with a very small fraction.
This comes at a price: the number of operations is significantly larger, and such precision is not always necessary. That is why most calculations are done using float arithmetic, which may lose precision when there are many arithmetic operations on floats or large differences between the values (e.g., a ratio of 10^10 or more). There is a separate field of computer science, numerical analysis (or numerical methods), that studies clever ways to get the most of the speed of float calculations while maintaining the highest precision possible.
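A toy C++ model of that idea (Python's decimal module is far more general; the struct and names below are mine, purely to show the mechanism):

```cpp
#include <cstdint>
#include <cstdio>

// Toy decimal: an integer coefficient plus a base-10 exponent, so that
// value = coeff * 10^exp and all arithmetic stays in integers.
struct Dec { int64_t coeff; int exp; };

Dec add(Dec a, Dec b) {
    // Align the exponents by rescaling, then add the coefficients exactly.
    while (a.exp > b.exp) { a.coeff *= 10; --a.exp; }
    while (b.exp > a.exp) { b.coeff *= 10; --b.exp; }
    return { a.coeff + b.coeff, a.exp };
}

int main() {
    Dec a{1, -1}, b{2, -1};                       // 0.1 and 0.2
    Dec c = add(a, b);
    printf("decimal: %lld * 10^%d (exactly 0.3)\n", (long long)c.coeff, c.exp);
    printf("binary float: %.17g\n", 0.1 + 0.2);   // 0.30000000000000004
}
```

The last line shows the rounding that the integer path avoids.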
I'm a bit confused about the local-space coordinate system. Suppose I have a complex object in local space. I know that when I want to put it in world space I have to multiply it with scale, rotate, and translate matrices. But the problem is that local coordinates only range from -1.0f to 1.0f; when I want to have a vertex like (1/500, 1/100, 1/100), things will not work, and everything will become 0 due to the float accuracy problem.
The only solution I can see now is to separate the object into lots of local-space systems and ProjectView each individually to put them together. That doesn't seem like the correct way of solving the problem. I've checked lots of books, but none of them mention this issue. I really want to know how to solve it.
when I want to have a vertex like (1/500, 1/100, 1/100), things will not work
What makes you think that? The float accuracy problem does not mean something will coerce to 0 if it can't be accurately represented. It just means it will be rounded to the floating-point number closest to the intended value.
It's the very same as writing down, e.g., 3/9 with at most 6 significant decimal digits: 0.333333 – it didn't coerce to 0. And the very same goes for floating point.
Now you may be familiar with scientific notation: x·10^y – this is essentially decimal floating point, a mantissa x and an exponent y which specifies the order of magnitude. In binary floating point it becomes x·2^y. In either case the significant digits are in the mantissa. Your typical floating-point number (in OpenGL) has a mantissa of 23 bits plus one implied bit, which boils down to 24 significant binary digits (about 7 decimal digits).
I really want to know how to solve it.
The real trouble with floating-point numbers arises when you have to mix and merge numbers across a large range of orders of magnitude. As long as the numbers are of similar orders of magnitude, everything happens within the mantissa. And that one last change in order of magnitude into the [-1, 1] range will not hurt you; heck, this can be done by "normalizing" the floating-point value and then simply dropping the exponent.
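A quick C++ check of the "rounds to the closest value" claim (FLT_TRUE_MIN needs C11/C++17; printed values are approximate):

```cpp
#include <cstdio>
#include <cfloat>

int main() {
    float v = 1.0f / 500.0f;                 // not exactly representable in binary
    printf("1/500 as float: %.12g\n", v);    // ~0.002, not 0
    printf("relative precision: %g\n", FLT_EPSILON);       // ~1.2e-7
    printf("smallest positive float: %g\n", FLT_TRUE_MIN); // ~1.4e-45
    // Values only collapse to 0 below the denormal range, which is far
    // smaller than any sensible vertex coordinate.
}
```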
Recommended read: http://floating-point-gui.de/
Update
One further thing: If you're writing 1/500 in a language like C, then you're performing an integer division and that will of course round down to 0. If you want this to be a floating point operation you either have to write floating point literals or cast to float, i.e.
1./500.
or
(float)1/(float)500
Note that casting one of the operands to float suffices to make this a floating point division.
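Put together as a compilable snippet, with the output noted in comments:

```cpp
#include <cstdio>

int main() {
    printf("%d\n", 1 / 500);        // integer division: prints 0
    printf("%g\n", 1. / 500.);      // floating-point literals: prints 0.002
    printf("%g\n", (float)1 / 500); // one cast suffices: prints 0.002
}
```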
I have a scenario where I'm trying to execute a complex application on AIX and Linux.
During the execution the code makes use of the intrinsic function sqrt() for computation, but the results obtained differ between the two machines.
Does anyone know the reason for this behavior? Is there any way to overcome it?
P.S.
Some values are equal on both machines, but the majority of them are different.
Processors that follow the IEEE 754 specification must return the exact result for square root (or correctly rounded when exact cannot be represented). For the same input values, floating point format, and rounding mode, different IEEE 754 compliant processors must return an identical result. No variation is allowed. Possible reasons for seeing different results:
One of the processors does not follow the IEEE 754 floating point specification.
The values are really the same, but a print related bug or difference makes them appear different.
The rounding mode or precision control is not set the same on both systems.
One system attempts to follow the IEEE 754 specification but has an imperfection in its square root function.
Did you compare binary output to eliminate the possibility of a print formatting bug or difference?
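For example, a small C++ helper that prints the raw bit pattern of a double; running it on both machines and diffing the output bypasses any printf formatting differences:

```cpp
#include <cstdio>
#include <cstdint>
#include <cstring>
#include <cmath>

// Print a double together with its exact 64-bit pattern.
void dump(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    printf("%.17g = 0x%016llx\n", d, (unsigned long long)bits);
}

int main() {
    dump(std::sqrt(2.0));
    dump(std::sqrt(0.5));
}
```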
Most processors today support IEEE 754 floating point. An example where IEEE 754 accuracy is not guaranteed is with the OpenCL native_sqrt function. OpenCL defines native_sqrt (in addition to IEEE 754 compliant sqrt) so that speed can be traded for accuracy if desired.
Bugs in IEEE 754 sqrt implementations are not too common today. A difficult case for an IEEE 754 sqrt function is when the rounding mode is set to nearest and the actual result is very near the midway point between two floating point representations. A method for generating these difficult square root arguments can be found in a paper by William Kahan, How to Test Whether SQRT is Rounded Correctly.
There may be slight differences in the numeric representation of the hardware on the two computers, or in the algorithm used for the sqrt function by the two compilers. Finite-precision arithmetic is not the same as the arithmetic of real numbers, and slight differences in calculations should be expected. To judge whether the differences are unusual, you should state the numeric type that you are using (as asked by ChuckCottrill) and give examples. What is the relative difference? For values of order unity, 1E-7 is an expected relative difference for single-precision floating point.
Check the floating-point formats available for each CPU. Are you using single-precision or double-precision floating point? You need to use a floating-point format with similar precision on both machines if you want comparable/similar answers.
Floating point is an approximation. A single-precision float uses 1 sign bit, 8 exponent bits, and 23 stored mantissa bits (24 significant bits counting the implied leading bit). This allows for about 7 decimal digits of precision. Double precision uses a 53-bit significand, allowing for about 15-16 decimal digits.
Lacking detail about the binary values of the differing floating-point numbers on the two systems, and about their printed representations, the likely explanation is rounding or representation differences.
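To see how much of a difference the format alone makes, compare single- and double-precision square roots of the same value (a C++ sketch; the printed digits are from an IEEE 754 machine):

```cpp
#include <cstdio>
#include <cmath>

int main() {
    double x = 2.0;
    float  sf = std::sqrt((float)x); // ~7 significant decimal digits
    double sd = std::sqrt(x);        // ~15-16 significant decimal digits
    printf("single: %.17g\n", (double)sf); // 1.4142135381698608
    printf("double: %.17g\n", sd);         // 1.4142135623730951
    // The two agree to only about 7 digits; mixing precisions across
    // machines produces exactly this kind of discrepancy.
}
```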
I need to design a system that calculates correlation in Verilog, and I can only use fixed-point calculations with a limited number of bits. So I need to implement a fixed-point multiplier whose output has fewer bits than the sum of the input widths (the inputs have the same length and the same number of fractional bits).
The point is that I can't just multiply them normally and then reduce the bits. So is there any particular way to do that?
A=B*C works just fine – you have to keep track of where the binary point will be throughout your calculations. But that's just bookkeeping.
If you want the compiler to do the bookkeeping for you, use VHDL and the standard (as of VHDL-2008) fixed-point package, ieee.fixed_pkg.
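As a concrete illustration of that bookkeeping, here is a bit-accurate C++ sketch of a multiplier whose output is narrower than the full product; the Q1.15 format, round-half-up, and saturation are my assumptions, not requirements:

```cpp
#include <cstdint>
#include <cstdio>

// Multiply two Q1.15 values (16 bits, 15 fractional) and return a Q1.15
// result instead of the full Q2.30 product: compute wide, round, shift the
// binary point back, and saturate. This mirrors what the corresponding
// Verilog multiplier plus rounding/saturation logic would do.
int16_t q15_mul(int16_t b, int16_t c) {
    int32_t full = (int32_t)b * (int32_t)c; // full Q2.30 product
    full += 1 << 14;                        // round half up at the dropped bits
    int32_t res = full >> 15;               // move binary point back to Q1.15
    if (res >  32767) res =  32767;         // saturate, e.g. for (-1)*(-1)
    if (res < -32768) res = -32768;
    return (int16_t)res;
}

int main() {
    int16_t half    = 0x4000;               // 0.5 in Q1.15
    int16_t quarter = 0x2000;               // 0.25 in Q1.15
    printf("0.5 * 0.25 = %f\n", q15_mul(half, quarter) / 32768.0); // 0.125
}
```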