Multiplying 23-bit datatypes in a system with no long long

I am trying to implement floating-point operations on a microcontroller, and so far I have had ample success. The problem lies in the way I do multiplication. On my computer the following works fine:
unsigned long long gig, mm1, mm2;
unsigned long m1, m2;
mm1 = f1.float_parts.mantissa;
mm2 = f2.float_parts.mantissa;
m1 = f1.float_parts.mantissa;
m2 = f2.float_parts.mantissa;
gig = mm1 * mm2; // works fine: I get all the bits I need since the operands are long long, but won't work on the MCU
gig = m1 * m2;   // compiles for the MCU, but keeps only the 32 least significant bits of the product
So you can see that my problem is that the microcontroller throws an undefined reference to __muldi3 if I try gig = mm1*mm2 there.
And if I try with the smaller data types, it keeps only the least significant bits, which I don't want: I need the 23 most significant bits of the product.
Does anyone have any ideas as to how I can do this?

Apologies for the short answer; I hope that someone else will take the time to write a fuller explanation, but basically you do exactly as when you multiply two big numbers by hand on paper! It's just that instead of working in base 10, you work in base 256. That is, treat your numbers as byte vectors, and do with each byte what you do to a digit when you "hand multiply".
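For instance, here is a minimal C sketch of the same idea in base 2^16 rather than base 256 (fewer "digits", same method), assuming the target has a native 32-bit unsigned type:

#include <stdint.h>

/* Multiply two 32-bit values and return the full 64-bit product as two
   32-bit halves, using only 32-bit arithmetic.  Schoolbook method in
   base 2^16: split each operand into 16-bit "digits", form the four
   partial products, and add them up with carry, just as on paper. */
static void mul32x32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint32_t p1 = a_lo * b_hi;   /* weight 2^16 */
    uint32_t p2 = a_hi * b_lo;   /* weight 2^16 */
    uint32_t p3 = a_hi * b_hi;   /* weight 2^32 */

    uint32_t mid = p1 + (p0 >> 16);     /* cannot overflow 32 bits */
    uint32_t carry = (mid + p2 < mid);  /* detect carry out of the next add */
    mid += p2;

    *lo = (mid << 16) | (p0 & 0xFFFFu);
    *hi = p3 + (mid >> 16) + (carry << 16);
}

For two 24-bit significands the full product fits in 48 bits, so the bits the question asks for come from hi and the top of lo; for example, bits 47..24 of the combined result are (hi << 8) | (lo >> 24).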

The comments in the FreeBSD implementation of __muldi3() have a good explanation of the required procedure, see muldi3.c. If you want to go straight to the source (always a good idea!), according to the comments this code was based on an algorithm described in Knuth's The Art of Computer Programming vol. 2 (2nd ed), section 4.3.3, p. 278. (N.B. the link is for the 3rd edition.)

Back on the Intel 8088 (the original PC CPU, and the last CPU I wrote assembly code for), when you multiplied two 16-bit numbers (32 bits? whoa) the CPU would return the result as two 16-bit numbers in two different registers: one with the 16 most significant bits and one with the 16 least significant bits.
You should check the hardware capabilities of your microcontroller; maybe it has a similar setup (obviously you'll need to code this in assembly if it does).
Otherwise you'll have to implement multiplication on your own.

Related

Changing sign of a 64-bit floating point number in Verilog?

So I am working with 64-bit floating-point numbers in Verilog for synthesis. Ideally I would like to compute -A*B, where A and B are the two numbers. I have already got A*B working, so is it okay if I just flip the first bit (0 to 1, or 1 to 0) to make the result represent -A*B?
Kind of like:
A[0] = ~A[0];
Thanks in advance for any suggestion.
Yes! That's all there is to it.
Keep in mind that negating 0 will give you -0. (They're different floating-point bit patterns.) Whether this matters to you will depend on your application.
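For reference, a minimal C sketch of the same single-bit flip on a 64-bit IEEE-754 value (the sign is the most significant bit), which also shows the negative-zero case:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Negate a double by toggling the sign bit of its IEEE-754 bit
   pattern: the same single-bit flip described above. */
static double flip_sign(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret without aliasing UB */
    bits ^= 1ULL << 63;              /* sign is the MSB */
    memcpy(&x, &bits, sizeof bits);
    return x;
}

int main(void)
{
    printf("%g\n", flip_sign(3.5));  /* prints -3.5 */
    printf("%g\n", flip_sign(0.0));  /* prints -0   */
    return 0;
}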

Division in Verilog

I am teaching myself Verilog. The book I am following stated in the introductory chapters that to perform division we use the '/' or '%' operator. In later chapters it says that division is too complex to synthesize, and introduces a long algorithm to perform it instead.
So I am confused: can't Verilog handle simple division? Is the / operator useless?
It all depends what type of code you're writing.
If you're writing code that you intend to be synthesised, that you intend to go into an FPGA or ASIC, then you probably don't want to use the division or modulo operators. When you put an arithmetic operator in RTL, the synthesiser instantiates a circuit to do the job: an adder for + and -, a multiplier for *. When you write / you're asking for a divider circuit, but a divider circuit is a very complex thing. It often takes multiple clock cycles and may use lookup tables. It's asking a lot of a synthesis tool to infer what you want when you write a / b.
(Obviously dividing by powers of 2 is simple, but normally you'd use the shift operators.)
If you're writing code that you don't want to be synthesised, code that is part of a test bench for example, then you can use division all you want.
So to answer your question, the / operator isn't useless, but you have to be conscious of where and why you're using it. The same is true of *, but to a lesser degree: multipliers are quite expensive, but most synthesisers are able to infer them.
You have to think in hardware.
When you write a <= b/c you are saying to the synthesis tool "I want a divider that can provide a result every clock cycle and has no intermediate pipeline registers".
If you work out the logic circuit required to do that, it's very complex, especially for higher bit counts. FPGAs generally won't have specialist hardware blocks for division, so it would have to be implemented out of generic logic resources. It's likely to be both big (lots of LUTs) and slow (low fmax).
Some synthesisers may implement it anyway (from a quick search it seems Quartus will); others won't bother because they don't think it's very useful in practice.
If you are dividing by a constant and can live with an approximate result, then you can do tricks with multipliers: take the reciprocal of what you wanted to divide by, multiply it by a power of two, and round to the nearest integer.
Then in your Verilog you can implement your approximate division as a multiply (which is not too expensive on modern FPGAs) followed by a shift (shifting by a fixed number of bits is essentially free in hardware). Make sure you allow enough bits for the intermediate result.
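For instance, here is a minimal C model of that trick for dividing by 10 (the divisor 10, the 2^19 scale, and the 16-bit width are all just illustrative choices): round(2^19 / 10) = 52429, and for every 16-bit input the multiply-and-shift happens to give the exact quotient.

#include <stdint.h>
#include <stdio.h>

/* Divide by the constant 10 via multiply-and-shift:
   x / 10  ~=  (x * round(2^19 / 10)) >> 19  =  (x * 52429) >> 19.
   The intermediate product needs 32 bits. */
static uint16_t div10(uint16_t x)
{
    return (uint16_t)(((uint32_t)x * 52429u) >> 19);
}

int main(void)
{
    /* Exhaustive check that the approximation is exact for 16-bit x. */
    for (uint32_t x = 0; x <= 0xFFFFu; x++)
        if (div10((uint16_t)x) != x / 10)
            printf("mismatch at %u\n", x);
    return 0;
}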
If you need an exact answer, or if you need to divide by something that is not a predefined constant, you will have to decide what kind of divider you want. If your throughput is low, you can use a state-machine-based approach that completes one division every n clock cycles. If your throughput is high and you can afford the device area, a pipelined approach that accepts a division per clock cycle (but takes multiple cycles for the result to flow through) may be more appropriate.
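As an illustration, here is a C model of the bit-serial restoring algorithm such a state-machine divider would typically implement, producing one quotient bit per clock (the 16-bit width is arbitrary, and den is assumed non-zero):

#include <stdint.h>

/* Classic restoring division, one quotient bit per iteration: shift in
   the next dividend bit, subtract the divisor if it fits, and set the
   quotient bit accordingly.  Assumes den != 0. */
static void div16_restoring(uint16_t num, uint16_t den,
                            uint16_t *quot, uint16_t *rem)
{
    uint32_t r = 0;   /* partial remainder, needs one extra bit */
    uint16_t q = 0;

    for (int i = 15; i >= 0; i--) {
        r = (r << 1) | ((num >> i) & 1u);  /* bring down the next bit */
        if (r >= den) {                    /* divisor fits: subtract */
            r -= den;
            q |= (uint16_t)(1u << i);      /* quotient bit = 1 */
        }
    }
    *quot = q;            /* num / den */
    *rem  = (uint16_t)r;  /* num % den */
}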
Often tool vendors will provide pre-made blocks (Altera calls them megafunctions) for this kind of thing. The advantage of these is that the tool vendor has likely optimised them carefully for the device. The downside is they can bring vendor lock-in: if you want to move to a different device vendor you will most likely have to swap out the block, and the block you swap it with may have different characteristics.
"So I'm confused. Can't Verilog handle simple division? Is the / operator useless?"
The Verilog synthesis spec (IEEE 1364.1) actually indicates that all arithmetic operators with integer operands should be supported, but nobody follows this spec. Some synthesis tools can do integer division, but others will reject it (I think XST still does) because combinational division is typically very area-inefficient. Multicycle implementations are the norm, but these cannot be synthesized from '/'.
Division and modulo are never "simple". Avoid them if you can, e.g. through bit masks or shift operations. A variable divisor in particular is really complicated to implement in hardware.
"Verilog the language" handles division and modulo just fine; when you are using a computer to simulate your code, you have full access to all its abilities.
When you are synthesising your code to a particular chip, there are limitations. The limitations tend to be based on what the tool-vendor thinks is "sensible" rather than what is feasible.
In the old days, division by anything other than a power of two was deemed non-sensible for silicon, as it took up a lot of space and ran very slowly. At the moment, some synthesisers will create "divide by a constant" circuits for you.
In future, I see no reason why the synthesiser shouldn't create a divider for you (or make use of one in the DSP blocks of a potential future architecture). Whether it will or not remains to be seen, but witness the progression of multipliers (from "only powers of two" to "one input constant" to "full implementation" in just a few years).
For circuits that only divide by 2, just shift the bits. :)
For anything other than 2, you should always think at the circuit level; Verilog is NOT C or C++.
/ and % are not synthesizable, and even if they become so in new versions, I believe you should keep your own division circuit, because the IP the tools provide will be general (most probably built for floating point, not fixed point).
I bet you have gone through Morris Mano's computer architecture book; in some of the last chapters the whole flow is given along with the algorithm. Go through it, follow it, and make your own.
If your work calls only for logic verification and no real circuit is needed, then sure, go for / and %. No problem, it will work for simulation.
Division using '/' is possible in Verilog, but it is not a synthesizable operator. The same is the case for multiplication using '*'. There are certain algorithms to perform these operations in Verilog, and they are used if the code needs to be synthesizable, i.e. if you require equivalent hardware for it.
I am not aware of any such algorithms for division, but for multiplication I have used Booth's algorithm.
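For reference, a minimal C model of radix-2 Booth multiplication, with 16-bit signed operands chosen purely for illustration; a hardware version would keep the accumulator in a shift register pair, but the recoding decisions are the same:

#include <stdint.h>

/* Radix-2 Booth multiplication: scan multiplier bit pairs
   (b[i], b[i-1]); 01 means "end of a run of 1s" (add the shifted
   multiplicand), 10 means "start of a run" (subtract it). */
static int32_t booth_mul16(int16_t a, int16_t b)
{
    int32_t acc = 0;
    int32_t m = a;   /* multiplicand, sign-extended */
    int prev = 0;    /* implicit bit b[-1] = 0 */

    for (int i = 0; i < 16; i++) {
        int cur = ((uint16_t)b >> i) & 1;
        if (cur == 0 && prev == 1)
            acc += m * ((int32_t)1 << i);   /* +M * 2^i */
        else if (cur == 1 && prev == 0)
            acc -= m * ((int32_t)1 << i);   /* -M * 2^i */
        prev = cur;
    }
    return acc;   /* equals (int32_t)a * b */
}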
If you want synthesizable code, you can use a division IP core, or you can use the right-shift operator for divisions by powers of two, e.g. 64/8 = 8, same as 64 >> 3 = 8.
Division isn't simple in hardware: people have spent a lot of effort just on efficient and fast multipliers, for example, and division is harder still. However, you can divide by 2 easily in hardware by right-shifting one bit.
Actually your point is very valid; I was also confused about this in my initial days of learning HDLs.
When you synthesise a division operator, it consumes a lot of resources on an FPGA, or during logic synthesis for an ASIC. Try the following instead.
You can perform division (and multiplication) by shifting a vector (right = division, left = multiplication), but only by powers of 2.
Example: 0100 = 4
Shift right: 0010 = 2 (which is 4/2)
Shift left: 1000 = 8 (which is 4*2)
We use the >> operator for shift right and << for shift left.
We can also produce variations out of this. For example, multiplication by 3: shift left (a multiply by 2), then add the original value once more.
So if we have 0100 (4 dec), then (0100 << 1) + 0100 = 1100 (12 dec).
Division by a constant that is not a power of two has no equally simple shift-and-subtract shortcut; for that you need the reciprocal-multiply trick described above, or a real divider.
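In C terms, the same shift tricks look like this (a minimal sketch; the unsigned width is arbitrary):

/* Exact scaling by powers of two via shifts, plus the shift-and-add
   decomposition of a small constant multiplier. */
static unsigned halve (unsigned x) { return x >> 1; }        /* x / 2 */
static unsigned twice (unsigned x) { return x << 1; }        /* x * 2 */
static unsigned triple(unsigned x) { return (x << 1) + x; }  /* x * 3 */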
These methods exist because, to be honest, resources in an FPGA are limited, and when it comes to ASICs, your manager tries to kill you for any additional logic. :)
The division operator / is not useless in Verilog/SystemVerilog. In simulation it works as the usual mathematical operator.
Some synthesis tools, such as Xilinx Vivado, will synthesize the division operator as well, because they have a pre-built algorithm for it (though it takes a lot of hardware gates).
In simple words: you can do division in Verilog, but you have to take care which tools and simulators you are using.
Using result <= a/b works perfectly in such tools.
Remember that with the nonblocking <= operator, the right-hand side is evaluated immediately, but the answer is loaded into the "result" register at the next positive clock edge. If you don't want to wait until the next positive clock edge, use result = a/b.
Remember, any arithmetic circuit needs some time to finish the operation, and during this time its outputs toggle through invalid intermediate values (glitches) until the logic settles.
It's like when an A-10 Warthog attacks a tank: it shoots lots of bullets. That's how the divider circuit acts while dividing; it spits out garbage bits, and after a couple of nanoseconds it finishes and returns a stable, good result.
This is why we wait until the next clock edge to load the "result" register: we are protecting it from those intermediate garbage values.
Division is the most complex operation, so it has the longest delay; a 16-bit division might settle in something like 6 nanoseconds, depending on the implementation and the technology.

Scale a 14-bit word to an 8-bit word

I'm working on a project where I sample a signal with an ADC that represents values as 14-bit words. I need to scale the values to 8-bit words. What's a good way to go about this in general? By the way, I'm using an FPGA, so I'd like to do it in "hardware" rather than with a software solution. In case you're wondering, the chain of events is: sample the analog signal, represent the sample value as a 14-bit word, scale the 14-bit word to an 8-bit word, transmit the 8-bit word over UART to the PC's COM1.
I've never done this before. I was assuming you use quantization levels, but I'm not sure what an efficient circuit for this operation would be. Any help would be appreciated.
Thanks
You just need an add and a shift:
val_8 = (val_14 + 32) >> 6;
(The + 32 is necessary to get correct rounding - you can omit it but you will get more truncation noise in your signal if you do.)
I think you just drop the six lowest resolution bits and call it good, right? But I might not fully understand the problem statement.
Paul's algorithm is correct, but you'll need some bounds checking:
assign val_8 = (&val_14[13:5]) ? // make sure the rounded sum won't overflow
               8'hFF :           // saturate to all 1s if it would
               val_14[13:6] + val_14[5];
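For comparison, a minimal C model of the same round-and-saturate step, assuming the 14-bit sample arrives right-justified in a 16-bit variable:

#include <stdint.h>

/* Round 14 bits down to 8: add half an output LSB (32), shift out the
   low 6 bits, and saturate so the rounding bias cannot push the
   result past 0xFF (the same guard as the Verilog above). */
static uint8_t scale14to8(uint16_t val_14)
{
    uint16_t rounded = (uint16_t)((val_14 + 32u) >> 6);
    return (rounded > 0xFFu) ? 0xFFu : (uint8_t)rounded;
}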

Decoding 68k instructions

I'm writing an interpreted 68k emulator as a personal/educational project. Right now I'm trying to develop a simple, general decoding mechanism.
As I understand it, the first two bytes of each instruction are enough to uniquely identify the operation (with two rare exceptions) and the number of words left to be read, if any.
Here is what I would like to accomplish in my decoding phase:
1. read two bytes
2. determine which instruction it is
3. extract the operands
4. pass the opcode and the operands on to the execute phase
I can't just pass the first two bytes into a lookup table like I could with the first few bits in a RISC arch, because operands are "in the way". How can I accomplish part 2 in a general way?
Broadly, my question is: How do I remove the variability of operands from the decoding process?
More background:
Here is a partial table from section 8.2 of the Programmer's Reference Manual:
Table 8.2. Operation Code Map
Bits 15-12 Operation
0000 Bit Manipulation/MOVEP/Immediate
0001 Move Byte
...
1110 Shift/Rotate/Bit Field
1111 Coprocessor Interface...
This made great sense to me, but then I looked at the bit patterns for each instruction and noticed that there isn't a single instruction where bits 15-12 are 0001, 0010, or 0011. There must be some big piece of the picture that I'm missing.
This Decoding Z80 Opcodes site explains decoding explicitly, which is something I haven't found in the 68k programmer's reference manual or by googling.
I've decided to simply create a look-up table with every possible pattern for each instruction. It was my first idea, but I discarded it as "wasteful, inelegant". Now, I'm accepting it as "really fast".
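That approach can be sketched in C roughly as follows (the handler names are hypothetical): the first instruction word is 16 bits, so a 65,536-entry table can map every possible word, operand fields included, straight to its handler, and step 2 collapses to a single indexed load.

#include <stdint.h>

typedef void (*handler_fn)(uint16_t opcode);

static handler_fn decode_table[0x10000];  /* one entry per opcode word */

static void op_illegal(uint16_t op) { (void)op; /* raise illegal instruction */ }
static void op_move_b(uint16_t op)  { (void)op; /* hypothetical MOVE.B handler */ }

static void build_decode_table(void)
{
    for (uint32_t op = 0; op <= 0xFFFFu; op++) {
        decode_table[op] = op_illegal;  /* default */
        if ((op >> 12) == 0x1)          /* bits 15-12 == 0001: Move Byte */
            decode_table[op] = op_move_b;
        /* ... one pattern test per instruction; every opcode word that
           matches, operand fields and all, gets the same handler ... */
    }
}

/* The decode step then reduces to: */
static void decode_and_execute(uint16_t opcode)
{
    decode_table[opcode](opcode);
}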

How to make gcc on SUN calculate floating point the same way as on Linux

I have a project where I have to perform some mathematical calculations with double variables.
The problem is that I get different results on SUN Solaris 9 and on Linux.
There are a lot of ways (explained here and on other forums) to make Linux behave like SUN, but not the other way around.
I cannot touch the Linux code, so it is only SUN I can change.
Is there any way to make SUN behave like Linux?
The code I run (compiled with gcc on both systems):
int hash_func(char *long_id)
{
    double product, lnum, gold;

    lnum = 0.0;
    while (*long_id)
        lnum = lnum * 10.0 + (*long_id++ - '0');
    printf("lnum => %20.20f\n", lnum);
    lnum = lnum * 10.0E-8;
    printf("lnum => %20.20f\n", lnum);
    gold = 0.6125423371582974;
    product = lnum * gold;
    printf("product => %20.20f\n", product);
    ...
}
If the input is 339886769243483, the output on Linux is:
lnum => 339886769243483.00000000000000000000
lnum => 33988676.92434829473495483398
product => 20819503.60015859827399253845
while on SUN it is:
lnum => 339886769243483.00000000000000000000
lnum => 33988676.92434830218553543091
product => 20819503.60015860199928283691
(the second and third values differ in their trailing digits).
Note: the result is not always different; most of the time it is the same. Only about 10 of the 60,000 15-digit inputs show this problem.
Please help!!!
The real answer here is another question: why do you think you need this? There may be a better way to accomplish what you're trying to do that doesn't depend on intricate details of platform floating-point. Having said that...
It's unfortunate that you can't change the Linux code, since it's really the Linux results that are deficient here. The SUN results are as good as they could possibly be: they're correctly rounded; each multiplication gives the unique (in this case) C double that's closest to the result. In contrast, the first Linux multiplication does not give a correctly rounded result.
Your Linux results come from a 32-bit system on x86 hardware, right? The results you show are consistent with, and likely caused by, the phenomenon of 'double rounding': the result of the first multiplication is first rounded to 64-bit precision (the precision used internally by the Intel x87 FPU), and then re-rounded to the usual 53-bit precision of a double. Most of the time (around 1999 times out of 2000 or so on average) this double round has the same effect as a single round to 53-bit precision would have had, but occasionally it can produce a different result, and that's what you're seeing here.
As you say, there are ways to fix the Linux results to match the Solaris ones: one of these is to use appropriate compiler flags to force the use of SSE2 instructions for floating-point operations if possible. The recent 4.5 release of gcc also fixes the difference by means of a new -fexcess-precision flag, though the fix may impact performance when not using SSE2.
[Edit: after several rereads of the gcc manuals, the gcc-patches mailing list thread at http://gcc.gnu.org/ml/gcc-patches/2008-11/msg00105.html, and the related gcc bug report, it's still not clear to me whether use of -fexcess-precision=standard does in fact eliminate double rounding on x87 systems; I think the answer depends on the value of FLT_EVAL_METHOD. I don't have a 32-bit Linux/x86 machine handy to test this on.]
But I don't know how you'd fix the Solaris results to match the Linux ones, and I'm not sure why you'd want to: you'd be making the Solaris results less accurate instead of making the Linux results more accurate.
[Edit: caf has a good suggestion here. On Solaris, try deliberately using long double for intermediate results, then forcing back to double. If done right, this should reproduce the double rounding effect that you're seeing in Linux.]
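A minimal sketch of that suggestion (with a caveat: this reproduces x87-style double rounding only where long double is the 80-bit extended format with a 64-bit significand; on SPARC, long double is quad precision, wide enough to hold the product of two doubles exactly, so matching the x87 there would need an explicit intermediate rounding to 64-bit precision instead):

/* Two roundings instead of one: round the product first to long double
   precision, then again to double, as the x87 does internally. */
double mul_twice_rounded(double a, double b)
{
    volatile long double wide = (long double)a * (long double)b;
    return (double)wide;
}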
See David Monniaux's excellent paper The pitfalls of verifying floating-point computations for a good explanation of double rounding. It's essential reading after the Goldberg article mentioned in an earlier answer.
Two things:
you need to read this: http://docs.sun.com/source/806-3568/ncg_goldberg.html
you need to decide what your requirements are for numerical precision in your application
These values differ by less than one part in 2^52. But a double-precision number has just 52 bits behind the radix point, so they differ by just the last bit. It may be that one machine isn't always rounding correctly, but you're doing two multiplications, so the answer is going to have that much error anyway.
