I know that Verilog has an arithmetic add operator. If I'm building an adder, should I make my own or use that? Which will perform better in my processor?
For simulation, the add operator will behave according to the standard and should be fine to use unless you have a reason to simulate a specific adder implementation.
For synthesis, what you get depends on your synthesis tool and final hardware platform. For example, FPGAs usually have dedicated logic for adds and using the Verilog add operator should take advantage of that automatically.
If you need extreme performance on your hardware, it's possible you could do better by using the available primitives directly. Though add is a very common operation and synthesis should be able to handle it well for most use cases.
Related
Clearly it will depend on the compiler and target — But is there a de facto standard? Do they synthesize as entire ALUs? Or as whatever the minimum adder or comparator would look like?
Another way to ask this question: If I were to have a bunch of logic with math in the verilog might it end up much larger than sticking in a simple cpu and forcing the calculations through that?
But is there a de facto standard? Do they synthesize as entire ALUs? Or as whatever the minimum adder or comparator would look like?
They will synthesize to the smallest block of logic which can still full-fill the operation in the required time.
I have opened up one of my mathematical blocks for you: A bilinear interpolator. This is the structure before it goes into the synthesis tool. At that time it is already a set of dedicated operations. The synthesis tool will then optimize these by e.g. reducing the amount of logic and/or merging functions.
If I were to have a bunch of logic with math in the Verilog might it end up much larger than sticking in a simple cpu and forcing the calculations through that?
Definitely not. You can think yourself through that:
A CPU is build from Verilog code.
That CPU has adders, multipliers etc.
If you would use a CPU instead of each of those, you would get recursion.
What is basic purpose of a layered testbench for verification when we can write all functionality in simple one program block. I know reusability is one purpose but what apart from it makes unique.
It depends on how big is your design. If your design is very small, one program block probably works fine. (BTW, I strongly discourage the use of program block, use a top module instead. There are known issues in thread scheduling with program block in big 3 EDA's simulators)
However if you are design is huge and you have to test it by divide any conquer, then you have to build your testbench in layers that matches the division of labor. If you don't layer your testbench probably, you will not able to reuse your block level code in the system level environment.
As I know, design ware library has multiplier cell library.but I don't know What is the benefit the using DesignWare multiplier Library versus custom multiplier(like booth algorithm)?
Does anyone know what is different and pros &cons each them?
Generally, using an IP from a vendor/device specific tailored library, is beneficial in a sense that it is implemented with a knowledge of the device's specific special function cells, which might contain a dedicated multiplier, divisor, ROM or whatever is not a standard logic cell. So basically, a complex logic could be implemented without "wasting" standard cells thus preserving them for other stuff.
SystemVerilog introduced some very useful constructs to improve coding style. However, as one of my coworkers always says, "You are not writing software, you are describing hardware." With that in mind, what features of the language should be avoided when the end result needs to be synthesized? This paper shows what features are currently synthesizable by the Synopsys tools, but to be safe I think one should only use the features that are synthesizable by all of the major vendors. Also, what constructs will produce strange results in the netlist which will be difficult to follow in an ECO?
In summary: I like compact and easy to maintain code, but not if it causes issues in the back end. What should I avoid?
Edit: In response to the close vote I want to try to make this a bit more specific. This question was inspired by this answer. I am a big fan of using the 'sugar' as Dave calls it to reduce the code complexity, but not if some synthesis tools are going to mangle signal names and make the result difficult to deal with. I am looking for more examples like this.
Theoretically, if you can write software that is synthesized into machine code to run on a piece of hardware, that software can be synthesized into hardware. And conversely, there are hardware constructs in Verilog-1995 that are not considered synthesizable simply because none of the major vendors ever got around to supporting it (e.g. assign/deassign). We still have people using //synopsis translate on/off because it took so long for them to support `ifdef SYNOPSYS.
Most of what I consider to be safe for synthesis in SystemVerilog is what I call syntactic sugar for Verilog. This is just more convenient ways of writing the same Verilog code with a lot less typing. Examples would be:
data types: typedef, struct, enum, int, byte
use of those types as ports, arguments and function return values
assignment operators: ++ -- +=
type casting and bit-streaming
packages
interfaces
port connection shortcuts
defaults for function/tasks/macro arguments, and port connections
Most of the constructs that fall into this category are taken from C and don't really change how the code gets synthesized. It's just more convenient to define and reference signals.
The place it gets difficult to synthesize is where there is dynamically allocated storage. This would be class objects, queues, dynamic arrays, and strings. as well as dynamically created processes with fork/join.
I think some people have a misconception about SystemVerilog thinking it is only for Verification when in fact the first version of the standard was the synthesizable subset, and Intel was one of the first users of it as a language for Design.
SystemVerilog(SV) can be used both as a HDL (Hardware Description Language) and HVL (Hardware Verification Language) and that is why it is often termed an "HDVL".
There are several interesting design constructs in SV which are synthesizable and can be used instead instead of older Verilog constructs, which are helpful in optimizing code and achieving faster results.
enum of SV vs parameter of Verilog while modelling FSM.
Use of logic instead of reg and wire.
Use of always_ff, always_comb, always_latch in place of
single always blocks in Verilog.
Use of the unique and priority statements instead of Verilog's
full and parallel case statements.
Wide range of data types available in SV.
Now what I have discussed above are those constructs of SystemVerilog which are used in RTL design.
But, the constructs which are used in the Verification Environment are non-synthesizable. They are as follows:
Dynamic arrays and associative arrays.
Program Blocks and Clocking blocks.
Mailboxes
Semaphores
Classes and all their related features.
Tasks
Chandle data types.
Queues.
Constrained random features.
Delay, wait, and event control statements.
I am synthesizing some multiplication units in verilog and I was wondering if you generally get better results in terms of area/power savings if you implement your own CSA using booth encoding when multplying or if you just use the * symbol and let the synthesis tool take care of the problem for you?
Thank you!
Generally, I tend to trust the compiler tools I use and don't fret so much about the results as long as they meet my timing and area budgets.
That said, with multipliers that need to run at fast speeds I find I get better results (in DC, at least) if I create a Verilog module containing the multiply (*) and a retiming register or two, and push down into this module to synthesise it before popping up to toplevel synthesis. It seems as if the compiler gets 'distracted' by other timing paths if you try to do everything at once, so making it focus on a multiplier that you know is going to be tricky seems to help.
You have this question tagged with "FPGA." If your target device is an FPGA then it may be advisable to use FPGA's multiplier megafunction (don't remember what Xilinx calls it these days.)
This way, you will be sure that the tool utilizes the whatever internal hardware structure that you intend to use irrespective of synthesizer tool. You will be sure to get an optimum solution that is also predictable from a timing and latency standpoint.
Additionally, you don't have to test it for all the corner cases, especially important if you are doing signed multiplication and what kind of coding guidelines you follow.
I agree with #Marty in that I would use *. I have previously built my own low power adder structures, which then ran in to problems when the design shifted process/had to be run at a higher frequency. Hard coded architectures like this remove quite a bit of portability from the code.
Using the directives is nice in trials to see the different size (area) of architectures, but I leave the decision to the synthesis tool to make the best call based on the timing constraints and available area. I am not sure how power aware the tools are by default. Previously we ended up getting an extra license which added a lot of power aware knowledge to the synthesis.