Clearly it will depend on the compiler and target, but is there a de facto standard? Do they synthesize as entire ALUs, or as whatever the minimum adder or comparator would look like?
Another way to ask this question: if I were to have a bunch of logic with math in the Verilog, might it end up much larger than sticking in a simple CPU and forcing the calculations through that?
But is there a de facto standard? Do they synthesize as entire ALUs? Or as whatever the minimum adder or comparator would look like?
They will synthesize to the smallest block of logic which can still fulfill the operation in the required time.
I have opened up one of my mathematical blocks for you: A bilinear interpolator. This is the structure before it goes into the synthesis tool. At that time it is already a set of dedicated operations. The synthesis tool will then optimize these by e.g. reducing the amount of logic and/or merging functions.
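Not the actual block described above, but a minimal Verilog sketch (module name, signal names and bit widths are all assumptions) of what such a structure of dedicated operations can look like before it reaches the synthesis tool:

module bilinear_sketch (
    input  wire [7:0] p00, p01, p10, p11, // four neighbouring samples
    input  wire [7:0] fx, fy,             // fractional x/y position, Q0.8
    output wire [7:0] pixel               // interpolated result
);
    // Complementary weight per axis (256 - fraction), 9 bits wide
    wire [8:0] ix = 9'd256 - fx;
    wire [8:0] iy = 9'd256 - fy;

    // Four weighted products and three adds: the "dedicated operations"
    // that the synthesis tool is then free to reduce and merge.
    wire [24:0] acc = p00 * ix * iy
                    + p01 * fx * iy
                    + p10 * ix * fy
                    + p11 * fx * fy;

    assign pixel = acc[23:16]; // drop the 16 fractional weight bits
endmodule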
If I were to have a bunch of logic with math in the Verilog might it end up much larger than sticking in a simple CPU and forcing the calculations through that?
Definitely not. You can reason it through yourself:
A CPU is built from Verilog code.
That CPU has adders, multipliers, etc.
If you used a CPU in place of each of those, you would get recursion.
I know that Verilog has an arithmetic add operator. If I'm building an adder, should I make my own or use that? Which will perform better in my processor?
For simulation, the add operator will behave according to the standard and should be fine to use unless you have a reason to simulate a specific adder implementation.
For synthesis, what you get depends on your synthesis tool and final hardware platform. For example, FPGAs usually have dedicated logic for adds and using the Verilog add operator should take advantage of that automatically.
If you need extreme performance on your hardware, it's possible you could do better by using the available primitives directly. Though add is a very common operation and synthesis should be able to handle it well for most use cases.
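As a minimal sketch (module and parameter names are made up), the behavioural form is usually all you need; the tool maps the + onto the target's carry chains or adder primitives by itself:

module adder_plus #(parameter W = 16) (
    input  wire [W-1:0] a, b,
    output wire [W:0]   sum   // one extra bit to keep the carry out
);
    assign sum = a + b;       // synthesis picks the adder structure for the target
endmodule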
A brief idea about application of structural programming technique in the construction of system software?
is it top-down Analysis or modular programming? or something else ?
Assuming you mean 'structured programming', it's a design discipline that arose in the 50s and 60s based on using a limited set of control-flow structures: alternatives (if-then-else), selection (case), repetition (for, while loops), procedures (aka subroutines), and lexically-scoped blocks.
This was contrasted with arbitrary code flow through unconstrained use of 'goto', arbitrary coupling through global variables, etc.
It is closely associated with top-down design, also referred to as stepwise refinement.
These days it's pretty usual for any procedural programming language to supply structured-programming constructs as a normal matter.
I am synthesizing some multiplication units in Verilog and I was wondering if you generally get better results in terms of area/power savings if you implement your own CSA with Booth encoding when multiplying, or if you just use the * symbol and let the synthesis tool take care of the problem for you?
Thank you!
Generally, I tend to trust the compiler tools I use and don't fret so much about the results as long as they meet my timing and area budgets.
That said, with multipliers that need to run at fast speeds I find I get better results (in DC, at least) if I create a Verilog module containing the multiply (*) and a retiming register or two, and push down into this module to synthesise it before popping up to top-level synthesis. It seems as if the compiler gets 'distracted' by other timing paths if you try to do everything at once, so making it focus on a multiplier that you know is going to be tricky seems to help.
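A sketch of that style, with made-up names; the point is simply to keep the * and its retiming registers together in one small module that can be synthesised on its own first:

module mult_retimed #(parameter W = 16) (
    input  wire           clk,
    input  wire [W-1:0]   a, b,
    output reg  [2*W-1:0] p
);
    reg [2*W-1:0] p_raw;
    always @(posedge clk) begin
        p_raw <= a * b;   // combinational multiply into a register
        p     <= p_raw;   // second register available for retiming into the array
    end
endmodule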
You have this question tagged with "FPGA." If your target device is an FPGA then it may be advisable to use the FPGA's multiplier megafunction (I don't remember what Xilinx calls it these days).
This way, you can be sure the tool uses whatever internal hardware structure you intend, irrespective of the synthesis tool, and you will get an optimum solution that is also predictable from a timing and latency standpoint.
Additionally, you don't have to test it for all the corner cases; this is especially important if you are doing signed multiplication, where correctness can depend on the coding guidelines you follow.
I agree with @Marty in that I would use *. I have previously built my own low-power adder structures, which then ran into problems when the design shifted process or had to be run at a higher frequency. Hard-coded architectures like this remove quite a bit of portability from the code.
Using the directives is nice in trials to see the different sizes (areas) of architectures, but I leave the decision to the synthesis tool to make the best call based on the timing constraints and available area. I am not sure how power-aware the tools are by default; previously we ended up getting an extra license which added a lot of power-aware knowledge to the synthesis.
I have read "Nonblocking Assignments in Verilog Synthesis, Coding Styles that Kill!" by Clifford Cummings. He says that the code at the bottom of this question is "guaranteed" to be synthesised into a three flip-flop pipeline, but it is not guaranteed to simulate correctly (example pipeb3, page 10; the "guaranteed" comment is on page 12). The document won a best paper award, so I assume the claim is true. http://www.sunburst-design.com/papers/CummingsSNUG2000SJ_NBA.pdf
My question: How is the correctness of Verilog synthesis defined if not by reference to the simulation semantics? Many thanks.
I suppose the bonus points question is: give the simplest possible Verilog program that has well-defined synthesis semantics and does not have well defined simulation semantics, assuming it is not the code below. Thanks again.
In fact, can someone give me a piece of Verilog that is well defined when both simulated and synthesised, yet the two produce different results?
The code:
module pipeb3 (q3, d, clk);
  output [7:0] q3;
  input  [7:0] d;
  input        clk;
  reg [7:0] q3, q2, q1;
  always @(posedge clk) q1 = d;
  always @(posedge clk) q3 = q2;
  always @(posedge clk) q2 = q1;
endmodule
PS: in case anyone cares, I thought a plausible definition of a correct synthesis tool might be along the lines of "the synthesised hardware will do something that a correct simulator could". But this is inconsistent with the paper.
[I now think the paper is not right. Section 5.2 of the 1364-2001 standard clearly says that the meaning of a Verilog program is defined by the simulation semantics that the standard then proceeds to define (non-determinism and all). There is no mention whatsoever of any "guarantees" that synthesis tools must provide over and above simulators.
There is another standard 1364.1-2002 that describes the synthesisable subset. There is no obvious mention that the semantics of synthesised hardware should somehow differ from simulation. Section 5.2.2 "Modelling edge-sensitive storage devices" says that non-blocking assignments should be used to model flip-flops. In standard-speak that means that the use of anything else is unsupported.
As a final note, the section referred to in the previous paragraph says that blocking assignments can be used to calculate the RHS of the non-blocking assignment. This appears to violate Cummings' recommendation #5.
Cliff Cummings is listed as a member of the working group of the 1364.1-2002 standard. This standard is listed as replaced on the IEEE website but I cannot tell what it was replaced by.]
All -
Time for me to chime in with useful background information and my own opinions.
First - The IEEE-1364.1-2002 Verilog RTL Synthesis Standard was never fully implemented by any vendor, which is why none of us were in any hurry to update the standard or to provide a SystemVerilog version of the synthesis standard. To my knowledge, the standard was not "replaced," and has just expired. To my knowledge, the attributes described in the Standard were never fully implemented by any vendor. The only useful feature in the Standard that I believe was implemented by all vendors was that a vendor is supposed to set the macro `define SYNTHESIS before reading any user code, so that you can now use `ifndef SYNTHESIS - `endif as a generic replacement for the vendor-specific // synopsys translate_off - // synopsys translate_on pragma-comments.
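For illustration, a small made-up example of that guard; the enclosed code is seen by simulators but skipped by any synthesis tool that pre-defines the SYNTHESIS macro:

module sim_only_example (input wire clk);
`ifndef SYNTHESIS
    // Simulation-only code; replaces the old translate_off/translate_on pragmas
    always @(posedge clk)
        $display("%0t: clock edge seen (simulation only)", $time);
`endif
endmodule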
Verilog was invented as a simulation language and was never intended to be a synthesis language. In the late 1980's, Synopsys recognized that engineers really liked this Verilog-simulation language and started to define a subset of the language that they (Synopsys) would recognize and convert through synthesis into hardware. We now refer to this as the RTL synthesis subset, and that subset can grow over time as synthesis tool vendors discover unique and creative ways to convert a new type of description into hardware.
There really is no "correctness of Verilog synthesis defined." Don Mills and I wrote a paper in 1999 entitled, "RTL Coding Styles That Yield Simulation and Synthesis Mismatches," to warn engineers about legal Verilog coding styles that could infer synthesized hardware with different behavior.
http://www.sunburst-design.com/papers/CummingsSNUG1999SJ_SynthMismatch.pdf
Consider this, if synthesized results always matched the behavior of Verilog simulations, there would be no need to run gate simulations. The design, as RTL-simulated, would be correct. Because there is no guaranteed match, engineers run gate-sims to prove that the gate behavior matches the RTL behavior, or they try to run equivalence checking tools to mathematically prove that the pre-synthesis RTL code is equivalent to the post-synthesis gate models, so that gate-sims are not required.
As for the bonus question, this is really hard, because Verilog semantics are rather well defined, even if the definition is that it is a legal race condition.
As far as well-defined code in simulation and synthesis with different results, consider:
module code1c (output reg o, input a, b);
  always
    o = a & b;
endmodule
In simulation, you never get past time 0. Simulation will loop forever because of the missing sensitivity list. Synthesis tools do not even consider the sensitivity list when inferring combinational logic, so you will get a 2-input AND gate and a warning about missing sensitivity list items that could cause a mismatch between pre- and post-synthesis simulations. In Verilog-2001 we added always @* to avoid this common problem, and in SystemVerilog we added always_comb to remove the sensitivity list and inform the synthesis tool of the designer-intended logic.
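For reference, sketches of the repaired forms (module names are made up); either one gets past time 0 in simulation and still synthesizes to the same 2-input AND gate:

module code1c_v2001 (output reg o, input a, b);
    always @* o = a & b;     // Verilog-2001 implicit sensitivity list
endmodule

module code1c_sv (output logic o, input a, b);
    always_comb o = a & b;   // SystemVerilog combinational-intent block
endmodule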
As far as whether the paper should offer guarantees on correct synthesis behavior, it probably should not, but the guarantees described in my paper define what an engineer can expect from a synthesis tool based on experience with multiple synthesis tools.
"As a final note, the section referred to in the previous paragraph says that blocking
assignments can be used to calculate the RHS of the non-blocking assignment. This
appears to violate Cummings' recommendation #5."
You are correct, this does violate coding guideline #5 and in my opinion should not be used.
Coding guideline #5 is frequently violated in VHDL designs because VHDL variables cannot trigger another process. I find the VHDL-camp evenly divided on this issue. Half say that you should not use variable assignments and the other half use variables to improve simulation performance but then are required to mix variable assignments with a final signal assignment to trigger other processes.
If you violate coding guideline #5 and your code is correct, the simulation will work and the synthesis will also work; but if you have any mistakes in your code, designs that violate coding guideline #5 are very difficult to debug, because the waveform display for the combinational piece does not make sense. The output of the combinational logic in a waveform display only updates on a clock edge when reset is not asserted, which is not how real combinational hardware behaves, and this has proven to be a difficult issue when debugging these designs using waveform displays (I did not include this information in the paper).
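A made-up sketch of the style in question, which violates coding guideline #5: a blocking assignment computes an intermediate value and a nonblocking assignment registers it. It simulates and synthesizes correctly when the code is right, but the intermediate value only changes at clock edges in a waveform viewer, which is what makes debugging awkward:

module guideline5_violation (
    output reg [8:0] q,
    input      [7:0] a, b,
    input            clk
);
    reg [8:0] sum;
    always @(posedge clk) begin
        sum = a + b;   // blocking: combinational intermediate value
        q  <= sum;     // nonblocking: registered output
    end
endmodule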
Regards - Cliff Cummings - Verilog & SystemVerilog Guru
I believe the reason that code will synthesize correctly is that, in real silicon, there's no difference between 'blocking' and 'nonblocking' assignments.
Synthesis will read that and create three flip flops chained back to back, as you've described.
This won't be a problem in synthesis (assuming you're not violating flop hold time), because real gates exhibit delays. On the rising edge of clk, it will take several ns for the value d to propagate to q1. By the time d propagates to q1, q1 will have already been sampled by the second flop, and similarly with q2 and q3.
The reason this doesn't work in simulation is because there are no gate delays. On the positive edge of clock, q1 will be instantly replaced with d, possibly before q1 was sampled by the second flop. In a real circuit (with proper setup and hold time), q1 is guaranteed to be sampled on the positive edge of clock before the first flop can change its output value.
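For completeness, a sketch of the conventional nonblocking rewrite (per Cummings' own guidelines), which simulates and synthesizes as the same three-flop pipeline because every right-hand side is sampled before any flop updates:

module pipeb3_nba (output reg [7:0] q3, input [7:0] d, input clk);
    reg [7:0] q1, q2;
    always @(posedge clk) q1 <= d;
    always @(posedge clk) q2 <= q1;
    always @(posedge clk) q3 <= q2;
endmodule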
I know this 3 years old, but your post was just flagged up when someone tried to edit it. Cliff's answer is, of course, comprehensive, but it doesn't really answer your question. The other answer is also plain wrong.
My question: How is the correctness of Verilog synthesis defined if not by reference to the simulation semantics?
You're right, of course. Synthesis is only 'correct' if (a) the result (output) simulates in the same way as the original (input), after possibly making some allowance for timing/etc issues, and/or (b) the synthesiser output can be formally proved to be equivalent to the synthesiser input.
give the simplest possible Verilog program that has well-defined synthesis semantics and does not have well defined simulation semantics
In principle, this shouldn't be possible. The synthesiser vendors tried to define templates that were based on code that had well-defined simulation semantics. However, Verilog was (and is) poorly defined, and NBAs didn't initially exist in the language, so you have oddities like the pipeline example. Best to forget about them.
In fact, can someone give me a piece of Verilog that is well defined when both simulated and synthesised, yet the two produce different results?
The only definition of 'well defined' (as opposed to 'correct') in synthesis is that multiple vendors will produce exactly the same incorrect result. This is pretty unlikely. I guess the classical async reset and async set clocked F/F would be close.