Are there any performance penalties by using nested Conditional Operator?

Are there any performance penalties by using nested Conditional Operator? - verilog

Here is an example that use nested conditional operator to map register address to it's value.
reg [4:0] mux;
reg [1:0] addr;
mux = (addr == 2'b00) ? i0 :
((addr == 2'b01) ? i1 :
((addr == 2'b10) ? i2 :
((addr == 2'b11) ? i3 :
4'bz)));
In my application, there are about one hundred registers, so The nested level is very deep. If the expression is C language executed by CPU, it will be very slow.
How about FPGA?

From my experience, it depends on the synthesizer as well as the options used. It can use your code as a functional guideline to generate equivalent logic. Or it can use it as a structural guideline in which case each ?: conditional operator is mapped to a 2:1 mux. You can do experiments with your synthesizer to figure how how the generate the gate equivalent, and read the synthesis options in the manual.
Generally I use the ?: conditional operator when I intentionally want a 2:1 mux (or tri-state driver). For more complex conditional multiplexing, I prefer using case statements or if-else statements. This strategy usually fulfills timing and area requirements.
With large multiplexing (you mentioned "about one hundred registers"), meeting timing and area requirements can be difficult. Sometimes the synthesizer can handle this, other times it needs more guidance. Synthesizer directives (refer to manual) and splitting the multiplexer into chunks is one way to deal with it. Your FPGA may have a macro module of dedicated logic (RAMs, complex arithmetic logic, etc) you could instantiate to substitute for portions of your code.

In this specific case of your example most synthesizers will interpret this as a series of 2-to-1 multiplexers.
In a more general case such as
output = (one_condition)? a : (another_condition)? b : (other_condition)? c : ...
It will use a multiplexer for each condition extending the asynchronous path. A longer asynchronous path means a longer settling time and slower maximum clock frequency.

Related

Incomplete assignment and latches

When incompletely assigning a value I get a latch. But why did I get a latch in the example below? I think there is no need for the latch of F output because it is defined at all values of SEL.
Verilog code:
always # (ENB or D or A or B pr SEL)
if (ENB)
begin
Q=D;
if (SEL)
F=A;
else
F=B;
end
Inferred logic:

Although it is defined at all values of SEL, it is not defined for all values of ENB. If ENB = 0, your code says that both Q and F should hold the value from the previous cycle. This is also what is inferred in the image you are linking: only update Q and F if ENB = 1.
If you want Q to be a latch and F not, you can do this:
always # (ENB or D or A or B or SEL)
begin
if (ENB)
Q=D;
if (SEL)
F=A;
else
F=B;
end
Edit: additional information
As pointed out in the comments, I only showed how you could realize combinational logic and a latch, without modifying your code too much. There are, however, some things which could be done better. So, a non-TL;DR version:
Although it is possible to put combinational logic and latches in one procedural block, it is better to split them into two blocks. You are designing two different kinds of hardware, so it is also better to seperate them in Verilog.
Use nonblocking assignments instead of blocking assignments when modeling latches. Clifford E. Cummings wrote an excellent paper on the difference between blocking and nonblocking assignments and why it is important to know the difference. I am also going to use this paper as source here: Nonblocking Assignments in Verilog Synthesis, Coding Styles That Kill!
First, it is important to understand what a race condition in Verilog is (Cummings):
A Verilog race condition occurs when two or more statements that are scheduled to execute in the same simulation time-step, would give different results when the order of statement execution is changed, as permitted by the IEEE Verilog Standard.
Simply put: always blocks may be executed in an arbitrary order, which could cause race conditions and thus unexpected behaviour.
To understand how to prevent this, it is important to understand the difference between blocking and nonblocking assignments. When you use a blocking assignment (=), the evaluation of the right-hand side (in your code A, B, and D) and assignment of the left-hand side (in your code Q and F) is done without interruption from any other Verilog statement (i.e., "it happens immediately"). When using a nonblocking assignment (<=), however, the left-hand side is only updated at the end of a timestep.
As you can imagine, the latter assignment type helps to prevent race conditions, because you know for sure at what moment the left-hand side of your assignment will be updated.
After an analysis of the matter, Cummings concludes, i.a., the following:
Guideline #1: When modeling sequential logic, use nonblocking assignments.
Guideline #2: When modeling latches, use nonblocking assignments.
Guideline #3: When modeling combinational logic with an always block, use blocking assignments.
A last point which I want to highlight from the aforementioned paper is the "why". Except from the fact that you are sure the right hardware is infered, it also helps when correlating pre-synthesis simulations with the the behaviour of your actual hardware:
But why? In general, the answer is simulation related. Ignoring the above guidelines [about using blocking or nonblocking assignments on page 2 of the paper] can still infer the correct synthesized logic, but the pre-synthesis simulation might not match the behavior of the synthesized circuit.
This last point is not possible if you want to strictly adhere to Verilog2001, but if you are free to choose your Verilog version, try to use always_combfor combinational logic and always_latch for latches. Both keywords automatically infer the sensitivity list, and it is easier for tools to find out if you actually coded up the logic you intended to design.
Quoting from the SystemVerilog LRM:
The always_latch construct is identical to the always_comb construct except that software tools should perform additional checks and warn if the behavior in an always_latch construct does not represent latched logic, whereas in an always_comb construct, tools should check and warn if the behavior does not represent combinational logic.
With these tips, your logic would look like this:
always_latch
begin
if (ENB)
Q <= D;
end
always_comb
begin
if (SEL)
F = A;
else
F = B;
end

verilog synthesis not converging after 2000 iterations

I have written the below code for a simple multiplication of 2 n-bit numbers(here n=16). It is getting simulated with desired output waveform but the problem is, it is not getting synthesized in vivado 17.2 even though I have written a static 'for loop'(ie., loop iteration is constant). I am getting below mentioned error.
[Synth 8-3380] loop condition does not converge after 2000 iterations
Note: I have written
for(i=i;i<n;i=i+1)
instead of
for(i=0;i<n;i=i+1)
because the latter one was executing once again after i reached n. So that is not a mistake. Kindly someone help. Thank you for your time
//unsigned integer multiplier
module multiplication(product,multiplier,multiplicand,clk,rset);
parameter n = 16;
output reg [(n<<1)-1:0]product;
input [n-1:0]multiplier, multiplicand;
input clk,rset;
reg [n:0]i='d0;
always #( posedge clk or posedge rset)
begin
if (rset) product <= 'd0;
else
begin
for(i=i;i<n;i=i+1)
begin
product =(multiplier[i] == 1'b1)? product + ( multiplicand << i ): product;
$display("product =%d,i=%d",product,i);
end
end
end
endmodule

First of all, it is not a good practice to use for, while kind of loops if you really want to implement your design on FPGA (Vivado is optimized to be used to implement your design on FPGA). Even if you can successfully synthesize your design, you might face with timing problems or unexpected bugs.
I think you can find your answer here.
Edit: I just wanted to inform you that generally controlling timing is very important in HW design, especially when you want to integrate your design with other system, loops can be nightmare for that.

Go back to using your original for (i=0 loop.
Your error is that you assume i=0 because of reg [n:0]i='d0; That is only true the very first time. Thus only once, at the start of the simulation.
because the latter one was executing once again after i reached n.
Yes, the loop will repeat again and again for every clock cycle. That is what #( posedge clk ...) does.
More errors:
You are using blocking assignment in the clock section, use non-blocking:
product <=(multiplier[i] == 1'b1)? product + ( multiplicand << i ): product;
Your product is only correct the first time after a reset (when it starts at zero). The second clock cycle after a reset you do the multiplication again but start with the previous value of product.
Your i is a bit big you use 17 bits to count to 16. Also global loop variables have pitfalls. I suggest you use the system Verilog syntax of: `for (int i=0; ....)

The error message is not about the iterations of your loop, but about the iterations of the internal algorithm of your synthesis program. It tells you that Vivado failed to create a circuit that could actually be implemented on your FPGA of choice at your clock speed of choice and which actually does what you're asking for.
Let me elaborate, after I mention two items of general import: don't use blocking assignments (=) inside always #(posedge clk) blocks. They almost never do what you want. Only non-blocking assignments (<=) should be clocked. Secondly, the correct way to synthesize a for loop is with generate, even though Vivado seems to accept the simple for.
You are building a fairly large block of combinatorial logic here. Remember that when you synthesize combinatorial logic you are asking for a circuit that can evaluate the expression that you've written within a single clock cycle. The actual expression taken into consideration is the one after the for loop has been unrolled. I.e., you are (conditionally) adding a 16bit number to a 32bit number 16 times, each times shifted one bit further to the left. Each of these additions has a carry bit. So each of these additions will actually have to look at all of the upper 16bits of the result of the previous addition, and each will depend on all but the lowest bit of the previous addition. An adder for one bit + carry needs O(5) gates. Each addition is conditional, which adds at least one more gate for each bit. So you are asking for at minimum of 16*16*6 = 1300 interdependent gates that all have to stabilize within a single clock cycle. Vivado is telling you that it cannot fulfill these requirements.
One way of mitigating this problem will be to synthesize for a lower clock frequency, where the gates have more time to stabilize, and you can thus build longer logic chains. Another option would be to pipeline the operation, say by only evaluating what corresponds to four iterations of your loop within a single clock cycle and then building the result over several clock cycles. This will add some bookkeeping logic to your code, but it is inevitable if one wants to evaluate complex expressions with finite resources at high clock frequencies. It would also introduce you to synchronous logic, which you will have to learn anyway if you want to do anything non-trivial with an FPGA. Note that this kind of pipelining wouldn't affect throughput significantly, because your FPGA would then be doing several multiplications in parallel.
You may also be able to rewrite the expression to handle the carry bits and interdependencies in a smarter way that allows Vivado to find its way through the expression (there probably is such a way, I assume it synthesizes if you simply write the multiplication operator?).
Finally, many FPGAs come with dedicated multiplier units because multiplication is a common operation, but implementing it in logic gates wastes a lot of resources. As you found out.

What is the difference between "," and "or" in always #?

Is there a difference between using , and or in always #.
For example:
always #(S or C)
and
always #(S,C)
If there is a difference, what is it?

They're synonymous. There's no difference whatsoever.
Also, consider using always #(*) for combinational blocks; it's supported by all modern synthesis tools, and automatically makes the block sensitive to any signals referenced in the block.
If you're using SystemVerilog, you should consider using always_comb instead.

There's no difference; they are alternative syntaxes for the same behavior. Originally (in the 1980s) Verilog only allowed #(S or C), but many people confused that with #(S || C). That means wait for a change in the result of the logical or expression S || C. Further complicating matters is that VHDL uses or for their logical boolean operator. So #(S, C) was added probably before 1990 to avoid confusion with the logical operator.
Note that for synthesizable combinational logic, you should be using implicit sensitivity with either
alway #(*) // ... Verilog
always_comb // ... SystemVerilog

Verilog Best Practice - Incrementing a variable

I'm by no means a Verilog expert, and I was wondering if someone knew which of these ways to increment a value was better. Sorry if this is too simple a question.
Way A:
In a combinational logic block, probably in a state machine:
//some condition
count_next = count + 1;
And then somewhere in a sequential block:
count <= count_next;
Or Way B:
Combinational block:
//some condition
count_en = 1;
Sequential block:
if (count_en == 1)
count <= count + 1;
I have seen Way A more often. One potential benefit of Way B is that if you are incrementing the same variable in many places in your state machine, perhaps it would use only one adder instead of many; or is that false?
Which method is preferred and why? Do either have a significant drawback?
Thank you.

One potential benefit of Way B is that if you are incrementing the same variable in many places in your state machine, perhaps it would use only one adder instead of many; or is that false?
Any synthesis tool will attempt automatic resource sharing. How well they do so depends on the tool and code written. Here is a document that describes some features of Design Compiler. Notice that in some cases, less area means worse timing.
Which method is preferred and why? Do either have a significant drawback?
It depends. Verilog(for synthesis) is a means to implement some logic circuit but the spec does not specify exactly how this is done. Way A may be the same as Way B on an FPGA but Way A is not consistent with low power design on an ASIC due to the unconditional sequential assignment. Using reset nets is almost a requirement on an ASIC but since many FPGAs start in a known state, you can save quite a bit of resources by not having them.

I use Way A in my Verilog code. My sequential blocks have almost no logic in them; they just assign registers based on the values of the "wire regs" computed in the combinational always blocks. There is just less to go wrong this way. And with Verilog we need all the help we can get.

What is your definition of "better" ?
It can be better performance (faster maximum frequency of the synthesized circuit), smaller area (less logic gates), or faster simulation execution.
Let's consider smaller area case for Xilinx and Altera FPGAs. Registers in those FPGA families have enable input. In your "Way B", *count_en* will be directly mapped into that enable register input, which will result in less logic gates. Essentially, "Way B" provides more "hints" to a synthesis tool how to better synthesize that circuit. Also it's possible that most FPGA synthesis tools (I'm talking about Xilinx XST, Altera MAP, Mentor Precision, and Synopsys Synplify) will correctly infer register enable input from the "Way A".
If *count_en* is synthesized as enable register input, that will result in better performance of the circuit, because your counter increment logic will have less logic levels.
Thanks

Verilog array syntax

I'm new to Verilog, and am having a lot of trouble with it. For example, I want to have an array with eight cells, each of which is 8 bits wide. The following doesn't work:
reg [7:0] transitionTable [0:7];
assign transitionTable[0] = 10;
neither does just doing transitionTable[0] = 10; or transitionTable[0] = 8'h10; Any ideas?
(In case it is not obvious and relevant: I want to make a finite state machine, and specify the state transitions in an array, since that seems easier than a massive case switch.)

When using assign you should declare the array as a wire instead of areg.

Since your goal is to design an FSM, there is no need to store the state values in an array. This is typically done using Verilog parameter's, a state register and a next_state with a case/endcase statement.
The following paper shows a complete example: FSM Fundamentals

If this is targeted towards synthesis:
A little beyond what was answered above, there are standard FSM coding styles that you should adhere to so the tools can perform better optimization. As described in the Cummings paper, one-hot is usually best for FPGA devices and in fact ISE(with default settings) will ignore your encoding and implement whatever it thinks will best utilize the resources on the device. This almost invariably results in a one-hot encoded FSM regardless of the state encoding you chose, provided it recognizes your FSM.

OK, so to answer your question, let's dig a little deeper into Verilog syntax.
First of all, to specify a range of bits, either do [MSB:LSB] or [LSB:MSB]. The standard is MSB:LSB but it is really up to you here, but try to be consistent.
Next, in array instantiation we have:
reg WIDTH reg_name NUMBER;
where WIDTH is the "size" of each element and NUMBER is the number of elements in the array.
So, you first want to do:
reg [7:0] transitionTable [7:0];
Then, to assign particular bytes (8 bits = 1 byte), do:
initial begin
transitionTable[0] = 8'h10;
end
A good book to learn Verilog from is FPGA Prototyping By Verilog Examples by Pong P. Chu.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string