Why is adding one operation causing my number of logic elements to skyrocket? - verilog

I'm designing a 464th-order FIR filter in Verilog for use on the Altera DE0 FPGA. I've got (what I believe to be) a working implementation; however, there's one small issue that has given me quite a headache. The basic operation works like this: a 10-bit number is sent from a microcontroller and stored in datastore. The FPGA then filters the data and turns LED1 on if the data is near 100, and off if it's near 50. LED2 is on when the data is near neither 100 nor 50, or when the filter hasn't filled the buffer yet.
In the specification, the coefficients (which were provided in advance) have been multiplied by 2^15 in order to represent them as integers. Therefore, I need to divide my final output y by 2^15. I have implemented this using a shift, since it should be (?) the most efficient way. However, this single line causes my number of logic elements to jump from ~11,000 without it to over 35,000. The Altera DE0 uses a Cyclone III FPGA, which only has room for about 15k logic elements. I've tried doing it inside both combinational and sequential logic blocks, and both have exactly the same issue.
Why is this single, seemingly simple operation causing such an inflation in logic elements? I'll include my code, which I'm sure isn't the most efficient, nor the cleanest. I don't care about optimizing this design for performance or area/density at all. I just want to be able to fit it onto the FPGA so it'll run. I'm not very experienced in HDL design, and this is by far the most complex project I've needed to tackle. It's worth noting that I do not remove y completely; I replace the "bad" line with assign YY = y;.
Just as a note: I haven't included all of the coefficients, for sanity's sake. I know there might be a better way to do it than using case statements, but it's the way that it came and I don't really want to relocate 464 elements to a parameter declaration, etc.
module lab5 (LED1, LED2, handshake, reset, data_clock, datastore, bit_out, clk);
// NUMBER OF COEFFICIENTS (465)
// (Change this to a small value for initial testing and debugging,
// otherwise it will take ~4 minutes to load your program on the FPGA.)
parameter NUMCOEFFICIENTS = 465;
// DEFINE ALL REGISTERS AND WIRES HERE
reg [11:0] coeffIndex; // Coefficient index of FIR filter
reg signed [16:0] coefficient; // Coefficient of FIR filter for index coeffIndex
reg signed [16:0] out; // Register used for coefficient calculation
reg signed [31:0] y;
wire signed [7:0] YY;
reg [9:0] xn [0:464]; // Integer array for holding x
integer i;
output reg LED1, LED2;
// Added values from part 1
input reset, handshake, clk, data_clock, bit_out;
output reg [9:0] datastore;
integer k;
reg sent;
initial
begin
sent = 0;
i=0;
datastore = 10'b0000000000;
y=0;
LED1 = 0;
LED2 = 0;
for (i=0; i<NUMCOEFFICIENTS; i=i+1)
begin
xn[i] = 0;
end
end
always @(posedge data_clock)
begin
if(handshake)
begin
if(bit_out)
begin
datastore = datastore >> 1;
datastore [9] = 1;
end
else
begin
datastore = datastore >> 1;
datastore [9] = 0;
end
end
end
always @(negedge clk)
begin
if (!handshake )
begin
if(!sent)
begin
y=0;
for (i=NUMCOEFFICIENTS-1; i > 0; i=i-1) //shifts the stored samples
begin
xn[i] = xn[i-1];
end
xn[0] = datastore;
for (i=0; i<NUMCOEFFICIENTS; i=i+1)
begin
// Calculate coefficient based on the coeffIndex value. Note that coeffIndex is a signed value!
// (Note: These don't necessarily have to be blocking statements.)
case ( 464-i )
12'd0: out = 17'd442; // This coefficient should be multiplied with the oldest input value
12'd1: out = -17'd373;
12'd2: out = -17'd169;
...
12'd463: out = -17'd373; //-17'd373
12'd464: out = 17'd442; //17'd442
// This coefficient should be multiplied with the most recent data input
// This should never occur.
default: out = 17'h0000;
endcase
y = y + (out * xn[i]);
end
sent = 1;
end
end
else if (handshake)
begin
sent = 0;
end
end
assign YY = (y>>>15); //THIS IS THE LINE THAT IS CAUSING THE ISSUE!
always @(YY)
begin
LED1 = 0;
LED2 = 1;
if ((YY >= 40) && (YY <= 60))
begin
LED1 <= 0;
LED2 <= 0;
end
if ((YY >= 90) && (YY <= 110))
begin
LED1 <= 1;
LED2 <= 0;
end
end
endmodule

You're almost certainly seeing the effects of synthesis optimisation.
The following line is the only place that uses y:
assign YY = (y>>>15); //THIS IS THE LINE THAT IS CAUSING THE ISSUE!
If you remove this line, all the logic that feeds into y (including out and xn) will be removed. On Altera you want to look carefully through your map report which will contain (buried amongst a million other things) information about all the logic that Quartus has removed and the reason behind it.
Good places to start are the Port Connectivity Checks, which will tell you if any inputs or outputs are stuck high or low or are dangling. Then look through the Registers Removed During Synthesis section and Removed Registers Triggering Further Register Optimizations.
You can try to force Quartus not to remove redundant logic by using the following in your QSF:
set_instance_assignment -name preserve_fanout_free_node on -to reg
set_instance_assignment -name preserve_register on -to foo
In your case however it sounds like the correct solution is to re-factor the code rather than try to preserve redundant logic. I suspect you want to investigate using an embedded RAM to store the coefficients.
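To illustrate the effect described above, here is a minimal sketch of how a synthesizer typically treats logic whose result never reaches an output. The module name dead_logic_demo and the USE_PRODUCT macro are made up for this example and are not part of the question's design.

```verilog
// Hypothetical demo: the multiplier only costs logic if its result is observed.
module dead_logic_demo (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output wire [7:0] q
);
    reg [15:0] product;

    // An 8x8 multiplier feeding 16 registers.
    always @(posedge clk)
        product <= a * b;

`ifdef USE_PRODUCT
    // product reaches an output, so the multiplier and registers must be kept.
    assign q = product[15:8];
`else
    // Nothing reads product, so the tool is free to remove the multiplier
    // and its registers; the reported resource count drops accordingly.
    assign q = 8'd0;
`endif
endmodule
```

Compiling once with USE_PRODUCT defined and once without should show the same kind of jump in resource usage as the question, just on a much smaller scale.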

(In addition to Chiggs' answer, assuming that you are hooking up YY correctly ....)
I would add that you don't need >>>. It would be simpler to write:
assign YY = y[22:15];
And BTW, initial blocks are not reliably honoured by synthesis (some flows ignore them, others use them only for register power-up values). So, you want to move that initialization into the respective always blocks, in an if (reset) or if (handshake) section.
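As a quick sanity check of the slice suggested above, the following simulation-only sketch compares the arithmetic shift with the direct bit select. The testbench name and the test values are assumptions chosen for illustration.

```verilog
// Hypothetical testbench: y >>> 15 truncated to 8 bits versus y[22:15].
module shift_vs_slice_tb;
    reg  signed [31:0] y;
    wire signed [7:0]  yy_shift = y >>> 15;   // arithmetic shift, then truncated to 8 bits
    wire signed [7:0]  yy_slice = y[22:15];   // the same eight bits selected directly

    initial begin
        y = 100 * 32768;                      // 100 scaled by 2^15
        #1 $display("%0d %0d", yy_shift, yy_slice);   // prints "100 100"
        y = -50 * 32768;                      // a negative value scaled by 2^15
        #1 $display("%0d %0d", yy_shift, yy_slice);   // prints "-50 -50"
    end
endmodule
```

Both forms pick bits 22 down to 15 of y, so they produce the same hardware; the slice just makes the intent explicit. Neither form reduces the logic needed to compute y itself.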

Related

How to prevent inferred latch and latch unsafe behavior in Verilog?

I am having trouble with a specific part of my program, here in the always block:
module compare_block (clk, reset_n, result, led);
parameter data_width = 8; //width of data input including sign bit
parameter size = 1024;
input clk, reset_n;
input [(data_width+2):0] result; //from filter -- DOUBLE CHECK WIDTH
logic [(data_width):0] data_from_rom; //precalculated 1 Hz sine wave
logic [10:0] addr_to_rom;
output reg led;
reg [(data_width + 2):0] ans_sig [size-1:0];
integer i, iii, jj, j, ii;
reg ans_sig_done, filt_sig_done, comp_sig_done;
reg [(data_width+2):0] sum;
reg [data_width:0] max_val, error_val;
initial max_val='b000000000;
...
always @* begin
sum = 0;
if (ans_sig_done) begin
for (j=4; j<(size-1); j=j+2) begin
sum = sum + ans_sig[j];
if (ans_sig[j] > max_val) begin
max_val = ans_sig[j];
end else begin
max_val = max_val;
end//else
end //for
end//if
end//always
...
endmodule
Essentially, ans_sig is an array, 1024 elements long, that I want to sum into one number (sum) and eventually (not here) take the average of. While I am traversing the ans_sig array, I also want to identify the maximum value within the array (max_val), which is what the nested if-statement is doing. However, I get the following severe warnings when I'm compiling in Quartus:
"Inferred latch for "max_val[8]" at compare_block.sv"
"13012 Latch compare_block:compare|max_val[8] has unsafe behavior"
"13013 Ports D and ENA on the latch are fed by the same signal compare_block: compare|LessThan473~synth" (for max_val[8])
I get all of these errors for max_val [0] through max_val [8].
This code is a null statement, and it actually signifies a latch rather than eliminating it:
end else begin
max_val = max_val; // <<< null statement
There is not much sense in using such a statement unless you want to show that this has latch behavior.
You initialized max_val only once, in the initial block. Therefore the latch behavior is expected: you keep max_val between multiple invocations of the sum for loop.
If this is not what you want, and you need to recalculate max_val every time, you should initialize it in the always block the same way you do sum.
always @* begin
  sum = 0;
  max_val = 0;
  if (ans_sig_done) begin
    for (j=4; j<(size-1); j=j+2) begin
      sum = sum + ans_sig[j];
      if (ans_sig[j] > max_val) begin
        max_val = ans_sig[j];
      end//if
    end //for
  end//if
end//always
This way you will get rid of the latch.
If this module is for simulation only, perhaps you don't need to care about the warnings (I'm not entirely sure; correct me if I'm wrong). However, if it's for implementation, you'll need to use sequential logic to generate sum and max_val, with ans_sig_done as the enable signal. You have 1024 values of 11 bits each; don't expect to perform such a calculation in zero time.
As for the warnings you got: since the always block is combinational, what do you expect to happen when ans_sig_done is false? Combinational logic with missing branches results in latch behavior.
By the way, sum has the same bit width as each element of the ans_sig array, which can lose data during the calculation, and max_val is even narrower.
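One way to read the sequential suggestion is sketched below: process one array element per clock cycle and widen the accumulator so the sum cannot overflow. This is only a sketch under stated assumptions. The module name, port names, the start pulse, and the assumption that sample is available in the same cycle as addr (an asynchronous read) are all hypothetical, and unlike the question's loop it visits every element rather than every second one starting at index 4.

```verilog
// Hypothetical sequential sum/max: one sample per clock instead of one huge
// combinational loop.
module sum_max_seq #(
    parameter DATA_W = 11,               // width of each sample (assumed)
    parameter SIZE   = 1024,             // number of samples (assumed)
    parameter ADDR_W = 10,               // log2(SIZE)
    parameter SUM_W  = DATA_W + ADDR_W   // wide enough to add SIZE samples without overflow
) (
    input  wire              clk,
    input  wire              reset_n,
    input  wire              start,      // pulse high to begin a pass
    input  wire [DATA_W-1:0] sample,     // ans_sig[addr], supplied by the caller
    output reg  [ADDR_W-1:0] addr,
    output reg  [SUM_W-1:0]  sum,
    output reg  [DATA_W-1:0] max_val,
    output reg               done
);
    reg busy;

    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            addr <= 0; sum <= 0; max_val <= 0;
            busy <= 1'b0; done <= 1'b0;
        end else if (start && !busy) begin
            // Clear the results at the start of each pass, so no value is
            // carried over from a previous pass.
            addr <= 0; sum <= 0; max_val <= 0;
            busy <= 1'b1; done <= 1'b0;
        end else if (busy) begin
            sum <= sum + sample;
            if (sample > max_val)
                max_val <= sample;
            if (addr == SIZE-1) begin
                busy <= 1'b0;
                done <= 1'b1;            // results are valid from the next cycle on
            end else begin
                addr <= addr + 1'b1;
            end
        end
    end
endmodule
```

Because every register is assigned inside a clocked block, no latch can be inferred, and the comparator and adder are each built once rather than several hundred times. With a synchronous RAM the read latency would need an extra pipeline stage, which this sketch does not model.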

Will temp variable in always_comb create latch

I have following code snippet where a temp variable is used to count number of 1s in an array:
// count the number 1s in array
logic [5:0] count_v; //temp
always_comb begin
count_v = arr[0];
if (valid) begin
for (int i=1; i<=31; i++) begin
count_v = arr[i] + count_v;
end
end
final_count = count_v;
end
Will this logic create a latch for count_v ? Is synthesis tool smart enough to properly synthesize this logic? I am struggling to find any coding recommendation for these kind of scenarios.
Another example:
logic temp; // temp variable
always_comb begin
temp = 0;
for (int i=0; i<32; i++) begin
if (i>=start) begin
out_data[temp*8 +: 8] = in_data[i*8 +: 8];
temp = temp + 1'b1;
end
end
end
Any always block that begins with a deterministic initial assignment will not generate a latch, barring a logic loop.
Sorry Eddy Yau, we seem to have some discussions going on regarding your post.
Here is some example code:
module latch_or_not (
input cond,
input [3:0] v_in,
output reg latch,
output reg [2:0] comb1,
output reg [2:0] comb2
);
reg [2:0] temp;
reg [2:0] comb_loop;
// Make a latch
always @( * )
if (cond)
latch = v_in[0];
always @( * )
begin : aw1
integer i;
for (i=0; i<4; i=i+1)
comb_loop = comb_loop + v_in[i];
comb2 = comb_loop;
end
always @( * )
begin : aw2
integer i;
temp = 7;
for (i=0; i<4; i=i+1)
temp = temp - v_in[i];
comb1 = temp;
end
endmodule
This is what came out of it according to the Xilinx Vivado tool after elaboration:
The 'latch' output is obvious. You will also notice that temp is not present in the end result.
The 'comb_loop' is not a latch but even worse: it is a combinatorial loop. The output of the logic goes back to the input. A definite no-no!
General rule: if you read a variable before writing to it, then your code implies memory of some sort. In this case, both the simulator and synthesiser have to implement storage of a previous value, so a synthesiser will give you a register or latch. Both your examples write to the temporary before reading it, so no storage is implied.
Does it synthesise? Try it and see. I've seen lots of this sort of thing in production code, and it works (with the synths I've used), but I don't do it myself. I would try it, see what logic is created, and use that to decide whether you need to think more about it. Counting set bits is easy without a loop, but the count loop will almost certainly work with your synth. The second example may be more problematic.
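Applying the write-before-read rule to the second example from the question, a hedged sketch might look like the following. The module name, the port widths (32 bytes in and out, a 5-bit start), and the 6-bit temp are assumptions; the question declares temp as a single bit, which cannot count to 32.

```verilog
// Hypothetical rewrite of the second example: every output bit and the
// temporary are written before they are read, so no storage is implied.
module pack_from_start (
    input  wire [4:0]   start,     // first byte index to keep (0..31)
    input  wire [255:0] in_data,   // 32 input bytes
    output reg  [255:0] out_data   // kept bytes packed toward bit 0
);
    reg [5:0] temp;                // counts 0..32, so it needs 6 bits
    integer   i;

    always @* begin
        out_data = 256'd0;         // default: every bit is assigned on every pass
        temp     = 6'd0;           // temporary is written before it is read
        for (i = 0; i < 32; i = i + 1) begin
            if (i >= start) begin
                out_data[temp*8 +: 8] = in_data[i*8 +: 8];
                temp = temp + 6'd1;
            end
        end
    end
endmodule
```

Without the default assignment to out_data, the bytes that the loop never writes would have to hold their previous value, which is exactly the storage the rule warns about. The variable index on the left-hand side still expands into a large multiplexer structure, so the resource cost is worth checking.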

How to dynamically reverse the bit position in verilog?

wire [9:0] data_reg;
reg [3:0] Reverse_Count = 8; //This register is derived in logic and I need to use it in following logic in order to reverse the bit position.
assign data_reg[9:0] = 10'h88; // Data Register
genvar i;
for (i=0; i< Reverse_Count; i=i+1)
assign IReg_swiz[i] = IReg[Reverse_Count - 1 -i];
This generates a syntax error. How can I do this in Verilog?
If Reverse_Count were a constant, your task would boil down to just a wire mix-up, which is essentially free in HDL.
In your case, the task can be nicely reduced to first mirroring the full-width data and then shifting right by (10 - Reverse_Count) to put the LSB in its position, which itself is done by just a row of N-to-1 multiplexers.
integer i;
reg [9:0] reversed;
wire [9:0] result;
// mirror bits in wide 10-bit value
always @*
for(i=0;i<10;i=i+1)
reversed[i] = data_reg[9-i];
// settle LSB on its place
assign result = reversed>>(10-Reverse_Count);
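As a worked check of the mirror-then-shift idea, here is a small simulation-only sketch using the values from the question (data 10'h088, Reverse_Count of 8). The testbench wrapper is an assumption added for illustration.

```verilog
// Hypothetical testbench for the mirror-then-shift approach.
module reverse_tb;
    reg  [9:0] data_reg;
    reg  [3:0] Reverse_Count;
    reg  [9:0] reversed;
    wire [9:0] result;
    integer    i;

    // Mirror all 10 bits: bit 9 goes to bit 0, bit 8 to bit 1, and so on.
    always @*
        for (i = 0; i < 10; i = i + 1)
            reversed[i] = data_reg[9-i];

    // Drop the unwanted low bits so the reversed field lands at bit 0.
    assign result = reversed >> (10 - Reverse_Count);

    initial begin
        data_reg      = 10'h088;   // 00_1000_1000
        Reverse_Count = 4'd8;
        // Mirrored value is 00_0100_0100; shifting right by 2 gives 00_0001_0001.
        #1 $display("result = %h", result);   // prints "result = 011"
    end
endmodule
```

The low eight bits 1000_1000 come out as 0001_0001, which is the bit-reversed byte, and the bits above Reverse_Count are zero.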
Reverse_Count is not a constant, i.e. it is not a parameter or localparam.
This means that the generate statement would be creating and destroying hardware as required; this is not allowed in Verilog because it is not possible in hardware.
The bus that you're reversing should have a fixed width at compile time; it should be possible to declare Reverse_Count as a parameter.
Since the value of Reverse_Count is dynamic, you cannot use a generate statement. You can use an always block with a for loop. To be synthesizable, the for loop must be statically unrollable. To decide which bits to reverse, use an if condition that compares the index with Reverse_Count.
Example:
parameter MAX = 10;
reg [MAX-1:0] IReg_swiz;
integer i;
always @* begin
for (i=0; i < MAX ; i=i+1) begin
if (i < Reverse_Count) begin
IReg_swiz[i] = IReg[Reverse_Count - 1 -i];
end
else begin
// All bits need to be assigned or complex latching logic will be inferred.
IReg_swiz[i] = IReg[i]; // Other values okay depending on your requirements.
end
end
end

Verilog Register to output

I am working with an Altera DE2 development board and I want to read an input from the switches. This is stored in registers. Based on a counter, these registers are incremented. The registers are then supposed to be output to the seven-segment displays through a B2D converter. But I cannot pass a register to a function.
wire [26:0] Q,Q2,Q3,Q4;
wire [3:0] one,two,three,four;
reg SecInc,MinInc,HrInc;
reg [3:0] M1,M2,H1,H2;
assign one = SW[3:0];
assign two = SW[7:4];
assign three = SW[11:8];
assign four = SW[15:12];
always begin
M1 = SW[3:0];
M2 = SW[7:4];
H1 = SW[11:8];
H2 = SW[15:12];
end
This is how I get and store the inputs. They come from the switches which we use as a binary representation on Hours and Minutes.
Based on a counter we increment a minute or an hour register.
//increment seconds from 0 to 60
counter seconds (SecInc,KEY[0],Q2);
defparam seconds.n = 8;
defparam seconds.mod = 60;
always @ (negedge CLOCK_50) begin
if (Q2 >= 60) begin
MinInc = 1;
M1 <= M1 + 1'b1;
if(M1 >= 9) begin
M1 <= 0;
M2 <= M2 + 1'b1;
end
end else begin
MinInc = 0;
end
end
We want to display the result on the SSD's.
hex(M1,HEX4);
hex(M2,HEX5);
hex(H1,HEX6);
hex(H2,HEX7);
Herein lies the problem. This is not allowed in Verilog. I need a way to send my registers to a function which displays numbers from 0 to 9 using some B2D conversion.
I will say I have never had a formal intro to verilog before and I have tried all I can think to do. I even tried to make a new module in which I would pass one,two,three,four and have the module increment them, like it does with Q2 for the counter I have shown. Any suggestions or help is greatly appreciated!
As requested here is the hex module:
module hex(BIN, SSD);
input [15:0] BIN;
output reg [0:6] SSD;
always begin
case(BIN)
0:SSD=7'b0000001;
1:SSD=7'b1001111;
2:SSD=7'b0010010;
3:SSD=7'b0000110;
4:SSD=7'b1001100;
5:SSD=7'b0100100;
6:SSD=7'b0100000;
7:SSD=7'b0001111;
8:SSD=7'b0000000;
9:SSD=7'b0001100;
endcase
end
endmodule
Thank you in advance!
Your hex module is not a function, it is a module and therefore must be instantiated with an instance name like this:
hex digit0(.BIN(M1), .SSD(HEX4));
hex digit1(.BIN(M2), .SSD(HEX5));
hex digit2(.BIN(H1), .SSD(HEX6));
hex digit3(.BIN(H2), .SSD(HEX7));
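For reference, a combinational version of the decoder could be sketched as follows. The module name hex_digit, the 4-bit input width, and the blank default pattern are assumptions; the segment patterns are copied from the question and appear to be active-low.

```verilog
// Hypothetical decoder: 4-bit digit in, seven-segment pattern out.
module hex_digit (
    input  wire [3:0] BIN,   // one decimal digit, 0..9 (4 bits assumed)
    output reg  [0:6] SSD    // segment patterns as in the question
);
    // always @* re-evaluates whenever BIN changes.
    always @* begin
        case (BIN)
            4'd0: SSD = 7'b0000001;
            4'd1: SSD = 7'b1001111;
            4'd2: SSD = 7'b0010010;
            4'd3: SSD = 7'b0000110;
            4'd4: SSD = 7'b1001100;
            4'd5: SSD = 7'b0100100;
            4'd6: SSD = 7'b0100000;
            4'd7: SSD = 7'b0001111;
            4'd8: SSD = 7'b0000000;
            4'd9: SSD = 7'b0001100;
            // Covers 10..15; without a default branch a latch would be inferred.
            default: SSD = 7'b1111111;
        endcase
    end
endmodule
```

It would be instantiated the same way as above, for example hex_digit digit0(.BIN(M1), .SSD(HEX4));, with the port widths now matching the 4-bit registers.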
In addition to nguthrie being correct that you need to instantiate your hex converter as a module, you drive M1 from a race condition in your always block. Non-blocking assignments evaluate simultaneously (or essentially simultaneously) within a block. This is not a program, where things happen in order. What might work better is:
always @ (negedge CLOCK_50) begin
if (Q2 >= 60) begin
MinInc = 1;
if (M1 < 9) begin
M1 <= M1 + 1'b1;
end else begin
M1 <= 0;
M2 <= M2 + 1'b1;
end
end else begin
MinInc = 0;
end
end
You will also potentially get unexpected results from your blocking assignments to MinInc, but since I don't see where this is read it's hard to know what will happen.
Read up on blocking (=) vs non-blocking (<=) assignments in Verilog. It's one of the trickiest concepts of the language, and misuse of the two operations is the cause of 90% of the most dastardly bugs I've ever seen.
EDIT: On re-reading your question, it seems that you're trying to drive M1-4 from at least three places. You really can't have a continuous always begin block and a clocked (always @ (negedge clock) begin) block driving the same register. This will send your compiler into a tantrum.
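To make the blocking versus non-blocking distinction concrete, here is a small simulation-only sketch. The module and signal names are made up for illustration and do not come from the question.

```verilog
// Hypothetical testbench: non-blocking assignments sample all right-hand
// sides first and update afterwards, so two registers can swap values.
module nb_swap_tb;
    reg       clk = 0;
    reg [3:0] a = 4'd1;
    reg [3:0] b = 4'd2;

    always @(posedge clk) begin
        a <= b;   // uses the old value of b
        b <= a;   // uses the old value of a
    end

    initial begin
        #1 clk = 1;                            // one rising edge
        #1 $display("a=%0d b=%0d", a, b);      // prints "a=2 b=1"
    end
endmodule
```

If the two assignments used = instead, a would be overwritten first and b would then copy the new a, leaving both registers at 2.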

24 bit counter state machine

I am trying to create a counter in verilog which counts how many clock cycles there have been and after ten million it will reset and start again.
I have created a twenty four bit adder module along with another module containing twenty four D Flip flops to store the count of the cycles outputted from the adder.
I then want to have a state machine which is in the count state until ten million cycles have passed then it goes to a reset state.
Does this sound right? The problem is I am not sure how to implement the state machine.
Can anyone point me to a website/book which could help me with this?
thanks
As Paul S already mentioned, there is no need for a state machine if you want your counter to keep counting after an overflow. You can do something like this (untested, might contain typos):
module overflow_counter (
clk,
reset,
enable,
ctr_out
);
// Port definitions
input clk, reset, enable;
output [23:0] ctr_out;
// Register definitions
reg [23:0] reg_ctr;
// Assignments
assign ctr_out = reg_ctr;
// Counter behaviour - Asynchronous active-high reset
initial reg_ctr <= 0;
always @ (posedge clk or posedge reset)
begin
if (reset) reg_ctr <= 0;
else if (enable)
begin
if (reg_ctr == 10000000) reg_ctr <= 0;
else reg_ctr <= reg_ctr + 1;
end
end
endmodule
Of course, normally you'd use parameters so you don't have to make a custom module every time you want an overflowing counter. I'll leave that to you ;).
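A parameterized variant might look like the sketch below. The module name mod_counter and the parameter names WIDTH and MODULUS are assumptions. Note one deliberate difference: this version wraps at MODULUS - 1 so that one full period lasts exactly MODULUS clock cycles, whereas the code above compares against 10000000 and therefore takes one cycle more.

```verilog
// Hypothetical parameterized wrap-around counter.
module mod_counter #(
    parameter WIDTH   = 24,
    parameter MODULUS = 10000000    // counter runs 0 .. MODULUS-1
) (
    input  wire             clk,
    input  wire             reset,  // asynchronous, active high
    input  wire             enable,
    output reg  [WIDTH-1:0] ctr_out
);
    always @(posedge clk or posedge reset) begin
        if (reset)
            ctr_out <= {WIDTH{1'b0}};
        else if (enable) begin
            if (ctr_out == MODULUS - 1)
                ctr_out <= {WIDTH{1'b0}};   // wrap back to zero
            else
                ctr_out <= ctr_out + 1'b1;
        end
    end
endmodule
```

A different range can then be selected at instantiation, for example mod_counter #(.WIDTH(6), .MODULUS(60)) seconds (.clk(clk), .reset(reset), .enable(enable), .ctr_out(sec));, where the connected signal names are again placeholders.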
[Edit] And here are some documents to help you with FSM. I just searched Google for "verilog state machine":
EECS150: Finite State Machines in Verilog
Synthesizable Finite State Machine Design Techniques
I haven't read the first paper, so I can't comment on that. The 2nd one shows various styles of coding FSMs, among which the 3 always blocks style, which I highly recommend, because it's a lot easier to debug (state transitions and FSM output are neatly separated). The link seems to be down, so here is the cached Google result.
You don't need a state machine. You already have state in the counter. All you need to do is detect the value you want to wrap at and load 0 into your counter at that point.
In pseudo-code:
if count == 10000000 then
nextCount = 0;
else
nextCount = count + 1;
...or...
nextCount = count + 1;
if count == 10000000 then
resetCount = 1;
State machines are not too tricky. Use localparam (with a width, don't forget the width, not shown here because it is just one bit) to define labels for your states. Then create two reg variables (state_reg, state_next). The _reg variable is your actual register. The _next variable is a "wire reg" (a wire that can be assigned to inside a combinational always block). The two things to remember are to do X_next = X_reg; in the combinational always block (and then the rest of the combinational logic) and X_reg <= X_next; in the sequential always block. You can get fancy for special cases but if you stick to these simple rules then things should just work. I try not to use instantiation for very simple things like adders since Verilog has great support for adders.
Since I work with FPGAs, I assign initial values to my registers and I don't use a reset signal. I'm not sure but for ASIC design I think it is the opposite.
localparam STATE_RESET = 1'b0, STATE_COUNT = 1'b1;
reg [23:0] cntr_reg = 24'd0, cntr_next;
reg state_reg = STATE_COUNT, state_next;
always @* begin
cntr_next = cntr_reg; // statement not required since we handle all cases
if (cntr_reg == 24'd10_000_000)
cntr_next = 24'd0;
else
cntr_next = cntr_reg + 24'd1;
state_next = state_reg; // statement required since we don't handle all cases
case (state_reg)
STATE_COUNT: if (cntr_reg == 24'd10_000_000) state_next = STATE_RESET;
endcase
end
always @(posedge clk) begin
cntr_reg <= cntr_next;
state_reg <= state_next;
end
I found this book to be very helpful. There is also a VHDL version of the book, so you can use both side-by-side as a Rosetta Stone to learn VHDL.
