Calculating a parameter in a loop generate block - verilog

I have an array of parameters WIDTHS, and I need to calculate another parameter RIGHT based on some values in WIDTHS in a generate block. Is this possible? If not, is there an alternative way?
Here is an example of what I am trying to do. Suppose we have a predefined register module REG which has inputs d, q, we (write enable), CLK and _RESET. I would like to create a new module called GroupReg, which instantiates N instances of REG. Each instance has a different width (hence the WIDTH parameter array). The d, q, and we of each group are aggregated in arrays with the same name in GroupReg and need to be specified for each instance. Specifying we is easy (we[i]) since it is only one bit. However, specifying d and q with the correct LEFT and RIGHT values is where I have problem with since each group has a different width.
Looks like the only way to assign a value to a parameter is upon its definition, which prevents assigning a value to it in a generate loop.
module GroupReg(d, q, we, CLK, _RESET);
parameter N = 4; //Number of groups
//INDICES has to have N+1 members. Last member should be 0
parameter integer WIDTHS [N:0] = {40, 30, 20, 10, 0};
parameter integer DW_TOTAL = 128;
input logic [DW_TOTAL-1:0] d; // Data Input
input logic [N-1:0] we; // write enable
input logic CLK; // Clock Input
input logic _RESET; // Reset input (active low)
output logic [DW_TOTAL-1:0] q; // Q output
genvar i, j;
for (i=N-1 ; i>=0 ; i--) begin:REGISTERS
localparam WIDTH = WIDTHS[i];
localparam LEFT = RIGHT + WIDTHS[i];;
localparam RIGHT = 0;
for (j = 0 ; j<i ; j++) // <<----- Does not work
RIGHT = RIGHT + WIDTH[j];
REG #(
.DW (WIDTH),
)
reg_i
(
.d(d[LEFT:RIGHT]),
.q(q[LEFT:RIGHT]),
.we(we[i]),
.CLK(CLK),
._RESET(_RESET)
);
end : REGISTERS
endmodule

I tried using the sum() array reduction method on WIDTHS and it worked in Aldec Riviera PRO:
module some_module;
parameter N = 4; //Number of groups
parameter integer WIDTHS [N:0] = '{40, 30, 20, 10, 0};
parameter integer DW_TOTAL = WIDTHS.sum();
initial begin
$display("DW_TOTAL", DW_TOTAL);
end
endmodule
If you're lucky it's going to work in your simulator too.
I anyway don't really get what you're trying to do making N a parameter, seeing as how you're anyway hardcoding a fixed number of values for the widths.

This works in Modelsim:
module some_module;
parameter N = 4; //Number of groups
parameter integer WIDTHS [N:0] = '{40, 30, 20, 10, 0};
genvar i;
for (i=N-1 ; i>=0 ; i--) begin
localparam integer FOO[i:0] = WIDTHS[i:0];
//localparam RIGHT = FOO.sum();
initial begin
foreach (FOO[i])
$display("FOO[%0d] = %h", i, FOO[i]);
end
end
endmodule
The FOO parameter would only store the relevant entries from WIDTH for a specific loop iteration. If sum() would work, you'd be home free. The slicing syntax doesn't work in Riviera, however.
This is a typical example of vendors interpreting the standard differently, basically because it's not specific enough. Still, if you use a simulator from a different EDA company, try combining the two answers; maybe you're lucky and it works.

Related

How to constrain a counter reg size in verilog for ise synthesis?

I want to declare a counter reg in function of some parameters. I did it in this way :
parameter clk_freq = 95000; // clock frequency in kHz
parameter debounce_per_ms = 20;
localparam MAX_COUNT = ((debounce_per_ms * clk_freq)) + 1;
reg [$ln(MAX_COUNT)/$ln(2):0] count;
This work well in simulation with icarus but ISE 14.7 don't want to synthesize it. That give this error:
WARNING:HDLCompiler:1499 - "/src/button_deb.v" Line 4: Empty module <button_deb> remains a black box.
If I define the count like this :
reg [22:0] count;
ISE synthesize it well. If someone have a clue ?
This worked for me, although I'd swear I used functions like $log, $log10, $ceil and the like in the past with no problems.
module param_with_log2 (
input wire clk,
output wire d
);
function integer log2;
input integer value;
begin
value = value-1;
for (log2=0; value>0; log2=log2+1)
value = value>>1;
end
endfunction
parameter clk_freq = 95000; // clock frequency in kHz
parameter debounce_per_ms = 20;
localparam MAX_COUNT = ((debounce_per_ms * clk_freq)) + 1;
localparam integer UPPER = log2(MAX_COUNT);
reg [UPPER:0] count;
always #(posedge clk)
count <= count + 1;
assign d = count[UPPER];
endmodule
XST seems to have a problem with using constant functions: they only can be at the right side of a parameter declaration expression (as I suggested in my first comment). Credits and more information here:
http://www.beyond-circuits.com/wordpress/2008/11/constant-functions/
Notice too that UPPER is declared as localparam integer so we can use it inside a register definition upper bound expression. Credits go to the owner of this post: http://forums.xilinx.com/t5/Synthesis/XST-and-clog2/m-p/244440/highlight/true#M6609
(the module is just a phony module to have something that I can symthesize without the fear that the synthesizer will wipe all my code. It doesn't perform any kind of debouncing)

How to dynamically reverse the bit position in verilog?

wire [9:0] data_reg;
reg [3:0] Reverse_Count = 8; //This register is derived in logic and I need to use it in following logic in order to reverse the bit position.
assign data_reg[9:0] = 10'h88; // Data Register
genvar i;
for (i=0; i< Reverse_Count; i=i+1)
assign IReg_swiz[i] = IReg[Reverse_Count - 1 -i];
This is generating syntax error. May I know how to do this in verilog
If you'd have Reverse_Count as constant, your task boils down to just wire mix-up, which is essentially free in HDL.
In your case, the task can be nicely reduced to first mirroring wide data and then shifting by Reverse_Count to get LBS bit on its position, which itself is done just by a row of N-to-1 multiplexers.
integer i;
reg [9:0] reversed;
wire [9:0] result;
// mirror bits in wide 10-bit value
always #*
for(i=0;i<10;i=i+1)
reversed[i] = data_reg[9-i];
// settle LSB on its place
assign result = reversed>>(10-Reverse_Count);
Reverse_Count is not a constant, ie it is not a parameter or localparam.
This means that the generate statement you would be creating and destroying hardware as required, this is not allowed in verilog as it would not be possible in hardware.
The Bus that your reversing should have a fixed width at compile time, it should be possible to declare Reverse_Count as a parameter.
Since the value of Reverse_Count dunamic, you cannot use a generate statement. You can use an always block with for-loop. To be synthesizable, the for-loop needs able to static unroll. To decide which bits reverse, use an if condition to compare the indexing value and Reverse_Count
Example:
parameter MAX = 10;
reg [MAX-1:0] IReg_swiz;
integer i;
always #* begin
for (i=0; i < MAX ; i=i+1) begin
if (i < Reverse_Count) begin
IReg_swiz[i] = IReg[Reverse_Count - 1 -i];
end
else begin
// All bits need to be assigned or complex latching logic will be inferred.
IReg_swiz[i] = IReg[i]; // Other values okay depending on your requirements.
end
end
end

Setting multiple values in a vector to a single value

Basically, what is the best practice for programmatically specifying fan-out in System Verilog?
module fanout
#(
parameter N = 4
)
(
input i,
output j [N-1:0]
);
always # (*) begin
for (int k = 0; k < N; k++) begin
j[k] = i;
end
end
endmodule
This allows the width of the output vector to be a parameter -- are there any issues with this? Will this synthesize okay?
You can use the replication operator. This allows you to replicate a value a fixed number of times and concatenate them.
Example (Note you need to change the output to a packed data type output [N-1:0] j:
module fanout
#(
parameter N = 4
)
(
input i,
output [N-1:0] j
);
assign j = {N{i}};
endmodule
Runnable example on EDA playground: http://www.edaplayground.com/x/9vn
You can use a default in an assignment pattern
module fanout
#(
parameter N = 4
)
(
input i,
output j[N]
);
assign j = '{default:i};
endmodule

Why is adding one operation causing my number of logic elements to skyrocket?

I'm designing a 464 order FIR filter in Verilog for use on the Altera DE0 FPGA. I've got (what I believe to be) a working implementation; however, there's one small issue that's really actually given me quite a headache. The basic operation works like this: A 10 bit number is sent from a micro controller and stored in datastore. The FPGA then filters the data, and lights LED1 if the data is near 100, and off if it's near 50. LED2 is on when the data is neither 100 nor 50, or the filter hasn't filled the buffer yet.
In the specification, the coefficients (which have been pre provided), have been multiplied by 2^15 in order to represent them as integers. Therefore, I need to divide my final output Y by 2^15. I have implemented this using a shift, since it should be (?) the most efficient way. However, this single line causes my number of logic elements to jump from ~11,000 without it, to over 35,000. The Altera DE0 uses a Cyclone III FPGA which only has room for about 15k logic elements. I've tried doing it inside both combinational and sequential logic blocks, both of which have the same exact issue.
Why is this single, seemingly simple operation causing such an inflation elements? I'll include my code, which I'm sure isn't the most efficient, nor the cleanest. I don't care about optimizing this design for performance or area/density at all. I just want to be able to fit it onto the FPGA so it'll run. I'm not very experienced in HDL design, and this is by far the most complex project I've needed to tackle. It's worth noting that I do not remove y completely, I replace the "bad" line with assign YY = y;.
Just as a note: I haven't included all of the coefficients, for sanity's sake. I know there might be a better way to do it than using case statements, but it's the way that it came and I don't really want to relocate 464 elements to a parameter declaration, etc.
module lab5 (LED1, LED2, handshake, reset, data_clock, datastore, bit_out, clk);
// NUMBER OF COEFFICIENTS (465)
// (Change this to a small value for initial testing and debugging,
// otherwise it will take ~4 minutes to load your program on the FPGA.)
parameter NUMCOEFFICIENTS = 465;
// DEFINE ALL REGISTERS AND WIRES HERE
reg [11:0] coeffIndex; // Coefficient index of FIR filter
reg signed [16:0] coefficient; // Coefficient of FIR filter for index coeffIndex
reg signed [16:0] out; // Register used for coefficient calculation
reg signed [31:0] y;
wire signed [7:0] YY;
reg [9:0] xn [0:464]; // Integer array for holding x
integer i;
output reg LED1, LED2;
// Added values from part 1
input reset, handshake, clk, data_clock, bit_out;
output reg [9:0] datastore;
integer k;
reg sent;
initial
begin
sent = 0;
i=0;
datastore = 10'b0000000000;
y=0;
LED1 = 0;
LED2 = 0;
for (i=0; i<NUMCOEFFICIENTS; i=i+1)
begin
xn[i] = 0;
end
end
always#(posedge data_clock)
begin
if(handshake)
begin
if(bit_out)
begin
datastore = datastore >> 1;
datastore [9] = 1;
end
else
begin
datastore = datastore >> 1;
datastore [9] = 0;
end
end
end
always#(negedge clk)
begin
if (!handshake )
begin
if(!sent)
begin
y=0;
for (i=NUMCOEFFICIENTS-1; i > 0; i=i-1) //shifts coeffecients
begin
xn[i] = xn[i-1];
end
xn[0] = datastore;
for (i=0; i<NUMCOEFFICIENTS; i=i+1)
begin
// Calculate coefficient based on the coeffIndex value. Note that coeffIndex is a signed value!
// (Note: These don't necessarily have to be blocking statements.)
case ( 464-i )
12'd0: out = 17'd442; // This coefficient should be multiplied with the oldest input value
12'd1: out = -17'd373;
12'd2: out = -17'd169;
...
12'd463: out = -17'd373; //-17'd373
12'd464: out = 17'd442; //17'd442
// This coefficient should be multiplied with the most recent data input
// This should never occur.
default: out = 17'h0000;
endcase
y = y + (out * xn[i]);
end
sent = 1;
end
end
else if (handshake)
begin
sent = 0;
end
end
assign YY = (y>>>15); //THIS IS THE LINE THAT IS CAUSING THE ISSUE!
always #(YY)
begin
LED1 = 0;
LED2 = 1;
if ((YY >= 40) && (YY <= 60))
begin
LED1 <= 0;
LED2 <= 0;
end
if ((YY >= 90) && (YY <= 110))
begin
LED1 <= 1;
LED2 <= 0;
end
end
endmodule
You're almost certainly seeing the effects of synthesis optimisation.
The following line is the only place that uses y:
assign YY = (y>>>15); //THIS IS THE LINE THAT IS CAUSING THE ISSUE!
If you remove this line, all the logic that feeds into y (including out and xn) will be removed. On Altera you want to look carefully through your map report which will contain (buried amongst a million other things) information about all the logic that Quartus has removed and the reason behind it.
Good places to start are the Port Connectivity Checks which will tell you if any inputs or outputs are stuck high or low or are dangling. The look through the Registers Removed During Synthesis section and Removed Registers Triggering Further Register Optimizations.
You can try to force Quartus not to remove redundant logic by using the following in your QSF:
set_instance_assignment -name preserve_fanout_free_node on -to reg
set_instance_assignment -name preserve_register on -to foo
In your case however it sounds like the correct solution is to re-factor the code rather than try to preserve redundant logic. I suspect you want to investigate using an embedded RAM to store the coefficients.
(In addition to Chiggs' answer, assuming that you are hooking up YY correctly ....)
I would add that, you don't need >>>. It would be simpler to write :
assign YY = y[22:15];
And BTW, initial blocks are ignored for synthesis. So, you want to move that initialization to the respective always blocks in a if (reset) or if (handshake) section.

How to design a 64 x 64 bit array multiplier in Verilog?

I know how to design a 4x4 array multiplier , but if I follow the same logic , the coding becomes tedious.
4 x 4 - 16 partial products
64 x 64 - 4096 partial products.
Along with 8 full adders and 4 half adders, How many full adders and half adders do I need for 64 x 64 bit. How do I reduce the number of Partial products? Is there any simple way to solve this ?
Whenever tediously coding a repetitive pattern you should use a generate statement instead:
module array_multiplier(a, b, y);
parameter width = 8;
input [width-1:0] a, b;
output [width-1:0] y;
wire [width*width-1:0] partials;
genvar i;
assign partials[width-1 : 0] = a[0] ? b : 0;
generate for (i = 1; i < width; i = i+1) begin:gen
assign partials[width*(i+1)-1 : width*i] = (a[i] ? b << i : 0) +
partials[width*i-1 : width*(i-1)];
end endgenerate
assign y = partials[width*width-1 : width*(width-1)];
endmodule
I've verified this module using the following test-bench:
http://svn.clifford.at/handicraft/2013/array_multiplier/array_multiplier_tb.v
EDIT:
As #Debian has asked for a pipelined version - here it is. This time using a for loop in an always-region for the array part.
module array_multiplier_pipeline(clk, a, b, y);
parameter width = 8;
input clk;
input [width-1:0] a, b;
output [width-1:0] y;
reg [width-1:0] a_pipeline [0:width-2];
reg [width-1:0] b_pipeline [0:width-2];
reg [width-1:0] partials [0:width-1];
integer i;
always #(posedge clk) begin
a_pipeline[0] <= a;
b_pipeline[0] <= b;
for (i = 1; i < width-1; i = i+1) begin
a_pipeline[i] <= a_pipeline[i-1];
b_pipeline[i] <= b_pipeline[i-1];
end
partials[0] <= a[0] ? b : 0;
for (i = 1; i < width; i = i+1)
partials[i] <= (a_pipeline[i-1][i] ? b_pipeline[i-1] << i : 0) +
partials[i-1];
end
assign y = partials[width-1];
endmodule
Note that with many synthesis tools it's also possible to just add (width) register stages after the non-pipelined adder and let the tools register balancing pass do the pipelining.
[how to] reduce the number of partial products?
A method somewhat common used to be modified Booth encoding:
At the cost of more complicated addend selection, it at least almost halves their number.
In its simplest form, considering groups of three adjacent bits (overlapping by one) from one of the operands, say, b, and selecting 0, a, 2a, -2a or -a as an addend.
The code below generates only half of expected the output.
module arr_multi(a, b, y);
parameter w = 8;
input [w-1:0] a, b; // w-width
output [(2*w)-1:0] y; // p-partials
wire [(2*w*w)-1:0] p; //assign width as input bits multiplied by
output bits
genvar i;
assign p[(2*w)-1 : 0] = a[0] ? b : 0; //first output size bits
generate
for (i = 1; i < w; i = i+1)
begin
assign p[(w*(4+(2*(i-1))))-1 : (w*2)*i] = (a[i]?b<<i :0) + p[(w*(4+(2*
(i-2))))-1 :(w*2)*(i-1)];
end
endgenerate
assign y=p[(2*w*w)-1:(2*w)*(w-1)]; //taking last output size bits
endmodule

Resources