How to correctly calculate the frequency of the device in Timing Analyzer, Intel Quartus - verilog

I have 3 modules: modulo remainder generator, modulo adder and modulo Wallace adder. Their speeds are related as follows: remainder_modulo > wallace_adder_modulo > modulo_adder. As far as I understand, Timing Analyzer gives me the maximum frequency of the device, but that's not what I need. I want to know the real time delay so that the speeds correlate the way they should. Which figures do I need to rely on?
module remainder_modulo
#(parameter n)
(
input wire [n-1:0] A,
input wire [n-1:0] P,
output wire [n:0] S,
output Po
);
wire [n:0] A_factor = {A, 1'b0};
wire [n:0] P_extended = {1'b0, P};
wire [n:0] S_temp;
multidigitAdder #(.n(n+1)) multAdd(.A(A_factor), .B(P_extended), .Pi(1'b1), .S(S_temp), .Po(Po));
assign S = Po ? S_temp : A_factor;
endmodule
module adder_modulo
#(parameter n)
(
input wire [n-1:0] A,
input wire [n-1:0] B,
input wire [n-1:0] P,
output wire [n-1:0] S,
output Po
);
wire [n-1:0] S_temp, S_temp_mod;
multidigitAdder #(.n(n)) multAdd1(.A(A), .B(B), .Pi(1'b0), .S(S_temp));
multidigitAdder #(.n(n)) multAdd2(.A(S_temp), .B(P), .Pi(1'b1), .S(S_temp_mod), .Po(Po));
assign S = Po ? S_temp_mod : S_temp;
endmodule
module adder_wallace
#(parameter n)
(
input wire [n-1:0] A,
input wire [n-1:0] B,
input wire [n-1:0] P,
input Pi,
output wire [n-1:0] S,
output Po
);
wire [n-1:0] S_arr, Po_arr;
genvar i;
generate
for (i = 0; i < n; i = i + 1) begin : MEM
bitAdder adder(A[i], B[i], P[i], S_arr[i], Po_arr[i]);
end
endgenerate
wire [n:0] multi_B_arr = {Po_arr, Pi};
wire [n:0] multi_A_arr = {1'b0, S_arr};
multidigitAdder #(.n(n + 1)) mAdder(.A(multi_A_arr), .B(multi_B_arr), .Pi(1'b0), .S(S), .Po(Po));
endmodule
module adder_modulo_wallace
#(parameter n)
(
input wire [n-1:0] A,
input wire [n-1:0] B,
input wire [n-1:0] P,
output wire [n-1:0] S,
output Po
);
wire [n-1:0] simpleSum, wallaceSum;
multidigitAdder #(.n(n)) multAdd1(.A(A), .B(B), .Pi(0), .S(simpleSum));
adder_wallace #(.n(n)) add(.A(A), .B(B), .P(P), .Pi(1), .S(wallaceSum), .Po(Po));
assign S = Po ? wallaceSum : simpleSum;
endmodule
module multidigitAdder
#(parameter n)
(
input wire [n-1:0] A,
input wire [n-1:0] B,
input Pi,
output wire [n-1:0] S,
output Po
);
assign {Po, S} = A + B + Pi;
endmodule
remainder_modulo:
Maximum frequency: 165.65 MHz
Start node: cnt[0]
End node: reduce_modulo:reduce|multidigitAdder:multAdd|Add1~8_OTERM9
Slack: 16.642
Data delay: 3.31
wallace_adder_modulo:
Maximum frequency: 136.59 MHz
Start node: cnt[0]
End node: adder_modulo_wallace:addWallaceMod|S[3]~3_OTERM9
Slack: 17.084
Data delay: 2.75
adder_modulo:
Maximum frequency: 165.65 MHz
Start node: cnt[0]
End node: adder_modulo:addMod|multidigitAdder:multAdd2|Add1~6_OTERM9
Slack: 18.076
Data delay: 1.875

The Maximum Frequency figure is the limiting factor on performance.
The posted code implements as combinational logic whose maximum delay is 1/(Maximum Frequency) for the given module.
If the modules are implemented as part of a single-clock synchronous system, the maximum clock rate of the system is set by the slowest module, which is wallace_adder_modulo at 136.59 MHz.
The delay to obtain a new sample from any module in that system is 1/136.59 MHz = 7.3212 ns.
Consider an assembly line of workers consisting of multiple workstations; the performance limiting factor of the line is the slowest station.
There is no expected, actual, or average delay reported by FPGA timing tools. There is no theoretical delay. The tools report the maximum so that designers can select a maximum clock frequency. If the delay through the logic is greater than the clock period, the design does not work. The assumption in synchronous design is that the logic produces one logical result per clock cycle.
Vivado's timing analyzer has an options menu for reporting delays. Other vendors' tools will be similar.
A theoretical delay could be manually postulated based on mapping to theoretical gate delays; however, FPGAs don't target gates (they target the vendor's macro blocks), so those models don't exist in the scope of FPGA tools.
Since Vivado provides min delays, you could take (min + max)/2 as a typical value; however, I would not rely on that number in any way other than as a thought experiment.
It looks like you synthesized the modules separately without any top-level module to bring them together. The hardware implementation and performance numbers will change significantly when they are synthesized together, because the tools combine and optimize logic across module boundaries. The same will happen when they are combined into a synchronous system with registers/flip-flops.
If you want to understand the nature of the delays better, open the tool's RTL view and take a close look at how the logic got mapped to the vendor's hardware.
There is no need to attempt to align delays between modules for FPGA design. Put the modules in a synchronous system so that each module is surrounded by registers/flip-flops and the system acts as if each module produces a new answer every clock edge.
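As a rough sketch of that last point, a combinational block can be placed between input and output registers so that the timing analyzer sees a register-to-register path through it. The wrapper name registered_mod_add, the default n = 8, and the inline modular-add expression (standing in for adder_modulo, and assuming A and B are already less than P) are assumptions made for illustration, not part of the posted design.

```verilog
// Hypothetical wrapper: registers on both sides of the combinational logic.
module registered_mod_add #(parameter n = 8) (
    input  wire         clk,
    input  wire [n-1:0] A,
    input  wire [n-1:0] B,
    input  wire [n-1:0] P,
    output reg  [n-1:0] S
);
    // Input registers: launch point of the timed path.
    reg [n-1:0] A_r, B_r, P_r;

    // Combinational part: (A + B) mod P, assuming A < P and B < P.
    wire [n:0]   sum    = A_r + B_r;   // n+1 bits keeps the carry
    wire [n:0]   diff   = sum - P_r;
    wire [n-1:0] S_comb = (sum >= P_r) ? diff[n-1:0] : sum[n-1:0];

    always @(posedge clk) begin
        A_r <= A;
        B_r <= B;
        P_r <= P;
        S   <= S_comb;                 // output register: capture point
    end
endmodule
```

With a wrapper like this around each module and the same clock constraint applied to all of them, the reported numbers for the different modules should at least be measured over comparable paths.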

Related

Confused with ripple carry adder output

I am working on a ripple carry adder using structural verilog, which is supposed to take in two random inputs and calculate accordingly.
The general rca I created calculated correctly, but for some reason I get weird outputs when I add a for loop and use the $random to generate.
Could someone kindly explain where I'm going wrong? Below is my code:
module full_adder(x,y,z,v,cout);
parameter delay = 1;
input x,y,z; //input a, b and c
output v,cout; //sum and carry out
xor #delay x1(w1,x,y);
xor #delay x2(v,w1,z);
and #delay a1(w2,z,y);
and #delay a2(w3,z,x);
and #delay a3(w4,x,y);
or #delay o1(cout, w2,w3,w4);
endmodule
module four_bit_adder(a,b,s,cout,cin);//four_bit_adder
input [15:0] a,b; //input a, b
input cin; //carry in
output [15:0] s; //output s
output cout; //carry out
wire [15:0] c;
full_adder fa1(a[0],b[0],cin,s[0],c0);
full_adder fa2(a[1],b[1],c0,s[1],c1);
.
.
.
full_adder fa16(a[15],b[15],c14,s[15],cout);
endmodule
module testAdder(a,b,s,cout,cin);
input [15:0] s;
input cout;
output [15:0] a,b;
output cin;
reg [15:0] a,b;
reg cin;
integer i;
integer seed1=4;
integer seed2=5;
initial begin
for(i=0; i<5000; i=i+1) begin
a = $random(seed1);
b = $random(seed2);
$monitor("a=%d, b=%d, cin=%d, s=%d, cout=%d",a,b,cin,s,cout);
$display("a=%d, b=%d, cin=%d, s=%d, cout=%d",a,b,cin,s,cout);
end
end
endmodule
Here are two lines from the output that I get:
a=38893, b=58591, cin=x, s= z, cout=z
a=55136, b=58098, cin=x, s= z, cout=z
This is a combinational circuit, so the output follows the inputs directly. But here you are applying all the inputs at the same timestamp, which should not be done since each gate in the full_adder module has a delay of 1 time unit. This may not cause problems in this module, but may cause issues while modelling sequential logic. Add a minimum of #10 delay between inputs.
Also, $monitor executes on each change in the signal list, so there is no need to call it inside the for loop. Just call $monitor once in the initial block.
cin is also not driven from the testbench. The default value of a reg is 'x and that of a wire is 'z. Here, cin is a reg, so the default value 'x is displayed.
One more thing: you must instantiate the design in your testbench and connect the respective ports. The outputs from the testbench act as inputs to your design and vice versa. This is just like the way you instantiate the full_adder module in the four_bit_adder module in the design.
Consider testAdder as the top-level module and instantiate the design in it. There is no need to declare ports as input and output in this module. Declare the design input ports as reg or wire (example: reg [15:0] a when a is a design input port) and the design output ports as wire (example: wire [15:0] sum when sum is a design output port).
Referring to your question:
The general rca I created calculated correctly, but for some reason I get weird outputs when I add a for loop and use the $random to generate.
Instead of using $random, use $urandom_range() to generate random numbers in some range. SystemVerilog constraint constructs can also help. Refer to this link.
Using $urandom_range eliminates the need for seed1 and seed2; it generates values from the simulator's own seed.
Following is the module testadder with some of the changes required:
module testAdder();
wire [15:0] s;
wire cout;
// output [15:0] a,b;
// output cin;
reg [15:0] a,b;
reg cin;
integer i;
integer seed1=4;
integer seed2=5;
// Instantiate design here
four_bit_adder fa(a,b,s,cout,cin);
initial begin
// Monitor here, only single time
$monitor("a=%d, b=%d, cin=%d, s=%d, cout=%d",a,b,cin,s,cout);
for(i=0; i<5000; i=i+1) begin
// Drive inputs with some delays.
#10;
// URANDOM_RANGE for input generation in a range
a = $urandom_range(0,15);
b = $urandom_range(0,15);
// a = $random(seed1);
// b = $random(seed2);
// Drive cin randomly.
cin = $random;
$display("a=%d, b=%d, cin=%d, s=%d, cout=%d",a,b,cin,s,cout);
end
end
endmodule
For more information, have a look at sample testbench at this link.
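A self-checking variant of the same idea might look like the sketch below. It is only an illustration: adder16_model is a hypothetical behavioral stand-in for four_bit_adder (same port order) so that the example is self-contained, and the #100 settle time is an assumed value chosen to be longer than a 16-stage ripple through unit-delay gates.

```verilog
`timescale 1ns / 1ps
// Stand-in DUT (assumption): behavioral 16-bit adder with the same
// port order as four_bit_adder in the question.
module adder16_model(a, b, s, cout, cin);
    input  [15:0] a, b;
    input         cin;
    output [15:0] s;
    output        cout;
    assign {cout, s} = a + b + cin;
endmodule

module testAdder_check;
    reg  [15:0] a, b;
    reg         cin;
    wire [15:0] s;
    wire        cout;
    integer     i, errors;

    // Replace adder16_model with four_bit_adder to check the real design.
    adder16_model dut(a, b, s, cout, cin);

    initial begin
        errors = 0;
        for (i = 0; i < 100; i = i + 1) begin
            a   = $random;   // truncated to 16 bits
            b   = $random;
            cin = $random;   // truncated to 1 bit
            #100;            // let the outputs settle before sampling
            if ({cout, s} !== a + b + cin) begin
                errors = errors + 1;
                $display("MISMATCH a=%d b=%d cin=%d s=%d cout=%d",
                         a, b, cin, s, cout);
            end
        end
        $display("done, %0d mismatches", errors);
        $finish;
    end
endmodule
```

Sampling after the delay rather than immediately after the assignments means the printed s and cout belong to the a, b and cin shown on the same line.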

Connect 5-bit bus to 32-bit output bus

My design needs multiple multiplexers, all of them have two inputs and most are 32 bits wide. I started with designing the 32 bit, 2:1 multiplexer.
Now I need a 5 bit, 2:1 multiplexer and I want to reuse my 32 bit design. Connecting the inputs is easy (see code below), but I struggle to connect the output.
This is my code:
reg [4:0] a, b; // Inputs to the multiplexer.
reg select; // Select multiplexer output.
wire [4:0] result; // Output of the multiplexer.
multiplex32_2 mul({27'h0, a}, {27'h0, b}, select, result);
When I run the code through iverilog, I get a warning that says that the multiplexer expects a 32 bit output, but the connected bus is only 5 bit wide. The simulation shows the expected results, but I want to get rid of the warning.
Is there a way to tell iverilog to ignore the 27 unused bits of the multiplexer output or do I have to connect a 32 bit wide bus to the output of the multiplexer?
I don't know of a #pragma or something like that (similar to #pragma argsused from C) that can be used in Verilog.
Xilinx ISE, for example, has a feature called "message filtering", which allows the designer to silence specific warning messages. You find them once, select them, choose to ignore, and subsequent synthesis won't trigger those warnings.
Maybe you can design your multiplexer in a way you don't need to "waste" connections (not actually wasted though, as the synthesizer will prune unused connections from the netlist). A more elegant solution would be to use a parametrized module, and instantiate it with the required width. Something like this:
module mux #(parameter WIDTH=32) (
input wire [WIDTH-1:0] a,
input wire [WIDTH-1:0] b,
input wire sel,
output wire [WIDTH-1:0] o
);
assign o = (sel==1'b0)? a : b;
endmodule
This module has been tested with this simple test bench, which shows you how to instantiate a module with params:
module tb;
reg [31:0] a1,b1;
reg sel;
wire [31:0] o1;
reg [4:0] a2,b2;
wire [4:0] o2;
mux #(32) mux32 (a1,b1,sel,o1);
mux #(5) mux5 (a2,b2,sel,o2);
// Best way to instantiate them:
// mux #(.WIDTH(32)) mux32 (.a(a1),.b(b1),.sel(sel),.o(o1));
// mux #(.WIDTH(5)) mux5 (.a(a2),.b(b2),.sel(sel),.o(o2));
initial begin
$dumpfile ("dump.vcd");
$dumpvars (1, tb);
a1 = 32'h01234567;
b1 = 32'h89ABCDEF;
a2 = 5'b11111;
b2 = 5'b00000;
repeat (4) begin
sel = 1'b0;
#10;
sel = 1'b1;
#10;
end
end
endmodule
You can test it yourself using this Eda Playground link:
http://www.edaplayground.com/x/Pkz
I think the problem relates to the output of the multiplexer which is still 5 bits wide. You can solve it by doing something like this:
reg [4:0] a, b; // Inputs to the multiplexer.
reg select; // Select multiplexer output.
wire [31:0] temp;
wire [4:0] result; // Output of the multiplexer.
multiplex32_2 mul({27'h0, a}, {27'h0, b}, select, temp);
assign result = temp[4:0];
This can be easily tested in http://www.edaplayground.com/ using the code below:
(I have re-used @mcleod_ideafix's code)
// Code your testbench here
// or browse Examples
module mux #(parameter WIDTH=32) (
input wire [WIDTH-1:0] a,
input wire [WIDTH-1:0] b,
input wire sel,
output wire [WIDTH-1:0] o
);
assign o = (sel==1'b0)? a : b;
endmodule
module tb;
reg [31:0] a,b;
wire [31:0] o;
wire [4:0] r;
reg sel;
initial begin
$dumpfile("dump.vcd"); $dumpvars;
a = 10; b = 20; sel = 1;
end
mux MM(a,b,sel,o);
assign r = o[4:0];
endmodule
Let me know if you are still getting a warning.

Connecting a 4 bit shift register output to a 4 bit input in another module in Verilog

For our school project I am trying to use linear feedback shift register for pseudo-random number generation on hardware (seven segment). I have written the LFSR and seven segment module, however I have trouble connecting the two modules with each other. The project synthesizes but the HDL Diagram does not show any connection between LFSR and seven segment module. Below is the code.
//main module
module expo(input clock, reset,
output a,b,c,d,e,f,g
);
wire [3:0]connect, clk, a,b,c,d,e,f,g;
LFSR_4_bit lfsr(
.clock(clock),
.LFSR(connect)
);
seven_seg seven(
.in(connect),
.reset(reset),
.a(a),
.b(b),
.c(c),
.d(d),
.e(e),
.f(f),
.g(g)
);
endmodule
//LFSR module
module LFSR_4_bit(
input clock,
output reg[3:0]LFSR = 15
);
wire feedback = LFSR[4];
always @(posedge clock)
begin
LFSR[0] <= feedback;
LFSR[1] <= LFSR[0];
LFSR[2] <= LFSR[1];
LFSR[3] <= LFSR[2] ^ feedback;
LFSR[4] <= LFSR[3];
end
endmodule
//input and output for seven seg module
module sevenseg(
input reset,
input[3:0] in, //the 4 inputs for each display
output a, b, c, d, e, f, g, //the individual LED output for the seven segment along with the digital point
output [3:0] an // the 4 bit enable signal
);
Thanks for the help.
1) You instantiate seven_seg, but the module is declared as sevenseg. This is a compile error.
2) Your LFSR has 4 bits, 0 to 3, but a fifth bit, LFSR[4], is used. This is also a compile error.
Because of the compile errors, I am not sure that you are viewing the results of the current synthesis, as it should have failed. It is quite likely that you are viewing an old result from before the modules were connected.
Other things I would change:
a) When you define wire [3:0]connect, clk, a,b,c,d,e,f,g; they are all 4 bits wide.
However, since clock (not clk) and a,b,c,d,e,f,g are defined in your port list, they are already declared. That line could just be wire [3:0]connect;
b) When initialising values for flip-flops without using a reset, it is better practice to use an initial block. This is valid for FPGAs, but not for ASICs, where you should use reset signals:
initial begin
LFSR = 4'd15;
end
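Putting those points together, a 4-bit LFSR that stays within bits 0 to 3 could be sketched as follows. This is an assumption about the intent rather than a repair of the posted code: it uses a Fibonacci-style shift with feedback from the two top bits (taps for x^4 + x^3 + 1) instead of the posted five-bit arrangement, and the module name lfsr4_sketch is hypothetical.

```verilog
// Hypothetical 4-bit LFSR sketch, FPGA/simulation style (initial block, no reset).
module lfsr4_sketch (
    input  wire       clock,
    output reg  [3:0] LFSR
);
    // Feedback from the two most significant bits: taps for x^4 + x^3 + 1.
    wire feedback = LFSR[3] ^ LFSR[2];

    // Any non-zero seed works; the all-zero state would lock up.
    initial begin
        LFSR = 4'd15;
    end

    // Shift left by one and insert the feedback bit at position 0.
    always @(posedge clock) begin
        LFSR <= {LFSR[2:0], feedback};
    end
endmodule
```

Starting from 15, this arrangement steps through all 15 non-zero states before repeating, so the .LFSR(connect) connection in expo can stay a plain 4-bit wire.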

How can I make my verilog shifter more general?

Here I have a shifter, but as of right now it only works for up to 3 bits. I've been looking and I can't find out how to make it work for up to 8 bits.
module shifter(a,b,out);
input [7:0] a, b;
output [7:0] out;
wire [7:0] out1, out2, out3;
mux_8b_2to1 first(a[7:0], {a[3:0],a[7:4]}, b[2], out1);
mux_8b_2to1 second(out1[7:0], {out1[5:0],out1[7:6]}, b[1], out2);
mux_8b_2to1 third(out2[7:0], {out2[6:0],out2[7]}, b[0], out);
endmodule
What you have is a barrel shifter. Two ways to make it more generic are to write it as a functional model (still synthesizable) or as a structural model with a generate block. Both approaches follow IEEE Std 1364-2001 (aka Verilog-2001).
The functional generic approach for a barrel shifter only needs a down-shifter. The general function is out = {in,in} >> (WIDTH-shift), where the leftover upper bits can be ignored. To protect against double-roll (i.e. shift > WIDTH), apply the mod operator to the shift: (WIDTH-(shift%WIDTH)).
module barrel_shifter_functional #( parameter CTRL=3, parameter WIDTH=2**CTRL )
( input wire [WIDTH-1:0] in,
input wire [ CTRL-1:0] shift,
output wire [WIDTH-1:0] out );
assign out = {2{in}} >> (WIDTH-(shift%WIDTH));
endmodule
The structural generic approach for a barrel shifter needs a generate block. The for loop in the generate block is unrolled at compile time, not executed at run time like a for loop in an always block. To keep it generic, also give the 2-to-1 mux a parametrized width. FYI, you can use the generate block with functional code too; for example, comment out the mux_2to1 instantiation and uncomment the assign statement below it. Learn more about the generate block by reading IEEE Std 1800-2012 § 27, Generate constructs.
module barrel_shifter_structeral #( parameter CTRL=3, parameter WIDTH=2**CTRL )
( input wire [WIDTH-1:0] in,
input wire [ CTRL-1:0] shift,
output wire [WIDTH-1:0] out );
wire [WIDTH-1:0] tmp [CTRL:0];
assign tmp[CTRL] = in;
assign out = tmp[0];
genvar i;
generate
for (i = 0; i < CTRL; i = i + 1) begin : mux
mux_2to1 #(.WIDTH(WIDTH)) g(
.in0(tmp[i+1]),
.in1({tmp[i+1][WIDTH-(2**i)-1:0],tmp[i+1][WIDTH-1:WIDTH-(2**i)]}),
.sel(shift[i]),
.out(tmp[i]) );
// assign tmp[i] = shift[i] ? {tmp[i+1][WIDTH-(2**i)-1:0],tmp[i+1][WIDTH-1:WIDTH-(2**i)]} : tmp[i+1];
end : mux
endgenerate
endmodule
module mux_2to1 #( parameter WIDTH=8 )
( input wire [WIDTH-1:0] in0, in1,
input wire sel,
output wire [WIDTH-1:0] out );
assign out = sel ? in1 : in0;
endmodule
Both examples are functionally equivalent and synthesize provided CTRL is less than or equal to the ceiling of log2(WIDTH). Synthesis will likely give different results. The generate method will exclusively use 2-to-1 muxes while the pure functional method will depend on the quality of the optimizer.
Working example @ http://www.edaplayground.com/s/6/500
I've used the >> and << operators to generate a synthesizable design using ISE WebPACK, like this:
module shifter(
input wire [7:0] a,
input wire [7:0] b,
input wire leftright, // 0=shift right, 1=shift left
output reg [7:0] out
);
always @* begin
if (leftright==0)
out = a>>b;
else
out = a<<b;
end
endmodule
This way, the synthesis tool will know that you want to implement a shifter and can use its own macros to best synthesize it:
Synthesizing Unit <shifter>.
Related source file is "shifter.v".
Found 8-bit shifter logical right for signal <out$shift0002> created at line 30.
Found 8-bit shifter logical left for signal <out$shift0003> created at line 32.

Verilog: trying to blink leds in series using a clock divider at multiple frequencies

I'm trying to use two switches to select the frequency I want to blink the led's at. My verilog code is as follows:
`timescale 1ns / 1ps
module clk_divider(
input clk,
input rst,
input [1:0] sw,
output led
);
reg n;
always @(sw[0],sw[1])
n = (27 - sw);
wire [n-1:0] din;
wire [n-1:0] clkdiv;
dff dff_inst0 (
.clk(clk),
.rst(rst),
.D(din[0]),
.Q(clkdiv[0])
);
genvar i;
generate
for (i = 1; i < n; i=i+1)
begin : dff_gen_label
dff dff_inst (
.clk(clkdiv[i-1]),
.rst(rst),
.D(din[i]),
.Q(clkdiv[i])
);
end
endgenerate;
assign din = ~clkdiv;
assign led = clkdiv[n-1];
endmodule
When I check for syntax, it says that "n is not constant." How can I avoid this error? To me, it seems that it should work. Any help would be appreciated!!!
With respect to wire [n-1:0] din; and wire [n-1:0] clkdiv;, you cannot have the width of a bus dependent on the value of an input.
A bus width is defined at synthesis time, it is the number of wires that exist in the physical device. Wires cannot appear or disappear based on the state of a module input or register.
You need to define these wires as having a fixed width, not a dynamic width. Maybe in some cases not all the wires will be used, but you must still define the bus as the maximum width that you will ever need. Similarly in the generate loop, you cannot change the number of flip-flops that are instantiated based on the value of n. You must instantiate as many flip flops as you will ever need, and then enable/disable some as needed.
Also you will hit this separate issue later, but your register n is only a single bit, so it cannot store any number other than 0 or 1. Make the register larger if you intend to hold greater values.
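One way to apply this is sketched below, with the caveat that it swaps the ripple chain of flip-flops for a single synchronous counter of fixed maximum width and uses sw only to choose which counter bit drives the LED. The module name clk_divider_sketch and the synchronous reset are assumptions. The bit index 26 - sw mirrors the intended n = 27 - sw with led = clkdiv[n-1].

```verilog
`timescale 1ns / 1ps
// Hypothetical fixed-width divider: the hardware size never changes,
// only the tap that is routed to the LED.
module clk_divider_sketch (
    input  wire       clk,
    input  wire       rst,
    input  wire [1:0] sw,
    output wire       led
);
    // Always 27 bits, the largest width the original design would need.
    reg [26:0] cnt;

    always @(posedge clk) begin
        if (rst)
            cnt <= 27'd0;
        else
            cnt <= cnt + 27'd1;
    end

    // sw = 0 selects bit 26 (slowest blink), sw = 3 selects bit 23 (fastest).
    assign led = cnt[26 - sw];
endmodule
```

Here the widths are constants, so the "n is not constant" error does not arise, and each step of sw doubles the blink rate.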

Resources