Two ways to write pipeline in verilog, which one is better? - verilog

I learned two ways of writing pipeline(unblocking and blocking), I wonder which is better?
My personal opinion is that the second one is tedious and I don't understand why so many wire are needed.
Also, is there any standard style(template) of writing pipeline like FSM in verilog?
Thanks in advance.
module simplePipeline
#(
parameter WIDTH = 100
)
(
input clk,
input [WIDTH - 1 : 0] datain,
output [WIDTH - 1: 0] dataout
);
reg [WIDTH - 1 : 0] piprData1;
reg [WIDTH - 1 : 0] piprData2;
reg [WIDTH - 1 : 0] piprData3;
always #(posedge clk) begin
piprData1 <= datain;
end
always #(posedge clk) begin
piprData2 <= piprData1;
end
always #(posedge clk) begin
piprData3 <= piprData2;
end
assign dataout = piprData3;
endmodule
module blockingPipeline#2
(
parameter WIDTH = 100
)
(
input clk,
input rst,
input validIn,
input [WIDTH - 1 : 0] dataIn,
input outAllow,
output wire validOut,
output wire [WIDTH - 1 : 0] dataOut
);
reg pipe1Valid;
reg [WIDTH - 1 : 0] pipe1Data;
reg pipe2Valid;
reg [WIDTH - 1 : 0] pipe2Data;
reg pipe3Valid;
reg [WIDTH - 1 : 0] pipe3Data;
/*------------------------PIPE1 LOGIC------------------------*/
wire pipe1AllowIn;
wire pipe1ReadyGo;
wire pipe1ToPipe2Valid;
assign pipe1ReadyGo = 1'b1
assign pipe1AllowIn = ! pipe1Valid || pipe1ReadyGo && pipe2AllowIn;
assign pipe1ToPipe2Valid = pipe1Valid && pipe1ReadyGo
always #(posedge clk)begin
if( rst ) begin
pipe1Vali <= 1'b0;
end
else if(pipe1AllowIn)begin
pipe1Valid <= validIn;
end
ifvalidIn && pipe1AllowIn)begin
pipe1Data <= dataIn;
end
end
/*------------------------PIPE2 LOGIC------------------------*/
wire pipe2AllowIn;
wire pipe2ReadyGo;
wire pipe2ToPipe3Valid;
assign pipe2ReadyGo = 1'b1;
assign pipe2AllowIn = ! pipe2Valid || pipe2ReadyGo && pipe3AllowIn;
assign pipe2ToPipe3Valid = pipe2Valid && pipe3ReadyGo;
always #(posedge clk)begin
if( rst ) begin
pipe2Valid <= 1'b0;
end
else if(pipe2AllowIn)begin
pipe2Valid <= pipe1ToPipe2Valid;
end
if(pipe1ToPipe2Valid && pipe2AllowIn)begin
pipe2Data <= pipe1Data;
end
end
/*------------------------PIPE3 LOGIC------------------------*/
wire pipe3AllowIn;
wire pipe3ReadyGo;
assign pipe3ReadyGo = 1'b1;
assign pipe3AllowIn = ! pipe3Valid || pipe3ReadyGo && outAllow;
always #(posedge clk)begin
if( rst ) begin
pipe3Valid <= 1'b0;
end
else if(pipe3AllowIn)begin
pipe3Valid <= pipe2ToPipe3Valid;
end
if(pipe2ToPipe3Valid && pipe3AllowIn)begin
pipe3Data <= pipe2Data;
end
end
assign validOut = pipe3Valid && pipe3ReadyGo;
assign dataOut = pipe3Data;
endmodule

The problem with the first version is that there seems to be no clock gate at all. Unless your clock is well gated on a higher level or the pipeline is used every cycle you will waste a lot of power by (unnecessarily) toggling each stage of the pipeline every cycle.

As good practice, the second one seems "better" in the sense that when you design a hardware circuit part you might want offer great control possibilities. Generally speaking, since your code will be implemented in the silicon forever (or in your FPGA with painful reconfiguration) that would be a real problem to not have enough controls capacities, because you can't really add some afterward.
As an example you already mentioned in the comments that you'll have to stall the pipeline, so of course you need more wires to do it. You will also need to reset the pipeline sometimes, this is the purpose of the rst signal.
However, more signals means more silicon surface (or more FPGA resources using) which generally come with a greater price. One can argue that only one or two wires will not makes such great difference for a chip, which is true, but if you re-use your pipeline thousand times on the circuit it will be much more significant.
For me the first implementation can be relevant only with really strong optimization requirements at circuit level like for an ASIC, where you exactly know the overall wanted behavior.

Related

Trouble understanding simulation/module behavior

I implemented a very simple counter with preset functionality (code reproduced below).
module counter
#(
parameter mod = 4
) (
input wire clk,
input wire rst,
input wire pst,
input wire en,
input wire [mod - 1:0] data,
output reg [mod - 1:0] out,
output reg rco
);
parameter max = (2 ** mod) - 1;
always #* begin
if(out == max) begin
rco = 1;
end else begin
rco = 0;
end
end
always #(posedge clk) begin
if(rst) begin
out <= 0;
end else if(pst) begin
out <= data;
end else if(en) begin
out <= out + 1;
end else begin
out <= out;
end
end
endmodule
I am having trouble understanding the following simulation result. With pst asserted and data set to 7 on a rising clock edge, the counter's out is set to data, as expected (first image below. out is the last signal, data is the signal just above, and above that is pst.). On the next rising edge, I kept preset asserted and set data to 0. However, out does not follow data this time. What is the cause of this behavior?
My thoughts
On the rising clock edge where I set data to 0, I notice that out stays at 7, and doesn't increment to 8. So I believe that the counter is presetting, but with the value 7, not 0. If I move the data transition from 7 to 0 up in time, out gets set to 0 as expected (image below). Am I encountering a race condition?
Testbenches
My initial testbench code that produced the first image is reproduced below. I show the changes I made to get coherent results as comments.
parameter mod = 4;
// ...
reg pst;
reg [mod - 1:0] data;
// ...
#(posedge clk); // ==> #(negedge clk)
data = 7;
pst = 1;
#(posedge clk); // ==> #(negedge clk)
data 0;
pst = 1;
#(posedge clk); // ==> #(negedge clk)
pst = 0;
#(posedge clk);
// ...
You have a race condition test bench. The Verilog scheduler is allowed to evaluate any # triggered in the time step in any order it chooses. All code after the granted # will execute until it hits another time blocking statement. In your waveform it looks like data and pst from the from the test bench are sometimes being assigned before the design samples them and sometimes after.
The solution is simple, use non-blocking assignments (<=). Refer to What is the difference between = and <= in Verilog?
#(posedge clk);
data <= 7;
pst <= 1;
#(posedge clk);
data <= 0;
pst <= 1;
#(posedge clk);
pst <= 0;
#(posedge clk);
I am able to obtain correct, predictable behavior if I modify my testbench to only modify input signals to my counter on falling clock edges rather than on rising clock edges (as it should be anyways). My best guess as to why the above behavior was occurring is that changing input signals at the same time the counter module is programmed to sample its inputs leads to undefined simulator behavior.

modelsim programming 60 counter (error loading design)

My code is compiling well, but it does not work when i simulate it.
It displays "error loading design".
i think that input and output port is wrong among these modules.
but i can not find them..
please help me where the error is in my code.
module tb_modulo_60_binary;
reg t_clk, reset;
wire [7:0] t_Y;
parameter sec = 30;
always #(sec) t_clk = ~t_clk;
modulo_60_binary M1 (t_Y, t_clk, reset);
initial begin
t_clk = 1; reset =1; #10;
reset = 0; #3050;
$finish;
end
endmodule
module modulo_60_binary(y, clk, reset);
output [7:0] y;
input reset, clk;
wire TA1, TA2, TA3, JA2, JA4;
reg [7:0] y;
assign TA1 = 1;
assign TA2 = (~y[6]) && y[4];
assign TA3 = (y[5] && y[4]) || (y[6] && y[4]);
assign JA2 = ~y[3];
assign JA4 = y[1]&&y[2];
jk_flip_flop JK1 (1, 1, clk, y[0]);
jk_flip_flop JK2 (JA2, 1, y[0], y[1]);
jk_flip_flop JK3 (1, 1, y[1], y[2]);
jk_flip_flop JK4 (JA4, 1, y[1], y[3]);
t_flip_flop T1 (TA1, clk, y[4]);
t_flip_flop T2 (TA2, clk, y[5]);
t_flip_flip T3 (TA3, clk, y[6]);
always #(negedge clk)
begin
if(reset)
y <= 8'b00000000;
else if(y == 8'b01110011)
y <= 8'b00000000;
end
endmodule
module t_flip_flop(t, clk, q);
input t, clk;
output q;
reg q;
initial q=0;
always #(negedge clk)
begin
if(t == 0) q <= q;
else q <= ~q;
end
endmodule
module jk_flip_flop(j, k, clk, Q);
output Q;
input j, k, clk;
reg Q;
always #(negedge clk)
if({j,k} == 2'b00) Q <= Q;
else if({j,k} == 2'b01) Q <= 1'b0;
else if({j,k} == 2'b10) Q <= 1'b1;
else if({j,k} == 2'b11) Q <= ~Q;
endmodule
Your y signal in modulo_60_binary is being driven in two places:
By bit JK# and T# instances
The reset logic that assigns all bits of y to zeros
Flops and comb-logic must have one clear driver. This is one of the fundamental differences between software and hardware languages.
The rest of my answer assuming the use of the JK and T flops are a design requirement. Therefore you need to delete the always block that assigns y to zeros and and make y a wire type.
Fixing the logic to the T flops is easy. Simply add a conditional statement. Example:
wire do_rst = reset || (y == 8'b01110011);
assign TA1 = do_rst ? y[4] : 1;
assign TA2 = do_rst ? y[5] : (~y[6]) && y[4];
assign TA3 = do_rst ? y[6] : (y[5] && y[4]) || (y[6] && y[4]);
The JK flops is harder because the output of one flop is the clock of another. I'll advice that the clock input for each JK flop should be clk, otherwise you are asking for a design headache for reset when it's y bits are non power of two minus one values (eg 1,3,7,15). This means you need to re-evaluate your JA# logic and add KA# logic (hint the do_rst from above will help). I'm not going to do the work for you beyond this.
There is the option of the asynchronous reset approach, but for this design I will advice ageist it. The reset pulse could be too short on silicon with the conditional reset for y == a particular value(s), which can result in an undependable partial reset. You could add synthesis constraints/rules to keep the push wide enough, but that is just patching a brittle design. Better to design it robust at the beginning.
FYI: y[7] does not have a driver and the module declaration of instance T3 has a typo.

Verilog: wait for module logic evaluation in an always block

I want to use the output of another module inside an always block.
Currently the only way to make this code work is by adding #1 after the pi_in assignment so that enough time has passed to allow Pi to finish.
Relevant part from module pLayer.v:
Pi pi(pi_in,pi_out);
always #(*)
begin
for(i=0; i<constants.nSBox; i++) begin
for(j=0; j<8; j++) begin
x = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
pi_in = 8*i+j;#1; /* wait for pi to finish */
PermutedBitNo = pi_out;
y = PermutedBitNo>>3;
tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
end
end
state_out = tmp;
end
Modllue Pi.v
`include "constants.v"
module Pi(in, out);
input [31:0] in;
output [31:0] out;
reg [31:0] out;
always #* begin
if (in != constants.nBits-1) begin
out = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out = constants.nBits-1;
end
end
endmodule
Delays should not be used in the final implementation, so is there another way without using #1?
In essence i want PermutedBitNo = pi_out to be evaluated only after the Pi module has finished its job with pi_in (=8*i+j) as input.
How can i block this line until Pi has finished?
Do i have to use a clock? If that's the case, please give me a hint.
update:
Based on Krouitch suggestions i modified my modules. Here is the updated version:
From pLayer.v:
Pi pi(.clk (clk),
.rst (rst),
.in (pi_in),
.out (pi_out));
counter c_i (clk, rst, stp_i, lmt_i, i);
counter c_j (clk, rst, stp_j, lmt_j, j);
always #(posedge clk)
begin
if (rst) begin
state_out = 0;
end else begin
if (c_j.count == lmt_j) begin
stp_i = 1;
end else begin
stp_i = 0;
end
// here, the logic starts
x = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
pi_in = 8*i+j;
PermutedBitNo = pi_out;
y = PermutedBitNo>>3;
tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
// at end
if (i == lmt_i-1)
if (j == lmt_j) begin
state_out = tmp;
end
end
end
endmodule
module counter(
input wire clk,
input wire rst,
input wire stp,
input wire [32:0] lmt,
output reg [32:0] count
);
always#(posedge clk or posedge rst)
if(rst)
count <= 0;
else if (count >= lmt)
count <= 0;
else if (stp)
count <= count + 1;
endmodule
From Pi.v:
always #* begin
if (rst == 1'b1) begin
out_comb = 0;
end
if (in != constants.nBits-1) begin
out_comb = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out_comb = constants.nBits-1;
end
end
always#(posedge clk) begin
if (rst)
out <= 0;
else
out <= out_comb;
end
That's a nice piece of software you have here...
The fact that this language describes hardware is not helping then.
In verilog, what you write will simulate in zero time. it means that your loop on i and j will be completely done in zero time too. That is why you see something when you force the loop to wait for 1 time unit with #1.
So yes, you have to use a clock.
For your system to work you will have to implement counters for i and j as I see things.
A counter synchronous counter with reset can be written like this:
`define SIZE 10
module counter(
input wire clk,
input wire rst_n,
output reg [`SIZE-1:0] count
);
always#(posedge clk or negedge rst_n)
if(~rst_n)
count <= `SIZE'd0;
else
count <= count + `SIZE'd1;
endmodule
You specify that you want to sample pi_out only when pi_in is processed.
In a digital design it means that you want to wait one clock cycle between the moment when you are sending pi_in and the moment when you are reading pi_out.
The best solution, in my opinion, is to make your pi module sequential and then consider pi_out as a register.
To do that I would do the following:
module Pi(in, out);
input clk;
input [31:0] in;
output [31:0] out;
reg [31:0] out;
wire clk;
wire [31:0] out_comb;
always #* begin
if (in != constants.nBits-1) begin
out_comb = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out_comb = constants.nBits-1;
end
end
always#(posedge clk)
out <= out_comb;
endmodule
Quickly if you use counters for i and j and this last pi module this is what will happen:
at a new clock cycle, i and j will change --> pi_in will change accordingly at the same time(in simulation)
at the next clock cycle out_comb will be stored in out and then you will have the new value of pi_out one clock cycle later than pi_in
EDIT
First of all, when writing (synchronous) processes, I would advise you to deal only with 1 register by process. It will make your code clearer and easier to understand/debug.
Another tip would be to separate combinatorial circuitry from sequential. It will also make you code clearer and understandable.
If I take the example of the counter I wrote previously it would look like :
`define SIZE 10
module counter(
input wire clk,
input wire rst_n,
output reg [`SIZE-1:0] count
);
//Two way to do the combinatorial function
//First one
wire [`SIZE-1:0] count_next;
assign count_next = count + `SIZE'd1;
//Second one
reg [`SIZE-1:0] count_next;
always#*
count_next = count + `SIZE'1d1;
always#(posedge clk or negedge rst_n)
if(~rst_n)
count <= `SIZE'd0;
else
count <= count_next;
endmodule
Here I see why you have one more cycle than expected, it is because you put the combinatorial circuitry that controls your pi module in you synchronous process. It means that the following will happen :
first clk positive edge i and j will be evaluated
next cycle, the pi_in is evaluated
next cycle, pi_out is captured
So it makes sense that it takes 2 cycles.
To correct that you should take out of the synchronous process the 'logic' part. As you stated in your commentaries it is logic, so it should not be in the synchronous process.
Hope it helps

CPLD Breathing LED with flexible set points

This is my first post and my first attempt at using a PLD.
I have written some code to make a breathing LED with 7 set points. The code produces a pwm output according to the first set point. It then slowly increases/decreases the pwm towards the next set point (7 in total).
The code works but I think it can be done better as I need to put 16 instantiations of this into a Lattice 4256 CPLD (not possible with my code).
I am keen to learn how a professional Verilog programmer would tackle this.
Many thanks in advance for your support.
PWM Generation
module LED_breath (led, tmr_clk);
output reg led;
input tmr_clk;
reg [7:0] cnt;
reg [6:0] pwm_cnt;
reg [6:0] pwm_val;
reg [2:0] pat_cnt;
reg [9:0] delay_cnt;
reg [6:0] cur_pat;
reg [6:0] nxt_pat;
parameter pattern = {7'h00, 7'h00, 7'h00, 7'h00, 7'h00, 7'h00, 7'h00, 7'h00};
always #(posedge tmr_clk)
begin
pwm_cnt = cnt[7] ? ~cnt[6:0] : cnt[6:0]; //Generate triangle wave
if(pwm_cnt > pwm_val) //Generate pwm
led <= 1'b0;
if(pwm_cnt < pwm_val)
led <= 1'b1;
cnt = cnt + 1;
end
always #(posedge tmr_clk) //breathing pattern
begin
if(!delay_cnt) //Add delay
begin
cur_pat <= ((pattern >> (7*pat_cnt)) & 7'b1111111); //Find correct pattern No. from parameter list
if((pat_cnt+1) == 8) //Check for last pattern - overflow, set to 0
nxt_pat <= (pattern & 7'b1111111);
else
nxt_pat <= ((pattern >> (7*(pat_cnt+1))) & 7'b1111111);
if(pwm_val == nxt_pat) //If pwm is at max or min increment count to get next pattern
pat_cnt <= pat_cnt + 1;
if(cur_pat <= nxt_pat) //Current pattern < next pattern, count up
pwm_val <= pwm_val + 1;
if(cur_pat >= nxt_pat) //Current pattern < next pattern, count down
pwm_val <= pwm_val - 1;
end
delay_cnt <= delay_cnt + 1;
end
endmodule
module top (led_0, led_1, led_2, led_3);
output led_0;
output led_1;
output led_2;
output led_3;
defparam I1.TIMER_DIV = "128";
OSCTIMER I1 (.DYNOSCDIS(1'b0), .TIMERRES(1'b0), .OSCOUT(osc_clk), .TIMEROUT(tmr_clk));
LED_breath #(.pattern({7'h20, 7'h70, 7'h50, 7'h70, 7'h40, 7'h10, 7'h60, 7'h10})) led_A(
.led (led_0),
.tmr_clk (tmr_clk)
);
LED_breath #(.pattern({7'h70, 7'h10, 7'h30, 7'h20, 7'h60, 7'h40, 7'h70, 7'h10})) led_B(
.led (led_1),
.tmr_clk (tmr_clk)
);
LED_breath #(.pattern({7'h10, 7'h30, 7'h10, 7'h18, 7'h40, 7'h50, 7'h30, 7'h60})) led_C(
.led (led_2),
.tmr_clk (tmr_clk)
);
LED_breath #(.pattern({7'h50, 7'h70, 7'h40, 7'h50, 7'h40, 7'h70, 7'h60, 7'h70})) led_D(
.led (led_3),
.tmr_clk (tmr_clk)
);
endmodule
Can you explain a bit what you are trying to achieve in this always block ?
always #(posedge tmr_clk)
I think you're using a fixed frequency and changing duty cycle to get desired breathing effect.
1) Is my thinking correct ?
2) If yes, how do you decide when to change the pattern ?

4 bit countetr using verilog not incrementing

Sir,
I have done a 4 bit up counter using verilog. but it was not incrementing during simulation. A frequency divider circuit is used to provide necessory clock to the counter.please help me to solve this. The code is given below
module my_upcount(
input clk,
input clr,
output [3:0] y
);
reg [26:0] temp1;
wire clk_1;
always #(posedge clk or posedge clr)
begin
temp1 <= ( (clr) ? 4'b0 : temp1 + 1'b1 );
end
assign clk_1 = temp1[26];
reg [3:0] temp;
always #(posedge clk_1 or posedge clr)
begin
temp <= ( (clr) ? 4'b0 : temp + 1'b1 );
end
assign y = temp;
endmodule
Did you run your simulation for at least (2^27) / 2 + 1 iterations? If not then your clk_1 signal will never rise to 1, and your counter will never increment. Try using 4 bits for the divisor counter so you won't have to run the simulation for so long. Also, the clk_1 signal should activate when divisor counter reaches its max value, not when the MSB bit is one.
Apart from that, there are couple of other issues with your code:
Drive all registers with a single clock - Using different clocks within a single hardware module is a very bad idea as it violates the principles of synchronous design. All registers should be driven by the same clock signal otherwise you're looking for trouble.
Separate current and next register value - It is a good practice to separate current register value from the next register value. The next register value will then be assigned in a combinational portion of the circuit (not driven by the clock) and stored in the register on the beginning of the next clock cycle (check code below for example). This makes the code much more clear and understandable and minimises the probability of race conditions and unwanted inferred memory.
Define all signals at the beginning of the module - All signals should be defined at the beginning of the module. This helps to keep the module logic as clean as possible.
Here's you example rewritten according to my suggestions:
module my_counter
(
input wire clk, clr,
output [3:0] y
);
reg [3:0] dvsr_reg, counter_reg;
wire [3:0] dvsr_next, counter_next;
wire dvsr_tick;
always #(posedge clk, posedge clr)
if (clr)
begin
counter_reg <= 4'b0000;
dvsr_reg <= 4'b0000;
end
else
begin
counter_reg <= counter_next;
dvsr_reg <= dvsr_next;
end
/// Combinational next-state logic
assign dvsr_next = dvsr_reg + 4'b0001;
assign counter_next = (dvsr_reg == 4'b1111) ? counter_reg + 4'b0001 : counter_reg;
/// Set the output signals
assign y = counter_reg;
endmodule
And here's the simple testbench to verify its operation:
module my_counter_tb;
localparam
T = 20;
reg clk, clr;
wire [3:0] y;
my_counter uut(.clk(clk), .clr(clr), .y(y));
always
begin
clk = 1'b1;
#(T/2);
clk = 1'b0;
#(T/2);
end
initial
begin
clr = 1'b1;
#(negedge clk);
clr = 1'b0;
repeat(50) #(negedge clk);
$stop;
end
endmodule

Resources