Iteration limit when implementing a multicycled processor - verilog

I'm trying to implement a simple multicycle processor and I ran into some problems that I don't seem to be getting through. The code is below. I'm just experimenting right now to get this flowing. When I'm done, I'll begin implementing instructions and ALU. However, I'm stuck at this point. In the code below, I'm aware that data_memory is never used (I'll get to there if I can resolve this), some inputs and outputs are also not used for now, x1 and x2 are just variables I created to see what's really going on. What's in definitions.v file is self-evident.
I'm using Altera Quartus 15.1 with Verilog-2001. This code compiles fine except for some warnings about unused signals, but when I try to simulate it with a clock period of 20 ns it gives the error "Error (suppressible): (vsim-3601) Iteration limit 5000 reached at time 100 ns". It says the error is suppressible, but I don't know how to suppress it either.
I looked up this error and learned that it happens because at some point the code goes into an infinite loop. I tried to solve this by creating another variable, ok. A cycle starts by setting ok to 0, and after the micro-operations for that cycle are done, I set ok to 1, so the cycle will not change at an improper time (it's like locking the cycle). Unfortunately, this resulted in the same error.
I tried another flow, too. Instead of cycle and next_cycle, I created one variable for cycle. At every rising edge of clock, I checked the current state and did things accordingly, then set the cycle for next step. Example:
always @(posedge clk) begin
case (cycle)
3'b000: begin
MAR <= PC;
cycle <= 3'b001;
ire <= 1'b1;
x2 <= 2'b00;
end
3'b001: begin
...
...
This also compiles fine and can be simulated without error! However, it does not function correctly, giving weird (or unexpected) results. I find the other approach more intuitive, so I will try to make it work.
How can I resolve/implement this?
`include "definitions.v"
module controller(
input clk,
input nres,
output reg ire,
output reg dwe,
output reg dre,
output reg [1:0] x2,
output reg [`IADR_WIDTH-1:0] i_address,
output reg [`DADR_WIDTH-1:0] d_address,
output reg [`DATA_WIDTH-1:0] data_out);
reg [2:0] cycle = 3'b000;
reg [2:0] next_cycle;
reg [`IADR_WIDTH-1:0] PC = 6'b000000;
reg [`INST_WIDTH-1:0] IR = 12'b00000_0000000;
reg [`DADR_WIDTH-1:0] MAR = 6'b000000;
reg [4:0] OPC = 5'b00000;
wire [`DATA_WIDTH-1:0] data_in;
wire [`INST_WIDTH-1:0] instruction;
reg [1:0] x1;
data_memory dmem ( .clk (clk),
.dwe (dwe),
.dre (dre),
.nres (nres),
.d_address (d_address),
.d_data (data_out),
.d_q (data_in));
instruction_memory imem ( .clk (clk),
.ire (ire),
.i_address (i_address),
.i_q (instruction));
reg ok = 1;
always @(posedge clk) begin
cycle = (ok) ? next_cycle : cycle;
end
always @(cycle) begin
case (cycle)
3'b000: begin
ok = 0;
MAR = PC;
next_cycle = 3'b001;
ire = 1'b1;
x2 = 2'b00;
ok = 1;
end
3'b001: begin
ok = 0;
i_address = MAR;
IR = instruction;
ire = 1'b0;
next_cycle = 3'b010;
x2 = 2'b01;
ok = 1;
end
3'b010: begin
ok = 0;
OPC = IR;
next_cycle = 3'b011;
x2 = 2'b10;
ok = 1;
end
3'b011: begin
ok = 0;
if (OPC==5'b01011) x1 = 2'b11;
PC = PC + 1;
next_cycle = 3'b000;
x2 = 2'b11;
ok = 1;
end
endcase
end
endmodule

When we write always @(signal) in Verilog, with an explicitly specified sensitivity list, the block is triggered in simulation on any change in that signal. This can lead to a misunderstanding of how hardware actually works. The only hardware we have that changes on an edge is a flip-flop, and you need to specify the posedge or negedge keyword for that.
When always @(signal) is synthesised you actually get a combinatorial block, which behaves like always @(*), the automatic sensitivity list.
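As a small illustration of that difference, here is a sketch with hypothetical module and signal names (not taken from the question). Both blocks describe the same AND gate to a synthesis tool, but in simulation only the second one is re-evaluated when b changes on its own.

```verilog
// Hypothetical example: two descriptions of the same AND gate.
module sens_list_demo (
    input  wire a,
    input  wire b,
    output reg  y_partial,  // block with an incomplete sensitivity list
    output reg  y_full      // block with the automatic sensitivity list
);
    // Re-evaluated in simulation only when 'a' changes; a change on 'b'
    // alone leaves y_partial stale until 'a' toggles again.
    always @(a)
        y_partial = a & b;

    // Re-evaluated whenever any signal read inside the block changes.
    always @(*)
        y_full = a & b;
endmodule
```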
So from the comments we will look at this small section of code:
always @(*) begin
case (cycle)
3'b011: begin
ok = 0;
if (OPC==5'b01011) x1 = 2'b11;
PC = PC + 1;
next_cycle = 3'b000;
x2 = 2'b11;
ok = 1;
This is a combinatorial block, triggered in the simulator when anything that can affect its outputs changes. Most signals are assigned constants or other known values, without loops.
PC = PC +1;
The line above, though, updates the value of PC from PC itself. The new value of PC triggers the combinatorial block to be re-evaluated, hitting the PC increment again, and so on. This all happens within a single time step, in the delta cycles of the simulator, which is why the iteration limit is reached.
With hardware description languages (HDLs) like Verilog we have to remember that we are describing parallel statements, not serially executed lines of code.
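One common way to break such a loop is to compute "next" values in the combinatorial block and update the real state only on the clock edge. The following is a minimal sketch under assumptions: the module name, the ADDR_WIDTH parameter, the reduced port list and the active-low synchronous reset on nres are all illustrative and not taken from the asker's definitions.v.

```verilog
// Sketch only: names, widths and reset behaviour are assumptions.
module pc_fsm_sketch #(
    parameter ADDR_WIDTH = 6
) (
    input  wire                  clk,
    input  wire                  nres,   // assumed active-low synchronous reset
    output reg  [1:0]            cycle,
    output reg  [ADDR_WIDTH-1:0] PC
);
    reg [1:0]            next_cycle;
    reg [ADDR_WIDTH-1:0] next_PC;

    // Combinatorial part: derives the next values from the current ones.
    // It never assigns PC or cycle, so it cannot retrigger itself.
    always @(*) begin
        next_cycle = cycle + 2'd1;      // 0 -> 1 -> 2 -> 3 -> 0
        next_PC    = PC;                // default: hold
        if (cycle == 2'b11)
            next_PC = PC + 1'b1;        // increment once per instruction
    end

    // Sequential part: the only place where the state changes.
    always @(posedge clk) begin
        if (!nres) begin
            cycle <= 2'b00;
            PC    <= {ADDR_WIDTH{1'b0}};
        end else begin
            cycle <= next_cycle;
            PC    <= next_PC;
        end
    end
endmodule
```

Because PC is written only at the clock edge, the increment happens exactly once per pass through cycle 3 instead of looping within one time step.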

Related

How does verilog treat input values to if statements in always_ff blocks

I'm currently working on a pipelined MIPS CPU using Icarus Verilog and have come across some very strange behaviour when using an if statement within an always_ff block. I'm currently testing this implementation of a PC block:
module PC (
input logic clk,
input logic rst,
input logic[31:0] PC_JVal,
input logic jump_en,
input logic branch_en,
input logic PC_Stall,
output logic [31:0] PC_Out,
output logic fetch_stall,
output logic active,
output logic [2:0] check
);
// Active is completely dependent on the value of the PC.
// JUMP_EN --> PC = JVAL
// BRANCH_EN --> PC = PC + JVAL
// PC_Stall --> PC = PC
reg [31:0] PC;
logic [31:0] branchSignExt = (PC_JVal[15] == 1) ? {16'hFFFF, PC_JVal[15:0]} : {16'h0000, PC_JVal[15:0]};
logic start;
assign fetch_stall = PC_Stall;
assign active = (PC != 0) ? 1 : 0;
assign PC_Out = (active == 0) ? 0 : ( (PC_Stall == 1) ? PC + 4 : ( (jump_en == 1) ? PC_JVal : ( (branch_en == 1) ? PC + branchSignExt : PC + 4 ) ) );
initial begin
PC = 0;
start = 0;
check = 0;
end
always_ff @(posedge clk) begin
check[1] <= ~check[1];
if (rst) begin
start <= 1;
end
else if (active) begin
if (PC_Stall) begin
PC <= PC;
check[0] <= ~check[0];
end
else if (jump_en) begin
PC <= PC_JVal;
end
else if (branch_en) begin
PC <= PC + branchSignExt;
end
else begin
PC <= PC + 4;
end
end
end
always_ff @(negedge rst) begin
if (start) begin
PC <= 32'hBFBFFFFC;
start <= 0;
end
end
endmodule
And am running the following testbench:
module PC_TB ();
logic clk;
logic rst;
logic[31:0] PC_JVal;
logic jump_en;
logic branch_en;
logic PC_Stall;
logic [31:0] PC_Out;
logic fetch_stall;
logic active;
logic [2:0] check;
initial begin
$dumpfile("PC_TB.vcd");
$dumpvars(0, PC_TB);
clk = 0;
jump_en = 0;
PC_Stall = 0;
branch_en = 0;
rst = 0;
repeat(100) begin
#50; clk = ~clk;
end
$fatal(1, "Timeout");
end
initial begin
@(posedge clk);
@(posedge clk);
@(posedge clk);
rst = 1;
@(posedge clk);
@(posedge clk);
@(posedge clk);
rst = 0;
@(posedge clk);
@(posedge clk);
@(posedge clk);
PC_Stall = 1;
@(posedge clk);
PC_Stall = 0;
@(posedge clk);
@(posedge clk);
end
PC PC(.clk(clk), .rst(rst), .PC_JVal(PC_JVal), .jump_en(jump_en), .branch_en(branch_en), .PC_Stall(PC_Stall), .PC_Out(PC_Out), .fetch_stall(fetch_stall), .active(active), .check(check));
endmodule
The issue I'm having is that the way the if statement checking PC_Stall is evaluated seems to alternate between clock cycles, and I have no clue why.
I get the following VCD output when running it with the test bench as is (not the desired output), with the PC stall not really happening (the PC value should be held for 2 cycles, but here it is held for only one).
[Waveform image: stall lasts 1 cycle]
Then, just shifting the point at which PC_Stall is asserted forward by one cycle results in the stall lasting 3 cycles, even though it's only asserted for 1.
[Waveform image: stall lasts 3 cycles]
I've been really stuck on this and genuinely have no idea what is wrong, and I would appreciate the help.
iverilog does not have very good support for SystemVerilog features yet. If you compile your code on other simulators, such as VCS on edaplayground, you will get compile errors. For example:
Error-[ICPD] Illegal combination of drivers
Illegal combination of procedural drivers
Variable "check" is driven by an invalid combination of procedural drivers.
Variables written on left-hand of "always_ff" cannot be written to by any
other processes, including other "always_ff" processes.
This variable is declared at : logic [2:0] check;
The first driver is at : always_ff @(posedge clk) begin
check[1] <= (~check[1]);
...
The second driver is at : check = 0;
You must fix all such errors.
Note, several simulators are available on edaplayground if you sign up for a free account.
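One way to avoid the multiple-driver error is to write each register from exactly one process. The sketch below is written in plain Verilog under assumptions: the module name, the port names, the synchronous active-high reset and the RESET_VECTOR value are illustrative and do not reproduce the asker's start/active handshake.

```verilog
// Sketch: a PC register written by exactly one process.
// RESET_VECTOR and all port names are assumptions for illustration.
module pc_single_driver #(
    parameter [31:0] RESET_VECTOR = 32'hBFC00000
) (
    input  wire        clk,
    input  wire        rst,          // synchronous, active high
    input  wire        stall,
    input  wire        jump_en,
    input  wire [31:0] jump_target,
    output reg  [31:0] pc
);
    always @(posedge clk) begin
        if (rst)
            pc <= RESET_VECTOR;      // reset has the highest priority
        else if (stall)
            pc <= pc;                // hold the current value
        else if (jump_en)
            pc <= jump_target;
        else
            pc <= pc + 32'd4;        // sequential fetch
    end
endmodule
```

With no initial block and no second always block assigning pc, there is only one driver for a tool to analyse.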
So it appears to be a compiler issue regarding how conditionals are treated when the inputs to those conditionals change on the same positive clock edge on which the conditionals are evaluated.
The issue was fixed by adding a small delay just before said conditional, presumably to give the values time to update. I'm not sure why it helps, and it seems like quite a botched solution, but it works.

Trouble understanding simulation/module behavior

I implemented a very simple counter with preset functionality (code reproduced below).
module counter
#(
parameter mod = 4
) (
input wire clk,
input wire rst,
input wire pst,
input wire en,
input wire [mod - 1:0] data,
output reg [mod - 1:0] out,
output reg rco
);
parameter max = (2 ** mod) - 1;
always @* begin
if(out == max) begin
rco = 1;
end else begin
rco = 0;
end
end
always @(posedge clk) begin
if(rst) begin
out <= 0;
end else if(pst) begin
out <= data;
end else if(en) begin
out <= out + 1;
end else begin
out <= out;
end
end
endmodule
I am having trouble understanding the following simulation result. With pst asserted and data set to 7 on a rising clock edge, the counter's out is set to data, as expected (first image below. out is the last signal, data is the signal just above, and above that is pst.). On the next rising edge, I kept preset asserted and set data to 0. However, out does not follow data this time. What is the cause of this behavior?
My thoughts
On the rising clock edge where I set data to 0, I notice that out stays at 7, and doesn't increment to 8. So I believe that the counter is presetting, but with the value 7, not 0. If I move the data transition from 7 to 0 up in time, out gets set to 0 as expected (image below). Am I encountering a race condition?
Testbenches
My initial testbench code that produced the first image is reproduced below. I show the changes I made to get coherent results as comments.
parameter mod = 4;
// ...
reg pst;
reg [mod - 1:0] data;
// ...
@(posedge clk); // ==> @(negedge clk)
data = 7;
pst = 1;
@(posedge clk); // ==> @(negedge clk)
data = 0;
pst = 1;
@(posedge clk); // ==> @(negedge clk)
pst = 0;
@(posedge clk);
// ...
You have a race condition in your test bench. The Verilog scheduler is allowed to evaluate any @ triggered in the time step in any order it chooses. All code after the granted @ will execute until it hits another time-blocking statement. In your waveform it looks like data and pst from the test bench are sometimes being assigned before the design samples them and sometimes after.
The solution is simple: use non-blocking assignments (<=). Refer to What is the difference between = and <= in Verilog?
@(posedge clk);
data <= 7;
pst <= 1;
@(posedge clk);
data <= 0;
pst <= 1;
@(posedge clk);
pst <= 0;
@(posedge clk);
I am able to obtain correct, predictable behavior if I modify my testbench to only modify input signals to my counter on falling clock edges rather than on rising clock edges (as it should be anyways). My best guess as to why the above behavior was occurring is that changing input signals at the same time the counter module is programmed to sample its inputs leads to undefined simulator behavior.
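For reference, here is a self-contained sketch of a test bench that drives its stimulus with non-blocking assignments. The preset_reg module is a hypothetical, stripped-down stand-in for the counter (preset path only), and the 10 ns clock period is an assumption.

```verilog
`timescale 1ns/1ps
// Sketch: a tiny preset register plus a test bench that drives it with
// non-blocking assignments. Module and signal names are hypothetical.
module preset_reg (
    input  wire       clk,
    input  wire       pst,
    input  wire [3:0] data,
    output reg  [3:0] out
);
    always @(posedge clk)
        if (pst) out <= data;
endmodule

module preset_reg_tb;
    reg        clk  = 0;
    reg        pst  = 0;
    reg  [3:0] data = 0;
    wire [3:0] out;

    preset_reg dut (.clk(clk), .pst(pst), .data(data), .out(out));

    always #5 clk = ~clk;   // assumed 10 ns clock period

    initial begin
        @(posedge clk);
        data <= 4'd7;       // updates after the DUT has sampled this edge
        pst  <= 1'b1;
        @(posedge clk);     // DUT samples data = 7 on this edge
        data <= 4'd0;
        @(posedge clk);     // DUT samples data = 0 on this edge
        pst  <= 1'b0;
        @(posedge clk);
        $display("out = %0d", out);
        $finish;
    end
endmodule
```

Because the non-blocking updates are applied after all processes triggered by the edge have read their inputs, the design always sees the values from before the edge, regardless of the order in which the simulator runs the processes.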

Verilog How to change wire by bit with clock?

module clks(
input clk,
output [15:0] led
);
wire div2, div4, div8;
reg [2:0] count = 0;
assign div2 = count[0];
assign div4 = count[1];
assign div8 = count[2];
always @(posedge clk) count = count + 1;
endmodule
How can I turn on each led (I have 15 leds) using clock?
I'm really having trouble finding helpful resources online
initial begin
case({count})
2'b00:
led = 15'b000000000000001;
2'b01:
led = 15'b000000000000010;
...
endcase
end
This didn't work.
Or could I do something like this?
led = led + 1;
In your sample code above, you defined count as 3 bits, but your case statements are 2 bits wide. Also, you don't want the initial statement, rather use an always statement.
always @(count)
begin
case(count)
3'b000 : led = 15'b000_0000_0001;
3'b001 : led = 15'b000_0000_0010;
...
endcase
end
I guess that 'by using clock' means changing the led every clock cycle, right? Also, it looks like you are trying to encode the led sequentially. In this case you can do the following:
you need to reset your led to an initial value, say 15'b1;
every clock cycle you can just shift it left by one. You should not do it in an initial block (though there is a technical way to do so). Use always blocks.
Here is an example:
module clks(
input clk,
input reset,
output reg [15:0] led
);
always @(posedge clk) begin
if (reset == 1)
led <= 15'b1;
else
led <= led << 1;
end
endmodule
In the above case '1' will travel through all bits of led once, moving one position per clock cycle. 'led' will become '0' after this. You have to make sure that it becomes '1' again if you want to continue in cycles.
Another possibility is to initialize 'led' in an initial block, but it is not always synthesizable. You do not need a reset signal here.
initial led = 15'b1;
always @(posedge clk) led <= led << 1;
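To keep the pattern cycling instead of ending at '0', the shift can be turned into a rotate. This is a sketch only: the module name and the synchronous, active-high reset are assumptions, and the width follows the 16-bit led port from the question.

```verilog
// Sketch: rotate a single lit LED so the pattern wraps around.
// The synchronous, active-high reset is an assumption.
module led_rotate (
    input  wire        clk,
    input  wire        reset,
    output reg  [15:0] led
);
    always @(posedge clk) begin
        if (reset)
            led <= 16'b1;                    // start with LED 0 on
        else
            led <= {led[14:0], led[15]};     // rotate left by one position
    end
endmodule
```

On a real board the raw clock is usually far too fast for the movement to be visible, so the update would normally be gated by a slower enable, for example one derived from a divider like the count register in the question.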

Verilog: wait for module logic evaluation in an always block

I want to use the output of another module inside an always block.
Currently the only way to make this code work is by adding #1 after the pi_in assignment so that enough time has passed to allow Pi to finish.
Relevant part from module pLayer.v:
Pi pi(pi_in,pi_out);
always @(*)
begin
for(i=0; i<constants.nSBox; i++) begin
for(j=0; j<8; j++) begin
x = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
pi_in = 8*i+j;#1; /* wait for pi to finish */
PermutedBitNo = pi_out;
y = PermutedBitNo>>3;
tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
end
end
state_out = tmp;
end
Module Pi.v:
`include "constants.v"
module Pi(in, out);
input [31:0] in;
output [31:0] out;
reg [31:0] out;
always @* begin
if (in != constants.nBits-1) begin
out = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out = constants.nBits-1;
end
end
endmodule
Delays should not be used in the final implementation, so is there another way without using #1?
In essence I want PermutedBitNo = pi_out to be evaluated only after the Pi module has finished its job with pi_in (=8*i+j) as input.
How can I block this line until Pi has finished?
Do I have to use a clock? If that's the case, please give me a hint.
update:
Based on Krouitch suggestions i modified my modules. Here is the updated version:
From pLayer.v:
Pi pi(.clk (clk),
.rst (rst),
.in (pi_in),
.out (pi_out));
counter c_i (clk, rst, stp_i, lmt_i, i);
counter c_j (clk, rst, stp_j, lmt_j, j);
always @(posedge clk)
begin
if (rst) begin
state_out = 0;
end else begin
if (c_j.count == lmt_j) begin
stp_i = 1;
end else begin
stp_i = 0;
end
// here, the logic starts
x = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
pi_in = 8*i+j;
PermutedBitNo = pi_out;
y = PermutedBitNo>>3;
tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
// at end
if (i == lmt_i-1)
if (j == lmt_j) begin
state_out = tmp;
end
end
end
endmodule
module counter(
input wire clk,
input wire rst,
input wire stp,
input wire [32:0] lmt,
output reg [32:0] count
);
always @(posedge clk or posedge rst)
if(rst)
count <= 0;
else if (count >= lmt)
count <= 0;
else if (stp)
count <= count + 1;
endmodule
From Pi.v:
always @* begin
if (rst == 1'b1) begin
out_comb = 0;
end
if (in != constants.nBits-1) begin
out_comb = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out_comb = constants.nBits-1;
end
end
always @(posedge clk) begin
if (rst)
out <= 0;
else
out <= out_comb;
end
That's a nice piece of software you have here...
The fact that this language describes hardware is not helping then.
In Verilog, what you write will simulate in zero time. It means that your loop on i and j will be completely done in zero time too. That is why you see something when you force the loop to wait for 1 time unit with #1.
So yes, you have to use a clock.
For your system to work you will have to implement counters for i and j as I see things.
A synchronous counter with reset can be written like this:
`define SIZE 10
module counter(
input wire clk,
input wire rst_n,
output reg [`SIZE-1:0] count
);
always @(posedge clk or negedge rst_n)
if(~rst_n)
count <= `SIZE'd0;
else
count <= count + `SIZE'd1;
endmodule
You specify that you want to sample pi_out only when pi_in is processed.
In a digital design it means that you want to wait one clock cycle between the moment when you are sending pi_in and the moment when you are reading pi_out.
The best solution, in my opinion, is to make your pi module sequential and then consider pi_out as a register.
To do that I would do the following:
module Pi(clk, in, out);
input clk;
input [31:0] in;
output [31:0] out;
reg [31:0] out;
wire clk;
reg [31:0] out_comb; // assigned in an always block, so it must be a reg
always @* begin
if (in != constants.nBits-1) begin
out_comb = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out_comb = constants.nBits-1;
end
end
// register the combinatorial result: out is valid one cycle after in
always @(posedge clk)
out <= out_comb;
endmodule
Briefly, if you use counters for i and j and this last Pi module, this is what will happen:
at a new clock cycle, i and j will change --> pi_in will change accordingly at the same time(in simulation)
at the next clock cycle out_comb will be stored in out and then you will have the new value of pi_out one clock cycle later than pi_in
EDIT
First of all, when writing (synchronous) processes, I would advise you to deal with only one register per process. It will make your code clearer and easier to understand/debug.
Another tip would be to separate combinatorial circuitry from sequential. It will also make your code clearer and more understandable.
If I take the example of the counter I wrote previously, it would look like:
`define SIZE 10
module counter(
input wire clk,
input wire rst_n,
output reg [`SIZE-1:0] count
);
// Two ways to write the combinatorial function (use only one of them)
// First one
wire [`SIZE-1:0] count_next;
assign count_next = count + `SIZE'd1;
// Second one (commented out so count_next is declared only once)
// reg [`SIZE-1:0] count_next;
// always @*
// count_next = count + `SIZE'd1;
always @(posedge clk or negedge rst_n)
if(~rst_n)
count <= `SIZE'd0;
else
count <= count_next;
endmodule
Here I see why you have one more cycle than expected: it is because you put the combinatorial circuitry that controls your Pi module in your synchronous process. It means that the following will happen:
first clk positive edge i and j will be evaluated
next cycle, the pi_in is evaluated
next cycle, pi_out is captured
So it makes sense that it takes 2 cycles.
To correct that, you should take the 'logic' part out of the synchronous process. As you stated in your comments, it is logic, so it should not be in the synchronous process.
Hope it helps
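The two loop indices discussed above can be sketched as a pair of counters, one register per process as advised. This is only an illustration: the module name, the J_MAX and I_MAX parameters, the synchronous active-high reset and the done flag are assumptions, not part of the asker's design.

```verilog
// Sketch: nested loop indices as hardware counters.
// J_MAX, I_MAX and the 'done' flag are assumptions for illustration.
module loop_counters #(
    parameter J_MAX = 8,   // inner loop runs j = 0 .. J_MAX-1
    parameter I_MAX = 8    // outer loop runs i = 0 .. I_MAX-1
) (
    input  wire        clk,
    input  wire        rst,    // synchronous, active high
    output reg  [31:0] i,
    output reg  [31:0] j,
    output wire        done    // high during the last (i, j) pair
);
    wire j_last = (j == J_MAX - 1);
    wire i_last = (i == I_MAX - 1);

    assign done = i_last && j_last;

    // Inner index: advances every clock cycle and wraps.
    always @(posedge clk) begin
        if (rst || j_last)
            j <= 32'd0;
        else
            j <= j + 32'd1;
    end

    // Outer index: advances only when the inner index wraps.
    always @(posedge clk) begin
        if (rst)
            i <= 32'd0;
        else if (j_last)
            i <= i_last ? 32'd0 : i + 32'd1;
    end
endmodule
```

With i and j produced this way, pi_in can be a purely combinatorial function of them (8*i+j), and the registered pi_out is then available one clock cycle later.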

Verilog HDL, User Input from an FPGA

I am currently working on a project in Verilog HDL with an FPGA obtained from my school (I am running Quartus II vers. 10.1 and 11.0 (I've tried both)). I am getting a very bizarre bug that I cannot figure out for the life of me.
I am developing a Morse Code program which detects dots and dashes, and then outputs the appropriate letter on a HEX display based upon this input. The HEX display works beautifully, but my UserInput module doesn't seem to do anything at all!
module UserInput(Clock, reset, in, out);
input Clock, reset, in;
output reg [1:0] out;
wire [2:0] PS;
reg [2:0] NS;
parameter NONE = 2'b00, DOT = 2'b01, DASH = 2'b11; //For Output
parameter UP = 3'b000, SHORT0 = 3'b001, SHORT1 = 3'b010, UP_DOT = 3'b011, LONG = 3'b100, UP_DASH = 3'b101;
//Active High
always @(PS or in)
case (PS)
UP: if (in) NS = SHORT0;
else NS = UP;
SHORT0: if (in) NS = SHORT1;
else NS = UP_DOT;
SHORT1: if (in) NS = LONG;
else NS = UP_DOT;
UP_DOT: NS = UP;
LONG: if (in) NS = LONG;
else NS = UP_DASH;
UP_DASH: NS = UP;
default: NS = 3'bxxx;
endcase
always @(PS)
case (PS)
UP: out = NONE;
SHORT0: out = NONE;
SHORT1: out = NONE;
UP_DOT: out = DOT;
LONG: out = NONE;
UP_DASH: out = DASH;
default: out = 2'bxx;
endcase
D_FF dff0 (PS[0], NS[0], reset, Clock);
D_FF dff1 (PS[1], NS[1], reset, Clock);
D_FF dff2 (PS[2], NS[2], reset, Clock);
endmodule
module D_FF (q, d, reset, clk);
input d, reset, clk;
output reg q;
always @(posedge clk or posedge reset)
begin
if (reset) q = 0;
else q = d;
end
endmodule
The input for the module is a KEY on the FPGA. The FSM represented by the UserInput module has the key be in the "UP" state at t=0. Then, if there is input, it will move through SHORT0 or SHORT1, and finally LONG. If the key is released at any of these states, they go to their appropriate intermediary UP states and provide an output of "DOT" or "DASH".
However, when I connect this to my FPGA, I get nothing. From my testing, it seems that it never moves away from the "UP" state. Even my simulations give me nothing. Secondly, I've tried connecting a different UserInput module from a different project (one I know works), and still nothing. Is there something going on in the background of Verilog I am missing?
Here is an image of the simulation waveform:
DFF 0, 1, and 2 are bits 0, 1, and 2 of PS. My simulator won't let me display NS.
Your code looks bad to me (which I guess you want to hear as your code doesn't work). It looks like a combination of timing problems and a design flaw.
Let's walk through your waveform view and see if we can't work out what's going on.
signal in goes high, which triggers an always block. PS is 0 so we set NS to 1. This is not in time for the rising clock edge so it's not triggered in the DFF (as you'd have suspected), never mind it'll be caught on the next clock edge.
signal in goes low, which triggers an always block, PS is 0 so we set NS to 0. This happens in time for the rising clock edge and is captured in the DFF (argh we missed the NS signal going to 1 as we wanted).
Also, someone mentioned that there's an error with your flip-flop being clocked while reset is asserted. This isn't a problem: with posedge reset in the D_FF sensitivity list the reset is asynchronous, so the DFF is cleared to 0 as soon as reset is asserted and stays at 0 until reset is released.
So, what's the solution (looks like homework to me, so hopefully you've fixed this already!):
It should look something like this (I haven't simulated it, so no guarantees):
module UserInput (clk, reset, in, out);
input clk, reset, in;
output reg [1:0] out;
// output parameters
parameter IDLE = 2'b00;
parameter DOT = 2'b01;
parameter DASH = 2'b10;
// FSM states
parameter LOW = 3'b000;
parameter SHORT1 = 3'b001;
parameter SHORT2 = 3'b010;
parameter LONG = 3'b100;
reg [2:0] state;
// assigned in an always block, so these are declared as reg
reg [1:0] next_out;
reg [2:0] next_state;
// sequential part: register the state and the output
always @(posedge clk)
begin
if (reset)
begin
out <= IDLE;
state <= LOW;
end
else
begin
out <= next_out;
state <= next_state;
end
end
// combinatorial part: next state and next output
always @(*)
begin
case (state)
LOW:
begin
next_out = IDLE;
next_state = (in ? SHORT1 : LOW);
end
SHORT1:
begin
next_state = (in ? SHORT2 : LOW);
next_out = (in ? IDLE : DOT);
end
SHORT2:
begin
next_state = (in ? LONG : LOW);
next_out = (in ? IDLE : DOT);
end
LONG:
begin
next_state = (in ? LONG : LOW);
next_out = (in ? IDLE : DASH);
end
default:
begin
// we shouldn't get here!!
next_state = LOW;
next_out = IDLE;
end
endcase
end
endmodule
So what's going on here:
I think this should be fairly obvious. When the in signal moves from high to low then we want to output the current state (LONG as a DASH, SHORT1 and SHORT2 as a DOT), otherwise we output IDLE. If the in signal is high then we want to move the state along depending on how long it's been high for.
There's an error with this code which won't affect simulation, but which will almost certainly affect you on the FPGA: if you are getting input from an external source then you'll need to buffer it through a series of flip-flops to prevent metastability problems. This can be fixed by adding a series of D flip-flops to capture the in signal and then passing this "cleaned" buffered_in to the UserInput.
ie:
module in_buffer (clk, reset, in, out);
input clk, reset, in;
output reg out;
reg buf1, buf2;
always @(posedge clk)
begin
if (reset)
begin
out <= 0;
buf1 <= 0;
buf2 <= 0;
end
else
begin
// shift the input through the chain, one stage per clock
out <= buf2;
buf2 <= buf1;
buf1 <= in;
end
end
endmodule

Resources