Absolute value in Verilog (sequential design) - verilog

I want to compute the absolute value of a register inside a clocked always @ block, but I'm getting the abs of the previous value instead of the current one.
I saw this before and here is what I am doing:
reg signed [7:0] x;
reg signed [7:0] xabs;
...
always @ (posedge CLK or posedge RST)
begin
...
if($signed(x) < 0)
xabs <= -$signed(x);
else
xabs <= x;
...
end
Is there anything that I am doing wrong?
waveform:

If you use always @(posedge ...), then xabs will be assigned one cycle after x is assigned.
If you want xabs to change as soon as x changes, you can use an always @*, which triggers immediately.
reg [32-1:0] xabs;
always @* begin
...
if($signed(x) < 0)
xabs = -$signed(x);
else
xabs = x;
...
end
Or you can make xabs a wire and use an assign.
wire [32-1:0] xabs;
assign xabs = ($signed(x) < 0) ? -$signed(x) : x;
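The difference between the registered and the combinational version can be checked with a small self-checking testbench. This is only a sketch: the module name abs_demo, the 9-bit result width, and the stimulus values are assumptions made for illustration, not part of the original design. The extra bit is there because -128 has no positive counterpart in 8 signed bits.

```verilog
// Hypothetical self-checking testbench; module and signal names are illustrative.
module abs_demo;
  reg               clk = 0;
  reg  signed [7:0] x   = 0;
  wire signed [8:0] xw  = x;        // sign-extended copy, one bit wider than x
  wire        [8:0] xabs_comb;      // combinational: follows x in the same time step
  reg         [8:0] xabs_reg = 0;   // registered: updates on the next rising edge

  // The extra bit lets -128 map to 128 instead of wrapping around.
  assign xabs_comb = (xw < 0) ? -xw : xw;

  always #5 clk = ~clk;

  always @(posedge clk)
    xabs_reg <= xabs_comb;

  initial begin
    @(negedge clk);
    x = -8'sd5;
    #1;
    if (xabs_comb !== 9'd5) $display("FAIL: combinational abs");
    if (xabs_reg  !== 9'd0) $display("FAIL: registered abs changed early");
    @(posedge clk);
    #1;
    if (xabs_reg  !== 9'd5) $display("FAIL: registered abs after the edge");
    x = 8'sh80;                     // -128, the most negative 8-bit value
    #1;
    if (xabs_comb !== 9'd128) $display("FAIL: abs(-128)");
    $display("abs_demo finished");
    $finish;
  end
endmodule
```

If the one-cycle latency of the registered version is acceptable, the clocked form is usually the easier one to time; the combinational form is the one that matches "xabs follows x immediately".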

Related

Trouble understanding simulation/module behavior

I implemented a very simple counter with preset functionality (code reproduced below).
module counter
#(
parameter mod = 4
) (
input wire clk,
input wire rst,
input wire pst,
input wire en,
input wire [mod - 1:0] data,
output reg [mod - 1:0] out,
output reg rco
);
parameter max = (2 ** mod) - 1;
always @* begin
if(out == max) begin
rco = 1;
end else begin
rco = 0;
end
end
always @(posedge clk) begin
if(rst) begin
out <= 0;
end else if(pst) begin
out <= data;
end else if(en) begin
out <= out + 1;
end else begin
out <= out;
end
end
endmodule
I am having trouble understanding the following simulation result. With pst asserted and data set to 7 on a rising clock edge, the counter's out is set to data, as expected (first image below. out is the last signal, data is the signal just above, and above that is pst.). On the next rising edge, I kept preset asserted and set data to 0. However, out does not follow data this time. What is the cause of this behavior?
My thoughts
On the rising clock edge where I set data to 0, I notice that out stays at 7, and doesn't increment to 8. So I believe that the counter is presetting, but with the value 7, not 0. If I move the data transition from 7 to 0 up in time, out gets set to 0 as expected (image below). Am I encountering a race condition?
Testbenches
My initial testbench code that produced the first image is reproduced below. I show the changes I made to get coherent results as comments.
parameter mod = 4;
// ...
reg pst;
reg [mod - 1:0] data;
// ...
@(posedge clk); // ==> @(negedge clk)
data = 7;
pst = 1;
@(posedge clk); // ==> @(negedge clk)
data = 0;
pst = 1;
@(posedge clk); // ==> @(negedge clk)
pst = 0;
@(posedge clk);
// ...
You have a race condition in your test bench. The Verilog scheduler is allowed to evaluate any @ triggered in the same time step in any order it chooses. All code after the granted @ will execute until it hits another time-blocking statement. In your waveform it looks like data and pst from the test bench are sometimes being assigned before the design samples them and sometimes after.
The solution is simple: use non-blocking assignments (<=). Refer to "What is the difference between = and <= in Verilog?"
@(posedge clk);
data <= 7;
pst <= 1;
@(posedge clk);
data <= 0;
pst <= 1;
@(posedge clk);
pst <= 0;
@(posedge clk);
I am able to obtain correct, predictable behavior if I modify my testbench to only modify input signals to my counter on falling clock edges rather than on rising clock edges (as it should be anyways). My best guess as to why the above behavior was occurring is that changing input signals at the same time the counter module is programmed to sample its inputs leads to undefined simulator behavior.
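The effect of driving stimulus with non-blocking assignments on the sampling edge can be reproduced in isolation. The following is a minimal sketch, assuming a plain 8-bit register as the design under test; the names dff8 and tb_nonblocking are hypothetical and stand in for the counter above.

```verilog
// Hypothetical DUT: a plain register that samples d on the rising edge.
module dff8 (
  input  wire       clk,
  input  wire [7:0] d,
  output reg  [7:0] q
);
  always @(posedge clk)
    q <= d;
endmodule

// Hypothetical testbench: stimulus changes on the same edge the DUT samples on.
module tb_nonblocking;
  reg        clk = 0;
  reg  [7:0] d   = 0;
  wire [7:0] q;

  dff8 dut (.clk(clk), .d(d), .q(q));

  always #5 clk = ~clk;

  initial begin
    @(posedge clk); d <= 8'd7;  // DUT samples the old d here; d updates afterwards
    @(posedge clk); d <= 8'd0;  // DUT samples 7 on this edge
    @(negedge clk);
    if (q !== 8'd7) $display("FAIL: expected 7, got %0d", q);
    @(posedge clk);             // DUT samples 0 on this edge
    @(negedge clk);
    if (q !== 8'd0) $display("FAIL: expected 0, got %0d", q);
    $display("tb_nonblocking finished");
    $finish;
  end
endmodule
```

Because the update of d is deferred until after all blocks triggered by the edge have read their inputs, the result does not depend on the order in which the simulator runs the testbench and the DUT.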

modelsim programming 60 counter (error loading design)

My code compiles fine, but it does not work when I simulate it.
ModelSim displays "error loading design".
I think the input and output ports are connected incorrectly between these modules,
but I cannot find the mistake.
Please help me find the error in my code.
module tb_modulo_60_binary;
reg t_clk, reset;
wire [7:0] t_Y;
parameter sec = 30;
always #(sec) t_clk = ~t_clk;
modulo_60_binary M1 (t_Y, t_clk, reset);
initial begin
t_clk = 1; reset =1; #10;
reset = 0; #3050;
$finish;
end
endmodule
module modulo_60_binary(y, clk, reset);
output [7:0] y;
input reset, clk;
wire TA1, TA2, TA3, JA2, JA4;
reg [7:0] y;
assign TA1 = 1;
assign TA2 = (~y[6]) && y[4];
assign TA3 = (y[5] && y[4]) || (y[6] && y[4]);
assign JA2 = ~y[3];
assign JA4 = y[1]&&y[2];
jk_flip_flop JK1 (1, 1, clk, y[0]);
jk_flip_flop JK2 (JA2, 1, y[0], y[1]);
jk_flip_flop JK3 (1, 1, y[1], y[2]);
jk_flip_flop JK4 (JA4, 1, y[1], y[3]);
t_flip_flop T1 (TA1, clk, y[4]);
t_flip_flop T2 (TA2, clk, y[5]);
t_flip_flip T3 (TA3, clk, y[6]);
always @(negedge clk)
begin
if(reset)
y <= 8'b00000000;
else if(y == 8'b01110011)
y <= 8'b00000000;
end
endmodule
module t_flip_flop(t, clk, q);
input t, clk;
output q;
reg q;
initial q=0;
always @(negedge clk)
begin
if(t == 0) q <= q;
else q <= ~q;
end
endmodule
module jk_flip_flop(j, k, clk, Q);
output Q;
input j, k, clk;
reg Q;
always @(negedge clk)
if({j,k} == 2'b00) Q <= Q;
else if({j,k} == 2'b01) Q <= 1'b0;
else if({j,k} == 2'b10) Q <= 1'b1;
else if({j,k} == 2'b11) Q <= ~Q;
endmodule
Your y signal in modulo_60_binary is being driven in two places:
By bit JK# and T# instances
The reset logic that assigns all bits of y to zeros
Flops and comb-logic must have one clear driver. This is one of the fundamental differences between software and hardware languages.
The rest of my answer assumes that using the JK and T flops is a design requirement. Therefore you need to delete the always block that assigns y to zeros and make y a wire type.
Fixing the logic to the T flops is easy. Simply add a conditional statement. Example:
wire do_rst = reset || (y == 8'b01110011);
assign TA1 = do_rst ? y[4] : 1;
assign TA2 = do_rst ? y[5] : (~y[6]) && y[4];
assign TA3 = do_rst ? y[6] : (y[5] && y[4]) || (y[6] && y[4]);
The JK flops are harder because the output of one flop is the clock of another. I advise that the clock input for each JK flop should be clk; otherwise you are asking for a design headache on reset when its y bits are not power-of-two-minus-one values (e.g. 1, 3, 7, 15). This means you need to re-evaluate your JA# logic and add KA# logic (hint: the do_rst from above will help). I'm not going to do the work for you beyond this.
There is the option of the asynchronous reset approach, but for this design I advise against it. The reset pulse could be too short on silicon with the conditional reset for y == a particular value(s), which can result in an undependable partial reset. You could add synthesis constraints/rules to keep the pulse wide enough, but that is just patching a brittle design. Better to design it robustly from the beginning.
FYI: y[7] does not have a driver and the module declaration of instance T3 has a typo.
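The do_rst gating of the T inputs can be tried on a single flop. This is a minimal sketch, assuming a T flop like the one in the question with the T input always 1 in normal operation; the names t_ff and tb_t_clear are hypothetical. While do_rst is high the flop toggles only if it currently holds 1, so it ends up at 0 and stays there, and q keeps exactly one driver.

```verilog
// T flip-flop with a single driver for q (same behaviour as in the question).
module t_ff (
  input  wire t,
  input  wire clk,
  output reg  q
);
  initial q = 1'b0;
  always @(negedge clk)
    if (t) q <= ~q;
endmodule

// Hypothetical testbench: clears the flop through its T input only.
module tb_t_clear;
  reg  clk    = 1;
  reg  do_rst = 0;
  wire q;
  // While do_rst is high, toggle only when q is 1, which forces q to 0.
  wire t = do_rst ? q : 1'b1;

  t_ff u (.t(t), .clk(clk), .q(q));

  always #5 clk = ~clk;

  initial begin
    @(negedge clk); #1;
    if (q !== 1'b1) $display("FAIL: flop did not toggle to 1");
    do_rst = 1;
    @(negedge clk); #1;
    if (q !== 1'b0) $display("FAIL: flop was not cleared");
    @(negedge clk); #1;
    if (q !== 1'b0) $display("FAIL: flop did not stay cleared");
    $display("tb_t_clear finished");
    $finish;
  end
endmodule
```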

Verilog: wait for module logic evaluation in an always block

I want to use the output of another module inside an always block.
Currently the only way to make this code work is by adding #1 after the pi_in assignment so that enough time has passed to allow Pi to finish.
Relevant part from module pLayer.v:
Pi pi(pi_in,pi_out);
always @(*)
begin
for(i=0; i<constants.nSBox; i++) begin
for(j=0; j<8; j++) begin
x = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
pi_in = 8*i+j;#1; /* wait for pi to finish */
PermutedBitNo = pi_out;
y = PermutedBitNo>>3;
tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
end
end
state_out = tmp;
end
Module Pi.v:
`include "constants.v"
module Pi(in, out);
input [31:0] in;
output [31:0] out;
reg [31:0] out;
always @* begin
if (in != constants.nBits-1) begin
out = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out = constants.nBits-1;
end
end
endmodule
Delays should not be used in the final implementation, so is there another way without using #1?
In essence I want PermutedBitNo = pi_out to be evaluated only after the Pi module has finished its job with pi_in (=8*i+j) as input.
How can I block this line until Pi has finished?
Do I have to use a clock? If that's the case, please give me a hint.
update:
Based on Krouitch's suggestions I modified my modules. Here is the updated version:
From pLayer.v:
Pi pi(.clk (clk),
.rst (rst),
.in (pi_in),
.out (pi_out));
counter c_i (clk, rst, stp_i, lmt_i, i);
counter c_j (clk, rst, stp_j, lmt_j, j);
always @(posedge clk)
begin
if (rst) begin
state_out = 0;
end else begin
if (c_j.count == lmt_j) begin
stp_i = 1;
end else begin
stp_i = 0;
end
// here, the logic starts
x = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
pi_in = 8*i+j;
PermutedBitNo = pi_out;
y = PermutedBitNo>>3;
tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
// at end
if (i == lmt_i-1)
if (j == lmt_j) begin
state_out = tmp;
end
end
end
endmodule
module counter(
input wire clk,
input wire rst,
input wire stp,
input wire [32:0] lmt,
output reg [32:0] count
);
always @(posedge clk or posedge rst)
if(rst)
count <= 0;
else if (count >= lmt)
count <= 0;
else if (stp)
count <= count + 1;
endmodule
From Pi.v:
always @* begin
if (rst == 1'b1) begin
out_comb = 0;
end
if (in != constants.nBits-1) begin
out_comb = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out_comb = constants.nBits-1;
end
end
always @(posedge clk) begin
if (rst)
out <= 0;
else
out <= out_comb;
end
That's a nice piece of software you have here...
The fact that this language describes hardware is not helping then.
In Verilog, what you write simulates in zero time. That means your loops over i and j also complete in zero time. That is why you only see something when you force the loop to wait for 1 time unit with #1.
So yes, you have to use a clock.
For your system to work you will have to implement counters for i and j, as I see things.
A synchronous counter with reset can be written like this:
`define SIZE 10
module counter(
input wire clk,
input wire rst_n,
output reg [`SIZE-1:0] count
);
always @(posedge clk or negedge rst_n)
if(~rst_n)
count <= `SIZE'd0;
else
count <= count + `SIZE'd1;
endmodule
You specify that you want to sample pi_out only when pi_in is processed.
In a digital design it means that you want to wait one clock cycle between the moment when you are sending pi_in and the moment when you are reading pi_out.
The best solution, in my opinion, is to make your pi module sequential and then consider pi_out as a register.
To do that I would do the following:
module Pi(clk, in, out);
input clk;
input [31:0] in;
output [31:0] out;
reg [31:0] out;
wire clk;
reg [31:0] out_comb; // assigned inside an always block, so it must be a reg
always @* begin
if (in != constants.nBits-1) begin
out_comb = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out_comb = constants.nBits-1;
end
end
always @(posedge clk)
out <= out_comb;
endmodule
Quickly if you use counters for i and j and this last pi module this is what will happen:
at a new clock cycle, i and j will change --> pi_in will change accordingly at the same time(in simulation)
at the next clock cycle out_comb will be stored in out and then you will have the new value of pi_out one clock cycle later than pi_in
EDIT
First of all, when writing (synchronous) processes, I would advise you to deal with only one register per process. It will make your code clearer and easier to understand/debug.
Another tip would be to separate combinatorial circuitry from sequential. It will also make your code clearer and more understandable.
If I take the example of the counter I wrote previously it would look like :
`define SIZE 10
module counter(
input wire clk,
input wire rst_n,
output reg [`SIZE-1:0] count
);
//Two ways to write the combinatorial function (keep only one of them)
//First one
wire [`SIZE-1:0] count_next;
assign count_next = count + `SIZE'd1;
//Second one
reg [`SIZE-1:0] count_next;
always @*
count_next = count + `SIZE'd1;
always @(posedge clk or negedge rst_n)
if(~rst_n)
count <= `SIZE'd0;
else
count <= count_next;
endmodule
Here I see why you have one more cycle than expected: you put the combinatorial circuitry that controls your pi module inside your synchronous process. It means that the following will happen:
on the first clk positive edge, i and j are evaluated
on the next cycle, pi_in is evaluated
on the next cycle, pi_out is captured
So it makes sense that it takes 2 cycles.
To correct that, you should take the 'logic' part out of the synchronous process. As you stated in your comments, it is logic, so it should not be in the synchronous process.
Hope it helps
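The separation described above can be sketched as follows: the counters for i and j are the only registers, and pi_in is derived from them with a continuous assignment, so it changes in the same cycle as the counters. The module name index_gen, the bit widths, and the active-high synchronous reset are assumptions chosen for illustration, not taken from the original pLayer design.

```verilog
// Hypothetical index generator: one register per process, logic kept outside.
module index_gen (
  input  wire       clk,
  input  wire       rst,
  output reg  [1:0] i,       // assumed width: 4 outer iterations
  output reg  [2:0] j,       // 8 inner iterations, as in the original loop
  output wire [4:0] pi_in
);
  // Combinational: no extra clock cycle between the counters and pi_in.
  assign pi_in = 8*i + j;

  always @(posedge clk)
    if (rst) j <= 3'd0;
    else     j <= j + 3'd1;            // wraps from 7 back to 0

  always @(posedge clk)
    if (rst)            i <= 2'd0;
    else if (j == 3'd7) i <= i + 2'd1; // advance when the inner count wraps
endmodule

// Hypothetical testbench: pi_in should step through 0..31, one value per cycle.
module tb_index_gen;
  reg        clk = 0;
  reg        rst = 1;
  wire [1:0] i;
  wire [2:0] j;
  wire [4:0] pi_in;
  integer    n;

  index_gen g (.clk(clk), .rst(rst), .i(i), .j(j), .pi_in(pi_in));

  always #5 clk = ~clk;

  initial begin
    @(negedge clk);
    rst = 0;
    for (n = 0; n < 32; n = n + 1) begin
      if (pi_in !== n) $display("FAIL: expected %0d, got %0d", n, pi_in);
      @(negedge clk);
    end
    $display("tb_index_gen finished");
    $finish;
  end
endmodule
```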

Verilog code 2 errors i can't find: Would be grateful for an extra pair of eyes to spot a mistake i might've overlooked

I'm writing Verilog code that reads two files and saves those numbers into registers. I then multiply and add them; it is pretty much a multiply-accumulator. However, I'm having a frustrating time with the code I have. It reads the numbers from the files correctly and it multiplies, but here is the problem: when I first run it in ModelSim, I reset everything so I can clear out the accumulator. I then start the program, but there is always a huge delay in my "macc_out" and I cannot figure out why. This delay should not be there; instead it should be outputting the result A*B+MAC. Even after the delay, it's not producing the correct output. My second problem is that if I go from reset high to low (start the program) and then back to reset high (to reset all my values), they do not reset! This is frustrating since I've been working on this for a week and can't see the bug. I'm asking for an extra set of eyes to see if you can spot my mistake. Attached is my code with the instantiations and also my ModelSim functional waveform. Any help is appreciated!
module FSM(clk,start,reset,done,clock_count);
input clk, start, reset;
output reg done;
output reg[10:0] clock_count;
reg [0:0] macc_clear;
reg[5:0] Aread, Bread, Cin;
wire signed [7:0] a, b;
wire signed [18:0] macc_out;
reg [3:0] i,j,m;
reg add;
reg [0:0] go;
reg[17:0] c;
parameter n = 8;
reg[1:0] state;
reg [1:0] S0 = 2'b00;
reg [1:0] S1 = 2'b01;
reg [1:0] S2 = 2'b10;
reg [1:0] S3 = 2'b11;
ram_A Aout(.clk(clk), .addr(Aread), .q(a));
ram_B Bout(.clk(clk), .addr(Bread), .q(b));
mac macout(.clk(clk), .macc_clear(macc_clear), .A(a), .B(b), .macc_out(macc_out), .add(add));
ram_C C_in(.clk(clk), .addr(Cin), .q(c));
always @(posedge clk) begin
if (reset == 1) begin
i <= 0;
add<=0;
j <= 0;
m <= 0;
clock_count <= 0;
go <= 0;
macc_clear<=1;
end
else
state<=S0;
case(state)
S0: begin
// if (reset) begin
// i <= 0;
// add<=0;
// j <= 0;
// m <= 0;
// clock_count <= 0;
// go <= 0;
// macc_clear<=1;
// state <= S0;
// end
macc_clear<=1;
done<=0;
state <= S1;
end
S1: begin
add<=1;
macc_clear<=0;
clock_count<=clock_count+1;
m<=m+1;
Aread <= 8*m + i;
Bread <= 8*j + m;
if (m==7) begin
state <= S2;
macc_clear<=1;
add<=0;
end
else
state <=S1;
end
S2: begin
add<=1;
macc_clear<=0;
m<=0;
i<=i+1;
if (i<7)
state<=S1;
else if (i==8) begin
state<=S3;
add<=0;
end
end
S3: begin
add<=1;
i<=0;
j<=j+1;
if(j<7)
state<=S1;
else begin
state<=S0;
done<=1;
add<=0;
end
end
endcase
end
always @ (posedge macc_clear) begin
Cin <= 8*j + i;
c <= macc_out;
end
endmodule
module mac(clk, macc_clear, A, B, macc_out, add);
input clk, macc_clear;
input signed [7:0] A, B;
input add;
output reg signed [18:0] macc_out;
reg signed [18:0] MAC;
always @( posedge clk) begin
if (macc_clear) begin
macc_out <= MAC;
MAC<=0;
end
else if (add) begin
MAC<=(A*B)+ MAC;
macc_out<=MAC;
end
end
endmodule
module ram_A( clk, addr,q);
output reg[7:0] q;
input [5:0] addr;
input clk;
reg [7:0] mem [0:63];
initial begin
$readmemb("ram_a_init.txt", mem);
end
always @(posedge clk) begin
q <= mem[addr];
end
endmodule
module ram_C(clk,addr, q);
input [18:0] q;
input [5:0] addr;
input clk;
reg [18:0] mem [0:63];
always @(posedge clk) begin
mem[addr] <= q;
end
endmodule
ModelSim Functional Simulation Wave Form
1) Take a look at the schematic view for your MACC module - I think some of your "problems" will be obvious from that;
2) Consider using an always @(*) (combinational) block for your FSM control signals (stuff like add or macc_clear) rather than an always @(posedge clk) (sequential) block; it makes the logic to assert them easier. Right now they're registered, so you have a cycle delay;
3) In your MAC, you clear the MAC register on a reset, but you don't clear the macc_out register.
In short, I think you need to step back, and consider which signals are combinational logic, and which ones are sequential and need to be in registers.
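Point 3 can be illustrated with a reduced accumulator in which the clear applies to the only output register. This is a sketch under assumptions: it merges MAC and macc_out into a single register acc, which may not match what the surrounding FSM expects, and the names mac_sketch and tb_mac_sketch are hypothetical.

```verilog
// Hypothetical reduced MAC: one accumulator register, cleared synchronously.
module mac_sketch (
  input  wire               clk,
  input  wire               clear,
  input  wire               add,
  input  wire signed [7:0]  A,
  input  wire signed [7:0]  B,
  output reg  signed [18:0] acc
);
  always @(posedge clk)
    if (clear)    acc <= 19'sd0;
    else if (add) acc <= acc + A * B;  // all operands signed: signed arithmetic
endmodule

// Hypothetical testbench: accumulates 3*(-2) + 4*5 and expects 14.
module tb_mac_sketch;
  reg               clk   = 0;
  reg               clear = 1;
  reg               add   = 0;
  reg signed [7:0]  A     = 0;
  reg signed [7:0]  B     = 0;
  wire signed [18:0] acc;

  mac_sketch m (.clk(clk), .clear(clear), .add(add), .A(A), .B(B), .acc(acc));

  always #5 clk = ~clk;

  initial begin
    @(negedge clk); clear = 0; add = 1; A = 3; B = -2;
    @(negedge clk); A = 4; B = 5;
    @(negedge clk); add = 0;
    if (acc !== 19'sd14) $display("FAIL: expected 14, got %0d", acc);
    @(negedge clk); clear = 1;
    @(negedge clk);
    if (acc !== 19'sd0) $display("FAIL: clear did not reset acc");
    $display("tb_mac_sketch finished");
    $finish;
  end
endmodule
```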

My verilog VGA driver causes the screen to flicker (Basys2)

I'm trying to recreate Adventure(1979) in Verilog and so far I have character movement, collision and map generation done. It didn't flicker that much before I separated the maps into modules now it flickers constantly. When I was looking up this issue, I found out that the clock on the Basys2 board is pretty noisy and could be the culprit. However, putting the maps into modules shouldn't have made it worse unless I messed something up. Any idea what happened?
Here's my map generator:
module map_generator(clk_vga, reset, CurrentX, CurrentY, HBlank, VBlank, playerPosX, playerPosY, mapData
);
input clk_vga;
input reset;
input [9:0]CurrentX;
input [8:0]CurrentY;
input HBlank;
input VBlank;
input [9:0]playerPosX;
input [8:0]playerPosY;
output [7:0]mapData;
reg [7:0]mColor;
reg [5:0]currentMap = 0;
wire [7:0]startCastle;
StartCastle StartCastle(
.clk_vga(clk_vga),
.CurrentX(CurrentX),
.CurrentY(CurrentY),
.mapData(startCastle)
);
always @(posedge clk_vga) begin
if(reset)begin
currentMap <= 0;
end
end
always @(posedge clk_vga) begin
if(HBlank || VBlank) begin
mColor <= 0;
end
else begin
if(currentMap == 4'b0000) begin
mColor[7:0] <= startCastle[7:0];
end
//Add more maps later
end
end
assign mapData[7:0] = mColor[7:0];
endmodule
Here's the startCastle:
module StartCastle(clk_vga, CurrentX, CurrentY, active, mapData);
input clk_vga;
input [9:0]CurrentX;
input [8:0]CurrentY;
input active;
output [7:0]mapData;
reg [7:0]mColor;
always @(posedge clk_vga) begin
if(CurrentY < 40) begin
mColor[7:0] <= 8'b11100000;
end
else if(CurrentX < 40) begin
mColor[7:0] <= 8'b11100000;
end
else if(~(CurrentX < 600)) begin
mColor[7:0] <= 8'b11100000;
end
else if((~(CurrentY < 440) && (CurrentX < 260)) || (~(CurrentY < 440) && ~(CurrentX < 380))) begin
mColor[7:0] <= 8'b11100000;
end else
mColor[7:0] <= 8'b00011100;
end
assign mapData = mColor;
endmodule
Here's the VGA driver which is connected to my top module:
module vga_driver(clk_50MHz, vs_vga, hs_vga, RED, GREEN, BLUE, HBLANK, VBLANK, CURX, CURY, COLOR, CLK_DATA, RESET);
input clk_50MHz;
output vs_vga;
output hs_vga;
output [2:0] RED;
output [2:0] GREEN;
output [1:0] BLUE;
output HBLANK;
output VBLANK;
reg VS = 0;
reg HS = 0;
input RESET;
//current client data
input [7:0] COLOR;
output CLK_DATA;
output [9:0] CURX;
output [8:0] CURY;
//##### Module constants (http://tinyvga.com/vga-timing/640x480@60Hz)
parameter HDisplayArea = 640; // horizontal display area
parameter HLimit = 800; // maximum horizontal amount (limit)
parameter HFrontPorch = 16; // h. front porch
parameter HBackPorch = 48; // h. back porch
parameter HSyncWidth = 96; // h. pulse width
parameter VDisplayArea = 480; // vertical display area
parameter VLimit = 525; // maximum vertical amount (limit)
parameter VFrontPorch = 10; // v. front porch
parameter VBackPorch = 33; // v. back porch
parameter VSyncWidth = 2; // v. pulse width
//##### Local variables
wire clk_25MHz;
reg [9:0] CurHPos = 0; //maximum of HLimit (2^10 - 1 = 1023)
reg [9:0] CurVPos = 0; //maximum of VLimit
reg HBlank_reg, VBlank_reg, Blank = 0;
reg [9:0] CurrentX = 0; //maximum of HDisplayArea
reg [8:0] CurrentY = 0; //maximum of VDisplayArea (2^9 - 1 = 511)
//##### Submodule declaration
clock_divider clk_div(.clk_in(clk_50MHz), .clk_out(clk_25MHz));
//shifts the clock by half a period (negates it)
//see timing diagrams for a better understanding of the reason for this
clock_shift clk_shift(.clk_in(clk_25MHz), .clk_out(CLK_DATA));
//simulate the vertical and horizontal positions
always @(posedge clk_25MHz) begin
if(CurHPos < HLimit-1) begin
CurHPos <= CurHPos + 1;
end
else begin
CurHPos <= 0;
if(CurVPos < VLimit-1)
CurVPos <= CurVPos + 1;
else
CurVPos <= 0;
end
if(RESET) begin
CurHPos <= 0;
CurVPos <= 0;
end
end
//##### VGA Logic (http://tinyvga.com/vga-timing/640x480@60Hz)
//HSync logic
always @(posedge clk_25MHz)
if((CurHPos < HSyncWidth) && ~RESET)
HS <= 1;
else
HS <= 0;
//VSync logic
always @(posedge clk_25MHz)
if((CurVPos < VSyncWidth) && ~RESET)
VS <= 1;
else
VS <= 0;
//Horizontal logic
always @(posedge clk_25MHz)
if((CurHPos >= HSyncWidth + HFrontPorch) && (CurHPos < HSyncWidth + HFrontPorch + HDisplayArea) || RESET)
HBlank_reg <= 0;
else
HBlank_reg <= 1;
//Vertical logic
always @(posedge clk_25MHz)
if((CurVPos >= VSyncWidth + VFrontPorch) && (CurVPos < VSyncWidth + VFrontPorch + VDisplayArea) || RESET)
VBlank_reg <= 0;
else
VBlank_reg <= 1;
//Do not output any color information when we are in the vertical
//or horizontal blanking areas. Set a boolean to keep track of this.
always @(posedge clk_25MHz)
if((HBlank_reg || VBlank_reg) && ~RESET)
Blank <= 1;
else
Blank <= 0;
//Keep track of the current "real" X position. This is the actual current X
//pixel location abstracted away from all the timing details
always @(posedge clk_25MHz)
if(HBlank_reg && ~RESET)
CurrentX <= 0;
else
CurrentX <= CurHPos - HSyncWidth - HFrontPorch;
//Keep track of the current "real" Y position. This is the actual current Y
//pixel location abstracted away from all the timing details
always @(posedge clk_25MHz)
if(VBlank_reg && ~RESET)
CurrentY <= 0;
else
CurrentY <= CurVPos - VSyncWidth - VFrontPorch;
assign CURX = CurrentX;
assign CURY = CurrentY;
assign VBLANK = VBlank_reg;
assign HBLANK = HBlank_reg;
assign hs_vga = HS;
assign vs_vga = VS;
//Respects VGA Blanking areas
assign RED = (Blank) ? 3'b000 : COLOR[7:5];
assign GREEN = (Blank) ? 3'b000 : COLOR[4:2];
assign BLUE = (Blank) ? 2'b00 : COLOR[1:0];
endmodule
clk_div:
module clock_divider(clk_in, clk_out);
input clk_in;
output clk_out;
reg clk_out = 0;
always @(posedge clk_in)
clk_out <= ~clk_out;
endmodule
clk_shift:
module clock_shift(clk_in, clk_out);
input clk_in;
output clk_out;
assign clk_out = ~clk_in;
endmodule
I'm posting this as an answer because I cannot put a photo in a comment.
Is this what your design looks like?
My only guess at the moment is that you might have misplaced some ports during instantiation of vga_driver and/or map_generator (if you used the old style instantiation). Nevertheless, I'm going to check the VGA timings, as I can see a strange vertical line at the left of the screen, as if the hblank interval was visible.
By the way: I've changed the way you generate the display. You use regs for HS, VS, etc, which get updated the next clock cycle. I treat display generation as a FSM, so outputs come from combinational blocks triggered by certain values (or range of values) from the counters. Besides, I start horizontal and vertical counters so position (0,0) measured in pixel coordinates in the screen actually maps to values (0,0) from horizontal and vertical counters, so no arithmetic needed.
This is my version of VGA display generation:
module videosyncs (
input wire clk,
input wire [2:0] rin,
input wire [2:0] gin,
input wire [1:0] bin,
output reg [2:0] rout,
output reg [2:0] gout,
output reg [1:0] bout,
output reg hs,
output reg vs,
output wire [10:0] hc,
output wire [10:0] vc
);
/* http://www.abramovbenjamin.net/calc.html */
// VGA 640x480@60Hz,25MHz
parameter htotal = 800;
parameter vtotal = 524;
parameter hactive = 640;
parameter vactive = 480;
parameter hfrontporch = 16;
parameter hsyncpulse = 96;
parameter vfrontporch = 11;
parameter vsyncpulse = 2;
parameter hsyncpolarity = 0;
parameter vsyncpolarity = 0;
reg [10:0] hcont = 0;
reg [10:0] vcont = 0;
reg active_area;
assign hc = hcont;
assign vc = vcont;
always @(posedge clk) begin
if (hcont == htotal-1) begin
hcont <= 0;
if (vcont == vtotal-1) begin
vcont <= 0;
end
else begin
vcont <= vcont + 1;
end
end
else begin
hcont <= hcont + 1;
end
end
always @* begin
if (hcont>=0 && hcont<hactive && vcont>=0 && vcont<vactive)
active_area = 1'b1;
else
active_area = 1'b0;
if (hcont>=(hactive+hfrontporch) && hcont<(hactive+hfrontporch+hsyncpulse))
hs = hsyncpolarity;
else
hs = ~hsyncpolarity;
if (vcont>=(vactive+vfrontporch) && vcont<(vactive+vfrontporch+vsyncpulse))
vs = vsyncpolarity;
else
vs = ~vsyncpolarity;
end
always @* begin
if (active_area) begin
gout = gin;
rout = rin;
bout = bin;
end
else begin
gout = 3'h00;
rout = 3'h00;
bout = 2'h00;
end
end
endmodule
Which is instantiated by your vga_driver module, which becomes nothing but a wrapper for this module:
module vga_driver (
input wire clk_25MHz,
output wire vs_vga,
output wire hs_vga,
output wire [2:0] RED,
output wire [2:0] GREEN,
output wire [1:0] BLUE,
output wire HBLANK,
output wire VBLANK,
output [9:0] CURX,
output [8:0] CURY,
input [7:0] COLOR,
input wire RESET
);
assign HBLANK = 0;
assign VBLANK = 0;
videosyncs syncgen (
.clk(clk_25MHz),
.rin(COLOR[7:5]),
.gin(COLOR[4:2]),
.bin(COLOR[1:0]),
.rout(RED),
.gout(GREEN),
.bout(BLUE),
.hs(hs_vga),
.vs(vs_vga),
.hc(CURX),
.vc(CURY)
);
endmodule
Note that in map_generator, the first if statement in this always block will never be true. We can forget about it, as the VGA display module will blank RGB outputs when needed.
always @(posedge clk_vga) begin
if(HBlank || VBlank) begin //
mColor <= 0; // Never reached
end //
else begin //
if(currentMap == 4'b0000) begin
mColor[7:0] <= startCastle[7:0];
end
//Add more maps later
end
end
Using the same approach, I've converted the map generator module to be a combinational module. For example, for map 0 (the castle -without the castle, I see-) it is like this:
module StartCastle(
input wire [9:0] CurrentX,
input wire [8:0] CurrentY,
output wire [7:0] mapData
);
reg [7:0] mColor;
assign mapData = mColor;
always @* begin
// blocking assignments, since this block is combinational
if(CurrentY < 40) begin
mColor[7:0] = 8'b11100000;
end
else if(CurrentX < 40) begin
mColor[7:0] = 8'b11100000;
end
else if(~(CurrentX < 600)) begin
mColor[7:0] = 8'b11100000;
end
else if((~(CurrentY < 440) && (CurrentX < 260)) || (~(CurrentY < 440) && ~(CurrentX < 380))) begin
mColor[7:0] = 8'b11100000;
end else
mColor[7:0] = 8'b00011100;
end
endmodule
Just a FSM whose output is the colour that goes in a pixel. The input being the coordinates of the current pixel.
So when it is time to display map 0, map_generator simply switches to it based upon the current value of currentMap
module map_generator (
input wire clk,
input wire reset,
input wire [9:0]CurrentX,
input wire [8:0]CurrentY,
input wire HBlank,
input wire VBlank,
input wire [9:0]playerPosX,
input wire [8:0]playerPosY,
output wire [7:0]mapData
);
reg [7:0] mColor;
assign mapData = mColor;
reg [5:0]currentMap = 0;
wire [7:0] castle_map;
StartCastle StartCastle(
.CurrentX(CurrentX),
.CurrentY(CurrentY),
.mapData(castle_map)
);
always @(posedge clk) begin
if(reset) begin
currentMap <= 0;
end
end
always @* begin
mColor = 8'h00; // default value, so no latch is inferred for other maps
if(currentMap == 6'b000000) begin
mColor = castle_map;
end
//Add more maps later
end
endmodule
This may look like a lot of comb logic is generated and so glitches may happen. It's actually very fast, no noticeable glitches on screen, and you can use the actual current x and y coordinates to choose what to display on screen. Thus, no need for an inverted clock. My final version of your design has only one 25MHz clock.
By the way, you want to keep device dependent constructions away from your design, placing things like clock generators in separate modules that will be connected to your design in the top module, which should be the only device dependent module.
So, I've written a device-agnostic adventure module, which will contain the entire game:
module adventure (
input clk_vga,
input reset,
output vs_vga,
output hs_vga,
output [2:0] RED,
output [2:0] GREEN,
output [1:0] BLUE
);
wire HBLANK, VBLANK;
wire [7:0] COLOR;
wire [9:0] CURX;
wire [8:0] CURY;
wire [9:0] playerPosX = 10'd320; // not actually used in the design yet
wire [8:0] playerPosY = 9'd240; // not actually used in the design yet
vga_driver the_screen (.clk_25MHz(clk_vga),
.vs_vga(vs_vga),
.hs_vga(hs_vga),
.RED(RED),
.GREEN(GREEN),
.BLUE(BLUE),
.HBLANK(HBLANK),
.VBLANK(VBLANK),
.CURX(CURX),
.CURY(CURY),
.COLOR(COLOR)
);
map_generator the_mapper (.clk(clk_vga),
.reset(reset),
.CurrentX(CURX),
.CurrentY(CURY),
.HBlank(HBLANK),
.VBlank(VBLANK),
.playerPosX(playerPosX),
.playerPosY(playerPosY),
.mapData(COLOR)
);
endmodule
This module is not complete: it lacks inputs from joystick or any other input device to update player current position. For now, player current position is fixed.
The top level design (TLD) is written exclusively for the FPGA trainer you have. It is here where you need to generate proper clocks using your device's available resources, such as the DCM in Spartan 3/3E devices.
module tld_basys(
input wire clk_50MHz,
input wire RESET,
output wire vs_vga,
output wire hs_vga,
output wire [2:0] RED,
output wire [2:0] GREEN,
output wire [1:0] BLUE
);
wire clk_25MHz;
dcm_clocks gen_vga_clock (
.CLKIN_IN(clk_50MHz),
.CLKDV_OUT(clk_25MHz)
);
adventure the_game (.clk_vga(clk_25MHz),
.reset(RESET),
.vs_vga(vs_vga),
.hs_vga(hs_vga),
.RED(RED),
.GREEN(GREEN),
.BLUE(BLUE)
);
endmodule
The DCM-generated clock goes in this module (generated by the Xilinx Core Generator):
module dcm_clocks (CLKIN_IN,
CLKDV_OUT
);
input CLKIN_IN;
output CLKDV_OUT;
wire CLKFB_IN;
wire CLKFX_BUF;
wire CLKDV_BUF;
wire CLKIN_IBUFG;
wire CLK0_BUF;
wire GND_BIT;
assign GND_BIT = 0;
BUFG CLKDV_BUFG_INST (.I(CLKDV_BUF),
.O(CLKDV_OUT));
IBUFG CLKIN_IBUFG_INST (.I(CLKIN_IN),
.O(CLKIN_IBUFG));
BUFG CLK0_BUFG_INST (.I(CLK0_BUF),
.O(CLKFB_IN));
DCM_SP #(.CLKDV_DIVIDE(2.0), .CLKIN_DIVIDE_BY_2("FALSE"),
.CLKIN_PERIOD(20.000), .CLKOUT_PHASE_SHIFT("NONE"),
.DESKEW_ADJUST("SYSTEM_SYNCHRONOUS"), .DFS_FREQUENCY_MODE("LOW"),
.DLL_FREQUENCY_MODE("LOW"), .DUTY_CYCLE_CORRECTION("TRUE"),
.FACTORY_JF(16'hC080), .PHASE_SHIFT(0), .STARTUP_WAIT("FALSE") )
DCM_SP_INST (.CLKFB(CLKFB_IN),
.CLKIN(CLKIN_IBUFG),
.DSSEN(GND_BIT),
.PSCLK(GND_BIT),
.PSEN(GND_BIT),
.PSINCDEC(GND_BIT),
.RST(GND_BIT),
.CLKDV(CLKDV_BUF),
.CLKFX(),
.CLKFX180(),
.CLK0(CLK0_BUF),
.CLK2X(),
.CLK2X180(),
.CLK90(),
.CLK180(),
.CLK270(),
.LOCKED(),
.PSDONE(),
.STATUS());
endmodule
Although it is safe (for Xilinx devices) to use a simple clock divider as you did. If you fear that the synthesizer won't treat your divided clock as an actual clock, add a BUFG primitive to route the output from the divider to a global buffer so it can be used as a clock with no problems (see the module above for an example on how to do this).
As a final note, you may want to add more independence from the final device by using 24-bit colours for your graphics. At the TLD, you will use the actual number of bits per colour component you really have, but if you move from the Basys2 trainer board with 8-bit colour to, say, the Nexys4 board with 12-bit colour, you will automatically enjoy a richer output display.
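A minimal sketch of that idea for the Basys2 case, assuming the game produces a 24-bit colour packed as {R[7:0], G[7:0], B[7:0]}: the top level keeps only the most significant bits of each component. The module name rgb24_to_rgb332 and the packing order are assumptions for illustration.

```verilog
// Hypothetical top-level adapter: 24-bit colour down to the 3-3-2 bits of the Basys2.
module rgb24_to_rgb332 (
  input  wire [23:0] rgb24,   // assumed packing: {R[7:0], G[7:0], B[7:0]}
  output wire [2:0]  red,
  output wire [2:0]  green,
  output wire [1:0]  blue
);
  // Keep the most significant bits of each component.
  assign red   = rgb24[23:21];
  assign green = rgb24[15:13];
  assign blue  = rgb24[7:6];
endmodule

// Hypothetical testbench: checks one colour value.
module tb_rgb24_to_rgb332;
  reg  [23:0] rgb24;
  wire [2:0]  red, green;
  wire [1:0]  blue;

  rgb24_to_rgb332 conv (.rgb24(rgb24), .red(red), .green(green), .blue(blue));

  initial begin
    rgb24 = 24'hFF8040;       // R = 0xFF, G = 0x80, B = 0x40
    #1;
    if (red   !== 3'b111) $display("FAIL: red");
    if (green !== 3'b100) $display("FAIL: green");
    if (blue  !== 2'b01)  $display("FAIL: blue");
    $display("tb_rgb24_to_rgb332 finished");
    $finish;
  end
endmodule
```

On a board with more colour bits, only this adapter would need to change; the game logic would keep producing 24-bit values.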
Now, it looks like this (no vertical bars at the left, and colours seem to be more vibrant)
