How to execute task concurrently with other statements in an always block? - verilog

I am writing code for 8*4 RAM in Verilog. For each binary cell of memory, I am using an SR flip-flop. Initially, each cell is assigned 1'bx. The logic seems to be correct, but the output isn't. It is probably because statements are not getting executed concurrently. Can anyone suggest how can I get the task SRFlipFlop to get executed concurrently with other statements?
module memory(addr, read_data, rw, write_data, clk);
// read_data is the data read
// rw specifies read or write operation. 1 for read and 0 for write
// write data is the data to be written
// addr is the address to be written or read
task SRFlipFlop;
input d,r,s,clk; // d is the value initially stored
output q;
begin
case({s,r})
{1'b0,1'b0}: q<=d;
{1'b0,1'b1}: q<=1'b0;
{1'b1,1'b0}: q<=1'b1;
{1'b1,1'b1}: q<=1'bx;
endcase
end
endtask
task decoder; // a 3 to 8 line decoder
input [2:0] A;
input E;
output [7:0] D;
if (!E)
D <= 16'b0000000000000000;
else
begin
case (A)
3'b000 : D <= 8'b00000001;
3'b001 : D <= 8'b00000010;
3'b010 : D <= 8'b00000100;
3'b011 : D <= 8'b00001000;
3'b100 : D <= 8'b00010000;
3'b101 : D <= 8'b00100000;
3'b110 : D <= 8'b01000000;
3'b111 : D <= 8'b10000000;
endcase
end
endtask
output reg [3:0] read_data;
input [3:0] write_data;
input [2:0] addr;
input rw, clk;
reg [3:0] memory [7:0];
reg [3:0] r [7:0];
reg [3:0] s [7:0];
reg [3:0] intermediate;
reg [3:0] select [7:0];
reg [7:0] out;
reg [7:0] out1;
integer i,j,k,l;
initial
begin
for (i = 0; i <= 7; i=i+1)
begin
for (j = 0; j <= 3; j=j+1)
begin
memory[i][j] = 1'bx;
r[i][j] = 1'b0;
s[i][j] = 1'b0;
select[i][j] = 1'b0;
end
end
end
always #(posedge clk)
begin
decoder(addr, 1'b1, out);
for (i = 0; i <= 7; i=i+1)
begin
if (out[i] == 1'b1)
begin
for (j = 0; j <= 3; j=j+1)
begin
select[i][j] <= 1'b1;
s[i][j] <= write_data[j] & !rw & select[i][j];
r[i][j] <= !write_data[j] & !rw & select[i][j];
SRFlipFlop(memory[i][j],r[i][j],s[i][j],clk,intermediate);
memory[i][j] <= intermediate;
read_data[j] <= memory[i][j];
end
end
end
end
endmodule

Your code style is very software-oriented. Personally I like to know how my code will look as a circuit, so instead of using nested for loops and tasks I will use modules and generate-loops to create my circuits.
I have not been able to make your code work, but I suspect that the error is in the fact that s and r are not reset to zero on every iteration.
I have created a functioning design here:
http://www.edaplayground.com/x/Guc
Instead of using the initial block to initialize values I have added an asynchronous reset.
The SRFF-task has been converted to a module. A RAMblock module instantiates four SRFF-modules. 8 RAMblocks are instantiated in the memory module.
I have converted your packed(reg [] a []) arrays into unpacked arrays(reg [][] a) to be able to perform bitwise operations on several bits without for-loops.
If you have questions about the code, feel free to message me.
Edit: Perhaps the most important thing to note in this design is that I separate the sequential circuitry from the combinatorial. This way it is much easier to control what should be updated on the posedge of clk and what should just be a combinatorial reaction to the changes performed at the posedge.

Related

Verilog: wait for module logic evaluation in an always block

I want to use the output of another module inside an always block.
Currently the only way to make this code work is by adding #1 after the pi_in assignment so that enough time has passed to allow Pi to finish.
Relevant part from module pLayer.v:
Pi pi(pi_in,pi_out);
always #(*)
begin
for(i=0; i<constants.nSBox; i++) begin
for(j=0; j<8; j++) begin
x = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
pi_in = 8*i+j;#1; /* wait for pi to finish */
PermutedBitNo = pi_out;
y = PermutedBitNo>>3;
tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
end
end
state_out = tmp;
end
Modllue Pi.v
`include "constants.v"
module Pi(in, out);
input [31:0] in;
output [31:0] out;
reg [31:0] out;
always #* begin
if (in != constants.nBits-1) begin
out = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out = constants.nBits-1;
end
end
endmodule
Delays should not be used in the final implementation, so is there another way without using #1?
In essence i want PermutedBitNo = pi_out to be evaluated only after the Pi module has finished its job with pi_in (=8*i+j) as input.
How can i block this line until Pi has finished?
Do i have to use a clock? If that's the case, please give me a hint.
update:
Based on Krouitch suggestions i modified my modules. Here is the updated version:
From pLayer.v:
Pi pi(.clk (clk),
.rst (rst),
.in (pi_in),
.out (pi_out));
counter c_i (clk, rst, stp_i, lmt_i, i);
counter c_j (clk, rst, stp_j, lmt_j, j);
always #(posedge clk)
begin
if (rst) begin
state_out = 0;
end else begin
if (c_j.count == lmt_j) begin
stp_i = 1;
end else begin
stp_i = 0;
end
// here, the logic starts
x = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
pi_in = 8*i+j;
PermutedBitNo = pi_out;
y = PermutedBitNo>>3;
tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
// at end
if (i == lmt_i-1)
if (j == lmt_j) begin
state_out = tmp;
end
end
end
endmodule
module counter(
input wire clk,
input wire rst,
input wire stp,
input wire [32:0] lmt,
output reg [32:0] count
);
always#(posedge clk or posedge rst)
if(rst)
count <= 0;
else if (count >= lmt)
count <= 0;
else if (stp)
count <= count + 1;
endmodule
From Pi.v:
always #* begin
if (rst == 1'b1) begin
out_comb = 0;
end
if (in != constants.nBits-1) begin
out_comb = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out_comb = constants.nBits-1;
end
end
always#(posedge clk) begin
if (rst)
out <= 0;
else
out <= out_comb;
end
That's a nice piece of software you have here...
The fact that this language describes hardware is not helping then.
In verilog, what you write will simulate in zero time. it means that your loop on i and j will be completely done in zero time too. That is why you see something when you force the loop to wait for 1 time unit with #1.
So yes, you have to use a clock.
For your system to work you will have to implement counters for i and j as I see things.
A counter synchronous counter with reset can be written like this:
`define SIZE 10
module counter(
input wire clk,
input wire rst_n,
output reg [`SIZE-1:0] count
);
always#(posedge clk or negedge rst_n)
if(~rst_n)
count <= `SIZE'd0;
else
count <= count + `SIZE'd1;
endmodule
You specify that you want to sample pi_out only when pi_in is processed.
In a digital design it means that you want to wait one clock cycle between the moment when you are sending pi_in and the moment when you are reading pi_out.
The best solution, in my opinion, is to make your pi module sequential and then consider pi_out as a register.
To do that I would do the following:
module Pi(in, out);
input clk;
input [31:0] in;
output [31:0] out;
reg [31:0] out;
wire clk;
wire [31:0] out_comb;
always #* begin
if (in != constants.nBits-1) begin
out_comb = (in*constants.nBits/4)%(constants.nBits-1);
end else begin
out_comb = constants.nBits-1;
end
end
always#(posedge clk)
out <= out_comb;
endmodule
Quickly if you use counters for i and j and this last pi module this is what will happen:
at a new clock cycle, i and j will change --> pi_in will change accordingly at the same time(in simulation)
at the next clock cycle out_comb will be stored in out and then you will have the new value of pi_out one clock cycle later than pi_in
EDIT
First of all, when writing (synchronous) processes, I would advise you to deal only with 1 register by process. It will make your code clearer and easier to understand/debug.
Another tip would be to separate combinatorial circuitry from sequential. It will also make you code clearer and understandable.
If I take the example of the counter I wrote previously it would look like :
`define SIZE 10
module counter(
input wire clk,
input wire rst_n,
output reg [`SIZE-1:0] count
);
//Two way to do the combinatorial function
//First one
wire [`SIZE-1:0] count_next;
assign count_next = count + `SIZE'd1;
//Second one
reg [`SIZE-1:0] count_next;
always#*
count_next = count + `SIZE'1d1;
always#(posedge clk or negedge rst_n)
if(~rst_n)
count <= `SIZE'd0;
else
count <= count_next;
endmodule
Here I see why you have one more cycle than expected, it is because you put the combinatorial circuitry that controls your pi module in you synchronous process. It means that the following will happen :
first clk positive edge i and j will be evaluated
next cycle, the pi_in is evaluated
next cycle, pi_out is captured
So it makes sense that it takes 2 cycles.
To correct that you should take out of the synchronous process the 'logic' part. As you stated in your commentaries it is logic, so it should not be in the synchronous process.
Hope it helps

Verilog code 2 errors i can't find: Would be grateful for an extra pair of eyes to spot a mistake i might've overlooked

I'm writing a verilog code where i'm reading two files and saving those numbers into registers. I'm then multiplying them and adding them. Pretty much a Multiplication Accumulator. However i'm having a hard frustrating time with the code that i have. It read the numbers from the files correctly and it multiples but here is the problem? When i first run it using ModelSim, I reset everything so i can clear out the accumulator. I then begin the program, but there is always this huge delay in my "macc_out" and i cannot seem to figure out why. This delay should not be there and instead it should be getting the result out A*B+MAC. Even after the delay, it's not getting the correct output. My second problem is that if i go from reset high, to low (start the program) and then back to reset high ( to reset all my values), they do not reset! This is frustrating since i've been working on this for a week and don't know/can't see a bug. Im asking for an extra set of eyes to see if you can spot my mistake. Attached is my code with the instantiations and also my ModelSim functional Wave Form. Any help is appreciated!
module FSM(clk,start,reset,done,clock_count);
input clk, start, reset;
output reg done;
output reg[10:0] clock_count;
reg [0:0] macc_clear;
reg[5:0] Aread, Bread, Cin;
wire signed [7:0] a, b;
wire signed [18:0] macc_out;
reg [3:0] i,j,m;
reg add;
reg [0:0] go;
reg[17:0] c;
parameter n = 8;
reg[1:0] state;
reg [1:0] S0 = 2'b00;
reg [1:0] S1 = 2'b01;
reg [1:0] S2 = 2'b10;
reg [1:0] S3 = 2'b11;
ram_A Aout(.clk(clk), .addr(Aread), .q(a));
ram_B Bout(.clk(clk), .addr(Bread), .q(b));
mac macout(.clk(clk), .macc_clear(macc_clear), .A(a), .B(b), .macc_out(macc_out), .add(add));
ram_C C_in(.clk(clk), .addr(Cin), .q(c));
always #(posedge clk) begin
if (reset == 1) begin
i <= 0;
add<=0;
j <= 0;
m <= 0;
clock_count <= 0;
go <= 0;
macc_clear<=1;
end
else
state<=S0;
case(state)
S0: begin
// if (reset) begin
// i <= 0;
// add<=0;
// j <= 0;
// m <= 0;
// clock_count <= 0;
// go <= 0;
// macc_clear<=1;
// state <= S0;
// end
macc_clear<=1;
done<=0;
state <= S1;
end
S1: begin
add<=1;
macc_clear<=0;
clock_count<=clock_count+1;
m<=m+1;
Aread <= 8*m + i;
Bread <= 8*j + m;
if (m==7) begin
state <= S2;
macc_clear<=1;
add<=0;
end
else
state <=S1;
end
S2: begin
add<=1;
macc_clear<=0;
m<=0;
i<=i+1;
if (i<7)
state<=S1;
else if (i==8) begin
state<=S3;
add<=0;
end
end
S3: begin
add<=1;
i<=0;
j<=j+1;
if(j<7)
state<=S1;
else begin
state<=S0;
done<=1;
add<=0;
end
end
endcase
end
always # (posedge macc_clear) begin
Cin <= 8*j + i;
c <= macc_out;
end
endmodule
module mac(clk, macc_clear, A, B, macc_out, add);
input clk, macc_clear;
input signed [7:0] A, B;
input add;
output reg signed [18:0] macc_out;
reg signed [18:0] MAC;
always #( posedge clk) begin
if (macc_clear) begin
macc_out <= MAC;
MAC<=0;
end
else if (add) begin
MAC<=(A*B)+ MAC;
macc_out<=MAC;
end
end
endmodule
module ram_A( clk, addr,q);
output reg[7:0] q;
input [5:0] addr;
input clk;
reg [7:0] mem [0:63];
initial begin
$readmemb("ram_a_init.txt", mem);
end
always #(posedge clk) begin
q <= mem[addr];
end
endmodule
module ram_C(clk,addr, q);
input [18:0] q;
input [5:0] addr;
input clk;
reg [18:0] mem [0:63];
always #(posedge clk) begin
mem[addr] <= q;
end
endmodule
ModelSim Functional Simulation Wave Form
1) Take a look at the schematic view for your MACC module - I think some of your "problems" will be obvious from that;
2) Consider using an always#(*) (Combinational) block for your FSM control signals (stuff like add or macc_clear) rather than a always#(posedge clk) (sequential) - it makes the logic to assert them easier. Right now they're registered, so you have a cycle delay. ;
3) In your MAC, you clear the MAC register on a reset, but you don't clear the macc_out register.
In short, I think you need to step back, and consider which signals are combinational logic, and which ones are sequential and need to be in registers.

Serializer 32 to 8 - Verilog HDL

I am trying to make serializer from 32bits to 8 bits. Because I am just starting verilog I am facing problem. I would like to get 32 bits (on every 4th clock cycles) and then to send 8 bits on every clock cycle. How can I take just part of my dataIn, I wrote code below but assignment expression is not working. Sorry if question is basic. Thank you in advance on answer.
module ser32to8(clk, dataIn, dataOut);
input clk;
input [32:0] dataIn;
output [7:0] dataOut;
always #(posedge clk)
begin
dataOut <= dataIn << 8;
end
endmodule
The reason why the assignment failed (besides your code not doing any serialization) is because you didn't declare dataOut as a reg, and so you cannot assign to it inside an always block.
Here's how you do it correctly. (Since you didn't say in which order you wanted to serialize, I chose to go for lowest byte first, highest byte last. To reverse the order, exchange >> by << and tmp[7:0] by tmp[31:24].)
module ser32to8(
input clk,
input [31:0] dataIn,
output [7:0] dataOut
);
// count: 0, 1, 2, 3, 0, ... (wraps automatically)
reg [1:0] count;
always #(posedge clk) begin
count <= count + 2'd1;
end
reg [31:0] tmp;
always #(posedge clk) begin
if (count == 2'd0)
tmp <= dataIn;
else
tmp <= (tmp >> 8);
end
assign dataOut = tmp[7:0];
endmodule
How can you just take part of your dataIn data? By using the [] notation. dataIn[7:0] takes the 8 least significant bits, dataIn[15:8] takes the next 8 bits, and so on up to dataIn[31:24] which would take the 8 most significant bits.
To apply this to your problem, you can do like this (take into account that this is not an optimal solution, as outputs are not registered and hence, glitches may occur)
module ser32to8(
input wire clk,
input wire [31:0] dataIn,
output reg [7:0] dataOut
);
reg [1:0] cnt = 2'b00;
always #(posedge clk)
cnt <= cnt + 1;
always #* begin
case (cnt)
2'd0: dataOut = dataIn[7:0];
2'd1: dataOut = dataIn[15:8];
2'd2: dataOut = dataIn[23:16];
2'd3: dataOut = dataIn[31:24];
default: dataOut = 8'h00;
endcase
end
endmodule
You must declare dataOut as a reg, since you are using it in always block.Also, you are trying to assign 32 bit datain to 8 bit dataout , it is not logically correct.
The idea behind the question is not so clear but my guess would be that you want to wait for 4 clock cycles before you send the data, if that is the case below snippet could help, A counter to wait before 4 clock cycles will do the trick
module top (input clk,rst,
input [31:0] dataIn,
output [7:0] dataOut
);
reg [31:0] tmp;
reg [31:0] inter;
integer count;
always #(posedge clk)
begin
if (rst) begin
count <= 0;
tmp <= '0;
end
else
begin
if (count < 3) begin
tmp <= dataIn << 4;
count <= count +1; end
else if (count == 3)
begin
inter <= tmp;
count <= 0;
end
else
begin
tmp <= dataIn;
end
end
end
assign dataOut = inter[7:0];
endmodule
But there are some limitations tested with tb http://www.edaplayground.com/x/4Cg
Note: Please ignore the previous code it won't work(I was not clear so
tried it differently)
EDIT:
If I understand your question correctly a simple way to do it is
a)
module top ( input rst,clk,
input [31:0] dataIn,
output [7:0] dataOut);
reg [1:0] cnt;
always #(posedge clk) begin
if (rst) cnt <= 'b0;
else cnt <= cnt + 1;
end
assign dataOut = (cnt == 0) ? dataIn [7:0] :
(cnt == 1) ? dataIn [15:8] :
(cnt == 2) ? dataIn [23:16] :
(cnt == 3) ? dataIn [31:24] :
'0;
endmodule
Incase if you don't want to write it seperately for loop will come in handy to make it more simple
b)
module top ( input rst,clk,
input [31:0] dataIn,
output reg [7:0] dataOut);
reg [1:0] cnt;
integer i;
always #(posedge clk) begin
if (rst) cnt <= 'b0;
else cnt <= cnt + 1;
end
always # * begin
for ( i =0;i < cnt ; i=i+1) begin
dataOut <= dataIn[(i*8)+:8]; end
end
endmodule
I have tried both with test cases and found to be working, tc's present #
a) http://www.edaplayground.com/x/VCF
b) http://www.edaplayground.com/x/4Cg
You may want to give it a try
You can follow the figure below to design your circuit. Hope it can be useful with you. If you need the code, feel free to contact me.
SER 112 bits with 8 outputs in parallel

Error count for two different patterns in verilog

I have made 2 forms of data patterns and wants to compare them in the form of error count.....when the 2 patterns are not equal, the error count should be high....i made the code including test bench, but when i ran behavioral sumilation, the error count is only high at value 0 and not at value 1.....I expect it to be high at both 0 and 1....please help me out in this, since I am new with verilog
here is the code
`timescale 1ns / 1ps
module pattern(
clk,
start,
rst,enter code here
clear,
data_in1,
data_in2,
error
);
input [1:0] data_in1;
input [1:0] data_in2;
input clk;
input start;
input rst;
input clear;
output [1:0] error;
reg [1:0] comp_out;
reg [1:0] i = 0;
assign error = comp_out;
always#(posedge clk)
begin
comp_out = 0;
if(rst)
comp_out = 0;
else
begin
for(i = 0; i < 2; i = i + 1)
begin
if(data_in1[i] != data_in2[i])
comp_out <= comp_out + 1;
end
end
end
endmodule
here is the test bench for the above code
`timescale 1ns / 1ps
module tb_pattern();
// inputs
reg clk;
reg rst;
reg [1:0] data_in1;
reg [1:0] data_in2;
wire [1:0] error;
//outputs
//wire [15:0] count;
//instantiate the unit under test (UUT)
pattern uut (
// .count(count),
.clk(clk),
.start(start),
.rst(rst),
.clear(clear),
.data_in1(data_in1),
.data_in2(data_in2),
.error(error)
);
initial begin
clk = 1'b0;
rst = 1'b1;
repeat(4) #10 clk = ~clk;
rst = 1'b0;
forever #10 clk = ~clk; // generate a clock
end
initial begin
//initialize inputs
clk = 0;
//rst = 1;
data_in1 = 2'b00;
data_in2 = 2'b01;
#100
data_in1 = 2'b11;
data_in2 = 2'b00;
#100
$finish;
end
//force rest after delay
//#20 rst = 0;
//#25 rst = 1;
endmodule
When incrementing in a for loop you need to use blocking assignment (=), however when assigning flops you should use non-blocking assignment (<=). When you need to use a for loop to assign a flop, it is best to split the combinational and synchronous functionality into separate always blocks.
...
reg [1:0] comp_out, next_comb_out;
always #* begin : comb
next_comp_out = 0;
for (i = 0; i < 2; i = i + 1) begin
if (data_in1[i] != data_in2[i]) begin
next_comp_out = next_comp_out + 1;
end
end
end
always #(posedge clk) begin : dff
if (rst) begin
comb_out <= 1'b0;
end
else begin
comb_out <= next_comp_out;
end
end
...
begin
for(i = 0; i < 2; i = i + 1)
begin
if(data_in1[i] != data_in2[i])
comp_out <= comp_out + 1;
end
end
This for loop doesn't work the way you think it does. Because this is a non-blocking assignment, only the last iteration of the loop actually applies. So only the last bit is actually being compared here.
If both bits of your data mismatch, then the loop unrolls to something which looks like this:
comp_out <= comp_out + 1;
comp_out <= comp_out + 1;
Because this is non-blocking, the RHS of the equation are both evaluated at the same time, leaving you with:
comp_out <= 0 + 1;
comp_out <= 0 + 1;
So even though you tried to use this as a counter, only the last line takes effect, and you get a mismatch count of '1', no matter how many bits mismatch.
Try using a blocking statement (=) for comp_out assignment instead.

Nonblocking simultaneous assignments to wires and registers in Verilog

I am interested to write Verilog module which simultaneously will update several outputs
Something like following code, makes 3 operations at the same time (clk 10):
module mymodule (a,b,c,d,e);
input a;
input b;
output c;
output d;
output e;
wire b;
wire a;
wire c;
wire d;
reg e;
initial begin
c <= #10 (a+b);
d <= #10 a;
e <= #10 b;
end
endmodule
Is that code legal?
How todo a one off assign of variables after 10 timeunits or clocks:
As a testbench level construct:
reg c,d,e;
initial begin
#10;
c = a+b;
d = a;
e = b;
end
For RTL (synthesis) first you need a testbench with a clock.
I would generate a clock like this in my testharness :
reg clk ; //Rising edge every 10 timesteps
initial begin
clk = 0;
#5;
forever begin
#5 ;
clk = ~clk;
end
end
Build a counter which counts to 10 and once it reaches 10 Enables the flip-flop to load new values.
wire enable = (counter == 4'b10);
always #(posedge clk or negedge rst_n) begin
if (~rst_n) begin
c <= 1'b0;
d <= 1'b0;
e <= 1'b0;
end
else if (enable) begin
c <= (a+b);
d <= a;
e <= b;
end
end
endmodule
Extra Verilog Tips
Outputs are implicitly wires no need to redefine them.
Non-Blocking assignments <= are for use in an always #(posedge clk) when inferring flip-flops.
regs or logic types can be assigned inside always or initial blocks. wires are used with assign or for connectivity between ports.

Resources