I am working on a Verilog design where I am using SRAM inside a FSM. I need to synthesize it later on since I want to fabricate the IC. My question is that I have a fully working code using reg registers where I use blocking assignment for concurrent operation. Since there is no clock in this system, it works fine. Now, I want to replace these registers with SRAM based memory, which brings in clock into the system. My first thought is to use non-blocking assignment and changing the dependency list from always #(*) to always # (negedge clk).
In the code snippet below, I want to read 5 sets of data from the SRAM (SR4). So what I do is I place a counter that counts till 5 (wait_var) for this to happen. By introducing additional counter, this code ensures that at 1st clock edge it enters the counter and at subsequent clock edges, the five sets of data is read from SRAM. This technique works for simple logic such as this.
S_INIT_MEM: begin
// ******Off-Chip (External) Controller will write the data into SR4. Once data is written, init_data will be raised to 1.******
if (init_data == 1'b0) begin
CEN4 <= CEN;
WEN4 <= WEN;
RETN4 <= RETN;
EMA4 <= EMA;
A4 <= A_in;
D4 <= D_in;
end
else begin
CEN4 <= 1'b0; //SR4 is enabled
EMA4 <= 3'b0; //EMA set to 0
WEN4 <= 1'b1; //SR4 set to read mode
RETN4 <= 1'b1; //SR4 RETN is turned ON
A4 <= 8'b0000_0000;
if (wait_var < 6) begin
if (A4 == 8'b0000_0000 ) begin
NUM_DIMENSIONS <= Q4;
A4 <= 8'b0000_0001;
end
if (A4 == 8'b0000_0001 ) begin
NUM_PARTICLES <= Q4;
A4 <= 8'b0000_0010;
end
if (A4 == 8'b0000_0010 ) begin
n_gd_iterations <= Q4;
A4 <= 8'b0000_0011;
end
if (A4 == 8'b0000_0011 ) begin
iterations <= Q4;
A4 <= 8'b0000_0100;
end
if (A4 == 8'b0000_0100 ) begin
threshold_val <= Q4;
A4 <= 8'b0000_0101;
end
wait_var <= wait_var + 1;
end
//Variables have been read from SR4
if(wait_var == 6) begin
CEN4 <= 1'b1;
next_state <= S_INIT_PRNG;
wait_var <= 0;
end
else begin
next_state <= S_INIT_MEM;
end
end
end
However, when I need to write a complex logic in the similar fashion, the counter based delay method gets too complex. Eg. say I want to read data from one SRAM (SR1) and want to write it to another SRAM (SR3).
CEN1 = 1'b0;
A1 = ((particle_count-1)*NUM_DIMENSIONS) + (dimension_count-1);
if (CEN1 == 1'b0) begin
CEN3 = 1'b0;
WEN3 = 1'b0;
A3 = ((particle_count-1)*NUM_DIMENSIONS) + (dimension_count-1);
if(WEN3 == 1'b0) begin
D3 = Q1;
WEN3 = 1'b1;
CEN3 = 1'b1;
end
CEN1 = 1'b1;
end
I know this still uses blocking assignments and I need to convert them to non-blocking assignments, but if I do and I do not introduce 1 clock cycle delay manually using counter, it will not work as desired. Is there a way to get around this in a simpler manner?
Any help would be highly appreciated.
The main part is, that non-blocking assignments are a simulation only artifact and provides a way for simulation to match hardware behavior. If you use them incorrectly, you might end up with simulation time races and mismatch with hardware. In this case your verification effort goes to null.
There is a set of common practices used in the industry to handle this situation. One is to use non-blocking assignments for outputs of all sequential devices. This avoids races and makes sure that the behavior of sequential flops and latches pipes data the same way as in real hardware.
Hence, one cycle delay caused by the non-blocking assignments is a myth. If you design sequential flops when the second one latches the data from the first, then the data will be moved across flops sequentially every cycle:
clk ------v----------------v
in1 -> [flop1] -> out1 -> [flop2] -> out2
clk 1 1 1 0
clk 3 1 1 1
clk 4 0 0 1
clk 5 0 0 0
In the above example data is propagated from out1 to out2 in the every next clock cycle which can be expressed in verilog as
always #(posedge clk)
out1 <= in1;
always #(posedge clk)
out2 <= out1;
Or you can combine those
always #(posedge clk) begin
out1 <= in1;
out2 <= out1;
end
So, the task of your design is to cleanly separate sequential logic from combinatorial logic and therefore separate blocks with blocking and non-blocking assignments.
There are cases which can and must be used with blocking assignments inside sequential blocks, as mentioned in comments: if you use temporary vars to simplify your expressions inside sequential blocks assuming that those vars are never used anywhere else.
Other than above never mix blocking and non-blocking assignments in a single always block.
Also, usually due to synthesis methodologies, use if 'negedge' is discouraged. Avoid it unless your synthesis methodology does not care.
You should browse around to get more information and example of blocking/non-blocking assignments and their use.
Related
I have tried writing a small verilog module that will find the maximum of 10 numbers in an array. At the moment I am just trying to verify the correctness of the module without going into specific RTL methods that will to do such a task.
I am just seeing a a couple of registers when I am synthesizing this module. Nothing more that that. Ideally the output should be 7 which is at index 4 but I am seeing nothing neither on FPGA board or in the test bench. What I am doing wrong with this ?
module findmaximum(input clk,rst,output reg[3:0]max, output reg[3:0]index);
reg [3:0]corr_Output[0:9];
always#(posedge clk or posedge rst)
if(rst)
begin
corr_Output[0]=0;
corr_Output[1]=0;
corr_Output[2]=0;
corr_Output[3]=0;
corr_Output[4]=0;
corr_Output[5]=0;
corr_Output[6]=0;
corr_Output[7]=0;
corr_Output[8]=0;
corr_Output[9]=0;
end
else
begin
corr_Output[0]=0;
corr_Output[1]=0;
corr_Output[2]=0;
corr_Output[3]=0;
corr_Output[4]=7;
corr_Output[5]=0;
corr_Output[6]=0;
corr_Output[7]=0;
corr_Output[8]=0;
corr_Output[9]=0;
end
integer i;
always#(posedge clk or posedge rst)
if(rst)
begin
max=0;
index=0;
end
else
begin
max = corr_Output[0];
for (i = 0; i <= 9; i=i+1)
begin
if (corr_Output[i] > max)
begin
max = corr_Output[i];
index = i;
end
end
end
endmodule
Looking are your code, the only possible outputs are max=0,index=0 and a clock or two after reset max=7,index=4. Therefore, your synthesizer is likely optimizing the code with equivalent behavior with simpler logic.
For your find max logic to be meaningful, you need to change the values of corr_Output periodically. This can be done via input writes, LFSR (aka pseudo random number generator), and or other logic.
Other issues:
Synchronous logic (updated on a clock edge) should be assigned by with non-blocking (<=). Combinational logic should be assigned with blocking (=). When this guideline is not followed there is a risk of behavior differences between simulation and synthesis. In the event you need to compare with intermediate values (like your original max and index), then you need to separate the logic into two always blocks like bellow. See code bellow.
Also, FPGAs tend to have limited asynchronous reset support. Use synchronous reset instead by removing the reset from the sensitivity list.
always#(posedge clk) begin
if (rst) begin
max <= 4'h0;
index <= 4'h0;
end
else begin
max <= next_max;
index <= next_index;
end
always #* begin
next_max = corr_Output[0];
next_index = 4'h0;
for (i = 1; i <= 9; i=i+1) begin // <-- start at 1, not 0 (0 is same a default)
if (corr_Output[i] > next_max) begin
next_max = corr_Output[i];
next_index = i;
end
end
end
now I know in Verilog, to make a sequential logic you would almost always have use the non-blocking assignment (<=) in an always block. But does this rule also apply to internal variables? If blocking assignments were to be used for internal variables in an always block would it make it comb or seq logic?
So, for example, I'm trying to code a sequential prescaler module. It's output will only be a positive pulse of one clk period duration. It'll have a parameter value that will be the prescaler (how many clock cycles to divide the clk) and a counter variable to keep track of it.
I have count's assignments to be blocking assignments but the output, q to be non-blocking. For simulation purposes, the code works; the output of q is just the way I want it to be. If I change the assignments to be non-blocking, the output of q only works correctly for the 1st cycle of the parameter length, and then stays 0 forever for some reason (this might be because of the way its coded but, I can't seem to think of another way to code it). So is the way the code is right now behaving as a combinational or sequential logic? And, is this an acceptable thing to do in the industry? And is this synthesizable?
```
module scan_rate2(q, clk, reset_bar);
//I/O's
input clk;
input reset_bar;
output reg q;
//internal constants/variables
parameter prescaler = 8;
integer count = prescaler;
always #(posedge clk) begin
if(reset_bar == 0)
q <= 1'b0;
else begin
if (count == 0) begin
q <= 1'b1;
count = prescaler;
end
else
q <= 1'b0;
end
count = count - 1;
end
endmodule
```
You should follow the industry practice which tells you to use non-blocking assignments for all outputs of the sequential logic. The only exclusion are temporary vars which are used to help in evaluation of complex expressions in sequential logic, provided that they are used only in a single block.
In you case using 'blocking' for the 'counter' will cause mismatch in synthesis behavior. Synthesis will create flops for both q and count. However, in your case with blocking assignment the count will be decremented immediately after it is being assigned the prescaled value, whether after synthesis, it will happen next cycle only.
So, you need a non-blocking. BTW initializing 'count' within declaration might work in fpga synthesis, but does not work in schematic synthesis, so it is better to initialize it differently. Unless I misinterpreted your intent, it should look like the following.
integer count;
always #(posedge clk) begin
if(reset_bar == 0) begin
q <= 1'b0;
counter <= prescaler - 1;
end
else begin
if (count == 0) begin
q <= 1'b1;
count <= prescaler -1;
end
else begin
q <= 1'b0;
count <= count - 1;
end
end
end
You do not need temp vars there, but you for the illustration it can be done as the following:
...
integer tmp;
always ...
else begin
q <= 1'b0;
tmp = count - 1; // you should use blocking here
count <= tmp; // but here you should still use NBA
end
I was following a tutorial on SPI master in Verilog. I've been debugging this for about three hours now and cannot get it to work.
I've been able to break down the issue into a minimum representative issue. Here are the specifications:
We have two states, IDLE and COUNTING. Then, on the clock positive edge, we check:
If the state is IDLE, then the counter register is set to 0. If while in this state the dataReady pin is high, then the state is set to COUNTING and the counter is set to all 1s.
If the state is COUNTING, the state remains COUNTING as long as counter is not zero. Otherwise, the state is returned to IDLE.
Then, we count on the negative edge:
On the negative edge of clock if state is COUNTING, then decrement counter.
Here's the code I came up with to fit this specification:
// look in pins.pcf for all the pin names on the TinyFPGA BX board
module top (
input CLK, // 16MHz clock
input PIN_14,
output LED, // User/boot LED next to power LED
output USBPU // USB pull-up resistor
);
// drive USB pull-up resistor to '0' to disable USB
assign USBPU = 0;
reg [23:0] clockDivider;
wire clock;
always #(posedge CLK)
clockDivider <= clockDivider + 1;
assign clock = clockDivider[23];
wire dataReady;
assign dataReady = PIN_14;
parameter IDLE = 0, COUNTING = 1;
reg state = IDLE;
reg [3:0] counter;
always #(posedge clock) begin
case (state)
IDLE: begin
if (dataReady)
state <= COUNTING;
end
COUNTING: begin
if (counter == 0)
state <= IDLE;
end
endcase
end
always #(negedge clock) begin
if (state == COUNTING)
counter <= counter - 1;
end
always #(state) begin
case (state)
IDLE:
counter <= 0;
COUNTING:
counter <= counter;
endcase
end
assign LED = counter != 0;
endmodule
With this, we get the error:
ERROR: multiple drivers on net 'LED' (LED_SB_DFFNE_Q.Q and LED_SB_DFFNE_Q_1.Q)
Why? There is literally only one assign statement on the LED.
First of all it would not be easy to come up with a synthesizable model in such a case. But, you do not need any negedge logic to implement your model. Also you made several mistakes and violated many commonly accepted practices.
Now about some problems in your code.
By using non-blocking assignment in the clock line you created race condition in the simulation which will probably cause incorrect simulation results:
always #(posedge CLK)
clockDivider <= clockDivider + 1; // <<< this is a red flag!
assign clock = clockDivider[23];
...
always #(posedge clk)
you incorrectly used nbas in your always block
always(#state)
... counter <= conunter-1; // <<< this is a red flag again!
your state machine has no reset. Statements like reg state = IDLE; will only work in simulation and in some fpgas. It is not synthesizable in general. I suggest that you do not use it but provide a reset signal instead.
Saying that, i am not aware of any methodology which would use positive and negative edges in such a case. So, you should not. All your implementation can be done under the posedge, something like the following. However
always #(posedge clock) begin
if (reset) begin // i suggest that you use reset in some form.
state <= IDLE;
counter <= 0;
end
else begin
case (state)
IDLE: begin
if (dataReady) begin
state <= COUNTING;
counter <= counter - 1;
end
end
COUNTING: begin
if (counter == 0)
state <= IDLE;
else
counter <= counter - 1;
end
endcase
end
end
I hope i did it right, did not test.
Now you do not need the other two always blocks at all.
How to fix multiple driver , default value and combinational loop problems in the code below?
always #(posedge clk)
myregister <= #1 myregisterNxt;
always #* begin
if(reset)
myregisterNxt = myregisterNxt +1;
else if(flag == 1)
myregister = myregister +2;
end
right there are at least 3 issues in your code:
you are driving myregister within 2 different always blocks. Synthesis will find multiple drivers there. Simulation results will be unpredictable. The rule: you must drive a signal within a single always block.
you ave a zero-delay loop over myregisterNxt = myregisterNxt +1. Since you are using a no-flop there, it is a real loop in simulation and in hardware. You need to break such loops with flops
#1 delay is not synthesizable and it is not needed here at all.
You have not described what you were trying to build and it is difficult to figure it out from our code sample. In general, reset is used to set up initial values. So, something like the following could be a template for you.
always #(posedge clk) begin
if (reset)
myregister <= 0;
else
myregister <= myregister + increment;
end
always #* begin
if (flag == 1)
increment = 1;
else
increment = 2;
end
the flop with posedge clk and nonblocking assignments will not be in a loop.
I have created a testbench for a simple positive edge triggerred d flip flop with synchronous active low reset. In the testbench, the first case gives inputs at "#posedge clk", and in the second case, I am giving inputs based on "wait 10ns" statements.
In the first case, the output of the flop changes after 1 clock cycle, whereas in the second case it changes immediately in the same clock cycle in the simulator.
Why?
I am simulating in the Quartus Simulator.
Code:
initial
begin
//Case 1: Using Event Based statements
n_reset = 1'b0;
reset = 1'b1;
d = 1'b0;
repeat(2)#(posedge clk);
n_reset = 1'b1;
repeat(2)#(posedge clk);
d = 1'b1;
#(posedge clk);
d = 1'b0;
#(posedge clk);
d = 1'b1;
//Case 2: Using wait Statement
#50ns;
n_reset = 1'b0;
reset = 1'b1;
d = 1'b0;
#50ns;
n_reset = 1'b1;
#20ns;
d = 1'b1;
#10ns;
d = 1'b0;
#10ns;
d = 1'b1;
#50ns;
end
You see differences because there are simulation race conditions in the testbench.
Let's assume your flipflop uses the recommended coding style, something like this:
always #(posedge clk) q <= d;
q is the flipflop output. It is important to note that the flipflop uses a nonblocking assignment (<=).
The testbench code uses blocking assignments (=) everywhere. Let's also assume you are always changing d at the same time as the clk rising edge when you use # delays. When that is the case, you have a race condition no matter what type of timing control you are using (Case 1 or Case 2). Although you are seeing a difference between the two cases, there is no guarantee that you will see a difference because of the race.
Since the testbench uses blocking and the design (flipflop) use nonblocking, the value of d that is sampled inside the design is indeterminate.
The best way to avoid the race condition and to guarantee that the d input is synchronous to clk is to use nonblocking assignments in the testbench:
repeat(2) #(posedge clk);
d <= 1'b1;
#(posedge clk);
d <= 1'b0;
Using wait Statement:
Delays the execution of a procedural statement by specific simulation time.
"#<_time> <_statement>;"
Using Event Based statements:
Delays execution of the next statement until the specified transition on a signal.
# (< posedge >|< negedge > signal) < statement >;
Level-Sensitive Even Controls ( Wait statements ):
Delays execution of the next statement until < expression > evaluates to true
wait (< expression >) < statement >;
Now I think you are able to differentiate, Right?