If I want statements to happen in parallel and another statement to happen when all other statements are done with, for example:
task read;
begin
if (de_if==NOP) begin
dp_op <= 3'b000;
dp_phase = EXEC;
end
else begin
if (de_if==EXEC_THEN) begin
dp_const <= de_src3[0];
dp_src <= de_src3;
dp_op <= {NOP,de_ctrl3};
dp_dest <= de_dest1;
end
else if (get_value(de_ctrl1,de_src1)==dp_mem[de_src2]) begin
dp_const <= de_src3[0];
dp_src <= de_src3;
dp_op <= {NOP,de_ctrl3};
dp_dest <= de_dest1;
end
else begin
dp_const <= de_src4[0];
dp_src <= de_src4;
dp_op <= {NOP,de_ctrl4};
dp_dest <= de_dest2;
end
#1 dp_phase=READ;
end
end
endtask
In this code I want the statement dp_phase = READ to be executed only after all the other assignments are done. How do I do it?
As you can see, what I did is wait one time unit (#1) before the assignment, but I do not know if this is how it's done.
You need a state machine. That's the canonical way to make things happen in a certain sequence. Try to remember that using a hardware description language is not like a regular programming language...you are just describing the kind of behavior that you would like the hardware to have.
To make a state machine you will need a state register, one or more flip-flops that keep track of where you are in the desired sequence of events. The flip-flops should be updated on the rising clock edge but the rest of your logic can be purely combinational.
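A minimal sketch of that idea, under assumptions the question does not state: a clock `clk`, a synchronous active-high reset `rst`, a `start` pulse, and hypothetical names `read_sequencer`, `src_in` and `phase_read` standing in for the question's signals. The data register is loaded in one state and the phase flag is raised in the next, so the ordering comes from the state register rather than from a `#1` delay.

```verilog
// Hypothetical two-step sequencer: load the operand first, then flag READ.
module read_sequencer (
    input  wire       clk,
    input  wire       rst,        // synchronous, active high (assumed)
    input  wire       start,      // request to begin the sequence
    input  wire [7:0] src_in,     // stands in for de_src3 / de_src4
    output reg  [7:0] dp_src,
    output reg        phase_read  // stands in for dp_phase == READ
);
    localparam S_IDLE = 2'd0, S_LOAD = 2'd1, S_READ = 2'd2;

    reg [1:0] state, next_state;

    // State register: updated only on the rising clock edge.
    always @(posedge clk) begin
        if (rst) state <= S_IDLE;
        else     state <= next_state;
    end

    // Next-state logic: purely combinational.
    always @(*) begin
        case (state)
            S_IDLE:  next_state = start ? S_LOAD : S_IDLE;
            S_LOAD:  next_state = S_READ;
            S_READ:  next_state = S_IDLE;
            default: next_state = S_IDLE;
        endcase
    end

    // Registered outputs: dp_src is captured while in S_LOAD,
    // phase_read is raised while in S_READ, i.e. one clock edge later.
    always @(posedge clk) begin
        if (rst) begin
            dp_src     <= 8'd0;
            phase_read <= 1'b0;
        end else begin
            phase_read <= (state == S_READ);
            if (state == S_LOAD) dp_src <= src_in;
        end
    end
endmodule
```

Because `phase_read` can only go high one edge after `dp_src` was written, anything that reacts to it is guaranteed to see the loaded value.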
I am working on a Verilog design where I am using SRAM inside an FSM. I need to synthesize it later on since I want to fabricate the IC. My question is this: I have fully working code using reg registers, where I use blocking assignments for concurrent operation. Since there is no clock in this system, it works fine. Now I want to replace these registers with SRAM-based memory, which brings a clock into the system. My first thought is to use non-blocking assignments and change the sensitivity list from always @(*) to always @(negedge clk).
In the code snippet below, I want to read 5 sets of data from the SRAM (SR4). So what I do is add a counter (wait_var) that counts to 5 for this to happen. By introducing the additional counter, this code ensures that at the 1st clock edge it enters the counter and at subsequent clock edges the five sets of data are read from the SRAM. This technique works for simple logic such as this.
S_INIT_MEM: begin
// ******Off-Chip (External) Controller will write the data into SR4. Once data is written, init_data will be raised to 1.******
if (init_data == 1'b0) begin
CEN4 <= CEN;
WEN4 <= WEN;
RETN4 <= RETN;
EMA4 <= EMA;
A4 <= A_in;
D4 <= D_in;
end
else begin
CEN4 <= 1'b0; //SR4 is enabled
EMA4 <= 3'b0; //EMA set to 0
WEN4 <= 1'b1; //SR4 set to read mode
RETN4 <= 1'b1; //SR4 RETN is turned ON
A4 <= 8'b0000_0000;
if (wait_var < 6) begin
if (A4 == 8'b0000_0000 ) begin
NUM_DIMENSIONS <= Q4;
A4 <= 8'b0000_0001;
end
if (A4 == 8'b0000_0001 ) begin
NUM_PARTICLES <= Q4;
A4 <= 8'b0000_0010;
end
if (A4 == 8'b0000_0010 ) begin
n_gd_iterations <= Q4;
A4 <= 8'b0000_0011;
end
if (A4 == 8'b0000_0011 ) begin
iterations <= Q4;
A4 <= 8'b0000_0100;
end
if (A4 == 8'b0000_0100 ) begin
threshold_val <= Q4;
A4 <= 8'b0000_0101;
end
wait_var <= wait_var + 1;
end
//Variables have been read from SR4
if(wait_var == 6) begin
CEN4 <= 1'b1;
next_state <= S_INIT_PRNG;
wait_var <= 0;
end
else begin
next_state <= S_INIT_MEM;
end
end
end
However, when I need to write more complex logic in a similar fashion, the counter-based delay method gets too complex. E.g., say I want to read data from one SRAM (SR1) and write it to another SRAM (SR3).
CEN1 = 1'b0;
A1 = ((particle_count-1)*NUM_DIMENSIONS) + (dimension_count-1);
if (CEN1 == 1'b0) begin
CEN3 = 1'b0;
WEN3 = 1'b0;
A3 = ((particle_count-1)*NUM_DIMENSIONS) + (dimension_count-1);
if(WEN3 == 1'b0) begin
D3 = Q1;
WEN3 = 1'b1;
CEN3 = 1'b1;
end
CEN1 = 1'b1;
end
I know this still uses blocking assignments and I need to convert them to non-blocking assignments, but if I do and I do not introduce 1 clock cycle delay manually using counter, it will not work as desired. Is there a way to get around this in a simpler manner?
Any help would be highly appreciated.
The main point is that non-blocking assignments are a simulation-only artifact: they provide a way for simulation to match hardware behavior. If you use them incorrectly, you may end up with simulation races and a mismatch with the hardware, in which case your verification effort is wasted.
There is a set of common practices used in the industry to handle this situation. One is to use non-blocking assignments for the outputs of all sequential devices. This avoids races and ensures that sequential flops and latches pipe data the same way as in real hardware.
Hence, the one-cycle delay caused by non-blocking assignments is a myth. If you design sequential flops where the second one latches the data from the first, then the data will move across the flops sequentially, one flop per cycle:
        clk ------v------------------v
        in1 -> [flop1] -> out1 -> [flop2] -> out2
clk 1    1                 1                  0
clk 3    1                 1                  1
clk 4    0                 0                  1
clk 5    0                 0                  0
In the above example, data is propagated from out1 to out2 on each subsequent clock cycle, which can be expressed in Verilog as
always @(posedge clk)
    out1 <= in1;
always @(posedge clk)
    out2 <= out1;
Or you can combine those:
always @(posedge clk) begin
    out1 <= in1;
    out2 <= out1;
end
So, the task in your design is to cleanly separate sequential logic from combinational logic, and therefore to separate blocks that use blocking assignments from blocks that use non-blocking assignments.
There are cases where blocking assignments can and must be used inside sequential blocks, as mentioned in the comments: when you use temporary variables to simplify expressions inside a sequential block, assuming those variables are never used anywhere else.
Other than the above, never mix blocking and non-blocking assignments in a single always block.
Also, usually due to synthesis methodologies, the use of 'negedge' is discouraged. Avoid it unless your synthesis methodology does not care.
You should browse around for more information and examples of blocking/non-blocking assignments and their use.
I know that in Verilog, to make sequential logic you almost always have to use the non-blocking assignment (<=) in an always block. But does this rule also apply to internal variables? If blocking assignments were used for internal variables in an always block, would that make it combinational or sequential logic?
For example, I'm trying to code a sequential prescaler module. Its output will be a positive pulse of one clk period duration. It'll have a parameter that is the prescaler value (how many clock cycles to divide clk by) and a counter variable to keep track of it.
I have count's assignments as blocking assignments but the output, q, as non-blocking. For simulation purposes, the code works; the output q is just the way I want it to be. If I change the assignments to non-blocking, q only works correctly for the 1st cycle of the parameter length and then stays 0 forever for some reason (this might be because of the way it's coded, but I can't think of another way to code it). So is the code, the way it is right now, behaving as combinational or sequential logic? Is this an acceptable thing to do in the industry? And is it synthesizable?
```
module scan_rate2(q, clk, reset_bar);
//I/O's
input clk;
input reset_bar;
output reg q;
//internal constants/variables
parameter prescaler = 8;
integer count = prescaler;
always @(posedge clk) begin
if(reset_bar == 0)
q <= 1'b0;
else begin
if (count == 0) begin
q <= 1'b1;
count = prescaler;
end
else
q <= 1'b0;
end
count = count - 1;
end
endmodule
```
You should follow the industry practice, which tells you to use non-blocking assignments for all outputs of sequential logic. The only exception is temporary variables that are used to help evaluate complex expressions in sequential logic, provided that they are used only in a single block.
In your case, using a blocking assignment for 'count' will cause a mismatch with the synthesized behavior. Synthesis will create flops for both q and count. However, with the blocking assignment, count is decremented immediately after it is assigned the prescaler value, whereas after synthesis that will happen only in the next cycle.
So, you need a non-blocking assignment. BTW, initializing 'count' within the declaration might work in FPGA synthesis, but does not work in schematic synthesis, so it is better to initialize it differently. Unless I misinterpreted your intent, it should look like the following.
integer count;
always @(posedge clk) begin
    if (reset_bar == 0) begin
        q <= 1'b0;
        count <= prescaler - 1;
    end
    else begin
        if (count == 0) begin
            q <= 1'b1;
            count <= prescaler - 1;
        end
        else begin
            q <= 1'b0;
            count <= count - 1;
        end
    end
end
You do not need temp vars there, but for illustration it can be done as follows:
...
integer tmp;
always ...
else begin
q <= 1'b0;
tmp = count - 1; // you should use blocking here
count <= tmp; // but here you should still use NBA
end
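Putting the corrected fragments together, a complete module might look like the sketch below. It assumes a synchronous, active-low reset_bar that is held low for at least one rising clock edge; the port list and the prescaler parameter are taken from the question.

```verilog
// Complete version of the corrected prescaler (sketch).
module scan_rate2 (q, clk, reset_bar);
    input      clk;
    input      reset_bar;       // synchronous, active-low reset (assumed)
    output reg q;

    parameter prescaler = 8;    // q pulses once every `prescaler` clocks
    integer   count;

    always @(posedge clk) begin
        if (reset_bar == 0) begin
            q     <= 1'b0;
            count <= prescaler - 1;
        end
        else if (count == 0) begin
            q     <= 1'b1;          // one-clock-wide pulse
            count <= prescaler - 1; // reload; takes effect after this edge
        end
        else begin
            q     <= 1'b0;
            count <= count - 1;
        end
    end
endmodule
```

With the default prescaler of 8, q is high for one clock period after every 8th rising edge following the release of reset, and both q and count behave as flops in simulation and after synthesis.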
What does the <= do in Verilog?
For example:
always @(posedge Clock) begin
if (Clear) begin
BCD1 <= 0;
BCD0 <= 0;
end
end
"<=" in Verilog is called the non-blocking assignment. It differs substantially from "=", the blocking assignment, in how simulators schedule events.
It is recommended to use non-blocking assignments for sequential logic and blocking assignments for combinational logic; only then is the correct hardware inferred during synthesis.
Non-blocking assignments in a sequential block infer flip-flops in the actual hardware.
Always remember: do not mix blocking and non-blocking assignments in any sequential or combinational block.
During scheduling process of simulator:
There are four regions and order of execution of commands as follows
1) Active region
--Blocking assignments
--Evaluation of RHS of non-blocking assignments(NBA)
--Continuous assignment
--$display command
--Evaluate input and output of primitives
2) Inactive region
--#0 blocking assignments
3) NBA(non-blocking assignment update)
--update LHS of non-blocking assignments (NBA)
4) Postponed
--$monitor command
--$strobe command
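The region ordering listed above can be observed with a small simulation-only sketch (the module name nba_regions_demo is made up for illustration). $display runs in the active region, before the non-blocking update, while $strobe runs at the end of the time step, after it.

```verilog
// Simulation-only demo of when a non-blocking update becomes visible.
module nba_regions_demo;
    reg [3:0] a;

    initial begin
        a = 4'd1;    // blocking: a is 1 immediately
        a <= 4'd2;   // non-blocking: RHS evaluated now, a updated in the NBA region
        $display("display: a = %0d", a);  // active region, prints 1
        $strobe ("strobe:  a = %0d", a);  // end of time step, prints 2
    end
endmodule
```

Both system tasks are called in the same time step, yet they report different values of a, which is exactly the gap between evaluating the RHS and updating the LHS.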
Using the blocking assignment "=" on variables that are written and read by different blocks in the same time slot causes a race condition.
e.g. Verilog code with a race condition:
always @(posedge Clock)
    BCD0 = 0; // Usage of blocking statements should be avoided
always @(posedge Clock)
    BCD1 = BCD0;
In order to avoid the race condition, use the non-blocking assignment "<=".
e.g.:
always @(posedge Clock)
    BCD0 <= 0; // Recommended to use NBA
always @(posedge Clock)
    BCD1 <= BCD0;
When these blocks are executed, two events are added to the non-blocking assign update queue.
Hence, BCD1 is updated from the old value of BCD0 at the end of the time step.
Using the non-blocking assignment "<=" in a continuous assignment statement is not allowed by the Verilog LRM and will result in a compilation error.
eg:
assign BCD0 <= BCD1; //Results in compilation error
Only use NBA in procedural assignment statements,
- initial and
- always blocks
This is called a 'non-blocking' assignment. The non-blocking assignment allows designers to describe a state-machine update without needing to declare and use temporary storage variables.
For example, in this code, when you use a non-blocking assignment, the new value does not take effect until all the right-hand sides in the block have been evaluated, so the rest of the design only sees it from the next clock cycle on. This means that the order of the assignments is irrelevant: any order produces the same result.
The other assignment operator, '=', is referred to as a blocking assignment. When the '=' assignment is used, for the purposes of logic, the target variable is updated immediately.
To understand this more deeply, please look at this example (from Wikipedia):
module toplevel(clock,reset);
input clock;
input reset;
reg flop1;
reg flop2;
always @(posedge reset or posedge clock)
if (reset)
begin
flop1 <= 0;
flop2 <= 1;
end
else
begin
flop1 <= flop2;
flop2 <= flop1;
end
endmodule
In this example, flop1 <= flop2 and flop2 <= flop1 swap the values of these two regs on every clock. If we used the blocking assignment, =, this wouldn't happen: flop1 would be overwritten first, and both regs would end up holding flop2's old value.
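To see the difference side by side, here is a small sketch that runs the same swap with both operators. The module and signal names (swap_compare, nb1/nb2, bl1/bl2) are made up for illustration, and the blocking version is deliberately bad practice, shown only for comparison.

```verilog
// Same swap written twice: once with "<=", once with "=".
module swap_compare (
    input  wire clock,
    input  wire reset,
    output reg  nb1, nb2,   // updated with non-blocking assignments
    output reg  bl1, bl2    // updated with blocking assignments
);
    always @(posedge clock or posedge reset)
        if (reset) begin
            nb1 <= 0;
            nb2 <= 1;
        end else begin
            nb1 <= nb2;     // both right-hand sides are sampled first,
            nb2 <= nb1;     // so the two values swap
        end

    always @(posedge clock or posedge reset)
        if (reset) begin
            bl1 = 0;
            bl2 = 1;
        end else begin
            bl1 = bl2;      // bl1 is overwritten immediately...
            bl2 = bl1;      // ...so bl2 just gets its own old value back
        end
endmodule
```

After reset both pairs hold (0, 1). After the first rising clock edge the non-blocking pair holds (1, 0), while the blocking pair is stuck at (1, 1) and stays there.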
Since people have already explained the blocking/non blocking situation, I'll just add this here to help with understanding.
" <= " replaces the word "gets" as you read code
For example :
.... //Verilog code here
A<=B //read it as A gets B
When does A get B? Think of everything in hardware as happening in time slots, like a specific sampled event driven by the clock. If the "<=" operator is used in a module with a clock that ticks every 5 ns, imagine A getting B at the end of that time slot, after all the blocking assignments have resolved and at the same time as the other non-blocking assignments.
I know it's confusing; it gets better as you build (and mess up) a bunch of designs and learn how it works that way.
"<=" is the non-blocking assignment operator in Verilog; "=" is the blocking assignment operator.
Consider the following code:
always @(clk)
begin
    a = b;
end
always @(clk)
begin
    b = a;
end
The values of a and b are being exchanged using two different always blocks. Using "=" here causes a race condition, i.e. both variables a and b are being changed at the same time, so the result depends on which block the simulator happens to run first.
Using "<=" avoids the race:
always @(clk)
begin
    a <= b;
end
always @(clk)
begin
    b <= a;
end
Hope I helped too.
<= is a non-blocking assignment. The <= statements execute in parallel. Think of a pipelined architecture, where such assignments are commonly used.
A small example:
// initialise a, b, c with 1, 2 and 3 respectively.
initial begin
    a <= 1;
    b <= 2;
    c <= 3;
end
always @(posedge clock)
begin
    a <= b;
    b <= c;
    c <= a;
end
After the first posedge clock:
a = 2, b = 3, c = 1
After the second posedge clock:
a = 3, b = 1, c = 2
After third posedge clock:
a = 1, b = 2, c = 3
As most have said, "<=" is the non-blocking assignment, widely used for sequential logic design because it models it best.
Here is why:
A non-blocking assignment, usually placed behind an event control (here posedge clock), evaluates its RHS, schedules the update of the LHS for later, and moves on to the next statement in the flow without waiting (modelling sequential logic). A blocking "=" completes its update before the next statement in line executes (modelling combinational logic).
Is there any other construct like always (one that only runs when the sensitive signal changes and won't iterate as long as the signal stays the same) which can be cascaded, separately or within the always, and is synthesizable in Verilog?
While I don't think there's a construct specifically like this in Verilog, there is an easy way to do this. If you do an edge detect on the signal you want to be sensitive to, you can just "if" on that in your always block. Like so:
reg event_detected;
reg [WIDTH-1:0] sensitive_last;
always @(posedge clk) begin
if (sensitive_signal != sensitive_last) begin
event_detected <= 1'b1;
end else begin
event_detected <= 1'b0;
end
sensitive_last <= sensitive_signal;
end
// Then, where you want to do things:
always @(posedge clk) begin
if (event_detected ) begin
// Do things here
end
end
The issue with doing things with nested "always" statements is that it isn't immediately obvious how much logic it would synthesize to. Depending on the FPGA or ASIC architecture you would have a relatively large register and extra logic that would be instantiated implicitly, making things like waveform debugging and gate level synthesis difficult (not to mention timing analysis). In a world where every gate/LUT counts, that sort of implicitly defined logic could become a major issue.
The assign statement is the closest to always that you can get. assign can only be used for continuous assignment. The left-hand side of the assignment must be a wire; SystemVerilog also allows logic.
I prefer the always block over assign. I find simulations perform better when signals that usually update at the same time are grouped together. I believe the optimizer in the synthesizer can do a better job with always, but this might depend on the synthesizer being used.
For synchronous logic you'll need an always block. There is no issue reading hardware switches within the always block. The FPGA board may already de-bounce the input for you. If not, send the input through a two-stage pipeline (two flip-flops in series) before using it in your code. This helps with potential setup/hold problems.
always @(posedge clk) begin
pre_sync_human_in <= human_in;
sync_human_in <= pre_sync_human_in;
end
always @* begin
//...
case( sync_human_in )
0 : // do this
1 : // do that
// ...
endcase
//...
end
always @(posedge clk) begin
//...
if ( sync_human_in==0 ) begin /* do this */ end
else begin /* else do */ end
//...
end
If you want to do a handshake, with the state machine waiting for a human to enter a multi-bit value, then add two states that wait for the input: one that waits for not ready (the stale bit from the previous input) and one that waits for ready:
always @(posedge clk) begin
case(state)
// ...
PRE_HUMAN_IN :
begin
// ...
state <= WAIT_HUMAN_FOR_NOT_READY;
end
WAIT_HUMAN_FOR_NOT_READY :
begin
// ready bit is still high for the last input, wait for not ready
if (sync_human_in[READ_BIT])
state <= WAIT_HUMAN_FOR_NOT_READY;
else
state <= WAIT_HUMAN_FOR_READY;
end
WAIT_HUMAN_FOR_READY :
begin
// ready bit is low, wait for it to go high before proceeding
if (sync_human_in[READ_BIT])
state <= WORK_WITH_HUMAN_INPUT;
else
state <= WAIT_HUMAN_FOR_READY;
end
// ...
endcase
end
I encountered a problem with synthesis where if I had two variables in an if statement, Synthesis will fail (with a very misleading and unhelpful error message).
Given the code snippet below
case(state)
//other states here
GET_PAYLOAD_DATA:
begin
if (packet_size < payload_length) begin
packet_size <= packet_size + 1;
//Code to place byte into ram that only triggers with a toggle flag
next_state = GET_PAYLOAD_DATA;
end else begin
next_state = GET_CHKSUM2;
end
end
I get an error in Xilinx ISE during synthesis:
ERROR:Xst:2001 - Width mismatch detected on comparator next_state_cmp_lt0000/ALB. Operand A and B do not have the same size.
The error claims that next_state isn't correct, but if I take out payload_length and use a static value in its place, it works perfectly fine. As both packet_size and payload_length are of type integer, they are the same size, so that is not the problem. Therefore I assume it's a similar problem to for loops not being implementable in hardware unless they are static loops with a defined end. But if statements should work, as this is just a comparator between 2 binary values.
What I was trying to do here is that when a byte is received by my module, it is added into RAM until the size of the entire payload (which I get from earlier packet data) is reached, and then the machine changes to a different state to handle the checksum. As the data only comes in 1 byte at a time, I re-enter this state multiple times until the counter reaches the limit, then I set the next state to something else.
My question is then, how do I achieve the same results of calling my state and repeat until the counter has reached the length of the payload without the error showing up?
EDIT:
Snippets of how packet_size and payload_length are declared, as requested in comments
integer payload_length, packet_size;
initial begin
//other stuff
packet_size <= 0;
end
always @(posedge clk) begin
//case statements with various states
GET_PAYLOAD_LEN:
begin
if (rx_toggle == 1) begin
packet_size <= packet_size + 1;
addr <= 3;
din <= rx_byte_buffer;
payload_length <= rx_byte_buffer;
next_state = GET_PAYLOAD_DATA;
end else begin
next_state = GET_PAYLOAD_LEN;
end
end
rx_byte_buffer is a register of the input data my module receives as 8 bits wide, while packet_size increments in various other states of the machine prior to the one you see above.
I have gotten around the error by switching the if statement conditionals around, but I still want to understand why that would change anything.
Some errors in the code stick out right away. While fixing them may not solve this problem, they need to be corrected because they will cause differences between simulation and hardware tests.
The next-state logic needs to be in a separate always block that is not triggered by the posedge of the clock. Its sensitivity list needs to include things like "state", or simply be "*". And if you wanted the next-state logic to be registered as it is now (which you don't), you should use a non-blocking assignment. This is described in great detail in the Cummings paper linked below.
http://www.sunburst-design.com/papers/CummingsSNUG2000SJ_NBA_rev1_2.pdf
The code should look something like this:
always @(*) begin
//case statements with various states
GET_PAYLOAD_LEN:
begin
if (rx_toggle == 1) begin
packet_size_en = 1'b1;
//these will need to be changed in a similar manner
addr <= 3;
din <= rx_byte_buffer;
payload_length <= rx_byte_buffer;
/////////////////////////////////////////////////////
next_state = GET_PAYLOAD_DATA;
end else begin
next_state = GET_PAYLOAD_LEN;
end
end
always @(posedge clk) begin
    if (packet_size_en)
        packet_size <= packet_size + 1;
end
Also, the first thing I would try is to give these a defined length by making them of type reg (I assume you won't need a signed number, so it should make no difference in simulation). Outside of generate blocks, you should try not to let synthesis play around with integers.
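As a rough sketch of both suggestions together (explicit widths plus separate next-state logic), the fragment might be restructured as below. Several things here are assumptions rather than the asker's actual design: the 8-bit widths, the state encodings, the synchronous reset rst, the payload_done output, and the simplification that packet_size counts only payload bytes.

```verilog
// Sketch only: widths, encodings, rst and payload_done are assumptions.
module payload_counter (
    input  wire       clk,
    input  wire       rst,             // synchronous reset (assumed)
    input  wire       rx_toggle,       // high for one cycle per received byte
    input  wire [7:0] rx_byte_buffer,
    output reg        payload_done     // hypothetical flag: checksum state reached
);
    localparam GET_PAYLOAD_LEN  = 2'd0,
               GET_PAYLOAD_DATA = 2'd1,
               GET_CHKSUM2      = 2'd2;

    reg [1:0] state, next_state;
    reg [7:0] payload_length;   // same width as the received byte
    reg [7:0] packet_size;      // same width, so the comparator operands match

    // Sequential part: state register and counters, non-blocking only.
    always @(posedge clk) begin
        if (rst) begin
            state          <= GET_PAYLOAD_LEN;
            payload_length <= 8'd0;
            packet_size    <= 8'd0;
        end else begin
            state <= next_state;
            case (state)
                GET_PAYLOAD_LEN:
                    if (rx_toggle) begin
                        payload_length <= rx_byte_buffer;
                        packet_size    <= 8'd0;
                    end
                GET_PAYLOAD_DATA:
                    if (rx_toggle && packet_size < payload_length)
                        packet_size <= packet_size + 8'd1;
                default: ;
            endcase
        end
    end

    // Combinational part: next-state logic, blocking only.
    always @(*) begin
        next_state   = state;
        payload_done = 1'b0;
        case (state)
            GET_PAYLOAD_LEN:  if (rx_toggle) next_state = GET_PAYLOAD_DATA;
            GET_PAYLOAD_DATA: if (packet_size >= payload_length) next_state = GET_CHKSUM2;
            GET_CHKSUM2:      payload_done = 1'b1;
            default:          next_state = GET_PAYLOAD_LEN;
        endcase
    end
endmodule
```

With both operands declared as reg [7:0], the comparator has operands of a known, equal size, so the synthesizer does not have to guess how wide an integer should be.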