Frank Vahid - Digital design - High level state machine write to storage example - state-machine

I am teaching myself VHDL, and took this answer's advice by starting with Frank Vahid's Digital Design (2nd edition).
On page 254 (link) he explains that updates to storage items occur on the next rising clock edge, i.e. at the end of a state's clock cycle. This is explained in figure 5.14 using the "Jreg" storage item, whose behavior I understand.
What I don't understand is why the storage item "P" behaves differently:
1) Its value is already known upon entering state S0 (whereas "Jreg" is not)
2) Upon transitioning to state S1, "P" immediately updates to the value given by state S1, whereas "Jreg" is not updated until the end of that clock cycle
Is there a difference between "Jreg" and "P" I'm not aware of? Is it an error in the example?

It appears that P is a combinational signal (not dependent on a clock) and that Jreg is a sequential register (dependent on a clock). Jreg behaves as a counter (which requires a clock or drive signal of some sort). The example says that the machine waits for an input signal B to go high; once it does, it sets output P high. Then, using Jreg to count the number of clock cycles that have passed since B went high, it holds P high until Jreg reaches a certain count (2), at which point both Jreg and P are reset to 0.
1) Jreg is unknown at the start, and most likely so is P; however, P does not have to wait for a clock tick because it's a combinational signal.
2) Again, Jreg is sequential, so it must wait for a clock tick to change its state.
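A minimal Verilog sketch of the same idea (this is not the book's code; the state encoding, widths, and port names are assumptions) makes the difference concrete. Jreg only changes at a rising clock edge, while P is recomputed from the current state with no clock involved, so it is already valid on entering a state and changes the moment the state does:

module hls_example (
  input  wire       clk, rst, B,
  output wire       P,
  output reg  [7:0] Jreg
);
  localparam S0 = 1'b0, S1 = 1'b1;
  reg state;

  // Sequential: state and Jreg are storage items; writes land on the
  // NEXT rising clock edge, i.e. at the end of the current state's cycle
  always @(posedge clk) begin
    if (rst) begin
      state <= S0;
      Jreg  <= 8'd0;
    end else case (state)
      S0: begin Jreg <= 8'd0;      if (B)         state <= S1; end
      S1: begin Jreg <= Jreg + 1;  if (Jreg == 2) state <= S0; end
    endcase
  end

  // Combinational: P is a pure function of the current state
  assign P = (state == S1);
endmodule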

Related

Understanding Verilog Code with two Clocks

I am pretty new to Verilog,
and I use it to verify some code from a simulation program.
Right now I am struggling with a Verilog code snippet, because the simulation program uses 2 clocks (one system clock and a PLL derived from it) where two hardware components work together and thus synchronize each other:
module something (input sys_clk, input pll_clk, input data);
  reg vid;

  always @(posedge sys_clk)
    vid <= data;

  always @(posedge pll_clk)
    if (vid)
      ; // do something
endmodule
When reading about non-blocking assignments, it says the assignment to the left-hand side is postponed until the other evaluations in the current time step are completed.
Intuitively, I thought this means they are evaluated at the end of the time step. Thus, if data changes from 0 to 1 in sys_clk tick "A", then at the end of "A" and the beginning of the next sys_clk tick this value is in vid, and so only after "A" can the second always block (of pll_clk) read vid = 1.
Is this how it works, or did I miss something?
Thank you :)
In this particular case it means that:
1) If posedge sys_clk and posedge pll_clk happen simultaneously, then vid will not have a chance to update before it gets used in the pll_clk block. So, if vid was '0' before the clock edges (and is updated to '1' in the first block), it will still be '0' in the if statement of the second block. This ordering is guaranteed by the use of the non-blocking assignment in the first block.
2) If the posedges do not happen at the same time, then the value of vid is updated at posedge sys_clk and picked up later, at the following posedge of pll_clk.
In simulation, a non-blocking assignment guarantees that the update itself happens after all the blocks have been evaluated in the current simulation time step. It has nothing to do with the next clock cycle; however, the latter is often used in tutorials to illustrate a particular single-clock situation, which creates confusion.
Also, being simultaneous is a simulation abstraction, meaning that both edges happen in the same simulation time step (or within a certain small time interval in hardware).
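A self-contained sketch (signal names taken from the question; the clock periods are made up) that shows both cases in one simulation run. The two posedges coincide at t=10, where the pll_clk block still prints the old vid; at the next non-coincident pll_clk edge it sees the updated value:

module tb;
  reg sys_clk = 0, pll_clk = 0;
  reg data = 0, vid = 0;

  always #10 sys_clk = ~sys_clk;  // posedges at t=10, 30, ...
  always #2  pll_clk = ~pll_clk;  // posedges at t=2, 6, 10, 14, ... (coincident at t=10)

  // Non-blocking: the RHS is sampled at the edge, but the update of vid
  // is deferred until all currently-active evaluations are done
  always @(posedge sys_clk)
    vid <= data;

  // At the coincident edge (t=10) this still reads the OLD vid (0);
  // at the next pll_clk edge (t=14) it reads the updated value (1)
  always @(posedge pll_clk)
    $display("t=%0t: pll block sees vid=%b", $time, vid);

  initial begin
    data = 1;
    #40 $finish;
  end
endmodule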

correct use and design of gated clock in Verilog

I have a design I am working on in Verilog. In part of my design a counter is incremented by a clock - this occurs a half clock cycle before the output of the counter is latched into a parallel load shift register.
In some circumstances I want to HOLD the counter. To do this I have gated the clock:
assign sync_gated = i_sync || !r_en;
This is combinational logic, but I don't see any issue as there is a full clock cycle (we run at 2MHz) for the output of the counter to settle. A few ns of propagation delay will not cause a problem.
The code synthesises OK, but I get this warning:
y_ctr/sync_gated_inv (y_ctr/sync_gated_inv1:O) | NONE()(y_ctr/r_axis_address_15) | 16 |
x_ctr/sync_gated_inv (x_ctr/sync_gated_inv1:O) | NONE()(x_ctr/r_axis_address_15) | 16 |
(*) These 2 clock signal(s) are generated by combinatorial logic, and XST is not able to identify which are the primary clock signals. Please use the CLOCK_SIGNAL constraint to specify the clock signal(s) generated by combinatorial logic.
Is this bad design? If so, why? Or do I just need to add some constraint here to reassure the compiler?
Thank you.
You can't 'just' gate the clock; you can get all kinds of clocking artifacts.
Look here for how to gate a clock.
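For reference, here is a sketch of the standard latch-based clock gate (an AND-style gate; the module and signal names are mine, not from the question). The level-sensitive latch is transparent only while clk is low, so the enable cannot change the AND input during the high phase, and the gated clock is glitch-free by construction:

module clock_gate (
  input  wire clk,
  input  wire en,
  output wire gated_clk
);
  reg en_latched;

  // Latch: open while clk is low, frozen while clk is high
  always @(clk or en)
    if (!clk)
      en_latched <= en;

  assign gated_clk = clk & en_latched;
endmodule

On an FPGA, though, the usual advice is to avoid gating the clock at all: use the flip-flops' clock-enable inputs (or a dedicated gated-clock buffer such as Xilinx's BUFGCE) so the clock stays on the dedicated routing.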

How does clock gating in RTL design work?

I'm trying to understand how clock gating works in RTL design.
I've an example wave here:
Description:
1st signal is gated_clock
2nd signal is clock_enable
3rd signal is ungated_clock
So there are 3 cycles in this wave (let's say cycles 0, 1, 2). In cycle 0, clock_enable was low and gated_clock was turned off. In cycle 1 clock_enable goes high, and in the next cycle (cycle 2) gated_clock turns on.
Now, during simulation I see some cases where an incoming data received at cycle 1 is properly being registered into the module that is gated by the clock (using gated_clock). It's kinda odd to me and I don't quite understand how it's possible.
The logic is like this:
always_ff #(posedge gated_clock, negedge reset) begin
if (~reset) begin
some_val <= 1'b0;
end else begin
if (in_valid==1'b1 && in_ready==1'b1) begin
some_val <= in_val;
end else begin
some_val <= 1'b0;
end
end
end
So I'm seeing that if in_valid and in_ready were high in cycle 1, then some_val registers the incoming in_val data and it's available in cycle 2. However, in cycle 1 gated_clock was zero. So how did in_val get sampled here? From what I understand, there must be a posedge of gated_clock if we want to flop in_val in cycle 1.
I might be missing some core circuit-level digital design concept. I'd really appreciate any help.
Updated wave:
1st signal is gated_clock
2nd signal is clock_enable
3rd signal is ungated_clock
4th signal is in_valid
5th signal is in_ready
6th signal is in_val
7th signal is some_val
So here you will see that at cycle 0, gated_clock is off but in_valid and in_ready are high. The input data in_val is also high. In the next cycle some_val goes high. So it looks like in_val was captured in cycle 0 even though gated_clock was off.
It's possible there is a glitch on the gated clock that's not showing up on the waveform. You'll need to look at the User Manual of the tool you're using to find out how to record and display glitches. It might also help to see the logic for gating the clock. Is clock_enable assigned using an NBA (<=)?
Your understanding of what is clocked seems off. in_val isn't clocked here (actually, from this snippet, I can't see where it is coming from); it is free to change at will (again, from the point of view of this snippet). At the point that gated_clock goes high, whatever value in_val has at that moment is captured in some_val, and that value is held until gated_clock goes high again (at which point a new value is sampled).
Based on the new waveform, some_val is generated correctly according to the posted RTL. On the very first edge of gated_clock, signals in_valid and in_ready are both high, so some_val goes high in that cycle. On the next edge it toggles back to low because in_valid goes low (and, incidentally, so does in_val).
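To make that concrete, here is a rough self-contained sketch (signal names from the question; the gating logic itself is an assumption, since it wasn't posted; it reuses the latch-based gate sketched in the previous thread). in_val is free to change while gated_clock is off; whatever it holds at the first gated posedge is what lands in some_val:

module tb;
  reg  ungated_clock = 0, clock_enable = 0, reset = 0;
  reg  in_valid = 0, in_ready = 0, in_val = 0;
  reg  en_latched, some_val;
  wire gated_clock;

  always #5 ungated_clock = ~ungated_clock;   // posedges at t=5, 15, 25, ...

  // Assumed latch-based gating (the real gating logic wasn't posted)
  always @(ungated_clock or clock_enable)
    if (!ungated_clock)
      en_latched <= clock_enable;
  assign gated_clock = ungated_clock & en_latched;

  // The flop from the question
  always @(posedge gated_clock or negedge reset)
    if (!reset)
      some_val <= 1'b0;
    else if (in_valid && in_ready)
      some_val <= in_val;
    else
      some_val <= 1'b0;

  initial begin
    $monitor("t=%0t gated_clock=%b some_val=%b", $time, gated_clock, some_val);
    #1 reset = 1;
    in_valid = 1; in_ready = 1; in_val = 1;  // high while the clock is still gated off
    #11 clock_enable = 1;                    // first gated posedge lands at t=15
    #30 $finish;
  end
endmodule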

Verilog and ASM implementation

In the question below, the ASM chart shows that the value of q_next is compared to 0 to proceed to the next state; but before q_next is compared, the value of q is already updated with q_next. So if we compare the value of q with 0, will the results be the same in terms of timing and other parameters?
Also, what should be the types of q_next and q? Should they be reg or wire?
I have attached the screenshots of the ASM chart and the Verilog code. I also don't understand the timing implications of the conditional box (in general, can't we put the output of a conditional box in a separate state which doesn't depend on the output of the conditional box?),
like when in the wait1 state, we check the value of sw and, if true, we decrement the counter, then check whether the counter has reached zero, and then assert db_tick. I want to understand the time flow when we move from wait1, decrement the counter, and assert db_tick. Are there any clock cycles involved between these stages, that is, moving from a state to a conditional box?
Also, in the Verilog code, we use q_load and q_tick to control the counter. Why are these signals used when we could simply control the counter in the states?
Is it done to make sure that the FSM (control path) controls the counter (data path)? Please explain. Thanks in advance.
In the question below, the ASM chart shows that the value of q_next is compared to 0 to proceed to the next state; but before q_next is compared, the value of q is already updated with q_next. So if we compare the value of q with 0, will the results be the same in terms of timing and other parameters?
No. When q_next has a value of 0, q still contains a value of 1 until it's updated on the next positive clock edge. If you check for q==0, you will spend an extra clock cycle in each wait state.
Also, what should be the types of q_next and q? Should they be reg or wire?
Either. reg types (like q_reg) mean they're assigned a value in an always block, while wire types (like q_next) are assigned using the assign statement or as the output of a submodule.
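A minimal sketch of that split, assuming a simple down-counter (the names follow the question's q_reg/q_next convention):

module counter_slice (
  input  wire       clk,
  output reg  [3:0] q_reg        // reg: assigned inside an always block
);
  wire [3:0] q_next;             // wire: driven by a continuous assignment

  assign q_next = q_reg - 4'd1;  // combinational: valid before the clock edge

  always @(posedge clk)
    q_reg <= q_next;             // sequential: q_reg picks up q_next at the edge
endmodule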
I also don't understand the timing implications of the conditional box (in general, can't we put the output of a conditional box in a separate state which doesn't depend on the output of the conditional box?), like when in the wait1 state, we check the value of sw and, if true, we decrement the counter, then check whether the counter has reached zero, and then assert db_tick. I want to understand the time flow when we move from wait1, decrement the counter, and assert db_tick.
Here's the flow of operations for a single clock cycle while in the wait1 state:
1) Is SW==1? If not, do nothing else and go to state zero; those operations will be done on the next cycle.
2) If SW==1, compute q_next, and assign that value to q for the next cycle.
3) Is q_next==0? If not, remain in wait1 for the next cycle and repeat.
4) Otherwise, assert db_tick=1 for this clock cycle, and go to state one.
If you split up the two conditionals into two separate states, counting down to 0 will take twice as long.
Are there any clock cycles involved between these stages, that is, moving from a state to a conditional box?
Based on the diagram, all operations (comparing sw, subtracting from q, etc) within a given state - that is, one of the dotted-line boxes - are performed in a single clock cycle.
Also, in the Verilog code, we use q_load and q_tick to control the counter. Why are these signals used when we can simply control the counter in the states? Is it done to make sure that the FSM (control path) controls the counter (data path)?
You could do it that way too. Just be sure to assign a value to q_next in the default case to prevent latching. Splitting the data path and control path into separate always blocks/assign statements does improve readability though, IMO.
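A sketch of what that split looks like for the counter (the width and reload value are assumptions; q_load and q_tick would come from the FSM's output logic):

module debounce_counter #(parameter N = 4) (
  input  wire         clk,
  input  wire         q_load,    // control path says: reload
  input  wire         q_tick,    // control path says: count down
  output reg  [N-1:0] q_reg
);
  // Data path: the counter knows nothing about the FSM's states
  always @(posedge clk)
    if (q_load)
      q_reg <= {N{1'b1}};        // reload to the maximum count
    else if (q_tick)
      q_reg <= q_reg - 1'b1;     // one step per tick
endmodule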

Which region are continuous assignments and primitive instantiations with #0 scheduled

All the #0-related code examples I have found involve procedural code (i.e. code inside begin-end). What about continuous assignments and primitive instantiations? IEEE 1364 and IEEE 1800 (Verilog and SystemVerilog, respectively) only give a one-line description that I can find (quoting all versions of IEEE 1364, under the section named "The stratified event queue"):
An explicit zero delay (#0) requires that the process be suspended and added as an inactive event for the current time so that the process is resumed in the next simulation cycle in the current time.
I read documents and talked with a few engineers who have been working with Verilog since long before IEEE Std 1364-1995. In summary, the inactive region was a failed solution for synchronizing flip-flops in the presence of Verilog's indeterminate processing order. Verilog later introduced non-blocking assignments (<=), which solved the synchronization problem despite the indeterminate order. The inactive region was left in the scheduler so as not to break legacy code and a few obscure corner cases. Modern guidelines say to avoid using #0 because it creates race conditions and may hinder simulation performance. The performance impact is a don't-care for small designs, but I run huge designs that mix RTL down to transistor-level modules, so even small performance gains add up, and not having to debug a rogue race condition is a time saver.
I've run test cases removing/adding #0 on Verilog primitives in large-scale designs. Some simulators show notable changes, others do not. It is difficult to tell who is doing a better job of following the LRM, or who has the smarter optimizer.
Adding a pre-compile script to remove hard-coded forms of #0 is easy enough. The challenge is with parameterized delays. Do I really need to create generate blocks to avoid the inactive region? It feels like that could introduce more problems than it solves:
generate
  if (RISE > 0 || FALL > 0)
    tranif1 #(RISE, FALL) ipassgate (D, S, G);
  else
    tranif1 ipassgate (D, S, G);

  if (RISE > 0 || FALL > 0 || DECAY > 0)
    cmos #(RISE, FALL, DECAY) i1 (out, in, NG, PG);
  else
    cmos i1 (out, in, NG, PG);

  if (DELAY > 0)
    assign #(DELAY) io = drive ? data : 'z;
  else
    assign io = drive ? data : 'z;
endgenerate
Verilog primitives and continuous assignments have been part of Verilog since the beginning, and I believe parameterized delays have been around longer than the inactive region. I haven't found any documented recommendation or explanation for these conditions. My local network of Verilog/SystemVerilog gurus are all unsure which region it should run in. Is there a detail we are all overlooking, or is it a gray area in the language? If it is a gray area, how do I determine which way it is implemented?
An accepted answer should include a citation to any version of IEEE 1364 or IEEE 1800, or at least a way to do proof-of-concept testing.
This is an easy one. Section 28.16 Gate and net delays of the 1800-2012 LRM as well as section 7.14 Gate and net delays of the 1364-2005 LRM both say
For both gates and nets, the default delay shall be zero when no delay specification is given.
So that means
gateName instanceName (pins);
is equivalent to writing
gateName #0 instanceName (pins);
I'm not sure where the text you quoted came from, but section 4.4.2.3 Inactive events region of the 1800-2012 LRM says
If events are being executed in the active region set, an explicit #0 delay control requires the process to be suspended and an event to be scheduled into the Inactive region of the current time slot so that the process can be resumed in the next Inactive to Active iteration.
The key text is delay control, which is a procedural construct. So #0 as an inactive event only applies to procedural statements.
The problem with procedural #0's is that they move race conditions; they don't eliminate them. Sometimes you have to add multiple serial #0's to move away from a race condition, but you don't always know how many, because another piece of code may also be adding #0's. Just look at the UVM code; it's littered with messy #0's because they did not take the time to code things properly.
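As for proof-of-concept testing, one rough probe (a sketch, not a definitive test) is to compare an implicit-delay continuous assignment against an explicit #0 one. If assign #0 is just a zero gate delay, as 28.16/7.14 imply, then b1 should already match b0 by the time a procedural #0 resumes; if a simulator instead schedules the net update as an inactive event, the ordering of b1's update relative to the resumed process is indeterminate, and the printed values may differ between tools:

module tb;
  reg  a;
  wire b0, b1;

  assign    b0 = a;   // implicit delay: zero gate delay per the LRM
  assign #0 b1 = a;   // explicit #0: equivalent per 28.16/7.14, or inactive?

  initial begin
    a = 0;
    #1 a = 1;
    #0 $display("after procedural #0: b0=%b b1=%b", b0, b1);
    #1 $display("one time unit later: b0=%b b1=%b", b0, b1);
  end
endmodule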
