What is the exact usage of clocking blocks in SystemVerilog, and how do they differ from a normal always @(posedge clk) block?
Some differences, which I know :
A clocking block samples input data in the Preponed region, whereas with a normal always block there is always a chance of a race condition.
A clocking block is not synthesizable, but a normal always @(posedge clk) block is synthesizable.
Still, I am not getting the specific usage of a clocking block, so kindly give your inputs, and correct me if I have mentioned something wrong.
While I haven't done much with clocking blocks, I can provide a basic understanding of their purpose and their primary difference from the always block construct.
It is important to note these constructs are very different and solve very different problems. The always block is really the heart of Verilog and serves as the primary descriptor of logic and registers (I'm kind of lumping together always @*, always_comb, always_latch, always @(posedge clk) and always_ff, because they all do a similar thing, though for different use cases and with several nuances). So, always @(posedge clk) is for describing registers or, more accurately, for describing actions to be taken every time the given signal has a positive edge (just like FFs/registers behave in real circuits). Thus, when the clocking event happens, the code for that block executes.
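As a minimal sketch of that register-description usage (module and signal names here are just illustrative):

```systemverilog
// A 4-bit counter register: the always_ff block runs once per rising
// clock edge, exactly like the flip-flops it describes.
module counter (
  input  logic       clk,
  input  logic       rst_n,
  output logic [3:0] count
);
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n)
      count <= 4'd0;        // asynchronous reset to a known state
    else
      count <= count + 4'd1; // action taken on every posedge of clk
  end
endmodule
```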
Clocking blocks are used to generalize how the timing of events surrounding clock events should behave. In real circuits, you typically have hold time and setup time constraints for each FF in the design. These constraints dictate the limitation on clock frequency for circuits and are important to understand when it comes to designing hazard-free logic circuits. In simulation of HDL code however, recreating these timing paradigms can be annoying and not scalable, especially when dealing with synchronous interfaces between testbench code and design code. As such, SystemVerilog includes the clocking block construct as a way of providing testbenches with a method of easily defining the timing of such interfaces with a defined clock, builtin skew and constructs that allows stimulus in testbenches to be defined by the clock in a nicer way.
When you define a clocking block, you are defining a set of signals to be synchronized to the provided clock with defined skews, so whenever you try to assign inputs or read from outputs, these signals are automatically skewed by the given amount (thus behaving in a more realistic way). Also, with clocking, you can use the ## construct in stimulus and checking blocks to delay events by a certain number of clock cycles (true, you can use @(posedge clk); to do that, but the ## syntax is much cleaner). Ultimately, clocking blocks allow you to build scalable testbenches that include timing information for synchronous interfaces (because the timing information is all in the clocking block). You can find a more complete explanation and examples of clocking blocks here:
https://www.doulos.com/knowhow/sysverilog/tutorial/clocking/
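As a small sketch of what this looks like (the interface and signal names dut_if, req, gnt and data are made up for illustration; the skews are arbitrary):

```systemverilog
// Hypothetical DUT interface with a testbench clocking block.
interface dut_if (input logic clk);
  logic       req;
  logic       gnt;
  logic [7:0] data;

  // Inputs are sampled 1step before the rising edge;
  // outputs are driven 2ns after it.
  clocking cb @(posedge clk);
    default input #1step output #2ns;
    output req, data;
    input  gnt;
  endclocking
endinterface

// In a testbench with a virtual dut_if handle vif, one could then write:
//   vif.cb.req  <= 1'b1;   // takes effect 2ns after the next posedge
//   vif.cb.data <= 8'hA5;
//   ##3;                   // wait three clocking events (with a default clocking)
//   if (vif.cb.gnt) ...    // gnt as sampled just before the edge
```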
The important take-aways are these:
The difference between always @(posedge clk) and clocking blocks is that the former is about describing registers and the latter is about describing the timing of a synchronous interface between a DUT and the testbench.
Thus, the direct comparison you make in your question is not really appropriate. However, to answer your questions directly:
Clocking blocks sample their inputs in the Postponed region of the timestep defined by the input skew (i.e., skew time before the clocking event). As the default input skew is 1step, the sample is done in the Postponed region of the step immediately before the clocking event (which is the same, in terms of value, as the Preponed region of the current step). The outputs are driven in the Re-NBA region, output-skew time after the clocking event (the default output skew is 0, thus the output is driven in the Re-NBA region of the same timestep as the clocking event).
As clocking blocks are for defining a timing model (for synchronous lines) between a DUT and its testbench, they are indeed not synthesizable. They are a testbench construct, much like initial (ignoring a few cases), final, assertions and programs.
To learn more about clocking blocks, read Chapter 14 of IEEE1800-2012. 14.13 talks about input skew and 14.16 talks about output skew.
This question is in the context of FPGA synthesis if that makes any difference. The data sheet (iCE40UP) states that each logic cell has a D-type flop with asynchronous reset and clock enable inputs.
Many verilog tutorials introduce sequential logic with something like:
always @(posedge clk)
begin
    some_reg <= [...]
end
I'm familiar with clocked logic and this makes intuitive sense to me.
Then the very next concepts introduced are usually:
Be careful not to accidentally create a latch, because what you really need is a proper register.
always @(posedge clk or [pos|neg]edge reset)
always @(*)
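To make the latch warning concrete, here is a minimal sketch (signal names sel, d, y_bad, y_good are hypothetical; the two blocks are alternatives, not meant to coexist):

```systemverilog
// The first block accidentally infers a latch: when sel is 0, y_bad is
// never assigned, so it must hold its previous value.
always @(*) begin
  if (sel)
    y_bad = d;      // no else branch -> latch inferred for y_bad
end

// Fix: assign on every path. Using always_comb additionally makes the
// tool warn if a latch would still be inferred.
always_comb begin
  if (sel)
    y_good = d;
  else
    y_good = 1'b0;
end
```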
In Wikipedia I read scary statements like "if the system has a dependence on any continuous inputs then these are likely to be vulnerable to metastable states. [...] If the inputs to an arbiter or flip-flop arrive almost simultaneously, the circuit most likely will traverse a point of metastability."
At the risk of having my question closed for being poorly-formed ... what am I missing?
Is asynchronous reset recommended design practice? What is gained by not treating reset like any other input and having it take effect on the next cycle? Documentation for real chips usually requires that the RST* pin is held low for many clock cycles.
Does having a latch in the design make it asynchronous? How do we ensure proper timing is observed in the presence of a latch driven by something outside the clock domain?
When would anyone ever actually want a latch in a clocked design? Why does Verilog make it so easy to create one accidentally?
Thanks!
Seemingly related questions:
- Verilog D-Flip-Flop not re-latching after asynchronous reset
- What if I used an asynchronous reset, should it be converted into a synchronous one?
Synchronous vs. asynchronous reset has some similarities to the big endian vs. little endian battle for CPUs.
In many cases, both types work equally well.
But there are cases when either type has an advantage over the other.
At situations like powerup or powerdown you may not have a valid clock, but you still need the reset to work to put your system in a known passive state, and avoid dangerous I/O glitches.
Only asynchronous reset can do that.
If your design contains registers which lack reset capability, such as RAM blocks, then using asynchronous reset on the registers feeding adr, data and control signals to the RAM can cause corruption of the RAM content when a reset occurs. So if you need the ability to do a warm reset where RAM content must be preserved: Use synchronous warm reset for the logic closest to the RAM.
Altera and Xilinx add to the confusion by recommending that their customers use only synchronous reset.
Using only synchronous reset can work well on Altera and Xilinx, since both are SRAM based FPGA architectures, so powerup glitches are never a concern.
But if you want to make your design portable to other architectures, such as ASICs or flash FPGAs, then asynchronous reset may be the better default choice.
Regarding your question about metastability caused by asynchronous reset: that is correct, a fully asynchronous reset signal can cause metastability.
That is why you must always synchronize the rising (de-asserting) edge of an active-low asynchronous reset signal.
Only the falling (asserting) edge of the reset can be fully asynchronous.
Synchronizing only the rising edge is done with two flip-flops.
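A common sketch of such a reset bridge looks like this (the module and signal names are illustrative, not a fixed convention):

```systemverilog
// Reset bridge: asserts reset asynchronously, releases it synchronously.
module reset_sync (
  input  logic clk,
  input  logic arst_n,   // raw active-low asynchronous reset
  output logic rst_n     // async assert, synchronous de-assert
);
  logic ff1;
  always_ff @(posedge clk or negedge arst_n) begin
    if (!arst_n) begin
      ff1   <= 1'b0;     // falling edge takes effect immediately
      rst_n <= 1'b0;
    end else begin
      ff1   <= 1'b1;     // rising edge ripples through two flip-flops,
      rst_n <= ff1;      // so de-assertion is aligned to clk
    end
  end
endmodule
```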
Latches: No, you almost never want latches in a clocked design.
Good practice is to let the DRC trigger an error in case a latch is found.
How to handle multi-clock design signals when the clocks are generated from the same source?
For example:
one clock domain is 25MHz
the other one is 100MHz
How can I handle the data bus from 25MHz to 100MHz,
and also 100MHz to 25MHz?
I don't want to use an async FIFO though.
Is there another easy CDC way to handle it?
Case 1: If the source ensures that the edges of the clocks are aligned, there is no need to do anything in the design. A single-bit and multi-bit data have no difference.
Case 2: If the edges are not aligned, but the phase relationship is known, the clocks are still synchronous. The synthesis/STA/P&R tool can calculate the worst cases for timing (e.g. setup/hold) checks. In case there is no violation, no need to do anything again. The most important part here is defining the timing constraints correctly.
Case 3: If the clocks are asynchronous, one solution is carrying an enable signal with the bus. The enable signal is synchronized by a pair of flip-flops. Then the data bits are masked or passed according to the value of synchronized enable signal. This solution is explained here as well as many other solutions and cases.
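A sketch of the Case 3 scheme might look like this (module and signal names are illustrative; the source domain is assumed to hold src_data stable while src_en is asserted):

```systemverilog
// Enable-synchronization CDC: only the single-bit enable crosses the
// domain boundary through a 2-FF synchronizer; the bus is captured
// once the synchronized enable says the data is stable.
module bus_cdc #(parameter W = 8) (
  input  logic         dst_clk,
  input  logic         src_en,        // asserted while src_data is stable
  input  logic [W-1:0] src_data,
  output logic [W-1:0] dst_data
);
  logic en_ff1, en_ff2;
  always_ff @(posedge dst_clk) begin
    {en_ff2, en_ff1} <= {en_ff1, src_en}; // 2-FF synchronizer for the enable
    if (en_ff2)
      dst_data <= src_data;               // pass the bus only when safe
  end
endmodule
```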
It depends on whether the two clocks are synchronous or asynchronous with respect to each other. You can use a 2-FF (or n-FF) synchronizer to eliminate the metastability issue in CDC. Other approaches are a mux-based handshake mechanism and gray-code counters.
If you are sending data from a slower clock domain to a faster one, the fast clock should be at least 1.5 times the slow clock so that every value is reliably sampled.
For a faster clock domain to a slower one, the data must be held stable for at least 1.5 cycles of the slower clock before it can be safely captured.
I am trying to synthesize the Rocket core in Design Compiler using the TSMC28HPM library. The timing is not being met!
Targeted Frequency: 500MHz
Without FPU: Achievable freq. 400MHz
With FPU: Achievable freq. 200MHz
Currently my constraints just has the clock defined.
Are there any timing exceptions for the design ?
What is the scenario assumed/tested to achieve 1 GHz ?
Register retiming is not yet enabled ( Would it push the frequency to 1GHz ? )
Failing paths summary:
Startpoint: RocketTile_1_core/div/divisor_reg_* (rising edge-triggered flip-flop clocked by clk) Endpoint: RocketTile_1_core/div/remainder_reg_* (rising edge-triggered flip-flop clocked by clk) (VIOLATED) -0.76
Startpoint: RocketTile_1_core/div/remainder_reg_* (rising edge-triggered flip-flop clocked by clk) Endpoint: RocketTile_1_core/div/remainder_reg_* (rising edge-triggered flip-flop clocked by clk) (VIOLATED) -0.76
Startpoint: RocketTile_1_HellaCache_1/s2_store_bypass_reg (rising edge-triggered flip-flop clocked by clk) Endpoint: RocketTile_1_core/mem_reg_wdata_reg_* (rising edge-triggered flip-flop clocked by clk) (VIOLATED) -0.60
Startpoint: RocketTile_1_HellaCache_1/d (rising edge-triggered flip-flop clocked by clk) Endpoint: RocketTile_1_core/mem_reg_wdata_reg_* (rising edge-triggered flip-flop clocked by clk) (VIOLATED) -0.60
More failing paths to mem_reg_wdata_reg_*
Startpoint: RocketTile_1_core/mem_ctrl_branch_reg (rising edge-triggered flip-flop clocked by clk) Endpoint: RocketTile_1_dtlb/r_refill_tag_reg_* (rising edge-triggered flip-flop clocked by clk) (VIOLATED) -0.54
Startpoint: uncore_PRCI_1/time_reg_* (rising edge-triggered flip-flop clocked by clk) Endpoint: uncore_PRCI_1/time_reg_* (rising edge-triggered flip-flop clocked by clk) (VIOLATED) -0.52
Startpoint: uncore_outmemsys/l1tol2net/acqNet/arb/T_1236_reg_* (rising edge-triggered flip-flop clocked by clk) Endpoint: uncore_outmemsys/L2BroadcastHub_1/BufferedBroadcastAcquireTracker_2_1/data_buffer_4_reg_* (rising edge-triggered flip-flop clocked by clk) (VIOLATED) -0.51
Most of the violations are from T_1236_reg_*
Retiming of the FPU is mandatory - it's described combinationally and padded out with a parameterizable number of registers.
I also recommend playing with the other parameters to see if you can find a more favorable setup (TLB entries, BTB entries, etc.). Remove ISA extensions like the div unit and FPU since those are showing up in your critical paths. Also be aware that the uncore/L2 should probably be placed in its own clock domain.
However, since Rocket has reached >1.5GHz with full ISA support in IBM 45nm, I'm surprised that you aren't reaching 500 MHz.
There are many areas you need to consider for logic synthesis.
You're using DC; are you using a physical flow with a floorplan?
A proper physical flow will give you more accurate modelling than generic wireload models... else you might be synthesising something that ends up being unimplementable at P&R.
You make no mention of your clock tree design... what clock uncertainty are you using to emulate the effects of your (eventual) physical clock tree? There will be some skew!
I am not familiar with the TSMC library you are using; does it offer a variety of VT cells, e.g. low-VT for fast logic at the sacrifice of leakage current?
In a spread-VT setup you'd expect to see the number of low-VT cells increase as the clock speed spec increases.
You may be using all regular- or high-VT cells, which would give you a slower potential design speed!
It's always worth keeping an eye on heavily loaded cells, and areas of the architecture that bloat in area as the clock speed spec increases.
Logic bloat (increased area) is a classic sign of struggling timing closure and will lead to problems at P&R
Are you inserting DFT yet? Bear in mind, if you aren't, that DC may be using scan cells already to achieve timing closure; this will lead to DFT problems down the line when you DO try to insert scan.
For a more informed answer, a complete timing report would be needed, showing all of the cells between the timing start and end points.
As already mentioned in other answers, pipelining of the data path will be critical, as probably will be memory timings.
Caution is always advised with a non-physical synthesis flow, you should always consider a PPA study.
Good luck
We know that program blocks are used in SystemVerilog to avoid race conditions between the DUT and the testbench. What did verification engineers do before SystemVerilog came into the picture? I can only think of using handshake signals.
You use the same semantics that designers use to prevent race conditions in RTL: Non-blocking assignments, or alternative clock edges.
Program blocks are an unnecessary construct in SystemVerilog. See http://go.mentor.com/programblocks
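As a sketch of the nonblocking-assignment approach (signal names clk and din are illustrative):

```systemverilog
// Testbench stimulus using nonblocking assignments on the same edge as
// the DUT. Both the DUT and the testbench sample the old values first
// and update afterwards (in the NBA region), so there is no race.
// Driving on the opposite clock edge is the other classic technique.
initial begin
  @(posedge clk);
  din <= 8'h3C;  // nonblocking: the DUT still samples the old din this edge
  @(posedge clk);
  din <= 8'h7E;
end
```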
You can avoid race conditions without using a program block.
A race condition arises when two processes try to access the same signal at the same simulation time.
If the two accesses happen at different times (or in different scheduling regions), the race is removed.
Code written in Verilog or SystemVerilog actually executes in different scheduling regions of each time step, such as the Active region and the Reactive region.
Race conditions can be removed using the following:
(1) Program blocks
(2) Clocking blocks
(3) Nonblocking assignments
Before program blocks and clocking blocks existed, race conditions were removed using nonblocking assignments.
As explained above, the statements in Verilog or SystemVerilog code do not all execute at the same moment; there are different regions in which specific constructs are executed by the tool.
Here I will mainly talk about the Active and Reactive regions.
The Active region executes continuous assignments, blocking assignments, and the right-hand-side evaluation of nonblocking assignments (the left-hand-side updates happen later, in the NBA region).
The Reactive region executes program-block code.
The Active region set is evaluated before the Reactive region set.
So before program blocks, verification engineers removed race conditions by taking care of these regions of execution themselves.
SystemVerilog also added several other regions, such as the Preponed, Observed and Postponed regions.
Why do we use posedge clk in our designs, when negedge clk could mostly be used for flip-flops as well? And would negedge clk give lower power?
Clarify one thing for me: what is the difference between posedge, negedge and event-based clk triggering, and what is the internal mechanism behind them? Give me some applications of where we actually use each type of triggering mechanism.
Let us take below examples
initial clk = 0;
always
  #5 clk = ~clk; // clock starting from 0

initial clk = 1;
always
  #5 clk = ~clk; // clock starting from 1
What is the difference between these two programs? Will there be any change in how the clk triggers the circuit?
Normally, designs work with rising edges (posedge). Falling edges (negedge) are needed for:
multi-cycle paths
a generic DDR description
other special I/O protocols
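As a sketch of the DDR case (signal names are illustrative; real designs would use a dedicated DDR output primitive rather than this behavioral mux):

```systemverilog
// Generic DDR output: one value launched on each clock edge, so data
// toggles at twice the clock rate on ddr_out.
always_ff @(posedge clk) q_rise <= d_rise;  // rising-edge register
always_ff @(negedge clk) q_fall <= d_fall;  // falling-edge register
assign ddr_out = clk ? q_rise : q_fall;     // select per half-cycle
```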
I think this is a stereotype:
In Europe, a clock starts with a high period followed by low, whereas in America the clock starts with low followed by a high period.
=> It's a question of defining the clock.
I'm not aware of any power savings from using negedge.