Why do verilog tutorials commonly make reset asynchronous? - verilog

This question is in the context of FPGA synthesis if that makes any difference. The data sheet (iCE40UP) states that each logic cell has a D-type flop with asynchronous reset and clock enable inputs.
Many verilog tutorials introduce sequential logic with something like:
always @(posedge clk)
begin
    some_reg <= [...]
end
I'm familiar with clocked logic and this makes intuitive sense to me.
Then the very next concepts introduced are usually:
Be careful to not accidentally create a latch because what you really need is a proper register.
always @(posedge clk or [pos|neg]edge reset)
always @(*)
In Wikipedia I read scary statements like "if the system has a dependence on any continuous inputs then these are likely to be vulnerable to metastable states. [...] If the inputs to an arbiter or flip-flop arrive almost simultaneously, the circuit most likely will traverse a point of metastability."
At the risk of having my question closed for being poorly-formed ... what am I missing?
Is asynchronous reset recommended design practice? What is gained by not treating reset like any other input and having it take effect on the next cycle? Documentation for real chips usually requires that the RST* pin is held low for many clock cycles.
Does having a latch in the design make it asynchronous? How do we ensure proper timing is observed in the presence of a latch driven by something outside the clock domain?
When would anyone ever actually want a latch in a clocked design? Why does verilog make it so easy to create one accidentally?
Thanks!
Seemingly related questions:
- Verilog D-Flip-Flop not re-latching after asynchronous reset
- What if I used Asynchronous reset, Should I have to make as synchronous turned it?

Synchronous vs. asynchronous reset has some similarities to the big endian vs. little endian battle for CPUs.
In many cases, both types work equally well.
But there are cases when either type has an advantage over the other.
In situations like power-up or power-down you may not have a valid clock, but you still need the reset to work to put your system in a known passive state and avoid dangerous I/O glitches.
Only asynchronous reset can do that.
If your design contains registers that lack reset capability, such as RAM blocks, then using asynchronous reset on the registers feeding the address, data, and control signals to the RAM can corrupt the RAM contents when a reset occurs. So if you need the ability to do a warm reset where RAM contents must be preserved: use a synchronous warm reset for the logic closest to the RAM.
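As a minimal sketch of the two styles (signal names are mine, not from any particular design):
// Asynchronous reset: takes effect immediately, even with no valid clock.
always @(posedge clk or negedge rst_n)
begin
    if (!rst_n)
        some_reg <= 1'b0;
    else
        some_reg <= next_value;
end

// Synchronous warm reset for logic feeding a RAM: takes effect only on
// a clock edge, so it cannot corrupt the RAM contents between edges.
always @(posedge clk)
begin
    if (!warm_rst_n)
        ram_addr <= 0;
    else
        ram_addr <= next_addr;
end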
Altera and Xilinx add to the confusion by recommending that their customers use only synchronous reset.
Using only synchronous reset can work well on Altera and Xilinx parts, since both are SRAM-based FPGA architectures, so power-up glitches are never a concern.
But if you want to make your design portable to other architectures, such as ASICs or flash FPGAs, then asynchronous reset may be the better default choice.
Regarding your question about metastability caused by asynchronous reset: that is correct, a fully asynchronous reset signal can cause metastability.
That is why you must always synchronize the rising edge of an active-low asynchronous reset signal.
Only the falling edge (the assertion) of the reset can be fully asynchronous.
Synchronizing only the rising edge (the release) is done with two flip-flops.
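A minimal sketch of such a reset synchronizer (names are illustrative):
// Reset asserts asynchronously (the async clear passes straight through),
// but releases only on a clock edge: a '1' shifts through two flip-flops,
// so the rising edge seen by the rest of the design is synchronous.
reg [1:0] rst_pipe;
always @(posedge clk or negedge async_rst_n)
    if (!async_rst_n)
        rst_pipe <= 2'b00;
    else
        rst_pipe <= {rst_pipe[0], 1'b1};
wire rst_n = rst_pipe[1];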
Latches: No, you almost never want latches in a clocked design.
Good practice is to let the DRC trigger an error in case a latch is found.
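For example, an incomplete if in combinational code is enough to infer a latch (a sketch, not from the answer above):
// Latch inferred: q must hold its old value when en is 0, so the tool
// has to create a level-sensitive latch.
always @(*)
    if (en)
        q = d;

// No latch: q is assigned on every path through the block.
always @(*)
    if (en)
        q = d;
    else
        q = 1'b0;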

Related

Grayzone between blocking and non-blocking I/O?

I am familiar with programming according to the two paradigms, blocking and non-blocking, on the JVM (Java NIO, Scala/Akka).
However, I see a kind of gray zone in between that confuses me.
Look at any non-blocking program of your choice: it is full of blocking statements!
For example, each assignment of a variable is a blocking operation that waits for CPU registers and memory reads to succeed.
Moreover, non-blocking programs even contain blocking statements that carry out computations on complex in-memory collections, without violating the non-blocking paradigm.
In contrast to that, the non-blocking paradigm would clearly be violated if we called some external web service in a blocking way to receive its result.
But what is in between these extremes? What about reading/writing a tiny file, a local socket, or making an API call to an embedded data storage engine (such as SQLite, RocksDB, etc.)? Is it OK to do blocking reads/writes to these APIs? They usually give strong timing guarantees in practice (say << 1 ms as long as the OS is not stalled), so there is almost no practical difference from pure in-memory access. As a precise example: is calling RocksDB's get/put within an Akka Actor considered inadvisable blocking I/O?
So, my question is whether there are rules of thumb or precise criteria that help me decide whether I may stick to a simple blocking statement in my non-blocking program, or whether I should wrap such a statement in non-blocking boilerplate (framework-dependent, e.g., outsourcing such calls to a separate thread pool, nesting one step deeper in a Future or Monad, etc.).
for example, each assignment of a variable is a blocking operation that waits for CPU registers and memory reads to succeed
That's not really what is considered "blocking". Those operations are constant time, and that constant is very low (a few cycles in general) compared to the latency of any IO operations (anywhere between thousands and billions of cycles) - except for page faults due to swapped memory, but if those happen regularly you have a problem anyway.
And if we want to get all nitpicky, individual instructions do not fully block a CPU thread as modern CPUs can reorder instructions and execute ones that have no data dependencies out of order while waiting for memory/caches or other more expensive instructions to finish.
Moreover, non-blocking programs even contain blocking statements that carry out computations on complex in-memory collections, without violating the non-blocking paradigm.
Those are not considered as blocking the CPU from doing work. They should not even block user interactivity if they are correctly designed to present the results to the user when they are done without blocking the UI.
Is it ok to do blocking reads/writes to these APIs?
That always depends on why you are using non-blocking approaches in the first place. What problem are you trying to solve? Maybe one API warrants a non-blocking approach while the other does not.
For example, most file IO methods are nominally blocking, but writes without fsync can be very cheap, especially if you're not writing to spinning rust, so it can be overkill to avoid those methods on your compute threadpool. On the other hand, one usually does not want to block a thread in a fixed threadpool while waiting for a multi-second database query.

same source, different clk frequency (multi-clock design)

How do I handle signals in a multi-clock design where the clocks are generated from the same source?
For example,
one clock domain is 25 MHz
the other one is 100 MHz
How can I handle the data bus from 25 MHz to 100 MHz,
and also 100 MHz to 25 MHz?
I don't want to use an async FIFO, though.
Is there any other easy CDC way to handle it?
Case 1: If the source ensures that the edges of the clocks are aligned, there is no need to do anything in the design. Single-bit and multi-bit data make no difference here.
Case 2: If the edges are not aligned, but the phase relationship is known, the clocks are still synchronous. The synthesis/STA/P&R tool can calculate the worst cases for timing (e.g. setup/hold) checks. In case there is no violation, no need to do anything again. The most important part here is defining the timing constraints correctly.
Case 3: If the clocks are asynchronous, one solution is carrying an enable signal with the bus. The enable signal is synchronized by a pair of flip-flops. Then the data bits are masked or passed according to the value of the synchronized enable signal. This solution is explained here, as well as many other solutions and cases.
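A minimal sketch of that enable-synchronizer scheme, assuming the source domain holds data_src stable while en_src is asserted (names are mine):
reg [1:0] en_sync;
reg [7:0] data_dst;
always @(posedge clk_dst) begin
    // Two flip-flops resolve metastability on the single-bit enable.
    en_sync <= {en_sync[0], en_src};
    // Pass the bus only once the synchronized enable arrives;
    // data_src must be held stable in the source domain until then.
    if (en_sync[1])
        data_dst <= data_src;
end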
It depends on whether the two clocks are synchronous or asynchronous with respect to each other. You can use a two-flop (or n-flop) synchronizer to eliminate the metastability issue in CDC. Other approaches are a mux-based handshake mechanism and gray-code counters.
If you are sending data from the slower clock domain to the faster clock domain through such a synchronizer, the fast clock should be at least 1.5 times the slow clock.
For the faster-to-slower direction, the data from the fast domain must be held stable for at least 1.5 slow-clock periods, otherwise the slow domain can miss it entirely (see the toggle-synchronizer sketch below).
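One common way to satisfy that for single-cycle events is a toggle synchronizer, sketched here with illustrative names:
// Fast domain: each event flips a level, which is easy to capture.
reg toggle_src = 1'b0;
always @(posedge clk_fast)
    if (pulse_src)
        toggle_src <= ~toggle_src;

// Slow domain: synchronize the level, then detect its edges to recover
// one pulse per event; events must be spaced far enough apart for the
// slow domain to see each toggle.
reg [2:0] toggle_dst;
always @(posedge clk_slow)
    toggle_dst <= {toggle_dst[1:0], toggle_src};
wire pulse_dst = toggle_dst[2] ^ toggle_dst[1];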

Avoiding race conditions without using program blocks in systemverilog

We know that program blocks are used in SystemVerilog to avoid race conditions between the DUT and the testbench. What did verification engineers do before SystemVerilog came into the picture? I can only think of using handshake signals.
You use the same semantics that designers use to prevent race conditions in RTL: Non-blocking assignments, or alternative clock edges.
Program blocks are an unnecessary construct in SystemVerilog. See http://go.mentor.com/programblocks
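As a minimal sketch of the race and its fix (hypothetical testbench code):
// Race-prone: the blocking assignment updates stim immediately, so
// whether the DUT sees the old or new value in the same time step
// depends on process ordering.
always @(posedge clk)
    stim = next_stim;

// Race-free: the non-blocking assignment defers the update to the NBA
// region, after every @(posedge clk) process has sampled the old value.
always @(posedge clk)
    stim <= next_stim;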
You can avoid race conditions without using program blocks.
A race condition is created when two expressions or assignments try to access the same signal at the same simulation time.
If the two accesses happen at different points within the time step, the race condition goes away.
Code written in Verilog or SystemVerilog executes in different scheduling regions within each time step, such as the Active region and the NBA region.
Race conditions can be removed using the following things:
(1) Program blocks
(2) Clocking blocks
(3) Non-blocking assignments
Before program blocks and clocking blocks existed, race conditions were removed using non-blocking assignments.
As stated above, the statements in Verilog or SystemVerilog code do not all execute at the same point in the time step; each kind of construct is executed by the tool in a specific region.
Here I mainly talk about the Active and NBA regions.
The Active region executes continuous assignments and blocking assignments, and evaluates the right-hand sides of non-blocking assignments.
The NBA region is where the left-hand sides of non-blocking assignments are updated.
The Active region is evaluated before the NBA region, which is why a non-blocking write cannot race with a read in the same time step.
So before program blocks existed, verification engineers took care of these regions of execution to remove race conditions.
SystemVerilog adds many other regions, such as the Preponed, Observed, Reactive, and Postponed regions (program block code executes in the Reactive region).

Usage of Clocking Blocks in Systemverilog

What is the exact usage of clocking blocks in SystemVerilog, and how do they differ from a normal always @(posedge clk) block?
Some differences, which I know :
A clocking block samples input data from the Preponed region, whereas in a normal always block there is always a chance of a race condition.
A clocking block is not synthesizable, but a normal always @(posedge clk) block is synthesizable.
Still, I am not getting the specific usage of clocking blocks, so kindly give your inputs, and correct me if I have mentioned something wrong.
While I haven't done much with clocking blocks, I can provide a basic understanding of their purpose and their primary difference from the always block construct.
It is important to note these constructs are very different and solve very different problems. The always block is really the heart of Verilog and serves as the primary descriptor of logic and registers (I'm kind of lumping together always @*, always_comb, always_latch, always @(posedge clk) and always_ff, because they all do a similar thing, though for different use cases and with several nuances). So, always @(posedge clk) is for describing registers or, more accurately, describing actions to be taken every time the given signal has a positive edge (just like FFs/registers behave in real circuits). Thus, when the clocking event happens, the code for that block executes.
Clocking blocks are used to generalize how the timing of events surrounding clock events should behave. In real circuits, you typically have hold-time and setup-time constraints for each FF in the design. These constraints dictate the limit on clock frequency and are important to understand when it comes to designing hazard-free logic circuits. In simulation of HDL code, however, recreating these timing paradigms can be annoying and does not scale, especially when dealing with synchronous interfaces between testbench code and design code. As such, SystemVerilog includes the clocking block construct as a way of providing testbenches with a method for easily defining the timing of such interfaces relative to a defined clock, with built-in skews and constructs that allow stimulus in testbenches to be defined by the clock in a nicer way.
When you define a clocking block, you are defining a set of signals to be synchronized to the provided clock with defined skews, so whenever you try to assign inputs or read from outputs, these signals are automatically skewed by the given amount (thus behaving in a more realistic way). Also, with clocking, you can use the ## construct in stimulus and checking blocks to delay events by a certain number of clock cycles (true, you can use @(posedge clk); to do that, but the ## syntax is much cleaner). Ultimately, clocking blocks allow you to build scalable testbenches that include timing information for synchronous interfaces (because the timing information is all in the clocking block). You can find a more complete explanation and examples of clocking blocks here:
https://www.doulos.com/knowhow/sysverilog/tutorial/clocking/
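A minimal sketch of a clocking block in a testbench (names and skews are illustrative, not from the link above):
module tb;
    logic clk = 0;
    logic dut_out;   // would be driven by the DUT in a real bench
    logic dut_in;
    always #5 clk = ~clk;

    default clocking cb @(posedge clk);
        default input #1step output #2;  // sampling and driving skews
        input  dut_out;
        output dut_in;
    endclocking

    initial begin
        cb.dut_in <= 1'b1;  // driven 2 time units after the clock edge
        ##3;                // wait three clock cycles via the default clocking
        $display("sampled dut_out = %b", cb.dut_out);
        $finish;
    end
endmodule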
The important take-aways are these:
The difference between always @(posedge clk) and clocking blocks is that the former is about describing registers and the latter is about describing the timing of a synchronous interface between a DUT and the testbench.
Thus, the direct comparison you make in your question is not really appropriate. However, to answer your questions directly:
Clocking blocks sample their inputs in the Postponed region of the time step defined by the input skew (i.e., skew time before the clocking event). As the default is 1step, the sample is done in the Postponed region of the previous time step before the clocking event (which, in terms of value, is the same as the Preponed region of the current step). The outputs are driven in the Re-NBA region, skew time steps after the clocking event (the default output skew is 0, thus the output is driven in the Re-NBA region of the same time step as the clocking event).
As clocking blocks are for defining a timing model (for synchronous lines) between a DUT and its testbench, they are indeed not synthesizable. They are a testbench construct, much like initial (ignoring a few cases), final, assertions and programs.
To learn more about clocking blocks, read Chapter 14 of IEEE 1800-2012; Section 14.13 covers input skew and 14.16 covers output skew.

Why is using zero timing (#0) in verilog not good practice?

Why is using zero timing not good?
I can't find details about what problems will arise if we use that method.
Designers are often tempted to use #0 to avoid race conditions between two procedural blocks. A #0 in a procedural block forces that block to stop and be rescheduled after all other blocks. The problem happens when you have multiple blocks that all want to execute last. Who should win?
This itself can become a new race condition, and its resolution can vary from run to run and from simulator to simulator. In short, multiple threads using #0 delays can cause non-deterministic execution behavior.
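A tiny sketch of the problem (hypothetical code):
module zero_delay_race;
    integer x;
    // Both blocks reschedule themselves into the Inactive region, each
    // hoping to run "last" -- the standard does not define which wins.
    initial #0 x = 1;
    initial #0 x = 2;
    initial #1 $display("x = %0d", x);  // may print 1 or 2
endmodule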
Besides, it makes your code hard to read and non-synthesizable. SystemVerilog has provided new constructs for avoiding #0 in a more predictable and readable way. Here is one example (see 7.2 Event trigger race conditions).
Note that there are cases, beyond the classic usage of #0, where you may actually need #0 in SystemVerilog. For example, deferred assertions.
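For reference, a deferred immediate assertion is written with #0 after the assert keyword (a sketch; sel is a hypothetical signal):
// The check is re-evaluated at the end of the time step, so transient
// glitches while combinational logic settles do not raise errors.
always_comb begin
    a_sel_onehot: assert #0 ($onehot0(sel))
        else $error("sel not one-hot: %b", sel);
end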
