same source, different clock frequency (multi-clock design) - Verilog

How should I handle signals crossing between clock domains when both clocks are generated from the same source?
For example,
one clock domain runs at 25 MHz
and the other at 100 MHz.
How can I handle a data bus going from 25 MHz to 100 MHz,
and also from 100 MHz to 25 MHz?
I'd rather not use an asynchronous FIFO, though.
Is there another simple CDC technique to handle it?

Case 1: If the source guarantees that the clock edges are aligned, nothing special is needed in the design; single-bit and multi-bit data are handled identically.
Case 2: If the edges are not aligned but the phase relationship is known, the clocks are still synchronous. The synthesis/STA/P&R tools can compute the worst case for the timing (setup/hold) checks. If there are no violations, again nothing special is needed. The most important part here is defining the timing constraints correctly.
Case 3: If the clocks are asynchronous, one solution is to carry an enable signal along with the bus. The enable is synchronized through a pair of flip-flops, and the data bits are then masked or passed according to the value of the synchronized enable (a sketch follows). This solution, as well as many others, is explained in detail in the CDC literature.
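Here is a minimal Verilog sketch of that enable-qualified crossing, assuming the source domain holds the bus stable for several destination-clock cycles while the enable is asserted (the module and signal names below are made up for illustration):

module cdc_bus_enable #(
    parameter WIDTH = 8
) (
    input  wire             clk_dst,
    input  wire             rst_n_dst,
    input  wire             en_src,      // enable generated in the source domain
    input  wire [WIDTH-1:0] data_src,    // bus held stable while en_src is high
    output reg  [WIDTH-1:0] data_dst,
    output reg              en_dst
);
    reg en_meta, en_sync;

    always @(posedge clk_dst or negedge rst_n_dst) begin
        if (!rst_n_dst) begin
            en_meta  <= 1'b0;
            en_sync  <= 1'b0;
            en_dst   <= 1'b0;
            data_dst <= {WIDTH{1'b0}};
        end else begin
            // Two-flop synchronizer for the single-bit enable.
            en_meta <= en_src;
            en_sync <= en_meta;
            en_dst  <= en_sync;
            // Capture the bus only when the synchronized enable is seen;
            // by then the (stable) data bits are safe to sample.
            if (en_sync)
                data_dst <= data_src;
        end
    end
endmodule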

It depends on whether the two clocks are synchronous or asynchronous with respect to each other. You can use a 2-flop (or n-flop) synchronizer to reduce the chance of metastability propagating across the CDC boundary. Other approaches are a MUX-based handshake mechanism and a Gray-code counter (sketched below).
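For the Gray-code approach, a rough sketch (illustrative names, not a production module): because consecutive Gray codes differ in only one bit, a plain two-flop synchronizer can never pick up an inconsistent mix of old and new bits.

module cdc_gray_counter #(
    parameter WIDTH = 4
) (
    input  wire             clk_src,
    input  wire             clk_dst,
    input  wire             rst_n,          // shared reset, kept simple for the sketch
    input  wire             inc_src,        // increment request in the source domain
    output wire [WIDTH-1:0] count_dst_bin   // counter value as seen in the dest. domain
);
    reg  [WIDTH-1:0] bin_src, gray_src;
    reg  [WIDTH-1:0] gray_meta, gray_sync;

    wire [WIDTH-1:0] bin_next = bin_src + 1'b1;

    // Source domain: binary counter plus binary-to-Gray conversion.
    always @(posedge clk_src or negedge rst_n) begin
        if (!rst_n) begin
            bin_src  <= {WIDTH{1'b0}};
            gray_src <= {WIDTH{1'b0}};
        end else if (inc_src) begin
            bin_src  <= bin_next;
            gray_src <= bin_next ^ (bin_next >> 1);
        end
    end

    // Destination domain: two-flop synchronizer on the Gray-coded value.
    always @(posedge clk_dst or negedge rst_n) begin
        if (!rst_n) begin
            gray_meta <= {WIDTH{1'b0}};
            gray_sync <= {WIDTH{1'b0}};
        end else begin
            gray_meta <= gray_src;
            gray_sync <= gray_meta;
        end
    end

    // Gray-to-binary conversion for use in the destination domain.
    genvar i;
    generate
        for (i = 0; i < WIDTH; i = i + 1) begin : g2b
            assign count_dst_bin[i] = ^(gray_sync >> i);
        end
    endgenerate
endmodule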

If you are sending data from the slower clock domain to the faster one, the fast clock should be at least 1.5 times the slow clock so a simple synchronizer can capture the data reliably.
For the faster-to-slower direction, the data from the fast domain must be held stable for at least 1.5 periods of the slow clock, for example by stretching it or by using a toggle/handshake scheme (sketched below).
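One common way to satisfy the fast-to-slow requirement is a toggle synchronizer; a rough sketch follows (names are illustrative, and it assumes input pulses are spaced several slow-clock periods apart):

module cdc_pulse_toggle (
    input  wire clk_fast,
    input  wire clk_slow,
    input  wire rst_n,
    input  wire pulse_fast,   // single-cycle pulse in the fast domain
    output wire pulse_slow    // single-cycle pulse in the slow domain
);
    reg toggle_fast;
    reg sync_meta, sync_slow, sync_slow_d;

    // Fast domain: flip a level on every input pulse, so the information
    // survives no matter how slow the receiving clock is.
    always @(posedge clk_fast or negedge rst_n) begin
        if (!rst_n)          toggle_fast <= 1'b0;
        else if (pulse_fast) toggle_fast <= ~toggle_fast;
    end

    // Slow domain: synchronize the level, then detect its edges.
    always @(posedge clk_slow or negedge rst_n) begin
        if (!rst_n) begin
            sync_meta   <= 1'b0;
            sync_slow   <= 1'b0;
            sync_slow_d <= 1'b0;
        end else begin
            sync_meta   <= toggle_fast;
            sync_slow   <= sync_meta;
            sync_slow_d <= sync_slow;
        end
    end

    assign pulse_slow = sync_slow ^ sync_slow_d;  // one slow-clock pulse per toggle
endmodule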

Related

How do I test and / or benchmark traditional Linux Kernel vs Linux Kernel with RT Preempt patch?

I am working on a project to contrast and observe the performance gain with the Preempt RT patch for Linux.
What kind of C programs should I run on the two different kernels to gain a good understanding of the benefits that the Preempt RT patch offers?
Looking for suggestions on the programs.
To compare/demonstrate the scheduling characteristics specifically, perhaps implement a system where:
An interrupt is generated via a digital input IN.
The interrupt handler passes the input event via a semaphore to a high-priority user process.
The user process, on receipt of the semaphore, creates a (say) 10 ms pulse on a digital output OUT.
Then:
Drive IN with a series of pulses from a signal generator
Attach an oscilloscope to IN and OUT.
Trigger the scope on the active (interrupt generating) edge of IN
Measure the time and variance between the interrupt-edge on IN and the start of the pulse on OUT.
Trigger the scope on the rising edge of the pulse on OUT.
Measure the length and variance of the pulse width.
Most modern scopes have a "persistence" feature where the trace is not cleared between sweeps. That is useful for measuring the variance.
If you lack a scope or a signal/function generator, you could use a switch instead, and take software timestamps in the ISR and in the user process to log event times. But you would need to ensure in the user task that no preemption occurs between capturing the time and setting the OUT state (by using a critical section), and you will likely need to debounce the switch. That, in this case, would simply be a matter of not setting the semaphore if the last event timestamp was less than, say, 20 ms ago.
If PREEMPT-RT is doing its job, the tests should exhibit lower latency, greater precision and less variance than with the default scheduler regardless of the load of other (lower priority) processes running. If that still does not meet your requirements you may need a real RTOS.
If this characteristic is not what your application requires, then you may not need or benefit from PREEMPT-RT, and inappropriate allocation of process priorities or poor task design may even cause your application to fail to meet its requirements. To make PREEMPT-RT work you have to know what you are doing; it does not magically make your system "real-time"; rather, it facilitates the implementation of real-time systems.

How accurate is Explicit Synchronization of Schedule Tables?

I am reading up on time synchronization in AUTOSAR. Specifically, how to use global/PTP time to actually do time sensitive work on an ECU.
The way I understand it (from the OS spec "AUTOSAR_SWS_OS"), the way to do this is to put tasks in Schedule Tables, and then synchronize the tables either implicitly or explicitly.
Implicit synchronization I understand: lower level code/hardware sorts out the synchronization of a physical clock, and then the schedule tables just use a timer based on this clock.
I'm a bit puzzled by Explicit Synchronization however: It seems the way the table is synchronized is by periodic calls to SyncScheduleTable(). This tells the scheduler "the PTP time now is X".
But wouldn't the process of retrieving the current PTP time and then updating the table (in software...) introduce error in the time sync? I would think this would take at least a few microseconds?
Is the level of synchronization not expected to be sub-microsecond in AUTOSAR?
You will always have small offsets between software modules.
Normally you receive the PTP global time from the bus.
The StbM (Synchronized Time-Base Manager) then uses the global time value and manages an internal timer that continues counting from the last value received from the bus. After some time, an offset builds up between the internal time and the master clock on the bus.
This offset is corrected whenever a new value is received, but small offsets will always remain.
The internal timer keeps synchronizing the schedule table, and yes, there will be a delay of a few nanoseconds from reading the time from the hardware register until it is passed to the OS to synchronize the table.
Even after the synchronization call, OS runnable events will fire with some offset due to CPU load.
In the end, would a few nanoseconds really hurt your design? In most projects I have seen, such small offsets are acceptable.

Why do verilog tutorials commonly make reset asynchronous?

This question is in the context of FPGA synthesis if that makes any difference. The data sheet (iCE40UP) states that each logic cell has a D-type flop with asynchronous reset and clock enable inputs.
Many verilog tutorials introduce sequential logic with something like:
always @(posedge clk)
begin
  some_reg <= [...]
end
I'm familiar with clocked logic and this makes intuitive sense to me.
Then the very next concepts introduced are usually:
Be careful to not accidentally create a latch because what you really need is a proper register.
always @(posedge clk or [pos|neg]edge reset)
always @(*)
In Wikipedia I read scary statements like "if the system has a dependence on any continuous inputs then these are likely to be vulnerable to metastable states. [...] If the inputs to an arbiter or flip-flop arrive almost simultaneously, the circuit most likely will traverse a point of metastability."
At the risk of having my question closed for being poorly-formed ... what am I missing?
Is asynchronous reset recommended design practice? What is gained by not treating reset like any other input and having it take effect on the next cycle? Documentation for real chips usually requires that the RST* pin is held low for many clock cycles.
Does having a latch in the design make it asynchronous? How do we ensure proper timing is observed in the presence of a latch driven by something outside the clock domain?
When would anyone ever actually want a latch in a clocked design? Why does verilog make it so easy to create one accidentally?
Thanks!
Seemingly related questions:
- Verilog D-Flip-Flop not re-latching after asynchronous reset
- What if I used an asynchronous reset - should I have turned it into a synchronous one?
Synchronous vs. asynchronous reset has some similarities to the big endian vs. little endian battle for CPUs.
In many cases, both types work equally well.
But there are cases when either type has an advantage over the other.
In situations like power-up or power-down you may not have a valid clock, but you still need the reset to work to put your system into a known passive state and avoid dangerous I/O glitches.
Only an asynchronous reset can do that.
If your design contains registers that lack reset capability, such as RAM blocks, then using an asynchronous reset on the registers feeding the address, data and control signals of the RAM can corrupt the RAM contents when a reset occurs. So if you need the ability to do a warm reset where the RAM contents must be preserved: use a synchronous warm reset for the logic closest to the RAM (a sketch follows).
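A minimal sketch of that last point, assuming a warm-reset signal named warm_rst and a RAM write-enable register named ram_we (both names are made up for illustration):

// The register driving the RAM write-enable uses a synchronous warm reset,
// so asserting warm_rst cannot glitch a spurious write into a RAM whose
// contents must survive the warm reset.
always @(posedge clk) begin
    if (warm_rst)
        ram_we <= 1'b0;      // cleared only on a clock edge, no asynchronous glitch
    else
        ram_we <= we_next;
end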
Altera and Xilinx add to the confusion by recommending that their customers use only synchronous reset.
Using only synchronous reset can work well on Altera and Xilinx parts, since both are SRAM-based FPGA architectures where power-up glitches are never a concern.
But if you want to make your design portable to other architectures, such as ASICs or flash FPGAs, then asynchronous reset may be the better default choice.
Regarding your question about metastability caused by asynchronous reset: that is correct, a fully asynchronous reset signal can cause metastability.
That is why you must always synchronize the rising edge of an active-low asynchronous reset signal, i.e. its deassertion.
Only the falling edge (the assertion) of the reset can be fully asynchronous.
Synchronizing only the rising edge is done with two flip-flops (see the sketch below).
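A minimal sketch of such a reset synchronizer (asynchronous assertion, synchronized deassertion); the module and signal names are illustrative:

module reset_sync (
    input  wire clk,
    input  wire arst_n_in,   // raw asynchronous active-low reset
    output wire arst_n_out   // asserts asynchronously, deasserts synchronously
);
    reg ff1, ff2;

    always @(posedge clk or negedge arst_n_in) begin
        if (!arst_n_in) begin
            ff1 <= 1'b0;     // assertion propagates immediately
            ff2 <= 1'b0;
        end else begin
            ff1 <= 1'b1;     // deassertion ripples through two flops
            ff2 <= ff1;
        end
    end

    assign arst_n_out = ff2;
endmodule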
Latches: No, you almost never want latches in a clocked design.
Good practice is to let the DRC trigger an error in case a latch is found.

Usage of Clocking Blocks in Systemverilog

What is the exact usage of clocking blocks in SystemVerilog, and how do they differ from a normal always @(posedge clk) block?
Some differences, which I know :
A clocking block samples input data in the Preponed region, whereas in a normal always block there is always a chance of a race condition.
A clocking block is not synthesizable, but a normal always @(posedge clk) block is synthesizable.
Still, I am not getting the specific usage of Clocking Block, so kindly give your inputs, as well as correct me, if I have mentioned something wrong.
While I haven't done much with clocking blocks, I can provide a basic understanding of their purpose and their primary difference from the always block construct.
It is important to note that these constructs are very different and solve very different problems. The always block is really the heart of Verilog and serves as the primary descriptor of logic and registers (I'm lumping together always @*, always_comb, always_latch, always @(posedge clk) and always_ff because they all do a similar thing, though for different use cases and with several nuances). So, always @(posedge clk) is for describing registers or, more accurately, for describing actions to be taken every time the given signal has a positive edge (just like FFs/registers behave in real circuits). Thus, when the clocking event happens, the code for that block executes.
Clocking blocks are used to generalize how the timing of events surrounding clock events should behave. In real circuits you typically have hold-time and setup-time constraints for each FF in the design. These constraints dictate the limit on clock frequency and are important to understand when it comes to designing hazard-free logic circuits. In simulation of HDL code, however, recreating these timing characteristics can be tedious and does not scale well, especially when dealing with synchronous interfaces between testbench code and design code. As such, SystemVerilog includes the clocking block construct as a way of giving testbenches a method for easily defining the timing of such interfaces against a defined clock, with built-in skew and constructs that allow stimulus in testbenches to be expressed relative to the clock in a cleaner way.
When you define a clocking block, you define a set of signals to be synchronized to the provided clock with defined skews; whenever you assign inputs or read outputs, these signals are automatically skewed by the given amount (thus behaving in a more realistic way). Also, with clocking you can use the ## construct in stimulus and checking blocks to delay events by a certain number of clock cycles (true, you can use @(posedge clk); to do that, but the ## syntax is much cleaner). Ultimately, clocking blocks allow you to build scalable testbenches that include timing information for synchronous interfaces (because the timing information is all in the clocking block). You can find a more complete explanation and examples of clocking blocks here:
https://www.doulos.com/knowhow/sysverilog/tutorial/clocking/
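As a rough, self-contained illustration (the req/ack signals are made up and no DUT is attached), the following shows a clocking block with explicit skews and the ## cycle delay:

module tb;
    logic clk = 0, req, ack = 0;
    always #5ns clk = ~clk;            // 100 MHz clock, just for the example

    // Inputs are sampled 1 ns before the posedge, outputs driven 2 ns after it.
    default clocking cb @(posedge clk);
        default input #1ns output #2ns;
        output req;
        input  ack;
    endclocking

    initial begin
        cb.req <= 1'b0;
        ##1;                           // wait one clocking-block cycle
        cb.req <= 1'b1;                // actually driven 2 ns after the clock edge
        ##2;
        if (cb.ack !== 1'b1)           // value sampled 1 ns before the clock edge
            $display("ack not observed (nothing drives it in this sketch)");
        $finish;
    end
endmodule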
The important take-aways are these:
The difference between always @(posedge clk) and clocking blocks is that the former is about describing registers and the latter is about describing the timing of a synchronous interface between a DUT and the testbench.
Thus, the direct comparison you make in your question is not really appropriate. However, to answer your questions directly:
Clocking blocks sample their inputs in the Postponed region of the time step determined by the input skew (i.e., skew time before the clocking event). As the default input skew is 1step, the sample is done in the Postponed region of the step preceding the clocking event (which, in terms of value, is the same as the Preponed region of the current step). The outputs are driven in the Re-NBA region, output-skew time after the clocking event (the default output skew is 0, so by default the output is driven in the Re-NBA region of the same time step as the clocking event).
As clocking blocks are for defining a timing model (for synchronous signals) between a DUT and its testbench, they are indeed not synthesizable. They are a testbench construct, much like initial (ignoring a few cases), final, assertions and programs.
To learn more about clocking blocks, read Chapter 14 of IEEE1800-2012. 14.13 talks about input skew and 14.16 talks about output skew.

Timestamp generated by two threads

I have two threads in my code. One thread is a generator which creates messages. A timestamp is generated before a message is transmitted. The other thread is a receiver which accepts replies from multiple clients. A timestamp is created for each reply. The two threads run at the same time.
I find that the timestamp generated by the receiver is earlier than the timestamp generated by the generator. The correct order should be that the receiver's timestamp is later than the generator's.
If I give the generator thread a high priority, this problem does not occur, but that also slows down overall performance.
Is there another way to guarantee the correct order with less impact on performance? Thanks.
Based on the comment thread in the question, this is likely the effect of the optimizer. This is really a problem with the design more than anything else: it assumes that the clocks between the producer and consumer are shared or tightly synchronized. This assumption seems reasonable until you need to distribute the processing across more than one computer.
Clocks are rarely (if ever) tightly synchronized between different computers. The common algorithm for synchronizing computers is the Network Time Protocol. You can achieve close to millisecond synchronization on a local area network, but even that is difficult.
There are two solutions to this problem that come to mind. The first is to have the producer's timestamp passed through the client and into the receiver. If the receiver receives a timestamp that is earlier than its notion of the current time, it simply resets the timestamp to the current time. This type of normalization allows assumptions about time being a monotonically increasing sequence to continue to hold.
The other solution is to disable optimization and hope that the problem goes away. As you might expect, your mileage may vary considerably with this solution.
Depending on the problem that you are trying to solve you may be able to provide your own synchronized clock between the different threads. Use an atomically incrementing number instead of the wall time. java.util.concurrent.atomic.AtomicInteger or one of its relatives can be used to provide a single number that is incremented every time that a message is generated. This allows the producer and receiver to have a shared value to use as a clock of sorts.
In any case, clocks are really hard to use correctly especially for synchronization purposes. If you can find some way to remove assumptions about time from distributed systems, your architectures and solutions will be more resilient and more deterministic.
