Real-time CPU clock vs. high-frequency software clock - Verilog

I am curious to learn about the technique used to generate a software clock in simulators. My machine's clock frequency is only ~2.4 GHz, but I can generate a clock of up to 500 THz using a simulator (refer to the SystemVerilog snippet below).
`timescale 1fs/1fs // the minimum time unit and precision, needed to generate a 500 THz clock
module temp();
  bit clk_b;
  always #1 clk_b = ~clk_b; // toggles every 1 fs -> 2 fs period = 500 THz
endmodule
Is this higher frequency just a software illusion, or does it have some link to the CPU's crystal oscillator?

The simulation does not "run" in real time. It computes the results for each step, and when it is done, it is done. That means the ratio between the number of required steps (and the problem's complexity) and your computer's performance determines how long the simulation needs to finish. The `timescale setting of the simulation is just what it says: a way to map the simulation steps onto a time scale.
So it really is an "illusion", if you want to call it that.
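As a minimal sketch of what that means in practice (the testbench below is mine, not the asker's): after 1000 simulation steps the simulator reports 1000 fs of simulated time, regardless of how much wall-clock time the host CPU spent computing those steps.
`timescale 1fs/1fs
module tb;
  bit clk;
  always #1 clk = ~clk; // "500 THz" in simulation time only
  initial begin
    #1000; // advance simulated time by 1000 fs
    $display("simulated time = %0t fs", $time);
    $finish;
  end
endmodule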

SystemVerilog is an HVL, i.e. a Hardware Verification Language. It is (mostly) used to verify hardware designs.
The main purpose of the language is to provide a platform where one can write logic to verify the DUT by running simulations, i.e. generating different operating conditions for the DUT and checking how it behaves under each of them. This does not necessarily mean that the DUT is supposed to operate under the extreme conditions generated by the SystemVerilog testbench.
When you generate a 500 THz clock from your testbench and check the behaviour of your DUT, you are making sure that the DUT is not (virtually) going to break down even under such extreme conditions. But note that this is just a virtual environment you have created, not the actual environment under which the DUT, once synthesised, is supposed to operate.
If the maximum frequency of the machine (or DUT) is ~2.5 GHz, it is supposed to operate at that frequency in the actual environment, but out of curiosity you can still check the operation of the DUT with different input clock frequencies by running different simulations.
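For instance, a parameterized clock generator makes it easy to re-run the same testbench at different input frequencies (a hedged sketch; the module and parameter names are my own):
`timescale 1fs/1fs
module clk_gen #(parameter longint HALF_PERIOD_FS = 1) (output bit clk);
  // HALF_PERIOD_FS = 1       -> 2 fs period   -> 500 THz
  // HALF_PERIOD_FS = 200_000 -> 400 ps period -> 2.5 GHz
  always #(HALF_PERIOD_FS) clk = ~clk;
endmodule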
Hope it helps!

Related

How to get a power estimate using XPower

I have been working on a class project using Verilog. I had to create a circuit and then calculate the power that the circuit uses. I have been trying to do it with XPower Analyzer. I followed the instructions to create the VCD file, then compiled and synthesized the code using Xilinx ISE 14.7. Everything goes well until the result shows up: I get 0 power consumption from the clock. I tried constraining the clock, which only gave an increment in dynamic power from 0 to 0.009, but no luck with the clock. I also tried XPower on my personal computer and at my university's computer lab, so I don't think it is a software bug.
Moreover, I have tried different designs such as a simple ALU, a register, etc. Nonetheless, I am still getting the same power result.
More information:
The testbench runs well and does what I want.
I declare the clock like: module toptrafficlight(clock, rst, output);
I have constrained the clock to 20 ns.
Timing phase = 0 after synthesis (not sure what this means).
Warnings:
HDLCompiler:413 - Line 86: Result of 5-bit expression is truncated to fit in 4-bit target.
PhysDesignRules:372 - Gated clock. Clock net main_gated_clk is sourced by a combinatorial pin. This is not good design practice. Use the CE pin to control the loading of data into the flip-flop.
Power result from XPower Analyzer (screenshot)
My questions are:
Is there a way to set up the clock? I think this might be the cause of the problem.
Is there anything else that needs to be done besides getting the VCD file and synthesizing the code?
Any other ideas, examples, or tutorials?
The screenshot shows that the design is very small, so it's not a big surprise that the clock power is smaller than 1 mW. Xilinx also provides an Excel sheet for power estimation; it can be used for a quick tryout to see what circumstances make the clock power significant.
Xilinx Power Estimator (XPE)
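As for the PhysDesignRules:372 warning: the usual fix is to keep every flip-flop on the global clock and use a clock enable rather than a gated clock. A minimal Verilog sketch, with illustrative signal names:
module ce_reg (
  input  wire clk,    // global, ungated clock
  input  wire enable,
  input  wire d,
  output reg  q
);
  // Instead of "assign gated_clk = clk & enable;" (which triggers the
  // gated-clock warning and adds skew), qualify the load with an enable;
  // synthesis maps the "if (enable)" onto the flip-flop's CE pin.
  always @(posedge clk)
    if (enable)
      q <= d;
endmodule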

How to drive a clock to a single clock domain?

I have a project to do in VHDL on an FPGA (Cyclone IV). The majority of my entities work with a single clock. I know that clock gating is not a good solution (see image) because it causes timing violations. Can someone tell me the good-practice rules for doing this kind of thing? (I did do some research on the Internet, but every link I found talks about clock domain crossing.)
Thank you
The diagram isn't a demonstration of clock gating. It is showing the impact of skew between different portions of a clock distribution tree. This is a reality of synthesizing any practical design even if you aren't gating the clock. A gated clock introduces additional skew on nodes within its fanout in addition to that which naturally occurs in clock distribution.
In non-trivial devices the clocks are passed through a tree of buffers to minimize capacitive load. In FPGAs the clocks are usually routed on carefully designed global clock nets that have been optimized to minimize skew. In ASICs balanced clock trees will be synthesized and physical timing constraints will guide placement of the buffers to minimize skew at the flip-flops. Unless you are doing something exotic, the backend tools will take care of getting you the best clock trees possible.
As a designer you primarily deal with the skew problem by setting up proper timing constraints and using static timing analysis to verify you meet setup and hold requirements under worst case (and best case) conditions. An FPGA has already had its delays characterized for use in static timing. With an ASIC, the delays in your design will be estimated before and after place-and-route and fed into the analyzer. A gated clock will introduce skew that reduces the available timing budget on the affected data paths. The timing analyzer will account for this if you have set up the constraints properly. Once you pass static timing on the final design, your job is done.
If you have timing failures due to combinational delays in your datapath, you have to do one of the following:
Reduce the failing combinational path by reorganizing the logic or inserting pipeline stages
Use a slower clock
Define a multi-cycle delay if you don't need valid results on every cycle (see the constraint sketch after this list)
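For the multi-cycle option, a hedged SDC/XDC sketch (the cell names are illustrative): give the path two clock cycles for setup analysis and adjust the hold check to match.
# Give paths from mult_in_reg* to mult_out_reg* two clock cycles for setup
set_multicycle_path -setup 2 -from [get_cells mult_in_reg*] -to [get_cells mult_out_reg*]
# Conventionally move the hold check back by one cycle to match
set_multicycle_path -hold 1 -from [get_cells mult_in_reg*] -to [get_cells mult_out_reg*]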
If a gated clock introduces too much skew, you can treat it as a new clock domain altogether and use clock domain synchronization techniques to pass signals between the separate domains.
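A standard instance of such a technique is the two-flop synchronizer for a single-bit signal entering the destination domain. A hedged Verilog sketch (the question is in VHDL, but the structure translates directly; names are illustrative):
module sync_2ff (
  input  wire dst_clk,   // clock of the receiving domain
  input  wire async_in,  // single-bit signal from the other domain
  output wire sync_out
);
  reg ff1, ff2;
  // Two back-to-back flops give a metastable first stage a full
  // destination-clock cycle to settle before the signal is used.
  always @(posedge dst_clk) begin
    ff1 <= async_in;
    ff2 <= ff1;
  end
  assign sync_out = ff2;
endmodule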

Adjusting the operating frequency of a module in Verilog

I am creating a fairly complicated design that involves timing analysis of two modules. Each has its own algorithm, but both take two signed numbers as inputs and output a signed number.
I am designing this for an FPGA in Verilog, using Xilinx as my synthesis tool. I understand that Xilinx usually gives the worst-case timing analysis for any module. This means that if most inputs take 250 picoseconds from input to output (including routing time), but even a single set of inputs takes 400 picoseconds, the timing analysis shown by Xilinx would be 400 picoseconds.
My goal is to find:
1) whether Module 1 is faster than Module 2 for any set of numbers;
2) the range of numbers for which Module 1 is faster than Module 2.
The only logical approach I can think of is increasing the operating frequency of the module, i.e. forcing both modules to deliver their outputs after, say, 300 picoseconds rather than 400 picoseconds.
Obviously, if I increase the operating frequency, some of the inputs in the testbench will produce erroneous outputs. My hypothesis is that the module that starts giving erroneous answers first has the slower algorithm.
So my questions are:
1) Is it possible to increase the operating frequency of a module in Verilog using Xilinx (some setting I must apply during synthesis or analysis)? If not, is there a better tool that will do my timing analysis?
2) Is this approach viable? Short of doing gate-level synthesis using Cadence, is there any way I can find the actual delay for each set of signed numbers using Verilog?
You are right in assuming that Xilinx always reports worst-case timing for the whole design where clock rates are concerned. Don't take the synthesis timing results as very accurate, though - they can vary by quite a lot once you've placed and routed the design.
I guess you could take the post-PAR Verilog netlist and simulate that with a variety of inputs using different simulated clock speeds - if there are slow paths which are not used for certain inputs, you should be able to run the simulated clock faster for those inputs.
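A hedged sketch of such a sweep; the DUT hookup is commented out because module1/module2 stand in for the real post-PAR netlists (which would be compiled alongside, with their SDF back-annotation):
`timescale 1ps/1ps
module tb_sweep;
  reg clk = 1'b0;
  reg signed [15:0] a, b;
  wire signed [15:0] y1, y2;
  integer half_period = 200;  // start at a 400 ps clock period

  // module1 u1 (.clk(clk), .a(a), .b(b), .y(y1)); // post-PAR netlist
  // module2 u2 (.clk(clk), .a(a), .b(b), .y(y2)); // post-PAR netlist

  always #(half_period) clk = ~clk;

  initial begin
    repeat (4) begin
      // apply the input set and compare y1/y2 against a golden model here
      repeat (10) @(posedge clk);
      half_period = half_period - 25;  // tighten the clock by 50 ps per pass
    end
    $finish;
  end
endmodule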
It sounds like a very time-consuming task, though, and I'm not sure what the point is. Where I come from (automotive), "worst-case" is the only number we can look at with any level of confidence!

What is the point of "create_clock" command in FPGA design?

In FPGA programming, what is the point of using the create_clock command in the XDC (or UCF) file? Let's say I have a clock port CLK that is assigned to a physical pin (which is my clock) in the XDC (or UCF) file. Why can't I just go ahead and use this CLK pin in my top-level HDL? Why do I need to add something like this:
create_clock -name sys_clk_pin -period "XXX" [get_ports "CLK"]
Also, let's say I have a main clock CLK and some other clocks which I generate in HDL. Do I have to use create_clock for all the minor clocks in the XDC too?
I don't get this whole "create_clock" thing. Any help or direction is much appreciated.
Thanks
Design constraints, as the name suggests, are used to define additional requirements on your design which can't be captured in the HDL description.
Let's take the create_clock command as an example. You specified the clock pin in your HDL description; why isn't this enough? The reason is that a clock signal is not a usual signal - it is used as a reference by synchronous logic (flip-flops).
I suppose you're familiar with the concept of "propagation delay" (through logic gates). You want to make sure that all signals originating at one flop and sampled at another will be able to propagate within a single clock cycle. Now, the total propagation delay is known right after synthesis, because each logic gate in the FPGA has an associated propagation delay (just sum these up). But how do your analysis tools know the maximum allowed propagation delay? You do not specify this in HDL, right? This is one of the cases where the period you specified with the create_clock command is used - an analysis tool will warn you if any combinational path in your design takes longer to propagate than the clock's period.
The above example describes one of the checks performed by Static Timing Analysis (STA) tools, in which "design constraints" are employed.
Another kind of tool that makes extensive use of design constraints is the Clock Domain Crossing (CDC) tool. These tools are employed in designs containing more than one clock. The CDC concepts are described brilliantly here.
In case you take one clock and generate another one from it (a clock divider, for example), you want to make the CDC tool aware of it, because the fact that these clocks are related is important. The way to inform the CDC tool that the clocks are related is to use the create_generated_clock constraint.
NOTE: the above examples are basic and by no means comprehensive.
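A hedged XDC sketch tying both commands together (the port, net, and cell names are illustrative; -period is in nanoseconds):
# Primary clock: 10 ns period (100 MHz) on the top-level CLK port
create_clock -name sys_clk_pin -period 10.000 [get_ports CLK]

# A divide-by-2 clock produced by a toggle flip-flop in the HDL;
# declaring it as generated keeps it related to sys_clk_pin for analysis
create_generated_clock -name clk_div2 -source [get_ports CLK] \
    -divide_by 2 [get_pins clk_div2_reg/Q]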

How do I implement a synthesizable DPLL in Verilog?

Is there any straightforward way to implement an all-digital phase-locked loop in synthesizable Verilog? Everything (including the VCO) should be synthesized. The signals I'm looking to lock to are ~0.1-1% of the system clock frequency. I am currently using one that I've reconstructed from 1980s IEEE papers, but it doesn't behave as well as advertised.
For simplicity, the lock can work on a binary pulse signal.
In FPGA designs I normally use the built-in DCMs or PLLs.
The Cyclone II has up to 4 PLLs built in.
Have a look at the PLLs in the Cyclone II.
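If it really has to be fully synthesizable fabric logic, a common all-digital substitute for the VCO is a phase-accumulator NCO steered by a bang-bang phase detector. A first-order hedged sketch only (the widths, step, and gain values are placeholders that would need tuning against the actual reference for a stable lock):
module dpll_sketch #(
  parameter ACC_W = 24,      // phase accumulator width
  parameter STEP  = 24'd167, // nominal NCO increment (placeholder value)
  parameter GAIN  = 24'd4    // bang-bang correction per reference edge
) (
  input  wire clk,     // fast system clock
  input  wire rst,
  input  wire ref_in,  // slow reference pulse train to lock to
  output wire out_clk  // recovered clock = accumulator MSB
);
  reg [ACC_W-1:0] acc;
  reg             ref_q;
  wire ref_edge = ref_in & ~ref_q;  // rising edge of the reference

  assign out_clk = acc[ACC_W-1];

  always @(posedge clk) begin
    if (rst) begin
      acc   <= {ACC_W{1'b0}};
      ref_q <= 1'b0;
    end else begin
      ref_q <= ref_in;
      // Bang-bang phase detector: at each reference edge, if out_clk is
      // already high the NCO is early, so step less; otherwise step more.
      if (ref_edge)
        acc <= acc + (out_clk ? STEP - GAIN : STEP + GAIN);
      else
        acc <= acc + STEP;
    end
  end
endmodule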
