I am creating a fairly complicated design that involves timing analysis of two modules. Each module has its own algorithm, but both take two signed numbers as inputs and output a signed number.
I am designing this for an FPGA in Verilog, using Xilinx as my synthesis tool. Now, I understand that Xilinx reports the worst-case timing for any module. This means that if most input combinations take 250 picoseconds from input to output (including routing time), but even a single set of inputs takes 400 picoseconds, the timing analysis shown by Xilinx will be 400 picoseconds.
My goal is to find:
1) Whether Module 1 is faster than Module 2 for any set of numbers.
2) The range of numbers for which Module 1 is faster than Module 2.
The only logical approach I can think of is to increase the operating frequency of the module - that is, to force both modules to produce their outputs within, say, 300 picoseconds rather than 400 picoseconds.
Obviously, if I increase the operating frequency, some of the inputs in the testbench will produce erroneous outputs. My hypothesis is that the module that starts giving erroneous answers first has the slower algorithm.
So my doubts are:
1) Is it possible to increase the operating frequency of a module in Verilog using Xilinx (some settings I must enforce during synthesis or analysis)? If not, is there a better tool that will do my timing analysis?
2) Is this approach viable? Short of doing a gate-level synthesis using Cadence, is there any way I can find out the actual delay for each set of signed numbers, for each gate, using Verilog?
You are right in assuming that Xilinx always reports worst-case timing for the whole design where clock rates are concerned. Don't take the synthesis results as being very accurate - they can vary by quite a lot once you've placed and routed the design.
I guess you could take the post-PAR Verilog netlist and simulate it with a variety of inputs using different simulated clock speeds - if there are slow paths which are not exercised by certain inputs, you should be able to run the simulated clock faster for those inputs.
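A rough sketch of such a testbench, assuming the post-PAR netlist has been annotated with its SDF timing data (all module, port, and operand names below are made up - substitute your own):

`timescale 1ps/1ps
module tb_sweep;
  real period = 400.0;               // clock period in ps, tightened each iteration
  logic clk = 1'b0;
  logic signed [15:0] a, b;          // operand width assumed
  logic signed [15:0] y1, y2;

  always #(period/2.0) clk = ~clk;   // the delay is re-evaluated, so period may change

  // timing-annotated post-PAR netlists of the two competing modules
  module1 u1 (.clk(clk), .a(a), .b(b), .y(y1));
  module2 u2 (.clk(clk), .a(a), .b(b), .y(y2));

  initial begin
    a = 16'sd1234;                   // one example operand pair
    b = -16'sd567;
    // sweep downward from the reported worst case and note the period
    // at which each module's output first becomes wrong
    while (period >= 250.0) begin
      repeat (4) @(posedge clk);     // allow the result to settle (latency assumed)
      $display("period=%g ps  y1=%0d  y2=%0d", period, y1, y2);
      period = period - 10.0;
    end
    $finish;
  end
endmodule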
It sounds like a very time-consuming task, though, and I'm not sure what the point is. Where I come from (automotive), "worst case" is the only number we can look at with any level of confidence!
I am curious to learn about the technology used to generate the software clock in simulators. The frequency of my machine is only ~2.4 GHz, but I can generate up to a 500 THz clock using a simulator (refer to the SystemVerilog snippet below).
`timescale 1fs/1fs // 1 fs is the minimum time unit and precision needed to describe a 500 THz clock
module temp();
  bit clk_b;
  always #1 clk_b = ~clk_b; // toggles every 1 fs => 2 fs period => 500 THz
endmodule
Is this higher frequency just a software illusion, or does it have any link to the CPU's crystal oscillator?
The simulation does not "run" in real time. It computes the results for the scheduled event steps, and when it has processed them all, it is done. That means the wall-clock time the simulation needs to finish is determined by the number of required steps (and the problem's complexity) relative to your computer's performance. The timescale setting of the simulation is just what it says: a way to relate the simulation steps to a time scale.
So yes, it really is an "illusion", if you want to call it that.
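A quick way to see this: the snippet below advances one full nanosecond of simulated time as a million 1 fs events. The simulated time printed is exact, while the wall-clock time the run takes depends only on how fast your CPU churns through the events:

`timescale 1fs/1fs
module time_demo;
  bit clk_b;
  always #1 clk_b = ~clk_b;   // the "500 THz" clock: 2 fs period
  initial begin
    #1000000;                 // 1,000,000 fs = 1 ns of simulated time
    $display("simulated time: %0t fs", $time);
    $finish;
  end
endmodule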
SystemVerilog is an HVL, i.e. a Hardware Verification Language. It is (mostly) used to verify hardware designs.
The main purpose of the language is to provide a platform where one can create logic to verify the DUT by running simulations, i.e. generating different operating conditions for the DUT and checking how it behaves under each of them. But this does not necessarily mean that the DUT is supposed to operate under the extreme conditions generated by the SystemVerilog testbench.
When you generate a 500 THz clock from your testbench and check the behaviour of your DUT, you are making sure that the DUT is not (virtually) going to break down even under such extreme conditions. But please note that this is just a virtual environment you have created, not the actual environment under which the DUT, once synthesised, is supposed to operate.
If the maximum frequency of the machine (or DUT) is ~2.5 GHz, it is supposed to operate at that frequency in the actual environment; but, just out of curiosity, you can check the operation of the DUT with different input clock frequencies by running different simulations.
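For instance, the clock generator in the testbench can be parameterised so the same DUT is simulated at several frequencies (the period values here are just examples):

`timescale 1fs/1fs
module clk_gen #(parameter real PERIOD_FS = 400000.0) (output bit clk);
  always #(PERIOD_FS/2.0) clk = ~clk;   // half-period toggle
endmodule

// e.g. one run at a realistic rate and one at the extreme 500 THz:
// clk_gen #(.PERIOD_FS(400000.0)) cg_nominal (.clk(clk_nom));     // 400 ps = 2.5 GHz
// clk_gen #(.PERIOD_FS(2.0))      cg_extreme (.clk(clk_500thz));  // 2 fs = 500 THz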
Hope it helps!
I have been working on a class project using Verilog. I had to create a circuit and then calculate the power that the circuit uses. I have been trying to do it with XPower Analyzer: I follow the instructions to create the VCD file, then compile and synthesize the code using Xilinx ISE 14.7. Everything goes well until the result shows up: I get 0 power consumption for the clock. I tried constraining the clock, which only gave me an increase in dynamic power from 0 to 0.009, but no luck with the clock. I also tried XPower on my personal computer and at my university's computer lab, so I don't think it is a software bug.
Moreover, I have tried different designs such as a simple ALU, a register, etc. Nonetheless, I still get the same power result.
More information:
Testbench runs well and does what I want
I declare the clock like: module toptrafficlight(clock, rst, output);
I have constrained the clock to 20 ns
Timing phase = 0 after synthesis (not sure what this means)
Warnings:
HDLCompiler:413 - Line 86: Result of 5-bit expression is truncated to fit in 4-bit target.
PhysDesignRules:372 - Gated clock. Clock net main_gated_clk is sourced by a combinatorial pin. This is not good design practice. Use the CE pin to control the loading of data into the flip-flop.
Power result from XPower Analyzer (screenshot)
My questions are:
Is there a way to set up the clock? I think this might be the cause of the problem.
Is there anything else that needs to be done besides getting the VCD file and synthesizing the code?
Any other ideas, examples, or tutorials?
The screenshot shows that the design is very small, so it's not a big surprise for the clock power to be smaller than 1 mW. Xilinx also provides an Excel sheet for power estimation; it can be used for a quick tryout to see what circumstances make the clock power significant.
Xilinx Power Estimator (XPE)
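Also, two things in the question are worth fixing before trusting any numbers. First, make sure the testbench actually records switching activity in the VCD, e.g. with $dumpfile("power.vcd"); $dumpvars(0, tb_top); (names here are placeholders) - with little or no activity captured, dynamic power will come out near zero. Second, the PhysDesignRules:372 warning means main_gated_clk is driven by combinatorial logic rather than a dedicated clock net. The usual fix, as the warning itself suggests, is to clock the flip-flop from the main clock and move the gating condition onto the clock enable, roughly like this (signal names are made up):

// before: a gated clock, sourced by a combinatorial pin
// always @(posedge main_gated_clk) q <= d;

// after: free-running clock, former gating term used as a clock enable
always @(posedge clock) begin
  if (enable)      // maps onto the flip-flop's CE pin
    q <= d;
end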
I would like to know how many LPM_DIV (Altera divider) blocks I can generate in a single project if my FPGA board is the DE1-SoC with a 5CSEMA5F31C6N.
I intend to do a project in which I have to process data through many dividers working at the same time (on the same clock); the total amount varies, ranging from 4 to 4096. I wonder whether this is feasible in the FPGA.
PS: A friend has told me that it is in fact feasible to generate this large number of dividers, but only in simulation, not for synthesis on the FPGA, and that Quartus II would give me a report saying how many logic resources I lack to meet the requirement.
Remember that HDL instantiates physical circuitry, and those circuits don't just magically disappear because you no longer need them. You would need to design your system to always have all 4096 dividers, and only take the outputs from the ones you use.
The number of dividers depends on how you have them configured: input sizes, pipelining, optimization method. One 64/64 divider with registered output uses ~2300 (7%) of the ALMs on the DE1-SoC, meaning you could fit ~14 dividers on the chip. Using different input parameters, you'd get different results.
If you synthesize one of your dividers, Quartus will give you a percent of resources used for each type. From that, it should be easy to determine the number of dividers you can fit.
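If you do go down that path, the usual way to stamp out N of them is a generate loop. A minimal sketch (the lpm_divide parameter and port names below are from memory - check them against Altera's megafunction documentation):

parameter N = 16;   // number of dividers; scale toward 4096 and re-check fit
parameter W = 32;   // operand width - resource usage grows quickly with this

wire [W-1:0] numer   [0:N-1];
wire [W-1:0] denom   [0:N-1];
wire [W-1:0] quotient[0:N-1];
wire [W-1:0] remain  [0:N-1];

genvar i;
generate
  for (i = 0; i < N; i = i + 1) begin : div_array
    lpm_divide #(
      .lpm_widthn  (W),
      .lpm_widthd  (W),
      .lpm_pipeline(4)    // pipelined output: helps timing, costs registers
    ) u_div (
      .clock   (clk),
      .numer   (numer[i]),
      .denom   (denom[i]),
      .quotient(quotient[i]),
      .remain  (remain[i])
    );
  end
endgenerate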
I have googled this question, but I can't seem to find a proper answer. Is there a certain equation for it? Or do I calculate the delay by looking at the longest path in the circuit?
You calculate path delays using static timing analysis (STA) tools, which are usually provided by the target technology vendor (i.e. the FPGA vendor).
Quick and dirty method - Open the synthesis report and look for the maximum combinational delay. This should give you an approximate value.
The right way - If the block is purely combinational, register the inputs and outputs using a clock of ANY frequency. Then generate timing from the input register to the output register post-implementation (place and route) using the Tcl command report_timing -from {Flop1} -to {Flop2}. This will give you an elaborate breakdown of cell and wire delays. This analysis will be accurate.
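A minimal sketch of such a wrapper, assuming a purely combinational block and made-up names and widths:

module timing_wrap (
  input  wire        clk,
  input  wire [15:0] a, b,   // widths assumed
  output reg  [15:0] y
);
  reg  [15:0] a_q, b_q;      // input registers ("Flop1" in report_timing)
  wire [15:0] comb_out;

  // the purely combinational block under analysis (name assumed)
  my_comb_block u0 (.a(a_q), .b(b_q), .y(comb_out));

  always @(posedge clk) begin
    a_q <= a;
    b_q <= b;
    y   <= comb_out;         // output register ("Flop2" in report_timing)
  end
endmodule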
In FPGA programming, what is the point of using the create_clock command in the XDC (or UCF) file? Let's say I have a clock port CLK that is assigned to a physical pin (which is my clock) in the XDC (or UCF) file. Why can't I just go ahead and use this CLK pin in my top-level HDL? Why do I need to add something like this:
create_clock -name sys_clk_pin -period "XXX" [get_ports "CLK"]
Also, let's say I have a main clock CLK and some other clocks which I generate in HDL. Do I have to use create_clock for all the minor clocks in the XDC too?
I don't get this whole "create_clock" thing. Any help or direction is much appreciated.
Thanks
Design constraints, as the name suggests, are used to define additional constraints on your design which can't be captured in the HDL description.
Let's take the create_clock command as an example. You specified the clock pin in your HDL description - why isn't this enough? The reason is that a clock signal is not a usual signal: it is used as a reference by synchronous logic (flip-flops).
I suppose you're familiar with the concept of "propagation delay" (through logic gates). You want to make sure that all signals originating at one flop and sampled at another are able to propagate within a single clock cycle. The total propagation delay can be known right after synthesis, because each logic gate in the FPGA has an associated propagation delay (just sum these up along a path). But how do your analysis tools know the maximum allowed propagation delay? You do not specify these constraints in HDL, right? This is one of the cases where the frequency you specified with the create_clock command is used: it is converted to a period, and an analysis tool will warn you if any combinatorial path in your design takes longer to propagate than the clock's period.
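In other words, for every register-to-register path the tool checks (roughly) that:

Tclk >= Tclk-to-Q + Tlogic + Trouting + Tsetup

For example, with a 10 ns period from create_clock, a path with 0.5 ns clock-to-Q delay, 8 ns of combined logic and routing delay, and 0.4 ns setup time passes with 10 - (0.5 + 8 + 0.4) = 1.1 ns of slack (the numbers are made up for illustration).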
The above example describes one of the actions performed by Static Timing Analysis (STA) tools in which "design constraints" are employed.
Another kind of tool which makes extensive use of design constraints is Clock Domain Crossing (CDC) tools. These tools are employed in designs containing more than one clock. The CDC concepts are described brilliantly here.
In case you take one clock and generate another from it (a clock divider, for example), you want to make the CDC tool aware of this, because the fact that these clocks are related is important. The way to inform the CDC tool that the clocks are related is to use the create_generated_clock constraint.
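For instance, for a simple divide-by-2 clock generated in HDL, the pair might look like this (the register and pin names are illustrative - the actual pin name depends on how your tool names the netlist register):

// HDL: divide the input clock by 2
reg clk_div2 = 1'b0;
always @(posedge CLK) clk_div2 <= ~clk_div2;

// XDC: tell the tools the new clock is derived from CLK
create_generated_clock -name clk_div2 -source [get_ports CLK] -divide_by 2 [get_pins clk_div2_reg/Q]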
NOTE: the above examples are basic and by no means comprehensive.