ice40 clock delay, output timing analysis


I have an ice40 that drives the clock and data inputs of an ASIC.
The ice40 drives the ASIC's clock with the same clock that drives the ice40's internal logic. The problem is that the rising clock triggers the ice40's internal logic and changes the ice40's data outputs a few nanoseconds before the rising clock reaches the ASIC, and therefore the ASIC observes the wrong data at its rising clock.
I've solved this issue by using an inverter chain to delay the ice40's internal clock without delaying the clock driving the ASIC. That way, the rising clock reaches the ASIC before the ice40's data outputs change. But that raises a few questions:
Is my strategy -- using an inverter chain to delay the ice40 internal clock -- a good strategy?
To diagnose the problem, I used Lattice's iCEcube2 to analyze the min/max delays between the internal clock and output pins:
Notice that the asic_dataX delays are shorter than the clk_out delay, indicating the problem.
Is there a way to get this information from yosys/nextpnr?
Thank you for any insight!

Instead of tinkering with the delays, I would recommend using an established technique. SPI, for example, simply samples the data on one clock edge and changes it on the other.
The logic to implement that is rather simple. Here is an example implementation of an SPI slave:
module SPI_slave #(parameter WIDTH = 6'd16, parameter phase = 1'b0,
parameter polarity = 1'b0, parameter bits = 5) (
input wire rst,
input wire CS,
input wire SCLK,
input wire MOSI,
output reg MISO,
output wire data_avbl,
input wire [WIDTH-1:0] data_tx,
output reg [WIDTH-1:0] data_rx
);
reg [bits:0] bitcount;
reg [WIDTH-1:0] buf_send;
assign clk = phase ^ polarity ^ SCLK;
assign int_rst = rst | CS;
assign tx_clk = clk | CS;
assign data_avbl = bitcount == 0;
always @(negedge tx_clk or posedge rst) begin
MISO <= rst ? 1'b0 : buf_send[WIDTH-1];
end
always @(posedge clk or posedge int_rst) begin
if (int_rst) begin
bitcount <= WIDTH;
data_rx <= 0;
buf_send <= 0;
end else begin
bitcount <= (data_avbl ? WIDTH : bitcount) - 1'b1;
data_rx <= { data_rx[WIDTH-2:0], MOSI };
buf_send <= bitcount == 1 ? data_tx[WIDTH-1:0] : { buf_send[WIDTH-2:0], 1'b0};
end
end
endmodule
As one can see, the data is captured on the positive edge and changed on the negative edge. To avoid mixing edge sensitivities, a clock running at twice the frequency can be used instead.
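The doubled-clock variant could look roughly like the sketch below. It is only a minimal illustration under stated assumptions: the module name launch_on_falling and the signals clk2x, asic_clk, asic_data and next_data are hypothetical, and clk2x is assumed to run at twice the desired ASIC clock rate. A single posedge-only process toggles the ASIC clock and updates the data only on the edge that makes the ASIC clock fall, so the data stays stable around the ASIC's rising edge.

```verilog
// Sketch only: all names are hypothetical, not taken from the question.
module launch_on_falling #(parameter WIDTH = 8) (
    input  wire             clk2x,     // assumed: 2x the desired ASIC clock rate
    input  wire             rst,       // synchronous, active high
    input  wire [WIDTH-1:0] next_data, // value to present before the next ASIC rising edge
    output reg              asic_clk,
    output reg  [WIDTH-1:0] asic_data
);
    always @(posedge clk2x) begin
        if (rst) begin
            asic_clk  <= 1'b0;
            asic_data <= {WIDTH{1'b0}};
        end else begin
            asic_clk <= ~asic_clk;
            // asic_clk is high before this edge, so this edge makes it fall.
            // Changing the data here leaves half an ASIC clock period
            // of setup before the next rising edge.
            if (asic_clk)
                asic_data <= next_data;
        end
    end
endmodule
```

Because the clock and data outputs are launched by the same clk2x edge, their pad delays only need to match to within half an ASIC clock period, which is usually a much looser requirement than the original same-edge scheme.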

Related

Modelsim - Object not logged & no signal data while simulating verilog clock divider code

What I'm trying to do: I wish to count numbers from 0 to hexadecimal F and display these on my FPGA board at different frequencies - my board's clock (CLOCK_50) is at 50 MHz and I wish to alter the frequency/speed of the counting based on two input switches on my board (SW[1:0]).
Verilog Code for top-module & clock divider module:
//top level module
module rate_divider (input CLOCK_50, input [1:0] SW, input [1:0] KEY,output [6:0] HEX0);
//Declare parameters that define the # of clock cycles needed to generate an enable pulse
//according to the desired frequency.
parameter FREQ_5MHz = 4'd9; //To divide to 5 MHz we need (10-1) cycles,
//since the pulse needs to start at the 9th cycle.
parameter FULL50_MHz = (4'd1); //The CLOCK_50's Frequency.
//Select the desired parameter based on the input switches
reg [3:0] cycles_countdown;
wire enable_display_count;
wire [3:0] selected_freq;
always @(*)
case (SW)
2'b00: cycles_countdown = FULL50_MHz;
2'b01: cycles_countdown = FREQ_5MHz;
default : cycles_countdown = FULL50_MHz;
endcase
assign selected_freq = cycles_countdown;
//wire that is the output of the display_counter and input to the 7 segment
wire [3:0] hex_value;
// instantiate my other modules
clock_divider_enable freq_divider (.d(selected_freq), .clk(CLOCK_50), .reset(KEY[0]),
.enable(enable_display_count));
display_counter count_hex (.enable(enable_display_count), .clk(CLOCK_50), .hex_out(hex_value), .reset(KEY[1]));
hex_decoder HX0 (.hex_digit(hex_value), .segments(HEX0[6:0]));
endmodule
//the clock_divider sub-circuit.
module clock_divider_enable (input [3:0] d, input clk, reset,
output enable);
reg [3:0] q;
always @(posedge clk)
begin
if (!reset || !q)
q <= d;
else
q <= q - 4'd1;
end
assign enable = (q == 4'h0) ? 1'b1 : 1'b0;
endmodule
ModelSim Code:
vlib work
vlog rate_divider.v
vsim rate_divider
log {/*}
add wave {/*}
#initial reset - using KEY[1:0]. Note: active low synchronous reset.
force {CLOCK_50} 1
force {KEY[0]} 0
force {KEY[1]} 0
run 10ns
#choose 5 MHz as the desired frequency - turn SW[0] high.
force {CLOCK_50} 0 0ns, 1 {10ns} -r 20ns
force {KEY[0]} 1
force {KEY[1]} 1
force {SW[0]} 1
force {SW[1]} 0
run 600ns
Problems I am facing:
Here's the thing - when I don't use the always block to select a parameter, and pass a desired parameter to the wire selected_freq, my simulation works fine - I can see the expected enable pulse.
HOWEVER, if I use the always block, the reg cycles_countdown does get the correct value assigned, BUT for some reason the enable signal is just a red line. When I select my clock_divider_enable module and add its 'q' signal to my waveform, it is red too and shows no data, and the object q is "not logged". As such, I'm unable to debug and figure out what exactly the problem with my code is.
It'd be great if someone could help with how to fix the simulation issue rather than just point out the issue with my Verilog code since I want to learn how to use ModelSim efficiently so that in the future debugging will be easier for me.
Equipment Used:
FPGA: Altera De-1-SoC, Cyclone V chip
CAD/Simulation Tools: Altera Quartus II Lite 17.0 + ModelSim Starter Edition

SW wasn't given an initial value, therefore it is high-Z (X if connected to a reg).
I'm guessing that when you used the parameter approach you were parameterizing cycles_countdown. Some simulators do not trigger @* at time 0, so if nothing on the sensitivity list changes, the block may never execute, leaving cycles_countdown at its initial value (4'hX).
Instead of driving your test with Tcl commands, you can create a testbench in Verilog. This testbench should only be used in simulation, not synthesis.
module rate_devider_tb;
reg CLOCK_50;
reg [1:0] SW;
reg [1:0] KEY;
wire [6:0] HEX0;
rate_divider dut( .CLOCK_50(CLOCK_50), .SW(SW), .KEY(KEY), .HEX0(HEX0));
always begin
CLOCK_50 = 1'b1;
#10;
CLOCK_50 = 1'b0;
#10;
end
initial begin
// init input signals
SW <= 2'b01;
KEY <= 2'b00;
// Log file reporting
$monitor("SW:%b KEY:%b HEX0:%h @ %t", SW, KEY, HEX0, $time);
// waveform dumping
$dumpfile("test.vcd");
$dumpvars(0, rate_devider_tb);
wait(CLOCK_50 === 1'b0); // initialization x->1 will trigger an posedge
@(posedge CLOCK_50);
KEY <= 2'b11; // release both active-low resets after SW was sampled
#600; // 600ns assuming timescale is in 1ns steps
$finish();
end
endmodule

Clock Domain Crossing for Pulse and Level Signal

For a pulse we use a pulse synchronizer, and for a level signal we use a 2-flop synchronizer. But what if the signal can behave as either a pulse or a level? Is there any way to synchronize that?

Yes, you can, but the solution needs to be based on the width of the input pulse relative to the output clock period.
When the output clock is very slow and you have a pulse, you need to add an inline pulse stretcher that operates in the input clock domain. The stretch is defined by the bit width of stretch_out below and "MUST" be longer than one clock period of the output clk domain.
reg [3:0] stretch_out;
always @ (posedge inclk)
begin
stretch_out <= in_signal ? 4'b1111 : {stretch_out[2:0],1'b0};
end
Now you can just use your double flop synchronizer.
reg [1:0] out_sync;
always @ (posedge outclk)
begin
out_sync <= {out_sync[0],stretch_out[3]};
end
This should synchronize a level or a pulse from a fast domain into a slow domain.
The only issue is that you will be adding more latency than the usual two flops.
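The two fragments above can be combined into one module, roughly as sketched here. The module name stretch_sync, the output out_signal and the STRETCH parameter are assumptions made for illustration; STRETCH replaces the fixed 4-bit width, must be at least 2, and has to be chosen so that STRETCH periods of inclk are longer than one period of outclk.

```verilog
// Sketch only: stretch_sync, out_signal and STRETCH are hypothetical names.
module stretch_sync #(
    parameter STRETCH = 4   // number of inclk cycles the pulse is held (>= 2)
) (
    input  wire inclk,
    input  wire outclk,
    input  wire in_signal,  // pulse or level in the inclk domain
    output wire out_signal  // synchronized to outclk
);
    // Pulse stretcher in the input domain: load all ones while in_signal is
    // high, then shift zeros in, so the MSB stays high for STRETCH cycles
    // after a single-cycle pulse.
    reg [STRETCH-1:0] stretch_out = {STRETCH{1'b0}};
    always @(posedge inclk)
        stretch_out <= in_signal ? {STRETCH{1'b1}}
                                 : {stretch_out[STRETCH-2:0], 1'b0};

    // Ordinary double-flop synchronizer in the output domain.
    reg [1:0] out_sync = 2'b00;
    always @(posedge outclk)
        out_sync <= {out_sync[0], stretch_out[STRETCH-1]};

    assign out_signal = out_sync[1];
endmodule
```

For a level input, out_signal simply follows the level with the extra stretch latency on the falling side; for a short pulse, out_signal may be high for more than one outclk cycle, so an edge detector is still needed if exactly one output pulse is required.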

You could asynchronously set using the signal in the destination domain, synchronize using dual flops, and then detect the rising edge. Should work for both short pulses and long levels.
// Prevent DRC violations if using scan
wire in_signal_n = scan_mode ? 1'b1 : !signal_in;
// Following code creates a flop with both async setb and resetb
reg sig_n_async;
always @ ( posedge outclk or negedge reset_n or negedge in_signal_n)
if (!reset_n)
sig_n_async <= 0;
else if (!in_signal_n)
sig_n_async <= 1;
else
sig_n_async <= 0;
// Synchronizer
reg [1:0] out_sync;
always @ (posedge outclk or negedge reset_n)
if (!reset_n)
out_sync <= 0;
else
out_sync <= {out_sync[0],sig_n_async};
// Rising edge
reg out_sync_del;
always @ (posedge outclk or negedge reset_n)
if (!reset_n)
out_sync_del <= 0;
else
out_sync_del <= out_sync[1];
wire signal_out = out_sync[1] & !out_sync_del;

Verilog Inter-FPGA SPI Communication

I am trying to communicate between two Xilinx Spartan 3e FPGAs using SPI communication and GPIO pins. The goal is to have a master-slave communication working but for now I am just sending data from Master to Slave and trying to see if the data received is correct.
This is the Master code that sends 16 bits of data to Slave in serial format. After checking on the scope numerous times it seems correct.
module SPI_MASTER_SEND(
input CLK_50MHZ,
input [1:0] ID_user,
input [15:0] DATA_TO_SEND,
output reg SData,
output SCLK,
output notCS
);
parameter max = 20; // max-counter size
reg [6:0]div_counter;
wire [6:0] data_count;
assign data_count[6:0] = div_counter[6:0];
reg CLOCK;
reg Clk_out;
reg CompleteB;
//have the notCS be low for 20 pulses, and hi for 20 pulses.
//sends 16 bits of data during low pulse
always @(posedge CLOCK) begin
if (div_counter == max-1)
begin
div_counter <= 0;
Clk_out <= ~Clk_out;
end
else
begin
div_counter <= div_counter + 1;
end
end
assign notCS = Clk_out;
reg flag;
assign SCLK = flag&&CLOCK; //Clock when notCS is down for 16 pulses
always @(posedge CLOCK) // Parallel to Serial
begin
if (data_count >= 7'd3 && data_count < 7'd18 && notCS ==0)
begin
SData <= DATA_TO_SEND[18-data_count];
flag <=1;
CompleteB<=0;
end
else if (data_count == 7'd18 && notCS ==0)
begin
flag <=1;
SData<=DATA_TO_SEND[0];
CompleteB<=1;
end
else
begin
CompleteB<=0;
flag<=0;
SData <= 0;
end
end
endmodule
This is the code on the Slave receiving end, I check the data on the falling edge of the clock (have tried posedge too) to avoid any timing conflicts.
The Clock,notCS, and SI (serial in) are all coming from the master via gpio pins
module SPI_COMM_SLAVE(CLK,SI,notCS,outputPO,ID_user);
input CLK,SI,notCS;
input [1:0] ID_user;
reg [15:0] PO;
output reg [15:0] outputPO;
reg CompleteB;
reg C;
reg [5:0] cnt;
initial cnt[5:0] = 6'b000000;
always @(negedge CLK)
begin
if (cnt < 6'd15)
begin
PO[15-cnt] <= SI;
cnt <= cnt + 1'b1;
CompleteB<=0;
end
else if (cnt == 6'd15)
begin
PO[0] <= SI;
cnt<=6'b000000;
CompleteB <=1;
end
else
begin
cnt <= cnt;
CompleteB<=0;
end
end
always @(*) begin
if(CompleteB == 1)
outputPO[15:0] <= PO[15:0];
else
outputPO[15:0]<=outputPO[15:0];
end
endmodule
After outputting the "outputPO" to the DAC it gives a bunch of garbage and is clearly not a single value.
Thank you

To debug an FPGA problem like this you should absolutely simulate the design. If you have not already, create a testbench that initiates a write in the master module and connects the slave module as it would be in the system. Check that the waveforms match the behavior you expect. Debugging in hardware is not effective until this simulation is working. If you do not have access to a paid simulator, there are free Verilog simulators available. One suggestion is to build this simulation environment in EDA Playground, and then you can share it here as part of the problem description.
Secondly, I noticed a number of things that could improve the quality and readability of your code, which makes debugging easier:
Indent code inside blocks (begin/end pairs, etc).
Always use non-blocking assignments inside clocked processes and blocking assignments in combinatorial blocks. For example the non-blocking statements in your combinatorial process assigning outputPO in SPI_COMM_SLAVE are wrong. This can lead to simulation not matching synthesized results.
Latches are not recommended for fpga designs. SPI_COMM_SLAVE will synthesize a 16bit latch for outputPO. Consider making this signal a register.
Your master architecture looks more complex than it needs to be. Consider separating the functionality that initiates the spi transactions (div_counter) from the logic that does the actual spi transaction.
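As an illustration of the latch point, the output stage of SPI_COMM_SLAVE could be written as a register along the lines of the sketch below. The module name slave_output_reg is hypothetical and the ports are simply the signals from the question; this is one possible shape, not a drop-in fix for the whole slave.

```verilog
// Sketch only: slave_output_reg is a hypothetical wrapper around the
// question's CLK, CompleteB, PO and outputPO signals.
module slave_output_reg (
    input  wire        CLK,
    input  wire        CompleteB,  // high for one cycle when PO holds a full word
    input  wire [15:0] PO,
    output reg  [15:0] outputPO
);
    // Clocked process with non-blocking assignment: outputPO is a flip-flop
    // that only updates when CompleteB is high and otherwise holds its value,
    // so no latch is inferred.
    always @(negedge CLK) begin
        if (CompleteB)
            outputPO <= PO;
    end
endmodule
```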

How to improve the speed of a multiplier in verilog?

Hi
I want to know how to improve the speed of a multiplier in Verilog without increasing the clock speed.
Does anyone know anything about this?
We don't have the budget to buy Synopsys DesignWare.
Unfortunately, we have also run into a speed limit on our multiplier, so I am trying to find a way to improve the multiplier without raising the clock speed. Our ASIC is already at its timing limit and we have no timing margin, but we still have to change the multiplier logic.
For example, we have already met all clock timing in synthesis, but we need to change the algorithm of some multiplier-related logic.

Assuming that all surrounding logic has been minimised, so that the inputs come directly from flip-flops and the output goes directly to flip-flops:
module multiplier #(
parameter WIDTH = 24
)(
input clk,
input signed [WIDTH-1:0] a,
input signed [WIDTH-1:0] b,
output logic signed [(WIDTH*2) -1:0] mul
);
logic signed [WIDTH-1:0] a_i;
logic signed [WIDTH-1:0] b_i;
always @(posedge clk) begin
a_i <= a;
b_i <= b;
mul <= a_i * b_i;
end
endmodule
Writing the multiplication as a*b in RTL allows the synthesis library to choose the best multiplier style (area/power vs. speed). I am assuming the question arises because synthesis cannot close timing.
What limits the multiplier speed?
Input width could be minimised to speed up the design.
For an ASIC design, the next process node could be chosen, i.e. going from 22nm to 14nm. For an FPGA, a more expensive chip that supports a faster multiplier speed could be used.
Alternatively, the target clock speed of the multiplier could be halved and two multipliers used in parallel. Multi-cycle paths could be used in synthesis if the actual clock is to remain the same but the result is sampled every other clock.
module multiplier #(
parameter WIDTH = 24
)(
input clk,
input rst_n,
input signed [WIDTH-1:0] a,
input signed [WIDTH-1:0] b,
output logic signed [(WIDTH*2) -1:0] mul
);
logic signed [WIDTH-1:0] a1_i;
logic signed [WIDTH-1:0] b1_i;
logic signed [WIDTH-1:0] a2_i;
logic signed [WIDTH-1:0] b2_i;
logic signed [(WIDTH*2) -1:0] mul1_i;
logic signed [(WIDTH*2) -1:0] mul2_i;
logic state;
always @(posedge clk, negedge rst_n) begin
if (~rst_n) begin
state <= 'b0;
end
else begin
state <= ~state;
end
end
always @* begin
mul1_i = a1_i * b1_i;
mul2_i = a2_i * b2_i;
end
always @(posedge clk, negedge rst_n) begin
if (~rst_n) begin
a1_i <= 'b0;
b1_i <= 'b0;
a2_i <= 'b0;
b2_i <= 'b0;
mul <= 'b0;
end
else begin
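if (state) begin
// a1_i/b1_i were loaded two clocks ago, so mul1_i has had two periods to settle
mul <= mul1_i;
a1_i <= a;
b1_i <= b;
end
else begin
// likewise, a2_i/b2_i were loaded two clocks ago
mul <= mul2_i;
a2_i <= a;
b2_i <= b;
end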
end
end
endmodule
Here mul1_i and mul2_i are given multi-cycle properties in synthesis, so they have twice the clock period to resolve.
Another possibility is to instantiate a multi-cycle DesignWare multiplier, using the DesignWare Datapath and Building Block IP. They have 2, 3, 4, 5 and 6 cycle multipliers.
An example of a 2-stage multiplier:
module DW02_mult_2_stage_inst( inst_A, inst_B, inst_TC,
inst_CLK, PRODUCT_inst );
parameter A_width = 8;
parameter B_width = 8;
input [A_width-1 : 0] inst_A;
input [B_width-1 : 0] inst_B;
input inst_TC;
input inst_CLK;
output [A_width+B_width-1 : 0] PRODUCT_inst;
// Instance of DW02_mult_2_stage
DW02_mult_2_stage #(A_width, B_width)
U1 ( .A(inst_A), .B(inst_B), .TC(inst_TC),
.CLK(inst_CLK), .PRODUCT(PRODUCT_inst) );
endmodule

Circuit behaves poorly in timing simulation but alright in behavioral - new to verilog

I'm new to verilog development and am having trouble seeing where I'm going wrong on a relatively simple counter and trigger output type design.
Here's the verilog code
Note the code returns the same result whether or not the reg is declared on the output_signal without the internal_output_buffer
`timescale 1ns / 1ps
module testcounter(
input wire clk,
input wire resetn,
input wire [31:0] num_to_count,
output reg [7:0] output_signal
);
reg [31:0] counter;
initial begin
output_signal = 0;
end
always @(negedge resetn) begin
counter = 0;
end
always @(posedge clk) begin
if (counter == num_to_count) begin
counter = 0;
if (output_signal == 0) begin
output_signal = 8'hff;
end
else begin
output_signal = 8'h00;
end
end
else begin
counter = counter + 1;
end
end
assign output_signal = internal_output_buffer;
endmodule
And the code is tested by
`timescale 1ns / 1ps
module testcounter_testbench(
);
reg clk;
reg resetn;
reg [31:0] num_to_count;
wire [7:0] output_signal;
initial begin
clk = 0;
forever #1 clk = ~clk;
end
initial begin
num_to_count = 20;
end
initial begin
#7 resetn = 1;
#35 resetn = 0;
end
testcounter A1(.clk(clk),.resetn(resetn),.num_to_count(num_to_count),.output_signal(output_signal));
endmodule
Behavioral simulation looks as I expected
But the timing simulation explodes
And for good measure: the actual probed execution blows up and looks like
Any tips would be appreciated. Thanks all.

The difference between timing and functional simulation is that a timing simulation models the actual delay of the logic gates, while a functional simulation only checks whether the values are correct.
For example, take a simple combinational adder with two inputs a and b and an output c. A functional simulation will tell you that c=a+b, and c will change at the exact instant that a or b changes.
However, a timing simulation of the same circuit will only show the result (a+b) on c after some time t, where t is the delay of the adder.
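The adder example can be imitated in a plain simulation by putting an explicit delay on a continuous assignment. The sketch below is only an illustration: the module name adder_delay_demo, the signal names and the 3 ns delay are assumptions, whereas a real timing simulation takes its delays from the back-annotated netlist.

```verilog
`timescale 1ns / 1ps
// Sketch only: names and the 3 ns adder delay are assumed for illustration.
module adder_delay_demo;
    reg  [3:0] a, b;
    wire [4:0] c_functional;
    wire [4:0] c_timing;

    assign    c_functional = a + b;  // functional view: updates immediately
    assign #3 c_timing     = a + b;  // timing view: updates 3 ns after an input change

    initial begin
        a = 4'd1; b = 4'd2;
        #10 a = 4'd5;                // the sum changes from 3 to 7 here
        // 1 ns after the change: c_functional is already 7, c_timing still shows 3
        #1 $display("functional=%0d timing=%0d", c_functional, c_timing);
        // 4 ns after the change: the delayed output has caught up, both are 7
        #3 $display("functional=%0d timing=%0d", c_functional, c_timing);
        $finish;
    end
endmodule
```

If a flip-flop sampled c_timing during the window in which it still holds the old sum, it would capture the wrong value, which is what happens when the clock period is shorter than the combinational delay.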
What is your platform? If you are using an FPGA it is very difficult to hit 500 MHz. Your clock statement:
forever #1 clk = ~clk;
shows that you toggle the clock every 1ns, meaning that your period is 2ns and your frequency is 500MHz.
The combinational delay through FPGA resources such as lookup tables, multiplexers and wire segments is probably more than 2ns. So your circuit violates timing constraints and gives wrong behaviour.
The first thing I would try is to use a much lower clock frequency, for example 100 MHz and test the circuit again. I expect it to produce the correct results.
forever #5 clk = ~clk;
Then to know the maximum safe frequency you can run at, look at your compilation reports in your design tools by running timing analysis. It is available in any FPGA CAD tool.

Your code seems to work fine in Xilinx Vivado 14.2, but there is one error, in the following line:
assign output_signal = internal_output_buffer;
You can't drive a reg with "assign", and "internal_output_buffer" is not defined.
I also personally recommend setting all registers to some initial value. Your variables "resetn" and "counter" are not assigned initially. Basically, change your code like this, for example:
reg [31:0] counter = 32'b0;
Here is my result with your code:

Your Verilog code in testcounter looks broken: (a) it has multiple drivers, and (b) as @StrayPointer notes, it uses blocking assignments for register (flip-flop) values.
I'm guessing your intent was the following, which could fix a lot of simulation mismatches:
module testcounter
(
input wire clk,
input wire resetn,
input wire [31:0] num_to_count,
output wire [7:0] output_signal
);
reg [31:0] counter;
always @(posedge clk or negedge resetn) begin
if (!resetn) begin
counter <= 0;
end else begin
if (counter == num_to_count) begin
counter <= 0;
end else begin
counter <= counter + 1;
end
end
end
assign output_signal = (counter == num_to_count) ? 8'hff : 8'h00;
endmodule
