Error when elaborating in Design Vision - Verilog

I am trying to write code in Verilog and synthesize it in Design Vision, but during elaboration Design Vision gives the errors below:
net "countS[5]" is driven by more than one source,and at least one source is constant net
net "countS[4]" is driven by more than one source,and at least one source is constant net
net "countS[3]" is driven by more than one source,and at least one source is constant net
net "countS[2]" is driven by more than one source,and at least one source is constant net
net "countS[1]" is driven by more than one source,and at least one source is constant net
net "countS[0]" is driven by more than one source,and at least one source is constant net
net "countH[5]" is driven by more than one source,and at least one source is constant net
net "countH[4]" is driven by more than one source,and at least one source is constant net
net "countH[3]" is driven by more than one source,and at least one source is constant net
net "countH[2]" is driven by more than one source,and at least one source is constant net
net "countH[1]" is driven by more than one source,and at least one source is constant net
net "countH[0]" is driven by more than one source,and at least one source is constant net
My code is below:
module main(clk,ts1,ts2,ts3,ts4,mode,res);
//clock of circuit
input clk;
//input switchs that indicate delays in test mode
input [3:0] ts1;
input [3:0] ts2;
input [3:0] ts3;
input [3:0] ts4;
//input switch that indicate mode of circuit
input [1:0] mode;
//output that indicate state of circuit
output reg [2:0] res;
//regs for counting
reg [5:0] countH;
reg [5:0] countS;
//array that indicate delays
reg [7:0] delays [3:0];
initial begin
//resetting circuit variables
countH = 0;
countS = 0;
res = 0;
//setting delays for regular mode
delays[0] = 30; //rg
delays[1] = 5; //ry
delays[2] = 45; //gr
delays[3] = 5; //yr
end
//trig always whenever mode was changed
always @(mode[0] or mode[1]) begin
//restarting timer
countH = 0;
countS = 0;
//mean that mode is regular
if(mode == 2'b00) begin
delays[0] = 30; //rg
delays[1] = 5; //ry
delays[2] = 45; //gr
delays[3] = 5; //yr
//mean that mode is test mode
end else if(mode == 2'b01) begin
//setting delays according to input switchs
delays[0] = ts1; //rg
delays[1] = ts2; //ry
delays[2] = ts3; //gr
delays[3] = ts4; //yr
//mean that mode is standby
end else begin
delays[0] = 0; //rg
delays[1] = 0; //ry
delays[2] = 0; //gr
delays[3] = 0; //yr
res = 4;
end
end
//trig in all clocks
always @(negedge clk) begin
countH = countH + 1;
//count=60 mean 1sec
if(countH == 60) begin
//updating variables
countH = 0;
countS = countS + 1;
//mean that mode is standby
if(mode == 2) begin
res = 4;
countS = 0;
//mean that mode is regular or test
end else begin
//checking for delay
if(countS == delays[res]) begin
countS = 0;
res = res + 1;
if(res == 4) begin
res = 0;
end
end
end
end
end
endmodule

The errors are generated by the synthesis tool. Synthesis tries to convert the design written in Verilog to hardware, and during this translation it finds that the wires countS (and others) are driven from several locations.
These locations are the initial and always blocks where the wires are assigned. Think of it like doing a design using discrete gates; in that case it would also be a problem if multiple gates drove the same wire.
So you need to modify the design so that each wire/reg is driven by only one always block or continuous assignment. The initial block should probably be converted to an asynchronous or synchronous reset, or to an initial value in the declaration, like reg [5:0] countS = 0;.
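As a minimal sketch of that advice (not the asker's full design), the two counters can be given a single driver by folding the reset into the clocked block. The module name and the rst input are assumptions made for illustration; the original port list has no reset.

```verilog
// Sketch only: "counters_single_driver" and "rst" are hypothetical names.
module counters_single_driver (
    input  wire       clk,
    input  wire       rst,      // assumed synchronous, active-high reset
    output reg  [5:0] countH,
    output reg  [5:0] countS
);
    // countH and countS are assigned in this always block and nowhere else,
    // so each net has exactly one driver.
    always @(negedge clk) begin
        if (rst) begin
            countH <= 6'd0;
            countS <= 6'd0;
        end else if (countH == 6'd59) begin
            // 60 clock edges have elapsed: wrap countH and bump countS
            countH <= 6'd0;
            countS <= countS + 6'd1;
        end else begin
            countH <= countH + 6'd1;
        end
    end
endmodule
```

The restart on a mode change, which the question handles in a second always block, would also have to move into this block, for example by comparing mode against a registered copy of its previous value.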

Related

How can I prevent that DSP blocks are synthesized away if they are not connected to a top level output?

I am using an Intel Stratix 10 FPGA and Quartus Prime Pro 21.4 to develop a power test project.
I cannot figure out how to keep Quartus from optimizing away my DSP blocks.
I want to use all 3000 DSP blocks in our FPGA so that I can see the max current draw of the DSP block. Of course, we can use the power estimator, but we require a real-world physical test.
I actually don't need the output from the DSP block. I only care that they are running and using FPGA resources.
I have instantiated the Intel fixed DSP core IP as a multiplier:
https://www.intel.com/content/www/us/en/docs/programmable/683450/current/native-fixed-point-dsp-intel-stratix-51840.html
I am using a generate for loop to generate 3000 of these DSP IP blocks. My problem is that the DSP blocks are synthesized away unless I connect the output from each of the DSP blocks directly to a top level output. I only have ~1000 outputs available so this is not possible.
I thought I could just connect each output with a register array to catch the output. But it seems that if I don't actually use the output values or connect it outright to a top level output pin, then Quartus thinks we don't need it and optimizes it away.
The 2nd solution I tried is to use combinational logic:
top_output = DSP_out[0] || DSP_out[1] || DSP_out[2] || DSP_out[3]
This solution generates 4 DSP blocks even though the generate loop runs 3000 times. I tried doing this in a loop, but it did not work. Is there a way to trick the system into synthesizing all the DSP blocks even if I don't connect the blocks to a top level output?
I seem to be able to access the output of the DSP block with no issues. For instance, I was able to turn on or off an LED based on the numbers I fed into a single multiplier.
Here is the full code:
`timescale 1ps/1ps
`default_nettype none
module power_test_design (
input wire clk_i,
output reg [0:0] outputa,
output reg [0:0] outputb
);
localparam NUM_DSP_BLOCKS = 3000;
genvar i;
wire reset;
integer k;
//input stimulus signals for the DSP
reg [17:0] ay_r;
reg [17:0] by_r;
reg [17:0] ax_r;
reg [17:0] bx_r;
//create wires and registers to hold outputs from multiplier
(* keep = "true" *) wire [36:0] resulta [NUM_DSP_BLOCKS-1:0];
(* keep = "true" *) reg [36:0] resulta_r [NUM_DSP_BLOCKS-1:0];
(* keep = "true" *) wire [36:0] resultb [NUM_DSP_BLOCKS-1:0];
(* keep = "true" *) reg [36:0] resultb_r [NUM_DSP_BLOCKS-1:0];
reg [2:0] ena_r;
// Stratix10 system reset
reset_release U_RESET (
.ninit_done (reset ) // output, width = 1, ninit_done.ninit_done
);
// DSP stimulus
always @(posedge clk_i) begin : DSP_SET_FF
if (reset)
begin
ay_r <= {18{1'b0}};
by_r <= {18{1'b0}};
ax_r <= {18{1'b0}};
bx_r <= {18{1'b0}};
ena_r <= {3{1'b0}};
end else
begin
ena_r <= 3'b001;
ay_r <= $unsigned(ay_r) + 1;
by_r <= $unsigned(by_r) + 1;
ax_r <= $unsigned(ax_r) + 2;
bx_r <= $unsigned(bx_r) + 3;
end
end
generate
for (i=0; i<NUM_DSP_BLOCKS; i=i+1) begin : GEN_DSPS
dsp_fixed U_DSP (
.ay (ay_r), // input, width = 18, ay.ay
.by (by_r), // input, width = 18, by.by
.ax (ax_r), // input, width = 18, ax.ax
.bx (bx_r), // input, width = 18, bx.bx
.resulta (resulta[i]), // output, width = 37, resulta.resulta
.resultb (resultb[i]), // output, width = 37, resultb.resultb
.clk0 (clk_i), // input, width = 1, clk0.clk
.clk1 (), // input, width = 1, clk1.clk
.clk2 (), // input, width = 1, clk2.clk
.ena (ena_r) // input, width = 3, ena.ena
);
//bring result to a register to assign output logic
assign resulta_r[i] = resulta[i];
assign resultb_r[i] = resultb[i];
end
endgenerate
//output logic -this code generates 6 DSP blocks....I need to generate all 3000
always @(posedge clk_i) begin : outputLogic
for (k=1; k<50; k=k+1)
begin
outputa = resulta_r[k] || resulta_r[k+1] || resulta_r[k+2];
outputb = resultb_r[k+3] || resultb_r[k+4] || resultb_r[k+5];
end
end
endmodule
`resetall
So far, I have tried several ways to assign this output. First:
always @(resulta_r[0], resulta_r[1], resulta_r[2], resulta_r[3]) begin
if (resulta_r[0] == 4)
begin
outputa = 1;
end
else if (resulta_r[1] == 6)
begin
outputa = 1;
end
else if (resulta_r[2] == 6)
begin
outputa = 1;
end
else if (resulta_r[3] == 6)
begin
outputa = 1;
end
else
begin
outputa = 0;
end
end
With this code, DSP blocks are generated for each if statement. So, the next idea was
always @(posedge clk_i) begin : outputLogic
for (k=1; k<50; k=k+1)
begin
outputa = resulta_r[k] || resulta_r[k+1] || resulta_r[k+2];
outputb = resultb_r[k+3] || resultb_r[k+4] || resultb_r[k+5];
end
end
This works in a similar way. I get a DSP block generated for each result[k] in the combinational statement. But this only generates 6 DSP blocks in total when synthesizing. It only generates blocks based on how many DSP block outputs are in this combinational statement.
I solved this issue using virtual pins in Quartus. I can assign each output pin to be only a virtual pin and not an actual pin. With this setup I can have as many output pins as I require without really connecting them to anything.
Quartus Virtual Pins
The design still doesn't scale up to 3000 for some reason, but I have reached out to Intel for that. The original issue of optimizing away the DSP blocks unless they are connected to an output is solved.
The other solution that solved this issue was to chain several of these DSP blocks together. It also doesn't scale, but solves the original question asked here as well.
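One possible explanation for the six-block result, offered as a guess rather than a confirmed diagnosis: each pass through the for loop overwrites outputa and outputb with blocking assignments, so only the last iteration (k = 49) survives, and that iteration reads three resulta_r entries and three resultb_r entries. A loop that accumulates instead of overwriting would make every result contribute to the output. The sketch below assumes a hypothetical helper module, xor_fold, with the results packed into one flat bus.

```verilog
// Hypothetical helper: folds N results of W bits each into a single bit.
module xor_fold #(
    parameter N = 8,    // number of results (3000 in the question)
    parameter W = 37    // width of each result
) (
    input  wire           clk,
    input  wire [N*W-1:0] results_flat, // all results packed side by side
    output reg            folded
);
    integer k;
    reg acc; // temporary, written before it is read on every clock edge

    always @(posedge clk) begin
        acc = 1'b0;
        for (k = 0; k < N; k = k + 1)
            acc = acc ^ (^results_flat[k*W +: W]); // accumulate, do not overwrite
        folded <= acc;
    end
endmodule
```

Whether Quartus then keeps all 3000 blocks has not been verified here; instances that are fed identical inputs may still be merged or simplified by the tool.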

Will temp variable in always_comb create latch

I have following code snippet where a temp variable is used to count number of 1s in an array:
// count the number 1s in array
logic [5:0] count_v; //temp
always_comb begin
count_v = arr[0];
if (valid) begin
for (int i=1; i<=31; i++) begin
count_v = arr[i] + count_v;
end
end
final_count = count_v;
end
Will this logic create a latch for count_v? Is the synthesis tool smart enough to synthesize this logic properly? I am struggling to find any coding recommendations for this kind of scenario.
Another example:
logic temp; // temp variable
always_comb begin
temp = 0;
for (int i=0; i<32; i++) begin
if (i>=start) begin
out_data[temp*8 +: 8] = in_data[i*8 +: 8];
temp = temp + 1'b1;
end
end
end
Any always block that assigns the variable a deterministic initial value before reading it will not generate a latch; the exception is a combinational (logic) loop.
Sorry Eddy Yau, we seem to have some discussions going on regarding your post.
Here is some example code:
module latch_or_not (
input cond,
input [3:0] v_in,
output reg latch,
output reg [2:0] comb1,
output reg [2:0] comb2
);
reg [2:0] temp;
reg [2:0] comb_loop;
// Make a latch
always @( * )
if (cond)
latch = v_in[0];
always @( * )
begin : aw1
integer i;
for (i=0; i<4; i=i+1)
comb_loop = comb_loop + v_in[i];
comb2 = comb_loop;
end
always @( * )
begin : aw2
integer i;
temp = 7;
for (i=0; i<4; i=i+1)
temp = temp - v_in[i];
comb1 = temp;
end
endmodule
This is what came out of it according to the Xilinx Vivado tool after elaboration:
The 'latch' output is obvious. You will also notice that temp is not present in the end result.
The 'comb_loop' is not a latch but even worse: it is a combinatorial loop. The output of the logic goes back to the input. A definitely NO-NO!
General rule: if you read a variable before writing to it, then your code implies memory of some sort. In this case, both the simulator and synthesiser have to implement storage of a previous value, so a synthesiser will give you a register or latch. Both your examples write to the temporary before reading it, so no storage is implied.
Does it synthesise? Try it and see. I've seen lots of this sort of thing in production code, and it works (with the synths I've used), but I don't do it myself. I would try it, see what logic is created, and use that to decide whether you need to think more about it. Counting set bits is easy without a loop, but the count loop will almost certainly work with your synth. The second example may be more problematical.
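The write-before-read rule can be illustrated with the first example from the question, rewritten here as a self-contained Verilog-2001 module. The module name and port list are assumptions added for illustration; the behaviour follows the original snippet.

```verilog
// Sketch: "popcount32" and its ports are hypothetical wrappers around the
// question's first snippet.
module popcount32 (
    input  wire [31:0] arr,
    input  wire        valid,
    output reg  [5:0]  final_count
);
    integer i;
    reg [5:0] count_v; // temporary

    always @(*) begin
        // count_v is written here before any read, on every evaluation,
        // so no previous value has to be remembered.
        count_v = arr[0];
        if (valid) begin
            for (i = 1; i <= 31; i = i + 1)
                count_v = count_v + arr[i];
        end
        final_count = count_v;
    end
endmodule
```

Because both count_v and final_count receive a value on every path through the block, a synthesis tool has no reason to infer storage for either of them.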

Proper way to synchronize two state machines with slightly skewed clock in Verilog

I am implementing a receiver for an ADC in Verilog. One sample is obtained after each 21st clock cycle.
The receiver generates the control signals as well as a duty cycled sampling clock for the ADC. The ADC sends the data back sequentially but in order to account for delay, it also sends back a skew matched copy of the duty cycled sampling clock. This clock is to be used to clock in the data.
The code should work for zero delay between the two clocks as well as larger delays. (But the delay won't be larger than a few clock cycles).
I do not know the best way to do this because:
Synthesis prohibits that variables are written in different always @(posedge...) blocks with (possibly) different clocks.
The part that clocks in the data does not have a real clock (it is duty-cycled!) so it cannot maintain a state on its own. It somehow needs to obtain the information in which cycle it is from the controlling FSM
Once the sampled value has been read, it needs to be transferred back to the original, un-skewed clock domain for further processing.
This shows a minimal example of my approach:
// Used to synchronize state between domains
reg sync_cnv = 0; // toggled by TX side when new sampling cycle starts
reg sync_sdo = 0; // synchronized by the RX side
reg reset_rx = 0; // Notify RX side of a global reset
reg reset_rx_ack = 0; // acknowledgement thereof
reg [4:0] state = 0;
reg [4:0] nextState = 0;
always @(posedge clk) begin
if (reset == 1) begin // global reset
state <= 0;
sync_cnv <= 0;
reset_rx <= 1;
end else begin
state <= nextState;
// new sampling cycle starts. Inform RX logic
if (state == 0) begin
sync_cnv <= ~sync_cnv;
end
// If RX acknowledges the reset, we can turn it off again
if (reset_rx_ack == 1) begin
reset_rx <= 0;
end
end
end
// Normally, would generate all kinds of status/control signal for the ADC here
always @(*) begin
if (state == 20) begin
nextState = 0;
end else begin
nextState = state + 1;
end
end
The state is just implemented as a 21-state counter, using the variables state and nextState.
When state is zero, a new sampling interval begins. The receiver logic (see below) will recognize this by the fact that sync_cnv changes.
On global reset, the FSM is brought into a known state. Furthermore, reset_rx is set to 1 to notify the receiver logic (see below) about the reset. It stays at 1 until it is acknowledged (reset_rx_ack).
The receive logic:
reg [14:0] counter = 0; // just for dummy data. Increments every sample interval
reg sampling_done = 0; // raised when sampling is done
reg [15:0] cbuf; // holds data during data reception
always @(posedge rxclk) begin
if ( reset_rx == 1) begin
reset_rx_ack <= 1;
sync_sdo <= sync_cnv;
counter <= 0;
end else begin
reset_rx_ack <= 0;
if (sync_cnv != sync_sdo) begin
// A new sampling interval begins
sync_sdo <= sync_cnv;
counter <= counter + 1;
sampling_done <= 1;
data <= cbuf;
end else begin
// normal operation
cbuf <= counter;
sampling_done <= 0;
end
end
end
// synchronize "sampling_done" back to the unskewed clock.
// if data_valid, then data can be read the next cycle of clk
always @(posedge clk) begin
r1 <= sampling_done; // first stage of 2-stage synchronizer
r2 <= r1; // second stage of 2-stage synchronizer
r3 <= r2; // edge detector memory
end
assign data_valid = (r2 && !r3); // pulse on rising edge
This code works flawlessly in simulation (with and without skew). It also works on the FPGA most of the time. However, the data value after a reset is not predictable: mostly the data starts with 0 (as expected), but sometimes with 1 or an arbitrary number (probably from the last cycle before the reset).
Using a NRZ signal between clock domains is a known method. But you do not have a real synchroniser. To safely go between clocks you need two registers and a third one for edge detection:
// Clock domain 1:
nrz <= ~nrz;
// Clock domain 2:
reg nrz_meta,nrz_sync,nrz_old;
....
nrz_meta <= nrz;
nrz_sync <= nrz_meta;
// nrz_sync is the signal you can safely use!
// Do NOT use nrz_sync ^ nrz_meta, it is not reliable!
nrz_old <= nrz_sync; // required to 'find' an edge
if (nrz_old ^ nrz_sync)
begin
// Process data
....
On a reset you set all registers to zero. That way you do not have a 'false' sample at the start. It is simplest to have the same asynchronous reset in all clock domains. Dealing with resets in clock domains is a rather (big) subject which would take an A4 page to tersely explain. In your case nothing happens for 21 clock cycles so you are safe.
An alternative is to use a standard asynchronous FIFO to transfer data between clock domains. It is the best solution if your clocks are totally independent (that is, either clock can be slower or faster than the other one).
I am sure you can find code for it on the WWW.
An asynchronous FIFO in the other direction can be used to send control information to your ADC.
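The synchroniser fragment in this answer can be wrapped into a self-contained module along the following lines. This is a sketch under assumptions: the module name, the port names clk_dst, rst and event_pulse, and the choice of an active-high asynchronous reset are illustrative and not taken from the original post.

```verilog
// Sketch: receives a toggle (NRZ) signal from another clock domain and
// produces a one-cycle pulse in the destination domain for each toggle.
module toggle_sync (
    input  wire clk_dst,     // destination-domain clock
    input  wire rst,         // assumed asynchronous, active-high reset
    input  wire nrz,         // toggle signal from the source domain
    output wire event_pulse  // high for one clk_dst cycle per toggle
);
    reg nrz_meta; // first stage, may go metastable
    reg nrz_sync; // second stage, safe to use
    reg nrz_old;  // previous value, needed to find an edge

    always @(posedge clk_dst or posedge rst) begin
        if (rst) begin
            // all stages start at zero so no false event appears after reset
            nrz_meta <= 1'b0;
            nrz_sync <= 1'b0;
            nrz_old  <= 1'b0;
        end else begin
            nrz_meta <= nrz;
            nrz_sync <= nrz_meta;
            nrz_old  <= nrz_sync;
        end
    end

    // Compare the two safe stages only; nrz_meta is deliberately not used.
    assign event_pulse = nrz_old ^ nrz_sync;
endmodule
```

The pulse appears a few destination clock cycles after the source toggles, so the data being transferred has to stay stable for at least that long.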

Can't identify unsafe latch behaviour or completeness of case statement in Verilog code

Hey, I'm trying to create a small module that reads which button is pressed on a DE2 4x4 matrix board and then displays which column and which row is being pressed on the LEDs, but I'm having a few problems.
Right now the columns work but not the rows. I think it has something to do with the fact that the LEDs I use to display the row status have "unsafe latch behaviour", but I'm not too sure.
I have also noticed that my case statement only ever resolves to the default statement and I don't know why, and the tool says it can't check for completeness.
Would anybody be able to help me? If so thank you very much.
module MatrixInput(MInput, MOutput, LEDR);
input [16:10] MInput; //cols
output reg [24:18] MOutput; //rows
output reg [7:0] LEDR;
reg [31:0] counter; //just setting to max size for now
reg [31:0] i;
reg LEDFlag;
initial begin
counter = 0;
i = 7;
LEDFlag = 0;
end
always @(*) begin
case(counter)
0: MOutput = 7'b0x1x1x1;
1: MOutput = 7'b1x0x1x1;
2: MOutput = 7'b1x1x0x1;
3: MOutput = 7'b1x1x1x0;
default: MOutput = 7'b1x0x0x0;
endcase
LEDR[7] = MInput[10];
LEDR[6] = MInput[12];
LEDR[5] = MInput[14];
LEDR[4] = MInput[16];
repeat(4) begin //step through each col
if (LEDR[i] == 1) //set the LED flag on if any of the col LEDS on
LEDFlag = 1;
if (i != 3) //count down from 7 to 3
i = i - 1;
else
i = 7;
end
LEDR[counter] = LEDFlag;
LEDFlag = 0;
if (counter != 4)
counter = counter + 1;
else
counter = 0;
end
endmodule
There are a number of issues here, I'll give you some hints to get started. Number one is you need some kind of clock to make the counter actually count in a way that you can observe. Otherwise it just zips around like an infinite loop in software (actually, the synthesis tool is probably smart enough to see this and not synthesize any logic at all). Second, initial works only in simulation, but it is not a synthesizable construct. When you power up the logic, counter is going to be at some random value which will likely not match any of the 0-3 cases you have defined, which is why it always goes to the default case. You need a reset and to specify a reset value.
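A minimal sketch of the first two hints, assuming a clock and a reset input are available (neither exists in the original port list, and the module and signal names here are hypothetical): the counter advances only on a clock edge, starts from a defined reset value, and the row drive is decoded from it combinationally.

```verilog
// Sketch: clocked row counter with reset; names are illustrative only.
module row_scan (
    input  wire       clk,     // assumed board clock
    input  wire       reset,   // assumed active-high synchronous reset
    output reg  [1:0] counter, // current row index, 0..3
    output reg  [3:0] row_n    // active-low one-hot row drive
);
    // The counter changes once per clock edge instead of re-evaluating
    // continuously, and reset gives it a known starting value.
    always @(posedge clk) begin
        if (reset)
            counter <= 2'd0;
        else
            counter <= counter + 2'd1; // 2-bit counter wraps from 3 to 0
    end

    // Every case assigns row_n, so no latch is implied.
    always @(*) begin
        case (counter)
            2'd0:    row_n = 4'b0111;
            2'd1:    row_n = 4'b1011;
            2'd2:    row_n = 4'b1101;
            2'd3:    row_n = 4'b1110;
            default: row_n = 4'b1111;
        endcase
    end
endmodule
```

On real hardware the scan rate would normally be slowed down with a clock divider or enable so that the rows are visible and the buttons have time to settle.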

Why is adding one operation causing my number of logic elements to skyrocket?

I'm designing a 464 order FIR filter in Verilog for use on the Altera DE0 FPGA. I've got (what I believe to be) a working implementation; however, there's one small issue that's really actually given me quite a headache. The basic operation works like this: A 10 bit number is sent from a micro controller and stored in datastore. The FPGA then filters the data, and lights LED1 if the data is near 100, and off if it's near 50. LED2 is on when the data is neither 100 nor 50, or the filter hasn't filled the buffer yet.
In the specification, the coefficients (which have been pre provided), have been multiplied by 2^15 in order to represent them as integers. Therefore, I need to divide my final output Y by 2^15. I have implemented this using a shift, since it should be (?) the most efficient way. However, this single line causes my number of logic elements to jump from ~11,000 without it, to over 35,000. The Altera DE0 uses a Cyclone III FPGA which only has room for about 15k logic elements. I've tried doing it inside both combinational and sequential logic blocks, both of which have the same exact issue.
Why is this single, seemingly simple operation causing such an inflation in logic elements? I'll include my code, which I'm sure isn't the most efficient, nor the cleanest. I don't care about optimizing this design for performance or area/density at all. I just want to be able to fit it onto the FPGA so it'll run. I'm not very experienced in HDL design, and this is by far the most complex project I've needed to tackle. It's worth noting that I do not remove y completely; I replace the "bad" line with assign YY = y;.
Just as a note: I haven't included all of the coefficients, for sanity's sake. I know there might be a better way to do it than using case statements, but it's the way that it came and I don't really want to relocate 464 elements to a parameter declaration, etc.
module lab5 (LED1, LED2, handshake, reset, data_clock, datastore, bit_out, clk);
// NUMBER OF COEFFICIENTS (465)
// (Change this to a small value for initial testing and debugging,
// otherwise it will take ~4 minutes to load your program on the FPGA.)
parameter NUMCOEFFICIENTS = 465;
// DEFINE ALL REGISTERS AND WIRES HERE
reg [11:0] coeffIndex; // Coefficient index of FIR filter
reg signed [16:0] coefficient; // Coefficient of FIR filter for index coeffIndex
reg signed [16:0] out; // Register used for coefficient calculation
reg signed [31:0] y;
wire signed [7:0] YY;
reg [9:0] xn [0:464]; // Integer array for holding x
integer i;
output reg LED1, LED2;
// Added values from part 1
input reset, handshake, clk, data_clock, bit_out;
output reg [9:0] datastore;
integer k;
reg sent;
initial
begin
sent = 0;
i=0;
datastore = 10'b0000000000;
y=0;
LED1 = 0;
LED2 = 0;
for (i=0; i<NUMCOEFFICIENTS; i=i+1)
begin
xn[i] = 0;
end
end
always @(posedge data_clock)
begin
if(handshake)
begin
if(bit_out)
begin
datastore = datastore >> 1;
datastore [9] = 1;
end
else
begin
datastore = datastore >> 1;
datastore [9] = 0;
end
end
end
always @(negedge clk)
begin
if (!handshake )
begin
if(!sent)
begin
y=0;
for (i=NUMCOEFFICIENTS-1; i > 0; i=i-1) //shifts coeffecients
begin
xn[i] = xn[i-1];
end
xn[0] = datastore;
for (i=0; i<NUMCOEFFICIENTS; i=i+1)
begin
// Calculate coefficient based on the coeffIndex value. Note that coeffIndex is a signed value!
// (Note: These don't necessarily have to be blocking statements.)
case ( 464-i )
12'd0: out = 17'd442; // This coefficient should be multiplied with the oldest input value
12'd1: out = -17'd373;
12'd2: out = -17'd169;
...
12'd463: out = -17'd373; //-17'd373
12'd464: out = 17'd442; //17'd442
// This coefficient should be multiplied with the most recent data input
// This should never occur.
default: out = 17'h0000;
endcase
y = y + (out * xn[i]);
end
sent = 1;
end
end
else if (handshake)
begin
sent = 0;
end
end
assign YY = (y>>>15); //THIS IS THE LINE THAT IS CAUSING THE ISSUE!
always @(YY)
begin
LED1 = 0;
LED2 = 1;
if ((YY >= 40) && (YY <= 60))
begin
LED1 <= 0;
LED2 <= 0;
end
if ((YY >= 90) && (YY <= 110))
begin
LED1 <= 1;
LED2 <= 0;
end
end
endmodule
You're almost certainly seeing the effects of synthesis optimisation.
The following line is the only place that uses y:
assign YY = (y>>>15); //THIS IS THE LINE THAT IS CAUSING THE ISSUE!
If you remove this line, all the logic that feeds into y (including out and xn) will be removed. On Altera you want to look carefully through your map report which will contain (buried amongst a million other things) information about all the logic that Quartus has removed and the reason behind it.
Good places to start are the Port Connectivity Checks, which will tell you if any inputs or outputs are stuck high or low or are dangling. Then look through the Registers Removed During Synthesis section and Removed Registers Triggering Further Register Optimizations.
You can try to force Quartus not to remove redundant logic by using the following in your QSF:
set_instance_assignment -name preserve_fanout_free_node on -to reg
set_instance_assignment -name preserve_register on -to foo
In your case however it sounds like the correct solution is to re-factor the code rather than try to preserve redundant logic. I suspect you want to investigate using an embedded RAM to store the coefficients.
(In addition to Chiggs' answer, assuming that you are hooking up YY correctly ....)
I would add that, you don't need >>>. It would be simpler to write :
assign YY = y[22:15];
And BTW, initial blocks are ignored for synthesis. So, you want to move that initialization to the respective always blocks in an if (reset) or if (handshake) section.
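To see why the part-select can stand in for the shift here, the two forms can be placed side by side. The module and output names below are made up for illustration; y has the same width and signedness as in the question.

```verilog
// Sketch: both outputs carry the same 8 bits of y.
module scale_q15 (
    input  wire signed [31:0] y,
    output wire signed [7:0]  yy_shift,
    output wire signed [7:0]  yy_slice
);
    // Arithmetic right shift by 15 on the 32-bit value; the assignment then
    // keeps only the low 8 bits of the shifted result.
    assign yy_shift = y >>> 15;

    // The same 8 bits selected directly, with no shifter involved.
    assign yy_slice = y[22:15];
endmodule
```

Shifting by a constant is only rewiring, so neither form should cost logic by itself; the jump in logic elements comes from y finally being used, as the first answer explains.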
