Extra regs being created during synthesis - verilog

Say I have 2 multi-bit regs in a design. Both share a common condition (cond_x) as their enable, but one of them has an extra condition (cond_y), apart from the reset signal, under which it is meant to be reset.
Example (ignoring reset part of code for simplicity) -
1. Same always block
always @(posedge clock) begin
    if(cond_x) begin
        a <= a_next;
        b <= b_next;
    end else if(cond_y) begin
        b <= 5'b0;
    end
end
2. Different always blocks
always @(posedge clock) begin
    if(cond_x) begin
        a <= a_next;
    end
end
always @(posedge clock) begin
    if(cond_x) begin
        b <= b_next;
    end else if(cond_y) begin
        b <= 5'b0;
    end
end
When I synthesize 2, I get more regs than are expected in the design. With 1 the count is accurate. The extra regs appear only for the lower two bits of b and are suffixed with __rep1. I am not sure what that means or how they are being created.
Is there any possible reason for this? I am using Synopsys DC.

Design Compiler can replicate cells to improve timing, load etc. and the replicated cells get the suffix _rep<n>. The datasheet of DC Ultra has the following explanation:
DC Ultra looks at a larger subsection of the critical path during
logic duplication and can replicate many gates to reduce load of high
fan-out nets, hence improving timing on critical paths through load
isolation.
Although the two code snippets appear functionally identical, DC can produce different results depending on the starting conditions. Most probably the second version was synthesized into a worse circuit for b[1:0], and the tool had to replicate these two flip-flops.

Related

Verilog module generates impossible data when running on FPGA

I’m developing a boolean data logger on a ZYNQ 7000 SoC. The logger takes a boolean input from GPIO and logs the input’s value and the time it takes to flip.
I use a 32-bit register as a log entry. The MSB is the boolean value. Bits 30:0 hold an unsigned integer that records the time between the last two flips. The logger should work like the following picture.
Here's my implementation of the logger in Verilog. To read the logged data from the processor, I use an AXI slave interface generated by Vivado and inline my logger in the AXI module.
module BoolLogger_AXI #(
    parameter BufferDepth = 512
)(
    input wire data_in, // boolean input
    input wire S_AXI_ACLK, // clock
    input wire S_AXI_ARESETN, // reset_n
    // other AXI signals
);
    wire slv_reg_wren; // write enable of AXI interface
    reg[31:0] buff[0:BufferDepth-1];
    reg[15:0] idx;
    reg[31:0] count;
    reg last_data;
    always @(posedge S_AXI_ACLK) begin
        if((!S_AXI_ARESETN) || slv_reg_wren) begin
            idx <= 0;
            count <= 1;
            last_data <= data_in;
        end else begin
            if(last_data!=data_in) begin // add an entry only when input flips
                last_data <= data_in;
                if(idx < BufferDepth) begin // stop logging if buffer is full
                    buff[idx] <= count | (data_in << 31);
                    idx <= idx + 1;
                end
                count <= 1;
            end else begin
                count <= count + 1;
            end
        end
    end
    //other AXI stuff
endmodule
In the AXI module, the 512*32bit logged data is mapped to addresses from 0x43c20000 to 0x43c20800.
In the Verilog code, the logger adds a new entry only when the boolean input flips. In simulation, the module works as expected. But on the FPGA, the logged data is sometimes invalid: two successive entries have the same MSB, which means an entry was added even though the boolean input stayed the same.
The invalid data appears from time to time. I've tried reading from the address programmatically (*(u32*)(0x43c20000+4*idx)), and the invalid data is still there. I watch idx in an ILA module and idx is 512, which means logging has finished by the time I read the data.
The FPGA clock is 10 MHz. The input signal is 10 Hz. So the typical period is 10e6/10/2=0x7A120, which most of the data is close to, except the invalid data.
I think if the Verilog code is implemented well, there should be no such invalid data. What may be the problem? Is this an issue about timing?
First off, are you absolutely sure you are not issuing an accidental write on the AXI bus, resetting the registers?
If so, have you tried inserting a so-called double-flop on data_in (two flip-flops, delaying the signal two clock ticks)? I suppose that your data_in is not synchronous to the FPGA clock, which will lead to metastability and you having bad days if not accounted for. Have a look here for information by NANDLAND.
Citing the linked source:
If you have ever tried to sample some input to your FPGA, such as a button press, or if you have had to cross clock domains, you have had to deal with Metastability. A metastable state is one in which the output of a Flip-Flop inside of your FPGA is unknown, or non-deterministic. When a metastable condition occurs, there is no way to tell if the output of your Flip-Flop is going to be a 1 or a 0. A metastable condition occurs when setup or hold times are violated.
Metastability is bad. It can cause your FPGA to exhibit very strange behavior.
In that source there is also a link to a white paper from Altera about the topic of metastability, linked here for reference.
Citing from that paper:
When a metastable signal does not resolve in the allotted time, a logic failure can result if the destination logic observes inconsistent logic states, that is, different destination registers capture different values for the metastable signal.
and
To minimize the failures due to metastability in asynchronous signal transfers, circuit designers typically use a sequence of registers (a synchronization register chain or synchronizer) in the destination clock domain to resynchronize the signal to the new clock domain. These registers allow additional time for a potentially metastable signal to resolve to a known value before the signal is used in the rest of the design.
Basically, having the asynchronous signal routed to two flip-flops might, for example, lead to one FF reading a 1 and the other FF reading a 0. This in turn could lead to the data point being saved but the counter not being reset (hence doubling the measured time), with the bit being saved as 0.
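As a minimal sketch of the double-flop idea, assuming data_in is asynchronous to the FPGA clock: the module name sync_2ff and its port names are hypothetical and not part of the original design.

```verilog
// Hypothetical two-flop synchronizer (names are illustrative only).
// The first register may go metastable; the second gives it one full
// clock period to settle before the rest of the design sees the value.
module sync_2ff (
    input  wire clk,       // destination clock domain
    input  wire async_in,  // signal that is asynchronous to clk
    output wire sync_out   // async_in delayed by two clock ticks
);
    reg meta;    // first stage, may be metastable
    reg stable;  // second stage, safe to use in clk domain

    always @(posedge clk) begin
        meta   <= async_in;
        stable <= meta;
    end

    assign sync_out = stable;
endmodule
```

In the logger, every use of data_in inside the always block (the comparison, the update of last_data and the value written to buff) would then read the synchronized output instead of the raw pin, so that all of them see the same value in a given cycle.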
Finally, it seems to me, that you are using the Vivado-generated example AXI core. Dan Gisselquist simply calls it "broken". This might not be the problem here, but you might want to have a look at his posts and his AXI core design.

Fast array inner access in verilog

I have some lines of code below:
wire [WIDTH_PIXEL-1:0] x_vector [0:36];
wire [6-1:0] x_sample [0:511]; // 0 <= x_sample <= 36
reg [WIDTH_PIXEL-1:0] rx_512 [0:511];
genvar p;
generate
    for(p=0;p<=511;p=p+1) begin: PPP
        always @(posedge clk) begin
            if(x_sample[p] == counter2) begin
                rx_512[p] <= x_vector[x_sample[p]];
            end
        end
    end
endgenerate
I want to save 512 x_vector elements whose address is the value of x_sample[p]. The problem is that when I synthesize in Quartus, the total LC combinationals exceeds 50000. I know the problem lies in the line
rx_512[p] <= x_vector[x_sample[p]];
So is there any way to improve the memory access? Thank you.
Keep in mind that Verilog is meant as a hardware description language.
This means you have to learn to write two different types of code:
Code that gets converted to hardware
Test bench code
For the former there are a lot more restrictions. As you correctly noticed, you get 512 comparators, each comparing 6 bits, plus each conditionally selecting one of 37 WIDTH_PIXEL-bit values and assigning it to one of 512 WIDTH_PIXEL-bit destinations. My guess is easily a million gates.
You have to use a divide and conquer approach. As Qiu says, make the code sequential: one operation per clock cycle. It will take more clock cycles but a lot less logic. Unfortunately, you might find out that you do not have enough time to e.g. process a whole image in that (frame?) time. In that case, choose to do two or four operations per cycle.
You have to continuously weigh speed against number of gates and power. You may find out that you can't do the operations at all with the chosen hardware. (Nobody said writing Verilog was easy!)
I don't know if it helps but you can make the compiler/optimizer's life a bit easier if you use:
rx_512[p] <= x_vector[counter2];
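A rough sketch of why that substitution helps: inside the if, x_sample[p] equals counter2, so all 512 registers can share a single read of x_vector[counter2] instead of each needing its own 37:1 selection. The module name sample_capture, the packed ("flat") ports and the default WIDTH_PIXEL of 8 are assumptions made only so the example is self-contained; it also assumes counter2 sweeps 0..36.

```verilog
// Hypothetical wrapper; arrays are packed into flat vectors so that
// they can cross the module boundary in plain Verilog-2001.
module sample_capture #(
    parameter WIDTH_PIXEL = 8                        // assumed default
)(
    input  wire                       clk,
    input  wire [5:0]                 counter2,      // assumed to sweep 0..36
    input  wire [37*WIDTH_PIXEL-1:0]  x_vector_flat, // 37 pixels, packed
    input  wire [512*6-1:0]           x_sample_flat, // 512 indices, packed
    output reg  [512*WIDTH_PIXEL-1:0] rx_512_flat    // 512 captured pixels
);
    // One shared 37:1 selection, used by every destination register.
    wire [WIDTH_PIXEL-1:0] x_current =
        x_vector_flat[counter2*WIDTH_PIXEL +: WIDTH_PIXEL];

    genvar p;
    generate
        for (p = 0; p <= 511; p = p + 1) begin : PPP
            always @(posedge clk) begin
                // Each register keeps only a 6-bit compare and an enable.
                if (x_sample_flat[p*6 +: 6] == counter2)
                    rx_512_flat[p*WIDTH_PIXEL +: WIDTH_PIXEL] <= x_current;
            end
        end
    endgenerate
endmodule
```

The 512 comparators remain, but the wide data selection is built once rather than 512 times. How much logic this actually saves depends on the tool and target device.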

How to create a pos-edge Write pulse into a neg-edge pulse?

I have a data source signal that transitions high on the positive edge of its clock when it has data ready to be written.
I also have a RAM (running from the same clock) that expects its write request signal to transition on the negative edge of the clock (and stay high until the following negative edge of the clock).
If I try driving the memory's wr_req directly from the data source, then both the clock and wr_req transition at the same time and the memory doesn't get the data.
How can I delay the write pulse such that it goes high (for one cycle) starting on the next negative edge of the clock?
If I understand correctly, this should do what you want:
reg blah;
always @(negedge clk) begin
    blah <= !foo;
end
Or even:
reg blah;
always @* begin
    if (!clk) begin
        blah = !foo;
    end
end

Flip flop with load/set, reset, clk, and input

I'm not looking for a hardware description language model of the flip flop, but for the logic-gate-level implementation.
In Verilog, the equivalent I'm looking for is:
always @(posedge clk or negedge reset) begin
    if(~reset)
        Q <= 1'b0;
    else if(~load)
        Q <= D;
end
I've looked at: http://reviseomatic.org/help/e-flip-flop/4013%20D-Type%20Flip%20Flop.php
and
http://www.csee.umbc.edu/~squire/images/dff.jpg
The problem with the above implementations is that after I set a value on Q (D=0, Q=0, load=0) with load ("set" in the picture) = 0, when I then set load high (load=1) on the next clk cycle, I get (D=x, Q=1, load=1). In other words, changing load from true to false changes the value of Q, but I want Q to hold its previous value.
What is a flip flop that would hold its value on Q after it has been set and enable is set high?
You should try looking up a mux flop.
It has a mux in front of the standard D-type and feeds the flop's output back to its input when load is not selected.
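A minimal structural sketch of such a mux flop, matching the active-low reset and active-low load of the Verilog in the question. The module name mux_flop and the internal wire d_mux are hypothetical names chosen for illustration.

```verilog
// Hypothetical mux flop: a 2:1 mux in front of a plain D flip-flop.
module mux_flop (
    input  wire clk,
    input  wire reset,  // asynchronous, active low
    input  wire load,   // synchronous load enable, active low
    input  wire D,
    output reg  Q
);
    // The mux: take D when load is low, otherwise recirculate Q.
    wire d_mux = load ? Q : D;

    // The plain D-type with async reset; it captures d_mux every clock.
    always @(posedge clk or negedge reset) begin
        if (~reset)
            Q <= 1'b0;
        else
            Q <= d_mux;
    end
endmodule
```

At gate level the mux is just (D AND NOT load) OR (Q AND load) feeding the D input of whichever flop implementation is chosen, which is why Q holds its value whenever load is high.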
Your problem is that 'synchronous load enable' is not the same as 'asynchronous set'. Your Verilog code shows a F/F with an async reset, and a synchronous load enable. Your first (reviseomatic) reference is just nonsense - ignore it. It attempts (wrongly) to describe a 4013, which doesn't have a load enable. I haven't looked at the second reference in detail, but it looks like a conventional latch-based implementation of a F/F with async active-low set and reset.
You can implement flops in several ways:
For a CMOS transmission-gate flop implementation, see the NXP datasheet for a 4013
For latch-based TTL, see the datasheet for a 7474
The old TI databooks used to show flop implementations using async feedback circuits.
For the synchronous load control part, look at Morgan's mux link.

what is the best way to exchange 2 registers in Verilog

I know several ways to exchange 2 registers:
using 3 XORs, using a temporary register, using a multiplexer, etc.
How can we make a conditional exchange? It should take as little code as possible and work as fast as possible.
Triple-XOR is a software trick used to exchange register values on a sequential machine where a direct register exchange instruction (e.g. x86 XCHG) is not available. The XOR instructions cannot be executed concurrently, as each depends on the previous output, so it takes three instruction cycles.
With Verilog you are describing hardware, so you exchange two registers' values in a single cycle by assignment. This will infer a load path for each register from the other's output.
if (swap) begin
    a <= b;
    b <= a;
end
You mention multiplexers. They are only needed if there is another load path, in which case multiplexers and control logic will be instantiated as required. For example, a swap/increment device would have multiplexers, since a can be loaded with either b or a+1.
if (swap) begin
    a <= b;
    b <= a;
end else begin
    a <= a+1;
end
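For context, the conditional exchange above could sit in a complete clocked module along these lines. This is only a sketch: the module name cond_swap, the WIDTH parameter and the load/a_in/b_in ports are assumptions added so the registers can be given initial values.

```verilog
// Hypothetical wrapper around the conditional swap.
module cond_swap #(
    parameter WIDTH = 8                 // assumed register width
)(
    input  wire             clk,
    input  wire             load,       // load new values into a and b
    input  wire             swap,       // exchange a and b
    input  wire [WIDTH-1:0] a_in,
    input  wire [WIDTH-1:0] b_in,
    output reg  [WIDTH-1:0] a,
    output reg  [WIDTH-1:0] b
);
    always @(posedge clk) begin
        if (load) begin
            a <= a_in;
            b <= b_in;
        end else if (swap) begin
            // Non-blocking assignments: both right-hand sides are sampled
            // before either register updates, so this is a true exchange.
            a <= b;
            b <= a;
        end
    end
endmodule
```

Because there are now two load paths per register (the external input and the other register), this version does infer a 2:1 multiplexer in front of each register, which is consistent with the multiplexer remark above.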
The simple way is probably best in Verilog - just assign them to each other using non-blocking assignments
a <= b;
b <= a;
The synthesizer will do the right thing.
a <= b;
b <= a;
That's all you need to do.

Resources