Combinational way of implementing a CAM in verilog

I'm trying to implement a cache and index lookup memory in SystemVerilog. It's a simple CAM + circular buffer. The interface is:
input rst_n;
input clk;
input [WORD_BITS-1:0] inp;
input rd_en;
input wr_en;
output logic [DEPTH_BITS-1:0] index;
output logic index_valid;
reg [WORD_BITS-1:0] buffer[2**DEPTH_BITS]; // 2**N is an integer constant; $pow returns a real
reg [DEPTH_BITS-1:0] next;
There's basic async reset code. There's a synchronous block that stores inp in buffer and advances next whenever wr_en is high.
Now I'm trying to come up with an efficient and readable way of finding the index of inp when rd_en is high. It seems this could be completely combinational, except for clocking the result into the index output. The way I visualize it is to XOR inp with every buffer location (the buffer will be fairly small, perhaps 64 entries); any location whose XOR result is 0 is a match. Then a block arbitrarily chooses one of the matching indices. This is where it differs from a traditional CAM: there could be multiple entries with the same value, but I only need the index of one of them, and it doesn't matter which.
Any thoughts on how to do this in SystemVerilog (2012)? I know I can loop through all the memory locations synchronously and save a lot of area, but I'd rather it be fast than small. I'm targeting FPGAs (initially inexpensive Lattice and maybe Xilinx parts). I know a few of the Lattice parts have CAM blocks, but this is for cases where one isn't available.

Following up on a suggestion from another forum, the following seems to work well.
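always_comb begin
  index = '0;           // default value, so no latch is inferred on index
  index_valid = 1'b0;
  for (int i = 0; i < 64; i = i + 1) begin
    if (rd_en && inp == buffer[i]) begin
      index = i;        // if several entries match, the highest index wins
      index_valid = 1'b1;
    end
  end
end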
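For context, here is a minimal sketch of how that lookup might sit inside a complete module together with the circular-buffer write and the registered outputs described in the question. The module name cam_lookup, the parameter defaults, and the per-entry valid bits are assumptions added for illustration; they are not part of the original post.

```systemverilog
// Sketch only. The module name, parameter defaults, and the valid bits are
// assumptions for illustration; port names follow the question.
module cam_lookup #(
  parameter int WORD_BITS  = 8,
  parameter int DEPTH_BITS = 6          // 2**6 = 64 entries
) (
  input  logic                  clk,
  input  logic                  rst_n,
  input  logic [WORD_BITS-1:0]  inp,
  input  logic                  rd_en,
  input  logic                  wr_en,
  output logic [DEPTH_BITS-1:0] index,
  output logic                  index_valid
);
  localparam int DEPTH = 2**DEPTH_BITS;

  logic [WORD_BITS-1:0]  buffer [DEPTH];
  logic [DEPTH-1:0]      valid;         // assumed: set once an entry is written
  logic [DEPTH_BITS-1:0] next;
  logic [DEPTH_BITS-1:0] match_index;
  logic                  match_found;

  // All comparisons are made in parallel; the highest matching index wins.
  always_comb begin
    match_index = '0;
    match_found = 1'b0;
    for (int i = 0; i < DEPTH; i++) begin
      if (valid[i] && (inp == buffer[i])) begin
        match_index = i[DEPTH_BITS-1:0];
        match_found = 1'b1;
      end
    end
  end

  // Storage has no reset; the valid bits mask stale contents instead.
  always_ff @(posedge clk) begin
    if (wr_en) buffer[next] <= inp;
  end

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      next        <= '0;
      valid       <= '0;
      index       <= '0;
      index_valid <= 1'b0;
    end else begin
      if (wr_en) begin
        valid[next] <= 1'b1;
        next        <= next + 1'b1;     // wraps naturally: circular buffer
      end
      index_valid <= rd_en && match_found;
      if (rd_en) index <= match_index;
    end
  end
endmodule
```

The valid bits keep uninitialized locations from matching before they are written. If that case cannot occur in your design, they can be dropped.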

Related

Filling register vector from FIFO with generated shifts

I'm a bit of a neophyte with Verilog, and I have just started working on a project, and I'm trying to verify that the code I have started with is workable. The code snippet below is unloading a FIFO into a vector of 8 bit registers. At each clock cycle it unloads a byte from the FIFO and puts it in the end of the register chain, shifting all the other bytes down the chain.
reg [ 7:0] mac_rx_regs [0 : 1361];
generate for (ii=0; ii<1361; ii=ii+1)
begin: mac_rx_regs_inst
always @(posedge rx_clk_int, posedge tx_reset)
if (tx_reset) begin
mac_rx_regs[ii] <= 8'b0;
mac_rx_regs[1361] <= 8'b0;
end else begin
if (rx_data_valid_r) begin
mac_rx_regs[ii] <= mac_rx_regs[ii+1];
mac_rx_regs[1361] <= rx_data_r;
end
end
end
endgenerate
I'd like to know if this is a good way to do this. I would have expected to just address the register vector with the byte count from reading the FIFO. I'm concerned that this isn't deterministic in that the order that the generated always blocks run is not specified, plus it seems that it'll cause a lot of unnecessary logic to be created for moving data from one register to another.

To start with, you don't really need to worry about the number of always statements in general. If they all use the same clock and reset and use nonblocking assignments, as yours do, every right-hand side is sampled before any register is updated, so the result does not depend on the order in which the blocks run.
The one thing I do, which is more about style than anything else, is add a #FD delay to my flop assignments, as shown below, to make simulation waveforms look a little better, IMHO.
Also, this is simple enough that you could code this as a single process.
parameter FD = 1;
reg [1361*8-1:0] mac_rx_regs; // Arrays are good if you are trying to
// infer memory, but if you are okay
// with registers, just declare a vector.
always @ (posedge clk or posedge reset)
begin
if (reset)
mac_rx_regs <= #FD 1361*8'h0;
else
// This next statement shifts in a new 8 bits when rx_data_valid_r is asserted.
// It will assign mac_rx_regs to mac_rx_regs (nop) when deasserted.
mac_rx_regs <= #FD rx_data_valid_r ? {mac_rx_regs[1361*8-9:0],rx_data_r} :
mac_rx_regs;
end
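The question also mentions the alternative of addressing the storage with a byte count instead of shifting every register. A rough sketch of that idea follows. The module name rx_store, the byte_count pointer, and the rd_addr/rd_data read port are all hypothetical additions; only the clock, reset, valid, and data names come from the question.

```systemverilog
// Sketch only: rx_store, byte_count, rd_addr and rd_data are hypothetical names.
module rx_store (
  input  wire        rx_clk_int,
  input  wire        tx_reset,
  input  wire        rx_data_valid_r,
  input  wire [7:0]  rx_data_r,
  input  wire [10:0] rd_addr,          // valid addresses are 0..1361
  output reg  [7:0]  rd_data
);
  reg [7:0]  mac_rx_regs [0:1361];
  reg [10:0] byte_count;               // 11 bits cover 0..1361

  // Write pointer: advances once per received byte and wraps at the end.
  always @(posedge rx_clk_int or posedge tx_reset) begin
    if (tx_reset)
      byte_count <= 11'd0;
    else if (rx_data_valid_r)
      byte_count <= (byte_count == 11'd1361) ? 11'd0 : byte_count + 11'd1;
  end

  // The array itself is not reset, which leaves the tool free to use block RAM.
  always @(posedge rx_clk_int) begin
    if (rx_data_valid_r)
      mac_rx_regs[byte_count] <= rx_data_r;
    rd_data <= mac_rx_regs[rd_addr];
  end
endmodule
```

The trade-off is that only one location can be read per clock, whereas the shift-register version exposes every byte at once.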

How to prevent ISE compiler from optimizing away my array?

I'm new to Verilog, ISE, and FPGAs. I'm trying to implement a simple design on an FPGA, but the entire design is being optimized away. It is basically a 2D array with some arbitrary values. Here is the code:
module top(
output reg out
);
integer i;
integer j;
reg [5:0] array [0:99][0:31];
initial begin
for(i=0;i<100;i=i+1) begin
for(j=0;j<32;j=j+1) begin
array[i][j] = j;
out = array[i][j];
end
end
end
endmodule
It passes XST Synthesis fine, but it fails MAP in the Implementation process. Two Errors are given:
ERROR:Map:116 - The design is empty. No processing will be done.
ERROR:Map:52 - Problem encountered processing RPMs.
The entire code is being optimized away in XST. Why? What am I doing wrong?

The reason your design is being synthesized away is that you have not described any logic in your module.
The only block in your design is an initial block, which is typically not used in synthesis except in limited cases; the construct is mainly used for testbenches in simulation (running the Verilog through ModelSim or another simulator).
What you want is to use always blocks or assign statements to describe logic for XST to synthesize into a netlist for the FPGA to emulate. As the module you provided has neither of these constructs, no netlist can be generated, thus nothing synthesized!
In your case, it is not entirely clear what logic you want to describe as the result of your module will always have out equal to 31. If you want out to cycle through the values 0 to 31, you'll need to add some sequential logic to implement that. Search around the net for some tutorials on digital design so you have the fundamentals down (combinational logic, gates, registers, etc). Then, think about what you want the design to do and map it to those components. Then, write the Verilog that describes that design.
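As a small illustration of the sequential logic mentioned above, the sketch below cycles out through 0 to 31, one value per clock. The module name top_cycle and the clk input are assumptions; the original module has no clock port at all.

```systemverilog
// Sketch only: top_cycle and clk are hypothetical; the original top has no clock.
module top_cycle (
  input  wire       clk,
  output reg  [4:0] out = 5'd0   // 5 bits hold 0..31; initial value for FPGA power-up
);
  // A 5-bit register wraps from 31 back to 0 on its own.
  always @(posedge clk)
    out <= out + 5'd1;
endmodule
```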
EDIT IN LIGHT OF COMMENTS:
The reason you get no LUT/FF usage in the report is that the FPGA doesn't need any of those resources to implement your module. Since the last value assigned to out is 31 and out is only a 1-bit reg, it always holds the value 1, so the FPGA only needs to tie out to Vdd. The other array values are never used or accessed, so the synthesizer removed them (i.e., no output needs to know the value of array[0][1], as out is a constant and no other ports exist in the design). To preserve the array, you need only use it to drive some output. Here's a basic example:
module top( input [6:0] i_in, // Used to index the array like i
input [4:0] j_in, // Used to index the array like j
output reg [5:0] out // Note, out is now big enough to store all the bits in array
);
integer i;
integer j;
reg [5:0] array[0:99][0:31];
always @(*) begin
// Set up the array, not necessarily optimal, but it works
for (i = 0; i < 100; i = i + 1) begin
for (j = 0; j < 32; j = j + 1) begin
array[i][j] = j;
end
end
// Assign the output to value in the array at position i_in, j_in
out = array[i_in][j_in];
end
endmodule
If you connect the inputs i_in and j_in to switches or something and out to 6 LEDs, you should be able to index the array with the switches and get the output on the LEDs to confirm your design.

Why is adding one operation causing my number of logic elements to skyrocket?

I'm designing a 464 order FIR filter in Verilog for use on the Altera DE0 FPGA. I've got (what I believe to be) a working implementation; however, there's one small issue that's really actually given me quite a headache. The basic operation works like this: A 10 bit number is sent from a micro controller and stored in datastore. The FPGA then filters the data, and lights LED1 if the data is near 100, and off if it's near 50. LED2 is on when the data is neither 100 nor 50, or the filter hasn't filled the buffer yet.
In the specification, the coefficients (which have been pre provided), have been multiplied by 2^15 in order to represent them as integers. Therefore, I need to divide my final output Y by 2^15. I have implemented this using a shift, since it should be (?) the most efficient way. However, this single line causes my number of logic elements to jump from ~11,000 without it, to over 35,000. The Altera DE0 uses a Cyclone III FPGA which only has room for about 15k logic elements. I've tried doing it inside both combinational and sequential logic blocks, both of which have the same exact issue.
Why is this single, seemingly simple operation causing such an inflation in logic elements? I'll include my code, which I'm sure is neither the most efficient nor the cleanest. I don't care about optimizing this design for performance or area/density at all. I just want to be able to fit it onto the FPGA so it'll run. I'm not very experienced in HDL design, and this is by far the most complex project I've needed to tackle. It's worth noting that I do not remove y completely; I replace the "bad" line with assign YY = y;.
Just as a note: I haven't included all of the coefficients, for sanity's sake. I know there might be a better way to do it than using case statements, but it's the way that it came and I don't really want to relocate 464 elements to a parameter declaration, etc.
module lab5 (LED1, LED2, handshake, reset, data_clock, datastore, bit_out, clk);
// NUMBER OF COEFFICIENTS (465)
// (Change this to a small value for initial testing and debugging,
// otherwise it will take ~4 minutes to load your program on the FPGA.)
parameter NUMCOEFFICIENTS = 465;
// DEFINE ALL REGISTERS AND WIRES HERE
reg [11:0] coeffIndex; // Coefficient index of FIR filter
reg signed [16:0] coefficient; // Coefficient of FIR filter for index coeffIndex
reg signed [16:0] out; // Register used for coefficient calculation
reg signed [31:0] y;
wire signed [7:0] YY;
reg [9:0] xn [0:464]; // Integer array for holding x
integer i;
output reg LED1, LED2;
// Added values from part 1
input reset, handshake, clk, data_clock, bit_out;
output reg [9:0] datastore;
integer k;
reg sent;
initial
begin
sent = 0;
i=0;
datastore = 10'b0000000000;
y=0;
LED1 = 0;
LED2 = 0;
for (i=0; i<NUMCOEFFICIENTS; i=i+1)
begin
xn[i] = 0;
end
end
always @(posedge data_clock)
begin
if(handshake)
begin
if(bit_out)
begin
datastore = datastore >> 1;
datastore [9] = 1;
end
else
begin
datastore = datastore >> 1;
datastore [9] = 0;
end
end
end
always @(negedge clk)
begin
if (!handshake )
begin
if(!sent)
begin
y=0;
for (i=NUMCOEFFICIENTS-1; i > 0; i=i-1) //shifts coeffecients
begin
xn[i] = xn[i-1];
end
xn[0] = datastore;
for (i=0; i<NUMCOEFFICIENTS; i=i+1)
begin
// Calculate coefficient based on the coeffIndex value. Note that coeffIndex is a signed value!
// (Note: These don't necessarily have to be blocking statements.)
case ( 464-i )
12'd0: out = 17'd442; // This coefficient should be multiplied with the oldest input value
12'd1: out = -17'd373;
12'd2: out = -17'd169;
...
12'd463: out = -17'd373; //-17'd373
12'd464: out = 17'd442; //17'd442
// This coefficient should be multiplied with the most recent data input
// This should never occur.
default: out = 17'h0000;
endcase
y = y + (out * xn[i]);
end
sent = 1;
end
end
else if (handshake)
begin
sent = 0;
end
end
assign YY = (y>>>15); //THIS IS THE LINE THAT IS CAUSING THE ISSUE!
always @(YY)
begin
LED1 = 0;
LED2 = 1;
if ((YY >= 40) && (YY <= 60))
begin
LED1 <= 0;
LED2 <= 0;
end
if ((YY >= 90) && (YY <= 110))
begin
LED1 <= 1;
LED2 <= 0;
end
end
endmodule

You're almost certainly seeing the effects of synthesis optimisation.
The following line is the only place that uses y:
assign YY = (y>>>15); //THIS IS THE LINE THAT IS CAUSING THE ISSUE!
If you remove this line, all the logic that feeds into y (including out and xn) will be removed. On Altera you want to look carefully through your map report which will contain (buried amongst a million other things) information about all the logic that Quartus has removed and the reason behind it.
Good places to start are the Port Connectivity Checks, which will tell you if any inputs or outputs are stuck high or low or are dangling. Then look through the Registers Removed During Synthesis section and Removed Registers Triggering Further Register Optimizations.
You can try to force Quartus not to remove redundant logic by using the following in your QSF:
set_instance_assignment -name preserve_fanout_free_node on -to reg
set_instance_assignment -name preserve_register on -to foo
In your case however it sounds like the correct solution is to re-factor the code rather than try to preserve redundant logic. I suspect you want to investigate using an embedded RAM to store the coefficients.

(In addition to Chiggs' answer, assuming that you are hooking up YY correctly ....)
I would add that, you don't need >>>. It would be simpler to write :
assign YY = y[22:15];
And BTW, initial blocks are ignored for synthesis. So you want to move that initialization into the respective always blocks, in an if (reset) or if (handshake) section.
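As a rough illustration of moving initialization into a reset branch, the sketch below rewrites only the datastore shift register from the question. The module name datastore_shift is hypothetical, and treating reset as an active-high asynchronous reset is an assumption, since the original code never uses its reset input.

```systemverilog
// Sketch only: datastore_shift is a hypothetical name, and the reset polarity
// (active-high, asynchronous) is assumed rather than taken from the question.
module datastore_shift (
  input  wire       data_clock,
  input  wire       reset,
  input  wire       handshake,
  input  wire       bit_out,
  output reg  [9:0] datastore
);
  always @(posedge data_clock or posedge reset) begin
    if (reset)
      datastore <= 10'b0;                       // replaces the initial block
    else if (handshake)
      datastore <= {bit_out, datastore[9:1]};   // shift right, new bit enters at bit 9
  end
endmodule
```

The concatenation does the same job as the original shift followed by the write to datastore[9], in a single nonblocking assignment.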

How to generate delay in verilog for synthesis?

I want to write Verilog code to interface with a 16x2 LCD. To send a command or data to the LCD, we have to give its Enable pin a high-to-low pulse, that is:
E=1;
Delay(); // must be a 450 ns wide delay
E=0;
This is where I get confused: in Verilog for synthesis, # delays are not allowed, so how can I create the delay here? My code is attached below. Note that I tried to create a delay in my code, but I don't think it works, so please help me solve this delay problem.
///////////////////////////////////////////////////////////////////////////////////
////////////////////LCD Interfacing with Xilinx FPGA///////////////////////////////
////////////////////Important code for 16*2/1 LCDs/////////////////////////////////
//////////////////Coder-Shrikant Vaishnav(M.Tech VLSI)/////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
module lcd_fpgashri(output reg [7:0]data,output reg enb,output reg rs,output reg rw ,input CLK);
reg [15:0]hold;
reg [13:0]count=0;
//Code Starts from here like C's Main......
always @(posedge CLK)
begin
count=count+1; //For Delay
//For LCD Initialization
lcd_cmd(8'b00111000);
lcd_cmd(8'b00000001);
lcd_cmd(8'b00000110);
lcd_cmd(8'b00001100);
//This is a String "SHRI" that I want to display
lcd_data(8'b01010011);//S
lcd_data(8'b01001000);//H
lcd_data(8'b01010010);//R
lcd_data(8'b01001001);//I
end
//Task For Command
task lcd_cmd(input reg [7:0]value);
begin
data=value;
rs=1'b0;
rw=1'b0;
enb=1'b1; //sending high to low pulse
hold=count[13]; //This is the place where I try to design delay
enb=1'b0;
end
endtask
//Task for Data
task lcd_data(input reg [7:0]value1);
begin
data=value1;
rs=1'b1;
rw=1'b0;
enb=1'b1; //sending high to low pulse
hold=count[13]; //This is the place where I try to design delay
enb=1'b0;
end
endtask
endmodule

Based on your code, you seem to be stuck in a software programming mindset. You're going to have to change things around quite a bit if you want to actually describe a controller in HDL.
Unfortunately for you there is no way to just insert an arbitrary delay into a 'routine' like you have written there.
When you write a software program, it is perfectly reasonable to write a program like
doA();
doB();
doC();
Where each line executes one at a time in a sequential fashion. HDL does not work in this way. You need to not think in terms of tasks, and start thinking in terms of clocks and state machines.
Remember that when you have an always block, the entire block is evaluated on every clock edge. When you have a statement like this in an always block:
lcd_cmd(8'b00111000);
lcd_cmd(8'b00000001);
lcd_cmd(8'b00000110);
lcd_cmd(8'b00001100);
This does you no good, because all four of these calls are evaluated on the same positive clock edge with no time passing between them, so only the values from the last call survive; they do not run as a sequence spread over time. What you need is a state machine that advances and performs one action per clock period.
If I were to try to replicate those four lcd_cmd's in a sequential manner, it might look something like this.
always @(posedge clk) begin
  case (state_f)
    `RESET: begin
      state_f <= `INIT_STEP_1;
      data <= 8'b00111000;
    end
    `INIT_STEP_1: begin
      state_f <= `INIT_STEP_2;
      data <= 8'b00000001;
    end
    `INIT_STEP_2: begin
      state_f <= `INIT_STEP_3;
      data <= 8'b00000110;
    end
    `INIT_STEP_3: begin
      state_f <= `INIT_STEP_4;
      data <= 8'b00111000;
    end
    `INIT_STEP_4: begin
      state_f <= ???; //go to some new state
      data <= 8'b00000110;
    end
  endcase
end
Now with this code you are advancing through four states in four clock cycles, so you can start to see how you might handle writing a sequence of events that advances on each clock cycle.
This answer doesn't get you all of the way, as there is no 'delay' in between these as you wanted. But you could imagine having a state machine where after setting the data you move into a DELAY state, where you could set a counter which counts down enough clock cycles you need to meet your timing requirements before moving into the next state.
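The DELAY idea described above could be sketched roughly as follows: a small block that raises enb, counts clock cycles, and then drops it. The module name lcd_enable_pulse, the start and busy signals, and the DELAY_CYCLES parameter are all hypothetical, and the default value assumes a 50 MHz clock, which the question does not state.

```systemverilog
// Sketch only: every name here except enb is hypothetical.
// Assumes a 50 MHz clock (20 ns period): 23 cycles * 20 ns = 460 ns >= 450 ns.
module lcd_enable_pulse #(
  parameter DELAY_CYCLES = 23        // must be between 1 and 256 for the 8-bit counter
) (
  input  wire clk,
  input  wire start,                 // one-cycle request to emit a pulse
  output reg  enb  = 1'b0,           // LCD enable pin
  output reg  busy = 1'b0            // high while the pulse is in progress
);
  reg [7:0] delay_cnt = 8'd0;

  always @(posedge clk) begin
    if (!busy) begin
      if (start) begin
        enb       <= 1'b1;           // rising edge of the pulse
        busy      <= 1'b1;
        delay_cnt <= DELAY_CYCLES - 1;
      end
    end else if (delay_cnt != 8'd0) begin
      delay_cnt <= delay_cnt - 8'd1; // wait state: count down
    end else begin
      enb  <= 1'b0;                  // falling edge after DELAY_CYCLES clocks
      busy <= 1'b0;
    end
  end
endmodule
```

A surrounding state machine would set data and rs, assert start, and wait for busy to fall before moving to the next command.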

The best way to introduce delay is to use a counter as Tim has mentioned.
Find out how many clock cycles you need to wait to obtain the required delay (here 450 ns), given your clock period.
Let the calculated number of clock cycles be count. The following piece of code then gives you the required delay, though you may need to modify the logic for your purpose.
always @ (posedge clk) begin
  if (N == count) begin
    N <= 0;
    E <= ~E;   // nonblocking, consistent with the other register updates
  end else begin
    N <= N + 1;
  end
end
Be sure to initialize N and E to zero.

Check the clock frequency of your FPGA board and size a counter accordingly. For example, if you want a delay of 1 second on an FPGA board with a 50 MHz clock, you will have to write a counter that counts from 0 to 49999999. Use terminalCLK as the clk for your design; the slowed-down clock puts a delay into your design. The pseudocode for that is:
module counter(count, terminalCLK, clk);
parameter n = 26, N = 50000000;
input clk;
output reg [n-1:0] count = 0;
output reg terminalCLK = 0;
always @(posedge clk)
begin
  if (count == N-1) begin
    count <= 0;                    // wrap after N cycles
    terminalCLK <= ~terminalCLK;   // second toggle, at the end of the period
  end else begin
    count <= count + 1;
    if (count == N/2-1)
      terminalCLK <= ~terminalCLK; // first toggle, halfway through the period
  end
end
endmodule

24 bit counter state machine

I am trying to create a counter in verilog which counts how many clock cycles there have been and after ten million it will reset and start again.
I have created a twenty four bit adder module along with another module containing twenty four D Flip flops to store the count of the cycles outputted from the adder.
I then want to have a state machine which is in the count state until ten million cycles have passed then it goes to a reset state.
Does this sound right? The problem is I am not sure how to implement the state machine.
Can anyone point me to a website/book which could help me with this?
thanks

As Paul S already mentioned, there is no need for a state machine if you want your counter to keep counting after an overflow. You can do something like this (untested, might contain typos):
module overflow_counter (
clk,
reset,
enable,
ctr_out
);
// Port definitions
input clk, reset, enable;
output [23:0] ctr_out;
// Register definitions
reg [23:0] reg_ctr;
// Assignments
assign ctr_out = reg_ctr;
// Counter behaviour - Asynchronous active-high reset
initial reg_ctr <= 0;
always @ (posedge clk or posedge reset)
begin
if (reset) reg_ctr <= 0;
else if (enable)
begin
if (reg_ctr == 10000000) reg_ctr <= 0;
else reg_ctr <= reg_ctr + 1;
end
end
endmodule
Of course, normally you'd use parameters so you don't have to make a custom module every time you want an overflowing counter. I'll leave that to you ;).
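A parameterized variant along those lines might look like the sketch below. The module name wrap_counter and the parameter names WIDTH and MAX_COUNT are hypothetical; the behaviour otherwise mirrors the counter above, including wrapping after the count reaches MAX_COUNT.

```systemverilog
// Sketch only: wrap_counter, WIDTH and MAX_COUNT are hypothetical names.
module wrap_counter #(
  parameter WIDTH     = 24,
  parameter MAX_COUNT = 10000000     // must fit in WIDTH bits
) (
  input  wire             clk,
  input  wire             reset,     // asynchronous, active high
  input  wire             enable,
  output reg  [WIDTH-1:0] ctr_out
);
  always @(posedge clk or posedge reset) begin
    if (reset)
      ctr_out <= {WIDTH{1'b0}};
    else if (enable) begin
      if (ctr_out == MAX_COUNT)
        ctr_out <= {WIDTH{1'b0}};    // wrap back to zero
      else
        ctr_out <= ctr_out + 1'b1;
    end
  end
endmodule
```

An instance such as wrap_counter #(.WIDTH(24), .MAX_COUNT(10000000)) reproduces the fixed version.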
[Edit] And here are some documents to help you with FSM. I just searched Google for "verilog state machine":
EECS150: Finite State Machines in Verilog
Synthesizable Finite State Machine Design Techniques
I haven't read the first paper, so I can't comment on that. The 2nd one shows various styles of coding FSMs, among which the 3 always blocks style, which I highly recommend, because it's a lot easier to debug (state transitions and FSM output are neatly separated). The link seems to be down, so here is the cached Google result.

You don't need a state machine. You already have state in the counter. All you need to do is detect the value you want to wrap at and load 0 into your counter at that point
In pseudo-code:
if count == 10000000 then
nextCount = 0;
else
nextCount = count + 1;
...or...
nextCount = count + 1;
if count == 10000000 then
resetCount = 1;

State machines are not too tricky. Use localparam (with a width, don't forget the width, not shown here because it is just one bit) to define labels for your states. Then create two reg variables (state_reg, state_next). The _reg variable is your actual register. The _next variable is a "wire reg" (a wire that can be assigned to inside a combinational always block). The two things to remember are to do X_next = X_reg; in the combinational always block (and then the rest of the combinational logic) and X_reg <= X_next; in the sequential always block. You can get fancy for special cases but if you stick to these simple rules then things should just work. I try not to use instantiation for very simple things like adders since Verilog has great support for adders.
Since I work with FPGAs, I assign initial values to my registers and I don't use a reset signal. I'm not sure but for ASIC design I think it is the opposite.
localparam STATE_RESET = 1'b0, STATE_COUNT = 1'b1;
reg [23:0] cntr_reg = 24'd0, cntr_next;
reg state_reg = STATE_COUNT, state_next;
always @* begin
cntr_next = cntr_reg; // statement not required since we handle all cases
if (cntr_reg == 24'd10_000_000)
cntr_next = 24'd0;
else
cntr_next = cntr_reg + 24'd1;
state_next = state_reg; // statement required since we don't handle all cases
case (state_reg)
STATE_COUNT: if (cntr_reg == 24'd10_000_000) state_next = STATE_RESET;
endcase
end
always @(posedge clk) begin
cntr_reg <= cntr_next;
state_reg <= state_next;
end
I found this book to be very helpful. There is also a VHDL version of the book, so you can use both side-by-side as a Rosetta Stone to learn VHDL.
