what is the real meaning of #10 verilog testbench? - verilog

I am new in verilog programming. So I was trying to explore the meaning of simple MUX code. In the test bench, It is observed there are multiple " #10 "s.
what is the purpose of this line?
Also please explain the need of defining inputs as "reg" and output simply as "wires"
I have added the snapshot for reference.
Thanks in advance.
Vt

It adds 10 units of time delay before executing the statement.
#always(clock.posedge) begin
#10
c = a + b
end
The above example adds a and b after 10 units of delay from posedge of clock

#10 determines the time delay before the operation.
"reg" and "wire" are data types ,to know the detail difference between different data types refer :
Verilog HDL: A Guide to Digital Design and Synthesis
By Samir Palnitkar

Related

Fast array inner access in verilog

I have some lines of code below:
wire [WIDTH_PIXEL-1:0] x_vector [0:36];
wire [6-1:0] x_sample [0:511]; // 0 <= x_sample <= 36
reg [WIDTH_PIXEL-1:0] rx_512 [0:511];
genvar p;
generate
for(p=0;p<=511;p=p+1) begin: PPP
always#(posedge clk) begin
if(x_sample[p] == counter2) begin
rx_512[p] <= x_vector[x_sample[p]];
end
end
I want to save 512 x_vector elements whose address is the value of x_sample[p]. The problem is when I synthesize on Quartus, the total LC-combinationals over 50000. I know the problem lies on the line
rx_512[p] <= x_vector[x_sample[p]];
So is there any way for improving the access memory? Thank you.
Keep in mind that Verilog is meant as a hardware emulation language.
This makes that you have to learn to write two different types of code:
Code that gets converted to hardware
Test bench code
For the former there are a lot more restrictions. As you correctly noticed you get 512 comparators each comparing 6 bits plus each conditionally selecting one of 37 PIXELWIDTH values and assigning it to one of 512 PIXELWIDTH destinations. My guess is easily a million gates.
You have to use a divide an conquer approach. As Qiu says make the code sequential: One operation per clock cycle. It will take more clock cycles but a lot less logic. Unfortunately you might find out that you do not have enough time to e.g. process a whole image in that (frame?) time. Then choose to do two or four operations per cycle.
You have to continuously weigh speed versus number of gates & power. Maybe you find out that you can't do the operations at all with the chosen hardware. (Nobody said writing Verilog was easy!)
I don't know if it helps but you can make the compiler/optimizer's life a bit easier if you use:
rx_512[p] <= x_vector[counter2];

Snake game using FPGA (ALTERA)

I am planning to make a snake game using the Altera DE2-115 and display it on LED Matrix
Something similar to this in the video http://www.youtube.com/watch?v=niQmNYPiPw0
but still i don't know how to start,any help?
You'll have choose between 2 implementation routes:
Using a soft processor (NIOS II)
Writing the game logic in a hardware description language ([System]Verilog or VHDL)
The former will likely be the fastest route to achieving the completed game assuming you have some software background already, however if your intention is to learn to use FPGAs then option 2 may be more beneficial. If you use a NIOS or other soft processor core you'll still need to create an FPGA image and interface to external peripherals.
To get started you'll want to look at some of the example designs that come with your board. You also need to fully understand the underlying physical interface to the LED Matrix display to determine how to drive it. You then want to start partitioning your design into sensible blocks - a block to drive the display, a block to handle user input, storage of state, the game logic itself etc. Try and break them down sufficiently to decide what the interfaces to communicate between the blocks are going to look like.
In terms of implementation, for option 1 you will probably make use of Altera's SoC development environment called Qsys. Again working from an existing example design as a starting point is probably the easiest way to get up-and-running quickly. With option 2 you need to pick a hardware description language ([System]Verilog or VHDL or MyHDL) and code away in your favourite editor.
When you start writing RTL code you'll need a mechanism to simulate it. I'd suggest writing block-level simulations for each of the blocks you write (ideally writing the tests based on your definitions of the interfaces before writing the RTL). Assuming you're on a budget you don't have many options: Altera bundles a free version of Modelsim with their Quartus tool, there's the open source options of the Icarus simulator if using verilog or GHDL for VHDL.
When each block largely works run it through the synthesis tool to check FPGA resource utilisation and timing closure (the maximum frequency the design can be clocked at). You probably don't care too much about the frequency implementing a game like snake but it's still good practice to be aware of how what you write translates into an FPGA implementation and the impact on timing.
You'll need to create a Quartus project to generate an FPGA bitfile - again working from an existing example design is the way to go. This will provide pin locations and clock input frequencies etc. You may need to write timing constraints to define the timing to your LED Matrix display depending on the interface.
Then all you have to do is figure out why it works in simulation but not on the FPGA ;)
Let's say you have a LED matrix like this:
To answer not your question, but your comment about " if u can at least show me how to make the blinking LED i will be grateful :)", we can do as this:
module blink (input wire clk, /* assuming a 50MHz clock in your trainer */
output wire anode, /* to be connected to RC7 */
output wire cathode); /* to be connected to RB7 */
reg [24:0] freqdiv = 25'h0000000;
always #(posedge clk)
freqdiv <= freqdiv + 1;
assign cathode = 1'b0;
assign anode = freqdiv[24];
endmodule
This will make the top left LED to blink at a rate of 1,4 blinks per second aproximately.
This other example will show a running dot across the matrix, left to right, top to down:
module runningdot (input wire clk, /* assuming a 50MHz clock in your trainer */
output wire [7:0] anodes, /* to be connected to RC0-7 */
output wire [7:0] cathodes); /* to be connected to RB0-7 */
reg [23:0] freqdiv = 24'h0000000;
always #(posedge clk)
freqdiv <= freqdiv + 1;
wire clkled = freqdiv[23];
reg [7:0] r_anodes = 8'b10000000;
reg [7:0] r_cathodes = 8'b01111111;
assign anodes = r_anodes;
assign cathodes = r_cathodes;
always #(posedge clkled) begin
r_anodes <= {r_anodes[0], r_anodes[7:1]}; /* shifts LED left to right */
if (r_anodes == 8'b00000001) /* when the last LED in a row is selected... */
r_cathodes <= {r_cathodes[0], r_cathodes[7:1]}; /* ...go to the next row */
end
endmodule
Your snake game, if using logic and not an embedded processor, is way much complicated than these examples, but it will use the same logic principles to drive the matrix.

Dividing and Modulus in verilog

I'm creating an ALU in verilog just for the purpose of simulation. But I can't figure out how to divide two 16 bit inputs. A regular A=B/C doesn't work (where B,C is a input[15:0] and A is output reg[15:0]). Similarly with A=B%C.
Would I have to separately implement a division circuit module? I understand division is a very complex operation and that would be the actual way to do it but I'm only doing it for simulation. Is there no shorter way to divide two 16bit inputs?
Your code currently does this:
initial begin
out1=in1/in2; out2=in1%in2;
end
This doesn't do anything - the initial block runs through once at the start of simulation, when in1 and in2 are X's, setting out1 and out2 to X, and then stopping. Change your logic to:
always
#* begin
out1=in1/in2;
out2=in1%in2;
end
This executes every time in1 or in2 changes.

Asynchronous FIFO Design

I found the following piece of code in the internet , while searching for good FIFO design. From the linkSVN Code FIFO -Author Clifford E. Cummings . I did some research , I was not able to figure out why there are three pointers in the design ?I can read the code but what am I missing ?
module sync_r2w #(parameter ADDRSIZE = 4)
(output reg [ADDRSIZE:0] wq2_rptr,
input [ADDRSIZE:0] rptr,
input wclk, wrst_n);
reg [ADDRSIZE:0] wq1_rptr;
always #(posedge wclk or negedge wrst_n)
if (!wrst_n) {wq2_rptr,wq1_rptr} <= 0;
else {wq2_rptr,wq1_rptr} <= {wq1_rptr,rptr};
endmodule
module sync_w2r #(parameter ADDRSIZE = 4)
(output reg [ADDRSIZE:0] rq2_wptr,
input [ADDRSIZE:0] wptr,
input rclk, rrst_n);
reg [ADDRSIZE:0] rq1_wptr;
always #(posedge rclk or negedge rrst_n)
if (!rrst_n) {rq2_wptr,rq1_wptr} <= 0;
else {rq2_wptr,rq1_wptr} <= {rq1_wptr,wptr};
endmodule
What you are looking at here is what's called a dual rank synchronizer. As you mentioned this is an asynchronous FIFO. This means that the read and write sides of the FIFO are not on the same clock domain.
As you know flip-flops need to have setup and hold timing requirements met in order to function properly. When you drive a signal from one clock domain to the other there is no way to guarantee this requirements in the general case.
When you violate these requirements FFs go into what is called a 'meta-stable' state where there are indeterminate for a small time and then (more or less) randomly go to 1 or 0. They do this though (and this is important) in much less than one clock cycle.
That's why the two layers of flops here. The first has a chance of going meta-stable but should resolve in time to be captured cleanly by the 2nd set of flops.
This on it's own is not enough to pass a multi-bit value (the address pointer) across clock domains. If more than one bit is changing at the same time then you can't be sure that the transition will be clean on the other side. So what you'll see often in these situations is that the FIFO pointers will by gray coded. This means that each increment of the counter changes at most one bit at a time.
e.g. Rather than 00 -> 01 -> 10 -> 11 -> 00 ... it will be 00 -> 01 -> 11 -> 10 -> 00 ...
Clock domain crossing is a deep and subtle subject. Even experienced designers very often mess them up without careful thought.
BTW ordinary Verilog simulations will not show anything about what I just described in a zero-delay sim. You need to do back annotated SDF simulations with real timing models.
In this example, the address is passed through the shift register in order for it to be delayed by one clock cycle. There could have been more “pointers” in order to delay the output even more.
Generally, it is easier to understand what is going on and why if you simulate the design and look at the waveform.
Also, here are some good FIFO implementations you can look at:
Xilinx FIFO Generator IP Core
Altera's single/double-clock FIFOs
OpenCores Generic FIFOs
Hope it helps. Good Luck!

Synthesis error in Verilog

I am trying to implement the FatICA algorithm in verilog. I have written the whole code and till simulation it shows no error but when I try to synthesize the code it gives an error stating " ";" expecting instead of".""
I am using four floating point modules for arithmetic calculation and I have generated 1000 instances of sum, sqrt ... etc using for loop for the in between calculations.Following is the code for generate
genvar s;
generate
for(s=1;s<=4000;s=(s+1))
begin:cov_mul_ins
Float32Mul cov_mul (.CLK(clk),
.nRST(1'b1),
.leftArg(dummy_14),
.rightArg(dummy_15),
.loadArgs(1'b1)
);
end
endgenerate
Now I am accessing the individual instances using the Dot operator
for(d=1;d<=2;d=(d+1))
begin
for(e=1;e<=2;e=(e+1))
begin
for(c=1;c<=1000;c=(c+1))
begin
if((d==1)&&(e==1))
begin
dummy_14=centered_data_copy[d][c];
dummy_15=Parent.centered_data_float_trans[c][e];
#10 ***cov_mul_ins[c].cov_mul***(.CLK(clk),
.nRST(1'b1),
.leftArg(dummy_14),
.rightArg(dummy_15),
.loadArgs(1'b1),
.product(cov_temp[c][1])
);
I would be grateful if someone could pin point the error I am making.Thanks!
Couple of things to note:
Out of module references can't be synthesised. This means that you can't "peek inside" instantiated modules to look at nets or call functions if you want that code to be synthesisable. It's grand for testbenches though.
Your attempted function call has a delay on it, which will be ignored, ie #10 cov_mul_ins[c].cov_mul ( ... );
I can see your thinking in a softwarey lets-put-everything-in-a-class-and-call-methods way. This is perfect for testbenches, but synthesis will complain, as you've seen. When it comes to hardware, well, you need to think of the hardware - ask youself which blocks you need to build to run your algorithm. For example, if your algorithm needs 30 multiplies on each input sample, then you need either 30 instances of a multiplier, or one multiplier and sequence your 30 operations through it. Or 15 multipliers, each doing 2 multiplications per sample period, or 10 multipliers doing 3 etc...
Try to delete "#10" because I think it is not synthesable.

Resources