Dividing and Modulus in verilog - verilog

I'm creating an ALU in verilog just for the purpose of simulation. But I can't figure out how to divide two 16 bit inputs. A regular A=B/C doesn't work (where B,C is a input[15:0] and A is output reg[15:0]). Similarly with A=B%C.
Would I have to separately implement a division circuit module? I understand division is a very complex operation and that would be the actual way to do it but I'm only doing it for simulation. Is there no shorter way to divide two 16bit inputs?

Your code currently does this:
initial begin
out1=in1/in2; out2=in1%in2;
end
This doesn't do anything - the initial block runs through once at the start of simulation, when in1 and in2 are X's, setting out1 and out2 to X, and then stopping. Change your logic to:
always
#* begin
out1=in1/in2;
out2=in1%in2;
end
This executes every time in1 or in2 changes.

Related

Extra regs being created during synthesis

Say I have 2 multi-bit regs in design. Both of them share a common condition (cond_x) as their enable but 1 of them has an extra condition (cond_y) apart from reset signal for when its meant to be reset.
Example (ignoring reset part of code for simplicity) -
Same always block
always #(posedge clock) begin
if(cond_x) begin
a <= a_next;
b <= b_next;
else if(cond_y) begin
b <= 5'b0;
end
end
Different always blocks
always #(posedge clock) begin
if(cond_x) begin
a <= a_next;
end
end
always #(posedge clock) begin
if(cond_x) begin
b <= b_next;
else if(cond_y) begin
b <= 5'b0;
end
end
When I synthesize 2 i get more number of regs than are expected in the design. Using 1 it is accurate. The extra regs are only for lower two bits of b and are suffixed by __rep1. Not sure what that means or how it is being created.
Is there any possible reason for the same? I am using Synopsys DC
Design Compiler can replicate cells to improve timing, load etc. and the replicated cells get the suffix _rep<n>. The datasheet of DC Ultra has the following explanation:
DC Ultra looks at a larger subsection of the critical path during
logic duplication and can replicate many gates to reduce load of high
fan-out nets, hence improving timing on critical paths through load
isolation.
However the two code snippets seem identical, DC can produce different results depending on the starting conditions. Most probably the second code was synthesized into a worse circuit for b[1:0] and the tool had to replicate these two flip-flops.

How can I timing in verilog from ns to s?

How can I change a number per second on a digital display in verilog?
Before the header appear "timescale 1ns / 1ps ".
So when I write "#1 ... " this is 1ns.
How I could write to appear the number per second without write "#1000000000"?
always #(*)
begin
if(binary_input==0)
begin
seg=8'b10011111; /*1*/
#100 seg=8'b00001001; /*9*/
#100 seg=8'b00001101;
Delays such as #10 and #1us are not synthesizable. These delays are used for test benches and non-synthesizable behavioral models (a.k.a. reference models) such as clock generators.
In a synthesizable design, time is measured in number of clock cycles. To has your design wait a specific amount of time, a counter that can store a value of desired time divided by the clock period. For example, 1 second on a 100 nano-second clock requires a counter that can store a value of 10,000,000.

What is always followed by #(...) pound mean in Verilog?

In a simple clock generator example, I see the following code:
always #(cycle/2) clk ~= clk;
I've seen always #(*) before but not pound (#). I tried to find it in the documentation, but all I could find was some reference to "real-valued ports" with no further elaboration.
It's a delay operation. It essentially just reads
always begin
#(cycle/2) //wait for cycle/2 time
clk ~= clk;
end
You might sometimes see this used with raw values, like #5 or #10, which means to wait 5 or 10 units of your timescale.

Verilog: Setting a constant value on clock even

I am new to Verilog, so this question might be quite dumb.
What I am trying: I have a component that has a clk, an 8 bit input and an 8 bit output. What it should do, is:
If the clock event is negative edge, it should set the output to 0
If the clock event is positive edge, it should set the output to whatever input is at this moment of the edge event. During the high phase of the clock, the output should NOT change, regardless changes on the input.
What I tried so far:
always #(negedge clk)
_ledOut <= 0;
always #(posedge clk)
_ledOut[RowSize-1:0] <= ledIn[RowSize-1:0];
This tells my, that it can't resolve multiple constant drivers for net _ledOut.
However, putting this together in an always #(negedge clk, posedge clk) tells me, it can't test for both conditions.
So I tried to make just one always #(clk) block and then used an if statement:
always #(clk) begin
if(clk == 0)
_ledOut <= 0;
else if(clk == 1)
_ledOut[RowSize-1:0] <= ledIn[RowSize-1:0];
end
But this didn't just switch on a clk event. During the high phase of the clock, it links _ledOut with ledIn, so that changes on ledIn do also have effect on _ledOut. What am I doing wrong here?
Best regards,
Michael
This tells my, that it can't resolve multiple constant drivers for net
_ledOut.
For synthesis you cannot assign reg types from multiple always blocks.
However, putting this together in an always #(negedge clk, posedge
clk) tells me, it can't test for both conditions.
This essentially describes a DDR register. While many FPGA devices have these they typically cannot be synthesized. Xilinx uses ODDR2 and IDDR2 primitives if you really need this functionality.
If the clock event is negative edge, it should set the output to 0 If
the clock event is positive edge, it should set the output to whatever
input is at this moment of the edge event. During the high phase of
the clock, the output should NOT change, regardless changes on the
input.
If this is all you need then you can use a D flip flop with an AND gate on the output. The flip-flop will sample ledIn on each rising edge of clk and the AND gate will mask the output whenever the clock is zero. This is not ideal as you generally do not want clocks to touch non-sequential logic but avoiding this would likely mean changing your requirements.
As toolic indicated, the code you posted will work but you should understand that code will synthesize to a multiplexer controlled by clk.
Ok, here is my working solution now. Maybe it's not the best verilog code you have seen out there. ;) This is, however, my first thing I do with it, as a project at my university. So as long as it does what I want it to do, this is a great success to me! ;)
Here is the code I used now, thanks to Adam12:
parameter RowSize = 8;
input clk;
input [RowSize-1:0] ledIn;
output [RowSize-1:0] ledOut;
reg[RowSize-1:0] _ledOut;
assign ledOut = _ledOut & {RowSize{clk}};
always #(posedge clk) begin
_ledOut[RowSize-1:0] <= ledIn[RowSize-1:0];
end
Consider the following stimulus:
module tb;
parameter RowSize = 8;
reg clk;
reg [7:0] ledIn, _ledOut;
always #(clk) begin
if(clk == 0)
_ledOut <= 0;
else if(clk == 1)
_ledOut[RowSize-1:0] <= ledIn[RowSize-1:0];
end
initial begin
$monitor($time, " clk=%b ledIn=%h _ledOut=%h", clk, ledIn, _ledOut);
ledIn = 0;
#22 ledIn = 8'h55;
#20 $finish;
end
always begin
#5 clk <= 0;
#5 clk <= 1;
end
endmodule
It produces this output:
0 clk=x ledIn=00 _ledOut=xx
5 clk=0 ledIn=00 _ledOut=00
10 clk=1 ledIn=00 _ledOut=00
15 clk=0 ledIn=00 _ledOut=00
20 clk=1 ledIn=00 _ledOut=00
22 clk=1 ledIn=55 _ledOut=00
25 clk=0 ledIn=55 _ledOut=00
30 clk=1 ledIn=55 _ledOut=55
35 clk=0 ledIn=55 _ledOut=00
40 clk=1 ledIn=55 _ledOut=55
Notice at time 22, when ledIn changes, the _ledOut output does not change. _ledOut only changes at the next posedge of clk at time 30. Therefore, the always #(clk) solution is doing what you want: the output only changes at the clock edge, as you specified.
This is a pretty unusual question, and it makes me advise you need to give more information about what you are actually trying to achieve, since it may well impact the timing performance and clock constraints if this is targeting an FPGA. Synthesis has been mentioned, but what will you be feeding the clock-gated output into? If it's a pin-pad, then you should read the DDR pad buffers in the device specifications and infer the specific primitive to be able to drive a DDR signal.
If you are keeping this signal within the chip then this is a very bizarre request. If I needed to generate that waveform, I would probably use a PLL to generate a phase-locked clock at twice the base frequency and put the gated data into that domain, with a toggle to apply the mast, so that the tooling will be able to properly analyse the clock crossings and the resulting data path is still effectively transitioning on a single edge.
The answers above to infer a register with a combinatorial multiplexer forced on the output is interesting, but whatever you feed this into will have to deal with awkward setup/hold conditions, and if on-chip, would only be sampling one edge anyway, so this is kind of redundant.

Better way of coding a RAM in Verilog

Which code is better in writing a RAM?
assigning data_out inside always block:
module memory(
output reg [7:0] data_out,
input [7:0] address,
input [7:0] data_in,
input write_enable,
input clk
);
reg [7:0] memory [0:255];
always #(posedge clk) begin
if (write_enable) begin
memory[address] <= data_in;
end
data_out <= memory[address];
end
endmodule
assigning data_out using assign statement:
module memory(
output [7:0] data_out,
input [7:0] address,
input [7:0] data_in,
input write_enable,
input clk
);
reg [7:0] memory [0:255];
always #(posedge clk) begin
if (write_enable) begin
memory[address] <= data_in;
end
end
assign data_out = memory[address];
endmodule
Any recommendations?
It depends on your requirements.
This registers your memory output. If you are synthesizing this to gates, you will have 8 more flip-flops than in case 2. That means you use a little more area. It also means your output will have less propagation delay relative to the clock than case 2. Furthermore, the output data will not be available until the next clock cycle.
Your output data will be available within the same clock cycle as it was written, albeit with longer propagation delay relative to the clock.
You need to decide which to use based on your requirements.
A third option is to use a generated RAM, which is a hard macro. This should have area, power and possibly timing advantages over both case 1 and 2.
to add to toolic's answer - if you use the asynchronous read method (case 2), it won't map to a RAM block in an FPGA, as the RAM blocks in all the major architectures I'm aware of have a synchronous read.
Both forms are valid, depending on the type of pipelining you want. I always recommend following the Xilinx RAM coding guidelines -- it's a good way to ensure that the code synthesizes into proper FGPA constructs.
For example, your example 1 would synthesize into into Xilinx BRAM (i.e., dedicated Block Ram), since it is synchronous read, and your example 2 would synthesize into Xilinx Distributed Ram (since it is asynchronous read).
See the coding guidelines in Xilinx document UG901 (Vivado Design Suite User Guide), in the RAM HDL Coding Techniques section. It also has a good description of the difference between synchronous read and asynchronous read for RAMs.
In the second program, there would be compilation error as we can not 'Assign' a value to 'Reg'.
It will give an error saying: 'Register is illegal in left-hand side of continuous assignment'

Resources