Hardware implementation of a modulo-m adder in Verilog

I have 8 inputs whose sum I need to reduce modulo m. I know the algorithm for 2 inputs, but it is not working here.
For example, I have sum = sum0+sum1+sum2+sum3+sum4+sum5+sum6+sum7 and I need sum mod m. How can this be done from a hardware implementation point of view?
I also wrote some code, but it is not working.
m3 is the modulus (mod 3).
always @(posedge clk)
begin
sum3a<=mod30+mod31;
sum3b<=mod32+mod33;
sum3c<=mod34+mod35;
sum3d<=mod36+mod37;
sum3e<=sum3a+sum3b;
sum3f<=sum3c+sum3d;
x31= (sum3e+sum3f);
x32= (sum3e-m3);
if (x32>=0 )
sum3 <= x32;
else
sum3 <= x31;
end

Do not mix blocking and non-blocking assignments in the same always block. sum3e depends on sum3a and sum3b, but those are updated with non-blocking assignments in the same block, so sum3e is computed from their old values and lags one clock edge behind. This results in logic errors.
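One possible structure, as a minimal sketch rather than a definitive implementation: assuming every input is already reduced (less than M), build a tree of two-input modular adders, each doing an add followed by a conditional subtract. The module names mod_add and mod_sum8, the parameters W and M, and all port names are hypothetical and do not come from the question. Note also that if x32 in the question is declared as an unsigned reg, the test x32 >= 0 is always true, so the sketch compares against M before subtracting.

```verilog
// Hypothetical two-input modular adder. Assumes a < M, b < M and M < 2**W.
module mod_add #(parameter W = 2, parameter M = 3) (
    input  wire [W-1:0] a,
    input  wire [W-1:0] b,
    output wire [W-1:0] s
);
    // One extra bit so that a + b cannot overflow.
    wire [W:0] sum  = a + b;
    wire [W:0] diff = sum - M;
    // a + b < 2*M, so at most one subtraction of M is needed.
    assign s = (sum >= M) ? diff[W-1:0] : sum[W-1:0];
endmodule

// Hypothetical 8-input modular sum built as a 3-level tree of mod_add cells.
module mod_sum8 #(parameter W = 2, parameter M = 3) (
    input  wire         clk,
    input  wire [W-1:0] in0, in1, in2, in3, in4, in5, in6, in7,
    output reg  [W-1:0] sum
);
    wire [W-1:0] s01, s23, s45, s67, s0123, s4567, total;

    mod_add #(.W(W), .M(M)) u0 (.a(in0),   .b(in1),   .s(s01));
    mod_add #(.W(W), .M(M)) u1 (.a(in2),   .b(in3),   .s(s23));
    mod_add #(.W(W), .M(M)) u2 (.a(in4),   .b(in5),   .s(s45));
    mod_add #(.W(W), .M(M)) u3 (.a(in6),   .b(in7),   .s(s67));
    mod_add #(.W(W), .M(M)) u4 (.a(s01),   .b(s23),   .s(s0123));
    mod_add #(.W(W), .M(M)) u5 (.a(s45),   .b(s67),   .s(s4567));
    mod_add #(.W(W), .M(M)) u6 (.a(s0123), .b(s4567), .s(total));

    // Only the final result is registered; the tree itself is combinational.
    always @(posedge clk)
        sum <= total;
endmodule
```

Because every intermediate value stays below M, each stage needs only one comparison and one subtraction. If the tree is too slow for the target clock, registers could be placed between levels, using non-blocking assignments only.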

Related

Left Shift In Ring Counter

module RingCounter(
input logic Clock,
input logic Reset,
output logic [3:0] Count
);
always_ff @(posedge Clock, posedge Reset)
begin
if (Reset)
Count <= 4'd1;
else
Count <= {Count[2:0], Count[3]};
end
endmodule
I have the working code above for a 4-bit ring counter in SystemVerilog, but I am unsure how one line of it works as it wasn't clearly explained in the lecture.
Count <= {Count[2:0], Count[3]};
Any help on explaining exactly what this line does would be much appreciated.
The curly braces, {}, are the concatenation operator. They concatenate multiple bits into a bus.
On the left hand side of the nonblocking assignment (<=), you have Count, which is a short-hand way of writing the 4-bit bus: Count[3:0].
On the right hand side of the assignment, you have the 3-bit signal Count[2:0] concatenated with the 1-bit signal Count[3].
Another way to write the RHS is as 4 separate bits in the following order:
{Count[2], Count[1], Count[0], Count[3]}
Another way to write the LHS is as 4 separate bits in the following order:
{Count[3], Count[2], Count[1], Count[0]}
Therefore, the assignment sets the new Count[3] to the old Count[2], etc.
Refer to IEEE Std 1800-2017, section 11.4.12 Concatenation operators.
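To see the rotation step by step, here is a small simulation-only sketch. The module name rotate_demo and the variable names are made up for illustration; it applies the same concatenation with a blocking assignment in a loop and prints each value.

```verilog
// Simulation-only sketch: repeatedly apply the rotate-left concatenation.
module rotate_demo;
    reg [3:0] count = 4'b0001;   // same value the ring counter takes on reset
    integer i;

    initial begin
        for (i = 0; i < 5; i = i + 1) begin
            $display("%b", count);
            // New bits [3:1] come from old bits [2:0]; new bit [0] is old bit [3].
            count = {count[2:0], count[3]};
        end
    end
endmodule
```

This prints 0001, 0010, 0100, 1000 and then 0001 again: the single set bit moves left by one position each step and wraps around from bit 3 to bit 0.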

How does adding 1'b1 to 8 bit reg work in Verilog?

I am an absolute beginner in Verilog and I am wondering how the addition statement works in this piece of code.
reg [7:0] hcount;
...
always @(posedge clk) begin
if(!n_rst) begin
hcount <= 'd0;
end else if(hcount== (WIDTH-1)) begin
hcount <= 'd0;
end else begin
hcount <= hcount + 1'b1;
end
end
I understand that 1'b1 expands to 8'b1 because hcount is 8 bits wide, so the calculation is done in 8 bits. But how exactly does that addition work? Your help is much appreciated.
Verilog is a hardware description language in the sense that it tries to describe the behavior of real hardware. One attribute of modern hardware is the clock. The clock drives flops, which synchronize data across different hardware devices.
Clock behavior in Verilog is modeled with posedge clk (or negedge): the corresponding always block executes only on a rising edge of clk, for example when clk switches from 0, x, or z to 1.
So, in your case, there is supposed to be a clock (clk) which gets generated somewhere in the test bench. It periodically switches between 0 and 1.
As soon as it switches 0 -> 1, the always block is executed. If the conditions are right, hcount <= hcount + 1'b1 will be executed. As you mentioned, the compiler will zero-extend 1'b1 to the 8-bit value 00000001. The rest is the same as in any programming language: hcount will be incremented.
There are certain semantics associated with the non-blocking assignment, <=, but that would be a different question. For the purpose of your question it does not matter.
So the result is that a single increment is done every clock cycle unless n_rst is 0, in which case the counter is cleared. Also, once the counter reaches WIDTH - 1 it is set back to 0 on the next clock edge. Only one of these branches executes at any single clock edge.
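The effect of the width extension can be checked with a small simulation-only sketch. The module name add_width_demo and the 9-bit variable wide are hypothetical and added only for illustration. In Verilog the operands of + are sized to the widest of the operands and the assignment target, so the same expression gives different results in an 8-bit and a 9-bit context.

```verilog
// Simulation-only sketch of how hcount + 1'b1 is sized.
module add_width_demo;
    reg [7:0] hcount;
    reg [8:0] wide;      // hypothetical 9-bit target, one bit wider than hcount

    initial begin
        hcount = 8'd41;
        hcount = hcount + 1'b1;      // 1'b1 is extended to 8'b00000001
        $display("%0d", hcount);     // 42

        hcount = 8'hFF;
        wide   = hcount + 1'b1;      // 9-bit context: the carry is kept
        hcount = hcount + 1'b1;      // 8-bit context: the carry is dropped
        $display("%h %h", wide, hcount);   // 100 00
    end
endmodule
```

In the counter from the question the wrap-around at 8 bits never shows up as long as WIDTH is at most 256, because the counter is cleared when it reaches WIDTH - 1.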

Verilog race with clock divider using flops

I made a basic example on eda playground of the issue I got.
Let's say I have two clocks, 1x and 2x. 2x is divided from 1x using a flop divider.
I have two registers, a and b. a is clocked on 1x, b is clocked on 2x.
b is sampling the value of a.
When we have a rising edge on both the 1x and 2x clocks, b does not take the expected value of a; it takes the next-cycle value.
This is because of the clock divider scheme. If we do the division using ICGs (clock gating cells) and an enable, it works fine.
But is there a way to make it work using this clock divider scheme with flops?
EDA playground link : https://www.edaplayground.com/x/map#
module race_test;
logic clk1x = 0;
logic clk2x = 0;
always
#5ns clk1x = !clk1x;
int a, b;
always @(posedge clk1x) begin
a <= a+1;
clk2x <= !clk2x;
end
// Problem here is that b will sample the already-updated value of a
// clk2x does not trigger at the same time as clk1x but a bit later
// This can be worked around by using a blocking assignment for the clock divider
always @(posedge clk2x) begin
b <= a;
end
initial begin
$dumpfile("test.vcd");
$dumpvars;
#1us
$stop;
end
endmodule
Digital clock dividers present problems with both simulation and physical timing.
Verilog's non-blocking assignment operator assumes that everything reading and writing the same variables is synchronized to the same clock event. By using an NBA to write clk2x, you have shifted the reading of a to another delta cycle, and as you discovered, a has already been updated.
In real hardware, there are considerable propagation delays that usually prevent this situation. However, you are using the same kind of D-flop to assign to clk2x, so there will be propagation delays there as well. Your last always block now represents a clock domain crossing issue. So depending on the skew between the two clocks, you could still have a race condition.
One way of correcting this is to use a clock generator module with an even higher frequency clock:
always #2.5ns clk = !clk;
always @(posedge clk) begin
clk1x <= !clk1x;
if (clk1x == 1)
clk2x = !clk2x;
end
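For comparison, the enable-based alternative mentioned in the question can be sketched as follows. This is an illustrative simulation model, not the asker's actual ICG setup: the module name enable_divider_demo and the signal en2x are hypothetical. Instead of creating a second clock, a toggling enable selects every other clk1x edge, so both registers are written with non-blocking assignments on the same clock event and there is no ordering race.

```verilog
// Simulation-only sketch: divide by two with an enable instead of a second clock.
module enable_divider_demo;
    reg clk1x = 0;
    reg en2x  = 0;            // hypothetical enable, high on every other clk1x edge
    integer a = 0, b = 0;

    always #5 clk1x = !clk1x;

    always @(posedge clk1x) begin
        a    <= a + 1;
        en2x <= !en2x;
    end

    // Same clock as a, so b always samples the value a held before the edge.
    always @(posedge clk1x) begin
        if (en2x)
            b <= a;
    end

    initial begin
        $monitor("t=%0t a=%0d b=%0d", $time, a, b);
        #100 $finish;
    end
endmodule
```

In a real design the enable would typically drive a clock gating cell or the flop's enable input, which is what the question refers to as dividing with ICGs.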
Of course you have already solved this problem, but I think there is a better way.
The book says one can use a blocking assignment to avoid the race, but blocking assignments cause errors in the Synopsys lint check. So one way to avoid the race without a lint error is to use dummy logic, like this:
wire [31:0] a_logic;
wire dummy_sel;
assign dummy_sel = 1'b0;
assign a_logic = dummy_sel ? ~a : a;
always @(posedge clk2x) begin
b <= a_logic;
end

Low power design for adders

I have to implement a circuit that performs A+B+C+D serially.
A and B are added using the first adder, the result is added to C using the second adder and finally the result is added to D using the third adder, one after the other.
The problem is that, in order to make the design low power, I have to turn off the other two adders that are not in use. All I can think of is enable and disable signals, but this causes latency issues.
How do I synthesize this in an effective manner in Verilog?
A, B, C, D may change every clock cycle. A start signal is used to indicate when a new calculation is required.
I assume your adder is inferred via sum = A + B;. For area optimisation, why not share a single adder unit? A+B in CLK1, SUM+C in CLK2, SUM+D in CLK3. Then you have nothing to disable or clock gate.
The majority of power is used when values change, so zeroing inputs when they are not used can actually increase power by creating unnecessary toggles. As adders are combinatorial logic, all we can do to save power for a given architecture is hold values stable. This could be done with clock gate cells controlling or sequencing the clocks of the input and output flip-flops.
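The shared-adder idea (A+B in the first cycle, then +C, then +D) could look roughly like the sketch below. It is only a sketch under stated assumptions: the module name serial_add4, the step counter, and the done output are hypothetical, and it assumes A, B, C and D are held stable for the three cycles after start, which only fits when a new calculation is not required every clock cycle.

```verilog
// Hypothetical sketch: one adder reused over three clock cycles.
// Assumes A, B, C, D stay stable from start until done is asserted.
module serial_add4 #(parameter W = 8) (
    input  wire         clk,
    input  wire         rst_n,
    input  wire         start,
    input  wire [W-1:0] A, B, C, D,
    output reg  [W+1:0] sum,     // two extra bits hold the sum of four W-bit values
    output reg          done
);
    reg [1:0] step;              // 0: idle or A+B, 1: add C, 2: add D

    // Operand muxes in front of the single shared adder.
    wire [W+1:0] op_a    = (step == 2'd0) ? A : sum;
    wire [W-1:0] op_b    = (step == 2'd0) ? B : (step == 2'd1) ? C : D;
    wire [W+1:0] add_out = op_a + op_b;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            step <= 2'd0;
            sum  <= 0;
            done <= 1'b0;
        end else begin
            done <= 1'b0;
            case (step)
                2'd0: if (start) begin sum <= add_out; step <= 2'd1; end
                2'd1: begin sum <= add_out; step <= 2'd2; end
                2'd2: begin sum <= add_out; step <= 2'd0; done <= 1'b1; end
                default: step <= 2'd0;
            endcase
        end
    end
endmodule
```

The result is valid in sum in the cycle where done is high. This trades two adders for two extra cycles of latency plus the operand muxes, so whether it saves power depends on the design.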
Update
With the information that a new calculation may be required every clock cycle, and that there is an enable signal called start: the question made reference to adding them serially, i.e.:
sum1 = A + B;
sum2 = sum1 + C;
sum3 = sum2 + D;
Since the result is potentially calculated every clock cycle, the adders are all on or all off. The given serialisation (which is all executed in parallel) has 3 adders chained together (a ripple path of 3 adders). If we refactor to:
sum1 = A + B;
sum2 = C + D;
sum3 = sum1 + sum2;
Our ripple path is only 2 adders deep, allowing a quicker settling time, which implies fewer ripples or transients to consume power.
I would be tempted to do this all on 1 line and allow the synthesis tool to optimise it.
sum3 = A + B + C + D;
For power saving I would turn on auto clock gating when synthesising and use a structure that worked well with this technique:
always @(posedge clk or negedge rst_n) begin
if (~rst_n) begin
sum3 <= 'b0;
end
else begin
if (start) begin //no else clause, means this signal can clk gate the flop
sum3 <= A + B + C + D;
end
end
end

Issue with Logic in Verilog

I'm trying to write a multiplier based on a design. It has two 16-bit inputs, and a single adder is used to calculate the partial product. The LSB of one input is AND'ed with the 16 bits of the other input and the output of the AND gate is repeatedly added to the previous output. The Verilog code for it is below, but I seem to be having trouble getting the outputs to work.
module datapath(output reg [31:15]p_high,
output reg [14:0]p_low,
input [15:0]x, y,
input clk); // reset, start, x_ce, y_ce, y_load_en, p_reset,
//output done);
reg [15:0]q0;
reg [15:0]q1;
reg [15:0]and_output;
reg [16:0]sum, prev_sum;
reg d_in;
reg [3:0] count_er;
initial
begin
count_er <= 0;
sum <= 17'b0;
prev_sum <= 17'b0;
end
always @(posedge clk)
begin
q0 <= y;
q1 <= x;
and_output <= q0[count_er] & q1;
sum <= and_output + prev_sum;
prev_sum <= sum;
p_high <= sum;
d_in <= p_high[15];
p_low[14] <= d_in;
p_low <= p_low >> 1;
count_er <= count_er + 1;
end
endmodule
I created a test bench to test the circuit and the first problem I see is that, the AND operation doesn't work as I want it to. The 16-bits of the x-operand are and'ed with the LSB of the y-operand. The y-operand is shifted by one bit after every clock cycle and the final product is calculated by successively adding the partial products.
However, I am having trouble starting from the sum and prev_sum lines and their outputs are being displayed as xxxxxxxxxxxx.
You don't seem to be properly resetting all the signals you need to, or you seem to be confusing the way that nonblocking assignments work.
After initial begin:
sum is 0
prev_sum is 0
and_output is X
After first positive edge:
sum is X, because and_output is X, and X+0 returns X. At this point sum stays X forever, because X + something is always X.
You're creating a register for almost every signal in your design, which means that none of your signals update immediately. You need to make a distinction between the signals that you want to register, and the signals that are just combinational terms. Let the registers update with nonblocking statements on the posedge clock, and let the combinational terms update immediately by placing them in an always @* block.
I don't know the algorithm that you're trying to use, so I can't say which lines should be which, but I really doubt that you intend for it to take one clock cycle for x/y to propagate to q0/q1, another cycle for q to propagate to and_output, and yet another clock cycle to propagate from and_output to sum.
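The split between combinational terms and registers can be sketched like this. It is not the full multiplier and not the asker's datapath: the module name acc_demo, the synchronous reset rst, the single-bit input y_bit and the 20-bit accumulator are assumptions made for illustration. The sketch also uses replication, {16{y_bit}}, because in Verilog a 1-bit operand ANDed with a 16-bit operand is zero-extended, so q0[count_er] & q1 in the question only keeps bit 0 of q1.

```verilog
// Hypothetical sketch: combinational terms in always @*, state in a clocked block.
module acc_demo (
    input  wire        clk,
    input  wire        rst,       // synchronous reset (assumed)
    input  wire [15:0] x,
    input  wire        y_bit,     // the multiplier bit selected for this cycle
    output reg  [19:0] acc        // accumulator width chosen only for the demo
);
    // Declared as reg, but these are combinational: assigned with = in always @*.
    reg [15:0] and_output;
    reg [19:0] next_acc;

    always @* begin
        and_output = {16{y_bit}} & x;    // all 16 bits of x gated by y_bit
        next_acc   = acc + and_output;   // updates immediately when inputs change
    end

    // Only the accumulator is a real register, written with <=.
    always @(posedge clk) begin
        if (rst)
            acc <= 20'd0;
        else
            acc <= next_acc;
    end
endmodule
```

With an explicit reset on the register and no intermediate flops, nothing in the adder path starts out as X once rst has been applied, and the partial product reaches the accumulator in a single clock cycle.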
Comments on updated code:
Combinational blocks should use blocking assignments, not nonblocking assignments. Use = instead of <= inside the always @* block.
sum <= and_output + sum; looks wrong. It should be sum = and_output + p_high[31:16] according to your picture.
You're assigning p_low[14] twice here. Make the second statement explicitly set bits [13:0] only:
p_low[14] <= d_in;
p_low[13:0] <= p_low >> 1;
You are mixing blocking and nonblocking assignments in the same sequential always block, which can cause unexpected results:
d_in <= p_high[15];
p_low[14] = d_in;

Resources