Low power design for adders - verilog

I have to implement a circuit that performs A+B+C+D serially.
A and B are added using the first adder, the result is added to C using the second adder and finally the result is added to D using the third adder, one after the other.
The problem is, in order to make the design low power. I have to turn off the other two adders which are not in use. All I can think is Enable and Disable signals, but this causes latency issues.
How do I synthesize this in in an effective manner in verilog?
A,B,C,D may change every clock cycle. a start signal is used to indicate when a new calculation is required.

I assume your adder has been implied via sum = A + B;. For area optimisation why do you not share a single adder unit. A+B in CLK1, SUM+C in CLK2, SUM+D in CLK3. Then you have nothing to disable or clock gate.
The majority of power is used when values change, so zeroing inputs when not used can actually increase power by creating unnecessary toggles. As adders are combinatorial logic all we can do to save power for a given architecture is hold values stable, this could be done through the use of clock gate cells controlling/sequencing input and output flip-flops clks.
Update
With the information that a new calculation may be required every clock cycle, and there is an enable signal called start. Th question made reference to adding them serially ie :
sum1 = A + B;
sum2 = sum1 + C;
sum3 = sum2 + D;
Since the result is calculated potentially every clock cycle they are all on or all off. The given serialisation (which is all to be executed in parallel) has 3 adders stringed together (ripple path of 3 adders). if we refactor to :
sum1 = A + B;
sum2 = C + D;
sum3 = sum1 + sum2;
Or ripple path is only 2 adders deep allowing a quicker settling time, which implies less ripple or transients to consume power.
I would be tempted to do this all on 1 line and allow the synthesis tool to optimise it.
sum3 = A + B + C + D;
For power saving I would turn on auto clock gating when synthesising and use a structure that worked well with this technique:
always #(posedge clk or negedge rst_n) begin
if (~rst_n) begin
sum3 <= 'b0;
end
else begin
if (start) begin //no else clause, means this signal can clk gate the flop
sum3 <= A + B + C + D;
end
end
end

Related

Verilog race with clock divider using flops

I made a basic example on eda playground of the issue I got.
Let s say I have two clocks 1x and 2x. 2x is divided from 1x using flop divider.
I have two registers a and b. a is clocked on 1x, b is clocked in 2x.
b is sampling value of a.
When we have rising edge of 1x and 2x clocks, b is not taking the expected value of a but it s taking the next cycle value.
This is because of this clock divider scheme, if we make division using icgs and en it works fine.
But is there a way to make it work using this clock divider scheme with flops ?
EDA playground link : https://www.edaplayground.com/x/map#
module race_test;
logic clk1x = 0;
logic clk2x = 0;
always
#5ns clk1x = !clk1x;
int a, b;
always #(posedge clk1x) begin
a <= a+1;
clk2x <= !clk2x;
end
// Problem here is that b will sample postpone value of a
// clk2x is not triggering at the same time than clk1x but a bit later
// This can be workaround by putting blocking assignment for clock divider
always #(posedge clk2x) begin
b <= a;
end
initial begin
$dumpfile("test.vcd");
$dumpvars;
#1us
$stop;
end
endmodule
Digital clock dividers present problems with both simulation and physical timing.
Verilog's non-blocking assignment operator assumes that everyone reading and writing the same variables are synchronized to the same clock event. By using an NBA writing to clk2x, you have shifted the reading of a to another delta time*, and as you discovered, a has already been updated.
In real hardware, there are considerable propagation delays that usually avoid this situation. However, you are using the same D-flop to assign to clk2x, so there will be propagation delays there as well. You last always block now represents a clock domain crossing issue. So depending on the skews between the two clocks, you could still have a race condition.
One way of correcting this is using a clock generator module with an even higher frequency clock
always #2.5ns clk = !clk;
always #(posedge clk) begin
clk1x <= !clk1x;
if (clk1x == 1)
clk2x = !clk2x;
Of course you have solved this problem, but I think there is a better way.
The book says one can use blocking assignment to avoid race. But blocking assignemnt causes errors in synopsys lint check. So, one way to avoid race problem without lint error is to use dummy logic, like this
wire [31:0] a_logic;
wire dummy_sel;
assign dummy_sel = 1'b0;
assign a_logic = dummy_sel ? ~a : a;
always #(posedge clk2x) begin
b <= a_logic;
end

hardware implementation of Modulo m adder

I have 8 inputs whose modulo sum i have to take with modulus m.i know algorithm for 2 input but it is not working here.
eg i have sum=sum0+sum1+sum2+sum3+sum4+sum5+sum6+sum7 and i have to take mod m of sum.How to do this rom hardware implementation point of view?
i aslo write code but its not working
m3 is mod3
always#(posedge clk)
begin
sum3a<=mod30+mod31;
sum3b<=mod32+mod33;
sum3c<=mod34+mod35;
sum3d<=mod36+mod37;
sum3e<=sum3a+sum3b;
sum3f<=sum3c+sum3d;
x31= (sum3e+sum3f);
x32= (sum3e-m3);
if (x32>=0 )
sum3 <= x32;
else
sum3 <= x31;
end
Do not mix blocking and non-blocking assignments in the same always block. sum3e variable depends on sum3a and sum3b but at the same time sum3a and sum3b value is changing because of non-blocking assignments,This will results in logical errors.

How to generate delay in verilog for synthesis?

I Want to Design a Verilog code for Interfacing 16*2 LCD. As in LCD's to give "command" or "data" we have to give LCD's Enable pin a "High to low Pulse " pulse that means
**E=1;
Delay();//Must be 450ns wide delay
E=0;**
This the place where I confuse I means in Verilog for synthesis # are not allowed so how can I give delay here I attached my code below. It must be noted that I try give delay in my code but I think delay not work so please help me to get rid of this delay problem......
///////////////////////////////////////////////////////////////////////////////////
////////////////////LCD Interfacing with Xilinx FPGA///////////////////////////////
////////////////////Important code for 16*2/1 LCDs/////////////////////////////////
//////////////////Coder-Shrikant Vaishnav(M.Tech VLSI)/////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
module lcd_fpgashri(output reg [7:0]data,output reg enb,output reg rs,output reg rw ,input CLK);
reg [15:0]hold;
reg [13:0]count=0;
//Code Starts from here like C's Main......
always#(posedge CLK)
begin
count=count+1; //For Delay
//For LCD Initialization
lcd_cmd(8'b00111000);
lcd_cmd(8'b00000001);
lcd_cmd(8'b00000110);
lcd_cmd(8'b00001100);
//This is a String "SHRI" that I want to display
lcd_data(8'b01010011);//S
lcd_data(8'b01001000);//H
lcd_data(8'b01010010);//R
lcd_data(8'b01001001);//I
end
//Task For Command
task lcd_cmd(input reg [7:0]value);
begin
data=value;
rs=1'b0;
rw=1'b0;
enb=1'b1; //sending high to low pulse
hold=count[13]; //This is the place where I try to design delay
enb=1'b0;
end
endtask
//Task for Data
task lcd_data(input reg [7:0]value1);
begin
data=value1;
rs=1'b1;
rw=1'b0;
enb=1'b1; //sending high to low pulse
hold=count[13]; //This is the place where I try to design delay
enb=1'b0;
end
endtask
endmodule
You seem to be stuck in a software programming mindset based on your code, you're going to have to change things around quite a bit if you want to actually describe a controller in HDL.
Unfortunately for you there is no way to just insert an arbitrary delay into a 'routine' like you have written there.
When you write a software program, it is perfectly reasonable to write a program like
doA();
doB();
doC();
Where each line executes one at a time in a sequential fashion. HDL does not work in this way. You need to not think in terms of tasks, and start thinking in terms of clocks and state machines.
Remember that when you have an always block, the entire block executes in parallel on every clock cycle. When you have a statement like this in an always block:
lcd_cmd(8'b00111000);
lcd_cmd(8'b00000001);
lcd_cmd(8'b00000110);
lcd_cmd(8'b00001100);
This does you no good, because all four of these execute simultaneously on positive edge of the clock, and not in a sequential fashion. What you need to do is to create a state machine, such that it advances and performs one action during a clock period.
If I were to try to replicate those four lcd_cmd's in a sequential manner, it might look something like this.
always #(posedge clk)
case(state_f)
`RESET: begin
state_f <= `INIT_STEP_1;
data = 8'b00111000;
end
`INIT_STEP_1: begin
state_f <= `INIT_STEP_2;
data = 8'b00000001;
end
`INIT_STEP_2: begin
state_f <= `INIT_STEP_3;
data = 8'b00000110;
end
`INIT_STEP_3: begin
state_f <= `INIT_STEP_4;
data =8'b00111000;
end
`INIT_STEP_4: begin
state_f <= ???; //go to some new state
data = 8'b00000110;
end
endcase
end
Now with this code you are advancing through four states in four clock cycles, so you can start to see how you might handle writing a sequence of events that advances on each clock cycle.
This answer doesn't get you all of the way, as there is no 'delay' in between these as you wanted. But you could imagine having a state machine where after setting the data you move into a DELAY state, where you could set a counter which counts down enough clock cycles you need to meet your timing requirements before moving into the next state.
The best way to introduce delay is to use a counter as Tim has mentioned.
Find out how many clock cycles you need to wait to obtain the required delay (here 450ns) w.r.t your clock period.
Lets take the number of clock cycles calculated is count. In that case the following piece of code can get you the required delay. You may however need to modify the logic for your purpose.
always # (posedge clk) begin
if (N == count) begin
N <= 0;
E = ~E;
end else begin
N <= N +1;
end
end
Be sure to initialize N and E to zero.
Check the clock frequency of your FPGA board and initialize a counter accordingly. For example, if you want a delay of 1 second on an FPGA board with 50MHz clock frequency, you will have to write a code for a counter that counts from 0 to 49999999. Use the terminalCLK as clk for your design. Delayed clock input will put a delay to your design. The psuedo code for that will be:
module counter(count,terminalCLK,clk)
parameter n = 26, N = 50000000;
input clk;
output reg [n-1:0] count;
output reg terminalCLK;
always#(posedge clk)
begin
count <= count + 1;
if (count <= N/2)
terminalCLK <= ~terminalCLk;
if (count == N)
terminalCLK <= ~terminalCLk;
end

Issue with Logic in Verilog

I'm trying to write a multiplier based on a design. It consists of two 16-bit inputs and the a single adder is used to calculate the partial product. The LSB of one input is AND'ed with the 16 bits of the other input and the output of the AND gate is repetitively added to the previous output. The Verilog code for it is below, but I seem to be having trouble with getting the outputs to work.
module datapath(output reg [31:15]p_high,
output reg [14:0]p_low,
input [15:0]x, y,
input clk); // reset, start, x_ce, y_ce, y_load_en, p_reset,
//output done);
reg [15:0]q0;
reg [15:0]q1;
reg [15:0]and_output;
reg [16:0]sum, prev_sum;
reg d_in;
reg [3:0] count_er;
initial
begin
count_er <= 0;
sum <= 17'b0;
prev_sum <= 17'b0;
end
always#(posedge clk)
begin
q0 <= y;
q1 <= x;
and_output <= q0[count_er] & q1;
sum <= and_output + prev_sum;
prev_sum <= sum;
p_high <= sum;
d_in <= p_high[15];
p_low[14] <= d_in;
p_low <= p_low >> 1;
count_er <= count_er + 1;
end
endmodule
I created a test bench to test the circuit and the first problem I see is that, the AND operation doesn't work as I want it to. The 16-bits of the x-operand are and'ed with the LSB of the y-operand. The y-operand is shifted by one bit after every clock cycle and the final product is calculated by successively adding the partial products.
However, I am having trouble starting from the sum and prev_sum lines and their outputs are being displayed as xxxxxxxxxxxx.
You don't seem to be properly resetting all the signals you need to, or you seem to be confusing the way that nonblocking assignments work.
After initial begin:
sum is 0
prev_sum is 0
and_output is X
After first positive edge:
sum is X, because and_output is X, and X+0 returns X. At this point sum stays X forever, because X + something is always X.
You're creating a register for almost every signal in your design, which means that none of your signals update immediately. You need to make a distinction between the signals that you want to register, and the signals that are just combinational terms. Let the registers update with nonblocking statements on the posedge clock, and let the combinational terms update immediately by placing them in an always #* block.
I don't know the algorithm that you're trying to use, so I can't say which lines should be which, but I really doubt that you intend for it to take one clock cycle for x/y to propagate to q0/q1, another cycle for q to propagate to and_output, and yet another clock cycle to propogate from and_output to sum.
Comments on updated code:
Combinational blocks should use blocking assignments, not nonblocking assignments. Use = instead of <= inside the always #* block.
sum <= and_output + sum; looks wrong, It should be sum = and_output + p_high[31:16] according to your picture.
You're assigning p_low[14] twice here. Make the second statement explicitly set bits [13:0] only:
p_low[14] <= d_in;
p_low[13:0] <= p_low >> 1;
You are mixing blocking and nonblocking assignments in the same sequential always block, which can cause unexpected results:
d_in <= p_high[15];
p_low[14] = d_in;

24 bit counter state machine

I am trying to create a counter in verilog which counts how many clock cycles there have been and after ten million it will reset and start again.
I have created a twenty four bit adder module along with another module containing twenty four D Flip flops to store the count of the cycles outputted from the adder.
I then want to have a state machine which is in the count state until ten million cycles have passed then it goes to a reset state.
Does this sound right? The problem is I am not sure how to implement the state machine.
Can anyone point me to a website/book which could help me with this?
thanks
As Paul S already mentioned, there is no need for a state machine if you want your counter to keep counting after an overflow. You can do something like this (untested, might contain typos):
module overflow_counter (
clk,
reset,
enable,
ctr_out
);
// Port definitions
input clk, reset, enable;
output [23:0] ctr_out;
// Register definitions
reg [23:0] reg_ctr;
// Assignments
assign ctr_out = reg_ctr;
// Counter behaviour - Asynchronous active-high reset
initial reg_ctr <= 0;
always # (posedge clk or posedge reset)
begin
if (reset) reg_ctr <= 0;
else if (enable)
begin
if (reg_ctr == 10000000) reg_ctr <= 0;
else reg_ctr <= reg_ctr + 1;
end
end
endmodule
Of course, normally you'd use parameters so you don't have to make a custom module every time you want an overflowing counter. I'll leave that to you ;).
[Edit] And here are some documents to help you with FSM. I just searched Google for "verilog state machine":
EECS150: Finite State Machines in Verilog
Synthesizable Finite State Machine Design Techniques
I haven't read the first paper, so I can't comment on that. The 2nd one shows various styles of coding FSMs, among which the 3 always blocks style, which I highly recommend, because it's a lot easier to debug (state transitions and FSM output are neatly separated). The link seems to be down, so here is the cached Google result.
You don't need a state machine. You already have state in the counter. All you need to do is detect the value you want to wrap at and load 0 into your counter at that point
In pseudo-code:
if count == 10000000 then
nextCount = 0;
else
nextCount = count + 1;
...or...
nextCount = count + 1;
if count == 10000000 then
resetCount = 1;
State machines are not too tricky. Use localparam (with a width, don't forget the width, not shown here because it is just one bit) to define labels for your states. Then create two reg variables (state_reg, state_next). The _reg variable is your actual register. The _next variable is a "wire reg" (a wire that can be assigned to inside a combinational always block). The two things to remember are to do X_next = X_reg; in the combinational always block (and then the rest of the combinational logic) and X_reg <= X_next; in the sequential always block. You can get fancy for special cases but if you stick to these simple rules then things should just work. I try not to use instantiation for very simple things like adders since Verilog has great support for adders.
Since I work with FPGAs, I assign initial values to my registers and I don't use a reset signal. I'm not sure but for ASIC design I think it is the opposite.
localparam STATE_RESET = 1'b0, STATE_COUNT = 1'b1;
reg [23:0] cntr_reg = 24'd0, cntr_next;
reg state_reg = STATE_COUNT, state_next;
always #* begin
cntr_next = cntr_reg; // statement not required since we handle all cases
if (cntr_reg == 24'd10_000_000)
cntr_next = 24'd0;
else
cntr_next = cntr_reg + 24'd1;
state_next = state_reg; // statement required since we don't handle all cases
case (state_reg)
STATE_COUNT: if (cntr_reg == 24'd10_000_000) state_next = STATE_RESET;
endcase
end
always #(posedge clk) begin
cntr_reg <= cntr_next;
state_reg <= state_next;
end
I found this book to be very helpful. There is also a VHDL version of the book, so you can use both side-by-side as a Rosetta Stone to learn VHDL.

Resources