Taylor Series in Verilog

I am doing my first student project in Verilog. The project is to calculate log base 2 using a Taylor series in fixed-point arithmetic (s4.27). I have also implemented Horner's method in my code.
Overall it looks like this:
log=c0+(q*(c1+(q*(c2+(q*(c3+(q*(c4+(q*(c5+(q*c6)))))))))));
Finally log_base_2=log*con;
`timescale 10ns/10ns
module log (q ,clk,log_result);
input clk;
input [31:0] q; // q=x-a; x=user input, a=1.5 (Taylor series is calculated around point "a")
output [31:0] log_result;
localparam con= 32'h0B8AA3B0; //1.44269504088895
localparam c0 = 32'h033E647C; //0.40546510810816
localparam c1 = 32'h05555558; //0.66666666666666
localparam c2 = 32'hFE38E38E; //-0.222222222222
localparam c3 = 32'h00CA4588; //0.0987654321
localparam c4 = 32'hFF9ADD3C; //-0.04938271605
localparam c5 = 32'h0035F068; //0.02633744856
localparam c6 = 32'hFFE208AA; //-0.01463191587
wire [31:0] x0,x1,x2,x3,x4,x5,x6;
wire [31:0] y0,y1,y2,y3,y4,y5,y6;
multiplier #(27,32) m6(.i_multiplicand(q),.i_multiplier(c6),.o_result(x6));
adder #(27,32) a6 (.a(x6),.b(c5),.c(y6));
multiplier #(27,32) m5(.i_multiplicand(q),.i_multiplier(y6),.o_result(x5));
adder #(27,32) a5 (.a(x5),.b(c4),.c(y5));
multiplier #(27,32) m4(.i_multiplicand(q),.i_multiplier(y5),.o_result(x4));
adder #(27,32) a4 (.a(x4),.b(c3),.c(y4));
multiplier #(27,32) m3(.i_multiplicand(q),.i_multiplier(y4),.o_result(x3));
adder #(27,32) a3 (.a(x3),.b(c2),.c(y3));
multiplier #(27,32) m2(.i_multiplicand(q),.i_multiplier(y3),.o_result(x2));
adder #(27,32) a2 (.a(x2),.b(c1),.c(y2));
multiplier #(27,32) m1(.i_multiplicand(q),.i_multiplier(y2),.o_result(x1));
adder #(27,32) a1 (.a(x1),.b(c0),.c(y1));
multiplier #(27,32) (.i_multiplicand(con),.i_multiplier(y1),.o_result(x0));
assign log_result = x0;
endmodule
Test bench code:
`timescale 10ns/10ns
module tb_log ();
reg clk;
reg [ 31 : 0 ] q;
wire [ 31 : 0 ] log_result;
log log_i (
.q(q),
.clk(clk),
.log_result(log_result)
);
parameter CLKPERIODE = 100;
initial clk = 1'b1;
always #(CLKPERIODE/2) clk = !clk;
initial begin
$dumpfile("log_wave.vcd");
$dumpvars(1);
$monitor ("Q=%h,Log2=%h ", q, log_result);
#1
#(CLKPERIODE)
q = 32'hFC000000; // q=1-1.5=-0.5;
$finish();
end
endmodule
So I am expecting a result close to zero, but unfortunately I am getting 9B917CED. When I tried to include the clock, an error called "Malformed Statement" occurred. I am using Icarus Verilog for compiling.
Code for fixed-point multiplier(qmult) and adder(qadd)
I am sure there are bugs, but currently my rookie eyes are unable to spot them.
What am I missing?

I suggest you do just one multiplication and one addition and check the result of that. Fixed-point multiplications should be checked very carefully for overflow, which I strongly suspect is the case here!
It is a problem I have seen again and again: multipliers in all Verilog examples take two N-bit numbers and produce a 2N-bit result. Very few sources tackle e.g. a 20-stage FIR filter where you have twenty multipliers each using two 16-bit numbers. Following "example" code you end up with a 52-bit-wide result.
To prevent overflow, each argument of the multiplication should be less than 16 bits wide (as you are using 32-bit results). Then every time you add you need another extra bit. Thus either your data gets wider further down the stream, or you have to start sufficiently small that it fits in the end result of 32 bits.
You can compensate if you have constant values and know what the range of the result is. E.g. in a digital filter all coefficients add up to 1 and they often alternate positive, negative, positive, negative, so the additions do not all increase the value every time.
Welcome to the world of hardware integer arithmetic!

Related

Fixed-point Signed Multiplication in Verilog

I am designing a signed Verilog multiplier which I intend to use multiple times in another module.
My two inputs will always be in s4.27 format: 1 sign bit, 4 integer bits and 27 fraction bits. My output also has to be in s4.27 format, and I need the most accurate result possible.
In C, the following not-so-perfect code snippet did the job.
int32_t mul(int32_t x, int32_t y)
{
int64_t mul = x;
mul *= y;
mul >>= 27;
return (int32_t) mul;
}
In verilog my simple version of code is given below,
`timescale 1ns/1ps
module fixed_multiplier (
i_a,
i_b,
o_p,
clk
);
input clk;
input signed [31:0] i_a;
input signed [31:0] i_b;
output signed [31:0] o_p;
wire signed [63:0] out;
assign out = i_a*i_b;
assign o_p = out;
endmodule
I know the code above has bugs because I am not getting the desired results at all.
So my questions are:
(1) As the line "assign o_p = out;" seems crucial to me, how should I assign my final output so that I get the correct and most accurate s4.27 result? Please note that this output will be fed to an adder, and the adder output will again be an input to the multiplier.
Besides the above, I also tried XOR-ing the sign bits of both inputs and assigning bits [57:27] to the final output. That did not work for me and resulted in overflow, while in C the same inputs did not give any overflow error.
(2) In C I did not have any problem with fixed-point multiplication, while in Verilog I am struggling, as I am quite a newbie. Any suggestions on what to keep in mind when dealing with signed multiplication/addition?
Below is the testbench code,
`timescale 1ns / 1ps
module tb_mul;
// Inputs
reg clk;
reg [31:0] a;
reg [31:0] b;
// Outputs
wire [31:0] c;
fixed_multiplier mul_i (
.clk(clk),
.i_a(a),
.i_b(b),
.o_p(c)
);
initial begin
$dumpfile("test_mul.vcd");
$dumpvars(1);
$monitor ("a=%h,\tb=%h,\tc=%h",a,b,c);
a = 32'h10000000;
b = 32'h10000000;
$finish();
end
endmodule
Thank you in advance.
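Your multiplier does not work like you think it does. Verilog treats a multiplication as unsigned unless every operand is declared signed, and it does not rescale the product for you. You might want to do something like the following:
wire [61:0] temp_out;
assign temp_out = i_multiplicand[30:0] * i_multiplier[30:0]; // 31-bit x 31-bit magnitude product
assign sign = i_multiplicand[31] ^ i_multiplier[31]; // sign of the result
assign out = {sign, temp_out[57:27]}; // drop the 27 extra fraction bits, keep 31 bits
Note that this treats bits [30:0] as a magnitude, so negative two's-complement inputs would have to be negated first.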
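For two's-complement s4.27 operands, a signed multiply followed by dropping 27 fraction bits mirrors the C snippet from the question. The following is a minimal sketch under that assumption; the module name `fixed_mul_s4_27` and the overflow flag `o_ovf` are made up for illustration and are not part of the original posts.

```verilog
// Sketch only: module name and o_ovf port are hypothetical.
module fixed_mul_s4_27 (
    input  signed [31:0] i_a,   // s4.27 operand
    input  signed [31:0] i_b,   // s4.27 operand
    output signed [31:0] o_p,   // s4.27 product, low fraction bits truncated
    output               o_ovf  // 1 when the product does not fit in s4.27
);
    // Both operands are signed, so the multiply is signed. The 64-bit
    // target makes Verilog sign-extend the operands before multiplying.
    wire signed [63:0] full = i_a * i_b;   // 54 fraction bits

    // Dropping the 27 low fraction bits mirrors "mul >>= 27" in the C code.
    assign o_p = full[58:27];

    // The result fits only if bits 63..58 are all copies of the sign bit.
    assign o_ovf = (full[63:58] != {6{full[58]}});
endmodule
```

With a = b = 32'h10000000 (2.0 in s4.27), as in the testbench above, this sketch yields 32'h20000000 (4.0) with `o_ovf` low.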

Turn 2 bit module (Multiplier) into more bits

I have the following code for a 2 bit multiplier:
module Multiplier (a0, a1, b0, b1, c[3:0]);
output [3:0]c;
input a0, a1, b0, b1;
wire a0b1, a1b0, ha0c, a1b1;
and (c[0], a0, b0);
and (a0b1, a0, b1);
and (a1b0, a1, b0);
HalfAdder ha0 (a1b0, a0b1, c[1], ha0c);
and (a1b1, a1, b1);
HalfAdder ha1 (ha0c, a1b1, c[2], c[3]);
endmodule
I want to be able to expand this to more than 2 bits though (32 bits). The structure of my code poses a challenge for this though. First off I would have to have 68 parameters for the module. Also I would have to manually create 64 wires (duplicates of wire a0b1, a1b0, ha0c, a1b1). Finally I would need to manually write out a bunch of logic gates and HalfAdder modules to connect all the logic. Because of this I am wondering if there is a way that I can refactor my code to be able to instantiate a binary multiplier of n (a passed parameter) size.
You need to parameterize and use a generate block. (And it is much better to use a synchronous circuit than an asynchronous circuit.)
Here is an example sketch; a and b must be held stable for WIDTH clock cycles while the partial sums move through the registers:
module Multiplier (a, b, c, clk);
parameter WIDTH = 64;
output [2*WIDTH-1:0] c;
input [WIDTH-1:0] a;
input [WIDTH-1:0] b;
input clk;
genvar i;
generate for (i = 0; i < WIDTH; i = i + 1)
begin : shifts
// shift a left by i bits and keep it only when bit i of b is set
wire [2*WIDTH-1:0] shifted = b[i] ? ({{WIDTH{1'b0}}, a} << i) : {(2*WIDTH){1'b0}};
// running sum of the partial products for bits 0..i
reg [2*WIDTH-1:0] carry;
if (i == 0)
begin : first
// the first stage has no previous sum to add
always @(posedge clk) carry <= shifted;
end
else
begin : rest
// add this partial product to the sum from the previous stage
always @(posedge clk) carry <= shifted + shifts[i-1].carry;
end
end
endgenerate
// the last stage holds the full product
assign c = shifts[WIDTH-1].carry;
endmodule

red output running testbench on 4bit ALU

So I'm trying to create a 4-bit ALU in verilog that does multiplication, addition, bcd addition and concatenation. Here's my code so far:
module alu4bit(A,B,S,Y);
input [3:0] A, B;
input [1:0] S;
output [7:0] Y;
reg [7:0] Y;
wire [7:0] A0, A1, A2, A3;
multiplier4bit mod3(A,B,A3);
always @ (A,B,S)
begin
case (S)
// 2'b00:
// 2'b01:
// 2'b10:
2'b11: Y = A3;
endcase
end
endmodule
When trying to run a test bench setting S=3 for my multiplier and A=5, B=5, I get red lines with XXXXX for output. I think it has something to do with how I set up the outputs for the submodules. Should A0-3 be wires? Wish I had an error message to go by, but I'm kind of stuck at this point.
If you want your mux to be sensitive to the A3 signal, you need to add it to the sensitivity list:
always @ (A,B,S,A3)
Consider simplifying this to:
always @*
Refer to IEEE Std 1800-2012, section "9.4.2.2 Implicit event_expression list".
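As a rough illustration of the implicit sensitivity list, here is a minimal sketch of the mux. It assumes the product can be formed with `*` as a stand-in for the poster's multiplier4bit (which is not shown), and the module name `alu4bit_sketch` is hypothetical.

```verilog
// Sketch only: "*" stands in for multiplier4bit, which is not shown in the post.
module alu4bit_sketch (
    input      [3:0] A, B,
    input      [1:0] S,
    output reg [7:0] Y
);
    wire [7:0] A3 = A * B;       // stand-in for the multiplier4bit output

    always @* begin              // re-evaluated whenever S or A3 changes
        case (S)
            2'b11:   Y = A3;
            default: Y = 8'h00;  // other operations not implemented here
        endcase
    end
endmodule
```

The default branch is a design choice in this sketch: it gives Y a defined value for the unimplemented select codes, so the block does not infer a latch.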

What is a better way to multiply by 15?

I'm trying to implement multiplication by 15 as follows.
module mul15(
output [10:0] result,
input [3:0] a
);
assign result = a*15;
endmodule
But is there a better way to multiply a by 15?
I think there are 2 ways, like this:
1.result = a<<4 -1;
2.result = {a,3'b1111_1111};
And I think the best way is 2,
but I'm not sure, especially with respect to synthesis.
Update:
What if a is 0 in {a,3'b1111_1111}? The result is 255, not 0.
Does anyone know the best way?
Update
How about this way?
Case1
result = {a,8'b0}+ {a,7'b0}+ {a,6'b0}+ {a,5'b0}+ {a,4'b0}+ {a,7'b0}+ {a,3'b0}+ {a,2'b0}+ {a,1'b0}+ a;
But it looks like this uses 8 adders.
Case2
result = a<<8 -1
I'm not sure what the best way is.
There is always a*16 - a. Static multiplications by a power of 2 are basically free in hardware; they are just hard-coded 0s in the LSBs. So you just need one 11-bit full subtractor, which is a full adder and some inverters.
Other forms:
result = (a<<4) - a; // parentheses are needed because - binds tighter than <<
result = {a,4'b0} - a; // unsigned full-subtractor
result = {a,4'b0} + ~a + 1'b1; // unsigned full-adder w/ carry in, 2's complement
result = {{3{a[3]}},a,4'b0} + ~{ {7{a[3]}}, a} + 1'b1; // signed full-adder w/ carry in, 2's complement
The cleanest RTL version is as you have stated in the question:
module mul15(
input [3:0] a,
output reg [7:0] result
);
always @* begin
result = a * 4'd15;
end
endmodule
The multiplier 15 in binary is 4'b1111, that is 8 + 4 + 2 + 1.
Instead of using a multiplier, it could be broken down into the sum of these powers of 2. Multiplying by a power of 2 is just a shift. This is how a shift-and-add multiplier would work.
module mul15(
input [3:0] a,
output reg [7:0] result
);
always @* begin
// 8 4 2 1 =>15
result = (a<<3) + (a<<2) + (a<<1) + a;
end
endmodule
To minimise the number of adders required, a canonical signed digit (CSD) representation could be used, making 15 out of 16 - 1:
module mul15(
input [3:0] a,
output reg [7:0] result
);
always @* begin
// 16 - 1 =>15
result = (a<<4) - a;
end
endmodule
With a modern synthesis tool these should all result in the same thing. Therefore, writing more readable code that clearly tells the tool what you intended gives it free rein to optimise as required.
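One way to check that the three forms really compute the same value is an exhaustive simulation over all 16 inputs. The following is a minimal sketch; the module names `mul15_forms` and `tb_mul15_forms` are hypothetical and only serve this illustration.

```verilog
// Sketch only: module and testbench names are hypothetical.
module mul15_forms (
    input  [3:0] a,
    output [7:0] by_mult,   // plain multiplication
    output [7:0] by_shift,  // 8 + 4 + 2 + 1
    output [7:0] by_csd     // 16 - 1
);
    assign by_mult  = a * 4'd15;
    assign by_shift = (a << 3) + (a << 2) + (a << 1) + a;
    assign by_csd   = (a << 4) - a;
endmodule

module tb_mul15_forms;
    reg  [3:0] a;
    wire [7:0] by_mult, by_shift, by_csd;
    integer i, errors;

    mul15_forms dut (.a(a), .by_mult(by_mult), .by_shift(by_shift), .by_csd(by_csd));

    initial begin
        errors = 0;
        for (i = 0; i < 16; i = i + 1) begin
            a = i[3:0];
            #1;  // let the combinational outputs settle
            if (by_mult !== by_shift || by_mult !== by_csd)
                errors = errors + 1;
        end
        if (errors == 0) $display("all three forms agree");
        else             $display("%0d mismatches", errors);
        $finish;
    end
endmodule
```

A simulation only shows functional equivalence; whether the synthesised netlists match still depends on the tool.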

32 bit adder subtractor ALU using generate

I need to implement a 32 bit adder subtractor ALU for a class assignment. I have a 1-bit adder subtractor that works fine and the operation is made with the help of a select statement (code for all is given below). Anyway, the problem I am facing is that I am unable to figure out how to use the carry/borrow out of one module to the subsequent module.
module add_sub(select, i0, i1, cin, out, cout
);
input i0, i1, select, cin;
output out, cout;
wire y0, y1, y2, y3, y4, y5, y6;
wire z0, z1, z2, z3, z4;
//diff = i0 xor i1 xor cin
//borrow = cin. ~(i1 xor i2) or ~x.y
xor (y0, i1, cin);
xor (y1, i0, y0); //y1=diff or sum as only carry and borrow vary between adder and subtractor circuits
xor (y2, i1, i0);
and (y3, cin, ~y2);
and (y4, ~i0, i1);
or (y6, y5, y4); //y6 = borrow
and (z0, i0, i1);
xor (z1, i0, i1);
and (z2, cin, z1);
or (z3, z0, z2); //z3= carry out for sum
//conditional operator for assigning sum or difference. if select = 0, we add, else subtract
assign out = y1;
assign cout = select ? y6 : z3;
endmodule
This module is instantiated in a loop in the alu module that is given below...
module alu(sel, num1, num2, alu_cin, alu_out, alu_c
);
parameter N = 32;
input sel; //select line for add or sub
input [N-1:0] num1; //two inputs
input [N-1:0] num2;
input alu_cin;
output [N-1:0] alu_out; //32 bit output
output alu_c; // becomes final carry or borrow accordingly
genvar i;
generate for (i=0; i<=N-1; i=i+1)
begin: alu_loop
if (i == 0)
add_sub as_i (sel, num1[i], num2[i], alu_cin, alu_out[i], alu_c);
else
add_sub as_i (sel, num1[i], num2[i], alu_loop[i-1].as_i.cout[i-1], alu_out[i], alu_c);
end
endgenerate
endmodule
In the test bench for the alu, I gave appropriate 32 bit values and the select value as I need. The problem comes with
add_sub as_i (sel, num1[i], num2[i], alu_loop[i-1].as_i.cout[i-1], alu_out[i], alu_c);
It says "Indexing cannot be applied to a scalar." when I try to simulate it. The syntax check completes without errors.
I need access to cout from the one-bit module to pass it on as cin to the next one. The alu_c can be overwritten as only the last one bit is needed.
Any help would be appreciated. Thanks in advance. :) All this is done on Xilinx ISE through Verilog modules.
It is syntactically correct, but you are using a bit-select on a single-bit value, which is a semantic error.
add_sub as_i (
sel,num1[i],num2[i],alu_loop[i-1].as_i.cout[i-1],alu_out[i],alu_c);
The trailing [i-1] after cout is the problem, because cout is declared as a scalar output in add_sub:
output out, cout;
While Verilog allows referencing a port using dot notation (hierarchical referencing), it is not good practice outside of testbenches. You should declare a wire for that connectivity instead.
for (i=0; i<=N-1; i=i+1)
begin: alu_loop
wire cout; // visible as alu_loop[i].cout for each index i
end
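A common alternative to per-iteration wires is a single carry vector that links the stages. The sketch below illustrates that idea under some assumptions: it uses a plain full-adder cell (hypothetical `full_add`) and invert-and-add-one subtraction instead of the poster's add_sub cell, and it drops the alu_cin input for brevity.

```verilog
// Sketch only: full_add and alu_sketch are hypothetical names.
module full_add (input a, input b, input cin, output sum, output cout);
    assign sum  = a ^ b ^ cin;
    assign cout = (a & b) | (cin & (a ^ b));
endmodule

module alu_sketch #(parameter N = 32) (
    input          sel,      // 0 = add, 1 = subtract (num1 - num2)
    input  [N-1:0] num1,
    input  [N-1:0] num2,
    output [N-1:0] alu_out,
    output         alu_c     // carry out of the top bit (1 = no borrow when subtracting)
);
    // carry[i] feeds bit i; carry[i+1] is the carry out of bit i.
    wire [N:0] carry;
    assign carry[0] = sel;   // subtraction is num1 + ~num2 + 1
    assign alu_c    = carry[N];

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : alu_loop
            full_add fa_i (
                .a   (num1[i]),
                .b   (num2[i] ^ sel),   // invert num2 when subtracting
                .cin (carry[i]),
                .sum (alu_out[i]),
                .cout(carry[i+1])
            );
        end
    endgenerate
endmodule
```

Because every stage reads carry[i] and drives carry[i+1], no hierarchical reference into the previous instance is needed, and only carry[N] reaches the alu_c output.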

Resources