I have been working on approximate multiplication recently and I want to write Verilog code for dynamic segment multiplication (DSM). The idea is to find the first bit in the number that is 1, then take the next 3 bits after it to form a 4-bit segment that represents the 8-bit number. You then multiply these 4-bit segments instead of the full 8-bit values and apply some shifts to get the final result, which saves a lot of hardware. My problem is with the multiplication of these segments, because sometimes they should be treated as signed and sometimes as unsigned. I have included the last 3 lines of my code, where a and b are the 8-bit inputs and m1 and m2 are the segments. I declared m1 and m2 as reg signed [3:0] and a and b as input signed [7:0].
Here is my code:
assign out = ({a[7],b[7]}==2'b11)||({a[7],b[7]}==2'b00) ? ($unsigned(m1)*$unsigned(m2)) << (shift_m1+shift_m2) : 16'dz;
assign out = ({a[7],b[7]}==2'b01) ? ($signed({1'b0,m1})*$signed(m2)) << (shift_m1+shift_m2) : 16'dz;
assign out = ({a[7],b[7]}==2'b10) ? ($signed(m1)*$signed({1'b0,m2})) << (shift_m1+shift_m2) : 16'dz;
But in simulation, Verilog always treats the segments as unsigned and performs unsigned multiplication, even though I explicitly marked them as signed or unsigned.
Can anyone help? I have read all of the questions about this problem on Stack Overflow and elsewhere but still cannot solve this issue.
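The rules for context-determined (non-self-determined) operands say that if any one operand is unsigned, the whole expression is unsigned. 16'dz is unsigned, so it forces the multiplication in the other branch of the conditional to be unsigned as well.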
The conditional operator i ? j : k has the condition operand i self-determined, but the two selections j and k are in a context based on the assignment or expression it is a part of. The shift operator i << j has the shift amount operand j self-determined.
All of the context rules are explained in section 11.6.1 Rules for expression bit lengths in the IEEE 1800-2017 SystemVerilog LRM.
You can get your desired result by using the signed literal 16'sdz.
However the logic you wrote may not be synthesizable for certain technologies that do not allow using a z state inside your device. The correct and more readable way is using a case statement:
// out must be declared as a reg for these procedural assignments
always @(*) case ({a[7],b[7]})
  2'b00,
  2'b11: out = ($unsigned(m1)*$unsigned(m2)) << (shift_m1+shift_m2);
  2'b01: out = ($signed({1'b0,m1})*m2) << (shift_m1+shift_m2);
  2'b10: out = (m1*$signed({1'b0,m2})) << (shift_m1+shift_m2);
endcase
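The signedness rule can be checked with a small simulation. The sketch below is not the asker's DSM code: the module name signedness_demo, the variables mixed and all_signed, and the constant-true condition are hypothetical and chosen only to isolate the effect of an unsigned literal in the other branch of a conditional.

```verilog
module signedness_demo;
  reg signed [3:0]  m1, m2;
  reg signed [15:0] mixed, all_signed;

  initial begin
    m1 = -4'sd3;   // 4'b1101
    m2 =  4'sd2;

    // 16'd0 is unsigned, so the whole right-hand side is evaluated as
    // unsigned: m1 is zero-extended to 16 bits (13), giving 13 * 2 = 26.
    mixed = 1'b1 ? (m1 * m2) : 16'd0;

    // 16'sd0 is signed, so every operand stays signed: m1 is
    // sign-extended (-3), giving -3 * 2 = -6.
    all_signed = 1'b1 ? (m1 * m2) : 16'sd0;

    $display("mixed = %0d, all_signed = %0d", mixed, all_signed);
  end
endmodule
```

The only difference between the two assignments is the s in the literal, which matches the fix suggested above of using 16'sdz.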
I have two 8-bit inputs A and B,
input [7:0] A,B;
and a 9-bit output F,
output reg [8:0] F;
A and B are combined and assigned to F like this:
F <= ~(A^B);
If A is equal to 8'hFF, and B is equal to 8'hF0, why does F become 9'h1F0 and not 9'h0F0?
Why is the output 9'h1F0 and not 9'h0F0?
You defined F as 9 bits wide. Thus the compiler will expand the right-hand-side arguments to 9 bits before doing any operations.
As both A and B are unsigned, they are zero-extended to A = 9'h0FF and B = 9'h0F0. The XOR gives 9'h00F, and the ones' complement of that gives 9'h1F0.
Beware that the width expansion does not happen if you put the expression between {}:
F2 = {~(A^B)};
F2 will be 9'h0F0;
Sections 11.8.2 Steps for evaluating an expression and 11.8.3 Steps for evaluating an assignment of the IEEE 1800-2017 LRM effectively say that the operands are first extended to match the size of the result before any operation is performed.
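A minimal simulation sketch of both cases, using the values from the question. The module name width_demo is hypothetical, and F2 is assumed to be declared 9 bits wide like F.

```verilog
module width_demo;
  reg [7:0] A, B;
  reg [8:0] F, F2;

  initial begin
    A = 8'hFF;
    B = 8'hF0;

    // Operands are extended to 9 bits first, so the inversion also
    // flips the extra (zero) top bit: result is 9'h1F0.
    F = ~(A ^ B);

    // Operands of a concatenation are self-determined, so the
    // expression is evaluated at 8 bits (8'hF0) and only then
    // zero-extended on assignment: result is 9'h0F0.
    F2 = {~(A ^ B)};

    $display("F = %h, F2 = %h", F, F2);
  end
endmodule
```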
I am a beginner in Verilog. I need to understand the logic of a test case, but I am having difficulty because of how these variables work.
Are the `define macros F and G of integer type? I read that parameters are constants.
`define F 32
`define G 0
module M(...);
parameter pMaxPacketsSize =1024;
localparam pTotalBits=3*`G;
localparam pForcePktSize=(pMaxPacketsSize-`F);
localparam pLastPacketSize =((pTotalBits-1)%(pForcePktSize))+1;
localparam pNumTransactions=((pTotalBits-1)/(pForcePktSize))+1;
localparam pPortSize=(pNumTransactions>1)?pMaxPacketsSize:((((pTotalBits-1)/32)+1)*32)+`F;
As G is defined to be 0, what will be the value of pForcePktSize? I attempted binary subtraction [0-32] and arrived at 128 (7 bits). Is this correct? Do all of these operations need to be performed in binary arithmetic? I want to know the values of these parameters (pForcePktSize, pLastPacketSize, pNumTransactions).
One more statement I want to understand is this:
wire[pPortSize-1:0]D;
wire[pNumTransactions-1:0] t;
assign t=1'b1<<D[14:0];
I know the literal has the form [size][radix][value], meaning 1 in binary, which is then left-shifted, but how is this assigned to the vector? Will t be 100000000000000 (a 1 followed by 14 zeroes)?
I tried to run this in some online IDEs but got errors, so I gave up.
`define in Verilog is the same thing as #define in C. It defines a text macro. `G and `F are uses of those macros and get replaced by the macro contents before the program is parsed.
So, in your case
localparam pTotalBits=3*`G;
localparam pForcePktSize=(pMaxPacketsSize-`F);
will be replaced with
localparam pTotalBits=3*0;
localparam pForcePktSize=(pMaxPacketsSize-32);
The replacement is purely textual: each use of a macro is simply replaced with its definition. There is no type associated with a macro definition.
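To see the substitution in action, the sketch below prints some of the resulting constants. The module name macro_demo is a hypothetical stand-in for M (whose port list is elided in the question), and the expected values in the comments assume standard Verilog integer arithmetic on untyped parameters.

```verilog
`define F 32
`define G 0

module macro_demo;
  parameter  pMaxPacketsSize  = 1024;
  localparam pTotalBits       = 3*`G;                  // becomes 3*0
  localparam pForcePktSize    = (pMaxPacketsSize-`F);  // becomes (pMaxPacketsSize-32)
  localparam pLastPacketSize  = ((pTotalBits-1)%(pForcePktSize))+1;
  localparam pNumTransactions = ((pTotalBits-1)/(pForcePktSize))+1;

  initial begin
    // Untyped parameters built from plain decimal literals are signed
    // integers, so (0-1) is -1, division truncates toward zero, and %
    // takes the sign of its first operand.
    $display("pTotalBits       = %0d", pTotalBits);        // 0
    $display("pForcePktSize    = %0d", pForcePktSize);     // 1024 - 32 = 992
    $display("pLastPacketSize  = %0d", pLastPacketSize);   // (-1 % 992) + 1 = 0
    $display("pNumTransactions = %0d", pNumTransactions);  // (-1 / 992) + 1 = 1
  end
endmodule
```

No manual binary subtraction is needed: the tool evaluates these constant expressions as ordinary integer arithmetic at elaboration time.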
I am trying to multiply a 1x3 matrix by a 3x64 matrix. Since each value in the matrices is a decimal digit, I use 4 bits per value, which gives 4x64 bits per row in total, and I access 4 bits of each row at a time.
I tried to generalize it.
The matrices have the form 1x3 [2,4,3] and
3x64 (64 decimal values in each row):
row 1[111111111111111111111111111111(64)]
row 2[11111111(8)22222222(8).....88888888(8)]
row 3[1234567812345678..................12345678]
The code I tried:
always @(h1,h2,h3)
begin
z1 =((w0[3:0]*h1[3:0])+(w1[3:0]*h2[3:0])+(w2[3:0]*h3[3:0]));
z2=((w0[3:0]*h1[7:4])+(w1[3:0]*h2[7:4])+(w2[3:0]*h3[7:4]));
.
.
.
.
.
z64=((w0[3:0]*h1[255:252])+(w1[3:0]*h2[255:252])+(w2[3:0]*h3[255:252]));
end
endmodule
I need a generalized form of this.
The error I got:
ERROR:HDLCompilers:110 - "mat.v" line 36 Least significant bit operand
in part-select of vector wire 'h1' is illegal
for(i=3;i<255;i=i+4)
begin
for(j=0;j<255;j=j+4)
begin
z[i:j]=((w0[3:0]*(h1[i:j]))+(w1[3:0]*h2[i:j])+(w2[0]*h3[i:j]));
end
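A part-select in Verilog must have constant bounds, so h1[i:j] is illegal. An indexed part-select with a constant width, such as h1[i +: 4], is legal. For a vector declared with a descending range such as [255:0], h1[i +: 4] selects the same bits as the illegal h1[(i+3):i], and h1[i+3 -: 4] selects those same four bits. (For an ascending declaration such as [0:255], the equivalent would be the illegal h1[i:(i+3)].)
However, wouldn't your problem be better solved by using two-dimensional arrays? e.g.: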
reg [3:0] h1 [0:63];
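As a sketch of how the indexed part-select generalizes the 64 hand-written assignments, the loop below computes every element in one statement. The module name mat_row, the parameter N, and the choice to pack the results into a flat vector z with 10 bits per element (enough for 3 * 15 * 15 = 675) are assumptions for illustration, not part of the original code.

```verilog
module mat_row #(parameter N = 64) (
  input      [3:0]      w0, w1, w2,   // the 1x3 matrix
  input      [4*N-1:0]  h1, h2, h3,   // three rows, 4 bits per value
  output reg [10*N-1:0] z             // N results, 10 bits each
);
  integer k;
  always @* begin
    for (k = 0; k < N; k = k + 1)
      // "base +: width" is legal because the width is constant
      z[10*k +: 10] = w0*h1[4*k +: 4] + w1*h2[4*k +: 4] + w2*h3[4*k +: 4];
  end
endmodule

module mat_row_tb;
  reg  [3:0]   w0, w1, w2;
  reg  [255:0] h1, h2, h3;
  wire [639:0] z;

  mat_row #(.N(64)) dut (.w0(w0), .w1(w1), .w2(w2),
                         .h1(h1), .h2(h2), .h3(h3), .z(z));

  initial begin
    w0 = 4'd2; w1 = 4'd4; w2 = 4'd3;
    h1 = {64{4'd1}}; h2 = {64{4'd2}}; h3 = {64{4'd3}};
    #1;
    // Every element is 2*1 + 4*2 + 3*3 = 19 for this test data
    $display("z[0] = %0d, z[63] = %0d", z[9:0], z[639:630]);
  end
endmodule
```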
This is a follow-on question from How can I iteratively create buses of parameterized size to connect modules also iteratively created?. The answer is too complex to fit in a comment and the solution may be helpful for other SO users. This question follows the self-answer format. Additional answers are encouraged.
The following code works and uses a two-dimensional array.
module Multiplier #(parameter M = 4, parameter N = 4)(
input [M-1:0] A, //Input A, size M
input [N-1:0] B, //Input B, size N
output [M+N-1:0] P ); //Output P (product), size M+N
wire [M+N-1:0] PP [N-1:0]; // Partial Product array
assign PP[0] = { {N{1'b0}} , { A & {M{B[0]}} } }; // Pad upper bits with 0s
assign P = PP[N-1]; // Product
genvar i;
generate
for (i=1; i < N; i=i+1)
begin: addPartialProduct
wire [M+i-1:0] gA,gB,gS; wire Cout;
assign gA = { A & {M{B[i]}} , {i{1'b0}} };
assign gB = PP[i-1][M+i-1:0];
assign PP[i] = { {(N-i){1'b0}}, Cout, gS}; // Pad upper bits with 0s
RippleCarryAdder#(M+i) adder( .A(gA), .B(gB), .S(gS), .Cin(1'b0), .* );
end
endgenerate
endmodule
Some of the bits are never used, such as PP[0][M+N-1:M+1]. A synthesizer will usually remove these bits during optimization and possibly give a warning. Some synthesizers are not advanced enough to do this correctly. To resolve this, the designer must implement extra logic. In this example the parameter for all the RippleCarryAdders would be set to M+N. The extra logic wastes area and potentially degrades performance.
How can the unused bits be safely eliminated? Can multidimensional arrays with different dimensions be used? Will the end code be readable and debug-able?
Can multidimensional arrays with different dimensions be used?
Short answer, NO.
Verilog does not support uniquely sized multidimensional arrays. SystemVerilog does support dynamic arrays; however, these cannot be connected to module ports and cannot be synthesized.
Embedded code (such as Perl's EP3, Ruby's eRuby/ruby_it, Python's prepro, etc.) can generate custom-dimensioned arrays and code iterations, but the parameters must be hard coded before compile. The final value of any parameter of a given instance is discovered during compile time, well after the embedded script is run. The parameter must be treated as a global constant, therefore Multiplier#(4,4) and Multiplier#(8,8) cannot exist in the same project unless you teach the script how to extract the full hierarchy and parameters of the project. (Good luck coding and maintaining that.)
How can the unused bits be safely eliminated?
If the synthesizer is not advanced enough to exclude unused bits on its own, then the bits can be optimized by flattening the multidimensional array into a one-dimensional array with intelligent part-select. The trick is finding the equation, which can be achieved by following these steps:
Find the pattern of the lsb index for each part-select:
Assume M is 4, the lsb for each part-select are 0, 5, 11, 18, 26, 35, .... Plug this pattern into WolframAlpha to find the equation a(n) = (n-1)*(n+8)/2.
Repeat with M equal to 3 for the pattern 0, 4, 9, 15, ... to get equation a(n)=(n-1)*(n+6)/2
Repeat again with M equal to 5 for the pattern 0, 6, 13, 21, 30, ... to get equation a(n)=(n-1)*(n+10)/2.
Since the equation is linear in M (i.e. a simple multiple; no exponential, logarithmic, etc. terms), only two equations are needed to derive an equation with M as a variable parameter. For non-linear relations, more data-point equations are recommended. In this case, note that for M=3,4,5 the pattern is (n+6),(n+8),(n+10); therefore the generic equation is: lsb(n)=(n-1)*(n+2*M)/2
Find the pattern of the msb index for each part-select:
Use the same process as for finding the lsb (it ends up being msb(n)=(n**2+(M*2+1)*n-2)/2). Or define the msb in terms of the lsb: msb(n)=lsb(n+1)-1
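The same closed forms can also be derived directly rather than fitted from data points, assuming (as the listed patterns for M = 3, 4, 5 imply) that the n-th partial-product segment is M+n bits wide and the segments are packed back to back starting at bit 0:

```latex
\begin{align*}
\mathrm{lsb}(n) &= \sum_{k=1}^{n-1} (M+k)
                 = (n-1)M + \frac{(n-1)n}{2}
                 = \frac{(n-1)(n+2M)}{2}, \\
\mathrm{msb}(n) &= \mathrm{lsb}(n+1) - 1
                 = \frac{n(n+2M+1)}{2} - 1
                 = \frac{n^2 + (2M+1)n - 2}{2}.
\end{align*}
```

For M = 4 this reproduces the lsb sequence 0, 5, 11, 18, 26, 35.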
IEEE std 1364-2001 (Verilog 2001) introduced macros with arguments and indexed part-select; see § 19.3.1 '`define' and § 4.2.1 'Vector bit-select and part-select addressing' respectively. Or see IEEE std 1800-2012 § 22.5.1 '`define' and § 11.5.1 'Vector bit-select and part-select addressing' respectively. This answer assumes that these features are supported by the SO's simulator and synthesizer since the generate keyword was also introduced in IEEE std 1364-2001, see § 12.1.3 'Generated instantiation' (and IEEE std 1800-2012 § 27. 'Generate constructs'). For tools that do not fully support IEEE std 1364-2001, see `ifdef examples provided here.
Since the functions to calculate the part-select ranges are frequently used, use `define macros with arguments. This will help prevent copy/paste bugs. The extra sets of () in the macro definitions are there to ensure the proper order of operations. It is also a good idea to `undef the macros at the end of the module definition, which prevents the global space from getting polluted. A flattened array can be challenging to debug. By defining pass-through connections within the generate block's for-loop, the signals become readable and can be probed in waveform.
module Multiplier #(parameter M = 4, parameter N = 4)(
input [M-1:0] A, //Input A, size M
input [N-1:0] B, //Input B, size N
output [M+N-1:0] P ); //Output P (product), size M+N
// global space macros
`define calc_pp_lsb(n) (((n)-1)*((n)+2*M)/2)
`define calc_pp_msb(n) (`calc_pp_lsb(n+1)-1)
`define calc_pp_range(n) `calc_pp_lsb(n) +: (M+n)
wire [`calc_pp_msb(N):0] PP; // Partial Product
assign PP[`calc_pp_range(1)] = { 1'b0 , { A & {M{B[0]}} } };
assign P = PP[`calc_pp_range(N)]; // Product
genvar i;
generate
for (i=1; i < N; i=i+1)
begin: addPartialProduct
wire [M+i-1:0] gA,gB,gS; wire Cout;
assign gA = PP[`calc_pp_range(i)];
assign gB = { A & {M{B[i]}} , {i{1'b0}} };
assign PP[`calc_pp_range(i+1)] = {Cout,gS};
RippleCarryAdder#(M+i) adder( .A(gA), .B(gB), .S(gS), .Cin (1'b0), .* );
end
endgenerate
// Cleanup global space
`undef calc_pp_range
`undef calc_pp_msb
`undef calc_pp_lsb
endmodule
Working example with side-by-side and test bench: http://www.edaplayground.com/s/6/591
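As a quick sanity check of the index macros, the sketch below prints the computed bounds for the first few segments with M = 4. The module name pp_index_demo and the loop variable are hypothetical; the macro bodies are copied from the code above.

```verilog
module pp_index_demo;
  parameter M = 4;

  `define calc_pp_lsb(n) (((n)-1)*((n)+2*M)/2)
  `define calc_pp_msb(n) (`calc_pp_lsb(n+1)-1)

  integer n;
  initial begin
    // Expected for M = 4: lsb = 0, 5, 11, 18 and msb = 4, 10, 17, 25,
    // so segment n is M+n bits wide (5, 6, 7, 8).
    for (n = 1; n <= 4; n = n + 1)
      $display("n=%0d lsb=%0d msb=%0d width=%0d",
               n, `calc_pp_lsb(n), `calc_pp_msb(n),
               `calc_pp_msb(n) - `calc_pp_lsb(n) + 1);
  end

  // Cleanup global space
  `undef calc_pp_msb
  `undef calc_pp_lsb
endmodule
```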
Will the end code be readable and debug-able?
Yes, for anyone who has already learned how to properly use the generate construct. The generate block's for-loop defines local wires which are confined to the scope of the loop index. gA from loop-0 and gA from loop-1 are unique signals and cannot interact with each other. The local signals can be probed in waveform, which is great for debugging.
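A minimal sketch of that scoping, independent of the multiplier: each iteration of a named generate loop gets its own copy of the local wire, and each copy is reachable through a hierarchical name. The module name gen_scope_demo, the block label blk, and the assigned values are hypothetical.

```verilog
module gen_scope_demo;
  genvar i;
  generate
    for (i = 0; i < 2; i = i + 1)
    begin : blk
      wire [3:0] gA;          // one independent gA per iteration
      assign gA = i + 4'd5;   // blk[0].gA = 5, blk[1].gA = 6
    end
  endgenerate

  initial begin
    #1;
    // The same hierarchical names appear in a waveform viewer
    $display("blk[0].gA = %0d, blk[1].gA = %0d", blk[0].gA, blk[1].gA);
  end
endmodule
```

In the multiplier above the corresponding names would be addPartialProduct[1].gA, addPartialProduct[2].gA, and so on, since that loop starts at i=1.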