Width independent functions - verilog

Is it possible to write a function that can detect the input data width automatically? For example, consider the parity function below:
function parity;
input [31:0] data;
parity = ^ data;
endfunction
When parity(data) is called, the input data should be limited to 32 bits.
Alternatively, one could write a macro, such as `PARITY(data) in which the system function $bits can detect the width of data and make the macro width-independent. Is it possible to have the same flexibility for functions?
Edit: I need my code to be synthesizable.

You can create a parameterized function. See section 13.8 in the LRM. It looks like the function must be declared inside a class like this:
virtual class C #(parameter WIDTH=32);
static function parity (input [WIDTH-1:0] data);
parity=^data;
endfunction
endclass
Then, when you call the function, parameterize it with the $bits system function:
assign parity_bit = C#($bits(data))::parity(data);
Working example on EDA Playground.

You can use macros. The function can be declared like:
`define PARITY(FUNC_name, WIDTH) \
function FUNC_name (input [WIDTH-1:0] data); \
begin \
FUNC_name = ^ data; \
end \
endfunction
and you can call it with:
`PARITY(parity, 32);
assign parity_bit = parity(data);
This code is synthesizable in Xilinx, Altera, and Synopsys tools.
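The macro approach can also be combined with $bits so that the width does not have to be typed by hand. This is only a sketch: the module and signal names (macro_parity_demo, a, b, parity_a, parity_b) are hypothetical, and it assumes a tool that accepts the SystemVerilog $bits system function as an elaboration-time constant in a function port range.

```systemverilog
// Same macro shape as above: it defines one function per name/width pair.
`define PARITY(FUNC_name, WIDTH) \
function FUNC_name (input [WIDTH-1:0] data); \
begin \
  FUNC_name = ^data; \
end \
endfunction

// Hypothetical module: one parity function per bus, each sized with $bits.
module macro_parity_demo (
  input  wire [7:0]  a,
  input  wire [12:0] b,
  output wire        pa,
  output wire        pb
);
  `PARITY(parity_a, $bits(a))
  `PARITY(parity_b, $bits(b))

  assign pa = parity_a(a);
  assign pb = parity_b(b);
endmodule
```

Each distinct width still needs its own function name, which is the main cost of this approach compared with the parameterized class.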

It is possible using unbounded arrays.
Unfortunately SystemVerilog doesn't have decent support for unbounded arrays. The LRM seems to equate unbounded with dynamic, which suggests it's going to be almost impossible to create something synthesisable. VHDL has unbounded arrays which are supported by tools and incredibly useful so it's a pity that SystemVerilog didn't include this feature properly.
Here is an example:
function automatic logic parity(input logic data[]);
logic p = 0;
for (int i=0; i<data.size(); i++)
p ^= data[i];
return p;
// return ^data; <--- not allowed on unpacked arrays?
endfunction
logic [7:0] data_in;
logic result;
logic data_in_unpacked [] = new[$bits(data_in)];
always_comb begin
// Convert to unpacked array (better way to do this?)
for (int i=0; i<$bits(data_in); i++)
data_in_unpacked[i] = data_in[i];
result = parity(data_in_unpacked);
end
This is running on Modelsim on EDAPlayground here: http://www.edaplayground.com/x/3tS
EDIT 1: Updated the code - I just realised it's possible to call new[] at initialisation and thus statically, so in theory synthesis tools could support this. It would be interesting to synthesise this and see...
EDIT 2: Thought I'd try synthesising and unsurprisingly Quartus doesn't like this:
Error (10170): Verilog HDL syntax error at testing.sv(10) near text "]"; expecting an operand
Error (10170): Verilog HDL syntax error at testing.sv(18) near text "]"; expecting an operand
Error (10112): Ignored design unit "my_parity" at testing.sv(2) due to previous errors

Interesting question. As far as I know, that is not possible. I would also stay away from macros (they cause even more problems). I can propose a synthesizable workaround:
When calling your function parity on data narrower than the declared width, pad the data with 0s like this: assign my_parity_bits = parity({16'd0, my_data}); Zero bits do not change the result of an XOR reduction, and the synthesis tool will hopefully optimize them away, but you will have to check that yourself.
If you want to perform this operation on large data buses in a convenient way, you will have to write some more Verilog, e.g. a module that accepts a WIDTH parameter and the actual data as an input vector. To do this, I would advise you to write a generic module that does exactly what your function parity does. Then write a second module that acts as a parity wrapper. Inside this wrapper, I would do the math on the WIDTH parameter to determine the number of parity modules needed for the input data and instantiate those modules in a generate loop.
Remember that Verilog is a hardware description language, hence such limitations. Think about what your code will synthesize into when writing RTL.
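The generic-module idea could look roughly like the sketch below. The names parity_gen, parity_user, my_data, and wide_data are hypothetical, and the sketch uses one parameterized instance per bus rather than the chunked generate loop described above.

```systemverilog
// Hypothetical parameterized parity module; WIDTH is fixed per instance at elaboration.
module parity_gen #(parameter WIDTH = 32) (
  input  wire [WIDTH-1:0] data,
  output wire             parity
);
  assign parity = ^data;  // XOR reduction over all WIDTH bits
endmodule

// Hypothetical user: each instance is given the width of the bus it serves.
module parity_user (
  input  wire [15:0] my_data,
  input  wire [63:0] wide_data,
  output wire        my_parity,
  output wire        wide_parity
);
  parity_gen #(.WIDTH(16)) u_p16 (.data(my_data),   .parity(my_parity));
  parity_gen #(.WIDTH(64)) u_p64 (.data(wide_data), .parity(wide_parity));
endmodule
```

Unlike a function, a module instance cannot be called inside an expression, so each result needs its own wire.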

Quick update on the parameterized virtual class idea posted above. Stack Overflow won't let me leave comments, so I had to post a new answer. I just wanted to add that I tried this in Vivado, and it appears to work correctly in synthesis. Using nguthrie's solution above, I can do
logic [7:0] data;
assign data = {sw, btn};
assign led[0] = C#($bits(data))::parity(data);
This is in Vivado 2021.

Related

Delay associated with xor of 1023 10 bit vectors in Verilog

I am somewhat new to Verilog and I have a question that is confusing me.
I have a number of constant parameters, 1023 of them to be specific: c0, c1, c2, ..., c1022, each 10 bits long. I also have a vector r[1022:0], which is 1023 bits long. My task is to compute ci*r[i] for i from 0 to 1022 and then take the XOR of the resulting 1023 10-bit vectors. When I do this in simulation, Verilog generates the output at time 0 for the assign statement. How can Verilog generate the output at time 0? Will there be no delay associated with these 1023 XORs?
Also, if I need to do this succinctly, is there a short form that I can use, or do I need to manually write c0*r[0] ^ c1*r[1] ^ ... ^ c1022*r[1022] for it to be synthesizable?
A Verilog simulator will execute whatever legal syntax you give it—the tool knows nothing about what the implementation eventually looks like. It's up to you to feed timing constraints to the synthesis tool and it tells you if it can fit the logic to meet the constraints (or you might have to run another tool to see if it meets timing constraints).
Since you named your parameters c0, c1, c2, ..., you might as well have named them czero, cone, ctwo, ..., which gives you no options for shortcuts.
If your tool supports SystemVerilog, you can write your parameter as an array and then use the array xor reduction method:
parameter [9:0] C[1023] = {10'h123, 10'h234, ...};
assign out = C.xor() with (item*r[item.index]);
If your synthesis tool does not support this SystemVerilog syntax, you can pack the parameter values into a single vector (1023 values of 10 bits each, 10230 bits in total) and use an indexed part-select in Verilog.
parameter [10230-1:0] C = {10'h123, 10'h234, ...};
function [9:0] xor_reduction (input [1022:0] r);
integer I;
begin
xor_reduction = 0;
for(I=0;I<1023;I=I+1)
xor_reduction = xor_reduction ^ (r[1022-I]*C[10*I +: 10]); // slice I counts up from the LSB end, so it pairs with r[1022-I]
end
endfunction
assign out = xor_reduction(r);
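A scaled-down sketch of the packed-parameter approach may make the indexing easier to check. Everything here is illustrative: N is 4 instead of 1023, the constants are made up, the names xor_reduce_demo and r_vec are hypothetical, and the product by a single bit is written as an AND mask.

```systemverilog
module xor_reduce_demo;
  localparam N = 4;   // hypothetical small count in place of 1023
  localparam W = 10;  // width of each constant

  // Listed as {c0, c1, c2, c3}: c0 lands in the most significant slice.
  localparam [N*W-1:0] C = {10'h123, 10'h234, 10'h345, 10'h056};

  function [W-1:0] xor_reduction (input [N-1:0] r);
    integer i;
    begin
      xor_reduction = {W{1'b0}};
      for (i = 0; i < N; i = i + 1)
        // Slice i counts up from the LSB end and holds c(N-1-i),
        // so it is gated by r[N-1-i].
        xor_reduction = xor_reduction ^ ({W{r[N-1-i]}} & C[i*W +: W]);
    end
  endfunction

  reg  [N-1:0] r_vec;
  wire [W-1:0] out = xor_reduction(r_vec);

  initial begin
    r_vec = 4'b0101;  // selects c0 and c2
    #1;
    if (out !== (10'h123 ^ 10'h345))
      $display("MISMATCH: out = %h", out);
    else
      $display("OK: out = %h", out);
  end
endmodule
```

The simulator still reports the result with no delay, which is expected: nothing in this description models gate or routing delay.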

How does %m work in $display system task?

I want to understand the usage of %m in the $display system task in Verilog.
This is sample code given in the book. It would be better if someone explains this with more examples as it seems unclear in the book.
//the highest-level module called top. No argument is required. This
//is a useful feature)
$display("This string is displayed from %m level of hierarchy");
-- This string is displayed from top.p1 level of hierarchy
The free IEEE Std 1800-2012, section 21.2.1.6 Hierarchical name format states:
The %m format specifier does not accept an argument. Instead, it
causes the display task to print the hierarchical name of the design
element, subroutine, named block, or labeled statement that invokes
the system task containing the format specifier. This is useful when
there are many instances of the module that calls the system task.
Here is an example:
module top;
buff b0 (.buf_in(1'b0), .buf_out());
endmodule
module buff (
input buf_in,
output buf_out
);
wire a;
inv i0 (.in(buf_in), .out(a ));
inv i1 (.in(a ), .out(buf_out));
initial $display("Inside hierarchy %m");
endmodule
module inv (
input in,
output out
);
assign out = ~in;
initial $display("Inside hierarchy %m");
endmodule
Outputs:
Inside hierarchy top.b0
Inside hierarchy top.b0.i0
Inside hierarchy top.b0.i1
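The quoted text also mentions subroutines and named blocks. A minimal sketch of those two cases follows; the names init_blk and report are hypothetical. With a standard-compliant simulator the two messages should end in top.init_blk and top.report respectively.

```systemverilog
module top;
  // %m inside a task reports the task's own hierarchical name.
  task report;
    begin
      $display("Inside task %m");
    end
  endtask

  // %m inside a named block reports the block's hierarchical name.
  initial begin : init_blk
    $display("Inside named block %m");
    report;
  end
endmodule
```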

Using real parameter to determine counter sizes

I am trying to make my debounce code more modular by passing in parameters that are the frequency and the desired bounce time to eliminate button/switch bounce. This is how I approached it:
module debounceCounter
#(
parameter CLOCK_FREQUENCY_Hz = 50_000_000,
parameter BOUNCE_TIME_s = 0.003
)
(
input wire sysClk, reset,
input wire i_async,
output reg o_sync
);
/* include tasks/functions */
`include "clog2.v"
/* constants */
parameter [(clog2(BOUNCE_TIME_s * CLOCK_FREQUENCY_Hz + 0.5) - 1) : 0]
MAX_COUNT = BOUNCE_TIME_s * CLOCK_FREQUENCY_Hz;
Synthesis using Xilinx ISE 14.7 Throws this error:
Xst:850 - "../../rtl/verilog/debounceCounter.v" line0: Unsupported real
constant
How can I get around this issue so that I can determine the counter size and max count value based on parameters being passed in from code above this module in the hierarchy? A majority of my code has sizes of variables and such determined by frequency generics, so not being able to use methods like those in VHDL has created problems in my designs.
Seems to work fine on Vivado 2016.3 (the oldest I have available). I think the problem is that ISE 14.7 is too old to support this. You didn't show the contents of the `include, but I'm assuming it's the one from AR# 44586. If so, it should take and return integers, and the real values will be rounded to the nearest integer for you when they are converted. Floating point arithmetic is fine to use in Verilog/SystemVerilog testbenches and parameters.
How can I get around this issue so that I can determine the counter
size and max count value based on parameters being passed in from code
above this module in the hierarchy?
Update to a recent version. 2017.1 or 2017.3 work well for me. I tested the following on 2016.3 and it also worked fine.
Try using SystemVerilog (.sv), which supports the $clog2() system function natively without the `include. Not sure when .sv started working, but it probably needs 2015+.
Verify that your version of clog2 in the clog2.v header matches the one in the code below.
NOTE: There is another pretty serious bug in the code you posted.
When you want to get the MSB required to hold a constant expression "x", the pattern should be $clog2((x)+1)-1. You have only added 0.5 instead of 1. Because a real value is rounded to the nearest integer when it is converted, this leaves too few bits whenever the result of the floating point expression "x" falls between (2^n - 0.5) and 2^n. For example, what you have erroneously computes the constant as 18'h0 instead of 19'h4_0000 for the frequency 87381333 (0.003 * 87381333 is about 262143.999, which rounds to 2^18), but it still appears to work for your example at 50 MHz. Murphy's law says you will accidentally fall into this narrow bad range at the worst possible time, but never during testing :).
For reference, this is what I tested, with the `include expanded inline:
`timescale 1ns / 1ps
module debounceCounter
#(
//parameter CLOCK_FREQUENCY_Hz = 50_000_000,
parameter CLOCK_FREQUENCY_Hz = 87381333, // whoops
parameter BOUNCE_TIME_s = 0.003
)
(
input wire sysClk, reset,
input wire i_async,
output reg o_sync
);
/* include tasks/functions */
//`include "clog2.v"
function integer clog2;
input integer value;
begin
value = value-1;
for (clog2=0; value>0; clog2=clog2+1)
value = value>>1;
end
endfunction
/* constants */
//parameter [(clog2(BOUNCE_TIME_s * CLOCK_FREQUENCY_Hz + 0.5) - 1) : 0] // <- BUG!!! 0.5 should be 1
parameter [(clog2(BOUNCE_TIME_s * CLOCK_FREQUENCY_Hz + 1) - 1) : 0]
MAX_COUNT = BOUNCE_TIME_s * CLOCK_FREQUENCY_Hz;
initial
$display("MAX_COUNT %d", MAX_COUNT);
endmodule
Type Real is not synthesizable. Draw/Create your design before you translate into/write HDL and you will realize this. Ask yourself, "What does a real synthesize to in gates?"
For those tools (e.g. Synplify) that do "support" Type Real, it is just a vendor interpretation, and as such is impossible to "support" since it is not defined as part of any HDL standard. The implication: If you had a simulator that interprets Type Real one way, and your synthesizer (likely) interprets it another way, you will get sim/syn mismatches. You may get away with them, depending on what you are trying to accomplish, but, it would still be considered poor design practice.
Behavioral code for modeling and for use in testbenches, as stated above, is a different story, as it is not synthesized.
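One way to sidestep real parameters altogether is to pass the same information as scaled integers. The sketch below is an assumption, not something taken from the answers above: the module name and the parameter names and units (kHz and microseconds) are hypothetical, and the product must stay within 32 bits.

```systemverilog
module debounce_width_demo #(
  parameter integer CLOCK_FREQUENCY_KHZ = 50_000, // assumed unit: kHz
  parameter integer BOUNCE_TIME_US      = 3_000   // assumed unit: microseconds
);
  // Same ceiling-log2 helper as in the answer above, usable as a constant function.
  function integer clog2 (input integer value);
    begin
      value = value - 1;
      for (clog2 = 0; value > 0; clog2 = clog2 + 1)
        value = value >> 1;
    end
  endfunction

  // kHz * us = cycles * 1000, so divide by 1000 to get clock cycles.
  localparam integer MAX_COUNT   = (CLOCK_FREQUENCY_KHZ * BOUNCE_TIME_US) / 1000;
  // Number of bits needed to hold every value from 0 to MAX_COUNT.
  localparam integer COUNT_WIDTH = clog2(MAX_COUNT + 1);

  reg [COUNT_WIDTH-1:0] count;

  initial
    $display("MAX_COUNT=%0d COUNT_WIDTH=%0d", MAX_COUNT, COUNT_WIDTH);
endmodule
```

With the default values this gives a MAX_COUNT of 150000 and an 18-bit counter. The trade-off is resolution: frequencies that are not a whole number of kHz have to be rounded by whoever sets the parameter.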

How does the synthesizer decide on the bit width for intermediate results?

Consider the following module:
module power(input [11-1:0] xi,xq,output [22-1:0] y);
assign y = xi*xi + xq*xq;
endmodule
I know that my single assignment is actually composed of 3 steps: 2 squares and one addition. My question is: how would the synthesizer decide on the bit width of the intermediate steps xi*xi and xq*xq?
I noticed that when running logic equivalence checking (LEC) on the above code, it causes trouble that could only be solved by decomposing the single assignment into three assignments as follows:
module power(input [11-1:0] xi,xq,output [22-1:0] yy);
wire [21-1:0] pi,pq;
assign pi = xi*xi;
assign pq = xq*xq;
assign yy = pi+pq;
endmodule
Here's how your simulator decides on the bit width for intermediate results.
Verilog Simulation
This expression - assign y = xi*xi + xq*xq; - is an example of a context determined expression. A Verilog simulator takes the widest of all the nets or variables in the expression and uses that. So, in your code, the widest is y at 22 bits wide, so Verilog will use 22 bits throughout.
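A small simulation sketch can make the effect of context visible. The names width_demo, full, and self are hypothetical. The operand of a concatenation is self-determined, so wrapping the product in braces forces it to be evaluated at the 11-bit width of its operands instead of the 22-bit width of the assignment target.

```systemverilog
module width_demo;
  reg [10:0] xi;
  reg [21:0] full, self;

  initial begin
    xi   = 11'd100;
    // Context-determined: the multiply is carried out at 22 bits,
    // so the full product 10000 is kept.
    full = xi * xi;
    // Self-determined inside a concatenation: the multiply is carried out
    // at 11 bits, so the product wraps modulo 2048 (10000 mod 2048 = 1808).
    self = {xi * xi};
    $display("full=%0d self=%0d", full, self);
  end
endmodule
```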
VHDL Simulation
The behaviour of a VHDL simulator depends on the package used. If you use the numeric_std package, as is recommended, then you would need to obey the following rules:
The width of the sum should be the same as the wider of the two operands.
The width of the product should be the sum of the widths of the operands.
Therefore, your code would compile if translated directly into VHDL:
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
entity power is
port (xi, xq : in signed(11-1 downto 0);
y : out signed(22-1 downto 0));
end entity power;
architecture A of power is
begin
y <= xi*xi + xq*xq;
end architecture A;
Shouldn't everything be signed?
Given the names of your module (power) and inputs (xi and xq) and having spent 25 years designing radio systems, shouldn't they be signed? Shouldn't your Verilog be:
module power(input signed [11-1:0] xi,xq,output signed [22-1:0] y);
assign y = xi*xi + xq*xq;
endmodule
That is why I chose the signed type from numeric_std, not the unsigned type.
Synthesis
Well, I've waffled on about simulators, but you asked about synthesis. And, to be frank, I don't know what a synthesiser would do. But, given the job of a synthesiser is to design a logic circuit that behaves exactly like the simulation, you would think that any self-respecting synthesiser would use the same bit-widths as the simulator. So, I'm pretty sure that's your answer.

Eliminating unused bits: creating synthesisable multidimensional arrays with different dimensions

This is a follow-on question from How can I iteratively create buses of parameterized size to connect modules also iteratively created?. The answer is too complex to fit in a comment and the solution may be helpful for other SO users. This question follows the self-answer format. Additional answers are encouraged.
The following code works and uses a two-dimensional array.
module Multiplier #(parameter M = 4, parameter N = 4)(
input [M-1:0] A, //Input A, size M
input [N-1:0] B, //Input B, size N
output [M+N-1:0] P ); //Output P (product), size M+N
wire [M+N-1:0] PP [N-1:0]; // Partial Product array
assign PP[0] = { {N{1'b0}} , { A & {M{B[0]}} } }; // Pad upper bits with 0s
assign P = PP[N-1]; // Product
genvar i;
generate
for (i=1; i < N; i=i+1)
begin: addPartialProduct
wire [M+i-1:0] gA,gB,gS; wire Cout;
assign gA = { A & {M{B[i]}} , {i{1'b0}} };
assign gB = PP[i-1][M+i-1:0];
assign PP[i] = { {(N-i){1'b0}}, Cout, gS}; // Pad upper bits with 0s
RippleCarryAdder#(M+i) adder( .A(gA), .B(gB), .S(gS), .Cin(1'b0), .* );
end
endgenerate
endmodule
Some of the bits are never used, such as PP[0][M+N-1:M+1]. A synthesizer will usually remove these bits during optimization and possibly give a warning. Some synthesizers are not advanced enough to do this correctly. To work around this, the designer must implement extra logic. In this example the parameter for all the RippleCarryAdders would be set to M+N. The extra logic wastes area and potentially degrades performance.
How can the unused bits be safely eliminated? Can multidimensional arrays with different dimensions be used? Will the end code be readable and debug-able?
Can multidimensional arrays with different dimensions be used?
Short answer, NO.
Verilog does not support uniquely sized multidimensional arrays. SystemVerilog does support dynamic arrays; however, these cannot be connected to module ports and cannot be synthesized.
Embedded code (such as Perl's EP3, Ruby's eRuby/ruby_it, Python's prepro, etc.) can generate custom dimensional arrays and code iterations, but the parameters must be hard coded before compile. The final value of any parameter of a given instance is discovered during compile time, well after the embedded script is run. The parameter must be treated as a global constant, therefore Multiplier#(4,4) and Multiplier#(8,8) cannot exist in the same project unless you teach the script how to extract the full hierarchy and parameters of the project. (Good luck coding and maintaining that.)
How can the unused bits be safely eliminated?
If the synthesizer is not advanced enough to exclude unused bits on its own, then the bits can be optimized by flattening the multidimensional array into a one-dimensional array with intelligent part-selects. The trick is finding the equation, which can be achieved by following these steps:
Find the pattern of the lsb index for each part-select:
Assuming M is 4, the lsb indexes of the part-selects are 0, 5, 11, 18, 26, 35, .... Plug this pattern into WolframAlpha to find the equation a(n) = (n-1)*(n+8)/2.
Repeat with M equal to 3 for the pattern 0, 4, 9, 15, ... to get equation a(n)=(n-1)*(n+6)/2
Repeat again with M equal to 5 for the pattern 0, 6, 13, 21, 30, ... to get equation a(n)=(n-1)*(n+10)/2.
Since the relation of M and N is linear (i.e. multiple; no exponential, logarithmic, etc.), only two equations are needed to create a variable parameter M equation. For non-linear equations more data-point equations are recommended. In this case note that for M=3,4,5 the pattern (n+6),(n+8),(n+10), therefore the generic equation can be derived to: lsb(n)=(n-1)*(n+2*M)/2
Find the pattern of the msb index for each part-select:
Use the same process as for finding the lsb (it ends up being msb(n)=(n**2+(M*2+1)*n-2)/2). Or define the msb in terms of the lsb: msb(n)=lsb(n+1)-1
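The fitted equations can also be cross-checked by hand, assuming slice k of the flattened vector is M+k bits wide (this matches the patterns listed above; for M=4 the widths are 5, 6, 7, ...). The lsb of slice n is then the total width of all slices before it:

```latex
\begin{aligned}
\mathrm{lsb}(n) &= \sum_{k=1}^{n-1} (M+k)
                 = (n-1)M + \frac{(n-1)\,n}{2}
                 = \frac{(n-1)(n+2M)}{2},\\[4pt]
\mathrm{msb}(n) &= \mathrm{lsb}(n+1) - 1
                 = \frac{n\,(n+2M+1)}{2} - 1
                 = \frac{n^{2} + (2M+1)\,n - 2}{2}.
\end{aligned}
```

For M=4 and n=3 this gives lsb(3) = 2*11/2 = 11 and msb(3) = lsb(4) - 1 = 17, a slice of 7 bits.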
IEEE std 1364-2001 (Verilog 2001) introduced macros with arguments and indexed part-select; see § 19.3.1 '`define' and § 4.2.1 'Vector bit-select and part-select addressing' respectively. Or see IEEE std 1800-2012 § 22.5.1 '`define' and § 11.5.1 'Vector bit-select and part-select addressing' respectively. This answer assumes that these features are supported by the SO's simulator and synthesizer, since the generate keyword was also introduced in IEEE std 1364-2001, see § 12.1.3 'Generated instantiation' (and IEEE std 1800-2012 § 27. 'Generate constructs'). For tools that do not fully support IEEE std 1364-2001, see `ifdef examples provided here.
Since the functions to calculate the part-select ranges are frequently used, use `define macros with arguments. This will help prevent copy/paste bugs. The extra sets of () in the macro definitions are there to ensure proper order of operations. It is also a good idea to `undef the macros at the end of the module definition, preventing the global space from getting polluted. A flattened array can be challenging to debug. By defining pass-through connections within the generate block's for-loop, the signals become readable and can be probed in the waveform.
module Multiplier #(parameter M = 4, parameter N = 4)(
input [M-1:0] A, //Input A, size M
input [N-1:0] B, //Input B, size N
output [M+N-1:0] P ); //Output P (product), size M+N
// global space macros
`define calc_pp_lsb(n) (((n)-1)*((n)+2*M)/2)
`define calc_pp_msb(n) (`calc_pp_lsb(n+1)-1)
`define calc_pp_range(n) `calc_pp_lsb(n) +: (M+n)
wire [`calc_pp_msb(N):0] PP; // Partial Product
assign PP[`calc_pp_range(1)] = { 1'b0 , { A & {M{B[0]}} } };
assign P = PP[`calc_pp_range(N)]; // Product
genvar i;
generate
for (i=1; i < N; i=i+1)
begin: addPartialProduct
wire [M+i-1:0] gA,gB,gS; wire Cout;
assign gA = PP[`calc_pp_range(i)];
assign gB = { A & {M{B[i]}} , {i{1'b0}} };
assign PP[`calc_pp_range(i+1)] = {Cout,gS};
RippleCarryAdder#(M+i) adder( .A(gA), .B(gB), .S(gS), .Cin (1'b0), .* );
end
endgenerate
// Cleanup global space
`undef calc_pp_range
`undef calc_pp_msb
`undef calc_pp_lsb
endmodule
Working example with side-by-side and test bench: http://www.edaplayground.com/s/6/591
Will the end code be readable and debug-able?
Yes, for anyone who has already learned how to properly use the generate construct. The generate block's for-loop defines local wires which are confined to the scope of the loop index. gA from loop-0 and gA from loop-1 are unique signals and cannot interact with each other. The local signals can be probed in the waveform, which is great for debugging.

Resources