How to dynamically reverse the bit position in Verilog?

wire [9:0] data_reg;
reg [3:0] Reverse_Count = 8; //This register is derived in logic and I need to use it in following logic in order to reverse the bit position.
assign data_reg[9:0] = 10'h88; // Data Register
genvar i;
for (i=0; i< Reverse_Count; i=i+1)
assign IReg_swiz[i] = IReg[Reverse_Count - 1 -i];
This generates a syntax error. How can I do this in Verilog?

If Reverse_Count were a constant, the task would boil down to a wire mix-up, which is essentially free in HDL.
In your case, the task reduces nicely to first mirroring the full-width data and then shifting right, by an amount derived from Reverse_Count, to bring the LSB back to its position. The shift itself is implemented as a row of N-to-1 multiplexers.
integer i;
reg [9:0] reversed;
wire [9:0] result;
// mirror bits in wide 10-bit value
always @*
for(i=0;i<10;i=i+1)
reversed[i] = data_reg[9-i];
// settle LSB on its place
assign result = reversed>>(10-Reverse_Count);
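The mirror-then-shift idea can be checked in simulation with a small self-contained sketch. The module and testbench names (reverse_low_bits, reverse_low_bits_tb) and port names are hypothetical, and the count input is assumed to stay in the range 0 to 10.

```verilog
// Sketch: reverse the low `count` bits of a 10-bit word by mirroring,
// then shifting. Names are illustrative; count is assumed to be 0..10.
module reverse_low_bits (
    input  wire [9:0] data_in,
    input  wire [3:0] count,     // number of low bits to reverse
    output wire [9:0] result
);
    integer i;
    reg [9:0] mirrored;

    // Constant-index mirror: this is only wiring after synthesis.
    always @* begin
        for (i = 0; i < 10; i = i + 1)
            mirrored[i] = data_in[9 - i];
    end

    // The variable right shift drops the bits that sat above the
    // reversed field and brings the field's LSB back to bit 0.
    assign result = mirrored >> (10 - count);
endmodule

module reverse_low_bits_tb;
    reg  [9:0] data_in;
    reg  [3:0] count;
    wire [9:0] result;

    reverse_low_bits dut (.data_in(data_in), .count(count), .result(result));

    initial begin
        data_in = 10'h088;   // low 8 bits are 1000_1000
        count   = 4'd8;
        #1;
        // Reversing 1000_1000 gives 0001_0001, i.e. 10'h011.
        if (result !== 10'h011) $display("FAIL: result=%h", result);
        else                    $display("PASS: result=%h", result);
        $finish;
    end
endmodule
```

Note that in this form the bits above the reversed field come out as zero, because the shift fills from the left with zeros.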

Reverse_Count is not a constant, i.e. it is not a parameter or localparam.
This means the generate loop would have to create and destroy hardware on demand. Verilog does not allow this, because it is not possible in hardware.
The bus that you are reversing must have a fixed width at compile time, so it should be possible to declare Reverse_Count as a parameter.

Since the value of Reverse_Count is dynamic, you cannot use a generate statement. You can use an always block with a for-loop instead. To be synthesizable, the for-loop must be statically unrollable, so its bounds must be constants. To decide which bits to reverse, use an if condition that compares the loop index with Reverse_Count.
Example:
parameter MAX = 10;
reg [MAX-1:0] IReg_swiz;
integer i;
always @* begin
for (i=0; i < MAX ; i=i+1) begin
if (i < Reverse_Count) begin
IReg_swiz[i] = IReg[Reverse_Count - 1 -i];
end
else begin
// All bits need to be assigned or complex latching logic will be inferred.
IReg_swiz[i] = IReg[i]; // Other values okay depending on your requirements.
end
end
end

Related

What is a good way to write a 10-bit decoder?

I am trying to write a 10-bit binary-to-thermometer decoder.
For a 4-bit decoder, it is relatively straightforward, as shown below. However, for 10 bits, is there a smarter way to do it than writing 1000 lines of code?
module decoder(in,out);
input [3:0] in;
output [15:0] out;
// input enable;
reg [15:0] out;
always @(in) begin
casez(in)
4'h1: out=16'b0000000000000001;
4'h2: out=16'b0000000000000011;
4'h3: out=16'b0000000000000111;
4'h4: out=16'b0000000000001111;
4'h5: out=16'b0000000000011111;
4'h6: out=16'b0000000000111111;
4'h7: out=16'b0000000001111111;
4'h8: out=16'b0000000011111111;
4'h9: out=16'b0000000111111111;
4'hA: out=16'b0000001111111111;
4'hB: out=16'b0000011111111111;
4'hC: out=16'b0000111111111111;
4'hD: out=16'b0001111111111111;
4'hE: out=16'b0011111111111111;
4'hF: out=16'b0111111111111111;
default: out=16'h0000;
endcase
end
endmodule
Yes, you could make this module fully parametrizable by using an unrollable for-loop. For each bit of the out signal, the loop checks whether the loop index is still smaller than the binary input value.
The code would look like this:
module decoder #(
parameter IN_W = 10,
parameter OUT_W = 1 << IN_W
)
(
input [IN_W-1:0] in,
output reg [OUT_W-1:0] out
);
integer i;
always @* begin
// Use an unrollable loop.
for (i = 0; i < OUT_W; i = i + 1) begin
// (i < in) returns a 1-bit value: bit i is set while i is below the input
out[i] = (i < in);
end
end
endmodule
As you probably noticed, I also changed the way the ports are declared. In Verilog 2001—and also in more recent (System)Verilog versions—the port list and port declaration may be combined. This newer syntax, also known as ANSI-style, has the benefit that you don't need to add as much boilerplate code.
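To see that a loop-based decoder reproduces the hand-written 4-bit table, a small self-checking testbench helps. The sketch below restates the decoder under the hypothetical name thermo_decoder so that it is self-contained, and it assumes the intended encoding is "the low in bits are set", which is what the case table in the question shows.

```verilog
// Loop-based binary-to-thermometer decoder, restated for a standalone test.
// The module name thermo_decoder is illustrative.
module thermo_decoder #(
    parameter IN_W  = 4,
    parameter OUT_W = 1 << IN_W
) (
    input      [IN_W-1:0]  in,
    output reg [OUT_W-1:0] out
);
    integer i;
    always @* begin
        for (i = 0; i < OUT_W; i = i + 1)
            out[i] = (i < in);   // bit i is set while i is below the input
    end
endmodule

module thermo_decoder_tb;
    reg  [3:0]  in;
    wire [15:0] out;
    integer n;
    integer errors;

    thermo_decoder #(.IN_W(4)) dut (.in(in), .out(out));

    initial begin
        errors = 0;
        for (n = 0; n < 16; n = n + 1) begin
            in = n;
            #1;
            // Expected thermometer code: the low n bits set, i.e. 2^n - 1.
            if (out !== ((16'd1 << n) - 16'd1)) begin
                errors = errors + 1;
                $display("FAIL: in=%0d out=%b", in, out);
            end
        end
        if (errors == 0) $display("PASS");
        $finish;
    end
endmodule
```

Setting IN_W to 10 gives the 1024-bit output asked about, with no change to the loop body.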

How can you output a constant value in Verilog?

I am trying to output a 1280-bit vector made up of 128 fields, each 10 bits wide, holding the numbers 0 to 127.
I heard localparam may be the best option, but it seems like a strange request, so I'm wondering if anyone with experience may be able to help me.
Thanks
You can create a function that provides a constant value to a localparam or any other signal.
wire [1279:0] signal;
assign signal = pattern(0);
function [1279:0] pattern(input arg); // Verilog requires at least one argument to a function
integer i;
begin
for (i=0;i<128;i=i+1)
pattern[i*10 +:10] = i;
end
endfunction
SystemVerilog:
wire [1279:0] signal;
assign signal = pattern();
function bit [1279:0] pattern();
for (int i=0;i<128;i++)
pattern[i*10 +:10] = i;
endfunction
You can use a for-loop in an initial or reset statement:
reg [0:1279] big_vector;
integer i;
// here you need an initial
// or a reset section
for (i=0; i<128; i=i+1)
big_vector[i*10 +: 10] = i; // each entry is 10 bits wide, so the stride is 10
If you do not touch/change big_vector the synthesis tool will convert it to a constant.
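Since the question mentions localparam, the function-based approach can also be used to initialise one. The following is a minimal sketch, assuming a simulator or synthesis tool that supports Verilog-2001 constant functions; the names pattern_const_tb, PATTERN and dummy are made up for illustration.

```verilog
// Sketch: a constant function feeding a localparam, plus a self-check.
module pattern_const_tb;
    // Field i (bits [i*10 +: 10]) holds the value i, for i = 0..127.
    function [1279:0] pattern(input dummy);   // dummy argument is unused
        integer i;
        begin
            for (i = 0; i < 128; i = i + 1)
                pattern[i*10 +: 10] = i;
        end
    endfunction

    // Evaluated at elaboration time, so PATTERN is a true constant.
    localparam [1279:0] PATTERN = pattern(1'b0);

    integer k;
    integer errors;

    initial begin
        errors = 0;
        for (k = 0; k < 128; k = k + 1)
            if (PATTERN[k*10 +: 10] !== k) begin
                errors = errors + 1;
                $display("FAIL: field %0d = %0d", k, PATTERN[k*10 +: 10]);
            end
        if (errors == 0) $display("PASS");
        $finish;
    end
endmodule
```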

Will a temp variable in always_comb create a latch?

I have the following code snippet, where a temp variable is used to count the number of 1s in an array:
// count the number 1s in array
logic [5:0] count_v; //temp
always_comb begin
count_v = arr[0];
if (valid) begin
for (int i=1; i<=31; i++) begin
count_v = arr[i] + count_v;
end
end
final_count = count_v;
end
Will this logic create a latch for count_v? Is the synthesis tool smart enough to synthesize this logic properly? I am struggling to find any coding recommendations for this kind of scenario.
Another example:
logic temp; // temp variable
always_comb begin
temp = 0;
for (int i=0; i<32; i++) begin
if (i>=start) begin
out_data[temp*8 +: 8] = in_data[i*8 +: 8];
temp = temp + 1'b1;
end
end
end
An always block in which every variable receives a deterministic initial assignment before it is read will not generate a latch. The exception is a combinational loop.
Sorry Eddy Yau, we seem to have some discussions going on regarding your post.
Here is some example code:
module latch_or_not (
input cond,
input [3:0] v_in,
output reg latch,
output reg [2:0] comb1,
output reg [2:0] comb2
);
reg [2:0] temp;
reg [2:0] comb_loop;
// Make a latch
always @( * )
if (cond)
latch = v_in[0];
always @( * )
begin : aw1
integer i;
for (i=0; i<4; i=i+1)
comb_loop = comb_loop + v_in[i];
comb2 = comb_loop;
end
always @( * )
begin : aw2
integer i;
temp = 7;
for (i=0; i<4; i=i+1)
temp = temp - v_in[i];
comb1 = temp;
end
endmodule
This is what came out of it according to the Xilinx Vivado tool after elaboration:
The 'latch' output is obvious. You will also notice that temp is not present in the end result.
The 'comb_loop' is not a latch but even worse: it is a combinatorial loop. The output of the logic goes back to the input. A definite no-no!
General rule: if you read a variable before writing to it, then your code implies memory of some sort. In this case, both the simulator and synthesiser have to implement storage of a previous value, so a synthesiser will give you a register or latch. Both your examples write to the temporary before reading it, so no storage is implied.
Does it synthesise? Try it and see. I've seen lots of this sort of thing in production code, and it works (with the synths I've used), but I don't do it myself. I would try it, see what logic is created, and use that to decide whether you need to think more about it. Counting set bits is easy without a loop, but the count loop will almost certainly work with your synth. The second example may be more problematic.
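The write-before-read rule can be illustrated with a Verilog-2001 version of the bit counter. This is only a sketch: the module name count_ones is hypothetical, and unlike the question's code it assumes the count should be 0 when valid is low, so the temporary gets an unconditional default of zero.

```verilog
// Sketch: a temporary that is always written before it is read,
// so no latch or register is implied for it.
module count_ones (
    input  wire        valid,
    input  wire [31:0] arr,
    output reg  [5:0]  final_count
);
    integer i;
    reg [5:0] count_v;   // temporary

    always @* begin
        count_v = 6'd0;                  // unconditional default comes first
        if (valid) begin
            for (i = 0; i < 32; i = i + 1)
                count_v = count_v + arr[i];
        end
        final_count = count_v;           // assigned on every path
    end
endmodule

module count_ones_tb;
    reg         valid;
    reg  [31:0] arr;
    wire [5:0]  final_count;
    integer errors;

    count_ones dut (.valid(valid), .arr(arr), .final_count(final_count));

    initial begin
        errors = 0;
        valid = 1'b1; arr = 32'hFFFF_FFFF; #1;
        if (final_count !== 6'd32) errors = errors + 1;
        arr = 32'h0000_00F1; #1;          // 1111_0001 has five bits set
        if (final_count !== 6'd5) errors = errors + 1;
        valid = 1'b0; #1;                 // default value when not valid
        if (final_count !== 6'd0) errors = errors + 1;
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```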

Multiplying two 32 bit numbers using 32 bit carry look ahead adder

I have tried to write Verilog code to multiply two 32-bit binary numbers using a 32-bit carry look-ahead adder, but my program fails to compile. ModelSim keeps reporting the error "the generate if condition must be a constant expression" for the parts 'if(store[0]==1)' and 'if(C[32]==1)'.
This is the algorithm that I followed:
Begin Program
  Multiplier = 32 bits
  Multiplicand = 32 bits
  Register = 64 bits
  Put the multiplier in the least significant half and clear the most significant half
  For i = 1 to 32
  Begin Loop
    If the least significant bit of the 64-bit register contains binary '1'
    Begin If
      Add the Multiplicand to the Most Significant Half using the CLAA
      Begin Adder
        C[0] = '0'
        For j = 0 to 31
        Begin Loop
          Calculate Propagate P[j] = Multiplicand[j] ^ Most Significant Half[j]
          Calculate Generate G[j] = Multiplicand[j] · Most Significant Half[j]
          Calculate Carries C[i + 1] = G[i] + P[i] · C[i]
          Calculate Sum S[i] = P[i] ^ C[i]
        End Loop
      End Adder
      Shift the 64-bit Register one bit to the right throwing away the least significant bit
    Else
      Only Shift the 64-bit Register one bit to the right throwing away the least significant bit
    End If
  End Loop
  Register = Sum of Partial Products
End Program
Code:
module Multiplier_32(multiplier,multiplicand,store);
output store;
input [31:0]multiplier,multiplicand;
wire [63:0]store;
genvar i,j;
wire g=32;
wire [31:0]P,G,sum;
wire [32:0]C;
assign store[31:0]=multiplier;
generate for(i=0;i<32;i=i+1)
begin
if(store[0]==1)
begin
assign C[0]=0;
for(j=0;j<32;j=j+1)
begin
assign P[j]= multiplicand[j]^store[g];
assign G[j]=multiplicand[j]&store[g];
assign C[j+1]=G[i]|(P[i]&C[j]);
assign sum[j]=P[i]^C[j];
assign g=g-1;
end
assign store[63:32]=sum[31:0];
if(C[32]==1)
begin
assign store[62:0]=store[63:1];
assign store[63]=1;
end
else
begin
assign store[62:0]=store[63:1];
assign store[63]=0;
end
end
else
begin
assign store[62:0]=store[63:1];
assign store[63]=0;
end
end
endgenerate
endmodule
A generate block is evaluated at compile/elaboration time. It is used to construct hardware from patterns, not to evaluate logic. The values of store[0], C[32], and all other signals are unknown at that time. The only known values are parameters and genvars.
In this case, a combinational block (always @*) will fulfill your functional requirements. Replace all your wire declarations with reg, put all your assignments inside an always @*, and remove all the assign keywords (assign should not be used inside an always block).
module Multiplier_32(
input [31:0] multiplier, multiplicand,
output reg [63:0] store
);
integer i,j;
integer g;
reg [31:0] P,G,sum;
reg [32:0] C;
always @* begin
g = 32;
store[31:0]=multiplier;
for(i=0;i<32;i=i+1) begin
// your code here, do not use 'assign'
end
end
endmodule
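For reference, here is one way the loop body could be filled in. This is a sketch rather than the asker's intended design: it uses the built-in + operator with a 33-bit result in place of a hand-written carry look-ahead adder, and the names shift_add_mult32 and shift_add_mult32_tb are made up for illustration.

```verilog
// Sketch: combinational shift-and-add multiplier in an always @* block.
// The built-in + stands in for the carry look-ahead adder.
module shift_add_mult32 (
    input  wire [31:0] multiplier,
    input  wire [31:0] multiplicand,
    output reg  [63:0] store
);
    integer i;
    reg [32:0] sum;   // 33 bits so the adder's carry-out is kept

    always @* begin
        // Multiplier in the low half, high half cleared.
        store = {32'd0, multiplier};
        for (i = 0; i < 32; i = i + 1) begin
            if (store[0])
                sum = {1'b0, store[63:32]} + {1'b0, multiplicand};
            else
                sum = {1'b0, store[63:32]};
            // Shift right by one; the carry-out becomes the new top bit.
            store = {sum, store[31:1]};
        end
    end
endmodule

module shift_add_mult32_tb;
    reg  [31:0] a, b;
    wire [63:0] product;
    integer errors;

    shift_add_mult32 dut (.multiplier(a), .multiplicand(b), .store(product));

    task check;
        input [31:0] x;
        input [31:0] y;
        begin
            a = x; b = y;
            #1;
            // Reference result from the built-in 64-bit multiply.
            if (product !== ({32'd0, x} * {32'd0, y})) begin
                errors = errors + 1;
                $display("FAIL: %h * %h = %h", x, y, product);
            end
        end
    endtask

    initial begin
        errors = 0;
        check(32'd3, 32'd5);
        check(32'd0, 32'hDEAD_BEEF);
        check(32'hFFFF_FFFF, 32'hFFFF_FFFF);
        check(32'h1234_5678, 32'h9ABC_DEF0);
        if (errors == 0) $display("PASS");
        $finish;
    end
endmodule
```

Because the loop unrolls into 32 chained adders, the synthesized logic is large and slow compared with a registered, one-step-per-clock version, which may matter on a real device.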

Why is adding one operation causing my number of logic elements to skyrocket?

I'm designing a 464 order FIR filter in Verilog for use on the Altera DE0 FPGA. I've got (what I believe to be) a working implementation; however, there's one small issue that's really actually given me quite a headache. The basic operation works like this: A 10 bit number is sent from a micro controller and stored in datastore. The FPGA then filters the data, and lights LED1 if the data is near 100, and off if it's near 50. LED2 is on when the data is neither 100 nor 50, or the filter hasn't filled the buffer yet.
In the specification, the coefficients (which have been pre provided), have been multiplied by 2^15 in order to represent them as integers. Therefore, I need to divide my final output Y by 2^15. I have implemented this using a shift, since it should be (?) the most efficient way. However, this single line causes my number of logic elements to jump from ~11,000 without it, to over 35,000. The Altera DE0 uses a Cyclone III FPGA which only has room for about 15k logic elements. I've tried doing it inside both combinational and sequential logic blocks, both of which have the same exact issue.
Why is this single, seemingly simple operation causing such an inflation in logic elements? I'll include my code, which I'm sure isn't the most efficient, nor the cleanest. I don't care about optimizing this design for performance or area/density at all. I just want to be able to fit it onto the FPGA so it'll run. I'm not very experienced in HDL design, and this is by far the most complex project I've needed to tackle. It's worth noting that I do not remove y completely; I replace the "bad" line with assign YY = y;.
Just as a note: I haven't included all of the coefficients, for sanity's sake. I know there might be a better way to do it than using case statements, but it's the way that it came and I don't really want to relocate 464 elements to a parameter declaration, etc.
module lab5 (LED1, LED2, handshake, reset, data_clock, datastore, bit_out, clk);
// NUMBER OF COEFFICIENTS (465)
// (Change this to a small value for initial testing and debugging,
// otherwise it will take ~4 minutes to load your program on the FPGA.)
parameter NUMCOEFFICIENTS = 465;
// DEFINE ALL REGISTERS AND WIRES HERE
reg [11:0] coeffIndex; // Coefficient index of FIR filter
reg signed [16:0] coefficient; // Coefficient of FIR filter for index coeffIndex
reg signed [16:0] out; // Register used for coefficient calculation
reg signed [31:0] y;
wire signed [7:0] YY;
reg [9:0] xn [0:464]; // Integer array for holding x
integer i;
output reg LED1, LED2;
// Added values from part 1
input reset, handshake, clk, data_clock, bit_out;
output reg [9:0] datastore;
integer k;
reg sent;
initial
begin
sent = 0;
i=0;
datastore = 10'b0000000000;
y=0;
LED1 = 0;
LED2 = 0;
for (i=0; i<NUMCOEFFICIENTS; i=i+1)
begin
xn[i] = 0;
end
end
always @(posedge data_clock)
begin
if(handshake)
begin
if(bit_out)
begin
datastore = datastore >> 1;
datastore [9] = 1;
end
else
begin
datastore = datastore >> 1;
datastore [9] = 0;
end
end
end
always @(negedge clk)
begin
if (!handshake )
begin
if(!sent)
begin
y=0;
for (i=NUMCOEFFICIENTS-1; i > 0; i=i-1) //shifts coeffecients
begin
xn[i] = xn[i-1];
end
xn[0] = datastore;
for (i=0; i<NUMCOEFFICIENTS; i=i+1)
begin
// Calculate coefficient based on the coeffIndex value. Note that coeffIndex is a signed value!
// (Note: These don't necessarily have to be blocking statements.)
case ( 464-i )
12'd0: out = 17'd442; // This coefficient should be multiplied with the oldest input value
12'd1: out = -17'd373;
12'd2: out = -17'd169;
...
12'd463: out = -17'd373; //-17'd373
12'd464: out = 17'd442; //17'd442
// This coefficient should be multiplied with the most recent data input
// This should never occur.
default: out = 17'h0000;
endcase
y = y + (out * xn[i]);
end
sent = 1;
end
end
else if (handshake)
begin
sent = 0;
end
end
assign YY = (y>>>15); //THIS IS THE LINE THAT IS CAUSING THE ISSUE!
always @(YY)
begin
LED1 = 0;
LED2 = 1;
if ((YY >= 40) && (YY <= 60))
begin
LED1 <= 0;
LED2 <= 0;
end
if ((YY >= 90) && (YY <= 110))
begin
LED1 <= 1;
LED2 <= 0;
end
end
endmodule
You're almost certainly seeing the effects of synthesis optimisation.
The following line is the only place that uses y:
assign YY = (y>>>15); //THIS IS THE LINE THAT IS CAUSING THE ISSUE!
If you remove this line, all the logic that feeds into y (including out and xn) will be removed. On Altera you want to look carefully through your map report which will contain (buried amongst a million other things) information about all the logic that Quartus has removed and the reason behind it.
Good places to start are the Port Connectivity Checks, which will tell you if any inputs or outputs are stuck high or low or are dangling. Then look through the Registers Removed During Synthesis section and Removed Registers Triggering Further Register Optimizations.
You can try to force Quartus not to remove redundant logic by using the following in your QSF:
set_instance_assignment -name preserve_fanout_free_node on -to reg
set_instance_assignment -name preserve_register on -to foo
In your case however it sounds like the correct solution is to re-factor the code rather than try to preserve redundant logic. I suspect you want to investigate using an embedded RAM to store the coefficients.
(In addition to Chiggs' answer, assuming that you are hooking up YY correctly ....)
I would add that you don't need >>>. It would be simpler to write:
assign YY = y[22:15];
And BTW, initial blocks are ignored for synthesis. So, you want to move that initialization to the respective always blocks in an if (reset) or if (handshake) section.
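That the bit-select gives the same 8-bit result as the arithmetic shift can be confirmed with a short simulation. The sketch below uses hypothetical names (shift_vs_select_tb, yy_shift, yy_select) and only a handful of hand-picked values, so it is a sanity check rather than a proof.

```verilog
// Sketch: compare (y >>> 15) truncated to 8 bits against y[22:15].
module shift_vs_select_tb;
    reg  signed [31:0] y;
    wire signed [7:0]  yy_shift;
    wire signed [7:0]  yy_select;
    integer errors;

    assign yy_shift  = (y >>> 15);   // arithmetic shift, then truncation
    assign yy_select = y[22:15];     // the same eight bits picked directly

    task check;
        input signed [31:0] value;
        begin
            y = value;
            #1;
            if (yy_shift !== yy_select) begin
                errors = errors + 1;
                $display("FAIL: y=%0d shift=%0d select=%0d",
                         y, yy_shift, yy_select);
            end
        end
    endtask

    initial begin
        errors = 0;
        check(32'sd100 * 32'sd32768);               // 100 scaled by 2^15
        check(32'sd50 * 32'sd32768 + 32'sd12345);   // 50 plus a fraction
        check(-32'sd1);
        check(-32'sd3276800);                       // -100 scaled by 2^15
        if (errors == 0) $display("PASS");
        $finish;
    end
endmodule
```

Either form costs no logic by itself, since a shift by a constant is only wiring.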
