This may seem like a rather stupid question, but the transition from software to HDLs is sometimes rather frustrating initially!
My problem: I have an array multiplication I am trying to accomplish in Verilog. This is a multiplication of two arrays (point by point) which are of length 200 each. The following code worked fine in the testbench:
for (k=0; k<200; k=k+1)
result <= result + A[k] * B[k];
But it doesn't even come close to working in the Verilog module. I thought the reason was because the operation should take place over many many clock cycles. Since it involves writing out 200 multiplications and 199 additions if I do it by hand (!), I was wondering if there was a trick in making the for loop work (and be synthesizable)?
Thanks,
Faisal.
You don't want to use a for loop there; you want to use a block of clocked logic. For loops are only for describing parallel structures that don't feed back on themselves. They are only rarely useful, and not for the same kinds of things you would use them for in a software program.
To do what you're trying to achieve, you should have a block like so:
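always @(posedge clk or posedge reset) begin
  if (reset) begin
    result <= 0;
    k <= 0;
    result_done <= 0;
  end else begin
    // Accumulate one product per clock until all 200 terms are summed.
    result <= result_done ? result : (result + A[k] * B[k]);
    k <= result_done ? k : k + 1;
    // Assert done while the last term (k == 199) is being added, so that
    // the out-of-range product A[200] * B[200] is never accumulated.
    result_done <= result_done ? 1 : (k == 199);
  end
end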
This zeros the result on reset, adds A[k] * B[k] to a sum for 200 clocks, and then stops counting when k == 200 and asserts a 'done' signal.
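For contrast, a for loop does synthesize when it describes purely parallel hardware, with no notion of "one iteration per clock". The following is only a minimal sketch under assumptions: the module name, the flattened buses a_flat and b_flat, and the parameters N and W are hypothetical and not part of the original question.

```verilog
// Hypothetical module: names, widths and N are assumptions for illustration.
module dot_product_comb #(
    parameter N = 4,   // number of element pairs (kept small on purpose)
    parameter W = 8    // bit width of each element
) (
    input  wire [N*W-1:0] a_flat,  // A[0] in bits [W-1:0], A[1] in [2*W-1:W], ...
    input  wire [N*W-1:0] b_flat,  // same layout for B
    output reg  [2*W+7:0] result   // 2*W product bits plus 8 guard bits for the sum
);
    integer k;

    always @* begin
        result = 0;  // default value, so no latch is inferred
        // The loop bound is a constant, so the loop unrolls into
        // N multipliers feeding a chain of adders.
        for (k = 0; k < N; k = k + 1)
            result = result + a_flat[k*W +: W] * b_flat[k*W +: W];
    end
endmodule
```

With N = 200 this would ask for 200 multipliers in parallel, which is why the clocked, one-product-per-cycle version is usually the more practical choice.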
Related
Now I know that in Verilog, to describe sequential logic you would almost always use the non-blocking assignment (<=) in an always block. But does this rule also apply to internal variables? If blocking assignments were used for internal variables in an always block, would that make it combinational or sequential logic?
So, for example, I'm trying to code a sequential prescaler module. Its output will only be a positive pulse of one clk period duration. It'll have a parameter value that will be the prescaler (how many clock cycles to divide the clk by) and a counter variable to keep track of it.
I have count's assignments as blocking assignments but the output, q, as non-blocking. For simulation purposes, the code works; the output of q is just the way I want it to be. If I change the assignments to be non-blocking, the output of q only works correctly for the first cycle of the parameter length, and then stays 0 forever for some reason (this might be because of the way it's coded, but I can't seem to think of another way to code it). So is the code, as it is right now, behaving as combinational or sequential logic? Is this an acceptable thing to do in the industry? And is it synthesizable?
```
module scan_rate2(q, clk, reset_bar);
//I/O's
input clk;
input reset_bar;
output reg q;
//internal constants/variables
parameter prescaler = 8;
integer count = prescaler;
always @(posedge clk) begin
if(reset_bar == 0)
q <= 1'b0;
else begin
if (count == 0) begin
q <= 1'b1;
count = prescaler;
end
else
q <= 1'b0;
end
count = count - 1;
end
endmodule
```
You should follow the industry practice of using non-blocking assignments for all outputs of sequential logic. The only exception is temporary variables that help evaluate complex expressions in sequential logic, provided that they are used only in a single block.
In your case, using a blocking assignment for count will cause a mismatch with synthesis behavior. Synthesis will create flops for both q and count. However, with the blocking assignment, count is decremented immediately after it is assigned the prescaler value, whereas after synthesis that will happen only in the next cycle.
So you need a non-blocking assignment. BTW, initializing count within its declaration might work in FPGA synthesis, but does not work in schematic synthesis, so it is better to initialize it differently. Unless I misinterpreted your intent, it should look like the following.
integer count;
always @(posedge clk) begin
  if (reset_bar == 0) begin
    q <= 1'b0;
    count <= prescaler - 1;  // reload on reset
  end
  else begin
    if (count == 0) begin
      q <= 1'b1;             // one-clock pulse when the count expires
      count <= prescaler - 1;
    end
    else begin
      q <= 1'b0;
      count <= count - 1;
    end
  end
end
You do not need temp vars there, but for illustration it can be done as follows:
...
integer tmp;
always ...
else begin
q <= 1'b0;
tmp = count - 1; // you should use blocking here
count <= tmp; // but here you should still use NBA
end
I am trying to implement a module in my project for which I need the final value to be stable for a while, so I implemented it as below. Both versions show the same result in simulation. Will the tool generate the same hardware or different hardware?
always @(posedge clk) begin
if(en)
count <= count + 1;
else
begin
a <= count;
count <= 0;
end
if(count == 0) b <= a;
end
What is the difference between the coding style above and the one below? Does it make any difference during synthesis?
always @(posedge clk) begin
if(en)
count <= count + 1;
else
begin
a <= count;
count <= 0;
end
end
always @(posedge clk) begin
if(count == 0)
b <= a;
end
And I am using Vivado 2015.4 tool for synthesis.
It will generate the same hardware output. It doesn't matter if you split clocked statements into one or multiple always-statements, as long as they are functionally identical.
will the tool generate same hardware or different one?
Click "open elaborated design" in Vivado and see for yourself!
But what you'll find is: they're equivalent. No difference whatsoever.
I have two push buttons (using Basys2 rev C board) and I want to increment a register (counter) when I push one of them. I used this:
always @( posedge pb1 or posedge pb2 )
begin
if(count2==9) count2=0;
else count2= count2+1;
end
but when I implemented it (using ISE 9.2), an error appeared:
The logic for does not match a known FF or Latch template.
However when I tried it using just one event (posedge pb1), it worked.
So why did that happen?
The error message means that the target technology (which I am guessing in your case is an FPGA or CPLD) doesn't have the physical circuit required to implement the functionality you described with this behavioural code.
One of the important things to consider when writing synthesizable RTL (Verilog or VHDL) is that you are describing an electronic circuit. You should understand what real-world logic you are trying to implement (combinatorial logic, registers) before you start coding. In this case, you are describing a register with two separate clocks, something that doesn't exist in any FPGA or ASIC library I've seen. If you can't figure out what you're trying to implement, the chances are the synthesizer can't either.
In other words, not everything you can describe in Verilog can be translated into an actual circuit.
The solution depends on what you want to do. If you require that the counter increments on both pb1 and pb2 rising edges, irrespective of the other push button's state, I would look into solutions which use another (independent) clock (clk in the code below), something like this:
reg old_pb1, old_pb2;
always @ (posedge clk) begin
  // Note: if both buttons rise in the same clock, count2 advances only once.
  if (old_pb1 == 0 && pb1 == 1) begin
    if (count2 == 9) count2 <= 0;
    else count2 <= count2 + 1;
  end
  if (old_pb2 == 0 && pb2 == 1) begin
    if (count2 == 9) count2 <= 0;
    else count2 <= count2 + 1;
  end
  old_pb1 <= pb1;
  old_pb2 <= pb2;
end
If you have no other clock, you could also combine both input signals like in this example:
wire pbs = pb1 | pb2;
always @ (posedge pbs) begin  // the OR of both buttons acts as the single clock
  if (count2 == 9) count2 <= 0;
  else count2 <= count2 + 1;
end
Another option would be to use independent counters for the inputs:
always @ (posedge pb1)
begin
  if (count_pb1 == 9) count_pb1 <= 0;
  else count_pb1 <= count_pb1 + 1;
end
always @ (posedge pb2)
begin
  if (count_pb2 == 9) count_pb2 <= 0;
  else count_pb2 <= count_pb2 + 1;
end
wire [4:0] count2 = count_pb1 + count_pb2;
All options have their own restrictions, limitations and drawbacks, therefore it depends heavily on what you want to do. Corner cases matter.
Note that I put these code examples together without testing them. Please let me know in a comment if you are having trouble with any of them and I will look into it.
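One corner case of the clocked approach is that push buttons are asynchronous to clk, and that both buttons might rise within the same clock period. The following untested sketch is one possible way to handle both; the module name, the reset input and the shift-register names are assumptions for illustration, and contact bounce is not addressed at all.

```verilog
// Hypothetical module; names, widths and the reset are assumptions for illustration.
module two_button_counter (
    input  wire       clk,    // free-running clock, assumed much faster than a button press
    input  wire       reset,  // synchronous, active-high
    input  wire       pb1,
    input  wire       pb2,
    output reg  [3:0] count2
);
    // Bit 0 samples the raw pin, bit 1 is the synchronised value,
    // bit 2 is the synchronised value from the previous clock.
    reg [2:0] pb1_sr, pb2_sr;

    // A rising edge: previous value 0, current value 1.
    wire pb1_rise = (pb1_sr[2:1] == 2'b01);
    wire pb2_rise = (pb2_sr[2:1] == 2'b01);

    // 0, 1 or 2 presses can be seen in one clock.
    wire [1:0] rises = pb1_rise + pb2_rise;
    wire [4:0] next  = count2 + rises;

    always @(posedge clk) begin
        if (reset) begin
            pb1_sr <= 3'b000;
            pb2_sr <= 3'b000;
            count2 <= 4'd0;
        end else begin
            pb1_sr <= {pb1_sr[1:0], pb1};
            pb2_sr <= {pb2_sr[1:0], pb2};
            // Wrap after 9, as in the original counter.
            count2 <= (next > 9) ? next - 10 : next;
        end
    end
endmodule
```

Every register here has exactly one clock, so it maps onto ordinary flip-flops. A real design would probably also need a debounce stage in front of the edge detectors.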
I have this generate block below which I think should work, but I am seeing issues with the always @(*) part under the else block. When using VCS, temp_in[i+1][j] is always assigned 'x'. I expect it to be set to '0'. If I instantiate a module/gate instead of the always block, like I did for the if part, then it works correctly. Googling for the right syntax for using foreach, generate, always and if within a single block does not yield any useful results. I know the fix is a minor change but I am not that familiar with all the language constructs, so I would appreciate any help.
ceil() is a function which returns an integer. It uses only parameters which are fixed at compile time, so I expect the loop unrolling to happen correctly.
genvar i, j, k;
generate
for (i = 0; i < NUM_STAGES; i = i + 1) begin:gen_stage
for (j = 0; j < (TOTAL_LENGTH/(2**(i+1))); j = j + 1) begin:gen_or
if(j < ceil(i)) begin
for (k = 0; k < CPU_DATA_WIDTH; k = k + 1) begin:gen_bit
msw_mem_out_mux_bit_or U_msw_mem_out_mux_bit_or (
.in_1 (temp_in[i][2*j][k]),
.in_2 (temp_in[i][(2*j)+1][k]),
.out (temp_in[i+1][j][k])
);
end
end else begin
always @(*) begin
temp_in[i+1][j] = {CPU_DATA_WIDTH{1'b0}};
end
end
end
end
endgenerate
An always @* waits until a change occurs on a signal in the inferred sensitivity list. i and j are constants (from the perspective of simulation time when always @* is evaluating), so your always block has no signals in the sensitivity list.
If using SystemVerilog, change always @* to always_comb, which will run at time 0. For Verilog, add an initial block.
Reference: IEEE Std 1800-2012 § 9.2.2.2.2 always_comb compared to always @*
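The effect can be reproduced in isolation. The following simulation-only sketch uses hypothetical signal names; it compares an always @* with a constant right-hand side against an initial block and a continuous assignment. It is an illustration under those assumptions, not the original poster's design, and some tools may warn about or reject the empty sensitivity list.

```verilog
// Simulation-only sketch; all signal names are hypothetical.
module const_tieoff_demo;
    reg  [7:0] from_always;   // driven by always @* with a constant right-hand side
    reg  [7:0] from_initial;  // driven once at time 0 by an initial block
    wire [7:0] from_assign;   // driven by a continuous assignment

    // No signal is read here, so the inferred sensitivity list is empty
    // and the event control has nothing to wait for.
    always @* from_always = {8{1'b0}};

    initial from_initial = {8{1'b0}};
    assign  from_assign  = {8{1'b0}};

    initial begin
        #1;
        $display("always @*: %b  initial: %b  assign: %b",
                 from_always, from_initial, from_assign);
        $finish;
    end
endmodule
```

A simulator that follows the standard's description is expected to leave from_always at x while the other two read zero. If temp_in is a net rather than a variable, a continuous assign in the else branch may be the more natural fix.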
I am trying to read the values of a memory after 5 cycles into an output register in Verilog. How do I do that?
For example, if I have code which looks like this:
reg[31:0] mem[0:5];
if(high==1)
begin
newcount1<=count2;
mem[i]<=newcount1;
i<=i+1;
count2=0;
end
After the 5 cycles of operation, how do I read whatever mem values I get into another output register? Can I also perform an averaging operation over those 5 cycles to get a nominal value?
Say your memory data output is mem_data and you want to read it out on mem_data_out with a latency of 5 cycles.
parameter MDP_Latency = 4;
reg [31:0] mem_data_out;
reg [31:0] mem_data_out_temp [0:MDP_Latency-1]; // MDP_Latency stages of 32 bits each
always @(posedge clk) begin
  if (!reset) begin // active-low synchronous reset
    mem_data_out <= 'd0;
    for (int i = 0; i < MDP_Latency; i++) // "int i" needs SystemVerilog
      mem_data_out_temp[i] <= 'd0;
  end
  else begin
    for (int i = 0; i < MDP_Latency; i++) begin
      if (i == 0)
        mem_data_out_temp[i] <= mem_data;
      else
        mem_data_out_temp[i] <= mem_data_out_temp[i - 1];
    end
    // The last valid stage is MDP_Latency - 1; the output register adds the fifth cycle.
    mem_data_out <= mem_data_out_temp[MDP_Latency - 1];
  end
end
Let's have a look at the posted code:
reg[31:0] mem[0:5];
if(high==1)
begin
newcount1<=count2;
mem[i]<=newcount1;
i<=i+1;
count2=0;
end
The lack of indentation makes it hard to read, and i is not declared. The memory actually has 6 locations, 0 to 5.
You have conditionals and assignments that are not inside an initial or always block.
I am not sure what you are doing with count2, but mixing blocking and non-blocking assignments is considered bad practice. It can be done, but you must be really careful not to cause an RTL-to-gates mismatch.
User1932872 has posted an answer using for loops. It looks like a valid answer, but I think loops at this stage overcomplicate learning and understanding what you are creating in HDLs. While learning I would avoid such features and only use them once comfortable with the whole flow.
reg [31:0] mem [0:4]; // 5 locations
always @(posedge clk) begin // Clocked process: assignments use non-blocking (<=)
  mem[0] <= newcount1;
  mem[1] <= mem[0];
  mem[2] <= mem[1];
  mem[3] <= mem[2];
  mem[4] <= mem[3];
end
With this structure we can see the pipeline of data through mem[0] to mem[4]. We are implying 5 registers (each 32 flip-flops wide), where the output of the first drives data into the next. A combinatorial sum of them all could be:
reg [31+4:0] sum; // The +4 allows for bit growth; you may need to truncate or limit.
always @* begin // Combinatorial: use blocking (=)
  sum = mem[0] + mem[1] + mem[2] + mem[3] + mem[4];
end
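The averaging part of the question could then be a division of that sum by the number of samples. This is only a sketch under assumptions: the module name average5 and its ports are hypothetical, and the input width simply matches the 36-bit sum declared above.

```verilog
// Hypothetical module; the name and port list are assumptions for illustration.
module average5 (
    input  wire [35:0] sum,  // same width as the sum register above
    output wire [31:0] avg   // the mean of five 32-bit samples fits in 32 bits
);
    // Division by a constant; the fractional part is discarded.
    assign avg = sum / 5;
endmodule
```

Dividing by 5 costs real logic. If the application allows averaging over 4 or 8 samples instead, the division reduces to dropping the lowest 2 or 3 bits of the sum.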