I have a 1023 bit vector in Verilog. All I want to do is check if the ith bit is 1 and if it is 1 , I have to add 'i' to another variable .
In C , it would be something like :
int sum=0;
int i=0;
for(i=0;i<1023;i++) {
if(a[i]==1) {
sum=sum+i;
}
Of course , the addition that I am doing is over a Galois Field . So, I have a module called Galois_Field_Adder to do the computation .
So, my question now is how do I conditionally check if a specific bit is 1 and if so call my module to do that specific addition .
NOTE: The 1023 bit vector is declared as an input .
It's hard to answer your question without seeing your module, as we can't gage where you are in your Verilog. You always have to think of how your code translates in gates. If we want to translate your C code into synthesizable logic, we can take the same algorithm, go through each bit one after the other, and add to the sum depending on each bit. You would use something like this:
module gallois (
input wire clk,
input wire rst,
input wire [1022:0] a,
input wire a_valid,
output reg [18:0] sum,
output reg sum_valid
);
reg [9:0] cnt;
reg [1021:0] shift_a;
always #(posedge clk)
if (rst)
begin
sum[18:0] <= {19{1'bx}};
sum_valid <= 1'b0;
cnt[9:0] <= 10'd0;
shift_a[1021:0] <= {1022{1'bx}};
end
else
if (a_valid)
begin
sum[18:0] <= 19'd0;
sum_valid <= 1'b0;
cnt[9:0] <= 10'd1;
shift_a[1021:0] <= a[1022:1];
end
else if (cnt[9:0])
begin
if (cnt[9:0] == 10'd1022)
begin
sum_valid <= 1'b1;
cnt[9:0] <= 10'd0;
end
else
cnt[9:0] <= cnt[9:0] + 10'd1;
if (shift_a[0])
sum[18:0] <= sum[18:0] + cnt[9:0];
shift_a[1021:0] <= {1'bx, shift_a[1021:1]};
end
endmodule
You will get your result after 1023 clock cycles. This code needs to be modified depending on what goes around it, what interface you want etc...
Of importance here is that we use a shift register to test each bit, so that the logic adding your sum only takes shift_a[0], sum and cnt as an input.
Code based on the following would also work in simulation:
if (a[cnt[9:0])
sum[18:0] <= sum[18:0] + cnt[9:0];
but the logic adding to sum would in effect take all 1023 bits of a[] as an input. This would be quite hard to turn into actual lookup tables.
In simulation, you can also implement something very crude such as this:
reg [1022:0]a;
reg [9:0] sum;
integer i;
always #(a)
begin
sum[9:0] = 10'd0;
for (i=0; i < 1023; i=i+1)
if (a[i])
sum[9:0] = sum[9:0] + i;
end
If you were to try to synthesize this, sum would actually turn into a chunk of combinatorial logic, as the 'always' block doesn't rely on a clock. This code is in fact equivalent to this:
always #(a)
case(a):
1023'd0: sum[18:0] = 19'd0;
1023'd1: sum[18:0] = 19'd1;
1023'd2: sum[18:0] = 19'd3;
etc...
Needless to say that a lookup table with 1023 input bits is a VERY big memory...
Then if you want to improve your code, and use your FPGA as an FPGA and not like a CPU, you need to start thinking about parallelism, for instance working in parallel on different ranges of your input a. But this is another thread...
Related
I have a conceptual question about FSM's and whether or not the following code is a true FSM. This is for my own curiosity and understanding about the subject. When I wrote this code, I was under the impression that this was an FSM, but now I am not so sure. According to a lot of reading I have done, a TRUE FSM should only consist of a sequential state transition block, followed by either one or two combination blocks to calculate the next state and the output logic.
My current code synthesizes and works on my Basys3 board, so I understand there may be an argument for "if it ain't broke, don't fix it" but it's been bugging me for a while now that I may have an incorrect understanding about how to write an FSM in HDL.
I have tried at least 4 or 5 different ways to rewrite my code using the format mentioned above, but I can't seem to get it without inferring latches, mainly due to the use of a counter, and the fact that pckd_bcd needs to remember its previous value.
Could somebody please explain to me either why this algorithm isn't fit for a true FSM, why it is fit for a true FSM and where my misunderstanding is, or how to rewrite this into the format mentioned above.
module double_dabble #
(
parameter NUM_BITS = 8
)
(
input wire i_clk,
input wire i_rst,
input wire [NUM_BITS-1:0] i_binary,
output wire [15:0] o_pckd_bcd
);
localparam s_IDLE = 3'b000;
localparam s_SHIFT = 3'b001;
localparam s_CHECK = 3'b010;
localparam s_ADD = 3'b011;
localparam s_DONE = 3'b100;
reg [2:0] state;
reg [NUM_BITS-1:0] bin;
reg [3:0] count;
reg [15:0] pckd_bcd;
reg [NUM_BITS-1:0] input_reg;
reg [15:0] output_reg;
assign o_pckd_bcd = output_reg;
always #(posedge i_clk)
begin
if(i_rst)
begin
state <= s_IDLE;
bin <= 0;
count <= 0;
pckd_bcd <= 0;
output_reg <= 0;
input_reg <= 0;
end
else
begin
input_reg <= i_binary;
state <= s_IDLE;
// FSM
case(state)
s_IDLE :
begin
state <= s_IDLE;
bin <= 0;
count <= 0;
pckd_bcd <= 0;
if (input_reg!=i_binary)
begin
bin <= i_binary;
state <= s_SHIFT;
end
end
s_SHIFT :
begin
state <= s_CHECK;
bin <= bin<<1;
count <= count+1;
pckd_bcd <= pckd_bcd<<1;
pckd_bcd[0] <= bin[NUM_BITS-1];
end
s_CHECK :
begin
state <= s_ADD;
if(count>=NUM_BITS)
state <= s_DONE;
end
s_ADD :
begin
state <= s_SHIFT;
pckd_bcd[15:12] <= add_three(pckd_bcd[15:12]);
pckd_bcd[11:8] <= add_three(pckd_bcd[11:8]);
pckd_bcd[7:4] <= add_three(pckd_bcd[7:4]);
pckd_bcd[3:0] <= add_three(pckd_bcd[3:0]);
end
s_DONE :
begin
state <= s_IDLE;
output_reg <= pckd_bcd;
end
endcase
end
end
function [3:0] add_three;
input [3:0] bcd;
if (bcd > 4)
add_three = bcd +3;
else
add_three = bcd;
endfunction
endmodule
In general, your code looks like it implements an FSM. You have evidence that the design is working as desired on your board.
The coding style you have chosen (one always block) is a valid Verilog approach. Another common FSM coding style involves two always blocks: one sequential and one combinational. See here for an example. It is not mandatory for an FSM to use two always blocks in Verilog.
One thing that I do notice in your code is that you are making multiple nonblocking assignments to the state signal at the same time. That is unusual.
For example, right before the case statement, you have:
state <= s_IDLE;
Then, in the IDLE case-item, you have the same code:
state <= s_IDLE;
I recommend trying to clean that up by adding a default to the case statement. A default is also a good coding practice when you have some undefined states, as in your FSM. You have only defined 5 of the 8 possible states (state is a 3-bit register).
Another example of multiple nonblocking assignments is:
s_CHECK :
begin
state <= s_ADD;
if(count>=NUM_BITS)
state <= s_DONE;
end
That would be better coded as:
s_CHECK :
begin
state <= (count>=NUM_BITS) ? s_DONE : s_ADD;
end
Following recommended coding practices yields more predictable simulation and synthesis results.
module clks(
input clk,
output [15:0] led
);
wire div2, div4, div8;
reg [2:0] count = 0;
assign div2 = count[0];
assign div4 = count[1];
assign div8 = count[2];
always #(posedge clk) count = count + 1;
endmodule
How can I turn on each led (I have 15 leds) using clock?
I'm really having trouble finding helpful resources online
initial begin
case({count})
2'b00:
led = 15'b000000000000001;
2'b01:
led = 15'b000000000000010;
...
endcase
end
This didn't work.
Or could I do something like this?
led = led + 1;
In your sample code above, you defined count as 3 bits, but your case statements are 2 bits wide. Also, you don't want the initial statement, rather use an always statement.
always # (count)
begin
case(count)
3'b000 : led = 15'b000_0000_0001;
3'b001 : led = 15'b000_0000_0010;
...
endcase
end
I guess that 'by using clock' means changing the led every clock cycle, right? Also it looks like you are trying to encode the led sequentially. In this case you can do the following:
you need to reset your lead to an initial value, sey 15'b1;
every clock cycle you can just shift it left by one. You should not do it in an initial block (though there is a technical way to do so). Use always blocks:
Here is an example:
module clks(
input clk,
input reset,
output reg [15:0] led
);
always #(posedge clk) begin
if (reset == 1)
led <= 15'b1;
else
led <= led << 1;
end
endmodule
In the above case '1' will travel through all bits of led over 15 clock cycles once. 'led' will become '0' after this. You have to make sure that it becomes '1' again if you want to continue in cycles.
Another possibility is to initialize 'led' in the always block, but it is not always synthesizable. YOu do not need a reset signal here.
initial led = 15'b1;
always #(posedge clk) led <= led << 1;
so I have a module that does convolution, it takes a data input and the filter input , where input is array of 9 numbers , every posedge of the clk these two inputs are being multiplied and then added accumulatively, i.e I save every new multiplication product into a register. after each 9 iterations I have to save the result and reset it , but I have to do it in one clock cycle, since my new data is coming on the next posedge. So the issue that I am facing is how to not save data and reset the out without losing data? Please help if you have any suggestions. It also need to be mentioned that my conv_module is a sub-module and I will be instantiating it in a top module , so I have to access all the inputs and outputs from uptop.
This is the code that I've written so far, but it does not work the way I want it, cause I cannot tap the array of output numbers from the top module.
module mult_conv( input clk,
input rst,
input signed [4:0] a,
input signed[2:0] b,
output reg signed[7:0] out
);
wire signed [7:0] mult;
reg signed [7:0] sum;
reg [3:0] counter;
reg do_write;
reg [7:0] out_top;
assign mult = {{3{a[4]}},a} * {{5{b[2]}},b};
always #(posedge clk or posedge rst)
begin
if (rst)
begin
counter <= 4'h0;
addr <= 'h0;
sum <= 0;
do_write <= 1'b0;
end // rst
else
begin
if (counter == 4'h8)
begin // we have gathered 9 samples
counter <= 4'h0;
// start again so ignore old sum
sum <= mult;
out <= sum;
out_top <= out;
end
else
begin
counter <= counter + 4'h1;
// Add results
sum <= sum + mult;
out <= 0;
out_top <= out_top;
end
// Write signal has to be set one cycle early
do_write = (counter==4'h7);
end // clocked
end // always
endmodule
You have a plethora of errors in that code.
Apart from that you have a 3Mega bit memory from which you use only 1 in 9 locations.
You write out in two places. That does not work.
You use a %9. That can not be mapped onto hardware.
You have a sel signal which somehow controls your sum.
On top of that I understand you want to bring the whole memory out.
Your code because it needs to be drastically re-written.
But your biggest problem is that you definitely can't make the memory come out. What ever post-processing you want to do you have two choices:
Process the output data as it appears.
Store the data outside the module in a memory and have another process read that memory.
I think only (1) is the correct way because your signal can have infinite length.
As to fixing this code a bit:
Replace the %9 with a counter to count from 0 to 8.
Process out in in clocked section. See below
Move the addr and sel generating logic in here. Keep it all together.
Below is the basic code of how to do a 9-sequence convolution. I have to ignore 'sel' as I have no idea of the timing. I have also added address generation and a write signal so the result can be store in an external memory. But I still think you should process the result on the fly.
always #(posedge clk or posedge rts)
begin
if (rst)
begin
counter <= 4'h0;
addr <= 'h0;
sum <= 0;
do_write <= 1'b0;
end // rst
else
begin
if (counter == 4'h8)
begin // we have gathered 9 samples
counter <= 4'h0;
addr <= addr + 1;
// start again so ignore old sum
sum <= mult;
end
else
begin
counter <= counter + 4'h1;
// Add results
sum <= sum + mult;
end
// Write signal has to be set one cycle early
do_write = (counter==4'h7);
end // clocked
end // always
(Code above was entered on-the fly, may contain syntax, typing or other errors!!)
As you can see the trick is to know when to add the old result or when to ignore the old sum and start again.
(I spend about 3/4 of an hour on that so on my normal tariff you would have to pay me $93.75 :-)
I provided the basic code to let you work out the specifics. I did nothing with out but left that to you.
do_write and addr where for a possible memory to pick up the result. Without memory you can drop addr but do_write should tell you when a new convolution result is available, in which case you might want to give a it a different name. e.g. 'sum_valid'.
I am trying to read the values of memory after 5 cycles into an output register in verilog. How do I do that?
For example if i have a code which looks like this,
reg[31:0] mem[0:5];
if(high==1)
begin
newcount1<=count2;
mem[i]<=newcount1;
i<=i+1;
count2=0;
end
After the 5 cycles of operation whatever mem values i get, how do i read them in another output register? and can i perform averaging operation on those 5 cycles? and get a nominal value?
Say Your Memory Data Out is mem_data and you want it Read out in mem_data_out with Latency of 5 Cycles.
parameter MDP_Latency = 4;
reg [31:0] mem_data_out;
reg [31:0] [MDP_Latency - 1 : 0] mem_data_out_temp;
always#(posedge clk) begin
if(!reset) begin
for(int i = 0; i < MDP_Latency; i ++)
begin
mem_data_out <= 'd0;
mem_data_out_temp[i] <= 'd0;
end
end
else
begin
for(int i = 0; i < MDP_Latency; i ++)
begin
if(i == 0)
begin
mem_data_out_temp[i] <= mem_data;
end
else
begin
mem_data_out_temp[i] <= mem_data_out_temp[i - 1];
end
end
mem_data_out <= mem_data_out_temp[MDP_Latency];
end
end
`
Lets have a look at the posted code:
reg[31:0] mem[0:5];
if(high==1)
begin
newcount1<=count2;
mem[i]<=newcount1;
i<=i+1;
count2=0;
end
The lack of indentation makes it hard to read, i is not declared. The memory is actually 6 locations, 0 to 5.
You have conditionals and assignment not insides and initial or always block.
I am not sure what you are doing with count2 but mixing blocking and non-blocking is considered bad practise, it can be done but you must be really careful to not cause RTL to gates mismatch.
User1932872 Has posted an answer using for loops it looks like a valid answer but I think loops at this stage over complicate learning and understanding what you are creating in HDLs. While learning I would avoid such features and only uses them once comfortable with the whole flow.
reg[31:0] mem[0:4]; //5 Locations
always #(posedge clk) begin //Clked process assignmnets use non-blocking(<=)
mem[0]<=newcount1;
mem[1]<=mem[0];
mem[2]<=mem[1];
mem[3]<=mem[2];
mem[4]<=mem[3];
end
With this structure we can see the pipeline of data though mem[0] to mem[4]. We are implying 5 flip-flops where the output of the first drives data into the next. A combinatorial sum of them all could be:
reg [31+4:0] sum; //The +4 allows for bitgrowth you may need to truncate or limit.
always #* begin // Combinatorial use blocking (=)
sum = mem[0] + mem[1] + mem[2] + mem[3] + mem[4];
end
I'm trying to write a multiplier based on a design. It consists of two 16-bit inputs and the a single adder is used to calculate the partial product. The LSB of one input is AND'ed with the 16 bits of the other input and the output of the AND gate is repetitively added to the previous output. The Verilog code for it is below, but I seem to be having trouble with getting the outputs to work.
module datapath(output reg [31:15]p_high,
output reg [14:0]p_low,
input [15:0]x, y,
input clk); // reset, start, x_ce, y_ce, y_load_en, p_reset,
//output done);
reg [15:0]q0;
reg [15:0]q1;
reg [15:0]and_output;
reg [16:0]sum, prev_sum;
reg d_in;
reg [3:0] count_er;
initial
begin
count_er <= 0;
sum <= 17'b0;
prev_sum <= 17'b0;
end
always#(posedge clk)
begin
q0 <= y;
q1 <= x;
and_output <= q0[count_er] & q1;
sum <= and_output + prev_sum;
prev_sum <= sum;
p_high <= sum;
d_in <= p_high[15];
p_low[14] <= d_in;
p_low <= p_low >> 1;
count_er <= count_er + 1;
end
endmodule
I created a test bench to test the circuit and the first problem I see is that, the AND operation doesn't work as I want it to. The 16-bits of the x-operand are and'ed with the LSB of the y-operand. The y-operand is shifted by one bit after every clock cycle and the final product is calculated by successively adding the partial products.
However, I am having trouble starting from the sum and prev_sum lines and their outputs are being displayed as xxxxxxxxxxxx.
You don't seem to be properly resetting all the signals you need to, or you seem to be confusing the way that nonblocking assignments work.
After initial begin:
sum is 0
prev_sum is 0
and_output is X
After first positive edge:
sum is X, because and_output is X, and X+0 returns X. At this point sum stays X forever, because X + something is always X.
You're creating a register for almost every signal in your design, which means that none of your signals update immediately. You need to make a distinction between the signals that you want to register, and the signals that are just combinational terms. Let the registers update with nonblocking statements on the posedge clock, and let the combinational terms update immediately by placing them in an always #* block.
I don't know the algorithm that you're trying to use, so I can't say which lines should be which, but I really doubt that you intend for it to take one clock cycle for x/y to propagate to q0/q1, another cycle for q to propagate to and_output, and yet another clock cycle to propogate from and_output to sum.
Comments on updated code:
Combinational blocks should use blocking assignments, not nonblocking assignments. Use = instead of <= inside the always #* block.
sum <= and_output + sum; looks wrong, It should be sum = and_output + p_high[31:16] according to your picture.
You're assigning p_low[14] twice here. Make the second statement explicitly set bits [13:0] only:
p_low[14] <= d_in;
p_low[13:0] <= p_low >> 1;
You are mixing blocking and nonblocking assignments in the same sequential always block, which can cause unexpected results:
d_in <= p_high[15];
p_low[14] = d_in;