I'm trying to write a multiplier based on a design. It consists of two 16-bit inputs and the a single adder is used to calculate the partial product. The LSB of one input is AND'ed with the 16 bits of the other input and the output of the AND gate is repetitively added to the previous output. The Verilog code for it is below, but I seem to be having trouble with getting the outputs to work.
module datapath(output reg [31:15]p_high,
output reg [14:0]p_low,
input [15:0]x, y,
input clk); // reset, start, x_ce, y_ce, y_load_en, p_reset,
//output done);
reg [15:0]q0;
reg [15:0]q1;
reg [15:0]and_output;
reg [16:0]sum, prev_sum;
reg d_in;
reg [3:0] count_er;
initial
begin
count_er <= 0;
sum <= 17'b0;
prev_sum <= 17'b0;
end
always#(posedge clk)
begin
q0 <= y;
q1 <= x;
and_output <= q0[count_er] & q1;
sum <= and_output + prev_sum;
prev_sum <= sum;
p_high <= sum;
d_in <= p_high[15];
p_low[14] <= d_in;
p_low <= p_low >> 1;
count_er <= count_er + 1;
end
endmodule
I created a test bench to test the circuit and the first problem I see is that, the AND operation doesn't work as I want it to. The 16-bits of the x-operand are and'ed with the LSB of the y-operand. The y-operand is shifted by one bit after every clock cycle and the final product is calculated by successively adding the partial products.
However, I am having trouble starting from the sum and prev_sum lines and their outputs are being displayed as xxxxxxxxxxxx.
You don't seem to be properly resetting all the signals you need to, or you seem to be confusing the way that nonblocking assignments work.
After initial begin:
sum is 0
prev_sum is 0
and_output is X
After first positive edge:
sum is X, because and_output is X, and X+0 returns X. At this point sum stays X forever, because X + something is always X.
You're creating a register for almost every signal in your design, which means that none of your signals update immediately. You need to make a distinction between the signals that you want to register, and the signals that are just combinational terms. Let the registers update with nonblocking statements on the posedge clock, and let the combinational terms update immediately by placing them in an always #* block.
I don't know the algorithm that you're trying to use, so I can't say which lines should be which, but I really doubt that you intend for it to take one clock cycle for x/y to propagate to q0/q1, another cycle for q to propagate to and_output, and yet another clock cycle to propogate from and_output to sum.
Comments on updated code:
Combinational blocks should use blocking assignments, not nonblocking assignments. Use = instead of <= inside the always #* block.
sum <= and_output + sum; looks wrong, It should be sum = and_output + p_high[31:16] according to your picture.
You're assigning p_low[14] twice here. Make the second statement explicitly set bits [13:0] only:
p_low[14] <= d_in;
p_low[13:0] <= p_low >> 1;
You are mixing blocking and nonblocking assignments in the same sequential always block, which can cause unexpected results:
d_in <= p_high[15];
p_low[14] = d_in;
Related
so I have a module that does convolution, it takes a data input and the filter input , where input is array of 9 numbers , every posedge of the clk these two inputs are being multiplied and then added accumulatively, i.e I save every new multiplication product into a register. after each 9 iterations I have to save the result and reset it , but I have to do it in one clock cycle, since my new data is coming on the next posedge. So the issue that I am facing is how to not save data and reset the out without losing data? Please help if you have any suggestions. It also need to be mentioned that my conv_module is a sub-module and I will be instantiating it in a top module , so I have to access all the inputs and outputs from uptop.
This is the code that I've written so far, but it does not work the way I want it, cause I cannot tap the array of output numbers from the top module.
module mult_conv( input clk,
input rst,
input signed [4:0] a,
input signed[2:0] b,
output reg signed[7:0] out
);
wire signed [7:0] mult;
reg signed [7:0] sum;
reg [3:0] counter;
reg do_write;
reg [7:0] out_top;
assign mult = {{3{a[4]}},a} * {{5{b[2]}},b};
always #(posedge clk or posedge rst)
begin
if (rst)
begin
counter <= 4'h0;
addr <= 'h0;
sum <= 0;
do_write <= 1'b0;
end // rst
else
begin
if (counter == 4'h8)
begin // we have gathered 9 samples
counter <= 4'h0;
// start again so ignore old sum
sum <= mult;
out <= sum;
out_top <= out;
end
else
begin
counter <= counter + 4'h1;
// Add results
sum <= sum + mult;
out <= 0;
out_top <= out_top;
end
// Write signal has to be set one cycle early
do_write = (counter==4'h7);
end // clocked
end // always
endmodule
You have a plethora of errors in that code.
Apart from that you have a 3Mega bit memory from which you use only 1 in 9 locations.
You write out in two places. That does not work.
You use a %9. That can not be mapped onto hardware.
You have a sel signal which somehow controls your sum.
On top of that I understand you want to bring the whole memory out.
Your code because it needs to be drastically re-written.
But your biggest problem is that you definitely can't make the memory come out. What ever post-processing you want to do you have two choices:
Process the output data as it appears.
Store the data outside the module in a memory and have another process read that memory.
I think only (1) is the correct way because your signal can have infinite length.
As to fixing this code a bit:
Replace the %9 with a counter to count from 0 to 8.
Process out in in clocked section. See below
Move the addr and sel generating logic in here. Keep it all together.
Below is the basic code of how to do a 9-sequence convolution. I have to ignore 'sel' as I have no idea of the timing. I have also added address generation and a write signal so the result can be store in an external memory. But I still think you should process the result on the fly.
always #(posedge clk or posedge rts)
begin
if (rst)
begin
counter <= 4'h0;
addr <= 'h0;
sum <= 0;
do_write <= 1'b0;
end // rst
else
begin
if (counter == 4'h8)
begin // we have gathered 9 samples
counter <= 4'h0;
addr <= addr + 1;
// start again so ignore old sum
sum <= mult;
end
else
begin
counter <= counter + 4'h1;
// Add results
sum <= sum + mult;
end
// Write signal has to be set one cycle early
do_write = (counter==4'h7);
end // clocked
end // always
(Code above was entered on-the fly, may contain syntax, typing or other errors!!)
As you can see the trick is to know when to add the old result or when to ignore the old sum and start again.
(I spend about 3/4 of an hour on that so on my normal tariff you would have to pay me $93.75 :-)
I provided the basic code to let you work out the specifics. I did nothing with out but left that to you.
do_write and addr where for a possible memory to pick up the result. Without memory you can drop addr but do_write should tell you when a new convolution result is available, in which case you might want to give a it a different name. e.g. 'sum_valid'.
I have a 1023 bit vector in Verilog. All I want to do is check if the ith bit is 1 and if it is 1 , I have to add 'i' to another variable .
In C , it would be something like :
int sum=0;
int i=0;
for(i=0;i<1023;i++) {
if(a[i]==1) {
sum=sum+i;
}
Of course , the addition that I am doing is over a Galois Field . So, I have a module called Galois_Field_Adder to do the computation .
So, my question now is how do I conditionally check if a specific bit is 1 and if so call my module to do that specific addition .
NOTE: The 1023 bit vector is declared as an input .
It's hard to answer your question without seeing your module, as we can't gage where you are in your Verilog. You always have to think of how your code translates in gates. If we want to translate your C code into synthesizable logic, we can take the same algorithm, go through each bit one after the other, and add to the sum depending on each bit. You would use something like this:
module gallois (
input wire clk,
input wire rst,
input wire [1022:0] a,
input wire a_valid,
output reg [18:0] sum,
output reg sum_valid
);
reg [9:0] cnt;
reg [1021:0] shift_a;
always #(posedge clk)
if (rst)
begin
sum[18:0] <= {19{1'bx}};
sum_valid <= 1'b0;
cnt[9:0] <= 10'd0;
shift_a[1021:0] <= {1022{1'bx}};
end
else
if (a_valid)
begin
sum[18:0] <= 19'd0;
sum_valid <= 1'b0;
cnt[9:0] <= 10'd1;
shift_a[1021:0] <= a[1022:1];
end
else if (cnt[9:0])
begin
if (cnt[9:0] == 10'd1022)
begin
sum_valid <= 1'b1;
cnt[9:0] <= 10'd0;
end
else
cnt[9:0] <= cnt[9:0] + 10'd1;
if (shift_a[0])
sum[18:0] <= sum[18:0] + cnt[9:0];
shift_a[1021:0] <= {1'bx, shift_a[1021:1]};
end
endmodule
You will get your result after 1023 clock cycles. This code needs to be modified depending on what goes around it, what interface you want etc...
Of importance here is that we use a shift register to test each bit, so that the logic adding your sum only takes shift_a[0], sum and cnt as an input.
Code based on the following would also work in simulation:
if (a[cnt[9:0])
sum[18:0] <= sum[18:0] + cnt[9:0];
but the logic adding to sum would in effect take all 1023 bits of a[] as an input. This would be quite hard to turn into actual lookup tables.
In simulation, you can also implement something very crude such as this:
reg [1022:0]a;
reg [9:0] sum;
integer i;
always #(a)
begin
sum[9:0] = 10'd0;
for (i=0; i < 1023; i=i+1)
if (a[i])
sum[9:0] = sum[9:0] + i;
end
If you were to try to synthesize this, sum would actually turn into a chunk of combinatorial logic, as the 'always' block doesn't rely on a clock. This code is in fact equivalent to this:
always #(a)
case(a):
1023'd0: sum[18:0] = 19'd0;
1023'd1: sum[18:0] = 19'd1;
1023'd2: sum[18:0] = 19'd3;
etc...
Needless to say that a lookup table with 1023 input bits is a VERY big memory...
Then if you want to improve your code, and use your FPGA as an FPGA and not like a CPU, you need to start thinking about parallelism, for instance working in parallel on different ranges of your input a. But this is another thread...
I am working with an Altera DE2 development board and I want to read an input in on the switches. This is stored in registers. Based on a counter these registers are incremented. The registers are then supposed to be output to the Seven Segment Displays thought a B2D converter. But I can not pass a register to a function.
wire [26:0] Q,Q2,Q3,Q4;
wire [3:0] one,two,three,four;
reg SecInc,MinInc,HrInc;
reg [3:0] M1,M2,H1,H2;
assign one = SW[3:0];
assign two = SW[7:4];
assign three = SW[11:8];
assign four = SW[15:12];
always begin
M1 = SW[3:0];
M2 = SW[7:4];
H1 = SW[11:8];
H2 = SW[15:12];
end
This is how I get and store the inputs. They come from the switches which we use as a binary representation on Hours and Minutes.
Based on a counter we increment a minute or an hour register.
//increment seconds from 0 to 60
counter seconds (SecInc,KEY[0],Q2);
defparam seconds.n = 8;
defparam seconds.mod = 60;
always # (negedge CLOCK_50) begin
if (Q2 >= 60) begin
MinInc = 1;
M1 <= M1 + 1'b1;
if(M1 >= 9) begin
M1 <= 0;
M2 <= M2 + 1'b1;
end
end else begin
MinInc = 0;
end
end
We want to display the result on the SSD's.
hex(M1,HEX4);
hex(M2,HEX5);
hex(H1,HEX6);
hex(H2,HEX7);
Here in lies the problem. This is not allowed in verilog. I need a way to send my registers to a function which displays numbers from 0 to 9 using some B2D conversion.
I will say I have never had a formal intro to verilog before and I have tried all I can think to do. I even tried to make a new module in which I would pass one,two,three,four and have the module increment them, like it does with Q2 for the counter I have shown. Any suggestions or help is greatly appreciated!
As requested here is the hex module:
module hex(BIN, SSD);
input [15:0] BIN;
output reg [0:6] SSD;
always begin
case(BIN)
0:SSD=7'b0000001;
1:SSD=7'b1001111;
2:SSD=7'b0010010;
3:SSD=7'b0000110;
4:SSD=7'b1001100;
5:SSD=7'b0100100;
6:SSD=7'b0100000;
7:SSD=7'b0001111;
8:SSD=7'b0000000;
9:SSD=7'b0001100;
endcase
end
endmodule
Thank you in advance!
Your hex module is not a function, it is a module and therefore must be instantiated with an instance name like this:
hex digit0(.BIN(M1), .SSD(HEX4));
hex digit1(.BIN(M2), .SSD(HEX5));
hex digit2(.BIN(H1), .SSD(HEX6));
hex digit3(.BIN(H2), .SSD(HEX7));
In addition to nguthrie being correct, that you need to instantiate your hex converter as a module, you drive M1 from a race condition in your always block. Non-blocking assignments will evaluate simultaneously within a block (or essentially simultaneously). This is not a program, where things happen in order. What might work better is:
always # (negedge CLOCK_50) begin
if (Q2 >= 60) begin
MinInc = 1;
if (M1 < 9) begin
M1 <= M1 + 1'b1;
end else begin
M1 <= 0;
M2 <= M2 + 1'b1;
end
end else begin
MinInc = 0;
end
end
You will also potentially get unexpected results from your blocking assignments to MinInc, but since I don't see where this is read it's hard to know what will happen.
Read up on blocking (=) vs non-blocking (<=) assignments in Verilog. It's one of the trickiest concepts of the language, and misuse of the two operations is the cause of 90% of the most dastardly bugs I've ever seen.
EDIT: In re-reading your question, it seems that you're trying to drive M1-4 from at least three places. You really can't have a continuous always begin block and a clocked (always # (negedge clock) begin) driving the same register. This will send your compiler into a tantrum.
Recently, I had seen some D flip-flop RTL code in verilog like this:
module d_ff(
input d,
input clk,
input reset,
input we,
output q
);
always #(posedge clk) begin
if (~reset) begin
q <= 1'b0;
end
else if (we) begin
q <= d;
end
else begin
q <= q;
end
end
endmodule
Does the statement q <= q; necessary?
Does the statement q <= q; necessary?
No it isn't, and in the case of an ASIC it may actually increase area and power consumption. I'm not sure how modern FPGAs handle this. During synthesis the tool will see that statement and require that q be updated on every positive clock edge. Without that final else clause the tool is free to only update q whenever the given conditions are met.
On an ASIC this means the synthesis tool may insert a clock gate(provided the library has one) instead of mux. For a single DFF this may actually be worse since a clock gate typically is much larger than a mux but if q is 32 bits then the savings can be very significant. Modern tools can automatically detect if the number of DFFs using a shared enable meets a certain threshold and then choose a clock gate or mux appropriately.
In this case the tool needs 3 muxes plus extra routing
always #(posedge CLK or negedge RESET)
if(~RESET)
COUNT <= 0;
else if(INC)
COUNT <= COUNT + 1;
else
COUNT <= COUNT;
Here the tool uses a single clock gate for all DFFs
always #(posedge CLK or negedge RESET)
if(~RESET)
COUNT <= 0;
else if(INC)
COUNT <= COUNT + 1;
Images from here
As far as simulation is concerned, removing that statement should not change anything, since q should be of type reg (or logic in SystemVerilog), and should hold its value.
Also, most synthesis tools should generate the same circuit in both cases since q is updated using a non-blocking assignment. Perhaps a better code would be to use always_ff instead of always (if your tool supports it). This way the compiler will check that q is always updated using a non-blocking assignment and sequential logic is generated.
I am trying to create a counter in verilog which counts how many clock cycles there have been and after ten million it will reset and start again.
I have created a twenty four bit adder module along with another module containing twenty four D Flip flops to store the count of the cycles outputted from the adder.
I then want to have a state machine which is in the count state until ten million cycles have passed then it goes to a reset state.
Does this sound right? The problem is I am not sure how to implement the state machine.
Can anyone point me to a website/book which could help me with this?
thanks
As Paul S already mentioned, there is no need for a state machine if you want your counter to keep counting after an overflow. You can do something like this (untested, might contain typos):
module overflow_counter (
clk,
reset,
enable,
ctr_out
);
// Port definitions
input clk, reset, enable;
output [23:0] ctr_out;
// Register definitions
reg [23:0] reg_ctr;
// Assignments
assign ctr_out = reg_ctr;
// Counter behaviour - Asynchronous active-high reset
initial reg_ctr <= 0;
always # (posedge clk or posedge reset)
begin
if (reset) reg_ctr <= 0;
else if (enable)
begin
if (reg_ctr == 10000000) reg_ctr <= 0;
else reg_ctr <= reg_ctr + 1;
end
end
endmodule
Of course, normally you'd use parameters so you don't have to make a custom module every time you want an overflowing counter. I'll leave that to you ;).
[Edit] And here are some documents to help you with FSM. I just searched Google for "verilog state machine":
EECS150: Finite State Machines in Verilog
Synthesizable Finite State Machine Design Techniques
I haven't read the first paper, so I can't comment on that. The 2nd one shows various styles of coding FSMs, among which the 3 always blocks style, which I highly recommend, because it's a lot easier to debug (state transitions and FSM output are neatly separated). The link seems to be down, so here is the cached Google result.
You don't need a state machine. You already have state in the counter. All you need to do is detect the value you want to wrap at and load 0 into your counter at that point
In pseudo-code:
if count == 10000000 then
nextCount = 0;
else
nextCount = count + 1;
...or...
nextCount = count + 1;
if count == 10000000 then
resetCount = 1;
State machines are not too tricky. Use localparam (with a width, don't forget the width, not shown here because it is just one bit) to define labels for your states. Then create two reg variables (state_reg, state_next). The _reg variable is your actual register. The _next variable is a "wire reg" (a wire that can be assigned to inside a combinational always block). The two things to remember are to do X_next = X_reg; in the combinational always block (and then the rest of the combinational logic) and X_reg <= X_next; in the sequential always block. You can get fancy for special cases but if you stick to these simple rules then things should just work. I try not to use instantiation for very simple things like adders since Verilog has great support for adders.
Since I work with FPGAs, I assign initial values to my registers and I don't use a reset signal. I'm not sure but for ASIC design I think it is the opposite.
localparam STATE_RESET = 1'b0, STATE_COUNT = 1'b1;
reg [23:0] cntr_reg = 24'd0, cntr_next;
reg state_reg = STATE_COUNT, state_next;
always #* begin
cntr_next = cntr_reg; // statement not required since we handle all cases
if (cntr_reg == 24'd10_000_000)
cntr_next = 24'd0;
else
cntr_next = cntr_reg + 24'd1;
state_next = state_reg; // statement required since we don't handle all cases
case (state_reg)
STATE_COUNT: if (cntr_reg == 24'd10_000_000) state_next = STATE_RESET;
endcase
end
always #(posedge clk) begin
cntr_reg <= cntr_next;
state_reg <= state_next;
end
I found this book to be very helpful. There is also a VHDL version of the book, so you can use both side-by-side as a Rosetta Stone to learn VHDL.