So I'm trying to implement my first FSM, and I'm getting very confused.
The code's a bit long, so let me summarize:
I start with declaring inputs and outputs
Then state declarations (I have five plus three placeholders)
Then Current state assignment, which is sequential
always @(posedge clk)
begin
    if (rst == 1'b1)
        Current_State <= MainGreen;
    else
        Current_State <= Next_state;
end
And then... I get lost. I originally just had one big ol' sequential circuit that assigned next_state and outputs, but this was messy and probably had lots of errors.
What I have right now simply has next_state logic, but nothing to do with outputs:
always @*
begin
    Next_state = Current_State;
    case (Current_State)
        MainGreen:
            if (count && expired)
                Next_state = MainYel;
        MainYel:
            if (WR && expired)
                Next_state = AllRed;
            else if (expired)
                Next_state = SideGreen;
        AllRed:
            if (expired)
                Next_state = SideGreen;
        SideGreen:
            if (sensor && expired)
                Next_state = SideYel;
        SideYel:
            Next_state = MainGreen;
    endcase
end
I have about eight outputs based on state alone and four based on state and input. How should I assign them?
You're 90% of the way there. There are two ways to proceed (well probably more than that, but I'll give you what I think are two of the best options):
First, do you have a lot of outputs that only get asserted for a small minority of the states? If so, I'd recommend something like this in your combinatorial always block:
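always @*
begin
    // default output values
    output1 = 1'b0;
    output2 = 1'b0;
    output3 = 1'b0;
    output4 = 1'b0;
    case (Current_State)
        MainGreen:              // state names borrowed from the question
        begin
            output2 = 1'b1;
            // calculate next state
        end
        MainYel:
        begin
            output4 = 1'b1;
            // calculate next state
        end
    endcase
end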
This is probably the most efficient way to code your state machine since you don't need to define every output in every state. Now if you have each output active in a lot of different states, it might be easier for you to define those outputs in every state in your case statement.
A final way, which I wouldn't recommend, is to derive the state machine outputs in separate assign statements. It will work just as well, but I think keeping the outputs together with the next-state logic is much easier for code maintenance and a good habit to develop. It's one thing to hack out some code quickly for an assignment; it's another to develop code for a real product that may be updated several times over its life, where maintainability is essential (something I had to learn on the job because no one taught it in university).
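For the mix described in the question (some outputs that depend on state alone, some that depend on state and an input), the default-then-override style could be sketched roughly as follows. The module name, the two output names, and the state encodings are made up for illustration; only Current_State, WR, and the state names come from the question, and WR is assumed here to be a walk request.

```verilog
// Sketch only: module name, output names and state encodings are hypothetical.
module fsm_outputs_sketch (
    input  wire [2:0] Current_State,
    input  wire       WR,               // assumed to be a walk request
    output reg        main_green_lamp,  // depends on state alone
    output reg        walk_lamp         // depends on state and an input
);
    localparam MainGreen = 3'd0,
               MainYel   = 3'd1,
               AllRed    = 3'd2;

    always @* begin
        // Defaults: every output gets a value on every pass, so no latches.
        main_green_lamp = 1'b0;
        walk_lamp       = 1'b0;

        case (Current_State)
            MainGreen: main_green_lamp = 1'b1;  // state-only output
            AllRed:    walk_lamp       = WR;    // state-and-input output
            default:   ;                        // keep the defaults
        endcase
    end
endmodule
```

Only the states that assert an output need to mention it; every other state falls back to the defaults at the top of the block.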
I am trying to implement an FSM that reacts to either one of two buttons being pressed. Let's call these buttons A and B. What I want is something like:
always @(posedge A or posedge B) begin
    if (A) begin **do one thing**
    end else if (B) begin **do another**
    end
end
The situation I am scared of is the case when, for example, the user is holding down button A and then presses B. The if statement would detect that A is high, when the actual sensitive parameter I want a reaction to is B. How can I do this in Verilog?
One way or another, you need to keep track of the state of "A has been depressed and has not yet been released," etc. You can either track that state externally to your state machine, as in @wilcroft's answer, or as part of your state machine. To handle this as part of the state machine you would need to change the sensitivity list to respond to either presses or releases (i.e. not just posedge), and include state information for either or both buttons being on:
always @(A or B) begin
    if (state == NONE_ON) begin
        if (A) begin next_state = A_ON; **do one thing**
        end else if (B) begin next_state = B_ON; **do another**
        end
    end
    else if (state == A_ON) begin
        if (!A) begin next_state = NONE_ON;
        end else if (B) begin next_state = AB_ON; **do the B things**
        end
    end
    else if (state == B_ON) begin
        if (!B) begin next_state = NONE_ON;
        end else if (A) begin next_state = AB_ON; **do the A things**
        end
    end
    else if (state == AB_ON) begin
        if (!A) begin next_state = B_ON;
        end else if (!B) begin next_state = A_ON;
        end
    end
end
In some sense keeping track of state like this is the whole point of a state machine which is what you say you are trying to build, and this is a common motivation to build a state machine in the first place.
However if the state machine you were intending to build is at all complex, then adding further A/B information to the state table could significantly multiply your states and make the overall state machine a good deal more complex and spaghetti-like, since your intended states may end up as additional sub-states of A_ON, and of B_ON, and also of AB_ON.
On the other hand, depending on what you were trying to do, given that you were trying to build a state machine based on A or B being pressed it seems very likely that at least some of this information might already be implied (for example that at least some button was pressed) in the states you originally had in mind, so it might not change the complexity all that much.
(Note that if you're concerned about handling the possibility of both buttons being pressed or released simultaneously, that would make this implementation more complicated as well.)
One way would be to register the values of A and B, and compare them to their current values. This requires a system clock of some sort, but you're likely already using one for your FSM.
As an example:
input A, B;
input clk;
reg A_prev, B_prev;
always @(posedge clk)
begin
    A_prev <= A;
    B_prev <= B;
end
always @(*)
begin
    if (A && !A_prev) **do whatever**
    else if (B && !B_prev) **do whatever**
end
Since A and B are registered, the second always block will detect if A (or B) was low on the previous clock cycle and is now high.
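As a self-contained illustration of the same idea, here is a minimal sketch that turns each button into its own press pulse. The module and output names are hypothetical, and it assumes A and B are already synchronized to clk and debounced.

```verilog
// Sketch only: module and output names are hypothetical.
// Assumes A and B are already synchronized to clk and debounced.
module button_edges (
    input  wire clk,
    input  wire A,
    input  wire B,
    output wire A_pressed,  // high from A's 0 -> 1 change until the next clock edge
    output wire B_pressed   // high from B's 0 -> 1 change until the next clock edge
);
    reg A_prev = 1'b0;
    reg B_prev = 1'b0;

    // Remember what each button looked like on the previous clock edge.
    always @(posedge clk) begin
        A_prev <= A;
        B_prev <= B;
    end

    // A press is "high now, low on the previous edge".
    assign A_pressed = A & ~A_prev;
    assign B_pressed = B & ~B_prev;
endmodule
```

With this arrangement, holding A down and then pressing B produces a pulse on B_pressed only, because A_prev is already high by then.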
This is something that I think should be doable, but I am failing at how to do it in the HDL world. Currently I have a design I inherited that is summing a multidimensional array, but we have to pre-write the addition block because one of the dimensions is a synthesize-time option, and we cater the addition to that.
If I have something like reg tap_out[src][dst][tap], where src and dst are set to 4 and tap can be between 0 and 15 (16 possibilities), I want to be able to assign output[dst] to be the sum of all the tap_out for that particular dst.
Right now our summation block takes all the combinations of tap_out for each src and tap and sums them in pairs for each dst:
Is there a way to do this better in Verilog? In C I would use some for-loops, but that doesn't seem possible here.
For-loops work perfectly fine in this situation:
integer src_idx, tap_idx;
always @* begin
    sum = 0;
    for (src_idx=0; src_idx<4; src_idx=src_idx+1) begin
        for (tap_idx=0; tap_idx<16; tap_idx=tap_idx+1) begin
            sum = sum + tap_out[src_idx][dst][tap_idx];
        end
    end
end
It does unroll into a large block of combinational logic during synthesis, and the results should be the same as adding up the bits line by line.
The propagation delay through a large summing network could cause a timing issue. A good synthesizer should find the optimum timing/area trade-off when given the clock constraint. If the logic is too complex for the synthesizer, then add your own partial-sum logic that can run in parallel:
reg [`WIDTH-1:0] /*keep*/ partial_sum [3:0]; // tell synthesis to preserve these nets
integer src_idx, tap_idx;
always @* begin
    sum = 0;
    for (src_idx=0; src_idx<4; src_idx=src_idx+1) begin
        partial_sum[src_idx] = 0;
        // partial sums are independent of each other so they can run in parallel
        for (tap_idx=0; tap_idx<16; tap_idx=tap_idx+1) begin
            partial_sum[src_idx] = partial_sum[src_idx] + tap_out[src_idx][dst][tap_idx];
        end
        sum = sum + partial_sum[src_idx]; // sum the partial sums
    end
end
If timing is still an issue, then you must treat the logic as multi-cycle and sample the value some clock cycles after the input changed.
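A related alternative to a multi-cycle path is to pipeline the sum: register the partial sums on one clock edge and add them on the next. The sketch below does this for a single dst. The module name, the port names, and the flattened 64-bit input (4 sources x 16 one-bit taps) are assumptions made to keep the example self-contained; they are not taken from the original design.

```verilog
// Sketch only: module name, port names and the flattened input are hypothetical.
module tap_sum_pipelined (
    input  wire        clk,
    input  wire [63:0] taps,  // 4 sources x 16 one-bit taps for one dst
    output reg  [6:0]  sum    // 0..64 fits in 7 bits
);
    reg [4:0] partial_sum [0:3];  // each partial sum is 0..16
    reg [4:0] acc;                // scratch value, only used inside the loop
    integer src_idx, tap_idx;

    // Stage 1: register one partial sum per source.
    always @(posedge clk) begin
        for (src_idx = 0; src_idx < 4; src_idx = src_idx + 1) begin
            acc = 5'd0;
            for (tap_idx = 0; tap_idx < 16; tap_idx = tap_idx + 1) begin
                acc = acc + taps[src_idx*16 + tap_idx];
            end
            partial_sum[src_idx] <= acc;
        end
    end

    // Stage 2: add the registered partial sums one clock later.
    always @(posedge clk) begin
        sum <= partial_sum[0] + partial_sum[1] + partial_sum[2] + partial_sum[3];
    end
endmodule
```

The result appears two clock edges after the inputs are presented, in exchange for a shorter combinational path in each stage.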
In RTL (the level of abstraction you are likely modelling with your HDL), you have to balance parallelism with time. By doing things in parallel, you save time (typically) but the logic takes up a lot of space. Conversely, you can make the adds completely serial (do one add at one time) and store the results in a register (it sounds like you want to accumulate the total sum, so I will explain that).
It sounds like the fully parallel approach is not practical for your uses (if it is and you want to rewrite it, look up generate statements). So, you'll need to create a small FSM and accumulate the sums into a register. Here's a basic example, which sums an array of 16-bit numbers (assume they are set somewhere else):
reg [15:0] arr[0:9]; // numbers
reg [31:0] result; // accumulated sum
wire [31:0] next_result; // result plus the current array element
reg load_result; // load signal for register containing result
reg clk, rst_L; // These are the clock and reset signals (reset asserted low)

/* This is a register for storing the result */
always @(posedge clk, negedge rst_L) begin
  if (~rst_L) begin
    result <= 32'd0;
  end
  else begin
    if (load_result) begin
      result <= next_result;
    end
  end
end

/* A counter for knowing which element of the array we are adding */
reg [3:0] counter, next_counter;
reg load_counter;
always @(posedge clk, negedge rst_L) begin
  if (~rst_L) begin
    counter <= 4'd0;
  end
  else begin
    if (load_counter) begin
      counter <= counter + 4'd1;
    end
  end
end

/* Perform the addition */
assign next_result = result + arr[counter];

/* Define the state machine states and state variable */
localparam IDLE = 2'd0;
localparam ADDING = 2'd1;
localparam DONE = 2'd2;
reg [1:0] state, next_state;

/* A register for holding the current state */
always @(posedge clk, negedge rst_L) begin
  if (~rst_L) begin
    state <= IDLE;
  end
  else begin
    state <= next_state;
  end
end

/* The next state and output logic, this will control the addition */
always @(*) begin
  /* Defaults */
  next_state = IDLE;
  load_result = 1'b0;
  load_counter = 1'b0;
  case (state)
    IDLE: begin
      next_state = ADDING; // Start adding now (right away)
    end
    ADDING: begin
      load_result = 1'b1; // Load in the result
      if (counter == 4'd9) begin // If we're on the last element, stop incrementing counter, we are done
        load_counter = 1'b0;
        next_state = DONE;
      end
      else begin // Otherwise, keep adding
        load_counter = 1'b1;
        next_state = ADDING;
      end
    end
    DONE: begin // finished adding, result is in result!
      next_state = DONE;
    end
  endcase
end
There are lots of resources on the web explaining FSMs if you are having trouble with the concept, but they can be used to implement your basic C-style for loop.
Is there any other construct like always (one that only runs when the sensitive signal changes and doesn't iterate as long as the signal stays the same) that can be cascaded, separately or within the always, and is synthesizable in Verilog?
While I don't think there's a construct specifically like this in Verilog, there is an easy way to do this. If you do an edge detect on the signal you want to be sensitive to, you can just "if" on that in your always block. Like such:
reg event_detected;
reg [WIDTH-1:0] sensitive_last;
always @ (posedge clk) begin
  if (sensitive_signal != sensitive_last) begin
    event_detected <= 1'b1;
  end else begin
    event_detected <= 1'b0;
  end
  sensitive_last <= sensitive_signal;
end
// Then, where you want to do things:
always @ (posedge clk) begin
  if (event_detected) begin
    // Do things here
  end
end
The issue with doing things with nested "always" statements is that it isn't immediately obvious how much logic it would synthesize to. Depending on the FPGA or ASIC architecture you would have a relatively large register and extra logic that would be instantiated implicitly, making things like waveform debugging and gate level synthesis difficult (not to mention timing analysis). In a world where every gate/LUT counts, that sort of implicitly defined logic could become a major issue.
The assign statement is the closest to always that you can get. assign can only be used for continuous assignment. The left-hand side of the assignment must be a wire; SystemVerilog also allows logic.
I prefer the always block over assign. I find simulations give better performance when signals that usually update at the same time are grouped together. I believe the optimizer in the synthesizer can do a better job with always, but this might depend on the synthesizer being used.
For synchronous logic you'll need an always block. There is no issue reading hardware switches within the always block. The FPGA board may already debounce the input for you. If not, then send the input through a two-stage pipeline before using it in your code. This helps with potential setup/hold problems.
always @(posedge clk) begin
    pre_sync_human_in <= human_in;
    sync_human_in <= pre_sync_human_in;
end

always @* begin
    case( sync_human_in )
        0 : // do this
        1 : // do that
        // ...
    endcase
end

always @(posedge clk) begin
    if ( sync_human_in==0 ) begin /* do this */ end
    else begin /* else do */ end
end
If you want to do a handshake where the state machine waits for a human to enter a multi-bit value, then add two states that wait for the input: one state that waits for not ready (stale bit from the previous input), and another that waits for ready:
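always @(posedge clk) begin
    case (state)
        // ...
        WAIT_NOT_READY : // hypothetical state name
            // ready bit is still high for the last input, wait for not ready
            if (sync_human_in[READ_BIT]) state <= WAIT_NOT_READY;
            else                         state <= WAIT_READY;
        WAIT_READY :     // hypothetical state name
            // ready bit is low, wait for it to go high before proceeding
            if (sync_human_in[READ_BIT]) state <= USE_INPUT; // hypothetical next state
            else                         state <= WAIT_READY;
        // ...
    endcase
end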
If I want statements to happen in parallel and another statement to happen when all other statements are done with, for example:
task read;
begin
    if (de_if==NOP) begin
        dp_op <= 3'b000;
        dp_phase = EXEC;
    end
    else begin
        if (de_if==EXEC_THEN) begin
            dp_const <= de_src3[0];
            dp_src <= de_src3;
            dp_op <= {NOP,de_ctrl3};
            dp_dest <= de_dest1;
        end
        else if (get_value(de_ctrl1,de_src1)==dp_mem[de_src2]) begin
            dp_const <= de_src3[0];
            dp_src <= de_src3;
            dp_op <= {NOP,de_ctrl3};
            dp_dest <= de_dest1;
        end
        else begin
            dp_const <= de_src4[0];
            dp_src <= de_src4;
            dp_op <= {NOP,de_ctrl4};
            dp_dest <= de_dest2;
        end
        #1 dp_phase=READ;
    end
end
endtask
In this code I want the statement dp_phase = READ to only be executed after all other assignments are done, how do I do it?
As you can see, what I did is wait 1 clock before the assignment, but I do not know if this is how it's done ...
You need a state machine. That's the canonical way to make things happen in a certain sequence. Try to remember that using a hardware description language is not like using a regular programming language: you are just describing the kind of behavior that you would like the hardware to have.
To make a state machine you will need a state register, one or more flip-flops that keep track of where you are in the desired sequence of events. The flip-flops should be updated on the rising clock edge but the rest of your logic can be purely combinational.
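A minimal sketch of that structure is shown below: a clocked state register plus purely combinational next-state and output logic, so that the "read" phase can only begin one clock after the "load" phase. All names here (the module, the states, start, load_regs, phase_read) are hypothetical and are not taken from the question's code.

```verilog
// Sketch only: module, state and signal names are hypothetical.
module phase_sequencer (
    input  wire clk,
    input  wire rst,
    input  wire start,
    output reg  load_regs,   // asserted while the operand registers are loaded
    output reg  phase_read   // asserted one clock after the load
);
    localparam IDLE = 2'd0,
               LOAD = 2'd1,
               READ = 2'd2;

    reg [1:0] state, next_state;

    // State register: the only clocked element.
    always @(posedge clk) begin
        if (rst) state <= IDLE;
        else     state <= next_state;
    end

    // Next-state and output logic: purely combinational.
    always @* begin
        next_state = state;
        load_regs  = 1'b0;
        phase_read = 1'b0;
        case (state)
            IDLE: if (start) next_state = LOAD;
            LOAD: begin
                load_regs  = 1'b1;
                next_state = READ;
            end
            READ: begin
                phase_read = 1'b1;
                next_state = IDLE;
            end
            default: next_state = IDLE;
        endcase
    end
endmodule
```

The ordering comes from the state transitions themselves rather than from a #1 delay, which synthesis tools ignore.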
I encountered a problem with synthesis where if I had two variables in an if statement, Synthesis will fail (with a very misleading and unhelpful error message).
Given the code snippet below
//other states here
if (packet_size < payload_length) begin
    packet_size <= packet_size + 1;
    //Code to place byte into ram that only triggers with a toggle flag
    next_state = GET_PAYLOAD_DATA;
end else begin
    next_state = GET_CHKSUM2;
end
I get an error in Xilinx ISE during synthesis:
ERROR:Xst:2001 - Width mismatch detected on comparator next_state_cmp_lt0000/ALB. Operand A and B do not have the same size.
The error claims that next_state isn't correct, but if I take out payload_length and assign a static value to it, it works perfectly fine. As both packet_size and payload_length are of type integer, they are the same size, so that is not the problem. Therefore I assume it's a similar problem to for loops not being implementable in hardware unless it is a static loop with a defined end. But if statements should work, as it is just a comparator between 2 binary values.
What I was trying to do here is that when a byte is received by my module, it will be added into RAM until the size of the entire payload (which I get from earlier packet data) is reached, then change to a different state to handle the checksum. As the data only comes in 1 byte at a time, I re-enter this state multiple times until the counter reaches the limit, then I set the next state to something else.
My question is then, how do I achieve the same results of calling my state and repeat until the counter has reached the length of the payload without the error showing up?
Snippets of how packet_size and payload_length are declared, as requested in comments
integer payload_length, packet_size;
initial begin
    //other stuff
    packet_size <= 0;
end
always @ (posedge clk) begin
    //case statements with various states
    if (rx_toggle == 1) begin
        packet_size <= packet_size + 1;
        addr <= 3;
        din <= rx_byte_buffer;
        payload_length <= rx_byte_buffer;
        next_state = GET_PAYLOAD_DATA;
    end else begin
        next_state = GET_PAYLOAD_LEN;
    end
end
rx_byte_buffer is a register of the input data my module receives as 8 bits wide, while packet_size increments in various other states of the machine prior to the one you see above.
I have gotten around the error by switching the if statement conditionals around, but still want to understand why that would change anything.
Some errors in the code stick out right away. Fixing them may not solve this particular problem, but they need to be corrected because they will cause differences between simulation and hardware tests.
The next-state logic needs to be in a separate always block that is not triggered by the posedge of the clock. Its sensitivity list needs to include things like "state", or simply be "*". And if you wanted the next-state logic to be registered like it is now (which you don't), you should use a nonblocking assignment. This is described in great detail in the Cummings paper, provided below.
The code should look something like this:
always @ (*) begin
    packet_size_en = 1'b0; // default, so no latch is inferred
    //case statements with various states
    if (rx_toggle == 1) begin
        packet_size_en = 1'b1;
        //these will need to be changed in a similar manner
        addr <= 3;
        din <= rx_byte_buffer;
        payload_length <= rx_byte_buffer;
        next_state = GET_PAYLOAD_DATA;
    end else begin
        next_state = GET_PAYLOAD_LEN;
    end
end
always @(posedge clk) begin
    if (packet_size_en)
        packet_size <= packet_size + 1;
end
Also, the first thing I would try is to give these a defined length by making them of type reg (I assume that you won't need a signed number, so it should make no difference in simulation). Outside of generate blocks, you should try not to let synthesis play around with integers.
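As a rough sketch of that suggestion, the counter and the length can be declared as 8-bit regs so that both comparator operands have the same explicit width. The 8-bit width is an assumption based on rx_byte_buffer being 8 bits wide; the module name and the load_length and payload_done signals are hypothetical.

```verilog
// Sketch only: widths assume rx_byte_buffer is 8 bits wide;
// the module name, load_length and payload_done are hypothetical.
module payload_counter_sketch (
    input  wire       clk,
    input  wire       rst,
    input  wire       load_length,     // capture the length byte
    input  wire       packet_size_en,  // count one received payload byte
    input  wire [7:0] rx_byte_buffer,
    output wire       payload_done
);
    reg [7:0] payload_length;
    reg [7:0] packet_size;

    always @(posedge clk) begin
        if (rst) begin
            payload_length <= 8'd0;
            packet_size    <= 8'd0;
        end else begin
            if (load_length)    payload_length <= rx_byte_buffer;
            if (packet_size_en) packet_size    <= packet_size + 8'd1;
        end
    end

    // Both operands are 8 bits wide, so the comparator widths match.
    assign payload_done = (packet_size >= payload_length);
endmodule
```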