16-bit CPU design: Issues with implementing fetch-execute cycle - verilog

I am doing a computer architecture course on Coursera called
NandtoTetris and have been struggling with my 16-bit CPU design. The
course uses a language called HDL, which is a very simple Verilog-like
language.
I have spent so many hours trying to iterate on my CPU design based on
the diagram below and I don't understand what I am doing wrong. I
tried my best to represent the fetch and execute mechanics. Does
anyone have any advice on how to solve this?
Here are the design and control syntax diagram links:
CPU IO high-level diagram:
Gate level CPU diagram:
Control instruction syntax:
Here is my code below:
// Put your code here:
// Instruction decoding: from i of “ixxaccccccdddjjj”
// A-instruction: the instruction is the 16-bit value of the constant that should be loaded into the A register
// C-instruction: the a- and c-bits code the comp part, the d- and j-bits code dest and jump (x-bits are ignored).
Mux16(a=outM, b=instruction, sel=instruction[15], out=aMUX); // 0 for A-instruction or 1 for a C-instruction
Not(in=instruction[15], out=aInst); // assert A instruction with op-code as true
And(a=instruction[15], b=instruction[5], out=cInst); // assert write-to-A C-instruction with op-code AND d1-bit
Or(a=aInst, b=cInst, out=aMuxload); // assert A-instruction or write-to-A C-instruction is true
ARegister(in=aMUX, load=cInst, out=addressM); // load A-instruction or write-to-A C-instruction
// For C-instruction, a-bit determines if ALU will operate on A register input (0) vs M input (1)
And(a=instruction[15], b=instruction[12], out=Aselector); // assert that c instruction AND a-bit
Mux16(a=addressM, b=inM, sel=Aselector, out=aluMUX); // select A=0 or A=1
ALU(x=DregisterOut, y=aluMUX, zx=instruction[11], nx=instruction[10], zy=instruction[9], ny=instruction[8], f=instruction[7], no=instruction[6], zr=zr, ng=ng,out=outM);
// The 3 d-bits of “ixxaccccccdddjjj” determine which registers are destinations for ALUout
// Whenever there is a C-Instruction and d2 (bit 4) is a 1 the D register is loaded
And(a=instruction[15], b=instruction[4], out=writeD); // assert that c instruction AND d2-bit
DRegister(in=outM, load=writeD, out=DregisterOut); // d2 of d-bits for D register destination
// Whenever there is a C-Instruction and d3 (bit 3) is a 1 then writeM (aka RAM[A]) is true
And(a=instruction[15], b=instruction[3], out=writeM); // assert that c instruction AND d3-bit
// Program counter to fetch next instruction
// PC logic: if (reset==1), then PC = 0
// else:
// load = comparison(instruction jump bits, ALU output zr & ng)
// if load == 1, PC = A
// else: PC ++
And(a=instruction[2], b=ng, out=JLT); // J2 test against ng: out < 0
And(a=instruction[1], b=zr, out=JEQ); // J1 test against zr: out = 0
Or(a=ng, b=zr, out=JGToutMnot); // true if ng or zr is set, i.e. out <= 0
Not(in=JGToutMnot, out=JGToutM); // true if ng and zr are both zero, i.e. out > 0
And(a=instruction[0], b=JGToutM, out=JGT);
Or(a=JLT, b=JEQ, out=JLE); // out <= 0
Or(a=JGT, b=JLE, out=JMP); // final jump assertion
And(a=instruction[15], b=JMP, out=PCload); // C instruction AND JMP assert to get the PC load bit
// load in all values into the programme counter if load and reset, otherwise continue increasing
PC(in=addressM, load=PCload, inc=true, reset=reset, out=pc);

It is tricky to answer these kinds of questions without doing the work for you, which isn't helpful to you in the long run.
Some general thoughts.
Consider each element in isolation (including the circles where signals come together).
Label each line between elements with a name. These will become internal control lines. It helps reduce the chances of confusion.
Be very careful about junk outputs. If you're not supposed to be putting valid data on outM, use a Mux to output false.
Potential gotcha: I seem to remember that it's a bad idea to use a design output (like outM) as an input to something else. Outputs should only be outputs. Right now you are sending the output of the ALU to outM and using outM as an input to other elements. I suggest you try outputting the ALU to a new signal "ALUout", and using that as the input for the other elements and (through a mux with false controlled by writeM) outM. But remember, writeM is an output! So the block that generates writeM needs to generate a copy of itself to use as the control to the mux. FORTUNATELY, a block can have multiple out statements!
For example, right now you're generating writeM like this (I won't comment on whether it is wrong, I am just using it as an illustration):
And(a=instruction[15], b=instruction[3], out=writeM);
You can create a second output like this:
And(a=instruction[15], b=instruction[3], out=writeM, out=writeM2);
and then "clean" your outM like this:
Mux16(a=false,b=ALUout,sel=writeM2,out=outM);
Good luck!

Related

Proper way to use a bus in a for loop in SystemVerilog?

I'm trying to make a module in SystemVerilog that can find the dot product between two vectors with up to 8 8-bit values. I'm trying to make it flexible for vectors of different length, so I have an input called EN that's 3 bits and determines the number of multiplications to perform.
So, if EN == 3'b101, the first five values of each vector will be multiplied and added together, then output as a 32-bit value. Right now, I'm trying to do that like:
int acc = 0;
always_comb
begin
for(int i = 0; i < EN; i++) begin
acc += A[i] * B[i];
end
end
assign OUT = acc;
Where A and B are the two input vectors. However, SystemVerilog is telling me there's an illegal comparison being performed between i and EN.
So my questions are:
1) Is this the proper way to have a variable vector "length" in SystemVerilog?
2) If so, what's the proper way to iterate n times where n is the value on a bus?
Thank you!
I have to guess here, but I'm assuming it's a synthesizer complaining about that code. The synthesizer I use accepts your code with minor modifications, but maybe not all do since the loop can't be unrolled statically (notice I have input logic [2:0] EN, maybe input int EN does not work due to having too big a max number of cycles). Your loop per se (question #2) is fine.
int acc;
always_comb
begin
// If acc is not reset always_comb tries to update on its old value and puts
// it in sensitivity list, halting simulation... also no initialization to variable
// used in always_comb is allowed.
acc = 0;
...
This is a somewhat decent reason to complain about your otherwise perfectly good code, and the tool does not make the assumption that it is "reasonable" to generate all possible loops in this specific case (if EN was an unsigned integer your chip would be stupidly huge after all): you can force the tool to infer all possibilities with something that looks like the following:
module test (
input int A[8],
input int B[8],
input logic [2:0] EN,
output int OUT
);
int acc[8]; // 8 accumulators
always_comb begin
acc[0] = A[0] * B[0]; // acc[-1] does not exist, different formula!
for (int i = 1; i < 8; i++) begin
// Each partial sum builds on previous one.
acc[i] = acc[i-1] + (A[i] * B[i]);
end
end
assign OUT = acc[EN]; // EN used as selector for a multiplexer on partial sums
endmodule: test
The above module is an explicit description of the "parallel loop" my synthesizer infers.
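If the goal is to keep the question's semantics exactly (sum only the products whose index is below EN, and output 0 when EN is 0), one common alternative is a constant-bound loop with a run-time mask. Note that acc[EN] in the module above also includes the product at index EN. The following is a minimal sketch under those assumptions; the module name dot_product_masked and the 8-bit element widths are illustrative choices based on the question's description, not code from either post.

```systemverilog
// Sketch only: module name and port widths are hypothetical.
module dot_product_masked (
  input  logic [7:0]  A [8],
  input  logic [7:0]  B [8],
  input  logic [2:0]  EN,    // number of element pairs to include (0..7 with 3 bits)
  output logic [31:0] OUT
);
  always_comb begin
    OUT = '0;                          // assign first, so no old value is read back
    for (int i = 0; i < 8; i++) begin  // constant bound: can be unrolled statically
      if (i < int'(EN))                // run-time mask replaces the run-time loop bound
        OUT += A[i] * B[i];            // operands are widened to 32 bits by context
    end
  end
endmodule
```

With a 3-bit EN, at most seven products can be selected; covering all eight elements would need a 4-bit EN.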
Regarding your question #1, the answer is "it depends". In hardware there is no variable length, so unless you fix the number of iterations as a parameter as opposed to an input you either have a maximum size and ignore some values or you iterate over multiple cycles using pointers to some memory. If you want to have a variable vector length in a test (not going to silicon) then you can declare a "dynamic array" that you can resize at will (IEEE 1800-2017, 7.5: Dynamic arrays):
int dyn_vec[];
As a final side note: outside of testbenches, prefer the 4-state integer over the 2-state int, so that X values are caught and RTL/synthesis mismatches are avoided.
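For testbench-only code, the dynamic array mentioned above can be sized and resized at run time. A minimal sketch, with a hypothetical module name and arbitrary sizes chosen only for illustration:

```systemverilog
// Simulation-only sketch: dynamic arrays are not synthesizable.
module dyn_array_demo;
  int dyn_vec[];                       // no size yet

  initial begin
    dyn_vec = new[5];                  // allocate 5 elements
    foreach (dyn_vec[i])
      dyn_vec[i] = i + 1;              // fill with 1..5
    $display("size after new[5]: %0d", dyn_vec.size());

    dyn_vec = new[8](dyn_vec);         // grow to 8, copying the old contents
    $display("size after resize: %0d, dyn_vec[4]=%0d",
             dyn_vec.size(), dyn_vec[4]);

    dyn_vec.delete();                  // release the storage, size becomes 0
  end
endmodule
```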

Interpreting concatenation operator with comparator

Our professor gave us this skeleton for a case statement, and so far no one is able to understand what it's doing.
always @(*)
begin
case(state)
3'b000:{nout, nstate} = (in)?(in=1):(in=0)
endcase
end
More insight:
This is being implemented as a button debouncer.
nout is the output of the next state: a single bit
nstate is the next state: 3 bits
in is also 1 bit wide
My understanding is that the concatenation operator will append nout to nstate resulting in 4 bits. (ie: if nout is 1 and nstate is 010, this part of the statement will produce 1010)
On the other side of the equality assignment we have a simple comparator, which upon further inspection, doesn't seem to do anything...
It's basically saying
if(in == 1) {
in = 1;
} else {
in = 0;
}
With that understanding, we're assigning a single bit to nout and nstate?
This understanding doesn't make any sense to me. I've compared my notes with 2 other classmates who wrote the exact same thing, so I'm thinking either we don't understand the code or there's an error.
Further insight:
Upon researching further, I've found the state diagram appear in multiple places, which makes me fairly confident that this is a common Moore Machine.
I hope that you did not cut and paste those expressions correctly.
3'b000:{nout, nstate} = (in)?(in=1):(in=0);
The above statement is a complete mess. Most likely it will fail any linting. It might be ok syntactically, but makes no sense logically and makes such code unreadable and not maintainable. It has to look like the following:
3'b000:{nout, nstate} = (in)?(1'b1):(1'b0);
The left-hand-side concatenation forms a 4-bit target whose lower 3 bits are nstate and whose upper bit is nout. The right-hand-side ternary operator produces either a 1-bit '1' or a 1-bit '0' (the original expression actually yields the same value, because 'in' is 1 bit wide). Verilog extends the 1-bit right-hand side to the width of the left-hand side by padding the missing upper bits with '0's. As a result, nout will always be 0 and nstate will be either 3'b000 or 3'b001, depending on the value of in.
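The zero-extension described above can be checked in a small simulation. This is a sketch only; the module name is hypothetical and the signal widths follow the question's description (nout is 1 bit, nstate is 3 bits, in is 1 bit).

```systemverilog
// Simulation-only sketch of the width extension on a concatenated target.
module concat_extend_demo;
  logic       in;
  logic       nout;
  logic [2:0] nstate;

  initial begin
    in = 1'b1;
    {nout, nstate} = in ? 1'b1 : 1'b0;   // 1-bit result is zero-extended to 4 bits
    $display("in=%b -> nout=%b nstate=%b", in, nout, nstate);  // nout=0 nstate=001

    in = 1'b0;
    {nout, nstate} = in ? 1'b1 : 1'b0;
    $display("in=%b -> nout=%b nstate=%b", in, nout, nstate);  // nout=0 nstate=000
  end
endmodule
```

To drive nout as well, the right-hand side would have to be a full 4-bit value, for example a concatenation of a 1-bit output and a 3-bit state.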

When would these blocks be executed?

1) When will this procedural block be executed?
output a;
reg a;
always @(a)
begin
// Do something...
end
In C this would be executed when "a" has a value non-zero.
2) When would this if statement be true?
if(!a)
begin
// Do something...
end
I come from C and I'm actually confused about Verilog.
Verilog is a 4-state simulation language, meaning that any bit in a variable can have one of 4 values: 0, 1, x and z. 'x' expresses an unknown value, and all variables in Verilog are initialized to this value at the beginning.
Now, if a is initialized to x, both of the following conditions will fail:
if (a) -> fails, because x is not true (in Verilog)
if (!a) -> fails, because !x is also unknown, i.e. x
You should check the Verilog 4-state simulation and arithmetic rules.
The always block, on the other hand, will wait until the value of 'a' changes, for example, from 'x' to '0'.
always @(a) ...
So, while 'a' is still 'x', the 'if' statement will fail, and the always block will not execute until you change the value of 'a' to something different from 'x'. Initialization of such variables is usually done in testbench code, say in an initial block. For example:
initial #10 a = 1; // initialize 'a' to '1' after 10 time units.
Since you are coming from C, note that there is a big difference in the language concept. Verilog is a parallel programming language: all procedural blocks are executed preemptively in parallel. initial blocks and always blocks are procedural blocks. Statements inside a block are executed sequentially, but two different blocks are executed in parallel. Events and delays cause preemption.
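The behaviour described above can be observed with a small simulation. This is a sketch only; the module name is hypothetical, and the 10-unit delay simply mirrors the initial block shown earlier.

```systemverilog
// Simulation-only sketch: how x affects if conditions and event triggering.
module x_truthiness_demo;
  reg a;  // 4-state variable, starts as x

  // Runs each time the value of a changes, including the change from x to 0.
  always @(a)
    $display("t=%0t: always block triggered, a=%b", $time, a);

  initial begin
    // At time 0, a is x, so neither condition is true.
    if (a)  $display("t=%0t: if (a) taken",  $time);
    else    $display("t=%0t: if (a) not taken",  $time);
    if (!a) $display("t=%0t: if (!a) taken", $time);
    else    $display("t=%0t: if (!a) not taken", $time);

    #10 a = 0;   // x -> 0 is a value change, so the always block wakes up
    #1;
    if (!a) $display("t=%0t: if (!a) taken", $time);
  end
endmodule
```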

Evaluation order for always blocks triggered within always blocks in Verilog?

I understand that, for 2 always blocks with the same trigger, their order of evaluation is completely unpredictable.
However, suppose I have:
always @(a) begin : blockX
c = 0;
d = a + 2;
if(c != 1) e = 2;
end
always @(a) begin : blockY
e = 3;
end
always @(d) begin : blockZ
c = 1;
e = 1;
end
Suppose block X evaluates first. Does changing d in blockX immediately jump to blockZ? If not, when is blockZ evaluated with respect to blockY?
My programmer's instinct thinks of the sequence of events as a stack, where evaluating blockX is like a function call to blockZ and I immediately jump there in the code, then finish evaluating blockX.
However, because we call the active events queue, well, a queue, this suggests blockZ is enqueued at the back of the active events queue, and I'm 100% guaranteed it will be evaluated last (unless there are other triggered always blocks).
There's also the intermediate possibility, where it's neither first nor last but is also evaluated in a random and unpredictable order.
So in this example, are 1, 2, or 3 all possible final values for e, depending on how the compiler is feeling at run time?
Additionally, while I understand, of course, this represents awful style, where might I find the specification for this kind of behavior?
Always blocks are not function calls. See a recent answer I just gave for a similar question. These blocks are concurrent processes. The LRM only guarantees the ordering of statements within a begin/end block. There is no defined ordering between concurrently executing begin/end blocks (see Section 4.7, Nondeterminism, in the 1800-2012 LRM). So a simulator is free to interleave the statements in any way as long as it honors the order within a single block.
So you are correct that e could have the final values 1, 2 or 3 depending on how a simulator decides to implement and optimize your code.

swap two variables in verilog using XOR

I have a line of data of 264 bits in memory buffer written using Verilog HDL.
buffer[2]=264'b000100000100001000000000001000000000000001000001000000000000000000000000000000000000100000010000010000100000000000100000000010000100001100000000000000000000000000000000000010000001000001000010000000000010000000000000010001010000000000000000000000000000000000001000;
I want to transfer 10 bits within the above row from buffer[2][147:138] to buffer[2][59:50], then transfer bits buffer[2][235:226] into buffer[2][147:138].
I tried to do this using XOR, but it does not work:
buffer[2][59:50]=buffer[2][59:50]^buffer[2][147:138];
buffer[2][147:138]=buffer[2][59:50]^buffer[2][147:138];
buffer[2][59:50]=buffer[2][59:50]^buffer[2][147:138];
buffer[2][235:226]=buffer[2][235:226]^buffer[2][147:138];
buffer[2][147:138]=buffer[2][235:226]^buffer[2][147:138];
buffer[2][235:226]=buffer[2][235:226]^buffer[2][147:138];
How can I do this without using non-blocking assignments?
You can swap with concatenations, no xor required:
{buffer[2][147:138],buffer[2][59:50]} = {buffer[2][59:50],buffer[2][147:138]};
{buffer[2][235:226],buffer[2][147:138]} = {buffer[2][147:138],buffer[2][235:226]};
Your title says swap, but your description says transfer. To transfer, you can still use the same approach:
{buffer[2][147:138],buffer[2][59:50]} = {buffer[2][235:226],buffer[2][147:138]};
// Or you can do this, beware order matters
buffer[2][59:50] = buffer[2][147:138];
buffer[2][147:138] = buffer[2][235:226];
Be careful where you do this in an always block. If done incorrectly, it can create a combinational feedback loop once synthesized. The bits must first be assigned a determinate value (ideally from a flop) before doing the swap.
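The concatenation swap can be tried in isolation on two small variables. The following is a minimal simulation sketch; the module name, the 8-bit width, and the test values are illustrative assumptions, not taken from the question's buffer.

```systemverilog
// Simulation-only sketch: swapping two variables with one blocking assignment.
module concat_swap_demo;
  logic [7:0] x, y;

  initial begin
    x = 8'hA5;
    y = 8'h3C;
    // The right-hand side is fully evaluated before the left-hand side is
    // written, so no temporary variable and no XOR trick is needed.
    {x, y} = {y, x};
    $display("x=%h y=%h", x, y);   // x now holds 3C and y holds A5
  end
endmodule
```

The same idea scales to the part-selects of buffer[2], as long as the selected ranges on the left-hand side do not overlap.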
Just create a new variable to hold the new, rearranged, array. This should not generate any logic, you are just rearranging wires.
reg [263:0] reArrBuffer [0:2];
assign reArrBuffer =
'{buffer[0],
buffer[1],
{buffer[2][263:148], buffer[2][235:226], buffer[2][137:60], buffer[2][147:138], buffer[2][49:0]}
};
Note: You need ' in front of the first { to create an assignment pattern for an unpacked array. It can be removed if buffer and reArrBuffer are packed.
