Register Uses in RISC-V - riscv

Am I loading the wrong register here?
RISC-V Error Message is:
Error in line 8: "t1" operand is of incorrect type
The Code is:
.data
var_a: .byte 23 52 63 72
var_b: .word 235 263 722 352
.text
main: lui s0, 0x10010
lw t0, 4(s0)
lb t1, 2(s0)
addi t2, t0, t1
lb t3, 3(s0)
sub t4, t2, t3
sw t4, 22(s0)
exit: ori a7, zero, 10
ecall

addi adds a register and an immediate, i.e. a constant value (hence the "i" at the end), as in addi t2, t0, 5 (t2 = t0 + 5). To add two registers, use add instead: add t2, t0, t1
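To make the distinction concrete, here is a toy Python model of the two instructions. It is only a sketch, not an assembler: the register file is a plain dict and the function names are made up for illustration. The one hard fact it encodes is that the third operand of addi is a 12-bit signed constant, while the third operand of add is a register.

```python
# Toy model of the RV32I add/addi operand rules; all names are illustrative.

def add(regs, rd, rs1, rs2):
    """add rd, rs1, rs2: all three operands must be registers."""
    for r in (rd, rs1, rs2):
        if r not in regs:
            raise TypeError(f'"{r}" operand is of incorrect type')
    regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF


def addi(regs, rd, rs1, imm):
    """addi rd, rs1, imm: the third operand must be a 12-bit signed constant."""
    if not isinstance(imm, int):
        raise TypeError(f'"{imm}" operand is of incorrect type')
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    regs[rd] = (regs[rs1] + imm) & 0xFFFFFFFF
```

Calling addi(regs, "t2", "t0", "t1") raises the same kind of complaint the simulator printed, while add(regs, "t2", "t0", "t1") goes through.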

Related

Getting error in primitive output connection must be a scalar var or net

I am a beginner in using HDL and have made several basic modules in Verilog. Today, while creating one of my projects in Verilog, I got this strange error on line 5:
primitive output connection must be a scalar var or net
I don't have any clue how to solve this.
I tried changing the buffer module to xor module, but no change was observed.
module decoder(A, B);
input[1:32] A;
output[1:38] B;
buf p1(B[3:3], A[1:1]);
buf p2(B[5:7], A[2:4]);
buf p3(B[9:15], A[5:11]);
buf p4(B[17:31], A[12:26]);
buf p5(B[33:38], A[27:32]);
xor u1(B[1], A[3], A[5], A[7], A[9], A[11], A[13], A[15], A[17], A[19], A[21], A[23], A[25], A[27], A[29], A[31]);
xor u2(B[2], A[3], A[6], A[7], A[10], A[11], A[14], A[15], A[18], A[19], A[22], A[23], A[26], A[27], A[30], A[31]);
xor u3(B[4], A[5], A[6], A[7], A[12], A[13], A[14], A[15], A[20], A[21], A[22], A[23], A[28], A[29], A[30], A[31]);
xor u4(B[8], A[9], A[10], A[11], A[12], A[13], A[14], A[15], A[24], A[25], A[26], A[27], A[28], A[29], A[30], A[31]);
xor u5(B[16], A[17], A[18], A[19], A[20], A[21], A[22], A[23], A[24], A[25], A[26], A[27], A[28], A[29], A[30], A[31]);
xor u6(B[32], 0, A[32]);
endmodule
The simulation is not running, and it gives this error.
Verilog built-in primitives are split into groups each with a specific number of input and output ports.
and, nand, or, nor, xor and xnor have one output and multiple inputs.
buf and not have multiple outputs and one input.
(There are more types, like enable gates and pass gates, but let's leave those for now.)
Thus a buf instance must be buf <name> (output, output, output, ... input);
Thus an xor instance must be xor <name> (output, input, input, input ...);
As you can see, your p2(B[5:7], A[2:4]); does not follow this rule, as you have three inputs: A[2:4].
As a side note: it is customary to index vectors from high to low, as in B[13:8], and also to go from high down to zero, as in input [31:0] value. What you do is not wrong, but it makes life more difficult if your code has to work together with established code.
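One way to satisfy the scalar rule is to write one buf per bit. As a rough sketch, the throwaway Python helper below (the function name and the instance-naming scheme are my own, not anything from a tool) prints those per-bit instances for a vector connection such as p2, assuming ascending indices as in the question:

```python
def scalar_bufs(name, out_vec, out_lo, in_vec, in_lo, width):
    """Return one scalar 'buf' instance line per bit of a vector connection.

    Instance names (name_0, name_1, ...) are an arbitrary illustrative choice.
    """
    return [
        f"buf {name}_{i}({out_vec}[{out_lo + i}], {in_vec}[{in_lo + i}]);"
        for i in range(width)
    ]


# Example: the failing p2(B[5:7], A[2:4]) expanded bit by bit.
for line in scalar_bufs("p2", "B", 5, "A", 2, 3):
    print(line)
```

Each generated line has exactly one scalar output and one scalar input, which is the form the buf rule above asks for.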

Trouble with setting my inputs and outputs ( A complement is direct input(-ed) )

This is the circuit I want to implement (structurally): http://prntscr.com/lceyql. I am having trouble setting the inputs and outputs because of the A complement (both A and A' appear as inputs).
I am new to Verilog.
I also want to run a test on the circuit (in ModelSim), and I don't know how to cover all the 0-1 combinations given that there is both A and A' (this will probably answer itself once the first question is answered). I mean something like this:
initial
begin
InA=0; InB=0; InC=0; InD=0; InE=0;
# 10 InA=0; InB=0; InC=1;
# 10 InA=0; InB=1; InC=0;
# 10 InA=0; InB=1; InC=1;
# 10 InA=1; InB=0; InC=0;
# 10 InA=1; InB=0; InC=1;
# 10 InA=1; InB=1; InC=0;
# 10 InA=1; InB=1; InC=1;
# 10 $stop;
end
The module:
module circuit1 (A, B, C, D, E, F);
input A, B, C, D, E;
output F;
wire w1, w2, w3, w4, w5;
nand G1 (w1, A, B);
or G2 (w2, C, D);
nor G3 (w3, E, C);
nor G4 (w4, w1, w2);
nand G5 (w5, w2, w3);
xor G6 (F, w4, w5);
endmodule
I think that, to achieve what you want, you can remove the E input and replace the nor G3 line with nor G3 (w3, ~A, C); That way you provide the complement of A as the input.
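With E tied to A', only A, B, C and D are free, so there are 16 input combinations to cover. As a reference to compare the ModelSim waveform against, here is a small Python sketch that evaluates the same gate network for every combination. It assumes the netlist from the question with E replaced by the complement of A, as suggested above; the function names are illustrative.

```python
from itertools import product


def circuit1(a, b, c, d):
    """Evaluate the question's netlist on single bits, assuming E = A'."""
    e = 1 - a                 # assumption: E is the complement of A
    w1 = 1 - (a & b)          # nand G1
    w2 = c | d                # or   G2
    w3 = 1 - (e | c)          # nor  G3
    w4 = 1 - (w1 | w2)        # nor  G4
    w5 = 1 - (w2 & w3)        # nand G5
    return w4 ^ w5            # xor  G6


def truth_table():
    """All 16 (A, B, C, D, F) rows in counting order."""
    return [(a, b, c, d, circuit1(a, b, c, d))
            for a, b, c, d in product((0, 1), repeat=4)]
```

Printing the rows of truth_table() gives the expected F for each stimulus step, in the same counting order as the hand-written testbench.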

Coding a simple calculator in Verilog gate level

Hi, I'm an EE student taking a Digital Systems course, and I have an assignment to write a calculator in Verilog. I'm not supposed to use behavioral code except for * and /, and I have 3 questions concerning my code.
My code is as follows.
module fa(x, y, z, s, c);
input x, y, z;
output s, c;
wire p, q, r;
xor XOR1(p, x, y);
xor XOR2(s, p, z);
and AND1(q, x, y);
and AND2(r, z, p);
or OR1(c, q, r);
endmodule
module rca(B, A, C0, CS , S, V);
input [16:0] A, B;
input C0, CS;
output [16:0] S;
output V;
wire [17:1] C;
wire [16:0] nB;
xor SignB0(nB[0], CS, B[0]);
xor SignB1(nB[1], CS, B[1]);
xor SignB2(nB[2], CS, B[2]);
xor SignB3(nB[3], CS, B[3]);
xor SignB4(nB[4], CS, B[4]);
xor SignB5(nB[5], CS, B[5]);
xor SignB6(nB[6], CS, B[6]);
xor SignB7(nB[7], CS, B[7]);
xor SignB8(nB[8], CS, B[8]);
xor SignB9(nB[9], CS, B[9]);
xor SignB10(nB[10], CS, B[10]);
xor SignB11(nB[11], CS, B[11]);
xor SignB12(nB[12], CS, B[12]);
xor SignB13(nB[13], CS, B[13]);
xor SignB14(nB[16], CS, B[14]);
xor SignB15(nB[15], CS, B[15]);
xor SignB16(nB[16], CS, B[16]);
fa Bit0(nB[0], A[0], C0, S[0], C[1]);
fa Bit1(nB[1], A[1], C[1], S[1], C[2]);
fa Bit2(nB[2], A[2], C[2], S[2], C[3]);
fa Bit3(nB[3], A[3], C[3], S[3], C[4]);
fa Bit4(nB[4], A[4], C[4], S[4], C[5]);
fa Bit5(nB[5], A[5], C[5], S[5], C[6]);
fa Bit6(nB[6], A[6], C[6], S[6], C[7]);
fa Bit7(nB[7], A[7], C[7], S[7], C[8]);
fa Bit8(nB[8], A[8], C[8], S[8], C[9]);
fa Bit9(nB[9], A[9], C[9], S[9], C[10]);
fa Bit10(nB[10], A[10], C[10], S[10], C[11]);
fa Bit11(nB[11], A[11], C[11], S[11], C[12]);
fa Bit12(nB[12], A[12], C[12], S[12], C[13]);
fa Bit13(nB[13], A[13], C[13], S[13], C[14]);
fa Bit14(nB[14], A[14], C[14], S[14], C[15]);
fa Bit15(nB[15], A[15], C[15], S[15], C[16]);
fa Bit16(nB[16], A[16], C[16], S[16], C[17]);
xor Overflow(V, C[17], C[16]);
endmodule
module mul(A, B, Z, Vc);
input [16:0] A, B;
output [16:0] Z;
output Vc;
assign Z = A * B;
det Vm(Z, Vc);
endmodule
module div(A, B, Z, Vc);
input [16:0] A, B;
output [16:0] Z;
output Vc;
assign Z = A / B;
det Vd(Z, Vc);
endmodule
module det(A, V);
input [16:0] A;
output V;
always @* begin
if(A>99999) begin
V = 1;
end
else if(A<-9999) begin
V = 1;
end
else begin
V = 0;
end
end
endmodule
module cal_alu(A, B, S, Z, V);
input [16:0] A, B;
input [1:0] S;
output [16:0] Z;
output V;
wire Va, Vb, Vc;
det VA(A, Va);
det VB(B, Vb);
always @* begin
case(S)
2'b00 :
rca Add(A, B, 0, 0, Z, Vc);
2'b01 :
rca Sub(A, B, 0, 1, Z, Vc);
2'b10 :
mul Mul(A, B, Z, Vc);
2'b11 :
div Div(A, B, Z, Vc);
endcase
end
or VV(V, Va, Vb, Vc);
endmodule
How do I detect the overflow numbers A and B concerning whether the two are >99999 or <-9999 especially in gate level? Cause I'm not sure but I thought 'if' and 'case' were behavioral level codes.
The 2-bit switch is supposed to determine the kind of calculation (addition, subtraction, multiplication, or division), but how do I do that without using 'case'?
How can I detect overflow for the result of the multiplier and the divider? I know addition and subtraction overflows can be detected by XORing the largest two carries but I have no idea for the multiplication and division.
Here are the answers to your questions:
Yes, if and case statements are behavioral. You have to use a comparator. A simple 2-bit comparator equation for checking whether A > B would be O = A0&~B1&~B0 | A1&~B1 | A1&A0&~B0
Use a mux to select the appropriate output instead of a case statement.
Calculate the product at double width, i.e. a 32-bit output for 16-bit inputs, and then use a comparator to detect the overflow.
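The comparator equation and the double-width idea can both be checked quickly in software before building the gates. The Python sketch below is only a model: gt2 evaluates the equation above on single bits and compares it with ordinary integer comparison for all 16 input pairs, and product_overflows stands in for the wide multiplier by relying on Python integers not wrapping. The range limits are the ones from the question's det module; the function names are my own.

```python
from itertools import product


def gt2(a1, a0, b1, b0):
    """O = A0&~B1&~B0 | A1&~B1 | A1&A0&~B0, evaluated on single bits."""
    nb1, nb0 = 1 - b1, 1 - b0
    return (a0 & nb1 & nb0) | (a1 & nb1) | (a1 & a0 & nb0)


def equation_matches_greater_than():
    """Exhaustively compare the gate equation with A > B on 2-bit values."""
    return all(
        gt2(a1, a0, b1, b0) == int(2 * a1 + a0 > 2 * b1 + b0)
        for a1, a0, b1, b0 in product((0, 1), repeat=4)
    )


def product_overflows(a, b, lo=-9999, hi=99999):
    """Compute the full-width product, then range-check it."""
    wide = a * b  # Python ints do not wrap; this models the double-width result
    return wide > hi or wide < lo
```

In hardware the range check itself would again be a comparator built from gates, just a wider one than the 2-bit example.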

ARM: Disabling MMU and updating PC

In short, I would like to shut down all MMU (and cache) operations in a Linux context (from inside the Kernel), for debug purposes, just to run some tests. To be perfectly clear, I don't intend that my system still be functional after that.
About my setup: I'm currently fiddling with a Freescale Vybrid (VF610), which integrates a Cortex A5, and its low power modes. Since I'm experiencing some suspiciously local memory corruption while the chip is in "Low Power Stop" mode and my DDR3 is in self refresh, I'm trying to shift the operations bit by bit, and am right now performing all the suspend/resume steps without actually executing the WFI. Since before this instruction I run with address translation, and after it without (it's essentially a reset), I would like to "simulate" that by "manually" shutting down the MMU.
(I currently have no JTAG nor any other debug access to my chip. I load it via MMC/TFTP/NFS, and debug it with LEDs.)
What I've tried so far:
/* disable the Icache, Dcache and branch prediction */
mrc p15, 0, r6, c1, c0, 0
ldr r7, =0x1804
bic r6, r6, r7
mcr p15, 0, r6, c1, c0, 0
isb
/* disable the MMU and TEX */
bic r7, r6, r7
isb
mcr p15, 0, r6, c1, c0, 0 # turn on MMU, I-cache, etc
mrc p15, 0, r6, c0, c0, 0 # read id reg
isb
dsb
dmb
and other variations to the same effect.
What I observe:
Before the MMU block, I can light a LED (3 assembly instructions, no branch, nothing fancy, nor any access to my DDR, which is already in self refresh - the virtual address for the GPIO port is stored in a register before that).
After the MMU block, I can no more, whether I try with physical or virtual addresses.
I think the problem may be related to my PC, which retains an outdated virtual address. This is how things are done elsewhere in the kernel, the other way round (that is, while enabling translation):
ldr r3, =cpu_resume_after_mmu
instr_sync
mcr p15, 0, r0, c1, c0, 0 # turn on MMU, I-cache, etc
mrc p15, 0, r0, c0, c0, 0 # read id reg
instr_sync
mov r0, r0
mov r0, r0
ret r3 # jump to virtual address
ENDPROC(cpu_resume_mmu)
.popsection
cpu_resume_after_mmu:
(from arch/arm/kernel/sleep.S, cpu_resume_mmu)
I wonder what this 2-instruction delay is related to, and where it is documented. I've found nothing on the subject. I've tried something equivalent, without success:
adr lr, BSYM(phys_block)
/* disable the Icache, Dcache and branch prediction */
mrc p15, 0, r6, c1, c0, 0
ldr r7, =0x1804
bic r6, r6, r7
mcr p15, 0, r6, c1, c0, 0
isb
/* disable the MMU and TEX */
bic r7, r6, r7
isb
mcr p15, 0, r6, c1, c0, 0 # turn on MMU, I-cache, etc
mrc p15, 0, r6, c0, c0, 0 # read id reg
isb
dsb
msb
mov r0, r0
mov r0, r0
ret lr
phys_block:
blue_light
loop
Thanks to anyone who has a clue or some pointers!
To address the "what this 2-instruction delay is" part of the question, as with much of /arch/arm, it's mostly just leftover legacy guff*.
Back in the days long before any kind of barrier instructions, you had to account for the fact that at the point you switch the MMU, the pipeline contains instructions already fetched and decoded before the switch, so having anything like a branch or memory access in there will go horribly wrong if the address space has changed by the time it executes. The ARMv4 Architecture Reference Manual makes the wonderful statement "The correct code sequence for enabling and disabling the MMU is IMPLEMENTATION DEFINED" - in practice what that mostly meant was that you knew your pipeline was 3 stages long so stuck two NOPs in to fill it safely. Or took full advantage of the fact to do horrible things like arrange a jump straight to a translated VA without going via an identity mapping (yikes!).
From an entertaining trawl of old microarchitecture manuals, 3 NOPs are needed for StrongARM (compared to 2 for the 3-stage ARM7 pipeline), and reading CP15 with a data dependency on the result is the recommended self-synchronising sequence for XScale, which explains the apparently pointless read of the main ID register.
On something modern (ARMv6 or later), however, none of this should be needed as you have architected barriers, so you just flip the switch then issue an isb to flush the pipeline, which is what the instr_sync macro expands to when building for such architectures.
* or a fine example of the Linux "works on everything" approach, depending on your point of view...
Since both Jacen and dwelch kindly brought the answer I needed through a comment (each), I will answer my own question here for the sake of clarity:
The trick was simply to add an identity mapping from/to the page doing the transition, allowing us to jump to it with a "physical" (though actually virtual) PC, then disable MMU.
Here is the final code (a bit specific, but commented):
/* Duplicate mapping to here */
mrc p15, 0, r4, c2, c0, 0 // Get TTBR0
ldr r10, =0x00003fff
bic r4, r10 // Extract page table physical base address
orr r4, #0xc0000000 // Nastily "translate" it to the virtual one
/*
* Here r8 holds vf_suspend's physical address. I had no way of
* doing this more "locally", since both physical and virtual
* space for my code are runtime-allocated.
*/
add lr, r8, #(phys_block-vf_suspend) // -> phys_block physical address
lsr r9, lr, #20 // SECTION_SHIFT -> Page index
add r7, r4, r9, lsl #2 // PMD_ORDER -> Entry address
ldr r10, =0x00000c0e // Flags
orr r9, r10, r9, lsl #20 // SECTION_SHIFT -> Entry value
str r9, [r7] // Write entry
ret lr // Jump / transition to virtual addressing
phys_block:
/* disable the MMU and TEX */
isb
mrc p15, 0, r6, c1, c0, 0
ldr r7, =0x10000001
bic r6, r6, r7
mcr p15, 0, r6, c1, c0, 0 # turn on MMU, I-cache, etc
mrc p15, 0, r6, c0, c0, 0 # read id reg
isb
dsb
dmb
/* disable the Icache, Dcache and branch prediction */
mrc p15, 0, r6, c1, c0, 0
ldr r7, =0x1804
bic r6, r6, r7
mcr p15, 0, r6, c1, c0, 0
isb
// Done !
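The address arithmetic in the first half of that code is easier to follow outside assembly. The Python sketch below mirrors the register operations step by step, under the same assumptions as the code above (1 MiB sections, the 0x00000c0e flag value, and the 0xc0000000 "translation" of the table address); the function name and the example input values used to try it are hypothetical.

```python
SECTION_SHIFT = 20      # 1 MiB sections
FLAGS = 0x00000C0E      # flag bits copied from the code above


def identity_section_entry(ttbr0, phys_addr, virt_offset=0xC0000000):
    """Return (entry_address, entry_value) for an identity section mapping."""
    table_phys = ttbr0 & ~0x3FFF                     # bic r4, r10
    table_virt = table_phys | virt_offset            # orr r4, #0xc0000000
    index = phys_addr >> SECTION_SHIFT               # lsr r9, lr, #20
    entry_addr = table_virt + (index << 2)           # add r7, r4, r9, lsl #2
    entry_value = FLAGS | (index << SECTION_SHIFT)   # orr r9, r10, r9, lsl #20
    return entry_addr, entry_value
```

The entry value maps the 1 MiB section containing phys_addr onto itself, which is what lets the PC keep working across the moment the MMU is switched off.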

Icarus produces different results than Silos

I am receiving some strange results when trying to compile and simulate a Verilog module and stimulus. If I simulate it in Silos, the code functions as expected. If I simulate it in Icarus (iverilog and vvp), the times differ from Silos: the start at 0 rather than 200 I don't care about so much, but Silos has 235 -> 255 where Icarus has 235 -> 265. The Silos repeat function works as I would expect, but with Icarus I can't figure out how it got that result. Also, when changing the repeat R2GDELAY to 3, Icarus does not seem to perform as expected either. Is there something I am missing when using Icarus, such as having to manually set the start time to 0 for an accurate result later in the simulation, or does Silos auto-initialize variables that I must initialize manually in Icarus? This code is taken from a Verilog HDL book, which can be found here: http://authors.phptr.com/palnitkar/
Here is the code:
`define TRUE 1'b1
`define FALSE 1'b0
`define RED 2'd0
`define YELLOW 2'd1
`define GREEN 2'd2
//State definition HWY CNTRY
`define S0 3'd0 //GREEN RED
`define S1 3'd1 //YELLOW RED
`define S2 3'd2 //RED RED
`define S3 3'd3 //RED GREEN
`define S4 3'd4 //RED YELLOW
//Delays
`define Y2RDELAY 3 //Yellow to red delay
`define R2GDELAY 2 //Red to Green Delay
module sig_control (hwy, cntry, X, clock, clear);
//I/O ports
output [1:0] hwy, cntry;
//2 bit output for 3 states of signal
//GREEN, YELLOW, RED;
reg [1:0] hwy, cntry;
//declare output signals are registers
input X;
//if TRUE, indicates that there is car on
//the country road, otherwise FALSE
input clock, clear;
//Internal state variables
reg [2:0] state;
reg [2:0] next_state;
initial
begin
state = `S0;
next_state = `S0;
hwy = `GREEN;
cntry = `RED;
end
//state changes only at positive edge of clock
always @(posedge clock)
state = next_state;
//Compute values of main signal and country signal
always @(state)
begin
case(state)
`S0: begin
hwy = `GREEN;
cntry = `RED;
end
`S1: begin
hwy = `YELLOW;
cntry = `RED;
end
`S2: begin
hwy = `RED;
cntry = `RED;
end
`S3: begin
hwy = `RED;
cntry = `GREEN;
end
`S4: begin
hwy = `RED;
cntry = `YELLOW;
end
endcase
end
//State machine using case statements
always @(state or X)
begin
if(clear)
next_state = `S0;
else
case (state)
`S0: if(X)
next_state = `S1;
else
next_state = `S0;
`S1: begin //delay some positive edges of clock
repeat(`Y2RDELAY) @(posedge clock) ;
next_state = `S2;
end
`S2: begin //delay some positive edges of clock
//EDIT ADDED SEMICOLON
repeat(`R2GDELAY) @(posedge clock);
next_state = `S3;
end
`S3: if( X)
next_state = `S3;
else
next_state = `S4;
`S4: begin //delay some positive edges of clock
repeat(`Y2RDELAY) @(posedge clock) ;
next_state = `S0;
end
default: next_state = `S0;
endcase
end
endmodule
//Stimulus Module
module stimulus;
wire [1:0] MAIN_SIG, CNTRY_SIG;
reg CAR_ON_CNTRY_RD;
//if TRUE, indicates that there is car on
//the country road
reg CLOCK, CLEAR;
//Instantiate signal controller
sig_control SC(MAIN_SIG, CNTRY_SIG, CAR_ON_CNTRY_RD, CLOCK, CLEAR);
//Setup monitor
initial
$monitor($time, " Main Sig = %b Country Sig = %b Car_on_cntry = %b",
MAIN_SIG, CNTRY_SIG, CAR_ON_CNTRY_RD);
//setup clock
initial
begin
CLOCK = `FALSE;
forever #5 CLOCK = ~CLOCK;
end
//control clear signal
initial
begin
CLEAR = `TRUE;
repeat (5) @(negedge CLOCK);
CLEAR = `FALSE;
end
//apply stimulus
initial
begin
CAR_ON_CNTRY_RD = `FALSE;
#200 CAR_ON_CNTRY_RD = `TRUE;
#100 CAR_ON_CNTRY_RD = `FALSE;
#200 CAR_ON_CNTRY_RD = `TRUE;
#100 CAR_ON_CNTRY_RD = `FALSE;
#200 CAR_ON_CNTRY_RD = `TRUE;
#100 CAR_ON_CNTRY_RD = `FALSE;
#100 $finish;
end
endmodule
Here is the output from Silos:
200 Main Sig = 10 Country Sig = 00 Car_on_cntry = 1
205 Main Sig = 01 Country Sig = 00 Car_on_cntry = 1
235 Main Sig = 00 Country Sig = 00 Car_on_cntry = 1
255 Main Sig = 00 Country Sig = 10 Car_on_cntry = 1
300 Main Sig = 00 Country Sig = 10 Car_on_cntry = 0
305 Main Sig = 00 Country Sig = 01 Car_on_cntry = 0
335 Main Sig = 10 Country Sig = 00 Car_on_cntry = 0
500 Main Sig = 10 Country Sig = 00 Car_on_cntry = 1
505 Main Sig = 01 Country Sig = 00 Car_on_cntry = 1
535 Main Sig = 00 Country Sig = 00 Car_on_cntry = 1
555 Main Sig = 00 Country Sig = 10 Car_on_cntry = 1
600 Main Sig = 00 Country Sig = 10 Car_on_cntry = 0
605 Main Sig = 00 Country Sig = 01 Car_on_cntry = 0
635 Main Sig = 10 Country Sig = 00 Car_on_cntry = 0
800 Main Sig = 10 Country Sig = 00 Car_on_cntry = 1
805 Main Sig = 01 Country Sig = 00 Car_on_cntry = 1
835 Main Sig = 00 Country Sig = 00 Car_on_cntry = 1
855 Main Sig = 00 Country Sig = 10 Car_on_cntry = 1
900 Main Sig = 00 Country Sig = 10 Car_on_cntry = 0
905 Main Sig = 00 Country Sig = 01 Car_on_cntry = 0
935 Main Sig = 10 Country Sig = 00 Car_on_cntry = 0
Here is the output from iverilog:
0 Main Sig = 10 Country Sig = 00 Car_on_cntry = 0
200 Main Sig = 10 Country Sig = 00 Car_on_cntry = 1
205 Main Sig = 01 Country Sig = 00 Car_on_cntry = 1
235 Main Sig = 00 Country Sig = 00 Car_on_cntry = 1
265 Main Sig = 00 Country Sig = 10 Car_on_cntry = 1
300 Main Sig = 00 Country Sig = 10 Car_on_cntry = 0
305 Main Sig = 00 Country Sig = 01 Car_on_cntry = 0
335 Main Sig = 10 Country Sig = 00 Car_on_cntry = 0
500 Main Sig = 10 Country Sig = 00 Car_on_cntry = 1
505 Main Sig = 01 Country Sig = 00 Car_on_cntry = 1
535 Main Sig = 00 Country Sig = 00 Car_on_cntry = 1
565 Main Sig = 00 Country Sig = 10 Car_on_cntry = 1
600 Main Sig = 00 Country Sig = 10 Car_on_cntry = 0
605 Main Sig = 00 Country Sig = 01 Car_on_cntry = 0
635 Main Sig = 10 Country Sig = 00 Car_on_cntry = 0
800 Main Sig = 10 Country Sig = 00 Car_on_cntry = 1
805 Main Sig = 01 Country Sig = 00 Car_on_cntry = 1
835 Main Sig = 00 Country Sig = 00 Car_on_cntry = 1
865 Main Sig = 00 Country Sig = 10 Car_on_cntry = 1
900 Main Sig = 00 Country Sig = 10 Car_on_cntry = 0
905 Main Sig = 00 Country Sig = 01 Car_on_cntry = 0
935 Main Sig = 10 Country Sig = 00 Car_on_cntry = 0
EDIT: Added semicolon as indicated in the code above.
Thanks for your help!
You have race conditions in your logic. As mentioned in the comments, you should not be using the clock as an input to your combinational logic.
Review the following two logic blocks:
1. repeat(`R2GDELAY) @(posedge clock)
next_state = `S3;
2. always @(posedge clock)
state = next_state;
When posedge clock occurs, a simulator will choose one of these two statements to execute first, with no rules as to which it might choose. If it chooses #1 first, next_state will be set to S3, and then #2 will execute, assigning S3 to state. If #2 executes first, state will be set to something else, and next_state will be set to S3 only after state has been assigned.
Now you have divergent behavior based on which random event was chosen to execute first by the simulator.
The way to avoid this is not to have your combinational blocks look at the clock in any way. The clock should only be used to update your registers, with nonblocking assignments <=.
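The effect of the ordering can be shown with a toy Python model of that single clock edge. This is only an illustration of the two possible orders, not a description of how Silos or Icarus actually schedule events; the process labels "comb" and "reg" are names I made up for the two blocks.

```python
def posedge(state, next_state, order):
    """Run both processes that wake on one clock edge, in the given order."""
    for proc in order:
        if proc == "comb":
            # repeat(...) @(posedge clock) finishes, then next_state = S3
            next_state = "S3"
        elif proc == "reg":
            # always @(posedge clock) state = next_state;
            state = next_state
    return state, next_state


# Order 1: the combinational block runs first, so state reaches S3 on this edge.
print(posedge("S2", "S2", ("comb", "reg")))
# Order 2: the register block runs first, so state only reaches S3 one clock later.
print(posedge("S2", "S2", ("reg", "comb")))
```

With the 10-unit clock period from the stimulus, one extra clock edge is consistent with the 255 versus 265 difference between the two outputs.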
