RISC-V LD error: (.text+0xc4): relocation truncated to fit: R_RISCV_JAL against `*UND*'

Does anybody have a clue why I get the error below?
/tmp/cceP5axg.o: in function `.L0 ':
(.text+0xc4): relocation truncated to fit: R_RISCV_JAL against `*UND*'
collect2: error: ld returned 1 exit status

An R_RISCV_JAL relocation can represent an even, signed 21-bit offset (-1 MiB to +1 MiB - 2). If your symbol is further away than this limit, you get this error.
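As a rough illustration of that limit, here is a minimal C sketch of the range check. The function name jal_offset_fits is hypothetical and this is not the linker's actual code; it only restates the "even, signed 21-bit" rule above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a PC-relative byte offset could be encoded in the
 * J-type immediate used by JAL: it must be even and lie within
 * [-2^20, 2^20 - 2], i.e. -1 MiB to +1 MiB - 2.
 * (Hypothetical helper, for illustration only.) */
bool jal_offset_fits(int64_t offset)
{
    const int64_t min = -(INT64_C(1) << 20);     /* -1 MiB     */
    const int64_t max = (INT64_C(1) << 20) - 2;  /* +1 MiB - 2 */

    return (offset % 2 == 0) && offset >= min && offset <= max;
}
```

An offset of exactly 1 MiB (0x100000) is already out of range, as is any odd offset.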

This error can also happen as an odd result of branch instructions that use hard-coded offsets. I was getting the exact same error on a program that was far smaller than 2 MiB. It turned out I had several instructions of the form bne rs1, rs2, offset, where the offset was a numeric literal like 0x8.
The solution was to remove the literal offset and replace it with a label from the code so it looks like
bne x7, x9, branch_to_here
[code to skip]
branch_to_here:
more code ...
instead of
bne x7, x9, 0x8
[code to skip]
more code ...
When I did that to every branch instruction, the error went away. Sorry to answer this 10 months late, but I hope it helps you, anonymous reader.

I searched many resources to solve this issue, so I think my attempt may help others.
There are two things that may trigger this issue:
The target address is odd:
bne ra, ra, <odd offset>
The target address is a fixed value at compile time (not link time):
bne ra, ra, 0x80003000
My attempt at a solution:
label:
addi x0, x0, 0x0
addi x0, x0, 0x0
bne ra, ra, label + 6 // Jump to an address relative to a label
// This can raise an Instruction Address Misaligned exception
sub_label:
addi x0, x0, 0x0
beq ra, ra, sub_label // Jump to a label directly
addi x0, x0, 0x0
nop

sse4 packed sum between int32_t and int16_t (sign extend to int32_t)

I have the following code snippet (a gist can be found here) where I am trying to add 4 negative int32_t values to 4 int16_t values (which will be sign-extended to int32_t).
extern exit
global _start
section .data
a: dd -76, -84, -84, -132
b: dw 406, 406, 406, 406
section .text
_start:
movdqa xmm0, [a]
pmovsxwd xmm2, [b]
paddq xmm0, xmm2
;Expected: 330, 322, 322, 274
;Results: 330, 323, 322, 275
call exit
However, when stepping through it in my debugger, I couldn't understand why the results differ from the expected ones. Any idea?
paddq does 64-bit qword chunks, so there's carry across two of the 32-bit boundaries, leading to an off-by-one in the high half of each qword.
paddd is 32-bit dword chunks, matching the pmovsxwd dword element destination size. This is a SIMD operation with 4 separate adds, independent of each other.
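To make the carry visible, here is a small C sketch that emulates both instructions on four 32-bit lanes. The function names add_dwords and add_qwords are made up for this illustration; they model the lane widths only, not the real instructions' encoding or timing.

```c
#include <stdint.h>

/* Models paddd: four independent 32-bit adds with wraparound. */
void add_dwords(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
}

/* Models paddq on the same data: each pair of dwords is treated as one
 * 64-bit value, so a carry out of the low dword leaks into the high one. */
void add_qwords(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i += 2) {
        uint64_t qa = (uint64_t)(uint32_t)a[i] | ((uint64_t)(uint32_t)a[i + 1] << 32);
        uint64_t qb = (uint64_t)(uint32_t)b[i] | ((uint64_t)(uint32_t)b[i + 1] << 32);
        uint64_t sum = qa + qb;                 /* one 64-bit add per pair */

        out[i]     = (int32_t)(uint32_t)sum;         /* low dword  */
        out[i + 1] = (int32_t)(uint32_t)(sum >> 32); /* high dword */
    }
}
```

With a = {-76, -84, -84, -132} and b = {406, 406, 406, 406}, add_dwords gives 330, 322, 322, 274 and add_qwords gives 330, 323, 322, 275, matching the expected and observed values in the question.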
BTW, you could have made this more efficient by folding the 16-byte aligned load into a memory operand for paddd, but for debugging it can help to see both inputs in registers with a separate load.
default rel ; use RIP-relative addressing modes when possible
_start:
pmovsxwd xmm0, [b]
paddd xmm0, [a]
Also you'd normally put read-only arrays in section .rodata.

RISC-V interrupts, setting up MTIMECMP

I am trying to write a program in RISC-V assembly for the HiFive1 board that wakes up on a timer interrupt.
This is my interrupt setup routine:
.section .text
.align 2
.globl setupINTERRUPT
.equ MTIMECMP, 0x2004000
setupINTERRUPT:
addi sp, sp, -16 # allocate a stack frame: moves the stack pointer down by 16 bytes
sw ra, 12(sp) # save return address on stack
li t0, 0x8 # time interval at which to trigger the interrupt
li t1, MTIMECMP # MTIMECMP register of the CLINT memory map
sw t0, 0(t1) # store the interval in MTIMECMP memory location
li t0, 0x800 # make a mask for 3rd bit
csrrs t1, mstatus, t0 # use CSR READ/SET instruction to set 3rd bit using previously defined mask
li t0, 0x3 # make a mask for 0th and 1st bit
csrrc t1, mtvec, t0 # use CSR READ/CLEAR instruction to clear 0th and 1st bit
li t0, 0x80 # make a mask for 7th bit
csrrs t1, mie, t0 # set 7th bit for MACHINE TIMER INTERRUPT ENABLE
lw ra, 12(sp) # restore the return address
addi sp, sp, 16 # deallocate the stack frame
ret
I am not sure if I'm setting MTIMECMP correctly; I know it's a 64-bit memory location.
I am trying to use this interrupt as a delay timer for a blinking LED (just trying to make sure the interrupt works before I move on to writing a handler).
Here is my setLED program (note that all the GPIO register setup was done previously and is known to work). I have a WFI instruction before each of the ON and OFF functions. The LED doesn't light up, even though it does in debug mode. I think setLED skips the WFI instruction as if the interrupt was asserted.
.section .text
.align 2
.globl setLED
#include "memoryMap.inc"
#include "GPIO.inc"
.equ NOERROR, 0x0
.equ ERROR, 0x1
.equ LEDON, 0x1
# which LED to set comes into register a0
# desired On/Off state comes into a1
setLED:
addi sp, sp, -16 # allocate a stack frame: moves the stack pointer down by 16 bytes
sw ra, 12(sp) # save return address on stack
li t0, GPIO_CTRL_ADDR # load GPIO address
lw t1, GPIO_OUTPUT_VAL(t0) # get the current value of the pins
beqz a1, ledOff # Branch off to turn off led if a1 requests it
li t2, LEDON # load the value of LEDON into a temp register
beq a1, t2, ledOn # branch if on requested
li a0, ERROR # we got a bad status request, return an error
j exit
ledOn:
wfi
xor t1, t1, a0 # doing xor to only change the value of requested LED
sw t1, GPIO_OUTPUT_VAL(t0) # write the new output value to GPIO out
li a0, NOERROR # no error
j exit
ledOff:
wfi
xor a0, a0, 0xffffffff # invert everything so that all bits are one except the LED we are turning off
and t1, t1, a0 # and a0 and t1 to get the LED we want to turn off
sw t1, GPIO_OUTPUT_VAL(t0) # write the new output value
li a0, NOERROR
exit:
lw ra, 12(sp) # restore the return address
addi sp, sp, 16 # deallocate the stack frame
ret

What's the meaning of mov 0x8(%r14,%r15,8),%rax

What is the meaning of 0x8(%r14,%r15,8) here? I know 0x8(%r14,%r15,8) is the source operand, but I don't understand why two registers, %r14 and %r15, are used, or how to calculate the source address.
Thanks so much for any input.
Information pulled from http://flint.cs.yale.edu/cs421/papers/x86-asm/asm.html
AT&T Addressing:
Memory Address Reference: Address_or_Offset(%base_or_offset, %Index_Register, Scale)
Final Address Calculation: Address_or_Offset + %base_or_offset + [Scale * %Index_Reg]
Example:
mov (%esi,%ebx,4), %edx /* Move the 4 bytes of data at address ESI+4*EBX into EDX. */
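Applied to the operand in the question, 0x8(%r14,%r15,8) addresses memory at %r14 + 8 * %r15 + 0x8. The formula can be sketched in C as follows; effective_address is a hypothetical helper name, and the sketch does not check that the scale is one of the values x86 allows (1, 2, 4, or 8).

```c
#include <stdint.h>

/* AT&T operand disp(base, index, scale) refers to the address
 *     disp + base + index * scale
 * The displacement is signed; 64-bit wraparound is modeled with
 * unsigned arithmetic. (Illustrative helper, not a real API.) */
uint64_t effective_address(int32_t disp, uint64_t base,
                           uint64_t index, unsigned scale)
{
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

For example, with %r14 = 0x1000 and %r15 = 3, effective_address(0x8, 0x1000, 3, 8) is 0x1020. A typical use of this form is indexing an array of 8-byte elements that starts 8 bytes past the pointer in %r14.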

Compare user-inputted string/character to another string/character

So I'm a bit of a beginner to ARM Assembly (assembly in general, too). Right now I'm writing a program and one of the biggest parts of it is that the user will need to type in a letter, and then I will compare that letter to some other pre-inputted letter to see if the user typed the same thing.
For instance, in my code I have
.balign 4 /* Forces the next data declaration to be on a 4 byte segment */
dime: .asciz "D\n"
at the top of the file and
addr_dime : .word dime
at the bottom of the file.
Also, based on what I've been reading online I put
.balign 4
inputChoice: .asciz "%d"
at the top of the file, and put
inputVal : .word 0
at the bottom of the file.
Near the middle of the file (just trust me that there is something wrong with this standalone code, and the rest of the file doesn't matter in this context) I have this block of code:
ldr r3, addr_dime
ldr r2, addr_inputChoice
cmp r2, r3 /*See if the user entered D*/
addeq r5, r5, #10 /*add 10 to the total if so*/
Which I THINK should load "D" into r3, load whatever String or character the user inputted into r2, and then add 10 to r5 if they are the same.
For some reason this doesn't work, and the r5, r5, #10 code only works if addne comes before it.
addr_dime : .word dime is pointlessly over-complicated. The address is already a link-time constant. Storing the address in memory (at another location which has its own address) doesn't help you at all, it just adds another layer of indirection. (Which is actually the source of your problem.)
Anyway, cmp doesn't dereference its register operands, so you're comparing pointers. If you single-step with a debugger, you'll see that the values in registers are pointers.
To load the single byte at dime, zero-extended into r3, do
ldrb r3, dime
Using ldr to do a 32-bit load would also get the \n byte, and a 32-bit comparison would have to match that too for eq to be true.
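The difference between the byte load and the 32-bit load can be sketched in C. The helper names first_byte_equal and word_equal are invented for this example, and both buffers are assumed to be at least 4 bytes long.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Like ldrb + cmp: only the first byte of each buffer is compared. */
bool first_byte_equal(const char *a, const char *b)
{
    return (unsigned char)a[0] == (unsigned char)b[0];
}

/* Like ldr + cmp: all 4 bytes must match, including any '\n' and
 * terminator bytes that follow the first character. */
bool word_equal(const char *a, const char *b)
{
    uint32_t wa, wb;

    memcpy(&wa, a, sizeof wa);
    memcpy(&wb, b, sizeof wb);
    return wa == wb;
}
```

With char dime[4] = "D\n" and an input buffer holding just "D", first_byte_equal is true while word_equal is false, because the second byte differs.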
But this can only work if dime is close enough for a PC-relative addressing mode to fit; like most RISC machines, ARM can't use arbitrary absolute addresses because the instruction-width is fixed.
For the constant, the easiest way to avoid that is not to store it in memory in the first place. Use .equ dime, 'D' to define a numeric constant, then you can use
cmp r2, #dime @ compare with immediate operand
Or ldr r3, =dime to ask the assembler to get the constant into a register for you. You can do this with addresses, so you could do
ldr r2, =inputVal # r2 = &inputVal
ldrb r2, [r2] # load first byte of inputVal
This is the generic way to handle loading from static data that might be too far away for a PC-relative addressing mode.
You could avoid that by using a stack address (sub sp, #16 / mov r5, sp or something). Then you already have the address in a register.
This is exactly what a C compiler does:
char dime[4] = "D\n";
char input[4] = "xyz";
int foo(int start) {
if (dime[0] == input[0])
start += 10;
return start;
}
From ARM32 gcc6.3 on the Godbolt compiler explorer:
foo:
ldr r3, .L4 # load a pointer to the data section at dime / input
ldrb r2, [r3]
ldrb r3, [r3, #4]
cmp r2, r3
addeq r0, r0, #10
bx lr
.L4:
# gcc created this "literal pool" next to the code
# holding a pointer it can use to access the data section,
# wherever the linker ends up putting it.
.word .LANCHOR0
.section .data
.p2align 2
### These are in a different section, near each other.
### On Godbolt, click the .text button to see full assembler directives.
.LANCHOR0: # actually defined with a .set directive, but same difference.
dime:
.ascii "D\012\000"
input:
.ascii "xyz\000"
Try changing the C to compare with a literal character instead of a global the compiler can't optimize into a constant, and see what you get.

ARM assembly addition

I'm trying to compute the sum of two numbers modulo a third on a 32-bit ARM processor. Ultimately I want to do it with three unsigned 128-bit numbers, but I can't get it to work. Can anyone give me an idea or a basic example?
mov r1, #11
mov r2, #13
mov r3, #15
add r1, r1,r2
subge r1, r1, r3
ldr lr, address_of_return2
ldr lr, [lr]
bx lr
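You need cmp r1, r3 between add and subge. First add, then test whether the sum is greater than or equal to the modulus, and finally subtract if so (assuming both input numbers are less than the modulus).
PS: cmp r1, r3 is the right order: it sets the flags from r1 - r3, so subge runs when r1 >= r3. For unsigned values, use subhs instead of subge.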
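The add / compare / conditional-subtract sequence corresponds to the following C sketch. add_mod is a hypothetical name, and the sketch assumes a < m, b < m, and m no larger than 2^31 so that a + b cannot wrap around in 32 bits.

```c
#include <stdint.h>

/* Computes (a + b) mod m with one add, one compare, and one
 * conditional subtract. Assumes a < m, b < m, and m <= 2^31. */
uint32_t add_mod(uint32_t a, uint32_t b, uint32_t m)
{
    uint32_t sum = a + b;  /* add r1, r1, r2                     */

    if (sum >= m)          /* cmp r1, r3                         */
        sum -= m;          /* conditional subtract: sub r1, r1, r3 */
    return sum;
}
```

With the values from the question, add_mod(11, 13, 15) is 9. The 128-bit case has the same shape, with the add, compare, and subtract each carried across four 32-bit words.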
