Read file from a specific position in x86 - linux

Is it possible to start reading a file from a specific line or byte. Currently I use this code to read 4 bytes of a file:
section .data
filename db "file.txt", 0
section .bss
read_data resb 4
section .text
global _start
_start:
mov rax, SYS_OPEN
mov rdi, filename
mov rsi, O_RDONLY
mov rdx, 0
syscall
push rax
mov rdi, rax
mov rax, SYS_READ
mov rsi, read_data
mov rdx, 4
syscall
mov rax, SYS_CLOSE
pop rdi
syscall
This code always reads the first 4 bytes, but I want to start reading from other parts of the file, like the middle for example. What do I need to add or change?

A freshly-opened file descriptor starts at position = 0. If you keep reading from the same fd in a loop, you'll get successive chunks. (Use a larger buffer like 8kiB and loop over dwords in user-space, though, using the value that read returned as an upper limit! A system call is very expensive in CPU time.)
Is it possible to start reading a file from a specific line or byte.
Byte: yes
Line: no. In Unix/Linux, the kernel doesn't have an index of line-start byte offsets or any other line-oriented API. The line handling in stdio fgets for example is purely done in user-space. There have been some historical OSes with record-based files, but Unix files are flat arrays of bytes. (They can have holes, unwritten extents, and extended attributes... But the kernel APIs for the main file contents only operate with by byte offsets).
If you want to do lines, read a big block and loop forward until you've seen some number of newlines. If you're not there yet, read another block; repeat until you find the start and end of the line number you want, or you hit EOF. x86-64 can efficiently search 16 bytes at a time with pcmpeqb / pmovmskb / popcnt (popcnt requires SSE4.2 or the specific popcnt feature bit).
Or with just SSE2, or when optimizing for large blocks, with pcmpeqb / psadbw (against all-zero) to hsum bytes to qwords / paddd. Then check how many lines you went every so often with some scalar code. Or keep it simple and branch on finding the first newline in a SIMD vector.
Obviously the slow and simple option is a byte-at-a-time loop that counts '\n' characters - if you know how to do strchr with SSE2 it should be straightforward to vectorize that search using the above suggestions.
But if you only want some specific byte positions, you have two main options:
seek with lseek(2) before read(2) (see #Nicolae Natea's answer)
Use POSIX/Linux pread(2) to read from a specified offset, without moving the fd's file offset for future read calls. The Linux system call name is pread64 (__NR_pread64 equ 17 from asm/unistd_64.h)
ssize_t pread(int fd, void *buf, size_t count, off_t offset); The only difference from read is the offset arg, the 4th arg thus passed in R10 (not RCX like the user-space function calling convention). off_t is a 64-bit type simply passed in a single register in 64-bit code.
Other than the pread64 name in the .h, there's nothing special about the asm interface compared to the C interface, it follows the standard system-calling convention. (It exists since Linux 2.1.60 ; before that glibc's wrapper emulated it with lseek.)
There are other things you can do like mmap, or a preadv system call, but pread is most exactly what you're looking for if you have a known position you want to read from.

Before performing the read you should perform a lseek, so that the file position is updated.
so something along the lines:
mov rdi, rax ; fd
mov rax, SYS_LSEEK
mov rsi, <whatever offset you want>
mov rdx, 0 ; keep 0 if the offset should be from the begining of the file
syscall
note: RDI will still hold the same fd value after a syscall so you don't need extra save/restore for the fd across lseek / read / close.
Tip:
It might be easier to write the code in c and compile it with gcc -g -S -fverbose-asm -Og -c main.c and then look at main.s. (How to remove "noise" from GCC/clang assembly output?). But that will only show the compiler making calls to libc wrapper functions, unless you use inline system call macros like MUSL libc provides.

Related

Is it possible for a program to read itself?

Theoretical question. But let's say I have written an assembly program. I have "labelx:" I want the program to read at this memory address and only this size and print to stdout.
Would it be something like
jmp labelx
And then would i then use the Write syscall , making sure to read from the next instruction from labelx:
mov rsi,rip
mov rdi,0x01
mov rdx,?
mov rax,0x01
syscall
to then output to stdout.
However how would I obtain the size to read itself? Especially if there is a
label after the code i want to read or code after. Would I have to manually
count the lines?
mov rdx,rip+(bytes*lines)
And then syscall with populated registers for the syscall to write to from rsi to rdi. Being stdout.
Is this Even possible? Would i have to use the read syscall first, as the write system call requires rsi to be allocated memory buffer. However I assumed .text is already allocated memory and is read only. Would I have to allocate onto the stack or heap or a static buffer first before write, if it's even possible in the first place?
I'm using NASM syntax btw. And pretty new to assembly. And just a question.
Yes, the .text section is just bytes in memory, no different from section .rodata where you might normally put msg: db "hello", 10. x86 is a Von Neumann architecture (not Harvard), so there's no distinction between code pointers and data pointers, other than what you choose to do with them. Use objdump -drwC -Mintel on a linked executable to see the machine-code bytes, or GDB's x command in a running process, to see bytes anywhere.
You can get the assembler to calculate the size by putting labels at the start/end of the part you want, and using mov edx, prog_end - prog_start in the code at the point where you want that size in RDX.
See How does $ work in NASM, exactly? for more about subtracting two labels (in the same section) to get a size. (Where $ is an implicit label at the start of the current line, although $ isn't likely what you want here.)
To get the current address into a register, you need a RIP-relative LEA, not mov, because RIP isn't a general-purpose register and there's no special form of mov that reads it.
here:
lea rsi, [rel here] ; with DEFAULT REL you could just use [here]
mov edi, 1 ; stdout fileno
mov edx, .end - here ; assemble-time constant size calculation
mov eax, 1 ; __NR_write
syscall
.end:
This is fully position-independent, unlike if you used mov esi, here. (How to load address of function or label into register)
The LEA could use lea rsi, [rel $] to assemble to the same machine-code bytes, but you want a label there so you can subtract them.
I optimized your MOV instructions to use 32-bit operand-size, implicitly zero-extending into the full 64-bit RDX and RAX. (And RDI, but write(int fd, void *buf, size_t len) only looks at EDI anyway for the file descriptor).
Note that you can write any bytes of any section; there's nothing special about having a block of code write itself. In the above example, put the start/end labels anywhere. (e.g. foo: and .end:, and mov edx, foo.end - foo taking advantage of how NASM local labels work, by appending to the previous non-local label, so you can reference them from somewhere else. Or just give them both non-dot names.)

CPU_ZERO "undefined symbol" using pthread_setaffinity_np in NASM

I am using the pthreads library in NASM under Ubuntu 18.04. Thread creation works correctly, but I want to assign each thread to a separate core with pthread_setaffinity_np.
Following is the section of code I use to initialize threads. It compiles as written but at run time I get "undefined symbol: CPU_ZERO."
Using examples from C, I inserted %define _GNU_SOURCE at the top of the program, but I still get the undefined symbol CPU_ZERO error.
section .data align=16
; For thread scheduling:
cpuset: times 4 dq 0
section .text
label_0:
mov rdi,ThreadID ; ThreadCount
mov rsi,pthread_attr_t ; Thread Attributes
mov rdx,Test_fn ; Function Pointer
mov rcx,pthread_arg
call pthread_create wrt ..plt
; Set affinity mask
mov rdi,cpuset
call CPU_ZERO wrt ..plt
call pthread_self wrt ..plt
push rax
mov rdi,rax
mov rsi,cpuset
call CPU_SET wrt ..plt
pop rax
mov rdi,rax
mov rsi,32
mov rdx,cpuset
call pthread_setaffinity_np wrt ..plt
; check the result with pthread_getaffinity_np
mov rax,[tcounter]
add rax,8
mov [tcounter],rax
mov rbx,[Number_Of_Cores]
cmp rax,rbx
jl label_0
My question is: how do I use CPU_ZERO and CPU_SET in NASM (or any other assembly language; I can translate to NASM).
Thanks for any help.
CPU_ZERO and CPU_SET are C macros, not functions which you can call.
You'll have to roll your own function to perform equivalent zeroing / setting.
Those are CPP macros, not function. You can tell from the all-caps names. And from the fact the man page calls them macros.
As usual, the notes section of the man page has details that are useful for asm:
Since CPU sets are bit masks allocated in units of long words, the
actual number of CPUs in a dynamically allocated CPU set will be
rounded up to the next multiple of sizeof(unsigned long). An
application should consider the contents of these extra bits to be
undefined.
Notwithstanding the similarity in the names, note that the constant
CPU_SETSIZE indicates the number of CPUs in the cpu_set_t data type
(thus, it is effectively a count of the bits in the bit mask), while
the setsize argument of the CPU_*_S() macros is a size in bytes.
On my system (Arch Linux, glibc 2.29-4)
/usr/include/bits/cpu-set.h says
...
#define __CPU_SETSIZE 1024
#define __NCPUBITS (8 * sizeof (__cpu_mask))
...
typedef __CPU_MASK_TYPE __cpu_mask; // ultimately unsigned long via some other headers
...
typedef struct
{
__cpu_mask __bits[__CPU_SETSIZE / __NCPUBITS];
} cpu_set_t;
So a cpu_set_t is 1024 bits = 128 bytes = times 16 dq 0 or resq 16, at least on my system with that kernel config.
CPU_ZERO is free in your case; your statically-allocated cpu_set_t is statically zero-initialized. For some reason you put it in .data instead of .bss, so the executable will have to actually contain those zeros, but same difference.
If you did want to zero one on the stack, for example, rep stosd is one easy way, or xorps xmm0, xmm0 and 8x movups stores would also work.
Since high performance is not essential (CPU affinity-setting code probably only runs once), bts is a very convenient way to set bits in a bitmap (CPU_SET). With a memory destination, it takes a bit-index that can go outside the dword selected by the addressing mode. bts mem, reg is slow and microcoded (like 10 uops on Skylake), but nice for code size. bts mem, imm is only 3 uops, but or byte [mem + i/8], 1<<(i%8) is only 2 uops.
or also lets you set more than 1 bit at once, or more simply just mov store some bytes that contain the desired pattern of zeros and ones.
But TL:DR: it's just a bitmap, manipulate it however you like using asm, or even statically initialize it with non-zero values.
My approach to solving this problem is summarized in my answer at Reproduce these C types in assembly?.

How do I test to ensure only an integer is entered and ensure length of input is 5 bytes or less in Assembly?

How do I test to ensure only an integer is entered and ensure length of input is 5 bytes or less in the following code?
I am trying to understand how to properly control input so that the input beyond 5 bytes is not outputted to the terminal upon exiting of the program.
In addition, how would I test to ensure only a string is entered and finally in the last scenario, only a double is entered?
*** Updated code based on x82 and Peter C's guidance. I did some C disas and was able to amend my original code below. It still has some flaws but you are both a great deal of help! I am just stuck on when more than 5 integer bytes are entered it wont re-prompt as it does when I enter in a character data as it continues to dump extra bytes data to tty.
SECTION .data ; initialized data section
promptInput db 'Enter Number: ', 0
lenPromptInput equ $ - promptInput
displayInput db 'Data Entered: ', 0
lenDisplayInput equ $ - lenDisplayInput
SECTION .bss ; uninitialized data section
number resb 1024 ; allocate 1024 bytes for number variable
SECTION .text ; code section
global _start ; linker entry point
_start:
nop ; used for debugging
Read:
mov eax, 4 ; specify sys_write call
mov ebx, 1 ; specify stdout file descriptor
mov ecx, promptInput ; display promptInput
mov edx, lenPromptInput ; length of promptInput
int 0x80 ; call sys_write
mov eax, 3 ; specify sys_read call
mov ebx, 0 ; specify stdin file descriptor
mov ecx, number ; pass address of the buffer to read to
mov edx, 1024 ; specify sys_read to read 1024 bytes stdin
int 0x80 ; call sys_read
cmp eax, 0 ; examine sys_read return value in eax
je Exit ; je if end of file
cmp byte [number], 0x30 ; test input against numeric 0
jb Read ; jb if below 0 in ASCII chart
cmp byte [number], 0x39 ; test input against numeric 9
ja Read ; ja if above 9 in ASCII chart
Write:
mov eax, 4 ; specify sys_write call
mov ebx, 1 ; specify stdout file descriptor
mov ecx, displayInput ; display displayInput
mov edx, lenDisplayInput ; length of displayInput
int 0x80 ; call sys_write
mov eax, 4 ; specify sys_write call
mov ebx, 1 ; specify stdout file descriptor
mov ecx, number ; pass address of the number to write
mov edx, 5 ; pass number of numbers to write
int 0x80 ; call sys_write
Exit:
mov eax, 1 ; specific sys_exit call
mov ebx, 0 ; return code 0 to OS
int 0x80 ; call sys_exit
(Since you accepted this answer, I'll point out that the actual answer to this question about using read on TTYs is my other answer on this question.)
Here's an answer to your low-quality followup question which I was about to post when you deleted it.
Note that I said "you can ask for debugging help in a new question", not that you should ask 3 different questions in one, and re-post your whole code barely changed with no serious attempt at solving your own problem. It's still up to you to make the new question a good question.
I probably wouldn't have answered it if I hadn't sort of led to you posting it in the first place. Welcome to StackOverflow, I'm being generous since you're new and don't know what's a good question yet.
The usual term for the characters '0' through '9' is "digit", not "integer". It's much more specific.
ensure only integers are inputted in the buffer
You can't. You have to decide what you want to do if you detect such input.
Need help creating an array to loop through
Your buffer is an array of bytes.
You can loop over it with something like
# eax has the return value from the read system call, which you've checked is strictly greater than 0
mov esi, number ; start pointer
scan_buffer:
movzx edx, byte [esi]
# do something with the character in dl / edx
...
inc esi ; move to the next character
dec eax
jnz scan_buffer ; loop n times, where n = number of characters read by the system call.
ensure characters over the 1024 buffer do not send data to the tty
If you're worried that 1024 isn't necessarily big enough for this toy program, then use select(2) or poll(2) to check if there's more input to be read without blocking if there isn't.
I'm just going to answer the POSIX systems programming part of the question, and leave it up to you to make the right system calls once you know what you want your program to do. Use gdb to debug it (see the bottom of the x86 tag wiki for debug tips), and strace to trace system calls.
You might want to write your program in C instead of trying to learn asm and the Unix system call API at the same time. Write something in C to test the idea, and then implement it in asm. (Then you can look at the compiler output when you get stuck, to see how the compiler did things. As long as you carefully read and understand how the compiler-generated code works, you're still learning. I'd suggest compiling with -O2 or -O3 as a starting point for an asm implementation. At least -O1, definitely not -O0.).
I am trying to understand how to properly control input so that the input beyond 5 bytes is not outputted to the terminal upon exiting of the program.
This is just a POSIX semantics issue, nothing to do with asm. It would be the same if you were doing systems programming in C, calling the read(2) system call. You're calling it in asm with mov eax,3 / int 0x80, instead of calling the glibc wrapper like C compiler output would, but it's the same system call.
If there is unread data on the terminal (tty) when your program exits, the shell will read it when it checks for input.
In an interactive shell running on a tty, programs you run (like ./a.out or /bin/cat) have their stdin connected to the same tty device that the shell takes interactive input from. So unread data on your program's stdin is the same thing as unread data that the shell will see.
Things are different if you redirected your program's input from a file. (./a.out < my_file.txt). Then your program won't start with an already-open file descriptor for the tty. It could still open("/dev/tty") (which is a "magic" symlink that always refers to the controlling tty) and vacuum up anything that was typed while it was running.
ensure only an integer is entered and ensure length of input is 5 bytes or less in the following code?
You can't control what your input will be. You can detect input you don't like, and print an error message or anything else you want to do.
If you want input characters to stop echoing to the screen after 5 bytes, you'd need to put the tty into raw mode (instead of the default line-buffered "cooked" mode) and either do the echo manually, or disable echo after 5 bytes. (The latter wouldn't be reliable, though. There'd be a race condition between disabling echo and the user typing a 6th byte (e.g. as part of a paste).
RE: edit
I am just stuck on when more than 5 integer bytes are entered it wont re-prompt as it does when I enter in a character data as it continues to dump extra bytes data to tty.
You broke your program, because the logic is still designed around re-read()ing a character if you don't like the digit you read. But your read call reads up to 5 bytes.
The normal thing to do is one big read and then parse the whole line by looping over the bytes in the buffer. So use a big buffer (like 1024 bytes) in the .bss section, and make a read system call.
Don't make another read system call unless you want to prompt the user to enter another line of text.

How are system calls interpreted in x86 assembly linux

I am confused towards why/how a value gets printed in x86 assembly in a Linux environment.
For example if I wish to print a value I would do this:
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx msgLength
int 80h
Now I understand the numerical value 4 will make the system call to sys_write after the interrupt. But my question is, what is the significance of the 4? Is it loading the address of the decimal value 4 into eax? Or is it loading the value 4 into the eax register?
I am confused after reading I can transfer the value at an address to a register using the following instruction:
mov eax, [msg]
eax will now contain the bytes at the address of msg, but I would guess this format is not acceptable:
mov eax, [4]
So what is really happening when I move 4 into eax to print something?
Simply the value (number) 4 is loaded into eax, no magic there. The operating system will look at the value in eax to figure out what function you want. System call number is a code that identifies the various available kernel functions you can use.
Linux kernel maintains all the system call routines as an array of function pointers (can be called as sys_call table) and the value in the eax gives the index to that array (which system call to choose) by the kernel. Other registers like ebx, ecx, edx contains the appropriate parameters for that system call routine.
And the int 80h is for software interrupt to the cpu from user mode to kernel mode because actual system call routine is kernel space function.

How do i read single character input from keyboard using nasm (assembly) under ubuntu?

I'm using nasm under ubuntu. By the way i need to get single input character from user's keyboard (like when a program ask you for y/n ?) so as key pressed and without pressing enter i need to read the entered character. I googled it a lot but all what i found was somehow related to this line (int 21h) which result in "Segmentation Fault". Please help me to figure it out how to get single character or how to over come this segmentation fault.
It can be done from assembly, but it isn't easy. You can't use int 21h, that's a DOS system call and it isn't available under Linux.
To get characters from the terminal under UNIX-like operating systems (such as Linux), you read from STDIN (file number 0). Normally, the read system call will block until the user presses enter. This is called canonical mode. To read a single character without waiting for the user to press enter, you must first disable canonical mode. Of course, you'll have to re-enable it if you want line input later on, and before your program exits.
To disable canonical mode on Linux, you send an IOCTL (IO ControL) to STDIN, using the ioctl syscall. I assume you know how to make Linux system calls from assembler.
The ioctl syscall has three parameters. The first is the file to send the command to (STDIN), the second is the IOCTL number, and the third is typically a pointer to a data structure. ioctl returns 0 on success, or a negative error code on fail.
The first IOCTL you need is TCGETS (number 0x5401) which gets the current terminal parameters in a termios structure. The third parameter is a pointer to a termios structure. From the kernel source, the termios structure is defined as:
struct termios {
tcflag_t c_iflag; /* input mode flags */
tcflag_t c_oflag; /* output mode flags */
tcflag_t c_cflag; /* control mode flags */
tcflag_t c_lflag; /* local mode flags */
cc_t c_line; /* line discipline */
cc_t c_cc[NCCS]; /* control characters */
};
where tcflag_t is 32 bits long, cc_t is one byte long, and NCCS is currently defined as 19. See the NASM manual for how you can conveniently define and reserve space for structures like this.
So once you've got the current termios, you need to clear the canonical flag. This flag is in the c_lflag field, with mask ICANON (0x00000002). To clear it, compute c_lflag AND (NOT ICANON). and store the result back into the c_lflag field.
Now you need to notify the kernel of your changes to the termios structure. Use the TCSETS (number 0x5402) ioctl, with the third parameter set the the address of your termios structure.
If all goes well, the terminal is now in non-canonical mode. You can restore canonical mode by setting the canonical flag (by ORing c_lflag with ICANON) and calling the TCSETS ioctl again. always restore canonical mode before you exit
As I said, it isn't easy.
I needed to do this recently, and inspired by Callum's excellent answer, I wrote the following (NASM for x86-64):
DEFAULT REL
section .bss
termios: resb 36
stdin_fd: equ 0 ; STDIN_FILENO
ICANON: equ 1<<1
ECHO: equ 1<<3
section .text
canonical_off:
call read_stdin_termios
; clear canonical bit in local mode flags
and dword [termios+12], ~ICANON
call write_stdin_termios
ret
echo_off:
call read_stdin_termios
; clear echo bit in local mode flags
and dword [termios+12], ~ECHO
call write_stdin_termios
ret
canonical_on:
call read_stdin_termios
; set canonical bit in local mode flags
or dword [termios+12], ICANON
call write_stdin_termios
ret
echo_on:
call read_stdin_termios
; set echo bit in local mode flags
or dword [termios+12], ECHO
call write_stdin_termios
ret
; clobbers RAX, RCX, RDX, R8..11 (by int 0x80 in 64-bit mode)
; allowed by x86-64 System V calling convention
read_stdin_termios:
push rbx
mov eax, 36h
mov ebx, stdin_fd
mov ecx, 5401h
mov edx, termios
int 80h ; ioctl(0, 0x5401, termios)
pop rbx
ret
write_stdin_termios:
push rbx
mov eax, 36h
mov ebx, stdin_fd
mov ecx, 5402h
mov edx, termios
int 80h ; ioctl(0, 0x5402, termios)
pop rbx
ret
(Editor's note: don't use int 0x80 in 64-bit code: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - it would break in a PIE executable (where static addresses aren't in the low 32 bits), or with the termios buffer on the stack. It does actually work in a traditional non-PIE executable, and this version can be easily ported to 32-bit mode.)
You can then do:
call canonical_off
If you're reading a line of text, you probably also want to do:
call echo_off
so that each character isn't echoed as it's typed.
There may be better ways of doing this, but it works for me on a 64-bit Fedora installation.
More information can be found in the man page for termios(3), or in the termbits.h source.
The easy way: For a text-mode program, use libncurses to access the keyboard; for a graphical program, use Gtk+.
The hard way: Assuming a text-mode program, you have to tell the kernel that you want single-character input, and then you have to do a whole lot of bookkeeping and decoding. It's really complicated. There is no equivalent of the good old DOS getch() routine. You can start learning about how to do it here: Terminal I/O. Graphical programs are even more complicated; the lowest-level API for that is Xlib.
Either way, you're going to go mad coding whatever this is in assembly; use C instead.

Resources