How do programs that edit memory of other processes work, such as Cheat Engine and iHaxGamez? My understanding is that a process reading from (let alone writing to) another process' memory is immediate grounds for a segmentation fault.
Gaining access to another process's memory under Linux is fairly straightforward (assuming you have sufficient user privileges).
For example, the file /dev/mem provides access to the machine's physical memory, and details of the mappings for an individual process can be found in /proc/<pid>/maps.
Another example has been given here.
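Another route on Linux is the process_vm_readv(2) syscall (available since kernel 3.2), which reads another process's address space directly, provided you have ptrace-level permission on the target. A minimal sketch, with the pid and remote address taken as placeholders from the command line:

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <pid> <hex-address>\n", argv[0]);
            return 1;
        }
        pid_t pid = (pid_t)atoi(argv[1]);
        void *remote_addr = (void *)(uintptr_t)strtoull(argv[2], NULL, 16);

        unsigned char buf[64];
        struct iovec local  = { .iov_base = buf,         .iov_len = sizeof buf };
        struct iovec remote = { .iov_base = remote_addr, .iov_len = sizeof buf };

        /* Read 64 bytes out of the target process's address space. */
        ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        if (n < 0) {
            perror("process_vm_readv");
            return 1;
        }
        for (ssize_t i = 0; i < n; i++)
            printf("%02x ", buf[i]);
        putchar('\n');
        return 0;
    }

Writing works the same way with process_vm_writev. On distributions that enable the Yama LSM, you may also need to relax /proc/sys/kernel/yama/ptrace_scope.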
The operating system usually offers an API to manipulate the memory of other processes. On Windows, the corresponding functions are ReadProcessMemory and WriteProcessMemory.
There is no reason for it to segfault; the OS (kernel) API is used to do the writing. A segfault is signalled by the OS when a process attempts to access its own address space in an invalid way (e.g. a char[] overflow).
As for the games: if a value is stored at an address and read from time to time, it can be modified before the next read occurs.
You can use the WinAPI function WriteProcessMemory to write to the memory space of another process.
Also read some PE/COFF documentation and use VirtualQueryEx and ReadProcessMemory to work out what to write and where.
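A bare-bones sketch of that WinAPI flow; the pid and target address below are placeholders you would normally obtain by scanning the target with VirtualQueryEx/ReadProcessMemory:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        DWORD pid = 1234;                       /* placeholder: target process id */
        LPVOID target = (LPVOID)0x00400000;     /* placeholder: address found by scanning */
        int new_value = 9999;

        /* VM_WRITE and VM_OPERATION rights are required for WriteProcessMemory. */
        HANDLE h = OpenProcess(PROCESS_VM_WRITE | PROCESS_VM_OPERATION, FALSE, pid);
        if (!h) {
            fprintf(stderr, "OpenProcess failed: %lu\n", GetLastError());
            return 1;
        }

        SIZE_T written = 0;
        if (!WriteProcessMemory(h, target, &new_value, sizeof new_value, &written))
            fprintf(stderr, "WriteProcessMemory failed: %lu\n", GetLastError());
        else
            printf("wrote %zu bytes\n", (size_t)written);

        CloseHandle(h);
        return 0;
    }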
I'm trying to find any system functionality that would allow a process to allocate "temporary" memory - i.e. memory that is considered discardable by the process and can be taken away by the system when memory is needed, while still letting the process benefit from available memory when possible. In other words, the process tells the system it's OK to sacrifice the block of memory when the process is not using it. Freeing the block is also preferable to swapping it out (it's more expensive, or at least as expensive, to swap it out than to reconstitute its contents).
Systems (e.g. Linux) have such things inside the kernel, like the filesystem page cache. I am looking for something like this, but available to user space.
I understand there are ways to do this from the program, but it's really more of a kernel job to deal with this. To some extent, I'm asking the kernel:
if you need to reduce my residency, or another process's, take these temporary pages away first
if you are taking these temporary pages off, don't swap them out, just unmap them
Specifically, I'm interested in a solution that would work on Linux, but I would be interested to learn if any exist for other OSes.
UPDATE
An example on how I expect this to work:
map a page (swap-backed); no different from what's available right now.
tell the kernel that the page is "temporary" (for the lack of a better name), meaning that if this page goes away, I don't want it paged in.
tell the kernel that I need the temporary page "back". If the page was unmapped since I marked it "temporary", I am told that happened. If it hasn't, then it starts behaving as a regular page.
Here are the problems with doing this on top of the existing memory management:
To keep pages from being paged back in, I have to allocate them with no backing. But then they can be paged out at any time, without notice. Testing with mincore() doesn't guarantee that the page will still be there by the time mincore() returns. Using mlock() requires elevated privileges.
So, the closest I can get to this is by using mlock(), and anonymous pages. Following the expectations I outlined earlier, it would be:
map an anonymous, locked page. (MAP_ANON|MAP_LOCKED|MAP_NORESERVE). Stamp the page with magic.
to make the page "temporary", unlock it
when needing the page, lock it again. If the magic is there, it's my data, otherwise it's been lost, and I need to reconstitute it.
However, I don't really need the pages to be locked in RAM while I'm using them. Also, MAP_NORESERVE is problematic if memory is overcommitted.
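Roughly, the workaround outlined above could look like the sketch below (the magic constant is arbitrary, and MAP_LOCKED needs either CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK):

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MAGIC 0xDEADBEEFCAFEF00DULL   /* arbitrary stamp marking "my data is still here" */

    int main(void)
    {
        size_t len = (size_t)sysconf(_SC_PAGESIZE);

        /* Step 1: map an anonymous, locked page and stamp it. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        *(uint64_t *)p = MAGIC;
        /* ...fill the rest of the page with the reconstructible data... */

        /* Step 2: mark it "temporary" by unlocking it; the kernel may now reclaim it. */
        munlock(p, len);

        /* Step 3: when the data is needed again, lock it back and check the stamp. */
        if (mlock(p, len) != 0) { perror("mlock"); return 1; }
        if (*(uint64_t *)p == MAGIC)
            puts("page survived: reuse cached contents");
        else
            puts("page was reclaimed: reconstitute contents");

        munmap(p, len);
        return 0;
    }

Whether an unlocked, untouched anonymous page is actually discarded rather than swapped still depends on the system, which is exactly the gap the question is about.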
This is what the VMware ESXi server, aka the Virtual Machine Monitor (VMM) layer, implements: it is a way to reclaim memory from virtual machine guests. Guests that have more memory allocated than they actually use are made to release it to the VMM, so that it can be assigned back to the guests that need it.
This technique of Memory Reclamation is mentioned in this paper: http://www.vmware.com/files/pdf/mem_mgmt_perf_vsphere5.pdf
Along similar lines, you could implement something comparable in your kernel.
I'm not sure I understand exactly what you need. Remember that processes run in virtual memory (their address space is virtual), and that the kernel handles virtual-to-physical address translation (using the MMU) and paging. So a page fault can happen at any time; the kernel chooses when to page in or page out, and which page to evict (only the kernel cares about RAM, and it can page out any physical page at will). Perhaps you want the kernel to tell you when a page has genuinely been discarded. How would the kernel take temporary memory away from your process without your process being notified? The kernel could take some RAM away and later give it back (so you want to know when the memory given back is fresh).
You might use mmap(2) with MAP_NORESERVE first, then again (on the same memory range) with MAP_FIXED|MAP_PRIVATE. See also mincore(2) and mlock(2).
You can also later use madvise(2) with MADV_DONTNEED or MADV_WILLNEED, etc.
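A small illustration of the madvise(2) route on a private anonymous mapping; after MADV_DONTNEED the old contents are dropped and the next access faults in a zero-filled page, which is close to the "discard rather than swap" behaviour discussed above:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = (size_t)sysconf(_SC_PAGESIZE);
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "expensive-to-rebuild data");

        /* Tell the kernel we do not need these contents; for a private
         * anonymous mapping the pages are dropped immediately. */
        if (madvise(p, len, MADV_DONTNEED) != 0) { perror("madvise"); return 1; }

        /* The next access faults in a fresh, zero-filled page. */
        printf("first byte after MADV_DONTNEED: %d\n", p[0]);   /* prints 0 */

        munmap(p, len);
        return 0;
    }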
Perhaps you want to mmap some device like /dev/null, /dev/full, /dev/zero or (more likely) write your own kernel module providing a similar device.
GNU Hurd has an external pager mechanism... You cannot yet get exactly that on Linux. (Perhaps consider mmap on some FUSE mounted file).
I don't understand what you want to happen when the kernel pages out your memory, or when it pages such a page back in because your process is accessing it. Do you want to get a zeroed page, or a SIGSEGV?
I have a Linux machine and I am trying to catch all the writes or reads to the memory for a specific amount of time (I basically need the byte address and the value that is being written). Is there any tool that can help me do that or do I have to change the OS code?
You mentioned that you only want to monitor memory reads and writes to a certain physical memory address. I'm going to assume that when you say memory reads/writes, you mean an assembly instruction that reads/writes data to memory and not an instruction fetch.
You would have to modify some paging code in the kernel so that it page faults when a certain address range is accessed. Then, in the page fault handler, you could log the access. You could extract the target address and data by decoding the instruction that caused the fault and reading the data off the registers. After logging, the page is configured not to fault and the instruction is retried. This is similar to the copy-on-write technique, except that you log each read/write to the region.
The other, hardware, method is to install a bus sniffer or tap into a hardware debugging interface on your platform to monitor which regions of memory are being accessed, but I imagine you'll run into trouble with caches using this method.
As mentioned by another poster, you could also modify an emulator to capture certain memory accesses and run your code on that.
I'd say both methods are very platform specific and will take a lot of effort to do. Out of curiosity, what is it that you're hoping to achieve? There must be a better way to solve it than to monitor accesses to physical memory.
Self-introspection is suitable for some types of debugging. For a complete trace of memory access, it is not. How is the debug code supposed to store a trace without performing more memory access?
If you want to stay in software, your best bet is to run the code being traced inside an emulator. Not a virtual machine that uses the MMU to isolate the test code while still providing direct access, but a full emulator. Plenty exist for x86 and most other architectures you would care about.
Well, if you're just interested in memory reads and writes by a particular process (to part or all of that process's virtual memory space), you can use a combination of ptrace and mprotect (mprotect to make the memory inaccessible, then ptrace: run until the process accesses the memory, and single-step from there).
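As a very rough illustration of the fault-and-log half of that idea, here is an in-process sketch (no ptrace and no single-stepping, so it only catches the first touch of each protected page; calling mprotect from a signal handler is not strictly async-signal-safe, but the pattern is widely used):

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char *watched;
    static size_t page;

    static void segv_handler(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        char msg[64];
        int n = snprintf(msg, sizeof msg, "access at %p\n", si->si_addr);
        write(STDERR_FILENO, msg, (size_t)n);   /* write(2) is async-signal-safe */

        /* Re-enable access so the faulting instruction can be retried.
         * A real tracer would single-step and then re-protect the page. */
        mprotect(watched, page, PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
        page = (size_t)sysconf(_SC_PAGESIZE);
        watched = mmap(NULL, page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (watched == MAP_FAILED) { perror("mmap"); return 1; }

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        mprotect(watched, page, PROT_NONE);   /* arm the "watchpoint" */

        watched[10] = 42;                     /* faults once, gets logged, then succeeds */
        printf("value: %d\n", watched[10]);
        return 0;
    }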
Sorry to say, it's just not possible to do what you want, even if you change OS code. Reads and writes to memory do not go through OS system calls.
The closest you could get would be to use accessor functions for the variables of interest. The accessors could be instrumented to put trace info in a separate buffer. Embedded debugging often does this to get a log of I/O register accesses.
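A trivial sketch of that accessor idea (the names and trace-buffer layout are made up): all code under test goes through the helpers instead of raw loads and stores, and each call appends an entry to a buffer.

    #include <stdint.h>
    #include <stdio.h>

    struct trace_entry { uintptr_t addr; uint32_t value; char op; };

    /* Hypothetical trace buffer; an embedded target might instead stream
     * entries out over a debug UART or reserve a dedicated RAM region. */
    static struct trace_entry trace_buf[1024];
    static unsigned trace_len;

    static void log_access(char op, volatile uint32_t *p, uint32_t v)
    {
        if (trace_len < sizeof trace_buf / sizeof trace_buf[0]) {
            trace_buf[trace_len].addr  = (uintptr_t)p;
            trace_buf[trace_len].value = v;
            trace_buf[trace_len].op    = op;
            trace_len++;
        }
    }

    static uint32_t traced_read(volatile uint32_t *p)
    {
        uint32_t v = *p;
        log_access('R', p, v);
        return v;
    }

    static void traced_write(volatile uint32_t *p, uint32_t v)
    {
        log_access('W', p, v);
        *p = v;
    }

    int main(void)
    {
        uint32_t reg = 0;                 /* stand-in for an I/O register */
        traced_write(&reg, 0x1234);
        printf("read back %u, %u accesses logged\n",
               (unsigned)traced_read(&reg), trace_len);
        return 0;
    }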
I thought this was expected behavior?
From: http://classic.chem.msu.su/cgi-bin/ceilidh.exe/gran/gamess/forum/?C35e9ea936bHW-7675-1380-00.htm
Paraphrased summary: "Working on the Linux port we found that cudaHostAlloc/cuMemHostAlloc CUDA API calls return un-initialized pinned memory. This hole may potentially allow one to examine regions of memory previously used by other programs and Linux kernel. We recommend everybody to stop running CUDA drivers on any multiuser system."
My understanding was that "Normal" malloc returns un-initialized memory, so I don't see what the difference here is...
The way I understand how memory allocation works would allow the following to happen:
-userA runs a program on a system that crunches a bunch of sensitive information. When the calculations are done, the results are written to disk, the process exits, and userA logs off.
-userB logs in next. userB runs a program that requests all available memory in the system, and writes the content of his un-initialized memory, which contains some of userA's sensitive information that was left in RAM, to disk.
I have to be missing something here. What is it? Is memory zero'd-out somewhere? Is kernel/pinned memory special in a relevant way?
Memory returned by malloc() may be nonzero, but only after being used and freed by other code in the same process. Never another process. The OS is supposed to rigorously enforce memory protections between processes, even after they have exited.
Kernel/pinned memory is only special in that it apparently gave a kernel mode driver the opportunity to break the OS's process protection guarantees.
So no, this is not expected behavior; yes, this was a bug. Kudos to NVIDIA for acting on it so quickly!
The only part of a CUDA installation that requires root privileges is the NVIDIA driver. As a result, everything done with the NVIDIA compiler and linker can be done using regular system calls and standard compilation (provided you have the proper information -lol-). If any security hole lies there, it remains whether or not cudaHostAlloc/cuMemHostAlloc is modified.
I am dubious about the first answer seen on this post. The man page for malloc specifies that the memory is not cleared, and the man page for free does not mention any clearing of the memory either.
Clearing the memory therefore seems to be the responsibility of the coder of a sensitive section -lol-, which leaves the problem of an unexpected (rare) exit. Apart from VMS (a good but not widely used OS), I don't think any OS accepts the performance cost of systematic clearing. I am also not clear how the system could track, within newly allocated heap memory, what was previously part of the process's area and what was not.
My conclusion is: if you need a strict level of privacy, do not use a multi-user system (or use VMS).
I'm using Slackware 12.2 on an x86 machine. I'm trying to debug/figure things out by dumping specific parts of memory. Unfortunately my knowledge of the Linux kernel is pretty much limited to what I need for programming/pentesting.
So here's my question: is there a way to access any point in memory? I tried doing this with a char pointer so that it would only be a byte long. However, the program crashed and spat out something along the lines of "can't access memory location". I was pointing at location 0x00000000, which is where the system stores its interrupt vectors (unless that changed), which shouldn't really matter.
My understanding is that the kernel allocates memory (data, stack, heap, etc.) to a program, and that the program cannot go anywhere else. So I was thinking of using NASM to tell the CPU to fetch what I need directly, but I'm unsure whether that would work (and I would need to figure out how to translate MASM to NASM).
Alright, well there's my long winded monologue. Essentially my question is: "Is there a way to achieve this?".
Anyway...
If your program is running in user-mode, then memory outside of your process memory won't be accessible, by hook or by crook. Using asm will not help, nor will any other method. This is simply impossible, and is a core security/stability feature of any OS that runs in protected mode (i.e. all of them, for the past 20+ years). Here's a brief overview of Linux kernel memory management.
The only way you can explore the entire memory space of your computer is by using a kernel debugger, which will allow you to access any physical address. However, even that won't let you look at the memory of every process at the same time, since some processes will have been swapped out of main memory. Furthermore, even in kernel mode, physical addresses are not necessarily the same as the addresses visible to the process.
Take a look at /dev/mem or /dev/kmem (man mem)
If you have root access you should be able to see your memory there. This is a mechanism used by kernel debuggers.
Note the warning: Examining and patching is likely to lead to unexpected results when read-only or write-only bits are present.
From the man page:
mem is a character device file that is an image of the main memory of the computer. It may be used, for example, to examine (and even patch) the system.
Byte addresses in mem are interpreted as physical memory addresses. References to nonexistent locations cause errors to be returned.
...
The file kmem is the same as mem, except that the kernel virtual memory rather than physical memory is accessed.
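A short sketch of reading a physical address through /dev/mem (needs root, the mmap offset must be page-aligned, and on kernels built with CONFIG_STRICT_DEVMEM most RAM is off limits; the address below is just a placeholder in the legacy BIOS region):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        off_t phys = 0x000F0000;                 /* placeholder physical address */
        size_t page = (size_t)sysconf(_SC_PAGESIZE);

        int fd = open("/dev/mem", O_RDONLY);
        if (fd < 0) { perror("open /dev/mem"); return 1; }

        /* Map one page of physical memory; the file offset is the physical address. */
        void *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd,
                       phys & ~(off_t)(page - 1));
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        const unsigned char *b = (const unsigned char *)p + (phys & (off_t)(page - 1));
        for (int i = 0; i < 16; i++)
            printf("%02x ", b[i]);
        putchar('\n');

        munmap(p, page);
        close(fd);
        return 0;
    }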
I know that information exchange can happen via the following interfaces between the kernel and user-space programs:
system calls
ioctls
/proc & /sys
netlink
I want to find out
If I have missed any other interface?
Which one of them is the fastest way to exchange large amounts of data?
(and if there is any document/mail/explanation supporting such a claim that I can refer to)
Which one is the recommended way to communicate? (I think it's netlink, but I'd still love to hear opinions.)
The fastest way to exchange a vast amount of data is memory mapping. The mmap call can be used on a device file, and the corresponding kernel driver can then decide to map kernel memory into the user's address space. A good example of this is the Video For Linux drivers, and I suppose the frame buffer driver works the same way. For a good explanation of how the V4L2 driver works, you have:
The lwn.net article about streaming I/O
The V4L2 spec itself
You can't beat memory mapping for large amounts of data, because there is no memcpy-like operation involved; the underlying physical memory is effectively shared between kernel and userspace. Of course, as with any shared-memory mechanism, you have to provide some synchronisation so that the kernel and userspace don't both think they have ownership at the same time.
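The user-space half of that pattern is just an mmap on the driver's device node; /dev/mydriver below is hypothetical and stands in for a V4L2- or framebuffer-style device whose driver implements the mmap file operation:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical device node whose driver implements the mmap fop. */
        int fd = open("/dev/mydriver", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        size_t len = 4096;   /* size of the kernel buffer exported by the driver */
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        /* From here on, reads and writes touch the same physical pages the
         * kernel driver uses (no copy), but you must agree on a synchronisation
         * scheme, e.g. an ioctl or poll to signal "buffer ready". */
        memcpy(buf, "hello from userspace", 21);

        munmap(buf, len);
        close(fd);
        return 0;
    }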
Shared memory between kernel and userspace is doable.
http://kerneltrap.org/node/14326
For instructions/examples.
You can also use a named pipe, which is pretty fast.
All this really depends on what data you are sharing, whether it is accessed concurrently, and how the data is structured. Plain calls may be enough for simple data.
Linux kernel /proc FIFO/pipe might also help.
Good luck!
You may also consider relay (formerly relayfs):
"Basically relayfs is just a bunch of per-cpu kernel buffers that can be efficiently written into from kernel code. These buffers are represented as files which can be mmap'ed and directly read from in user space. The purpose of this setup is to provide the simplest possible mechanism allowing potentially large amounts of data to be logged in the kernel and 'relayed' to user space."
http://relayfs.sourceforge.net/
You can obviously share data using copy_from_user/copy_to_user etc.; you can easily set up a character device driver (basically all you have to do is fill in a file_operations structure), but this is by far not the fastest way.
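For illustration, a skeletal character-device module along those lines; the misc device name and message are made up, and a real driver would add locking and a proper buffer:

    #include <linux/errno.h>
    #include <linux/fs.h>
    #include <linux/miscdevice.h>
    #include <linux/module.h>
    #include <linux/uaccess.h>

    static const char msg[] = "data from kernel space\n";

    /* read() handler: copy a kernel buffer out to the user's buffer. */
    static ssize_t hello_read(struct file *f, char __user *buf,
                              size_t count, loff_t *ppos)
    {
        size_t avail = sizeof(msg) - 1;
        size_t pos = (size_t)*ppos;

        if (pos >= avail)
            return 0;                         /* EOF */
        if (count > avail - pos)
            count = avail - pos;
        if (copy_to_user(buf, msg + pos, count))
            return -EFAULT;
        *ppos = pos + count;
        return count;
    }

    static const struct file_operations hello_fops = {
        .owner = THIS_MODULE,
        .read  = hello_read,
    };

    static struct miscdevice hello_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "hello_chr",                 /* appears as /dev/hello_chr */
        .fops  = &hello_fops,
    };

    static int __init hello_init(void)
    {
        return misc_register(&hello_dev);
    }

    static void __exit hello_exit(void)
    {
        misc_deregister(&hello_dev);
    }

    module_init(hello_init);
    module_exit(hello_exit);
    MODULE_LICENSE("GPL");

Build it with the usual out-of-tree kbuild Makefile and load it with insmod; then cat /dev/hello_chr reads the kernel buffer through copy_to_user.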
I have no benchmarks, but system calls on modern systems should be the fastest; my reasoning is that they are what has been optimized the most. It used to be that getting from user mode to kernel mode meant raising an interrupt, which went through the interrupt table to locate the handler (int 0x80) and then switched to kernel mode. That was really slow, and then came the sysenter instruction, which makes the transition much faster: without going into details, sysenter loads the kernel entry point (CS:EIP) directly from model-specific registers, so the switch is quick.
Shared memory, on the contrary, requires writing to and reading from memory, which is far more expensive than reading from a register.
Here is a possible compilation of all the available interfaces, although some of them overlap one another (e.g. sockets ultimately go through system calls as well):
Procfs
Sysfs
Configfs
Debugfs
Sysctl
devfs (eg, Character Devices)
TCP/UDP Sockets
Netlink Sockets
Ioctl
Kernel System Calls
Signals
Mmap
As for shared memory, I've found that even with NUMA, two threads running on two different cores communicating through shared memory still have to read and write through the L3 cache, which, if you are lucky (both cores on one socket), is about 2x slower than a syscall, and if not (cores on different sockets), about 5x or more slower than a syscall. I think the syscall's hardware mechanism helps here.