How can I dump z/OS above-the-bar memory for debugging - 64-bit

I am writing a utility to exploit above-the-bar memory. I would like to dump the memory segments that I have allocated to help with debugging. SNAP and SNAPX refer to 24 and 31 bit addressing modes, but not 64. Forcing an abend using an ABEND macro or decimal zero divide provide very limited amounts of above the bar memory in the dump. Does anyone have experience dumping above-the-bar memory in 64 bit addressing mode? Do you have suggestions?
I was able to access the information using the IEATDUMP macro.
Thanks for your interest and responses.

I would recommend taking a slip dump and importing the dump into Abend-Aid. Abend-Aid is great for above-the-bar debugging.
Execute the slip command below.
Once it dumps, type 'TSO DUMPLOG' on the command line and copy the dump dataset.
Open Abend-Aid, type 'imp' on the command line and import the dump dataset.
Slip Command Example:
SL SET,IF,EN,ID=CBB0,P=(MYPROGRAM,000328),A=SVCD,AL=(CU,S),E
Where MYPROGRAM is the name of the job and 000328 is the offset in the program when I want a dump.
Note, this command is executed in SDSF

Related

Is there a way to see what happens during program execution in detail on Linux?

I am trying to debug a performance of my program. What would be ideal is to have a way to see in detail when was the thread doing useful work, when was it blocked by page faults, when was it executing some memory writes and reads, etc...
I would simply like to have a detailed understanding of whats going on. Is it possible?
The linux kernel sources come with the perf tool that can measure a large number of performance counter, all of those you listed included, and can print statistics about it, annotate symbols, instructions and source lines with them (if debug symbols are available), and can track any process or also logical cpu cores.
Your Linux distribution will have the tool probably in a standalone package. Some hardening options of the kernel may limit what information root or non-root users can collect with it.
You can use perf and visualizing a perf output file graphically with hotspot

How do different commands get executed in CPU x86-64 registers?

Years ago a teacher once said to class that 'everything that gets parsed through the CPU can also be exploited'.
Back then I didn't know too much about the topic, but now the statement is nagging on me and I
lack the correct vocabulary to find an answer to this question in the internet myself, so I kindly ask you for help.
We had the lesson about 'cat', 'grep' and 'less' and she said that in the worst case even those commands can cause harm if we parse the wrong content through it.
I don't really understand how she meant that. I do know how CPU registers work, we also had to write an educational buffer overflow so I have seen assembly code in the registers aswell.
I still don't get the following:
How do commands get executed in the CPU at all? e.g. I use 'cat' so somehwere there will be a call of the command. But how does the data I enter get parsed to the CPU? If I 'cat' a .txt file which contains 'hello world' - can I find that string in HEX somewhere in the CPU registers? And if yes:
How does the CPU know that said string is NOT to be executed?
Could you think of any scencario where the above commands could get exploited? Afaik only text gets parsed through it, how could that be exploitable? What do I have to be careful about?
Thanks alot!
Machine code executes by being fetched by the instruction-fetch part of the CPU, at the address pointed to by RIP, the instruction-pointer. CPUs can only execute machine code from memory.
General-purpose registers get loaded with data from data load/store instructions, like mov eax, [rdi]. Having data in registers is totally unrelated to having it execute as machine code. Remember that RIP is a pointer, not actual machine-code bytes. (RIP can be set with jump instructions, including indirect jump to copy a GP register into it, or ret to pop the stack into it).
It would help to learn some basics of assembly language, because you seem to be missing some key concepts there. It's kind of hard to answer the security part of this question when the entire premise seems to be built on some misunderstanding of how computers work. (Which I don't think I can easily clear up here without writing a book on assembly language.) All I can really do is point you at CPU-architecture stuff that answers part of the title question of how instructions get executed. (Not from registers).
Related:
How does a computer distinguish between Data and Instructions?
How instructions are differentiated from data?
Modern Microprocessors
A 90-Minute Guide! covers the basic fetch/decode/execute cycle of simple pipelines. Modern CPUs might have more complex internals, but from a correctness / security POV are equivalent. (Except for exploits like Spectre and Meltdown that depend on speculative execution).
https://www.realworldtech.com/sandy-bridge/3/ is a deep-dive on Intel's Sandybridge microarchitecture. That page covering instruction-fetch shows how things really work under the hood in real CPUs. (AMD Zen is fairly similar.)
You keep using the word "parse", but I think you just mean "pass". You don't "parse content through" something, but you can "pass content through". Anyway no, cat usually doesn't involve copying or looking-at data in user-space, unless you run cat -n to add line numbers.
See Race condition when piping through x86-64 assembly program for an x86-64 Linux asm implementation of plain cat using read and write system calls. Nothing in it is data-dependent, except for the command-line arg. The data being copied is never loaded into CPU registers in user-space.
Inside the kernel, copy_to_user inside Linux's implementation of a read() system call on x86-64 will normally use rep movsb for the copy, not a loop with separate load/store, so even in kernel the data gets copied from the page-cache, pipe buffer, or whatever, to user-space without actually being in a register. (Same for write copying it to whatever stdout is connected to.)
Other commands, like less and grep, would load data into registers, but that doesn't directly introduce any risk of it being executed as code.
Most of the things have already been answered by Peter. However i would like to add a few things.
How do commands get executed in the CPU at all? e.g. I use 'cat' so somehwere there will be a call of the command. But how does the data I enter get parsed to the CPU? If I 'cat' a .txt file which contains 'hello world' - can I find that string in HEX somewhere in the CPU registers?
cat is not directly executed by the CPU cat.c. You could check the source code and get and in-depth view. .
What actually happens is that each instruction is converted to assembly instruction and they get executed by the CPU. The instructions are not vulnerable because what they do is just move some data and switch some bits. Most of the vulnerability are due to memory management and cat has been vulnerable in the past Check this for more detail
How does the CPU know that said string is NOT to be executed?
It does not. Its the job of the operating system to tell what is to be executed and what not.
Could you think of any scencario where the above commands could get exploited? Afaik only text gets parsed through it, how could that be exploitable? What do I have to be careful about?
You have to be careful about how you are passing the text file to the memory. You could even make your own interpreter that would execute txt file and then the interpreter will be telling the CPU about how to execute that instruction.

How to dump the heap of running C++ process to a file under Linux?

I've got a program that is running on a headless/embedded Linux box, and under certain circumstances that program seems to be using up quite a bit more memory (as reported by top, etc) than I would expect it to use.
Since the fault condition is difficult to reproduce outside of the actual working environment, and since the embedded box doesn't have niceties like valgrind or gdb installed, what I'd like to do is simply write out the process's heap-memory to a file, which I could then transfer to my development machine and look through at my leisure, to see if I can tell from the contents of the file what kind of data it is that is taking up the bulk of the heap. If I'm lucky there might be a smoking gun like a repeating string or magic-number that comes up a lot, that points me to the place in my code that is either leaking or perhaps just growing a data structure without bounds.
Is there a good way to do this? The only way I can think of would be to force the process to crash and then collect a core dump, but since the fault condition is rare it would be preferable if I could collect the information without crashing the process as a side effect.
You can read the entire memory space of the process via /proc/pid/mem; You can read /proc/pid/maps to see what is where in the memory space (so you can find the bounds of the heap and read just that). You can attempt to read the data while the process is running (in which case it might be changing while you are reading it), or you can stop the process with a SIGSTOP signal and later resume it with a SIGCONT.

ImageMagick's display GPU "memory leak"?

I'm testing CUDA app and I have run into strange memory issue:
My program performs some image operations and displays it using ImageMagick's display program.
The problem is that every time I run that IM's display I get more GPU memory usage, so less memory for GPU computation.
I'm using IM's display, because I couldn't find anything that displays image from the pipe input. Any suggestions?
Anyway why IM's display takes so much GPU memory and why is it not freed?
Based on your question, you're attempting to display a series of files in sequence using a shell not unlike Bash after performing a set of GPU-intensive operations. You're curious why more GPU memory is being consumed with every subsequent invocation of ImageMagick display, which appears to be closing out successfully after the conclusion of each operation.
We may further theorize that you're using ImageMagick's OpenCL support for at least some of your processing. While we don't have enough information to determine what your GPU's texture buffers look like at the completion of each rendering via display, I speculate your GPU isn't freeing textures expediently, causing memory to slowly creep up.
Instead of continuing to build conjecture around this hypothesis, I will instead recommend a tool to debug your issue: gDEBugger. This should allow you to interrogate your video card to determine exactly why things are slowing down.
Best of luck with your application.
I know it's old, but we have figured out that using pipes (popen()) makes sophisticated copy of the program in memory, what also causes copying the end program directives, or whatever called... So when I close program opened with popen I also finish all CUDA related context that are usually freed in "background", when program ends. So cleaning CUDA memory after I close popen application won't work, and I thing here was my memory leak and general major program error.
I hope someone will find it useful.

How to find or calculate a Linux process's page table size and other kernel accounting?

How can I find out how big a Linux process's page table is, along with any other variable-size process accounting?
If you are really interested in the page tables, do a
$ cat /proc/meminfo | grep PageTables
PageTables: 24496 kB
Since Linux 2.6.10, the amount of memory used by a single process' page tables has been exposed via the VmPTE field of /proc/<pid>/status.
Not sure about Linux, but most UNIX variants provide sysctl(3) for this purpose. There is also the sysctl(8) command line utility.
Hmmm, back in Ye Olden Tymes, we used to call nlist(3) to get the system address for the data we were interested in, then open /dev/kmem, seek to the address, then read the data. Not sure if this works in Linux, but it might be worth typing "man 3 nlist" and seeing what comes back.
You should describe your problem, and not ask about details. If you fork too much (especially with a process which has a large address space) there are all kind of things which go wrong (including out of memory), hitting a pagetable maximum size is IMHO not a realistic problem.
Thad said, I would also be interested to read a process pagetable share in Linux.
As a simple rule of thumb you can however asume that each process occopies a share in the pagetable which is equal to its virtual size, for example 6 bytes for each page. So for example if you have a Oracle Database with 8GB SGA and 500 Processes sharing it, each of the process will use 14MB pagetable, which results in 7GB pagetables+8GB SGA. (sample numbers from http://kevinclosson.wordpress.com/2009/07/25/little-things-doth-crabby-make-%E2%80%93-part-ix-sometimes-you-have-to-really-really-want-your-hugepages/)

Resources