Reading a variable length record mainframe file - mainframe

I have a mainframe data file in binary format, with variable records. No copybook works in this case, nor do I know end of line. How do I read such a file?

Assuming you're reading this file in a COBOL program running on the Mainframe, this is really no problem. COBOL doesn't write null-delimited output. It writes variable length records with the length embedded in the first two bytes of 4-byte prefix area called a (R)ecord (D)escriptor (W)ord, which is NOT included in the record layout copybook. To read such a record back into another COBOL, you just need a properly coded copybook.

Related

What is the purpose of a N-byte 'magic' number?

When parsing NES roms, the first four bytes are a 'magic' number:
78/0x4E (N)
69/0x45 (E)
83/0x53 (S)
26/0x1A (DOS end of file character)
What purpose does this, or any other examples, provide?
So-called "magic numbers" are often specified by binary file formats as a kind of "signature" of the file format. Programs that read the file format can check for the magic number and reject the file as invalid if it doesn't match. This provides an easy way to catch garbage data early. It also allows programs that understand multiple formats to figure out what type of file they're reading, so it can be processed appropriately.
In the case of iNES format ROM images, it's no different. The purpose of the magic number in the header is to tell the emulator (or other program) that this is an iNES format image and that it should be parsed as such.

Changing the head of a large Fortran binary file without dealing with the whole body

I have a large binary file (~ GB size) generated from a Fortran 90 program. I want to modify something in the head part of the file. The structure of the file is very complicated and contains many different variables, which I want to avoid going into. After reading and re-writing the head, is it possible to "copy and paste" the reminder of the file without knowing its detailed structure? Or even better, can I avoid re-writing the whole file altogether and just make changes on the original file? (Not sure if it matters, but the length of the header will be changed.)
Since you are changing the length of the header, I think that you have to write a new, revised file. You could avoid having to "understand" the records after the header by opening the file with stream access and just reading bytes (or perhaps four byte words if the file is a multiple of four bytes) until you reach EOF and copying them to the new file. But if the file was originally created as sequential access and you want to access it that way in the future, you will have to handle the record length information for the header record(s), including altering the value(s) to be consistent with the changed the length of the record(s). This record length information is typically a four-byte integer at beginning and end of each record, but it depends on the compiler.

Windows console application with gets() ROP exploit

I'm trying (for learning purposes) to take advantage of gets() function vulnerability using return-oriented programming (ROP) technique. The target program is a Windows console application that in some point asks for some input, and then uses gets() to store the input in the local 80 characters long array.
I created a file that contains 80 'a' characters in the beginning + some extra characters + 0x5da06c48 address for overwriting the old EIP pointer.
I'm opening the file in text editor and copy-pasting the content into the console as input. I've used IDA Pro (or OllyDbg) to set a breakpoint right after the return from the gets() function and noticed that the address was corrupted - it was set to 0x3fa03f48 (two 3f substitutions).
I've tried other addresses as well - part of them works well, but most of the times the address is being corrupted (sometimes characters missing or substituted, sometimes truncated).
How to get over this problem? Any suggestion will be highly appreciated!
Copy-Pasting binary data is hit-and-miss. Have you tried feeding the input into your test program directly from the file using input redirection?
First of all keep track of the Endianness of your platform. If you think your bits are in the right order but you are still getting malformed input, it might be that your shell/text editor isn't binary safe. You are better off writing an exploit for this flaw in a scripting language such as Python, using the Subprocess library which allows you to write data directly to an arbitrary process's stdin pipe.

How to find the position of Central Directory in a Zip file?

I am trying to find the position of the first Central Directory file header in a Zip file.
I'm reading these:
http://en.wikipedia.org/wiki/Zip_(file_format)
http://www.pkware.com/documents/casestudies/APPNOTE.TXT
As I see it, I can only scan through the Zip data, identify by the header what kind of section I am at, and then do that until I hit the Central Directory header. I would obviously read the File Headers before that and use the "compressed size" to skip the actual data, and not for-loop through every byte in the file...
If I do it like that, then I practically already know all the files and folders inside the Zip file in which case I don't see much use for the Central Directory anymore.
To my understanding the purpose of Central Directory is to list file metadata, and the position of the actual data in the Zip file so you wouldn't need to scan the whole file?
After reading about End Of Central Directory record, Wikipedia says:
This ordering allows a zip file to be created in one pass, but it is
usually decompressed by first reading the central directory at the
end.
How would I find End of Central Directory record easily? We need to remember that it can have an arbitrary sized comment there, so I may not know how many bytes from the end of the data stream it is located at. Do I just scan it?
P.S. I'm writing a Zip file reader.
Start at the end and scan towards the beginning, looking for the end of directory signature and counting the number of bytes you have scanned. When you find a candidate, get the byte 20 offset for the comment length (L). Check if L + 20 matches your current count. Then check that the start of the central directory (pointed to by the byte 12 offset) has an appropriate signature.
If you assumed the bits were pretty random when the signature check happened to be a wild guess (e.g. a guess landing into a data segment), the probability of getting all the signature bits correct is pretty low. You could refine this and figure out the chance of landing in a data segment and the chance of hitting a legitimate header (as a function of the number of such headers), but this is already sounded like a low likelihood to me. You could increase your confidence level by then checking the signature of the first file record listed, but be sure to handle the boundary case of an empty zip file.
I ended up looping through the bytes starting from the end. The loop stops if it finds a matching byte sequence, the index is below zero or if it already went through 64k bytes.
Just cross your fingers and hope that there isn't an entry with the CRC, timestamp or datestamp as 06054B50, or any other sequence of four bytes that happen to be 06054B50.

Splitting long input into multiple text files

I have some code which will generate an infinite number of lines in output. So, I can't store those values in a single output file.
Instead, I split the output file into more files. I am splitting the file according to the index numbers. Now my doubt is I don't know how many numbers my file will be having. So is it possible to split the file into different output without giving index? For example:
first 100,000 lines in m.txt
from 100,001 to next 200,000 in n.txt
If you don't need to be able to find a particular line based on the file name, you can split the output based on the file size. Write lines to m1.txt until the next line will make it >1MB; then move to the next file - m2.txt.
split(1) appears to be exactly the tool for your job.
Generate files with a running index. Start with opening e.g. m_000001.txt. Write a fixed nuber of lines to that file. Close file. Open next file, e.g. m_000002.txt, and continue.
Making sure that you don't overflow the disk is an housekeepting task to be done separately. Here one can think of backups, compression, file rotation and so on.
You may want to use logrotate for this purpose. It has a lot of options: check out the man page.
Here's the introduction of the man page:
"logrotate is designed to ease administration of systems that generate
large numbers of log files. It allows automatic rotation, compression,
removal, and mailing of log files. Each log file may be handled daily,
weekly, monthly, or when it grows too large."
4 ways to split while writing:
A) Fixed no of characters (Size)
B) Fixed no of lines
C) Fixed Interval of time before writing
D) Fixed Counter of a function before calling a write
Based on those splitings, You can name the output file.

Resources