Does DOS fill memory after the .exe file at .exe start time?

I was reading https://stackoverflow.com/a/3723153/97248 about how DOS loads an .exe file. There is the minimum required memory (== additional memory) field in the .exe header (at offset 10). When this is nonzero, does DOS fill this part of memory with 0s before calling the entry point of the program?
DOSBox 0.74-3 seems to be filling it with 0s.

The answer is no. More details, including assembly code to manually zero-initialize the additional memory (.bss), are in the answer linked in @RossRidge's comment: https://retrocomputing.stackexchange.com/questions/12027/did-dos-zero-out-the-bss-area-when-it-loaded-a-program/12030#12030

Related

Efficiently inserting blocks into the middle of a file

I'm looking for, essentially, the ext4 equivalent of mremap().
I have a big mmap()'d file that I'm allocating arrays in, and the arrays need to grow. So I want to make the first array larger at its current location, and budge all the other arrays along in the file and the address space to make room.
If this was just anonymous memory, I could use mremap() to budge over whole pages in constant time, as long as I'm inserting a whole number of memory pages. But this is a disk-backed file, so the data needs to move in the file as well as in memory.
I don't actually want to read and then rewrite whole blocks of data to and from the physical disk. I want the data to stay on disk in the physical sectors it is in, and to induce the filesystem to adjust the file metadata to insert new sectors where I need the extra space. If I have to keep my inserts to some multiple of a filesystem-dependent disk sector size, that's fine. If I end up having to copy O(N) sector or extent references around to make room for the inserted extent, that's fine. I just don't want to have 2 gigabytes move from and back to the disk in order to insert a block in the middle of a 4 gigabyte file.
How do I accomplish an efficient insert by manipulating file metadata? Is a general API for this actually exposed in Linux? Or one that works if the filesystem happens to be e.g. ext4? Will a write() call given a source address in the memory-mapped file reduce to the sort of efficient shift I want under the right circumstances?
Is there a C or C++ API function with the semantics "copy bytes from here to there and leave the source with an undefined value" that I should be calling in case this optimization gets added to the standard library and the kernel in the future?
I've considered just always allocating new pages at the end of the file, and mapping them at the right place in memory. But then I would need to work out some way to reconstruct that series of mappings when I reload the file. Also, shrinking the data structure would be a nontrivial problem. At that point, I would be writing a database page manager.
I think I actually may have figured it out.
I went looking for "linux make a file sparse", which led me to this answer on Unix & Linux Stack Exchange, which mentioned the fallocate command-line tool. The fallocate tool has a --dig-holes option, which turns parts of a file that could be represented by holes into actual holes.
I then went looking for "fallocate dig holes" to find out how that works, and I got the fallocate man page. I noticed it also offers a way to insert a hole of some size:
-i, --insert-range
       Insert a hole of length bytes from offset, shifting existing data.
If a command-line tool can do it, Linux can do it, so I dug into the source code for fallocate, which you can find on GitHub:
case 'i':
        mode |= FALLOC_FL_INSERT_RANGE;
        break;
It looks like the fallocate tool accomplishes a cheap hole insert (and a move of all the other file data) by calling the Linux-specific fallocate() function with the FALLOC_FL_INSERT_RANGE flag, added in Linux 4.1. This flag won't work on all filesystems, but it does work on ext4, and it does exactly what I want: adjust the file metadata to efficiently free up some space in the file's offset space at a certain point.
It's not immediately clear to me how this interacts with currently memory-mapped pages, but I think I can work with this.
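For concreteness, here is a minimal sketch of the call this all boils down to, assuming Linux 4.1+ and a supporting filesystem (ext4 since 4.2, XFS since 4.1); insert_hole is just an illustrative wrapper, not part of any API:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

/* Insert `len` bytes of hole at offset `off` in an already-open file,
 * shifting the rest of the file toward higher offsets. Both `off` and `len`
 * must be multiples of the filesystem block size, or the call fails. */
static int insert_hole(int fd, off_t off, off_t len)
{
    if (fallocate(fd, FALLOC_FL_INSERT_RANGE, off, len) < 0) {
        perror("fallocate(FALLOC_FL_INSERT_RANGE)");   /* e.g. EOPNOTSUPP, EINVAL */
        return -1;
    }
    return 0;
}

As for the interaction with existing mappings, I won't claim anything definitive: the conservative approach is to redo the mmap() of the affected range after the insert and adjust the in-memory pointers by the inserted length.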

RAM usage after program loading - mismatch with TOP statistics

I expected that after running a program, the top utility would show the memory (VIRT column) used by my program as equal to or greater than its size on disk. I was surprised when the result was different: the file size on disk turned out to be greater than what top showed. Could you explain what is wrong with my expectations?
P.S. The application is a native binary built with gcc.
Not all parts of an executable file get mapped into memory when you run it.
If you examine your executable with readelf -WS <executable> (assuming the ELF executable format), you can see the list of file sections. Only sections with the A (alloc) flag get loaded.
Sections whose names start with .debug, for example, do not get mapped (unless the program runs under a debugger), and these sections are often the largest.
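As a rough way to see this for yourself, here is a hedged sketch (64-bit ELF only, assuming e_shentsize matches Elf64_Shdr, with minimal error handling) that sums the sizes of the sections marked SHF_ALLOC and compares them with the total of all section sizes:

#include <elf.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1) { perror("fread"); return 1; }

    Elf64_Shdr *sh = calloc(eh.e_shnum, sizeof *sh);
    fseek(f, (long)eh.e_shoff, SEEK_SET);
    if (fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum) { perror("fread"); return 1; }

    unsigned long long alloc = 0, total = 0;
    for (int i = 0; i < eh.e_shnum; i++) {
        total += sh[i].sh_size;
        if (sh[i].sh_flags & SHF_ALLOC)    /* only these are mapped at run time */
            alloc += sh[i].sh_size;
    }
    printf("alloc (mapped) section bytes: %llu\n", alloc);
    printf("all section bytes:            %llu\n", total);
    free(sh);
    fclose(f);
    return 0;
}

The numbers are only approximate (for instance, .bss is SHF_ALLOC but occupies no file space, and VIRT also counts shared libraries, heap and stack), but a large gap between the two totals usually points at unmapped debug sections.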

Compare a running process in memory with an executable on disk

I have a big project that will load an executable (let's call it greeting) into memory, but for some reason (e.g., there are many files called greeting under different directories), I need to know whether the process in memory is exactly the one I want to use.
I know how to compare two files: diff, cmp, cksum and so on. But is there any way to compare a process in memory with an executable on disk?
According to this answer you can get the contents of the memory version of the binary from the proc file system. I think you can cksum the original and the in-memory version.
According to the man page of /proc, under Linux 2.2 and later, the file is a symbolic link containing the actual pathname of the executed command. Apparently, the binary is loaded into memory, and /proc/[pid]/exe points to the content of the binary in memory.
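A hedged sketch of that idea follows (Linux only; it compares bytes directly instead of using cksum, and the pid and path are taken from the command line):

#include <stdio.h>
#include <string.h>

/* Return 1 if the two files have identical contents, 0 if not, -1 on error. */
static int same_contents(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb");
    if (!fa || !fb) {
        perror("fopen");
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return -1;
    }
    char ba[8192], bb[8192];
    size_t na, nb;
    int same = 1;
    do {
        na = fread(ba, 1, sizeof ba, fa);
        nb = fread(bb, 1, sizeof bb, fb);
        if (na != nb || memcmp(ba, bb, na) != 0) { same = 0; break; }
    } while (na > 0);
    fclose(fa);
    fclose(fb);
    return same;
}

int main(int argc, char **argv)
{
    if (argc != 3) { fprintf(stderr, "usage: %s <pid> <path>\n", argv[0]); return 2; }
    char proc[64];
    snprintf(proc, sizeof proc, "/proc/%s/exe", argv[1]);
    int r = same_contents(proc, argv[2]);
    if (r < 0) return 2;
    puts(r ? "identical" : "different");
    return r ? 0 : 1;
}

One useful property here is that /proc/[pid]/exe keeps referring to the executable that was actually exec'ed, so the comparison stays meaningful even if a file at the same path has since been replaced.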

Prepend to Very Large File in Fixed Time or Very Fast [closed]

I have a file that is very large (>500GB) that I want to prepend with a relatively small header (<20KB). Running commands such as:
cat header bigfile > tmp
mv tmp bigfile
or similar commands (e.g., with sed) is very slow.
What is the fastest method of writing a header to the beginning of an existing large file? I am looking for a solution that can run under CentOS 7.2. It is okay to install packages from CentOS install or updates repo, EPEL, or RPMForge.
It would be great if some method exists that doesn't involve relocating or copying the large amount of data in the bigfile. That is, I'm hoping for a solution that can operate in fixed time for a given header file regardless of the size of the bigfile. If that is too much to ask for, then I'm just asking for the fastest method.
Compiling a helper tool (as in C/C++) or using a scripting language is perfectly acceptable.
Is this something that needs to be done once, to "fix" a design oversight perhaps? Or is it something that you need to do on a regular basis, for instance to add summary data (for instance, the number of data records) to the beginning of the file?
If you need to do it just once, then your best option is to accept that a mistake has been made and take the consequences of the retro-fix. As long as you make your destination drive different from the source drive, you should be able to fix up a 500GB file within about two hours. So after a week of batch processes running after hours, you could have upgraded perhaps thirty or forty files.
If this is a standard requirement for all such files, and you think you can apply the change only when the file is complete -- some sort of summary information perhaps -- then you should reserve the space at the beginning of each file and leave it empty. Then it is a simple matter of seeking into the header region and overwriting it with the real data once it can be supplied.
As has been explained, standard file systems require the whole of a file to be copied in order to add something at the beginning.
If your 500GB file is on a standard hard disk, which will allow data to be read at around 100MB per second, then reading the whole file will take 5,120 seconds, or roughly 1 hour 30 minutes.
As long as you arrange for the destination to be a separate drive from the source, you can mostly write the new file in parallel with the read, so it shouldn't take much longer than that. But there's no way to speed it up other than that, I'm afraid.
If you were not bound to CentOS 7.2, your problem could be solved (with some reservations¹) by fallocate(), which provides the needed functionality for the ext4 filesystem starting from Linux 4.2 and for the XFS filesystem since Linux 4.1:
int fallocate(int fd, int mode, off_t offset, off_t len);
This is a nonportable, Linux-specific system call. For the portable, POSIX.1-specified method of ensuring that space is allocated for a file, see posix_fallocate(3).
fallocate() allows the caller to directly manipulate the allocated disk space for the file referred to by fd for the byte range starting at offset and continuing for len bytes.
The mode argument determines the operation to be performed on the given range. Details of the supported operations are given in the subsections below.
...
Increasing file space
Specifying the FALLOC_FL_INSERT_RANGE flag (available since Linux 4.1) in mode increases the file space by inserting a hole within the file size without overwriting any existing data. The hole will start at offset and continue for len bytes. When inserting the hole inside file, the contents of the file starting at offset will be shifted upward (i.e., to a higher file offset) by len bytes. Inserting a hole inside a file increases the file size by len bytes.
...
FALLOC_FL_INSERT_RANGE requires filesystem support. Filesystems that support this operation include XFS (since Linux 4.1) and ext4 (since Linux 4.2).
¹ fallocate allows prepending data to the file only at multiples of the filesystem block size. So it will solve your problem only if it's acceptable for you to pad the extra space with whitespace, comments, etc.
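Putting that together, here is a hedged sketch under those reservations (XFS or ext4 on a new enough kernel; the inserted space is rounded up to whole filesystem blocks, so the bytes between the end of the header and the original start of the data read back as zero padding):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/vfs.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) { fprintf(stderr, "usage: %s <bigfile> <header>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open bigfile"); return 1; }

    struct statfs sf;
    if (fstatfs(fd, &sf) < 0) { perror("fstatfs"); return 1; }
    off_t blk = sf.f_bsize;                      /* insert size must be a multiple of this */

    struct stat hs;
    if (stat(argv[2], &hs) < 0) { perror("stat header"); return 1; }
    off_t need = ((hs.st_size + blk - 1) / blk) * blk;   /* round header size up to blocks */

    /* Shift the whole file toward higher offsets by adjusting metadata only. */
    if (fallocate(fd, FALLOC_FL_INSERT_RANGE, 0, need) < 0) {
        perror("fallocate(FALLOC_FL_INSERT_RANGE)");     /* EOPNOTSUPP on unsupported fs */
        return 1;
    }

    /* Write the small header into the newly created hole at offset 0. */
    FILE *h = fopen(argv[2], "rb");
    if (!h) { perror("open header"); return 1; }
    char buf[4096];
    size_t n;
    off_t off = 0;
    while ((n = fread(buf, 1, sizeof buf, h)) > 0) {
        if (pwrite(fd, buf, n, off) != (ssize_t)n) { perror("pwrite"); return 1; }
        off += n;
    }
    fclose(h);
    close(fd);
    return 0;
}

This runs in time proportional to the file's extent metadata rather than its data, which is as close to "fixed time" as a stock filesystem seems to offer.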
Without support for fallocate() + FALLOC_FL_INSERT_RANGE, the best you can do is (sketched after the list):
Increase the file (so that it has its final size);
mmap() the file;
memmove() the data;
Fill in the header data at the beginning.
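A hedged sketch of that fallback (unlike the fallocate() route, this does move all of the data once, and mapping a 500GB file like this is only practical on a 64-bit system; the file name and header are placeholders):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Prepend hdr_len bytes to the file at `path` by growing, mapping and shifting it. */
static int prepend(const char *path, const void *hdr, size_t hdr_len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return -1; }
    size_t old_size = st.st_size;
    size_t new_size = old_size + hdr_len;

    if (ftruncate(fd, new_size) < 0) { perror("ftruncate"); return -1; }       /* 1. grow the file */

    char *p = mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); /* 2. map it */
    if (p == MAP_FAILED) { perror("mmap"); return -1; }

    memmove(p + hdr_len, p, old_size);   /* 3. shift the existing data toward the end */
    memcpy(p, hdr, hdr_len);             /* 4. fill in the header at the beginning */

    munmap(p, new_size);
    close(fd);
    return 0;
}

int main(void)
{
    const char hdr[] = "HEADER\n";       /* stand-in for the real <20KB header */
    return prepend("bigfile", hdr, sizeof hdr - 1) ? 1 : 0;
}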

Executing an ELF binary without creating a local file?

Is it possible to execute a binary file without copying it to the hard drive?
I know /lib/ld-linux.so.2 can load arbitrary binaries, but that still requires a locally stored file; my thought was to allocate a memory region, duplicate the contents to memory and execute it.
So is that possible?
my thought was to allocate a memory region, duplicate the contents to memory and execute it.
For a statically-linked a.out, you can do just that.
For a dynamically-linked one, you'll need something like dlopen_phdr.
It is possible but very difficult. I worked on a similar problem years ago here. (Caution! That code is incomplete and contains serious bugs.)
The hard part is to make sure that your process's address space requirements do not conflict with the binary's or the dynamic linker's (e.g., /lib/ld-linux.so.2). If you control both programs' memory layout (because you linked them) you can avoid that problem. (My code does not assume such control, and it takes pains to move itself out of the way if necessary.)
If there are no address space conflicts, it is a matter of allocating memory for the exe's PT_LOAD segments (and the linker's, if any), aligning and copying the segments in, protecting the read-only segments, allocating stack, initializing stack and registers the way the kernel does, and jumping to the exe's or linker's entry address.
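For completeness, here is a hedged sketch of a much simpler route on a modern system (Linux 3.17+ for memfd_create(), glibc 2.27+ for its wrapper): instead of mapping the segments by hand as described above, hand the image to the kernel as an anonymous in-memory file and exec it. It is a different technique from manual loading, but it satisfies the "no local file" requirement. The image is read from stdin here only to keep the sketch self-contained:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int mfd = memfd_create("payload", MFD_CLOEXEC);   /* anonymous in-memory file */
    if (mfd < 0) { perror("memfd_create"); return 1; }

    /* Copy the ELF image into the memfd; in real use the bytes would already
     * be sitting in a buffer (e.g. received over the network). */
    char buf[65536];
    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
        if (write(mfd, buf, n) != n) { perror("write"); return 1; }

    char *args[] = { "payload", NULL };
    char *envp[] = { NULL };
    fexecve(mfd, args, envp);            /* only returns on failure */
    perror("fexecve");
    return 1;
}

Example use: ./memrun < /bin/date. Note that a dynamically-linked payload will still pull its interpreter and shared libraries from disk; only the main binary itself avoids ever being written out.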

Resources