Common Lisp Memory Issues - clisp

I am using GNU CLISP to compute a very big matrix represented as a hash table of hash tables. The values ultimately stored in the inner hash tables are single floats.
The program seems to run out of memory after a while, and I am thinking I need to
change the type of the ultimate values somehow so as to use less memory,
have the operating system allocate more memory,
somehow use virtual memory from the hard drive, or some combination thereof.
Any suggestions? I did a lot of searching and could not find anything.

You can use short-floats - they are immediate on all platforms that CLISP supports.
Depending on your platform you might want to use the -m option to allocate more memory, but I don't think this makes much difference on a modern platform - CLISP will allocate all it needs as it goes, up to physical RAM plus swap.
Virtual memory (swap) should be enabled at the OS level. Note that it is much slower than physical RAM, so rely on it judiciously.

Related

mmap with large alignment

I'm working on a memory management library that targets both Windows and Linux systems. On Windows, I'm currently using VirtualAlloc2 with MEM_ADDRESS_REQUIREMENTS to allocate blocks of memory that are aligned to relatively large powers of two. Is there any way to achieve the same result with mmap?
I'm aware that a possible solution is to overallocate and then trim the excessive memory using munmap, but I'd like to avoid this since it potentially triples the number of system calls per allocation.
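For reference, a minimal C sketch of the overallocate-and-trim approach mentioned above (the one the asker hopes to avoid because of the extra munmap calls); it assumes alignment is a power of two and a multiple of the page size:

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

/* Overallocate-and-trim: reserve size + alignment bytes, then unmap the
   slack before and after the aligned block. */
static void *mmap_aligned(size_t size, size_t alignment)
{
    size_t span = size + alignment;
    void *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t addr    = (uintptr_t)raw;
    uintptr_t aligned = (addr + alignment - 1) & ~((uintptr_t)alignment - 1);

    size_t head = aligned - addr;        /* slack in front of the aligned block */
    size_t tail = span - head - size;    /* slack behind it */
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((void *)(aligned + size), tail);

    return (void *)aligned;
}

int main(void)
{
    void *p = mmap_aligned(1 << 20, 1 << 21);  /* 1 MiB block, 2 MiB aligned */
    return p ? 0 : 1;
}
```

In the worst case this costs three system calls per allocation (one mmap, two munmap), which is exactly the overhead the question is trying to avoid.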

Using memory mapped file with OpenCL

I access a file on disk using memory-mapped I/O (the mmap call on Linux).
Is it possible to pass this virtual memory buffer to OpenCL using CL_MEM_USE_HOST_PTR (for reading only)? And could this result in performance gains?
I want to avoid copying an entire file into host memory, and instead let the OpenCL kernel control which parts of the file get loaded/buffered by the operating system.
I think this should work - you shouldn't end up with errors, crashes, or incorrect results; whether or not it brings performance gains probably depends on the hardware, the driver/CL implementation, and the access patterns. I would not be surprised if it didn't make much of a difference in many cases. I could imagine the GPU driver prefaulting and wiring down all the pages in order to map the buffer into the GPU's address space.
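A minimal sketch of what that could look like on the host side, assuming an already-created cl_context named context and a hypothetical file path; error handling is omitted:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <CL/cl.h>

/* Wrap an mmap'ed file in an OpenCL buffer without copying it up front. */
static cl_mem wrap_file_in_cl_buffer(cl_context context, const char *path,
                                     size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    /* Demand-paged, read-only view of the file. */
    void *host_ptr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close */

    /* The driver is still free to pin or copy the pages as it sees fit. */
    cl_int err;
    cl_mem buf = clCreateBuffer(context,
                                CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                                (size_t)st.st_size, host_ptr, &err);
    *size_out = (size_t)st.st_size;
    return buf;
}
```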

Large physically contiguous memory area

For my M.Sc. thesis, I have to reverse-engineer the hash function Intel uses inside its CPUs to spread data among Last Level Cache slices in Sandy Bridge and newer generations. To this end, I am developing an application in Linux, which needs a physically contiguous memory area in order to run my tests. The idea is to read data from this area so that it is cached, probe whether older data has been evicted (through delay measurements or LLC miss counters) in order to find colliding memory addresses, and finally discover the hash function by comparing these colliding addresses.
The same procedure has already been used in Windows by a researcher, and proved to work.
To do this, I need to allocate an area that must be large (64 MB or more) and fully cacheable, i.e., not mapped with the uncacheable attributes typically used for DMA buffers. How can I perform this allocation?
To have full control over the allocation (i.e., for it to be really physically contiguous), my idea was to write a Linux module, export a device, and mmap() it from userspace, but I do not know how to allocate that much contiguous memory inside the kernel.
I have heard about the Linux Contiguous Memory Allocator (CMA), but I don't know how it works.
Applications don't see physical memory; a process has its own address space in virtual memory. Read about the MMU: what is contiguous in virtual address space might not really be physically contiguous, and vice versa.
You might perhaps want to lock some memory using mlock(2).
But your application will be scheduled, and other processes (or scheduled tasks) would dirty your CPU cache. See also sched_setaffinity(2).
(And even kernel code might be preempted.)
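A small sketch of those two suggestions together - pinning the process to one core and locking a buffer into RAM - with the caveat that this only guarantees residency and reduces scheduling noise, not physical contiguity (the CPU number and buffer size are illustrative):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    /* Pin the process to one core (core 0 is just an illustrative choice)
       so other tasks disturb the cache under test as little as possible. */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        perror("sched_setaffinity");

    /* Lock a 64 MiB buffer into RAM so it cannot be paged out. */
    size_t len = 64UL * 1024 * 1024;
    void *buf = malloc(len);
    if (buf != NULL && mlock(buf, len) != 0)
        perror("mlock");           /* often needs a raised RLIMIT_MEMLOCK */

    /* ... cache-probing experiments over buf would go here ... */

    if (buf != NULL) {
        munlock(buf, len);
        free(buf);
    }
    return 0;
}
```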
This page on Kernel Newbies has some ideas about memory allocation. But the maximum for get_free_pages looks like 8 MiB. (Perhaps that's a compile-time constraint?)
Since this would be all-custom, you could explore the mem= boot parameter of the Linux kernel. This will limit the amount of memory the kernel uses, and you can party all over the remaining memory without anyone knowing. Heck, if you boot up a busybox system, you could probably do mem=32M, but even mem=256M should work if you're not booting a GUI.
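A hedged sketch of how the mem= trick could be used from userspace, assuming the kernel was booted with mem=1G on a machine with more RAM so that physical memory above 1 GiB is never touched by the kernel. Note that CONFIG_STRICT_DEVMEM (enabled on most distribution kernels) will block this, and the caching attributes of /dev/mem mappings vary by architecture, so both need checking before trusting any cache measurements:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    off_t  phys_base = (off_t)1 << 30;        /* start of the hidden region */
    size_t len       = 64UL * 1024 * 1024;    /* 64 MiB, physically contiguous */

    int fd = open("/dev/mem", O_RDWR);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }

    /* Map the untouched physical range directly into this process. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys_base);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* p now addresses one physically contiguous 64 MiB window. */
    munmap(p, len);
    close(fd);
    return 0;
}
```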
You will also want to look into the Offline Scheduler (and here). It "unplugs" the CPU from Linux so you can have full control over ALL code running on it. (Some parts of this are already in the mainline kernel, and maybe all of it is.)

What can make a program unable to take advantage of a 64-bit system?

I am looking into the Google V8 JavaScript Engine. It is said that they are having problems porting it to 64-bit systems.
What kinds of programming constructs or constraints can make a program 32-bit or 64-bit specific, apart from building and testing it on a 64-bit machine with 64-bit settings?
You may check this wiki, which says:
The main disadvantage of 64-bit architectures is that, relative to 32-bit architectures, the same data occupies more space in memory (due to longer pointers and possibly other types, and alignment padding). This increases the memory requirements of a given process and can have implications for efficient processor cache utilization. Maintaining a partial 32-bit model is one way to handle this, and is in general reasonably effective. For example, the z/OS operating system takes this approach, requiring program code to reside in 31-bit address spaces (the high order bit is not used in address calculation on the underlying hardware platform) while data objects can optionally reside in 64-bit regions.
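A small C illustration of the size growth the quote describes: pointers double from 4 to 8 bytes under LP64, and the alignment padding they force grows with them, so pointer-heavy data structures get larger even though the logic is unchanged:

```c
#include <stdio.h>

/* A pointer-heavy node: the pointer widens on LP64, and padding after
   `key` widens with it. */
struct node {
    int          key;    /* 4 bytes on both models            */
    struct node *next;   /* 4 bytes on ILP32, 8 bytes on LP64 */
};

int main(void)
{
    /* Typically prints 4/8 on a 32-bit build and 8/16 on a 64-bit build. */
    printf("sizeof(void *)      = %zu\n", sizeof(void *));
    printf("sizeof(struct node) = %zu\n", sizeof(struct node));
    return 0;
}
```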

Any way to reserve but not commit memory in Linux?

Windows has VirtualAlloc, which allows you to reserve a contiguous region of address space, but not actually use any physical memory. Later when you want to use it (or part of it) you call VirtualAlloc again to commit the region of previously reserved pages.
This is actually really useful, but I want to eventually port my application to Linux - so I don't want to use it if I can't port it later. Does Linux have a way to do this?
EDIT - Use Case
I'm thinking of allocating 4 GB or so of virtual address space, but only committing it 64K at a time. This would give me a zero-copy way to grow an array up to 4 GB, which is important because the typical double-the-size-and-copy approach introduces seemingly random, unacceptable latency for very large arrays.
mmap a special file like /dev/zero (or use MAP_ANONYMOUS) with PROT_NONE, then later use mprotect to commit.
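A minimal sketch of that reserve/commit pattern, matching the 4 GB / 64K use case from the question (sizes are illustrative and error handling is minimal):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define RESERVE_SIZE (4ULL * 1024 * 1024 * 1024)   /* 4 GiB of address space */
#define COMMIT_CHUNK (64 * 1024)                   /* commit 64 KiB at a time */

int main(void)
{
    /* Reserve: address space only, nothing accessible, nothing charged. */
    char *base = mmap(NULL, RESERVE_SIZE, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Commit: make the first chunk usable; as the array grows, repeat
       with base + already_committed. */
    if (mprotect(base, COMMIT_CHUNK, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect");
        return 1;
    }

    base[0] = 42;   /* a physical page is only assigned on first touch */

    munmap(base, RESERVE_SIZE);
    return 0;
}
```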
You can turn this functionality on system-wide by using kernel overcommit. This is usually the default setting on many distributions.
Here is the explanation http://www.mjmwired.net/kernel/Documentation/vm/overcommit-accounting
The Linux equivalent of VirtualAlloc() is mmap(), which provides the same behaviour. However, as a commenter points out, reserving contiguous memory without committing physical pages is already what malloc() gives you, as long as the memory is not initialized (such as by calloc() or user code).
"seemingly random unacceptable latency
for very large arrays
You could also consider mlock() or mmap() + MAP_LOCKED to mitigate the impact of paging. Many CPUs support huge (aka large) pages, i.e., pages larger than the usual 4 KiB. These larger pages can reduce the TLB overhead of streaming reads/writes.
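A short sketch of the huge-page variant, assuming huge pages have been reserved beforehand (for example via /proc/sys/vm/nr_hugepages); MAP_HUGETLB and MAP_LOCKED are Linux-specific flags:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 2UL * 1024 * 1024;   /* one 2 MiB huge page (x86-64 default) */

    /* MAP_HUGETLB needs pre-reserved huge pages; MAP_LOCKED keeps the
       pages resident so they are never paged out. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_LOCKED,
                   -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");   /* typically ENOMEM if no huge pages are reserved */
        return 1;
    }

    munmap(p, len);
    return 0;
}
```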
