R stats - memory issues when allocating a big matrix / Linux

I have read several threads about memory issues in R and I can't seem to find a solution to my problem.
I am running a sort of LASSO regression on several subsets of a big dataset. For some subsets it works well, and for some bigger subsets it does not work, with errors of type "cannot allocate vector of size 1.6Gb". The error occurs at this line of the code:
example <- cv.glmnet(x=bigmatrix, y=price, nfolds=3)
It also depends on the number of variables that were included in "bigmatrix".
I tried R and R64 on a Mac and R on a PC, and recently moved to a faster virtual machine on Linux, thinking I would avoid any memory issues. It was better but still had some limits, even though memory.limit indicates "Inf".
Is there any way to make this work, or do I have to cut a few variables from the matrix or take a smaller subset of the data?
I have read that R looks for contiguous blocks of memory and that maybe I should pre-allocate the matrix. Any ideas?

Let me build slightly on what @richardh said. All of the data you load with R chews up RAM. So you load your main data and it uses some hunk of RAM. Then you subset the data, so the subset uses a smaller hunk. Then the regression algo needs a hunk that is greater than your subset because it does some manipulations and gyrations. Sometimes I am able to make better use of RAM by doing the following:
1. save the initial dataset to disk using save()
2. take a subset of the data
3. rm() the initial dataset so it is no longer in memory
4. do analysis on the subset
5. save results from the analysis
6. totally dump all items in memory: rm(list=ls())
7. load the initial dataset from step 1 back into RAM using load()
8. loop steps 2-7 as needed
Be careful with step 6 and try not to shoot your eye out. That dumps EVERYTHING in R memory. If it's not been saved, it'll be gone. A more subtle approach would be to delete the big objects that you are sure you don't need and not do the rm(list=ls()).
If you still need more RAM, you might want to run your analysis in Amazon's cloud. Their High-Memory Quadruple Extra Large Instance has over 68GB of RAM. Sometimes when I run into memory constraints I find the easiest thing to do is just go to the cloud where I can be as sloppy with RAM as I want to be.
Jeremy Anglim has a good blog post that includes a few tips on memory management in R. In that blog post Jeremy links to this previous StackOverflow question which I found helpful.

I don't think this has to do with contiguous memory, but just that R by default works only in RAM (i.e., it does not cache data on disk). Farnsworth's guide to econometrics in R mentions the package filehash to enable writing to disk, but I don't have any experience with it.
Your best bet may be to work with smaller subsets, manage memory manually by removing variables you don't need with rm (i.e., run regression, store results, remove old matrix, load new matrix, repeat), and/or get more RAM. HTH.
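A back-of-envelope size check can show whether a given subset has any chance of fitting before the regression is attempted. The sketch below is plain arithmetic (written in Python only for illustration) and assumes a dense matrix of 8-byte doubles, which is how R stores a numeric matrix. The function names and the example dimensions are made up, and the fitting routine needs working memory on top of this, so treat the result as a lower bound.

```python
def dense_matrix_bytes(rows, cols, bytes_per_cell=8):
    """Bytes needed to hold one dense matrix of doubles (8 bytes per cell)."""
    return rows * cols * bytes_per_cell


def cells_for(gib, bytes_per_cell=8):
    """Number of double-precision cells that fit in `gib` GiB (1 GiB = 1024**3 bytes)."""
    return int(gib * 1024**3 // bytes_per_cell)


if __name__ == "__main__":
    # Hypothetical dimensions, not taken from the question.
    rows, cols = 500_000, 430
    size = dense_matrix_bytes(rows, cols)
    print(f"{rows} x {cols} doubles: {size / 1024**3:.2f} GiB")
    # A single 1.6 GiB allocation corresponds to roughly this many cells.
    print(f"cells in 1.6 GiB: {cells_for(1.6):,}")
```

Halving the number of columns halves the estimate, which is consistent with the failure depending on how many variables go into the matrix.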

Try the bigmemory package. It is very easy to use. The idea is that the data are stored in a file on disk and you create an object in R that is a reference to this file. I have tested it and it works very well.
There are some alternatives as well, like "ff". See CRAN Task View: High-Performance and Parallel Computing with R for more information.


Why does every search for how to use the SDRAM of my DE1-SOC point me to NIOS-II?

I'm doing a simple project: taking 100 numbers from an external memory (one by one), doing some simple arithmetic on each number (like adding 1) and writing the result to another memory.
I successfully did that project "representing" a memory in Verilog code, but now I want to synthesize my design using the SDRAM of the board. How I load data into the SDRAM, or what I do with the resulting data written back to the SDRAM, is irrelevant for my homework.
But I just can't understand what to do; all the information on the internet points me to NIOS-II. Considering that I have to load data into the SDRAM before I can use it, and maybe for other reasons, is NIOS-II the recommended way to do this? Can it be done without it, and would that be more practical?
This might not be the place to have your homework done. Additionally, your question is very unclear. Let's try anyway:
> I successfully did that project "representing" a memory in verilog code
I assume that you mean that you downloaded a model corresponding to the memory you have on your board.
> taking 100 numbers from an external memory
I wonder how you do that. Did you load some initialization file or did you write the numbers first? In the first case, the initialization will not be synthesized and you might read random data; refer to the datasheet of your memory for this. If you expect specific values, you will need to write them to memory during some initialization procedure.
Of course you will need the correct constraints for your device. So I'd suggest that you take the NIOSII example, get it up and running, and get rid of the NIOSII in a next step. That way you will at least be sure that the interfacing between controller and SDRAM is correct. Then read the datasheet of the controller. You probably have a read strobe, a write strobe, data in and data out ports, some configuration, and perhaps a burst length. If you need help with that, you'll need to come up with a more specific question.

External multithreading sort

I need to implement an external multithreaded sort. I don't have experience in multithreaded programming, and I'm not sure whether my algorithm is good enough or how to complete it. My idea is:
A thread reads the next block of data from the input file
It sorts the block using a standard algorithm (std::sort)
It writes the sorted block to another file
After this I have to merge these files. How should I do this?
If I wait until the input file has been entirely processed before merging, I end up with a lot of temporary files.
If I try to merge files straight after sorting, I cannot come up with an algorithm that avoids merging files of very different sizes, which leads to O(N^2) complexity.
I also suppose this is a very common task, but I cannot find a good ready-made algorithm on the internet. I would be very grateful for a link, especially to a C++ implementation.
Well, the answer isn't that simple, and it actually depends on many factors, amongst them the number of items you wish to process, and the relative speed of your storage system and CPUs.
But the question is why you would use multithreading here at all. Is the data too big to be held in memory? Are there so many items that even a qsort algorithm can't sort them fast enough? Do you want to take advantage of multiple processors or cores? I don't know.
I would suggest that you first write some test routines to measure the time needed to read the input file and write the output files, as well as the CPU time needed for sorting. Please note that I/O is generally A LOT slower than CPU execution (actually they aren't even comparable), and I/O may not get any faster if you read data in parallel: a hard disk has one head that has to move in and out, so reads are in effect serialized, and even a drive with no moving parts is still a device with its own input and output channels. That is, the additional overhead of reading and writing temporary files may more than cancel any benefit from multithreading. So I would say, first try an algorithm that reads the whole file into memory, sorts it and writes it out, and put in some timers to check the relative speed of each phase. If I/O is as little as 30% of the total time (yes, that little!), multithreading is definitely not worth it, because all the reading, merging and writing of temporary files will push that share a lot higher, so a solution that processes the whole data set at once would be preferable.
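A minimal sketch of such a timing routine follows. It is written in Python for brevity rather than in the C++ the question uses, and it assumes the input is a text file with one integer per line that fits in memory. The function name time_phases and the file layout are assumptions made for illustration only.

```python
import time


def time_phases(input_path, output_path):
    """Read, sort and write a file of integers, timing each phase separately."""
    timings = {}

    start = time.perf_counter()
    with open(input_path) as src:
        lines = src.readlines()  # one large read instead of many small ones
    timings["read"] = time.perf_counter() - start

    start = time.perf_counter()
    lines.sort(key=int)  # in-memory sort by numeric value
    timings["sort"] = time.perf_counter() - start

    start = time.perf_counter()
    with open(output_path, "w") as out:
        out.writelines(lines)
    timings["write"] = time.perf_counter() - start

    return timings
```

Keep in mind that the operating system's file cache can make a repeated read look much faster than a cold one, so the first run on a freshly written or freshly mounted file is usually the more honest measurement.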
In conclusion, I don't see why you would use multithreading here. The only reason, imo, would be if the data are actually delivered in blocks, but then again take into account my considerations above about relative I/O and CPU speeds and the additional overhead of reading and writing the temporary files. And a hint: your file access must be very efficient, e.g. reading and writing in larger blocks using application buffers rather than one item at a time (this saves on system calls). Otherwise performance may suffer badly, especially if the file(s) are stored on a machine other than yours (e.g. a server).
Hope you find my suggestions useful.
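On the merging question itself, one common way to avoid repeatedly merging files of very different sizes is to write all sorted runs first and then merge them in a single k-way pass with a min-heap, which costs about O(N log k) comparisons for k runs. The sketch below is in Python rather than C++, is single-threaded, and assumes one integer per line with a trailing newline on every line. The function names and the run_size default are illustrative, not a reference implementation.

```python
import heapq
import itertools
import os
import tempfile


def write_sorted_runs(input_path, run_size, tmp_dir):
    """Split the input into sorted runs of at most run_size lines each."""
    run_paths = []
    with open(input_path) as src:
        while True:
            # Read one block that is assumed to fit in memory.
            block = list(itertools.islice(src, run_size))
            if not block:
                break
            block.sort(key=int)
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(block)
            run_paths.append(path)
    return run_paths


def merge_runs(run_paths, output_path):
    """Merge all sorted runs in one k-way pass; heapq.merge reads them lazily."""
    files = [open(path) for path in run_paths]
    try:
        with open(output_path, "w") as out:
            out.writelines(heapq.merge(*files, key=int))
    finally:
        for f in files:
            f.close()


def external_sort(input_path, output_path, run_size=100_000):
    """Sort a file of integers that may not fit in memory."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        runs = write_sorted_runs(input_path, run_size, tmp_dir)
        merge_runs(runs, output_path)
```

If threads are added, the run-creation phase is the natural place for them, since each block can be sorted independently, while the final merge stays a single sequential pass. Very large run counts may exceed the limit on open files, in which case the runs can be merged in groups.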

How much RAM do different Python objects use?

I want to optimize my program as it consumes a lot of RAM and stores a lot of data. To decide which datatype to use, I need to know how much RAM a single Python object uses. How can I determine those sizes?
Just use sys.getsizeof().
If you want the size of a container plus all of its contents, the documentation even links to a recipe for that.
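As a rough illustration of the difference, sys.getsizeof() reports only the object itself, not the objects it refers to. The helper below, total_size, is a simplified sketch written for this answer and is not the recipe from the documentation: it follows only dicts, lists, tuples and sets, and counts each shared object once. Exact byte counts depend on the Python version and platform, so none are shown.

```python
import sys


def total_size(obj, seen=None):
    """Approximate size in bytes of obj plus the objects it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted (shared or cyclic reference)
    seen.add(id(obj))

    size = sys.getsizeof(obj)  # shallow size of the object itself
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size


if __name__ == "__main__":
    data = {"name": "example", "values": [1.5, 2.5, 3.5]}
    print("shallow:", sys.getsizeof(data))
    print("deep:   ", total_size(data))
```

Instances of user-defined classes are not followed here; their attributes live in __dict__ or __slots__ and would need extra handling.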

Reducing memory usage in an extended Mathematica session

I'm doing some rather long computations, which can easily span a few days. In the course of these computations, sometimes Mathematica will run out of memory. To this end, I've ended up resorting to something along the lines of:
ParallelEvaluate[$KernelID]; (* Force the kernels to launch *)
kernels = Kernels[];
If[Mod[iteration, n] == 0,
(* Complicated stuff here *)
Export[...], (* If a computation ends early I don't want to lose past results *)
{iteration, min, max}]
This is great and all, but over time the main kernel accumulates memory. Currently, my main kernel is eating up roughly 1.4 GB of RAM. Is there any way I can force Mathematica to clear out the memory it's using? I've tried littering Share and Clear throughout the many Modules I'm using in my code, but the memory still seems to build up over time.
I've tried also to make sure I have nothing big and complicated running outside of a Module, so that something doesn't stay in scope too long. But even with this I still have my memory issues.
Is there anything I can do about this? I'm always going to have a large amount of memory being used, since most of my calculations involve several large and dense matrices (usually 1200 x 1200, but it can be more), so I'm wary about using MemoryConstrained.
The problem was exactly what Alexey Popkov stated in his answer. If you use Module, memory will leak slowly over time. It happened to be exacerbated in this case because I had multiple Module[..] statements. The "main" Module was within a ParallelTable where 8 kernels were running at once. Tack on the (relatively) large number of iterations, and this was a breeding ground for lots of memory leaks due to the bug with Module.
Since you are using Module extensively, I think you may be interested in knowing this bug with non-deleting temporary Module variables.
Example (non-deleting unlinked temporary variables with their definitions):
In[1]:= $HistoryLength=0;
Out[3]= 6
In[4]:= lst=Table[a[1],{1000}];
Out[5]= 1007
In[6]:= lst=.
Out[7]= 1007
In[8]:= Definition@d$999
Out[8]= Attributes[d$999]={Temporary}
Note that in the above code I set $HistoryLength = 0; to stress this buggy behavior of Module. If you do not do this, temporary variables can still be referenced from the history variables (In and Out), and for that reason they and their definitions will not be removed in a broader set of cases (this is not a bug but a feature, as Leonid mentioned).
UPDATE: Just for the record. There is another old bug with non-deleting unreferenced Module variables after Part assignments to them in v.5.2 which is not completely fixed even in version 7.0.1:
In[1]:= $HistoryLength=0;$Version
Out[1]= 7.0 for Microsoft Windows (32-bit) (February 18, 2009)
Out[3]= {L$111}
Out[4]= {40000084}
Have you tried evaluating $HistoryLength=0; in all subkernels as well as in the master kernel? History tracking is the most common cause of running out of memory.
Have you tried replacing the slow and memory-consuming Export with the fast and efficient Put?
It is not clear from your post where you evaluate ClearSystemCache[] - in the master kernel or in subkernels? It looks like you evaluate it in the master kernel only. Try to evaluate it in all subkernels too before each iteration.

How might one go about implementing a disk fragmenter?

I have a few ideas I would like to try out in the Disk Defragmentation Arena. I came to the conclusion that, as a precursor to the implementation, it would be useful to be able to put a disk into a fragmented state. This seems to me to be a state that is more difficult to achieve than a defragmented one. I would assume that the commercial defragmenter companies have probably solved this issue.
So my question.....
How might one go about implementing a fragmenter? What approach makes sense, given that it would be used to test a defragmenter?
Maybe instead of fragmenting the actual disk, you should really test your defragmentation algorithm on a simulation/mock disk? Only once you're satisfied the algorithm itself works as specified, you could do the testing on actual disks using the actual disk API.
You could even take snapshots of actual fragmented disks (yours or of someone you know) and use this data as a mock model for testing.
How you can best fragment depends on the file system.
In general, concurrently open a large number of files. Opening a file will create a new directory entry but won't cause a block to be written for that file. But now go through each file in turn, writing one block. This typically will cause the next free block to be consumed, which will lead to all your files being fragmented with regard to each other.
Fragmenting existing files is another matter. Basically, do the same, but do it on a file copy of existing files, doing a delete of the original and rename of copy.
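The interleaved-write idea can be tried on a mock disk before touching a real one. The sketch below models the disk as a list of block owners and assumes a naive allocator that always hands out the next free block. Real file systems use smarter allocation strategies (and may delay allocation), so this only illustrates the principle, and the function names are made up.

```python
def fragment(num_files, blocks_per_file):
    """Build a mock disk: index = block number, value = id of the owning file.

    Files are written round-robin, one block at a time, so each file's
    blocks end up separated by blocks belonging to the other files.
    """
    disk = []
    for _ in range(blocks_per_file):
        for file_id in range(num_files):
            disk.append(file_id)  # naive allocator: take the next free block
    return disk


def count_fragments(disk, file_id):
    """Count the contiguous extents on the mock disk that belong to file_id."""
    fragments = 0
    previous = None
    for owner in disk:
        if owner == file_id and previous != file_id:
            fragments += 1
        previous = owner
    return fragments


if __name__ == "__main__":
    disk = fragment(num_files=3, blocks_per_file=4)
    print(disk)
    print("extents of file 0:", count_fragments(disk, 0))
```

With more than one file, every file ends up with as many extents as it has blocks, which is the worst case for this simple model. A defragmentation algorithm can then be tested against the list before being pointed at a real disk API.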
I may be oversimplifying here, but if you artificially fragment the disk, won't any tests you run only be valid for the fragmentation created by your fragmenter rather than for real-world fragmentation? You may end up optimising for assumptions in the fragmenter tool that don't represent real-world occurrences.
Wouldn't it be easier and more accurate to take some disk images of fragmented disks? Do you have any friends or colleagues who trust you not to do anything anti-social with their data?
Fragmentation can be framed as a mathematical problem in which you are trying to maximize the distance the head of the hard drive travels while performing a specific operation. So in order to fragment something effectively, you need to define that specific operation first.
