How to make R use all processors? - linux

I have a quad-core laptop running Windows XP, but looking at Task Manager R only ever seems to use one processor at a time. How can I make R use all four processors and speed up my R programs?

I have a basic system I use where I parallelize my programs on the "for" loops. This method is simple once you understand what needs to be done. It only works for local computing, but that seems to be what you're after.
You'll need these libraries installed:
library("parallel")
library("foreach")
library("doParallel")
First you need to create your computing cluster. I usually do other stuff while running parallel programs, so I like to leave one core free. The "detectCores" function will return the number of cores in your computer.
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl, cores = detectCores() - 1)
Next, call your for loop with the "foreach" command, along with the %dopar% operator. I always use a "try" wrapper to make sure that any iterations where the operations fail are discarded, and don't disrupt the otherwise good data. You will need to specify the ".combine" parameter, and pass any necessary packages into the loop. Note that "i" is defined with an equals sign, not an "in" operator!
data <- foreach(i = 1:length(filenames),
                .packages = c("ncdf", "chron", "stats"),
                .combine = rbind) %dopar% {
  try({
    # your operations; line 1...
    # your operations; line 2...
    # your output
  })
}
Once you're done, clean up with:
stopCluster(cl)

The CRAN Task View on High-Performance Computing with R lists several options. XP is a restriction, but you can still get something like snow working over sockets within minutes.
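For example, here is a minimal socket-cluster sketch (an illustration on my part, assuming the snow package is installed; the cluster runs entirely on the local machine):
library(snow)
cl <- makeCluster(4, type = "SOCK")               # 4 worker R processes, connected via sockets
squares <- parLapply(cl, 1:100, function(x) x^2)  # parallel lapply over the workers
stopCluster(cl)                                   # always shut the workers down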

As of version 2.15, R comes with native support for multi-core computations. Just load the parallel package
library("parallel")
and check out the associated vignette
vignette("parallel")
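A minimal sketch of what that looks like in practice (not taken from the vignette; assumes a local machine with at least two cores):
library(parallel)
cl <- makeCluster(detectCores() - 1)   # a PSOCK cluster, so this also works on Windows
res <- parSapply(cl, 1:1000, sqrt)     # parallel version of sapply()
stopCluster(cl)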

I hear tell that REvolution R supports better multi-threading than the typical CRAN version of R, and REvolution also supports 64-bit R on Windows. I have been considering buying a copy, but I found their pricing opaque. There's no price list on their web site. Very odd.

I believe the multicore package works on XP. It gives some basic multi-process capability, especially through offering a drop-in replacement for lapply() and a simple way to evaluate an expression in a forked process (mcparallel()).
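A rough sketch of that evaluate-in-the-background idiom, shown with the parallel package's equivalents of multicore's functions (my substitution, not the original answer's code; note that these rely on fork(), so they only truly run in parallel on Unix-alikes):
library(parallel)
job <- mcparallel(sum(rnorm(1e7)))   # start the computation in a forked child process
# ... do other work here ...
result <- mccollect(job)             # wait for the child and fetch its value (returned as a list)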

On Windows I believe the best way to do this would probably be with foreach and snow as David Smith said.
However, Unix/Linux based systems can compute using multiple processes with the 'multicore' package. It provides a high-level function, 'mclapply', that applies a function over a list in parallel across multiple cores. An advantage of the 'multicore' package is that each processor gets a private copy of the Global Environment that it may modify. Initially, this copy is just a pointer to the Global Environment, making the sharing of variables extremely quick as long as the Global Environment is treated as read-only.
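A minimal sketch (shown here with parallel::mclapply, which keeps the same interface as multicore's original function; Unix/Linux only, since it forks the running R process):
library(parallel)
x <- 1:1000
res <- mclapply(x, function(i) i^2, mc.cores = 4)  # each forked worker sees a copy-on-write
                                                   # view of the Global Environment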
Rmpi requires that the data be explicitly transferred between R processes instead of working with the 'multicore' closure approach.
-- Dan

If you do a lot of matrix operations and you are using Windows, you can install revolutionanalytics.com/revolution-r-open for free; it comes with the Intel MKL libraries, which allow you to do multithreaded matrix operations. On Windows, if you take the libiomp5md.dll, Rblas.dll and Rlapack.dll files from that install and overwrite the ones in whatever R version you like to use, you'll have multithreaded matrix operations (typically a 10-20x speedup for matrix operations). Or you can use the ATLAS Rblas.dll from prs.ism.ac.jp/~nakama/SurviveGotoBLAS2/binary/windows/x64, which also works on 64-bit R and is almost as fast as the MKL one. I found this the single easiest thing to do to drastically increase R's performance on Windows systems. Not sure why these don't come as standard on R Windows installs, in fact.
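If you want to verify that the swap actually took effect, here is a quick, purely illustrative benchmark sketch (timings vary by machine and BLAS):
n <- 4000
a <- matrix(rnorm(n * n), n, n)
b <- matrix(rnorm(n * n), n, n)
system.time(a %*% b)   # rerun after swapping the BLAS DLLs and compare the elapsed time
sessionInfo()          # recent R versions also report which BLAS/LAPACK libraries are loaded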
On Windows, multithreading unfortunately is not well supported in R (unless you use OpenMP via Rcpp), and the available socket-based parallelization on Windows systems, e.g. via the parallel package, is very inefficient. On POSIX systems things are better, as you can use forking there (the multicore package is, I believe, the most efficient one). You could also try the Rdsm package for multithreading within a shared-memory model. I've got a version on my GitHub with the unix-only flag removed, which should also work on Windows (earlier, Windows wasn't supported because the dependency bigmemory supposedly didn't work on Windows, but now it seems it does):
library(devtools)
devtools::install_github('tomwenseleers/Rdsm')
library(Rdsm)

Related

Quick questions about Linux kernel modules

I'm very familiar with Linux (I've been using it for 2 years, no Windows for 1 1/2 years), and I'm finally digging deeper into kernel programming and working on a project. So my questions are:
Will a kernel module run faster than a traditional C program?
How can I communicate with a module (is that even possible), for example call a function in it?
1. Will a kernel module run faster than a traditional C program?
It Depends™
Running as a kernel module means you get to play by different rules; you potentially get to avoid some context switches, depending on what you are doing. You get access to some powerful tools that can be leveraged to optimize your code, but don't expect your code to magically run faster just by throwing everything into kernel space.
2. How can I communicate with a module (is that even possible), for example call a function in it?
There are various ways:
You can use the various file system interfaces: procfs, sysfs, debugfs, sysctl, ...
You could register a char device
You can make use of the Netlink interface
You could create new syscalls, although that's heavily discouraged
And you can always come up with your own scheme, or use some less common APIs
Will a kernel module run faster than a traditional C program?
The kernel is already a C program, which was most likely compiled with the same compiler you use. So generic algorithms or processor-intensive computations will run at almost the same speed.
But most userspace programs (like bash) have to ask the kernel to perform operations on system resources, e.g. printing the prompt to the monitor. That requires entering the kernel with a system call, sending data over tty interfaces, and passing it to a video driver, which may introduce some latency. If you implemented bash in-kernel, you could call the video driver directly, which is definitely faster.
That approach, however, has drawbacks. First of all, bash should be able to print its prompt on an ssh session or a serial console, and that complicates the logic. Also, if your bash hangs, you cannot just kill it; you have to reboot the system.
How can I communicate with a module (is that even possible), for example call a function in it?
In addition to the excellent list provided by tux3, I would suggest starting with char devices.

What is the recommended environment for running multiple casperjs instances?

I am new to CasperJS and planning to use it to accurately simulate anywhere from a few dozen to low hundreds of concurrent sessions accessing a private server on a private network. Unlike typical HTTP load generators (Apache bench, httperf, ...), my purpose is to be able to control each session programmatically (increase the delay between requests, build 'smarts' into each script) and to give each session a distinct source IP address.
My current thinking is to use OpenVZ containers (openvz.org) to create each 'virtual' client running casperjs (minimal functionality I need is following elements on the UI and taking screenshots). Would love to hear of anyone who has done something similar.
The crux of my question is: what would the 'slimmest' environment for running casperjs be? I'd like to strip down the OS as much as possible to be able to scale multiple clients. Specifically:
any recommended low-footprint UNIX/Linux distributions for CasperJS?
any specific recommendations on stripping down mainstream (CentOS, Debian, ...) distributions?
Thank you all in advance. I look forward to hearing your input on this specific question or similar experiences/tools for what I'm trying to achieve...
Fernando
CasperJS is headless, i.e. it doesn't need X running to function. Any bare-bones Linux distribution will do you well.
any recommended low-footprint UNIX/Linux distributions for CasperJS?
Arch is very lightweight and has an easy-to-follow Beginners' Guide. Arch's AUR has a package for CasperJS that's pretty straightforward to set up as well. Just make sure to grab the required base-devel package (pacman -S base-devel) before installing from the AUR, as it's needed for the Arch Build System.
any specific recommendations on stripping down mainstream (CentOS, Debian, ...) distributions?
Not so much stripping down, but CrunchBang is based on the latest Debian release. It may be worth taking a look at. It would be much less of a hassle to set up than Arch, and it uses the same APT package manager as Debian / Ubuntu. It installs with the lightweight Openbox window manager, but you can remove this and X altogether if you'd like.
With that said, even a lightweight Linux environment won't help much with the amount of memory each CasperJS instance will use. You could probably pull off a few dozen depending on the amount of memory available, but a few hundred may not be feasible. It all depends on how much memory each website uses. CasperJS comes with some configuration options that may help reduce memory (e.g. don't load images, plugins, etc.), but that may defeat the purpose of your tests.
The best advice I can give is to try it out for yourself. Write a simple script that will open the pages you are going to use and pass a callback to CasperJS's run() function to keep it alive (e.g. don't exit from Casper). It can be as simple as:
casper.start('http://example.com/site1', function () {});
casper.thenOpen('http://example.com/site2', function () {});
casper.run(function () {
    // wait 60 seconds before exit . . . or remove to never exit
    setTimeout(function () { casper.exit(); }, 60000);
});
Spin up multiple instances, and watch your total memory usage. You can use CLI tools like top, or use this alias, which totals the memory usage for the current user.
alias memu="ps -u $(whoami) -o pid,rss,command | awk '{print \$0}{sum+=\$2} END {print \"Total\", sum/1024, \"MB\"}'"
From this you should be able to see roughly how much memory each instance takes, and how many you can run at once on one machine.

Matlab 2011a Use all Cores Available on 64 bit Linux?

Hi, I've looked online but I can't seem to find an answer to whether I need to do anything to make Matlab use all cores. From what I understand, multi-threading has been supported since 2007. On my machine Matlab only uses one core at 100% and the rest hang at ~2%. I'm using 64-bit Linux (Mint 12). On my other computer, which has only 2 cores and is 32-bit, Matlab seems to be utilizing both cores at 100%. Not all of the time, but in a sufficient number of cases. On the 64-bit, 4-core PC this never happens.
Do I have to do anything in 64-bit to get Matlab to use all the cores whenever possible? I had to do some custom linking after install as Matlab wasn't finding the libraries (e.g. libc.so.6) because it wasn't looking in the correct places.
As standard, since the latest release, you can use up to 12 cores with the Parallel Computing Toolbox. Without this toolbox, I guess you're out of luck. Any additional cores can be accessed via the MATLAB Distributed Computing Server, where you actually pay per worker.
To make Matlab use your multiple cores you have to run
matlabpool open
And of course it works better if you actually have parallelized code (like using the spmd function or parfor loops)
More info at the Matlab homepage
MATLAB has only a single thread for computation.
That said, multiple threads are created for certain functions that use the multithreaded features of the BLAS libraries it uses underneath.
Thus, you only gain a 'multi-threaded' advantage if you are calling functions that use these multithreaded BLAS libraries.
This link has information on the list of functions which are multithreaded.
Now, as for the use of your cores, that depends on your OS. I believe the OS has to load-balance your threads across all cores. You CANNOT set affinities for threads from within MATLAB. You can, however, set worker MATLAB processes to have affinities to cores from within the Parallel Computing Toolbox.
However, you could always try setting the affinity of the MATLAB process to all your processors manually (on Linux, for example, with taskset).
Windows users can simply right click on the process in the task manager and set affinity.
My understanding is that this is only a request to the OS and is not a hard binding rule that the OS must adhere to.

Any C++ libraries to run a single program on multiple PC's (i.e. "use grid computing to run my app")

I'm after a method of converting a single program to run on multiple computers on a network (think "grid computing").
I'm using MSVC 2007 and C++ (non-.NET).
The program I've written is ideally suited for parallel programming (its doing analysis of scientific data), so the more computers the better.
The classic answer for this would be MPI (Message Passing Interface). It requires a bit of work to get your program to work well with message passing, but the end result is that you can easily launch your executable across a cluster of machines that are running an MPI daemon.
There are several implementations. I've worked with MPICH, but I might consider doing this with Boost MPI (which didn't exist last time I was in the neighborhood).
Firstly, this topic is covered here:
https://stackoverflow.com/questions/2258332/distributed-computing-in-c
Secondly, a search for "C++ grid computing library", "grid computing for visual studio" and "C++ distributed computing library" returned the following:
OpenMP + OpenMPI. OpenMP handles running a single C++ program on multiple CPU cores within the same machine; OpenMPI handles the messaging between multiple machines. OpenMP + OpenMPI = grid computing.
POP-C++, see http://gridgroup.hefr.ch/popc/.
Xoreax Grid Engine, see http://www.xoreax.com/high_performance_grid_computing.htm. Xoreax focuses on speeding up Visual Studio builds, but the Xoreax Grid Engine can also be applied to generic applications. Looking at http://www.xoreax.com/xge_xoreax_grid_engine.htm, we see the quote "Once a task-set (a set of tasks for distribution along with their dependency definitions) is defined through one of the interfaces described below, it can be executed on any machine running an IncrediBuild Agent." See also the accompanying article at http://www.codeproject.com/KB/showcase/Xoreax-Grid.aspx
Alchemi, see http://www.codeproject.com/KB/threads/alchemi.aspx.
RightScale, see http://www.rightscale.com/pdf/Grid-Whitepaper-Technical.pdf. A quote from the examples section of this paper: "Pharmaceutical protein analysis: Several million protein compound comparisons were performed in less than a day – a task that would have taken over a week on the customer’s internal resources ..."

How can I find file system concurrency issues?

I have an application running on Linux, and I find myself wanting Windows (!).
The problem is that every 1000 runs or so I hit concurrency problems that are consistent with concurrent reading/writing of files. I am fairly sure that this behavior would be prohibited by file locking under Windows, but I don't have a sufficiently fast Windows box to check.
There is simply too much file access (too much data) to expect strace to work reliably - the sheer volume of output is likely to change the problem, too. It also happens on different files every time. Ideally I would like to change/reconfigure the Linux file system to be more restrictive (as in fail-fast) with respect to concurrent access.
Are there any tools/settings I can use to achieve this ?
Hmmm. Concurrent access to files is perfectly legitimate on POSIX-like systems, so there is no kind of "failure" mode associated with it. Is there a reason you can't use file locking on Linux? It's difficult to tell from your description what the actual problem is (1000 times of what?), but it sounds like the traditional flock() or lockf() system calls might be what you're looking for.
For some reason I thought you were using C++. The following applies if you are.
If you are using multi-threading and fstream IO and custom streambufs or you disabled sync_with_stdio, then yes, the C++ iostreams will act differently from iostreams on Windows.
I ran into this with one of my own projects.
Windows defines a mutex in its iostream sentry. Linux does not. Linux does seem to have locking in its C stdio functions, so usually that works out anyway.
However, I defined a custom debug streambuf that didn't go through stdio and got all sorts of corruption in Linux.
I got around it by using a mutex that is preprocessed out if the OS is Windows.
