Stanford Parser multithread usage - multithreading

Stanford Parser is now 'thread-safe' as of version 2.0 (02.03.2012). I am currently running the command line tools and cannot figure out how to make use of my multiple cores by threading the program.
In the past, this question has been answered with "Stanford Parser is not thread-safe", as the FAQ still says. I am hoping to find someone who has had success threading the latest version.
I have tried using the -t flag (-t10 and -tLLP), since that was all I could find in my searches, but both throw errors.
An example of a command I issue is:
java -cp stanford-parser.jar edu.stanford.nlp.parser.lexparser.LexicalizedParser \
-outputFormat "oneline" ./grammar/englishPCFG.ser.gz ./corpus > corpus.lex

Starting with version 2.0.5, you can now easily use multiple threads with the option -nthreads k. For example, your command can be like this:
java -mx6g edu.stanford.nlp.parser.lexparser.LexicalizedParser -nthreads 4 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz file.txt > file.stp
(Version 2 releases prior to 2013 had no way to enable multithreading from the command line; it was only available through the API.)
Internally, you can simultaneously run as many parsing threads inside one JVM process as you want. You can do this either by getting and using multiple LexicalizedParserQuery objects (via the parserQuery() method) or implicitly by calling apply(...) or parseTree(...) on one LexicalizedParser. The -nthreads k option does this for you, sending successive sentences to different parsers using the Executor framework. You can also simultaneously create multiple LexicalizedParsers, e.g., for parsing different languages.
Multiple LexicalizedParserQuery objects share the same grammar (LexicalizedParser), but the memory savings aren't huge, as most of the memory goes to the transient structures used in chart parsing. So, if you are running lots of parsing threads concurrently, you will need to give a lot of memory to the JVM, as in the example above.
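If you drive the parser from your own Java code, a minimal multi-threaded sketch could look like the following (it assumes the loadModel()/apply() methods described above, in the style of the ParserDemo that ships with the parser; the sentences and pool size are just placeholders, so adjust the names to your release):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

public class ParallelParseSketch {
  public static void main(String[] args) {
    // One grammar, loaded once and shared read-only by all worker threads.
    final LexicalizedParser lp =
        LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
    String[][] sentences = {
        { "This", "is", "the", "first", "sentence", "." },
        { "This", "is", "the", "second", "sentence", "." } };
    ExecutorService pool = Executors.newFixedThreadPool(2);
    for (final String[] words : sentences) {
      pool.submit(new Runnable() {
        public void run() {
          // apply() builds a fresh LexicalizedParserQuery (and chart) internally,
          // so concurrent calls on the same LexicalizedParser don't interfere.
          List<CoreLabel> tokens = Sentence.toCoreLabelList(words);
          Tree tree = lp.apply(tokens);
          System.out.println(tree);
        }
      });
    }
    pool.shutdown();
  }
}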
P.S. Sorry, yes, some of the documentation still needs updating. But -tLPP is a single flag for specifying language-specific resources; the Stanford Parser has no -t flag.

Related

Why are some Bash commands both built-in and external?

Some commands are internal built-in Bash commands while others are external (other programs). I see why certain commands need to be built-in. Some of the reasons are:
If a command needs to change the internal state of the shell process.
If a command performs a very basic operation in the shell.
If a command is called often and needs to be made fast. An external command is executed by loading an external program and hence is slower.
But why are some commands both built-in and external, for example echo and test? I understand echo is used a lot and thus is built-in (Reason 3). But then why also have it as an external command and have a binary for it in /bin/echo? The built-in version of echo will always take precedence over the external version and thus, the external version is hardly ever used. So, why then have an external version of it at all?
It's exactly your point 3. When a command does very little (echo is a good example), spawning a new process dominates the run-time behavior. With growing disks, bandwidth and code bases, you always reach a point where you have so much data and so many files (our code base at work has 100k files!) that one less spawn per file makes a difference of minutes.
That's also why the typical built-in is a drop-in replacement which takes (perhaps a superset of) the same arguments as the binary.
You also ask why the external binary is still retained even though Bash has echo as a built-in. The answer is that a lot of programs rely on the existence of /bin/echo; it's actually standardized.
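You can see both versions and pick one explicitly, for example (typical Bash output; details vary by system):

$ type -a echo
echo is a shell builtin
echo is /bin/echo
$ builtin echo hello              # force the built-in
$ /bin/echo hello                 # force the external binary
$ find . -maxdepth 1 -exec /bin/echo {} \;   # find can only exec a real executable, never a builtin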
Bash is only one of many user interfaces and command interpreters. They all have different sets of built-ins. Some shells are purposefully small and rely a lot on what you could call "legacy" binaries. One example is ash and its successor, Dash. Dash is now the default /bin/sh in Ubuntu and Debian because of its speed, and it is popular for embedded systems because of its small size. (But even Dash has built-ins for echo, test and dozens of other commands, and provides a command history for interactive use.)

Is tesseract 3.00 multi-threaded?

I read some other posts suggesting that multi-threading support would be added in 3.00, but I'm not sure whether it actually made it into the 3.00 release.
Other than multi-threading, is running multiple processes of tesseract a feasible option to achieve concurrency?
Thanks.
One thing I've done is invoke GNU Parallel to run as many instances of Tesseract as possible on a multi-core system, for multi-page documents converted to single-page images.
It's a short program, easily compiled on most Linux distros (I'm using OpenSuSE 11.4).
Here's the command line that I use:
/usr/local/bin/parallel -j 4 \
/usr/local/bin/tesseract -psm 1 -l eng {} {.} \
::: /tmp/tmp/*.jpg
The -j 4 tells parallel to use all four CPU cores on my server.
If you run this and run 'top' in another terminal, you'll see up to four processes at a time until it has rummaged through all of the JPGs in the specified directory.
Your load should never exceed the number of CPU cores in your system (if you run Linux).
Here's the link to GNU Parallel:
http://www.gnu.org/software/parallel/
No. You can browse the code at http://code.google.com/p/tesseract-ocr/source/browse/. None of the current code in trunk seems to make use of multi-threading (at least looking through the base classes, API, and neural-network classes).
I used parallel as well, on CentOS, this way:
ls | parallel --gnu "tesseract {} {.}"
I used the --gnu option as suggested by the warning printed on stdout, which was:
parallel: Warning: YOU ARE USING --tollef. IF THINGS ARE ACTING WEIRD USE --gnu.
The {} and {.} are placeholders for parallel: in this case you're telling tesseract to use the listed file as the first argument, and the same file name without its extension as the second argument; everything is well explained in the parallel man pages.
Now, if you have, say, three .tif files and you run tesseract three times, once for each file, summing up the execution times, and then run the command above with time in front of parallel, you can easily measure the speedup.
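For example, with hypothetical scans a.tif, b.tif and c.tif in the current directory:

# one tesseract process at a time
time for f in a.tif b.tif c.tif; do tesseract "$f" "${f%.tif}"; done
# the same work spread over the available cores
time ls *.tif | parallel --gnu "tesseract {} {.}"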

Any C++ libraries to run a single program on multiple PC's (i.e. "use grid computing to run my app")

I'm after a method of converting a single program to run on multiple computers on a network (think "grid computing").
I'm using MSVC 2007 and C++ (non-.NET).
The program I've written is ideally suited for parallel programming (it's doing analysis of scientific data), so the more computers the better.
The classic answer for this would be MPI (Message Passing Interface). It requires a bit of work to make your program work well with message passing, but the end result is that you can easily launch your executable across a cluster of machines that are running an MPI daemon.
There are several implementations. I've worked with MPICH, but I might consider doing this with Boost MPI (which didn't exist last time I was in the neighborhood).
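To give a flavor of the message-passing style, here is a minimal sketch (the plain MPI C API called from C++, not tied to any particular implementation; the loop body is a stand-in for your real analysis). Compile with mpicxx and launch with something like mpirun -np 16 -hostfile hosts ./analysis:

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // which process am I?
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // how many processes in total?

    const int N = 1000000;
    double partial = 0.0;
    for (int i = rank; i < N; i += size) {  // interleaved division of the work
        partial += 1.0 / (1.0 + i);         // stand-in for the real computation
    }

    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        std::printf("result = %f (computed on %d processes)\n", total, size);
    }
    MPI_Finalize();
    return 0;
}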
Firstly, this topic is covered here:
https://stackoverflow.com/questions/2258332/distributed-computing-in-c
Secondly, a search for "C++ grid computing library", "grid computing for visual studio" and "C++ distributed computing library" returned the following:
OpenMP+OpenMPI. OpenMP handles running a single C++ program on multiple CPU cores within the same machine, while OpenMPI handles the messaging between machines; OpenMP+OpenMPI = grid computing (see the sketch after this list).
POP-C++, see http://gridgroup.hefr.ch/popc/.
Xoreax Grid Engine, see http://www.xoreax.com/high_performance_grid_computing.htm. Xoreax focuses on speeding up Visual Studio builds, but the Xoreax Grid Engine can also be applied to generic applications. http://www.xoreax.com/xge_xoreax_grid_engine.htm states: "Once a task-set (a set of tasks for distribution along with their dependency definitions) is defined through one of the interfaces described below, it can be executed on any machine running an IncrediBuild Agent." See also the accompanying article at http://www.codeproject.com/KB/showcase/Xoreax-Grid.aspx.
Alchemi, see http://www.codeproject.com/KB/threads/alchemi.aspx.
RightScale, see http://www.rightscale.com/pdf/Grid-Whitepaper-Technical.pdf. A quote from the examples section of this paper: "Pharmaceutical protein analysis: Several million protein compound comparisons were performed in less than a day – a task that would have taken over a week on the customer’s internal resources ..."
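As a toy illustration of the OpenMP+OpenMPI combination mentioned above (hypothetical file hybrid.cpp; compile with something like mpicxx -fopenmp hybrid.cpp and launch with mpirun):

#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nodes = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nodes);
    // MPI spreads ranks across machines; OpenMP spreads threads across
    // the cores of each machine.
    #pragma omp parallel
    std::printf("rank %d of %d, thread %d of %d\n",
                rank, nodes, omp_get_thread_num(), omp_get_num_threads());
    MPI_Finalize();
    return 0;
}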

How to make R use all processors?

I have a quad-core laptop running Windows XP, but looking at Task Manager R only ever seems to use one processor at a time. How can I make R use all four processors and speed up my R programs?
I have a basic system I use where I parallelize my programs on the "for" loops. This method is simple once you understand what needs to be done. It only works for local computing, but that seems to be what you're after.
You'll need these libraries installed:
library("parallel")
library("foreach")
library("doParallel")
First you need to create your computing cluster. I usually do other stuff while running parallel programs, so I like to leave one core free. The "detectCores" function returns the number of cores in your computer.
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl, cores = detectCores() - 1)
Next, call your for loop with the "foreach" command, along with the %dopar% operator. I always use a "try" wrapper to make sure that any iterations where the operations fail are discarded, and don't disrupt the otherwise good data. You will need to specify the ".combine" parameter, and pass any necessary packages into the loop. Note that "i" is defined with an equals sign, not an "in" operator!
data = foreach(i = 1:length(filenames),
               .packages = c("ncdf", "chron", "stats"),
               .combine = rbind) %dopar% {
  try({
    # your operations; line 1...
    # your operations; line 2...
    # your output
  })
}
Once you're done, clean up with:
stopCluster(cl)
The CRAN Task View on High-Performance Computing with R lists several options. XP is a restriction, but you can still get something like snow working over sockets within minutes.
As of version 2.15, R now comes with native support for multi-core computations. Just load the parallel package
library("parallel")
and check out the associated vignette
vignette("parallel")
I hear that REvolution R supports better multi-threading than the typical CRAN version of R, and REvolution also supports 64-bit R on Windows. I have been considering buying a copy, but I found their pricing opaque; there's no price list on their web site. Very odd.
I believe the multicore package works on XP. It gives some basic multi-process capability, especially through offering a drop-in replacement for lapply() and a simple way to evaluate an expression in a new process (mcparallel()).
On Windows I believe the best way to do this would probably be with foreach and snow as David Smith said.
However, Unix/Linux-based systems can compute using multiple processes with the 'multicore' package. It provides a high-level function, 'mclapply', that applies a function over a list across multiple cores (a parallel lapply). An advantage of the 'multicore' package is that each processor gets a private copy of the Global Environment that it may modify. Initially, this copy is just a pointer to the Global Environment, making the sharing of variables extremely quick if the Global Environment is treated as read-only.
Rmpi requires that the data be explicitly transferred between R processes instead of working with the 'multicore' closure approach.
-- Dan
If you do a lot of matrix operations and you are using Windows, you can install revolutionanalytics.com/revolution-r-open for free; it comes with the Intel MKL libraries, which allow you to do multithreaded matrix operations. On Windows, if you take the libiomp5md.dll, Rblas.dll and Rlapack.dll files from that install and overwrite the ones in whatever R version you like to use, you'll have multithreaded matrix operations (typically a 10-20x speedup for matrix operations). Alternatively, you can use the Atlas Rblas.dll from prs.ism.ac.jp/~nakama/SurviveGotoBLAS2/binary/windows/x64, which also works on 64-bit R and is almost as fast as the MKL one. I found this the single easiest thing to do to drastically increase R's performance on Windows systems. In fact, I'm not sure why they don't come as standard on R Windows installs.
On Windows, multithreading unfortunately is not well supported in R (unless you use OpenMP via Rcpp), and the available socket-based parallelization on Windows systems, e.g. via the parallel package, is very inefficient. On POSIX systems things are better, as you can use forking there (the multicore package is, I believe, the most efficient one). You could also try the Rdsm package for multithreading within a shared-memory model. I have a version on my GitHub where the Unix-only flag has been removed, so it should also work on Windows (earlier, Windows wasn't supported because the bigmemory dependency supposedly didn't work on Windows, but now it seems it does):
library(devtools)
devtools::install_github('tomwenseleers/Rdsm')
library(Rdsm)

Logging frameworks for embedded linux?

I need a small, portable framework for logging on embedded linux. Ideally it would output to a file or a socket, and having some sort of log rotation/compression would also be nice.
So far, I've found a lot of frameworks, but almost all of them have daunting build procedures or require the use of application frameworks (e.g. log4cxx requires the Apache Portable Runtime, which I'd rather not bother with...).
Just looking for something simple and robust, but everything I seem to find is complicated or requires lots of secondary junk just to run.
Suggestions? (And if the answer is "roll my own", that's fine, but... it'd be great to avoid that.)
Use syslog(3) and syslogd from BusyBox. BusyBox can be very compact when stripped down and doesn't depend on anything other than libc. You can strip out everything you don't want so it is perfectly possible to use it only for logging.
We use BusyBox on a number of embedded systems, both Linux and uClinux, and find its logging facilities highly reliable.
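The application side is just the standard syslog(3) calls; BusyBox's syslogd picks the messages up and writes them wherever its configuration says. A minimal sketch ("mydaemon" is a placeholder name):

#include <syslog.h>

int main(void) {
    openlog("mydaemon", LOG_PID, LOG_DAEMON);          /* tag, options, facility */
    syslog(LOG_INFO, "started, version %s", "1.2.3");
    syslog(LOG_ERR, "sensor read failed: code %d", 42);
    closelog();
    return 0;
}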
I have no experience with the log4cxx module, but I am using APR on an embedded target running Linux (based on the Atmel AT91SAM926x processor family). It was really simple to configure and compile (more or less ./configure --host=arm-none-linux-gnueabi), so I would not be too afraid of going down the log4cxx path.
Maybe you should consider spending some time on a good logging framework, since this is what you are going to use on your embedded Linux. ... and printf ...
I cooked something where I can enable/disable various logging levels per module in runtime.
Did you ever try debugging multithreaded apps on Linux?
Good luck!
Implementing a very robust logging mechanism in C takes about 1000 lines of code (from our code base). 90% of this is defines of the different sections. It includes different macros (DBG_E, DBG_W, DBG_TRACE, etc.), splitting into sections, and run-time changing of the debug level and debug modules (it does not include compression, just a simple print abstraction that can be implemented in different ways: file/socket/serial, etc.).
I estimate that it takes about a few days to implement. The downside is that you will spend a few days; the upside is that you will get something that works for your needs and nothing more. I understand that you are working on an embedded platform where footprint and memory usage are important, and the best, most optimized solution will be one you write yourself. We invested those few days once, and have been using it across different products/projects, adjusting and improving it over time according to real needs. The main problem with a generic solution is that it usually does roughly what you need plus a lot more, and that "more" is usually just a waste of resources.
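A stripped-down sketch of what such a layer can look like (the names, modules and the fprintf backend are purely illustrative, not our actual code):

#include <stdio.h>

/* severity levels, lower = more severe */
enum dbg_level { DBG_LVL_ERR, DBG_LVL_WARN, DBG_LVL_TRACE };

/* modules ("sections"); one runtime-adjustable level per module */
enum dbg_module { MOD_NET, MOD_UI, MOD_COUNT };
static enum dbg_level dbg_levels[MOD_COUNT] = { DBG_LVL_WARN, DBG_LVL_WARN };

/* the backend is a plain fprintf here; replace it with a file, socket or
   serial writer without touching the call sites */
#define DBG_PRINT(mod, lvl, tag, ...)                                   \
    do {                                                                \
        if ((lvl) <= dbg_levels[(mod)]) {                               \
            fprintf(stderr, "%s %s:%d: ", tag, __FILE__, __LINE__);     \
            fprintf(stderr, __VA_ARGS__);                               \
            fputc('\n', stderr);                                        \
        }                                                               \
    } while (0)

#define DBG_E(mod, ...)     DBG_PRINT(mod, DBG_LVL_ERR,   "E", __VA_ARGS__)
#define DBG_W(mod, ...)     DBG_PRINT(mod, DBG_LVL_WARN,  "W", __VA_ARGS__)
#define DBG_TRACE(mod, ...) DBG_PRINT(mod, DBG_LVL_TRACE, "T", __VA_ARGS__)

int main(void) {
    DBG_E(MOD_NET, "link down on port %d", 2);   /* shown: ERR <= WARN   */
    DBG_TRACE(MOD_UI, "redraw");                 /* filtered out for now */
    dbg_levels[MOD_UI] = DBG_LVL_TRACE;          /* runtime level change */
    DBG_TRACE(MOD_UI, "redraw");                 /* now printed          */
    return 0;
}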
I can't imagine that your platform is too small to include log4cxx and APR; neither is a large library, and even the tiniest platform is likely to have space for them.
You could just use syslog, which is provided by the C library; a syslog daemon is provided by BusyBox (which, no doubt, you already use if you're on a really tiny platform). I don't know whether BusyBox's syslogd can log to the network, but it has some level of flexibility. You can do log rotation using shell scripts pretty trivially.
Use klogd: it reads kernel log messages (from the /proc/kmsg interface) and redirects them appropriately. You can run a user-configurable syslogd daemon along with klogd, which will redirect kernel messages into the appropriate files in the /var/log/ directory.
For instance, logs related to the mail service will be stored in /var/log/mail.log and logs related to the kernel boot process will be stored in /var/log/boot.log. The user can configure how logs are routed using the syslogd configuration file.
But the use of syslogd may degrade your system's performance, because for every log message the syslog daemon performs a disk operation to store it in the appropriate file.
Log sequence:
messages from kernel ---> klogd (reads messages from the kernel ring buffer) ---> syslogd ---> /var/log/*
