Using the package tm in R, I want to transform a corpus with a pretty complicated function, and I need some side effects for storing pertinent information. Since content_transformer requires a specific function format, the easy way is to use <<- in my function. The problem occurs with the code below:
library(tm)
a <- 4
n <- 2
corp<-VCorpus(VectorSource(rep("fish",n)))
(func<-content_transformer(
function(x) {
a <<- 42
return(x)
}))
corp<-tm_map(corp,func)
print(a)
It prints the wrong answer, i.e. 4. But with n=1, it prints the right one. So I assume it is the multi-threading that tm does that fails to behave as scalar R. I guess it's a bug since on windows, it works. (Note: I use R 3.1.2 on linux and R 3.1.1 on windows).
Questions: is it a bug? if yes, a known bug? Is there an easy solution that does not require to refactor the code?
Thanks!
edit: additional example using assing
rm(list=ls())
library(tm)
env <- new.env()
a <- 1
n <- 2
corp<-VCorpus(VectorSource(rep("fish",n)))
(func<-content_transformer(
function(x,e) {
assign("a", 42, envir=e)
print(e)
print(ls.str(e))
return(x)
}))
corp<-tm_map(corp,func,env)
print(env)
print(ls.str(env))
In fact this works accidentally under Windows because mclapply is not defined under windows and it is just a call to to an lapply.
Indeed, when you are calling tm_map , you are using this function :
tm:::tm_map.VCorpus
function (x, FUN, ..., lazy = FALSE)
{
if (lazy) {
fun <- function(x) FUN(x, ...)
if (is.null(x$lazy))
x$lazy <- list(index = rep(TRUE, length(x)), maps = list(fun))
else x$lazy$maps <- c(x$lazy$maps, list(fun))
}
else x$content <- mclapply(content(x), FUN, ...) ## this the important line
x
}
So you can reproduce the "odd/normal" behavior by calling mclapply:
library(parallel)
res <- mclapply(1:2, function(x){a<<- 20;x})
a
[1] 4
a is unchanged and is still equal to 4. This is the normal paralel behavior since we avoid to have side effect. Under windows mcapply is just a call to lapply so the value of the global variable is correctly changed.
pseudo solution
Here better to use lapply if you want to the global side effect, but you can emulate the global variable in a reading specially if you add a as a second argument your function...
func <-
function(x,a) {
a <- 42 ## use a here
x$a <- a ## assign it to the x environment
return(x) ## but the new value of a can not be used by others documents..
}
(Func<-content_transformer(func))
res <- tm_map(corp,Func,a=4)
Related
This is my program:
modify :: Integer -> Integer
modify a = a + 100
x = x where modify(x) = 101
In ghci, this compiles successfully but when I try to print x the terminal gets stuck. Is it not possible to find input from function output in Haskell?
x = x where modify(x) = 101
is valid syntax but is equivalent to
x = x where f y = 101
where x = x is a recursive definition, which will get stuck in an infinite loop (or generate a <<loop>> exception), and f y = 101 is a definition of a local function, completely unrelated to the modify function defined elsewhere.
If you turn on warnings you should get a message saying "warning: the local definition of modify shadows the outer binding", pointing at the issue.
Further, there is no way to invert a function like you'd like to do. First, the function might not be injective. Second, even if it were such, there is no easy way to invert an arbitrary function. We could try all the possible inputs but that would be extremely inefficient.
In Haskell, is there a way of initialising a list and declaring symbols in that list at the same time?
Currently I do this:
import Data.List
main = do
let lambda = "\x03BB"
xi = "\x926"
bol = "\x1D539"
cohomology_algebra = [ lambda, bol, xi]
putStrLn $ xi
putStrLn $ show cohomology_algebra
However I have a long list of symbols and I worry that i forget to put them all in the list (it has happened)
Ideally I would do something like:
main = do
let cohomology_algebra = [ lambda = "\x03BB", bol = "\x1D539", xi= "\x926"] -- does not compile
putStrLn $ show cohomology_algebra
Is there a way around this?
Not a perfect solution, but you could use
let cohomology_algebra#[lambda, bol, xi] = ["\x03BB", "\x926", "\x1D539"]
This will trigger a runtime error if the two lists above have different length (at the point where the names are demanded).
It's not optimal, since this check should be at compile time instead. Further, in this code style we have to separate the identifier form its value too much, making it possible to swap some definitions by mistake.
I'm running GHC version 7.8.3 on Windows 7.
Ok, this is not about fancy code snippets. I'm just trying not be a noob here and actually compile something in a way that vaguely resembles the structure of side-effect languages.
I have the following code:
main =
do {
let x = [0..10];
print x
}
I've learned here, that the keyword do is a fancy syntactic sugar for fancy monadic expressions. When I try to compile it, I get the following error:
main.hs:4:1: parse error on input 'print'
And I've learned in this other question, that tabs in Haskell are evil, so I've tried to omit them:
main =
do {
let x = [0..10];
print x
}
And I've failed miserably, because the parse error persists.
I've also learned here, that print is a syntactic sugar for the fancy equivalent:
main =
do {
let x = [0..10];
putStrLn $ show x
}
But then I get this error instead:
main.hs:4:9: parse error on input 'putStrLn'
Trying to face my despair, I've tried to omit the let keyword, after reading this answer:
main =
do {
x = [0..10];
print x
}
And then I get:
main.hs:4:1: parse error on input '='
And in a final useless attempt, I've even tried to omit the ';' like this:
main =
do {
let x = [0..10]
print x
}
And got:
main.hs:4:1: parse error on input 'print'
So,
How to properly use monadic expressions in Haskell without getting parse errors? Is there any hope?
It took me a while to see what was actually going on here:
main =
do {
let x = [0..10];
print x
}
The above looks as if we have a do with two statements, which is perfectly fine. Sure, it is not common practice to use explicit braces-and-semicolons when indentation implicitly inserts them. But they shouldn't hurt... why then the above fails parsing?
The real issue is that let opens a new block! The let block has no braces, so the indentation rule applies. The block starts with the definition x = [0..10]. Then a semicolon is found, which promises that another definition is following e.g.
let x = [0..10] ; y = ...
or even
let x = [0..10] ;
y = ... -- must be indented as the x above, or more indented
However, after the semicolon we find print, which is even indented less than x. According to the indentation rule, this is equivalent to inserting braces like:
main =
do {
let { x = [0..10]; }
print x
}
but the above does not parse. The error message does not refer to the implicitly inserted braces (which would be very confusing!), but only to the next line (nearly as confusing in this case, unfortunately).
The code can be fixed by e.g. providing explicit braces for let:
main = do { let { x = [0..10] };
print x }
Above, indentation is completely irrelevant: you can add line breaks and/or spaces without affecting the parsing (e.g. as in Java, C, etc.). Alternatively, we can move the semicolon below:
main = do { let x = [0..10]
; print x }
The above semicolon is on the next line and is less indented than x, implicitly inserting a } which closes the let block. Here indentation matters, since let uses the indentation rule. If we indent the semicolon more, we can cause the same parse error we found earlier.
Of course, the most idiomatic choice is using the indentation rule for the whole code:
main = do let x = [0..10]
print x
I was about to say, with no useful information, that
main = do
let x = [0..10]
print x
Was working for me, but I'm now off to read about the in within the braces.
As a slight aside, I found http://echo.rsmw.net/n00bfaq.html quite handy for reading about identation/formatting.
main = do let x = [0..10]
print x
works for me
and so does
main = do { let x = [0..10]
in print x }
I think you're trying to mix some different syntax options up.
I occasionally find that it would be useful to get the printed representation of an R object as a character string, like Python's repr function or Lisp's prin1-to-string. Does such a function exist in R? I don't need it to work on complicated or weird objects, just simple vectors and lists.
Edit: I want the string that I would have to type into the console to generate an identical object, not the output of print(object).
I'm not familiar with the Python/Lisp functions you listed, but I think you want either dput or dump.
x <- data.frame(1:10)
dput(x)
dump("x", file="clipboard")
See ?evaluate in the evaluate package.
EDIT: Poster later clarified in comments that he wanted commands that would reconstruct the object rather than a string that held the print(object) output. In that case evaluate is not what is wanted but dput (as already mentioned by Joshua Ullrich in comments and since I posted has been transferred to an answer) and dump will work. recordPlot and replayPlot will store and replot classic graphics on at least Windows. trellis.last.object will retrieve the last lattice graphics object. Also note that .Last.value holds the very last value at the interactive console.
You can use capture.output:
repr <- function(x) {
paste(sprintf('%s\n', capture.output(show(x))), collapse='')
}
For a version without the line numbers something along these lines should work:
repr <- function(x) {
cat(sprintf('%s\n', capture.output(show(x))), collapse='')
}
I had exactly the same question. I was wondering if something was built-in for this or if I would need to write it myself. I didn't find anything built-in so I wrote the following functions:
dputToString <- function (obj) {
con <- textConnection(NULL,open="w")
tryCatch({dput(obj,con);
textConnectionValue(con)},
finally=close(con))
}
dgetFromString <- function (str) {
con <- textConnection(str,open="r")
tryCatch(dget(con), finally=close(con))
}
I think this does what you want. Here is a test:
> rep <- dputToString(matrix(1:10,2,5))
> rep
[1] "structure(1:10, .Dim = c(2L, 5L))"
> mat <- dgetFromString(rep)
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
I am using R to parse a list of strings in the form:
original_string <- "variable_name=variable_value"
First, I extract the variable name and value from the original string and convert the value to numeric class.
parameter_value <- as.numeric("variable_value")
parameter_name <- "variable_name"
Then, I would like to assign the value to a variable with the same name as the parameter_name string.
variable_name <- parameter_value
What is/are the function(s) for doing this?
assign is what you are looking for.
assign("x", 5)
x
[1] 5
but buyer beware.
See R FAQ 7.21
http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-turn-a-string-into-a-variable_003f
You can use do.call:
do.call("<-",list(parameter_name, parameter_value))
There is another simple solution found there:
http://www.r-bloggers.com/converting-a-string-to-a-variable-name-on-the-fly-and-vice-versa-in-r/
To convert a string to a variable:
x <- 42
eval(parse(text = "x"))
[1] 42
And the opposite:
x <- 42
deparse(substitute(x))
[1] "x"
The function you are looking for is get():
assign ("abc",5)
get("abc")
Confirming that the memory address is identical:
getabc <- get("abc")
pryr::address(abc) == pryr::address(getabc)
# [1] TRUE
Reference: R FAQ 7.21 How can I turn a string into a variable?
Use x=as.name("string"). You can use then use x to refer to the variable with name string.
I don't know, if it answers your question correctly.
strsplit to parse your input and, as Greg mentioned, assign to assign the variables.
original_string <- c("x=123", "y=456")
pairs <- strsplit(original_string, "=")
lapply(pairs, function(x) assign(x[1], as.numeric(x[2]), envir = globalenv()))
ls()
assign is good, but I have not found a function for referring back to the variable you've created in an automated script. (as.name seems to work the opposite way). More experienced coders will doubtless have a better solution, but this solution works and is slightly humorous perhaps, in that it gets R to write code for itself to execute.
Say I have just assigned value 5 to x (var.name <- "x"; assign(var.name, 5)) and I want to change the value to 6. If I am writing a script and don't know in advance what the variable name (var.name) will be (which seems to be the point of the assign function), I can't simply put x <- 6 because var.name might have been "y". So I do:
var.name <- "x"
#some other code...
assign(var.name, 5)
#some more code...
#write a script file (1 line in this case) that works with whatever variable name
write(paste0(var.name, " <- 6"), "tmp.R")
#source that script file
source("tmp.R")
#remove the script file for tidiness
file.remove("tmp.R")
x will be changed to 6, and if the variable name was anything other than "x", that variable will similarly have been changed to 6.
I was working with this a few days ago, and noticed that sometimes you will need to use the get() function to print the results of your variable.
ie :
varnames = c('jan', 'feb', 'march')
file_names = list_files('path to multiple csv files saved on drive')
assign(varnames[1], read.csv(file_names[1]) # This will assign the variable
From there, if you try to print the variable varnames[1], it returns 'jan'.
To work around this, you need to do
print(get(varnames[1]))
If you want to convert string to variable inside body of function, but you want to have variable global:
test <- function() {
do.call("<<-",list("vartest","xxx"))
}
test()
vartest
[1] "xxx"
Maybe I didn't understand your problem right, because of the simplicity of your example. To my understanding, you have a series of instructions stored in character vectors, and those instructions are very close to being properly formatted, except that you'd like to cast the right member to numeric.
If my understanding is right, I would like to propose a slightly different approach, that does not rely on splitting your original string, but directly evaluates your instruction (with a little improvement).
original_string <- "variable_name=\"10\"" # Your original instruction, but with an actual numeric on the right, stored as character.
library(magrittr) # Or library(tidyverse), but it seems a bit overkilled if the point is just to import pipe-stream operator
eval(parse(text=paste(eval(original_string), "%>% as.numeric")))
print(variable_name)
#[1] 10
Basically, what we are doing is that we 'improve' your instruction variable_name="10" so that it becomes variable_name="10" %>% as.numeric, which is an equivalent of variable_name=as.numeric("10") with magrittr pipe-stream syntax. Then we evaluate this expression within current environment.
Hope that helps someone who'd wander around here 8 years later ;-)
Other than assign, one other way to assign value to string named object is to access .GlobalEnv directly.
# Equivalent
assign('abc',3)
.GlobalEnv$'abc' = 3
Accessing .GlobalEnv gives some flexibility, and my use case was assigning values to a string-named list. For example,
.GlobalEnv$'x' = list()
.GlobalEnv$'x'[[2]] = 5 # works
var = 'x'
.GlobalEnv[[glue::glue('{var}')]][[2]] = 5 # programmatic names from glue()