Automatically add variable names to elements of a list [duplicate] - string

This question already has answers here:
Can lists be created that name themselves based on input object names?
(4 answers)
Closed 6 years ago.
I have a list of models, and to make the code easiser to maintain (so roubst to adding and removing models) I'd like to have a single place where I store them and their names. To do this I have to solve the following naming problem.
Upstream, i have generated models in a way that's less efficient than the following (if it was this compressed, i would assign them to their own env).
lmNms <- c( "mod1", "mod2", "mod3", "mod4", "mod5", "mod6")
lapply(lmNms, function(N) assign(N, lm(runif(10) ~ rnorm(10)), env = .GlobalEnv))
Downstream, i have collected the mess into a list:
modelList <- list(mod1, mod2, mod3, mod4, mod5, mod6)
I have an (un-named) lists of variable output, and attach the names as follows:
output <- list(1, 2, 3, 4, 5, 6)
names(output) <- lmNms
I'd like to be able to use the model names from modelList:
modelList <- list(mod1, mod2, mod3, mod4, mod5, mod6)
names(output) <- someFun(modelList)
I'm sure there exists someFun -- but I cannot figure it out ... can this be done?
To be clear, the aim is to do this without using lmNms -- i want to get the names either from modelList, or have them attach at the point that i build modelList (the point is to avoid list(a = a, b=b ...) boilerplate.

The key to this is to re-make the list function to stick on the names when you don't supply the names as well.
listN <- function(...){
anonList <- list(...)
names(anonList) <- as.character(substitute(list(...)))[-1]
anonList
}
With this, you make modelList as follows:
modelList <- listN(mod1, mod2, mod3, mod4, mod5, mod6)
With the names attached:
R> names(modelList)
[1] "mod1" "mod2" "mod3" "mod4" "mod5" "mod6"
A fuller solution is given here, which is robust to the use of a mixture of anonymous and named arguments to list.
listN2 <- function(...){
dots <- list(...)
inferred <- sapply(substitute(list(...)), function(x) deparse(x)[1])[-1]
if(is.null(names(inferred))){
names(dots) <- inferred
} else {
names(dots)[names(inferred) == ""] <- inferred[names(inferred) == ""]
}
dots
}

You can do this with environments:
e <- new.env()
output <- list(1,2,3,4,5,6)
nms <- c( "mod1", "mod2", "mod3", "mod4", "mod5", "mod6")
for(i in 1:length(output)) {
nm <- nms[i]
e[[nm]] <- output[[i]]
}
You can reference items in the environment like any list, or coerce it to a list
> ls(e)
[1] "mod1" "mod2" "mod3" "mod4" "mod5" "mod6"
> e[['mod1']]
[1] 1
> e$mod1
[1] 1
> new_output <- as.list(e)
Since environments act a lot like lists, there is probably an easy way to do it with your original list as well.

Use sapply with simplify=FALSE. This will assign names to the result.
sapply(lmNms, get, simplify=FALSE)

Related

Haskell: initialise a list with symbols?

In Haskell, is there a way of initialising a list and declaring symbols in that list at the same time?
Currently I do this:
import Data.List
main = do
let lambda = "\x03BB"
xi = "\x926"
bol = "\x1D539"
cohomology_algebra = [ lambda, bol, xi]
putStrLn $ xi
putStrLn $ show cohomology_algebra
However I have a long list of symbols and I worry that i forget to put them all in the list (it has happened)
Ideally I would do something like:
main = do
let cohomology_algebra = [ lambda = "\x03BB", bol = "\x1D539", xi= "\x926"] -- does not compile
putStrLn $ show cohomology_algebra
Is there a way around this?
Not a perfect solution, but you could use
let cohomology_algebra#[lambda, bol, xi] = ["\x03BB", "\x926", "\x1D539"]
This will trigger a runtime error if the two lists above have different length (at the point where the names are demanded).
It's not optimal, since this check should be at compile time instead. Further, in this code style we have to separate the identifier form its value too much, making it possible to swap some definitions by mistake.

Build a specific list of string from a string in Haskell

I'm starting learning haskell and i'm stuck in a problem.
I read from the standard input a string like "1234" or "azer"
and I want to make a list like ["123", "234", "341", "412"] or ["aze", "zer", "era", "raz"].
I probably must use map but i don't know how to proceed.
Is someone can help me to do that ? Thanks
Let's start with a list, [1..4]. Let's repeat it for eternity:
>>> cycle [1..4]
[1,2,3,4,1,2,3,4,1,2,3,4,...
Now let's take a slice of it, at say, the 2nd index:
>>> take 4 $ drop (2-1) $ cycle [1..4]
[2,3,4,1]
We can generalize this by naming a function:
slice n = take 4 $ drop n $ cycle [1..4]
To obtain all possible cyclic permutations, we only need to sample n from 1 to 4:
>>> map slice [1..4]
[[2,3,4,1],[3,4,1,2],[4,1,2,3],[1,2,3,4]]
Now, how can we make this work with an arbitrary string? Let's redefine slice to accept a string:
slice s n = take (length s) $ drop n $ cycle s
And so our cyclic permutations function can be defined as follows:
cyclicPerms s = map (slice s) [1..(length s)]
Testing:
>>> cyclicPerms "abcde"
["bcdea","cdeab","deabc","eabcd","abcde"]
I had originally posted an answer totally misunderstanding the specification. I, like a Haskell enumeration, read only the first two numbers so I thought it continued as such. Oops. In any event I just adapted a chunks function I wrote to produce the repetitions. When I get home, I think I have another that cycles lists. I'll post it as well if it's not the same. Who knows.
This function allows you to specify the chunk size as well as the list.
cychnks n ls = [take n.drop x$ls2|(x,y) <-zip [0..] ls]
where ls2 = ls++ls
cychnks 5 "abcde"
["abcde","bcdea","cdeab","deabc","eabcd"]
cychnks 3 "abcde"
["abc","bcd","cde","dea","eab"]

assign global variable in R function using tm: bug on linux?

Using the package tm in R, I want to transform a corpus with a pretty complicated function, and I need some side effects for storing pertinent information. Since content_transformer requires a specific function format, the easy way is to use <<- in my function. The problem occurs with the code below:
library(tm)
a <- 4
n <- 2
corp<-VCorpus(VectorSource(rep("fish",n)))
(func<-content_transformer(
function(x) {
a <<- 42
return(x)
}))
corp<-tm_map(corp,func)
print(a)
It prints the wrong answer, i.e. 4. But with n=1, it prints the right one. So I assume it is the multi-threading that tm does that fails to behave as scalar R. I guess it's a bug since on windows, it works. (Note: I use R 3.1.2 on linux and R 3.1.1 on windows).
Questions: is it a bug? if yes, a known bug? Is there an easy solution that does not require to refactor the code?
Thanks!
edit: additional example using assing
rm(list=ls())
library(tm)
env <- new.env()
a <- 1
n <- 2
corp<-VCorpus(VectorSource(rep("fish",n)))
(func<-content_transformer(
function(x,e) {
assign("a", 42, envir=e)
print(e)
print(ls.str(e))
return(x)
}))
corp<-tm_map(corp,func,env)
print(env)
print(ls.str(env))
In fact this works accidentally under Windows because mclapply is not defined under windows and it is just a call to to an lapply.
Indeed, when you are calling tm_map , you are using this function :
tm:::tm_map.VCorpus
function (x, FUN, ..., lazy = FALSE)
{
if (lazy) {
fun <- function(x) FUN(x, ...)
if (is.null(x$lazy))
x$lazy <- list(index = rep(TRUE, length(x)), maps = list(fun))
else x$lazy$maps <- c(x$lazy$maps, list(fun))
}
else x$content <- mclapply(content(x), FUN, ...) ## this the important line
x
}
So you can reproduce the "odd/normal" behavior by calling mclapply:
library(parallel)
res <- mclapply(1:2, function(x){a<<- 20;x})
a
[1] 4
a is unchanged and is still equal to 4. This is the normal paralel behavior since we avoid to have side effect. Under windows mcapply is just a call to lapply so the value of the global variable is correctly changed.
pseudo solution
Here better to use lapply if you want to the global side effect, but you can emulate the global variable in a reading specially if you add a as a second argument your function...
func <-
function(x,a) {
a <- 42 ## use a here
x$a <- a ## assign it to the x environment
return(x) ## but the new value of a can not be used by others documents..
}
(Func<-content_transformer(func))
res <- tm_map(corp,Func,a=4)

How to extract different part of 2 strings

I have
x<-c('abczzzdef','abcxxdef')
I want a function
fn(x)
that returns a length 2 vector
[1] 'zzz' 'xx'
How?
(I have tried searching for an answer but search terms like 'partial matching' give me something quite different)
Update
'length 2 vector' means length(fn(x)) is 2 and fn(x)[1] give "zzz" while fn(x)[2] gives "xx".
After trying out the answers provided, I realize I haven't been specific enough.
There will only be 2 strings (in a vector) that I am comparing.
The location of the different parts (zzz and xx) can be anywhere in the string. i.e. it could be x<-c('zzzabcdef','xxabcdef') or it could be at the end. But the 2 strings are always at the same respective place (i.e. both at the beginning, or both at the middle, or both at the end).
zzz and xx are obviously generic names. They could be different things (numbers, alphabet, symbols) and of different length (not necessarily 3 and 2).
Same comment applies to abc and def.
I have got some test cases
x1<-c('abcxxxttt','abczzttt')
x2<-c('abcxxxdef','abczz126gsdef')
x3<-c('xx_x123../t','z_z126gs123../t')
fn(x1) should give "xxx" "zz"
fn(x2) should give "xxx" "zz126gs"
fn(x3) should give "xx_x" "z_z126gs"
x<-c('abczzzdef','abcxxdef')
fn <- function(x) unlist(regmatches(x, gregexpr("(.)\\1+", x)))
fn(x)
# [1] "zzz" "xx"
First of all, it would have been better to include all that detail in the first version of the question. No need to waste people's time coming up with solutions that wont work for you just because you didn't clearly explain what you needed. If you need to change a question that much after it's already been answered, it probably would be best to ask a new question rather than completely changing your first one.
What you are tying to do, find the largest non-shared portion of a string, can be a pretty messy process for a computer. A somewhat standard measure of string dissimilarity is the generalized Levenshtein distance which R has implemented in the adist function. It can produce a string which tells you how to transform one string into another via matches, insertions, deletions, and substitutions. If I find the longest string of matches, I'll have a pretty good idea of where to extract the unique information.
So this method basically focuses on extracting the regions outside of the best matches. Here's the function that does the matching
fn <- function(x) {
ld <- attr(adist(x[1], x[2], counts=T,
costs=c(substitutions=500)),"trafos")[1,1]
starts <- gregexpr("M+", ld)[[1]]
lens <- attr(starts,"match.length")
starts <- as.vector(starts)
ends <- starts + lens - 1
bm <- which.max(lens)
if (starts[bm]==1 | ends[bm]==nchar(ld)) {
#beg/end
for( i in which(starts==1 | ends==nchar(ld))) {
substr(ld, starts[i], ends[i]) <-
paste(rep("X", lens[i]), collapse="")
}
} else {
#middle
substr(ld, starts[bm], ends[bm]) <-
paste(rep("X", lens[bm]), collapse="")
}
tr <- strsplit(ld,"")[[1]]
x1 <- cumsum(tr %in% c("D","M","X"))[!tr %in% c("X","I")]
x2 <- cumsum(tr %in% c("I","M","X"))[!tr %in% c("X","D")]
c(substr(x[1], min(x1), max(x1)), substr(x[2], min(x2), max(x2)))
}
Now we can apply it to your test data
x1 <- c('abcxxxttt','abczzttt')
x2 <- c('abcxxxdef','abczz126gsdef')
x3 <- c('xx_x123../t','z_z126gs123../t')
fn(x1)
# [1] "xxx" "zz"
fn(x2)
# [1] "xxx" "zz126gs"
fn(x3)
# [1] "xx_x" "z_z126gs"
So we get the results you expect. Here I do little error checking. I assume there will always be some overlap and some non-overlapping regions. If that's not true, the function will likely produce an error or unexpected results.
gsub("([^xz]*)([xz]*)([^xz]*)", "\\2", x)
[1] "zzz" "xx"
> getxz <- function(x, str) gsub(paste0("([^",str, ']*)([', str, ']*)([^', str, ']*)'),
"\\2", x)
> getxz(x=x,"xz")
[1] "zzz" "xx"
In response to the new examples I offer these tests which I think provides three successes:
> getxz(x=x1,"xz_")
[1] "xxx" "zz"
> getxz(x=x2,"xz_")
[1] "xxx" "zz"
> getxz(x=x3,"xz_")
[1] "xx_x" "z_z"

R coding string

I have a list of 96 files that I would like to open and perform some functions on the data. I am VERY new to R, and am unsure how to manipulate strings to open the sequential file names. Here is my code below, which clearly does not work:
for (N in (1:96)){
if (N > 10)
TrackID <- "000$N"
}
else{
TrackID <- "00$N"
}
fname_in <- 'input/intersections_track_calibrated_jma_from1951_$TrackID.csv'
fname_out <- 'output/tracks_crossing_regional_polygon_$TrackID.csv'
......data manipulations.....
}
So basically I just need to be able to, for instance, when N=1, reference a file called intersections_track_calibrated_jma_from1951_0001.csv.
Thanks in advance!
Kimberly
I think that what you are looking for is the sprintf() function...
sprintf will save you from having the tests on n, to known how many leading zeros are needed.
Combined with the paste() or paste0 function, producing the desired file name becomes a one-liner.
Indeed it would be possibly to just use the sprintf() function alone, as in
sprintf("intersections_track_calibrated_jma_from1951_%04d.csv", n) but having a function to produce the file names and/or the "TrakID" may allow to hide all these file naming convention details away.
Below, see sprintf() and paste0() in action, in the context of a convenience function created to produce the filename given a number n.
> GetFileName <- function(n)
paste0("intersections_track_calibrated_jma_from1951_",
sprintf("%04d", n),
".csv")
> GetFileName(1)
[1] "intersections_track_calibrated_jma_from1951_0001.csv"
> GetFileName(13)
[1] "intersections_track_calibrated_jma_from1951_0013.csv"
> GetFileName(321)
[1] "intersections_track_calibrated_jma_from1951_0321.csv"
>
Of course, you could make the GetFileName function more versatile by adding parameters, some of which with a default value. In that fashion it could be used for both input and output file name (or any other file prefix/extension). For example:
GetFileName <- function(n, prefix=NA, ext="csv") {
if (is.na(prefix)) {
prefix <- "intersections_track_calibrated_jma_from1951_"
}
paste0(prefix, sprintf("%04d", n), ".", ext)
}
> GetFileName(12)
[1] "intersections_track_calibrated_jma_from1951_0012.csv"
> GetFileName(12, "output/tracks_crossing_regional_polygon_", "txt")
[1] "output/tracks_crossing_regional_polygon_0012.txt"
> GetFileName(12, "output/tracks_crossing_regional_polygon_")
[1] "output/tracks_crossing_regional_polygon_0012.csv"
>
Try using paste and paste0 to generate strings like this instead.
for (N in (1:96)){
if (N > 10)
TrackID <- paste0("000",N)
}
else{
TrackID <- paste0("00",N)
}
fname_in <- paste0('input/intersections_track_calibrated_jma_from1951_',
TrackID.'.csv')
fname_out <- paste0('output/tracks_crossing_regional_polygon_',
TrackID,'.csv')
......data manipulations.....
}
paste0 just saves you from writing sep="" if you don't require a seperator (as in your case)

Resources