Collapsing character vectors with sprintf instead of paste

I have mostly used paste or paste0 for my pasting tasks in the past, but I'm pretty fascinated by the speed of sprintf. Yet I feel that I'm lacking some of its basics.
Is there a way to collapse a multi-element character vector into one of length 1, as paste does with its collapse argument, without having to specify the format placeholders and their values manually? (With paste, I simply leave it to the function to work out how many elements to collapse.)
x <- c("Pasted string:", "hello", "world!")
> sprintf("%s %s %s", x[1], x[2], x[3])
[1] "Pasted string: hello world!"
> paste(x, collapse=" ")
[1] "Pasted string: hello world!"
I'm looking for something like this (pseudo code)
> sprintf("<the-correct-parameter>", x)
[1] "Pasted string: hello world!"
For the interested: benchmark of sprintf vs. paste
require("microbenchmark")
t1 <- median(microbenchmark(sprintf("%s %s %s", x[1], x[2], x[3]))$time)
t2 <- median(microbenchmark(paste(x, collapse=" "))$time)
> t1/t2
[1] 0.7273114

The function sprintf recycles its format string, so for example the code
cat(sprintf("%8.4f",rnorm(5)),"\n")
prints something like
-0.5685 -0.6481 0.6296 -0.0043 -1.4763
Similarly,
str = sprintf("%8.4f",rnorm(5))
stores the output in a vector of strings and
str_one = paste(sprintf("%8.4f",rnorm(5)),collapse='')
stores the output in a single string. The format string does not need to specify the number of floats to be printed; sprintf returns one formatted string per element, so collapsing them into a single string is still left to paste. This also holds for printing integers and strings with the %d and %s formats.
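If the goal is to stay with sprintf for the final step, one possible workaround is to build the format string from the length of the vector and pass the elements with do.call. This is only a sketch: the helper name collapse_sprintf is made up for illustration, it assumes the separator contains no % character, and it assumes the vector is short enough for sprintf's limit on the number of arguments.

```r
# Hypothetical helper: collapse a character vector via sprintf.
collapse_sprintf <- function(x, sep = " ") {
  # Build one "%s" per element, e.g. "%s %s %s" for a length-3 vector.
  fmt <- paste(rep("%s", length(x)), collapse = sep)
  # Pass the format followed by each element as a separate argument.
  do.call(sprintf, c(list(fmt), as.list(x)))
}

x <- c("Pasted string:", "hello", "world!")
collapse_sprintf(x)
# Same result as the paste() call from the question:
identical(collapse_sprintf(x), paste(x, collapse = " "))
```

Note that the format itself is still assembled with paste, so this is mostly of interest when the same format can be reused across many calls.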

Related

Meaning of specific MATLAB instruction num2str

I have been sent a complete MATLAB script. It appears that the author of this code wrote a specific instruction using num2str and set_param.
What is the purpose of the following line:
['[' num2str(operating_point) ']']
I am especially interested in the purpose of the '[' syntax.
set_param(system_block_name, operating_point_name,...
['[' num2str(operating_point) ']'])
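In MATLAB, square brackets [] create a vector/matrix.
By default, a horizontal vector of strings is concatenated into one string:
['str1','str2'] % produces str1str2
So
['[',']'] % produces []
num2str() converts a number into a string:
a=10;
my_str = ['[',num2str(a),']'] % will assign my_str = '[10]'
set_param() (presumably the Simulink function) takes 3 arguments here: the block, the parameter name, and the value, where the 3rd one is your string. Because the value is passed as a string, wrapping num2str(operating_point) in '[' and ']' produces the text of a bracketed vector.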

How to split a string vector and recompose it in the original form

I would like to split a string vector, process its tokens, and then recompose it in the original form.
Please consider the following
vector.in <- c("red rum", "mur der", "red rum", "mur der")
length(vector.in)
# [1] 4
vector.splt <- strsplit(vector.in, "\\s")
vector.splt <- unlist(vector.splt)
vector.out <- paste(vector.splt, sep="", collapse=" ")
and of course
length(vector.out)
# [1] 1
How should I process it so as to output a vector with the same form and length as the original vector.in, that is, without losing any information?
The unlist is the problem. That removes the structure too early. Then you need to loop around the elements and pass to the paste function. I will use lapply for the loop:
vector.in <- c("red rum", "mur der", "red rum", "mur der")
vector.splt <- strsplit(vector.in, "\\s")
unlist(lapply(vector.splt, paste, collapse=' '))
## [1] "red rum" "mur der" "red rum" "mur der"
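To show where the "process its tokens" step fits, here is a small sketch that keeps the list structure, changes each element's tokens, and pastes them back together. The processing step (upper-casing the first token) is only an assumption chosen for illustration; vapply is used instead of unlist(lapply(...)) so the result is checked to be one string per input element.

```r
vector.in <- c("red rum", "mur der", "red rum", "mur der")
tokens <- strsplit(vector.in, "\\s")

# Process the tokens of each element, then recompose that element.
vector.out <- vapply(tokens, function(tok) {
  tok[1] <- toupper(tok[1])      # illustrative processing step
  paste(tok, collapse = " ")     # recompose one element
}, character(1))

vector.out
length(vector.out) == length(vector.in)
```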
The gsubfn function in the gsubfn package does that. For example, here we split the input into words, apply a function (represented in formula notation) to each word where in this case the function parenthesizes each word and then we put it all back together:
> library(gsubfn)
> gsubfn("\\w+", ~ sprintf("(%s)", x), vector.in)
[1] "(red) (rum)" "(mur) (der)" "(red) (rum)" "(mur) (der)"
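If adding a package is not an option, roughly the same effect can be had in base R with gregexpr and the replacement form of regmatches. This is a sketch of an alternative, not what gsubfn does internally.

```r
vector.in <- c("red rum", "mur der", "red rum", "mur der")
vector.out <- vector.in

# Locate every word, then replace each match with a parenthesized copy.
m <- gregexpr("\\w+", vector.out)
regmatches(vector.out, m) <- lapply(regmatches(vector.out, m),
                                    function(w) sprintf("(%s)", w))
vector.out
```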

Wrapping strings, but not substrings in quotes, using R

This question is related to my question about Roxygen.
I want to write a new function that does word wrapping of strings, similar to strwrap or stringr::str_wrap, but with the following twist: Any elements (substrings) in the string that are enclosed in quotes must not be allowed to wrap.
So, for example, using the following sample data
test <- "function(x=123456789, y=\"This is a long string argument\")"
cat(test)
function(x=123456789, y="This is a long string argument")
strwrap(test, width=40)
[1] "function(x=123456789, y=\"This is a long"
[2] "string argument\")"
I want the desired output of a newWrapFunction(x, width=40, ...) to be:
desired <- c("function(x=123456789, ", "y=\"This is a long string argument\")")
desired
[1] "function(x=123456789, "
[2] "y=\"This is a long string argument\")"
identical(desired, newWrapFunction(test, width=40))
[1] TRUE
Can you think of a way to do this?
PS. If you can help me solve this, I will propose this code as a patch to roxygen2. I have identified where this patch should be applied and will acknowledge your contribution.
Here's what I did to get strwrap so it would not break single quoted sections on spaces:
A) Pre-process the "even" sections after splitting by the single-quotes by substituting "~|~" for the spaces:
Define new function strwrapqt
....
zz <- strsplit(x, "\'") # will be only working on even numbered sections
for (i in seq_along(zz)) {
  # even positions only; empty when an element has no quoted section
  for (evens in seq_len(length(zz[[i]]) %/% 2) * 2) {
    zz[[i]][evens] <- gsub("[ ]", "~|~", zz[[i]][evens])
  }
}
zz <- unlist(zz)
.... insert just before
z <- lapply(strsplit) ...........
Then at the end replace all the "~|~" with spaces. It might be necessary to do a lot more thinking about the other sorts of whitespace "events" to get a fully regular treatment.
....
y <- gsub("~\\|~", " ", y)
....
Edit: Tested @joran's suggestion. Matching single and double quotes as pairs would be difficult with the methods I am using, but if one were willing to treat any quote as an equally valid separator, one could just use zz <- strsplit(x, "\'|\"") as the splitting criterion in the code above.
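The same placeholder idea can also be sketched as a standalone wrapper around strwrap, without editing strwrap itself. The function name wrap_keep_quotes and the default placeholder are made up for illustration; the sketch assumes the placeholder is a single character that does not occur in the input, and it does not handle escaped or nested quotes.

```r
# Hypothetical helper: wrap text without breaking inside quoted substrings.
wrap_keep_quotes <- function(x, width = 40, placeholder = "~") {
  # Assumption: placeholder is one character that is absent from x.
  stopifnot(nchar(placeholder) == 1L,
            !any(grepl(placeholder, x, fixed = TRUE)))
  # Locate "..." and '...' substrings.
  m <- gregexpr("\"[^\"]*\"|'[^']*'", x)
  # Hide the spaces inside each quoted substring from strwrap.
  regmatches(x, m) <- lapply(regmatches(x, m),
                             function(s) gsub(" ", placeholder, s, fixed = TRUE))
  wrapped <- strwrap(x, width = width)
  # Restore the hidden spaces.
  gsub(placeholder, " ", wrapped, fixed = TRUE)
}

test <- "function(x=123456789, y=\"This is a long string argument\")"
wrap_keep_quotes(test, width = 40)
```

A single-character placeholder keeps the width calculation unchanged, which a three-character marker such as "~|~" would not. Because strwrap drops trailing whitespace, the first element comes back without the trailing space shown in the question's desired output.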

How to read whitespace delimited strings until EOF in R

I am new to R and am having trouble reading a series of strings until I encounter EOF. Not only do I not know how to detect EOF, I also don't know how to read a single whitespace-delimited string, which is trivial in every other language I have seen so far. In C, I would simply do:
while (scanf("%s", s) == 1) { /* do something with s */ }
If possible, I would prefer a solution which does not require knowing the maximum length of strings in advance.
Any ideas?
EDIT: I am looking for a solution that does not store all the input in memory, but one that is equivalent or at least similar to the C code above.
Here's a way to read one item at a time... It uses the fact that scan has an nmax parameter (and n and nlines - it's actually kind of a mess!).
# First create a sample file to read from...
writeLines(c("Hello world", "and now", "Goodbye"), "foo.txt")
# Use a file connection to read from...
f <- file("foo.txt", "r")
i <- 0L
repeat {
  s <- scan(f, "", nmax=1, quiet=TRUE)
  if (length(s) == 0) break
  i <- i + 1L
  cat("Read item #", i, ": ", s, "\n", sep="")
}
close(f)
When scan encounters EOF, it returns a zero-length vector. So a more obscure but C-like way would be:
while (length(s <- scan(f, "", nmax=1, quiet=TRUE))) {
  i <- i + 1L
  cat("Read item #", i, ": ", s, "\n", sep="")
}
In any case, the output would be:
Read item #1: Hello
Read item #2: world
Read item #3: and
Read item #4: now
Read item #5: Goodbye
Finally, if you could vectorize what you do to the strings, you should probably try to read a bunch of them at a time - just change nmax to, say, 10000.
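The batched variant might look like the sketch below. The batch size of 2 is chosen only so that the tiny sample file needs several reads, and the per-batch work (counting items and characters) is a stand-in for whatever vectorized processing you actually need.

```r
# Create a small sample file in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("Hello world", "and now", "Goodbye"), path)

con <- file(path, "r")
n_items <- 0L
total_chars <- 0L
# Read up to 2 items per call; a zero-length result signals EOF.
while (length(batch <- scan(con, what = "", nmax = 2, quiet = TRUE)) > 0) {
  n_items <- n_items + length(batch)             # vectorized work on the batch
  total_chars <- total_chars + sum(nchar(batch))
}
close(con)

n_items
total_chars
```

Only one batch is held in memory at a time, so the memory use is bounded by the batch size rather than by the file size.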
> txt <- "This is an example" # could be from a file but will use textConnection()
> read.table(textConnection(txt))
V1 V2 V3 V4
1 This is an example
read.table is implemented with scan, so you can just look at the code to see how the experts did it.
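When a character vector of tokens is wanted rather than a data frame, scan can also be called on the text directly through its text argument. A minimal sketch:

```r
txt <- "This is an example"
# what = "" asks for character items; whitespace is the default separator.
words <- scan(text = txt, what = "", quiet = TRUE)
words
length(words)
```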

How do I put variable values into a text string in MATLAB?

I'm trying to write a simple function that takes two inputs, x and y, and passes these to three other simple functions that add, multiply, and divide them. The main function should then display the results as a string containing x, y, and the totals.
I think there's something I'm not understanding about output arguments. Anyway, here's my (pitiful) code:
function a=addxy(x,y)
a=x+y;
function b=mxy(x,y)
b=x*y;
function c=dxy(x,y)
c=x/y;
The main function is:
function [d e f]=answer(x,y)
d=addxy(x,y);
e=mxy(x,y);
f=dxy(x,y);
z=[d e f]
How do I get the values for x, y, d, e, and f into a string? I tried different matrices and stuff like:
['the sum of' x 'and' y 'is' d]
but none of the variables are showing up.
Two additional issues:
Why is the function returning "ans = 3" even though I didn't ask for the length of z?
If anyone could recommend a good book for beginners to MATLAB scripting I'd really appreciate it.
Here's how you convert numbers to strings, and join strings to other things (it's weird):
>> ['the number is ' num2str(15) '.']
ans =
the number is 15.
You can use fprintf/sprintf with familiar C syntax. Maybe something like:
fprintf('x = %d, y = %d \n x+y=%d \n x*y=%d \n x/y=%f\n', x,y,d,e,f)
Reading your comment, this is how you use your functions from the main program:
x = 2;
y = 2;
[d e f] = answer(x,y);
fprintf('%d + %d = %d\n', x,y,d)
fprintf('%d * %d = %d\n', x,y,e)
fprintf('%d / %d = %f\n', x,y,f)
Also for the answer() function, you can assign the output values to a vector instead of three distinct variables:
function result=answer(x,y)
result(1)=addxy(x,y);
result(2)=mxy(x,y);
result(3)=dxy(x,y);
and call it simply as:
out = answer(x,y);
As Peter and Amro illustrate, you have to convert numeric values to formatted strings first in order to display them or concatenate them with other character strings. You can do this using the functions FPRINTF, SPRINTF, NUM2STR, and INT2STR.
With respect to getting ans = 3 as an output, it is probably because you are not assigning the output from answer to a variable. If you want to get all of the output values, you will have to call answer in the following way:
[out1,out2,out3] = answer(1,2);
This will place the value d in out1, the value e in out2, and the value f in out3. When you do the following:
answer(1,2)
MATLAB will automatically assign the first output d (which has the value 3 in this case) to the default workspace variable ans.
With respect to suggesting a good resource for learning MATLAB, you shouldn't underestimate the value of the MATLAB documentation. I've learned most of what I know on my own using it. You can access it online, or within your copy of MATLAB using the functions DOC, HELP, or HELPWIN.
I just realized why I was having so much trouble - in MATLAB you can't store strings of different lengths as an array using square brackets. Using square brackets concatenates strings of varying lengths into a single character array.
>> a=['matlab','is','fun']
a =
matlabisfun
>> size(a)
ans =
1 11
In a character array, each character in a string counts as one element, which explains why the size of a is 1x11.
To store strings of varying lengths as elements of an array, you need to use curly braces to save as a cell array. In cell arrays, each string is treated as a separate element, regardless of length.
>> a={'matlab','is','fun'}
a =
'matlab' 'is' 'fun'
>> size(a)
ans =
1 3
I was looking for something along the lines of what you wanted, but wanted to put the result back into a variable.
So this is what I did (num2str is needed when x, y and d are numeric):
variable = ['hello this is x' num2str(x) ', this is now y' num2str(y) ', finally this is d:' num2str(d)]
basically
variable = [str1 str2 str3 str4 str5 str6]

Resources