Getting and removing the first character of a string

I would like to do some 2-dimensional walks using strings of characters by assigning different values to each character. I was planning to 'pop' the first character of a string, use it, and repeat for the rest of the string.
How can I achieve something like this?
x <- 'hello stackoverflow'
I'd like to be able to do something like this:
a <- x.pop[1]
print(a)
'h'
print(x)
'ello stackoverflow'

See ?substring.
x <- 'hello stackoverflow'
substring(x, 1, 1)
## [1] "h"
substring(x, 2)
## [1] "ello stackoverflow"
The idea of having a pop method that both returns a value and has a side effect of updating the data stored in x is very much a concept from object-oriented programming. So rather than defining a pop function to operate on character vectors, we can make a reference class with a pop method.
PopStringFactory <- setRefClass(
  "PopString",
  fields = list(
    x = "character"
  ),
  methods = list(
    initialize = function(x)
    {
      x <<- x
    },
    pop = function(n = 1)
    {
      if(nchar(x) == 0)
      {
        warning("Nothing to pop.")
        return("")
      }
      first <- substring(x, 1, n)
      x <<- substring(x, n + 1)
      first
    }
  )
)
x <- PopStringFactory$new("hello stackoverflow")
x
## Reference class object of class "PopString"
## Field "x":
## [1] "hello stackoverflow"
replicate(nchar(x$x), x$pop())
## [1] "h" "e" "l" "l" "o" " " "s" "t" "a" "c" "k" "o" "v" "e" "r" "f" "l" "o" "w"
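If you would rather avoid reference classes, a side-effect-free version is possible too. This is only a minimal sketch: pop_first is a hypothetical helper name (not part of base R), and the caller has to reassign the remainder explicitly.

```r
# Hypothetical helper: split a string into its first n characters and the rest.
# Nothing is modified in place; the caller decides what to do with both parts.
pop_first <- function(s, n = 1L) {
  list(first = substring(s, 1, n), rest = substring(s, n + 1))
}

x <- "hello stackoverflow"
res <- pop_first(x)
a <- res$first   # the popped character
x <- res$rest    # reassign to "consume" the character
```

After these lines, a holds "h" and x holds "ello stackoverflow", which is the behaviour asked for in the question, at the cost of one explicit assignment.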

There is also str_sub from the stringr package:
library(stringr)
x <- 'hello stackoverflow'
str_sub(x, 2) # or
str_sub(x, 2, str_length(x))
## [1] "ello stackoverflow"

Use the stri_sub function from the stringi package:
> library(stringi)
> x <- 'hello stackoverflow'
> stri_sub(x,2)
[1] "ello stackoverflow"

substring is definitely best, but here's one strsplit alternative, since I haven't seen one yet.
> x <- 'hello stackoverflow'
> strsplit(x, '')[[1]][1]
## [1] "h"
or equivalently
> unlist(strsplit(x, ''))[1]
## [1] "h"
And you can paste the rest of the string back together.
> paste0(strsplit(x, '')[[1]][-1], collapse = '')
## [1] "ello stackoverflow"

Removing the first character:
x <- 'hello stackoverflow'
substring(x, 2, nchar(x))
The idea is to select all characters from position 2 up to the number of characters in x. This matters when the words or phrases have different lengths.
Selecting the first letter is trivial, as in the previous answers:
substring(x,1,1)
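To illustrate the point about unequal lengths, here is a small sketch on a vector of strings. The vector words is made up for the example; substring and nchar are both vectorised, so each element gets its own end position.

```r
# Example input with strings of different lengths (made up for illustration)
words <- c("hello", "hi", "stackoverflow")

# First character of each element
firsts <- substring(words, 1, 1)

# Everything after the first character, using each element's own length
rests <- substring(words, 2, nchar(words))
```

Here firsts is "h" "h" "s" and rests is "ello" "i" "tackoverflow".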

Another alternative is to use capturing sub-expressions with the regular expression functions regmatches and regexec.
# the original example
x <- 'hello stackoverflow'
# grab the substrings
myStrings <- regmatches(x, regexec('(^.)(.*)', x))
This returns the entire string, the first character, and the "popped" result in a list of length 1.
myStrings
[[1]]
[1] "hello stackoverflow" "h" "ello stackoverflow"
which is equivalent to list(c(x, substr(x, 1, 1), substr(x, 2, nchar(x)))). That is, it contains a superset of the desired elements: the full string plus the two pieces.
Adding sapply will allow this method to work for a character vector of length > 1.
# a slightly more interesting example
xx <- c('hello stackoverflow', 'right back', 'at yah')
# grab the substrings
myStrings <- regmatches(xx, regexec('(^.)(.*)', xx))
This returns a list with the matched full string as the first element and the matching subexpressions captured by () as the following elements. So in the regular expression '(^.)(.*)', (^.) matches the first character and (.*) matches the remaining characters.
myStrings
[[1]]
[1] "hello stackoverflow" "h" "ello stackoverflow"
[[2]]
[1] "right back" "r" "ight back"
[[3]]
[1] "at yah" "a" "t yah"
Now, we can use the trusty sapply + [ method to pull out the desired substrings.
myFirstStrings <- sapply(myStrings, "[", 2)
myFirstStrings
[1] "h" "r" "a"
mySecondStrings <- sapply(myStrings, "[", 3)
mySecondStrings
[1] "ello stackoverflow" "ight back" "t yah"

Another way using the sub function.
sub('(^.).*', '\\1', 'hello stackoverflow')
[1] "h"
sub('(^.)(.*)', '\\2', 'hello stackoverflow')
[1] "ello stackoverflow"
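Since sub is vectorised, the same idea should carry over to a character vector without any sapply. A minimal sketch, reusing the example vector from the regmatches answer:

```r
xx <- c("hello stackoverflow", "right back", "at yah")

# Keep only the first character of each element
first_chars <- sub("^(.).*", "\\1", xx)

# Drop the first character of each element
remainders <- sub("^.", "", xx)
```

This gives "h" "r" "a" for first_chars and "ello stackoverflow" "ight back" "t yah" for remainders.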

Related

strsplit with vertical bar (pipe)

Here,
> r<-c("AAandBB", "BBandCC")
> strsplit(as.character(r),'and')
[[1]]
[1] "AA" "BB"
[[2]]
[1] "BB" "CC"
Working well, but
> r<-c("AA|andBB", "BB|andCC")
> strsplit(as.character(r),'|and')
[[1]]
[1] "A" "A" "|" "" "B" "B"
[[2]]
[1] "B" "B" "|" "" "C" "C"
Here, the answer is not correct. How to get "AA" and "BB", when I use '|and'?
Thanks in advance.
As you can read on ?strsplit, the argument split in function strsplit is a regular expression. Hence either you need to escape the vertical bar (it is a special character)
strsplit(r,split='\\|and')
or you can choose fixed=TRUE to indicate that split is not a regular expression
strsplit(r,split='|and',fixed=TRUE)
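The reason the unescaped version misbehaves is that | means alternation in a regular expression, so '|and' reads roughly as "the empty string or and", and the empty alternative matches between every character. A quick sketch to check that both fixes agree on the example data:

```r
r <- c("AA|andBB", "BB|andCC")

# Option 1: escape the pipe so it is matched literally
escaped <- strsplit(r, split = "\\|and")

# Option 2: treat the whole split string as a literal, not a regex
literal <- strsplit(r, split = "|and", fixed = TRUE)

same_result <- identical(escaped, literal)
```

Both calls return a list with "AA" "BB" and "BB" "CC", so same_result is TRUE.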

Automatic acronyms of strings in R

Long strings in plots aren't always attractive. What's the shortest way of making an acronym in R? E.g., "Hello world" to "HW", and preferably to have unique acronyms.
There's function abbreviate, but it just removes some letters from the phrase, instead of taking first letters of each word.
An easy way would be to use a combination of strsplit, substr, and make.unique.
Here's an example function that can be written:
makeInitials <- function(charVec) {
  make.unique(vapply(strsplit(toupper(charVec), " "),
                     function(x) paste(substr(x, 1, 1), collapse = ""),
                     vector("character", 1L)))
}
Test it out:
X <- c("Hello World", "Home Work", "holidays with children", "Hello Europe")
makeInitials(X)
# [1] "HW" "HW.1" "HWC" "HE"
That said, I do think that abbreviate should suffice, if you use some of its arguments:
abbreviate(X, minlength=1)
# Hello World Home Work holidays with children Hello Europe
# "HlW" "HmW" "hwc" "HE"
Using regex you can do the following. The pattern ((?<=\\s).|^.) matches any character that is preceded by whitespace (a lookbehind, hence perl = TRUE), or the first character of the string. Then we just paste the resulting vectors with the collapse argument to get an acronym built from the first letters. And as Ananda suggested, if you want the acronyms to be unique, pass the result through make.unique.
X <- c("Hello World", "Home Work", "holidays with children")
sapply(regmatches(X, gregexpr(pattern = "((?<=\\s).|^.)", text = X, perl = T)), paste, collapse = ".")
## [1] "H.W" "H.W" "h.w.c"
# If you want to make unique
make.unique(sapply(regmatches(X, gregexpr(pattern = "((?<=\\s).|^.)", text = X, perl = T)), paste, collapse = "."))
## [1] "H.W" "H.W.1" "h.w.c"
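To see what the pattern picks up before anything is pasted together, here is a small sketch on a single string. It also upper-cases the result and drops the dots, which is a choice made for this example rather than part of the answer above.

```r
X1 <- "holidays with children"

# Characters matched by the pattern: the first character of the string,
# plus every character that directly follows whitespace
m <- regmatches(X1, gregexpr("((?<=\\s).|^.)", X1, perl = TRUE))[[1]]

# Collapse without a separator and convert to upper case
acronym <- toupper(paste(m, collapse = ""))
```

m contains "h" "w" "c", and acronym is "HWC".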

Password generator function in R

I am looking for a smart way to code a password generator function in R:
generate.password (length, capitals, numbers)
length: the length of the password
capitals: a vector defining where capitals shall occur; its values are positions in the password string; the default should be no capitals
numbers: a vector defining where numbers shall occur; its values are positions in the password string; the default should be no numbers
Examples:
generate.password(8)
[1] "hqbfpozr"
generate.password(length=8, capitals=c(2,4))
[1] "hYbFpozr"
generate.password(length=8, capitals=c(2,4), numbers=c(7:8))
[1] "hYbFpo49"
There is a function that generates random strings in the stringi package (version >= 0.2-3):
require(stringi)
stri_rand_strings(n=2, length=8, pattern="[A-Za-z0-9]")
## [1] "90i6RdzU" "UAkSVCEa"
So using different patterns you can generate parts for your desired password and then paste it like this:
x <- stri_rand_strings(n=4, length=c(2,1,2,3), pattern=c("[a-z]","[A-Z]","[0-9]","[a-z]"))
x
## [1] "ex" "N" "81" "tsy"
stri_flatten(x)
## [1] "exN81tsy"
Here's one approach
generate.password <- function(length,
                              capitals = integer(0),
                              numbers = integer(0)) {
  stopifnot(is.numeric(length), length > 0L,
            is.numeric(capitals), capitals > 0L, capitals <= length,
            is.numeric(numbers), numbers > 0L, numbers <= length,
            length(intersect(capitals, numbers)) == 0L)
  lc <- sample(letters, length, replace = TRUE)
  uc <- sample(LETTERS, length(capitals), replace = TRUE)
  num <- sample(0:9, length(numbers), replace = TRUE)
  pass <- lc
  pass[capitals] <- uc
  pass[numbers] <- num
  paste0(pass, collapse = "")
}
## Examples
set.seed(1)
generate.password(8)
# [1] "gjoxfxyr"
set.seed(1)
generate.password(length=8, capitals=c(2,4))
# [1] "gQoBfxyr"
set.seed(1)
generate.password(length=8, capitals=c(2,4), numbers=c(7:8))
# [1] "gQoBfx21"
You can also add other special characters in the same fashion. The calls to sample use replace = TRUE, so letters and digits may repeat within a password.
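Because the exact characters depend on the random seed and R version, it can be easier to check the structure of a generated password than its literal value. The following is only a sketch: check_password is a hypothetical helper, and it is tried here on the example strings from the question.

```r
# Hypothetical helper: TRUE if pw has capitals and digits exactly at the
# requested positions and lower-case letters everywhere else.
check_password <- function(pw, capitals = integer(0), numbers = integer(0)) {
  ch <- strsplit(pw, "")[[1]]
  lower <- setdiff(seq_along(ch), c(capitals, numbers))
  all(ch[capitals] %in% LETTERS) &&
    all(ch[numbers] %in% as.character(0:9)) &&
    all(ch[lower] %in% letters)
}

ok_plain <- check_password("hqbfpozr")
ok_mixed <- check_password("hYbFpo49", capitals = c(2, 4), numbers = 7:8)
bad_case <- check_password("hYbFpo49", capitals = c(1, 2))
```

ok_plain and ok_mixed are TRUE, while bad_case is FALSE because position 1 is not a capital.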
I liked the solution given by @Hadd E. Nuff. What I did was include digits between 0 and 9 at random. Here is the modified solution:
generate.password <- function(LENGTH){
  punct <- c("!", "#", "$", "%", "&", "(", ")", "*", "+", "-", "/", ":",
             ";", "<", "=", ">", "?", "@", "[", "^", "_", "{", "|", "}", "~")
  nums <- c(0:9)
  chars <- c(letters, LETTERS, punct, nums)
  # sampling weights: 52 letters, 25 punctuation marks, 10 digits
  p <- c(rep(0.0105, 52), rep(0.0102, 25), rep(0.02, 10))
  pword <- paste0(sample(chars, LENGTH, TRUE, prob = p), collapse = "")
  return(pword)
}
generate.password(8)
This will generate very strong passwords like:
"C2~mD20U" # 8 alpha-numeric-specialchar
"+J5Gi3" # 6 alpha-numeric-specialchar
"77{h6RsGQJ66if5" # 15 alpha-numeric-specialchar

How to extract substrings from this string?

The string is
And I want to get substrings "11","1.1","282". Can anyone show me how to do this in R? Thanks!
I believe strsplit(x," +")[[1]] will do it. (The regular expression " +" denotes one or more spaces; strsplit applies to character vectors and returns a list with the split version of each element in the vector, so [[1]] extracts the first (and only) component.)
> x = "11 1.1 282"
> res <- strsplit(x, " +")
> res
[[1]]
[1] "11" "1.1" "282"
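If the pieces are needed as numbers rather than strings, as.numeric can be applied to the result. A minimal sketch, assuming the values are separated by runs of spaces (the exact spacing below is made up for the example):

```r
# Assumed input: values separated by one or more spaces
x <- "11   1.1   282"

# Split on runs of spaces and take the only list element
parts <- strsplit(x, " +")[[1]]

# Convert the substrings to numeric values
nums <- as.numeric(parts)
```

parts is "11" "1.1" "282" and nums holds the corresponding numeric values.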

R: How can I replace let's say the 5th element within a string?

I would like to convert a string like be33szfuhm100060 into BESZFUHM0060.
In order to replace the small letters with capital letters I've so far used the gsub function.
test1=gsub("be","BE",test)
Is there a way to tell this function to replace the 3rd and 4th string element? If not, I would really appreciate if you could tell me another way to solve this problem. Maybe there is also a more general solution to change a string element at a certain position into a capital letter whatever the element is?
A couple of observations:
Converting a string to uppercase can be done with toupper, e.g.:
> toupper('be33szfuhm100060')
[1] "BE33SZFUHM100060"
You could use substr to extract a substring by character positions and paste to concatenate strings:
> x <- 'be33szfuhm100060'
> paste(substr(x, 1, 2), substr(x, 5, nchar(x)), sep='')
[1] "beszfuhm100060"
As an alternative, if you are going to be doing this a lot:
String <- function(x="") {
  x <- as.character(paste(x, collapse=""))
  class(x) <- c("String","character")
  return(x)
}
"[.String" <- function(x,i,j,...,drop=TRUE) {
  unlist(strsplit(x,""))[i]
}
"[<-.String" <- function(x,i,j,...,value) {
  tmp <- x[]
  tmp[i] <- String(value)
  x <- String(tmp)
  x
}
print.String <- function(x, ...) cat(x, "\n")
## try it out
> x <- String("be33szfuhm100060")
> x[3:4] <- character(0)
> x
beszfuhm100060
You can use substring to remove the third and fourth elements.
x <- "be33szfuhm100060"
paste(substring(x, 1, 2), substring(x, 5), sep = "")
If you know what portions of the string you want based on their position(s), use substr or substring. As I mentioned in my comment, you can use toupper to coerce characters to uppercase.
paste( toupper(substr(test,1, 2)),
toupper(substr(test,5,10)),
substr(test,12,nchar(test)),sep="")
# [1] "BESZFUHM00060"
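For the more general part of the question (capitalising whatever character sits at a given position), the replacement form of substr can be used. This is just a sketch; upper_at is a hypothetical helper name.

```r
# Hypothetical helper: upper-case the characters of s at the given positions,
# leaving everything else untouched.
upper_at <- function(s, pos) {
  for (p in pos) {
    substr(s, p, p) <- toupper(substr(s, p, p))
  }
  s
}

res1 <- upper_at("be33szfuhm100060", 1:2)
res2 <- upper_at("be33szfuhm100060", c(5, 10))
```

res1 is "BE33szfuhm100060" and res2 is "be33SzfuhM100060". Digits pass through unchanged because toupper leaves them as they are.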

Resources