Truncate string from a certain character in R [duplicate]

This question already has answers here:
How do I specify a dynamic position for the start of substring?
I have a list of strings in R which looks like:
WDN.TO
WDR.N
WDS.AX
WEC.AX
WEC.N
WED.TO
I want to get the suffix of each string, starting from the "." character. The result should look like:
.TO
.N
.AX
.AX
.N
.TO
Anyone have any ideas?

Joshua's solution works fine. I'd use sub instead of gsub, though: gsub replaces every occurrence of a pattern in a string, while sub replaces only the first. The pattern can be simplified a bit too:
> x <- c("WDN.TO","WDR.N","WDS.AX","WEC.AX","WEC.N","WED.TO")
> sub("^[^.]*", "", x)
[1] ".TO" ".N" ".AX" ".AX" ".N" ".TO"
But if the strings are as regular as in the question (always three characters before the "."), simply stripping the first 3 characters is enough:
> x <- c("WDN.TO","WDR.N","WDS.AX","WEC.AX","WEC.N","WED.TO")
> substring(x, 4)
[1] ".TO" ".N" ".AX" ".AX" ".N" ".TO"

Using gsub:
x <- c("WDN.TO","WDS.N")
# replace everything from the start of the string up to the last "." with "."
gsub("^.*\\.",".",x)
# [1] ".TO" ".N"
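The sub and gsub patterns above are not interchangeable: "^[^.]*" stops at the first ".", while the greedy "^.*\\." runs to the last one, and they also treat a string with no "." differently. A quick comparison, using two made-up inputs ("BRK.B.N" and "NODOT") that are not from the question:

```r
x <- c("WDN.TO", "BRK.B.N", "NODOT")  # last two entries are hypothetical
# everything from the first "." onward; a string without "." becomes ""
from_first <- sub("^[^.]*", "", x)
# greedy ".*" runs up to the last "."; a string without "." is left unchanged
from_last <- sub("^.*\\.", ".", x)
print(from_first)  # ".TO" ".B.N" ""
print(from_last)   # ".TO" ".N" "NODOT"
```

Which one is right depends on what "suffix" means for your data.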
Using strsplit:
# strsplit returns a list; use sapply to get the 2nd obs of each list element
y <- sapply(strsplit(x,"\\."), `[`, 2)
# since we split on ".", we need to put it back
paste(".",y,sep="")
# [1] ".TO" ".N"

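strsplit can do it as well, but two things need care. "." is a regex metacharacter that matches any character, so split with fixed = TRUE; and strsplit returns a list, so it cannot be indexed like a matrix with [, 2]. Use sapply to take the second piece of each element. Indexing with [[ instead of [ fails with "subscript out of bounds" on any string that contains no ".", while [ returns NA:
x <- c("WDN.TO","WDR.N","WDS.AX","WEC.AX","WEC.N","WED.TO")
y <- sapply(strsplit(x, ".", fixed = TRUE), `[`, 2)
# y: "TO" "N" "AX" "AX" "N" "TO"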

Related

how to position a string character in R

Suppose I have a string like:
x<-c("bv_bid_bayley_inf_development_f7r","bv_fci_family_care_indicator_f7r")
How can I find the position of the first "_" (a) and the last "_" (b) so that I can call substr(x, a, b) in R? The output should look like this:
bid_bayley_inf_development
fci_family_care_indicator
You can use regular expressions to extract the substring:
x <- c("bv_bid_bayley_inf_development_f7r", "bv_fci_family_care_indicator_f7r")
sub("[^_]*_(.*)_[^_]*", "\\1", x)
# [1] "bid_bayley_inf_development" "fci_family_care_indicator"
To get only the positions, use:
gregexpr("_", x)
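The positions from gregexpr can be fed straight into substr, which is what the question originally asked for. A minimal sketch, assuming every string contains at least two "_" (gregexpr returns -1 when there is no match, which this code does not handle):

```r
x <- c("bv_bid_bayley_inf_development_f7r", "bv_fci_family_care_indicator_f7r")
# one integer vector of match positions per string
pos <- gregexpr("_", x, fixed = TRUE)
a <- vapply(pos, min, integer(1))  # position of the first "_"
b <- vapply(pos, max, integer(1))  # position of the last "_"
# shift one character inward so the underscores themselves are dropped
res <- substr(x, a + 1, b - 1)
print(res)
```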

Automatic acronyms of strings in R

Long strings in plots aren't always attractive. What's the shortest way of making an acronym in R? E.g., "Hello world" to "HW", and preferably to have unique acronyms.
There's the function abbreviate, but it just removes some letters from the phrase instead of taking the first letter of each word.
An easy way would be to use a combination of strsplit, substr, and make.unique.
Here's an example function that can be written:
makeInitials <- function(charVec) {
make.unique(vapply(strsplit(toupper(charVec), " "),
function(x) paste(substr(x, 1, 1), collapse = ""),
vector("character", 1L)))
}
Test it out:
X <- c("Hello World", "Home Work", "holidays with children", "Hello Europe")
makeInitials(X)
# [1] "HW" "HW.1" "HWC" "HE"
That said, I do think that abbreviate should suffice, if you use some of its arguments:
abbreviate(X, minlength=1)
# Hello World Home Work holidays with children Hello Europe
# "HlW" "HmW" "hwc" "HE"
Using regex, you can do the following. The regex pattern ((?<=\\s).|^.) matches any character preceded by whitespace, or the first character of the string. Then we paste the resulting vectors together with the collapse argument to get an acronym built from first letters. And as Ananda suggested, if you want the acronyms to be unique, pass the result through make.unique.
X <- c("Hello World", "Home Work", "holidays with children")
sapply(regmatches(X, gregexpr(pattern = "((?<=\\s).|^.)", text = X, perl = TRUE)), paste, collapse = ".")
## [1] "H.W" "H.W" "h.w.c"
# If you want to make unique
make.unique(sapply(regmatches(X, gregexpr(pattern = "((?<=\\s).|^.)", text = X, perl = TRUE)), paste, collapse = "."))
## [1] "H.W" "H.W.1" "h.w.c"
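If you want "HW" rather than "H.W", a single gsub can collapse each word to its first character. This is a sketch that assumes the words are separated by whitespace and contain only letters or digits; punctuation such as commas would be left in the result:

```r
X <- c("Hello World", "Home Work", "holidays with children", "Hello Europe")
# keep the first word character of each word; drop the rest of the word and any spaces after it
initials <- gsub("\\b(\\w)\\w*\\s*", "\\1", X, perl = TRUE)
acronyms <- make.unique(toupper(initials))
print(acronyms)  # "HW" "HW.1" "HWC" "HE"
```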

Extract the first (or last) n characters of a string

I want to extract the first (or last) n characters of a string. This would be the equivalent to Excel's LEFT() and RIGHT(). A small example:
# create a string
a <- paste('left', 'right', sep = '')
a
# [1] "leftright"
I would like to produce b, a string which is equal to the first 4 letters of a:
b
# [1] "left"
What should I do?
See ?substr
R> substr(a, 1, 4)
[1] "left"
The stringr package provides the str_sub function, which is a bit easier to use than substr, especially for extracting the right-hand part of a string, because negative positions count from the end:
R> str_sub("leftright",1,4)
[1] "left"
R> str_sub("leftright",-5,-1)
[1] "right"
You can easily write right() and left() functions using base R:
right function
right = function (string, char) {
substr(string,nchar(string)-(char-1),nchar(string))
}
left function
left = function (string,char) {
substr(string,1,char)
}
You can use these two custom functions just like RIGHT() and LEFT() in Excel.
Hope you will find it useful
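Because substr and nchar are vectorised, these two helpers work on a whole character vector at once. A quick check (the second input, "abc", is a made-up short string); when n is larger than the string, the start position falls below 1, substr treats it as 1, and the whole string comes back:

```r
right <- function(string, char) {
  # start char - 1 positions before the end, stop at the last character
  substr(string, nchar(string) - (char - 1), nchar(string))
}
left <- function(string, char) {
  substr(string, 1, char)
}
s <- c("leftright", "abc")
print(left(s, 4))   # "left" "abc"
print(right(s, 5))  # "right" "abc"
```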
Keep it simple and use base R functions:
# To get the LEFT part:
> substr(a, 1, 4)
[1] "left"
>
# To get the MIDDLE part:
> substr(a, 3, 7)
[1] "ftrig"
>
# To get the RIGHT part:
> substr(a, 5, 10)
[1] "right"
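In substr(x, start, stop), start and stop are the positions of the first and last characters to extract. A stop past the end of the string, as in substr(a, 5, 10), simply stops at the last character.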
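Excel's MID(text, start_num, num_chars) takes a length rather than an end position. A hypothetical mid() helper (not part of base R) that converts the length into the stop argument of substr could look like this:

```r
# hypothetical helper mirroring Excel's MID(text, start_num, num_chars)
mid <- function(x, start, n) {
  substr(x, start, start + n - 1)  # stop = position of the last of the n characters
}
a <- paste("left", "right", sep = "")
print(mid(a, 3, 5))  # "ftrig", same as substr(a, 3, 7)
```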
For those coming from Microsoft Excel or Google Sheets, you would have seen functions like LEFT(), RIGHT(), and MID(). I have created a package known as forstringr and its development version is currently on GitHub.
if(!require("devtools")){
install.packages("devtools")
}
devtools::install_github("gbganalyst/forstringr")
library(forstringr)
str_left(): counts from the left and extracts n characters
str_right(): counts from the right and extracts n characters
str_mid(): extracts characters from the middle
Examples:
x <- "some text in a string"
str_left(x, 4)
[1] "some"
str_right(x, 6)
[1] "string"
str_mid(x, 6, 4)
[1] "text"

How to extract substrings from this string?

The string is
And I want to get substrings "11","1.1","282". Can anyone show me how to do this in R? Thanks!
I believe strsplit(x, " +")[[1]] will do it. The regular expression " +" denotes one or more spaces; strsplit applies to character vectors and returns a list with the split version of each element of the vector, so [[1]] extracts the first (and only) component.
> x = "11 1.1 282"
> res <- strsplit(x, " +")
> res
[[1]]
[1] "11" "1.1" "282"
>
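strsplit(x, " +") leaves an empty first element when the string starts with spaces. Extracting the numbers directly avoids that; this sketch assumes they are made up only of digits and dots (no signs or exponents) and uses a made-up input with leading spaces:

```r
x <- "  11 1.1 282"                    # hypothetical input with leading spaces
split_parts <- strsplit(x, " +")[[1]]  # "" "11" "1.1" "282"
# pull out runs of digits and dots instead of splitting on spaces
nums <- regmatches(x, gregexpr("[0-9.]+", x))[[1]]
print(nums)              # "11" "1.1" "282"
print(as.numeric(nums))  # the same values as numbers
```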

R: How can I replace let's say the 5th element within a string?

I would like to convert a string like be33szfuhm100060 into BESZFUHM0060.
In order to replace the small letters with capital letters I've so far used the gsub function.
test1=gsub("be","BE",test)
Is there a way to tell this function to replace the 3rd and 4th string element? If not, I would really appreciate if you could tell me another way to solve this problem. Maybe there is also a more general solution to change a string element at a certain position into a capital letter whatever the element is?
A couple of observations:
Converting a string to uppercase can be done with toupper, e.g.:
> toupper('be33szfuhm100060')
[1] "BE33SZFUHM100060"
You could use substr to extract a substring by character positions and paste to concatenate strings:
> x <- 'be33szfuhm100060'
> paste(substr(x, 1, 2), substr(x, 5, nchar(x)), sep='')
[1] "beszfuhm100060"
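For the "capital letter at a certain position" part of the question, base R also has the replacement form substr<-, which overwrites characters in place. It never changes the length of the string, so it can uppercase characters but cannot delete them:

```r
x <- "be33szfuhm100060"
# overwrite the 5th character with its uppercase version
substr(x, 5, 5) <- toupper(substr(x, 5, 5))
print(x)  # "be33Szfuhm100060"
```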
As an alternative, if you are going to be doing this a lot:
String <- function(x="") {
x <- as.character(paste(x, collapse=""))
class(x) <- c("String","character")
return(x)
}
"[.String" <- function(x,i,j,...,drop=TRUE) {
unlist(strsplit(x,""))[i]
}
"[<-.String" <- function(x,i,j,...,value) {
tmp <- x[]
tmp[i] <- String(value)
x <- String(tmp)
x
}
print.String <- function(x, ...) cat(x, "\n")
## try it out
> x <- String("be33szfuhm100060")
> x[3:4] <- character(0)
> x
beszfuhm100060
You can use substring to remove the third and fourth characters.
x <- "be33szfuhm100060"
paste(substring(x, 1, 2), substring(x, 5), sep = "")
If you know what portions of the string you want based on their position(s), use substr or substring. As I mentioned in my comment, you can use toupper to coerce characters to uppercase.
test <- "be33szfuhm100060"
paste( toupper(substr(test,1, 2)),
toupper(substr(test,5,10)),
substr(test,13,nchar(test)),sep="")
# [1] "BESZFUHM0060"
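A more general sketch for the last part of the question: two hypothetical helpers, upper_at() and drop_at() (not from any package), that split a single string into characters and work by position. They assume x is one string, not a vector:

```r
# hypothetical helpers; x must be a single string
upper_at <- function(x, pos) {
  chars <- strsplit(x, "")[[1]]      # one element per character
  chars[pos] <- toupper(chars[pos])  # uppercase only the chosen positions
  paste(chars, collapse = "")
}
drop_at <- function(x, pos) {
  if (length(pos) == 0) return(x)    # chars[-integer(0)] would drop everything
  chars <- strsplit(x, "")[[1]]
  paste(chars[-pos], collapse = "")  # negative indices remove those positions
}
test <- "be33szfuhm100060"
print(upper_at(test, 1:2))                      # "BE33szfuhm100060"
print(toupper(drop_at(test, c(3, 4, 11, 12))))  # "BESZFUHM0060"
```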
