Making a string concatenation operator in R - string

I was wondering how one might go about writing a string concatenation operator in R, something like || in SAS, + in Java/C# or & in Visual Basic.
The easiest way would be to create a special operator using %, like
`%+%` <- function(a, b) paste(a, b, sep="")
but this leads to lots of ugly %'s in the code.
I noticed that + is defined in the Ops group, and you can write S4 methods for that group, so perhaps something like that would be the way to go. However, I have no experience with S4 language features at all. How would I modify the above function to use S4?

As others have mentioned, you cannot override the sealed S4 method "+". However, you do not need to define a new class in order to define an addition function for strings. Defining a new class is not ideal anyway, since it forces you to convert strings to that class, which leads to more ugly code. Instead, you can simply overwrite the "+" function:
"+" = function(x,y) {
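  # Unary plus (e.g. +5) has no second argument, so pass it straight through
  if (missing(y)) return(.Primitive("+")(x))
  if (is.character(x) || is.character(y)) {
    return(paste(x, y, sep = ""))
  } else {
    .Primitive("+")(x, y)
  }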
}
Then the following should all work as expected:
1 + 4
1:10 + 4
"Help" + "Me"
This solution feels a bit like a hack, since you are no longer using formal methods, but it's the only way to get the exact behavior you wanted.

I'll try this (a relatively cleaner S3 solution):
`+` <- function (e1, e2) UseMethod("+")
`+.default` <- function (e1, e2) .Primitive("+")(e1, e2)
`+.character` <- function(e1, e2)
if(length(e1) == length(e2)) {
paste(e1, e2, sep = '')
} else stop('String Vectors of Different Lengths')
The code above changes + into a generic, sets +.default to the original +, and then adds a new +.character method.

You can also use S3 classes for this:
String <- function(x) {
class(x) <- c("String", class(x))
x
}
"+.String" <- function(x,...) {
x <- paste(x, paste(..., sep="", collapse=""), sep="", collapse="")
String(x)
}
print.String <- function(x, ...) cat(x)
x <- "The quick brown "
y <- "fox jumped over "
z <- "the lazy dog"
String(x) + y + z

If R fully complied with S4, the following would have been enough:
setMethod("+",
signature(e1 = "character", e2 = "character"),
function (e1, e2) {
paste(e1, e2, sep = "")
})
But this gives an error that the method is sealed :((. Hopefully this will change in future versions of R.
The best you can do is define a new class "string" that behaves exactly like the "character" class:
setClass("string", contains="character")
string <- function(obj) new("string", as.character(obj))
and define the most general method which R allows:
setMethod("+", signature(e1 = "character", e2 = "ANY"),
function (e1, e2) string(paste(e1, as.character(e2), sep = "")))
now try:
tt <- string(44444)
tt
#An object of class "string"
#[1] "44444"
tt + 3434
#[1] "444443434"
"sfds" + tt
#[1] "sfds44444"
tt + tt
#[1] "4444444444"
343 + tt
#Error in 343 + tt : non-numeric argument to binary operator
"sdfs" + tt + "dfsd"
#An object of class "string"
#[1] "sdfs44444dfsd"
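For comparison, the same "wrapper class whose + coerces the other operand" idea can be sketched in Python. This is only an analogue of the S4 class above, not a translation of it: the class name `ConcatStr` is made up for illustration, and unlike the R version it also handles a number on the left-hand side through the reflected method.

```python
class ConcatStr(str):
    """Hypothetical str subclass whose + turns the other operand into text."""

    def __add__(self, other):
        # ConcatStr + anything: coerce the right operand, keep the subclass
        return ConcatStr(str.__add__(self, str(other)))

    def __radd__(self, other):
        # anything + ConcatStr: coerce the left operand, keep the subclass
        return ConcatStr(str(other) + str(self))


tt = ConcatStr(44444)
print(tt + 3434)
print("sfds" + tt)
print(343 + tt)
print("sdfs" + tt + "dfsd")
```

As in the R answer, plain strings still need to be wrapped once before the operator takes on the new meaning.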

You have given yourself the correct answer -- everything in R is a function, and you cannot define new operators. So %+% is as good as it gets.

Related

How to convert the first char to lower case

I am playing with strings and have strings like "Hello.Word" or "stackOver.Flow",
and I want to convert the first char of each dot-separated part to lower case: "hello.word" and "stackOver.flow".
For snake_case it is easy: we only need to change upper case to lower case and add '_',
but for camelCase (with the first char in lower case) I don't know how to do this.
open System
let convertToSnakeCase (value:string) =
    String [|
        Char.ToLower value.[0]
        for ch in value.[1..] do
            if Char.IsUpper ch then '_'
            Char.ToLower ch |]
Who can help?
module Identifier =
    open System
    let changeCase (str : string) =
        if String.IsNullOrEmpty(str) then str
        else
            let isUpper = Char.IsUpper
            let n = str.Length
            let builder = new System.Text.StringBuilder()
            let append (s:string) = builder.Append(s) |> ignore
            let rec loop i j =
                let k =
                    if i = n || (isUpper str.[i] && (not (isUpper str.[i - 1])
                                 || ((i + 1) <> n && not (isUpper str.[i + 1]))))
                    then
                        if j = 0 then
                            append (str.Substring(j, i - j).ToLower())
                        elif (i - j) > 2 then
                            append (str.Substring(j, 1))
                            append (str.Substring(j + 1, i - j - 1).ToLower())
                        else
                            append (str.Substring(j, i - j))
                        i
                    else
                        j
                if i = n then builder.ToString()
                else loop (i + 1) k
            loop 1 0
    type System.String with
        member x.ToCamelCase() = changeCase x
    printfn "%s" ("StackOver.Flow".ToCamelCase()) //stackOver.Flow
    //need stackOver.flow
I suspect there are much more elegant and concise solutions. I sense you are learning functional programming, so I think it's best to do this with a recursive function rather than some magic library function. I notice that in your question you ARE using a recursive function, but also an index into an array. Lists and recursive functions work together much more easily than arrays do, so if you use recursion the solution is usually simpler with a list.
I'd also avoid using a string builder. Assuming you are learning FP, string builders are imperative, and whilst they obviously work, they won't help you get your head around using immutable data.
The key then is to use a pattern match to detect the scenario that should trigger the upper/lower case logic, as it depends on 2 consecutive characters.
I THINK you want this to happen for the 1st char, and after a '.'?
(I've inserted a '.' as the 1st char to allow the recursive function to just process the '.' scenario, rather than making a special case).
open System
let convertToCamelCase (value : string) =
    let rec convertListToCamelCase (value : char list) =
        match value with
        | [] -> []
        | '.' :: second :: rest ->
            '.' :: convertListToCamelCase (Char.ToLower second :: rest)
        | c :: rest ->
            c :: convertListToCamelCase rest
    // put a '.' on the front to simplify the logic (and take it off after)
    let convertAsList = convertListToCamelCase ('.' :: (value.ToCharArray() |> Array.toList))
    String ((convertAsList |> List.toArray).[1..])
The piece to worry about is the recursive piece; the rest of it is just flipping an array to a list and back again.
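The rule itself (lower-case the first character, and any character that directly follows a '.') can also be written without recursion. Here is a rough Python sketch of that rule using split and join rather than the list pattern match above; the function name is made up for illustration.

```python
def lower_first_of_each_part(value: str) -> str:
    """Lower-case the first character of every dot-separated part."""
    parts = value.split(".")
    # part[:1] is safe for empty parts (e.g. a leading or doubled '.')
    return ".".join(part[:1].lower() + part[1:] for part in parts)


print(lower_first_of_each_part("Hello.Word"))
print(lower_first_of_each_part("StackOver.Flow"))
```

Splitting on '.' plays the same role as the trick of prepending a '.': every part, including the first, is handled by the same rule.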

R: combinatorial string replacement

I am on the lookout for a gsub based function which would enable me to do combinatorial string replacement, so that if I would have an arbitrary number of string replacement rules
replrules=list("<x>"=c(3,5),"<ALK>"=c("hept","oct","non"),"<END>"=c("ane","ene"))
and a target string
string="<x>-methyl<ALK><END>"
it would give me a dataframe with the final string name and the substitutions that were made as in
name x ALK END
3-methylheptane 3 hept ane
5-methylheptane 5 hept ane
3-methyloctane 3 oct ane
5-methyloctane 5 ... ...
3-methylnonane 3
5-methylnonane 5
3-methylheptene 3
5-methylheptene 5
3-methyloctene 3
5-methyloctene 5
3-methylnonene 3
5-methylnonene 5
The target string would be of arbitrary structure, e.g. it could also be string="1-<ALK>anol" or each pattern could occur several times, as in string="<ALK>anedioic acid, di<ALK>yl ester"
What would be the most elegant way to do this kind of thing in R?
How about
d <- do.call(expand.grid, replrules)
d$name <- paste0(d$'<x>', "-", "methyl", d$'<ALK>', d$'<END>')
EDIT
This seems to work (substituting each of these into the strsplit call)
string = "<x>-methyl<ALK><END>"
string2 = "<x>-ethyl<ALK>acosane"
string3 = "1-<ALK>anol"
Using Richard's regex
d <- do.call(expand.grid, list(replrules, stringsAsFactors=FALSE))
names(d) <- gsub("<|>","",names(d))
s <- strsplit(string3, "(<|>)", perl = TRUE)[[1]]
out <- list()
for(i in s) {
out[[i]] <- ifelse (i %in% names(d), d[i], i)
}
d$name <- do.call(paste0, unlist(out, recursive=F))
EDIT
This should work for repeat items
d <- do.call(expand.grid, list(replrules, stringsAsFactors=FALSE))
names(d) <- gsub("<|>","",names(d))
string4 = "<x>-methyl<ALK><END>oate<ALK>"
s <- strsplit(string4, "(<|>)", perl = TRUE)[[1]]
out <- list()
for(i in seq_along(s)) {
out[[i]] <- ifelse (s[i] %in% names(d), d[s[i]], s[i])
}
d$name <- do.call(paste0, unlist(out, recursive=F))
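The underlying idea (build every combination of replacement values, then substitute each combination into the template) is language independent. Below is a minimal Python sketch of it, assuming the placeholders are literal "<...>" keys as in the question; `expand_template` is a hypothetical helper name, and the rows come out with the last rule varying fastest, which is the opposite of expand.grid's ordering.

```python
from itertools import product


def expand_template(template, rules):
    """Return one dict per combination: the filled-in name plus the values used."""
    keys = list(rules)
    rows = []
    for combo in product(*(rules[key] for key in keys)):
        name = template
        for key, value in zip(keys, combo):
            # str.replace substitutes every occurrence, so repeated
            # placeholders such as "<ALK>...<ALK>" are handled as well
            name = name.replace(key, str(value))
        row = {"name": name}
        row.update({key.strip("<>"): value for key, value in zip(keys, combo)})
        rows.append(row)
    return rows


replrules = {"<x>": [3, 5], "<ALK>": ["hept", "oct", "non"], "<END>": ["ane", "ene"]}
for row in expand_template("<x>-methyl<ALK><END>", replrules):
    print(row)
```

Because the substitution is a plain text replace, this sketch assumes no replacement value itself contains a placeholder.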
Well, I'm not exactly sure we can even produce a "correct" answer to your question, but hopefully this helps give you some ideas.
Okay, so in s, I just split the string where it might be of most importance. Then g gets the first value in each element of replrules. Then I constructed a one-row data frame, dat, as an example of how the result would look.
> (s <- strsplit(string, "(?<=l|\\>)", perl = TRUE)[[1]])
# [1] "<x>" "-methyl" "<ALK>" "<END>"
> g <- sapply(replrules, "[", 1)
> dat <- data.frame(name = paste(append(g, s[2], after = 1), collapse = ""))
> dat[2:4] <- g
> names(dat)[2:4] <- sapply(strsplit(names(g), "<|>"), "[", -1)
> dat
# name x ALK END
# 1 3-methylheptane 3 hept ane

A calculator using Lua string matching

I've recently been playing around with string manipulation to try to make a calculator that takes only one string and returns an answer. I know I could simply use loadstring to do this, but I am trying to learn more about string manipulation. This is what I have so far. Is there any way I can make it more efficient?
function calculate(exp)
local x, op, y =
string.match(exp, "^%d"),
string.match(exp, " %D"),
string.match(exp, " %d$")
x, y = tonumber(x), tonumber(y)
op = op:sub(string.len(op))
if (op == "+") then
return x + y
elseif (op == "-") then
return x - y
elseif (op == "*") then
return x * y
elseif (op == "/") then
return x / y
else
return 0
end
end
print(calculate("5 + 5"))
You can use captures in the matching pattern to reduce the number of calls to string.match().
local x, op, y = string.match(exp, "^(%d) (%D) (%d)$")
This also eliminates the need to trim the op result.
The conversion tonumber() does not need to be called for x and y. These will automatically be converted when used with the numeric operators.
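The single-match-with-captures approach looks much the same in other languages. As a rough Python sketch (using the `re` module rather than Lua patterns), with two changes that are my own assumptions and not part of the answer above: operands may have more than one digit, and malformed input raises an error instead of returning 0.

```python
import operator
import re

# Map each operator symbol to the function that implements it
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

# One pattern, three captures: left operand, operator, right operand
PATTERN = re.compile(r"^(\d+) ([-+*/]) (\d+)$")


def calculate(expression):
    match = PATTERN.match(expression)
    if match is None:
        raise ValueError("expected '<number> <op> <number>', got %r" % expression)
    x, op, y = match.groups()
    # Python does not coerce strings in arithmetic, so convert explicitly
    return OPS[op](int(x), int(y))


print(calculate("5 + 5"))
```

A lookup table keyed by the operator symbol replaces the if/elseif chain, which keeps the function short as more operators are added.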

Optional capture of balanced brackets in Lua

Let's say I have lines of the form:
int[4] height
char c
char[50] userName
char[50+foo("bar")] userSchool
As you see, the bracketed expression is optional.
Can I parse these strings using Lua's string.match() ?
The following pattern works for lines that contain brackets:
line = "int[4] height"
print(line:match('^(%w+)(%b[])%s+(%w+)$'))
But is there a pattern that can handle also the optional brackets? The following does not work:
line = "char c"
print(line:match('^(%w+)(%b[]?)%s+(%w+)$'))
Can the pattern be written in another way to solve this?
Unlike in regular expressions, the ? quantifier in a Lua pattern applies only to a single character class. It cannot be applied to %b[] or to a capture group.
You can use the or operator to do the job like this:
line:match('^(%w+)(%b[])%s+(%w+)$') or line:match('^(%w+)%s+(%w+)$')
A small problem with this is that the or expression keeps only the first value of each operand, so only the first capture survives. Depending on your needs, either use an if statement, or wrap the entire pattern in an extra capture so that the whole match is returned first, like this:
print(line:match('^((%w+)(%b[])%s+(%w+))$') or line:match('^((%w+)%s+(%w+))$'))
LPeg may be more appropriate for your case, especially if you plan to expand your grammar.
local re = require're'
local p = re.compile( [[
prog <- stmt* -> set
stmt <- S { type } S { name }
type <- name bexp ?
bexp <- '[' ([^][] / bexp)* ']'
name <- %w+
S <- %s*
]], {set = function(...)
local t, args = {}, {...}
for i=1, #args, 2 do t[args[i+1]] = args[i] end
return t
end})
local s = [[
int[4] height
char c
char[50] userName
char[50+foo("bar")] userSchool
]]
for k, v in pairs(p:match(s)) do print(k .. ' = ' .. v) end
--[[
c = char
userSchool = char[50+foo("bar")]
height = int[4]
userName = char[50]
--]]
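If neither Lua patterns nor LPeg are available, the optional balanced bracket can also be handled with a small hand-written scan. The following Python sketch is one way to do that under simple assumptions: `parse_declaration` is a made-up name, it returns the type (including any bracket part) and the variable name, and, like %b[], it counts brackets without understanding string literals. Note that Python's \w also accepts underscores, which Lua's %w does not.

```python
import re


def parse_declaration(line):
    """Return (type_text, name), or None if the line does not fit the format."""
    head = re.match(r"\w+", line)
    if head is None:
        return None
    pos = head.end()
    # Optional part: a balanced [...] group directly after the type name
    if pos < len(line) and line[pos] == "[":
        depth = 0
        for i in range(pos, len(line)):
            if line[i] == "[":
                depth += 1
            elif line[i] == "]":
                depth -= 1
                if depth == 0:
                    pos = i + 1
                    break
        else:
            return None  # ran off the end: brackets are unbalanced
    tail = re.fullmatch(r"\s+(\w+)", line[pos:])
    if tail is None:
        return None
    return line[:pos], tail.group(1)


for text in ["int[4] height", "char c", 'char[50+foo("bar")] userSchool']:
    print(parse_declaration(text))
```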

How to paste two vectors together and pad at the end?

I would like to paste two character strings together and pad at the end with another character to make the combination a certain length. I was wondering if there was an option to paste that one can pass or another trick that I am missing? I can do this in multiple lines by figuring out the length of each and then calling paste with rep(my_pad_character,N) but I would like to do this in one line.
Example: paste together "hi" and "hello" and pad with "a" to make the result 10 characters long. The result would be "hihelloaaa".
Here is one option:
s1 <- "hi"
s2 <- "hello"
f <- function(x, y, pad = "a", length = 10) {
  out <- paste0(x, y)
  nc <- nchar(out)
  # max() guards against a negative count when out is already longer than length
  paste0(out, paste(rep(pad, max(length - nc, 0)), collapse = ""))
}
> f(s1, s2)
[1] "hihelloaaa"
You can use the stringr function str_pad
library(stringr)
str_pad(paste0('hi','hello'), side = 'right', width = 10 , pad = 'a')
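For readers coming from other languages, right-padding to a fixed width is often a built-in string method. As a small Python sketch of the same one-liner (the function name `paste_and_pad` is made up for illustration), `str.ljust` does the padding and leaves strings that are already long enough unchanged:

```python
def paste_and_pad(first, second, pad="a", width=10):
    """Concatenate two strings and right-pad the result to the given width."""
    # str.ljust requires pad to be exactly one character
    return (first + second).ljust(width, pad)


print(paste_and_pad("hi", "hello"))
```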
