Non character argument in R string split function (strsplit)


This works
x <- "0.466:1.187:2.216:1.196"
y <- as.numeric(unlist(strsplit(x, ":")))
Values of blat$LRwAvg all look like x above, but this doesn't work:
for (i in 1:50){
y <- as.numeric(unlist(strsplit(blat$LRwAvg[i], "\\:")))
blat$meanLRwAvg[i]=mean(y)
}
It fails with:
Error in strsplit(blat$LRwAvg[i], "\:") : non-character argument
It doesn't matter whether I use one, two or no backslashes.
What is my problem? (Not in general; I mean technically, in this specific task.)

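As agstudy implied, running blat$LRwAvg <- as.character(blat$LRwAvg) before the loop fixed it. strsplit only accepts character vectors, and the column was evidently stored as some other type (a factor, for example, produces exactly this error):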
blat$meanLRwAvg <- blat$gtFrqAvg #or some other variable in data frame with equal length
blat$LRwAvg <- as.character(blat$LRwAvg)
for (i in 1:50){
y <- as.numeric(unlist(strsplit(blat$LRwAvg[i], "\\:")))
blat$meanLRwAvg[i]=mean(y)
}
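The same fix can be written without an explicit loop. The following is only a sketch under stated assumptions: the two-row data frame is made up for illustration, and the column is deliberately created as a factor to reproduce the error condition; the real blat is not shown in the question.

```r
# Hypothetical stand-in for the real data frame; the column is a factor on purpose.
blat <- data.frame(LRwAvg = factor(c("0.466:1.187:2.216:1.196", "1:2:3")))

# strsplit(blat$LRwAvg, ":") would stop with "non-character argument" here,
# so convert to character first. fixed = TRUE treats ":" literally, no escaping needed.
parts <- strsplit(as.character(blat$LRwAvg), ":", fixed = TRUE)

# One mean per row; vapply guarantees a numeric vector of the right length.
blat$meanLRwAvg <- vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
```

Because strsplit is vectorised over its first argument, the whole column can be split in one call, and there is no need to pre-create meanLRwAvg from another column.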

Related

haskell how to remove the start word from the expected output

I am trying to find all possible partitions of a string. For example, for the single word "sun",
the expected output is [["su","n"],["s","un"],["s","u","n"]],
but the output of my code also contains the input string itself, which I don't want:
// I don't want ["sun"] in the output
my output: [["sun"],["su","n"],["s","un"],["s","u","n"]]
Is there any way to stop my function from including the whole word in the output?
I was thinking of passing the number of partitions of the string into the auxiliary function, i.e.
length (<function_name> w), where w is the string, and stopping once a counter reaches
length (<function_name> w) - 1, but it didn't work.
My code is:
partition :: String->[[String]]
partition w = help_partition w
help_partition :: String->[[String]]
help_partition [x] = [[[x]]]
help_partition (x:xs) = [(x:head l):(tail l) | l<- help_partition xs] ++[[x]:l | l<- help_partition xs]

The way your help_partition function works, it will always include the full string as the first element in the returned list, so you can use tail to skip it:
partition w = tail (help_partition w)
or even:
partition = tail . help_partition
This is a pretty common technique: you'll often find yourself writing a recursive function that almost returns the list you want but includes an extra special case as the first or last element, and you can use either tail or init to drop those elements.

You should be able to make your partition function just filter out [w] from the result of help_partition. In other words:
partition w = filter (/= [w]) (help_partition w)

Haskell As-patterns, binding variables to constants

(This code doesn't make much sense, but I need this logic to work in my other complicated function):
import Data.List
elemIndex1 xss@(x:xs) =
if (x == ' ')
then (elemIndex x xss)
else (elemIndex1 xs)
So I want this function to give this:
elemIndex1 "qwe asd zxc"
Just 3
Instead it gives this:
elemIndex1 "qwe asd zxc"
Just 0
As I understand, at the else clause xss actually becomes xs.
So my question is: is there a possibility to bind the variable (x:xs) to a constant and to use this constant at any iteration?

It seems like you are expecting xss@(x:xs) to be the following:
xss: the original string given to elemIndex1
x: the first character of an arbitrary call
xs: the rest of the characters of an arbitrary call
e.g. for your example when x first matches a space
xss = "qwe asd zxc"
x = ' '
xs = "asd zxc"
This is not how the pattern match works. xss is actually equal to x:xs, so in that example it would be " asd zxc".
If you want to keep around the first call to a function, you can use a helper function called inside of the scope of the original function.
weirdElemIndex str = weirdElemIndex' str
where
weirdElemIndex' "" = Nothing
weirdElemIndex' (x:xs) =
if x == ' '
then elemIndex ' ' str
else weirdElemIndex' xs
Note that the str I reference in the body of the helper function will be a constant in its invocation.
For what it’s worth, your contrived example seems to be equivalent to elemIndex ' ' since it deals with the case where there is no space in the string by returning Nothing.

Scala Comprehension Errors

I am working on some of the exercism.io exercises. The current one is the Scala DNA exercise. Here is my code and the errors that I am receiving:
For reference, DNA is instantiated with a strand String. A DNA instance can call count (which counts the occurrences in the strand of the single nucleotide passed) and nucleotideCounts, which counts the occurrences of each nucleotide in the strand and returns a Map[Char,Int].
class DNA(strand:String) {
def count(nucleotide:Char): Int = {
strand.count(_ == nucleotide)
}
def nucleotideCounts = (
for {
n <- strand
c <- count(n)
} yield (n, c)
).toMap
}
The errors I am receiving are:
Error:(10, 17) value map is not a member of Int
c <- count(n)
^
Error:(12, 5) Cannot prove that Char <:< (T, U).
).toMap
^
Error:(12, 5) not enough arguments for method toMap: (implicit ev: <:<[Char,(T, U)])scala.collection.immutable.Map[T,U].
Unspecified value parameter ev.
).toMap
^
I am quite new to Scala, so any enlightenment on why these errors are occurring and suggestions to fixing them would be greatly appreciated.

for comprehensions work over values that have flatMap and map methods defined (collections, Option and so on), which is what the error message is pointing out.
In your case count returns a plain Int, so there is nothing to iterate over; just put it directly into the result:
for {
n <- strand
} yield (n, count(n))
As a side note, this solution is not optimal: for a strand such as AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA, count is called once per character and repeats the same work every time. I would recommend calling toSet so that you only visit the distinct Chars:
for {
n <- strand.toSet
} yield (n, count(n))

In line with Akos's approach, consider a parallel traversal of a given strand (String),
strand.distinct.par.map( n => n -> count(n) )
Here we use distinct to gather unique items and construct each Map association in map.

A pipeline solution would look like:
def nucleotideCounts() = strand.groupBy(identity).mapValues(_.length)

Another approach is
Map() ++ {for (n <- strand; c = count(n)) yield n->c}
Not sure why it's different than {...}.toMap() but it gets the job done!
Another way to go is
Map() ++ {for (n <- strand; c <- Seq(count(n))) yield n->c}

Insert a character at a specific location in a string

I would like to insert an extra character (or a new string) at a specific location in a string. For example, I want to insert d at the fourth location in abcefg to get abcdefg.
Now I am using:
old <- "abcefg"
n <- 4
paste(substr(old, 1, n-1), "d", substr(old, n, nchar(old)), sep = "")
I could write a one-line simple function for this task, but I am just curious if there is an existing function for that.

You can do this with regular expressions and gsub.
gsub('^([a-z]{3})([a-z]+)$', '\\1d\\2', old)
# [1] "abcdefg"
If you want to do this dynamically, you can create the expressions using paste:
letter <- 'd'
lhs <- paste0('^([a-z]{', n-1, '})([a-z]+)$')
rhs <- paste0('\\1', letter, '\\2')
gsub(lhs, rhs, old)
# [1] "abcdefg"
As per DWin's comment, you may want this to be more general:
gsub('^(.{3})(.*)$', '\\1d\\2', old)
This way any three characters will match rather than only lower case. DWin also suggests using sub instead of gsub. This way you don't have to worry about the ^ as much since sub will only match the first instance. But I like to be explicit in regular expressions and only move to more general ones as I understand them and find a need for more generality.
As Greg Snow noted, you can use another form of regular expression that looks behind matches:
sub( '(?<=.{3})', 'd', old, perl=TRUE )
You could also build the dynamic gsub pattern above using sprintf rather than paste0:
lhs <- sprintf('^([a-z]{%d})([a-z]+)$', n-1)
or for his sub regular expression:
lhs <- sprintf('(?<=.{%d})',n-1)
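The pieces above can be wrapped into a small helper. This is a sketch only: the name insert_with_sub is made up for illustration, and it assumes the inserted text contains no backslashes, since sub would interpret those in the replacement.

```r
# Hypothetical helper: insert `insert` so that it starts at position n of x.
insert_with_sub <- function(x, insert, n) {
  # Capture the first n - 1 characters, whatever they are.
  pattern <- sprintf("^(.{%d})", n - 1)
  # Put the captured prefix back, followed by the inserted text.
  sub(pattern, paste0("\\1", insert), x)
}

old <- "abcefg"
insert_with_sub(old, "d", 4)
# [1] "abcdefg"
```

Since sub is vectorised, x may be a character vector. Strings shorter than n - 1 characters do not match the pattern and are returned unchanged.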

The stringi package to the rescue once again! I find it the simplest and most elegant of the solutions presented here.
The stri_sub function lets you extract parts of a string and substitute parts of it like this:
library(stringi)
x <- "abcde"
stri_sub(x, 1, 3) # from first to third character
# [1] "abc"
stri_sub(x, 1, 3) <- 1 # substitute from first to third character
x
# [1] "1de"
But if you do this:
x <- "abcde"
stri_sub(x, 3, 2) # from 3 to 2 so... zero ?
# [1] ""
stri_sub(x, 3, 2) <- 1 # substitute from 3 to 2 ... hmm
x
# [1] "ab1cde"
then no characters are removed but new ones are inserted. Isn't that cool? :)

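@Justin's answer is the way I'd actually approach this because of its flexibility, but this could also be a fun approach.
You can treat the string as "fixed width format" and specify the width after which you want to insert your character. Note that the first width is the number of characters kept before the insertion (4 in the code that follows), so use n - 1 to insert at position n: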
paste(read.fwf(textConnection(old),
c(4, nchar(old)), as.is = TRUE),
collapse = "d")
Particularly nice is the output when using sapply, since you get to see the original string as the "name".
newold <- c("some", "random", "words", "strung", "together")
sapply(newold, function(x) paste(read.fwf(textConnection(x),
c(4, nchar(x)), as.is = TRUE),
collapse = "-WEE-"))
# some random words strung together
# "some-WEE-NA" "rand-WEE-om" "word-WEE-s" "stru-WEE-ng" "toge-WEE-ther"

Your original way of doing this (i.e. splitting the string at an index and pasting in the inserted text) could be made into a generic function like so:
split_str_by_index <- function(target, index) {
index <- sort(index)
substr(rep(target, length(index) + 1),
start = c(1, index),
stop = c(index -1, nchar(target)))
}
#Taken from https://stat.ethz.ch/pipermail/r-help/2006-March/101023.html
interleave <- function(v1,v2)
{
ord1 <- 2*(1:length(v1))-1
ord2 <- 2*(1:length(v2))
c(v1,v2)[order(c(ord1,ord2))]
}
insert_str <- function(target, insert, index) {
insert <- insert[order(index)]
index <- sort(index)
paste(interleave(split_str_by_index(target, index), insert), collapse="")
}
Example usage:
> insert_str("1234567890", c("a", "b", "c"), c(5, 9, 3))
[1] "12c34a5678b90"
This allows you to insert a vector of characters at the locations given by a vector of indexes. The split_str_by_index and interleave functions are also useful on their own.
Edit:
I revised the code to allow for indexes in any order. Before, indexes needed to be in ascending order.

I've made a custom function called substr1 to deal with extracting, replacing and inserting characters in a string. Run this code at the start of every session. Feel free to try it out and let me know if it needs to be improved.
# extraction
substr1 <- function(x,y) {
z <- sapply(strsplit(as.character(x),''),function(w) paste(na.omit(w[y]),collapse=''))
dim(z) <- dim(x)
return(z) }
# substitution + insertion
`substr1<-` <- function(x,y,value) {
names(y) <- c(value,rep('',length(y)-length(value)))
z <- sapply(strsplit(as.character(x),''),function(w) {
v <- seq(w)
names(v) <- w
paste(names(sort(c(y,v[setdiff(v,y)]))),collapse='') })
dim(z) <- dim(x)
return(z) }
# demonstration
abc <- 'abc'
substr1(abc,1)
# "a"
substr1(abc,c(1,3))
# "ac"
substr1(abc,-1)
# "bc"
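substr1(abc,1) <- 'A'; abc
# "Abc"
abc <- 'abc' # reset, since the assignment above modified abc
substr1(abc,1.5) <- 'A'; abc
# "aAbc"
abc <- 'abc' # reset again
substr1(abc,c(0.5,2,3)) <- c('A','B'); abc
# "AaB"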

It took me some time to understand the regular expression, but afterwards I found my way with the numbers I had.
The end result was:
old <- "89580000"
gsub('^([0-9]{5})([0-9]+)$', '\\1-\\2', old)

Similar to your approach! No extra package is needed, since paste0 and substr are both part of base R.
Here is the exact code:
paste0(substr(old, 1,3), "d", substr(old,4,6))

In base you can use regmatches to insert a character at a specific location in a string.
old <- "abcefg"
n <- 4
regmatches(old, `attr<-`(n, "match.length", 0)) <- "d"
old
#[1] "abcdefg"
This could also be used with a regex to find the location to insert.
s <- "abcefg"
regmatches(s, regexpr("(?<=c)", s, perl=TRUE)) <- "d"
s
#[1] "abcdefg"
It also works for multiple matches, with an individual replacement for each match.
s <- "abcefg abcefg"
regmatches(s, gregexpr("(?<=c)", s, perl=TRUE)) <- list(1:2)
s
#[1] "abc1efg abc2efg"

Replacing parts of a list

Is there an easy way to replace a sub-list of strings in a character vector with another list of strings? something like
gsub(c("a","b"),c("z","y"),a)
or
replace(a,c("a","b"),c("z","y"))
neither of which unfortunately work?

If you are just replacing single characters, then chartr might just be what you are looking for :
> chartr( "ab", "zy", "abababa")
[1] "zyzyzyz"

A simple loop using gsub would suffice and will probably perform just fine in most cases:
a <- c("x","y")
b <- c("a","b")
vec <- "xy12"
mgsub <- function(pattern,replacement,x,...){
for (i in seq_along(pattern)){
x <- gsub(pattern = pattern[i],replacement = replacement[i],x,...)
}
x
}
> mgsub(a,b,vec)
[1] "ab12"
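One thing to keep in mind with a loop like this is that the replacements are applied one after another, so a later pattern can match text produced by an earlier replacement. The sketch below repeats the mgsub definition so that it runs on its own, and uses a made-up swap of "a" and "b" to show the effect; chartr is shown for contrast because it translates all characters in a single pass.

```r
# Same sequential-gsub helper as above, repeated so this block is self-contained.
mgsub <- function(pattern, replacement, x, ...) {
  for (i in seq_along(pattern)) {
    x <- gsub(pattern = pattern[i], replacement = replacement[i], x, ...)
  }
  x
}

# Attempting to swap "a" and "b": step 1 turns "ab" into "bb",
# then step 2 turns every "b" (including the new one) into "a".
swapped_loop <- mgsub(c("a", "b"), c("b", "a"), "ab")
swapped_loop
# [1] "aa"

# chartr maps each character once, so the swap works as intended.
swapped_chartr <- chartr("ab", "ba", "ab")
swapped_chartr
# [1] "ba"
```

If the patterns and replacements never overlap, as in the "xy12" example, the order of application makes no difference.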

I could've sworn there was a recursive apply in R, and there is, but it does something very different.
Anyhow, here's one:
#' Iteratively (recursively) apply a function to its own output
#' @param X a vector of first arguments to be passed in
#' @param FUN a function taking a changing (x) and an initial argument (init)
#' @param init an argument to be "worked on" by FUN with parameters x[1], x[2], etc.
#' @return the final value, of the same type as init
#' @examples
#' vec <- "xy12"
#' replacementPairs <- list( c("x","a"), c("y","b") )
#' iapply( replacementPairs , FUN=function(repvec,x) {
#' gsub(repvec[1],repvec[2],x)
#' }, init=vec )
iapply <- function(X, FUN, init, ...) {
res <- init
for(x in X) {
res <- FUN(x, res, ...)
}
res
}
The example returns "ab12".
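For comparison, base R's Reduce performs the same kind of fold over a list, threading an accumulated value through each call. This is a sketch of the same replacement example rather than a drop-in for iapply; note that Reduce passes the accumulated value as the first argument, the reverse of iapply's FUN, and that fixed = TRUE is an added assumption that the patterns are literal strings.

```r
vec <- "xy12"
replacementPairs <- list(c("x", "a"), c("y", "b"))

# Start from vec and apply each (pattern, replacement) pair in turn.
result <- Reduce(
  function(acc, pair) gsub(pair[1], pair[2], acc, fixed = TRUE),
  replacementPairs,
  vec  # initial value, matched to Reduce's init argument
)
result
# [1] "ab12"
```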
