I have a dataset I am working on in R. One of the columns has a dot (.) instead of a comma (,) in its values, and I think this might be causing problems when I run the regression. Does anyone know what code I should run to change all the dots to commas?
Thanks in advance.
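Assuming you have a data frame named df:
library(dplyr)
library(stringr)
# str_replace_all replaces every dot; str_replace would only replace the first dot in each value
df %>% mutate_all(~ str_replace_all(., "\\.", ","))
If it's for one column only:
df %>% mutate(col1 = gsub("\\.", ",", col1))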
Assuming your data is a character vector inside a dataframe
df <- data.frame(var = c("5.1", "30", "..", "75.234.4423.5"))
With gsub
df$var <- gsub("\\.", ",", df$var)
With stringi and purrr
library(stringi)
library(purrr)
df$var <- modify_if(df$var, stri_detect_fixed(df$var, "."),
~stri_sub_replace_all(., stri_locate_all_fixed(., "."), replacement=","))
Output
df
            var
1           5,1
2            30
3            ,,
4 75,234,4423,5
I used purrr::modify_if with the predicate stri_detect_fixed(df$var, ".") so that the values without any dots (in my example, 30) are not converted to NA by stringi::stri_sub_replace_all.
The stringi version is more flexible for other purposes: you can use function calls inside the replacement argument when you want a dynamic replacement value.
I cannot comment on whether it will help your regression analysis. I simply answered by giving ways to change dots to commas in a character vector.
I am fairly new to python and have a task to solve.
I have a list that is made of strings made of hexadecimal numbers. I want to replace some items with '0', if they do not start with the right characters.
So, for example, I have
List = ['0800096700000000', '090000000000025d', '0b0000000000003c', '0500051b014f0000']
and I want, say, to only have the data that starts with "0b" and "05", and I want to replace the others by "0".
For now, I have this:
multiplex = ('0b', '05')
List = ['0800096700000000', '090000000000025d', '0b0000000000003c', '0500051b014f0000']
List = [x for x in List if x.startswith(multiplex)]
This gives me the following result:
['0b0000000000003c', '0500051b014f0000']
Although I would like the following result:
['0', '0', '0b0000000000003c', '0500051b014f0000']
I cannot index the specific item I wish to change because the actual data is way too large for that...
Can someone help?
You should use an if/else to determine what to return, not if a value should be in the list.
my_list = ['0800096700000000', '090000000000025d', '0b0000000000003c', '0500051b014f0000']
multiplex = ('0b', '05')
my_new_list = [x if x.startswith(multiplex) else '0' for x in my_list]
print(my_new_list)
# Sample output:
# ['0', '0', '0b0000000000003c', '0500051b014f0000']
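For reuse, the same conditional expression can be wrapped in a small function. This is only a sketch: the names mask_unwanted, placeholder and frames are made up for illustration and are not part of the answer above. It relies on str.startswith accepting a tuple of prefixes.

```python
def mask_unwanted(items, prefixes, placeholder="0"):
    """Return a new list in which items lacking a wanted prefix become placeholder.

    mask_unwanted and placeholder are hypothetical names used for illustration.
    """
    # str.startswith accepts a tuple and returns True if any prefix matches.
    prefixes = tuple(prefixes)
    return [x if x.startswith(prefixes) else placeholder for x in items]


frames = ['0800096700000000', '090000000000025d',
          '0b0000000000003c', '0500051b014f0000']
masked = mask_unwanted(frames, ('0b', '05'))
print(masked)
# ['0', '0', '0b0000000000003c', '0500051b014f0000']
```

Because the comprehension visits every item once, the result always has the same length as the input, so no manual indexing is needed even for very large lists.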
Each prefix in multiplex is two characters long, so a single-character string such as "0" can never start with one of them. If such strings should also pass the test, try if x.startswith(multiplex) or (len(str(x)) < 2 and x.startswith("0")), or more simply if x.startswith(multiplex) or str(x) == "0".
List = [x if x.startswith(multiplex) else '0' for x in List]
Input : abcdABCD
Output : AaBbCcDd
ms=[]
n = input()
for i in n:
    ms.append(i)
ms.sort()
print(ms)
It gives me ABCDabcd.
How to sort this in python?
Without having to import anything, you could probably do something like this:
arr = "abcdeABCDE"
temp = sorted(arr, key = lambda i: (i.lower(), i))
result = "".join(temp)
print(result) # AaBbCcDdEe
The key will take in each element of arr and sort it first by lower-casing it, then if it ties, it will sort it based on its original value. It will group all similar letters together (A with a, B with b) and then put the capital first.
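To see the tie-break at work, here is a small sketch on a shuffled input. The function name sort_caps_first is invented for this example. It also shows, for comparison, what happens with a key that only lower-cases: ties then keep their input order because Python's sort is stable.

```python
def sort_caps_first(text):
    # Primary key: the lower-cased letter, so 'A' and 'a' end up side by side.
    # Tie-break: the original character; 'A' (code point 65) sorts before 'a' (97).
    return "".join(sorted(text, key=lambda ch: (ch.lower(), ch)))


shuffled = "dCbAaBcD"
with_tiebreak = sort_caps_first(shuffled)
without_tiebreak = "".join(sorted(shuffled, key=str.lower))

print(with_tiebreak)     # AaBbCcDd
print(without_tiebreak)  # AabBCcdD
```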
Use a sorting key:
ms = "abcdABCD"
sorted_ms = sorted(ms, key=lambda letter:(letter.upper(), letter.islower()))
# sorted_ms = ['A', 'a', 'B', 'b', 'C', 'c', 'D', 'd']
sorted_str = ''.join(sorted_ms)
# sorted_str = 'AaBbCcDd'
Why this works:
You can specify the criterion to sort by using the key argument of the sorted function or of the list.sort() method. It expects a function or lambda that takes the element in question and returns the value to sort it by. If that value is a tuple, then the first element takes precedence; if it's equal, then the second element is compared, and so on.
So, the lambda I provided here returns a 2-tuple:
(letter.upper(), letter.islower())
letter.upper() as the first element here means that the strings are going to be sorted lexicographically, but case-insensitively (as it will sort them as if they were all uppercase). Then, I use letter.islower() as the second element, which is True if the letter was lowercase and False otherwise. When sorting, False comes before True, which means that if you give a capital letter and a lowercase letter, the capital letter will come first.
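The tuple comparison described above can be checked directly. This is just an illustrative sketch; the variable names letters and keys are my own.

```python
letters = "bBaA"

# Build the same 2-tuples that the lambda returns for each letter.
keys = [(ch.upper(), ch.islower()) for ch in letters]
print(keys)
# [('B', True), ('B', False), ('A', True), ('A', False)]

# bool is a subclass of int, so False (0) sorts before True (1).
print(False < True)  # True

# Tuples compare element by element: the uppercase letter first, then the flag.
print(sorted(keys))
# [('A', False), ('A', True), ('B', False), ('B', True)]
```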
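Try this. Note that the key ignores case entirely, so the upper- and lowercase versions of a letter compare equal and keep their input order (Python's sort is stable); for this input the lowercase letter therefore comes first:
>>> s = 'abcdABCD'
>>> ''.join(sorted(s, key=lambda x: x.lower()))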
'aAbBcCdD'
Suppose I have a list called name:
name=['ACCBCDB','CCABACB','CAABBCB']
I want to use Python to remove the middle B from each element in the list.
The output should be:
['ACCCDB','CCAACB','CAABCB']
Here is what I tried:
def ter(s):
    return s[3:-3]
name=['ACBBDBA','CCABACB','CABBCBB']
xx=[ter(s) for s in name]
z=xx
print(z)
Output:
['B', 'B', 'B']
I did the reverse of what I want: I want to delete the B in the middle and keep the other parts of each element.
name = ['ACCBCDB','CCABACB','CAABBCB']
name_without_middle = []
for oldstr in name:
    # Integer division gives the index of the middle character
    midlen = len(oldstr) // 2
    newstr = oldstr[:midlen] + oldstr[midlen+1:]
    name_without_middle.append(newstr)
print(name_without_middle)
Returns
['ACCCDB', 'CCAACB', 'CAABCB']
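The same slicing can also be written as a small helper plus a list comprehension. This is a sketch, and the name remove_middle is my own. For strings of even length there is no single middle character; with this indexing the right-hand one of the two central characters is dropped, which may or may not be what you want.

```python
def remove_middle(s):
    # Index of the middle character (for odd lengths, the exact centre).
    mid = len(s) // 2
    # Keep everything before and everything after that index.
    return s[:mid] + s[mid + 1:]


names = ['ACCBCDB', 'CCABACB', 'CAABBCB']
trimmed = [remove_middle(s) for s in names]
print(trimmed)
# ['ACCCDB', 'CCAACB', 'CAABCB']
```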
I would like to insert an extra character (or a new string) at a specific location in a string. For example, I want to insert d at the fourth location in abcefg to get abcdefg.
Now I am using:
old <- "abcefg"
n <- 4
paste(substr(old, 1, n-1), "d", substr(old, n, nchar(old)), sep = "")
I could write a one-line simple function for this task, but I am just curious if there is an existing function for that.
You can do this with regular expressions and gsub.
gsub('^([a-z]{3})([a-z]+)$', '\\1d\\2', old)
# [1] "abcdefg"
If you want to do this dynamically, you can create the expressions using paste:
letter <- 'd'
lhs <- paste0('^([a-z]{', n-1, '})([a-z]+)$')
rhs <- paste0('\\1', letter, '\\2')
gsub(lhs, rhs, old)
# [1] "abcdefg"
As per DWin's comment, you may want this to be more general.
gsub('^(.{3})(.*)$', '\\1d\\2', old)
This way any three characters will match rather than only lower case. DWin also suggests using sub instead of gsub. This way you don't have to worry about the ^ as much since sub will only match the first instance. But I like to be explicit in regular expressions and only move to more general ones as I understand them and find a need for more generality.
As Greg Snow noted, you can use another form of regular expression that looks behind matches:
sub( '(?<=.{3})', 'd', old, perl=TRUE )
You could also build my dynamic gsub above using sprintf rather than paste0:
lhs <- sprintf('^([a-z]{%d})([a-z]+)$', n-1)
or, for his sub regular expression:
lhs <- sprintf('(?<=.{%d})',n-1)
The stringi package to the rescue once again! This is the simplest and most elegant solution among those presented.
The stri_sub function lets you extract parts of a string and also substitute parts of it, like this:
library(stringi)
x <- "abcde"
stri_sub(x, 1, 3) # from first to third character
# [1] "abc"
stri_sub(x, 1, 3) <- 1 # substitute from first to third character
x
# [1] "1de"
But if you do this:
x <- "abcde"
stri_sub(x, 3, 2) # from 3 to 2 so... zero ?
# [1] ""
stri_sub(x, 3, 2) <- 1 # substitute from 3 to 2 ... hmm
x
# [1] "ab1cde"
then no characters are removed but a new one is inserted. Isn't that cool? :)
@Justin's answer is the way I'd actually approach this because of its flexibility, but this could also be a fun approach.
You can treat the string as "fixed width format" and specify where you want to insert your character:
paste(read.fwf(textConnection(old),
c(4, nchar(old)), as.is = TRUE),
collapse = "d")
Particularly nice is the output when using sapply, since you get to see the original string as the "name".
newold <- c("some", "random", "words", "strung", "together")
sapply(newold, function(x) paste(read.fwf(textConnection(x),
c(4, nchar(x)), as.is = TRUE),
collapse = "-WEE-"))
#            some          random           words          strung        together
#   "some-WEE-NA"   "rand-WEE-om"    "word-WEE-s"   "stru-WEE-ng" "toge-WEE-ther"
Your original way of doing this (i.e. splitting the string at an index and pasting in the inserted text) could be made into a generic function like so:
split_str_by_index <- function(target, index) {
  index <- sort(index)
  substr(rep(target, length(index) + 1),
         start = c(1, index),
         stop = c(index - 1, nchar(target)))
}
# Taken from https://stat.ethz.ch/pipermail/r-help/2006-March/101023.html
interleave <- function(v1, v2)
{
  ord1 <- 2*(1:length(v1))-1
  ord2 <- 2*(1:length(v2))
  c(v1, v2)[order(c(ord1, ord2))]
}
insert_str <- function(target, insert, index) {
  insert <- insert[order(index)]
  index <- sort(index)
  paste(interleave(split_str_by_index(target, index), insert), collapse="")
}
Example usage:
> insert_str("1234567890", c("a", "b", "c"), c(5, 9, 3))
[1] "12c34a5678b90"
This allows you to insert a vector of characters at the locations given by a vector of indexes. The split_str_by_index and interleave functions are also useful on their own.
Edit:
I revised the code to allow for indexes in any order. Before, indexes needed to be in ascending order.
I've made a custom function called substr1 to deal with extracting, replacing and inserting characters in a string. Run this code at the start of every session. Feel free to try it out and let me know if it needs to be improved.
# extraction
substr1 <- function(x,y) {
z <- sapply(strsplit(as.character(x),''),function(w) paste(na.omit(w[y]),collapse=''))
dim(z) <- dim(x)
return(z) }
# substitution + insertion
`substr1<-` <- function(x,y,value) {
names(y) <- c(value,rep('',length(y)-length(value)))
z <- sapply(strsplit(as.character(x),''),function(w) {
v <- seq(w)
names(v) <- w
paste(names(sort(c(y,v[setdiff(v,y)]))),collapse='') })
dim(z) <- dim(x)
return(z) }
# demonstration
abc <- 'abc'
substr1(abc,1)
# "a"
substr1(abc,c(1,3))
# "ac"
substr1(abc,-1)
# "bc"
substr1(abc,1) <- 'A'
# "Abc"
substr1(abc,1.5) <- 'A'
# "aAbc"
substr1(abc,c(0.5,2,3)) <- c('A','B')
# "AaB"
It took me some time to understand the regular expression, but afterwards I found my way with the numbers I had.
The end result was:
old <- "89580000"
gsub('^([0-9]{5})([0-9]+)$', '\\1-\\2', old)
similar to yours!
No extra package is needed for this; base R's paste0 and substr are enough.
Here is the exact code:
paste0(substr(old, 1, 3), "d", substr(old, 4, nchar(old)))
In base you can use regmatches to insert a character at a specific location in a string.
old <- "abcefg"
n <- 4
regmatches(old, `attr<-`(n, "match.length", 0)) <- "d"
old
#[1] "abcdefg"
This could also be used with a regex to find the location to insert.
s <- "abcefg"
regmatches(s, regexpr("(?<=c)", s, perl=TRUE)) <- "d"
s
#[1] "abcdefg"
It also works for multiple matches, with individual replacements at the different matches.
s <- "abcefg abcefg"
regmatches(s, gregexpr("(?<=c)", s, perl=TRUE)) <- list(1:2)
s
#[1] "abc1efg abc2efg"