Rcpp find unique character vectors - rcpp

I am learning Rcpp from Hadley Wickham's Advance R: http://adv-r.had.co.nz/Rcpp.html.
There is one exercise to implement R function unique() in Rcpp using an unordered_set (challenge: do it in one line!). The solution finds the unique numbers in a numeric vector. I am trying to find the unique characters in a character vector using the second code chunk, which produces an error. Any idea on how to achieve this simple function manually ? Thanks!
// [[Rcpp::export]]
std::unordered_set<double> uniqueCC(NumericVector x) {
return std::unordered_set<double>(x.begin(), x.end());
}
// [[Rcpp::export]]
std::unordered_set<String> uniqueCC(CharacterVector x) {
return std::unordered_set<String>(x.begin(), x.end());
}

For object types not in the STL library you need to define your own hash function. String (capital S) is an Rcpp object.
The easiest way to do this is to use Rcpp's ability to convert to common STL objects.
// [[Rcpp::export]]
std::unordered_set<std::string> uniqueCC(CharacterVector x) {
auto xv = Rcpp::as<std::vector<std::string>>(x);
return std::unordered_set<std::string>(xv.begin(), xv.end());
}
> x <- sample(letters, 1000, replace=T)
> uniqueCC(x)
[1] "r" "o" "c" "n" "f" "s" "y" "l" "i" "j" "m" "v" "t" "p" "u" "x" "w" "k" "g" "a" "d" "q" "z" "b" "h" "e"
Alternatively, you can take in a STL string vector and Rcpp magic will do the rest:
// [[Rcpp::export]]
std::unordered_set<std::string> uniqueCC(const std::vector<std::string> & x) {
return std::unordered_set<std::string>(x.begin(), x.end());
}

Related

How would you solve the letter changer in Julia?

I found this challenge:
Using your language, have the function LetterChanges(str) take the str parameter being passed and modify it using the following algorithm. Replace every letter in the string with the letter following it in the alphabet (ie. c becomes d, z becomes a). Then capitalize every vowel in this new string (a, e, i, o, u) and finally return this modified string.
I am new in Julia, and I was challenging myself in this challenge. I found this challenge very hard in Julia lang and I could not find a solution.
Here I tried to solve in the way below, but I got error: the x value is not defined
How would you solve this?
function LetterChanges(stringis::AbstractString)
alphabet = "abcdefghijklmnopqrstuvwxyz"
vohels = "aeiou"
for Char(x) in split(stringis, "")
if x == 'z'
x = 'a'
elseif x in vohels
uppercase(x)
else
Int(x)+1
Char(x)
println(x)
end
end
end
Thank you
As a side note:
The proposed solution works properly. However, if you would need high performance (which you probably do not given the source of your problem) it is more efficient to use string builder:
function LetterChanges2(str::AbstractString)
v = Set("aeiou")
#sprint(sizehint=sizeof(str)) do io # use on Julia 0.7 - new keyword argument
sprint() do io # use on Julia 0.6.2
for c in str
c = c == 'z' ? 'a' : c+1 # we assume that we got only letters from 'a':'z'
print(io, c in v ? uppercase(c) : c)
end
end
end
it is over 10x faster than the above.
EDIT: for Julia 0.7 this is a bit faster:
function LetterChanges2(str::AbstractString)
v = BitSet(collect(Int,"aeiouy"))
sprint(sizehint=sizeof(str)) do io # use on Julia 0.7 - new keyword argument
for c in str
c = c == 'z' ? 'a' : c+1 # we assume that we got only letters from 'a':'z'
write(io, Int(c) in v ? uppercase(c) : c)
end
end
end
There is a logic error. It says "Replace every letter in the string with the letter following it in the alphabet. Then capitalize every vowel in this new string". Your code checks, if it is a vowel. Then it capitalizes it or replaces it. That's different behavior. You have to first replace and then to check if it is a vowel.
You are replacing 'a' by 'Z'. You should be replacing 'z' by 'a'
The function split(stringis, "") returns an array of strings. You can't store these strings in Char(x). You have to store them in x and then you can transform theses string to char with c = x[1].
After transforming a char you have to store it in the variable: c = uppercase(c)
You don't need to transform a char into int. You can add a number to a char: c = c + 1
You have to store the new characters in a string and return them.
function LetterChanges(stringis::AbstractString)
# some code
str = ""
for x in split(stringis, "")
c = x[1]
# logic
str = "$str$c"
end
return str
end
Here's another version that is a bit faster than #BogumilKaminski's answer on version 0.6, but that might be different on 0.7. On the other hand, it might be a little less intimidating than the do-block magic ;)
function changeletters(str::String)
vowels = "aeiouy"
carr = Vector{Char}(length(str))
i = 0
for c in str
newchar = c == 'z' ? 'a' : c + 1
carr[i+=1] = newchar in vowels ? uppercase(newchar) : newchar
end
return String(carr)
end
At the risk of being accused of cheating, this is a dictionary-based approach:
function change_letters(s::String)::String
k = collect('a':'z')
v = vcat(collect('b':'z'), 'A')
d = Dict{Char, Char}(zip(k, v))
for c in Set("eiou")
d[c - 1] = uppercase(d[c - 1])
end
b = IOBuffer()
for c in s
print(b, d[c])
end
return String(take!(b))
end
It seems to compare well in speed terms with the other Julia 0.6 methods for long strings (e.g. 100,000 characters). There's a bit of unnecessary overhead in constructing the dictionary which is noticeable on small strings, but I'm far too lazy to type out the 'a'=>'b' construction long-hand!

How do I compare string in Ocaml

How do I go about comparing strings? For example
"a" and "b", since "a" comes before "b" then I would put in a tuple like this("a","b"). For "c" and "b", it would be like this ("b","c")
You can compare strings with the usal comparison operators: =, <>, <, <=, >, >=.
You can also use the compare function, which returns -1 if the first string is less than the second, 1 if the first string is greater than the second, and 0 if they are equal.
# "a" < "b";;
- : bool = true
# "a" > "b";;
- : bool = false
# compare "a" "b";;
- : int = -1

How do I get from string "3+10" to strings "3" "+" "10"?

I'm making a graphing calculator in Unity and I have input with strings like "3+10" and I want to split it to "3","+" and "10".
I can figure out a way to deal with them once I've got them to this form, but I really need a way to split the string to the left and right of key characters such as plus, times, exponent, etc.
I'm doing this in Unity, but a way to do this in any language should help.
C#
The following code will do what you asked for (and nothing more).
string input = "3+10-5";
string pattern = #"([-+^*\/])";
string[] substrings = Regex.Split(input, pattern);
// results in substrings = {"3", "+", "10", "-", "5"}
By using Regex.Split instead of String.Split you are able to retrieve the math operators as well. This is done by putting the math operators in a capture group ( ). If you're not familiar with regular expressions you should google the basics.
The code above will stubbornly use the math operators to split your string. If the string doesn't make sense, the method doesn't care and may even produce unexpected results. For example "5//10-" will result in {"5", "/", "", "10", "-", ""}. Note that only one / is returned and empty strings are added.
You can use more complex regular expressions to check if your string is a valid mathematical expression before you try to split it. For example ^(\d+(?:.\d+)?+([-+*^\/]\g<1>)?)$ would check if your string consists of a decimal number and zero or more combinations of an operator and another decimal number.
Here is the C# way -- which I mention because you are using Unity.
words = phrase.Split(default(string[]),StringSplitOptions.RemoveEmptyEntries);
https://msdn.microsoft.com/en-us/library/tabh47cf%28v=vs.110%29.aspx
Here is Java code for splitting a String by math operators
String[] splitByOperators(String input) {
String[] output = new String[input.length()];
int index = 0;
String current = "";
for (char c : input){
if (c == '+' || c == '-' || c == '*' || c == '/'){
output[index] = current;
index++;
output[index] = c;
index++;
current = "";
} else {
current = current + c;
}
}
output[index] = current;
return output;
}
Using Python regular expressions:
>>> import re
>>> match = re.search(r'(\d+)(.*)(\d+)', "3+1")
>>> match.group(1)
'3'
>>> match.group(2)
'+'
>>> match.group(3)
'1'
The reason for using regular expressions is for greater flexibility in handling a variety of simple arithmetic expressions.
R: EDITED
Take your input vector as x<-c("3+10", "4/12" , "8-3" ,"12*1","1+2-3*4/8").
We can use the following string split based on regex:
> strsplit(x,split="(?<=\\d)(?=[+*-/])|(?<=[+*-/])(?=\\d)",perl=T)
[[1]]
[1] "3" "+" "10"
[[2]]
[1] "4" "/" "12"
[[3]]
[1] "8" "-" "3"
[[4]]
[1] "12" "*" "1"
[[5]]
[1] "1" "+" "2" "-" "3" "*" "4" "/" "8"
How it works:
Split the string when one of two things is found:
A digit followed by an arithmetic operator. (?<=\\d) finds something immediately preceded by a digit, while (?=[+*-/]) finds something immediately succeeded by an arithmetic operator, i.e. +, *, -, or /. The "something" in both cases is the blank string "" found between a digit and an operator, and the string is split at such a point.
An arithmetic operator followed by a digit. This is just the reverse of the above.

Generating substrings and random strings in R

Please bear with me, I come from a Python background and I am still learning string manipulation in R.
Ok, so lets say I have a string of length 100 with random A, B, C, or D letters:
> df<-c("ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD")
> df
[1]"ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD"
I would like to do the following two things:
1) Generate a '.txt' file that is comprised of 20-length subsections of the above string, each starting one letter after the previous with their own unique name on the line above it, like this:
NAME1
ABCBDBDBCBABABDBCBCB
NAME2
BCBDBDBCBABABDBCBCBD
NAME3
CBDBDBCBABABDBCBCBDB
NAME4
BDBDBCBABABDBCBCBDBD
... and so forth
2) Take that generated list and from it comprise another list that has the same exact substrings with the only difference being a change of one or two of the A, B, C, or Ds to another A, B, C, or D (any of those four letters only).
So, this:
NAME1
ABCBDBDBCBABABDBCBCB
Would become this:
NAME1.1
ABBBDBDBCBDBABDBCBCB
As you can see, the "C" in the third position became a "B" and the "A" in position 11 became a "D", with no implied relationship between those changed letters. Purely random.
I know this is a convoluted question, but like I said, I am still learning basic text and string manipulation in R.
Thanks in advance.
Create a text file of substrings
n <- 20 # length of substrings
starts <- seq(nchar(df) - 20 + 1)
v1 <- mapply(substr, starts, starts + n - 1, MoreArgs = list(x = df))
names(v1) <- paste0("NAME", seq_along(v1), "\n")
write.table(v1, file = "filename.txt", quote = FALSE, sep = "",
col.names = FALSE)
Randomly replace one or two letters (A-D):
myfun <- function() {
idx <- sample(seq(n), sample(1:2, 1))
rep <- sample(LETTERS[1:4], length(idx), replace = TRUE)
return(list(idx = idx, rep = rep))
}
new <- replicate(length(v1), myfun(), simplify = FALSE)
v2 <- mapply(function(x, y, z) paste(replace(x, y, z), collapse = ""),
strsplit(v1, ""),
lapply(new, "[[", "idx"),
lapply(new, "[[", "rep"))
names(v2) <- paste0(names(v2), ".1")
write.table(v2, file = "filename2.txt", quote = FALSE, sep = "\n",
col.names = FALSE)
I tried breaking this down into multiple simple steps, hopefully you can get learn a few tricks from this:
# Random data
df<-c("ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD")
n<-10 # Number of cuts
set.seed(1)
# Pick n random numbers between 1 and the length of string-20
nums<-sample(1:(nchar(df)-20),n,replace=TRUE)
# Make your cuts
cuts<-sapply(nums,function(x) substring(df,x,x+20-1))
# Generate some names
nams<-paste0('NAME',1:n)
# Make it into a matrix, transpose, and then recast into a vector to get alternating names and cuts.
names.and.cuts<-c(t(matrix(c(nams,cuts),ncol=2)))
# Drop a file.
write.table(names.and.cuts,'file.txt',quote=FALSE,row.names=FALSE,col.names = FALSE)
# Pick how many changes are going to be made to each cut.
changes<-sample(1:2,n,replace=2)
# Pick that number of positions to change
pos.changes<-lapply(changes,function(x) sample(1:20,x))
# Find the letter at each position.
letter.at.change.pos<-lapply(pos.changes,function(x) substring(df,x,x))
# Make a function that takes any letter, and outputs any other letter from c(A-D)
letter.map<-function(x){
# Make a list of alternate letters.
alternates<-lapply(x,setdiff,x=c('A','B','C','D'))
# Pick one of each
sapply(alternates,sample,size=1)
}
# Find another letter for each
letter.changes<-lapply(letter.at.change.pos,letter.map)
# Make a function to replace character by position
# Inefficient, but who cares.
rep.by.char<-function(str,pos,chars){
for (i in 1:length(pos)) substr(str,pos[i],pos[i])<-chars[i]
str
}
# Change every letter at pos.changes to letter.changes
mod.cuts<-mapply(rep.by.char,cuts,pos.changes,letter.changes,USE.NAMES=FALSE)
# Generate names
nams<-paste0(nams,'.1')
# Use the matrix trick to alternate names.Drop a file.
names.and.mod.cuts<-c(t(matrix(c(nams,mod.cuts),ncol=2)))
write.table(names.and.mod.cuts,'file2.txt',quote=FALSE,row.names=FALSE,col.names = FALSE)
Also, instead of the rep.by.char function, you could just use strsplit and replace like this:
mod.cuts<-mapply(function(x,y,z) paste(replace(x,y,z),collapse=''),
strsplit(cuts,''),pos.changes,letter.changes,USE.NAMES=FALSE)
One way, albeit slowish:
Rgames> foo<-paste(sample(c('a','b','c','d'),20,rep=T),sep='',collapse='')
Rgames> bar<-matrix(unlist(strsplit(foo,'')),ncol=5)
Rgames> bar
[,1] [,2] [,3] [,4] [,5]
[1,] "c" "c" "a" "c" "a"
[2,] "c" "c" "b" "a" "b"
[3,] "b" "b" "a" "c" "d"
[4,] "c" "b" "a" "c" "c"
Now you can select random indices and replace the selected locations with sample(c('a','b','c','d'),1) . For "true" randomness, I wouldn't even force a change - if your newly drawn letter is the same as the original, so be it.
Like this:
ibar<-sample(1:5,4,rep=T) # one random column number for each row
for ( j in 1: 4) bar[j,ibar[j]]<-sample(c('a','b','c','d'),1)
Then, if necessary, recombine each row using paste
For the first part of your question:
df <- c("ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD")
nstrchars <- 20
count<- nchar(df)-nstrchars
length20substrings <- data.frame(length20substrings=sapply(1:count,function(x)substr(df,x,x+20)))
# to save to a text file. I chose not to include row names or a column name in the .txt file file
write.table(length20substrings,"length20substrings.txt",row.names=F,col.names=F)
For the second part:
# create a function that will randomly pick one or two spots in a string and replace
# those spots with one of the other characters present in the string:
changefxn<- function(x){
x<-as.character(x)
nc<-nchar(as.character(x))
id<-seq(1,nc)
numchanges<-sample(1:2,1)
ids<-sample(id,numchanges)
chars2repl<-strsplit(x,"")[[1]][ids]
charspresent<-unique(unlist(strsplit(x,"")))
splitstr<-unlist(strsplit(x,""))
if (numchanges>1) {
splitstr[id[1]] <- sample(setdiff(charspresent,chars2repl[1]),1)
splitstr[id[2]] <- sample(setdiff(charspresent,chars2repl[2]),1)
}
else {splitstr[id[1]] <- sample(setdiff(charspresent,chars2repl[1]),1)
}
newstr<-paste(splitstr,collapse="")
return(newstr)
}
# try it out
changefxn("asbbad")
changefxn("12lkjaf38gs")
# apply changefxn to all the substrings from part 1
length20substrings<-length20substrings[seq_along(length20substrings[,1]),]
newstrings <- lapply(length20substrings, function(ii)changefxn(ii))

Pattern matching a string in linear time

Given two strings S and T, where the T is the pattern string. Find if any scrambled form of pattern string exists as SubString in the string S and if present return the start index.
Example:
String S: abcdef
String T: efd
String S has "def", a combination of search string T: "efd".
I have found a solution with a run time of O(m*n). I am working on a linear time solution where I used to HashMaps (static one, maintained for String T, and another a dynamic copy of the previous HashMap used for checking the current substring of T). I'd start checking at the next character where it fails. But this runs in O(m*n) in worst case.
I'd like to get some pointers to make it work in O(m+n) time. Any help would be appreciated.
First of all, I would like to know boundaries for string S length (m) and pattern T length (n).
There exist one general idea but complexity of the solution based on it depends on the pattern length. Complexity varies from O(m) to O(m*n^2) for short patterns with length<=100 and O(n) for long patterns.
Fundamental theorem of arithmetic states that every integer number can be uniquely represented as a product of prime numbers.
Idea - I guess, your alphabet is english letters. So, alphabet size is 26. Let's replace first letter with first prime, second letter with the second and so on. I mean the following replacement: a->2b->3c->5d->7e->11 and so on.
Let's denote product of primes corresponding for the letters of some string as prime product(string). For example, primeProduct(z) will be 101 as 101 is 26-th prime number, primeProduct(abc) will be 2*3*5=30,primeProduct(cba) will also be 5*3*2=30.
Why we choose prime numbers? If we replace a ->2; b ->3, c->4, we won't be able to decipher for exapmle 4 - is it "c" or "aa".
Solution for the short patterns case:
For the string S, we should calculate in linear time prime product for all prefixes. I mean we have to create array A such that A[0] = primeProduct(S[0]), A[1] = primeProduct(S[0]S[1]), A[N] = primeProduct(S). Sample implementation:
A[0] = getPrime(S[0]);
for(int i=1;i<S.length;i++)
A[i]=A[i-1]*getPrime(S[i]);
Searching pattern T. Calculate primeProduct(T). For all 'windows' in S which have the same length with pattern compare it's primeProduct with primeProduct(pattern). If currentWindow is equal to the pattern or currentWindow is a scrumbled form(anagramm) of the pattern primeProducts will be the same.
Important note! We have prepared array A for fast computing primeProduct for any substring of S. primeProduct of(S[i],S[i+1],...S[j]) = getPrime(S[i])*...*getPrime(S[j]) = A[j]/A[i-1];
Complexity: if pattern length is <=9, even 'zzzzzzzzz' is 101^9<=MAX_LONG_INT; All calculations fit in standart long type and complexity is O(N)+O(M) where N is for calculating primeProduct of pattern and M is iterating over all windows in S. If length<=100 you have to add complexity of mul/div long numbers that's why complexity becomes O(m*n^2). length of 101^length is O(N) mul/div of such long numbers is O(N^2)
For the long patterns with length>=1000 it's better to store some hash map(prime,degree). Array of prefixes will become array of hash maps and A[j]/A[i-1] trick will become differenceBetween(A[j] and A[i-1] hashmaps's key sets).
Would this JavaScript example be linear time?
<script>
function matchT(t,s){
var tMap = [], answer = []
//map the character count in t
for (var i=0; i<t.length; i++){
var chr = t.charCodeAt(i)
if (tMap[chr]) tMap[chr]++
else tMap[chr] = 1
}
//traverse string
for (var i=0; i<s.length; i++){
if (tMap[s.charCodeAt(i)]){
var start = i, j = i + 1, tmp = []
tmp[s.charCodeAt(i)] = 1
while (tMap[s.charCodeAt(j)]){
var chr = s.charCodeAt(j++)
if (tmp[chr]){
if (tMap[chr] > tmp[chr]) tmp[chr]++
else break
}
else tmp[chr] = 1
}
if (areEqual (tmp,tMap)){
answer.push(start)
i = j - 1
}
}
}
return answer
}
//function to compare arrays
function areEqual(arr1,arr2){
if (arr1.length != arr2.length) return false
for (var i in arr1)
if (arr1[i] != arr2[i]) return false
return true
}
</script>
Output:
console.log(matchT("edf","ghjfedabcddef"))
[3, 10]
If the alphabet is not too large (say, ASCII), then there is no need to use a hash to take care of strings.
Just use a big array which is of the same size as the alphabet, and the existence checking becomes O(1). Thus the whole algorithm becomes O(m+n).
Let us consider for the given example,
String S: abcdef
String T: efd
Create a HashSet which consists of the characters present in the Substring T. So, the set consists of .
Generate a label for the Substring T: 1e1f1d. (number of occurences of each characters + the character itself, can be done using technique similar to count sort)
Now we have to generate labels for the input of the sub-string's length.
Let us start from the first position, which has character a. Since it is not present we do not create any sub-string and move to the next character b. Similarly, to character c and then stop at d.
Since d is present in the HashSet start generating labels(of the sub-string length) for each time the character appears. We can do this in different function to avoid clearing the count array(doing this reduces the complexity from O(m*n) to O(m+n)). If at any point the input string does not consists of the Substring T we can start the label generation from the next position(since the position till the break occurred cannot be a part of the anagram).
So, by generating the labels we can solve the problem in linear O(m+n) time complexity.
m: length of the input string,
n: length of the sub string.
That Code below I used for the pattern searching questions in GFG its accepted in all test cases and works in linear time.
// { Driver Code Starts
import java.util.*;
class Implement_strstr
{
public static void main(String args[])
{
Scanner sc = new Scanner(System.in);
int t = sc.nextInt();
sc.nextLine();
while(t>0)
{
String line = sc.nextLine();
String a = line.split(" ")[0];
String b = line.split(" ")[1];
GfG g = new GfG();
System.out.println(g.strstr(a,b));
t--;
}
}
}// } Driver Code Ends
class GfG
{
//Function to locate the occurrence of the string x in the string s.
int strstr(String a, String d)
{
if(a.equals("") && d.equals("")) return 0;
if(a.length()==1 && d.length()==1 && a.equals(d)) return 0;
if(d.length()==1 && a.charAt(a.length()-1)==d.charAt(0)) return a.length()-1;
int t=0;
int pl=-1;
boolean b=false;
int fl=-1;
for(int i=0;i<a.length();i++)
{
if(pl!=-1)
{
if(i==pl+1 && a.charAt(i)==d.charAt(t))
{
t++;
pl++;
if(t==d.length())
{
b=true;
break;
}
}
else
{
fl=-1;
pl=-1;
t=0;
}
}
else
{
if(a.charAt(i)==d.charAt(t))
{
fl=i;
pl=i;
t=1;
}
}
}
return b?fl:-1;
}
}
Here is the link to the question https://practice.geeksforgeeks.org/problems/implement-strstr/1

Resources