How do I split a string with multiple separators in Lua?

I want to split a string into an array divided by multiple delimiters.
local delim = {",", " ", "."}
local s = "a, b c .d e , f 10, M10 , 20,5"
Result table should look like this:
{"a", "b", "c", "d", "e", "f", "10", "M10", "20", "5"}
Delimiters can be white spaces, commas or dots.
If two delimiters, such as a space and a comma, follow each other, they should be collapsed into one; additional whitespace should be ignored.

This code splits the string as required by building a pattern that matches runs of characters not in the delimiter set (the complement of the set).
local delim = {",", " ", "."}
local s = "a, b c .d e , f 10, M10 , 20,5"
local p = "[^"..table.concat(delim).."]+"
for w in s:gmatch(p) do
print(w)
end
Adapt the code to save the "words" in a table.
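For comparison, the same complement-of-delimiters idea can be sketched in Python, collecting the words in a list instead of printing them. This is only an illustration and not part of the Lua answer: the helper name split_on_any is made up here, it assumes a non-empty list of single-character delimiters, and re.escape is used so that characters with special meaning inside a character class are taken literally.

```python
import re

def split_on_any(text, delimiters):
    # Hypothetical helper: build a character class that matches anything
    # except the delimiters, then collect every maximal run of such characters.
    char_class = "".join(re.escape(d) for d in delimiters)
    return re.findall("[^" + char_class + "]+", text)

words = split_on_any("a, b c .d e , f 10, M10 , 20,5", [",", " ", "."])
print(words)
```

Because the pattern matches the words rather than the separators, consecutive delimiters collapse automatically and no empty strings end up in the result.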

Related

How to correctly read column from file when first element is empty

I have a data file data.txt
a
5 b
3 c 7
which I would like to load and have as
julia> loaded_data
3×3 Matrix{Any}:
"" "a" ""
5 "b" ""
3 "c" 7
but it is unclear to me how to do this. Trying readdlm
julia> using DelimitedFiles
julia> readdlm("data.txt")
3×3 Matrix{Any}:
"a" "" ""
5 "b" ""
3 "c" 7
does not recognize the first element of the first column as empty and instead reads "a" as the first element (which is understandable, since readdlm splits on whitespace). The closest I have gotten to what I want is with readlines
julia> readlines("data.txt")
3-element Vector{String}:
" a "
"5 b "
"3 c 7"
but from here I'm not sure how to proceed. I can grab one of the rows that has all the columns and split it, but I'm not sure how that helps me identify the empty elements in the other rows.
Here's a possibility:
cnv(s) = (length(s) > 0 && all(isdigit, s)) ? parse(Int, s) : s
cnv.(stack(split.(replace.(eachline("data.txt")," "=>" "), " "), dims=1))
If the contents of the columns are sufficiently distinguishable to make the parsing uniquely defined, I'd use a regex on each line:
julia> lines
3-element Vector{String}:
" a "
"5 b "
"3 c 7"
julia> [match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", s).captures for s in lines]
3-element Vector{Vector{Union{Nothing, SubString{String}}}}:
["", "a", ""]
["5", "b", ""]
["3", "c", "7"]
You can then proceed to parse and concatenate as you wish, e.g.
julia> mapreduce(vcat, lines) do line
x, y, z = match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", line).captures
[tryparse(Int, x) y tryparse(Int, z)]
end
3×3 Matrix{Any}:
nothing "a" nothing
5 "b" nothing
3 "c" 7
In Julia 1.9, I think you should be able to write this as
stack(lines; dims=1) do line
x, y, z = match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", line).captures
(tryparse(Int, x), y, tryparse(Int, z))
end
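The per-line regex idea is not specific to Julia. As a rough sketch in Python, assuming the same three-column layout (optional digits, optional lowercase letters, optional digits) and using made-up helper names parse_line and parse_lines:

```python
import re

# Same pattern as in the answer above: three optional fields separated by whitespace.
ROW = re.compile(r"\s*(\d*)\s*([a-z]*)\s*(\d*)")

def parse_line(line):
    # Every group is optional, so match() always succeeds; a missing field is "".
    x, y, z = ROW.match(line).groups()
    return [int(x) if x else None, y, int(z) if z else None]

def parse_lines(lines):
    return [parse_line(line) for line in lines]

print(parse_lines([" a ", "5 b ", "3 c 7"]))
```

Empty numeric fields become None here, which mirrors the nothing values produced by tryparse in the Julia version.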
This problem may have many edge cases to clarify.
Here is a longer option than the other answer, but perhaps better suited to tweak for the edge cases:
function splittable(d)
# find all non-space locations
t = sort(union(findall.(!isspace, d)...))
# find initial indices of fields
tt = t[vcat(1,findall(diff(t).!=1).+1)]
# prepare ranges to extract fields
tr = [tt[i]:tt[i+1]-1 for i in 1:length(tt)-1]
# extract substrings
vs = map(s -> strip.(vcat([s[intersect(r,eachindex(s))] for r in tr],
tt[end]<=length(s) ? s[tt[end]:end] : "")), d)
# fit substrings into matrix
L = maximum(length.(vs))
String.([j <= length(vs[i]) ? vs[i][j] : ""
for i in 1:length(vs), j in 1:L])
end
And:
julia> d = readlines("data.txt")
3-element Vector{String}:
" a "
"5 b "
"3 c 7"
julia> dd = splittable(d)
3×3 Matrix{String}:
"" "a" ""
"5" "b" ""
"3" "c" "7"
To get the partial parsing effect:
function parsewhatmay(m)
M = tryparse.(Int, m)
map((x,y)->isnothing(x) ? y : x, M, m)
end
and now:
julia> parsewhatmay(dd)
3×3 Matrix{Any}:
"" "a" ""
5 "b" ""
3 "c" 7
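The column-detection idea behind splittable can also be sketched in Python. This is an illustration under assumptions, not a port of the Julia code: it assumes the file is laid out in fixed-width columns (the sample lines below are a guess at that layout, since the exact spacing of data.txt is not shown), and the name split_table is made up.

```python
def split_table(lines):
    # Columns that hold a non-space character in at least one line.
    used = sorted({i for line in lines
                   for i, ch in enumerate(line) if not ch.isspace()})
    # A field starts wherever a used column does not directly follow another one.
    starts = [c for k, c in enumerate(used) if k == 0 or used[k - 1] != c - 1]
    # Each field runs up to the start of the next; the last runs to end of line.
    bounds = list(zip(starts, starts[1:] + [None]))
    return [[line[a:b].strip() for a, b in bounds] for line in lines]

# Assumed fixed-width layout: the first field of the first row is blank.
print(split_table(["  a", "5 b", "3 c 7"]))
```

Slicing past the end of a short line simply yields an empty string, so rows with missing trailing fields need no special handling.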

Julia: concat strings with separator (equivalent of R's paste)

I have an array of strings that I would like to concatenate together with a specific separator.
x = ["A", "B", "C"]
Expected results (with sep = ;):
"A; B; C"
The R equivalent would be paste(x, collapse="; ")
I've tried things like string(x), but the result is not what I'm looking for...
Use join. It is not clear if you want ";" or "; " as a separator.
julia> x = ["A", "B", "C"]
3-element Array{String,1}:
"A"
"B"
"C"
julia> join(x, ';')
"A;B;C"
julia> join(x, "; ")
"A; B; C"
If you just want ; as the separator, a character ';' is enough. If you also want the space, you need to use a string: "; "

Find and print vowels from a string using a while loop

Study assignment (using python 3):
For a study assignment I need to write a program that prints the indices of all vowels in a string, preferably using a 'while-loop'.
So far I have managed to write a 'for-loop' to get the job done, but I could use some help with the 'while-loop'.
for-loop solution:
string = input( "Typ in a string: " )
vowels = "a", "e", "i", "o", "u"
indices = ""
for i in string:
    if i in vowels:
        indices += i
print( indices )
while-loop solution:
string = input( "Typ in a string: " )
vowels = "a", "e", "i", "o", "u"
indices = ""
while i < len( string ):
    <code>
    i += 1
print( indices )
Would using 'index()' or 'find()' work here?
Try this:
string = input( "Typ in a string: " )
vowels = ["a", "e", "i", "o", "u"]
higher_bound=1
lower_bound=0
while lower_bound<higher_bound:
    convert_str=list(string)
    find_vowel=list(set(vowels).intersection(convert_str))
    print("Vowels in {} are {}".format(string,"".join(find_vowel)))
    lower_bound+=1
You can also set higher_bound to len(string); the result will then be printed as many times as the string is long.
Since this is a study assignment, you should work through it and practice yourself instead of copying and pasting. Here is some additional information on the solution:
In mathematics, the intersection A ∩ B of two sets A and B is the set
that contains all elements of A that also belong to B (or
equivalently, all elements of B that also belong to A), but no other
elements.
In Python:
The syntax of intersection() in Python is:
A.intersection(*other_sets)
A = {2, 3, 5, 4}
B = {2, 5, 100}
C = {2, 3, 8, 9, 10}
print(B.intersection(A))
print(B.intersection(C))
print(A.intersection(C))
print(C.intersection(A, B))
You can get the character at index x of a string by doing string[x]!
i = 0 # initialise i to 0 here first!
while i < len( string ):
    if string[i] in vowels:
        indices += str(i)
    i += 1
print( indices )
However, is making indices a str really suitable? I don't think so, since there are no separators between the indices. Does the string "12" mean that there are two vowels at indices 1 and 2, or one vowel at index 12? You can use a list to store the indices instead:
indices = []
And you can add i to it by doing:
indices.append(i)
BTW, your for loop solution will print the vowel characters, not the indices.
If you don't want to use lists, you can also add an extra space after each index.
indices += str(i) + " "
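Putting the pieces of this answer together, a minimal sketch of the while-loop version with a list might look like the following. The function name vowel_indices is made up for illustration, and only lowercase vowels are checked, as in the question.

```python
def vowel_indices(text, vowels=("a", "e", "i", "o", "u")):
    indices = []              # a list keeps the indices unambiguous
    i = 0                     # initialise the counter before the loop
    while i < len(text):
        if text[i] in vowels:
            indices.append(i)
        i += 1                # advance on every iteration, vowel or not
    return indices

print(vowel_indices("hello world"))
```

Printing the list directly shows the separators, so indices 1 and 2 cannot be confused with index 12.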

How can I format this list into one string?

I have a list that contains a word. Each letter is separated by a space (as seen below).
word = ["h", " ", "e", " ", "l", " ", "l", " ", "o", " "]
I am trying to get it to print in the format:
h e l l o
I tried using a print statement (among other things) but it just came out:
["h", " ", "e", " ", "l", " ", "l", " ", "o", " "]
How do I fix this?
You could use str.join(iterable) to join them together as one string:
"".join(word)
This joins all elements of the list using an empty string as the separator, essentially concatenating them into one string. Then you can print it:
print("".join(word))
This will produce
h e l l o
Just use the join function to convert the list into a string:
print ("".join(word))
The "" before .join is the separator placed between the elements; an empty string means nothing is inserted. You can use any string you like as the separator, such as spaces or digits.
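To make the role of the separator concrete, here is a small sketch comparing an empty separator with a single space. The variable names joined, letters, and spaced are just for illustration.

```python
word = ["h", " ", "e", " ", "l", " ", "l", " ", "o", " "]

# Empty separator: the elements are concatenated exactly as they are,
# so the spaces already in the list (including the last one) are kept.
joined = "".join(word)

# Alternative: drop the space elements and let join insert the spaces.
letters = [ch for ch in word if ch != " "]
spaced = " ".join(letters)

print(repr(joined))
print(repr(spaced))
```

The second form avoids the trailing space, since join only places the separator between elements.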

Generating substrings and random strings in R

Please bear with me, I come from a Python background and I am still learning string manipulation in R.
OK, so let's say I have a string of length 100 made up of random letters A, B, C, or D:
> df<-c("ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD")
> df
[1]"ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD"
I would like to do the following two things:
1) Generate a '.txt' file that is comprised of 20-length subsections of the above string, each starting one letter after the previous with their own unique name on the line above it, like this:
NAME1
ABCBDBDBCBABABDBCBCB
NAME2
BCBDBDBCBABABDBCBCBD
NAME3
CBDBDBCBABABDBCBCBDB
NAME4
BDBDBCBABABDBCBCBDBD
... and so forth
2) Take that generated list and from it comprise another list that has the same exact substrings with the only difference being a change of one or two of the A, B, C, or Ds to another A, B, C, or D (any of those four letters only).
So, this:
NAME1
ABCBDBDBCBABABDBCBCB
Would become this:
NAME1.1
ABBBDBDBCBDBABDBCBCB
As you can see, the "C" in the third position became a "B" and the "A" in position 11 became a "D", with no implied relationship between those changed letters. Purely random.
I know this is a convoluted question, but like I said, I am still learning basic text and string manipulation in R.
Thanks in advance.
Create a text file of substrings
n <- 20 # length of substrings
starts <- seq(nchar(df) - n + 1) # start position of every substring
v1 <- mapply(substr, starts, starts + n - 1, MoreArgs = list(x = df))
names(v1) <- paste0("NAME", seq_along(v1), "\n")
write.table(v1, file = "filename.txt", quote = FALSE, sep = "",
col.names = FALSE)
Randomly replace one or two letters (A-D):
myfun <- function() {
idx <- sample(seq(n), sample(1:2, 1))
rep <- sample(LETTERS[1:4], length(idx), replace = TRUE)
return(list(idx = idx, rep = rep))
}
new <- replicate(length(v1), myfun(), simplify = FALSE)
v2 <- mapply(function(x, y, z) paste(replace(x, y, z), collapse = ""),
strsplit(v1, ""),
lapply(new, "[[", "idx"),
lapply(new, "[[", "rep"))
# rebuild the names: names(v2) is inherited from v1 and still ends in "\n"
names(v2) <- paste0("NAME", seq_along(v2), ".1")
write.table(v2, file = "filename2.txt", quote = FALSE, sep = "\n",
col.names = FALSE)
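Since the question mentions a Python background, the two steps can also be sketched in Python for comparison. This is only an illustration under assumptions: the function names windows, mutate, and format_records are made up, the replacement letter is drawn from all four letters (so a "change" may occasionally pick the same letter, as in the R code above), and strings are assumed to have at least two characters.

```python
import random

def windows(s, n):
    # Every length-n substring, each starting one character after the previous.
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def mutate(s, rng, alphabet="ABCD"):
    # Replace one or two randomly chosen positions with a random letter.
    chars = list(s)
    k = rng.choice([1, 2])
    for pos in rng.sample(range(len(chars)), k):
        chars[pos] = rng.choice(alphabet)
    return "".join(chars)

def format_records(seqs, suffix=""):
    # One name line followed by one sequence line per record.
    return "".join(f"NAME{i}{suffix}\n{seq}\n"
                   for i, seq in enumerate(seqs, start=1))

rng = random.Random(1)            # seeded so the run is repeatable
subs = windows("ABCDABCDAB", 4)   # short stand-in for the 100-letter string
print(format_records(subs), end="")
print(format_records([mutate(s, rng) for s in subs], ".1"), end="")
```

Writing the two formatted strings to files instead of printing them would give the two '.txt' files described in the question.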
I tried breaking this down into multiple simple steps; hopefully you can learn a few tricks from this:
# Random data
df<-c("ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD")
n<-10 # Number of cuts
set.seed(1)
# Pick n random numbers between 1 and the length of string-20
nums<-sample(1:(nchar(df)-20),n,replace=TRUE)
# Make your cuts
cuts<-sapply(nums,function(x) substring(df,x,x+20-1))
# Generate some names
nams<-paste0('NAME',1:n)
# Make it into a matrix, transpose, and then recast into a vector to get alternating names and cuts.
names.and.cuts<-c(t(matrix(c(nams,cuts),ncol=2)))
# Drop a file.
write.table(names.and.cuts,'file.txt',quote=FALSE,row.names=FALSE,col.names = FALSE)
# Pick how many changes are going to be made to each cut.
changes<-sample(1:2,n,replace=TRUE)
# Pick that number of positions to change
pos.changes<-lapply(changes,function(x) sample(1:20,x))
# Find the letter at each position within the corresponding cut (not within df).
letter.at.change.pos<-mapply(function(cut,pos) substring(cut,pos,pos),cuts,pos.changes,SIMPLIFY=FALSE,USE.NAMES=FALSE)
# Make a function that takes any letter, and outputs any other letter from c(A-D)
letter.map<-function(x){
# Make a list of alternate letters.
alternates<-lapply(x,setdiff,x=c('A','B','C','D'))
# Pick one of each
sapply(alternates,sample,size=1)
}
# Find another letter for each
letter.changes<-lapply(letter.at.change.pos,letter.map)
# Make a function to replace character by position
# Inefficient, but who cares.
rep.by.char<-function(str,pos,chars){
for (i in 1:length(pos)) substr(str,pos[i],pos[i])<-chars[i]
str
}
# Change every letter at pos.changes to letter.changes
mod.cuts<-mapply(rep.by.char,cuts,pos.changes,letter.changes,USE.NAMES=FALSE)
# Generate names
nams<-paste0(nams,'.1')
# Use the matrix trick to alternate names. Drop a file.
names.and.mod.cuts<-c(t(matrix(c(nams,mod.cuts),ncol=2)))
write.table(names.and.mod.cuts,'file2.txt',quote=FALSE,row.names=FALSE,col.names = FALSE)
Also, instead of the rep.by.char function, you could just use strsplit and replace like this:
mod.cuts<-mapply(function(x,y,z) paste(replace(x,y,z),collapse=''),
strsplit(cuts,''),pos.changes,letter.changes,USE.NAMES=FALSE)
One way, albeit slowish:
Rgames> foo<-paste(sample(c('a','b','c','d'),20,rep=T),sep='',collapse='')
Rgames> bar<-matrix(unlist(strsplit(foo,'')),ncol=5)
Rgames> bar
[,1] [,2] [,3] [,4] [,5]
[1,] "c" "c" "a" "c" "a"
[2,] "c" "c" "b" "a" "b"
[3,] "b" "b" "a" "c" "d"
[4,] "c" "b" "a" "c" "c"
Now you can select random indices and replace the selected locations with sample(c('a','b','c','d'),1) . For "true" randomness, I wouldn't even force a change - if your newly drawn letter is the same as the original, so be it.
Like this:
ibar<-sample(1:5,4,rep=T) # one random column number for each row
for ( j in 1: 4) bar[j,ibar[j]]<-sample(c('a','b','c','d'),1)
Then, if necessary, recombine each row using paste
For the first part of your question:
df <- c("ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD")
nstrchars <- 20
count<- nchar(df)-nstrchars+1 # number of substrings, including the last one
length20substrings <- data.frame(length20substrings=sapply(1:count,function(x)substr(df,x,x+nstrchars-1)))
# to save to a text file. I chose not to include row names or a column name in the .txt file
write.table(length20substrings,"length20substrings.txt",row.names=F,col.names=F)
For the second part:
# create a function that will randomly pick one or two spots in a string and replace
# those spots with one of the other characters present in the string:
changefxn<- function(x){
x<-as.character(x)
nc<-nchar(as.character(x))
id<-seq(1,nc)
numchanges<-sample(1:2,1)
ids<-sample(id,numchanges)
chars2repl<-strsplit(x,"")[[1]][ids]
charspresent<-unique(unlist(strsplit(x,"")))
splitstr<-unlist(strsplit(x,""))
if (numchanges>1) {
splitstr[ids[1]] <- sample(setdiff(charspresent,chars2repl[1]),1)
splitstr[ids[2]] <- sample(setdiff(charspresent,chars2repl[2]),1)
}
else {splitstr[ids[1]] <- sample(setdiff(charspresent,chars2repl[1]),1)
}
newstr<-paste(splitstr,collapse="")
return(newstr)
}
# try it out
changefxn("asbbad")
changefxn("12lkjaf38gs")
# apply changefxn to all the substrings from part 1
length20substrings<-length20substrings[seq_along(length20substrings[,1]),]
newstrings <- lapply(length20substrings, function(ii)changefxn(ii))
