string combine and split algorithm - string

I am using Redis to store data. Each record is a sequence of string values joined with the separator :, for example:
value1:value2:value3
The problem is that the values themselves may contain :. My first thought was to escape : as :: inside each value, join the values, and then split on a single :.
But this is not perfect, because {'abc', 'aaa:', 'bbb'} is escaped to {'abc', 'aaa::', 'bbb'} and joined as abc:aaa:::bbb, which cannot be split unambiguously. This is probably a stupid question, but I'm stuck. How would you solve the problem, or is there a better approach?

I would instead suggest enclosing each value in a special marker character at its beginning and end when inserting, and then combining them, e.g.:
{'%abc%', '%aaa:%', '%bbb%'}
Whenever you want to split them again, split on the separator together with its surrounding markers (%:% here), then strip the leading and trailing marker according to your convention to get the original strings. This assumes that sequence never occurs inside a value.
Hope that Helps!
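A different technique from the marker idea above, shown only as a minimal sketch: escape with a character other than the separator (a backslash here), so that an escaped separator can never be confused with a real one. The names ESC, SEP, join_values and split_values are hypothetical, not part of Redis or of any library.

```python
ESC = "\\"  # escape character (an assumption; any character distinct from SEP works)
SEP = ":"   # separator used in the question


def join_values(values):
    """Escape ESC and SEP inside each value, then join with SEP."""
    escaped = [v.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP) for v in values]
    return SEP.join(escaped)


def split_values(packed):
    """Reverse join_values by scanning the string one character at a time."""
    values, current = [], []
    chars = iter(packed)
    for ch in chars:
        if ch == ESC:
            # The character after ESC is taken literally.
            current.append(next(chars, ""))
        elif ch == SEP:
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
    values.append("".join(current))
    return values


packed = join_values(["abc", "aaa:", "bbb"])
print(packed)
print(split_values(packed))
```

Because the escape character is itself escaped, values containing backslashes also survive the round trip. An empty list is not distinguished from a list holding one empty string in this sketch.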

Related

I want to join two strings, but separate the two strings by a space

I want to process files quickly through a program called star, but I have many files and want to pre-format the input from my files to save time. The required format is:
sample1read1.fq, sample2read1.fq \space\ sample1read2.fq, sample2read2.fq
[EDIT]: My files look like this:
trimmed_Sample_RX.fq where X can either be 1 or 2.
Star wants me to load all of the R1's together, separated by commas, then a space, and then all of my R2's together, separated by commas. To tackle this problem I have attempted to use the str.join method in Python:
import fnmatch
import os
def identifier(x):
    # sort key: the last 10 characters of the file name
    return x[-10:]
read1 = sorted(fnmatch.filter(os.listdir(PATH), '*_R1.fq'), key=identifier)
read1.append(' ')
read2 = sorted(fnmatch.filter(os.listdir(PATH), '*_R2.fq'), key=identifier)
first_half = ','.join(read1)
second_half = ','.join(read2)
star_input = first_half + second_half
print(star_input)
'trimmed_sample1R1.fq,trimmed_sample2R1.fq, , trimmed_sample1R2.fq,trimmed_sample2R2.fq'
I attempt to add a space to the end of my file list read1. Then I turn everything into a string and attempt to join the two strings together, but that space I added into my first half pops up in the concatenation as a comma
'trimmed_sample1R1.fq,trimmed_sample2R1.fq, , trimmed_sample1R2.fq,trimmed_sample2R2.fq'
If I remove the step where I append a blank space and then concatenate the two strings I get the following
'trimmed_sample1R1.fq,trimmed_sample2R1.fqtrimmed_sample1R2.fq,trimmed_sample2R2.fq'
So now the comma is gone, but I also lose the space.
Thanks.
I think I found a workaround, but if there is a better way to do this, please point it out to me. Basically, I kept my code the same but changed the last step. Now the flow looks like this:
first_half=','.join(read1)
second_half=','.join(read2)
star_input= '{} {}'.format(first_half, second_half)
print(star_input)
'trimmed_sample1R1.fq,trimmed_sample2R1.fq trimmed_sample1R2.fq,trimmed_sample2R2.fq'
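The stray comma can be reproduced in isolation: str.join places the separator between every pair of list elements, so a ' ' appended to the list is treated as one more element and gets a comma in front of it. The sketch below uses made-up file names and a hypothetical helper, build_star_input, that joins each half separately and only then adds the space.

```python
def build_star_input(read1_files, read2_files):
    """Comma-join each group, then separate the two groups with one space."""
    first_half = ",".join(read1_files)
    second_half = ",".join(read2_files)
    return first_half + " " + second_half


# Hypothetical file names, for illustration only.
read1 = ["trimmed_sample1_R1.fq", "trimmed_sample2_R1.fq"]
read2 = ["trimmed_sample1_R2.fq", "trimmed_sample2_R2.fq"]

# The space is a list element here, so join puts a comma before it.
with_space_element = ",".join(read1 + [" "])
print(repr(with_space_element))

# Joining first and adding the space afterwards avoids the extra comma.
print(build_star_input(read1, read2))
```

This is equivalent to the '{} {}'.format(...) workaround, provided the ' ' is no longer appended to read1.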

Python3 strip() gives unexpected result

It's a weird problem
to_be_stripped="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120"
And two strings below:
s1="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120\\[Content_Types].xml"
s2="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120\\_rels\\.rels"
When I use the command below:
s1.strip(to_be_stripped)
s2.strip(to_be_stripped)
I get these outputs:
'[Content_Types].x'
'_rels\\.'
If I use lstrip(), they will be:
'[Content_Types].xml'
'_rels\\.rels'
Which are the right outputs.
However, if we replace every ProjectKnown with zeus_pipeline:
to_be_stripped="D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120"
And:
s2="D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120\\_rels\\.rels"
Then s2.lstrip(to_be_stripped) will be '.rels'.
If I use / instead of \\, nothing goes wrong. I am wondering why this problem happens.
strip isn't meant to remove full strings exactly. Rather, you give it a string, and every character in that string is removed from the start and end of the string to be stripped.
In your case, the variable to_be_stripped contains the characters m and l, so those are stripped from the end of s1. However, it doesn't contain the character x, so the stripping stops there and no characters beyond that are removed.
Check out this question. The accepted answer is probably more extensive than you need - I like another user's suggestion of using replace instead of strip. This would look like:
s1.replace(to_be_stripped, "")
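To make the character-set behaviour concrete, here is a small sketch using the zeus_pipeline path from the question. The helper remove_prefix is a hypothetical name; it removes a prefix only when the string really starts with it. On Python 3.9 and later the built-in str.removeprefix does the same job.

```python
def remove_prefix(text, prefix):
    """Return text without prefix if it starts with it, else text unchanged."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


base = "D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120"
s2 = base + "\\_rels\\.rels"

# lstrip treats its argument as a set of characters: '\', '_', 'r', 'e', 'l'
# and 's' all occur in base, so stripping continues until the first '.'.
stripped = s2.lstrip(base)
print(stripped)

# Removing the exact prefix (including the trailing backslash) keeps the rest.
relative = remove_prefix(s2, base + "\\")
print(relative)
```

Unlike replace, this only touches the beginning of the string, so a later repetition of the same path fragment is left alone.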

Adding space in a specific position in a string of uppercase and lowercase letters

Dear stackoverflow users,
Many people encounter situations in which they need to modify strings. I have seen many
posts related to string modification. But, I have not come across solutions I am looking
for. I believe my post would be useful for some other R users who will face similar
challenges. I would like to seek some help from R users who are familiar with string
modification.
I have been trying to modify a string like the following.
x <- "Marcus HELLNERJohan OLSSONAnders SOEDERGRENDaniel RICHARDSSON"
There are four individuals in this string. Family names are in capital letters.
Three out of four family names stay in chunks with first names (e.g., HELLNERJohan).
I want to separate family names and first names adding space (e.g., HELLNER Johan).
I think I need to state something like "Select sequences of uppercase letters, and
add space between the last and second last uppercase letters, if there are lowercase
letters following."
The following post is probably somewhat relevant, but I have not yet been successful in writing the code.
Splitting String based on letters case
Thank you very much for your generous support.
This works by finding and capturing two consecutive sub-patterns, the first consisting of one upper case letter (the end of a family name), and the next consisting of an upper then a lower-case letter (taken to indicate the start of a first name). Everywhere these two groups are found, they are captured and replaced by themselves with a space inserted between (the "\\1 \\2" in the call below).
x <- "Marcus HELLNERJohan OLSSONAnders SOEDERGRENDaniel RICHARDSSON"
gsub("([[:upper:]])([[:upper:]][[:lower:]])", "\\1 \\2", x)
# "Marcus HELLNER Johan OLSSON Anders SOEDERGREN Daniel RICHARDSSON"
If you want to separate the string into a vector of names, this splits it using a regular expression with zero-width lookbehind and lookahead assertions.
strsplit(x, split = "(?<=[[:upper:]])(?=[[:upper:]][[:lower:]])",
         perl = TRUE)[[1]]
# [1] "Marcus HELLNER" "Johan OLSSON" "Anders SOEDERGREN"
# [4] "Daniel RICHARDSSON"
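For readers working outside R, the same two patterns can be sketched with Python's re module. This is an illustrative translation, not part of the original answer. It uses the ASCII classes [A-Z] and [a-z] in place of [[:upper:]] and [[:lower:]], and it relies on re.split accepting zero-width matches, which needs Python 3.7 or later.

```python
import re

x = "Marcus HELLNERJohan OLSSONAnders SOEDERGRENDaniel RICHARDSSON"

# Capture the last letter of a family name and the start of a first name,
# then put a space between the two groups.
spaced = re.sub(r"([A-Z])([A-Z][a-z])", r"\1 \2", x)
print(spaced)

# Split at the zero-width position between the two groups instead.
names = re.split(r"(?<=[A-Z])(?=[A-Z][a-z])", x)
print(names)
```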

MATLAB line continuation within string

In MATLAB, ... is used to continue a line on the next line. But what can I do if I want to continue a long string inside quotation marks? There, ... is treated as part of the string itself.
Using [] is not a perfect solution, since in most cases I use sprintf/fprintf to format a long string such as an SQL query, and using [] there would be cumbersome. Thanks.
If you put the string in brackets, you can build it in several pieces:
s = ['abc' 'def' ...
'ghi'];
You can then split that statement into several lines between the strings.
answer=['You can divide strings '...
,'by adding a comma '...
,'(as you probably know one year later).'];
You can use strcat or horzcat, which gives you somewhat more options than [], including the ability to mix in variables along with the hardcoded values.

identify common chars in correct order (kind of regular expression) from an array of strings

I am looking for a way to identify the common chars in a set of strings of different
lengths. The same problem was posted here before, and its author somehow found an answer, but I could not follow his solution. I tried to post my query
there, but I am not sure whether I will get any reply, so I am posting a new question. (The
old question is "Find common chars in array of strings, in the right order".)
I am taking the same example from him.
Let's assume "+" is the "wildcard char":
Array(
0 => '48ca135e0$5',
1 => 'b8ca136a0$5',
2 => 'c48ca13730$5',
3 => '48ca137a0$5');
Should return :
$wildcard='+8ca13+0$5';
This looks to me like a standard problem, so I suspect there is a library
for it. If not, please shed some light on how to solve it.
I don't think comparing char-by-char works (as suggested in the reply), because the matching chars can start anywhere (e.g. arr1[1] and arr2[3] can be the starting indices of a matching substring, and the other way around).
regards,
Looks like you're looking for the "longest common substring". The longest common substring is 8ca13, and the second longest is 0$5. Once you have these two strings, you can take any of the strings in the set and replace each run of leftover characters with a single +.
http://en.wikipedia.org/wiki/Longest_common_substring_problem
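The idea above can be sketched as a brute-force search followed by a recursive split around each common substring. This is only one possible reading of the answer, written in Python rather than PHP. The function names are hypothetical, the search is not efficient for long strings, and when a substring occurs more than once in a string the first occurrence is used.

```python
def longest_common_substring(strings):
    """Return the longest substring found in every string ('' if none)."""
    shortest = min(strings, key=len)
    # Try candidate lengths from longest to shortest.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ""


def wildcard_pattern(strings, wildcard="+"):
    """Keep common substrings and collapse everything else to one wildcard."""
    common = longest_common_substring(strings)
    if not common:
        # Nothing shared: emit a wildcard only if some characters are left.
        return wildcard if any(strings) else ""
    lefts = [s[:s.index(common)] for s in strings]
    rights = [s[s.index(common) + len(common):] for s in strings]
    return wildcard_pattern(lefts, wildcard) + common + wildcard_pattern(rights, wildcard)


data = ["48ca135e0$5", "b8ca136a0$5", "c48ca13730$5", "48ca137a0$5"]
print(longest_common_substring(data))
print(wildcard_pattern(data))
```

For the example from the question this produces the expected +8ca13+0$5. Splitting on the longest match first is a greedy choice, so other inputs may have several equally valid patterns.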
