String Compress with while and for loop - python-3.x

I am writing a string compression feature with a while loop and a for loop. The idea is:
1. Starting from the first character, loop through the string and stop when the next character is different.
2. Slice the string to remove that leading run of identical characters.
3. Repeat until the length of the string is 0.
My initial code didn't work, so I used a helper: I append two "$$" characters to the end of the initial string and loop until only the last two $$ remain.
Can anyone help me solve this problem without using the helper?
Thank you very much.
Here is my code:
text = 'aaxxxxxxxbccccaaxxxaa'
text = text + "$$"  # this is the helper; I would like to do the task without it
count = 0
result = ''
while len(text) > 2:
    for x in range(0, len(text)):
        if text[x] == text[0]:
            # print(text[x])
            count += 1
        else:
            print(text[0] + str(count))
            result = result + text[0] + str(count)
            print(result)
            text = text[count:]
            count = 0
            break
P.S. If I don't use the helper, I get an out-of-range error on the string.

The code below prints: a2x7b1c4a2x3a2
text = 'aaxxxxxxxbccccaaxxxaa'
text = text + "$$"  # this is the helper; I would like to do the task without it
count = 0
result = ''
while len(text) > 2:
    for x in range(0, len(text)):
        if text[x] == text[0]:
            # print(text[x])
            count += 1
        else:
            # print(text[0] + str(count))
            result = result + text[0] + str(count)
            # print(result)
            text = text[count:]
            count = 0
            break
print(result)
I want to remove the helper and get the same a2x7b1c4a2x3a2 from the code below, but the Jupyter notebook hangs (I guess an infinite loop happens):
text = 'aaxxxxxxxbccccaaxxxaa'
count = 0
result = ''
while len(text) > 0:
    for x in range(0, len(text)):
        if text[x] == text[0]:
            # print(text[x])
            count += 1
        else:
            # print(text[0] + str(count))
            result = result + text[0] + str(count)
            # print(result)
            text = text[count:]
            count = 0
            break
print(result)
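The version without the helper hangs because the final run of characters ('aa') never reaches the else branch: the for loop finishes without slicing text, so len(text) stays above 0 and the while loop repeats forever. One possible fix, sketched below, is to move the bookkeeping out of the else branch so that it runs after the for loop whether or not a different character was found. The function name compress is only an illustrative choice, not something from the question.

```python
def compress(text):
    """Run-length encode text, e.g. 'aab' -> 'a2b1' (illustrative helper)."""
    result = ''
    while len(text) > 0:
        count = 0
        # Count how many leading characters equal the first one.
        for ch in text:
            if ch == text[0]:
                count += 1
            else:
                break
        # This runs for every run, including the last one,
        # so no "$$" sentinel is needed.
        result += text[0] + str(count)
        text = text[count:]
    return result


print(compress('aaxxxxxxxbccccaaxxxaa'))  # a2x7b1c4a2x3a2
```

Because count is always at least 1 for a non-empty string, text shrinks on every pass and the while loop is guaranteed to terminate.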

Related

Why if there is no break this code keep returning my input?

Could anyone help me understand why, if I don't put a break there, the code gives me repeated output? For example:
In : myfunc('abogoboga')
Out : 'aBoGoBoGaaBoGoBoGaaBoGoBoGaaBoGoBoGaaBoGoBoGaaBoGoBoGaaBoGoBoGaaBoGoBoGaaBoGoBoGa'
def myfunc(*args):
    output = []
    for strg in args:
        for char in strg:
            for i in range(len(strg)):
                if i % 2 == 0:
                    output.append(strg[i].lower())
                else:
                    output.append(strg[i].upper())
            break
    return ''.join(output)
but, after putting break as above:
In : myfunc('abogoboga')
Out : 'aBoGoBoGa'
Your two nested for loops both walk over the same string. for char in strg assigns each character of strg to char, but you never use char. Instead, you iterate over strg again by index inside it, so the whole converted string is appended once per character. It works with the break because breaking after one pass of for char in strg makes that loop run only once, which turns it into a plain statement. A simpler way of doing this is to remove for char in strg:
def myfunc(*args):
    output = []
    for strg in args:
        for i in range(len(strg)):
            if i % 2 == 0:
                output.append(strg[i].lower())
            else:
                output.append(strg[i].upper())
    return ''.join(output)
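As a possible variation on the same idea, enumerate gives both the index and the character in one loop, so there is no need to index back into the string. This is only a sketch of an alternative style, not part of the original answer.

```python
def myfunc(*args):
    output = []
    for strg in args:
        # enumerate yields (index, character) pairs for each string.
        for i, char in enumerate(strg):
            output.append(char.lower() if i % 2 == 0 else char.upper())
    return ''.join(output)


print(myfunc('abogoboga'))  # aBoGoBoGa
```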

a="python". I want output: p-y-t-h-o-n. Any loop possible for this

a="python"
I want output with the help of loop as: p-y-t-h-o-n
I want this code to be dynamic by using input(), so that the characters of any input entered are separated by hyphens (-).
a = "python"
b = '-'.join(a)
print(b)
unless you really need a loop
a = "python"
print('-'.join(a))
Untested, also without a loop
If you really need a loop:
s = input('Enter a String ')
res = ''
for i in range(0, len(s)):
    res += s[i] + '-'
res = res[:-1]
print(res)
If you actually need a for loop.
a = 'python'
s = ''
# Iterate through each character and build a new string, appending - after each one
for i in a:
    s += i + '-'
# Remove the last -
s = s[:-1]
print(s)
# p-y-t-h-o-n

Python. Trying to write a function called one_frame. Does not seem to work. Help would be greatly appreciated

As of right now, this is my code:
def get_orf(DNA):
    codon = ''
    if(DNA[0:3] == 'ATG'):
        codon = DNA[0:3]
        for x in range(3,len(DNA)+1,3):
            if DNA[x:x+3] == "TAG" or DNA[x:x+3] == "TAA" or DNA[x:x+3] == "TGA":
                return codon
            else: codon = codon + DNA[x:x+3]
    if codon[-3:] in ["TAG", "TAA", "TGA"]:
        return codon
    else: return 'No ORF'
def one_frame(DNA):
    x = 0
    ORFlist = []
    while x < len(DNA):
        codon = DNA[x:]
        if DNA.startswith('ATG'):
            get_orf(DNA[x:])
            if codon:
                ORFlist.append(codon)
                x += len(codon)
    return(ORFlist)
The get_orf function works fine, but my one_frame function doesn't work.
The one_frame function is supposed to take a DNA string as input. It searches that string from left to right in multiples of three nucleotides, that is, in a single reading frame. When it hits a start codon "ATG", it calls get_orf on the slice of the string beginning at that start codon (until the end) to get back an ORF. That ORF is added to a list of ORFs, and then the function skips ahead in the DNA string to the point right after the ORF that we just found and starts looking for the next ORF. This is repeated until we've traversed the entire DNA string.
I can see a few obvious problems, but I'm not sure exactly what you want, so I hope this helps. First, the while loop in one_frame will never end unless DNA starts with 'ATG'. I think you want to check codon.startswith instead of DNA.startswith. You also need to update x outside of the if statement; otherwise it is never updated when you don't hit 'ATG', and the loop continues forever. You're also not using the return value of get_orf at all.
I think this will do the trick,
def one_frame(DNA):
    x = 0
    ORFlist = []
    while x < len(DNA):
        codon = DNA[x:]
        # Check codon instead of DNA
        if codon.startswith('ATG'):
            # Record the return value of get_orf
            orf_return_value = get_orf(DNA[x:])
            if orf_return_value:
                ORFlist.append(orf_return_value)
                x += len(orf_return_value)
        # Increment by 3 if we don't hit ATG
        else:
            x += 3
    return(ORFlist)
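One thing to watch with this approach: the question's get_orf returns the string 'No ORF' when no stop codon is found, and that string is truthy, so it would be appended to the list like a real ORF. A minimal sketch that skips the sentinel is shown here. The get_orf below is a simplified stand-in written for this example (it only checks complete codons) and is not the asker's exact function; STOP_CODONS and NO_ORF are names introduced for illustration.

```python
STOP_CODONS = ("TAG", "TAA", "TGA")
NO_ORF = "No ORF"  # same sentinel string the question's get_orf returns


def get_orf(dna):
    """Simplified stand-in: ORF from a leading ATG up to (not including) a stop codon."""
    if dna[0:3] != "ATG":
        return NO_ORF
    for i in range(3, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[:i]
    return NO_ORF


def one_frame(dna):
    orfs = []
    x = 0
    while x < len(dna):
        if dna[x:x + 3] == "ATG":
            orf = get_orf(dna[x:])
            if orf != NO_ORF:
                orfs.append(orf)
                x += len(orf)  # jump to the stop codon that ended this ORF
                continue
        x += 3  # no ORF starting here: move to the next codon in the frame
    return orfs


print(one_frame("ATGCCCTAGAAAATGGGGTGA"))  # ['ATGCCC', 'ATGGGG']
```

Every path through the loop increases x by at least 3, so the loop cannot run forever.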

Python 2.7 - remove special characters from a string and camelCasing it

Input:
to-camel-case
to_camel_case
Desired output:
toCamelCase
My code:
def to_camel_case(text):
    lst =['_', '-']
    if text is None:
        return ''
    else:
        for char in text:
            if text in lst:
                text = text.replace(char, '').title()
                return text
Issues:
1) The input could be an empty string; the above code does not return '' but None.
2) I am not sure that the title() method can help me obtain the desired output (only the first letter of each word after a '-' or '_' in caps, except for the first word).
I prefer not to use regex if possible.
A better way to do this is to split the string and rebuild it with a generator expression. The problem with the for loop is that you reassign text while you are still looping over its characters, which makes the logic hard to follow. It's also hard to capitalize the next letter after replacing a _ or - because you don't have any context about what came before or after.
def to_camel_case(text):
    # Split also removes the characters
    # Start by converting - to _, then splitting on _
    l = text.replace('-','_').split('_')
    # No text left after splitting
    if not len(l):
        return ""
    # Break the list into two parts
    first = l[0]
    rest = l[1:]
    return first + ''.join(word.capitalize() for word in rest)
And our result:
print to_camel_case("hello-world")
Gives helloWorld
This method is quite flexible, and can even handle cases like "hello_world-how_are--you--", which could be difficult using regex if you're new to it.
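To make the edge cases from the question concrete, here is a small sketch of the same split-and-capitalize idea with an explicit guard for None and the empty string. It is written so that it also runs on Python 3; the guard is an addition for illustration rather than part of the answer above.

```python
def to_camel_case(text):
    # Treat None and '' the same way: nothing to convert.
    if not text:
        return ''
    # Normalize '-' to '_' so a single split handles both separators.
    words = text.replace('-', '_').split('_')
    # Keep the first word as is; capitalize the rest.
    # Empty pieces (from doubled or trailing separators) capitalize to ''.
    return words[0] + ''.join(word.capitalize() for word in words[1:])


print(to_camel_case('to-camel-case'))               # toCamelCase
print(to_camel_case('hello_world-how_are--you--'))  # helloWorldHowAreYou
```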

Find string between two substrings [duplicate]

How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?
My current method is like this:
>>> start = 'asdf=5;'
>>> end = '123jasd'
>>> s = 'asdf=5;iwantthis123jasd'
>>> print((s.split(start))[1].split(end)[0])
iwantthis
However, this seems very inefficient and un-pythonic. What is a better way to do something like this?
Forgot to mention:
The string might not start and end with start and end. They may have more characters before and after.
import re
s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print(result.group(1))
s = "123123STRINGabcabc"
def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""
def find_between_r( s, first, last ):
    try:
        start = s.rindex( first ) + len( first )
        end = s.rindex( last, start )
        return s[start:end]
    except ValueError:
        return ""
print(find_between( s, "123", "abc" ))
print(find_between_r( s, "123", "abc" ))
gives:
123STRING
STRINGabc
I thought it should be noted: depending on what behavior you need, you can mix index and rindex calls or go with one of the above versions (the choice is the equivalent of the greedy (.*) and non-greedy (.*?) regex groups).
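A short sketch of that correspondence, using the same sample string: pairing index for the first marker with index for the last marker behaves like the non-greedy group, while pairing index with rindex behaves like the greedy one. The variable names lazy and greedy are chosen here for illustration.

```python
import re

s = "123123STRINGabcabc"
first, last = "123", "abc"

start = s.index(first) + len(first)

# index + index: stop at the FIRST "abc" after the start marker.
lazy = s[start:s.index(last, start)]
# index + rindex: stop at the LAST "abc" in the string.
greedy = s[start:s.rindex(last)]

# re.escape guards against markers that contain regex metacharacters.
lazy_re = re.search(re.escape(first) + "(.*?)" + re.escape(last), s).group(1)
greedy_re = re.search(re.escape(first) + "(.*)" + re.escape(last), s).group(1)

print(lazy, lazy_re)      # 123STRING 123STRING
print(greedy, greedy_re)  # 123STRINGabc 123STRINGabc
```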
start = 'asdf=5;'
end = '123jasd'
s = 'asdf=5;iwantthis123jasd'
print s[s.find(start)+len(start):s.rfind(end)]
gives
iwantthis
s[len(start):-len(end)]
String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be amended as desired.
import re
s = 'asdf=5;iwantthis123jasd'
start = 'asdf=5;'
end = '123jasd'
result = re.search('%s(.*)%s' % (start, end), s).group(1)
print(result)
Just converting the OP's own solution into an answer:
def find_between(s, start, end):
    return (s.split(start))[1].split(end)[0]
If you don't want to import anything, try the string method .index():
text = 'I want to find a string between two substrings'
left = 'find a '
right = 'between two'
# Output: 'string'
print(text[text.index(left)+len(left):text.index(right)])
source='your token _here0#df and maybe _here1#df or maybe _here2#df'
start_sep='_'
end_sep='#df'
result=[]
tmp=source.split(start_sep)
for par in tmp:
    if end_sep in par:
        result.append(par.split(end_sep)[0])
print(result)
This prints:
['here0', 'here1', 'here2']
Regex is arguably better, but it requires importing the re module, and you may prefer to use plain string methods only.
Here is one way to do it
_,_,rest = s.partition(start)
result,_,_ = rest.partition(end)
print result
Another way using regexp
import re
print re.findall(re.escape(start)+"(.*)"+re.escape(end),s)[0]
or
print re.search(re.escape(start)+"(.*)"+re.escape(end),s).group(1)
Here is a function I did to return a list with a string(s) inbetween string1 and string2 searched.
def GetListOfSubstrings(stringSubject,string1,string2):
    MyList = []
    intstart=0
    strlength=len(stringSubject)
    continueloop = 1
    while(intstart < strlength and continueloop == 1):
        intindex1=stringSubject.find(string1,intstart)
        if(intindex1 != -1): #The substring was found, lets proceed
            intindex1 = intindex1+len(string1)
            intindex2 = stringSubject.find(string2,intindex1)
            if(intindex2 != -1):
                subsequence=stringSubject[intindex1:intindex2]
                MyList.append(subsequence)
                intstart=intindex2+len(string2)
            else:
                continueloop=0
        else:
            continueloop=0
    return MyList
#Usage Example
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","y68")
for x in range(0, len(List)):
    print(List[x])
output: (nothing is printed, because "y68" does not occur in mystring)
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","3")
for x in range(0, len(List)):
    print(List[x])
output:
2
2
2
2
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","y")
for x in range(0, len(List)):
    print(List[x])
output:
23
23o123pp123
To extract STRING, try:
myString = '123STRINGabc'
startString = '123'
endString = 'abc'
mySubString=myString[myString.find(startString)+len(startString):myString.find(endString)]
You can simply use this code or copy the function below. All neatly in one line.
def substring(whole, sub1, sub2):
    return whole[whole.index(sub1) : whole.index(sub2)]
If you run the function as follows:
print(substring("5+(5*2)+2", "(", ")"))
you will get the output:
(5*2
rather than
5*2
If you want to keep both substrings at the ends of the output, the code must look like this:
    return whole[whole.index(sub1) : whole.index(sub2) + 1]
But if you don't want the substrings at the ends, the +1 must be on the first value:
    return whole[whole.index(sub1) + 1 : whole.index(sub2)]
These solutions assume the start string and final string are different. Here is a solution I use for an entire file when the initial and final indicators are the same, assuming the entire file is read using readlines():
def extractstring(line,flag='$'):
    if flag in line: # $ is the flag
        dex1=line.index(flag)
        subline=line[dex1+1:-1] #leave out flag (+1) to end of line
        dex2=subline.index(flag)
        string=subline[0:dex2].strip() #does not include last flag, strip whitespace
        return(string)
Example:
lines=['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
       'afafoaltat $I GOT BETTER!$ derpity derp derp']
for line in lines:
    string=extractstring(line,flag='$')
    print(string)
Gives:
A NEWT?
I GOT BETTER!
This is essentially cji's answer - Jul 30 '10 at 5:58.
I changed the try except structure for a little more clarity on what was causing the exception.
import sys

def find_between( inputStr, firstSubstr, lastSubstr ):
    '''
    find between firstSubstr and lastSubstr in inputStr STARTING FROM THE LEFT
    http://stackoverflow.com/questions/3368969/find-string-between-two-substrings
    above also has a func that does this FROM THE RIGHT
    '''
    start, end = (-1,-1)
    try:
        start = inputStr.index( firstSubstr ) + len( firstSubstr )
    except ValueError:
        # Report which substring was missing, then fall through
        print(' ValueError: ', end=' ')
        print("firstSubstr=%s - "%( firstSubstr ), end=' ')
        print(sys.exc_info()[1])
    try:
        end = inputStr.index( lastSubstr, start )
    except ValueError:
        print(' ValueError: ', end=' ')
        print("lastSubstr=%s - "%( lastSubstr ), end=' ')
        print(sys.exc_info()[1])
    return inputStr[start:end]
from timeit import timeit
from re import search, DOTALL
def partition_find(string, start, end):
    return string.partition(start)[2].rpartition(end)[0]
def re_find(string, start, end):
    # applying re.escape to start and end would be safer
    return search(start + '(.*)' + end, string, DOTALL).group(1)
def index_find(string, start, end):
    return string[string.find(start) + len(start):string.rfind(end)]
# The wikitext of the "Alan Turing law" article from English Wikipedia
# https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
string = """..."""
start = '==Proposals=='
end = '==Rival bills=='
assert index_find(string, start, end) \
    == partition_find(string, start, end) \
    == re_find(string, start, end)
print('index_find', timeit(
    'index_find(string, start, end)',
    globals=globals(),
    number=100_000,
))
print('partition_find', timeit(
    'partition_find(string, start, end)',
    globals=globals(),
    number=100_000,
))
print('re_find', timeit(
    're_find(string, start, end)',
    globals=globals(),
    number=100_000,
))
Result:
index_find 0.35047444528454114
partition_find 0.5327825636197754
re_find 7.552149639286381
re_find was more than 20 times slower than index_find in this example.
My method will be to do something like,
find index of start string in s => i
find index of end string in s => j
substring = substring(i+len(start) to j-1)
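The steps above can be sketched in Python roughly as follows. The function name extract_between and the choice to return None when a marker is missing are assumptions made for this example; note that a Python slice already excludes its end index, so s[i:j] covers the pseudocode's "to j-1".

```python
def extract_between(s, start, end):
    """Return the text between start and end in s, or None if either is missing."""
    i = s.find(start)
    if i == -1:
        return None
    i += len(start)          # first character after the start marker
    j = s.find(end, i)       # search for end only after the start marker
    if j == -1:
        return None
    return s[i:j]            # slice end is exclusive, so this stops at j-1


print(extract_between('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))  # iwantthis
```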
This I posted before as code snippet in Daniweb:
# picking up piece of string between separators
# function using partition, like partition, but drops the separators
def between(left,right,s):
    before,_,a = s.partition(left)
    a,_,after = a.partition(right)
    return before,a,after
s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
print between('<a>','</a>',s)
print between('(',')',s)
print between("'","'",s)
""" Output:
('bla bla blaa ', 'data', " lsdjfasdj\xc3\xb6f (important notice) 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdj\xc3\xb6f ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdj\xc3\xb6f (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
"""
Parsing text with delimiters from different email platforms posed a larger version of this problem. The tokens generally have a START and a STOP delimiter. Delimiter characters that double as regex wildcards kept choking regex. The problem with split is mentioned here and elsewhere: the delimiter characters are consumed. It occurred to me to use replace() to give split() something else to consume. Chunk of code:
nuke = '~~~'
start = '|*'
stop = '*|'
julien = (textIn.replace(start,nuke + start).replace(stop,stop + nuke).split(nuke))
keep = [chunk for chunk in julien if start in chunk and stop in chunk]
logging.info('keep: %s',keep)
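To see what the chunk above produces, here is a self-contained run on a made-up input. The sample string text_in is hypothetical, and the sketch assumes the marker '~~~' never occurs in the real text.

```python
nuke = '~~~'   # throwaway marker; assumed not to appear in the input
start = '|*'
stop = '*|'

# Hypothetical sample input with two delimited tokens.
text_in = 'Hi |*NAME*|, your order |*ID*| has shipped'

# Put the marker before each start and after each stop, then split on the
# marker so the delimiters themselves survive inside the pieces.
pieces = text_in.replace(start, nuke + start).replace(stop, stop + nuke).split(nuke)

# Keep only the pieces that are complete start...stop tokens.
keep = [chunk for chunk in pieces if start in chunk and stop in chunk]
print(keep)  # ['|*NAME*|', '|*ID*|']
```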
Following on from Nikolaus Gradwohl's answer, I needed to get the version number (i.e., 0.0.2) between 'ui:' and '-' from the file content below (filename: docker-compose.yml):
version: '3.1'
services:
  ui:
    image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
    #network_mode: host
    ports:
      - 443:9999
    ulimits:
      nofile:test
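and this is how it worked for me (Python script):
import re
# Read the whole file; the with block closes it afterwards
with open('docker-compose.yml', 'r') as f:
    lines = f.read()
# '.' does not match newlines, so the match stays on the image line
result = re.search('ui:(.*)-', lines)
print(result.group(1))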
Result:
0.0.2
This seems much more straight forward to me:
import re
s = 'asdf=5;iwantthis123jasd'
x= re.search('iwantthis',s)
print(s[x.start():x.end()])

Resources