Max Length Removal - string

The problem: if the string contains “100” as a sub-string, we can delete that sub-string. The task is to find the length of the longest sub-string that can be removed this way.
s=input('')
i=0
if '100' not in s:
    print('0')
else:
    st=''
    while i<len(s)-2:
        if s[i:i+3]=='100':
            s= s.replace('100','')
            a=s.find('100')
            if a<=i:
                st=st+'100'
                i=a
            else:
                st='100'
                i=i+1
        else:
            i=i+1
    print(len(st))
For the input 101001010000, this code prints 9 instead of 12.
Somehow the else part is not getting executed.
Please, someone help me out.

s.replace() removes all occurrences of the substring, not just the first, searching from the start of the string.
This means that '101001010000'.replace('100', '') replaces two occurrences:
>>> '101001010000'.replace('100', '')
'101000'
but you count that as one replacement.
str.replace() takes a third argument, the number of replacements to be made, see the documentation:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
Use that to limit the number of replacements.
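As a quick illustration of the difference, here is a minimal sketch using the input from the question (the variable names are made up for this example):

```python
s = "101001010000"

# Without count, every non-overlapping occurrence is removed in one call.
removed_all = s.replace("100", "")

# With count=1, only the leftmost occurrence is removed.
removed_one = s.replace("100", "", 1)

print(removed_all)  # 101000
print(removed_one)  # 101010000
```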

Related

Efficient way to check if a specific character in a string appears consecutively

Say the character which we want to check if it appears consecutively in a string s is the dot '.'.
For example, 'test.2.1' does not have consecutive dots, whereas 'test..2.2a...' has consecutive dots. In fact, for the second example, we should not even bother with checking the rest of the string after the first occurrence of the consecutive dots.
I have come up with the following simple method:
def consecutive_dots(s):
    count = 0
    for c in s:
        if c == '.':
            count += 1
            if count == 2:
                return True
        else:
            count = 0
    return False
I was wondering if the above is the most 'pythonic' and efficient way to achieve this goal.
You can just use in to check whether two consecutive dots (that is, the string "..") appear in the string s:
def consecutive_dots(s):
    return '..' in s
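The same idea extends to any single character. A small sketch, assuming a hypothetical helper name has_consecutive (not from the question), checked against the two example strings above:

```python
def has_consecutive(s, char="."):
    """Return True if char appears at least twice in a row in s."""
    # Repeating the character gives the two-character run to search for.
    return char * 2 in s


print(has_consecutive("test.2.1"))       # False
print(has_consecutive("test..2.2a..."))  # True
```

Because the in operator stops at the first match, the rest of the string is not scanned once a pair of dots has been found.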

Variable length lookahead/behind regex when you don't know exactly what you're matching (python)

I have some transcriptions that unfortunately contain lots of occurrences of words separated by a period but no space (i.e. word.word).
Is there a way to use regex to separate these, but leave other words like decimals and abbreviations such as U.K. or U.S.A alone? I'm planning to tokenize the text, so I want the word.word occurrences to be counted as separate words, but I don't want to mess up abbreviations, decimals, or any other places where the period is part of the word. Since I want to replace these specific word.word periods with a space but leave all others alone (or at least not replace them with a space, because that would break up the abbreviation), my first thought was something like this:
text = re.sub("(?<!\d){2,}\.(?!\d){2,}", " ", text)
The idea is to look for periods that are surrounded by two or more non-digits on each side, and then just replace the period with a space. But it seems that variable-length lookbehind/lookahead isn't really a thing you can do. I've tested this out in some regex testers and it still matches the letter abbreviations above, although it does not match decimals.
Is there another way to write what I've thought about or another way to approach this? I've gotten somewhat mentally stuck in this solution and I can't find another way that will do close to what I'm looking to do - can it even be done?
Thank you!
OK, so I have written this code and given it the string "i.would.like.to.visit.the.U.S.A.or.the.u.k.while.i.am.eating.a.banana.b" (the trailing b is there on purpose, to make sure it doesn't drop single letters for no reason). The output was:
['i', 'would', 'like', 'to', 'visit', 'the', 'USA', 'or', 'the', 'uk', 'while', 'i', 'am', 'eating', 'a', 'banana', 'b'].
The code is:
text = "i.would.like.to.visit.the.U.S.A.or.the.u.k.while.i.am.eating.a.banana.b"
def split(string: str):
    string = string.split(".")
    length = len(string) - 1
    obj = enumerate(string)
    together = []
    for index, word in obj:
        sub = []
        if index and len(word) == 1 and index < length:
            idx = index
            while len(string[idx]) == 1:
                sub.append((string[idx], idx))
                idx += 1
                next(obj)
            together.append(sub)
    if together:
        deleted = 0
        for sub in together:
            if len(sub) > 1:
                string[sub[0][1] - deleted:sub[-1][1] + 1 - deleted] = ["".join(x[0] for x in sub)]
                deleted += len(sub) - 1
    return string
print(split(text))
To keep the dots (U.S.A instead of USA), change "".join(x[0] for x in sub) to ".".join(x[0] for x in sub).
If you just want to add a space after a period that has two or more non-digit, non-period characters on both sides, the following is what you are looking for:
text = re.sub(r"([^\d.]{2})\.([^\d.]{2})", r"\1. \2", text)
Example:
"This sentence ends.The following is an abbreviation A.B.C." becomes
"This sentence ends. The following is an abbreviation A.B.C."

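To see how that substitution treats the different cases, here is a small sketch. The helper names and the extra sample sentences are made up for illustration, and the second function is a lookaround variant that is not part of the answer above:

```python
import re


def space_after_period(text):
    """Insert a space after a period that sits between two pairs of
    characters that are neither digits nor periods."""
    return re.sub(r"([^\d.]{2})\.([^\d.]{2})", r"\1. \2", text)


def space_after_period_lookaround(text):
    """Same rule, but with fixed-width lookarounds, so the neighbouring
    characters are checked without being consumed by the match."""
    return re.sub(r"(?<=[^\d.]{2})\.(?=[^\d.]{2})", ". ", text)


print(space_after_period("This sentence ends.The following is an abbreviation A.B.C."))
# This sentence ends. The following is an abbreviation A.B.C.

print(space_after_period("pi is 3.14 and the U.K. is fine.ok"))
# pi is 3.14 and the U.K. is fine. ok

# Matches cannot overlap, so a short word between two periods
# leaves the second period untouched in the capturing version.
print(space_after_period("ab.cd.ef"))            # ab. cd.ef
print(space_after_period_lookaround("ab.cd.ef"))  # ab. cd. ef
```

Python's re module accepts the lookbehind here because it has a fixed width of two characters; only variable-width lookbehinds are rejected.
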
Can someone explain how this code works with range and slicing?

s = 'eljwboboblejr' # dont paste into grader
count = 0
for i in range (len(s)):
    if s[i:i+3]== 'bob':
        count+=1
print('Number of times bob occurs is: ' + str(count))
I do not get how len is working here, or what if s[i:i+3] == 'bob' does.
What happens here is that i takes each index of s in turn, and on every iteration the slice s[i:i+3] picks out the (up to) three characters starting at index i. len simply returns the length of s (how many characters it contains) as an integer, so range(len(s)) covers every index. The comparison s[i:i+3] == 'bob' checks whether that slice is exactly 'bob'; if it is, the comparison is True and count is incremented. It's not the greatest of explanations, but I hope it helps.
documentation for len is here:
https://docs.python.org/3.2/library/functions.html#len
Strings implement it through the special method __len__, which len() calls.
documentation for range is here:
https://docs.python.org/3.2/library/functions.html#range
With one arg, range generates integers 0 to that arg (excluding arg itself).
The slice in the loop evaluates to 'elj', then 'ljw', then 'jwb', ... in subsequent iterations. The slice [a:b] doesn't include the b'th element.
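To make the iteration concrete, here is a small sketch on a shorter string (the string 'bobob' is chosen just for this illustration) that records each index and slice the loop looks at:

```python
s = 'bobob'

# Every (index, slice) pair the loop examines; near the end of the
# string the slice is simply shorter than three characters.
slices = [(i, s[i:i+3]) for i in range(len(s))]
print(slices)
# [(0, 'bob'), (1, 'obo'), (2, 'bob'), (3, 'ob'), (4, 'b')]

# Counting the slices equal to 'bob' also counts overlapping matches.
count = sum(1 for _, piece in slices if piece == 'bob')
print(count)  # 2
```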

Is there a way to substring, which is between two words in the string in Python?

My question is more or less similar to:
Is there a way to substring a string in Python?
but it's more specifically oriented.
How can I get a part of a string that is located between two known words in the initial string?
Example:
myString = "this is the initial string"
Substring = "initial"
knowing that "the" and "string" are the two known words in the string that can be used to get the substring.
Thank you!
You can start with simple string manipulation here. str.index is your best friend there, as it will tell you the position of a substring within a string; and you can also start searching somewhere later in the string:
>>> myString = "this is the initial string"
>>> myString.index('the')
8
>>> myString.index('string', 8)
20
Looking at the slice [8:20], we already get close to what we want:
>>> myString[8:20]
'the initial '
Of course, since we found the beginning position of 'the', we need to account for its length. And finally, we might want to strip whitespace:
>>> myString[8 + 3:20]
' initial '
>>> myString[8 + 3:20].strip()
'initial'
Combined, you would do this:
startIndex = myString.index('the')
substring = myString[startIndex + 3 : myString.index('string', startIndex)].strip()
If you want to look for matches multiple times, then you just need to repeat doing this while looking only at the rest of the string. Since str.index will only ever find the first match, you can use this to scan the string very efficiently:
searchString = 'this is the initial string but I added the relevant string pair a few more times into the search string.'
startWord = 'the'
endWord = 'string'
results = []
index = 0
while True:
    try:
        startIndex = searchString.index(startWord, index)
        endIndex = searchString.index(endWord, startIndex)
        results.append(searchString[startIndex + len(startWord):endIndex].strip())
        # move the index to the end
        index = endIndex + len(endWord)
    except ValueError:
        # str.index raises a ValueError if there is no match; in that
        # case we know that we’re done looking at the string, so we can
        # break out of the loop
        break
print(results)
# ['initial', 'relevant', 'search']
You can also try something like this:
mystring = "this is the initial string"
mystring = mystring.strip().split(" ")
for i in range(1,len(mystring)-1):
    if(mystring[i-1] == "the" and mystring[i+1] == "string"):
        print(mystring[i])
I suggest using a combination of the split, index and join methods.
This should help if you are looking for more than one word in the substring.
Turn the string into a list of words:
words = string.split()
Get the indexes of your opening and closing markers, then return the substring:
start = words.index('the')
end = words.index('string')
substring = ' '.join(words[start+1:end])
You may want to check that both markers are actually present before proceeding, since list.index raises a ValueError otherwise.
If your problem gets more complex, i.e. multiple occurrences of the pair, I suggest using a regular expression.
import re
substring = ''.join(re.findall(r'the (.+?) string', string))
re.findall returns the substrings as separate list items, so drop the join if you want to keep them apart.
The pattern includes the spaces around the captured group so that they do not end up in the result; you can modify it to suit your needs.
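For instance, re.findall returns one list entry per pair of markers. A minimal sketch, with a sample sentence made up for this illustration:

```python
import re

text = "this is the initial string and the second string"

# The lazy group (.+?) stops at the first following " string",
# so each "the ... string" pair yields its own match.
matches = re.findall(r'the (.+?) string', text)
print(matches)  # ['initial', 'second']
```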

Split by the delimiter that comes first, Python

I have some unpredictable log lines that I'm trying to split.
The one thing I can predict is that the first field always ends with either a . or a :.
Is there any way I can automatically split the string at whichever delimiter comes first?
Look at the index of the . and : characters in the string using the index() function.
Here’s a simple implementation:
def index_default(line, char):
    """Returns the index of a character in a line, or the length of the string
    if the character does not appear.
    """
    try:
        retval = line.index(char)
    except ValueError:
        retval = len(line)
    return retval
def split_log_line(line):
    """Splits a line at either a period or a colon, depending on which appears
    first in the line.
    """
    if index_default(line, ".") < index_default(line, ":"):
        return line.split(".")
    else:
        return line.split(":")
I wrapped the index() function in an index_default() function because if the line doesn’t contain a character, index() throws a ValueError, and I wasn’t sure if every line in your log would contain both a period and a colon.
And then here’s a quick example:
mylines = [
    "line1.split at the dot",
    "line2:split at the colon",
    "line3:a colon preceded. by a dot",
    "line4-neither a colon nor a dot"
]
for line in mylines:
    print(split_log_line(line))
which returns
['line1', 'split at the dot']
['line2', 'split at the colon']
['line3', 'a colon preceded. by a dot']
['line4-neither a colon nor a dot']
Check the indexes of both characters, then use the lowest index to split your string.
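An alternative sketch of the same idea uses re.split with a character class, which splits at whichever of the two characters appears first. This is a different technique from the index-based code above: with maxsplit=1 it splits only once, and the function name split_first is made up for this example.

```python
import re


def split_first(line):
    """Split line once, at the first '.' or ':' it contains."""
    # [.:] matches either delimiter; maxsplit=1 stops after the first hit.
    return re.split(r"[.:]", line, maxsplit=1)


print(split_first("line3:a colon preceded. by a dot"))
# ['line3', 'a colon preceded. by a dot']
print(split_first("line4-neither a colon nor a dot"))
# ['line4-neither a colon nor a dot']
```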
