code for finding longest substring without repeating characters not working - string

I have written the code below for finding the length of the longest substring without repeating characters, but it doesn't work. Would anyone know why? (I know there are other working solutions on the internet, but this code is written in my style and I'd ideally like to adapt it.)
def longestSubstring(str):
    start = 0
    maxLen = 1
    hashSet = set()
    for i in range(len(str)):
        if str[i] not in hashSet:
            hashSet.add(str[i])
            maxLen = max(maxLen, i - start + 1)
            continue
        else:
            while str[start] != str[i]:
                hashSet.discard(str[start])
                start += 1
            hashSet.discard(str[start])
            start += 1
    return maxLen

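It is just one line: remove the second hashSet.discard(str[start]) (the one that comes after the while loop). You don't want to remove that character from the set, because it is the character you just encountered at str[i] and it is still part of the current window. You only need to increase start.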
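As a rough sketch of what that fix looks like, here is the same sliding-window idea with the second discard removed. The names longest_substring, seen and max_len are mine, not the asker's, and starting max_len at 0 (so that an empty string gives 0) is my own assumption rather than part of the answer.

```python
def longest_substring(s):
    """Length of the longest substring of s without repeating characters."""
    start = 0      # left edge of the current window
    max_len = 0    # assumption: an empty string should give 0
    seen = set()   # characters currently inside the window
    for i, ch in enumerate(s):
        if ch in seen:
            # Shrink the window until the earlier copy of ch sits at start.
            while s[start] != ch:
                seen.discard(s[start])
                start += 1
            # Step past the earlier copy. ch itself stays in the set,
            # because the new copy at index i is still in the window.
            start += 1
        else:
            seen.add(ch)
        max_len = max(max_len, i - start + 1)
    return max_len


print(longest_substring("abcabcbb"))
```

The window always covers s[start..i], and seen holds exactly the characters in it, which is why the repeated character must not be discarded.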

Related

find a sequence in string

Hi, I've got a problem set in CS50 and I'm having difficulties, as this is my first week in Python. I would appreciate it if you didn't write out a full answer but pointed me to the right functions or methods to use.
We've been given a long string sequence in a .txt file, on one line with no whitespace. I have to find the longest consecutive run of a given substring within that DNA string.
example txt:
GGAGGCCAAAGTCTTGTGATATCGGGCAACTCCCCGGGAGGAACACAGGCCCACCGAAAACAGCTTGAAATGGGAAACGTTCCCGATCTACGCCGGGCCAGAGG
The original text is around 5000 characters, but it looks like the example above. My task is to find the longest consecutive run of the 'AGATC' string.
Let's say the first consecutive run is 23 repeats long, and after reading further I find another consecutive run of 34 repeats; I have to store the biggest number.
My problem is not reading and analysing a string. I can read a string, find the total number of repetitions and so on, but finding the longest repetition hasn't worked in any way I've tried. I thought C was hard, but I could write this in C easily, since there are so many ways to manipulate strings in C. At least in C there are ways to read a fixed size at a time, but as far as I can see Python reads everything at once and there is no control over the read. In Python it doesn't seem like I can do much with it, at least at my current level of knowledge :/ Python probably has one-line solutions for this; please don't judge, this is my 3rd day and 4th program in Python.
Which functions or methods should I look at to analyse a string in this way? I've watched videos on a similar problem, but for a run of a single character, not a string. I also bought Python Crash Course to learn about string manipulation but couldn't find anything related to this case. I also checked the Python documentation, but it's far too complicated for day 3 in Python.
Could anyone help me please? TIA
Here is my not-working and not-making-sense code:
import csv
import sys
#check the arguments count
if len(sys.argv) != 3:
    print("Usage: python dna.py data.csv sequence.txt")
    sys.exit(1)
#create a dictionary to store str results
SEQ = {
    "AGATC": 0,
    "AATG": 0,
    "TATC": 0
}
counter = 0 #keeps the the length of the sequence
seq = 0 #keeps the longest sequence
DNA = '' ## keeps the key of SEQ, "AGATC" etc.
#find the longest consecutive sequence of DNA
def findSEQ(file, DNA): #get the sequences text file and the string of the key as parameters
    for DNA in (DNA, file):
        if file[i:i + len(DNA)] == DNA: #if find a match
            counter += 1 #count up the sequence
        else:
            if counter > seq: #if it's not a sequence the next thing it reads
                seq = counter
                counter = 0
    return seq
seq = 0
#open sequence file and read
with open(sys.argv[2],'r') as file:
    reader = csv.reader(file)
    #find the longest sequence of AGATC
    findSEQ("AGATC", file)
    #update the seq dictionary
    SEQ["AGATC"] = seq
    #find the longest sequence of AATG
    findSEQ(file, "AATG")
    #update the seq dictionary
    SEQ["AATG"] = seq
    #find the longest sequence of TATC
    findSEQ(file, "TATC")
    #update the seq dictionary
    SEQ["TATC"] = seq
#open and read database
with open(sys.argv[1], "r") as file:
    reader = csv.reader(file)
    #skip the first row
    next(reader)
    #compare the seq dictionary results with database
    for row in reader:
        seq1, seq2, seq3 = row[1], row[2], row[3]
        #if found any match print the name
        if SEQ[seq1] == row[1] and SEQ[seq2] == row[2] and SEQ[seq3] == row[3]:
            print(row[0])
        #otherwise print not found
        else:
            print("Not found any match.")
To elaborate on my comment, please find the following example:
import re
text = 'GGAGGCCAAGATCAAGTCTTGTGATATCGGGCAACTCCCCGGGAAGATCAGATCAGATCGGAACACAGGCCCACCGAAAACAGCTTGAAGATCAATGGGAAACGTTCCCGATCTACGCCGGGCCAGAGG'
sequence = 'AGATC'
pattern = f'(?:{sequence})+'
findings = sorted(re.findall(pattern, text), key=len)
longest_sequence = len(findings[-1]) // len(sequence) if findings else 0  # whole number of repeats, 0 if no match
print(f'longest sequence: {longest_sequence}')
This program uses regex (regular expressions) to find sequences of the pattern you're looking for. It then sorts the findings by length (in an ascending order), allowing you to find the longest sequences in the last index of the list.
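For comparison, here is a minimal sketch of the same count done without regex, using only slicing and while loops. The function name longest_run and the short test string are my own, chosen for illustration; they are not part of the problem set.

```python
def longest_run(text, sequence):
    """Largest number of back-to-back copies of sequence found in text."""
    if not sequence:
        return 0  # assumption: an empty pattern counts as no run
    step = len(sequence)
    best = 0
    i = 0
    while i < len(text):
        run = 0
        # Count how many copies of sequence sit back to back starting at i.
        while text[i + run * step:i + (run + 1) * step] == sequence:
            run += 1
        best = max(best, run)
        i += 1  # move one character at a time so no starting point is missed
    return best


print(longest_run("AGATCAGATCTTAGATC", "AGATC"))
```

Slicing past the end of a string just returns a shorter string, so the inner comparison fails cleanly at the end of the text without any bounds check.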

Sorting strings without methods and other types

Hello, I have to sort the characters of a string, and I am not allowed to use other types or str methods.
My problem is that I can't figure out how to finish my code so that it works with any string.
I compared the result with sorted() to check, and I am stuck at the first exchange.
My code:
i = 0
s1 = "hello"
s2 = sorted(s1)
while (i<len(s1)):
    j=i+1
    while (j<=len(s1)-1):
        if (s1[i] > s1[j]):
            s1 = s1[0:i] + s1[j] + s1[i]
        j+=1
    i+=1
print(s1)
print(s2)
I tried adding + s1[len(s1):] at the end of the operation, but that only gave the right result for the single string I was testing.
I am really stuck. How can I make it work for all strings of different lengths?
Thanks
You're not reconstructing the string correctly when doing s1 = s1[0:i] + s1[j] + s1[i]: you put one character in place of the other, but you never actually interchange the two, and you leave out the rest of the split string (the characters between i and j and everything after j).
Given what your code looks like, I would do it like this:
i = 0
s1 = "hello"
s2 = sorted(s1)
while i < len(s1):
    j = i + 1
    while j <= len(s1)-1:
        if s1[i] > s1[j]:
            s1 = s1[0:i] + s1[j] + s1[i+1:j] + s1[i] + s1[j+1:len(s1)]
        j += 1
    i += 1
print("".join(s2))
# > 'ehllo'
print(s1)
# > 'ehllo'
Please tell me if anything is unclear!
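To check that the slicing swap works for strings of any length, the same loops can be wrapped in a function and compared against sorted(). This is only a sketch: the name sort_string and the sample words are my own, and sorted() is used for verification only.

```python
def sort_string(text):
    """Sort the characters of text using only indexing, slicing and +."""
    i = 0
    while i < len(text):
        j = i + 1
        while j < len(text):
            if text[i] > text[j]:
                # Rebuild the string with the characters at i and j exchanged:
                # before i, char j, between i and j, char i, after j.
                text = text[:i] + text[j] + text[i + 1:j] + text[i] + text[j + 1:]
            j += 1
        i += 1
    return text


for word in ["hello", "python", "a", "", "zyxwv"]:
    print(repr(sort_string(word)), repr("".join(sorted(word))))
```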
"I am banned from using other types and str methods"
Based upon your criteria, your request is impossible. Just accessing the elements of a string requires string methods.
The technique that you are using is very convoluted, hard to read and is difficult to debug. Try running your code in a debugger.
Now given that you are allowed to convert a string to a list (which requires string methods), redesign your code to use simple, easy to understand statements.
The following code first converts the string into a list. It then loops through the list from the beginning and compares the current character with each following character up to the end. If a following character is less than the current character, the two are swapped. As you step through the list, these swaps produce a sorted list. At the end, the list is converted back to a string using join().
msg = 'hello'
s = list(msg)
for i in range(len(s) - 1):
    for j in range(i + 1, len(s)):
        if s[i] <= s[j]:
            continue
        # swap characters
        s[i], s[j] = s[j], s[i]
print(msg)
print(''.join(s))

How do I achieve this following function only using while loop?

I'm currently working on a problem that asks me to generate an arrow pattern using loops, something like this:
"How many columns? 3"
*
 *
  *
 *
*
I know I can do this with a for loop (probably more efficiently too), but that is not what I'm aiming for. I want to achieve this using only while loops.
I have some ideas:
1. I set up a control variable and an accumulator to control the loop.
2. I then write 2 separate loops to generate the upper and lower parts of the pattern. I was thinking about inserting the spaces before the asterisks using something like this:
(accumulator - (accumulator - integer)) * spaces.
#Ask the user how many column and direction of column
#they want to generate
Keep_going = True
Go = 0
while keep_going:
    Column_num = int(input("How many columns? "))
    if Column_num <= 0:
        print("Invalid entry, try again!")
    else:
        print()
        Go = 1
        #Upper part
        while Keep_going == True and Go == 1:
            print("*")
            print(""*(Column_num - (Column_num - 1) + "*")
...but I soon realized it wouldn't work, because I don't know the user's input in advance and so cannot manually work out how many spaces to insert before the asterisks. Everything on the internet tells me to use a for loop and the range function. I could do that, but I don't think it would help me learn Python, since I can't use loops very well yet, and brute-forcing it with some other method is not going to improve my skills.
I assume this is achievable using only while loops.
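#Take your input in MyNumber
MyNumber = 5
i = 1
MyText = '\t*'
#Upper part: 0 to MyNumber-1 spaces before the asterisk
while i <=MyNumber:
    print(MyText.expandtabs(i-1))
    i = i+1
#Step back past the tip so the widest row is printed only once
i = i-2
#Lower part
while i >=1:
    print(MyText.expandtabs(i-1))
    i = i-1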
Python - While Loop
Well, first you have to understand that a while loop keeps looping for as long as its condition is true.
Looking at your situation, to determine the number of spaces before the * you should keep a running counter: a variable that holds how many spaces are needed on the current row. For example:
###Getting the number of columns###
while True:
    number=int(input('Enter number of rows: '))
    if number<=0:
        print('Invalid')
    else:
        ###Ending the loop###
        break
#This will determine the number of spaces before a '*'
counter=0
#Loops until counter equals number
while counter!=number:
    print(" "*counter + "*")
    #Each time it loops the counter variable increases by 1
    counter=counter+1
counter=counter-1
#Getting the second half of the arrow done
while counter!=0:
    counter=counter-1
    print(" "*counter + "*")
Please reply if this did not help you so that I can give a more detailed response.
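The same counter idea can be put into a function that builds the rows instead of printing them directly, which makes the pattern easy to check without typing input. This is a sketch only; the name arrow_lines is hypothetical, and collecting the rows in a list is my own choice.

```python
def arrow_lines(columns):
    """Rows of the arrow pattern for the given number of columns."""
    lines = []
    counter = 0  # number of spaces before the asterisk on the current row
    # Upper part: 0, 1, ..., columns - 1 spaces.
    while counter < columns:
        lines.append(" " * counter + "*")
        counter += 1
    # Step back past the tip so the widest row appears only once.
    counter -= 2
    # Lower part: columns - 2 spaces down to 0.
    while counter >= 0:
        lines.append(" " * counter + "*")
        counter -= 1
    return lines


print("\n".join(arrow_lines(3)))
```

Both loops are plain while loops driven by one counter, so no for loop or range is needed.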

Convert S to T by performing K operations (HackerRank)

I was solving a problem on HackerRank. It required me to see if it is possible to convert string s to string t by performing k operations.
https://www.hackerrank.com/challenges/append-and-delete/problem
The operations we can perform are: appending a lowercase letter to the end of s or removing a lowercase letter from the end of s. For example Ash Ashley 2 would return No since we need 3 operations, not 2.
I tried solving the problem as follows:
def appendAndDelete(s, t, k):
    if len(s) > len(t):
        maxs = [s,t]
    else:
        maxs = [t,s]
    maximum = maxs[0]
    minimum = maxs[1]
    k -= len(maximum) - len(minimum)
    substr = maximum[len(minimum): len(maximum)]
    maximum = maximum.replace(substr, '')
    i = 0
    while i < len(maximum):
        if maximum[i] != minimum[i]:
            k -= (len(maximum)-i)*2
            break
        i += 1
    if k < 0:
        return 'No'
    else:
        return 'Yes'
However, it fails on this odd test case: y yu 2. The expected answer is No, but my code returns Yes, since only one operation is required. Is there something I do not understand?
Since you don't explain your idea, it's difficult for us to understand what you mean in your code and debug it to tell you where you went wrong.
However, I would like to share my idea (I solved this on the website too):
len1 => Length of first string s.
len2 => Length of second/target string t.
The requirement of exactly k operations makes it a bit tricky. If len1 + len2 <= k, you can safely assume it can be accomplished and return true: we can delete every character of s, keep deleting from the empty string as many times as needed (the problem statement says this leaves it empty), and then append the letters of t.
When we start matching s with t from left to right, this looks like a longest common prefix problem, but that is NOT the whole story. Let's take an example:
aaaaaaaaa (source)
aaaa (target)
7 (k)
Here the common part is aaaa, and there are 5 additional a's in the source. So we can delete those 5 a's and get the target, but 5 != 7, hence it appears to be a No. That isn't the case, though: after those 5 deletions we can delete one more a from the source and append it again (2 operations) just to use up k. So the longest common prefix alone does not decide the answer, but it gets us closer to the solution.
So, let's match both strings from left to right and stop when there is a mismatch. Let's assume we store this index in a variable called first_unmatched. Initialize first_unmatched = min(len(s), len(t)) at the beginning of your method, so that it already holds the right value when no mismatch is found.
Let
rem1 = len1 - first_unmatched
rem2 = len2 - first_unmatched
where rem1 is the length of the remaining part of s and rem2 is the length of the remaining part of t.
Now come the conditions.
if (rem1 + rem2 == k) return true
There are rem1 characters to delete and rem2 characters to add. If they sum to exactly k, it's possible.
if (rem1 + rem2 > k) return false
There are rem1 characters to delete and rem2 characters to add. If they sum to more than k, it's not possible.
if (rem1 + rem2 < k) return (k - (rem1 + rem2)) % 2 == 0
There are rem1 characters to delete and rem2 characters to add. If they sum to less than k, then it depends.
Here, (k - (rem1 + rem2)) gives you the extra operations left over in k. Whether they can be used up depends on whether that extra is divisible by 2. We do % 2 because the spare operations have to be spent in pairs: one delete followed by one append of the same letter, which leaves the string unchanged. If the extra is odd, one operation is left over and the answer is No; otherwise it's a Yes.
You can cross check this with above example.
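Put together, the conditions above might look like the following sketch. The function name append_and_delete is my own, and returning 'Yes'/'No' strings follows the question's code rather than the true/false used in the explanation.

```python
def append_and_delete(s, t, k):
    """Return 'Yes' if s can become t in exactly k append/delete operations."""
    # Enough operations to wipe s completely and rebuild t from scratch.
    if len(s) + len(t) <= k:
        return "Yes"
    # Find the first index where the two strings differ.
    limit = min(len(s), len(t))
    first_unmatched = 0
    while first_unmatched < limit and s[first_unmatched] == t[first_unmatched]:
        first_unmatched += 1
    rem1 = len(s) - first_unmatched  # characters that must be deleted from s
    rem2 = len(t) - first_unmatched  # characters that must be appended
    if rem1 + rem2 > k:
        return "No"
    # Spare operations can only be burned in delete + append pairs.
    return "Yes" if (k - (rem1 + rem2)) % 2 == 0 else "No"


print(append_and_delete("y", "yu", 2))
print(append_and_delete("aaaaaaaaa", "aaaa", 7))
```

For y yu 2, one append is needed and one operation is left over, which cannot be spent as a pair, so the result is No, matching the expected answer in the question.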

How to get longest alphabetically ordered substring in python

I am trying to write a function that returns the longest substring of s in which the letters occur in alphabetical order. For example, if s = 'azcbobobegghakl', the function should return 'beggh'
Here is my function. It is not complete yet, but it doesn't even return the list sub;
instead it raises this error:
"IndexError: string index out of range"
def longest_substring(s):
    sub=[]
    for i in range (len(s)-1):
        subs=s[i]
        counter=i+1
        while ord(s[i])<ord(s[counter]):
            subs+=s[counter]
            counter+=1
        sub.append(subs)
    return sub
It is not optimal (the nested loops make the worst case quadratic, O(n^2), rather than linear), but I made some modifications to your code (in Python 3):
def longest_substring(s):
    length = len(s)
    if length == 0 : # Empty string
        return s
    final = s[0]
    for i in range (length-1):
        current = s[i]
        counter = i+1
        while counter < length and ord(s[i]) <= ord(s[counter]):
            current += s[counter]
            counter +=1
            i+=1
        if len(final) < len(current):
            final = current
    return final
s = 'azcbobobegghakl'
print(longest_substring(s))
Output:
beggh
Modifications:
You were comparing every character against the fixed position i: in the while loop you incremented only counter, not i. I increment i as well, so each character is compared with the one just before it. (Note that the for loop resets i at the start of every iteration, so characters that were already checked are checked again and the worst case stays quadratic.)
The while loop now also checks counter < length, which is what prevents the IndexError.
Your condition while ord(s[i])<ord(s[counter]): only checks for less than, but equal characters have to be accepted too.
You created a list and appended every sequence to it, which is unnecessary unless you want to do other calculations on the sequences. I keep a single string instead and replace it whenever the new sequence is longer than the previous best.
Note: if two sequences have the same length, the one that occurs first is shown as output.
Another Input:
s = 'acdb'
Output:
acd
I hope this will help you.
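If a genuinely linear version is wanted, one option is a single pass that only remembers where the current ordered run started. This is a sketch under my own naming (longest_ordered_substring, run_start, best_start, best_len), not the answerer's code.

```python
def longest_ordered_substring(s):
    """Longest substring of s whose letters are in alphabetical order."""
    if not s:
        return s
    best_start, best_len = 0, 1  # longest run found so far
    run_start = 0                # where the current ordered run begins
    for i in range(1, len(s)):
        if s[i] < s[i - 1]:
            # Order is broken, so a new run starts at this character.
            run_start = i
        run_len = i - run_start + 1
        if run_len > best_len:   # strict >, so the first of equal runs wins
            best_start, best_len = run_start, run_len
    return s[best_start:best_start + best_len]


print(longest_ordered_substring('azcbobobegghakl'))
```

Each character is looked at once, and the only slicing happens at the end, so the work grows linearly with the length of s.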

Resources