How to count number of substrings in python, if substrings overlap? - python-3.x

The count() function returns the number of times a substring occurs in a string, but it fails in case of overlapping strings.
Let's say my input is:
^_^_^-_-
I want to find how many times ^_^ occurs in the string.
mystr=input()
happy=mystr.count('^_^')
sad=mystr.count('-_-')
print(happy)
print(sad)
Output is:
1
1
I am expecting:
2
1
How can I achieve the desired result?

New Version
You can solve this problem without writing any explicit loops using regex. As #abhijith-pk's answer cleverly suggests, you can search for the first character only, with the remainder being placed in a positive lookahead, which will allow you to make the match with overlaps:
import re
def count_overlapping(string, pattern):
    # Match only the first character; the rest sits in a lookahead, so it is not consumed
    regex = '{}(?={})'.format(re.escape(pattern[:1]), re.escape(pattern[1:]))
    # Consume iterator, get count with minimal memory usage
    return sum(1 for _ in re.finditer(regex, string))
Using [:1] and [1:] for the indices allows the function to handle the empty string without special processing, while using [0] and [1:] for the indices would not.
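To see why the slice form matters, here is a small illustrative check (the variable names are only for this sketch): slicing an empty string yields another empty string, while indexing it raises IndexError.

```python
empty = ''

# Slicing never fails, even when the string has no characters
first = empty[:1]   # ''
rest = empty[1:]    # ''

# Indexing an empty string raises IndexError instead
try:
    empty[0]
    indexing_failed = False
except IndexError:
    indexing_failed = True

print(repr(first), repr(rest), indexing_failed)
```

With an empty pattern both slices are empty, so the regex becomes an empty match followed by an empty lookahead, which matches at every position.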
Old Version
You can always write your own routine using the fact that str.find allows you to specify a starting index. This routine will not be very efficient, but it should work:
def count_overlapping(string, pattern):
    count = 0
    start = -1
    while True:
        start = string.find(pattern, start + 1)
        if start < 0:
            return count
        count += 1
Usage
Both versions return identical results. A sample usage would be:
>>> mystr = '^_^_^-_-'
>>> count_overlapping(mystr, '^_^')
2
>>> count_overlapping(mystr, '-_-')
1
>>> count_overlapping(mystr, '')
9
>>> count_overlapping(mystr, 'x')
0
Notice that the empty string is found len(mystr) + 1 times. I consider this to be intuitively correct because it is effectively between and around every character.

You can use regex for a quick and dirty solution:
import re
mystr='^_^_^-_-'
print(len(re.findall(r'\^(?=_\^)',mystr)))

You need something like this:
def count_substr(string,substr):
    n=len(substr)
    count=0
    # Compare the window starting at each position against substr
    for i in range(len(string)-n+1):
        if string[i:i+n] == substr:
            count+=1
    return count
mystr=input()
print(count_substr(mystr,'121'))
Input: 12121990
Output: 2
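The same sliding-window idea can be sketched with str.startswith, which accepts a start offset and so avoids building a slice for every position. The function name count_substr_startswith is made up for this illustration.

```python
def count_substr_startswith(string, substr):
    # Last index at which substr could still fit inside string
    last_start = len(string) - len(substr)
    # startswith(substr, i) tests for a match at offset i; True counts as 1
    return sum(string.startswith(substr, i) for i in range(last_start + 1))

print(count_substr_startswith('12121990', '121'))  # 2
print(count_substr_startswith('^_^_^-_-', '^_^'))  # 2
```

If substr is longer than string, the range is empty and the result is 0.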

Related

How can I convert this code into one liner or reduce the number of lines using list comprehension?

def consecutive_zeros(input_binary):
    count = 0
    count_list = list()
    for x in input_binary:
        if x == "0":
            count += 1
        else:
            count_list.append(count)
            count = 0
    return max(count_list)
I tried different ways to implement the same but was getting syntax error or wrong output.
Is there a more efficient way in which I can implement the same? How to make it one liner?
It looks like you want to find the longest sequence of zeros that is followed by a one. If this is correct, trailing zeros should not be counted. I have a solution that is based on string operations, as I assume your input is a string. If not, please consider adding an example input to your question.
def consecutive_zeros(input_binary):
    return max(map(len, input_binary.rstrip('0').split('1')))
print(consecutive_zeros('0000111110001000000')) # 4
print(consecutive_zeros('00001111100010000001')) # 6
EDIT: As your function is named consecutive_zeros, it could be that you also want to count a sequence at the end, which your code does not count. If you want to count it, you can use this code:
def consecutive_zeros(input_binary):
    return max(map(len, input_binary.split('1')))
print(consecutive_zeros('0000111110001000000')) # 6
print(consecutive_zeros('00001111100010000001')) # 6
Per the function in your question, which returns the number of leading 0s, you can use this:
def consecutive_zeros(input_binary):
    return len(input_binary) - len(input_binary.lstrip('0'))
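As another possible one-liner, here is a sketch that uses re.findall to collect every run of zeros, including a trailing one. The name longest_zero_run is hypothetical, and the input is assumed to be a string of '0' and '1' characters.

```python
import re

def longest_zero_run(input_binary):
    # re.findall returns each maximal run of zeros as a separate string;
    # default=0 covers inputs that contain no zeros at all
    return max((len(run) for run in re.findall('0+', input_binary)), default=0)

print(longest_zero_run('0000111110001000000'))  # 6
print(longest_zero_run('111'))                  # 0
```

Unlike the loop in the question, this variant does not fail on an input without any '1', because max receives a default value.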

Shortest code to return current index number in string in 'for n in 'word': loop

I have a question about strings. I thought that this code:
for n in 'banana':
    print(n)
would return this:
0
1
2
3
4
5
But, of course, it doesn't. It returns the value at each position in the string, not the position number. In order for me to understand this better, I thought it might help to write the simplest possible program to achieve the output I thought I'd get:
count = 0
for n in 'banana':
    print(count)
    count += 1
This works, but surely there's a more direct way to access the position number that the current iteration is looking at? Can't see any methods that would achieve this directly though.
These are all equivalent:
i = 0
for n in 'banana':
    print(i)
    i += 1
for i, w in enumerate('banana'):
    print(i)
for i in range(len('banana')):
    print(i)
print(*range(len('banana')), sep='\n')
As posted in the other answer, enumerate() works:
for idx, character in enumerate('myword'):
    print(f"Index={idx} character={character}")
It is worth pointing out that Python treats strings as sequences of characters. When you have "abc"[0] it will return a. Similarly, when you say 'give me each element in some sequence', it will simply give you the element, not the index of that element, which would be counterintuitive.
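A short sketch of where the index from enumerate becomes useful, for example collecting the positions of one letter. The variable names here are only illustrative.

```python
word = 'banana'

# Collect every index at which the letter 'a' occurs
positions = [i for i, letter in enumerate(word) if letter == 'a']
print(positions)  # [1, 3, 5]

# enumerate also accepts a start value if counting should begin at 1
for number, letter in enumerate(word, start=1):
    print(number, letter)
```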

Way to check if target string contains any item in list and get the index? | Python3

Python3
I am looking for a way to check if any element inside my list, is contained within target string.
Now - if the condition is met, I need to get the index.
I have learned about the .find() method but it only compares one value and I need a way to test them all and get the position.
Edit: Many thanks for the answers! That's the stuff
If there's only one target string to search ("haystack"), and it's not absurdly huge (billions of characters), or the number of strings to be searched for ("needles") is smallish, just do the linear scans the naive way:
haystack = '....'
needles = ['...', '...']
hits = {}
for needle in needles:
    try:
        hits[needle] = haystack.index(needle)
    except ValueError:
        pass # needle not found
# Or if exceptions aren't allowed, test and check
for needle in needles:
    idx = haystack.find(needle)
    if idx >= 0:
        hits[needle] = idx
If you've got many needles to search for in many (or huge) haystacks, you can get major speed-ups from Aho-Corasick string search, which I've already covered in detail here.
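If only the earliest hit of any needle is needed, one possible middle ground is a single regex alternation. This is a sketch, not Aho-Corasick; the function name earliest_hit is made up here, and when two needles match at the same position, the one listed first wins.

```python
import re

def earliest_hit(haystack, needles):
    # Return (index, needle) for the leftmost match of any needle, or None
    if not needles:
        return None
    # re.escape keeps characters such as '^' or '.' literal
    pattern = re.compile('|'.join(re.escape(needle) for needle in needles))
    match = pattern.search(haystack)
    if match is None:
        return None
    return match.start(), match.group()

print(earliest_hit('hello world', ['world', 'lo', 'xyz']))  # (3, 'lo')
```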
Your question is vague. However, here is what I'm guessing you're asking for.
targ_string = 'hello world'
elements = ['a', 'b', 'c', 'd']
for char in elements:
    # will print the index of the character in the string, and -1 if it wasn't found
    print('The index of',char,'is',targ_string.find(char))
Output:
--------------------------
The index of a is -1
The index of b is -1
The index of c is -1
The index of d is 10

Unable to Reverse the text using 'for' Loop Function

I want to reverse a string using a loop inside a function. But when I use the following code, it outputs the exact same string again, although it is supposed to reverse it. I can't figure out why.
def reversed_word(word):
    x=''
    for i in range(len(word)):
        x+=word[i-len(word)]
        print(i-len(word))
    return x
a=reversed_word('APPLE')
print(a)
If you look at the output of your debug statement (the print in the function), you'll see you're using the indexes -5 through -1.
Since negative indexes specify the distance from the end of the string, -5 is the A, -4 is the first P, and so on. And, since you're appending these in turn to an originally empty string, you're just adding the letters in the same order they appear in the original.
To add them in the other order, you can simply use len(word) - i - 1 as the index, giving the sequence (len-1) .. 0 (rather than -len .. -1, which equates to 0 .. (len-1)):
def reversed_word(word):
    result = ""
    for i in range(len(word)):
        result += word[len(word) - i - 1]
    return result
Another alternative is to realise you don't need to use an index at all since iterating over a string gives it to you one character at a time. However, since it gives you those characters in order, you need to adjust how you build the reversed string, by prefixing each character rather than appending:
def reverse_string(word):
    result = ""
    for char in word:
        result = char + result
    return result
This builds up the reversed string (from APPLE) as A, PA, PPA, LPPA and ELPPA.
Of course, you could also go fully Pythonic:
def reverse_string(word):
    # Start at the last valid index, len(word) - 1, and count down to 0
    return "".join([word[i] for i in range(len(word) - 1, -1, -1)])
This uses list comprehension to create a list of characters in the original string (in reverse order) then just joins that list into a single string (with an empty separator).
Probably not something I'd hand in for classwork (unless I wanted to annoy the marker) but you should be aware that that's how professional Pythonistas usually tackle the problem.
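For comparison, here is a sketch of two other common idioms that avoid explicit index arithmetic: a slice with a negative step, and the built-in reversed(). The function names are made up for this illustration.

```python
def reverse_with_slice(word):
    # A step of -1 walks the string from the last character to the first
    return word[::-1]

def reverse_with_reversed(word):
    # reversed() yields the characters back to front; join reassembles them
    return "".join(reversed(word))

print(reverse_with_slice("APPLE"))     # ELPPA
print(reverse_with_reversed("APPLE"))  # ELPPA
```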
Let's say your word is python.
Your loop will then iterate over the values 0 through 5, since len(word) == 6.
When i is 0, i-len(word) is -6 (note carefully that this value is negative). You'll note that word[-6] is the character six places to the left from the end of the string, which is p.
Similarly, when i is 1, i-len(word) is -5, and word[i-len(word)] is y.
This pattern continues for each iteration of your loop.
It looks like you intend to use positive indices to step backward through the string with each iteration. To obtain this behavior, index the string with len(word)-i-1 or, equivalently, iterate over the indices in reverse order:
def reversed_word(word):
    result = ''
    for i in range(len(word)-1, -1, -1):
        result += word[i]
    return result
print(reversed_word("apple"))

finding DNA codon starting with a or t with regular expression

Given a DNA sequence of codons, I want to get the percentage of codons starting with A or T.
The DNA sequence would be something like: dna = "atgagtgaaagttaacgt". Each codon starts at position 0, 3, 6, etc., and that is the source of the problem as far as my intentions go.
What we wrote and works:
import re
DNA = "atgagtgaaagttaacgt"
def atPct(dna):
    '''
    gets a dna sequence and returns the %
    of sequences that are starting with a or t
    '''
    numOfCodons = re.findall(r'[a|t|c|g]{3}',dna) # [a|t][a|t|c|g]{2} won't necessarily match only at positions where pos % 3 == 0
    count = 0
    for x in numOfCodons:
        if str(x)[0]== 'a' or str(x)[0]== 't':
            count+=1
            print(str(x))
    return 100*count/len(numOfCodons)
print(atPct(DNA))
My goal is to do this without the for loop. I feel there is a more elegant way to do it with regular expressions alone, but I might be wrong; if there is a better way, I would be glad to learn it. Is there a way to combine the position constraint with "[a|t][a|t|c|g]{2}" in a single regular expression?
P.S. The question assumes a valid DNA sequence, which is why I haven't checked for that.
A loop will be faster than doing it another way. Still, you can use sum and a generator expression (another SO answer) to improve readability:
import re
def atPct(dna):
    # Find all sequences
    numSeqs = re.findall('[atgc]{3}', dna)
    # Count all sequences that start with 'a' or 't'
    atSeqs = sum(1 for seq in numSeqs if re.match('[at]', seq))
    # Return the percentage of sequences that start with 'a' or 't'
    return 100 * atSeqs / len(numSeqs)
DNA = "atgagtgaaagttaacgt"
print( atPct(DNA) )
So you just want to find out the percentage of times a or t appear in the first of every three characters in the string? Use the step parameter of a slice:
def atPct(dna):
    starts = dna[::3] # Every third character of dna, starting with the first
    return 100 * (starts.count('a') + starts.count('t')) / len(starts)
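A slightly more defensive sketch of the slicing idea, under the assumptions that the sequence is lowercase and that an incomplete trailing codon should be ignored. The name at_percentage is hypothetical.

```python
def at_percentage(dna):
    # Drop any trailing bases that do not form a complete codon
    usable = len(dna) - len(dna) % 3
    # First base of every complete codon
    starts = dna[:usable:3]
    if not starts:
        return 0.0  # no complete codon, avoid dividing by zero
    # Each base that is 'a' or 't' counts as 1
    return 100 * sum(base in 'at' for base in starts) / len(starts)

print(at_percentage("atgagtgaaagttaacgt"))
```

For the sample sequence the codons are atg, agt, gaa, agt, taa and cgt, so four of six start with a or t.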
