Get the nth occurrence of a letter in a string (python) - python-3.x

Let's say there is a string "abcd#abcd#a#".
How can I get the index of the 2nd occurrence of '#'?
The expected output is 9, since that is the position of the second '#'.

Using a generator expression:
text = "abcd#abcd#a#"
gen = (i for i, l in enumerate(text) if l == "#")
next(gen) # skip as many as you need
4
next(gen) # get result
9
As a function:
def index_for_occurrence(text, token, occurrence):
    gen = (i for i, l in enumerate(text) if l == token)
    for _ in range(occurrence - 1):
        next(gen)
    return next(gen)
Result:
index_for_occurrence(text, "#", 2)
9

s = 'abcd#abcd#a#'
s.index('#', s.index('#')+1)
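Both answers above raise an exception when the string has fewer occurrences than requested (StopIteration from next(), ValueError from str.index). Here is a minimal sketch that generalizes the str.index idea to any n and returns -1 instead. The name find_nth and the -1 convention are assumptions for illustration, not part of either answer.

```python
def find_nth(text: str, token: str, n: int) -> int:
    """Return the index of the n-th (1-based) occurrence of token, or -1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    index = -1
    for _ in range(n):
        # Resume the search just past the previous hit.
        index = text.find(token, index + 1)
        if index == -1:
            return -1  # fewer than n occurrences
    return index


print(find_nth("abcd#abcd#a#", "#", 2))
```

Because the search resumes at index + 1, overlapping occurrences of a multi-character token are counted too.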

Related

Finding a substring that occurs k times in a long string

I'm trying to solve an algorithm task, but my solution does not pass the time limit.
The condition of the task is the following:
You are given a long string consisting of small Latin letters. We need to find all its substrings of length n that occur at least k times.
Input format:
The first line contains two natural numbers n and k separated by a space.
The second line contains a string consisting of small Latin letters. The string length is 1 ≤ L ≤ 10^6.
n ≤ L, k ≤ L.
Output Format:
For each found substring, print the index of the beginning of its first occurrence (numbering in the string starts from zero).
Output indexes in any order, in one line, separated by a space.
My final solution looks something like this:
def polinomial_hash(s: str, q: int, R: int) -> int:
    h = 0
    for c in s:
        h = (h * q + ord(c)) % R
    return h
def get_index_table(inp_str, n):
    q = 1000000007
    power = q ** (n-1)
    R = 2 ** 64
    M = len(inp_str)
    res_dict = {}
    cur_hash = polinomial_hash(inp_str[:n], q, R)
    res_dict[cur_hash] = [0]
    for i in range(n, M):
        first_char = inp_str[i-n]
        next_char = inp_str[i]
        cur_hash = (
            (cur_hash - ord(first_char)*(power))*q
            + ord(next_char)) % R
        try:
            d_val = res_dict[cur_hash]
            d_val += [i-n+1]
        except KeyError:
            res_dict[cur_hash] = [i-n+1]
    return res_dict
if __name__ == '__main__':
    n, k = [int(i) for i in input().split()]
    inp_str = input()
    for item in get_index_table(inp_str, n).values():
        if len(item) >= k:
            print(item[0], end=' ')
Is it possible to somehow optimize this solution, or advise some alternative options?!
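One possible direction, offered as a sketch and not as a verified fix for the judge's time limit: q ** (n-1) is computed as an exact integer, which for n close to 10^6 has tens of millions of bits, and every loop iteration then multiplies by it. Reducing it modulo R up front with three-argument pow keeps the intermediate values small. The sketch also stores only a count and a first index per hash instead of a growing list. The function name find_repeated_substrings is hypothetical, and like the code above it assumes hash collisions are negligible and does not check for them.

```python
def find_repeated_substrings(s: str, n: int, k: int) -> list:
    """Return first-occurrence indexes of length-n substrings seen >= k times."""
    q = 1000000007
    R = 2 ** 64
    power = pow(q, n - 1, R)  # modular power: stays below 2**64

    # Hash of the first window.
    h = 0
    for c in s[:n]:
        h = (h * q + ord(c)) % R

    first_index = {h: 0}
    count = {h: 1}
    for i in range(n, len(s)):
        # Drop the outgoing character, shift, add the incoming one.
        h = ((h - ord(s[i - n]) * power) * q + ord(s[i])) % R
        if h in count:
            count[h] += 1
        else:
            count[h] = 1
            first_index[h] = i - n + 1
    return [first_index[key] for key, c in count.items() if c >= k]


print(*find_repeated_substrings("abab", 2, 2))
```

Reading the input with sys.stdin.readline and printing all indexes in a single call may help further, but that is untested against the original task.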

Why isn't chr() outputting the correct character?

I'm working on a Caesar cipher in Python 3, where s is the input string and k is the amount by which each letter is shifted. I'm currently just trying to get a letter like 'z' to wrap around to 'B' (I know the case is wrong, I'll fix it later). However, when I run caesarCipher with the inputs s = 'z' and k = 2, the line s[n] = chr((122-ord(s[n]) + 64 + k)) causes s[n] to equal 'D'. If I adjust it down two (logically, on the Unicode scale, this would equal 'B'), it makes s[n] = #. What am I doing wrong on that line that's causing 'B' not to be the output?
def caesarCipher(s, k):
    # Write your code here
    n = 0
    s = list(s)
    while n < len(s):
        if s[n].isalpha() == True:
            if (ord(s[n].lower())+k) < 123:
                s[n] = (chr(ord(s[n])+k))
                n += 1
            else:
                s[n] = chr((122-ord(s[n]) + 64 + k))
        else:
            n += 1
    s = ''.join(s)
    return s
You forgot to add 1 to n in the else branch of the test (ord(s[n].lower())+k) < 123, so s[n] gets processed twice or more.
Change it to
            else:
                s[n] = chr((122 - ord(s[n]) + 64 + k))
                n += 1
and if you input "z" and 2, you'll get "B"
print(caesarCipher("z", 2))
# B
and if you adjust it down two, you'll get "@" (chr(64)), which is the character two positions before "B" in ASCII.
...
            else:
                s[n] = chr((122 - ord(s[n]) + 62 + k))
                n += 1
...
print(caesarCipher("z", 2))
# @
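For reference, the wrap-around is often written with modular arithmetic instead of hard-coded offsets such as 122 and 64. This is a minimal sketch, assuming only ASCII letters are shifted and that case is preserved (so "z" with k = 2 becomes "b", not the "B" targeted above). The name caesar_shift is hypothetical.

```python
def caesar_shift(text: str, k: int) -> str:
    """Shift ASCII letters by k positions, wrapping within the alphabet."""
    result = []
    for ch in text:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            result.append(ch)  # leave non-letters untouched
            continue
        # Map to 0..25, shift, wrap with % 26, map back.
        result.append(chr((ord(ch) - base + k) % 26 + base))
    return "".join(result)


print(caesar_shift("z", 2))
```

Iterating over the characters directly also avoids the manual index n, which is where the missing increment crept in.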

Error when performing pattern search on a randomly generated characters:

So I am trying to implement the Knuth-Morris-Pratt algorithm in Python. Below is my implementation:
import random
def KMP(Pattern, Chars):
    # compute the start position (number of characters) of the longest suffix that matches the prefix
    # Then store prefix and the suffix into the list K, and then set the first element of K to be 0 and the second element to be 1
    K = [] # K[n] store the value so that if the mismatch happens at n, it should move pattern Pattern K[n] characters ahead.
    n = -1
    K.append(n) # add the first element, and keep n = 0.
    for k in range(1, len(Pattern) + 1):
        # traverse all the elements in Pattern, calculate the corresponding value for each element.
        while (n >= 0 and Pattern[n] != Pattern[k - 1]): # if n = 1, if n >= 1 and the current suffix does not match then try a shorter suffix
            n = K[n]
        n = n + 1 # if it matches, then the matching position should be one character ahead
        K.append(n) # record the matching position for k
    # match the string Chars with Pattern
    m = 0
    for i in range(0, len(Chars)): # traverse through the list one by one
        while (m >= 0 and Pattern[m] != Chars[i]): # if they do not match then move Pattern forward with K[m] characters and restart the comparison
            m = K[m]
        m = m + 1 # if position m matches, then move forward with the next position
        if m == len(Pattern): # if m is already the end of K (or Pattern), then a fully matched pattern is found. Continue the comparison by moving Pattern forward K[m] characters
            print(i - m + 1, i)
            m = K[m]
def main():
    Pattern = "abcba"
    letters = "abc"
    Chars = print ( ''.join(random.choice(letters) for i in range(1000)) )
    KMP(Pattern, Chars)
if __name__ == '__main__':
    main()
When I try to run this code on a string of randomly generated letters drawn from "abc", I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-25-c7bc734e5e35> in <module>
1 if __name__ == '__main__':
----> 2 main()
<ipython-input-24-2c3de20f253f> in main()
3 letters = "abc"
4 Chars = print ( ''.join(random.choice(letters) for i in range(1000)) )
----> 5 KMP(Pattern, Chars)
<ipython-input-21-edf1808c23d4> in KMP(Pattern, Chars)
14 #match the string Chars with Pattern
15 m = 0
---> 16 for i in range(0, len(Chars)): #traverse through the list one by one
17 while(m >= 0 and Pattern[m] != Chars[i]): # if they do not match then move Pattern forward with K[m] characters and restart the comparison
18 m = K[m]
TypeError: object of type 'NoneType' has no len()
I am not really sure what I am doing wrong, any help will be greatly appreciated
After I replaced
Chars = print ( ''.join(random.choice(letters) for i in range(1000)) )
by
Chars = ''.join(random.choice(letters) for i in range(1000))
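it worked for me. print() returns None, so the original line assigned None to Chars, which is why len(Chars) raised the TypeError.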
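If the generated text should still be shown, it can be built first and printed separately. The sketch below illustrates that with a compact KMP variant that returns the match positions instead of printing them. The name kmp_search and the fixed seed are assumptions for illustration, and the failure table here uses the common 0-based form instead of the -1 sentinel from the question.

```python
import random


def kmp_search(pattern: str, text: str) -> list:
    """Return the start indexes of all (possibly overlapping) matches."""
    if not pattern:
        return []
    # fail[i]: length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j

    matches = []
    j = 0
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = fail[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = fail[j - 1]  # keep going to find overlapping matches
    return matches


random.seed(0)  # fixed seed only so that runs are repeatable
text = "".join(random.choice("abc") for _ in range(1000))
print(text)                        # print the text ...
print(kmp_search("abcba", text))   # ... and search the string itself
```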

Python split string every n character

I need help finding a way to split a string every nth character, but I need it to overlap so as to get all the
An example should be clearer:
I would like to go from "BANANA" to "BA", "AN", "NA", "AN", "NA", "
Here's my code so far
import string
import re
def player1(s):
    pos1 = []
    inP1 = "AN"
    p = str(len(inP1))
    n = re.findall()
    for n in range(len(s)):
        if s[n] == inP1:
            pos1.append(n)
    points1 = len(pos1)
    return points1
if __name__ == '__main__':
    = "BANANA"
You can do this pretty simply with a list comprehension:
input_string = "BANANA"
[input_string[i]+input_string[i+1] for i in range(0,len(input_string)-1)]
or for every nth character:
index_range = 3
[''.join([input_string[j] for j in range(i, i+index_range)]) for i in range(0,len(input_string)-index_range+1)]
This iterates over the indexes of the word "BANANA" from 0 through 4 and prints each letter plus the next letter.
Because the range already stops one short of the last index, the check i < len(word) is always true and the else branch is never reached.
def splitFunc(word):
    for i in range(0, len(word)-1):
        if i < len(word):
            print(word[i] + word[i+1])
        else:
            break
splitFunc("BANANA")
Hope this helps
Those are called n-grams.
This should work :)
text = "BANANA"
n = 2
chars = [c for c in text]
ngrams = []
for i in range(len(chars)-n + 1):
    ngram = "".join(chars[i:i+n])
    ngrams.append(ngram)
print(ngrams)
output: ['BA', 'AN', 'NA', 'AN', 'NA']
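The same n-grams can also be produced by slicing the string directly, without first building a list of characters. This is a short sketch and the helper name ngrams is hypothetical. The last line counts the overlapping occurrences of "AN", which appears to be what player1 in the question is after.

```python
def ngrams(text: str, n: int) -> list:
    """Return all overlapping substrings of length n, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngrams("BANANA", 2))
print(ngrams("BANANA", 2).count("AN"))
```

If n is larger than the string, the range is empty and the result is an empty list.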

Printing on the same line different Permutations of a value

Hey guys so here is my question. I have written code that sums two prime numbers and prints the values less than or equal to 100 and even. How do I write it so that every combination of the number prints on the same line
like so
100 = 3 + 97 = 11 + 89
def isPrime(n):
    limit = int(n ** 0.5) +1
    for divisor in range (2, limit):
        if (n % divisor == 0):
            return False
    return True
def main():
    a = 0
    b = 0
    for n in range (4, 101):
        if (n % 2 == 0):
            for a in range (1, n + 1):
                if isPrime(a):
                    for b in range (1, n + 1):
                        if isPrime(b):
                            if n == (a + b):
                                print ( n, "=", a, "+", b)
main()
any ideas?
I don't know too much about strings yet, but I was thinking we could set the string as n == a + b and some how repeat on the same line where n == n print the a + b statement or idk haha
One way to do this is to accumulate a and b pairs in some collection, then print a line containing all the pairs. Here's an example with some comments explaining what's going on and some general Python tips:
def main():
    for n in range (4, 101, 2): # range() can have third argument -> step
        accumulator = []
        for a in filter(isPrime, range(1, n + 1)): # filter() is useful if you want to skip some values
            for b in filter(isPrime, range (1, n + 1)):
                if n == (a + b):
                    accumulator.append((a,b)) # We accumulate instead of printing
        str_accumulator = ["{} + {}".format(i[0], i[1]) for i in accumulator]
        joined_accumulator = " = ".join(str_accumulator)
        print("{} = {}".format(n, joined_accumulator))
Now, some explanation:
range(4, 101, 2) - As noted in the comment, range() takes an optional third argument, the step. The documentation has examples and explanations of how to use range.
filter() - A very useful generic iterator constructor. You pass a function that returns True/False and a collection, and you receive an iterator that yields only those elements of the collection that the function accepts. See the documentation.
str.format - For me, format is the best way to paste values into strings. It has plenty of options and is very versatile. The documentation is worth reading in full.
str.join - When you have a collection of strings and want to make one string out of them, join is what you want. It is usually faster than repeated str + str concatenation, and you don't have to care whether the collection holds one element or many. See the documentation.
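Putting these pieces together, here is a self-contained sketch that prints each even number with its prime pairs on one line. Two choices in it are assumptions beyond the code above: is_prime rejects numbers below 2 (the question's isPrime returns True for 1), and only pairs with a <= b are kept, so 3 + 97 is not repeated as 97 + 3. The names is_prime and goldbach_line are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial division; numbers below 2 are not prime."""
    if n < 2:
        return False
    for divisor in range(2, int(n ** 0.5) + 1):
        if n % divisor == 0:
            return False
    return True


def goldbach_line(n: int) -> str:
    """Format n with every prime pair a + b == n, a <= b, on one line."""
    # b is determined by a, so a single loop over a is enough.
    pairs = [(a, n - a) for a in range(2, n // 2 + 1)
             if is_prime(a) and is_prime(n - a)]
    joined = " = ".join("{} + {}".format(a, b) for a, b in pairs)
    return "{} = {}".format(n, joined)


for n in range(4, 101, 2):
    print(goldbach_line(n))
```

Deriving b as n - a replaces the inner loop over b, so each line needs a single pass over the candidates for a.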
