Efficient way to check if a specific character in a string appears consecutively - python-3.x

Say the character we want to check for consecutive occurrences in a string s is the dot '.'.
For example, 'test.2.1' has no consecutive dots, whereas 'test..2.2a...' does. In fact, for the second example, we should not even bother checking the rest of the string after the first occurrence of consecutive dots.
I have come up with the following simple method:
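def consecutive_dots(s):
    count = 0
    for c in s:  # scan the input string character by character
        if c == '.':
            count += 1
            if count == 2:
                return True  # stop at the first pair of consecutive dots
        else:
            count = 0  # any other character breaks the run of dots
    return False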
I was wondering if the above is the most 'pythonic' and efficient way to achieve this goal.

You can simply use the in operator to check whether two consecutive dots (that is, the substring "..") appear in s. The substring search also stops at the first match:
def consecutive_dots(s):
    return '..' in s
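The same idea works for any character: doubling the character gives the two-character substring to look for. A minimal sketch; the function name has_consecutive is hypothetical:

```python
def has_consecutive(s: str, ch: str) -> bool:
    """Return True if the single character ch appears at least twice in a row in s."""
    # ch * 2 builds the two-character substring (e.g. '..'); `in` performs a
    # substring search that returns as soon as the first occurrence is found.
    return ch * 2 in s

print(has_consecutive('test.2.1', '.'))       # False
print(has_consecutive('test..2.2a...', '.'))  # True
```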

Related

Replace string only if all characters match (Thai)

The problem is that มาก is technically contained in มาก็, because มาก็ is มาก + ็.
So when I do
"แชมพูมาก็เยอะ".replace("มาก", " X ")
I end up with
แชมพู X ็เยอะ
What I want instead:
แชมพู X เยอะ
What I really want is to force the last character ก็ to count as a single character, so that มาก no longer matches มาก็.
While I haven't found a proper solution, I did find a workaround: I split each string into its separate (combined) characters with a regex and then compare the resulting lists.
import regex  # third-party "regex" module; the built-in re module does not support \X

# Check if list s appears as a contiguous slice of list l
def is_slice_in_list(s, l):
    len_s = len(s)  # so we don't recompute the length of s on every iteration
    return any(s == l[i:len_s + i] for i in range(len(l) - len_s + 1))

def is_word_in_string(w, s):
    # \X matches one extended grapheme cluster, so ก็ stays a single element
    a = regex.findall(r'\X', w)
    b = regex.findall(r'\X', s)
    return is_slice_in_list(a, b)

assert is_word_in_string("มาก็", "พูมาก็เยอะ") == True
assert is_word_in_string("มาก", "พูมาก็เยอะ") == False
The regex will split like this:
พู ม า ก็ เ ย อ ะ
ม า ก
Because it compares ก็ to ก, the function concludes that the words are not the same.
I will mark this as answered, but if there is a nicer or "proper" solution I will choose that one.
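If installing the third-party regex module is not an option, a rough standard-library approximation is to glue every combining mark (Unicode categories Mn, Mc and Me) onto the preceding character and then replace only whole clusters. This is a sketch under that assumption, not full Unicode grapheme segmentation: the rules behind \X are more elaborate (for example, they also attach certain spacing vowel signs that are not combining marks). The names clusters and replace_clusters are hypothetical:

```python
import unicodedata

def clusters(text: str) -> list:
    """Split text into base characters plus any combining marks that follow them.

    Rough approximation of grapheme clusters: a character whose Unicode
    category is Mn, Mc or Me is appended to the previous cluster.
    """
    out = []
    for ch in text:
        if out and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            out[-1] += ch  # combining mark: extend the current cluster
        else:
            out.append(ch)  # base character: start a new cluster
    return out

def replace_clusters(text: str, old: str, new: str) -> str:
    """Replace old with new only where old matches whole clusters of text."""
    src, pat = clusters(text), clusters(old)
    if not pat:
        return text
    result, i = [], 0
    while i < len(src):
        if src[i:i + len(pat)] == pat:
            result.append(new)
            i += len(pat)  # skip past the matched clusters
        else:
            result.append(src[i])
            i += 1
    return "".join(result)

print(replace_clusters("แชมพูมาก็เยอะ", "มาก", " X "))   # แชมพูมาก็เยอะ (no match)
print(replace_clusters("แชมพูมาก็เยอะ", "มาก็", " X "))  # แชมพู X เยอะ
```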

Recursive function how to manage output

I'm working on a project to create word lists. I have a word and some rules: the character % stands for a digit and ^ for a special character. For example, January%%^ should create words like:
January00!
January01!
January02!
January03!
January04!
January05!
January06!
etc.
For now I'm trying to handle only digits, using a recursive function, because people can add as many digits and special characters as they want, for example:
January^%%%^%
This is the first function I have created:
month = "January"
nbDigit = "%%%"
def addNumber(month : list, position: int):
    for i in range(position, len(month)):
        for j in range(0,10):
            month[position] = j
            if(position == len(month)-1):
                print(''.join(str(v) for v in month))
            if position < len(month):
                if month[position+1] == "%":
                    addNumber(month, position+1)
The problem is that each % produces another complete run of output: with three %, I get January000 to January999 three times over.
When I tried to add the new function for special characters it got even worse, because I can't control the output, since not every word ends with a special character or a digit (AddSpecialChar is also a recursive function).
I believe what you are looking for is the following:
month = 'January'
nbDigit = "%%"

def addNumbers(root: str, mask: str) -> list:
    # create a list of words made of root followed by len(mask) digits
    rslt = []
    # number of words: 9 * (10**0 + ... + 10**(k-1)) + 1 == 10**k, with k = len(mask)
    mxNmb = 10 ** len(mask)
    for i in range(mxNmb):
        # left-pad i with zeros so that every suffix has exactly len(mask) digits
        word = f"{root}{str(i).rjust(len(mask), '0')}"
        rslt.append(word)
    return rslt
Calling addNumbers(month, nbDigit) will produce:
['January00',
'January01',
'January02',
'January03',
'January04',
'January05',
'January06',
'January07',
'January08',
'January09',
'January10',
'January11',
'January12',
'January13',
'January14',
'January15',
'January16',
'January17',
'January18',
'January19',
'January20',
'January21',
'January22',
'January23',
'January24',
'January25',
'January26',
'January27',
'January28',
'January29',
'January30',
'January31',
'January32',
'January33',
'January34',
'January35',
'January36',
'January37',
'January38',
'January39',
'January40',
'January41',
'January42',
'January43',
'January44',
'January45',
'January46',
'January47',
'January48',
'January49',
'January50',
'January51',
'January52',
'January53',
'January54',
'January55',
'January56',
'January57',
'January58',
'January59',
'January60',
'January61',
'January62',
'January63',
'January64',
'January65',
'January66',
'January67',
'January68',
'January69',
'January70',
'January71',
'January72',
'January73',
'January74',
'January75',
'January76',
'January77',
'January78',
'January79',
'January80',
'January81',
'January82',
'January83',
'January84',
'January85',
'January86',
'January87',
'January88',
'January89',
'January90',
'January91',
'January92',
'January93',
'January94',
'January95',
'January96',
'January97',
'January98',
'January99']
Adding another % to nbDigit (i.e. "%%%") will produce the numeric sequence from 000 to 999.
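Recursion is not strictly needed for mixed masks: itertools.product yields the Cartesian product of one pool of characters per mask position, so any combination of % and ^ can be expanded in one loop. This is a minimal sketch that swaps in itertools.product instead of the recursive approach; the set of special characters is a hypothetical choice, since the question does not say which ones ^ should produce, and expand is a hypothetical name:

```python
from itertools import product
import string

# Hypothetical pool for '^'; the question does not specify the special characters.
SPECIALS = "!@#$"

# Map each mask symbol to the characters it may expand to.
ALPHABETS = {"%": string.digits, "^": SPECIALS}

def expand(root: str, mask: str):
    """Yield root followed by every combination of characters allowed by mask."""
    pools = [ALPHABETS[symbol] for symbol in mask]  # KeyError for unknown symbols
    for combo in product(*pools):
        yield root + "".join(combo)

words = list(expand("January", "%%^"))
print(len(words))   # 400 (10 * 10 * 4)
print(words[:3])    # ['January00!', 'January00@', 'January00#']
```

Because expand is a generator, very long masks can be streamed to a file instead of being held in memory as a list.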

Making one string the anagram of other

I have a problem where two strings of the same length are given, and I have to tell how many letters I have to change in the first string to make it an anagram of the second.
Here is what I did:
count = 0
Mutable_str = ''.join(sorted("hhpddlnnsjfoyxpci"))
Ref_str = ''.join(sorted("ioigvjqzfbpllssuj"))
i = 0
while i < len(Mutable_str):
    if Mutable_str[i] != Ref_str[i]:
        count += 1
    i += 1
print(count)
My algorithm returned 16 in this case, but the correct answer is 10. Can someone tell me what is wrong with my code?
Thank you very much!
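You need to use str.count.
Comparing the two sorted strings position by position does not work: a single letter that is present in one string but missing from the other shifts every later letter out of alignment, so positions keep mismatching even where the letters are shared.
So you need to add up the differences between the number of occurrences of each character in the two strings. This can be done with str.count(c), where c is each distinct character in the second string (obtained with set()). We then take max() of the difference and 0, so that a negative difference does not affect the total.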
So as you can see, it boils down to one neat little one-liner:
def changes(s1, s2):
    return sum(max(0, s2.count(c) - s1.count(c)) for c in set(s2))
and some tests:
>>> changes("hhpddlnnsjfoyxpci", "ioigvjqzfbpllssuj")
10
>>> changes("abc", "bcd")
1
>>> changes("jimmy", "bobby")
4
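An alternative sketch with collections.Counter counts each string only once instead of calling str.count for every distinct character, so the work stays linear in the string length. It assumes, as the question states, that both strings have the same length:

```python
from collections import Counter

def changes(s1: str, s2: str) -> int:
    """Number of letters in s1 that must change to make it an anagram of s2."""
    # Counter subtraction keeps only positive counts, i.e. the letters that
    # s2 needs more of than s1 currently has.
    missing = Counter(s2) - Counter(s1)
    return sum(missing.values())

print(changes("hhpddlnnsjfoyxpci", "ioigvjqzfbpllssuj"))  # 10
```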

Searching a minimal string meeting some conditions

Recently, I was asked the following problem during an interview.
Given a string S, I need to find another string S2 such that S2 is a subsequence of S and also S is a subsequence of S2+reverse(S2). Here '+' means concatenation. I need to output the minimum possible length of S2 for a given S.
I was told that this is a dynamic programming problem however I was unable to solve it. Can somebody help me with this problem?
EDIT: Is there a way to do this in O(N^2) or less?
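For checking candidate answers on small inputs, a brute-force reference can help. This sketch is not from the thread: it tries every subsequence S2 of S in order of increasing length, starting at ceil(n/2) because S2+reverse(S2) must be at least as long as S. It is exponential, so it is only suitable for short strings; the helper names are hypothetical:

```python
from itertools import combinations

def is_subsequence(small: str, big: str) -> bool:
    """True if small can be obtained from big by deleting characters."""
    it = iter(big)
    # `ch in it` advances the shared iterator, so matches must appear in order.
    return all(ch in it for ch in small)

def brute_force_min_s2(s: str) -> int:
    """Minimum length of a subsequence S2 of s with s a subsequence of S2+reverse(S2)."""
    n = len(s)
    for k in range((n + 1) // 2, n + 1):
        for idx in combinations(range(n), k):
            s2 = "".join(s[i] for i in idx)
            if is_subsequence(s, s2 + s2[::-1]):
                return k
    return n  # not reached: S2 = s itself always works

print(brute_force_min_s2("locomotiffitomoc"))  # 9
```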
There are two important aspects to this problem.
Since we need S as a substring of S2+reverse(S2), S2 should have a length of at least n/2.
After concatenating S2 and reverse(S2), the letters repeat symmetrically around the junction, so the last letter of S2 appears twice in a row (for instance, if S2 is ABC, then S2+reverse(S2) is ABCCBA).
So the solution is to scan from the center of S to its end for two equal consecutive elements. When you find such a pair, check whether the elements on either side of it mirror each other.
If this check reaches the end of the string, the result (the minimum length of S2) is the number of elements from the start up to and including the first element of that pair. In the example this is C, so the result is 3.
This will not always happen exactly at the center, i.e. you may not find consecutive elements there. If the consecutive elements appear after the center, the same test applies.
[Figure: main string, substring, and concatenated string]
Now comes the major doubt: why do we only consider the part of S from the center onward? The answer is simple: the concatenated string is S2+reverse(S2), so we are sure that the last element of S2 appears twice in a row in it. No repetition in the first half of the main string can give a better result, because the concatenated string must contain at least n letters.
Now the matter of complexity:
Searching for consecutive letters takes at most O(n).
Checking the elements on either side iteratively has a worst-case complexity of O(n), i.e. at most n/2 comparisons.
The second check may fail many times, so the two complexities multiply, i.e. O(n*n).
I believe this is a correct solution and haven't found any loophole yet.
Let's say that S2 is "apple". Then we can make this assumption:
S2 + reverseS2 >= S >= S2
"appleelppa" >= S >= "apple"
So the given S will be something ranging from "apple" up to at most "appleelppa"; for example, it could be "appleel" or "appleelpp".
String S = "locomotiffitomoc";
// as you can see, S2 should be "locomotif", but we don't know it yet;
// the whole string is always a valid fallback
String S2 = S;
for (int a = 0; a < S.length(); a++) {
    int b = 0;
    // walk outward from the gap between positions a and a+1 while the
    // characters mirror each other, stopping at either end of the string
    while (a - b >= 0 && a + b + 1 < S.length()
            && S.charAt(a - b) == S.charAt(a + b + 1)) {
        b++;
    }
    // reaching the end of S means S = S2 + (a prefix of) reverse(S2)
    if (a + b + 1 == S.length()) {
        S2 = S.substring(0, a + 1);
        break;
    }
    // otherwise a character broke the mirror: try the next position
}
System.out.println(S2); // will print out "locomotif"
Congratulations, you found the minimum S2.
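For comparison, here is a Python port of the same scan. Like the Java code, this sketch assumes that S has to be a prefix of S2+reverse(S2) (a contiguous match), which is stricter than the subsequence condition in the question; min_s2_prefix is a hypothetical name:

```python
def min_s2_prefix(s: str) -> str:
    """Shortest prefix p of s such that s is a prefix of p + reverse(p)."""
    n = len(s)
    for a in range(n):
        b = 0
        # Expand around the gap between a and a+1 while the characters mirror.
        while a - b >= 0 and a + b + 1 < n and s[a - b] == s[a + b + 1]:
            b += 1
        # The mirror reached the end of s, so s[:a+1] generates all of s.
        if a + b + 1 == n:
            return s[:a + 1]
    return s  # only reached for the empty string

print(min_s2_prefix("locomotiffitomoc"))  # locomotif
```

The explicit bounds check replaces the exception-based exit and also stops when the left index would become negative.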
Each character of S can either be included in S2 or not. With that we can construct a recursion that tries two cases:
the first character of S is used for the cover,
the first character of S is not used for the cover,
and takes the minimum of these two covers. To implement this, it is enough to track how much of S is still left to cover with the already chosen S2+reverse(S2).
There are optimizations for cases where the result is already known (everything is covered, or no cover is possible), and there is no need to use the first character for the cover if it would not cover anything.
Simple Python implementation (Python 3):
cache = {}

def S2(S, to_cover):
    if not to_cover:  # Covered
        return ''
    if not S:  # Not covered
        return None
    if len(to_cover) > 2*len(S):  # Can't cover
        return None
    key = (S, to_cover)
    if key not in cache:
        without_char = S2(S[1:], to_cover)  # Calculate with first character skipped
        cache[key] = without_char
        _f = to_cover[0] == S[0]
        _l = to_cover[-1] == S[0]
        if _f or _l:
            # Calculate with first character used
            with_char = S2(S[1:], to_cover[int(_f):len(to_cover)-int(_l)])
            if with_char is not None:
                with_char = S[0] + with_char  # Prepend char to result
                if without_char is None or len(with_char) <= len(without_char):
                    cache[key] = with_char
    return cache[key]

s = '21211233123123213213131212122111312113221122132121221212321212112121321212121132'
c = S2(s, s)
print(len(s), s)
print(len(c), c)

Does Python have a string contains how many substring method, allowing for overlap?

I want to count how many times a substring occurs in a long string. How can I do that in Python? For example,
"12212"
contains "12" twice.
How do I get that count?
It must allow for overlapping substrings; for instance, "1111" contains 3 "11" substrings.
"12121" contains 2 "121" substrings.
"1111".count("11")
will return 2. It does not count any overlaps.
Strings have a count method, so you can do:
s = '12212'
s.count('12') # this equals 2
Edit for the updated question: the answer below was posted as a comment by tobias_k.
To count with overlaps:
def count_all(string, sub):
    # compare every window of len(sub) characters with sub; each True counts as 1
    return sum(string[i:i+len(sub)] == sub for i in range(len(string) - len(sub) + 1))
This can be called as:
count_all('1111', '11') # this returns 3
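Another option, sketched here with a hypothetical name count_overlapping, is to let str.find locate each occurrence and restart the search one character after the start of the previous match. This counts overlapping matches without building a slice for every position:

```python
def count_overlapping(s: str, sub: str) -> int:
    """Count occurrences of sub in s, including overlapping ones."""
    count = 0
    start = s.find(sub)
    while start != -1:
        count += 1
        # Resume one character after the previous match start so overlaps are found.
        start = s.find(sub, start + 1)
    return count

print(count_overlapping('1111', '11'))    # 3
print(count_overlapping('12121', '121'))  # 2
```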
