Count number of continuous matching elements in two different numbers in Python - python-3.x

Suppose we have two numbers a and b, and we need to count the continuous matching digits between them.
Some examples:
a = 123456, b = 456 ==> count: 3 matching digits
a = 556789, b = 55678 ==> count: 5 matching digits
I don't want unique digits but continuous matching ones, and I need the count. Displaying the matching digits would also be helpful. Also, can we do this with two different lists of numbers?
I am very new to Python and trying out a few things. Thanks

Given two numbers a and b:
a = 123456
b = 456
First you need to convert them to strings:
a_str = str(a)
b_str = str(b)
Then you need to check if there is a continuous match of b_str in a_str:
if b_str in a_str:
...
Finally you can check the length of b_str:
len(b_str)
This is the complete function:
def count_matching_elements(a, b):
    a_str, b_str = str(a), str(b)
    if b_str in a_str:
        return len(b_str)
    else:
        return -1  # no matches
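The function above only reports a match when all of b occurs inside a, and it does not show which digits matched. A sketch that returns both the count and the matching digits, including partial overlaps, using the standard library's difflib.SequenceMatcher (my choice of tool, not part of this answer; longest_matching_digits is a hypothetical name):

```python
from difflib import SequenceMatcher

def longest_matching_digits(a, b):
    """Return (count, digits) for the longest run of consecutive digits shared by a and b."""
    a_str, b_str = str(a), str(b)
    # autojunk=False turns off the "popular element" heuristic, which can hide matches in long inputs
    matcher = SequenceMatcher(None, a_str, b_str, autojunk=False)
    m = matcher.find_longest_match(0, len(a_str), 0, len(b_str))
    # m.a is the start of the match in a_str, m.size its length (0 if nothing matches)
    return m.size, a_str[m.a:m.a + m.size]

print(longest_matching_digits(123456, 456))    # (3, '456')
print(longest_matching_digits(556789, 55678))  # (5, '55678')
```

If nothing matches, find_longest_match returns a size of 0, so the function returns (0, '') instead of -1.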

What you want here is known as the longest common substring. You can find it like this (this code comes from the question "Find common substring between two strings", with the small difference that you want len(answer) rather than answer):
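def longestSubstringFinder(string1, string2):
    answer = ""
    len1, len2 = len(string1), len(string2)
    # try every offset of string2 against string1, including offsets where string2 starts before string1
    for i in range(-len2 + 1, len1):
        match = ""
        for j in range(len2):
            if 0 <= i + j < len1 and string1[i + j] == string2[j]:
                match += string2[j]
            else:
                if len(match) > len(answer):
                    answer = match
                match = ""
        # a match that runs to the end of string2 must be checked too (e.g. "456" in "123456")
        if len(match) > len(answer):
            answer = match
    return len(answer)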
Note that a and b have to be strings, so convert numbers with str() first.
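For the follow-up about two lists of numbers, the same idea works on any sequences if you compare items instead of characters. A sketch using the standard dynamic-programming formulation of the longest common substring (a different technique from the offset loop above; longest_common_run is a hypothetical name). It also returns the matching run itself, which covers displaying the matching digits:

```python
def longest_common_run(x, y):
    """Return (length, run) for the longest run of consecutive items shared by x and y."""
    best_len, best_end = 0, 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        # cur[j] = length of the common run ending at x[i-1] and y[j-1]; prev holds the row for i-1
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    # slicing x keeps the input type: a string gives a string, a list gives a list
    return best_len, x[best_end - best_len:best_end]

print(longest_common_run("123456", "456"))                 # (3, '456')
print(longest_common_run([5, 5, 6, 7, 8, 9], [5, 5, 6, 7, 8]))  # (5, [5, 5, 6, 7, 8])
```

This takes O(len(x) * len(y)) time but only one row of extra memory; on ties it keeps the run that ends first in x.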

Related

Recursive function how to manage output

I'm working on a project that creates a word list. I have a word and some rules: the character % stands for a digit and ^ for a special character. For example, January%%^ should create things like:
January00!
January01!
January02!
January03!
January04!
January05!
January06!
etc.
For now I'm trying to do it with digits only, using a recursive function, because people can add as many digits and special characters as they want:
January^%%%^% (for example)
This is the first function I have created:
month = "January"
nbDigit = "%%%"
def addNumber(month : list, position: int):
    for i in range(position, len(month)):
        for j in range(0,10):
            month[position] = j
            if(position == len(month)-1):
                print (''.join(str(v) for v in month))
            if position < len(month):
                if month[position+1] == "%":
                    addNumber(month, position+1)
The problem is that for each % there is another copy of the output (with three %, I get January000-January999 three times).
When I tried to add the function for special characters it got even worse, because I can't manage the output: not every word ends with a special character or a digit. (AddSpecialChar is also a recursive function.)
I believe what you are looking for is the following:
month = 'January'
nbDigit = "%%"
def addNumbers(root: str, mask: str) -> list:
    # create a list of words using root followed by digits
    rslt = []
    mxNmb = 0
    for i in range(len(mask)):
        mxNmb += 9 * 10**i
    mxNmb += 1  # mxNmb is now 10 ** len(mask), e.g. 100 for "%%"
    for i in range(mxNmb):
        word = f"{root}{((str(i).rjust(len(mask), '0')))}"
        rslt.append(word)
    return rslt
Calling addNumbers(month, nbDigit) will produce:
['January00',
'January01',
'January02',
'January03',
'January04',
'January05',
'January06',
'January07',
'January08',
'January09',
'January10',
'January11',
'January12',
'January13',
'January14',
'January15',
'January16',
'January17',
'January18',
'January19',
'January20',
'January21',
'January22',
'January23',
'January24',
'January25',
'January26',
'January27',
'January28',
'January29',
'January30',
'January31',
'January32',
'January33',
'January34',
'January35',
'January36',
'January37',
'January38',
'January39',
'January40',
'January41',
'January42',
'January43',
'January44',
'January45',
'January46',
'January47',
'January48',
'January49',
'January50',
'January51',
'January52',
'January53',
'January54',
'January55',
'January56',
'January57',
'January58',
'January59',
'January60',
'January61',
'January62',
'January63',
'January64',
'January65',
'January66',
'January67',
'January68',
'January69',
'January70',
'January71',
'January72',
'January73',
'January74',
'January75',
'January76',
'January77',
'January78',
'January79',
'January80',
'January81',
'January82',
'January83',
'January84',
'January85',
'January86',
'January87',
'January88',
'January89',
'January90',
'January91',
'January92',
'January93',
'January94',
'January95',
'January96',
'January97',
'January98',
'January99']
Adding another % to nbDigit ("%%%") will produce the numeric sequence from 000 to 999 (January000 to January999).
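The answer above handles digits only, while the question also asks about ^ and about masks such as January^%%%^%. One way to keep a recursive function's output under control is to print nothing inside the recursion and only yield a word once the whole pattern has been consumed. A minimal sketch, assuming that every character other than % and ^ is copied literally; expand and SPECIALS are hypothetical names, and the special characters are a placeholder because the question does not say which ones ^ should cover:

```python
import string

# Placeholder set of special characters (an assumption, not from the question)
SPECIALS = "!@#"

# Candidate characters for each mask symbol; any other character is copied as-is
CHOICES = {"%": string.digits, "^": SPECIALS}

def expand(pattern, prefix=""):
    """Recursively yield every word described by pattern, e.g. 'January%%^'."""
    if not pattern:
        # only complete words reach this point, so each one is produced exactly once
        yield prefix
        return
    head, rest = pattern[0], pattern[1:]
    for ch in CHOICES.get(head, head):
        yield from expand(rest, prefix + ch)

words = list(expand("January%%^"))
print(words[:3])  # ['January00!', 'January00@', 'January00#']
```

The same words can be produced without recursion via itertools.product(*(CHOICES.get(c, c) for c in pattern)), joining each tuple with ''.join.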

Check how many consecutive times appear in a string

I want to display the number or letter that appears the most times consecutively in a given string of letters, numbers, or both.
Example:
s= 'aabskeeebadeeee'
output: e appears 4 consecutive times
I thought about converting the string to a set, then for each set element looping over the string: if a character equals the element, increase the counter (count += 1) and check whether the next character is different; if so, store the counter in a list at the same index as the element in the set, but only if it is bigger than the value already stored.
The problem is that I get an IndexError (index out of range), although I think I am guarding against it.
s = 'aabskeeebadeeee'
c = 0
t = list(set(s)) # list of characters in s
li=[0,0,0,0,0,0] # list for counted repeats
print(t)
for x in t:
    h = t.index(x)
    for index, i in enumerate(s):
        maximus = len(s)
        if i == x:
            c += 1
            if index < maximus:
                if s[index +1] != x: # if next element is not x
                    if c > li[h]: #update c if bigger than existing
                        li[h] = c
                    c = 0
            else:
                if c > li[h]:
                    li[h] = c
for i in t:
    n = t.index(i)
    print(i,li[n])
print(f'{s[li.index(max(li))]} appears {max(li)} consecutive times')
Here is an O(n) time, O(1) space solution, that breaks ties by returning the earlier seen character:
def get_longest_consecutive_ch(s):
    count = max_count = 0
    longest_consecutive_ch = previous_ch = None
    for ch in s:
        if ch == previous_ch:
            count += 1
        else:
            previous_ch = ch
            count = 1
        if count > max_count:
            max_count = count
            longest_consecutive_ch = ch
    return longest_consecutive_ch, max_count
s = 'aabskeeebadeeee'
longest_consecutive_ch, count = get_longest_consecutive_ch(s)
print(f'{longest_consecutive_ch} appears {count} consecutive times in {s}')
Output:
e appears 4 consecutive times in aabskeeebadeeee
Regex offers a concise solution here:
import re
inp = "aabskeeebadeeee"
matches = [m.group(0) for m in re.finditer(r'(.)\1*', inp)]
print(matches)
matches.sort(key=len, reverse=True)
print(matches[0])
This prints:
['aa', 'b', 's', 'k', 'eee', 'b', 'a', 'd', 'eeee']
eeee
The strategy here is to find all islands of identical characters using re.finditer with the regex pattern (.)\1*, where (.) captures any single character (letter or digit) and \1* matches further repeats of it. Then we sort the resulting list by length, descending, to find the longest sequence.
Alternatively, you can use itertools.groupby(), which groups consecutive equal items and so makes counting runs quick. (Note: this can also be applied to broader cases, e.g. numbers.)
>>> from itertools import groupby
>>> s = 'aabskeeebadeeee'
>>> char_counts = [str(len(list(g)))+k for k, g in groupby(s)]
>>> char_counts
['2a', '1b', '1s', '1k', '3e', '1b', '1a', '1d', '4e']
>>> max(char_counts)  # string comparison: only reliable while every run is shorter than 10
'4e'
>>> # you can continue with the splitting or printing you need...
>>> ans = '4e'  # example
>>> print(f'the character with the longest run is {ans[-1]}, it appears {ans[:-1]} times in a row')
Output:
the character with the longest run is e, it appears 4 times in a row
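Because max() on strings such as '4e' compares text, a run of 10 or more ('10a') would sort below '9x'. A sketch that keeps the groupby idea but compares run lengths as integers and also reports ties (longest_runs is a hypothetical name; it works on strings or lists of numbers):

```python
from itertools import groupby

def longest_runs(seq):
    """Return (length, items), where items lists every element that has a run of that length."""
    # groupby yields one (key, group) pair per run of consecutive equal items
    runs = [(key, sum(1 for _ in group)) for key, group in groupby(seq)]
    if not runs:
        return 0, []
    best = max(length for _, length in runs)
    # keep each tied item once, in order of first appearance
    items = []
    for key, length in runs:
        if length == best and key not in items:
            items.append(key)
    return best, items

count, chars = longest_runs('aabskaaaabadcccc')
for ch in chars:
    print(f'{ch} appears {count} consecutive times')
```

For 'aabskaaaabadcccc' this prints the a line and then the c line, always in that order, since ties are listed by first appearance.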
This answer was posted as an edit to the question Check how many consecutive times appear in a string by the OP Ziggy Witkowski under CC BY-SA 4.0.
I did not want to use any libraries.
s = 'aabskaaaabadcccc'
lil = tuple(set(s))  # set of characters in s removes duplicates, then make a tuple (set order is arbitrary)
li = [0] * len(lil)  # counted repeats; the index of each count equals the index of its character in lil
for i in lil:  # iterate over the tuple of letters
    c = 0  # counter
    h = lil.index(i)  # index of the current letter
    for letter in s:  # iterate over the string characters
        if letter == i:  # check if equal to the character from the tuple
            c += 1  # if equal, counter +1
            if c > li[h]:  # update the stored count if the current run is longer
                li[h] = c
        else:
            c = 0
m = max(li)
for index, j in enumerate(li):  # check if there are characters with the same max value
    if j == m:
        print(f'{lil[index]} appears {m} consecutive times')
Output:
c appears 4 consecutive times
a appears 4 consecutive times

Python - Iterate through a list, swapping alphabets by indexes?

I need to write code that rotates the letters of the alphabet, using two lists.
So I need to define a function, let's say it is called rotate_text.
It takes two parameters: a string and an integer.
This is my code so far:
def rotate_text(text, n):
    plaintext = ['ABCDEFGHIJKLMNOPQRSTUVWXYZ']
    ciphertext = ['FGHIJKLMNOPQRSTUVWYXZABCDE']
    rotated_text = []
    for i in plaintext:
        rotated_text = ciphertext[plaintext[i + n]]
    result = ''.join(rotated_text)
    return result
So what it needs to do is: if I pass ABC as the parameter text and 2 as the parameter n, it should return CDE. Or DOG and 11 should return OZR. I don't really think I need that ciphertext list, so I think I will take it out, but how do I make this code work?
If the program gets ABC as the text, it should find A's index in the plaintext list, add n to that index, find the letter at the new index in the plaintext list, and then.... I am getting a headache.
Can anyone help?
How about this code? The text should contain only capital letters.
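def rotate_text(text, n):
    # strings are immutable, so collect the shifted letters in a list
    result = []
    for ch in text:
        number = ord(ch) - ord('A')  # 'A' -> 0, ..., 'Z' -> 25
        number = (number + n) % 26  # shift and wrap around past 'Z'
        result.append(chr(number + ord('A')))
    return ''.join(result)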
If you also want to handle lower case letters, you need if statements to pick the right starting point (ord('A') or ord('a')).
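Following that hint, a sketch that picks the base with if statements so both cases work. Leaving non-letters unchanged is my assumption, since the question only mentions capital letters:

```python
def rotate_text(text, n):
    """Shift letters by n places; each letter keeps its case, other characters are unchanged."""
    result = []
    for ch in text:
        if 'A' <= ch <= 'Z':
            base = ord('A')
        elif 'a' <= ch <= 'z':
            base = ord('a')
        else:
            result.append(ch)  # digits, spaces and punctuation pass through
            continue
        # % 26 wraps around the alphabet, also for negative n
        result.append(chr((ord(ch) - base + n) % 26 + base))
    return ''.join(result)

print(rotate_text('ABC', 2))    # CDE
print(rotate_text('Dog!', 11))  # Ozr!
```

Because Python's % always returns a non-negative result for a positive divisor, rotate_text(text, -n) undoes rotate_text(text, n).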

Return number of alphabetical substrings within input string

I'm trying to generate code to return the number of substrings within an input that are in sequential alphabetical order.
e.g. Input: 'abccbaabccba'
Output: 2
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def cake(x):
    for i in range(len(x)):
        for j in range (len(x)+1):
            s = x[i:j+1]
            l = 0
            if s in alphabet:
                l += 1
    return l
print (cake('abccbaabccba'))
So far my code will only return 1. Based on tests I've done on it, it seems it just returns a 1 if there are letters in the input. Does anyone see where I'm going wrong?
You are getting the output 1 every time because your code resets the count to l = 0 on every pass through the loop.
If you fix this, you will get the answer 96, because you are including a lot of redundant checks on empty strings ('' in alphabet returns True).
If you fix that, you will get 17, because your test string contains substrings of length 1 and 2, as well as 3+, that are also substrings of the alphabet. So, your code needs to take into account the minimum substring length you would like to consider—which I assume is 3:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def cake(x, minLength=3):
    l = 0
    for i in range(len(x)):
        # carefully specify both the start and end values of the loop that determines where your substring will end;
        # the end is len(x)+1 so that a substring can include the last character of x
        for j in range(i+minLength, len(x)+1):
            s = x[i:j]
            if s in alphabet:
                print(repr(s))
                l += 1
    return l
print (cake('abccbaabccba'))
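Note that cake() counts substrings, so one longer run such as 'abcde' can contribute several overlapping matches ('abc', 'abcd', 'bcd', ...). If you instead want to count maximal runs of consecutive letters, a single pass comparing neighbouring characters is enough. A sketch under that alternative reading of the question (count_alphabetical_runs is a hypothetical name; it assumes lowercase input):

```python
def count_alphabetical_runs(x, min_length=3):
    """Count maximal runs where each letter is the alphabet successor of the previous one."""
    count = 0
    run = 1 if x else 0
    for prev, cur in zip(x, x[1:]):
        # 'a' <= prev <= 'y' keeps the check inside the lowercase alphabet
        if 'a' <= prev <= 'y' and ord(cur) == ord(prev) + 1:
            run += 1
        else:
            if run >= min_length:
                count += 1
            run = 1
    # the last run has no following character to end it
    if run >= min_length:
        count += 1
    return count

print(count_alphabetical_runs('abccbaabccba'))  # 2
print(count_alphabetical_runs('abcd'))          # 1
```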

Change Letters in A String One at a Time (Pandas,Python3)

I have a list of words in Pandas (DF)
Words
Shirt
Blouse
Sweater
What I'm trying to do is swap out certain letters in those words with letters from my dictionary one letter at a time.
So, for example:
mydict = {"e":"q,w",
"a":"z"}
should create a new list that first replaces each "e" in the words one at a time, and then iterates again replacing each "a" one at a time:
Words
Shirt
Blouse
Sweater
Blousq
Blousw
Swqater
Swwater
Sweatqr
Sweatwr
Swezter
I've been looking around at solutions here: Mass string replace in python?
and have tried the following code, but it changes all instances of "e" at once instead of one at a time. Any help?
mydict = {"e":"q,w"}
s = DF
for k, v in mydict.items():
    for j in v:
        s['Words'] = s["Words"].str.replace(k, j)
DF["Words"] = s
This doesn't seem to work either:
s = DF.replace({"Words": {"e": "q","w"}})
This answer is very similar to Brian's answer, but a little bit sanitized and the output has no duplicates:
words = ["Words", "Shirt", "Blouse", "Sweater"]
md = {"e": "q,w", "a": "z"}
md = {k: v.split(',') for k, v in md.items()}
newwords = []
for word in words:
    newwords.append(word)
    for c in md:
        occ = word.count(c)
        pos = 0
        for _ in range(occ):
            pos = word.find(c, pos)
            for r in md[c]:
                tmp = word[:pos] + r + word[pos+1:]
                newwords.append(tmp)
            pos += 1
Content of newwords:
['Words', 'Shirt', 'Blouse', 'Blousq', 'Blousw', 'Sweater', 'Swqater', 'Swwater', 'Sweatqr', 'Sweatwr', 'Swezter']
Prettyprint:
Words
Shirt
Blouse
Blousq
Blousw
Sweater
Swqater
Swwater
Sweatqr
Sweatwr
Swezter
Any errors are a result of the current time. ;)
Update (explanation)
tl;dr
The main idea is to find the occurrences of the character in the word one after another. For each occurrence we then replace it with each replacement character (again one after another). The replaced word gets added to the output list.
I will try to explain everything step by step:
words = ["Words", "Shirt", "Blouse", "Sweater"]
md = {"e": "q,w", "a": "z"}
Well. Your basic input. :)
md = {k: v.split(',') for k, v in md.items()}
A simpler way to deal with the replacement dictionary. md now looks like {"e": ["q", "w"], "a": ["z"]}. Now we don't have to handle "q,w" and "z" differently: the replacement step is the same for both, regardless of the fact that "a" has only one replacement character.
newwords = []
The new list to store the output in.
for word in words:
    newwords.append(word)
We have to do those actions for each word (I assume the reason is clear). We also append the word directly to our just-created output list (newwords).
for c in md:
c as short for character. So for each character we want to replace (all keys of md), we do the following stuff.
occ = word.count(c)
occ for occurrences (yeah, count would fit as well :P). word.count(c) returns the number of occurrences of the character/string c in word. So "Sweater".count("o") => 0 and "Sweater".count("e") => 2.
We use this to know how often we have to look at word to find all occurrences of c.
pos = 0
Our start position for looking for c in word. It comes into use in the next loop.
for _ in range(occ):
For each occurrence. As the loop counter has no value for us here, we "discard" it by naming it _. At this point we don't know where c is in word. Yet.
pos = word.find(c, pos)
Oh. Look. We found c. :) word.find(c, pos) returns the index of the first occurrence of c in word, starting at pos. At the beginning, this means from the start of the string, i.e. the first occurrence of c. With this call we also update pos. Together with the last line (pos += 1), this moves our search window for the next round to start just behind the previous occurrence of c.
for r in md[c]:
Now you see why we converted md previously: we can easily iterate over md[c] now (a md[c].split(',') on the old md would do the job as well). So we now do the replacement for each of the replacement characters.
tmp = word[:pos] + r + word[pos+1:]
The actual replacement. We store it in tmp (for debugging reasons). word[:pos] gives us word up to the current occurrence of c (excluding c). r is the replacement. word[pos+1:] adds the rest of the word (again without c).
newwords.append(tmp)
Our so created new word tmp now goes into our output-list (newwords).
pos += 1
The already mentioned adjustment of pos to "jump over c".
Additional question from OP: Is there an easy way to dictate how many letters in the string I want to replace [(meaning e.g. multiple at a time)]?
Surely. But for now I only have a vague idea of how to achieve this. I am going to look at it once I've had some sleep. ;)
words = ["Words", "Shirt", "Blouse", "Sweater", "multipleeee"]
md = {"e": "q,w", "a": "z"}
md = {k: v.split(',') for k, v in md.items()}
num = 2  # this is the number of replacements at a time.
newwords = []
for word in words:
    newwords.append(word)
    for char in md:
        for r in md[char]:
            pos = multiples = 0
            current_word = word
            while current_word.find(char, pos) != -1:
                pos = current_word.find(char, pos)
                current_word = current_word[:pos] + r + current_word[pos+1:]
                pos += 1
                multiples += 1
                if multiples == num:
                    newwords.append(current_word)
                    multiples = 0
                    current_word = word
Content of newwords:
['Words', 'Shirt', 'Blouse', 'Sweater', 'Swqatqr', 'Swwatwr', 'multipleeee', 'multiplqqee', 'multipleeqq', 'multiplwwee', 'multipleeww']
Prettyprint:
Words
Shirt
Blouse
Sweater
Swqatqr
Swwatwr
multipleeee
multiplqqee
multipleeqq
multiplwwee
multipleeww
I added multipleeee to demonstrate how the replacement works: for num = 2, the first two occurrences are replaced, then the next two, so the replaced parts never overlap. If you want something like ['multiplqqee', 'multipleqqe', 'multipleeqq'], you would have to store the position of the "first" occurrence of char. You can then reset pos to just after that position in the if multiples == num: block.
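The overlapping variant described above can also be written without the while loop: collect the positions of char first, then replace every window of num consecutive occurrences. A sketch of that alternative (replace_windows is a hypothetical name; it assumes char is a single character):

```python
def replace_windows(word, char, repl, num):
    """Replace num consecutive occurrences of char with repl, sliding the window by one occurrence."""
    # indexes of every occurrence of char in word
    positions = [i for i, ch in enumerate(word) if ch == char]
    results = []
    for start in range(len(positions) - num + 1):
        chars = list(word)
        for pos in positions[start:start + num]:
            chars[pos] = repl
        results.append(''.join(chars))
    return results

print(replace_windows('multipleeee', 'e', 'q', 2))
# ['multiplqqee', 'multipleqqe', 'multipleeqq']
```

A word with fewer than num occurrences produces no variants, which matches how Blouse is handled in the output above.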
If you got further questions, feel free to ask. :)
Because you need to replace letters one at a time, this doesn't sound like a good problem to solve with pandas, since pandas is about doing everything at once (vectorized operations). I would dump out your DataFrame into a plain old list and use list operations:
words = DF.to_dict()["Words"].values()
for find, replace in reversed(sorted(mydict.items())):
    for word in words:
        occurrences = word.count(find)
        if not occurrences:
            print(word)
            continue
        start_index = 0
        for i in range(occurrences):
            for replace_char in replace.split(","):
                modified_word = list(word)
                index = modified_word.index(find, start_index)
                modified_word[index] = replace_char
                modified_word = "".join(modified_word)
                print(modified_word)
            start_index = index + 1
Which gives:
Words
Shirt
Blousq
Blousw
Swqater
Swwater
Sweatqr
Sweatwr
Words
Shirt
Blouse
Swezter
Instead of printing the words, you can append them to a list and re-create a DataFrame if that's what you want to end up with.
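A sketch of that list-building step, kept in plain Python (expand_words is a hypothetical name). It lists each word followed by its variants, so the order differs from the question's listing, which shows all original words first:

```python
def expand_words(words, mydict):
    """Return each word followed by its variants with one letter replaced at a time."""
    # "q,w" -> ["q", "w"] so every key maps to a list of replacement letters
    replacements = {k: v.split(",") for k, v in mydict.items()}
    result = []
    for word in words:
        result.append(word)
        for find, options in replacements.items():
            for index, ch in enumerate(word):
                if ch != find:
                    continue
                # one new word per replacement letter, changing only this position
                for replace_char in options:
                    result.append(word[:index] + replace_char + word[index + 1:])
    return result

print(expand_words(["Shirt", "Blouse", "Sweater"], {"e": "q,w", "a": "z"}))
```

To get a DataFrame back, something like pd.DataFrame({"Words": expand_words(DF["Words"], mydict)}) should work, since iterating a Series yields its values.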
If you are looping, you need to update s at each cycle of the loop. You also need to loop over v.
mydict = {"e":"q,w"}
s=deduped
for k, v in mydict.items():
    for j in v:
        s = s.replace(k, j)
Then reassign it to your dataframe:
df["Words"] = s
If you can write this as a function that takes a 1-d array (list, NumPy array, etc.), you can use df.apply() to apply it to any column.
