Grabbing the most duplicated letter from a string - python-3.x

What I want to accomplish is an algorithm that finds the most duplicated letter across the entire list of strings. I'm new to Python, so it's taken me roughly two hours to get to this stage. The problem with my current code is that it returns every duplicated letter, when I'm only looking for the most duplicated one. Additionally, I would like to know a faster way that doesn't use two for loops.
Code:
rock_collections = ['aasdadwadasdadawwwwwwwwww', 'wasdawdasdasdAAdad', 'WaSdaWdasSwd', 'daWdAWdawd', 'QaWAWd', 'fAWAs', 'fAWDA']
seen = []
dupes = []
for words in rock_collections:
    for letter in words:
        if letter not in seen:
            seen.append(letter)
        else:
            dupes.append(letter)
print(dupes)

If you are looking for the letter which appears the greatest number of times, I would recommend the following code:
def get_popular(strings):
    full = ''.join(strings)
    unique = list(set(full))
    return max(
        list(zip(unique, map(full.count, unique))), key=lambda x: x[1]
    )
rock_collections = [
    'aasdadwadasdadawwwwwwwwww',
    'wasdawdasdasdAAdad',
    'WaSdaWdasSwd',
    'daWdAWdawd',
    'QaWAWd',
    'fAWAs',
    'fAWDA'
]
print(get_popular(rock_collections)) # ('d', 19)
Let me break down the code for you:
full contains all of the strings joined together, with nothing between them. set(full) produces a set, which contains every distinct letter exactly once. list(set(full)) turns this back into a list, which gives the letters a fixed (though arbitrary) order to iterate over.
map(full.count, unique) goes through each of the unique letters and counts how many times it appears in the string. zip(unique, ...) pairs those counts with their respective letters. key=lambda x: x[1] tells max not to compare the (letter, count) tuples as a whole, but only their second element (the number of times the letter appears). max then returns the most common letter together with its count. Note that the counting is case-sensitive, so 'a' and 'A' are counted separately.
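For the "faster way" part of the question, the standard library's collections.Counter counts every character in a single pass over the joined text, instead of calling full.count once per distinct letter. A minimal sketch, where get_popular_counter is a hypothetical name:

```python
from collections import Counter


def get_popular_counter(strings):
    """Return the (letter, count) pair for the most frequent character across all strings."""
    # Counter tallies every character in one pass over the joined text
    counts = Counter("".join(strings))
    # most_common(1) returns a list holding the single most frequent (letter, count) pair
    return counts.most_common(1)[0]


rock_collections = [
    'aasdadwadasdadawwwwwwwwww',
    'wasdawdasdasdAAdad',
    'WaSdaWdasSwd',
    'daWdAWdawd',
    'QaWAWd',
    'fAWAs',
    'fAWDA',
]
print(get_popular_counter(rock_collections))  # ('d', 19)
```

Like the version above, this is case-sensitive; to treat upper- and lower-case letters as the same, count "".join(strings).lower() instead. When several letters share the top count, most_common lists them in the order they were first encountered.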

Related

Palindrome problem - Trying to check 2 lists for equality python3.9

I'm writing a program to check whether a given user input is a palindrome. If it is, the program should print "Yes"; if not, "No". I realize that this program is far more complex than it needs to be, since I only needed to check the whole word using the reversed() function, but I ended up splitting the word into two lists and then checking the lists against each other.
Despite that, I'm not clear why the last conditional isn't returning the expected "Yes" when I pass it "racecar" as input. When I print the two lists just before the final if, they are identical, but when I compare them in the conditional, I always get "No", meaning they are not equal to each other. Can anyone explain why? I've tried converting the lists to strings, but no luck.
def odd_or_even(a): # function for determining if odd or even
    if len(a) % 2 == 0:
        return True
    else:
        return False
the_string = input("How about a word?\n")
x = int(len(the_string))
odd_or_even(the_string) # find out if the word has an odd or an even number of characters
if odd_or_even(the_string) == True: # if even
    for i in range(x):
        first_half = the_string[0:int((x/2))] #create a list with part 1
        second_half = the_string[(x-(int((x/2)))):x] #create a list with part 2
else: #if odd
    for i in range(x):
        first_half = the_string[:(int((x-1)/2))] #create a list with part 1 without the middle index
        second_half = the_string[int(int(x-1)/2)+1:] #create a list with part 2 without the middle index
print(list(reversed(second_half)))
print(list(first_half))
if first_half == reversed(second_half): ##### NOT WORKING BUT DONT KNOW WHY #####
    print("Yes")
else:
    print("No")
Despite your comments, first_half and second_half are substrings of your input, not lists. When you print them out you convert them to lists, but in the comparison you do not convert first_half or reversed(second_half). So you are comparing a string to an iterator (the object returned by reversed), which will always be False.
So a basic fix is to do the conversion for the if, just like you did when printing the lists out:
if list(first_half) == list(reversed(second_half)):
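To see why the original comparison fails and the converted one works, here is a minimal check with the two halves of "racecar" hard-coded (the literal values are an assumption for illustration):

```python
first_half = "rac"
second_half = "car"

# reversed() returns an iterator object, not a string, so a string never equals it
print(first_half == reversed(second_half))               # False

# Converting both sides to lists compares them character by character
print(list(first_half) == list(reversed(second_half)))   # True

# Joining the iterator back into a string also works
print(first_half == "".join(reversed(second_half)))      # True
```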
A better fix might be to compare strings directly, by making one of the slices use a step of -1 so you don't need reversed at all. Try second_half = the_string[-1:x//2:-1]: that works as-is for odd lengths, but for even lengths the stop index needs to be x//2 - 1 so that the slice includes the character at x//2. Or you could use the "alien smiley" slice to reverse the string after you slice it out of the input: second_half = second_half[::-1].
There are a few other oddities in your code, like your for i in range(x) loops: the slices never use i, so each loop just repeats the same two assignments x times. You can delete the loops and keep only the assignments. You're also calling int a lot more often than you need to: len already returns an int, and if you used // instead of /, you could get rid of literally all of the int calls.
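Putting those suggestions together, a minimal sketch of the two-halves approach might look like this. is_palindrome is a hypothetical helper name, and the words are hard-coded instead of read with input() so the example runs on its own:

```python
def is_palindrome(word):
    """Compare the first half of word with the reversed second half."""
    half = len(word) // 2          # floor division, so no int() calls are needed
    first_half = word[:half]
    # Starting at len(word) - half skips the middle character when the length is odd
    second_half = word[len(word) - half:]
    return first_half == second_half[::-1]


for w in ["racecar", "abba", "abc"]:
    print(w, "Yes" if is_palindrome(w) else "No")
```

To use it as in the question, pass input("How about a word?\n") to is_palindrome. The one-line check word == word[::-1] gives the same answer.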

Sorting algorithm

I want to make my algorithm more efficient by deleting the items it has already sorted, but I don't know how to do it efficiently. The only way I found was to rewrite the whole list.
l = [] #Here you put your list
sl = [] # this is to store the list when it is sorted
a = 0 # variable to store which numbers he already looked for
while True: # loop
    if len(sl) == len(l): #if their size is matching it will stop
        print(sl) # print the sorted list
        break
    a = a + 1
    if a in l: # check if it is in list
        sl.append(a) # add to sorted list
        #here i want it to be deleted from the list.
The variable a is a little awkward. It starts at 0 and increments by 1 until it matches elements from the list l.
Imagine if l = [1000000, 1200000, -34]. Then your algorithm will first run for 1000000 iterations without doing anything, just incrementing a from 0 to 1000000. Then it will append 1000000 to sl. Then it will run another 200000 iterations without doing anything, just incrementing a from 1000000 to 1200000.
And then it will keep incrementing a, looking for the number -34, which is below zero and so will never be reached: the loop never terminates.
I understand the idea behind your variable a is to select the elements from l in order, starting from the smallest element. There is a function that does that: it's called min(). Try using that function to select the smallest element from l, and append that element to sl. Then delete this element from l; otherwise, the next call to min() will select the same element again instead of selecting the next smallest element.
Note that min() has a disadvantage: it returns the value of the smallest element, but not its position in the list. So it's not completely obvious how to delete the element from l after you've found it with min(). An alternative is to write your own function that returns both the element, and its position. You can do that with one loop: in the following piece of code, i refers to a position in the list (0 is the position of the first element, 1 the position of the second, etc) and a refers to the value of that element. I left blanks and you have to figure out how to select the position and value of the smallest element in the list.
....
for i, a in enumerate(l):
    if ...:
        ...
        ...
If you managed to do all this, congratulations! You have implemented "selection sort", a well-known sorting algorithm and one of the simplest. There exist many other sorting algorithms.
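If you get stuck, here is one possible completion of the enumerate skeleton, wrapped in a hypothetical selection_sort function. It is a sketch that works on a copy of the input rather than emptying l in place:

```python
def selection_sort(values):
    """Return a new sorted list, built by repeatedly removing the smallest element."""
    remaining = list(values)  # work on a copy so the caller's list is left intact
    result = []
    while remaining:
        # Find the position and value of the smallest remaining element
        min_pos, min_val = 0, remaining[0]
        for i, a in enumerate(remaining):
            if a < min_val:
                min_pos, min_val = i, a
        result.append(min_val)
        # Delete by position, so the next pass finds the next smallest element
        del remaining[min_pos]
    return result


print(selection_sort([1000000, 1200000, -34]))  # [-34, 1000000, 1200000]
```

Each pass scans the remaining elements and del shifts the later ones, so the whole sort is O(n^2), which is typical for selection sort. If you'd rather keep using min(), remaining.remove(min_val) deletes the first element equal to min_val, which works just as well here because equal values are interchangeable.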

How to find length of shortest unique substring and number of occurrences of all unique substrings of same length in a given string

The problem is to find the length of the shortest unique substring and the number of unique substrings of that same length occurring in the string. For example, "aatcc" has "t" as its shortest unique substring, with length 1, so the output will be 1,1. Another example is "aacc": here the output will be 2,3, since the unique substrings of length 2 are aa, ac and cc.
I tried to solve it but could only come up with a brute-force solution, which loops over all possible substrings. It exceeded the time limit.
I googled it and found some references to suffix arrays, but I'm not quite clear on them.
So what is the optimal solution for this problem?
EDIT: I forgot to mention the key requirement for this problem: the solution must NOT use any library functions other than the input and output functions used to read from standard input and write to standard output.
EDIT: I have found another solution using a trie data structure.
Pseudocode:
for i from 1 to length(string) do
    for j from 0 to length(string)-1 do
        1. create a substring of length i from jth character
        2. if checkIfSeen(substring) then count-- else count++
    close inner for loop
    if count >= 1 then break
close outer for loop
print i(the length of the unique substring), count (no. of such substrings)
checkIfSeen(substring) will use a trie data structure, which will run in O(log l), where l is the average length of the prefixes.
The time complexity of this algorithm would be O(n^2 log l); if the average length of the prefixes is n/2, the time complexity would be O(n^2 log n). Please point out any mistakes, and any ways to improve this running time if possible.
Keep in mind that my answer is based on a program I wrote in Python, but the approach can be applied to any programming language :)
I believe a brute-force approach is indeed what this problem needs, but we can shorten the running time like this:
1: Start the brute force from the smallest substring length, which is 1.
2: After looping through the string with substring length 1 (the data will look something like {"a":2, "t":1, "c":2} for "aatcc"), check whether any substring appeared only once. If one did, count the occurrences by looping through the dictionary (in the example you gave, only "t" appeared once, so the occurrence count is 1).
3: After the occurrences are counted, break out of the loop so that it does not waste time counting the remaining, longer substrings.
4: If in step 2 no unique substring was found, reset the dictionary and try the next substring length (the data can be something like {"aa":1, "ac":1, "cc":1} for "aacc"). Eventually a unique substring WILL be found no matter what (for example, in the string "aaaaa" the unique substring is "aaaaa" itself, with the data {"aaaaa":1}).
Here is the implementation in Python:
def countString(string):
    for i in range(1, len(string)+1): #start the brute force from substring length 1
        dictionary = {}
        for j in range(len(string)-i+1): #check every substring of length i
            #count the substring occurrences
            try:
                dictionary[string[j:j+i]] += 1
            except KeyError: #first time this substring is seen
                dictionary[string[j:j+i]] = 1
        isUnique = False #loop stops if isUnique is True
        occurrence = 0
        for key in dictionary: #iterate through the dictionary
            if dictionary[key] == 1: #check if any substring is unique
                #if found, get ready to escape from the loop and increase the occurrence
                isUnique = True
                occurrence += 1
        if isUnique:
            return (i, occurrence)
print(countString("aacc")) #prints (2, 3)
print(countString("aatcc")) #prints (1, 1)
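This design should be reasonably fast when a short unique substring exists, since it stops at the first length that has one. In the worst case, such as a string of one repeated character where nothing is unique until the full length, the slicing and hashing add up to O(n^3) work, so there should be a better way. But anyway, I hope this helped :)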
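For comparison, the trie idea from the question's edit can avoid the count-- bookkeeping by building a suffix trie that stores an occurrence count in every node: each node is one distinct substring, and it is unique exactly when its count is 1. This is a sketch of that technique, not the asker's exact pseudocode, assuming built-in dicts and lists are allowed (the question only forbids library functions); shortest_unique_substring is a hypothetical name. Building the trie steps through each character of every suffix once, so time and memory are O(n^2).

```python
def shortest_unique_substring(s):
    """Return (length, count) for the shortest substrings of s that occur exactly once.

    Sketch of a suffix trie: each node stands for one distinct substring of s, and
    node[0] counts how many start positions in s produce that substring.
    """
    root = [0, {}]  # a node is [occurrence_count, {char: child_node}]
    for start in range(len(s)):
        node = root
        for pos in range(start, len(s)):
            children = node[1]
            if s[pos] not in children:
                children[s[pos]] = [0, {}]
            node = children[s[pos]]
            node[0] += 1  # one more start position reaches this substring

    # Visit the trie one depth at a time; depth == substring length
    level = [root]
    length = 0
    while level:
        length += 1
        next_level = []
        unique = 0
        for node in level:
            for child in node[1].values():
                next_level.append(child)
                if child[0] == 1:
                    unique += 1
        if unique:
            return (length, unique)
        level = next_level
    return None  # only reached for the empty string


print(shortest_unique_substring("aacc"))   # (2, 3)
print(shortest_unique_substring("aatcc"))  # (1, 1)
```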

How to slice a list of strings till index of matched string depending on if-else condition

I have a list of strings:
['after','second','shot','take','note','of','the','temp']
I want to strip all strings after the appearance of 'note'.
It should return
['after','second','shot','take']
There are also lists which do not have the flag word 'note'.
So for a list of strings such as
['after','second','shot','take','of','the','temp']
it should return the list as it is.
How can I do that in a fast way? I have to repeat the same thing with many lists of unequal length.
tokens = [tokens[:tokens.index(v)] if v == 'note' else v for v in tokens]
There is no need to iterate when you can slice the list:
strings[:strings.index('note')+1]
where strings is your input list of strings. The end index of a slice is exclusive, so the +1 makes sure 'note' itself is included; drop the +1 if, as in your expected output, 'note' should be excluded as well.
In case 'note' is missing:
try:
    final_lst = strings[:strings.index('note')+1]
except ValueError:
    final_lst = strings
If you'd rather check that the flag word is present first:
if 'note' in lst:
    lst = lst[:lst.index('note')+1]
Pretty much the same as #Austin's answer above.
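Since the same cut has to be applied to many lists, it may help to wrap it in a small function. A sketch, where cut_at_flag and its keep_flag parameter are hypothetical names; by default it matches the question's expected output by leaving 'note' out:

```python
def cut_at_flag(tokens, flag="note", keep_flag=False):
    """Return tokens up to the first occurrence of flag, or a copy of tokens if flag is absent."""
    if flag not in tokens:
        return list(tokens)
    end = tokens.index(flag)
    if keep_flag:
        end += 1  # include the flag word itself (the slice end is exclusive)
    return tokens[:end]


lists = [
    ['after', 'second', 'shot', 'take', 'note', 'of', 'the', 'temp'],
    ['after', 'second', 'shot', 'take', 'of', 'the', 'temp'],
]
print([cut_at_flag(t) for t in lists])
```

The in check and index each scan the list, so the try/except version above does slightly less work when the flag is usually present; for short token lists the difference is negligible.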

merging some entries in a python list based on length of items

I have a list of about 20-30 items [strings].
I'm able to print them out in my program just fine - but I'd like to save some space, and merge items that are shorter...
So basically, if I have 2 consecutive items whose combined length is less than 30, I want to join those two items into a single entry in the list, with a / between them.
I'm not coming up with a simple way of doing this.
I don't care if I do it in the same list, or make a new list of items... it's all happening inside 1 function...
You need to loop through the list and keep joining consecutive items as long as the result stays within your limit (30 characters). When adding the next item would exceed it, append the accumulated string to the new list and start again from that item.
l=[] # your new list
buff=yourList[0] if len(yourList)>0 else "" # hold strings till they reach desired length
for i in range(1,len(yourList)):
    # check if concatenating will exceed the size or not
    t=yourList[i]
    if (len(buff) + len(t) + 1) <= 30:
        buff+="/"+t
    else:
        l.append(buff)
        buff=t
if yourList: # avoid adding an empty string when the input list is empty
    l.append(buff) # since last element is yet to be inserted
You can use the list's extend method as follows:
a = [1,2,3]
b = [4,5,6]
a.append('/')
a.extend(b)
You just need to check the sizes of the two lists a and b as per your requirements.
I hope I understood your problem!
This code worked for me; check whether it's what you wanted. It's a bit lengthy, but it works.
list1 = yourListOfElements
for elem in list1:
    try: # Needs try/except, otherwise the last iteration would raise an IndexError
        listaAUX = [] # Auxiliary list to check length and join smaller elements. You can probably eliminate this using list slicing
        listaAUX.append(elem)
        listaAUX.append(list1[list1.index(elem)+1])
        if len(listaAUX[0]) + len(listaAUX[1]) < 30:
            concatenated = '/'.join(listaAUX)
            print(concatenated)
        else:
            print(elem)
    except IndexError:
        print(elem)
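If you want to merge strictly pairs, as the question describes, and build a new list rather than print, walking the list by index avoids list1.index(elem). That lookup returns the first matching element, so it picks the wrong neighbour when the list has duplicates, and an item that was already merged can be printed again on the next iteration. A sketch, where merge_short_pairs and its limit and sep parameters are hypothetical names:

```python
def merge_short_pairs(items, limit=30, sep="/"):
    """Join two consecutive items with sep when their combined length is below limit."""
    merged = []
    i = 0
    while i < len(items):
        # Look at the current item and the next one, if there is a next one
        if i + 1 < len(items) and len(items[i]) + len(items[i + 1]) < limit:
            merged.append(items[i] + sep + items[i + 1])
            i += 2  # both items are consumed by the merge
        else:
            merged.append(items[i])
            i += 1
    return merged


print(merge_short_pairs(["red", "blue", "a very long item that is over thirty chars", "green"]))
# ['red/blue', 'a very long item that is over thirty chars', 'green']
```

Like the question, this compares only the combined length of the two items; add len(sep) to the sum if the separator should count toward the limit.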
