Stable sorting without using sort libraries - Python - python-3.x

I want to sort a list of words by word length but maintain the original order of words with same length (stable sorting).
For example:
words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
Should become
words = ['py', 'dog', 'cat', 'girl', 'book', 'hello']
I know this can be done easily with Python's sorted function, which is stable:
sorted(words, key=len)
But how would one do this efficiently without access to the library? I've been trying to brainstorm a solution, and I'm thinking:
1. Create a dict that contains the index (key) and length (value) of each word in words.
2. Maybe use a Counter to track how many words there are of each length (for this example, it would look like Counter({3: 2, 5: 1, 4: 2, 2: 1})).
3. Go through the dict looking for words with the minimum length in the Counter (2 in my example), and keep doing that for the number of times that length appears.
4. Repeat step 3, moving on to the next minimum in the Counter.
But this seems super inefficient. Could someone help with a better implementation?

You can just use a normal bubble-sort-style technique for this.
Algorithm
Loop i from 0 to len(words)-2
Nested loop j from i+1 to len(words)-1
Compare: is len(words[i]) > len(words[j])?
If so, swap words[i] and words[j]
words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
# first loop
for i in range(len(words)-1):
    # second loop
    for j in range(i+1, len(words)):
        # swapping if first word is longer than the second
        if len(words[i]) > len(words[j]):
            temp = words[i]
            words[i] = words[j]
            words[j] = temp
print(words)
Output
['py', 'cat', 'dog', 'girl', 'book', 'hello']
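This sorts by length in O(N*N) time, which is fairly costly, but fine for inputs of up to about 10^3 words. Note that the output above has 'cat' before 'dog': because the loop swaps words that are not adjacent, equal-length words can change their relative order, so this version is not stable.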
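The loop above swaps words that are far apart, which is why 'cat' ends up ahead of 'dog' in its output. A minimal sketch of a bubble sort that only swaps adjacent words, and only when the left one is strictly longer, keeps equal-length words in their original order. The function name bubble_sort_by_length is made up for illustration.

```python
def bubble_sort_by_length(words):
    """Return a new list sorted by word length; ties keep their input order."""
    result = list(words)  # work on a copy so the caller's list is untouched
    n = len(result)
    for end in range(n - 1, 0, -1):
        swapped = False
        for j in range(end):
            # Strict '>' means equal-length neighbours are never swapped,
            # which is what makes the sort stable.
            if len(result[j]) > len(result[j + 1]):
                result[j], result[j + 1] = result[j + 1], result[j]
                swapped = True
        if not swapped:
            break  # nothing moved in this pass, so the list is sorted
    return result


words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
print(bubble_sort_by_length(words))
# ['py', 'dog', 'cat', 'girl', 'book', 'hello']
```

This is still O(N*N) in the worst case; the early exit only helps when the input is already nearly sorted.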

One solution is to build dictionary buckets, where the keys are word lengths (integers) and the values are lists holding the words of that length. You can construct this dict in linear time.
Then traverse the dictionary from 0 to max_bucket, where max_bucket is the maximum length of any observed word:
words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
buckets = {}
max_bucket = -1
for w in words:
    buckets.setdefault(len(w), []).append(w)
    if len(w) > max_bucket:
        max_bucket = len(w)
out = [w for i in range(max_bucket+1) for w in buckets.get(i, [])]
print(out)
Prints:
['py', 'dog', 'cat', 'girl', 'book', 'hello']
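The buckets work well here because word lengths are small integers. For keys where that is not the case, a merge sort is another way to stay stable without calling sorted, in O(n log n) time. A minimal sketch, where merge_sort and its key parameter are illustrative names and not anything from the answers above:

```python
def merge_sort(items, key):
    """Stable merge sort: returns a new list ordered by key(item)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # '<=' takes from the left half on ties, so earlier items stay first.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


words = ['dog', 'cat', 'hello', 'girl', 'py', 'book']
print(merge_sort(words, key=len))
# ['py', 'dog', 'cat', 'girl', 'book', 'hello']
```

The stability comes entirely from the tie-breaking comparison: changing '<=' to '<' would let a word from the right half overtake an equal-length word from the left half.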

Related

How do I count the amount of 'edges' in a word in Python, using a dictionary?

For this exercise I need to count the 'number of edges' in the word 'AARDGASRESERVES' (Dutch), using a for loop and a dictionary. I already have the following code, but this doesn't give me the required output (which should be 31). I believe the middle part isn't quite right yet.
# dictionary of number of edges per letter
edges = {"A": 2, "R": 2, "D": 0,"G": 2,"S": 2,"R": 2,"E": 3, "V": 2}
word = "AARDGASRESERVES"
# loop over the letters in the word and count the number of edges
total_edges = 0
for key in edges:
    if edges[key] == word:
        total_edges += [value]
# print the total
print("Total number of edges:", total_edges)
I tried if edges[key] in word: too but that results in an error. I'm still new to Python so I might have written something that isn't even possible.
A simple way to do what you want is:
edges = {"A": 2, "R": 2, "D": 0,"G": 2,"S": 2,"R": 2,"E": 3, "V": 2}
word = "AARDGASRESERVES"
total_edges = sum(edges[c] for c in word)
# print the total
print("Total number of edges:", total_edges) # Output: 31
However, I advise you to carefully examine your code to understand why it doesn't work. For example here:
...
if edges[key] == word:
...
you compare a number (a value from your dict) with the whole word, so the condition can never be true.
From what I understand of the purpose of this code, it should look like this:
# dictionary of number of edges per letter
edges = {"A": 2, "R": 2, "D": 0,"G": 2,"S": 2,"R": 2,"E": 3, "V": 2}
word = "AARDGASRESERVES"
# loop over the letters in the word and count the number of edges
total_edges = 0
for key in word:
    total_edges += edges[key]
# print the total
print("Total number of edges:", total_edges)
What you want is to take each letter of the word, look up the value stored under that key in the dictionary, and add it to the total.
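Both versions above raise a KeyError if the word contains a letter that is missing from the dictionary. As a hedged sketch, assuming unknown letters should simply count as 0 edges (that default is an assumption, not part of the exercise), dict.get can supply a fallback. The helper name count_edges is made up for illustration.

```python
# dictionary of number of edges per letter (the duplicate "R" entry is dropped)
edges = {"A": 2, "R": 2, "D": 0, "G": 2, "S": 2, "E": 3, "V": 2}


def count_edges(word, table, default=0):
    """Sum the edge counts of the letters in word; unknown letters add `default`."""
    total = 0
    for letter in word:
        # get() returns `default` instead of raising KeyError for unknown letters
        total += table.get(letter, default)
    return total


print("Total number of edges:", count_edges("AARDGASRESERVES", edges))  # 31
```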

Python string duplicates

I have a list
a=['apple', 'elephant', 'ball', 'country', 'lotus', 'potato']
I am trying to find the longest element in the list that has no duplicate letters.
For example, the script should return "country", as it doesn't contain any repeated letters.
Please help.
You could also use collections.Counter for this:
from collections import Counter
a = ['apple', 'elephant', 'ball', 'country', 'lotus', 'potato']
a = set(a)
no_dups = []
for word in a:
    counts = Counter(word)
    if all(v == 1 for v in counts.values()):
        no_dups.append(word)
print(max(no_dups, key = len))
Which follows this procedure:
1. Converts a to a set, since we only need to look at each word once, just in case a contains duplicates.
2. Creates a Counter() object for each word.
3. Only appends words that have a count of 1 for each letter, using all().
4. Gets the longest word from the resulting list, using max().
Note: This does not handle ties, you may need to do further work to handle this.
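One way to handle the ties mentioned in the note is to return every word that reaches the maximum length. A minimal sketch, with longest_without_repeats as a made-up name; it uses len(set(word)) == len(word) as the no-repeated-letters test in place of Counter:

```python
def longest_without_repeats(words):
    """Return all longest words with no repeated letter, in input order."""
    # A word has no repeated letters exactly when its set of letters
    # is as large as the word itself.
    candidates = [w for w in words if len(set(w)) == len(w)]
    if not candidates:
        return []
    best = max(len(w) for w in candidates)
    # Keep every candidate that reaches the maximum length, skipping
    # repeats of the same word while preserving first-seen order.
    result = []
    for w in candidates:
        if len(w) == best and w not in result:
            result.append(w)
    return result


a = ['apple', 'elephant', 'ball', 'country', 'lotus', 'potato']
print(longest_without_repeats(a))                      # ['country']
print(longest_without_repeats(['abc', 'xyz', 'aa']))   # ['abc', 'xyz']
```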
def has_dup(x):
    unique = set(x)  # pick unique letters
    return any([x.count(e) != 1 for e in unique])  # find if any letter appears more than once
def main():
    a = ['apple', 'elephant', 'ball', 'country', 'lotus', 'potato']
    a = [e for e in a if not has_dup(e)]  # filter out words with duplicate letters
    chosen = max(a, key=len)  # choose with max length
    print(chosen)
if __name__ == '__main__':
    main()

Which character comes first?

So the input is word and I want to know if a or b comes first.
I can use a_index = word.find('a') and compare this to b_index = word.find('b') and if a is first, a is first is returned. But if b isn't in word, .find() will return -1, so simply comparing b_index < a_index would return b is first. This could be accomplished by adding more if-statements, but is there a cleaner way?
function description:
input: word, [list of characters]
output: the character in the list that appears first in the word
Example: first_instance("butterfly", ['a', 'u', 'e']) returns u
You can create a function that takes word and a list of chars, converts those chars into a set for fast lookup, and then loops over word and returns the first letter found in the set, e.g.:
# Chars can be any iterable whose elements are characters
def first_of(word, chars):
    # Remove duplicates and get O(1) lookup time
    lookup = set(chars)
    # Use optional default argument to next to return `None` if no matches found
    return next((ch for ch in word if ch in lookup), None)
Example:
>>> first_of('bob', 'a')
>>> first_of('bob', 'b')
'b'
>>> first_of('abob', 'ab')
'a'
>>> first_of("butterfly", ['a', 'u', 'e'])
'u'
This way you're only ever iterating over word once and short-circuit on the first letter found instead of running multiple finds, storing the results and then computing the lowest index.
Make a list of (position, char) pairs without the missing chars, and then take the one with the smallest position.
def first_found(word, chars):
    places = [x for x in ((word.find(c), c) for c in chars) if x[0] != -1]
    if not places:
        # no char was found
        return None
    else:
        return min(places)[1]
In any case you need to check the type of the input:
if isinstance(your_input, str):
    a_index = your_input.find('a')
    b_index = your_input.find('b')
    # Compare the a and b indexes
elif isinstance(your_input, list):
    # Note: list.index raises ValueError if the item is missing
    a_index = your_input.index('a')
    b_index = your_input.index('b')
    # Compare the a and b indexes
else:
    # Do something else
    pass
EDIT:
def first_instance(word, lst):
    indexes = {}
    for c in lst:
        pos = word.find(c)
        # find() returns -1 for a missing character, so skip those
        if pos != -1:
            indexes[c] = pos
    # None if no character from lst occurs in word
    return min(indexes, key=indexes.get, default=None)
It will return the character from the list lst that comes first in the word. Characters that do not occur in the word are skipped.
If you need to return the index of this letter instead, replace the return statement with this:
    return min(indexes.values(), default=None)

convert a list of lists to a list of string

I have a list of lists like this
list1 = [['I am a student'], ['I come from China'], ['I study computer science']]
len(list1) = 3
Now I would like to convert it into a list of string like this
list2 = ['I', 'am', 'a', 'student','I', 'come', 'from', 'China', 'I','study','computer','science']
len(list2) = 12
I am aware that I could convert it this way
new_list = [','.join(x) for x in list1]
But it returns
['I,am,a,student','I,come,from,China','I,study,computer,science']
len(new_list) = 3
I also tried this
new_list = [''.join(x for x in list1)]
but it gives the following error
TypeError: sequence item 0: expected str instance, list found
How can I extract each word in the sublist of list1 and convert it into a list of string? I'm using python 3 in windows 7.
Following your edit, I think the most transparent approach is now the one that was adopted by another answer (an answer which has since been deleted, I think). I've added some whitespace to make it easier to understand what's going on:
list1 = [['I am a student'], ['I come from China'], ['I study computer science']]
list2 = [
    word
    for sublist in list1
    for sentence in sublist
    for word in sentence.split()
]
print(list2)
Prints:
['I', 'am', 'a', 'student', 'I', 'come', 'from', 'China', 'I', 'study', 'computer', 'science']
Given a list of lists where each sublist contains strings, this could be solved using jez's strategy like this:
list2 = ' '.join([' '.join(strings) for strings in list1]).split()
Where the list comprehension transforms list1 to a list of strings:
>>> [' '.join(strings) for strings in list1]
['I am a student', 'I come from China', 'I study computer science']
The outer join then combines those strings into a single string, and split breaks that string into a list of words on whitespace.
If the sublists only contain single strings, you could simplify the list comprehension:
list2 = ' '.join([l[0] for l in list1]).split()

Generate every possible subset of length 1..n of list/string

Problem
I have a string, for which I want to find every possible subset of lengths 1..n.
Example
Given the string "abc" and n=3, I want to produce the following list:
{"a", "b", "c", "aa", "ab", "ac", "ba", ..., "aaa", "aab", "aac", "aba" ..., "ccc"}
My attempt
...is painfully novice. One loop for every n, nested n times.
For n = 3, I'd have:
characters = "abcdef" # and so on
for char in characters:
    print(char)
for char1 in characters:
    for char2 in characters:
        print(str(char1) + str(char2))
for char1 in characters:
    for char2 in characters:
        for char3 in characters:
            print(str(char1) + str(char2) + str(char3))
As you can see, this is unscalable to say the least. Is there a nice way to do this? Any complexity reductions would also be cool, although I have a hard time imagining any.
itertools.product is what you need. Use "".join to join the characters into a single string.
>>> import itertools
>>> n = 3
>>> s = "abc"
>>> for i in range(n):
...     print(["".join(prod) for prod in itertools.product(s, repeat=i + 1)])
...
['a', 'b', 'c']
['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
['aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca', 'acb', 'acc', 'baa', 'bab', 'bac', 'bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab', 'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']
In Python, I would suggest looping over the length i from 1 to n and, for each i, looping over the start position j from 0 to len(s)-i, printing the substring that starts at j and ends just before j+i.
Python has no substring function, so you take a slice instead: s[j:j+i].
This is approximately what you want, although you may have to adjust the edge terms.
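The loop described above could be sketched as follows. Note that it yields only contiguous substrings of the input, which is a narrower result than the itertools.product output (for example, it never produces "aa" from "abc"). The function name substrings is illustrative.

```python
def substrings(s, n):
    """Return the contiguous substrings of s with lengths 1..n, shortest first."""
    result = []
    for length in range(1, n + 1):
        # The last valid start index is len(s) - length.
        for start in range(len(s) - length + 1):
            result.append(s[start:start + length])
    return result


print(substrings("abc", 3))
# ['a', 'b', 'c', 'ab', 'bc', 'abc']
```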
