Related
I have two lists [1, 2, 3, 1, 2, 1] and [a, b, c, d, e, f]. I want to reorder elements in the second list according to the permutations that sort the first list. Sorting the first list gives [1, 1, 1, 2, 2, 3] but there are many possible permutations for the second list to be sorted by the first i.e. [a, d, f, b, e, c], [d, f, a, e, b, c], etc..
How can I generate all of these permutations in an efficient manner in python?
If I just wanted one permutation I could get one by something like this:
sorted_numbers, sorted_letters = list(zip(*[(x, y) for x, y in sorted(zip(numbers, letters))]))
If the size of the lists is not too large you could just use a list comprehension to filter all the permutations with a helper function:
from itertools import permutations
def is_valid_ordering(perm: str, ch_to_order: dict) -> bool:
if not perm or len(perm) <= 1:
return True
for ch1, ch2 in zip(perm[:-1], perm[1:]):
if ch_to_order[ch1] > ch_to_order[ch2]:
return False
return True
lst_1 = [1, 2, 3, 1, 2, 1]
lst_2 = ['a', 'b', 'c', 'd', 'e', 'f']
ch_to_order = {ch: o for ch, o in zip(lst_2, lst_1)}
valid_permutations = [
list(p) for p in permutations(lst_2)
if is_valid_ordering(p, ch_to_order)
]
for valid_perm in valid_permutations:
print(valid_perm)
Output:
['a', 'd', 'f', 'b', 'e', 'c']
['a', 'd', 'f', 'e', 'b', 'c']
['a', 'f', 'd', 'b', 'e', 'c']
['a', 'f', 'd', 'e', 'b', 'c']
['d', 'a', 'f', 'b', 'e', 'c']
['d', 'a', 'f', 'e', 'b', 'c']
['d', 'f', 'a', 'b', 'e', 'c']
['d', 'f', 'a', 'e', 'b', 'c']
['f', 'a', 'd', 'b', 'e', 'c']
['f', 'a', 'd', 'e', 'b', 'c']
['f', 'd', 'a', 'b', 'e', 'c']
['f', 'd', 'a', 'e', 'b', 'c']
Alternatively if the lists are large and therefore efficiency is important, you could construct only the valid orderings (see Stef's answer for an even better approach than below):
from collections import defaultdict
from itertools import permutations, product
from iteration_utilities import flatten
lst_1 = [1, 2, 3, 1, 2, 1]
lst_2 = ['a', 'b', 'c', 'd', 'e', 'f']
equivalent_chars = defaultdict(list)
for o, ch in zip(lst_1, lst_2):
equivalent_chars[o].append(ch)
equivalent_char_groups = [g for o, g in sorted(equivalent_chars.items())]
all_group_permutations = [[list(p) for p in permutations(group)]
for group in equivalent_char_groups]
valid_permutations = [
list(flatten(p)) for p in product(*all_group_permutations)
]
for valid_perm in valid_permutations:
print(valid_perm)
Using itertools to build the Cartesian product of the permutations for each duplicated key:
Code
from itertools import chain, permutations, groupby, product
from operator import itemgetter
def all_sorts(numbers, letters):
return [list(map(itemgetter(1), chain.from_iterable(p))) for p in product(*(permutations(g) for _,g in groupby(sorted(zip(numbers, letters)), key=itemgetter(0))))]
print( all_sorts([1,2,3,1,2,1], 'abcdef') )
# [['a', 'd', 'f', 'b', 'e', 'c'], ['a', 'd', 'f', 'e', 'b', 'c'], ['a', 'f', 'd', 'b', 'e', 'c'], ['a', 'f', 'd', 'e', 'b', 'c'], ['d', 'a', 'f', 'b', 'e', 'c'], ['d', 'a', 'f', 'e', 'b', 'c'], ['d', 'f', 'a', 'b', 'e', 'c'], ['d', 'f', 'a', 'e', 'b', 'c'], ['f', 'a', 'd', 'b', 'e', 'c'], ['f', 'a', 'd', 'e', 'b', 'c'], ['f', 'd', 'a', 'b', 'e', 'c'], ['f', 'd', 'a', 'e', 'b', 'c']]
This approach is optimal in the sense that it generates the solutions directly, rather that filtering them from a huge list of candidates. With the given example list of size 6, it generates only 12 solutions, rather than filtering through all 720 permutations of a list of size 6.
How it works:
First we sort and group by key, using sorted and itertools.groupby. Note operator.itemgetter(0) is the same as lambda t: t[0].
>>> [list(g) for _,g in groupby(sorted(zip(numbers, letters)), key=itemgetter(0))]
[[(1, 'a'), (1, 'd'), (1, 'f')],
[(2, 'b'), (2, 'e')],
[(3, 'c')]]
Then we generate the possible permutations of every key, using itertools.permutation on every group.
>>> [list(permutations(g)) for _,g in groupby(sorted(zip(numbers, letters)), key=itemgetter(0))]
[[((1, 'a'), (1, 'd'), (1, 'f')), ((1, 'a'), (1, 'f'), (1, 'd')), ((1, 'd'), (1, 'a'), (1, 'f')), ((1, 'd'), (1, 'f'), (1, 'a')), ((1, 'f'), (1, 'a'), (1, 'd')), ((1, 'f'), (1, 'd'), (1, 'a'))],
[((2, 'b'), (2, 'e')), ((2, 'e'), (2, 'b'))],
[((3, 'c'),)]]
Then we build the Cartesian product of these lists of permutations, using itertools.product; and we rebuild a list from each tuple in the Cartesian product, using itertools.chain to concatenate. Fially we "undecorate", discarding the keys and keeping only the letters, which I did with map(itemgetter(1), ...) but could have equivalently done with a list comprehension [t[1] for t in ...].
>>> [list(map(itemgetter(1), chain.from_iterable(p))) for p in product(*(permutations(g) for _,g in groupby(sorted(zip(numbers, letters)), key=itemgetter(0))))]
[['a', 'd', 'f', 'b', 'e', 'c'], ['a', 'd', 'f', 'e', 'b', 'c'], ['a', 'f', 'd', 'b', 'e', 'c'], ['a', 'f', 'd', 'e', 'b', 'c'], ['d', 'a', 'f', 'b', 'e', 'c'], ['d', 'a', 'f', 'e', 'b', 'c'], ['d', 'f', 'a', 'b', 'e', 'c'], ['d', 'f', 'a', 'e', 'b', 'c'], ['f', 'a', 'd', 'b', 'e', 'c'], ['f', 'a', 'd', 'e', 'b', 'c'], ['f', 'd', 'a', 'b', 'e', 'c'], ['f', 'd', 'a', 'e', 'b', 'c']]
Another implementation without filtering:
from itertools import product, permutations, chain
numbers = [1, 2, 3, 1, 2, 1]
letters = ['a', 'b', 'c', 'd', 'e', 'f']
grouper = {}
for number, letter in zip(numbers, letters):
grouper.setdefault(number, []).append(letter)
groups = [grouper[number] for number in sorted(grouper)]
for prod in product(*map(permutations, groups)):
print(list(chain.from_iterable(prod)))
Output:
['a', 'd', 'f', 'b', 'e', 'c']
['a', 'd', 'f', 'e', 'b', 'c']
['a', 'f', 'd', 'b', 'e', 'c']
['a', 'f', 'd', 'e', 'b', 'c']
['d', 'a', 'f', 'b', 'e', 'c']
['d', 'a', 'f', 'e', 'b', 'c']
['d', 'f', 'a', 'b', 'e', 'c']
['d', 'f', 'a', 'e', 'b', 'c']
['f', 'a', 'd', 'b', 'e', 'c']
['f', 'a', 'd', 'e', 'b', 'c']
['f', 'd', 'a', 'b', 'e', 'c']
['f', 'd', 'a', 'e', 'b', 'c']
It first groups the letters by their numbers, using a dict:
grouper = {1: ['a', 'd', 'f'], 2: ['b', 'e'], 3: ['c']}
Then it sorts the numbers and extracts their letter groups:
groups = [['a', 'd', 'f'], ['b', 'e'], ['c']]
Then just permute each group and build and chain the products.
I have the following 2D list:
test_list = [['A', 'B', 'C'], ['I', 'L', 'A', 'C', 'K', 'B'], ['J', 'I', 'A', 'B', 'C']]
I want to compare the 1st list elements of the 2D array test_list[0] with all other lists. If the elements ['A', 'B', 'C'] are present in all other lists then it should print any message such as "All elements are similar".
I have tried this piece of code but it is not working as I expected:
test_list = [['A', 'B', 'C'], ['I', 'L', 'A', 'C', 'K', 'B'], ['J', 'I', 'A', 'B', 'C']]
for idx,ele in enumerate(p):
result = set(test_list [0]).intersection(test_list [(idx + 1) % len(temp_d)])
print(result)
Expected Output:
The elements of the list ['A', 'B', 'C'] are present in all other lists.
You can use the all(...) function - or remove all elements from the bigger list from your smaller one converted to set. If the set.difference() is Falsy (i.e. all elements were removed) they were all contained in it:
test_list = [['A', 'B', 'C'], ['I', 'L', 'A', 'C', 'K', 'B'], ['J', 'I', 'A', 'B', 'C']]
s = test_list[0]
for e in test_list[1:]:
if all(v in e for v in s):
print(e, "contains all elements of ", s)
s = set(s)
for e in test_list[1:]:
# if all elements of s are in e the difference will be an empty set == Falsy
if not s.difference(e):
print(e, "contains all elements of ", s)
Output:
['I', 'L', 'A', 'C', 'K', 'B'] contains all elements of ['A', 'B', 'C']
['J', 'I', 'A', 'B', 'C'] contains all elements of ['A', 'B', 'C']
['I', 'L', 'A', 'C', 'K', 'B'] contains all elements of {'A', 'B', 'C'}
['J', 'I', 'A', 'B', 'C'] contains all elements of {'A', 'B', 'C'}
For each letter in the first list see if they are in the second and third list and return the boolean.
Then see if the set of the new list equals True
test_list = [['A', 'B', 'C'], ['I', 'L', 'A', 'C', 'K', 'B'], ['J', 'I', 'A', 'B', 'C']]
bool = ([x in test_list[1]+test_list[2] for x in test_list[0]])
if list(set(bool))[0] == True:
print('All elements are similar')
>>> All elements are similar
I have a list of 3 letter strings that match the keys of a dictionary in my program. I want to translate each 3 letter string using the dictionary, so I wrote a for loop. Instead of translating each element in the list, it translates each single character in each 3 letter element, which accomplishes nothing because there is no pair in the dictionary. I feel like I'm forgetting something basic about for loops and lists, but I can't figure out how to get this to iterate and translate how I want. Code Below:
rnaCodonTable = {
# RNA codon table
# U
'UUU': 'F', 'UCU': 'S', 'UAU': 'Y', 'UGU': 'C', # UxU
'UUC': 'F', 'UCC': 'S', 'UAC': 'Y', 'UGC': 'C', # UxC
'UUA': 'L', 'UCA': 'S', 'UAA': 'STOP', 'UGA': 'STOP', # UxA
'UUG': 'L', 'UCG': 'S', 'UAG': 'STOP', 'UGG': 'W', # UxG
# C
'CUU': 'L', 'CCU': 'P', 'CAU': 'H', 'CGU': 'R', # CxU
'CUC': 'L', 'CCC': 'P', 'CAC': 'H', 'CGC': 'R', # CxC
'CUA': 'L', 'CCA': 'P', 'CAA': 'Q', 'CGA': 'R', # CxA
'CUG': 'L', 'CCG': 'P', 'CAG': 'Q', 'CGG': 'R', # CxG
# A
'AUU': 'I', 'ACU': 'T', 'AAU': 'N', 'AGU': 'S', # AxU
'AUC': 'I', 'ACC': 'T', 'AAC': 'N', 'AGC': 'S', # AxC
'AUA': 'I', 'ACA': 'T', 'AAA': 'K', 'AGA': 'R', # AxA
'AUG': 'M', 'ACG': 'T', 'AAG': 'K', 'AGG': 'R', # AxG
# G
'GUU': 'V', 'GCU': 'A', 'GAU': 'D', 'GGU': 'G', # GxU
'GUC': 'V', 'GCC': 'A', 'GAC': 'D', 'GGC': 'G', # GxC
'GUA': 'V', 'GCA': 'A', 'GAA': 'E', 'GGA': 'G', # GxA
'GUG': 'V', 'GCG': 'A', 'GAG': 'E', 'GGG': 'G' # GxG
}
self.aaList = []
self.codonList = ['AUG', 'AGG', 'CUG', 'AAG', 'AUA', 'AGG', 'ACA', 'GAC', 'GGC', 'GCC', 'GCC', 'CAG', 'CAA', 'CAG', 'CAG', 'GCG', 'GAC', 'UGG', 'CGG', 'GAC', 'UGC', 'UUC', 'AUC', 'CGC', 'GCC', 'GUC', 'GUC', 'GAG', 'AUG', 'CCG', 'GCG', 'GAC', 'UGG', 'GGC', 'AUG', 'GCG', 'AUA', 'AUC', 'AAG', 'GCC', 'AUG', 'CCC', 'CAG', 'GAG', 'AUG', 'GUA', 'AAC', 'GAG', 'CUG', 'UUA', 'CAA', 'AGC', 'CGA', 'AAC', 'GAC', 'CCC', 'UAC', 'UAC', 'AAA', 'UUC', 'GCG', 'CUU', 'CUG', 'CUA', 'CUG', 'CAG', 'AGG', 'GCA', 'CAG', 'AAA', 'UAA']
for codon in self.codonList[:]:
self.aaList += codon.translate(self.rnaCodonTable)
print (self.aaList)
I was expecting an output of ['M', 'R', 'L',...] translating each 3 letter string according to the dictionary, but instead I get:
['A', 'U', 'G', 'A', 'G', 'G', 'C', 'U', 'G', 'A', 'A', 'G', 'A', 'U', 'A', 'A', 'G', 'G', 'A', 'C', 'A', 'G', 'A', 'C', 'G', 'G', 'C', 'G', 'C', 'C', 'G', 'C', 'C', 'C', 'A', 'G', 'C', 'A', 'A', 'C', 'A', 'G', 'C', 'A', 'G', 'G', 'C', 'G', 'G', 'A', 'C', 'U', 'G', 'G', 'C', 'G', 'G', 'G', 'A', 'C', 'U', 'G', 'C', 'U', 'U', 'C', 'A', 'U', 'C', 'C', 'G', 'C', 'G', 'C', 'C', 'G', 'U', 'C', 'G', 'U', 'C', 'G', 'A', 'G', 'A', 'U', 'G', 'C', 'C', 'G', 'G', 'C', 'G', 'G', 'A', 'C', 'U', 'G', 'G', 'G', 'G', 'C', 'A', 'U', 'G', 'G', 'C', 'G', 'A', 'U', 'A', 'A', 'U', 'C', 'A', 'A', 'G', 'G', 'C', 'C', 'A', 'U', 'G', 'C', 'C', 'C', 'C', 'A', 'G', 'G', 'A', 'G', 'A', 'U', 'G', 'G', 'U', 'A', 'A', 'A', 'C', 'G', 'A', 'G', 'C', 'U', 'G', 'U', 'U', 'A', 'C', 'A', 'A', 'A', 'G', 'C', 'C', 'G', 'A', 'A', 'A', 'C', 'G', 'A', 'C', 'C', 'C', 'C', 'U', 'A', 'C', 'U', 'A', 'C', 'A', 'A', 'A', 'U', 'U', 'C', 'G', 'C', 'G', 'C', 'U', 'U', 'C', 'U', 'G', 'C', 'U', 'A', 'C', 'U', 'G', 'C', 'A', 'G', 'A', 'G', 'G', 'G', 'C', 'A', 'C', 'A', 'G', 'A', 'A', 'A', 'U', 'A', 'A']
How do I make the translation affect whole strings within a list, rather than each character of the strings?
I removed all the selfs for simplicity, you were close, your iteration was correct, but you need to iterate over each object in the list.
And then using that object to find the corresponding value in the dictionary you defined. Fixed your indentation with print which was incorrect.
Also you do not need to use translate, and I used append to add the values to the resulting list.
Hope that helps :)
rnaCodonTable = {
# RNA codon table
# U
'UUU': 'F', 'UCU': 'S', 'UAU': 'Y', 'UGU': 'C', # UxU
'UUC': 'F', 'UCC': 'S', 'UAC': 'Y', 'UGC': 'C', # UxC
'UUA': 'L', 'UCA': 'S', 'UAA': 'STOP', 'UGA': 'STOP', # UxA
'UUG': 'L', 'UCG': 'S', 'UAG': 'STOP', 'UGG': 'W', # UxG
# C
'CUU': 'L', 'CCU': 'P', 'CAU': 'H', 'CGU': 'R', # CxU
'CUC': 'L', 'CCC': 'P', 'CAC': 'H', 'CGC': 'R', # CxC
'CUA': 'L', 'CCA': 'P', 'CAA': 'Q', 'CGA': 'R', # CxA
'CUG': 'L', 'CCG': 'P', 'CAG': 'Q', 'CGG': 'R', # CxG
# A
'AUU': 'I', 'ACU': 'T', 'AAU': 'N', 'AGU': 'S', # AxU
'AUC': 'I', 'ACC': 'T', 'AAC': 'N', 'AGC': 'S', # AxC
'AUA': 'I', 'ACA': 'T', 'AAA': 'K', 'AGA': 'R', # AxA
'AUG': 'M', 'ACG': 'T', 'AAG': 'K', 'AGG': 'R', # AxG
# G
'GUU': 'V', 'GCU': 'A', 'GAU': 'D', 'GGU': 'G', # GxU
'GUC': 'V', 'GCC': 'A', 'GAC': 'D', 'GGC': 'G', # GxC
'GUA': 'V', 'GCA': 'A', 'GAA': 'E', 'GGA': 'G', # GxA
'GUG': 'V', 'GCG': 'A', 'GAG': 'E', 'GGG': 'G' # GxG
}
aaList = []
codonList = ['AUG', 'AGG', 'CUG', 'AAG', 'AUA', 'AGG', 'ACA', 'GAC', 'GGC', 'GCC', 'GCC', 'CAG', 'CAA', 'CAG', 'CAG', 'GCG', 'GAC', 'UGG', 'CGG', 'GAC', 'UGC', 'UUC', 'AUC', 'CGC', 'GCC', 'GUC', 'GUC', 'GAG', 'AUG', 'CCG', 'GCG', 'GAC', 'UGG', 'GGC', 'AUG', 'GCG', 'AUA', 'AUC', 'AAG', 'GCC', 'AUG', 'CCC', 'CAG', 'GAG', 'AUG', 'GUA', 'AAC', 'GAG', 'CUG', 'UUA', 'CAA', 'AGC', 'CGA', 'AAC', 'GAC', 'CCC', 'UAC', 'UAC', 'AAA', 'UUC', 'GCG', 'CUU', 'CUG', 'CUA', 'CUG', 'CAG', 'AGG', 'GCA', 'CAG', 'AAA', 'UAA']
for codon in codonList:
aaList.append(rnaCodonTable[codon])
print (aaList)
From docs on string.translate:
Delete all characters from s that are in deletechars (if present), and then translate the characters using table, which must be a 256-character string giving the translation for each character value, indexed by its ordinal. If table is None, then only the character deletion step is performed.
...which is something completely different from what you're doing. This is what you should be doing:
aaList = ""
for codon in self.codonList:
self.aaList += self.rnaCodonTable[codon]
I'm pretty new to Python, I have written a web scraper that gives me output from 8 different tables into 8 pandas data frames. I am renaming the column names from each dataframe and extracting only 2 of those.
df1.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
df2.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
df3.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
df4.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
df5.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
df6.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
df7.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
df8.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
df_delvol1 = df1[["E", "F"]
df_delvol2 = df2[["E", "F"]
df_delvol3 = df2[["E", "F"]
etc
writer = pd.ExcelWriter('options_{}.xlsx'.format(pd.datetime.today().strftime('%d %b %y')), engine = 'xlsxwriter')
df_delvol1.to_excel(writer,'Sheet1')
df_delvol2.to_excel(writer,'Sheet2')
etc
It works but I was wondering if there was a more efficient way to do this?
If you place all your dataframes in a list you can then iterate through them and apply the same operation.
It would look something like this, in the first line I am just creating some random dataframes.
dfs = [pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 8))) for _ in range(8)]
for df in dfs:
df.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
df_delvols = [df[["E", "F"]] for df in dfs]
writer = pd.ExcelWriter('options_{}.xlsx'.format(pd.datetime.today().strftime('%d %b %y')), engine = 'xlsxwriter')
for n, df_delvol in enumerate(df_delvols):
df_delvol.to_excel(writer, 'Sheet{}'.format(n))
This will give you an idea for avoiding redundant code -
a = [1,2,3]
df1 = pandas.DataFrame(a)
df2 = pandas.DataFrame(a)
df3 = pandas.DataFrame(a)
for var in ['df1.columns', 'df2.columns', 'df3.columns']:
exec("%s = ['A']" % var)
>>> print(df1.columns)
Index(['A'], dtype='object')
I have explained it for only one column - 'A' but you get the point.
Assuming I have something like the following:
['a', 'b', 'c', 'd', 'e', 'f', 'g']
And I have to change it to:
['a', 'b', 'f', 'c', 'd', 'e', 'g']
What is the most efficient way to do this?
UPDATE: I actually need the elements shifted, not swapped. Note the change to my example above.
I don't know if by "efficient" you mean "in a clear/readable way", or if you're referring to performance. If it's the former and you want to do the replacement in-place, you can use the handy [] operator of lists:
def arr = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
arr[2, 5] = arr[5, 2]
assert arr == ['a', 'b', 'f', 'd', 'e', 'c', 'g']
Update: The question is not about swapping two elements, it's about moving an element to another position. To do that in-place, you can use some of the Java ArrayList methods that let you add and remove elements from a given position. I think this is quite readable:
def arr = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
arr.add(2, arr.remove(5))
assert arr == ['a', 'b', 'f', 'c', 'd', 'e', 'g']