I have a list list1=['a','b','c','a','c','d','a','b',10,20]. The list may contain more elements 'a', 'b', 'c', 'd' and 'e' at random index positions. I want to replace 'a' with 10, 'b' with 0, 'c' with 20, 'd' with 100 and 'e' with -10 in the list, so for list1 the output list should be: [10,0,20,10,20,100,10,0,10,20]
Note: I don't want to replace the numeric elements.
What you want to do is a basic mapping of one value to another. This is usually done with a dictionary that defines the mapping; you then iterate over all the values you want to map and apply the mapping to each one.
Here are two approaches that will get you the expected result. One uses a list comprehension; the alternative uses the built-in map() function.
list1 = ['a', 'b', 'c', 'a', 'c', 'd', 'a', 'b']
mapping = {
"a": 10,
"b": 0,
"c": 20,
"d": 100,
"e": -10
}
# option 1 using a list comprehension
result = [mapping[item] for item in list1]
print(result)
# another option using the built-in map()
alternative = list(map(lambda item: mapping[item], list1))
print(alternative)
Expected output:
[10, 0, 20, 10, 20, 100, 10, 0]
[10, 0, 20, 10, 20, 100, 10, 0]
Edit
As requested in the comments, here is a version that only maps the values for which a mapping is defined. If no mapping is defined, the original value is returned. Again, I've implemented both variants.
# I have added some values which do not have a mapping defined
list1 = ['a', 'b', 'c', 'a', 'c', 'd', 'a', 'b', 'z', 4, 12, 213]
mapping = {
"a": 10,
"b": 0,
"c": 20,
"d": 100,
"e": -10
}
def map_value(value):
    """
    Maps value to its defined mapped value, or returns the original value if no mapping is defined
    :param value: value to be mapped
    :return: the mapped value, or value unchanged if it has no mapping
    """
    if value in mapping:
        return mapping[value]
    return value
# option 1 using a list comprehension
result = [map_value(item) for item in list1]
print(result)
# another option using the built-in map()
alternative = list(map(map_value, list1))
print(alternative)
Expected output:
[10, 0, 20, 10, 20, 100, 10, 0, 'z', 4, 12, 213]
[10, 0, 20, 10, 20, 100, 10, 0, 'z', 4, 12, 213]
As you can see, 'z', 4, 12 and 213 are unchanged because no mapping is defined for them.
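The map_value helper can be shortened with dict.get, whose second argument is returned when the key is missing. A minimal sketch applied to the list from the question, where the numbers have no mapping and therefore pass through unchanged:

```python
list1 = ['a', 'b', 'c', 'a', 'c', 'd', 'a', 'b', 10, 20]
mapping = {"a": 10, "b": 0, "c": 20, "d": 100, "e": -10}

# dict.get(key, default) returns default (here the item itself) when key is not in mapping
result = [mapping.get(item, item) for item in list1]
print(result)  # [10, 0, 20, 10, 20, 100, 10, 0, 10, 20]
```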
I have the following set of rules for a grading system:
if 25 < score <= 30, grade = A.
if 20 < score <= 25, grade = B.
if 15 < score <= 20, grade = C.
if 10 < score <= 15, grade = D.
if 5 < score <= 10, grade = E.
if 0 <= score <= 5, grade = F.
So I have to write a function that takes the score as a parameter and returns the letter grade. I could do this with selection statements (if/else), but I want to do it differently.
For instance, I want to declare a dictionary like this:
gradeDict = {
'A': [26, 27, 28, 29, 30],
'B': [21, 22, 23, 24, 25],
'C': [16, 17, 18, 19, 20],
'D': [11, 12, 13, 14, 15],
'E': [6, 7, 8, 9, 10],
'F': [0, 1, 2, 3, 4, 5]
}
Then, when checking the score against the values, I want to return the key.
In Python I've learned about dict.get(term, 'otherwise'), but that gives you the values. Is there another mechanism that does the opposite, i.e. if we pass a value to the get method, it returns the key?
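A dict has no built-in reverse lookup, but with integer scores you can invert gradeDict once and then use an ordinary get on the result. A minimal sketch; the name score_to_grade is just illustrative:

```python
gradeDict = {
    'A': [26, 27, 28, 29, 30],
    'B': [21, 22, 23, 24, 25],
    'C': [16, 17, 18, 19, 20],
    'D': [11, 12, 13, 14, 15],
    'E': [6, 7, 8, 9, 10],
    'F': [0, 1, 2, 3, 4, 5],
}

# Invert once: every score in a value list becomes a key pointing back to its grade
score_to_grade = {score: grade for grade, scores in gradeDict.items() for score in scores}

print(score_to_grade.get(22, 'otherwise'))  # B
print(score_to_grade.get(31, 'otherwise'))  # otherwise
```

This only works for the exact values listed, so a non-integer score such as 22.5 falls through to the default.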
The bisect standard library module offers an elegant solution to problems like this one. In fact, grading is one of the examples shown in the docs. Here is an adaptation of that example, modeled on the OP's grading curve:
Example:
from bisect import bisect_left
def grade(score, breakpoints=[5, 10, 15, 20, 25], grades='FEDCBA'):
    i = bisect_left(breakpoints, score)
    return grades[i]
[grade(score) for score in [1, 5, 8, 10, 11, 15, 17, 20, 22, 25, 26]]
Output:
['F', 'F', 'E', 'E', 'D', 'D', 'C', 'C', 'B', 'B', 'A']
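The choice of bisect_left matters because the upper bound of each band is inclusive (20 < score <= 25 is a B). For a score equal to a breakpoint, bisect_left returns the position of that breakpoint, while bisect_right returns the position after it, which would bump the score up a grade. A small comparison:

```python
from bisect import bisect_left, bisect_right

breakpoints = [5, 10, 15, 20, 25]
grades = 'FEDCBA'

# 25 sits exactly on a breakpoint; the rule 20 < score <= 25 makes it a B
print(grades[bisect_left(breakpoints, 25)])   # B  (insertion point before the equal entry: 4)
print(grades[bisect_right(breakpoints, 25)])  # A  (insertion point after the equal entry: 5)
```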
The funny thing is that you don't even need a dictionary for this, just an array. Of course, you can do it in dictionary style by declaring the following dict:
gradeDict = {
1:'F',
2:'E',
3:'D',
4:'C',
5:'B',
6:'A'
}
This dict is redundant, since its keys are just the consecutive indexes 1, 2, 3, ...
You can turn it into a list: grades_arr = ['F', 'E', 'D', 'C', 'B', 'A']
But how do I get the letter I need, you may ask? Simple: divide the score by 5 using integer division. 21 // 5 is 4, and grades_arr[21 // 5] is 'B'.
There are two special cases:
When the score is divisible by 5, you have to subtract 1, because, for example, 25 // 5 is 5 but grades_arr[5] is 'A', not 'B'.
When the score is 0, do not subtract.
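The division rule and its two special cases can be folded into a small function. This is a sketch that assumes integer scores from 0 to 30:

```python
grades_arr = ['F', 'E', 'D', 'C', 'B', 'A']

def grade(score):
    # Assumes an integer score in 0..30
    i = score // 5
    # Multiples of 5 are the inclusive upper edge of a band, so step back one,
    # except for 0, which already maps to index 0 ('F')
    if score % 5 == 0 and score != 0:
        i -= 1
    return grades_arr[i]

print([grade(s) for s in [0, 5, 6, 21, 25, 26, 30]])
# ['F', 'F', 'E', 'B', 'B', 'A', 'A']
```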
1) The question I'm working on:
Write an algorithm for a function called removeAll which takes 3 parameters: an array of array type, a count of elements in the array, and a value. As with the remove method we discussed in class, elements past the count of elements are stored as None. This function should remove all occurrences of value and then shift the remaining data down. The last populated element in the array should then be set to None. The function then returns the count of “valid” (i.e. non-removed) data elements left. This function should do the removal “by hand” and SHOULD NOT use the remove method.
2) Below I have what I think works for the question, but it seems inefficient and repetitive. Is there any way to simplify it?
def myremove(mylist, elements, val):
    for i in range(elements):  # Changes every val that needs to be removed to None
        if mylist[i] == val:
            mylist[i] = None
    for i in range(elements):
        if mylist[i] is None:  # Moves the remaining elements left
            for j in range(i, elements - 1):
                mylist[j] = mylist[j + 1]
            mylist[elements - 1] = None
            while mylist[0] is None:  # If the first element is None, move left until it is not
                for j in range(i, elements - 1):
                    mylist[j] = mylist[j + 1]
                mylist[elements - 1] = None
    for i in range(elements):  # Counts the remaining elements
        if mylist[i] is None:
            elements -= 1
    return mylist, elements
# Testing the function
print(myremove([8, 'N', 24, 16, 1, 'N'], 6, 'N'))
print(myremove([1, 'V', 3, 4, 2, 'V'], 6, 3))
print(myremove([0, 'D', 5, 6, 9, 'D'], 6, 'N'))
print(myremove(['X', 'X', 7, 'X', 'C', 'X'], 6, 'X'))
Output:
([8, 24, 16, 1, None, None], 4)
([1, 'V', 4, 2, 'V', None], 5)
([0, 'D', 5, 6, 9, 'D'], 6)
([7, 'C', None, None, None, None], 2)
You can just sort the list based on whether or not each element equals the value to remove. Python's sort is stable, so the other elements keep their original relative order.
l = [8, 'N', 24, 16, 1, 'N']
sorted(l, key=lambda x: x == 'N')
output:
[8, 24, 16, 1, 'N', 'N']
If you need None instead of the removed value in the output, use a list comprehension to replace it first, then sort on whether each element is None.
l = [i if i != 'N' else None for i in [8, 'N', 24, 16, 1, 'N']]
sorted(l, key=lambda x: x is None)
output:
[8, 24, 16, 1, None, None]
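Then all that's left is the count. Subtract the number of removed elements (occurrences of val within the first elements slots) from your input parameter. Counting every None in the result would also count slots that were already None past the count:
def myremove(mylist, elements, val):
    # Replace val with None, then push the Nones to the end (the sort is stable)
    ret_list = sorted([None if x == val else x for x in mylist], key=lambda x: x is None)
    # Count only removals inside the valid region, so existing trailing Nones are not subtracted
    return ret_list, elements - mylist[:elements].count(val)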
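If the assignment requires doing the removal by hand and in place, as the question's spec says, a single pass with a read index and a write index avoids the repeated shifting. A minimal sketch following the spec's signature (removeAll(array, count, value)) and returning only the count:

```python
def removeAll(arr, count, value):
    write = 0
    # Copy every element that is not `value` down to the next free slot
    for read in range(count):
        if arr[read] != value:
            arr[write] = arr[read]
            write += 1
    # Slots freed by the removals become None, matching the "past the count" convention
    for i in range(write, count):
        arr[i] = None
    return write

data = ['X', 'X', 7, 'X', 'C', 'X']
print(removeAll(data, 6, 'X'), data)  # 2 [7, 'C', None, None, None, None]
```

Each element is moved at most once, so this is a single O(n) pass. Adjacent occurrences of the value, or a list made entirely of it, need no special handling.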
I have a double list of this type: dl = [[13, 22, 41], ['c', 'b', 'a']], in which each element dl[0][i] belongs to the value dl[1][i] (at the same index). How can I sort my list using the dl[0] values as the ordering criterion while keeping both sublists linked? The sublists are a kind of "linked data", so each dl[0][i] and dl[1][i] pair must still share an index after sorting the whole parent list by the values of the first sublist.
I expect something like:
input: dl = [ [14,22,7,17], ['K', 'M', 'F','A'] ]
output: dl = [ [7, 14, 17, 22], ['F', 'K', 'A', 'M'] ]
This was way too much fun to write. I don't doubt that this function can be greatly improved, but this is what I've gotten in a very short amount of time and should get you started.
I've included some tests just so you can verify that this does indeed do what you want.
from unittest import TestCase, main
def sort_by_first(data):
    sorted_data = []
    for seq in data:
        zipped_to_first = zip(data[0], seq)
        sorted_by_first = sorted(zipped_to_first)
        unzipped_data = zip(*sorted_by_first)
        sorted_data.append(list(tuple(unzipped_data)[1]))
    return sorted_data
class SortByFirstTestCase(TestCase):
    def test_sort(self):
        output_1 = sort_by_first([[1, 3, 5, 2, 4], ['a', 'b', 'c', 'd', 'e']])
        self.assertEqual(output_1, [[1, 2, 3, 4, 5], ['a', 'd', 'b', 'e', 'c']])
        output_2 = sort_by_first([[9, 1, 5], [21, 22, 23], ['spam', 'foo', 'bar']])
        self.assertEqual(output_2, [[1, 5, 9], [22, 23, 21], ['foo', 'bar', 'spam']])
if __name__ == '__main__':
    main()
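Updated for what you're looking for: a selection sort, with an extra line that swaps the matching elements of the second list so it stays aligned with the first.
dl = [[14, 22, 7, 17], ['K', 'M', 'F', 'A']]
for i in range(len(dl[0])):
    min_idx = i
    # Find the smallest remaining value in the first sublist
    for j in range(i + 1, len(dl[0])):
        if dl[0][min_idx] > dl[0][j]:
            min_idx = j
    # Swap in both sublists so the pairs stay linked
    dl[0][i], dl[0][min_idx] = dl[0][min_idx], dl[0][i]
    dl[1][i], dl[1][min_idx] = dl[1][min_idx], dl[1][i]
print(dl)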
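You can also try solving this with a for loop (note that this sorts each sublist independently, so the pairs only stay linked when both sublists are already in the same relative order, as in this example):
dl = [[3, 2, 1], ['c', 'b', 'a']]
for i in range(0, len(dl)):
    dl[i].sort()
print(dl)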
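Sorting each sublist on its own does not keep the pairs together in general. A compact way to keep them linked is to zip the sublists into pairs, sort the pairs on the first element, and unzip them again. Here is a sketch using the input from the question:

```python
dl = [[14, 22, 7, 17], ['K', 'M', 'F', 'A']]

# Pair up the sublists column-wise, sort the pairs by the first value, then split them again
pairs = sorted(zip(*dl), key=lambda pair: pair[0])
dl = [list(column) for column in zip(*pairs)]
print(dl)  # [[7, 14, 17, 22], ['F', 'K', 'A', 'M']]
```

Sorting with key=lambda pair: pair[0] orders strictly by the first sublist, and because the sort is stable, ties keep their original order instead of being broken by the second sublist.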
I have a dictionary with two keys and their values are lists of strings.
I want to calculate the string lengths in one list based on an indicator in another list.
It's difficult to frame the question in words, so let's look at an example.
Here is an example dictionary:
thisdict ={
'brand': ['Ford','bmw','toyota','benz','audi','subaru','ferrari','volvo','saab'],
'type': ['O','B','O','B','I','I','O','B','B']
}
Now I want to add an item to the dictionary that holds the cumulative string length of the "brand" sequence, based on the "type" sequence.
Here are the criteria:
If type = 'O', set the string length to 0 for that index.
If type = 'B', set the string length to the length of the corresponding string.
If type = 'I', things get complicated: you want to look back through the sequence and sum up the string lengths back to the nearest preceding 'B'.
Here is an example output:
thisdict ={
"brand": ['Ford','bmw','toyota','benz','audi','subaru','ferrari','volvo','saab'],
'type': ['O','B','O','B','I','I','O','B','B'],
'cumulative-length':[0,3,0,4,8,14,0,5,4]
}
where 8=len(benz)+len(audi) and 14=len(benz)+len(audi)+len(subaru)
Note that in the real data I'm working on, a 'B' can be followed by an arbitrary number of 'I's, i.e. ['B','I','I','I','I','I','I',...,'O'], so I'm looking for a solution that is robust in that situation.
Thanks
You can use the zip function to tie each brand to its type, then keep a running total as you loop through the values. This solution supports a series of any length and brand strings of any length. I am assuming that len(thisdict['brand']) == len(thisdict['type']).
thisdict = {
'brand': ['Ford','bmw','toyota','benz','audi','subaru','ferrari','volvo','saab'],
'type': ['O','B','O','B','I','I','O','B','B']
}
lengths = []
running_total = 0
for b, t in zip(thisdict['brand'], thisdict['type']):
    if t == 'O':
        lengths.append(0)
    elif t == 'B':
        running_total = len(b)
        lengths.append(running_total)
    elif t == 'I':
        running_total += len(b)
        lengths.append(running_total)
print(lengths)
# [0, 3, 0, 4, 8, 14, 0, 5, 4]
Generating random data
import random
import string
def get_random_brand_and_type():
    n = random.randint(1,8)
    b = ''.join(random.choice(string.ascii_uppercase) for _ in range(n))
    t = random.choice(['B', 'I', 'O'])
    return b, t
thisdict = {
    'brand': [],
    'type': []
}
for i in range(random.randint(1,20)):
    b, t = get_random_brand_and_type()
    thisdict['brand'].append(b)
    thisdict['type'].append(t)
Running the loop above on one such randomly generated dict gave the following result (your values will differ):
{'type': ['B', 'B', 'O', 'I', 'B', 'O', 'O', 'I', 'O'],
'brand': ['O', 'BSYMLFN', 'OF', 'SO', 'KPQGRW', 'DLCWW', 'VLU', 'ZQE', 'GEUHERHE']}
[1, 7, 0, 9, 6, 0, 0, 9, 0]
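The question asks for the result to be stored back in the dictionary under 'cumulative-length'. A small wrapper around the same loop does that. The function name add_cumulative_length is illustrative, and the sketch assumes every type is 'B', 'I' or 'O':

```python
def add_cumulative_length(d):
    lengths = []
    running_total = 0
    for b, t in zip(d['brand'], d['type']):
        if t == 'B':
            running_total = len(b)   # a 'B' starts a new run
        elif t == 'I':
            running_total += len(b)  # an 'I' extends the current run
        # 'O' contributes 0 and, as in the loop above, does not reset the total
        lengths.append(0 if t == 'O' else running_total)
    d['cumulative-length'] = lengths
    return d

thisdict = {
    'brand': ['Ford', 'bmw', 'toyota', 'benz', 'audi', 'subaru', 'ferrari', 'volvo', 'saab'],
    'type': ['O', 'B', 'O', 'B', 'I', 'I', 'O', 'B', 'B'],
}
add_cumulative_length(thisdict)
print(thisdict['cumulative-length'])  # [0, 3, 0, 4, 8, 14, 0, 5, 4]
```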
Let's say I have multiple lists of lists; I'll include a shortened version of three of them in this example.
list1=[['name', '1A5ZA'], ['length', 83], ['A', 28], ['V', 31], ['I', 24]]
list2=[['name', '1AJ8A'], ['length', 49], ['A', 18], ['V', 11], ['I', 20]]
list3=[['name', '1AORA'], ['length', 96], ['A', 32], ['V', 49], ['I', 15]]
All of the lists are in the same format: they have the same number of nested lists, with the same labels.
I generate each of these lists with the following function
def GetResCount(sequence):
    residues=[['A',0],['V',0],['I',0],['L',0],['M',0],['F',0],['Y',0],['W',0],
              ['S',0],['T',0],['N',0],['Q',0],['C',0],['U',0],['G',0],['P',0],['R',0],
              ['H',0],['K',0],['D',0],['E',0]]
    name=sequence[0:5]
    AAseq=sequence[27:]
    for AA in AAseq:
        for n in range(len(residues)):
            if residues[n][0] == AA:
                residues[n][1]=residues[n][1]+1
    length=len(AAseq)
    nameList=(['name', name])
    lengthList=(['length', length])
    residues.insert(0,lengthList)
    residues.insert(0,nameList)
    return residues
the script takes a sequence such as this
1A5ZA:A|PDBID|CHAIN|SQUENCEMKIGIVGLGRVGSSTAFAL
and will create a list similar to the ones mentioned above.
As each individual list is generated, I would like to append it to a final form, such that all of them combined together looks like this:
final=[['name', '1A5ZA', '1AJ8A', '1AORA'], ['length', 83, 49, 96], ['A', 28, 18, 32], ['V', 31, 11, 49], ['I', 24, 20, 15]]
Maybe the final form of the data isn't the right format. I am open to suggestions on how to format it better.
To summarize, the script should take a sequence of letters with the sequence name at the beginning, count the occurrences of each letter within the sequence as well as the overall sequence length, and output the name, length and letter frequencies to a list. Then it should combine the info from each sequence into a larger list (maybe a dictionary?).
At the very end, all of this info will go into a spreadsheet that will look like this:
name length A V I
1A5ZA 83 28 31 24
1AJ8A 49 18 11 20
1AORA 96 32 49 15
I'm including this last bit because maybe I'm not starting in the right way to end up with what I want.
Anyway,
I hope you made it here and thanks for the help!
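For reference, the final layout from the question can be built from the per-sequence lists by zipping together the rows that share a label. This sketch assumes, as stated, that every list has its labels in the same order:

```python
list1 = [['name', '1A5ZA'], ['length', 83], ['A', 28], ['V', 31], ['I', 24]]
list2 = [['name', '1AJ8A'], ['length', 49], ['A', 18], ['V', 11], ['I', 20]]
list3 = [['name', '1AORA'], ['length', 96], ['A', 32], ['V', 49], ['I', 15]]

# zip lines up the rows that share a label; keep the label once, then collect each list's value
final = [[rows[0][0]] + [row[1] for row in rows] for rows in zip(list1, list2, list3)]
print(final)
# [['name', '1A5ZA', '1AJ8A', '1AORA'], ['length', 83, 49, 96], ['A', 28, 18, 32], ['V', 31, 11, 49], ['I', 24, 20, 15]]
```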
If you are looking for a table, a dict per sequence might be a better approach (note: collections.Counter does the same counting as your nested loop), e.g.:
from collections import Counter
def GetResCount(sequence):
    name, AAseq = sequence[0:5], sequence[27:]
    residuals = {'name': name, 'length': len(AAseq), 'A': 0, 'V': 0, 'I': 0, 'L': 0,
                 'M': 0, 'F': 0, 'Y': 0, 'W': 0, 'S': 0, 'T': 0, 'N': 0, 'Q': 0, 'C': 0,
                 'U': 0, 'G': 0, 'P': 0, 'R': 0, 'H': 0, 'K': 0, 'D': 0, 'E': 0}
    residuals.update(Counter(AAseq))
    return residuals
In []:
GetResCount('1A5ZA:A|PDBID|CHAIN|SQUENCEMKIGIVGLGRVGSSTAFAL')
Out[]:
{'name': '1A5ZA', 'length': 19, 'A': 2, 'V': 2, 'I': 2, 'L': 2, 'M': 1, 'F': 1, 'Y': 0,
'W': 0, 'S': 2, 'T': 1, 'N': 0, 'Q': 0, 'C': 0, 'U': 0, 'G': 4, 'P': 0, 'R': 1,
'H': 0, 'K': 1, 'D': 0, 'E': 0}
Note: dicts only keep this insertion order in Python 3.6+ (guaranteed by the language from 3.7), but we can fix the column order later when creating the table if necessary.
Then you can create a list of the dicts, e.g. (assuming you are reading these lines from a file):
with open(<file>) as file:
    data = [GetResCount(line.strip()) for line in file]
Then you can load it directly into pandas, e.g.:
In []:
import pandas as pd
columns = ['name', 'length', 'A', 'V', 'I', ...] # columns = list(data[0].keys()) - Py3.6+
df = pd.DataFrame(data, columns=columns)
print(df)
Out[]:
name length A V I ...
0 1A5ZA 83 28 31 24 ...
1 1AJ8A 49 18 11 20 ...
2 1AORA 96 32 49 15 ...
...
You could also just dump it out to a file with csv.DictWriter (writeheader() writes the column-name row):
from csv import DictWriter
fieldnames = ['name', 'length', 'A', 'V', 'I', ...]
with open(<output>, 'w', newline='') as file:
    writer = DictWriter(file, fieldnames)
    writer.writeheader()
    writer.writerows(data)
Which would output something like:
name,length,A,V,I,...
1A5ZA,83,28,31,24,...
1AJ8A,49,18,11,20,...
1AORA,96,32,49,15,...
...
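To try the CSV output without touching the filesystem, the writer can target an io.StringIO instead of a file. A small self-contained sketch using two rows from the question, trimmed to five columns:

```python
import csv
import io

data = [
    {'name': '1A5ZA', 'length': 83, 'A': 28, 'V': 31, 'I': 24},
    {'name': '1AJ8A', 'length': 49, 'A': 18, 'V': 11, 'I': 20},
]
fieldnames = ['name', 'length', 'A', 'V', 'I']

out = io.StringIO()  # stands in for a real file opened with newline=''
writer = csv.DictWriter(out, fieldnames=fieldnames)
writer.writeheader()  # writes the "name,length,A,V,I" row
writer.writerows(data)
print(out.getvalue())
```

By default DictWriter raises ValueError if a row dict has keys missing from fieldnames. So if the dicts come straight from GetResCount with all residues, either list every key in fieldnames or pass extrasaction='ignore'.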