How to aggregate string length sequence based on an indicator sequence - python-3.x

I have a dictionary with two keys whose values are lists of strings.
I want to calculate the string lengths of one list based on an indicator in the other list.
It's difficult to frame the question in words, so let's look at an example.
Here is an example dictionary:
thisdict ={
'brand': ['Ford','bmw','toyota','benz','audi','subaru','ferrari','volvo','saab'],
'type': ['O','B','O','B','I','I','O','B','B']
}
Now, I want to add an item to the dictionary that holds the cumulative string length of the "brand" sequence, based on the "type" sequence.
Here are the criteria:
If type = 'O', set string length = 0 for that index.
If type = 'B', set string length to the corresponding string length.
If type = 'I', things get complicated: look back through the sequence and sum the string lengths back to, and including, the most recent 'B'.
Here is an example output:
thisdict ={
"brand": ['Ford','bmw','toyota','benz','audi','subaru','ferrari','volvo','saab'],
'type': ['O','B','O','B','I','I','O','B','B'],
'cumulative-length':[0,3,0,4,8,14,0,5,4]
}
where 8 = len('benz') + len('audi') and 14 = len('benz') + len('audi') + len('subaru')
Note that in the real data I'm working on, a sequence can be one "B" followed by an arbitrary number of "I"s, i.e. ['B','I','I','I','I','I','I',...,'O'], so I'm looking for a solution that is robust in that situation.
Thanks

You can use the zip function to tie the brand and type lists together, then keep a running total as you loop through the paired values. This solution supports a series of any length and strings of any length in the brand list. I am assuming that len(thisdict['brand']) == len(thisdict['type']).
thisdict = {
    'brand': ['Ford','bmw','toyota','benz','audi','subaru','ferrari','volvo','saab'],
    'type': ['O','B','O','B','I','I','O','B','B']
}
lengths = []
running_total = 0
for b, t in zip(thisdict['brand'], thisdict['type']):
    if t == 'O':
        lengths.append(0)
    elif t == 'B':
        running_total = len(b)
        lengths.append(running_total)
    elif t == 'I':
        running_total += len(b)
        lengths.append(running_total)
print(lengths)
# [0, 3, 0, 4, 8, 14, 0, 5, 4]
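The question asks for the result to be stored back in the dictionary under 'cumulative-length'. A minimal sketch of that step, wrapping the same loop in a function; the helper name cumulative_lengths and the explicit error checks are assumptions for illustration, not part of the original answer:

```python
def cumulative_lengths(brands, types):
    """Apply the O/B/I rules from the question and return one length per index."""
    if len(brands) != len(types):
        raise ValueError("brands and types must have the same length")
    lengths = []
    running_total = 0
    for brand, tag in zip(brands, types):
        if tag == 'O':
            # 'O' always reports 0 (the running total is left untouched)
            lengths.append(0)
        elif tag == 'B':
            # 'B' starts a new run
            running_total = len(brand)
            lengths.append(running_total)
        elif tag == 'I':
            # 'I' extends the current run
            running_total += len(brand)
            lengths.append(running_total)
        else:
            raise ValueError(f"unexpected type tag: {tag!r}")
    return lengths

thisdict = {
    'brand': ['Ford', 'bmw', 'toyota', 'benz', 'audi', 'subaru', 'ferrari', 'volvo', 'saab'],
    'type': ['O', 'B', 'O', 'B', 'I', 'I', 'O', 'B', 'B'],
}
thisdict['cumulative-length'] = cumulative_lengths(thisdict['brand'], thisdict['type'])
print(thisdict['cumulative-length'])
# [0, 3, 0, 4, 8, 14, 0, 5, 4]
```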
Generating random data
import random
import string
def get_random_brand_and_type():
    n = random.randint(1,8)
    b = ''.join(random.choice(string.ascii_uppercase) for _ in range(n))
    t = random.choice(['B', 'I', 'O'])
    return b, t
thisdict = {
    'brand': [],
    'type': []
}
for i in range(random.randint(1,20)):
    b, t = get_random_brand_and_type()
    thisdict['brand'].append(b)
    thisdict['type'].append(t)
yields the following result:
{'type': ['B', 'B', 'O', 'I', 'B', 'O', 'O', 'I', 'O'],
'brand': ['O', 'BSYMLFN', 'OF', 'SO', 'KPQGRW', 'DLCWW', 'VLU', 'ZQE', 'GEUHERHE']}
[1, 7, 0, 9, 6, 0, 0, 9, 0]
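In the random output above, the 'I' that follows an 'O' (brand 'SO') reports 9 = 7 + 2, because the running total is not cleared by 'O'. The question says real data always has 'I' directly after a 'B' run, so this may never matter. If it does, here is a hedged sketch of a variant that treats 'O' as ending the run; the function name and the reset rule are assumptions, not something the question specifies:

```python
def cumulative_lengths_reset(brands, types):
    """Variant of the running-total loop where 'O' clears the current run."""
    lengths = []
    running_total = 0
    for brand, tag in zip(brands, types):
        if tag == 'O':
            running_total = 0  # assumption: 'O' ends the current run
        elif tag == 'B':
            running_total = len(brand)
        elif tag == 'I':
            running_total += len(brand)
        else:
            raise ValueError(f"unexpected type tag: {tag!r}")
        lengths.append(running_total)
    return lengths

types = ['B', 'B', 'O', 'I', 'B', 'O', 'O', 'I', 'O']
brands = ['O', 'BSYMLFN', 'OF', 'SO', 'KPQGRW', 'DLCWW', 'VLU', 'ZQE', 'GEUHERHE']
print(cumulative_lengths_reset(brands, types))
# [1, 7, 0, 2, 6, 0, 0, 3, 0]
```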

Related

list elements replacing with specific values in python

I have a list list1=['a','b','c','a','c','d','a','b',10,20]; the list may contain more of the elements 'a', 'b', 'c', 'd' and 'e' at random index positions. I want to replace 'a' with 10, 'b' with 0, 'c' with 20, 'd' with 100 and 'e' with -10 in the list, so the output list for list1 should be: [10,0,20,10,20,100,10,0,10,20]
Note: I don't want to replace the numerical elements.
What you want to do is a basic mapping of one value to another. This is usually done with a dictionary that defines the mapping; you then iterate over all the values you want to map and apply it.
Here are two approaches which will get you the expected result. One uses a list comprehension; the alternative uses the map() built-in function.
list1 = ['a', 'b', 'c', 'a', 'c', 'd', 'a', 'b']
mapping = {
"a": 10,
"b": 0,
"c": 20,
"d": 100,
"e": -10
}
# option 1 using a list comprehension
result = [mapping[item] for item in list1]
print(result)
# another option using the built-in map()
alternative = list(map(lambda item: mapping[item], list1))
print(alternative)
Expected output:
[10, 0, 20, 10, 20, 100, 10, 0]
[10, 0, 20, 10, 20, 100, 10, 0]
Edit
As requested in the comments, here is a version which only maps the values for which a mapping is defined. If no mapping is defined, the original value is returned. Again I've implemented both variants.
# I have added some values which do not have a mapping defined
list1 = ['a', 'b', 'c', 'a', 'c', 'd', 'a', 'b', 'z', 4, 12, 213]
mapping = {
"a": 10,
"b": 0,
"c": 20,
"d": 100,
"e": -10
}
def map_value(value):
    """
    Maps value to a defined mapped value or if no mapping is defined returns the original value
    :param value: value to be mapped
    :return: the mapped value, or the original value if no mapping is defined
    """
    if value in mapping:
        return mapping[value]
    return value
# option 1 using a list comprehension
result = [map_value(item) for item in list1]
print(result)
# another option using the built-in map()
alternative = list(map(map_value, list1))
print(alternative)
Expected output
[10, 0, 20, 10, 20, 100, 10, 0, 'z', 4, 12, 213]
[10, 0, 20, 10, 20, 100, 10, 0, 'z', 4, 12, 213]
As you can see, 'z', 4, 12 and 213 are not affected because no mapping is defined for them.
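The same fallback behaviour can be sketched more compactly with dict.get, which takes a default to return when the key is missing. This is an alternative to the map_value helper, shown under the assumption that every list element is hashable (true for the strings and integers here):

```python
list1 = ['a', 'b', 'c', 'a', 'c', 'd', 'a', 'b', 'z', 4, 12, 213]
mapping = {"a": 10, "b": 0, "c": 20, "d": 100, "e": -10}

# mapping.get(item, item) returns the mapped value, or the item itself if unmapped
result = [mapping.get(item, item) for item in list1]
print(result)
# [10, 0, 20, 10, 20, 100, 10, 0, 'z', 4, 12, 213]
```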

How to count the key from the nested dictionary?

I'd like to count the largest number of keys along any path from the root ('key1') to the end, i.e. the depth of the deepest nesting. How can I compute and return that number without using any libraries?
{'key1': {'key2': {'key3': {'key4': {'key5': {'key6': 'Y',
'key7': 'N'}},
'key8': {'key9': {'key10': 'Y', 'key11': 'N'}},
'key12': {'key13': {'key14': 'N', 'key15': 'Y'}},
'key16': {'key17': {'N': 'Y'}}}},
'key18': {'key19': {'key20': 'N', 'key21': 'N', 'key22': 'N', 'key23': 'Y'}}}}
In this case, I expect the result to be 6.
Here's a recursive solution that doesn't use any libraries (though there might be a better way to do this using collections):
def deepest_nesting(data):
    max_depth = 0
    if not isinstance(data, dict):
        return max_depth
    for v in data.values():
        path_depth = deepest_nesting(v)
        max_depth = max(path_depth, max_depth)
    return 1 + max_depth
This returns 6 for your example, 1 for {'key1': 0}, 0 for a non-dict, 4 for {'one': {'two': 0, 'three': 0, 'four': 0}, 'five': {'six': {'seven': 0, 'eight': 0, 'nine': {'ten': 0}}}}, etc.
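For very deeply nested data, the recursive version could hit Python's recursion limit. A hedged sketch of an equivalent iterative version that uses an explicit stack instead; the name deepest_nesting_iterative is an assumption for illustration:

```python
def deepest_nesting_iterative(data):
    """Depth of the deepest dict nesting, computed without recursion."""
    if not isinstance(data, dict):
        return 0
    max_depth = 0
    # each stack entry is (dict to visit, depth of that dict)
    stack = [(data, 1)]
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        for value in node.values():
            if isinstance(value, dict):
                stack.append((value, depth + 1))
    return max_depth

sample = {'one': {'two': 0, 'three': 0, 'four': 0},
          'five': {'six': {'seven': 0, 'eight': 0, 'nine': {'ten': 0}}}}
print(deepest_nesting_iterative(sample))       # 4
print(deepest_nesting_iterative({'key1': 0}))  # 1
print(deepest_nesting_iterative('not a dict')) # 0
```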

Using a shorter list as values

Hello, I am trying to use a shorter list as the values in a dictionary. Can anyone please help?
string = input("Input DNA Sequence: ")
sequence = [string[e:e+3] for e in range(0, len(string), 3)]
p_residue = list("WYS")
Input: TGGTACTCTTTCTTCACA
Output: {TGG:W,TAC:Y,TCT:S,TTC:W,TTC:Y,ACA:S}
I've tried cycle but I can't seem to make it work.
You can use cycle if you zip p_residue onto your sequence.
from itertools import cycle
def split_str_every(n, seq):
    # split seq into consecutive chunks of length n
    return [seq[i:i+n] for i in range(0, len(seq), n)]
def combine(seq, residue):
    # pair each chunk with the next residue, restarting the residues when they run out
    return zip(seq, cycle(residue))
sequence = split_str_every(3, "TGGTACTCTTTCTTCACA")
p_residue = ["W", "Y", "S"]
out = combine(sequence, p_residue)
print(dict(out))
Gives:
{'TGG': 'W', 'TAC': 'Y', 'TCT': 'S', 'TTC': 'Y', 'ACA': 'S'}
As you can see, the second TTC overwrote the first, because dictionaries don't allow duplicate keys. We can use a defaultdict to get around this: import defaultdict and redefine the combine function so that each key collects a list of values:
from collections import defaultdict
def combine(seq, residue):
    zipped = zip(seq, cycle(residue))
    ddict = defaultdict(list)
    for k, v in zipped:
        # repeated keys accumulate their values instead of overwriting
        ddict[k].append(v)
    return dict(ddict)
print(combine(sequence, p_residue))
This now keeps every pairing. Notice that the key TTC stores a list containing both W and Y:
{'TGG': ['W'], 'TAC': ['Y'], 'TCT': ['S'], 'TTC': ['W', 'Y'], 'ACA': ['S']}
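If the goal is only to keep every codon/residue pairing in input order, repeated codons included, a list of tuples sidesteps the duplicate-key issue entirely. A minimal sketch, assuming that order matters more than lookup by key; pair_with_cycle is a hypothetical helper name:

```python
from itertools import cycle

def pair_with_cycle(seq, residue):
    """Pair each chunk with a residue, cycling the residues; duplicates are kept."""
    return list(zip(seq, cycle(residue)))

dna = "TGGTACTCTTTCTTCACA"
codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
pairs = pair_with_cycle(codons, ["W", "Y", "S"])
print(pairs)
# [('TGG', 'W'), ('TAC', 'Y'), ('TCT', 'S'), ('TTC', 'W'), ('TTC', 'Y'), ('ACA', 'S')]
```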
Use a dictionary comprehension with modular indexing:
sequence = {
string[e:e+3]: p_residue[(e//3) % len(p_residue)]
for e in range(0, len(string), 3)
}
Output
{'TGG': 'W', 'TAC': 'Y', 'TCT': 'S', 'TTC': 'Y', 'ACA': 'S'}
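To understand better:
e is one of [0, 3, 6, 9, 12, 15]
e//3 is integer division, so it gives [0, 1, 2, 3, 4, 5]
(e//3) % 3 wraps the index to the residue length, so it gives [0, 1, 2, 0, 1, 2]
This arithmetic produces the same repetition as a cycle. As with any dictionary, a repeated key such as TTC keeps only its last value.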
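The index arithmetic above can be checked against itertools.cycle directly. A small sketch that computes both and compares them; the variable names by_modulo and by_cycle are introduced here for illustration:

```python
from itertools import cycle, islice

string = "TGGTACTCTTTCTTCACA"
p_residue = list("WYS")

# start offsets of each 3-letter chunk
starts = list(range(0, len(string), 3))
# residue chosen by integer division followed by modulo
by_modulo = [p_residue[(e // 3) % len(p_residue)] for e in starts]
# residue chosen by taking the same number of items from an endless cycle
by_cycle = list(islice(cycle(p_residue), len(starts)))

print(starts)                 # [0, 3, 6, 9, 12, 15]
print(by_modulo)              # ['W', 'Y', 'S', 'W', 'Y', 'S']
print(by_modulo == by_cycle)  # True
```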

How to find frequency of values in a list of lists and combine with another existing list by common value?

I have a nested list of music artists built from user inputs, let's say:
artists_list = [['A', 'B', 'C'],
['A', 'C', 'B'],
['B', 'A', 'D']]
I've also managed to create a separate list, based on order of input (not alphabetically), that assigns a genre to each unique artist in the above list:
artist_genre_list = [['A', 'Rock'],
['B', 'Rap'],
['C', 'Rock'],
['D', 'Blues']]
How do I combine these two to make either a master list or dictionary including the frequency count similar to:
master_list = [['A', 'Rock', 3],
['B', 'Rap', 3],
['C', 'Rock', 2],
['D', 'Blues', 1]]
master_dict = {'A': {
'Genre': 'Rock',
'Frequency': 3},
'B': {
'Genre': 'Rap',
'Frequency': 3},
'C': {
'Genre': 'Rock',
'Frequency': 2},
'D': {
'Genre': 'Blues',
'Frequency': 1}
}
The order doesn't necessarily have to be alphabetical. Here is a sample of what I'm doing to create the first two lists:
# Counters
count = 1
new_artist_counter = 0
# Generate Lists
artists_input_list = []
aux_artists_list = []
aux_genre_list = []
aux_artists_genre_list = []
def merge(aux_artists_list, aux_genre_list):
    merged_list = [[aux_artists_list[i], aux_genre_list[i]]
                   for i in range(0, len(aux_artists_list))]
    return merged_list
while count < 4:
    # Inputs
    a1_in = str(input("Artist 1: "))
    a2_in = str(input("Artist 2: "))
    a3_in = str(input("Artist 3: "))
    artists_input_list.append([a1_in, a2_in, a3_in])
    # Determines if new unique artist has been added and asks for its genre
    while new_artist_counter < len(artists_input_list):
        for entry in artists_input_list:
            for artist in entry:
                if artist not in aux_artists_list:
                    aux_artists_list.append(artist)
                    genre_input = input("What is "+artist+"'s genre? ")
                    aux_genre_list.append(genre_input)
                else: continue
        new_artist_counter += 1
    aux_artists_genre_list = merge(aux_artists_list, aux_genre_list)
    # Counter updates
    count += 1
print(artists_input_list)
print(aux_artists_genre_list)
This is what I came up with. It first flattens your artist list, gets the frequency of each item in the list, then combines that with your genre list.
from itertools import groupby, chain
import pprint
artists_list = [
['A', 'B', 'C'],
['A', 'C', 'B'],
['B', 'A', 'D']
]
artist_genre_list = [
['A', 'Rock'],
['B', 'Rap'],
['C', 'Rock'],
['D', 'Blues']
]
frequencies = {
    key: len(list(value))
    for key, value in groupby(sorted(chain.from_iterable(artists_list)))
}
frequency = [
    {
        letter: {
            'Genre': genre,
            # look the count up by value (not identity); 0 if the artist never appears
            'Frequency': frequencies.get(letter, 0)
        }
    }
    for letter, genre in artist_genre_list
]
pprint.pprint(frequency)
I used pprint just to make the output tidier, which shows as
[{'A': {'Frequency': 3, 'Genre': 'Rock'}},
{'B': {'Frequency': 3, 'Genre': 'Rap'}},
{'C': {'Frequency': 2, 'Genre': 'Rock'}},
{'D': {'Frequency': 1, 'Genre': 'Blues'}}]
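The question asks for either a master list or a single master dictionary rather than a list of one-key dictionaries. A hedged sketch of both shapes using collections.Counter in place of sorted plus groupby; this swaps in a different counting technique from the answer above, and the variable name counts is an assumption:

```python
from collections import Counter
from itertools import chain

artists_list = [
    ['A', 'B', 'C'],
    ['A', 'C', 'B'],
    ['B', 'A', 'D'],
]
artist_genre_list = [
    ['A', 'Rock'],
    ['B', 'Rap'],
    ['C', 'Rock'],
    ['D', 'Blues'],
]

# flatten the nested list and count each artist; missing artists count as 0
counts = Counter(chain.from_iterable(artists_list))

master_list = [[artist, genre, counts[artist]]
               for artist, genre in artist_genre_list]
master_dict = {artist: {'Genre': genre, 'Frequency': counts[artist]}
               for artist, genre in artist_genre_list}

print(master_list)
# [['A', 'Rock', 3], ['B', 'Rap', 3], ['C', 'Rock', 2], ['D', 'Blues', 1]]
print(master_dict['C'])
# {'Genre': 'Rock', 'Frequency': 2}
```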

How do I order double list of elements of this type: [[1,2,3], [a,b,c]]?

I have a double list of this type: dl = [[13, 22, 41], ['c', 'b', 'a']], in which each element dl[0][i] is paired with the value dl[1][i] at the same index. How can I sort the list using the dl[0] values as the sort key while keeping both sublists linked? The sublists are 'linked data', so each dl[0][i] and dl[1][i] pair must still share an index after the whole list is sorted by the first sublist's values.
I expect something like:
input: dl = [ [14,22,7,17], ['K', 'M', 'F','A'] ]
output: dl = [ [7, 14, 17, 22], ['F', 'K', 'A', 'M'] ]
This was way too much fun to write. I don't doubt that this function can be greatly improved, but this is what I've gotten in a very short amount of time and should get you started.
I've included some tests just so you can verify that this does indeed do what you want.
from unittest import TestCase, main
def sort_by_first(data):
    sorted_data = []
    for seq in data:
        zipped_to_first = zip(data[0], seq)
        sorted_by_first = sorted(zipped_to_first)
        unzipped_data = zip(*sorted_by_first)
        sorted_data.append(list(tuple(unzipped_data)[1]))
    return sorted_data
class SortByFirstTestCase(TestCase):
    def test_sort(self):
        output_1 = sort_by_first([[1, 3, 5, 2, 4], ['a', 'b', 'c', 'd', 'e']])
        self.assertEqual(output_1, [[1, 2, 3, 4, 5], ['a', 'd', 'b', 'e', 'c']])
        output_2 = sort_by_first([[9, 1, 5], [21, 22, 23], ['spam', 'foo', 'bar']])
        self.assertEqual(output_2, [[1, 5, 9], [22, 23, 21], ['foo', 'bar', 'spam']])
if __name__ == '__main__':
    main()
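A shorter route to the same result is to compute the sorted order of the indices once and apply it to every sublist. A minimal sketch, assuming all sublists have the same length as the first; sort_linked is a hypothetical name:

```python
def sort_linked(dl):
    """Reorder every sublist by the ascending order of the first sublist."""
    # indices of dl[0] arranged so that their values are ascending
    order = sorted(range(len(dl[0])), key=lambda i: dl[0][i])
    return [[seq[i] for i in order] for seq in dl]

dl = [[14, 22, 7, 17], ['K', 'M', 'F', 'A']]
print(sort_linked(dl))
# [[7, 14, 17, 22], ['F', 'K', 'A', 'M']]
```

Because only the first sublist's values are compared, the other sublists can hold items that are not comparable with each other, and ties keep their original relative order.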
Updated for what you're looking for: a selection sort with an extra line that applies the same swap to the second list, so it stays matched to the first.
for i in range(len(dl[0])):
    min_idx = i
    for j in range(i+1, len(dl[0])):
        if dl[0][min_idx] > dl[0][j]:
            min_idx = j
    dl[0][i], dl[0][min_idx] = dl[0][min_idx], dl[0][i]
    dl[1][i], dl[1][min_idx] = dl[1][min_idx], dl[1][i]
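You can also try a for loop that sorts each sublist in place. Note that this sorts the sublists independently, so the pairing between dl[0][i] and dl[1][i] survives only when both sublists happen to share the same ordering, as in this example:
dl = [ [3,2,1], ['c', 'b', 'a'] ]
for i in range(0,len(dl)):
    dl[i].sort()
print(dl)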
