Merging multiple lists of lists in Python 3 - python-3.x

Let's say I have multiple lists of lists. I'll include a shortened version of three of them in this example.
list1=[['name', '1A5ZA'], ['length', 83], ['A', 28], ['V', 31], ['I', 24]]
list2=[['name', '1AJ8A'], ['length', 49], ['A', 18], ['V', 11], ['I', 20]]
list3=[['name', '1AORA'], ['length', 96], ['A', 32], ['V', 49], ['I', 15]]
All of the lists are in the same format: they have the same number of nested lists, with the same labels.
I generate each of these lists with the following function:
def GetResCount(sequence):
    residues=[['A',0],['V',0],['I',0],['L',0],['M',0],['F',0],['Y',0],['W',0],
              ['S',0],['T',0],['N',0],['Q',0],['C',0],['U',0],['G',0],['P',0],['R',0],
              ['H',0],['K',0],['D',0],['E',0]]
    name=sequence[0:5]
    AAseq=sequence[27:]
    for AA in AAseq:
        for n in range(len(residues)):
            if residues[n][0] == AA:
                residues[n][1]=residues[n][1]+1
    length=len(AAseq)
    nameList=(['name', name])
    lengthList=(['length', length])
    residues.insert(0,lengthList)
    residues.insert(0,nameList)
    return residues
The script takes a sequence such as this:
1A5ZA:A|PDBID|CHAIN|SQUENCEMKIGIVGLGRVGSSTAFAL
and creates a list similar to the ones shown above.
As each individual list is generated, I would like to append it to a final structure, so that all of them combined look like this:
final=[['name', '1A5ZA', '1AJ8A', '1AORA'], ['length', 83, 49, 96], ['A', 28, 18, 32], ['V', 31, 11, 49], ['I', 24, 20, 15]]
Maybe the final form of the data isn't in the right format. I am open to suggestions on how to structure it better.
To summarize, the script should take a sequence of letters with the sequence name at the beginning, count the occurrences of each letter within the sequence as well as the overall sequence length, and output the name, length, and letter frequencies to a list. Then it should combine the info from each sequence into a larger list (maybe a dictionary?).
At the very end, all of this info will go into a spreadsheet that looks like this:
name   length  A   V   I
1A5ZA  83      28  31  24
1AJ8A  49      18  11  20
1AORA  96      32  49  15
I'm including this last bit because maybe I'm not starting in the right way to end up with what I want.
Anyway, I hope you made it here, and thanks for the help!

If you are ultimately after a table, a dict per sequence is probably a better fit than a list of lists. (Note: collections.Counter does the same counting as your loops.) For example:
from collections import Counter
def GetResCount(sequence):
    name, AAseq = sequence[0:5], sequence[27:]
    residuals = {'name': name, 'length': len(AAseq), 'A': 0, 'V': 0, 'I': 0, 'L': 0,
                 'M': 0, 'F': 0, 'Y': 0, 'W': 0, 'S': 0, 'T': 0, 'N': 0, 'Q': 0, 'C': 0,
                 'U': 0, 'G': 0, 'P': 0, 'R': 0, 'H': 0, 'K': 0, 'D': 0, 'E': 0}
    residuals.update(Counter(AAseq))
    return residuals
In []:
GetResCount('1A5ZA:A|PDBID|CHAIN|SQUENCEMKIGIVGLGRVGSSTAFAL')
Out[]:
{'name': '1A5ZA', 'length': 19, 'A': 2, 'V': 2, 'I': 2, 'L': 2, 'M': 1, 'F': 1, 'Y': 0,
'W': 0, 'S': 2, 'T': 1, 'N': 0, 'Q': 0, 'C': 0, 'U': 0, 'G': 4, 'P': 0, 'R': 1,
'H': 0, 'K': 1, 'D': 0, 'E': 0}
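Note: the keys come out in this order only because dicts preserve insertion order, which is guaranteed from Python 3.7 (and is an implementation detail of CPython 3.6). On older versions the column order can be fixed later, when the table is created.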
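One caveat with residuals.update(Counter(AAseq)): any character that is not in the template (an ambiguous residue code such as 'X', or a stray newline) becomes an extra key, and csv.DictWriter raises ValueError by default for keys that are not in fieldnames. A minimal sketch of a variant that only keeps the known residues follows. The names get_res_count and RESIDUES are made up for this illustration, and the trailing 'X' in the sample record is added only to show the effect.

```python
from collections import Counter

# The 21 one-letter residue codes from the question, in column order.
RESIDUES = "AVILMFYWSTNQCUGPRHKDE"

def get_res_count(sequence):
    """Return name, length and per-residue counts for one record as a dict."""
    name, aa_seq = sequence[:5], sequence[27:]
    counts = Counter(aa_seq)
    row = {"name": name, "length": len(aa_seq)}
    # A Counter returns 0 for missing keys, so absent residues still get
    # a column, while characters outside RESIDUES are simply ignored.
    row.update((res, counts[res]) for res in RESIDUES)
    return row

# Sample record from the question with an unknown 'X' appended.
row = get_res_count("1A5ZA:A|PDBID|CHAIN|SQUENCEMKIGIVGLGRVGSSTAFALX")
print(row["name"], row["length"], row["G"], "X" in row)
```

Note that length still counts the unknown character here; whether that is desirable depends on how the spreadsheet will be used.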
Then you can create a list of the dicts, e.g. (assuming you are reading these lines from a file):
with open(<file>) as file:
    data = [GetResCount(line.strip()) for line in file]
Then you can load it directly into pandas, e.g.:
In []:
import pandas as pd
columns = ['name', 'length', 'A', 'V', 'I', ...] # columns = list(data[0].keys()) - Py3.6+
df = pd.DataFrame(data, columns=columns)
print(df)
Out[]:
name length A V I ...
0 1A5ZA 83 28 31 24 ...
1 1AJ8A 49 18 11 20 ...
2 1AORA 96 32 49 15 ...
...
You could also just dump it out to a file with csv.DictWriter():
from csv import DictWriter
fieldnames = ['name', 'length', 'A', 'V', 'I', ...]
with open(<output>, 'w', newline='') as file:
    writer = DictWriter(file, fieldnames)
    writer.writeheader()  # without this call the header row is not written
    writer.writerows(data)
Which would output something like:
name,length,A,V,I,...
1A5ZA,83,28,31,24,...
1AJ8A,49,18,11,20,...
1AORA,96,32,49,15,...
...
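If you do want the list-of-lists layout from the question, the per-sequence lists can be merged directly with zip(), since they all carry the same labels in the same order. This is only a sketch under that assumption; the helper name merge_labelled is made up for the example.

```python
list1 = [['name', '1A5ZA'], ['length', 83], ['A', 28], ['V', 31], ['I', 24]]
list2 = [['name', '1AJ8A'], ['length', 49], ['A', 18], ['V', 11], ['I', 20]]
list3 = [['name', '1AORA'], ['length', 96], ['A', 32], ['V', 49], ['I', 15]]

def merge_labelled(*tables):
    """Merge [label, value] lists that share the same labels in the same order."""
    merged = []
    # zip(*tables) groups the first rows together, then the second rows, etc.
    for rows in zip(*tables):
        label = rows[0][0]
        # Guard against inputs whose labels are not aligned.
        if any(row[0] != label for row in rows):
            raise ValueError(f"label mismatch at {label!r}")
        merged.append([label] + [row[1] for row in rows])
    return merged

final = merge_labelled(list1, list2, list3)
print(final)
```

The dict-per-sequence approach above is usually easier to feed into pandas or csv, so this form is mainly useful if other code already expects the nested lists.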

Related

writing a generator that yields dictionaries of base frequencies of nucleotides

I am trying to write a function that returns a generator which iterates over every starting position of a k-window in a DNA sequence. For each starting position, the generator yields the nucleotide frequencies in the window as a dictionary.
def sliding(s,k):
    d = {}
    for i in range(len(s)-3):
        chunk = ''.join([s[i],s[i+(k-3)],s[i+(k-2)],s[i+(k-1)]])
        for j in chunk:
            if j not in d:
                d[j] = 1
            else:
                d[j] += 1
        yield d
seq = "ACGTTGCA"
for d in sliding(seq,4):
    print(d)
Output:
{'A': 1, 'C': 1, 'G': 1, 'T': 1}
{'A': 1, 'C': 2, 'G': 2, 'T': 3}
{'A': 1, 'C': 2, 'G': 4, 'T': 5}
{'A': 1, 'C': 3, 'G': 5, 'T': 7}
{'A': 2, 'C': 4, 'G': 6, 'T': 8}
Expected Output:
{'T': 1, 'C': 1, 'A': 1, 'G': 1}
{'T': 2, 'C': 1, 'A': 0, 'G': 1}
{'T': 2, 'C': 0, 'A': 0, 'G': 2}
{'T': 2, 'C': 1, 'A': 0, 'G': 1}
{'T': 1, 'C': 1, 'A': 1, 'G': 1}
However, as one can see, my function reuses the same dictionary for every window, so the nucleotide counts accumulate under the same keys on each iteration. Each window (chunk) should get its own dictionary.
You should initialize d inside the loop instead so that it starts with a new dict for each iteration:
for i in range(len(s) - 3):
    d = {}
    ...
If you want the dicts in the output to always have the same keys even if their values are 0, as suggested by your expected output, you can initialize a dict with all of the distinct letters as keys, and copy the dict to d for each iteration:
initialized_dict = dict.fromkeys(s, 0)
for i in range(len(s) - 3):
    d = initialized_dict.copy()
    ...
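Putting the two suggestions together, a complete version might look like the sketch below. It also replaces the hard-coded window of four (the len(s) - 3 bound and the hand-built chunk) with a slice of length k, which is an assumption about what the asker intended rather than something stated in the question.

```python
def sliding(s, k):
    """Yield one nucleotide-frequency dict per length-k window of s."""
    # Every distinct letter in s starts at 0 so all dicts share the same keys.
    template = dict.fromkeys(s, 0)
    for i in range(len(s) - k + 1):
        counts = template.copy()  # fresh dict for each window
        for letter in s[i:i + k]:
            counts[letter] += 1
        yield counts

for d in sliding("ACGTTGCA", 4):
    print(d)
```

The keys come out in order of first appearance in s (A, C, G, T here), so the printed order differs from the expected output in the question, but the counts per window are the same.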

How to individually add each letter of the alphabet into a dictionary

I'm trying to add each letter of the alphabet into a python dictionary, but I don't want to add it manually.
I have tried using string.ascii_lowercase, but it does not add each letter individually into the dictionary. Is there a way to add each letter in individually without doing it manually?
import string
dict = {'letter':string.ascii_lowercase, 'appearances':0}
print(dict['letter'], dict['appearances'])
I'm trying to get it to print out, 'a' 0, 'b' 0, etc. However, instead, it is printing out 'abcdefg...z' 0. Is there a way to enter then print out each letter individually followed by 0?
Initialize your dictionary with a dict comprehension:
import string
d = {k: 0 for k in string.ascii_lowercase}
for k, v in d.items():
    print(k, v)
Prints:
a 0
b 0
c 0
d 0
...and so on.
Dictionary d contains:
{'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 0, 'f': 0, 'g': 0, 'h': 0, 'i': 0, 'j': 0, 'k': 0, 'l': 0, 'm': 0, 'n': 0, 'o': 0, 'p': 0, 'q': 0, 'r': 0, 's': 0, 't': 0, 'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}
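An equivalent way to build the same dictionary is dict.fromkeys(), which is safe here because the shared default value 0 is immutable. The sketch below also assumes, based on the 'appearances' key in the question, that the zeros are meant to be incremented later; the sample text "hello world" is made up.

```python
import string

# One key per lowercase letter, all starting at 0.
counts = dict.fromkeys(string.ascii_lowercase, 0)

# Count appearances in some text, skipping anything that is not a-z.
for ch in "hello world":
    if ch in counts:
        counts[ch] += 1

print(counts["l"], counts["o"], counts["z"])
```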

Return multiple lines in a for loop

d = {'U': 4, '_': 2, 'C': 2, 'K': 1, 'D': 4, 'T': 6, 'Q': 1, 'V': 2, 'A': 9, 'F': 2, 'O': 8, 'J': 1, 'I': 9, 'N': 6, 'P': 2, 'S': 4, 'M': 2, 'W': 2, 'E': 12, 'Z': 1, 'G': 3, 'Y': 2, 'B': 2, 'L': 4, 'R': 6, 'X': 1, 'H': 2}
def __str__(self):
    omgekeerd = {}
    for sleutel, waarde in self.inhoud.items():
        letters = omgekeerd.get(waarde, '')
        letters += sleutel
        omgekeerd[waarde] = letters
    for aantal in sorted(omgekeerd):
        return '{}: {}'.format(aantal, ''.join(sorted(omgekeerd[aantal])))
I need to return the value, followed by a ':' and then by every letter that has that value.
The problem is that when I use return, it only returns one value instead of every value on a new line.
I can't use print() because the __str__(self) method has to return the string.
The return statement ends function execution and specifies a value to
be returned to the function caller.
I believe your loop terminates too early because the return statement is inside it.
What you could do is store what you would like to return in a separate list/dictionary and, when everything is done, return the new dict/list holding the results.
If I understood you correctly, this might be what you are looking for (note that it uses Python 2 syntax: iteritems() and the print statement):
def someFunc():
    d = {'U': 4, '_': 2, 'C': 2, 'K': 1, 'D': 4, 'T': 6, 'Q': 1, 'V': 2, 'A': 9,
         'F': 2, 'O': 8, 'J': 1, 'I': 9, 'N': 6, 'P': 2, 'S': 4, 'M': 2, 'W': 2, 'E': 12,
         'Z': 1, 'G': 3, 'Y': 2, 'B': 2, 'L': 4, 'R': 6, 'X': 1, 'H': 2}
    result = {}
    for key, value in d.iteritems():
        result[value] = [k for k,v in d.iteritems() if v == value]
    return result
# call function and iterate over given dictionary
for key, value in someFunc().iteritems():
    print key, value
Result:
1 ['K', 'J', 'Q', 'X', 'Z']
2 ['C', 'B', 'F', 'H', 'M', 'P', 'W', 'V', 'Y', '_']
3 ['G']
4 ['D', 'L', 'S', 'U']
6 ['N', 'R', 'T']
8 ['O']
9 ['A', 'I']
12 ['E']
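Since __str__ has to return a single string, another option is to build one line per count and join them with newlines. The following is a Python 3 sketch that stays close to the question's code; the class name LetterCount and its constructor are assumptions, as the question only shows the method and the inhoud attribute.

```python
class LetterCount:  # hypothetical class name, not from the question
    def __init__(self, inhoud):
        self.inhoud = inhoud  # dict mapping letter -> count

    def __str__(self):
        # Invert the mapping: count -> list of letters with that count.
        omgekeerd = {}
        for sleutel, waarde in self.inhoud.items():
            omgekeerd.setdefault(waarde, []).append(sleutel)
        # Build every line first, then return them as one string.
        regels = [
            '{}: {}'.format(aantal, ''.join(sorted(omgekeerd[aantal])))
            for aantal in sorted(omgekeerd)
        ]
        return '\n'.join(regels)

print(LetterCount({'U': 4, 'K': 1, 'D': 4, 'Q': 1, 'G': 3}))
```

With this small sample dict the printed lines are "1: KQ", "3: G" and "4: DU".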

Is there a simple way I can convert this list into a dictionary (Python)

The list is this:
List1 = ['a','b','c','d','e','f','g','h','h','i','j','k','l','m','n']
I am hoping for an outcome where each item is paired with the number of times it appears in the list, e.g.:
List1 = ['a:1']
without importing the Counter class (collections.Counter).
You could use a generator expression inside dict():
dict((x, List1.count(x)) for x in set(List1))
Example output:
{'d': 1, 'f': 1, 'l': 1, 'c': 1, 'j': 1, 'e': 1, 'i': 1, 'a': 1, 'h': 2, 'b': 1, 'm': 1, 'n': 1, 'k': 1, 'g': 1}
(Edited to match edited question.)
Use a dictionary comprehension and count.
>>> List1 = ['a','b','c','d','e','f','g','h','h','i','j','k','l','m','n']
>>> mapping = {v: List1.count(v) for v in List1}
>>> mapping
{'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1, 'f': 1,
'g': 1, 'h': 2, 'i': 1, 'j': 1, 'k': 1, 'l': 1, 'm': 1, 'n': 1}
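Both answers call List1.count() once per item, which rescans the whole list each time. For longer lists, a single pass with dict.get() avoids that and still needs no imports. This is a sketch; the variable names counts and pairs are made up, and the 'item:count' strings are only one reading of the 'a:1' format in the question.

```python
List1 = ['a','b','c','d','e','f','g','h','h','i','j','k','l','m','n']

# Single pass: look up the running count (default 0) and add one.
counts = {}
for item in List1:
    counts[item] = counts.get(item, 0) + 1

# Optional: the 'item:count' strings hinted at in the question.
pairs = ['{}:{}'.format(item, n) for item, n in counts.items()]

print(counts)
print(pairs)
```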

Counting the occurrence of each character in multidimensional list in python

I want to count the occurrence of each character in this list below:
messages=['It is certain',
'It is decidedly so',
'Yes definitely',
'Reply hazy try again',
'Ask again later',
'Concentrate and ask again',
'My reply is no',
'Outlook not so good',
'Very doubtful']
My code is this:
a=dict((letter,messages.count(letter))for letter in set(messages))
print(a)
Output is:
{'Yes definitely': 1, 'Very doubtful': 1, 'It is decidedly so': 1, 'Outlook not so good': 1, 'Reply hazy try again': 1, 'It is certain': 1, 'My reply is no': 1, 'Concentrate and ask again': 1, 'Ask again later': 1}
This counts each element of the list; I want the count of each character instead.
There are multiple ways to do it; one hacky way is:
messages=['It is certain','It is decidedly so','Yes definitely','Reply hazy try again','Ask again later','Concentrate and ask again','My reply is no','Outlook not so good','Very doubtful']
characters = list(''.join(messages).replace(" ","").lower())
characters.sort()
from itertools import groupby
count = {key:len(list(group)) for key, group in groupby(characters)}
print(count)
{'a': 13, 'b': 1, 'c': 4, 'd': 7, 'e': 12, 'f': 2, 'g': 4, 'h': 1, 'i': 12, 'k': 3, 'l': 7, 'm': 1, 'n': 10, 'o': 11, 'p': 2, 'r': 7, 's': 8, 't': 11, 'u': 3, 'v': 1, 'y': 9, 'z': 1}
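A less roundabout alternative is to feed the characters straight into collections.Counter, which skips the join, sort and groupby steps. The sketch below keeps the same choices as the answer above (lowercase everything, ignore spaces); those choices are assumptions about what the asker wants.

```python
from collections import Counter

messages = ['It is certain',
            'It is decidedly so',
            'Yes definitely',
            'Reply hazy try again',
            'Ask again later',
            'Concentrate and ask again',
            'My reply is no',
            'Outlook not so good',
            'Very doubtful']

# Walk every character of every message, lowercased, skipping spaces.
counts = Counter(
    ch
    for message in messages
    for ch in message.lower()
    if ch != ' '
)

for letter in sorted(counts):
    print(letter, counts[letter])
```

counts.most_common() returns the same data sorted by frequency, if that ordering is more useful than alphabetical.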
