This is my code that takes a number of codons. Codons are a group of three nucleotides, each coding for an Amino Acid
codon_sequence=[]
print("Enter no. of codons you want")
n=int(input())
for i in range(n):
codon=str(input())
codon_sequence.append(codon)
print(codon_sequence)
for i in range(n):
if(codon_sequence[i]=="UUU" or "UUC" or "TTT" or "TTC"):
print("Phe_")
elif(codon_sequence[i]=="UUA" or "UUG" or "CUU" or "CUC" or "CUG" or "CUA" or "TTA" or "TTG" or "CTT" or "CTC" or "CTG" or "CTA"):
print("Leu_")
elif(codon_sequence[i]=="UCU" or "UCC" or "UCG" or "UCA" or "AGU" or "AGC" or "TCT" or "TCC" or "TCG" or "TCA" or "AGT" or "AGC"):
print("Ser_")
elif(codon_sequence[i]=="UAU" or "UAC" or "TAT" or "TAC"):
print("Tyr_")
elif(codon_sequence[i]=="UGU" or "UGC" or "TGT" or "TGC"):
print("Cys_")
elif(codon_sequence[i]=="UGG" or "TGG"):
print("Trp_")
elif(codon_sequence[i]=="CCU" or "CCC" or "CCA" or "CCG" or "CCT"):
print("Pro_")
elif(codon_sequence[i]=="CGU" or "CGC" or "CGA" or "CGG" or "AGA" or "AGG" or "CGT"):
print("Arg_")
elif(codon_sequence[i]=="CAU" or "CAC" or "CAT"):
print("His_")
elif(codon_sequence[i]=="CAA" or "CAG"):
print("Gln_")
elif(codon_sequence[i]=="AUU" or "AUC" or "AUA" or "ATT" or "ATC" or "ATA"):
print("Ile_")
elif(codon_sequence[i]=="AUG"):
print("Met_")
elif(codon_sequence[i]=="ACU" or "ACC" or "ACA" or "ACG" or "ACT"):
print("Thr_")
elif(codon_sequence[i]=="GUU" or "GUC" or "GUA" or "GUG" or "GTT" or "GTC" or "GTA" or "GTG"):
print("Val_")
elif(codon_sequence[i]=="GCU" or "GCC" or "GCA" or "GCG" or "GCT"):
print("Ala_")
elif(codon_sequence[i]=="GGU" or "GGC" or "GGA" or "GGG" or "GGT"):
print("Gly_")
elif(codon_sequence[i]=="GAU" or "GAC" or "GAT"):
print("Asp_")
elif(codon_sequence[i]=="GAA" or "GAG"):
print("Glu_")
elif(codon_sequence[i]=="AAU" or "AAC" or "AAT"):
print("Asn_")
elif(codon_sequence[i]=="AAA" or "AAG"):
print("Lys_")
else:
print("Stop_")
This is however, giving me only 'Phe_' as result, and ignores all other conditions
Reason why your code is not hitting the elif blocks
Your if and elif blocks should look like this.
It should check if codon_sequence[i] is equal to a string of interest.
if(codon_sequence[i]=="UUU" or codon_sequence[i]=="UUC" or codon_sequence[i]=="TTT" or codon_sequence[i]=="TTC"):
Instead you have an or condition against just plain strings like UUC.
This will result in the first if condition always being True.
Thereby you will never hit the elif block.
Also a better way of writing the if statement would be:
if codon_sequence[i] in ["UUU", "UUC", "TTT", "TTC"]:
print("Phe_")
This would be a great candidate for a switch statement, but as the previous answer mentioned you can't put an "or" between each string like you're doing.
So I have these strings that I split by spaces (' ') and I just rolled them into a single list I called 'keyLabelRun'
so it looks like this:
keyLabelRun[0-12]:
0 OS=Dengue
1 virus
2 3
3 PE=4
4 SV=1
5 Split=0
6
7 OS=Bacillus
8 subtilis
9 XF-1
10 GN=opuBA
11 PE=4
12 SV=1
I only want the elements that include and are after "OS=", anything else, whether it be "SV=" or "PE=" etc. I want to skip over those elements until I get to the next "OS="
The number of elements to the next "OS=" is arbitrary so that's where I'm having the problem.
This is what I'm currently trying:
OSarr = []
for i in range(len(keyLabelrun)):
if keyLabelrun[i].count('OS='):
OSarr.append(keyLabelrun[i])
if keyLabelrun[i+1].count('=') != 1:
continue
But the elements where "OS=" is not included is what is tripping me up I think.
Also at the end I'm going to join them all back together in their own elements but I feel like I will be able to handle that after this.
In my attempt, I am trying to append all elements I'm looking for in order to an new list 'OSarr'
If anyone can lend a hand, it would be much appreciated.
Thank you.
These list of strings came from a dataset that is a text file in the form:
>tr|W0FSK4|W0FSK4_9FLAV Genome polyprotein (Fragment) OS=Dengue virus 3 PE=4 SV=1 Split=0
MNNQRKKTGKPSINMLKRVRNRVSTGSQLAKRFSKGLLNGQGPMKLVMAFIAFLRFLAIPPTAGVLARWGTFKKSGAIKVLKGFKKEISNMLSIINKRKKTSLCLMMILPAALAFHLTSRDGEPRMIVGKNERGKSLLFKTASGINMCTLIAMDLGEMCDDTVTYKCPHITEVEPEDIDCWCNLTSTWVTYGTCNQAGEHRRDKRSVALAPHVGMGLDTRTQTWMSAEGAWRQVEKVETWALRHPGFTILALFLAHYIGTSLTQKVVIFILLMLVTPSMTMRCVGVGNRDFVEGLSGATWVDVVLEHGGCVTTMAKNKPTLDIELQKTEATQLATLRKLCIEGKITNITTDSRCPTQGEATLPEEQDQNYVCKHTYVDRGWGNGCGLFGKGSLVTCAKFQCLEPIEGKVVQYENLKYTVIITVHTGDQHQVGNETQGVTAEITPQASTTEAILPEYGTLGLECSPRTGLDFNEMILLTMKNKAWMVHRQWFFDLPLPWTSGATTETPTWNRKELLVTFKNAHAKKQEVVVLGSQEGAMHTALTGATEIQNSGGTSIFAGHLKCRLKMDKLELKGMSYAMCTNTFVLKKEVSETQHGTILIKVEYKGEDVPCKIPFSTEDGQGKAHNGRLITANPVVTKKEEPVNIEAEPPFGESNIVIGIGDNALKINWYKKGSSIGKMFEATARGARRMAILGDTAWDFGSVGGVLNSLGKMVHQIFGSAYTALFSGVSWVMKIGIGVLLTWIGLNSKNTSMSFSCIAIGIITLYLGAVVQADMGCVINWKGKELKCGSGIFVTNEVHTWTEQYKFQADSPKRLATAIAGAWENGVCGIRSTTRMENLLWKQIANELNYILWENNIKLTVVVGDIIGVLEQGKRTLTPQPMELKYSWKTWGKAKIVTAETQNSSFIIDGPNTPECPSVSRAWNVWEVEDYGFGVFTTNIWLKLREVYTQLCDHRLMSAAVKDERAVHADMGYWIESQKNGSWKLEKASLIEVKTCTWPKSHTLWSNGVLESDMIIPKSLAGPISQHNHRPGYHTQTAGPWHLGKLELDFNYCEGTTVVITENCGTRGPSLRTTTVSGKLIHEWCCRSCTLPPLRYMGEDGCWYGMEIRPISEKEENMVKSLVSAGSGKVDNFTMGVLCLAILFEEVMRGKFGKKHMIAGVFFTFVLLLSGQITWRDMAHTLIMIGSNASDRMGMGVTYLALIATFKIQPFLALGFFLRKLTSRENLLLGVGLAMATTLQLPEDIEQMANGIALGLMALKLITQFETYQLWTALISLTCSNTIFTLTVAWRTATLILAGVSLLPVCQSSSMRKTDWLPMAVAAMGVPPLPLFIFGLKDTLKRRSWPLNEGVMAVGLVSILASSLLRNDVPMAGPLVAGGLLIACYVITGTSADLTVEKAADITWEEEAEQTGVSHNLMITVDDDGTMRIKDDETENILTVLLKTALLIVSGIFPYSIPATLLVWHTWQKQTQRSGVLWDVPSPPETQKAELEEGVYRIKQQGIFGKTQVGVGVQKEGVFHTMWHVTRGAVLTYNGKRLEPNWASVKKDLISYGGGWRLSAQWQKGEEVQVIAVEPGKNPKNFQTMPGTFQTTTGEIGAIALDFKPGTSGSPIINREGKVVGLYGNGVVTKNGGYVSGIAQTNAEPDGPTPELEEEMFKKRNLTIMDLHPGSGKTRKYLPAIVREAIKRRLRTLILAPTRVVAAEMEEALKGLPIRYQTTATKSEHTGREIVDLMCHATFTMRLLSPVRVPNYNLIIMDEAHFTDPASIAARGYISTRVGMGEAAAIFMTATPPGTADAFPQSNAPIQDEERDIPERSWNSGNEWITDFAGKTVWFVPSIKAGNDIANCLRKNGKKVIQLSRKTFDTEYQKTKLNDWDFVV
>tr|M4KW32|M4KW32_BACIU Choline ABC transporter (ATP-binding protein) OS=Bacillus subtilis XF-1 GN=opuBA PE=4 SV=1 Split=0
MLTLENVSKTYKGGKKAVNNVNLKIAKGEFICFIGPSGCGKTTTMKMINRLIEPSAGKIFIDGENIMDQDPVELRRKIGYVIQQIGLFPHMTIQQNISLVPKLLKWPEQQRKERARELLKLVDMGPEYVDRYPHELSGGQQQRIGVLRALAAEPPLILMDEPFGALDPITRDSLQEEFKKLQKTLHKTIVFVTHDMDEAIKLADRIVILKAGEIVQVGTPDDILRNPADEFVEEFIGKERLIQSSSPDVERVDQIMNTQPVTITADKTLSEAIQLMRQERVDSLLVVDDEHVLQGYVDVEIIDQCRKKANLIGEVLHEDIYTVLGGTLLRDTVRKILKRGVKYVPVVDEDRRLIGIVTRASLVDIVYDSLWGEEKQLAALS
>sp|Q8AWH3|SX17A_XENTR Transcription factor Sox-17-alpha OS=Xenopus tropicalis GN=sox17a PE=2 SV=1 Split=0
MSSPDGGYASDDQNQGKCSVPIMMTGLGQCQWAEPMNSLGEGKLKSDAGSANSRGKAEARIRRPMNAFMVWAKDERKRLAQQNPDLHNAELSKMLGKSWKALTLAEKRPFVEEAERLRVQHMQDHPNYKYRPRRRKQVKRMKRADTGFMHMAEPPESAVLGTDGRMCLESFSLGYHEQTYPHSQLPQGSHYREPQAMAPHYDGYSLPTPESSPLDLAEADPVFFTSPPQDECQMMPYSYNASYTHQQNSGASMLVRQMPQAEQMGQGSPVQGMMGCQSSPQMYYGQMYLPGSARHHQLPQAGQNSPPPEAQQMGRADHIQQVDMLAEVDRTEFEQYLSYVAKSDLGMHYHGQESVVPTADNGPISSVLSDASTAVYYCNYPSA
I got it! :D
OSarr = []
G = 0
for i in range(len(keyLabelrun)):
OSarr.append(keyLabelrun[G])
G += 1
if keyLabelrun[G].count('='):
while keyLabelrun[G].count('OS=') != 1:
G+=1
Maybe next time everyone, thank you!
Due to the syntax, you have to keep track of which part (OS, PE, etc) you're currently parsing. Here's a function to extract the species name from the FASTA header:
def extract_species(description):
species_parts = []
is_os = False
for word in description.split():
if word[:3] == 'OS=':
is_os = True
species_parts.append(word[3:])
elif '=' in word:
is_os = False
elif is_os:
species_parts.append(word)
return ' '.join(species_parts)
You can call it when processing your input file, e.g.:
from Bio import SeqIO
for record in SeqIO.parse('input.fa', 'fasta'):
species = extract_species(record.description)
So I am trying to solve mean, median and mode challenge on Hackerrank. I defined 3 functions to calculate mean, median and mode for a given array with length between 10 and 2500, inclusive.
I get an error with an array of 2500 integers, not sure why. I looked into python documentation and found no mentions of max length for lists. I know I can use statistics module but trying the hard way and being stubborn I guess. Any help and criticism is appreciated regarding my code. Please be honest and brutal if need be. Thanks
N = int(input())
var_list = [int(x) for x in input().split()]
def mean(sample_list):
mean = sum(sample_list)/N
print(mean)
return
def median(sample_list):
sorted_list = sorted(sample_list)
if N%2 != 0:
median = sorted_list[(N//2)]
else:
median = (sorted_list[N//2] + sorted_list[(N//2)-1])/2
print(median)
return
def mode(sample_list):
sorted_list = sorted(sample_list)
mode = min(sorted_list)
max_count = sorted_list.count(mode)
for i in sorted_list:
if (i <= mode) and (sorted_list.count(i) >= max_count):
mode = i
print(mode)
return
mean(var_list)
median(var_list)
mode(var_list)
Compiler Message
Wrong Answer
Input (stdin)
2500
19325 74348 68955 98497 26622 32516 97390 64601 64410 10205 5173 25044 23966 60492 71098 13852 27371 40577 74997 42548 95799 26783 51505 25284 49987 99134 33865 25198 24497 19837 53534 44961 93979 76075 57999 93564 71865 90141 5736 54600 58914 72031 78758 30015 21729 57992 35083 33079 6932 96145 73623 55226 18447 15526 41033 46267 52486 64081 3705 51675 97470 64777 31060 90341 55108 77695 16588 64492 21642 56200 48312 5279 15252 20428 57224 38086 19494 57178 49084 37239 32317 68884 98127 79085 77820 2664 37698 84039 63449 63987 20771 3946 862 1311 77463 19216 57974 73012 78016 9412 90919 40744 24322 68755 59072 57407 4026 15452 82125 91125 99024 49150 90465 62477 30556 39943 44421 68568 31056 66870 63203 43521 78523 58464 38319 30682 77207 86684 44876 81896 58623 24624 14808 73395 92533 4398 8767 72743 1999 6507 49353 81676 71188 78019 88429 68320 59395 95307 95770 32034 57015 26439 2878 40394 33748 41552 64939 49762 71841 40393 38293 48853 81628 52111 49934 74061 98537 83075 83920 42792 96943 3357 83393{-truncated-}
Download to view the full testcase
Expected Output
49921.5
49253.5
2184
Your issue seems to be that you are actually using standard list operations rather than calculating things on the fly, while looping through the data once (for the average). sum(sample_list) will almost surely give you something which exceeds the double-limit, i.a.w. it becomes really big.
Further reading
Calculating the mean, variance, skewness, and kurtosis on the fly
How do I determine the standard deviation (stddev) of a set of values?
Rolling variance algorithm
What is a good solution for calculating an average where the sum of all values exceeds a double's limits?
How do I determine the standard deviation (stddev) of a set of values?
How to efficiently compute average on the fly (moving average)?
I figured out that you forgot to change the max_count variable inside the if block. Probably that causes the wrong result. I tested the debugged version on my computer and they seem to work well when I compare their result with the scipy's built-in functions. The correct mode function should be
def mode(sample_list):
N = len(sample_list)
sorted_list = sorted(sample_list)
mode = min(sorted_list)
max_count = sorted_list.count(mode)
for i in sorted_list:
if (sorted_list.count(i) >= max_count):
mode = i
max_count = sorted_list.count(i)
print(mode)
I was busy with some stuff and now came back to completing this. I am happy to say that I have matured enough as a coder and solved this issue.
Here is the solution:
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Input an array of numbers, convert it to integer array
n = int(input())
my_array = list(map(int, input().split()))
my_array.sort()
# Find mean
array_mean = sum(my_array) / n
print(array_mean)
# Find median
if (n%2) != 0:
array_median = my_array[n//2]
else:
array_median = (my_array[n//2 - 1] + my_array[n//2]) / 2
print(array_median)
# Find mode(I could do this using multimode method of statistics module for python 3.8)
def sort_second(array):
return array[1]
modes = [[i, my_array.count(i)] for i in my_array]
modes.sort(key = sort_second, reverse=True)
array_mode = modes[0][0]
print(array_mode)
I'm looking for the process to happen on the fly since it can be generalized and also efficient.
This is the code I tried. It has the basic logic. However this does not work for all inputs.
result=[{4:5},{4:6},{4:7}]
new=[]
new=result.copy()
print("Original",result)
for i in range(0,len(result)-1):
for key, value in result[i].items():
for j in range(1,len(result)):
for key_1, value_1 in result[j].items() :
if i!=j:
if key==key_1:
print("when key=key\n",result[i],"=",result[j])
dict={value:value_1}
new[j]=dict
print(new)
break;
if key==value_1:
print("when key=value\n",result[i],"=",result[j])
dict={value:key_1}
new[j]=dict
print(new)
if value==value_1:
print("when value=value\n",result[i],"=",result[j])
dict={key:key_1}
new[j]=dict
print(new)
else:
break;
print("\nFinal =",new)
print("\nCorrect =[{4:5},{5:6},{6:7}]" )
input: [{4:5},{4:6},{4:7}]
output: [{4:5},{5:6},{6:7}]
An example like:
print("1.", get_list_nums_without_9([589775, 677017, 34439, 48731548, 782295632, 181967909]))
print("2.", get_list_nums_without_9([6162, 29657355, 5485406, 422862350, 74452, 480506, 2881]))
print("3.", get_list_nums_without_9([292069010, 73980, 8980155, 921545108, 75841309, 6899644]))
print("4.", get_list_nums_without_9([]))
nums = [292069010, 73980, 8980155, 21545108, 7584130, 688644, 644908219, 44281, 3259, 8527361, 2816279, 985462264, 904259, 3869, 609436333, 36915, 83705, 405576, 4333000, 79386997]
print("5.", get_list_nums_without_9(nums))
I'm trying to get number list without 9. If the list is empty or if all of the numbers in the list contain the digit 9, the function should return an empty list. I tried the function below, it doesn't work.
def get_list_nums_without_9(a_list):
j=0
for i in a_list:
a_list[j]=i.rstrip(9)
j+=1
return a_list
expected:
1. [677017, 48731548]
2. [6162, 5485406, 422862350, 74452, 480506, 2881]
3. []
4. []
5. [21545108, 7584130, 688644, 44281, 8527361, 83705, 405576, 4333000]
your lists contain integers. To remove the ones containing 9 the best way is to test if 9 belongs to the number as string and rebuild the output using a list comprehension with a conditional.
(Besides, rstrip removes the trailing chars of a string. Not suitable at all for your problem)
def get_list_nums_without_9(a_list):
return [x for x in a_list if "9" not in str(x)]
testing with your data:
>>> numbers = [
... [589775, 677017, 34439, 48731548, 782295632, 181967909],
... [6162, 29657355, 5485406, 422862350, 74452, 480506, 2881],
... [292069010, 73980, 8980155, 921545108, 75841309, 6899644],
... [],
... [292069010, 73980, 8980155, 21545108, 7584130, 688644, 644908219,
... 44281, 3259, 8527361, 2816279, 985462264, 904259, 3869, 609436333,
... 36915, 83705, 405576, 4333000, 79386997]
... ]
>>> for i, l in enumerate(numbers, 1):
... print("{}. {}".format(i, get_list_nums_without_9(l)))
1. [677017, 48731548]
2. [6162, 5485406, 422862350, 74452, 480506, 2881]
3. []
4. []
5. [21545108, 7584130, 688644, 44281, 8527361, 83705, 405576, 4333000]
You don't want to replace anything in the list, you want to generate a new list based on what you detect in the old list:
def get_list_nums_without_9(numbers):
sans_9 = []
for number in numbers:
if '9' not in str(number):
sans_9.append(number)
return sans_9
This will get you an empty list if there are no valid numbers.