Sum of the items in a dictionary - python-3.x

I've been trying to do an exercise. The objective is to sum the values of the letters in each word, see which word has the highest total, and return it. Each letter corresponds to a value. For example, "Babel" is worth 10 points (3+1+3+1+2) and "Xadrez" is worth 21 points (8+1+2+1+1+8), so the program is supposed to return "Xadrez".
My code is this:
def better(l1):
    dic = {'D':2, 'C':2, 'L':2, 'P':2, 'B':3, 'N':3, 'F':4, 'G':4,
           'H':4, 'V':4, 'J':5, 'Q':6, 'X':8, 'Y':8, 'Z':8}
    for word in dic.keys():
        l1 = []
        best = 0
        sum = 0
        word = word.split()
        word = word.item()
        sum = word.item()
        best = word
        l1 = l1.append(word)
    return best
My idea is to split each word and sum the value of each letter in it. Thanks.
Another example: (['ABACO', 'UTOPIA', 'ABADE']) >>'ABACO'

I would start by assigning a score value to every letter. This makes it easier to score the words, because each letter has a specified value:
points = \
{ 0: '_' # wild letter
, 1: 'eaionrtlsu'
, 2: 'dg'
, 3: 'bcmp'
, 4: 'fhvwy'
, 5: 'k'
, 8: 'jx'
, 10: 'qz'
}
You can easily build a dic like the one you have. It's better to define it outside of your scoring function, because the dic can then be reused instead of being recreated each time the function runs (as happens in your code):
dic = \
  { letter:score for (score,letters) in points.items() for letter in letters }
# { '_': 0
# , 'e': 1
# , 'a': 1
# , 'i': 1
# , ...
# , 'x': 8
# , 'q': 10
# , 'z': 10
# }
Scoring a word using dic is easy thanks to the built-in sum function:
def score_word (word):
    return sum (dic[letter] for letter in word)
print (score_word ("babel")) # 9
print (score_word ("xadrez")) # 23
Now we need a max_word function that can determine the "max" of two words, w1 and w2:
def max_word (w1, w2):
    if (score_word (w1) > score_word (w2)):
        return w1
    else:
        return w2
print (max_word ("babel", "xadrez")) # xadrez
Now it's easy to make a function, max_words, that accepts any number of words:
def max_words (w = None, *words):
    if not words:
        return w
    else:
        return max_word (w, max_words (*words))
print (max_words ('abaco', 'utopia', 'abade')) # abaco
print (max_words ()) # None
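As an alternative to the recursive max_words, the built-in max accepts a key function, so the best word can be picked in a single call. The following is only a sketch: POINTS, LETTER_SCORES and best_word are names chosen here for illustration, and the table reuses the letter values from this answer, not the ones from the exercise in the question.

```python
# Letter values copied from the points table in this answer (assumption:
# the exercise in the question uses a different table).
POINTS = {
    0: "_", 1: "eaionrtlsu", 2: "dg", 3: "bcmp",
    4: "fhvwy", 5: "k", 8: "jx", 10: "qz",
}
LETTER_SCORES = {letter: score
                 for score, letters in POINTS.items()
                 for letter in letters}


def score_word(word):
    """Sum the letter values of a word, ignoring case."""
    return sum(LETTER_SCORES[letter] for letter in word.lower())


def best_word(words):
    """Return the highest-scoring word, or None for an empty sequence."""
    return max(words, key=score_word, default=None)


print(best_word(["ABACO", "UTOPIA", "ABADE"]))  # ABACO
print(best_word(["babel", "xadrez"]))           # xadrez
print(best_word([]))                            # None
```

Note that max keeps the first of several equally scored words, whereas max_word above returns the later one on a tie.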

Related

Simple Python program to convert roman numerals to the international system

I've tried to put together a basic program to convert Roman numerals to conventional numbers, but I seem to have made a mistake somewhere and right now it's hit or miss. Do take a look at the code; I've described the details in the comments. Thanks!
def roman(roman_num):
    inter_dic = {
        "I":1,"V":5,"X":10,"L":50,"C":100,"D":500,"M":1000
    }
    output = 0
    num = roman_num.upper()
    #this is to ensure that the number appears as uppercase alphabets
    for numeral in num:
        numeral_index = num.index(numeral)
        #I've squeezed an exception here to handle the last digit in the number
        try:
            #for the additive part
            if inter_dic[numeral] > inter_dic[num[numeral_index + 1]]:
                output += inter_dic[numeral]
            #for the subtraction part
            elif inter_dic[numeral] < inter_dic[num[numeral_index + 1]]:
                output -= inter_dic[numeral]
            elif inter_dic[numeral] == inter_dic[num[numeral_index + 1]]:
                output += inter_dic[numeral]
            #the following line is actually dead code,but I've added it just in case.
            else:
                print("There was an error.")
        #the IndexError will be called when the last digit is reached and so the last digit
        #will be added
        except IndexError:
            output += inter_dic[numeral]
    return output
assert roman("cxcix") == 199
#this returns an assertion error
#when the function is called,the output is 179
This should do what you want:
def roman(roman_num):
    inter_dic = {
        "I":1,"V":5,"X":10,"L":50,"C":100,"D":500,"M":1000
    }
    x = list(map(lambda x: inter_dic[x], roman_num.upper()))
    for idx in range(len(x)-1):
        if x[idx+1] > x[idx]:
            x[idx] *= -1
    return x
decimal = roman("cxcix")
print(decimal) # Output: [100, -10, 100, -1, 10]
print(sum(decimal)) # Output: 199
This works on the assumption that the numeral is properly structured: the symbols should appear in order from the biggest to the smallest, apart from subtractive pairs such as IX or XC.
The above code will work even if you give it just one character, because the loop runs over range(len(x) - 1), where x is the list created when we translate the letters into their integers.
If x == [100], then len(x) - 1 == 0, which terminates the for-loop immediately, so we will not encounter any IndexError inside the loop.
To explain how your code differs from my version, we can create a simple example:
lst = list("abcabc")
for idx, letter in enumerate(lst):
    print(f"Letter: {letter}, list.index: {lst.index(letter)}, Actual index: {idx}")
Output:
Letter: a, list.index: 0, Actual index: 0
Letter: b, list.index: 1, Actual index: 1
Letter: c, list.index: 2, Actual index: 2
Letter: a, list.index: 0, Actual index: 3
Letter: b, list.index: 1, Actual index: 4
Letter: c, list.index: 2, Actual index: 5
If we look at the documentation for list.index, we can see this description:
index(self, value, start=0, stop=2147483647, /)
Return first index of value.
Raises ValueError if the value is not present.
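Because there are repeated values inside lst, calling index to look up a letter's position just returns the index of the first element that matches the value we give it. In your code, num.index(numeral) therefore points at the first occurrence of a repeated numeral rather than at the current position.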
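The same idea can be written so that it returns the integer directly and never calls index. This is a minimal sketch, assuming well-formed input; roman_to_int and VALUES are names picked for this example, and each symbol is compared with its right-hand neighbour by pairing the list with a shifted copy of itself.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    """Convert a well-formed Roman numeral (any case) to an int."""
    nums = [VALUES[ch] for ch in text.upper()]
    total = 0
    # Pair every value with the one that follows it; the last value is
    # paired with 0, so it is always added.
    for current, following in zip(nums, nums[1:] + [0]):
        if current < following:
            total -= current  # subtractive position, e.g. the X in XC
        else:
            total += current
    return total


print(roman_to_int("cxcix"))    # 199
print(roman_to_int("MCMXCIV"))  # 1994
```

Because the loop works on positions implicitly through zip, repeated symbols such as the two C's in "cxcix" are handled independently.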

Appending results from a list to a string

Heavy python beginner here. I want to create a simple function for a PIN guessing game that receives two 4-digit lists ( [guess], [answer] ) and returns a string with 4 letters stating how close I am to guessing the correct [answer] sequence (eg. Higher, True, Lower, Higher)
However, I get a new list for each string:
def checkNumbers(guess,right):
    for n in range(4):
        result = []
        if guess[n] == right[n]:
            result.append("T") #true
        elif guess[n] < right[n]:
            result.append("H") #higher
        elif guess[n] > right[n]:
            result.append("L") #lower
        else:
            result.append("F") #false
        print (result)
    return
checkNumbers([1,2,3,5],[2,2,1,6])
The result should look like this:
checkNumbers([1,2,3,4], [2, 2, 1 , 6]) #call function with ([guess], [answer])
'HTLH' #returns a string stating how accurate [guess] is to [answer] list
Result looks like this however:
checkNumbers([1,2,3,5],[2,2,1,6])
['H']
['T']
['L']
['H']
Thanks very much in advance for any help I could get.
You can build a string instead of a list, or use "".join():
def checkNumbers(guess, right):
    result = ""
    for n in range(4):
        if guess[n] == right[n]:
            result += "T" # true
        elif guess[n] < right[n]:
            result += "H" # higher
        elif guess[n] > right[n]:
            result += "L" # lower
        else:
            result += "F" # false
    print(result)
But maybe you want to use the zip function:
def checkNumbers(guess, right):
    result = ""
    for g, r in zip(guess, right):
        if g == r:
            result += "T" # true
        elif g < r:
            result += "H" # higher
        elif g > r:
            result += "L" # lower
        else:
            result += "F" # false
    print(result)
Funny bonus here:
def checkNumbers(guess, right):
    print("".join("THL"[(g > r) + (g != r)] for g, r in zip(guess, right)))
I don't get why you need the else part: for numbers, ==, < and > already cover every case.
Initialize the list and print the result outside of the loop:
def checkNumbers(guess, right):
    result = []
    for n in range(4):
        # do loopy stuff
    print (result)
    return # not strictly necessary
If you do it inside, you are creating a new list on every iteration.
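Since the question asks for a function that returns the string, here is a small sketch that collects the letters once, outside the loop, and returns them joined instead of printing. The name check_numbers is chosen for this example, and the comparison assumes both lists hold plain numbers.

```python
def check_numbers(guess, right):
    """Return one letter per position: T (match), H (go higher), L (go lower)."""
    letters = []  # created once, before the loop
    for g, r in zip(guess, right):
        if g == r:
            letters.append("T")
        elif g < r:
            letters.append("H")
        else:
            letters.append("L")
    return "".join(letters)


print(check_numbers([1, 2, 3, 5], [2, 2, 1, 6]))  # HTLH
```

Returning the value leaves it to the caller to decide whether to print it or compare it with "TTTT".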

Select a number randomly with probability proportional to its magnitude from the given array of n elements

Ex 1: A = [0 5 27 6 13 28 100 45 10 79]
Let f(x) denote the number of times x is selected in 100 experiments.
f(100) > f(79) > f(45) > f(28) > f(27) > f(13) > f(10) > f(6) > f(5) > f(0)
My code:
def pick_a_number_from_list(A,l):
    Sum = 0
    #l = len(A)
    for i in range(l):
        Sum+=A[i]
    A_dash = []
    for i in range(l):
        b=A[i]/Sum
        A_dash.append(b)
    #print(A_dash)
    series = pd.Series(A_dash)
    cumsum = series.cumsum(skipna=False)
    #print(cumsum[9])
    sample_value = uniform(0.0,1.0)
    r = sample_value
    print(r)
    #for i in range(l):
    if r<cumsum[1]:
        return 1
    elif r>cumsum[1] and r <cumsum[2]:
        return 2
    elif r<cumsum[3]:
        return 3
    elif r<cumsum[4]:
        return 4
    elif r<cumsum[5]:
        return 5
    elif r<cumsum[6]:
        return 6
    elif r<cumsum[7]:
        return 7
    elif r<cumsum[8]:
        return 8
    elif r<cumsum[9]:
        return 9
def sampling_based_on_magnitued():
    A = [0,5,27,6,13,28,100,45,10,79]
    n = len(A)
    #for i in range(1,10):
    num = pick_a_number_from_list(A,n)
    print(A[num])
sampling_based_on_magnitued()
In my code I am using multiple if/elif statements, and because they are hardcoded I can only get the right output for a list of up to 10 elements.
I want to make my code dynamic so that it works for a list of any length; at the moment it is restricted to n=10.
Please tell me how I can write generic code that replaces all the if/elif statements with a for loop.
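from random import uniform
def pick_a_number_from_list(A):
    sum1 = 0
    for i in A:
        sum1 += i
    x = 0
    list1 = []
    for i in A:
        list1.append(x + i/sum1)
        x = x + i/sum1
    # list1 contains the cumulative sum
    bit = uniform(0, 1)
    for i in range(0, len(list1)):
        if bit < list1[i]:
            return A[i]
    return A[-1] # guard against floating-point round-off in the last cumulative value
You may use this.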
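The cumulative-sum loop above can also be expressed with the standard library's itertools.accumulate and bisect, which replaces the linear scan with a binary search. This is a sketch under stated assumptions: pick_proportional and its optional rng parameter are names invented for this example, and the values are assumed to be non-negative with at least one positive entry.

```python
import bisect
import itertools
import random


def pick_proportional(values, rng=random):
    """Pick one element with probability proportional to its magnitude.

    Assumes every value is non-negative and at least one is positive.
    """
    cumulative = list(itertools.accumulate(values))  # running totals
    r = rng.random() * cumulative[-1]                # uniform in [0, total)
    idx = bisect.bisect_right(cumulative, r)         # first running total > r
    return values[min(idx, len(values) - 1)]         # clamp against round-off


A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
print(pick_proportional(A))
```

Working with the raw running totals avoids dividing every element by the sum; scaling the random number by the total has the same effect.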
You can use random.choices:
import random
A = [0,5, 27, 6, 13, 28, 100, 45, 10, 79]
Say the number of random values you want to pick is 100, so k=100.
w = [0.0, 0.01597444089456869, 0.08626198083067092, 0.019169329073482427, 0.04153354632587859, 0.08945686900958466, 0.3194888178913738, 0.14376996805111822, 0.03194888178913738, 0.2523961661341853]
The weights are calculated as A[i]/(total sum of all the values of A).
x = random.choices(A,w,k=100)
print(x)
It displays values from list A according to their weights.
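random.choices treats its weights as relative, so the normalisation step can be skipped and the list itself passed as the weights. A short sketch, assuming the values are non-negative; the seed and the variable names are arbitrary choices made here so the run is repeatable.

```python
import random
from collections import Counter

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

rng = random.Random(0)  # arbitrary seed, used only to make the run repeatable
picks = rng.choices(A, weights=A, k=1000)  # weights are relative, no need to divide by sum(A)

counts = Counter(picks)
print(counts.most_common(3))  # the largest values should dominate
```

An element with weight 0, such as the first entry of A, is never selected.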
Some changes to Bitan Guha Roy's code so that it yields just one value:
import numpy as np
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
sum1 = 0
for i in A:
    sum1 += i
x = 0
list1 = []
for i in A:
    list1.append(x + i/sum1)
    x = x + i/sum1
# list1 contains cumulative sum
bit = np.random.uniform(0.0, 1.0)
for i in range(len(list1)):
    if bit < list1[i]:
        print(A[i]) # or return if under a function
        break # stop at the first bucket that contains bit
import random
lst=[0, 5 ,27, 6, 13, 28, 100, 45, 10,79]
def pick_a_number_from_list(A):
    total = sum(A) # sum the argument (not the global lst), and only once
    weights1=[]
    for i in A:
        weights1.append(i/total)
    selected_random_number = random.choices(A,weights=weights1,k=1)
    return selected_random_number[0] # random.choices returns a list
def sampling_based_on_magnitued():
    for i in range(100): # 100 experiments
        number=pick_a_number_from_list(lst)
        print(number)
sampling_based_on_magnitued()
# This uses random.choices, which picks random elements according to their respective weights. Please suggest any modifications if you have any.

Extending current code to include both median and mode

I have this code that I used for one assignment, but I can't figure out how to add the median and mode to it so that it runs without error.
def main():
    filename = input('File name: ')
    num=0
    try:
        infile = open(filename, 'r')
        count = 0
        total = 0.0
        average = 0.0
        maximum = 0
        minimum = 0
        range1 = 0
        for line in infile:
            num = int(line)
            count = count + 1
            total = total + num
            if count == 1:
                maximum = num
                minimum = num
            else:
                if num > maximum:
                    maximum = num
                if num < minimum:
                    minimum = num
        if count > 0:
            average = total / count
            range1 = maximum - minimum
I'll jump right in and show you the code. It's a very simple and quite pythonic solution.
Solution
import statistics
def open_file(filename):
    try:
        return open(filename, 'r')
    except OSError as e:
        print(e)
        return None
def main():
    # Read file. Note that we are trusting the user input here without sanitizing.
    fd = open_file(input('File name: '))
    if fd is None: # Ensure the file was opened
        return
    with fd: # Close the file once it has been read
        data = fd.read() # Read whole file
    if data.strip() == '':
        print("No data in file")
        return
    # Convert the non-empty lines to a list of integers
    lines = [int(item) for item in data.splitlines() if item.strip()]
    total_lines = len(lines)
    total_sum = sum(lines)
    maximum = max(lines)
    minimum = min(lines)
    # Here is the python magic, no need to reinvent the wheel!
    mean = statistics.mean(lines) # mean == average
    median = statistics.median(lines)
    mode = "No mode!"
    try:
        mode = statistics.mode(lines)
    except statistics.StatisticsError as ec:
        pass # No unique mode (Python < 3.8); Python 3.8+ returns the first mode found instead
    print("Total lines: " + str(total_lines))
    print("Sum: " + str(total_sum))
    print("Max: " + str(maximum))
    print("Min: " + str(minimum))
    print("Mean: " + str(mean))
    print("Median: " + str(median))
    print("Mode: " + str(mode))
if __name__ == '__main__':
    main()
Explanation
Generally, in Python, it's safe to assume that if you want to calculate a common value using a well-known algorithm, there will already be a function written to do just that. No need to reinvent the wheel!
These functions usually aren't hard to find online either. For instance, you can find suggestions regarding the statistics library by googling "python calculate the median".
Although you have the solution, I strongly advise looking through the source code of the statistics library (posted below) and working out for yourself how these functions work. It will help you grow as a developer and mathematician.
statistics.py
mean
def mean(data):
    """Return the sample arithmetic mean of data.
    >>> mean([1, 2, 3, 4, 4])
    2.8
    >>> from fractions import Fraction as F
    >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
    Fraction(13, 21)
    >>> from decimal import Decimal as D
    >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
    Decimal('0.5625')
    If ``data`` is empty, StatisticsError will be raised.
    """
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    T, total, count = _sum(data)
    assert count == n
    return _convert(total/n, T)
median
def median(data):
    """Return the median (middle value) of numeric data.
    When the number of data points is odd, return the middle data point.
    When the number of data points is even, the median is interpolated by
    taking the average of the two middle values:
    >>> median([1, 3, 5])
    3
    >>> median([1, 3, 5, 7])
    4.0
    """
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise StatisticsError("no median for empty data")
    if n%2 == 1:
        return data[n//2]
    else:
        i = n//2
        return (data[i - 1] + data[i])/2
mode
def mode(data):
    """Return the most common data point from discrete or nominal data.
    ``mode`` assumes discrete data, and returns a single value. This is the
    standard treatment of the mode as commonly taught in schools:
    >>> mode([1, 1, 2, 3, 3, 3, 3, 4])
    3
    This also works with nominal (non-numeric) data:
    >>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
    'red'
    If there is not exactly one most common value, ``mode`` will raise
    StatisticsError.
    """
    # Generate a table of sorted (value, frequency) pairs.
    table = _counts(data)
    if len(table) == 1:
        return table[0][0]
    elif table:
        raise StatisticsError(
            'no unique mode; found %d equally common values' % len(table)
        )
    else:
        raise StatisticsError('no mode for empty data')
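The mode source quoted above raises StatisticsError when several values are equally common; from Python 3.8 onwards statistics.mode returns the first such value instead, and statistics.multimode returns all of them. The sketch below assumes Python 3.8 or newer and one integer per line; summarize is a name made up for this example, and it takes the file contents as a string so it can be tried without a file.

```python
import statistics


def summarize(text):
    """Return basic statistics for text holding one integer per line.

    Blank lines are skipped; returns None when there are no numbers.
    """
    numbers = [int(line) for line in text.splitlines() if line.strip()]
    if not numbers:
        return None
    return {
        "count": len(numbers),
        "range": max(numbers) - min(numbers),
        "mean": statistics.mean(numbers),
        "median": statistics.median(numbers),
        "modes": statistics.multimode(numbers),  # every most-common value
    }


print(summarize("1\n2\n2\n3\n"))
print(summarize("1\n1\n2\n2\n")["modes"])  # [1, 2]
```

Reporting a list of modes sidesteps the question of what to print when there is no single most common value.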

Greedy Motif Search in Python

I am studying the Bioinformatics course at Coursera, and have been stuck on the following problem for 5 days:
Implement GreedyMotifSearch.
Input: Integers k and t, followed by a collection of strings Dna.
Output: A collection of strings BestMotifs resulting from applying GreedyMotifSearch(Dna, k, t).
If at any step you find more than one Profile-most probable k-mer in a given string, use the
one occurring first.
Here's my attempt to solve this (I just copied it from my IDE, so pardon any print statements):
def GreedyMotifSearch(DNA, k, t):
    """
    Documentation here
    """
    import math
    bestMotifs = []
    bestScore = math.inf
    for string in DNA:
        bestMotifs.append(string[:k])
    base = DNA[0]
    for i in window(base, k):
        newMotifs = []
        for j in range(t):
            profile = ProfileMatrix([i])
            probable = ProfileMostProbable(DNA[j], k, profile)
            newMotifs.append(probable)
        if Score(newMotifs) <= bestScore:
            bestScore = Score(newMotifs)
            bestMotifs = newMotifs
    return bestMotifs
The helper functions are these:
def SymbolToNumber(Symbol):
    """
    Converts base to number (in lexicographical order)
    Symbol: the letter to be converted (str)
    Returns: the number corresponding to that base (int)
    """
    if Symbol == "A":
        return 0
    elif Symbol == "C":
        return 1
    elif Symbol == "G":
        return 2
    elif Symbol == "T":
        return 3
def NumberToSymbol(index):
    """
    Finds base from number (in lexicographical order)
    index: the number to be converted (int)
    Returns: the base corresponding to index (str)
    """
    if index == 0:
        return str("A")
    elif index == 1:
        return str("C")
    elif index == 2:
        return str("G")
    elif index == 3:
        return str("T")
def HammingDistance(p, q):
    """
    Finds the number of mismatches between 2 DNA segments of equal lengths
    p: first DNA segment (str)
    q: second DNA segment (str)
    Returns: number of mismatches (int)
    """
    return sum(s1 != s2 for s1, s2 in zip(p, q))
def window(s, k):
    for i in range(1 + len(s) - k):
        yield s[i:i+k]
def ProfileMostProbable(Text, k, Profile):
    """
    Finds a k-mer that was most likely to be generated by profile among
    all k-mers in Text
    Text: given DNA segment (str)
    k: length of pattern (int)
    Profile: a 4x4 matrix (list)
    Returns: profile-most probable k-mer (str)
    """
    letter = [[] for key in range(k)]
    probable = ""
    hamdict = {}
    index = 1
    for a in range(k):
        for j in "ACGT":
            letter[a].append(Profile[j][a])
    for b in range(len(letter)):
        number = max(letter[b])
        probable += str(NumberToSymbol(letter[b].index(number)))
    for c in window(Text, k):
        for x in range(len(c)):
            y = SymbolToNumber(c[x])
            index *= float(letter[x][y])
        hamdict[c] = index
        index = 1
    for pat, ham in hamdict.items():
        if ham == max(hamdict.values()):
            final = pat
            break
    return final
def Count(Motifs):
    """
    Documentation here
    """
    count = {}
    k = len(Motifs[0])
    for symbol in "ACGT":
        count[symbol] = []
        for i in range(k):
            count[symbol].append(0)
    t = len(Motifs)
    for i in range(t):
        for j in range(k):
            symbol = Motifs[i][j]
            count[symbol][j] += 1
    return count
def FindConsensus(motifs):
    """
    Finds a consensus sequence for given list of motifs
    motifs: a list of motif sequences (list)
    Returns: consensus sequence of motifs (str)
    """
    consensus = ""
    for i in range(len(motifs[0])):
        countA, countC, countG, countT = 0, 0, 0, 0
        for motif in motifs:
            if motif[i] == "A":
                countA += 1
            elif motif[i] == "C":
                countC += 1
            elif motif[i] == "G":
                countG += 1
            elif motif[i] == "T":
                countT += 1
        if countA >= max(countC, countG, countT):
            consensus += "A"
        elif countC >= max(countA, countG, countT):
            consensus += "C"
        elif countG >= max(countC, countA, countT):
            consensus += "G"
        elif countT >= max(countC, countG, countA):
            consensus += "T"
    return consensus
def ProfileMatrix(motifs):
    """
    Finds the profile matrix for given list of motifs
    motifs: list of motif sequences (list)
    Returns: the profile matrix for motifs (list)
    """
    Profile = {}
    A, C, G, T = [], [], [], []
    for j in range(len(motifs[0])):
        countA, countC, countG, countT = 0, 0, 0, 0
        for motif in motifs:
            if motif[j] == "A":
                countA += 1
            elif motif[j] == "C":
                countC += 1
            elif motif[j] == "G":
                countG += 1
            elif motif[j] == "T":
                countT += 1
        A.append(countA)
        C.append(countC)
        G.append(countG)
        T.append(countT)
    Profile["A"] = A
    Profile["C"] = C
    Profile["G"] = G
    Profile["T"] = T
    return Profile
def Score(motifs):
    """
    Finds score of motifs relative to the consensus sequence
    motifs: a list of given motifs (list)
    Returns: score of given motifs (int)
    """
    consensus = FindConsensus(motifs)
    score = 0.0000
    for motif in motifs:
        score += HammingDistance(consensus, motif)
    #print(score)
    return round(score, 4)
It seems fine to me. However, when I run this code for quiz problems, it gives an incorrect answer. Their code grading system shows this error:
Failed test #3. Your indexing may be off by one at the beginning of each string in Dna.
I have tried everything I can think of and run this code on all their sample data and debug data, but I simply can't figure out how to make this code work. Please help me with any possible solutions to this.
You have a few problems. I think this should address them all. I've included comments explaining each change along with your original code and a reference to the relevant Pseudocode in the debug data page you linked to.
def GreedyMotifSearch(DNA, k, t):
    """
    Documentation here
    """
    import math
    bestMotifs = []
    bestScore = math.inf
    for string in DNA:
        bestMotifs.append(string[:k])
    base = DNA[0]
    for i in window(base, k):
        # Change here. Should start with one element in motifs and build up.
        # As in the line "motifs ← list with only Dna[0](i,k)"
        # newMotifs = []
        newMotifs = [i]
        # Change here to iterate over len(DNA).
        # Should go through "for j from 1 to |Dna| - 1"
        # for j in range(t):
        for j in range(1, len(DNA)):
            # Change here. Should build up motifs and build profile using them.
            # profile = ProfileMatrix([i])
            profile = ProfileMatrix(newMotifs)
            probable = ProfileMostProbable(DNA[j], k, profile)
            newMotifs.append(probable)
        # Change to < rather than <= so that the earliest hit is kept. As referenced in the instructions:
        # If at any step you find more than one Profile-most probable k-mer in a given string, use the one occurring **first**.
        if Score(newMotifs) < bestScore:
        #if Score(newMotifs) <= bestScore:
            bestScore = Score(newMotifs)
            bestMotifs = newMotifs
    return bestMotifs
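The "use the one occurring first" rule also applies inside the profile-most-probable step, including the case where every k-mer has probability 0. The sketch below is one way to make that tie-break explicit; it is not the course's reference code. profile_most_probable is a name chosen here, and the profile is assumed to be a dict mapping each of "A", "C", "G", "T" to a list of k non-negative numbers, like the one ProfileMatrix returns.

```python
def profile_most_probable(text, k, profile):
    """Return the k-mer of text with the highest profile score.

    On ties (including all-zero scores) the earliest k-mer wins, because
    a later k-mer only replaces the best one when it is strictly better.
    """
    best_kmer, best_prob = text[:k], -1.0
    for start in range(len(text) - k + 1):
        kmer = text[start:start + k]
        prob = 1.0
        for pos, symbol in enumerate(kmer):
            prob *= profile[symbol][pos]
        if prob > best_prob:  # strict comparison keeps the first occurrence
            best_kmer, best_prob = kmer, prob
    return best_kmer


profile = {"A": [0.5, 0.0], "C": [0.5, 1.0], "G": [0.0, 0.0], "T": [0.0, 0.0]}
print(profile_most_probable("ACCCAC", 2, profile))  # AC
print(profile_most_probable("GGT", 2, profile))     # GG
```

Starting best_prob below zero means the first window is always accepted, so a string whose k-mers all score 0 still yields its first k-mer.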
