I cannot, for the life of me, figure out this school problem. Could somebody point me in the right direction?
Given a list of words, select the one that earns the highest score.
ScrabbleAssistant
+ getBestWord(List<String> options, Map<Character, Integer> letterScores) : String
+ getScore(String word, Map<Character, Integer> letterScores) : int
We will give you a list of words and a dictionary that tells you how many points each letter is worth. Return the word that earns the highest score.
I normally wouldn't do someone's homework for them, but here is the pseudocode:
maxScore = 0
bestWord = ""
for each word in options
    totalScore = 0
    for each letter in word
        totalScore = totalScore + lookup value of letter in letterScores
    if totalScore > maxScore
        maxScore = totalScore
        bestWord = word
return bestWord
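A minimal Python sketch of the pseudocode above, for illustration only. The assignment itself uses Java (`getBestWord`/`getScore`), so the snake_case names here are stand-ins, and letters missing from `letter_scores` are assumed to score 0.

```python
def get_score(word, letter_scores):
    """Sum the point value of every letter in word (unknown letters assumed to score 0)."""
    total_score = 0
    for letter in word:
        total_score += letter_scores.get(letter, 0)
    return total_score


def get_best_word(options, letter_scores):
    """Return the highest-scoring word; on a tie the earlier word wins."""
    max_score = 0
    best_word = ""
    for word in options:
        total_score = get_score(word, letter_scores)
        if total_score > max_score:  # strict '>' keeps the first of equal scores
            max_score = total_score
            best_word = word
    return best_word


print(get_best_word(["ab", "cab", "b"], {"a": 1, "b": 3, "c": 3}))  # cab
```

Because `max_score` starts at 0, the function returns "" if every word scores 0 (or the list is empty), exactly like the pseudocode.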
I came across the following problem and don't know how to solve it. The problem is to find the length of the longest run of consecutive A's and return that length for every string in this list:
['>KF735813.1 HIV-1 isolate Cameroon1(ViroSeq) HIV DR 02 from Cameroon pol protein (pol) gene, partial cds',
'CCTCAAATCACTCTTTGGCAACGACCCTTAGTCACAGTTAGGATAGAGGGACAGTTAATAGAAGCCCTATTAGACACAGG',
'GGCAGATGATACAGTATTAGAAGAGATAAATTTACCAGGAAAATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTA',
'TCAAAGTAAGACAGTATGATCAGATACTTATAGAAATTTGTGGAAAAAGGGCCATAGGTACAGTATTAGTAGGACCTACA',
'CCTGTCAACATAATTGGACGAAACATGTTGACTCAGATTGGTTGTACTTTAAATTTTCCAATTAGTCCTATTGAAACTGT',
'GCCAGTAAAATTAAAGCCAGGTATGGATGGCCCAAAGGTAAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAA',
'CAGAAATTTGTACAGATATGGAAAAGGAGGGAAAAATTTCAATAATTGGGCCTGAAAATCCATATAATACTCCAGTATTT',
'GCCATAAAGAAAAAAGATAGTACTAAATGGAGAAAATTAGTAGATTTTAGAGAACTTAATAAGAGAACTCAAGACTTCTG',
'GGAGATCCAATTAGGAATACCTCATCCCGCGGGATTAAAAAAGAACAAATCAGTAACAGTACTAGATGTGGGGGATGCAT',
'ATTTTTCAGTTCCCTTAGATTAAGACTTTAGAAAGTACACTGCATTCACTATACCTAGTTTAAATAATGCAACACCAGGT',
'ATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTTCAGGCAAGCATGACAAAAATCTT',
'AGAGCCCTTTAGGACAAAATATCCAGAAATAGTGATCTACCAATATATGGATGATTTATATGTAGGATCAGACTTAGAGA',
'TAGGGCAGCATAGAGCAAAAATAGAGGAGTTGAGAGTACATCTATTGAAGTGGGGATTTACCACACCAGACAAAAAACAT',
'CAGAAAGAACCTCCATTTCTTTGGATGGGATATGAACTCCATCCTGACAAATGGACAGTCCAGCCTATACAGCTGCCAGA',
'AAAAGACAGCTGGACTGTCAATGATATACAGAAATTAGTGGGAAAACTAAATTGGGCAAGTCAGATTTATGCAGGAATTA',
'AAGTAAAGCAACTGTGTAGACTCCTCAGGGGAGCCAAAGCACTAACAGAGGTAGTACCACTAACTGAGGAAGCAGAATTA',
'GAATTGGCAGATAACAGGGAGATTCTAAAAGAACCTGTACATGGAGTATATTATGACCCAACAAAAGACTTAGTAGCAGA',
'AATACAGAAGCAAGGGCAAGAC']
Here is the function I tried to write, but I know it's the wrong approach:
for c in range(len(fastarec_Lines)):
    if fastarec_Lines[c].count('A') == current:
        count += 1
    else:
        count = 1
        current = fastarec_Lines[c]
    maximum = max(count, maximum)
return maximum
Can someone help me out?
One approach is to do a regex findall search on the pattern A+, sort the resulting matches by length, and print the last element:
import re

seq = "AATTGGCCAAAAATTGCA"
matches = re.findall(r'A+', seq)
matches.sort(key=len)  # shortest run first, longest run last
print("longest string is " + matches[-1] + " with a length of " + str(len(matches[-1])))
This prints:
longest string is AAAAA with a length of 5
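Sorting is not strictly needed; `max` can pick the longest run in one pass. The sketch below also covers the original question (one length per string), assuming the FASTA header line (starting with '>') should be skipped and that a string with no A should report 0. The helper names are hypothetical.

```python
import re


def longest_a_run(seq):
    """Length of the longest run of consecutive 'A' characters in seq (0 if none)."""
    return max((len(run) for run in re.findall(r'A+', seq)), default=0)


def longest_runs(lines):
    """Longest A-run for every sequence line, skipping FASTA header lines ('>...')."""
    return [longest_a_run(line) for line in lines if not line.startswith('>')]


print(longest_a_run("AATTGGCCAAAAATTGCA"))  # 5
print(longest_runs(['>header', 'GAAAT', 'AC', 'CCGT']))  # [3, 1, 0]
```

Because the record is split into fixed-width lines, a run can continue across a line break. If you want the longest run in the whole sequence rather than per line, apply `longest_a_run` to the sequence lines joined together.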
I've got a list of strings like
Foobar
Foobaron
Foot
barstool
barfoo
footloose
I want to find the set of shortest possible sub-sequences that are unique to each string in the set; the characters in each sub-sequence do not need to be adjacent, just in the order they appear in the original string. For the example above, that would be (among other possibilities):
Fb (as unique to Foobar as it gets; collision with Foobaron unavoidable)
Fn (unique to Foobaron, no other ...F...n...)
Ft (Foot)
bs (barstool)
bf (barfoo)
e (footloose)
Is there an efficient way to mine such sequences and minimize the number of colliding strings (when collisions can't be avoided, e.g. when strings are substrings of other strings) from a given array of strings? More precisely: given a maximum length N, which set of sub-sequences of up to N characters each identifies the original strings with the fewest collisions?
I wouldn't really call this 'efficient', but you can do better than the totally naive approach like this:
words = ['Foobar', 'Foobaron', 'Foot', 'barstool', 'barfoo', 'footloose']
N = 2
n = len(words)
L = max([len(word) for word in words])

def generate_substrings(word, max_length=None):
    if max_length is None:
        max_length = len(word)
    set_substrings = set()
    set_substrings.add('')
    for charac in word:
        new_substr_list = []
        for substr in set_substrings:
            new_substr = substr + charac
            if len(new_substr) <= max_length:
                new_substr_list.append(new_substr)
        set_substrings.update(new_substr_list)
    return set_substrings

def get_best_substring_for_each(string_list=words, max_length=N):
    all_substrings = {}
    best = {}
    for word in string_list:
        for substring in generate_substrings(word, max_length=max_length):
            if substring not in all_substrings:
                all_substrings[substring] = 0
            all_substrings[substring] = all_substrings[substring] + 1
    for word in string_list:
        best_score = len(string_list) + 1
        best[word] = ''
        for substring in generate_substrings(word=word, max_length=max_length):
            if all_substrings[substring] < best_score:
                best[word] = substring
                best_score = all_substrings[substring]
    return best

print(get_best_substring_for_each(words, N))
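This program prints a solution such as the following (ties between equally rare sub-sequences are broken by set iteration order, so the exact picks can vary between runs):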
{'barfoo': 'af', 'Foobar': 'Fr', 'Foobaron': 'n', 'footloose': 'os', 'barstool': 'al', 'Foot': 'Ft'}
This can easily be improved by a constant factor, for instance by storing the results of generate_substrings instead of computing them twice.
The complexity is O(n*C(L+N, N)), where n is the number of words, L the maximum length of a word, and C(n, k) the number of combinations of k elements out of n. (A word of length L has at most the sum over k = 0..N of C(L, k) sub-sequences of up to N characters, and that sum is bounded by C(L+N, N).)
I don't think (not sure though) that you can do much better in the worst case, because it seems hard to avoid enumerating all possible substrings (the last one to be evaluated could be the only one with no redundancy...). Maybe on average you can do better...
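A sketch of the constant-factor improvement mentioned above: each word's sub-sequences are generated once and reused. As a swapped-in simplification it uses `itertools.combinations` (which yields characters in their original order, i.e. sub-sequences), drops the empty sub-sequence, and breaks ties deterministically by (collision count, length, alphabetical order). The function names are hypothetical, and every word is assumed to be non-empty.

```python
from collections import Counter
from itertools import combinations


def subsequences(word, max_length):
    """All distinct non-empty sub-sequences of word with at most max_length characters."""
    return {
        ''.join(combo)
        for k in range(1, max_length + 1)
        for combo in combinations(word, k)
    }


def best_subsequence_for_each(words, max_length):
    # Generate each word's sub-sequences once and reuse them for both passes.
    subs_by_word = {word: subsequences(word, max_length) for word in words}
    # Number of words that contain each sub-sequence.
    counts = Counter(sub for subs in subs_by_word.values() for sub in subs)
    # Fewest colliding words first, then shorter, then alphabetical.
    return {
        word: min(subs, key=lambda sub: (counts[sub], len(sub), sub))
        for word, subs in subs_by_word.items()
    }


print(best_subsequence_for_each(
    ['Foobar', 'Foobaron', 'Foot', 'barstool', 'barfoo', 'footloose'], 2))
```

Preferring shorter sub-sequences among equally rare ones is a design choice here; for example, it picks 'e' for footloose rather than a two-letter alternative.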
You could use a modification of the longest common subsequence algorithm. In this case you are seeking the shortest unique subsequence. Shown below is part of a dynamic programming solution, which is more efficient than a naive recursive solution. The modifications to the longest common subsequence algorithm are described in the comments below:
for (int i = 1; i <= string1.Length; i++)
    for (int j = 1; j <= string2.Length; j++)
        if (string1[i-1] != string2[j-1])  // find characters in the strings that are distinct
            SUS[i][j] = SUS[i-1][j-1] + 1;  // SUS: Shortest Unique Subsequence
        else
            SUS[i][j] = min(SUS[i-1][j], SUS[i][j-1]);  // find minimum size of distinct strings
You can then put this code in a function and call this function for each string in your set to find the length of the shortest unique subsequence in the set.
Once you have the length of the shortest unique subsequence, you can backtrack to print the subsequence.
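For comparison, here is a sketch of a different, standard recurrence for the two-string case (not the modified-LCS table above): best(i, j) is the length of the shortest sub-sequence of s[i:] that is not a sub-sequence of t[j:]. Taking s[i] greedily matches it at its first occurrence k in t[j:], so the rest must avoid t[k+1:]. The function name is hypothetical, and the memoized recursion assumes short strings (recursion depth grows with len(s)).

```python
from functools import lru_cache


def shortest_uncommon_subsequence(s, t):
    """Length of the shortest sub-sequence of s that is NOT a sub-sequence of t.

    Returns None if every sub-sequence of s also occurs in t (s is a sub-sequence of t).
    """
    INF = float('inf')
    # next_pos[j][ch] = smallest k >= j with t[k] == ch (key absent if there is none).
    next_pos = [dict() for _ in range(len(t) + 1)]
    for j in range(len(t) - 1, -1, -1):
        next_pos[j] = dict(next_pos[j + 1])
        next_pos[j][t[j]] = j

    @lru_cache(maxsize=None)
    def best(i, j):
        if i == len(s):
            return INF  # nothing left in s to build a sub-sequence from
        k = next_pos[j].get(s[i])
        if k is None:
            return 1  # s[i] alone does not occur in t[j:]
        # Either skip s[i], or take it (greedily matched at t[k]).
        return min(best(i + 1, j), 1 + best(i + 1, k + 1))

    result = best(0, 0)
    return None if result == INF else result


print(shortest_uncommon_subsequence('ab', 'ba'))  # 2
```

For a whole set, a sub-sequence unique to a word must be uncommon with every other word, so the maximum of these pairwise lengths is a lower bound (not necessarily achieved) on that word's shortest unique sub-sequence.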
You should use a modified trie structure and insert the strings into it like this:
Foo-bar-on
   -t
bar-stool
   -foo
The rest is straightforward: just choose the correct first character (node[0]) of each compressed node.
A radix tree should help.
I have a list of words in a pandas DataFrame (DF):
Words
Shirt
Blouse
Sweater
What I'm trying to do is swap out certain letters in those words with letters from my dictionary, one letter at a time.
For example:
mydict = {"e":"q,w",
"a":"z"}
would create a new list that first replaces each "e" one at a time, and then iterates again replacing each "a" one at a time:
Words
Shirt
Blouse
Sweater
Blousq
Blousw
Swqater
Swwater
Sweatqr
Sweatwr
Swezter
I've been looking around at solutions here: Mass string replace in python?
and have tried the following code, but it changes all instances of "e" at once instead of one at a time. Any help?
mydict = {"e":"q,w"}
s = DF
for k, v in mydict.items():
    for j in v:
        s['Words'] = s["Words"].str.replace(k, j)
DF["Words"] = s
This doesn't seem to work either:
s = DF.replace({"Words": {"e": "q","w"}})
This answer is very similar to Brian's answer, but a bit cleaner, and the output has no duplicates:
words = ["Words", "Shirt", "Blouse", "Sweater"]
md = {"e": "q,w", "a": "z"}
md = {k: v.split(',') for k, v in md.items()}
newwords = []
for word in words:
    newwords.append(word)
    for c in md:
        occ = word.count(c)
        pos = 0
        for _ in range(occ):
            pos = word.find(c, pos)
            for r in md[c]:
                tmp = word[:pos] + r + word[pos+1:]
                newwords.append(tmp)
            pos += 1
Content of newwords:
['Words', 'Shirt', 'Blouse', 'Blousq', 'Blousw', 'Sweater', 'Swqater', 'Swwater', 'Sweatqr', 'Sweatwr', 'Swezter']
Prettyprint:
Words
Shirt
Blouse
Blousq
Blousw
Sweater
Swqater
Swwater
Sweatqr
Sweatwr
Swezter
Any errors are a result of the current time. ;)
Update (explanation)
tl;dr
The main idea is to find the occurrences of the character in the word one after another. Each occurrence is then replaced with each replacement character (again one after another), and each replaced word gets added to the output list.
I will try to explain everything step by step:
words = ["Words", "Shirt", "Blouse", "Sweater"]
md = {"e": "q,w", "a": "z"}
Well. Your basic input. :)
md = {k: v.split(',') for k, v in md.items()}
This makes the replacement dictionary simpler to work with: md now looks like {"e": ["q", "w"], "a": ["z"]}. We no longer have to handle "q,w" and "z" differently; the replacement step is the same for both and doesn't care that "a" has only one replacement character.
newwords = []
The new list to store the output in.
for word in words:
newwords.append(word)
We have to do these steps for each word (I assume the reason is clear). We also append the word directly to our newly created output list (newwords).
for c in md:
c is short for character. For each character we want to replace (all keys of md), we do the following.
occ = word.count(c)
occ is for occurrences (yeah, count would fit as well :P). word.count(c) returns the number of occurrences of the character/string c in word, so "Sweater".count("o") => 0 and "Sweater".count("e") => 2.
We use this to know how many times we have to search word to find all occurrences of c.
pos = 0
Our start position for searching for c in word. It comes into use in the next loop.
for _ in range(occ):
For each occurrence. Since the loop counter itself is of no use to us here, we "discard" it by naming it _. At this point we don't yet know where c is in word.
pos = word.find(c, pos)
Oh, look: we found c. :) word.find(c, pos) returns the index of the first occurrence of c in word at or after pos. At the beginning pos is 0, so this is the first occurrence of c in the whole string. Note that this call also updates pos. Together with the last line (pos += 1), this moves the search window for the next round to start just after the previous occurrence of c.
for r in md[c]:
Now you see why we converted md earlier: we can easily iterate over md[c] now (calling md[c].split(',') on the original md would do the job as well). So we now do the replacement once for each replacement character.
tmp = word[:pos] + r + word[pos+1:]
The actual replacement. We store it in tmp (for debugging). word[:pos] gives us word up to the current occurrence of c (excluding c), r is the replacement, and word[pos+1:] adds the rest of the word (again without c).
newwords.append(tmp)
The newly created word tmp now goes into our output list (newwords).
pos += 1
The already mentioned adjustment of pos to "jump over c".
Additional question from OP: Is there an easy way to dictate how many letters in the string I want to replace [(meaning e.g. multiple at a time)]?
Sure. But I currently have only a vague idea of how to achieve this. I'll look at it once I've had some sleep. ;)
words = ["Words", "Shirt", "Blouse", "Sweater", "multipleeee"]
md = {"e": "q,w", "a": "z"}
md = {k: v.split(',') for k, v in md.items()}
num = 2  # number of occurrences replaced at a time
newwords = []
for word in words:
    newwords.append(word)
    for char in md:
        for r in md[char]:
            pos = multiples = 0
            current_word = word
            while current_word.find(char, pos) != -1:
                pos = current_word.find(char, pos)
                current_word = current_word[:pos] + r + current_word[pos+1:]
                pos += 1
                multiples += 1
                if multiples == num:
                    newwords.append(current_word)
                    multiples = 0
                    current_word = word
Content of newwords:
['Words', 'Shirt', 'Blouse', 'Sweater', 'Swqatqr', 'Swwatwr', 'multipleeee', 'multiplqqee', 'multipleeqq', 'multiplwwee', 'multipleeww']
Prettyprint:
Words
Shirt
Blouse
Sweater
Swqatqr
Swwatwr
multipleeee
multiplqqee
multipleeqq
multiplwwee
multipleeww
I added multipleeee to demonstrate how the replacement works: for num = 2, the first two occurrences are replaced, then the next two, so the replaced parts never overlap. If you want something like ['multiplqqee', 'multipleqqe', 'multipleeqq'], you would have to store the position of the "first" occurrence of char in the current group. You can then reset pos to just after that position in the if multiples == num: block.
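The overlapping variant described above can also be sketched without resetting pos: collect all positions of char first, then slide a window of num consecutive occurrences over them (consecutive in the list of occurrences, not necessarily adjacent in the word). This precomputed position list is a swapped-in technique rather than the find/pos loop above; sliding_replacements is a hypothetical name, and char is assumed to be a single character.

```python
def sliding_replacements(word, char, replacement, num):
    """Replace num consecutive occurrences of char, sliding the window by one occurrence."""
    positions = [k for k, ch in enumerate(word) if ch == char]
    results = []
    for start in range(len(positions) - num + 1):
        chars = list(word)
        for p in positions[start:start + num]:
            chars[p] = replacement
        results.append(''.join(chars))
    return results


print(sliding_replacements("multipleeee", "e", "q", 2))
# ['multiplqqee', 'multipleqqe', 'multipleeqq']
```

A word with fewer than num occurrences of char produces no variants, matching the behaviour of the loop above for "a" in "Sweater".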
If you got further questions, feel free to ask. :)
Because you need to replace letters one at a time, this doesn't sound like a good problem to solve with pandas, since pandas is about doing everything at once (vectorized operations). I would dump out your DataFrame into a plain old list and use list operations:
words = DF["Words"].tolist()
for find, replace in reversed(sorted(mydict.items())):
    for word in words:
        occurrences = word.count(find)
        if not occurrences:
            print(word)
            continue
        start_index = 0
        for i in range(occurrences):
            for replace_char in replace.split(","):
                modified_word = list(word)
                index = modified_word.index(find, start_index)
                modified_word[index] = replace_char
                modified_word = "".join(modified_word)
                print(modified_word)
            start_index = index + 1
Which gives:
Words
Shirt
Blousq
Blousw
Swqater
Swwater
Sweatqr
Sweatwr
Words
Shirt
Blouse
Swezter
Instead of printing the words, you can append them to a list and re-create a DataFrame if that's what you want to end up with.
If you are looping, you need to update s on each pass through the loop. You also need to loop over the individual replacement characters in v.
mydict = {"e":"q,w"}
s = deduped
for k, v in mydict.items():
    for j in v.split(","):
        s = s.replace(k, j)
Then reassign it to your dataframe:
df["Words"] = s
If you can write this as a function that takes a 1-D array (list, NumPy array, etc.), you can use df.apply() to apply it to any column.
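A sketch of the per-element function idea, assuming a function that takes a single word (as `Series.apply` passes one element at a time) and returns the word followed by every variant with exactly one occurrence replaced. The name one_at_a_time is hypothetical, and the comma-separated replacement format is taken from the question's mydict.

```python
def one_at_a_time(word, mapping):
    """Return word followed by every variant where exactly one occurrence is replaced."""
    variants = [word]
    for old, news in mapping.items():
        for pos, ch in enumerate(word):
            if ch == old:
                for new in news.split(','):
                    variants.append(word[:pos] + new + word[pos + 1:])
    return variants


print(one_at_a_time("Sweater", {"e": "q,w", "a": "z"}))
# ['Sweater', 'Swqater', 'Swwater', 'Sweatqr', 'Sweatwr', 'Swezter']
```

With pandas 0.25 or later, something like `DF["Words"].apply(lambda w: one_at_a_time(w, mydict)).explode().reset_index(drop=True)` should give one row per word; alternatively, collect the lists in plain Python and rebuild a DataFrame with `pd.DataFrame({"Words": all_words})`.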
So far this is what I have. It just prints out a list with the count of each letter in the string, and it only checks lowercase letters.
S = """Four score and seven years ago our fathers brought forth on this continent a new nation
Now we are engaged in a great civil war
"""
lowlet = S.lower()
L_count = [0] * 26
total_count = 0
alpha = "abcdefghijklmnopqrstuvwxyz"
i = 0
while i < len(lowlet):
    if lowlet[i] in alpha:
        L_count[ord(lowlet[i]) - 97] += 1
        total_count += 1
    i += 1
print('total count of letters:', total_count)
Now I've been given this algorithm, but I can't put it into code, and I can't use a for loop; I have to use a while loop:
Initialize a list, L_freq.
For each element, count, in L_count:
    Find the letter corresponding to this count
    Insert in L_freq the list: [count, letter]
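A minimal sketch of that algorithm using only while loops, assuming "insert" means appending, so L_freq ends up in alphabetical order. The helper names letter_counts and letter_freq are hypothetical; index i of L_count corresponds to the i-th letter of the alphabet, just as in the counting code above.

```python
ALPHA = "abcdefghijklmnopqrstuvwxyz"


def letter_counts(text):
    """Count each lowercase letter (after lowering text) with a while loop."""
    counts = [0] * 26
    lowered = text.lower()
    i = 0
    while i < len(lowered):
        if lowered[i] in ALPHA:
            counts[ord(lowered[i]) - ord('a')] += 1
        i += 1
    return counts


def letter_freq(counts):
    """Build L_freq as [count, letter] pairs, one per letter, using a while loop."""
    freq = []
    i = 0
    while i < len(counts):
        letter = ALPHA[i]  # position i in the counts list is the i-th letter
        freq.append([counts[i], letter])
        i += 1
    return freq


L_freq = letter_freq(letter_counts("abca!"))
print(L_freq[:3])  # [[2, 'a'], [1, 'b'], [1, 'c']]
```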
Is it a requirement that it be a list? I feel like a dictionary would be easier to handle:
alpha = "abcdefghijklmnopqrstuvwxyz"
sentence = S.lower()
counts = {letter: sentence.count(letter) for letter in alpha}
print(counts)
This prints something like:
{'a': 5, 'b': 2}
You are given a string and can change at most Q letters in it. You are also given a list of substrings (each two characters long), each with a corresponding score. Each occurrence of a substring within the string adds its score to your total. What is the maximum attainable score?
String length <= 150, Q <= 100, Number of Substrings <= 700
Example:
String = bpdcg
Q = 2
Substrings:
bz - score: 2
zd - score: 5
dm - score: 7
ng - score: 10
In this example, you can achieve the maximum score by changing the "p" in the string to a "z" and the "c" to an "n". Your new string is then "bzdng", which has a score of 2+5+10 = 17.
I know that, given a string that already has the letters changed, the score can be checked in linear time using a dictionary-matching algorithm such as Aho-Corasick (or, with slightly worse complexity, Rabin-Karp). However, trying every possible set of substitutions and then checking each resulting string would take far too long.
Another possible method I thought of is to work backwards: construct the ideal string from the given substrings and then check whether it differs from the original string in at most Q characters. However, I am not sure how to do this, and even if it could be done, I think it would also take too long.
What is the best way to go about this?
An efficient way to solve this is to use dynamic programming.
Let L be the set of letters that start any of the length-2 scoring substrings, plus a special letter "*" that stands for any letter not in this set.
Let S(i, j, c) be the maximum score achievable for the prefix of the string up to index i using at most j substitutions, where that prefix ends with character c (c in L).
The recurrence relations are a bit messy (or at least, I didn't find a particularly beautiful formulation of them), but here's some code that computes the largest score possible:
infinity = 100000000

def S1(L1, L2, s, i, j, c, scores, cache):
    key = (i, j, c)
    if key not in cache:
        if i == 0:
            if c != '*' and s[0] != c:
                v = 0 if j >= 1 else -infinity
            else:
                v = 0 if j >= 0 else -infinity
        else:
            v = -infinity
            for d in L1:
                for c2 in ([c] if c != '*' else L2 + s[i]):
                    jdiff = 1 if s[i] != c2 else 0
                    score = S1(L1, L2, s, i-1, j-jdiff, d, scores, cache)
                    score += scores.get(d+c2, 0)
                    v = max(v, score)
        cache[key] = v
    return cache[key]

def S(s, Q, scores):
    L1 = ''.join(sorted(set(w[0] for w in scores))) + '*'
    L2 = ''.join(sorted(set(w[1] for w in scores)))
    return S1(L1, L2, s + '.', len(s), Q, '.', scores, {})

print(S('bpdcg', 2, {'bz': 2, 'zd': 5, 'dm': 7, 'ng': 10}))
There's some room for optimisation:
the computation isn't terminated early if j goes negative
when given a choice, every value of L2 is tried, whereas only letters that can complete a scoring word from d need trying.
Overall, if there are k different letters in the scoring words, the algorithm runs in O(QNk^2) time, where N is the length of the string. With the second optimisation above, this can be reduced to O(QNw), where w is the number of scoring words.
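When experimenting with the DP, an exhaustive reference for tiny inputs can help catch mistakes. The sketch below (hypothetical names, exponential time, for small test cases only) tries every choice of at most Q positions and replacement letters. It only tries letters that appear in some scoring pair, assuming all scores are non-negative: any other letter makes both neighbouring pairs score 0, which is never better than leaving the position unchanged.

```python
from itertools import combinations, product


def pair_score(t, scores):
    """Total score of every length-2 window of t (unlisted pairs score 0)."""
    return sum(scores.get(t[k:k + 2], 0) for k in range(len(t) - 1))


def brute_force_best(s, Q, scores):
    """Exhaustive maximum over all strings reachable with at most Q substitutions."""
    alphabet = sorted(set(''.join(scores)))  # letters used by any scoring pair
    best = pair_score(s, scores)  # zero substitutions
    for q in range(1, Q + 1):
        for positions in combinations(range(len(s)), q):
            for letters in product(alphabet, repeat=q):
                t = list(s)
                for p, ch in zip(positions, letters):
                    t[p] = ch
                best = max(best, pair_score(''.join(t), scores))
    return best


print(brute_force_best('bpdcg', 2, {'bz': 2, 'zd': 5, 'dm': 7, 'ng': 10}))  # 17
```

The cost grows like C(N, Q) * |alphabet|^Q, so this is only practical for checking the DP on very short strings.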