I want to build a list of the vowels in a string of letters while still keeping the consonants from that string. My code works for a string in which each vowel occurs only once, but it does not work when a vowel occurs more than once.
PRINCELY worked well,
EMEKA did not work.
I need help!
alphabets = {"A":1,"B":2,"C":3, "D":4,"E":5,"F":6,"G":7,"H":8,"I":9,"J":1,"K":2,"L":3,"M":4,"N":5,"O":6,"P":7,"Q":8,"R":9,"S":1,
"T":2,"U":3,"V":4,"W":5, "X":6,"Y":7,"Z":8}
def digit_sum(num):
    return sum( [ int(char) for char in str(num) ] )
def numerology(word):
    total = 0
    for letter in word:
        total += alphabets[letter]
    total = digit_sum(total)
    return total
fan = 'PRINCELY'
def vowels(fan):
    vowels=[]
    if 'I' in fan:
        vowels.append(9)
        fan1=fan[:fan.index('I')]+fan[fan.index('I')+1:]
        consonant = fan1
    if 'E' in fan:
        vowels.append(5)
        fan2=fan1[:fan1.index('E')]+fan1[fan1.index('E')+1:]
        consonant = fan2
    if 'A' in fan:
        vowels.append(1)
        fan3=fan2[:fan2.index('A')]+fan2[fan2.index('A')+1:]
        consonant = fan3
    if 'O' in fan:
        vowels.append(6)
        fan4=fan3[:fan3.index('O')]+fan3[fan3.index('O')+1:]
        consonant = fan4
    if 'U' in fan:
        vowels.append(3)
        fan5=fan4[:fan4.index('U')]+fan4[fan4.index('U')+1:]
        consonant = fan5
    print(vowels)
    print(consonant)
    print(digit_sum(sum(vowels)))
    cons = numerology(consonant)
    print(cons)
vowels(fan)
#outputs
#[9, 5]
#PRNCLY
#5
#7
The easiest way is to use str.translate with an appropriate translation table. First we'll make a function that takes a character and returns the appropriate number as a string. We'll use that to build a translation table. Then we just apply that translation table to the string:
def get_number(char):
    return str(ord(char.lower())-96) # lowercase ascii letters start at 97
vowels = (vowel for letter in 'aeiou' for vowel in (letter, letter.upper()))
table = str.maketrans({vowel: get_number(vowel) for vowel in vowels})
print('PRINCELYPRINCELY'.translate(table))
# PR9NC5LYPR9NC5LY
To sort the string into vowels and consonants, and then turn the vowels into numbers you could do
s = 'PRINCELYPRINCELY'
vowels = [ord(char.lower())-96 for char in s if char.lower() in 'aeiou']
# [9, 5, 9, 5]
consonants = s.translate(str.maketrans('','','aeiouAEIOU'))
# 'PRNCLYPRNCLY'
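For the numerology values in the question, note that ord(char.lower())-96 gives 15 for O and 21 for U, whereas the question's alphabets table maps O to 6 and U to 3. A minimal sketch that handles repeated vowels and keeps the question's values, assuming uppercase input as in the question (LETTER_VALUES and split_vowels are hypothetical names, not part of the answer above):

```python
# Rebuild the question's table: A..I -> 1..9, J..R -> 1..9, S..Z -> 1..8.
LETTER_VALUES = {
    letter: (i % 9) + 1
    for i, letter in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
}

def digit_sum(num):
    """Add up the decimal digits of num (same helper as in the question)."""
    return sum(int(ch) for ch in str(num))

def split_vowels(word):
    """Return (list of vowel values in order of appearance, consonant string)."""
    vowels = [LETTER_VALUES[ch] for ch in word if ch in "AEIOU"]
    consonants = "".join(ch for ch in word if ch not in "AEIOU")
    return vowels, consonants

for name in ("PRINCELY", "EMEKA"):
    vowel_values, consonants = split_vowels(name)
    print(name, vowel_values, consonants, digit_sum(sum(vowel_values)))
# PRINCELY [9, 5] PRNCLY 5
# EMEKA [5, 5, 1] MK 2
```

Because every character is examined once, a vowel that appears several times is simply collected several times, so no index bookkeeping is needed.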
I have already seen this answer to a similar question:
https://stackoverflow.com/a/44311921/5881884
There, the Aho-Corasick algorithm is used to check in O(n) time whether each word in a list occurs in a string. But I want to get the frequency of each word of the list within the string.
For example if
my_string = "some text yes text text some"
my_list = ["some", "text", "yes", "not"]
I would want the result:
[2, 3, 1, 0]
I did not find an exact example for this in the documentation. Any idea how to accomplish this?
O(n) solutions other than ahocorasick would also be appreciated.
Implementation:
Here's an Aho-Corasick frequency counter:
import ahocorasick
def ac_frequency(needles, haystack):
    frequencies = [0] * len(needles)
    # Make a searcher
    searcher = ahocorasick.Automaton()
    for i, needle in enumerate(needles):
        searcher.add_word(needle, i)
    searcher.make_automaton()
    # Add up all frequencies
    for _, i in searcher.iter(haystack):
        frequencies[i] += 1
    return frequencies
(For your example, you'd call ac_frequency(my_list, my_string) to get the list of counts)
For medium-to-large inputs this will be substantially faster than other methods.
Notes:
For real data, this method may yield different results than the other solutions posted, because Aho-Corasick finds all occurrences of the target words, including occurrences inside longer words.
If you want to match full words only, you can call searcher.add_word with space/punctuation-padded versions of each word:
    ...
    padding_start = [" ", "\n", "\t"]
    padding_end = [" ", ".", ";", ",", "-", "–", "—", "?", "!", "\n"]
    for i, needle in enumerate(needles):
        for s, e in [(s,e) for s in padding_start for e in padding_end]:
            searcher.add_word(s + needle + e, i)
    searcher.make_automaton()
    # Add up all frequencies
    for _, i in searcher.iter(" " + haystack + " "):
    ...
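To see why substring matching and whole-word matching can disagree, here is a small sketch that uses only the standard re module rather than Aho-Corasick. The function names and the sample sentence are made up for illustration:

```python
import re

def substring_count(needle, haystack):
    """Count every occurrence of needle, including inside longer words."""
    # A lookahead matches at each start position without consuming text,
    # so overlapping occurrences are counted as well.
    return len(re.findall("(?=" + re.escape(needle) + ")", haystack))

def whole_word_count(needle, haystack):
    """Count occurrences of needle that are delimited by word boundaries."""
    return len(re.findall(r"\b" + re.escape(needle) + r"\b", haystack))

sample = "some text in context, texts and text"
print(substring_count("text", sample))   # 4: text, con-text, text-s, text
print(whole_word_count("text", sample))  # 2: only the standalone words
```

This scans the text once per word, so it illustrates the difference in results and is not a replacement for the single-pass automaton.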
The Counter in the collections module may be of use to you:
from collections import Counter
my_string = "some text yes text text some"
my_list = ["some", "text", "yes", "not"]
counter = Counter(my_string.split(' '))
[counter.get(item, 0) for item in my_list]
# out: [2, 3, 1, 0]
You can use a list comprehension to count how many times each word in my_list occurs in my_string:
[my_string.split().count(i) for i in my_list]
[2, 3, 1, 0]
You can use a dictionary to count the occurrences of the words you care about:
counts = dict.fromkeys(my_list, 0) # initialize the counting dict with all counts at zero
for word in my_string.split():
    if word in counts: # this test filters out any unwanted words
        counts[word] += 1 # increment the count
The counts dict will hold the count of each word. If you really do need a list of counts in the same order as the original list of keywords (and the dict won't do), you can add a final step after the loop has finished:
results = [counts[word] for word in my_list]
I need to sum the values of the letters in a string, e.g.
a = 1
b = 2
c = 3
d = 4
alphabet = 'abcdefghijklmnopqrstuvwxyz'
v1
string = "abcd"
# #result = sum(string) so
if string[0] and string[1] and string[2] and string[3] in alphabet:
    if string[0] is alphabet[0] and string[1] is alphabet[1] and string[2] is alphabet[2] and string[3] is alphabet[3]:
        print(a+b+c+d)
v2
string = ("ab","aa","dc",)
if string[0][0] and string[0][1] and string[1][0] and string[1][1] and string[2][0] and string[2][1] in alphabet:
    if string[0] is alphabet[0] and string[1] is alphabet[1] and string[2] is alphabet[2] and string[3] is alphabet[3]:
        print(a+b+c+d)
What is the solution? Can you help me?
Use the sum() function and a generator expression; a dictionary built from string.ascii_lowercase can map each letter to its integer value:
from string import ascii_lowercase
letter_value = {c: i for i, c in enumerate(ascii_lowercase, 1)}
wordsum = sum(letter_value.get(c, 0) for c in word if c)
The enumerate(ascii_lowercase, 1) call produces (index, letter) pairs when iterated over, starting at 1. That gives you (1, 'a'), (2, 'b'), etc. The dictionary comprehension turns these into c: i pairs, mapping each letter to its integer value.
Next, using the dict.get() method lets you pick a default value; for any character in the input string, you get to look up the numeric value and map it to an integer, but if the character is not a lowercase letter, 0 is returned instead. The sum(...) part with the loop then simply adds those values up.
If you need to support sequences with words, just use sum() again. Put the above sum() call in a function, and apply that function to each word in a sequence:
from string import ascii_lowercase
letter_value = {c: i for i, c in enumerate(ascii_lowercase, 1)}
def sum_word(word):
    return sum(letter_value.get(c, 0) for c in word if c)
def sum_words(words):
    return sum(sum_word(word) for word in words)
The old-fashioned way is to take advantage of the fact that lowercase letters are contiguous, so that ord("b") - ord("a") == 1:
data = "abcd"
print("Sum:", sum(ord(c)-ord("a")+1 for c in data))
Of course you could "optimize" it to reduce the number of computations, though it seems silly in this case:
ord_a = ord("a")
print("Sum:", sum(ord(c)-ord_a for c in data)+len(data))
I tried to print the index of every i in the word Mississippi. I get the result, but the print statement repeats 3 times. This is the code:
s="Mississippi"
start=0
while start<len(s):
    print "the index of i is: ", s.find('i',start,len(s))
    start=start+1
If you use enumerate then you iterate through the string, looking at each letter and idx is counting upwards as you go.
for idx, letter in enumerate(s):
    if letter == "i":
        print "the index of i is: ", idx
Do you want to print the indexes as a list? Try this:
l = []
for index, char in enumerate('mississippi'):
    if char == 'i':
        l.append(index)
print "the index of i is: ", l
The result will be:
the index of i is: [1, 4, 7, 10]
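Because the while loop advances start by only one character per iteration, each run prints the position of the next "i" at or after start, so the same index is printed several times.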
I would prefer a for loop over the string:
s="Misssssissippi"
start=0
character="i"
for i in s:
    if (i == character):
        print "the index of i is: ", start
    start=start+1
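If you would rather keep the find-based loop from the question, one way to avoid the repeats is to restart the search just past each match instead of advancing by one character. A minimal sketch in Python 3 syntax (find_all is a hypothetical helper name):

```python
def find_all(text, char):
    """Return the index of every occurrence of char in text."""
    positions = []
    start = text.find(char)
    while start != -1:          # find returns -1 once nothing is left
        positions.append(start)
        # Resume searching right after the match just found.
        start = text.find(char, start + 1)
    return positions

print("the index of i is:", find_all("Mississippi", "i"))
# the index of i is: [1, 4, 7, 10]
```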
import re
s="Wisconsin"
for c in re.finditer('i',s):
    print c.start(0)
Is there any algorithm that can be used to find the most common phrases (or substrings) in a string? For example, the following string would have "hello world" as its most common two-word phrase:
"hello world this is hello world. hello world repeats three times in this string!"
In the string above, the most common substring (after the empty string, which repeats an infinite number of times) would be the space character.
Is there any way to generate a list of common substrings in this string, from most common to least common?
This is a task similar to the Nussinov algorithm, and actually even simpler, as we do not allow any gaps, insertions or mismatches in the alignment.
For the string A of length N, define a table F[-1 .. N, -1 .. N] and fill it in using the following rules:
for i = 0 to N - 1
    for j = 0 to N - 1
        if i != j
        {
            if A[i] == A[j]
                F[i,j] = F[i-1,j-1] + 1;
            else
                F[i,j] = 0;
        }
For instance, for B A O B A B:
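A minimal Python sketch of the fill rule, assuming a zero-initialized table with one extra row and column standing in for index -1 (self_match_table is a hypothetical name), prints the table for this string:

```python
def self_match_table(a):
    """Fill F[i][j] = F[i-1][j-1] + 1 where a[i] == a[j] and i != j, else 0."""
    n = len(a)
    # Row 0 and column 0 of f are zero padding that play the role of index -1.
    f = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            if i != j and a[i] == a[j]:
                f[i + 1][j + 1] = f[i][j] + 1
    # Strip the padding so the result is indexed 0 .. n-1 like the string.
    return [row[1:] for row in f[1:]]

table = self_match_table("BAOBAB")
print(" ", *"BAOBAB")
for letter, row in zip("BAOBAB", table):
    print(letter, *row)
#   B A O B A B
# B 0 0 0 1 0 1
# A 0 0 0 0 2 0
# O 0 0 0 0 0 0
# B 1 0 0 0 0 1
# A 0 2 0 0 0 0
# B 1 0 0 1 0 0
```

The largest value, 2 at (i=1, j=4), marks the repeated substring "BA" ending at positions 1 and 4.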
This runs in O(n^2) time. The largest values in the table now point to the end positions of the longest self-matching substrings (i is the end of one occurrence, j the end of another). The array is assumed to be zero-initialized at the start. I have added a condition to exclude the diagonal, which is the longest but probably not an interesting self-match.
Thinking about it more, this table is symmetric about the diagonal, so it is enough to compute only half of it. Also, the array is zero-initialized, so assigning zero is redundant. What remains is:
for i = 0 to N - 1
    for j = i + 1 to N - 1
        if A[i] == A[j]
            F[i,j] = F[i-1,j-1] + 1;
Shorter but potentially more difficult to understand. The computed table contains all matches, short and long. You can add further filtering as you need.
In the next step, you need to recover the strings by following the diagonal up and to the left from each non-zero cell. During this step it is also trivial to use a hashmap to count the number of self-similarity matches for the same string. With a normal string and a normal minimum length, only a small number of table cells will be processed through this map.
I think that using a hashmap directly actually requires O(n^3), because the key strings must be compared for equality at the end of each access, and this comparison is probably O(n).
Python. This is somewhat quick and dirty, with the data structures doing most of the lifting.
from collections import Counter
accumulator = Counter()
text = 'hello world this is hello world.'
for length in range(1,len(text)+1):
    # +1 so that the substring ending at the last character is included
    for start in range(len(text) - length + 1):
        accumulator[text[start:start+length]] += 1
The Counter structure is a hash-backed dictionary designed for counting how many times you've seen something. Adding to a nonexistent key will create it, while retrieving a nonexistent key will give you zero instead of an error. So all you have to do is iterate over all the substrings.
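For word-level phrases such as the two-word example in the question, the same Counter idea can be applied to runs of n consecutive words, and Counter.most_common() then lists them from most to least frequent. A sketch under the assumption that words are runs of letters and apostrophes and that case is ignored (common_phrases is a hypothetical name):

```python
import re
from collections import Counter

def common_phrases(text, n):
    """Return (phrase, count) pairs for all n-word phrases, most common first."""
    # Assumption: a word is a run of letters/apostrophes; punctuation is dropped.
    words = re.findall(r"[a-z']+", text.lower())
    phrases = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(phrases).most_common()

text = ("hello world this is hello world. "
        "hello world repeats three times in this string!")
print(common_phrases(text, 2)[0])  # ('hello world', 3)
```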
This is just pseudocode, and maybe it isn't the most beautiful solution, but I would solve it like this:
function separateWords(String incomingString) returns StringArray{
    //Code
}
function findMax(Map map) returns String{
    //Code
}
function mainAlgorithm(String incomingString) returns String{
    StringArray sArr = separateWords(incomingString);
    Map<String, Integer> map; //init with no content
    for(word: sArr){
        Integer count = map.get(word);
        if(count == null){
            map.put(word,1);
        } else {
            //remove if necessary
            map.put(word,count + 1); //count++ would store the old value
        }
    }
    return findMax(map);
}
Here map holds key/value pairs, like a Java HashMap.
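The pseudocode leaves separateWords and findMax unimplemented. A runnable Python sketch of the same idea, assuming words are separated by whitespace and that any one of the top-scoring words is an acceptable answer (most_common_word is a hypothetical name):

```python
def most_common_word(text):
    """Count each whitespace-separated word and return the most frequent one."""
    counts = {}
    for word in text.split():
        # get() returns 0 for a word not seen before, so no null check is needed.
        counts[word] = counts.get(word, 0) + 1
    # max over the keys, ranked by their stored counts.
    return max(counts, key=counts.get)

print(most_common_word("a b a c a b"))  # a
```

Because the split is on whitespace only, "world" and "world." would be counted as different words; strip punctuation first if that matters.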
Since for every substring of a String of length >= 2 the text contains at least one substring of length 2 at least as many times, we only need to investigate substrings of length 2.
val s = "hello world this is hello world. hello world repeats three times in this string!"
val li = s.sliding (2, 1).toList
// li: List[String] = List(he, el, ll, lo, "o ", " w", wo, or, rl, ld, "d ", " t", th, hi, is, "s ", " i", is, "s ", " h", he, el, ll, lo, "o ", " w", wo, or, rl, ld, d., ". ", " h", he, el, ll, lo, "o ", " w", wo, or, rl, ld, "d ", " r", re, ep, pe, ea, at, ts, "s ", " t", th, hr, re, ee, "e ", " t", ti, im, me, es, "s ", " i", in, "n ", " t", th, hi, is, "s ", " s", st, tr, ri, in, ng, g!)
val uniques = li.toSet
uniques.toList.map (u => li.count (_ == u))
// res18: List[Int] = List(1, 2, 1, 1, 3, 1, 5, 1, 1, 3, 1, 1, 3, 2, 1, 3, 1, 3, 2, 3, 1, 1, 1, 1, 1, 3, 1, 3, 3, 1, 3, 1, 1, 1, 3, 3, 2, 4, 1, 2, 2, 1)
uniques.toList(6)
// res19: String = "s "
Perl, O(n²) solution
my $str = "hello world this is hello world. hello world repeats three times in this string!";
my @words = split(/[^a-z]+/i, $str);
my ($display,$ix,$i,%ocur) = 10;
# calculate
for ($ix=0 ; $ix<=$#words ; $ix++) {
    for ($i=$ix ; $i<=$#words ; $i++) {
        $ocur{ join(':', @words[$ix .. $i]) }++;
    }
}
# display
foreach (sort { my $c = $ocur{$b} <=> $ocur{$a} ; return $c ? $c : split(/:/,$b)-split(/:/,$a); } keys %ocur) {
    print "$_: $ocur{$_}\n";
    last if !--$display;
}
This displays the 10 most common substrings (in case of a tie, the longest chain of words is shown first). Change $display to 1 to get only the top result. There are n(n+1)/2 iterations.