The English alphabet as a vector of characters in Rust - rust

The title says it all. I want to generate the alphabet as a vector of characters. I did consider simply creating a range of 97-122 and converting it to characters, but I was hoping there would be a nicer-looking way, such as Python's string.ascii_lowercase.
The resulting vector or string should have the characters a-z.

Hard-coding this sort of thing makes sense, as it can then be a compiled constant, which is great for efficiency.
static ASCII_LOWER: [char; 26] = [
'a', 'b', 'c', 'd', 'e',
'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o',
'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y',
'z',
];
(Decide for yourself whether to use static or const.)
This is pretty much how Python does it in string.py:
lowercase = 'abcdefghijklmnopqrstuvwxyz'
# ...
ascii_lowercase = lowercase

Collecting the characters of a str doesn't seem like a bad idea...
let alphabet: Vec<char> = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".chars().collect();

Old question, but you can create a range of chars directly. The range is already an iterator, so no into_iter() call is needed:
('a'..='z').collect::<Vec<char>>()

Related

Why is this function just being skipped and not called?

I can't seem to see why this doesn't print each item in the loop. I ran it through Thonny and it just completely skips my function. Am I not passing in the variables/arguments correctly?
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
direction = input("Type 'encode' to encrypt, type 'decode' to decrypt:\n")
text = input("Type your message:\n").lower()
shift = int(input("Type the shift number:\n"))
def encrypt(direction, text, shift):
    if direction == "encode":
        for index, letter in enumerate(text):
            print(letter)
I feel like I am messing up passing the values in from outside the function.
All you've done is define the function. You need to call it like
encrypt(direction, text, shift)
right after you have read the input for those variables. A call must also come after the function has been defined, so you should move the function definition up to the top of the program.
Putting the names of parameters in the function definition doesn't "link" those to any variables named that in the rest of the program. Those parameters take on the value of whatever is passed in to the function. So for example, you could do
encrypt("encode", "hello", 3)
and then inside the function direction would be "encode", text would be "hello", and shift would be 3.
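A minimal sketch of the corrected program, assuming hard-coded values in place of the input() calls so that it runs unattended. Returning the collected letters is an addition for illustration only and is not part of the original code.

```python
def encrypt(direction, text, shift):
    """Print each letter of text when direction is 'encode'; return the letters printed."""
    printed = []
    if direction == "encode":
        for index, letter in enumerate(text):
            print(letter)
            printed.append(letter)
    return printed


# Stand-ins for the values that input() would provide (assumed for this sketch).
direction = "encode"
text = "hello"
shift = 3

# The definition alone does nothing; this call is what actually runs the body.
result = encrypt(direction, text, shift)
```

The parameter names inside the function only happen to match the outer variable names; the values are bound by the call, not by the shared spelling.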

I want to know about alphabets using map?

Why do we use this particular range, from 97 to 123? I would also like to know more about generating the alphabet with map.
list(map(chr,range(97,123)))
The ASCII codes for the lowercase English letters range from 97 to 122.
The range function in the line you provided creates an iterable with the integers from 97 to 122; the stop value 123 is excluded, which is why the call uses 123 rather than 122. You are mapping these with the chr function, which returns the character for a given code point. For example,
>>> chr(97)
'a'
>>> chr(100)
'd'
>>> chr(122)
'z'
Now, your map call applies chr to every number from 97 up to, but not including, 123.
>>> map(chr,range(97,123))
<map object at 0x000002EEAE8F46C8>
But map returns a map object; to convert that to a list, you can pass it to list.
>>> list(map(chr,range(97,123)))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
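As a small cross-check of the explanation above, the sketch below derives the two bounds with ord (the inverse of chr) instead of hard-coding them, and compares the result with string.ascii_lowercase. The variable names are chosen for illustration.

```python
import string

# ord() is the inverse of chr(): it returns the code point of a character.
start = ord("a")       # 97
stop = ord("z") + 1    # 123; range() stops before this value

letters = list(map(chr, range(start, stop)))

print(start, stop)
print(letters == list(string.ascii_lowercase))
```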

Password cracker: how to avoid four loops for four-letter combinations

I was practising, for educational purposes, with a simple password cracker.
I know that I could use itertools, but since I'm learning I would miss out
on facing problems and learning from them, and indeed I've hit one
that is giving me no peace.
What I want to learn is this: if I need, for example, four-letter combinations,
how do I get the first letter 'a' in a loop, then another 'a', and again 'a'
and 'a', to have 'aaaa', and later on 'abaa', etc.?
So I wrote this:
import string
passe = 'zulu'
mylist = []
#letters = string.ascii_lowercase
letters = ['a','b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'w', 'q', 'y', 'z']
mineset= set()
for a in letters:
    for b in letters:
        for c in letters:
            for d in letters:
                s = a + b + c + d
                mylist.append(s)
mineset=set(mylist)
k = sorted(mineset)
print(k)
for i in k:
    if i == passe:
        print('got it: ', i )
print(passe in k)
It works in some way, but the problems are:
I had to make a set from the list because combinations
were repeated.
And finally, I was struggling to do it without writing four
loops for four letters.
To try to solve those, I went with this approach:
letters = ['a','b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'w', 'q', 'y', 'z']
password = 'pass'
combin = ''
lista=[]
for x in range(1,5):
    for y in letters:
        combin +=y
        lista.append(combin)
    combin=''
mineset=set(lista)
print(mineset)
for i in lista:
    if i == password:
        print('pass:', i)
But the results are disappointing:
{'abc', 'a', 'ab', 'abcd', 'abcde'}
I have been sitting on it all day but can't get anywhere close to
the effect of the four loops in the previous code.
While I don't recommend this approach as a general rule, for purposes of learning here is a way to achieve what you want:
def getCombos(lts):
    rslt = []
    for l1 in lts:
        for l2 in lts:
            for l3 in lts:
                for l4 in lts:
                    s = l1+l2+l3+l4
                    rslt.append(s)
    return rslt
letters = 'abcdefghijklmnopqrstuvwxyz'
getCombos(letters)
As a simple example illustrates, this code is O(n^x) in complexity, where n is the number of characters and x is the length of each generated string (4 here). This approach quickly becomes unwieldy:
getCombos('abc')
yields 3^4 = 81 entries:
['aaaa',
'aaab',
'aaac',
'aaba',
'aabb',
'aabc',
'aaca',
'aacb',
'aacc',
'abaa',
'abab',
'abac',
'abba',
'abbb',
'abbc',
'abca',
'abcb',
'abcc',
'acaa',
'acab',
'acac',
'acba',
'acbb',
'acbc',
'acca',
'accb',
'accc',
'baaa',
'baab',
'baac',
'baba',
'babb',
'babc',
'baca',
'bacb',
'bacc',
'bbaa',
'bbab',
'bbac',
'bbba',
'bbbb',
'bbbc',
'bbca',
'bbcb',
'bbcc',
'bcaa',
'bcab',
'bcac',
'bcba',
'bcbb',
'bcbc',
'bcca',
'bccb',
'bccc',
'caaa',
'caab',
'caac',
'caba',
'cabb',
'cabc',
'caca',
'cacb',
'cacc',
'cbaa',
'cbab',
'cbac',
'cbba',
'cbbb',
'cbbc',
'cbca',
'cbcb',
'cbcc',
'ccaa',
'ccab',
'ccac',
'ccba',
'ccbb',
'ccbc',
'ccca',
'cccb',
'cccc']
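For the original question of avoiding one loop per letter, one possible sketch that does not rely on itertools for the generation itself is to build the strings one position at a time. The name get_combos and its length parameter are hypothetical; itertools.product is used only as a cross-check of the result.

```python
from itertools import product


def get_combos(letters, length):
    """Return every string of the given length over letters, in lexicographic order."""
    combos = [""]
    for _ in range(length):
        # Extend every string built so far by each possible next letter.
        combos = [prefix + letter for prefix in combos for letter in letters]
    return combos


combos = get_combos("abc", 4)
print(len(combos))
print(combos[:3])

# Cross-check against the standard library.
expected = ["".join(p) for p in product("abc", repeat=4)]
print(combos == expected)
```

The growth is still O(n^x): with 26 letters and length 4 the list holds 26^4 = 456,976 strings, so only the number of hand-written loops changes, not the amount of work.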

Python 3.8 sort - Lambda function behaving differently for lists, strings

I'm trying to sort a list of objects based on frequency of occurrence (increasing order) of characters. I'm seeing that the sort behaves differently if the list has numbers versus characters. Does anyone know why this is happening?
Below is a list of numbers sorted by frequency of occurrence.
import collections
# Sort list of numbers based on increasing order of frequency
nums = [1,1,2,2,2,3]
countMap = collections.Counter(nums)
nums.sort(key = lambda x: countMap[x])
print(nums)
# Returns correct output
[3, 1, 1, 2, 2, 2]
But if I sort a list of characters, the order of 'l' and 'o' is incorrect in the example below:
# Sort list of characters based on increasing order of frequency
alp = ['l', 'o', 'v', 'e', 'l', 'e', 'e', 't', 'c', 'o', 'd', 'e']
countMap = collections.Counter(alp)
alp.sort(key = lambda x: countMap[x])
print(alp)
# Returns Below output - characters 'l' and 'o' are not in the correct sorted order
['v', 't', 'c', 'd', 'l', 'o', 'l', 'o', 'e', 'e', 'e', 'e']
# Expected output
['v', 't', 'c', 'd', 'l', 'l', 'o', 'o', 'e', 'e', 'e', 'e']
Python's sort is stable: elements that compare equal under the sort key keep their original relative order. Here 'l' and 'o' both have a count of 2, so they stay interleaved in the order in which they appear in the input.
from collections import Counter
# Sort list of characters based on increasing order of frequency
alp = ['l', 'o', 'v', 'e', 'l', 'e', 'e', 't', 'c', 'o', 'd', 'e']
countMap = Counter(alp)
alp.sort(key = lambda x: (countMap[x], x)) # in a tie, the letter will be used to un-tie
print(alp)
['c', 'd', 't', 'v', 'l', 'l', 'o', 'o', 'e', 'e', 'e', 'e']
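This fixes it by using the letter as the second criterion.
To get your exact output you can use the following (alp must be the original, unsorted list here, because pos records where each letter first appears):
# use original position as tie-breaker in case counts are identical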
countMap = Counter(alp)
pos = {k:alp.index(k) for k in countMap}
alp.sort(key = lambda x: (countMap[x], pos[x]))
print(alp)
['v', 't', 'c', 'd', 'l', 'l', 'o', 'o', 'e', 'e', 'e', 'e']
See Is python's sorted() function guaranteed to be stable? or https://wiki.python.org/moin/HowTo/Sorting/ for details on sorting.
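The three behaviours discussed above can be compared side by side in one self-contained sketch. It uses sorted() rather than list.sort() so the original list is never modified; the variable names are chosen for illustration.

```python
from collections import Counter

alp = list("loveleetcode")  # same letters as the list in the question
counts = Counter(alp)

# Key is the count only: ties keep their input order because the sort is stable.
stable = sorted(alp, key=lambda ch: counts[ch])

# Ties broken alphabetically.
by_letter = sorted(alp, key=lambda ch: (counts[ch], ch))

# Ties broken by the position where each letter first appears.
first_seen = {ch: alp.index(ch) for ch in counts}
by_first_seen = sorted(alp, key=lambda ch: (counts[ch], first_seen[ch]))

print("".join(stable))
print("".join(by_letter))
print("".join(by_first_seen))
```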

Python-crfsuite labeling in fixed pattern

I'm trying to create a CRF model that segments Japanese sentences into words. At the moment I'm not worried about perfect results as it's just a test. The training goes fine but when it's finished it always gives the same guess for every sentence I try to tag.
"""Labels: X: Character is mid word, S: Character starts a word, E:Character ends a word, O: One character word"""
Sentence:広辞苑や大辞泉には次のようにある。
Prediction:['S', 'X', 'E', 'S', 'E', 'S', 'E', 'S', 'E', 'S', 'E', 'S', 'E', 'S', 'E', 'S', 'E']
Truth:['S', 'X', 'E', 'O', 'S', 'X', 'E', 'O', 'O', 'O', 'O', 'S', 'E', 'O', 'S', 'E', 'O']
Sentence:他にも、言語にはさまざまな分類がある。
Prediction:['S', 'X', 'E', 'S', 'E', 'S', 'E', 'S', 'E', 'S', 'E', 'S', 'E', 'S', 'E', 'S', 'E', 'S', 'E']
Truth:['O', 'O', 'O', 'O', 'S', 'E', 'O', 'O', 'S', 'X', 'X', 'X', 'E', 'S', 'E', 'O', 'S', 'E', 'O']
When looking at the transition info for the model:
{('E', 'E'): -3.820618,
('E', 'O'): 3.414133,
('E', 'S'): 2.817927,
('E', 'X'): -3.056175,
('O', 'E'): -4.249522,
('O', 'O'): 2.583123,
('O', 'S'): 2.601341,
('O', 'X'): -4.322003,
('S', 'E'): 7.05034,
('S', 'O'): -4.817578,
('S', 'S'): -4.400028,
('S', 'X'): 6.104851,
('X', 'E'): 4.985887,
('X', 'O'): -5.141898,
('X', 'S'): -4.499069,
('X', 'X'): 4.749289}
This looks good, since all the transitions with negative values are impossible ones,
E -> X for example, going from the end of a word to the middle of the following one. S -> E has the highest value, and as seen above the model simply gets into a pattern of labeling S then E repeatedly until the sentence ends. I followed this demo when trying this, though that demo is for separating Latin. My features are similarly just n-grams:
['bias',
'char=ま',
'-2-gram=さま',
'-3-gram=はさま',
'-4-gram=にはさま',
'-5-gram=語にはさま',
'-6-gram=言語にはさま',
'2-gram=まざ',
'3-gram=まざま',
'4-gram=まざまな',
'5-gram=まざまな分',
'6-gram=まざまな分類']
I've tried changing the labels to just S and X, for start and other, but this just causes the model to repeat S, X, S, X until it runs out of characters. I've gone up to 6-grams in both directions, which took a lot longer but didn't change anything. I've tried training for more iterations and changing the L1 and L2 constants a bit. I've trained on up to 100,000 sentences, which is about as far as I can go, as it takes almost all 16 GB of my RAM to do so. Are my features structured wrong? How do I get the model to stop guessing in a pattern, and is that even what's happening? Help would be appreciated, and let me know if I need to add more info to the question.
Turns out I was missing a step. I was passing raw sentences to the tagger rather than passing features. Because the CRF can apparently accept a character string as if it were a list of almost featureless entries, it just defaulted to guessing the highest-rated transition rather than raising an error. I'm not sure this will help anyone else, given that it was a stupid mistake, but I'll leave an answer here until I decide whether or not to remove the question.
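To make the missing step concrete, here is a rough sketch of per-character feature extraction that reproduces the n-gram feature list shown in the question. The function names char_features and sentence_features are hypothetical, and the sketch deliberately does not import python-crfsuite; it only builds the list of feature lists that would be handed to the tagger in place of the raw string.

```python
def char_features(sentence, i, max_n=6):
    """Features for the character at index i: bias, the character, and n-grams on both sides."""
    features = ["bias", "char=" + sentence[i]]
    # n-grams ending at position i (looking backwards).
    for n in range(2, max_n + 1):
        start = i - n + 1
        if start >= 0:
            features.append(f"-{n}-gram={sentence[start:i + 1]}")
    # n-grams starting at position i (looking forwards).
    for n in range(2, max_n + 1):
        if i + n <= len(sentence):
            features.append(f"{n}-gram={sentence[i:i + n]}")
    return features


def sentence_features(sentence):
    """One feature list per character; this, not the raw string, is what gets tagged."""
    return [char_features(sentence, i) for i in range(len(sentence))]


sentence = "他にも、言語にはさまざまな分類がある。"
xseq = sentence_features(sentence)
print(len(xseq))
print(xseq[9])
```

With a trained model, the call would then take xseq rather than sentence as its argument, which is the step that was skipped.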
