CS50x 2019 pset2: Vigenère does not fully work

I am currently working on the pset2 of CS50x 2019, specifically on Vigenère. The CS50 Gradebook shows me 93% finished after uploading it to GitHub.
I have already tried some other code snippets that you can find online but they did not seem to work.
Here is the part of my program that creates the ciphertext:
string k = argv[1];
// Get the plaintext and print out the ciphertext
string s = get_string("plaintext: ");
printf("ciphertext: ");
// Iterate through plaintext letter by letter
for (int i = 0, n = strlen(s) ; i < n; i++)
{
int key = tolower(k[i % strlen(k)]) - 'a';
// Check if the letter is lowercase, uppercase or neither and print out the rotated character
if (islower(s[i]))
{
printf("%c", (((s[i] - 'a') + key) % 26) + 'a');
}
else if (isupper(s[i]))
{
printf("%c", (((s[i] - 'A') + key) % 26) + 'A');
}
else
{
printf("%c", s[i]);
}
}
printf("\n");
return 0;
There are some examples in the documentation which you can test out with your code.
The following example does not work with my code:
$ ./vigenere bacon
plaintext: Meet me at the park at eleven am
ciphertext: Negh zf av huf pcfx bt gzrwep oz
My output is:
$ ./vigenere bacon
plaintext: Meet me at the park at eleven am
ciphertext: Negh ne og tjs qaty bt syfvgb bm
As you can see, the first 4 characters are correct but the remaining are not.

This int key = tolower(k[i % strlen(k)]) - 'a'; is a problem.
From the spec:
Remember also that every time you encipher a character, you need to
move to the next letter of k, the keyword (and wrap around to the
beginning of the keyword if you exhaust all of its characters). But if
you don’t encipher a character (e.g., a space or a punctuation mark),
don’t advance to the next character of k!
The bottom line is: since plaintext and key run at different "rates" you cannot use the same index (in this case i) for both. The program should crawl plaintext one character at a time, as is done here. But it needs a separate way to control the character in key so you can 1) skip spaces/punctuation and 2) wrap around. Another variable for the key index would be one way to solve it. There may be other problems in the code, but this is a fundamental flaw.
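A minimal sketch of the separate-index idea, assuming a hypothetical helper named vigenere_encrypt (it is not part of the original program) that writes the ciphertext into a caller-supplied buffer instead of printing it. The variable j is the extra key index: it only advances when a letter is actually enciphered.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: enciphers plain into out, which must hold
 * strlen(plain) + 1 bytes. key is assumed non-empty and alphabetic. */
void vigenere_encrypt(const char *plain, const char *key, char *out)
{
    size_t key_len = strlen(key);
    assert(key_len > 0);

    size_t j = 0; /* key index: advances only when a letter is enciphered */
    size_t i;
    for (i = 0; plain[i] != '\0'; i++)
    {
        unsigned char ch = (unsigned char) plain[i];
        if (isalpha(ch))
        {
            int shift = tolower((unsigned char) key[j % key_len]) - 'a';
            char base = islower(ch) ? 'a' : 'A';
            out[i] = (char) ((ch - base + shift) % 26 + base);
            j++;
        }
        else
        {
            out[i] = (char) ch; /* spaces and punctuation pass through; j stays put */
        }
    }
    out[i] = '\0';
}
```

With the key bacon and the plaintext from the question, this should produce the ciphertext the spec expects, because the spaces no longer consume key letters.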

Related

Last character of a multibyte string

One of the things I often need to do when handling a multibyte string is deleting its last character. How do I locate this last character so I can chop it off using normal byte operations, preferably with as few reads as possible?
Note that this question is intended to work for most, if not all, multibyte encodings. The answer for self-synchronizing encodings like UTF-8 is trivial, as you can just go right-to-left in the bytestring until you hit a start marker.
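For reference, the trivial UTF-8 case could look roughly like this. It is only a sketch: the function name utf8_last_char_index is made up for illustration, and the input is assumed to be well-formed UTF-8. Continuation bytes have the bit pattern 10xxxxxx, so the scan steps left until it reaches a byte that is not one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte index at which the last UTF-8
 * character of s[0..len) starts, or -1 if len is 0. */
ptrdiff_t utf8_last_char_index(const char *s, size_t len)
{
    assert(s != NULL);
    if (len == 0)
        return -1;

    size_t pos = len - 1;
    /* Skip continuation bytes (10xxxxxx) until a start byte is found. */
    while (pos > 0 && ((unsigned char) s[pos] & 0xC0) == 0x80)
        pos--;
    return (ptrdiff_t) pos;
}
```

Chopping the last character off is then a matter of truncating the buffer at the returned index. At most four bytes are read, since a UTF-8 character is at most four bytes long.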

The answer is written in C with POSIX multibyte functions, which are also available on Windows. Assume that the bytestring ends at len and is well-formed up to that point, and that the appropriate setlocale calls have been made. Porting to mbrlen is left as an exercise for the reader.
The naive solution
The obviously correct solution involves parsing the encoding "as intended", going from left-to-right.
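#include <stdlib.h>    // mblen
#include <sys/types.h> // ssize_t
ssize_t index_of_last_char_left(const char *c, size_t len) {
    size_t pos = 0;
    ssize_t last = -1; // Start of the most recent complete character
    mblen(NULL, 0); // Reset the shift state
    while (pos < len) {
        int next = mblen(c + pos, len - pos);
        if (next <= 0) // Invalid input or an embedded NUL byte
            break;
        last = (ssize_t)pos;
        pos += (size_t)next;
    }
    return last; // -1 if no complete character was found
}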
Deleting multiple characters like this will cause an "accidentally quadratic" situation; memoizing intermediate positions will help, but additional management is required.
The right-to-left solution
As I mentioned in the question, for self-synchronizing encodings the only thing to do is to look for a start marker. But what breaks with the ones that don't self-synchronize?
The one-or-two-byte EUC encodings have both bytes of the two-byte sequence higher than 0x7f, and there's almost no differentiating between start and continuation bytes. For that we can check for mblen(pos) == bytes_left since we know the string is well-formed.
The Big5, GBK, and GB18030 encodings also allow a continuation byte in the ASCII range, so a lookbehind is mandatory.
With that cleared up (and assuming the bytestring up to len is well-formed), we can have:
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>    // mblen
#include <sys/types.h> // ssize_t
// As much as CJK encodings do. I don't have time to see if it works for UTF-1.
#define MAX_MB_LEN 4
ssize_t index_of_last_char_right(const char *c, size_t len) {
    ssize_t end = (ssize_t)len; // Signed copy, so the arithmetic below cannot wrap around
    ssize_t pos = end - 1;
    bool last = true;
    bool last_is_okay = false;
    assert(!mblen(NULL, 0)); // No, we really cannot handle shift states.
    for (; pos >= 0 && pos >= end - 2 - MAX_MB_LEN; pos--) {
        int next = mblen(c + pos, (size_t)(end - pos));
        // A character starting at pos is the last one only if it ends exactly at len
        bool okay = (next > 0) && (next == end - pos);
        if (last) {
            last_is_okay = okay;
            last = false;
        } else if (okay)
            return pos;
    }
    return last_is_okay ? end - 1 : -1;
}
(You should be able to find the last good char of a malformed string by relaxing the check to (next > 0) && (next <= len - pos). But don't return that when the last byte is okay!)
What's the point of this?
The code sample above is for the idealist who does not want to write just "UTF-8 support" but "locale support" based on the C library. There might not be a point to this at all in 2021 :)

MIPS, Number of occurrences in a string located in the stack

I have an exercise to solve in MIPS assembly. Some parts are clear to me, but I am having trouble writing its code. The exercise asks:
Write a program that, given a string read from the keyboard, counts the occurrences of the character that occurs most often and shows it.
How can I check all 26 characters and find the one with the most occurrences?
Example:
Give me a string: Hello world!
The character with the highest number of occurrences is: l
Thanks a lot in advance for any answers.
P.S.
This is the first part of my program:
#First message
li $v0, 4
la $a0, mess
syscall
#Stack space allocated
addi $sp, $sp, -257
#Read the string
move $a0, $sp
li $a1, 257
li $v0, 8
syscall

Since this is your assignment I'll leave the MIPS assembly implementation to you. I'll just show you the logic for the code in a higher-level language:
// You'd keep these variables in some MIPS registers of your choice
int c, i, count, max_count=0;
char max_char = 0;
// Iterate over all ASCII character codes
for (c = 0; c < 128; c+=1) {
    count = 0;
    // Count the number of occurrences of this character in the string
    for (i = 0; string[i]!=0; i+=1) {
        if (string[i] == c) count++;
    }
    // Was it greater than the current max?
    if (count > max_count) {
        max_count = count;
        max_char = c;
    }
}
// max_char now holds the ASCII code of the character with the highest number
// of occurrences, and max_count holds the number of times that character was
// found in the string.

@Michael, I saw you answered before I posted; I just want to repeat that with a more detailed answer. If you edit your own to add some more explanations, then I will delete mine. I did not edit yours directly, because I was already halfway there when you posted. Anyway:
@Marco:
You can create a temporary array of 26 counters (initialized to 0).
Each counter corresponds to one letter (i.e. the number of times that letter occurs). For example, counter[0] holds the number of occurrences of the letter 'a', counter[1] of the letter 'b', etc...
Then iterate over each character in the input character sequence and for each character do:
a) Obtain the index of the character in the counter array.
b) Increase counter["obtained index"] by 1.
To obtain the index of the character you can do the following:
a) First make sure the character is not a capital, i.e. only 'a' to 'z' allowed and not 'A' to 'Z'. If it is a capital, convert it to lowercase.
b) Subtract the letter 'a' from the character. This way 'a'-'a' gives 0, 'b'-'a' gives 1, 'c'-'a' gives 2, etc...
I will demonstrate in C, since writing the MIPS version is your exercise (I mean the goal is to learn MIPS assembly language):
#include <stdio.h>
int main()
{
//Maximum length of string:
int stringMaxLength = 100;
//Create string in stack. Size of string is length+1 to
//allow the '\0' character to mark the end of the string.
char str[stringMaxLength + 1];
//Read a line of at most stringMaxLength characters (spaces included):
puts("Enter string:");
if (fgets(str, stringMaxLength + 1, stdin) == NULL)
return 1; //No input could be read.
//Create array of counters in stack:
int counter[26];
//Initialize the counters to 0:
int i;
for (i=0; i<26; ++i)
counter[i] = 0;
//Main counting loop:
for (i=0; str[i] != '\0'; ++i)
{
char tmp = str[i]; //Storing of str[i] in tmp, to write tmp if needed,
//instead of writing str[i] itself. Optional operation in this particular case.
if (tmp >= 'A' && tmp <= 'Z') //If the current character is upper:
tmp = tmp + 32; //Convert the character to lower.
if (tmp >= 'a' && tmp <='z') //If the character is a lower letter:
{
//Obtain the index of the letter in the array:
int index = tmp - 'a';
//Increment its counter by 1:
counter[index] = counter[index] + 1;
}
//Else, if the character is not a lowercase letter by now, we ignore it,
//or we could inform the user, for example, or we could reject the
//whole string as invalid.
}
//Now find the maximum number of occurrences of a letter:
int indexOfMaxCount = 0;
int maxCount = counter[0];
for (i=1; i<26; ++i)
if (counter[i] > maxCount)
{
maxCount = counter[i];
indexOfMaxCount = i;
}
//Convert the indexOfMaxCount back to the character it corresponds to:
char maxChar = 'a' + indexOfMaxCount;
//Inform the user of the letter with the most occurrences:
printf("Maximum %d occurrences for letter '%c'.\n", maxCount, maxChar);
return 0;
}
If you don't understand why I convert an uppercase letter to lowercase by adding 32, then read on:
Each character corresponds to an integer value in memory, and when you perform arithmetic on characters, you are really operating on their corresponding numbers in the encoding table.
An encoding is just a table that matches letters with numbers.
For example, 'a' corresponds to number 97 in the ASCII table.
For example, 'b' corresponds to number 98 in the ASCII table, and 'A' corresponds to 65, which is exactly 32 less than 'a'.
So 'a'+1 gives 97+1=98, which is the character 'b', and 'A'+32 gives 65+32=97, which is 'a'. They are all numbers in memory, and the difference is how you represent (decode) them. The same encoding table is also used for decoding, of course.
Examples:
printf("%c", 'a'); //Prints 'a'.
printf("%d", (int) 'a'); //Prints '97'.
printf("%c", (char) 97); //Prints 'a'.
printf("%d", 97); //Prints '97'.
printf("%d", (int) 'b'); //Prints '98'.
printf("%c", (char) (97 + 1)); //Prints 'b'.
printf("%c", (char) ( ((int) 'a') + 1 ) ); //Prints 'b'.
//Etc...
//All the casting in the above examples is just for demonstration,
//it would work without them also, in this case.

Counter for two binary strings C++

I am trying to count two binary numbers given as strings. Each number can have at most 253 digits. Short numbers work, but when I enter longer numbers, the output is wrong. An example of input with a bad result is "10100101010000111111" with "000011010110000101100010010011101010001101011100000000111000000000001000100101101111101000111001000101011010010111000110".
#include <iostream>
#include <stdlib.h>
using namespace std;
bool isBinary(string b1,string b2);
int main()
{
string b1,b2;
long binary1,binary2;
int i = 0, remainder = 0, sum[254];
cout<<"Get two binary numbers:"<<endl;
cin>>b1>>b2;
binary1=atol(b1.c_str());
binary2=atol(b2.c_str());
if(isBinary(b1,b2)==true){
while (binary1 != 0 || binary2 != 0){
sum[i++] =(binary1 % 10 + binary2 % 10 + remainder) % 2;
remainder =(binary1 % 10 + binary2 % 10 + remainder) / 2;
binary1 = binary1 / 10;
binary2 = binary2 / 10;
}
if (remainder != 0){
sum[i++] = remainder;
}
--i;
cout<<"Result: ";
while (i >= 0){
cout<<sum[i--];
}
cout<<endl;
}else cout<<"Wrong input"<<endl;
return 0;
}
bool isBinary(string b1,string b2){
bool rozhodnuti1,rozhodnuti2;
for (int i = 0; i < b1.length();i++) {
if (b1[i]!='0' && b1[i]!='1') {
rozhodnuti1=false;
break;
}else rozhodnuti1=true;
}
for (int k = 0; k < b2.length();k++) {
if (b2[k]!='0' && b2[k]!='1') {
rozhodnuti2=false;
break;
}else rozhodnuti2=true;
}
if(rozhodnuti1==false || rozhodnuti2==false){ return false;}
else{ return true;}
}

One of the problems might be here: sum[i++]
This expression, as it is, first returns the value of i and then increases it by one.
Did you do it on purpose?
If not, the pre-increment form is ++i. Note, though, that sum[i++] is what fills the array starting at sum[0], which is what the printing loop expects.
It'd help if you could also post the "bad" output, so that we can try to work backward through the code starting from it.
EDIT 2015-11-7_17:10
Just to be sure everything was correct, I've added a cout to check what binary1 and binary2 contain after you assign them the result of the atol function: they contain the integer numbers 547284487 and 18333230, which obviously don't represent the digits of the two 01 strings you presented in your post.
Probably they somehow exceed the capacity of atol.
Also, your "math" operations lead to an even stranger result, 6011111101, which obviously doesn't make any sense.
What do you mean, exactly, when you say you want to count these two numbers? Maybe you want to compute their sum? I guess that's it.
But then, again, what you've got there is two signed integers and not two binaries, which means those %10 and %2 operations are (probably) misused.
EDIT 2015-11-07_17:20
I've tried your program with small binary strings and it actually works, but only with small binary strings.
It's a fact(?), at this point, that atol can't handle numerical strings that long.
My suggestion: use char arrays instead of strings and replace the '0' and '1' characters with their numerical values (bin1[i] = bin1[i] - '0';), with which you'll be able to perform all the math operations you want (you've already written a working sum function, after all).
Once done with the math, you can just convert the char array back to the actual characters '0' and '1' and cout it on the screen.
EDIT 2015-11-07_17:30
Tested atol on my own: it correctly converts only strings that are up to 10 characters long.
That matches a 32-bit long, whose maximum value, 2147483647, has 10 digits. Anything longer does not fit, and the result of atol is then undefined.
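The digit-by-digit approach suggested above can be sketched as follows. This is an illustration in plain C rather than a drop-in fix for the C++ program, and the name add_binary and its buffer contract are assumptions made for the example. The numbers are never converted to an integer type, so their length is limited only by the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: adds two strings of '0'/'1' digits.
 * result must have room for max(strlen(a), strlen(b)) + 2 bytes. */
void add_binary(const char *a, const char *b, char *result)
{
    assert(a != NULL && b != NULL && result != NULL);
    size_t la = strlen(a), lb = strlen(b);
    size_t n = (la > lb ? la : lb) + 1; /* one extra digit for the final carry */
    int carry = 0;

    result[n] = '\0';
    for (size_t k = 0; k < n; k++) {
        /* k counts digits from the right; missing digits count as 0 */
        int da = k < la ? a[la - 1 - k] - '0' : 0;
        int db = k < lb ? b[lb - 1 - k] - '0' : 0;
        int sum = da + db + carry;
        result[n - 1 - k] = (char) ('0' + sum % 2);
        carry = sum / 2;
    }
    /* Drop the extra leading digit if the final carry was 0. */
    if (n > 1 && result[0] == '0')
        memmove(result, result + 1, n);
}
```

Input validation (the job of isBinary in the question) is left out here, and leading zeros in the inputs are carried over into the result except for the one extra carry digit.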

Generating a fake ISBN from book title? (Or: How to hash a string into a 6-digit numeric ID)

Short version: How can I turn an arbitrary string into a 6-digit number with minimal collisions?
Long version:
I'm working with a small library that has a bunch of books with no ISBNs. These are usually older, out-of-print titles from tiny publishers that never got an ISBN to begin with, and I'd like to generate fake ISBNs for them to help with barcode scanning and loans.
Technically, real ISBNs are controlled by commercial entities, but it is possible to use the format to assign numbers that belong to no real publisher (and so shouldn't cause any collisions).
The format is such that:
978-0-01-######-?
Gives you 6 digits to work with, from 000000 to 999999, with the ? at the end being a checksum.
Would it be possible to turn an arbitrary book title into a 6-digit number in this scheme with minimal chance of collisions?

After using code snippets for making a fixed-length hash and calculating the ISBN-13 checksum, I managed to create really ugly C# code that seems to work. It'll take an arbitrary string and convert it into a valid (but fake) ISBN-13:
public int GetStableHash(string s)
{
uint hash = 0;
// if you care this can be done much faster with unsafe
// using fixed char* reinterpreted as a byte*
foreach (byte b in System.Text.Encoding.Unicode.GetBytes(s))
{
hash += b;
hash += (hash << 10);
hash ^= (hash >> 6);
}
// final avalanche
hash += (hash << 3);
hash ^= (hash >> 11);
hash += (hash << 15);
// helpfully we only want positive integer < MUST_BE_LESS_THAN
// (presumably a constant defined elsewhere; 1000000 for a 6-digit ID)
// so simple truncate cast is ok if not perfect
return (int)(hash % MUST_BE_LESS_THAN);
}
public int CalculateChecksumDigit(ulong n)
{
string sTemp = n.ToString();
int iSum = 0;
int iDigit = 0;
// Calculate the checksum digit here.
for (int i = sTemp.Length; i >= 1; i--)
{
iDigit = Convert.ToInt32(sTemp.Substring(i - 1, 1));
// This appears to be backwards but the
// EAN-13 checksum must be calculated
// this way to be compatible with UPC-A.
if (i % 2 == 0)
{ // odd
iSum += iDigit * 3;
}
else
{ // even
iSum += iDigit * 1;
}
}
return (10 - (iSum % 10)) % 10;
}
private void generateISBN()
{
string titlehash = GetStableHash(BookTitle.Text).ToString("D6");
string fakeisbn = "978001" + titlehash;
string check = CalculateChecksumDigit(Convert.ToUInt64(fakeisbn)).ToString();
SixDigitID.Text = fakeisbn + check;
}

Six digits allow for 1,000,000 possible values (000000 to 999999), which should be enough for most internal uses.
I would use a sequence instead in this case, because a 6-digit hash has a relatively high chance of collisions (by the birthday bound, roughly 1,200 titles already give about a 50% chance of at least one collision).
So you can insert all the titles into a hash table and use their index numbers as the ISBN digits, either after sorting or without it.
This makes collisions almost impossible, but it requires keeping track of the "allocated" ISBNs to avoid collisions in the future, along with the list of titles that are already in store. That is information you would most probably want to keep anyway.
Another option is to break the ISBN standard and use hexadecimal/uuencoded barcodes, which may increase the possible range to the point where a cryptographic hash truncated to fit would work.
Since you are handling old book titles, which may have several editions capitalized and punctuated differently, I would suggest stripping punctuation, collapsing duplicated whitespace, and converting everything to lowercase before the comparison. This minimizes the chance of a technical duplicate, where the strings differ but the title is the same. (If you want different editions to have different ISBNs, you can ignore this paragraph.)
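The normalization step could be sketched like this, in C for illustration (the question's own code is C#). The function name normalize_title is hypothetical, and the sketch assumes ASCII titles in the default C locale: bytes outside ASCII are simply dropped, which a real catalogue would probably want to handle differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: writes a normalized copy of title into out, which
 * must hold strlen(title) + 1 bytes. Letters are lowercased, punctuation
 * is removed, and runs of whitespace collapse to a single space. */
void normalize_title(const char *title, char *out)
{
    assert(title != NULL && out != NULL);
    size_t n = 0;
    int pending_space = 0;

    for (size_t i = 0; title[i] != '\0'; i++) {
        unsigned char ch = (unsigned char) title[i];
        if (isspace(ch)) {
            pending_space = (n > 0); /* never emit a leading space */
        } else if (isalnum(ch)) {
            if (pending_space)
                out[n++] = ' ';
            pending_space = 0;
            out[n++] = (char) tolower(ch);
        }
        /* anything else (punctuation) is dropped */
    }
    out[n] = '\0';
}
```

Two titles that differ only in capitalization, punctuation, or spacing then compare equal with strcmp, so they would be assigned the same number.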

Finding the number of permutations for a three letter string with ABC and 123

I know from Algebra class that with ABC and 123 we can make 216 different permutations for a three letter string, right? (6 x 6 x 6) I'd like to create a console program in C++ that displays every possible permutation for the example above. The thing is, how would I even begin to enumerate them? Perhaps:
AAA
BAA
CAA
1BA
2BA
3CA
1AB
2BC
3CA
etc.
This is really hard to ask, but what would I have to do to ensure that I include every permutation? I know there are 216, but I don't know how to actually go through all of them.
Any suggestions would be greatly appreciated!!!

If the strings have a fixed length, you can use N nested loops (three in your case).
string parts = "ABC123";
for (int i = 0 ; i != parts.size() ; i++)
for (int j = 0 ; j != parts.size() ; j++)
for (int k = 0 ; k != parts.size() ; k++)
cout << parts[i] << parts[j] << parts[k] << endl;
If N is not fixed, you would need a more general recursive solution.

It's really easy to do using recursion. Provided you have an array of all six elements, here's Java code to do it. I am sure you can translate it to C++ easily.
void getAllCombinations(List<String> output, char[] chrs, String prefix, int length) {
if (prefix.length() == length) {
output.add(prefix);
} else {
for (int i = 0;i < chrs.length;i++) {
getAllCombinations(output, chrs, prefix + chrs[i], length);
}
}
return;
}
This is not perfect, but it should give you the general idea.
Run it with parameters: empty list, array of available characters, empty string and length of desired strings.

With three nested loops (one per character position), each iterating over the 6 allowed characters, it's hard not to see that every possible combination has a corresponding set of loop indices, and that every set of legal loop indices has a corresponding 3-letter string. That 1-1 correspondence between loop indices and strings is what you're looking for, I gather.
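One way to make that correspondence concrete, as a sketch in C with a made-up helper name nth_string: treat the three loop indices as the digits of a single base-6 number n, so that counting n from 0 to 215 visits every string exactly once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills out (len + 1 bytes) with the n-th string of
 * length len over alphabet, reading n as a number in base strlen(alphabet). */
void nth_string(unsigned long n, const char *alphabet, size_t len, char *out)
{
    size_t base = strlen(alphabet);
    assert(base > 0);

    /* Fill from the rightmost position: it is the least significant digit. */
    for (size_t pos = len; pos-- > 0; ) {
        out[pos] = alphabet[n % base];
        n /= base;
    }
    out[len] = '\0';
}
```

Calling nth_string(n, "ABC123", 3, buf) for n = 0, 1, ..., 215 yields the same strings as the three nested loops, in the same order.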
