Append Char To StringBuilder C++/CLI - string

I am trying to use StringBuilder to build a log-file line showing the output that is sent over the serial port. The output is stored in a byte array, and I am looping over it.
ref class UART_G {
public:
static array<System::Byte>^ message = nullptr;
static uint8_t message_length = 0;
};
static void logSend ()
{
StringBuilder^ outputsb = gcnew StringBuilder();
outputsb->Append("Sent ");
for (uint8_t i = 0; i < UART_G::message_length; i ++)
{
unsigned char mychar = UART_G::message[i];
if (
(mychar >= ' ' && mychar <= 'Z') || //Includes 0-9, A-Z.
(mychar >= '^' && mychar <= '~') || //Includes a-z.
(mychar >= 128 && mychar <= 254)) //I think these are okay.
{
outputsb->Append(L""+mychar);
}
else
{
outputsb->Append("[");
outputsb->Append(mychar);
outputsb->Append("]");
}
}
log_line(outputsb->ToString());
}
I want all plain text characters (eg A, :) to be sent as text, while functional characters (eg BEL, NEWLINE) will be sent like [7][13].
What is happening is that the StringBuilder, in all cases, is outputting the character as a number. For example, A is being sent out as 65.
For example, if I have the string 'APPLE' and a newline in my byte array, I want to see:
Sent APPLE[13]
Instead, I see:
Sent 6580807669[13]
I have tried every way imaginable to get it to display the character properly, including type-casting, concatenating it to a string, changing the variable type, etc. I would really appreciate it if anyone could tell me how to do this. My log files are largely unreadable without this function.

You're getting the ASCII values because the compiler is choosing one of the Append overloads that takes an integer of some sort. To fix this, you could do an explicit cast to System::Char, to force the correct overload.
However, that won't necessarily give the proper results for 128-255. You could cast a value in that range from Byte to Char, and it'll give something, but not necessarily what you expect. First off, 0x80 through 0x9F are control characters in Unicode, and wherever you're getting the bytes from might not intend the same representation for 0xA0 through 0xFF as Unicode has.
In my opinion, the best solution would be to use the "[value]" syntax that you're using for the other control characters for 0x80 through 0xFF as well. However, if you do want to convert those to characters, I'd use Encoding::Default, not Encoding::ASCII. ASCII only defines 0x00 through 0x7F, so 0x80 and higher will come out as "?". Encoding::Default is whatever code page is defined for the language you have selected in Windows.
Combine all that, and here's what you'd end up with:
for (uint8_t i = 0; i < UART_G::message_length; i++)
{
    unsigned char mychar = UART_G::message[i];
    if (mychar >= ' ' && mychar <= '~' && mychar != '[' && mychar != ']')
    {
        // Use the character directly for all ASCII printable characters,
        // except '[' and ']', because those have a special meaning, below.
        outputsb->Append((System::Char)(mychar));
    }
    else if (mychar >= 128)
    {
        // Non-ASCII characters, use the default encoding to convert to Unicode.
        outputsb->Append(Encoding::Default->GetChars(UART_G::message, i, 1));
    }
    else
    {
        // Unprintable characters, use the byte value in brackets.
        // Also do this for bracket characters, so there's no ambiguity
        // what a bracket means in the logs.
        outputsb->Append("[");
        outputsb->Append((unsigned int)mychar);
        outputsb->Append("]");
    }
}
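The code above shows the Encoding::Default variant. For the other option mentioned, where 0x80 through 0xFF also get the "[value]" treatment, here is a rough sketch of just the escaping rule in plain C rather than C++/CLI. The function name format_log_bytes and its buffer-based signature are made up for illustration; they are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post): formats `len` bytes of
   `msg` into `out` (capacity `cap`). Printable ASCII is copied as-is;
   everything else, including '[' and ']', becomes a decimal value in
   brackets. Returns the number of characters written. */
size_t format_log_bytes(const unsigned char *msg, size_t len,
                        char *out, size_t cap)
{
    size_t used = 0;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = msg[i];
        char piece[8]; /* "[255]" plus the terminator fits easily */
        int n;
        if (ch >= ' ' && ch <= '~' && ch != '[' && ch != ']')
            n = snprintf(piece, sizeof piece, "%c", ch);
        else
            n = snprintf(piece, sizeof piece, "[%u]", (unsigned)ch);
        if (used + (size_t)n >= cap)
            break; /* stop rather than overflow the output buffer */
        memcpy(out + used, piece, (size_t)n + 1);
        used += (size_t)n;
    }
    return used;
}
```

With the bytes 'A' 'P' 'P' 'L' 'E' 13 as input, this should produce APPLE[13], and a literal '[' in the data shows up as [91], so a bracket in the log always means an escaped byte.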

You are receiving the ASCII values of the characters in the string.
See the ASCII chart:
65 = A
80 = P
80 = P
76 = L
69 = E
Just write a function that converts each ASCII value to a character.

Here is the code I came up with which resolved the issue:
static void logSend ()
{
StringBuilder^ outputsb = gcnew StringBuilder();
ASCIIEncoding^ ascii = gcnew ASCIIEncoding;
outputsb->Append("Sent ");
for (uint8_t i = 0; i < UART_G::message_length; i ++)
{
unsigned char mychar = UART_G::message[i];
if (
(mychar >= ' ' && mychar <= 'Z') || //Includes 0-9, A-Z.
(mychar >= '^' && mychar <= '~') || //Includes a-z.
(mychar >= 128 && mychar <= 254)) //I think these are okay.
{
outputsb->Append(ascii->GetString(UART_G::message, i, 1));
}
else
{
outputsb->Append("[");
outputsb->Append(mychar);
outputsb->Append("]");
}
}
log_line(outputsb->ToString());
}
I still appreciate any alternatives which are more efficient or simpler to read.

Related

Last character of a multibyte string

One of the things I often need to do when handling a multibyte string is deleting its last character. How do I locate this last character so I can chop it off using normal byte operations, preferably with as few reads as possible?
Note that this question is intended to work for most, if not all, multibyte encodings. The answer for self-synchronizing encodings like UTF-8 is trivial, as you can just go right-to-left in the bytestring looking for a start marker.
The answer will be written in C, with POSIX multibyte functions. These functions are also found on Windows. Assume that the bytestring ends at len and is well-formed up to that point; assume appropriate setlocale calls. Porting to mbrlen is left as an exercise for the reader.
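For reference, the "trivial" UTF-8 case could look something like the sketch below. It assumes the input is well-formed UTF-8, and the function name utf8_last_char_index is invented here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte index where the last UTF-8
   character starts, or -1 for an empty string. Assumes well-formed UTF-8. */
long utf8_last_char_index(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    if (len == 0)
        return -1;
    size_t pos = len - 1;
    /* Continuation bytes look like 10xxxxxx; step left past them
       until a start byte (or the beginning of the string) is reached. */
    while (pos > 0 && ((unsigned char)s[pos] & 0xC0) == 0x80)
        pos--;
    return (long)pos;
}
```

This reads at most as many bytes as the last character is long, which is what makes the self-synchronizing case easy compared with the encodings discussed next.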
The naive solution
The obviously correct solution involves parsing the encoding "as intended", going from left-to-right.
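#include <stdlib.h>    // mblen
#include <sys/types.h> // ssize_t
ssize_t index_of_last_char_left(const char *c, size_t len) {
    ssize_t last = -1; // Start of the last complete character; -1 if the string is empty
    size_t pos = 0;
    mblen(NULL, 0); // Reset the conversion state
    while (pos < len) {
        int next = mblen(c + pos, len - pos);
        if (next < 0) // Invalid input
            return pos;
        if (next == 0) // Embedded NUL byte: count it as one character
            next = 1;
        last = pos; // Remember where this character started
        pos += next;
    }
    return last;
}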
Deleting multiple characters like this will cause an "accidentally quadratic" situation; caching intermediate positions will help, but it requires additional bookkeeping.
The right-to-left solution
As I mentioned in the question, for self-synchronizing encodings the only thing to do is to look for a start marker. But what breaks with the ones that don't self-synchronize?
The one-or-two-byte EUC encodings have both bytes of the two-byte sequence higher than 0x7f, and there's almost no way to differentiate between start and continuation bytes. For that we can check for mblen(pos) == bytes_left since we know the string is well-formed.
The Big5, GBK, and GB18030 encodings also allow a continuation byte in the ASCII range, so a lookbehind is mandatory.
With that cleared up (and assuming the bytestring up to len is well-formed), we can have:
// As much as CJK encodings do. I don't have time to see if it works for UTF-1.
#define MAX_MB_LEN 4
ssize_t index_of_last_char_right(const char *c, size_t len) {
ssize_t pos = len - 1;
bool last = true;
bool last_is_okay = false;
assert(!mblen(NULL, 0)); // No, we really cannot handle shift states.
for (; pos >= 0 && pos >= len - 2 - MAX_MB_LEN; pos--) {
int next = mblen(c + pos, len - pos);
bool okay = (next > 0) && (next == len - pos - 1);
if (last) {
last_is_okay = okay;
last = false;
} else if (okay)
return pos;
}
return last_is_okay ? len - 1 : -1;
}
(You should be able to find the last good char of a malformed string by (next > 0) && (next <= len - pos - 1). But don't return that when the last byte is okay!)
What's the point of this?
The code sample above is for the idealist who does not want to write just "UTF-8 support" but "locale support" based on the C library. There might not be a point to this at all in 2021 :)

How do I convert a string to lowercase?

I don't know how to convert a whole word to lowercase in CS50. I have to convert words to lowercase in order to check them properly.
Below is my code so far:
bool check(const char *word) {
char *lword[strlen(word)];
for (i = 0; i < strlen(word); i++) {
lword[i] = tolower(int
word[i]);
}
node *current;
int hashnum = hash(word);
if (table[hashnum] == NULL)
return false;
current = table[hashnum];
while (current->next != NULL) {
if (strcmp(current->word, word) == 0)
return true;
else
current = current->next;
}
return false;
}
The declaration char *lword[strlen(word)]; is a problem. It declares lword as an array of char * (that is, an array of strings). An array of chars would be more appropriate. (Also, the compiler would probably complain when lword is passed as an argument to the hash function.) Don't forget to declare the lowercase word large enough to accommodate the null terminator, and to null-terminate it. Don't forget to pass the lowercase word to hash, instead of the original word.
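A minimal sketch of those points, assuming a caller-supplied buffer; the helper name to_lower_copy is invented here and is not part of the CS50 distribution code:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes a lowercase copy of `word` into `out`,
   which has room for `cap` bytes including the null terminator. */
void to_lower_copy(const char *word, char *out, size_t cap)
{
    assert(word != NULL && out != NULL && cap > 0);
    size_t n = strlen(word);
    if (n >= cap)
        n = cap - 1; /* truncate rather than overflow */
    for (size_t i = 0; i < n; i++)
        out[i] = (char)tolower((unsigned char)word[i]);
    out[n] = '\0'; /* null-terminate the copy */
}
```

Inside check, you would declare something like char lword[strlen(word) + 1], fill it with this helper, and then pass lword (not word) both to hash and to strcmp.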

Extra characters and symbols outputted when doing substitution in C

When I run the code using the following key, extra characters are outputted...
TERMINAL WINDOW:
$ ./substitution abcdefghjklmnopqrsTUVWXYZI
plaintext: heTUXWVI ii ssTt
ciphertext: heUVYXWJ jj ttUuh|
These are the instructions (CS50 substitution problem):
Design and implement a program, substitution, that encrypts messages using a substitution cipher.
Implement your program in a file called substitution.c in a ~/pset2/substitution directory.
Your program must accept a single command-line argument, the key to use for the substitution. The key itself should be case-insensitive, so whether any character in the key is uppercase or lowercase should not affect the behavior of your program.
If your program is executed without any command-line arguments or with more than one command-line argument, your program should print an error message of your choice (with printf) and return from main a value of 1 (which tends to signify an error) immediately.
If the key is invalid (as by not containing 26 characters, containing any character that is not an alphabetic character, or not containing each letter exactly once), your program should print an error message of your choice (with printf) and return from main a value of 1 immediately.
Your program must output plaintext: (without a newline) and then prompt the user for a string of plaintext (using get_string).
Your program must output ciphertext: (without a newline) followed by the plaintext’s corresponding ciphertext, with each alphabetical character in the plaintext substituted for the corresponding character in the ciphertext; non-alphabetical characters should be outputted unchanged.
Your program must preserve case: capitalized letters must remain capitalized letters; lowercase letters must remain lowercase letters.
After outputting ciphertext, you should print a newline. Your program should then exit by returning 0 from main.
My code:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(int argc,string argv[])
{
char alpha[26] = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};
string key = argv[1];
int totalchar = 0;
for (char c ='a'; c <= 'z'; c++)
{
for (int i = 0; i < strlen(key); i++)
{
if (tolower(key[i]) == c)
{
totalchar++;
}
}
}
//accept only singular 26 key
if (argc == 2 && totalchar == 26)
{
string plaint = get_string("plaintext: ");
int textlength =strlen(plaint);
char subchar[textlength];
for (int i= 0; i< textlength; i++)
{
for (int j =0; j<26; j++)
{
// substitute
if (tolower(plaint[i]) == alpha[j])
{
subchar[i] = tolower(key[j]);
// keep plaintext's case
if (plaint[i] >= 'A' && plaint[i] <= 'Z')
{
subchar[i] = (toupper(key[j]));
}
}
// if isn't char
if (!(isalpha(plaint[i])))
{
subchar[i] = plaint[i];
}
}
}
printf("ciphertext: %s\n", subchar);
return 0;
}
else
{
printf("invalid input\n");
return 1;
}
}
strcmp compares two strings. plaint[i] and alpha[j] are chars. They can be compared with "regular" comparison operators, like ==.
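Separately, the code as posted fills subchar with exactly textlength characters and then prints it with %s without ever writing a null terminator, which would likely explain the stray characters after the ciphertext. A sketch of the substitution step with the terminator in place follows; the function name substitute is made up for illustration, and it assumes an ASCII-style contiguous alphabet and an already validated 26-letter key.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encrypts `plain` with a validated 26-letter `key`.
   `out` must have room for strlen(plain) + 1 bytes. */
void substitute(const char *key, const char *plain, char *out)
{
    assert(key != NULL && plain != NULL && out != NULL);
    assert(strlen(key) == 26);
    size_t n = strlen(plain);
    for (size_t i = 0; i < n; i++) {
        char ch = plain[i];
        if (ch >= 'A' && ch <= 'Z')
            out[i] = (char)toupper((unsigned char)key[ch - 'A']);
        else if (ch >= 'a' && ch <= 'z')
            out[i] = (char)tolower((unsigned char)key[ch - 'a']);
        else
            out[i] = ch; /* non-letters pass through unchanged */
    }
    out[n] = '\0'; /* without this, printf("%s") reads past the end */
}
```

In main, that means declaring the buffer as char subchar[textlength + 1] so the extra byte for the terminator exists.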

MIPS, Number of occurrences in a string located in the stack

I have an exercise to solve in MIPS assembly. Some parts are clear to me, but I am having trouble writing the code. The exercise asks:
Write a program that, given a string read from the keyboard, counts the occurrences of the most frequent character and shows that character.
How can I check all 26 letters and find the one with the most occurrences?
Example:
Give me a string: Hello world!
The character with the most occurrences is: l
Thanks a lot in advance for any answers.
P.S.
This is the first part of my program:
#First message
li $v0, 4
la $a0, mess
syscall
#Stack space allocated
addi $sp, $sp, -257
#Read the string
move $a0, $sp
li $a1, 257
li $v0, 8
syscall
Since this is your assignment I'll leave the MIPS assembly implementation to you. I'll just show you the logic for the code in a higher-level language:
// You'd keep these variables in some MIPS registers of your choice
int c, i, count, max_count = 0;
char max_char = 0;
// Iterate over all ASCII character codes
for (c = 0; c < 128; c += 1) {
    count = 0;
    // Count the number of occurrences of this character in the string
    for (i = 0; string[i] != 0; i += 1) {
        if (string[i] == c) count++;
    }
    // Was it greater than the current max?
    if (count > max_count) {
        max_count = count;
        max_char = c;
    }
}
// max_char now holds the ASCII code of the character with the highest number
// of occurrences, and max_count holds the number of times that character was
// found in the string.
@Michael, I saw you answered before I posted; I just want to repeat that with a more detailed answer. If you edit your own to add some more explanation, I will delete mine. I did not edit yours directly, because I was already halfway there when you posted. Anyway:
@Marco:
You can create a temporary array of 26 counters (initialized to 0).
Each counter corresponds to one letter and holds the number of times that letter occurs. For example, counter[0] holds the number of occurrences of the letter 'a', counter[1] of the letter 'b', etc.
Then iterate over each character in the input character-sequence and for each character do:
a) Obtain the index of the character in the counter array.
b) Increase counter["obtained index"] by 1.
To obtain the index of the character you can do the following:
a) First make sure the character is lowercase, i.e. only 'a' to 'z' is allowed and not 'A' to 'Z'. If it is uppercase, convert it to lowercase.
b) Subtract the letter 'a' from the character. This way 'a'-'a' gives 0, 'b'-'a' gives 1, 'c'-'a' gives 2, etc.
I will demonstrate in C, because the MIPS version is your exercise (the goal is for you to learn MIPS assembly):
#include <stdio.h>
int main()
{
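//Maximum length of string:
int stringMaxLength = 100;
//Create string in stack. Size of string is length+1 to
//allow the '\0' character to mark the end of the string.
char str[stringMaxLength + 1];
//Read a line of at most stringMaxLength characters (spaces included).
//Note: in scanf, "%*s" means "skip a string", not "use this width",
//and fflush(stdin) is undefined behavior, so fgets is used instead.
puts("Enter string:");
if (fgets(str, stringMaxLength + 1, stdin) == NULL)
return 1;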
//Create array of counters in stack:
int counter[26];
//Initialize the counters to 0:
int i;
for (i=0; i<26; ++i)
counter[i] = 0;
//Main counting loop:
for (i=0; str[i] != '\0'; ++i)
{
char tmp = str[i]; //Storing of str[i] in tmp, to write tmp if needed,
//instead of writing str[i] itself. Optional operation in this particular case.
if (tmp >= 'A' && tmp <= 'Z') //If the current character is upper:
tmp = tmp + 32; //Convert the character to lower.
if (tmp >= 'a' && tmp <='z') //If the character is a lower letter:
{
//Obtain the index of the letter in the array:
int index = tmp - 'a';
//Increment its counter by 1:
counter[index] = counter[index] + 1;
}
//Else if the character is not a lower letter by now, we ignore it,
//or we could inform the user, for example, or we could ignore the
//whole string itself as invalid..
}
//Now find the maximum occurrences of a letter:
int indexOfMaxCount = 0;
int maxCount = counter[0];
for (i=1; i<26; ++i)
if (counter[i] > maxCount)
{
maxCount = counter[i];
indexOfMaxCount = i;
}
//Convert the indexOfMaxCount back to the character it corresponds to:
char maxChar = 'a' + indexOfMaxCount;
//Inform the user of the letter with maximum occurrences:
printf("Maximum %d occurrences for letter '%c'.\n", maxCount, maxChar);
return 0;
}
If you don't understand why I convert an uppercase letter to lowercase by adding 32, then read on:
Each character corresponds to an integer value in memory, and when you perform arithmetic on characters, you are really operating on their corresponding numbers in the encoding table.
An encoding is just a table that maps letters to numbers.
For example, 'a' corresponds to the number 97 in the ASCII table.
For example, 'b' corresponds to the number 98 in the ASCII table.
So 'a'+1 gives 97+1=98, which is the character 'b'. Likewise, 'A' is 65, so 'A'+32 gives 97, which is 'a'. They are all numbers in memory; the difference is in how you represent (decode) them. The same encoding table is, of course, also used for decoding.
Examples:
printf("%c", 'a'); //Prints 'a'.
printf("%d", (int) 'a'); //Prints '97'.
printf("%c", (char) 97); //Prints 'a'.
printf("%d", 97); //Prints '97'.
printf("%d", (int) 'b'); //Prints '98'.
printf("%c", (char) (97 + 1)); //Prints 'b'.
printf("%c", (char) ( ((int) 'a') + 1 ) ); //Prints 'b'.
//Etc...
//All the casting in the above examples is just for demonstration,
//it would work without them also, in this case.

Pipe Read Processing

I have to get input from a user, put that into a pipe (in the parent process), and then process the string in the child. All uppercase letters need to become lowercase and all lowercase letters must become uppercase. My issue is with the output of the pipe. My code will only change the letter case of the first character in the string and I am not sure why. The child is reading through all the characters (at least it appears to be). I was hoping someone could tell me why this won't process each character.
while (read(pfd[0], &buf, strlen(cmd)) > 0){
if(buf >= 'a' && buf <= 'z'){
buf = toupper(buf);
}
else{
buf = tolower(buf);
}
}
write(STDOUT_FILENO, &buf, strlen(cmd));
You are making two common mistakes.
(1) read does not buffer for you, so you are not guaranteed to get len bytes (i.e. strlen(cmd) in your case). read will return whatever number of bytes it has available, up to the length you specify, but it can and often will return fewer. So you want to change your read loop to reflect that.
(2) buf is presumably a char array. You are always changing the first byte and only the first byte. You need to iterate over all the bytes you just read.
So putting it all together, something like:
while ((bytesread = read(pfd[0], buf, strlen(cmd))) > 0)
{
    for (int i = 0; i < bytesread; ++i)
    {
        if (buf[i] >= 'a' && buf[i] <= 'z')
            buf[i] = toupper(buf[i]);
        else
            buf[i] = tolower(buf[i]);
    }
    write(STDOUT_FILENO, buf, bytesread);
}
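If it helps readability, the per-byte case flip can be pulled out of the read loop into a small function and called as swap_case(buf, bytesread) after each successful read. This is only a sketch; the name swap_case is invented here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: flips the case of every letter in buf[0..n). */
void swap_case(char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned char ch = (unsigned char)buf[i];
        if (islower(ch))
            buf[i] = (char)toupper(ch);
        else if (isupper(ch))
            buf[i] = (char)tolower(ch);
        /* digits, spaces and punctuation are left as they are */
    }
}
```

Taking the length as a parameter, instead of relying on a terminator, matters here because the bytes returned by read are not null-terminated.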
