Parsing a string with varying number of whitespace characters in C - string

I'm pretty new to C, and trying to write a function that will parse a string such as:
"This (5 spaces here) is (1 space
here) a (2 spaces here) string."
The function header would have a pointer to the string passed in such as:
bool Class::Parse( unsigned char* string )
In the end I'd like to parse each word regardless of the number of spaces between words, and store the words in a dynamic array.
Forgive the silly questions...
But what would be the most efficient way to do this if I am iterating over each character? Is that how strings are stored? So if I was to start iterating with:
while ( (*string) != '\0' ) {
--print *string here--
}
Would that be printing out
T
h
i... etc?
Thank you very much for any help you can provide.

from http://www.cplusplus.com/reference/clibrary/cstring/strtok/
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-"); /* split the string on these delimiters into "tokens" */
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-"); /* split the string on these delimiters into "tokens" */
}
return 0;
}
Splitting string "- This, a sample string." into tokens:
This
a
sample
string

First of all, C does not have classes, so in a C program you would probably define your function with a prototype more like one of the following:
char ** my_prog_parse(char * string) {
/* (returns a malloc'd array of pointers into the original string, which has had
* \0 added throughout ) */
char ** my_prog_parse(const char * string) {
/* (returns a malloc'd NULL-terminated array of pointers to malloc'd strings) */
void my_prog_parse(const char * string, char buf, size_t bufsiz,
char ** strings, size_t nstrings)
/* builds a NULL-terminated array of pointers into buf, all memory
provided by caller) */
However, it is perfectly possible to use C-style strings in C++...
You could write your loop as
while (*string) { ... ; string++; }
and it will compile to exactly the same assembler on a modern optimizing compiler. yes, that is a correct way to iterate through a C-style string.
Take a look at the functions strtok, strchr, strstr, and strspn... one of them may help you build a solution.

I wouldn't do any non-trivial parsing in C, it's too laborious, the language is not suitable for that. But if you mean C++, and it looks like you do, since you wrote Class::Parse, then writing recursive descent parsers is pretty easy, and you don't need to reinvent the wheel. You can take Spirit for example, or AXE, if you compiler supports C++0x. For example, your parser in AXE can be written in few lines:
// assuming you have 0-terminated string
bool Class::Parse(const char* str)
{
auto space = r_lit(' ');
auto string_rule = "This" & r_many(space, 5) & space & 'a' & r_many(space, 2)
& "string" & r_end();
return string_rule(str, str + strlen(str)).matched;
}

Related

C Function to return a String resulting in corrupted top size

I am trying to write a program that calls upon an [external library (?)] (I'm not sure that I'm using the right terminology here) that I am also writing to clean up a provided string. For example, if my main.c program were to be provided with a string such as:
asdfFAweWFwseFL Wefawf JAWEFfja FAWSEF
it would call upon a function in externalLibrary.c (lets call it externalLibrary_Clean for now) that would take in the string, and return all characters in upper case without spaces:
ASDFFAWEWFWSEFLWEFAWFJAWEFFJAFAWSEF
The crazy part is that I have this working... so long as my string doesn't exceed 26 characters in length. As soon as I add a 27th character, I end up with an error that says
malloc(): corrupted top size.
Here is externalLibrary.c:
#include "externalLibrary.h"
#include <ctype.h>
#include <malloc.h>
#include <assert.h>
#include <string.h>
char * restrict externalLibrary_Clean(const char* restrict input) {
// first we define the return value as a pointer and initialize
// an integer to count the length of the string
char * returnVal = malloc(sizeof(input));
char * initialReturnVal = returnVal; //point to the start location
// until we hit the end of the string, we use this while loop to
// iterate through it
while (*input != '\0') {
if (isalpha(*input)) { // if we encounter an alphabet character (a-z/A-Z)
// then we convert it to an uppercase value and point our return value at it
*returnVal = toupper(*input);
returnVal++; //we use this to move our return value to the next location in memory
}
input++; // we move to the next memory location on the provided character pointer
}
*returnVal = '\0'; //once we have exhausted the input character pointer, we terminate our return value
return initialReturnVal;
}
int * restrict externalLibrary_getFrequencies(char * ar, int length){
static int freq[26];
for (int i = 0; i < length; i++){
freq[(ar[i]-65)]++;
}
return freq;
}
the header file for it (externalLibrary.h):
#ifndef LEARNINGC_EXTERNALLIBRARY_H
#define LEARNINGC_EXTERNALLIBRARY_H
#ifdef __cplusplus
extern "C" {
#endif
char * restrict externalLibrary_Clean(const char* restrict input);
int * restrict externalLibrary_getFrequencies(char * ar, int length);
#ifdef __cplusplus
}
#endif
#endif //LEARNINGC_EXTERNALLIBRARY_H
my main.c file from where all the action is happening:
#include <stdio.h>
#include "externalLibrary.h"
int main() {
char * unfilteredString = "ASDFOIWEGOASDGLKASJGISUAAAA";//if this exceeds 26 characters, the program breaks
char * cleanString = externalLibrary_Clean(unfilteredString);
//int * charDist = externalLibrary_getFrequencies(cleanString, 25); //this works just fine... for now
printf("\nOutput: %s\n", unfilteredString);
printf("\nCleaned Output: %s\n", cleanString);
/*for(int i = 0; i < 26; i++){
if(charDist[i] == 0){
}
else {
printf("%c: %d \n", (i + 65), charDist[i]);
}
}*/
return 0;
}
I'm extremely well versed in Java programming and I'm trying to translate my knowledge over to C as I wish to learn how my computer works in more detail (and have finer control over things such as memory).
If I were solving this problem in Java, it would be as simple as creating two class files: one called main.java and one called externalLibrary.java, where I would have static String Clean(string input) and then call upon it in main.java with String cleanString = externalLibrary.Clean(unfilteredString).
Clearly this isn't how C works, but I want to learn how (and why my code is crashing with corrupted top size)
The bug is this line:
char * returnVal = malloc(sizeof(input));
The reason it is a bug is that it requests an allocation large enough space to store a pointer, meaning 8 bytes in a 64-bit program. What you want to do is to allocate enough space to store the modified string, which you can do with the following line:
char *returnVal = malloc(strlen(input) + 1);
So the other part of your question is why the program doesn't crash when your string is less than 26 characters. The reason is that malloc is allowed to give the caller slightly more than the caller requested.
In your case, the message "malloc(): corrupted top size" suggests that you are using libc malloc, which is the default on Linux. That variant of malloc, in a 64-bit process, would always give you at least 0x18 (24) bytes (minimum chunk size 0x20 - 8 bytes for the size/status). In the specific case that the allocation immediately precedes the "top" allocation, writing past the end of the allocation will clobber the "top" size.
If your string is larger than 23 (0x17) you will start to clobber the size/status of the subsequent allocation because you also need 1 byte to store the trailing NULL. However, any string 23 characters or shorter will not cause a problem.
As to why you didn't get an error with a string with 26 characters, to answer that one would have to see that exact program with the string of 26 characters that does not crash to give a more precise answer. For example, if the program provided a 26-character input that contained 3 blanks, this would would require only 26 + 1 - 3 = 24 bytes in the allocation, which would fit.
If you are not interested in that level of detail, fixing the malloc call to request the proper amount will fix your crash.

Warning : comparison between pointer and integer

struct smt{
char *c;
};
int main(){
char *w="astring";
if(smt->c == w[0])
...do something
}
How do I fix the warning that I get in the if and what exacly causes it?
The warning shows up because you're comparing smt->c, which is char*, to w[0], which is a character (that for this comparison gets implicitly casted to int).
You probably meant comparing the first character like this:
if(smt->c[0] == w[0]) { ... }
If you want to compare full strings, use
if(strcmp(smt->c, w) == 0) { ... }
or even better, use strncmp if you know the maximum length the strings can have.
The error comes from the fact that often (almost always), you don't want to compare an adress (pointer) with a character.
You're comparing a char* c with a char 'a'. What you want to do is this I believe:
struct smt{
char *c;
};
int main(){
char *w="astring";
// Here smt->c returns a char*
// w[0] gets you the first character, so 'a'
if(strcmp(smt->c, w) == 0)
...do something
}
If you want to compare the first characters of both strings, you have to add [0] to smt->c

how to write data within malloc memory in C

Suppose I have
void * space= malloc(500) //alloc a block of memory
and I need to write two strings and an int: "Hello world", "Goodbye friend",5 in memory address 50,150,380, correspondance to space.I tried the approach:
int * insert = (int *)(space +380);
*insert = 5;
char * insert2 = (char *)(space+50);
*insert2 = "Hello world";
char * insert3 = (char *)(space + 150);
strcpy(insert3,"Goodbye friend");
the int has no problem, but the two string throws error messages. so what's the correct method to do this? Also, how can you check if there is an overlap in memory between those inputs (the string inputed can be arbitarily long?
Looking at this, I'm pretty sure that you trying to store "H" (which is a constant in C) that is of type int into insert2. In other words, your trying to store a type int (sizeof("H") == 2 on my compiler) into a char (sizeof(char) == 1). From there, it is up to your compiler to allow this behavior or throw an error. Plus "Hello world" is an immutable string that is located only in ROM, so you can't directly modified the string anyways.
To demonstrate, I ran:
printf("val:%d\n", "H");
printf("val:%d\n", "He");
printf("val:%d\n", "He");\\note 2nd instance the same
printf("val:%d\n", "Hel");
printf("val:%d\n", "Hell");
printf("val:%d\n", "Hello");
printf("val:%d\n", "Hello ");
printf("val:%d\n", "Hello W");
printf("val:%d\n", "Hello Wo");
printf("val:%d\n", "Hello Wor");
printf("val:%d\n", "Hello Worl");
printf("val:%d\n", "Hello World");
Yields:
val:4196404
val:4196406
val:4196406\\note 2nd instance the same
val:4196409
val:4196413
val:4196418
val:4196424
val:4196431
val:4196439
val:4196448
val:4196458
val:4196469
As #Rasmus was saying:
char * insert2 = (char *)(space+50);
strcpy(insert2 , "Hello world\0");// the '\0' if changing strings
Should resolve the issue.
BUT, I do not see much practical use for this, besides being vague/indecipherable, and not to mention wasteful of memory (unless you calculate it all out), so I recommend just using a struct:
struct blockOfMem{
int num;
char* str1;
char* str2;
};
....
struct blockOfMem space;
space.num = 5;
strcpy(space.str1, "Hello world\0");// the '\0' if changing strings
strcpy(space.str2, "Goodbye friend\0");// the '\0' if changing strings
This makes you code much more readable and practical to use.
Firstly, using strcpy i could make your code compile.
void * space= malloc(500); //alloc a block of memory
int * insert = (int *)(space +380);
*insert = 5;
char * insert2 = (char *)(space+50);
strcpy(insert2 , "Hello world");
char * insert3 = (char *)(space + 150);
strcpy(insert3,"Goodbye friend");
I would use strlen to get the length of the string and then sizeof(char). To get how much to allocate:
malloc(sizeof(int)+sizeof(char)*(1+strlen(string1))); // 1+ since you need to count the \0.
http://www.cplusplus.com/reference/cstring/strlen/

Conversion of CString to float

Some body help me regarding to the following problem
strFixFactorSide = _T("0.5");
dFixFactorSide = atof((const char *)(LPCTSTR)strFixFactorSide);
"dFixFactorSide" takes value as 0.0000;
How I will get correct value?
Use _tstof() instead of atof(), and cast CString to LPCTSTR, and leave it as such, instead of trying to get it to const char *. Forget about const char * (LPCSTR) while you're working with unicode and use only const _TCHAR * (LPCTSTR).
int _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
{
int nRetCode = 0;
CString s1 = _T("123.4");
CString s2 = _T("567.8");
double v1 = _tstof((LPCTSTR)s1);
double v2 = _tstof((LPCTSTR)s2);
_tprintf(_T("%.3f"), v1 + v2);
return nRetCode;
}
and running this correctly gives the expected answer.
I think your CString strFixFactorSide is a Unicode (UTF-16) string.
If it is, the cast (const char *) only changes the pointer type, but the string it points to still remains Unicode.
atof() doesn't work with Unicode strings. If you shove L"0.5" into it, it will fetch bytes 0x30 ('0') and 0x00 (also part of UTF-16 '0'), treat that as a NUL-terminated ASCII string "0" and convert it to 0.0.
If CString strFixFactorSide is a Unicode string, you need to either first convert it to an ASCII string and then apply atof() or use a function capable of converting Unicode strings to numbers. _wtof() can be used for Unicode strings.

Converting an int or String to a char array on Arduino

I am getting an int value from one of the analog pins on my Arduino. How do I concatenate this to a String and then convert the String to a char[]?
It was suggested that I try char msg[] = myString.getChars();, but I am receiving a message that getChars does not exist.
To convert and append an integer, use operator += (or member function concat):
String stringOne = "A long integer: ";
stringOne += 123456789;
To get the string as type char[], use toCharArray():
char charBuf[50];
stringOne.toCharArray(charBuf, 50)
In the example, there is only space for 49 characters (presuming it is terminated by null). You may want to make the size dynamic.
Overhead
The cost of bringing in String (it is not included if not used anywhere in the sketch), is approximately 1212 bytes of program memory (flash) and 48 bytes RAM.
This was measured using Arduino IDE version 1.8.10 (2019-09-13) for an Arduino Leonardo sketch.
Risk
There must be sufficient free RAM available. Otherwise, the result may be lockup/freeze of the application or other strange behaviour (UB).
Just as a reference, below is an example of how to convert between String and char[] with a dynamic length -
// Define
String str = "This is my string";
// Length (with one extra character for the null terminator)
int str_len = str.length() + 1;
// Prepare the character array (the buffer)
char char_array[str_len];
// Copy it over
str.toCharArray(char_array, str_len);
Yes, this is painfully obtuse for something as simple as a type conversion, but somehow it's the easiest way.
You can convert it to char* if you don't need a modifiable string by using:
(char*) yourString.c_str();
This would be very useful when you want to publish a String variable via MQTT in arduino.
None of that stuff worked. Here's a much simpler way .. the label str is the pointer to what IS an array...
String str = String(yourNumber, DEC); // Obviously .. get your int or byte into the string
str = str + '\r' + '\n'; // Add the required carriage return, optional line feed
byte str_len = str.length();
// Get the length of the whole lot .. C will kindly
// place a null at the end of the string which makes
// it by default an array[].
// The [0] element is the highest digit... so we
// have a separate place counter for the array...
byte arrayPointer = 0;
while (str_len)
{
// I was outputting the digits to the TX buffer
if ((UCSR0A & (1<<UDRE0))) // Is the TX buffer empty?
{
UDR0 = str[arrayPointer];
--str_len;
++arrayPointer;
}
}
With all the answers here, I'm surprised no one has brought up using itoa already built in.
It inserts the string representation of the integer into the given pointer.
int a = 4625;
char cStr[5]; // number of digits + 1 for null terminator
itoa(a, cStr, 10); // int value, pointer to string, base number
Or if you're unsure of the length of the string:
int b = 80085;
int len = String(b).length();
char cStr[len + 1]; // String.length() does not include the null terminator
itoa(b, cStr, 10); // or you could use String(b).toCharArray(cStr, len);

Resources