Read a String with spaces till a new line in C - string

I am in a pickle right now. I'm having trouble reading input such as this:
1994 The Shawshank Redemption
1994 Pulp Fiction
2008 The Dark Knight
1957 12 Angry Men
I first read the number into an integer; then I need to read the name of the movie into a string using a character array, but I have not been able to get this done.
Here is the code at the moment:
while(scanf("%d", &myear) != EOF)
{
i = 0;
while(scanf("%[^\n]", &ch))
{
title[i] = ch;
i++;
}
addNode(makeData(title,myear));
}
The title array is arbitrarily large, and addNode() adds the data as a node to a linked list. Right now the output I keep getting for each node is as follows:
" hank Redemption"
" ion"
" Knight"
" Men"
Yes, it oddly prints a space in front of the cut-off title. I checked the variables and it adds the space in the data. (I am not printing the year as that is taken in correctly)
How can I fix this?

You are passing the wrong type of argument to scanf(): instead of scanning into a single character, scan straight into the string buffer. %[^\n] scans an entire string up to (but not including) the newline; it does not scan only one character. The leading space you see is the separator that %d leaves unread in the stream.
(Marginal secondary problem: comparing the result of scanf() against EOF is fragile. scanf() returns EOF only when input ends or a read error occurs before the first conversion; on a matching failure, such as a line that does not start with a number, it returns 0 and the loop never terminates. Compare against the number of conversions you expect, i.e. scanf("%d", &myear) == 1.)
I hope you see now: scanf() is hard to get right. It's evil. Why not input the whole line at once then parse it using sane functions?
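// needs <limits.h> (LINE_MAX, POSIX), <stdio.h>, <stdlib.h> and <string.h>
char buf[LINE_MAX];
while (fgets(buf, sizeof buf, stdin) != NULL) {
    buf[strcspn(buf, "\n")] = '\0'; // drop the newline that fgets() keeps
    int year = (int)strtol(buf, NULL, 10); // base 10, so "0123" is not read as octal
    const char *p = strchr(buf, ' ');
    if (p != NULL) {
        char name[LINE_MAX];
        strcpy(name, p + 1); // safe because strlen(p + 1) < sizeof(name)
        addNode(makeData(name, year));
    }
}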

Related

How to add pointer char datas (created using malloc) to a char array in C?

In my MPI code in C, I'm receiving a word from each of my slave processes. I want to add all these words to a char array on the master side (part of the code is below). I can print these words but cannot collect them into a single char array.
(I take the maximum word length as 10 and the number of slaves as slavenumber.)
char* word = (char*)malloc(sizeof(char)*10);
char words[slavenumber*10];
for (int p = 0; p<slavenumber; p++){
MPI_Recv(word, 10, MPI_CHAR, p, 0,MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Word: %s\n", word); //it works fine
words[p*10] = *word; //This does not work, i think there is a problem here.
}
printf(words); //This does not work correctly, it gives something like: ��>;&�>W�
Can anybody help me on this?
Let's break it down line by line
// allocate a buffer large enough to hold 10 elements of type `char`
char* word = (char*)malloc(sizeof(char)*10);
// define a variable-length-array large enough to
// hold 10*slavenumber elements of `char`
char words[slavenumber*10];
for (int p = 0; p<slavenumber; p++){
    // dereference `word` which is exactly the same as writing
    // `word[0]` assigning it to `words[p*10]`
    words[p*10] = *word;
    // words[p*10+1] to words[p*10+9] are unchanged,
    // i.e. uninitialized
}
// printing from an array. For this to work properly all
// accessed elements must be initialized and the buffer
// terminated by a null byte. You have neither
printf(words);
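Because you left elements uninitialized and didn't null-terminate the buffer, you're invoking undefined behavior. Be happy that you didn't get demons crawling out of your nose.
In all seriousness though, in C you cannot copy strings by mere assignment. Your use case calls for strncpy. Note that each 10-byte slot remains a separate string, so print the slots one at a time with printf("%s\n", &words[p*10]) rather than passing words to printf as a format string.
for (int p = 0; p<slavenumber; p++){
    // after MPI_Recv has filled `word`, copy all 10 bytes into slot p
    strncpy(&words[p*10], word, 10);
}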

How to store Integers from a string without using the getline function. C++

Sorry to ask, but I have been looking everywhere for a way to extract the integers from this set of strings:
{(1,2),(1,5),(2,1),(2,3),(3,2),(3,4),(4,3),(4,5),(5,1),(5,4)}
I don't really need the homework done, if you could link me to an example, I'll appreciate it.
Thank you in advance.
If you just want to access the integers from a line like that, one way is to simply continue reading integers while you can.
If, for some reason, you find an integer read failing (because there's a { in the input stream, for example), just skip over that single character and keep going.
Sample code for this is:
#include <iostream>

int main() {
    int intVal;   // for getting int
    char charVal; // for skipping chars
    while (true) {
        while (! (std::cin >> intVal)) {   // while no integer available
            std::cin.clear();              // clear fail bit and
            if (! (std::cin >> charVal)) { // skip the offending char.
                return 0;                  // if no char left, end of file.
            }
        }
        std::cout << intVal << '\n';       // print int and carry on
    }
    return 0;
}
A transcript follows:
pax> echo '{(314159,271828),(42,-1)}' | ./testprog
314159
271828
42
-1

Improve serial building of a string with openMP {Copeland-Erdős constant}

I'm building a program to find substrings of Copeland-Erdős constant in C++11
Copeland-Erdős constant is a string with all primes in order:
2,3,5,7,11,13… → 23571113…
I need to check whether a given substring occurs in that constant, and do it quickly.
So far I've built a serial program that uses a Miller-Rabin function to check whether the numbers generated by a counter are prime and, if so, appends them to the main string (the constant). To reach the 8th Mersenne number (2^31-1) the program takes 8 minutes.
Then I use find to check whether the given substring is in the constant and, if so, the position where it starts.
PROBLEMS:
I use serial programming. I start at 0 and check every number for primality to decide whether to add it... I don't know if there is any other way to do it. The substring can be a mix of primes, e.g. 1..{1131}..7 (a substring of 11, 13, 17).
Do you have any proposal to improve the program's execution time by using OpenMP?
I want to reach the 9th Mersenne number in "human time". I've spent more than one day and the program still hasn't found it (well, hasn't reached that number).
gcc version 4.4.7 20120313
Main.cpp
while (found == -1 && lastNumber < LIMIT) //while not found & not pass our limit
{
//I generate at least a string with double size of the input (llargada)
for (lastNumber; primers.length() <= 2*llargada; lastNumber++){
if (is_prime_mr(lastNumber))
primers += to_string(lastNumber); //if prime, we add it to the main string
}
found = primers.find(sequencia); //search substring and keep position
if (found == string::npos){ //if not found
indexOfZero += primers.length()/2; //keep IndexOfZero, the position of string in global constant
primers.erase(0,primers.length()/2); //delete first middle part of calculated string
}
}
if (found != -1){
cout << "FOUNDED!" << endl;
cout << "POS: " << indexOfZero << " + " << found << " = " << indexOfZero+found << endl;} //that give us the real position of the substring in the main string
//although we only spend 2*inputString.size() memory
else
cout << "NOT FOUND" << endl;
Improving serial execution:
For starters, you do not need to check every number to see if it's prime, but rather every odd number (except for 2). We know that no even number past two can be prime. This should cut down your execution time in half.
Also, I do not understand why you have a nested loop. You should only have to check your list once.
Also, I fear that your algorithm might not be correct. Currently, if you do not find the substring, you delete half of your string and move on. However, if you have 50 non-primes in a row, you could end up deleting the entire string except for the very last character. But what if the substring you're looking for is 3 digits and needed 2 of the previous characters? Then you've erased some of the information needed to find your solution!
Finally, you should only search for your substring if you've actually found a prime number. Otherwise, you have already searched for it last iteration and nothing has been added to your string.
Combining all of these ideas, you have:
primers = "23";
lastNumber = 3;
found = -1;
while (found == -1)
{
    lastNumber += 2;
    if (is_prime_mr(lastNumber)) {
        primers += to_string(lastNumber); //if prime, we add it to the main string
        found = primers.find(sequencia); //search substring and keep position
        if (found == string::npos)
            found = -1;
        else
            break;
    }
}
Also, you should write your own find function that checks only the last few digits (where few = the length of your most recent concatenation to the global string primers, plus the length of the substring minus one). If the substring wasn't in the previous global string, there are only a few places it could pop up in your newest string. That search should be O(1) with respect to the length of primers, as opposed to O(n).
int findSub(std::string total, std::string substring, std::string lastAddition);
With this change your if statement should change to:
if (found != -1)
break;
Adding parallelism:
Unfortunately, as-is, your algorithm is inherently serial because you have to iterate through all the primes one-by-one, adding them to the list in a row in order to find your answer. There's no simple OpenMP way to parallelize your algorithm.
However, you can take advantage of parallelism by breaking up your string into pieces and having each thread work separately. Then, the only tricky thing you have to do is consider the boundaries between the final strings to double check you haven't missed anything. Something like the following:
bool globalFound = false;
bool found;
std::vector<std::string> primers;
#pragma omp parallel private(lastNumber, myFinalNumber, found, my_id, num_threads)
{
    my_id = omp_get_thread_num();
    num_threads = omp_get_num_threads();
    // One thread sizes the vector; the implicit barrier at the end of the
    // single construct ensures every thread sees the resized vector.
    #pragma omp single
    primers.resize(num_threads);
    if (my_id == 0) { // first thread starts at 0... well, actually 3
        primers[my_id] = "23";
        lastNumber = 3;
    }
    else {
        primers[my_id] = "";
        lastNumber = (my_id/(double)num_threads)*LIMIT - 2; // figure out my starting place
        if (lastNumber % 2 == 0) // ensure I'm not even
            lastNumber++;
    }
    found = false;
    myFinalNumber = ((my_id+1)/(double)num_threads)*LIMIT - 2;
    while (!globalFound && lastNumber < myFinalNumber)
    {
        lastNumber += 2;
        if (is_prime_mr(lastNumber)) {
            primers[my_id] += to_string(lastNumber);
            // your new version of find: returns an index, or -1 if there is no match
            found = findSub(primers[my_id], sequencia, to_string(lastNumber)) != -1;
            if (found) {
                // a plain assignment is not a valid atomic update, so use critical
                #pragma omp critical
                globalFound = true;
                break;
            }
        }
    }
}
if (!globalFound) {
// Result was not found in any thread, so check for boundaries/endpoints
globalFound = findVectorSubstring(primers, sequencia);
}
I'll let you finish this (by writing the smart find, findVectorSubstring - should only be checking for boundaries between elements of primers, and double checking you understand the logic of this new algorithm). Furthermore, if the arbitrary LIMIT that you setup turns out to be too small, you can always wrap this whole thing in a loop that searches between i*LIMIT and (i+1)*LIMIT.
Lastly, yes there will be load balancing issues. I can certainly imagine threads finding an uneven amount of prime numbers. Therefore, certain threads will be doing more work in the find function than others. However, a smart version of find() should be O(1) whereas is_prime_mr() is probably O(n) or O(logn), so I'm assuming that the majority of the execution time will be spent in the is_prime_mr() function. Therefore, I do not believe the load balancing will be too bad.
Hope this helps.

Longest Subsequence with all occurrences of a character at 1 place

In a sequence S of n characters; each character may occur many times in the sequence. You want to find the longest subsequence of S where all occurrences of the same character are together in one place;
For ex. if S = aaaccaaaccbccbbbab, then the longest such subsequence(answer) is aaaaaaccccbbbb i.e= aaa__aaacc_ccbbb_b.
In other words, any alphabet character that appears in S may only appear in one contiguous block in the subsequence. If possible, give a polynomial time
algorithm to determine the solution.
Design
Below I give a C++ implementation of a dynamic programming algorithm that solves this problem. An upper bound on the running time (which is probably not tight) is given by O(g*(n^2 + log(g))), where n is the length of the string and g is the number of distinct subsequences in the input. I don't know a good way to characterise this number, but it can be as bad as O(2^n) for a string consisting of n distinct characters, making this algorithm exponential-time in the worst case. It also uses O(ng) space to hold the DP memoisation table. (A subsequence, unlike a substring, may consist of noncontiguous characters from the original string.) In practice, the algorithm will be fast whenever the number of distinct characters is small.
The two key ideas used in coming up with this algorithm were:
Every subsequence of a length-n string is either (a) the empty string or (b) a subsequence whose first element is at some position 1 <= i <= n and which is followed by another subsequence on the suffix beginning at position i+1.
If we append characters (or more specifically character positions) one at a time to a subsequence, then in order to build all and only the subsequences that satisfy the validity criteria, whenever we add a character c, if the previous character added, p, was different from c, then it is no longer possible to add any p characters later on.
There are at least 2 ways to manage the second point above. One way is to maintain a set of disallowed characters (e.g. using a 256-bit array), which we add to as we add characters to the current subsequence. Every time we want to add a character to the current subsequence, we first check whether it is allowed.
Another way is to realise that whenever we have to disallow a character from appearing later in the subsequence, we can achieve this by simply deleting all copies of the character from the remaining suffix, and using this (probably shorter) string as the subproblem to solve recursively. This strategy has the advantage of making it more likely that the solver function will be called multiple times with the same string argument, which means more computation can be avoided when the recursion is converted to DP. This is how the code below works.
The recursive function ought to take 2 parameters: the string to work on, and the character most recently appended to the subsequence that the function's output will be appended to. The second parameter must be allowed to take on a special value to indicate that no characters have been appended yet (which happens in the top-level recursive case). One way to accomplish this would be to choose a character that does not appear in the input string, but this introduces a requirement not to use that character. The obvious workaround is to pass a 3rd parameter, a boolean indicating whether or not any characters have already been added. But it's slightly more convenient to use just 2 parameters: a boolean indicating whether any characters have been added yet, and a string. If the boolean is false, then the string is simply the string to be worked on. If it is true, then the first character of the string is taken to be the last character added, and the rest is the string to be worked on. Adopting this approach means the function takes only 2 parameters, which simplifies memoisation.
As I said at the top, this algorithm is exponential-time in the worst case. I can't think of a way to completely avoid this, but some optimisations can help certain cases. One that I've implemented is to always add maximal contiguous blocks of the same character in a single step, since if you add at least one character from such a block, it can never be optimal to add fewer than the entire block. Other branch-and-bound-style optimisations are possible, such as keeping track of a globally best string so far and cutting short the recursion whenever we can be certain that the current subproblem cannot produce a longer one -- e.g. when the number of characters added to the subsequence so far, plus the total number of characters remaining, is less than the length of the best subsequence so far.
Code
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <functional>
#include <map>
using namespace std;

class RunFinder {
    string s;
    map<string, string> memo[2]; // DP matrix

    // If skip == false, compute the longest valid subsequence of t.
    // Otherwise, compute the longest valid subsequence of the string
    // consisting of t without its first character, taking that first character
    // to be the last character of a preceding subsequence that we will be
    // adding to.
    string calc(string const& t, bool skip) {
        map<string, string>::iterator m(memo[skip].find(t));
        // Only calculate if we haven't already solved this case.
        if (m == memo[skip].end()) {
            // Try the empty subsequence. This is always valid.
            string best;
            // Try starting a subsequence whose leftmost position is one of
            // the remaining characters. Instead of trying each character
            // position separately, consider only contiguous blocks of identical
            // characters, since if we choose one character from this block there
            // is never any harm in choosing all of them.
            for (string::const_iterator i = t.begin() + skip; i != t.end();) {
                if (t.end() - i < best.size()) {
                    // We can't possibly find a longer string now.
                    break;
                }
                string::const_iterator next = find_if(i + 1, t.end(), bind1st(not_equal_to<char>(), *i));
                // Just use next - 1 to cheaply give us an extra char at the start; this is safe
                string u(next - 1, t.end());
                u[0] = *i; // Record the previous char for the recursive call
                if (skip && *i != t[0]) {
                    // We have added a new segment that is different from the
                    // previous segment. This means we can no longer use the
                    // character from the previous segment.
                    u.erase(remove(u.begin() + 1, u.end(), t[0]), u.end());
                }
                string v(i, next);
                v += calc(u, true);
                if (v.size() > best.size()) {
                    best = v;
                }
                i = next;
            }
            m = memo[skip].insert(make_pair(t, best)).first;
        }
        return (*m).second;
    }

public:
    RunFinder(string s) : s(s) {}

    string calc() {
        return calc(s, false);
    }
};

int main(int argc, char **argv) {
    RunFinder rf(argv[1]);
    cout << rf.calc() << '\n';
    return 0;
}
Example results
C:\runfinder>stopwatch runfinder aaaccaaaccbccbbbab
aaaaaaccccbbbb
stopwatch: Terminated. Elapsed time: 0ms
stopwatch: Process completed with exit code 0.
C:\runfinder>stopwatch runfinder abbaaasdbasdnfa,mnbmansdbfsbdnamsdnbfabbaaasdbasdnfa,mnbmansdbfsbdnamsdnbfabbaaasdbasdnfa,mnbmansdbfsbdnamsdnbfabbaaasdbasdnfa,mnbmansdbfsbdnamsdnbf
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,mnnsdbbbf
stopwatch: Terminated. Elapsed time: 609ms
stopwatch: Process completed with exit code 0.
C:\runfinder>stopwatch -v runfinder abcdefghijklmnopqrstuvwxyz123456abcdefghijklmnop
stopwatch: Command to be run: <runfinder abcdefghijklmnopqrstuvwxyz123456abcdefghijklmnop>.
stopwatch: Global memory situation before commencing: Used 2055507968 (49%) of 4128813056 virtual bytes, 1722564608 (80%) of 2145353728 physical bytes.
stopwatch: Process start time: 21/11/2012 02:53:14
abcdefghijklmnopqrstuvwxyz123456
stopwatch: Terminated. Elapsed time: 8062ms, CPU time: 7437ms, User time: 7328ms, Kernel time: 109ms, CPU usage: 92.25%, Page faults: 35473 (+35473), Peak working set size: 145440768, Peak VM usage: 145010688, Quota peak paged pool usage: 11596, Quota peak non paged pool usage: 1256
stopwatch: Process completed with exit code 0.
stopwatch: Process completion time: 21/11/2012 02:53:22
The last run, which took 8 s and used 145 MB, shows how it can have problems with strings containing many distinct characters.
EDIT: Added in another optimisation: we now exit the loop that looks for the place to start the subsequence if we can prove that it cannot possibly be better than the best one discovered so far. This drops the time needed for the last example from 32s down to 8s!
EDIT: This solution is wrong for OP's problem. I'm not deleting it because it might be right for someone else. :)
Consider a related problem: find the longest subsequence of S of consecutive occurrences of a given character. This can be solved in linear time:
char c = . . .; // the given character
int start = -1;
int bestStart = -1;
int bestLength = 0;
int currentLength = 0;
for (int i = 0; i < S.length(); ++i) {
    if (S.charAt(i) == c) {
        if (start == -1) {
            start = i;
        }
        ++currentLength;
        // update here so that a run ending at the last character is counted
        if (currentLength > bestLength) {
            bestStart = start;
            bestLength = currentLength;
        }
    } else {
        start = -1;
        currentLength = 0;
    }
}
if (bestStart >= 0) {
    // longest sequence of c starts at bestStart
} else {
    // character c does not occur in S
}
If the number of distinct characters (call it m) is reasonably small, just apply this algorithm in parallel to each character. This can be easily done by converting start, bestStart, currentLength, bestLength to arrays m long. At the end, scan the bestLength array for the index of the largest entry and use the corresponding entry in the bestStart array as your answer. The total complexity is O(mn).
import java.util.*;

public class LongestSubsequence {
    /**
     * @param args
     */
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String str = sc.next();
        execute(str);
    }

    static void execute(String str) {
        int[] hash = new int[256];
        String ans = "";
        for (int i = 0; i < str.length(); i++) {
            char temp = str.charAt(i);
            hash[temp]++;
        }
        for (int i = 0; i < hash.length; i++) {
            if (hash[i] != 0) {
                for (int j = 0; j < hash[i]; j++)
                    ans += (char) i;
            }
        }
        System.out.println(ans);
    }
}
Space: 256 -> O(256). I don't know if it's correct to put it this way, because I think O(256) is just O(1).
Time: O(n)

Converting an int or String to a char array on Arduino

I am getting an int value from one of the analog pins on my Arduino. How do I concatenate this to a String and then convert the String to a char[]?
It was suggested that I try char msg[] = myString.getChars();, but I am receiving a message that getChars does not exist.
To convert and append an integer, use operator += (or member function concat):
String stringOne = "A long integer: ";
stringOne += 123456789;
To get the string as type char[], use toCharArray():
char charBuf[50];
stringOne.toCharArray(charBuf, 50);
In the example, there is only space for 49 characters (presuming it is terminated by null). You may want to make the size dynamic.
Overhead
The cost of bringing in String (it is not included if not used anywhere in the sketch), is approximately 1212 bytes of program memory (flash) and 48 bytes RAM.
This was measured using Arduino IDE version 1.8.10 (2019-09-13) for an Arduino Leonardo sketch.
Risk
There must be sufficient free RAM available. Otherwise, the result may be lockup/freeze of the application or other strange behaviour (UB).
Just as a reference, below is an example of how to convert between String and char[] with a dynamic length -
// Define
String str = "This is my string";
// Length (with one extra character for the null terminator)
int str_len = str.length() + 1;
// Prepare the character array (the buffer)
char char_array[str_len];
// Copy it over
str.toCharArray(char_array, str_len);
Yes, this is painfully obtuse for something as simple as a type conversion, but somehow it's the easiest way.
If you don't need a modifiable string, you can get a read-only pointer to the String's internal buffer by using:
const char* msg = yourString.c_str();
This would be very useful when you want to publish a String variable via MQTT on Arduino.
None of that stuff worked. Here's a much simpler way .. the label str is the pointer to what IS an array...
String str = String(yourNumber, DEC); // Obviously .. get your int or byte into the string
str = str + '\r' + '\n'; // Add the required carriage return, optional line feed
byte str_len = str.length();
// Get the length of the whole lot .. C will kindly
// place a null at the end of the string which makes
// it by default an array[].
// The [0] element is the highest digit... so we
// have a separate place counter for the array...
byte arrayPointer = 0;
while (str_len)
{
    // I was outputting the digits to the TX buffer
    if ((UCSR0A & (1<<UDRE0))) // Is the TX buffer empty?
    {
        UDR0 = str[arrayPointer];
        --str_len;
        ++arrayPointer;
    }
}
With all the answers here, I'm surprised no one has brought up using itoa already built in.
It inserts the string representation of the integer into the given pointer.
int a = 4625;
char cStr[5]; // number of digits + 1 for null terminator
itoa(a, cStr, 10); // int value, pointer to string, base number
Or if you're unsure of the length of the string:
int b = 80085; // needs a 32-bit int; on 8-bit AVR boards use long and ltoa()
int len = String(b).length();
char cStr[len + 1]; // String.length() does not include the null terminator
itoa(b, cStr, 10); // or you could use String(b).toCharArray(cStr, len + 1);
