(Optimization?) Bug regarding GCC std::thread

While testing some functionality with std::thread, a friend ran into a problem with GCC, and we thought it worth asking whether this is a GCC bug or whether there is something wrong with the code. The program prints, for example, "7 8 9 10 1 2 3", but we expect every integer in [1,10] to be printed:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric> // std::iota
#include <thread>
int main() {
int arr[10];
std::iota(std::begin(arr), std::end(arr), 1);
using itr_t = decltype(std::begin(arr));
// the function that will display each element
auto f = [] (itr_t first, itr_t last) {
while (first != last) std::cout<<*(first++)<<' ';};
// we have 3 threads so we need to figure out the ranges for each thread to show
int increment = std::distance(std::begin(arr), std::end(arr)) / 3;
auto first = std::begin(arr);
auto to = first + increment;
auto last = std::end(arr);
std::thread threads[3] = {
std::thread{f, first, to},
std::thread{f, (first = to), (to += increment)},
std::thread{f, (first = to), last} // go to last here to account for odd array sizes
};
for (auto&& t : threads) t.join();
}
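The following alternate code works:
#include <algorithm>
#include <array>
#include <functional>
#include <iostream>
#include <iterator>
#include <numeric> // std::iota
#include <thread>
int main()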
{
std::array<int, 10> a;
std::iota(a.begin(), a.end(), 1);
using iter_t = std::array<int, 10>::iterator;
auto dist = std::distance( a.begin(), a.end() )/3;
auto first = a.begin(), to = first + dist, last = a.end();
std::function<void(iter_t, iter_t)> f =
[]( iter_t first, iter_t last ) {
while ( first != last ) { std::cout << *(first++) << ' '; }
};
std::thread threads[] {
std::thread { f, first, to },
std::thread { f, to, to + dist },
std::thread { f, to + dist, last }
};
std::for_each(
std::begin(threads),std::end(threads),
std::mem_fn(&std::thread::join));
return 0;
}
We thought it might have something to do with unsequenced evaluation of the function's arguments, or that this is just how std::thread is supposed to work when it copies arguments that are not wrapped in std::ref. We then tested the first version with Clang, where it works, and so started to suspect a GCC bug.
Compiler used: GCC 4.7, Clang 3.2.1
EDIT: With GCC, the first version of the code gives the wrong output, but the second version gives the correct output.
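What the second version does differently is compute every range boundary up front, so no argument expression has a side effect. A minimal sketch of that idea follows; chunk_bounds is a hypothetical helper name, not something from the original code, and it assumes the last chunk should absorb any remainder, as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns parts + 1 boundary indices that split [0, n) into `parts`
// consecutive chunks; the last chunk absorbs the remainder.
inline std::vector<std::size_t> chunk_bounds(std::size_t n, std::size_t parts) {
    assert(parts > 0);
    std::vector<std::size_t> bounds;
    bounds.reserve(parts + 1);
    const std::size_t step = n / parts;
    for (std::size_t i = 0; i < parts; ++i)
        bounds.push_back(i * step);   // start of chunk i
    bounds.push_back(n);              // end of the last chunk
    return bounds;
}
```

With these bounds, thread i would be given the range [arr + bounds[i], arr + bounds[i + 1]), and no thread constructor argument modifies a shared variable.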

From this modified program:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric> // std::iota
#include <thread>
#include <sstream>
int main()
{
int arr[10];
std::iota(std::begin(arr), std::end(arr), 1);
using itr_t = decltype(std::begin(arr));
// the function that will display each element
auto f = [] (itr_t first, itr_t last) {
std::stringstream ss;
ss << "**Pointer:" << first << " | " << last << std::endl;
std::cout << ss.str();
while (first != last) std::cout<<*(first++)<<' ';};
// we have 3 threads so we need to figure out the ranges for each thread to show
int increment = std::distance(std::begin(arr), std::end(arr)) / 3;
auto first = std::begin(arr);
auto to = first + increment;
auto last = std::end(arr);
std::thread threads[3] = {
std::thread{f, first, to},
#ifndef FIX
std::thread{f, (first = to), (to += increment)},
std::thread{f, (first = to), last} // go to last here to account for odd array sizes
#else
std::thread{f, to, to+increment},
std::thread{f, to+increment, last} // go to last here to account for odd array sizes
#endif
};
for (auto&& t : threads) {
t.join();
}
}
I added a print of the first and last pointers to the lambda f and got these interesting results (with FIX undefined):
**Pointer:0x28abd8 | 0x28abe4
1 2 3 **Pointer:0x28abf0 | 0x28abf0
**Pointer:0x28abf0 | 0x28ac00
7 8 9 10
Then I added code for the #else branch of the #ifndef FIX block. With FIX defined, the program works well.
- Update: the conclusion in my original post, quoted below, is wrong. My fault. See Josh's comment below -
I believe the 2nd line of threads[], std::thread{f, (first = to), (to += increment)}, contains a bug: the assignments inside the two pairs of parentheses can be evaluated in any order by the compiler, yet the 1st, 2nd and 3rd arguments of the constructor need to be evaluated in the order given.
--- Update: corrected ---
The arguments here appear inside braces, and C++11 requires the elements of a braced-init-list to be evaluated strictly in the order in which they appear, so the original code is valid. The debug output above therefore suggests that GCC 4.8.2 (my version) is still buggy, like GCC 4.7, while GCC 4.9.2 fixes the bug, as reported by Maxim Yegorushkin (see comment above).
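The ordering rule can be checked in isolation, without threads. The sketch below uses a hypothetical Range type and next_range helper (neither is from the original post) to mirror the shape of std::thread{f, (first = to), (to += increment)}; on a conforming C++11 compiler the braces guarantee left-to-right evaluation.

```cpp
#include <cassert>

// Hypothetical type for illustration: it just records its constructor arguments.
struct Range {
    int first;
    int last;
    Range(int f, int l) : first(f), last(l) {}
};

// Builds a Range with side effects in the initializers, mirroring the
// second std::thread initializer from the question.
inline Range next_range(int& first, int& to, int increment) {
    assert(increment > 0);
    // Braces: (first = to) must be evaluated before (to += increment).
    return Range{ (first = to), (to += increment) };
}
```

Starting from first = 0 and to = 3 with an increment of 3, a conforming compiler yields the range [3, 6). If the braces were replaced by parentheses, the order of the two assignments would no longer be guaranteed.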

Related

std::max giving error C2064: term does not evaluate to a function taking 2 arguments

I am fairly new to C++. I was practicing some data structures and algorithms. This code looks fine to me, but I get an error about a function not taking 2 arguments. I found similar errors asked about on Stack Overflow, but none of those cases match my problem.
#include <iostream>
#include <algorithm>
int ropecutting(int n, int *cuts){
if (n == 0)
return 0;
if (n < 0)
return -1;
int res = std::max(ropecutting(n-cuts[0], cuts), ropecutting(n-cuts[1], cuts), ropecutting(n-cuts[2], cuts));
if(res == -1) return -1;
return res+1;
}
int main(){
int n, cuts[3];
std::cin >> n;
for(int i = 0; i < 3; i ++)
std::cin >> cuts[i];
std::cout << ropecutting(n, cuts);
}
The error I get is,
main.cpp
G:\software_installation\Visual Studio Community 2017\VC\Tools\MSVC\14.16.27023\include\xlocale(319): warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
G:\software_installation\Visual Studio Community 2017\VC\Tools\MSVC\14.16.27023\include\algorithm(5368): error C2064: term does not evaluate to a function taking 2 arguments
G:\software_installation\Visual Studio Community 2017\VC\Tools\MSVC\14.16.27023\include\algorithm(5367): note: see reference to function template instantiation 'const _Ty &std::max<int,int>(const _Ty &,const _Ty &,_Pr) noexcept(<expr>)' being compiled
with
[
_Ty=int,
_Pr=int
]
G:\software_installation\Visual Studio Community 2017\VC\Tools\MSVC\14.16.27023\include\algorithm(5368): error C2056: illegal expression
I hope someone can point me in the right direction. Thank you.

Of the overloads of std::max, the only one which can be called with three arguments is
template < class T, class Compare >
constexpr const T& max( const T& a, const T& b, Compare comp );
So since it receives three int values, that function is attempting to use the third value as a functor to compare the other two, which of course doesn't work.
Probably the simplest way to get the maximum of three numbers is using the overload taking a std::initializer_list<T>. And a std::initializer_list can be automatically created from a braced list:
int res = std::max({ropecutting(n-cuts[0], cuts),
ropecutting(n-cuts[1], cuts),
ropecutting(n-cuts[2], cuts)});
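For comparison, here is a small sketch of the two ways to take the maximum of three values. The helper names are made up for illustration; both rely only on std::max overloads from <algorithm>.

```cpp
#include <algorithm>
#include <cassert>

// Largest of three ints via the std::initializer_list overload of std::max.
inline int max_of_three(int a, int b, int c) {
    return std::max({a, b, c});
}

// Same result using only the two-argument overload.
inline int max_of_three_nested(int a, int b, int c) {
    return std::max(a, std::max(b, c));
}

// Usage sketch: both helpers agree, including on negative values.
inline void max_of_three_demo() {
    assert(max_of_three(1, 5, 3) == 5);
    assert(max_of_three_nested(-1, -7, -3) == -1);
}
```

The nested form avoids building an initializer list, while the braced form scales more readably to four or more values.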

CUDA Programming: Compilation Error

I am writing a CUDA program that implements a data-parallel prefix sum over N numbers. My code is also supposed to generate the numbers on the host using a random number generator. However, I always seem to run into an "unrecognized token" and an "expected a declaration" error on the closing brace of int main when I try to compile. I am running the code on Linux.
#include <stdio.h>
#include <cuda.h>
#include <stdlib.h>
#include <math.h>
__global__ void gpu_cal(int *a,int i, int n) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
if(tid>=i && tid < n) {
a[tid] = a[tid]+a[tid-i];
}
}
int main(void)
{
int key;
int *dev_a;
int N=10;//size of 1D array
int B=1;//blocks in the grid
int T=10;//threads in a block
do{
printf ("Some limitations:\n");
printf (" Maximum number of threads per block = 1024\n");
printf (" Maximum sizes of x-dimension of thread block = 1024\n");
printf (" Maximum size of each dimension of grid of thread blocks = 65535\n");
printf (" N<=B*T\n");
do{
printf("Enter size of array in one dimension, currently %d\n",N);
scanf("%d",&N);
printf("Enter size of blocks in the grid, currently %d\n",B);
scanf("%d",&B);
printf("Enter size of threads in a block, currently %d\n",T);
scanf("%d",&T);
if(N>B*T)
printf("N>B*T, this will result in an incorrect result generated by GPU, please try again\n");
if(T>1024)
printf("T>1024, this will result in an incorrect result generated by GPU, please try again\n");
}while((N>B*T)||(T>1024));
cudaEvent_t start, stop; // using cuda events to measure time
float elapsed_time_ms1, elapsed_time_ms3;
int a[N],gpu_result[N];//for result generated by GPU
int cpu_result[N];//CPU result
cudaMalloc((void**)&dev_a,N * sizeof(int));//allocate memory on GPU
int i,j;
srand(1); //initialize random number generator
for (i=0; i < N; i++) // load array with some numbers
a[i] = (int)rand() ;
cudaMemcpy(dev_a, a , N*sizeof(int),cudaMemcpyHostToDevice);//load data from host to device
cudaEventCreate(&start); // instrument code to measure start time
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
//GPU computation
for(j=0;j<log(N)/log(2);j++){
gpu_cal<<<B,T>>>(dev_a,pow(2,j),N);
cudaThreadSynchronize();
}
cudaMemcpy(gpu_result,dev_a,N*sizeof(int),cudaMemcpyDeviceToHost);
cudaEventRecord(stop, 0); // instrument code to measure end time
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsed_time_ms1, start, stop );
printf("\n\n\nTime to calculate results on GPU: %f ms.\n", elapsed_time_ms1); // print out execution time
//CPU computation
cudaEventRecord(start, 0);
for(i=0;i<N;i++)
{
cpu_result[i]=0;
for(j=0;j<=i;j++)
{
cpu_result[i]=cpu_result[i]+a[j];
}
}
cudaEventRecord(stop, 0); // instrument code to measure end time
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsed_time_ms3, start, stop );
printf("Time to calculate results on CPU: %f ms.\n\n", elapsed_time_ms3); // print out execution time
//Error check
for(i=0;i < N;i++) {
if (gpu_result[i] != cpu_result[i] ) {
printf("ERROR!!! CPU and GPU create different answers\n");
break;
}
}
//Calculate speedup
printf("Speedup on GPU compared to CPU= %f\n", (float) elapsed_time_ms3 / (float) elapsed_time_ms1);
printf("\nN=%d",N);
printf("\nB=%d",B);
printf("\nT=%d",T);
printf("\n\n\nEnter '1' to repeat, or other integer to terminate\n");
scanf("%d",&key);
}while(key == 1);
cudaFree(dev_a);//deallocation
return 0;
}

The very last } in your code is a Unicode character. If you delete this entire line, and retype the }, the error will be gone.

There are two compile errors in your code.
First, the last closing brace is a non-ASCII Unicode character, so you should delete it and retype it as a plain }.
Second, N is a plain int variable, but it is used as an array size in the line int a[N],gpu_result[N];//for result generated by GPU
Standard C++ does not allow variable-length arrays, so a compiler may reject this unless N is a compile-time constant such as const int N = 10. Since your program reads N with scanf, the alternative is to allocate these arrays dynamically once N is known.
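One way to size the host arrays at run time in C++ is std::vector. The following is a minimal host-side sketch under that assumption; make_host_input is a hypothetical helper, not part of the original program, and it contains no CUDA calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Builds a host-side input array whose length is only known at run time.
// Hypothetical helper for illustration.
inline std::vector<int> make_host_input(std::size_t n, unsigned seed) {
    assert(n > 0);
    std::srand(seed);                 // same seed gives the same sequence
    std::vector<int> a(n);            // sized at run time, no VLA needed
    for (std::size_t i = 0; i < n; ++i)
        a[i] = std::rand();
    return a;
}
```

The pointer returned by a.data() refers to n contiguous ints, so it could be used wherever the original code passes the raw array a.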

using malloc in dgels function of lapacke

I am trying to use the dgels function of LAPACKE.
When I allocate the right-hand-side matrix with malloc, it does not give the correct values.
Can anybody please tell me what mistake I am making when I use malloc to create the matrix?
Thank you.
/* Calling DGELS using row-major order */
#include <stdio.h>
#include <lapacke.h>
#include <conio.h>
#include <malloc.h>
int main ()
{
double a[3][2] = {{1,0},{1,1},{1,2}};
double **outputArray;
int designs=3;
int i,j,d,i_mal;
lapack_int info,m,n,lda,ldb,nrhs;
/* double outputArray[3][1] = {{6},{0},{0}}; */
outputArray = (double**) malloc(3* sizeof(double*));
for(i_mal=0;i_mal<3;i_mal++)
{
outputArray[i_mal] = (double*) malloc(1* sizeof(double));
}
for (i=0;i<designs;i++)
{
printf("put first value");
scanf("%lf",&outputArray[i][0]);
}
m = 3;
n = 2;
nrhs = 1;
lda = 2;
ldb = 1;
info = LAPACKE_dgels(LAPACK_ROW_MAJOR,'N',m,n,nrhs,*a,lda,*outputArray,ldb);
for(i=0;i<m;i++)
{
for(j=0;j<nrhs;j++)
{
printf("%lf ",outputArray[i][j]);
}
printf("\n");
}
getch();
return (info);
}

The problem may come from outputArray not being contiguous in memory. You could use something like this instead:
outputArray = (double**) malloc(3* sizeof(double*));
outputArray[0]=(double*) malloc(3* sizeof(double));
for (i=0;i<designs;i++){
outputArray[i]=&outputArray[0][i];
}
Don't forget to free the memory:
free(outputArray[0]);
free(outputArray);
Edit: Contiguous means that the memory for all values has to be allocated at once. See http://www.fftw.org/doc/Dynamic-Arrays-in-C_002dThe-Wrong-Way.html#Dynamic-Arrays-in-C_002dThe-Wrong-Way : some packages, like FFTW or LAPACK, require this layout for optimization. Because you called malloc three times, you created three separate blocks, and LAPACKE_dgels, which only receives *outputArray, read past the first one.
If you have a single right-hand side, there is no need for a 2D array (double**). outputArray[i] is a double*, that is, the start of the i-th row (row major). With several right-hand sides, the line should be outputArray[i]=&outputArray[0][i*nrhs];
With the code above, you are building a matrix of 3 rows and one column, that is, one RHS. The solution is of size n=2, so it should be found in outputArray[0][0] and outputArray[1][0]. I hope I am not too wrong; check this on simple cases!
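The same layout can be sketched in C++ with one block of storage plus a table of row pointers. ContiguousMatrix is a hypothetical helper type, not part of LAPACKE; the sketch only demonstrates the memory layout and does not call any LAPACK routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix stored in one contiguous block, with row pointers for
// m.row[i][j]-style access. Hypothetical helper type for illustration.
struct ContiguousMatrix {
    std::size_t rows, cols;
    std::vector<double> storage;   // rows * cols values, allocated at once
    std::vector<double*> row;      // row[i] points at the start of row i

    ContiguousMatrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), storage(r * c, 0.0), row(r) {
        assert(r > 0 && c > 0);
        for (std::size_t i = 0; i < r; ++i)
            row[i] = &storage[i * c];
    }
    // Copying would leave the row pointers aimed at the source's storage.
    ContiguousMatrix(const ContiguousMatrix&) = delete;
    ContiguousMatrix& operator=(const ContiguousMatrix&) = delete;

    double* data() { return storage.data(); }   // pointer to the whole block
};
```

Here data() plays the role of *outputArray in the call from the question: it points to all rows * cols values, laid out row after row.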

Weird SIGSEGV segmentation fault in std::string::assign() method from libstdc++.so.6

My program recently hit a weird segfault at run time. I want to know whether anybody has seen this error before and how it could be fixed. Here is more info:
Basic info:
CentOS 5.2, kernel version is 2.6.18
g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50)
CPU: Intel x86 family
libstdc++.so.6.0.8
My program will start multiple threads to process data. The segfault occurred in one of the threads.
Though it's a multi-thread program, the segfault seemed to occur on a local std::string object. I'll show this in the code snippet later.
The program is compiled with -g, -Wall and -fPIC, and without -O2 or other optimization options.
The core dump info:
Core was generated by `./myprog'.
Program terminated with signal 11, Segmentation fault.
#0 0x06f6d919 in __gnu_cxx::__exchange_and_add(int volatile*, int) () from /usr/lib/libstdc++.so.6
(gdb) bt
#0 0x06f6d919 in __gnu_cxx::__exchange_and_add(int volatile*, int) () from /usr/lib/libstdc++.so.6
#1 0x06f507c3 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libstdc++.so.6
#2 0x06f50834 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libstdc++.so.6
#3 0x081402fc in Q_gdw::ProcessData (this=0xb2f79f60) at ../../../myprog/src/Q_gdw/Q_gdw.cpp:798
#4 0x08117d3a in DataParser::Parse (this=0x8222720) at ../../../myprog/src/DataParser.cpp:367
#5 0x08119160 in DataParser::run (this=0x8222720) at ../../../myprog/src/DataParser.cpp:338
#6 0x080852ed in Utility::__dispatch (arg=0x8222720) at ../../../common/thread/Thread.cpp:603
#7 0x0052c832 in start_thread () from /lib/libpthread.so.0
#8 0x00ca845e in clone () from /lib/libc.so.6
Please note that the segfault begins within the basic_string::operator=().
The related code:
(I've shown more code than that might be needed, and please ignore the coding style things for now.)
int Q_gdw::ProcessData()
{
char tmpTime[10+1] = {0};
char A01Time[12+1] = {0};
std::string tmpTimeStamp;
// Get the timestamp from TP
if((m_BackFrameBuff[11] & 0x80) >> 7)
{
for (i = 0; i < 12; i++)
{
A01Time[i] = (char)A15Result[i];
}
tmpTimeStamp = FormatTimeStamp(A01Time, 12); // Segfault occurs on this line
And here is the prototype of this FormatTimeStamp method:
std::string FormatTimeStamp(const char *time, int len)
I think string assignments like this are very common, so I just don't understand why a segfault could occur here.
What I have investigated:
I've searched the web for answers. One reply I found says to try recompiling the program with the _GLIBCXX_FULLY_DYNAMIC_STRING macro defined. I tried that, but the crash still happens.
I also looked at another post. It also says to recompile the program with _GLIBCXX_FULLY_DYNAMIC_STRING, but the author seems to be dealing with a different problem from mine, so I don't think his solution works for me.
Updated on 08/15/2011
Here is the original code of this FormatTimeStamp. I understand the coding doesn't look very nice(too many magic numbers, for instance..), but let's focus on the crash issue first.
string Q_gdw::FormatTimeStamp(const char *time, int len)
{
string timeStamp;
string tmpstring;
if (time) // It is guaranteed that "time" is correctly zero-terminated, so don't worry about any overflow here.
tmpstring = time;
// Get the current time point.
int year, month, day, hour, minute, second;
#ifndef _WIN32
struct timeval timeVal;
struct tm *p;
gettimeofday(&timeVal, NULL);
p = localtime(&(timeVal.tv_sec));
year = p->tm_year + 1900;
month = p->tm_mon + 1;
day = p->tm_mday;
hour = p->tm_hour;
minute = p->tm_min;
second = p->tm_sec;
#else
SYSTEMTIME sys;
GetLocalTime(&sys);
year = sys.wYear;
month = sys.wMonth;
day = sys.wDay;
hour = sys.wHour;
minute = sys.wMinute;
second = sys.wSecond;
#endif
if (0 == len)
{
// The "time" doesn't specify any time so we just use the current time
char tmpTime[30];
memset(tmpTime, 0, 30);
sprintf(tmpTime, "%d-%d-%d %d:%d:%d.000", year, month, day, hour, minute, second);
timeStamp = tmpTime;
}
else if (6 == len)
{
// The "time" specifies "day-month-year" with each being 2-digit.
// For example: "150811" means "August 15th, 2011".
timeStamp = "20";
timeStamp = timeStamp + tmpstring.substr(4, 2) + "-" + tmpstring.substr(2, 2) + "-" +
tmpstring.substr(0, 2);
}
else if (8 == len)
{
// The "time" specifies "minute-hour-day-month" with each being 2-digit.
// For example: "51151508" means "August 15th, 15:51".
// As the year is not specified, the current year will be used.
string strYear;
stringstream sstream;
sstream << year;
sstream >> strYear;
sstream.clear();
timeStamp = strYear + "-" + tmpstring.substr(6, 2) + "-" + tmpstring.substr(4, 2) + " " +
tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ":00.000";
}
else if (10 == len)
{
// The "time" specifies "minute-hour-day-month-year" with each being 2-digit.
// For example: "5115150811" means "August 15th, 2011, 15:51".
timeStamp = "20";
timeStamp = timeStamp + tmpstring.substr(8, 2) + "-" + tmpstring.substr(6, 2) + "-" + tmpstring.substr(4, 2) + " " +
tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ":00.000";
}
else if (12 == len)
{
// The "time" specifies "second-minute-hour-day-month-year" with each being 2-digit.
// For example: "305115150811" means "August 15th, 2011, 15:51:30".
timeStamp = "20";
timeStamp = timeStamp + tmpstring.substr(10, 2) + "-" + tmpstring.substr(8, 2) + "-" + tmpstring.substr(6, 2) + " " +
tmpstring.substr(4, 2) + ":" + tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ".000";
}
return timeStamp;
}
Updated on 08/19/2011
This problem has finally been addressed and fixed. The FormatTimeStamp() function has nothing to do with the root cause, in fact. The segfault is caused by a writing overflow of a local char buffer.
This problem can be reproduced with the following simpler program(please ignore the bad namings of some variables for now):
(Compiled with "g++ -Wall -g main.cpp")
#include <cstdio>  // sprintf
#include <cstring> // memset
#include <string>
#include <iostream>
void overflow_it(char * A15, char * A15Result)
{
int m;
int t = 0,i = 0;
char temp[3];
for (m = 0; m < 6; m++)
{
t = ((*A15 & 0xf0) >> 4) *10 ;
t += *A15 & 0x0f;
A15 ++;
std::cout << "m = " << m << "; t = " << t << "; i = " << i << std::endl;
memset(temp, 0, sizeof(temp));
sprintf((char *)temp, "%02d", t); // The buggy code: temp is not big enough when t is a 3-digit integer.
A15Result[i++] = temp[0];
A15Result[i++] = temp[1];
}
}
int main(int argc, char * argv[])
{
std::string str;
{
char tpTime[6] = {0};
char A15Result[12] = {0};
// Initialize tpTime
for(int i = 0; i < 6; i++)
tpTime[i] = char(154); // 154 would result in a 3-digit t in overflow_it().
overflow_it(tpTime, A15Result);
str.assign(A15Result);
}
std::cout << "str says: " << str << std::endl;
return 0;
}
Here are two facts we should remember before going on:
1). My machine is an Intel x86 machine, so it is little-endian. Therefore, for a variable "m" of int type whose value is, say, 10, its memory layout might look like this:
Starting addr: 0xbf89bebc: m(byte#1): 10
0xbf89bebd: m(byte#2): 0
0xbf89bebe: m(byte#3): 0
0xbf89bebf: m(byte#4): 0
2). The program above runs within the main thread. When it comes to the overflow_it() function, the variables layout in the thread stack looks like this(which only shows the important variables):
0xbfc609e9 : temp[0]
0xbfc609ea : temp[1]
0xbfc609eb : temp[2]
0xbfc609ec : m(byte#1) <-- Note that m follows temp immediately. m(byte#1) happens to be the byte temp[3].
0xbfc609ed : m(byte#2)
0xbfc609ee : m(byte#3)
0xbfc609ef : m(byte#4)
0xbfc609f0 : t
...(3 bytes)
0xbfc609f4 : i
...(3 bytes)
...(etc. etc. etc...)
0xbfc60a26 : A15Result <-- Data would be written to this buffer in overflow_it()
...(11 bytes)
0xbfc60a32 : tpTime
...(5 bytes)
0xbfc60a38 : str <-- Note the str takes up 4 bytes. Its starting address is 18 bytes behind A15Result.
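The byte-level effect described in the two facts above can be reproduced safely with memcpy. This is only an illustrative sketch with made-up helper names; it assumes a small non-negative counter and does not depend on any particular stack layout.

```cpp
#include <cassert>
#include <cstring>

// Returns the byte stored at the lowest address of an int.
inline unsigned char lowest_address_byte(int value) {
    unsigned char bytes[sizeof(int)];
    std::memcpy(bytes, &value, sizeof(int));  // well-defined way to inspect bytes
    return bytes[0];
}

// True on little-endian machines such as x86.
inline bool is_little_endian() {
    return lowest_address_byte(1) == 1;
}

// Overwrites only the lowest-address byte of a small counter with '\0',
// which is what a stray string terminator would do to it.
inline int clobber_lowest_byte(int counter) {
    assert(counter >= 0 && counter < 256);    // value fits in one byte
    unsigned char zero = 0;
    std::memcpy(&counter, &zero, 1);
    return counter;
}
```

On a little-endian machine the whole value of a counter below 256 lives in that first byte, so clobbering it resets the counter to 0.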
My analysis:
1). m is the loop counter in overflow_it(); it is incremented by 1 on each iteration and is never supposed to exceed 6. Its value therefore fits entirely in m(byte#1) (remember, little-endian), which happens to be the byte temp[3].
2). In the buggy line: when t is a 3-digit integer, such as 109, the sprintf() call overflows the buffer, because serializing the number 109 to the string "109" actually requires 4 bytes: '1', '0', '9' and a terminating '\0'. Because temp[] has only 3 bytes, the final '\0' is written to temp[3], which is exactly m(byte#1), the byte that stores m's value. As a result, m is reset to 0 every time.
3). The programmer expects the for loop in overflow_it() to execute only 6 times, with m incremented by 1 each time. Because m keeps being reset to 0, the loop actually runs far more than 6 times.
4). Now look at the variable i in overflow_it(): on every iteration, two bytes are written through A15Result[i++], so i grows by 2. If you compile and run this program, you'll see that i finally reaches 24, which means overflow_it() writes to the bytes from A15Result[0] to A15Result[23]. Note that the object str starts only 18 bytes after A15Result[0], so overflow_it() has swept through str and destroyed its memory layout.
5). I think the correct use of std::string, as a non-POD type, depends on the std::string object having a valid internal state. In this program, str's internal layout has been overwritten from outside. This should be why the assign() call finally causes a segfault.
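A bounded version of the formatting step could look like the sketch below. The helper names are hypothetical, and the choice to reject values that do not fit in two digits is an assumption about what the original code intended; decode_bcd takes an unsigned char, which gives the same result as the masking in overflow_it().

```cpp
#include <cassert>
#include <cstdio>

// Decodes one packed byte the same way overflow_it() does:
// high nibble times 10 plus low nibble.
inline int decode_bcd(unsigned char byte) {
    return ((byte & 0xf0) >> 4) * 10 + (byte & 0x0f);
}

// Writes exactly two digits of t to out[0..1]. Returns false, leaving out
// untouched, when t does not fit in two digits.
inline bool write_two_digits(int t, char* out) {
    assert(out != nullptr);
    char temp[16];  // roomy enough for any int plus the terminating '\0'
    int len = std::snprintf(temp, sizeof(temp), "%02d", t);
    if (len != 2 || t < 0)
        return false;
    out[0] = temp[0];
    out[1] = temp[1];
    return true;
}
```

With the byte value 154 (0x9A) from the test program, decode_bcd yields 9 * 10 + 10 = 100, and write_two_digits reports the problem instead of writing past the buffer.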
Update on 08/26/2011
In my previous update on 08/19/2011, I said that the segfault was caused by a method call on a local std::string object whose memory layout had been broken, making it a "destroyed" object. That is not the whole story. Consider the C++ program below:
//C++
class A {
public:
void Hello(const std::string& name) {
std::cout << "hello " << name;
}
};
int main(int argc, char** argv)
{
A* pa = NULL; //!!
pa->Hello("world");
return 0;
}
The Hello() call would typically succeed, even though calling a member function through a null pointer is formally undefined behavior. It would likely succeed even if you assigned an obviously bad pointer to pa. The reason is that, in common implementations, the non-virtual methods of a class do not reside within the memory layout of the object. The C++ compiler turns the A::Hello() method into something like, say, A_Hello_xxx(A * const this, ...), which could be a global function. Thus, as long as you don't operate on the "this" pointer, things can go pretty well.
This fact shows that a "bad" object is NOT by itself the root cause of the SIGSEGV. The assign() method is not virtual in std::string, so merely calling it on the "bad" std::string object wouldn't cause the segfault. There must be some other reason that finally caused the segfault.
I noticed that the segfault comes from the __gnu_cxx::__exchange_and_add() function, so I then looked into its source code in this web page:
00046 static inline _Atomic_word
00047 __exchange_and_add(volatile _Atomic_word* __mem, int __val)
00048 { return __sync_fetch_and_add(__mem, __val); }
The __exchange_and_add() finally calls the __sync_fetch_and_add(). According to this web page, the __sync_fetch_and_add() is a GCC builtin function whose behavior is like this:
type __sync_fetch_and_add (type *ptr, type value, ...)
{
tmp = *ptr;
*ptr op= value; // Here the "op=" means "+=" as this function is "_and_add".
return tmp;
}
There it is! The passed-in ptr pointer is dereferenced here. In the 08/19/2011 program, ptr is derived from the internal state of the "bad" std::string object inside the assign() method (most likely the address of its reference count, computed from the overwritten data pointer). It is the dereference at this point that actually caused the SIGSEGV segmentation fault.
We could test this with the following program:
#include <bits/atomicity.h>
int main(int argc, char * argv[])
{
__sync_fetch_and_add((_Atomic_word *)0, 10); // Would result in a segfault.
return 0;
}
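The fetch-and-add contract from the pseudocode above also exists in standard C++11 as std::atomic<T>::fetch_add. The sketch below uses that standard facility rather than the GCC builtin, so it illustrates the semantics only; the function names are made up for this example.

```cpp
#include <atomic>
#include <cassert>

// Adds `value` to `counter` and returns the value it held before the
// addition. Taking a reference means there is no pointer that could be null.
inline int fetch_and_add(std::atomic<int>& counter, int value) {
    return counter.fetch_add(value);
}

// Usage sketch: the old value is returned, the new value is stored.
inline void fetch_and_add_demo() {
    std::atomic<int> refcount(5);
    int old = fetch_and_add(refcount, 10);
    assert(old == 5);
    assert(refcount.load() == 15);
}
```

The crash in the question comes from the memory access itself: the operation has to read and write the target location, so an invalid address faults regardless of the value being added.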

There are two likely possibilities:
some code before line 798 has corrupted the local tmpTimeStamp object
the return value from FormatTimeStamp() was somehow bad.
_GLIBCXX_FULLY_DYNAMIC_STRING is most likely a red herring and has nothing to do with the problem.
If you install the debuginfo package for libstdc++ (I don't know what it's called on CentOS), you'll be able to "see into" that code, and might be able to tell whether the left-hand side (LHS) or the right-hand side (RHS) of the assignment operator caused the problem.
If that's not possible, you'll have to debug this at the assembly level. Going into frame #2 and doing x/4x $ebp should give you previous ebp, caller address (0x081402fc), LHS (should match &tmpTimeStamp in frame #3), and RHS. Go from there, and good luck!

I guess there could be some problem inside the FormatTimeStamp function, but without the source code it's hard to say anything. Try running your program under Valgrind. It usually helps to find this sort of bug.

atoi on a character array with lots of integers

I have code in which a character array is populated with integers (converted to char arrays) and read by another function, which converts them back to integers. I use the following to convert to the char array:
char data[64];
int a = 10;
std::string str = boost::lexical_cast<std::string>(a);
memcpy(data + 8*k,str.c_str(),sizeof(str.c_str())); //k varies from 0 to 7
and the conversion back to integers is done using:
char temp[8];
memcpy(temp,data+8*k,8);
int a = atoi(temp);
This works fine in general, but when I do it as part of a project involving Qt (version 4.7), it compiles fine and then segfaults when it tries to read using memcpy(). Note that the segmentation fault happens only in the reading loop and not while writing data. I don't know why this happens, but I want to get it working by any method.
So, are there any other functions that take the character array together with the first and last position and convert that range into an integer? Then I wouldn't have to use memcpy() at all. What I am trying to do is something like this:
new_atoi(data,8*k,8*(k+1)); // k varies from 0 to 7
Thanks in advance.

You are copying only 4 characters (the exact number depends on your system's pointer width). Numbers of 4 or more characters are therefore left without a null terminator, leading to runaway strings in the input to atoi.
sizeof(str.c_str()) //i.e. sizeof(char*) = 4 (32 bit systems)
should be
str.length() + 1
Otherwise the characters will not be null-terminated.
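Put together, a fixed-width slot scheme with explicit termination on both sides might look like the sketch below. write_slot, read_slot and kSlotWidth are hypothetical names, and the sketch uses snprintf and strtol from the standard library in place of boost::lexical_cast and atoi.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

const std::size_t kSlotWidth = 8;  // bytes per slot, as in the question

// Writes `value` as text into slot k of `data`. Returns false if the text,
// including its '\0', does not fit in one slot.
inline bool write_slot(char* data, std::size_t k, int value) {
    assert(data != nullptr);
    char text[16];
    int len = std::snprintf(text, sizeof(text), "%d", value);
    if (len < 0 || static_cast<std::size_t>(len) + 1 > kSlotWidth)
        return false;
    std::memcpy(data + kSlotWidth * k, text, static_cast<std::size_t>(len) + 1);
    return true;
}

// Reads slot k back. The local copy is terminated explicitly, so parsing
// can never run past the slot.
inline int read_slot(const char* data, std::size_t k) {
    assert(data != nullptr);
    char temp[kSlotWidth + 1];
    std::memcpy(temp, data + kSlotWidth * k, kSlotWidth);
    temp[kSlotWidth] = '\0';
    return static_cast<int>(std::strtol(temp, nullptr, 10));
}
```

Note that the reading side needs one byte more than the slot width: the char temp[8] in the question has no room for a terminator when a slot is completely filled.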
STL Only:
make_testdata(): see all the way down
Why don't you use streams...?
#include <sstream>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>
int main()
{
std::vector<int> data = make_testdata();
std::ostringstream oss;
std::copy(data.begin(), data.end(), std::ostream_iterator<int>(oss, "\t"));
std::stringstream iss(oss.str());
std::vector<int> clone;
std::copy(std::istream_iterator<int>(iss), std::istream_iterator<int>(),
std::back_inserter(clone));
//verify that clone now contains the original random data:
//bool ok = std::equal(data.begin(), data.end(), clone.begin());
return 0;
}
You could do it a lot faster in plain C with atoi/itoa and some tweaks, but I reckon you should be using binary transmission (see Boost Spirit Karma and protobuf for good libraries) if you need the speed.
Boost Karma/Qi:
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
namespace qi=::boost::spirit::qi;
namespace karma=::boost::spirit::karma;
static const char delimiter = '\0';
int main()
{
std::vector<int> data = make_testdata();
std::string astext;
// astext.reserve(3 * sizeof(data[0]) * data.size()); // heuristic pre-alloc
std::back_insert_iterator<std::string> out(astext);
{
using namespace karma;
generate(out, delimit(delimiter) [ *int_ ], data);
// generate_delimited(out, *int_, delimiter, data); // equivalent
// generate(out, int_ % delimiter, data); // somehow much slower!
}
std::string::const_iterator begin(astext.begin()), end(astext.end());
std::vector<int> clone;
qi::parse(begin, end, qi::int_ % delimiter, clone);
//verify that clone now contains the original random data:
//bool ok = std::equal(data.begin(), data.end(), clone.begin());
return 0;
}
If you wanted to do architecture independent binary serialization instead, you'd use this tiny adaptation making things a zillion times faster (see benchmark below...):
karma::generate(out, *karma::big_dword, data);
// ...
qi::parse(begin, end, *qi::big_dword, clone);
Boost Serialization
The best performance can be reached when using Boost Serialization in binary mode:
#include <sstream>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/vector.hpp>
int main()
{
std::vector<int> data = make_testdata();
std::stringstream ss;
{
boost::archive::binary_oarchive oa(ss);
oa << data;
}
std::vector<int> clone;
{
boost::archive::binary_iarchive ia(ss);
ia >> clone;
}
//verify that clone now contains the original random data:
//bool ok = std::equal(data.begin(), data.end(), clone.begin());
return 0;
}
Testdata
(common to all versions above)
#include <boost/random.hpp>
// generates a deterministic pseudo-random vector of 32Mio ints
std::vector<int> make_testdata()
{
std::vector<int> testdata;
testdata.resize(2 << 24);
std::generate(testdata.begin(), testdata.end(), boost::mt19937(0));
return testdata;
}
Benchmarks
I benchmarked it by
using input data of 2<<24 (33554432) random integers
not displaying output (we don't want to measure the scrolling performance of our terminal)
the rough timings were
STL only version isn't too bad actually at 12.6s
Karma/Qi text version ran in 5.1s (down from 18s), thanks to Arlen's hint at generate_delimited :)
Karma/Qi binary version (big_dword) in only 1.4s (roughly 3-4x as fast; 12x before the text version was improved)
Boost Serialization takes the cake with around 0.8s (or, when substituting text archives for the binary ones, around 13s)

There is absolutely no reason for the Karma/Qi text version to be any slower than the STL version. I improved @sehe's implementation of the Karma/Qi text version to back up that claim.
The following Boost Karma/Qi text version is more than twice as fast as the STL version:
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/random.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
namespace ascii = boost::spirit::ascii;
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
namespace phoenix = boost::phoenix;
template <typename OutputIterator>
void generate_numbers(OutputIterator& sink, const std::vector<int>& v){
using karma::int_;
using karma::generate_delimited;
using ascii::space;
generate_delimited(sink, *int_, space, v);
}
template <typename Iterator>
void parse_numbers(Iterator first, Iterator last, std::vector<int>& v){
using qi::int_;
using qi::phrase_parse;
using ascii::space;
using qi::_1;
using phoenix::push_back;
using phoenix::ref;
phrase_parse(first, last, *int_[push_back(ref(v), _1)], space);
}
int main(int argc, char* argv[]){
static boost::mt19937 rng(0); // make test deterministic
std::vector<int> data;
data.resize(2 << 24);
std::generate(data.begin(), data.end(), rng);
std::string astext;
std::back_insert_iterator<std::string> out(astext);
generate_numbers(out, data);
//std::cout << astext << std::endl;
std::string::const_iterator begin(astext.begin()), end(astext.end());
std::vector<int> clone;
parse_numbers(begin, end, clone);
//verify that clone now contains the original random data:
//std::copy(clone.begin(), clone.end(), std::ostream_iterator<int>(std::cout, ","));
return 0;
}
