Viewing call stack for all threads when debugging a multithreaded Windows CE application


I'm working with Visual Studio 2008, developing native C++ code for a Windows CE 6.0 platform. Consider the following multithreaded application:
#include "stdafx.h"
void IncrementCounter(int& counter)
{
    if (++counter >= 1000)
    {
        counter = 0;
    }
}
unsigned long ThreadFunction(void* arguments)
{
    int threadCounter = 0;
    while (true)
    {
        Sleep(20);
        IncrementCounter(threadCounter);
    }
    return 0;
}
int _tmain(int argc, _TCHAR* argv[])
{
    CreateThread(
        NULL,
        0,
        (LPTHREAD_START_ROUTINE)ThreadFunction,
        NULL,
        0,
        NULL
    );
    int mainCounter = 0;
    while (true)
    {
        Sleep(20);
        IncrementCounter(mainCounter);
    }
    return 0;
}
When I build this for my Windows 7 development machine and run a debug session from Visual Studio with a breakpoint on the counter = 0; statement, execution eventually breaks and two threads are displayed in the "Threads" debug window. I can switch back and forth between the two threads using either a double-click or right-click -> "Switch to Thread", and for both threads I can see a call stack, browse source and inspect symbols (for the call stack frames within my application code).
However, when I do the same on Windows CE, connecting via ActiveSync/WMDC (I have tried both our custom CE 6.0 hardware with an in-house OS and SDK, and an old Windows Mobile 5.0 PDA with the stock MS SDK), I can see a call stack and browse source only for the thread in which the break has taken place (where the current execution point is within my application code). I don't get anything useful for the other thread, which is currently blocked in kernel space waiting for its sleep timeout.
Does anyone know whether it's possible to get this working better on Windows CE? I'm guessing it might have something to do with the debugger not knowing where to find .pdb symbol files for the WinCE kernel elements, or perhaps I need to be running a Debug OS?
The question "Windows CE 6 remote debugging. No call stack when pause program" describes the same issue, but doesn't really provide a solution.
thanks
Richard

It's probably because of a missing .pdb file for coredll.dll. If you are building the OS image for your device you will have access to this file; otherwise I am afraid it's platform dependent.
According to the following report, this issue is considered to be by design in VS2005, so maybe it's the same for VS2008:
http://connect.microsoft.com/VisualStudio/feedback/details/190785/unable-to-debug-windows-mobile-application-that-is-in-a-system-call
The following link has some instructions for finding the call stack of a "Thread That Is Not Running" using Platform Builder:
https://distrinet.cs.kuleuven.be/projects/SEESCOA/internal/workpackages/workpackage6/Task6dot2/ESCE/classes/331.pdf
Since I am using only VS 2005, I cannot confirm whether it's of any help.
If logging is not sufficient (as was suggested in the SO link you provided), I suggest using the GetThreadCallStack function to find the call stack of a thread like the one in your example. Here is a step-by-step procedure:
1 - Add the following code to your project:
typedef struct _CallSnapshotEx {
    DWORD dwReturnAddr;
    DWORD dwFramePtr;
    DWORD dwCurProc;
    DWORD dwParams[4];
} CallSnapshotEx;
#define STACKSNAP_EXTENDED_INFO 2
#define MAX_FRAMES 40
DWORD dwGUIThread;   // set in WinMain, see step 2
void DumpGUIThreadCallStack() {
    // Resolve GetThreadCallStack from coredll.dll at run time
    HINSTANCE hCore = LoadLibrary(_T("coredll.dll"));
    if ( !hCore )
        return;
    typedef ULONG (*GETTHREADCALLSTACK)(HANDLE hThrd, ULONG dwMaxFrames, LPVOID lpFrames[], DWORD dwFlags, DWORD dwSkip);
    GETTHREADCALLSTACK pGetThreadCallStack = (GETTHREADCALLSTACK)GetProcAddress(hCore, _T("GetThreadCallStack"));
    if ( !pGetThreadCallStack )
        return;
    CallSnapshotEx lpFrames[MAX_FRAMES];
    DWORD dwCnt = pGetThreadCallStack((HANDLE)dwGUIThread, MAX_FRAMES, (void**)lpFrames, STACKSNAP_EXTENDED_INFO, 0);
    TCHAR szBuff[64];
    for ( DWORD i = 0; i < dwCnt; ++i ) {
        // Print the frame index and its return address
        wsprintf(szBuff, _T("[%u] %08X\n"), i, lpFrames[i].dwReturnAddr);
        OutputDebugString(szBuff);
    }
}
This dumps the return addresses of the call frames to the Output window (sample output is in step 3).
2 - Initialize dwGUIThread in WinMain:
dwGUIThread = GetCurrentThreadId();
3 - Execute DumpGUIThreadCallStack(); just before the actual breakpoint inside ThreadFunction. It will write text similar to this to the Output window:
[0] 8C04D2C4
[1] 8C04D34C
[2] 40026D48
[3] 000111F4 <--- 1
[4] 00011BAC <--- 2
[5] 4003C2DC
The entries marked 1 and 2 are the return addresses you are interested in; you want to find the symbols nearest to them.
4 - While in the debugger, switch to disassembly mode (right-click in the source file and choose 'Go To Disassembly'). At the top of the disassembly window you will see an Address: field. Paste the addresses from the Output window into it; in my case 000111F4 takes me to the following lines:
while (true)
{
Sleep(20);
000111F0 mov r0, #0x14
000111F4 bl 0001193C // <--- 1
IncrementCounter(mainCounter);
which shows what your GUI thread is actually doing.
The Visual Studio debugger can execute functions from the Immediate window, but I was unable to call DumpGUIThreadCallStack that way; I always get 'Error: function evaluation not supported'.
To find the nearest symbols for the frame return addresses you can also use .map files together with .cod files (sources compiled with /FAcs); there are some good tutorials on that on Google.
The above example was tested with VS 2005 and Standard SDK 5.0, on a WCE 6.0 (end user) device.
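The "nearest symbol" lookup from step 3 can also be done programmatically once you have function start addresses from a .map file. Below is a minimal sketch of that idea, not part of the tested procedure: the symbol table, the function names NearestSymbol and MakeExampleSymbols, and every address in the table are invented for illustration, and the lookup does not know function sizes, so an address past the end of the last function still maps to it.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical symbol table: function start address -> name, as it might be
// transcribed by hand from a linker .map file.
using SymbolTable = std::map<std::uint32_t, std::string>;

// Returns the name of the symbol with the greatest start address that is
// <= addr, or "<unknown>" when addr lies below every known symbol.
std::string NearestSymbol(const SymbolTable& symbols, std::uint32_t addr)
{
    // upper_bound yields the first symbol starting strictly after addr,
    // so the entry just before it is the nearest one at or below addr.
    SymbolTable::const_iterator it = symbols.upper_bound(addr);
    if (it == symbols.begin())
        return "<unknown>";
    --it;
    return it->second;
}

// Builds a small table with made-up addresses (assumptions, not real data).
SymbolTable MakeExampleSymbols()
{
    SymbolTable symbols;
    symbols[0x00011000] = "IncrementCounter";  // assumed address
    symbols[0x00011100] = "ThreadFunction";    // assumed address
    symbols[0x000111A0] = "wmain";             // assumed address
    symbols[0x00011B00] = "mainWCRTStartup";   // assumed address
    return symbols;
}
```

With this made-up table, NearestSymbol(MakeExampleSymbols(), 0x000111F4) resolves to "wmain". With real data you would fill the table from your own .map file instead.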

Related

size_t different default value when running through VS2013 debugger vs CommandLine

Running the following code via the Visual Studio debugger executes successfully. The "count" variable appears to be default-initialized to 0.
If I run it from the command line, I get random behaviour and my EXPECT_EQ( ... ) fails.
size_t expectedCount = actual.length() - expected.length();
position += 12;
size_t count;
for (size_t i = position ; i < actual.length(); ++i) {
    if (actual.at(i) == 'a')
        ++count;
}
EXPECT_EQ(expectedCount , count);
I'm assuming this is because Visual Studio gives me a clean stack (everything is 0), whereas the command line has lingering garbage?

At function scope, the syntax size_t count; does not initialize the variable. Use size_t count{}; instead.
For more info on initialization, see Variable initialization in C++.

Your Debug build may happen to set count to 0 because of how that build configuration works, but a Release build will not. You need to initialize count to zero yourself. Always initialize variables.
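As a minimal sketch of the fix suggested in both answers, here is the counting loop wrapped in a helper with the counter value-initialized. The function name CountA and its parameters are made up for illustration; the original test's actual, expected and position variables are not reproduced here.

```cpp
#include <cstddef>
#include <string>

// Counts occurrences of 'a' in text, starting at index position.
// (Hypothetical helper; it only mirrors the loop from the question.)
std::size_t CountA(const std::string& text, std::size_t position)
{
    std::size_t count{};   // value-initialized: guaranteed to start at 0
    for (std::size_t i = position; i < text.length(); ++i) {
        if (text[i] == 'a')
            ++count;
    }
    return count;
}
```

Because count no longer depends on whatever happened to be on the stack, the result should be the same under the debugger and from the command line.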

Arduino sketch compiled from Windows or Linux performs differently

I have a very strange problem with a sketch that behaves differently depending on whether it is compiled and uploaded to the Arduino from Windows XP Home SP3 or from Elementary OS Luna (an Ubuntu-based Linux distribution).
Among other things, this sketch reads a byte from a serial connection (Bluetooth) and writes it back to the serial monitor.
This is what I get if I compile the sketch on WinXP: I sent the strings "1" to "7" over the BT connection, once each. 48 is subtracted from the ASCII code of each character to turn it into a byte value. The result is correct, and the functions in the pointer array are called correctly.
And here is what I get from Linux. I sent each string from "1" to "7" four times. The result has nothing to do with what I expect, and it is not even consistent for the same input data: for example, when I send the string "2" I get 104 106 106 104..., and the same byte 106 is written for different strings coming from BT.
The functions are not called either, which means it is not a Serial.print issue.
I'm sure it is a compilation issue, because once the sketch is uploaded to the Arduino it behaves the same way (correct or not) whether I use the serial monitor on WinXP or Linux.
Here's the sketch
#include "Arduino.h"
#include <SoftwareSerial.h>
#include <Streaming.h>
#define nrOfCommands 10
typedef void (* CmdFuncPtr) (); // this is a typedef to command functions
// the following declares an array of 10 function pointers of type CmdFuncPtr
CmdFuncPtr setOfCmds[nrOfCommands] = {
noOp,
leftWindowDown,
leftWindowUp,
bootOpen,
cabinLightOn,
cabinLightOff,
lockOn,
lockOff,
canStart,
canStop
};
#define cmdLeftWindowDown 1
#define cmdLeftWindowUp 2
#define cmdBootOpen 3
#define cmdCabinLightOn 4
#define cmdCabinLightOff 5
#define cmdLockOn 6
#define cmdLockOff 7
#define cmdCanStart 8
#define cmdCanStop 9
#define buttonPin 4 // the number of the pushbutton pin
#define bluetoothTx 2
#define bluetoothRx 3
int buttonState = 0; // variable for reading the pushbutton status
int androidSwitch=0;
byte incomingByte; // incoming data
byte msg[12];
byte msgLen=0;
byte msgIdMsb=0;
byte msgIdLsb=0;
//const byte cmdLeftWindowDown;
SoftwareSerial bluetooth(bluetoothTx,bluetoothRx);
void setup()
{
//Setup usb serial connection to computer
Serial.begin(115200);
//Setup Bluetooth serial connection to android
bluetooth.begin(115200);
//bluetooth.print("$$$");
randomSeed(analogRead(10));
delay(100);
//bluetooth.println("U,9600,E");
//bluetooth.begin(9600);
//time=0;
}
void loop() {
msgIdLsb=random(1,255);
msgIdMsb=random(0,5);
msg[0]=msgIdMsb;
msg[1]=msgIdLsb;
msgLen=random(9);
msg[2]=msgLen;
for (int x=3;x<msgLen+3;x++) {
msg[x]=random(255);
}
for (int x=3+msgLen;x<11;x++) {
msg[x]=0;
}
msg[11]='\n';
// read the state of the pushbutton value:
buttonState = digitalRead(buttonPin);
if ((buttonState == HIGH)||(androidSwitch==HIGH)) {
for (int x=0;x<12;x++) {
Serial<<msg[x]<<" ";
bluetooth.write(uint8_t(msg[x]));
}
Serial<<endl;
}
//Read from bluetooth and write to usb serial
if(bluetooth.available())
{
incomingByte = bluetooth.read()-48;
Serial<<incomingByte<<endl;
if (incomingByte<nrOfCommands)
setOfCmds[incomingByte]();
}
delay(10);
}
void noOp(void)
{
Serial<<"noOp"<<endl;
};
void leftWindowDown(void)
{
Serial<<"leftWindowDown"<<endl;
};
void leftWindowUp(void)
{
Serial<<"leftWindowUp"<<endl;
};
void bootOpen(void)
{
Serial<<"bootOpen"<<endl;
};
void cabinLightOn(void)
{
Serial<<"cabinLightOn"<<endl;
};
void cabinLightOff(void)
{
Serial<<"cabinLightOff"<<endl;
};
void lockOn(void)
{
Serial<<"lockOn"<<endl;
};
void lockOff(void)
{
Serial<<"lockOff"<<endl;
};
void canStart(void)
{
androidSwitch=HIGH;
};
void canStop(void)
{
androidSwitch=LOW;
};
Any help would be appreciated.
Thanks in advance.

I suppose you are using the Arduino IDE; if not, some of the following might not apply.
First, find out the location of the build directory the ide is using when it compiles and links the code. [One way to find out is to temporarily turn on Verbose output during compilation. (Click File, Preferences, "Show verbose output during compilation".) Click the Verify button to compile the code, and look at the path following the -o option in the first line of output.] For example, on a Linux system the build directory path might be something like /tmp/build3877126492387157498.tmp. In that directory, look for the .cpp file created during compilation.
After you find the .cpp files for your sketch on both systems, copy them onto one system so you can compare them and check for differences. If they are different, one or the other ide might be corrupt or an incorrect include might be occurring.
If the .cpp files differ, compare the compile flags, the header files, etc. I think the flags and AVR header files should be the same on both systems, with the possible exception that MSW files might have carriage return characters before the newline characters. Also check the gcc versions. [I don't have an MSW system to try, but I'm supposing that gcc is used on both systems for AVR cross-compiling. Please correct me if I'm wrong.]
If the .cpp files match, then test the generated binary files to find out where they differ. (For example, if the sketch file is Blink21x.ino, binary files might be Blink21x.cpp.elf or Blink21x.cpp.hex.) If you have a .elf file on both systems [I don't know if the MSW system will generate .elf] use avr-objdump on the Linux system to produce a disassembled version of code:
avr-objdump -d Blink21x.cpp.elf > Blink21x.cpp.lst
Then use diff to locate differences between the two disassembly files. Enough information is available in the .lst file to identify your source line if the difference is due to how your source was compiled, as opposed to a difference in libraries. (In the latter case, enough information is given in the .lst file to identify which library routines differ.)
If you don't have an .elf file on the MSW system, you might try comparing the .hex files. From the location of the difference you can find the relevant line in the Linux-system .elf-disassembly file, and from that can identify a line of your code or a library routine.
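If diff is not available on one of the systems, locating the first difference between two listings is easy to sketch by hand. The following is only an illustration of that comparison step, with made-up function names (FirstDifferingLine, FirstDifferingLineInText). It ignores a trailing carriage return on each line, on the assumption that files coming from the MSW system use CR+LF line endings.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Drops one trailing '\r' so CR+LF and LF files compare as equal.
static void StripCarriageReturn(std::string& line)
{
    if (!line.empty() && line[line.size() - 1] == '\r')
        line.erase(line.size() - 1);
}

// Returns the 1-based number of the first line at which the two streams
// differ, or 0 if they are identical. A stream that ends early counts as a
// difference at the first line the other stream still has.
std::size_t FirstDifferingLine(std::istream& a, std::istream& b)
{
    std::string lineA, lineB;
    std::size_t lineNo = 0;
    for (;;) {
        const bool gotA = static_cast<bool>(std::getline(a, lineA));
        const bool gotB = static_cast<bool>(std::getline(b, lineB));
        ++lineNo;
        if (!gotA && !gotB)
            return 0;                 // both ended together: no difference
        if (gotA != gotB)
            return lineNo;            // one file is shorter than the other
        StripCarriageReturn(lineA);
        StripCarriageReturn(lineB);
        if (lineA != lineB)
            return lineNo;
    }
}

// Convenience wrapper for in-memory text, handy for quick experiments.
std::size_t FirstDifferingLineInText(const std::string& a, const std::string& b)
{
    std::istringstream sa(a), sb(b);
    return FirstDifferingLine(sa, sb);
}
```

For real files you would pass two std::ifstream objects opened on the listing or .hex files to FirstDifferingLine.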

Visual Studio 2012 - vshost32-clr2.exe has stopped working

I'm creating a WinForms application in C# using Visual Studio 2012 and I'm getting an error when I debug it:
vshost32-clr2.exe has stopped working
I have already searched, but most results are for Visual Studio 2010 and lower, and they suggest similar solutions which I think are not applicable to Visual Studio 2012:
Properties -> Debug -> Enable unmanaged code debugging
Source : vshost32.exe crash when calling unmanaged DLL
Additional Details :
My project doesn't use any DLL.
So far in my project, it only occurs when the width is 17.
I use the following code :
Bitmap tmp_bitmap = new Bitmap(Width, Height, System.Drawing.Imaging.PixelFormat.Format24bppRgb);
Rectangle rect = new Rectangle(0, 0, 16, tmp_bitmap.Height);
System.Drawing.Imaging.BitmapData bmpData =
    tmp_bitmap.LockBits(rect, System.Drawing.Imaging.ImageLockMode.ReadWrite,
        tmp_bitmap.PixelFormat);
unsafe
{
    // Get address of first pixel on bitmap.
    byte* ptr = (byte*)bmpData.Scan0;
    int bytes = Width * Height * 3; //124830 [Total Length from 190x219 24 Bit Bitmap]
    int b; // Individual Byte
    for (int i = 0; i < bytes; i++)
    {
        _ms.Position = EndOffset - i; // Change the fs' Position
        b = _ms.ReadByte(); // Reads one byte from its position
        *ptr = Convert.ToByte(b);
        ptr++;
        // fix width is odd bug.
        if (Width % 4 != 0)
            if ((i + 1) % (Width * 3) == 0 && (i + 1) * 3 % Width < Width - 1)
            {
                ptr += 2;
            }
    }
    // Unlock the bits.
    tmp_bitmap.UnlockBits(bmpData);
}
I think posting my code is necessary, as the error only occurs when that particular value is passed to my method.
I hope you can help me fix this problem.
Thank you very much in advance!

Not sure if this is the same issue, but I had a very similar issue which resolved (vanished) when I un-checked "Enable the Visual Studio hosting process" under the Debug section of Project/Properties. I also enabled native code debugging.

This issue can be related to debugging the application as "Any CPU" under an x64 OS; try setting the Target CPU to x86.

Adding my 2 cents since I ran into this today.
In my case, a call to a printer was passing some invalid value, and it seems that sent the debugger to sleep with the fishes.
If you run into this, see if you can pinpoint the line and make sure there is no funny business around a call out to something external (like a printing service).

The solution below worked for me:
Go to the Project -> Properties -> Debug tab
Uncheck the 'Enable the Visual Studio hosting process' checkbox
Check the 'Enable native code debugging' option
Hope this helps.

Native mutex implementation

So, in my days of illumination, I started to think about how on earth Windows/Linux implement the mutex. I've implemented this synchronizer in a hundred different ways, on many different architectures, but I never thought about how it is really implemented in a big OS. For example, in the ARM world I built some of my synchronizers by disabling interrupts, but I always thought that wasn't a really good way to do it.
I tried to "swim" through the Linux kernel, but I couldn't find anything that satisfied my curiosity. I'm not an expert in threading, but I have a solid grasp of all its basic and intermediate concepts.
So does anyone know how a mutex is implemented?

A quick look at code apparently from one Linux distribution seems to indicate that it is implemented using an interlocked compare and exchange. So, in some sense, the OS isn't really implementing it since the interlocked operation is probably handled at the hardware level.
Edit: As Hans points out, the interlocked exchange does the compare and exchange in an atomic manner. Here is documentation for the Windows version. For fun, I just now wrote a small test to show a really simple example of creating a mutex like that. This is a simple acquire and release test.
#include <windows.h>
#include <assert.h>
#include <stdio.h>
struct homebrew {
    LONG *mutex;
    int *shared;
    int mine;
};
#define NUM_THREADS 10
#define NUM_ACQUIRES 100000
DWORD WINAPI SomeThread( LPVOID lpParam )
{
    struct homebrew *test = (struct homebrew*)lpParam;
    while ( test->mine < NUM_ACQUIRES ) {
        // Test and set the mutex. If it currently has value 0, then it
        // is free. Setting 1 means it is owned. This interlocked function does
        // the test and set as an atomic operation
        if ( 0 == InterlockedCompareExchange( test->mutex, 1, 0 )) {
            // this thread now owns the mutex. Increment the shared variable
            // without an atomic increment (relying on mutex ownership to protect it)
            (*test->shared)++;
            test->mine++;
            // Release the mutex (4 byte aligned assignment is atomic)
            *test->mutex = 0;
        }
    }
    return 0;
}
int main( int argc, char* argv[] )
{
    LONG mymutex = 0; // zero means free
    int shared = 0;
    HANDLE threads[NUM_THREADS];
    struct homebrew test[NUM_THREADS];
    int i;
    // Initialize each thread's structure. All share the same mutex and a shared
    // counter
    for ( i = 0; i < NUM_THREADS; i++ ) {
        test[i].mine = 0; test[i].shared = &shared; test[i].mutex = &mymutex;
    }
    // create the threads and then wait for all to finish
    for ( i = 0; i < NUM_THREADS; i++ )
        threads[i] = CreateThread(NULL, 0, SomeThread, &test[i], 0, NULL);
    for ( i = 0; i < NUM_THREADS; i++ )
        WaitForSingleObject( threads[i], INFINITE );
    // Verify all increments occurred atomically
    printf( "shared = %d (%s)\n", shared,
        shared == NUM_THREADS * NUM_ACQUIRES ? "correct" : "wrong" );
    for ( i = 0; i < NUM_THREADS; i++ ) {
        if ( test[i].mine != NUM_ACQUIRES ) {
            printf( "Thread %d cheated. Only %d acquires.\n", i, test[i].mine );
        }
    }
}
If I comment out the call to InterlockedCompareExchange and just let all threads run the increments in a free-for-all fashion, the results do show failures. Running it 10 times, for example, without the interlocked compare call:
shared = 748694 (wrong)
shared = 811522 (wrong)
shared = 796155 (wrong)
shared = 825947 (wrong)
shared = 1000000 (correct)
shared = 795036 (wrong)
shared = 801810 (wrong)
shared = 790812 (wrong)
shared = 724753 (wrong)
shared = 849444 (wrong)
The curious thing is that one run showed no incorrect result. That might be because there is no "everyone start now" synchronization; maybe all threads started and finished in order in that case. But when I have the InterlockedCompareExchange call in place, it runs without failure (or at least it ran 100 times without failure ... that doesn't prove I didn't write a subtle bug into the example).

Here is the discussion from the people who implemented it; very interesting, as it shows the tradeoffs.
Several posts from Linus T, of course.

In earlier days pre-POSIX etc I used to implement synchronization by using a native mode word (e.g. 16 or 32 bit word) and the Test And Set instruction lurking on every serious processor. This instruction guarantees to test the value of a word and set it in one atomic instruction. This provides the basis for a spinlock and from that a hierarchy of synchronization functions could be built. The simplest is of course just a spinlock which performs a busy wait, not an option for more than transitory sync'ing, then a spinlock which drops the process time slice at each iteration for a lower system impact. Notional concepts like Semaphores, Mutexes, Monitors etc can be built by getting into the kernel scheduling code.
As I recall the prime usage was to implement message queues to permit multiple clients to access a database server. Another was a very early real time car race result and timing system on a quite primitive 16 bit machine and OS.
These days I use Pthreads and Semaphores and Windows Events/Mutexes (mutices?) etc. and don't give a thought as to how they work, although I must admit that having been down in the engine room does give one an intuitive feel for better and more efficient multiprocessing.
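The second variant described above, a test-and-set spinlock that gives up its time slice on each failed attempt, might look roughly like this in portable C++11. This is a sketch under assumptions, not the code I used back then: the names TasSpinLock and CountWithLock are invented, std::atomic_flag stands in for the processor's Test And Set instruction, and std::this_thread::yield stands in for dropping the time slice.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Spinlock built on an atomic test-and-set flag.
class TasSpinLock {
public:
    void lock()
    {
        // test_and_set returns the previous value: true means another
        // thread already holds the lock, so yield and try again.
        while (flag_.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Starts threadCount threads that each increment a shared counter
// incrementsPerThread times under the lock; returns the final count.
long CountWithLock(int threadCount, int incrementsPerThread)
{
    TasSpinLock lock;
    long shared = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&lock, &shared, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                lock.lock();
                ++shared;          // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (std::thread& th : threads)
        th.join();
    return shared;
}
```

A real mutex would put the waiting thread to sleep in the scheduler instead of spinning, which is the part that needs kernel support.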

In the Windows world:
Before Windows Vista, the mutex was implemented with a compare-exchange that changes the state of the mutex from Empty to BeingUsed. For other threads that start waiting on the mutex, the CAS will obviously fail, and the thread must be added to the mutex's queue for later notification. Those queue operations (add/remove/check) were protected by a common lock in the Windows kernel.
After Windows XP, the mutex started to use its own spin lock for performance reasons, making it a self-sufficient object.
In the Unix world I didn't get much further, but it is probably very similar to Windows 7.
Finally, for kernels that run on a single processor, the best way is to disable interrupts when entering the critical section and re-enable them when exiting.

MPI program with a VC++ GUI?

I need to write an application using MPICH2 (64 bit, in case you're wondering). A GUI is entirely optional but would of course be a huge plus. Will mpiexec have any difficulties running managed VC++ code? Are there any other problems I might run into with compiling/linking (calling conventions, etc)?
Just to give you an idea, the general structure of the program would be like this:
int main(array<System::String ^> ^args)
{
    /* Get MPI rank */
    if ( rank == 0 )
    {
        // Enabling Windows XP visual effects before any controls are created
        Application::EnableVisualStyles();
        Application::SetCompatibleTextRenderingDefault(false);
        // Create the main window and run it
        // Send/receive messages in Form1's code
        Application::Run(gcnew Form1());
    }
    else
    {
        /* Send/receive messages to/from process #0 only */
    }
    return 0;
}

MPI is just another library so no magic. Your code should look something like this:
init MPI
if(rank == 0) init your GUI;
while (1){
    if(rank == 0) get input;
    perform MPI computation on input
    make sure rank 0 ends up with the final result
    if(rank == 0) display result on GUI;
}
if(rank == 0) clean up GUI;
clean up MPI
