wxCriticalSection under Linux/Unix

I discovered that wxCriticalSection is not recursive under Linux (it deadlocks when a thread enters the same section more than once). Looking at the sources, I found that on Linux wxCriticalSection is implemented using a wxMutex, but without wxMUTEX_RECURSIVE. I have a codebase that runs well under Windows and Mac, and I want to port it to Linux, but I get deadlocks in some places where I did not avoid recursion.
Now I have two possibilities:
Changing and rebuilding wxWidgets for my purpose (brrr; I want to avoid that if at all possible, since I do not know much about the design decisions behind it)
Debugging each and every one of my possible code paths (brrr; that will take days and is horribly bug-prone)
Is there a third way: replacing or extending wxCriticalSection with a construct that behaves the same under Mac, Windows, and Unix?
P.S. Could someone explain the design decision to me? Mr. Vadim Z says ...
I had temporarily forgot the reason I was against this (making wxCriticalSections recursive) but I did recall it 30 seconds later (after sending my message, of course ). Please see my follow-up
But there was never a follow-up ...

In version 2.9.1, it appears that the default should be recursive. In file \wxWidgets-2.9.1\include\wx\thread.h:
inline wxCriticalSection::wxCriticalSection( wxCriticalSectionType critSecType )
: m_mutex( critSecType == wxCRITSEC_DEFAULT ? wxMUTEX_RECURSIVE : wxMUTEX_DEFAULT ) { }
And in class wxCriticalSection the constructor declaration is
wxCRITSECT_INLINE wxCriticalSection( wxCriticalSectionType critSecType = wxCRITSEC_DEFAULT );
I don't use Linux, so I can't verify that wxCriticalSection is actually recursive when compiled.
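For wxWidgets versions where wxCriticalSection is not recursive on Linux, one possible third way is to stop using it for the sections that get re-entered. The sketch below assumes a C++11 compiler and swaps in std::recursive_mutex behind a small wrapper; the names RecursiveCriticalSection, RecursiveCriticalSectionLocker, Outer and Inner are hypothetical and not part of wxWidgets. It reproduces the nested-locking pattern from the question, which completes instead of deadlocking.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical replacement for wxCriticalSection. It keeps the Enter()/Leave()
// vocabulary but is backed by std::recursive_mutex, so the thread that already
// owns the section can enter it again on Windows, Mac and Linux alike.
class RecursiveCriticalSection {
public:
    RecursiveCriticalSection() = default;
    RecursiveCriticalSection(const RecursiveCriticalSection&) = delete;
    RecursiveCriticalSection& operator=(const RecursiveCriticalSection&) = delete;

    void Enter() { m_mutex.lock(); }
    bool TryEnter() { return m_mutex.try_lock(); }
    void Leave() { m_mutex.unlock(); }

private:
    std::recursive_mutex m_mutex;
};

// Scoped locker in the spirit of wxCriticalSectionLocker: enters in the
// constructor and leaves in the destructor, even if an exception is thrown.
class RecursiveCriticalSectionLocker {
public:
    explicit RecursiveCriticalSectionLocker(RecursiveCriticalSection& cs) : m_cs(cs) { m_cs.Enter(); }
    ~RecursiveCriticalSectionLocker() { m_cs.Leave(); }
    RecursiveCriticalSectionLocker(const RecursiveCriticalSectionLocker&) = delete;
    RecursiveCriticalSectionLocker& operator=(const RecursiveCriticalSectionLocker&) = delete;

private:
    RecursiveCriticalSection& m_cs;
};

// The pattern that deadlocks with a non-recursive section: a function holds
// the lock and (directly or indirectly) takes it again on the same thread.
RecursiveCriticalSection g_section;
int g_calls = 0;

void Inner() {
    RecursiveCriticalSectionLocker lock(g_section);  // nested acquisition
    ++g_calls;
}

void Outer(int depth) {
    assert(depth >= 0);
    RecursiveCriticalSectionLocker lock(g_section);  // acquisition at this level
    ++g_calls;
    if (depth > 0)
        Outer(depth - 1);  // re-enter the same section recursively
    else
        Inner();
}
```

If you would rather stay within wxWidgets, the same wrapper could hold a wxMutex constructed with wxMUTEX_RECURSIVE (the flag the question found missing on Linux) and call its Lock()/Unlock(); wxMutexLocker is the matching scoped locker.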


pandarallel package on Windows: infinite loop bug

So this is not really a question but rather a bug report for the pandarallel package.
This is the end of my code:
...
print('Calculate costs NEG...')
for i, group in tqdm(df_mol_neg.groupby('DELIVERY_DATE')):
    # copy the slice so the sign flip below does not write into a view of df_srl
    srl_slice = df_srl.loc[df_srl['DATE'] == i].copy()
    srl_slice['srl_soll'] = srl_slice['srl_soll'] * -1
    df_aep_neg.loc[df_aep_neg['DATE'] == i, 'SRL_cost'] = srl_slice['srl_soll'].parallel_apply(
        lambda x: get_cost_of_nearest_mol(group, x)).sum()
What happens here is that instead of running the parallel_apply function, it loops back to the start of my code and repeats it all again. The exact same code works fine on my remote Linux machine, so I have two possible error sources:
Since pandarallel already has some difficulties with Windows, it might just be a Windows problem.
I am currently using the early-access version of PyCharm (223.7401.13) and running the code in its debugger, which might also be a source of the problem.
Other than this bug, I can highly recommend the pandarallel package (at least for Linux users). It's super easy to use, and if you have some cores it can really shave off some time; in my case it cut a cool 90% off the run time.
(Also, if there is a better way to report bugs, please let me know.)

Full form of ttwu in the scheduling code of the Linux kernel

I know this is kind of silly, but I tried to find it online and couldn't.
What is the full form of ttwu in the Linux kernel's scheduler code? It appears as a prefix on a number of functions, namely:
ttwu_do_wakeup
ttwu_do_activate
ttwu_queue_remote
ttwu_activate
.. and many more
I would assume it stands for try_to_wake_up. See for example the comment in kernel/sched/sched.h:
/* try_to_wake_up() stats */
unsigned int ttwu_count;
unsigned int ttwu_local;
Yes, in true *nix philosophy: why waste time on extra characters? (E.g., want to know the current working directory? Use pwd, for "print working directory".) TTWU is indeed "Try To Wake Up", implemented in the Linux scheduler code. It eventually calls activate_task, which does nothing more than put the task on the run queue of one of the CPUs. At some point in the future, __schedule() will actually run it (via context_switch()). Pretty cool stuff if you ask me.

How reliable is os.cpu_count()?

I am using this method in one of my applications. Should one be wary of it returning None/0 or raising an error on some operating systems?
I know that os.stat(path) might return some dummy 0s if info is not available.
The nice thing about open-source projects like Python is that you can check the source when you're not sure how something works!
Searching for cpu_count in the Python repository shows that os.cpu_count() is defined in C rather than in pure Python:
#define OS_CPU_COUNT_METHODDEF \
{"cpu_count", (PyCFunction)os_cpu_count, METH_NOARGS, os_cpu_count__doc__},
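Looking at the C source code, it appears that it could return None if the operating system doesn't support the system call used to find the number of processors in the current context. This matches the documentation, which says os.cpu_count() returns None if the number of CPUs is undetermined. I'd expect this to be quite rare.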

How to find TLS segments of the current thread on Linux amd64?

I'm looking for a way to find out the memory addresses of the TLS segments for the current thread on Linux (amd64). Bonus points for a solution that also works on OS X.
I have looked into various language runtimes and GCs (like Boehm), but so far I couldn't get through the multiple layers of abstraction they use to support all kinds of systems. Any help appreciated.
Did you have a look at the solution Martin and I came up with in druntime?
What we do there boils down to finding the right dl_phdr_info with dl_iterate_phdr, scanning its program headers for the segment of type PT_TLS, and storing that segment's module id and size.
You can then get the start of the address range for the current thread by calling __tls_get_addr with the module id and offset 0 (some architectures apply an extra offset), and the end by simply adding the size you determined. If you do not need to support shared libraries, you can also just use fs/gs on x86 for that (this might be required if you want to link a static executable).
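A minimal Linux sketch of the approach above, assuming glibc: it walks the loaded modules with dl_iterate_phdr, picks the PT_TLS segment, and records its size and module id. Instead of calling __tls_get_addr, it takes the start of the current thread's block from the dlpi_tls_data field of dl_phdr_info, a later addition to that struct, so the callback checks the size argument first. TlsRange, GetTlsRange and g_tlsProbe are made-up names for illustration; like the OS X helper in this answer, it identifies the module by the address of any of its TLS variables.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dl_iterate_phdr() is a GNU extension
#endif
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info, ElfW, PT_TLS

#include <cassert>
#include <cstddef>
#include <cstdint>

// Where one module's TLS block lives for the calling thread.
struct TlsRange {
    void* start;        // this thread's copy of the module's PT_TLS block
    std::size_t size;   // p_memsz of PT_TLS (.tdata plus .tbss)
    std::size_t modid;  // module id, i.e. what __tls_get_addr would be given
};

struct TlsSearch {
    std::uintptr_t needle;  // address of any TLS variable of the wanted module
    TlsRange range;
    bool found;
};

static int FindTlsBlock(struct dl_phdr_info* info, std::size_t size, void* data) {
    auto* search = static_cast<TlsSearch*>(data);
    // dlpi_tls_modid/dlpi_tls_data were appended to the struct later; the
    // size argument tells us whether this libc provides them.
    if (size < offsetof(struct dl_phdr_info, dlpi_tls_data) + sizeof(void*))
        return 0;
    if (info->dlpi_tls_data == nullptr)  // no TLS, or not allocated in this thread
        return 0;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_TLS)
            continue;
        auto begin = reinterpret_cast<std::uintptr_t>(info->dlpi_tls_data);
        if (search->needle >= begin && search->needle < begin + ph.p_memsz) {
            search->range.start = info->dlpi_tls_data;
            search->range.size = ph.p_memsz;
            search->range.modid = info->dlpi_tls_modid;
            search->found = true;
            return 1;  // a non-zero return stops dl_iterate_phdr
        }
    }
    return 0;  // keep iterating over the remaining modules
}

// Finds, for the calling thread, the TLS block of the module that contains
// arbitraryTlsSymbol.
bool GetTlsRange(const void* arbitraryTlsSymbol, TlsRange* out) {
    assert(arbitraryTlsSymbol != nullptr && out != nullptr);
    TlsSearch search{reinterpret_cast<std::uintptr_t>(arbitraryTlsSymbol),
                     TlsRange{nullptr, 0, 0}, false};
    dl_iterate_phdr(FindTlsBlock, &search);
    if (search.found)
        *out = search.range;
    return search.found;
}

thread_local int g_tlsProbe = 42;  // any TLS variable of this executable
```

Calling GetTlsRange(&g_tlsProbe, &range) on a thread yields that thread's block for the executable; passing the address of a TLS variable defined in a shared library would select that library's block instead.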
This works for Linux and FreeBSD (and probably other ELF platforms), but not OS X. There, the best I could come up with so far is this:
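#include <assert.h>
#include <stddef.h>
// dyld_enumerate_tlv_storage() and the dyld_tlv_* types come from the private header dyld_priv.h.
void _d_dyld_getTLSRange(void* arbitraryTLSSymbol, void** start, size_t* size) {
    dyld_enumerate_tlv_storage(
        ^(enum dyld_tlv_states state, const dyld_tlv_info *info) {
            assert(state == dyld_tlv_state_allocated);
            // Compare as char* so the end-of-range arithmetic is valid C (no void* arithmetic).
            char* begin = (char*)info->tlv_addr;
            char* symbol = (char*)arbitraryTLSSymbol;
            if (begin <= symbol && symbol < begin + info->tlv_size) {
                // Found the range we are looking for.
                *start = info->tlv_addr;
                *size = info->tlv_size;
            }
        }
    );
}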
The naive implementation currently used in LDC's druntime does not quite handle shared libraries, though, and dyld_enumerate_tlv_storage is from dyld_priv.h, which might or might not be a problem for App Store publishing.
On x86-64 Linux, the thread-specific segment is set up via an arch_prctl(ARCH_SET_FS, <addr>) call. You can find out what it was set to in the current thread via arch_prctl(ARCH_GET_FS, ...).
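The ARCH_GET_FS query can be sketched as follows, assuming x86-64 Linux (elsewhere the helper just returns 0). Note that it yields the thread pointer rather than a TLS range: with the x86-64 TLS layout, the blocks of the executable and the initially loaded libraries sit just below that address, and the thread control block at that address starts with a pointer to itself, which the second helper checks. CurrentFsBase and FsBaseLooksLikeTcb are illustrative names, not a library API.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__linux__) && defined(__x86_64__)
#include <asm/prctl.h>    // ARCH_GET_FS
#include <sys/syscall.h>  // SYS_arch_prctl
#include <unistd.h>       // syscall()
#endif

// Returns the FS base (the thread pointer) of the calling thread on x86-64
// Linux, or 0 on other platforms or if the system call fails.
std::uintptr_t CurrentFsBase() {
#if defined(__linux__) && defined(__x86_64__)
    unsigned long base = 0;
    // Invoked through syscall() so no libc wrapper for arch_prctl is needed.
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return 0;
    return static_cast<std::uintptr_t>(base);
#else
    return 0;
#endif
}

// On x86-64 the first word of the thread control block points back to the
// block itself; this checks that property for a given FS base.
bool FsBaseLooksLikeTcb(std::uintptr_t fs) {
    if (fs == 0)
        return false;
    return *reinterpret_cast<const std::uintptr_t*>(fs) == fs;
}
```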
Bonus point for a solution that works on OSX.
OS X is a completely different OS, and uses a completely different mechanism for its TLS support.

Visual C++ App crashes before main in Release, but runs fine in Debug

When built in Release, it crashes with an unhandled exception: std::length error.
The call stack looks like this:
msvcr90.dll!__set_flsgetvalue() Line 256 + 0xc bytes C
msvcr90.dll!__set_flsgetvalue() Line 256 + 0xc bytes C
msvcr90.dll!_getptd_noexit() Line 616 + 0x7 bytes C
msvcr90.dll!_getptd() Line 641 + 0x5 bytes C
msvcr90.dll!rand() Line 68 C
NEM.exe!CGAL::Random::Random() + 0x34 bytes C++
msvcr90.dll!_initterm(void (void)* * pfbegin=0x00000003, void (void)* * pfend=0x00345560) Line 903 C
NEM.exe!__tmainCRTStartup() Line 582 + 0x17 bytes C
kernel32.dll!7c817067()
Has anyone got any clues?
Examining the stack dump:
_initterm is simply a function that walks a list of other functions and executes each in turn; this is used for, amongst other things, global constructors (on startup), global destructors (on shutdown) and atexit lists (also on shutdown).
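As a rough illustration of that mechanism (a simplified model, not the CRT's actual source), think of the initializers for dynamically constructed globals as entries in a table of function pointers that startup code calls in order before main() runs. RunInitTable and the demo functions are hypothetical; ConstructRandom stands in for the constructor of CGAL's default_random.

```cpp
#include <cassert>
#include <string>

// Simplified model of an init table walker: call each non-null entry in order.
using InitFn = void (*)();

void RunInitTable(const InitFn* begin, const InitFn* end) {
    assert(begin <= end);
    for (const InitFn* p = begin; p != end; ++p) {
        if (*p != nullptr)  // this sketch tolerates empty slots in the table
            (*p)();
    }
}

// Demo "constructors" that record the order in which they ran.
std::string g_log;
void ConstructA() { g_log += "A"; }
void ConstructB() { g_log += "B"; }
void ConstructRandom() { g_log += "R"; }  // stands in for default_random's constructor

const InitFn kInitTable[] = {ConstructA, nullptr, ConstructB, ConstructRandom};
```

Any fault inside one of those entries surfaces with the table walker and the startup routine below it on the stack, which matches the shape of the stack dump in the question.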
You are linking with CGAL: the CGAL::Random::Random frame in your stack dump is there because CGAL defines a global variable called default_random of type CGAL::Random. That's why your error happens before main: default_random is being constructed.
From the CGAL source, all its constructor does is call the standard C srand(time(NULL)), followed by the local get_int which, in turn, calls the standard C rand() to get a random number.
However, your stack dump shows that it never gets back out of the C runtime: the crash is inside rand(), while the CRT is fetching its per-thread data (_getptd).
It looks like it's converting your thread into a fiber lazily, i.e., this is the first time you've tried to do something in the thread and it has to set up fiber-local storage before continuing.
So, a couple of things to try and investigate.
1/ Are you running this code on pre-XP? I believe fiber-local storage (__set_flsgetvalue) was introduced in XP. This is a long shot but we need to clear it up anyway.
2/ Do you need to link with CGAL? I'm assuming your application needs something in the CGAL libraries, otherwise don't link with it. It may be a hangover from another project file.
3/ If you do use CGAL, make sure you're using the latest version. As of 3.3, it supports dynamic linking, which should prevent the possibility of mixing different library versions (both static/dynamic and debug/non-debug).
4/ Can you try to compile with VC8? The CGAL supported platforms do NOT yet include VC9 (VS2008). You may need to follow this up with the CGAL team itself to see if they're working on that support.
5/ And, finally, do you have Boost installed? That's another long shot but worth a look anyway.
If none of those suggestions help, you'll have to wait for someone more knowledgeable than I to come along, I'm afraid.
Best of luck.
Crashes before main() are usually caused by a bad constructor in a global or static variable.
Looks like the constructor for class Random.
Do you have a global or static variable of type Random? Is it possible that you're trying to construct it before the library it's in has been properly initialized?
Note that the order of construction of global and static variables is not fixed and might change going from debug to release.
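For globals in your own code, one common way around an unspecified construction order is construct-on-first-use: replace the namespace-scope object with a function that returns a function-local static. The sketch below uses a hypothetical Random class standing in for a library type like CGAL::Random; it cannot change a global that lives inside CGAL itself. Note that C++11 made local-static initialization thread-safe, and older compilers such as VC9 predate that guarantee.

```cpp
#include <cassert>
#include <random>

// Hypothetical stand-in for a library class like CGAL::Random: its constructor
// does real work, so running it during static initialization (before main) is
// where ordering problems can bite.
class Random {
public:
    Random() : m_engine(12345u) { ++s_constructed; }

    // Returns a value in [lo, hi).
    int get_int(int lo, int hi) {
        assert(lo < hi);
        std::uniform_int_distribution<int> dist(lo, hi - 1);
        return dist(m_engine);
    }

    static int s_constructed;  // number of Random objects built so far (for the demo)

private:
    std::mt19937 m_engine;
};

int Random::s_constructed = 0;  // constant-initialized, so no ordering issue

// Instead of a namespace-scope "Random g_random;" (constructed before main in
// an order that is unspecified across translation units), build it on first use.
Random& DefaultRandom() {
    static Random instance;  // initialized the first time control reaches here
    return instance;
}
```

Every caller goes through DefaultRandom(), so the object is constructed exactly once, at the first call, rather than at an unpredictable point during startup.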
Could you be more specific about the error you're receiving? (An unhandled exception std::length sounds weird; I've never heard of it.)
To my knowledge, FlsGetValue automatically falls back to its TLS counterpart if the FLS API is not available.
If you're still stuck, take a .dmp of your process at the time of the crash and post it (use any of the numerous free upload services and give us a link). (Sounds like a missing feature in SO: source/data file exchange?)
