How to find TLS segments of the current thread on linux amd64?

How to find TLS segments of the current thread on linux amd64? - linux

I'm looking for a way to find out the memory addresses of TLS segments for the current thread on linux, amd64. Bonus point for a solution that works on OSX.
Looked into various language runtime or GC (like boehm), but couldn't go through the multiple layer of abstractions to support all kind of systems so far. Any help appreciated.

Did you have a look at the solution Martin and I came up with in druntime?
What we do there boils down to scanning the segments in the corresponding dl_phdr_info (obtained by looking for the correct one using dl_iterate_phdr) for the segment with type PT_TLS, and storing its module id and size.
You can then get the start of the address range on the current thread by calling __tls_get_addr for offset 0 and the module id (there is an offset on some archs), and the end by simply adding the size you determined to that. If you do not need to support shared libraries, you can also simply use fs/gs on x86 for that (might be required if you want to link a static executable).
This works for Linux and FreeBSD (and probably other ELF platforms), but not OS X. There, the best I could come up with so far is this:
void _d_dyld_getTLSRange(void* arbitraryTLSSymbol, void** start, size_t* size) {
dyld_enumerate_tlv_storage(
^(enum dyld_tlv_states state, const dyld_tlv_info *info) {
assert(state == dyld_tlv_state_allocated);
if (info->tlv_addr <= arbitraryTLSSymbol &&
arbitraryTLSSymbol < (info->tlv_addr + info->tlv_size)
) {
// Found the range we are looking for.
*start = info->tlv_addr;
*size = info->tlv_size;
}
}
);
}
The naive implementation currently used in LDC's druntime does not quite handle shared libraries, though, and dyld_enumerate_tlv_storage is from dyld_priv.h, which might or might not be a problem for App Store publishing.

On Linux, the thread-specific segment is set up via arch_prtcl(ARCH_SET_FS, <addr>) call. You can find out what it was set to in the current thread via arch_prctl(ARCH_GET_FS, ...).
Bonus point for a solution that works on OSX.
OSX is a completely different OS, and uses completely different mechanism for its TLS support.

Related

How NVIC function working NVIC_EnableIRQ(CAN1_RX0_IRQn);

so my friend asked me to write my own implementation for above function NVIC_Enable_IRQ(CAN1_RX0_IRQn); to enable can reception interrupt.
initially i thought its impossible to right such implementation ..
could anybody please explain me like the register associated with NVIC where i directly go and change the required value ,so that there is no need of implementation of above function and CAN reception interrupt is enabled.
this line NVIC_EnableIRQ(CAN1_RX0_IRQn); i copied from example code given in stm32f example code of CAN

Everything that starts with NVIC_ is part of the CMSIS library supplied by ARM to set up the ARM core (which is independent of the MCU manufacturer). You don't really want to mess with them, so you'd better use them.
In the CMSIS core_cm4.h (for a cortex M4), you can find:
__STATIC_INLINE void NVIC_EnableIRQ(IRQn_Type IRQn)
{
NVIC->ISER[(((uint32_t)(int32_t)IRQn) >> 5UL)] = (uint32_t)(1UL << (((uint32_t)(int32_t)IRQn) & 0x1FUL));
}
Now, if you don't want to call NVIC_EnableIRQ or if you don't want to use the CMSIS, well good luck, you need to read the ARM core documentation to check which addresses you need to modify. The ARM core documentation can be found on the ARM or Keil website. For example, you might find those links useful for the Cortex M4:
https://www.arm.com/products/processors/cortex-m/cortex-m4-processor.php
http://www.keil.com/dd/docs/datashts/arm/cortex_m4/r0p1/dui0553a_cortex_m4_dgug.pdf

Size of an OpenGL context

Is there a way to get a size of an opengl context? Or at least to estimate it's size? If yes, how?
I have an application in glut, which creates several windows. Since glut doesn't share opengl context between windows, every window is going to create new. Now, I am trying to reduce needed memory, since it is for an embedded system. But if the opengl context is small enough to neglect it, then I am not going to see big reduction in memory usage.
I have found this patch to create windows with shared opengl context :
A small addendum for Windows users (by Misbah Qidwai): I added this subroutine to glut_win.c. I use this routine to call wglSharedLists()
//MQ
/* CENTRY */
GLXContext APIENTRY
glutGetWindowRenderContext(int win)
{
GLUTwindow *window;
if (win < 1 || win > __glutWindowListSize) {
__glutWarning("glutSetWindow attempted on bogus window.");
return NULL;
}
window = __glutWindowList[win - 1];
if (!window) {
__glutWarning("glutSetWindow attempted on bogus window.");
return NULL;
}
return window->renderCtx;
}

A OpenGL context is an abstract thing. The amount of data backing a particular context can be as small as a pointer, or as big as a few megabytes. The context itself is not some kind of data structure, it's merely a handle shared by your program and the graphics system so that each other "knows" what the other is talking about.
The only way to know in a particular configuration is to measure it.

How do I find the ptep for a given address?

I'm trying to write a function that write-protects every pte in a given vm_area_struct. What is the function that gives me the ptep for a given address? I have:
pte_t *ptep;
for (addr = start; addr < end; addr += PAGE_SIZE) {
ptep = WHATS_THIS_FUNCTION_CALLED(addr);
ptep_set_wrprotect(mm, addr, ptep);
}
What's the WHATS_THIS_FUNCTION_CALLED called?

The short answer to your question is to use __get_locked_pte . However, I would advise against it since there are much better (efficient and fair in terms of resource contention) ways to accomplish your goal.
In linux, the typical idiom for traversing page tables is with a nested for loop four levels deep (four is the number of page table levels linux supports). For examples, see copy_page_range and apply_to_page_range in mm/memory.c. In fact, if you look closely at copy_page_range, it is called when forking from dup_mmap in kernel/fork.c. It operates on an entire vm_area_struct essentially.
You can replicate the idiom used in either of those functions. There are some caveats, however. For example, copy_page_range fully supports transparent hugepages (2.6.38) by using a completely separate copy_huge_pmd inside copy_pmd_range. Unless you want to write two separate functions (one for normal pages and one for transparent huge pages, see Gracefull fallback in Documentation/vm/transhuge.txt.
The point is that virtual memory in linux is very complicated, so be sure to completely understand every possible use case. follow_page in mm/memory.c should demonstrate how to cover all of your bases.

I believe you are looking for either the function virt_to_pte as defined here or your own re-implementation of it.
This function uses pte_offset_kernel(pmd_t * dir, unsigned long address) which takes a pmd (Page mid-level Directory) structure and an address , though one could also use pte_offset(pmd_t * dir, unsigned long address).
See Linux Device Drivers 3rd Edition Chapter 15 or Linux Device Drivers, 2nd Edition Chapter 13 mmap and DMA for further references.

wxCriticalSection under Linux/Unix

i discovered that a wxCriticalSection is not recursive ( does deadlock when a thread grabs a section more than once ) under linux. Looking at the sources, i discovered that a wxCriticalSection is implemented using a wxMutex under Linux, but without using wxMUTEX_RECURSIVE. I have a codebase that runs well under Win and Mac, and i want to port it to Linux, but i have deadlocks at some places where i did not avoid recursion.
Now i have two possibilities:
Changing and rebuilding wxWidgets for my purpose ( brrr - by any chance i want to avpid that since i do not know too much about the design decisions behind that )
debugging each and all of my possible code paths ( brrr - will take days and is horribly bug - prone )
Is there a third way, replacing/extending wxCriticalSection with a construct that behaves equally under Mac/Win/Unix?
ps. could someone explain the design decision to me? Mr. Vadim Z says ...
I had temporarily forgot the reason I was against this (making wxCriticalSections recursive) but I did recall it 30 seconds later (after sending my message, of course ). Please see my follow-up
But there was never a follow-up ...

In version 2.9.1, it appears that the default should be recursive. In file \wxWidgets-2.9.1\include\wx\thread.h:
inline wxCriticalSection::wxCriticalSection( wxCriticalSectionType critSecType )
: m_mutex( critSecType == wxCRITSEC_DEFAULT ? wxMUTEX_RECURSIVE : wxMUTEX_DEFAULT ) { }
And in class wxCriticalSection the constructor declaration is
wxCRITSECT_INLINE wxCriticalSection( wxCriticalSectionType critSecType = wxCRITSEC_DEFAULT );
I don't use Linux, so I can't verify that wxCriticalSection is actually recursive when compiled.

Visual C++ App crashes before main in Release, but runs fine in Debug

When in release it crashes with an unhandled exception: std::length error.
The call stack looks like this:
msvcr90.dll!__set_flsgetvalue() Line 256 + 0xc bytes C
msvcr90.dll!__set_flsgetvalue() Line 256 + 0xc bytes C
msvcr90.dll!_getptd_noexit() Line 616 + 0x7 bytes C
msvcr90.dll!_getptd() Line 641 + 0x5 bytes C
msvcr90.dll!rand() Line 68 C
NEM.exe!CGAL::Random::Random() + 0x34 bytes C++
msvcr90.dll!_initterm(void (void)* * pfbegin=0x00000003, void (void)* * pfend=0x00345560) Line 903 C
NEM.exe!__tmainCRTStartup() Line 582 + 0x17 bytes C
kernel32.dll!7c817067()
Has anyone got any clues?

Examining the stack dump:
InitTerm is simply a function that walks a list of other functions and executes each in step - this is used for, amongst other things, global constructors (on startup), global destructors (on shutdown) and atexit lists (also on shutdown).
You are linking with CGAL, since that CGAL::Random::Random in your stack dump is due to the fact that CGAL defines a global variable called default_random of the CGAL::Random::Random type. That's why your error is happening before main, the default_random is being constructed.
From the CGAL source, all it does it call the standard C srand(time(NULL)) followed by the local get_int which, in turn, calls the standard C rand() to get a random number.
However, you're not getting to the second stage since your stack dump is still within srand().
It looks like it's converting your thread into a fiber lazily, i.e., this is the first time you've tried to do something in the thread and it has to set up fiber-local storage before continuing.
So, a couple of things to try and investigate.
1/ Are you running this code on pre-XP? I believe fiber-local storage (__set_flsgetvalue) was introduced in XP. This is a long shot but we need to clear it up anyway.
2/ Do you need to link with CGAL? I'm assuming your application needs something in the CGAL libraries, otherwise don't link with it. It may be a hangover from another project file.
3/ If you do use CGAL, make sure you're using the latest version. As of 3.3, it supports a dynamic linking which should prevent the possibility of mixing different library versions (both static/dynamic and debug/nondebug).
4/ Can you try to compile with VC8? The CGAL supported platforms do NOT yet include VC9 (VS2008). You may need to follow this up with the CGAL team itself to see if they're working on that support.
5/ And, finally, do you have Boost installed? That's another long shot but worth a look anyway.
If none of those suggestions help, you'll have to wait for someone more knowledgable than I to come along, I'm afraid.
Best of luck.

Crashes before main() are usually caused by a bad constructor in a global or static variable.
Looks like the constructor for class Random.

Do you have a global or static variable of type Random? Is it possible that you're trying to construct it before the library it's in has been properly initialized?
Note that the order of construction of global and static variables is not fixed and might change going from debug to release.

Could you be more specific about the error you're receiving? (unhandled exception std::length sounds weird - i've never heard of it)
To my knowledge, FlsGetValue automatically falls back to TLS counterpart if FLS API is not available.
If you're still stuck, take .dmp of your process at the time of crash and post it (use any of the numerous free upload services - and give us a link) (Sounds like a missing feature in SO - source/data file exchange?)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string