How can I get all frames of stack traces in "!htrace -diff" result? - memory-leaks

It seems that "!htrace -diff" can only show 16 frames. How can I increase the frame counts in the stack traces? The following is one of the handles leaked detected by !htrace -diff. I can't read anything from it without a complete stack trace.
Handle = 0x00000f7c - OPEN
Thread ID = 0x00001cc4, Process ID = 0x00009f20
0x01b8dad8: +0x01b8dad8
0x018c6e93: +0x018c6e93
0x7788179a: +0x7788179a
0x000a20bb: +0x000a20bb
0x753ab069: +0x753ab069
0x7539cf87: +0x7539cf87
0x75322776: +0x75322776
0x7539d07e: +0x7539d07e
0x7539c549: +0x7539c549
0x778ae707: +0x778ae707
0x7785c32e: +0x7785c32e
0x77a2ff66: ntdll!ZwCreateEvent+0x00000012
0x69bffc58: verifier!AVrfpNtCreateEvent+0x0000006b
0x77390d93: KERNELBASE!CreateEventExW+0x0000006e
0x773911c6: KERNELBASE!CreateEventW+0x00000027
0x69bffd8f: verifier!AVrfpCreateEventW+0x00000078

This link point to this one Which tells that basically it is hard coded.
The maximum depth of the stack trace is currently hardcoded to 16
(although it's possible it will change in the future). Also, that
includes a few entries for the kernel-mode portion of the stack
trace. Those stack trace entries can be displayed by kernel or driver
developers by using !htrace in a kernel debugger. So getting around
11 user-mode entries for each of your traces sounds accurate.

Unfortunately you can't.
Assuming that you have symbols set up correctly, I see the following possibility
Some of the traces reported by !htrace may be from a different process context. In this case, the return addresses may not resolve properly in the current process context, or may resolve to the wrong symbols.
Source: WinDbg help (.hh !htrace)
This can happen if a different process injects handles into your process and the addresses correlate to that process. In this case, the process ID listed by !htrace does not match the process you're debugging (type | (pipe) to get the process ID).
In such a case, you can attach to the process (.attach 0x<pid>, 0n is default here) and try to get the remaining callstack from there, but I never did this myself.

Related

What is the reason of Corrupted fields problem in SPLUNK?

I have a problem on this search below for last 25 days:
index=syslog Reason="Interface physical link is down" OR Reason="Interface physical link is up" NOT mainIfname="Vlanif*" "nw_ra_a98c_01.34_krtti"
Normally field7 values are like these ones:
Region field7 Date mainIfname Reason count
ASYA nw_ra_m02f_01.34pndkdv may 9 GigabitEthernet0/3/6 Interface physical link is up 3
ASYA nw_ra_m02f_01.34pldtwr may 9 GigabitEthernet0/3/24 Interface physical link is up 2
But recently they wee like this:
00:00:00.599 nw_ra_a98c_01.34_krtti
00:00:03.078 nw_ra_a98c_01.34_krtti
I think problem may be related to:
It started to happen after the disk free alarm. (-Cri- Swap reservation, bottleneck situation, current value: 95.00% exceeds configured threshold: 90.00%. : 07:17 17/02/20)
Especially This is not about disk, it's about swap space, the application finishes memory and then goes to swap use. There was memory increase before, but obviously it was insufficient, it is switching to swap again.
I need to understand: ''Why they use so many resources?''
Problematic one:
Normal one:
You need to provide example events, one from the normal situation, and one from the problematic situation.
It appears that someone in your environment has developed a field extraction for field7, which is incorrectly parsing the event.
Alternatively, it could the device that is sending the syslog data, may have an issue with it and it is reporting an error. Depending on the device, you may be better using a TA from splunkbase.splunk.com to extract the relevant information from the event

"shmop_open(): unable to attach or create shared memory segment 'No error':"?

I get this every time I try to create an account to ask this on Stack Overflow:
Oops! Something Bad Happened!
We apologize for any inconvenience, but an unexpected error occurred while you were browsing our site.
It’s not you, it’s us. This is our fault.
That's the reason I post it here. I literally cannot ask it on Overflow, even after spending hours of my day (on and off) repeating my attempts and solving a million reCAPTCHA puzzles. Can you maybe fix this error soon?
With no meaningful/complete examples, and basically no documentation whatsoever, I've been trying to use the "shmop" part of PHP for many years. Now I must find a way to send data between two different CLI PHP scripts running on the same machine, without abusing the database for this. It must work without database support, which means I'm trying to use shmop, but it doesn't work at all:
$shmopid = shmop_open(1, 'w', 0644, 99999); // I have no idea what the "key" is supposed to be. It says: "System's id for the shared memory block. Can be passed as a decimal or hex.", so I've given it a 1 and also tried with 123. It gave an error when I set the size to 64, so I increased it to 99999. That's when the error changed to the one I now face above.
shmop_write($shmopid, 'meow 123', 0); // Write "meow 123" to the shared variable.
while (1)
{
$shared_string = shmop_read($shmopid, 0, 8); // Read the "meow 123", even though it's the same script right now (since this is an example and minimal test).
var_dump($shared_string);
sleep(1);
}
I get the error for the first line:
shmop_open(): unable to attach or create shared memory segment 'No error':
What does that mean? What am I doing wrong? Why is the manual so insanely cryptic for this? Why isn't this just a built-in "superarray" that can be accessed across the scripts?
About CLI:
It cannot work in standalone CLI processes, as an answer here says:
https://stackoverflow.com/a/34533749
The master process is the one to hold the shared memory block, so you will have to use php-fpm or mod_php or some other web/service-running version, and maybe even start/request/stop it all from a CLI php script.
About shmop usage itself:
Use "c" mode in shmop_open() for creating the segment before it can be used with "a" or "w".
I stumbled on this error in a different scenario where shared memory is completely optional to speed up some repeated operations. So I wanted to try reading first without knowing memory size and then allocate from actual data when needed. In my case I had to call it as #shmop_open() to hide this error output.
About shmop on Windows:
PHP 7 crashed Apache worker process causing its restart with status 3221225477 when trying to reallocate a segment with the same predefined (arbitrary number) key but different size, even after shmop_delete(). As a workaround for this, I took filemtime() of the source file which contains data to be stored in memory, and used that timestamp as the key in shmop_open(). It still was not flawless IIRC, and I don't know if it would cause memory leaks, but it was enough to test my code which would mainly run on Linux anyway.
Finally, as of PHP version 8.0.7, shmop seems to work fine with Apache 2.4.46 and mod_php in Windows 10.

Linux kernel crash call stack length

Would anyone be able to tell how I can easily increase the number of call stack functions reported on a Linux Kernel crash?
Currently I see:
[<80100ca8>] (free_buffer_head) from [<80100d2c>] (try_to_free_buffers+0x7c/0xbc)
[<80100d2c>] (try_to_free_buffers) from [<800a9358>] (invalidate_inode_page+0x64/0x7c)
[<800a9358>] (invalidate_inode_page) from [<800a9454>] (invalidate_mapping_pages+0xe4/0x168)
[<800a9454>] (invalidate_mapping_pages) from [<80280f68>] (blkdev_ioctl+0x40c/0x910)
[<80280f68>] (blkdev_ioctl) from [<800e78a0>] (do_vfs_ioctl+0x3dc/0x59c)
[<800e78a0>] (do_vfs_ioctl) from [<800e7a94>] (SyS_ioctl+0x34/0x5c)
[<800e7a94>] (SyS_ioctl) from [<8000e460>] (ret_fast_syscall+0x0/0x30)
But I would really like to see even more output to get to the root of the error.
Many thanks
I think you already see everything. SyS_ioctl is entry function from userspace call and free_buffer_head is function where it actually crashed. It will not provide the backtrace from userspace program as, first it can be application specific, secondly it's not relevant to kernel crash.

list_empty returns 1 (non zero) if the list is not empty

I am working on Linux Device Driver code. I cant reveal what exactly this code is used for. I will try my best to explain my situation. Below code will be executed in interrupt context when we receive an USB interrupt stating that there is some data from USB. The data will arrive as URB's and we will convert them into chunks, add them to Linux Circular Doubly linked list for further processing.
spin_lock(&demod->demod_lock);
/* Delete node from free list */
list_del(&chunk->list_head);
/* Add it to full list */
list_add_tail(&chunk->list_head,&demod->ChunkRefList.full);
/* Update chunk status */
chunk->status=CHUNKDESC_FULL_LIST;
/* Increment the chunks assigned to application */
demod->stats.assigned.c++;
spin_unlock(&demod->demod_lock);
printk("Value returned by list_empty is %d ",list_empty(&demod->ChunkRefList.full));
I repeat, this code gets executed in interrupt context. I am using this code as a part of embedded system which displays Audio & Video(AV) forever.
After approx 12 hours, the AV is frozen and when I analyse, I see the value of list_empty(&demod->ChunkRefList.full) returned is always 0x1 forever. Usually in working case where AV is playing fine, its value will be 0x0 stating that the list is not empty.
As you can see, when above code gets executed, it first adds a node to the full list and checks if full list is empty. Ideally, the value should be always 0x0. Why its value is 0x1
I am monitoring the value of list_empty(&demod->ChunkRefList.full) using timedoctor applicaiton which doesn't introduce any overhead in interrupt context. I have used printk above to make you understand that I am printing its value.
Finally, I was able to figure out the issue.
Since the code is implemented in interrupt context, using list_del followed by list_add_tail was causing issue.
So, I removed these two macros and introduced list_move_tail macro which resolved the issue.
By using list_move_tail, there is no need for us to remove the node and add it to the new list. The kernel will takecare of both operations.
So, to conclude, its better to make Linux Linked list manipulations as minimal as possible in interrupt context.

Visual C++ App crashes before main in Release, but runs fine in Debug

When in release it crashes with an unhandled exception: std::length error.
The call stack looks like this:
msvcr90.dll!__set_flsgetvalue() Line 256 + 0xc bytes C
msvcr90.dll!__set_flsgetvalue() Line 256 + 0xc bytes C
msvcr90.dll!_getptd_noexit() Line 616 + 0x7 bytes C
msvcr90.dll!_getptd() Line 641 + 0x5 bytes C
msvcr90.dll!rand() Line 68 C
NEM.exe!CGAL::Random::Random() + 0x34 bytes C++
msvcr90.dll!_initterm(void (void)* * pfbegin=0x00000003, void (void)* * pfend=0x00345560) Line 903 C
NEM.exe!__tmainCRTStartup() Line 582 + 0x17 bytes C
kernel32.dll!7c817067()
Has anyone got any clues?
Examining the stack dump:
InitTerm is simply a function that walks a list of other functions and executes each in step - this is used for, amongst other things, global constructors (on startup), global destructors (on shutdown) and atexit lists (also on shutdown).
You are linking with CGAL, since that CGAL::Random::Random in your stack dump is due to the fact that CGAL defines a global variable called default_random of the CGAL::Random::Random type. That's why your error is happening before main, the default_random is being constructed.
From the CGAL source, all it does it call the standard C srand(time(NULL)) followed by the local get_int which, in turn, calls the standard C rand() to get a random number.
However, you're not getting to the second stage since your stack dump is still within srand().
It looks like it's converting your thread into a fiber lazily, i.e., this is the first time you've tried to do something in the thread and it has to set up fiber-local storage before continuing.
So, a couple of things to try and investigate.
1/ Are you running this code on pre-XP? I believe fiber-local storage (__set_flsgetvalue) was introduced in XP. This is a long shot but we need to clear it up anyway.
2/ Do you need to link with CGAL? I'm assuming your application needs something in the CGAL libraries, otherwise don't link with it. It may be a hangover from another project file.
3/ If you do use CGAL, make sure you're using the latest version. As of 3.3, it supports a dynamic linking which should prevent the possibility of mixing different library versions (both static/dynamic and debug/nondebug).
4/ Can you try to compile with VC8? The CGAL supported platforms do NOT yet include VC9 (VS2008). You may need to follow this up with the CGAL team itself to see if they're working on that support.
5/ And, finally, do you have Boost installed? That's another long shot but worth a look anyway.
If none of those suggestions help, you'll have to wait for someone more knowledgable than I to come along, I'm afraid.
Best of luck.
Crashes before main() are usually caused by a bad constructor in a global or static variable.
Looks like the constructor for class Random.
Do you have a global or static variable of type Random? Is it possible that you're trying to construct it before the library it's in has been properly initialized?
Note that the order of construction of global and static variables is not fixed and might change going from debug to release.
Could you be more specific about the error you're receiving? (unhandled exception std::length sounds weird - i've never heard of it)
To my knowledge, FlsGetValue automatically falls back to TLS counterpart if FLS API is not available.
If you're still stuck, take .dmp of your process at the time of crash and post it (use any of the numerous free upload services - and give us a link) (Sounds like a missing feature in SO - source/data file exchange?)

Resources