Is there a way in the dmalloc tool to check only a specific memory range? - memory-leaks

We are running the dmalloc tool against a specific app, but we get too many traces that are not from the app itself; they come from the C libraries, and the log becomes huge. To avoid this, is there any way we can feed dmalloc an address range, or any other option we could try?
6567117: 175519: not freed: '0x17c9c80|s1' (80 bytes) from 'unknown'
Entries like this one, reported as 'unknown', fill the log file.
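One approach worth trying (a sketch only, assuming the application can be rebuilt against dmalloc and that your dmalloc version provides the dmalloc_mark() / dmalloc_log_changed() functions from dmalloc.h) is to bracket just the application code you care about, so the unfreed report only covers allocations made between the two points instead of everything the C libraries allocated at startup:

    /* Sketch: restrict the "not freed" report to allocations made inside
     * the app's own code path.  dmalloc_mark() and dmalloc_log_changed()
     * are documented dmalloc calls, but check your dmalloc version. */
    #include <stdlib.h>

    #ifdef DMALLOC
    #include <dmalloc.h>   /* include last, after the system headers */
    #endif

    static void run_app_workload(void)
    {
        /* ... the application code you actually want to audit ... */
        void *p = malloc(80);
        (void)p;           /* intentionally leaked for the example */
    }

    int main(void)
    {
    #ifdef DMALLOC
        /* Remember the current allocation point; everything allocated
         * before this (startup / C library work) is excluded. */
        unsigned long mark = dmalloc_mark();
    #endif

        run_app_workload();

    #ifdef DMALLOC
        /* Log only pointers changed since the mark:
         * not-freed = 1, freed = 0, details = 1. */
        dmalloc_log_changed(mark, 1, 0, 1);
    #endif
        return 0;
    }

This does not filter by address range, but in practice it keeps the startup and library allocations out of the "not freed" log.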

Related

Where can I get node exporter metrics description?

I'm new to monitoring a k8s cluster with Prometheus, node exporter and so on.
I want to know what the metrics actually mean, even though their names are somewhat self-descriptive.
I have already checked the node exporter GitHub repository, but did not find any useful information there.
Where can I get the descriptions of node exporter metrics?
Thanks
There is a short description along with each of the metrics. You can see them if you open node exporter in a browser or just curl http://my-node-exporter:9100/metrics. You will see all the exported metrics, and the lines with # HELP are the descriptions:
# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 2.59840376e+07
Grafana can show this help message in its query editor, and Prometheus (with the recent experimental editor) can show it too.
And this works for all metrics, not just node exporter's. If you need more technical details about those values, I recommend searching for the information in Google and man pages (if you're on Linux). Node exporter takes most of the metrics from /proc almost as-is and it is not difficult to find the details. Take for example node_memory_KReclaimable_bytes. 'Bytes' suffix is obviously the unit, node_memory is just a namespace prefix, and KReclaimable is the actual metric name. Using man -K KReclaimable will bring you to the proc(5) man page, where you can find that:
KReclaimable %lu (since Linux 4.20)
Kernel allocations that the kernel will attempt to
reclaim under memory pressure. Includes
SReclaimable (below), and other direct allocations
with a shrinker.
Finally, if your intention to learn more about the metrics is driven by a desire to configure alerts for your hardware, you can skip straight to that and grab some alerts shared by the community from here: https://awesome-prometheus-alerts.grep.to/rules#host-and-hardware

SB37 abend in production and cannot change the space parameter

My colleague faced an issue where his sort job failed with an SB37 abend. I know that this error can be rectified by allocating more space to the output file, but my question here is:
How can I remediate an SB37 abend without changing space allocation?
It takes a week or more to move changes to production. As such, I can't change the space allocation of the file at the moment, as the error is in production.
An SB37 abend indicates an out of space condition during end-of-volume processing.
B37 Explanation: The error was detected by the end-of-volume routine. This system completion code is accompanied by message IEC030I. Refer to the explanation of message IEC030I for complete information about the task that was ended and for an explanation of the return code (rc in the message text) in register 15.
The abend is accompanied by message IEC030I, which provides more information about the issue.
Depending on a few factors, your production control team may be able to fix the environment so that the job can run. Lacking more detail, it is impossible to provide an exact answer, so consider this a roadmap for how to approach the problem.
IEC030I B37-rc,mod, jjj,sss,ddname[-#],
dev,ser,diagcode,dsname(member)
In the message there should be a volser that identifies the volume being written to. If you have the production control team look at the contents of that volume, there may be insufficient space that can be remedied by removing datasets. There are too many options to enumerate without specifics about the failure, the type of dataset, and other information to guide you.
However, as indicated in other comments, if you have a production control team that can run the job, they should be able to make changes to the JCL to direct the output dataset to another set of volumes or storage groups.
Changes to the JCL are likely the only way to correct the problem.

Google developer tools - memory leak tracing

I've been trying to trace memory leaks in a hybrid app built using Ionic, and I tried using Google's developer tools for this. When I take a heap dump and trace any constructor value, I can see that a detached DOM tree is present, but when I expand it and view the origin under Objects in the Retainers tab, I'm not able to find which file the variable is located in. So I can tell that there is a memory leak, but not the file where it is located. Is there any way to find the file in which the variable or array causing the leak is present?
In the picture below, I cannot find a way to trace the memory leak back to its file.

Relevant debug data for a Linux target

For an embedded ARM system running in the field, there is a need to retrieve relevant debug information when a user-space application crashes. This information will be stored in non-volatile memory so it can be retrieved at a later time. All of it must be stored at runtime, and we cannot use third-party applications due to memory consumption concerns.
So far I have thought of the following:
Signal ID and the corresponding PC / memory addresses in case a kernel signal occurs;
Process ID;
What other information do you think is relevant in order to identify the cause of the problem and be able to debug it quickly afterwards?
Thank you!
Usually, to be able to understand an issue, you'll need every register (from r0 to r15), the CPSR, and the top of the stack (to be able to determine what happened before the crash). Please also note that when your program is interrupted by an invalid operation (jump to an invalid address, ...), the processor goes into an exception mode, whereas you need to dump the registers and stack in the context of your process.
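A minimal sketch of that kind of register capture on 32-bit ARM Linux follows (an assumption-laden example, not a drop-in solution: the arm_* field names come from the 32-bit ARM sigcontext layout and may differ with your C library, and the fprintf() calls stand in for whatever async-signal-safe writer you use to reach non-volatile storage):

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <ucontext.h>

    /* Sketch: capture PC, LR, SP, CPSR and the top of the stack from the
     * faulting context of the process itself (SA_SIGINFO handler). */
    static void crash_handler(int sig, siginfo_t *info, void *uc_void)
    {
        ucontext_t *uc = (ucontext_t *)uc_void;

        fprintf(stderr, "signal %d at fault address %p\n", sig, info->si_addr);
        fprintf(stderr, "pc=%08lx lr=%08lx sp=%08lx cpsr=%08lx\n",
                (unsigned long)uc->uc_mcontext.arm_pc,
                (unsigned long)uc->uc_mcontext.arm_lr,
                (unsigned long)uc->uc_mcontext.arm_sp,
                (unsigned long)uc->uc_mcontext.arm_cpsr);

        /* Dump the top of the faulting thread's stack. */
        const unsigned long *sp = (const unsigned long *)uc->uc_mcontext.arm_sp;
        for (int i = 0; i < 32; i++)
            fprintf(stderr, "stack[%02d] = %08lx\n", i, sp[i]);

        /* Restore the default action and re-raise so the process still dies. */
        signal(sig, SIG_DFL);
        raise(sig);
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = crash_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
        sigaction(SIGBUS, &sa, NULL);

        *(volatile int *)0 = 42;   /* deliberately crash for the demo */
        return 0;
    }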
To be able to investigate using that data, you must also keep the ELF files (with debug information, if possible) from your build, so you can interpret the contents of your registers and stack.
In the end, the more information you keep, the easier the debugging is, but it may be expensive to keep every memory section used by your program at the time of the failure (as a matter of fact, I've never done this).
In postmortem analysis, you will face some limits:
Dynamically linked libraries: if the crash occurs in dynamically loaded and linked code, you will also need the library binaries you are using on your target.
Memory corruption: memory corruption usually results in random data being executed as code. On ARM with Linux this will probably lead to a segfault, as you can't jump into another process's memory area and your data will probably be marked as "never execute". Nevertheless, by the time the crash happens, you may already have corrupted the data that could have allowed you to identify the source of the corruption. Postmortem analysis isn't always able to identify the failure cause.

Minimal core dump (stack trace + current frame only)

Can I configure what goes into a core dump on Linux? I want to obtain something like the Windows mini-dumps (minimal information about the stack frame when the app crashed). I know you can set a max size for the core files using ulimit, but this does not allow me to control what goes inside the core (i.e. there is no guarantee that if I set the limit to 64kb it will dump the last 16 pages of the stack, for example).
Also, I would like to set it in a programmatic way (from code), if possible.
I have looked at the /proc/PID/coredump_filter file mentioned by man core, but it seems too coarse grained for my purposes.
To provide a little context: I need tiny core files, for multiple reasons. I need to collect them over the network for numerous clients (thousands of them); furthermore, these are embedded devices with small SD cards and GPRS modems for the network connection. So anything above ~200 KB is out of the question.
EDIT: I am working on an embedded device which runs Linux 2.6.24. The processor is PowerPC. Unfortunately, powerpc-linux is not supported in Breakpad at the moment, so Google Breakpad is not an option.
I have "solved" this issue in two ways:
I installed a signal handler for SIGSEGV and used backtrace/backtrace_symbols to print out the stack trace (a sketch of such a handler is shown below). I compiled my code with -rdynamic, so even after stripping the debug info I still get a backtrace with meaningful names (while keeping the executable compact enough).
I stripped the debug info and put it in a separate file, which I will store somewhere safe, using strip; from there, I will use addr2line with the addresses saved from the backtrace to understand where the problem happened. This way I have to store only a few bytes.
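For reference, a sketch of the handler described in the first point (assumptions: glibc/uClibc with <execinfo.h>, and the binary built with -rdynamic; backtrace_symbols_fd() is used here instead of backtrace_symbols() because it writes straight to a file descriptor and avoids calling malloc() from the signal handler):

    #include <execinfo.h>
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    /* Sketch: print a symbolic backtrace when the process segfaults. */
    static void segv_handler(int sig)
    {
        void *frames[32];
        int n = backtrace(frames, 32);

        backtrace_symbols_fd(frames, n, STDERR_FILENO);

        /* Restore the default action and re-raise so the process still dies
         * (and can still produce a core file if one is enabled). */
        signal(sig, SIG_DFL);
        raise(sig);
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = segv_handler;
        sigaction(SIGSEGV, &sa, NULL);

        *(volatile int *)0 = 1;   /* force a crash for the demo */
        return 0;
    }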
Alternatively, I found I could use /proc/self/coredump_filter to dump no memory (setting its content to "0"): only thread and proc info, registers, stack trace, etc. are saved in the core. See more in this answer.
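Setting that filter from code is just a write to the proc file; a small sketch (the bitmask semantics are described in man core, 0 means no memory mappings are dumped at all, and the value is preserved across fork() and execve(), so a small launcher process could also set it before starting the real application):

    #include <stdio.h>

    /* Sketch: ask the kernel to dump no memory mappings for this process,
     * keeping only the process/thread info and registers in the core. */
    static int set_minimal_coredump(void)
    {
        FILE *f = fopen("/proc/self/coredump_filter", "w");
        if (f == NULL)
            return -1;   /* e.g. kernel built without coredump_filter */

        int rc = (fprintf(f, "0\n") > 0) ? 0 : -1;
        fclose(f);
        return rc;
    }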
I still lose information that could be precious (global and local variable(s) content, params..). I could easily figure out which page(s) to dump, but unfortunately there is no way to specify a "dump-these-pages" for normal core dumps (unless you are willing to go and patch the maydump() function in the kernel).
For now, I'm quite happy with these 2 solutions (it is better than nothing...). My next moves will be:
see how difficult it would be to port Breakpad to powerpc-linux: there are already powerpc-darwin and i386-linux ports, so... how hard can it be? :)
try to use google-coredumper to dump only a few pages around the current ESP (that should give me locals and parameters) and around "&some_global" (that should give me globals).
