Tool for tracing and visualisation of pthread behaviour in Linux [closed]

I am eager to find a tool that allows me to trace the behaviour of the pthreads in a program I am working on. I am aware that there were similar questions asked before, see here and here.
As it turns out, the tools that are recommended are not what I need, or it seems impossible to get them to work on my machine. It is Debian 6, 32-bit all over, on x86 architecture.
EZtrace in combination with ViTE seems to be what I am looking for, but unfortunately I cannot get it to work. (The tools won't compile in some versions, other versions crash; I never really saw it work. A different computer (Ubuntu 10.04 x64) shows other bugs.)
Is there a tracing solution that is capable of visualizing the behavior of a pthreaded program on Linux, that is actually known to work?

Valgrind's Tool Suite [Linux and OS X]
I've used Memcheck and it works as advertised. I haven't used the visualization tools yet, however. I'm not sure whether the output of Helgrind can be adapted for viewing with KCachegrind.
The Valgrind distribution includes four [sic] useful debugging and profiling tools:
Memcheck detects memory-management problems, and is aimed primarily at C and C++ programs. When a program is run under Memcheck's supervision, all reads and writes of memory are checked, and calls to malloc/new/free/delete are intercepted. As a result, Memcheck can detect if your program:
Accesses memory it shouldn't ...
Uses uninitialised values in dangerous ways.
Leaks memory.
Does bad frees of heap blocks (double frees, mismatched frees).
Passes overlapping source and destination memory blocks to memcpy() and related functions.
Memcheck reports these errors as soon as they occur, giving the source line number at which it occurred...
Cachegrind is a cache profiler. It performs detailed simulation of the I1, D1 and L2 caches in your CPU and so can accurately pinpoint the sources of cache misses in your code...
Callgrind, by Josef Weidendorfer, is an extension to Cachegrind. It provides all the information that Cachegrind does, plus extra information about callgraphs. It was folded into the main Valgrind distribution in version 3.2.0. Available separately is an amazing visualisation tool, KCachegrind, which gives a much better overview of the data that Callgrind collects; it can also be used to visualise Cachegrind's output.
Massif is a heap profiler. It performs detailed heap profiling by taking regular snapshots of a program's heap. It produces a graph showing heap usage over time, including information about which parts of the program are responsible for the most memory allocations...
Helgrind is a thread debugger which finds data races in multithreaded programs. It looks for memory locations which are accessed by more than one (POSIX p-)thread, but for which no consistently used (pthread_mutex_) lock can be found. Such locations are indicative of missing synchronisation between threads, and could cause hard-to-find timing-dependent problems. It is useful for any program that uses pthreads. It is a somewhat experimental tool, so your feedback is especially welcome here.
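For the threading use case specifically, a typical invocation (sketched here; the program name myprog is just a placeholder) is to build with debug info and run the binary under Helgrind:
gcc -g -O1 -pthread myprog.c -o myprog
valgrind --tool=helgrind ./myprog
DRD (valgrind --tool=drd ./myprog) is a similar Valgrind tool for detecting thread errors and may also be worth a try.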

Check this: http://lttng.org/ (LTTng, the Linux Trace Toolkit next generation).
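For example, a kernel-tracing session that captures thread scheduling might look like this (a sketch, assuming LTTng is installed and you have the required privileges; the program name is a placeholder):
lttng create pthread-session
lttng enable-event --kernel sched_switch,sched_wakeup
lttng start
./my_pthread_program
lttng stop
lttng destroy
The resulting trace can then be inspected with a viewer such as Babeltrace or Trace Compass.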
HTH

DIVINE can draw a graph of the state space and check for violated assertions.

Related

How to achieve disk write speed as high as fio does? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 4 years ago.
Improve this question
I bought a HighPoint HBA card with four Samsung 960 PRO drives in it. According to the official site, this card can reach 7500 MB/s writing and 13000 MB/s reading.
When I test this card with fio on my Ubuntu 16.04 system, I get a write speed of about 7000 MB/s. Here are my test arguments:
sudo fio -filename=/home/xid/raid0_dir0/fio.test -direct=1 -rw=write -ioengine=sync -bs=2k -iodepth=32 -size=100G -numjobs=1 -runtime=100 -time_base=1 -group_reporting -name=test-seq-write
I have made a RAID 0 array on the card and created an XFS filesystem on it. I want to know how to achieve a disk write speed as high as fio's if I use functions such as open()/read()/write() or fopen()/fread()/fwrite() in my console applications.
I'll just note that the fio job you specified seems a little flawed:
-direct=1 -rw=write -ioengine=sync -bs=2k -iodepth=32
(for the sake of simplicity let's assume the dashes are actually double)
The above is trying to ask a synchronous ioengine to use an iodepth greater than one. This usually doesn't make sense, and the iodepth section of the fio documentation warns about this:
iodepth=int
Number of I/O units to keep in flight against the file. Note that increasing iodepth beyond 1 will not affect synchronous ioengines (except for small degrees when verify_async is in use). Even async engines may impose OS restrictions causing the desired depth not to be achieved. [emphasis added]
You didn't post the fio output, so we can't tell if you ever achieved an iodepth greater than one. 7.5GByte/s seems high for such a job, and I can't help thinking your filesystem quietly went and did buffering behind your back, but who knows? I can't say more because the output of your fio run is unavailable, I'm afraid.
Also note the data fio was writing might not have been random enough to defeat compression, thus helping to achieve an artificially high I/O speed...
Anyway, on to your main question:
how to achieve disk write speed as high as fio does?
Your example shows you are telling fio to use an ioengine that does regular write calls. With this in mind, theoretically you should be able to achieve a similar speed by:
Preallocating your file and only writing into the allocated portions of it (so you are not doing extending writes)
Fulfilling all the requirements of using O_DIRECT (there are strict memory alignment and size constraints that MUST be fulfilled)
Making sure your write operations work on buffers and write in chunks of exactly 2048 bytes (or larger, so long as they are powers of two)
Submitting your writes as soon as possible :-)
You may find not using O_DIRECT (and thus allowing buffered I/O to do coalescing) is better if for some reason you are unable to submit "large" well aligned buffers every time.
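As a rough illustration of those points, here is a minimal C sketch (not a tuned benchmark; the file path, sizes, and the 4096-byte alignment are assumptions you should adjust for your device and filesystem): preallocate the file, then issue aligned O_DIRECT writes.
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const size_t block = 4096;                /* power-of-two write size, assumed to satisfy O_DIRECT alignment */
    const size_t total = 128UL * 1024 * 1024; /* 128 MiB for the example */

    int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Preallocate so the writes below are not extending writes. */
    if (fallocate(fd, 0, 0, (off_t)total) != 0) { perror("fallocate"); return 1; }

    /* O_DIRECT requires suitably aligned buffers. */
    void *buf;
    if (posix_memalign(&buf, block, block) != 0) { fprintf(stderr, "posix_memalign failed\n"); return 1; }
    memset(buf, 'x', block);

    for (size_t off = 0; off < total; off += block) {
        ssize_t n = pwrite(fd, buf, block, (off_t)off);
        if (n != (ssize_t)block) { perror("pwrite"); return 1; }
    }

    free(buf);
    close(fd);
    return 0;
}
Real benchmarking code would also keep more I/O in flight (multiple threads or an asynchronous engine) and verify that the filesystem actually honoured O_DIRECT.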

Is it possible in Linux to disable filesystem caching for specific files? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 4 years ago.
Improve this question
I have some large files and I am OK with them being read at disk I/O capacity. I wish to keep the file system cache free for other files.
Is it possible to turn off file system caching for specific files in Linux?
Your question hints that you might not be the author of the program you wish to control... If that's the case, the answer is "not easily". If you are looking for something where you just mark (e.g. via extended attributes) a particular set of files "nocache", the answer is no. At best you are limited to having an LD_PRELOAD wrapper around the program, and the wrapper would have to be written carefully to avoid impacting all files the program tries to open, etc.
If you ARE the author of the program you should take a look at using fadvise (or the equivalent madvise if you're using mmap) because after you have finished reading the data you can hint to the kernel that it should discard the pieces it cached by using the FADV_DONTNEED parameter (why not use FADV_NOREUSE? Because with Linux kernels available at the time of writing it's a no-op).
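For illustration, here is a minimal C sketch of that approach (not the poster's program; the file name comes from the command line): read the file normally, then hint that its cached pages can be dropped.
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[1 << 16];
    while (read(fd, buf, sizeof buf) > 0) {
        /* ... process the data ... */
    }

    /* offset 0, len 0 means "the whole file"; posix_fadvise returns the error code directly. */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

    close(fd);
    return 0;
}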
Another technique if you're the author would be to open the file with the O_DIRECT flag set but I do not recommend this unless you really know what you're doing. O_DIRECT comes with a large set of usage constraints and conditions on its use (which people often don't notice or understand the impact of until it's too late):
You MUST do I/O in multiples of the disk's block size (no smaller than 512 bytes but not unusual for it to be 4Kbytes and it can be some other larger multiple) and you must only use offsets that are similarly well aligned.
The buffers of your program will have to conform to an alignment rule.
Filesystems can choose not to support O_DIRECT so your program has to handle that.
Filesystems may simply choose to put your I/O through the page cache anyway (O_DIRECT is a "best effort" hint).
NB: Not allowing caching to be used at all (i.e. not even on the initial read) may lead to the file being read in at speeds far below what the disk can achieve.
I think you can do this with the open system call and the O_DIRECT flag, for the file you don't want cached in the kernel's page cache.
The meaning of the O_DIRECT flag, from the open manual, is the following:
O_DIRECT (Since Linux 2.4.10)
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT. See NOTES below for further discussion.

Why is return-to-libc much earlier than return-to-user?

I'm really new to this topic and only know some basic concepts. Nevertheless, there is a question that quite confuses me.
Solar Designer proposed the idea of return-to-libc in 1997 (http://seclists.org/bugtraq/1997/Aug/63). Return-to-user, from my understanding, didn't become popular until 2007 (http://seclists.org/dailydave/2007/q1/224).
However, return-to-user seems much easier than return-to-libc. So my question is, why did hackers spend so much effort in building a gadget chain by using libc, rather than simply using their own code in the user space when exploiting a kernel vulnerability?
I don't believe that they did not realize there are NULL pointer dereference vulnerabilities in the kernel.
Great question, and thank you for taking some time to research the topic before making a post that causes its readers to lose hope in the future of mankind (you might be able to tell I've read a few [exploit] tags tonight).
Anyway, there are two reasons:
return-to-libc is generic, provided you have the offsets. There is no need to programmatically or manually build either a return chain or scrape existing user functionality.
Partly because of linker-enabled relocations and partly because of history, the executable runtime of most programs executing on a Linux system essentially demands the libc runtime, or at least a runtime that can correctly handle _start and cons. This formula still stands on Windows, just under the slightly different paradigm of return-to-kernel32/oleaut/etc., but it can actually be more immediately powerful there, particularly for shellcode with length requirements, for reasons relating to the way in which system calls are invoked indirectly by kernel32-SSDT functions.
As a side note, if you are discussing NULL pointer dereferences in the kernel, you may be confusing this with return to vDSO space, which is actually subject to a different set of constraints that standard "mprotect and roll" userland is not.
Ret2libc or ROP is a technique you deploy when you cannot return to your shellcode (jmp esp/whatever) because of memory protections (DEP/NX).
They perform different goals, and you may engage with both.
Return to libc
This is a buffer overrun exploit where you have control over some input, and you know there exists a badly written program which will copy your input onto the stack and return through it. This allows you to start executing code on a machine you have no login to, or only very restricted access to.
Return to libc gives you the ability to control what code is executed, and would generally result in you running, within that address space, a more natural piece of code.
Return from kernel
This is a privilege escalation attack. It takes code which is running on the machine, and exploits bugs in the kernel causing the kernel privileges to be applied to the user process.
Thus you may use return-to-libc to get your code running in a web-browser, and then return-to-user to perform restricted tasks.
Earlier shellcode
Before return-to-libc, there was straight shellcode, where the buffer overrun included the exploit code; with knowledge of where the stack would be, it was possible to run it directly. This became somewhat obsolete with the NX bit in Windows. (x86 hardware was capable of enforcing execute permission on a segment, but not on a page.)
Why is return-to-libc much earlier than return-to-user?
The goal in these attacks is to own the machine. Frequently it was easy to find a vulnerability in a user-mode process which gave you access to what you needed; it is only with the hardening of the system, by reducing the privileges at the boundary and fixing the bugs in important programs, that the newer return-from-kernel attacks became necessary.

What is the big deal about deploying a debug build in production? [closed]

Given that 90% of the time developers are working with debug builds, why, exactly, is deploying release builds preferable?
Size, speed, and memory use.
Your users aren't going to need all the debugging crud you work with, so stripping the debug symbols reduces binary size and memory consumption (and therefore increases speed, as less time is spent loading the program's components into RAM).
When your application crashes, you usually want a traceback and the details. Your users really couldn't care less about that.
There is nothing wrong with deploying a debug build. People commonly don't because non-debug builds tend to be more efficient (i.e. the assertion code is removed and the compiler doesn't insert tracing/debugger info in the object code).
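As a small C illustration of the assertion point (a sketch, not tied to any particular project): the same source compiled with and without NDEBUG, which most "release" configurations define.
#include <assert.h>
#include <stdio.h>

/* Hypothetical check: expensive_invariant() runs in a debug build, but the
   whole assert() expression disappears when NDEBUG is defined. */
static int expensive_invariant(void)
{
    puts("checking invariant...");
    return 1;
}

int main(void)
{
    assert(expensive_invariant());
    puts("doing real work");
    return 0;
}
Built with gcc -O2 -DNDEBUG example.c, the assert and the call inside it are removed entirely; built without -DNDEBUG they stay in, which is exactly the kind of per-build difference being described here.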
With Java 1.1, the debug builds could be much slower and they needed more disk space (at that time, disks with 120 megabytes were huge - try to fit your home directory on such a small device...).
Today, neither is an issue anymore. The Java runtime ignores debug symbols, and with the JIT the code is optimized at runtime, so compile-time optimizations don't mean that much anymore.
One big advantage of debug builds is that the code in production is exactly what you tested.
That entirely depends on the build configuration, which may include compilation optimisations when doing a "release" build. There are other considerations, dependent on the language; for example, when compiling an application using Visual C++, you can compile against debug versions of the C RunTime (CRT), which you are not permitted to redistribute.
I think the key here is to determine what exactly makes a build a "debug build". Mainly, production builds are more optimized (memory usage, performance, etc.), and more care has been taken to build them in a way that makes them easier to deploy and maintain. One main thing that I run into the most is that debug builds have a lot of unneeded logging that results in a pretty dramatic performance hit.
Apart from that, there is no real reason why debug builds shouldn't be deployed to production environments.

Handling GUI with assembly language in Linux [closed]

I am very new to assembly programming. I have a little experience with MASM, which works on Windows. I want to know how to deal with GUIs in Linux. (I have done simple programs in assembly on Linux using gcc.) I would appreciate it if someone could point me to resources, particularly coding samples.
Thanks!!
You'll want:
NASM: A cross platform assembler
GTK+: a C GUI library
Ubuntu: The most popular desktop Linux distribution
An example of GTK in use with NASM on Linux
If you want to go "low level", there is (or used to be) https://en.wikipedia.org/wiki/Direct_Graphics_Access on Linux + XFree86 (now called X.org).
You could map the framebuffer into user-space and draw on it with loads/stores, instead of read/write system calls on sockets (the end result of GTK+ function calls is normally talking to the X server over a socket).
Or there are various libraries that allow more or less direct access to video RAM and video modes when your program is running on a virtual console with no X server (or other display server like Wayland).
https://en.wikipedia.org/wiki/Virtual_console#Interface mentions DirectFB, DRI, SDL and the old SVGALib.
Changing video modes will normally still require a system call, but you can load/store to video RAM. Different libraries presumably have different ways for dealing with vsync / double-buffering / whatever.
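As a rough illustration of the "dumb framebuffer" approach, here is a minimal C sketch using the legacy fbdev interface (/dev/fb0); it assumes you are on a text-mode virtual console with no display server, that you have permission to open the device, and that the mode is 32 bits per pixel.
#include <fcntl.h>
#include <linux/fb.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/fb0", O_RDWR);
    if (fd < 0) { perror("open /dev/fb0"); return 1; }

    struct fb_var_screeninfo vinfo;
    struct fb_fix_screeninfo finfo;
    if (ioctl(fd, FBIOGET_VSCREENINFO, &vinfo) < 0 || ioctl(fd, FBIOGET_FSCREENINFO, &finfo) < 0) {
        perror("ioctl"); return 1;
    }
    if (vinfo.bits_per_pixel != 32) { fprintf(stderr, "example assumes 32 bpp\n"); return 1; }

    /* Map the framebuffer and store pixels directly into it. */
    uint8_t *fb = mmap(NULL, finfo.smem_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED) { perror("mmap"); return 1; }

    for (uint32_t y = 0; y < vinfo.yres; y++)
        for (uint32_t x = 0; x < vinfo.xres; x++)
            *(uint32_t *)(fb + y * finfo.line_length + x * 4) = 0x0000FF00; /* green, assuming XRGB */

    munmap(fb, finfo.smem_len);
    close(fd);
    return 0;
}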
This isn't necessarily faster or better: modern graphics cards are much more than dumb framebuffers. They have GPU hardware that can accelerate graphics operations, and storing directly to video RAM doesn't take advantage of it.
But if you really want to play around with direct access to video RAM, those links should be a good starting point for making it happen in user-space on a virtual console under Linux, with hopefully less risk of locking up the whole machine when your program has a bug.
(Be ready to SSH in from another machine and kill your process + chvt, though. And make sure you enable the magic SysRQ keys for killing all processes on the current console, and for "unRaw" keyboard input.)
Disclaimer: I have not personally written software that does this, but there are some examples like FBI (framebuffer image viewer). I have recovered the console with ssh and/or SysRQ without rebooting after using buggy software and/or buggy drivers, though.
