Internals of systemtap - linux

I want to know what actually happens internally when the event written in the tap file occurs and how it is handled?

You should read the documents presented at the end of the SystemTap Documentation page, especially the Locating System Problems Using Dynamic Instrumentation OLS paper, to know a bit more about the internals.
The Introduction to KProbes article at LWN is worth a read too.

You can explore as deeply as you like. The architecture paper or the stap(1) man page are a good start. If you want to know everything, "stap -k ..." or "stap -p3 ..." lets you inspect the actual internal outputs of the systemtap script translator: the compiled C version of the script code.

Related

How to proceed with Linux source code customization?

I am a non CS/IT student, but having knowledge of C, Java, DS and Algorithms. Now-a-days I am focusing on operating system and had gained some of its concepts. But I want some practical knowledge of it. Merely writing algo code in java/c has no fun in doing. I have gone through many articles where they mentioned we can customize source code of Linux-kernel.
I want to start customizing the kernel as I move ahead in the learning of OS concepts and apply the same. It will make two goals achievable 1. I will gain practical idea of the operating system 2. I will have a project.
Problem which I face-
1. From where to get the source code? Which source code should I download? Also the documentation if possible.
https://www.kernel.org/
I went in there but there are so many of them which one will be better?
2. How will I customize the code once I have it?
Please give me suggestions with detail about how I should start this journey (of changing source code to customize Linux).
Moreover I am using Windows 8.
I recommend first reading several books on OSes and on programming. You need a broad CS culture (if possible get a CS degree)
I am a non CS/IT student,
You'll better become one, or else spend years of work to learn all the stuff a CS graduate student has learnt.
First, you need to be very familiar with Linux programming on user side (application programs). So read at least Advanced Linux Programming and study the source code of several programs, including shells (and some kind of servers). Read also carefully syscalls(2). Explore the state of your kernel (e.g. thru proc(5)...). Look into https://kernelnewbies.org/
I also recommend learning several programming languages. You should in particular read SICP, an excellent introduction to programming. Read also some book like programming language pragmatics. Read something about continuation and continuation passing style. Read the Dragon book. Read some Introduction to Algorithms. Read something about computer architecture and instruction set architecture
Merely writing algo code in java/c has no fun in doing.
But the kernel is also written in C (mostly) and full of algorithmic code. What makes you think you'll get more fun in it?
I want to start customizing the kernel as I move ahead in the learning of OS concepts and apply the same.
But why? Why don't you also consider studying and contributing to some user-level code
I would recommend first reading a good book on OSes in general, notably Operating Systems: Three Easy Pieces. Look also on OSdev.
At last, the general advice about kernel programming is don't. A common mistake is to try adding code inside the kernel to solve some issue that can and should be solved in user-land.
How will I customize the code once I have it?
You probably should not customize the kernel, but if you did you'll use familiar tools (a good source code editor like emacs or vim, a compiler and linker on the command line, a build automation tool like make). Patching the kernel is similar to patching some other free software. But testing your kernel is harder (because you'll often reboot).
You'll also find several books explaining the Linux kernel.
If you still want to customize the kernel you should first try to code some kernel module.
Moreover I am using Windows 8.
This is a huge mistake. You first need to be an advanced Linux user. So wipe out Windows from your computer, and install some Linux distribution -I recommend Debian- (and use only Linux, no more Windows). Become familiar with command line.
I seriously recommend to avoid working on the kernel as your first project.
I strongly recommend looking at some existing user-land free software project first (there are thousands of them, notably on github, e.g. choose some package in your distribution, study its source code, work on it, propose the patch to the community). Be able to build from source code a lot of things.
A wise man once said you "must act your way into right thinking, as you cannot think your way into right acting". In your case, you'll need to act as an experienced programmer would act, which means before we write any code, we need to answer some questions.
What do we want to change?
Why do we want to change it?
What are the repercussions of this change (ie what other functions - out of all the 10's of millions of lines of source code - call this function)?
After we've made the change, how are we going to compile it? In other words, there is a defined process for this. What is it?
After we compile our new kernel/module, how are we going to test it?
A good start, in addition to the answer that was just posted, would be to run LFS (Linux from Scratch). Get a successful install of that and use it as a starting point.
Now, since we're experienced programmers, we know that tinkering with a 10M+ line codebase is a recipe for trouble; we need a bit more direction than that. Here's a list of bugs that need to be fixed: https://bugzilla.kernel.org/buglist.cgi?chfield=%5BBug%20creation%5D&chfieldfrom=7d
I, for one, would be glad to see the one called "AUFS hangs on fanotify" go away, as I use AUFS with Docker on a daily basis.
If, down the line, you decide you'd rather hack on something besides the kernel, there are plenty of other options.
From your question it follows that you've already gained some concepts of an operating system. However, if you feel that it's still insufficient, it is OK to spend more time on learning. An operating system (mainly, a kernel) has certain tasks to perform like memory management (or memory protection), multiprogramming, hardware abstraction and so on. Neither of the topics may be neglected - they are all as important. So, if you have some time, you may refer to such useful books as "Modern Operating Systems" by Andrew Tanenbaum. Special books like that will shed much light on all important aspects of a modern OS. Suffice it to say, Linux kernel itself was started by Linus Torvalds because of a strong inspiration by MINIX - an educational project by A. Tanenbaum.
Such a cumbersome project like an OS kernel (BSD, Linux, etc.) contains lots of code. Many people are collaborating to write or enhance whatever parts of the kernel. So, there is a common and inevitable need to use a version control system. So, if you have an intention to submit your code to the kernel in future, you also have to have hands on with version control. Particularly, Linux relies on Git SCM (software configuration management - a synonym for version control).
So, once you have some knowledge of Git, you can install it on your computer and download Linux source code: git clone https://github.com/torvalds/linux.git
Determine your goals at Linux kernel modification. What do you want to achieve? Perhaps, you have a network card which you suspect to miss some features in Linux? Take a look at the other vendors' drivers and make an attempt to fix the driver of interest to include the features. Of course, this will require some knowledge of the HW, and, if the features are HW dependent, you will unlikely succeed to elaborate your code without special knowledge. But, in general, - if you are trying to make an enhancement, it assumes that you are an experienced Linux user yourself. Otherwise, how will you understand that some fixes/enhancements/etc. are required? So, I can't help but agree with the proposal to postpone Windows 8 for a while and start using some Linux distribution (eg. Debian).
If you succeed to determine your goals (eg. if you find a paper describing some desired changes in Linux kernel or if you decide to enhance some device drivers / write your own), you will be able to try it hands on. However, you still might need some helpful books, but, in this case, some Linux-specific ones. Also, writing C code for the kernel itself will require one important detail - you will need to comply with a so called coding standard, otherwise Linux kernel maintainers will not be able to accept your patches.
So, I made an attempt to outline some tips based on your current question. Of course, the job of kernel development has far more broad prerequisites, but these are which are just obvious.

How to get started developing on *nix

Is there a good 'how-to' or 'getting started' guide for getting started using g++ and gdb?
Some background. Decent programmer, but so far I have done everything on Windows in Visual Studio.
I have a little experience using terminal to compile files (not much beyond a .h and 1 or 2 .cpp). But nothing beyond that.
Anyone know of a good primer on on how to get started coding on Linux?
Read some good books, notably Advanced Linux Programming and Advanced Unix Programming. Read also the advanced bash scripting guide and other documentation from Linux Documentation Project
Obviously, install some Linux distribution on your laptop (not in some VM, but on real disk partitions). If you have a debian like distribution, run aptitude build-dep gcc-4.6 gedit on it to get a lot of interesting developers packages.
Learn some command line skills. Learn to use the man command; after installing manpages and manpages-dev packages, type man man (use the space bar to "scroll text", the q key to quit). Read also the intro(2) man page. When you forgot how to use a command like cp try cp --help.
Use a version control system like git, even for one person tiny projects.
Backup your files.
Read several relevant Wikipedia pages on Linux, kernels, syscalls, free software, X11, Posix, Unix
Try hard to use the command line. For instance, try to do everything on the command line for a week or more. Avoid using your desktop, and possibly your mouse. Learn to use emacs.
Read about builder programs like GNU make
Retrieve several free software from their source code (e.g. from sourceforge or freecode or github) and practice building and compiling them. Study their source code
Basic tips to start (if a command is not found, you need to install the package providing it) in command line (in a terminal).
run emacs ; there is a tutorial menu; practice it for half an hour.
edit a helloworld.c program (with a main calling some hello function)
compile it with gcc -g -Wall helloworld.c -o helloworld; improve your code till no warnings are given. Always pass -Wall to gcc or g++ to get almost all warnings.
run it with ./helloworld
debug it with gdb ./helloworld, then
use the help command
use the b main command to add a breakpoint in main and likewise for your hello function.
run it under gdb using r
use bt to get a backtrace
use p to print some variable
use c to continue the execution of the debugged program.
write a tiny Makefile to be able to build your helloworld program using make
learn how to call make (with M-x compile) and gdb (with M-x gdb) from inside Emacs
Learn more about valgrind (to detect most memory leaks). Perhaps consider using Boehm's GC in some of your applications.
You've got a lot of things to learn. I won't give you details, but as someone who's done unix and c/c++ development for a couple decades now I'll try to give you some topics to start with.
My main advice is to start experimenting. Write the most trivial program you can in C or C++ (something that prints "Hello there, world!" is traditional) and figure out how to compile and run it from the command line. Then, once you've got a compiled version, start it up under the debugger and play around with breakpoints, printing expressions, etc, etc. Once you've got this simplest program up and running and you sort of understand what the debugger is telling you, add a class, function, struct, or whatever else feels like a good small step and go through the cycle again. You'll proceed much faster this way than if you start with a very large program.
Still at a very high level, here are a handful of topics you'll need to figure out at least a bit about. Note that the "learn by starting small" approach works well for any of the topics below.
Running g++: it has pretty good online documentation for the command line syntax, and although you're bound to find it intimidating at first, try to look for the simplest starting point.
Find a text editor to use. Vim and emacs are traditional (and very very powerful) but both have a relatively steep learning curve. If you have someone around to help you, that's so much the better. There are other alternatives, but as an emacs user myself, I'm afraid I'm not that familiar with them.
Get familiar with gdb. It's an incredibly powerful tool for understanding your program. Again, it has extensive online documentation that will repay close reading.
Some familiarity with standard unix commands will be useful: ls, cd, and moving basics of the navigating unix directories; grep for quickly searching source files.
You'll have to get used to the command line approach versus the ide approach. The former is the traditional unix developer model, where you put together the operations you want from a collection of other tools, rather than having the ide hide most of this knowledge from you.
If your project is multi-file, and especially if it's a full-semester project, you might also consider learning something about the following topics.
Make is a tool for describing how to compile and link a multi-file project so that you don't have to remember how to do it by hand each time. Make, unfortunately, has a well-deserved reputation for being tricky to use, but this is mostly true in very large projects spanning multiple directories, and there are probably good simple examples online.
I would strongly consider making use of a source code control system such as git or hg, even for a few relatively small project. It's so much safer to have an archived version of what you've done so that you can back up quickly. Both git and hg are overkill for a small one-shot project, but they are worth learning on their own. Conventional wisdom as I understand it today is that they're very similar in philosophy and core functionality, but that hg is definitely a bit more consistent at the command line level, and therefore easier to start with.
I suspect this is rather intimidating, especially if you've got effectively no exposure to a unix command environment before. I re-emphasize my first piece of advice above: learn by starting simple and experimenting. This minimizes the amount of new stuff you're having to wrap your head around at any given point in time.

Linux kernel/os source code documentation?

Is there a Linux distro (other than Minix) with good documentation for the source code? Or, is there some good documentation to describe the general Linux source code?
I have downloaded the Kernel source code, but, it is (unsurprisingly) a little overwhelming to find my way around and I wondered if there were some higher-level documentation to go with how the Linux kernel works?
Have you tried having a look on The linux documentation project I've find it quietly exhaustive regarding linux
They have a section The Linux Kernel wich is an online book that explains
how the linux kernel works and why it does behaves in certain ways, you should deffinitely
look into it because it's very well made.
Some of the Linux kernel code has decent commenting as documentation, but if you're going to be getting into kernel development, I'd recommend picking up a good book. A good, relatively easy-to-read one is Linux Kernel Development, by Robert Love. I got started on the Second Edition when I was in college, and keep a copy of the third on my bookshelf now.
I also find the Linux Cross Reference site helpful in jumping around the kernel source code. It's nice for tracking down functions that are in different files, and getting at what you need.
If you want to learn about operating systems and their basics, I strongly suggest you to start with a small kernel and then ramp up to learn about Linux. Starting with an operating system like Linux would be overwhelming in terms of code and documentation.
There is XV6 operating system which follows the basic Unix notion of files and processes. You can get the code listing and the documentation explaining the code properly. Here is a link to it. link.
Since academia is using this course as a baseline, I think you should get good support for understanding the same.
Linux Core Kernel Commentary is a little dated, but is still an excellent source of info.
For something which is not obsolete (like kernel.org/doc is), you may see:
Free Electrons Linux/Documentation/ (3.8)
Linux Cross Reference kernel/Documentation/
kernel-doc (3.6.10)
The first is the one I prefer personally (clean, readable, pleasant, up‑to‑date).
The second is the most well known.
The third, is for download, if you wish to browse and search it off‑line (may be handy in some case).
My two cents as a side note before I leave: I feel it's weird how for such a famous stuff as the Linux kernel is, when you search the web for documentation, you get masses of obsolete documentations, and how the rather up‑to‑date ones seems to be rather hidden and far from the top position of search engines.

How to use 'copy_to_user'?

I have to add a system call in linux kernel that will print the process tree showing only the PIDs to user code. I have to use copy_to_user here. But I am not understanding the use of this function. Could any of u give an example of how it works, including the user-side code and added system code?.....Any easy/simple example would be great for me...:)
Thanks.
I suggest you read through the Linux Device Driver book. It's freely available online at http://lwn.net/Kernel/LDD3/. Although it's geared towards device drivers, it covers most of the key aspects for communicating between kernel and user space and includes multiple examples.
By the way, this sounds like a homework question. If so, your question should have the 'homework' Tag associated with it.

How to code in the unix spirit? (small single task tools)

I have written a little script which retrieves pictures and movies from my camera and renames them based on their date and then copies them on my harddrive, managing conflicts automatically (same name? same size? same md5?)
Works pretty well.
But this is ONE script.
From time to time I need to check if a picture is already hidden somewhere in a volume, so I'd like to apply the "conflict manager" only. I guess if I had properly followed the unix spirit of tiny single-task tools, I could do that.
What are the best resources, best practices and your experience on the subject?
Thank you.
Edit : Although I'd love to read unix books and have a deep understanding of the subject, I am looking for the Great Principles first. Plus I tend to limit myself to online resources.
I would look at the book called The Art of Unix Programming.
I've found that most code doesn't start out being reusable, it evolves to be. Take your existing code and factor out the "conflict manager" portion into its own function or program, then call that program instead of having it be a part of your original application. After that you'll be able to reuse that part of your code that you have a need to reuse. Sometimes it's impossible to design software up front for reusability because you simply don't know which parts you'll want to reuse.
As for resources, it seems like the store shelves are packed with books for Linux desktop users and system administrators, but it's hard to find good Linux programming books. A few good ones:
Beginning Linux Programming
Professional Linux Programming
Linux Programming by Example: The Fundamentals
The Linux Programmer's Toolbox
Lastly, Eric Raymond has made The Art of Unix Programming available online for free.
Check out this book:
The Art of Unix Programming by Eric S. Raymond
http://www.amazon.com/UNIX-Programming-Addison-Wesley-Professional-Computing/dp/0131429019
Here is his website:
http://www.catb.org/~esr/writings/taoup/
Personally, whenever I see that the script I'm planning to write will be longer than a dozen lines, I use python instead of shell script. One neat trick with python scripting is that it's very easy to program in a style where you create both "unix spirit" command-line tools and libraries. E.g. for your "conflict manager", create a file (python module) and put the functionality in functions and/or classes, and then at the end you can put a python "main" function (the usual if __name__=='__main__': dance) where you parse command line options (use the builtin OptionParser module for this, it's very nice!) and use the functionality in the functions/classes.
This way you can use the utility both as a stand-alone command line program, or you can import the module in another python script and use the functionality defined there via functions/classes rather than parsing input.
Start with wikipedia (Dataflow programming)
The book Software Tools (amazon) by Kernighan and Plauger is a classic on this subject. I think it should be required reading for any serious student of software development.
- the art of UNIX programming - is quite a nice book ok "the unix way", in so far as one exists. OTOH if the way is "do as little work as gets your job done", you may already be there. :)
I think some of the keys for good gnu code, are:
Handling the system signals properly, like deattaching hard drive files if SIGTERM is received.
Proper use of pipes and standard input/output
Follwing common command line flag rules
I would also recommend this book. Pretty old, but I think is quite clear explaining the principles of unix.

Resources