How to build Linux system from kernel to UI layer - linux

I have been looking into MeeGo, maemo, Android architecture.
They all have Linux Kernel, build some libraries on it, then build middle layer libraries [e.g telephony, media etc...].
Suppose i wana build my own system, say Linux Kernel, with some binariers like glibc, Dbus,.... UI toolkit like GTK+ and its binaries.
I want to compile every project from source to customize my own linux system for desktop, netbook and handheld devices. [starting from netbook first :)]
How can i build my own customize system from kernel to UI.

I apologize in advance for a very long winded answer to what you thought would be a very simple question. Unfortunately, piecing together an entire operating system from many different bits in a coherent and unified manner is not exactly a trivial task. I'm currently working on my own Xen based distribution, I'll share my experience thus far (beyond Linux From Scratch):
1 - Decide on a scope and stick to it
If you have any hope of actually completing this project, you need write an explanation of what your new OS will be and do once its completed in a single paragraph. Print that out and tape it to your wall, directly in front of you. Read it, chant it, practice saying it backwards and whatever else may help you to keep it directly in front of any urge to succumb to feature creep.
2 - Decide on a package manager
This may be the single most important decision that you will make. You need to decide how you will maintain your operating system in regards to updates and new releases, even if you are the only subscriber. Anyone, including you who uses the new OS will surely find a need to install something that was not included in the base distribution. Even if you are pushing out an OS to power a kiosk, its critical for all deployments to keep themselves up to date in a sane and consistent manner.
I ended up going with apt-rpm because it offered the flexibility of the popular .rpm package format while leveraging apt's known sanity when it comes to dependencies. You may prefer using yum, apt with .deb packages, slackware style .tgz packages or your own format.
Decide on this quickly, because its going to dictate how you structure your build. Keep track of dependencies in each component so that its easy to roll packages later.
3 - Re-read your scope then configure your kernel
Avoid the kitchen sink syndrome when making a kernel. Look at what you want to accomplish and then decide what the kernel has to support. You will probably want full gadget support, compatibility with file systems from other popular operating systems, security hooks appropriate for people who do a lot of browsing, etc. You don't need to support crazy RAID configurations, advanced netfilter targets and minixfs, but wifi better work. You don't need 10GBE or infiniband support. Go through the kernel configuration carefully. If you can't justify including a module by its potential use, don't check it.
Avoid pulling in out of tree patches unless you absolutely need them. From time to time, people come up with new scheduling algorithms, experimental file systems, etc. It is very, very difficult to maintain a kernel that consumes from anything else but mainline.
There are exceptions, of course. If going out of tree is the only way to meet one of your goals stated in your scope. Just remain conscious of how much additional work you'll be making for yourself in the future.
4 - Re-read your scope then select your base userland
At the very minimum, you'll need a shell, the core utilities and an editor that works without an window manager. Paying attention to dependencies will tell you that you also need a C library and whatever else is needed to make the base commands work. As Eli answered, Linux From Scratch is a good resource to check. I also strongly suggest looking at the LSB (Linux standard base), this is a specification that lists common packages and components that are 'expected' to be included with any distribution. Don't follow the LSB as a standard, compare its suggestions against your scope. If the purpose of your OS does not necessitate inclusion of something and nothing you install will depend on it, don't include it.
5 - Re-read your scope and decide on a window system
Again, referring to the everything including the kitchen sink syndrome, try and resist the urge to just slap a stock install of KDE or GNOME on top of your base OS and call it done. Another common pitfall is to install a full blown version of either and work backwards by removing things that aren't needed. For the sake of sane dependencies, its really better to work on this from bottom up rather than top down.
Decide quickly on the UI toolkit that your distribution is going to favor and get it (with supporting libraries) in place. Define consistency in UIs quickly and stick to it. Nothing is more annoying than having 10 windows open that behave completely differently as far as controls go. When I see this, I diagnose the OS with multiple personality disorder and want to medicate its developer. There was just an uproar regarding Ubuntu moving window controls around, and they were doing it consistently .. the inconsistency was the behavior changing between versions. People get very upset if they can't immediately find a button or have to increase their mouse mileage.
6 - Re-read your scope and pick your applications
Avoid kitchen sink syndrome here as well. Choose your applications not only based on your scope and their popularity, but how easy they will be for you to maintain. Its very likely that you will be applying your own patches to them (even simple ones like messengers updating a blinking light on the toolbar).
Its important to keep every architecture that you want to support in mind as you select what you want to include. For instance, if Valgrind is your best friend, be aware that you won't be able to use it to debug issues on certain ARM platforms.
Pretend you are a company and will be an employee there. Does your company pass the Joel test? Consider a continuous integration system like Hudson, as well. It will save you lots of hair pulling as you progress.
As you begin unifying all of these components, you'll naturally be establishing your own SDK. Document it as you go, avoid breaking it on a whim (refer to your scope, always). Its perfectly acceptable to just let linux be linux, which turns your SDK more into formal guidelines than anything else.
In my case, I'm rather fortunate to be working on something that is designed strictly as a server OS. I don't have to deal with desktop caveats and I don't envy anyone who does.
7 - Additional suggestions
These are in random order, but noting them might save you some time:
Maintain patch sets to every line of upstream code that you modify, in numbered sequence. An example might be 00-make-bash-clairvoyant.patch, this allows you to maintain patches instead of entire forked repositories of upstream code. You'll thank yourself for this later.
If a component has a testing suite, make sure you add tests for anything that you introduce. Its easy to just say "great, it works!" and leave it at that, keep in mind that you'll likely be adding even more later, which may break what you added previously.
Use whatever version control system is in use by the authors when pulling in upstream code. This makes merging of new code much, much simpler and shaves hours off of re-basing your patches.
Even if you think upstream authors won't be interested in your changes, at least alert them to the fact that they exist. Coordination is essential, even if you simply learn that a feature you just put in is already in planning and will be implemented differently in the future.
You may be convinced that you will be the only person to ever use your OS. Design it as though millions will use it, you never know. This kind of thinking helps avoid kludges.
Don't pull upstream alpha code, no matter what the temptation may be. Red Hat tried that, it did not work out well. Stick to stable releases unless you are pulling in bug fixes. Major bug fixes usually result in upstream releases, so make sure you watch and coordinate.
Remember that it's supposed to be fun.
Finally, realize that rolling an entire from-scratch distribution is exponentially more complex than forking an existing distribution and simply adding whatever you feel that it lacks. You need to reward yourself often by booting your OS and actually using it productively. If you get too frustrated, consistently confused or find yourself putting off work on it, consider making a lightweight fork of Debian or Ubuntu. You can then go back and duplicate it entirely from scratch. Its no different than prototyping an application in a simpler / rapid language first before writing it for real in something more difficult. If you want to go this route (first), gNewSense offers utilities to fork your own OS directly from Ubuntu. Note, by default, their utilities will strip any non free bits (including binary kernel blobs) from the resulting distro.
I strongly suggest going the completely from scratch route (first) because the experience that you will gain is far greater than making yet another fork. However, its also important that you actually complete your project. Best is subjective, do what works for you.
Good luck on your project, see you on distrowatch.

Check out Linux From Scratch:
Linux From Scratch (LFS) is a project
that provides you with step-by-step
instructions for building your own
customized Linux system entirely from
source.

Use Gentoo Linux. It is a compile from source distribution, very customizable. I like it a lot.

Related

How to proceed with Linux source code customization?

I am a non CS/IT student, but having knowledge of C, Java, DS and Algorithms. Now-a-days I am focusing on operating system and had gained some of its concepts. But I want some practical knowledge of it. Merely writing algo code in java/c has no fun in doing. I have gone through many articles where they mentioned we can customize source code of Linux-kernel.
I want to start customizing the kernel as I move ahead in the learning of OS concepts and apply the same. It will make two goals achievable 1. I will gain practical idea of the operating system 2. I will have a project.
Problem which I face-
1. From where to get the source code? Which source code should I download? Also the documentation if possible.
https://www.kernel.org/
I went in there but there are so many of them which one will be better?
2. How will I customize the code once I have it?
Please give me suggestions with detail about how I should start this journey (of changing source code to customize Linux).
Moreover I am using Windows 8.
I recommend first reading several books on OSes and on programming. You need a broad CS culture (if possible get a CS degree)
I am a non CS/IT student,
You'll better become one, or else spend years of work to learn all the stuff a CS graduate student has learnt.
First, you need to be very familiar with Linux programming on user side (application programs). So read at least Advanced Linux Programming and study the source code of several programs, including shells (and some kind of servers). Read also carefully syscalls(2). Explore the state of your kernel (e.g. thru proc(5)...). Look into https://kernelnewbies.org/
I also recommend learning several programming languages. You should in particular read SICP, an excellent introduction to programming. Read also some book like programming language pragmatics. Read something about continuation and continuation passing style. Read the Dragon book. Read some Introduction to Algorithms. Read something about computer architecture and instruction set architecture
Merely writing algo code in java/c has no fun in doing.
But the kernel is also written in C (mostly) and full of algorithmic code. What makes you think you'll get more fun in it?
I want to start customizing the kernel as I move ahead in the learning of OS concepts and apply the same.
But why? Why don't you also consider studying and contributing to some user-level code
I would recommend first reading a good book on OSes in general, notably Operating Systems: Three Easy Pieces. Look also on OSdev.
At last, the general advice about kernel programming is don't. A common mistake is to try adding code inside the kernel to solve some issue that can and should be solved in user-land.
How will I customize the code once I have it?
You probably should not customize the kernel, but if you did you'll use familiar tools (a good source code editor like emacs or vim, a compiler and linker on the command line, a build automation tool like make). Patching the kernel is similar to patching some other free software. But testing your kernel is harder (because you'll often reboot).
You'll also find several books explaining the Linux kernel.
If you still want to customize the kernel you should first try to code some kernel module.
Moreover I am using Windows 8.
This is a huge mistake. You first need to be an advanced Linux user. So wipe out Windows from your computer, and install some Linux distribution -I recommend Debian- (and use only Linux, no more Windows). Become familiar with command line.
I seriously recommend to avoid working on the kernel as your first project.
I strongly recommend looking at some existing user-land free software project first (there are thousands of them, notably on github, e.g. choose some package in your distribution, study its source code, work on it, propose the patch to the community). Be able to build from source code a lot of things.
A wise man once said you "must act your way into right thinking, as you cannot think your way into right acting". In your case, you'll need to act as an experienced programmer would act, which means before we write any code, we need to answer some questions.
What do we want to change?
Why do we want to change it?
What are the repercussions of this change (ie what other functions - out of all the 10's of millions of lines of source code - call this function)?
After we've made the change, how are we going to compile it? In other words, there is a defined process for this. What is it?
After we compile our new kernel/module, how are we going to test it?
A good start, in addition to the answer that was just posted, would be to run LFS (Linux from Scratch). Get a successful install of that and use it as a starting point.
Now, since we're experienced programmers, we know that tinkering with a 10M+ line codebase is a recipe for trouble; we need a bit more direction than that. Here's a list of bugs that need to be fixed: https://bugzilla.kernel.org/buglist.cgi?chfield=%5BBug%20creation%5D&chfieldfrom=7d
I, for one, would be glad to see the one called "AUFS hangs on fanotify" go away, as I use AUFS with Docker on a daily basis.
If, down the line, you decide you'd rather hack on something besides the kernel, there are plenty of other options.
From your question it follows that you've already gained some concepts of an operating system. However, if you feel that it's still insufficient, it is OK to spend more time on learning. An operating system (mainly, a kernel) has certain tasks to perform like memory management (or memory protection), multiprogramming, hardware abstraction and so on. Neither of the topics may be neglected - they are all as important. So, if you have some time, you may refer to such useful books as "Modern Operating Systems" by Andrew Tanenbaum. Special books like that will shed much light on all important aspects of a modern OS. Suffice it to say, Linux kernel itself was started by Linus Torvalds because of a strong inspiration by MINIX - an educational project by A. Tanenbaum.
Such a cumbersome project like an OS kernel (BSD, Linux, etc.) contains lots of code. Many people are collaborating to write or enhance whatever parts of the kernel. So, there is a common and inevitable need to use a version control system. So, if you have an intention to submit your code to the kernel in future, you also have to have hands on with version control. Particularly, Linux relies on Git SCM (software configuration management - a synonym for version control).
So, once you have some knowledge of Git, you can install it on your computer and download Linux source code: git clone https://github.com/torvalds/linux.git
Determine your goals at Linux kernel modification. What do you want to achieve? Perhaps, you have a network card which you suspect to miss some features in Linux? Take a look at the other vendors' drivers and make an attempt to fix the driver of interest to include the features. Of course, this will require some knowledge of the HW, and, if the features are HW dependent, you will unlikely succeed to elaborate your code without special knowledge. But, in general, - if you are trying to make an enhancement, it assumes that you are an experienced Linux user yourself. Otherwise, how will you understand that some fixes/enhancements/etc. are required? So, I can't help but agree with the proposal to postpone Windows 8 for a while and start using some Linux distribution (eg. Debian).
If you succeed to determine your goals (eg. if you find a paper describing some desired changes in Linux kernel or if you decide to enhance some device drivers / write your own), you will be able to try it hands on. However, you still might need some helpful books, but, in this case, some Linux-specific ones. Also, writing C code for the kernel itself will require one important detail - you will need to comply with a so called coding standard, otherwise Linux kernel maintainers will not be able to accept your patches.
So, I made an attempt to outline some tips based on your current question. Of course, the job of kernel development has far more broad prerequisites, but these are which are just obvious.

Securely running user's code

I am looking to create an AI environment where users can submit their own code for the AI and let them compete. The language could be anything, but something easy to learn like JavaScript or Python is preferred.
Basically I see three options with a couple of variants:
Make my own language, e.g. a JavaScript clone with only very basic features like variables, loops, conditionals, arrays, etc. This is a lot of work if I want to properly implement common language features.
1.1 Take an existing language and strip it to its core. Just remove lots of features from, say, Python until there is nothing left but the above (variables, conditionals, etc.). Still a lot of work, especially if I want to keep up to date with upstream (though I just could also just ignore upstream).
Use a language's built-in features to lock it down. I know from PHP that you can disable functions and searching around, similar solutions seem to exist for Python (with lots and lots of caveats). For this I'd need to have a good understanding of all the language's features and not miss anything.
2.1. Make a preprocessor that rejects code with dangerous stuff (preferably whitelist based). Similar to option 1, except that I only have to implement the parser and not implement all features: the preprocessor has to understand the language so that you can have variables named "eval" but not call the function named "eval". Still a lot of work, but more manageable than option 1.
2.2. Run the code in a very locked-down environment. Chroot, no unnecessary permissions... perhaps in a virtual machine or container. Something in that sense. I'd have to research how to achieve this and how to make it give me the results in a secure way, but that seems doable.
Manually read through all code. Doable on a small scale or with moderators, though still tedious and error-prone (I might miss stuff like if (user.id = 0)).
The way I imagine 2.2 to work is like this: run both AIs in a virtual machine (or something) and constrain it to communicate with the host machine only (no other Internet or LAN access). Both AIs run in a separate machine and communicate with each other (well, with the playing field, and thereby they see each other's positions) through an API running on the host.
Option 2.2 seems the most doable, but also relatively hacky... I let someone's code loose in a virtualized or locked down environment, hoping that that'll keep them in while giving them free game to DoS or break out of the environment. Then again, most other options are not much better.
TL;DR: in essence my question is: how do I let people give me 'logic' for an AI (which I think is most easily done using code) and then run that without compromising the functionality of the system? There must be at least 2 AIs working on the same playing field.
This is really just a plugin system, so researching how others implement plugins is a good starting point. In particular, I'd look at web browsers like Chrome and Safari and their plugin systems.
A common theme in modern plugins systems is process isolation. Ideally you should run the plugin in its own process space in a sandbox. In OS X look at XPC, which is designed explicitly for this problem. On Linux (or more portably), I would probably look at NaCl (Native Client). The JVM is also designed to provide sandboxing, and offers a rich selection of languages. (That said, I don't personally consider the JVM a very strong sandbox. It's had a history of security problems.)
In general, my preference on these kinds of projects is a language-agnostic API. I most often use REST APIs (or "REST-like"). This allows the plugin to be highly restricted, while not restricting the language choice. I like simple HTTP for communications whenever possible because it has rich support in numerous languages, so it puts little restriction on the plugin. In fact, given your description, you wouldn't even have to run the plugin on your hardware (and certainly not on the main server). Making the plugins remote clients removes many potential concerns.
But ultimately, I think something like your "2.2" is the right direction.

Will writing C in both Windows and Linux cause compiling problems?

I work from 2 different machines. One is Windows and the other is Linux. If I alternately work on the same project but switch between both OSes, will I eventually run into compiling errors? I ask because maybe there are standards supported by one but not by the other.
That question is a pretty broad one and it depends, strictly speaking, on your tool chain. If you were to use the same tool chain (e.g. GCC/MinGW or Clang), you'd be minimizing the chance for this class of errors. If you were to use Visual Studio on Windows and GCC or Clang on the Linux side, you'd run into more issues alone because some of the headers differ. So once your program leaves the realm of strict ANSI C (C89) you'll be on your own.
However, if you aren't careful you may run into a lot of other more profane errors, such as the compiler on Linux choking on the line endings if you didn't tell your editor on the Windows side to use these.
Ah, and also keep in mind that if you want to actually cross-compile, GCC may be the best choice and therefore the first part I mentioned in my answer becomes a moot point. GCC is a proven choice on both ends. And given your question it's unlikely that you are trying to write something like a kernel mode driver - which would be fundamentally different.
That may be only if your application use some specific API.
It is entirely possible to write code that works on both platforms, with no issues to compile the code. It is, however, not without some difficulties. Compilers allow you to use non-standard features in the compiler, and it's often hard to do more fancy user interfaces (even if it's still just text) because as soon as you start wanting to do more than "read a line of text as it is entered in a shell", it's into "non-standard" land.
If you do find yourself needing to do more than what the standard C library can do, make sure you isolate those parts of the code into a separate file (or a couple of files, one for Linux/Unix style systems and one for Windows systems).
Using the same compiler (gcc) would help avoiding problems with "compiler B doesn't compile code that works fine in compiler A".
But it's far from an absolute necessity - just make sure you compile the code on both platforms and with all of your "suppoerted" compilers often enough that you haven't dug a very deep hole that is hard to get out of before you discover that "it's not working on the other system". It certainly helps if you have (at least) a virtual machine running the other OS, so you can easily try both variants.
Ideally, you want to set up an automated system, such that when you change the code [and feel that the changes are "complete"], it automatically gets built on both platforms and all compilers you want to use. And if possible, also automatically tested!
I would also seriously consider using version control - that way, when something breaks on one or the other side, you can go back and look at what the code looked like before it stopped working, and (hopefully) find the reason it broke much quicker than "Hmm, I think it's the change I made to foo.c, lets take that out... No, not that one, ok how about the change here..." - at least with version control, you can say "Ok, so version 1234 doesn't work, let's try version 1220 - ok, that works. Now try 1228, still works - so change between 1229 and 1234 - try 1232, ah, it's broken..." No editing files and you can still go to any other version you like with very little difficulty. I have used Mercurial quite a bit, git a little bit, some subversion, and worked on a project in Perforce for a few years. All of these are good - personally, I think I prefer mercurial.
As a side-effect: Most version control systems also deal with filename and line endings in the saner way than doing this manually.
If you combine your version control system with a "automated build and test-system", such as Jenkins, you can get everything very automated. Jenkins is free and runs on both Windows and Linux, and you can use it to automatically build and test your code as and when you submit the code to the version control system.
It will not create a problem until you recompile the source code in the respective OS. If you wanna run your compiled file generated by windows(.exe or .obj), into linux or vice-versa then it will definitely create a problem and wont be possible. But you can move you source code (file with extension .c/.c++) into any of the os. And sometimes it also create problems with different header files, so take care of that also. Best practice is to use single OS for you entire project, avoid multiple os until it is extremely necessary.

Contributing to a Linux distribution

I'm interested in contributing to a Linux distro, but regarding the various distro's developer communities, I'm having a bit of trouble figuring out which one I'd most like to join.
What languages I know: C, C++, Lua, Python, and fairly familiar with Perl (though I wouldn't say I "know" it). In particular, I have very little experience with x86 assembly besides hacking stuff together for performance tweaks, though that will be partially rectified soon.
What I'm looking for: A community that provides plenty of opportunities for developers to work on various aspects of the distribution. To be honest I'm most interested in reading and working on the kernel source (in which case the distro doesn't matter), but it's pretty daunting and I figure getting into the Linux community and working with experienced Linux developers might give me a better idea of how to jump into the guts(let me know if this is bogus, or if you have any advice regarding that).
So...
Which distro has the "best" developer community in terms of organization, people who are fun to work with, and opportunities to contribute?
I've read various "Contributing to XXX" pages and mailing lists for distros like Ubuntu, OpenSuse, Fedora, etc. but I'd rather get a more personal testament from an actual developer.
Unless you have a specific desire to learn the ins and outs of various packaging formats you would probably be better off contributing directly upstream to applications/libraries that you find interesting. While individual distributions often have a few management applications that are unique(ish) to them most core applications and libraries are shared between them.
As you have expressed an interest in guts it would make sense to stick to one of the main community distros (Fedora and Ubuntu/Debian) as the rest tend to be variations on a base distro. The other option is to choose a source based distribution which have a number of advantages to developers although you may find yourself spending a bit of time keeping your machine trim.
As I'm a developer I personally use Gentoo which gives me a number of things:
Rolling release: New versions of applications are generally available soon after release
Stable/Unstable mix: I can run stable core with bleeding edge on upstream packages I care about
Development ready: Any installed package is by default a "dev" package, the distinction between buildtime/runtime dependencies is blurred
Packaging is easy: If it's a simple as "configure/make/make install" writing and ebuild is very easy.
Contribution is easy: Contributing new ebuilds is fairly painless, from there you can get as involved as you like
Of course there are downsides, not least of all your machine spends a considerable amount of time building things and if your run a large selection of "unstable" packages you may find you occasionally need to fix-up your machine. However I find these disadvantages minor compared to giving me an up to date platform with which to contribute to upstream from.
If you want to work with the kernel then you shouldn't be picking a distribution, but rather working upstream.
Somebody correct me if I'm wrong, but I think that contributing to Ubuntu can be very easy and fun if you use Launchpad. I haven't tried contributing code, but I contribute translations and file bugs on some projects.

How do you balance the conflicting needs of backwards compatibility and innovation?

I work on an application that has a both a GUI (graphical) and API (scripting) interface. Our product has a very large installed base. Many customers have invested a lot of time and effort into writing scripts that use our product.
In all of our designs and implementation, we (understandably) have a very strict requirement to maintain 100% backwards compatibility. A script which ran before must continue to run in exactly the same way, without any modification, when we introduce a new software version.
Unfortunately, this requirement sometimes ties our hands behind our back, as it really restricts our ability to innovate and come up with new and better ways of doing things.
For example, we might come up with a better (and more usable) way of achieving a task which is already possible. It would be desirable to make this better way the default way, but we can't do this as it may have backwards compatibility implications. So we are stuck with leaving the new (better) way as a mode, that the user must "turn on" before it becomes available to them. Unless they read the documentation or online help (which many customers don't do), this new functionality will remain hidden forever.
I know that Windows Vista annoyed a lot of people when it first came out, because of all the software and peripherals which didn't work on it, even when they worked on XP. It received a pretty bad reception because of this. But you can see that Microsoft have also succeeded in making some great innovations in Vista, at the expense of backwards compatibility for a lot of users. They took a risk. Did it pay off? Did they make the right decision? I guess only time will tell.
Do you find yourself balancing the conflicting needs of innovation and backwards compatibility? How do you handle the juggling act?
As far is my programming experience is concerned, if I'm going to fundamentally change something that will prevent past incoming data to be used correctly, I need to create an abstraction layer for the old data where it can be converted for use in the new format.
Basically I set the "improved" way as default and make sure through a converter it can read data of the old format, but save or store data as the new format.
I think the big thing here is test, test, test. Backwards compatibility shouldn't hinder forward progress.
Split development into two branches, one that maintains backwards compatibility and one for a new major release, where you make it clear that backwards compatibility is being broken.
The critical question that you need to ask is wether the customers want/need this "improvement" even if you perceive it as one your customers might not. Once a certain way of doing things has been established changing the workflow is a very "expensive" operation. Depending on the computer savyness of your users it might take some a long time to adjust to the change in the UI.
If you are dealing with clients innovation for innovation's sake is not always a good thing as fun as it might be for you to develop these improvements.
You could alawys look for innovative ways to maintain backwards compatibilty.

Resources