Parsing Linux source into abstract syntax tree - linux

I'd like to perform source code analysis of Linux kernel, but to do that, I first need to parse it. What are my options? I'd prefer an AST usable from python, but any other language is ok too.
Apparently CIL is able to parse whole kernel, but it's not clear from the website, how to do that.

I'd recommend starting with the sparse static analysis tool. Because sparse was designed specifically to assist the kernel developers in performing static analysis on the kernel, you can have some level of assurance that it really ought to parse the combination of C99 and GNU extensions that are used in the kernel sources. The code I've examined looked clean and straight forward but I never tried to extend it in any fashion. The Documentation/sparse.txt file has a very short synopsis of using sparse on the kernel sources, if you want a very high-level overview.
Another option is GCC MELT, a tool designed to make it easier to build plugins for the gcc compiler. Using it would require knowing enough gcc internals to find your way around, but MELT does look far easier than coding a similar plugin directly in C.

You can check the page Parsing Kernel
about tools comparision. The winner seems to be KDevelop.
Regards,

Do you really need an AST? Or a lower level intermediate representation would be just enough? For both options, you can use Clang, and either analyse its AST (sadly, with C++ only) or an LLVM IR.
CIL is also an option, but you'd need to write your analysis tool in OCaml. cilly is its drop-in replacement for gcc, but it might need some hacking for using it with such a non-trivial build sequence as the Linux kernel. Just using --merge won't be sufficient.

Related

Measure function time execution without modification code

I have found some piece of code (function) in library which could be improved by the optimization of compiler (as the main idea - to find good stuff to go deep into compilers). And I want to automate measurement of time execution of this function by script. As it's low-level function in library and get arguments it's difficult to extract this one. Thus I want to find out the way of measurement exactly this function (precise CPU time) without library/application/environment modifications. Have you any ideas how to achieve that?
I could write wrapper but I'll need in near future much more applications for performance testing and I think to write wrapper for every one is very ugly.
P.S.: My code will run on ARM (armv7el) architecture, which has some kind of "Performance Monitor Control" registers. I have learned about "perf" in linux kernel. But don't know is it what I need?
It is not clear if you have access to the source code of the function you want to profile or improve, i.e. if you are able to recompile the considered library.
If you are using a recent GCC (that is 4.6 at least) on a recent Linux system, you could use profilers like gprof (assuming you are able to recompile the library) or better oprofile (which you could use without recompiling), and you could customize GCC for your needs.
Be aware that like any measurements, profiling may alter the observed phenomenon.
If you are considering customizing the GCC compiler for optimization purposes, consider making a GCC plugin, or better yet, a MELT extension, for that purpose (MELT is a high-level domain specific language to extend GCC). You could also customize GCC (with MELT) for your own specific profiling purposes.

How to create a language these days?

I need to get around to writing that programming language I've been meaning to write. How do you kids do it these days? I've been out of the loop for over a decade; are you doing it any differently now than we did back in the pre-internet, pre-windows days? You know, back when "real" coders coded in C, used the command line, and quibbled over which shell was superior?
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily) but how do you build the compiler and standard libraries and so forth? What tools do you kids use these days?
One consideration that's new since the punched card era is the existence of virtual machines already bountifully provided with "standard libraries." Targeting the JVM or the .NET CLR instead of ye olde "language walled garden" saves you a lot of bootstrapping. If you're creating a compiled language, you may also find Java byte code or MSIL an easier compile target than machine code (of course, if you're in this for the fun of creating a tight optimising compiler then you'll see this as a bug rather than a feature).
On the negative side, the idioms of the JVM or CLR may not be what you want for your language. So you may still end up building "standard libraries" just to provide idiomatic interfaces over the platform facility. (An example is that every languages and its dog seems to provide its own method for writing to the console, rather than leaving users to manually call System.out.println or Console.WriteLine.) Nevertheless, it enables an incremental development of the idiomatic libraries, and means that the more obscure libraries for which you never get round to building idiomatic interfaces are still accessible even if in an ugly way.
If you're considering an interpreted language, .NET also has support for efficient interpretation via the Dynamic Language Runtime (DLR). (I don't know if there's an equivalent for the JVM.) This should help free you up to focus on the language design without having to worry so much about the optimisation of the interpreter.
I've written two compilers now in Haskell for small domain-specific languages, and have found it to be an incredibly productive experience. The parsec library makes playing with syntax easy, and interpreters are very simple to write over a Haskell data structure. There is a description of writing a Lisp interpreter in Haskell that I found helpful.
If you are interested in a high-performance backend, I recommend LLVM. It has a concise and elegant byte-code and the best x86/amd64 generating backend you can find. There is an optional garbage collector, and some experimental backends that target the JVM and CLR.
You can write a compiler in any language that produces LLVM bytecode. If you are adventurous enough to learn Haskell but want LLVM, there are a set of Haskell-LLVM bindings.
What has changed considerably but hasn't been mentioned yet is IDE support and interoperability:
Nowadays we pretty much expect Intellisense, step-by-step execution and state inspection "right in the editor window", new types that tell the debugger how to treat them and rather helpful diagnostic messages. The old "compile .x -> .y" executable is not enough to create a language anymore. The environment is nothing to focus on first, but affects willingness to adopt.
Also, libraries have become much more powerful, noone wants to implement all that in yet another language. Try to borrow, make it easy to call existing code, and make it easy to be called by other code.
Targeting a VM - as itowlson suggested - is probably a good way to get started. If that turns out a problem, it can still be replaced by native compilers.
I'm pretty sure you do what's always been done.
Write some code, and show your results to the world.
As compared to the olden times, there are some tools to make your job easier though. Might I suggest ANTLR for parsing your language grammar?
Speaking as someone who just built a very simple assembly like language and interpreter, I'd start out with the .NET framework or similar. Nothing can beat the powerful syntax of C# + the backing of the entire .NET community when attempting to write most things. From here i designed a simple bytecode format and assembly syntax and proceeeded to write my interpreter + assembler.
Like i said, it was a very simple language.
You should not accept wimpy solutions like using the latest tools. You should bootstrap the language by writing a minimal compiler in Visual Basic for Applications or a similar language, then write all the compilation tools in your new language and then self-compile it using only the language itself.
Also, what is the proposed name of the language?
I think recently there have not been languages with ALL CAPITAL LETTER names like COBOL and FORTRAN, so I hope you will call it something like MIKELANG with all capital letters.
Not so much an implementation but a design decision which effects implementation - if you make every statement of your language have a unique parse tree without context, you'll get something that it's easy to hand-code a parser, and that doesn't require large amounts of work to provide syntax highlighting for. Similarly simple things like using a different symbol for module namespaces and object namespaces ( unlike Java which uses . for both package and class namespaces ) means you can parse the code without loading every module that it refers to.
Standard libraries - include the equivalent of everything in C99 standard libraries other than setjmp. Add whatever else you need for your domain. Work out an easy way to do this, either something like SWIG or an in-line FFI such as Ruby's [can't remember module name] and Python's ctypes.
Building as much of the language in the language is an option, but projects which start out doing either give up (rubinius moved to using C++ for parts of its standard library), or is only for research purposes (Mozilla Narcissus)
I am actually a kid, haha. I've never written an actual compiler before or designed a language, but I have finished The Red Dragon Book, so I suppose I have somewhat of an idea (I hope).
It would depend firstly on the grammar. If it's LR or LALR I suppose tools like Bison/Flex would work well. If it's more LL, I'd use Spirit, which is a component of Boost. It allows you to write the language's grammar in C++ in an EBNF-like syntax, so no muddling around with code generators; the C++ compiler compiles the grammar for you. If any of these fail, I'd write an EBNF grammar on paper, and then proceed to do some heavy recursive descent parsing, which seems to work; if C++ can be parsed pretty well using RDP (as GCC does it), then I suppose with enough unit tests and patience you could write entire compilers using RDP.
Once I have a parser running and some sort of intermediate representation, it then depends on how it runs. If it's some bytecode or native code compiler, I'll use LLVM or libJIT to process it. LLVM is more suited for general compilation, but I like the libJIT API and documentation better. Alternatively, if I'm really lazy, I'll generate C code and let GCC do the actual compilation. Another alternative, is to target an existing VM, like Parrot or the JVM or the CLR. Parrot is the VM being designed for Perl. If it's just an interpreter, I'll walk the syntax tree.
A radical alternative is to use Prolog, which has syntax features which remarkably simulate EBNF. I have no experience with it though, and if I am not wrong (which I am almost certainly going to be), Prolog would be quite slow if used to parse heavy duty programming languages with a lot of syntactical constructs and quirks (read: C++ and Perl).
All this I'll do in C++, if only because I am more used to writing in it than C. I'd stay away from Java/Python or anything of that sort for the actual production code (writing compilers in C/C++ help to make it portable), but I could see myself using them as a prototyping language, especially Python, which I am partial towards. Of course, I've never actually done any of this before, so I'm not one to say.
On lambda-the-ultimate there's a link to Create Your Own Programming Language by Marc-André Cournoyer, which appears to describe how to leverage some modern tools for creating little languages.
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily)
Just a hint: Look at some quite different languages first, before designing a new languge (i.e. languages with a very different evaluation strategy). Haskell and Oz come to mind. Though you should also know Prolog and Scheme. A year ago I also was like "hey, let's design a language that behaves exactly as I want", but fortunatly I looked at those other languages first (or you could also say unfortunatly, because now I don't know how I want a language to behave anymore...).
Before you start creating a language you should read this:
Hanspeter Moessenboeck, The Art of Niklaus Wirth
ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf
There's a big shortcut to implementing a language that I don't see in the other answers here. If you use one of Lukasiewicz's "unparenthesized" forms (ie. Forward Polish or Reverse Polish) you don't need a parser at all! With reverse polish, the dependencies go right-to-left so you simply execute each token as it's scanned. With forward polish, it's the reverse of that, so you actually execute the program "backwards", simplifying subexpressions until reaching the starting token.
To understand why this works, you should investigate the 3 primary tree-traversal algorithms: pre-order, in-order, post-order. These three traversals are the inverse of the parsing task that a language reader (i. parser) has to perform. Only the in-order notation "requires" a recursive decent to re-construct the expression tree. With the other two, you can get away with just a stack.
This may require more "thinking' and less "implementing".
BTW, if you've already found an answer (this question is a year old), you can post that and accept it.
Real coders still code in C. Just that it's a litte sharper.
Hmmm... language design? or writing a compiler?
If you want to write a compiler, you'd use Flex + Bison. (google)
Not an easy answer, but..
You essentially want to define a set of rules written in text (tokens) and then some parser that checks these rules and assembles them into fragments.
http://www.mactech.com/articles/mactech/Vol.16/16.07/UsingFlexandBison/
People can spend years on this, The above article talks about using two tools (Flex and Bison) That can be used to turn text into code you can feed to a compiler.
First I spent a year or so to actually think how the language should look like. At the same time I helped in developing Ioke (www.ioke.org) to learn language internals.
I have chosen Objective-C as implementation platform as it's fast (enough), simple and rich language. It also provides test framework so agile approach is a go. It also has a rich standard library I can build upon.
Since my language is simple on syntactic level (no keywords, only literals, operators and messages) I could go with Ragel (http://www.complang.org/ragel/) for building scanner. It's fast as hell and simple to use.
Now I have a working object model, scanner and simple operator shuffling plus standard library bootstrap code. I can even run a simple programs - as long as they fit in one file that is :)
Of course older techniques are still common (e.g. using Flex and Bison) many newer language implementations combine the lexing and parsing phase, by using a parser based on a parsing expression grammar (PEG). This works for recursive descent parsers created using combinators, or memoizing Packrat parsers. Many compilers are built using the Antlr framework also.
Use bison/flex which is the gnu version of yacc/lex. This book is extremely helpful.
The reason to use bison is it catches any conflicts in the language. I used it and it made my life many years easier (ok so i'm on my 2nd year but the first 6months was a few years ago writing it in C++ and the parsing/conflicts/results were terrible! :(.)
If you want to write a compiler obviously you need to read the Dragon Book ;)
Here is another good book that I have just read. It is practical and easier to understand than the Dragon Book:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0
Mike --
If you're interested in an efficient native-code-generating compiler for Windows so you can get your bearings -- without wading through all the unnecessary widgets, gadgets, and other nonsense that clutter today's machines -- I recommend the Osmosian Order's Plain English development system. It includes a unique interface, a simplified file manager, a friendly text editor, a handy hexadecimal dumper, the compiler/linker (of course), and a wysiwyg page-layout application for documentation. Written entirely in Plain English, it is a quick download (less than a megabyte), small enough to understand in short order (about 25,000 lines of Plain English code, with just 4,000 in the compiler/linker), yet powerful enough to reproduce itself on a bottom-of-the-line Dell in less than three seconds. Really: three seconds. And it's free to all who write and ask for a copy, including the source code and and a rather humorous tongue-in-cheek 100-page manual. See www.osmosian.com for details on how to get a copy, or write to me directly with questions or comments: Gerry.Rzeppa#pobox.com

Bare metal cross compilers input

What are the input limitations of a bare metal cross compiler...as in does it not compile programs with pointers or mallocs......or anything that would require more than the underlying hardware....also how can 1 find these limitations..
I also wanted to ask...I built a cross compiler for target mips..i need to create a mips executable using this cross compiler...but i am not able to find where the executable is...as in there is 1 executable which i found mipsel-linux-cpp which is supposed to compile,assemble and link and then produce a.out but it is not doing so...
However the ./cc1 gives a mips assembly.......
There is an install folder which has a gcc executable which uses i386 assembly and then gives an exe...i dont understand how can the gcc exe give i386 and not mips assembly when i have specified target as mips....
please help im really not able to understand what is happ...
I followed the foll steps..
1. Installed binutils 2.19
2. configured gcc for mips..(g++,core)
I would suggest that you should have started two separate questions.
The GNU toolchain does not have any OS dependencies, but the GNU library does. Most bare-metal cross builds of GCC use the Newlib C library which provides a set of syscall stubs that you must map to your target yourself. These stubs include low-level calls necessary to implement stream I/O and heap management. They can be very simple or very complex depending on your needs. If the only I/O support is to a UART to stdin/stdout/stderr, then it is simple. You don't have to implement everything, but if you do not implement teh I/O stubs, you won't be able to use printf() for example. You must implement the sbrk()/sbrk_r() syscall is you want malloc() to work.
The GNU C++ library will work correctly with Newlib as its underlying library. If you use C++, the C runtime start-up (usually crt0.s) must include the static initialiser loop to invoke the constructors of any static objects that your code may include. The run-time start-up must also of course initialise the processor, clocks, SDRAM controller, timers, MMU etc; that is your responsibility, not the compiler's.
I have no experience of MIPS targets, but the principles are the same for all processors, there is a very useful article called "Building Bare Metal ARM with GNU" which you may find helpful, much of it will be relevant - especially porting the parts regarding implementing Newlib stubs.
Regarding your other question, if your compiler is called mipsel-linux-cpp, then it is not a 'bare-metal' build but rather a Linux build. Also this executable does not really "compile, assemble and link", it is rather a driver that separately calls the pre-processor, compiler, assembler and linker. It has to be configured correctly to invoke the cross-tools rather than the host tools. I generally invoke the linker separately in order to enforce decisions about which standard library to link (-nostdlib), and also because it makes more sense when a application is comprised of multiple execution units. I cannot offer much help other than that here since I have always used GNU-ARM tools built by people with obviously more patience than me, and moreover hosted on Windows, where there is less possibility of the host tool-chain being invoked instead (one reason why I have also avoided those tool-chains that rely on Cygwin)
EDIT
With more time available, I have rewritten my original answer in an attempt to provide something more useful.
I cannot provide a specific answer for your question. I have never tried to get code running on a MIPS machine. What I do have is plenty of experience getting a variety of "bare metal" boards up and running. All kinds of CPUs and all kinds of compilers and cross compilers. So I have an understanding of the principles that apply in all such situations. I will point out the kind of knowledge you will need to absorb before you can hope to succeed with a job like this, and hopefully I can list some links to resources to get you started on learning that knowledge.
I am worried you don't know that pointers are exactly the kind of thing a bare metal compiler can handle, they are a basic machine primitive. This tells me you are probably not an expert embedded developer who is just stuck in this particular scenario. Never mind. There isn't anything magic about programming an embedded system, and you can learn what you need to know.
The first step is getting to understand the relationship between C and the machine you wish to run code on. Basically C is a portable assembly language. This means that C is good for manipulating the basic operations of the machine. In this sense the basic operations of the machine are reading and writing memory locations, performing arithmetic and boolean operations on the data read from memory, and making branching and looping decisions based on that data. In particular the C concept of pointers allows you to manipulate data at locations in memory that you specify.
So far so good, but just doing raw computations in memory is not usually enough - you need a way to input and output data from memory. To do that you need to manipulate the hardware peripherals on your board. If the hardware peripherals are memory mapped then the machine registers used to control the peripherals look exactly like memory locations and C can manipulate them directly. Even in that case though, it is much more likely that doing useful I/O is best handled by extending the C core language with a library of routines provided just for that purpose. These library routines handle all the nasty details (timers, interrupts, non-memory mapped I/O) involved in manipulating the peripheral hardware on the board, and wrap them up with a convenient C function call interface. The idea is that you can go simply printf("hello world"); and the library call take care of the details of displaying the string.
An appropriately skilled developer knows how to adapt an existing I/O library to a new board, or how to develop new library routines to provide access to non-standard custom hardware. The classic way to develop these skills is to start with something simple, usually a LED for an output device, and a switch for an input device. Write a program that pulses a LED in a predictable way, or reads a switch and reflects in on a LED. The first time you get this working will be hugely satisfying.
Okay I have rambled enough. It is time to provide some more resources for you to study. The good news is that there's never been a better time to learn how things work at the interface between hardware and software. There is a wealth of freely available code and docs. Stackoverflow is a great resource as you know. Good luck! Links follow;
Embedded systems overview
Knowing the C language well is fundamental
Why not get your code working on a simulator before you try real hardware
Another emulated environment
Linux device drivers - an overlapping subject
Another book about bare metal programming

linux disassembler

how can I write just a simple disassembler for linux from scratches?
Are there any libs to use? I need something that "just works".
Instead of writing one, try Objdump.
Based on your comment, and your desire to implement from scratch, I take it this is a school project. You could get the source for objdump and see what libraries and techniques it uses.
The BFD library might be of use.
you have to understand the ELF file format first. Then, you can start processing the various sections of code according to the opcodes of your architecture.
You can use libbfd and libopcodes, which are libraries distributed as part of binutils.
http://www.gnu.org/software/binutils/
As an example of the power of these libraries, check out the Online Disassembler (ODA).
http://www.onlinedisassembler.com
ODA supports a myriad of architectures and provides a basic feature set. You can enter binary data in the Live View and watch the disassembly appear as you type, or you can upload a file to disassemble. A nice feature of this site is that you can share the link to the disassembly with others.
You can take a look at the code of ERESI
The ERESI Reverse Engineering Software Interface is a multi-architecture binary analysis framework with a tailored domain specific language for reverse engineering and program manipulation.

Why use build tools like Autotools when we can just write our own makefiles?

Recently, I switched my development environment from Windows to Linux. So far, I have only used Visual Studio for C++ development, so many concepts, like make and Autotools, are new to me. I have read the GNU makefile documentation and got almost an idea about it. But I am kind of confused about Autotools.
As far as I know, makefiles are used to make the build process easier.
Why do we need tools like Autotools just for creating the makefiles? Since all knows how to create a makefile, I am not getting the real use of Autotools.
What is the standard? Do we need to use tools like this or would just handwritten makefiles do?
You are talking about two separate but intertwined things here:
Autotools
GNU coding standards
Within Autotools, you have several projects:
Autoconf
Automake
Libtool
Let's look at each one individually.
Autoconf
Autoconf easily scans an existing tree to find its dependencies and create a configure script that will run under almost any kind of shell. The configure script allows the user to control the build behavior (i.e. --with-foo, --without-foo, --prefix, --sysconfdir, etc..) as well as doing checks to ensure that the system can compile the program.
Configure generates a config.h file (from a template) which programs can include to work around portability issues. For example, if HAVE_LIBPTHREAD is not defined, use forks instead.
I personally use Autoconf on many projects. It usually takes people some time to get used to m4. However, it does save time.
You can have makefiles inherit some of the values that configure finds without using automake.
Automake
By providing a short template that describes what programs will be built and what objects need to be linked to build them, Makefiles that adhere to GNU coding standards can automatically be created. This includes dependency handling and all of the required GNU targets.
Some people find this easier. I prefer to write my own makefiles.
Libtool
Libtool is a very cool tool for simplifying the building and installation of shared libraries on any Unix-like system. Sometimes I use it; other times (especially when just building static link objects) I do it by hand.
There are other options too, see StackOverflow question Alternatives to Autoconf and Autotools?.
Build automation & GNU coding standards
In short, you really should use some kind of portable build configuration system if you release your code to the masses. What you use is up to you. GNU software is known to build and run on almost anything. However, you might not need to adhere to such (and sometimes extremely pedantic) standards.
If anything, I'd recommend giving Autoconf a try if you're writing software for POSIX systems. Just because Autotools produce part of a build environment that's compatible with GNU standards doesn't mean you have to follow those standards (many don't!) :) There are plenty of other options, too.
Edit
Don't fear m4 :) There is always the Autoconf macro archive. Plenty of examples, or drop in checks. Write your own or use what's tested. Autoconf is far too often confused with Automake. They are two separate things.
First of all, the Autotools are not an opaque build system but a loosely coupled tool-chain, as tinkertim already pointed out. Let me just add some thoughts on Autoconf and Automake:
Autoconf is the configuration system that creates the configure script based on feature checks that are supposed to work on all kinds of platforms. A lot of system knowledge has gone into its m4 macro database during the 15 years of its existence. On the one hand, I think the latter is the main reason Autotools have not been replaced by something else yet. On the other hand, Autoconf used to be far more important when the target platforms were more heterogeneous and Linux, AIX, HP-UX, SunOS, ..., and a large variety of different processor architecture had to be supported. I don't really see its point if you only want to support recent Linux distributions and Intel-compatible processors.
Automake is an abstraction layer for GNU Make and acts as a Makefile generator from simpler templates. A number of projects eventually got rid of the Automake abstraction and reverted to writing Makefiles manually because you lose control over your Makefiles and you might not need all the canned build targets that obfuscate your Makefile.
Now to the alternatives (and I strongly suggest an alternative to Autotools based on your requirements):
CMake's most notable achievement is replacing AutoTools in KDE. It's probably the closest you can get if you want to have Autoconf-like functionality without m4 idiosyncrasies. It brings Windows support to the table and has proven to be applicable in large projects. My beef with CMake is that it is still a Makefile-generator (at least on Linux) with all its immanent problems (e.g. Makefile debugging, timestamp signatures, implicit dependency order).
SCons is a Make replacement written in Python. It uses Python scripts as build control files allowing very sophisticated techniques. Unfortunately, its configuration system is not on par with Autoconf. SCons is often used for in-house development when adaptation to specific requirements is more important than following conventions.
If you really want to stick with Autotools, I strongly suggest to read Recursive Make Considered Harmful (archived) and write your own GNU Makefile configured through Autoconf.
The answers already provided here are good, but I'd strongly recommend not taking the advice to write your own makefile if you have anything resembling a standard C/C++ project. We need the autotools instead of handwritten makefiles because a standard-compliant makefile generated by automake offers a lot of useful targets under well-known names, and providing all these targets by hand is tedious and error-prone.
Firstly, writing a Makefile by hand seems a great idea at first, but most people will not bother to write more than the rules for all, install and maybe clean. automake generates dist, distcheck, clean, distclean, uninstall and all these little helpers. These additional targets are a great boon to the sysadmin that will eventually install your software.
Secondly, providing all these targets in a portable and flexible way is quite error-prone. I've done a lot of cross-compilation to Windows targets recently, and the autotools performed just great. In contrast to most hand-written files, which were mostly a pain in the ass to compile. Mind you, it is possible to create a good Makefile by hand. But don't overestimate yourself, it takes a lot of experience and knowledge about a bunch of different systems, and automake creates great Makefiles for you right out of the box.
Edit: And don't be tempted to use the "alternatives". CMake and friends are a horror to the deployer because they aren't interface-compatible to configure and friends. Every half-way competent sysadmin or developer can do great things like cross-compilation or simple things like setting a prefix out of his head or with a simple --help with a configure script. But you are damned to spend an hour or three when you have to do such things with BJam. Don't get me wrong, BJam is probably a great system under the hood, but it's a pain in the ass to use because there are almost no projects using it and very little and incomplete documentation. autoconf and automake have a huge lead here in terms of established knowledge.
So, even though I'm a bit late with this advice for this question: Do yourself a favor and use the autotools and automake. The syntax might be a bit strange, but they do a way better job than 99% of the developers do on their own.
For small projects or even for large projects that only run on one platform, handwritten makefiles are the way to go.
Where autotools really shine is when you are compiling for different platforms that require different options. Autotools is frequently the brains behind the typical
./configure
make
make install
compilation and install steps for Linux libraries and applications.
That said, I find autotools to be a pain and I've been looking for a better system. Lately I've been using bjam, but that also has its drawbacks. Good luck finding what works for you.
Autotools are needed because Makefiles are not guaranteed to work the same across different platforms. If you handwrite a Makefile, and it works on your machine, there is a good chance that it won't on mine.
Do you know what unix your users will be using? Or even which distribution of Linux? Do you know where they want software installed? Do you know what tools they have, what architecture they want to compile on, how many CPUs they have, how much RAM and disk might be available to them?
The *nix world is a cross-platform landscape, and your build and install tools need to deal with that.
Mind you, the auto* tools date from an earlier epoch, and there are many valid complaints about them, but the several projects to replace them with more modern alternatives are having trouble developing a lot of momentum.
Lots of things are like that in the *nix world.
Autotools is a disaster.
The generated ./configure script checks for features that have not been present on any Unix system for last 20 years or so. To do this, it spends a huge amount of time.
Running ./configure takes for ages. Although modern server CPUs can have even dozens of cores, and there may be several such CPUs per server, the ./configure is single-threaded. We still have enough years of Moore's law left that the number of CPU cores will go way up as a function of time. So, the time ./configure takes will stay approximately constant whereas parallel build times reduce by a factor of 2 every 2 years due to Moore's law. Or actually, I would say the time ./configure takes might even increase due to increasing software complexity taking advantage of improved hardware.
The mere act of adding just one file to your project requires you to run automake, autoconf and ./configure which will take ages, and then you'll probably find that since some important files have changed, everything will be recompiled. So add just one file, and make -j${CPUCOUNT} recompiles everything.
And about make -j${CPUCOUNT}. The generated build system is a recursive one. Recursive make has for a long amount of time been considered harmful.
Then when you install the software that has been compiled, you'll find that it doesn't work. (Want proof? Clone protobuf repository from Github, check out commit 9f80df026933901883da1d556b38292e14836612, install it to a Debian or Ubuntu system, and hey presto: protoc: error while loading shared libraries: libprotoc.so.15: cannot open shared object file: No such file or directory -- since it's in /usr/local/lib and not /usr/lib; workaround is to do export LD_RUN_PATH=/usr/local/lib before typing make).
The theory is that by using autotools, you could create a software package that can be compiled on Linux, FreeBSD, NetBSD, OpenBSD, DragonflyBSD and other operating systems. The fact? Every non-Linux system to build packages from source has numerous patch files in their repository to work around autotools bugs. Just take a look at e.g. FreeBSD /usr/ports: it's full of patches. So, it would have been as easy to create a small patch for a non-autotools build system on a per project basis than to create a small patch for an autotools build system on a per project basis. Or perhaps even easier, as standard make is much easier to use than autotools.
The fact is, if you create your own build system based on standard make (and make it inclusive and not recursive, following the recommendations of the "Recursive make considered harmful" paper), things work in a much better manner. Also, your build time goes down by an order of magnitude, perhaps even two orders of magnitude if your project is very small project of 10-100 C language files and you have dozens of cores per CPU and multiple CPUs. It's also much easier to interface custom automatic code generation tools with a custom build system based on standard make instead of dealing with the m4 mess of autotools. With standard make, you can at least type a shell command into the Makefile.
So, to answer your question: why use autotools? Answer: there is no reason to do so. Autotools has been obsolete since when commercial Unix has become obsolete. And the advent of multi-core CPUs has made autotools even more obsolete. Why programmers haven't realized that yet, is a mystery. I'll happily use standard make on my build systems, thank you. Yes, it takes some amount of work to generate the dependency files for C language header inclusion, but the amount of work is saved by not having to fight with autotools.
I dont feel I am an expert to answer this but still give you a bit analogy with my experience.
Because upto some extent it is similar to why we should write Embedded Codes in C language(High Level language) rather then writing in Assembly Language.
Both solves the same purpose but latter is more lenghty, tedious ,time consuming and more error prone(unless you know ISA of the processor very well) .
Same is the case with Automake tool and writing your own makefile.
Writing Makefile.am and configure.ac is pretty simple than writing individual project Makefile.

Resources