How to check if running shake would rebuild a target (without actually trying to build it)? - haskell

In certain situations, I want to know very quickly if a certain target file is up-to-date, i.e. whether or not building it with shake would run any of the rules transitively contributing to the target.
What's the best way to achieve this with shake?
My question is somewhat related to Shake: Signal whether anything had to be rebuilt at all. However, I do not want to run the actual build because I do not want to touch any of my build products.

In Shake what files are "dirty" is a bit difficult to determine. If your file of interest depends on any oracles (even transitively), then the file is considered "dirty", since oracles are always dirty. However, if Shake reruns the dependent rules/oracles and they don't change, then at that point it is considered clean again. As a result, most rules are considered dirty before anything has been run, and only after running some rules does it become clean.
I have raised a ticket to do something better. One option would be given a target rule, to say which leaves depend on it and are dirty - that would list the oracles (which you reasonably expect not to change) and if it lists any source files, you would expect it to rebuild.
(I'd welcome suggestions if anyone has any good ideas.)

Related

Running a brief asm script inline for dynamic analysis

Is there any good reason not to run a brief unknown (30 line) assembly script inline in a usermode c program for dynamic analysis directly on my laptop?
There's only one system call to time, and at this point I can tell that it's a function that takes a c string and it's length, and performs some sort of encryption on it in a loop, which only iterates through the string as long as the length argument tells it.
I know that the script is (supposed to be) from a piece of malicious code, but for the life of me I can't think of any way it could possibly pwn my computer barring some sort of hardware bug (which seems unlikely given that the loop is ~ 7 instructions long and the strangest instruction in the whole script is a shr).
I know it sounds bad running an unknown piece of assembly code directly on the metal, but given my analysis up to this point I can't think of any way it could bite me or escape.
Yes, you can but I won't recommend it.
The problem is not how dangerous is the code this time (assuming you really understand all of the code and you can predict the outcome of any system call), the problem is that it's a slippery slope and it's not worth it considering what's at stake.
I've done quite a few malware analysis and rarely happened that a piece of code caught me off guard but it happened.
Luckily I was working on a virtual machine within an isolated network: I just restored the last snapshot and stepped through the code more carefully.
If you do this analysis on your real machine you may take the habit and one day this will bite you back.
Working with VMs, albeit not as comfortable as using your OS native GUI, is the way to go.
What could go wrong with running a 7 lines assembly snippet?
I don't know, it really depends on the code but a few things to be careful about:
Exceptions. An instruction may intentionally fault to pass the control to an exception handler. This is why it very important that you totally understand the code: both the instruction and the data.
System calls exploits. A specially crafted input to a system call may trigger a 0-day or an unpatched vulnerability in your system. This is why is important that you can predict the outcome of every system call.
Anti debugger techniques. There are a lot of way a piece of code could escape a debugger (I'm thinking Windows debugging here), it's hard to remember them all, be suspicious of everything.
I've just named a few, it's catastrophically possible that an hardware bug could lead to privileged code execution but if that's really a possibility then nothing but a spare sacrificable machine will do.
Finally, if you are going to run the malware (because I assume the work of extracting the code and its context is too much of a burden) up to a breakpoint on your machine, think of what's at stake.
If you place the break point on the wrong spot, if the malware takes another path or if the debugger has a glitchy GUI, you may loose your data or the confidentiality of your machine.
I'n my opinion is not worth it.
I had to make this premise for generality sake but we all sin something, don't we?
I've never run a piece of malware on my machine but I've stepped through some with a virtual machine directly connected on the company network.
It was a controlled move, nothing happened, the competent personnel was advised and it was an happy ending.
This may very well be your case: it can just be a decryption algorithm and nothing more.
However, only you have the final responsibility to judge if it is acceptable to run the piece of code or not.
As I remarked above, in general it is not a good idea and it presupposes that you really understand the code (something that is hard to do and be honest about).
If you think these prerequisites are all satisfied then go ahead and do it.
Before that I would:
Create an unprivileged user and deny it access to my data and common folders (ideally deny it everything but what's is necessary to make the program work).
Backup the the critical data, if any.
Optionally
Make a restore point.
Take an hash of the system folders, a list of installed services and the value of the usual startup registry keys (Sysinternals have a tool to enum them all).
After the analysis, you can check that nothing important system-wide has changed.
It may be helpful to subst a folder and put the malware there so that a dummy path traversal stops in that folder.
Isn't there better solution?
I like using VMs for their snapshotting capabilities, though you may stumble into an anti-VM check (but they are really dumb checks, so it's easy to skip them).
For a 7-line assembly I'd simply rewrite it as a JS function and run it directly in a browser console.
You can simply transform each register in a variable and transcript the code, you don't need to understand it globally but only locally (i.e. each instruction).
JS is handy if you don't have to work with 64-bit quantities because you have an interpreter in front of you right now :)
Alternatively I use any programming language I have at hand (One time even assembly it self, it seems paradoxical but due to a nasty trick I had to convert a 64-bit piece of code to a 32-bit one and patch the malware with it).
You can use Unicorn to easily emulate a CPU (if the architecture is supported) and play with your shellcode without any risk.

Why is "cabal build" so slow compared with "make"?

If I have a package with several executables, which I initially build using cabal build. Now I change one file that impacts just one executable, cabal seems to take about a second or two to examine each executable to see if it's impacted or not. On the other hand, make, given an equivalent number of executables and source files, will determine in a fraction of a second what needs to be recompiled. Why the huge difference? Is there a reason, cabal can't just build its own version of a makefile and go from there?
Disclaimer: I'm not familiar enough with Haskell or make internals to give technical specifics, but some web searching does offer some insight that lines up with my proposal (trying to avoid eliciting opinions by providing references). Also, I'm assuming your makefile is calling ghc, as cabal apparently would.
Proposal: I believe there could be several key reasons, but the main one is that make is written in C, whereas cabal is written in Haskell. This would be coupled with superior dependency checking from make (although I'm not sure how to prove this without looking at the source code). Other supporting reasons, as found on the web:
cabal tries to do a lot more than simply compiling, e.g. appears to take steps with regard to packaging (https://www.haskell.org/cabal/)
cabal is written in haskell, although the run time is written in C (https://en.wikipedia.org/wiki/Glasgow_Haskell_Compiler)
Again, not being overly familiar with make internals, make may simply have a faster dependency checking mechanism, thereby better tracking these changes. I point this out because from the OP it sounds like there is a significant enough difference to where cabal may be doing a blanket check against all dependencies. I suspect this would be the primary reason for the speed difference, if true.
At any rate, these are open source and can be downloaded from their respective sites (haskell.org/cabal/ and savannah.gnu.org/projects/make/) allowing anyone to examine specifics of the implementations.
It is also likely one could see a lot of variance in speed based upon the switches passed to the compilers in use.
HTH at least point you in the right direction.

How can I use the Shake library to build a reactive build system? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
Improve this question
Would it be possible to make Shake reactive, using inotify (or whatever git-annex and Yesod use) so that if ever the filesystem changes in such a way to imply that rule should execute, it does so at the earliest opportunity?
The author of Shake, Neil Mitchell, answered this by saying:
There are a few ways to approach this:
You could just rerun Shake every time it detects something has
changed. Shake is highly optimised for fast rebuilds, if the change
requires doing a compile, then the time for Shake to figure out what
to rebuild is probably minimal. Requires no changes to Shake.
There are certain things Shake does on startup, like reading the
Shake database. If there is demand, and that turns out to be
noticeable in time, I would happily provide a rerun Shake cheaply API
of some sort - it's not that difficult to do.
When Shake does do a rebuild-check, the most expensive thing it
does is checking file modification times. If the inotify layer gave a
list of the files that had changed I could only recheck things that
had actually changed. For a massive project you're likely to see ~1s
checking modification times, so it probably buys you a little, and
isn't too hard to implement.
If Shake is actively building, and then something changes, you
could throw an exception, kill whatever is being built, and restart
Shake. Shake has been thoroughly tested with exceptions being thrown
at it, and does the right thing. I know at least one person uses Shake
in this way.
Finally, if Shake is actively building, you could dynamically
terminate just those rules whose inputs have changed and go again.
Shake could support this model, but it would be a reasonable amount of
work, and require re-engineering some pieces. That would be the full
reactive model, but I suspect it only starts to be a benefit when you
have a massive number of files and a few files are changing almost
continuously but most files aren't.
We also determined that combining Shake with a utility like Hobbes (also on Hackage) can make it possible to do reactive builds.

Is there any good build tool that stays out of my way?

As the title says, I want to have a build tool that quite much stays out of my way.
I would rather want to specify rules, rather than steps in the build process. I wan to say that I want a binary file with a name placed in the root directory of my project, .o files should go in an obj/tmp dir and the source is in the Source-directory.
I do NOT want to tell it that it is this'n'that file as I keep adding new files rather quickly, it should just scan the source directory (and its subdirectories) looking for Ragel (.rl) and C++ code (.cxx) and doing what's necessary to make all into an executable.
I have looked into many tools, like auto{make,conf,header} (Did not really like that I placed the files it wanted in a subdir of project root, eclipse did not like that either), CMake (Seems like I have to add all source-files myself, and is quite much a variation of autotools in my eyes). I have also read about ant, maven (I am also allergic against XML, it's a good format to serialize data for applications, not so much for humans. I would prefer YAML) and others on WikiPedia. And I have seen tools which seems good but which require to be set up as a webserver which is kinda overkill.
Also, I really need the ability to be able to work offline without internet connection!.
Right now it seems like the best option is to make a little script that finds all .cxx files and write an Unity.cxx and builds that one with G++, which probably is quite fast but to much an ugly hack, I guess.
Bonus Points:
Fast builds
Ability to type build test-1 or something and it will build and directly run test-1
Multi-core builds (i.e. faster builds)
Does really not interrupt my train of thought
CMake is great. It's free, cross-platform, and reasonably well documented. It supports "out of source builds", meaning none of the build files are placed in the source directory. That makes source control a bit easier. It can be set up to find new files (globbing). Fast?...It generates make files...after that it's up to your compiler. Multicore...again, more a function of the compiler. I've used CMake on Windows, Linux, and Mac...it just works.
Another that I haven't tried but have read about and plan to test is premake... http://industriousone.com/sample-script
cake from CoffeeScript is quite good, and I'm writing a similar tool using Lua myself.
CMake and premake Ain't build/maketools, they are build/make-descriptor generators; which may fit a large number of projects that ain't changing too much. But not for project where rapid prototyping is a key.
Right now, I'm doing a project where the browser updates when you hit the save-button in your text editor; You do not need to go to the browser and hit F5 (Which would cause a small delay while the browser load in everything again, and you would most likely loose the state of the page, like say that you have an menu open, and wish to tweak the look of the menu. You would be forced to navigate there again in your RIA).

How to build Linux system from kernel to UI layer

I have been looking into MeeGo, maemo, Android architecture.
They all have Linux Kernel, build some libraries on it, then build middle layer libraries [e.g telephony, media etc...].
Suppose i wana build my own system, say Linux Kernel, with some binariers like glibc, Dbus,.... UI toolkit like GTK+ and its binaries.
I want to compile every project from source to customize my own linux system for desktop, netbook and handheld devices. [starting from netbook first :)]
How can i build my own customize system from kernel to UI.
I apologize in advance for a very long winded answer to what you thought would be a very simple question. Unfortunately, piecing together an entire operating system from many different bits in a coherent and unified manner is not exactly a trivial task. I'm currently working on my own Xen based distribution, I'll share my experience thus far (beyond Linux From Scratch):
1 - Decide on a scope and stick to it
If you have any hope of actually completing this project, you need write an explanation of what your new OS will be and do once its completed in a single paragraph. Print that out and tape it to your wall, directly in front of you. Read it, chant it, practice saying it backwards and whatever else may help you to keep it directly in front of any urge to succumb to feature creep.
2 - Decide on a package manager
This may be the single most important decision that you will make. You need to decide how you will maintain your operating system in regards to updates and new releases, even if you are the only subscriber. Anyone, including you who uses the new OS will surely find a need to install something that was not included in the base distribution. Even if you are pushing out an OS to power a kiosk, its critical for all deployments to keep themselves up to date in a sane and consistent manner.
I ended up going with apt-rpm because it offered the flexibility of the popular .rpm package format while leveraging apt's known sanity when it comes to dependencies. You may prefer using yum, apt with .deb packages, slackware style .tgz packages or your own format.
Decide on this quickly, because its going to dictate how you structure your build. Keep track of dependencies in each component so that its easy to roll packages later.
3 - Re-read your scope then configure your kernel
Avoid the kitchen sink syndrome when making a kernel. Look at what you want to accomplish and then decide what the kernel has to support. You will probably want full gadget support, compatibility with file systems from other popular operating systems, security hooks appropriate for people who do a lot of browsing, etc. You don't need to support crazy RAID configurations, advanced netfilter targets and minixfs, but wifi better work. You don't need 10GBE or infiniband support. Go through the kernel configuration carefully. If you can't justify including a module by its potential use, don't check it.
Avoid pulling in out of tree patches unless you absolutely need them. From time to time, people come up with new scheduling algorithms, experimental file systems, etc. It is very, very difficult to maintain a kernel that consumes from anything else but mainline.
There are exceptions, of course. If going out of tree is the only way to meet one of your goals stated in your scope. Just remain conscious of how much additional work you'll be making for yourself in the future.
4 - Re-read your scope then select your base userland
At the very minimum, you'll need a shell, the core utilities and an editor that works without an window manager. Paying attention to dependencies will tell you that you also need a C library and whatever else is needed to make the base commands work. As Eli answered, Linux From Scratch is a good resource to check. I also strongly suggest looking at the LSB (Linux standard base), this is a specification that lists common packages and components that are 'expected' to be included with any distribution. Don't follow the LSB as a standard, compare its suggestions against your scope. If the purpose of your OS does not necessitate inclusion of something and nothing you install will depend on it, don't include it.
5 - Re-read your scope and decide on a window system
Again, referring to the everything including the kitchen sink syndrome, try and resist the urge to just slap a stock install of KDE or GNOME on top of your base OS and call it done. Another common pitfall is to install a full blown version of either and work backwards by removing things that aren't needed. For the sake of sane dependencies, its really better to work on this from bottom up rather than top down.
Decide quickly on the UI toolkit that your distribution is going to favor and get it (with supporting libraries) in place. Define consistency in UIs quickly and stick to it. Nothing is more annoying than having 10 windows open that behave completely differently as far as controls go. When I see this, I diagnose the OS with multiple personality disorder and want to medicate its developer. There was just an uproar regarding Ubuntu moving window controls around, and they were doing it consistently .. the inconsistency was the behavior changing between versions. People get very upset if they can't immediately find a button or have to increase their mouse mileage.
6 - Re-read your scope and pick your applications
Avoid kitchen sink syndrome here as well. Choose your applications not only based on your scope and their popularity, but how easy they will be for you to maintain. Its very likely that you will be applying your own patches to them (even simple ones like messengers updating a blinking light on the toolbar).
Its important to keep every architecture that you want to support in mind as you select what you want to include. For instance, if Valgrind is your best friend, be aware that you won't be able to use it to debug issues on certain ARM platforms.
Pretend you are a company and will be an employee there. Does your company pass the Joel test? Consider a continuous integration system like Hudson, as well. It will save you lots of hair pulling as you progress.
As you begin unifying all of these components, you'll naturally be establishing your own SDK. Document it as you go, avoid breaking it on a whim (refer to your scope, always). Its perfectly acceptable to just let linux be linux, which turns your SDK more into formal guidelines than anything else.
In my case, I'm rather fortunate to be working on something that is designed strictly as a server OS. I don't have to deal with desktop caveats and I don't envy anyone who does.
7 - Additional suggestions
These are in random order, but noting them might save you some time:
Maintain patch sets to every line of upstream code that you modify, in numbered sequence. An example might be 00-make-bash-clairvoyant.patch, this allows you to maintain patches instead of entire forked repositories of upstream code. You'll thank yourself for this later.
If a component has a testing suite, make sure you add tests for anything that you introduce. Its easy to just say "great, it works!" and leave it at that, keep in mind that you'll likely be adding even more later, which may break what you added previously.
Use whatever version control system is in use by the authors when pulling in upstream code. This makes merging of new code much, much simpler and shaves hours off of re-basing your patches.
Even if you think upstream authors won't be interested in your changes, at least alert them to the fact that they exist. Coordination is essential, even if you simply learn that a feature you just put in is already in planning and will be implemented differently in the future.
You may be convinced that you will be the only person to ever use your OS. Design it as though millions will use it, you never know. This kind of thinking helps avoid kludges.
Don't pull upstream alpha code, no matter what the temptation may be. Red Hat tried that, it did not work out well. Stick to stable releases unless you are pulling in bug fixes. Major bug fixes usually result in upstream releases, so make sure you watch and coordinate.
Remember that it's supposed to be fun.
Finally, realize that rolling an entire from-scratch distribution is exponentially more complex than forking an existing distribution and simply adding whatever you feel that it lacks. You need to reward yourself often by booting your OS and actually using it productively. If you get too frustrated, consistently confused or find yourself putting off work on it, consider making a lightweight fork of Debian or Ubuntu. You can then go back and duplicate it entirely from scratch. Its no different than prototyping an application in a simpler / rapid language first before writing it for real in something more difficult. If you want to go this route (first), gNewSense offers utilities to fork your own OS directly from Ubuntu. Note, by default, their utilities will strip any non free bits (including binary kernel blobs) from the resulting distro.
I strongly suggest going the completely from scratch route (first) because the experience that you will gain is far greater than making yet another fork. However, its also important that you actually complete your project. Best is subjective, do what works for you.
Good luck on your project, see you on distrowatch.
Check out Linux From Scratch:
Linux From Scratch (LFS) is a project
that provides you with step-by-step
instructions for building your own
customized Linux system entirely from
source.
Use Gentoo Linux. It is a compile from source distribution, very customizable. I like it a lot.

Resources