Binary Analysis Research Tools - security

Can some one provide me with a list of leading binary research tools for Windows OS and windows applications? I found BinScope from microsoft itself but was wondering if there are any other better tools around?
Thanks,
Omer

If you only have access to the binary your access is limited. If you want to peer into the inner workings of this binary your best bet is a Decompiler like IDA Pro and a assembler level debugger like OllyDBG.

Tom Reps, a professor at the University of Wisconsin and founder of GrammaTech, gave an impressive talk on this at Stanford last summer. GrammaTech is working on binary analysis (http://www.grammatech.com/research/contracts/HSARPA/HSARPA-2005-MCSB/), but I don't know whether it's available in their static analysis product yet.
Disclaimer: One of their VP's bought me lunch and got me to try a demo of their source code analysis tool while I was at Palm (before the binary analysis talk), but I think the results are confidential.

BAP is a toolkit for performing binary analysis on x86 programs. It lifts binary code to an easily understandable and analyzable language similar to compiler intermediate languages. It's not a point and click solution (i.e., programming is required to use it effectively), but it can be useful for people who want to write new program analyses on binary code without redefining the semantics of x86.

Related

Pros/cons of different language workbench tools such as Xtext and MPS?

Does anyone have experience working with language workbench tools such as Xtext, Spoofax, and JetBrains' MPS? I'm looking to try one out and am having a hard time finding a good comparison of the different tools. What are the pros and cons of each?
I'm looking to build DSLs that generate python code, so I'm especially interested to hear from people who've used one of these tools with python (all three seem pretty Java-focused... why is that?). The DLSs are primarily for my own use, so I care less about building a really pretty IDE than I do about it being KISS to define the syntax and write the code generator. The ability to type-check / do static analysis of the DLSs would be pretty cool too.
I'm a little afraid of getting far down a path, hitting a wall, and realizing that all my code is in a format that can't be ported to anything else -- is that a risk with these tools? MPS in particular seems a little scary since as I understand it you don't really generate text-based syntaxes but rather build specialized editors for ASTs.
Markus Voelter does a pretty good job comparing those three in se-radio and Software ArchitekTOUR podcasts.
The basic idea is, that Xtext is most used, therefore most stable and documented, and it is based on popular Eclipse platform and modeling ecosystem - EMF which surrounds it. On the other hand it is parser based and uses ANTLR internally, which means the kind of grammars you can define is limited and languages cannot be combined easily.
Spoofax is an academic product with least adoption of those three. It is also parser based, but uses its own parser generator internally which allows language combinations.
Jetbrains MPS is projection based, which gives much freedom to language designer and allows combinations of languages. *t also has solid support. Drawback might be the learning curve.
None of these tools is strictly Java focused as target language for code generators. Xtext uses Xpand templates, which are plain text. I don't really know how code generation in Spoofax works. MPS has its base language, which is said to be subset of Java, but there are different alternatives.
I personally use Xtext because of its simplicity and maturity, but those strong limitations given by its design make it not a very future proof choice.
I have chosen XText in the same case two weeks ago, but I don't know anything about Spoofax.
My first impression - Xtext is very simple and productive.
I have made my first realife(but very simple) project in 30 minutes, I have generated a graphviz dot graph and html report.
I don't like MPS because I prefer plain text source and destination files.
There are other systems for doing this kind of thing. If your goal is building tools, you don't necessarily have to look to an IDE with an integrated tool; sometimes you can find better tools that have focused on utility rather than IDE integration
Consider any of the pure program transformation tools:
TXL (practical, single paradigm)
Stratego (Spoofax before it was transplanted into Eclipse)
Rascal (research, very nicely designed in many ways)
DMS Software Reengineering Toolkit (happens to be mine; commercial; used to do heavy duty DSL/conventional langauge analysis and transformation including on C++)
These all provide good mechanisms for defining DSLs and transforming them.
What really matters is the support machinery for carrying out "life after parsing".
I 've experimented for a couple of days with Xtext and while the tool looks promising I was eventually put off by the tight integration with the Eclipse ecosystem and the pain one has to go through just to solve what should be given hassle-free out of the box: a headless run of the code generator you implemented. See here for some of the minutiae one has to go through (and it's not even properly documented on the Xtext web site but rather on a blog, meaning its an ad-hoc patch that could very well break on the next release).
Will take another look in half a year to see if there has been any improvement on this front.
Take a look at the Markus Völter's book. It does a very comprehensive comparison of these 3 technologies.
http://dslbook.org
XText is very well maintained but this doesn't mean it's problem-less. Getting type-system, scoping and generation running isn't as easy as advertised.
Spoofax is scannerless, (simplifying grammar composition). Not that well documented, but seems complete.
MPS is projectional. A pro for language composition and con for editing. Supports multiple editors for an AST and will soon even support a nice diagram editor. Base language documentation isn't that good. Typesystem, scoping, checking is very well handled. Model to model transformations are done by the solver. My colleagues using it complain about model to text languages. (My opinion M2M wasn't that intuitive either.)
Years ago Microsoft had the OSLO project. MGrammar and especially Quadrant were very promising. It was possible to represent your model in table, form, text or diagram view. But suddenly they've cancelled the project (and perhaps shot the people working on it)
Perhaps today the best place to compare different language workbenches is http://www.languageworkbenches.net/ and there http://www.languageworkbenches.net/past-editions/ shows how a set of Language Workbenches implement a similar kind of task: a dsl for a particular domain.
Update 2022: as links were broken and newer articles on the topic are written see the site referred above at:
https://web.archive.org/web/20160324201529/http://www.languageworkbenches.net/
References to article reviewing language workbenches include: 1) State of the art: https://link.springer.com/chapter/10.1007/978-3-319-02654-1_11 and 2) Empirical evaluation: https://hal.archives-ouvertes.fr/file/index/docid/706841/filename/Evaluation_of_Modeling_Tools_Adaptation.pdf

What type of programs are C/C++ used for now? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
in which area is c++ mostly used?
I started off with C in school, went to Java and now I primarily use the P's(Php, Perl, Python) so my exposure to the lower level languages have all but disappeared. I would like to get back into it but I can never justify using C over Perl or Python. What real-world apps are being built with these languages? Any suggestions if I want to dive back in, what can I do with C/C++ that I can't easily do with Perl/Python?
To borrow some text from the answer I had for another related question:
Device drivers in native code.
High performance floating point number crunching (i.e. SIMD).
Easy ability to interface with assembly language routines.
Manage memory manually for extended execution runs.
Most of my work has been C and C++. I studied computer engineering in school and worked with embedded devices. My Master's degree had an emphasis in graphics and visualization. One of our visualization apps was written in Python, but for the most part, graphics demands C/C++ for the speed. I now work with embedded devices running Windows Mobile and Windows CE - all C++, though you can do a lot with C#. I previously worked in simulations, which was all C++ code on the backend. C++ is still king for time-sensitive IO, embedded applications, graphics and simulations.
Basically, if you need tight control of timing, you go lower level. Or if you need light-weight (ie, small program size, small memory footprint)
Somewhat unscientifically I took a look on Sourceforge and the top twenty projects/language break-down is currently thus:
Java(43,199)
C++(34,313)
PHP(28,333)
C(26,711)
C#(12,298)
Python(12,222)
JavaScript(10,307)
Perl(8,931)
Unix Shell(3,618)
Delphi/Kylix(3,353)
Visual Basic(3,044)
Visual Basic .NET(2,513)
Assembly(2,283)
JSP(1,891)
Ruby(1,731)
PL/SQL(1,669)
Objective C(1,424)
ASP.NET(1,344)
Tcl(1,241)
ActionScript(1,164)
Perl + Python together still total less than C alone. I have no idea why Java is so high, I know of no single Java developer and have not seen a single Java project, but I am sure someone is using it! For probably the same reason, you are not seeing much C/C++, you are just not working in a domain where it figures highly. I work in embedded systems where C and C++ are ubiquitous and Python comes nowhere. Different languages are encountered to different extents in different worlds.
You ask what you can do with C/C++ that you cannot do easily with Perl/Python; well the answer is plenty, real-time embedded systems for one; but if that is not what you want/need to do, then there is no reason to. On the other hand I might ask the reverse; I'd use C++ for things you might use Python for, simply because for me it would be easier and quicker (than learning a new language and getting the tools working)
C/C++ can be, and is, used for nearly all "types" of programs.
There are some major advantages to C and C++:
Potentially better performance
Easier to build interoperable libraries, especially if working with libraries usable from multiple languages.
well the interpreters for your "P's" languages are most certainly written in c/c++. Most OS code is written in C/C++. On the application side, if you are into games, they are generally written in c/c++. Anything that needs high performance and or low memory is a good candidate.
I've used Gsoap, a c++ soap client implementation for a web service that got HUGE traffic.
Most desktop/console applications with a bias toward graphics rely heavily on C++. This includes CAD software and AAA video games, among other things.

Platform for creating a visual programming language

I'm interested in creating a visual programming language which can aid non-programmers(like children) to write simple programs, much like Labview or Simulink allows engineers to connect functional blocks together without the knowledge of how they are internally built. Is this called programming by demonstration? What are example applications?
What would be an ideal platform which can allow me to do this(it can be a desktop or a web app)
Check out Google Blockly. Blockly allows a developer to create their own blocks, translations (generators) to virtually any programming language (or even JSON/XML) and includes a graphical interface to allow end users to create their own programs.
Brief summary:
Blockly was influenced by App Inventor, which itself was based off Scratch
App Inventor now uses Blockly (?!)
So does the BBC microbit
Blockly itself runs in a browser (typically) using javascript
Focused on (visual) language developers
language independent blocks and generators
includes a Block Factory - which allows visual programming to create new Blocks (?!) - I didn't find this useful myself...except for understanding
includes generators to map blocks to javascript/python
e.g. These blocks:
Generated this code:
See https://developers.google.com/blockly/about/showcase for more details
Best wishes - Andy
The adventure on which you are about to embark is the design and implementation of a visual programming language. I don't know of any good textbooks in this area, but there are an IEEE conference and refereed journal devoted to this field. Margaret Burnett of Oregon State University, who is a highly regarded authority, has assembled a bibliography on visual programming languages; I suggest you start there.
You might consider writing to Professor Burnett for advice. If you do, I hope you will report the results back here.
There is Scratch written by MIT which is much like what you are looking for.
http://scratch.mit.edu/
A restricted form of programming is dataflow (aka. flow-based) programming, where the application is built from components by connecting their ports. Depending on the platform and purpose, the components are simple (like a path selector) or complex (like an image transformator). There are several dataflow systems (just I've made two), some of them has no visual editor, some of them are just a part of a bigger system, and there're some which don't even mention the approach. (Did you think, that make, MS-Excel and Unix Shell pipes are some kind of this?)
All modern digital synths based on dataflow approach, there's an amazing visual example: http://www.youtube.com/watch?v=0h-RhyopUmc
AFAIK, there's no dataflow system for definitly educational purposes. For more information, you should check this site: http://flowbased.org/start
There is a new open source library out there: TUM.CMS.VPLControl. Get it here. This library may serve as a basis for your purposes.
There is Snap written by UC Berkeley. It is another option to understand VPL.
Pay attention on CoSpaces Edu. It is an online platform that enables the creation of virtual worlds and learning experiences whilst providing a more flexible approach to the learning curriculum.
There is visual coding named "CoBlocks".
Learners can animate and code their creations with "CoBlocks" before exploring and sharing them in mobile VR.
Also It is possible to use JavaScript or TypeScript.
If you want to go ahead with this, the platform that I suggest is the one used to implement Scratch (which already does what you want, IMHO), which is Squeak Smalltalk. The Squeak environment was designed with visual programming explicitly in mind. It's free, and Smalltalk syntax can learned in half an hour. Learning the gigantic class library may take just a little longer.
The blocks editor which was most support and development for microbit is microsoft makecode
Scratch is a horrible language to teach programming (i'm biased, but check out Pipes Visual Programming Language)
What you seem to want to do sounds a lot like Functional Block programming (as in functional block programming language IEC 61499 and other VPLs for mechatronics development). There is already a lot of research into VPLs so you might want to make sure that A) what your are trying to do has an audience and B) what you are trying to do can be done easily.
It sounds a bit negative in tone, but a good place to start to test the plausibility of your idea is by reading Davor Babic's short blog post at http://blog.davor.se/blog/2012/09/09/Visual-programming/
As far as what platform to use - you could use pretty much anything, just make sure it has good graphic libraries (You could use Java with Swing - if you like pain - or Python with TKinter) just depends what you are familiar with. Just keep in mind who you want to eventually launch the language to (if its iOS, then look at using Objective-C, etc.)

Ada compilers for Linux

I'm doing a trade study for Ada development on Linux. Do you have any good compiler/OS recommendations?
So far, I've got GNAT from AdaCore running on CentOS 5.4, and I have license requests in for Rational Apex and Aonix ObjectAda.
This is a porting effort. The original codebase is Apex 3.0 on OSF1 4.0d.
Anything else I should be considering? Ideally, it would be a supported environment.
One issue you need to take into consideration is to determine to what degree your system that's being ported utilizes vendor-supplied packages to perform its function. What I've seen with older, large systems, especially Apex ones, is a propensity for the language gurus during its development time to have decided that vanilla Ada just wasn't good enough, and so tie into all these vendor-supplied packages. If that's what your system does right now, it's a strong argument for upgrading within the vendor and sticking with Apex (all other things being mostly equal).
Whenever I've done ports of such systems, if given the opportunity I've done my best to tear out all the vendor-supplied stuff--nine times out of ten replacing the vendor-specific stuff with vanilla Ada implementations worked just as well, and you no longer have to deal with the quirks of a compiler-specific package. Plus, you increase the portability and maintainability of the system, allowing it to better adapt to future changes.
There is always SPARK, but I believe its a specialized/subsetted version of the Ada language. You might want to contact SigAda or the Ada usenet group to see if there are any other ideas.
Honestly though, GNAT is a great tool set. You can use GNATBench, an Eclipse interface, or GPS, a light-weight GTK+ IDE, to interface with the GNAT tools.
Other compilers I am aware of are Green Hills AdaMULTI (for various RTOSes), and DDC-I's SCORE (also for various RTOSes)
Providers of certified compilers that support Linux (in addition to those listed in the question):
Irvine Compiler Corp.
OC Systems
RR Software
Sofcheck

Development time in various languages

Does anybody know of any research or benchmarks of how long it takes to develop the same application in a variety of languages? Really I'm looking for Java vs. C++ but any comparisons would be useful. I have the feeling there is a section in Code Complete about this but my copy is at work.
Edit:
There are a lot of interesting answers to this question but it seems like there is a lack of really good research. I have made a proposal over at meta about this problem.
Pratt & Whitney, purveyors of jet engines for civilian and military applications, did a study on this many years ago, without actually intending to do the study.
They went on the same metrics kick everyone else went on in the 1990s. They collected a bunch of data about their jet engine controller projects, including timecard data. They crunched it. The poor sap who got to crunch the data noticed something in the results: the military projects uniformly had twice the programmer productivity and one/fourth the defect density as the civilian projects.
This, by itself, is significant. It means you only need half as many programmers, and you aren't going to spend quite as much time fixing bugs. What is even more important is that this was an apples-to-apples comparison. A jet engine controller is a jet engine controller.
He then went looking for candidate explanations. All of the usual candidates: individual experience, team size, toolsets, software processes, requirements stability, everything, were trotted out, and they were ruled out when it was seen that the story on those items was uniformly the same on both sides of the aisle. At the end of the day, only one statistically significant difference showed up.
The civilian projects were written in every language you could think of. The military projects were all written in Ada.
IN EVERY SINGLE CASE, against every other comer, for jet engine controllers at Pratt & Whitney, using Ada gave double the productivity and one/fourth the defect density.
I know what the flying code monkeys are going to say. "You can do good work in any language." In theory, that's true. In practice, however, it appears that, at least at Pratt & Whitney, language made a difference.
Last I heard about this, Pratt & Whitney upper management decreed that ALL jet engine controller projects would be done in Ada.
No, I don't have a citation. No paper was ever written. My source on this story was the poor sap who crunched the numbers. Here's a similar study from 1995:
http://archive.adaic.com/intro/ada-vs-c/cada_art.html
This, incidentally, was BEFORE Boeing did the 777, and BEFORE the 777 brake subcontractor story happened. But that's another story.
One of the few funded scientific studies that I'm aware of on cross-language productivity, from the early 90s, funded by ARPA and the ONR,
Haskell vs. Ada Vs. C++ vs Awk vs ... An Experiment in Software Prototyping Productivity, Hudak & Jones, 1994.
We describe the results of an
experiment in which several
conventional programming languages,
together with the functional language
Haskell, were used to prototype a
Naval Surface Warfare Center (NSWC)
requirement for a Geometric Region
Server. The resulting programs and
development metrics were reviewed by a
committee chosen by the Navy. The
results indicate that the Haskell
prototype took significantly less time
to develop and was considerably more
concise and easier to understand than
the..
This article(a pdf) has some benchmarks (note that it's from 2000) between C, C++, Perl, Java, Perl, Python, Rexx and Tcl.
Some common wisdom I believe holds true (also somewhere within the article):
The number of lines written per hour is independent of the language
Opinion: more important is what is faster for a given developer, for example yourself. What you are used to, will usually be faster. If you are used to 20 years of C++ pitfalls and never skip an uninitialized variable, that will be faster than Java for anybody.
If you remember all parameters of CreateWindowEx() by heart, it will be faster than MFC or winforms.
A couple of anecdotal data points:
On Project Euler, which invites programming solutions to mathematical problems,
the shortest solutions are almost invariably written in J or K, a relative of APL; there are occasionally MatLab solutions in the same range. It can be argued, though, that these languages specialized in math.
runners up were Ruby solutions. A lot of algorithm can be wrapped in very little code, and it's much more legible than J / K.
Python and Haskell solutions also did very well, LOC-wise.
The question asked about "fastest development," not "shortest code." But it's conceivable that shorter solutions are faster to come up with - certainly for slow typists!
There's an annual competition among roboticists. Contestants are given some specs for some hardware, a practical problem to solve in software, and limited time to do so. Again very domain specific, of course. Programmers have their choice of tools, including language of course. Every year, the winning team (often a single person) used Forth.
This admittedly limited sample suggests that "development speed" and "effect of language on speed" is often very dependent on the problem domain.
See also
Are there statistical studies that indicates that Python is "more productive"?
for some discussions about this kind of question.
It would make more sense to benchmark the programmers, not the languages. The time to write a program in any mainstream language depends more on the ability of the programmer in that language than on qualities of that specific language.
I think most benchmarks and statements on this topic will mean very little.
Benchmarks can always be gamed; see the history of "Pet Store".
A language that's good at solving one kind of problem might not apply as well to another.
What matters most is the skill of your team, its knowledge of a particular technology, and how well you know the domain you're trying to solve.
UPDATE: Control software for jet engines and helicopters is a very specialized subset of computing problems. It's characterized by very rigorous, complete, detailed specs and QA that means the multi-million dollar aircraft cannot crash.
I can second the (very good) citation by John Strohm of Pratt & Whitney control software written in Ada. The control software for Kaman helicopters sold to Australia was also written in Ada.
But this does not lead to the conclusion that if you decided to write your next web site in Ada that you'd have higher productivity and fewer defects than you would if you chose C# or Java or Python or Ruby. All languages are not equally good in all problem domains.
Language/framework comparison for web applications
The Plat_Forms project provides some information of this type for web applications.
There are three studies with different tasks (done in 2007, 2011, 2012), all of the following format: Several teams of three professional developers implemented the same application under controlled conditions within two days.
It covers Java, Perl, PHP, and Ruby and has multiple teams for each language.
The evaluation reports much more than only development time.
Findings of iteration one for instance included
that experience with the language and framework appeared to be more relevant than what that framework was.
that Java tended to induce teams to make laborious constructions while Perl induced them to make pragmatic (and quite handy) constructions.
Findings of iteration two included
that Ruby on Rails was more productive in this type of project (which due to its duration was more rapid prototyping than full-blown development of a mature application)
and that the one exception to the above rule was the one team using Symfony, a PHP framework that has similar concepts to Ruby on Rails (but still the very different base language underneath it).
Look under http://www.plat-forms.org or search the web for "Plat_Forms".
There is plenty more detail in the reports, in particular the thick techreport on iteration 1.
Most programs have to interface with some other framework. It tends to be a good idea to pick the language that has libraries specifically for what you are trying to do. For instance are you trying to build a distributed redundant messaging system? If so I would use Erlang. Are you trying to make a quick and dirty data driven website, use Ruby and Rails. You get the idea. Real time DirectX where performance is key, C++/C/Asm.
If you are writing something that is algorithm based I would look to a functional language like Haskell, although it has a very high learning curve.
This question is a little old fashioned. Focusing on development time solely based on the choice of language is of limited value. There are so many other factors that have equal or more impact than the language itself:
The libraries or frameworks available / used.
The level of quality required (ie. defect count).
The type of application (eg. GUI, server, driver etc...)
The level of maintainability required.
Developer experience in the language.
The platform or OS the application is built on.
As an example, many would say Java is the better choice over C++ to build enterprise (line of business) applications. This is not normally because of the language itself, but instead it is perceived that Java has better (or more mature) web server and database frameworks available to it. This may or may not be true, but that is beside the point.
You may even find that the building an application using the same language on different operating systems or platforms gives greatly differing development time. For example using C++ on Linux to build a GUI application may take longer than a Windows based GUI application using C++ because of less extensive and mature GUI libraries avaialble on Linux (once again this is debatable).
According to Norvig, Lutz Prechelt published just such an article in the October 1999 CACM: "Comparing Java vs. C/C++ Efficiency Issues to Interpersonal Issues".
Norvig includes a link to that article. Unfortunately, the ACM, despite having a bitmap graphic proclaiming their goal of "Advancing Computing as a Science & Profession", couldn't figure out how to maintain stable links on their webpage, so it's just a 404 now. Perhaps your local library could help you out.
That Ada story might be an embellished version of this: http://www.adaic.com/whyada/ada-vs-c/cada_art.html
Erlang vs C++/Corba
"... As the Erlang DCC is less than a quarter of the size of a similar C++/CORBA implementation, the product development in Erlang should be fast, and the code maintainable. We conclude that Erlang and associated libraries are suitable for the rapid development of maintainable and highly reliable distributed products."
Paper here
There's a reason why there are no real comparisons in that aspect, except for anecdotal evidence (which can be found in favor of almost any language).
Actually writing code takes relatively small portion of developer's time. Even if language lets you cut coding time in half, it will be barely noticeable by the time project ends. Design, structure of program, development process are all much more important, and then there are libraries, tools and experience with them.
Some languages are better suited for certain development processes than the others, so if you've settled on design and process you can decide which language will be more efficient - but not before.
(didn't notice there's a similar answer already, so feel free to ignore this)

Resources