What is the difference between source code analysis and object file analysis?

What is the difference between source code analysis and object file analysis? - security

I'm interested in vulnerability detection. But not much is known about the beginning.
I'm currently studying static analysis. Static analysis can be done through source code or object files.
I'd like to know difference between source code analysis and object file analysis. I want to explain each pros and cons. You can also provide a link to paper or blog.
Thank you!

For introductions to static source code analysis, I’ll immodestly suggest the references in my Dr Dobbs article:
http://www.drdobbs.com/testing/deploying-static-analysis/240003801. For an example of why binary analysis, though much harder, is also necessary see
https://threatpost.com/new-linux-flaw-enables-null-pointer-exploits-071709/72889/, where a technically correct but unfriendly compiler optimization led to a vulnerability not in the source. (Some of the debate on Slashdot may actually be worth reading: https://it.slashdot.org/story/09/07/18/0136224/new-linux-kernel-flaw-allows-null-pointer-exploits.)

Related

Dead code and/or how to generate a cross reference from Haskell source

I've got some unused functionality in my codebase, but it's hard to identify. The code has evolved over the last year as I explore its problem space and possible solutions. What I'm needing to do is find that unused code so I can get rid of it. I'm happy if it deals with the problem on an exportable name basis.GHC has warnings that deal with non-exported unused code. Any tools specific to this task would be of interest.
However, I'm curious about a comprehensive cross referencing tool. I can find the unused code with such a tool. Years ago when I was working in C and assembler, I found that a good xref was a pretty handy tool, useful for many different purposes.
I'm getting nowhere with googling. Apparently in Haskell the dominant meaning of cross-reference is within literate programming. Though maybe something there would be useful.

I don’t know of such a tool, so in the past I have done a bit of a hack instead.
If you have a comprehensive test suite, you can run it with GHC’s code coverage tracing enabled. Compile with -fhpc and use hpc markup to generate annotated source. This gives you the union of unused code and untested code, both of which you would probably like to address anyway.
SourceGraph can give you a bunch of information which you may also find useful.

There is now a tool for this very purpose: https://hackage.haskell.org/package/weeder
It's been around since 2017, and while it has limitations, it definitely helps with large codebases.

Simple toolkits for emotion (sentiment) analysis (not using machine learning)

I am looking for a tool that can analyze the emotion of short texts. I searched for a week and I couldn't find a good one that is publicly available. The ideal tool is one that takes a short text as input and guesses the emotion. It is preferably a standalone application or library.
I don't need tools that is trained by texts. And although similar questions are asked before no satisfactory answers are got.
I searched the Internet and read some papers but I can't find a good tool I want. Currently I found SentiStrength, but the accuracy is not good. I am using emotional dictionaries right now. I felt that some syntax parsing may be necessary but it's too complex for me to build one. Furthermore, it's researched by some people and I don't want to reinvent the wheels. Does anyone know such publicly/research available software? I need a tool that doesn't need training before using.
Thanks in advance.

I think that you will not find a more accurate program than SentiStrength (or SoCal) for this task - other than machine learning methods in a specific narrow domain. If you have a lot (>1000) of hand-coded data for a specific domain then you might like to try a generic machine learning approach based on your data. If not, then I would stop looking for anything better ;)

Identifying entities and extracting precise information from short texts, let alone sentiment, is a very challenging problem specially with short text because of lack of context. Hovewer, there are few unsupervised approaches to extracting sentiments from texts mainly proposed by Turney (2000). Look at that and may be you can adopt the method of extracting sentiments based on adjectives in the short text for your use-case. It is hovewer important to note that this might require you to efficiently POSTag your short text accordingly.

Maybe EmoLib could be of help.

organizing large pieces of code

I was wondering what methods of code organization stackoverflow users use. I have a sporadic thought process and as a result my code can start to look messy and over whelming. Any tips ?

Keep methods short and give classes a single, clear responsibility.
It's not necessary, but TDD can help you acheive this

One file per class.
Folders for related classes.
Use modules/packages/assemblies/namespaces if your language supports them.
In general, keep many levels of abstraction, and try to keep them separate through whatever mechanism you can in your language/ide/platform of choice.
Read Domain Driven Design, which discusses these issues (design, documentation, organization and communication).

I would suggest looking at the principles of Large Scale C++ Software Design by John Lakos (ISBN-13: 978-0201633627) if not the book itself. They are summed up in these lecture notes. Another summary of ideas.
Here's a brief outline of the headings of the principles, which while written about in the C++ context, the geist of which are language agnostic.
Internal and External Linkage
Components and Dependency Relations
Physical Hierarchy Reducing Link-Time
Dependencies: Levelization Reducing
Compile-Time Dependencies: Insulation

Expert System engine

Next year will be my graduate year to be an informatics engineering person and I am trying to find ideas about the jounior project. Actually, I have an idea of making an expert system engine. I worked with clips and prolog and I liked clips very much but it seems to be an old engine. Can any one advice me about this idea or give me sources for papers or any topics that may help me? I am thinking to use C language to obtain the high performance, and to build a robust data structure. Also, I am thinking about an idea (I dont know if it could be done) of writing facts and rules (like clips) and then generate a C++ optimal code from these rules such that I can obtain the speed of the machine and use exe file.
I need help to make this idea more clear and how it can be done. Specially because I read about fuzzy logic, nueral network and heard about the new generation of expert system, so I dont know how that can be related to such topic.

For your junior project, I would recommend against writing it in C. Your problem sounds like it needs correctness more than it needs speed. Writing it in C will take longer because you will need to implement a lot of primitives that are not included in the language or any standard library. Also, since C is relatively low-level, there are a lot of opportunities to make low-level mistakes. Write it in a higher level language that is closer to the problem domain. You will have more time to focus on your actual problem because you will spend less time getting the framework set up. If you already know Prolog, it would be good to stick with that. Perhaps you might consider Mercury. It is similar to Prolog, but also designed for speed.

JBoss Rules (also known as Drools) offers the best approach to rule-processing. It's written in Java. It allows you to integrate program components in the rules, and rule-bases into your program components. You can even build or modify rule-bases on the fly.
I've heard that Java is catching up in its ability to do math, but outside of that, you have nothing to fear from performance.

Good APIs for scope analyzers

I'm working on some code generation tools, and a lot of complexity comes from doing scope analysis.
I frequently find myself wanting to know things like
What are the free variables of a function or block?
Where is this symbol declared?
What does this declaration mask?
Does this usage of a symbol potentially occur before initialization?
Does this variable potentially escape?
and I think it's time to rethink my scoping kludge.
I can do all this analysis but am trying to figure out a way to structure APIs so that it's easy to use, and ideally, possible to do enough of this work lazily.
What tools like this are people familiar with, and what did they do right and wrong in their APIs?

I'm a bit surprised at at the question, as I've done tons of code generation and the question of scoping rarely comes up (except occasionally the desire to generate unique names).
To answer your example questions requires serious program analysis well beyond scoping. Escape analysis by itself is nontrivial. Use-before-initialization can be trivial or nontrivial depending on the target language.
In my experience, APIs for program analysis are difficult to design and frequently language-specific. If you're targeting a low-level language you might learn something useful from the Machine SUIF APIs.
In your place I would be tempted to steal someone else's framework for program analysis. George Necula and his students built CIL, which seems to be the current standard for analyzing C code. Laurie Hendren's group have built some nice tools for analyzing Java.
If I had to roll my own I'd worry less about APIs and more about a really good representation for abstract-syntax trees.
In the very limited domain of dataflow analysis (which includes the uninitialized-variable question), João Dias and I have adapted some nice work by Sorin Lerner, David Grove, and Craig Chambers. Only our preliminary results are published.
Finally if you want to generate code in multiple languages this is a complete can of worms. I have done it badly several times. If you create something you like, publish it!

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

What is the difference between source code analysis and object file analysis? - security

Related

Dead code and/or how to generate a cross reference from Haskell source

Simple toolkits for emotion (sentiment) analysis (not using machine learning)

organizing large pieces of code

Expert System engine

Good APIs for scope analyzers

Categories

Resources