I am using the clang Python bindings to traverse C/C++ code into an AST. How can I get a tree based AST structure?
Some pointers on where to start, tutorials or anything in this regard would be of great help!
I found a very useful write-up (if you want to check it out, here is the link: https://www.chess.com/blog/lockijazz/using-python-to-traverse-and-modify-clang-s-ast-tree) and tried its code, but unfortunately I didn't get a useful output.
function_calls = []
function_declarations = []
def traverse(node):
    for child in node.get_children():
        traverse(child)
    if node.type == clang.cindex.CursorKind.CALL_EXPR:
        function_calls.append(node)
    if node.type == clang.cindex.CursorKind.FUNCTION_DECL:
        function_declarations.append(node)
    print 'Found %s [line=%s, col=%s]' % (node.displayname, node.location.line, node.location.column)
clang.cindex.Config.set_library_path("/Users/tomgong/Desktop/build/lib")
index = clang.cindex.Index.create()
tu = index.parse(sys.argv[1])
root = tu.cursor
traverse(root)
Just in case anyone is still having trouble: I found that you should be using kind instead of type. node.type is a Type object rather than a CursorKind, so comparing it against a CursorKind never matches.
You can run clang.cindex.CursorKind.get_all_kinds() to retrieve all kinds and see that the value of node.type does not appear among them.
import sys
import clang.cindex
function_calls = []
function_declarations = []
def traverse(node):
    # Visit the children first, then classify the current node by its kind.
    for child in node.get_children():
        traverse(child)
    if node.kind == clang.cindex.CursorKind.CALL_EXPR:
        function_calls.append(node)
    if node.kind == clang.cindex.CursorKind.FUNCTION_DECL:
        function_declarations.append(node)
    print('Found %s [line=%s, col=%s]' % (node.displayname, node.location.line, node.location.column))
# Point the bindings at the directory that contains libclang.
clang.cindex.Config.set_library_path("/Users/tomgong/Desktop/build/lib")
index = clang.cindex.Index.create()
tu = index.parse(sys.argv[1])  # path of the C/C++ file to parse
root = tu.cursor
traverse(root)
how can I get a tree based AST structure?
The translation unit object's cursor (tu.cursor) is actually the root node of the AST. You might want to use the clang tool to inspect the tree visually. This may shed some light and give you an intuition for how to work with the tree.
clang++ -cc1 -ast-dump test.cpp
But basically, it boils down to getting children nodes of the main node (tu.cursor) and recursively traversing them, and getting to the nodes which are of interest to you.
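The recursive descent described above can be sketched roughly as follows. This is only a minimal sketch under assumptions: cursor_to_tree, format_tree and parse_to_tree are hypothetical helper names, and the nested-dict layout is just one possible way to hold the tree. It relies on the cursor attributes kind, spelling, location and get_children() from clang.cindex.

```python
def cursor_to_tree(node):
    """Convert a cursor (or any cursor-like object) into a nested dict."""
    return {
        "kind": node.kind.name,        # e.g. FUNCTION_DECL, CALL_EXPR
        "name": node.spelling,         # may be empty for unnamed nodes
        "line": node.location.line,
        "children": [cursor_to_tree(child) for child in node.get_children()],
    }


def format_tree(tree, depth=0):
    """Return the tree as a list of indented text lines, one per node."""
    lines = ["%s%s %s" % ("  " * depth, tree["kind"], tree["name"])]
    for child in tree["children"]:
        lines.extend(format_tree(child, depth + 1))
    return lines


def parse_to_tree(path, args=None):
    """Parse a source file with libclang and return its AST as nested dicts."""
    import clang.cindex  # imported lazily; requires the clang Python bindings

    index = clang.cindex.Index.create()
    tu = index.parse(path, args=args or [])
    return cursor_to_tree(tu.cursor)
```

Assuming libclang can be located, something like print("\n".join(format_tree(parse_to_tree("test.cpp")))) should give an indented outline that is similar in spirit to the -ast-dump output.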
You might also want to check an article from Eli Bendersky on how to start working with the Python bindings:
https://eli.thegreenplace.net/2011/07/03/parsing-c-in-python-with-clang#id9
unfortunately I didn't get a useful output.
You might run into incomplete or wrong parsing when you don't pass the include paths used by the parsed file to libclang. For example, if the source file you want to parse uses some of the Qt includes, then you need to specify the relevant include paths in the parse() call, like in the example here:
index = clang.cindex.Index.create()
tu = index.parse(src_file, args=[
    '-I/usr/include/x86_64-linux-gnu/qt5/',
    '-I/usr/include/x86_64-linux-gnu/qt5/QtCore'])
Also look at the comments in the clang.cindex Python module; they can help you. For example, I found the solution above by reading those comments.
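One way to notice that the parse was incomplete is to look at the diagnostics libclang attaches to the translation unit. The following is a minimal sketch, assuming the diagnostics, severity, spelling and location attributes of the clang.cindex bindings; collect_errors and ERROR_SEVERITY are hypothetical names introduced here for illustration.

```python
# Numeric value of clang.cindex.Diagnostic.Error (CXDiagnostic_Error in libclang).
ERROR_SEVERITY = 3


def collect_errors(tu, min_severity=ERROR_SEVERITY):
    """Return readable messages for diagnostics at or above min_severity."""
    messages = []
    for diag in tu.diagnostics:
        if diag.severity >= min_severity:
            loc = diag.location
            # A diagnostic is not always tied to a file.
            filename = loc.file.name if loc.file else "<unknown>"
            messages.append("%s:%s: %s" % (filename, loc.line, diag.spelling))
    return messages
```

If collect_errors(tu) reports messages such as a header that could not be found, that usually points to a missing -I argument in the parse() call.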
I have been using pycparser to obtain the AST of C source code (it parses C only, not C++) and explore it using Python.
You can find the API for exploring the AST in this example from the repository.
Is there a way, besides parsing the file, to display the comments in a Python file?
As in:
d = {
    # key value uses
    k: v
}
I would display:
# key value uses
in the function's __doc__.
Thanks
Python always discards comments (and docstrings that are not at the beginning of a definition). So you'll have to parse the source yourself if you want to extract them.
The standard library's ast module also drops comments, but you could take a look at the tokenize module, which returns them. (However, it doesn't parse, so you'd still need to do some work to associate the comment with its function or class or whatever.)
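As a rough illustration of the tokenize approach, here is a minimal sketch that only lists the comments with their line numbers. extract_comments is a hypothetical helper name, and associating each comment with its enclosing function or class is left out.

```python
import io
import tokenize


def extract_comments(source):
    """Return (line_number, comment_text) pairs for every comment in source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.start[0], tok.string)
            for tok in tokens
            if tok.type == tokenize.COMMENT]
```

For a file on disk, the text passed in could come from open(path).read().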
I'm currently trying to learn Nim (it's going slowly - can't devote much time to it). On the other hand, in the interests of getting some working code, I'd like to prototype out sections of a Nim app I'm working on in Ruby.
Since mruby allows embedding a Ruby subset in a C app, and since Nim allows compiling arbitrary C code into functions, it feels like this should be relatively straightforward. Has anybody done this?
I'm particularly looking for ways of using Nim's funky macro features to break out into inline Ruby code. I'm going to try myself, but I figure someone is bound to have tried it and/or come up with more elegant solutions than I can in my current state of learning :)
https://github.com/micklat/NimBorg
This is a project with a somewhat similar goal. It targets Python and Lua at the moment, but using the same techniques to interface with Ruby shouldn't be too hard.
There are several features in Nim that help in interfacing with a foreign language in a fluent way:
1) Calling Ruby from Nim using Nim's dot operators
These are a bit like method_missing in Ruby.
You can define a type like RubyValue in Nim, which will have dot operators that will translate any expression like foo.bar or foo.bar(baz) to the appropriate Ruby method call. The arguments can be passed to a generic function like toRubyValue that can be overloaded for various Nim and C types to automatically convert them to the right Ruby type.
2) Calling Nim from Ruby
In most scripting languages, there is a way to register a foreign type, often described in a particular data structure that has to be populated once per exported type. You can use a bit of generic programming and Nim's .global. vars to automatically create and cache the required data structure for each type that was passed to Ruby through the dot operators. There will be a generic proc like getRubyTypeDesc(T: typedesc) that may rely on typeinfo, typetraits or some overloaded procs supplied by user, defining what has to be exported for the type.
Now, if you really want to rely on mruby (because you have experience with it for example), you can look into using the .emit. pragma to directly output pieces of mruby code. You can then ask the Nim compiler to generate only source code, which you will compile in a second step or you can just change the compiler executable, which Nim will call when compiling the project (this is explained in the same section linked above).
Here's what I've discovered so far.
Fetching the return value from an mruby execution is not as easy as I thought. That said, after much trial and error, this is the simplest way I've found to get some mruby code to execute:
const mrb_cc_flags = "-v -I/mruby_1.2.0_path/include/ -L/mruby_1.2.0_path/build/host/lib/"
const mrb_linker_flags = "-v"
const mrb_obj = "/mruby_1.2.0_path/build/host/lib/libmruby.a"
{. passC: mrb_cc_flags, passL: mrb_linker_flags, link: mrb_obj .}
{.emit: """
#include <mruby.h>
#include <mruby/string.h>
""".}
proc ruby_raw(str:cstring):cstring =
  {.emit: """
  mrb_state *mrb = mrb_open();
  if (!mrb) { printf("ERROR: couldn't init mruby\n"); exit(0); }
  mrb_load_string(mrb, `str`);
  `result` = mrb_str_to_cstr(mrb, mrb_funcall(mrb, mrb_top_self(mrb), "test_func", 0));
  mrb_close(mrb);
  """.}
proc ruby*(str:string):string =
  echo ruby_raw("def test_func\n" & str & "\nend")
  "done"
let resp = ruby """
puts 'this was a puts from within ruby'
"this is the response"
"""
echo(resp)
I'm pretty sure that you should be able to omit some of the compiler flags at the start of the file in a well configured environment, e.g. by setting LD_LIBRARY_PATH correctly (not least because that would make the code more portable)
Some of the issues I've encountered so far:
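I'm forced to use mrb_funcall because, for some reason, clang seems to think that the mrb_load_string function returns an int, despite all the C code I can find and the documentation and several people online saying otherwise (possibly because mrb_load_string is declared in mruby/compile.h, which is not included above, so the compiler falls back to an implicit int declaration):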
error: initializing 'mrb_value' (aka 'struct mrb_value') with an expression of incompatible type 'int'
mrb_value mrb_out = mrb_load_string(mrb, str);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~
The mruby/string.h header is needed for mrb_str_to_cstr, otherwise you get a segfault. RSTRING_PTR seems to work fine also (which at least gives a sensible error without string.h), but if you write it as a one-liner as above, it will execute the function twice.
I'm going to keep going and write some slightly more idiomatic Nim, but this has done what I needed for now.
Using Stanford CoreNLP, I am trying to parse text using the neural nets dependency parser. It runs really fast (that's why I want to use this and not the LexicalizedParser), and produces high-quality dependency relations. I am also interested in retrieving the parse trees (Penn-tree style) from that too. So, given the GrammaticalStructure, I am getting the root of that (using root()), and then trying to print it out using the toOneLineString() method. However, root() returns the root node of the tree, with an empty/null list of children. I couldn't find anything on this in the instructions or FAQs.
GrammaticalStructure gs = parser.predict(tagged);
// Print typed dependencies
System.err.println(gs);
// get the tree and print it out in the parenthesised form
TreeGraphNode tree = gs.root();
System.err.println(tree.toOneLineString());
The output of this is:
ROOT-0{CharacterOffsetBeginAnnotation=-1, CharacterOffsetEndAnnotation=-1, PartOfSpeechAnnotation=null, TextAnnotation=ROOT}Typed Dependencies:
[nsubj(tell-5, I-1), aux(tell-5, can-2), advmod(always-4, almost-3), advmod(tell-5, always-4), root(ROOT-0, tell-5), advmod(use-8, when-6), nsubj(use-8, movies-7), advcl(tell-5, use-8), amod(dinosaurs-10, fake-9), dobj(use-8, dinosaurs-10), punct(tell-5, .-11)]
ROOT-0
How can I get the parse tree too?
Figured I can use the Shift-Reduce constituency parser made available by Stanford. It's very fast and the results are comparable.
Given:
struct NameType([u8;64]);
name: (NameType, NameType);
I can do:
let def = &name.0 OR &name.1
but I cannot do:
let def = &name.0.0 OR &name.1.0
to access the internals. I have to do it twice:
let abc = &name.0;
let def = &abc.0;
Why am I unable to chain it to access inner sub-tuples, tuple structs, etc.?
rustc 1.0.0-nightly (ecf8c64e1 2015-03-21) (built 2015-03-22)
As mentioned in the comments, in foo.0.0 the trailing 0.0 is lexed as a single floating-point number rather than as two field accesses. This was originally mentioned in the RFC, specifically this:
I'd rather not change the lexer to permit a.0.1. I'd rather just have that be an error and have people write out the names. We could always add it later.
You can certainly file a bug, but as a workaround, use parentheses:
(foo.0).0
In my opinion, you shouldn't be nesting tuples that deep anyway. I'd highly recommend giving names to fields before you slowly go insane deciding if you wanted foo.0.1.2 or foo.1.2.0.
In addition to the above answers, I have also found out that a space works wonders :) So:
foo.0. 0 or foo.0 . 0 etc.
all work fine. I don't know how much it means, but there is a way to chain it if somebody wants to (without resorting to brackets).
I am writing python scripts to extract data from multiple sources and put it in a graph in a certain structure.
I am using bulbs models for all the data. I have models for all relevant node types and relationships. My edge models have no additional properties except 'label'.
As it is in development, I run the same script multiple times. I use get_or_create to prevent duplicate nodes, but edges do not have that method. I do not have the object for the existing edge since it was created in a previous run of the script.
I saw several questions discussing similar things, with answers from espeed like this, but I could not find a satisfactory answer for my specific issue.
What would be the simplest code for this method?
Presently I am trying to do this by loading a Gremlin script, as suggested by Stephen, with the following function:
def is_connected(parent, child, edge_label) {
    return g.v(parent).out(edge_label).retain([g.v(child)]).hasNext()
}
And the following Python code:
g.scripts.update('gremlin_scripts/gremlin.groovy')
script = g.scripts.get('gremlin:is_connected')
params = dict(parent=parent_node.eid, child=menu_item_v.eid, edge_label='has_sub_menu_item')
response = g.gremlin.execute(script, params)
I can't quite figure out how to get the bool result into Python. I've also tried g.gremlin.query(script, params).
Here's one way to do it:
parent_v.out(rel_label).retain(child_v).hasNext()
So, from the parent, traverse out to all children (I assume that "out" is the direction of your relationship - how you choose to implement that is specific to your domain) and determine if that child is present at any point via retain.
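The same check can also be sketched on the Python side as a get-or-create helper for edges. This is only a sketch under assumptions: get_or_create_edge is a hypothetical helper name, and it assumes bulbs-style methods parent.outE(label), edge.inV() and graph.edges.create(outV, label, inV), with outE possibly returning None when there are no edges. Unlike the Gremlin one-liner, it pulls the outgoing edges to the client before comparing.

```python
def get_or_create_edge(graph, parent, label, child):
    """Return (edge, created) for the edge parent -[label]-> child.

    Assumes bulbs-style vertex and edge objects: parent.outE(label) yields
    the outgoing edges with that label (or None), edge.inV() returns the
    vertex the edge points to, and vertices expose an eid attribute.
    """
    for edge in parent.outE(label) or []:
        if edge.inV().eid == child.eid:
            return edge, False  # an equivalent edge already exists
    # No matching edge found, so create one (assumed signature: outV, label, inV).
    return graph.edges.create(parent, label, child), True
```

With the names from the question, a call could look like get_or_create_edge(g, parent_node, 'has_sub_menu_item', menu_item_v).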