How does DalvikVM handle switch and try smali code - dalvik

I am trying to learn smali and I have a few questions whose answers I couldn't find by googling.
1) I created a simple test case to better explain myself
const-string v1, "Start"
:try_start_0
const-string v1, "Try Block"
invoke-static {v1}, Lcom/example/test/Main;->print(Ljava/lang/String;)V
:try_end_0
.catchall {:try_start_0 .. :try_end_0} :catchall_0
The .catch statement: do the two arguments mean "take the code between the two labels and catch exceptions thrown in it", or do they mean "start executing the try at :try_start_0 and keep going until execution reaches :try_end_0" (which would allow a goto to run code outside the two labels as part of the try)?
Are the labels for try always in the format try_start_%d, or can they be any label?
2)Another case
packed-switch v0, :pswitch_data_0
const-string v1, "Default Case"
invoke-static {v1}, Lcom/example/test/Main;->print(Ljava/lang/String;)V
:goto_0
const-string v1, "The End"
invoke-static {v1}, Lcom/example/test/Main;->print(Ljava/lang/String;)V
return-void
:pswitch_0
const-string v1, "Case 1"
invoke-static {v1}, Lcom/example/test/Main;->print(Ljava/lang/String;)V
goto :goto_0
:pswitch_data_0
.packed-switch 0x1
:pswitch_0
.end packed-switch
The switch statement: does it require that the case blocks lie between the packed-switch instruction and the switch data? And again, is the naming of the labels fixed, or is it just a convention?
3)If the labels can be different, would baksmali ever produce smali code with different labels?
4)What are the optional lines that aren't always shown when decompiling a dex?
I know .parameter and .line are optional, but what are all the ones that might not be there?
Thank you in advance.

1)
The first two labels (try_start_0 and try_end_0 in your example) define the range of code that the try block covers. If an exception happens within the covered code, then execution immediately jumps to the third label (catchall_0). The name of the label isn't important, it can be any valid identifier.
There is also the .catch directive, which is the same thing, except that it only handles a specific type of exception (similar to Java's catch statement).
A block of code can be covered by multiple catch statements, and by at most one catch-all statement. The location of the .catch statement is not important; however, the relative ordering of catch statements that cover the same code is important. For example, if you have
.catch Ljava/lang/Exception; {:try_start_0 .. :try_end_0} :handler1
.catch Ljava/lang/RuntimeException; {:try_start_0 .. :try_end_0} :handler2
The 2nd catch statement will never be used. If a RuntimeException is thrown in the covered code, the first catch will always get used, since a RuntimeException is an Exception.
However, if they were in the opposite order, it would work like you expect - the RuntimeException handler gets used for RuntimeExceptions, and the Exception handler gets used for any other type of exception.
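The "first matching handler wins" rule can be modelled in plain Java. This is only a sketch of the lookup described above, not Dalvik's actual implementation: the class HandlerOrderDemo, its methods, and the use of a LinkedHashMap as the handler list are all made up for illustration.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the handler list for one try range: entries are
// checked in declaration order and the first matching type wins.
final class HandlerOrderDemo {
    static String pickHandler(Map<Class<? extends Throwable>, String> handlers, Throwable thrown) {
        for (Map.Entry<Class<? extends Throwable>, String> entry : handlers.entrySet()) {
            if (entry.getKey().isInstance(thrown)) {
                return entry.getValue();
            }
        }
        return "unhandled";
    }

    // Same order as the two .catch lines above: Exception first.
    static Map<Class<? extends Throwable>, String> broadFirst() {
        Map<Class<? extends Throwable>, String> handlers = new LinkedHashMap<>();
        handlers.put(Exception.class, "handler1");
        handlers.put(RuntimeException.class, "handler2");
        return handlers;
    }

    // The opposite order: RuntimeException first.
    static Map<Class<? extends Throwable>, String> specificFirst() {
        Map<Class<? extends Throwable>, String> handlers = new LinkedHashMap<>();
        handlers.put(RuntimeException.class, "handler2");
        handlers.put(Exception.class, "handler1");
        return handlers;
    }

    public static void main(String[] args) {
        // Exception is listed first, so handler2 is never reached.
        System.out.println(pickHandler(broadFirst(), new RuntimeException()));    // handler1
        // With the specific type first, each exception finds its own handler.
        System.out.println(pickHandler(specificFirst(), new RuntimeException())); // handler2
        System.out.println(pickHandler(specificFirst(), new IOException()));      // handler1
    }
}
```

Note that javac rejects the broad-first order in Java source (the second catch is unreachable), which is why it only shows up in hand-written or modified smali.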
And finally, unlike java, the range of code in the .catch statements does not need to be strictly nested. For example, it's perfectly legal to have something like
:a
const-string v1, "Start"
:b
const-string v1, "Try Block"
:c
invoke-static {v1}, Lcom/example/test/Main;->print(Ljava/lang/String;)V
:d
.catch Ljava/lang/RuntimeException; {:a .. :c} :d
.catch Ljava/lang/Exception; {:b .. :d} :d
You can also have some pretty weird constructions, like this.
.method public static main([Ljava/lang/String;)V
.registers 3
:second_handler
:first_try_start
new-instance v0, Ljava/lang/RuntimeException;
invoke-direct {v0}, Ljava/lang/RuntimeException;-><init>()V
throw v0
:first_try_end
.catch Ljava/lang/Exception; {:first_try_start .. :first_try_end} :first_handler
:first_handler
:second_try_start
new-instance v0, Ljava/lang/RuntimeException;
invoke-direct {v0}, Ljava/lang/RuntimeException;-><init>()V
throw v0
:second_try_end
.catch Ljava/lang/Exception; {:second_try_start .. :second_try_end} :second_handler
.end method
Neither of the above examples would ever be generated from compiled java code, but the bytecode itself allows it.
2) The case blocks can be anywhere in relation to the packed-switch instruction or the switch data. The label names here are arbitrary as well.
3) Baksmali can generate labels in one of 2 ways. The default way is to use the general "type" of label, and append the bytecode address of the label. If you specify the -s/--sequential-labels option, instead of using the bytecode address, it keeps a counter for each label type and increments it each time it generates a label of that type.
4) Generally anything that is part of the debug information. .parameter, .line, .prologue, .epilogue, .source, .local, .restart local, .end local... I think that about covers it.


what's the proper way to allow users to provide a string "mangler" as a regex/proc/expr/

In my Tcl/Tk project, i need to allow my users to mangle a string in a well-defined way.
The idea is, to allow people to declare a "string mangling" proc/expr/function/... in a configuration file, which then gets applied to the strings in question.
I'm a bit worried on how to properly implement that.
Possibilities I have considered so far:
regular expressions
That was my first thought, but there's two caveats:
search/replace with regular expressions in Tcl seems to be awkward. at least with regsub i need to pass the match and replacement parts separately (as opposed to how e.g. sed allows me to pass a single complicated string that does everything for me); there are sed implementations for Tcl, but they look naive and might break rather sooner than later
also regexes can be awkward by themselves; using them to mangle complicated strings is often more complicated than it should be
procs?
Since the target platform is Tcl anyhow, why not use the power of Tcl to do string mangling?
The "function" should have a single input and produce a single output. Ideally the user should be nudged into doing it right (e.g. not being able to define a proc that requires two arguments), and it should be (nigh) impossible to create side effects (like changing the state of the application).
A simplistic approach would be to use proc mymangler s $body (with $body being the string defined by the user), but there are so many things that can go wrong:
$body assuming a different arg-name (e.g. $x instead of $s)
$body not returning anything
$body changing variables,... in the environment
expressions look more like it (always returning things, not allowing to modify the environment easily), but i cannot make them work on strings, and there's no way to pass a variable without agreeing its name.
So, the best I've come up with so far is:
set userfun {return $s} ;# user-defined string
proc mymangler s ${userfun}
set output [mymangler $input]
Are there better ways to achieve user-defined string-manglers in Tcl?
You can use apply -- the user provides a 2-element list: the second element is the "proc body", the code that does the mangling; the first element is the variable name to hold the string, this variable is used in the body.
For example:
set userfun {{str} {string reverse $str}}
set input "some string"
set result [apply $userfun $input] ;# => "gnirts emos"
Of course the code you get from the user is any arbitrary Tcl code. You can run it in a safe interpreter:
set userfun {{str} {exec some malicious code; return [string reverse $str]}}
try {
set interp [safe::interpCreate]
set result [$interp eval [list apply $userfun $input]]
puts "mangled string is: $result"
safe::interpDelete $interp
} on error e {
error "Error: $e"
}
results in
Error: invalid command name "exec"
Notes:
a standard Tcl command is used, apply
the user must specify the variable name used in the body.
this scheme does protect the environment:
set userfun {{str} {set ::env(SOME_VAR) "safe slave"; return $str$str}}
set env(SOME_VAR) "main"
puts $env(SOME_VAR)
try {
set interp [safe::interpCreate]
set result [$interp eval [list apply $userfun $input]]
puts "mangled string is: $result"
safe::interpDelete $interp
} on error e {
error "Error: $e"
}
puts $env(SOME_VAR)
outputs
main
mangled string is: some stringsome string
main
if the user does not return a value, then the mangled string is simply the empty string.
The "simplistic" approach is like foreach in that it requires the user to supply a variable name and a script to evaluate that uses that variable, and is a good approach. If you don't want it affecting the rest of the program, run it in a separate interpreter:
set x 0
proc mymangler {name body} {
set i [interp create -safe]
set s "some string to change"
try {
# Build the lambda used by apply here instead of making
# the user do it.
$i eval [list apply [list $name $body] $s]
} on error e {
return $e
} finally {
interp delete $i
}
}
puts [mymangler s { set x 1; string toupper $s }]
puts $x
outputs
SOME STRING TO CHANGE
0
If the person calling this says to use s as a variable and then uses something else in the body, it's on them. Same with providing a script that doesn't return anything.
I'd generally allow the user to specify a command prefix as a Tcl list (most simple command names are trivially suitable for this), which you would then apply to the argument by doing:
set mangled [{*}$commandPrefix $valueToMangle]
This lets people provide pretty much anything they want, especially as they can use apply and a lambda term to mangle things as required. Of course, if you're in a procedure then you're probably actually better off doing:
set mangled [uplevel 1 [list {*}$commandPrefix $valueToMangle]]
so that you're running in the caller's context (change 1 to #0 to use the global context instead) which can help protect your procedure against accidental changes and make using upvar within the mangler easier.
If the source of the mangling prefix is untrusted (what that means depends greatly on your application and deployment) then you can run the mangling code in a separate interpreter:
# Make the safe evaluation context; this is *expensive*
set context [interp create -safe]
# You might want to let them define extra procedures too
# interp invokehidden $context source /the/users/file.tcl
# Use the context
try {
set mangled [interp eval $context [list {*}$commandPrefix $valueToMangle]]
} on error {msg} {
# User supplied something bad; error message in $msg
}
There are various ways to support users specifying the transformation, but if you can expose the fact that you're working with Tcl to them, then that's probably the easiest and most flexible.

How can I reference an unnamed argument of a when expression?

I have a when expression that looks something like this:
when(foo.toString()){
"" ->'A'
"HELLO" ->'B'
"GOODBYE"->'C'
else ->foo.toString()[0]//problematic method call duplication
}
Now, I don't want to call foo.toString() twice, but I also want this to remain a single expression. Is there a convenient way for me to access the value I passed into the when expression in its else block, such as the it or this@ syntax found elsewhere in the language?
I'm currently using the following work-around:
with(foo.toString()){
when(this){
"" ->'A'
"HELLO" ->'B'
"GOODBYE"->'C'
else ->this[0]
}
}
But this introduces another block, and is less readable than I'd like. Is there a better solution?
The when block does not bind its subject to a name, but you can use the let() function to get one. It behaves the same as your workaround and may read a little better:
foo.toString().let{
when(it){
"" ->'A'
"HELLO" ->'B'
"GOODBYE"->'C'
else ->it[0]
}
}

How to check if the first variable passed into a method is a string. Perl

I have no idea how to check for this. My method (or rather, an if condition inside it) should only execute if the first argument passed in is a string. I know how to check for other types, but I can't seem to find anything for checking for a string.
For a hash I would do something like;
if(ref eq 'HASH') {...}
If someone could provide a simple example I'm sure I would be able to apply it to what I'm doing. I will put up the code for the method and an explanation for the whole operational details of the method if needed.
Added Information
This is a method for handling different types of errors in the software, here are the 3 possible input formats:
$class->new("error string message")
$class->new("error string message", code => "UNABLE_TO_PING_SWITCH_ERROR")
$class->new("error string message", code => "UNABLE_TO_PING_SWITCH_ERROR", switch_ip => $ip3, timeout => $timeout)
There will always be an error message string first.
With the 1st case there is also a hashref to an error hash structure that is located in a library,
this method new will go into a template processing if the word "code" exists as an arg. where the longer detailed error message is constructed. (I already have the logic for this).
But I have to add logic so that the error message string is added to the hash, so the output is one hash, and not strings.
The second case is very similar to the first, where there are parameters eg. switch_ip , which are inserted into the string using a similar template processing logic, (already have this too).
So I think the first and second cases can be handled in the same way, but I'm not sure, so separated them in this question.
The last case is just an error message string by itself, which at the minute I just insert into a one-key message hash { message => "error string" }.
So after all that, how should I be checking or dividing up these error cases? At the minute my idea for the ones with code is to dump the arguments into a hash and just use something like:
if (exists($param{code})) { doTemplateProcess()... }
I need to ensure that there is a string passed in first though. Which was my original question. Does any of my context information help? I hope I didn't go off the topic of my question, if so I'll open this a new question. Thanks.
Error hash - located in Type.pm
use constant ERROR_CODE => {
UNABLE_TO_PING_SWITCH_ERROR => {
category => 'Connection Error:',
template => 'Could not ping switch %s in %s minutes',
tt => {template => 'disabled'},
fatal => 1,
wiki_page => 'www.error-solution.com/',
},
};
From comments:
These will be called in the software's code like so
ASC::Builder::Error->new(
"Phase x this occured because y was happening:",
code => UNABLE_TO_PING_SWITCH_ERROR,
switch_ip => $ip3,
timeout => 30,
);
Putting the wisdom of your particular problem aside and channeling Jeff Foxworthy:
If you have a scalar and it's not a reference, you might have a string.
If your non-reference scalar doesn't look like a number, it might be a string.
If your non-reference scalar looks like a number, it can still be a string.
If your non-reference scalar has a different string and number value, it might be a dualvar.
You know that your argument list is just that: a list. A list is a collection of scalar values. A scalar can be a reference or not a reference. I think you're looking for the not a reference case:
die "You can't do that" if ref $first_argument;
Past that, you'd have to do fancier things to determine if it's the sort of value that you want. This might also mean that you reject objects that pretend to be strings through overloading and whatnot.
Perhaps you can make the first argument part of the key-value pairs that you pass. You can then access that key to extract the value and delete it before you use the remaining pairs.
The only thing you can easily check is whether the error string is a simple scalar value or a reference. You would do that with ref, but you must consider what you want to do if the first parameter isn't a string.
You should write your constructor in the ASC::Builder::Error package along these lines:
sub new {
my $class = shift;
my ($error, %options) = @_;
die if ref $error;
bless { string => $error }, $class;
}
This example simply dies, and so kills the program, if it is called with anything other than a simple string or number as the first parameter.
You may call it as
ASC::Builder::Error->new('error')
or
ASC::Builder::Error->new(42)
and all will be well. If you try
ASC::Builder::Error->new('message', 'code')
then you will see a warning
Odd number of elements in hash assignment
And you may make that warning fatal
If there is anything more then you should explain
Supporting all of the following is simple:
$class->new("s")
$class->new("s", code => "s")
$class->new("s", code => "s", switch_ip => "s", timeout => "s")
All you need is the following:
sub new {
my ($class, $msg, %opts) = @_;
...
}
You can use checks such as the following to examine what the caller provided:
if (exists($opts{code}))
if (defined($opts{code}))
if ($opts{code})
Despite saying that the string will always be provided, you now ask how to check if it was provided. As such, you are probably trying to perform validation rather than polymorphism. You shouldn't waste your time doing this.
Let's look at the hash reference example you gave. ref($arg) eq 'HASH' is wrong. That returns false for some hash references, and it returns false for some things that act like a reference to a hash. The following is a more proper check:
eval { %$arg; 1 }
The equivalent for strings would be the following:
eval { "$arg"; 1 }
Unfortunately, it will always return true! Every value can act as a string. That means the best thing you can do is simply to check if any argument is provided.
use Carp qw( croak );
croak("usage") if !@_;
It's rare for Perl subs to perform argument validation. Not only is it tricky, it's also expensive, and it provides very little benefit. Bad or missing arguments usually result in exceptions or warnings shortly after.
You might see suggestions to use croak("usage") if ref($arg); (or worse, die if ref($arg);), but keep in mind that those will cause the rejection of perfectly fine objects that overload stringification (which is somewhat common), and they will fail to detect the problem with ASC::Builder::Error->new(code => ...) because code produces a string. Again, performing type-based argument validation is an expensive and buggy practice in Perl.

What is the proper way of calling a method in Groovy

In fact, my issue is a lot broader than I could explain in the title. I'm having a problem with understanding a code in Groovy that's supposed to be fairly easy to understand. Please have a look at the following piece of code.
// event handlers are passed the event itself
1:def contactHandler(evt) {
2: log.debug "$evt.value"
3:
4: // The contactSensor capability can be either "open" or "closed"
5: // If it's "open", turn on the light!
6: // If it's "closed" turn the light off.
7: if (evt.value == "open") {
8: switch1.on();
9: } else if (evt.value == "closed") {
10: switch1.off();
11: }
12:}
I can understand everything after line 2, but if lines 8 and 10 show the proper way of calling a method, then what the heck is going on in line 2? I can understand that log.debug means the "debug" function of the class called "log" (or something similar). But what is that blank space after it? And more than that, why does it say "$evt.value" when it could simply say "evt.value", as in lines 7 and 9? And why isn't there a semicolon at the end of the line? I know they're optional, but as far as I can see there's a convention as to when to use them and when not to. And lastly, I have a stranger piece of code which is totally insane (to me, of course):
11: section ("When the door opens/closes...") {
12: input "contact1", "capability.contactSensor",
13: title: "Where?"
14: }
How should I understand the line starting from 12?
I've taken a look at http://groovy.codehaus.org/ but couldn't decide what to look for in order to find an explanation.
Ok, so from the beginning:
In groovy you can omit parentheses () when calling a method with arguments. So
log.debug 'lol'
is exactly the same as:
log.debug('lol')
Since on() and off() don't take any arguments, the parentheses are required; otherwise the calls would be read as accesses to on and off properties. The blank separates the method name from its arguments.
"evt.value" vs "$evt.value": they are not the same. The first is a plain string literal and would print the text evt.value, while the second is a GString, which interpolates the value property of the evt object. That value might be open or closed, as the rest of the code shows.
Semicolons are optional, that's all I can say. I have no idea why they are there on lines 8 and 10. Sometimes a semicolon is needed, e.g. to separate statements in a one-liner:
items.collect { print it; it*it }
Starting from line no.12 it's also a method call. It's equal to:
input("contact1", "capability.contactSensor", title: "Where?")
Groovy collects the named argument title: "Where?" into a map and passes it as the first parameter; the two strings follow as the second and third parameters.
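To make the parameter order concrete, here is roughly what that call looks like when written out in Java, where the map has to be built by hand. This is a sketch only: NamedArgsDemo and this input method are hypothetical stand-ins, and the real method's signature and return value may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NamedArgsDemo {
    // Hypothetical stand-in for the real input method: the named arguments
    // arrive as a map in the first parameter, followed by the positional
    // arguments in their original order.
    static String input(Map<String, Object> options, String name, String type) {
        return name + " (" + type + ") " + options;
    }

    public static void main(String[] args) {
        // Groovy: input "contact1", "capability.contactSensor", title: "Where?"
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("title", "Where?");
        System.out.println(input(options, "contact1", "capability.contactSensor"));
        // prints: contact1 (capability.contactSensor) {title=Where?}
    }
}
```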
Further reading:
Methods - look also for named parameters.
Optionality
All docs

how to handle conditionally existing components in action code?

This is another problem I am facing while migrating from antlr3 to antlr4. This problem is with the java action code for handling conditional components of rules. One example is shown below.
The following grammar+code worked in antlr3. Here, if the unary operator is not present, then a value of '0' is returned, and the java code checks for this value and takes appropriate action.
exprUnary returns [Expr e]
: (unaryOp)? e1=exprAtom
{if($unaryOp.i==0) $e = $e1.e;
else $e = new ExprUnary($unaryOp.i, $e1.e);
}
;
unaryOp returns [int i]
: '-' {$i = 1;}
| '~' {$i = 2;}
;
In antlr4, this code results in a null pointer exception during a run, because 'unaryOp' is 'null' if it is not present. But if I change the code like below, then antlr generation itself reports an error:
if($unaryOp==null) ...
java org.antlr.v4.Tool try.g4
error(67): missing attribute access on rule reference 'unaryOp' in '$unaryOp'
How should the action be coded for antlr4?
Another example of this situation is in if-then-[else] - here $s2 is null in antlr4:
ifStmt returns [Stmt s]
: 'if' '(' e=cond ')' s1=stmt ('else' s2=stmt)?
{$s = new StmtIf($e.e, $s1.s, $s2.s);}
;
NOTE: question 16392152 provides a solution to this question with listeners, but I am not using listeners, my requirement is for this to be handled in the action code.
There are at least two potential ways to correct this:
The "ANTLR 4" way to do it is to create a listener or visitor instead of placing the Java code inside of actions embedded in the grammar itself. This is the only way I would even consider solving the problem in my own grammars.
If you still use an embedded action, the most efficient way to check if the item exists or not is to access the ctx property, e.g. $unaryOp.ctx. This property resolves to the UnaryOpContext you were assuming would be accessible by $unaryOp by itself.
ANTLR expects you access an attribute. Try its text attribute instead: $unaryOp.text==null
