How are variable-length arguments parsed when decoding Dalvik instructions?

Both move vA, vB and move-wide vA, vB involve the same operations and the same operands; apart from the opcode, everything is the same. I need to print the operands used by the instructions in an application.
So when I see the instruction move vA, vB, I should print move vA(*contents of vA*), vB(*contents of vB*).
This works fine for 4-byte registers. But when I encounter a move-wide instruction, I should print the contents of vA and of the next virtual register, and the contents of vB and of the next virtual register. What is the standard way of parsing these?

Since both Dalvik and dx are open source, the best way to answer questions about dex files is to inspect their source. Dx parses instructions in DecodedInstruction.java. It decodes the opcode first and then uses the opcode to determine how to decode the rest of the instruction.
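public static DecodedInstruction decode(CodeInput in) throws EOFException {
    // Read the first 16-bit code unit of the instruction.
    int opcodeUnit = in.read();
    // The opcode is extracted from that first unit.
    int opcode = Opcodes.extractOpcodeFromUnit(opcodeUnit);
    // The opcode selects the instruction format (register layout, length).
    InstructionCodec format = OpcodeInfo.getFormat(opcode);
    // The format-specific codec decodes the operands.
    return format.decode(opcodeUnit, in);
}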
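Here is how this applies to move and move-wide. Both opcodes use format 12x, a single 16-bit code unit laid out as B|A|op, so vA and vB are extracted in exactly the same way. Only the opcode tells you that move-wide operates on the register pairs vA, vA+1 and vB, vB+1.

The C sketch below is not dx code. It applies the same opcode-first idea to just these two opcodes. The opcode values 0x01 and 0x04 come from the Dalvik bytecode reference. RegFile and format_move are hypothetical names, and the register values are assumed to come from wherever your tool observes them.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Opcode values of the two 12x-format moves in the Dalvik bytecode spec. */
#define OP_MOVE      0x01u
#define OP_MOVE_WIDE 0x04u

/* Hypothetical register snapshot holding the values you want to print.
   4-bit register numbers go up to v15, and a wide pair can reach v16. */
typedef struct {
    uint32_t regs[17];
} RegFile;

/* Decodes one 16-bit code unit in format 12x (layout B|A|op) and prints
   its operands into out. Returns snprintf's result, or -1 if the opcode
   is not one of the two handled here. */
int format_move(uint16_t unit, const RegFile *rf, char *out, size_t outlen)
{
    unsigned op = unit & 0xFFu;         /* low byte: opcode            */
    unsigned a  = (unit >> 8) & 0xFu;   /* bits 8..11: destination vA  */
    unsigned b  = (unit >> 12) & 0xFu;  /* bits 12..15: source vB      */

    if (op == OP_MOVE)
        return snprintf(out, outlen, "move v%u(%" PRIu32 "), v%u(%" PRIu32 ")",
                        a, rf->regs[a], b, rf->regs[b]);

    if (op == OP_MOVE_WIDE)
        /* Same encoding; the opcode alone says each operand is the pair
           vN, vN+1, so print both halves of each pair. */
        return snprintf(out, outlen,
                        "move-wide v%u(%" PRIu32 ") v%u(%" PRIu32 "), "
                        "v%u(%" PRIu32 ") v%u(%" PRIu32 ")",
                        a, rf->regs[a], a + 1, rf->regs[a + 1],
                        b, rf->regs[b], b + 1, rf->regs[b + 1]);

    return -1;
}
```

For example, the unit 0x4204 decodes as move-wide with vA = v2 and vB = v4, so the sketch prints v2, v3, v4 and v5.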

Related

How can I find the length of a _bstr_t object using windbg on a user-mode memory dump file?

I have a dump file that I am trying to extract a very long string from. I find the thread, then find the variable and dump part of it using the following steps:
1. ~1s
2. dv /v, which returns:
   00000000`07a4f6e8 basicString = class _bstr_t
3. dt -n basicString
Command 3 truncates the string in the debugging console to just a fraction of its actual contents.
What I would like to do is find the actual length of the _bstr_t variable so that I can dump its contents out to a file with a command like the following:
.writemem c:\debugging\output\string.txt 07a4f6e8 L<StringByteLength>
So my question is how can I determine what I should put in for StringByteLength?
Your .writemem line is pretty close to what you need already.
First, you'll need the correct address of the string in memory. 07a4f6e8 is the address of the _bstr_t, so writing memory at that address won't do any good.
_bstr_t is a pretty complicated type, but ultimately it holds (through its m_Data pointer) a BSTR member called m_wstr.
We can store its address in a pseudo-register like so:
r? @$t0 = @@c++(basicString.m_Data->m_wstr)
As Igor Tandetnik's comment says, the length of a BSTR is stored in the 4 bytes immediately preceding it. That length is in bytes and does not count the terminating null.
Let's put that into a register as well:
r? @$t1 = *(DWORD*)(((BYTE*)@$t0)-4)
And now you can run .writemem using those registers:
.writemem c:\debugging\output\string.txt @$t0 L?@$t1
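The length-prefix layout that makes this work can be sketched in portable C. This is a simplified stand-in, not the Windows API. fake_bstr_alloc, fake_bstr_byte_len and fake_bstr_free are hypothetical helpers, and uint16_t stands in for a 16-bit wchar_t.

The point is that the 32-bit prefix holds the byte count of the string data, excluding the terminating null. That is why it can be used directly as the byte length for .writemem.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BSTR allocator. It lays out
   [uint32 byte length][UTF-16 code units][16-bit null terminator]
   and returns a pointer to the first code unit, as a BSTR does. */
uint16_t *fake_bstr_alloc(const uint16_t *src, uint32_t nchars)
{
    uint32_t nbytes = nchars * (uint32_t)sizeof(uint16_t);
    unsigned char *block = malloc(sizeof nbytes + (size_t)nbytes + sizeof(uint16_t));
    if (block == NULL)
        return NULL;

    memcpy(block, &nbytes, sizeof nbytes);      /* length prefix, in bytes */
    uint16_t *data = (uint16_t *)(block + sizeof nbytes);
    memcpy(data, src, nbytes);                  /* string data */
    data[nchars] = 0;                           /* terminator, not counted */
    return data;
}

/* Same arithmetic as the r? line above: read the 4 bytes before the data. */
uint32_t fake_bstr_byte_len(const uint16_t *bstr)
{
    uint32_t nbytes;
    memcpy(&nbytes, (const unsigned char *)bstr - sizeof nbytes, sizeof nbytes);
    return nbytes;
}

/* Frees the whole block, starting from the hidden length prefix. */
void fake_bstr_free(uint16_t *bstr)
{
    if (bstr != NULL)
        free((unsigned char *)bstr - sizeof(uint32_t));
}
```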

What does this line mean? (System::Call GetDiskFreeSpaceEx)

System::Call '${sysGetDiskFreeSpaceEx}(r0,.,,.r1)'
If I'm right, r0 is the directory name, followed by the free bytes, the total number of bytes and the total number of free bytes. But what does r0,.,,.r1 actually mean?
Thx for the help!
${...} is a define, so when you want to know how something works, the first thing you should do is find out what the define expands to: !error "${sysGetDiskFreeSpaceEx}" will print !error: kernel32::GetDiskFreeSpaceEx(t, *l, *l, *l) i
In the System readme you will find this nugget:
PARAMS, RETURN and OPTIONS can be repeated many times in one Get/Call
line. When repeating, a lot can be omitted, and only what you wish to
change can be used. Type, source and/or destination can be omitted for
each parameter, even the return value. Options can be added or
removed. This allows you to define function prototypes and save on
some typing.
So ${sysGetDiskFreeSpaceEx} is a prototype that specifies the parameter count and types but it does not specify parameter source and destination.
What is the parameter syntax?
The parameters list is separated by commas. Each parameter is combined
of three values: type, source and destination. Type can be an integer,
a string, etc. Source, which is the source of the parameter value, can
be a NSIS register ($0, $1, $INSTDIR), the NSIS stack, a concrete
value (5, "test", etc.) or nothing (null). Destination, which is the
destination of the parameter value after the call returns, can be a
NSIS register, the NSIS stack or nothing which means no output is
required. Either one of source or destination can also be a dot (`.')
if it is not needed.
We can now expand the entire call !error 'System::Call "${sysGetDiskFreeSpaceEx}(r0,.,,.r1)"' and this gives us !error: System::Call 'kernel32::GetDiskFreeSpaceEx(t, *l, *l, *l) i(r0,.,,.r1)'
If we merge the repeated parameter definitions we get kernel32::GetDiskFreeSpaceEx(tr0, *l., *l, *l.r1)i.
So parameter 1 is a string (LPTSTR on MSDN) with the source r0 (NSIS register $0).
Parameters 2 and 3 have no source and no destination. Only parameter 2 uses a . (dot), but the end result is the same: no input and no output. The only important part here is *l, so that the System plugin knows how large the parameter is.
The final parameter is a pointer (*) to a 64 bit number (l) with no input (.) and we request the output to be stored in $1 (r1).
The system plugin calls the native Windows API so it is often useful to look at MSDN to see what it has to say about the function you are calling.
Given that $0..$9 and $R0..$R9 are NSIS registers, the System plugin uses the notation r0..r9 (resp. r10..r19, or also R0..R9) to specify the $0..$9 (resp. $R0..$R9) registers as a source and/or destination in system API or other DLL function calls.
Either one of source or destination can also be a dot (.) if it is not needed.
Look for the "Calling functions" and "Available sources and destinations" sections in the system plugin documentation.
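The register mapping described in the last answer is mechanical, so it can be written down as a small lookup. The C sketch below only illustrates that rule. system_reg_to_nsis is a hypothetical name, not part of NSIS or the System plugin, and the sketch covers only the numbered registers, not the other sources and destinations the documentation lists.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Maps a System plugin register token to the NSIS register it names:
   r0..r9 -> $0..$9, r10..r19 -> $R0..$R9, R0..R9 -> $R0..$R9.
   Writes the name into out and returns 0, or returns -1 if tok is
   not one of these tokens. */
int system_reg_to_nsis(const char *tok, char *out, size_t outlen)
{
    char *end;
    long n;

    /* Must be 'r' or 'R' followed by at least one digit. */
    if ((tok[0] != 'r' && tok[0] != 'R') || tok[1] < '0' || tok[1] > '9')
        return -1;
    n = strtol(tok + 1, &end, 10);
    if (*end != '\0')
        return -1;

    if (tok[0] == 'r' && n <= 9)
        snprintf(out, outlen, "$%ld", n);
    else if (tok[0] == 'r' && n <= 19)
        snprintf(out, outlen, "$R%ld", n - 10);
    else if (tok[0] == 'R' && n <= 9)
        snprintf(out, outlen, "$R%ld", n);
    else
        return -1;
    return 0;
}
```

With this rule, r0 in the call above names $0 and r1 names $1.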

How to use strstrip for parsing a string in two parts

I would like to know how to parse a string like "hello world" into "helloworld" using the strstrip kernel function. I am developing a Linux kernel char device, and this function causes a kernel panic (or kernel oops).
The way I'm using this function is the following:
char result[100];
strcpy(result, "hello world");
strstrip(result);
strstrip(&result); //Also tried this
strstrip("100+200"); //Also tried this
The Kernel error is caused as soon as the strstrip line gets executed. What is the proper way to call this function?
Actually, strstrip removes leading and trailing whitespace. It does not remove the whitespace within the string.
Please look at the example below.
char result[100];
char *trimmed;
strcpy(result, " hello world from stack exchange");
printk("\n before: %s", result);
trimmed = strstrip(result); /* points into result, past the leading space */
printk("\n after: %s", trimmed);
Hope it helps.
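To see why the return value matters, here is a userspace sketch modeled on the documented behavior of strim(). Trailing whitespace is removed by writing a terminating null into the buffer, and the returned pointer skips past any leading whitespace. my_strim is a hypothetical name, and this is not the kernel source.

Because the result can point into the middle of the original buffer, it is safest to use the returned pointer directly. Copying it back over the same array with strcpy is unsafe, because strcpy's arguments must not overlap.

```c
#include <ctype.h>
#include <string.h>

/* Userspace sketch modeled on the kernel's strim(): modifies s in place
   and returns a pointer that may differ from s. */
char *my_strim(char *s)
{
    size_t len = strlen(s);

    /* Cut trailing whitespace by moving the terminator left. */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';

    /* Skip leading whitespace; the caller must use this pointer. */
    while (isspace((unsigned char)*s))
        s++;
    return s;
}
```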
strstrip() is a wrapper function for strim() (http://lxr.linux.no/linux+v3.11.2/lib/string.c#L361) in modern kernels. Because it modifies the string in place, you cannot call it with a string literal, as in your third attempt.
Your second attempt passes &result, which has type char (*)[100] (a pointer to the whole array) rather than the char * the function expects, as you can see at the link above.
Your first attempt should not cause a kernel error, but you do not appear to be storing the return value in a local variable. What kind of error are you receiving? I will update this answer if you can provide that information.
In the end, though, as Balamurugan A points out, this function does not do what you seem to think it does. strsep() (http://lxr.linux.no/linux+v3.11.2/lib/string.c#L485) may help you here, but it will only be a stepping stone to removing all the spaces. You will actually have to copy the string into a new buffer word by word, as there is no way to simply "shift memory contents", as it were.
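If the goal is the one from the question ("hello world" into "helloworld"), a plain loop that copies only the non-whitespace characters does the job without strstrip or strsep. The sketch below is a userspace C illustration, and remove_spaces is a hypothetical helper. In kernel code you would take isspace() from <linux/ctype.h> rather than <ctype.h>.

The loop copies character by character into a destination buffer. Each character is written at or before the position it was read from, so it also works when dst and src are the same buffer.

```c
#include <ctype.h>
#include <stddef.h>

/* Copies src into dst, skipping every whitespace character, and returns
   the length of the result. dst must hold at least strlen(src) + 1 bytes;
   dst may also be the same buffer as src. */
size_t remove_spaces(char *dst, const char *src)
{
    size_t j = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        if (!isspace((unsigned char)src[i]))
            dst[j++] = src[i];
    }
    dst[j] = '\0';
    return j;
}
```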

fwrite writes more bytes than it's told

I'm writing an unsigned char buffer to file (C++):
FILE* f = fopen("out.data","wb");
size_t count = fwrite((const void *)pBuf, sizeof(unsigned char), dl, f);
When I read it back, I get more bytes than the dl I expect. Does anyone know why?
There was a similar question where the cause was 'fopen(...,"w")' instead of 'fopen(...,"wb")'.
I read the file using Matlab (I tried both 'r' and 'rb' in Matlab's fopen), in case that has something to do with it.
Thanks !
Alright! A concise question, and here is my best bet:
What is the value of the dl variable?
What does pBuf point to, and how many bytes does it logically refer to?
Have you checked the return value? fwrite returns the number of elements written, which in this case is the number of bytes.
After closing the file, have you checked its actual size (NOT its size on disk)? Does it match dl and/or count?
How did you open the file and read its contents later: in text mode or binary mode?
PS: Try to be more expressive and include more relevant detail in your question.
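Several of these checks can be combined into one small test program. The C sketch below writes dl bytes in binary mode and checks both fwrite's return value and fclose's result. It then reopens the file in binary mode and counts the bytes actually stored.

write_and_measure is a hypothetical helper. It tests the C side only: if it reports dl bytes but Matlab still reads more, the discrepancy is likely on the reading side.

```c
#include <stdio.h>

/* Hypothetical helper: writes dl bytes from buf to path in binary mode,
   then reopens the file in binary mode and counts the bytes stored.
   Returns that count, or -1 if any step fails. */
long write_and_measure(const char *path, const unsigned char *buf, size_t dl)
{
    FILE *f = fopen(path, "wb");   /* "wb": no newline translation */
    if (f == NULL)
        return -1;

    size_t count = fwrite(buf, sizeof(unsigned char), dl, f);
    /* fclose flushes buffered data, so its result matters too */
    if (fclose(f) != 0 || count != dl)
        return -1;

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    unsigned char chunk[4096];
    long total = 0;
    size_t n;
    while ((n = fread(chunk, 1, sizeof chunk, f)) > 0)
        total += (long)n;
    fclose(f);
    return total;
}
```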

PostScript: how to convert an integer to a string?

In PostScript, the cvs *operator* is said to convert a number to a string. How should I use it?
I tried :
100 100 moveto
3.14159 cvs show
or
100 100 moveto
3.14159 cvs string show
but it didn't work.
Any help?
Try 3.14159 20 string cvs show.
string needs a size and leaves the created string on the stack. cvs needs a value and a string to store the converted value.
If you're doing lots of string conversions, it may be more efficient to create one string and reuse it in each conversion:
/s 20 string def
3.14159 s cvs show
tldr;
A common idiom is to use a literal string as a template.
1.42857
(       ) cvs show
more...
You can even do formatted output by presenting cvs with various substrings of a larger string.
%0123456.......
(2/7 =        ) dup 6 7 getinterval
2.85714 exch cvs pop show
But the Ghostscript Style Guide forbids this, and it's pretty much the only published PostScript style guide we have. (A discussion about this in comp.lang.postscript.) So a common recommendation is to allocate a fresh string when you need it and let the garbage collector earn its keep.
4.28571 7 string cvs show
Freshly allocating a string can be very important if you're wrapping this action in a procedure.
/toString { (          ) cvs } def
% vs
/toString { 10 string cvs } def
If you allocate a fresh string, the enclosing procedure can be treated as a pure function of its inputs. If you use an embedded literal string as the buffer, the resulting string is state-dependent and will be overwritten if the generating procedure is run again.
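The same state dependency shows up in C, which may make the distinction easier to see. This is an analogue, not PostScript, and both function names are hypothetical.

The shared version returns a pointer to one static buffer, like the literal string embedded in the first toString, so a second call overwrites the first result. The fresh version allocates a new buffer per call, like 10 string cvs. In C the caller must free that buffer, whereas PostScript leaves it to the garbage collector.

```c
#include <stdio.h>
#include <stdlib.h>

/* Like /toString { (          ) cvs } def: every call writes into the
   same buffer, so a later call overwrites an earlier result. */
const char *to_string_shared(int n)
{
    static char buf[16];
    snprintf(buf, sizeof buf, "%d", n);
    return buf;
}

/* Like /toString { 10 string cvs } def: each call gets its own buffer,
   so results stay valid independently. Returns NULL if allocation fails;
   the caller must free() the result. */
char *to_string_fresh(int n)
{
    char *buf = malloc(16);
    if (buf != NULL)
        snprintf(buf, 16, "%d", n);
    return buf;
}
```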
too much, don't do this...
As a last resort, the truly lazy hacker will hijack =string, the built-in 128-byte buffer used by = and == to output numbers (using, of course, our friend cvs). This is interpreter-specific and not portable according to the standard.
5.71428 =string cvs show
And if you like that one, you can combine it with ='s other trick: immediately evaluated names.
{ 7.14285 //=string cvs show } % embed =string in this procedure
This shaves that extra microsecond off, and makes it much harder to interactively inspect the code. Calling == on this procedure will not reveal the fact that you are using =string; it looks just like any other string.
Using =string in this manner inherits all the state-dependency problems described in the last section, ramped up a notch because there is only one =string buffer. It also adds a portability issue to boot. =string is non-standard, albeit available in historical Adobe implementations and Ghostscript, so it is a legacy hack and should be used only where a legacy hack is appropriate.
something else, no one (here) asked for...
One more trick for the bag, from a post by Helge Blischke in comp.lang.postscript. This is a simple way to get a zero-padded integer.
/bindec % <integer> bindec <string_of_length_6>
{
1000000 add 7 string cvs 1 6 getinterval
}bind def
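The arithmetic behind this trick is language-independent. Adding 1000000 to a value between 0 and 999999 always yields a 7-digit number starting with 1, so dropping the first character leaves exactly six zero-padded digits.

Here is a C transcription, for illustration only; bindec is reused as a hypothetical C function name. In C itself you would normally just use the %06d format.

```c
#include <stdio.h>
#include <string.h>

/* Mirrors "1000000 add 7 string cvs 1 6 getinterval".
   Valid for 0 <= n <= 999999; out must hold at least 7 bytes. */
void bindec(int n, char out[7])
{
    char tmp[12];                              /* room for any int */
    snprintf(tmp, sizeof tmp, "%d", n + 1000000);
    memcpy(out, tmp + 1, 7);                   /* 6 digits + '\0' */
}
```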
