fwrite writes more bytes than it's told - visual-c++

I'm writing an unsigned char buffer to file (C++):
FILE* f = fopen("out.data","wb");
size_t count = fwrite((const void *)pBuf, sizeof(unsigned char), dl, f);
When I read it back, I get more bytes than the 'dl' I expect. Does anyone know why?
There was a similar question where the cause was 'fopen(...,"w")' instead of 'fopen(...,"wb")', but as shown above I am already opening in binary mode.
I read the file using Matlab (I tried both 'r' and 'rb' in Matlab's fopen), in case that has something to do with it.
Thanks!

Alright! A concise question, and here is my best bet:
What is the value of the dl variable?
What does pBuf point to, and how many bytes does it logically refer to?
Have you checked the return value? fwrite returns the number of elements written, which in this case would be bytes (see the sketch below).
After closing the file, have you checked the actual size of the file, and NOT the size of the file on-disk? Does it match dl and/or count?
How did you open the file when reading the contents back later: in text mode or binary mode?
PS: Try to be more expressive and put more relevant detail in your question.
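
For points 3-5, here is a minimal self-check sketch (pBuf and dl stand in for the buffer and length from the question; everything else is illustrative):

#include <cstdio>

int checkWrite(const unsigned char* pBuf, size_t dl) {
    FILE* f = fopen("out.data", "wb");      // "wb": binary mode, no newline translation
    if (!f) { perror("fopen"); return 1; }
    size_t count = fwrite(pBuf, sizeof(unsigned char), dl, f);
    fclose(f);
    printf("elements written: %zu (expected %zu)\n", count, dl);

    FILE* g = fopen("out.data", "rb");      // re-open in binary mode to measure
    fseek(g, 0, SEEK_END);
    long size = ftell(g);                   // actual number of bytes in the file
    fclose(g);
    printf("file size: %ld\n", size);       // should equal dl
    return 0;
}

If count and the measured size both equal dl, the write side is fine and the extra bytes are appearing on the read side.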

Related

Reading .dbf file with Rust throws invalid character error

I am new to Rust and am creating a POC to convert a dbf file to csv. I am reading the .dbf file using the Rust library dbase.
The issue is, when I create a sample .dbf file using dbfview, the code works fine. But when I use the .dbf file that I will actually be using, I get the following error:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidFieldType('M')', src/libcore/result.rs:999:5
Here is the code I am using, from the given link:
use dbase::FieldValue;

let records = dbase::read("tests/data/line.dbf").unwrap();
for record in records {
    for (name, value) in record {
        println!("{} -> {:?}", name, value);
        match value {
            FieldValue::Character(string) => println!("Got string: {}", string),
            FieldValue::Numeric(value) => println!("Got numeric value of {}", value),
            _ => {}
        }
    }
}
I think the ^M shows a character appended by Windows.
What can I do to handle this error and read the file successfully?
Any help will be much appreciated.
The short answer to your question is no, you will not be able to read this file with dbase-rs (or any current library) and you'll most likely have to rework this file to not contain a memo field.
A deep dive into the DBF file format
The InvalidFieldType error points at a structural feature of the file that your library cannot handle - a Memo field. We're going to deep-dive into the file to figure out why that is, and whether there is anything we can do to fix it.
Of particular importance in the header is byte 28 (offset 0x1C), which is a bitmask indicating whether the table contains a bunch of possible things, most notably:
0x01 if the file comes with an associated .cdx file
0x02 if it contains a memo
0x04 if the file is actually a .dbc file (a database)
Your file has 0x03 there, so it both comes with an associated .cdx file and contains a memo. As we know (ahead of time) that dbase-rs does not handle memos, that diagnosis is looking increasingly likely.
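If you want to verify this against your own copy of the file, here is a minimal sketch that reads that flag byte directly (the path is reused from the snippet above):

use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    let mut f = File::open("tests/data/line.dbf")?;
    f.seek(SeekFrom::Start(28))?;          // byte 28 (0x1C) holds the table flags
    let mut flags = [0u8; 1];
    f.read_exact(&mut flags)?;
    println!("has .cdx:  {}", (flags[0] & 0x01) != 0);
    println!("has memo:  {}", (flags[0] & 0x02) != 0);
    println!("is a .dbc: {}", (flags[0] & 0x04) != 0);
    Ok(())
}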
Let's keep looking. After the header, each field descriptor is 32 bytes long.
In each descriptor, bytes 0-10 contain the field name and byte 11 is the type. Since the library you want to use can only parse certain field types, byte 11 is what we really care about.
Here are your fields, in order of appearance, marked by whether the library can parse them:
[x] CALL_ID (integer)
[x] CONTACT_ID (integer)
[x] CALL_DATE (DateTime)
[x] SUBJECT (char[])
[ ] NOTES (memo)
The last field is the problematic one. Looking into the library itself, this field type is not supported and will therefore yield an Error, which you are trying to unwrap(). This is the source of your error.
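Before looking at workarounds, note that you can at least fail gracefully instead of panicking. A minimal sketch using the same dbase calls as the snippet above (the path is again from the question):

fn main() {
    match dbase::read("tests/data/line.dbf") {
        Ok(records) => {
            for record in records {
                for (name, value) in record {
                    println!("{} -> {:?}", name, value);
                }
            }
        }
        // InvalidFieldType('M') lands here instead of aborting the program
        Err(e) => eprintln!("could not read dbf: {}", e),
    }
}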
There are three ways around it:
The "long" way is to patch the library to handle memo fields. This sounds easy, but in practice it really isn't. As the memos are stored in another file (typically a dbt file in the same folder), you're going to have to make that library read both files and reference both. The point of the memo type itself is to store more than 255 bytes of data in a field. You are the only one able to evaluate whether this work is worth the effort.
If your data is less than 255 bytes in size, you can replace that memo field with a char field; dbfview should allow you to do this.
If your field is longer than 255 bytes and you are able to run sub-processes (e.g. via std::process::Command), you can sneak-convert it using a library in another language that can process memo fields; this nodeJS library can, for example, though read-only.

How can I find the length of a _bstr_t object using windbg on a user-mode memory dump file?

I have a dump file that I am trying to extract a very long string from. I find the thread, then find the variable and dump part of it using the following steps:
1. ~1s
2. dv /v, which returns:
   00000000`07a4f6e8 basicString = class _bstr_t
3. dt -n basicString
Command 3 truncates the string in the debugging console to just a fraction of its actual contents.
What I would like to do is find the actual length of the _bstr_t variable so that I can dump its contents out to a file with a command like the following:
.writemem c:\debugging\output\string.txt 07a4f6e8 L<StringByteLength>
So my question is how can I determine what I should put in for StringByteLength?
Your .writemem line is pretty close to what you need already.
First, you'll need the correct address of the string in memory. 07a4f6e8 is the address of the _bstr_t, so writing memory at that address won't do any good.
_bstr_t is a pretty complicated type, but ultimately it holds a BSTR member called m_wstr (inside its internal Data_t object, reachable through m_Data).
We can store its address in a register like so:
r? @$t0 = @@c++(basicString.m_Data->m_wstr)
As Igor Tandetnik's comment says, the length of a BSTR can be found in the 4 bytes preceding it.
Let's put that into a register as well:
r? @$t1 = *(DWORD*)(((BYTE*)@$t0)-4)
And now, you can writemem using those registers.
.writemem c:\debugging\output\string.txt @$t0 L?@$t1
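
For reference, that 4-byte prefix is the same value SysStringByteLen reports. A small standalone C++ illustration of the layout (a sketch for a Windows build, separate from the debugging session):

#include <comutil.h>    // _bstr_t (link with comsuppw.lib)
#include <oleauto.h>    // SysStringByteLen
#include <cstdio>

int main() {
    _bstr_t s(L"hello");
    BSTR b = s.GetBSTR();
    // A BSTR keeps its length in bytes in the 4 bytes immediately
    // preceding the character data.
    unsigned int prefix = *reinterpret_cast<unsigned int*>(
        reinterpret_cast<unsigned char*>(b) - 4);
    printf("%u %u\n", prefix, SysStringByteLen(b));   // both print 10
}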

Is there a C-runtime function equivalent to fscanf that takes the same parameter list?

Hi, I have a loop like this:
while(fscanf(fp, "\n%d\t%s\t%s\t%X%X\t%d\t \n",
             &record.Index, record.Name, record.Empcode,
             &record.CSN_MSB, &record.AccessRights) != EOF)
{
    printf("\nIndex: %d\nEmployee Name: %s\nEmpcode: %s\nCSN: %X\nAccessRights: %d\n",
           record.Index, record.Name, record.Empcode,
           record.CSN_MSB, record.AccessRights);
    sprintf(CSN_MSB_LSB, "%X", record.CSN_MSB);
    if (strncmp(CSN_MSB_LSB, str, 8) == 0)
        found = 1;
}
In this code, fscanf reads only one line from the file pointer fp; I want to read all the lines in the file.
How can I do this with the same fscanf function, or with an alternative that takes the same parameter list? Please suggest.
I would try something of the sort:
while(fscanf(fp, "%d%s%s%X%X%*[^\n]%*c",
             &record.Index, record.Name, record.Empcode,
             &record.CSN_MSB, &record.AccessRights) != EOF)
{
Though, it is worth noting that your format string scans 6 items while you pass only 5 destinations: the last %d never gets stored, which is undefined behavior. The "%*[^\n]" says scan until a newline without storing anything, taking the place of that last number, and the "%*c" consumes the newline itself. Also, comparing the return value against the number of items you expect assigned (5 here) is safer than comparing against EOF, since a line that fails to match makes fscanf return a short count without ever reaching EOF. See this.
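
A complete sketch along those lines; the struct layout and file name are assumptions inferred from the question's format string, not the asker's real definitions:

#include <stdio.h>

/* Hypothetical layout inferred from the question's format string;
   the real field types may differ. */
struct Record {
    int Index;
    char Name[64];
    char Empcode[64];
    unsigned int CSN_MSB;
    unsigned int AccessRights;   /* read with %X in the question */
};

/* Discard the rest of the line, including the unstored sixth column. */
static void skip_line(FILE *fp) {
    int c;
    while ((c = fgetc(fp)) != '\n' && c != EOF) { }
}

int main(void) {
    FILE *fp = fopen("records.txt", "r");   /* illustrative file name */
    if (!fp) { perror("fopen"); return 1; }

    struct Record record;
    /* Compare with the number of assigned items, not EOF: on a malformed
       line fscanf returns a short count without reaching EOF, and a
       "!= EOF" loop would spin forever. */
    while (fscanf(fp, "%d%63s%63s%X%X",
                  &record.Index, record.Name, record.Empcode,
                  &record.CSN_MSB, &record.AccessRights) == 5) {
        printf("Index: %d Name: %s Empcode: %s CSN: %X AccessRights: %X\n",
               record.Index, record.Name, record.Empcode,
               record.CSN_MSB, record.AccessRights);
        skip_line(fp);
    }
    fclose(fp);
    return 0;
}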

Detect partial or incomplete characters read from a buffer

In a loop I am reading a UTF-8 encoded stream, 10 bytes (say) per iteration. As the stream is passed into a buffer first, I must specify its read length in bytes before converting it to a UTF-8 string. The issue I am facing is that sometimes the read ends on a partial, incomplete character, and I need to handle this.
Is there a way to detect whether a string ends with an incomplete character, or some check I can perform on the last character of my string to determine this?
Ideally the solution would not be tied to a single encoding.
If a buffer ends with an incomplete character and you convert it into a string and then initialize a new buffer from that string, the new buffer will be a different length (longer if you're using utf8, shorter if you're using ucs2) than the original.
Something like:
var b2 = new Buffer(buf.toString('utf8'), 'utf8');
if (b2.length !== buf.length) {
    // buffer ends with an incomplete character
} else {
    // buffer is OK
}
Substitute your desired encoding for 'utf8'.
Note that this is dependent on how the current implementation of Buffer#toString deals with incomplete characters, which isn't documented, though it's unlikely to be changed in a way that would result in equal-length buffers (a future implementation might throw an error instead, so you should probably wrap the code in a try-catch block).
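
A runnable sketch of this check, assuming Node.js and substituting the modern Buffer.from API for the deprecated Buffer constructor used above:

// Returns true if `buf` ends mid-character for the given encoding.
function endsWithPartialChar(buf, encoding = 'utf8') {
    // Round-trip through a string: an incomplete trailing character is
    // altered during decoding, so the re-encoded buffer changes length.
    const roundTripped = Buffer.from(buf.toString(encoding), encoding);
    return roundTripped.length !== buf.length;
}

// The 3-byte UTF-8 character '€' (e2 82 ac) cut off after 2 bytes:
console.log(endsWithPartialChar(Buffer.from([0xe2, 0x82])));   // true
console.log(endsWithPartialChar(Buffer.from('abc')));          // false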

How are variable length arguments parsed while parsing dalvik instructions?

Both move vA, vB and move-wide vA, vB involve the same operations and the same operands; apart from the opcodes, everything is the same. I am in a situation where I need to print the operands used by the instructions in an application.
So when I see the instruction move vA, vB, I should print move va(*contents of va*), vb(*contents of vb*).
This works fine for 4-byte registers. But when I encounter a move-wide instruction, I should print the contents of vA together with the contents of the next virtual register, and the contents of vB together with the contents of the next virtual register. What is the standard way of parsing these?
Since both Dalvik and dx are open source, the best way to answer questions involving dex files is to inspect their source. Dx parses instructions in DecodedInstruction.java. It decodes the opcode first, and then uses the opcode to inform decoding the rest of the instruction.
public static DecodedInstruction decode(CodeInput in) throws EOFException {
    int opcodeUnit = in.read();
    int opcode = Opcodes.extractOpcodeFromUnit(opcodeUnit);
    InstructionCodec format = OpcodeInfo.getFormat(opcode);
    return format.decode(opcodeUnit, in);
}
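
In other words, the opcode alone tells you whether an operand names a single register or the low half of a register pair. A hedged sketch of the printing logic (the helper and the register-array representation are illustrative, not dx's actual API):

// Hypothetical sketch, not part of dx: print a move's operands, widening
// to a register pair for move-wide.
class MovePrinter {
    static void printMove(int opcode, int vA, int vB, int[] regs) {
        boolean wide = (opcode == 0x04);   // 0x04 = move-wide in the Dalvik opcode table
        if (wide) {
            // A wide value occupies the pair (vN, vN+1); this assumes the
            // low 32 bits sit in the first register of the pair.
            long a = ((long) regs[vA + 1] << 32) | (regs[vA] & 0xFFFFFFFFL);
            long b = ((long) regs[vB + 1] << 32) | (regs[vB] & 0xFFFFFFFFL);
            System.out.printf("move-wide v%d(%d), v%d(%d)%n", vA, a, vB, b);
        } else {
            System.out.printf("move v%d(%d), v%d(%d)%n", vA, regs[vA], vB, regs[vB]);
        }
    }

    public static void main(String[] args) {
        int[] regs = {0, 0x01234567, 0x89ABCDEF, 0, 0};
        printMove(0x01, 3, 1, regs);   // move: one register per operand
        printMove(0x04, 3, 1, regs);   // move-wide: pairs (v3,v4) and (v1,v2)
    }
}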
