Reading .dfb file with rust throws invalid character error

Reading .dfb file with rust throws invalid character error - rust

I am new to rust and creating a POC to convert dbf file to csv. I am reading a .dbf file using rust library dbase.
The issue is, when i crate a sample .dbf file using dbfview the code works fine. But when i use .dbf file which i will be using in real time. I am getting the following error.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidFieldType('M')', src/libcore/result.rs:999:5
Here is the code i am using from the given link.
use dbase::FieldValue;
let records = dbase::read("tests/data/line.dbf").unwrap();
for record in records {
for (name, value) in record {
println!("{} -> {:?}", name, value);
match value {
FieldValue::Character(string) => println!("Got string: {}", string),
FieldValue::Numeric(value) => println!("Got numeric value of {}", value),
_ => {}
}
}
}
I think the ^M shows the character appended by windows.
What can i do to handle this error and read the file successfully.
Any help will be much appreciated.

The short answer to your question is no, you will not be able to read this file with dbase-rs (or any current library) and you'll most likely have to rework this file to not contain a memo field.
A deep dive into the DBF file format
The InvalidFieldType error points at a structural feature of the file that your library cannot handle - a Memo field. We're going to deep-dive into the file to figure out why that is, and whether there is anything we can do to fix it.
This is the header definition:
Of particular importance is byte 28 (offset 0000010, byte 0C), which is a bitmask indicating if the table contains a bunch of possible things, most notably:
0x01 if the file comes with an associated .cdx file
0x02 if it contains a memo
0x04 if the file is actually a .dbc file (a database)
At 0x03, your file comes with both an associated .cdx file and contains a memo. As we know (ahead of time) that dbase-rs does not handle that, that's looking increasingly more likely.
Let's keep looking. From here on, each field is 32 bytes long.
Here are your fields:
Bytes 0-10 contain the field name, byte 11 is the type. Due to how the library you wanted to use can only parse certain fields, we only really care about byte 11.
In order of appearance by what the library can parse:
[x] CALL_ID (integer)
[x] CONTACT_ID (integer)
[x] CALL_DATE (DateTime)
[x] SUBJECT (char[])
[ ] NOTES (memo)
The last field is the problematic one. Looking into the library itself, this field type is not supported and will therefore yield an Error, which you are trying to unwrap(). This is the source of your error.
There are two three ways around it:
The "long" way is to patch the library to handle memo fields. This sounds easy, but in practice it really isn't. As the memos are stored in another file (typically a dbt file in the same folder), you're going to have to make that library read both files and reference both. The point of the memo type itself is to store more than 255 bytes of data in a field. You are the only one able to evaluate whether this work is worth the effort.
If your data is less than 255 bytes in size, you can replace that memo field with a char field, and dbfview should allow you to do this
If your field is longer than 255 bytes and you have access to the ability to run sub-processes (i.e. Command::run), you can sneak-convert it using a library that can process Memo fields in another language. this nodeJS library can, but read-only, for example.

Related

How to convert model.tflite to model.cc and model.h on Windows 10

I have created a TensorFlow Lite .tflite model which I plan to use on a microcontroller. However, this file must be converted to a C source file, i.e, a TensorFlow Lite for Microcontrollers model. TensorFlow documentation provides a simple way to convert to a C array with the unix command xxd. I am using Windows 10 and do not have access to the unix command and there are no alternative Windows methods documented. After searching superuser, I saw that xxd for Windows now exists. I downloaded the command and ran it on my .tflite model. The results were different than the hello world example.
First, the hello world example model.h file has a comment that say it was "Automatically created from a TensorFlow Lite flatbuffer using the command: xxd -i model.tflite > model.cc" When I ran the command, model.h was not "automatically created".
Second, comparing the model.cc file from the hello world example, with the model.cc file that I generated, they are quite different and I'm not sure how to interpret this (I'm not referring to the differences in the actual array). Again, in the example model.cc file, it states that it was "automatically created" using the xxd command. Line 28 in the example is alignas(8) const unsigned char g_model[] = { and line 237 is const int g_model_len = 2488;. In comparison, the equivalent lines in the file I generated are unsigned char _________g_model[] = { and unsigned int _________g_model_len = 4009981;
While I am not a C expert, I am not sure how to interpret the differences in the files and if I have generated the model.cc file incorrectly. I would greatly appreciate any insight or guidance here on how to properly generate both the model.h and model.cc files from the original model.tflite file.

After doing some experiments, I think this is why you are getting differences:
xxd replaces any non-letter/non-digit character of the path to the input file by an underscore ('_'). Apparently you called xxd with a path for the input file that has 9 such leading characters, perhaps something like "../../../g.model". The syntax of C allows only letters (a to z, A to Z), digits (0 to 9) and underscore as characters of objects' names, and the names need to start with a non-digit. This is the only "manipulation" xxd does to the name of an input file.
Since xxd knows nothing about TensorFlow, it could not had generated the copyright notice. Using this as indication, any other difference had been inserted by other means by the TensorFlow authors, despite the statement "Automatically created from a TensorFlow Lite flatbuffer ...". This could be done manually or by a script, unfortunately I did not find any hint in some quick research on their repository. Apparently the statement means just the data values.
So you need to edit your result:
Add any comment you see fit.
Add the compiler-specific alignas(8) to the array, if your compiler supports it.
Add the keywords const to the array and the length variable. This will tell the compiler to prohibit any write access. And probably this will place the data in read-only memory.
Rename array and length variables to g_model and g_model_len, respectively. Most probably TensorFlow expects these names.
Copy "model.cc" into "model.h", and then apply more editions, as the example demonstrated.
Don't be bothered by different values. Different contents of the model's file are the reason. It's especially simple to check the length variable, it has to have exactly the same value as the size of the input file.
EDIT:
On line 28 which is this text alignas(8) const unsigned char as shown in the example converted model. When I attempt to convert a model (whether it's my custom model or the "hello_world.tflite" example model) the text that would be on line 28 is unsigned char (any other text on that line is not in question). How is line 28 edited & explained?
Concerning the "how": I firmly believe that the authors of TensorFlow literally used an editor (an IDE or a stand-alone program like Notepad++ or Geany) and edited the line, or used some script to automate this.
The reason for alignas(8) is most probably that TensorFlow expects the data with an alignment of 8 bytes, for example because it casts the byte array to a structure that contains values of 8 bytes width.
The insertion of const will also commonly locate the model in read-only memory, which is preferable on most microcontrollers. If it were left out, the model's data were not only writable, but would be located in precious RAM.
On line 237, the text specifically is const int. When I attempt to convert a model (whether it's my custom model or the "hello_world.tflite" example model) the text that would be on line 237 is unsigned int (any other text on that line is not in question). Why are these two lines different in these specific places? It makes me believe that xxd on Windows is not functioning the same?
Again, I firmly believe this was edited manually or by a script. TensorFlow might expect this variable to be of data type int, but any xxd I tried (Windows and Linux) generates unsigned int. I don't think that your specific version of xxd functions differently on Windows.
For const the same thoughts apply as above.
Finally, when I attempt to convert the example model "hello_world.tflite" file using the xxd for windows utility, my resulting array doesn't match the example "hello_world.cc" file. I would expect the array values to be identical if the xxd worked. The last question is how to generate the "model.h" and "model.cc" files on Windows.
Did you note that the model you link is in another branch of the repository?
If I use the branch on GitHub as in your link to "hello_world.cc", I find in "../train/README.md" this archive hello_world_2020_12_28.zip. I unpacked it and ran xxd on the included "model.tflite". The result's data match the included "model.cc" in the archive. But it does not match the data of "hello_world.cc" in the same branch that you linked. The difference is already there.
My conclusion is, that the example result was not generated from the example model. This happens, since developers sometimes don't pay enough attention on what they commit. Yes, it's unfortunate, as it irritates and frustrates beginners like you.
But, as I wrote, don't let this make you headaches. Try the simple example, use the documentation as instructions on the process. Look at the differences in specific data as a quirk. You will encounter such things time after time when working with other's projects. It is quite normal.

How do `Map<String, String>` get to know, the `String` endings in `MethodChannel` arguments

If dart and kotlin code communicate through binary(array of 8-bit integers (0-255)), then how does String end or even int end is represented in, or determined from binary sequence of bytes, is there some special charCode or something else.
Also is there a way to save a List<int> as-it-is to a file.txt, so it can be read directly to List<int> instead of serialization.
Please guide this new dev,
Thanking you...

Since Flutter handles the MethodChannel, in both the Dart side and Kotlin side, it can be allowed to have its own internal protocol to communicate between the native layer and Flutter. In theory they could use JSON but they are probably using something else based on the supported types and also making it more efficient: https://docs.flutter.dev/development/platform-integration/platform-channels?tab=type-mappings-kotlin-tab#codec
For saving a List<int> to a file, you need to determine how you want to encode the content in the file and then how you want to decode it. It can be as simply as just saving each number separated by comma or encode the list into JSON.
If your list of numbers can be represented with Uint8List or Int8List, then you can basically just save the numbers as raw bytes to the file and then read them again.
But List<int> is a list of 64-bit numbers and you should therefore determine how you want to encode this exactly.
For writing to files, there are several ways to do it but the specific way depends on what you exactly want. So without any more details I can just suggest you check the API: https://api.dart.dev/stable/2.17.3/dart-io/File-class.html

Can I get all custom section from a WebAssembly in Node.JS?

I see there is way i can get a specific custom section with given name.
var nameSections = WebAssembly.Module.customSections(module, "sec_name");
Is there any way I can get all the custom sections in a given WebAssembly module?

The JavaScript interface at this time does not offer that. The customSection function was discussed and approved here. There is currently an opened issue to address the question you asked here. Generally this issue status at this time is that providing such an API would bring too much complexities (not so much about the sections as a list, but about the content of each). That makes sense to me, because the WebAssembly standard is currently in very active development (many post-MVP features are in progress).
That said, it seems that you are left off to parse the binary by yourself. The binary format is documented here. Basically, after you parse the magic number and the version there is a list of sections: 1 byte ID, one u32 (LEB128 format with allowed trailing zeroes) for the section's byte content length and then are the bytes of the section's content (with that length). After this header and the content bytes, is the next section, if any, till the end.
The custom sections have an id of 0. The content of these sections starts with a name and after it are the section "user" bytes. This name is a vector of bytes. Each vector has a length (again u32) followed by that number of bytes that must be the UTF-8 encoded string.
So the pseudo-code is something like this:
parse the magic
parse the version
loop till there are bytes
read the section id
read the section u32 length
if id = 0 then
read the name u32 length
skip the name bytes
use the user bytes till the end of the section
jump to the end of the section

Writing a raw binary structure to file in D?

I'm trying to have a binary file which contains several binary records defined in some struct. However, I do cannot seem to find how to do it. Looking at other examples, I've managed to write strings without problems, but not struct. I just want to write it like I would in C with fwrite(3), but in D version 2.
Here is what I've tried so far:
using stream.write(tr) - writes human readable/debug representation
using stream.rawWrite(tr) - this sounded like what I need, but fails to compile with:
Error: template std.stdio.File.rawWrite cannot deduce function from
argument types !()(TitleRecord), candidates are:
/usr/lib/ldc/x86_64-linux-gnu/include/d/std/stdio.d(1132): std.stdio.File.rawWrite(T)(in T[] buffer)
trying rawWrite as above, but casting data to various things, also never compiles.
even trying to get back to C with fwrite, but can't get deep enough to get file descriptor I need.
Reading the docs has not been very helpful (writing strings works for me too, but not writing struct). I'm sure there must be simple way to do it, but I'm not able to find it.... Other SO questions did not help me. I D 1.0, it might have been accomplished with stream.writeExact(&tr, tr.sizeof) but that is no longer an option.
import std.stdio;
struct TitleRecord {
short id;
char[49] text;
};
TitleRecord tr;
void main()
{
auto stream = File("filename.dat","wb+");
tr.id = 1234;
tr.text = "hello world";
writeln(tr);
//stream.write(tr);
//stream.rawWrite(tr);
//stream.rawWrite(cast(ubyte[52]) tr);
//stream.rawWrite(cast(ubyte[]) tr);
//fwrite(&tr, 4, 1, stream);
}

For this that error is saying it expects an array not a struct. So one easy way to do it is to simply slice a pointer and give that to rawWrite:
stream.rawWrite((&tr)[0 .. 1]);
The (&tr) gets the address, thus converting your struct to a pointer. Then the [0 .. 1] means get a slice of it from the beginning, grabbing just one element.
Thus you now have a T[] that rawWrite can handle containing your one element.
Be warned if you use the #safe annotation this will not pass, you'd have to mark it #trusted. Also of course any references inside your struct (including string) will be written as binary pointers instead of data as you surely know from C experience. But in the case you showed there you're fine.
edit: BTW you could also just use fwrite if you like, copy/pasting the same code over from C (except it is foo.sizeof instead of sizeof foo). The D File thing is just a small wrapper around C's FILE* and you can get the original FILE* back out to pass to the other functions with stream.getFP() http://dpldocs.info/experimental-docs/std.stdio.File.getFP.html )

rawWrite expects an array, but there are many workarounds.
One is to create a single element array.
file.rawWrite([myStruct]);
Another one is casting the struct into an array. My library called bitleveld has a function for that called reinterpretAsArray. This also makes it easy to create checksums of said structs.
Once in a while I've encountered issues with alignment using this method, so be careful. Could be fixed by changing the align property of the struct.

How to store binary data in a Lua string

I needed to create a custom file format with embedded meta information. Instead of whipping up my own format I decide to just use Lua.
texture
{
format=GL_LUMINANCE_ALPHA;
type=GL_UNSIGNED_BYTE;
width=256;
height=128;
pixels=[[
<binary-data-here>]];
}
texture is a function that takes a table as its sole argument. It then looks up the various parameters by name in the table and forwards the call on to a C++ routine. Nothing out of the ordinary I hope.
Occasionally the files fail to parse with the following error:
my_file.lua:8: unexpected symbol near ']'
What's going on here?
Is there a better way to store binary data in Lua?
Update
It turns out that storing binary data is a Lua string is non-trivial. But it is possible when taking care with 3 sequences.
Long-format-string-literals cannot have an embedded closing-long-bracket (]], ]=], etc).
This one is pretty obvious.
Long-format-string-literals cannot end with something like ]== which would match the chosen closing-long-bracket.
This one is more subtle. Luckily the script will fail to compile if done wrong.
The data cannot embed \n or \r.
Lua's built in line-end processing messes these up. This problem is much more subtle. The script will compile fine but it will yield the wrong data. 0x13 => 0x10, 0x1013 => 0x10, etc.
To get around these limitations I split the binary data up on \r, \n, then pick a long-bracket that works, finally emit Lua that concats the various parts back together. I used a script that does this for me.
input: XXXX\nXX]]XX\r\nXX]]XX]=
texture
{
--other fields omitted
pixels= '' ..
[[XXXX]] ..
'\n' ..
[=[XX]]XX]=] ..
'\r\n' ..
[==[XX]]XX]=]==];
}

Lua is able to encode most characters in long bracket format including nulls. However, Lua opens the script file in text mode and this causes some problems. On my Windows system the following characters have problems:
Char code(s) Problem
-------------- -------------------------------
13 (CR) Is translated to 10 (LF)
13 10 (CR LF) Is translated to 10 (LF)
26 (EOF) Causes "unfinished long string near '<eof>'"
If you are not using windows than these may not cause problems, but there may be different text-mode based problems.
I was only able to produce the error you received by encoding multiple close brackets:
a=[[
]]] --> a.lua:2: unexpected symbol near ']'
But, this was easily fixed with the following:
a=[==[
]]==]

The binary data needs to be encoded into printable characters. The simplest method for decoding purposes would be to use C-like escape sequences for all bytes. For example, hex bytes 13 41 42 1E would be encoded as '\19\65\66\30'. Of course, then the encoded data is three to four times larger than the source binary.
Alternatively, you could use something like Base64, but that would have to be decoded at runtime instead of relying on the Lua interpreter. Personally, I'd probably go the Base64 route. There are Lua examples of Base64 encoding and decoding.
Another alternative would be have two files. Use a well defined image format file (e.g. TGA) that is pointed to by a separate Lua script with the additional metadata. If you don't want two files to move around then they could be combined in an archive.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string