How to convert model.tflite to model.cc and model.h on Windows 10

I have created a TensorFlow Lite .tflite model which I plan to use on a microcontroller. However, this file must be converted to a C source file, i.e., a TensorFlow Lite for Microcontrollers model. The TensorFlow documentation provides a simple way to convert to a C array with the unix command xxd. I am using Windows 10, which does not have this unix command, and no alternative Windows methods are documented. After searching Super User, I saw that xxd for Windows now exists. I downloaded the command and ran it on my .tflite model. The results were different from the hello world example.
First, the hello world example model.h file has a comment that says it was "Automatically created from a TensorFlow Lite flatbuffer using the command: xxd -i model.tflite > model.cc". When I ran the command, model.h was not "automatically created".
Second, comparing the model.cc file from the hello world example with the model.cc file that I generated, they are quite different, and I'm not sure how to interpret this (I'm not referring to the differences in the actual array). Again, the example model.cc file states that it was "automatically created" using the xxd command. Line 28 in the example is alignas(8) const unsigned char g_model[] = { and line 237 is const int g_model_len = 2488;. In comparison, the equivalent lines in the file I generated are unsigned char _________g_model[] = { and unsigned int _________g_model_len = 4009981;
While I am not a C expert, I am not sure how to interpret the differences in the files and if I have generated the model.cc file incorrectly. I would greatly appreciate any insight or guidance here on how to properly generate both the model.h and model.cc files from the original model.tflite file.

After doing some experiments, I think this is why you are getting differences:
xxd replaces every character of the input file's path that is not a letter or digit with an underscore ('_'). Apparently you called xxd with a path for the input file that has 9 such leading characters, perhaps something like "../../../g.model". C allows only letters (a to z, A to Z), digits (0 to 9) and underscores in identifiers, and a name must not start with a digit. This is the only "manipulation" xxd does to the name of an input file.
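As a quick illustration, the mangling rule can be sketched in Python (this mirrors xxd's observed behaviour; it is not taken from xxd's source):
import re

def xxd_identifier(path):
    # Every character of the path that is not a letter or digit becomes '_'.
    return re.sub(r'[^A-Za-z0-9]', '_', path)

print(xxd_identifier('../../../g.model'))  # prints: _________g_model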
Since xxd knows nothing about TensorFlow, it could not have generated the copyright notice. Taking this as an indication, every other difference must have been inserted by other means by the TensorFlow authors, despite the statement "Automatically created from a TensorFlow Lite flatbuffer ...". This could have been done manually or by a script; unfortunately, a quick search of their repository turned up no hint either way. Apparently the statement refers just to the data values.
So you need to edit your result (or script the whole conversion; see the sketch after this list):
Add any comment you see fit.
Add alignas(8) to the array, if your compiler supports it.
Add the const qualifier to both the array and the length variable. This tells the compiler to prohibit any write access, and it will most likely place the data in read-only memory.
Rename the array and length variables to g_model and g_model_len, respectively. Most probably TensorFlow expects these names.
Copy "model.cc" into "model.h", and then apply further edits, as the example demonstrates.
Don't be bothered by different values; different contents of the model files are the reason. The length variable is especially simple to check: it must have exactly the same value as the size of the input file in bytes.
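If you'd rather not make these edits by hand for every model, the whole conversion is easy to script. Below is a minimal sketch in Python; it assumes only the g_model/g_model_len declarations shown in the example, and the output file names and 12-bytes-per-line layout are my own choices, not anything TensorFlow prescribes.
from pathlib import Path

def convert(tflite_path):
    data = Path(tflite_path).read_bytes()
    # Format the bytes as a C array body, 12 values per line.
    rows = []
    for i in range(0, len(data), 12):
        rows.append('  ' + ', '.join('0x%02x' % b for b in data[i:i + 12]) + ',')
    # model.cc: include the header first so const g_model gets external linkage.
    Path('model.cc').write_text(
        '#include "model.h"\n\n'
        'alignas(8) const unsigned char g_model[] = {\n'
        + '\n'.join(rows) + '\n};\n'
        + 'const int g_model_len = %d;\n' % len(data))
    # model.h: just the two declarations.
    Path('model.h').write_text(
        '#ifndef MODEL_H_\n#define MODEL_H_\n\n'
        'extern const unsigned char g_model[];\n'
        'extern const int g_model_len;\n\n'
        '#endif  // MODEL_H_\n')

convert('model.tflite')
Since g_model_len is written from len(data), the size check suggested above comes for free.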
EDIT:
On line 28, the example converted model shows the text alignas(8) const unsigned char. When I attempt to convert a model (whether it's my custom model or the "hello_world.tflite" example model), the text that would be on line 28 is unsigned char (any other text on that line is not in question). How is line 28 edited, and what explains the difference?
Concerning the "how": I firmly believe that the authors of TensorFlow literally used an editor (an IDE or a stand-alone program like Notepad++ or Geany) and edited the line, or used some script to automate this.
The reason for alignas(8) is most probably that TensorFlow expects the data with an alignment of 8 bytes, for example because it casts the byte array to a structure containing values 8 bytes wide.
The insertion of const will also commonly locate the model in read-only memory, which is preferable on most microcontrollers. If it were left out, the model's data would not only be writable, but would also occupy precious RAM.
On line 237, the text specifically is const int. When I attempt to convert a model (whether it's my custom model or the "hello_world.tflite" example model), the text that would be on line 237 is unsigned int (any other text on that line is not in question). Why are these two lines different in these specific places? It makes me believe that xxd on Windows is not functioning the same way.
Again, I firmly believe this was edited manually or by a script. TensorFlow might expect this variable to be of data type int, but any xxd I tried (Windows and Linux) generates unsigned int. I don't think that your specific version of xxd functions differently on Windows.
For const the same thoughts apply as above.
Finally, when I attempt to convert the example model "hello_world.tflite" using the xxd for Windows utility, my resulting array doesn't match the example "hello_world.cc" file. I would expect the array values to be identical if xxd worked correctly. The last question is how to generate the "model.h" and "model.cc" files on Windows.
Did you notice that the model you link to is in another branch of the repository?
If I use the branch on GitHub from your link to "hello_world.cc", I find in "../train/README.md" the archive hello_world_2020_12_28.zip. I unpacked it and ran xxd on the included "model.tflite". The result's data match the "model.cc" included in the archive, but they do not match the data of "hello_world.cc" in the same branch that you linked. The difference is already present there.
My conclusion is that the example result was not generated from the example model. This happens; developers sometimes don't pay enough attention to what they commit. Yes, it's unfortunate, as it irritates and frustrates beginners like you.
But, as I wrote, don't let this give you headaches. Try the simple example, and use the documentation as instructions on the process. Regard the differences in specific data as a quirk. You will encounter such things time after time when working with others' projects. It is quite normal.

Related

Is there a way to compare the format in which a line of a text file was written in Fortran?

I'm developing a Fortran program that must obtain some data from a text file and generate another text file using specific data from the first one.
The input file has many lines written in several specific formats, all of which I know. Although I know the formats, the lines in this file are generated in a "random" way.
It would be much easier to generate the output file if I could test the format in which each line was written; then I would know exactly what data I can get from that line of the input file to use in the output file.
What I need is something like the following: knowing that the format of the line read and stored in the LINHA variable is described by the FORMATO variable, do something like:
IF (FORMATO == '(1X,I5,3F8.1,2(5A,1X))') THEN
    READ (LINHA, '(6X,F8.1)') my_variable
END IF
Because there might be another format such as
'(6A, 2F8.1, F8.6,2 (6A))'
in which, if I used the same READ statement, I would still read an F8.1 value into my_variable, but it would not be the correct one.
A (not so elegant) workaround that I can think of is to read the entire line using the advance='no' option of read() and parse each character of the line separately. While doing so, you may count white spaces or other specific characters that you know of and identify the different formats from there.
It would be helpful if you could specify the nature of the task in more detail.
The best option is to read each line without a format, keeping it in a character variable. Then read that variable as an internal file with the required format, using the IOSTAT specifier to check whether the format is the correct one.
INTEGER, PARAMETER :: max_size = 80
CHARACTER(LEN=max_size) :: line
INTEGER :: ios
READ(*,'(A)') line
READ(line,'(1X,I5,3F8.1,2(5A,1X))',IOSTAT=ios) var1, var2, ...
Problem solved using a mixture of some of the suggestions posted.
I read each line of the input file into an internal variable (RLINFILE) using the format '(A165)'. After that, I read the contents of that string into several dummy variables, using the format that I knew the lines of interest were written in. If the read succeeds (IOSTAT = 0), the line I just read is the correct one for the information I wanted, so I store the contents of the dummy variables that represent the values of interest. In the code, the solution looked something like this:
OPEN(UNIT=LU1,FILE=RlinName,STATUS='OLD')
ilin = 0
formato = '(14X,A,1X,F7.1,1X,F7.1,5X,A,1X,A,1X,A,5X,A,I5,1X,A,I3,3F8.1,A,A,A,1X,A,2(1X,F8.2),1X,A,1X,A)'
DO WHILE (.TRUE.)
   ! Read the raw line as plain text first.
   READ(LU1,'(A165)',END=300) RLINFILE
   ! Re-read it as an internal file with the expected format.
   READ(RLINFILE,formato,IOSTAT=linhaok) dum2_a1,dum2_f1,dum2_f2,dum2_a2,dum2_a3,dum2_a4,dum2_a5,dum2_i1,dum2_a6,dum2_i2,dum2_f3,dum2_f4,dum2_f5,dum2_a7,dum2_a8,dum2_a9,dum2_a10,dum2_f6,dum2_f7,dum2_a11,dum2_a12
   ! IOSTAT = 0 means the line matched the format, so keep its values.
   IF(linhaok.EQ.0) THEN
      ilin = ilin+1
      rlin_lshu(ilin) = dum2_a4
      rlin_nbpa(ilin) = dum2_i1
      rlin_ncir(ilin) = dum2_i2
      rlin_ppij(ilin) = dum2_f3
      rlin_pqij(ilin) = dum2_f4
      rlin_tapn(ilin) = dum2_a7
   END IF
END DO
300 CLOSE(UNIT=LU1)
The description of the problem you are trying to solve is a bit vague to me, but the simplest solution that comes to mind is to modify the original code that generates the input file so that it writes the Fortran READ format used before each data line. This way, you can read the format as a string and use it in the subsequent data IO in your second code.
If you describe the specific task you're trying to accomplish in more detail, perhaps more experienced Fortranners could help.

Fortran90 cray writing unformatted array using "*"

I have a program, written in Fortran 90, that is writing an array to a file, but for some reason is using an asterisk to represent multiple columns:
8*9, 4, 2*9, 4
later on reading from the file I am getting I/O errors:
lib-4190 : UNRECOVERABLE library error
A numeric input field contains an invalid character.
Encountered during a list-directed READ from unit 10 Fortran unit 10 is connected to a sequential formatted text file:
Does anyone have any idea why this is happening, and if there is a flag to feed to the compiler to prevent it? I'm using the Cray Fortran compiler, and the write statement looks like this:
write (lun,*) nsf_species(bundle%species(1:bundle%n_prim))
Update:
The line reading in the data file looks like:
read (lun,*) Info(ifile)%alpha_i(1:size)
I have checked to ensure that it is this line that is causing the problem.
This compression of list-directed output is a very useful feature of the Cray Compilation Environment when writing out large amounts of data. This compressed output will, however, not be read in correctly, as you point out (which is less useful).
You can modify this behaviour, not using a compiler flag but by using the "assign" command.
Consider this sample code:
PROGRAM test
   IMPLICIT NONE
   ! The unit number must be given a value before it is used in OPEN.
   INTEGER :: u = 10
   OPEN(UNIT=u,FILE="f1",FORM="FORMATTED",STATUS="UNKNOWN")
   WRITE(u,*) 0,0,0
   CLOSE(u)
   OPEN(UNIT=u,FILE="f2",FORM="FORMATTED",STATUS="UNKNOWN")
   WRITE(u,*) 0,0,0
   CLOSE(u)
END PROGRAM test
We first build with CCE and execute. Files f1 and f2 both contain the compressed output form:
$ ftn -o test.x test.F90
$ ./test.x
$ cat f1
3*0
$ cat f2
3*0
Now we will use "assign" to modify the format in file f2. First we need to define a filename to hold the assign information:
$ export FILENV=my_filenv
Now we use assign to switch off the compressed output for file f2:
$ assign -y on f:f2
Now we rerun the experiment (without needing to recompile):
$ ./test.x
$ cat f1
3*0
$ cat f2
0, 0, 0
There are options to do this for all files, for certain filename patterns or many other cases.
There are other things that assign can do. See "man assign" with PrgEnv-cray loaded for more details.
The write statement is using list-directed formatting (it is still a formatted output statement; "formatted" means "formatted such that a human can read it"), as specified by the * inside the parenthesised part of the statement. The rules for list-directed output give a great deal of freedom to the compiler. Typically, if you actually care about the details of the output, you should provide an explicit format.
One of the rules that does apply is that the resulting output should generally be suitable for list-directed input. But there are some rather surprising rules for what is permitted as input for list-directed formatting. One such feature is that you can specify, in the input text, a repeat count for an input value, using the syntax repeat*value.
The compiler has noticed that there are repeat values in the output, so it has used this repeat count feature.
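To make the compressed form concrete, here is a small sketch in Python that expands it (simplified: real list-directed input also allows separators other than commas):
def expand_list_directed(text):
    # '8*9' means the value 9 repeated 8 times; a bare '4' is a single value.
    values = []
    for field in text.split(','):
        field = field.strip()
        if '*' in field:
            count, value = field.split('*')
            values.extend([int(value)] * int(count))
        else:
            values.append(int(field))
    return values

print(expand_list_directed('8*9, 4, 2*9, 4'))
# [9, 9, 9, 9, 9, 9, 9, 9, 4, 9, 9, 4]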
I don't know why you get an error message when reading the file under list directed input - as the line you show is a valid input line for list directed input. Make sure that the line causing the error is actually the line that you show.
A simple workaround would be to change the write statement so that it does not use the compressed format, e.g. change to:
write (lun,'(*(I5))') nsf_species(bundle%species(1:bundle%n_prim))
The '*' allows an arbitrary number of repeats of the specified format and should suppress the compressed output format.
However, if the compiler writes output in the compressed format, then it should be able to read that same compressed format back in. Hopefully the helpdesk will be able to get to the root of why that does not work.

How can I access the bit representation of a file using Scheme?

If I had a file called raw_text.txt, is there a way I could iterate through each bit?
I see the following but am confused about how to use it:
http://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/File-Manipulation.html
— procedure: file-attributes/mode-string attributes
The mode string of the file, a newly allocated string showing the file's mode bits. Under unix, this string is in unix format. Under Windows, this string shows the standard “DOS” attributes in their usual format.
EDIT: I am using mit-scheme
It's implementation-specific. On the Racket side of things, there are a few libraries:
http://planet.racket-lang.org/display.ss?package=bitsyntax.plt&owner=tonyg
http://planet.racket-lang.org/display.ss?package=bit-io.plt&owner=soegaard
You can probably use something like the binary-parse library as well: http://okmij.org/ftp/Scheme/binary-io.html, as long as your implementation of Scheme can support it.
Under MIT Scheme, you can use the bit-string functions.
I haven't actually tried to do anything with this, but I think you're looking for this section of the mit-scheme docs: Input/Output. Specifically the file ports and input procedures sections.
I didn't see anything specifically about reading the binary bits, but if it's character bytes you want, it looks like there are procedures for that. Maybe you want to do something like this?
(call-with-input-file "raw_text.txt" <procedure>)
or
(call-with-binary-input-file "raw_text.txt" <procedure>)
Where <procedure> will take the file port and use the input procedures to read things from that file.
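Whatever procedure you pass, the overall shape is the same. Purely to show the pattern (this is Python, not Scheme, so translate it to the port operations above): open the file in binary mode, read it byte by byte, and peel each byte apart into bits.
def iterate_bits(path):
    # Binary mode avoids any newline translation.
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(1)
            if not chunk:
                break
            for shift in range(7, -1, -1):  # most significant bit first
                yield (chunk[0] >> shift) & 1

for bit in iterate_bits('raw_text.txt'):
    print(bit)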
Just out of curiosity, what are you trying to do?
EDIT: It appears that someone did a write-up on this here.

Magic numbers of the Linux reboot() system call

The Linux Programming Interface has an exercise in Chapter 3 that goes like this:
When using the Linux-specific reboot() system call to reboot the system, the second argument, magic2, must be specified as one of a set of magic numbers (e.g., LINUX_REBOOT_MAGIC2). What is the significance of these numbers? (Converting them to hexadecimal provides a clue.)
The man page tells us magic2 can be one of LINUX_REBOOT_MAGIC2 (672274793), LINUX_REBOOT_MAGIC2A (85072278), LINUX_REBOOT_MAGIC2B (369367448), or LINUX_REBOOT_MAGIC2C (537993216). I failed to decipher their meaning in hex. I also looked at /usr/include/linux/reboot.h, which didn't give any helpful comment either.
I then searched in the kernel's source code for sys_reboot's definition. All I found was a declaration in a header file.
Therefore, my first question is, what is the significance of these numbers? My second question is, where's sys_reboot's definition, and how did you find it?
EDIT: I found the definition in kernel/sys.c. I only grepped for sys_reboot, and forgot to grep for the MAGIC numbers. I figured the definition must be hidden behind some macro trick, so I looked at the System.map file under /boot, and found it next to ctrl_alt_del. I then grepped for that symbol, which led me to the correct file. If I had compiled the kernel from source code, I could try to find which object file defined the symbol, and go from there.
Just a guess, but those numbers look more interesting in hex:
672274793 = 0x28121969
85072278 = 0x05121996
369367448 = 0x16041998
537993216 = 0x20112000
Developers' or developers' children's birthdays?
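If you want to reproduce the clue, two lines of Python print the hex forms above (zero-padded so the leading 05 of the second date survives):
for magic in (672274793, 85072278, 369367448, 537993216):
    print('%#010x' % magic)
Read as DDMMYYYY, those are 28 Dec 1969, 05 Dec 1996, 16 Apr 1998 and 20 Nov 2000.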
Regarding finding the syscall implementation, I did a git grep -n LINUX_REBOOT_MAGIC2 and found the definition in kernel/sys.c. The symbol sys_reboot is generated by the SYSCALL_DEFINE4(reboot, ...) gubbins, I suspect.
They are the birthdays of Linus Torvalds (the creator of the Linux kernel and of the Git version control system) and his three daughters, which work as the magic numbers to reboot the system.
http://en.wikipedia.org/wiki/Linus_Torvalds

How to store binary data in a Lua string

I needed to create a custom file format with embedded meta information. Instead of whipping up my own format I decided to just use Lua.
texture
{
    format=GL_LUMINANCE_ALPHA;
    type=GL_UNSIGNED_BYTE;
    width=256;
    height=128;
    pixels=[[
<binary-data-here>]];
}
texture is a function that takes a table as its sole argument. It then looks up the various parameters by name in the table and forwards the call on to a C++ routine. Nothing out of the ordinary I hope.
Occasionally the files fail to parse with the following error:
my_file.lua:8: unexpected symbol near ']'
What's going on here?
Is there a better way to store binary data in Lua?
Update
It turns out that storing binary data in a Lua string is non-trivial. But it is possible, taking care with three kinds of sequences.
Long-format-string-literals cannot have an embedded closing-long-bracket (]], ]=], etc).
This one is pretty obvious.
Long-format-string-literals cannot end with something like ]== which would match the chosen closing-long-bracket.
This one is more subtle. Luckily the script will fail to compile if done wrong.
The data cannot embed \n or \r.
Lua's built-in line-end processing messes these up. This problem is much more subtle: the script will compile fine, but it will yield the wrong data. A lone 13 (CR) becomes 10 (LF), the pair 13 10 (CR LF) collapses to a single 10, etc.
To get around these limitations I split the binary data up on \r and \n, pick a long-bracket form that works for each piece, and finally emit Lua that concatenates the various parts back together. I used a script that does this for me (a sketch of the idea follows the example below).
input: XXXX\nXX]]XX\r\nXX]]XX]=
texture
{
    --other fields omitted
    pixels= '' ..
        [[XXXX]] ..
        '\n' ..
        [=[XX]]XX]=] ..
        '\r\n' ..
        [==[XX]]XX]=]==];
}
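For reference, here is the kind of splitter sketched in Python (the function name is made up, and this mirrors the bracket-level search in the example above rather than reproducing my exact script):
import re

def lua_long_string(data):
    # Split on CR/LF, keeping the separators as their own chunks, because
    # Lua's text-mode line-end processing would mangle them inside a
    # long-bracket literal.
    parts = []
    for chunk in re.split(b'(\r\n|\r|\n)', data):
        if chunk == b'\r\n':
            parts.append("'\\r\\n'")
        elif chunk == b'\r':
            parts.append("'\\r'")
        elif chunk == b'\n':
            parts.append("'\\n'")
        elif chunk:
            # Pick a bracket level whose closing bracket neither occurs in
            # the chunk nor matches the chunk's tail.
            level = 0
            while (b']' + b'=' * level + b']') in chunk or \
                  chunk.endswith(b']' + b'=' * level):
                level += 1
            eq = '=' * level
            parts.append('[%s[%s]%s]' % (eq, chunk.decode('latin-1'), eq))
    return ' .. '.join(parts) if parts else "''"

print(lua_long_string(b'XXXX\nXX]]XX\r\nXX]]XX]='))
# [[XXXX]] .. '\n' .. [=[XX]]XX]=] .. '\r\n' .. [==[XX]]XX]=]==]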
Lua is able to encode most characters in long bracket format including nulls. However, Lua opens the script file in text mode and this causes some problems. On my Windows system the following characters have problems:
Char code(s) Problem
-------------- -------------------------------
13 (CR) Is translated to 10 (LF)
13 10 (CR LF) Is translated to 10 (LF)
26 (EOF) Causes "unfinished long string near '<eof>'"
If you are not using Windows these may not cause problems, but there may be different text-mode-based problems.
I was only able to produce the error you received by encoding multiple close brackets:
a=[[
]]] --> a.lua:2: unexpected symbol near ']'
But, this was easily fixed with the following:
a=[==[
]]==]
The binary data needs to be encoded into printable characters. The simplest method for decoding purposes would be to use C-like escape sequences for all bytes. For example, hex bytes 13 41 42 1E would be encoded as '\19\65\66\30'. Of course, then the encoded data is three to four times larger than the source binary.
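That encoding is mechanical; here is a sketch in Python (Lua's \ddd escapes take the byte value as up to three decimal digits):
def lua_escape(data):
    # Each byte becomes its own \ddd escape, so no digit ambiguity can
    # arise: the character after every escape is a backslash or the quote.
    return "'" + ''.join('\\%d' % b for b in data) + "'"

print(lua_escape(bytes([0x13, 0x41, 0x42, 0x1E])))  # '\19\65\66\30'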
Alternatively, you could use something like Base64, but that would have to be decoded at runtime instead of relying on the Lua interpreter. Personally, I'd probably go the Base64 route. There are Lua examples of Base64 encoding and decoding.
Another alternative would be to have two files. Use a well-defined image format file (e.g. TGA) that is pointed to by a separate Lua script containing the additional metadata. If you don't want two files to move around, they could be combined in an archive.
