How does protobuf determine whether a value belongs to an optional field or to another object?

For example, if I define a proto like this:
$cat 30.proto
message hello
{
    required int32 f1 = 1;
    required int32 f2 = 2;
    optional int32 f3 = 3;
}
Then I wondered whether protobuf can handle things like this:
I declare 3 objects, none of which has the f3 field set.
I write them to output.
Then, on the reader side, how does the reader know that these 6 values belong to 3 objects (2 fields each) rather than 2 objects (3 fields each)?
In other words, how are "required"/"optional" reflected in the encoded bytes? If they are not reflected in the byte stream, how does protobuf determine where a new object starts? We know protobuf doesn't have "delimiter" bits.
I had a simple quick test on this:
$cat 30.cpp
#include "30.pb.h"
#include <fstream>
using namespace std;

int main()
{
    fstream f("./log30.data", ios::binary | ios::out);
    hello p1, p2, p3, p4, p5;
    p1.set_f1(1);
    p1.set_f2(2);
    p2.set_f1(3);
    p2.set_f2(4);
    p3.set_f1(5);
    p3.set_f2(6);
    p1.SerializeToOstream(&f);
    p2.SerializeToOstream(&f);
    p3.SerializeToOstream(&f);
    p4.set_f1(7);
    p4.set_f2(8);
    p4.set_f3(9);
    p5.set_f1(0xa);
    p5.set_f2(0xb);
    p5.set_f3(0xc);
    p4.SerializeToOstream(&f);
    p5.SerializeToOstream(&f);
    return 0;
}
$g++ 30.cpp 30.pb.cc -lprotobuf && ./a.out && xxd log30.data
00000000: 0801 1002 0803 1004 0805 1006 0807 1008 ................
00000010: 1809 080a 100b 180c ........
My guess was that the byte stream always starts with the smallest tag number and that tag numbers increase as an object is dumped, so that when a smaller tag number is encountered, the parser treats it as the start of a new object. Just my humble guess.
Need your explanations!

(3) Then, on the reader side, how does the reader know that these 6 values
belong to 3 objects (2 fields each) rather than 2 objects (3 fields each)?
In other words, how are "required"/"optional" reflected in the encoded
bytes? If they are not reflected in the byte stream, how does protobuf
determine where a new object starts? We know protobuf doesn't have
"delimiter" bits.
Protobuf doesn't. It's up to you, the programmer, to split the messages before you feed them to protobuf.
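One common approach is to write a length prefix before each serialized message and read back exactly that many bytes (the official Java library provides writeDelimitedTo/parseDelimitedFrom with a varint prefix for the same purpose). Here is a minimal sketch in Python using a fixed 4-byte big-endian prefix; the framing scheme and function names are my own illustration, not part of protobuf:

```python
import io
import struct

def write_framed(stream, payload: bytes) -> None:
    # Prefix each serialized message with its length as a 4-byte big-endian int.
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_framed(stream):
    # Read one length-prefixed message, or return None at end of stream.
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">I", header)
    return stream.read(length)

# Round-trip three payloads standing in for SerializeToString() outputs.
buf = io.BytesIO()
for msg in (b"\x08\x01\x10\x02", b"\x08\x03\x10\x04", b"\x08\x05\x10\x06"):
    write_framed(buf, msg)
buf.seek(0)
messages = []
while (m := read_framed(buf)) is not None:
    messages.append(m)
print(messages)  # three separate messages recovered
```

With such framing the reader knows where each message ends and can hand each chunk to the protobuf parser individually.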
For example, run this program:
#include "30.pb.h"
#include <fstream>
#include <iostream>
using namespace std;

int main()
{
    fstream f("./log30.data", ios::binary | ios::out);
    hello p1, p2, p3, p4, p5;
    p1.set_f1(1);
    p1.set_f2(2);
    p2.set_f1(3);
    p2.set_f2(4);
    p3.set_f1(5);
    p3.set_f2(6);
    p1.SerializeToOstream(&f);
    p2.SerializeToOstream(&f);
    p3.SerializeToOstream(&f);
    p4.set_f1(7);
    p4.set_f2(8);
    p4.set_f3(9);
    p5.set_f1(0xa);
    p5.set_f2(0xb);
    p5.set_f3(0xc);
    p4.SerializeToOstream(&f);
    p5.SerializeToOstream(&f);
    f.close();

    f.open("./log30.data", ios::binary | ios::in);
    hello hin;
    hin.ParseFromIstream(&f);
    cout << "f1: " << hin.f1() << ", f2: " << hin.f2() << ", f3: " << hin.f3() << "\n";
    return 0;
}
You should see only the values of your last serialized hello object, as protobuf reads the whole stream and overwrites older values with newer ones.

From the documentation:
As you know, a protocol buffer message is a series of key-value pairs. The binary version of a message just uses the field's number as the key – the name and declared type for each field can only be determined on the decoding end by referencing the message type's definition (i.e. the .proto file).
When a message is encoded, the keys and values are concatenated into a byte stream. When the message is being decoded, the parser needs to be able to skip fields that it doesn't recognize. This way, new fields can be added to a message without breaking old programs that do not know about them. To this end, the "key" for each pair in a wire-format message is actually two values – the field number from your .proto file, plus a wire type that provides just enough information to find the length of the following value.
...
If a proto2 message definition has repeated elements (without the [packed=true] option), the encoded message has zero or more key-value pairs with the same tag number.
So optional fields may simply be absent from the output stream, while required fields must be present. The schema must be known on both the serialization and deserialization sides (in contrast to Avro, where the schema is embedded with the data), so validation of required/optional fields happens after deserialization, when the parser checks whether all required fields have values.
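The xxd dump above can be checked by hand. Here is a minimal sketch in Python of proto2 varint key/value decoding; it handles wire type 0 (varint) only and is not a general parser:

```python
def read_varint(data: bytes, i: int):
    """Read a base-128 varint starting at index i; return (value, next index)."""
    result = shift = 0
    while True:
        b = data[i]
        result |= (b & 0x7F) << shift
        i += 1
        if not (b & 0x80):
            return result, i
        shift += 7

def decode_fields(data: bytes):
    """Yield (field_number, value) for each varint key/value pair."""
    i = 0
    while i < len(data):
        key, i = read_varint(data, i)
        field_number, wire_type = key >> 3, key & 0x07
        assert wire_type == 0  # this sketch handles varints only
        value, i = read_varint(data, i)
        yield field_number, value

# The bytes from the xxd dump above: five concatenated "hello" messages.
data = bytes.fromhex("080110020803100408051006080710081809080a100b180c")
print(list(decode_fields(data)))
# [(1, 1), (2, 2), (1, 3), (2, 4), (1, 5), (2, 6), (1, 7), (2, 8),
#  (3, 9), (1, 10), (2, 11), (3, 12)]
```

Note that nothing in the stream marks a message boundary: field 1 simply appears again, which is why the parser treats the whole file as one message and later values overwrite earlier ones.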

Related

Python3 Struct unpack format string

I am using the Python 3 struct module to unpack byte data I extracted from a serial com. (With help) I've figured out how to unpack most of the data into human-readable form. I am having difficulty with the format string for a group header struct group_hdr (please see the attached screenshot document). I have byte data (b). I know the format character for "word" is "H", but it's unclear to me from the document what phd_status is. It hasn't been defined anywhere else in the data structure document. Any ideas? Thank you in advance.
struct group_hdr
{
union phdb_status status
word label
}
subrecord = struct.unpack_from('<??H', b)
As is explained under Status, it is a simple bitfield with a width of 32 bits. The union is probably defined elsewhere in C (or a similar language) as
union phdb_status {
unsigned int bit_0:1;
unsigned int bit_1:1;
};
The following Python code will store your values:
status, label = struct.unpack_from('<IH', b)
and you can test the individual bits of status with status & 1 and status & 2.
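A self-contained sketch of that unpack (the sample bytes here are fabricated for illustration, since the real serial data isn't shown):

```python
import struct

# Build a hypothetical 6-byte group header: a 32-bit status bitfield
# followed by a 16-bit label word, both little-endian.
b = struct.pack('<IH', 0b11, 0x1234)  # bits 0 and 1 of status set; label 0x1234

status, label = struct.unpack_from('<IH', b)
print(status & 1, status & 2, hex(label))
```

The '<IH' format consumes 4 + 2 = 6 bytes, whereas the original '<??H' would read two single bytes as booleans and therefore misalign the label.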

ISO 8583 - How are BCD Values Calculated For Fields With Subfields?

Can anyone answer how BCD data is usually calculated for fields that have subfield values?
I don't mean in terms of code, as I have that part nailed down.
What I mean is say I have field X, which is to be sent containing data for 5 sub values. The field is BCD, but would each sub-value be converted to BCD and then appended to field X or would they be added to field X in the clear and then converted as a whole to BCD?
Can't find a clear answer anywhere... not even in the message spec I'm working off of.
Cheers,
Mike K
You have to know the format of field X and the format of subfields.
Let me give you an example.
Assume that you would like to transmit EMV data from POS to host using field X.
A format for field X is described below.
Length Attribute   3 bytes    LLL, length of data to follow
Subfield 1         var bytes  First additional subfield
Subfield 2         var bytes  Second additional subfield
...
Subfield n         var bytes  nth additional subfield
The structure of each additional subfield is as follows:
Tag Name     2 bytes
Tag Length   1 byte
Tag Value    .. bytes
If a field contains subfields, then every subfield is packed or unpacked with its own format; the parent field's body is just the concatenation of the packed subfields and is not converted again as a whole.
In that case it is unnecessary to define a format for the field body, although the field header format (tag or length) can still be defined.
The following example contains a field with three subfields
Message Structure:
<f type="VAL" name="Parent" len="21">
<f type="VAL" name="Child1" bodyPacker="BcdBodyPacker" len="6"/>
<f type="VAL" name="Child2" bodyPacker="BcdBodyPacker" len="7"/>
<f type="VAL" name="Child3" bodyPacker="BcdBodyPacker" len="8"/>
</f>
Message data:
<f name="Parent">
<f name="Child1" val="111111111111"/>
<f name="Child2" val="22222222222222"/>
<f name="Child3" val="3333333333333333"/>
</f>
Message bytes in hex:
111111111111222222222222223333333333333333
The source code of the example can be found on GitHub
The iso-8583-packer Java library was used for creation of this example. I am the author of the library.
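The bytes above can be reproduced in a few lines. A sketch in Python (BCD here simply packs two decimal digits per byte; the values follow the XML example above):

```python
def bcd_pack(digits: str) -> bytes:
    """Pack a string of decimal digits into BCD, two digits per byte."""
    assert digits.isdigit() and len(digits) % 2 == 0
    return bytes.fromhex(digits)  # works because all digits are 0-9

# Each child is BCD-packed on its own; the parent is just the
# concatenation of the packed children (6 + 7 + 8 = 21 bytes).
parent = (bcd_pack("111111111111")
          + bcd_pack("22222222222222")
          + bcd_pack("3333333333333333"))
print(parent.hex())  # 111111111111222222222222223333333333333333
```

This matches the "Message bytes in hex" shown above: each subfield is converted to BCD individually and the results are appended, rather than converting the combined value as a whole.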

Go - Comparing strings/byte slices input by the user

I am getting input from the user; however, when I try to compare it later to a string literal, it does not work. (That is just a test, though.)
I would like to set it up so that when a blank line is entered (just hitting the enter/return key) the program exits. I don't understand why the strings don't compare as equal, because when I print the input it looks identical.
in := bufio.NewReader(os.Stdin)
input, err := in.ReadBytes('\n')
if err != nil {
    fmt.Println("Error: ", err)
}
if string(input) == "example" {
    os.Exit(0)
}
string vs []byte
string definition:
string is the set of all strings of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text. A string may be empty, but not nil. Values of string type are immutable.
byte definition:
byte is an alias for uint8 and is equivalent to uint8 in all ways. It is used, by convention, to distinguish byte values from 8-bit unsigned integer values.
What does it mean?
[]byte is a byte slice; a slice can be empty or nil.
Indexing a string yields bytes, but ranging over a string yields Unicode code points (runes), which can span more than one byte.
A string conventionally carries a meaning for its data (UTF-8 text); a []byte does not.
The equality operator is defined for the string type, but not for slice types.
As you can see, they are two different types with different properties.
There is a great blog post explaining the different string-related types [1].
Regarding the issue in your code snippet: bear in mind that in.ReadBytes(char) returns a byte slice up to and including char, so in your code input ends with '\n'. If you want your code to work the way you intend, try this:
if string(input) == "example\n" { // or "example\r\n" when on windows
os.Exit(0)
}
Also make sure that your terminal code page matches the encoding of your .go source file, and be aware of different end-of-line styles (Windows uses "\r\n"). The standard Go compiler uses UTF-8 internally.
[1] Comparison of Go data types for string processing.

Parsing strings in Fortran

I am reading from a file in Fortran which has an undetermined number of floating-point values on each line (for now, there are about 17 values on a line). I would like to read the nth value on each line into a given floating-point variable. How should I go about doing this?
In C the way I wrote it was to read the entire line onto the string and then do something like the following:
for(int il = 0; il < l; il++)
{
for(int im = -il; im <= il; im++)
pch = strtok(NULL, "\t ");
}
for(int im = -l; im <= m; im++)
pch = strtok(NULL, "\t ");
dval = atof(pch);
Here I am continually reading a value and throwing it away (thus shortening the string) until I am ready to accept the value I am trying to read.
Is there any way I can do this in Fortran? Is there a better way to do it in Fortran? The problem with my Fortran code seems to be that read(tline, '(f10.15)') tline1 does not shorten tline (tline is my string holding the entire line and tline1 is what I am trying to parse it into), so I cannot use the same method as in my C routine.
Any help?
The issue is that Fortran is a record-based I/O system while C is stream-based.
If you have access to a Fortran 2003 compliant compiler (modern versions of gfortran should work), you can use the stream ACCESS specifier to do what you want.
An example can be found here.
Of course, if you were really inclined, you could just use your C function directly from Fortran. Interfacing the two languages is generally simple, typically only requiring a wrapper with a lowercase name and an appended underscore (depending on compiler and platform of course). Passing arrays or strings back and forth is not so trivial typically; but for this example that wouldn't be needed.
Once the data is in a character string, you can read it into another variable as you are doing, using the ADVANCE="no" specifier, i.e.
do i = 1, numberIWant
read(tline, '(F10.15)', ADVANCE="no") tline1
end do
where tline should contain your number at the end of the loop.
Because of the record-based I/O, a READ statement will normally discard whatever is left after the end of the record, but ADVANCE="no" tells it not to.
If you know exactly at what position the value you want starts, you can use the T edit descriptor to initiate the next read from that position.
Let's say, for instance, that the width of each field is 10 characters and you want to read the fifth value. The read statement will then look something like the following.
read(file_unit, '(t41, f10.5)') value1
P.S.: You can dynamically create a format string at runtime, with the correct number after the t, by using a character variable as the format and an internal write to fill in that number.
Let's say you want the value that starts at position n. It would then look something like this (I alternated between single and double quotes to make it clearer where each string starts and stops):
write(my_format, '(a, i0, a)') "(t", n, ', f10.5)'
read(file_unit, my_format) value1

Is it possible to base64-encode a file in chunks?

I'm trying to base64-encode a huge input file and end up with a text output file, and I'm trying to find out whether it's possible to encode the input file bit by bit, or whether I need to encode the entire thing at once.
This will be done on the AS/400 (iSeries), if that makes any difference. I'm using my own base64 encoding routine (written in RPG) which works excellently and, were it not for size limitations, would be fine.
It's not possible bit by bit, but 3 bytes at a time (or any multiple of 3 bytes at a time) will do.
In other words, if you split your input file into chunks whose sizes are multiples of 3 bytes, you can encode the chunks separately and piece the resulting B64-encoded pieces together (in the corresponding order, of course). Note that the last chunk needn't be exactly a multiple of 3 bytes in size; depending on the modulo-3 value of its size, its B64 value will end with a few padding characters (typically the equals sign), but that's OK, as this will be the only piece that has (and needs) such padding.
In the decoding direction, it is the same idea, except that you need to split the B64-encoded data into multiples of 4 bytes. Decode them in parallel or individually as desired, and reassemble the original data by appending the decoded parts together (again in the same order).
Example:
"File" contents =
"Never argue with the data." (Jimmy Neutron).
Straight encoding = Ik5ldmVyIGFyZ3VlIHdpdGggdGhlIGRhdGEuIiAoSmltbXkgTmV1dHJvbik=
Now, in chunks:
"Never argue --> Ik5ldmVyIGFyZ3Vl
with the --> IHdpdGggdGhl
data." (Jimmy Neutron) --> IGRhdGEuIiAoSmltbXkgTmV1dHJvbik=
As you can see, pieced together in that order, the 3 encoded chunks amount to the same output as the encoding of the whole file.
Decoding is done similarly, with arbitrary chunk sizes provided they are multiples of 4 bytes. There is no need for any correspondence between the sizes used for encoding and decoding, although standardizing on a single size for each direction (say 300 and 400) may make things more uniform and easier to manage.
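The same round trip is easy to check in Python. The chunk sizes of 12 bytes for encoding and 16 characters for decoding are arbitrary choices, subject only to the multiple-of-3 and multiple-of-4 rules described above:

```python
import base64

data = b'"Never argue with the data." (Jimmy Neutron)'

# Encode in chunks whose sizes are multiples of 3 bytes; only the final,
# shorter chunk may produce '=' padding.
enc = b"".join(base64.b64encode(data[i:i + 12]) for i in range(0, len(data), 12))
assert enc == base64.b64encode(data)  # identical to encoding in one go

# Decode in chunks whose sizes are multiples of 4 characters; these sizes
# need not correspond to the ones used while encoding.
dec = b"".join(base64.b64decode(enc[i:i + 16]) for i in range(0, len(enc), 16))
assert dec == data
print(enc.decode())  # matches the "straight encoding" shown above
```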
It is a trivial effort to split any given bytestream into chunks.
You can base64 any chunk of bytes without problem.
The problem you are faced with is that unless you place specific requirements on your chunks (multiples of 3 bytes), the sequence of base64-encoded chunks will be different than the actual output you want.
In C#, this is one (sloppy) way you could do it lazily. The execution is actually deferred until string.Concat is called, so you can do anything you want with the chunked strings. (If you plug this into LINQPad you will see the output)
void Main()
{
    var data = "lorum ipsum etc lol this is an example!!";
    var bytes = Encoding.ASCII.GetBytes(data);
    var testFinal = Convert.ToBase64String(bytes);

    var chunkedBytes = bytes.Chunk(3);
    var base64chunks = chunkedBytes.Select(i => Convert.ToBase64String(i.ToArray()));
    var final = string.Concat(base64chunks);

    testFinal.Dump(); //output
    final.Dump(); //output
}

public static class Extensions
{
    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> list, int chunkSize)
    {
        while (list.Take(1).Count() > 0)
        {
            yield return list.Take(chunkSize);
            list = list.Skip(chunkSize);
        }
    }
}
Output
bG9ydW0gaXBzdW0gZXRjIGxvbCB0aGlzIGlzIGFuIGV4YW1wbGUhIQ==
bG9ydW0gaXBzdW0gZXRjIGxvbCB0aGlzIGlzIGFuIGV4YW1wbGUhIQ==
Hmmm, if you wrote the base64 conversion yourself, you should have noticed the obvious thing: each sequence of 3 octets is represented by 4 characters in base64.
So you can split the base64 data at every multiple of four characters, and it will be possible to convert these chunks back to their original bits.
I don't know how character files and byte files are handled on an AS/400, but if it has both concepts, this should be very easy.
are text files limited in the length of each line?
are text files line-oriented, or are they just character streams?
how many bits does one byte have?
are byte files padded at the end, so that one can only create files that span whole disk sectors?
If you can answer all these questions, what exact difficulties do you have left?