JsonFormat.Printer.print() encodes special characters and adds an extra backslash - protobuf-java

I need to convert a protobuf message to a JSON string in Java. For this I am using the API below, as recommended by the docs (https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/util/JsonFormat.Printer.html):
String jsonString = JsonFormat.printer().includingDefaultValueFields().print(protobufMessage);
This works fine for a simple string. However, when my string contains special characters such as & or a single quote, the gson.toJson() call inside JsonFormat converts them to Unicode escape sequences. For example, "A&BC" is converted to "A\u0026BC". The resulting string also has an extra backslash added.
So in the end "A&BC" is converted to the string "A\\u0026BC".
If it were "A\u0026BC", I could have converted it to a byte array and built a string from that. Because of the additional backslash I am not able to do so.
I am currently using protobuf version 3.7.1. I tried upgrading to see whether a newer API is available, but it did not help. I searched online but did not find any references (a similar issue was reported for JSONFormat.printToString, but that API was removed in a later version: https://github.com/carlomedas/protobuf-java-format/issues/16). Can someone please help if you have come across this issue?

I think the problem might be that you are passing that string along and it is being serialized a second time. The printer converts "A&BC" to "A\u0026BC". When Jackson then serializes that string again, it escapes the backslash, which produces the second backslash. To avoid this, you can put the @JsonRawValue annotation on the field so that Jackson writes the already-serialized JSON as-is instead of escaping it again.
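The double-escaping effect described above does not depend on Java or Jackson, so here is a minimal sketch of it in Python using only the standard json module. The names first_pass, double_encoded and embedded are made up for illustration, and first_pass is hard-coded to mimic the printer output reported in the question (Python's json module does not escape & on its own).

```python
import json

# Text as the first serializer produced it: '&' is written as the
# six-character escape sequence \u0026, with a single backslash.
first_pass = r'{"name":"A\u0026BC"}'

# Embedding that text as an ordinary string value serializes it a second
# time, so the backslash itself gets escaped and shows up doubled.
double_encoded = json.dumps({"payload": first_pass})

# Treating the text as already-serialized JSON (parse it, then embed the
# resulting object) avoids the second round of escaping.
embedded = json.dumps({"payload": json.loads(first_pass)})

print(double_encoded)
print(embedded)
```

The second variant plays roughly the same role as @JsonRawValue in the answer: the inner JSON is handled as JSON rather than as a plain string that needs escaping.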

Related

Python interpret bytearray as bytes

I have a question regarding the interpretation of a string as bytes.
In Python, I have a variable that contains, for example, this value:
"bytearray(b'\x13\x02US')"
This is unfortunately due to the behavior of a module I am using. My question is: how can I get this string into bytes?
I have tried stripping away the "bytearray(b'" at the start and the "')" at the end and calling .encode() on the rest, but the result is:
b'\\x13\\x02US'
This clearly escapes the \ and so prevents the interpretation as bytes.
How can I get this converted into
b'\x13\x02US'
instead though?
Thank you very much!
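Replacing double backslashes with single ones after your .encode() call will not help here: your string holds a single backslash followed by x13, and the doubled backslash is only how Python displays the resulting bytes object. What you need is to have the escape sequences interpreted. Strip only the "bytearray(" at the start and the ")" at the end, so that the b'...' literal remains, and pass that to ast.literal_eval, which evaluates it as a bytes literal.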
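A minimal sketch of that approach, assuming the value always has the exact form bytearray(b'...'). The helper name parse_bytearray_repr is made up for illustration and is not part of any library.

```python
import ast


def parse_bytearray_repr(text: str) -> bytes:
    """Convert the textual repr of a bytearray into a bytes object."""
    prefix, suffix = "bytearray(", ")"
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError("not a bytearray repr: %r" % text)
    # Drop the wrapper so that only the b'...' literal is left.
    literal = text[len(prefix):-len(suffix)]
    # literal_eval only accepts literals, so it interprets the escape
    # sequences without executing arbitrary code.
    value = ast.literal_eval(literal)
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal, got %r" % type(value))
    return value


# Raw string: the variable really contains backslash, 'x', '1', '3', ...
raw = r"bytearray(b'\x13\x02US')"
result = parse_bytearray_repr(raw)
print(result)
```

ast.literal_eval is used instead of eval because the input comes from another module and should not be executed as code.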

issues while serializing to YAML file

I have started using the .NET API for YAML and it seems to be helpful. However, I have a few questions and am wondering if you can provide a sample or workaround for them.
(1) I have an object consisting of 4 strings, and I would like to serialize a collection of them (List or String[]). I wrote a helper method that returns the strings in the format I want, but the serializer adds an extra single quote before and after each string. So I am getting
-'{str1: str2, str3: str4}'
-'{str5: str6, str7: str8}'
instead of
-{str1: str2, str3: str4}
-{str5: str6, str7: str8}
Can you suggest any workarounds?
(2) I am trying to insert XAML as a string in a YAML document. My XAML is well-formed XML, but when I serialize it, the output is cut off before the third-to-last element. Any idea why?
Regarding the first question, if you are serializing an array of strings, then it is normal that each element is quoted because it starts with a '{'. In this case, you should be serializing the list of objects directly instead of converting them to string first.
Regarding the second question, you should add some code to the question to clarify what you are doing.

Node.js JavaScript-stringify

JSON is not a subset of JavaScript. I need my output to be 100% valid JavaScript; it will be evaluated as such -- i.e., JSON.stringify will not (always) work for my needs.
Is there a JavaScript stringifier for Node?
As a bonus, it would be nice if it could stringify objects.
You can use JSON.stringify and afterwards replace the remaining U+2028 and U+2029 characters. As the article linked states, the characters can only occur in the strings, so we can safely replace them by their escaped versions without worrying about replacing characters where we should not be replacing them:
JSON.stringify('ro\u2028cks').replace(/\u2028/g,'\\u2028').replace(/\u2029/g,'\\u2029')
From the last paragraph in the article you linked:
The solution
Luckily, the solution is simple: If we look at the JSON specification we see that the only place where a U+2028 or U+2029 can occur is in a string. Therefore we can simply replace every U+2028 with \u2028 (the escape sequence) and U+2029 with \u2029 whenever we need to send out some JSONP.
It’s already been fixed in Rack::JSONP and I encourage all frameworks or libraries that send out JSONP to do the same. It’s a one-line patch in most languages and the result is still 100% valid JSON.

Node.js URL-encoding for pre-RFC3986 urls (using + vs %20)

Within Node.js, I am using querystring.stringify() to encode an object into a query string for usage in a URL. Values that have spaces are encoded as %20.
I'm working with a particularly finicky web service that will only accept spaces encoded as +, as used to be commonly done prior to RFC3986.
Is there a way to set an option for querystring so that it encodes spaces as +?
Currently I am simply doing a .replace() to replace all instances of %20 with +, but this is a bit tedious if there is an option I can set ahead of time.
If anyone is still facing this issue, the "qs" npm package has an option to encode spaces as +:
qs.stringify({ a: 'b c' }, { format : 'RFC1738' })
I can't think of any library doing that by default, and unfortunately, I'd say your implementation may be the more efficient way to do this, since any other option would probably either do what you're already doing, or will use slower non-compiled pure JavaScript code.
What about asking the web service provider to follow the RFC?
https://github.com/kvz/phpjs is a Node.js package that provides the PHP functions. At the time of writing, its http_build_query implementation only supports urlencode (the query string contains + instead of spaces), but hopefully it will soon include the enc_type parameter / rawurlencode (%20 for spaces).
See http://php.net/http_build_query.
RFC1738 (+'s) will be the default enc_type either way, so you can use it immediately for your purposes.

Groovy says my Unicode string is too long

As part of my probably wrong and cumbersome solution to print out a form, I have taken an MS Word document, saved it as XML, and I'm trying to store that XML as a Groovy string so that I can ${fillOutTheFormProgrammatically}.
However, with MS Word documents being as large as they are, the String is 113100 Unicode characters and Groovy says it's limited to 65536. Is there some way to change this, or am I stuck with splitting up the string?
Groovy - need to make a printable form
That's what I'm trying to do.
Update: to be clear, it's too long for a Groovy String. I think a regular string might be fine. I'm going to change strategy and put some markers in the file that I can easily find, like %!%variable_name%!%, and then do .replace(... uh, I feel a new question coming on here...
Are you embedding this string directly in your Groovy code? The JVM itself has a limit on the length of string constants; see the VM Spec if you are interested in the details.
An ugly workaround might be to split the string into smaller parts and concatenate them at runtime. A better solution would be to save the text in an external file and read the contents from your code. You could also package this file along with your code and access it from the classpath using Class#getResourceAsStream.

Resources