Decode base64 to string and Encode string to base64 in XQuery - base64

I needed to work with base64 in Xquery on a project and found almost nothing on Internet, thankfully there was one GitHub repository that had what I need to complete the encode-decode.
So, if you have a xs:string and you need to encode as base64 or you have a base64 string and need to decode as xs:string* you can use the XQuery on the repository that is highlighted as answer.
Obs: Comments are PT-BR, maybe you need to translate to understand what each function is used for

If your system supports it, use the EXPath binary library. Specification is at https://www.w3.org/2013/12/expath-binary-20131203/
See in particular the functions bin:encode-string($string, $encoding) which converts xs:string to xs:base64Binary, and bin:decode-string($in, $encoding) which converts xs:base64Binary to xs:string.

Your XQuery context also might come with a utility function that does the conversion.
eXist-db offers util:base64-encode and util:base64-decode.

Related

Decoding base64 while using GitHub API to Download a File

I am using the GitHub API to download a file from GitHub. I have been able to successfully authenticate as well as get a response from github, and see a base64 encoded string representing the file contents.
Unfortunately, I get an unusual error (string length is not a multiple of 4) when decoding the base64 string.
The HTTP request is illustrated below:
GET /repos/:owner/:repo/contents/:path
The (partial) response is illustrated below:
{
"name":....,
"download_url":...",
"type":"file",
"content":"ewogICAgInN3YWdnZXIiOiAiM...
}
The issue I am encountering is that the length of the string is 15263 bytes, and I get an error in decoding the string (string length is not a multiple of 4). I am using node.js and the 'base64-js' npm module to decode the string. Code to execute the decoding is illustrated below:
var base64 = require('base64-js');
var contents = base64.toByteArray(fileContent);
The decoding causes an exception:
Error: Invalid string. Length must be a multiple of 4
at placeHoldersCount (.../node_modules/base64-js/index.js:23:11)
at Object.toByteArray (...node_modules/base64-js/index.js:42:18)
:
:
I would think that the GitHub API is sending me the correct data, so I figure that is not the issue.
Am I performing the decoding improperly or is there another problem I am overlooking?
Any help is appreciated.
I experimented a bit and found a solution by using a different base64 decoding library as follows:
var base64 = require('js-base64').Base64;
var contents = base64.decode(res.content);
I am not sure if it is mandatory to have an encoded string length divisible by 4 (clearly my 15263 character length string is not divisible by 4) but the alternate library decoded the string properly.
A second solution which I also found to work is specific to how to use the GitHub API. By adding the following to the GitHub API call header, I was also able to get the decoded file contents:
'accept': 'application/vnd.github.VERSION.raw'
After much experimenting, I think I nailed down the difference between the working and broken base64 decoding.
It appears GitHub Base-64 encodes with:
UTF-8 charset
Base 64 MIME encoder (RFC2045)
As opposed to a "basic" (RFC4648) Base64 encoder. Several languages seem to default to the basic encoder (including Java, which I was using). When I switched to a MIME encoder, I got the full contents of the file un-garbled. This would explain why switching libraries in some cases fixed the issue.
I will note the contents field contained newline characters - decoders are supposed to ignore them, but not all do, so if you still get errors, you may need to try removing them.
The media-type header will do the job better, however in my case I am trying to use the API via a GitHub App - at time of writing, GitHub requires a specific media type be used when doing that, and it returns the JSON response.
For some reason the Github APIs base64 encoded content doesn't decode properly at all the online base64 decoders I've tried from the front page of google.
Python works however:
import base64
base64.b64decode("ewogICAgInN3YWdnZXIiOiAiM...")

How to check if a string is plaintext or base64 format in Node.js

I want to check if the given string to my function is plain text or base64 format. I am able to encode a plain text to base64 format and reverse it.
But I could not figure out any way to validate if the string is already base64 format. Can anyone please suggest correct way of doing this in node js? Is there any API for doing this already available in node js.
Valid base64 strings are a subset of all plain-text strings. Assuming we have a character string, the question is whether it belongs to that subset. One way is what Basit Anwer suggests. Those libraries require installing libicu though. A more portable way is to use the built-in Buffer:
Buffer.from(str, 'base64')
Unfortunately, this decoding function will not complain about non-Base64 characters. It will just ignore non-base64 characters. So, it alone will not help. But you can try encoding it back to base64 and compare the result with the original string:
Buffer.from(str, 'base64').toString('base64') === str
This check will tell whether str is pure base64 or not.
Encoding is byte level.
If you're dealing in strings then all you can do is to guess or keep meta data information with your string to identify
But you can check these libraries out:
https://www.npmjs.com/package/detect-encoding
https://github.com/mooz/node-icu-charset-detector

How to convert a string to a byte array which is compiled with a given charset in Go?

In java, we can use the method of String : byte[] getBytes(Charset charset) .
This method Encodes a String into a sequence of bytes using the given charset, storing the result into a new byte array.
But how to do this in GO?
Is there any similar way in Go can do this?
Please let me know it.
The standard Go library only supports Unicode (UTF-8, UTF-16, UTF-32) and ASCII encoding. ASCII is a subset of UTF-8.
The go-charset package (found from here) supports conversion to and from UTF-8 and it also links to the GNU iconv library.
See also field CharsetReader in encoding/xml.Decoder.
I believe here is an answer: https://stackoverflow.com/a/6933412/1315563
There is no way to do it without writing the conversion yourself or
using a third-party package. You could try using this:
http://code.google.com/p/go-charset

Convert Plain String to JSON?

I need to convert a string to JSON (in javascript). I have a plain string with correctly formatted JSON in it, like this:
var convert = '{"name":nick,"age":19}';
I need to convert it to just the json (e.g., minus the '' quotes). I have done some testing and found this to be the reason I'm having problems. There must be a way to convert it on the fly, right?
Help very much appreciated,
Nick
You need to use a JSON library; nearly all modern browsers have a native one available to them, however, to ensure compatibility with IE7 and below you will need to pull in Douglas Crockford's JSON2 library.
Once you have a JSON library, just issue:
var result = JSON.parse('{"name":nick,"age":19}');
JSON.parse(convert)
Crockford's json2.js will give you JSON.parse for browsers that don't already have it (modern browsers have it natively).

Groovy says my Unicode string is too long

As part of my probably wrong and cumbersome solution to print out a form I have taken a MS-Word document, saved as XML and I'm trying to store that XML as a groovy string so that I can ${fillOutTheFormProgrammatically}
However, with MS-Word documents being as large as they are, the String is 113100 unicode characters and Groovy says its limited to 65536. Is there some way to change this or am I stuck with splitting up the string?
Groovy - need to make a printable form
That's what I'm trying to do.
Update: to be clear its too long of a Groovy String.. I think a regular string might be all good. Going to change strategy and put some strings in the file I can easily find like %!%variable_name%!% and then do .replace(... uh i feel a new question coming on here...
Are you embedding this string directly in your groovy code? The jvm itself has a limit on the length of string constants, see the VM Spec if you are interested in details.
A ugly workaround might be to split the string in smaller parts and concatenate them at runtime. A better solution would be to save the text in an external file and read the contents from your code. You could also package this file along with your code and access it from the classpath using Class#getResourceAsStream.

Resources