What is the advantage of using Base64 encoding? - base64

What is the advantage of using Base64 encoding?
I would like to understand it better. Do I really need it? Can't I simply use plain strings?
I heard that the encoded output can be up to 30% larger than the original (at least for images).

Originally, some protocols only allowed 7-bit, and sometimes only 6-bit, data.
Base64 lets you represent 8-bit data using 6 bits per character for transmission over those kinds of links.
Email is an example of this.
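A minimal sketch of that idea in Python, using the standard-library base64 module. It encodes every possible byte value and checks that the output contains only the 64 alphabet characters plus the = padding character. The variable names are just for illustration.

```python
import base64
import string

# Every possible 8-bit value, i.e. data that a 7-bit channel could not carry as-is.
raw = bytes(range(256))

encoded = base64.b64encode(raw)

# The standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/', plus '=' for padding.
alphabet = set((string.ascii_letters + string.digits + "+/=").encode("ascii"))
uses_only_alphabet = all(byte in alphabet for byte in encoded)

# Every 3 input bytes become 4 output characters, so the output is about a third larger.
size_ratio = len(encoded) / len(raw)

decoded = base64.b64decode(encoded)
```

Here 256 input bytes become 344 characters, which is the 4-for-3 expansion the question asks about.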

The primary use case of base64 encoding is when you want to store or transfer data with a restricted set of characters; i.e. when you can't pass an arbitrary value in each byte.

<img alt="Embedded Image"
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADIA..." />
This code displays the encoded image, but no one can hotlink to it from another website and use up your bandwidth.
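A rough sketch of how such a data URI could be built in Python. The helper name to_data_uri is made up for this example, and the input here is only the 8-byte PNG file signature rather than a complete image.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a data: URI for the given bytes (hypothetical helper, not a library function)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# Only the PNG signature, used as a stand-in for real image bytes.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature, "image/png")
```

The result begins with the same iVBORw0KGgo characters as the src attribute above, because every PNG file starts with that signature.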

As others have said, the advantage of Base64 encoding is that it lets you transmit binary data as (most commonly) ASCII characters. Because the receiving end can almost certainly handle ASCII, it is a convenient way to transfer binary data over a text stream.
If your situation can handle native binary data, that will most likely yield better results in terms of speed and size, but if not, Base64 is most likely the way to go. JSON is a good example of where this helps, as is any case where the data has to be stored in a text field. Give us some more details and we can provide a better tailored answer.

One application is to transfer binary data in contexts where only characters are allowed. E.g. in XML documents/transfers. XML-RPC is an example of this.

Convert BLOB data to string and back...

Whether or not to use it depends on what you're using it for.
I've used it mostly for encoding binary data to pass through a mechanism that was really created for text files. For example, when passing a digital certificate request around or retrieving the finished digital certificate, it's often very convenient to pass the binary data as Base64 via a text field on a web form.
I probably wouldn't use it if you have something that is already text and you just want to pass it somewhere.

I use it for passing around files that tend to get chewed up by email programs because they look like text files (e.g. HL7 transcripts for replay).

Related

Is there anything more compact than base64 for providing large images in plaintext logs?

I need to transfer a generated 1920x1080 image out of a Node pipeline that has no connectivity to either the internet or me, with the only controllable output being plaintext logs.
So in order to retrieve that image, I need some way to transfer that data through a log file, which is tricky given the image is around 45KB with .jpeg compression.
Converting the image to a Base64 data URI works perfectly, as I can copy it from the logs and see the image in my browser, but the size of the URI is unwieldy at 40k+ characters.
Is there a more compact or efficient way of doing this?
This link gives insight: Binary To text Encoding
Base85/ASCII85 is probably the most promising, but it offers only about a 6% reduction compared with Base64 (5 characters per 4 bytes instead of 4 per 3). Is ~42KB less unwieldy than 45KB? How long will the solution live, how "shareable" must the output be, and must image quality be preserved?
If image quality is not all that important for your purpose, you may get better mileage out of adjusting the encoding options. JPEG is a lossy format, so you could save significant space at the cost of grainier images (although at 45KB they probably already are).
You may want to get rid of mention of URIs in your question. You're interested in Base64 as an encoding method (for which 64 common text characters are used as digits... like decimal uses 10, hexadecimal 16 and Base85 85 etc). Base64 is just a ubiquitous "standard" (actually a family of several standards - see Base64 Variants).
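To put numbers on the Base64 versus Base85 comparison, here is a small Python sketch using the standard-library base64 module. The payload is synthetic filler of about the size mentioned in the question, not a real JPEG, and b85encode is just one of several Base85 variants.

```python
import base64

# Synthetic 45,000-byte payload standing in for the JPEG from the question.
payload = bytes(i % 256 for i in range(45_000))

b64 = base64.b64encode(payload)  # 4 characters per 3 bytes
b85 = base64.b85encode(payload)  # 5 characters per 4 bytes

# How much shorter the Base85 text is than the Base64 text.
saving = 1 - len(b85) / len(b64)
```

For this payload the Base64 text is 60,000 characters and the Base85 text is 56,250, a saving of 6.25%.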

How to output IBM-1027-codepage-binary-file?

The output (CSV/JSON) of my newly created program (using .NET Framework 4.6) needs to be converted to an IBM-1027 code page binary file, to be imported into a Japanese client's IBM mainframe.
I've searched the internet and found that Microsoft doesn't provide an equivalent of the IBM-1027 code page.
So how can I output an IBM-1027 code page binary file if I have a UTF-8 CSV/JSON file in hand?
I'm asking around for other solutions, but for now, I think I'm going to have to suggest you do the conversion manually; I assume whichever language you're using allows you to do a hex conversion, at worst. For mainframes, the codepage is usually implicit in the dataset, it isn't something that is included in the file header.
So, what you can do is build a conversion table, from https://www.ibm.com/support/knowledgecenter/en/SSEQ5Y_5.9.0/com.ibm.pcomm.doc/reference/html/hcp_reference26.htm. Take a character from your JSON/CSV file, look up the corresponding hex value, and write that byte to a file. Repeat until EOF. (Note: write the raw byte values, not the ASCII representation of the hex digits.) Make sure that when the client transfers the file to their system, they perform a binary transfer.
If you wanted to get more complicated than that, you could look at enhancing/overriding part of the converter for CP500, which does exist on Microsoft Windows. One of the design points of EBCDIC was to make character conversions as simple as possible, so many CP500 characters have the same hex representation as in CP1027, with the exception of the Kanji characters.
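As a rough illustration of the "CP500 plus overrides" idea, here is a Python sketch (Python ships a cp500 codec). The function name and the overrides table are hypothetical: the table is left empty, and the real CP1027 byte values would have to be filled in from the IBM code page table linked above.

```python
# Hypothetical override table: character -> single CP1027 byte value.
# Left empty on purpose; the real values must come from the IBM code page table.
CP1027_OVERRIDES: dict[str, int] = {}


def encode_with_overrides(text: str, overrides: dict[str, int],
                          base_codec: str = "cp500") -> bytes:
    """Encode text with the base EBCDIC codec, replacing selected characters.

    Characters listed in `overrides` are written as the given raw byte value;
    everything else goes through the base codec. A character that is in neither
    raises UnicodeEncodeError.
    """
    out = bytearray()
    for ch in text:
        if ch in overrides:
            out.append(overrides[ch])
        else:
            out += ch.encode(base_codec)
    return bytes(out)


ebcdic_bytes = encode_with_overrides("A1", CP1027_OVERRIDES)
```

The result should be written to a file opened in binary mode, so that the raw bytes end up on disk rather than their hex text.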
This is a separate answer, from a colleague; I don't have the ability to validate it, I'm afraid.
transfer the file to the host in raw mode, just tag it as ccsid 1208
for uss export _BPXK_AUTOCVT=ALL
oedit/obrowse handles it automatically.

Is there an advantage of sending binary packets over strings?

The title says it all. Basically I'm using TCP for a client-server setup and I'm wondering if there is an advantage of transforming strings to binary before sending the data over tcp?
Strings are already binary data, or can at least be easily converted to a byte[], for example with:
static byte[] GetStringBytes(string str)
{
    // Copies the raw UTF-16 code units (2 bytes per char); no encoding conversion is applied.
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}
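The number of bytes a string turns into depends on the encoding you pick. The C# snippet above copies UTF-16 code units; a quick Python sketch of the same point, with an arbitrary sample string:

```python
text = "héllo"

# Two bytes per character here, comparable to what the C# BlockCopy approach produces.
utf16_bytes = text.encode("utf-16-le")

# One byte per ASCII character, two for the accented 'é'.
utf8_bytes = text.encode("utf-8")

sizes = (len(utf16_bytes), len(utf8_bytes))
```

Either way, both ends of the connection have to agree on the encoding, otherwise the receiver cannot turn the bytes back into the same string.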
If you compress/encode the data you send, whether it starts out life as a string or binary data you will most likely be sending the same total number of bytes.
No real advantage in a vast majority of cases. Also, binary data tends to be more platform dependent, so if you want to extend your client/server to a multi-platform environment, you're probably better off sticking with Strings.
Some data is natively encoded in binary, such as DER.
If your application needs to send this kind of data, for example a CA certificate, you can send the binary directly. Sending it as a string, such as Base64, takes more space to represent the same data.
At lower levels, such as TCP itself, the two endpoints exchange information (e.g. during the handshake) in binary because of size, performance and fault tolerance.
But as an application developer who needs to iterate quickly, there is a lot more than size to consider, such as compatibility and maintenance. Say you need to return some metadata along with the binary data; you might use JSON to represent it:
{
  "version": 1,
  "data": "base64 encoded data"
}
This makes things easier for clients, because there are plenty of tools for parsing JSON, and it is easier to debug when you can read the data without special tools (it is human-readable). With a binary encoding (like DER), on the other hand, you have to update your protocol documentation so that clients can update their code for the new format, and you also need to think about backward and forward compatibility.
That said, there are tools that make binary encodings more compatible and maintainable, such as Apache Thrift, Protocol Buffers and Apache Avro.
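A minimal Python sketch of the JSON envelope shown above, using only the standard library. The wrap and unwrap helper names are invented for this example; the field names follow the JSON snippet.

```python
import base64
import json


def wrap(data: bytes, version: int = 1) -> str:
    """Put binary data into a JSON envelope as a Base64 string."""
    envelope = {
        "version": version,
        "data": base64.b64encode(data).decode("ascii"),
    }
    return json.dumps(envelope)


def unwrap(text: str) -> tuple[int, bytes]:
    """Read the envelope back into (version, original bytes)."""
    envelope = json.loads(text)
    return envelope["version"], base64.b64decode(envelope["data"])


# Arbitrary bytes, including values that are not valid text on their own.
message = wrap(b"\x30\x82\x01\x0a")
version, payload = unwrap(message)
```

The trade-off is the one described above: the envelope is larger than the raw bytes, but any JSON tool can read it.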
Looking deeper at the performance aspect: unless you are simply comparing with and without an encode/decode step (where it is obvious which is faster), you need to run load tests to find out.
For example, there are some extremely fast JSON parsers that can easily beat a binary parser that claims better performance than others.

Protocol buffers logging

In our business, we are required to log every request/response that comes to our server.
At the moment, we use XML as the standard format.
The log files are used when we need to debug or trace an error.
I'm curious: if we switch to protocol buffers, which are binary, what is the best way to log requests/responses to a file?
For example:
FileOutputStream output = new FileOutputStream("\\files\\log.txt");
request.build().writeTo(output);
For anyone who has used protocol buffers in an application: how do you log your requests/responses in case you need them for debugging?
TL;DR: write debugging logs in text, write long-term logs in binary.
There are at least two ways you can do this logging (and maybe, in fact, you should do both):
Writing your logs in text format. This is good for debugging and quickly checking for problems with your eyes.
Writing your logs in binary format - this will make future analysis much quicker since you can load the data using same protocol buffers code and do all kinds of things on them.
Quite honestly, this is more or less the way this is done at the place this technology came from.
We use the ShortDebugString() method on the C++ object to write down a human-readable version of all incoming and outgoing messages to a text-file. ShortDebugString() returns a one-line version of the same string returned by the toString() method in Java. Not sure how easy it is to accomplish the same thing in Java.
If you have competing needs for logging and performance then I suppose you could dump your binary data to the file as-is, with perhaps each record preceded by a tag containing a timestamp and a length value so you'll know where this particular bit of data ends. But I hasten to admit this is very ugly. You will need to write a utility to read and analyze this file, and will be helpless without that utility.
A more reasonable solution would be to dump your binary data in text form. I'm thinking of "lines" of text, again starting with whatever tagging information you find relevant, followed by some length information in decimal or hex, followed by as many hex bytes as needed to dump your buffer - thus you could end up with some fairly long lines. But since the file is line structured, you can use text-oriented tools (an editor in the simplest case) to work with it. Hex dumping essentially means you are using two bytes in the log to represent one byte of data (plus a bit of overhead). Heh, disk space is cheap these days.
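The line-oriented hex dump could look something like this Python sketch. The line layout (tag, decimal length, hex bytes, separated by single spaces) and the function names are only one possible choice, and the tag is assumed to contain no spaces.

```python
def format_hex_line(tag: str, payload: bytes) -> str:
    """Render one log line: tag, payload length in decimal, payload as hex digits."""
    return f"{tag} {len(payload)} {payload.hex()}"


def parse_hex_line(line: str) -> tuple[str, bytes]:
    """Recover the tag and the original bytes from a line written by format_hex_line."""
    tag, length, hex_data = line.split(" ", 2)
    payload = bytes.fromhex(hex_data)
    if len(payload) != int(length):
        raise ValueError("length field does not match payload")
    return tag, payload


line = format_hex_line("2024-01-01T00:00:00Z", b"\x08\x96\x01")
tag, payload = parse_hex_line(line)
```

Each payload byte takes two characters in the log, which is the doubling in size mentioned above.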
If those binary buffers have a fairly consistent structure, you could even break out and label fields (or something like that) so your data becomes a little more human readable and, more importantly, better searchable. Of course it's up to you how much effort you want to sink into making your log records look pretty; but the time spent here may well pay off a little later in analysis.
If your messages contain strings with non-ASCII characters, simply logging them via an implicit or explicit call to toString will escape those characters.
"오늘은 무슨 요일입니까?" becomes "\354\230\244\353\212\230\354\235\200 \353\254\264\354\212\250 \354\232\224\354\235\274\354\236\205\353\213\210\352\271\214?"
If you want to retain the non-ASCII characters, use TextFormat.printer().escapingNonAscii(false).printToString(message).
See this answer for more details.
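To see where the escaped form in the example above comes from: each non-ASCII character is encoded as UTF-8 and every resulting byte is written as a three-digit octal escape. The Python function below only imitates that behaviour for illustration. It is not the protobuf implementation, and it ignores the other characters (quotes, backslashes, control characters) that a real text format would also escape.

```python
def escape_non_ascii(text: str) -> str:
    """Write every non-ASCII UTF-8 byte as a backslash followed by three octal digits."""
    parts = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            parts.append(chr(byte))         # plain ASCII stays as it is
        else:
            parts.append("\\%03o" % byte)   # e.g. 0xEC becomes \354
    return "".join(parts)


escaped = escape_non_ascii("오늘은")
```

Each Hangul syllable is three bytes in UTF-8, so it turns into three escapes, matching the start of the escaped string above.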

Will random data appended to a JPG make it unusable?

So, to simplify my life I want to be able to append from 1 to 7 additional characters on the end of some jpg images my program is processing*. These are dummy padding (fillers, etc - probably all 0x00) just to make the file size a multiple of 8 bytes for block encryption.
Having tried this out with a few programs, it appears they are fine with the additional characters, which occur after the FF D9 that specifies the end of the image - so it appears that the file format is well defined enough that the 'corruption' I'm adding at the end shouldn't matter.
I can always post process the files later if needed, but my preference is to do the simplest thing possible - which is to let them remain (I'm decrypting other file types and they won't mind, so having a special case is annoying).
I figure with all the steganography hullabaloo years ago, someone has some input here...
(encryption processing by 8 byte blocks, I don't want to save pre-encrypted file size, so append 0x00 to input data, and leave them there after decoding)
No, you can add bytes to the end of a jpg file without making it unusable. The header of the jpg file tells the reader how to read it, so the program reading it will stop at the end of the jpg data.
In fact, people have hidden zip files inside jpg files by appending the zip data to the end of the jpg data. Because of the way these formats are structured, the resulting file is valid in either format.
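A small Python sketch of the padding described in the question, plus a way to separate the image from whatever follows the last FF D9 marker. The function names are invented for this example, and the "JPEG" is just a start marker, three filler bytes and an end marker rather than a real image.

```python
BLOCK_SIZE = 8
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def pad_to_block(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append 0x00 bytes until the length is a multiple of block_size."""
    return data + b"\x00" * (-len(data) % block_size)


def split_at_eoi(data: bytes) -> tuple[bytes, bytes]:
    """Split into (everything up to and including the last FF D9, trailing bytes)."""
    index = data.rfind(EOI)
    if index == -1:
        raise ValueError("no end-of-image marker found")
    end = index + len(EOI)
    return data[:end], data[end:]


fake_jpeg = b"\xff\xd8" + b"abc" + EOI  # 7 bytes, not a decodable image
padded = pad_to_block(fake_jpeg)
image, trailer = split_at_eoi(padded)
```

Since the padding is all 0x00 it cannot contain an FF D9 of its own, so searching from the end finds the real marker.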
You can... but the results may be unpredictable.
Even though there is enough information in the format to tell the client to ignore the extra data, it is likely not a case the programmer tested for.
A paranoid program might look at the size, notice the discrepancy and decide it won't process your file because clearly it doesn't fully understand it. This is particularly likely when reading data from the web when random bytes in a file could be considered a security risk.
You can embed your data in the XMP tag within a JPEG (or EXIF or IPTC fields for that matter).
XMP is XML, so you have a fair bit of flexibility there to do your own custom stuff.
It's probably not the simplest thing possible, but putting your data here will maintain the integrity of the JPEG and require no "post processing".
Your data will then show up in other imaging software such as Photoshop, which may not be ideal.
As others have stated, you have no control how programs process image files and therefore some programs may find the images valid others may not.
However, there is a bigger issue here. Judging by your question, I'm deducing you're practicing "security through obscurity." It's widely considered a very bad practice. Use Google to find a plethora of articles about the topic.
