Decode file names from MIME messages with LotusScript - lotus-notes

I am parsing a MIME mail with LotusScript to get all attachments. But I get issues when it comes to encoded file names in the header. I got one file with the name
"HE336 =?Windows-1251?Q?=CF=E0=EA=E5=F2_=E4=EE=EA=F3=EC=E5=ED=F2=EE=E2.pdf?="
Is there any way to decode it with LotusScript?

The string I get is RFC 2047 header encoding. I found that Notes supports it in MIME headers. The issue I had is when I used MIMEHeader.GetParamVal it always returns the encoded value. However MIMEHeader.GetHeaderVal and GetHeaderValAndParams has an extra parameter
boolean decoded
true decodes any RFC-2047 encodings
false (default) retains any encodings; false is enforced if folded is true
When this is set to true, I get a decoded value.

it's been a while but I once used Julian Robichaux's Base64 classes with Jave and/or LS. You should be able to achieve what you are looking for with these.
Base64Encoding
Hope that helps.
Best wishes - Michael

Related

Is there any reason why okhttp3.Credentials use ISO_8859_1 instead UTF-8?

I am using okhttp3.Credentials to get Base64 string in my current project. I spot an issue with cyrillic symbols I passed on server as Base64 string and eventually find out current implementation of okhttp3.Credentials uses ISO_8859_1.
Is there a thoughtful intent to go with ISO_8859_1 here instead of more universal UTF-8?
Update from answer:
From reference to spec
The original definition of this authentication scheme failed to
specify the character encoding scheme used to convert the user-pass
into an octet sequence. In practice, most implementations chose
either a locale-specific encoding such as ISO-8859-1 ([ISO-8859-1]),
or UTF-8 ([RFC3629]). For backwards compatibility reasons, this
specification continues to leave the default encoding undefined, as
long as it is compatible with US-ASCII (mapping any US-ASCII
character to a single octet matching the US-ASCII character code).
B.3. Why not simply switch the default encoding to UTF-8?
There are sites in use today that default to a local character
encoding scheme, such as ISO-8859-1 ([ISO-8859-1]), and expect user
agents to use that encoding. Authentication on these sites will stop
working if the user agent switches to a different encoding, such as
UTF-8.
Note that sites might even inspect the User-Agent header field
([RFC7231], Section 5.5.3) to decide which character encoding scheme
to expect from the client. Therefore, they might support UTF-8 for
some user agents, but default to something else for others. User
agents in the latter group will have to continue to do what they do
today until the majority of these servers have been upgraded to
always use UTF-8.
Discussion here https://github.com/square/okhttp/pull/3134
It's legacy and you can override with the optional param.

Base64 for auth

I need to build a XMPP client. The server PLAIN mechanism to auth. (using zlib compression I think)
I captured trafic from other xmpp system thats use PLAIN mechanism and the text appear to be Base64 (id + token) ADc1Y2M2OWY0MzQwMTUwMjgyOWIwMWY2MDAyN2E0NDE2ADE1YTk0NzM3NTRiYjY2MGExMGYzYTA5MzA5NWQxMmY3 is what the client return. I put that into a Base64 decoder and its give me this : 75cc69f43401502829b01f60027a441615a9473754bb660a10f3a093095d12f7.
When I encode this using a Base64 encoder, Its give me something else than the first Base64 string (NzVjYzY5ZjQzNDAxNTAyODI5YjAxZjYwMDI3YTQ0MTYxNWE5NDczNzU0YmI2NjBhMTBmM2EwOTMwOTVkMTJmNw)
Can someone explain me? I couldn't find anything on google.
Edit:
https://xmpp.org/extensions/xep-0034.html#example-3
the result of your decoding is not correct, in fact the decoded value
contains two binary values that can't be displayed as a character
(here substituted by a �):
�75cc69f43401502829b01f60027a4416�15a9473754bb660a10f3a093095d12f7.
What you encoded then is based on a string in which the two binary
values are not present, so you encoded basically something different
and got of course a different result.
From jps

Decoding base64 while using GitHub API to Download a File

I am using the GitHub API to download a file from GitHub. I have been able to successfully authenticate as well as get a response from github, and see a base64 encoded string representing the file contents.
Unfortunately, I get an unusual error (string length is not a multiple of 4) when decoding the base64 string.
The HTTP request is illustrated below:
GET /repos/:owner/:repo/contents/:path
The (partial) response is illustrated below:
{
"name":....,
"download_url":...",
"type":"file",
"content":"ewogICAgInN3YWdnZXIiOiAiM...
}
The issue I am encountering is that the length of the string is 15263 bytes, and I get an error in decoding the string (string length is not a multiple of 4). I am using node.js and the 'base64-js' npm module to decode the string. Code to execute the decoding is illustrated below:
var base64 = require('base64-js');
var contents = base64.toByteArray(fileContent);
The decoding causes an exception:
Error: Invalid string. Length must be a multiple of 4
at placeHoldersCount (.../node_modules/base64-js/index.js:23:11)
at Object.toByteArray (...node_modules/base64-js/index.js:42:18)
:
:
I would think that the GitHub API is sending me the correct data, so I figure that is not the issue.
Am I performing the decoding improperly or is there another problem I am overlooking?
Any help is appreciated.
I experimented a bit and found a solution by using a different base64 decoding library as follows:
var base64 = require('js-base64').Base64;
var contents = base64.decode(res.content);
I am not sure if it is mandatory to have an encoded string length divisible by 4 (clearly my 15263 character length string is not divisible by 4) but the alternate library decoded the string properly.
A second solution which I also found to work is specific to how to use the GitHub API. By adding the following to the GitHub API call header, I was also able to get the decoded file contents:
'accept': 'application/vnd.github.VERSION.raw'
After much experimenting, I think I nailed down the difference between the working and broken base64 decoding.
It appears GitHub Base-64 encodes with:
UTF-8 charset
Base 64 MIME encoder (RFC2045)
As opposed to a "basic" (RFC4648) Base64 encoder. Several languages seem to default to the basic encoder (including Java, which I was using). When I switched to a MIME encoder, I got the full contents of the file un-garbled. This would explain why switching libraries in some cases fixed the issue.
I will note the contents field contained newline characters - decoders are supposed to ignore them, but not all do, so if you still get errors, you may need to try removing them.
The media-type header will do the job better, however in my case I am trying to use the API via a GitHub App - at time of writing, GitHub requires a specific media type be used when doing that, and it returns the JSON response.
For some reason the Github APIs base64 encoded content doesn't decode properly at all the online base64 decoders I've tried from the front page of google.
Python works however:
import base64
base64.b64decode("ewogICAgInN3YWdnZXIiOiAiM...")

Why do need metadata information specifying the encoding?

I feel a bit of a chicken and egg problem if i write a html meta tag specifying charset as say UTF-16 - like how do we decode the entire HTTP Request in the first place if we didn't know its UTF-16 data ? I believe request header needs to handle this and by the time we try to read metadata like say html tag charset="utf-16" we already know its UTF-16 .
Besides think one level higher about header information like Request Headers - are passed in ASCII as a standard ?
I mean at some level we need to agree upon and you can't set a data that is needed to decode as a metadata information . Can anyone clarify this ?
I am a bit confused on the idea of specifying a data that is needed to interpret the whole data as a metadata information inside the original data .
In general how can any form of encoding work if we don't have a standard agreed upon language/encoding to convey the data about the data itself ?
For example I am informed that Apache default has 8859-1 as the standard . So would all client need to enforce that for HTTP Headers and interpret the real content as UTF-8 if we want UTF-8 for the content-type ?
What character encoding should I use for a HTTP header? is a closely related question
UTF-16 (and other) encodings use a BOM (Byte Order Mark) that is read at the start of the file and that signals which encoding is being used. Only after that, the encoded part of the file begins.
For example, for UTF-16, you'll have the bytes FE FF if big-endian and FF FE if little-endian words are being used.
You also often see UTF-8 BOMs, although they don't need to be used (and may confuse some XML parsers).

how to get gmail content with content has been encode

When i try to get subject and content email, i alwasy got this content has been encode, example:
subject: Thư cảm ơn
I got this subject has been encode: Th=C6=B0_c=E1=BA=A3m_=C6=A1n?=
so how to decode this content.
That string looks like it's supposed to be encoded in the format mandated by RFC 2047, except that it is corrupt: RFC 2047 encoded strings always begin with =? end end with ?=.
Assuming you can get a non-corrupt encoded string, you should expect to see not only the Subject but also the display names in To, From, and other headers to be encoded in this way. You might want to look into using a library that does complete MIME parsing for you. Appropriate available libraries will depend on what language you're using.

Resources