Decoding Aztec barcode - barcode-scanner

I scanned an Aztec code and got the following result:
AQEpKCzOLQMf85T0bLRyJEZ3qukry/dsKwebnG2xESMj8JSSgmc=
I would like to know the encoding format and, if possible, the readable text, so I can see what information is hidden behind the code!
Thanks.

Your scanned data is Base64-encoded. You can decode it to binary in almost any programming language, or with a tool like https://www.base64decode.org/
I tried to decode the value, but it isn't readable in any common character set. You'll have to supply more information to get any relevant answers. The author of the barcode data would be your best bet (what is the code used for?)
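For reference, the first step looks much the same in any language; here is a minimal Python sketch that decodes the string and dumps the raw bytes so you can look for recognizable structure (length, magic bytes, printable runs):

    import base64

    scanned = "AQEpKCzOLQMf85T0bLRyJEZ3qukry/dsKwebnG2xESMj8JSSgmc="
    raw = base64.b64decode(scanned)

    print(len(raw), "bytes")
    print(raw.hex())                                  # hex dump: look for magic bytes or structure
    print(raw.decode("utf-8", errors="replace"))      # not readable as plain text, as noted above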

Related

How to output IBM-1027-codepage-binary-file?

My output (CSV/JSON) from my newly created program (using .NET Framework 4.6) needs to be converted to an IBM-1027-codepage binary file (to be imported into a Japanese client's IBM mainframe).
I've searched the internet and know that Microsoft doesn't have an equivalent to the IBM-1027 code page.
So how could I output an IBM-1027-codepage binary file if I have a UTF-8 CSV/JSON file in my hand?
I'm asking around for other solutions, but for now I think I'm going to have to suggest you do the conversion manually; I assume whichever language you're using allows you to do a hex conversion, at worst. For mainframes, the codepage is usually implicit in the dataset; it isn't something that is included in a file header.
So, what you can do is build a conversion table from https://www.ibm.com/support/knowledgecenter/en/SSEQ5Y_5.9.0/com.ibm.pcomm.doc/reference/html/hcp_reference26.htm. Grab a character from your JSON/CSV file, convert it to the appropriate CP1027 byte value, and write that byte to the output file. Repeat until EOF. (Note: write the actual binary bytes, not the ASCII representation of the hex digits.) Make sure that when the client transfers the file to their system, they perform a binary transfer.
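A minimal sketch of that table-driven loop, shown in Python for brevity (the same loop is straightforward in .NET); the table entries and file names are placeholders, not a verified CP1027 mapping:

    # Sketch of the manual table-driven conversion. The entries below are
    # illustrative placeholders only; fill the table in from IBM's CP1027 chart.
    CP1027_TABLE = {
        " ": 0x40,   # assumed: EBCDIC space
        "A": 0xC1,   # assumed: Latin capitals keep their usual EBCDIC positions
        # ... one entry per character that can appear in the CSV/JSON ...
    }

    def to_cp1027(text: str) -> bytes:
        out = bytearray()
        for ch in text:
            try:
                out.append(CP1027_TABLE[ch])   # the byte value itself, not its hex text
            except KeyError:
                raise ValueError(f"no CP1027 mapping defined for {ch!r}")
        return bytes(out)

    # Hypothetical file names:
    with open("output.csv", encoding="utf-8") as src, open("output.cp1027.bin", "wb") as dst:
        dst.write(to_cp1027(src.read()))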
If you wanted to get more complicated than that, you could look at enhancing/overriding part of the converter for CP500, which does exist on Microsoft Windows. One of the design points for EBCDIC was to make character conversions as simple as possible, so many of the CP500 characters' hex representations are the same as in CP1027, with the exception of the Kanji characters.
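As a rough sketch of that idea, again in Python, whose standard codecs include cp500 (the override table is an assumption you would have to fill in with the characters that actually differ):

    # Reuse the existing CP500 converter and override only the characters
    # whose CP1027 byte values differ. The override entries are placeholders.
    CP1027_OVERRIDES = {
        # "x": 0x00,   # fill in from a CP1027 vs CP500 comparison
    }

    def to_cp1027_via_cp500(text: str) -> bytes:
        out = bytearray()
        for ch in text:
            if ch in CP1027_OVERRIDES:
                out.append(CP1027_OVERRIDES[ch])
            else:
                out += ch.encode("cp500")   # byte values match CP1027 for most characters
        return bytes(out)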
This is a separate answer, from a colleague; I don't have the ability to validate it, I'm afraid.
Transfer the file to the host in raw mode and just tag it as CCSID 1208.
For USS, export _BPXK_AUTOCVT=ALL; oedit/obrowse then handles the conversion automatically.

Why do some files appear as partial gibberish when opened in a text editor?

I often come across the situation where I would like to read a file's original content in a human-readable way. When opening this kind of file in a text editor, why is it usually gibberish with some complete and comprehensible text mixed in? I would think that if the file were converted to something other than its original written format, there would be no comprehensible text remaining, yet I often find it is somewhere in between.
For example, I know that if I open a binary in a text format, there will be nothing comprehensible left that isn't purely accidental.
[Screenshot: example of partially gibberish text]
Why is there complete text in here mixed with gibberish? Does that mean that if I open the file with some sort of different encoding (I don't know what's possible), the file will come through as fully readable text? I would understand if it were all-or-nothing (either gibberish and non-readable, or human language), but I don't understand the in-between.
Please provide educational responses, rather than "because that's the way it is" type answers.
Those are formatting characters; there is no standard use, and they vary by the format of the file in question. You can still extract the text as needed with a fair knowledge of grep and regex, but it won't be fun. The best bet is to open the file with the software that can read it properly, as a text editor like gedit or Notepad++ will read the raw data and display that. Adobe's PDF format has text embedded, for instance, and all that gibberish is instructions for the Reader software for displaying it correctly on the screen, while still allowing for relatively straightforward text extraction when required.
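If all you want is the embedded text, a crude strings-style extraction often gets you part of the way on formats that store their text uncompressed; here is a minimal Python sketch (the file name and the minimum run length of 4 are arbitrary choices):

    import re

    # Pull out runs of 4 or more printable ASCII characters from an otherwise binary file.
    with open("document.pdf", "rb") as f:
        data = f.read()

    for run in re.findall(rb"[ -~]{4,}", data):
        print(run.decode("ascii"))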
Editors have no real way to interpret the special formatting characters, and would need to be loaded with APIs for every conceivable program. They would also need to be updated constantly, since formats change regularly for a variety of reasons. Many times it is just to keep the files from being backward compatible with their own or other products, forcing an upgrade path. Microsoft is rather famous for that, but it is by far not the only company to do so.

Reading jpeg image using c/c++

How can I read the pixel values of a JPEG image using C/C++, without using any library?
I read about how compression takes place in JPEG in my course; I want the header information.
For the syntax of the file format you can check Wikipedia.
Each segment has its own marker. The variable-length segments have a two-byte field for their length. So far this is not really a problem, as you are able to extract all segments using this information (or at least it seems so at first glance).
The more problematic part is to actually do something useful with the data inside the segments. The Wikipedia page provides information on this topic, but it will require quite some mathematical knowledge to actually decode the data and get at the pixels.
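A minimal sketch of that first step, walking the top-level segments by marker and length (shown in Python for brevity; the same walk translates directly to C/C++, and it assumes a well-formed file):

    import struct

    def list_segments(path):
        # Walk the top-level JPEG segments and print each marker and its declared length.
        with open(path, "rb") as f:
            data = f.read()
        assert data[0:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
        pos = 2
        while pos < len(data) - 1:
            assert data[pos] == 0xFF, "expected a marker byte"
            marker = data[pos + 1]
            pos += 2
            if marker == 0xD9:                              # EOI: end of image
                break
            if marker == 0x01 or 0xD0 <= marker <= 0xD7:    # standalone markers, no length field
                continue
            (length,) = struct.unpack(">H", data[pos:pos + 2])   # big-endian, includes itself
            print(f"marker 0xFF{marker:02X}, {length} bytes")
            pos += length
            if marker == 0xDA:                              # SOS: entropy-coded data follows until EOI
                break

    list_segments("photo.jpg")    # hypothetical file name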
I finally found some really helpful links.
link 1
link 2
Thanks for the help and support.

How to determine codepage of a file (that had some codepage transformation applied to it)

For example, if I know that some garbled sequence (say Ä‡) should actually be ć, how can I find out the codepage transformation that occurred there?
It would be nice if there was an online site for this, but any tool will do the job. The final goal is to reverse the codepage transformation (with iconv or recode, but tools are not important, I'll take anything that works including python scripts)
EDIT:
Could you please be a little more verbose? Do you know for certain that some substring should be exactly a given value, or do you just know the language, or are you just guessing? And the transformation that was applied, was it correct (i.e. valid in the other charset)? Or was it a single transformation from charset X to Y while the text was actually in Z, so it's now wrong? Or was it a series of such transformations?
Actually, ideally I am looking for a tool that will tell me what happened (or what possibly happened) so I can try to transform it back to the proper encoding.
What (I presume) happened in the problem I am trying to fix now is what is described in this answer: a UTF-8 text file got opened as an ASCII text file and then exported as CSV.
It's extremely hard to do this in general. The main problem is that all the ASCII-based encodings (ISO-8859-*, DOS and Windows codepages) use the same range of byte values, so no particular byte or set of bytes will tell you which codepage the text is in.
There is one encoding that is easy to tell apart. If the data is valid UTF-8, then it's almost certainly not ISO-8859-* or any Windows codepage, because while all byte values are valid in those encodings, the chance of a valid UTF-8 multi-byte sequence appearing in such text is almost zero.
Then it depends on which further encodings may be involved. A valid sequence in Shift-JIS or Big5 is also unlikely to be valid in any other encoding, while telling apart similar encodings like cp1250 and ISO-8859-2 requires spell-checking the words that contain the three or so characters that differ and seeing which way you get fewer errors.
If you can limit the number of transformations that may have happened, it shouldn't be too hard to put together a Python script that tries them out, eliminates the obviously wrong ones, and uses a spell-checker to pick the most likely. I don't know of any tool that would do it.
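A minimal Python sketch of that brute-force idea; the candidate codepages are assumptions, and the exact-match check at the end stands in for the spell-checker:

    # Try to reverse a mojibake transformation by brute force: re-encode the garbled
    # text with each codepage it might have been misread as, then re-decode the bytes
    # with each codepage the text might actually have been in.
    CANDIDATES = ["cp1250", "cp1252", "iso-8859-1", "iso-8859-2"]

    def candidate_repairs(garbled: str):
        for misread_as in CANDIDATES:
            try:
                raw = garbled.encode(misread_as)          # undo the wrong decoding step
            except UnicodeEncodeError:
                continue
            for actually_was in CANDIDATES + ["utf-8"]:
                try:
                    yield misread_as, actually_was, raw.decode(actually_was)
                except UnicodeDecodeError:
                    continue

    # "Ä‡" is what UTF-8 "ć" looks like when its bytes are misread as cp1252.
    for wrong, right, fixed in candidate_repairs("Ä‡"):
        if fixed == "ć":
            print(f"text was {right}, but it was decoded as {wrong}")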
Tools like that were quite popular a decade ago, but now it's quite rare to see damaged text.
As far as I know, it can be done effectively at least for a particular language. So, if you assume the text's language is Russian, you could collect some statistical information about characters or small groups of characters from a lot of sample texts. E.g., in English the "th" combination appears more often than "ht".
So you could then try out the different encoding combinations and choose the one whose output has the more probable text statistics.
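A tiny sketch of that scoring idea; the bigram list is just an illustrative English sample, where a real scorer would be built from sample texts in the assumed language:

    # Score candidate decodings by how many common bigrams they contain;
    # higher means the text looks more like the assumed language.
    COMMON_BIGRAMS = {"th", "he", "in", "er", "an", "re", "on", "at", "en", "nd"}

    def bigram_score(text: str) -> int:
        t = text.lower()
        return sum(t.count(bg) for bg in COMMON_BIGRAMS)

    # Pick whichever candidate repair scores highest, e.g.:
    # best = max(candidate_texts, key=bigram_score)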

What is the advantage of using Base64 encoding?

What is the advantage of Base64 encoding?
I would like to understand it better. Do I really need it? Can't I simply use pure strings?
I heard that the encoding can be up to 30% larger than the original (at least for images).
Originally some protocols only allowed 7-bit, and sometimes only 6-bit, data.
Base64 lets you encode 8-bit data using only 6 bits per output character, for transmission over those types of links.
Email is an example of this.
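To get a feel for the 8-bit-to-6-bit repacking and the size overhead it brings, a quick check in Python:

    import base64

    raw = bytes(range(256))        # arbitrary 8-bit data
    enc = base64.b64encode(raw)
    print(len(raw), len(enc))      # 256 -> 344: four output characters per three input bytes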
The primary use case of base64 encoding is when you want to store or transfer data with a restricted set of characters; i.e. when you can't pass an arbitrary value in each byte.
<img alt="Embedded Image"
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADIA..." />
This code will show the encoded image, but no one can link to it from another website and use up your traffic.
The advantage of Base64 encoding, as somebody said, is that it lets you transmit binary data as (most commonly) ASCII characters. Because the receiving end can very likely handle ASCII, it's a nice way to transfer binary data via a text stream.
If your situation can handle native binary data, that will most likely yield better results, in terms of speed and such, but if not, Base64 is most likely the way to go. JSON is a great example of when you would benefit from something like this, or when the data needs to be stored in a text field somewhere. Give us some more details and we can provide a better-tailored answer.
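For instance, a minimal sketch of carrying binary data inside JSON, which only allows text (the field names and bytes here are made up):

    import base64, json

    blob = b"\x89PNG\r\n\x1a\n..."     # some binary payload (made-up bytes)
    payload = {"name": "photo.png", "data": base64.b64encode(blob).decode("ascii")}
    text = json.dumps(payload)         # safe to store in any text field

    restored = base64.b64decode(json.loads(text)["data"])
    assert restored == blob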
One application is to transfer binary data in contexts where only characters are allowed, e.g. in XML documents/transfers. XML-RPC is an example of this.
Convert BLOB data to string and back...
Whether or not to use it depends on what you're using it for.
I've used it mostly for encoding binary data to pass through a mechanism that has really been created for text files. For example, when passing a digital certificate request around or retrieving the finished digital certificate: in those cases, it's often very convenient to pass the binary data as Base64 via a text field on a web form.
I probably wouldn't use it if you have something that is already text and you just want to pass it somewhere.
I use it for passing around files that tend to get chewed up by email programs because they look like text files (e.g. HL7 transcripts for replay).
