Trouble Converting a Hex String (in Lua) - string

I've been trying different ways to read this hex string, and I cannot figure out how. Each method only converts part of it. The online converters don't do it, and this is the method I tried:
function string.fromhex(str)
    return (str:gsub('..', function (cc)
        return string.char(tonumber(cc, 16))
    end))
end
packedStr = "1b4c756151000104040408001900000040746d702f66756e632e416767616d656e696f6e2e6c756100160000002b0000000000000203000000240000001e0000011e008000000000000100000000000000170000002900000000000008410000000500000006404000068040001a400000160000801e0080000a00800041c0000022408000450001004640c1005a40000016800c80458001008500000086404001868040018600420149808083458001008580020086c04201c5000000c640c001c680c001c600c301014103009c80800149808084450001004980c38245c003004600c4005c408000458002004640c40085800400860040018640400186c0440186004201c14003005c00810116000280450105004641c50280010000c00100025c8180015a01000016400080420180005e010001614000001600fd7f4580050081c005005c400001430080004700060043008000478001004300800047c003001e008000190000000405000000676d63700004050000004368617200040700000053746174757300040b000000416767616d656e696f6e000404000000746d7000040f000000636865636b65645f7374617475730004030000007477000409000000636861724e616d6500040900000066756c6c6e616d6500040a0000006775696c644e616d65000407000000737472696e670004060000006d617463680004060000006775696c6400040400000025772b0001010403000000756900040d00000064726177456c656d656e7473000407000000676d617463680004030000005f470004050000004e616d650004060000007461626c6500040900000069734d656d6265720004060000006572726f7200044e0000005468697320786d6c206973206e6f742070726f7065726c7920636f6e6669677572656420666f722074686973206368617261637465722e20506c6561736520636f6e74616374204b616575732100040900000071756575654163740000000000410000001800000018000000180000001800000018000000180000001900000019000000190000001a0000001a0000001a0000001a0000001b0000001b0000001b0000001b0000001b0000001b0000001c0000001c0000001c0000001c0000001c0000001c0000001c0000001c0000001c0000001c0000001d0000001d0000001e0000001e0000001e0000001f0000001f0000001f0000001f0000001f0000001f0000001f0000001f0000001f0000001f0000002000000020000000200000002000000020000000200000002000000021000000210000001f000000220000002400000024000000240000002500000025000000260000002600000027000000270000002900000005000
000020000006e0009000000400000001000000028666f722067656e657261746f7229002b000000370000000c00000028666f7220737461746529002b000000370000000e00000028666f7220636f6e74726f6c29002b000000370000000200000074002c000000350000000000000003000000290000002a0000002b000000010000000800000072657466756e6300010000000200000000000000"
local f = assert(io.open("unsquished.lua", "w+"));
f:write(packedStr:fromhex());
f:close()
This simply gives me a bunch of gibberish surrounded by a few readable strings.
Could someone please tell me how to convert the entirety of this string into readable format? Thank you!

Split your packedStr into pairs of two hex digits; each pair is one byte:
1b = 27
4c = 76
75 = 117
61 = 97
and so forth. When you call string.char() with the resulting decimal value, it returns the character (byte) with that code. Of the 128 ASCII codes, only 95 are printable characters, and the values from 128 to 255 have no ASCII meaning at all.
Thus, you'll always receive the gibberish text. Here's what you'd receive when trying to print each of the character separately: http://codepad.org/orM7pmAb and that is the only possible "readable" output.
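As a rough illustration of the same point, here is a minimal Python sketch (the names `PRINTABLE`, `count_printable` and `render` are made up for this example) that decodes only the first 12 bytes of the question's string and counts how many of them are printable ASCII. The first four bytes, 1b 4c 75 61, are ESC followed by "Lua", which is the signature at the start of precompiled Lua chunks, so the data is most likely bytecode rather than text.

```python
# First 12 bytes (24 hex digits) of packedStr from the question.
hex_prefix = "1b4c75615100010404040800"
raw = bytes.fromhex(hex_prefix)

# The 95 printable ASCII codes: 0x20 (space) through 0x7e (~).
PRINTABLE = set(range(0x20, 0x7F))

def count_printable(data):
    """Return how many bytes in data are printable ASCII."""
    return sum(b in PRINTABLE for b in data)

def render(data):
    """Show printable bytes as characters and everything else as a dot."""
    return "".join(chr(b) if b in PRINTABLE else "." for b in data)

print(render(raw))            # .LuaQ.......
print(count_printable(raw))   # 4
```

Only 4 of these 12 bytes are printable, which is why the written file looks like gibberish with a few readable strings in between.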

Related

Python 3.6 ASCII inside hex bytes

I am receiving binary data, such as
data = b'\xaa\x44\x12\x1c\x2a'
When I try to parse each byte - what I am actually parsing is
b'\xaaD\x12\x1c*'
Is there a reason why bytes 44 and 2a are converted from hex to ASCII?
Is there a way to prevent this conversion?
I have tried -
data = data.hex()
print(data)
# prints aa44121c2a
This somewhat maintains the format, but it converts the data to a string, so I can no longer iterate over each byte, only over each character.
Any suggestions?
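A small sketch of what is probably going on here, assuming standard Python 3 behaviour: the `D` and `*` appear only in the printed representation, because 0x44 and 0x2a happen to be printable ASCII codes. The stored values are unchanged, and iterating over a bytes object yields integers, so no conversion needs to be prevented.

```python
data = b'\xaa\x44\x12\x1c\x2a'

# Both literals describe exactly the same five bytes.
same = data == b'\xaaD\x12\x1c*'

# Iterating over bytes gives integers, one per byte.
values = list(data)

# Format each byte as two hex digits without losing the per-byte structure.
hex_pairs = [f"{b:02x}" for b in data]

print(same)       # True
print(values)     # [170, 68, 18, 28, 42]
print(hex_pairs)  # ['aa', '44', '12', '1c', '2a']
```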

How to represent this utf-8 encoded string in Rust?

On this RFC: https://www.rfc-editor.org/rfc/rfc7616#page-19 at page 19, there's this example of a text encoded in UTF-8:
J U+00E4 s U+00F8 n D o e
4A C3A4 73 C3B8 6E 20 44 6F 65
How do I represent it in a Rust String?
I tried https://mothereff.in/utf-8 and doing J\00E4s\00F8nDoe but it didn't work.
"Jäsøn Doe" should work fine. Rust source files are always UTF-8 encoded and a string literal may contain any Unicode scalar value (that is, any code point except surrogates, which must not be encoded in UTF-8).
If your editor does not support UTF-8 encoding, but supports ASCII, you can use Unicode code point escapes, which are documented in the Rust reference:
A 24-bit code point escape starts with U+0075 (u) and is followed by up to six hex digits surrounded by braces U+007B ({) and U+007D (}). It denotes the Unicode code point equal to the provided hex value.
suggesting the correct syntax should be "J\u{E4}s\u{F8}n Doe".
You can also refer to Rust By Example, since not everything is covered in the Rust book
(https://doc.rust-lang.org/stable/rust-by-example/std/str.html#literals-and-escapes)
You can use the syntax \u{your_unicode}
let unicode_str = String::from("J\u{00E4}s\u{00F8}n Doe");
println!("{}", unicode_str);
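To check the escapes against the byte sequence in the RFC, here is a quick sketch in Python rather than Rust (Python's `\u00e4` escape plays the role of Rust's `\u{E4}`). It assumes the name contains the space implied by the 0x20 byte in the RFC's hex dump.

```python
# The name written with code point escapes, including the space (0x20).
name = "J\u00e4s\u00f8n Doe"

# Encode to UTF-8 and format each byte as two uppercase hex digits.
encoded = name.encode("utf-8")
spaced = " ".join(f"{b:02X}" for b in encoded)

print(spaced)  # 4A C3 A4 73 C3 B8 6E 20 44 6F 65
```

The output matches the RFC's 4A C3A4 73 C3B8 6E 20 44 6F 65 once the two-byte sequences are grouped.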

Capital text with space. What kind of text format is this?

I've got a list of songs that look like this.
ＧＩＦＴ〜白〜(冬恋/君の歌をうたう)【完全生産限定盤】
The Latin letters GIFT here look odd, and I can't figure out how to make them read like normal text. For example, if you copy this word, there are no spaces between the letters, but it seems to be in a different text format.
Can someone help me convert this into normal text?
These are Unicode characters.
For example, the 'Ｇ' is
Unicode Character 'FULLWIDTH LATIN CAPITAL LETTER G' (U+FF27)
UTF-8 (hex) 0xEF 0xBC 0xA7 (efbca7)
You can copy the string into Notepad++ and convert it to hex (Plugins/Converter/ASCII -> HEX),
which gives EFBCA7EFBCA9EFBCA6EFBCB4 for the word 'ＧＩＦＴ'.
Then search for "Unicode EFBCA7" to find the information above.
These characters can be converted into normal Latin characters. For example, .NET has the Normalize function:
using System;
using System.Text;
public class Program
{
    public static void Main()
    {
        Console.WriteLine("Unicode:");
        String text = "ＧＩＦＴ";
        Console.WriteLine(text);
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        foreach(var b in bytes)
            Console.Write("{0:X} ", b);
        Console.WriteLine("\nASCII:");
        String text2 = text.Normalize(NormalizationForm.FormKC);
        Console.WriteLine(text2);
        bytes = Encoding.UTF8.GetBytes(text2);
        foreach(var b in bytes)
            Console.Write("{0:X} ", b);
    }
}
try it on .Net Fiddle
This will print out:
Unicode:
ＧＩＦＴ
EF BC A7 EF BC A9 EF BC A6 EF BC B4
ASCII:
GIFT
47 49 46 54
There are probably similar functions in other languages too. Now you know what you're looking for.
The search term "convert unicode FULLWIDTH LATIN" will help you.
If you can't find a function for that, you can also do your own conversion: the fullwidth forms U+FF01 to U+FF5E are the printable ASCII characters U+0021 to U+007E shifted by a fixed offset of 0xFEE0.
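Both approaches can be sketched in Python, assuming only the standard library. `unicodedata.normalize` with form NFKC is the counterpart of .NET's FormKC; the helper `fullwidth_to_ascii` and the constant `FULLWIDTH_OFFSET` are names invented for this example and implement the manual offset conversion.

```python
import unicodedata

# Fullwidth forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII U+0021..U+007E.
FULLWIDTH_OFFSET = 0xFEE0

def fullwidth_to_ascii(text):
    """Shift fullwidth ASCII variants back to ASCII; leave other characters alone."""
    return "".join(
        chr(ord(ch) - FULLWIDTH_OFFSET) if 0xFF01 <= ord(ch) <= 0xFF5E else ch
        for ch in text
    )

sample = "\uff27\uff29\uff26\uff34"  # fullwidth G, I, F, T

print(fullwidth_to_ascii(sample))             # GIFT
print(unicodedata.normalize("NFKC", sample))  # GIFT
```

Note that NFKC also folds many other compatibility characters, so the manual offset version may be preferable if only the fullwidth Latin range should change.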

Reading-in a binary JPEG-Header (in Python)

I would like to read in a JPEG-Header and analyze it.
According to Wikipedia, the header consists of a sequence of markers. Each marker starts with FF xx, where xx is a specific marker ID.
So my idea was to simply read the image in binary mode and search for the corresponding byte combinations in the binary stream. This should let me split the header into the corresponding marker fields.
For instance, this is what I get when I read the first 20 bytes of an image:
binary_data = open('picture.jpg','rb').read(20)
print(binary_data)
b'\xff\xd8\xff\xe1-\xfcExif\x00\x00MM\x00*\x00\x00\x00\x08'
My questions are now:
1) Why does Python not return nice chunks of 2 bytes (in hex format)?
I would expect something like this:
b'\xff \xd8 \xff \xe1 \x-' ... and so on. Some blocks delimited by '\x' are much longer than 2 bytes.
2) Why are there symbols like -, M, * in the returned string? Those are not characters I expect in the hex representation of a byte string (only 0-9 and a-f, I think).
Both observations hinder me in writing a simple parser.
So ultimately my question summarizes to:
How do I properly read-in and parse a JPEG Header in Python?
You seem overly worried about how your binary data is represented on your console. Don't worry about that.
The default built-in string representation that print(..) applies to a bytes object is simply "printable ASCII characters as themselves (with a few exceptions), all others as an escaped hex sequence". The exceptions are semi-special characters such as \, ", and ', which could mess up the string representation. But this representation does not change the values in any way!
>>> a = bytes([1,2,4,92,34,39])
>>> a
b'\x01\x02\x04\\"\''
>>> a[0]
1
See how the entire object is printed 'as if' it's a string, but its individual elements are still perfectly normal bytes?
If you have a byte array and you don't like the appearance of this default, then you can write your own. But – for clarity – this still doesn't have anything to do with parsing a file.
>>> binary_data = open('iaijiedc.jpg','rb').read(20)
>>> binary_data
b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x02\x01\x00H\x00H\x00\x00'
>>> ''.join(['%02x%02x ' % (binary_data[2*i],binary_data[2*i+1]) for i in range(len(binary_data)>>1)])
'ffd8 ffe0 0010 4a46 4946 0001 0201 0048 0048 0000 '
Why does python not return me nice chunks of 2 bytes (in hex-format)?
Because you don't ask it to. You are asking for a sequence of bytes, and that's what you get. If you want chunks of two-bytes, transform it after reading.
The code above only prints the data; to create a new list that contains 2-byte words, loop over it and convert each 2 bytes or use unpack (there are actually several ways):
>>> from struct import unpack
>>> wd = [unpack('>H', binary_data[x:x+2])[0] for x in range(0,len(binary_data),2)]
>>> wd
[65496, 65504, 16, 19014, 18758, 1, 513, 72, 72, 0]
>>> [hex(x) for x in wd]
['0xffd8', '0xffe0', '0x10', '0x4a46', '0x4946', '0x1', '0x201', '0x48', '0x48', '0x0']
I'm using the big-endian specifier > and unsigned short H in unpack, because JPEG stores its 2-byte marker codes and lengths in big-endian order. Check the documentation of the struct module if you want to deviate from this.
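Building on that, here is a minimal sketch of a header walker. It assumes the simple layout in which the file starts with the SOI marker FF D8 and every following segment is FF xx plus a 2-byte big-endian length that counts its own two bytes, and it stops at the SOS marker FF DA. The function name `iter_jpeg_segments` is hypothetical, and padding bytes and standalone markers are not handled.

```python
import struct

def iter_jpeg_segments(data):
    """Yield (marker, payload) for each header segment of a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (FF D8)")
    pos = 2
    while pos + 4 <= len(data):
        # Each segment starts with FF, a marker ID, and a big-endian length.
        prefix, marker, length = struct.unpack(">BBH", data[pos:pos + 4])
        if prefix != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        if marker == 0xDA:  # SOS: compressed image data follows, stop here
            break
        # The length includes its own two bytes but not the FF xx marker.
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length

# The 20 bytes shown in the answer above, followed by a dummy SOS marker.
sample = (b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x02\x01\x00H\x00H\x00\x00"
          b"\xff\xda\x00\x02")
segments = list(iter_jpeg_segments(sample))
for marker, payload in segments:
    print(f"marker ff{marker:02x}, {len(payload)} payload bytes")
```

For a real file, the same function could be fed `open('picture.jpg', 'rb').read()`.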

How to convert HEX to human readable

I'm trying to convert from a HEX to ASCII and I'm getting this message. I would like to understand how to interpret it in the correct way.
0x2b6162630704fe17
Using the npm module hex2ascii it returns this:
"+abc\u0007\u0004þ\u0017"
if I convert from an online converter, it returns :
+abcþ
Could someone help me to interpret this? I am using node.
Am I doing something wrong?
Appreciate the help!
If you look at the strings in the console, you will notice that the two strings you've posted are actually the same.
The gist is that the string contains nonprintable Unicode characters, which get escaped by the hex2ascii module.
The online converter you are using tries to display those characters. Since they are not printable, you simply cannot see them.
Let's convert the hex string
var conv = "2b6162630704fe17".match (/(..)/g).reduce ((a,c) => a + String.fromCharCode(parseInt (c,16)), "")
conv //"+abcþ"
It looks just like the String from the converter! Let's compare it to the other string
conv === "+abc\u0007\u0004þ\u0017" // true
Are you sure it's 8-bit ASCII?
If it is, each pair of hex characters represents one character code.
So:
2b6162630704fe17
First 2b, which is 2 * 16 + 11 = 43 - which is a plus sign
61, which is 6 * 16 + 1 = 97 = lowercase a
62, which is 6 * 16 + 2 = 98 = lowercase b
63, which is 6 * 16 + 3 = 99 = lowercase c
07, which is 0 * 16 + 7 = 7 = a special unprintable character (BEL).
See asciitable.com for a reference that maps the numbers to characters.
Based on the 07, I wonder if your data is truly ascii, or a different encoding.
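The same breakdown can be sketched in Python for all eight bytes. Decoding as Latin-1 is an assumption made here to mirror what String.fromCharCode does in the JavaScript above, where each byte maps to the code point with the same number.

```python
hex_text = "2b6162630704fe17"
raw = bytes.fromhex(hex_text)

# Print hex pair, decimal value, and the character if it is printable ASCII.
for b in raw:
    label = chr(b) if 0x20 <= b < 0x7F else "(not printable ASCII)"
    print(f"{b:02x} = {b:3d} = {label}")

# Latin-1 maps every byte to the code point with the same number.
text = raw.decode("latin-1")
print(repr(text))
```

Only the first four bytes are printable ASCII; 07, 04 and 17 are control codes, and fe is outside the ASCII range altogether.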

Resources