print(bytes(97).decode('utf8'))
This line is part of my project, and it's been giving me a headache for a while now. If I run it, the output is just empty space, with no error or anything else. The 97 came from encoding 'a' with UTF-8. I want to work with the encoded numbers, so I converted them to integers, but I can't get them to decode once I'm done working with them.
So if you run bytes(97), the output is b'\x00\x00\x00\x00... because an integer argument tells bytes() to create that many zero bytes, and 97 NUL characters print as nothing.
But if you run bytes([97]) you get b'a', which I assume is what you expected.
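To make the difference concrete, here is a minimal sketch (plain Python 3, nothing beyond the standard behavior):

print(bytes(97))                     # b'\x00\x00...\x00' -- 97 zero bytes
print(bytes([97]))                   # b'a' -- one byte with value 97
print(bytes([97]).decode('utf8'))    # a

# round-tripping a whole string through a list of integers
codes = list('abc'.encode('utf8'))   # [97, 98, 99]
print(bytes(codes).decode('utf8'))   # abc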
EDIT:
I just found this question which explains the why:
Converting int to bytes in Python 3
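For reference, the gist of that answer is int.to_bytes and int.from_bytes (standard library; the length and byte order here are just one plausible choice):

n = 97
b = n.to_bytes(1, byteorder='big')   # b'a'
print(b.decode('utf8'))              # a
print(int.from_bytes(b, 'big'))      # 97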
I just have a quick question on this bit of code I have here from the Cryptopals challenges, using Python 3, for XORing a string with one single character. The program takes in a hex string, decodes it, XORs it with a single character, does this for every possible character, then finds the "most English" line of XOR'd data. Here's my code snippet (which I admittedly took from a solutions page):
def singlechar_xor(input_bytes, key_value):
    """XORs every byte of the input with the given key_value and returns the result."""
    output = b''
    for char in input_bytes:
        output += bytes([char ^ key_value])
    return output
I know what is happening and I understand what is supposed to happen; I'm just not sure how bytes behave and which types are and aren't supposed to be XOR'd. Why do I need the brackets around char ^ key_value? If I remove the brackets, my output becomes a bunch of 0's. What is the result of the XOR of the character and the key_value? If someone could kindly explain so I could have a better understanding going forward in these challenges, I'd GREATLY appreciate it <3
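For reference, the brackets matter for the same reason as bytes(97) versus bytes([97]) above; a quick sketch of what each expression produces:

char, key_value = 97, 3
x = char ^ key_value        # iterating over bytes yields ints, so this is int XOR int: 98
print(bytes([x]))           # b'b' -- one byte with value 98
print(bytes(x))             # b'\x00\x00...' -- 98 zero bytes, hence the bunch of 0's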
I have a program that grabs values stored on the local file system and stores them in a variable. I then attempt to URL-encode them for use in a web API call. I noticed, however, that several of my calls were producing errors, and after researching it appears that the encoding is not working as expected.
This string encoding produces the correct result.
newstring = urllib.parse.quote(u"Müller".encode('utf8'))
print(newstring)
Output
M%C3%83%C2%BCller
However, this code does not produce the correct output
string2 = "Müller"
newstring2 = urllib.parse.quote(string2.encode('utf8'))
print(string2)
Output
Müller
Any idea what the difference is here and how I can fix it so that the second bit of code produces accurate results?
Perhaps you meant to write print(newstring2) in your second example? That will print the percent-encoded result. (Incidentally, the M%C3%83%C2%BCller in your first example is a double-encoded ü, which suggests that string was already mis-decoded before you quoted it; the correct encoding is the one shown below.)
In [1]: string2 = "Müller"
In [2]: print(urllib.parse.quote(string2.encode('utf8')))
M%C3%BCller
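As a sanity check, the encoding also round-trips (standard library only):

from urllib.parse import quote, unquote

s = "Müller"
encoded = quote(s.encode('utf8'))   # quote() also accepts str directly in Python 3
print(encoded)                      # M%C3%BCller
print(unquote(encoded))             # Müller -- unquote decodes %-escapes as UTF-8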
I've got Google Protocol Buffers 80% working in Python 3. My .proto file works, I'm encoding data, life is almost good.
The problem is that I can't ParseFromString the result of SerializeToString.
When I print SerializeToString it looks like what I'd expect, a fairly compact binary representation (preceded by b').
My guess is that perhaps this is a difference in the way Python 2 and Python 3 handle strings. The output of SerializeToString is bytes, not a string.
Printed output of SerializeToString (the Python type is bytes):
b'\x10\xd7\xeb\x8e\xcd\x04\x1a\x0cnamegoeshere2#\x08\x80\xf8\xde\xc3\x9f\xb0\x81\x89\x14\x11\x00\x00\x00\x00\x00\x80d\xc0\x19\x00\x00\x00\x00\x00\xc0m#!\x00\x00\x00\x00\x00\x80R\xc0)\x00\x00\x00\x00\x00x\xb7\xc01\x00\x00\x00\x00\x00\x8c\x95#9\x00\x00\x00\x00\x00\x16\xb2#'
result of ParseFromString(message):
None
No error is provided...
So my best guess is that all I need to do is .decode() the bytes object generated; the problem is that I have no clue what the encoding is. I've tried UTF-8, UTF-16, Latin-1, and a few others without success. My Google-fu is strong, but I haven't found anything on this.
Any help would be appreciated.
ParseFromString is a method -- it does not return anything, but rather fills in self with the parsed content. Use it like:
message = MyMessageType()
message.ParseFromString(data)
print(message.some_field)
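A round-trip sketch to make the pattern explicit (MyMessageType and some_field are stand-in names from above, not a real schema):

message = MyMessageType()
message.some_field = 42
data = message.SerializeToString()   # bytes -- no .decode() needed or wanted

parsed = MyMessageType()
parsed.ParseFromString(data)         # fills `parsed` in place; don't use the return value
print(parsed.some_field)             # 42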
I'm running an Excel macro that calls a LISP script, which has always worked fine in the past, but now it's coming up with this error:
decoding error on stream
#<SB-SYS:FD-STREAM for "file:Y\...\FILE0617.CMT" {27B22531}>
(:EXTERNAL-FORMAT :CP1252):
the octet sequence (141) cannot be decoded
What specifically should I be looking for that might be causing this error? The input file is formatted the same as the ones that worked in the past without error.
What does octet sequence 141 refer to?
Byte 141 (0x8D) is a c-cedilla (ç) in Mac Roman, but it is one of the unassigned code points in CP1252, which is why the decode fails. I'm guessing that you got someone with a name with a ç in it for the first time, and the file isn't really CP1252, so Lisp doesn't handle the encoding right.
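You can check this guess from Python, which ships both codecs (this demonstrates the byte, not your actual file):

print(bytes([141]).decode('mac_roman'))   # ç
try:
    bytes([141]).decode('cp1252')
except UnicodeDecodeError as e:
    print(e)                              # byte 0x8d maps to <undefined> in CP1252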
I needed to create a custom file format with embedded meta-information. Instead of whipping up my own format, I decided to just use Lua.
texture
{
    format=GL_LUMINANCE_ALPHA;
    type=GL_UNSIGNED_BYTE;
    width=256;
    height=128;
    pixels=[[
<binary-data-here>]];
}
texture is a function that takes a table as its sole argument. It then looks up the various parameters by name in the table and forwards the call on to a C++ routine. Nothing out of the ordinary, I hope.
Occasionally the files fail to parse with the following error:
my_file.lua:8: unexpected symbol near ']'
What's going on here?
Is there a better way to store binary data in Lua?
Update
It turns out that storing binary data in a Lua string is non-trivial, but it is possible if you take care with three sequences:
1. Long-format string literals cannot contain an embedded closing long bracket (]], ]=], etc.).
This one is pretty obvious.
2. Long-format string literals cannot end with something like ]== that would match the chosen closing long bracket.
This one is more subtle. Luckily, the script will fail to compile if you get it wrong.
3. The data cannot embed \n or \r.
Lua's built-in line-end processing messes these up. This problem is much more subtle: the script will compile fine, but it will yield the wrong data. CR (13) becomes LF (10), CR LF (13 10) becomes a single LF (10), etc.
To get around these limitations, I split the binary data on \r and \n, pick a long-bracket level that works for each piece, and finally emit Lua that concatenates the parts back together. I used a script that does this for me (a sketch follows the example below).
input:  XXXX\nXX]]XX\r\nXX]]XX]=
output:
texture
{
    --other fields omitted
    pixels = '' ..
        [[XXXX]] ..
        '\n' ..
        [=[XX]]XX]=] ..
        '\r\n' ..
        [==[XX]]XX]=]==];
}
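Here is a minimal sketch of such a script in Python (my reconstruction of the approach above, sketched on text for readability; the names are made up):

import re

def lua_quote_chunk(chunk):
    """Wrap a CR/LF-free chunk at the lowest long-bracket level that is safe."""
    level = 0
    while True:
        close = ']' + '=' * level + ']'
        # caveat 1: no embedded closing bracket; caveat 2: no trailing ']' + '='*level
        if close not in chunk and not chunk.endswith(']' + '=' * level):
            return '[' + '=' * level + '[' + chunk + close
        level += 1

def lua_encode_binary(data):
    """Emit a Lua expression that reproduces data, keeping CR/LF out of long strings."""
    parts = []
    for piece in re.split(r'(\r\n|\r|\n)', data):
        if piece in ('\r\n', '\r', '\n'):
            # line ends travel in short strings as explicit escapes
            parts.append("'" + piece.replace('\r', '\\r').replace('\n', '\\n') + "'")
        elif piece:
            parts.append(lua_quote_chunk(piece))
    return ' .. '.join(parts) or "''"

print(lua_encode_binary('XXXX\nXX]]XX\r\nXX]]XX]='))
# [[XXXX]] .. '\n' .. [=[XX]]XX]=] .. '\r\n' .. [==[XX]]XX]=]==]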
Lua is able to encode most characters in long-bracket format, including nulls. However, Lua opens the script file in text mode, and this causes some problems. On my Windows system, the following characters are problematic:
Char code(s)     Problem
--------------   ---------------------------------------------
13 (CR)          Is translated to 10 (LF)
13 10 (CR LF)    Is translated to 10 (LF)
26 (EOF)         Causes "unfinished long string near '<eof>'"
If you are not using Windows, these may not cause problems, but there may be different text-mode-based problems.
I was only able to produce the error you received by encoding multiple close brackets:
a=[[
]]] --> a.lua:2: unexpected symbol near ']'
But this was easily fixed with the following:
a=[==[
]]==]
The binary data needs to be encoded into printable characters. The simplest method for decoding purposes would be to use C-like escape sequences for all bytes. For example, hex bytes 13 41 42 1E would be encoded as '\19\65\66\30'. Of course, then the encoded data is three to four times larger than the source binary.
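A one-line sketch of that escape encoding in Python (the helper name is made up):

def lua_escape_bytes(data):
    # encode raw bytes as a Lua short string of decimal escapes
    return "'" + ''.join('\\%d' % b for b in data) + "'"

print(lua_escape_bytes(bytes([0x13, 0x41, 0x42, 0x1E])))   # '\19\65\66\30'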
Alternatively, you could use something like Base64, but that would have to be decoded at runtime instead of relying on the Lua interpreter. Personally, I'd probably go the Base64 route. There are Lua examples of Base64 encoding and decoding.
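If you go the Base64 route, the writer side is trivial; the extra work is the runtime decoder on the Lua side (sketch; the field name is made up):

import base64

def lua_base64_field(name, data):
    # emit a Lua assignment carrying the bytes as Base64 text (decode at runtime in Lua)
    return '%s="%s";' % (name, base64.b64encode(data).decode('ascii'))

print(lua_base64_field('pixels_b64', b'\x13\x41\x42\x1e'))   # pixels_b64="E0FCHg==";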
Another alternative would be to have two files: use a well-defined image file format (e.g. TGA) that is pointed to by a separate Lua script carrying the additional metadata. If you don't want two files to move around, they could be combined in an archive.