So Ive got a string of:
YDNhZip1cDg1YWg4cCFoKg==
that needs to be decoded using Pythons Base64 module.
Ive written the code
import base64
test = 'YDNhZip1cDg1YWg4cCFoKg=='
print(test)
print(base64.b64decode(test))
which gives the answer
b'`3afup85ah8p!h'
when, according to the website decoders Ive used, its really
`3afup85ah8p!h
Im guessing that its decoding the additional quotes.
Is there some way that I can save this variable with a delimiter, as another type of variable, or run the b64encode on a section of the string as slice doesnt seem to work?
b' is Python's way of delimiting data from bytes, see: What does the 'b' character do in front of a string literal?
i.e., it is decoding it correctly.
Related
I'm trying to fix the string I'm getting from my python script.
I'm doing a call to an API, but it is returning me utf8 String that is still containing unicode encoded characters.
stuff like "Ok\u00c9" should be "Oké".
I tried converting it, but all efforts to fix it seem to result in errors or in the same result. is there someone who could fix this for me in Python 3?
print('\u00c9'.encode().decode('unicode-escape'))
>> é
print('Ok\u00c9'.encode().decode('unicode-escape'))
>> should print 'Oké'
>> but gives an error
hope you guys know the solution. thanks in advance!
Ive found the problem. The encoding decoding was wrong. The text came in as Windows-1252 encoding.
I've use
import chardet
chardet.detect(var3.encode())
to detect the proper encoding, and the did a
var3 = 'OK\u00c9'.encode('utf8').decode('Windows-1252').encode('utf8').decode('utf8')
conversion to eventually get it in the right format!
I created a .docx file.
Now, I do this:
// read the file to a buffer
const data = await fs.promises.readFile('<pathToMy.docx>')
// Converts the buffer to a string using 'utf8' but we could use any encoding
const stringContent = data.toString()
// Converts the string back to a buffer using the same encoding
const newData = Buffer.from(stringContent)
// We expect the values to be equal...
console.log(data.equals(newData)) // -> false
I don't understand in what step of the process the bytes are being changed...
I already spent sooo much time trying to figure this out, without any result... If someone can help me understand what part I'm missing out, it would be really awesome!
A .docXfile is not a UTF-8 string (it's a binary ZIP file) so when you read it into a Buffer object and then call .toString() on it, you're assuming it is already encoding as UTF-8 in the buffer and you want to now move it into a Javascript string. That's not what you have. Your binary data will likely encounter things that are invalid in UTF-8 and those will be discarded or coerced into valid UTF-8, causing an irreversible change.
What Buffer.toString() does is take a Buffer that is ALREADY encoded in UTF-8 and puts it into a Javascript string. See this comment in the doc,
If encoding is 'utf8' and a byte sequence in the input is not valid UTF-8, then each invalid byte is replaced with the replacement character U+FFFD.
So, the code you show in your question is wrongly assuming that Buffer.toString() takes binary data and reversibly encodes it as a UTF8 string. That is not what it does and that's why it doesn't do what you are expecting.
Your question doesn't describe what you're actually trying to accomplish. If you want to do something useful with the .docX file, you probably need to actually parse it from it's binary ZIP file form into the actual components of the file in their appropriate format.
Now that you explain you're trying to store it in localStorage, then you need to encode the binary into a string format. One such popular option is Base64 though it isn't super efficient (size wise), but it is better than many others. See Binary Data in JSON String. Something better than Base64 for prior discussion on this topic. Ignore the notes about compression in that other answer because your data is already ZIP compressed.
I have two encoded strings that used same encoding method but i don't know what type it is.
I have tried using base64 decode but it didn't work.
This is the first encoded string I have 3qpY0Vw86MZykGfqc7jnVg==
This is the second encoded string I have nB6dtl3iA5IE1Z+g9SpBrw==
They are using same encoding method.
I want to know what type of encoding that used in that strings. Also I want to know how to decode it.
Those base64 payload may be containing something else than a string, like a raw binary, an image, a ciphered payload, etc. that can't be displayed as text.
base64 is not exclusively used to encode text.
For example to save to a file:
$ printf %s 'iVBORw0KGgoAAAANSUhEUgAAAGAAAABgCAYAAADimHc4AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAAZdEVYdFNvZnR3YXJlAEFkb2JlIEltYWdlUmVhZHlxyWU8AAAMfUlEQVR4Xu1bC3BU1Rn+7mN3E8IjEiAY3iQxIA8bKiJSRUofjLZKx0cf6mgfap3O6NjptNNqW6vjYDt2+pip09ZXqXa01amIoqJTpVYIijYIpFEgBAMkhEB4JCG7e/fe2+8/ezdsHhYq7t7d5H6ZO3tz7tm7e//vnO9/nLOaSyCAb9C91wA+ISDAZwQE+IyAAJ8REOAzAgJ8RkCAzwgI8BkBAT4jIMBnBAT4jIAAnzHoCbCbt8La+CicQ41eS25hUBOQ+M9LiK++A7G1K5DY9Be4iZh3JXcwKMvRbvQYrA0PI1H7NziH9/IpNWjDShD+/A8Rmne11ys3MOgIcKMdiL34M9hbVnPERwEjDE3a7Tj0sZWIXH4fjCnzk51zAINOgrRwEQ28AG7HgaTxOfplBsi507aDM+OPcI7s83r7j8HnA3Qd5uxLEVpyG3C83WsUDkiCbiJR/wqsd56gg8gNfzAonbAWHobQwm9Ar1oKN9bptbJdN9RrYsNDdNAvq3O/MWijIH30FIQ//V1ow8dS/y2vleAscOPHEXtlBZyWOq/RPwzqMNSctlBFPoh2IhVrKCmiP3AP7Ub8jQfhdB9W7X4hbwlwY13ovKuCDrXZaxkY5uzLEVr8HfqDE4ZWJIQLkXj7cSS2PEdyvAs+IG8JiD5yFWB1o+ueGUg0bvRa+0OL0B8w9jemMTKi9KSsrWl89IKRsJ75Hux9tarND+QlAdFHvwp7z2YgVEiNH4PuB5Yh9tyPvKv9oZ85i1ER/QEN7jqJEyTQKbuhAsQevxHOsRbVlm3kHQGx1XfAaniDkQ6NTykRI+pFJYjXPILuP12LD1MTc8ZnEF58K+DYtL/jtZIEMwL3SBPiL9ytZC3byCsCXCsKu+55ZrZu0tAp8aac6EzA7J3r0P2H5XAO70m290HovGsQqr5S5QDpBQCtcBSsTY/Bqn3Ka8ke8ooAjXJRcNNqGONncSQnkkZMI0EzC+A0vY3ow1dT199NtqeDEhS64FswJp8L2PHeJIwoRfzp22DvHeB9GUTeSZBeMgUFtzwPo/wiGp9yQknpgUgSSZIZEF15HRLbXmCfE3IjEH9gSpJWNEaRmCJQOeXCYvqDG+B0HlJt2UDeESDQjBAKr38M5tzlHPWM6dPLCkIC/YPbfQTRp29F/B+/7FeGDs1laDr/WjoGvpckpiD3co42I7bq+0lysgDjLsI7zzuYZy8Tq8E58D5cifOZ5aoYn9B4Lka0KUk4th/6xHMYkg5X1wR62Wy4bbvgHNyp/u95H52y/cGb0EaeCWPSPNWWSQyKcrS9/TXE1vw0uerFUazkxIOEnRqfUK9cgvDS22GUzfGu8H0tdYhztMtrOnlK1hJxFIrUTfyEassU8lKC+sI4awkKvvJ7JlsLaVWrl1+QmeCSELt+LXOFO2EzhE2NOEP8wYLroQ8brcLTHn8gRTuSEX3i2xkPTXNWgmRhxd61AW7nQcrB+F6jeiBoI8bCrLhI9Xdb36MtnaQh5Zr4BYNEtDfCad5GR10IfWy5ahMSnI42uPu3KeJSn6P6MzmTtYPQ7EtVWyaQswQ4O99A7NkfwN6xLlnH4eDURpb2GHUgiMYbVUtUycGlHMnSpDjs5EWSIElXRyucDzbxAyxo489WUZMx4Rw4+7bCaf+A/aRrmj9o2sQ8oTgZumYAOUmACIG1cSXs3Rvhdh3kTFhPA21WRoVNTR89lUQMPCNEcozKxWpNwKVBZRSDJPQYledCkLO3lpHSUbVMqY8cB23sdDiNNSp6IlXCgkr3NImgeM9Q9RXq/R83cpIAp3EjrLf+TDlpUyNUDOgeb4ezu4YJ1ha4jFzEiPoIGo5y0hdiakOinlETKCF74EpmzJnTQwLPXRLp7K9jJNRA41dyFsxRI93e8TqlyEr6BMqguehmhJfcBn0484YMIOcIkFFnbXgI9vuv8tslIxN1iAFJhhs9qiTEYcbqNJOMeDeNM7ZXiJmC6LxeOgPoJHltDDfVvTyNlxlEP+G0baf+10E7YyrMmZ8F4p2KBG3YGYgsuxPhi2l8Sl+mkHszQJwvdd/ZU8tRyGQobeQKeoiId3EE16t+9l7KE0e5VjxB1XXSoY86E/rkavbvJGEMN3s5Z5IgURKlSpGpGQgt/Dpn3gFEaHiz+sqevplCzuUBIg3ukb3U/zeR2Px32I3rSYSjFlAGioRSMbtkvzqdqc5QNDTnMujjKr0eSTjU9kTNI2qXnHucOi/SlgbX6mYOEUFk+S9gTF/EUT/eu5JZ5GwiJl/KbW9S0pF49xkeq9jAGSFrAEJE2qyQ+F09BnMAVaYeM50+YB7M+V9TYWYKUpJIbHkW1qu/ZsSzWznq1H2k0qoNH4eC6x6lP5ir2rKBvMiE3S5qeGs9HfNjSNS9REMzMhEiVCQ0ABESYsqeoNFToJedQym5Amb5omQXHvaO12CtvQ+2aL+EppxhIjXhLz+A0MzPqX7ZQl4QkIJIh9OxH4m3n4S1/iFlaESK+uu0ECGmlsSKGo9hjOMnViO06KaeXXFSP4o992NKXA0Q60Tkqt8mC3TpMysLyCsCUhC5kPjeqn0a1j9/pyIXqfX/LyJ4UTloY0wFzPOvhzn7C2q2xFZeA33qAoQZbqKPX8gG8pKAFNT6Lh2wbLSKvXI/szePiFT2mwb1mEIEIWVnffxMhM6/gQ73U0BRCQd+dkd+CjlFgHyRj2wGOlirbg3ia1dQXnao0S6G7gv1uLJII+Eoc4iCm1cli3g+IWcIkMgk+tqvYEyeD7PiQuhTFsCcfr539dQhRbjEllWIS6Sz59/QikYrR9sXkkkblUsQuXwF9DMmea3ZR04QIGWG2Mv3IfHmSsbi1GFZpVLxfQxayTTok86FWbUUJrVaK5nivevksJkxq4IecwrJbCXOT80xjQlf5JoHYTJn8BM5QYBzYCdia34Cu+FfnpHSIDovezup9bLHH8yAzfLFnCHzSUy12gk9UBkiHc6+dxF76V44u9artQEh15i1DJFL74ZePNHr5Q9yggC74XXEnrqd2Wp7Mmw8CVySIbsa1OKLxVlSMJzh5XnQz1qiEjCj/IIB7yNJXfzFe2BtW4OCL/0coYXf9K74h5wgIPHOk+h+4ibKxGh+Iy+5YlSixOKk0QkDTYePoAjhITOFfzrDTamI6pPPVStm+rgqFenIw6oqK6VOKxiRvIWP8J8AO8HsltHLut+ojFdpv1pW9ErCAiHBI+aUSOEjqRCV91b7fxJRytQo6BPnkIiZMD95tfInuYCMEOAwtY9Fo/0MZRhM98P9Q8MUpCBmN70DZ+9mVasX34DoUWo/yeA1kRsVQqYI4etABbqBIOUGyZxl90TBJXcjtPR27wo5sm3E45w9aRCzhEIhdWQSGSGgo6MDr69b5xmIxpIRyfaysjJUz/v/tnq4h5tgt76fJKalLrnCxfjdtY6TFGbE0WOqpHwqpKg1X2a7keX303lf4rUCzS0tqNu6VZGQSsgSiQSqZsxAVVWV+j9TyAgBXV1d2LB+vXoYpbveR5SWlmLO3NOrNMoscVq3JxdS2hoYar6V/BlSrEutEUhdR0oVso6QIkRe1fegHOljKhH+4r0wp52QoNbWVrxXXw/Lsk4QwPPyykpUVFSo/zOFvCNgIMhWROfgLrgHG5RsOYcaODNIhPxIjzNEESS5BR20cfYyhBl+GiVTvXcHBGQEzqHdsPfXq58iqe0oR1vgHm1WRbjwhbf0KrwNaQLUipasRkkf7+FPGyI9IkF6iKc8p+N2ju6He2A7NFmiLO2t60OaAOdQExK1f1VbRJTRMgW5N6Mpc9Yl/YpvQ5oAu/EtxJ68MfljO+PkWfDpQCKmyGUrEL74Vq8lCT8JGDheyyZkZMrabGS4qulk9AgXqaXKXIL/BAxx5AABlCcpG0iWKiWIDB7qcySTziH4T4Ds2xzByGTUhIwfWvEkJXW5BP/DUNnJ3N7EgcnR6TnAjIGfoY0qU1sZ0xEkYj5jaEdBQxwBAT4jqwToH/KjCr9hmCd+oJdtZNUHRCIRFBcXqwWbXIF8v2g0is7OTvW9su0DskKAIPUxGfi400bqOwqyTUDWNCH1YClSculIIf08W8iqKPd98Fw7/EBGCBCREamRQ3Q1L48sSWXGfMDGmhrfRtXHAdkpMb28HOU8MomMEBDg1JGbgfkQQkCAzwgI8BkBAT4jIMBnBAT4jIAAnxEQ4DMCAnxGQIDPCAjwGQEBPiMgwGcEBPiMgACfERDgMwICfEZAgM8ICPAZAQG+Avgv4WYH0htgNTQAAAAASUVORK5CYII=' | base64 -d > stackoverflow.png
I am trying to create a CSV file that contains Arabic tweets collected using tweepy for a project I am doing. All is fine gathering the data, however, when i am writing to the CSV file all Arabic results are escaped with \xXXXX sequences
as follows:
b'#\xd8\xa7\xd9\x84\xd9\x8a\xd9\x88\xd9\x85_\xd8\xa7\xd9\x84\xd8\xb9\xd8\xa7\xd9\x84\xd9\x85\xd9\x8a_\xd9\x84\xd9\x84\xd8\xa7\xd8\xb9\xd8\xa7\xd9\x82\xd9\x87_2017 \xd8\xa7\xd9\x84\xd8\xa5\xd8\xb9\xd8\xa7\xd9\x82\xd8\xa9 \xd8\xa7\xd9\x84\xd8\xad\xd9\x82\xd9\x8a\xd9\x82\xd9\x8a\xd8\xa9 \xd8\xa7\xd8\xb9\xd8\xa7\xd9\x82\xd8\xa9 \xd8\xa7\xd9\x84\xd9\x81\xd9\x83\xd8\xb1 \xd9\x88\xd9\x84\xd9\x8a\xd8\xb3\xd8\xaa \xd8\xa7\xd8\xb9\xd8\xa7\xd9\x82\xd8\xa9
I looked at many previously asked questions and all I could find was suggestions for python 2 or answers similar to the one I am writing. When I was creating JSON files instead I was using ensure_ascii=False but I couldn't find anything similar for CSV. Below is my code:
with codecs.open('tweets.csv', 'a', encoding='utf-8') as file:
fieldnames = ['tweet', 'country']
writer = csv.DictWriter(file, fieldnames=fieldnames)
data = {'tweet': status.text, 'country': status.place.full_name}
writer.writerow(data)
I tried adding .encoding='utf-8' to status.text and status.place as well but that also didn't work. Any suggestions?
You have to make sure the Arabic string you have is decoded into UTF-8 before you write it. Assuming status.text is of type bytes you should type text=status.text.decode('utf-8'). (Maybe you have to do this for status.place.full_name too.) But if it's of type str then it won't have an decode() method. To avoid escape sequences in your file, a str object should be written anyway.
If you try to specify the encoding of a bytes object (like the one you presumably have) as 'utf-8' that won't work because the text is already in UTF-8 bytes. So in order to get UTF-8 characters you must call decode() on the bytes object. That way it writes the UTF-8 characters and not the UTF-8 bytes.
I just updated from python 3.1 to python 3.2 (formatted HD) and one of my scripts stopped working. It gives me the error in the title.
I would fix it myself but I don't even know what an iterable of bytes is lol. I tried typecasting bytes(data) but that didn't work either. TypeError: string argument without an encoding
url = "http://example.com/index.php?app=core&module=global§ion=login&do=process"
values = {"username" : USERNAME,
"password" : PASSWORD}
data = urllib.parse.urlencode(values)
req = urllib.request.Request(url, data)
urllib.request.urlopen(req)
It crashes at the last line.
Works in 3.1, but not 3.2
You did basically correct in trying to convert the string into bytes, but you did it the wrong way. Python doesn't have typecasting (so what you did was not typecasting).
The way to do it is to encode the text data into bytes data, which you do with the encode function:
binary_data = data.encode('encoding')
What 'encoding' should be depends. You should probably use 'ascii' here. If you have characters that aren't ASCII, then you need to use another encoding, typically 'utf8', but then you also need to tell the receiving webserver that it is UTF-8. It might also not want UTF8, but then you have to ask it, and it's getting complicated. :-)
#Enders, I know this is an old question, but I'd like to explain a few more things for somebody fighting with this issue.
It is specifically with this line of code here:
data = urllib.parse.urlencode(values)
That you are having issues, as you are trying to encode the data: values (urlencode).
If you refer to the urllib.parse documentation scroll to the bottom to find what urlencode does: https://docs.python.org/3/library/urllib.parse.html <~ you will see that you are trying to encode your user/pass into a data string:
Convert a mapping object or a sequence of two-element tuples, which may contain str or bytes objects, to a percent-encoded ASCII text string. If the resultant string is to be used as a data for POST operation with the urlopen() function, then it should be encoded to bytes, otherwise it would result in a TypeError.
Perhaps what you are trying to do here is do some kind of encryption of your user/password, but I don't really think this is the right way. If it is, then you probably need to make sure that the receiving end (the destination of your url) know that you're encoding your user/pass with this.
A more up-to-date approach is to use the powerful Requests library. They have compatibility with very common authentication protocols: http://docs.python-requests.org/en/master/user/authentication/
In this case, I'd do something like this:
requests.get(url, auth=('user', 'pass'))