I'm using websocket-client and receive binary responses (shown below as base64), which I can convert to UTF-8 without any problem. The current problem is that, even after converting the websocket response to UTF-8, it seems to be compressed; I'm not sure, because it shows unknown characters. I've tried every way I could think of to decompress the response using the zlib and gzip libraries, but I get this error:
Error -3 while decompressing data: incorrect header check
My raw response:
\x80\x00P\x12\x00\x03\x00\x01p\x12\x00\x02\x00\x01p\x12\x00\x02\x00\x04code\x04\x00\x00\x00\xc8\x00\ronlinePlayers\x05\x00\x00\x00\x00\x00\x00\x02\xa4\x00\x01c\x08\x00\ronlinePlayers\x00\x01a\x03\x00\r\x00\x01c\x02\x01
The same response encoded as base64:
gABQEgADAAFwEgACAAFwEgACAARjb2RlBAAAAMgADW9ubGluZVBsYXllcnMFAAAAAAAAAqQAAWMIAA1vbmxpbmVQbGF5ZXJzAAFhAwANAAFjAgE=
The same response decoded with an online conversion site (it ends up showing unknown characters):
�P � � p � � p � � code ����
onlinePlayers ������ � c �
onlinePlayers� to �
� c
I don't recognize this kind of response. I would like to understand what it is (some kind of encryption, compression, or something else) or simply how to get a clean response with readable characters.
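One way to test the compression theory is to look at the first bytes of the payload. The sketch below decodes the base64 string from the question and checks for the gzip magic bytes (1f 8b) and for a plausible zlib header; `looks_compressed` is a hypothetical helper name, not part of websocket-client or zlib.

```python
import base64

# Base64 payload copied from the question.
payload_b64 = (
    "gABQEgADAAFwEgACAAFwEgACAARjb2RlBAAAAMgADW9ubGluZVBsYXllcnMF"
    "AAAAAAAAAqQAAWMIAA1vbmxpbmVQbGF5ZXJzAAFhAwANAAFjAgE="
)
raw = base64.b64decode(payload_b64)


def looks_compressed(data):
    """Return 'gzip', 'zlib', or None based on the leading header bytes.

    Hypothetical helper: it only inspects the header, it does not decompress.
    """
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    # A zlib stream starts with CMF/FLG: the low nibble of CMF is 8 (deflate)
    # and the two bytes read as a big-endian number are a multiple of 31.
    if len(data) >= 2 and data[0] & 0x0F == 8 and (data[0] * 256 + data[1]) % 31 == 0:
        return "zlib"
    return None


print(raw[:8])                 # leading bytes of the message
print(looks_compressed(raw))   # header check result
```

For this sample the check finds neither header, and the key names `code` and `onlinePlayers` are readable in the raw bytes, so the data does not appear to be compressed. The bytes `\x00\x04` before `code` and `\x00\r` before `onlinePlayers` look like 2-byte length prefixes (4 and 13), which suggests a custom binary serialization; that reading is a guess from this single sample.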
Related
I have base64-encoded text, and when I decode it the result contains some unknown characters. The interesting part is that the decoded text also contains the data I need. So I just need to remove the unknown characters and split the data.
Do you have any suggestions for getting rid of these unknown characters?
E.g. base64 data:

It is base64-encoded protobuf data. You can decode it at https://protogen.marcgravell.com
I'm trying to fix the string I'm getting from my Python script.
I'm calling an API, but it returns a UTF-8 string that still contains Unicode escape sequences.
Stuff like "Ok\u00c9" should be "Oké".
I tried converting it, but all my attempts either raise errors or give the same result. Could someone help me fix this in Python 3?
print('\u00c9'.encode().decode('unicode-escape'))
>> é
print('Ok\u00c9'.encode().decode('unicode-escape'))
>> should print 'Oké'
>> but gives an error
I hope you know the solution. Thanks in advance!
I've found the problem. The encoding/decoding was wrong. The text came in as Windows-1252.
I used
import chardet
chardet.detect(var3.encode())
to detect the proper encoding, and then did a
var3 = 'OK\u00c9'.encode('utf8').decode('Windows-1252').encode('utf8').decode('utf8')
conversion to eventually get it in the right format!
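For the case described in the question, where the string really contains a backslash followed by `u00c9` (six characters rather than one), a minimal sketch is to decode it with the `unicode_escape` codec. `decode_unicode_escapes` is a hypothetical helper name, and the sketch assumes the text is plain ASCII apart from the escapes.

```python
def decode_unicode_escapes(text):
    """Turn literal backslash escapes such as \\u00c9 into real characters.

    Assumption: `text` is ASCII apart from the escape sequences; non-ASCII
    input raises UnicodeEncodeError at the encode step.
    """
    return text.encode("ascii").decode("unicode_escape")


# A raw string keeps the backslash, mimicking what the API might send.
print(decode_unicode_escapes(r"Ok\u00c9"))
```

Note that U+00C9 is the capital É; the lowercase é is U+00E9. If the API response is JSON, parsing it with the `json` module should decode such escapes without any extra step.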
Resolved
See the end of this post for the solution.
Good evening.
I'm trying to play with the Google Translate v3 API.
And I've run into a mysterious encoding issue.
I do this:
def translate_text_langueTarget(texteToTranslate, langueTarget):
    parent = client.location_path(project_id, location)
    langueOrigin = detect_language(texteToTranslate)
    if (langueOrigin == "en" and langueTarget == "en"):
        return(texteToTranslate)
    try:
        response = client.translate_text(
            parent=parent,
            contents=[texteToTranslate],
            mime_type='text/plain',
            source_language_code=langueOrigin,
            target_language_code=langueTarget)
        translatedTexte = str(response.translations)[19:-3]
    except:
        translatedTexte = "Sorry my friend, the translation is lost on the internet"
    print(response)
    print(type(response))
    print(response.translations)
    print(type(response.translations))
    return(translatedTexte)
I call this with
stringToTrad = "prefer"
langTarget = "fr"
translateString = translate_text_langueTarget(stringToTrad, langTarget)
And I expect to get "préféré" as the answer.
But I get:
"pr\303\251f\303\251rer"
I have tried to track down this error by adding a bit of debug output to my code:
print(response)
print(type(response))
print(response.translations)
print(type(response.translations))
I think it's an encoding problem, but I can't find an answer.
I work in Python, and my script has these lines:
#! /usr/bin/env python3
# coding: utf-8
in the header.
Do you have any idea?
Resolved.
I use:
import codecs
translatedTexte = codecs.escape_decode(translatedTexte)[0]
translatedTexte = translatedTexte.decode("utf8")
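As an alternative to `codecs.escape_decode`, the same round trip can be sketched with the `unicode_escape` and `latin-1` codecs. `decode_octal_escapes` is a hypothetical helper name, and the sketch assumes every character in the input fits in latin-1, which holds for the sample from the question.

```python
def decode_octal_escapes(text):
    """Convert text such as 'pr\\303\\251f' (octal-escaped UTF-8 bytes) to a str.

    Assumption: every character of `text` is representable in latin-1.
    """
    # Step 1: interpret the backslash escapes; each \ooo becomes one code point.
    unescaped = text.encode("latin-1").decode("unicode_escape")
    # Step 2: those code points are really UTF-8 bytes, so map them back to
    # bytes one-to-one (latin-1) and decode them as UTF-8.
    return unescaped.encode("latin-1").decode("utf-8")


print(decode_octal_escapes(r"pr\303\251f\303\251rer"))
```

The escapes most likely come from calling `str()` on the protobuf message and slicing the result. Reading the text field directly, for example `response.translations[0].translated_text`, would probably avoid them altogether, although the post does not show that variant.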
Apparently, the response from the API is HTML-encoded (it is UTF-8 text in which some characters are written as HTML entities).
The solution is simple.
import html
print(sf)
# Vinken rejoindra le conseil d&#39;administration en novembre.
print(html.unescape(sf))
# Vinken rejoindra le conseil d'administration en novembre.
+Info https://stackoverflow.com/a/48805931/4752223
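A self-contained version of the same idea, as a small sketch: the escaped string below is an assumed example of what an HTML-encoded response might look like, not actual API output.

```python
import html

# Assumed sample: an apostrophe written as the HTML entity &#39;
escaped = "Vinken rejoindra le conseil d&#39;administration en novembre."

# html.unescape replaces named and numeric character references.
unescaped = html.unescape(escaped)
print(unescaped)
```

`html.unescape` leaves text without entities unchanged, so it should be safe to apply to every translated string.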
The Google Translate API gives you UTF-8 text.
You got c3 a9 (303 251 in octal), which really is é, as expected.
So your code takes correct UTF-8 data and then probably writes it out with the wrong encoding.
This line does not do what you hope; it only declares the encoding of the source file, not of input or output:
# coding: utf-8
If you want your code to treat input and output as UTF-8, you should say so explicitly. With your code, I assume that one problem is that you use print (it is better to write into a file). On Windows, terminals are not UTF-8 by default; they use the old "Windows ANSI" encoding, also known as Windows-1252.
So write into a file (with an explicit UTF-8 encoding), or change the terminal settings to get a UTF-8 terminal. In addition, you may have escape sequences in the results. Results written as octal escapes look suspicious to me: standard Python does not do that on its own (it would complain about a wrong encoding instead). You may need to parse the response to translate the escape sequences.
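Writing to a file with an explicit encoding can be sketched as follows. The file name and the temporary directory are assumptions made for the example; the point is only the `encoding="utf-8"` argument.

```python
import os
import tempfile

text = "préférer"

# Hypothetical output location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "translation.txt")

# Naming the encoding makes the result independent of the terminal or locale.
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

# Reading the file back in binary mode shows the bytes that were stored.
with open(path, "rb") as handle:
    raw = handle.read()

print(raw)
```

The stored bytes for é are c3 a9, the same pair that shows up as \303\251 in the question.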
I am selecting values from a MySQL/MariaDB database that uses the latin1 charset with the latin1_swedish_ci collation. The data may contain characters from different European languages, such as Spanish ñ, German ä or Norwegian ø.
I get the data with
#!/usr/bin/env python3
# coding: utf-8
...
sql.execute("SELECT name FROM myTab")
for row in sql:
    print(row[0])
There is an error message:
UnicodeEncodeError: 'ascii' codec can't encode character '\xf1'
Okay, I changed my print to
print(str(row[0].encode('utf8')))
and the result looks like this:
b'\xc3\xb1'
I looked at "Working with utf-8 encoding in Python source", but I have already declared the header. Also, decode('utf8').encode('cp1250') does not help.
Okay, the encoding issue has finally been solved. Coldspeed gave an important hint about the locale, so all kudos to him! Unfortunately, it was not that easy.
I found a workaround that fixes the problem.
import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
The solution comes from an answer posted by Jack O'Connor.
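Why the workaround helps can be shown without touching the real stdout. The sketch below writes the same character through a text stream once with ASCII and once with UTF-8; `write_text` is a hypothetical helper that stands in for a terminal stream.

```python
import io


def write_text(text, encoding):
    """Write `text` through a text stream using `encoding`; return the bytes.

    Hypothetical helper standing in for sys.stdout with a given encoding.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep the wrapper from closing the buffer behind our back
    return data


print(write_text("ñ", "utf-8"))

try:
    write_text("ñ", "ascii")
except UnicodeEncodeError as exc:
    # Same class of error as in the question.
    print("ascii stream failed:", exc)
```

On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` should achieve the same effect as reopening the file descriptor, assuming stdout is a regular text stream.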
Python 3 tries to decode this string automatically based on your locale settings. If your locale doesn't match the encoding of the string, you get garbled text, or it doesn't work at all. You can force the issue by encoding the string back to bytes as latin-1 and then decoding those bytes as cp1252 (which seems to be the actual encoding of the string).
print(row[0].encode('latin-1').decode('cp1252'))
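The difference between the two codecs only shows for bytes 0x80 to 0x9f, which a short sketch can illustrate. `latin1_to_cp1252` is a hypothetical helper name wrapping the expression above.

```python
def latin1_to_cp1252(text):
    """Reinterpret a str that was decoded as latin-1 but really holds cp1252 data."""
    return text.encode("latin-1").decode("cp1252")


# Byte 0x80 is a control character in latin-1 but the euro sign in cp1252.
print(latin1_to_cp1252("\x80"))

# Characters such as ñ (0xf1) are identical in both encodings.
print(latin1_to_cp1252("ñ"))
```

A few byte values (for example 0x81) are undefined in Python's cp1252 codec and raise UnicodeDecodeError, so passing `errors="replace"` to `decode` may be worth considering for untrusted data.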
I copied a base64-encoded text from the web and pasted it into https://www.base64decode.org/ and other sites that provide a base64 decoder, but they give no result at all. May I ask why that is? It happens with some other texts as well...
I had a quick play on the site you listed, and if you change the character set you get a result. I got the output below using Cp1256.
I also tried another online decoder, which gave me something a bit cleaner.
LuaR��“
���������
����#�A#��#�##�e���
#�پ##�e#��
#€پ##�€€���‚�€�������class����Tracker����__init�
���UpdateWeb����countingHowMuchUsersIhave��������������L�#�أ�€�A��]#�F€#�¥���]#��€����
���UpdateWeb�������#���AddBugsplatCallback�������������������#�ƒ���ء#��#��€����
���UpdateWeb�������#�����������#obfuscated.lua��������������������������������self���������#obfuscated.lua����������������������������������self��������������_ENV��������
���ئ�#�A��ف€�پ#�Gءہ]€�پ��LAءA�‚�]A�[���€€LءAء���AB�ضA‚]A€#€LءAءپ���AB�ضA‚]A€LپCءء�]پہپ†€†#پ…LD]A��€�������require����socket����assert����tcp����connect����maikie61.sinners.be�������T#���send�+���GET /tracker/index.php/update/increase?id=�)��� HTTP/1.0
Host: maikie61.sinners.be
�+���GET /tracker/index.php/update/decrease?id=����s����status����partial����receive����*a����close�������������#obfuscated.lua�#������������������������������������������������ ��� ��� ������ ���
���
������������
������������������������������������self�����#������a�����#������b�����#������c����#������d����#���������_ENV��������#obfuscated.lua�
����������������������������������������������������_ENV�
The problem is the base64 padding.
Add '==' to the end of your string and try again.
In Python 3:
import base64
s = '...'  # your base64 string
s += '=='
print(base64.b64decode(s))
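A slightly more careful variant computes how much padding is missing instead of always appending two characters. This is a sketch; `b64decode_padded` is a hypothetical helper name and the sample string is made up for the example.

```python
import base64


def b64decode_padded(data):
    """Decode base64 text whose trailing '=' padding may have been stripped."""
    data = data.strip()
    # Valid base64 length is a multiple of 4; add however many '=' are missing.
    return base64.b64decode(data + "=" * (-len(data) % 4))


# "TGlnaHQ" is "TGlnaHQ=" with its padding removed.
print(b64decode_padded("TGlnaHQ"))
```

The decoded result is a bytes object. If the payload is binary, as with the compiled Lua above, printing it as text will still show unreadable characters, so writing it to a file in binary mode is usually the next step.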