base64.encodebytes fails to insert newline chars - python-3.x

I must be missing something obvious. The function below successfully generates a base64 encoded string from an image file, but according to the docs, I expected it to have newlines every 76 characters.
def generateBase64(filein):
    import base64
    with open(filein, 'rb') as f:
        return base64.encodebytes(f.read())
Calling it on an image file (.png) with print(generateBase64(imgpath)) just prints one long string. What am I doing wrong?
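One likely explanation, offered as a sketch rather than a confirmed diagnosis: encodebytes returns a bytes object, and print() displays the repr of bytes, so each newline shows up as the two characters \n inside a single b'...' literal. Decoding to str makes the line breaks visible. The sample data below is a made-up stand-in for the image contents.

```python
import base64

# Hypothetical stand-in for the bytes read from the image file.
data = bytes(range(256)) * 2

encoded = base64.encodebytes(data)   # bytes, with b"\n" after each line of at most 76 chars
as_repr = repr(encoded)              # what print(encoded) displays: one long b'...' literal
text = encoded.decode("ascii")       # str containing real line breaks
lines = text.splitlines()

print(text)
```

Printing text instead of encoded should show the output wrapped at 76 characters per line.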

Related

Decoding Base 64 In Groovy Returns Garbled Characters

I'm using an API which returns a Base64 encoded file that I want to parse and harvest data from. I'm having trouble decoding the Base64, as it comes back with garbled characters. The code I have is below.
Base64 decoder = new Base64()
def jsonSlurper = new JsonSlurper()
def json = jsonSlurper.parseText(Requests.getInventory(app).toString())
String stockB64 = json.getAt("stock")
byte[] decoded = decoder.decode(stockB64)
println(new String(decoded, "US-ASCII"))
I've also tried println(new String(decoded, "UTF-8")) and this returns the same garbled output. I've pasted in an example snippet of the output for reference.
� ���v���
��W`�C�:�f��y�z��A��%J,S���}qF88D q )��'�C�c�X��������+n!��`nn���.��:�g����[��)��f^���c�VK��X�W_����������?4��L���D�������i�9|�X��������\���L�V���gY-K�^����
��b�����~s��;����g���\�ie�Ki}_������
What am I doing wrong here?
You don't need the Base64 class, wherever you took it from. You can simply call stockB64.decodeBase64() to get the decoded byte array. Are you sure that what you have there is actually encoded text? Usually base64 encoding means the payload is binary, such as an image. If it were text, it could simply have been put in the JSON as a string. Maybe save the resulting byte array to a file and then investigate the file type by its content.
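The "investigate the file type by content" step can be sketched by checking the decoded bytes against a few well-known file signatures. This is an illustration in Python, not Groovy. The guess_type helper, the short signature table, and the gzip sample payload are all assumptions made for the example, and nothing here says what the API in the question really returns.

```python
import base64
import gzip

# A few well-known leading byte signatures (far from exhaustive).
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF": "pdf",
}

def guess_type(b64_text):
    """Decode base64 and guess the payload type from its first bytes (hypothetical helper)."""
    raw = base64.b64decode(b64_text)
    for magic, name in MAGIC.items():
        if raw.startswith(magic):
            return name
    try:
        raw.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown binary"

# Made-up sample: gzip-compressed data that was then base64 encoded.
sample = base64.b64encode(gzip.compress(b"stock data")).decode("ascii")
print(guess_type(sample))
```

If the payload turns out to be compressed, decompressing it first and only then decoding it as text would be the natural next step.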

How to encode Cyrillic characters in JSON

I want to read a JSON file containing Cyrillic symbols.
The Cyrillic symbols are represented like \u123.
Python converts them to '\\u123' instead of the Cyrillic symbol.
For example, the string "\u0420\u0435\u0433\u0438\u043e\u043d" should become "Регион", but becomes "\\u0420\\u0435\\u0433\\u0438\\u043e\\u043d".
encode() just makes the string look like u"..." or adds an extra \.
How do I convert "\u0420\u0435\u0433\u0438\u043e\u043d" to "Регион"?
If you want json to output a string that has non-ASCII characters in it then you need to pass ensure_ascii=False and then encode manually afterward.
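A minimal sketch of the ensure_ascii behaviour described above, using the string from the question. The variable names are only illustrative.

```python
import json

s = "\u0420\u0435\u0433\u0438\u043e\u043d"   # the same six characters as "Регион"

escaped = json.dumps(s)                       # default: non-ASCII is written as \uXXXX escapes
literal = json.dumps(s, ensure_ascii=False)   # keeps the Cyrillic characters as-is
encoded = literal.encode("utf-8")             # manual encoding step for writing bytes

print(escaped)
print(literal)
print(json.loads(escaped))                    # parsing the escaped form restores the text
```

Both forms are valid JSON and parse back to the same string; the difference is only in how the file looks on disk.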
Just use the json module.
import json
s = "\u0420\u0435\u0433\u0438\u043e\u043d"
# Generate a json file.
with open('test.json','w',encoding='ascii') as f:
    json.dump(s,f)
# Reading it directly
with open('test.json') as f:
    print(f.read())
# Reading with the json module
with open('test.json',encoding='ascii') as f:
    data = json.load(f)
print(data)
Output:
"\u0420\u0435\u0433\u0438\u043e\u043d"
Регион

How to get MIME type base64 encoded string in nodejs?

When I convert the data to base64, it gives a single line of base64 string.
image = body.toString('base64');
How can I get the base64 string used in MIME, which is wrapped every 76 characters?
Is there any default method in node to achieve that?
There is no built-in method in nodejs for encoding to base64 with line breaks, but the mimelib library can do this:
To add line breaks
mimelib.foldLine(str, 76)
To encode to base64 with line breaks
mimelib.encodeBase64(str)
To break the resulting base-64 string into lines of no more than 76 characters, one can use replace(), e.g.,
body.toString('base64').replace(/.{76}/g, '$&\n')
. = match any character other than newline
{76} = repeat that match exactly 76 times, i.e., split the string into 76-character chunks
g = globally, i.e., keep going until out of data in the string
$& = insert the matched substring
\n = followed by a newline
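For comparison, the same regex idea can be sketched in Python. The wrap76 helper and the sample data are made up for the example. The last line checks the result against Python's base64.encodebytes, which wraps at 76 characters itself and also appends a final newline.

```python
import base64
import re

def wrap76(b64_text):
    # Same idea as replace(/.{76}/g, '$&\n'): append a newline to every full 76-character run.
    return re.sub(r".{76}", lambda m: m.group(0) + "\n", b64_text)

data = bytes(range(100))                          # hypothetical payload
one_line = base64.b64encode(data).decode("ascii")
wrapped = wrap76(one_line)

print(wrapped)
print(wrapped + "\n" == base64.encodebytes(data).decode("ascii"))
```

One quirk of this approach: when the input length is an exact multiple of 76, the output ends with a trailing newline; otherwise it does not.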

urlopen() throwing error in python 3.3

from urllib.request import urlopen
def ShowResponse(param):
    uri = str("mysite.com/?param="+param+"&submit=submit")
    print(urlopen(uri).read())
file = open("myfile.txt","r")
if file.mode == "r":
    filelines = file.readlines()
    for line in filelines:
        line = line.strip()
        ShowResponse(line)
This is my Python code, but when I run it, it raises the error "UnicodeEncodeError: 'ascii' codec can't encode characters in position 47-49: ordinal not in range(128)".
I don't know how to fix this. I'm new to Python.
I'm going to assume that the stack trace shows that line 4 (uri = str(...)) is throwing the given error and that myfile.txt contains non-ASCII characters encoded as UTF-8.
The error occurs because a Unicode string (decoded from what is assumed to be UTF-8) is being encoded to ASCII, and ASCII simply cannot represent your characters.
URIs (including the Query String) must encode non-ASCII chars as percent-encoded UTF-8 bytes. Example:
€ (EURO SIGN) encoded in UTF-8 is:
0xE2 0x82 0xAC
Percent-encoded, it's:
%E2%82%AC
Therefore, your code needs to re-encode your parameter to UTF-8 then percent-encode it:
from urllib.request import urlopen
from urllib.parse import quote
def ShowResponse(param):
    param_utf8 = param.encode("utf-8")
    param_perc_encoded = quote(param_utf8)
    # or uri = str("mysite.com/?param="+param_perc_encoded+"&submit=submit")
    uri = str("mysite.com/?param={0}&submit=submit".format(param_perc_encoded) )
    print(urlopen(uri).read())
You'll also see I've changed your uri = definition slightly to use str.format() (https://docs.python.org/2/library/string.html#format-string-syntax), which I find easier for building complex strings than concatenation with +. In this example, {0} is replaced with the first argument to .format().
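The percent-encoding step can be checked in isolation, without any network access. This sketch uses urllib.parse.quote and urllib.parse.urlencode. The http:// scheme in front of the question's placeholder host is an assumption added so the result is a complete URL.

```python
from urllib.parse import quote, urlencode

param = "\u20ac"                      # € (EURO SIGN)

utf8_bytes = param.encode("utf-8")    # b'\xe2\x82\xac'
percent = quote(param)                # quote() encodes str as UTF-8 by default

# urlencode() builds the whole query string and percent-encodes each value.
query = urlencode({"param": param, "submit": "submit"})

# Assumed scheme plus the placeholder host from the question.
uri = "http://mysite.com/?" + query

print(utf8_bytes, percent, uri)
```

Passing the str directly to quote() gives the same result as encoding to UTF-8 first, since UTF-8 is its default encoding.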

Extracting source code from html file using python3.1 urllib.request

I'm trying to obtain data from an HTML file using regular expressions, with the following code:
import re
import urllib.request
def extract_words(wdict, urlname):
    uf = urllib.request.urlopen(urlname)
    text = uf.read()
    print (text)
    match = re.findall("<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>", text)
which returns an error:
  File "extract.py", line 33, in extract_words
    match = re.findall("<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>", text)
  File "/usr/lib/python3.1/re.py", line 192, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
Upon experimenting further in IDLE, I noticed that uf.read() does return the HTML source the first time I invoke it, but from then on it returns b''. Is there any way to get around this?
uf.read() will only read the contents once. After that you have to close it and reopen it to read it again. This is true for any kind of stream. This is, however, not the problem.
The problem is that reading from any kind of binary source, such as a file or a webpage, will return the data as a bytes type, unless you specify an encoding. But your regexp is not specified as a bytes type, it's specified as a unicode str.
The re module will quite reasonably refuse to use unicode patterns on byte data, and the other way around.
The solution is to make the regexp pattern a bytes string, which you do by putting a b in front of it (rb below, so that the backslashes are also kept raw). Hence:
match = re.findall(rb"<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>", text)
should work. Another option is to decode the text so that it is also a unicode str:
# Falls back to UTF-8 (an assumption) when the server sends no charset.
encoding = uf.headers.get_content_charset() or 'utf-8'
text = text.decode(encoding)
match = re.findall(r"<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>", text)
(Also, to extract data from HTML, I would say that lxml is a better option).
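Both points from this answer can be reproduced offline. The sketch below uses io.BytesIO as a stand-in for the urlopen response, and the HTML row is made up. It shows the second read() returning b'', the TypeError from mixing a str pattern with bytes, and the two fixes.

```python
import io
import re

# Hypothetical stand-in for the response object returned by urlopen().
uf = io.BytesIO(b"<tr> <td>alpha</td> <td>first letter</td> </tr>")

text = uf.read()          # first read returns the whole body as bytes
second_read = uf.read()   # the stream is exhausted now, so this is b''

ROW = r"<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>"

try:
    re.findall(ROW, text)             # str pattern on bytes data
    error = None
except TypeError as exc:
    error = str(exc)

bytes_matches = re.findall(ROW.encode("ascii"), text)   # fix 1: bytes pattern
str_matches = re.findall(ROW, text.decode("utf-8"))     # fix 2: decode the data

print(second_read, error, bytes_matches, str_matches)
```

Storing the result of the first read() in a variable, as text does here, avoids any need to reopen the stream.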
