Python 3 - Base64 Encoding - python-3.x

I have some images that I need to send to a server using JSON. I decided to use Base64 as the encoding.
In Python 2, I could simply use:
import base64
with open(path, "rb") as imageFile:
    img_file = imageFile.read()
    img_string = base64.b64encode(img_file)
but in Python 3 it doesn't work anymore.
What do I have to change to make this work in Python 3?

I followed the solution from this link and it seems to work for me: read the image in binary mode and encode the resulting bytes with base64. The following solution is from the link above. Here is the tested code (base64.encodestring was removed in Python 3.9; base64.encodebytes is its direct replacement):
import base64
with open(image_path, 'rb') as image:
    image_read = image.read()
image_64_encode = base64.encodebytes(image_read)
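One detail worth knowing about this answer: `base64.encodestring`/`base64.encodebytes` insert a newline after every 76 output characters (and at the end), while `base64.b64encode` returns a single line, which is usually what you want inside a JSON string. A small comparison, using arbitrary sample bytes rather than a real image:

```python
import base64

# arbitrary sample data standing in for an image file's bytes
data = bytes(range(256)) * 2

multi_line = base64.encodebytes(data)  # MIME style: newline every 76 output chars
single_line = base64.b64encode(data)   # one line, no newlines

print(multi_line.count(b"\n"), single_line.count(b"\n"))

# b64decode discards non-alphabet characters such as newlines by default (validate=False),
# so both forms decode back to the same bytes
assert base64.b64decode(multi_line) == data
assert base64.b64decode(single_line) == data
```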

Finally, I found code that runs on Python 3.7:
import base64
def encode_image(path):  # wrapper name is illustrative; the original showed only the body
    # Get the image
    with open(path, 'rb') as image:
        image_read = image.read()
    # Get the Base64-encoded bytes of the image
    image_64_encode = base64.b64encode(image_read)
    # Convert them to a readable UTF-8 string
    image_encoded = image_64_encode.decode('utf-8')
    return image_encoded
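The root of the Python 3 problem is that `b64encode` returns `bytes`, and `json` cannot serialize `bytes`. Decoding the Base64 output to `str` (it is pure ASCII) fixes that. Below is a minimal round-trip sketch. It assumes a payload with hypothetical `name` and `data` keys, since the server's real schema is not given in the question:

```python
import base64
import json

def image_bytes_to_json(raw: bytes, name: str) -> str:
    # b64encode gives bytes; Base64 output is ASCII, so decoding yields a JSON-safe str
    encoded = base64.b64encode(raw).decode("ascii")
    # "name" and "data" are hypothetical field names
    return json.dumps({"name": name, "data": encoded})

def json_to_image_bytes(payload: str) -> bytes:
    # b64decode accepts an ASCII str as well as bytes
    return base64.b64decode(json.loads(payload)["data"])

fake_image = b"\x89PNG\r\n\x1a\n" + bytes(range(32))  # stand-in for open(path, "rb").read()

try:
    json.dumps({"data": base64.b64encode(fake_image)})
except TypeError as exc:
    print("bytes are not JSON serializable:", exc)

payload = image_bytes_to_json(fake_image, "example.png")
assert json_to_image_bytes(payload) == fake_image
```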

Related

Python how to extract the xml part from xml.p7m file

I have to extract information from an xml.p7m file (an Italian electronic invoice with a digital signature, I think).
The extraction part is already done and works fine with the usual Italian XML files, but since we also receive these xml.p7m files (which I only recently discovered), I'm stuck, because I can't figure out how to deal with them.
I only want the XML part, so I start with these splits to remove the signature part:
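import re
with open(path, encoding='unicode_escape') as f:
    # re.escape: '?' and '.' are regex metacharacters, so match the declaration literally
    txt = '<?xml version="1.0"' + re.split(re.escape('<?xml version="1.0"'), f.read())[1]
txt = re.split('</FatturaElettronica>', txt)[0] + "</FatturaElettronica>"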
So what I'm stuck with now is that there are still parts like this in the xml:
""" <Anagrafica>
<Denominazione>AUTOCARROZZERIA CIANO S.R.L.</Denominazione>
</Anagraf♦♥èica>"""
which obviously makes the XML malformed, so the data extraction does not work.
I have to use unicode_escape to open the file and remove those lines, because otherwise I get an error: those signature parts can't be decoded as UTF-8.
If I encode this part, I get:
b' <Anagrafica>\n <Denominazione>AUTOCARROZZERIA CIANO S.R.L.</Denominazione>\n </Anagraf\xe2\x99\xa6\xe2\x99\xa5\xc3\xa8ica>'
Does anyone have an idea how to extract only the XML part from the xml.p7m file?
By the way, the XML should be: but if I open the XML, there are already characters that don't belong to the UTF-8 charset, or something like that?
Edit:
The way I did it at first was really not optimal. There was too much manual work, so I searched further for a real solution and found this:
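from OpenSSL import crypto
from OpenSSL._util import (
    ffi as _ffi,
    lib as _lib,
)
# Note: relies on private pyOpenSSL internals (_util, _new_mem_buf, _bio_to_string, _pkcs7)
def removeSignature(fileString):
    # parse the DER-encoded PKCS#7 envelope
    p7 = crypto.load_pkcs7_data(crypto.FILETYPE_ASN1, fileString)
    bio_out = crypto._new_mem_buf()
    # write the signed content to bio_out without verifying the signer certificate or signature
    res = _lib.PKCS7_verify(p7._pkcs7, _ffi.NULL, _ffi.NULL, _ffi.NULL, bio_out, _lib.PKCS7_NOVERIFY | _lib.PKCS7_NOSIGS)
    if res == 1:
        return crypto._bio_to_string(bio_out).decode('UTF-8')
    else:
        # collect OpenSSL's error details (not used further in this snippet)
        errno = _lib.ERR_get_error()
        errstrlib = _ffi.string(_lib.ERR_lib_error_string(errno))
        errstrfunc = _ffi.string(_lib.ERR_func_error_string(errno))
        errstrreason = _ffi.string(_lib.ERR_reason_error_string(errno))
        return ""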
What I do now is check whether the file is already in proper XML format or has to be Base64-decoded first. After that, I remove the signature and build the XML tree, so I can do the XML processing I need:
import base64
import xml.etree.ElementTree as ET
if filePath.lower().endswith('p7m'):
    # logger.infoLog is the author's own logging helper
    logger.infoLog(f"Try open file: {filePath}")
    with open(filePath, 'rb') as f:
        txt = f.read()
    # no opening tag to find --> no XML --> Base64-decode the file, save it, and get the text
    if b'<' not in txt:
        image_64_decode = base64.decodebytes(txt)
        # create a writable file and write the decoding result
        with open(path + 'decoded.xml', 'wb') as image_result:
            image_result.write(image_64_decode)
        with open(path + 'decoded.xml', 'rb') as decoded_file:
            txt = decoded_file.read()
    # try to parse the string
    try:
        logger.infoLog("Try parsing the first time")
        txt = removeSignature(txt)
        ET.fromstring(txt)
    except ET.ParseError:
        ...  # the original snippet ends inside this try block; its error handling is not shown
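The decode-then-reread step above writes a temporary `decoded.xml` only to read it straight back. The sketch below keeps the same check (no `<` in the content means it is treated as Base64) but stays in memory. `maybe_b64decode` is a hypothetical helper name, and the heuristic is the one from the answer, not a general format detector:

```python
import base64

def maybe_b64decode(content: bytes) -> bytes:
    # same heuristic as above: a DER .p7m embeds the XML text, so it contains '<';
    # a Base64-encoded file does not
    if b"<" not in content:
        # decodebytes also tolerates line breaks inside the Base64 text, if present
        return base64.decodebytes(content)
    return content

raw = b'<?xml version="1.0"?><FatturaElettronica/>'
assert maybe_b64decode(raw) == raw
assert maybe_b64decode(base64.encodebytes(raw)) == raw
```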
I had a similar problem: some characters in the file were not decoded correctly.
It was caused by a byte order mark (BOM) at the start of the file.
You can try the utf-8-sig encoding to read the file, like this:
with open(path, encoding='utf-8-sig') as f:
    ...
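The effect of `utf-8-sig` can be seen without a file. A UTF-8 byte order mark (`EF BB BF`) decoded with plain `utf-8` survives as the character `U+FEFF` at the start of the text, which can trip up an XML parser. `utf-8-sig` strips it. The sample bytes below are made up for illustration:

```python
import codecs

data = codecs.BOM_UTF8 + b'<?xml version="1.0"?><a/>'

plain = data.decode("utf-8")     # BOM kept as U+FEFF
sig = data.decode("utf-8-sig")   # BOM removed

print(repr(plain[:1]))  # '\ufeff'
print(repr(sig[:5]))    # '<?xml'

# utf-8-sig also decodes BOM-less input normally
assert b"<a/>".decode("utf-8-sig") == "<a/>"
```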
The easiest approach is to use the openssl command-line tool:
C:\OpenSSL-Win64\bin\openssl.exe smime -verify -noverify -in your.xml.p7m -inform DER -out your.xml
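The same OpenSSL command can be driven from Python with `subprocess`, which avoids the private pyOpenSSL internals used in the earlier answer. The sketch below assumes an `openssl` executable is on the `PATH`; on the Windows setup above, you would pass the full path to `openssl.exe` instead. The file names are placeholders, and `build_openssl_cmd`/`extract_xml` are hypothetical helpers:

```python
import subprocess

def build_openssl_cmd(p7m_path: str, xml_path: str, openssl: str = "openssl") -> list:
    # same flags as the command line above: DER input, skip signer-certificate verification
    return [openssl, "smime", "-verify", "-noverify",
            "-in", p7m_path, "-inform", "DER", "-out", xml_path]

def extract_xml(p7m_path: str, xml_path: str, openssl: str = "openssl") -> None:
    # check=True raises CalledProcessError if openssl exits with a non-zero status
    subprocess.run(build_openssl_cmd(p7m_path, xml_path, openssl), check=True)

# usage (requires openssl and a real signed file):
# extract_xml("your.xml.p7m", "your.xml")
```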

How to convert image which type is bytes to numpy.ndarray?

I'm trying to optimize my code.
First, I get an image whose type is bytes.
Then I have to write that image to the file system:
with open('test2.jpg', 'wb') as f:
    f.write(content)
Finally, I read this image with
from scipy import misc
misc.imread('test2.jpg')
which converts the image to an np.array.
I want to skip the part where I write the image to the file system and get the np.array directly.
P.S.: I tried to use np.frombuffer(). It doesn't work for me, because the two np.arrays are not the same.
Convert str to numpy.ndarray
To test, you can try this yourself:
with open('test1.jpg', 'rb') as file:
    content = file.read()
My first answer in rap...
Wrap that puppy in a BytesIO
And away you go
So, to generate some synthetic data similar to what you get from the API:
with open('image.jpg', 'rb') as file:
    content = file.read()
That looks like this, which has all the hallmarks of a JPEG:
content = b'\xff\xd8\xff\xe0\x00\x10JFIF...
Now for the solution:
from io import BytesIO
from scipy import misc
numpyArray = misc.imread(BytesIO(content))
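Why `np.frombuffer()` gave a different array: it only reinterprets the compressed JPEG bytes as numbers, whereas `imread` decodes them into pixels. `BytesIO` instead gives the bytes a file-like interface, so any reader that accepts a file object can consume them without touching disk. Note that `scipy.misc.imread` was removed in SciPy 1.2. With a current stack, one commonly used replacement is `PIL.Image.open(BytesIO(content))` followed by `numpy.asarray(...)`, assuming Pillow and NumPy are installed. As a stdlib-only illustration of the file-like idea (not an image decoder), the sketch reads the JPEG start marker and JFIF header from a `BytesIO`. `read_jfif_header` is a hypothetical helper, and the sample bytes are synthetic:

```python
import struct
from io import BytesIO

def read_jfif_header(stream) -> dict:
    # works on anything with .read(): an open file or a BytesIO over in-memory bytes
    soi, app0, length = struct.unpack(">HHH", stream.read(6))
    if soi != 0xFFD8 or app0 != 0xFFE0:
        raise ValueError("not a JFIF JPEG")
    identifier = stream.read(5)
    return {"segment_length": length, "identifier": identifier}

# synthetic bytes shaped like the start of the content shown above
content = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00" + b"\x01\x01" + bytes(7)
print(read_jfif_header(BytesIO(content)))
```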

Python 3 csv read does not recognize Cyrillic script

I have the following Python code:
import csv
from urllib import request
url_base = "https://translate.google.com"
url_params_list = "/#view=home&op=translate&sl=ru&tl=en&text="
with open('top5000russianlemmasraw.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        url = url_base + url_params_list + request.quote(row[0].encode('cp1251'))
        print(url)
The file top5000russianlemmasraw.csv is a list of words in Cyrillic script.
The problem with the code is that the Cyrillic script is imported as strings of question marks, e.g. '????', which then get converted to URL code as '%3F%3F%3F%3F'-type strings. I am not sure how to get Python to import Cyrillic script so that it does not show up as question marks. I would appreciate help on this.
The open() built-in defaults to the encoding returned by locale.getpreferredencoding(). You can override this with the encoding keyword parameter:
# ...
with open('top5000russianlemmasraw.csv', encoding='cp1251') as csv_file:
    # ...
Alternatively, you could open the file as bytes and then decode it yourself:
with open('top5000russianlemmasraw.csv', 'rb') as csv_file:
    blob = csv_file.read()
text = blob.decode('cp1251')
# ...
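Both variants come down to decoding with the encoding the file was actually saved in. The in-memory sketch below assumes the file is cp1251, as in the answer; the sample words are made up. It decodes the bytes, feeds them to `csv.reader` through `io.StringIO`, and then percent-encodes a word the way the question's code does:

```python
import csv
import io
from urllib.parse import quote

def first_column(raw: bytes, encoding: str = "cp1251") -> list:
    # decode the whole file with its real encoding, then let csv split the text
    text = raw.decode(encoding)
    return [row[0] for row in csv.reader(io.StringIO(text), delimiter=",") if row]

raw = "слово,word\nдом,house\n".encode("cp1251")  # stand-in for the CSV file's bytes
words = first_column(raw)
print(words)                             # ['слово', 'дом']
print(quote(words[1].encode("cp1251")))  # %E4%EE%EC
```

For a URL you may prefer `quote(word)` on the `str` itself, which percent-encodes it as UTF-8 by default.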

python3 converting unreadable character to readable character

Hi, I have a text file that I am reading and parsing,
but my file contains some text like:
\u03a4\u03c1\u03b5\u03b9\u03c2 \u03bd\u03b5\u03ba\u03c1\u03bf\u03af \u03b1\u03c0\u03cc \u03c0\u03c4\u03ce\u03c3\u03b7 \u03bf\u03b2\u03af\u03b4\u03b1\u03c2 \u03c3\u03b5 \u03c3\u03c0\u03af\u03c4\u03b9 \u03c3\u03c4\u03bf \u03a3\u03b9\u03bd\u03ac
How can I convert it to readable text with Python?
I tried to use this code to solve it, but it doesn't work:
def encodeDecode(self, data):
    new_data = ''
    for ch in data:
        #let = ch.encode('utf-8').decode('utf-8')
        #new_data += let
        new_data += repr(ch)[1:2]
    return new_data
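There is no problem with your string; you have Unicode data. How you decode it depends on how you want to use it, either with a custom approach or with Python's default encoding. For example, if you just want to print it, you can print it directly, since strings in Python 3 are Unicode. (This applies once the \uXXXX escapes have been interpreted, as in the string literal below; if the file contains the backslash sequences as literal text, they still have to be decoded.)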
>>> s="""\u03a4\u03c1\u03b5\u03b9\u03c2 \u03bd\u03b5\u03ba\u03c1\u03bf\u03af \u03b1\u03c0\u03cc \u03c0\u03c4\u03ce\u03c3\u03b7 \u03bf\u03b2\u03af\u03b4\u03b1\u03c2 \u03c3\u03b5 \u03c3\u03c0\u03af\u03c4\u03b9 \u03c3\u03c4\u03bf \u03a3\u03b9\u03bd\u03ac """
>>>
>>> print(s)
Τρεις νεκροί από πτώση οβίδας σε σπίτι στο Σινά
>>>
But if you want to write your data to a file, you need to use a proper encoding for that file.
You can do that by passing the encoding to the open() function when you open the file for writing.
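For example, the sketch below writes the Greek text with an explicit UTF-8 encoding and reads it back. It uses a temporary directory so it does not depend on any existing file:

```python
import os
import tempfile

text = "Τρεις νεκροί από πτώση οβίδας"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.txt")
    # explicit encoding: otherwise open() uses the platform default, which may not cover Greek
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

assert round_trip == text
```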
You could also convert it using Python's json module; this also works in Python 2.x:
>>> import json
>>> f = open('input.txt', 'r')
>>> json_str = '"%s"' % f.read().strip().replace('"', '\\"')  # strip the trailing newline (JSON rejects raw newlines in strings), then wrap in double quotes
>>> print(json.loads(json_str))
Τρεις νεκροί από πτώση οβίδας σε σπίτι στο Σινά
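The json trick works because the file contains six-character sequences such as `\u03a4` as plain text, not the Greek letters themselves. Here is a smaller sketch of the same idea. It also shows the `unicode_escape` codec as an alternative, which is only safe when the input is pure ASCII. The sample string is an excerpt of the question's text:

```python
import json

# what the file holds: literal backslash-u sequences, written here as a raw string
raw_line = r"\u03a4\u03c1\u03b5\u03b9\u03c2"
print(len(raw_line))  # 30 characters, not 5

via_json = json.loads('"' + raw_line + '"')
via_codec = raw_line.encode("ascii").decode("unicode_escape")

print(via_json)  # Τρεις
assert via_json == via_codec == "\u03a4\u03c1\u03b5\u03b9\u03c2"
```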

Custom filetype in Python 3

How do I start creating my own file type in Python? I have a design in mind, but how do I pack my data into a file with a specific format?
For example, I would like my file format to be a mix of an archive (like other formats such as zip, apk, jar, etc.; they are basically all archives) with some room for packed files, plus a section of the file containing settings and serialized data that will not be accessed by an archive manager application.
My requirement is to do all this with the default modules of CPython, without external modules.
I know this can take a long time to explain and do, but I can't see how to start this in Python 3.x with CPython.
Try this:
from zipfile import ZipFile
import json
data = json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
with ZipFile('foo.filetype', 'w') as myzip:
    myzip.writestr('digest.json', data)
The file is now a zip archive with a JSON file for the data (JSON is easy to read back in many languages). You can add files to the archive with myzip.write or myzip.writestr. You can read the data back with:
with ZipFile('foo.filetype', 'r') as myzip:
    json_data_read = myzip.read('digest.json')
newdata = json.loads(json_data_read)
Edit: you can append arbitrary data to the file with:
with open('foo.filetype', 'a') as f:
    f.write(data)
This works for WinRAR, but Python can no longer process the zip file.
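Suppose the goal is a settings section that lives outside the archive's member list while the file stays readable by `zipfile`. One option is the ZIP archive comment, which `zipfile` exposes as `ZipFile.comment` and limits to 65535 bytes. This is a substitute for appending raw bytes, not what the answer above does. Archive managers may still display the comment, so it hides nothing. A sketch with hypothetical helper names:

```python
from __future__ import annotations

import io
import json
import zipfile

def build_container(files: dict[str, bytes], settings: dict) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
        # settings go into the archive comment, not into a member file (max 65535 bytes)
        zf.comment = json.dumps(settings).encode("utf-8")
    return buf.getvalue()

def read_container(blob: bytes) -> tuple[dict[str, bytes], dict]:
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        files = {name: zf.read(name) for name in zf.namelist()}
        settings = json.loads(zf.comment.decode("utf-8"))
    return files, settings

blob = build_container({"digest.json": b"[1, 2]"}, {"version": 1})
files, settings = read_container(blob)
print(sorted(files), settings)
```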
Use this:
import base64
import gzip
import ast
def save(data):
    # repr() (not str()) keeps strings quoted so ast.literal_eval can read them back
    data = "[{!r}]".format(data).encode()
    data = base64.b64encode(data)
    return gzip.compress(data)
def load(data):
    data = gzip.decompress(data)
    data = base64.b64decode(data)
    return ast.literal_eval(data.decode())[0]
How to use this with file:
open(filename, "wb").write(save(data)) # save data
data = load(open(filename, "rb").read()) # load data
This might look like something an archive program can open, but the data inside is Base64 encoded,
so it has to be decoded before it is readable.
Also, you can store any value that ast.literal_eval can read back (strings, bytes, numbers, lists, dicts, and similar literals)!
Example:
open(filename, "wb").write(save({"foo": "bar"})) # dict
open(filename, "wb").write(save("foo bar")) # string
open(filename, "wb").write(save(b"foo bar")) # bytes
# there's more you can store!
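The string example above is where the difference between `str()` and `repr()` matters when the stored text is read back with `ast.literal_eval`. `str()` of a plain string drops its quotes, which leaves something that is not a Python literal, while `repr()` keeps it valid. A short demonstration:

```python
import ast

value = "foo bar"
as_str = "[{}]".format(value)     # str() drops the quotes: [foo bar]
as_repr = "[{!r}]".format(value)  # repr() keeps them: ['foo bar']

try:
    ast.literal_eval(as_str)
except (SyntaxError, ValueError) as exc:
    print("not a valid literal:", type(exc).__name__)

# repr() round-trips for the literal types used in the examples above
for original in ({"foo": "bar"}, "foo bar", b"foo bar"):
    assert ast.literal_eval("[{!r}]".format(original))[0] == original
```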
This may not fit your question exactly, but I think it may help you.
I faced a similar problem... but ended up with something like creating a zip file and then renaming it to my custom file format... But it can still be opened with WinRAR.
