I need to create a zip file from multiple txt files generated from strings.
import zipfile
from io import StringIO
def zip_files(file_arr):
    # file_arr is an array of [(fname, fbuffer), ...]
    f = StringIO()
    z = zipfile.ZipFile(f, 'w', zipfile.ZIP_DEFLATED)
    for f in file_arr:
        z.writestr(f[0], f[1])
    z.close()
    return f.getvalue()
file1 = ('f1.txt', 'Question1\nQuestion2\n\nQuestion3')
file2 = ('f2.txt', 'Question4\nQuestion5\n\nQuestion6')
f_arr = [file1, file2]
return zip_files(f_arr)
This throws the error TypeError: string argument expected, got 'bytes' on writestr(). I have tried using BytesIO instead of StringIO, but I get the same error. The code is based on this answer, which does this for Python 2.
I can't seem to find anything online about using zipfile for multiple files stored
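Zip files are binary files, so you should use an io.BytesIO stream instead of an io.StringIO one. Note also that the loop variable in "for f in file_arr" rebinds the name f, so after the loop f.getvalue() is called on the last tuple rather than on the stream; give the loop variable a different name.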
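A minimal sketch of the corrected function, assuming the same (name, text) tuples as in the question. The names buffer, archive and zip_bytes are illustrative choices, not taken from the original code.

```python
import io
import zipfile


def zip_files(file_arr):
    """Build a zip archive in memory from (name, text) pairs and return its bytes."""
    buffer = io.BytesIO()  # binary buffer: zipfile writes bytes, not str
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Distinct loop names, so the buffer variable is never rebound
        for name, text in file_arr:
            archive.writestr(name, text)  # str data is encoded as UTF-8
    # The context manager has closed the archive, so the buffer is complete
    return buffer.getvalue()


file1 = ("f1.txt", "Question1\nQuestion2\n\nQuestion3")
file2 = ("f2.txt", "Question4\nQuestion5\n\nQuestion6")
zip_bytes = zip_files([file1, file2])
```

The returned bytes can be written to disk in 'wb' mode or sent as a response body.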
Related
I run the following code in Python and it does produce a str-like result, but when I try to write it to my text file, either nothing is written or I am told that it is a NoneType and so cannot be written to a file. How should I convert the result back into a str and write it to a file? Thanks.
f1 = open('Documents/new_corpus/raw-corpus/seg/word_seg/pos/pos_depression33.txt','w')
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
import hanlp
hanlp.pretrained.mtl.ALL
a = []
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_BASE_ZH)
with open('Documents/new_corpus/raw-corpus/seg/word_seg/seg2_depression1.txt') as file:
    for line in file:
        HanLP([line], tasks='pos').pretty_print()
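A NoneType result is what you would see if pretty_print() prints its output and returns None. Assuming that is the case here, one general approach is to capture what gets printed with contextlib.redirect_stdout and write the captured text yourself. The sketch below is not HanLP-specific: fake_pretty_print is a hypothetical stand-in so that it runs without the library, and capture_printed is an illustrative helper name.

```python
import contextlib
import io


def fake_pretty_print(tokens):
    """Hypothetical stand-in for a method that prints its result and returns None."""
    print(" ".join(tokens))


def capture_printed(func, *args):
    """Call func(*args) and return whatever it printed to stdout as a str."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


text = capture_printed(fake_pretty_print, ["word/NN", "seg/VV"])
# text is an ordinary str and can be passed to file.write(text)
```

This only works if the function really writes through sys.stdout at call time, as print() does.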
The XML File I'm trying to read starts with b':
b'<?xml version="1.0" encoding="UTF-8" ?><root><property_id type="dict"><n53987 type="int">54522</n53987><n65731 type="int">66266</n65731><n44322 type="int">44857</n44322><n11633 type="int">12148</n11633><n28192 type="int">28727</n28192><n69053 type="int">69588</n69053><n26529 type="int">27064</n26529><n4844 type="int">4865</n4844><n7625 type="int">7646</n7625><n54697 type="int">55232</n54697><n6210 type="int">6231</n6210><n26710 type="int">27245</n26710><n57915 type="int">58450</n57915
import xml.etree.ElementTree as etree
tree = etree.decode("UTF-8").parse("./property.xml")
How can I decode this file? And read the dict type afterwards?
You can try the following; note that it returns an Element instance rather than an ElementTree:
import ast
import xml.etree.ElementTree as etree
tree = None
with open("property.xml", "r") as xml_file:
    f = xml_file.read()
# convert string representation of bytes back to bytes
raw_xml_bytes = ast.literal_eval(f)
# read XML from raw bytes
tree = etree.fromstring(raw_xml_bytes)
Another way is to read the file, write the decoded XML out as a normal text file, and then parse that file; this returns an ElementTree instance. You can achieve this using the following:
import ast
import xml.etree.ElementTree as etree
tree = None
with open("property.xml", "r") as xml_file:
    f = xml_file.read()
# convert string representation of bytes back to bytes
raw_xml_bytes = ast.literal_eval(f)
# save the decoded XML (the declaration says UTF-8) as a normal file
with open('output.xml', 'w', encoding='utf-8') as file_obj:
    file_obj.write(raw_xml_bytes.decode('utf-8'))
# read saved XML file; pass the open file object, not the string f
with open('output.xml', 'rb') as xml_file:
    tree = etree.parse(xml_file)
Reading an XML file that was opened in binary mode ('rb') returns data of type bytes, which has a .decode() method (cf. https://docs.python.org/3/library/stdtypes.html#bytes.decode). You can do the following, using the appropriate encoding name:
my_xml_text = xml_file.read().decode('utf-8')
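For the second part of the question (reading the dict afterwards), here is a minimal sketch. It assumes the file holds the text of a Python bytes literal, as shown in the question, and that every child of property_id carries type="int". The sample in file_text is a shortened, hypothetical stand-in; its closing tags are an assumption, since the original snippet is truncated.

```python
import ast
import xml.etree.ElementTree as etree

# Shortened stand-in for the contents of property.xml (closing tags assumed)
file_text = (
    "b'<?xml version=\"1.0\" encoding=\"UTF-8\" ?><root>"
    "<property_id type=\"dict\">"
    "<n53987 type=\"int\">54522</n53987>"
    "<n65731 type=\"int\">66266</n65731>"
    "</property_id></root>'"
)

raw_xml_bytes = ast.literal_eval(file_text)  # text of a bytes literal -> bytes
root = etree.fromstring(raw_xml_bytes)       # bytes -> Element for <root>

property_ids = {}
for child in root.find("property_id"):
    # Assumption: all values are integers, as in the visible part of the file
    property_ids[child.tag] = int(child.text)
```

If the real file mixes value types, the type attribute of each child (child.get("type")) could be used to choose the conversion.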
Initially I want to create a zip file dynamically and return it in an HTTP response. I use the Python 3.7 zipfile library.
I tried both an io buffer and a tmp dir; neither of them creates a valid zip archive. The archive can only be opened if it is saved to disk.
import zipfile
import io
#==============================================
# V1
file_like_object = io.BytesIO()
myZipFile = zipfile.ZipFile(file_like_object, "w", compression=zipfile.ZIP_DEFLATED)
myZipFile.writestr(u'test.py', b'test')
tmparchive="zip1.zip"
out = open(tmparchive,'wb') ## Open temporary file as bytes
out.write(file_like_object.getvalue())
out.close()
r = open(tmparchive, 'rb')
print (r.read())
r.close()
#==============================================
# V2
tmparchive2 = 'zip2.zip'
myZipFile2 = zipfile.ZipFile(tmparchive2, "w", compression=zipfile.ZIP_DEFLATED)
myZipFile2.writestr(u'test.py', b'test')
r2 = open(tmparchive2, 'rb')
print (r2.read())
r2.close()
#====================================================
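ZipFile only writes the archive's central directory when it is closed, and neither version in the question calls close() before reading the data back. It's preferable to use a context manager, which closes the archive for you, like so: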
import zipfile, io
file_like_object = io.BytesIO()
with zipfile.ZipFile(file_like_object, "w", compression=zipfile.ZIP_DEFLATED) as myZipFile:
    myZipFile.writestr(u'test.txt', b'test')
# file_like_object.getvalue() are the bytes you send in your http response.
I wrote it to file. It's definitely a valid zip file.
If you want to open the archive, you need to save it to disk. Applications like Explorer and 7-Zip have no way to read the BytesIO object that exists in the python process. They can only open archives saved to disk.
Calling print(r.read()) isn't going to open the archive. It's just going to print the bytes that make up the tiny zip file you just created.
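The in-memory bytes can also be checked without saving anything to disk. The following sketch uses zipfile.is_zipfile on a copy of the buffer before and after closing the archive; the variable names are illustrative.

```python
import io
import zipfile

file_like_object = io.BytesIO()
archive = zipfile.ZipFile(file_like_object, "w", compression=zipfile.ZIP_DEFLATED)
archive.writestr("test.txt", b"test")

# Before close(): the end-of-archive records have not been written yet
valid_before_close = zipfile.is_zipfile(io.BytesIO(file_like_object.getvalue()))

archive.close()

# After close(): the buffer holds a complete archive
zip_bytes = file_like_object.getvalue()
valid_after_close = zipfile.is_zipfile(io.BytesIO(zip_bytes))

# Read the archive back from memory to confirm its contents
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as check:
    names = check.namelist()
    content = check.read("test.txt")
```

Here zip_bytes is what you would send as the HTTP response body.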
I need to parse a remote PDF file. With PyPDF2, this should be possible with PdfReader(f), where f = urllib.request.urlopen("some-url").read(). However, f cannot be used by PdfReader directly, and it seems that f has to be decoded. What argument should be passed to decode(), or does some other method have to be used?
You need to use:
f = urllib.request.urlopen("some-url").read()
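Add these lines after the line above. On Python 3, use io.BytesIO: the downloaded data is bytes, and the separate StringIO module exists only on Python 2.
from io import BytesIO
f = BytesIO(f)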
and then read using PdfReader as:
reader = PdfReader(f)
Also, refer: Opening pdf urls with pyPdf
How do I start creating my own file type in Python? I have a design in mind, but how do I pack my data into a file with a specific format?
For example, I would like my file format to be a mix of an archive (like zip, apk, jar, etc., which are basically all archives) with some room for packed files, plus a section of the file containing settings and serialized data that will not be accessed by an archive-manager application.
My requirement is to do all this with the default modules for CPython, without external modules.
I know that this can be long to explain and do, but I can't see how to start this in Python 3.x with CPython.
Try this:
from zipfile import ZipFile
import json
data = json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
with ZipFile('foo.filetype', 'w') as myzip:
    myzip.writestr('digest.json', data)
The file is now a zip archive containing a JSON file (which is easy to read back in many languages) for the data. You can add files to the archive with myzip.write or myzip.writestr. You can read the data back with:
with ZipFile('foo.filetype', 'r') as myzip:
    json_data_read = myzip.read('digest.json')
    newdata = json.loads(json_data_read)
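As a sketch of how packed files and a settings section can sit side by side in the same container, the following round trip writes both and reads them back. The settings dict, the payload/notes.txt entry and the temporary directory are all hypothetical choices made for illustration.

```python
import json
import os
import tempfile
from zipfile import ZipFile

settings = {"version": 1, "tags": ["foo", "bar"]}  # hypothetical settings

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "foo.filetype")

    # Write: one JSON entry for settings, one entry for a packed file
    with ZipFile(path, "w") as myzip:
        myzip.writestr("digest.json", json.dumps(settings))
        myzip.writestr("payload/notes.txt", "packed file contents")

    # Read: list the entries and load both back
    with ZipFile(path, "r") as myzip:
        names = myzip.namelist()
        loaded = json.loads(myzip.read("digest.json"))
        notes = myzip.read("payload/notes.txt").decode("utf-8")
```

Note that an archive manager can still see digest.json with this layout; it is simply one more entry in the zip.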
Edit: you can append arbitrary data to the file with:
f = open('foo.filetype', 'a')
f.write(data)
f.close()
This works for WinRAR, but Python can no longer process the zip file.
Use this:
import base64
import gzip
import ast
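def save(data):
    # !r uses repr(), which keeps the quotes around str values so that
    # ast.literal_eval can parse them again in load()
    data = "[{!r}]".format(data).encode()
    data = base64.b64encode(data)
    return gzip.compress(data)
def load(data):
    data = gzip.decompress(data)
    data = base64.b64decode(data)
    return ast.literal_eval(data.decode())[0]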
How to use this with a file:
open(filename, "wb").write(save(data)) # save data
data = load(open(filename, "rb").read()) # load data
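This might look as though it can be opened with an archive program, and the gzip layer can indeed be decompressed,
but the result is base64-encoded text that has to be decoded before the data can be read.
You can also store any value that ast.literal_eval can parse (strings, bytes, numbers, tuples, lists, dicts, sets, booleans and None).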
example:
open(filename, "wb").write(save({"foo": "bar"})) # dict
open(filename, "wb").write(save("foo bar")) # string
open(filename, "wb").write(save(b"foo bar")) # bytes
# there's more you can store!
This may not be appropriate for your question but I think this may help you.
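A quick round-trip check of this save/load idea, as a self-contained sketch. It builds the text with repr() (the !r conversion), because ast.literal_eval needs the quotes around string values; the samples list is illustrative.

```python
import ast
import base64
import gzip


def save(data):
    """Serialize a Python literal: repr -> base64 -> gzip."""
    text = "[{!r}]".format(data)  # repr keeps quotes around str values
    return gzip.compress(base64.b64encode(text.encode()))


def load(blob):
    """Reverse save(): gunzip -> base64-decode -> literal_eval."""
    text = base64.b64decode(gzip.decompress(blob)).decode()
    return ast.literal_eval(text)[0]


samples = [{"foo": "bar"}, "foo bar", b"foo bar", (1, 2.5, None)]
restored = [load(save(value)) for value in samples]
```

Only literal values survive this round trip; instances of custom classes would need a different serializer.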
I faced a similar problem and ended up creating a zip file and then renaming its extension to my custom file format. It can still be opened with WinRAR, though.