Editing a .odt File using python - python-3.x

First off i must say i am VERY new to programming (less then a week experience in total). I set out to write a program that generates a series of documents of an .odt template. I want to use a template with a specific keyword lets say "X1234X" and so on. This will then be replaced by values generated from the program. Each document is a little different and the values are entered and calculated via a prompt (dates and other things)
I wrote most of the code so far but i am stuck since 2 days on that problem. I used the ezodf module to generate a new document (with different filenames) from a template but i am stuck on how to edit the content.
I googled hard but came up empty hope someone here could help. I tried reading the documentations but i must be honest...its a bit tough to understand. I am not familiar with the "slang"
Thanks
PS: a ezodf method would be great, but any other ways will do too. The program doesnt have to be pretty it just has to work (so i can work less ^_^)

Well i figured it out. nd finished the program. I used a ezodf to create the file, then zipfile to extract and edit the content.xml and then repacked the whole thing via a nice >def thingy< from here. I tried to mess with etree...but i couldnt figure it out...
from ezodf import newdoc
import os
import zipfile
import tempfile
for s in temp2:
input2 = s
input2 = str(s)
input1 = cname[0]
file1 = '.odt'
namef = input2 + input1 + file1
odt = newdoc(doctype='odt', filename=namef, template='template.odt')
odt.save()
a = zipfile.ZipFile('template.odt')
content = a.read('content.xml')
content = str(content.decode(encoding='utf8'))
content = str.replace(content,"XXDATEXX", input2)
content = str.replace(content, 'XXNAMEXX', input1)
def updateZip(zipname, filename, data):
# generate a temp file
tmpfd, tmpname = tempfile.mkstemp(dir=os.path.dirname(zipname))
os.close(tmpfd)
# create a temp copy of the archive without filename
with zipfile.ZipFile(zipname, 'r') as zin:
with zipfile.ZipFile(tmpname, 'w') as zout:
zout.comment = zin.comment # preserve the comment
for item in zin.infolist():
if item.filename != filename:
zout.writestr(item, zin.read(item.filename))
# replace with the temp archive
os.remove(zipname)
os.rename(tmpname, zipname)
# now add filename with its new data
with zipfile.ZipFile(zipname, mode='a', compression=zipfile.ZIP_DEFLATED) as zf:
zf.writestr(filename, data)
updateZip(namef, 'content.xml', content)

Related

How do I dynamically create a variable name in a loop to assign to a file name in python 3

I'm still relatively new to programming and Python. But I am sure this must be possible but my searches are not turning up what I'm looking for.
In my current directory, I have 6 PDF files that I wish to read in via the loop below.
What I would like to do is open each of the PDF's with a new variable name, as you can see it is imaginatively called pdf[1-6]File.pdf.
I can list the files in the console and pull them via the code when I stick breaks in to stop it executing but I can't for the life of me work out how to create the variable name. I thought something like "pdf" + str(i) + "File" would have worked but I'm missing something.
Code is below - not complete but enough so you get what I'm looking at:
#Open the PDF files in the current directory for
#reading in binary mode
def opensource():
listOfFiles = os.listdir('.')
pattern = "*.pdf"
for entry in listOfFiles:
if fnmatch.fnmatch(entry, pattern):
# Works to here perfectly
for i in range(len(entry)):
# print(len(entry))
# Trying to create the variable name with
# an incremental numeral in the file name
"pdf" + i + "File" = open(entry, 'rb')
This bit below is how I'm currently doing it and its a pain in the backside. I'm sure it can be done programmatically
#This is the old way. Monolithic and horrid
#Open the files that have to be merged one by one
pdf1File = open('file1.pdf', 'rb')
pdf2File = open('file2.pdf', 'rb')
pdf3File = open('file3.pdf', 'rb')
pdf4File = open('file4.pdf', 'rb')
pdf5File = open('file5.pdf', 'rb')
pdf6File = open('file6.pdf', 'rb')
All help gratefully received.
Thanks
If you are going to use the file pointer outside this for loop, you can very well use a dictionary to do that..
def opensource():
listOfFiles = os.listdir('.')
pattern = "*.pdf"
file_ptrs = {}
for entry in listOfFiles:
if fnmatch.fnmatch(entry, pattern):
# Works to here perfectly
for i in range(len(entry)):
# print(len(entry))
# Trying to create the variable name with
# an incremental numeral in the file name
file_ptrs["pdf" + str(i) + "File"] = open(entry, 'rb')
Caution: Its always advisable to use the open method alongside of a "with" clause in python.. it takes care of closing the file once the file operation goes out of context.

Reportlab PDF creating with python duplicating text

I am trying to automate the production of pdfs by reading data from a pandas data frame and writing it a page on an existing pdf form using pyPDF2 and reportlab. The main meat of the program is here:
def pdfOperations(row, bp):
packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=letter)
createText(row, can)
packet.seek(0)
new_pdf = PdfFileReader(packet)
textPage = new_pdf.getPage(0)
secondPage = bp.getPage(1)
secondPage.mergePage(textPage)
assemblePDF(frontPage, secondPage, row)
del packet, can, new_pdf, textPage, secondPage
def main():
df = openData()
bp = readPDF()
frontPage = bp.getPage(0)
for ind in df.index:
row = df.loc[ind]
pdfOperations(row, bp)
This works fine for the first row of data and the first pdf generated, but for the subsequent ones all the text is overwritten. I.e. the second pdf contains text from the first iteration and the second. I thought the garbage collection would take care of all the in memory changes, but that does not seem to be happening. Anyone know why?
I even tries forcing the objects to be deleted after the function has run its course, but no luck...
You read bp only once before the loop. Then in the loop, you obtain its second page via getPage(1) and merge stuff to it. But since its always from the same object (bp), each iteration will merge to the same page, therefore all the merges done before add up.
While I don't find any way to create a "deepcopy" of a page in PyPDF2's docs, it should work to just create a new bp object for each iteration.
Somewhere in readPDF you must have done something where you open your template PDF into a binary stream and then pass that to PdfFileReader. Instead, you could read the data into a variable:
with open(filename, "rb") as f:
bp_bin = f.read()
And from that, create a new PdfFileReader instance for each loop iteration:
for ind in df.index:
row = df.loc[ind]
bp = PdfFileReader(bp_bin)
pdfOperations(row, bp)
This should "reset" the secondPage everytime without any additional file I/O overhead. Only the parsing is done again each time, but depending on the file size and contents, maybe the time that takes is low and you can live with that.

add new row to numpy using realtime reading

I am using a microstacknode accelerometer and intend to save it into csv file.
while True:
numpy.loadtxt('foo.csv', delimiter=",")
raw = accelerometer.get_xyz(raw=True)
g = accelerometer.get_xyz()
ms = accelerometer.get_xyz_ms2()
a = numpy.asarray([[raw['x'],raw['y'],raw['z']]])
numpy.savetxt("foo.csv",a,delimiter=",",newline="\n")
However, the saving is only done on 1 line. Any help given? Still quite a noobie on python.
NumPy is not the best solution for this type of things.
This should do what you intend:
while True:
raw = accelerometer.get_xyz(raw=True)
fobj = open('foo.csv', 'a')
fobj.write('{},{},{}\n'.format(raw['x'], raw['y'], raw['z']))
fobj.close()
Here fobj = open('foo.csv', 'a') opens the file in append mode. So if the file already exists, the next writing will go to the end of file, keeping the data in the file.
Let's have look at your code. This line:
numpy.loadtxt('foo.csv', delimiter=",")
reads the whole file but doe not do anything with the at it read, because you don't assign to a variable. You would need to do something like this:
data = numpy.loadtxt('foo.csv', delimiter=",")
This line:
numpy.savetxt("foo.csv",a,delimiter=",",newline="\n")
Creates a new file with the name foo.csv overwriting the existing one. Therefore, you see only one line, the last one written.
This should do the same but dos not open and close the file all the time:
with open('foo.csv', 'a') as fobj:
while True:
raw = accelerometer.get_xyz(raw=True)
fobj.write('{},{},{}\n'.format(raw['x'], raw['y'], raw['z']))
The with open() opens the file with the promise to close it even in case of an exception. For example, if you break out of the while True loop with Ctrl-C.

Why is csvreader for Python starting good then producing NULL bytes?

OKay so I am reading an excel workbook. I read the file for a while and it started off a .csv after debugging and doing other things below the code i am showing you it changed to a xlsx I started getting IOError no such file or directory. I figured out why and changed FFA.csv to FFA.xlsx and it worked error free. Then I started doing other things and debugging. Got up this morning and now i get the following Error : line contains NULL byte. weird because the code started out good. Now it can't read. I put in the print repr() to debug and it infact now prints NULL bytes. So how do i fix this and prevent it in the future? here is the 1st 200 bytes:
PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00b\xee\x9dh^\x01\x00\x00\x90\x04\x00\x00\x13\x00\x08\x02[Content_Types].xml \xa2\x04\x02(\xa0\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
import csv
def readFile():
count = 0
print repr(open("FFA.xlsx", "rb").read(200)) #dump 1st 200 bytes
with open("FFA.xlsx","rb") as csvfile:
FFAreader = csv.reader(csvfile, delimiter=",")
for row in FFAreader:
idd = row[0]
name = row[1]
pos = row[2]
team = row[3]
pts = row[4]
oecr = row[5]
oR = row[6]
posR = row[7]
up = row[8]
low =row[9]
risk = row[10]
swing = row[11]
readFile()
The code you have posted have a small but dangerous mistake, since you are leaking the file handle by opening it twice.
1) You are opening the file and reading 200 bytes from it, but not closing it.
2) You are then opening the file the proper way, via a context manager, which in fact could read anything from it.
Some questions that may help you to debug the problem:
Is the file you are opening stored in a network'd resource? (CIFS, NFS, etc)
Have you checked the file is not opened by another process? lsof can help you to check that.
Is this running on windows or Linux? Can you test in under linux, if it happens in windows, and viceversa?
I forgot to mention that you should not use CSV for anything related to Excel, even when the file seems to be a CSV data-wise. Use XLRD module (https://pypi.python.org/pypi/xlrd) , it's cross-platform and opens and reads perfectly fine both XSL and XSLX files since version 0.8.
This little piece of code will show you how to open the workbook and parse it in a basic manner:
import xlrd
def open_excel():
with xlrd.open_workbook('FFA.xlsx') as wb:
sh = wb.sheet_by_name('Sheet1')
for rownum in xrange(sh.nrows):
[Do whatever you need here]
I agree with Marc, I did a training exercise importing an excel file and I think pandas library would help in that case where you can import pandas as pd and use pd.read_excel(file_name) as part of a data_processing function like read_file() post import.
So this is what I did. But I am intersted in learning the xlrd method i have the module but no documentation. This works no error messages. Still not sure why it changed from .csv to xlsx but its working now. What is the script like in xlrd?
import csv
def readFile():
count = 0
#print repr(open("FFA.csv", "rb").read(200)) #dump 1st 200 bytes check if null values produced.
with open("FFA.csv","rb") as csvfile:
FFAreader = csv.reader(csvfile, delimiter=",")
for row in FFAreader:
idd = row[0]
name = row[1]
pos = row[2]
team = row[3]
pts = row[4]
oecr = row[5]
oR = row[6]
posR = row[7]
up = row[8]
low =row[9]
risk = row[10]
swing = row[11]
readFile()

Custom filetype in Python 3

How to start creating my own filetype in Python ? I have a design in mind but how to pack my data into a file with a specific format ?
For example I would like my fileformat to be a mix of an archive ( like other format such as zip, apk, jar, etc etc, they are basically all archives ) with some room for packed files, plus a section of the file containing settings and serialized data that will not be accessed by an archive-manager application.
My requirement for this is about doing all this with the default modules for Cpython, without external modules.
I know that this can be long to explain and do, but I can't see how to start this in Python 3.x with Cpython.
Try this:
from zipfile import ZipFile
import json
data = json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
with ZipFile('foo.filetype', 'w') as myzip:
myzip.writestr('digest.json', data)
The file is now a zip archive with a json file (thats easy to read in again in many lannguages) for data you can add files to the archive with myzip write or writestr. You can read data back with:
with ZipFile('foo.filetype', 'r') as myzip:
json_data_read = myzip.read('digest.json')
newdata = json.loads(json_data_read)
Edit: you can append arbitrary data to the file with:
f = open('foo.filetype', 'a')
f.write(data)
f.close()
this works for winrar but python can no longer process the zipfile.
Use this:
import base64
import gzip
import ast
def save(data):
data = "[{}]".format(data).encode()
data = base64.b64encode(data)
return gzip.compress(data)
def load(data):
data = gzip.decompress(data)
data = base64.b64decode(data)
return ast.literal_eval(data.decode())[0]
How to use this with file:
open(filename, "wb").write(save(data)) # save data
data = load(open(filename, "rb").read()) # load data
This might look like this is able to be open with archive program
but it cannot because it is base64 encoded and they have to decode it to access it.
Also you can store any type of variable in it!
example:
open(filename, "wb").write(save({"foo": "bar"})) # dict
open(filename, "wb").write(save("foo bar")) # string
open(filename, "wb").write(save(b"foo bar")) # bytes
# there's more you can store!
This may not be appropriate for your question but I think this may help you.
I have a similar problem faced... but end up with some thing like creating a zip file and then renamed the zip file format to my custom file format... But it can be opened with the winRar.

Resources