INTRO: I have a Databricks notebook where I create a PDF file based on some data.
To generate the file I am using the fpdf library:
from fpdf import FPDF, HTMLMixin
Thanks to the library I generate a PDF object of type <__main__.HTML2PDF at 0x7f3b73720fd0>.
My goal now is to send this PDF to a SharePoint folder. To do so I am using the following lines of code:
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
# paths
sharepoint_site = "MySharepointSite"
sharepoint_folder = "Shared Documents/General/PDFs/"
sharepoint_user = "aaa@bbb.onmicrosoft.com"
sharepoint_user_pw = "xyz"
sharepoint_folder = sharepoint_folder.strip("/")
# set environment variables
SITE_URL = f"https://sharepoint.com/sites/{sharepoint_site}"
RELATIVE_URL = f"/sites/{sharepoint_site}/{sharepoint_folder}"
# connect to sharepoint
ctx = ClientContext(SITE_URL).with_credentials(UserCredential(sharepoint_user, sharepoint_user_pw))
web = ctx.web
ctx.load(web).execute_query()
# Generate PDF
pdf = generate_pdf(ctx, row['ServerRelativeUrl'])
# HERE IS MY ISSUE!
ctx.web.get_folder_by_server_relative_url(sharepoint_folder).upload_file('test.pdf', pdf).execute_query()
PROBLEM: When I run the last line I get the following error message:
TypeError: Object of type HTML2PDF is not JSON serializable
I believe that FPDF objects cannot be serialized to JSON, so I am stuck and do not know how to send the PDF to SharePoint.
QUESTION: Would you be able to suggest a smart and elegant way to achieve my goal, i.e. sending the PDF file to SharePoint?
I was able to solve this problem by outputting the PDF as a string, encoding it to bytes, and finally uploading it to SharePoint:
pdf_binary = pdf.output(dest='S').encode("latin1")
ctx.web.get_folder_by_server_relative_url(sharepoint_folder).upload_file("test.pdf", pdf_binary).execute_query()
Note: if this does not work, try a different encoding.
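As a side note: if you are using the newer fpdf2 package rather than the original fpdf, output() already returns a bytearray, so the latin-1 encoding step is not needed. A minimal sketch, assuming fpdf2 and the same ctx and sharepoint_folder as above:
# Assumes the fpdf2 package, where output() with no arguments returns a bytearray
pdf_binary = bytes(pdf.output())
ctx.web.get_folder_by_server_relative_url(sharepoint_folder).upload_file("test.pdf", pdf_binary).execute_query()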
Related
My AWS Object Lambda function gets an unencrypted PDF via the Object Lambda inputS3Url. I want to use PyPDF2 to convert this to an encrypted PDF and send it back via s3.write_get_object_response. How do I do this?
s3_url = object_get_context["inputS3Url"]
url = s3_url
response = requests.get(url)
my_raw_data = response.content

# [SAVE ENCRYPTED my_raw_data TO A VARIABLE so it can be returned via s3.write_get_object_response - HOW?]

s3 = boto3.client('s3')
s3.write_get_object_response(
    Body=[WHAT WOULD GO HERE?],
    RequestRoute=request_route,
    RequestToken=request_token)
The docs have you covered! Encrypting PDFs and Streaming Data are what you need (at least if I understood you correctly; let me know if you want to achieve something other than a password-protected PDF on S3).
Not tested, but something like this:
import boto3
from io import BytesIO
from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader(BytesIO(my_raw_data))
writer = PdfWriter()
# Add all pages to the writer
for page in reader.pages:
    writer.add_page(page)
# Add a password to the new PDF
writer.encrypt("my-secret-password")
# Write the new PDF into an in-memory stream
with BytesIO() as bytes_stream:
    writer.write(bytes_stream)
    bytes_stream.seek(0)

    s3 = boto3.client('s3')
    s3.write_get_object_response(
        Body=bytes_stream,
        RequestRoute=request_route,
        RequestToken=request_token
    )
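To sanity-check the result before wiring it into the Lambda, you can read the encrypted bytes back and decrypt them with the same password; a minimal sketch, assuming the writer and password from the snippet above:
from io import BytesIO
from PyPDF2 import PdfReader

# Write the encrypted PDF into a fresh buffer, then read it back and unlock it
buf = BytesIO()
writer.write(buf)
buf.seek(0)
check = PdfReader(buf)
check.decrypt("my-secret-password")
print(f"Encrypted PDF contains {len(check.pages)} page(s)")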
I'm writing an Azure function in Python 3.9 that needs to accept a base64 string created from a known .docx file which will serve as a template. My code will decode the base64, pass it to a BytesIO instance, and pass that to docx.Document(). However, I'm receiving an exception BadZipFile: File is not a zip file.
Below is a slimmed down version of my code. It fails on document = Document(bytesIODoc). I'm beginning to think it's an encoding/decoding issue, but I don't know nearly enough about it to get to the solution.
from docx import Document
from io import BytesIO
import base64
var = {
    'template': 'Some_base64_from_docx_file',
    'data': {'some': 'data'}
}
run_stuff = ParseBody(body=var)
output = run_stuff.run()
class ParseBody():
    def __init__(self, body):
        self.template = str(body['template'])
        self.contents = body['data']

    def _decode_template(self):
        b64Doc = base64.b64decode(self.template)
        bytesIODoc = BytesIO(b64Doc)
        document = Document(bytesIODoc)

    def run(self):
        self.document = self._decode_template()
I've also tried the following change to _decode_template and am getting the same exception. This is running base64.decodebytes() on the b64Doc object and passing that to BytesIO instead of directly passing b64Doc.
def _decode_template(self):
    b64Doc = base64.b64decode(self.template)
    bytesDoc = base64.decodebytes(b64Doc)
    bytesIODoc = BytesIO(bytesDoc)
I have successfully tried the following on the exact same .docx file to make sure this is possible: I can open the document in Python, base64-encode it, decode it into bytes, pass that to a BytesIO instance, and pass that to docx.Document successfully.
file = r'WordTemplate.docx'
doc = open(file, 'rb').read()
b64Doc = base64.b64encode(doc)
bytesDoc = base64.decodebytes(b64Doc)
bytesIODoc= BytesIO(bytesDoc)
newDoc = Document(bytesIODoc)
I've tried countless other solutions to no avail, and they have only led me further from a resolution. This is the closest I've gotten. Any help is greatly appreciated!
The answer to the question linked below actually helped me resolve my own issue. How to generate a DOCX in Python and save it in memory?
All I had to do was change document = Document(bytesIODoc) to the following:
document = Document()
document.save(bytesIODoc)
I have a Tableau URL with a grid report in it. I need to print the page to PDF (A3) using Python. Is there a way to achieve this? I tried using pdfkit and the requests.get method, but it does not give the proper output.
import requests
url = 'http://tabiisweb.sample.com/'
myfile = requests.get(url, allow_redirects=True, stream=True)
open('c:/tabfile.pdf', 'wb').write(myfile.content)
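One option, instead of fetching the HTML page with requests, is Tableau's REST API via the tableauserverclient package, which can render a view straight to an A3 PDF. A minimal sketch, not tested against your server; the credentials, site name and view selection below are placeholder assumptions:
import tableauserverclient as TSC

# Placeholder credentials, site and server URL - replace with your own
tableau_auth = TSC.TableauAuth('my_user', 'my_password', site_id='my_site')
server = TSC.Server('http://tabiisweb.sample.com', use_server_version=True)

with server.auth.sign_in(tableau_auth):
    # Pick the view that backs the grid report (here simply the first one returned)
    all_views, _ = server.views.get()
    view = all_views[0]

    # Ask Tableau Server to render the view as an A3 PDF
    pdf_options = TSC.PDFRequestOptions(
        page_type=TSC.PDFRequestOptions.PageType.A3,
        orientation=TSC.PDFRequestOptions.Orientation.Portrait,
    )
    server.views.populate_pdf(view, pdf_options)

    with open('c:/tabfile.pdf', 'wb') as f:
        f.write(view.pdf)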
I need to download the attachments of a Jira issue using Python. I have the following code:
from atlassian import Jira
issue = jira.issue(issuekey, fields='summary,comment,attachment')
for attachment in issue['fields']['attachment']:
    with open((attachment.filename), 'wb') as file:
        file.write(attachment.get(b'', b''))
After running the code I'm getting 3 empty files (txt, png, png) without any data inside.
How can I get (download) the files from the issue to my current folder?
Try using expand="attachment"
Ex:
issue = jira.issue(issuekey, expand="attachment")
for attachment in issue['fields']['attachment']:
    with open(attachment.filename, 'wb') as file:
        file.write(attachment.get())
You need the link to the contents of the attachment, which is stored under the key 'content'. Then just use the .get() request that is in the Jira library:
for attachment in issue['fields']['attachment']:
    link = attachment['content']
    link = link.split("https://jira.companyname.com/")[1]
    b_str = jira.get(link, not_json_response=True)
    with open(attachment['filename'], 'wb') as file:
        file.write(b_str)
Note that you need to trim the link, because jira.get() automatically prepends the domain to the request URL.
Get the attachment details:
1. Get the Jira attachment file URL.
2. Download the file using the requests module.
3. Check the file in the file list.
import requests

issue = jira.issue(jira_ticket, expand='changelog')
attach = issue.fields.attachment
file_url = attach[0].content
file_path = "filename"
r = requests.get(file_url, auth=('jira_user', 'jira_pass'))
with open(file_path, 'wb') as f:
    f.write(r.content)
I am trying to download data from a CSV file stored in a Dropbox folder. So far I do it like this:
import dropbox
#Get access to my dropbox folder
dbx = dropbox.Dropbox('SOME_ACCESS_TOKEN')
dbx.users_get_current_account()
#Download file
metadata, res = dbx.files_download('/Test.csv')
#Get the file content
data=res.content
print(data)
data is of this form: b'1,2,3,4,5\r\nA,B,C,D,E\r\n1,2,3,4,5\r\nA,B,C,D,E\r\n1,2,3,4,5\r\nA,B,C,D,E\r\n'
Is there an easy way to restructure this into a list of lists?
The solution to the above-mentioned problem is:
import dropbox
#Connect to dropbox folder
dbx = dropbox.Dropbox('SOME_ACCESS_TOKEN')
dbx.users_get_current_account()
#Get metadata
metadata, res = dbx.files_download('/Test.csv')
#Get and decode data
data=res.content.decode('utf-8')
#Restructure data
lines = data.split('\r\n')
lines.pop()
print(lines)
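If you actually need a list of lists (one inner list per row) rather than a list of strings, the csv module can parse the decoded text directly. A minimal sketch, assuming the same res returned by files_download above:
import csv
from io import StringIO

# Parse the decoded CSV text into a list of lists, one inner list per row
data = res.content.decode('utf-8')
rows = list(csv.reader(StringIO(data)))
print(rows)  # e.g. [['1', '2', '3', '4', '5'], ['A', 'B', 'C', 'D', 'E'], ...]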