How to send a PDF object from Databricks to SharePoint?

INTRO: I have a Databricks notebook where I create a pdf file based on some data.
In order to generate the file I am using the fpdf library:
from fpdf import FPDF, HTMLMixin
The library gives me a PDF object whose repr is <__main__.HTML2PDF at 0x7f3b73720fd0>.
My goal now is to send this PDF to a SharePoint folder. To do so I am using the following lines of code:
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
# paths
sharepoint_site = "MySharepointSite"
sharepoint_folder = "Shared Documents/General/PDFs/"
sharepoint_user = "aaa#bbb.onmicrosoft.com"
sharepoint_user_pw = "xyz"
sharepoint_folder = sharepoint_folder.strip("/")
# set environment variables
SITE_URL = f"https://sharepoint.com/sites/{sharepoint_site}"
RELATIVE_URL = f"/sites/{sharepoint_site}/{sharepoint_folder}"
# connect to sharepoint
ctx = ClientContext(SITE_URL).with_credentials(UserCredential(sharepoint_user, sharepoint_user_pw))
web = ctx.web
ctx.load(web).execute_query()
# Generate PDF
pdf = generate_pdf(ctx, row['ServerRelativeUrl'])
# HERE IS MY ISSUE!
ctx.web.get_folder_by_server_relative_url(sharepoint_folder).upload_file('test.pdf', pdf).execute_query()
PROBLEM: When I reach the last line I get the following error message:
TypeError: Object of type HTML2PDF is not JSON serializable
I believe the PDF object cannot be serialized to JSON, so I am stuck and do not know how to send the PDF to SharePoint.
QUESTION: Could you suggest a clean way to achieve my goal, i.e. sending the PDF file to SharePoint?

I solved this problem by having fpdf return the PDF as a string, encoding that string to bytes, and uploading the bytes to SharePoint:
pdf_binary = pdf.output(dest='S').encode("latin1")
ctx.web.get_folder_by_server_relative_url(sharepoint_folder).upload_file("test.pdf", pdf_binary).execute_query()
Note: If the upload fails or the file is corrupted, try a different encoding.
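As a sketch of why the encoding step works: the legacy PyFPDF package returns the document from output(dest='S') as a str in which each character stands for one byte, and latin-1 maps code points 0 to 255 onto single bytes. Newer fpdf2 releases return a bytearray instead, and calling .encode on that fails. The helper below accepts either form; its name, to_pdf_bytes, is hypothetical and it is not part of either library.

```python
def to_pdf_bytes(output):
    """Normalize the return value of FPDF.output() to bytes (hypothetical helper)."""
    if isinstance(output, str):
        # Legacy PyFPDF: a str whose code points are all below 256,
        # so latin-1 turns each character into exactly one byte.
        return output.encode("latin-1")
    # fpdf2: already binary (bytearray or bytes), so only copy it to bytes.
    return bytes(output)


# Both input forms end up as the same kind of object.
print(type(to_pdf_bytes("%PDF-1.3")), type(to_pdf_bytes(bytearray(b"%PDF-1.3"))))
```

With this in place, the upload would use pdf_binary = to_pdf_bytes(pdf.output(dest='S')) and pass pdf_binary to upload_file as in the answer.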

Related

AWS Object Lambda using PyPDF2 to send back encrypted PDF

My AWS Object Lambda function gets an unencrypted PDF via the Object Lambda inputS3Url. I want to use PyPDF2 to convert it to an encrypted PDF and send it back via s3.write_get_object_response. How do I do this?
s3_url = object_get_context["inputS3Url"]
url=s3_url
response = requests.get(url)
my_raw_data = response.content
[SAVE ENCRYPTED my_raw_data TO A VARIABLE so it can be returned via s3.write_get_object_response - HOW?]
s3 = boto3.client('s3')
s3.write_get_object_response(
    Body=[WHAT WOULD GO HERE?],
    RequestRoute=request_route,
    RequestToken=request_token)
The PyPDF2 docs cover this: the "Encrypting PDFs" and "Streaming Data" sections are what you need (at least if I understood you correctly; let me know if you want something other than a password-protected PDF on S3).
Not tested, but something like this:
import boto3
from io import BytesIO
from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader(BytesIO(my_raw_data))
writer = PdfWriter()
# Add all pages to the writer
for page in reader.pages:
    writer.add_page(page)
# Add a password to the new PDF
writer.encrypt("my-secret-password")
# Write the encrypted PDF to an in-memory buffer and send it while the buffer is still open
with BytesIO() as bytes_stream:
    writer.write(bytes_stream)
    bytes_stream.seek(0)
    s3 = boto3.client('s3')
    s3.write_get_object_response(
        Body=bytes_stream,
        RequestRoute=request_route,
        RequestToken=request_token
    )
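If handing the open stream to Body proves awkward, one alternative is to collect the written bytes first and pass those as Body instead. The following is a minimal sketch that assumes only that the writer exposes a write(stream) method, as PdfWriter does. The helper name render_to_bytes is hypothetical, and FakeWriter is a stand-in that exists only so the example runs without PyPDF2.

```python
from io import BytesIO


def render_to_bytes(write_fn):
    """Call write_fn with an in-memory binary stream and return what it wrote."""
    with BytesIO() as buffer:
        write_fn(buffer)
        # getvalue() returns the whole buffer regardless of the current
        # position, so no seek(0) is needed before reading.
        return buffer.getvalue()


class FakeWriter:
    """Stand-in for PdfWriter, used here only for illustration."""

    def write(self, stream):
        stream.write(b"%PDF-1.7 demo")


payload = render_to_bytes(FakeWriter().write)
print(len(payload))
```

With the real writer this would be render_to_bytes(writer.write), and the resulting bytes object can be passed as Body.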

Passing base64 .docx to docx.Document results in BadZipFile exception

I'm writing an Azure function in Python 3.9 that needs to accept a base64 string created from a known .docx file which will serve as a template. My code will decode the base64, pass it to a BytesIO instance, and pass that to docx.Document(). However, I'm receiving an exception BadZipFile: File is not a zip file.
Below is a slimmed down version of my code. It fails on document = Document(bytesIODoc). I'm beginning to think it's an encoding/decoding issue, but I don't know nearly enough about it to get to the solution.
from docx import Document
from io import BytesIO
import base64
var = {
    'template': 'Some_base64_from_docx_file',
    'data': {'some': 'data'}
}
run_stuff = ParseBody(body=var)
output = run_stuff.run()
class ParseBody():
    def __init__(self, body):
        self.template = str(body['template'])
        self.contents = body['data']
    def _decode_template(self):
        b64Doc = base64.b64decode(self.template)
        bytesIODoc = BytesIO(b64Doc)
        document = Document(bytesIODoc)
    def run(self):
        self.document = self._decode_template()
I've also tried the following change to _decode_template and am getting the same exception. This is running base64.decodebytes() on the b64Doc object and passing that to BytesIO instead of directly passing b64Doc.
def _decode_template(self):
    b64Doc = base64.b64decode(self.template)
    bytesDoc = base64.decodebytes(b64Doc)
    bytesIODoc = BytesIO(bytesDoc)
I have successfully tried the following on the same exact .docx file to be sure that this is possible. I can open the document in Python, base64 encode it, decode into bytes, pass that to a BytesIO instance, and pass that to docx.Document successfully.
file = r'WordTemplate.docx'
doc = open(file, 'rb').read()
b64Doc = base64.b64encode(doc)
bytesDoc = base64.decodebytes(b64Doc)
bytesIODoc = BytesIO(bytesDoc)
newDoc = Document(bytesIODoc)
I've tried countless other solutions to no avail, and they have led me further away from a resolution. This is the closest I've gotten. Any help is greatly appreciated!
The answer to the question "How to generate a DOCX in Python and save it in memory?" helped me resolve my own issue.
All I had to do was change document = Document(bytesIODoc) to the following:
document = Document()
document.save(bytesIODoc)
Note that this creates a new, empty document and writes it into the buffer; it does not load the decoded template.
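Because a .docx file is a ZIP archive, BadZipFile means the bytes passed to Document are not a valid archive. One way to narrow this down is to check the decoded payload before handing it to python-docx. The sketch below assumes that a data-URI prefix on the incoming string is one possible cause, which the question does not confirm. The function name decode_docx_payload is hypothetical.

```python
import base64
import zipfile
from io import BytesIO


def decode_docx_payload(b64_text):
    """Decode a base64 string and confirm the result is a ZIP archive."""
    # Drop a data-URI prefix such as "data:...;base64," if one is present.
    if b64_text.startswith("data:") and "," in b64_text:
        b64_text = b64_text.split(",", 1)[1]
    raw = base64.b64decode(b64_text)
    if not zipfile.is_zipfile(BytesIO(raw)):
        raise ValueError(f"decoded payload is not a ZIP archive; first bytes: {raw[:8]!r}")
    # Return a fresh stream positioned at the start.
    return BytesIO(raw)


# Round trip with a tiny in-memory archive standing in for a real .docx file.
buffer = BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
stream = decode_docx_payload("data:application/octet-stream;base64," + encoded)
print(zipfile.ZipFile(stream).namelist())
```

If the check passes, the returned stream can be given to Document(stream). If it fails, the first bytes in the error message show what actually arrived.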

Print a webpage to a PDF file using Python

I have a Tableau URL with a grid report in it. I need to print the page to PDF (A3) using Python. Is there a way to achieve this? I tried pdfkit and requests.get, but neither gives the proper output.
import requests
url = 'http://tabiisweb.sample.com/'
myfile = requests.get(url, allow_redirects=True,stream = True)
open('c:/tabfile.pdf', 'wb').write(myfile.content)

Jira rest api get issue attachment via python

I need to download the attachments of a Jira issue using Python. I have the following code:
from atlassian import Jira
issue = jira.issue(issuekey, fields='summary,comment,attachment')
for attachment in issue['fields']['attachment']:
    with open((attachment.filename), 'wb') as file:
        file.write(attachment.get(b'', b''))
After running the code I get 3 empty files (txt, png, png) with no data inside.
How can I download the files from the issue to my current folder?
Try using expand="attachment".
Example:
issue = jira.issue(issuekey, expand="attachment")
for attachment in issue['fields']['attachment']:
    with open(attachment.filename, 'wb') as file:
        file.write(attachment.get())
You need the link to the contents of the attachment, which is stored under the key 'content'. Then fetch it with the .get() request method provided by the Jira library:
for attachment in issue['fields']['attachment']:
    link = attachment['content']
    link = link.split("https://jira.companyname.com/")[1]
    b_str = jira.get(link, not_json_response=True)
    with open((attachment['filename']), 'wb') as file:
        file.write(b_str)
Notice that you need to trim the link, because jira.get() automatically prepends the domain to the request URL.
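The trimming step can also be done without hardcoding the domain. The sketch below uses urllib.parse from the standard library; the function name relative_api_path and the example URL path are hypothetical. It assumes the client's base URL is just scheme and host; if the Jira instance is served under a context path, that prefix would need to be removed as well.

```python
from urllib.parse import urlsplit


def relative_api_path(content_url):
    """Return the path (plus query, if any) of a URL without scheme, host, or leading slash."""
    parts = urlsplit(content_url)
    path = parts.path.lstrip("/")
    return f"{path}?{parts.query}" if parts.query else path


print(relative_api_path("https://jira.companyname.com/secure/attachment/10001/report.png"))
```

The result would then replace the split-based link in the loop above, for example link = relative_api_path(attachment['content']).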
Get the attachment details: get the Jira attachment file URL, download the file using the requests module, then check that the file is in the file list.
import requests
issue = jira.issue(jira_ticket, expand='changelog')
attach = issue.fields.attachment
file_url = attach[0].content
file_path = "filename"
r = requests.get(file_url, auth=('jira_user', 'jira_pass'))
with open(file_path, 'wb') as f:
    f.write(r.content)

Restructure data loaded from dropbox file in python

I am trying to download data from a CSV file stored in a Dropbox folder. So far I do this:
import dropbox
#Get access to my dropbox folder
dbx = dropbox.Dropbox('SOME_ACCESS_TOKEN')
dbx.users_get_current_account()
#Download file
metadata, res = dbx.files_download('/Test.csv')
#Get the file content
data=res.content
print(data)
data has this form: b'1,2,3,4,5\r\nA,B,C,D,E\r\n1,2,3,4,5\r\nA,B,C,D,E\r\n1,2,3,4,5\r\nA,B,C,D,E\r\n'
Is there an easy way to restructure this into a list of lists?
The solution to the above-mentioned problem is:
import dropbox
# Connect to the Dropbox folder
dbx = dropbox.Dropbox('SOME_ACCESS_TOKEN')
dbx.users_get_current_account()
# Download the file
metadata, res = dbx.files_download('/Test.csv')
# Get and decode the data
data = res.content.decode('utf-8')
# Split the data into lines
lines = data.split('\r\n')
# Drop the empty string left after the trailing line break
lines.pop()
print(lines)
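The snippet above yields a list of line strings. To get the list of lists the question asks for, one option is the standard library's csv module, which also handles quoted fields. This is a minimal sketch: the function name csv_bytes_to_rows is hypothetical, and the sample bytes stand in for res.content.

```python
import csv
import io


def csv_bytes_to_rows(raw, encoding="utf-8"):
    """Decode CSV bytes and return the content as a list of rows (lists of strings)."""
    text = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.reader(io.StringIO(text, newline="")))


sample = b'1,2,3,4,5\r\nA,B,C,D,E\r\n'
print(csv_bytes_to_rows(sample))
# [['1', '2', '3', '4', '5'], ['A', 'B', 'C', 'D', 'E']]
```

With the Dropbox download this would be rows = csv_bytes_to_rows(res.content).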
