Download a PDF file (not restricted) from Google Drive through a URL - python-3.x

import os
import requests
def download_file(download_url: str, filename: str):
    """
    Download resume pdf file from storage
    @param download_url: URL of resume to be downloaded
    @type download_url: str
    @param filename: Name and location of file to be stored
    @type filename: str
    @return: None
    @rtype: None
    """
    file_request = requests.get(download_url)
    # The response body is written as-is, whatever the server returned
    with open(f'{filename}.pdf', 'wb+') as file:
        file.write(file_request.content)
cand_id = "101"
time_current = "801"
file_location = f"{cand_id}_{time_current}"
download_file("https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf", file_location)
cand_id = "201"
time_current = "901"
# Rebuild the target name so the second download is saved as 201_901.pdf
file_location = f"{cand_id}_{time_current}"
download_file("https://drive.google.com/file/d/0B1HXnM1lBuoqMzVhZjcwNTAtZWI5OS00ZDg3LWEyMzktNzZmYWY2Y2NhNWQx/view?hl=en&resourcekey=0-5DqnTtXPFvySMiWstuAYdA", file_location)
----------
The first file works perfectly fine (i.e. 101_801.pdf), but the second one (i.e. 201_901.pdf) cannot be opened in any PDF reader (Error: We can't open this file).
My understanding is that I am not correctly reading and writing the file from Drive, even though it is open to everyone. How can I read that file and write it to disk?
I could use the Google Drive API, but is there a better solution that does not require it?

I tried out the code and couldn't open the PDF file either. I suggest trying the gdown package. It is easy to use and can download even large files from Google Drive. I used it in my class to download .sql database files (roughly 20 GB) for my assignments.
If you want to build more on this code, you should probably check out the Drive API. It is a well-documented, fast API.
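A likely reason the second file will not open is that the /view link returns the Drive viewer's HTML page rather than the PDF bytes, so the saved file is not a PDF at all. A minimal sketch of a check that could run before writing to disk is below; looks_like_pdf is a hypothetical helper name, and the two payloads are made-up samples rather than real responses.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True when the payload starts with the PDF header marker."""
    # PDF files begin with "%PDF-"; leading whitespace is skipped defensively.
    return data.lstrip().startswith(b"%PDF-")


# Made-up sample payloads for illustration only.
pdf_payload = b"%PDF-1.4\n1 0 obj\n"
html_payload = b"<!DOCTYPE html><html><head></head></html>"

print(looks_like_pdf(pdf_payload))
print(looks_like_pdf(html_payload))
```

In download_file, the same check could be applied to file_request.content, raising an error instead of saving a file that no reader can open.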

I was able to find a solution using wget in Python. I am answering so that it can help someone in the future.
import os
import re
from datetime import datetime
import wget
def download_candidate_resume(email: str, resume_url: str):
    """
    This function is used to download resume from google drive and store on the local system
    @param email: candidate email
    @type email: str
    @param resume_url: url of resume on google drive
    @type resume_url: str
    """
    file_extension = "pdf"
    current_time = datetime.now()
    file_name = f'{email}_{int(current_time.timestamp())}.{file_extension}'
    temp_file_path = os.path.join(os.getcwd(), file_name)
    # Rewrite the sharing link into a direct-download link that carries the file id
    downloadable_resume_url = re.sub(
        r"https://drive\.google\.com/file/d/(.*?)/.*?\?usp=sharing",
        r"https://drive.google.com/uc?export=download&id=\1",
        resume_url,
    )
    wget.download(downloadable_resume_url, out=temp_file_path)
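The pattern above only matches links that end in ?usp=sharing, so the link from the question (which ends in ?hl=en&resourcekey=...) would pass through unchanged. A minimal sketch of a more tolerant rewrite is below. The helper name to_direct_download_url and the character class used for the file id are assumptions, and links that carry a resourcekey may need extra handling that this sketch does not attempt.

```python
import re
from typing import Optional

# Assumption: the file id is the path segment right after /file/d/
_FILE_ID_PATTERN = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def to_direct_download_url(share_url: str) -> Optional[str]:
    """Build a uc?export=download link from a Drive file link, or return None."""
    match = _FILE_ID_PATTERN.search(share_url)
    if match is None:
        return None
    return f"https://drive.google.com/uc?export=download&id={match.group(1)}"


share_url = (
    "https://drive.google.com/file/d/"
    "0B1HXnM1lBuoqMzVhZjcwNTAtZWI5OS00ZDg3LWEyMzktNzZmYWY2Y2NhNWQx"
    "/view?hl=en&resourcekey=0-5DqnTtXPFvySMiWstuAYdA"
)
print(to_direct_download_url(share_url))
```

Returning None for unrecognised links lets the caller decide whether to skip the download or fall back to the original URL.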

Related

Passing base64 .docx to docx.Document results in BadZipFile exception

I'm writing an Azure function in Python 3.9 that needs to accept a base64 string created from a known .docx file which will serve as a template. My code will decode the base64, pass it to a BytesIO instance, and pass that to docx.Document(). However, I'm receiving an exception BadZipFile: File is not a zip file.
Below is a slimmed down version of my code. It fails on document = Document(bytesIODoc). I'm beginning to think it's an encoding/decoding issue, but I don't know nearly enough about it to get to the solution.
from docx import Document
from io import BytesIO
import base64
class ParseBody():
    def __init__(self, body):
        self.template = str(body['template'])
        self.contents = body['data']
    def _decode_template(self):
        b64Doc = base64.b64decode(self.template)
        bytesIODoc = BytesIO(b64Doc)
        document = Document(bytesIODoc)
        # Return the document so run() can store it
        return document
    def run(self):
        self.document = self._decode_template()
var = {
    'template': 'Some_base64_from_docx_file',
    'data': {'some': 'data'}
}
run_stuff = ParseBody(body=var)
output = run_stuff.run()
I've also tried the following change to _decode_template and am getting the same exception. This is running base64.decodebytes() on the b64Doc object and passing that to BytesIO instead of directly passing b64Doc.
def _decode_template(self):
    b64Doc = base64.b64decode(self.template)
    bytesDoc = base64.decodebytes(b64Doc)
    bytesIODoc = BytesIO(bytesDoc)
I have successfully tried the following on the same exact .docx file to be sure that this is possible. I can open the document in Python, base64 encode it, decode into bytes, pass that to a BytesIO instance, and pass that to docx.Document successfully.
file = r'WordTemplate.docx'
with open(file, 'rb') as f:
    doc = f.read()
b64Doc = base64.b64encode(doc)
bytesDoc = base64.decodebytes(b64Doc)
bytesIODoc = BytesIO(bytesDoc)
newDoc = Document(bytesIODoc)
I've tried countless other solutions to no avail; they have only led me further from a resolution. This is the closest I've gotten. Any help is greatly appreciated!
The answer to the question "How to generate a DOCX in Python and save it in memory?" helped me resolve my own issue.
All I had to do was change document = Document(bytesIODoc) to the following:
document = Document()
document.save(bytesIODoc)
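For anyone debugging the same exception: BadZipFile indicates that the bytes reaching Document() are not a zip archive, which is what a .docx file is. A minimal sketch of a check that decodes the base64 text exactly once and verifies the result before handing it on is below. It uses only the standard library; decode_docx_payload is a hypothetical helper name, and the tiny in-memory archive merely stands in for a real .docx.

```python
import base64
import zipfile
from io import BytesIO


def decode_docx_payload(b64_text: str) -> BytesIO:
    """Decode base64 text once and confirm the result is a zip archive."""
    raw = base64.b64decode(b64_text)
    stream = BytesIO(raw)
    if not zipfile.is_zipfile(stream):
        raise ValueError("decoded payload is not a zip archive, so it cannot be a .docx")
    stream.seek(0)  # rewind so the next reader starts at the beginning
    return stream


# Stand-in for a real .docx: a tiny zip archive built in memory.
buffer = BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
encoded = base64.b64encode(buffer.getvalue()).decode("ascii")

stream = decode_docx_payload(encoded)
print(zipfile.is_zipfile(stream))
```

If the check raises, the problem lies in how the base64 string was produced or transported rather than in python-docx itself.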

Jira rest api get issue attachment via python

I need to download the attachments of a Jira issue using Python. I have the following code:
from atlassian import Jira
issue = jira.issue(issuekey, fields='summary,comment,attachment')
for attachment in issue['fields']['attachment']:
    with open((attachment.filename), 'wb') as file:
        file.write(attachment.get(b'', b''))
After running the code I get 3 empty files (txt, png, png) with no data inside.
How can I download the files from the issue to my current folder?
Try using expand="attachment".
Example:
issue = jira.issue(issuekey, expand="attachment")
for attachment in issue['fields']['attachment']:
    with open(attachment.filename, 'wb') as file:
        file.write(attachment.get())
You need the link to the contents of the attachment, which is stored under the key 'content'. Then use the .get() request provided by the Jira library:
for attachment in issue['fields']['attachment']:
    link = attachment['content']
    link = link.split("https://jira.companyname.com/")[1]
    b_str = jira.get(link, not_json_response=True)
    with open((attachment['filename']), 'wb') as file:
        file.write(b_str)
Note that you need to trim the link, because jira.get() automatically prepends the domain to the request URL.
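The trimming step can also be done without hard-coding the domain. A minimal sketch using the standard library is below. The helper name to_relative_path and the sample URL are hypothetical, and it assumes, like the snippet above, that Jira is served from the root of the domain.

```python
from urllib.parse import urlsplit


def to_relative_path(content_url: str) -> str:
    """Drop the scheme and domain, keeping the path and any query string."""
    parts = urlsplit(content_url)
    path = parts.path.lstrip("/")
    return f"{path}?{parts.query}" if parts.query else path


# Hypothetical attachment URL, for illustration only.
sample = "https://jira.companyname.com/secure/attachment/10001/report.txt"
print(to_relative_path(sample))
```

The result can then be passed to jira.get(..., not_json_response=True) in place of the manually split link.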
Get the attachment details:
1. Get the Jira attachment file URL.
2. Download the file using the requests module.
3. Check the file in the file list.
import requests
issue = jira.issue(jira_ticket, expand='changelog')
attach = issue.fields.attachment
file_url = attach[0].content
file_path = "filename"
r = requests.get(file_url, auth=('jira_user', 'jira_pass'))
with open(file_path, 'wb') as f:
    f.write(r.content)

Getting the absolute filename of file uploaded through Python Flask App

I am trying to create a flask app that can be used to upload any user selected file to my azure storage. For some reason, the mime-type of the uploaded file is always set to 'application/octet-stream'. If I directly upload the file to azure using its UI, then the mime-type is correct. To solve this problem, I am trying to manually calculate the mimetype of the file and pass it as metadata.
The issue I am having is that I am not able to figure out a way to get the absolute filepath of the user selected file to be uploaded.
What I am looking for is the absolute path: path/to/file/doctest2.txt
Here is what the Flask app looks like:
@app.route('/', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        file = request.files['file']
        filename = secure_filename(file.filename)
        fileextension = filename.rsplit('.',1)[1]
        Randomfilename = id_generator()
        filename = Randomfilename + '.' + fileextension
        try:
            blob_service.create_blob_from_stream(container, filename, file)
        except Exception as e:
            # Print the caught exception instance, not the Exception class
            print('Exception=' + str(e))
        ref = 'http://'+ account + '.blob.core.windows.net/' + container + '/' + filename
It seems we can get the file name using f.filename (f being the uploaded file object, named file in the snippet above), but I am not sure how to get the full path here.
The complete code can be found here:
https://github.com/codesagar/Azure-Blobs/blob/master/blob.py
The ultimate goal is to calculate the MIME type of the file to be uploaded.
I do have the file blob (variable f). Is there a better way to get the MIME type from the blob rather than hunting for the absolute file path?
I solved my problem by using the following line of code:
mime_type = f.content_type
This gives me the mimetype of the file and eliminates the need for getting the file's absolute path.
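If the reported content type ever comes back as the generic 'application/octet-stream', the type can also be guessed from the file name alone, which again avoids needing the absolute path. A minimal sketch using the standard-library mimetypes module is below; resolve_mime_type is a hypothetical helper, not part of Flask or the Azure SDK.

```python
import mimetypes

GENERIC_TYPE = "application/octet-stream"


def resolve_mime_type(filename: str, reported_type: str = "") -> str:
    """Prefer the reported type; otherwise guess from the file extension."""
    if reported_type and reported_type != GENERIC_TYPE:
        return reported_type
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or GENERIC_TYPE


print(resolve_mime_type("doctest2.txt"))
print(resolve_mime_type("photo.bin", "image/png"))
```

In the upload handler this could be called as resolve_mime_type(f.filename, f.content_type).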

How do I replace the contents of a Google Drive file using python?

I have a Google Drive file that has a specific file id that others rely on so they can open it. I need to replace the contents of this file (a CSV file) with new content within python. I can't seem to find any examples of how to do this, though. I'm using Google Drive API v3 (googleapiclient). Please help.
I managed to create a successful test. I created a file (test.txt) in Google Drive with some basic text in it. I obtained the file id by sharing a link, extracting the id from the link, and then unsharing the file again. This Python code successfully replaced the text in that file with the text in _textStream. My ultimate goal of replacing the CSV information is achieved similarly, except that the BytesIO object will hold the CSV data and the _mimeType will be 'text/csv':
from oauth2client import file, client, tools
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseUpload
from httplib2 import Http
from io import BytesIO
def gauthenticate():
    # If modifying these scopes, delete the file token.json.
    SCOPES = 'https://www.googleapis.com/auth/drive'
    store = file.Storage('token.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('credentials.json', SCOPES)
        creds = tools.run_flow(flow, store)
    service = build('drive', 'v3', http=creds.authorize(Http()))
    return service
def SaveTxtGDrive(service):
    # Fill in your own file id here:
    fileId = '1kEyXXXXXXXX_gEd6lQq'
    _mimeType = 'text/plain'
    _textStream = BytesIO(b'Here is some arbitrary data\nnew text\nAnother line\n')
    _media = MediaIoBaseUpload(_textStream, mimetype=_mimeType,
                               chunksize=1024*1024, resumable=True)
    # Updating by file id replaces the content while keeping the same id
    _updatedFile = service.files().update(fileId=fileId,
                                          media_body=_media).execute()
def main():
    _service = gauthenticate()
    SaveTxtGDrive(_service)
if __name__ == '__main__':
    main()
The Google API documentation sure is opaque, and v3 has few examples in Python! Obviously, the Drive API credentials (credentials.json in the code above) must be set up and the API enabled for Google Drive for this to work. I simply followed the error messages that popped up to enable all of that. I hope this helps someone else.
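For the CSV case mentioned above, the BytesIO object can be built in memory with the standard library. A minimal sketch is below; rows_to_csv_stream is a hypothetical helper, the sample rows are made up, and UTF-8 encoding is an assumption.

```python
import csv
from io import BytesIO, StringIO


def rows_to_csv_stream(rows) -> BytesIO:
    """Serialise rows to CSV text and wrap the UTF-8 bytes in a BytesIO."""
    text_buffer = StringIO(newline="")
    writer = csv.writer(text_buffer)
    writer.writerows(rows)
    return BytesIO(text_buffer.getvalue().encode("utf-8"))


# Made-up sample rows for illustration only.
stream = rows_to_csv_stream([["Name", "Age"], ["tom", 10], ["nick", 15]])
print(stream.getvalue())
```

The returned stream would take the place of _textStream, with _mimeType set to 'text/csv'.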

tmp file in Google cloud Functions for Python

Python runs like a charm on Google Cloud Functions, except for temp files. Here's my simplified code:
FILE_PATH = "{}/report.pdf".format(tempfile.gettempdir())
pdf.output(FILE_PATH)
...
# The with block closes the file, so no explicit close() is needed
with open(FILE_PATH,'rb') as f:
    data = f.read()
encoded = base64.b64encode(data).decode()
attachment = Attachment()
attachment.content = str(encoded)
attachment.type = "application/pdf"
attachment.filename = "report"
attachment.disposition = "attachment"
attachment.content_id = "Report"
mail = Mail(from_email, subject, to_email, content)
mail.add_attachment(attachment)
The error is: [Errno 2] No such file or directory: '/tmp/report.pdf'
It works perfectly fine locally. Unfortunately, the docs only show the Node version. Workarounds for sending that PDF would also be fine.
It is a little difficult to find official Google documentation on writing to the temporary folder. In my case, I needed to write a file to a temporary directory and upload it to Google Cloud Storage using GCF.
Note that writing to the temporary directory of Google Cloud Functions consumes the memory provisioned for the function.
After creating and using the file, it is recommended to remove it from the temporary directory. I used this code snippet to write a CSV into a temp dir in GCF (Python 3.7):
import pandas as pd
import os
import tempfile
from werkzeug.utils import secure_filename
def get_file_path(filename):
    file_name = secure_filename(filename)
    return os.path.join(tempfile.gettempdir(), file_name)
def write_temp_dir():
    data = [['tom', 10], ['nick', 15]]
    df = pd.DataFrame(data, columns = ['Name', 'Age'])
    name = 'example.csv'
    path_name = get_file_path(name)
    df.to_csv(path_name, index=False)
    os.remove(path_name)
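The same write, use, remove pattern can be applied to the PDF attachment from the question. A minimal sketch using only the standard library is below. encode_via_temp_file is a hypothetical helper, and the payload bytes stand in for whatever the PDF library writes. The try/finally removes the temp file even if reading fails.

```python
import base64
import os
import tempfile


def encode_via_temp_file(payload: bytes, filename: str = "report.pdf") -> str:
    """Write payload to the temp dir, read it back as base64, then clean up."""
    path = os.path.join(tempfile.gettempdir(), filename)
    try:
        with open(path, "wb") as handle:
            handle.write(payload)
        with open(path, "rb") as handle:
            data = handle.read()
    finally:
        # Free the space again; on GCF the temp dir counts against memory.
        if os.path.exists(path):
            os.remove(path)
    return base64.b64encode(data).decode("ascii")


# Placeholder bytes standing in for a generated PDF.
encoded = encode_via_temp_file(b"%PDF-1.4 sample", "gr_sketch_report.pdf")
print(len(encoded))
```

The returned string could then be assigned to attachment.content as in the question's code.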
