Restructure data loaded from dropbox file in python - python-3.x

I am trying to download data from a CSV file stored in a dropbox folder. So far I do it like this:
import dropbox
#Get access to my dropbox folder
dbx = dropbox.Dropbox('SOME_ACCESS_TOKEN')
dbx.users_get_current_account()
#Download file
metadata, res = dbx.files_download('/Test.csv')
#Get the file content
data=res.content
print(data)
data is of this form: b'1,2,3,4,5\r\nA,B,C,D,E\r\n1,2,3,4,5\r\nA,B,C,D,E\r\n1,2,3,4,5\r\nA,B,C,D,E\r\n'
Is there an easy way to restructure this into a list of lists?

The solution to the above-mentioned problem is:
import dropbox
#Connect to dropbox folder
dbx = dropbox.Dropbox('SOME_ACCESS_TOKEN')
dbx.users_get_current_account()
#Download file and get metadata
metadata, res = dbx.files_download('/Test.csv')
#Get and decode data
data=res.content.decode('utf-8')
#Restructure data
lines = data.split('\r\n')
lines.pop()
print(lines)
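To get all the way to a list of lists, as the question asks, each remaining line can be split on the comma, or the csv module can parse the decoded string directly. This is a minimal sketch assuming the fields stay comma-separated:
import csv
import io
#Simple case: split each line on the delimiter
rows = [line.split(',') for line in lines]
#Safer alternative if quoting or embedded commas are possible
rows = list(csv.reader(io.StringIO(data)))
print(rows) #[['1', '2', '3', '4', '5'], ['A', 'B', 'C', 'D', 'E'], ...]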

Related

How to send a pdf object from Databricks to Sharepoint?

INTRO: I have a Databricks notebook where I create a pdf file based on some data.
In order to generate the file I am using the fpdf library:
from fpdf import FPDF, HTMLMixin
Thanks to the library I generate a pdf file which is of type: <__main__.HTML2PDF at 0x7f3b73720fd0>.
My goal now is to send this pdf to a sharepoint folder. To do so I am using the following lines of code:
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
# paths
sharepoint_site = "MySharepointSite"
sharepoint_folder = "Shared Documents/General/PDFs/"
sharepoint_user = "aaa@bbb.onmicrosoft.com"
sharepoint_user_pw = "xyz"
sharepoint_folder = sharepoint_folder.strip("/")
# set environment variables
SITE_URL = f"https://sharepoint.com/sites/{sharepoint_site}"
RELATIVE_URL = f"/sites/{sharepoint_site}/{sharepoint_folder}"
# connect to sharepoint
ctx = ClientContext(SITE_URL).with_credentials(UserCredential(sharepoint_user, sharepoint_user_pw))
web = ctx.web
ctx.load(web).execute_query()
# Generate PDF
pdf = generate_pdf(ctx, row['ServerRelativeUrl'])
# HERE IS MY ISSUE!
ctx.web.get_folder_by_server_relative_url(sharepoint_folder).upload_file('test.pdf', pdf).execute_query()
PROBLEM: When I reach the last row I get the following error message:
TypeError: Object of type HTML2PDF is not JSON serializable
I believe that pdf objects cannot be serialized to JSON, and therefore I am stuck and do not know how to send the PDF to SharePoint.
QUESTION: Would you be able to suggest a smart and elegant way to achieve my goal, i.e. sending the pdf file to SharePoint?
I was able to solve this problem by saving the pdf as a string, then encoding it and finally pushing it to SharePoint:
pdf_binary = pdf.output(dest='S').encode("latin1")
ctx.web.get_folder_by_server_relative_url(sharepoint_folder).upload_file("test.pdf", pdf_binary).execute_query()
Note: If it does not work, try to change the encoding type.
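If the in-memory approach still fails, a fallback sketch (assuming a writable temp directory and the same office365 client objects as above) is to let fpdf write the PDF to a temporary file and upload its raw bytes:
import os
import tempfile
#Write the PDF to a temp file, then read it back as raw bytes
tmp_path = os.path.join(tempfile.gettempdir(), "test.pdf")
pdf.output(tmp_path)
with open(tmp_path, "rb") as fh:
    pdf_bytes = fh.read()
#Upload the bytes and clean up the temp file
ctx.web.get_folder_by_server_relative_url(sharepoint_folder).upload_file("test.pdf", pdf_bytes).execute_query()
os.remove(tmp_path)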

How to download an image from the internet using google colab jupyter

I need to download an image using a url. I managed to obtain the urls of the images I need to download, but now I'm lost on how to download them to my local computer. I'm using Google Colab / Jupyter. Thank you!
here's my code so far:
from bs4 import BeautifulSoup
import requests
import json
import urllib.request
#use Globe API to get data
#input userid - plan: have program read userids from csv or excel file
userid = xxxxxxxx
#use Globe API to get data
source = requests.get('https://api.globe.gov/search/v1/measurement/protocol/measureddate/userid/?protocols=land_covers&startdate=2020-05-04&enddate=2020-07-16&userid=' + str(userid) +'&geojson=FALSE&sample=FALSE').text
#set up BeautifulSoup4
soup = BeautifulSoup(source, 'lxml')
#Isolate the Json data and put it into a string called "paragraph"
body = soup.find('body')
paragraph = body.p.text
#load the string into a python object
data = json.loads(paragraph)
#pick out the needed information and store them
for landcover in data['results']:
    siteId = landcover['siteId']
    measuredDate = landcover['measuredDate']
    latitude = landcover['latitude']
    longitude = landcover['longitude']
    protocol = landcover['protocol']
    DownURL = landcover['data']['landcoversDownwardPhotoUrl']
    #Here is where I want to download the url contained in 'DownURL'
Try:
from google.colab import files as FILE
import os
img_data = requests.get(DownURL).content
with open('image_name.jpg', 'wb') as handler:
    handler.write(img_data)
FILE.download('image_name.jpg')
os.remove('image_name.jpg') # to save up space
You can use a random file name, or a counter variable that increments on each loop iteration, if you do not wish to set an image name by hand.
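For example, a counter-based file name could look like the following sketch (assuming the download happens inside the loop over data['results'] shown in the question):
from google.colab import files as FILE
import os
import requests
counter = 0
for landcover in data['results']:
    DownURL = landcover['data']['landcoversDownwardPhotoUrl']
    img_data = requests.get(DownURL).content
    filename = 'image_' + str(counter) + '.jpg' #unique name per iteration
    with open(filename, 'wb') as handler:
        handler.write(img_data)
    FILE.download(filename) #trigger the browser download in Colab
    os.remove(filename) #free up space on the Colab VM
    counter += 1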

Jira rest api get issue attachment via python

So, I need to download attachments from an issue in Jira using Python. I have the following code:
from atlassian import Jira
issue = jira.issue(issuekey, fields='summary,comment,attachment')
for attachment in issue['fields']['attachment']:
    with open((attachment.filename), 'wb') as file:
        file.write(attachment.get(b'', b''))
After running the code I'm getting 3 empty files (txt, png, png) without any data inside.
How can I get(download) files from issue to my current folder?
Try using expand="attachment"
Ex:
issue = jira.issue(issuekey, expand="attachment")
for attachment in issue['fields']['attachment']:
    with open(attachment.filename, 'wb') as file:
        file.write(attachment.get())
You need the link to the contents of the attachment, which is stored under the key 'content'. Then just use the .get() request that is in the Jira library:
for attachment in issue['fields']['attachment']:
    link = attachment['content']
    link = link.split("https://jira.companyname.com/")[1]
    b_str = jira.get(link, not_json_response=True)
    with open((attachment['filename']), 'wb') as file:
        file.write(b_str)
Notice that you need to trim the link, because jira.get() automatically prepends the domain to the request URL.
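A slightly more robust way to trim the link is to parse the URL instead of splitting on a hard-coded host name; this is a sketch assuming the same jira client and issue as above:
from urllib.parse import urlparse
for attachment in issue['fields']['attachment']:
    #Keep only the path part of the content URL, without the leading slash
    relative_link = urlparse(attachment['content']).path.lstrip('/')
    b_str = jira.get(relative_link, not_json_response=True)
    with open(attachment['filename'], 'wb') as file:
        file.write(b_str)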
Get the attachment details:
- get the Jira attachment file URL
- download the file using the requests module
- check the file in the file list
import requests

issue = jira.issue(jira_ticket, expand='changelog')
attach = issue.fields.attachment
file_url = attach[0].content
file_path = "filename"
r = requests.get(file_url, auth=('jira_user', 'jira_pass'))
with open(file_path, 'wb') as f:
    f.write(r.content)

How to upload only the filename (not the entire directory name + filename) to a google storage bucket

I have found a way to upload multiple csv files at once into a Google Cloud Storage bucket that satisfy certain criteria. The problem I have is that when they all upload to the storage bucket, the entire path name of the file is uploaded with it. I want to upload only the actual file name.
I have tried using os.path.basename but it doesn't work. Is there any other way to obtain just the basename before it gets uploaded, OR is there a way simply to rename the file before it gets uploaded?
import glob
import os
from pathlib import Path
from os import listdir
from google.cloud import storage
GOOGLE_APPLICATION_CREDENTIALS = "O:\My Creds\creds.json"
for file in glob.glob("O:\Team Drives\AU_A\Raw_Dauts\Dynamets\**\*.csv", recursive = True):
filename = os.path.basename(file) # thought this would work but doesn't
storage_client = storage.Client.from_service_account_json(GOOGLE_APPLICATION_CREDENTIALS)
bucket = storage_client.get_bucket('bukcetang81')
blob = bucket.blob("Dynamic_datasets/" +filename)
blob.upload_from_filename(filename)
I'd suggest separating the destination blob name from the local source path: use the bare file name (plus the bucket folder prefix) as the blob name, but pass the full local path to upload_from_filename so the library can find the file on disk:
for file in glob.glob("O:\Team Drives\AU_A\Raw_Dauts\Dynamets\**\*.csv", recursive = True):
    storage_client = storage.Client.from_service_account_json(GOOGLE_APPLICATION_CREDENTIALS)
    bucket = storage_client.get_bucket('bukcetang81')
    destination_filename = "Dynamic_datasets/" + os.path.basename(file) # blob name inside the bucket
    blob = bucket.blob(destination_filename)
    blob.upload_from_filename(file) # full local path of the source file
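A further refinement (a sketch assuming the same credentials file and bucket name) is to create the storage client and bucket handle once, outside the loop, instead of on every iteration:
import glob
import os
from google.cloud import storage
GOOGLE_APPLICATION_CREDENTIALS = r"O:\My Creds\creds.json"
#Create the client and bucket handle once
storage_client = storage.Client.from_service_account_json(GOOGLE_APPLICATION_CREDENTIALS)
bucket = storage_client.get_bucket('bukcetang81')
for file in glob.glob(r"O:\Team Drives\AU_A\Raw_Dauts\Dynamets\**\*.csv", recursive=True):
    #Blob name: folder prefix in the bucket plus the bare file name only
    blob = bucket.blob("Dynamic_datasets/" + os.path.basename(file))
    #Source: the full local path, so the library can locate the file on disk
    blob.upload_from_filename(file)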

tmp file in Google cloud Functions for Python

Python runs like a charm on Google Cloud Functions, except for the tmp files. Here's my simplified code:
FILE_PATH = "{}/report.pdf".format(tempfile.gettempdir())
pdf.output(FILE_PATH)
...
with open(FILE_PATH,'rb') as f:
    data = f.read()
    f.close()
encoded = base64.b64encode(data).decode()
attachment = Attachment()
attachment.content = str(encoded)
attachment.type = "application/pdf"
attachment.filename = "report"
attachment.disposition = "attachment"
attachment.content_id = "Report"
mail = Mail(from_email, subject, to_email, content)
mail.add_attachment(attachment)
Error is: [Errno 2] No such file or directory: '/tmp/report.pdf'
It works perfectly fine locally. The docs unfortunately only show the Node version. Workarounds for sending that PDF would also be fine.
It is a little difficult to find official Google documentation for writing to the temporary folder. In my case, I needed to write to a temporary directory and upload the file to Google Cloud Storage using GCF.
Writing to the temporary directory of Google Cloud Functions consumes memory resources provisioned for the function.
After creating the file and using it, it is recommended to remove it from the temporary directory. I used this code snippet to write a CSV into a temp dir in GCF (Python 3.7):
import pandas as pd
import os
import tempfile
from werkzeug.utils import secure_filename
def get_file_path(filename):
    file_name = secure_filename(filename)
    return os.path.join(tempfile.gettempdir(), file_name)

def write_temp_dir():
    data = [['tom', 10], ['nick', 15]]
    df = pd.DataFrame(data, columns = ['Name', 'Age'])
    name = 'example.csv'
    path_name = get_file_path(name)
    df.to_csv(path_name, index=False)
    os.remove(path_name)
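Applied back to the original PDF case, a minimal sketch (assuming the same fpdf pdf object and enough memory provisioned for the function) would be:
import base64
import os
import tempfile
#Write the PDF into the in-memory /tmp directory
pdf_path = os.path.join(tempfile.gettempdir(), "report.pdf")
pdf.output(pdf_path)
#Read it back, base64-encode it for the email attachment, then clean up
with open(pdf_path, "rb") as f:
    encoded = base64.b64encode(f.read()).decode()
os.remove(pdf_path)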
