How to access an Excel file on Onedrive with Openpyxl - python-3.x

I want to open an Excel file (on OneDrive) with openpyxl (Python). I get an error when I try this:
from openpyxl import load_workbook

file = r"https://d.docs.live.net/dd10xxxxxxxxxx"
wb = load_workbook(filename=file)
which fails with:
self.fp = io.open(file, filemode)
OSError: [Errno 22] Invalid argument: 'https://d.docs.live.net/dd10...

openpyxl cannot read or write files over HTTP. It expects a file on a traditional filesystem, whether local, on a network share, etc.
If you're using OneDrive for Business you could try mapping it to a drive letter, or investigate using Google Sheets with the gspread library instead.
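If you can obtain a direct-download URL for the file, another workaround is to fetch the bytes yourself and hand openpyxl a file-like object, since load_workbook accepts anything with a read() method. A minimal sketch (the requests call is commented out and its URL is a placeholder; here the workbook bytes are built in memory just to show the round trip):

```python
import io

from openpyxl import Workbook, load_workbook

# With a real direct-download link you would do something like:
#   import requests
#   r = requests.get("https://.../direct-download-link")  # placeholder URL
#   buf = io.BytesIO(r.content)
# Here we build the bytes in memory instead, to show that
# load_workbook happily reads from any file-like object.
wb = Workbook()
wb.active.title = "Data"
wb.active["A1"] = 42

buf = io.BytesIO()
wb.save(buf)
buf.seek(0)

wb2 = load_workbook(buf)
print(wb2.sheetnames)            # ['Data']
print(wb2["Data"]["A1"].value)   # 42
```

Note this only works for links that return the raw file bytes; a OneDrive "share" page URL returns HTML, not the workbook.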

An alternative is to use Google Drive rather than OneDrive: open Google Colab, mount the drive, and open the file from there.
from google.colab import drive
drive.mount('/content/gdrive/')

Related

Python: Access a zipped XL file without extracting it

Is there a way I can open and process the Excel file within a zip file without first extracting it? I am not interested in modifying it.
from zipfile import ZipFile
from openpyxl import load_workbook

procFile = "C:\\Temp2\\XLFile-Demo-PW123.zip"
xl_file = "XLFile-Demo.xlsx"
myzip = ZipFile(procFile)
myzip.setpassword(bytes('123', 'utf-8'))
# line below returns an error
with load_workbook(myzip.open(xl_file)) as wb_obj:
    print(wb_obj.sheetnames)
Most of the examples I have found only open text files directly.
I would like to simulate the behaviour of archiving programs such as WinRAR and 7-Zip.
Thanks
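In case it helps: one source of the error above is that load_workbook does not return a context manager, so it cannot be used in a `with` statement like that. Reading the archive member into a BytesIO and passing that to load_workbook works without extracting anything to disk. A sketch, with the zip built in memory to stand in for the real archive (file names here are illustrative):

```python
import io
from zipfile import ZipFile

from openpyxl import Workbook, load_workbook

# Build a zip containing an xlsx purely in memory, standing in
# for the real archive on disk.
inner = io.BytesIO()
wb = Workbook()
wb.active["A1"] = "hello"
wb.save(inner)

outer = io.BytesIO()
with ZipFile(outer, "w") as zf:
    zf.writestr("XLFile-Demo.xlsx", inner.getvalue())

# Open the workbook inside the zip without extracting it:
with ZipFile(outer) as zf:
    data = zf.read("XLFile-Demo.xlsx")  # for an encrypted zip: zf.read(name, pwd=b'123')
wb_obj = load_workbook(io.BytesIO(data))
print(wb_obj.sheetnames)
```

One caveat: Python's zipfile module only decrypts the legacy ZipCrypto scheme; archives encrypted with AES (the 7-Zip default for zips) will not open this way.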

Save my file on a shared drive google colab

We are working as a team on a shared drive, using Google Colab to run our code.
Here is the path: /content/drive/Shared drives/Projet_IE/Technique/BDD
We want to save a .json file to our drive, but it fails because of the space in "Shared drives".
How can we make this path understandable to the code, since "Shared drives" is imposed by Google Drive and the code doesn't handle the space in the path?
Thanks!
from google.colab import drive
import json

drive.mount('/content/gdrive/')
!mkdir -p /content/gdrive/My\ Drive/test
a = {'a': 1, 'b': 2, 'c': 3}
with open("/content/gdrive/My Drive/test/your_json_file", "w") as fp:
    json.dump(a, fp)
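The space itself is not a problem for Python's file APIs, only for unquoted shell commands (hence the backslash in the `!mkdir` line). A quick self-contained check, using a temporary directory in place of the Drive mount:

```python
import json
import os
import tempfile

# Directory names with spaces work fine through Python's file APIs;
# only shell commands need the space escaped or quoted.
root = tempfile.mkdtemp()
target = os.path.join(root, "Shared drives", "Projet_IE")
os.makedirs(target, exist_ok=True)

path = os.path.join(target, "data.json")
with open(path, "w") as fp:
    json.dump({"a": 1, "b": 2}, fp)

with open(path) as fp:
    print(json.load(fp))  # {'a': 1, 'b': 2}
```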

Camelot-py does not work in loops but works for an individual file

I am currently working on an automation project for a company, and one of the tasks requires that I loop through a directory and convert all the PDF files into CSV files. I am using the camelot-py library (which has worked better than the others I have tried). The code below works just fine when applied to a single file; however, when I make it loop through all PDF files in the directory, I get the following error:
"OSError: [Errno 22] Invalid argument"
import camelot
import csv
import pandas as pd
import os

directoryPath = r'Z:\testDirectory'
os.chdir(directoryPath)
print(os.listdir())
folderList = os.listdir(directoryPath)
for folders, sub_folders, file in os.walk(directoryPath):
    for name in file:
        if name.endswith(".pdf"):
            filename = os.path.join(folders, name)
            print(filename)
            print(name)
            tables = camelot.read_pdf(filename, flavor='stream', columns=['72,73,150,327,442,520,566,606,683'])
            tables = tables[0].df
            print(tables[0].parsing_report)
            tables.to_csv('foo2.csv')
I expect all files to be converted to '.csv' files but I get the error 'OSError: [Errno 22] Invalid argument'. My error appears to be from line 16.
I don't know if you have the same problem, but in my case I made the really simple mistake of not putting the files in the correct directory. I was getting the same error, but once I found the problem the script worked within a regular for loop.
Instead of the to_* methods I am using the bulk export to write the results to SQL, but that should not make a difference.
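Another thing worth checking in the loop above: `to_csv('foo2.csv')` writes every PDF's tables to the same file, so each iteration overwrites the last. Deriving the output name from the input path avoids that. A sketch of just the path handling (the helper name is hypothetical; the camelot call would stay as in the question):

```python
import os

def csv_name_for(pdf_path):
    """Derive a per-file CSV path next to the source PDF (sketch)."""
    base, _ = os.path.splitext(pdf_path)
    return base + ".csv"

# Each PDF gets its own CSV instead of everything landing in foo2.csv:
print(csv_name_for("Z:/testDirectory/report1.pdf"))  # Z:/testDirectory/report1.csv
```

Inside the loop you would then write `tables.to_csv(csv_name_for(filename))`.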

How to upload Cloud files into Python

I use pd.ExcelFile to load Excel files into a pandas DataFrame when the files are on my local drive.
How can I do the same if the Excel file is in Google Drive or Microsoft OneDrive and I want to connect to it remotely?
You can use read_csv() on an in-memory text buffer:
from io import StringIO  # StringIO moved to the io module in Python 3
import pandas as pd
import requests

r = requests.get('Your google drive link')  # must be a direct-download link
df = pd.read_csv(StringIO(r.text))  # r.content is bytes; r.text is the decoded text
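One caveat: a normal Google Drive "share" link returns an HTML viewer page, not the file itself. A commonly used pattern (an assumption about Drive's URL scheme, not an official API) is to pull the file id out of the share link and build a `uc?export=download` URL:

```python
import re

def direct_download_url(share_link):
    """Turn a Drive share link into a direct-download URL (heuristic)."""
    m = re.search(r"/d/([\w-]+)", share_link)
    if not m:
        raise ValueError("no file id found in link")
    return "https://drive.google.com/uc?export=download&id=" + m.group(1)

# The link below is a made-up example of the usual share-link shape:
link = "https://drive.google.com/file/d/1AbC_dEf-123/view?usp=sharing"
print(direct_download_url(link))
# https://drive.google.com/uc?export=download&id=1AbC_dEf-123
```

The resulting URL can then be passed to requests.get as in the snippet above.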

Writing Pandas DataFrames to Google sheets: no such file or directory .oauth/drive.json

I've been trying to find a way to read and write data between pandas and Google Sheets for a while now. I found the library df2gspread, which seems perfect for the job, and I've spent a while trying to get it to work.
As instructed, I used the Google API console to create my client secrets file and saved it as ~/.gdrive_private. Now, I'm trying to download the contents of a Google spreadsheet as follows:
workbook = [local filepath to workbook in Google Drive folder]
df = g2d.download(workbook, 'Sheet1', col_names = True, row_names = True)
When I run this, it successfully opens a browser window asking me to give my app access to my Google Sheets. However, when I click Allow, an IPython error comes up:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/samlilienfeld/.oauth/drive.json'
What is this file supposed to contain? I've tried creating the folder and including my client secrets again there as drive.json, but this does not work.
I worked around this for the time being by passing a pre-authenticated credential file to the g2d call.
I made a gist here (for Python 2.x, but it should work for 3.x) that saves the credential file: you pass it the secret file (basically ~/.gdrive_private) and the filename under which to save the resulting authenticated credentials.
Use the gist in a standalone script with appropriate filenames and run it from a terminal console. A browser window will open to perform the OAuth authentication via Google and should give you a token which you can copy-paste into the terminal prompt. Here's a quick example:
from gdrive_creds import create_creds
# Copy Paste whatever shows up in the browser in the console.
create_creds('./.gdrive_private', './authenticated_creds')
You can then use the file to authenticate for df2gspread calls.
Once you create the cred file using the gist method, try something like this to get access to your GDrive:
from oauth2client.file import Storage
from df2gspread import gspread2df as g2d
# Read the cred file
creds = Storage('./authenticated_creds').get()
# Pass it to g2d (trimmed for brevity)
workbook = [local filepath to workbook in Google Drive folder]
df = g2d.download(workbook, 'Sheet1', col_names = True, credentials=creds)
df.head()
This worked for me.
Here are the two approaches that work as of 2019:
1. DataFrame data to a Google Sheet:
# Import libraries
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
# Connection to Google Sheets
import gspread
from oauth2client.service_account import ServiceAccountCredentials
# From DataFrame to Google Sheet
from df2gspread import df2gspread as d2g

# Configure the connection
scope = ['https://spreadsheets.google.com/feeds']
# Add the JSON file you downloaded from Google Cloud to your working directory.
# The JSON file in this case is called 'service_account_gs.json'; rename it as you wish.
credentials = ServiceAccountCredentials.from_json_keyfile_name('service_account_gs.json',
                                                               scope)
# Authorise your notebook with the credentials provided above
gc = gspread.authorize(credentials)
# The spreadsheet ID; you see it in the URL path of your Google Sheet
spreadsheet_key = '1yr6LwGQzdNnaonn....'
# Create the dataframe within your notebook
df = pd.DataFrame({'number': [1, 2, 3], 'letter': ['a', 'b', 'c']})
# Set the sheet name to upload to and the start cell where the uploaded data begins
wks_name = 'Sheet1'
cell_of_start_df = 'A1'
# Upload the dataframe
d2g.upload(df,
           spreadsheet_key,
           wks_name,
           credentials=credentials,
           col_names=True,
           row_names=False,
           start_cell=cell_of_start_df,
           clean=False)
print('Successfully updated')
2. Google Sheet to DataFrame:
from df2gspread import gspread2df as g2d

df = g2d.download(gfile='1yr6LwGQzdNnaonn....',
                  credentials=credentials,
                  col_names=True,
                  row_names=False)
df
It seems this issue arises because the /Users/***/.oauth folder isn't created automatically by the oauth2client package (e.g. this issue). One possible solution is to create the folder manually, or you can update df2gspread; the issue should be fixed in the latest version.
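Creating the folder manually is a one-liner; a sketch (the path mirrors the one in the error message above):

```python
import os

# Create the missing ~/.oauth directory that oauth2client expects;
# exist_ok makes this safe to run repeatedly.
oauth_dir = os.path.join(os.path.expanduser("~"), ".oauth")
os.makedirs(oauth_dir, exist_ok=True)
print(os.path.isdir(oauth_dir))  # True
```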