How to search for Tensorflow Files in Google Drive? - python-3.x

I'm following the docs here: https://colab.research.google.com/github/google/earthengine-api/blob/master/python/examples/ipynb/TF_demo1_keras.ipynb#scrollTo=43-c0JNFI_m6 to learn how to use TensorFlow with GEE (Google Earth Engine). One part of this tutorial checks for the existence of exported files. In the docs, the example code is:
fileNameSuffix = '.tfrecord.gz'
trainFilePath = 'gs://' + outputBucket + '/' + trainFilePrefix + fileNameSuffix
testFilePath = 'gs://' + outputBucket + '/' + testFilePrefix + fileNameSuffix
print('Found training file.' if tf.gfile.Exists(trainFilePath)
    else 'No training file found.')
print('Found testing file.' if tf.gfile.Exists(testFilePath)
    else 'No testing file found.')
In my case, I'm just exporting the files to Google Drive instead of a Google Cloud Storage bucket. How would I change trainFilePath and testFilePath to point to the Google Drive folder? FWIW, when I go into the Google Drive folder, I do actually see the files.

I would say you could use the Google Drive API to list files in your Google Drive instead of a GCS bucket. You can find the documentation here.
You can also use PyDrive, which is pretty easy to understand. This is an example; you only have to adjust the query "q" to your needs:
from pydrive.drive import GoogleDrive
from pydrive.auth import GoogleAuth
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file in file_list:
    print(f"title: {file['title']}, id: {file['id']}")

Solution
You can use the great PyDrive library to access your Drive files easily from Google Colab and thus check which files you have or which files have been exported, etc.
The following piece of code is an example that lists all the files in the root directory of your Google Drive. It was found in this answer (yes, I am making this answer a community wiki post):
# Install the library
!pip install -U -q PyDrive

# Import the rest of the services/libraries needed
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Choose a local (Colab) directory to store the data.
local_download_path = os.path.expanduser('~/data')
try:
    os.makedirs(local_download_path)
except OSError:
    pass

# 2. Auto-iterate using the query syntax; here the main directory of Drive is used, i.e. 'root'.
# https://developers.google.com/drive/v2/web/search-parameters
file_list = drive.ListFile(
    {'q': "'root' in parents"}).GetList()
for f in file_list:
    # 3. Print the name and id of the files
    print('title: %s, id: %s' % (f['title'], f['id']))
NOTE: when you do this, Colab will take you to another page to authenticate and ask you to paste in the secret key. Just follow what the service tells you to do; it is pretty straightforward.
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)

Related

How to upload downloaded Telegram media directly to Google Drive?

I'm working with the Telethon download_media method for downloading images and videos. It is working fine (as expected). Now, I want to directly upload the downloaded media to my Google Drive folder.
Sample code looks something like:
from telethon import TelegramClient, events, sync
from telethon.tl.types import PeerUser, PeerChat, PeerChannel
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
drive = GoogleDrive(gauth)
gfile = drive.CreateFile({'parents': [{'id': 'drive_directory_path'}]})

api_id = #####
api_hash = ##########

c = client.get_entity(PeerChannel(1234567))  # some random channel id
for m in client.iter_messages(c):
    if m.photo:
        # below is the one way and it works
        # m.download_media("Media/")
        # I want to try something like this - below code
        gfile.SetContentFile(m.media)
        gfile.Upload()
This code is not working. How can I define the Google Drive object for download_media?
Thanks in advance. Kindly assist!
The main problem is that, according to PyDrive's documentation, SetContentFile() expects a string with the file's local path, and then it just uses open(), so you're meant to use this with local files. In your code you're trying to feed it the media object, so it won't work.
To upload a bytes file with PyDrive you'll need to convert it to BytesIO and assign it as the content. An example with a local file would look like this:
import io

drive = GoogleDrive(gauth)
file = drive.CreateFile({'mimeType': 'image/jpeg', 'title': 'example.jpg'})
filebytes = open('example.jpg', 'rb').read()
file.content = io.BytesIO(filebytes)
file.Upload()
Normally you don't need to do it this way, because SetContentFile() does the opening and conversion for you, but this should give you the idea: if you have the media file as bytes, you can just wrap it, assign it to file.content, and then upload it.
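For comparison (not part of the original answer), the usual local-file flow with SetContentFile() looks like this, where 'example.jpg' is just a placeholder filename:
# Usual PyDrive flow for a file that already exists on local disk.
file = drive.CreateFile({'title': 'example.jpg'})
file.SetContentFile('example.jpg')  # opens the local file and sets the MIME type for you
file.Upload()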
Now, if you look at the Telethon documentation, you will see that download_media() takes a file argument which you can set to bytes:
file (str | file, optional):
The output file path, directory, or stream-like object. If the path exists and is a file, it will be overwritten. If file is the type bytes, it will be downloaded in-memory as a bytestring (e.g. file=bytes).
So you should be able to call m.download_media(file=bytes) to get a bytes object. Looking at the Telethon source code, it appears this returns the media as an in-memory bytestring, which you can wrap in BytesIO. With this in mind, you can try the following change in your loop:
for m in client.iter_messages(c):
    if m.photo:
        gfile.content = io.BytesIO(m.download_media(file=bytes))
        gfile.Upload()
Note that I only tested the PyDrive side since I currently don't have access to the Telegram API, but looking at the docs I believe this should work. Let me know what happens.
Sources:
PyDrive docs and source
Telethon docs and source

Google Cloud authentication without JSON

So I'm using a small program to get license plates from images. I do that by sending Google Vision the image and searching the text that I get back for license plates that match a regular expression.
# -*- coding: utf-8 -*-
"""
Created on Sat May 23 19:42:18 2020
#author: Odatas
"""
import io
import os
from google.cloud import vision_v1p3beta1 as vision
import cv2
import re

# Set up the Google auth client key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'client_key.json'

# Source path containing all images
SOURCE_PATH = "F:/Radsteuereintreiber/Bilder Temp/"


def recognize_license_plate(img_path):
    # Read image with OpenCV
    img = cv2.imread(img_path)

    # Get image size
    height, width = img.shape[:2]

    # Scale image
    img = cv2.resize(img, (800, int((height * 800) / width)))

    # Save the image to a temp file
    cv2.imwrite(SOURCE_PATH + "output.jpg", img)

    # Create new img path for Google Vision
    img_path = SOURCE_PATH + "output.jpg"

    # Create Google Vision client
    client = vision.ImageAnnotatorClient()

    # Read image file
    with io.open(img_path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)

    # Recognize text
    response = client.text_detection(image=image)
    texts = response.text_annotations
    return texts


path = SOURCE_PATH + 'IMG_20200513_173356.jpg'
plate = recognize_license_plate(path)

for text in plate:
    # Read description
    license_plate = text.description
    # Change all other symbols to whitespace
    license_plate = re.sub('[^a-zA-Z0-9\n\.]', ' ', license_plate)
    # See if some text matches the pattern
    test = re.findall('[A-Z]{1,3}\s[A-Z]{1,2}\s\d{1,4}', str(license_plate))
    # Stop if you found something (re.findall returns a list, never None)
    if test:
        break

try:
    print(test[0])
except Exception:
    print("No plate found")
As you can see, I set my environment variable to the client_key.json at the start. When I distribute my program I don't want to send out my key to every user, so I would like to include the key inside the program directly.
I tried the explicit credential method from Google, with a JSON created inside the program like this:
def explicit():
    # Create the JSON
    credentials = { REMOVED: INSIDE HERE WOULD BE ALL THE INFORMATION FROM THE JSON KEY FILE.
    }
    json_credentials = json.dumps(credentials)

    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(
        json_credentials)

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
    # [END auth_cloud_explicit]
But I get the error:
[Errno 2] No such file or directory: (content of my JSON, again removed)
So I'm not sure if I have to switch to an API-based call, and how would I get the same functionality then? Because I have to upload a picture, obviously, I don't even think that's possible through an API call.
So I'm kinda lost. Thanks for any help.
If you want the user to be able to make API calls against your Google Cloud project, then including your service account key, either as a JSON file or inline in your code, is basically equivalent, and either way the user would have access to your key.
This is generally not advised though: even a minimally scoped service account would be able to make requests and potentially incur charges against your account.
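As an aside, the [Errno 2] error above happens because from_service_account_json() expects a path to a key file, not a JSON string. If you do accept the risk and ship the key inline anyway, a minimal sketch (untested here) would build credentials from an in-memory dict with from_service_account_info() and pass them to the Vision client:
from google.oauth2 import service_account
from google.cloud import vision_v1p3beta1 as vision

# The full contents of the service-account key file, embedded as a dict.
# This is exactly the material you would normally keep secret.
key_info = {
    "type": "service_account",
    # ... remaining fields from the JSON key file ...
}

creds = service_account.Credentials.from_service_account_info(key_info)
client = vision.ImageAnnotatorClient(credentials=creds)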
An alternative would be to deploy your own API inside your Google Cloud project which wraps the call to the Vision API. This would allow you to protect your service account key, and also to rate limit or even block calls to this API if you need to.
Your script or library would then make calls to this custom API instead of directly to the Vision API.

Save my file on a shared drive from Google Colab

We are working as a team on a Shared Drive, and we are using Google Colab to support our code.
Here is the path: /content/drive/Shared drives/Projet_IE/Technique/BDD
We want to save a .json file in our drive, but it fails because of the space in "Shared drives".
How can we make this path understandable by the code, since "Shared drives" is imposed by Google Drive and the code doesn't understand the space in the path?
Thanks!
from google.colab import drive
import json

# Mount your Drive so it appears under /content/gdrive
drive.mount('/content/gdrive')

!mkdir -p /content/gdrive/My\ Drive/test

a = {'a': 1, 'b': 2, 'c': 3}
with open("/content/gdrive/My Drive/test/your_json_file", "w") as fp:
    json.dump(a, fp)
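The same approach should work for the shared drive path from the question: the space only needs escaping in shell commands (backslash or quotes), while a normal quoted Python string handles it as-is. A short sketch, assuming Drive is mounted at /content/drive as in the question's path (the file name test.json is just an example):
import json
from google.colab import drive

drive.mount('/content/drive')

data = {'a': 1, 'b': 2, 'c': 3}

# No escaping is needed inside a quoted Python string, even with the space in "Shared drives".
with open("/content/drive/Shared drives/Projet_IE/Technique/BDD/test.json", "w") as fp:
    json.dump(data, fp)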

Loading an 8.9 GB dataset from Google Drive to Google Colab?

I am working on a huge laboratory dataset and want to know how to load an 8.9 GB dataset from my Google Drive into my Google Colab notebook. The error it shows is: runtime stopped, restarting it.
I've already tried chunksize, nrows, na_filter, and dask. There might be a problem with how I implemented them, though; if so, could you explain to me how to use them? I am attaching my original code below.
import pandas as pd

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

id = '1M4tregypJ_HpXaQCIykyG2lQtAMR9nPe'
downloaded = drive.CreateFile({'id': id})
downloaded.GetContentFile('Filename.csv')

df = pd.read_csv('Filename.csv')
df.head()
If you suggest any of the methods I've already tried, please do so with appropriate and working code.
The problem probably comes from pd.read_csv('Filename.csv').
An 8.9 GB CSV file will take more than the roughly 13 GB of RAM a standard Colab runtime provides. You should not load the whole file into memory, but work on it incrementally.
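For example, a minimal sketch of incremental processing with pandas' chunksize; the per-chunk aggregation here is just a placeholder for whatever you actually need to compute:
import pandas as pd

# Process the CSV in bounded pieces instead of loading it all at once.
chunk_summaries = []
for chunk in pd.read_csv('Filename.csv', chunksize=1_000_000):
    # Placeholder work: keep only a small summary of each chunk so memory stays bounded.
    chunk_summaries.append(chunk.describe())

print(f"Processed {len(chunk_summaries)} chunks")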

Writing Pandas DataFrames to Google sheets: no such file or directory .oauth/drive.json

I've been trying to find a way to read and write data between Pandas and Google Sheets for a while now. I found the library df2gspread, which seems perfect for the job, and I've been spending a while trying to get it to work.
As instructed, I used the Google API console to create my client secrets file and saved it as ~/.gdrive_private. Now, I'm trying to download the contents of a Google spreadsheet as follows:
workbook = [local filepath to workbook in Google Drive folder]
df = g2d.download(workbook, 'Sheet1', col_names = True, row_names = True)
When I run this, it successfully opens a browser window asking to give my app access to my Google Sheets. However, when I click allow, an IPython error comes up:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/samlilienfeld/.oauth/drive.json'
What is this file supposed to contain? I've tried creating the folder and including my client secrets again there as drive.json, but this does not work.
I did a workaround for the time being by passing a pre-authenticated credentials file to the g2d call.
I made a gist here (for Python 2.x, but it should work for 3.x) that saves the credentials file: you pass it the secrets file (basically ~/.gdrive_private) and the filename to save the resulting authenticated credentials to.
Use the above gist in a standalone script with appropriate filenames and run it from a terminal console. A browser window will open to perform the OAuth authentication via Google and should give you a token, which you can copy and paste into the terminal prompt. Here's a quick example:
from gdrive_creds import create_creds
# Copy Paste whatever shows up in the browser in the console.
create_creds('./.gdrive_private', './authenticated_creds')
You can then use the file to authenticate for df2gspread calls.
Once you create the cred file using the gist method, try something like this to get access to your GDrive:
from oauth2client.file import Storage
from df2gspread import gspread2df as g2d

# Read the creds file
creds = Storage('./authenticated_creds').get()

# Pass it to g2d (trimmed for brevity)
workbook = [local filepath to workbook in Google Drive folder]
df = g2d.download(workbook, 'Sheet1', col_names=True, credentials=creds)
df.head()
This worked for me.
Here are the two functioning ways as of 2019:
1. DataFrame data to Google Sheet:
# Import libraries
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

# Connection to Google Sheets
import gspread
from oauth2client.service_account import ServiceAccountCredentials

# From dataframe to Google Sheet
from df2gspread import df2gspread as d2g

# Configure the connection
scope = ['https://spreadsheets.google.com/feeds']

# Add the JSON file you downloaded from Google Cloud to your working directory.
# The JSON file in this case is called 'service_account_gs.json'; you can rename it as you wish.
credentials = ServiceAccountCredentials.from_json_keyfile_name('service_account_gs.json',
                                                               scope)

# Authorise your notebook with the credentials just provided above
gc = gspread.authorize(credentials)

# The spreadsheet ID; you see it in the URL path of your Google Sheet
spreadsheet_key = '1yr6LwGQzdNnaonn....'

# Create the dataframe within your notebook
df = pd.DataFrame({'number': [1, 2, 3], 'letter': ['a', 'b', 'c']})

# Set the sheet name you want to upload data to and the start cell where the uploaded data begins
wks_name = 'Sheet1'
cell_of_start_df = 'A1'

# Upload the dataframe
d2g.upload(df,
           spreadsheet_key,
           wks_name,
           credentials=credentials,
           col_names=True,
           row_names=False,
           start_cell=cell_of_start_df,
           clean=False)
print('Successfully updated')
2. Google Sheet to DataFrame
from df2gspread import gspread2df as g2d

df = g2d.download(gfile='1yr6LwGQzdNnaonn....',
                  credentials=credentials,
                  col_names=True,
                  row_names=False)
df
It seems like this issue arose because the /Users/***/.oauth folder wasn't created automatically by the oauth2client package (e.g. this issue). One possible solution is to create this folder manually, or you can update df2gspread; the issue should be fixed in the latest version.
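For example, a small sketch of creating the folder manually before running the df2gspread call, using the path from the error message above:
import os

# Make sure the directory that oauth2client expects actually exists before the OAuth flow runs.
os.makedirs(os.path.expanduser('~/.oauth'), exist_ok=True)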
