So I'm using a small program to extract license plates from images. I do that by sending the image to Google Vision and searching the text I get back for license plates that match a regular expression.
# -*- coding: utf-8 -*-
"""
Created on Sat May 23 19:42:18 2020
#author: Odatas
"""
import io
import os
from google.cloud import vision_v1p3beta1 as vision
import cv2
import re
# Set up Google auth client key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'client_key.json'
# Source path content all images
SOURCE_PATH = "F:/Radsteuereintreiber/Bilder Temp/"
def recognize_license_plate(img_path):
    # Read image with opencv
    img = cv2.imread(img_path)
    # Get image size
    height, width = img.shape[:2]
    # Scale image
    img = cv2.resize(img, (800, int((height * 800) / width)))
    # Save the image to temp file
    cv2.imwrite(SOURCE_PATH + "output.jpg", img)
    # Create new img path for google vision
    img_path = SOURCE_PATH + "output.jpg"
    # Create google vision client
    client = vision.ImageAnnotatorClient()
    # Read image file
    with io.open(img_path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    # Recognize text
    response = client.text_detection(image=image)
    texts = response.text_annotations
    return texts
path = SOURCE_PATH + 'IMG_20200513_173356.jpg'
plate = recognize_license_plate(path)
for text in plate:
    # read description
    license_plate = text.description
    # change all symbols to whitespace.
    license_plate = re.sub(r'[^a-zA-Z0-9\n.]', ' ', license_plate)
    # see if some text matches the pattern
    test = re.findall(r'[A-Z]{1,3}\s[A-Z]{1,2}\s\d{1,4}', str(license_plate))
    # stop if you found something (findall returns an empty list, never None)
    if test:
        break
try:
    print(test[0])
except Exception:
    print("No plate found")
As you can see, I set my environment variable to client_key.json at the start. When I distribute my program I don't want to send my key to every user, so I would like to include the key inside the program directly.
I tried the explicit credential method from Google, with a JSON created inside the program like this:
def explicit():
    # create json
    credentials = { REMOVED: INSIDE HERE WOULD BE ALL THE INFORMATION FROM THE JSON KEY FILE.
    }
    json_credentials = json.dumps(credentials)
    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(
        json_credentials)
    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
    # [END auth_cloud_explicit]
But I get the error:
[Errno 2] No such file or directory: (content of my JSON, again removed)
So I'm not sure if I have to switch to an API-based call, and how would I call the same functionality then? Because I obviously have to upload a picture, and I don't even think that's possible through an API call.
So I'm kinda lost. Thanks for any help.
If you want the user to be able to make API calls against your Google Cloud project, then including your service account key, either as a JSON file or inline in your code, is basically equivalent, and either way the user would have access to your key.
This is generally not advised though: even a minimally scoped service account would be able to make requests and potentially incur charges against your account.
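For completeness, if you do decide to ship the key embedded in your code despite that caveat, the client libraries can take in-memory credentials instead of a file path. A minimal sketch (the dict contents are placeholders for the fields of your JSON key file):
from google.oauth2 import service_account
from google.cloud import vision_v1p3beta1 as vision

# Placeholder: fill with the same fields as your JSON key file
# (type, project_id, private_key, client_email, ...).
# Anyone who can read this dict has your key.
service_account_info = {"...": "..."}

credentials = service_account.Credentials.from_service_account_info(service_account_info)
client = vision.ImageAnnotatorClient(credentials=credentials)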
An alternative would be to deploy your own API inside your Google Cloud project which wraps the call to the Vision API. This would allow you to protect your service account key, and also to rate limit or even block calls to this API if you need to.
Your script or library would then make calls to this custom API instead of directly to the Vision API.
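As a rough sketch of that approach (assuming Flask and a made-up /detect-text endpoint; the credentials live only on the server, and the user uploads the picture as a normal multipart POST):
from flask import Flask, jsonify, request
from google.cloud import vision_v1p3beta1 as vision

app = Flask(__name__)
# Credentials are configured server-side only, e.g. via GOOGLE_APPLICATION_CREDENTIALS.
client = vision.ImageAnnotatorClient()

@app.route('/detect-text', methods=['POST'])
def detect_text():
    # The distributed program uploads the (resized) image as a file field.
    content = request.files['image'].read()
    image = vision.types.Image(content=content)
    response = client.text_detection(image=image)
    return jsonify([t.description for t in response.text_annotations])
Your distributed program would then POST the image to this endpoint instead of calling the Vision API itself.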
Related
I'm working with Telethon's download_media method for downloading images and videos. It is working fine (as expected). Now, I want to upload the downloaded media directly to my Google Drive folder.
Sample code looks something like:
from telethon import TelegramClient, events, sync
from telethon.tl.types import PeerUser, PeerChat, PeerChannel
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
drive = GoogleDrive(gauth)
gfile = drive.CreateFile({'parents': [{'id': 'drive_directory_path'}]})
api_id = #####
api_hash = ##########
c = client.get_entity(PeerChannel(1234567)) # some random channel id
for m in client.iter_messages(c):
    if m.photo:
        # below is the one way and it works
        # m.download_media("Media/")
        # I want to try something like this - below code
        gfile.SetContentFile(m.media)
        gfile.Upload()
This code is not working. How can I define the Google Drive object for download_media?
Thanks in advance. Kindly assist!
The main problem is that, according to PyDrive's documentation, SetContentFile() expects a string with the file's local path, and then it just uses open(), so it is meant to be used with local files. In your code you're trying to feed it the media file itself, so it won't work.
To upload a bytes file with PyDrive you'll need to convert it to BytesIO and send it as the content. An example with a local file would look like this:
import io

drive = GoogleDrive(gauth)
file = drive.CreateFile({'mimeType': 'image/jpeg', 'title': 'example.jpg'})
filebytes = open('example.jpg', 'rb').read()
file.content = io.BytesIO(filebytes)
file.Upload()
Normally you don't need to do it this way because SetContentFile() does the opening and conversion for you, but this should give you the idea: if you get the media file as bytes you can convert it, assign it to file.content, and then upload it.
Now, if you look at the Telethon documentation, you will see that download_media() takes a file argument which you can set to bytes:
file (str | file, optional):
The output file path, directory, or stream-like object. If the path exists and is a file, it will be overwritten. If file is the type bytes, it will be downloaded in-memory as a bytestring (e.g. file=bytes).
So you should be able to call m.download_media(file=bytes) to get a bytes object. Looking even deeper at the Telethon source code it appears that this does return a BytesIO object. With this in mind, you can try the following change in your loop:
for m in client.iter_messages(c):
    if m.photo:
        gfile.content = io.BytesIO(m.download_media(file=bytes))
        gfile.Upload()
Note that I only tested the PyDrive side since I currently don't have access to the Telegram API, but looking at the docs I believe this should work. Let me know what happens.
Sources:
PyDrive docs and source
Telethon docs and source
I have a spaCy model and I am trying to save it to a GCS bucket using this format:
trainer.to_disk('gs://{bucket-name}/model')
But each time I run this I get this error message
FileNotFoundError: [Errno 2] No such file or directory: 'gs:/{bucket-name}/model'
Also, when I create a Kubeflow persistent volume and save the model there, loading it back with trainer.load('model') gives me this error message:
File "/usr/local/lib/python3.7/site-packages/spacy/__init__.py", line 30, in load
return util.load_model(name, **overrides)
File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 175, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model '/model/'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
I don't understand why I am having these errors, since this works perfectly when I run it locally on my PC and use a local path.
Cloud Storage is not a local disk or a physical storage unit that you can save things to directly.
As you say, this works when you
run this on my pc locally and use a local path
but Cloud Storage is not a local path, neither for your machine nor for any other tool running in the cloud.
If you are using Python you will have to create a client with the Cloud Storage library and then upload your file using upload_blob, i.e.:
from google.cloud import storage
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # bucket_name = "your-bucket-name"
    # source_file_name = "local/path/to/file"
    # destination_blob_name = "storage-object-name"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
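Since a spaCy model is saved as a directory, one way to use this (a sketch; the bucket name and paths are examples) is to write the model to a local folder first and then push each file with the helper above:
import os

# Save the trained pipeline locally, then upload every file in the directory.
trainer.to_disk('/tmp/model')
for root, _, files in os.walk('/tmp/model'):
    for name in files:
        local_path = os.path.join(root, name)
        # Keep the "model/..." structure as the object name in the bucket.
        destination = os.path.relpath(local_path, '/tmp')
        upload_blob('your-bucket-name', local_path, destination)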
Since you've tagged this question "kubeflow-pipelines", I'll answer from that perspective.
KFP strives to be platform-agnostic. Most good components are cloud-independent.
KFP promotes system-managed artifact passing where the components code only writes output data to local files and the system takes it and makes it available for other components.
So, it's best to describe your SpaCy model trainer that way - to write data to local files. Check how all other components work, for example, Train Keras classifier.
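For illustration, a rough sketch of such a trainer component with the KFP v1 SDK (train_spacy_model here is a hypothetical stand-in for your own training code; the output parameter is a local path that the system turns into a passable artifact):
from kfp.components import create_component_from_func, OutputPath

def train_spacy_model_func(model_path: OutputPath()):
    # Write the trained pipeline to the local output path; KFP picks the
    # directory up and exposes it to downstream components as the "model" output.
    import spacy
    nlp = spacy.blank("en")   # stand-in for your real training code
    nlp.to_disk(model_path)

# This factory produces the train_spacy_model op used in the pipeline below.
train_spacy_model = create_component_from_func(
    train_spacy_model_func,
    base_image="python:3.7",
    packages_to_install=["spacy"],
)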
Since you want to upload to GCS, do that explicitly by passing the model output of your trainer to an "Upload to GCS" component:
upload_to_gcs_op = components.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/616542ac0f789914f4eb53438da713dd3004fba4/components/google-cloud/storage/upload_to_explicit_uri/component.yaml')
def my_pipeline():
    model = train_spacy_model(...).outputs['model']
    upload_to_gcs_op(
        data=model,
        gcs_path='gs:/.....',
    )
The following implementation assumes you have gsutil installed on your computer. The spaCy version used was 3.2.4. In my case, I wanted everything to be part of a single (demo) Python file, spacy_import_export.py. To do so, I had to use the subprocess Python library, plus this comment, as follows:
# spacy_import_export.py
import spacy
import subprocess # Will be used later
# spaCy models trained by user, are always stored as LOCAL directories, with more subdirectories and files in it.
PATH_TO_MODEL = "/home/jupyter/" # Use your own path!
# Test-loading your "trainer" (optional step)
trainer = spacy.load(PATH_TO_MODEL+"model")
# Replace 'bucket-name' with the one of your own:
bucket_name = "destination-bucket-name"
GCS_BUCKET = "gs://{}/model".format(bucket_name)
# This does the trick for the UPLOAD to Cloud Storage:
# TIP: Just for security, check Cloud Storage afterwards: "model" should be in GCS_BUCKET
subprocess.run(["gsutil", "-m", "cp", "-r", PATH_TO_MODEL+"model", GCS_BUCKET])
# This does the trick for the DOWNLOAD:
# HINT: By now, in PATH_TO_MODEL, you should have a "model" & "downloaded_model"
subprocess.run(["gsutil", "-m", "cp", "-r", GCS_BUCKET+MODEL_NAME+"/*", PATH_TO_MODEL+"downloaded_model"])
# Test-loading your "GCS downloaded model" (optional step)
nlp_original = spacy.load(PATH_TO_MODEL+"downloaded_model")
I apologize for the excess of comments; I just wanted to make everything clear for spaCy newcomers. I know it is a bit late, but I hope it helps.
I'm following the docs here: https://colab.research.google.com/github/google/earthengine-api/blob/master/python/examples/ipynb/TF_demo1_keras.ipynb#scrollTo=43-c0JNFI_m6 to learn how to use Tensorflow with GEE. One part of this tutorial is checking the existence of exported files. In the docs, the example code is:
fileNameSuffix = '.tfrecord.gz'
trainFilePath = 'gs://' + outputBucket + '/' + trainFilePrefix + fileNameSuffix
testFilePath = 'gs://' + outputBucket + '/' + testFilePrefix + fileNameSuffix
print('Found training file.' if tf.gfile.Exists(trainFilePath)
      else 'No training file found.')
print('Found testing file.' if tf.gfile.Exists(testFilePath)
      else 'No testing file found.')
In my case, I'm just exporting the files to Google Drive instead of Google Cloud bucket. How would I change trainFilePath and testFilePath to point to the Google Drive folder? FWIW, when I go into the Google Drive Folder, I do actually see files.
I would say you could use the Google Drive API to list files in your Google Drive instead of a GCS bucket. You can find the documentation here.
You can also use PyDrive, which is pretty easy to understand. This is an example; you only have to adjust the query "q" to your needs:
from pydrive.drive import GoogleDrive
from pydrive.auth import GoogleAuth
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file in file_list:
    print(f"title: {file['title']}, id: {file['id']}")
Solution
You can use the great PyDrive library to access your Drive files easily from Google Colab and thus check which files you have or which have been exported, etc.
The following piece of code is an example that lists all the files in the root directory of your Google Drive. It was found in this answer (yes, I am making this answer a community wiki post):
# Install the library
!pip install -U -q PyDrive
# Install the rest of the services/libraries needed
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# choose a local (colab) directory to store the data.
local_download_path = os.path.expanduser('~/data')
try:
    os.makedirs(local_download_path)
except: pass
# 2. Auto-iterate using the query syntax, in this case as I am using the main directory of Drive this would be root
# https://developers.google.com/drive/v2/web/search-parameters
file_list = drive.ListFile(
    {'q': "'root' in parents"}).GetList()
for f in file_list:
    # 3. Print the name and id of the files
    print('title: %s, id: %s' % (f['title'], f['id']))
NOTE: when you do this, Colab will take you to another page to authenticate and ask you to insert the secret key. Just follow what the service tells you to do; it is pretty straightforward.
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)
So I am working on a small project (a RESTful service, hence the JSON format, which is not shown in the code) where the code accepts base64 image data and decodes it back into an image. I'm able to convert it back to an image, but I am not able to use Google Vision (Google OCR) on the image to extract the text. The only part that isn't working is the following block of code:
from flask import Flask,request,jsonify
import os,io,re,glob,base64
from google.cloud import vision
from google.cloud.vision import types
from PIL import Image
app = Flask(__name__)
os.environ['GOOGLE_APPLICATION_CREDENTIALS']=r'date_scanner.json'
@app.route('/jason_example', methods=['POST'])
def jason_example():
    req_data = request.get_json()
    base_64_image_content = req_data['imgcnt']
    # the issue starts from here
    image = base64.b64decode(base_64_image_content)
    image = Image.open(io.BytesIO(image))
    image = vision.types.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations
No need to use Image.open, which is a PIL method anyway. You should be able to decode this straight to a byte string with base64.decodebytes, as outlined in this answer.
The code should look like:
# the issue starts from here
image_bytes = base64.decodebytes(base_64_image_content)
image = vision.types.Image(content=image_bytes)
response = client.text_detection(image=image)
texts = response.text_annotations
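For context, here is a sketch of the whole handler with that change in place. It assumes a Vision client is created at module level, which the question's snippet leaves out, and it encodes the JSON string first since decodebytes expects a bytes-like object:
client = vision.ImageAnnotatorClient()

@app.route('/jason_example', methods=['POST'])
def jason_example():
    req_data = request.get_json()
    base_64_image_content = req_data['imgcnt']
    # decodebytes expects bytes, so encode the JSON string first
    image_bytes = base64.decodebytes(base_64_image_content.encode())
    image = vision.types.Image(content=image_bytes)
    response = client.text_detection(image=image)
    texts = response.text_annotations
    return jsonify([text.description for text in texts])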
I've been trying to find a way to read and write data between Pandas and Google Sheets for a while now. I found the library df2gspread, which seems perfect for the job, and I've spent a while now trying to get it to work.
As instructed, I used the Google API console to create my client secrets file and saved it as ~/.gdrive_private. Now, I'm trying to download the contents of a Google spreadsheet as follows:
workbook = [local filepath to workbook in Google Drive folder]
df = g2d.download(workbook, 'Sheet1', col_names = True, row_names = True)
When I run this, it successfully opens a browser window asking me to give my app access to my Google Sheets. However, when I click Allow, an IPython error comes up:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/samlilienfeld/.oauth/drive.json'
What is this file supposed to contain? I've tried creating the folder and including my client secrets again there as drive.json, but this does not work.
I did a workaround for the time being by passing a pre-authenticated credential file to the g2d call.
I made a gist here (for Python 2.x, but it should work for 3.x) that saves the credential file when you pass it the secret file (basically ~/.gdrive_private) and the filename under which to save the resulting authenticated credentials.
Use the above gist in a standalone script with appropriate filenames and run it from a terminal console. A browser window will open to perform the OAuth authentication via Google and should give you a token which you can copy-paste into the terminal prompt. Here's a quick example:
from gdrive_creds import create_creds
# Copy Paste whatever shows up in the browser in the console.
create_creds('./.gdrive_private', './authenticated_creds')
You can then use the file to authenticate for df2gspread calls.
Once you create the cred file using the gist method, try something like this to get access to your GDrive:
from oauth2client.file import Storage
from df2gspread import gspread2df as g2d
# Read the cred file
creds = Storage('./authenticated_creds').get()
# Pass it to g2df (Trimmed for brevity)
workbook = [local filepath to workbook in Google Drive folder]
df = g2d.download(workbook, 'Sheet1', col_names = True, credentials=creds)
df.head()
This worked for me.
Here are the two functioning ways as of 2019:
1. DataFrame data to Google Sheets:
#Import libraries
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
# Connection to googlesheet
import gspread
from oauth2client.service_account import ServiceAccountCredentials
# From dataframe to google sheet
from df2gspread import df2gspread as d2g
# Configure the connection
scope = ['https://spreadsheets.google.com/feeds']
# Add the JSON file you downloaded from Google Cloud to your working directory
# the JSON file in this case is called 'service_account_gs.json' you can rename as you wish
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    'service_account_gs.json',
    scope
)
# Authorise your Notebook with credentials just provided above
gc = gspread.authorize(credentials)
# The spreadsheet ID, you see it in the URL path of your google sheet
spreadsheet_key = '1yr6LwGQzdNnaonn....'
# Create the dataframe within your notebook
df = pd.DataFrame({'number': [1,2,3],'letter': ['a','b','c']})
# Set the sheet name you want to upload data to and the start cell where the upload data begins
wks_name = 'Sheet1'
cell_of_start_df = 'A1'
# upload the dataframe
d2g.upload(df,
           spreadsheet_key,
           wks_name,
           credentials=credentials,
           col_names=True,
           row_names=False,
           start_cell=cell_of_start_df,
           clean=False)
print('Successfully updated')
2. Google Sheets to DataFrame
from df2gspread import gspread2df as g2d
df = g2d.download(gfile='1yr6LwGQzdNnaonn....',
                  credentials=credentials,
                  col_names=True,
                  row_names=False)
df
It seems like this issue happens because the /Users/***/.oauth folder isn't created automatically by the oauth2client package (e.g. this issue). One possible solution is to create this folder manually, or you can update df2gspread; the issue should be fixed in the latest version.
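If you go the manual route, something along these lines (run once before calling g2d.download) is enough to create the folder:
import os

# Create the ~/.oauth folder that oauth2client expects to write drive.json into.
os.makedirs(os.path.expanduser('~/.oauth'), exist_ok=True)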